黒木玄
2018-01-10
Julia言語による配列の取り扱いは確かに面倒である.
予備知識無しに配列の和を繰り返し書くコードを書くと大量にメモリを消費してしまう.
周期境界条件の下での2次元平面の離散ラプラシアンの計算によるベンチマーク.
結論:
(1) 2次元平面上のベクトル値函数の離散化を, スカラーの3次元配列をベクトルの2次元配列とみなして実現するときは, 離散ラプラシアンの定義で dot syntax もしくは @.
マクロを適切に使用するだけではなく, 適切に函数を分割して, @inline
マクロも併用しなければいけない. そのどちらかを忘れると, メモリを大量消費し, 計算も遅くなってしまう.
(2) 2次元平面上のベクトル値函数の離散化を, ベクトルの2次元配列で実現するとき, 離散ラプラシアンの定義で dot syntax もしくは @.
マクロを適切に使用しなけれないけない. それを忘れると, メモリを大量消費し, 計算も遅くなってしまう. (この方法は上の3次元配列を使う場合よりも6割ほど計算が遅くなるようだ.)
(3) 2次元平面上のスカラー値函数の離散化を, スカラーの2次元配列で実現するとき, 離散ラプラシアンの定義に特別な注意は必要ない.
特に
それぞれの日本語訳が次の場所にある:
函数の分割, ドット演算子 (@.
マクロ), @view
マクロの使用に関するヒントは以上のドキュメントにある. しかし, さらに @inline
を使用することによって, メモリ効率の高い計算を実現できる理由がよくわからない. 知っている人がいたら教えて下さい.
using BenchmarkTools
using PyPlot
seed = 2018
cmap = "CMRmap"
# 平面全体での離散ラプラシアンの計算
#
# laplacian_local! は離散ラプラシアンの値を局所的に計算する函数である.
# このように, 局所的なラプラシアンの計算を別の函数として分離しておく.
#
function laplacian!(v, u, laplacian_local!)
m, n = size(u)[end-1:end]
for i in 1:m
for j in 1:n
laplacian_local!(v, u, m, n, i, j)
end
end
end
# laplacian_local! の性能のテスト
#
function test!(v, u, laplacian_local!; niters = 10)
for iter in 1:niters
laplacian!(v, u, laplacian_local!)
end
end
versioninfo()
Julia Version 0.6.2 Commit d386e40c17* (2017-12-13 18:08 UTC) Platform Info: OS: Windows (x86_64-w64-mingw32) CPU: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz WORD_SIZE: 64 BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell) LAPACK: libopenblas64_ LIBM: libopenlibm LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
このノートブックは以上の環境での実行結果である.
Julia言語の発展は速いので, 数ヶ月後には時代遅れの内容になっているかもしれないので, 注意して欲しい
# ラプラシアンの値を局所的に計算するための函数をこのように分離しておく.
# @inline と @. を併用すると, メモリ消費も少なく, 計算も速くなる.
#
@inline function laplacian_atdot_inline!(v, u, m, n, i, j)
@.(
v[:,i,j] =
@view(u[:, ifelse(i+1 ≤ m, i+1, 1), j]) + @view(u[:, ifelse(i-1 ≥ 1, i-1, m), j]) +
@view(u[:, i, ifelse(j+1 ≤ n, j+1, 1)]) + @view(u[:, i, ifelse(j-1 ≥ 1, j-1, n)]) -
4*@view(u[:, i, j])
)
end
n = 1000
srand(seed)
u = rand(3, n, n)
v = Array{Float64,3}(3, n, n)
@benchmark test!(v, u, laplacian_atdot_inline!)
BenchmarkTools.Trial: memory estimate: 320 bytes allocs estimate: 10 -------------- minimum time: 983.911 ms (0.00% GC) median time: 990.116 ms (0.00% GC) mean time: 1.010 s (0.00% GC) maximum time: 1.092 s (0.00% GC) -------------- samples: 5 evals/sample: 1
function laplacian_atdot!(v, u, m, n, i, j)
@.(
v[:,i,j] =
@view(u[:, ifelse(i+1 ≤ m, i+1, 1), j]) + @view(u[:, ifelse(i-1 ≥ 1, i-1, m), j]) +
@view(u[:, i, ifelse(j+1 ≤ n, j+1, 1)]) + @view(u[:, i, ifelse(j-1 ≥ 1, j-1, n)]) -
4*@view(u[:, i, j])
)
end
n = 1000
srand(seed)
u = rand(3, n, n)
v = Array{Float64,3}(3, n, n)
@benchmark test!(v, u, laplacian_atdot!)
BenchmarkTools.Trial: memory estimate: 2.98 GiB allocs estimate: 50000010 -------------- minimum time: 1.473 s (4.42% GC) median time: 1.522 s (4.45% GC) mean time: 1.540 s (4.42% GC) maximum time: 1.643 s (4.35% GC) -------------- samples: 4 evals/sample: 1
@inline function laplacian_inline!(v, u, m, n, i, j)
v[:,i,j] =
@view(u[:, ifelse(i+1 ≤ m, i+1, 1), j]) + @view(u[:, ifelse(i-1 ≥ 1, i-1, m), j]) +
@view(u[:, i, ifelse(j+1 ≤ n, j+1, 1)]) + @view(u[:, i, ifelse(j-1 ≥ 1, j-1, n)]) -
4*@view(u[:, i, j])
end
n = 1000
srand(seed)
u = rand(3, n, n)
v = Array{Float64,3}(3, n, n)
@benchmark test!(v, u, laplacian_inline!)
BenchmarkTools.Trial: memory estimate: 8.79 GiB allocs estimate: 129780010 -------------- minimum time: 6.426 s (6.01% GC) median time: 6.426 s (6.01% GC) mean time: 6.426 s (6.01% GC) maximum time: 6.426 s (6.01% GC) -------------- samples: 1 evals/sample: 1
function laplacian_plain!(v, u, m, n, i, j)
v[:,i,j] =
@view(u[:, ifelse(i+1 ≤ m, i+1, 1), j]) + @view(u[:, ifelse(i-1 ≥ 1, i-1, m), j]) +
@view(u[:, i, ifelse(j+1 ≤ n, j+1, 1)]) + @view(u[:, i, ifelse(j-1 ≥ 1, j-1, n)]) -
4*@view(u[:, i, j])
end
n = 1000
srand(seed)
u = rand(3, n, n)
v = Array{Float64,3}(3, n, n)
@benchmark test!(v, u, laplacian_plain!)
BenchmarkTools.Trial: memory estimate: 8.79 GiB allocs estimate: 129780010 -------------- minimum time: 6.681 s (6.02% GC) median time: 6.681 s (6.02% GC) mean time: 6.681 s (6.02% GC) maximum time: 6.681 s (6.02% GC) -------------- samples: 1 evals/sample: 1
ベクトルの2次元配列を扱った場合.
@inline
を使わなくても, @.
を使えば望んでいたようなメモリの使い方をしてくれる.
@inline
を使うとほんの少し速くなるが, 3次元配列をベクトルの2次元配列とみなして使った場合よりも6割程度計算時間が余計にかかっている.
ベクトルの2次元配列を使う場合には @.
さえ使っていれば, メモリを大量消費して遅くなることはない. しかし, 3次元配列を使って同様のことをした場合よりも計算は遅くなる.
3次元配列を使って同様のことをしてメモリの大量消費を防ぐためには, @.
だけではなく, @inline
も併用することを忘れないようにしなければいけない. これは奇妙でかつ面倒である. しかし, ベクトルの2次元配列を使った場合よりも計算は速くなる.
一長一短.
@inline function laplacian2d_atdot_inline!(v, u, m, n, i, j)
@. v[i,j] =
u[ifelse(i+1 ≤ m, i+1, 1), j] + u[ifelse(i-1 ≥ 1, i-1, m), j] +
u[i, ifelse(j+1 ≤ n, j+1, 1)] + u[i, ifelse(j-1 ≥ 1, j-1, n)] - 4u[i,j]
end
n = 1000
srand(seed)
u = [rand(3) for i in 1:n, j in 1:n]
v = [Array{Float64,1}(3) for i in 1:n, j in 1:n]
@benchmark test!(v, u, laplacian2d_atdot_inline!)
BenchmarkTools.Trial: memory estimate: 320 bytes allocs estimate: 10 -------------- minimum time: 1.547 s (0.00% GC) median time: 1.577 s (0.00% GC) mean time: 1.584 s (0.00% GC) maximum time: 1.633 s (0.00% GC) -------------- samples: 4 evals/sample: 1
function laplacian2d_atdot!(v, u, m, n, i, j)
@. v[i,j] =
u[ifelse(i+1 ≤ m, i+1, 1), j] + u[ifelse(i-1 ≥ 1, i-1, m), j] +
u[i, ifelse(j+1 ≤ n, j+1, 1)] + u[i, ifelse(j-1 ≥ 1, j-1, n)] - 4u[i,j]
end
n = 1000
srand(seed)
u = [rand(3) for i in 1:n, j in 1:n]
v = [Array{Float64,1}(3) for i in 1:n, j in 1:n]
@benchmark test!(v, u, laplacian2d_atdot!)
BenchmarkTools.Trial: memory estimate: 320 bytes allocs estimate: 10 -------------- minimum time: 1.748 s (0.00% GC) median time: 1.754 s (0.00% GC) mean time: 1.775 s (0.00% GC) maximum time: 1.824 s (0.00% GC) -------------- samples: 3 evals/sample: 1
@inline function laplacian2d_inline!(v, u, m, n, i, j)
v[i,j] =
u[ifelse(i+1 ≤ m, i+1, 1), j] + u[ifelse(i-1 ≥ 1, i-1, m), j] +
u[i, ifelse(j+1 ≤ n, j+1, 1)] + u[i, ifelse(j-1 ≥ 1, j-1, n)] - 4u[i,j]
end
n = 1000
srand(seed)
u = [rand(3) for i in 1:n, j in 1:n]
v = [Array{Float64,1}(3) for i in 1:n, j in 1:n]
@benchmark test!(v, u, laplacian2d_inline!)
BenchmarkTools.Trial: memory estimate: 5.22 GiB allocs estimate: 50000010 -------------- minimum time: 18.027 s (75.08% GC) median time: 18.027 s (75.08% GC) mean time: 18.027 s (75.08% GC) maximum time: 18.027 s (75.08% GC) -------------- samples: 1 evals/sample: 1
function laplacian2d_plain!(v, u, m, n, i, j)
v[i,j] =
u[ifelse(i+1 ≤ m, i+1, 1), j] + u[ifelse(i-1 ≥ 1, i-1, m), j] +
u[i, ifelse(j+1 ≤ n, j+1, 1)] + u[i, ifelse(j-1 ≥ 1, j-1, n)] - 4u[i,j]
end
n = 1000
srand(seed)
u = [rand(3) for i in 1:n, j in 1:n]
v = [Array{Float64,1}(3) for i in 1:n, j in 1:n]
@benchmark test!(v, u, laplacian2d_plain!)
BenchmarkTools.Trial: memory estimate: 5.22 GiB allocs estimate: 50000010 -------------- minimum time: 21.955 s (73.55% GC) median time: 21.955 s (73.55% GC) mean time: 21.955 s (73.55% GC) maximum time: 21.955 s (73.55% GC) -------------- samples: 1 evals/sample: 1
スカラーの2次元配列の離散ラプラシアンでは上のような問題は発生しない.
n = 1000
srand(seed)
u = rand(n,n)
v = Array{Float64,2}(n,n)
@benchmark test!(v, u, laplacian2d_plain!)
BenchmarkTools.Trial: memory estimate: 320 bytes allocs estimate: 10 -------------- minimum time: 254.211 ms (0.00% GC) median time: 265.840 ms (0.00% GC) mean time: 266.084 ms (0.00% GC) maximum time: 273.868 ms (0.00% GC) -------------- samples: 19 evals/sample: 1
ラプラシアンが誤りなく定義できているかの確認.
function plot2pcolormesh(u, v)
figure(figsize=(8,3))
fig = subplot(121)
pcolormesh(x, y, u, cmap=cmap)
colorbar()
fig[:set_aspect]("equal")
fig = subplot(122)
pcolormesh(x, y, v, cmap=cmap)
colorbar()
fig[:set_aspect]("equal")
tight_layout()
end
f(x, y) = exp(-x^2-y^2)
g(x, y) = x^2 - y^2
h(x, y) = log(x^2+y^2)
F(x, y) = [f(x,y), g(x,y), h(x,y)]
r = 3
n = 1000
1000
x = y = linspace(-5,5,n)
u = F.(x, y')
U = [u[j,k][i] for i in 1:r, j in 1:n, k in 1:n]
V = Array{Float64,3}(r,n,n)
@time laplacian!(V, U, laplacian_atdot_inline!)
for i in 1:r
plot2pcolormesh(U[i,:,:], V[i,:,:])
end
0.104697 seconds (17 allocations: 1.063 KiB)
x = y = linspace(-5,5,n)
u = F.(x, y')
v = [Array{Float64,1}(r) for i in 1:n, j in 1:n]
@time laplacian!(v, u, laplacian2d_atdot_inline!)
U = [u[j,k][i] for i in 1:r, j in 1:n, k in 1:n]
V = [v[j,k][i] for i in 1:r, j in 1:n, k in 1:n]
for i in 1:r
plot2pcolormesh(U[i,:,:], V[i,:,:])
end
0.169252 seconds (5 allocations: 192 bytes)
N = 10^8 について 2N 個の和で比較.
# 速い
N = 10^8
srand(seed)
a = rand(2N)
@benchmark sum(a)
BenchmarkTools.Trial: memory estimate: 16 bytes allocs estimate: 1 -------------- minimum time: 109.935 ms (0.00% GC) median time: 123.309 ms (0.00% GC) mean time: 124.122 ms (0.00% GC) maximum time: 150.062 ms (0.00% GC) -------------- samples: 41 evals/sample: 1
# 遅い
N = 10^8
srand(seed)
a = rand(N)
b = rand(N)
@benchmark sum(a+b)
BenchmarkTools.Trial: memory estimate: 762.94 MiB allocs estimate: 3 -------------- minimum time: 400.074 ms (0.12% GC) median time: 491.280 ms (12.60% GC) mean time: 488.464 ms (14.64% GC) maximum time: 573.180 ms (31.99% GC) -------------- samples: 11 evals/sample: 1
# 遅い
N = 10^8
srand(seed)
a = rand(N)
b = rand(N)
@benchmark sum(a.+b)
BenchmarkTools.Trial: memory estimate: 762.94 MiB allocs estimate: 28 -------------- minimum time: 369.929 ms (0.17% GC) median time: 444.258 ms (12.63% GC) mean time: 466.281 ms (14.97% GC) maximum time: 594.597 ms (32.51% GC) -------------- samples: 11 evals/sample: 1
# 速い
function mysum(a,b)
s = zero(eltype(a))
@fastmath @inbounds @simd for i in 1:endof(a)
s += a[i] + b[i]
end
s
end
N = 10^8
srand(seed)
a = rand(N)
b = rand(N)
@benchmark mysum(a,b)
BenchmarkTools.Trial: memory estimate: 16 bytes allocs estimate: 1 -------------- minimum time: 105.714 ms (0.00% GC) median time: 120.257 ms (0.00% GC) mean time: 119.414 ms (0.00% GC) maximum time: 152.791 ms (0.00% GC) -------------- samples: 42 evals/sample: 1
# 速い
macro sum(init, i, iter, f)
quote
begin
s = $(esc(init))
for $(esc(i)) in $(esc(iter))
s += $(esc(f))
end
s
end
end
end
function mysum_macro(a,b)
@sum(zero(eltype(a)), i, 1:endof(a), a[i]+b[i])
end
N = 10^8
srand(seed)
a = rand(N)
b = rand(N)
@benchmark mysum_macro(a,b)
BenchmarkTools.Trial: memory estimate: 16 bytes allocs estimate: 1 -------------- minimum time: 125.708 ms (0.00% GC) median time: 131.618 ms (0.00% GC) mean time: 131.698 ms (0.00% GC) maximum time: 141.348 ms (0.00% GC) -------------- samples: 38 evals/sample: 1
@macroexpand @sum(0, k, 1:10, k)
quote # In[18], line 5: begin # In[18], line 6: #73#s = 0 # In[18], line 7: for k = 1:10 # In[18], line 8: #73#s += k end # In[18], line 10: #73#s end end
# 速い
macro sum_fis(init, i, iter, f)
quote
begin
s = $(esc(init))
@fastmath @inbounds @simd for $(esc(i)) in $(esc(iter))
s += $(esc(f))
end
s
end
end
end
function mysum_macro_fis(a,b)
@sum(zero(eltype(a)), i, 1:endof(a), a[i]+b[i])
end
N = 10^8
srand(seed)
a = rand(N)
b = rand(N)
@benchmark mysum_macro_fis(a,b)
BenchmarkTools.Trial: memory estimate: 16 bytes allocs estimate: 1 -------------- minimum time: 119.703 ms (0.00% GC) median time: 128.583 ms (0.00% GC) mean time: 136.279 ms (0.00% GC) maximum time: 178.301 ms (0.00% GC) -------------- samples: 37 evals/sample: 1
@sum(0.0, i, 1:3, @sum(0, j, 1:2, 2.0^(i*j))), sum(2.0^(i*j) for i in 1:3 for j in 1:2)
(98.0, 98.0)
# 速い
function mysum_sum_of_f_itr(a,b)
sum(i->a[i]+b[i], 1:endof(a))
end
N = 10^8
srand(seed)
a = rand(N)
b = rand(N)
@benchmark mysum_sum_of_f_itr(a,b)
BenchmarkTools.Trial: memory estimate: 128 bytes allocs estimate: 5 -------------- minimum time: 122.815 ms (0.00% GC) median time: 155.412 ms (0.00% GC) mean time: 156.040 ms (0.00% GC) maximum time: 206.671 ms (0.00% GC) -------------- samples: 32 evals/sample: 1
# 速い
function mysum_reduce(a,b)
reduce(+, zero(eltype(a)), (a[i]+b[i] for i in 1:endof(a)))
end
N = 10^8
srand(seed)
a = rand(N)
b = rand(N)
@benchmark mysum_reduce(a,b)
BenchmarkTools.Trial: memory estimate: 176 bytes allocs estimate: 7 -------------- minimum time: 115.950 ms (0.00% GC) median time: 130.868 ms (0.00% GC) mean time: 133.665 ms (0.00% GC) maximum time: 178.645 ms (0.00% GC) -------------- samples: 38 evals/sample: 1