Julia言語で計算が遅くなった場合の解決法 v1.0版

黒木玄

2018-01-12~2018-09-07

Julia v0.6版は次の場所にある. 比較すると面白い.

基本文献:

普遍的な対処法:

  • @code_warntype マクロを使ってJulia言語による型推定が十分に成功しているかを確認し, もしも赤字の警告が出ていたならば, 「Julia言語は函数の引数の型から他の変数の型を推定しようとすること」を思い出して, 型推定ができないもしくは難しい書き方をしていないかに注意を払う.
  • @time マクロで計算時間だけではなく, メモリ消費量も確認する. メモリ消費量が異様に大きい場合にはどこかで「失敗」してしまっている可能性が高い. 配列に大量にアクセスするプログラムを書いているときには特に注意する.

関連の Jupyter notebook が以下の場所にあるが, それらは更新されない可能性が高いので注意して欲しい.

目次

In [1]:
versioninfo()
Julia Version 1.0.0
Commit 5d4eaca0c9 (2018-08-08 20:58 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, haswell)
Environment:
  JULIA_NUM_THREADS = 4
  JULIA_PKGDIR = C:\JuliaPkg

Julia言語のバージョンが上がるとこのノートの内容は無意味になっているかもしれないので注意!

以下は v0.6 では正しい解説とコメントがされているコードを v1.0 で実行した結果である. 解説・コメント部分は v1.0 では正しくない部分が結構あるので注意せよ.

In [2]:
# Pkg.add("BenchmarkTools")
#
using BenchmarkTools

# http://nbviewer.jupyter.org/gist/genkuroki/81de23edcae631a995e19a2ecf946a4f
# の第1.6.1節を見よ.
#
# ENV["PYTHON"]="C:\\Anaconda3\\python.exe" # ← 自分の環境の合わせて変える
# Pkg.add("PyPlot")
#
using PyPlot

# seed!(x)
using Random
const srand = Random.seed!

# lispace(start, stop, length)
linspace(start, stop, length) = range(start, stop=stop, length=length)

# mean(x), etc
using Statistics

# Diagonal, etc
using LinearAlgebra
eye(T::Type, n) = Diagonal{T}(I, n)
eye(n) = eye(Float64, n)
eye(T::Type, m, n) = Matrix{T}(I, m, n)
eye(m, n) = eye(Float64, m, n)

# endof
endof(x) = lastindex(x)
Out[2]:
endof (generic function with 1 method)

トップレベルにコードを書くと遅くなる

解決法: 函数の中にコードを書く.

テスト用にトップレベルに書いたコードを function main() と end で囲むだけで計算が大幅に速くなる.

Julia言語について, トップレベルに書いたコードで計算速度を計測することは 禁忌 である.

以下は微分方程式 $y'=iy$, $y(0)=1$ の解に関する $y(2\pi)$ の計算. 解は $y(t) = e^{it}$ なので, 計算結果は 1 に近くならなければいけない.

函数の中に入れるだけで大幅に速くなる.

In [3]:
# 非常に遅い.

N = 10^8
h = 2π/N
y = 1.0 + 0.0im           # ← y を複素数の 1 で初期化
@time for i in 1:N
    global y
    y += h*im*y           # Julia言語では虚数単位を im と書く
end
y
 20.221734 seconds (600.01 M allocations: 14.902 GiB, 7.41% gc time)
Out[3]:
1.0000001973920172 + 8.725669089740029e-15im
In [4]:
# 函数の中に入れるだけで大幅に速くなる.

function test_of_integration()
    N = 10^8
    h = 2π/N
    y = 1.0 + 0.0im           # ← y を複素数の 1 で初期化
    for i in 1:N
        y += h*im*y
    end
    y
end

test_of_integration()
@show @time test_of_integration()

# コードを函数の中に入れたら, 常に @code_warntype で確認するようにする.
# 注意が必要な部分は赤字で表示される.
println()
@code_warntype test_of_integration()
  0.472570 seconds (5 allocations: 192 bytes)
#= In[4]:14 =# @time(test_of_integration()) = 1.0000001973920172 + 8.725669089740029e-15im

Body::Complex{Float64}
│╻╷╷╷╷ literal_pow4  1 ──       (Base.ifelse)(true, 0, 0)
││╻     macro expansion   2 ┄─ %2  = φ (#1 => 10, #3 => %7)::Int64
│││┃│    ^   │    %3  = φ (#1 => 4, #3 => %4)::Int64
││││┃│    power_by_squaring   │    %4  = (Base.sub_int)(%3, 1)::Int64
│││││╻     >   │    %5  = (Base.slt_int)(0, %4)::Bool
│││││    └───       goto #4 if not %5
│││││╻     *   3 ── %7  = (Base.mul_int)(%2, %2)::Int64
│││││    └───       goto #2
│││││    4 ──       goto #5
││││     5 ──       goto #6
│││      6 ──       goto #7
│╻╷    *5  7 ──       (Base.mul_float)(2.0, 3.141592653589793)
││╻╷╷╷  promote   │    %13 = (Base.sitofp)(Float64, %2)::Float64
││╻     /   │    %14 = (Base.div_float)(6.283185307179586, %13)::Float64
│╻╷╷   *6  │          (Base.ifelse)(false, 0.0, 0.0)
││╻     *   │          (Base.ifelse)(true, 0.0, 0.0)
││╻     +   │    %17 = (Base.add_float)(1.0, 0.0)::Float64
│││╻     Type   │    %18 = %new(Complex{Float64}, %17, 0.0)::Complex{Float64}
│╻╷╷╷╷ Colon7  │    %19 = (Base.sle_int)(1, %2)::Bool
││╻     Type   │          (Base.sub_int)(%2, 1)
│││┃     unitrange_last   │    %21 = (Base.ifelse)(%19, %2, 0)::Int64
││╻╷╷   isempty   │    %22 = (Base.slt_int)(%21, 1)::Bool
││       └───       goto #9 if not %22
││       8 ──       goto #10
││       9 ──       goto #10
   10 ┄ %26 = φ (#8 => true, #9 => false)::Bool
   │    %27 = φ (#9 => 1)::Int64
   │    %28 = (Base.not_int)(%26)::Bool
   └───       goto #16 if not %28
   11 ┄ %30 = φ (#10 => %17, #15 => %47)::Float64
   │    %31 = φ (#10 => 0.0, #15 => %48)::Float64
   │    %32 = φ (#10 => 0.0, #15 => %48)::Float64
   │    %33 = φ (#10 => %17, #15 => %47)::Float64
   │    %34 = φ (#10 => %17, #15 => %47)::Float64
   │    %35 = φ (#10 => 0.0, #15 => %48)::Float64
   │    %36 = φ (#10 => %27, #15 => %55)::Int64
│╻╷╷╷╷ *8  │    %37 = (Base.copysign_float)(0.0, %14)::Float64
││┃││   *   │    %38 = (Base.ifelse)(false, %14, %37)::Float64
│││╻╷    *   │    %39 = (Base.copysign_float)(0.0, %14)::Float64
││││┃     *   │    %40 = (Base.ifelse)(true, %14, %39)::Float64
│││╻     *   │    %41 = (Base.mul_float)(%38, %30)::Float64
││││     │    %42 = (Base.mul_float)(%40, %31)::Float64
│││╻     -   │    %43 = (Base.sub_float)(%41, %42)::Float64
│││╻     *   │    %44 = (Base.mul_float)(%38, %32)::Float64
││││     │    %45 = (Base.mul_float)(%40, %33)::Float64
│││╻     +   │    %46 = (Base.add_float)(%44, %45)::Float64
││╻     +   │    %47 = (Base.add_float)(%34, %43)::Float64
│││      │    %48 = (Base.add_float)(%35, %46)::Float64
│││╻     Type   │    %49 = %new(Complex{Float64}, %47, %48)::Complex{Float64}
││╻     ==   │    %50 = (%36 === %21)::Bool
││       └───       goto #13 if not %50
││       12 ─       goto #14
││╻     +   13 ─ %53 = (Base.add_int)(%36, 1)::Int64
│╻     iterate   └───       goto #14
   14 ┄ %55 = φ (#13 => %53)::Int64
   │    %56 = φ (#12 => true, #13 => false)::Bool
   │    %57 = (Base.not_int)(%56)::Bool
   └───       goto #16 if not %57
   15 ─       goto #11
10 16 ─ %60 = φ (#14 => %49, #10 => %18)::Complex{Float64}
   └───       return %60

変数の型が途中で変わるような書き方をすると遅くなる場合がある

解決法: 変数の型が途中で変化してしまうような書き方をしない.

Julia言語がある程度面倒を見てくれるので, 徹底した神経質さは必要ない.

不安な場合には @code_warntype マクロで赤字の警告が出ないかどうか確認すればよいだろう.

以下の2つのセルを見ればわかるように, y = 1.0 + 0.0imy を複素数を代入するべきところを, y = 1 と整数を代入したり, y = 1.0 と実数(浮動小数点数)を代入したりすると遅くなる. 特に新たに変数を確保(初期化)するきには注意した方がよい. (Julia言語がある程度面倒を見てくれるので, 完璧に神経質になる必要はない.)

スカラー変数の初期化

In [5]:
# y = 1.0 + 0.0im を y = 1 に書き換えただけで大幅に遅くなる.

function test_of_integration_slow1()
    N = 10^8
    h = 2π/N                  # ← 厳密には h = Complex{Float64}(2π/N) と書くべきだがそうする必要はない.
    y = 1                      # ← y = 1 は y を整数 1 にするという意味. y = 1.0 + 0.0im と書くべき.
    for i in 1:N
        y += h*im*y            # ← y は整数変数だったはずなのに複素数を足している.
    end
    y
end

test_of_integration_slow1()
@show @time test_of_integration_slow1()

# y が複素変数なのか整数変数なのかわからない状態になってしまっている.
println()
@code_warntype test_of_integration_slow1()
  0.426350 seconds (5 allocations: 192 bytes)
#= In[5]:14 =# @time(test_of_integration_slow1()) = 1.0000001973920172 + 8.725669089740029e-15im

Body::Union{Complex{Float64}, Int64}
│╻╷╷╷╷   literal_pow4  1 ──       (Base.ifelse)(true, 0, 0)
││╻       macro expansion   2 ┄─ %2  = φ (#1 => 10, #3 => %7)::Int64
│││┃│      ^   │    %3  = φ (#1 => 4, #3 => %4)::Int64
││││┃│      power_by_squaring   │    %4  = (Base.sub_int)(%3, 1)::Int64
│││││╻       >   │    %5  = (Base.slt_int)(0, %4)::Bool
│││││      └───       goto #4 if not %5
│││││╻       *   3 ── %7  = (Base.mul_int)(%2, %2)::Int64
│││││      └───       goto #2
│││││      4 ──       goto #5
││││       5 ──       goto #6
│││        6 ──       goto #7
│╻╷      *5  7 ──       (Base.mul_float)(2.0, 3.141592653589793)
││╻╷╷╷    promote   │    %13 = (Base.sitofp)(Float64, %2)::Float64
││╻       /   │    %14 = (Base.div_float)(6.283185307179586, %13)::Float64
│╻╷╷╷╷   Colon7  │    %15 = (Base.sle_int)(1, %2)::Bool
││╻       Type   │          (Base.sub_int)(%2, 1)
│││┃       unitrange_last   │    %17 = (Base.ifelse)(%15, %2, 0)::Int64
││╻╷╷     isempty   │    %18 = (Base.slt_int)(%17, 1)::Bool
││         └───       goto #9 if not %18
││         8 ──       goto #10
││         9 ──       goto #10
   10 ┄ %22 = φ (#8 => true, #9 => false)::Bool
   │    %23 = φ (#9 => 1)::Int64
   │    %24 = (Base.not_int)(%22)::Bool
   └───       goto #26 if not %24
   11 ┄ %26 = φ (#25 => %80)::Float64
   │    %27 = φ (#25 => %81)::Float64
   │    %28 = φ (#25 => %82)::Float64
   │    %29 = φ (#25 => %83)::Float64
   │    %30 = φ (#25 => %84)::Float64
   │    %31 = φ (#25 => %85)::Float64
   │    %32 = φ (#10 => 1, #25 => %86)::Union{Complex{Float64}, Int64}
   │    %33 = φ (#10 => %23, #25 => %92)::Int64
8  │    %34 = (isa)(%32, Complex{Float64})::Bool
   └───       goto #13 if not %34
││╻╷╷╷    *   12 ─ %36 = (Base.copysign_float)(0.0, %14)::Float64
│││┃│      *   │    %37 = (Base.ifelse)(false, %14, %36)::Float64
││││╻╷      *   │    %38 = (Base.copysign_float)(0.0, %14)::Float64
│││││      │    %39 = (Base.ifelse)(true, %14, %38)::Float64
│││╻       *   │    %40 = (Base.mul_float)(%37, %26)::Float64
││││       │    %41 = (Base.mul_float)(%39, %27)::Float64
│││╻       -   │    %42 = (Base.sub_float)(%40, %41)::Float64
│││╻       *   │    %43 = (Base.mul_float)(%37, %28)::Float64
││││       │    %44 = (Base.mul_float)(%39, %29)::Float64
│││╻       +   │    %45 = (Base.add_float)(%43, %44)::Float64
   └───       goto #16
   13 ─ %47 = (isa)(%32, Int64)::Bool
   └───       goto #15 if not %47
   14 ─ %49 = π (%32, Int64)
││╻╷╷╷    *   │    %50 = (Base.copysign_float)(0.0, %14)::Float64
│││┃│      *   │    %51 = (Base.ifelse)(false, %14, %50)::Float64
││││╻╷      *   │    %52 = (Base.copysign_float)(0.0, %14)::Float64
│││││      │    %53 = (Base.ifelse)(true, %14, %52)::Float64
│││╻╷╷╷╷   *   │    %54 = (Base.sitofp)(Float64, %49)::Float64
││││╻       *   │    %55 = (Base.mul_float)(%54, %51)::Float64
││││╻╷╷╷    promote   │    %56 = (Base.sitofp)(Float64, %49)::Float64
││││╻       *   │    %57 = (Base.mul_float)(%56, %53)::Float64
   └───       goto #16
   15 ─       (Core.throw)(ErrorException("fatal error in type inference (type bound)"))
   └───       $(Expr(:unreachable))
   16 ┄ %61 = φ (#12 => %42, #14 => %55)::Float64
   │    %62 = φ (#12 => %45, #14 => %57)::Float64
   │    %63 = φ (#12 => %42, #14 => %55)::Float64
   │    %64 = φ (#12 => %45, #14 => %57)::Float64
   │    %65 = (isa)(%32, Complex{Float64})::Bool
   └───       goto #18 if not %65
││╻       +   17 ─ %67 = (Base.add_float)(%30, %61)::Float64
│││        │    %68 = (Base.add_float)(%31, %62)::Float64
│││╻       Type   │    %69 = %new(Complex{Float64}, %67, %68)::Complex{Float64}
   └───       goto #21
   18 ─ %71 = (isa)(%32, Int64)::Bool
   └───       goto #20 if not %71
   19 ─ %73 = π (%32, Int64)
││╻╷╷╷╷   +   │    %74 = (Base.sitofp)(Float64, %73)::Float64
│││╻       +   │    %75 = (Base.add_float)(%74, %63)::Float64
│││╻       Type   │    %76 = %new(Complex{Float64}, %75, %64)::Complex{Float64}
   └───       goto #21
   20 ─       (Core.throw)(ErrorException("fatal error in type inference (type bound)"))
   └───       $(Expr(:unreachable))
   21 ┄ %80 = φ (#17 => %67, #19 => %75)::Float64
   │    %81 = φ (#17 => %68, #19 => %64)::Float64
   │    %82 = φ (#17 => %68, #19 => %64)::Float64
   │    %83 = φ (#17 => %67, #19 => %75)::Float64
   │    %84 = φ (#17 => %67, #19 => %75)::Float64
   │    %85 = φ (#17 => %68, #19 => %64)::Float64
   │    %86 = φ (#17 => %69, #19 => %76)::Complex{Float64}
││╻       ==   │    %87 = (%33 === %17)::Bool
││         └───       goto #23 if not %87
││         22 ─       goto #24
││╻       +   23 ─ %90 = (Base.add_int)(%33, 1)::Int64
│╻       iterate   └───       goto #24
   24 ┄ %92 = φ (#23 => %90)::Int64
   │    %93 = φ (#22 => true, #23 => false)::Bool
   │    %94 = (Base.not_int)(%93)::Bool
   └───       goto #26 if not %94
   25 ─       goto #11
10 26 ─ %97 = φ (#24 => %86, #10 => 1)::Union{Complex{Float64}, Int64}
   └───       return %97
In [6]:
# y = 1.0 でも同様の理由で遅い.

function test_of_integration_slow2()
    N = 10^8
    h = 2π/N                   # ← 厳密には h = Complex{Float64}(2π/N) と書くべきだがそうする必要はない.
    y = 1.0                    # ← y = 1.0 は y を浮動小数点数(実数)の 1 にするという意味. y = 1.0 + 0.0im と書くべき.
    for i in 1:N
        y += h*im*y            # ← y は実変数だったはずなのに複素数を足している.
    end
    y
end

test_of_integration_slow2()
@show @time test_of_integration_slow2()

# y が複素変数なのか実変数なのかわからない状態になってしまっている.
println()
@code_warntype test_of_integration_slow2()
  0.393946 seconds (5 allocations: 192 bytes)
#= In[6]:14 =# @time(test_of_integration_slow2()) = 1.0000001973920172 + 8.725669089740029e-15im

Body::Union{Complex{Float64}, Float64}
│╻╷╷╷╷ literal_pow4  1 ──       (Base.ifelse)(true, 0, 0)
││╻     macro expansion   2 ┄─ %2  = φ (#1 => 10, #3 => %7)::Int64
│││┃│    ^   │    %3  = φ (#1 => 4, #3 => %4)::Int64
││││┃│    power_by_squaring   │    %4  = (Base.sub_int)(%3, 1)::Int64
│││││╻     >   │    %5  = (Base.slt_int)(0, %4)::Bool
│││││    └───       goto #4 if not %5
│││││╻     *   3 ── %7  = (Base.mul_int)(%2, %2)::Int64
│││││    └───       goto #2
│││││    4 ──       goto #5
││││     5 ──       goto #6
│││      6 ──       goto #7
│╻╷    *5  7 ──       (Base.mul_float)(2.0, 3.141592653589793)
││╻╷╷╷  promote   │    %13 = (Base.sitofp)(Float64, %2)::Float64
││╻     /   │    %14 = (Base.div_float)(6.283185307179586, %13)::Float64
│╻╷╷╷╷ Colon7  │    %15 = (Base.sle_int)(1, %2)::Bool
││╻     Type   │          (Base.sub_int)(%2, 1)
│││┃     unitrange_last   │    %17 = (Base.ifelse)(%15, %2, 0)::Int64
││╻╷╷   isempty   │    %18 = (Base.slt_int)(%17, 1)::Bool
││       └───       goto #9 if not %18
││       8 ──       goto #10
││       9 ──       goto #10
   10 ┄ %22 = φ (#8 => true, #9 => false)::Bool
   │    %23 = φ (#9 => 1)::Int64
   │    %24 = (Base.not_int)(%22)::Bool
   └───       goto #26 if not %24
   11 ┄ %26 = φ (#25 => %77)::Float64
   │    %27 = φ (#25 => %78)::Float64
   │    %28 = φ (#25 => %79)::Float64
   │    %29 = φ (#25 => %80)::Float64
   │    %30 = φ (#25 => %81)::Float64
   │    %31 = φ (#25 => %82)::Float64
   │    %32 = φ (#10 => 1.0, #25 => %83)::Union{Complex{Float64}, Float64}
   │    %33 = φ (#10 => %23, #25 => %89)::Int64
8  │    %34 = (isa)(%32, Complex{Float64})::Bool
   └───       goto #13 if not %34
││╻╷╷╷  *   12 ─ %36 = (Base.copysign_float)(0.0, %14)::Float64
│││┃│    *   │    %37 = (Base.ifelse)(false, %14, %36)::Float64
││││╻╷    *   │    %38 = (Base.copysign_float)(0.0, %14)::Float64
│││││    │    %39 = (Base.ifelse)(true, %14, %38)::Float64
│││╻     *   │    %40 = (Base.mul_float)(%37, %26)::Float64
││││     │    %41 = (Base.mul_float)(%39, %27)::Float64
│││╻     -   │    %42 = (Base.sub_float)(%40, %41)::Float64
│││╻     *   │    %43 = (Base.mul_float)(%37, %28)::Float64
││││     │    %44 = (Base.mul_float)(%39, %29)::Float64
│││╻     +   │    %45 = (Base.add_float)(%43, %44)::Float64
   └───       goto #16
   13 ─ %47 = (isa)(%32, Float64)::Bool
   └───       goto #15 if not %47
   14 ─ %49 = π (%32, Float64)
││╻╷╷╷  *   │    %50 = (Base.copysign_float)(0.0, %14)::Float64
│││┃│    *   │    %51 = (Base.ifelse)(false, %14, %50)::Float64
││││╻╷    *   │    %52 = (Base.copysign_float)(0.0, %14)::Float64
│││││    │    %53 = (Base.ifelse)(true, %14, %52)::Float64
│││╻     *   │    %54 = (Base.mul_float)(%49, %51)::Float64
││││     │    %55 = (Base.mul_float)(%49, %53)::Float64
   └───       goto #16
   15 ─       (Core.throw)(ErrorException("fatal error in type inference (type bound)"))
   └───       $(Expr(:unreachable))
   16 ┄ %59 = φ (#12 => %42, #14 => %54)::Float64
   │    %60 = φ (#12 => %45, #14 => %55)::Float64
   │    %61 = φ (#12 => %42, #14 => %54)::Float64
   │    %62 = φ (#12 => %45, #14 => %55)::Float64
   │    %63 = (isa)(%32, Complex{Float64})::Bool
   └───       goto #18 if not %63
││╻     +   17 ─ %65 = (Base.add_float)(%30, %59)::Float64
│││      │    %66 = (Base.add_float)(%31, %60)::Float64
│││╻     Type   │    %67 = %new(Complex{Float64}, %65, %66)::Complex{Float64}
   └───       goto #21
   18 ─ %69 = (isa)(%32, Float64)::Bool
   └───       goto #20 if not %69
   19 ─ %71 = π (%32, Float64)
││╻     +   │    %72 = (Base.add_float)(%71, %61)::Float64
│││╻     Type   │    %73 = %new(Complex{Float64}, %72, %62)::Complex{Float64}
   └───       goto #21
   20 ─       (Core.throw)(ErrorException("fatal error in type inference (type bound)"))
   └───       $(Expr(:unreachable))
   21 ┄ %77 = φ (#17 => %65, #19 => %72)::Float64
   │    %78 = φ (#17 => %66, #19 => %62)::Float64
   │    %79 = φ (#17 => %66, #19 => %62)::Float64
   │    %80 = φ (#17 => %65, #19 => %72)::Float64
   │    %81 = φ (#17 => %65, #19 => %72)::Float64
   │    %82 = φ (#17 => %66, #19 => %62)::Float64
   │    %83 = φ (#17 => %67, #19 => %73)::Complex{Float64}
││╻     ==   │    %84 = (%33 === %17)::Bool
││       └───       goto #23 if not %84
││       22 ─       goto #24
││╻     +   23 ─ %87 = (Base.add_int)(%33, 1)::Int64
│╻     iterate   └───       goto #24
   24 ┄ %89 = φ (#23 => %87)::Int64
   │    %90 = φ (#22 => true, #23 => false)::Bool
   │    %91 = (Base.not_int)(%90)::Bool
   └───       goto #26 if not %91
   25 ─       goto #11
10 26 ─ %94 = φ (#24 => %83, #10 => 1.0)::Union{Complex{Float64}, Float64}
   └───       return %94

配列変数の初期化

次のセルとその次のセルを比較してみればわかるように, y を複素数の要素を持つ配列として確保しているならば, y[1] = 1 と初期値を設定しても遅くならない.

In [7]:
function test_of_integration_array0()
    N = 10^8
    h = 2π/N                        # ← 厳密には h = Complex{Float64}(2π/N) と書きたくなるがその必要はない
    y = Array{Complex{Float64}}(undef, N+1)
    y[1] = 1.0 + 0.0im
    for i in 1:N
        y[i+1] = (1 + h*im)*y[i]    # ← 厳密には 1 を one(Complex{Float64}) と書きたくなるがその必要はない
    end
    y[N+1]
end

test_of_integration_array0()
@show @time test_of_integration_array0()

println()
@code_warntype test_of_integration_array0()
  0.867407 seconds (7 allocations: 1.490 GiB, 21.55% gc time)
#= In[7]:13 =# @time(test_of_integration_array0()) = 1.0000001973920172 + 8.725669089740029e-15im

Body::Complex{Float64}
│╻╷╷╷╷ literal_pow2 1 ──       (Base.ifelse)(true, 0, 0)
││╻     macro expansion  2 ┄─ %2  = φ (#1 => 10, #3 => %7)::Int64
│││┃│    ^  │    %3  = φ (#1 => 4, #3 => %4)::Int64
││││┃│    power_by_squaring  │    %4  = (Base.sub_int)(%3, 1)::Int64
│││││╻     >  │    %5  = (Base.slt_int)(0, %4)::Bool
│││││   └───       goto #4 if not %5
│││││╻     *  3 ── %7  = (Base.mul_int)(%2, %2)::Int64
│││││   └───       goto #2
│││││   4 ──       goto #5
││││    5 ──       goto #6
│││     6 ──       goto #7
│╻╷    *3 7 ──       (Base.mul_float)(2.0, 3.141592653589793)
││╻╷╷╷  promote  │    %13 = (Base.sitofp)(Float64, %2)::Float64
││╻     /  │    %14 = (Base.div_float)(6.283185307179586, %13)::Float64
│╻     +4 │    %15 = (Base.add_int)(%2, 1)::Int64
││╻     Type  │    %16 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Complex{Float64},1}, svec(Any, Int64), :(:ccall), 2, Array{Complex{Float64},1}, :(%15), :(%15)))::Array{Complex{Float64},1}
│╻╷╷   *5 │          (Base.ifelse)(false, 0.0, 0.0)
││╻     *  │          (Base.ifelse)(true, 0.0, 0.0)
││╻     +  │    %19 = (Base.add_float)(1.0, 0.0)::Float64
│││╻     Type  │    + = %new(Complex{Float64}, %19, 0.0)::Complex{Float64}
│╻     setindex!  │          (Base.arrayset)(true, %16, +, 1)
│╻╷╷╷╷ Colon6 │    %22 = (Base.sle_int)(1, %2)::Bool
││╻     Type  │          (Base.sub_int)(%2, 1)
│││┃     unitrange_last  │    %24 = (Base.ifelse)(%22, %2, 0)::Int64
││╻╷╷   isempty  │    %25 = (Base.slt_int)(%24, 1)::Bool
││      └───       goto #9 if not %25
││      8 ──       goto #10
││      9 ──       goto #10
  10 ┄ %29 = φ (#8 => true, #9 => false)::Bool
  │    %30 = φ (#9 => 1)::Int64
  │    %31 = φ (#9 => 1)::Int64
  │    %32 = (Base.not_int)(%29)::Bool
  └───       goto #16 if not %32
  11 ┄ %34 = φ (#10 => %30, #15 => %60)::Int64
  │    %35 = φ (#10 => %31, #15 => %61)::Int64
│╻╷╷╷  *7 │    %36 = (Base.copysign_float)(0.0, %14)::Float64
││┃│    *  │    %37 = (Base.ifelse)(false, %14, %36)::Float64
│││╻╷    *  │    %38 = (Base.copysign_float)(0.0, %14)::Float64
││││    │    %39 = (Base.ifelse)(true, %14, %38)::Float64
││╻╷    +  │    %40 = (Base.add_float)(1.0, %37)::Float64
│╻     getindex  │    %41 = (Base.arrayref)(true, %16, %34)::Complex{Float64}
││╻╷    real  │    %42 = (Base.getfield)(%41, :re)::Float64
││╻     *  │    %43 = (Base.mul_float)(%40, %42)::Float64
│││╻     getproperty  │    %44 = (Base.getfield)(%41, :im)::Float64
││╻     *  │    %45 = (Base.mul_float)(%39, %44)::Float64
││╻     -  │    %46 = (Base.sub_float)(%43, %45)::Float64
│││╻     getproperty  │    %47 = (Base.getfield)(%41, :im)::Float64
││╻     *  │    %48 = (Base.mul_float)(%40, %47)::Float64
│││╻     getproperty  │    %49 = (Base.getfield)(%41, :re)::Float64
││╻     *  │    %50 = (Base.mul_float)(%39, %49)::Float64
││╻     +  │    %51 = (Base.add_float)(%48, %50)::Float64
│││╻     Type  │    %52 = %new(Complex{Float64}, %46, %51)::Complex{Float64}
│╻     +  │    %53 = (Base.add_int)(%34, 1)::Int64
│╻     setindex!  │          (Base.arrayset)(true, %16, %52, %53)
││╻     ==  │    %55 = (%35 === %24)::Bool
││      └───       goto #13 if not %55
││      12 ─       goto #14
││╻     +  13 ─ %58 = (Base.add_int)(%35, 1)::Int64
│╻     iterate  └───       goto #14
  14 ┄ %60 = φ (#13 => %58)::Int64
  │    %61 = φ (#13 => %58)::Int64
  │    %62 = φ (#12 => true, #13 => false)::Bool
  │    %63 = (Base.not_int)(%62)::Bool
  └───       goto #16 if not %63
  15 ─       goto #11
│╻     +9 16 ─ %66 = (Base.add_int)(%2, 1)::Int64
│╻     getindex  │    %67 = (Base.arrayref)(true, %16, %66)::Complex{Float64}
  └───       return %67
In [8]:
function test_of_integration_array1()
    N = 10^8
    h = 2π/N                        # ← 厳密には h = Complex{Float64}(2π)/N と書きたくなるがその必要はない
    y = Array{Complex{Float64}}(undef, N+1)
    y[1] = 1.0                       # ← 厳密には y[1] = 1.0 + 0.0im と書きたくなるがその必要はない.
    for i in 1:N
        y[i+1] = (1 + h*im)*y[i]     # ← 厳密には 1 を one(Complex{Float64}) と書きたくなるがその必要はない
    end
    y[N+1]
end

test_of_integration_array0()
@show @time test_of_integration_array1()

println()
@code_warntype test_of_integration_array1()
  0.977599 seconds (90.90 k allocations: 1.495 GiB, 13.38% gc time)
#= In[8]:13 =# @time(test_of_integration_array1()) = 1.0000001973920172 + 8.725669089740029e-15im

Body::Complex{Float64}
│╻╷╷╷╷ literal_pow2 1 ──       (Base.ifelse)(true, 0, 0)
││╻     macro expansion  2 ┄─ %2  = φ (#1 => 10, #3 => %7)::Int64
│││┃│    ^  │    %3  = φ (#1 => 4, #3 => %4)::Int64
││││┃│    power_by_squaring  │    %4  = (Base.sub_int)(%3, 1)::Int64
│││││╻     >  │    %5  = (Base.slt_int)(0, %4)::Bool
│││││   └───       goto #4 if not %5
│││││╻     *  3 ── %7  = (Base.mul_int)(%2, %2)::Int64
│││││   └───       goto #2
│││││   4 ──       goto #5
││││    5 ──       goto #6
│││     6 ──       goto #7
│╻╷    *3 7 ──       (Base.mul_float)(2.0, 3.141592653589793)
││╻╷╷╷  promote  │    %13 = (Base.sitofp)(Float64, %2)::Float64
││╻     /  │    %14 = (Base.div_float)(6.283185307179586, %13)::Float64
│╻     +4 │    %15 = (Base.add_int)(%2, 1)::Int64
││╻     Type  │    %16 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Complex{Float64},1}, svec(Any, Int64), :(:ccall), 2, Array{Complex{Float64},1}, :(%15), :(%15)))::Array{Complex{Float64},1}
│╻╷╷╷  setindex!5 │    %17 = %new(Complex{Float64}, 1.0, 0.0)::Complex{Float64}
││      │          (Base.arrayset)(true, %16, %17, 1)
│╻╷╷╷╷ Colon6 │    %19 = (Base.sle_int)(1, %2)::Bool
││╻     Type  │          (Base.sub_int)(%2, 1)
│││┃     unitrange_last  │    %21 = (Base.ifelse)(%19, %2, 0)::Int64
││╻╷╷   isempty  │    %22 = (Base.slt_int)(%21, 1)::Bool
││      └───       goto #9 if not %22
││      8 ──       goto #10
││      9 ──       goto #10
  10 ┄ %26 = φ (#8 => true, #9 => false)::Bool
  │    %27 = φ (#9 => 1)::Int64
  │    %28 = φ (#9 => 1)::Int64
  │    %29 = (Base.not_int)(%26)::Bool
  └───       goto #16 if not %29
  11 ┄ %31 = φ (#10 => %27, #15 => %57)::Int64
  │    %32 = φ (#10 => %28, #15 => %58)::Int64
│╻╷╷╷  *7 │    %33 = (Base.copysign_float)(0.0, %14)::Float64
││┃│    *  │    %34 = (Base.ifelse)(false, %14, %33)::Float64
│││╻╷    *  │    %35 = (Base.copysign_float)(0.0, %14)::Float64
││││    │    %36 = (Base.ifelse)(true, %14, %35)::Float64
││╻╷    +  │    %37 = (Base.add_float)(1.0, %34)::Float64
│╻     getindex  │    %38 = (Base.arrayref)(true, %16, %31)::Complex{Float64}
││╻╷    real  │    %39 = (Base.getfield)(%38, :re)::Float64
││╻     *  │    %40 = (Base.mul_float)(%37, %39)::Float64
│││╻     getproperty  │    %41 = (Base.getfield)(%38, :im)::Float64
││╻     *  │    %42 = (Base.mul_float)(%36, %41)::Float64
││╻     -  │    %43 = (Base.sub_float)(%40, %42)::Float64
│││╻     getproperty  │    %44 = (Base.getfield)(%38, :im)::Float64
││╻     *  │    %45 = (Base.mul_float)(%37, %44)::Float64
│││╻     getproperty  │    %46 = (Base.getfield)(%38, :re)::Float64
││╻     *  │    %47 = (Base.mul_float)(%36, %46)::Float64
││╻     +  │    %48 = (Base.add_float)(%45, %47)::Float64
│││╻     Type  │    %49 = %new(Complex{Float64}, %43, %48)::Complex{Float64}
│╻     +  │    %50 = (Base.add_int)(%31, 1)::Int64
│╻     setindex!  │          (Base.arrayset)(true, %16, %49, %50)
││╻     ==  │    %52 = (%32 === %21)::Bool
││      └───       goto #13 if not %52
││      12 ─       goto #14
││╻     +  13 ─ %55 = (Base.add_int)(%32, 1)::Int64
│╻     iterate  └───       goto #14
  14 ┄ %57 = φ (#13 => %55)::Int64
  │    %58 = φ (#13 => %55)::Int64
  │    %59 = φ (#12 => true, #13 => false)::Bool
  │    %60 = (Base.not_int)(%59)::Bool
  └───       goto #16 if not %60
  15 ─       goto #11
│╻     +9 16 ─ %63 = (Base.add_int)(%2, 1)::Int64
│╻     getindex  │    %64 = (Base.arrayref)(true, %16, %63)::Complex{Float64}
  └───       return %64
In [9]:
function test_of_integration_array2()
    N = 10^8
    h = 2π/N                        # ← 厳密には h = Complex{Float64}(2π/N) と書きたくなるがその必要はない
    y = Array{Complex{Float64}}(undef, N+1)
    y[1] = 1                         # ← 厳密には y[1] = 1.0 + 0.0im と書きたくなるがその必要はない.
    for i in 1:N
        y[i+1] = (1 + h*im)*y[i]    # ← 厳密には 1 を one(Complex{Float64}) と書きたくなるがその必要はない
    end
    y[N+1]
end

test_of_integration_array0()
@show @time test_of_integration_array2()

println()
@code_warntype test_of_integration_array2()
  0.925477 seconds (91.22 k allocations: 1.495 GiB, 16.76% gc time)
#= In[9]:13 =# @time(test_of_integration_array2()) = 1.0000001973920172 + 8.725669089740029e-15im

Body::Complex{Float64}
│╻╷╷╷╷ literal_pow2 1 ──       (Base.ifelse)(true, 0, 0)
││╻     macro expansion  2 ┄─ %2  = φ (#1 => 10, #3 => %7)::Int64
│││┃│    ^  │    %3  = φ (#1 => 4, #3 => %4)::Int64
││││┃│    power_by_squaring  │    %4  = (Base.sub_int)(%3, 1)::Int64
│││││╻     >  │    %5  = (Base.slt_int)(0, %4)::Bool
│││││   └───       goto #4 if not %5
│││││╻     *  3 ── %7  = (Base.mul_int)(%2, %2)::Int64
│││││   └───       goto #2
│││││   4 ──       goto #5
││││    5 ──       goto #6
│││     6 ──       goto #7
│╻╷    *3 7 ──       (Base.mul_float)(2.0, 3.141592653589793)
││╻╷╷╷  promote  │    %13 = (Base.sitofp)(Float64, %2)::Float64
││╻     /  │    %14 = (Base.div_float)(6.283185307179586, %13)::Float64
│╻     +4 │    %15 = (Base.add_int)(%2, 1)::Int64
││╻     Type  │    %16 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Complex{Float64},1}, svec(Any, Int64), :(:ccall), 2, Array{Complex{Float64},1}, :(%15), :(%15)))::Array{Complex{Float64},1}
│╻╷╷╷  setindex!5 │    %17 = %new(Complex{Float64}, 1.0, 0.0)::Complex{Float64}
││      │          (Base.arrayset)(true, %16, %17, 1)
│╻╷╷╷╷ Colon6 │    %19 = (Base.sle_int)(1, %2)::Bool
││╻     Type  │          (Base.sub_int)(%2, 1)
│││┃     unitrange_last  │    %21 = (Base.ifelse)(%19, %2, 0)::Int64
││╻╷╷   isempty  │    %22 = (Base.slt_int)(%21, 1)::Bool
││      └───       goto #9 if not %22
││      8 ──       goto #10
││      9 ──       goto #10
  10 ┄ %26 = φ (#8 => true, #9 => false)::Bool
  │    %27 = φ (#9 => 1)::Int64
  │    %28 = φ (#9 => 1)::Int64
  │    %29 = (Base.not_int)(%26)::Bool
  └───       goto #16 if not %29
  11 ┄ %31 = φ (#10 => %27, #15 => %57)::Int64
  │    %32 = φ (#10 => %28, #15 => %58)::Int64
│╻╷╷╷  *7 │    %33 = (Base.copysign_float)(0.0, %14)::Float64
││┃│    *  │    %34 = (Base.ifelse)(false, %14, %33)::Float64
│││╻╷    *  │    %35 = (Base.copysign_float)(0.0, %14)::Float64
││││    │    %36 = (Base.ifelse)(true, %14, %35)::Float64
││╻╷    +  │    %37 = (Base.add_float)(1.0, %34)::Float64
│╻     getindex  │    %38 = (Base.arrayref)(true, %16, %31)::Complex{Float64}
││╻╷    real  │    %39 = (Base.getfield)(%38, :re)::Float64
││╻     *  │    %40 = (Base.mul_float)(%37, %39)::Float64
│││╻     getproperty  │    %41 = (Base.getfield)(%38, :im)::Float64
││╻     *  │    %42 = (Base.mul_float)(%36, %41)::Float64
││╻     -  │    %43 = (Base.sub_float)(%40, %42)::Float64
│││╻     getproperty  │    %44 = (Base.getfield)(%38, :im)::Float64
││╻     *  │    %45 = (Base.mul_float)(%37, %44)::Float64
│││╻     getproperty  │    %46 = (Base.getfield)(%38, :re)::Float64
││╻     *  │    %47 = (Base.mul_float)(%36, %46)::Float64
││╻     +  │    %48 = (Base.add_float)(%45, %47)::Float64
│││╻     Type  │    %49 = %new(Complex{Float64}, %43, %48)::Complex{Float64}
│╻     +  │    %50 = (Base.add_int)(%31, 1)::Int64
│╻     setindex!  │          (Base.arrayset)(true, %16, %49, %50)
││╻     ==  │    %52 = (%32 === %21)::Bool
││      └───       goto #13 if not %52
││      12 ─       goto #14
││╻     +  13 ─ %55 = (Base.add_int)(%32, 1)::Int64
│╻     iterate  └───       goto #14
  14 ┄ %57 = φ (#13 => %55)::Int64
  │    %58 = φ (#13 => %55)::Int64
  │    %59 = φ (#12 => true, #13 => false)::Bool
  │    %60 = (Base.not_int)(%59)::Bool
  └───       goto #16 if not %60
  15 ─       goto #11
│╻     +9 16 ─ %63 = (Base.add_int)(%2, 1)::Int64
│╻     getindex  │    %64 = (Base.arrayref)(true, %16, %63)::Complex{Float64}
  └───       return %64

型指定をしていないことが原因で遅くなることがある.

例えば, 型指定をしていない配列 aa = [] と確保して, apush! で要素を追加して行くコードを書くと遅くなる.

In [10]:
# 非常に遅い.

L = 10^6
@show a = []
@benchmark for i in 1:L
    push!(a, rand())
end
a = [] = Any[]
Out[10]:
BenchmarkTools.Trial: 
  memory estimate:  76.28 MiB
  allocs estimate:  3998979
  --------------
  minimum time:     148.075 ms (0.00% GC)
  median time:      221.872 ms (0.00% GC)
  mean time:        489.066 ms (58.77% GC)
  maximum time:     1.313 s (87.61% GC)
  --------------
  samples:          11
  evals/sample:     1
In [11]:
# 函数の中に入れてもかなり遅い.

function test_of_push_Any_array(; L = 10^6)
    a = []
    for i in 1:L
        push!(a, rand())
    end
end

@benchmark test_of_push_Any_array()
Out[11]:
BenchmarkTools.Trial: 
  memory estimate:  24.26 MiB
  allocs estimate:  1000021
  --------------
  minimum time:     95.301 ms (58.99% GC)
  median time:      109.740 ms (58.93% GC)
  mean time:        114.553 ms (60.91% GC)
  maximum time:     168.206 ms (70.26% GC)
  --------------
  samples:          44
  evals/sample:     1
In [12]:
# a を Float64 型の要素を持つ配列に型指定するだけでかなり速くなる.

function test_of_push_Float64_array(; L = 10^6)
    a = Float64[]
    for i in 1:L
        push!(a, rand()) # ← rand() の値は Float64 型
    end
    return a
end

@benchmark a = test_of_push_Float64_array()
Out[12]:
BenchmarkTools.Trial: 
  memory estimate:  9.00 MiB
  allocs estimate:  21
  --------------
  minimum time:     11.321 ms (0.00% GC)
  median time:      14.032 ms (0.00% GC)
  mean time:        16.019 ms (6.08% GC)
  maximum time:     34.834 ms (26.66% GC)
  --------------
  samples:          298
  evals/sample:     1
In [13]:
# push! の繰り返しよりも, まとめて配列を確保して代入した方が速い.

function test_of_Array_Float64(; L = 10^6)
    a = Array{Float64}(undef, L)
    for i in 1:L
        a[i] = rand()
    end
    return a
end

@benchmark a = test_of_Array_Float64()
Out[13]:
BenchmarkTools.Trial: 
  memory estimate:  7.63 MiB
  allocs estimate:  3
  --------------
  minimum time:     3.962 ms (0.00% GC)
  median time:      6.274 ms (0.00% GC)
  mean time:        7.001 ms (21.23% GC)
  maximum time:     23.209 ms (0.00% GC)
  --------------
  samples:          714
  evals/sample:     1
In [14]:
# 組み込み函数は非常に速い.

@benchmark a = rand(10^6)
Out[14]:
BenchmarkTools.Trial: 
  memory estimate:  7.63 MiB
  allocs estimate:  2
  --------------
  minimum time:     2.955 ms (0.00% GC)
  median time:      3.992 ms (0.00% GC)
  mean time:        4.996 ms (26.09% GC)
  maximum time:     15.419 ms (72.40% GC)
  --------------
  samples:          999
  evals/sample:     1

Julia特有の変数の初期化の仕方

T 型の引数 x の函数は値も T 型になって欲しいことが多いだろう. そして, そのとき, 計算で使われる変数も T 型であって欲しいことが多いだろう. そのような場合には, 函数内の変数を y = 1.0 のように初期化すると, yFloat64 型であると決め打ちすることになり好ましくない. 函数の引数の型に合わせて適切に変数を確保(初期化)した方がよい.

例えば以下のような方法が使われる:

  • y = zero(T) ← T型の0
  • y = zero(x) ← 値xと同じ型の0
  • y = one(T) ← T型の1
  • y = one(x) ← 値xと同じ型の1
  • y = similar(a) ← 配列aと同じ型の変数を作成
  • y = Array{eltype(a),2}(undef, m,n) ← 配列aの要素と同じ型を成分とするサイズ(m,n)の配列を作成
  • y = rand(T,m,n) ← T型の乱数で埋め尽くされたサイズ(m,n)の配列

他にも多彩な方法がある. 新たに変数を作成する場合には何型の変数になるかに注意を払うことは, Julia言語におけるプログラミングの基本の1つである. 函数内だけで型が決まらない引数の型は以下のようにして取得する:

  • type(x) ← xの型
  • eltype(a) ← 配列aの要素の型

例: y = one(h)

次のセルの函数では変数 yy = one(h) によって作成初期化されている. y の型は h の型と同じになる. y = one(x) としなかった理由は x が整数型の場合にも配慮したいからである. x が整数型のとき h = x/N は Float64 型になる.

In [15]:
function myexp(x; N=10^6)
    h = x/N
    y = one(h)        # h と同じ型の1で y を初期化.  y は h と同じ型の変数になる.
    for i in 1:N
        y += h*y
    end
    y
end
Out[15]:
myexp (generic function with 1 method)
In [16]:
@show x = 1
@show typeof(x)
@time y = myexp(x)
y, typeof(y)
x = 1 = 1
typeof(x) = Int64
  0.028577 seconds (50.13 k allocations: 2.776 MiB)
Out[16]:
(2.7182804693194718, Float64)
In [17]:
@show x = 1.0
@show typeof(x)
@time y = myexp(x)
y, typeof(y)
x = 1.0 = 1.0
typeof(x) = Float64
  0.017591 seconds (50.13 k allocations: 2.777 MiB)
Out[17]:
(2.7182804693194718, Float64)
In [18]:
@show x = Int8(1)
@show typeof(x)
@time y = myexp(x)
y, typeof(y)
x = Int8(1) = 1
typeof(x) = Int8
  0.019126 seconds (52.81 k allocations: 2.924 MiB)
Out[18]:
(2.7182804693194718, Float64)
In [19]:
@show x = log(BigFloat(2))
@show typeof(x)
@time y = myexp(x)
y, typeof(y)
x = log(BigFloat(2)) = 6.931471805599453094172321214581765680755001343602552541206800094933936219696955e-01
typeof(x) = BigFloat
  0.246938 seconds (4.10 M allocations: 218.903 MiB, 16.26% gc time)
Out[19]:
(1.999999519547265806834507659818847707783963397126589739047723711821894726917291, BigFloat)
In [20]:
@show x = log(2)
@show typeof(x)
@time y = myexp(x)
y, typeof(y)
x = log(2) = 0.6931471805599453
typeof(x) = Float64
  0.002728 seconds (5 allocations: 176 bytes)
Out[20]:
(1.9999995195471085, Float64)
In [21]:
@show x = log(Float32(2))
@show typeof(x)
@time y = myexp(x)
y, typeof(y)
x = log(Float32(2)) = 0.6931472f0
typeof(x) = Float32
  0.023357 seconds (67.05 k allocations: 3.720 MiB)
Out[21]:
(1.9964288f0, Float32)
In [22]:
@show x = 2*BigFloat(π)*im
@show typeof(x)
@time y = myexp(x)
y, typeof(y)
x = 2 * BigFloat(π) * im = 0.0 + 6.283185307179586476925286766559005768394338798750211641949889184615632812572396im
typeof(x) = Complex{BigFloat}
  0.883304 seconds (18.14 M allocations: 923.249 MiB, 17.05% gc time)
Out[22]:
(1.000019739403621252996352700802965665250765204184418369700029719552951878336357 - 8.268503659993478164569888347722451905981943288760811993512548549569722701649271e-11im, Complex{BigFloat})
In [23]:
@show x = 2π*im
@show typeof(x)
@time y = myexp(x)
y, typeof(y)
x = (2π) * im = 0.0 + 6.283185307179586im
typeof(x) = Complex{Float64}
  0.060166 seconds (74.60 k allocations: 4.047 MiB, 26.85% gc time)
Out[23]:
(1.0000197394036099 - 8.271237919989742e-11im, Complex{Float64})
In [24]:
@show x = 2*Float32(π)*im
@show typeof(x)
@time y = myexp(x)
y, typeof(y)
x = 2 * Float32(π) * im = 0.0f0 + 6.2831855f0im
typeof(x) = Complex{Float32}
  0.057432 seconds (86.10 k allocations: 4.625 MiB)
Out[24]:
(1.0f0 + 2.8066303f-5im, Complex{Float32})
In [25]:
x = (π/6)*Float64[
    0 -1
    1  0
]
@show x
@show typeof(x)
@time y = myexp(x)
y
x = [0.0 -0.523599; 0.523599 0.0]
typeof(x) = Array{Float64,2}
  0.922647 seconds (3.43 M allocations: 282.087 MiB, 11.59% gc time)
Out[25]:
2×2 Array{Float64,2}:
 0.866026  -0.5     
 0.5        0.866026

例: y = Array{typeof(h)}(undef, N+1)

In [26]:
function myexp_orbit(x; N=10^3)
    h = x/N
    y = Array{typeof(h)}(undef, N+1) # hと同じ型の要素を持つ配列を作成
    y[1] = one(h)                    # 行列対応にするためには one(h) を 1 で置き換えてはいけない.
    for i in 1:N
        y[i+1] = (one(h) + h)*y[i]   # 行列対応にするためには one(h) を 1 で置き換えてはいけない.
    end
    y
end
Out[26]:
myexp_orbit (generic function with 1 method)
In [27]:
N = 400
x = 3
y = myexp_orbit(x; N=N)
xs = linspace(0,x,N+1)

figure(figsize=(5,3.5))
plot(xs, y, label="y = myexp(x)")
plot(xs, exp.(xs), label="y = exp(x)", ls="--")
grid(ls=":")
xlabel("x")
ylabel("y")
legend()
title("exponential of real numbers")
Out[27]:
PyObject Text(0.5,1,'exponential of real numbers')
In [28]:
# 複素数でも使える

N = 400
z = π*im
w = myexp_orbit(z; N=N)

figure(figsize=(5,3.5))
plot(linspace(0,π,N+1), real(w), label="real part")
plot(linspace(0,π,N+1), imag(w), label="imaginary part")
xlabel("arg")
grid(ls=":")
legend()
title("exponential of pure imaginary numbers")
Out[28]:
PyObject Text(0.5,1,'exponential of pure imaginary numbers')
In [29]:
# 行列でも使える

N = 10^4
Z = (5π/3)*Float64[
    0 -1
    1  0
]
W = myexp_orbit(Z; N=N)
x = [W[k][1,1] for k in 1:N+1]
y = [W[k][2,1] for k in 1:N+1]
plot(x, y)
plot([0.0, x[1]], [0.0, y[1]], color="k", lw=1)
plot([0.0, x[N+1]], [0.0, y[N+1]], color="k", lw=1)
grid()
PyPlot.axes()[:set_aspect]("equal")

大域変数を含む函数は遅くなる

解決法1: 函数内の大域変数に型宣言を付ける.

解決法2: 全体を函数の中に入れてしまう.

解決法3: 大域変数を定数に置き換える.

解決法4: function-like object を使う. ← 少し面倒だけどおすすめ!

より正確に言えば, 大域変数を含むから遅くなるのではなく, Julia言語の型推定の仕組み(函数の引数の型から変数の型を推定するという仕組み)によって十分に型推定できなくなることが原因で遅くなる.

In [30]:
# テストデータの作成

N = 10^8
srand(2018)
w = randn(N)
x = w.^2;

以下は実質的に標準正規分布の3次のモーメントをモンテカルロ積分で計算していることになる. 計算結果は 0 に近くならなければいけない.

In [31]:
# 函数の直接呼出しは速い

weighted_mean(x,w) = mean(w[i]*x[i] for i in eachindex(w)) # w を後でパラメーターだとみなす.

weighted_mean(x,w)
@show @time weighted_mean(x,w)

# println()
# @code_warntype weighted_mean(x,w)
  0.140883 seconds (7 allocations: 240 bytes)
#= In[31]:6 =# @time(weighted_mean(x, w)) = 0.0002198997715224615

大域変数を含む函数は非常に遅くなることがある.

In [32]:
# 以下の場合には大域変数を含んでいるように見えても**速い**.
# 
# function_containing_global_var1(x) を実行しようとすると, 
# Julia言語は weighted_mean(x,w) を実行しようとする.
# そして, weighted_mean 函数を引数 x, w の型に合わせてコンパイルして実行する.
# Julia言語が引数の型に合わせて十分に型推定できる場合には計算が速くなる.

function_containing_global_var1(x) = weighted_mean(x,w)

function_containing_global_var1(x)
@show @time function_containing_global_var1(x)

# 計算速度的には問題無かったが, @code_warntype で確認すると赤字の警告が出ているので避けた方が無難.
println()
@code_warntype function_containing_global_var1(x)
  0.179396 seconds (7 allocations: 240 bytes)
#= In[32]:11 =# @time(function_containing_global_var1(x)) = 0.0002198997715224615

Body::Any
8 1 ─ %1 = (Main.weighted_mean)(x, Main.w)::Any
  └──      return %1
In [33]:
# 以下のように書くと大幅に遅くなる.
# 
# function_containing_global_var1(x) を実行しようとすると, 
# Julia言語は mean(w[i]*x[i] for i in eachindex(w)) を実行しようとする.
# そして, mean(w[i]*x[i] for i in eachindex(w)) を引数 x の型に合わせてコンパイルして実行する.
# そのとき w は大域変数なので型は不明であるとする.
# Julia言語が引数の型に合わせて十分に型推定できない場合には計算は遅くなる.

function_containing_global_var2(x) = mean(w[i]*x[i] for i in eachindex(w))

function_containing_global_var2(x)
@show @time function_containing_global_var2(x)

# w の型が Any になっている
println()
@code_warntype function_containing_global_var2(x)
 42.704690 seconds (600.00 M allocations: 8.941 GiB, 50.32% gc time)
#= In[33]:12 =# @time(function_containing_global_var2(x)) = 0.0002198997715224615

Body::Any
9 1 ─ %1 = %new(getfield(Main, Symbol("##19#20")){Array{Float64,1}}, x)::getfield(Main, Symbol("##19#20")){Array{Float64,1}}
  │   %2 = (Main.eachindex)(Main.w)::Any
  │   %3 = (Base.Generator)(%1, %2)::Base.Generator{_1,getfield(Main, Symbol("##19#20")){Array{Float64,1}}} where _1
  │   %4 = (Main.mean)(%3)::Any
  └──      return %4

大域変数を含む函数は大域変数に型宣言を付けると速くなる.

In [34]:
# 以下のように大域変数を含んでいる場合であっても, 型宣言を追加すれば速くなる.

function_containing_global_var3(x) = (mean((w::Array{Float64,1})[i]*x[i] for i in eachindex(w::Array{Float64,1})))

function_containing_global_var3(x)
@show @time function_containing_global_var3(x)

# 赤字の警告がすべて消えている.
println()
@code_warntype function_containing_global_var3(x)
  0.207291 seconds (7 allocations: 224 bytes)
#= In[34]:6 =# @time(function_containing_global_var3(x)) = 0.0002198997715224615

Body::Float64
3 1 ─ %1 = %new(getfield(Main, Symbol("##21#22")){Array{Float64,1}}, x)::getfield(Main, Symbol("##21#22")){Array{Float64,1}}
  │        (Core.typeassert)(Main.w, Array{Float64,1})
  │   %3 = π (Main.w, Array{Float64,1})
│╻╷╷     eachindex  │   %4 = (Base.arraysize)(%3, 1)::Int64
││╻╷╷╷    axes1  │   %5 = (Base.slt_int)(%4, 0)::Bool
│││┃││││   axes  │   %6 = (Base.ifelse)(%5, 0, %4)::Int64
││││┃││     map  │   %7 = %new(Base.OneTo{Int64}, %6)::Base.OneTo{Int64}
││╻       Type  │   %8 = %new(Base.Generator{Base.OneTo{Int64},getfield(Main, Symbol("##21#22")){Array{Float64,1}}}, %1, %7)::Base.Generator{Base.OneTo{Int64},getfield(Main, Symbol("##21#22")){Array{Float64,1}}}
│╻       mean  │   %9 = invoke Statistics.mean(Statistics.identity::typeof(identity), %8::Base.Generator{Base.OneTo{Int64},getfield(Main, Symbol("##21#22")){Array{Float64,1}}})::Float64
  └──      return %9

全体を函数の中に入れると速くなる.

In [35]:
# 大域変数だった w の定義も含めて, 全体を函数の中に入れると速くなる.
#
# この解決法が最もシンプルでわかりやすいが, 柔軟性に欠けた不便な解決法でもある.

function test_of_function_containing_local_var()
    N = 10^8
    srand(2018)
    w = randn(N)
    x = w.^2
    
    function_containing_local_var(x) = mean(w[i]*x[i] for i in eachindex(w))

    function_containing_local_var(x)
    @show @time function_containing_local_var(x)
end

test_of_function_containing_local_var()

# println()
# @code_warntype test_of_function_containing_local_var()
  0.211984 seconds (2 allocations: 64 bytes)
#= In[35]:14 =# @time(function_containing_local_var(x)) = 0.0002198997715224615

closureを使っても速くなる.

closure を使うよりも, 後述の function-like object を使う方がよいと思う.

In [36]:
# closureを使っても速くなる.

function make_function_closure(w)
    function_closure(x) = mean(w[i]*x[i] for i in eachindex(w))
    return function_closure
end

N = 10^8
srand(2018)
w = randn(N)
x = w.^2

function_closure = make_function_closure(w)
function_closure(x)
@show @time function_closure(x)

# println()
# @code_warntype function_closure(x)
  0.132833 seconds (7 allocations: 240 bytes)
#= In[36]:15 =# @time(function_closure(x)) = 0.0002198997715224615

大域変数を定数に変えても速くなる.

In [37]:
# 大域変数 w を定数にすると速くなる.
#
# この方法も単純だが, 柔軟性に欠けた不便な解決法でもある.

N = 10^8
srand(2018)
const w_const = randn(N)

function_containing_const(x) = mean(w_const[i]*x[i] for i in eachindex(w_const))

function_containing_const(x)
@show @time function_containing_const(x)

# println()
# @code_warntype function_containing_const()
  0.394652 seconds (7 allocations: 224 bytes)
#= In[37]:12 =# @time(function_containing_const(x)) = 0.0002198997715224615

パラメーターの型が確定している function-like object は速い.

In [38]:
# パラメーター w の型が確定している function-like object は速い

mutable struct WeightedMean{T} w::T end
(f::WeightedMean)(x) = mean(f.w[i]*x[i] for i in eachindex(f.w))

function_like_object = WeightedMean(w)
function_like_object(x)
@show @time function_like_object(x)

println()

# パラメーター w の型 T が確定している.
@show typeof(function_like_object)

# println()
# @code_warntype function_like_object(x)
  0.135698 seconds (7 allocations: 240 bytes)
#= In[38]:8 =# @time(function_like_object(x)) = 0.0002198997715224615

typeof(function_like_object) = WeightedMean{Array{Float64,1}}
In [39]:
# 上と全く同じ内容を改行を増やして書くと以下のようになる.

mutable struct WeightedMean{T}
    w::T
end

function (f::WeightedMean)(x)
    mean(f.w[i]*x[i] for i in eachindex(f.w))
end
 
function_like_object = WeightedMean(w)
function_like_object(x)
@show @time function_like_object(x)

println()

# パラメーター w の型 T が Array{Float64,1} に確定している.
@show typeof(function_like_object)

# println()
# @code_warntype function_like_object(x)
  0.147419 seconds (7 allocations: 240 bytes)
#= In[39]:13 =# @time(function_like_object(x)) = 0.0002198997715224615

typeof(function_like_object) = WeightedMean{Array{Float64,1}}
In [40]:
# 次のような書き方もできる. 
# パラメーター w の意味的にはこちらの方が好ましいと思われる.

mutable struct WeightedMeanArrayOnly{T}
    w::Array{T,1}
end

function (f::WeightedMeanArrayOnly)(x)
    mean(f.w[i]*x[i] for i in eachindex(f.w))
end
 
function_like_object_array_only = WeightedMeanArrayOnly(w)
function_like_object_array_only(x)
@show @time function_like_object_array_only(x)

println()

# 型 T が Float64 に確定している.
@show typeof(function_like_object_array_only)

# println()
# @code_warntype function_like_object_array_only(x)
  0.168467 seconds (7 allocations: 240 bytes)
#= In[40]:14 =# @time(function_like_object_array_only(x)) = 0.0002198997715224615

typeof(function_like_object_array_only) = WeightedMeanArrayOnly{Float64}
In [41]:
# パラメーターの w の型が不明の function-like object は遅い

mutable struct WeightedMeanSlow w end
(f::WeightedMeanSlow)(x) = mean(f.w[i]*x[i] for i in eachindex(f.w))

function_like_object_slow = WeightedMeanSlow(w)
function_like_object_slow(x)
@show @time function_like_object_slow(x)

println()
@show typeof(function_like_object_slow)

# w の型が Any になってしまっている
println()
@code_warntype function_like_object_slow(x)
 39.731876 seconds (600.00 M allocations: 8.941 GiB, 50.73% gc time)
#= In[41]:8 =# @time(function_like_object_slow(x)) = 0.0002198997715224615

typeof(function_like_object_slow) = WeightedMeanSlow

Body::Any
4 1 ─ %1 = %new(getfield(Main, Symbol("##37#38")){WeightedMeanSlow,Array{Float64,1}}, f, x)::getfield(Main, Symbol("##37#38")){WeightedMeanSlow,Array{Float64,1}}
│╻ getproperty  │   %2 = (Base.getfield)(f, :w)::Any
  │   %3 = (Main.eachindex)(%2)::Any
  │   %4 = (Base.Generator)(%1, %3)::Base.Generator{_1,getfield(Main, Symbol("##37#38")){WeightedMeanSlow,Array{Float64,1}}} where _1
  │   %5 = (Main.mean)(%4)::Any
  └──      return %5
In [42]:
# function-like object の函数の定義で f.w を誤って w と書いてしまうと大幅に遅くなる.
# w が大域変数として定義されているとエラーが出ないので要注意!

mutable struct WeightedMeanTypo{T} w::T end
(f::WeightedMeanTypo)(x) = mean(w[i]*x[i] for i in eachindex(f.w))

function_like_object_typo = WeightedMeanTypo(w)
function_like_object_typo(x)
@show @time function_like_object_typo(x)

println()
@code_warntype function_like_object_typo(x)
 41.312067 seconds (600.00 M allocations: 8.941 GiB, 50.34% gc time)
#= In[42]:9 =# @time(function_like_object_typo(x)) = 0.0002198997715224615

Body::Any
5 1 ─ %1 = %new(getfield(Main, Symbol("##39#40")){Array{Float64,1}}, x)::getfield(Main, Symbol("##39#40")){Array{Float64,1}}
│╻       getproperty  │   %2 = (Base.getfield)(f, :w)::Array{Float64,1}
││╻╷╷     axes1  │   %3 = (Base.arraysize)(%2, 1)::Int64
│││╻╷╷╷    axes  │   %4 = (Base.slt_int)(%3, 0)::Bool
││││┃│││    map  │   %5 = (Base.ifelse)(%4, 0, %3)::Int64
│││││┃│      Type  │   %6 = %new(Base.OneTo{Int64}, %5)::Base.OneTo{Int64}
││╻       Type  │   %7 = %new(Base.Generator{Base.OneTo{Int64},getfield(Main, Symbol("##39#40")){Array{Float64,1}}}, %1, %6)::Base.Generator{Base.OneTo{Int64},getfield(Main, Symbol("##39#40")){Array{Float64,1}}}
│╻       mean  │   %8 = invoke Statistics.mean(Statistics.identity::typeof(identity), %7::Base.Generator{Base.OneTo{Int64},getfield(Main, Symbol("##39#40")){Array{Float64,1}}})::Any
  └──      return %8

function-like object (=パラメーター付き函数)の作り方

Function-like object はパラメーター付き函数の作り方だと思ってよい.

In [43]:
mutable struct AffineTransformation{T}
    A::Array{T,2}
    b::Array{T,1}
end

function (f::AffineTransformation)(x)
    f.A * x + f.b
end

のような感じで作成し,

In [44]:
θ = π/3
A = [
    cos(θ) -sin(θ)
    sin(θ)  cos(θ)
]
b = [
    5.0
    -2.0
]
v = [
    1.0
    0.0
]
Φ = AffineTransformation(A, b)

@show Φ
@show Φ(v)
@show Φ.A = eye(2)
@show Φ.b = zeros(2)
@show Φ
@show Φ(v);
Φ = AffineTransformation{Float64}([0.5 -0.866025; 0.866025 0.5], [5.0, -2.0])
Φ(v) = [5.5, -1.13397]
Φ.A = eye(2) = [1.0 0.0; 0.0 1.0]
Φ.b = zeros(2) = [0.0, 0.0]
Φ = AffineTransformation{Float64}([1.0 0.0; 0.0 1.0], [0.0, 0.0])
Φ(v) = [1.0, 0.0]

のような感じで利用する.

配列を使うときには無用にメモリを消費するように書くと遅くなる.

現時点でのJulia言語では, 無用にメモリを消費する(無用に配列を確保してしまう)書き方を容易にできてしまう. 個人的な意見では現時点でのJulia言語について最も改善して欲しい所がこの問題.

無用にメモリを消費しないような書き方できれば計算も自然に速くなる.

解決法:

  • dot syntax をできるだけたくさん使う. More dots! @. マクロを使用可能ならそうする.

  • in-place 計算(すでに確保した配列ないでの計算)を利用する. = ではなく .= を使う.

  • @view および @views マクロを使用する.

  • 配列の計算を別の函数に分離して @inline を使う(これでうまく行く理由は不明).

  • 場合によってはforループに展開するか, それに近いことをする.

In [45]:
# テストデータの作成

N = 10^8
srand(2018)
a = randn(N)
b = randn(N)
c = randn(N)
d = randn(N);
In [46]:
# 次のセルよりかなり遅い.  メモリ消費量の巨大さにも注目!

s = Array{Float64}(undef, N)

function mysum_slow1(a, b, c, d)
    a + b + c + d
end

@benchmark s = mysum_slow1(a, b, c, d)
Out[46]:
BenchmarkTools.Trial: 
  memory estimate:  762.94 MiB
  allocs estimate:  7
  --------------
  minimum time:     551.476 ms (0.49% GC)
  median time:      1.007 s (0.27% GC)
  mean time:        1.076 s (31.76% GC)
  maximum time:     2.216 s (62.66% GC)
  --------------
  samples:          5
  evals/sample:     1
In [47]:
# 次のセルより遅い. 
# 前もって配列 s を確保してあるのに, 新たに同じサイズのメモリが消費されてしまっている.
# その理由は s = mysum_slow2(a, b, c, d) の = の右辺の分の配列が新たに確保されているからである.

s = Array{Float64}(undef, N)

function mysum_slow2(a, b, c, d)
    a .+ b .+ c .+ d
end

@benchmark s = mysum_slow2(a, b, c, d)
Out[47]:
BenchmarkTools.Trial: 
  memory estimate:  762.94 MiB
  allocs estimate:  2
  --------------
  minimum time:     565.737 ms (1.44% GC)
  median time:      656.078 ms (2.05% GC)
  mean time:        1.048 s (34.16% GC)
  maximum time:     2.429 s (59.77% GC)
  --------------
  samples:          5
  evals/sample:     1
In [48]:
# in-place 計算の使用によって代入操作も dot operator 化すると効率がよくなる.
#
# s .= a .+ b .+ c .+ d の代わりに @. s = a + b + c + d と書ける.

s = Array{Float64}(undef, N)

function mysum_atdot1!(s, a, b, c, d)
    @. s = a + b + c + d
end

mysum_atdot1!(s, a, b, c, d)
@show s == a + b + c + d

@benchmark mysum_atdot1!(s, a, b, c, d)
s == a + b + c + d = true
Out[48]:
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     351.389 ms (0.00% GC)
  median time:      380.283 ms (0.00% GC)
  mean time:        381.220 ms (0.00% GC)
  maximum time:     420.300 ms (0.00% GC)
  --------------
  samples:          14
  evals/sample:     1
In [49]:
# 上と本質的に同じ

s = Array{Float64}(undef, N)

function mysum_atdot2!(s, a, b, c, d)
    s .= a .+ b .+ c .+ d
end

mysum_atdot2!(s, a, b, c, d)
@show s == a + b + c + d

@benchmark mysum_atdot2!(s, a, b, c, d)
s == a + b + c + d = true
Out[49]:
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     362.076 ms (0.00% GC)
  median time:      401.119 ms (0.00% GC)
  mean time:        404.847 ms (0.00% GC)
  maximum time:     539.176 ms (0.00% GC)
  --------------
  samples:          13
  evals/sample:     1

注意:配列への代入と配列の要素への代入の違い

In [50]:
@show a = 0
@show b = Array{typeof(a),1}(undef, 2)
@show b[1] = a
@show b
@show a = 1
@show b[2] = a
@show b
println("\n以上は「意図した通り」だろう.\n")

@show a = [0,0]
@show b = Array{typeof(a),1}(undef, 2)
@show b[1] = a
@show b
@show a = [1, 2]
@show b[2] = a
@show b
println("\n以上も「意図した通り」だろう.\n")

@show a = [0,0]
@show b = Array{typeof(a),1}(undef, 2)
@show b[1] = a
@show b
@show a[1], a[2] = 1, 2
@show b[2] = a
@show b

println("\nb の要素がどちらも [1,2] になってしまった!\n")

@show a = [0,0]
@show b = Array{typeof(a),1}(undef, 2)
@show b[1] = copy(a)
@show b
@show a[1], a[2] = 1, 2
@show b[2] = a
@show b

println("\nこのように b[1] = a を b[1] = copy(a) に置き換えると「意図した通り」になる.\n")

println("\n次のセルのように2次元配列を使っても「意図した通り」になる.")
a = 0 = 0
b = Array{typeof(a), 1}(undef, 2) = [2, 246499216]
b[1] = a = 0
b = [0, 246499216]
a = 1 = 1
b[2] = a = 1
b = [0, 1]

以上は「意図した通り」だろう.

a = [0, 0] = [0, 0]
b = Array{typeof(a), 1}(undef, 2) = Array{Int64,1}[#undef, #undef]
b[1] = a = [0, 0]
b = Array{Int64,1}[[0, 0], #undef]
a = [1, 2] = [1, 2]
b[2] = a = [1, 2]
b = Array{Int64,1}[[0, 0], [1, 2]]

以上も「意図した通り」だろう.

a = [0, 0] = [0, 0]
b = Array{typeof(a), 1}(undef, 2) = Array{Int64,1}[#undef, #undef]
b[1] = a = [0, 0]
b = Array{Int64,1}[[0, 0], #undef]
(a[1], a[2]) = (1, 2) = (1, 2)
b[2] = a = [1, 2]
b = Array{Int64,1}[[1, 2], [1, 2]]

b の要素がどちらも [1,2] になってしまった!

a = [0, 0] = [0, 0]
b = Array{typeof(a), 1}(undef, 2) = Array{Int64,1}[#undef, #undef]
b[1] = copy(a) = [0, 0]
b = Array{Int64,1}[[0, 0], #undef]
(a[1], a[2]) = (1, 2) = (1, 2)
b[2] = a = [1, 2]
b = Array{Int64,1}[[0, 0], [1, 2]]

このように b[1] = a を b[1] = copy(a) に置き換えると「意図した通り」になる.


次のセルのように2次元配列を使っても「意図した通り」になる.
In [51]:
a = [0,0]
b = Array{eltype(a),2}(undef, 2,2)
display(b)
b[:,1] = a
display(b)
a[1], a[2] = 1, 2
b[:,2] = a
display(b)
2×2 Array{Int64,2}:
 21474836484  107374182416
 64424509454    8589934618
2×2 Array{Int64,2}:
 0  107374182416
 0    8589934618
2×2 Array{Int64,2}:
 0  1
 0  2
In [52]:
a = [1.2, 3.4]
s = zeros(2)
@show a
@show s

function f(s, a)
    s = a
end
f(s, a)

println()
@show s
println("函数内の s = a によって s の内容は変化していない.\n")

function g!(s, a)
    s .= a
end
g!(s, a)

@show s
println("函数内の s .= a によって s の内容が変化している.")
a = [1.2, 3.4]
s = [0.0, 0.0]

s = [0.0, 0.0]
函数内の s = a によって s の内容は変化していない.

s = [1.2, 3.4]
函数内の s .= a によって s の内容が変化している.

@view および @views を使う.

https://docs.julialang.org/en/stable/stdlib/arrays/#[email protected]

配列の部分配列に素朴にアクセスすると部分配列の大きさのメモリが新たに使用されてしまう. @view および @views マクロを使えばそうなることを防げる. (@views マクロは @. マクロの @view 版である.)

In [53]:
# 計算結果確認用のプロット函数

function plot2pcolormesh(u, v; cmap="CMRmap")
    sleep(0.1)
    d = size(u)[1]
    figure(figsize=(8,2))
    for i in 1:d
        fig = subplot(div(2d+2,4), 4, mod1(2i-1,4))
        pcolormesh(x, y, @view(u[i,:,:]), cmap=cmap)
        #colorbar()
        fig[:set_aspect]("equal")
        fig = subplot(div(2d+2,4), 4, mod1(2i,4))
        pcolormesh(x, y, @view(v[i,:,:]), cmap=cmap)
        #colorbar()
        fig[:set_aspect]("equal")
        tight_layout()
    end
end
Out[53]:
plot2pcolormesh (generic function with 1 method)
In [54]:
# テストデータの作成
# 3次元配列をベクトルの2次元配列であるかのように扱う

n = 1000
x = y = linspace(-5, 5, n)
f(x, y) = x^2 - y^2 # ラプラシアンを作用させると0になる.
g(x, y) = exp(-(x-2)^2-(y-2)^2) + exp(-(x+2)^2-(y+2)^2) # 混合ガウス分布
u = Array{Float64, 3}(undef, 2, n, n)
u[1,:,:] .= f.(x', y)
u[2,:,:] .= g.(x', y);
In [55]:
# 平面全体での離散ラプラシアン
#
# laplacian_local! は離散ラプラシアンの値を局所的に計算する函数である.
# このように, 局所的なラプラシアンの計算を別の函数として分離しておく.
# u[:,i,j] が平面座標 (i,j) における u の値(ベクトル値になる)を表現していると仮定する.
# v に離散ラプラシアンを施した結果を返す.
#
function laplacian!(v, u, laplacian_local!)
    m, n = size(u)[end-1:end]
    for i in 1:m
       for j in 1:n
            laplacian_local!(v, u, m, n, i, j)
       end
    end
end
Out[55]:
laplacian! (generic function with 1 method)
In [56]:
# 何も工夫していない laplacian_local_naive! を使用すると非常に遅い.
#
function laplacian_local_naive!(v, u, m, n, i, j)
    v[:,i,j] =
    u[:, ifelse(i+1  m, i+1, 1), j] + u[:, ifelse(i-1  1, i-1, m), j] +
    u[:, i, ifelse(j+1  n, j+1, 1)] + u[:, i, ifelse(j-1  1, j-1, n)] -
    4u[:, i, j]
end

v = Array{Float64,3}(undef, 2, n, n)
laplacian!(v, u, laplacian_local_naive!)
plot2pcolormesh(u, v)
@benchmark laplacian!(v, u, laplacian_local_naive!)
Out[56]:
BenchmarkTools.Trial: 
  memory estimate:  900.27 MiB
  allocs estimate:  13000004
  --------------
  minimum time:     2.718 s (13.74% GC)
  median time:      2.857 s (12.98% GC)
  mean time:        2.857 s (12.98% GC)
  maximum time:     2.995 s (12.28% GC)
  --------------
  samples:          2
  evals/sample:     1
In [57]:
# @view を使うと少しだけ速くなる.

function laplacian_local_view!(v, u, m, n, i, j)
    v[:,i,j] =
    @view(u[:, ifelse(i+1  m, i+1, 1), j]) + @view(u[:, ifelse(i-1  1, i-1, m), j]) +
    @view(u[:, i, ifelse(j+1  n, j+1, 1)]) + @view(u[:, i, ifelse(j-1  1, j-1, n)]) -
    4*@view(u[:, i, j])
end

v = Array{Float64,3}(undef, 2, n, n)
laplacian!(v, u, laplacian_local_view!)
plot2pcolormesh(u, v)
@benchmark laplacian!(v, u, laplacian_local_view!)
Out[57]:
BenchmarkTools.Trial: 
  memory estimate:  762.94 MiB
  allocs estimate:  10000004
  --------------
  minimum time:     504.526 ms (17.82% GC)
  median time:      549.978 ms (16.76% GC)
  mean time:        577.438 ms (17.30% GC)
  maximum time:     690.548 ms (18.31% GC)
  --------------
  samples:          9
  evals/sample:     1
In [58]:
# 上と同じ. @views を使った方が簡潔に書ける.

function laplacian_local_views!(v, u, m, n, i, j)
    @views(
        v[:,i,j] =
        u[:, ifelse(i+1  m, i+1, 1), j] + u[:, ifelse(i-1  1, i-1, m), j] +
        u[:, i, ifelse(j+1  n, j+1, 1)] + u[:, i, ifelse(j-1  1, j-1, n)] -
        4u[:, i, j]
    )
end

v = Array{Float64,3}(undef, 2, n, n)
laplacian!(v, u, laplacian_local_views!)
plot2pcolormesh(u, v)
@benchmark laplacian!(v, u, laplacian_local_views!)
Out[58]:
BenchmarkTools.Trial: 
  memory estimate:  762.94 MiB
  allocs estimate:  10000004
  --------------
  minimum time:     514.639 ms (17.59% GC)
  median time:      553.605 ms (18.59% GC)
  mean time:        583.291 ms (18.18% GC)
  maximum time:     782.706 ms (17.84% GC)
  --------------
  samples:          9
  evals/sample:     1
In [59]:
# さらに @. を付けるとかなり速くなる.
# しかし, なぜかメモリが結構使用されている.

function laplacian_local_atdot_views!(v, u, m, n, i, j)
    @. @views(
        v[:,i,j] =
        u[:, ifelse(i+1  m, i+1, 1), j] + u[:, ifelse(i-1  1, i-1, m), j] +
        u[:, i, ifelse(j+1  n, j+1, 1)] + u[:, i, ifelse(j-1  1, j-1, n)] -
        4u[:, i, j]
    )
end

v = Array{Float64,3}(undef, 2, n, n)
laplacian!(v, u, laplacian_local_atdot_views!)
plot2pcolormesh(u, v)
@benchmark laplacian!(v, u, laplacian_local_atdot_views!)
Out[59]:
BenchmarkTools.Trial: 
  memory estimate:  366.21 MiB
  allocs estimate:  6000004
  --------------
  minimum time:     222.899 ms (14.01% GC)
  median time:      237.404 ms (13.82% GC)
  mean time:        256.757 ms (14.22% GC)
  maximum time:     350.529 ms (14.65% GC)
  --------------
  samples:          20
  evals/sample:     1

v0.6 ではさらに @inline をつけると効率が大幅改善する場合があった.

これでうまく行っていた理由は不明.

In [60]:
# さらに @inline をつけるとかなり速くなる.
# メモリの使用量がほぼゼロになったことにも注目!

@inline function laplacian_local_inline_atdot_views!(v, u, m, n, i, j)
    @. @views(
        v[:,i,j] =
        u[:, ifelse(i+1  m, i+1, 1), j] + u[:, ifelse(i-1  1, i-1, m), j] +
        u[:, i, ifelse(j+1  n, j+1, 1)] + u[:, i, ifelse(j-1  1, j-1, n)] -
        4u[:, i, j]
    )
end

v = Array{Float64,3}(undef, 2, n, n)
laplacian!(v, u, laplacian_local_inline_atdot_views!)
plot2pcolormesh(u, v)
@benchmark laplacian!(v, u, laplacian_local_inline_atdot_views!)
Out[60]:
BenchmarkTools.Trial: 
  memory estimate:  366.21 MiB
  allocs estimate:  6000004
  --------------
  minimum time:     207.651 ms (14.08% GC)
  median time:      224.196 ms (14.19% GC)
  mean time:        236.501 ms (14.43% GC)
  maximum time:     315.814 ms (15.33% GC)
  --------------
  samples:          22
  evals/sample:     1

forループに展開すると非常に速くなる

結局これが決定版!

In [61]:
# forループが速い!

@inline function laplacian_local_inline_for!(v, u, m, n, i, j)
    @inbounds for k in 1:size(u)[1]
        v[k, i, j] =
        u[k, ifelse(i+1  m, i+1, 1), j] + u[k, ifelse(i-1  1, i-1, m), j] +
        u[k, i, ifelse(j+1  n, j+1, 1)] + u[k, i, ifelse(j-1  1, j-1, n)] -
        4u[k, i, j]
    end
end

v = Array{Float64,3}(undef, 2, n, n)
laplacian!(v, u, laplacian_local_inline_for!)
plot2pcolormesh(u, v)
@benchmark laplacian!(v, u, laplacian_local_inline_for!)
Out[61]:
BenchmarkTools.Trial: 
  memory estimate:  160 bytes
  allocs estimate:  4
  --------------
  minimum time:     29.405 ms (0.00% GC)
  median time:      34.574 ms (0.00% GC)
  mean time:        37.997 ms (0.00% GC)
  maximum time:     61.992 ms (0.00% GC)
  --------------
  samples:          132
  evals/sample:     1

forループの使用例

forループへの展開はマクロを使うと短く書ける.

配列 a, b について sum(a+b), mean(a+b) はメモリを余計に消費して遅くなる. その場合にforループに展開するのが面倒ならば

sum(a[i]+b[i] for i in eachindex(a))
mean(a[i]+b[i] for i in eachindex(a))

と書くこともできる. このような単純なケースでは

sum(a) + sum(b)

sum を先に適用すれば速い. この方法は +* に変えると適用できない.

In [62]:
# 2N個の和をまとめて sum(a) で計算すると非常に速い

N = 10^8
srand(2018)
a = rand(2N)
@benchmark sum(a)
Out[62]:
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     124.830 ms (0.00% GC)
  median time:      139.086 ms (0.00% GC)
  mean time:        147.379 ms (0.00% GC)
  maximum time:     208.592 ms (0.00% GC)
  --------------
  samples:          34
  evals/sample:     1
In [63]:
# sumの中に配列の和a+bを入れると遅くなり, 大量にメモリを消費する.

N = 10^8
srand(2018)
a = rand(N)
b = rand(N)
@benchmark sum(a+b)
Out[63]:
BenchmarkTools.Trial: 
  memory estimate:  762.94 MiB
  allocs estimate:  3
  --------------
  minimum time:     499.255 ms (0.37% GC)
  median time:      539.805 ms (12.35% GC)
  mean time:        563.864 ms (11.02% GC)
  maximum time:     695.901 ms (11.05% GC)
  --------------
  samples:          9
  evals/sample:     1
In [64]:
# a+b を a.+b に変えても効果無し

N = 10^8
srand(2018)
a = rand(N)
b = rand(N)
@benchmark sum(a.+b)
Out[64]:
BenchmarkTools.Trial: 
  memory estimate:  762.94 MiB
  allocs estimate:  5
  --------------
  minimum time:     520.595 ms (0.29% GC)
  median time:      554.458 ms (13.98% GC)
  mean time:        579.219 ms (13.18% GC)
  maximum time:     664.360 ms (12.75% GC)
  --------------
  samples:          9
  evals/sample:     1
In [65]:
# forループに展開すると速い

function mysum(a,b)
    s = zero(eltype(a))
    for i in 1:endof(a)
        s += a[i] + b[i]
    end
    s
end

N = 10^8
srand(2018)
a = rand(N)
b = rand(N)
@benchmark mysum(a,b)
Out[65]:
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     144.254 ms (0.00% GC)
  median time:      157.483 ms (0.00% GC)
  mean time:        162.621 ms (0.00% GC)
  maximum time:     198.187 ms (0.00% GC)
  --------------
  samples:          31
  evals/sample:     1
In [66]:
# forループはマクロにすると短く書ける

macro sum(init, i, iter, f)
    quote
        begin
            s = $(esc(init))
            for $(esc(i)) in $(esc(iter))
                s += $(esc(f))
            end
            s
        end
    end
end

function mysum_macro(a,b)
    @sum(zero(eltype(a)), i, 1:endof(a), a[i]+b[i])
end

N = 10^8
srand(2018)
a = rand(N)
b = rand(N)
@benchmark mysum_macro(a,b)
Out[66]:
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     130.161 ms (0.00% GC)
  median time:      138.044 ms (0.00% GC)
  mean time:        139.475 ms (0.00% GC)
  maximum time:     154.948 ms (0.00% GC)
  --------------
  samples:          36
  evals/sample:     1
In [67]:
# @sum マクロがどのように展開されるか

@macroexpand @sum(0, k, 1:10, k)
Out[67]:
quote
    #= In[66]:5 =#
    begin
        #= In[66]:6 =#
        #218#s = 0
        #= In[66]:7 =#
        for k = 1:10
            #= In[66]:8 =#
            #218#s += k
        end
        #= In[66]:10 =#
        #218#s
    end
end
In [68]:
# マクロで正しく計算されていることを確認

@show let; @sum(0.0, i, 1:3, @sum(0, j, 1:2, 2.0^(i*j))); end
@show sum(2.0^(i*j) for i in 1:3 for j in 1:2);
let
    #= In[68]:3 =#
    #= In[68]:3 =# @sum 0.0 i 1:3 #= In[68]:3 =# @sum(0, j, 1:2, 2.0 ^ (i * j))
end = 98.0
sum((2.0 ^ (i * j) for i = 1:3 for j = 1:2)) = 98.0
In [69]:
# sum(i->a[i]+b[i], eachindex(a)) と書き換えると速くなる. 
# この方法は mean でも使用可能

function mysum_sum_of_forin(a,b)
    sum(i->a[i]+b[i], eachindex(a))
end

N = 10^8
srand(2018)
a = rand(N)
b = rand(N)
@benchmark mysum_sum_of_forin(a,b)
Out[69]:
BenchmarkTools.Trial: 
  memory estimate:  48 bytes
  allocs estimate:  2
  --------------
  minimum time:     129.570 ms (0.00% GC)
  median time:      135.747 ms (0.00% GC)
  mean time:        139.838 ms (0.00% GC)
  maximum time:     202.300 ms (0.00% GC)
  --------------
  samples:          36
  evals/sample:     1
In [70]:
# sum(a+b) を sum(a[i]+b[i] for i in eachindex(a)) に書き換えても速くなる.
# この方法は mean でも使用可能

function mysum_sum_of_f_itr(a,b)
    sum(a[i]+b[i] for i in eachindex(a))
end

N = 10^8
srand(2018)
a = rand(N)
b = rand(N)
@benchmark mysum_sum_of_f_itr(a,b)
Out[70]:
BenchmarkTools.Trial: 
  memory estimate:  80 bytes
  allocs estimate:  3
  --------------
  minimum time:     133.190 ms (0.00% GC)
  median time:      143.421 ms (0.00% GC)
  mean time:        147.253 ms (0.00% GC)
  maximum time:     209.965 ms (0.00% GC)
  --------------
  samples:          34
  evals/sample:     1
In [71]:
# 上と同様の趣旨

function mysum_reduce(a,b)
    reduce(+, (a[i]+b[i] for i in eachindex(a)), init=zero(eltype(a)))
end

N = 10^8
srand(2018)
a = rand(N)
b = rand(N)
@benchmark mysum_reduce(a,b)
Out[71]:
BenchmarkTools.Trial: 
  memory estimate:  80 bytes
  allocs estimate:  3
  --------------
  minimum time:     128.215 ms (0.00% GC)
  median time:      139.305 ms (0.00% GC)
  mean time:        141.264 ms (0.00% GC)
  maximum time:     201.936 ms (0.00% GC)
  --------------
  samples:          36
  evals/sample:     1
In [72]:
# 実は以下のように sum を先に取れば速い.

function plus_sum(a,b)
    sum(a) + sum(b)
end

N = 10^8
srand(2018)
a = rand(N)
b = rand(N)
@benchmark plus_sum(a,b)
Out[72]:
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     125.157 ms (0.00% GC)
  median time:      133.192 ms (0.00% GC)
  mean time:        133.849 ms (0.00% GC)
  maximum time:     147.842 ms (0.00% GC)
  --------------
  samples:          38
  evals/sample:     1

具体例へのリンク

In [ ]: