Notebook

ロジスティック分布の2通りの正規分布近似の比較¶

黒木玄

2017-12-23, 2018-10-06

Copyright (c) 2018 Gen Kuroki
Released under the MIT license
http://opensource.org/licenses/mit-license.php

以下のブログ記事を読んだ.

https://www.johndcook.com/blog/2010/05/18/normal-approximation-to-logistic/

以下では, ログスティック分布をKullback-Leibler情報量の意味で最良近似する正規分布とsupノルムの意味で(すなわち確率密度函数の差の絶対値の上限を最小化するような)正規分布を比較してみる.

# Julia v1.0 には存在しないパッケージの読み込みと函数の定義
using Printf
linspace(start, stop, length) = range(start, stop=stop, length=length)

# 確率分布を扱うために使用
using Distributions

# supノルムを最小化する分散を求めるために使用」
using Optim

# KL情報量を積分で計算するために使用
using QuadGK

# 散布図の色付けのために使用
using KernelDensity

# プロットに使用
using PyPlot

ロジスティック分布¶

In [2]:

?Logistic

search: Logistic

Out[2]:

Logistic(μ,θ)

The Logistic distribution with location μ and scale θ has probability density function

$f(x; \mu, \theta) = \frac{1}{4 \theta} \mathrm{sech}^2 \left( \frac{x - \mu}{2 \theta} \right)$

Logistic()       # Logistic distribution with zero location and unit scale, i.e. Logistic(0, 1)
Logistic(u)      # Logistic distribution with location u and unit scale, i.e. Logistic(u, 1)
Logistic(u, b)   # Logistic distribution with location u ans scale b

params(d)       # Get the parameters, i.e. (u, b)
location(d)     # Get the location parameter, i.e. u
scale(d)        # Get the scale parameter, i.e. b

External links

Logistic distribution on Wikipedia

In [3]:

@show dist_logistic = Logistic(0.0, 1.0);

dist_logistic = Logistic(0.0, 1.0) = Logistic{Float64}(μ=0.0, θ=1.0)

dist_logistic は累積分布函数が

$F(x) = \frac{1}{1-e^{-x}}$

であるような標準ロジスティック分布を意味する.

In [4]:

?Normal

search: Normal NormalCanon NormalInverseGaussian MvNormal MvNormalCanon

Out[4]:

Normal(μ,σ)

The Normal distribution with mean μ and standard deviation σ has probability density function

$f(x; \mu, \sigma) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp \left( - \frac{(x - \mu)^2}{2 \sigma^2} \right)$

Normal()          # standard Normal distribution with zero mean and unit variance
Normal(mu)        # Normal distribution with mean mu and unit variance
Normal(mu, sig)   # Normal distribution with mean mu and variance sig^2

params(d)         # Get the parameters, i.e. (mu, sig)
mean(d)           # Get the mean, i.e. mu
std(d)            # Get the standard deviation, i.e. sig

External links

Normal distribution on Wikipedia

KL情報量の意味での正規分布による最良近似¶

In [5]:

@show sigma_logistic = std(dist_logistic)
@show dist_best_normal_approx = Normal(0.0, sigma_logistic)

sleep(0.1)
x = linspace(-8,8,401)
figure()
plot(x, pdf.(dist_logistic, x),           label="logistic",            color="k", lw=1)
plot(x, pdf.(dist_best_normal_approx, x), label="best normal approx.", color="r", ls="--")
grid()
legend();

sigma_logistic = std(dist_logistic) = 1.8137993642342178
dist_best_normal_approx = Normal(0.0, sigma_logistic) = Normal{Float64}(μ=0.0, σ=1.8137993642342178)

dist_best_normal_approx はロジスティック分布をKullback-Leibler情報量の意味で最良近似する正規分布を意味する. そのような正規分布は平均と分散がロジスティック分布と等しい正規分布になる.

上のグラフはロジスティック分布(実線)を dist_best_normal_approx (破線)がどのように近似しているかを表わしている.

このグラフを初めて見た人は山の頂上における誤差が大きく見えてしまうかもしれない.

そこで次のセルでは確率密度函数の差が最小になるような正規分布の分散を求めてみよう.

supノルムの意味での正規分布による最良近似¶

In [6]:

function supnorm(f,  g; a=-10.0, b=10.0, L=1001)
    x = linspace(a, b, L)
    return maximum(abs.(f.(x) .- g.(x)))
end

pdf_logistic = (x -> pdf(dist_logistic, x))
pdf_normal(sigma) = (x -> pdf(Normal(0.0, sigma),x))
f(sigma) = supnorm(pdf_logistic, pdf_normal(sigma))
o = optimize(f, 0.1, 10)
show(o)
@show sigma_supnorm_normal_approx = o.minimizer
@show dist_supnorm_normal_approx = Normal(0.0, sigma_supnorm_normal_approx)

sleep(0.1)
x = linspace(-8,8,401)
figure()
plot(x, pdf.(dist_logistic, x),              label="logistic",               color="k", lw=1)
plot(x, pdf.(dist_supnorm_normal_approx, x), label="supnorm normal approx.", color="b", ls="-.")
grid()
legend();

Results of Optimization Algorithm
 * Algorithm: Brent's Method
 * Search Interval: [0.100000, 10.000000]
 * Minimizer: 1.616664e+00
 * Minimum: 1.145179e-02
 * Iterations: 36
 * Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
 * Objective Function Calls: 37sigma_supnorm_normal_approx = o.minimizer = 1.6166639920560277
dist_supnorm_normal_approx = Normal(0.0, sigma_supnorm_normal_approx) = Normal{Float64}(μ=0.0, σ=1.6166639920560277)

上のセルでは, ロジスティック分布をsupノルムの意味で最良近似する(すなわち確率密度函数の差の絶対値の上限を最小にするような)正規分布 dist_supnorm_normal_approx を求めている.

グラフはロジスティック分布(実線)を dist_supnorm_normal_approx (破線)がどのうように近似しているかを表わしている.

KL情報量の意味での最良近似の場合と比較すると、supノルムの意味での最良近似の方がもとのロジスティック分布に「近い」と感じる人が結構いるのではなかろうか?

KL情報量の比較¶

In [7]:

function KullbackLeibler(dist_q, dist_p; a=-10.0, b=10.0)
    q(x) = pdf(dist_q, x)
    logq(x) = logpdf(dist_q, x)
    p(x) = pdf(dist_p, x)
    logp(x) = logpdf(dist_p, x)
    f(x) = (q(x) == zero(q(x)) ? zero(q(x)) : q(x)*(logq(x) - logp(x)))
    return quadgk(f, a, b)[1], f
end

KL_b, f_b = KullbackLeibler(dist_logistic, dist_best_normal_approx)
KL_s, f_s = KullbackLeibler(dist_logistic, dist_supnorm_normal_approx)

println("Kullback-Leibler informations:")
println("* D(dist_logistic||dist_best_normal_approx)    = $KL_b")
println("* D(dist_logistic||dist_supnorm_normal_approx) = $KL_s")

sleep(0.1)
x = linspace(-10, 10, 401)
figure()
axhline(0.0, color="k", lw=0.8)
plot(x, f_b.(x), label="best normal approx.",    color="r", ls="--")
plot(x, f_s.(x), label="supnorm normal approx.", color="b", ls="-.")
legend()
grid()
title("integrands of KL informations")

Kullback-Leibler informations:
* D(dist_logistic||dist_best_normal_approx)    = 0.013540014890777365
* D(dist_logistic||dist_supnorm_normal_approx) = 0.02743011460378268

Out[7]:

PyObject <matplotlib.text.Text object at 0x0000000027DBF978>

KL情報量の意味で, dist_best_normal_approx の方が dist_supnorm_normal_approx よりも dist_logistic のよい近似になっている. (KL情報量が小さい方が誤差が小さいと考える.)

確率分布 $q$ , $p$ のKL情報量を $D(q||p)$ と書くとき, Sanovの定理によれば, 確率分布 $p$ でサイズ $n$ のサンプルを生成するとき, そのサンプルが確率分布 $q$ で生成されたように見える確率は $n\to\infty$ で $\exp(-n D(q||p) + o(n))$ のように振る舞う. すなわち, KL情報量 $D(q||p)$ は確率分布 $p$ によるサンプルの生成が確率分布 $q$ によるサンプル生成でないことが判明する速さ( $p$ による $q$ のシミュレーションのボロが出る速さ)を意味する.

KL情報量 $D(q||p)$ は確率分布 $p$ による確率分布 $q$ のシミュレーションの誤差の大きさを表わしていると考えられる.

Sanovの定理(の簡単な場合)に関する解説は次のリンク先にある.

https://genkuroki.github.io/documents/20160616KullbackLeibler.pdf

どれだけロジスティック分布だと誤認し易いかを比較¶

そこで以下のような実験をしてみよう.

KL情報量の意味での最良近似の正規分布でサンプルを生成して, それをロジスティック分布が生成したサンプルだと誤認してしまう割合を計算する.
supノルムの意味での最良近似の正規分布でサンプルを生成して, それをロジスティック分布が生成したサンプルだと誤認してしまう割合を計算する.
以上の2つの割合を比較する.

ロジスティック分布が生成したサンプルだと誤認する割合が高い方がロジスティック分布により近い正規分布だと考えられる.

In [8]:

function AIC_normal(sample)
    sigma = std(sample, corrected=false, mean=0.0)
    logpred(x) = logpdf(Normal(0.0, sigma), x)
    return -2*sum(logpred.(sample)) + 2
end

AIC_logistic(sample) = -2*sum(logpdf.(dist_logistic, sample))

######### Tsets

n = 2^4

sample_best_normal_approx = rand(dist_best_normal_approx, n)
@show AIC_normal(sample_best_normal_approx)
@show AIC_logistic(sample_best_normal_approx)

sample_supnorm_normal_approx = rand(dist_supnorm_normal_approx, n)
@show AIC_normal(sample_supnorm_normal_approx)
@show AIC_logistic(sample_supnorm_normal_approx)

sample_logistic = rand(dist_logistic, n)
@show AIC_normal(sample_logistic)
@show AIC_logistic(sample_logistic)
;

AIC_normal(sample_best_normal_approx) = 67.2384203678449
AIC_logistic(sample_best_normal_approx) = 66.5329825456505
AIC_normal(sample_supnorm_normal_approx) = 52.149773978168966
AIC_logistic(sample_supnorm_normal_approx) = 53.662078971349736
AIC_normal(sample_logistic) = 63.56269384704142
AIC_logistic(sample_logistic) = 63.02840388979424

モデル選択はAICの比較によって行う. 上のセルはAICを計算するための函数の定義とそのテストを行っている.

モデルは以下の2つ.

正規分布モデル (平均は0に固定, パラメーターは分散 $\sigma^2$ のみ):

$p_{\mathrm{normal}}(x|\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{x}{2\sigma^2}\right)$

サンプル $X_1,\ldots,X_n$ に対して, この正規分布モデルの最尤法の解とAICは

$\sigma_* = \sqrt{\frac{1}{n}\sum_{i=1}^n X_i^2}, \quad p_*(x) = p_{\mathrm{normal}}(x|\sigma_*), \quad \mathrm{AIC}_{\mathrm{normal}} = -2\sum_{i=1}^n \log p_*(X_i) + 2.$

ロジスティック分布モデル (パラメーター無し):

$p_{\mathrm{logistic}}(x) = \frac{1}{4\cosh^2(x/2)}.$

サンプル $X_1,\ldots,X_n$ に対するAICは

$\mathrm{AIC}_{\mathrm{logistic}} = -2\sum_{i=1}^n \log p_{\mathrm{logistic}}(X_i).$

In [9]:

function simulation_by(dist; n = 2^8, L = 10^4)
    AIC_n = Array{Float64}(undef, L)
    AIC_l = Array{Float64}(undef, L)
    for i in 1:L
        sample = rand(dist, n)
        AIC_n[i] = AIC_normal(sample)
        AIC_l[i] = AIC_logistic(sample)
    end
    ratio_logstic = count(AIC_l .< AIC_n)/L
    return ratio_logstic, AIC_n, AIC_l 
end

function plotdiffs(n, L, AIC_n_b, AIC_l_b, AIC_n_s, AIC_l_s)
    figure(figsize=(10,4))
    
    subplot2grid((9,20), ( 1, 0), rowspan=8, colspan=10)
    title("Samples are generated by best normal approx.", fontsize=11)
    xlabel("AIC_normal \$-\$ AIC_logistic")
    ylabel("empirical probability")
    h = plt[:hist](AIC_n_b - AIC_l_b, normed=true, bins=100)
    axvline(x=0.0, color="k", ls="--", lw=1)
    r_str = @sprintf("%.2f", 100*count(AIC_n_b .> AIC_l_b)/L)
    text(1.0, 0.7*maximum(h[1]), "→ $r_str%", color="red")
    grid(ls=":")
    #legend()

    subplot2grid((9,20), ( 1, 10), rowspan=8, colspan=10)
    title("Samples are generated by supnorm normal approx.", fontsize=11)
    xlabel("AIC_normal \$-\$ AIC_logistic")
    ylabel("empirical probability")
    h = plt[:hist](AIC_n_s - AIC_l_s, normed=true, bins=100)
    axvline(x=0.0, color="k", ls="--", lw=1)
    r_str = @sprintf("%.2f", 100*count(AIC_n_s .> AIC_l_s)/L)
    text(1.0, 0.7*maximum(h[1]), "→ $r_str%", color="red")
    grid(ls=":")
    #legend()

    suptitle("=====    Sample size n = $n,    Iterations L = $L    =====")

    tight_layout()
    
end

function plotscatters(n, L, AIC_n_b, AIC_l_b, AIC_n_s, AIC_l_s)
    ik_b = InterpKDE(kde((AIC_n_b, AIC_l_b)))
    ik_s = InterpKDE(kde((AIC_n_s, AIC_l_s)))
    
    figure(figsize=(10,5))

    subplot2grid((11,20), ( 1, 0), rowspan=10, colspan=10, facecolor="darkgray")
    title("Samples are generated by best normal approx.", fontsize=11)
    xlabel("AIC of the normal model")
    ylabel("AIC of the logistic model")
    # scatter(AIC_n_b, AIC_l_b, c=pdf.(ik_b, AIC_n_b, AIC_l_b), cmap="CMRmap", s=2)
    scatter(AIC_n_b, AIC_l_b, c=[pdf(ik_b, AIC_n_b[i], AIC_l_b[i]) for i in eachindex(AIC_n_b)], cmap="CMRmap", s=2)
    xmin = minimum(AIC_n_b)
    xmax = maximum(AIC_n_b)
    plot([xmin, xmax], [xmin, xmax], color="cyan", ls="--", label="x = y")
    r_str = @sprintf("%.2f", 100*count(AIC_n_b .> AIC_l_b)/L)
    text(0.5*(xmin+xmax)+0.1*(xmax-xmin), 0.5*(xmin+xmax)-0.1*(xmax-xmin), "$r_str%", color="red")
    grid(ls=":", color="w")
    legend()

    subplot2grid((11,20), ( 1, 10), rowspan=10, colspan=10, facecolor="darkgray")
    title("Samples are generated by supnorm normal approx.", fontsize=11)
    xlabel("AIC of the normal model")
    ylabel("AIC of the logistic model")
    # scatter(AIC_n_s, AIC_l_s, c=pdf.(ik_b, AIC_n_s, AIC_l_s), cmap="CMRmap", s=2)
    scatter(AIC_n_s, AIC_l_s, c=[pdf(ik_b, AIC_n_s[i], AIC_l_s[i]) for i in eachindex(AIC_n_s)], cmap="CMRmap", s=2)
    xmin = minimum(AIC_n_s)
    xmax = maximum(AIC_n_s)
    plot([xmin, xmax], [xmin, xmax], color="cyan", ls="--", label="x = y")
    r_str = @sprintf("%.2f", 100*count(AIC_n_s .> AIC_l_s)/L)
    text(0.5*(xmin+xmax)+0.1*(xmax-xmin), 0.5*(xmin+xmax)-0.1*(xmax-xmin), "$r_str%", color="red")
    grid(ls=":", color="w")
    legend()

    suptitle("=====    Sample size n = $n,    Iterations L = $L    =====")

    tight_layout()
end

Out[9]:

plotscatters (generic function with 1 method)

上のセルで確率分布 dist でサンプルを繰り返し生成してAICを計算して比較するシミュレーションを実行するための函数を定義した.

Sample size n=128, iterations L=10000¶

In [10]:

n = 2^7
L = 10^4

@time ratio_logistic_b, AIC_n_b, AIC_l_b = simulation_by(dist_best_normal_approx,    n=n, L=L)
@time ratio_logistic_s, AIC_n_s, AIC_l_s = simulation_by(dist_supnorm_normal_approx, n=n, L=L)
println()
@show ratio_logistic_b
@show ratio_logistic_s

sleep(0.1)
plotdiffs(n, L, AIC_n_b, AIC_l_b, AIC_n_s, AIC_l_s)
plotscatters(n, L, AIC_n_b, AIC_l_b, AIC_n_s, AIC_l_s)

  0.326373 seconds (559.58 k allocations: 55.126 MiB, 6.08% gc time)
  0.131291 seconds (155.81 k allocations: 35.800 MiB, 3.23% gc time)

ratio_logistic_b = 0.2804
ratio_logistic_s = 0.1782

Sample size n=256, iterations L=10000¶

In [11]:

n = 2^8
L = 10^4

@time ratio_logistic_b, AIC_n_b, AIC_l_b = simulation_by(dist_best_normal_approx,    n=n, L=L)
@time ratio_logistic_s, AIC_n_s, AIC_l_s = simulation_by(dist_supnorm_normal_approx, n=n, L=L)
println()
@show ratio_logistic_b
@show ratio_logistic_s

sleep(0.1)
plotdiffs(n, L, AIC_n_b, AIC_l_b, AIC_n_s, AIC_l_s)
plotscatters(n, L, AIC_n_b, AIC_l_b, AIC_n_s, AIC_l_s)

  0.326658 seconds (222.49 k allocations: 65.657 MiB, 7.32% gc time)
  0.272361 seconds (221.70 k allocations: 65.645 MiB, 4.41% gc time)

ratio_logistic_b = 0.1514
ratio_logistic_s = 0.0654

ratio_logistic_b は「ロジスティック分布をKullback-Leibler情報量の意味で最良近似する正規分布で生成されたサンプルによるモデル選択で正規分布ではなくロジスティック分布が選択された割合」を意味する.

ratio_logistic_s は「ロジスティック分布を確率密度函数の差のsupノルムの意味で最良近似する正規分布で生成されたサンプルによるモデル選択で正規分布ではなくロジスティック分布が選択された割合」を意味する.

モデル選択にはAICを使った. ロジスティック分布モデルはパラメーターを持たないモデルを採用し, 正規分布モデルは平均は0で分散のみをパラメーターに持つモデルを採用した.

上の計算結果は, サンプルサイズ $n = 2^8$ のとき, 正規分布によるサンプルをロジスティック分布が生成したサンプルだと誤認してしまう確率は、正規分布をsupノルムの意味でロジスティック分布に近付けた場合よりも, Kullback-Leibler情報量の意味で近付けた場合の方が高くなることを意味している.

Sample size n=2^2, 2^3, ..., 2^12, iterations L=10000¶

In [12]:

@time let

ks = 2:12
L = 10^4
r_b = Array{Float64}(undef, length(ks))
r_s = Array{Float64}(undef, length(ks))
i = 0
for k in ks
    n = 2^k
    i += 1
    r_b[i], = simulation_by(dist_best_normal_approx,    n=n, L=L)
    r_s[i], = simulation_by(dist_supnorm_normal_approx, n=n, L=L)
end

sleep(0.1)
figure()
plot(ks, r_b, label="best normal approx.",    color="r", ls="--")
plot(ks, r_s, label="supnorm normal approx.", color="b", ls="-.")
grid()
xlabel("\$\\log_2\$(sample size)")
ylabel("empirical probability ($L iterations)")
legend()
title("probability of the logistic model selected")

end

 17.979838 seconds (10.56 M allocations: 3.887 GiB, 3.78% gc time)

Out[12]:

PyObject <matplotlib.text.Text object at 0x000000002CD99588>

このプロットより, 正規分布によるKL情報量の意味での最良近似によるサンプル生成の方が, supノルムの意味での最良近似よりも, ロジスティック分布だと誤認され易いことがわかる.

In [ ]: