Notebook

測定スキルの異なる7人の科学者たち¶

黒木玄

2017-09-29～30 (2019-09-18 Julia v1.0以上に対応)

1 サンプル
2 4つの確率モデル
3 4つの確率モデルを記述する函数の定義
4 最尤法の実行
5 比較
6 階層モデルとの関係
- 6.1 t 分布が出て来る仕組み
- 6.2 階層モデルにおける正規分布の分散と標準偏差の分布をプロット
7 交差検証
8 別のサンプル

In [1]:

using PyPlot
using Distributions
using Optim
linspace(a, b, l) = range(a, b, length=l)
using Printf

using Base64
displayfile(mime, file; tag="img") = open(file) do f
    display("text/html", """<$tag src="data:$mime;base64,$(base64encode(f))"/>""")
end

Out[1]:

displayfile (generic function with 1 method)

MacKay, David J. C., Information Theory, Inference, and Learning Algorithms, Cambridge University Press, 2003, 640 pages. (download) のp.309にある演習問題22.15(次のセルに引用)をこのノートでは扱う.

In [2]:

displayfile("image/png", "Seven scientists.png")

サンプル¶

7人の科学者は同じ物理量を測定したところ、それぞれが次の Y のような観測結果を得た。

Y[1]=-27.020, Y[2]=3.670 の2つの「外れ値」が推定結果にどのような影響を与えるだろうか？

In [3]:

Y = [-27.020, 3.570, 8.191, 9.808, 9.603, 9.945, 10.056]
n = length(Y);

4つの確率モデル¶

Normal distribution model:

$y \sim \mathrm{Normal}(\mu,\sigma) \tag{Normal}$

Cauchy distribution model:

$y \sim \mathrm{Cauchy}(\mu,\sigma) \sim \mu + \sigma\; \mathrm{Cauchy}() \tag{Cauchy}$

t-distribution model 1:

$y \sim \mu + \mathrm{TDist}(\nu) \tag{TDist1}$

t-distribution model 2:

$y \sim \mu + \sigma\; \mathrm{TDist}(\nu) \tag{TDist2}$

自由度 $\nu$ の t 分布の確率密度函数は次のように書ける:

$\displaystyle p(x|\nu) = \frac{1}{\sqrt{\nu}\, B(1/2, \nu/2)} \left( 1 + \frac{x^2}{\nu} \right)^{-(\nu + 1)/2}.$

$\nu=1$ の場合がCauchy分布である:

$\displaystyle p(\,x\;|\;\nu=1\,) = \frac{1}{\sqrt{\pi}}\frac{1}{1+x^2}.$

パラメーター $\nu$ が大きいほど $x=0$ の周辺に分布が集中するようになる。

In [4]:

f(nu, x) = pdf(TDist(nu),x)
x = collect(linspace(-10,10,400))

figure(figsize=(6,3.6))
for nu in [0.25,0.5,1.0,2.0,10.0]
    plot(x, f.(nu,x), label="\$\\nu = $nu\$", linewidth=0.8)
end
grid("on")
legend()
title("Probability density functions of t-distributions");

In [5]:

f(nu, x) = pdf(TDist(nu),x)
g(sigma, x) = pdf(Normal(0,sigma),x)

nu = 1
sigma = 1.5

x = collect(linspace(-10,10,400))
figure(figsize=(6,3.6))
plot(x, g.(sigma,x), label="Normal(0,\$\\sigma\$=$sigma)", linewidth=0.8, linestyle="-")
plot(x, f.(nu,x), label="TDist(\$\\nu\$=$nu)", linewidth=0.8, linestyle="--")
grid("on")
legend()
title("Probability density functions");

Cauchy分布およびその一般化である t 分布の裾野は重い(太い)。

正規分布では確率密度函数の裾野が指数函数的に $0$ に近付くが、 t 分布では $O(|x|^{\nu+1})$ のオーダーでしか減少しない( $\nu >0$ )。

4つの確率モデルを記述する函数の定義¶

In [6]:

logpdf_TDist1(mu,nu,x) = logpdf(TDist(nu), x-mu)
logpdf_TDist2(mu,nu,sigma,x) = logpdf(TDist(nu), (x-mu)/sigma) - log(sigma)

negloglik_Normal(mu,sigma)    = sigma > zero(sigma) ? -sum(logpdf(Normal(mu,sigma),Y[i])   for i in 1:n) : Inf
negloglik_Cauchy(mu,sigma)    = sigma > zero(sigma) ? -sum(logpdf(Cauchy(mu,sigma),Y[i])   for i in 1:n) : Inf
negloglik_TDist1(mu,nu)       = nu > zero(nu)       ? -sum(logpdf_TDist1(mu,nu,Y[i])       for i in 1:n) : Inf
negloglik_TDist2(mu,nu,sigma) = 
             (nu > zero(nu) && sigma > zero(sigma)) ? -sum(logpdf_TDist2(mu,nu,sigma,Y[i]) for i in 1:n) : Inf

negloglik_Normal(x) = negloglik_Normal(x[1],x[2])
negloglik_Cauchy(x) = negloglik_Cauchy(x[1],x[2])
negloglik_TDist1(x) = negloglik_TDist1(x[1],x[2])
negloglik_TDist2(x) = negloglik_TDist2(x[1],x[2],x[3])

lik_Normal(mu,sigma)    = exp(-negloglik_Normal(mu,sigma))
lik_Cauchy(mu,sigma)    = exp(-negloglik_Cauchy(mu,sigma))
lik_TDist1(mu,nu)       = exp(-negloglik_TDist1(mu,nu))
lik_TDist2(mu,nu,sigma) = exp(-negloglik_TDist2(mu,nu,sigma))
;

最尤法の実行¶

In [7]:

fit_Normal = fit_mle(Normal,Y)

Out[7]:

Normal{Float64}(μ=3.450428571428571, σ=12.620958836759735)

あえて optimize() 函数で実行してみる。

In [8]:

opt_Normal = optimize(negloglik_Normal, [10.0, 1.0])
@show opt_Normal

@show nmu    = opt_Normal.minimizer[1]
@show nsigma = opt_Normal.minimizer[2]
@show nmu_str = @sprintf("%.1f", nmu)
@show nsigma_str = @sprintf("%.0f", nsigma)

pred_Normal(x) = pdf(Normal(nmu, nsigma), x)
@show maxloglik_Normal = - negloglik_Normal(nmu, nsigma)
@show AIC_Normal = -2*maxloglik_Normal + 2*2

sleep(0.2)

mu = collect(linspace(-10,15,100))
sigma = collect(linspace(6,24,100))
L = lik_Normal.(mu',sigma)
cmap = "OrRd"
figure()
pcolormesh(mu, sigma, L, cmap=cmap, vmin=0.0)
colorbar(label="likelihood")
xlabel("\$\\mu\$")
ylabel("\$\\sigma\$")
grid("on", linestyle=":")
title("Model:  \$y \\sim \\mathrm{Normal}(\\mu,\\sigma)\$")
text(-9, 6.3, "Sample: $Y", fontsize=9)
scatter([nmu],[nsigma], label="MLE")
legend()

opt_Normal =  * Status: success

 * Candidate solution
    Minimizer: [3.45e+00, 1.26e+01]
    Minimum:   2.768008e+01

 * Found with
    Algorithm:     Nelder-Mead
    Initial Point: [1.00e+01, 1.00e+00]

 * Convergence measures
    √(Σ(yᵢ-ȳ)²)/n ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    41
    f(x) calls:    81

nmu = opt_Normal.minimizer[1] = 3.450141062827999
nsigma = opt_Normal.minimizer[2] = 12.620659842630458
nmu_str = #= In[8]:6 =# @sprintf("%.1f", nmu) = "3.5"
nsigma_str = #= In[8]:7 =# @sprintf("%.0f", nsigma) = "13"
maxloglik_Normal = -(negloglik_Normal(nmu, nsigma)) = -27.68008156065995
AIC_Normal = -2maxloglik_Normal + 2 * 2 = 59.3601631213199

Out[8]:

PyObject <matplotlib.legend.Legend object at 0x0000000059964C88>

In [9]:

opt_Cauchy = optimize(negloglik_Cauchy, [10.0, 1.0])
@show opt_Cauchy

@show cmu    = opt_Cauchy.minimizer[1]
@show csigma = opt_Cauchy.minimizer[2]
@show cmu_str = @sprintf("%.1f", cmu)
@show csigma_str = @sprintf("%.2f", csigma)

pred_Cauchy(x) = pdf(Cauchy(cmu, csigma), x)
@show maxloglik_Cauchy = - negloglik_Cauchy(cmu, csigma)
@show AIC_Cauchy = -2*maxloglik_Cauchy + 2*2

sleep(0.2)

mu = collect(linspace(8.5,11,100))
sigma = collect(linspace(eps(),2,100))
L = lik_Cauchy.(mu',sigma)
cmap = "OrRd"
figure()
pcolormesh(mu, sigma, L, cmap=cmap, vmin=0.0)
colorbar(label="likelihood")
xlabel("\$\\mu\$")
ylabel("\$\\sigma\$")
grid("on", linestyle=":")
title("Model:  \$y \\sim \\mu + \\sigma\\mathrm{Cauchy}()\$")
text(8.6, 0.05, "Sample: $Y", fontsize=9)
scatter([cmu],[csigma], label="MLE")
legend()

opt_Cauchy =  * Status: success

 * Candidate solution
    Minimizer: [9.81e+00, 4.09e-01]
    Minimum:   1.966369e+01

 * Found with
    Algorithm:     Nelder-Mead
    Initial Point: [1.00e+01, 1.00e+00]

 * Convergence measures
    √(Σ(yᵢ-ȳ)²)/n ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    39
    f(x) calls:    82

cmu = opt_Cauchy.minimizer[1] = 9.812920133887147
csigma = opt_Cauchy.minimizer[2] = 0.40897290161134037
cmu_str = #= In[9]:6 =# @sprintf("%.1f", cmu) = "9.8"
csigma_str = #= In[9]:7 =# @sprintf("%.2f", csigma) = "0.41"
maxloglik_Cauchy = -(negloglik_Cauchy(cmu, csigma)) = -19.663689546771376
AIC_Cauchy = -2maxloglik_Cauchy + 2 * 2 = 43.32737909354275

Out[9]:

PyObject <matplotlib.legend.Legend object at 0x000000005A05D358>

In [10]:

opt_TDist1 = optimize(negloglik_TDist1, [10.0, 1.0])
@show opt_TDist1

@show t1mu = opt_TDist1.minimizer[1]
@show t1nu = opt_TDist1.minimizer[2]
@show t1mu_str = @sprintf("%.1f", t1mu)
@show t1nu_str = @sprintf("%.2f", t1nu)

pred_TDist1(x) = pdf(TDist(t1nu), x-t1mu)
@show maxloglik_TDist1 = - negloglik_TDist1(t1mu, t1nu)
@show AIC_TDist1 = -2*maxloglik_TDist1 + 2*2

sleep(0.2)

mu = collect(linspace(8.5,11,100))
nu = collect(linspace(eps(),2,100))
L = lik_TDist1.(mu',nu)
cmap = "OrRd"
figure()
pcolormesh(mu, nu, L, cmap=cmap, vmin=0.0)
colorbar(label="likelihood")
xlabel("\$\\mu\$")
ylabel("\$\\nu\$")
grid("on", linestyle=":")
title("Model:  \$y \\sim \\mu + \\mathrm{TDist}(\\nu)\$")
text(8.6, 0.05, "Sample: $Y", fontsize=9)
scatter([t1mu],[t1nu], label="MLE")
legend()

opt_TDist1 =  * Status: success

 * Candidate solution
    Minimizer: [9.73e+00, 6.23e-01]
    Minimum:   1.982632e+01

 * Found with
    Algorithm:     Nelder-Mead
    Initial Point: [1.00e+01, 1.00e+00]

 * Convergence measures
    √(Σ(yᵢ-ȳ)²)/n ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    33
    f(x) calls:    70

t1mu = opt_TDist1.minimizer[1] = 9.728455866678203
t1nu = opt_TDist1.minimizer[2] = 0.6230116833132611
t1mu_str = #= In[10]:6 =# @sprintf("%.1f", t1mu) = "9.7"
t1nu_str = #= In[10]:7 =# @sprintf("%.2f", t1nu) = "0.62"
maxloglik_TDist1 = -(negloglik_TDist1(t1mu, t1nu)) = -19.826318428686324
AIC_TDist1 = -2maxloglik_TDist1 + 2 * 2 = 43.65263685737265

Out[10]:

PyObject <matplotlib.legend.Legend object at 0x000000005A0A2208>

In [11]:

opt_TDist2 = optimize(negloglik_TDist2, [10.0, 1.0, 1.0])
@show opt_TDist2

@show t2mu = opt_TDist2.minimizer[1]
@show t2nu = opt_TDist2.minimizer[2]
@show t2sigma = opt_TDist2.minimizer[3]
@show t2mu_str = @sprintf("%.1f", t2mu)
@show t2nu_str = @sprintf("%.2f", t2nu)
@show t2sigma_str = @sprintf("%.2f", t2sigma)

pred_TDist2(x) = pdf(TDist(t2nu), (x-t2mu)/t2sigma)/t2sigma
@show maxloglik_TDist2 = - negloglik_TDist2(t2mu, t2nu, t2sigma)
@show AIC_TDist2 = -2*maxloglik_TDist2 + 2*3

mu = collect(linspace(8.5,11,100))
nu = collect(linspace(eps(),2,100))
L = lik_TDist2.(mu', nu, t2sigma)
cmap = "OrRd"
figure()
pcolormesh(mu, nu, L, cmap=cmap, vmin=0.0)
colorbar(label="likelihood")
xlabel("\$\\mu\$")
ylabel("\$\\nu\$")
grid("on", linestyle=":")
title("Model:  \$y \\sim \\mu + \\sigma\\;\\mathrm{TDist}(\\nu)\$")
text(8.6, 1.65, "Sample: $Y", fontsize=9)
text(8.6, 1.85, "\$\\sigma = $t2sigma_str\$")
scatter([t2mu],[t2nu], label="MLE")
legend()

sleep(0.2)

mu = collect(linspace(8.5,11,100))
sigma = collect(linspace(eps(),1,100))
L = lik_TDist2.(mu', t2nu, sigma)
cmap = "OrRd"
figure()
pcolormesh(mu, sigma, L, cmap=cmap, vmin=0.0)
colorbar(label="likelihood")
xlabel("\$\\mu\$")
ylabel("\$\\sigma\$")
grid("on", linestyle=":")
title("Model:  \$y \\sim \\mu + \\sigma\\;\\mathrm{TDist}(\\nu)\$")
text(8.6, 0.85, "Sample: $Y", fontsize=9)
text(8.6, 0.92, "\$\\nu = $t2nu_str\$")
scatter([t2mu],[t2sigma], label="MLE")
legend()

opt_TDist2 =  * Status: success

 * Candidate solution
    Minimizer: [9.88e+00, 4.07e-01, 1.94e-01]
    Minimum:   1.776957e+01

 * Found with
    Algorithm:     Nelder-Mead
    Initial Point: [1.00e+01, 1.00e+00, 1.00e+00]

 * Convergence measures
    √(Σ(yᵢ-ȳ)²)/n ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    80
    f(x) calls:    154

t2mu = opt_TDist2.minimizer[1] = 9.87567510643497
t2nu = opt_TDist2.minimizer[2] = 0.4065330086470647
t2sigma = opt_TDist2.minimizer[3] = 0.19351765869024307
t2mu_str = #= In[11]:7 =# @sprintf("%.1f", t2mu) = "9.9"
t2nu_str = #= In[11]:8 =# @sprintf("%.2f", t2nu) = "0.41"
t2sigma_str = #= In[11]:9 =# @sprintf("%.2f", t2sigma) = "0.19"
maxloglik_TDist2 = -(negloglik_TDist2(t2mu, t2nu, t2sigma)) = -17.769565021161156
AIC_TDist2 = -2maxloglik_TDist2 + 2 * 3 = 41.53913004232231

Out[11]:

PyObject <matplotlib.legend.Legend object at 0x000000005EAEBF60>

比較¶

In [12]:

@show AIC_Normal
@show AIC_Cauchy
@show AIC_TDist1
@show AIC_TDist2;

AIC_Normal = 59.3601631213199
AIC_Cauchy = 43.32737909354275
AIC_TDist1 = 43.65263685737265
AIC_TDist2 = 41.53913004232231

AIC_TDist2 < AIC_Cauchy < AIC_TDist1 < AIC_Normal

AIC は小さい方がよい。

In [13]:

figure(figsize=(6,3.6))

x = collect(linspace(8.5,11.5,1000))
plot(x, pred_Normal.(x), label="Normal(\$\\mu=$nmu_str\$, \$\\sigma=$nsigma_str\$)", lw=0.8, "-")
plot(x, pred_Cauchy.(x), label="Cauchy(\$\\mu=$cmu_str\$, \$\\sigma=$csigma_str\$)", lw=1.2, ls=":")
plot(x, pred_TDist1.(x), label="$t1mu_str + TDist(\$\\nu=$t1nu_str\$)", lw=0.8, ls="--")
plot(x, pred_TDist2.(x), label="$t2mu_str + $t2sigma_str TDist(\$\\nu=$t2nu_str\$)", lw=0.8, ls="-.")
xlabel("y")
grid("on", linestyle=":")

y = zero(Y)
scatter(Y[4:n], y[4:n], s=15, marker="o", alpha=0.5, color="red", label="Sample")

legend(fontsize=9)

title("PDFs of the predictive distributions");

正規分布のモデルの予測分布は平べったく広範囲に広がってしまっている。これは外れ値の影響である。

それに対して、裾野が重い(太い)Cauchy分布やその一般化である t 分布を用いたモデルでは予測分布の幅が狭くなっている。そして、この場合には予測分布の幅が狭いほど AIC が小さくなっており、予測精度が高いと推測される。

In [14]:

figure(figsize=(6,3.6))

x = collect(linspace(2,9,1000))
plot(x, pred_Normal.(x), label="Normal(\$\\mu=$nmu_str\$, \$\\sigma=$nsigma_str\$)", lw=0.8, "-")
plot(x, pred_Cauchy.(x), label="Cauchy(\$\\mu=$cmu_str\$, \$\\sigma=$csigma_str\$)", lw=1.2, ls=":")
plot(x, pred_TDist1.(x), label="$t1mu_str + TDist(\$\\nu=$t1nu_str\$)", lw=0.8, ls="--")
plot(x, pred_TDist2.(x), label="$t2mu_str + $t2sigma_str TDist(\$\\nu=$t2nu_str\$)", lw=0.8, ls="-.")
xlabel("y")
grid("on", linestyle=":")

y = zero(Y)
scatter(Y[2:3], y[2:3], s=15, marker="o", alpha=0.5, color="red", label="Sample")

legend(fontsize=9)

title("PDFs of the predictive distributions");

In [15]:

figure(figsize=(6,3.6))

x = collect(linspace(-40,-20,1000))
plot(x, pred_Normal.(x), label="Normal(\$\\mu=$nmu_str\$, \$\\sigma=$nsigma_str\$)", lw=0.8, "-")
plot(x, pred_Cauchy.(x), label="Cauchy(\$\\mu=$cmu_str\$, \$\\sigma=$csigma_str\$)", lw=1.2, ls=":")
plot(x, pred_TDist1.(x), label="$t1mu_str + TDist(\$\\nu=$t1nu_str\$)", lw=0.8, ls="--")
plot(x, pred_TDist2.(x), label="$t2mu_str + $t2sigma_str TDist(\$\\nu=$t2nu_str\$)", lw=0.8, ls="-.")
xlabel("y")
grid("on", linestyle=":")

y = zero(Y)
scatter(Y[1:1], y[1:1], s=15, marker="o", alpha=0.5, color="red", label="Sample")

legend(fontsize=9)

title("PDFs of the predictive distributions");

In [16]:

figure(figsize=(6,3.6))

x = collect(linspace(-40,40,1000))
plot(x, pred_Normal.(x), label="Normal(\$\\mu=$nmu_str\$, \$\\sigma=$nsigma_str\$)", lw=0.8, "-")
#plot(x, pred_Cauchy.(x), label="Cauchy(\$\\mu=$cmu_str\$, \$\\sigma=$csigma_str\$)", lw=1.2, ls=":")
#plot(x, pred_TDist1.(x), label="$t1mu_str + TDist(\$\\nu=$t1nu_str\$)", lw=0.8, ls="--")
#plot(x, pred_TDist2.(x), label="$t2mu_str + $t2sigma_str TDist(\$\\nu=$t2nu_str\$)", lw=0.8, ls="-.")
xlabel("y")
grid("on", linestyle=":")

y = zero(Y)
y[5] = 0.001
y[6] = 0.002
y[7] = 0.003
scatter(Y, y, s=15, marker="o", alpha=0.5, color="red", label="Sample")

legend(fontsize=9)

title("PDFs of the predictive distribution");

階層モデルとの関係¶

t 分布が出て来る仕組み¶

平均 $0$ , 分散 $u$ 正規分布の確率密度函数は次のように書ける:

$p(z|u) = \frac{1}{\sqrt{2\pi v}} e^{-z^2/(2u)}.$

逆ガンマ分布の確率密度函数は次のように書ける:

$\displaystyle p(u|\alpha,\theta) = \frac{\theta^\alpha}{\Gamma(\alpha)} e^{-\theta/u} u^{-\alpha-1}.$

確率変数 $u$ がこの確率密度函数の逆ガンマ分布に従うことと、その逆数 $t=u^{-1}$ が次の確率密度函数のガンマ分布に従うことは同値である:

$\displaystyle p(t|\alpha,\theta^{-1}) = \frac{\theta^\alpha}{\Gamma(\alpha)} e^{-\theta t} t^{\alpha-1}.$

これが逆ガンマ分布の名前の由来である。

パラメーター $\alpha,\theta$ が与えられており、確率変数 $z$ は平均 $0$ , 確率変数の分散 $u$ の正規分布に従い、確率変数 $u$ は逆ガンマ分布に従うと仮定する:

$\displaystyle z \sim \mathrm{Normal}(0,\sigma^2=u), \qquad u \sim \mathrm{InverseGamma}(\alpha,\theta).$

このような確率モデルは階層モデルと呼ばれている。測定誤差 $z$ は正規分布に従うが、その分散 $u$ は科学者ごとによって違っており、科学者ごとの分散 $u$ は逆ガンマ分布にしたがっているばらついていると考えるのである。一般に「パラメーターに個人差があるモデル」は階層モデルで記述される。

このとき、 $z$ が従う確率分布の確率密度函数は次の形になる:

$\displaystyle p(z|\alpha,\theta)= \int_0^\infty p(z|u)p(u|\alpha,\theta)\,du = \frac{1}{\sqrt{2\theta}\;B(1/2,\alpha)}\left(1+\frac{z^2}{2\theta}\right)^{-(\alpha+1/2)}.$

証明積分変数を $u=1/t$ と置換すれば欲しい結果が得られる:

$\displaystyle \begin{aligned} p(z|\alpha,\theta) &= \int_0^\infty \frac{1}{\sqrt{2\pi u}}e^{-z^2/(2u)} \frac{\theta^\alpha}{\Gamma(\alpha)} e^{-\theta/u} u^{-\alpha-1}\,du \\ &= \frac{\theta^\alpha}{\sqrt{2\pi}\,\Gamma(\alpha)} \int_0^\infty e^{-\theta(1+z^2/(2\theta))/u} u^{-\alpha-1/2-1}\,du \\ &=\frac{\theta^\alpha}{\sqrt{2\pi}\,\Gamma(\alpha)} \int_0^\infty e^{-\theta(1+z^2/(2\theta))t} t^{\alpha+1/2-1}\,dt \\ &=\frac{\theta^\alpha}{\sqrt{2\pi}\,\Gamma(\alpha)} \theta^{-\alpha-1/2} \left(1+\frac{z^2}{2\theta}\right)^{-\alpha-1/2} \Gamma(\alpha+1/2) \\ &=\frac{\Gamma(\alpha+1/2)}{\sqrt{2\theta}\,\sqrt{\pi}\,\Gamma(\alpha)} \left(1+\frac{z^2}{2\theta}\right)^{-\alpha-1/2} \\ &=\frac{1}{\sqrt{2\theta}\,B(1/2,\alpha)} \left(1+\frac{z^2}{2\theta}\right)^{-\alpha-1/2}. \end{aligned}$

以上の1つ目の等号は $p(z|\alpha,\theta)$ の定義。3つ目の等号は積分変数の置換。4つ目の等号で次の公式を使った( $t=x/\theta$ と置換すれば容易に証明される):

$\displaystyle \int_0^\infty e^{-at} t^{s-1}\,dt = a^{-s}\Gamma(s).$

最後の6つ目の等号で次の公式を使った:

$\displaystyle \Gamma(1/2)=\sqrt{\pi}, \qquad B(p,q) = \frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)}.$

これらの公式はよく使われる。ガンマ函数とベータ函数については次のノートの第8節にも解説がある。

https://genkuroki.github.io/documents/20160501StirlingFormula.pdf

以上によって上の公式が証明された。q.e.d.

$\alpha=\theta=\nu/2$ の場合の

$\displaystyle p(z|\nu):= p\biggl(z\;\bigg|\;\alpha=\frac{\nu}{2},\;\theta=\frac{\nu}{2}\biggr) = \frac{1}{\sqrt{\nu}\;B(1/2,\nu/2)} \left(1+\frac{z^2}{\nu}\right)^{-(\nu+1)/2}$

は自由度 $\nu$ の t 分布の確率密度函数になる。これ以外の場合において、 $z$ が従う確率分布は t 分布の一般化になっている。

パラメーターが $\alpha=\nu/2$ , $\theta=1/2$ のガンマ分布は自由度 $\nu$ のカイ二乗分布と呼ばれている。もしも $Y$ が自由度 $\nu$ のカイ二乗分布に従うならば、 $Y/\nu$ はパラメーター $\alpha=\theta=\nu/2$ のガンマ分布に従う。ゆえに、 $Z$ が標準正規分布に従う確率変数で $Y$ が自由度 $\nu$ のカイ二乗分布に従う確率変数ならば、 $1/(Y/\nu)$ はパラメーター $\alpha=\theta=\nu/2$ の逆ガンマ分布に従うので、 $Z\big/\!\sqrt{Y/\nu}$ は自由度 $\nu$ の t 分布に従う。

$\alpha=\theta=\nu/2$ と仮定する。このとき $y = \mu + \sigma z$ は TDist2 の場合の確率モデルに一致し、その確率密度函数は

$\displaystyle p(y\,|\,\mu,\nu,\sigma) = \frac{1}{\sigma}p\biggl(\frac{y-\mu}{\sigma}\,\bigg|\,\nu\biggr) = \frac{1}{\sqrt{\nu\sigma^2}}\frac{1}{B(1/2,\nu/2)} \left(1+\frac{(y-\mu)^2}{\nu\sigma^2}\right)^{-(\nu+1)/2} = p\biggl(z=y-\mu\;\bigg|\;\alpha=\frac{\nu}{2},\,\theta=\frac{\nu}{2}\sigma^2\,\biggr).$

と書ける。 $\sigma=1$ とおけば TDist1 の場合の確率モデルが得られ、 $\nu=1$ とおけば Cauchy の場合の確率モデルが得られる。このようにして、Cauchy, TDist1, TDist2 は階層モデルとみなせる。我々は階層モデル内部の変数 $u$ について積分した結果を用いて、最尤法でモデルを解いたのである。

ベイズ統計におけるMCMCの手法を使えば、階層モデル内部の変数 $u$ について積分せずに、モデルを解くことができる。一般に内部変数 $u$ で積分した結果はよく知られている確率密度函数になるとは限らない。MCMCを使えばそのような場合も含めて同じ方法で解くことができる。その分だけ使えるモデルの選択肢が増える。

しかし、このノートが扱っている事例では、わざわざベイズ統計のMCMCの手法に頼らなくても、最尤法だけで十分に面白い結果が得られていると思う。

結論：明らかな外れ値がある場合には裾野が重い(太い)分布を用いた確率モデルが有効である。

階層モデルの立場から見ると、分散に個性を与えたので、裾野が重い分布が自然に出て来たことになる。

外れ値を出すような科学者は「実験に向かないが、理論面では優れている」というようなタイプの個性的な科学者なのだろう。

階層モデルにおける正規分布の分散と標準偏差の分布をプロット¶

正規分布モデルは分散の分布がデルタ分布になるようなモデルであるとみなせる。

それ以外のCauchy分布を使ったモデルとt分布を使った2つのモデルは、正規分布の分散が逆ガンマ分布に従っていると考える階層モデルと同値になる。以下では、それら3つのモデルについて、分散の予測分布をプロットすることにしよう。

我々が採用しているパラメーター名 $\mu,\nu,\sigma$ と階層モデルの関係は以下の通り:

$\displaystyle y \sim \mathrm{Normal}(\mathrm{mean}=\mu,\; \mathrm{var}=u), \qquad u \sim \mathrm{InverseGamma}\left(\alpha = \frac{\nu}{2},\; \theta = \frac{\nu}{2}\sigma^2\right).$

TDist2 モデルに対応する階層モデルの分散 $u$ が従う分布は $\mathrm{InverseGamma}\left(\alpha = \nu/2,\; \theta = \sigma^2\nu/2\right)$ になる。 $\nu = 1$ とおくと Cauchy モデルの場合になり、 $\sigma=1$ とおくと Tdist1 の場合になる。

後悔: $\sigma$ の代わりに $\rho$ などを使っておけば、分散に $\sigma^2$ の記号を割り振れた。

In [17]:

@show csigma
@show t1nu
@show t2nu, t2sigma

csigma = 0.40897290161134037
t1nu = 0.6230116833132611
(t2nu, t2sigma) = (0.4065330086470647, 0.19351765869024307)

Out[17]:

(0.4065330086470647, 0.19351765869024307)

In [18]:

MyInvGamma(nu, sigma) = InverseGamma(nu/2,nu*sigma^2/2)
InvGamma_Cauchy = MyInvGamma(1.0, csigma)
InvGamma_TDist1 = MyInvGamma(t1nu, 1.0)
InvGamma_TDist2 = MyInvGamma(t2nu, t2sigma)

@show InvGamma_Cauchy
@show InvGamma_TDist1
@show InvGamma_TDist2

@show mean(InvGamma_Cauchy)
@show mean(InvGamma_TDist1)
@show mean(InvGamma_TDist2)
println()
@show median(InvGamma_Cauchy)
@show median(InvGamma_TDist1)
@show median(InvGamma_TDist2)
println()
@show mode(InvGamma_Cauchy)
@show mode(InvGamma_TDist1)
@show mode(InvGamma_TDist2)
;

InvGamma_Cauchy = InverseGamma{Float64}(
invd: Gamma{Float64}(α=0.5, θ=11.95751488367983)
θ: 0.08362941712619955
)

InvGamma_TDist1 = InverseGamma{Float64}(
invd: Gamma{Float64}(α=0.31150584165663053, θ=3.2102126710750065)
θ: 0.31150584165663053
)

InvGamma_TDist2 = InverseGamma{Float64}(
invd: Gamma{Float64}(α=0.20326650432353235, θ=131.3690258787557)
θ: 0.00761214444052382
)

mean(InvGamma_Cauchy) = Inf
mean(InvGamma_TDist1) = Inf
mean(InvGamma_TDist2) = Inf

median(InvGamma_Cauchy) = 0.3676532054863362
median(InvGamma_TDist1) = 3.8628662130139606
median(InvGamma_TDist2) = 0.3459159601977134

mode(InvGamma_Cauchy) = 0.0557529447507997
mode(InvGamma_TDist1) = 0.23751769284012603
mode(InvGamma_TDist2) = 0.006326233143839828

In [19]:

pdfvar_Cauchy(x) = pdf(InvGamma_Cauchy, x)
pdfvar_TDist1(x) = pdf(InvGamma_TDist1, x)
pdfvar_TDist2(x) = pdf(InvGamma_TDist2, x)

figure(figsize=(12,4))

subplot2grid((1,20), (0,0), rowspan=1, colspan=10)
xmax = 2.0
x = collect(linspace(eps(),xmax,1000))
plot([0],[0], label="Normal: \$\\delta_{$nsigma_str^2}\$", lw=1)
plot(x, pdfvar_Cauchy.(x), label="Cauchy: $csigma inv\$\\Gamma(1/2,$csigma_str^2/2)\$", lw=1.2, ls=":")
plot(x, pdfvar_TDist1.(x), label="TDist1: inv\$\\Gamma($t1nu_str/2, $t1nu_str/2)\$", lw=0.8, ls="--")
plot(x, pdfvar_TDist2.(x), label="TDist2: inv\$\\Gamma($t2nu_str/2, $t2nu_str\\cdot$t2sigma_str^2/2)\$", lw=0.8, ls="-.")
xlabel("variance of the normal distribution")
ylabel("probability density")
grid("on", ls=":")
legend(fontsize=9)
title("The predictive distributions of variances", fontsize=10)

subplot2grid((1,20), (0,10), rowspan=1, colspan=10)
x = collect(linspace(8.5,11.5,1000))
plot(x, pred_Normal.(x), label="Normal(\$\\mu=$nmu_str\$, \$\\sigma=$nsigma_str\$)", lw=0.8, "-")
plot(x, pred_Cauchy.(x), label="Cauchy(\$\\mu=$cmu_str\$, \$\\sigma=$csigma_str\$)", lw=1.2, ls=":")
plot(x, pred_TDist1.(x), label="$t1mu_str + TDist(\$\\nu=$t1nu_str\$)", lw=0.8, ls="--")
plot(x, pred_TDist2.(x), label="$t2mu_str + $t2sigma_str TDist(\$\\nu=$t2nu_str\$)", lw=0.8, ls="-.")
xlabel("y")
grid("on", linestyle=":")
legend(fontsize=9)
title("PDFs of the predictive distributions")

tight_layout();

分散の確率密度函数のプロットでは様子がいまいちわからないので、累積分布函数をプロットしよう。

In [20]:

InvGamma(nu, sigma) = InverseGamma(nu/2,nu*sigma^2/2)
cdfvar_Normal(x) = ifelse(x < nsigma^2, 0.0, 1.0)
cdfvar_Cauchy(x) = cdf(InvGamma_Cauchy, x)
cdfvar_TDist1(x) = cdf(InvGamma_TDist1, x)
cdfvar_TDist2(x) = cdf(InvGamma_TDist2, x)

figure(figsize=(12,4))

subplot2grid((1,20), (0,0), rowspan=1, colspan=10)
x = collect(linspace(eps(),10.0,1000))
plot(x, cdfvar_Normal.(x.^2), label="Normal: \$\\delta_{$nsigma_str^2}\$", lw=0.8, ls="-")
plot(x, cdfvar_Cauchy.(x.^2), label="Cauchy: inv\$\\Gamma(1/2,$csigma_str^2/2)\$", lw=1.2, ls=":")
plot(x, cdfvar_TDist1.(x.^2), label="TDist1: inv\$\\Gamma($t1nu_str/2, $t1nu_str/2)\$", lw=0.8, ls="--")
plot(x, cdfvar_TDist2.(x.^2), label="TDist2: inv\$\\Gamma($t2nu_str/2, $t2sigma_str^2\\cdot$t2nu_str/2)\$", lw=0.8, ls="-.")
xlabel("standard deviation of the normal distribution")
ylabel("cumulative probability")
grid("on", ls=":")
legend(fontsize=9)
title("The predictive distributions of standard deviations", fontsize=10)

subplot2grid((1,20), (0,10), rowspan=1, colspan=10)
x = collect(linspace(eps(),40,1000))
plot(x, cdfvar_Normal.(x.^2), label="Normal: \$\\delta_{$nsigma_str^2}\$", lw=0.8, ls="-")
plot(x, cdfvar_Cauchy.(x.^2), label="Cauchy: inv\$\\Gamma(1/2,$csigma_str^2/2)\$", lw=1.2, ls=":")
plot(x, cdfvar_TDist1.(x.^2), label="TDist1: inv\$\\Gamma($t1nu_str/2, $t1nu_str/2)\$", lw=0.8, ls="--")
plot(x, cdfvar_TDist2.(x.^2), label="TDist2: inv\$\\Gamma($t2nu_str/2, $t2sigma_str^2\\cdot$t2nu_str/2)\$", lw=0.8, ls="-.")
xlabel("standard deviation of the normal distribution")
ylabel("cumulative probability")
grid("on", ls=":")
legend(fontsize=9)
title("The predictive distributions of standard deviations", fontsize=10)

tight_layout();

正規分布の標準偏差の累積確率は、測定誤差が95%以上の確率で $2x$ 以内に収まる科学者の割合を意味している。

正規分布モデルでは科学者全員の測定誤差の意味での標準偏差が13程度であるという結論になってしまっている。

それ以外の裾野が重い分布を使ったモデルでは、測定誤差の小さな科学者も測定誤差の大きな科学者も両方いるという結論になっている。

In [21]:

InvGamma(nu, sigma) = InverseGamma(nu/2,nu*sigma^2/2)
cdfvar_Normal(x) = ifelse(x < nsigma^2, 0.0, 1.0)
cdfvar_Cauchy(x) = cdf(InvGamma_Cauchy, x)
cdfvar_TDist1(x) = cdf(InvGamma_TDist1, x)
cdfvar_TDist2(x) = cdf(InvGamma_TDist2, x)

figure(figsize=(12,4))

subplot2grid((1,20), (0,0), rowspan=1, colspan=10)
x = collect(linspace(eps(),2.0,1000))
plot(x, cdfvar_Normal.(x.^2), label="Normal: \$\\delta_{$nsigma_str^2}\$", lw=0.8, ls="-")
plot(x, cdfvar_Cauchy.(x.^2), label="Cauchy: inv\$\\Gamma(1/2,$csigma_str^2/2)\$", lw=1.2, ls=":")
plot(x, cdfvar_TDist1.(x.^2), label="TDist1: inv\$\\Gamma($t1nu_str/2, $t1nu_str/2)\$", lw=0.8, ls="--")
plot(x, cdfvar_TDist2.(x.^2), label="TDist2: inv\$\\Gamma($t2nu_str/2, $t2sigma_str^2\\cdot$t2nu_str/2)\$", lw=0.8, ls="-.")
xlabel("standard deviation of the normal distribution")
ylabel("cumulative probability")
grid("on", ls=":")
legend(fontsize=9)
title("The predictive distributions of standard deviations", fontsize=10)

subplot2grid((1,20), (0,10), rowspan=1, colspan=10)
x = collect(linspace(15,40,1000))
plot(x, cdfvar_Normal.(x.^2), label="Normal: \$\\delta_{$nsigma_str^2}\$", lw=0.8, ls="-")
plot(x, cdfvar_Cauchy.(x.^2), label="Cauchy: inv\$\\Gamma(1/2,$csigma_str^2/2)\$", lw=1.2, ls=":")
plot(x, cdfvar_TDist1.(x.^2), label="TDist1: inv\$\\Gamma($t1nu_str/2, $t1nu_str/2)\$", lw=0.8, ls="--")
plot(x, cdfvar_TDist2.(x.^2), label="TDist2: inv\$\\Gamma($t2nu_str/2, $t2sigma_str^2\\cdot$t2nu_str/2)\$", lw=0.8, ls="-.")
xlabel("standard deviation of the normal distribution")
ylabel("cumulative probability")
grid("on", ls=":")
legend(fontsize=9)
title("The predictive distributions of standard deviations", fontsize=10)

tight_layout();

TDist2 のモデルは、Cauchy のモデルよりも測定誤差(標準偏差)が 0.25 以下の科学者はずっとたくさんいるが、分散が 5.0 以上の科学者もずっとたくさんいると予想している。

TDist2 のモデルは、 TDist1 のモデルよりも測定誤差(標準偏差)が 30 以上の科学者が少し多いと予想している

このように TDist2 のモデルは測定誤差が小さな科学者がたくさんいると同時に、測定誤差が極端に大きな科学者が結構いることを否定しない予測を与えている。 TDist2 のモデルはこのように外れ値の問題を処理している。

In [22]:

@show InvGamma_Cauchy.invd.α, InvGamma_Cauchy.θ
@show InvGamma_TDist1.invd.α, InvGamma_TDist1.θ
@show InvGamma_TDist2.invd.α, InvGamma_TDist2.θ
;

(InvGamma_Cauchy.invd.α, InvGamma_Cauchy.θ) = (0.5, 0.08362941712619955)
(InvGamma_TDist1.invd.α, InvGamma_TDist1.θ) = (0.31150584165663053, 0.31150584165663053)
(InvGamma_TDist2.invd.α, InvGamma_TDist2.θ) = (0.20326650432353235, 0.00761214444052382)

逆ガンマ分布では α が小さいほど裾野が太くなり、θ が小さいほど分布が小さな値に集中する。

TDist2 のモデルでは実際にそのような解が実現されていることがわかる。

交差検証¶

素朴な方法で一個抜き出し交差検証(leave-one-out cross-validation, LOOCV)をして実行してみることにする。

以下では一個抜き出し交差検証 LOOCV を、サンプル $Y_1,\ldots,Y_n$ から $Y_i$ の一個を抜いてできるデータ得られる予測分布の確率密度函数 $p_i(y)$ を作り、そこにサンプル $Y_i$ を代入して得られる尤度 $p_i(Y_i)$ の対数の和の $-2$ 倍として計算した。

尤度達 $p_i(Y_i)$ の値が大きい方が、 $Y_i$ を除いたデータを使った $Y_i$ の予測が当たっていることになるので、 $\log p_i(Y_i)$ 達は大きい方がよいということになる。すなわちLOOCVの値は小さい方がよいということになる。

必要な函数の定義¶

In [23]:

function constpred_Normal(Y)
    f = fit_mle(Normal,Y)
    pred(x) = pdf(Normal(f.μ,f.σ),x)
    return pred
end

function maxloglikelihood_Normal(Y)
    pred = constpred_Normal(Y)
    n = length(Y)
    maxloglik = sum(log(pred(Y[i])) for i in 1:n)
    return maxloglik
end

aic_Normal(Y) = -2*maxloglikelihood_Normal(Y) + 2*2

function loocv_Normal(Y)
    loocv = 0.0
    n = length(Y)
    for i in 1:n
        pred = constpred_Normal(Y[[1:i-1;i+1:n]])
        loocv += log(pred(Y[i]))
    end
    return -2*loocv
end

@show -2*maxloglikelihood_Normal(Y)
@show aic_Normal(Y)
@show loocv_Normal(Y);

-2 * maxloglikelihood_Normal(Y) = 55.36016310982963
aic_Normal(Y) = 59.36016310982963
loocv_Normal(Y) = 285.30604443807385

正規分布モデルでは LOOCV の値が跳ね上がっている。その理由は後で計算して確認するが、外れ値の存在が原因である。

In [24]:

negloglik_Cauchy(Y,mu,sigma) = sigma > zero(sigma) ? -sum(logpdf(Cauchy(mu,sigma),Y[i]) for i in 1:length(Y)) : Inf
negloglik_Cauchy(Y,x) = negloglik_Cauchy(Y,x[1],x[2])
   
function constpred_Cauchy(Y; init=[0.0, 1.0])
    negloglik(x) = negloglik_Cauchy(Y,x)
    result = optimize(negloglik, init)
    a = result.minimizer
    pred(x) = pdf(Cauchy(a[1],a[2]),x)
    return pred
end

function maxloglikelihood_Cauchy(Y)
    pred = constpred_Cauchy(Y)
    n = length(Y)
    maxloglik = sum(log(pred(Y[i])) for i in 1:n)
    return maxloglik
end

aic_Cauchy(Y) = -2*maxloglikelihood_Cauchy(Y) + 2*2

function loocv_Cauchy(Y)
    loocv = 0.0
    n = length(Y)
    for i in 1:n
        pred = constpred_Cauchy(Y[[1:i-1;i+1:n]])
        loocv += log(pred(Y[i]))
    end
    return -2*loocv
end

@show -2*maxloglikelihood_Cauchy(Y)
@show aic_Cauchy(Y)
@show loocv_Cauchy(Y);

-2 * maxloglikelihood_Cauchy(Y) = 39.32737909640757
aic_Cauchy(Y) = 43.32737909640757
loocv_Cauchy(Y) = 46.94549130671414

In [25]:

logpdf_TDist1(mu,nu,x) = nu > zero(nu) ? logpdf(TDist(nu), x-mu) : -Inf
negloglik_TDist1(Y,mu,nu) = -sum(logpdf_TDist1(mu,nu,Y[i]) for i in 1:length(Y))
negloglik_TDist1(Y,x) = negloglik_TDist1(Y,x[1],x[2])

function constpred_TDist1(Y; init=[0, 1.0])
    negloglik(x) = negloglik_TDist1(Y,x)
    result = optimize(negloglik, init)
    a = result.minimizer
    pred(x) = exp(logpdf_TDist1(a[1],a[2],x))
    return pred
end

function maxloglikelihood_TDist1(Y)
    pred = constpred_TDist1(Y)
    n = length(Y)
    maxloglik = sum(log(pred(Y[i])) for i in 1:n)
    return maxloglik
end

aic_TDist1(Y) = -2*maxloglikelihood_TDist1(Y) + 2*2

function loocv_TDist1(Y)
    loocv = 0.0
    n = length(Y)
    for i in 1:n
        pred = constpred_TDist1(Y[[1:i-1;i+1:n]])
        loocv += log(pred(Y[i]))
    end
    return -2*loocv
end

@show -2*maxloglikelihood_TDist1(Y)
@show aic_TDist1(Y)
@show loocv_TDist1(Y);

-2 * maxloglikelihood_TDist1(Y) = 39.652636852568655
aic_TDist1(Y) = 43.652636852568655
loocv_TDist1(Y) = 43.90239974526162

In [26]:

logpdf_TDist2(mu,nu,sigma,x) = (nu > zero(nu) && sigma > zero(nu)) ? logpdf(TDist(nu),(x-mu)/sigma)-log(sigma) : -Inf
negloglik_TDist2(Y,mu,nu,sigma) = -sum(logpdf_TDist2(mu,nu,sigma,Y[i]) for i in 1:length(Y))
negloglik_TDist2(Y,x) = negloglik_TDist2(Y,x[1],x[2],x[3])

function constpred_TDist2(Y; init=[0.0, 1.0, 1.0])
    negloglik(x) = negloglik_TDist2(Y,x)
    result = optimize(negloglik, init)
    a = result.minimizer
    pred(x) = exp(logpdf_TDist2(a[1],a[2],a[3],x))
    return pred
end

function maxloglikelihood_TDist2(Y)
    pred = constpred_TDist2(Y)
    n = length(Y)
    maxloglik = sum(log(pred(Y[i])) for i in 1:n)
    return maxloglik
end

aic_TDist2(Y) = -2*maxloglikelihood_TDist2(Y) + 2*3

function loocv_TDist2(Y)
    loocv = 0.0
    n = length(Y)
    for i in 1:n
        pred = constpred_TDist2(Y[[1:i-1;i+1:n]])
        loocv += log(pred(Y[i]))
    end
    return -2*loocv
end

@show -2*maxloglikelihood_TDist2(Y)
@show aic_TDist2(Y)
@show loocv_TDist2(Y);

-2 * maxloglikelihood_TDist2(Y) = 35.53912999430733
aic_TDist2(Y) = 41.53912999430733
loocv_TDist2(Y) = 41.79445861646667

4つのモデルの交差検証結果の比較¶

In [27]:

println("-- Normal --")
@show aic_Normal(Y)
@show loocv_Normal(Y)
println("\n-- Cauchy --")
@show aic_Cauchy(Y)
@show loocv_Cauchy(Y)
println("\n-- TDist1 --")
@show aic_TDist1(Y)
@show loocv_TDist1(Y)
println("\n-- TDist2 --")
@show aic_TDist2(Y)
@show loocv_TDist2(Y)
;

-- Normal --
aic_Normal(Y) = 59.36016310982963
loocv_Normal(Y) = 285.30604443807385

-- Cauchy --
aic_Cauchy(Y) = 43.32737909640757
loocv_Cauchy(Y) = 46.94549130671414

-- TDist1 --
aic_TDist1(Y) = 43.652636852568655
loocv_TDist1(Y) = 43.90239974526162

-- TDist2 --
aic_TDist2(Y) = 41.53912999430733
loocv_TDist2(Y) = 41.79445861646667

正規分布モデルの LOOCV の値だけが突出して高くなってしまっている。その原因は以下の計算を見ればわかるように外れ値 Y[1] の存在である。以下で計算するのは $-2\log p_i(Y_i)$ の表である。

In [28]:

p_Normal = [constpred_Normal(Y[[1:i-1;i+1:length(Y)]]) for i in 1:n]
loss_Normal = [-2*log(p_Normal[i](Y[i])) for i in 1:n]

Out[28]:

7-element Array{Float64,1}:
 241.70434486508626  
   7.062835168631006 
   7.2075117327291505
   7.328642753072676 
   7.311018810934502 
   7.340804529291411 
   7.350886578328783

正規分布モデルでは、外れ値 Y[1] を除いたデータを使って Y[1] を予測すると損失 $-2\log p_1(Y[1])$ が極端に大きくなる。そして、その外れ値の存在のせいで、外れ値以外の Y[i] を除いたデータを使って Y[i] を予測する場合にも損失 $-2\log p_i(Y[i])$ が十分に小さくならなくなってしまっている。

In [29]:

p_Cauchy = [constpred_Cauchy(Y[[1:i-1;i+1:length(Y)]]) for i in 1:n]
loss_Cauchy = [-2*log(p_Cauchy[i](Y[i])) for i in 1:n]

Out[29]:

7-element Array{Float64,1}:
 19.435742887322597 
 12.36774771563313  
  7.117935991639482 
  2.0322611177697802
  1.6380342696760275
  2.154343403713156 
  2.199425920959965

In [30]:

p_TDist1 = [constpred_TDist1(Y[[1:i-1;i+1:length(Y)]]) for i in 1:n]
loss_TDist1 = [-2*log(p_TDist1[i](Y[i])) for i in 1:n]

Out[30]:

7-element Array{Float64,1}:
 18.365227169595087 
  9.249568148206372 
  5.324944064251271 
  2.5691773881335958
  2.650495301137213 
  2.7602899747464233
  2.98269769919166

In [31]:

p_TDist2 = [constpred_TDist2(Y[[1:i-1;i+1:length(Y)]]) for i in 1:n]
loss_TDist2 = [-2*log(p_TDist2[i](Y[i])) for i in 1:n]

Out[31]:

7-element Array{Float64,1}:
 16.865403278740125 
 10.835388149475932 
  7.082128732986169 
  1.010609357854548 
  2.9774112991147144
  1.0999102039677469
  1.9236075943274393

裾野が重い分布を使った確率モデルは外れ値をうまく処理していることがわかる。

一個抜き出し交差検証の様子をプロット¶

In [32]:

function cyc_color(i)
    colorlist = [
        "#1f77b4",
        "#ff7f0e",
        "#2ca02c",
        "#d62728",
        "#9467bd",
        "#8c564b",
        "#e377c2",
        "#7f7f7f",
        "#bcbd22",
        "#17becf",
    ]
    n = length(colorlist)
    return colorlist[mod1(i,n)]
end

function cyc_style(i)
    stylelist = [
        "dashed",
        "dashdot",
        "dotted",
        (0, (4.5, 1.5, 1, 1.5, 1, 1.5, 1, 1.5)),
        (0, (3, 1.5, 1, 1.5, 3, 1.5, 1, 1.5)),
        (0, (2.5, 1.5, 2.5, 1.5, 2.5, 1.5, 1, 1.5)),
        (0, (2.5, 1, 2.5, 1, 2.5, 1, 2.5, 1)),
        (0, (5, 1.5, 2, 1.5, 2, 1.5)),
        (0, (5, 2, 2)),
        (0, (6, 3)),
    ]
    n = length(stylelist)
    return stylelist[mod1(i,n)]
end

Out[32]:

cyc_style (generic function with 1 method)

In [33]:

@assert n == 7

figure(figsize=(12,5))
x = collect(linspace(-30,40,1000))

shape = (17,32)
loc(i) = (1+8*div(i,4),8*mod(i,4))
rs, cs = 8, 8

ax = subplot2grid(shape, loc(0), rowspan=rs, colspan=cs)
plot(x, -2*log.(pred_Normal.(x)), label="\$-2\\log p_\\ast\$", color="black", ls="-")
scatter(Y, -2*log.(pred_Normal.(Y)), color="black")
grid("on", linestyle=":")
legend()

for i in 1:n
    ax = subplot2grid(shape, loc(i), rowspan=rs, colspan=cs)
    plot(x, -2*log.(p_Normal[i].(x)), label="$i", color=cyc_color(i), ls=cyc_style(i))
    scatter([Y[i]], [-2*log(p_Normal[i](Y[i]))], color=cyc_color(i))
    grid("on", linestyle=":")
    legend()
end

suptitle("-2 log(PDFs of the leave-one-out predictive distributions for the Normal distribution model)")

tight_layout()

図の読み方

左上の黒色のグラフはサンプル全体から作った予測分布の対数の $-2$ 倍であり、黒丸はサンプルである。

それ以外の色の付いた点線のグラフは Y[i] を除いたデータから作った予測分布のプロットであり、色丸は Y[i] を意味している。色丸の位置が下にあるほどその予測分布による Y[i] の予測精度は高いと考えられる。

正規分布モデルのケースでは、外れ値の Y[1] 抜きのデータから作った予測分布のケースだけ縦軸の縮尺のオーダーが違っていることに注意せよ。お椀の底の位置も右側にずれている。正規分布モデルでは外れ値を抜くと予測分布が大幅に変わってしまう。

In [34]:

@assert n == 7

figure(figsize=(12,5))
x = collect(linspace(-30,40,1000))

shape = (17,32)
loc(i) = (1+8*div(i,4),8*mod(i,4))
rs, cs = 8, 8

ax = subplot2grid(shape, loc(0), rowspan=rs, colspan=cs)
plot(x, -2*log.(pred_Cauchy.(x)), label="\$-2\\log p_\\ast\$", color="black", ls="-")
scatter(Y, -2*log.(pred_Cauchy.(Y)), color="black")
grid("on", linestyle=":")
legend()

for i in 1:n
    ax = subplot2grid(shape, loc(i), rowspan=rs, colspan=cs)
    plot(x, -2*log.(p_Cauchy[i].(x)), label="$i", color=cyc_color(i), ls=cyc_style(i))
    scatter([Y[i]], [-2*log(p_Cauchy[i](Y[i]))], color=cyc_color(i))
    grid("on", linestyle=":")
    legend()
end

suptitle("-2 log(PDFs of the leave-one-out predictive distributions for the Cauchy distribution model)")

tight_layout()

上段左から二番目のグラフは外れ値の Y[1] を抜いたデータから作った予測分布のグラフである。Y[1] は丸印で表示されている。予測失敗の損失は20弱ですんでいる。

外れ値を除いた場合とそうでない場合で予測分布(の対数の $-2$ 倍のグラフ)の形がほとんど変わっていないことにも注目せよ。

In [35]:

@assert n == 7

figure(figsize=(12,5))
x = collect(linspace(-30,40,1000))

shape = (17,32)
loc(i) = (1+8*div(i,4),8*mod(i,4))
rs, cs = 8, 8

ax = subplot2grid(shape, loc(0), rowspan=rs, colspan=cs)
plot(x, -2*log.(pred_TDist1.(x)), label="\$-2\\log p_\\ast\$", color="black", ls="-")
scatter(Y, -2*log.(pred_TDist1.(Y)), color="black")
grid("on", linestyle=":")
legend()

for i in 1:n
    ax = subplot2grid(shape, loc(i), rowspan=rs, colspan=cs)
    plot(x, -2*log.(p_TDist1[i].(x)), label="$i", color=cyc_color(i), ls=cyc_style(i))
    scatter([Y[i]], [-2*log(p_TDist1[i](Y[i]))], color=cyc_color(i))
    grid("on", linestyle=":")
    legend()
end

suptitle("-2 log(PDFs of the leave-one-out predictive distributions for the t-distribution model 1)")

tight_layout()

外れ値を除いた上段左から二番目の場合とそうでない場合で予測分布(の対数の $-2$ 倍のグラフ)の形がほとんど変わっていないことにも注目せよ。

In [36]:

@assert n == 7

figure(figsize=(12,5))
x = collect(linspace(-30,40,1000))

shape = (17,32)
loc(i) = (1+8*div(i,4),8*mod(i,4))
rs, cs = 8, 8

ax = subplot2grid(shape, loc(0), rowspan=rs, colspan=cs)
plot(x, -2*log.(pred_TDist2.(x)), label="\$-2\\log p_\\ast\$", color="black", ls="-")
scatter(Y, -2*log.(pred_TDist2.(Y)), color="black")
grid("on", linestyle=":")
legend()

for i in 1:n
    ax = subplot2grid(shape, loc(i), rowspan=rs, colspan=cs)
    plot(x, -2*log.(p_TDist2[i].(x)), label="$i", color=cyc_color(i), ls=cyc_style(i))
    scatter([Y[i]], [-2*log(p_TDist2[i](Y[i]))], color=cyc_color(i))
    grid("on", linestyle=":")
    legend()
end

suptitle("-2 log(PDFs of the leave-one-out predictive distributions for the t-distribution model 2)")

tight_layout()

外れ値を除いた上段左から二番目の場合とそうでない場合で予測分布(の対数の $-2$ 倍のグラフ)の形がほとんど変わっていないことにも注目せよ。

別のサンプル¶

In [37]:

function constparam_Normal(Y)
    d = fit_mle(Normal,Y)
    return d.μ, d.σ
end

function constparam_Cauchy(Y; init=[0.0, 1.0])
    negloglik(x) = negloglik_Cauchy(Y,x)
    result = optimize(negloglik, init)
    a = result.minimizer
    return a[1],a[2]
end

function constparam_TDist1(Y; init=[0.0, 1.0])
    negloglik(x) = negloglik_TDist1(Y,x)
    result = optimize(negloglik, init)
    a = result.minimizer
    return a[1],a[2]
end

function constparam_TDist2(Y; init=[0.0, 1.0, 1.0])
    negloglik(x) = negloglik_TDist2(Y,x)
    result = optimize(negloglik, init)
    a = result.minimizer
    return a[1],a[2],a[3]
end

Out[37]:

constparam_TDist2 (generic function with 1 method)

正常値を3つ追加してみると¶

In [38]:

Out[38]:

7-element Array{Float64,1}:
 -27.02 
   3.57 
   8.191
   9.808
   9.603
   9.945
  10.056

In [39]:

YY = [Y; [9.801, 9.701, 9.958]]

Out[39]:

10-element Array{Float64,1}:
 -27.02 
   3.57 
   8.191
   9.808
   9.603
   9.945
  10.056
   9.801
   9.701
   9.958

In [40]:

function comp_Normal(Y,YY;xmax=2)
    
@show aic_Normal(Y)/(2*length(Y))
@show loocv_Normal(Y)/(2*length(Y))
println()
@show aic_Normal(YY)/(2*length(YY))
@show loocv_Normal(YY)/(2*length(YY))
println()
@show a = constparam_Normal(Y)
@show nmu = a[1]
@show nsigma = a[2]
cdfvar_Normal(x) = x < nsigma^2 ? 0.0 : 1.0
println()
@show a = constparam_Normal(YY)
@show nnmu = a[1]
@show nnsigma = a[2]
cdfvar_NNormal(x) = x < nnsigma^2 ? 0.0 : 1.0
sleep(0.2)

figure(figsize=(6,3.6))
x = collect(linspace(0,xmax,1000))
plot(x, cdfvar_Normal.(x.^2),  label="Y")
plot(x, cdfvar_NNormal.(x.^2), label="YY", ls="--")
xlabel("std")
grid("on", ls=":")
title("PDFs")
legend()

end

comp_Normal(Y,YY,xmax=25)

aic_Normal(Y) / (2 * length(Y)) = 4.240011650702116
loocv_Normal(Y) / (2 * length(Y)) = 20.37900317414813

aic_Normal(YY) / (2 * length(YY)) = 4.012790145122183
loocv_Normal(YY) / (2 * length(YY)) = 19.80255974713817

a = constparam_Normal(Y) = (3.450428571428571, 12.620958836759735)
nmu = a[1] = 3.450428571428571
nsigma = a[2] = 12.620958836759735

a = constparam_Normal(YY) = (5.3613, 10.95560954077864)
nnmu = a[1] = 5.3613
nnsigma = a[2] = 10.95560954077864

Out[40]:

PyObject <matplotlib.legend.Legend object at 0x000000005FDEAF60>

In [41]:

function comp_Cauchy(Y,YY)

@show aic_Cauchy(Y)/(2*length(Y))
@show loocv_Cauchy(Y)/(2*length(Y))
println()
@show aic_Cauchy(YY)/(2*length(YY))
@show loocv_Cauchy(YY)/(2*length(YY))
println()
@show a = constparam_Cauchy(Y)
@show cmu = a[1]
@show csigma = a[2]
cdfvar_Cauchy(x) = cdf(InverseGamma(1/2,csigma^2/2), x)
pdfvar_Cauchy(x) = pdf(InverseGamma(1/2,csigma^2/2), x)
println()
@show a = constparam_Cauchy(YY)
@show ccmu = a[1]
@show ccsigma = a[2]
cdfvar_CCauchy(x) = cdf(InverseGamma(1/2,ccsigma^2/2), x)
pdfvar_CCauchy(x) = pdf(InverseGamma(1/2,ccsigma^2/2), x)
sleep(0.2)

figure(figsize=(9,3.6))
    
subplot(1,2,1)
x = collect(linspace(0,2,1000))
plot(x, pdfvar_Cauchy.(x), label="Y")
plot(x, pdfvar_CCauchy.(x),  label="YY", ls="--")
xlabel("variance")
grid("on", ls=":")
title("PDFs")
legend()

subplot(1,2,2)
x = collect(linspace(0,2,1000))
plot(x, cdfvar_Cauchy.(x.^2), label="Y")
plot(x, cdfvar_CCauchy.(x.^2),  label="YY", ls="--")
xlabel("std")
grid("on", ls=":")
title("CDFs")
legend()

tight_layout()    

end

comp_Cauchy(Y,YY)

aic_Cauchy(Y) / (2 * length(Y)) = 3.0948127926005404
loocv_Cauchy(Y) / (2 * length(Y)) = 3.35324937905101

aic_Cauchy(YY) / (2 * length(YY)) = 2.1555331081452516
loocv_Cauchy(YY) / (2 * length(YY)) = 2.1582651417030965

a = constparam_Cauchy(Y) = (9.812903700303522, 0.4089702390461565)
cmu = a[1] = 9.812903700303522
csigma = a[2] = 0.4089702390461565

a = constparam_Cauchy(YY) = (9.83176549381761, 0.1975764921912232)
ccmu = a[1] = 9.83176549381761
ccsigma = a[2] = 0.1975764921912232

Cauchy分布を使ったモデルだと、サンプルを生成した正規分布の std は 0.5 なのにこうなってしまう。

In [42]:

function comp_TDist1(Y,YY)

@show aic_TDist1(Y)/(2*length(Y))
@show loocv_TDist1(Y)/(2*length(Y))
println()
@show aic_TDist1(YY)/(2*length(YY))
@show loocv_TDist1(YY)/(2*length(YY))
println()
@show a = constparam_TDist1(Y)
@show td1mu = a[1]
@show td1nu = a[2]
cdfvar_TDist1(x) = cdf(InverseGamma(td1nu/2,td1nu/2), x)
pdfvar_TDist1(x) = pdf(InverseGamma(td1nu/2,td1nu/2), x)
println()
@show a = constparam_TDist1(YY)
@show ttd1mu = a[1]
@show ttd1nu = a[2]
cdfvar_TTDist1(x) = cdf(InverseGamma(ttd1nu/2,ttd1nu/2), x)
pdfvar_TTDist1(x) = pdf(InverseGamma(ttd1nu/2,ttd1nu/2), x)
sleep(0.2)

figure(figsize=(9,3.6))
    
subplot(1,2,1)
x = collect(linspace(0,2,1000))
plot(x, pdfvar_TDist1.(x),  label="Y")
plot(x, pdfvar_TTDist1.(x), label="YY", ls="--")
xlabel("variance")
grid("on", ls=":")
title("PDFs")
legend()

subplot(1,2,2)
x = collect(linspace(0,2,1000))
plot(x, cdfvar_TDist1.(x.^2),  label="Y")
plot(x, cdfvar_TTDist1.(x.^2), label="YY", ls="--")
xlabel("std")
grid("on", ls=":")
title("CDFs")
legend()

tight_layout()
    
end

comp_TDist1(Y,YY)

aic_TDist1(Y) / (2 * length(Y)) = 3.1180454894691896
loocv_TDist1(Y) / (2 * length(Y)) = 3.135885696090116

aic_TDist1(YY) / (2 * length(YY)) = 2.5567062152604416
loocv_TDist1(YY) / (2 * length(YY)) = 2.603262445911258

a = constparam_TDist1(Y) = (9.728434887339093, 0.6230454117603981)
td1mu = a[1] = 9.728434887339093
td1nu = a[2] = 0.6230454117603981

a = constparam_TDist1(YY) = (9.760511846452381, 0.7487214903580164)
ttd1mu = a[1] = 9.760511846452381
ttd1nu = a[2] = 0.7487214903580164

In [43]:

function comp_TDist2(Y,YY)

@show aic_TDist2(Y)/(2*length(Y))
@show loocv_TDist2(Y)/(2*length(Y))
println()
@show aic_TDist2(YY)/(2*length(YY))
@show loocv_TDist2(YY)/(2*length(YY))
println()
@show a = constparam_TDist2(Y)
@show td2mu = a[1]
@show td2nu = a[2]
@show td2sigma = a[3]
cdfvar_TDist2(x) = cdf(InverseGamma(td2nu/2,td2sigma^2*td2nu/2), x)
pdfvar_TDist2(x) = pdf(InverseGamma(td2nu/2,td2sigma^2*td2nu/2), x)
println()
@show a = constparam_TDist2(YY)
@show ttd2mu = a[1]
@show ttd2nu = a[2]
@show ttd2sigma = a[3]
cdfvar_TTDist2(x) = cdf(InverseGamma(ttd2nu/2,ttd2sigma^2*ttd2nu/2), x)
pdfvar_TTDist2(x) = pdf(InverseGamma(ttd2nu/2,ttd2sigma^2*ttd2nu/2), x)
sleep(0.2)

figure(figsize=(9,3.6))
    
subplot(1,2,1)
x = collect(linspace(0,2,1000))
plot(x, pdfvar_TDist2.(x),  label="Y",)
plot(x, pdfvar_TTDist2.(x), label="YY", ls="--")
xlabel("variance")
grid("on", ls=":")
title("PDFs")
legend()

subplot(1,2,2)
x = collect(linspace(0,2,1000))
plot(x, cdfvar_TDist2.(x.^2),  label="Y")
plot(x, cdfvar_TTDist2.(x.^2), label="YY", ls="--")
xlabel("std")
grid("on", ls=":")
title("CDFs")
legend()

tight_layout()
    
end

comp_TDist2(Y,YY)

aic_TDist2(Y) / (2 * length(Y)) = 2.967080713879095
loocv_TDist2(Y) / (2 * length(Y)) = 2.9853184726047624

aic_TDist2(YY) / (2 * length(YY)) = 2.070802534909034
loocv_TDist2(YY) / (2 * length(YY)) = 2.1498605084058284

a = constparam_TDist2(Y) = (9.875710145666144, 0.4065189705431987, 0.19347524640022232)
td2mu = a[1] = 9.875710145666144
td2nu = a[2] = 0.4065189705431987
td2sigma = a[3] = 0.19347524640022232

a = constparam_TDist2(YY) = (9.821489063298188, 0.47011900144578933, 0.13020928864533743)
ttd2mu = a[1] = 9.821489063298188
ttd2nu = a[2] = 0.47011900144578933
ttd2sigma = a[3] = 0.13020928864533743

正規分布が生成したサンプル¶

In [44]:

YYY  = 0.5*randn(128)
YYYY = 0.5*randn(512)

@show mean(YYY), std(YYY)
@show mean(YYYY), std(YYYY);

(mean(YYY), std(YYY)) = (0.014703898294867474, 0.559829387715939)
(mean(YYYY), std(YYYY)) = (-0.02860457805204251, 0.48373266190270925)

In [45]:

comp_Normal(YYY,YYYY,xmax=1)

aic_Normal(Y) / (2 * length(Y)) = 0.8505187380085045
loocv_Normal(Y) / (2 * length(Y)) = 0.8508581288695476

aic_Normal(YY) / (2 * length(YY)) = 0.6956443894783292
loocv_Normal(YY) / (2 * length(YY)) = 0.6959494493996806

a = constparam_Normal(Y) = (0.014703898294867474, 0.5576382662447745)
nmu = a[1] = 0.014703898294867474
nsigma = a[2] = 0.5576382662447745

a = constparam_Normal(YY) = (-0.0286045780520425, 0.48326003583783117)
nnmu = a[1] = -0.0286045780520425
nnsigma = a[2] = 0.48326003583783117

Out[45]:

PyObject <matplotlib.legend.Legend object at 0x000000005FF2FA58>

In [46]:

comp_Cauchy(YYY,YYYY)

aic_Cauchy(Y) / (2 * length(Y)) = 1.037825114644328
loocv_Cauchy(Y) / (2 * length(Y)) = 1.0397876781444806

aic_Cauchy(YY) / (2 * length(YY)) = 0.8719692703901739
loocv_Cauchy(YY) / (2 * length(YY)) = 0.8724317883599579

a = constparam_Cauchy(Y) = (0.007867830180010443, 0.3435701508417392)
cmu = a[1] = 0.007867830180010443
csigma = a[2] = 0.3435701508417392

a = constparam_Cauchy(YY) = (-0.049180535310153865, 0.2927461294979258)
ccmu = a[1] = -0.049180535310153865
ccsigma = a[2] = 0.2927461294979258

In [47]:

comp_TDist1(YYY,YYYY)

aic_TDist1(Y) / (2 * length(Y)) = 1.090043752716998
loocv_TDist1(Y) / (2 * length(Y)) = 1.0768768242359512

aic_TDist1(YY) / (2 * length(YY)) = 1.0396149144588125
loocv_TDist1(YY) / (2 * length(YY)) = 1.0361661148722492

a = constparam_TDist1(Y) = (0.014704095244993231, 2.1979418132535976e8)
td1mu = a[1] = 0.014704095244993231
td1nu = a[2] = 2.1979418132535976e8

a = constparam_TDist1(YY) = (-0.02860304396075325, 2.4056658090819464e9)
ttd1mu = a[1] = -0.02860304396075325
ttd1nu = a[2] = 2.4056658090819464e9

TDist1 モデルに正規分布の十分大きなサンプルを食わせるともとの std の値によらずに std = 1 と推定してしまう。

In [48]:

comp_TDist2(YYY,YYYY)

aic_TDist2(Y) / (2 * length(Y)) = 0.8583312386622173
loocv_TDist2(Y) / (2 * length(Y)) = 0.8508586223880008

aic_TDist2(YY) / (2 * length(YY)) = 0.6965875043167074
loocv_TDist2(YY) / (2 * length(YY)) = 0.6968996453694081

a = constparam_TDist2(Y) = (0.01470677516891961, 2.56611852760168e7, 0.5576374170165641)
td2mu = a[1] = 0.01470677516891961
td2nu = a[2] = 2.56611852760168e7
td2sigma = a[3] = 0.5576374170165641

a = constparam_TDist2(YY) = (-0.031317746390380694, 30.277482562199033, 0.46696296621289257)
ttd2mu = a[1] = -0.031317746390380694
ttd2nu = a[2] = 30.277482562199033
ttd2sigma = a[3] = 0.46696296621289257

TDist2 モデルならばサンプルを生成した正規分布と同じ std を推定してくれる。

In [49]:

println("-- Normal --")
@show aic_Normal(YYYY)
@show loocv_Normal(YYYY)
println("\n-- Cauchy --")
@show aic_Cauchy(YYYY)
@show loocv_Cauchy(YYYY)
println("\n-- TDist1 --")
@show aic_TDist1(YYYY)
@show loocv_TDist1(YYYY)
println("\n-- TDist2 --")
@show aic_TDist2(YYYY)
@show loocv_TDist2(YYYY)
;

-- Normal --
aic_Normal(YYYY) = 712.3398548258091
loocv_Normal(YYYY) = 712.652236185273

-- Cauchy --
aic_Cauchy(YYYY) = 892.8965328795381
loocv_Cauchy(YYYY) = 893.3701512805969

-- TDist1 --
aic_TDist1(YYYY) = 1064.565672405824
loocv_TDist1(YYYY) = 1061.0341016291832

-- TDist2 --
aic_TDist2(YYYY) = 713.3056044203083
loocv_TDist2(YYYY) = 713.6252368582739

正規分布が生成したサンプルサイズ n=256 で正規分布モデルと TDist2 モデルの AIC と LOOCV はほぼ同じ。(これはほぼ当たり前。なぜならば、TDist2モデルが推定した予測分布はほぼ正規分布になっており、正規分布モデルの予測分布とほとんど違わない(TDist2モデルにおける $\nu$ の推定値が非常に大きな値になっていることに注意せよ)。だからAICにおいてパラメーターの個数の差の2倍のちょうど 2 の違いが生じているに過ぎない。)

TDist2 モデルは外れ値がある場合に強いだけではなく、外れ値がない真の分布が正規分布の場合にも有用だと思われる。

TDist2で生成したサンプル¶

TDist2モデルで生成したサンプルと正規分布で生成したサンプルを正規分布モデルとTDist2モデルの両方に与えたときの結果を比較してみよう。

In [50]:

Ymu = 0.0
Ynu = 1.0
Ysigma = 0.5
YT = Ymu .+ Ysigma .* rand(TDist(Ynu),256)
YN = Ymu .+ Ysigma .* randn(256)
@show Ymu, Ynu, Ysigma
@show mean(YT), std(YT)
@show mean(YN), std(YN);

(Ymu, Ynu, Ysigma) = (0.0, 1.0, 0.5)
(mean(YT), std(YT)) = (-0.626121408091159, 4.376481718318639)
(mean(YN), std(YN)) = (-0.03404838001015294, 0.5037058877700993)

In [51]:

@time comp_Normal(YT, YN, xmax=1)

aic_Normal(Y) / (2 * length(Y)) = 2.901039224487904
loocv_Normal(Y) / (2 * length(Y)) = 3.0300882356010432

aic_Normal(YY) / (2 * length(YY)) = 0.739031346286549
loocv_Normal(YY) / (2 * length(YY)) = 0.7386660391223696

a = constparam_Normal(Y) = (-0.6261214080911592, 4.36792553864255)
nmu = a[1] = -0.6261214080911592
nsigma = a[2] = 4.36792553864255

a = constparam_Normal(YY) = (-0.03404838001015292, 0.5027211245842677)
nnmu = a[1] = -0.03404838001015292
nnsigma = a[2] = 0.5027211245842677

  0.275930 seconds (2.36 k allocations: 2.192 MiB)

Out[51]:

PyObject <matplotlib.legend.Legend object at 0x0000000060C27208>

In [52]:

@time comp_TDist2(YT,YN)

aic_TDist2(Y) / (2 * length(Y)) = 1.7344792510765927
loocv_TDist2(Y) / (2 * length(Y)) = 1.7334192175784031

aic_TDist2(YY) / (2 * length(YY)) = 0.7429375967120062
loocv_TDist2(YY) / (2 * length(YY)) = 0.7386657468815561

a = constparam_TDist2(Y) = (-0.033055487138168506, 0.9094692808740659, 0.3976480380754963)
td2mu = a[1] = -0.033055487138168506
td2nu = a[2] = 0.9094692808740659
td2sigma = a[3] = 0.3976480380754963

a = constparam_TDist2(YY) = (-0.034045410996012035, 1.5422476098682362e8, 0.5027203064019203)
ttd2mu = a[1] = -0.034045410996012035
ttd2nu = a[2] = 1.5422476098682362e8
ttd2sigma = a[3] = 0.5027203064019203

 10.744662 seconds (1.73 M allocations: 38.499 MiB)

このノートの最初の方で明らかになったように、TDist2モデルで生成した「外れ値」のあるサンプルは正規分布モデルでうまく扱えない。それに対して、TDist2モデルはTDist2モデルで生成したサンプルだけではなく、正規分布で生成したサンプルも適切に扱えている。

TDist2モデルの確率密度函数は

$\displaystyle p(y\,|\,\mu,\nu,\sigma) = \frac{1}{\sqrt{\nu\sigma^2}}\frac{1}{B(1/2,\nu/2)} \left(1+\frac{(y-\mu)^2}{\nu\sigma^2}\right)^{-(\nu+1)/2}.$

であった。これの $\nu\to\infty$ での極限は正規分布の確率密度函数に一致する:

$\displaystyle p(y\,|\,\mu,\infty,\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right).$

この意味でTDist2モデルは正規分布モデルをその極限として含んでいる。だから、正規分布が生成するサンプルをTDist2モデルに与えると $\nu$ の推定値が非常に大きくなる。

Cauchy分布のモデルは極限として正規分布モデルを含んでいないし、TDist1モデルは極限として分散1の正規分布モデルしか含んでいない。このような理由から、外れ値があっても、実際には正規分布である疑いがある場合にはCauchy分布モデルとTDist1モデルを使用するべきではないと考えられる。

In [ ]: