2×2の分割表のカイ二乗検定とG検定とFisherの正確検定の比較¶

サンプルを多項分布で生成する場合

黒木玄

2017-09-19

2×2の分割表に関するカイ二乗検定とFisherの正確検定の比較に関する

https://oku.edu.mie-u.ac.jp/~okumura/stat/fisher-chisq.html

の結果(のsubset)をJulia言語で再現してみた。

ついでにG検定の場合の結果も付け加えておいた。

さらに複数の確率分布のケースでテストしてみた。

Distributionsパッケージについては次のリンク先を参照せよ。

https://juliastats.github.io/Distributions.jl/latest/

HypothesisTestsパッケージも存在するが、あえて利用しなかった。

統計関係のパッケージの情報については次のリンク先を参照せよ。

https://juliastats.github.io/

各種函数の定義¶

In [1]:

using PyPlot
using Distributions

In [2]:

versioninfo()

Julia Version 0.6.0
Commit 903644385b* (2017-06-19 13:05 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

In [3]:

# xlog(x,y) = if x == 0 then 0 else x*log(x/y)
#
# Warning:
#
#   Don't use x*log(x/y) instead of xlog(x,y).
#   Because log(0) = -Inf and 0*Inf = NaN.
#
function xlog(x::Float64, y::Float64)
    if x == zero(x)
        return zero(x)
    else
        return x*log(x/y)
    end
end
xlog(x,y) = xlog(Float64(x),Float64(y))

x, y = 0, 1
@show x, y
@show x*log(x/y)
@show xlog(x,y);

(x, y) = (0, 1)
x * log(x / y) = NaN
xlog(x, y) = 0.0

In [4]:

prodprob(px,py) = [px*py, px*(1-py), (1-px)*py, (1-px)*(1-py)]

function expected(a::AbstractArray{T,1}) where T
    local n = sum(a)
    return [(a[1]+a[2])*(a[1]+a[3])/n, (a[1]+a[2])*(a[2]+a[4])/n, 
            (a[3]+a[4])*(a[1]+a[3])/n, (a[3]+a[4])*(a[2]+a[4])/n]
end

function chisqtest(a::AbstractArray{Int64,1})
    local mu = expected(a)
    local chisq = sum((a .- mu).^2 ./mu)
    local pval = ccdf(Chisq(1), chisq)
    return pval
end

function gtest(a::AbstractArray{Int64,1})
    local mu = expected(a)
    local g = 2*sum(xlog.(Float64.(a),mu)) # Don't use a.*log(a./mu)
    local pval = ccdf(Chisq(1), g)
    return pval
end

function pvaluehg(d::Hypergeometric, k::Int64)
    local c = params(d)
    local amax = min(c[1],c[3])
    local p0 = pdf(d, k)
    local p1 = 0.0
    local pval = 0.0
    for j in 0:amax
        p1 = pdf(d, j)
        pval += ifelse(p1 ≤ p0, p1, 0.0)
    end
    return min(pval, 1.0)
end

function fishertest(a::AbstractArray{Int64,1})
    local d = Hypergeometric(a[1]+a[2], a[3]+a[4], a[1]+a[3])
    return pvaluehg(d, a[1])
end

ecdf(pval::AbstractArray{Float64,1}, x::Float64) = count(p -> p ≤ x, pval)/length(pval)

Out[4]:

ecdf (generic function with 1 method)

In [5]:

## test

@show x = chisqtest([5, 13, 12, 20])

y = 0.4860562818425576
@show y

abs(x-y)

x = chisqtest([5, 13, 12, 20]) = 0.4860562818425576
y = 0.4860562818425576

Out[5]:

0.0

In [6]:

## test

@show x = gtest([5, 13, 12, 20])

y = 0.48251234020228
@show y

abs(x-y)

x = gtest([5, 13, 12, 20]) = 0.4825123402022797
y = 0.48251234020228

Out[6]:

3.3306690738754696e-16

In [7]:

## test

@show x = fishertest([5, 13, 12, 20])

y = 0.5482810404890348
@show y

abs(x-y)

x = fishertest([5, 13, 12, 20]) = 0.548281040489035
y = 0.5482810404890348

Out[7]:

2.220446049250313e-16

In [8]:

function plotECDs(x, ecdf_chisq, ecdf_g, ecdf_fisher)
    plot(x, ecdf_chisq.(x),  label="\$\\chi^2\$ test", ls="-")
    plot(x, ecdf_g.(x),      label="G-test",          ls="--")
    plot(x, ecdf_fisher.(x), label="Fisher test",     ls=":")
    plot(x, x, label="y = x", color="black", lw=0.5)
    legend(fontsize=8)
    xlabel("x")
    ylabel("y = probability(p-value \$\\leqq\$ x)")
    title("empirical cumulative distributions", fontsize=10)    
end

function plotErrors(x, ecdf_chisq, ecdf_g, ecdf_fisher)
    plot(x, ecdf_chisq.(x) .-x, label="error of \$\\chi^2\$ test", ls="-")
    plot(x, ecdf_g.(x)     .-x, label="error of G-test",          ls="--")
    plot(x, ecdf_fisher.(x).-x, label="error of Fisher test",     ls=":")
    plot(x, zeros(x), color="black", lw=0.5)
    legend(fontsize=8)
    xlabel("x")
    ylabel("y = probability(p-value \$\\leqq\$ x) \$-\$ x")
    title("error = probability(p-value \$\\leqq\$ x) \$-\$ x", fontsize=10)
end

function plotcomparison(n, p, x, ecdf_chisq, ecdf_g, ecdf_fisher; ploterrors=true)
    if ploterrors
        figure(figsize=(10,4))
        
        ax1 = subplot(121)
        plotECDs(x, ecdf_chisq, ecdf_g, ecdf_fisher)
        ax1[:set_aspect]("equal")
        
        ax2 = subplot(122)
        plotErrors(x, ecdf_chisq, ecdf_g, ecdf_fisher)

        suptitle("n = $n,  n*p = $(reshape(n*p,2,2)')")
    else
        fig = figure(figsize=(6,4))
        plotECDs(x, ecdf_chisq, ecdf_g, ecdf_fisher)
        axes()[:set_aspect]("equal")
    end
end

function comparisonpval(n::Int64, px::Float64, py::Float64; N = 10^5, xmaxlist = [1.0, 0.1, 0.02])
    return comparisonpval(n, prodprob(px,py), N = N, xmaxlist = xmaxlist)
end

function comparisonpval(n::Int64, p::AbstractArray{Float64}; 
        N = 10^5, xmaxlist = [1.0, 0.1, 0.02], ploterrors = true)
    local a = rand(Multinomial(n,p), N)
    local pvalchisq  = [chisqtest(a[:,i]) for i in 1:size(a,2)]
    local pvalg      = [gtest(a[:,i]) for i in 1:size(a,2)]
    local pvalfisher = [fishertest(a[:,i]) for i in 1:size(a,2)]
    
    local ecdf_chisq(x) = ecdf(pvalchisq, x)
    local ecdf_g(x) = ecdf(pvalg, x)
    local ecdf_fisher(x) = ecdf(pvalfisher, x)
    
    for xmax in xmaxlist
        x = linspace(0.0, xmax, 101)
        plotcomparison(n, p, x, ecdf_chisq, ecdf_g, ecdf_fisher, ploterrors=ploterrors)
    end
end

Out[8]:

comparisonpval (generic function with 2 methods)

奥村晴彦さんの結果の再現¶

https://oku.edu.mie-u.ac.jp/~okumura/stat/fisher-chisq.html

以下ではG検定の場合も付け加えてあり、プロットの範囲が1.0まで全部、0.1まで、0.02までの3つの場合をプロットしてある。

In [9]:

n = 50
px, py = 0.4, 0.3
@time comparisonpval(n, px, py)

 4.084475 seconds (3.42 M allocations: 169.830 MiB, 0.75% gc time)

複数の確率分布で比較¶

確率分布の設定¶

$n$ を固定し、2×2の確率の表を

$$\displaystyle P = \begin{bmatrix} p_x p_y & p_x(1-p_y) \\ (1-p_x)p_y & (1-p_x)(1-p_y) \\ \end{bmatrix} $$

て定義し、$p_x,p_y$ を $0.1$ 刻みで $0.1\leqq p_x\leqq p_y\leqq 0.5$ の範囲を動かす。

上の形の $P$ のスカラー倍で表示できる行列は独立型であると言うことにする。

モンテカルロシミュレーション¶

確率の表 $P$ と $n$ が定める多項分布に従って、整数行列

$$\displaystyle A = \begin{bmatrix} a & b \\ c & d \\ \end{bmatrix} $$

を生成し、この行列を2×2の分割表とみなして、カイ二乗検定、G検定、Fisherの正確検定の方法に従って、 p 値を求める。

そのようにして求めた p 値は行列 $A$ が独立型から離れれば離れるほど小さくなる。

このような計算を $N=10^5$ 回繰り返して、 p 値が $x$ 以下になる確率を求め、それをプロットする。

理想的には p 値が $x$ 以下になる確率は $x$ に一致することが望ましいが、実際には、

(p 値が $x$ 以下になる確率) $-$ $x$

で定義される誤差が生じる。その誤差もプロットする。

x = 0.0～0.1 の範囲をプロット¶

有意水準5%での精度に興味のある人は多いだろう。

そこでまず x = 0.0～0.1 の範囲をプロットしよう。

45度線に近いほど精度が高いと考えられる。

n = 25, 50, 100 の場合を扱う。

n = 25 で px = 0.1 の極端な場合を除けば、カイ二乗検定(太実線)の精度が最も高いように見える。

In [10]:

n = 25
for px in 0.1:0.1:0.5
    for py in px:0.1:0.5
        @time comparisonpval(n, px, py, xmaxlist=[0.1])
    end
end
sleep(0.2)

  0.550569 seconds (2.20 M allocations: 106.122 MiB, 2.61% gc time)
  0.600330 seconds (2.20 M allocations: 106.110 MiB, 1.23% gc time)
  0.646636 seconds (2.20 M allocations: 106.110 MiB, 0.85% gc time)
  0.719684 seconds (2.20 M allocations: 106.110 MiB, 16.81% gc time)
  0.585032 seconds (2.20 M allocations: 106.110 MiB, 1.05% gc time)
  0.639246 seconds (2.20 M allocations: 106.110 MiB, 0.90% gc time)
  0.716844 seconds (2.20 M allocations: 106.110 MiB, 0.78% gc time)
  0.767485 seconds (2.20 M allocations: 106.110 MiB, 0.88% gc time)
  0.760211 seconds (2.20 M allocations: 106.110 MiB, 0.61% gc time)
  0.910759 seconds (2.20 M allocations: 106.110 MiB, 0.73% gc time)
  0.855745 seconds (2.20 M allocations: 106.110 MiB, 0.72% gc time)
  0.849117 seconds (2.20 M allocations: 106.110 MiB, 0.95% gc time)
  0.931961 seconds (2.20 M allocations: 106.110 MiB, 0.63% gc time)
  1.034661 seconds (2.20 M allocations: 106.110 MiB, 9.44% gc time)
  0.991448 seconds (2.20 M allocations: 106.110 MiB, 0.45% gc time)

In [11]:

n = 50
for px in 0.1:0.1:0.5
    for py in px:0.1:0.5
        @time comparisonpval(n, px, py, xmaxlist=[0.1])
    end
end
sleep(0.2)

  0.877311 seconds (2.20 M allocations: 106.112 MiB, 0.96% gc time)
  0.762804 seconds (2.20 M allocations: 106.110 MiB, 0.88% gc time)
  0.786212 seconds (2.20 M allocations: 106.110 MiB, 0.80% gc time)
  0.756936 seconds (2.20 M allocations: 106.110 MiB, 0.86% gc time)
  0.825353 seconds (2.20 M allocations: 106.110 MiB, 0.82% gc time)
  1.003583 seconds (2.20 M allocations: 106.110 MiB, 0.67% gc time)
  1.061235 seconds (2.20 M allocations: 106.110 MiB, 0.61% gc time)
  1.108900 seconds (2.20 M allocations: 106.110 MiB, 0.67% gc time)
  1.122526 seconds (2.20 M allocations: 106.110 MiB, 0.58% gc time)
  1.372241 seconds (2.20 M allocations: 106.110 MiB, 7.55% gc time)
  1.397477 seconds (2.20 M allocations: 106.110 MiB, 0.48% gc time)
  1.416365 seconds (2.20 M allocations: 106.110 MiB, 0.41% gc time)
  1.621613 seconds (2.20 M allocations: 106.110 MiB, 0.37% gc time)
  1.716732 seconds (2.20 M allocations: 106.110 MiB, 0.27% gc time)
  1.841540 seconds (2.20 M allocations: 106.110 MiB, 0.32% gc time)

In [12]:

n = 100
for px in 0.1:0.1:0.5
    for py in px:0.1:0.5
        @time comparisonpval(n, px, py, xmaxlist=[0.1])
    end
end
sleep(0.2)

  1.085354 seconds (2.20 M allocations: 106.115 MiB, 0.76% gc time)
  1.056799 seconds (2.20 M allocations: 106.110 MiB, 0.71% gc time)
  1.090018 seconds (2.20 M allocations: 106.110 MiB, 0.59% gc time)
  1.076235 seconds (2.20 M allocations: 106.110 MiB, 0.69% gc time)
  1.051345 seconds (2.20 M allocations: 106.110 MiB, 0.58% gc time)
  1.681336 seconds (2.20 M allocations: 106.110 MiB, 5.94% gc time)
  1.779170 seconds (2.20 M allocations: 106.110 MiB, 0.33% gc time)
  1.778485 seconds (2.20 M allocations: 106.110 MiB, 0.34% gc time)
  1.852405 seconds (2.20 M allocations: 106.110 MiB, 0.34% gc time)
  2.293237 seconds (2.20 M allocations: 106.120 MiB, 0.26% gc time)
  2.512785 seconds (2.20 M allocations: 106.110 MiB, 0.24% gc time)
  2.494513 seconds (2.20 M allocations: 106.110 MiB, 0.21% gc time)
  2.975737 seconds (2.20 M allocations: 106.110 MiB, 0.20% gc time)
  3.296730 seconds (2.20 M allocations: 106.110 MiB, 0.20% gc time)
  3.513900 seconds (2.20 M allocations: 106.110 MiB, 0.18% gc time)

x = 0.0～1.0 の全体をプロット¶

In [13]:

n = 25
for px in 0.1:0.1:0.5
    for py in px:0.1:0.5
        @time comparisonpval(n, px, py, xmaxlist=[1.0])
    end
end
sleep(0.2)

  0.689197 seconds (2.20 M allocations: 106.112 MiB, 1.58% gc time)
  0.654032 seconds (2.20 M allocations: 106.110 MiB, 16.53% gc time)
  0.560230 seconds (2.20 M allocations: 106.110 MiB, 1.03% gc time)
  0.538139 seconds (2.20 M allocations: 106.110 MiB, 1.04% gc time)
  0.561455 seconds (2.20 M allocations: 106.110 MiB, 0.76% gc time)
  0.648619 seconds (2.20 M allocations: 106.110 MiB, 1.15% gc time)
  0.729799 seconds (2.20 M allocations: 106.110 MiB, 0.81% gc time)
  0.718673 seconds (2.20 M allocations: 106.110 MiB, 0.85% gc time)
  0.765250 seconds (2.20 M allocations: 106.110 MiB, 0.78% gc time)
  0.798651 seconds (2.20 M allocations: 106.110 MiB, 0.70% gc time)
  0.848735 seconds (2.20 M allocations: 106.110 MiB, 0.57% gc time)
  0.864701 seconds (2.20 M allocations: 106.110 MiB, 0.76% gc time)
  1.067921 seconds (2.20 M allocations: 106.110 MiB, 9.93% gc time)
  0.954935 seconds (2.20 M allocations: 106.110 MiB, 0.66% gc time)
  1.004631 seconds (2.20 M allocations: 106.110 MiB, 0.58% gc time)

In [14]:

n = 50
for px in 0.1:0.1:0.5
    for py in px:0.1:0.5
        @time comparisonpval(n, px, py, xmaxlist=[1.0])
    end
end
sleep(0.2)

  0.724750 seconds (2.20 M allocations: 106.110 MiB, 1.25% gc time)
  0.757153 seconds (2.20 M allocations: 106.110 MiB, 0.79% gc time)
  0.757605 seconds (2.20 M allocations: 106.110 MiB, 0.74% gc time)
  0.814294 seconds (2.20 M allocations: 106.110 MiB, 0.74% gc time)
  0.746321 seconds (2.20 M allocations: 106.110 MiB, 0.85% gc time)
  0.970986 seconds (2.20 M allocations: 106.110 MiB, 0.74% gc time)
  1.188753 seconds (2.20 M allocations: 106.112 MiB, 0.55% gc time)
  1.083303 seconds (2.20 M allocations: 106.110 MiB, 0.55% gc time)
  1.186058 seconds (2.20 M allocations: 106.110 MiB, 8.41% gc time)
  1.301741 seconds (2.20 M allocations: 106.110 MiB, 0.34% gc time)
  1.390584 seconds (2.20 M allocations: 106.110 MiB, 0.42% gc time)
  1.425003 seconds (2.20 M allocations: 106.110 MiB, 0.44% gc time)
  1.632366 seconds (2.20 M allocations: 106.110 MiB, 0.35% gc time)
  1.715535 seconds (2.20 M allocations: 106.110 MiB, 0.34% gc time)
  1.928616 seconds (2.20 M allocations: 106.110 MiB, 0.31% gc time)

In [15]:

n = 100
for px in 0.1:0.1:0.5
    for py in px:0.1:0.5
        @time comparisonpval(n, px, py, xmaxlist=[1.0])
    end
end
sleep(0.2)

  0.936060 seconds (2.20 M allocations: 106.110 MiB, 0.95% gc time)
  1.073932 seconds (2.20 M allocations: 106.110 MiB, 0.60% gc time)
  1.123809 seconds (2.20 M allocations: 106.110 MiB, 0.46% gc time)
  1.117452 seconds (2.20 M allocations: 106.110 MiB, 0.56% gc time)
  1.199681 seconds (2.20 M allocations: 106.110 MiB, 9.74% gc time)
  1.748188 seconds (2.20 M allocations: 106.117 MiB, 0.33% gc time)
  1.754585 seconds (2.20 M allocations: 106.110 MiB, 0.32% gc time)
  1.772892 seconds (2.20 M allocations: 106.110 MiB, 0.34% gc time)
  1.845353 seconds (2.20 M allocations: 106.110 MiB, 0.26% gc time)
  2.454991 seconds (2.20 M allocations: 106.110 MiB, 0.31% gc time)
  2.588035 seconds (2.20 M allocations: 106.110 MiB, 0.23% gc time)
  2.480909 seconds (2.20 M allocations: 106.110 MiB, 0.28% gc time)
  3.010510 seconds (2.20 M allocations: 106.110 MiB, 0.20% gc time)
  3.114921 seconds (2.20 M allocations: 106.110 MiB, 0.18% gc time)
  3.439553 seconds (2.20 M allocations: 106.110 MiB, 0.14% gc time)

サンプルを生成する確率分布が独立型からかけ離れている場合¶

独立型からかけ離れた確率分布を生成¶

p 値(次の計算中の pval)が小さいほど独立型からかけ離れていると考えられる。

In [16]:

n = 50
r = [0,0,0,n]
for i in 1:100
    r = rand(Multinomial(n,4))
    pval = chisqtest(r)
    if pval < 0.05
        println("pval = ", pval)
        println("r = ", r)
        break
    end
end
p = r/sum(r)
println("p = ", p)

pval = 0.007862025239525573
r = [19, 11, 5, 15]
p = [0.38, 0.22, 0.1, 0.3]

結果のプロット¶

当然のことながら、独立型からかけ離れていると p 値は小さくなりやすくなる。

In [17]:

n = 50
r = [9, 15, 17, 9]
@show chisqtest(r)
@show p = r/n
@time comparisonpval(n, p, xmaxlist=[1.0], ploterrors=false)

chisqtest(r) = 0.04863928767942199
p = r / n = [0.18, 0.3, 0.34, 0.18]

 1.951615 seconds (2.20 M allocations: 106.129 MiB, 6.13% gc time)

In [18]:

n = 50
r = [16, 11, 6, 17]
@show chisqtest(r)
@show p = r/n
@time comparisonpval(n, p, xmaxlist=[1.0], ploterrors=false)

chisqtest(r) = 0.018515907275267696
p = r / n = [0.32, 0.22, 0.12, 0.34]

 1.775243 seconds (2.20 M allocations: 106.071 MiB, 0.35% gc time)

In [19]:

n = 50
r = [5, 16, 18, 11]
@show chisqtest(r)
@show p = r/n
@time comparisonpval(n, p, xmaxlist=[1.0], ploterrors=false)

chisqtest(r) = 0.007382367281393829
p = r / n = [0.1, 0.32, 0.36, 0.22]

 1.730845 seconds (2.20 M allocations: 106.071 MiB, 0.40% gc time)

In [ ]: