Gen Kuroki
2017-11-02
This notebook is a continuation of the following notebooks:
http://nbviewer.jupyter.org/gist/genkuroki/7c52b10fc6cdb254c8227f4fb0a47e0d
http://nbviewer.jupyter.org/gist/genkuroki/1dd6d1ee5b473435a2027d221c560640
http://nbviewer.jupyter.org/gist/genkuroki/a3034d25a429b590d96c486064e53c8b
ENV["LINES"] = 100
using Distributions
using PyPlot
function xlog(x, y)
if x == zero(x)
return zero(x)
elseif y == zero(y)
return typemax(x)
else
return x*log(x/y)
end
end
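# Quick check of xlog's zero-handling convention (a small sketch, not in the original
# notebook): x*log(x/y) is taken to be 0 when x == 0, whereas the naive expression
# produces NaN.  This is why pval_gtest below uses xlog.(a, mu) rather than a .* log.(a ./ mu).
@show xlog(0.0, 3.0)        # 0.0
@show 0.0 * log(0.0 / 3.0)  # NaN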
prodprob(p, q) = Float64[p*q, p*(1-q), (1-p)*q, (1-p)*(1-q)]
function expected(A::AbstractArray{T}) where T
a = vec(A')
n = sum(a)
return [(a[1]+a[2])*(a[1]+a[3])/n, (a[1]+a[2])*(a[2]+a[4])/n,
(a[3]+a[4])*(a[1]+a[3])/n, (a[3]+a[4])*(a[2]+a[4])/n]
end
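# Usage sketch for expected (not in the original): for a 2×2 table the expected counts
# under independence are (row total × column total) / grand total, returned in the
# row-major order (a, b, c, d).
@show expected([228 863; 284 851])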
function pval_chisq(A::AbstractArray{Int64})
a = vec(A')
mu = expected(a)
chisq = sum((a .- mu).^2 ./mu)
pval = ccdf(Chisq(1), chisq)
return pval
end
function pval_gtest(A::AbstractArray{Int64})
a = vec(A')
mu = expected(a)
g = 2*sum(xlog.(a,mu)) # Don't use a .* log.(a ./ mu): it gives NaN when a has a zero entry
pval = ccdf(Chisq(1), g)
return pval
end
function pval_hyp(d::Hypergeometric, k::Int64)
c = params(d)
amax = min(c[1],c[3])
p0 = pdf(d, k)
pval = 0.0
for j in 0:amax
p1 = pdf(d, j)
pval += ifelse(p1 ≤ p0, p1, 0.0)
end
return min(pval, 1.0)
end
function pval_fisher(A::AbstractArray{Int64})
a = vec(A)
d = Hypergeometric(a[1]+a[2], a[3]+a[4], a[1]+a[3])
return pval_hyp(d, a[1])
end
X = [
228 863
284 851
]
display(X)
@show pval_chisq(X)
@show pval_gtest(X)
@show pval_fisher(X)
2×2 Array{Int64,2}:
 228  863
 284  851
pval_chisq(X) = 0.02082498746388306
pval_gtest(X) = 0.02070436219763828
pval_fisher(X) = 0.023345745791372826
0.023345745791372826
The p-value of Fisher's exact test just above agrees with the example at
https://oku.edu.mie-u.ac.jp/~okumura/stat/fishertest.html

|         | Support | Other |
|---------|---------|-------|
| Yomiuri | 228     | 863   |
| NHK     | 284     | 851   |

In R,
fisher.test(matrix(c(228,284,863,851),nrow=2))
reportedly gives p = 0.02335.
For details of Barnard's and Boschloo's exact tests, browse the following URLs:
https://cran.r-project.org/web/packages/Exact/index.html
https://cran.r-project.org/web/packages/Barnard/index.html
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.572.632&rep=rep1&type=pdf
In the binomial case we treat a 2×2 contingency table

$$X = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

in which the row sums a+b = r and c+d = s are fixed and (a,b), (c,d) follow independent binomial distributions Binomial(r,p) and Binomial(s,q). The probability of a table is

$$P(a,b,c,d) = \binom{r}{a}\binom{s}{c}\, p^a (1-p)^b\, q^c (1-q)^d.$$

We consider several tests of whether p = q. When p = q,

$$P(a,b,c,d) = \binom{r}{a}\binom{s}{c}\, p^{a+c} (1-p)^{b+d}.$$

# a b    r=a+b
# c d s=c+d
#
# a, b=r-a, c, d=s-c <---> r=a+b, s=c+d, a, c
#
# matrix X = [
# a b
# c d
# ]
# <----> values r, s, a, c
function matrix2vals(X)
x = vec(X')
return x[1]+x[2], x[3]+x[4], x[1], x[3]
end
vals2matrix(r, s, a, c) = [
a r-a
c s-c
]
@show X = [
1 2
3 4
]
@show r, s, a, c = matrix2vals(X)
@show vals2matrix(r, s, a, c);
X = [1 2; 3 4] = [1 2; 3 4]
(r, s, a, c) = matrix2vals(X) = (3, 7, 1, 3)
vals2matrix(r, s, a, c) = [1 2; 3 4]
# matrix valued rand
#
function mrand_2bin(r, s, p, q)
a, c = rand(Binomial(r,p)), rand(Binomial(s,q))
return vals2matrix(r, s, a, c)
end
mrand_2bin(r, s, p) = mrand_2bin(r, s, p, p)
# generator of sample
#
function rand_2bin(r, s, p, q, L::Integer)
a = rand(Binomial(r,p),L)
b = r .- a
c = rand(Binomial(s,q),L)
d = s .- c
return vcat(a',b',c',d')
end
rand_2bin(r, s, p, L::Integer) = rand_2bin(r, s, p, p, L)
@show mrand_2bin(5, 10, 0.5)
rand_2bin(5, 10, 0.5, 10)
mrand_2bin(5, 10, 0.5) = [3 2; 8 2]
4×10 Array{Int64,2}:
 1  3  5  3  4  3  2  2  3  2
 4  2  0  2  1  2  3  3  2  3
 2  4  7  6  3  7  5  6  3  5
 8  6  3  4  7  3  5  4  7  5
# probabilities of the double binomial distributions
#
p_2bin(r, s, a, c, p, q) = pdf(Binomial(r,p), a)*pdf(Binomial(s,q), c)
p_2bin(r, s, a, c, p) = p_2bin(r, s, a, c, p, p)
p_2bin(X, p, q) = begin r, s, a, c = matrix2vals(X); p_2bin(r, s, a, c, p, q) end
p_2bin(X, p) = p_2bin(X, p, p)
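# Sanity-check sketch (hypothetical numbers): for fixed r, s, p, q the probabilities
# p_2bin(r, s, a, c, p, q) over all tables with the given row sums should sum to 1.
let r = 5, s = 10, p = 0.3, q = 0.7
    @show sum(p_2bin(r, s, a, c, p, q) for a in 0:r, c in 0:s)
end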
# test statistics functions
#
function score_2bin(r, s, a, c)
pac = (a+c)/(r+s)
pa = a/r
pc = c/s
return (pc - pa)/sqrt(pac*(1-pac)*(1/r+1/s))
end
absscore_2bin(r, s, a, c) = abs(score_2bin(r, s, a, c))
function wald_2bin(r, s, a, c)
pa = a/r
pc = c/s
return (pc - pa)/sqrt(pa*(1-pa)/r + pc*(1-pc)/s)
end
abswald_2bin(r, s, a, c) = abs(wald_2bin(r, s, a, c))
function negpval_fisher(r, s, a, c)
X = vals2matrix(r, s, a, c)
return -pval_fisher(X)
end
function pearson_chisq(r, s, a, c)
x = [a, r-a, c, s-c]
mu = expected(x)
return sum((x .- mu).^2 ./mu)
end
default_testfunc = absscore_2bin
#default_testfunc = abswald_2bin
#default_testfunc = negpval_fisher
#default_testfunc = pearson_chisq
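# Small check with hypothetical numbers of a relation used later in this notebook:
# for a 2×2 table, the square of the score statistic equals Pearson's chi-squared statistic.
let r = 10, s = 20, a = 3, c = 11
    @show score_2bin(r, s, a, c)^2
    @show pearson_chisq(r, s, a, c)
end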
# p-value functions with parameters
#
function pval_2bin(r, s, a0, c0, p; testfunc = default_testfunc)
t0 = testfunc(r, s, a0, c0)
pval = 0.0
for a in 0:r
for c in 0:s
pval += ifelse(testfunc(r,s,a,c) ≥ t0, p_2bin(r,s,a,c,p), 0.0)
end
end
return min(pval, 1.0)
end
function pval_2bin(X0, p; testfunc = default_testfunc)
r, s, a0, c0 = matrix2vals(X0)
return pval_2bin(r, s, a0, c0, p, testfunc=testfunc)
end
# default set of parameters
#
default_ps = 0.0:0.05:1.0
# maximized p-valued functions
#
function maxpval_and_p_2bin(r, s, a0, c0; ps = default_ps, testfunc = default_testfunc)
pval , i = findmax(pval_2bin(r, s, a0, c0, p, testfunc=testfunc) for p in ps)
return pval, ps[i]
end
function maxpval_and_p_2bin(X0; ps = default_ps, testfunc = default_testfunc)
r, s, a0, c0 = matrix2vals(X0)
return maxpval_and_p_2bin(r, s, a0, c0, ps=ps, testfunc=testfunc)
end
function maxpval_2bin(r, s, a0, c0; ps = default_ps, testfunc = default_testfunc)
return maxpval_and_p_2bin(r, s, a0, c0, ps=ps, testfunc=testfunc)[1]
end
function maxpval_2bin(X0; ps = default_ps, testfunc = default_testfunc)
return maxpval_and_p_2bin(X0, ps=ps, testfunc=testfunc)[1]
end
maxpval_2bin (generic function with 2 methods)
ps=0:0.001:1
X = [
2 3
35 10
]
@show pval_chisq(X)
@show pval_gtest(X)
@show pval_fisher(X)
testfunc = absscore_2bin
println("\n--- testfunc = $testfunc")
@show X
@show maxpval_and_p_2bin(X, ps=ps, testfunc=testfunc)
@show X'
@show maxpval_and_p_2bin(X', ps=ps, testfunc=testfunc)
testfunc = abswald_2bin
println("\n--- testfunc = $testfunc")
@show X
@show maxpval_and_p_2bin(X, ps=ps, testfunc=testfunc)
@show X'
@show maxpval_and_p_2bin(X', ps=ps, testfunc=testfunc)
testfunc = negpval_fisher
println("\n--- testfunc = $testfunc")
@show X
@show maxpval_and_p_2bin(X, ps=ps, testfunc=testfunc)
@show X'
@show maxpval_and_p_2bin(X', ps=ps, testfunc=testfunc)
# testfunc = pearson_chisq
# println("\n--- testfunc = $testfunc")
# @show X
# @show maxpval_and_p_2bin(X, ps=ps, testfunc=testfunc)
# @show X'
# @show maxpval_and_p_2bin(X', ps=ps, testfunc=testfunc)
;
pval_chisq(X) = 0.06769876794304266
pval_gtest(X) = 0.08846878235925332
pval_fisher(X) = 0.10299326020880142

--- testfunc = absscore_2bin
X = [2 3; 35 10]
maxpval_and_p_2bin(X, ps=ps, testfunc=testfunc) = (0.09211707265385571, 0.039)
X' = [2 35; 3 10]
maxpval_and_p_2bin(X', ps=ps, testfunc=testfunc) = (0.07381620119003682, 0.19)

--- testfunc = abswald_2bin
X = [2 3; 35 10]
maxpval_and_p_2bin(X, ps=ps, testfunc=testfunc) = (0.5097750410405969, 0.104)
X' = [2 35; 3 10]
maxpval_and_p_2bin(X', ps=ps, testfunc=testfunc) = (0.3366564805684976, 0.064)

--- testfunc = negpval_fisher
X = [2 3; 35 10]
maxpval_and_p_2bin(X, ps=ps, testfunc=testfunc) = (0.05656022451135602, 0.478)
X' = [2 35; 3 10]
maxpval_and_p_2bin(X', ps=ps, testfunc=testfunc) = (0.0808924875815553, 0.338)
@show maxpval_and_p_2bin([2 3; 35 10], ps=0:0.001:1, testfunc=absscore_2bin)
@show score_2bin(2+3, 35+10, 2, 35);
maxpval_and_p_2bin([2 3; 35 10], ps=0:0.001:1, testfunc=absscore_2bin) = (0.09211707265385571, 0.039)
score_2bin(2 + 3, 35 + 10, 2, 35) = 1.827006660620956
According to
http://scistatcalc.blogspot.jp/2013/11/barnards-test-calculator.html
the case [2 3; 35 10] gives
Score Statistic is 1.827007
Nuisance parameter is 0.190100
P-value for two-sided test is 0.073816
which agrees with the results above for testfunc = absscore_2bin.
Output of R's Exact package:
> data <- matrix(c(2, 3, 35, 10), 2, 2, byrow=TRUE)
> exact.test(data)
z-pooled
data: 2 out of 5 vs. 35 out of 45
test statistic = -1.827, first sample size = 5, second sample size =
45, p-value = 0.09212
alternative hypothesis: true difference in proportion is not equal to 0
sample estimates:
difference in proportion
-0.3777778
This also agrees numerically with the results above.
For testfunc = absscore_2bin, the computation therefore appears to be correct.
@show maxpval_and_p_2bin([2 3; 35 10], ps=0:0.001:1, testfunc=abswald_2bin)
@show score_2bin(2+3, 35+10, 2, 35);
maxpval_and_p_2bin([2 3; 35 10], ps=0:0.001:1, testfunc=abswald_2bin) = (0.5097750410405969, 0.104)
score_2bin(2 + 3, 35 + 10, 2, 35) = 1.827006660620956
This p-value also agrees with R's Exact package:
> exact.test(data, method="Wald")
z-unpooled
data: 2 out of 5 vs. 35 out of 45
test statistic = -1.6592, first sample size = 5, second sample size
= 45, p-value = 0.5098
alternative hypothesis: true difference in proportion is not equal to 0
sample estimates:
difference in proportion
-0.3777778
@show maxpval_and_p_2bin([2 3; 35 10], ps=0:0.001:1, testfunc=negpval_fisher)
@show score_2bin(2+3, 35+10, 2, 35);
maxpval_and_p_2bin([2 3; 35 10], ps=0:0.001:1, testfunc=negpval_fisher) = (0.05656022451135602, 0.478)
score_2bin(2 + 3, 35 + 10, 2, 35) = 1.827006660620956
This agrees with the p-value 0.05656 from R's Exact package:
> exact.test(data, method="Boschloo")
boschloo
data: 2 out of 5 vs. 35 out of 45
test statistic = 0.10299, first sample size = 5, second sample size
= 45, p-value = 0.05656
alternative hypothesis: true difference in proportion is not equal to 0
sample estimates:
difference in proportion
-0.3777778
For testfunc = negpval_fisher (Boschloo's exact test), the computation therefore appears to be correct.
function make_pvals(samples, calcpval)
L = size(samples,2)
pvals = Array{Float64}(L)
for j in 1:L
pvals[j] = calcpval(@view samples[:,j])
end
return pvals
end
ecdf(pvals, x) = count(p -> p ≤ x, pvals)/length(pvals)
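# Usage sketch for ecdf (hypothetical data): p-values that are exactly uniform on [0,1]
# give ecdf(pvals, x) ≈ x.  The plots below compare how close each test's p-values under
# the null hypothesis come to this ideal (the "y = x" line).
@show ecdf(rand(10^5), 0.05)   # ≈ 0.05 for uniform "p-values"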
ls = [
"--",
"-.",
(0, (5, 2, 2, 2)),
(0, (5, 1.5, 1, 1.5, 1, 1.5, 1, 1.5)),
(0, (5, 1.5, 5, 1.5, 1, 1.5, 1, 1.5)),
(0, (5, 1.5, 5, 1.5, 5, 1.5, 1, 1.5)),
(0, (7, 1.5, 3, 1.5, 3, 1.5, 3, 1.5)),
(0, (7, 1.5, 7, 1.5, 3, 1.5, 3, 1.5)),
(0, (7, 1.5, 7, 1.5, 7, 1.5, 3, 1.5)),
":",
"-",
]
function plotfuncs(string, x, funcs...)
plot(x, x, color="gray", lw=1.2, ls=":", label="y = x")
for i in 1:length(funcs)
plot(x, funcs[i].(x), lw=1.0, ls=ls[mod1(i, length(ls))], label="$(funcs[i])")
end
grid(ls=":")
xlabel("x")
ylabel("probabilty of p-value ≤ x")
xmax = maximum(x)
delta = 0.1*xmax
xticks(0.0:delta:1.01*xmax, fontsize=8, rotation=30)
yticks(fontsize=8)
legend(fontsize=8)
title(string, fontsize=10)
end
function plotecdf(string, funcs...)
figure(figsize=(8,4))
subplot(121)
x = linspace(0.0,0.1,201)
plotfuncs(string, x, funcs...)
subplot(122)
x = linspace(0.0,1.0,201)
plotfuncs(string, x, funcs...)
tight_layout()
end
plotecdf (generic function with 1 method)
function plotall(r, s, p, L)
samples = rand_2bin(r, s, p, L)
@time pvals_chisq = make_pvals(samples, pval_chisq)
@time pvals_gtest = make_pvals(samples, pval_gtest)
@time pvals_fisher = make_pvals(samples, pval_fisher)
pval_score(X) = maxpval_2bin(X, testfunc=absscore_2bin)
@time pvals_score = make_pvals(samples, pval_score)
pval_wald(X) = maxpval_2bin(X, testfunc=abswald_2bin)
@time pvals_wald = make_pvals(samples, pval_wald)
pval_boschloo(X) = maxpval_2bin(X, testfunc=negpval_fisher)
@time pvals_boschloo = make_pvals(samples, pval_boschloo)
Chisq_Test(x) = ecdf(pvals_chisq, x)
G_Test(x) = ecdf(pvals_gtest, x)
Fisher_Test(x) = ecdf(pvals_fisher, x)
Barnard_Score(x) = ecdf(pvals_score, x)
Barnard_Wald(x) = ecdf(pvals_wald, x)
Boschloo(x) = ecdf(pvals_boschloo, x)
sleep(0.1)
plotecdf("r = $r, s = $s, p = $p", Fisher_Test, Barnard_Score, Barnard_Wald, Boschloo)
plotecdf("r = $r, s = $s, p = $p", Fisher_Test, Chisq_Test, G_Test, Boschloo)
end
plotall (generic function with 1 method)
r, s, p, L = 5, 45, 0.4, 1000
plotall(r, s, p, L)
0.000629 seconds (13.00 k allocations: 461.063 KiB) 0.208641 seconds (111.40 k allocations: 4.812 MiB) 0.004550 seconds (1.00 k allocations: 54.813 KiB) 2.481438 seconds (104.74 k allocations: 4.681 MiB) 2.494250 seconds (102.00 k allocations: 4.524 MiB, 0.27% gc time) 26.725942 seconds (17.55 M allocations: 1.045 GiB, 0.28% gc time)
The computation of Boschloo's test is very heavy.
r, s, p, L = 10, 40, 0.4, 1000
plotall(r, s, p, L)
0.000676 seconds (13.00 k allocations: 461.063 KiB) 0.016525 seconds (57.17 k allocations: 1.968 MiB) 0.008494 seconds (1.00 k allocations: 54.813 KiB) 4.104105 seconds (102.00 k allocations: 4.524 MiB) 4.147616 seconds (102.00 k allocations: 4.524 MiB) 69.452609 seconds (28.58 M allocations: 1.702 GiB, 0.19% gc time)
r, s, p, L = 20, 30, 0.4, 1000
plotall(r, s, p, L)
0.000615 seconds (13.00 k allocations: 461.063 KiB) 0.014630 seconds (57.00 k allocations: 1.961 MiB) 0.017536 seconds (1.00 k allocations: 54.813 KiB) 6.376714 seconds (102.00 k allocations: 4.524 MiB) 6.353142 seconds (102.01 k allocations: 4.524 MiB, 0.05% gc time) 166.247818 seconds (41.18 M allocations: 2.453 GiB, 0.11% gc time)
The conjugate prior of the distribution for p = q,

$$P(a,b,c,d) = \binom{r}{a}\binom{s}{c}\, p^{a+c} (1-p)^{b+d}, \qquad a+b=r,\ c+d=s,$$

is

$$\varphi(p \mid \alpha, \beta) = \frac{1}{B(\alpha,\beta)}\, p^{\alpha-1} (1-p)^{\beta-1},$$

and the posterior distribution for a sample (a_0, b_0, c_0, d_0) equals

$$\varphi(p \mid a_0+c_0+\alpha,\ b_0+d_0+\beta).$$

Hence the (posterior) predictive distribution takes the form

$$P^*(a,b,c,d) = \binom{r}{a}\binom{s}{c}\, \frac{B(a+c+a_0+c_0+\alpha,\ b+d+b_0+d_0+\beta)}{B(a_0+c_0+\alpha,\ b_0+d_0+\beta)}.$$

Let us see what happens if, under this predictive distribution, we exactly compute the probability that the absolute value of the score is at least that of the sample (a_0, b_0, c_0, d_0), and adopt it as the p-value. (Since the square of the score equals Pearson's chi-squared statistic, this amounts to defining "at least as extreme" by the ordering of Pearson's chi-squared statistic. Countless other choices are possible, but this one seems the easiest and therefore the natural one to try first.)

The usual chi-squared test relies on the fact that Pearson's chi-squared statistic asymptotically follows the chi-squared distribution with 1 degree of freedom as the sample size grows, so it is a different method from the one proposed above.

Above we took the predictive distribution obtained by Bayesian estimation; a variation that uses the predictive distribution obtained by maximum likelihood is also conceivable.
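The predictive formula above follows from integrating the p = q likelihood against the posterior, using the Beta-integral identity

$$\int_0^1 p^{a+c}(1-p)^{b+d}\, \varphi(p \mid \alpha', \beta')\, dp = \frac{B(a+c+\alpha',\ b+d+\beta')}{B(\alpha', \beta')}, \qquad \alpha' = a_0+c_0+\alpha,\ \beta' = b_0+d_0+\beta.$$

A minimal numerical sketch of this identity (hypothetical numbers; it only assumes that the non-log beta function is available alongside the lbeta used in the code below):

αp, βp, ac, bd = 4.5, 16.5, 7, 23  # stand-ins for α′, β′, a+c, b+d
dp = 1e-5                          # step of a midpoint Riemann sum over (0,1)
lhs = sum(p^ac*(1-p)^bd * p^(αp-1)*(1-p)^(βp-1)/beta(αp, βp) * dp for p in dp/2:dp:1-dp/2)
rhs = beta(ac + αp, bd + βp) / beta(αp, βp)
@show lhs, rhs                     # the two numbers should agree to several digits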
# α, β = parameters of posterior
# X = sample
# Y = predictive
lbinom(n,k) = lgamma(n+1) - lgamma(k+1) - lgamma(n-k+1)
binom(n,k) = exp(lbinom(n,k))
function logposterior_2bin(α, β, X, p)
x = vec(X')
a0, b0, c0, d0 = x[1], x[2], x[3], x[4]
w = (a0+c0+α-1)*log(p) + (b0+d0+β-1)*log(1-p) - lbeta(a0+c0+α, b0+d0+β)
return w
end
posterior_2bin(α,β,X,p) = exp(logposterior_2bin(α, β, X, p))
function logpred_2bin(α, β, X, Y)
x = vec(X')
y = vec(Y')
a0, b0, c0, d0 = x[1], x[2], x[3], x[4]
a, b, c, d = y[1], y[2], y[3], y[4]
p = lbinom(a+b,a) + lbinom(c+d,c) + lbeta(a+c+a0+c0+α, b+d+b0+d0+β)
p -= lbeta(a0+c0+α, b0+d0+β)
return p
end
pred_2bin(α, β, X, Y) = exp(logpred_2bin(α, β, X, Y))
function sum_2bin(f, r, s)
y = 0.0
for a in 0:r
b = r - a
for c in 0:s
d = s - c
y += f([a,b,c,d])
end
end
y
end
@show α = 0.5
@show β = 0.5
@show X = [
4 6
8 12
]
@show r = X[1,1] + X[1,2]
@show s = X[2,1] + X[2,2]
pred(Y) = pred_2bin(α, β, X, Y)
@show sum_2bin(pred, r, s)
println()
@show α = 1.0
@show β = 1.0
@show X = [
6 4
8 12
]
@show r = X[1,1] + X[1,2]
@show s = X[2,1] + X[2,2]
pred(Y) = pred_2bin(α, β, X, Y)
@show sum_2bin(pred, r, s)
sleep(0.1)
posterior(w) = posterior_2bin(α,β,X,w)
w = linspace(0,1,201)
figure(figsize=(5,3.5))
plot(w, posterior.(w))
grid(ls=":")
title("posterior")
x = collect(0:r)
y = collect(0:s)
pred(a,c) = pred_2bin(α,β, X, [a r-a; c s-c])
z = pred.(x',y)
figure(figsize=(5,4))
pcolormesh([x;r+1]', [y;s+1], z, cmap="CMRmap")
colorbar()
xlabel("a")
ylabel("c")
title("predictive distribution")
truemodel(a,c) = p_2bin(r,s,a,c,0.4)
z = truemodel.(x',y)
figure(figsize=(5,4))
pcolormesh([x;r+1]', [y;s+1], z, cmap="CMRmap")
colorbar()
xlabel("a")
ylabel("c")
title("virtual true distribution")
α = 0.5 = 0.5
β = 0.5 = 0.5
X = [4 6; 8 12] = [4 6; 8 12]
r = X[1, 1] + X[1, 2] = 10
s = X[2, 1] + X[2, 2] = 20
sum_2bin(pred, r, s) = 1.000000000000003

α = 1.0 = 1.0
β = 1.0 = 1.0
X = [6 4; 8 12] = [6 4; 8 12]
r = X[1, 1] + X[1, 2] = 10
s = X[2, 1] + X[2, 2] = 20
sum_2bin(pred, r, s) = 0.9999999999999716
We have confirmed that the probabilities sum to 1 (up to numerical error, since this is a floating-point computation).
The predictive distribution approximates the true distribution (the one that generated the sample), but differs from it in the way shown above.
function score(X)
x = vec(X')
a = x[1]
b = x[2]
c = x[3]
d = x[4]
pac = (a+c)/(a+b+c+d)
pa = a/(a+b)
pc = c/(c+d)
return (pc - pa)/sqrt(pac*(1-pac)*(1/(a+b)+1/(c+d)))
end
absscore(X) = abs(score(X))
@show score([4 6; 2 18])
@show score([4 6; 4 16])
@show score([4 6; 6 14])
@show score([4 6; 8 12])
@show score([4 6; 10 10])
@show score([4 6; 12 8])
@show score([4 6; 14 6])
@show score([4 6; 16 4])
default_teststat = absscore
function pval_pred_2bin(α,β, X, Y; teststat = default_teststat)
y = vec(Y')
r = y[1] + y[2]
s = y[3] + y[4]
pred(Y) = pred_2bin(α,β,X,Y)
t0 = teststat(X)
pval = 0.0
for a in 0:r
for c in 0:s
Y = [a r-a; c s-c]
pval += ifelse(teststat(Y) ≥ t0, pred(Y), 0.0)
end
end
return min(pval, 1.0)
end
pval_pred_2bin(α,β, X; teststat = default_teststat) = pval_pred_2bin(α,β, X, X, teststat=teststat)
println()
α = 0.5
β = 0.5
X = [6 4; 4 16]
@show pval_chisq(X)
@show pval_gtest(X)
@show pval_fisher(X)
@show maxpval_2bin(X)
@show pval_pred_2bin(α,β,X)
score([4 6; 2 18]) = -1.9364916731037083
score([4 6; 4 16]) = -1.1677484162422844
score([4 6; 6 14]) = -0.5477225575051663
score([4 6; 8 12]) = 0.0
score([4 6; 10 10]) = 0.5175491695067655
score([4 6; 12 8]) = 1.035098339013531
score([4 6; 14 6]) = 1.581138830084189
score([4 6; 16 4]) = 2.1908902300206643
pval_chisq(X) = 0.028459736916310565
pval_gtest(X) = 0.029908917627333883
pval_fisher(X) = 0.04485792401834373
maxpval_2bin(X) = 0.029226408763686467
pval_pred_2bin(α, β, X) = 0.027610726943770786
0.027610726943770786
function plot_comparison_with_pred(r, s, p, L; α=0.5, β=0.5)
samples = rand_2bin(r, s, p, L)
pval_pred(X) = pval_pred_2bin(α,β,X)
@time pvals_pred = make_pvals(samples, pval_pred)
@time pvals_chisq = make_pvals(samples, pval_chisq)
@time pvals_gtest = make_pvals(samples, pval_gtest)
@time pvals_fisher = make_pvals(samples, pval_fisher)
pval_score(X) = maxpval_2bin(X, testfunc=absscore_2bin)
@time pvals_score = make_pvals(samples, pval_score)
pval_wald(X) = maxpval_2bin(X, testfunc=abswald_2bin)
@time pvals_wald = make_pvals(samples, pval_wald)
# pval_boschloo(X) = maxpval_2bin(X, testfunc=negpval_fisher)
# @time pvals_boschloo = make_pvals(samples, pval_boschloo)
Pred_Bayes(x) = ecdf(pvals_pred, x)
Chisq_Test(x) = ecdf(pvals_chisq, x)
G_Test(x) = ecdf(pvals_gtest, x)
Fisher_Test(x) = ecdf(pvals_fisher, x)
Barnard_Score(x) = ecdf(pvals_score, x)
Barnard_Wald(x) = ecdf(pvals_wald, x)
#Boschloo(x) = ecdf(pvals_boschloo, x)
sleep(0.1)
plotecdf("r = $r, s = $s, p = $p", Fisher_Test, Barnard_Score, Barnard_Wald, Pred_Bayes)
plotecdf("r = $r, s = $s, p = $p", Fisher_Test, Chisq_Test, G_Test, Pred_Bayes)
end
plot_comparison_with_pred (generic function with 1 method)
r, s, p, L = 5, 45, 0.4, 1000
plot_comparison_with_pred(r, s, p, L)
0.235113 seconds (2.78 M allocations: 148.329 MiB, 7.17% gc time) 0.000944 seconds (13.00 k allocations: 461.063 KiB) 0.202907 seconds (111.02 k allocations: 4.794 MiB, 1.39% gc time) 0.004423 seconds (1.00 k allocations: 54.813 KiB) 2.477677 seconds (104.74 k allocations: 4.681 MiB) 2.474740 seconds (102.00 k allocations: 4.524 MiB)
r, s, p, L = 10, 40, 0.4, 1000
plot_comparison_with_pred(r, s, p, L)
0.315893 seconds (4.52 M allocations: 241.188 MiB, 7.46% gc time) 0.000709 seconds (13.00 k allocations: 461.063 KiB) 0.014768 seconds (57.06 k allocations: 1.963 MiB) 0.007700 seconds (1.00 k allocations: 54.813 KiB) 4.106992 seconds (102.00 k allocations: 4.524 MiB) 4.160439 seconds (102.00 k allocations: 4.524 MiB)
r, s, p, L = 20, 30, 0.4, 1000
plot_comparison_with_pred(r, s, p, L)
0.456960 seconds (6.52 M allocations: 348.000 MiB, 6.17% gc time) 0.000625 seconds (13.00 k allocations: 461.063 KiB) 0.015091 seconds (57.00 k allocations: 1.961 MiB) 0.013807 seconds (1.00 k allocations: 54.813 KiB) 6.403019 seconds (102.00 k allocations: 4.524 MiB, 0.04% gc time) 6.349685 seconds (102.00 k allocations: 4.524 MiB)
r, s, p, L = 10, 40, 0.2, 1000
plot_comparison_with_pred(r, s, p, L)
0.327804 seconds (4.52 M allocations: 241.188 MiB, 6.10% gc time) 0.000878 seconds (13.00 k allocations: 461.063 KiB) 0.016667 seconds (58.63 k allocations: 2.033 MiB) 0.006612 seconds (1.00 k allocations: 54.813 KiB) 4.336305 seconds (102.00 k allocations: 4.524 MiB) 4.330038 seconds (102.00 k allocations: 4.524 MiB)
r, s, p, L = 5, 45, 0.2, 1000
plot_comparison_with_pred(r, s, p, L)
0.202223 seconds (2.77 M allocations: 147.728 MiB, 6.56% gc time) 0.000615 seconds (13.00 k allocations: 461.063 KiB) 0.021253 seconds (61.82 k allocations: 2.176 MiB, 9.32% gc time) 0.004207 seconds (1.00 k allocations: 54.813 KiB) 2.551351 seconds (102.00 k allocations: 4.524 MiB) 2.556768 seconds (102.00 k allocations: 4.524 MiB)
r, s, p, L = 5, 45, 0.1, 1000
plot_comparison_with_pred(r, s, p, L)
0.230208 seconds (2.77 M allocations: 147.728 MiB, 6.92% gc time) 0.000738 seconds (13.00 k allocations: 461.063 KiB) 0.025052 seconds (65.10 k allocations: 2.322 MiB) 0.004434 seconds (1.00 k allocations: 54.813 KiB) 2.516203 seconds (102.00 k allocations: 4.524 MiB) 2.626976 seconds (102.00 k allocations: 4.524 MiB, 0.12% gc time)
r, s, p, L = 50, 150, 0.3, 1000
plot_comparison_with_pred(r, s, p, L)
5.534531 seconds (77.02 M allocations: 4.017 GiB, 6.38% gc time) 0.000827 seconds (13.00 k allocations: 461.063 KiB) 0.028032 seconds (57.00 k allocations: 1.961 MiB) 0.043911 seconds (1.00 k allocations: 54.813 KiB) 78.133661 seconds (102.00 k allocations: 4.524 MiB) 79.243756 seconds (102.00 k allocations: 4.524 MiB)
Looking at the plots above, the test based on exact probability computation with the Bayesian posterior predictive distribution (Pred_Bayes in the plots) also does reasonably well.
However, the plain (uncorrected) chi-squared test looks remarkably stable.
For the mere task of computing a p-value, no amount of special machinery seems likely to gain very much.
function pred_mle_2bin(X, Y)
x = vec(X')
p = (x[1]+x[3])/sum(x)
return p_2bin(Y, p)
end
function pval_pred_mle_2bin(X, Y; teststat = default_teststat)
y = vec(Y')
r = y[1] + y[2]
s = y[3] + y[4]
f_mle(Y) = pred_mle_2bin(X, Y)
t0 = teststat(X)
pval = 0.0
for a in 0:r
for c in 0:s
Y = [a r-a; c s-c]
pval += ifelse(teststat(Y) ≥ t0, f_mle(Y), 0.0)
end
end
return min(pval, 1.0)
end
pval_pred_mle_2bin(X; teststat = default_teststat) = pval_pred_mle_2bin(X, X, teststat=teststat)
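# Standalone usage sketch (not in the original): compare the maximum-likelihood predictive
# p-value with the Bayesian one (α = β = 0.5) on the table examined earlier.
let X = [6 4; 4 16]
    @show pval_pred_mle_2bin(X)
    @show pval_pred_2bin(0.5, 0.5, X)
end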
function plot_comparison_with_pred_all(r, s, p, L; α=0.5, β=0.5)
samples = rand_2bin(r, s, p, L)
pval_pred(X) = pval_pred_2bin(α,β,X)
@time pvals_pred = make_pvals(samples, pval_pred)
pval_pred_mle(X) = pval_pred_mle_2bin(X)
@time pvals_pred_mle = make_pvals(samples, pval_pred_mle)
@time pvals_chisq = make_pvals(samples, pval_chisq)
# @time pvals_gtest = make_pvals(samples, pval_gtest)
# @time pvals_fisher = make_pvals(samples, pval_fisher)
# pval_score(X) = maxpval_2bin(X, testfunc=absscore_2bin)
# @time pvals_score = make_pvals(samples, pval_score)
# pval_wald(X) = maxpval_2bin(X, testfunc=abswald_2bin)
# @time pvals_wald = make_pvals(samples, pval_wald)
pval_boschloo(X) = maxpval_2bin(X, testfunc=negpval_fisher)
@time pvals_boschloo = make_pvals(samples, pval_boschloo)
Pred_Bayes(x) = ecdf(pvals_pred, x)
Pred_MLE(x) = ecdf(pvals_pred_mle, x)
Chisq_Test(x) = ecdf(pvals_chisq, x)
# G_Test(x) = ecdf(pvals_gtest, x)
# Fisher_Test(x) = ecdf(pvals_fisher, x)
# Barnard_Score(x) = ecdf(pvals_score, x)
# Barnard_Wald(x) = ecdf(pvals_wald, x)
Boschloo(x) = ecdf(pvals_boschloo, x)
sleep(0.1)
plotecdf("r = $r, s = $s, p = $p", Pred_Bayes, Pred_MLE)
plotecdf("r = $r, s = $s, p = $p", Pred_Bayes, Pred_MLE, Chisq_Test)
plotecdf("r = $r, s = $s, p = $p", Pred_Bayes, Pred_MLE, Boschloo)
plotecdf("r = $r, s = $s, p = $p", Pred_Bayes, Pred_MLE, Chisq_Test, Boschloo)
#plotecdf("r = $r, s = $s, p = $p", Barnard_Score, Barnard_Wald, Pred_Bayes, Pred_MLE)
#plotecdf("r = $r, s = $s, p = $p", Chisq_Test, G_Test, Pred_Bayes, Pred_MLE, Fisher_Test)
end
plot_comparison_with_pred_all (generic function with 1 method)
r, s, p, L = 5, 45, 0.4, 4000
plot_comparison_with_pred_all(r, s, p, L, α=0.5, β=0.5)
0.795286 seconds (11.09 M allocations: 590.912 MiB, 7.70% gc time) 0.919467 seconds (11.09 M allocations: 591.113 MiB, 6.24% gc time) 0.002578 seconds (52.00 k allocations: 1.801 MiB) 107.590995 seconds (70.21 M allocations: 4.178 GiB, 0.34% gc time)
In the computation just above there is little difference among the usual simple chi-squared test (Chisq_Test), the test based on the null-hypothesis predictive distribution obtained by Bayesian estimation (Pred_Bayes), the one obtained by maximum likelihood (Pred_MLE), and Boschloo's exact test.
r, s, p, L = 10, 40, 0.4, 1000
plot_comparison_with_pred_all(r, s, p, L, α=0.5, β=0.5)
0.329458 seconds (4.52 M allocations: 241.188 MiB, 8.68% gc time) 0.392777 seconds (4.53 M allocations: 241.482 MiB, 6.23% gc time) 0.000662 seconds (13.00 k allocations: 461.063 KiB) 70.430257 seconds (28.58 M allocations: 1.702 GiB, 0.21% gc time)
r, s, p, L = 20, 30, 0.4, 1000
plot_comparison_with_pred_all(r, s, p, L, α=0.5, β=0.5)
0.472260 seconds (6.52 M allocations: 348.000 MiB, 7.32% gc time) 0.561037 seconds (6.52 M allocations: 347.969 MiB, 5.92% gc time) 0.000622 seconds (13.00 k allocations: 461.063 KiB) 165.049874 seconds (41.18 M allocations: 2.453 GiB, 0.13% gc time)
r, s, p, L = 3, 22, 0.4, 1000
plot_comparison_with_pred_all(r, s, p, L, α=0.5, β=0.5)
0.107460 seconds (932.02 k allocations: 49.462 MiB, 7.09% gc time) 0.107909 seconds (930.00 k allocations: 49.431 MiB, 4.99% gc time) 0.000759 seconds (13.00 k allocations: 461.063 KiB) 6.497054 seconds (5.96 M allocations: 362.129 MiB, 0.57% gc time)
r, s, p, L = 5, 20, 0.4, 1000
plot_comparison_with_pred_all(r, s, p, L, α=0.5, β=0.5)
0.120186 seconds (1.27 M allocations: 67.619 MiB, 6.63% gc time) 0.122518 seconds (1.27 M allocations: 67.589 MiB, 6.43% gc time) 0.000688 seconds (13.00 k allocations: 461.063 KiB) 11.383386 seconds (8.10 M allocations: 492.867 MiB, 0.36% gc time)
r, s, p, L = 10, 15, 0.4, 1000
plot_comparison_with_pred_all(r, s, p, L, α=0.5, β=0.5)
0.157133 seconds (1.77 M allocations: 94.322 MiB, 6.45% gc time) 0.166511 seconds (1.77 M allocations: 94.292 MiB, 5.95% gc time) 0.000624 seconds (13.00 k allocations: 461.063 KiB) 22.667274 seconds (11.25 M allocations: 685.127 MiB, 0.27% gc time)
r, s, p, L = 12, 13, 0.5, 1000
plot_comparison_with_pred_all(r, s, p, L, α=0.5, β=0.5)
0.188116 seconds (1.83 M allocations: 97.527 MiB, 8.14% gc time) 0.152548 seconds (1.83 M allocations: 97.496 MiB, 5.52% gc time) 0.000728 seconds (13.00 k allocations: 461.063 KiB) 24.655902 seconds (11.63 M allocations: 708.199 MiB, 0.26% gc time)
I ran out of steam here. Would somebody please try the multinomial (four-category) version?
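As a possible starting point (only a sketch, with a hypothetical helper name p_4mult and no further analysis): in the multinomial version the whole table, with total n = a+b+c+d fixed, follows Multinomial(n, prodprob(p, q)) under the independence model, where prodprob is the function defined at the top of this notebook.

# Hypothetical helper: probability of a 2×2 table under the 4-category multinomial
# independence model with marginal probabilities p (rows) and q (columns).
p_4mult(X, p, q) = pdf(Multinomial(sum(X), prodprob(p, q)), vec(X'))
@show p_4mult([1 2; 3 4], 0.3, 0.6)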