%matplotlib inline
import numpy as np
import pymc3 as pm
import arviz as az
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
Some fake data: response counts y out of N respondents for each gender group.
MALE, FEMALE = 0, 1
y = np.array([34, 42])
gender = np.array([MALE, FEMALE])
N = np.array([104, 110])
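Before fitting anything, the raw observed proportions give a point of comparison for the posterior means below (a quick check, not part of the modelling):

```python
import numpy as np

MALE, FEMALE = 0, 1
y = np.array([34, 42])        # "yes" counts per group
gender = np.array([MALE, FEMALE])
N = np.array([104, 110])      # group sizes

# Observed proportions per group
p_hat = y / N
print(p_hat.round(2))
```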
Here's the simplest model -- two independent probabilities.
with pm.Model() as affection_status:
    π = pm.Beta('π', 1, 1, shape=2)
    δ = pm.Deterministic('δ', π[1] - π[0])
    like = pm.Binomial('like', N, π, observed=y)
with affection_status:
    trace = pm.sample(1000, njobs=2)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [π]
Sampling 2 chains: 100%|██████████| 3000/3000 [00:01<00:00, 1673.65draws/s]
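Because the Beta prior is conjugate to the binomial likelihood, each marginal posterior here is available in closed form, which gives a handy sanity check on the sampler. A sketch using scipy (not part of the original notebook):

```python
import numpy as np
from scipy import stats

y = np.array([34, 42])
N = np.array([104, 110])

# With a flat Beta(1, 1) prior, each π_i has conjugate posterior
# Beta(1 + y_i, 1 + N_i - y_i), so we can compare against the MCMC summary.
posteriors = [stats.beta(1 + yi, 1 + Ni - yi) for yi, Ni in zip(y, N)]
for i, d in enumerate(posteriors):
    lo, hi = d.ppf([0.025, 0.975])
    print(f"pi_{i}: mean={d.mean():.2f}, 95% interval=({lo:.2f}, {hi:.2f})")

# δ = π_1 - π_0 has no simple closed form, but is easy to simulate
rng = np.random.default_rng(42)
delta = (posteriors[1].rvs(100_000, random_state=rng)
         - posteriors[0].rvs(100_000, random_state=rng))
print(f"delta: mean={delta.mean():.2f}, sd={delta.std():.2f}")
```

The closed-form means and intervals should agree closely with the pm.summary table below.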
Here's the posterior distribution of the probability difference:
az.plot_posterior(trace, var_names=['δ'], ref_val=0);
Parameter estimates:
pm.summary(trace).round(2)
|      | mean | sd   | mc_error | hpd_2.5 | hpd_97.5 | n_eff   | Rhat |
|------|------|------|----------|---------|----------|---------|------|
| π__0 | 0.33 | 0.04 | 0.0      | 0.25    | 0.42     | 2000.39 | 1.0  |
| π__1 | 0.38 | 0.05 | 0.0      | 0.30    | 0.48     | 2099.90 | 1.0  |
| δ    | 0.05 | 0.06 | 0.0      | -0.08   | 0.17     | 2094.05 | 1.0  |
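The HPD interval for δ straddles zero, but a more direct summary of the effect's direction is the posterior probability that δ > 0. A sketch, using synthetic draws matching the summary above where the notebook would use trace['δ']:

```python
import numpy as np

# Hypothetical posterior draws for δ; in the notebook, use trace['δ'] instead.
rng = np.random.default_rng(1)
delta_draws = rng.normal(0.05, 0.06, size=2000)  # mean/sd from the summary table

# Fraction of posterior mass above zero
p_positive = (delta_draws > 0).mean()
print(f"P(delta > 0) ~ {p_positive:.2f}")
```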
Another approach is a GLM:
with pm.Model() as affection_status_glm:
    μ = pm.Normal('μ', 0, sd=5)
    θ = pm.Normal('θ', 0, sd=1)
    π = pm.math.invlogit(μ + θ*gender)
    like = pm.Binomial('like', N, π, observed=y)
with affection_status_glm:
    trace_glm = pm.sample(1000, njobs=2)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [θ, μ]
Sampling 2 chains: 100%|██████████| 3000/3000 [00:03<00:00, 928.44draws/s]
Here the effect size θ is on the logit scale, so it's perhaps harder to interpret directly:
az.plot_posterior(trace_glm, var_names=['θ'], ref_val=0);
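To recover a probability-scale difference from the GLM, push the posterior draws of μ and θ through the inverse logit and subtract. A sketch with synthetic draws standing in for trace_glm['μ'] and trace_glm['θ'] (the assumed values are illustrative, chosen to be roughly consistent with the first model):

```python
import numpy as np

def invlogit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical posterior draws; in the notebook, use trace_glm['μ'] and
# trace_glm['θ'] instead.
rng = np.random.default_rng(2)
mu = rng.normal(-0.7, 0.2, size=2000)
theta = rng.normal(0.2, 0.3, size=2000)

# Probability-scale difference: δ = invlogit(μ + θ) - invlogit(μ),
# i.e. π for FEMALE (gender=1) minus π for MALE (gender=0)
delta = invlogit(mu + theta) - invlogit(mu)
print(f"delta on probability scale: mean={delta.mean():.2f}")
```

Transforming each draw (rather than just the posterior means) keeps the full posterior uncertainty on the probability scale.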