Open Source Bayesian Inference in Python with PyMC3¶

FOSSCON • Philadelphia • August 26, 2017¶

@AustinRochford ¶

Who am I?¶

@AustinRochford • GitHub • Website ¶

austin.rochford@gmail.com • arochford@monetate.com ¶

PyMC3 developer • Principal Data Scientist at Monetate Labs ¶

Bayesian Inference¶

A motivating question:

A rare disease is present in one out of one hundred thousand people. A test gives the correct diagnosis 99.9% of the time. What is the probability that a person that tests positive has the disease?

Conditional Probability¶

Conditional probability is the probability that one event will happen, given that another event has occured.

$$ \begin{align*} P(A\ |\ B) & = \textrm{the probability that } A \textrm{ will happen if we know that } B \textrm{ has happened} \\ & = \frac{P(A \textrm{ and } B)}{P(B)}. \end{align*} $$

Our question,

A rare disease is present in one out of one hundred thousand people. A test gives the correct diagnosis 99.9% of the time. What is the probability that a person that tests positive has the disease?

becomes

$$ \begin{align*} P(+) & = 10^{-5} \\ \\ P(\textrm{Test } +\ |\ +) & = 0.999 \\ P(\textrm{Test } -\ |\ -) & = 0.999 \\ \\ P(+\ |\ \textrm{Test } +) & =\ \textbf{?} \end{align*} $$

Bayes' Theorem¶

Bayes' theorem shows how to get from $P(B\ |\ A)$ to $P(A\ |\ B)$.

$$ \begin{align*} P(+\ |\ \textrm{Test } +) & = \frac{P(\textrm{Test } +\ |\ +) P(+)}{P(\textrm{Test } +)} \\ & = \frac{P(\textrm{Test } +\ |\ +) P(+)}{P(\textrm{Test } +\ |\ +) P(+) + P(\textrm{Test } +\ |\ -) P(-)} \end{align*} $$

$$ \begin{align*} P(+) & = 10^{-5} \\ P(\textrm{Test } +\ |\ +) & = 0.999 \\ P(\textrm{Test } -\ |\ -) & = 0.999 \\ \\ P(+\ |\ \textrm{Test } +) & = \frac{P(\textrm{Test } +\ |\ +) P(+)}{P(\textrm{Test } +\ |\ +) P(+) + P(\textrm{Test } +\ |\ -) P(-)} \\ & = \frac{0.999 \times 10^{-5}}{0.999 \times 10^{-5} + 0.001 \times \left(1 - 10^{-5}\right)} \end{align*} $$

In [8]:

0.999 * 1e-5 / (0.999 * 1e-5 + 0.001 * (1 - 1e-5))

Out[8]:

0.009891284975940117

Probabilistic Programming for Bayesian Inference¶

We know the disease is present in one in one hundred thousand people.

In [9]:

import pymc3 as pm

with pm.Model() as disease_model:
    has_disease = pm.Bernoulli('has_disease', 1e-5)

If a person has the disease, there is a 99.9% chance they test positive. If they do not have the disease, there is a 0.1% chance they test positive.

In [10]:

with disease_model:
    p_test_pos = has_disease * 0.999 + (1 - has_disease) * 0.001

This person has tested positive for the disease.

In [11]:

with disease_model:
    test_pos = pm.Bernoulli('test_pos', p_test_pos, observed=1)

What is the probability that this person has the disease?

In [12]:

with disease_model:
    disease_trace = pm.sample(draws=10000, random_seed=SEED)

Assigned BinaryGibbsMetropolis to has_disease
100%|██████████| 10500/10500 [00:04<00:00, 2427.46it/s]

In [13]:

disease_trace['has_disease']

Out[13]:

array([0, 0, 0, ..., 0, 0, 0])

In [14]:

disease_trace['has_disease'].mean()

Out[14]:

0.0104

In [15]:

0.999 * 1e-5 / (0.999 * 1e-5 + 0.001 * (1 - 1e-5))

Out[15]:

0.009891284975940117

Monte Carlo Methods¶

In [16]:

N = 5000

x, y = np.random.uniform(0, 1, size=(2, N))

In [18]:

fig

Out[18]:

In [19]:

in_circle = x**2 + y**2 <= 1

In [21]:

fig

Out[21]:

In [22]:

4 * in_circle.mean()

Out[22]:

3.1272000000000002

History of Monte Carlo Methods¶

The Monty Hall Problem¶

Initially, we have no information about which door the prize is behind.

In [23]:

with pm.Model() as monty_model:
    prize = pm.DiscreteUniform('prize', 0, 2)

If we choose door one:

	Monty can open
Prize behind	Door 1	Door 2	Door 3
Door 1	No	Yes	Yes
Door 2	No	No	Yes
Door 3	No	Yes	No

In [24]:

from theano import tensor as tt

with monty_model:
    p_open = pm.Deterministic('p_open',
                              tt.switch(tt.eq(prize, 0),
                                        # it is behind the first door
                                        np.array([0., 0.5, 0.5]),
                              tt.switch(tt.eq(prize, 1),
                                        # it is behind the second door
                                        np.array([0., 0., 1.]),
                                        # it is behind the third door
                                        np.array([0., 1., 0.]))))

Monty opened the third door, revealing a goat.

In [25]:

with monty_model:
    opened = pm.Categorical('opened', p_open, observed=2)

Should we switch our choice of door?

In [26]:

with monty_model:
    monty_trace = pm.sample(draws=10000, random_seed=SEED)
    
monty_df = pm.trace_to_dataframe(monty_trace)

Assigned Metropolis to prize
100%|██████████| 10500/10500 [00:02<00:00, 4427.89it/s]

In [27]:

monty_df.prize.head()

Out[27]:

0    0
1    0
2    0
3    0
4    1
Name: prize, dtype: int64

In [29]:

fig

Out[29]:

Yes, we should switch our choice of door.

Introduction to PyMC3¶

From the PyMC3 documentation:

PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning which focuses on advanced Markov chain Monte Carlo and variational fitting algorithms. Its flexibility and extensibility make it applicable to a large suite of problems.

As of October 2016, PyMC3 is a NumFOCUS fiscally sponsored project.

Features¶

Uses Theano as a computational backend
- Automated differentiation, dynamic C compilation, GPU integration
Implements Hamiltonian Monte Carlo and No-U-Turn sampling
High-level GLM (R-like syntax) and Gaussian process specification
General framework for variational inference

Contributing¶

Discourse

Case Study: Sleep Deprivation¶

In [31]:

sleep_df.head()

Out[31]:

	reaction	days	subject	reaction_std
0	249.5600	0	308	-0.868968
1	258.7047	1	308	-0.706623
2	250.8006	2	308	-0.846944
3	321.4398	3	308	0.407108
4	356.8519	4	308	1.035777

In [33]:

fig

Out[33]:

Each subject has a baseline reaction time, which should not be too far apart.

In [35]:

with pm.Model() as sleep_model:
    μ_α = pm.Normal('μ_α', 0., 5.)
    σ_α = pm.HalfCauchy('σ_α', 5.)
    α = pm.Normal('α', μ_α, σ_α, shape=n_subjects)

Each subject's reaction time increases at a different rate after days of sleep deprivation.

In [36]:

with sleep_model:
    μ_β = pm.Normal('μ_β', 0., 5.)
    σ_β = pm.HalfCauchy('σ_β', 5.)
    β = pm.Normal('β', μ_β, σ_β, shape=n_subjects)

The baseline reaction time and rate of increase lead to the observed reaction times.

In [37]:

with sleep_model:
    μ = pm.Deterministic('μ', α[subject_ix] + β[subject_ix] * days)
    σ = pm.HalfCauchy('σ', 5.)
    obs = pm.Normal('obs', μ, σ, observed=reaction_std)

In [38]:

N_JOBS = 3
JOB_SEEDS = [SEED + i for i in range(N_JOBS)]

with sleep_model:
    sleep_trace = pm.sample(njobs=N_JOBS, random_seed=JOB_SEEDS)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
100%|██████████| 1000/1000 [00:07<00:00, 134.79it/s]

Convergence Diagnostics¶

In [39]:

ax = pm.energyplot(sleep_trace, legend=False)
ax.set_title("BFMI = {:.3f}".format(pm.bfmi(sleep_trace)));

In [40]:

max(np.max(gr_values) for gr_values in pm.gelman_rubin(sleep_trace).values())

Out[40]:

1.0019478091296214

Prediction¶

In [41]:

with sleep_model:
    pp_sleep_trace = pm.sample_ppc(sleep_trace)

100%|██████████| 500/500 [00:00<00:00, 1258.46it/s]

In [43]:

pp_df.head()

Out[43]:

	reaction	days	subject	reaction_std	pp_0	pp_1	pp_10	pp_100	pp_101	pp_102	...	pp_90	pp_91	pp_92	pp_93	pp_94	pp_95	pp_96	pp_97	pp_98	pp_99
0	249.5600	0	308	-0.868968	-0.221133	-1.094005	-1.479840	-0.797150	-0.251219	-1.846100	...	-0.680005	-0.036840	-0.618054	0.158745	-0.986407	-0.788178	-1.483328	-0.423254	-0.946230	-0.838087
1	258.7047	1	308	-0.706623	0.354375	-1.192640	-0.424441	0.064195	-0.661047	-1.181403	...	-0.472364	-0.825356	-0.374235	-0.336909	-0.940759	-1.236048	-0.749208	-0.892589	0.243243	-0.891522
2	250.8006	2	308	-0.846944	0.487935	-0.570185	0.304536	0.758313	0.566803	0.211791	...	-0.512914	0.156596	-0.451047	-0.219734	-0.773714	-0.237561	-0.199602	-0.071147	0.146439	-0.193735
3	321.4398	3	308	0.407108	0.950116	-0.554748	0.544648	1.037173	-0.245327	0.450455	...	-0.183340	0.289252	0.358078	1.579609	0.652802	-0.286667	-0.327654	0.607906	-0.087981	-0.097427
4	356.8519	4	308	1.035777	0.485269	0.549079	0.564043	0.045464	0.643697	0.190714	...	0.976137	0.306833	0.744076	0.493216	0.516644	1.090693	0.583320	1.079374	0.996899	0.671175

5 rows × 504 columns

In [46]:

grid.fig

Out[46]:

Hamiltonian Monte Carlo in PyMC3¶

In [52]:

mcmc_fig

Out[52]:

The Curse of Dimensionality¶

In [55]:

fig

Out[55]:

Hamiltonian Monte Carlo¶

$$\frac{d}{dx} \left(x^3\right) = 3 x^2$$

In [56]:

x = tt.dscalar('x')
x.tag.test_value = 0.

y = x**3

In [57]:

from theano import pprint

pprint(tt.grad(y, x))

Out[57]:

'((fill((x ** TensorConstant{3}), TensorConstant{1.0}) * TensorConstant{3}) * (x ** (TensorConstant{3} - TensorConstant{1})))'

Case Study: 1984 Congressional Votes¶

In [60]:

vote_df.head()

Out[60]:

	rep_id	party	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
0	0	republican	n	y	n	y	y	y	n	n	n	y	?	y	y	y	n	y
1	1	republican	n	y	n	y	y	y	n	n	n	n	n	y	y	y	n	?
2	2	democrat	?	y	y	?	y	y	n	n	n	n	y	n	y	y	n	n
3	3	democrat	n	y	y	n	?	y	n	n	n	n	y	n	y	n	n	y
4	4	democrat	y	y	y	n	y	y	n	n	n	n	y	?	y	y	y	y

Question: can we separate Democrats and Republicans based on their voting records?

In [65]:

grid.fig

Out[65]:

Latent State Model

Representatives ($\color{blue}{\theta_i}$) and bills ($\color{green}{b_j})$ have ideal points on a liberal-conservative spectrum.
- If the ideal points are equal, there is a 50% chance the representative will vote for the bill.

In [68]:

with pm.Model() as vote_model:
    # representative ideal points
    θ = pm.Normal('θ', 0., 1., shape=n_rep)
    
    # bill ideal points
    a = hierarchical_normal('a', N_BILL)

Bills also have an ability to discriminate ($\color{red}{a_j}$) between conservative and liberal representatives.
- Some bills have broad bipartisan support, while some provoke votes along party lines.

In [70]:

with vote_model:
    # bill discrimination parameters
    b = hierarchical_normal('b', N_BILL)

This model has

In [71]:

n_rep + 2 * (N_BILL + 1)

Out[71]:

parameters

Observation Model

$$ \begin{align*} P(\textrm{Representative }i \textrm{ votes for bill }j\ |\ \color{blue}{\theta_i}, \color{green}{b_j}, \color{red}{a_j}) & = \frac{1}{1 + \exp\left(-\left(\color{red}{a_j} \cdot \color{blue}{\theta_i} - \color{green}{b_j}\right)\right)} \end{align*} $$

In [73]:

fig

Out[73]:

In [74]:

with vote_model:
    η = a[bill_id] * θ[rep_id] - b[bill_id]
    p = pm.math.sigmoid(η)
    
    obs = pm.Bernoulli('obs', p, observed=vote)

Hamiltonian Monte Carlo Inference

In [75]:

with vote_model:
    nuts_trace = pm.sample(init='adapt_diag', njobs=N_JOBS,
                           random_seed=JOB_SEEDS)

Auto-assigning NUTS sampler...
Initializing NUTS using adapt_diag...
100%|██████████| 1000/1000 [02:11<00:00, 12.76it/s]

Convergence Diagnostics¶

In [76]:

ax = pm.energyplot(nuts_trace, legend=False)
ax.set_title("BFMI = {:.3f}".format(pm.bfmi(nuts_trace)));

In [77]:

max(np.max(gr_values) for gr_values in pm.gelman_rubin(nuts_trace).values())

Out[77]:

1.0135857288309107

Ideal Points¶

In [79]:

fig

Out[79]:

Discriminative Ability of Bills¶

In [80]:

pm.forestplot(nuts_trace, varnames=['a'], rhat=False, xtitle="$a$",
              ylabels=["Bill {}".format(i + 1) for i in range(N_BILL)]);

Comparison to Non-Hamiltonian MCMC¶

In [81]:

with vote_model:
    step = pm.Metropolis()
    met_trace_ = pm.sample(10000, step, njobs=N_JOBS, random_seed=JOB_SEEDS)
    
met_trace = met_trace_[5000::5]

100%|██████████| 10500/10500 [03:40<00:00, 47.65it/s]

In [82]:

max(np.max(gr_values) for gr_values in pm.gelman_rubin(met_trace).values())

Out[82]:

39.87901257961822

In [84]:

fig

Out[84]:

Next Steps¶

PyMC3¶

Probabilistic Programming Ecosystem¶

Probabilistic Programming System	Language	License	Discrete Variable Support	Automatic Differentiation/Hamiltonian Monte Carlo	Variational Inference
PyMC3	Python	Apache V2	Yes	Yes	Yes
Stan	C++, R, Python, ...	BSD 3-clause	No	Yes	Yes
Edward	Python, ...	Apache V2	Yes	Yes	Yes
BUGS	Standalone program, R	GPL V2	Yes	No	No
JAGS	Standalone program, R	GPL V2	Yes	No	No

Thank you!¶

@AustinRochford • austin.rochford@gmail.com • arochford@monetate.com ¶

The Jupyter notebook used to generate these slides is available here.

Open Source Bayesian Inference in Python with PyMC3¶

FOSSCON • Philadelphia • August 26, 2017¶

@AustinRochford¶

Who am I?¶

@AustinRochford • GitHub • Website¶

austin.rochford@gmail.com • arochford@monetate.com¶

PyMC3 developer • Principal Data Scientist at Monetate Labs¶

Bayesian Inference¶

Conditional Probability¶

Bayes' Theorem¶

Probabilistic Programming for Bayesian Inference¶

Monte Carlo Methods¶

History of Monte Carlo Methods¶

The Monty Hall Problem¶

Introduction to PyMC3¶

Features¶

Contributing¶

Case Study: Sleep Deprivation¶

Convergence Diagnostics¶

Prediction¶

Hamiltonian Monte Carlo in PyMC3¶

The Curse of Dimensionality¶

Hamiltonian Monte Carlo¶

Case Study: 1984 Congressional Votes¶

Convergence Diagnostics¶

Ideal Points¶

Discriminative Ability of Bills¶

Comparison to Non-Hamiltonian MCMC¶

Next Steps¶

PyMC3¶

Probabilistic Programming Ecosystem¶

Thank you!¶

@AustinRochford • austin.rochford@gmail.com • arochford@monetate.com¶

@AustinRochford ¶

@AustinRochford • GitHub • Website ¶

austin.rochford@gmail.com • arochford@monetate.com ¶

PyMC3 developer • Principal Data Scientist at Monetate Labs ¶

@AustinRochford • austin.rochford@gmail.com • arochford@monetate.com ¶