Christopher Fonnesbeck Vanderbilt University
John Salvatier Amazon.com
Thomas Wiecki Brown University, Quantopian
Bayesian models are usually specified and fit using a domain-specific language, typically accessed through an interface to R. This approach keeps users at a distance from their model: specification, data handling, and post-processing happen in different languages and environments.
An alternative approach is to build models in the native language used to perform data analysis.
A handful of packages exist that use MCMC to fit models in R.
MCMCpack
mcmc
MCMCglmm
Each of these operates on particular classes of models.
Most Bayesians use interfaces to BUGS/JAGS instead of native code.
Python is a dynamic, interpreted programming language that is flexible enough to be used for a variety of tasks, from scripts to mission-critical applications. It currently has a very strong presence in scientific computing. Despite being an easy-to-use, high-level language, it is also capable of achieving very high performance.
Runnable pseudocode: Python's syntax is readable and clear.
Gibbs sampler for the function:
$$f(x,y) = x^2 \exp(-xy^2 - y^2 + 2y - 4x)$$
using the full conditional distributions:
$$x|y \sim \text{Gamma}(3, y^2 + 4)$$
$$y|x \sim \text{Normal}\left(\frac{1}{1+x}, \frac{1}{2(1+x)}\right)$$
from numpy import zeros, random, sqrt
gamma = random.gamma
normal = random.normal
def gibbs(N=20000, thin=200):
    mat = zeros((N, 2))
    x, y = mat[0]
    for i in range(N):
        for j in range(thin):
            # numpy's gamma takes a scale parameter, so invert the rate
            x = gamma(3, 1. / (y**2 + 4))
            y = normal(1. / (x + 1), 1. / sqrt(2 * (x + 1)))
        mat[i] = x, y
    return mat
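A quick way to exercise the sampler is to run a short chain and check that the draws respect the supports of the conditionals. This is a self-contained sketch (the chain sizes and seed are only illustrative):

```python
import numpy as np

def gibbs(N=500, thin=10, seed=42):
    """Gibbs sampler for f(x, y) ∝ x^2 exp(-x y^2 - y^2 + 2y - 4x)."""
    rng = np.random.RandomState(seed)
    mat = np.zeros((N, 2))
    x, y = 0.0, 0.0
    for i in range(N):
        for _ in range(thin):
            # x | y ~ Gamma(shape=3, rate=y^2 + 4); numpy uses a scale parameter
            x = rng.gamma(3, 1.0 / (y**2 + 4))
            # y | x ~ Normal(mean 1/(1+x), variance 1/(2(1+x)))
            y = rng.normal(1.0 / (1 + x), 1.0 / np.sqrt(2 * (1 + x)))
        mat[i] = x, y
    return mat

samples = gibbs()
print(samples[:, 0].min() > 0)   # x draws are gamma-distributed, hence positive
```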
Python objects are dynamically referenced
y = 5
y = 'foo'
x = y = [1, 2, 3]
y[0] = -9
x
[-9, 2, 3]
... but strongly typed!
'5' + 6
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-cfde8aeadc63> in <module>()
----> 1 '5' + 6

TypeError: cannot concatenate 'str' and 'int' objects
NumPy and SciPy are fundamental scientific modules for Python.
In particular, NumPy provides:
the ndarray, a fast N-dimensional array type
vectorized operations and broadcasting over arrays
linear algebra, Fourier transforms, and random number generation
SciPy provides a large suite of standard scientific computing routines: optimization, integration, interpolation, signal processing, and statistics.
# Calculate second difference matrix
import numpy as np
I2 = -2*np.eye(8)
E = np.diag(np.ones(7), k=-1)
I2 + E + E.T
array([[-2.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1., -2.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1., -2.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1., -2.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1., -2.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1., -2.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1., -2.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1., -2.]])
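As a small taste of SciPy (a sketch using `scipy.linalg`; the right-hand side here is arbitrary), the second difference matrix above is exactly the kind of object its linear algebra routines consume:

```python
import numpy as np
from scipy import linalg

# Rebuild the 8x8 second difference matrix from above
D2 = -2 * np.eye(8) + np.diag(np.ones(7), k=-1) + np.diag(np.ones(7), k=1)

# Solve the linear system D2 @ x = b (a discrete boundary-value problem)
b = np.ones(8)
x = linalg.solve(D2, b)

print(np.allclose(D2 @ x, b))   # the solution reproduces b
```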
IPython is an enhanced Python shell which provides a more robust and productive development environment for users. It includes the HTML notebook featured here, as well as support for interactive data visualization and easy high-performance parallel computing.
IPython has a set of predefined ‘magic functions’ that you can call with a command line style syntax. These include:
%run
%edit
%debug
%timeit
%paste
%load_ext
For example, we can use %load_ext to load an extension to run R code within IPython:
%load_ext rpy2.ipython
%%R
x <- seq(1, 10)
y <- rnorm(10)
plot(x, y)
The IPython Notebook is a web-based interactive interface to IPython that combines code execution, text, mathematics, plots and rich media into a single document.
It provides an easy way of reporting, from simple data-cleaning scripts to publishable books.
from IPython.display import HTML
HTML('<iframe src=http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter3_MCMC/IntroMCMC.ipynb width=800 height=400></iframe>')
Pandas provides high-performance data structures for data manipulation and analysis. In particular, it allows for intelligent data alignment and integrated handling of missing data. Datasets can be easily reshaped, sliced, subsetted, and indexed hierarchically.
Here's a trivial example of a pandas DataFrame, populated with columns of various types:
%matplotlib inline
from pandas import *
df = DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                'B': ['A', 'B', 'C'] * 8,
                'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
                'D': np.random.randn(24),
                'E': np.random.randn(24),
                'F': date_range(start='4/1/2012', periods=24)})
df.head()
|   | A     | B | C   | D         | E         | F          |
|---|-------|---|-----|-----------|-----------|------------|
| 0 | one   | A | foo | -0.079579 | -0.769077 | 2012-04-01 |
| 1 | one   | B | foo |  0.083715 | -0.043521 | 2012-04-02 |
| 2 | two   | C | foo |  0.631201 | -0.101193 | 2012-04-03 |
| 3 | three | A | bar |  1.819274 |  0.930788 | 2012-04-04 |
| 4 | one   | B | bar | -0.124589 |  0.629453 | 2012-04-05 |
Consider some real data, from Statistical Methods for the Analysis of Repeated Measurements by Charles S. Davis. These data are from a multicenter, randomized controlled trial of botulinum toxin type B (BotB) in patients with cervical dystonia from nine U.S. sites.
cdystonia = read_csv("cdystonia.csv", index_col=None)
cdystonia.head()
|   | patient | obs | week | site | id | treat | age | sex | twstrs |
|---|---------|-----|------|------|----|-------|-----|-----|--------|
| 0 | 1       | 1   | 0    | 1    | 1  | 5000U | 65  | F   | 32     |
| 1 | 1       | 2   | 2    | 1    | 1  | 5000U | 65  | F   | 30     |
| 2 | 1       | 3   | 4    | 1    | 1  | 5000U | 65  | F   | 24     |
| 3 | 1       | 4   | 8    | 1    | 1  | 5000U | 65  | F   | 37     |
| 4 | 1       | 5   | 12   | 1    | 1  | 5000U | 65  | F   | 39     |
With Pandas, we can easily manipulate these data. For example, we might want a hierarchical index.
cdystonia.set_index(['patient','obs']).head(10)
| patient | obs | week | site | id | treat  | age | sex | twstrs |
|---------|-----|------|------|----|--------|-----|-----|--------|
| 1       | 1   | 0    | 1    | 1  | 5000U  | 65  | F   | 32     |
|         | 2   | 2    | 1    | 1  | 5000U  | 65  | F   | 30     |
|         | 3   | 4    | 1    | 1  | 5000U  | 65  | F   | 24     |
|         | 4   | 8    | 1    | 1  | 5000U  | 65  | F   | 37     |
|         | 5   | 12   | 1    | 1  | 5000U  | 65  | F   | 39     |
|         | 6   | 16   | 1    | 1  | 5000U  | 65  | F   | 36     |
| 2       | 1   | 0    | 1    | 2  | 10000U | 70  | F   | 60     |
|         | 2   | 2    | 1    | 2  | 10000U | 70  | F   | 26     |
|         | 3   | 4    | 1    | 2  | 10000U | 70  | F   | 27     |
|         | 4   | 8    | 1    | 2  | 10000U | 70  | F   | 41     |
Or we may want to summarize data that are grouped according to one or more variables.
cdystonia.groupby(['treat', 'sex'])['twstrs'].mean()
treat    sex
10000U   F      42.830303
         M      37.208333
5000U    F      40.216981
         M      42.523810
Placebo  F      43.258065
         M      38.891566
Name: twstrs, dtype: float64
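The same split-apply-combine pattern works on any DataFrame. Here is a self-contained sketch (synthetic data, not the cdystonia file) that also reshapes the result into a treat-by-sex table with unstack:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'treat': ['5000U', '10000U', 'Placebo'] * 4,
    'sex': ['F', 'M'] * 6,
    'twstrs': np.arange(12, dtype=float),
})

# Group means, reshaped so rows are treatments and columns are sexes
means = df.groupby(['treat', 'sex'])['twstrs'].mean().unstack()
print(means)
```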
Being an interpreted language, Python can sometimes be slow relative to compiled languages. Yet, we don't want to trade the productivity benefits of Python for the speed of a language like C or Fortran. Often, it is just a few lines of code that slow down an analysis, so if that code can be sped up somehow, the entire analysis can run more efficiently.
Cython is a language that allows Python programmers to write fast code without having to write C/C++/Fortran directly. It looks much like Python code, but with type declarations. Cython takes this code, translates it to C, then compiles the generated C code to create a Python extension.
from numpy import zeros, random, sqrt
gamma = random.gamma
normal = random.normal
def pygibbs(N=20000, thin=200):
    mat = zeros((N, 2))
    x, y = mat[0]
    for i in range(N):
        for j in range(thin):
            # numpy's gamma takes a scale parameter, so invert the rate
            x = gamma(3, 1. / (y**2 + 4))
            y = normal(1. / (x + 1), 1. / sqrt(2 * (x + 1)))
        mat[i] = x, y
    return mat
%timeit pygibbs(1000, 10)
10 loops, best of 3: 27.2 ms per loop
"Cythonizing" your Python code can result in a significant increase in speed.
%load_ext cythonmagic
%%cython -lm -lgsl -lgslcblas

cimport cython
import numpy as np
from numpy cimport *

cdef extern from "math.h":
    double sqrt(double)

cdef extern from "gsl/gsl_rng.h":
    ctypedef struct gsl_rng_type
    ctypedef struct gsl_rng
    gsl_rng_type *gsl_rng_mt19937
    gsl_rng *gsl_rng_alloc(gsl_rng_type * T) nogil

cdef extern from "gsl/gsl_randist.h":
    double gamma "gsl_ran_gamma"(gsl_rng * r, double, double)
    double gaussian "gsl_ran_gaussian"(gsl_rng * r, double)

cdef gsl_rng *r = gsl_rng_alloc(gsl_rng_mt19937)

@cython.wraparound(False)
@cython.boundscheck(False)
def gibbs(int N=20000, int thin=500):
    cdef:
        double x=0
        double y=0
        int i, j
        ndarray[float64_t, ndim=2] samples
    samples = np.empty((N, 2))
    for i from 0 <= i < N:
        for j from 0 <= j < thin:
            # gsl_ran_gamma takes a scale parameter, so invert the rate
            x = gamma(r, 3, 1.0/(y*y + 4))
            # gsl_ran_gaussian is zero-mean; add the conditional mean
            y = 1.0/(x + 1) + gaussian(r, 1.0/sqrt(2*(x + 1)))
        samples[i, 0] = x
        samples[i, 1] = y
    return samples
%timeit gibbs(1000, 10)
1000 loops, best of 3: 953 µs per loop
Python provides an ideal platform for Bayesian computing, due to its unique combination of flexibility, performance and ease-of-use.
JAGS developer Martyn Plummer has described his software as:
A platform for experimenting with ideas in Bayesian modeling
Because it is implemented natively in Python, PyMC has the potential to be such an experimental platform for statistical methodologists to develop and implement new methods.
PyMC is a python module that implements Bayesian statistical models and fitting algorithms, including Markov chain Monte Carlo. Its flexibility and extensibility make it applicable to a large suite of problems. Along with core sampling functionality, PyMC includes methods for summarizing output, plotting, goodness-of-fit and convergence diagnostics.
*PyMC provides functionalities to make Bayesian analysis as painless as possible.*
Let's consider a simple attempt to model failure rates. The data are a vector of failure counts and a vector of observation times, which together can be used to calculate rates:
y = 5,1,5,14,3,19,1,1,4,22
t = 94.320, 15.720, 62.880, 125.760, 5.240, 31.440, 1.048, 1.048, 2.096, 10.480
np.array(y)/t
array([ 0.05301103,  0.06361323,  0.07951654,  0.11132316,  0.57251908,
        0.6043257 ,  0.95419847,  0.95419847,  1.90839695,  2.09923664])
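Before fitting any model, the pooled rate (total failures over total observation time) provides a useful baseline:

```python
y = [5, 1, 5, 14, 3, 19, 1, 1, 4, 22]
t = [94.320, 15.720, 62.880, 125.760, 5.240,
     31.440, 1.048, 1.048, 2.096, 10.480]

# Total failures divided by total exposure
pooled_rate = sum(y) / sum(t)
print(pooled_rate)  # ≈ 0.214
```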
The simplest (naive) model is to specify a Poisson model for the data, and estimate the corresponding rate. We might place an exponential prior on this rate.
from pymc import *
rate = Exponential('rate', 0.1)
Calling Exponential with appropriate arguments returns a PyMC Stochastic object.
rate
<pymc.distributions.Exponential 'rate' at 0x111950f10>
isinstance(rate, Stochastic)
True
PyMC objects have attributes that are relevant to their roles in a Bayesian model. Thus, a Stochastic has a value and a log-probability:
rate.value
array(2.0112404848444427)
rate.logp
-2.5037091414784896
We did not specify a value for rate (but we could have!), so one was sampled automatically when the object was created.
We can sample from rate, which generates a new value for the object and an associated log-probability.
rate.random()
array(17.889479833802298)
rate.value
array(17.889479833802298)
The next step is to multiply the rate by the time offsets, to calculate expected failure counts. This generates a Deterministic object that we will call theta.
theta = rate * t
isinstance(theta, Deterministic)
True
theta.value
array([ 1687.33573792,   281.22262299,  1124.89049195,  2249.7809839 ,
          93.74087433,   562.44524597,    18.74817487,    18.74817487,
          37.49634973,   187.48174866])
Notice that even though theta is vector-valued, we did not need to construct a loop to specify it. This is due to the vectorization capabilities of NumPy.
We can now specify a likelihood, using theta as the parameter(s). What type of object is the likelihood?
failures = Poisson('failures', theta, value=y, observed=True)
isinstance(failures, Stochastic)
True
It's a Stochastic. The difference is that the observed=True flag fixes the value of the stochastic to the observed data, so it does not get updated during sampling.
This collection of PyMC objects forms a directed acyclic graph by virtue of the parent-child relationships defined by one object (parent) being the argument for another (child).
We can pass the nodes of our graph to an MCMC object, which creates a sampler that can be used to fit our model.
simple_model = MCMC([failures, theta, rate])
simple_model.sample(10000, 5000)
[-----------------100%-----------------] 10000 of 10000 complete in 0.4 sec
By default, PyMC chooses an MCMC algorithm ("step method") for each stochastic node in the graph. We can see here that the Metropolis step method was chosen for the rate parameter.
simple_model.step_method_dict
{<pymc.distributions.Exponential 'rate' at 0x111950f10>: [<pymc.StepMethods.Metropolis at 0x1166e0ed0>]}
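What a Metropolis step method does at each iteration can be sketched in plain NumPy (a toy random-walk sampler for the rate parameter, not PyMC's implementation; the proposal scale, seed, and chain length are arbitrary choices):

```python
import numpy as np

rng = np.random.RandomState(1)

y = np.array([5, 1, 5, 14, 3, 19, 1, 1, 4, 22])
t = np.array([94.320, 15.720, 62.880, 125.760, 5.240,
              31.440, 1.048, 1.048, 2.096, 10.480])

def logp(rate):
    """Unnormalized log-posterior: Exponential(0.1) prior + Poisson likelihood."""
    if rate <= 0:
        return -np.inf
    theta = rate * t
    return -0.1 * rate + np.sum(y * np.log(theta) - theta)

rate, trace = 1.0, []
for _ in range(5000):
    prop = rate + rng.normal(0, 0.05)          # random-walk proposal
    if np.log(rng.uniform()) < logp(prop) - logp(rate):
        rate = prop                             # accept; otherwise keep current value
    trace.append(rate)

print(np.mean(trace[2500:]))  # ≈ 0.21, close to PyMC's estimate above
```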
Matplot.plot(rate)
Plotting rate
rate.summary()
rate:

	Mean             SD               MC Error        95% HPD interval
	------------------------------------------------------------------
	0.214            0.024            0.001            [ 0.168  0.266]

	Posterior quantiles:

	2.5             25              50              75             97.5
	 |---------------|===============|===============|---------------|
	0.169            0.199           0.214           0.229           0.273
We did not have to use MCMC. PyMC also has approximation methods, such as maximum a posteriori (MAP) estimation.
map_model = MAP([failures, theta, rate])
map_model.fit()
rate.value
array(0.2142176857025202)
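In this simple model the MAP estimate can be checked in closed form: the exponential prior (a Gamma(1, 0.1)) is conjugate to the Poisson likelihood, so the posterior is Gamma(∑y + 1, rate = ∑t + 0.1), whose mode is ∑y / (∑t + 0.1). A quick hand check using the data above:

```python
y = [5, 1, 5, 14, 3, 19, 1, 1, 4, 22]
t = [94.320, 15.720, 62.880, 125.760, 5.240,
     31.440, 1.048, 1.048, 2.096, 10.480]

# Posterior is Gamma(sum(y) + 1, rate = sum(t) + 0.1); its mode:
posterior_mode = sum(y) / (sum(t) + 0.1)
print(posterior_mode)  # ≈ 0.2142, agreeing with the MAP fit
```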
This appears to be a lousy model for these data: they vary more than would be expected under a Poisson model.
Let's update the model to make it more realistic.
The simplest extension of the Poisson model is to allow the mean to vary. A natural choice for the distribution of the mean is a gamma distribution.
alpha = Exponential('alpha', 0.1, value=1)
beta = Exponential('beta', 0.1, value=1)
rate = Gamma('rate', alpha, beta, size=len(y))
rate.value
array([ 2.88503664,  0.32513389,  1.85656985,  0.24414514,  1.89594116,
        2.06834612,  2.58120821,  0.22028342,  0.51869994,  0.89119894])
theta = rate * t
failures = Poisson('failures', theta, value=y, observed=True)
Running this model yields more reasonable estimates of rates.
hierarchical_model = MCMC([failures, theta, rate, alpha, beta])
hierarchical_model.sample(10000, 5000)
[-----------------100%-----------------] 10000 of 10000 complete in 1.2 sec
Matplot.summary_plot(rate, rhat=False)
Samplers in PyMC 2 are limited to "first generation" MCMC algorithms (i.e. Metropolis and related).
PyMC 3, currently in beta, is a complete re-write of the PyMC code base, in part to allow for the implementation of gradient-based sampling algorithms, such as Hamiltonian Monte Carlo and NUTS.
PyMC 3 is built on Theano, a library that compiles array expressions to fast code and computes gradients automatically, which is exactly what gradient-based samplers require. For example, the gradient of a sum of logistic functions:
from theano import function, tensor as T
x = T.dmatrix('x')
s = T.sum(1 / (1 + T.exp(-x)))
gs = T.grad(s, x)
dlogistic = function([x], gs)
dlogistic([[3, -1],[0, 2]])
array([[ 0.04517666,  0.19661193],
       [ 0.25      ,  0.10499359]])
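We can confirm the result by hand, since the derivative of the logistic function $s(x)$ is $s(x)(1 - s(x))$ (plain NumPy, no Theano needed):

```python
import numpy as np

x = np.array([[3., -1.], [0., 2.]])
s = 1 / (1 + np.exp(-x))        # the logistic function
grad = s * (1 - s)              # its derivative, s(x) * (1 - s(x))
print(grad)
```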
import imp
pymc3 = imp.load_module("pymc3", *imp.find_module("pymc", ["/Users/fonnescj/GitHub/pymc"]))
Radon is a radioactive gas that enters homes through contact points with the ground. It is a carcinogen that is the primary cause of lung cancer in non-smokers. Radon levels vary greatly from household to household.
The EPA did a study of radon levels in 80,000 houses. Two important predictors: the floor on which the measurement was taken (radon is higher in basements than on the first floor), and the county-level uranium reading.
The hierarchy in this example is households within county.
Here is the data, which requires some cleanup, courtesy of Pandas.
import pandas as pd
# Import radon data
srrs2 = pd.read_csv('srrs2.dat')
srrs2.columns = srrs2.columns.map(str.strip)
srrs_mn = srrs2[srrs2.state=='MN']
srrs_mn['fips'] = srrs_mn.stfips*1000 + srrs_mn.cntyfips
cty = pd.read_csv('cty.dat')
cty_mn = cty[cty.st=='MN']
cty_mn['fips'] = 1000*cty_mn.stfips + cty_mn.ctfips
# Use the `merge` method to combine home- and county-level information in a single DataFrame.
srrs_mn = srrs_mn.merge(cty_mn[['fips', 'Uppm']], on='fips')
srrs_mn = srrs_mn.drop_duplicates(subset='idnum')
u = np.log(srrs_mn.Uppm)
n = len(srrs_mn)
# We also need a lookup table (`dict`) for each unique county, for indexing.
srrs_mn.county = srrs_mn.county.map(str.strip)
mn_counties = srrs_mn.county.unique()
county_lookup = dict(zip(mn_counties, range(len(mn_counties))))
# Finally, create local copies of variables.
county = srrs_mn['county_code'] = srrs_mn.county.replace(county_lookup).values
radon = srrs_mn.activity
srrs_mn['log_radon'] = log_radon = np.log(radon + 0.1).values
floor_measure = srrs_mn.floor.values
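The lookup-table step deserves a note: it maps county names to consecutive integers so they can later be used as an index array into the county-level parameters. A standalone sketch of the same trick (with a few hypothetical county names):

```python
import numpy as np
import pandas as pd

counties = pd.Series(['AITKIN', 'ANOKA', 'ANOKA', 'BECKER', 'AITKIN'])

# Assign each unique label a consecutive integer code, in order of appearance
uniques = counties.unique()
lookup = dict(zip(uniques, range(len(uniques))))
codes = counties.replace(lookup).values

print(codes)  # [0 1 1 2 0]
```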
Normal, Uniform = pymc3.Normal, pymc3.Uniform
NUTS, sample = pymc3.NUTS, pymc3.sample
The most general model allows both the intercept and slope to vary by county:
$$y_i = \alpha_{j[i]} + \beta_{j[i]} x_{i} + \epsilon_i$$

where

$$\epsilon_i \sim N(0, \sigma_y^2)$$

and the intercept and slope random effects:

$$\alpha_{j[i]} \sim N(\mu_{\alpha}, \sigma_{\alpha}^2)$$

$$\beta_{j[i]} \sim N(\mu_{\beta}, \sigma_{\beta}^2)$$

As with the "no-pooling" model, we set a separate intercept for each county, but rather than fitting separate least squares regression models for each county, multilevel modeling shares strength among counties, allowing for more reasonable inference in counties with little data.
A primary strength of multilevel models is the ability to handle predictors on multiple levels simultaneously. If we consider the varying-intercepts model above:
$$y_i = \alpha_{j[i]} + \beta x_{i} + \epsilon_i$$

we may, instead of a simple random effect to describe variation in the expected radon value, specify another regression model with a county-level covariate. Here, we use the county uranium reading $u_j$, which is thought to be related to radon levels:

$$\alpha_j = \gamma_0 + \gamma_1 u_j + \zeta_j$$

$$\zeta_j \sim N(0, \sigma_{\alpha}^2)$$

Thus, we are now incorporating a house-level predictor (floor or basement) as well as a county-level predictor (uranium).
Note that the model has both indicator variables for each county, plus a county-level covariate. In classical regression, this would result in collinearity. In a multilevel model, the partial pooling of the intercepts towards the expected value of the group-level linear model avoids this.
Group-level predictors also serve to reduce group-level variation $\sigma_{\alpha}$. An important implication of this is that the group-level estimate induces stronger pooling.
hierarchical_model = pymc3.Model()

with hierarchical_model:

    # Priors
    sigma_a = Uniform('sigma_a', lower=0, upper=100)
    tau_a = sigma_a**-2

    # County uranium model for intercept
    gamma_0 = Normal('gamma_0', mu=0., tau=0.0001)
    gamma_1 = Normal('gamma_1', mu=0., tau=0.0001)

    # Uranium model for intercept
    mu_a = gamma_0 + gamma_1*u
    # County variation not explained by uranium
    eps_a = Normal('eps_a', mu=0, tau=tau_a, shape=len(set(county)))
    a = pymc3.Deterministic('a', mu_a + eps_a[county])

    # Common slope
    b = Normal('b', mu=0, tau=0.001)

    # Model error
    sigma_y = Uniform('sigma_y', lower=0, upper=100)
    tau_y = sigma_y**-2

    # Expected value
    y_hat = a + b * floor_measure

    # Data likelihood
    y_like = Normal('y_like', mu=y_hat, tau=tau_y, observed=log_radon)
with hierarchical_model:
    step = NUTS()
    samples = sample(2000, step, njobs=2)
[-----------------100%-----------------] 2000 of 2000 complete in 54.1 sec
import matplotlib.pyplot as plt
trace = samples[-1000:]
u = np.log(srrs_mn.Uppm)
a_means = trace['a'][0].mean(axis=0)
plt.scatter(u, a_means)
g0 = trace['gamma_0'][0].mean()
g1 = trace['gamma_1'][0].mean()
xvals = np.linspace(-1, 0.8)
plt.plot(xvals, g0+g1*xvals, 'k--')
plt.xlim(-1, 0.8)
a_se = trace['a'][0].std(axis=0)
for ui, m, se in zip(u, a_means, a_se):
    plt.plot([ui, ui], [m-se, m+se], 'b-')
plt.xlabel('County-level uranium'); plt.ylabel('Intercept estimate');
For easy specification of generalized linear models, the glm module provides a convenience function. Models can be fit using only a line or two of Python code.
age = np.array([13, 14, 14, 12, 9, 15, 10, 14, 9, 14, 13, 12, 9, 10, 15, 11, 15, 11, 7, 13,
                13, 10, 9, 6, 11, 15, 13, 10, 9, 9, 15, 14, 14, 10, 14, 11, 13, 14, 10])
price = np.array([2950, 2300, 3900, 2800, 5000, 2999, 3950, 2995, 4500, 2800, 1990, 3500,
                  5100, 3900, 2900, 4950, 2000, 3400, 8999, 4000, 2950, 3250, 3950, 4600,
                  4500, 1600, 3900, 4200, 6500, 3500, 2999, 2600, 3250, 2500, 2400, 3990,
                  4600, 450, 4700])/1000.
age_price = pd.DataFrame({'x': age, 'y': price})
glm = pymc3.glm
with pymc3.Model() as model:
    glm.glm('y ~ x', age_price, family=glm.families.T())
    trace = pymc3.sample(2000, pymc3.HamiltonianMC())
[-----------------100%-----------------] 2000 of 2000 complete in 2.5 sec
import matplotlib.pyplot as plt
import prettyplotlib as ppl
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(111, xlabel='Age', ylabel='Price')
ppl.scatter(ax, age, price)
pymc3.glm.plot_posterior_predictive(trace, eval=np.linspace(5, 16, 100), samples=100)
Python and its "scientific stack" offers a compelling computational environment for modern Bayesian computing.
In comparison to other Bayesian computing solutions, it requires the fewest tradeoffs and provides the most interactive platform for developing models.
PyMC provides very general implementations of modern Markov chain Monte Carlo and related procedures, and aspires to be a platform for probabilistic programming.