This notebook explains how compound steps work in the pymc3.sample function when sampling multiple random variables.
import pymc3 as pm
import numpy as np
import theano
When sampling a model with multiple free random variables, compound steps are needed in the pm.sample function. When compound steps are involved, the function takes a list of step objects and combines them into a list of methods used to update the different random variables. For example, in the following code:
with pm.Model() as m:
    rv1 = ...  # random variable 1 (continuous)
    rv2 = ...  # random variable 2 (continuous)
    rv3 = ...  # random variable 3 (categorical)
    ...
    step1 = pm.Metropolis([rv1, rv2])
    step2 = pm.CategoricalGibbsMetropolis([rv3])
    trace = pm.sample(..., step=[step1, step2], ...)
the compound step now contains a list of methods. At each sampling iteration it iterates over these methods; each method takes a point as input and generates a new point as output. The new point is proposed within each method via a stochastic kernel, and if the proposal is rejected by the Metropolis-Hastings criterion, the method simply returns the original input point.
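Conceptually, a compound step behaves like the minimal sketch below. This is a hypothetical illustration, not PyMC3's actual CompoundStep implementation: it assumes every method exposes a step function that maps a point dictionary to an updated point, and it ignores sampler statistics.

class ToyCompoundStep:
    """Hypothetical sketch of a compound step (not PyMC3's implementation)."""

    def __init__(self, methods):
        self.methods = methods

    def step(self, point):
        # Apply each step method in turn; each method only changes
        # the variables it is responsible for.
        for method in self.methods:
            point = method.step(point)
        return point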
When we call pm.sample(), PyMC3 assigns the best step method to each of the free random variables. Take the following example:
n_ = theano.shared(np.asarray([10, 15]))
with pm.Model() as m:
    p = pm.Beta('p', 1., 1.)
    ni = pm.Bernoulli('ni', .5)
    k = pm.Binomial('k', p=p, n=n_[ni], observed=4)
    trace = pm.sample()
Multiprocess sampling (4 chains in 4 jobs)
CompoundStep
>NUTS: [p]
>BinaryGibbsMetropolis: [ni]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 8 seconds.
There are two free parameters in the model that we would like to sample from: a continuous variable p_logodds__ (the automatically applied log-odds transform of the Beta variable p, so that it lives on an unconstrained scale) and a binary variable ni.
m.free_RVs
[p_logodds__, ni]
When we call pm.sample(), PyMC3 assigns the best step method to each of them. For example, NUTS was assigned to p_logodds__ and BinaryGibbsMetropolis was assigned to ni.
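Internally, this automatic assignment is performed by assign_step_methods in pymc3.sampling. As a quick check, we can call it directly on the model (inside the model context, since instantiating the step methods requires it); it should return the step method(s) PyMC3 picked, though the exact return type depends on the PyMC3 version:

with m:
    steps = pm.sampling.assign_step_methods(m)
steps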
But we can also specify the steps manually:
with m:
    step1 = pm.Metropolis([p])
    step2 = pm.BinaryGibbsMetropolis([ni])
Now we can pass a point to each step and see what happens. First, let us generate a test point as the input:
point = m.test_point
point
{'p_logodds__': array(0.), 'ni': array(0)}
Then we pass the point to the first step method, pm.Metropolis, for the random variable p:
point, state = step1.step(point=point)
point, state
({'p_logodds__': array(0.20397629), 'ni': array(0)}, [{'tune': True, 'scaling': array([1.]), 'accept': 0.7662261757775519, 'accepted': True}])
As you can see, the value of ni does not change, but p_logodds__ is updated.
Similarly, you can pass the updated point to step2 and get a sample for ni:
point = step2.step(point=point)
point
{'p_logodds__': array(0.20397629), 'ni': array(0)}
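We can keep alternating the two methods by hand for a few draws. This toy loop (illustration only, with none of the tuning, statistics, or convergence handling that pm.sample provides) mimics what a compound step does at each iteration; note the two methods' differing return signatures, as seen in the outputs above:

point = m.test_point
for i in range(5):
    point, _ = step1.step(point=point)  # Metropolis update of p_logodds__
    point = step2.step(point=point)     # BinaryGibbsMetropolis update of ni
    print(point)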
A compound step works exactly like this, iterating over all the step methods in its list. In effect, it is Metropolis-Hastings-within-Gibbs sampling. pm.CompoundStep is called internally by pm.sample(); we can also construct it explicitly:
with m:
    comp_step1 = pm.CompoundStep([step1, step2])
comp_step1.methods
[<pymc3.step_methods.metropolis.Metropolis at 0x7fddc61d4910>, <pymc3.step_methods.metropolis.BinaryGibbsMetropolis at 0x7fddc6095bd0>]
By default, the parameter update order follows the order of the random variables and is assigned automatically. But if you specify the steps yourself, you can change the order of the methods in the list:
with m:
    comp_step2 = pm.CompoundStep([step2, step1])
comp_step2.methods
[<pymc3.step_methods.metropolis.BinaryGibbsMetropolis at 0x7fddc6095bd0>, <pymc3.step_methods.metropolis.Metropolis at 0x7fddc61d4910>]
During sampling, the compound step always follows the same order of step methods for each draw, in a Gibbs-like fashion. More precisely, at each update it iterates over the list of methods, and each Metropolis-style method accepts or rejects its proposal by drawing $p \sim \text{Uniform}(0, 1)$ and checking whether $\log p < \log p_{\text{proposed}} - \log p_{\text{current}}$, i.e., comparing against the log of the acceptance ratio.
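As a sketch, that decision rule looks like the hypothetical helper below (not PyMC3's internal code):

def mh_accept(logp_current, logp_proposed):
    # Metropolis-Hastings rule: accept the proposal when
    # log(p) < logp_proposed - logp_current, with p ~ Uniform(0, 1).
    p = np.random.uniform()
    return np.log(p) < logp_proposed - logp_current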
A recurring concern is the validity of mixing discrete and continuous sampling, especially mixing other samplers with NUTS. Chapter 12.4 of Bayesian Data Analysis (3rd edition) contains a short paragraph on "Combining Hamiltonian Monte Carlo with Gibbs sampling", which suggests this could be a valid approach, but the Stan developers have remained skeptical about how practical it is. (Here are more discussions about this issue: 1, 2.)
The concern with mixing discrete and continuous sampling is that a change in the discrete parameters changes the geometry of the continuous distribution, so the adaptation (i.e., the tuned mass matrix and step size) may become inappropriate for Hamiltonian Monte Carlo sampling; HMC/NUTS is hypersensitive to these tuning parameters. Another issue is that we do not know how many iterations we have to run to get a decent sample once the discrete parameters change. Though it has not been fully evaluated, mixing discrete sampling with HMC/NUTS seems to work acceptably when the discrete parameter is low-dimensional (e.g., 2-class mixture models, or outlier detection with explicit discrete labeling). However, it is much less efficient than marginalizing out the discrete parameters, and the Markov chains can sometimes be observed to get stuck quite often. To evaluate this more rigorously, one can use a simulation-based method to check the posterior coverage and establish computational correctness, as explained in Cook, Gelman, and Rubin (2006).
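For the toy model above, marginalizing out the binary variable ni is easy because it has only two states: k becomes an equal-weight mixture of Binomial(10, p) and Binomial(15, p). Below is a minimal sketch using pm.Mixture; the names m_marg and trace_marg are introduced here purely for illustration.

with pm.Model() as m_marg:
    p = pm.Beta('p', 1., 1.)
    # ni ~ Bernoulli(0.5) is summed out analytically: k is a 50/50 mixture
    # of Binomial(n=10, p) and Binomial(n=15, p).
    k = pm.Mixture('k', w=np.array([0.5, 0.5]),
                   comp_dists=pm.Binomial.dist(p=p, n=np.asarray([10, 15]), shape=(2,)),
                   observed=4)
    trace_marg = pm.sample()

With ni marginalized out, the model has a single continuous free variable, so NUTS can sample it alone without a compound step.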
%load_ext watermark
%watermark -n -u -v -iv -w
numpy     1.18.5
pymc3     3.9.0
theano    1.0.4
last updated: Mon Jun 15 2020

CPython 3.7.7
IPython 7.15.0
watermark 2.0.2