Bayesian Methods 2

The material for this notebook is provided via this free online textbook :

I am writing this notebook to document my learning, and hopefully to help others learn machine learning. You can think of it as my personal lecture notes. I would love suggestions / corrections / feedback for these notebooks.

Visit my webpage for more.

Email me: [email protected]

I'd love for you to share if you liked this post.

In [61]:
submit to reddit

This notebook covers or includes:

  • Introduction to Bayesian Methods
  • Uses Python and PyMC
  • Many Examples and Figures


PyMC is a python library dedicated for Bayesian analysis. The book I am using for these notebooks is designed to bridge the complex gaps in the documentation, and make this library and Bayesian analysis more accessible. This notebook will be focused around solving the problem introduced above in Probability Distributions Review.

This type of programming is called probabilistic programming, and has previously been daunting to learn. The author provides an interesting quote that describes probabilistic programming.

Another way of thinking about this: unlike a traditional program, which only runs in forward directions, a probabilistic program is run in both forward and backward directions. It runs forward to compute the consequences of the assumptions it contains about the world, but also runs backwards from the data to constrain the possible explanations. In practice, many probabilistic programming systems will cleverly interleave these forward and backward operations to be efficiently hone in on the best explanations.

PyMC code is fairly easy to read. The only new thing is the syntax, so we will go through it slowly. Remember, we will be simply representing the model's components ($\tau, \ \lambda _1, \ \lambda _2$) as variables.

First we will reload and plot the text-message data from last lesson.

In [7]:
%matplotlib inline
from matplotlib import pyplot as plt

import seaborn as sns
# Some initial styling
rc={'axes.labelsize': 16.0, 'font.size': 16, 
    'legend.fontsize': 16.0, 'axes.titlesize': 18,
    'xtick.labelsize': 15.0, 'ytick.labelsize': 15.0}

sns.set(context = 'poster', style = 'whitegrid', rc=rc)

import numpy as np
import pandas as pd

count_data = pd.read_csv('',
    header=None, names=['msgs'])
N = len(count_data), count_data['msgs'], color="#9b59b6", alpha=0.8)
plt.xlabel('Time (days)')
plt.ylabel('count of msgs received')
plt.title('Did the texting habits change over time?')
plt.xlim(0, N)
(0, 74)
In [9]:
import pymc as pm

alpha = 1.0 / count_data.mean() # 'count_data' holds text message counts

lambda_1 = pm.Exponential('lambda_1', alpha)
lambda_2 = pm.Exponential('lambda_2', alpha)
tau = pm.DiscreteUniform('tau', lower=0, upper=N)

In this code we create PyMC variables for the model components that we need. We assign them to PyMC's stochastic variables, so-called because they are treated by the back end as random number generatiors. We can demonstrate this fact by calling their built-in random() methods.

In [10]:
print('Random Output:', tau.random(), tau.random(), tau.random())
('Random Output:', array(42), array(62), array(29))
In [11]:
def lambda_(tau=tau, lambda_1=lambda_1, lambda_2=lambda_2):
    out = np.zeros(N)
    out[:tau] = lambda_1 # lambda before tau is lambda_1
    out[tau:] = lambda_2 # lambda after (and including) tau is lambda_2
    return out

This code creates a new function lambda_, but we can just think of it as a random variable: the random variable $\lambda$. Note that because lambda_1, lambda_2, and tau are random, lambda_ will be random. No variables are fixed yet.

@pm.deterministic is a decorator that tells PyMC this is a deterministic function. Thus if the arguments were deterministic (they are not), the output would be deterministic as well.

In [12]:
observation = pm.Poisson('obs', lambda_, value=count_data, observed=True)
model = pm.Model([observation, lambda_1, lambda_2, tau])

The variable observation combines out data, count_data, without our proposed data-generation scheme, given by the variable lambda_, through the value keyword. We set observed = True to tell PyMC that this should stay fixed in our analysis. Finally, PyMC wants us to collect all the variables of interest and create a model instance out of them. This makes our life easier when we retrieve the results.

The code below will be fully explained later, but we show it here to see where the results come from. One can think of it as a learning step. The model employed here is a Markov Chain Monte Carlo (MCMC), which is also explained later. This technique returns thousands of random variables from the posterior distributions of $\lambda _1$, $\lambda _2$ , and $\tau$. We can plot a histogram of the random variables to see what the posterior distributions look like. Below we collect the samples (called traces in the MCMC literature) into histograms.

In [13]:
# Code to be explained later
mcmc = pm.MCMC(model)
mcmc.sample(40000, 10000, 1)
 [-----------------100%-----------------] 40000 of 40000 complete in 6.8 sec
In [14]:
lambda_1_samples = mcmc.trace('lambda_1')[:]
lambda_2_samples = mcmc.trace('lambda_2')[:]
tau_samples = mcmc.trace('tau')[:]
In [21]:
# Histogram of samples

ax = plt.subplot(311)

plt.hist(lambda_1_samples, histtype='stepfilled', bins=30, alpha=0.85,
         label='posterior of $\lambda _1$', color='#9b59b6', normed=True)
plt.legend(loc='upper right')
plt.title(r'''Posterior distributions of the variables $\lambda _1, \ \lambda _2, \ \tau$''')
plt.xlim([15, 30])
plt.xlabel('$\lambda _1$ value')

ax = plt.subplot(312)
plt.hist(lambda_2_samples, histtype='stepfilled', bins=30, alpha=0.85,
         label='posterior of $\lambda _2$', color='#3498db', normed=True)
plt.legend(loc='upper right')
plt.xlim([15, 30])
plt.xlabel('$\lambda _2$ value')

w = 1.0 / tau_samples.shape[0] * np.ones_like(tau_samples)
plt.hist(tau_samples, bins=N, alpha = 1,
         label=r'posterior of $\tau$',
         color='#A60628', weights=w, rwidth=2.)

plt.legend(loc='upper left')
plt.ylim([0, 0.75])
plt.xlim([35, len(count_data) - 20])
plt.xlabel(r'$\tau$ (in days)')
<matplotlib.text.Text at 0x1125e1ed0>


Recall that Bayesian methodology returns a distribution. Hence we now have distributions to describe the unknown $\lambda$s and $\tau$. What have we gained? Immediately, we can see the uncertainty in our estimates: the wider distribution, the less certain out posterior belief should be. We can also see what the plausible values for the parameters are: $\lambda _1$ is around 18 and $\lambda _2$ is around 23. The posterior distributions of the two lambdas are clearly distinct, indicating that it is indeed likely that there was a change in the users text message behaviour.

Notice that the posterior distributions for the $\lambda$s do not look like exponential distributions, even though our priors for these were exponential. In fact, the posterior distributions are not really of any form that we recognize from the original model. But that's OK! This is one of the benefits of a Bayesian point of view. If we had instead used a mathematical approaches, we would be stuck with a messy result.

Our analysis also returned a distribution for $\tau$. Its posterior distribution looks different from the other two because it is a discrete random variable, so it does assign probabilities to intervals. We can see that near day 45, there was a 50% chance that the user's behaviour changed. Had no change occurred, or had the change been gradual over time, the posterior distribution of $\tau$ would have been more spread out, reflecting that many days were plausible candidates for $\tau$. However, the actual results show that only three of four days make any sense as potential transition points.

Why would I want sampes from the posterior, anyways?

We will deal with this question for the remainder of the book, and it is an understatement to say that it will lead us to some amazing results. For now, let's end this chapter with one last example.

We will use the posterior samples to answer the following question: what is the expect number of texts at day $t$, $ 0 <= t <= 70$? Recall that the expected value of a Poisson variable is equal to its parameter $\lambda$. Therefore, the question is equivalent to what is the expected value of $\lambda$ at time $t$?

In the code below, let $i$ inded samples from the posterior distributions. Given a dat $t$, we average over all possible $\lambda _i$ for that dat $t$, using $\lambda _i = \lambda _{1, i} \ \ if \ \ t < \tau _i$, else we use $\lambda _i = \lambda _{2, i}$.

In [53]:
NN = tau_samples.shape[0]
expected_texts_per_day = np.zeros(N)
for day in range(0, N):
    #ix is a bool index of all tau samples corresponding to
    # the switchpoint occurring prior to value of 'day'
    ix = day < tau_samples
    # Each posterior sample corresponds to a value for tau.
    # for each day, that value of tau indicates whether we are 'before'
    # in the lambda_1 regime, or 'after' the switchpoint in the lambda_2 regime.
    # By taking the posterior sample of lambda1/2 accordingly, we can 
    # average over all samples to get an expect value for lambda on that day.
    # The message count random variable is Poisson distributed,
    # therefore lambda (the Poisson parameter) is the expected value of
    # message count.
    expected_texts_per_day[day] = (lambda_1_samples[ix].sum()+
                                   lambda_2_samples[~ix].sum()) / NN
plt.plot(range(N), expected_texts_per_day, color='#9b59b6', lw=4,
         label='Predicted number of texts')
plt.xlim(0, N)
plt.ylabel('Expected # of texts')
plt.title('Expected number of texts received'), count_data['msgs'], color="#3498db", alpha=0.65,
    label="observed texts per day")
plt.legend(loc='upper left')
<matplotlib.legend.Legend at 0x11c3c1d50>

Our analysis thus far shows strong support for believing the user's behaviour did chance ($\lambda _1$ would have been close in value to $\lambda _2$ had this not been true), and that change was sudden rather than gradual, as denoted by $\tau$'s strong peaked posterior distribution. We can speculate what might have caused this: a cheaper text-message rage, recent text subscription, or maybe a new relationship. In reality, the text message increase corresponds to a trip the user took for work, leading to an increase of messages to a girlfriend.

In [59]:
from IPython.core.display import HTML

def css_styling():
    styles = open("/users/ryankelly/desktop/custom_notebook2.css", "r").read()

    return HTML(styles)
In [60]:
def social():
    code = """
    <a style='float:left; margin-right:5px;' href="" class="twitter-share-button" data-text="Check this out" data-via="Ryanmdk">Tweet</a>
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);;js.src=p+'://';fjs.parentNode.insertBefore(js,fjs);}}(document, 'script', 'twitter-wjs');</script>
    <a style='float:left; margin-right:5px;' href="" class="twitter-follow-button" data-show-count="false">Follow @Ryanmdk</a>
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);;js.src=p+'://';fjs.parentNode.insertBefore(js,fjs);}}(document, 'script', 'twitter-wjs');</script>
    <a style='float:left; margin-right:5px;'target='_parent' href="" onclick="window.location = '' + encodeURIComponent(window.location); return false"> <img src="" alt="submit to reddit" border="0" /> </a>
<script src="//" type="text/javascript">
  lang: en_US
<script type="IN/Share"></script>
    return HTML(code)