This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.

# 13.1. Simulating a discrete-time Markov chain¶

1. Let's import NumPy and matplotlib.
In [ ]:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline

1. We consider a population that cannot comprise more than $N=100$ individuals. We also define birth and death rates.
In [ ]:
N = 100  # maximum population size
a = .5/N  # birth rate
b = .5/N  # death rate

1. We will simulate a Markov chain on the finite space $\{0, 1, \ldots, N\}$. Each state represents a population size. The vector $x$ will contain the population size at each time step. We set the initial state to $x_0=25$, i.e. there are 25 individuals in the population at initialization time.
In [ ]:
nsteps = 1000
x = np.zeros(nsteps)
x[0] = 25

1. We now simulate our chain. At each time step $t$, there is a new birth with probability $a \cdot x_t$, and independently, there is a new death with probability $b \cdot x_t$. These probabilities are proportional to the size of the population at that time. If the population size reaches $0$ or $N$, the evolution stops.
In [ ]:
for t in range(nsteps - 1):
if 0 < x[t] < N-1:
# Is there a birth?
birth = np.random.rand() <= a*x[t]
# Is there a death?
death = np.random.rand() <= b*x[t]
# We update the population size.
x[t+1] = x[t] + 1*birth - 1*death
# The evolution stops if we reach $0$ or $N$.
else:
x[t+1] = x[t]

1. Let's look at the evolution of the population size.
In [ ]:
plt.figure(figsize=(6,3));
plt.plot(x);


We see that, at every time, the population size can stays stable, increase by 1, or decrease by 1.

1. Now, we will simulate many independent trials of this Markov chain. We could run the previous simulation with a loop, but it would be very slow (two nested for loops). Instead, we vectorize the simulation by considering all independent trials at once. There is a single loop over time. At every time step, we update all trials simultaneously with vectorized operations on vectors. The vector x now contains the population size of all trials, at a particular time. At initialization time, the population sizes are set to random numbers between $0$ and $N$.
In [ ]:
ntrials = 100
x = np.random.randint(size=ntrials,
low=0, high=N)

1. We define a function that performs the simulation. At every time step, we find the trials that undergo births and deaths by generating random vectors, and we update the population sizes with vector operations.
In [ ]:
def simulate(x, nsteps):
"""Run the simulation."""
for _ in range(nsteps - 1):
# Which trials to update?
upd = (0 < x) & (x < N-1)
# In which trials do births occur?
birth = 1*(np.random.rand(ntrials) <= a*x)
# In which trials do deaths occur?
death = 1*(np.random.rand(ntrials) <= b*x)
# We update the population size for all trials.
x[upd] += birth[upd] - death[upd]

1. Now, we will look at the histograms of the population size at different times. These histograms represent the probability distribution of the Markov chain, estimated with independent trials (Monte Carlo method).
In [ ]:
bins = np.linspace(0, N, 25);

In [ ]:
plt.figure(figsize=(12,3));
nsteps_list = [10, 1000, 10000]
for i, nsteps in enumerate(nsteps_list):
plt.subplot(1, len(nsteps_list), i + 1);
simulate(x, nsteps)
plt.hist(x, bins=bins);
plt.xlabel("Population size");
if i == 0:
plt.ylabel("Histogram");
plt.title("{0:d} time steps".format(nsteps));


Whereas, initially, the population sizes look equally distributed between $0$ and $N$, they appear to converge to $0$ or $N$ after a sufficiently long time. This is because the states $0$ and $N$ are absorbing: once reached, the chain cannot leave those states. Furthermore, these states can be reached from any other state.

You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).

IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).