by Fedor Iskhakov, ANU
Description: Stochastic matrix, irreducibility and aperiodicity, stationary distribution.
Mathematical objects describing the evolution of random variable(s) in time.
$$ \{X_t\}_{t \in T} $$

Examples:
Markov property (memorylessness): probability distribution over $ X_{t+1} $ depends only on $ X_t $, but not on any previous values of the process
Transition matrix $ P $ is stochastic
By definition, $ \{P_{ik}\}, k=1,\dots,n $ is the probability distribution of $ X_{t+1} $ given $ X_{t} = x_i $
Therefore, it is possible to simulate the chain with
import numpy as np
P = np.array([[.5,.4,.1],[.4,.5,.1],[.2,.2,.6]])
ψ = np.array([0.2,0.3,0.5]) # arbitrary distribution of initial value
def ddraw(probs):
    '''Draws one realization of a discrete random variable
    generated from the given probability distribution (base 0)
    '''
    assert probs.ndim == 1, 'Expecting a one-dimensional array of probabilities'
    assert np.abs(probs.sum()-1) < 1e-10, 'Passed probabilities do not sum up to one'
    cdf = np.cumsum(probs)      # cumulative distribution
    u = np.random.uniform()     # U(0,1) draw
    for i in range(cdf.size):
        if u <= cdf[i]:
            return i            # u falls between cdf[i-1] and cdf[i]

# generate a sequence of discrete random variables with distribution ψ
for i in range(10):
    print(i, ddraw(ψ), sep=' - ')
0 - 1
1 - 2
2 - 2
3 - 1
4 - 2
5 - 1
6 - 1
7 - 2
8 - 0
9 - 0
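As a quick sanity check, the empirical frequencies of many such draws should approach the target distribution. A self-contained sketch (re-implementing the same inverse-CDF logic, here with `np.searchsorted` in place of the explicit loop):

```python
import numpy as np

def ddraw(probs):
    '''Inverse-CDF draw from a discrete distribution (same logic as the loop above)'''
    cdf = np.cumsum(probs)
    u = np.random.uniform()              # U(0,1) draw
    return int(np.searchsorted(cdf, u))  # first index i with u <= cdf[i]

np.random.seed(0)                        # fixed seed for reproducibility
psi = np.array([0.2, 0.3, 0.5])
draws = np.array([ddraw(psi) for _ in range(100_000)])
freq = np.bincount(draws, minlength=3) / draws.size
print(freq)                              # close to [0.2, 0.3, 0.5]
```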
def simmc(P, psi, T=100):
    '''Simulates a Markov chain with given transition probability matrix (P) and
    initial state distribution (psi) for a given number T of steps (time periods)
    '''
    P = np.asarray(P)           # convert to numpy array
    psi = np.asarray(psi)
    assert np.all(np.abs(P.sum(axis=1)-1) < 1e-10), 'Passed P is not a stochastic matrix'
    X = np.empty(T, dtype=int)  # allocate space for the simulated values
    X[0] = ddraw(psi)           # simulate the initial state
    # main loop over time
    for t in range(1, T):
        X[t] = ddraw(P[X[t-1], :])  # simulate using the row of the current state
    return X
print('Transition matrix:',P,sep='\n')
print('Distribution of initial value:',ψ,sep='\n')
sm = simmc(P,ψ,T=100)
print('Simulation:',sm,sep='\n')
weather = ['Sunny','Partly cloudy','Rainy']
for i in sm:
    print(weather[i], end=' >> ')
Transition matrix:
[[0.5 0.4 0.1]
 [0.4 0.5 0.1]
 [0.2 0.2 0.6]]
Distribution of initial value:
[0.2 0.3 0.5]
Simulation:
[1 1 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 1 0 0 1 0 1 0 0 1 0 0 0 1 2 2 2 2 2 2 0 1 1 0 0 1 1 1 1 1 0 0 0 0 0 0 1 2 2 1 2 2 2 2 2 0 1 0 1 1 0 0 0 0 1 2 2 1 1 2 2 2 1 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1]
Partly cloudy >> Partly cloudy >> Partly cloudy >> Sunny >> Sunny >> Sunny >> Sunny >> Sunny >> Partly cloudy >> Sunny >> Sunny >> Sunny >> Sunny >> Sunny >> Partly cloudy >> Sunny >> Sunny >> Sunny >> Sunny >> Partly cloudy >> Partly cloudy >> Sunny >> Sunny >> Partly cloudy >> Sunny >> Partly cloudy >> Sunny >> Sunny >> Partly cloudy >> Sunny >> Sunny >> Sunny >> Partly cloudy >> Rainy >> Rainy >> Rainy >> Rainy >> Rainy >> Rainy >> Sunny >> Partly cloudy >> Partly cloudy >> Sunny >> Sunny >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Sunny >> Sunny >> Sunny >> Sunny >> Sunny >> Sunny >> Partly cloudy >> Rainy >> Rainy >> Partly cloudy >> Rainy >> Rainy >> Rainy >> Rainy >> Rainy >> Sunny >> Partly cloudy >> Sunny >> Partly cloudy >> Partly cloudy >> Sunny >> Sunny >> Sunny >> Sunny >> Partly cloudy >> Rainy >> Rainy >> Partly cloudy >> Partly cloudy >> Rainy >> Rainy >> Rainy >> Partly cloudy >> Sunny >> Partly cloudy >> Sunny >> Partly cloudy >> Sunny >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >>
Suppose that the distribution of $ X_t $ is known to be $ \psi_t $
What then is the distribution of $ X_{t+1} $, and, more generally, of $ X_{t+m} $?
Let’s start by finding the distribution of $ X_{t+1} $, denoted $ \psi_{t+1} $
Fix $ y \in S $. Using the law of total probability, we can decompose the probability $ \mathbb{P}\{X_{t+1} = y\} $ as follows:
$$ \mathbb {P} \{X_{t+1} = y \} = \sum_{x \in S} \mathbb {P} \{ X_{t+1} = y \, | \, X_t = x \} \cdot \mathbb {P} \{ X_t = x \} $$

In words, to get the probability of being at $ y $ tomorrow, we account for all ways this can happen and sum their probabilities
Repeating for all $ y \in S $, in matrix notation we have
$$ \psi_{t+1} = \psi_{t} \cdot P $$

To find the distribution of $ X_{t+m} $ we can repeat the same argument $ m $ times
$$ \psi_{t+m} = \psi_{t} \cdot P^m $$

print('Transition matrix:',P,sep='\n')
print('Distribution of initial value:',ψ,sep='\n')
print('Distribution in one time period:', ψ@P ,sep='\n')
print('Distribution in two time periods:', ψ@P@P ,sep='\n')
print('Distribution in ten time periods:', ψ @ np.linalg.matrix_power(P,10) ,sep='\n')
# BE AWARE that P**10 works element-wise!
Transition matrix:
[[0.5 0.4 0.1]
 [0.4 0.5 0.1]
 [0.2 0.2 0.6]]
Distribution of initial value:
[0.2 0.3 0.5]
Distribution in one time period:
[0.32 0.33 0.35]
Distribution in two time periods:
[0.362 0.363 0.275]
Distribution in ten time periods:
[0.39985352 0.39985352 0.20029297]
To see this, assume $ \psi_t=(1,0,\dots,0) $ and apply $ \psi_{t+m} = \psi_{t} \cdot P^m $ to get the probability distribution of the state in $ m $ periods. The elements of this vector are the individual $ m $-step transition probabilities from state $ x_1 $. The argument can be repeated for all degenerate $ \psi_t $ that place all probability mass on a single value $ x_i $.
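This can be verified numerically: with a degenerate initial distribution, $ \psi_t \cdot P^m $ simply picks out the corresponding row of $ P^m $. A quick sketch with the transition matrix used above:

```python
import numpy as np

P = np.array([[.5,.4,.1],[.4,.5,.1],[.2,.2,.6]])
m = 5
Pm = np.linalg.matrix_power(P, m)    # m-step transition probabilities
e1 = np.array([1., 0., 0.])          # all probability mass on the first state
print(e1 @ Pm)                       # distribution of the state m periods ahead
print(Pm[0])                         # row 1 of P^m: the same probabilities
```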
Two states $ i $ and $ j $ are said to communicate if there exist positive integers $ k $ and $ l $ such that
$$ P_{ij}^k > 0 \text{ and } P_{ji}^l > 0 $$

A Markov chain is called irreducible if every pair of states communicates.
A Markov chain is called periodic if it cycles in a predictable way, and aperiodic otherwise. For example, the following chain cycles deterministically through its three states:
$$ P = \left( \begin{array}{lll} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{array} \right) $$

The period of a state $ x_i \in S $ is the greatest common divisor of the set of integers $ k $ such that $ P^k_{ii} > 0 $
In the last example, these integers for state 1 are 3,6,9,…, and so the period of state 1 is 3.
A stochastic matrix is called aperiodic if the period of every state is 1, and periodic otherwise
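The period of a state can be computed directly from this definition by collecting the powers $ k $ with $ P^k_{ii} > 0 $ and taking their greatest common divisor. A sketch that scans only a finite range of powers (assuming `kmax` is large enough for the chain at hand):

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, i, kmax=20):
    '''GCD of all powers k <= kmax with (P^k)_{ii} > 0 (returns 0 if none found)'''
    ks = [k for k in range(1, kmax + 1)
          if np.linalg.matrix_power(P, k)[i, i] > 0]
    return reduce(gcd, ks, 0)

Pc = np.array([[0,1,0],[0,0,1],[1,0,0]])  # the cyclic example above
print([period(Pc, i) for i in range(3)])  # every state has period 3
P = np.array([[.5,.4,.1],[.4,.5,.1],[.2,.2,.6]])
print([period(P, i) for i in range(3)])   # every state has period 1: aperiodic
```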
A stationary (or invariant) distribution is a distribution $ \psi^\star $ such that
$$ \psi^\star = \psi^\star \cdot P $$

It immediately follows that
$$ \psi^\star = \psi^\star \cdot P^k \text{ for any } k, $$

that is, if $ X_0 $ has distribution $ \psi^\star $, then $ X_t $ has the same distribution for all $ t $
ψ = np.array([0.4,0.4,0.2])
print('Transition matrix:',P,sep='\n')
print('Distribution of initial value:',ψ,sep='\n')
print('Distribution in one time period:', ψ@P ,sep='\n')
print('Distribution in two time periods:', ψ@P@P ,sep='\n')
print('Distribution in ten time periods:', ψ @ np.linalg.matrix_power(P,10) ,sep='\n')
Transition matrix:
[[0.5 0.4 0.1]
 [0.4 0.5 0.1]
 [0.2 0.2 0.6]]
Distribution of initial value:
[0.4 0.4 0.2]
Distribution in one time period:
[0.4 0.4 0.2]
Distribution in two time periods:
[0.4 0.4 0.2]
Distribution in ten time periods:
[0.4 0.4 0.2]
Theorem: Every stochastic matrix has at least one stationary distribution.
Proof: application of Brouwer’s fixed point theorem
$ \psi^\star $ is a fixed point of the mapping $ \psi \mapsto \psi P $ from (row) vectors to (row) vectors
Note: a stochastic matrix may have multiple stationary distributions; for example, for the identity matrix every distribution is stationary.
Theorem: If stochastic matrix $ P $ is both aperiodic and irreducible, then
1. $ P $ has exactly one stationary distribution $ \psi^\star $
2. For any initial distribution $ \psi_0 $, the distribution $ \psi_0 \cdot P^t $ converges to $ \psi^\star $ as $ t \to \infty $
A stochastic matrix satisfying the conditions of the theorem is sometimes called uniformly ergodic
One easy sufficient condition for aperiodicity and irreducibility is that all elements of $ P $ are strictly positive (why?)
Part 2 of the theorem gives us a particular way to compute stationary distributions:
Under the uniform ergodicity conditions of the theorem (i.e. aperiodicity and irreducibility), the convergence is towards the stationary distribution $ \psi^\star $!
ψ = np.array([1,0,0])
print('Transition matrix:',P,sep='\n')
for t in range(50):
    ψ = ψ @ P
    print('after step %2d the distribution is %r'%(t+1,ψ))
Transition matrix:
[[0.5 0.4 0.1]
 [0.4 0.5 0.1]
 [0.2 0.2 0.6]]
after step  1 the distribution is array([0.5, 0.4, 0.1])
after step  2 the distribution is array([0.43, 0.42, 0.15])
after step  3 the distribution is array([0.413, 0.412, 0.175])
after step  4 the distribution is array([0.4063, 0.4062, 0.1875])
after step  5 the distribution is array([0.40313, 0.40312, 0.19375])
after step  6 the distribution is array([0.401563, 0.401562, 0.196875])
after step  7 the distribution is array([0.4007813, 0.4007812, 0.1984375])
after step  8 the distribution is array([0.40039063, 0.40039062, 0.19921875])
after step  9 the distribution is array([0.40019531, 0.40019531, 0.19960938])
after step 10 the distribution is array([0.40009766, 0.40009766, 0.19980469])
after step 11 the distribution is array([0.40004883, 0.40004883, 0.19990234])
after step 12 the distribution is array([0.40002441, 0.40002441, 0.19995117])
after step 13 the distribution is array([0.40001221, 0.40001221, 0.19997559])
after step 14 the distribution is array([0.4000061 , 0.4000061 , 0.19998779])
after step 15 the distribution is array([0.40000305, 0.40000305, 0.1999939 ])
after step 16 the distribution is array([0.40000153, 0.40000153, 0.19999695])
after step 17 the distribution is array([0.40000076, 0.40000076, 0.19999847])
after step 18 the distribution is array([0.40000038, 0.40000038, 0.19999924])
after step 19 the distribution is array([0.40000019, 0.40000019, 0.19999962])
after step 20 the distribution is array([0.4000001 , 0.4000001 , 0.19999981])
after step 21 the distribution is array([0.40000005, 0.40000005, 0.1999999 ])
after step 22 the distribution is array([0.40000002, 0.40000002, 0.19999995])
after step 23 the distribution is array([0.40000001, 0.40000001, 0.19999998])
after step 24 the distribution is array([0.40000001, 0.40000001, 0.19999999])
after step 25 the distribution is array([0.4       , 0.4       , 0.19999999])
after step 26 the distribution is array([0.4, 0.4, 0.2])
after step 27 the distribution is array([0.4, 0.4, 0.2])
after step 28 the distribution is array([0.4, 0.4, 0.2])
after step 29 the distribution is array([0.4, 0.4, 0.2])
after step 30 the distribution is array([0.4, 0.4, 0.2])
after step 31 the distribution is array([0.4, 0.4, 0.2])
after step 32 the distribution is array([0.4, 0.4, 0.2])
after step 33 the distribution is array([0.4, 0.4, 0.2])
after step 34 the distribution is array([0.4, 0.4, 0.2])
after step 35 the distribution is array([0.4, 0.4, 0.2])
after step 36 the distribution is array([0.4, 0.4, 0.2])
after step 37 the distribution is array([0.4, 0.4, 0.2])
after step 38 the distribution is array([0.4, 0.4, 0.2])
after step 39 the distribution is array([0.4, 0.4, 0.2])
after step 40 the distribution is array([0.4, 0.4, 0.2])
after step 41 the distribution is array([0.4, 0.4, 0.2])
after step 42 the distribution is array([0.4, 0.4, 0.2])
after step 43 the distribution is array([0.4, 0.4, 0.2])
after step 44 the distribution is array([0.4, 0.4, 0.2])
after step 45 the distribution is array([0.4, 0.4, 0.2])
after step 46 the distribution is array([0.4, 0.4, 0.2])
after step 47 the distribution is array([0.4, 0.4, 0.2])
after step 48 the distribution is array([0.4, 0.4, 0.2])
after step 49 the distribution is array([0.4, 0.4, 0.2])
after step 50 the distribution is array([0.4, 0.4, 0.2])
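An alternative to successive approximations is to compute $ \psi^\star $ directly as a left eigenvector of $ P $ for eigenvalue 1, i.e. a right eigenvector of $ P^T $, normalized to sum to one. A sketch using `np.linalg.eig`:

```python
import numpy as np

P = np.array([[.5,.4,.1],[.4,.5,.1],[.2,.2,.6]])
vals, vecs = np.linalg.eig(P.T)      # right eigenvectors of P.T = left eigenvectors of P
k = np.argmin(np.abs(vals - 1))      # locate the unit eigenvalue
psi_star = np.real(vecs[:, k])
psi_star = psi_star / psi_star.sum() # normalize to a probability distribution
print(psi_star)                      # close to [0.4, 0.4, 0.2]
```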
Under irreducibility it also holds that
$$ \frac{1}{T} \sum_{t = 1}^T \mathbf{1}\{X_t = x_j\} \to \psi^\star_j \text{ as } T \to \infty $$

In other words, the stationary distribution $ \psi^\star $ also gives the limiting fraction of time that the Markov chain $ X_t $ spends at each point of the state space
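This can be illustrated with one long simulated path: the fraction of time the chain spends in each state approaches $ \psi^\star = (0.4, 0.4, 0.2) $. A self-contained sketch that simulates the chain with `np.random.choice` rather than the `ddraw` function above:

```python
import numpy as np

P = np.array([[.5,.4,.1],[.4,.5,.1],[.2,.2,.6]])
T = 50_000
np.random.seed(42)                            # fixed seed for reproducibility
X = np.empty(T, dtype=int)
X[0] = 0                                      # arbitrary initial state
for t in range(1, T):
    X[t] = np.random.choice(3, p=P[X[t-1]])   # draw next state from the current row
fractions = np.bincount(X, minlength=3) / T
print(fractions)                              # close to the stationary distribution [0.4, 0.4, 0.2]
```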