by Fedor Iskhakov, ANU
Description: Stochastic matrix, irreducibility and aperiodicity, stationary distribution.
Mathematical objects describing the evolution of random variable(s) in time.
$$ \{X_t\}_{t \in T} $$

Examples:
Markov property (memorylessness): probability distribution over $ X_{t+1} $ depends only on $ X_t $, but not on any previous values of the process
Transition matrix $ P $ is stochastic
By definition, $ \{P_{ik}\}, k=1,\dots,n $ is the probability distribution of $ X_{t+1} $ given $ X_{t} = x_i $
Therefore, it is possible to simulate the chain with
import numpy as np
P = np.array([[.5,.4,.1],[.4,.5,.1],[.2,.2,.6]])
ψ = np.array([0.2,0.3,0.5]) # arbitrary distribution of initial value
def ddraw(probs):
    '''Draws one realization of a discrete random variable
    generated from the given probability distribution (base 0)
    '''
    assert probs.ndim == 1, 'Expecting a one-dimensional array of probabilities'
    assert np.abs(probs.sum()-1) < 1e-10, 'Passed probabilities do not sum up to one'
    cdf = np.cumsum(probs)      # cumulative distribution
    u = np.random.uniform()     # U(0,1) draw
    for i in range(cdf.size):
        if u <= cdf[i]:
            return i            # u falls between cdf[i-1] and cdf[i]

# generate a sequence of discrete random variables with distribution ψ
for i in range(10):
    print(i, ddraw(ψ), sep=' - ')
0 - 1
1 - 2
2 - 2
3 - 1
4 - 2
5 - 1
6 - 1
7 - 2
8 - 0
9 - 0
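As a quick sanity check, the empirical frequencies of many such draws should approach the target distribution. A self-contained sketch (re-implementing the same inverse-CDF logic, here with `np.searchsorted` in place of the explicit loop):

```python
import numpy as np

def ddraw(probs):
    '''Inverse-CDF draw from a discrete distribution (same logic as the loop above)'''
    cdf = np.cumsum(probs)
    u = np.random.uniform()              # U(0,1) draw
    return int(np.searchsorted(cdf, u))  # first index i with u <= cdf[i]

np.random.seed(0)                        # fixed seed for reproducibility
psi = np.array([0.2, 0.3, 0.5])
draws = np.array([ddraw(psi) for _ in range(100_000)])
freq = np.bincount(draws, minlength=3) / draws.size
print(freq)                              # close to [0.2, 0.3, 0.5]
```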
def simmc(P, psi, T=100):
    '''Simulates a Markov chain with given transition probability matrix (P) and
    initial state distribution (psi) for a given number T of steps (time periods)
    '''
    P = np.asarray(P)           # convert to numpy array
    psi = np.asarray(psi)
    assert np.all(np.abs(P.sum(axis=1)-1) < 1e-10), 'Passed P is not a stochastic matrix'
    X = np.empty(T, dtype=int)  # allocate space for the simulated values
    X[0] = ddraw(psi)           # simulate the initial state
    # main loop over time
    for t in range(1, T):
        X[t] = ddraw(P[X[t-1], :])  # simulate using the row of the current state
    return X
print('Transition matrix:',P,sep='\n')
print('Distribution of initial value:',ψ,sep='\n')
sm = simmc(P,ψ,T=100)
print('Simulation:',sm,sep='\n')
weather = ['Sunny','Partly cloudy','Rainy']
for i in sm:
    print(weather[i], end=' >> ')
Transition matrix:
[[0.5 0.4 0.1]
 [0.4 0.5 0.1]
 [0.2 0.2 0.6]]
Distribution of initial value:
[0.2 0.3 0.5]
Simulation:
[1 1 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 1 0 0 1 0 1 0 0 1 0 0 0 1 2 2 2 2 2 2 0 1 1 0 0 1 1 1 1 1 0 0 0 0 0 0 1 2 2 1 2 2 2 2 2 0 1 0 1 1 0 0 0 0 1 2 2 1 1 2 2 2 1 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1]
Partly cloudy >> Partly cloudy >> Partly cloudy >> Sunny >> Sunny >> Sunny >> Sunny >> Sunny >> Partly cloudy >> Sunny >> Sunny >> Sunny >> Sunny >> Sunny >> Partly cloudy >> Sunny >> Sunny >> Sunny >> Sunny >> Partly cloudy >> Partly cloudy >> Sunny >> Sunny >> Partly cloudy >> Sunny >> Partly cloudy >> Sunny >> Sunny >> Partly cloudy >> Sunny >> Sunny >> Sunny >> Partly cloudy >> Rainy >> Rainy >> Rainy >> Rainy >> Rainy >> Rainy >> Sunny >> Partly cloudy >> Partly cloudy >> Sunny >> Sunny >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Sunny >> Sunny >> Sunny >> Sunny >> Sunny >> Sunny >> Partly cloudy >> Rainy >> Rainy >> Partly cloudy >> Rainy >> Rainy >> Rainy >> Rainy >> Rainy >> Sunny >> Partly cloudy >> Sunny >> Partly cloudy >> Partly cloudy >> Sunny >> Sunny >> Sunny >> Sunny >> Partly cloudy >> Rainy >> Rainy >> Partly cloudy >> Partly cloudy >> Rainy >> Rainy >> Rainy >> Partly cloudy >> Sunny >> Partly cloudy >> Sunny >> Partly cloudy >> Sunny >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >> Partly cloudy >>
Suppose that the distribution of $ X_t $ is known to be $ \psi_t $
What then is the distribution of $ X_{t+1} $, and, more generally, of $ X_{t+m} $?
Let’s start by finding the distribution of $ X_{t+1} $, denoted $ \psi_{t+1} $
Fix $ y \in S $. Using the law of total probability, we can decompose the probability $ \mathbb{P}\{X_{t+1} = y\} $ as follows:
$$ \mathbb {P} \{X_{t+1} = y \} = \sum_{x \in S} \mathbb {P} \{ X_{t+1} = y \, | \, X_t = x \} \cdot \mathbb {P} \{ X_t = x \} $$

In words, to get the probability of being at $ y $ tomorrow, we account for all ways this can happen and sum their probabilities
Repeating for all $ y \in S $, in matrix notation we have
$$ \psi_{t+1} = \psi_{t} \cdot P $$

To find the distribution of $ X_{t+m} $ we can repeat the same argument $ m $ times
$$ \psi_{t+m} = \psi_{t} \cdot P^m $$

print('Transition matrix:',P,sep='\n')
print('Distribution of initial value:',ψ,sep='\n')
print('Distribution in one time period:', ψ@P ,sep='\n')
print('Distribution in two time periods:', ψ@P@P ,sep='\n')
print('Distribution in ten time periods:', ψ @ np.linalg.matrix_power(P,10) ,sep='\n')
# BE AWARE that P**10 works element-wise!
Transition matrix:
[[0.5 0.4 0.1]
 [0.4 0.5 0.1]
 [0.2 0.2 0.6]]
Distribution of initial value:
[0.2 0.3 0.5]
Distribution in one time period:
[0.32 0.33 0.35]
Distribution in two time periods:
[0.362 0.363 0.275]
Distribution in ten time periods:
[0.39985352 0.39985352 0.20029297]
To see this, assume $ \psi_t=(1,0,\dots,0) $ and apply $ \psi_{t+m} = \psi_{t} \cdot P^m $ to get the probability distribution of the state in $ m $ periods. The elements of this vector are the individual $ m $-step transition probabilities from state $ x_1 $. The argument can be repeated for all degenerate $ \psi_t $ that place all probability mass on a single value $ x_i $.
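This can be verified numerically: with a degenerate initial distribution, $ \psi_t \cdot P^m $ simply picks out the corresponding row of $ P^m $. A quick sketch with the transition matrix used above:

```python
import numpy as np

P = np.array([[.5,.4,.1],[.4,.5,.1],[.2,.2,.6]])
m = 5
Pm = np.linalg.matrix_power(P, m)    # m-step transition probabilities
e1 = np.array([1., 0., 0.])          # all probability mass on the first state
print(e1 @ Pm)                       # distribution of the state m periods ahead
print(Pm[0])                         # row 1 of P^m: the same probabilities
```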
Two states $ i $ and $ j $ are said to communicate if there exist positive integers $ k $ and $ l $ such that
$$ P_{ij}^k > 0 \text{ and } P_{ji}^l > 0 $$

A Markov chain is called irreducible if every pair of states communicates.
A Markov chain is called periodic if it cycles in a predictable way, and aperiodic otherwise. For example, the following chain cycles deterministically through its three states:
$$ P = \left( \begin{array}{lll} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{array} \right) $$

The period of a state $ x_i \in S $ is the greatest common divisor of the set of integers $ k $ such that $ P^k_{ii} > 0 $
In the last example, these integers for state 1 are 3,6,9,…, and so the period of state 1 is 3.
A stochastic matrix is called aperiodic if the period of every state is 1, and periodic otherwise
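The period of a state can be computed directly from this definition by collecting the powers $ k $ with $ P^k_{ii} > 0 $ and taking their greatest common divisor. A sketch that scans only a finite range of powers (assuming `kmax` is large enough for the chain at hand):

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, i, kmax=20):
    '''GCD of all powers k <= kmax with (P^k)_{ii} > 0 (returns 0 if none found)'''
    ks = [k for k in range(1, kmax + 1)
          if np.linalg.matrix_power(P, k)[i, i] > 0]
    return reduce(gcd, ks, 0)

Pc = np.array([[0,1,0],[0,0,1],[1,0,0]])  # the cyclic example above
print([period(Pc, i) for i in range(3)])  # every state has period 3
P = np.array([[.5,.4,.1],[.4,.5,.1],[.2,.2,.6]])
print([period(P, i) for i in range(3)])   # every state has period 1: aperiodic
```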
A stationary (or invariant) distribution is a distribution $ \psi^\star $ such that
$$ \psi^\star = \psi^\star \cdot P $$

It immediately follows that
$$ \psi^\star = \psi^\star \cdot P^k \text{ for any } k, $$

that is, if $ X_0 $ has distribution $ \psi^\star $, then $ X_t $ has the same distribution for all $ t $
ψ = np.array([0.4,0.4,0.2])
print('Transition matrix:',P,sep='\n')
print('Distribution of initial value:',ψ,sep='\n')
print('Distribution in one time period:', ψ@P ,sep='\n')
print('Distribution in two time periods:', ψ@P@P ,sep='\n')
print('Distribution in ten time periods:', ψ @ np.linalg.matrix_power(P,10) ,sep='\n')
Transition matrix:
[[0.5 0.4 0.1]
 [0.4 0.5 0.1]
 [0.2 0.2 0.6]]
Distribution of initial value:
[0.4 0.4 0.2]
Distribution in one time period:
[0.4 0.4 0.2]
Distribution in two time periods:
[0.4 0.4 0.2]
Distribution in ten time periods:
[0.4 0.4 0.2]
Theorem: Every stochastic matrix has at least one stationary distribution.
Proof: application of Brouwer’s fixed point theorem
$ \psi^\star $ is a fixed point of the mapping $ \psi \mapsto \psi P $ from (row) vectors to (row) vectors
Note: a stochastic matrix may have multiple stationary distributions; for example, for the identity matrix every distribution is stationary.
Theorem: If stochastic matrix $ P $ is both aperiodic and irreducible, then
1. $ P $ has exactly one stationary distribution $ \psi^\star $
2. For any initial distribution $ \psi_0 $, the distribution $ \psi_0 \cdot P^t $ converges to $ \psi^\star $ as $ t \to \infty $
A stochastic matrix satisfying the conditions of the theorem is sometimes called uniformly ergodic
One easy sufficient condition for aperiodicity and irreducibility is that all elements of $ P $ are strictly positive (why?)
Part 2 of the theorem gives us a particular way to compute stationary distributions:
Under the uniform ergodicity conditions of the theorem (i.e. aperiodicity and irreducibility), the convergence is towards the stationary distribution $ \psi^\star $!
ψ = np.array([1,0,0])
print('Transition matrix:',P,sep='\n')
for t in range(50):
    ψ = ψ @ P
    print('after step %2d the distribution is %r'%(t+1,ψ))
Transition matrix:
[[0.5 0.4 0.1]
 [0.4 0.5 0.1]
 [0.2 0.2 0.6]]
after step  1 the distribution is array([0.5, 0.4, 0.1])
after step  2 the distribution is array([0.43, 0.42, 0.15])
after step  3 the distribution is array([0.413, 0.412, 0.175])
after step  4 the distribution is array([0.4063, 0.4062, 0.1875])
after step  5 the distribution is array([0.40313, 0.40312, 0.19375])
after step  6 the distribution is array([0.401563, 0.401562, 0.196875])
after step  7 the distribution is array([0.4007813, 0.4007812, 0.1984375])
after step  8 the distribution is array([0.40039063, 0.40039062, 0.19921875])
after step  9 the distribution is array([0.40019531, 0.40019531, 0.19960938])
after step 10 the distribution is array([0.40009766, 0.40009766, 0.19980469])
after step 11 the distribution is array([0.40004883, 0.40004883, 0.19990234])
after step 12 the distribution is array([0.40002441, 0.40002441, 0.19995117])
after step 13 the distribution is array([0.40001221, 0.40001221, 0.19997559])
after step 14 the distribution is array([0.4000061 , 0.4000061 , 0.19998779])
after step 15 the distribution is array([0.40000305, 0.40000305, 0.1999939 ])
after step 16 the distribution is array([0.40000153, 0.40000153, 0.19999695])
after step 17 the distribution is array([0.40000076, 0.40000076, 0.19999847])
after step 18 the distribution is array([0.40000038, 0.40000038, 0.19999924])
after step 19 the distribution is array([0.40000019, 0.40000019, 0.19999962])
after step 20 the distribution is array([0.4000001 , 0.4000001 , 0.19999981])
after step 21 the distribution is array([0.40000005, 0.40000005, 0.1999999 ])
after step 22 the distribution is array([0.40000002, 0.40000002, 0.19999995])
after step 23 the distribution is array([0.40000001, 0.40000001, 0.19999998])
after step 24 the distribution is array([0.40000001, 0.40000001, 0.19999999])
after step 25 the distribution is array([0.4       , 0.4       , 0.19999999])
after step 26 the distribution is array([0.4, 0.4, 0.2])
after step 27 the distribution is array([0.4, 0.4, 0.2])
after step 28 the distribution is array([0.4, 0.4, 0.2])
after step 29 the distribution is array([0.4, 0.4, 0.2])
after step 30 the distribution is array([0.4, 0.4, 0.2])
after step 31 the distribution is array([0.4, 0.4, 0.2])
after step 32 the distribution is array([0.4, 0.4, 0.2])
after step 33 the distribution is array([0.4, 0.4, 0.2])
after step 34 the distribution is array([0.4, 0.4, 0.2])
after step 35 the distribution is array([0.4, 0.4, 0.2])
after step 36 the distribution is array([0.4, 0.4, 0.2])
after step 37 the distribution is array([0.4, 0.4, 0.2])
after step 38 the distribution is array([0.4, 0.4, 0.2])
after step 39 the distribution is array([0.4, 0.4, 0.2])
after step 40 the distribution is array([0.4, 0.4, 0.2])
after step 41 the distribution is array([0.4, 0.4, 0.2])
after step 42 the distribution is array([0.4, 0.4, 0.2])
after step 43 the distribution is array([0.4, 0.4, 0.2])
after step 44 the distribution is array([0.4, 0.4, 0.2])
after step 45 the distribution is array([0.4, 0.4, 0.2])
after step 46 the distribution is array([0.4, 0.4, 0.2])
after step 47 the distribution is array([0.4, 0.4, 0.2])
after step 48 the distribution is array([0.4, 0.4, 0.2])
after step 49 the distribution is array([0.4, 0.4, 0.2])
after step 50 the distribution is array([0.4, 0.4, 0.2])
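An alternative to successive approximations is to compute $ \psi^\star $ directly as a left eigenvector of $ P $ for eigenvalue 1, i.e. a right eigenvector of $ P^T $, normalized to sum to one. A sketch using `np.linalg.eig`:

```python
import numpy as np

P = np.array([[.5,.4,.1],[.4,.5,.1],[.2,.2,.6]])
vals, vecs = np.linalg.eig(P.T)      # right eigenvectors of P.T = left eigenvectors of P
k = np.argmin(np.abs(vals - 1))      # locate the unit eigenvalue
psi_star = np.real(vecs[:, k])
psi_star = psi_star / psi_star.sum() # normalize to a probability distribution
print(psi_star)                      # close to [0.4, 0.4, 0.2]
```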
Under irreducibility it also holds that
$$ \frac{1}{T} \sum_{t = 1}^T \mathbf{1}\{X_t = x_j\} \to \psi^\star_j \text{ as } T \to \infty $$

In other words, the stationary distribution $ \psi^\star $ also gives the limiting fraction of time that the Markov chain $ X_t $ spends at each point of the state space
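This can be illustrated with one long simulated path: the fraction of time the chain spends in each state approaches $ \psi^\star = (0.4, 0.4, 0.2) $. A self-contained sketch that simulates the chain with `np.random.choice` rather than the `ddraw` function above:

```python
import numpy as np

P = np.array([[.5,.4,.1],[.4,.5,.1],[.2,.2,.6]])
T = 50_000
np.random.seed(42)                            # fixed seed for reproducibility
X = np.empty(T, dtype=int)
X[0] = 0                                      # arbitrary initial state
for t in range(1, T):
    X[t] = np.random.choice(3, p=P[X[t-1]])   # draw next state from the current row
fractions = np.bincount(X, minlength=3) / T
print(fractions)                              # close to the stationary distribution [0.4, 0.4, 0.2]
```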