In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
import scipy.stats as st
from math import sqrt
print("Modules Imported!")

Grading: $\min(p,100)/100$, $p$ is the number of points out of 105 available pts.

The Three Problems of Hidden Markov Models

In this excercise, we will implement algorithms for Evaluation, Decoding, and Learning problems for hidden Markov Models (HMMs).

Set up

We consider a Markov chain with two states $\{0,1\}$. Denote the state of the Markov chain at time $t$ as $z_t$, for $t=0,1,\dotsc,T-1$. At each time point we get a noisy version $w_t$ of the state $z_t$. Let the transition and emission probabilities be denoted by $A=(A_{ij})$ and $B=(B_{ij})$, respectively, where $A_{ij}=\Pr(z_t=j\mid z_{t-1}=i)$ and $B_{ij}=\Pr(w_t=j\mid z_{t}=i)$.

Generating data

A function for generating data is given below. This function will be used for testing purposes. Note that z[0]=1 with probability $A_{01}=$ A[0,1]. This is equivalent to assuming that at time $t=-1$, the chain was in state $0$. Note that this is different from the setting discussed in class where z[0]=1 with probability $\pi_1$. Later when you formulate EM for this problem, you should take this difference into account.

In [ ]:
def hmm(A,B,T=15):
    z = np.zeros(T,dtype=np.int8)
    w = np.zeros(T,dtype=np.int8)

    z[0] = st.bernoulli.rvs(A[0,1])
    w[0] = st.bernoulli.rvs(B[z[0],1])

    for t in range(1,T):
        z[t] = st.bernoulli.rvs(A[z[t-1],1])
        w[t] = st.bernoulli.rvs(B[z[t],1])
    return z,w


(20 pts) Implement the forward-backward (sum-product) algorithm as function.

In [ ]:
def hmmForwardBackward(A,B,w):
    T = len(w)
    # ...
    return alpha, beta, gamma

Transition Probabilities

(10 pts) Using the previous part, implement a function that finds $\zeta_t(i,j)=\Pr(z_t=i,z_{t+1}=j|w,A,B)$.

In [ ]:
def hmmTransition(A,B,w):
    T = len(w)
    alpha,beta,gamma = hmmForwardBackward(A,B,w)
    # ...
    return zeta

Viterbi Algorithm

(30) Implement the Viterbi algorithm.

In [ ]:
def hmmViterbi(A,B,w):
    T = len(w)
    # ...
    return pathViterbi # this is a binary sequence determining the most likely sequence of hidden states given the observations

Let $p=[p_0,p_1]$ where $p_i$ is the probability of transitioning to the other state from state $i$. That is \begin{equation} A = \left[\begin{array}{c c}1-p_0 & p_0\\ p_1 & 1-p_1\end{array}\right] \end{equation}

Similarly, let $q=[q_0,q_1]$ where $q_i$ is the probability of an incorrect observation when in state $i$. That is \begin{equation} B = \left[\begin{array}{c c}1-q_0 & q_0\\ q_1 & 1-q_1\end{array}\right] \end{equation}

With this setup, the code below produces plots of the hidden states, observations, estimates, etc. Run this code to produce these plots.

In [ ]:
p = [.1,.01]
q = [0.02,0.05]
A = np.array([[1-p[0],p[0]],[p[1],1-p[1]]])
B = np.array([[1-q[0],q[0]],[q[1],1-q[1]]])
T = 100

z,w = hmm(A,B,T)
alpha,beta,gamma = hmmForwardBackward(A,B,w)
zeta = hmmTransition(A,B,w)
pathViterbi = hmmViterbi(A,B,w)

# the first plot compares the hidden states with the observations (of course in a real-world problem, we would 
# not have access to the hidden states). 
# the second plot compares the hidden states with the probability that a hidden state = 1 and with the Viterbi path
# the last plot, gives the probabilities for transitions. This is for example useful if we want to find out when 
# a change in the system has occurred, for example for detecting anomalies in networks

(5 pts) Discuss the relationship between gamma[:,1] and pathViterbi. Is it correct to say whenever gamma[t,1]>1/2 then pathViterbi[t]=1?



Baum-Welch (EM)

(40 pts) Implement the EM algorithm to estimate $A$ and $B$. Note that as described above, one term in the equation is slightly different from what we discussed in class. Set the number of iterations at 50. Note the indeces of the hidden states may be swapped since EM has no way of determining the correct labeling (i.e., which state is called 0 and which state is called 1). Print the EM estimates and compare with the true values. Repeat the plots in the previous part with EM.

In [ ]:
A_EM = np.random.rand(2,2)
A_EM = A_EM/np.sum(A_EM,axis=1)[:,None]

B_EM = np.random.rand(2,2)
B_EM = B_EM/np.sum(B_EM,axis=1)[:,None]

for it in range(50):
    # ...



# ...