Entropy, Mutual Information and Sampling

In [1]:
import numpy as np 
import scipy as sp
import matplotlib.pyplot as plt
from __future__ import division

Entropy

$$ H[X]=-\sum_x p(x) \log_2 p(x) $$

In [2]:
def entropy(p): # entropy (in bits) of probability distribution array p
    p=p[p!=0]                         # drop zero-probability states (0 log 0 = 0)
    result=np.sum(p*np.log2(1/p))
    return result
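
A quick sanity check of entropy() (a minimal usage sketch): a uniform distribution over four states carries log2(4) = 2 bits, while a deterministic distribution carries 0 bits.

entropy(np.array([0.25, 0.25, 0.25, 0.25]))  # uniform over 4 states -> 2.0
entropy(np.array([1.0, 0.0, 0.0, 0.0]))      # deterministic -> 0.0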

Entropy of a distribution with two states

In [3]:
a=np.linspace(0,1,101)
e=np.array([entropy(np.array([t,1-t])) for t in a])

c=0.9

plt.figure(figsize=(12,5))

plt.subplot(1,2,1)
plt.bar([0,1],[c,1-c])
plt.xticks([0.5,1.5],['$x_1$','$x_2$'],fontsize=15)
plt.ylabel('$p(x)$',fontsize=15)
plt.title('probability distribution \n $p(x_1)=$ %.2f,  $p(x_2)=$ %.2f' % (c,1-c))
plt.ylim(0,1)

plt.subplot(1,2,2)
plt.plot(a,e)
plt.plot(c,entropy(np.array([c,1-c])),'ro',label='a=%.2f'%c)
plt.title('Binary Entropy Function \n $p(x_1)=a$,  $p(x_2)=1-a$,  $H_2(a)=-a \\log_2 a-(1-a) \\log_2 (1-a)$')
plt.xlabel('a')
plt.ylabel('$H_2(a)$')
plt.ylim(0,1.05)
plt.legend()

plt.show()

Mutual Information

$$ \begin{aligned} I[X;Y] &= \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)\,p(y)} \\ &= H[X]+H[Y]-H[X,Y] \\ &= H[X]-H[X|Y] \\ &= H[Y]-H[Y|X] \end{aligned} $$

In [4]:
def I(p): # mutual information (in bits) of joint probability distribution array p
    px=np.sum(p,0)                    # marginal distributions
    py=np.sum(p,1)
    Hx=entropy(px)
    Hy=entropy(py)
    p=p[p>0]                          # drop zero-probability cells
    Hxy=np.sum(p*np.log2(1/p))        # joint entropy H[X,Y]
    return Hx+Hy-Hxy                  # I[X;Y] = H[X] + H[Y] - H[X,Y]
In [5]:
pxy=lambda a: np.array([[a/2,(1-a)/2],[(1-a)/2,a/2]])
In [6]:
pxy(0.2), I(pxy(0.2))
Out[6]:
(array([[ 0.1,  0.4],
       [ 0.4,  0.1]]), 0.27807190511263746)
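
For this family of joint distributions both marginals are [0.5, 0.5], so H[X] = H[Y] = 1 bit and H[X,Y] = 1 + H_2(a), which gives I[X;Y] = 1 - H_2(a). A small verification sketch using the functions defined above:

a = 0.2
I(pxy(a)), 1 - entropy(np.array([a, 1-a]))  # both ~0.278 bits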

Mutual information calculated from finite samples

Let's sample from a two-dimensional joint distribution in which the variables are independent, e.g. [[0.25, 0.25], [0.25, 0.25]]. Because of finite sampling, the empirical mutual information estimated from these samples will not be exactly zero; in fact it has a small positive bias.
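
The multinomial draw in the next cell is equivalent to sampling N individual (x, y) pairs and tabulating their relative frequencies. A small sketch of that equivalent route (hypothetical variable names, reusing the functions defined above):

n_samples = 1000
samples = np.random.choice(4, size=n_samples, p=pxy(0.5).ravel())  # n_samples draws from the 4 cells of the 2x2 joint
empirical_joint = np.bincount(samples, minlength=4).reshape(2, 2) / n_samples
print(I(empirical_joint))                                          # close to, but not exactly, 0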

In [7]:
joint_dist=pxy(0.5) # joint distribution with independent, uniform marginals
true_I=I(joint_dist)
print(joint_dist)
print('the mutual information of the above distribution is %.3f bits' % (true_I))

N=1000 # number of samples to take from joint distribution
t=20000 # number of trials

rI=np.array([I((np.random.multinomial(N,joint_dist.ravel())/N).reshape(2,2)) for i in range(t)])
nbins=100
maxhist=np.max(np.histogram(rI,bins=nbins)[0])
plt.hist(rI,bins=nbins)
plt.plot([true_I,true_I],[0,maxhist*1.1],'r-',label='true mutual information')
plt.xlabel('empirical mutual information')
plt.ylabel('frequency')
plt.legend(loc='center left', bbox_to_anchor=(1,0.5))
plt.show()
[[ 0.25  0.25]
 [ 0.25  0.25]]
the mutual information of the above distribution is 0.000 bits

Mean empirical bias

In [8]:
print('the true mutual information is %f and the mean empirical bias is %f' % (true_I,np.mean(rI-true_I)))
the true mutual information is 0.000000 and the mean empirical bias is 0.000733

The theoretically predicted bias of the empirical mutual information is $\frac{(|X|-1)(|Y|-1)}{2 N \ln 2} + O \left( \frac{1}{N^2} \right)$; here $|X|=|Y|=2$, so the leading term is $\frac{1}{2 N \ln 2}$.

In [9]:
1/(2*N*np.log(2))
Out[9]:
0.00072134752044448179
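
This leading-order prediction for a 2x2 table at N = 1000 is close to the measured mean bias above. Since the true mutual information is 0 here, the mean empirical mutual information is itself the bias, so a rough sketch of how one might check the 1/N scaling across a few sample sizes (fewer trials per sample size to keep it fast; exact numbers vary from run to run):

for n in [100, 300, 1000, 3000]:
    rI_n = np.array([I((np.random.multinomial(n, joint_dist.ravel())/n).reshape(2,2)) for i in range(2000)])
    predicted = 1/(2*n*np.log(2))   # (|X|-1)(|Y|-1)/(2 n ln 2) for a 2x2 table
    print('n=%5d  measured bias=%.5f  predicted bias=%.5f' % (n, np.mean(rI_n), predicted))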