### 一.变分分布¶

$$p(X\mid\mu,\tau)=(\frac{\tau}{2\pi})^{\frac{N}{2}}exp\{-\frac{\tau}{2}\sum_{n=1}^N(x_n-\mu)^2\}$$

$$p(\tau)=Gam(\tau\mid a_0,b_0)\\ p(\mu\mid\tau)=N(\mu\mid\mu_0,(\lambda_0\tau)^{-1})$$

$$p(\mu,\tau\mid X)=\frac{p(X\mid\mu,\tau)p(\tau)p(\mu\mid\tau)}{p(X)}\\ =\frac{p(X\mid\mu,\tau)p(\tau)p(\mu\mid\tau)}{\int p(X\mid\mu,\tau)p(\tau)p(\mu\mid\tau)d\mu d\tau}\\ =...省略...$$

$$q(\mu,\tau)=q_\mu(\mu)q_\tau(\tau)$$

$$ln\ q_\mu^*(\mu)=\int_\tau q_\tau(\tau)[ln\ p(X,\mu,\tau)] d\tau+const\\ =\int_\tau q_\tau(\tau)[ln\ p(X\mid\mu,\tau)+ln\ p(\mu\mid\tau)+ln p(\tau)] d\tau+const\\ =\int_\tau q_\tau(\tau)[ln\ p(X\mid\mu,\tau)+ln\ p(\mu\mid\tau)] d\tau+const（与\mu无关的项可以并入到const中）\\ =\int_\tau q_\tau(\tau)[-\frac{\tau}{2}\sum_{n=1}^N(x_n-\mu)^2-\frac{\lambda_0\tau}{2}(\mu-\mu_0)^2]d\tau+const（这里再次将与\mu无关的项并入到const中）\\ =-\frac{E[\tau]}{2}[\lambda_0(\mu-\mu_0)^2+\sum_{n=1}^N(x_n-\mu)^2]+const\\ =-\frac{E[\tau]}{2}[\lambda_0\mu^2+\lambda_0\mu_0^2-2\lambda_0\mu_0\mu+\sum_{n=1}^N(x_n^2+\mu^2-2x_n\mu)]+const\\ =-\frac{E[\tau]}{2}[(\lambda_0+N)\mu^2-(2\lambda_0\mu_0+2\sum_{n=1}^Nx_n)\mu]+const（再次将与\mu无关的项并入到const中）\\ =-\frac{(\lambda_0+N)E[\tau]}{2}(\mu-\frac{\lambda_0\mu_0+\sum_{n=1}^Nx_n}{\lambda_0+N})^2+const（从const中提了一个与\mu无关的常数项出来）$$

$$\mu_N=\frac{\lambda_0\mu_0+N\bar{x}}{\lambda_0+N}\\ \lambda_N=(\lambda_0+N)E[\tau]$$

$$ln\ q_\tau^*(\tau)=\int_\mu q_\mu(\mu)[ln p(X\mid\mu,\tau)+ln\ p(\mu\mid\tau)]+ln\ p(\tau)+const\\ =\int_\mu q_\mu(\mu)[\frac{N}{2}ln\ \tau-\frac{N}{2}ln\ 2\pi-\frac{\tau}{2}\sum_{n=1}^N(x_n-\mu)^2+\frac{1}{2}ln\ \lambda_0+\frac{1}{2}ln\ \tau-\frac{1}{2}ln\ 2\pi-\frac{\lambda_0\tau}{2}(\mu-\mu_0)]d\mu+ln\ \frac{b_0^{a_0}}{\Gamma(a_0)}+(a_0-1)ln\ \tau-b_0\tau+const\\ =\int_\mu q_\mu(\mu)[\frac{N}{2}ln\ \tau-\frac{\tau}{2}\sum_{n=1}^N(x_n-\mu)^2+\frac{1}{2}ln\ \tau-\frac{\lambda_0\tau}{2}(\mu-\mu_0)]d\mu+(a_0-1)ln\ \tau-b_0\tau+const（合并与\tau无关的项目到const中）\\ =(\frac{N+1}{2}+a_0-1)ln\ \tau-\frac{\tau}{2}E_\mu[\sum_{n=1}^N(x_n-\mu)+\lambda_0(\mu-\mu_0)^2]-b_0\tau+const$$

$$a_N=\frac{N+1}{2}+a_0\\ b_N=b_0+\frac{1}{2}E_\mu[\sum_{n=1}^N(x_n-\mu)^2+\lambda_0(\mu-\mu_0)^2]$$

（1）无须指定$q_\mu(\mu)$和$q_\tau(\tau)$的函数形式，因为它们可以从似然函数和共轭先验自动推导出来；

（2）虽然我们假设了$q_\mu(\mu)$和$q_\tau(\tau)$相互独立，但求解结果表明它们是相互耦合的，即$q_\mu(\mu)$依赖于$q_\tau(\tau)$，反过来$q_\tau(\tau)$依赖于$q_\mu(\mu)$

### 二.迭代优化¶

$$E[\tau]=\frac{a_N}{b_N}$$

$$\frac{1}{E[\tau]}=E[\frac{1}{N+1}\sum_{n=1}^N(x_n-\mu)^2]=\frac{N}{N+1}(\bar{x^2}-2\bar{x}E[\mu]+E[\mu^2])$$

$$E[\mu]=\mu_N=\bar{x}$$

$$E[\mu^2]=\mu_N^2+\frac{1}{\lambda_N}=\bar{x}^2+\frac{1}{NE[\tau]}$$

$$\frac{1}{E[\tau]}=\bar{x^2}-\bar{x}^2=\frac{1}{N}\sum_{n=1}^N(x_n-\bar{x})^2$$

### 三.代码实现¶

$$p(X\mid\mu,\tau)\propto \tau^{\frac{N}{2}}exp[-\frac{\tau}{2}\sum_{n=1}^N(x_n-\mu)^2]\\ p(\mu\mid\tau)\propto \tau^{\frac{1}{2}}exp[-\frac{\lambda_0\tau}{2}(\mu-\mu_0)^2]\\ p(\tau)\propto\tau^{a_0-1}exp[-b_0\tau]$$

$$p(\mu,\tau\mid X)\propto p(X\mid\mu,\tau)p(\mu\mid\tau)p(\tau)\\ \propto \tau^{\frac{N}{2}}exp[-\frac{\tau}{2}\sum_{n=1}^N(x_n-\mu)^2]\tau^{\frac{1}{2}}exp[-\frac{\lambda_0\tau}{2}(\mu-\mu_0)^2]\tau^{a_0-1}exp[-b_0\tau]\\ \propto\tau^{\frac{N+1}{2}+a_0-1}exp[-b_0\tau]exp[-\frac{\tau}{2}\sum_{n=1}^N(x_n-\mu)^2-\frac{\lambda_0\tau}{2}(\mu-\mu_0)^2]\\ \propto \tau^{\frac{N+1}{2}+a_0-1}exp[-b_0\tau]exp[-\frac{\tau}{2}\sum_{n=1}^Nx_n^2]exp[-\frac{\lambda_0\mu_0^2\tau}{2}]exp[-\frac{(\lambda_0+N)\tau}{2}(\mu-\frac{\sum_{n=1}^Nx_n+\lambda_0\mu_0}{\lambda_0+N})^2]\\ \propto \tau^{\frac{N+1}{2}+a_0-1}exp[-(b_0+\frac{1}{2}\sum_{n=1}^Nx_n^2+\frac{\lambda_0\mu_0^2}{2})\tau]exp[-\frac{(\lambda_0+N)\tau}{2}(\mu-\frac{\sum_{n=1}^Nx_n+\lambda_0\mu_0}{\lambda_0+N})^2]$$

$$p(\mu,\tau\mid X)=N(\mu\mid \frac{\sum_{n=1}^Nx_n+\lambda_0\mu_0}{\lambda_0+N},[(\lambda_0+N)\tau]^-1)\cdot Gam(\tau\mid \frac{N+1}{2}+a_0,b_0+\frac{1}{2}\sum_{n=1}^Nx_n^2+\frac{\lambda_0\mu_0^2}{2})$$

In [1]:
import numpy as np
from scipy.special import gamma
def post_prob_func(mu,tau,X):
#先计算高斯部分的值
u=np.mean(X)
sigma=np.sqrt(1/(len(X)*tau))
gassian_value=1/(np.sqrt(2*np.pi)*sigma)*np.exp(-1*np.power(mu-u,2)/(2*sigma**2))
#再计算gamma部分的值
a=len(X)/2+0.5
b=0.5*np.sum(X*X)
gamma_value=1/gamma(a)*np.power(b,a)*np.power(tau,a-1)*np.exp(-1*b*tau)
return gassian_value*gamma_value

In [2]:
# 从标准高斯分布随机采样100个点
np.random.seed(0)
X=np.random.randn(100)

In [3]:
import matplotlib.pyplot as plt
%matplotlib inline
def plot_contourf(data,func,lines=3):
n = 256
x = np.linspace(data[:,0].min(), data[:,0].max(), n)
y = np.linspace(data[:,1].min(), data[:,1].max(), n)
X, Y = np.meshgrid(x,y)
C = plt.contour(X,Y, func(np.c_[X.reshape(-1),Y.reshape(-1)]).reshape(X.shape), lines, colors='g')

In [4]:
mu_range=np.linspace(-0.2,0.3,100)
tau_range=np.linspace(0.7,1.3,100)
data=np.vstack((mu_range,tau_range)).T

In [5]:
plot_contourf(data,lambda x:post_prob_func(x[:,0],x[:,1],X),8)


In [6]:
def plot_vs_contourf(data,func1,func2,lines=3):
n = 256
x = np.linspace(data[:,0].min(), data[:,0].max(), n)
y = np.linspace(data[:,1].min(), data[:,1].max(), n)
X, Y = np.meshgrid(x,y)
plt.contour(X,Y, func1(np.c_[X.reshape(-1),Y.reshape(-1)]).reshape(X.shape), lines, colors='g')
plt.contour(X,Y, func2(np.c_[X.reshape(-1),Y.reshape(-1)]).reshape(X.shape), lines, colors='r')


In [7]:
def joint_dist_func(mu,tau,u,sigma,a,b):
gassian_value=1/(np.sqrt(2*np.pi)*sigma)*np.exp(-1*np.power(mu-u,2)/(2*sigma**2))
gamma_value=1/gamma(a)*np.power(b,a)*np.power(tau,a-1)*np.exp(-1*b*tau)
return gassian_value*gamma_value


In [8]:
#计算初始的E[\tau]
E_tau=1.0/np.mean((X-np.mean(X))*(X-np.mean(X)))
#进行epoch次迭代
epoch=3
split_point=np.linspace(0,epoch-1,3).astype(int).tolist()
plt.figure(figsize = (18,4))
for count in range(0,epoch):
#高斯分布的参数
u=np.mean(X)
lambd=len(X)*E_tau
sigma=np.sqrt(1.0/lambd)
#gamma分布的参数
a=(len(X)+1)/2
b=np.sum(X*X)/2-len(X)*u*u+len(X)/2*(np.power(sigma,2.0)+np.power(u,2.0))
E_tau=a/b
if count in split_point:
plt.subplot(1,3,split_point.index(count)+1)
plt.title("N="+str(count))
plot_vs_contourf(data,lambda x:post_prob_func(x[:,0],x[:,1],X),lambda x:joint_dist_func(x[:,0],x[:,1],u,sigma,a,b),8)


In [9]:
#计算初始的E[\tau]
E_tau=0.1
#进行epoch次迭代
epoch=3
split_point=np.linspace(0,epoch-1,3).astype(int).tolist()
plt.figure(figsize = (18,4))
for count in range(0,epoch):
#高斯分布的参数
u=np.mean(X)
lambd=len(X)*E_tau
sigma=np.sqrt(1.0/lambd)
#gamma分布的参数
a=(len(X)+1)/2
b=np.sum(X*X)/2-len(X)*u*u+len(X)/2*(np.power(sigma,2.0)+np.power(u,2.0))
E_tau=a/b
if count in split_point:
plt.subplot(1,3,split_point.index(count)+1)
plt.title("N="+str(count))
plot_vs_contourf(data,lambda x:post_prob_func(x[:,0],x[:,1],X),lambda x:joint_dist_func(x[:,0],x[:,1],u,sigma,a,b),8)


### 三.总结¶

（1）求所有变量（包括观测变量、隐变量、参数）的联合概率分布，比如一开头我们就列出了$p(X,\mu,\tau)$（只是将它拆开为$p(X\mid\mu,\tau),p(\mu\mid\tau),p(\tau)$这三部分分别表示）；

（2）求变分分布，即通过本章第一节的公式：

$$ln\ q_j^*(Z_j)=E_{i\neq j}[ln\ p(X,Z)]+const$$

（3）迭代法求解最优变分分布，第（2）步求出的各变量（组）的变分分布之间是耦合，所以通常会通过迭代的方式求得它的最优形式（即$q(Z)$分布的最优参数）；

（4）将变分分布应用到下游任务（需要用到后验概率分布的地方都可以用变分分布替代了，该页note没有涉及到这部分内容）

In [ ]: