#!/usr/bin/env python # coding: utf-8 # # Solutions for Chapter 2 Probability # + date: 2017-03-08 # + tags: mlapp, statistics # + author: Peijun Zhu # + slug: mlapp-ch2-sol # # $\def\bra#1{\mathinner{\langle{#1}|}} # \def\ket#1{\mathinner{|{#1}\rangle}} # \def\braket#1{\mathinner{\langle{#1}\rangle}} # \newcommand{\mdef}{\overset{\mathrm{def}}{=}} # \newcommand{\bm}{\mathbf} # \DeclareMathOperator{\Var}{Var} # \DeclareMathOperator{\det}{det} # \DeclareMathOperator{\tr}{tr} # \DeclareMathOperator{\E}{E} # \DeclareMathOperator{\Cov}{Cov} # \DeclareMathOperator{\Beta}{B} # \DeclareMathOperator{\Bdist}{Beta} # \DeclareMathOperator{\sgn}{sgn} # \DeclareMathOperator{\adj}{adj} # \DeclareMathOperator{\ii}{i} # \DeclareMathOperator{\dd}{d} # \newcommand{\pp}{\partial} # \DeclareMathOperator{\rhs}{RHS} # \DeclareMathOperator{\lhs}{LHS} # \DeclareMathOperator{\Beta}{B}$ # # ## 2.1 # 1. $P=1/3$ # 1. $P=1/2$, the difference is that you happen to see a child. # ## 2.2 Obvious # ## 2.3 # \begin{align} # \Var[x+y]&=\E[(x+y)^2]-\E[x+y]^2\\ # &=\E[x^2]+\E[y^2]+2\E[xy]-\E[x]^2-\E[y]^2-2\E[x]\E[y]\\ # &=\Var[x]+\Var[y]+2\Cov[x,y] # \end{align} # ## 2.4 # $I, N$ means infected and not infected, $P$ means positive # $$\Pr(I|P)=\frac{\Pr(IP)}{\Pr(P)}=\frac{\Pr(IP)}{\Pr(IP)+\Pr(NP)}\approx\frac{10^{-4}}{10^{-4}+0.01}\approx 0.01$$ # ## 2.5 # The door we chose has probability $1/3$ for the prize. After the host has revealed another invalid door, the chance of the last door is $1-1/3=2/3$. Note the host is **not randomly** removing a door, so the chance of the two unchosen doors will fall to the last door. # ## 2.6 # 1. $$P(H|e_1,e_2)=\frac{P(H,e_1,e_2)}{P(e_1,e_2}=\frac{P(e_1,e_2|H)P(H)}{P(e_1,e_2)}$$ # + Only ii is sufficient # 2. Given $P(e_1,e_2|H)=P(e_1|H)P(e_2|H)$, we have $$P(H|e_1,e_2)=\frac{P(e_1,H)P(e_2,H)}{P(H)P(e_1,e_2)}$$ # + Use $P(e_1,e_2)=\sum_h P(e_1,e_2|H=h)P(H=h)$, i is equivalent to iii # + Both i, iii are sufficient now # ## 2.7 # Pairwise independence means # $$P(X_iX_j)=P(X_i)P(X_j), \forall i\neq j\in \{1,\ldots ,n\}$$ # # Mutually independent means # $$P(\prod_{X\in S}X)=\prod_{X\in S}, \forall S\subseteq \{1,\ldots ,n\}$$ # # Thus for $n\geq 3$, the two group of identities are not necessarily equivalent. # ### Counter example # $$X, Y\sim \operatorname{Ber}(0.5), \quad Z=X\oplus Y\sim \operatorname{Ber}(0.5)$$ # ## 2.8 # We want to prove $$P(x,y|z)=P(x|z)P(y|z)\Leftrightarrow P(x,y|z)=g(x,z)h(y,z)$$ # # The LHS to RHS is obvious. We are now going from RHS to LHS. # First of all, $p(x,y|z)$ must satisfy unitary condition # $$\iint p(x,y|z)dxdy=\left(\int g(x,z)dx\right)\left(\int h(y,z)dy\right)=1$$ # and $$P(x|z)= g(x,z)\int h(y,z)dy,\qquad P(y|z)= h(y,z)\int h(x,z)dx$$ # And the deduction to LHS is trivial now. # ## 2.9 # $$X\perp W|Z,Y\Rightarrow P(XWZY)=P(XZY)P(WZY)/P(ZY)$$ # $$X\perp Y|Z\Rightarrow P(XYZ)=P(XZ)P(YZ)/P(Z)$$ # $$X\perp Y|W\Rightarrow P(XYW)=P(XW)P(YW)/P(W)$$ # ### a. True # We need to prove $$P(XYWZ)=P(XZ)P(YWZ)/P(Z),$$ # which is the result of plugging 2nd equation into 1st one. # ### b. False # We need to prove $$P(XYZW)=P(XZW)P(YZW)/P(ZW),$$ # which is not reachable. # ## 2.10 # The pdf of $y=1/x$ is # # \begin{align} # g(y)&=f(x(y))\left|\frac{dx}{dy}\right|=f(1/y)/y^2\\ # &=\frac{b^a}{\Gamma(a)}(1/y)^{a-1}e^{-b/y}/y^2\\ # &=\frac{b^a}{\Gamma(a)}y^{-(a+1)}e^{-b/y} # \end{align} # ## 2.11 # \begin{align} # Z^2&=\int_0^{2\pi}\int_0^\infty r\exp\left(-\frac{r^2}{2\sigma^2}\right)drd\theta\\ # &=2\pi\int_0^\infty\exp\left(-\frac{t}{\sigma^2}\right)dt,\quad t=r^2/2\\ # &=2\pi\sigma^2 # \end{align} # ## 2.12 # Notice $-\sum_x\sum_y p(x,y)\log p(x)=-\sum_x p(x)\log p(x)=H(X)$ # # \begin{align} # I(X,Y)&=\sum_x\sum_y p(x,y)\log\frac{p(x,y)}{p(x)p(y)}\\ # &=H(X)+H(Y)-H(X,Y) # \end{align} # # \begin{align} # H(Y|X)&=-\sum_y \log p(x,y)\log\frac{p(x,y)}{p(x)}\\ # &=H(X,Y)-H(X) # \end{align} # # So $I(X,Y)=H(X)-H(X|Y)=H(Y)-H(Y|X)$ # ## 2.13 # Suppose we have $$X_{\pm}=\frac{X_1\pm X_2}{\sqrt{2}}$$ # then $$\begin{bmatrix} # X_+\\ # X_- # \end{bmatrix}\sim N\left(\mathbf{0},\begin{bmatrix}\sigma_+^2&\\&\sigma_-^2\end{bmatrix}\right)=N(0,\sigma_+^2)N(0,\sigma_-^2)$$ # # And $$\sigma_+^2\frac{(X_1+X_2)^2}{2}+\sigma_-^2\frac{(X_1-X_2)^2}{2}=\sigma^2 (X_1^2+X_2^2)-2\rho X_1X_2$$ # # Thus $\sigma_\pm=(1\pm\rho)\sigma^2$. And $X_+$ is independent on $X_-$. From $X_\pm$, we can get the distribution of $X_1$ and $X_2$. # # If $\rho=1$, then $X_1=X_2=X/\sqrt 2$, the mutual information $I(X_1,X_2)=H(X_1)$ # # If $\rho=-1$, similiar. # # If $\rho=0$, they are independent. So $I(X_1,X_2)=0$ # ## 2.14 # 1. $r=1-H(Y|X)/H(X)=(H(X)-H(Y|X))/H(X)=I(Y, X)/H(x)$ # 1. It is obvious that $H(X)>0$. # 1. $r\geq 0$ because we have $I(X,Y)\geq 0$ using Jensen inequality: $$I(X,Y)=-\sum_{i,j} \Pr(i,j)\log \frac{p_ip_j}{\Pr(i,j)}\geq - \log \sum_{i,j}p_ip_j=0$$ # 1. We need to prove $H(Y|X)\geq 0$. And $H(Y|X)=H(X,Y)-H(Y)\geq 0$. # # \begin{align} # H(X,Y)&=-\sum_i\sum_j \Pr(i,j)\log \Pr(i,j)\\ # &\geq -\sum_i p_i\log \frac{\sum_j p^2(i,j)}{p_i}\\ # &\geq -\sum_i p_i\log \frac{p_i^2}{p_i}=H(Y) # \end{align} # # As a result, $r\leq 1$ # 1. $r=0$ holds iff $I(X,Y)=0$. In this case, $\dfrac{p_ip_j}{\Pr(i,j)}$ is a constant, so $X$ is independent on $Y$ # 1. $r=0$ holds iff $I(X,Y)=H(X)$. i.e. $X=Y$ # ## 2.15 # MLE(Maximum Likehood Estimation) is to maximize the likehood # $$\prod_i q(x=i|\theta)^{n_i}=\exp(n_i\log q_i)\propto\exp(\sum_ip_i\log q_i)$$ # # It is equivalent to minimize $-\sum_ip_i\log q_i\equiv -\sum_ip_i\log(q_i/p_i)$ # ## 2.16 # $f=x^{a-1}(1-x)^{b-1}/\Beta(a,b)\propto x^{a-1}(1-x)^{b-1}$ # $$E(x)=\int xf dx=\Beta(a+1,b)/\Beta(a,b)=\frac{a}{a+b}$$ # We can prove the last step as follows: # # \begin{align} # \Beta(p, q+1)&=\int_0^1 x^{p-1}(1-x)^{q}dx\\ # &=\int_0^1 x^{p-1}(1-x)^{q-1}dx-\int_0^1 x^{p}(1-x)^{q-1}dx\\ # &=\int_0^1 x^{p-1}(1-x)^{q-1}dx-\int_0^1 x^{p}(1-x)^{q-1}dx\\ # &=\int_0^1 x^{p-1}(1-x)^{q-1}dx-\frac{p}{q}\int_0^1 x^{p-1}(1-x)^{q}dx\\ # &=\Beta(p,q)-\frac{p}{q}\Beta(p, q+1) # \end{align} # # As a result, $\Beta(p, q+1)=\dfrac{q}{p+q}\Beta(p,q)$. We can also use the relationship betweeen Beta and Gamma function, and $\Gamma(p+1)=p\Gamma(p)$ # $$\Beta(p,q+1)=\frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q+1)}=\frac{q\Gamma(p)\Gamma(q)}{(p+q)\Gamma(p+q)}=\frac{q}{p+q}\Beta(p,q)$$ # # $$E(x^2)=\int x^2f dx=\Beta(a+2,b)/\Beta(a,b)=\frac{a}{a+b}\frac{a+1}{a+b+1}$$ # # $$\frac{f'}{f}=\frac{a-1}{x}+\frac{b-1}{1-x}=0\Rightarrow x_m=\frac{a-1}{a-b}$$ # # $$Var(x)=E(x^2)-E(x)^2=\frac{ab}{(a+b)^2(a+b+1)}$$ # ## 2.17 # The cdf(cumulative distribution function) of each variable is # # $$F_1(x)=\Pr(X