Suppose $Y \sim N(\mu_Y,V_Y)$.
The mean and variance of $Y$ given truncation selection are:
$$ E(Y|Y>t) = \mu_Y + V_Y^{1/2}i \qquad Var(Y|Y>t) = V_Y[1 - i(i-s)] $$
where $$ i = \frac{f(s)}{p}, $$ $f(\cdot)$ is the standard normal density function, $$ s = \frac{t - \mu_Y}{V_Y^{1/2}}, \qquad p = \Pr(Y > t) $$
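These moments can be cross-checked numerically against the `truncated` constructor in the Distributions package, which implements the truncated normal directly (a sketch using the illustrative values $\mu_Y = 10$, $V_Y^{1/2} = 10$, $t = 15$ from the example further below):

```julia
using Distributions

μ, σ, t = 10.0, 10.0, 15.0
s = (t - μ)/σ
i = pdf(Normal(), s)/(1 - cdf(Normal(), s))

# Moments from the formulas above
meanFormula = μ + σ*i
varFormula  = σ^2*(1 - i*(i - s))

# Moments from the built-in truncated normal
td = truncated(Normal(μ, σ), t, Inf)
println((meanFormula, mean(td)))   # both ≈ 21.41
println((varFormula,  var(td)))    # both ≈ 26.85
```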
Start with mean and variance for a standard normal variable given truncation selection.
Let $Z \sim N(0,1)$.
The density function of $Z$ is: $$ f(z) = \sqrt{\frac{1}{2\pi}}e^{-\frac{1}{2}z^2} $$
The density function of $Z$ given truncation selection is $$ f(z|Z>s) = \frac{f(z)}{p}, \quad z > s $$
From the definition of the mean: \begin{equation*} \begin{split} E(Z|Z>s) &= \frac{1}{p} \int_s^{\infty} z f(z)dz\\ &= \frac{1}{p} [-f(z) ] _s^{\infty} \\ &= \frac{f(s)}{p} \\ &= i \end{split} \end{equation*}
because the first derivative of $f(z)$ with respect to $z$ is: \begin{equation*} \begin{split} \frac{d}{dz} f(z) &= \sqrt{\frac{1}{2\pi} } e^{-\frac{1}{2}z^2} (-z)\\ &= -zf(z) \end{split} \end{equation*}
Now, to compute the variance of $Z$ given selection, consider the following identity: \begin{equation*} \begin{split} \frac{d}{dz} z f(z) &= f(z) + z \frac{d}{dz} f(z) \\ &= f(z) - z^2 f(z) \end{split} \end{equation*}
Integrating both sides from $s$ to $\infty$ gives \begin{equation*} \big[zf(z)\big]_s^\infty = \int_s^\infty f(z) dz - \int_s^\infty z^2 f(z)dz \end{equation*} Because $zf(z) \to 0$ as $z \to \infty$, the left side equals $-sf(s)$, and rearranging gives: \begin{equation*} \begin{split} \int_s^\infty z^2 f(z)dz &= \int_s^\infty f(z) dz + sf(s) \\ \frac{1}{p} \int_s^\infty z^2 f(z)dz &= \frac{1}{p} \int_s^\infty f(z) dz + \frac{f(s)}{p}s\\ E(Z^2|Z>s) &= 1 + is \end{split} \end{equation*}
So, \begin{equation} \begin{split} Var(Z|Z>s) &= E(Z^2|Z>s) - [E(Z|Z>s)]^2\\ &= 1 + is - i^2 \\ &= 1 - i(i-s) \end{split} \end{equation}
Results for $Y$ follow from the fact that $$ \mu_Y + V_Y^{1/2}Z \sim N(\mu_Y,V_Y) $$
So, let $$ Y = \mu_Y + V_Y^{1/2}Z, $$ Then, the condition $$ Y>t $$ is equivalent to \begin{equation*} \begin{split} \mu_Y + V_Y^{1/2}Z &> t \\ V_Y^{1/2}Z &> t - \mu_Y \\ Z &> \frac{t - \mu_Y}{V_Y^{1/2}}\\ Z &> s \end{split} \end{equation*}
Then, \begin{equation*} \begin{split} E(Y|Y>t) &= E(\mu_Y + V_Y^{1/2}Z |Z>s) \\ &= \mu_Y + V_Y^{1/2}i, \end{split} \end{equation*} and \begin{equation*} \begin{split} Var(Y|Y>t) &= Var(\mu_Y + V_Y^{1/2}Z |Z>s) \\ &= V_Y[1 - i(i-s)] \end{split} \end{equation*}
using Distributions, Printf

μ = 10
σ = 10
t = 15
s = (t - μ)/σ
d = Normal(0.0, 1.0)
i = pdf(d, s)/(1 - cdf(d, s))      # selection intensity
meanTruncatedNormal = μ + σ*i
variTruncatedNormal = σ^2*(1 - i*(i - s))
@printf "mean = %8.2f \n" meanTruncatedNormal
@printf "variance = %8.2f \n" variTruncatedNormal
mean =    21.41 
variance =    26.85 
using Statistics

μ = 10
σ = 10
z = rand(Normal(μ, σ), 100000);
mcMean = mean(z[z .> t])           # Monte Carlo estimates, same t = 15 as above
mcVar  = var(z[z .> t])
@printf "MC mean = %8.2f \n" mcMean
@printf "MC variance = %8.2f \n" mcVar
MC mean =    21.38 
MC variance =    26.79 
Let $\mathbf{Y} \sim N(\mathbf{\mu},\mathbf{V})$ with
$ \mathbf{\mu} = \begin{bmatrix} 10\\ 20 \end{bmatrix}, $ $ \mathbf{V} = \begin{bmatrix} 100 & 50\\ 50 & 200 \end{bmatrix} $
μ = [10.0;20.0]
V = [100.0 50.0
50.0 200.0]
d = MvNormal(μ,V)
XY = rand(d,10000)'
10000x2 Array{Float64,2}:
  11.9285    19.1144
  19.2509    36.145
   4.45011   23.9638
  20.9049    36.2345
  14.7587    29.454
   7.85305   29.5753
  13.3294    25.8031
  17.942     21.6667
   2.00279   20.1188
  18.2081    29.531
  22.5693     9.33006
  -6.57064    1.51851
   8.77239   28.1197
   ⋮
  18.5498    51.7603
  33.5773    27.9552
  18.3895    26.4949
  24.3136    46.4041
   6.92297    6.27365
  18.2313    22.6488
  -8.52665   28.2996
  20.986     16.3011
  -0.377664  24.9399
  24.0369     7.16565
  12.1059    21.147
   7.17966   23.7316
sel = XY[:,1].>10
10000-element BitArray{1}:
  true
  true
 false
  true
  true
 false
  true
  true
 false
  true
  true
 false
 false
  ⋮
  true
  true
  true
  true
 false
  true
 false
  true
 false
  true
  true
 false
selY = XY[sel,2]
5007-element Array{Float64,1}:
 19.1144
 36.145
 36.2345
 29.454
 25.8031
 21.6667
 29.531
  9.33006
 25.2757
 52.031
 34.1789
 58.6182
 10.67
  ⋮
 23.9181
 36.876
 25.7362
 17.0781
 51.7603
 27.9552
 26.4949
 46.4041
 22.6488
 16.3011
  7.16565
 21.147
mean(selY[selY.>30])
39.060176445384066
var(selY[selY.>30])
53.619263788780245
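The simulated `selY` can also be checked against the truncation formulas derived earlier: writing $Y_2 = \mu_2 + b(Y_1 - \mu_1) + e$ with $b = v_{12}/v_{11}$ and $e$ independent of $Y_1$, selecting on $Y_1 > 10$ gives $E(Y_2|Y_1>10) = \mu_2 + b\,[E(Y_1|Y_1>10) - \mu_1]$ and $Var(Y_2|Y_1>10) = v_{22} - b^2 v_{11} + b^2\,Var(Y_1|Y_1>10)$, where the moments of $Y_1$ under truncation come from the single-variable results. A sketch of this check (these analytical values are for the first round of selection only, before the additional $Y_2 > 30$ cut):

```julia
using Distributions, Statistics

μ = [10.0; 20.0]
V = [100.0 50.0; 50.0 200.0]
XY = rand(MvNormal(μ, V), 100_000)'    # larger sample than above, for a tighter check
selY = XY[XY[:,1] .> 10.0, 2]          # Y₂ values after selecting on Y₁ > 10

# Analytical moments of Y₁ under truncation (here s = 0)
s = (10.0 - μ[1])/sqrt(V[1,1])
i = pdf(Normal(), s)/(1 - cdf(Normal(), s))
meanY1 = μ[1] + sqrt(V[1,1])*i
varY1  = V[1,1]*(1 - i*(i - s))

# Propagate through the regression of Y₂ on Y₁
b = V[1,2]/V[1,1]
meanY2 = μ[2] + b*(meanY1 - μ[1])          # ≈ 23.99
varY2  = V[2,2] - b^2*V[1,1] + b^2*varY1   # ≈ 184.08

println((meanY2, mean(selY)))
println((varY2,  var(selY)))
```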
Often no closed form for $f(\mathbf{\theta}|\mathbf{y})$
Further, even if computing $f(\theta|{\mathbf y})$ is feasible, obtaining $f(\theta_{i}|{\mathbf y})$ would require integrating over many dimensions
Thus, in many situations, inferences are made using the empirical posterior constructed by drawing samples from $f({\mathbf \theta}|{\mathbf y})$
Gibbs sampler is widely used for drawing samples from posteriors
Want to draw samples from $f(x_{1},x_{2},\ldots,x_{n})$
Even though it may be possible to compute $f(x_{1},x_{2},\ldots,x_{n})$, it is difficult to draw samples directly from $f(x_{1},x_{2},\ldots,x_{n})$
Gibbs:
Get a valid starting point $\mathbf{x}^{0}$
Draw sample $\mathbf{x}^{t}$ as: $$\begin{matrix}x_{1}^{t} & \text{from} & f(x_{1}|x_{2}^{t-1},x_{3}^{t-1},\ldots,x_{n}^{t-1})\\ x_{2}^{t} & \text{from} & f(x_{2}|x_{1}^{t},x_{3}^{t-1},\ldots,x_{n}^{t-1})\\ x_{3}^{t} & \text{from} & f(x_{3}|x_{1}^{t},x_{2}^{t},\ldots,x_{n}^{t-1})\\ \vdots & & \vdots\\ x_{n}^{t} & \text{from} & f(x_{n}|x_{1}^{t},x_{2}^{t},\ldots,x_{n-1}^{t}) \end{matrix}$$
The sequence ${\mathbf x}^{1},{\mathbf x}^{2},{\mathbf x}^{3},\ldots$ is a Markov chain with stationary distribution $f(x_{1},x_{2},\ldots,x_{n})$
Can show that samples obtained from a Markov chain can be used to draw inferences from $f(x_{1},x_{2},\ldots,x_{n})$ provided the chain is:
Irreducible: can move from any state $i$ to any other state $j$
Positive recurrent: return time to any state has finite expectation
Markov Chains, J. R. Norris (1997)
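These conditions can be illustrated with a toy two-state chain (an invented example, not from the text): a transition matrix with all entries positive is irreducible and positive recurrent, and repeated transitions drive any starting distribution toward the stationary distribution $\pi$ satisfying $\pi P = \pi$.

```julia
# Two-state transition matrix; rows sum to 1, all entries positive
P = [0.9 0.1;
     0.2 0.8]

# Solving πP = π with π summing to 1 gives π = [2/3, 1/3]
πs = [2/3 1/3]
println(πs*P ≈ πs)        # true: π is invariant under P

# Any starting distribution converges to π under repeated transitions
p0 = [1.0 0.0]
println(p0*P^100)         # ≈ [0.6667 0.3333]
```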
Let $f(\mathbf{x})$ be a bivariate normal density with means $$ \mu' = \begin{bmatrix} 1 & 2 \end{bmatrix} $$ and covariance matrix $$ \mathbf{V} = \begin{bmatrix} 1 & 0.5\\ 0.5& 2.0 \end{bmatrix} $$
Suppose we do not know how to draw samples from $f(\mathbf{x})$, but know how to draw samples from $f(x_i|x_j)$, which is univariate normal with mean: $$ \mu_{i.j} = \mu_i + \frac{v_{ij}}{v_{jj}}(x_j - \mu_j) $$ and variance $$ v_{i.j} = v_{ii} - \frac{v^2_{ij}}{v_{jj}} $$
using Distributions, Printf

nSamples = 20000
m = [1.0, 2.0]
v = [1.0 0.5; 0.5 2.0]
y = fill(0.0, 2)
save = fill(0.0, 2, nSamples)
runSum = fill(0.0, 2)
# conditional standard deviations are constant across iterations
s12 = sqrt(v[1,1] - v[1,2]^2/v[2,2])
s21 = sqrt(v[2,2] - v[1,2]^2/v[1,1])
for iter in 1:nSamples
    # sample x₁ from its full conditional, given the current x₂
    m12 = m[1] + v[1,2]/v[2,2]*(y[2] - m[2])
    y[1] = rand(Normal(m12, s12))
    # sample x₂ from its full conditional, given the new x₁
    m21 = m[2] + v[1,2]/v[1,1]*(y[1] - m[1])
    y[2] = rand(Normal(m21, s21))
    runSum += y
    save[:,iter] = y
    runMean = runSum/iter
    if iter % 1000 == 0
        @printf "%10d %8.2f %8.2f \n" iter runMean[1] runMean[2]
    end
end
      1000     1.00     1.96 
      2000     1.02     2.01 
      3000     1.01     2.00 
      4000     1.02     2.02 
      5000     1.01     2.01 
      6000     1.01     2.01 
      7000     1.01     2.02 
      8000     1.00     2.01 
      9000     1.00     2.00 
     10000     1.00     2.00 
     11000     1.00     2.01 
     12000     1.00     2.00 
     13000     1.01     2.00 
     14000     1.01     2.00 
     15000     1.01     2.00 
     16000     1.00     1.99 
     17000     1.00     1.99 
     18000     1.00     1.99 
     19000     1.00     1.99 
     20000     1.00     2.00 
cov(save')
2x2 Array{Float64,2}:
 1.00471   0.499934
 0.499934  1.97734
Sometimes it may not be possible to draw samples directly from the full-conditional distribution $f(x_i|\mathbf{x}_{-i})$
Convergence of the Gibbs sampler may be too slow
Metropolis-Hastings (MH) for sampling from $f(\mathbf{x})$:
a candidate sample, $y$, is drawn from a proposal distribution $q(y|x^{t-1})$ and accepted with probability $$ \alpha = \min\left(1, \frac{f(y)\,q(x^{t-1}|y)}{f(x^{t-1})\,q(y|x^{t-1})}\right) $$
$$ x^t = \begin{cases} y & \text{with probability}\, \alpha \\ x^{t-1} & \text{with probability}\, 1 - \alpha \\ \end{cases} $$
With a symmetric proposal, $q(x^{t-1}|y) = q(y|x^{t-1})$, so $\alpha$ reduces to $\min\left(1, f(y)/f(x^{t-1})\right)$; this is the case for the random-walk proposal used below.
nSamples = 10000
m = [1.0, 2.0]
v = [1.0 0.5; 0.5 2.0]
vi = inv(v)
y = fill(0.0, 2)
runSum = fill(0.0, 2)
# random-walk proposal: uniform steps scaled by the marginal standard deviations
delta = 1.0
d1 = Uniform(-delta*sqrt(v[1,1]), delta*sqrt(v[1,1]))
d2 = Uniform(-delta*sqrt(v[2,2]), delta*sqrt(v[2,2]))
z = y - m
denOld = exp(-0.5*z'*vi*z)        # unnormalized target density at the current state
ynew = fill(0.0, 2)
for iter in 1:nSamples
    ynew[1] = y[1] + rand(d1)
    ynew[2] = y[2] + rand(d2)
    denNew = exp(-0.5*(ynew - m)'*vi*(ynew - m))
    alpha = denNew/denOld         # symmetric proposal: ratio of target densities
    if rand() < alpha
        y = copy(ynew)
        denOld = denNew
    end
    runSum += y
    runMean = runSum/iter
    if iter % 1000 == 0
        @printf "%10d %8.2f %8.2f \n" iter runMean[1] runMean[2]
    end
end
      1000     0.93     2.14 
      2000     0.96     2.07 
      3000     0.94     2.06 
      4000     0.96     2.07 
      5000     0.94     2.04 
      6000     0.94     2.02 
      7000     0.93     2.01 
      8000     0.95     2.01 
      9000     0.97     2.00 
     10000     0.97     1.98 
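A diagnostic the example above does not report is the acceptance rate, the usual guide for tuning the step size `delta`: very small steps are almost always accepted but explore the target slowly, while very large steps are mostly rejected. A sketch (an added illustration wrapping the same random-walk sampler in a function; `mhAcceptRate` is not from the original):

```julia
using Distributions

# Fraction of accepted moves for a random-walk MH sampler on the bivariate
# normal target used above, with uniform steps of half-width delta times
# the marginal standard deviations
function mhAcceptRate(delta; nSamples = 10_000)
    m  = [1.0, 2.0]
    v  = [1.0 0.5; 0.5 2.0]
    vi = inv(v)
    step1 = Uniform(-delta*sqrt(v[1,1]), delta*sqrt(v[1,1]))
    step2 = Uniform(-delta*sqrt(v[2,2]), delta*sqrt(v[2,2]))
    y = [0.0, 0.0]
    denOld = exp(-0.5*(y - m)'*vi*(y - m))
    nAccept = 0
    for _ in 1:nSamples
        ynew = [y[1] + rand(step1), y[2] + rand(step2)]
        denNew = exp(-0.5*(ynew - m)'*vi*(ynew - m))
        if rand() < denNew/denOld     # symmetric proposal
            y, denOld = ynew, denNew
            nAccept += 1
        end
    end
    nAccept/nSamples
end

println(mhAcceptRate(0.1))   # close to 1: tiny, slow-mixing steps
println(mhAcceptRate(1.0))   # moderate
println(mhAcceptRate(5.0))   # mostly rejected
```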