Simple linear regression is used to illustrate Bayesian inference, using the Gibbs sampler. The Gibbs sampler is used to draw samples from the posterior distribution of the intercept, the slope and the residual variance.
Consider the linear model:
$$y_{i}=\beta_{0}+x_{i}\beta_{1}+e_{i}. \tag{35}$$where for observation $i$, $y_{i}$ is the value of the dependent variable, $\beta_{0}$ is the intercept, $x_{i}$ is the value of the independent variable and $e_{i}$ is a residual. Flat priors are used for the intercept and slope, and the residuals are assumed to be identically and independently distributed normal random variables with mean zero and variance $\sigma_{e}^{2}.$ A scaled inverted chi-square prior is used for $\sigma_{e}^{2}.$
using Distributions
using StatsBase
n = 200 #number of observations
k = 1 #number of covariates
x = sample([0,1,2],(n,k))
X = [ones(Int64,n) x]
betaTrue = [1,2]
y = X*betaTrue+ randn(n);
In matrix notation, the model (35) is
$$\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{e},$$where $$\mathbf{X}=\left[\begin{array}{cc} 1 & x_{1}\\ 1 & x_{2}\\ \vdots & \vdots\\ 1 & x_{n} \end{array}\right].$$ Then, the least-squares estimator of $\mathbf{\beta}$ is $$\mathbf{\hat{\boldsymbol{\beta}}}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y},$$ and the variance of this estimator is $$Var(\hat{\mathbf{\boldsymbol{\beta}}})=(\mathbf{X}'\mathbf{X})^{-1}\sigma_{e}^{2}.$$
XPX = X'X
rhs = X'y
XPXi= inv(XPX)
println(XPXi)
[0.012246376811594203 -0.007246376811594203 -0.007246376811594203 0.007246376811594203]
betaHat = XPXi*rhs
println(betaHat)
[0.9365046648272068,2.036155693095144]
eHat = y - X*betaHat
resVar = eHat'eHat/(n-2)
println(resVar)
[0.6692032555710138]
Consider making inferences about $\boldsymbol{\beta}$ from $f(\boldsymbol{\beta}|\mathbf{y},\sigma_{e}^{2}).$ By using the Bayes theorem, this conditional density is written as
$$\begin{eqnarray} f(\boldsymbol{\beta}|\mathbf{y},\sigma_{e}^{2}) & = & \frac{f(\mathbf{y}|\boldsymbol{\beta},\sigma_{e}^{2})f(\boldsymbol{\beta})f(\sigma_{e}^{2})}{f(\mathbf{y},\sigma_{e}^{2})}\nonumber \\ & \propto & f(\mathbf{y}|\boldsymbol{\beta},\sigma_{e}^{2})f(\boldsymbol{\beta})f(\sigma_{e}^{2})\nonumber \\ & \propto & f(\mathbf{y}|\boldsymbol{\beta},\sigma_{e}^{2})\nonumber \\ & = & (2\pi\sigma_{e}^{2})^{-n/2}\exp\left\{ -\frac{1}{2}\frac{(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})}{\sigma_{e}^{2}}\right\}\tag{36} \end{eqnarray}$$which looks like the $n$-dimensional normal density of $\mathbf{y}$ with mean $\mathbf{X}\boldsymbol{\beta}$ and covariance matrix $\mathbf{I}\sigma_{e}^{2}.$ But, $f(\boldsymbol{\beta}|\mathbf{y},\sigma_{e}^{2})$ should be a two-dimensional density. So, the quadratic $Q=(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})$ in the exponent of (36) is rearranged as
$$\begin{eqnarray} Q & = & (\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})\\ & = & \mathbf{y}'\mathbf{y}-2\mathbf{y}'\mbox{X}\boldsymbol{\beta}+\boldsymbol{\beta}'(\mathbf{X}'\mathbf{X})\boldsymbol{\beta}\\ & = & \mathbf{y}'\mathbf{y}+(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})'(\mathbf{X}'\mathbf{X})(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})-\hat{\boldsymbol{\beta}}'(\mathbf{X}'\mathbf{X})\hat{\boldsymbol{\beta}}, \end{eqnarray}$$where $\hat{\boldsymbol{\beta}}$ is the solution to $(\mathbf{X}'\mathbf{X})\hat{\boldsymbol{\beta}}=\mathbf{X}'\mathbf{y}$, which is the least-squares estimator of $\boldsymbol{\beta}$. In this expression, only the second term depends on $\boldsymbol{\beta}$. Thus, $f(\boldsymbol{\beta}|\mathbf{y},\sigma_{e}^{2})$ can be written as
$$ f(\boldsymbol{\beta}|\mathbf{y},\sigma_{e}^{2})\propto\exp\left\{ -\frac{1}{2}\frac{(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})'(\mathbf{X}'\mathbf{X})(\boldsymbol{\beta}-\hat{\boldsymbol{\beta}})}{\sigma_{e}^{2}}\right\}, $$which can be recognized as proportional to the density for a two-dimensional normal distribution with mean $\hat{\boldsymbol{\beta}}$ and variance $(\mathbf{X}'\mathbf{X})^{-1}\sigma_{e}^{2}$. Thus, in this simple setting, the posterior mean of $\boldsymbol{\beta}$ is given by the least-squares estimate, and drawing samples from the posterior are not needed. But, to illustrate the Gibbs sampler, we will apply it to this simple example.
The simple regression model can be written as $$\mathbf{y=}\mathbf{1}\beta_{0}+\mathbf{x}\beta_{1}+\mathbf{e}.$$ In the Gibbs sampler, $\beta_{0}$ is sampled from its full-conditional posterior: $f(\beta_{0}|\mathbf{y},\beta_{1},\sigma_{e}^{2}).$ This conditional distribution is computed for the current values of $\beta_{1}$ and $\sigma_{e}^{2}.$ So, we can write the model as $$\mathbf{w}_{0}=\mathbf{1}\beta_{0}+\mathbf{e},$$ where $\mathbf{w}_{0}=\mathbf{y}-\mathbf{x}\beta_{1}.$ Then, the least-squares estimator of $\beta_{0}$ is $$\hat{\beta}_{0}=\frac{\mathbf{1}'\mathbf{w}_{0}}{\mathbf{1}'\mathbf{1}},$$ and the variance of this estimator is $$Var(\hat{\beta_{0}})=\frac{\sigma_{e}^{2}}{\mathbf{1}'\mathbf{1}}.$$ By applying the strategy used to derive $f(\boldsymbol{\beta}|\mathbf{y},\sigma_{e}^{2})$ above, the full-conditional posterior for $\beta_{0}$ can be shown to be a normal distribution with mean $\hat{\beta_{0}}$ and variance $\frac{\sigma_{e}^{2}}{\mathbf{1}'\mathbf{1}}.$ Similarly, the full-conditional posterior for $\beta_{1}$ is a normal distribution with mean $$\hat{\beta}_{1}=\frac{\mathbf{x}'\mathbf{w}_{1}}{\mathbf{x}'\mathbf{x}}$$ and variance $\frac{\sigma_{e}^{2}}{\mathbf{x}'\mathbf{x}}$, where $\mathbf{w}_{1}=\mathbf{y}-1\beta_{0}.$ In the calculations below, we will use the true value of $\sigma_{e}^{2}.$
# loop for Gibbs sampler
niter = 10000 # number of samples
b = [0.0, 0.0] # initial value of b
meanB = [0.0, 0.0]
a=Float64[]
for iter = 1:niter
# sampling intercept
w = y - X[:,2] * b[2]
x = X[:,1]
xpxi = 1/(x'x)[1,1]
bHat = (xpxi*x'w)[1,1]
b[1] = rand(Normal(bHat, sqrt(xpxi))) # using residual var = 1
# sampling slope
w = y - X[:,1]*b[1]
x = X[:,2]
xpxi = 1/(x'x)[1,1]
bHat = (xpxi*x'w)[1,1]
b[2] = rand(Normal(bHat, sqrt(xpxi))) # using residual var = 1
meanB = meanB + b
push!(a,b[2])
if ((iter%1000) == 0)
@printf("Intercept = %6.3f \n", meanB[1]/iter)
@printf("Slope = %6.3f \n", meanB[2]/iter)
end
end
Intercept = 0.954 Slope = 2.022 Intercept = 0.946 Slope = 2.029 Intercept = 0.941 Slope = 2.033 Intercept = 0.940 Slope = 2.033 Intercept = 0.938 Slope = 2.035 Intercept = 0.938 Slope = 2.036 Intercept = 0.938 Slope = 2.035 Intercept = 0.938 Slope = 2.035 Intercept = 0.938 Slope = 2.036 Intercept = 0.936 Slope = 2.037
using Gadfly
plot(x=a, Geom.histogram,
Guide.title("Posterior distribution of β1"),
Guide.ylabel("Frequency"),
Guide.xlabel("β1"))
Recall that we assumed a scaled inverted chi-square prior for $\sigma_{e}^{2}$. The density function for this is:
$$f(\sigma_{e}^{2})=\frac{(S_{e}^{2}\nu_{e}/2)^{\nu_{e}/2}}{\Gamma(\nu_{e}/2)}(\sigma_{e}^{2})^{-(2+\nu_{e})/2}\exp\left\{ -\frac{\nu_{e}S_{e}^{2}}{2\sigma_{e}^{2}}\right\},\tag{37}$$where $S_{e}^{2}$ and $\nu_{e}$ are the scale and the degrees of freedom parameters for this distribution. Applying Bayes theorem to combine this prior with the “likelihood” (given in (36)), the full-conditional posterior for the residual variance can be written as
$$\begin{aligned} f(\sigma_{e}^{2}|\mathbf{y},\boldsymbol{\beta}) & = \frac{f(\mathbf{y}|\boldsymbol{\beta},\sigma_{e}^{2})f(\boldsymbol{\beta})f(\sigma_{e}^{2})}{f(\mathbf{y},\boldsymbol{\beta})}\nonumber \\ & \propto f(\mathbf{y}|\boldsymbol{\beta},\sigma_{e}^{2})f(\boldsymbol{\beta})f(\sigma_{e}^{2})\nonumber \\ & \propto (\sigma_{e}^{2})^{-n/2}\exp\left\{ -\frac{1}{2}\frac{(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})}{\sigma_{e}^{2}}\right\} \nonumber \\ & \times (\sigma_{e}^{2})^{-(2+\nu_{e})/2}\exp\left\{ -\frac{\nu_{e}S_{e}^{2}}{2\sigma_{e}^{2}}\right\} \nonumber \\ & = (\sigma_{e}^{2})^{-(n+2+\nu_{e})/2}\exp\left\{ -\frac{(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})+\nu_{e}S_{e}^{2}}{2\sigma_{e}^{2}}\right\}. \end{aligned}\tag{38}$$Comparing (38) with (37), can see that it is proportional to a scaled inverse chi-squared density with $\tilde{\nu}_{e}=n+\nu_{e}$ degrees of freedom and $\tilde{S_{e}^{2}}=\frac{(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})+\nu_{e}S_{e}^{2}}{\tilde{\nu_{e}}}$ scale parameter. A sample from this density can be obtained as $\frac{(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})+\nu_{e}S_{e}^{2}}{\chi_{\tilde{\nu_{e}}}^{2}}$, where $\chi_{\tilde{\nu_{e}}}^{2}$ is a chi-squared random variable with $\tilde{\nu_{e}}$ degrees of freedom.
In the Julia script given here, the simulated value of the residual variance was used in the sampling of $\boldsymbol{\beta}$. Extend this script to also sample $\sigma_{e}^{2}$ from its full-conditional posterior given above. In Julia, rand(Chisq($\nu$),1) gives a chi-squared random variable with $\nu$ degrees of freedom. In the solution given here a flat prior for $\sigma_{e}^{2}$ was used.
Consider the simple regression model that can be written as $$\mathbf{y=}\mathbf{1}\beta_{0}+\mathbf{x}\beta_{1}+\mathbf{e}.$$ Here we consider a model with a flat prior for $\beta_{0}$ and a normal prior for the slope:
$$\beta_{1}\sim N(0,\sigma_{\beta}^{2}),$$where $\sigma_{\beta}^{2}$ is assumed to be known.
Then, the full-conditional posterior for $\theta'=[\boldsymbol{\beta},\sigma_{e}^{2}]$ is
$$\begin{aligned} f(\boldsymbol{\theta}|\mathbf{y}) & \propto f(\mathbf{y}|\boldsymbol{\theta})f(\boldsymbol{\theta})\\ & \propto \left(\sigma_{e}^{2}\right)^{-n/2}\exp\left\{ -\frac{(\mathbf{y}-\mathbf{1}\beta_{0}-\mathbf{x}\beta_{1})'(\mathbf{y}-\mathbf{1}\beta_{0}-\mathbf{x}\beta_{1})}{2\sigma_{e}^{2}}\right\} \\ & \times \left(\sigma_{\beta}^{2}\right)^{-1/2}\exp\left\{ -\frac{\beta_{1}^{2}}{2\sigma_{\beta}^{2}}\right\} \\ & \times (\sigma_{e}^{2})^{-(2+\nu_{e})/2}\exp\left\{ -\frac{\nu_{e}S_{e}^{2}}{2\sigma_{e}^{2}}\right\} .\\ \end{aligned}$$The full-conditional for $\beta_{1}$ is obtained by dropping all terms and factors that do not involve $\beta_{1}$:
$$\begin{aligned} f(\beta_{1}|\text{ELSE}) & \propto \exp\left\{ -\frac{(\mathbf{y}-\mathbf{1}\beta_{0}-\mathbf{x}\beta_{1})'(\mathbf{y}-\mathbf{1}\beta_{0}-\mathbf{x}\beta_{1})}{2\sigma_{e}^{2}}\right\} \times \exp\left\{ -\frac{\beta_{1}^{2}}{2\sigma_{\beta}^{2}}\right\} \\ & \propto \exp\left\{ -\frac{\mathbf{w}'\mathbf{w}-2\mathbf{w}'\mathbf{x}\beta_{1}+\beta_{1}^{2}(\mathbf{x}'\mathbf{x}+\sigma_{e}^{2}/\sigma_{\beta}^{2})}{2\sigma_{e}^{2}}\right\} \\ & \propto \exp\left\{ -\frac{\mathbf{w}'\mathbf{w}-(\beta_{1}-\hat{\beta}_{1})^{2}(\mathbf{x}'\mathbf{x}+\sigma_{e}^{2}/\sigma_{\beta}^{2})-\hat{\beta}_{1}^{2}(\mathbf{x}'\mathbf{x}+\sigma_{e}^{2}/\sigma_{\beta}^{2})}{2\sigma_{e}^{2}}\right\} \\ & \propto \exp\left\{ -\frac{(\beta_{1}-\hat{\beta}_{1})^{2}}{\frac{2\sigma_{e}^{2}}{(\mathbf{x}'\mathbf{x}+\sigma_{e}^{2}/\sigma_{\beta}^{2})}}\right\} , \end{aligned}$$where $$\hat{\beta}_{1}=\frac{\mathbf{x}'\mathbf{w}}{(\mathbf{x}'\mathbf{x}+\sigma_{e}^{2}/\sigma_{\beta}^{2})},$$ and $\mathbf{w}=\mathbf{y}-\mathbf{1}\beta_{0}.$ So, the full-conditional posterior for $\beta_{1}$ is a normal distribution with $ $mean $\hat{\beta}_{1}$ and variance $\frac{\sigma_{e}^{2}}{(\mathbf{x}'\mathbf{x}+\sigma_{e}^{2}/\sigma_{\beta}^{2})}.$
Use Julia to simulate a vector of 1000 values for $\beta_{1}$ from a normal distribution with mean zero and variance 3. Plot a histogram of these values.
Use $\beta_{0}=1$, $\beta_{1}=2$ and $\sigma_{e}^{2}=5$, to generate a vector of observations, $\mathbf{y},$
that follows a simple linear regression model.
Use the Gibbs sampler to draw 10,000 samples for $\beta_{1}$ from its posterior distribution.
Compute the mean and variance of the sampled values.
Draw a histogram of the sampled values. Compare with prior.