Author(s): Paul Miles | Date Created: July 12, 2018
Many models require a set of parameters as input; however, it is not always of interest to estimate those parameter's posterior distribution. The pymcmcstat package will allow you to send static parameters to the model along with the parameters being sampled. We demonstrated this feature by considering a simple linear model.
# Import required packages
import numpy as np
from pymcmcstat.MCMC import MCMC
import matplotlib.pyplot as plt
np.seterr(over = 'ignore');
We define a linear model function and corresponding sum-of-squares function.
# define test model function
def linear_model(xdata, theta):
m = theta[0]
b = theta[1]
nrow, ncol = xdata.shape
y = np.zeros([nrow, 1])
y[:, 0] = m*xdata.reshape(nrow,) + b
return y
def ssfun(theta, data):
xdata = data.xdata[0]
ydata = data.ydata[0]
# eval model
ymodel = linear_model(xdata, theta)
# calc sos
ss = sum((ymodel[:, 0] - ydata[:, 0])**2)
return ss
We generate sythetic data using the linear model and adding observation errors $$y_i = 2x_i + 3 + \epsilon_i, \;\; \epsilon_i \sim N(0, 0.1)$$ We plot the data and linear model and see randomly distributed noise with respect to the model response.
nds = 100
x = np.linspace(2, 3, num=nds).reshape(nds, 1)
theta_data = np.array([2.0, 3.0])
y = linear_model(x, theta_data) + 0.1*np.random.standard_normal(x.shape)
plt.plot(x, y, '.b');
plt.plot(x, linear_model(x, theta_data), '-r');
# Initialize MCMC object
mcstat = MCMC()
mcstat.data.add_data_set(x, y)
mcstat.simulation_options.define_simulation_options(
nsimu=int(5.0e3),
updatesigma=True,
method='dram')
mcstat.model_settings.define_model_settings(sos_function=ssfun)
When defining the model parameters, we can turn the sampling on or off by specifying the value of the sample
argument.
sample = True
: Parameter will be treated as a random variable and be included in the sampling chain. This is the default behavior.sample = False
: Parameter will be held constant at the initial value defined by the user.mcstat.parameters.add_model_parameter(
name='m',
theta0=2.,
sample=False)
mcstat.parameters.add_model_parameter(
name='b',
theta0=5.,
minimum=-10,
maximum=100,
sample=True)
Observe that the only parameters displayed are the ones where sample = True
.
mcstat.run_simulation()
# Extract results
results = mcstat.simulation_results.results
chain = results['chain']
s2chain = results['s2chain']
sschain = results['sschain']
names = results['names']
# define burnin
burnin = 2000
# display chain statistics
mcstat.chainstats(chain[burnin:, :], results)
Sampling these parameters: name start [ min, max] N( mu, sigma^2) b: 5.00 [ -10.00, 100.00] N( 0.00e+00, inf) [-----------------100%-----------------] 5000 of 5000 complete in 1.6 sec --------------------- name : mean std MC_err tau geweke b : 3.0184 0.0087 0.0004 5.9851 0.9995 ---------------------
The plots only show the results of the sampled parameters b
.
# generate mcmc plots
mcpl = mcstat.mcmcplot # initialize plotting methods
mcpl.plot_density_panel(chain[burnin:, :], names,
figsizeinches=(3, 3))
mcpl.plot_chain_panel(chain[burnin:, :], names,
figsizeinches=(3, 3));
# generate prediction intervals
def pred_modelfun(preddata, theta):
return linear_model(preddata.xdata[0], theta)
mcstat.PI.setup_prediction_interval_calculation(
results=results,
data=mcstat.data,
modelfunction=pred_modelfun,
burnin=burnin)
mcstat.PI.generate_prediction_intervals(calc_pred_int=True)
# plot prediction intervals
data_display = dict(
marker='o',
markersize=5,
markeredgecolor='k',
markeredgewidth=1,
markerfacecolor='w')
model_display = dict(color='r', linestyle='--')
mcstat.PI.plot_prediction_intervals(
adddata=True,
plot_pred_int=True,
figsizeinches=(6, 6),
model_display=model_display,
data_display=data_display);
Generating credible/prediction intervals: Interval generation complete