Notebook

Setting the Random Seed¶

Author(s): Paul Miles | Date Created: June 19, 2018

For the purpose of testing the MCMC simulation on a particular problem, it may be useful to check whether the results are repeatable. This can be accomplished by setting the seed for the random number generator used within pymcmcstat. This tutorial outlines how to accomplish this, and demonstrates the repeatability of the results.

First we import the required packages, define our model/sum-of-squares functions, and define a data set.

In [1]:

import numpy as np
from pymcmcstat.MCMC import MCMC
np.seterr(over = 'ignore')

# define test model function
def test_modelfun(xdata, theta):
    m = theta[0]
    b = theta[1]
    nrow, ncol = xdata.shape
    y = np.zeros([nrow, 1])
    y[:,0] = m*xdata.reshape(nrow,) + b
    return y

def test_ssfun(theta, data):
    xdata = data.xdata[0]
    ydata = data.ydata[0]
    # eval model
    ymodel = test_modelfun(xdata, theta)
    # calc sos
    ss = sum((ymodel[:, 0] - ydata[:, 0])**2)
    return ss

# define data
nds = 100
x = np.linspace(2, 3, num=nds)
y = 2.*x + 3. + 0.1*np.random.standard_normal(x.shape)

Initialize MCMC object and set seed¶

By default, no seed is specifield. To specify a seed simply define a numeric value for the keywork argument rngseed.

In [2]:

# Initialize MCMC object
mcstat = MCMC(rngseed=1234)

Setup and run rest of simulation¶

We set the

data
simulation options
model settings
model parameters

just like we would for any other simulation.

In [3]:

mcstat.data.add_data_set(x, y)
mcstat.simulation_options.define_simulation_options(
    nsimu=int(5.0e3), updatesigma=True, method='dram')
# update model settings
mcstat.model_settings.define_model_settings(sos_function=test_ssfun)
mcstat.parameters.add_model_parameter(
    name='m',
    theta0=2.,
    minimum=-10,
    maximum=np.inf,
    sample=True)
mcstat.parameters.add_model_parameter(
    name='b',
    theta0=-5.,
    minimum=-10,
    maximum=100,
    sample=True)
# run mcmc
mcstat.run_simulation()

Sampling these parameters:
      name      start [      min,       max] N(       mu,   sigma^2)
         m:      2.00 [   -10.00,       inf] N( 0.00e+00,      inf)
         b:     -5.00 [   -10.00,    100.00] N( 0.00e+00,      inf)
 [-----------------100%-----------------] 5000 of 5000 complete in 1.5 sec

Extract Results and Display Chainstats¶

In addition, we check the last row of the chain to see where the simulation ended.

In [5]:

# Extract results
results = mcstat.simulation_results.results
chain = results['chain']
s2chain = results['s2chain']
sschain = results['sschain']
names = results['names']
# define burnin
burnin = 2000
# display chain statistics
mcstat.chainstats(chain[burnin:, :], results)
print('chain[-1, :] = {}'.format(chain[-1, :]))

---------------------
name      :       mean        std     MC_err        tau     geweke
m         :     2.0130     0.0320     0.0038    48.7245     0.9959
b         :     2.9673     0.0800     0.0096    49.0697     0.9921
---------------------
chain[-1, :] = [2.01337637 2.95431498]

Check Repeatability¶

To check the repeatability we simply create a new MCMC object with the same random seed and compare the result.

In [6]:

# Initialize MCMC object
mcstat = MCMC(rngseed=1234)
mcstat.data.add_data_set(x, y)
mcstat.simulation_options.define_simulation_options(
    nsimu=int(5.0e3), updatesigma=True, method='dram')
# update model settings
mcstat.model_settings.define_model_settings(sos_function=test_ssfun)
mcstat.parameters.add_model_parameter(
    name='m',
    theta0=2.,
    minimum=-10,
    maximum=np.inf,
    sample=True)
mcstat.parameters.add_model_parameter(
    name='b',
    theta0=-5.,
    minimum=-10,
    maximum=100,
    sample=True)
# run mcmc
mcstat.run_simulation()
# Extract results
results = mcstat.simulation_results.results
chain = results['chain']
s2chain = results['s2chain']
sschain = results['sschain']
names = results['names']
# define burnin
burnin = 2000
# display chain statistics
mcstat.chainstats(chain[burnin:, :], results)
print('chain[-1, :] = {}'.format(chain[-1, :]))

Sampling these parameters:
      name      start [      min,       max] N(       mu,   sigma^2)
         m:      2.00 [   -10.00,       inf] N( 0.00e+00,      inf)
         b:     -5.00 [   -10.00,    100.00] N( 0.00e+00,      inf)
 [-----------------100%-----------------] 5000 of 5000 complete in 1.4 sec
---------------------
name      :       mean        std     MC_err        tau     geweke
m         :     2.0130     0.0320     0.0038    48.7245     0.9959
b         :     2.9673     0.0800     0.0096    49.0697     0.9921
---------------------
chain[-1, :] = [2.01337637 2.95431498]

It is clearly seen that the results are identical to the first simulation, so the random number process is repeatable.