This Jupyter Notebook is no longer maintained and is kept here only for archival purposes. See the Jupyter Notebook at the link below for an updated version of this archived notebook:
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')
%matplotlib notebook
import numpy as np
import scipy as sp
import pandas as pd
import seaborn as sns
import paramonte as pm
import matplotlib as mpl
import matplotlib.pyplot as plt
print('\n'.join(f'{m.__name__} {m.__version__}' for m in globals().values() if getattr(m, '__version__', None)))
ParaMonte
plain powerful parallel Monte Carlo library
Version 2.5.1

ParaMonte - NOTE: The ParaMonte::Kernel samplers have no Python package dependencies
ParaMonte - NOTE: beyond numpy. However, the ParaMonte::Python post-processing and
ParaMonte - NOTE: visualization tools require the following Python packages,
ParaMonte - NOTE:
ParaMonte - NOTE:     numpy      : 1.19.2
ParaMonte - NOTE:     scipy      : 1.5.2
ParaMonte - NOTE:     pandas     : 1.1.2
ParaMonte - NOTE:     seaborn    : 0.11.0
ParaMonte - NOTE:     matplotlib : 3.3.2
ParaMonte - NOTE:
ParaMonte - NOTE: If you do not intend to use the postprocessing and visualization tools,
ParaMonte - NOTE: you can ignore this message. Otherwise,
ParaMonte - NOTE:
ParaMonte - NOTE: UPDATE THE ABOVE PACKAGES TO THE REQUESTED VERSIONS OR NEWER TO ENABLE
ParaMonte - NOTE: THE VISUALIZATION AND POSTPROCESSING TOOLS OF THE ParaMonte LIBRARY.

ParaMonte - NOTE: Intel MPI library for 64-bit architecture detected at:
ParaMonte - NOTE:
ParaMonte - NOTE:     C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.1.216\windows\mpi\intel64\bin
ParaMonte - NOTE:
ParaMonte - NOTE: To perform ParaMonte simulations in parallel on a single node,
ParaMonte - NOTE: run the following two commands, in the form and order specified,
ParaMonte - NOTE: on a Python-aware, mpiexec-aware command-line interface such as the
ParaMonte - NOTE: Anaconda3 Windows command prompt:
ParaMonte - NOTE:
ParaMonte - NOTE:     "C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.1.216\windows\mpi\intel64\bin\mpivars.bat"
ParaMonte - NOTE:
ParaMonte - NOTE:     mpiexec -localonly -n 2 python main_mpi.py
ParaMonte - NOTE:
ParaMonte - NOTE: where,
ParaMonte - NOTE:
ParaMonte - NOTE:     0. the first command defines the essential environment variables and
ParaMonte - NOTE:        the second command runs the simulation in parallel, where,
ParaMonte - NOTE:     1. you should replace the number 2 with the processor count
ParaMonte - NOTE:        you wish to assign to your simulation task,
ParaMonte - NOTE:     2. the flag '-localonly' indicates a parallel simulation on only
ParaMonte - NOTE:        a single node (this flag obviates the need for the MPI library
ParaMonte - NOTE:        credentials registration). For more information, visit:
ParaMonte - NOTE:        https://software.intel.com/en-us/get-started-with-mpi-for-windows
ParaMonte - NOTE:     3. python is the name of the Python software installed on your system;
ParaMonte - NOTE:        if python is not recognized on your command line, use python3 instead,
ParaMonte - NOTE:     4. main_mpi.py is the Python file that serves as the entry point to
ParaMonte - NOTE:        your simulation, where you call the ParaMonte sampler routines.
ParaMonte - NOTE:
ParaMonte - NOTE: Note that the above two commands must be executed on a command line that
ParaMonte - NOTE: recognizes both the Python and mpiexec applications, such as the Anaconda
ParaMonte - NOTE: command-line interface. For more information, in particular on how
ParaMonte - NOTE: to register and run Hydra services for multi-node simulations
ParaMonte - NOTE: on Windows servers, visit:
ParaMonte - NOTE:
ParaMonte - NOTE:     https://www.cdslab.org/paramonte
ParaMonte - NOTE:
ParaMonte - NOTE: To check the MPI library installation status or display the above
ParaMonte - NOTE: messages in the future, type the following on the Python command line:
ParaMonte - NOTE:
ParaMonte - NOTE:     import paramonte as pm
ParaMonte - NOTE:     pm.verify()
ParaMonte - NOTE:
ParaMonte - NOTE: To get started, type the following on the Python command line,
ParaMonte - NOTE:
ParaMonte - NOTE:     import paramonte as pm
ParaMonte - NOTE:     pm.helpme()
ParaMonte - NOTE:
ParaMonte - NOTE: To get help on individual components of the paramonte modules,
ParaMonte - NOTE: pass the name of the component to `helpme()`. For example,
ParaMonte - NOTE:
ParaMonte - NOTE:     import paramonte as pm
ParaMonte - NOTE:     pm.helpme("ParaDRAM")

numpy 1.19.4
scipy 1.5.4
pandas 1.2.0
seaborn 0.11.0
paramonte 2.5.1
matplotlib 3.3.3
The logic of Monte Carlo sampling is simple: suppose we have a mathematical objective function defined on an ndim-dimensional domain, typically the space of real numbers. In general, understanding and visualizing the structure of such objective functions is an extremely difficult task, if not impossible. A practical alternative is to employ Monte Carlo techniques that randomly explore the structure of the objective function and find its extrema.
In the example below, one of the simplest such objective functions, the Multivariate Normal (MVN) distribution, is constructed and sampled using a ParaMonte library sampler: the ParaDRAM sampler (Delayed-Rejection Adaptive Metropolis-Hastings Markov Chain Monte Carlo sampler).
Suppose we want to sample random points from a MultiVariate Normal (MVN) Probability Density Function (PDF). For a point $x$ in an ndim-dimensional domain, with mean vector $\mu$ and covariance matrix $\Sigma$, the MVN PDF is,

$$ \pi(x \mid \mu, \Sigma) = (2\pi)^{-\,\mathrm{ndim}/2} \, |\Sigma|^{-1/2} \, \exp\!\left( -\frac{1}{2} (x-\mu)^{T} \Sigma^{-1} (x-\mu) \right) $$
The following Python function getLogFunc() returns the natural logarithm of the Probability Density Function of the multivariate Normal distribution.
Since the mathematical objective functions (e.g., probability density functions) can take extremely small or large values, we often work with their natural logarithms instead. This is the reason behind the naming convention used in the ParaMonte library for the user's objective functions: getLogFunc, implying that the user must provide a function that returns the natural logarithm of the target objective function.
See this Jupyter Notebook for an in-depth discussion of why we need to work with the logarithm of mathematical objective functions in optimization and sampling problems.
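A quick sketch (plain NumPy, with illustrative numbers of our own choosing, not taken from this notebook) shows why log-space arithmetic is necessary:

```python
import numpy as np

# A product of many small probability densities underflows in linear space,
# but remains perfectly representable as a sum of logarithms.
densities = np.full(1000, 1e-10)  # 1000 tiny density values

linear = np.prod(densities)         # 1e-10000 underflows to exactly 0.0
logsum = np.sum(np.log(densities))  # finite and usable: 1000 * ln(1e-10)

print(linear)  # 0.0
print(logsum)
```

The same reasoning applies to likelihoods in sampling and optimization: sums of logarithms stay well within floating-point range where products do not.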
import numpy as np

NDIM = 4 # number of dimensions of the domain of the MVN PDF
MEAN = np.double([-10, 15., 20., 0.0]) # This is the mean of the MVN PDF.
COVMAT = np.double( [ [1.0, .45, -.3, 0.0] # This is the covariance matrix of the MVN PDF.
                    , [.45, 1.0, 0.3, -.2]
                    , [-.3, 0.3, 1.0, 0.6]
                    , [0.0, -.2, 0.6, 1.0]
                    ] )
INVCOV = np.linalg.inv(COVMAT) # This is the inverse of the covariance matrix of the MVN distribution.

# The following is the log of the coefficient used in the definition of the MVN.
MVN_COEF = NDIM * np.log( 1. / np.sqrt(2.*np.pi) ) + np.log( np.sqrt(np.linalg.det(INVCOV)) )

# the logarithm of the objective function: log(MVN)
def getLogFunc(point):
    """
    Return the natural logarithm of the MVN PDF evaluated at the input point.
    """
    normedPoint = MEAN - point
    return MVN_COEF - 0.5 * np.dot(normedPoint, np.matmul(INVCOV, normedPoint))
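To build confidence in the implementation above, the value returned by getLogFunc() can be cross-checked against scipy.stats.multivariate_normal.logpdf. This sanity check is an addition for illustration, and the test point below is arbitrary:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Same constants as in the notebook cell above.
NDIM = 4
MEAN = np.double([-10, 15., 20., 0.0])
COVMAT = np.double([[1.0, .45, -.3, 0.0],
                    [.45, 1.0, 0.3, -.2],
                    [-.3, 0.3, 1.0, 0.6],
                    [0.0, -.2, 0.6, 1.0]])
INVCOV = np.linalg.inv(COVMAT)
MVN_COEF = NDIM * np.log(1. / np.sqrt(2. * np.pi)) + np.log(np.sqrt(np.linalg.det(INVCOV)))

def getLogFunc(point):
    """Return the natural logarithm of the MVN PDF at the input point."""
    normedPoint = MEAN - point
    return MVN_COEF - 0.5 * np.dot(normedPoint, np.matmul(INVCOV, normedPoint))

point = np.double([-9.5, 14.8, 20.2, 0.1])  # an arbitrary test point
ours = getLogFunc(point)
reference = multivariate_normal(mean=MEAN, cov=COVMAT).logpdf(point)
print(ours, reference)  # the two values should agree to machine precision
```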
We will sample random points from the objective function by calling the ParaDRAM sampler (Delayed-Rejection Adaptive Metropolis-Hastings Markov Chain Monte Carlo sampler) of the ParaMonte library.
To run the sampler in parallel, you will have to first save the MPI-enabled script as an external file. Visit the ParaMonte library's documentation website for more information, or see this parallel ParaDRAM simulation Jupyter notebook example.
import paramonte as pm
First, we will check which versions of the ParaMonte library you are using,
print( pm.version.kernel.get() )
print( pm.version.interface.get() )
ParaMonte Python Kernel Version 1.5.1
ParaMonte Python Interface Version 2.5.1
We can also dump the ParaMonte library kernel and interface versions,
print( pm.version.kernel.dump() )
print( pm.version.interface.dump() )
1.5.1
2.5.1
We can also verify the existence of the essential components of the current installation of the ParaMonte Python library on the system,
pm.verify()
ParaMonte - NOTE: (output identical to the library banner, dependency notes, and MPI setup notes printed by the import cell at the top of this notebook)
pmpd = pm.ParaDRAM()
The simplest scenario is to run the simulation with the default specifications, which are appropriately determined by the ParaDRAM sampler. However, for the purposes of clarity and reproducibility, we will specify a few simulation specs,
For a complete list of all ParaDRAM simulation specifications, visit the ParaDRAM simulation specifications webpage.
We will store all output files in a directory carrying this Jupyter Notebook's name, and give the files the same prefix.
By default, all ParaDRAM simulation output files are named appropriately (see this page for a review of the ParaDRAM simulation output files). In general, it is a good habit to specify an output file prefix for any ParaDRAM simulation. This way, restarting an incomplete simulation, should that become necessary for some reason, is seamless: simply rerun the simulation, with no changes of any sort to any part of the code or the simulation.
pmpd.spec.outputFileName = "./out/mvn_serial"
By default, the ParaMonte samplers do not overwrite existing simulation files, for obvious reasons. For this example, however, we will let the sampler overwrite the simulation output files if they already exist at the specified path,
pmpd.spec.overwriteRequested = True
We will also initialize the random seed to a fixed number to ensure the reproducibility of the simulation results in the future,
pmpd.spec.randomSeed = 3751
We will also specify a chainSize of 30000, the number of unique points to be sampled from the objective function. This number is much smaller than the sampler's default of 100,000 unique sample points.
pmpd.spec.chainSize = 30000
For the sake of illustration, we will also request an ASCII-format restart output file,
pmpd.spec.restartFileFormat = "ascii"
Note that none of the above specification settings is strictly necessary, but it is in general a good habit to define them prior to the simulation.
We now call the ParaDRAM sampler routine to run the sampling simulation,
# run the ParaDRAM sampler
pmpd.runSampler ( ndim = 4
, getLogFunc = getLogFunc
)
ParaDRAM - NOTE: Running the ParaDRAM sampler in serial mode...
ParaDRAM - NOTE: To run the ParaDRAM sampler in parallel mode visit:
ParaDRAM - NOTE:
ParaDRAM - NOTE:     https://www.cdslab.org/paramonte
ParaDRAM - NOTE:
ParaDRAM - NOTE: If you are using Jupyter notebook, check the Jupyter's
ParaDRAM - NOTE: terminal window for realtime simulation progress and report.
ParaDRAM - NOTE:
ParaDRAM - NOTE: To read the generated output files, try:
ParaDRAM - NOTE:
ParaDRAM - NOTE:     pmpd.readReport()      # to read the summary report from the output report file.
ParaDRAM - NOTE:     pmpd.readSample()      # to read the final i.i.d. sample from the output sample file.
ParaDRAM - NOTE:     pmpd.readChain()       # to read the uniquely-accepted points from the output chain file.
ParaDRAM - NOTE:     pmpd.readMarkovChain() # to read the Markov Chain. NOT recommended for very large chains.
ParaDRAM - NOTE:     pmpd.readRestart()     # to read the contents of an ASCII-format output restart file.
ParaDRAM - NOTE:     pmpd.readProgress()    # to read the contents of an output progress file.
ParaDRAM - NOTE:
ParaDRAM - NOTE: where you should replace `pmpd` with your ParaDRAM sampler's object name.
ParaDRAM - NOTE: For more information and examples on the usage, visit:
ParaDRAM - NOTE:
ParaDRAM - NOTE:     https://www.cdslab.org/paramonte
This will print realtime simulation progress information on your Anaconda prompt window (not inside your Jupyter notebook). Once the simulation is finished, the ParaDRAM routine generates 5 output files, each of which contains information about certain aspects of the simulation,
mvn_serial_process_1_report.txt
This file contains reports about all aspects of the simulation, from the very beginning of the simulation to the very last step, including the postprocessing of the results and final sample refinement.
mvn_serial_process_1_progress.txt
This file contains dynamic realtime information about the simulation progress. The frequency of progress reports to this file depends on the value of the simulation specification progressReportPeriod.
mvn_serial_process_1_sample.txt
This file contains the final fully-refined decorrelated i.i.d. random sample from the objective function.
mvn_serial_process_1_chain.txt
This file contains the Markov chain, generally in either binary or condensed (compact) ASCII format to improve the storage efficiency of the chain on your computer.
mvn_serial_process_1_restart.txt
This file contains the information needed to restart an interrupted simulation. We requested the human-readable ASCII format for it above via the restartFileFormat specification; by default, it is written in binary format.
Now, we can read the generated output sample, contained in the file suffixed with *_sample.txt.
pmpd.readSample() # generates pmpd.sampleList
sample = pmpd.sampleList[0]
ParaDRAM - WARNING: The ``delimiter`` is neither given as input to ``readSample()``
ParaDRAM - WARNING: nor set as a simulation specification of the ParaDRAM object.
ParaDRAM - WARNING: This information is essential, otherwise how could the output files be parsed?
ParaDRAM - WARNING: For now, the ParaDRAM sampler will assume a comma-separated
ParaDRAM - WARNING: file format for the contents of the sample file(s) to be parsed.

ParaDRAM - NOTE: 1 files detected matching the pattern: "./out/mvn_serial*_sample.txt"
ParaDRAM - NOTE: processing sample file: D:\Dropbox\Projects\20180101_ParaMonte\git\paramontex\Python\Jupyter\out\mvn_serial_process_1_sample.txt
ParaDRAM - NOTE: reading the file contents... done in 0.029001 seconds.
ParaDRAM - NOTE: ndim = 4, count = 8924
ParaDRAM - NOTE: parsing file contents...
ParaDRAM - NOTE: computing the sample correlation matrix, covariance matrix, and autocorrelations,
ParaDRAM - NOTE: and creating the associated graphics tools (heatmap, line, scatter, lineScatter,
ParaDRAM - NOTE: line3, scatter3, lineScatter3, jointplot, histplot, kdeplot1, kdeplot2,
ParaDRAM - NOTE: contour, contourf, contour3, and grid plot objects)... done.
ParaDRAM - NOTE:
ParaDRAM - NOTE: The processed sample files are now stored in the newly-created
ParaDRAM - NOTE: component `sampleList` of the ParaDRAM object as a Python list.
ParaDRAM - NOTE: For example, to access the contents of the first (or the only) sample file, try:
ParaDRAM - NOTE:
ParaDRAM - NOTE:     pmpd.sampleList[0].df
ParaDRAM - NOTE:
ParaDRAM - NOTE: where you will have to replace `pmpd` with your ParaDRAM instance name.
ParaDRAM - NOTE: To access the plotting tools, try:
ParaDRAM - NOTE:
ParaDRAM - NOTE:     pmpd.sampleList[0].plot.<PRESS TAB TO SEE THE LIST OF PLOTS>
ParaDRAM - NOTE:
ParaDRAM - NOTE: For example,
ParaDRAM - NOTE:
ParaDRAM - NOTE:     pmpd.sampleList[0].plot.line()         # to make 2D line plots.
ParaDRAM - NOTE:     pmpd.sampleList[0].plot.scatter()      # to make 2D scatter plots.
ParaDRAM - NOTE:     pmpd.sampleList[0].plot.lineScatter()  # to make 2D line-scatter plots.
ParaDRAM - NOTE:     pmpd.sampleList[0].plot.line3()        # to make 3D line plots.
ParaDRAM - NOTE:     pmpd.sampleList[0].plot.scatter3()     # to make 3D scatter plots.
ParaDRAM - NOTE:     pmpd.sampleList[0].plot.lineScatter3() # to make 3D line-scatter plots.
ParaDRAM - NOTE:     pmpd.sampleList[0].plot.contour()      # to make fast 2D kernel density plots.
ParaDRAM - NOTE:     pmpd.sampleList[0].plot.contourf()     # to make fast 2D kernel density filled contour plots.
ParaDRAM - NOTE:     pmpd.sampleList[0].plot.contour3()     # to make fast 3D kernel density contour plots.
ParaDRAM - NOTE:     pmpd.sampleList[0].plot.histplot()     # to make seaborn 1D distribution plots.
ParaDRAM - NOTE:     pmpd.sampleList[0].plot.kdeplot1()     # to make seaborn 1D kernel density plots.
ParaDRAM - NOTE:     pmpd.sampleList[0].plot.kdeplot2()     # to make seaborn 2D kernel density plots.
ParaDRAM - NOTE:     pmpd.sampleList[0].plot.grid()         # to make grid plots.
ParaDRAM - NOTE:
ParaDRAM - NOTE: To plot or inspect the variable autocorrelations or the correlation/covariance matrices, try:
ParaDRAM - NOTE:
ParaDRAM - NOTE:     pmpd.sampleList[0].stats.<PRESS TAB TO SEE THE LIST OF COMPONENTS>
ParaDRAM - NOTE:
ParaDRAM - NOTE: For more information and examples on the usage, visit:
ParaDRAM - NOTE:
ParaDRAM - NOTE:     https://www.cdslab.org/paramonte
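Since the warning above indicates that the sample file is plain comma-delimited text, it can also be parsed directly with pandas, bypassing readSample(). The sketch below uses a toy two-row stand-in with the same layout; for the real file, you would pass its path (e.g., the *_sample.txt file under the output directory) to pd.read_csv instead:

```python
import io
import pandas as pd

# A toy stand-in mimicking the layout of a ParaDRAM *_sample.txt file:
# a one-line header followed by comma-separated numeric rows.
toy_sample = (
    "SampleLogFunc,SampleVariable1,SampleVariable2\n"
    "-7.437893,-10.602866,13.914840\n"
    "-6.866939,-11.305132,14.502793\n"
)
df = pd.read_csv(io.StringIO(toy_sample), delimiter=",")
print(df.shape)  # (2, 3): two sample points, three columns
```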
By default, the sampler generates a highly refined, decorrelated final sample. Essentially, the ParaDRAM sampler keeps refining the Markov chain for as long as there is any residual autocorrelation in any of the individual variables of the sampled Markov chain. This is an extremely strong and conservative refinement strategy (and it is done in the ParaDRAM algorithm for a good reason: to ensure independent and identically distributed (i.i.d.) samples from the objective function).
This extreme refinement can be relaxed and controlled by the user, by setting the following ParaDRAM simulation specification variable prior to the simulation run,
pmpd.spec.sampleRefinementCount = 1
If you rerun the simulation with this specification set, you will notice roughly a three-fold increase in the final sample size. Note that this specification must be set before running the simulation above if you really want to use it (also, do not forget to specify a new output file prefix for the new simulation, because simulation files cannot be overwritten unless overwriting is explicitly requested). A value of 1 will cause the ParaDRAM sampler to refine the Markov chain for only one round, which is what most people in the MCMC community do.
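To see why refinement matters, here is a small illustrative sketch (our addition, not using ParaMonte) of how thinning reduces autocorrelation in a synthetic AR(1) chain that mimics raw MCMC output:

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1D series."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

rng = np.random.default_rng(3751)  # reusing the seed value chosen above, for flavor

# AR(1) chain: each point strongly remembers the previous one,
# much like the raw output of a Metropolis-type sampler.
n, rho = 100000, 0.95
chain = np.empty(n)
chain[0] = 0.0
for i in range(1, n):
    chain[i] = rho * chain[i - 1] + rng.standard_normal()

thinned = chain[::50]  # keep every 50th point, mimicking one strong refinement pass

print(lag1_autocorr(chain))    # close to 0.95: highly correlated
print(lag1_autocorr(thinned))  # much smaller: approximately i.i.d.
```

ParaDRAM's refinement is adaptive rather than a fixed stride, but the trade-off it controls is the same: fewer, less correlated points versus more, correlated ones.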
To quickly visualize the generated sample as a histogram, try,
sample.plot.histplot()
ParaDRAM - NOTE: making the histplot plot...
done in 0.241997 seconds.
How about adding a target value to this histogram plot? By default, if no value is specified for the target object, it will add the state corresponding to the mode of the objective function to the plot, as inferred from the sample file,
sample.plot.histplot.reset()
sample.plot.histplot()
sample.plot.histplot.currentFig.axes.set_xlabel("Standard Gaussian Random Value")
sample.plot.histplot.target() # add a line corresponding to the maxLogFunc (mode) of the sampled points.
ParaDRAM - NOTE: making the histplot plot...
done in 0.121004 seconds.
However, assigning other values is also possible. For example, suppose we want to add the mean and 1-sigma limits to the plot, instead of the single mode,
sample.plot.histplot()
sample.plot.histplot.currentFig.axes.set_xlabel("Standard Gaussian Random Value")
avg = sample.df.SampleVariable1.mean()
sample.plot.histplot.target( value = avg )
std = sample.df.SampleVariable1.std()
sample.plot.histplot.target.axvline.kws.linestyle = "--"
sample.plot.histplot.target( value = avg - std )
sample.plot.histplot.target( value = avg + std )
ParaDRAM - NOTE: making the histplot plot...
done in 0.148 seconds.
We can also plot all of the variables at the same time in a single figure,
sample.df.head()
   SampleLogFunc  SampleVariable1  SampleVariable2  SampleVariable3  SampleVariable4
0      -7.437893       -10.602866        13.914840        21.541231         1.239213
1      -7.437893       -10.602866        13.914840        21.541231         1.239213
2      -6.866939       -11.305132        14.502793        22.145923         1.054045
3      -4.833832       -11.356071        13.499396        20.976090         1.235661
4      -4.833832       -11.356071        13.499396        20.976090         1.235661
sample.plot.histplot( xcolumns = [1,2,3,4] )
ParaDRAM - NOTE: making the histplot plot...
done in 0.259001 seconds.
Perhaps we would also want to add a legend and change the x-axis title,
sample.plot.histplot.reset("hard") # resetting to the default is not necessary, but does not hurt either.
sample.plot.histplot.legend.enabled = True
sample.plot.histplot( xcolumns = [1,2,3,4] )
sample.plot.histplot.currentFig.axes.set_xlabel("Multivariate Normal Variables")
ParaDRAM - NOTE: creating a histplot plot object from scratch... done in 0.0 seconds.
ParaDRAM - NOTE: making the histplot plot...
done in 0.245996 seconds.
Text(0.5, 31.07499999999998, 'Multivariate Normal Variables')
To make a trace-plot of the sample, try,
sample.plot.line()
ParaDRAM - NOTE: making the line plot...
done in 0.246002 seconds.
To change the scale of the x-axis, try,
sample.plot.line()
sample.plot.line.currentFig.axes.set_xscale("log")
ParaDRAM - NOTE: making the line plot...
done in 0.282002 seconds.
By default, the color of the line in the trace-plot represents the value returned by getLogFunc() at each sampled point.
sample.plot.line.ccolumns
'SampleLogFunc'
To turn the color off, we can instead try,
sample.plot.line.ccolumns = None
sample.plot.line()
sample.plot.line.currentFig.axes.set_xscale("log")
ParaDRAM - NOTE: making the line plot...
done in 0.109003 seconds.
There are many other properties of the plot that can be set or modified via the attributes of the pmpd.sampleList[0].plot.line object. To see them all, uncomment the following line to display the object's documentation,
# sample.plot.line.helpme()
We could also plot all of the variables in the same plot,
sample.plot.line.reset() # reset to the default settings
sample.plot.line.ccolumns = None # turn off the color-mapping
sample.plot.line.ycolumns = pmpd.sampleList[0].df.columns[1:] # exclude the SampleLogFunc column
sample.plot.line()
sample.plot.line.currentFig.axes.set_xscale("log")
ParaDRAM - NOTE: making the line plot...
done in 0.138 seconds.
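For comparison, a similar trace-plot can be produced with plain matplotlib. The sketch below is our addition: it uses synthetic data rather than the ParaMonte sample, and the headless Agg backend so that it runs outside a notebook:

```python
import os
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display window needed
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
values = rng.standard_normal(1000).cumsum()  # synthetic stand-in for a sampled variable

fig, ax = plt.subplots()
ax.plot(np.arange(1, len(values) + 1), values, lw=0.5)
ax.set_xscale("log")  # same log x-axis as in the ParaMonte example above
ax.set_xlabel("Sample Index")
ax.set_ylabel("SampleVariable1")
fig.savefig("trace.png")

saved = os.path.exists("trace.png")  # confirm the figure file was written
print(saved)
```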
We could also make scatter or line-scatter trace-plots of the sampled points. The ParaMonte library has visualization tools for this purpose with some nice predefined settings that can be changed by the user. For example, let's make scatter and line-scatter trace-plots of the first sampled variable,
sample.plot.scatter()
ParaDRAM - NOTE: making the scatter plot...
done in 0.180998 seconds.
sample.plot.lineScatter()
sample.plot.lineScatter.currentFig.axes.set_xscale("log")
ParaDRAM - NOTE: making the lineScatter plot...
done in 0.188965 seconds.
Setting or modifying the properties of a scatter or lineScatter plot is identical to the line plot. By default, the sampler makes a scatter plot of only the first sampled variable as a function of the order by which the points appear in the sample file. Alternatively, we may want to plot individual variables against each other. For example, to make scatter plots of all variables against the logFunc value,
sample.plot.scatter.reset()
sample.plot.scatter.ycolumns = "SampleLogFunc"
for variable in pmpd.sampleList[0].df.columns[1:]: # exclude the SampleLogFunc column
    sample.plot.scatter.xcolumns = variable
    sample.plot.scatter()
ParaDRAM - NOTE: making the scatter plot...
done in 0.17597 seconds.
ParaDRAM - NOTE: making the scatter plot...
done in 0.167 seconds.
ParaDRAM - NOTE: making the scatter plot...
done in 0.199006 seconds.
ParaDRAM - NOTE: making the scatter plot...
done in 0.197 seconds.
To make bivariate plots of sampled variables against each other, instead of trace-plots, simply specify the columns of data to plot,
sample.plot.lineScatter( xcolumns = 1, ycolumns = 2)
ParaDRAM - NOTE: making the lineScatter plot...
done in 0.188998 seconds.
To make 3D plots of the data, we can try the ParaMonte 3D visualization tools: line3(), scatter3(), or lineScatter3(),
sample.plot.lineScatter3()
ParaDRAM - NOTE: making the lineScatter3 plot...
done in 0.184002 seconds.
To make kernel density contour plots of the sampled points, we can try,
sample.plot.contour()
ParaDRAM - NOTE: making the contour plot...
done in 0.555005 seconds.
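Such contour plots are level sets of a 2D kernel density estimate. The sketch below illustrates the general technique with scipy.stats.gaussian_kde on synthetic correlated data; it is our illustration of the idea, not ParaMonte's internal implementation:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Synthetic correlated 2D sample, standing in for two sampled variables.
data = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=2000).T

kde = gaussian_kde(data)  # fit a 2D Gaussian kernel density estimate; expects shape (ndim, n)

# Evaluate the density on a grid; a contour plot draws level sets of `density`.
xg, yg = np.meshgrid(np.linspace(-3, 3, 50), np.linspace(-3, 3, 50))
density = kde(np.vstack([xg.ravel(), yg.ravel()])).reshape(xg.shape)
print(density.shape)  # (50, 50)
```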
To make kernel density filled contour plots of the sampled points, we can try,
sample.plot.contourf()
ParaDRAM - NOTE: making the contourf plot...
done in 0.627004 seconds.
Let's also add the mode of the distribution as a target to the plot,
sample.plot.contourf()
sample.plot.contourf.target()
ParaDRAM - NOTE: making the contourf plot...
done in 0.615002 seconds.
sample.plot.contourf.reset()
sample.plot.contourf()
sample.plot.contourf.target()
ParaDRAM - NOTE: making the contourf plot...
C:\Users\shahmoradia\AppData\Roaming\Python\Python37\site-packages\paramonte\vis\_BasePlot.py:634: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`). self.currentFig.figure = plt.figure( **vars(self.figure.kws) )
done in 0.630998 seconds.
We could add as many targets as desired. For example, let's add the point corresponding to the maximum of each of the two variables in the above plot, and give it a purple color,
max1 = sample.df.SampleVariable1.max()
max2 = sample.df.SampleVariable2.max()
sample.plot.contourf.target.axvline.kws.linestyle = "--"
sample.plot.contourf.target.axhline.kws.linestyle = "--"
sample.plot.contourf.target.axvline.kws.color = "purple"
sample.plot.contourf.target.axhline.kws.color = "purple"
sample.plot.contourf.target.scatter.kws.color = "purple"
sample.plot.contourf.target( value = [max1, max2] )
To make 3D kernel density contour plots of the sampled points, we can try,
sample.plot.contour3()
ParaDRAM - NOTE: making the contour3 plot...
done in 0.503999 seconds.
By default, the kernel density plot displays SampleVariable2 vs. SampleVariable1, if no other specifications are made by the user.
There are also the kdeplot1() and kdeplot2() visualization tools, which call the kdeplot() function of the seaborn library. However, those function implementations in seaborn are currently computationally too slow, so we will not display them here, though you can try them yourself.
There is also the ParaMonte jointplot() visualization tool, which implicitly calls the jointplot() function of the seaborn library.
sample.plot.jointplot(xcolumns = 3, ycolumns = 4)
ParaDRAM - NOTE: making the jointplot plot...
done in 8.539051 seconds.
We can also make a grid plot of all of the variables against each other via,
sample.plot.grid()
ParaDRAM - NOTE: making the grid plot...
generating subplot #1: (0,0) out of 16... done in 0.052 seconds. generating subplot #2: (0,1) out of 16... done in 0.26203 seconds. generating subplot #3: (0,2) out of 16... done in 0.284002 seconds. generating subplot #4: (0,3) out of 16... done in 0.279003 seconds. generating subplot #5: (1,0) out of 16... done in 0.522006 seconds. generating subplot #6: (1,1) out of 16... done in 0.052 seconds. generating subplot #7: (1,2) out of 16... done in 0.261003 seconds. generating subplot #8: (1,3) out of 16... done in 0.255001 seconds. generating subplot #9: (2,0) out of 16... done in 0.465004 seconds. generating subplot #10: (2,1) out of 16... done in 0.458004 seconds. generating subplot #11: (2,2) out of 16... done in 0.052 seconds. generating subplot #12: (2,3) out of 16... done in 0.258001 seconds. generating subplot #13: (3,0) out of 16... done in 0.476004 seconds. generating subplot #14: (3,1) out of 16... done in 0.451004 seconds. generating subplot #15: (3,2) out of 16... done in 0.439005 seconds. generating subplot #16: (3,3) out of 16... done in 0.052 seconds. generating colorbar... done in 0.029002 seconds.
Since the grid plot is rather computationally demanding, by default it includes only up to 5 columns of data,
sample.plot.grid.columns
Index(['SampleLogFunc', 'SampleVariable1', 'SampleVariable2', 'SampleVariable3'], dtype='object')
This behavior can be overridden by assigning more or fewer data columns to the columns
attribute of the grid plot, like,
sample.plot.grid.columns = [1,2,3] # indices and column names can be mixed
sample.plot.grid.plotType.lower.value = "contourf"
sample.plot.grid()
ParaDRAM - NOTE: making the grid plot...
generating subplot #1: (0,0) out of 9... done in 0.092999 seconds. generating subplot #2: (0,1) out of 9... done in 0.251002 seconds. generating subplot #3: (0,2) out of 9... done in 0.223998 seconds. generating subplot #4: (1,0) out of 9... done in 0.662004 seconds. generating subplot #5: (1,1) out of 9... done in 0.092999 seconds. generating subplot #6: (1,2) out of 9... done in 0.225978 seconds. generating subplot #7: (2,0) out of 9... done in 0.648001 seconds. generating subplot #8: (2,1) out of 9... done in 0.683009 seconds. generating subplot #9: (2,2) out of 9... done in 0.092999 seconds. generating colorbar... done in 0.039001 seconds.
If one corner of the plot is not needed, we can simply hide it by disabling it,
sample.plot.grid.columns = [1,2,3] # indices and column names can be mixed
sample.plot.grid.plotType.lower.value = "contourf"
sample.plot.grid.plotType.upper.enabled = False # disable the upper triangle
sample.plot.grid()
ParaDRAM - NOTE: making the grid plot...
generating subplot #1: (0,0) out of 9... done in 0.059667 seconds. generating subplot #2: (0,1) out of 9... done in 0.0 seconds. generating subplot #3: (0,2) out of 9... done in 0.0 seconds. generating subplot #4: (1,0) out of 9... done in 0.671005 seconds. generating subplot #5: (1,1) out of 9... done in 0.059667 seconds. generating subplot #6: (1,2) out of 9... done in 0.0 seconds. generating subplot #7: (2,0) out of 9... done in 0.596005 seconds. generating subplot #8: (2,1) out of 9... done in 0.651004 seconds. generating subplot #9: (2,2) out of 9... done in 0.059667 seconds. generating colorbar... done in 0.062001 seconds.
You can also hide the upper triangle, the lower triangle, the colorbar, or all three within the current figure by using the "hide" method of the grid object,
sample.plot.grid.reset()
sample.plot.grid()
sample.plot.grid.hide('lower')
ParaDRAM - NOTE: making the grid plot...
generating subplot #1: (0,0) out of 16... done in 0.124751 seconds. generating subplot #2: (0,1) out of 16... done in 0.481006 seconds. generating subplot #3: (0,2) out of 16... done in 0.466999 seconds. generating subplot #4: (0,3) out of 16... done in 0.456004 seconds. generating subplot #5: (1,0) out of 16... done in 0.733005 seconds. generating subplot #6: (1,1) out of 16... done in 0.124751 seconds. generating subplot #7: (1,2) out of 16... done in 0.458003 seconds. generating subplot #8: (1,3) out of 16... done in 0.334 seconds. generating subplot #9: (2,0) out of 16... done in 0.523005 seconds. generating subplot #10: (2,1) out of 16... done in 0.513001 seconds. generating subplot #11: (2,2) out of 16... done in 0.124751 seconds. generating subplot #12: (2,3) out of 16... done in 0.325005 seconds. generating subplot #13: (3,0) out of 16... done in 0.500002 seconds. generating subplot #14: (3,1) out of 16... done in 0.502006 seconds. generating subplot #15: (3,2) out of 16... done in 0.478026 seconds. generating subplot #16: (3,3) out of 16... done in 0.124751 seconds. generating colorbar... done in 0.024997 seconds.
To show the triangle again, you can call the show()
method. To see what input options are available with hide()
or show()
, get help like the following,
sample.plot.grid.helpme("hide")
Hides the requested part of the grid plot.

**Parameters**

part
    A string with the following possible values:
    "lower" hides the lower triangle of the grid plot.
    "upper" hides the upper triangle of the grid plot.
    "diag" hides the diagonal of the grid plot.
    "all" hides all grid plots and the colorbar.
    "colorbar" hides the colorbar of the grid plot.
    The string can also be a mix of the above keywords, separated by the ``+`` sign or some other delimiter. For example, ``"lower+upper+colorbar"``

**Returns**

None
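The keyword-mixing behavior described in the help text above (e.g. ``"lower+upper+colorbar"``) can be mimicked with a few lines of standard Python. This is an illustrative sketch only, not ParaMonte's internal implementation; the `parse_parts` function and `VALID_PARTS` set are hypothetical names.

```python
import re

# Hypothetical re-implementation of the documented keyword parsing.
VALID_PARTS = {"lower", "upper", "diag", "colorbar", "all"}

def parse_parts(part):
    """Split a part specification such as "lower+upper+colorbar" on any
    non-alphabetic delimiter and keep only the recognized keywords."""
    tokens = {t.lower() for t in re.split(r"[^A-Za-z]+", part) if t}
    unknown = tokens - VALID_PARTS
    if unknown:
        raise ValueError("unrecognized part keyword(s): {}".format(unknown))
    if "all" in tokens:  # "all" expands to every component of the grid plot
        return {"lower", "upper", "diag", "colorbar"}
    return tokens

print(parse_parts("lower+upper+colorbar"))
print(parse_parts("diag, colorbar"))
```

Any non-alphabetic delimiter works, which matches the "``+`` sign or some other delimiter" wording above.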
We can more formally verify the i.i.d. property of the resulting sample by computing and visualizing the autocorrelation of the sampled points. Whenever a ParaDRAM output sample file is read, the sampler automatically computes the autocorrelation of the sampled points along each dimension of the domain of the objective function.
sample.stats.autocorr()
sample.stats.autocorr.plot.line()
ParaDRAM - NOTE: adding the autocrrelation graphics tools... ParaDRAM - NOTE: creating a line plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a scatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a lineScatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: making the line plot...
done in 0.399004 seconds.
The above autocorrelation plot is reassuring, since the sampled points do not appear to be correlated with each other at all. This is because the ParaDRAM routine, by default, applies as many rounds of aggressive refinement to the raw (verbose) Markov chain as it deems necessary to remove any residual correlations from the final refined sample that we have plotted above.
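To see concretely what this diagnostic measures, here is a minimal, ParaMonte-independent NumPy sketch that estimates the normalized autocorrelation function of an i.i.d. sample (an analogue of the refined sample) and of a strongly correlated AR(1) series (an analogue of the raw Markov chain). The `autocorr` helper is hypothetical, written only for this illustration.

```python
import numpy as np

def autocorr(x, maxlag):
    """Normalized sample autocorrelation of a 1D series up to maxlag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    var = np.dot(x, x) / n
    return np.array([np.dot(x[:n - k], x[k:]) / (n * var)
                     for k in range(maxlag + 1)])

rng = np.random.default_rng(0)

# i.i.d. draws: an analogue of the refined, decorrelated sample.
iid = rng.standard_normal(50_000)

# AR(1) series with coefficient 0.9: an analogue of the raw Markov chain.
ar = np.empty(50_000)
ar[0] = 0.0
for t in range(1, len(ar)):
    ar[t] = 0.9 * ar[t - 1] + rng.standard_normal()

print(autocorr(iid, 3))  # ~[1, 0, 0, 0]: no correlation beyond lag 0
print(autocorr(ar, 3))   # ~[1, 0.9, 0.81, 0.73]: strong residual correlation
```

A well-refined sample should behave like the first series: autocorrelations statistically indistinguishable from zero at all nonzero lags.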
We can compare the autocorrelation of the refined sample in the above with the autocorrelation of the raw (verbose) Markov chain,
markovChainList = pmpd.readMarkovChain(renabled = True) # object return enabled.
markovChain = markovChainList[0]
ParaDRAM - WARNING: The ``delimiter`` is neither given as input to ``readMarkovchain()`` ParaDRAM - WARNING: nor set as a simulation specification of the ParaDRAM object. ParaDRAM - WARNING: This information is essential, otherwise how could the output files be parsed? ParaDRAM - WARNING: For now, the ParaDRAM sampler will assume a comma-separated ParaDRAM - WARNING: file format for the contents of the chain file(s) to be parsed. ParaDRAM - NOTE: 1 files detected matching the pattern: "./out/mvn_serial*_chain.txt" ParaDRAM - NOTE: processing chain file: D:\Dropbox\Projects\20180101_ParaMonte\git\paramontex\Python\Jupyter\out\mvn_serial_process_1_chain.txt ParaDRAM - NOTE: reading the file contents... done in 5.492035 seconds. ParaDRAM - NOTE: ndim = 4, count = 101916 ParaDRAM - NOTE: parsing file contents... ParaDRAM - NOTE: computing the sample correlation matrix... ParaDRAM - NOTE: adding the correlation graphics tools... ParaDRAM - NOTE: creating a heatmap plot object from scratch... done in 0.023969 seconds. ParaDRAM - NOTE: computing the sample covariance matrix... ParaDRAM - NOTE: adding the covariance graphics tools... ParaDRAM - NOTE: creating a heatmap plot object from scratch... done in 0.021998 seconds. ParaDRAM - NOTE: computing the sample autocorrelations... ParaDRAM - NOTE: adding the autocrrelation graphics tools... ParaDRAM - NOTE: creating a line plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a scatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a lineScatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: adding the graphics tools... ParaDRAM - NOTE: creating a line plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a scatter plot object from scratch... done in 0.001 seconds. ParaDRAM - NOTE: creating a lineScatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a line3 plot object from scratch... done in 0.0 seconds. 
ParaDRAM - NOTE: creating a scatter3 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a lineScatter3 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a jointplot plot object from scratch... done in 0.001003 seconds. ParaDRAM - NOTE: creating a histplot plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a kdeplot1 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a kdeplot2 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a contour3 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a contourf plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a contour plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a grid plot object from scratch... done in 0.000994 seconds. ParaDRAM - NOTE: The processed markovChain files are now stored in the output variable as a ParaDRAM - NOTE: Python list. For example, to access the contents of the first (or the only) ParaDRAM - NOTE: markovChain file stored in an output variable named markovChainList, try: ParaDRAM - NOTE: ParaDRAM - NOTE: markovChainList[0].df ParaDRAM - NOTE: ParaDRAM - NOTE: where you will have to replace `pmpd` with your ParaDRAM instance name. ParaDRAM - NOTE: To access the plotting tools, try: ParaDRAM - NOTE: ParaDRAM - NOTE: markovChainList[0].plot.<PRESS TAB TO SEE THE LIST OF PLOTS> ParaDRAM - NOTE: ParaDRAM - NOTE: For example, ParaDRAM - NOTE: ParaDRAM - NOTE: markovChainList[0].plot.line() # to make 2D line plots. ParaDRAM - NOTE: markovChainList[0].plot.scatter() # to make 2D scatter plots. ParaDRAM - NOTE: markovChainList[0].plot.lineScatter() # to make 2D line-scatter plots. ParaDRAM - NOTE: markovChainList[0].plot.line3() # to make 3D line plots. ParaDRAM - NOTE: markovChainList[0].plot.scatter3() # to make 3D scatter plots. 
ParaDRAM - NOTE: markovChainList[0].plot.lineScatter3() # to make 3D line-scatter plots. ParaDRAM - NOTE: markovChainList[0].plot.contour() # to make fast 2D kernel density plots. ParaDRAM - NOTE: markovChainList[0].plot.contourf() # to make fast 2D kernel density filled contour plots. ParaDRAM - NOTE: markovChainList[0].plot.contour3() # to make fast 3D kernel density contour plots. ParaDRAM - NOTE: markovChainList[0].plot.histplot() # to make seaborn 1D distribution plots. ParaDRAM - NOTE: markovChainList[0].plot.kdeplot1() # to make seaborn 1D kernel density plots. ParaDRAM - NOTE: markovChainList[0].plot.kdeplot2() # to make seaborn 2D kernel density plots. ParaDRAM - NOTE: markovChainList[0].plot.grid() # to make GridPlot ParaDRAM - NOTE: ParaDRAM - NOTE: To plot or inspect the variable autocorrelations or the correlation/covariance matrices, try: ParaDRAM - NOTE: ParaDRAM - NOTE: markovChainList[0].stats.<PRESS TAB TO SEE THE LIST OF COMPONENTS> ParaDRAM - NOTE: ParaDRAM - NOTE: For more information and examples on the usage, visit: ParaDRAM - NOTE: ParaDRAM - NOTE: https://www.cdslab.org/paramonte
markovChain.stats.autocorr()
markovChain.stats.autocorr.plot.line()
ParaDRAM - NOTE: adding the autocrrelation graphics tools... ParaDRAM - NOTE: creating a line plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a scatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a lineScatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: making the line plot...
done in 0.704007 seconds.
Compare the above plot to the same plot that we obtained for the refined sample earlier. Unlike the refined sample, the Markov chain is significantly correlated with itself along each dimension. The large autocorrelation seen for SampleLogFunc
is due to the fact that we started the MCMC sampling from a very bad low-probability location, which is also visible in the grid plots above.
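The burn-in transient caused by a poor starting point can be reproduced with a toy random-walk Metropolis sampler written in plain NumPy. This is not the ParaDRAM algorithm (no delayed rejection, no adaptation), just a minimal sketch targeting a one-dimensional standard Gaussian from a deliberately bad start.

```python
import numpy as np

def metropolis(logf, start, nstep, step=1.0, seed=0):
    """Minimal random-walk Metropolis sampler for a 1D log-density."""
    rng = np.random.default_rng(seed)
    x = np.empty(nstep)
    x[0] = start
    lf = logf(start)
    for i in range(1, nstep):
        prop = x[i - 1] + step * rng.standard_normal()
        lfp = logf(prop)
        if np.log(rng.random()) < lfp - lf:  # Metropolis accept/reject
            x[i], lf = prop, lfp
        else:
            x[i] = x[i - 1]
    return x

# Standard normal target, started far out in the tail at x = 50.
chain = metropolis(lambda x: -0.5 * x**2, start=50.0, nstep=20_000)

print(chain[:5].mean())      # early steps: still near the bad start
print(chain[5_000:].mean())  # after burn-in: close to the true mean, 0
```

The early portion of such a chain is heavily autocorrelated as it drifts toward the high-probability region, exactly the behavior seen for SampleLogFunc above.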
To get the statistics of the maximum of the function, try,
print( "maxLogFunc: {}".format(sample.stats.maxLogFunc.value) )
print( "The location of maxLogFunc: {}".format(sample.stats.maxLogFunc.state.values) )
maxLogFunc: -2.5858423
The location of maxLogFunc: [-10.083953 14.828449 20.037333 0.12419514]
which is again reassuring, since the maximum of a multivariate Gaussian density occurs at its mean, and the ParaDRAM sampler's estimated location of maxLogFunc
above is indeed very close to the known mean of the target distribution.
Further statistics of the resulting sample and the raw verbose Markov chain can be computed from the DataFrame, for example via,
sample.df.describe()
|       | SampleLogFunc | SampleVariable1 | SampleVariable2 | SampleVariable3 | SampleVariable4 |
|-------|---------------|-----------------|-----------------|-----------------|-----------------|
| count | 8924.000000 | 8924.000000 | 8924.000000 | 8924.000000 | 8924.000000 |
| mean | -4.570621 | -10.013841 | 14.988699 | 19.994788 | -0.006943 |
| std | 1.400108 | 0.999499 | 1.007615 | 1.001679 | 1.006100 |
| min | -13.263951 | -13.558018 | 10.992699 | 16.534873 | -4.150198 |
| 25% | -5.271783 | -10.703673 | 14.300603 | 19.330915 | -0.674777 |
| 50% | -4.251049 | -10.035884 | 14.987852 | 20.001485 | -0.009479 |
| 75% | -3.519423 | -9.338450 | 15.684947 | 20.655977 | 0.664621 |
| max | -2.585842 | -6.221306 | 18.829987 | 23.608027 | 3.510943 |
Some statistics are also automatically computed and reported in the output report file of every ParaDRAM simulation (which ends with the suffix *_report.txt
). One can either directly inspect the contents of this file, or use the readReport()
function of the ParaDRAM sampler object, like,
reportList = pmpd.readReport(renabled = True) # object return enabled
report = reportList[0]
ParaDRAM - NOTE: 1 files detected matching the pattern: "./out/mvn_serial*_report.txt" ParaDRAM - NOTE: processing report file: D:\Dropbox\Projects\20180101_ParaMonte\git\paramontex\Python\Jupyter\out\mvn_serial_process_1_report.txt ParaDRAM - NOTE: reading the file contents... ParaDRAM - NOTE: parsing the report file contents... done in 0.012006 seconds. ParaDRAM - NOTE: The processed report files are now stored in the output variable as a ParaDRAM - NOTE: Python list. For example, to access the contents of the first (or the only) ParaDRAM - NOTE: report file stored in an output variable named reportList, try: ParaDRAM - NOTE: ParaDRAM - NOTE: reportList[0].contents.print() ParaDRAM - NOTE: ParaDRAM - NOTE: where you will have to replace `pmpd` with your ParaDRAM instance name. ParaDRAM - NOTE: To access the simulation statistics and information, examine the contents of the ParaDRAM - NOTE: components of the following structures: ParaDRAM - NOTE: ParaDRAM - NOTE: reportList[0].contents.print() # to print the contents of the report file. ParaDRAM - NOTE: reportList[0].setup # to get information about the simulation setup. ParaDRAM - NOTE: reportList[0].stats.time # to get the timing information of the simulation. ParaDRAM - NOTE: reportList[0].stats.chain # to get the statistics of the simulation output sample. ParaDRAM - NOTE: reportList[0].stats.numFuncCall # to get information about the number of function calls. ParaDRAM - NOTE: reportList[0].stats.parallelism # to get information about the simulation parallelism. ParaDRAM - NOTE: reportList[0].spec # to get the simulation specification in the report file. ParaDRAM - NOTE: ParaDRAM - NOTE: For more information and examples on the usage, visit: ParaDRAM - NOTE: ParaDRAM - NOTE: https://www.cdslab.org/paramonte
print(report.stats.chain.refined.ess.value)
print(report.stats.chain.refined.ess.description)
8924
This is the estimated Effective (decorrelated) Sample Size (ESS) of the final refined chain.
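The ESS reported above can be understood via the standard estimator n / (1 + 2 Σ ρ_k), where ρ_k are the lag-k autocorrelations. The following ParaMonte-independent sketch (using a simple initial-positive-sequence truncation; the `ess` helper is a hypothetical illustration, not ParaMonte's estimator) shows that an i.i.d. sample keeps nearly its full size while a correlated AR(1) chain loses most of it:

```python
import numpy as np

def ess(x, maxlag=200):
    """Effective sample size: n / (1 + 2 * sum of positive lag-k
    autocorrelations), truncated at the first non-positive lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    var = np.dot(x, x) / n
    tau = 1.0
    for k in range(1, maxlag + 1):
        rho = np.dot(x[:n - k], x[k:]) / (n * var)
        if rho <= 0:
            break
        tau += 2.0 * rho
    return n / tau

rng = np.random.default_rng(1)
iid = rng.standard_normal(20_000)        # decorrelated sample analogue
ar = np.empty(20_000)                    # correlated chain analogue
ar[0] = 0.0
for t in range(1, len(ar)):
    ar[t] = 0.9 * ar[t - 1] + rng.standard_normal()

print(round(ess(iid)))  # close to the full 20000 draws
print(round(ess(ar)))   # only a small fraction of the 20000 draws
```

For a stationary AR(1) chain with coefficient 0.9, the theoretical ESS is n(1 - 0.9)/(1 + 0.9) ≈ n/19, i.e. roughly 1,000 effective samples out of 20,000 draws.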
Now, to see the effects of setting the starting point of the MCMC sampler, we will take a look at the full chain (in compact form) of uniquely sampled points from the objective function,
pmpd.readChain()
chain = pmpd.chainList[0]
ParaDRAM - WARNING: The ``delimiter`` is neither given as input to ``readChain()`` ParaDRAM - WARNING: nor set as a simulation specification of the ParaDRAM object. ParaDRAM - WARNING: This information is essential, otherwise how could the output files be parsed? ParaDRAM - WARNING: For now, the ParaDRAM sampler will assume a comma-separated ParaDRAM - WARNING: file format for the contents of the chain file(s) to be parsed. ParaDRAM - NOTE: 1 files detected matching the pattern: "./out/mvn_serial*_chain.txt" ParaDRAM - NOTE: processing chain file: D:\Dropbox\Projects\20180101_ParaMonte\git\paramontex\Python\Jupyter\out\mvn_serial_process_1_chain.txt ParaDRAM - NOTE: reading the file contents... done in 0.069999 seconds. ParaDRAM - NOTE: ndim = 4, count = 30000 ParaDRAM - NOTE: parsing file contents... ParaDRAM - NOTE: computing the sample correlation matrix... ParaDRAM - NOTE: adding the correlation graphics tools... ParaDRAM - NOTE: creating a heatmap plot object from scratch... done in 0.019999 seconds. ParaDRAM - NOTE: computing the sample covariance matrix... ParaDRAM - NOTE: adding the covariance graphics tools... ParaDRAM - NOTE: creating a heatmap plot object from scratch... done in 0.018 seconds. ParaDRAM - NOTE: computing the sample autocorrelations... ParaDRAM - NOTE: adding the autocrrelation graphics tools... ParaDRAM - NOTE: creating a line plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a scatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a lineScatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: adding the graphics tools... ParaDRAM - NOTE: creating a line plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a scatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a lineScatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a line3 plot object from scratch... done in 0.0 seconds. 
ParaDRAM - NOTE: creating a scatter3 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a lineScatter3 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a jointplot plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a histplot plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a kdeplot1 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a kdeplot2 plot object from scratch... done in 0.001 seconds. ParaDRAM - NOTE: creating a contour3 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a contourf plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a contour plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a grid plot object from scratch... done in 0.000998 seconds. ParaDRAM - NOTE: The processed chain files are now stored in the newly-created ParaDRAM - NOTE: component `chainList` of the ParaDRAM object as a Python list. ParaDRAM - NOTE: For example, to access the contents of the first (or the only) chain file, try: ParaDRAM - NOTE: ParaDRAM - NOTE: pmpd.chainList[0].df ParaDRAM - NOTE: ParaDRAM - NOTE: where you will have to replace `pmpd` with your ParaDRAM instance name. ParaDRAM - NOTE: To access the plotting tools, try: ParaDRAM - NOTE: ParaDRAM - NOTE: pmpd.chainList[0].plot.<PRESS TAB TO SEE THE LIST OF PLOTS> ParaDRAM - NOTE: ParaDRAM - NOTE: For example, ParaDRAM - NOTE: ParaDRAM - NOTE: pmpd.chainList[0].plot.line() # to make 2D line plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.scatter() # to make 2D scatter plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.lineScatter() # to make 2D line-scatter plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.line3() # to make 3D line plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.scatter3() # to make 3D scatter plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.lineScatter3() # to make 3D line-scatter plots. 
ParaDRAM - NOTE: pmpd.chainList[0].plot.contour() # to make fast 2D kernel density plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.contourf() # to make fast 2D kernel density filled contour plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.contour3() # to make fast 3D kernel density contour plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.histplot() # to make seaborn 1D distribution plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.kdeplot1() # to make seaborn 1D kernel density plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.kdeplot2() # to make seaborn 2D kernel density plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.grid() # to make GridPlot ParaDRAM - NOTE: ParaDRAM - NOTE: To plot or inspect the variable autocorrelations or the correlation/covariance matrices, try: ParaDRAM - NOTE: ParaDRAM - NOTE: pmpd.chainList[0].stats.<PRESS TAB TO SEE THE LIST OF COMPONENTS> ParaDRAM - NOTE: ParaDRAM - NOTE: For more information and examples on the usage, visit: ParaDRAM - NOTE: ParaDRAM - NOTE: https://www.cdslab.org/paramonte
Just as with the refined sample, we can also make the same types of plots for the compact and verbose (Markov) chains, for example, a grid plot,
chain.df.head()
|   | ProcessID | DelayedRejectionStage | MeanAcceptanceRate | AdaptationMeasure | BurninLocation | SampleWeight | SampleLogFunc | SampleVariable1 | SampleVariable2 | SampleVariable3 | SampleVariable4 |
|---|-----------|-----------------------|--------------------|-------------------|----------------|--------------|---------------|-----------------|-----------------|-----------------|-----------------|
| 0 | 1 | 0 | 1.000000 | 0.0 | 1 | 1 | -329.22317 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 1 | 1 | 0 | 0.666667 | 0.0 | 1 | 2 | -319.38776 | -1.131467 | -1.011470 | 2.440997 | -1.638004 |
| 2 | 1 | 0 | 0.600000 | 0.0 | 2 | 2 | -292.87086 | -2.708690 | -1.746430 | 2.887417 | 0.428331 |
| 3 | 1 | 0 | 0.571429 | 0.0 | 3 | 2 | -290.73338 | -4.913328 | -2.901997 | 2.351047 | -0.017945 |
| 4 | 1 | 0 | 0.625000 | 0.0 | 4 | 1 | -246.42733 | -6.311880 | -1.069209 | 3.394714 | 0.746134 |
chain.plot.grid(columns = [7,8,9,10]) # plot SampleVariable1 to SampleVariable4
ParaDRAM - NOTE: making the grid plot...
generating subplot #1: (0,0) out of 16... done in 0.061001 seconds. generating subplot #2: (0,1) out of 16... done in 0.391 seconds. generating subplot #3: (0,2) out of 16... done in 0.362003 seconds. generating subplot #4: (0,3) out of 16... done in 0.341002 seconds. generating subplot #5: (1,0) out of 16... done in 0.562005 seconds. generating subplot #6: (1,1) out of 16... done in 0.061001 seconds. generating subplot #7: (1,2) out of 16... done in 0.391998 seconds. generating subplot #8: (1,3) out of 16... done in 0.413006 seconds. generating subplot #9: (2,0) out of 16... done in 0.502005 seconds. generating subplot #10: (2,1) out of 16... done in 0.514003 seconds. generating subplot #11: (2,2) out of 16... done in 0.061001 seconds. generating subplot #12: (2,3) out of 16... done in 0.375968 seconds. generating subplot #13: (3,0) out of 16... done in 0.507003 seconds. generating subplot #14: (3,1) out of 16... done in 0.510006 seconds. generating subplot #15: (3,2) out of 16... done in 0.563003 seconds. generating subplot #16: (3,3) out of 16... done in 0.061001 seconds. generating colorbar... done in 0.033999 seconds.
We can also add target values to this grid plot via the addTarget()
method of the GridPlot
object. A few types of targets are available out of the box, such as "mode"
, "mean"
, and "median"
. The "mode"
option corresponds to the state at which the maximum of the objective function occurs; it is the default
when no value
is provided to addTarget.
chain.plot.grid(columns = [8,7,10,9]) # plot in random order, just for illustration.
chain.plot.grid.addTarget()
ParaDRAM - NOTE: making the grid plot...
generating subplot #1: (0,0) out of 16... done in 0.059251 seconds. generating subplot #2: (0,1) out of 16... done in 0.374 seconds. generating subplot #3: (0,2) out of 16... done in 0.344006 seconds. generating subplot #4: (0,3) out of 16... done in 0.342998 seconds. generating subplot #5: (1,0) out of 16... done in 0.488997 seconds. generating subplot #6: (1,1) out of 16... done in 0.059251 seconds. generating subplot #7: (1,2) out of 16... done in 0.355004 seconds. generating subplot #8: (1,3) out of 16... done in 0.351 seconds. generating subplot #9: (2,0) out of 16... done in 0.594 seconds. generating subplot #10: (2,1) out of 16... done in 0.549006 seconds. generating subplot #11: (2,2) out of 16... done in 0.059251 seconds. generating subplot #12: (2,3) out of 16... done in 0.378003 seconds. generating subplot #13: (3,0) out of 16... done in 0.501004 seconds. generating subplot #14: (3,1) out of 16... done in 0.499998 seconds. generating subplot #15: (3,2) out of 16... done in 0.486004 seconds. generating subplot #16: (3,3) out of 16... done in 0.059251 seconds. generating colorbar... done in 0.034 seconds. generating target #0: (0,0) out of 16... done in 0.005002 seconds. generating target #0: (0,1) out of 16... done in 0.006998 seconds. generating target #0: (0,2) out of 16... done in 0.007 seconds. generating target #0: (0,3) out of 16... done in 0.009003 seconds. generating target #0: (1,0) out of 16... done in 0.008999 seconds. generating target #0: (1,1) out of 16... done in 0.003964 seconds. generating target #0: (1,2) out of 16... done in 0.006003 seconds. generating target #0: (1,3) out of 16... done in 0.01 seconds. generating target #0: (2,0) out of 16... done in 0.008999 seconds. generating target #0: (2,1) out of 16... done in 0.007999 seconds. generating target #0: (2,2) out of 16... done in 0.004 seconds. generating target #0: (2,3) out of 16... done in 0.008 seconds. generating target #0: (3,0) out of 16... done in 0.007999 seconds. 
generating target #0: (3,1) out of 16... done in 0.01 seconds. generating target #0: (3,2) out of 16... done in 0.008 seconds. generating target #0: (3,3) out of 16... done in 0.004 seconds.
The color in the scatter plots represents the value of logFunc
at each point. The blue color of the density plots represents the density of the sampled points at any given location.
We can also add targets to these plots to mark a specific value of interest. For example, we may want to display the location of the maximum logFunc
on these plots; this is the default target value loaded when the grid
object is first created.
By default, the ParaDRAM sampler adapts the proposal distribution throughout the entire simulation. Strictly speaking, this breaks the ergodicity condition of the Markov chain; however, if the adaptation continuously and progressively diminishes throughout the simulation, then we can still trust the Markov chain simulation results. The alternative is to simply turn off the adaptation, but that is rarely a good strategy. A better solution is to keep the adaptivity on but simulate long enough to ensure that the Markov chain has converged to the target density.
A unique feature of the ParaDRAM sampler is that it provides a measure of the amount of adaptation of the proposal distribution of the Markov chain by quantifying how much the proposal distribution changes throughout the simulation. In essence, the computed quantity, represented by the column AdaptationMeasure in the output chain files, is an upper bound on the total variation distance between each updated proposal distribution and the one used immediately before it.
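To build intuition for the quantity being bounded, the total variation distance between two one-dimensional proposal densities can be approximated numerically as 0.5 ∫ |p − q| dx. The sketch below (plain NumPy, not the ParaDRAM internal computation) shows that the distance vanishes for identical proposals and shrinks as the adaptation shift shrinks:

```python
import numpy as np

def tv_distance_1d(pdf1, pdf2, lo=-20.0, hi=20.0, n=200_001):
    """Total variation distance 0.5 * integral |p - q|, via a Riemann sum."""
    x = np.linspace(lo, hi, n)
    dx = x[1] - x[0]
    return 0.5 * np.sum(np.abs(pdf1(x) - pdf2(x))) * dx

def gauss(mu, sig):
    """Density of a normal distribution with mean mu and std sig."""
    return lambda x: (np.exp(-0.5 * ((x - mu) / sig) ** 2)
                      / (sig * np.sqrt(2 * np.pi)))

# Identical proposals: zero distance. A shifted proposal: a positive
# distance that shrinks as the shift (the adaptation step) shrinks.
print(tv_distance_1d(gauss(0, 1), gauss(0, 1)))     # ~0.0
print(tv_distance_1d(gauss(0, 1), gauss(1, 1)))     # ~0.383
print(tv_distance_1d(gauss(0, 1), gauss(0.01, 1)))  # ~0.004
```

A diminishing AdaptationMeasure therefore means that successive proposal distributions become nearly indistinguishable, which is the condition under which the adaptive chain can be trusted.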
We can plot the AdaptationMeasure for this sampling problem to ensure that the adaptation of the proposal distribution happens progressively less and less throughout the simulation. If AdaptationMeasure does not diminish monotonically throughout the simulation, it would be a strong indication of the lack of convergence of the MCMC sampler to the target objective function.
markovChain.plot.scatter.scatter.kws.cmap = "winter"
markovChain.plot.scatter(ycolumns="AdaptationMeasure",ccolumns=[])
markovChain.plot.scatter.currentFig.axes.set_yscale("log")
markovChain.plot.scatter.currentFig.axes.set_ylim([1.e-5,1])
ParaDRAM - NOTE: making the scatter plot...
done in 0.289001 seconds.
(1e-05, 1)
The above curve is exactly the kind of diminishing adaptation that we would want to see in a ParaDRAM simulation: At the beginning, the sampler appears to struggle for a while to find the shape of the target objective function, but around step 100, it appears to have found the peak of the target and starts to quickly converge and adapt the shape of the proposal distribution of the Markov Chain sampler to the shape of the objective function.
We can also parse the contents of the ASCII-format restart file generated by the ParaDRAM sampler to gain insight into how the proposal distribution has been updated over the course of the simulation.
restartList = pmpd.readRestart(renabled = True) # object return enabled.
restart = restartList[0]
ParaDRAM - NOTE: 1 files detected matching the pattern: "./out/mvn_serial*_restart.txt" ParaDRAM - NOTE: processing restart file: D:\Dropbox\Projects\20180101_ParaMonte\git\paramontex\Python\Jupyter\out\mvn_serial_process_1_restart.txt ParaDRAM - NOTE: reading the file contents... . done in 0.470999 seconds. ParaDRAM - NOTE: adding the graphics tools... ParaDRAM - NOTE: creating a line plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a scatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a lineScatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a covmat2 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a covmat3 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a cormat2 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a cormat3 plot object from scratch... done in 0.0 seconds. done in 0.025005 seconds. ParaDRAM - NOTE: The processed restart files are now stored in the output variable as a ParaDRAM - NOTE: Python list. For example, to access the contents of the first (or the only) ParaDRAM - NOTE: restart file stored in an output variable named restartList, try: ParaDRAM - NOTE: ParaDRAM - NOTE: restartList[0].contents ParaDRAM - NOTE: ParaDRAM - NOTE: where you will have to replace `pmpd` with your ParaDRAM instance name. ParaDRAM - NOTE: To access the plotting tools, try: ParaDRAM - NOTE: ParaDRAM - NOTE: restartList[0].plot.<PRESS TAB TO SEE THE LIST OF PLOTS> ParaDRAM - NOTE: ParaDRAM - NOTE: For example, ParaDRAM - NOTE: ParaDRAM - NOTE: restartList[0].plot.line() # to make bivariate line plots. ParaDRAM - NOTE: restartList[0].plot.scatter() # to make bivariate scatter plots. ParaDRAM - NOTE: restartList[0].plot.lineScatter() # to make bivariate line-scatter plots. ParaDRAM - NOTE: restartList[0].plot.covmat2() # to make proposal covariance evolution 2D plots. 
ParaDRAM - NOTE: restartList[0].plot.covmat3() # to make proposal covariance evolution 3D plots. ParaDRAM - NOTE: restartList[0].plot.cormat2() # to make proposal correlation evolution 2D plots. ParaDRAM - NOTE: restartList[0].plot.cormat3() # to make proposal correlation evolution 3D plots. ParaDRAM - NOTE: ParaDRAM - NOTE: For more information and examples on the usage, visit: ParaDRAM - NOTE: ParaDRAM - NOTE: https://www.cdslab.org/paramonte
restart.plot.cormat2.reset()
restart.plot.cormat2()
restart.plot.cormat2.reset()
restart.plot.cormat2(dimensionPair = [2,3])
ParaDRAM - NOTE: making the cormat2 plot...
done in 0.494018 seconds.
ParaDRAM - NOTE: making the cormat2 plot...
done in 0.397006 seconds.
restart.plot.covmat2.reset()
restart.plot.covmat2()
restart.plot.covmat2.reset()
restart.plot.covmat2(dimensionPair = [2,3])
ParaDRAM - NOTE: making the covmat2 plot...
done in 0.420001 seconds.
ParaDRAM - NOTE: making the covmat2 plot...
done in 0.427001 seconds.
restart.plot.covmat3.reset()
restart.plot.covmat3()
restart.plot.covmat3.reset()
restart.plot.covmat3(dimensionPair = [2,3])
ParaDRAM - NOTE: making the covmat3 plot...
done in 0.409003 seconds.
ParaDRAM - NOTE: making the covmat3 plot...
done in 0.580012 seconds.
As the plots above show, the proposal adaptations gradually diminish toward the end of the simulation, and the shape of the proposal distribution stabilizes relatively quickly.
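The diminishing adaptation seen above can also be quantified numerically, for example as the Frobenius norm of the difference between successive proposal covariance matrices. The matrices below are synthetic stand-ins, not the actual contents of a ParaDRAM restart file; this is only a minimal sketch of the idea.

```python
import numpy as np

# Hypothetical target proposal covariance and a deterministic perturbation
# that shrinks by a factor of 2 at each adaptation (synthetic, for illustration).
target = np.array([[1.0, 0.5],
                   [0.5, 2.0]])
perturb = np.array([[0.3, -0.2],
                    [-0.2, 0.4]])

# A sequence of "adapted" covariance matrices converging to the target.
covmats = [target + (0.5 ** k) * perturb for k in range(6)]

# Frobenius norm of successive differences: a decaying sequence
# indicates diminishing adaptation of the proposal distribution.
diffs = [np.linalg.norm(covmats[k + 1] - covmats[k]) for k in range(5)]
print(diffs)  # strictly decreasing, halving at each step
```

On real output, the same diagnostic could be applied to the covariance matrices stored in the restart file contents.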
To construct and visualize the correlation matrix of the sampled points in the chain, we can try,
sample.stats.cormat.plot.heatmap()
ParaDRAM - NOTE: making the heatmap plot...
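An equivalent heatmap can be produced with plain pandas and seaborn, if you have the sampled points as a DataFrame (the ParaMonte sample object exposes one, assumed here as `sample.df`). The data below is a synthetic multivariate-normal stand-in for the actual chain.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in for the sampled chain (not the actual ParaDRAM output).
rng = np.random.default_rng(42)
df = pd.DataFrame(
    rng.multivariate_normal(
        mean=[0.0, 0.0, 0.0],
        cov=[[1.0, 0.5, 0.2],
             [0.5, 1.0, 0.3],
             [0.2, 0.3, 1.0]],
        size=5000,
    ),
    columns=["SampleVariable1", "SampleVariable2", "SampleVariable3"],
)

cormat = df.corr()  # Pearson correlation by default
ax = sns.heatmap(cormat, annot=True, vmin=-1, vmax=1, cmap="RdBu_r")
ax.set_title("sample correlation matrix")
plt.tight_layout()
```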
By default, the covariance and correlation matrices are precomputed when the data is read. In particular, the correlation matrix is computed via Pearson's method. We can easily recompute the correlation matrix using either Spearman's or Kendall's rank correlation coefficient, like,
sample.stats.cormat(method="spearman",reself=True).plot.heatmap()
ParaDRAM - NOTE: adding the correlation graphics tools... ParaDRAM - NOTE: creating a heatmap plot object from scratch... done in 0.050999 seconds. ParaDRAM - NOTE: making the heatmap plot...
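The difference between these three estimators can be sketched with plain pandas (the method names `"pearson"`, `"spearman"`, and `"kendall"` here are pandas', not ParaMonte's). For a monotone but nonlinear relation, the rank-based estimators stay near 1 while Pearson's linear correlation is noticeably lower; the data below is synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic monotone nonlinear relation: y = exp(x) + small noise.
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
df = pd.DataFrame({"x": x, "y": np.exp(x) + 0.1 * rng.normal(size=2000)})

pearson = df.corr(method="pearson").loc["x", "y"]    # linear association
spearman = df.corr(method="spearman").loc["x", "y"]  # rank-based
kendall = df.corr(method="kendall").loc["x", "y"]    # rank-based

# The rank correlations capture the monotone relation almost perfectly,
# whereas Pearson's coefficient is attenuated by the nonlinearity.
print(pearson, spearman, kendall)
```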
The input argument reself requests the function to return the object itself (self) as the function output, which allows further attribute access to be chained in a single expression. To visualize the covariance matrix, we can try,
sample.stats.covmat.plot.heatmap()
ParaDRAM - NOTE: making the heatmap plot...
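The `reself` argument used above is essentially the method-chaining pattern: a method that optionally returns `self` so that a follow-up call can be appended directly. A minimal sketch with a hypothetical class (not the ParaMonte implementation):

```python
class Stats:
    """Hypothetical illustration of the `reself` method-chaining pattern."""

    def __init__(self):
        self.method = "pearson"

    def cormat(self, method="pearson", reself=False):
        # Recompute the correlation matrix with the requested method
        # (the actual computation is elided in this sketch).
        self.method = method
        if reself:
            return self  # enables: obj.cormat(..., reself=True).plot...

stats = Stats()
chained = stats.cormat(method="spearman", reself=True)
print(chained is stats, chained.method)  # the same object, updated in place
```

Without `reself=True` the method returns `None`, so the chained attribute access would fail; that is why the flag is needed in the one-liner above.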
Note that the same plots can also be made for the compact and verbose (Markov) chains. For brevity, we do not show them here.
Many more functionalities and features of the ParaMonte library were not explored in this example Jupyter notebook. You can discover them by inspecting the components of each attribute of the ParaDRAM sampler class and by visiting the ParaMonte library's documentation website.