import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')
%matplotlib notebook
import paramonte as pm
import numpy as np
import scipy as sp
import pandas as pd
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
print('\n'.join(f'{m.__name__} {m.__version__}' for m in globals().values() if getattr(m, '__version__', None)))
sns.set()
paramonte 2.5.1 numpy 1.19.5 scipy 1.5.4 pandas 1.2.0 seaborn 0.11.0 matplotlib 3.3.3
The logic of Monte Carlo sampling is very simple:
Suppose we have a mathematical objective function defined on an ndim-dimensional domain. Typically, this domain is defined on the space of real numbers. In general, understanding and visualizing the structure of such objective functions is an extremely difficult task, if not impossible. A better alternative is to employ Monte Carlo techniques that randomly explore the structure of the objective function and find its extrema.
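To make this idea concrete, here is a toy sketch (not part of the ParaMonte workflow; plain NumPy, with an arbitrary one-dimensional objective chosen purely for illustration) of random exploration locating a function's maximum:

```python
import numpy as np

def objective(x):
    """An arbitrary unimodal objective: a Gaussian bump peaked at x = 3."""
    return np.exp(-0.5 * (x - 3.0)**2)

rng = np.random.default_rng(12345)
points = rng.uniform(-10.0, 10.0, size=100_000)  # random exploration of the domain
values = objective(points)

xBest = points[np.argmax(values)]  # the best point found by pure random search
print(f"approximate maximizer found: {xBest:.4f}")  # close to the true maximizer, x = 3
```

With enough random evaluations, the best point found converges on the true extremum; samplers such as ParaDRAM explore far more efficiently than this blind search.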
In the following example, one of the simplest such objective functions, the Multivariate Normal Distribution (MVN), is constructed and sampled using the ParaMonte library samplers, here the ParaDRAM sampler (Delayed-Rejection Adaptive Metropolis-Hastings Markov Chain Monte Carlo sampler).
Suppose we want to sample random points from a MultiVariate Normal (MVN) Probability Density Function (PDF). The PDF of the MVN is the following,

$$ \pi(x \,|\, \mu, \Sigma) = (2\pi)^{-\frac{\mathrm{ndim}}{2}} \, |\Sigma|^{-\frac{1}{2}} \exp\left( -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right) $$

where $\mu$ and $\Sigma$ are the mean vector and the covariance matrix of the distribution, respectively.
The following Python function getLogFunc() returns the natural logarithm of the Probability Density Function of the multivariate Normal distribution.
Since the mathematical objective functions (e.g., probability density functions) can take extremely small or large values, we often work with their natural logarithms instead. This is the reason behind the naming convention used in the ParaMonte library for the user's objective functions: getLogFunc, implying that the user must provide a function that returns the natural logarithm of the target objective function.
See this Jupyter Notebook for an in-depth discussion of why we need to work with the logarithm of mathematical objective functions in optimization and sampling problems.
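To see why, consider this quick NumPy illustration (added here for clarity): far from the mode, the raw PDF value underflows to zero in double precision, while its logarithm remains a perfectly ordinary number,

```python
import numpy as np

# A standard Gaussian density evaluated 100 standard deviations from its mean:
# exp(-5000) is far below the smallest positive double (~1e-308), so the raw
# PDF value underflows to exactly zero...
pdfValue = np.exp(-0.5 * 100.0**2) / np.sqrt(2.0 * np.pi)
print(pdfValue)  # 0.0

# ...whereas the log-density is an ordinary, perfectly representable number.
logPdfValue = -0.5 * 100.0**2 - 0.5 * np.log(2.0 * np.pi)
print(logPdfValue)  # about -5000.92
```

Working in log-space therefore preserves information that would otherwise be destroyed by floating-point underflow or overflow.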
import numpy as np
NDIM = 4 # number of dimensions of the domain of the MVN PDF
MEAN = np.double([-10, 15., 20., 0.0]) # This is the mean of the MVN PDF.
COVMAT = np.double( [ [1.0,.45,-.3,0.0] # This is the covariance matrix of the MVN PDF.
, [.45,1.0,0.3,-.2]
, [-.3,0.3,1.0,0.6]
, [0.0,-.2,0.6,1.0]
] )
INVCOV = np.linalg.inv(COVMAT) # This is the inverse of the covariance matrix of the MVN distribution.
# The following is the log of the coefficient used in the definition of the MVN.
MVN_COEF = NDIM * np.log( 1. / np.sqrt(2.*np.pi) ) + np.log( np.sqrt(np.linalg.det(INVCOV)) )
# the logarithm of objective function: log(MVN)
def getLogFunc(point):
    """
    Return the natural logarithm of the MVN PDF.
    """
    normedPoint = MEAN - point
    return MVN_COEF - 0.5 * np.dot(normedPoint, np.matmul(INVCOV, normedPoint))
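Before handing getLogFunc() to a sampler, it is good practice to sanity-check it. The sketch below (an illustrative addition; the definitions repeat the cell above so the snippet is self-contained) verifies that the log-density peaks at MEAN, where the quadratic form vanishes and the function must return exactly MVN_COEF:

```python
import numpy as np

# Definitions repeated from the cell above so this check is self-contained.
NDIM = 4
MEAN = np.double([-10, 15., 20., 0.0])
COVMAT = np.double([[1.0, .45, -.3, 0.0],
                    [.45, 1.0, 0.3, -.2],
                    [-.3, 0.3, 1.0, 0.6],
                    [0.0, -.2, 0.6, 1.0]])
INVCOV = np.linalg.inv(COVMAT)
MVN_COEF = NDIM * np.log(1. / np.sqrt(2. * np.pi)) + np.log(np.sqrt(np.linalg.det(INVCOV)))

def getLogFunc(point):
    normedPoint = MEAN - point
    return MVN_COEF - 0.5 * np.dot(normedPoint, np.matmul(INVCOV, normedPoint))

# At the mean, the quadratic form vanishes, so the log-density equals MVN_COEF.
assert np.isclose(getLogFunc(MEAN), MVN_COEF)

# Moving away from the mean must strictly decrease the log-density.
assert getLogFunc(MEAN + 1.0) < getLogFunc(MEAN)
print("getLogFunc passes the sanity checks.")
```

If scipy is available, the same values can also be cross-checked against scipy.stats.multivariate_normal(MEAN, COVMAT).logpdf(point).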
We will sample random points from the objective function by calling the ParaDRAM sampler (Delayed-Rejection Adaptive Metropolis-Hastings Markov Chain Monte Carlo sampler) of the ParaMonte library.
To run the sampler in parallel, you will have to first save the MPI-enabled script as an external file. Visit the ParaMonte library's documentation website for more information, or see this parallel ParaDRAM simulation Jupyter notebook example.
import paramonte as pm
First, we will check which version of the ParaMonte library you are using,
print( pm.version.kernel.get() )
print( pm.version.interface.get() )
ParaMonte Python Kernel Version 1.5.1 ParaMonte Python Interface Version 2.5.1
We can also dump just the bare version numbers of the ParaMonte kernel and interface,
print( pm.version.kernel.dump() )
print( pm.version.interface.dump() )
1.5.1 2.5.1
Checking for new versions of the ParaMonte library is also easy and recommended,
pm.checkForUpdate()
ParaMonte - NOTE: You have the latest version of the ParaMonte library. ParaMonte - NOTE: To see the most recent changes to the library, visit, ParaMonte - NOTE: ParaMonte - NOTE: https://www.cdslab.org/paramonte/notes/overview/paramonte-python-release-notes
We can also verify the existence of the essential components of the current installation of the ParaMonte Python library on the system,
pm.verify()
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: :::: :::: _/_/_/_/ _/_/ _/_/ _/ _/ _/_/_/_/_/ _/ _/ _/ _/_/_/_/ _/ /_/_/ _/_/_/_/ _/ _/ _/ _/_/ _/_/_/ _/_/_/ _/_/_/ _/_/_/ _/ _/ _/_/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/_/_/_/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/ _/_/_/ _/_/_/_/ _/ _/_/_/_/ _/_/_/ _/_/_/ _/_/ _/ _/ _/_/ _/_/_/ ParaMonte plain powerful parallel Monte Carlo library Version 2.5.1 :::: :::: ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: ParaMonte - NOTE: The ParaMonte::Kernel samplers have no Python package dependencies ParaMonte - NOTE: beyond numpy. However, the ParaMonte::Python post-processing and ParaMonte - NOTE: visualization tools require the following Python packages, ParaMonte - NOTE: ParaMonte - NOTE: numpy : 1.19.2 ParaMonte - NOTE: scipy : 1.5.2 ParaMonte - NOTE: pandas : 1.1.2 ParaMonte - NOTE: seaborn : 0.11.0 ParaMonte - NOTE: matplotlib : 3.3.2 ParaMonte - NOTE: ParaMonte - NOTE: If you do not intend to use the postprocessing and visualization tools, ParaMonte - NOTE: you can ignore this message. Otherwise, ParaMonte - NOTE: ParaMonte - NOTE: UPDATE THE ABOVE PACKAGES TO THE REQUESTED VERSIONS OR NEWER TO ENABLE ParaMonte - NOTE: THE VISUALIZATION AND POSTPROCESSING TOOLS OF THE ParaMonte LIBRARY. 
ParaMonte - NOTE: Intel MPI library for 64-bit architecture detected at: ParaMonte - NOTE: ParaMonte - NOTE: C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.1.216\windows\mpi\intel64\bin ParaMonte - NOTE: ParaMonte - NOTE: To perform ParaMonte simulations in parallel on a single node, ParaMonte - NOTE: run the following two commands, in the form and order specified, ParaMonte - NOTE: on a Python-aware mpiexec-aware command-line interface such as ParaMonte - NOTE: Anaconda3 Windows command prompt: ParaMonte - NOTE: ParaMonte - NOTE: "C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.1.216\windows\mpi\intel64\bin\mpivars.bat" ParaMonte - NOTE: ParaMonte - NOTE: mpiexec -localonly -n 2 python main_mpi.py ParaMonte - NOTE: ParaMonte - NOTE: where, ParaMonte - NOTE: ParaMonte - NOTE: 0. the first command defines the essential environment variables, ParaMonte - NOTE: and the second command runs in the simulation in parallel, where, ParaMonte - NOTE: 1. you should replace the number 2 with the desired processor count ParaMonte - NOTE: you wish to assign to your simulation task and, ParaMonte - NOTE: 2. the flag '-localonly' indicates a parallel simulation on only ParaMonte - NOTE: a single node (this flag will obviate the need for the MPI ParaMonte - NOTE: library credentials registration). For more information, visit: ParaMonte - NOTE: https://software.intel.com/en-us/get-started-with-mpi-for-windows ParaMonte - NOTE: 3. python is the name of the Python software installed on your system. ParaMonte - NOTE: If python is not recognized on your command line, use python3 instead. ParaMonte - NOTE: 4. main_mpi.py is the Python file which serves as the entry point to ParaMonte - NOTE: your simulation, where you call the ParaMonte sampler routines. 
ParaMonte - NOTE: ParaMonte - NOTE: Note that the above two commands must be executed on a command-line that ParaMonte - NOTE: recognizes both Python and mpiexec applications, such as the Anaconda ParaMonte - NOTE: command-line interface. For more information, in particular, on how ParaMonte - NOTE: to register to run Hydra services for multi-node simulations ParaMonte - NOTE: on Windows servers, visit: ParaMonte - NOTE: ParaMonte - NOTE: https://www.cdslab.org/paramonte ParaMonte - NOTE: To check for the MPI library installation status or display the above ParaMonte - NOTE: messages in the future, type the following on the Python command-line: ParaMonte - NOTE: ParaMonte - NOTE: import paramonte as pm ParaMonte - NOTE: pm.verify() ParaMonte - NOTE: ParaMonte - NOTE: To get started, type the following on the Python command-line, ParaMonte - NOTE: ParaMonte - NOTE: import paramonte as pm ParaMonte - NOTE: pm.helpme() ParaMonte - NOTE: ParaMonte - NOTE: To get help on individual components of the paramonte modules, ParaMonte - NOTE: pass the name of the components to `helpme()`. For example, ParaMonte - NOTE: ParaMonte - NOTE: import paramonte as pm ParaMonte - NOTE: pm.helpme("ParaDRAM")
pmpd = pm.ParaDRAM()
The simplest scenario is to run the simulation with the default specifications that are appropriately determined by the ParaDRAM sampler. However, for the purpose of clarity and reproducibility, we will specify a few simulation specs,
For a complete list of all ParaDRAM simulation specifications, visit the ParaDRAM simulation specifications webpage.
We will store all output files in a directory with the same name as this Jupyter Notebook, and use a common prefix for the output file names.
By default, all ParaDRAM simulation output files are named properly (see this page for a review of ParaDRAM simulation output files). In general, it is a good habit to specify an output file prefix for any ParaDRAM simulation. This way, if an incomplete simulation needs to be restarted for some reason, the restart is seamless: simply rerun the simulation, without any change to the code or the simulation settings.
pmpd.spec.outputFileName = "./out/mvn_serial"
By default, the ParaMonte samplers do not overwrite old existing simulation files for obvious reasons. For this example, however, we will let the sampler overwrite the simulation output files if they already exist at the specified path,
pmpd.spec.overwriteRequested = True
We will also initialize the random seed to a fixed number to ensure the reproducibility of the simulation results in the future,
pmpd.spec.randomSeed = 3751
Not essential, but we will also specify a chainSize here, 30000, the number of unique points to be sampled from the objective function. This number is much smaller than the sampler's default of 100,000 unique sample points.
pmpd.spec.chainSize = 30000
For the sake of illustration, we will also request an ASCII-format restart output file,
pmpd.spec.restartFileFormat = "ascii"
Note that none of the above specification settings are necessary, but it is in general a good habit to define them prior to the simulation.
We now call the ParaDRAM sampler routine to run the sampling simulation,
# run the ParaDRAM sampler
pmpd.runSampler ( ndim = 4
, getLogFunc = getLogFunc
)
ParaDRAM - NOTE: Running the ParaDRAM sampler in serial mode... ParaDRAM - NOTE: To run the ParaDRAM sampler in parallel mode visit: ParaDRAM - NOTE: ParaDRAM - NOTE: https://www.cdslab.org/paramonte ParaDRAM - NOTE: ParaDRAM - NOTE: If you are using Jupyter notebook, check the Jupyter's ParaDRAM - NOTE: terminal window for realtime simulation progress and report. ParaDRAM - NOTE: To read the generated output files, try: ParaDRAM - NOTE: ParaDRAM - NOTE: pmpd.readReport() # to read the summary report from the output report file. ParaDRAM - NOTE: pmpd.readSample() # to read the final i.i.d. sample from the output sample file. ParaDRAM - NOTE: pmpd.readChain() # to read the uniquely-accepted points from the output chain file. ParaDRAM - NOTE: pmpd.readMarkovChain() # to read the Markov Chain. NOT recommended for very large chains. ParaDRAM - NOTE: pmpd.readRestart() # to read the contents of an ASCII-format output restart file. ParaDRAM - NOTE: pmpd.readProgress() # to read the contents of an output progress file. ParaDRAM - NOTE: ParaDRAM - NOTE: where you should replace `pmpd` with your ParaDRAM sampler's object name. ParaDRAM - NOTE: For more information and examples on the usage, visit: ParaDRAM - NOTE: ParaDRAM - NOTE: https://www.cdslab.org/paramonte
This will print the realtime simulation progress information on your Anaconda prompt window (not inside your Jupyter notebook). Once the simulation is finished, the ParaDRAM routine generates 5 output files, each of which contains information about certain aspects of the simulation,
mvn_serial_process_1_report.txt
This file contains reports about all aspects of the simulation, from the very beginning of the simulation to the very last step, including the postprocessing of the results and final sample refinement.
mvn_serial_process_1_progress.txt
This file contains dynamic realtime information about the simulation progress. The frequency of the progress reports to this file depends on the value of the simulation specification progressReportPeriod.
mvn_serial_process_1_sample.txt
This file contains the final fully-refined decorrelated i.i.d. random sample from the objective function.
mvn_serial_process_1_chain.txt
This file contains the Markov chain, generally in either binary or condensed (compact) ASCII format to improve the storage efficiency of the chain on your computer.
mvn_serial_process_1_restart.txt
This file contains the information needed to seamlessly restart an interrupted simulation. Since we requested restartFileFormat = "ascii" above, it is written in human-readable ASCII format.
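Since the sample file is a plain delimited text file with a single header row, it can also be parsed directly with pandas, independently of the readSample() convenience method. Below is a minimal sketch using a mock in-memory file (the rows are taken from an example run; on a real run you would pass the actual *_sample.txt path to pd.read_csv):

```python
import io
import pandas as pd

# Mock contents mimicking the layout of a comma-delimited ParaDRAM *_sample.txt
# file: one header row, then one row per sampled point.
mockSampleFile = io.StringIO(
    "SampleLogFunc,SampleVariable1,SampleVariable2,SampleVariable3,SampleVariable4\n"
    "-7.437893,-10.602866,13.914840,21.541231,1.239213\n"
    "-6.866939,-11.305132,14.502793,22.145923,1.054045\n"
)

df = pd.read_csv(mockSampleFile, delimiter=",")
print(df.shape)  # (2, 5): two sampled points, the log-density column plus 4 variables
```

The readSample() method used below does this parsing for you and additionally attaches the statistics and plotting tools to the resulting object.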
Now, we can read the generated output sample, contained in the file suffixed with *_sample.txt.
pmpd.readSample() # generates pmpd.sampleList
sample = pmpd.sampleList[0]
ParaDRAM - WARNING: The ``delimiter`` is neither given as input to ``readSample()`` ParaDRAM - WARNING: nor set as a simulation specification of the ParaDRAM object. ParaDRAM - WARNING: This information is essential, otherwise how could the output files be parsed? ParaDRAM - WARNING: For now, the ParaDRAM sampler will assume a comma-separated ParaDRAM - WARNING: file format for the contents of the sample file(s) to be parsed. ParaDRAM - NOTE: 1 files detected matching the pattern: "./out/mvn_serial*_sample.txt" ParaDRAM - NOTE: processing sample file: D:\Dropbox\Projects\20180101_ParaMonte\git\paramontex\Python\Jupyter\sampling_multivariate_normal_distribution_via_paradram\out\mvn_serial_process_1_sample.txt ParaDRAM - NOTE: reading the file contents... done in 0.025998 seconds. ParaDRAM - NOTE: ndim = 4, count = 8924 ParaDRAM - NOTE: parsing file contents... ParaDRAM - NOTE: computing the sample correlation matrix... ParaDRAM - NOTE: adding the correlation graphics tools... ParaDRAM - NOTE: creating a heatmap plot object from scratch... done in 0.020009 seconds. ParaDRAM - NOTE: computing the sample covariance matrix... ParaDRAM - NOTE: adding the covariance graphics tools... ParaDRAM - NOTE: creating a heatmap plot object from scratch... done in 0.026999 seconds. ParaDRAM - NOTE: computing the sample autocorrelations... ParaDRAM - NOTE: adding the autocrrelation graphics tools... ParaDRAM - NOTE: creating a line plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a scatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a lineScatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: adding the graphics tools... ParaDRAM - NOTE: creating a line plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a scatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a lineScatter plot object from scratch... done in 0.0 seconds. 
ParaDRAM - NOTE: creating a line3 plot object from scratch... done in 0.000989 seconds. ParaDRAM - NOTE: creating a scatter3 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a lineScatter3 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a jointplot plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a histplot plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a kdeplot1 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a kdeplot2 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a contour3 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a contourf plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a contour plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a grid plot object from scratch... done in 0.001002 seconds. ParaDRAM - NOTE: The processed sample files are now stored in the newly-created ParaDRAM - NOTE: component `sampleList` of the ParaDRAM object as a Python list. ParaDRAM - NOTE: For example, to access the contents of the first (or the only) sample file, try: ParaDRAM - NOTE: ParaDRAM - NOTE: pmpd.sampleList[0].df ParaDRAM - NOTE: ParaDRAM - NOTE: where you will have to replace `pmpd` with your ParaDRAM instance name. ParaDRAM - NOTE: To access the plotting tools, try: ParaDRAM - NOTE: ParaDRAM - NOTE: pmpd.sampleList[0].plot.<PRESS TAB TO SEE THE LIST OF PLOTS> ParaDRAM - NOTE: ParaDRAM - NOTE: For example, ParaDRAM - NOTE: ParaDRAM - NOTE: pmpd.sampleList[0].plot.line() # to make 2D line plots. ParaDRAM - NOTE: pmpd.sampleList[0].plot.scatter() # to make 2D scatter plots. ParaDRAM - NOTE: pmpd.sampleList[0].plot.lineScatter() # to make 2D line-scatter plots. ParaDRAM - NOTE: pmpd.sampleList[0].plot.line3() # to make 3D line plots. ParaDRAM - NOTE: pmpd.sampleList[0].plot.scatter3() # to make 3D scatter plots. 
ParaDRAM - NOTE: pmpd.sampleList[0].plot.lineScatter3() # to make 3D line-scatter plots. ParaDRAM - NOTE: pmpd.sampleList[0].plot.contour() # to make fast 2D kernel density plots. ParaDRAM - NOTE: pmpd.sampleList[0].plot.contourf() # to make fast 2D kernel density filled contour plots. ParaDRAM - NOTE: pmpd.sampleList[0].plot.contour3() # to make fast 3D kernel density contour plots. ParaDRAM - NOTE: pmpd.sampleList[0].plot.histplot() # to make seaborn 1D distribution plots. ParaDRAM - NOTE: pmpd.sampleList[0].plot.kdeplot1() # to make seaborn 1D kernel density plots. ParaDRAM - NOTE: pmpd.sampleList[0].plot.kdeplot2() # to make seaborn 2D kernel density plots. ParaDRAM - NOTE: pmpd.sampleList[0].plot.grid() # to make GridPlot ParaDRAM - NOTE: ParaDRAM - NOTE: To plot or inspect the variable autocorrelations or the correlation/covariance matrices, try: ParaDRAM - NOTE: ParaDRAM - NOTE: pmpd.sampleList[0].stats.<PRESS TAB TO SEE THE LIST OF COMPONENTS> ParaDRAM - NOTE: ParaDRAM - NOTE: For more information and examples on the usage, visit: ParaDRAM - NOTE: ParaDRAM - NOTE: https://www.cdslab.org/paramonte
By default, the sampler generates a highly refined, decorrelated final sample. Essentially, the ParaDRAM sampler keeps refining the Markov chain for as long as there is any residual autocorrelation in any of the individual variables of the sampled Markov chain. This is an extremely strong and conservative refinement strategy, and it is done so in the ParaDRAM algorithm for a good reason: to ensure independent and identically distributed (i.i.d.) samples from the objective function.
This extreme refinement can be relaxed and controlled by the user, by setting the following ParaDRAM simulation specification variable prior to the simulation run,
pmpd.spec.sampleRefinementCount = 1
If you rerun the simulation with this specification set, you will notice about a three-fold increase in the final sample size. Note that this specification has to be set before running the simulation (also, do not forget to specify a new output file prefix for the new simulation, or to request overwriting as we did above, since existing simulation files are otherwise never overwritten). A value of 1 will cause the ParaDRAM sampler to refine the Markov chain for only one round, which is what most people in the MCMC community do.
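To illustrate what chain refinement accomplishes, here is a simplified NumPy sketch (not the ParaDRAM implementation, which refines repeatedly until no residual autocorrelation remains in any variable): thinning a strongly correlated synthetic chain by roughly its integrated autocorrelation time sharply reduces the lag-1 correlation,

```python
import numpy as np

rng = np.random.default_rng(3751)

# A strongly autocorrelated AR(1) process stands in for a raw Markov chain.
rho, n = 0.95, 200_000
noise = rng.standard_normal(n)
chain = np.empty(n)
chain[0] = 0.0
for i in range(1, n):
    chain[i] = rho * chain[i - 1] + noise[i]

def lagOneAutocorr(x):
    """Lag-1 sample autocorrelation of a 1D array."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

# A crude integrated autocorrelation time estimate for an AR(1) chain.
r1 = lagOneAutocorr(chain)
iact = (1 + r1) / (1 - r1)

# One round of refinement: keep only every iact-th point of the chain.
refined = chain[::int(np.ceil(iact))]

print(f"raw chain lag-1 autocorrelation:     {lagOneAutocorr(chain):.3f}")
print(f"refined chain lag-1 autocorrelation: {lagOneAutocorr(refined):.3f}")
```

A single round already removes most, but not all, of the correlation, which is why the default ParaDRAM strategy iterates the refinement until no residual autocorrelation is detected.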
To quickly visualize the generated sample as a histogram, try,
sample.plot.histplot()
ParaDRAM - NOTE: making the histplot plot...
done in 0.147 seconds.
How about adding a target value to this histogram plot? By default, if no value is specified for the target object, it will add the state corresponding to the mode of the objective function to the plot, as inferred from the sample file,
sample.plot.histplot.reset()
sample.plot.histplot()
sample.plot.histplot.currentFig.axes.set_xlabel("Standard Gaussian Random Value")
sample.plot.histplot.target() # add a line corresponding to the maxLogFunc (mode) of the sampled points.
ParaDRAM - NOTE: making the histplot plot...
done in 0.111 seconds.
However, assigning other values is also possible. For example, suppose we want to add the mean and 1-sigma limits to the plot, instead of the single mode,
sample.plot.histplot()
sample.plot.histplot.currentFig.axes.set_xlabel("Standard Gaussian Random Value")
avg = sample.df.SampleVariable1.mean()
sample.plot.histplot.target( value = avg )
std = sample.df.SampleVariable1.std()
sample.plot.histplot.target.axvline.kws.linestyle = "--"
sample.plot.histplot.target( value = avg - std )
sample.plot.histplot.target( value = avg + std )
ParaDRAM - NOTE: making the histplot plot...
done in 0.109998 seconds.
We can also plot all of the variables at the same time in a single figure,
sample.df.head()
|  | SampleLogFunc | SampleVariable1 | SampleVariable2 | SampleVariable3 | SampleVariable4 |
|---|---|---|---|---|---|
| 0 | -7.437893 | -10.602866 | 13.914840 | 21.541231 | 1.239213 |
| 1 | -7.437893 | -10.602866 | 13.914840 | 21.541231 | 1.239213 |
| 2 | -6.866939 | -11.305132 | 14.502793 | 22.145923 | 1.054045 |
| 3 | -4.833832 | -11.356071 | 13.499396 | 20.976090 | 1.235661 |
| 4 | -4.833832 | -11.356071 | 13.499396 | 20.976090 | 1.235661 |
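A natural check on any such sample is that its moments reproduce the MEAN (and COVMAT) used in getLogFunc(). The sketch below substitutes NumPy's own MVN generator for the ParaDRAM output, since sample.df exists only at notebook runtime; on the real sample, the analogous check would compare sample.df.mean()[1:] against MEAN in the same way:

```python
import numpy as np
import pandas as pd

MEAN = np.double([-10, 15., 20., 0.0])
COVMAT = np.double([[1.0, .45, -.3, 0.0],
                    [.45, 1.0, 0.3, -.2],
                    [-.3, 0.3, 1.0, 0.6],
                    [0.0, -.2, 0.6, 1.0]])

# Stand-in for the ParaDRAM sample: direct i.i.d. draws from the target MVN.
rng = np.random.default_rng(3751)
mock = pd.DataFrame(rng.multivariate_normal(MEAN, COVMAT, size=10_000),
                    columns=[f"SampleVariable{i}" for i in range(1, 5)])

# The column means should recover MEAN to within Monte Carlo error, O(1/sqrt(10000)).
print(mock.mean().round(2))
assert np.allclose(mock.mean().to_numpy(), MEAN, atol=0.1)
```

The sample covariance, mock.cov(), can be compared against COVMAT in the same spirit.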
sample.plot.histplot( xcolumns = [1,2,3,4] )
ParaDRAM - NOTE: making the histplot plot...
done in 0.246995 seconds.
Perhaps we would also want to add a legend and change the x-axis title,
sample.plot.histplot.reset("hard") # resetting to the default is not necessary, but does not hurt either.
sample.plot.histplot.legend.enabled = True
sample.plot.histplot( xcolumns = [1,2,3,4] )
sample.plot.histplot.currentFig.axes.set_xlabel("Multivariate Normal Variables")
ParaDRAM - NOTE: creating a histplot plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: making the histplot plot...
done in 0.20597 seconds.
Text(0.5, 31.07499999999998, 'Multivariate Normal Variables')
To make a trace-plot of the sample, try,
sample.plot.line()
ParaDRAM - NOTE: making the line plot...
done in 0.286038 seconds.
To change the scale of the x-axis, try,
sample.plot.line()
sample.plot.line.currentFig.axes.set_xscale("log")
ParaDRAM - NOTE: making the line plot...
done in 0.218998 seconds.
By default, the color of the line in the trace-plot represents the value returned by getLogFunc() at the given sampled point.
sample.plot.line.ccolumns
'SampleLogFunc'
To turn the color off, we can instead try,
sample.plot.line.ccolumns = None
sample.plot.line()
sample.plot.line.currentFig.axes.set_xscale("log")
ParaDRAM - NOTE: making the line plot...
done in 0.092 seconds.
There are many other properties of the plot that can be set or modified via the attributes of the pmpd.sampleList[0].plot.line object. To see them all, uncomment the following line to display the documentation of the object,
# sample.plot.line.helpme()
We could also plot all variables in the same plot,
sample.plot.line.reset() # reset to the default settings
sample.plot.line.ccolumns = None # turn off the color-mapping
sample.plot.line.ycolumns = pmpd.sampleList[0].df.columns[1:] # exclude the SampleLogFunc column
sample.plot.line()
sample.plot.line.currentFig.axes.set_xscale("log")
ParaDRAM - NOTE: making the line plot...
done in 0.108999 seconds.
We could also make scatter or line-scatter trace-plots of the sampled points. The ParaMonte library has visualization tools for this purpose with some nice predefined settings that can be changed by the user. For example, let's make scatter and line-scatter trace-plots of the first sampled variable,
sample.plot.scatter()
ParaDRAM - NOTE: making the scatter plot...
done in 0.140001 seconds.
sample.plot.lineScatter()
sample.plot.lineScatter.currentFig.axes.set_xscale("log")
ParaDRAM - NOTE: making the lineScatter plot...
done in 0.166998 seconds.
Setting or modifying the properties of a scatter or lineScatter plot is identical to the line plot. By default, the sampler makes a scatter plot of only the first sampled variable as a function of the order by which the points appear in the sample file. Alternatively, we may want to plot individual variables against each other. For example, to make scatter plots of all variables against the logFunc,
sample.plot.scatter.reset()
sample.plot.scatter.ycolumns = "SampleLogFunc"
for variable in pmpd.sampleList[0].df.columns[1:]: # exclude the SampleLogFunc column
    sample.plot.scatter.xcolumns = variable
    sample.plot.scatter()
ParaDRAM - NOTE: making the scatter plot...
done in 0.137001 seconds. ParaDRAM - NOTE: making the scatter plot...
done in 0.136036 seconds. ParaDRAM - NOTE: making the scatter plot...
done in 0.122964 seconds. ParaDRAM - NOTE: making the scatter plot...
done in 0.137002 seconds.
To make bivariate plots of sampled variables against each other, instead of traceplots, simply specify the columns of data to plot,
sample.plot.lineScatter( xcolumns = 1, ycolumns = 2)
ParaDRAM - NOTE: making the lineScatter plot...
done in 0.138999 seconds.
To make 3D plots of the data, we can try the ParaMonte 3D visualization tools, such as line3, scatter3, or lineScatter3,
sample.plot.lineScatter3()
ParaDRAM - NOTE: making the lineScatter3 plot...
done in 0.089024 seconds.
To make kernel density contour plots of the sampled points, we can try
sample.plot.contour()
ParaDRAM - NOTE: making the contour plot...
done in 0.465034 seconds.
To make kernel density filled contour plots of the sampled points, we can try,
sample.plot.contourf()
ParaDRAM - NOTE: making the contourf plot...
done in 0.497998 seconds.
Let's also add the mode of the distribution as a target to the plot,
sample.plot.contourf()
sample.plot.contourf.target()
ParaDRAM - NOTE: making the contourf plot...
done in 0.496035 seconds.
sample.plot.contourf.reset()
sample.plot.contourf()
sample.plot.contourf.target()
ParaDRAM - NOTE: making the contourf plot...
C:\Users\shahmoradia\AppData\Roaming\Python\Python37\site-packages\paramonte\vis\_BasePlot.py:634: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`). self.currentFig.figure = plt.figure( **vars(self.figure.kws) )
done in 0.493001 seconds.
We can add as many targets as desired. For example, let's add the point corresponding to the maximum of each of the two variables in the above plot, and let's give it a purple color,
max1 = sample.df.SampleVariable1.max()
max2 = sample.df.SampleVariable2.max()
sample.plot.contourf.target.axvline.kws.linestyle = "--"
sample.plot.contourf.target.axhline.kws.linestyle = "--"
sample.plot.contourf.target.axvline.kws.color = "purple"
sample.plot.contourf.target.axhline.kws.color = "purple"
sample.plot.contourf.target.scatter.kws.color = "purple"
sample.plot.contourf.target( value = [max1, max2] )
To make 3D kernel density contour plots of the sampled points, we can try,
sample.plot.contour3()
ParaDRAM - NOTE: making the contour3 plot...
done in 0.537002 seconds.
By default, the kernel density plot displays SampleVariable2 vs. SampleVariable1, if no other specifications are made by the user.
There are also the kdeplot1() and kdeplot2() visualization tools that make calls to the kdeplot() function of the seaborn library. However, those function implementations in seaborn are currently computationally too slow, so we will not display them here, though you can try them yourself.
There is also the ParaMonte jointplot() visualization tool, which implicitly calls the jointplot() function of the seaborn library.
sample.plot.jointplot(xcolumns = 3, ycolumns = 4)
ParaDRAM - NOTE: making the jointplot plot...
done in 7.438025 seconds.
We can also make a grid plot of all of the variables against each other via,
sample.plot.grid()
ParaDRAM - NOTE: making the grid plot...
generating subplot #1: (0,0) out of 16... done in 0.042745 seconds. generating subplot #2: (0,1) out of 16... done in 0.243962 seconds. generating subplot #3: (0,2) out of 16... done in 0.225037 seconds. generating subplot #4: (0,3) out of 16... done in 0.223002 seconds. generating subplot #5: (1,0) out of 16... done in 0.465013 seconds. generating subplot #6: (1,1) out of 16... done in 0.042745 seconds. generating subplot #7: (1,2) out of 16... done in 0.206019 seconds. generating subplot #8: (1,3) out of 16... done in 0.212969 seconds. generating subplot #9: (2,0) out of 16... done in 0.491033 seconds. generating subplot #10: (2,1) out of 16... done in 0.407998 seconds. generating subplot #11: (2,2) out of 16... done in 0.042745 seconds. generating subplot #12: (2,3) out of 16... done in 0.236965 seconds. generating subplot #13: (3,0) out of 16... done in 0.439003 seconds. generating subplot #14: (3,1) out of 16... done in 0.4 seconds. generating subplot #15: (3,2) out of 16... done in 0.475999 seconds. generating subplot #16: (3,3) out of 16... done in 0.042745 seconds. generating colorbar... done in 0.028029 seconds.
Since the grid plot is rather computationally demanding, by default it adds only up to 5 columns of data to the plot,
sample.plot.grid.columns
Index(['SampleLogFunc', 'SampleVariable1', 'SampleVariable2', 'SampleVariable3'], dtype='object')
This behavior can be overridden by assigning more or fewer columns of the data to the columns attribute of the grid plot, like,
sample.plot.grid.columns = [1,2,3] # indices and column names can be mixed
sample.plot.grid.plotType.lower.value = "contourf"
sample.plot.grid()
ParaDRAM - NOTE: making the grid plot...
generating subplot #1: (0,0) out of 9... done in 0.044655 seconds. generating subplot #2: (0,1) out of 9... done in 0.17302 seconds. generating subplot #3: (0,2) out of 9... done in 0.158965 seconds. generating subplot #4: (1,0) out of 9... done in 0.527028 seconds. generating subplot #5: (1,1) out of 9... done in 0.044655 seconds. generating subplot #6: (1,2) out of 9... done in 0.150036 seconds. generating subplot #7: (2,0) out of 9... done in 0.481963 seconds. generating subplot #8: (2,1) out of 9... done in 0.482001 seconds. generating subplot #9: (2,2) out of 9... done in 0.044655 seconds. generating colorbar... done in 0.024001 seconds.
If one corner of the plot is not needed, we can also simply hide it by disabling it,
sample.plot.grid.columns = [1,2,3] # indices and column names can be mixed
sample.plot.grid.plotType.lower.value = "contourf"
sample.plot.grid.plotType.upper.enabled = False # disable the upper triangle
sample.plot.grid()
ParaDRAM - NOTE: making the grid plot...
generating subplot #1: (0,0) out of 9... done in 0.042667 seconds. generating subplot #2: (0,1) out of 9... done in 0.0 seconds. generating subplot #3: (0,2) out of 9... done in 0.0 seconds. generating subplot #4: (1,0) out of 9... done in 0.478046 seconds. generating subplot #5: (1,1) out of 9... done in 0.042667 seconds. generating subplot #6: (1,2) out of 9... done in 0.0 seconds. generating subplot #7: (2,0) out of 9... done in 0.456987 seconds. generating subplot #8: (2,1) out of 9... done in 0.450985 seconds. generating subplot #9: (2,2) out of 9... done in 0.042667 seconds. generating colorbar... done in 0.032963 seconds.
You can also hide the upper triangle, the lower triangle, or the colorbar (or all three) within the current figure by using the hide() method of the grid object,
sample.plot.grid.reset()
sample.plot.grid()
sample.plot.grid.hide('lower')
ParaDRAM - NOTE: making the grid plot...
generating subplot #1: (0,0) out of 16... done in 0.04575 seconds. generating subplot #2: (0,1) out of 16... done in 0.250976 seconds. generating subplot #3: (0,2) out of 16... done in 0.227988 seconds. generating subplot #4: (0,3) out of 16... done in 0.219035 seconds. generating subplot #5: (1,0) out of 16... done in 0.428999 seconds. generating subplot #6: (1,1) out of 16... done in 0.04575 seconds. generating subplot #7: (1,2) out of 16... done in 0.218967 seconds. generating subplot #8: (1,3) out of 16... done in 0.234004 seconds. generating subplot #9: (2,0) out of 16... done in 0.426001 seconds. generating subplot #10: (2,1) out of 16... done in 0.457033 seconds. generating subplot #11: (2,2) out of 16... done in 0.04575 seconds. generating subplot #12: (2,3) out of 16... done in 0.226945 seconds. generating subplot #13: (3,0) out of 16... done in 0.420031 seconds. generating subplot #14: (3,1) out of 16... done in 0.426963 seconds. generating subplot #15: (3,2) out of 16... done in 0.418039 seconds. generating subplot #16: (3,3) out of 16... done in 0.04575 seconds. generating colorbar... done in 0.021963 seconds.
To show the triangle again, you can call the show() method. To see what input options are available to hide() or show(), get help like the following,
sample.plot.grid.helpme("hide")
Hides the requested part of the grid plot. **Parameters** part A string with the following possible values: "lower" hides the lower triangle of the grid plot. "upper" hides the upper triangle of the grid plot. "diag" hides the diagonal of the grid plot. "all" hides all grid plots and the colorbar. "colorbar" hides the colorbar of the grid plot. The string can also be a mix of the above keywords, separated by the ``+`` sign or some other delimiter. For example, ``"lower+upper+colorbar"`` **Returns** None
We can more formally ensure the i.i.d. property of the resulting sample from the simulation by computing and visualizing the autocorrelation of the sampled points. Whenever a ParaDRAM output sample file is read, the sampler automatically computes the autocorrelation of sampled points along each dimension of the domain of the objective function.
sample.stats.autocorr()
sample.stats.autocorr.plot.line()
ParaDRAM - NOTE: adding the autocrrelation graphics tools... ParaDRAM - NOTE: creating a line plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a scatter plot object from scratch... done in 0.001001 seconds. ParaDRAM - NOTE: creating a lineScatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: making the line plot...
done in 0.291034 seconds.
The above autocorrelation plot is reassuring, since the sampled points do not appear to be correlated with each other at all. This is because the ParaDRAM sampler, by default, applies as many rounds of aggressive refinement to the raw (verbose) Markov chain as it deems necessary to remove any residual correlations from the final refined sample that we plotted above.
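For intuition, the quantity plotted here, the normalized sample autocorrelation at each lag, can be estimated with plain NumPy. This is only a generic sketch of the standard estimator, not the ParaMonte implementation:

```python
import numpy as np

def autocorr(x, nlags):
    """Normalized sample autocorrelation of a 1D sequence, for lags 0..nlags."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    var = np.dot(x, x) / n
    return np.array([np.dot(x[:n - k], x[k:]) / (n * var) for k in range(nlags + 1)])

rng = np.random.default_rng(12345)
iid = rng.standard_normal(10000)  # an i.i.d. sample
acf = autocorr(iid, nlags=5)
print(acf[0])                 # 1.0 at lag zero, by construction
print(np.abs(acf[1:]).max())  # near zero at all positive lags for i.i.d. data
```

For a truly i.i.d. sample, the autocorrelation at every positive lag fluctuates around zero with a standard error of roughly 1/sqrt(N), which is the behavior seen in the refined-sample plot above.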
We can compare the autocorrelation of the refined sample above with that of the raw (verbose) Markov chain,
markovChainList = pmpd.readMarkovChain(renabled = True) # object return enabled.
markovChain = markovChainList[0]
ParaDRAM - WARNING: The ``delimiter`` is neither given as input to ``readMarkovchain()`` ParaDRAM - WARNING: nor set as a simulation specification of the ParaDRAM object. ParaDRAM - WARNING: This information is essential, otherwise how could the output files be parsed? ParaDRAM - WARNING: For now, the ParaDRAM sampler will assume a comma-separated ParaDRAM - WARNING: file format for the contents of the chain file(s) to be parsed. ParaDRAM - NOTE: 1 files detected matching the pattern: "./out/mvn_serial*_chain.txt" ParaDRAM - NOTE: processing chain file: D:\Dropbox\Projects\20180101_ParaMonte\git\paramontex\Python\Jupyter\sampling_multivariate_normal_distribution_via_paradram\out\mvn_serial_process_1_chain.txt ParaDRAM - NOTE: reading the file contents... done in 4.301028 seconds. ParaDRAM - NOTE: ndim = 4, count = 101916 ParaDRAM - NOTE: parsing file contents... ParaDRAM - NOTE: computing the sample correlation matrix... ParaDRAM - NOTE: adding the correlation graphics tools... ParaDRAM - NOTE: creating a heatmap plot object from scratch... done in 0.018999 seconds. ParaDRAM - NOTE: computing the sample covariance matrix... ParaDRAM - NOTE: adding the covariance graphics tools... ParaDRAM - NOTE: creating a heatmap plot object from scratch... done in 0.016 seconds. ParaDRAM - NOTE: computing the sample autocorrelations... ParaDRAM - NOTE: adding the autocrrelation graphics tools... ParaDRAM - NOTE: creating a line plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a scatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a lineScatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: adding the graphics tools... ParaDRAM - NOTE: creating a line plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a scatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a lineScatter plot object from scratch... done in 0.0 seconds. 
ParaDRAM - NOTE: creating a line3 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a scatter3 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a lineScatter3 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a jointplot plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a histplot plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a kdeplot1 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a kdeplot2 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a contour3 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a contourf plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a contour plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a grid plot object from scratch... done in 0.000998 seconds. ParaDRAM - NOTE: The processed markovChain files are now stored in the output variable as a ParaDRAM - NOTE: Python list. For example, to access the contents of the first (or the only) ParaDRAM - NOTE: markovChain file stored in an output variable named markovChainList, try: ParaDRAM - NOTE: ParaDRAM - NOTE: markovChainList[0].df ParaDRAM - NOTE: ParaDRAM - NOTE: where you will have to replace `pmpd` with your ParaDRAM instance name. ParaDRAM - NOTE: To access the plotting tools, try: ParaDRAM - NOTE: ParaDRAM - NOTE: markovChainList[0].plot.<PRESS TAB TO SEE THE LIST OF PLOTS> ParaDRAM - NOTE: ParaDRAM - NOTE: For example, ParaDRAM - NOTE: ParaDRAM - NOTE: markovChainList[0].plot.line() # to make 2D line plots. ParaDRAM - NOTE: markovChainList[0].plot.scatter() # to make 2D scatter plots. ParaDRAM - NOTE: markovChainList[0].plot.lineScatter() # to make 2D line-scatter plots. ParaDRAM - NOTE: markovChainList[0].plot.line3() # to make 3D line plots. ParaDRAM - NOTE: markovChainList[0].plot.scatter3() # to make 3D scatter plots. 
ParaDRAM - NOTE: markovChainList[0].plot.lineScatter3() # to make 3D line-scatter plots. ParaDRAM - NOTE: markovChainList[0].plot.contour() # to make fast 2D kernel density plots. ParaDRAM - NOTE: markovChainList[0].plot.contourf() # to make fast 2D kernel density filled contour plots. ParaDRAM - NOTE: markovChainList[0].plot.contour3() # to make fast 3D kernel density contour plots. ParaDRAM - NOTE: markovChainList[0].plot.histplot() # to make seaborn 1D distribution plots. ParaDRAM - NOTE: markovChainList[0].plot.kdeplot1() # to make seaborn 1D kernel density plots. ParaDRAM - NOTE: markovChainList[0].plot.kdeplot2() # to make seaborn 2D kernel density plots. ParaDRAM - NOTE: markovChainList[0].plot.grid() # to make GridPlot ParaDRAM - NOTE: ParaDRAM - NOTE: To plot or inspect the variable autocorrelations or the correlation/covariance matrices, try: ParaDRAM - NOTE: ParaDRAM - NOTE: markovChainList[0].stats.<PRESS TAB TO SEE THE LIST OF COMPONENTS> ParaDRAM - NOTE: ParaDRAM - NOTE: For more information and examples on the usage, visit: ParaDRAM - NOTE: ParaDRAM - NOTE: https://www.cdslab.org/paramonte
markovChain.stats.autocorr()
markovChain.stats.autocorr.plot.line()
ParaDRAM - NOTE: adding the autocrrelation graphics tools... ParaDRAM - NOTE: creating a line plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a scatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a lineScatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: making the line plot...
done in 0.588002 seconds.
Compare the above plot to the corresponding plot for the refined sample earlier. Unlike the refined sample, the Markov chain is significantly correlated with itself along each dimension. The large autocorrelation seen for SampleLogFunc is due to the fact that we started the MCMC sampling from a very bad, low-probability location, which is also visible in the grid plots above.
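The effect of refinement can be mimicked in a self-contained way by thinning a strongly autocorrelated sequence, here a simple AR(1) process. This is only a conceptual sketch; ParaDRAM's refinement chooses its decorrelation strategy adaptively rather than using a fixed thinning factor:

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 sample autocorrelation of a 1D sequence."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

# Simulate a strongly correlated AR(1) chain: x[t] = 0.95 * x[t-1] + noise.
rng = np.random.default_rng(0)
n, rho = 100000, 0.95
noise = rng.standard_normal(n)
x = np.empty(n)
x[0] = 0.0
for t in range(1, n):
    x[t] = rho * x[t - 1] + noise[t]

print(lag1_autocorr(x))        # close to 0.95: heavily autocorrelated
print(lag1_autocorr(x[::50]))  # thinned by a factor of 50: close to 0.95**50 ~ 0.08
```

Keeping every 50th point shrinks the lag-1 autocorrelation from about 0.95 to nearly zero, at the cost of a much smaller sample, which is the same trade-off the refined sample makes relative to the raw chain.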
To get the statistics of the maximum of the function, try,
print( "maxLogFunc: {}".format(sample.stats.maxLogFunc.value) )
print( "The location of maxLogFunc: {}".format(sample.stats.maxLogFunc.state.values) )
maxLogFunc: -2.5858423 The location of maxLogFunc: [-10.083953 14.828449 20.037333 0.12419514]
which is again reassuring: the maximum of an MVN density occurs at its mean vector, and the ParaDRAM sampler's estimated location of maxLogFunc above is very close to the mean of the sampled points reported below.
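As a quick self-contained check with SciPy that the log-density of an MVN peaks at its mean (the mean vector and identity covariance below are purely illustrative, not the exact target used in this notebook):

```python
import numpy as np
from scipy.stats import multivariate_normal

# A hypothetical MVN target; mean vector and covariance are for illustration only.
mean = np.array([-10.0, 15.0, 20.0, 0.0])
mvn = multivariate_normal(mean=mean, cov=np.eye(4))

at_mean = mvn.logpdf(mean)
rng = np.random.default_rng(1)
perturbed = mvn.logpdf(mean + 0.5 * rng.standard_normal((100, 4)))
print(at_mean)                    # log-density at the mean vector
print(perturbed.max() < at_mean)  # True: no perturbed point beats the mode
```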
Further statistics of the resulting sample (and of the raw verbose Markov chain) can be computed from the underlying pandas DataFrame, for example,
sample.df.describe()
| | SampleLogFunc | SampleVariable1 | SampleVariable2 | SampleVariable3 | SampleVariable4 |
|---|---|---|---|---|---|
| count | 8924.000000 | 8924.000000 | 8924.000000 | 8924.000000 | 8924.000000 |
| mean | -4.570621 | -10.013841 | 14.988699 | 19.994788 | -0.006943 |
| std | 1.400108 | 0.999499 | 1.007615 | 1.001679 | 1.006100 |
| min | -13.263951 | -13.558018 | 10.992699 | 16.534873 | -4.150198 |
| 25% | -5.271783 | -10.703673 | 14.300603 | 19.330915 | -0.674777 |
| 50% | -4.251049 | -10.035884 | 14.987852 | 20.001485 | -0.009479 |
| 75% | -3.519423 | -9.338450 | 15.684947 | 20.655977 | 0.664621 |
| max | -2.585842 | -6.221306 | 18.829987 | 23.608027 | 3.510943 |
Some statistics are also automatically computed and reported in the output report file of every ParaDRAM simulation (the file ending with the suffix *_report.txt). One can either directly inspect the contents of this file or use the readReport() method of the ParaDRAM sampler object, like,
reportList = pmpd.readReport(renabled = True) # object return enabled
report = reportList[0]
ParaDRAM - NOTE: 1 files detected matching the pattern: "./out/mvn_serial*_report.txt" ParaDRAM - NOTE: processing report file: D:\Dropbox\Projects\20180101_ParaMonte\git\paramontex\Python\Jupyter\sampling_multivariate_normal_distribution_via_paradram\out\mvn_serial_process_1_report.txt ParaDRAM - NOTE: reading the file contents... ParaDRAM - NOTE: parsing the report file contents... done in 0.010997 seconds. ParaDRAM - NOTE: The processed report files are now stored in the output variable as a ParaDRAM - NOTE: Python list. For example, to access the contents of the first (or the only) ParaDRAM - NOTE: report file stored in an output variable named reportList, try: ParaDRAM - NOTE: ParaDRAM - NOTE: reportList[0].contents.print() ParaDRAM - NOTE: ParaDRAM - NOTE: where you will have to replace `pmpd` with your ParaDRAM instance name. ParaDRAM - NOTE: To access the simulation statistics and information, examine the contents of the ParaDRAM - NOTE: components of the following structures: ParaDRAM - NOTE: ParaDRAM - NOTE: reportList[0].contents.print() # to print the contents of the report file. ParaDRAM - NOTE: reportList[0].setup # to get information about the simulation setup. ParaDRAM - NOTE: reportList[0].stats.time # to get the timing information of the simulation. ParaDRAM - NOTE: reportList[0].stats.chain # to get the statistics of the simulation output sample. ParaDRAM - NOTE: reportList[0].stats.numFuncCall # to get information about the number of function calls. ParaDRAM - NOTE: reportList[0].stats.parallelism # to get information about the simulation parallelism. ParaDRAM - NOTE: reportList[0].spec # to get the simulation specification in the report file. ParaDRAM - NOTE: ParaDRAM - NOTE: For more information and examples on the usage, visit: ParaDRAM - NOTE: ParaDRAM - NOTE: https://www.cdslab.org/paramonte
print(report.stats.chain.refined.ess.value)
print(report.stats.chain.refined.ess.description)
8924 This is the estimated Effective (decorrelated) Sample Size (ESS) of the final refined chain.
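The reported ESS can be understood through a common generic estimator, ESS = N / (1 + 2 * sum of positive-lag autocorrelations). This is a textbook formula, not necessarily the exact estimator the ParaMonte library uses internally:

```python
import numpy as np

def effective_sample_size(x, max_lag=1000):
    """Generic ESS estimate: N / (1 + 2 * sum of autocorrelations), with the
    sum truncated at the first non-positive autocorrelation."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    denom = np.dot(x, x)
    acsum = 0.0
    for k in range(1, min(max_lag, n)):
        rho = np.dot(x[:n - k], x[k:]) / denom
        if rho <= 0.0:
            break
        acsum += rho
    return n / (1.0 + 2.0 * acsum)

rng = np.random.default_rng(7)
iid = rng.standard_normal(5000)  # independent draws
ar = np.empty(5000)
ar[0] = 0.0
for t in range(1, 5000):
    ar[t] = 0.9 * ar[t - 1] + rng.standard_normal()

print(effective_sample_size(iid))  # close to the nominal size of 5000
print(effective_sample_size(ar))   # far smaller: the chain is autocorrelated
```

For an i.i.d. sample the ESS is essentially the sample size itself, which is why the refined sample of 8924 decorrelated points above carries the full statistical weight of its nominal size.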
Now, to see the effects of the choice of the starting point of the MCMC sampler, we will take a look at the full chain (in compact format) of uniquely sampled points from the objective function,
pmpd.readChain()
chain = pmpd.chainList[0]
ParaDRAM - WARNING: The ``delimiter`` is neither given as input to ``readChain()`` ParaDRAM - WARNING: nor set as a simulation specification of the ParaDRAM object. ParaDRAM - WARNING: This information is essential, otherwise how could the output files be parsed? ParaDRAM - WARNING: For now, the ParaDRAM sampler will assume a comma-separated ParaDRAM - WARNING: file format for the contents of the chain file(s) to be parsed. ParaDRAM - NOTE: 1 files detected matching the pattern: "./out/mvn_serial*_chain.txt" ParaDRAM - NOTE: processing chain file: D:\Dropbox\Projects\20180101_ParaMonte\git\paramontex\Python\Jupyter\sampling_multivariate_normal_distribution_via_paradram\out\mvn_serial_process_1_chain.txt ParaDRAM - NOTE: reading the file contents... done in 0.056 seconds. ParaDRAM - NOTE: ndim = 4, count = 30000 ParaDRAM - NOTE: parsing file contents... ParaDRAM - NOTE: computing the sample correlation matrix... ParaDRAM - NOTE: adding the correlation graphics tools... ParaDRAM - NOTE: creating a heatmap plot object from scratch... done in 0.015963 seconds. ParaDRAM - NOTE: computing the sample covariance matrix... ParaDRAM - NOTE: adding the covariance graphics tools... ParaDRAM - NOTE: creating a heatmap plot object from scratch... done in 0.017973 seconds. ParaDRAM - NOTE: computing the sample autocorrelations... ParaDRAM - NOTE: adding the autocrrelation graphics tools... ParaDRAM - NOTE: creating a line plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a scatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a lineScatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: adding the graphics tools... ParaDRAM - NOTE: creating a line plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a scatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a lineScatter plot object from scratch... done in 0.0 seconds. 
ParaDRAM - NOTE: creating a line3 plot object from scratch... done in 0.000999 seconds. ParaDRAM - NOTE: creating a scatter3 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a lineScatter3 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a jointplot plot object from scratch... done in 0.001002 seconds. ParaDRAM - NOTE: creating a histplot plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a kdeplot1 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a kdeplot2 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a contour3 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a contourf plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a contour plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a grid plot object from scratch... done in 0.001999 seconds. ParaDRAM - NOTE: The processed chain files are now stored in the newly-created ParaDRAM - NOTE: component `chainList` of the ParaDRAM object as a Python list. ParaDRAM - NOTE: For example, to access the contents of the first (or the only) chain file, try: ParaDRAM - NOTE: ParaDRAM - NOTE: pmpd.chainList[0].df ParaDRAM - NOTE: ParaDRAM - NOTE: where you will have to replace `pmpd` with your ParaDRAM instance name. ParaDRAM - NOTE: To access the plotting tools, try: ParaDRAM - NOTE: ParaDRAM - NOTE: pmpd.chainList[0].plot.<PRESS TAB TO SEE THE LIST OF PLOTS> ParaDRAM - NOTE: ParaDRAM - NOTE: For example, ParaDRAM - NOTE: ParaDRAM - NOTE: pmpd.chainList[0].plot.line() # to make 2D line plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.scatter() # to make 2D scatter plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.lineScatter() # to make 2D line-scatter plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.line3() # to make 3D line plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.scatter3() # to make 3D scatter plots. 
ParaDRAM - NOTE: pmpd.chainList[0].plot.lineScatter3() # to make 3D line-scatter plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.contour() # to make fast 2D kernel density plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.contourf() # to make fast 2D kernel density filled contour plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.contour3() # to make fast 3D kernel density contour plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.histplot() # to make seaborn 1D distribution plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.kdeplot1() # to make seaborn 1D kernel density plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.kdeplot2() # to make seaborn 2D kernel density plots. ParaDRAM - NOTE: pmpd.chainList[0].plot.grid() # to make GridPlot ParaDRAM - NOTE: ParaDRAM - NOTE: To plot or inspect the variable autocorrelations or the correlation/covariance matrices, try: ParaDRAM - NOTE: ParaDRAM - NOTE: pmpd.chainList[0].stats.<PRESS TAB TO SEE THE LIST OF COMPONENTS> ParaDRAM - NOTE: ParaDRAM - NOTE: For more information and examples on the usage, visit: ParaDRAM - NOTE: ParaDRAM - NOTE: https://www.cdslab.org/paramonte
Just as with the refined sample, we can also make the same types of plots for the compact and verbose (Markov) chains, for example, a grid plot,
chain.df.head()
| | ProcessID | DelayedRejectionStage | MeanAcceptanceRate | AdaptationMeasure | BurninLocation | SampleWeight | SampleLogFunc | SampleVariable1 | SampleVariable2 | SampleVariable3 | SampleVariable4 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1.000000 | 0.0 | 1 | 1 | -329.22317 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 1 | 1 | 0 | 0.666667 | 0.0 | 1 | 2 | -319.38776 | -1.131467 | -1.011470 | 2.440997 | -1.638004 |
| 2 | 1 | 0 | 0.600000 | 0.0 | 2 | 2 | -292.87086 | -2.708690 | -1.746430 | 2.887417 | 0.428331 |
| 3 | 1 | 0 | 0.571429 | 0.0 | 3 | 2 | -290.73338 | -4.913328 | -2.901997 | 2.351047 | -0.017945 |
| 4 | 1 | 0 | 0.625000 | 0.0 | 4 | 1 | -246.42733 | -6.311880 | -1.069209 | 3.394714 | 0.746134 |
chain.plot.grid(columns = [7,8,9,10]) # plot SampleVariable1 to SampleVariable4
ParaDRAM - NOTE: making the grid plot...
generating subplot #1: (0,0) out of 16... done in 0.046751 seconds. generating subplot #2: (0,1) out of 16... done in 0.321995 seconds. generating subplot #3: (0,2) out of 16... done in 0.289995 seconds. generating subplot #4: (0,3) out of 16... done in 0.267007 seconds. generating subplot #5: (1,0) out of 16... done in 0.412992 seconds. generating subplot #6: (1,1) out of 16... done in 0.046751 seconds. generating subplot #7: (1,2) out of 16... done in 0.284001 seconds. generating subplot #8: (1,3) out of 16... done in 0.275013 seconds. generating subplot #9: (2,0) out of 16... done in 0.445985 seconds. generating subplot #10: (2,1) out of 16... done in 0.438034 seconds. generating subplot #11: (2,2) out of 16... done in 0.046751 seconds. generating subplot #12: (2,3) out of 16... done in 0.285999 seconds. generating subplot #13: (3,0) out of 16... done in 0.473003 seconds. generating subplot #14: (3,1) out of 16... done in 0.414995 seconds. generating subplot #15: (3,2) out of 16... done in 0.448002 seconds. generating subplot #16: (3,3) out of 16... done in 0.046751 seconds. generating colorbar... done in 0.024999 seconds.
We can also add target values to this grid plot via the addTarget() method of the GridPlot object. A few target types are currently available out of the box: "mode", "mean", and "median". The "mode" option corresponds to the sampled state at which the maximum of the objective function occurs, and it is the default when no value is provided to addTarget().
chain.plot.grid(columns = [8,7,10,9]) # plot in random order, just for illustration.
chain.plot.grid.addTarget()
ParaDRAM - NOTE: making the grid plot...
generating subplot #1: (0,0) out of 16... done in 0.050248 seconds. generating subplot #2: (0,1) out of 16... done in 0.322016 seconds. generating subplot #3: (0,2) out of 16... done in 0.300982 seconds. generating subplot #4: (0,3) out of 16... done in 0.310997 seconds. generating subplot #5: (1,0) out of 16... done in 0.491 seconds. generating subplot #6: (1,1) out of 16... done in 0.050248 seconds. generating subplot #7: (1,2) out of 16... done in 0.301033 seconds. generating subplot #8: (1,3) out of 16... done in 0.271969 seconds. generating subplot #9: (2,0) out of 16... done in 0.44703 seconds. generating subplot #10: (2,1) out of 16... done in 0.424003 seconds. generating subplot #11: (2,2) out of 16... done in 0.050248 seconds. generating subplot #12: (2,3) out of 16... done in 0.300965 seconds. generating subplot #13: (3,0) out of 16... done in 0.452036 seconds. generating subplot #14: (3,1) out of 16... done in 0.409017 seconds. generating subplot #15: (3,2) out of 16... done in 0.420001 seconds. generating subplot #16: (3,3) out of 16... done in 0.050248 seconds. generating colorbar... done in 0.022053 seconds. generating target #0: (0,0) out of 16... done in 0.002998 seconds. generating target #0: (0,1) out of 16... done in 0.004968 seconds. generating target #0: (0,2) out of 16... done in 0.007946 seconds. generating target #0: (0,3) out of 16... done in 0.007011 seconds. generating target #0: (1,0) out of 16... done in 0.005988 seconds. generating target #0: (1,1) out of 16... done in 0.003013 seconds. generating target #0: (1,2) out of 16... done in 0.00499 seconds. generating target #0: (1,3) out of 16... done in 0.007 seconds. generating target #0: (2,0) out of 16... done in 0.005026 seconds. generating target #0: (2,1) out of 16... done in 0.005001 seconds. generating target #0: (2,2) out of 16... done in 0.002973 seconds. generating target #0: (2,3) out of 16... done in 0.004 seconds. generating target #0: (3,0) out of 16... 
done in 0.007022 seconds. generating target #0: (3,1) out of 16... done in 0.004001 seconds. generating target #0: (3,2) out of 16... done in 0.004976 seconds. generating target #0: (3,3) out of 16... done in 0.002002 seconds.
The color in the scatter plots represents the value of logFunc at each point, while the blue shading in the density plots represents the density of the sampled points at any given location. We can also add targets to these plots, representing a specific value of interest. For example, we may be interested in displaying the location of the maximum of logFunc on these plots; this is the default value loaded on the targets when the grid object is first created.
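For intuition, the three built-in target types can be reproduced from any DataFrame of sampled points with plain pandas. This is a generic sketch with synthetic data; the column names merely mimic the ParaDRAM output format:

```python
import numpy as np
import pandas as pd

# Synthetic "sample": points around a hypothetical mode at (5, -2).
rng = np.random.default_rng(3)
vals = rng.standard_normal((1000, 2)) + [5.0, -2.0]
logfunc = -0.5 * ((vals - [5.0, -2.0]) ** 2).sum(axis=1)
df = pd.DataFrame(np.column_stack([logfunc, vals]),
                  columns=["SampleLogFunc", "SampleVariable1", "SampleVariable2"])

# "mode": the sampled state with the highest objective-function value.
mode = df.loc[df["SampleLogFunc"].idxmax(), ["SampleVariable1", "SampleVariable2"]]
# "mean" and "median": componentwise statistics of the sampled states.
mean = df[["SampleVariable1", "SampleVariable2"]].mean()
median = df[["SampleVariable1", "SampleVariable2"]].median()
print(mode.values, mean.values, median.values)  # all close to (5, -2)
```

For a symmetric unimodal target such as the MVN, all three targets coincide (up to sampling noise), which is why any of them is a sensible reference point on the grid plot.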
By default, the ParaDRAM sampler adapts the proposal distribution throughout the entire simulation. Strictly speaking, this breaks the ergodicity condition of the Markov chain; however, if the adaptation continuously and progressively diminishes throughout the simulation, then we can still trust the simulation results. The alternative is to simply turn off the adaptation, but that is rarely a good strategy. A better solution is to keep the adaptivity on but simulate long enough to ensure that the Markov chain has converged to the target density.
A unique feature of the ParaDRAM sampler is that it provides a measure of the amount of adaptation of the proposal distribution by quantifying how much the proposal distribution progressively changes throughout the simulation. The computed quantity, reported in the AdaptationMeasure column of the output chain files, represents an upper bound on the total variation distance between each updated proposal distribution and the one used immediately before it.
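The total variation distance itself can be computed numerically for two candidate proposal densities. The following is a self-contained 1D sketch of the definition, TV(p, q) = 0.5 * integral of |p(x) - q(x)|, and is unrelated to ParaDRAM's internal computation:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def tv_distance(p, q, lo=-50.0, hi=50.0):
    """Total variation distance: 0.5 * integral of |p(x) - q(x)| dx."""
    val, _ = quad(lambda x: abs(p.pdf(x) - q.pdf(x)), lo, hi, limit=200)
    return 0.5 * val

old_proposal = norm(loc=0.0, scale=1.0)
new_proposal = norm(loc=0.1, scale=1.1)  # a slightly adapted proposal
print(tv_distance(old_proposal, old_proposal))  # ~ 0: identical proposals
print(tv_distance(old_proposal, new_proposal))  # small: little adaptation left
```

A diminishing AdaptationMeasure therefore means that each proposal update moves a vanishingly small amount of probability mass, which is the sense in which the chain becomes asymptotically Markovian.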
We can plot the AdaptationMeasure for this sampling problem to ensure that the adaptation of the proposal distribution happens progressively less and less throughout the simulation. If AdaptationMeasure does not diminish monotonically throughout the simulation, it would be a strong indication of the lack of convergence of the MCMC sampler to the target objective function.
markovChain.plot.scatter.scatter.kws.cmap = "winter"
markovChain.plot.scatter(ycolumns="AdaptationMeasure",ccolumns=[])
markovChain.plot.scatter.currentFig.axes.set_yscale("log")
markovChain.plot.scatter.currentFig.axes.set_ylim([1.e-5,1])
ParaDRAM - NOTE: making the scatter plot...
done in 0.22904 seconds.
(1e-05, 1)
The above curve is exactly the kind of diminishing adaptation that we would want to see in a ParaDRAM simulation: At the beginning, the sampler appears to struggle for a while to find the shape of the target objective function, but around step 100, it appears to have found the peak of the target and starts to quickly converge and adapt the shape of the proposal distribution of the Markov Chain sampler to the shape of the objective function.
We can also parse the contents of the ASCII-format restart file generated by the ParaDRAM sampler to gain insight into how the proposal distribution has been updated over the course of the simulation.
restartList = pmpd.readRestart(renabled = True) # object return enabled.
restart = restartList[0]
ParaDRAM - NOTE: 1 files detected matching the pattern: "./out/mvn_serial*_restart.txt" ParaDRAM - NOTE: processing restart file: D:\Dropbox\Projects\20180101_ParaMonte\git\paramontex\Python\Jupyter\sampling_multivariate_normal_distribution_via_paradram\out\mvn_serial_process_1_restart.txt ParaDRAM - NOTE: reading the file contents... . done in 0.389984 seconds. ParaDRAM - NOTE: adding the graphics tools... ParaDRAM - NOTE: creating a line plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a scatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a lineScatter plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a covmat2 plot object from scratch... done in 0.004002 seconds. ParaDRAM - NOTE: creating a covmat3 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a cormat2 plot object from scratch... done in 0.0 seconds. ParaDRAM - NOTE: creating a cormat3 plot object from scratch... done in 0.001005 seconds. done in 0.015038 seconds. ParaDRAM - NOTE: The processed restart files are now stored in the output variable as a ParaDRAM - NOTE: Python list. For example, to access the contents of the first (or the only) ParaDRAM - NOTE: restart file stored in an output variable named restartList, try: ParaDRAM - NOTE: ParaDRAM - NOTE: restartList[0].contents ParaDRAM - NOTE: ParaDRAM - NOTE: where you will have to replace `pmpd` with your ParaDRAM instance name. ParaDRAM - NOTE: To access the plotting tools, try: ParaDRAM - NOTE: ParaDRAM - NOTE: restartList[0].plot.<PRESS TAB TO SEE THE LIST OF PLOTS> ParaDRAM - NOTE: ParaDRAM - NOTE: For example, ParaDRAM - NOTE: ParaDRAM - NOTE: restartList[0].plot.line() # to make bivariate line plots. ParaDRAM - NOTE: restartList[0].plot.scatter() # to make bivariate scatter plots. ParaDRAM - NOTE: restartList[0].plot.lineScatter() # to make bivariate line-scatter plots. 
ParaDRAM - NOTE: restartList[0].plot.covmat2() # to make proposal covariance evolution 2D plots. ParaDRAM - NOTE: restartList[0].plot.covmat3() # to make proposal covariance evolution 3D plots. ParaDRAM - NOTE: restartList[0].plot.cormat2() # to make proposal correlation evolution 2D plots. ParaDRAM - NOTE: restartList[0].plot.cormat3() # to make proposal correlation evolution 3D plots. ParaDRAM - NOTE: ParaDRAM - NOTE: For more information and examples on the usage, visit: ParaDRAM - NOTE: ParaDRAM - NOTE: https://www.cdslab.org/paramonte
restart.plot.cormat2.reset()
restart.plot.cormat2()
restart.plot.cormat2.reset()
restart.plot.cormat2(dimensionPair = [2,3])
ParaDRAM - NOTE: making the cormat2 plot...
done in 0.384984 seconds. ParaDRAM - NOTE: making the cormat2 plot...
done in 0.512034 seconds.
restart.plot.covmat2.reset()
restart.plot.covmat2()
restart.plot.covmat2.reset()
restart.plot.covmat2(dimensionPair = [2,3])
ParaDRAM - NOTE: making the covmat2 plot...
done in 0.340034 seconds. ParaDRAM - NOTE: making the covmat2 plot...
done in 0.329 seconds.
restart.plot.covmat3.reset()
restart.plot.covmat3()
restart.plot.covmat3.reset()
restart.plot.covmat3(dimensionPair = [2,3])
ParaDRAM - NOTE: making the covmat3 plot...
done in 0.359999 seconds. ParaDRAM - NOTE: making the covmat3 plot...
done in 0.35801 seconds.
As seen in the above plots, the proposal adaptations gradually diminish toward the end of the simulation, and the shape of the proposal distribution stabilizes relatively fast.
To construct and visualize the correlation matrix of the sampled points, we can try,
sample.stats.cormat.plot.heatmap()
ParaDRAM - NOTE: making the heatmap plot...
By default, the covariance and correlation matrices are precomputed at the time of reading the data. In particular, the correlation matrix is computed via the Pearson method. We can easily recompute the correlation matrix using either Spearman's or Kendall's rank correlation coefficients, like,
sample.stats.cormat(method="spearman",reself=True).plot.heatmap()
ParaDRAM - NOTE: adding the correlation graphics tools... ParaDRAM - NOTE: creating a heatmap plot object from scratch... done in 0.017 seconds. ParaDRAM - NOTE: making the heatmap plot...
The input argument reself requests the function to return an instance of the self object as the function output. To visualize the covariance matrix, we can try,
sample.stats.covmat.plot.heatmap()
ParaDRAM - NOTE: making the heatmap plot...
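The same Pearson/Spearman/Kendall choice, and the covariance matrix, can be reproduced with plain pandas on any DataFrame of sampled points. This generic sketch uses synthetic correlated data:

```python
import numpy as np
import pandas as pd

# Two synthetic variables with a true (Pearson) correlation of 0.8.
rng = np.random.default_rng(42)
x = rng.standard_normal(2000)
df = pd.DataFrame({"SampleVariable1": x,
                   "SampleVariable2": 0.8 * x + 0.6 * rng.standard_normal(2000)})

pearson = df.corr(method="pearson")
spearman = df.corr(method="spearman")  # rank-based, robust to monotone transforms
kendall = df.corr(method="kendall")    # rank-based, pairwise concordance
covmat = df.cov()                      # sample covariance matrix

print(pearson.loc["SampleVariable1", "SampleVariable2"])  # close to 0.8
print(np.allclose(covmat, covmat.T))                      # True: symmetric
```

The rank-based methods are preferable when the dependence between variables is monotone but nonlinear, since they are invariant under monotone transformations of each variable.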
Note that the same plots can be also made for the compact and verbose (Markov) chains. For brevity, we do not show them here.
There are many more functionalities and features of the ParaMonte library that were neither explored nor mentioned in this example Jupyter notebook. You can explore them by checking the existing components of each attribute of the ParaDRAM sampler class and by visiting the ParaMonte library's documentation website.