# Multivariate Gaussian Test¶

Comparison of MCMC and PLC in a multi-variate gaussian problem

This tutorial produces an N-dimensional multivariate Gaussian with a non-trivial covariance matrix. By default N=4 (called "dim").

A subset of these are considered parameters of interest. This problem is tractable analytically.

We use this mainly as a test of Markov Chain Monte Carlo and we compare the result to the profile likelihood ratio.

We use the proposal helper to create a customized proposal function for this problem.

For N=4 and 2 parameters of interest it takes about 10-20 seconds and the acceptance rate is 37%

Author: Kevin Belasco and Kyle Cranmer
This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Sunday, February 16, 2020 at 03:42 AM.

In [1]:
%%cpp -d
#include "RooGlobalFunc.h"
#include <stdlib.h>
#include "TMatrixDSym.h"
#include "RooMultiVarGaussian.h"
#include "RooArgList.h"
#include "RooRealVar.h"
#include "TH2F.h"
#include "TCanvas.h"
#include "RooAbsReal.h"
#include "RooFitResult.h"
#include "TStopwatch.h"
#include "RooStats/MCMCCalculator.h"
#include "RooStats/MetropolisHastings.h"
#include "RooStats/MarkovChain.h"
#include "RooStats/ConfInterval.h"
#include "RooStats/MCMCInterval.h"
#include "RooStats/MCMCIntervalPlot.h"
#include "RooStats/ModelConfig.h"
#include "RooStats/ProposalHelper.h"
#include "RooStats/ProposalFunction.h"
#include "RooStats/PdfProposal.h"
#include "RooStats/ProfileLikelihoodCalculator.h"
#include "RooStats/LikelihoodIntervalPlot.h"
#include "RooStats/LikelihoodInterval.h"

using namespace std;

In [2]:
%%cpp -d
// This is a workaround to make sure the namespace is used inside functions
using namespace RooFit;
using namespace RooStats;


Arguments are defined.

In [3]:
Int_t dim = 4;
Int_t nPOI = 2;


Let's time this challenging example

In [4]:
TStopwatch t;
t.Start();

RooArgList xVec;
RooArgList muVec;
RooArgSet poi;

RooFit v3.60 -- Developed by Wouter Verkerke and David Kirkby
Copyright (C) 2000-2013 NIKHEF, University of California & Stanford University



Make the observable and means

In [5]:
Int_t i, j;
RooRealVar *x;
RooRealVar *mu_x;
for (i = 0; i < dim; i++) {
char *name = Form("x%d", i);
x = new RooRealVar(name, name, 0, -3, 3);

char *mu_name = Form("mu_x%d", i);
mu_x = new RooRealVar(mu_name, mu_name, 0, -2, 2);
}


Put them into the list of parameters of interest

In [6]:
for (i = 0; i < nPOI; i++) {
}


Make a covariance matrix that is all 1's

In [7]:
TMatrixDSym cov(dim);
for (i = 0; i < dim; i++) {
for (j = 0; j < dim; j++) {
if (i == j)
cov(i, j) = 3.;
else
cov(i, j) = 1.0;
}
}


Now make the multivariate gaussian

In [8]:
RooMultiVarGaussian mvg("mvg", "mvg", xVec, muVec, cov);


make a toy dataset

In [9]:
RooDataSet *data = mvg.generate(xVec, 100);


Now create the model config for this problem

In [10]:
RooWorkspace *w = new RooWorkspace("MVG");
ModelConfig modelConfig(w);
modelConfig.SetPdf(mvg);
modelConfig.SetParametersOfInterest(poi);


Setup calculators

Mcmc we want to setup an efficient proposal function using the covariance matrix from a fit to the data

In [11]:
RooFitResult *fit = mvg.fitTo(*data, Save(true));
ProposalHelper ph;
ph.SetVariables((RooArgSet &)fit->floatParsFinal());
ph.SetCovMatrix(fit->covarianceMatrix());
ph.SetUpdateProposalParameters(true);
ph.SetCacheSize(100);
ProposalFunction *pdfProp = ph.GetProposalFunction();

[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
**********
**    1 **SET PRINT           1
**********
**********
**********
PARAMETER DEFINITIONS:
NO.   NAME         VALUE      STEP SIZE      LIMITS
1 mu_x0        0.00000e+00  4.00000e-01   -2.00000e+00  2.00000e+00
2 mu_x1        0.00000e+00  4.00000e-01   -2.00000e+00  2.00000e+00
3 mu_x2        0.00000e+00  4.00000e-01   -2.00000e+00  2.00000e+00
4 mu_x3        0.00000e+00  4.00000e-01   -2.00000e+00  2.00000e+00
**********
**    3 **SET ERR         0.5
**********
**********
**    4 **SET PRINT           1
**********
**********
**    5 **SET STR           1
**********
NOW USING STRATEGY  1: TRY TO BALANCE SPEED AGAINST RELIABILITY
**********
**********
FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
START MIGRAD MINIMIZATION.  STRATEGY  1.  CONVERGENCE WHEN EDM .LT. 1.00e-03
FCN=707.822 FROM MIGRAD    STATUS=INITIATE       14 CALLS          15 TOTAL
EDM= unknown      STRATEGY= 1      NO ERROR MATRIX
EXT PARAMETER               CURRENT GUESS       STEP         FIRST
NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE
1  mu_x0        0.00000e+00   4.00000e-01   2.01358e-01  -9.83942e+00
2  mu_x1        0.00000e+00   4.00000e-01   2.01358e-01  -1.25127e+01
3  mu_x2        0.00000e+00   4.00000e-01   2.01358e-01   9.88572e+00
4  mu_x3        0.00000e+00   4.00000e-01   2.01358e-01  -4.09155e+00
ERR DEF= 0.5
MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=706.56 FROM MIGRAD    STATUS=CONVERGED      65 CALLS          66 TOTAL
EDM=1.95881e-05    STRATEGY= 1      ERROR MATRIX ACCURATE
EXT PARAMETER                                   STEP         FIRST
NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE
1  mu_x0        1.80728e-01   1.72980e-01   1.42731e-03  -2.59592e-02
2  mu_x1        2.07351e-01   1.72978e-01   1.42884e-03  -3.69006e-02
3  mu_x2       -1.59412e-02   1.72984e-01   1.42230e-03   3.21539e-02
4  mu_x3        1.23430e-01   1.72982e-01   1.42472e-03  -8.04153e-03
ERR DEF= 0.5
EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  4    ERR DEF=0.5
3.000e-02  9.997e-03  9.998e-03  9.997e-03
9.997e-03  3.000e-02  9.998e-03  9.997e-03
9.998e-03  9.998e-03  3.000e-02  9.998e-03
9.997e-03  9.997e-03  9.998e-03  3.000e-02
PARAMETER  CORRELATION COEFFICIENTS
NO.  GLOBAL      1      2      3      4
1  0.44715   1.000  0.333  0.333  0.333
2  0.44715   0.333  1.000  0.333  0.333
3  0.44717   0.333  0.333  1.000  0.333
4  0.44716   0.333  0.333  0.333  1.000
**********
**    7 **SET ERR         0.5
**********
**********
**    8 **SET PRINT           1
**********
**********
**    9 **HESSE        2000
**********
COVARIANCE MATRIX CALCULATED SUCCESSFULLY
FCN=706.56 FROM HESSE     STATUS=OK             23 CALLS          89 TOTAL
EDM=1.95881e-05    STRATEGY= 1      ERROR MATRIX ACCURATE
EXT PARAMETER                                INTERNAL      INTERNAL
NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE
1  mu_x0        1.80728e-01   1.72984e-01   2.85462e-04   9.04875e-02
2  mu_x1        2.07351e-01   1.72982e-01   2.85769e-04   1.03862e-01
3  mu_x2       -1.59412e-02   1.72987e-01   2.84460e-04  -7.97069e-03
4  mu_x3        1.23430e-01   1.72986e-01   2.84944e-04   6.17540e-02
ERR DEF= 0.5
EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  4    ERR DEF=0.5
3.000e-02  9.999e-03  9.999e-03  9.999e-03
9.999e-03  3.000e-02  9.999e-03  9.999e-03
9.999e-03  9.999e-03  3.000e-02  9.999e-03
9.999e-03  9.999e-03  9.999e-03  3.000e-02
PARAMETER  CORRELATION COEFFICIENTS
NO.  GLOBAL      1      2      3      4
1  0.44720   1.000  0.333  0.333  0.333
2  0.44719   0.333  1.000  0.333  0.333
3  0.44720   0.333  0.333  1.000  0.333
4  0.44720   0.333  0.333  0.333  1.000
[#1] INFO:Minization -- RooMinimizer::optimizeConst: deactivating const optimization


Now create the calculator

In [12]:
MCMCCalculator mc(*data, modelConfig);
mc.SetConfidenceLevel(0.95);
mc.SetNumBurnInSteps(100);
mc.SetNumIters(10000);
mc.SetNumBins(50);
mc.SetProposalFunction(*pdfProp);

MCMCInterval *mcInt = mc.GetInterval();
RooArgList *poiList = mcInt->GetAxes();

[#1] INFO:Minization -- createNLL: caching constraint set under name CONSTR_OF_PDF_mvg_FOR_OBS_x0:x1:x2:x3 with 0 entries
Metropolis-Hastings progress: ....................................................................................................
[#1] INFO:Eval -- Proposal acceptance rate: 37.1%
[#1] INFO:Eval -- Number of steps in chain: 3710


Now setup the profile likelihood calculator

In [13]:
ProfileLikelihoodCalculator plc(*data, modelConfig);
plc.SetConfidenceLevel(0.95);
LikelihoodInterval *plInt = (LikelihoodInterval *)plc.GetInterval();

[#1] INFO:Minization -- createNLL picked up cached consraints from workspace with 0 entries
[#0] PROGRESS:Minization -- ProfileLikelihoodCalcultor::DoGLobalFit - find MLE
[#0] PROGRESS:Minization -- ProfileLikelihoodCalcultor::DoMinimizeNLL - using Minuit / Migrad with strategy 1
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
[#1] INFO:Minization --
RooFitResult: minimized FCN value: 706.56, estimated distance to minimum: 1.16082e-08
covariance matrix quality: Full, accurate covariance matrix
Status : MINIMIZE=0

Floating Parameter    FinalValue +/-  Error
--------------------  --------------------------
mu_x0    1.8119e-01 +/-  1.73e-01
mu_x1    2.0792e-01 +/-  1.73e-01
mu_x2   -1.6078e-02 +/-  1.73e-01
mu_x3    1.2370e-01 +/-  1.73e-01



Make some plots

In [14]:
MCMCIntervalPlot mcPlot(*mcInt);

TCanvas *c1 = new TCanvas();
mcPlot.SetLineColor(kGreen);
mcPlot.SetLineWidth(2);
mcPlot.Draw();

LikelihoodIntervalPlot plPlot(plInt);
plPlot.Draw("same");

if (poiList->getSize() == 1) {
RooRealVar *p = (RooRealVar *)poiList->at(0);
Double_t ll = mcInt->LowerLimit(*p);
Double_t ul = mcInt->UpperLimit(*p);
cout << "MCMC interval: [" << ll << ", " << ul << "]" << endl;
}

if (poiList->getSize() == 2) {
RooRealVar *p0 = (RooRealVar *)poiList->at(0);
RooRealVar *p1 = (RooRealVar *)poiList->at(1);
TCanvas *scatter = new TCanvas();
Double_t ll = mcInt->LowerLimit(*p0);
Double_t ul = mcInt->UpperLimit(*p0);
cout << "MCMC interval on p0: [" << ll << ", " << ul << "]" << endl;
ll = mcInt->LowerLimit(*p0);
ul = mcInt->UpperLimit(*p0);
cout << "MCMC interval on p1: [" << ll << ", " << ul << "]" << endl;

// MCMC interval on p0: [-0.2, 0.6]
// MCMC interval on p1: [-0.2, 0.6]

mcPlot.DrawChainScatter(*p0, *p1);
scatter->Update();
}

t.Print();

[#1] INFO:Minization -- RooProfileLL::evaluate(nll_mvg_mvgData_Profile[mu_x0,mu_x1]) Creating instance of MINUIT
[#1] INFO:Minization -- RooProfileLL::evaluate(nll_mvg_mvgData_Profile[mu_x0,mu_x1]) determining minimum likelihood for current configurations w.r.t all observable
[#1] INFO:Minization -- RooProfileLL::evaluate(nll_mvg_mvgData_Profile[mu_x0,mu_x1]) minimum found at (mu_x0=0.181184, mu_x1=0.207917)
..[#1] INFO:Minization -- LikelihoodInterval - Finding the contour of mu_x0 ( 0 ) and mu_x1 ( 1 )
MCMC interval on p0: [-0.28, 0.6]
MCMC interval on p1: [-0.28, 0.6]
Real time 0:00:02, CP time 2.200


Draw all canvases

In [15]:
gROOT->GetListOfCanvases()->Draw()