Standard Bayesian Numerical Demo

Standard demo of the numerical Bayesian calculator

This is a standard demo that can be used with any ROOT file prepared in the standard way. You specify:

  • name for input ROOT file
  • name of workspace inside ROOT file that holds model and data
  • name of ModelConfig that specifies details for calculator tools
  • name of dataset

With default parameters the macro will attempt to run the standard hist2workspace example and read the ROOT file that it produces.

The actual heart of the demo is only about 10 lines long.

The BayesianCalculator is based on Bayes's theorem and performs the integration using ROOT's numeric integration utilities

Author: Kyle Cranmer
This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Monday, February 24, 2020 at 03:28 AM.

In [1]:
%%cpp -d
#include "TFile.h"
#include "TROOT.h"
#include "RooWorkspace.h"
#include "RooAbsData.h"
#include "RooRealVar.h"

#include "RooUniform.h"
#include "RooStats/ModelConfig.h"
#include "RooStats/BayesianCalculator.h"
#include "RooStats/SimpleInterval.h"
#include "RooStats/RooStatsUtils.h"
#include "RooPlot.h"
#include "TSystem.h"

#include <cassert>
In [2]:
%%cpp -d
// This is a workaround to make sure the namespace is used inside functions
using namespace RooFit;
using namespace RooStats;
In [3]:
struct BayesianNumericalOptions {

   double confLevel = 0.95;      // interval CL
   TString integrationType = ""; // integration Type (default is adaptive (numerical integration)
   // possible values are "TOYMC" (toy MC integration, work when nuisances have a constraints pdf)
   //  "VEGAS" , "MISER", or "PLAIN"  (these are all possible MC integration)
   int nToys =
      10000; // number of toys used for the MC integrations - for Vegas should be probably set to an higher value
   bool scanPosterior =
      false; // flag to compute interval by scanning posterior (it is more robust but maybe less precise)
   bool plotPosterior = false; // plot posterior function after having computed the interval
   int nScanPoints = 50; // number of points for scanning the posterior (if scanPosterior = false it is used only for
                         // plotting). Use by default a low value to speed-up tutorial
   int intervalType = 1; // type of interval (0 is shortest, 1 central, 2 upper limit)
   double maxPOI = -999; // force a different value of POI for doing the scan (default is given value)
   double nSigmaNuisance = -1; // force integration of nuisance parameters to be within nSigma of their error (do first
                               // a model fit to find nuisance error)
};

BayesianNumericalOptions optBayes;

Arguments are defined.

In [4]:
const char *infile = "";
const char *workspaceName = "combined";
const char *modelConfigName = "ModelConfig";
const char *dataName = "obsData";

Option definitions

In [5]:
double confLevel = optBayes.confLevel;
TString integrationType = optBayes.integrationType;
int nToys = optBayes.nToys;
bool scanPosterior = optBayes.scanPosterior;
bool plotPosterior = optBayes.plotPosterior;
int nScanPoints = optBayes.nScanPoints;
int intervalType = optBayes.intervalType;
int maxPOI = optBayes.maxPOI;
double nSigmaNuisance = optBayes.nSigmaNuisance;

First part is just to access a user-defined file or create the standard example file if it doesn't exist

In [6]:
const char *filename = "";
if (!strcmp(infile, "")) {
   filename = "results/example_combined_GaussExample_model.root";
   bool fileExist = !gSystem->AccessPathName(filename); // note opposite return code
   // if file does not exists generate with histfactory
   if (!fileExist) {
#ifdef _WIN32
      cout << "HistFactory file cannot be generated on Windows - exit" << endl;
      return;
#endif
      // Normally this would be run on the command line
      cout << "will run standard hist2workspace example" << endl;
      gROOT->ProcessLine(".! prepareHistFactory .");
      gROOT->ProcessLine(".! hist2workspace config/example.xml");
      cout << "\n\n---------------------" << endl;
      cout << "Done creating example input" << endl;
      cout << "---------------------\n\n" << endl;
   }

} else
   filename = infile;

Try to open the file

In [7]:
TFile *file = TFile::Open(filename);
RooFit v3.60 -- Developed by Wouter Verkerke and David Kirkby 
                Copyright (C) 2000-2013 NIKHEF, University of California & Stanford University
                All rights reserved, please read http://roofit.sourceforge.net/license.txt

If input file was specified byt not found, quit

In [8]:
if (!file) {
   cout << "StandardRooStatsDemoMacro: Input file " << filename << " is not found" << endl;
   return;
}

Tutorial starts here

Get the workspace out of the file

In [9]:
RooWorkspace *w = (RooWorkspace *)file->Get(workspaceName);
if (!w) {
   cout << "workspace not found" << endl;
   return;
}

Get the modelconfig out of the file

In [10]:
ModelConfig *mc = (ModelConfig *)w->obj(modelConfigName);

Get the modelconfig out of the file

In [11]:
RooAbsData *data = w->data(dataName);

Make sure ingredients are found

In [12]:
if (!data || !mc) {
   w->Print();
   cout << "data or ModelConfig was not found" << endl;
   return;
}

create and use the BayesianCalculator to find and plot the 95% credible interval on the parameter of interest as specified in the model config

Before we do that, we must specify our prior it belongs in the model config, but it may not have been specified

In [13]:
RooUniform prior("prior", "", *mc->GetParametersOfInterest());
w->import(prior);
mc->SetPriorPdf(*w->pdf("prior"));
[#1] INFO:ObjectHandling -- RooWorkspace::import(combined) importing RooUniform::prior

Do without systematics mc->SetNuisanceParameters(RooArgSet() );

In [14]:
if (nSigmaNuisance > 0) {
   RooAbsPdf *pdf = mc->GetPdf();
   assert(pdf);
   RooFitResult *res =
      pdf->fitTo(*data, Save(true), Minimizer(ROOT::Math::MinimizerOptions::DefaultMinimizerType().c_str()),
                 Hesse(true), PrintLevel(ROOT::Math::MinimizerOptions::DefaultPrintLevel() - 1));

   res->Print();
   RooArgList nuisPar(*mc->GetNuisanceParameters());
   for (int i = 0; i < nuisPar.getSize(); ++i) {
      RooRealVar *v = dynamic_cast<RooRealVar *>(&nuisPar[i]);
      assert(v);
      v->setMin(TMath::Max(v->getMin(), v->getVal() - nSigmaNuisance * v->getError()));
      v->setMax(TMath::Min(v->getMax(), v->getVal() + nSigmaNuisance * v->getError()));
      std::cout << "setting interval for nuisance  " << v->GetName() << " : [ " << v->getMin() << " , "
                << v->getMax() << " ]" << std::endl;
   }
}

BayesianCalculator bayesianCalc(*data, *mc);
bayesianCalc.SetConfidenceLevel(confLevel); // 95% interval

Default of the calculator is central interval. here use shortest , central or upper limit depending on input doing a shortest interval might require a longer time since it requires a scan of the posterior function

In [15]:
if (intervalType == 0)
   bayesianCalc.SetShortestInterval(); // for shortest interval
if (intervalType == 1)
   bayesianCalc.SetLeftSideTailFraction(0.5); // for central interval
if (intervalType == 2)
   bayesianCalc.SetLeftSideTailFraction(0.); // for upper limit

if (!integrationType.IsNull()) {
   bayesianCalc.SetIntegrationType(integrationType); // set integrationType
   bayesianCalc.SetNumIters(nToys); // set number of iterations (i.e. number of toys for MC integrations)
}

In case of toymc make a nuisance pdf

In [16]:
if (integrationType.Contains("TOYMC")) {
   RooAbsPdf *nuisPdf = RooStats::MakeNuisancePdf(*mc, "nuisance_pdf");
   cout << "using TOYMC integration: make nuisance pdf from the model " << std::endl;
   nuisPdf->Print();
   bayesianCalc.ForceNuisancePdf(*nuisPdf);
   scanPosterior = true; // for ToyMC the posterior is scanned anyway so used given points
}

Compute interval by scanning the posterior function

In [17]:
if (scanPosterior)
   bayesianCalc.SetScanOfPosterior(nScanPoints);

RooRealVar *poi = (RooRealVar *)mc->GetParametersOfInterest()->first();
if (maxPOI != -999 && maxPOI > poi->getMin())
   poi->setMax(maxPOI);

SimpleInterval *interval = bayesianCalc.GetInterval();
[#1] INFO:Minization -- p.d.f. provides expected number of events, including extended term in likelihood.
[#1] INFO:Minization -- createNLL: caching constraint set under name CONSTR_OF_PDF_simPdf_FOR_OBS_channelCat:obs_x_channel1 with 4 entries
[#1] INFO:Minization --  Including the following constraint terms in minimization: (alpha_syst2Constraint,alpha_syst3Constraint,gamma_stat_channel1_bin_0_constraint,gamma_stat_channel1_bin_1_constraint)
[#1] INFO:Minization -- The following global observables have been defined: (nom_alpha_syst2,nom_alpha_syst3,nom_gamma_stat_channel1_bin_0,nom_gamma_stat_channel1_bin_1)
RooAbsTestStatistic::initSimMode: creating slave calculator #0 for state channel1 (2 dataset entries)
[#1] INFO:NumericIntegration -- RooRealIntegral::init(channel1_model_Int[obs_x_channel1]) using numeric integrator RooBinIntegrator to calculate Int(obs_x_channel1)
[#1] INFO:Fitting -- RooAbsTestStatistic::initSimMode: created 1 slave calculators.
[#1] INFO:NumericIntegration -- RooRealIntegral::init(channel1_model_Int[obs_x_channel1]) using numeric integrator RooBinIntegrator to calculate Int(obs_x_channel1)
[#1] INFO:Eval -- BayesianCalculator::GetPosteriorFunction :  nll value -1033.75 poi value = 1
[#1] INFO:Eval -- BayesianCalculator::GetPosteriorFunction : minimum of NLL vs POI for POI =  1.12121 min NLL = -1033.78
[#1] INFO:InputArguments -- BayesianCalculator:GetInterval Compute the interval from the posterior cdf 
[#1] INFO:NumericIntegration -- PosteriorCdfFunction - integral of posterior = 0.0560849 +/- 0.000423427
[#0] WARNING:Eval -- BayesianCalculator::GetInterval : 315 errors reported in evaluating log-likelihood function 
[#1] INFO:Eval -- BayesianCalculator::GetInterval - found a valid interval : [0.170773 , 2.33979 ]

Print out the interval on the first parameter of interest

In [18]:
cout << "\n>>>> RESULT : " << confLevel * 100 << "% interval on " << poi->GetName() << " is : ["
     << interval->LowerLimit() << ", " << interval->UpperLimit() << "] " << endl;
>>>> RESULT : 95% interval on SigXsecOverSM is : [0.170773, 2.33979] 

End in case plotting is not requested

In [19]:
if (!plotPosterior)
   return;

Make a plot since plotting may take a long time (it requires evaluating the posterior in many points) this command will speed up by reducing the number of points to plot - do 50

Ignore errors of pdf if is zero

In [20]:
RooAbsReal::setEvalErrorLoggingMode(RooAbsReal::Ignore);

cout << "\nDrawing plot of posterior function....." << endl;
Drawing plot of posterior function.....

Always plot using numer of scan points

In [21]:
bayesianCalc.SetScanOfPosterior(nScanPoints);

RooPlot *plot = bayesianCalc.GetPosteriorPlot();
plot->Draw();
[#1] INFO:Eval -- BayesianCalculator - scan posterior function in nbins = 50
Info in <TCanvas::MakeDefCanvas>:  created default TCanvas with name c1

Draw all canvases

In [22]:
%jsroot on
gROOT->GetListOfCanvases()->Draw()