Rs 3 0 1_Splot

SPlot tutorial

This tutorial shows an example of using SPlot to unfold two distributions. The physics context for the example is that we want to know the isolation distribution for real electrons from Z events and fake electrons from QCD. Isolation is our 'control' variable. To unfold them, we need a model for an uncorrelated variable that discriminates between Z and QCD. To do this, we use the invariant mass of two electrons. We model the Z with a Gaussian and the QCD with a falling exponential.

Note, since we don't have real data in this tutorial, we need to generate toy data. To do that we need a model for the isolation variable for both Z and QCD. This is only used to generate the toy data, and would not be needed if we had real data.

Author: Kyle Cranmer
This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Sunday, February 16, 2020 at 03:44 AM.

In [1]:
%%cpp -d
#include "RooRealVar.h"
#include "RooStats/SPlot.h"
#include "RooDataSet.h"
#include "RooRealVar.h"
#include "RooGaussian.h"
#include "RooExponential.h"
#include "RooChebychev.h"
#include "RooAddPdf.h"
#include "RooProdPdf.h"
#include "RooAddition.h"
#include "RooProduct.h"
#include "TCanvas.h"
#include "RooAbsPdf.h"
#include "RooFit.h"
#include "RooFitResult.h"
#include "RooWorkspace.h"
#include "RooConstVar.h"
#include <iomanip>

Use this order for safety on library loading

In [2]:
%%cpp -d
// This is a workaround to make sure the namespace is used inside functions
using namespace RooFit;
using namespace RooStats;

See below for implementation

In [3]:
void AddModel(RooWorkspace *);
void AddData(RooWorkspace *);
void DoSPlot(RooWorkspace *);
void MakePlots(RooWorkspace *);

In [4]:
%%cpp -d
void AddModel(RooWorkspace *ws)
{

   // Make models for signal (Higgs) and background (Z+jets and QCD)
   // In real life, this part requires an intelligent modeling
   // of signal and background -- this is only an example.

   // set range of observable
   Double_t lowRange = 0., highRange = 200.;

   // make a RooRealVar for the observables
   RooRealVar invMass("invMass", "M_{inv}", lowRange, highRange, "GeV");
   RooRealVar isolation("isolation", "isolation", 0., 20., "GeV");

   // --------------------------------------
   // make 2-d model for Z including the invariant mass
   // distribution  and an isolation distribution which we want to
   // unfold from QCD.
   std::cout << "make z model" << std::endl;
   // mass model for Z
   RooRealVar mZ("mZ", "Z Mass", 91.2, lowRange, highRange);
   RooRealVar sigmaZ("sigmaZ", "Width of Gaussian", 2, 0, 10, "GeV");
   RooGaussian mZModel("mZModel", "Z+jets Model", invMass, mZ, sigmaZ);
   // we know Z mass
   mZ.setConstant();
   // we leave the width of the Z free during the fit in this example.

   // isolation model for Z.  Only used to generate toy MC.
   // the exponential is of the form exp(c*x).  If we want
   // the isolation to decay an e-fold every R GeV, we use
   // c = -1/R.
   RooConstVar zIsolDecayConst("zIsolDecayConst", "z isolation decay  constant", -1);
   RooExponential zIsolationModel("zIsolationModel", "z isolation model", isolation, zIsolDecayConst);

   // make the combined Z model
   RooProdPdf zModel("zModel", "2-d model for Z", RooArgSet(mZModel, zIsolationModel));

   // --------------------------------------
   // make QCD model

   std::cout << "make qcd model" << std::endl;
   // mass model for QCD.
   // the exponential is of the form exp(c*x).  If we want
   // the mass to decay an e-fold every R GeV, we use
   // c = -1/R.
   // We can leave this parameter free during the fit.
   RooRealVar qcdMassDecayConst("qcdMassDecayConst", "Decay const for QCD mass spectrum", -0.01, -100, 100, "1/GeV");
   RooExponential qcdMassModel("qcdMassModel", "qcd Mass Model", invMass, qcdMassDecayConst);

   // isolation model for QCD.  Only used to generate toy MC
   // the exponential is of the form exp(c*x).  If we want
   // the isolation to decay an e-fold every R GeV, we use
   // c = -1/R.
   RooConstVar qcdIsolDecayConst("qcdIsolDecayConst", "Et resolution constant", -.1);
   RooExponential qcdIsolationModel("qcdIsolationModel", "QCD isolation model", isolation, qcdIsolDecayConst);

   // make the 2-d model
   RooProdPdf qcdModel("qcdModel", "2-d model for QCD", RooArgSet(qcdMassModel, qcdIsolationModel));

   // --------------------------------------
   // combined model

   // These variables represent the number of Z or QCD events
   // They will be fitted.
   RooRealVar zYield("zYield", "fitted yield for Z", 50, 0., 1000);
   RooRealVar qcdYield("qcdYield", "fitted yield for QCD", 100, 0., 1000);

   // now make the combined model
   std::cout << "make full model" << std::endl;
   RooAddPdf model("model", "z+qcd background models", RooArgList(zModel, qcdModel), RooArgList(zYield, qcdYield));

   // interesting for debugging and visualizing the model
   model.graphVizTree("fullModel.dot");

   std::cout << "import model" << std::endl;

   ws->import(model);
}

In [5]:
%%cpp -d
void AddData(RooWorkspace *ws)
{
   // Add a toy dataset

   // how many events do we want?
   Int_t nEvents = 1000;

   // get what we need out of the workspace to make toy data
   RooAbsPdf *model = ws->pdf("model");
   RooRealVar *invMass = ws->var("invMass");
   RooRealVar *isolation = ws->var("isolation");

   // make the toy data
   std::cout << "make data set and import to workspace" << std::endl;
   RooDataSet *data = model->generate(RooArgSet(*invMass, *isolation), nEvents);

   // import data into workspace
   ws->import(*data, Rename("data"));
}

In [6]:
%%cpp -d
void DoSPlot(RooWorkspace *ws)
{
   std::cout << "Calculate sWeights" << std::endl;

   // get what we need out of the workspace to do the fit
   RooAbsPdf *model = ws->pdf("model");
   RooRealVar *zYield = ws->var("zYield");
   RooRealVar *qcdYield = ws->var("qcdYield");
   RooDataSet *data = (RooDataSet *)ws->data("data");

   // fit the model to the data.
   model->fitTo(*data, Extended());

   // The sPlot technique requires that we fix the parameters
   // of the model that are not yields after doing the fit.
   //
   // This *could* be done with the lines below, however this is taken care of
   // by the RooStats::SPlot constructor (or more precisely the AddSWeight
   // method).
   //
   // RooRealVar* sigmaZ = ws->var("sigmaZ");
   // RooRealVar* qcdMassDecayConst = ws->var("qcdMassDecayConst");
   // sigmaZ->setConstant();
   // qcdMassDecayConst->setConstant();

   RooMsgService::instance().setSilentMode(true);
   
   std::cout << "\n\n------------------------------------------\nThe dataset before creating sWeights:\n";
   data->Print();
   
   RooMsgService::instance().setGlobalKillBelow(RooFit::ERROR);
   
   // Now we use the SPlot class to add SWeights to our data set
   // based on our model and our yield variables
   RooStats::SPlot *sData = new RooStats::SPlot("sData", "An SPlot", *data, model, RooArgList(*zYield, *qcdYield));

   std::cout << "\n\nThe dataset after creating sWeights:\n";
   data->Print();

   // Check that our weights have the desired properties

   std::cout << "\n\n------------------------------------------\n\nCheck SWeights:" << std::endl;

   std::cout << std::endl
             << "Yield of Z is\t" << zYield->getVal() << ".  From sWeights it is "
             << sData->GetYieldFromSWeight("zYield") << std::endl;

   std::cout << "Yield of QCD is\t" << qcdYield->getVal() << ".  From sWeights it is "
             << sData->GetYieldFromSWeight("qcdYield") << std::endl
             << std::endl;

   for (Int_t i = 0; i < 10; i++) {
      std::cout << "z Weight for event " << i << std::right << std::setw(12) << sData->GetSWeight(i, "zYield") << "  qcd Weight"
                << std::setw(12) << sData->GetSWeight(i, "qcdYield") << "  Total Weight" << std::setw(12) << sData->GetSumOfEventSWeight(i)
                << std::endl;
   }

   std::cout << std::endl;

   // import this new dataset with sWeights
   std::cout << "import new dataset with sWeights" << std::endl;
   ws->import(*data, Rename("dataWithSWeights"));
   
   RooMsgService::instance().setGlobalKillBelow(RooFit::INFO);
}

A helper function is created:

In [7]:
%%cpp -d
void MakePlots(RooWorkspace *ws)
{

   // Here we make plots of the discriminating variable (invMass) after the fit
   // and of the control variable (isolation) after unfolding with sPlot.
   std::cout << "make plots" << std::endl;

   // make our canvas
   TCanvas *cdata = new TCanvas("sPlot", "sPlot demo", 400, 600);
   cdata->Divide(1, 3);

   // get what we need out of the workspace
   RooAbsPdf *model = ws->pdf("model");
   RooAbsPdf *zModel = ws->pdf("zModel");
   RooAbsPdf *qcdModel = ws->pdf("qcdModel");

   RooRealVar *isolation = ws->var("isolation");
   RooRealVar *invMass = ws->var("invMass");

   // note, we get the dataset with sWeights
   RooDataSet *data = (RooDataSet *)ws->data("dataWithSWeights");

   // this shouldn't be necessary, need to fix something with workspace
   // do this to set parameters back to their fitted values.
//   model->fitTo(*data, Extended());

   // plot invMass for data with full model and individual components overlaid
   //  TCanvas* cdata = new TCanvas();
   cdata->cd(1);
   RooPlot *frame = invMass->frame();
   data->plotOn(frame);
   model->plotOn(frame, Name("FullModel"));
   model->plotOn(frame, Components(*zModel), LineStyle(kDashed), LineColor(kRed), Name("ZModel"));
   model->plotOn(frame, Components(*qcdModel), LineStyle(kDashed), LineColor(kGreen), Name("QCDModel"));

   TLegend leg(0.11, 0.5, 0.5, 0.8);
   leg.AddEntry(frame->findObject("FullModel"), "Full model", "L");
   leg.AddEntry(frame->findObject("ZModel"), "Z model", "L");
   leg.AddEntry(frame->findObject("QCDModel"), "QCD model", "L");
   leg.SetBorderSize(0);
   leg.SetFillStyle(0);

   frame->SetTitle("Fit of model to discriminating variable");
   frame->Draw();
   leg.DrawClone();

   // Now use the sWeights to show isolation distribution for Z and QCD.
   // The SPlot class can make this easier, but here we demonstrate in more
   // detail how the sWeights are used.  The SPlot class should make this
   // very easy and needs some more development.

   // Plot isolation for Z component.
   // Do this by plotting all events weighted by the sWeight for the Z component.
   // The SPlot class adds a new variable that has the name of the corresponding
   // yield + "_sw".
   cdata->cd(2);

   // create weighted data set
   RooDataSet *dataw_z = new RooDataSet(data->GetName(), data->GetTitle(), data, *data->get(), 0, "zYield_sw");

   RooPlot *frame2 = isolation->frame();
   // Since the data are weighted, we use SumW2 to compute the errors.
   dataw_z->plotOn(frame2, DataError(RooAbsData::SumW2));

   frame2->SetTitle("Isolation distribution with s weights to project out Z");
   frame2->Draw();

   // Plot isolation for QCD component.
   // Eg. plot all events weighted by the sWeight for the QCD component.
   // The SPlot class adds a new variable that has the name of the corresponding
   // yield + "_sw".
   cdata->cd(3);
   RooDataSet *dataw_qcd = new RooDataSet(data->GetName(), data->GetTitle(), data, *data->get(), 0, "qcdYield_sw");
   RooPlot *frame3 = isolation->frame();
   dataw_qcd->plotOn(frame3, DataError(RooAbsData::SumW2));

   frame3->SetTitle("Isolation distribution with s weights to project out QCD");
   frame3->Draw();

   //  cdata->SaveAs("SPlot.gif");
}

Create a new workspace to manage the project.

In [8]:
RooWorkspace *wspace = new RooWorkspace("myWS");
RooFit v3.60 -- Developed by Wouter Verkerke and David Kirkby 
                Copyright (C) 2000-2013 NIKHEF, University of California & Stanford University
                All rights reserved, please read http://roofit.sourceforge.net/license.txt

Add the signal and background models to the workspace. Inside this function you will find a description of our model.

In [9]:
AddModel(wspace);
make z model
make qcd model
make full model
import model
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooAddPdf::model
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooProdPdf::zModel
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooGaussian::mZModel
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooRealVar::invMass
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooRealVar::mZ
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooRealVar::sigmaZ
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooExponential::zIsolationModel
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooRealVar::isolation
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooConstVar::zIsolDecayConst
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooRealVar::zYield
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooProdPdf::qcdModel
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooExponential::qcdMassModel
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooRealVar::qcdMassDecayConst
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooExponential::qcdIsolationModel
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooConstVar::qcdIsolDecayConst
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing RooRealVar::qcdYield

Add some toy data to the workspace

In [10]:
AddData(wspace);
make data set and import to workspace
[#1] INFO:ObjectHandling -- RooWorkspace::import(myWS) importing dataset modelData
[#1] INFO:ObjectHandling -- RooWorkSpace::import(myWS) changing name of dataset from  modelData to data

Inspect the workspace if you wish wspace->Print();

Do splot. This will make a new dataset with sWeights added for every event.

In [11]:
DoSPlot(wspace);
Calculate sWeights
[#1] INFO:Minization -- createNLL: caching constraint set under name CONSTR_OF_PDF_model_FOR_OBS_invMass:isolation with 0 entries
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
[#1] INFO:Minization --  The following expressions have been identified as constant and will be precalculated and cached: (zIsolationModel,qcdIsolationModel)
[#1] INFO:Minization --  The following expressions will be evaluated in cache-and-track mode: (mZModel,qcdMassModel)
 **********
 **    1 **SET PRINT           1
 **********
 **********
 **    2 **SET NOGRAD
 **********
 PARAMETER DEFINITIONS:
    NO.   NAME         VALUE      STEP SIZE      LIMITS
     1 qcdMassDecayConst  -1.00000e-02  2.00000e+01   -1.00000e+02  1.00000e+02
     2 qcdYield     1.00000e+02  5.00000e+01    0.00000e+00  1.00000e+03
     3 sigmaZ       2.00000e+00  1.00000e+00    0.00000e+00  1.00000e+01
     4 zYield       5.00000e+01  2.50000e+01    0.00000e+00  1.00000e+03
 **********
 **    3 **SET ERR         0.5
 **********
 **********
 **    4 **SET PRINT           1
 **********
 **********
 **    5 **SET STR           1
 **********
 NOW USING STRATEGY  1: TRY TO BALANCE SPEED AGAINST RELIABILITY
 **********
 **    6 **MIGRAD        2000           1
 **********
 FIRST CALL TO USER FUNCTION AT NEW START POINT, WITH IFLAG=4.
 START MIGRAD MINIMIZATION.  STRATEGY  1.  CONVERGENCE WHEN EDM .LT. 1.00e-03
 FCN=1901.86 FROM MIGRAD    STATUS=INITIATE       18 CALLS          19 TOTAL
                     EDM= unknown      STRATEGY= 1      NO ERROR MATRIX       
  EXT PARAMETER               CURRENT GUESS       STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  qcdMassDecayConst  -1.00000e-02   2.00000e+01   2.01358e-01  -1.06814e+05
   2  qcdYield     1.00000e+02   5.00000e+01   1.72186e-01  -1.57317e+03
   3  sigmaZ       2.00000e+00   1.00000e+00   2.57889e-01   3.62916e+01
   4  zYield       5.00000e+01   2.50000e+01   1.18625e-01  -1.41930e+03
                               ERR DEF= 0.5
 MIGRAD MINIMIZATION HAS CONVERGED.
 MIGRAD WILL VERIFY CONVERGENCE AND ERROR MATRIX.
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=849.923 FROM MIGRAD    STATUS=CONVERGED     110 CALLS         111 TOTAL
                     EDM=3.1409e-06    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                   STEP         FIRST   
  NO.   NAME      VALUE            ERROR          SIZE      DERIVATIVE 
   1  qcdMassDecayConst  -9.30176e-03   7.55562e-04   1.52115e-07   2.51004e+01
   2  qcdYield     6.22140e+02   2.52789e+01   1.04861e-03  -1.45735e-02
   3  sigmaZ       1.95257e+00   8.09927e-02   4.09416e-04  -7.77193e-02
   4  zYield       3.77836e+02   1.98770e+01   8.24140e-04  -6.51095e-03
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  4    ERR DEF=0.5
  5.709e-07  1.999e-04 -1.007e-06 -1.999e-04 
  1.999e-04  6.396e+02 -9.256e-02 -1.747e+01 
 -1.007e-06 -9.256e-02  6.561e-03  9.257e-02 
 -1.999e-04 -1.747e+01  9.257e-02  3.953e+02 
 PARAMETER  CORRELATION COEFFICIENTS  
       NO.  GLOBAL      1      2      3      4
        1  0.02260   1.000  0.010 -0.016 -0.013
        2  0.05626   0.010  1.000 -0.045 -0.035
        3  0.07351  -0.016 -0.045  1.000  0.057
        4  0.06697  -0.013 -0.035  0.057  1.000
 **********
 **    7 **SET ERR         0.5
 **********
 **********
 **    8 **SET PRINT           1
 **********
 **********
 **    9 **HESSE        2000
 **********
 COVARIANCE MATRIX CALCULATED SUCCESSFULLY
 FCN=849.923 FROM HESSE     STATUS=OK             23 CALLS         134 TOTAL
                     EDM=3.13931e-06    STRATEGY= 1      ERROR MATRIX ACCURATE 
  EXT PARAMETER                                INTERNAL      INTERNAL  
  NO.   NAME      VALUE            ERROR       STEP SIZE       VALUE   
   1  qcdMassDecayConst  -9.30176e-03   7.55563e-04   3.04229e-08  -9.30176e-05
   2  qcdYield     6.22140e+02   2.52790e+01   2.09721e-04   2.46778e-01
   3  sigmaZ       1.95257e+00   8.09935e-02   8.18832e-05  -6.55413e-01
   4  zYield       3.77836e+02   1.98772e+01   1.64828e-04  -2.46827e-01
                               ERR DEF= 0.5
 EXTERNAL ERROR MATRIX.    NDIM=  25    NPAR=  4    ERR DEF=0.5
  5.709e-07  2.002e-04 -1.009e-06 -2.002e-04 
  2.002e-04  6.396e+02 -9.273e-02 -1.749e+01 
 -1.009e-06 -9.273e-02  6.561e-03  9.273e-02 
 -2.002e-04 -1.749e+01  9.273e-02  3.953e+02 
 PARAMETER  CORRELATION COEFFICIENTS  
       NO.  GLOBAL      1      2      3      4
        1  0.02264   1.000  0.010 -0.016 -0.013
        2  0.05634   0.010  1.000 -0.045 -0.035
        3  0.07364  -0.016 -0.045  1.000  0.058
        4  0.06707  -0.013 -0.035  0.058  1.000
[#1] INFO:Minization -- RooMinimizer::optimizeConst: deactivating const optimization


------------------------------------------
The dataset before creating sWeights:
RooDataSet::data[invMass,isolation] = 1000 entries


The dataset after creating sWeights:
RooDataSet::data[invMass,isolation,zYield_sw,L_zYield,qcdYield_sw,L_qcdYield] = 1000 entries


------------------------------------------

Check SWeights:

Yield of Z is	377.841.  From sWeights it is 377.841
Yield of QCD is	622.16.  From sWeights it is 622.16

z Weight for event 0  -0.0259121  qcd Weight     1.02591  Total Weight           1
z Weight for event 1  -0.0259121  qcd Weight     1.02591  Total Weight           1
z Weight for event 2     1.02602  qcd Weight  -0.0260232  Total Weight           1
z Weight for event 3     1.01358  qcd Weight  -0.0135828  Total Weight           1
z Weight for event 4  -0.0259121  qcd Weight     1.02591  Total Weight           1
z Weight for event 5     1.02684  qcd Weight  -0.0268354  Total Weight           1
z Weight for event 6  -0.0259121  qcd Weight     1.02591  Total Weight           1
z Weight for event 7  -0.0259121  qcd Weight     1.02591  Total Weight           1
z Weight for event 8     1.03438  qcd Weight  -0.0343802  Total Weight           1
z Weight for event 9    0.910182  qcd Weight   0.0898196  Total Weight           1

import new dataset with sWeights

Make some plots showing the discriminating variable and the control variable after unfolding.

In [12]:
MakePlots(wspace);
make plots
[#1] INFO:Plotting -- RooAbsReal::plotOn(model) plot on invMass integrates over variables (isolation)
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model) directly selected PDF components: (zModel)
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model) indirectly selected PDF components: (mZModel,zIsolationModel)
[#1] INFO:Plotting -- RooAbsReal::plotOn(model) plot on invMass integrates over variables (isolation)
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model) directly selected PDF components: (qcdModel)
[#1] INFO:Plotting -- RooAbsPdf::plotOn(model) indirectly selected PDF components: (qcdMassModel,qcdIsolationModel)
[#1] INFO:Plotting -- RooAbsReal::plotOn(model) plot on invMass integrates over variables (isolation)

Cleanup

In [13]:
delete wspace;

Draw all canvases

In [14]:
%jsroot on
gROOT->GetListOfCanvases()->Draw()