'DATA AND CATEGORIES' RooFit tutorial macro #403
Using weights in unbinned datasets
Author: Clemens Lange, Wouter Verkerke (C version)
This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Tuesday, March 19, 2024 at 07:16 PM.
from __future__ import print_function
import ROOT
Declare observable
x = ROOT.RooRealVar("x", "x", -10, 10)
x.setBins(40)
Construction a uniform pdf
p0 = ROOT.RooPolynomial("px", "px", x)
Sample 1000 events from pdf
data = p0.generate({x}, 1000)
Construct formula to calculate (fake) weight for events
wFunc = ROOT.RooFormulaVar("w", "event weight", "(x*x+10)", [x])
Add column with variable w to previously generated dataset
w = data.addColumn(wFunc)
Dataset d is now a dataset with two observable (x,w) with 1000 entries
data.Print()
RooDataSet::pxData[x,w] = 1000 entries
Instruct dataset wdata in interpret w as event weight rather than as observable
wdata = ROOT.RooDataSet(data.GetName(), data.GetTitle(), data, data.get(), "", w.GetName())
Dataset d is now a dataset with one observable (x) with 1000 entries and a sum of weights of ~430K
wdata.Print()
RooDataSet::pxData[x,weight:w] = 1000 entries (43238.9 weighted)
Construction quadratic polynomial pdf for fitting
a0 = ROOT.RooRealVar("a0", "a0", 1)
a1 = ROOT.RooRealVar("a1", "a1", 0, -1, 1)
a2 = ROOT.RooRealVar("a2", "a2", 1, 0, 10)
p2 = ROOT.RooPolynomial("p2", "p2", x, [a0, a1, a2], 0)
Fit quadratic polynomial to weighted data
NOTE: A plain Maximum likelihood fit to weighted data does in general NOT result in correct error estimates, individual event weights represent Poisson statistics themselves.
Fit with 'wrong' errors
r_ml_wgt = p2.fitTo(wdata, Save=True, PrintLevel=-1)
[#1] INFO:Fitting -- RooAbsPdf::fitTo(p2_over_p2_Int[x]) fixing normalization set for coefficient determination to observables in data [#1] INFO:Fitting -- using CPU computation library compiled with -mavx2 [#0] WARNING:InputArguments -- RooAbsPdf::fitTo(p2): WARNING: a likelihood fit is requested of what appears to be weighted data. While the estimated values of the parameters will always be calculated taking the weights into account, there are multiple ways to estimate the errors of the parameters. You are advised to make an explicit choice for the error calculation: - Either provide SumW2Error(true), to calculate a sum-of-weights-corrected HESSE error matrix (error will be proportional to the number of events in MC). - Or provide SumW2Error(false), to return errors from original HESSE error matrix (which will be proportional to the sum of the weights, i.e., a dataset with <sum of weights> events). - Or provide AsymptoticError(true), to use the asymptotically correct expression (for details see https://arxiv.org/abs/1911.01303)." [#1] INFO:Fitting -- RooAddition::defaultErrorLevel(nll_p2_over_p2_Int[x]_pxData) Summation contains a RooNLLVar, using its error level [#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: activating const optimization [#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: deactivating const optimization
A first order correction to estimated parameter errors in an (unbinned) ML fit can be obtained by calculating the covariance matrix as
V' = V C-1 V
where V is the covariance matrix calculated from a fit to -logL = - sum [ w_i log f(x_i) ] and C is the covariance matrix calculated from -logL' = -sum [ w_i^2 log f(x_i) ] (i.e. the weights are applied squared)
A fit in self mode can be performed as follows:
r_ml_wgt_corr = p2.fitTo(wdata, Save=True, SumW2Error=True, PrintLevel=-1)
[#1] INFO:Fitting -- RooAbsPdf::fitTo(p2_over_p2_Int[x]) fixing normalization set for coefficient determination to observables in data [#1] INFO:Fitting -- RooAddition::defaultErrorLevel(nll_p2_over_p2_Int[x]_pxData) Summation contains a RooNLLVar, using its error level [#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: activating const optimization [#1] INFO:Fitting -- RooAbsPdf::fitTo(p2) Calculating sum-of-weights-squared correction matrix for covariance matrix [#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: deactivating const optimization
Construct plot frame
frame = x.frame(Title="Unbinned ML fit, chi^2 fit to weighted data")
Plot data using sum-of-weights-squared error rather than Poisson errors
wdata.plotOn(frame, DataError="SumW2")
<cppyy.gbl.RooPlot object at 0xb800d60>
Overlay result of 2nd order polynomial fit to weighted data
p2.plotOn(frame)
<cppyy.gbl.RooPlot object at 0xb800d60>
Construct a pdf with the same shape as p0 after weighting
genPdf = ROOT.RooGenericPdf("genPdf", "x*x+10", [x])
Sample a dataset with the same number of events as data
data2 = genPdf.generate({x}, 1000)
[#1] INFO:NumericIntegration -- RooRealIntegral::init(genPdf_Int[x]) using numeric integrator RooIntegrator1D to calculate Int(x) [#1] INFO:NumericIntegration -- RooRealIntegral::init(genPdf_Int[x]) using numeric integrator RooIntegrator1D to calculate Int(x)
Sample a dataset with the same number of weights as data
data3 = genPdf.generate({x}, 43000)
[#1] INFO:NumericIntegration -- RooRealIntegral::init(genPdf_Int[x]) using numeric integrator RooIntegrator1D to calculate Int(x) [#1] INFO:NumericIntegration -- RooRealIntegral::init(genPdf_Int[x]) using numeric integrator RooIntegrator1D to calculate Int(x)
Fit the 2nd order polynomial to both unweighted datasets and save the results for comparison
r_ml_unw10 = p2.fitTo(data2, Save=True, PrintLevel=-1)
r_ml_unw43 = p2.fitTo(data3, Save=True, PrintLevel=-1)
[#1] INFO:Fitting -- RooAbsPdf::fitTo(p2_over_p2_Int[x]) fixing normalization set for coefficient determination to observables in data [#1] INFO:Fitting -- RooAddition::defaultErrorLevel(nll_p2_over_p2_Int[x]_genPdfData) Summation contains a RooNLLVar, using its error level [#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: activating const optimization [#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: deactivating const optimization [#1] INFO:Fitting -- RooAbsPdf::fitTo(p2_over_p2_Int[x]) fixing normalization set for coefficient determination to observables in data [#1] INFO:Fitting -- RooAddition::defaultErrorLevel(nll_p2_over_p2_Int[x]_genPdfData) Summation contains a RooNLLVar, using its error level [#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: activating const optimization [#1] INFO:Minimization -- RooAbsMinimizerFcn::setOptimizeConst: deactivating const optimization
Construct binned clone of unbinned weighted dataset
binnedData = wdata.binnedClone()
binnedData.Print("v")
DataStore pxData_binned (Generated From px_binned) Contains 40 entries Observables: 1) x = 9.75 L(-10 - 10) B(40) "x" Binned Dataset pxData_binned (Generated From px_binned) Contains 40 bins with a total weight of 43238.9 Observables: 1) x = 9.75 L(-10 - 10) B(40) "x"
Perform chi2 fit to binned weighted dataset using sum-of-weights errors
NB: Within the usual approximations of a chi2 fit, chi2 fit to weighted data using sum-of-weights-squared errors does give correct error estimates
chi2 = p2.createChi2(binnedData, ROOT.RooFit.DataError("SumW2"))
m = ROOT.RooMinimizer(chi2)
m.migrad()
m.hesse()
0
Minuit2Minimizer: Minimize with max-calls 1000 convergence for edm < 1 strategy 1 Minuit2Minimizer : Valid minimum - status = 0 FVAL = 31.3747451817532266 Edm = 4.13511525432537654e-08 Nfcn = 29 a1 = -0.009989 +/- 0.0262975 (limited) a2 = 0.106373 +/- 0.0101849 (limited)
Info in <Minuit2>: MnSeedGenerator Computing seed using NumericalGradient calculator Info in <Minuit2>: MnSeedGenerator Initial state: FCN = 32.33995323 Edm = 0.8968537843 NCalls = 7 Info in <Minuit2>: MnSeedGenerator Initial state Minimum value : 32.33995323 Edm : 0.8968537843 Internal parameters: [ -0.00121057482 -1.372915994] Internal gradient : [ 27.17902731 -203.7916428] Internal covariance matrix: [[ 0.00060495515 0] [ 0 7.5618996e-05]]] Info in <Minuit2>: VariableMetricBuilder Start iterating until Edm is < 0.002 with call limit = 1000 Info in <Minuit2>: VariableMetricBuilder 0 - FCN = 32.33995323 Edm = 0.8968537843 NCalls = 7 Info in <Minuit2>: VariableMetricBuilder 1 - FCN = 31.37772394 Edm = 0.002313012125 NCalls = 13 Info in <Minuit2>: VariableMetricBuilder 2 - FCN = 31.37474518 Edm = 3.566880129e-08 NCalls = 19 Info in <Minuit2>: VariableMetricBuilder After Hessian Info in <Minuit2>: VariableMetricBuilder 3 - FCN = 31.37474518 Edm = 4.135115254e-08 NCalls = 29 Info in <Minuit2>: Minuit2Minimizer::Hesse Using max-calls 1000 Info in <Minuit2>: Minuit2Minimizer::Hesse Hesse is valid - matrix is accurate
Plot chi^2 fit result on frame as well
r_chi2_wgt = m.save()
p2.plotOn(frame, LineStyle="--", LineColor="r")
<cppyy.gbl.RooPlot object at 0xb800d60>
Note that ML fit on 1Kevt of weighted data is closer to result of ML fit on 43Kevt of unweighted data than to 1Kevt of unweighted data, the reference chi^2 fit with SumW2 error gives a result closer to that of an unbinned ML fit to 1Kevt of unweighted data.
print("==> ML Fit results on 1K unweighted events")
r_ml_unw10.Print()
print("==> ML Fit results on 43K unweighted events")
r_ml_unw43.Print()
print("==> ML Fit results on 1K weighted events with a summed weight of 43K")
r_ml_wgt.Print()
print("==> Corrected ML Fit results on 1K weighted events with a summed weight of 43K")
r_ml_wgt_corr.Print()
print("==> Chi2 Fit results on 1K weighted events with a summed weight of 43K")
r_chi2_wgt.Print()
c = ROOT.TCanvas("rf403_weightedevts", "rf403_weightedevts", 600, 600)
ROOT.gPad.SetLeftMargin(0.15)
frame.GetYaxis().SetTitleOffset(1.8)
frame.Draw()
c.SaveAs("rf403_weightedevts.png")
==> ML Fit results on 1K unweighted events ==> ML Fit results on 43K unweighted events ==> ML Fit results on 1K weighted events with a summed weight of 43K ==> Corrected ML Fit results on 1K weighted events with a summed weight of 43K ==> Chi2 Fit results on 1K weighted events with a summed weight of 43K RooFitResult: minimized FCN value: 2766.49, estimated distance to minimum: 0.000399952 covariance matrix quality: Full, accurate covariance matrix Status : MINIMIZE=0 HESSE=0 Floating Parameter FinalValue +/- Error -------------------- -------------------------- a1 8.9483e-03 +/- 2.70e-02 a2 1.0177e-01 +/- 1.69e-02 RooFitResult: minimized FCN value: 118892, estimated distance to minimum: 0.000206627 covariance matrix quality: Full, accurate covariance matrix Status : MINIMIZE=0 HESSE=0 Floating Parameter FinalValue +/- Error -------------------- -------------------------- a1 -1.2106e-03 +/- 4.02e-03 a2 9.7573e-02 +/- 2.37e-03 RooFitResult: minimized FCN value: 119682, estimated distance to minimum: 1.25398e-05 covariance matrix quality: Full, accurate covariance matrix Status : MINIMIZE=0 HESSE=0 Floating Parameter FinalValue +/- Error -------------------- -------------------------- a1 -4.8713e-03 +/- 4.03e-03 a2 9.8645e-02 +/- 2.41e-03 RooFitResult: minimized FCN value: 119682, estimated distance to minimum: 79498.5 covariance matrix quality: Full, accurate covariance matrix Status : MINIMIZE=0 HESSE=0 HESSE=0 Floating Parameter FinalValue +/- Error -------------------- -------------------------- a1 -4.8565e-03 +/- 3.00e-02 a2 9.8652e-02 +/- 2.99e-02 RooFitResult: minimized FCN value: 31.3747, estimated distance to minimum: 4.135e-08 covariance matrix quality: Full, accurate covariance matrix Status : MIGRAD=0 HESSE=0 Floating Parameter FinalValue +/- Error -------------------- -------------------------- a1 -9.9890e-03 +/- 2.63e-02 a2 1.0637e-01 +/- 1.02e-02
Info in <TCanvas::Print>: png file rf403_weightedevts.png has been created
Draw all canvases
from ROOT import gROOT
gROOT.GetListOfCanvases().Draw()