Notebook

df106_HiggsToFourLeptons¶

The Higgs to four lepton analysis from the ATLAS Open Data release of 2020, with RDataFrame.

This tutorial is the Higgs to four lepton analysis from the ATLAS Open Data release in 2020 (http://opendata.atlas.cern/release/2020/documentation/). The data was taken with the ATLAS detector during 2016 at a center-of-mass energy of 13 TeV. The decay of the Standard Model Higgs boson to two Z bosons and subsequently to four leptons is called the "golden channel". The selection leads to a narrow invariant mass peak on top a relatively smooth and small background, revealing the Higgs at 125 GeV. Systematic errors for the MC scale factors are computed and the Vary function of RDataFrame is used for plotting. The analysis is translated to an RDataFrame workflow processing about 300 MB of simulated events and data.

See the corresponding spec json file.

Author: Stefan Wunsch (KIT, CERN), Julia Mathe (CERN), Marta Czurylo (CERN)
This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Wednesday, April 17, 2024 at 11:08 AM.

In [1]:

import ROOT
import os

Enable Multi-threaded mode

In [2]:

ROOT.EnableImplicitMT()

Create the RDataFrame from the spec json file. The df106_HiggsToFourLeptons_spec.json is provided in the same folder as this tutorial

In [3]:

dataset_spec = os.path.join(ROOT.gROOT.GetTutorialsDir(), "dataframe", "df106_HiggsToFourLeptons_spec.json")
df = ROOT.RDF.Experimental.FromSpec(dataset_spec)  # Creates a single dataframe for all the samples

Add the ProgressBar feature

In [4]:

ROOT.RDF.Experimental.AddProgressBar(df)

Plugin No such file or directory loading sec.protocol libXrdSeckrb5-5.so

Access metadata information that is stored in the JSON config file of the RDataFrame. The metadata contained in the JSON file is accessible within a DefinePerSample call, through the RSampleInfo class.

In [5]:

df = df.DefinePerSample("xsecs", 'rdfsampleinfo_.GetD("xsecs")')
df = df.DefinePerSample("lumi", 'rdfsampleinfo_.GetD("lumi")')
df = df.DefinePerSample("sumws", 'rdfsampleinfo_.GetD("sumws")')
df = df.DefinePerSample("sample_category", 'rdfsampleinfo_.GetS("sample_category")')

We must further apply an MC correction for the ZZ decay due to missing gg->ZZ processes.

In [6]:

ROOT.gInterpreter.Declare(
    """
float scale(unsigned int slot, const ROOT::RDF::RSampleInfo &id){
                return id.Contains("mc_363490.llll.4lep.root") ? 1.3f : 1.0f;
}
"""
)

df = df.DefinePerSample("scale", "scale(rdfslot_, rdfsampleinfo_)")

Select events for the analysis

In [7]:

ROOT.gInterpreter.Declare(
    """
using ROOT::RVecF;
using ROOT::RVecI;
bool GoodElectronsAndMuons(const RVecI &type, const RVecF &pt, const RVecF &eta, const RVecF &phi, const RVecF &e, const RVecF &trackd0pv, const RVecF &tracksigd0pv, const RVecF &z0)
{
    for (size_t i = 0; i < type.size(); i++) {
        ROOT::Math::PtEtaPhiEVector p(0.001*pt[i], eta[i], phi[i], 0.001*e[i]);
        if (type[i] == 11) {
            if (pt[i] < 7000 || abs(eta[i]) > 2.47 || abs(trackd0pv[i] / tracksigd0pv[i]) > 5 || abs(z0[i] * sin(p.Theta())) > 0.5) return false;
        } else {
            if (abs(trackd0pv[i] / tracksigd0pv[i]) > 5 || abs(z0[i] * sin(p.Theta())) > 0.5) return false;
        }
    }
    return true;
}
"""
)

Out[7]:

True

Select electron or muon trigger

In [8]:

df = df.Filter("trigE || trigM")

Select events with exactly four good leptons conserving charge and lepton numbers Note that all collections are RVecs and good_lep is the mask for the good leptons. The lepton types are PDG numbers and set to 11 or 13 for an electron or muon irrespective of the charge.

In [9]:

df = (
    df.Define(
        "good_lep",
        "abs(lep_eta) < 2.5 && lep_pt > 5000 && lep_ptcone30 / lep_pt < 0.3 && lep_etcone20 / lep_pt < 0.3",
    )
    .Filter("Sum(good_lep) == 4")
    .Filter("Sum(lep_charge[good_lep]) == 0")
    .Define("goodlep_sumtypes", "Sum(lep_type[good_lep])")
    .Filter("goodlep_sumtypes == 44 || goodlep_sumtypes == 52 || goodlep_sumtypes == 48")
)

Apply additional cuts depending on lepton flavour

In [10]:

df = df.Filter(
    "GoodElectronsAndMuons(lep_type[good_lep], lep_pt[good_lep], lep_eta[good_lep], lep_phi[good_lep], lep_E[good_lep], lep_trackd0pvunbiased[good_lep], lep_tracksigd0pvunbiased[good_lep], lep_z0[good_lep])"
)

Create new columns with the kinematics of good leptons

In [11]:

df = (
    df.Define("goodlep_pt", "lep_pt[good_lep]")
    .Define("goodlep_eta", "lep_eta[good_lep]")
    .Define("goodlep_phi", "lep_phi[good_lep]")
    .Define("goodlep_E", "lep_E[good_lep]")
    .Define("goodlep_type", "lep_type[good_lep]")
)

Select leptons with high transverse momentum

In [12]:

df = df.Filter("goodlep_pt[0] > 25000 && goodlep_pt[1] > 15000 && goodlep_pt[2] > 10000")

Reweighting of the samples is different for "data" and "MC". This is the function to add reweighting for MC samples

In [13]:

ROOT.gInterpreter.Declare(
    """
double weights(float scaleFactor_1, float scaleFactor_2, float scaleFactor_3, float scaleFactor_4, float scale, float mcWeight, double xsecs, double sumws, double lumi)
{
    return scaleFactor_1 * scaleFactor_2 * scaleFactor_3 * scaleFactor_4 * scale * mcWeight * xsecs / sumws * lumi;
}
"""
)

Out[13]:

True

Use DefinePerSample to define which samples are MC and hence need reweighting

In [14]:

df = df.DefinePerSample("isMC", 'rdfsampleinfo_.Contains("mc")')
df = df.Define(
    "weight",
    "double x; return isMC ? weights(scaleFactor_ELE, scaleFactor_MUON, scaleFactor_LepTRIGGER, scaleFactor_PILEUP, scale, mcWeight, xsecs, sumws, lumi) :  1.;",
)

Compute invariant mass of the four lepton system

In [15]:

ROOT.gInterpreter.Declare(
    """
float ComputeInvariantMass(RVecF pt, RVecF eta, RVecF phi, RVecF e)
{
    ROOT::Math::PtEtaPhiEVector p1{pt[0], eta[0], phi[0], e[0]};
    ROOT::Math::PtEtaPhiEVector p2{pt[1], eta[1], phi[1], e[1]};
    ROOT::Math::PtEtaPhiEVector p3{pt[2], eta[2], phi[2], e[2]};
    ROOT::Math::PtEtaPhiEVector p4{pt[3], eta[3], phi[3], e[3]};
    return 0.001 * (p1 + p2 + p3 + p4).M();
}
"""
)

df = df.Define("m4l", "ComputeInvariantMass(goodlep_pt, goodlep_eta, goodlep_phi, goodlep_E)")

Book histograms for the four different samples: data, higgs, zz and other (this is specific to this particular analysis)

In [16]:

histos = []
for sample_category in ["data", "higgs", "zz", "other"]:
    histos.append(
        df.Filter(f'sample_category == "{sample_category}"').Histo1D(
            ROOT.RDF.TH1DModel(f"{sample_category}", "m4l", 24, 80, 170),
            "m4l",
            "weight",
        )
    )

Evaluate the systematic uncertainty

The systematic uncertainty in this analysis is the MC scale factor uncertainty that depends on lepton kinematics such as pT or pseudorapidity. Muons uncertainties are negligible, as stated in https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/MUON-2018-03/. Electrons uncertainties are evaluated based on the plots available in https://doi.org/10.48550/arXiv.1908.00005. The uncertainties are linearly interpolated, using the TGraph::Eval() method, to cover a range of pT values covered by the analysis.

Create a VaryHelper to interpolate the available data.

In [17]:

ROOT.gInterpreter.Declare(
    """
using namespace ROOT::VecOps;

class VaryHelper
{
    const std::vector<double> x{5.50e3, 5.52e3, 12.54e3, 17.43e3, 22.40e3, 27.48e3, 30e3, 10000e3};
    const std::vector<double> y{0.06628, 0.06395, 0.06396, 0.03372, 0.02441, 0.01403, 0, 0};
    TGraph graph;

public:
    VaryHelper() : graph(x.size(), x.data(), y.data()) {}
    RVec<double> operator()(const double &w, const RVecF &pt, const RVec<unsigned int> &type)
    {
        const auto v = Mean(Map(pt[type == 11], [this](auto p)
        {return this->graph.Eval(p); })
        );
        return RVec<double>{(1 + v) * w, (1 - v) * w};
    }
};

VaryHelper variationsFactory;
"""
)

Out[17]:

True

Use the Vary method to add the systematic variations to the total MC scale factor ("weight") of the analysis.

In [18]:

df_variations_mc = (
    df.Filter("isMC == true")
    .Vary("weight", "variationsFactory(weight, goodlep_pt, goodlep_type)", ["up", "down"])
    .Histo1D(ROOT.RDF.TH1DModel("Invariant Mass", "m4l", 24, 80, 170), "m4l", "weight")
)
histos_mc = ROOT.RDF.Experimental.VariationsFor(df_variations_mc)

We reached the end of the analysis part. We now evaluate the total MC uncertainty based on the variations. No computation graph was triggered yet, we trigger the computation graph for all histograms at once now, by calling 'histos_mc["nominal"].GetXaxis()'. Note, in this case the uncertainties are symmetric.

In [19]:

for i in range(0, histos_mc["nominal"].GetXaxis().GetNbins()):
    (
        histos_mc["nominal"].SetBinError(
            i, (histos_mc["weight:up"].GetBinContent(i) - histos_mc["nominal"].GetBinContent(i))
        )
    )

|>                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |   [Elapsed time: 2:21m  processing file: 3 / 9  processed evts: 1000 / 555002  7.06e+00 evt/s 21:46:56h  remaining time (per file being processed)]   
|================================================================================================================================================================================================================>                                                                                                                                                                                                                                                                                                                                                                                                                                                         |   [Elapsed time: 2:22m  processing file: 3 / 9  processed evts: 179000 / 555002  8.90e+04 evt/s 0:04m  remaining time (per file being processed)]   
|=========================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================>                                                                                |   [Elapsed time: 2:23m  processing file: 3 / 9  processed evts: 487000 / 555002  1.62e+05 evt/s 0:00m  remaining time (per file being processed)]   
|=================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================>                                                                                                                                                                        |   [Elapsed time: 2:34m  processing file: 5 / 9  processed evts: 554000 / 746365  1.23e+05 evt/s 0:01m  remaining time (per file being processed)]   
|=====================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================>                                                                                    |   [Elapsed time: 2:35m  processing file: 5 / 9  processed evts: 651000 / 746365  1.18e+05 evt/s 0:00m  remaining time (per file being processed)]   
|=======================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================>  |   [Elapsed time: 2:36m  processing file: 6 / 9  processed evts: 745000 / 746765  1.06e+05 evt/s 0:00m  remaining time (per file being processed)]   
|==================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================>                                                                                                                       |   [Elapsed time: 2:52m  processing file: 7 / 9  processed evts: 746000 / 911481  9.08e+04 evt/s 0:01m  remaining time (per file being processed)]   
|=============================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================>                                            |   [Elapsed time: 2:54m  processing file: 7 / 9  processed evts: 850000 / 911481  8.88e+04 evt/s 0:00m  remaining time (per file being processed)]   
|=======================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================>  |   [Elapsed time: 3:00m  processing file: 8 / 9  processed evts: 910000 / 912379  8.01e+04 evt/s 0:00m  remaining time (per file being processed)]   
[Total elapsed time: 4:28m  processed files: 9 / 9  processed evts: 912535 / 912535]

Make the plot of the data, individual MC contributions and the total MC scale factor systematic variations.

Set styles

In [20]:

ROOT.gROOT.SetStyle("ATLAS")

Function to add ATLAS label

In [21]:

def draw_atlas_label():
    text = ROOT.TLatex()
    text.SetNDC()
    text.SetTextFont(72)
    text.SetTextSize(0.04)
    text.DrawLatex(0.19, 0.85, "ATLAS")
    text.SetTextFont(42)
    text.DrawLatex(0.19 + 0.15, 0.85, "Open Data")
    text.SetTextSize(0.035)
    text.DrawLatex(0.21, 0.80, "#sqrt{s} = 13 TeV, 10 fb^{-1}")

Create canvas with pad

In [22]:

c1 = ROOT.TCanvas("c", "", 600, 600)
pad = ROOT.TPad("upper_pad", "", 0, 0, 1, 1)
pad.SetTickx(False)
pad.SetTicky(False)
pad.Draw()
pad.cd()

Out[22]:

<cppyy.gbl.TPad object at 0x7f0ac800b400>

Draw stack with MC contributions

In [23]:

stack = ROOT.THStack()

Retrieve values of the data and MC histograms in order to plot them. Note: GetValue() action operation is performed after all lazy actions of the RDF were defined first.

In [24]:

h_data = histos[0].GetValue()
h_higgs = histos[1].GetValue()
h_zz = histos[2].GetValue()
h_other = histos[3].GetValue()

for h, color in zip([h_other, h_zz, h_higgs], [ROOT.kViolet - 9, ROOT.kAzure - 9, ROOT.kRed + 2]):
    h.SetLineWidth(1)
    h.SetLineColor(1)
    h.SetFillColor(color)
    stack.Add(h)

stack.Draw("HIST")
stack.GetXaxis().SetLabelSize(0.04)
stack.GetXaxis().SetTitleSize(0.045)
stack.GetXaxis().SetTitleOffset(1.3)
stack.GetXaxis().SetTitle("m_{4l}^{H#rightarrow ZZ} [GeV]")
stack.GetYaxis().SetLabelSize(0.04)
stack.GetYaxis().SetTitleSize(0.045)
stack.GetYaxis().SetTitle("Events")
stack.SetMaximum(35)
stack.GetYaxis().ChangeLabel(1, -1, 0)
stack.DrawClone(
    "HIST"
)  # DrawClone() method is necessary to draw a TObject in case the original object goes out of scope, needed for the interactive root session

Out[24]:

<cppyy.gbl.THStack object at 0x11eccdc0>

Draw MC scale factor and variations

In [25]:

histos_mc["nominal"].SetStats(0)
histos_mc["nominal"].SetFillColor(ROOT.kBlack)
histos_mc["nominal"].SetFillStyle(3254)
histos_mc["nominal"].DrawClone("E2 sames")
histos_mc["weight:up"].SetStats(0)
histos_mc["weight:up"].SetLineColor(ROOT.kGreen + 2)
histos_mc["weight:up"].DrawClone("HIST SAME")
histos_mc["weight:down"].SetLineColor(ROOT.kBlue + 2)
histos_mc["weight:down"].SetStats(0)
histos_mc["weight:down"].DrawClone("HIST SAME")

Out[25]:

<cppyy.gbl.TH1D object at 0x11d40ff0>

Draw data histogram

In [26]:

h_data.SetMarkerStyle(20)
h_data.SetMarkerSize(1.2)
h_data.SetLineWidth(2)
h_data.SetLineColor(ROOT.kBlack)
h_data.DrawClone("E SAME")  # Draw raw data with errorbars

Out[26]:

<cppyy.gbl.TH1D object at 0x74119b0>

Add legend

In [27]:

legend = ROOT.TLegend(0.57, 0.65, 0.94, 0.94)
legend.SetTextFont(42)
legend.SetFillStyle(0)
legend.SetBorderSize(0)
legend.SetTextSize(0.025)
legend.SetTextAlign(32)
legend.AddEntry(h_data, "Data", "lep")
legend.AddEntry(h_higgs, "Higgs MC", "f")
legend.AddEntry(h_zz, "ZZ MC", "f")
legend.AddEntry(h_other, "Other MC", "f")
legend.AddEntry((histos_mc["weight:down"]), "Total MC Variations Down", "l")
legend.AddEntry((histos_mc["weight:up"]), "Total MC Variations Up", "l")
legend.AddEntry((histos_mc["nominal"]), "Total MC Uncertainty", "f")
legend.Draw("SAME")

draw_atlas_label()

c1.Update()

Save the plot

In [28]:

c1.SaveAs("df106_HiggsToFourLeptons_python.png")
print("Saved figure to df106_HiggsToFourLeptons_python.png")

Saved figure to df106_HiggsToFourLeptons_python.png

Info in <TCanvas::Print>: png file df106_HiggsToFourLeptons_python.png has been created

Draw all canvases

In [29]:

from ROOT import gROOT 
gROOT.GetListOfCanvases().Draw()