Python version of the df102_NanoAODDimuonAnalysis
ROOT tutorial running with PyRDF.
The NanoAOD-like input files are filled with 66 mio. events from CMS OpenData containing muon candidates part of 2012 dataset (DOI: 10.7483/OPENDATA.CMS.YLIC.86ZZ and DOI: 10.7483/OPENDATA.CMS.M5AD.Y3V3).
The macro matches muon pairs and produces an histogram of the dimuon mass spectrum showing resonances up to the Z mass. Note that the bump at 30 GeV is not a resonance but a trigger effect.
Some more details about the dataset:
*Date: April 2019
Author: Stefan Wunsch (KIT, CERN)
Adapted to PyRDF and Spark: Javier Cervantes Villanueva (CERN), Prasanth Kothuri (CERN)*
To run this notebook we used the following configuration:
import PyRDF
import ROOT
# Configure PyRDF to run on Spark splitting the dataset into 32 partitions (defines parallelism)
PyRDF.use("spark", {'npartitions': '32'})
# Create dataframe from NanoAOD files
files = [
"root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleMuParked.root",
"root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012C_DoubleMuParked.root"
]
df = PyRDF.RDataFrame("Events", files);
Welcome to JupyROOT 6.17/01
# For simplicity, select only events with exactly two muons and require opposite charge
df_2mu = df.Filter("nMuon == 2", "Events with exactly two muons");
df_os = df_2mu.Filter("Muon_charge[0] != Muon_charge[1]", "Muons with opposite charge");
# Compute invariant mass of the dimuon system
# Uses InvariantMass, provided by RVec (see the reference for other RVec helper functions)
df_mass = df_os.Define("Dimuon_mass", "InvariantMass(Muon_pt, Muon_eta, Muon_phi, Muon_mass)")
# Make histogram of dimuon mass spectrum
h = df_mass.Histo1D(("Dimuon_mass", "Dimuon_mass", 30000, 0.25, 300), "Dimuon_mass")
# Report is not supported yet in the Spark backend
# report = df_mass3.Report()
# Produce pdf of the plot
ROOT.gStyle.SetOptStat(0)
ROOT.gStyle.SetTextFont(42)
c = ROOT.TCanvas("c", "", 800, 700);
c.SetLogx(); c.SetLogy();
h.SetTitle("");
h.GetXaxis().SetTitle("m_{#mu#mu} (GeV)"); h.GetXaxis().SetTitleSize(0.04);
h.GetYaxis().SetTitle("N_{Events}"); h.GetYaxis().SetTitleSize(0.04);
h.Draw();
label = ROOT.TLatex()
label.SetNDC(True);
label.DrawLatex(0.175, 0.740, "#eta");
label.DrawLatex(0.205, 0.775, "#rho,#omega");
label.DrawLatex(0.270, 0.740, "#phi");
label.DrawLatex(0.400, 0.800, "J/#psi");
label.DrawLatex(0.415, 0.670, "#psi'");
label.DrawLatex(0.485, 0.700, "Y(1,2,3S)");
label.DrawLatex(0.755, 0.680, "Z");
label.SetTextSize(0.040); label.DrawLatex(0.100, 0.920, "#bf{CMS Open Data}");
label.SetTextSize(0.030); label.DrawLatex(0.630, 0.920, "#sqrt{s} = 8 TeV, L_{int} = 11.6 fb^{-1}");
c.SaveAs("dimuon_spectrum.pdf");
# Report is not supported yet in the Spark backend
#report.Print();
Info in <TCanvas::Print>: pdf file dimuon_spectrum.pdf has been created
The previous cell computed the result and saved it as a pdf image. Now we can visualize the plot online without triggering the event loop again, since the results have been cached:
%jsroot on
ROOT.gStyle.SetTextFont(42)
c2 = ROOT.TCanvas("c2", "", 800, 700);
c2.SetLogx(); c2.SetLogy();
h.SetTitle("");
h.GetXaxis().SetTitle("m_{#mu#mu} (GeV)"); h.GetXaxis().SetTitleSize(0.04);
h.GetYaxis().SetTitle("N_{Events}"); h.GetYaxis().SetTitleSize(0.04);
h.SetStats(False)
h.Draw();
label = ROOT.TLatex()
label.SetNDC(True);
label.DrawLatex(0.175, 0.740, "#eta");
label.DrawLatex(0.205, 0.775, "#rho,#omega");
label.DrawLatex(0.270, 0.740, "#phi");
label.DrawLatex(0.400, 0.800, "J/#psi");
label.DrawLatex(0.415, 0.670, "#psi'");
label.DrawLatex(0.485, 0.700, "Y(1,2,3S)");
label.DrawLatex(0.755, 0.680, "Z");
label.SetTextSize(0.040); label.DrawLatex(0.100, 0.920, "#bf{CMS Open Data}");
label.SetTextSize(0.030); label.DrawLatex(0.630, 0.920, "#sqrt{s} = 8 TeV, L_{int} = 11.6 fb^{-1}");
label.DrawClone("Same")
c2.Draw()
Warning in <TCanvas::Constructor>: Deleting canvas with same name: c2