Fill RooDataSet/RooDataHist in RDataFrame.
This tutorial shows how to fill RooFit data classes directly from RDataFrame. Using two small helpers, we tell RDataFrame where the data has to go.
Author: Harshal Shende, Stephan Hageboeck (C++ version)
This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Wednesday, April 17, 2024 at 11:18 AM.
import ROOT
import math
We create an RDataFrame with two columns filled with 2 million random numbers.
df = ROOT.RDataFrame(2000000).Define("x", "gRandom->Uniform(-5., 5.)").Define("y", "gRandom->Gaus(1., 3.)")
We create RooFit variables that will represent the dataset.
x = ROOT.RooRealVar("x", "x", -5.0, 5.0)
y = ROOT.RooRealVar("y", "y", -50.0, 50.0)
x.setBins(10)
y.setBins(20)
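The binning chosen above fixes the bin centers that later appear as coordinates in the RooDataHist output. As a rough cross-check that needs no ROOT, the following sketch computes the centers of equal-width bins; `bin_centers` is a hypothetical helper written for this illustration, not a RooFit function.

```python
def bin_centers(lo, hi, n_bins):
    """Return the centers of n_bins equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + (i + 0.5) * width for i in range(n_bins)]


# Same ranges and bin counts as the RooRealVars x and y above.
x_centers = bin_centers(-5.0, 5.0, 10)
y_centers = bin_centers(-50.0, 50.0, 20)

print(x_centers[:3])                    # first x centers, spacing 1
print(y_centers[:3])                    # first y centers, spacing 5
print(len(x_centers) * len(y_centers))  # total number of 2D bins
```

With 10 bins in x and 20 bins in y, the two-dimensional histogram has 200 bins, and the first y centers are -47.5, -42.5 and -37.5.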
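We directly book the RooDataSetHelper action. We need to pass the helper itself, constructed with the arguments for the RooDataSet (name, title, and variables), and the names of the RDataFrame columns that should be filled into the dataset.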
NOTE: RDataFrame columns are matched to RooFit variables by position, not by name!
The returned object is not yet a RooDataSet, but an RResultPtr that will be lazy-evaluated once you call GetValue() on it. We will only evaluate the RResultPtr once all other RDataFrame related actions are declared. This way we trigger the event loop computation only once, which will improve the runtime significantly.
To learn more about lazy actions, see: https://root.cern/doc/master/classROOT_1_1RDataFrame.html#actions
roo_data_set_result = df.Book(
    ROOT.std.move(ROOT.RooDataSetHelper("dataset", "Title of dataset", ROOT.RooArgSet(x, y))), ("x", "y")
)
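To illustrate why the positional matching mentioned in the note matters, here is a small pure-Python sketch. It is not how RooDataSetHelper is implemented; `fill_by_position` is a hypothetical function that only mimics the "i-th column goes to the i-th variable" rule.

```python
def fill_by_position(variable_names, column_names, row):
    """Assign the i-th column's value to the i-th variable, ignoring names."""
    return {var: row[col] for var, col in zip(variable_names, column_names)}


# One example row, using the first entry of the dataset printed later on.
row = {"x": 4.997, "y": -0.304}

matched = fill_by_position(["x", "y"], ("x", "y"), row)
swapped = fill_by_position(["x", "y"], ("y", "x"), row)

print(matched)  # each variable receives the column of the same name
print(swapped)  # the variables silently receive each other's values
```

In the swapped case no error is raised, which is why the order of the column names passed to Book should be checked against the order of the variables.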
For the RooDataHist, we first declare the RooDataHistHelper:
rdhMaker = ROOT.RooDataHistHelper("dataset", "Title of dataset", ROOT.RooArgSet(x, y))
Then, we move it into an RDataFrame action:
roo_data_hist_result = df.Book(ROOT.std.move(rdhMaker), ("x", "y"))
At this point, all RDF actions were defined (namely, the Book operations), so we can get values from the RResultPtr objects, triggering the event loop and getting the actual RooFit data objects.
roo_data_set = roo_data_set_result.GetValue()
roo_data_hist = roo_data_hist_result.GetValue()
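The benefit of booking every action before asking for a value can be sketched with a toy model in plain Python. Everything here (`ToyFrame`, `ToyResult`, `Collect`, `Count`) is hypothetical and greatly simplified; it is not the RDataFrame API and only mimics the idea that one pass over the data feeds all booked actions.

```python
import random


class ToyFrame:
    """Toy stand-in for a lazily evaluated data frame (not the ROOT API)."""

    def __init__(self, n_rows, seed=1):
        self.n_rows = n_rows
        self.seed = seed
        self.booked = []    # accumulators registered so far
        self.loops_run = 0  # number of passes over the data

    def book(self, accumulator):
        """Register an accumulator; nothing is computed yet."""
        self.booked.append(accumulator)
        return ToyResult(self, accumulator)

    def run(self):
        """Single pass over the rows that feeds every booked accumulator."""
        rng = random.Random(self.seed)
        for _ in range(self.n_rows):
            row = (rng.uniform(-5.0, 5.0), rng.gauss(1.0, 3.0))
            for acc in self.booked:
                acc.fill(row)
        self.loops_run += 1


class ToyResult:
    """Handle that triggers the loop on first access."""

    def __init__(self, frame, accumulator):
        self.frame = frame
        self.accumulator = accumulator

    def get_value(self):
        if self.frame.loops_run == 0:
            self.frame.run()
        return self.accumulator


class Collect:
    """Stores every row, loosely analogous to an unbinned dataset."""

    def __init__(self):
        self.rows = []

    def fill(self, row):
        self.rows.append(row)


class Count:
    """Counts rows, standing in for a second booked action."""

    def __init__(self):
        self.n = 0

    def fill(self, row):
        self.n += 1


frame = ToyFrame(1000)
rows_result = frame.book(Collect())
count_result = frame.book(Count())   # booked before any loop has run

rows = rows_result.get_value().rows  # triggers the single pass
count = count_result.get_value().n   # reuses the results of that pass
print(len(rows), count, frame.loops_run)
```

Both results are filled by the same pass, so `loops_run` stays at 1. Had the first value been requested before the second action was booked, a real lazy framework would need another pass over the data to serve the second action.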
Let's inspect the dataset / datahist.
def print_data(data):
    print("")
    data.Print()
    # Print the coordinates and the weight of at most the first 20 entries (or bins).
    for i in range(min(data.numEntries(), 20)):
        print(
            "("
            + ", ".join(["{0:8.3f}".format(var.getVal()) for var in data.get(i)])
            + ", ) weight={0:10.3f}".format(data.weight())
        )

    # The second central moment is the variance, so its square root is the standard deviation.
    print("mean(x) = {0:.3f}".format(data.mean(x)) + "\tsigma(x) = {0:.3f}".format(math.sqrt(data.moment(x, 2.0))))
    print("mean(y) = {0:.3f}".format(data.mean(y)) + "\tsigma(y) = {0:.3f}\n".format(math.sqrt(data.moment(y, 2.0))))
print_data(roo_data_set)
print_data(roo_data_hist)
( 4.997, -0.304, ) weight= 1.000
( 4.472, 0.910, ) weight= 1.000
( 4.575, 0.830, ) weight= 1.000
( 0.400, 0.776, ) weight= 1.000
( 2.599, -0.232, ) weight= 1.000
( -1.844, 1.575, ) weight= 1.000
( 0.197, 0.853, ) weight= 1.000
( -1.077, -0.721, ) weight= 1.000
( -4.697, -3.165, ) weight= 1.000
( 4.437, -1.208, ) weight= 1.000
( 3.983, -0.146, ) weight= 1.000
( -0.014, -1.447, ) weight= 1.000
( -3.177, -2.704, ) weight= 1.000
( -4.371, -0.363, ) weight= 1.000
( 2.254, -0.499, ) weight= 1.000
( 2.139, 6.533, ) weight= 1.000
( 1.993, 6.991, ) weight= 1.000
( -3.708, 7.781, ) weight= 1.000
( -4.168, 1.284, ) weight= 1.000
( -4.177, 4.650, ) weight= 1.000
mean(x) = 0.001 sigma(x) = 2.886
mean(y) = 1.000 sigma(y) = 3.000

( -4.500, -47.500, ) weight= 0.000
( -4.500, -42.500, ) weight= 0.000
( -4.500, -37.500, ) weight= 0.000
( -4.500, -32.500, ) weight= 0.000
( -4.500, -27.500, ) weight= 0.000
( -4.500, -22.500, ) weight= 0.000
( -4.500, -17.500, ) weight= 0.000
( -4.500, -12.500, ) weight= 24.000
( -4.500, -7.500, ) weight= 4537.000
( -4.500, -2.500, ) weight= 69653.000
( -4.500, 2.500, ) weight=107838.000
( -4.500, 7.500, ) weight= 17790.000
( -4.500, 12.500, ) weight= 292.000
( -4.500, 17.500, ) weight= 0.000
( -4.500, 22.500, ) weight= 0.000
( -4.500, 27.500, ) weight= 0.000
( -4.500, 32.500, ) weight= 0.000
( -4.500, 37.500, ) weight= 0.000
( -4.500, 42.500, ) weight= 0.000
( -4.500, 47.500, ) weight= 0.000
mean(x) = 0.001 sigma(x) = 2.872
mean(y) = 0.999 sigma(y) = 3.329

RooDataSet::dataset[x,y] = 2000000 entries
RooDataHist::dataset[x,y] = 200 bins (2e+06 weights)
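The widths printed above differ between the unbinned and the binned data, and a back-of-the-envelope check suggests that this is a binning effect. The sketch below assumes an ideal uniform distribution on [-5, 5] for x, equal occupancy of the 10 x bins, and, for y, the usual approximation that replacing values by bin centers adds width²/12 to the variance (the inverse of Sheppard's correction). These are estimates for illustration, not values computed by RooFit.

```python
import math

# Unbinned x: standard deviation of a uniform distribution on [-5, 5].
sigma_x_unbinned = (5.0 - (-5.0)) / math.sqrt(12.0)

# Binned x: every value is replaced by one of 10 equally populated bin centers.
centers = [-4.5 + i for i in range(10)]
mean_c = sum(centers) / len(centers)
sigma_x_binned = math.sqrt(sum((c - mean_c) ** 2 for c in centers) / len(centers))

# Binned y: Gaussian with sigma = 3 in bins of width 5 (approximation).
sigma_y_binned = math.sqrt(3.0**2 + 5.0**2 / 12.0)

print(f"{sigma_x_unbinned:.3f} {sigma_x_binned:.3f} {sigma_y_binned:.3f}")
```

The three estimates come out near 2.887, 2.872 and 3.329, which agrees with the printed sigma values to within the statistical fluctuation of the sample.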
Draw all canvases
from ROOT import gROOT
gROOT.GetListOfCanvases().Draw()