df012_DefinesAndFiltersAsStrings

Use just-in-time-compiled Filters and Defines for quick prototyping.

This tutorial illustrates how to use the just-in-time (JIT) compilation features of RDataFrame to define data using C++ code in a Python script.

Author: Guilherme Amadio (CERN)
This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Tuesday, June 15, 2021 at 07:17 AM.

In [ ]:
import ROOT

We will inefficiently calculate an approximation of pi by generating some data and doing very simple filtering and analysis on it.

We start by creating an empty dataframe where we will insert 10 million random points in a square of side 2.0 (that is, with an inscribed unit circle).

In [ ]:
npoints = 10000000
df = ROOT.RDataFrame(npoints)

Define the columns we want inside the dataframe. Column p does not need to be an array; we define it that way here only to demonstrate how to use jitting with RDataFrame.

In [ ]:
pidf = df.Define("x", "gRandom->Uniform(-1.0, 1.0)") \
         .Define("y", "gRandom->Uniform(-1.0, 1.0)") \
         .Define("p", "std::array<double, 2> v{x, y}; return v;") \
         .Define("r", "double r2 = 0.0; for (auto&& w : p) r2 += w*w; return sqrt(r2);")
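
For readers less familiar with C++, the per-row logic of the jitted "p" and "r" expressions can be mirrored in plain Python. This is only an illustrative sketch that does not use ROOT; `radius` is a hypothetical helper name, not part of the tutorial or of RDataFrame.

```python
import math

def radius(point):
    """Euclidean norm of a point, mirroring the C++ loop used for column "r"."""
    r2 = 0.0
    for w in point:  # same accumulation as: for (auto&& w : p) r2 += w*w;
        r2 += w * w
    return math.sqrt(r2)

# Stand-in for one row of column "p", built from arbitrary x and y values.
p = (0.6, 0.8)
print(radius(p))
```

In the dataframe itself this computation runs once per entry inside the compiled event loop, so no Python-level loop over rows is involved.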

Now we have a dataframe with columns x, y, p (which is a point based on x and y), and the radius r = sqrt(x*x + y*y). In order to approximate pi, we need to know how many of our data points fall inside the circle of radius one compared with the total number of points. The ratio of the areas is

 A_circle / A_square = pi * r*r / (l * l) = pi / 4, where r = 1.0 and l = 2.0

Therefore, we can approximate pi with 4 times the number of points inside the unit circle over the total number of points:

In [ ]:
incircle = pidf.Filter("r <= 1.0").Count().GetValue()

pi_approx = 4.0 * incircle / npoints

print("pi is approximately equal to %g" % (pi_approx))
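
As a rough cross-check that does not depend on ROOT, the same estimate can be sketched with Python's standard `random` module. The names `estimate_pi` and `expected_sigma`, the seed, and the smaller sample size are assumptions chosen for illustration. Because the in-circle count is binomially distributed with success probability pi/4, the statistical uncertainty of the estimate is 4*sqrt(p*(1-p)/N), which is roughly 5e-4 for 10 million points.

```python
import math
import random

def estimate_pi(npoints, seed=12345):
    """Monte Carlo estimate of pi from points uniform in [-1, 1] x [-1, 1]."""
    rng = random.Random(seed)  # private generator, so results are reproducible
    incircle = 0
    for _ in range(npoints):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if math.hypot(x, y) <= 1.0:  # same cut as Filter("r <= 1.0")
            incircle += 1
    return 4.0 * incircle / npoints

def expected_sigma(npoints):
    """Binomial standard deviation of the estimate for npoints samples."""
    p = math.pi / 4.0  # probability that a point lands inside the circle
    return 4.0 * math.sqrt(p * (1.0 - p) / npoints)

n = 100000  # far fewer points than above: a pure-Python loop is slow
pi_check = estimate_pi(n)
print("cross-check: %g (expected sigma %.2g)" % (pi_check, expected_sigma(n)))
```

The uncertainty shrinks only as 1/sqrt(N), which is why this way of computing pi is inefficient: each additional decimal digit costs about 100 times more points.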