A basic example of ROOT's RooFit statistical modeling language in iPython notebook

License: BSD (C) 2014, Kyle Cranmer. Feel free to use, distribute, and modify with the above attribution.

Particle physicisits primarily use ROOT for the data analysis framework. Part of that framework is a package called RooFit statistical modeling and fitting package. I have contributed to this package and added a layer on top called RooStats that provides with statistical inference in both frequentist and Bayesian paradigms based on statistical models made with RooFit. These are the tools that were used to claim the discover the Higgs boson, and those statistical models get pretty complicated.

Here I demonstrate a simple example of RooFit's ability to create a statistical model, generate some simulated data, fit that data, create the profile likelihood, and provide a covariance matrix from the likelihood fit.

ROOT is a C++ library, but it has python bindings known as PyROOT.

In [1]:
import ROOT
import rootnotes #for plotting with iPython
c1 = rootnotes.default_canvas()

Here we create a "workspace" object that provides a factory with a convenient syntax for statistical models and variables. The workspace also provides an I/O mechanism to read/write statistical models and data to/from files.

In this example, we create a mixture model of a falling exponential distribution and a Gaussian for a variable x.
This is really a marked Poisson process, because in addition to the pdf on x, we also encode that we expect s=50 events from the Gaussian and b=100 events from the falling exponential.

In [2]:
w = ROOT.RooWorkspace()
w.factory('Gaussian::g(x[-5,5],mu[-3,3],sigma[1])')
w.factory('Exponential::e(x,tau[-.5,-3,0])')
w.factory('SUM::model(s[50,0,100]*g,b[100,0,1000]*e)')
w.Print() #this isn't displaying in iPython
TClass::TClass:0: RuntimeWarning: no dictionary for class stack<RooAbsArg*,deque<RooAbsArg*> > is available

Now that we have made the mdoel, we can generate some fake data, make a plot of the data, fit the model to that data, and overlay the fitted model in the plot

In [3]:
x = w.var('x')
pdf = w.pdf('model')
frame = x.frame()
data = pdf.generate(ROOT.RooArgSet(x))
data.plotOn(frame)
fitResult = pdf.fitTo(data,ROOT.RooFit.Save())
pdf.plotOn(frame)
frame.Draw()
c1
Out[3]:

We can check the value of the parameters after the fit

In [4]:
mu = w.var('mu')
print 'best fit value of mean for Gaussian is', mu.getVal(), '±', mu.getError()
best fit value of mean for Gaussian is 0.324840840875 ± 0.158716435359

And make a plot of the log(profile likelihood Ratio) as a function of the signal rate

In [5]:
s=w.var('s')
sframe=s.frame()
lnL = pdf.createNLL(data)
lnProfileL = lnL.createProfile(ROOT.RooArgSet(s))
lnProfileL.plotOn(sframe)
sframe.Draw()
c1
Out[5]:

And finally, make a plot that encodes the correlation matrix between the four free parameters in the fit.

In [6]:
corrMatrixHist = fitResult.correlationHist()
corrMatrixHist.SetStats(False)
corrMatrixHist.Draw('colz text')
c1
Out[6]:

Visualize the model

In [38]:
% pylab inline
import networkx as nx
Populating the interactive namespace from numpy and matplotlib
In [63]:
pdf.graphVizTree('expressionTree.dot')
G = nx.DiGraph(nx.read_dot('expressionTree.dot'))
nx.draw_graphviz(G,'dot')
In [97]:
import pydot
g = pydot.graph_from_dot_file('expressionTree.dot')
G2 = nx.from_pydot(g)
nx.draw_graphviz(G2,'dot')

"""
print "inspect networkx graph"
#iterate over networkx graph
print G2.node
for n in G2.nodes():
    print n, G2.out_degree(n), G2.successors(n)

print "inspect pydot graph"
#iterate over pydot graph
for n in g.get_node_list():
    print n.get_label(), n.get_attributes(), n.get_layer()
"""
g.write_png('test.png')
Out[97]:
True

Display the image generated via graphviz above that has the styling attributes

In [77]:
!cat expressionTree.dot
digraph model{
"model" [ color=red, label="RooAddPdf
model"];
"g" [ color=red, label="RooGaussian
g"];
"x" [ color=blue, label="RooRealVar
x"];
"mu" [ color=blue, label="RooRealVar
mu"];
"sigma" [ color=blue, label="RooRealVar
sigma"];
"s" [ color=blue, label="RooRealVar
s"];
"e" [ color=red, label="RooExponential
e"];
"tau" [ color=blue, label="RooRealVar
tau"];
"b" [ color=blue, label="RooRealVar
b"];
"g" -> "x";
"g" -> "mu";
"g" -> "sigma";
"e" -> "x";
"e" -> "tau";
"model" -> "g";
"model" -> "e";
"model" -> "s";
"model" -> "b";
}
In [107]:
print "inspect networkx graph"
#iterate over networkx graph
print G2.node
for n in G2.nodes():
    print n, G2.out_degree(n), G2.successors(n)
G2
inspect networkx graph
{'tau': {'color': 'blue', 'label': '"RooRealVar\ntau"'}, 'b': {'color': 'blue', 'label': '"RooRealVar\nb"'}, 'e': {'color': 'red', 'label': '"RooExponential\ne"'}, 'g': {'color': 'red', 'label': '"RooGaussian\ng"'}, 'mu': {'color': 'blue', 'label': '"RooRealVar\nmu"'}, 's': {'color': 'blue', 'label': '"RooRealVar\ns"'}, 'x': {'color': 'blue', 'label': '"RooRealVar\nx"'}, 'model': {'color': 'red', 'label': '"RooAddPdf\nmodel"'}, 'sigma': {'color': 'blue', 'label': '"RooRealVar\nsigma"'}}
tau 0 []
b 0 []
e 2 ['tau', 'x']
g 3 ['mu', 'x', 'sigma']
mu 0 []
s 0 []
x 0 []
model 4 ['s', 'b', 'e', 'g']
sigma 0 []
{}

This post was created entirely in IPython notebook. Download the raw notebook here, or see a static view on nbviewer.