In this notebook we show how to use inside IPython ROOT (C++ library, de-facto standard in High Energy Physics).
This notebook is aimed to help ROOT users.
Working using ROOT-way loops is very slow in python and in most cases useless.
You're proposed to use root_numpy
— a very convenient python library to operate with ROOT (root_numpy
is included in REP docker image, but it is installed quite easily).
%matplotlib inline
There are two libraries to work with ROOT files
Let's show how to use the second library.
import numpy
import root_numpy
# generating random data
data = numpy.random.normal(size=[10000, 2])
# adding names of columns
data = data.view([('first', float), ('second', float)])
# saving to file
root_numpy.array2root(data, filename='./toy_datasets/random.root', treename='tree', mode='recreate')
!ls ./toy_datasets
MiniBooNE_PID.txt README.md magic04.data random.root wget-log
from rootpy.io import root_open
with root_open('./toy_datasets/random.root', mode='a') as myfile:
new_column = numpy.array(numpy.ones([10000, 1]) , dtype=[('new', 'f8')])
root_numpy.array2tree(new_column, tree=myfile.tree)
myfile.write()
root_numpy.root2array('./toy_datasets/random.root', treename='tree')
array([(-0.7293706390084891, -0.6434950644122529), (0.9144554461677479, 0.18204403981590392), (1.4880249632214615, 1.0750267248542769), ..., (0.8314923070439978, -0.18743518192156958), (0.43746295194683626, -0.3003532169872483), (0.8587695590082538, 0.17430817995134756)], dtype=[('first', '<f8'), ('second', '<f8')])
pay attention that canvas
is on the last line. This is an output value of cell.
When IPython cell return canvas, it is automatically drawn
import ROOT
from rep.plotting import canvas
canvas = canvas('my_canvas')
function1 = ROOT.TF1( 'fun1', 'abs(sin(x)/x)', 0, 10)
canvas.SetGridx()
canvas.SetGridy()
function1.Draw()
# Drawing output (last line is considered as output of cell)
canvas
File = ROOT.TFile("toy_datasets/random.root")
Tree = File.Get("tree")
Tree.Draw("first")
canvas
# we need to keep histogram in any variable, otherwise it will be deleted automatically
h1 = ROOT.TH1F("h1","hist from tree",50, -0.25, 0.25)
Tree.Draw("first>>h1")
canvas
But IPython provides it's own plotting / data manipulation techniques. Brief demostration below.
Pay attention that there is column-expression which is evaluated on-the-fly.
data = root_numpy.root2array("toy_datasets/random.root",
treename='tree',
branches=['first', 'second', 'sin(first) * exp(second)'],
selection='first > 0')
in the example above we selected three branches (one of which is an expression and was computed on-the-fly) and selections
# taking, i.e. first 10 elements using python slicing:
data2 = data[:10]
pandas allows easy manipulations with data.
import pandas
dataframe = pandas.DataFrame(data)
# looking at first elements
dataframe.head()
first | second | sin(first) * exp(second) | |
---|---|---|---|
0 | 0.914455 | 0.182044 | 0.950413 |
1 | 1.488025 | 1.075027 | 2.920040 |
2 | 0.104062 | -0.600626 | 0.056972 |
3 | 0.947616 | -1.385899 | 0.203087 |
4 | 2.528759 | 1.408811 | 2.353142 |
# taking elements, that satisfy some condition, again showing only first
dataframe[dataframe['second'] > 0].head()
first | second | sin(first) * exp(second) | |
---|---|---|---|
0 | 0.914455 | 0.182044 | 0.950413 |
1 | 1.488025 | 1.075027 | 2.920040 |
4 | 2.528759 | 1.408811 | 2.353142 |
7 | 0.248552 | 1.790109 | 1.473571 |
8 | 0.248634 | 2.085482 | 1.980567 |
# adding new column as result of some operation
dataframe['third'] = dataframe['first'] + dataframe['second']
dataframe.head()
first | second | sin(first) * exp(second) | third | |
---|---|---|---|---|
0 | 0.914455 | 0.182044 | 0.950413 | 1.096499 |
1 | 1.488025 | 1.075027 | 2.920040 | 2.563052 |
2 | 0.104062 | -0.600626 | 0.056972 | -0.496564 |
3 | 0.947616 | -1.385899 | 0.203087 | -0.438283 |
4 | 2.528759 | 1.408811 | 2.353142 | 3.937570 |
Default library for plotting in python is matplotlib.
import matplotlib.pyplot as plt
plt.figure(figsize=(9, 7))
plt.hist(data['first'], bins=50)
plt.xlabel('first')
<matplotlib.text.Text at 0x10464c290>
plt.figure(figsize=(9, 7))
plt.hist(data['second'], bins=50)
plt.xlabel('second')
<matplotlib.text.Text at 0x1137e5a10>
root_numpy
as a very nice bridge between two worlds.