This tutorial shows how to use the modern interfaces to apply models saved in TMVA XML files.
Author: Stefan Wunsch
This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Tuesday, March 19, 2024 at 07:19 PM.
using namespace TMVA::Experimental;
Definition of a helper function:
%%cpp -d
void train(const std::string &filename)
{
   // Create the output file and the TMVA factory
   auto output = TFile::Open("TMVARR.root", "RECREATE");
   auto factory = new TMVA::Factory("tmva003",
                                    output, "!V:!DrawProgressBar:AnalysisType=Classification");
   // Open trees with signal and background events
   auto data = TFile::Open(filename.c_str());
   auto signal = (TTree *)data->Get("TreeS");
   auto background = (TTree *)data->Get("TreeB");
   // Add variables and register the trees with the dataloader
   auto dataloader = new TMVA::DataLoader("tmva003_BDT");
   const std::vector<std::string> variables = {"var1", "var2", "var3", "var4"};
   for (const auto &var : variables) {
      dataloader->AddVariable(var);
   }
   dataloader->AddSignalTree(signal, 1.0);
   dataloader->AddBackgroundTree(background, 1.0);
   dataloader->PrepareTrainingAndTestTree("", "");
   // Train a TMVA method
   factory->BookMethod(dataloader, TMVA::Types::kBDT, "BDT", "!V:!H:NTrees=300:MaxDepth=2");
   factory->TrainAllMethods();
   // Close the output file so that the training results are written to disk
   output->Close();
}
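The option strings passed to the Factory and to BookMethod follow TMVA's colon-separated convention: each token is either `Name=Value` or a boolean flag, and a leading `!` switches the flag off (so `!V` disables verbose output and `!H` disables the help printout). The helper `ParseTmvaOptions` below is a hypothetical sketch of that convention, written only to make the strings easier to read; it is not TMVA's own option parser and ignores validation and default values.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not part of TMVA): split a TMVA-style option string
// such as "!V:!H:NTrees=300:MaxDepth=2" into name -> value pairs.
// Boolean flags are stored as "True"/"False"; a leading '!' means "False".
std::map<std::string, std::string> ParseTmvaOptions(const std::string &options)
{
   std::map<std::string, std::string> result;
   std::istringstream stream(options);
   std::string token;
   // Options are separated by ':'
   while (std::getline(stream, token, ':')) {
      if (token.empty())
         continue;
      const auto eq = token.find('=');
      if (eq != std::string::npos) {
         // Key=Value option, e.g. NTrees=300
         result[token.substr(0, eq)] = token.substr(eq + 1);
      } else if (token[0] == '!') {
         // Negated boolean flag, e.g. !V
         result[token.substr(1)] = "False";
      } else {
         // Plain boolean flag, e.g. V
         result[token] = "True";
      }
   }
   return result;
}
```

Applied to the factory options `"!V:!DrawProgressBar:AnalysisType=Classification"`, this sketch yields `V=False`, `DrawProgressBar=False` and `AnalysisType=Classification`.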
First, let's train a model with TMVA.
const std::string filename = "http://root.cern/files/tmva_class_example.root";
train(filename);
DataSetInfo        : [tmva003_BDT] : Added class "Signal"
                   : Add Tree TreeS of type Signal with 6000 events
DataSetInfo        : [tmva003_BDT] : Added class "Background"
                   : Add Tree TreeB of type Background with 6000 events
                   : Dataset[tmva003_BDT] : Class index : 0 name : Signal
                   : Dataset[tmva003_BDT] : Class index : 1 name : Background
Factory            : Booking method: BDT
                   : Rebuilding Dataset tmva003_BDT
                   : Building event vectors for type 2 Signal
                   : Dataset[tmva003_BDT] : create input formulas for tree TreeS
                   : Building event vectors for type 2 Background
                   : Dataset[tmva003_BDT] : create input formulas for tree TreeB
DataSetFactory     : [tmva003_BDT] : Number of events in input trees
                   : Dataset[tmva003_BDT] : Weight renormalisation mode: "EqualNumEvents": renormalises all event classes ...
                   : Dataset[tmva003_BDT] : such that the effective (weighted) number of events in each class is the same
                   : Dataset[tmva003_BDT] : (and equals the number of events (entries) given for class=0 )
                   : Dataset[tmva003_BDT] : ... i.e. such that Sum[i=1..N_j]{w_i} = N_classA, j=classA, classB, ...
                   : Dataset[tmva003_BDT] : ... (note that N_j is the sum of TRAINING events
                   : Dataset[tmva003_BDT] : ..... Testing events are not renormalised nor included in the renormalisation factor!)
                   : Number of training and testing events
                   : ---------------------------------------------------------------------------
                   : Signal     -- training events            : 3000
                   : Signal     -- testing events             : 3000
                   : Signal     -- training and testing events: 6000
                   : Background -- training events            : 3000
                   : Background -- testing events             : 3000
                   : Background -- training and testing events: 6000
DataSetInfo        : Correlation matrix (Signal):
                   : ----------------------------------------
                   :             var1    var2    var3    var4
                   :    var1:  +1.000  +0.390  +0.594  +0.819
                   :    var2:  +0.390  +1.000  +0.684  +0.724
                   :    var3:  +0.594  +0.684  +1.000  +0.848
                   :    var4:  +0.819  +0.724  +0.848  +1.000
                   : ----------------------------------------
DataSetInfo        : Correlation matrix (Background):
                   : ----------------------------------------
                   :             var1    var2    var3    var4
                   :    var1:  +1.000  +0.854  +0.917  +0.965
                   :    var2:  +0.854  +1.000  +0.926  +0.934
                   :    var3:  +0.917  +0.926  +1.000  +0.972
                   :    var4:  +0.965  +0.934  +0.972  +1.000
                   : ----------------------------------------
DataSetFactory     : [tmva003_BDT] :
Factory            : Train all methods
Factory            : [tmva003_BDT] : Create Transformation "I" with events from all classes.
                   : Transformation, Variable selection :
                   : Input : variable 'var1' <---> Output : variable 'var1'
                   : Input : variable 'var2' <---> Output : variable 'var2'
                   : Input : variable 'var3' <---> Output : variable 'var3'
                   : Input : variable 'var4' <---> Output : variable 'var4'
TFHandler_Factory  : Variable        Mean        RMS   [        Min        Max ]
                   : -----------------------------------------------------------
                   :     var1:   0.017312     1.6864   [    -5.8991     4.7639 ]
                   :     var2:  0.0068952     1.5665   [    -5.2454     4.6508 ]
                   :     var3:  0.0094455     1.7427   [    -5.3563     4.6430 ]
                   :     var4:    0.16960     2.1719   [    -6.9675     4.9600 ]
                   : -----------------------------------------------------------
                   : Ranking input variables (method unspecific)...
IdTransformation   : Ranking result (top variable is best ranked)
                   : -----------------------------
                   : Rank : Variable : Separation
                   : -----------------------------
                   :    1 : var4     : 3.564e-01
                   :    2 : var3     : 2.899e-01
                   :    3 : var1     : 2.792e-01
                   :    4 : var2     : 2.260e-01
                   : -----------------------------
Factory            : Train method: BDT for Classification
BDT                : #events: (reweighted) sig: 3000 bkg: 3000
                   : #events: (unweighted) sig: 3000 bkg: 3000
                   : Training 300 Decision Trees ... patience please
                   : Elapsed time for training with 6000 events: 0.589 sec
BDT                : [tmva003_BDT] : Evaluation of BDT on training sample (6000 events)
                   : Elapsed time for evaluation of 6000 events: 0.0721 sec
                   : Creating xml weight file: tmva003_BDT/weights/tmva003_BDT.weights.xml
                   : Creating standalone class: tmva003_BDT/weights/tmva003_BDT.class.C
                   : TMVARR.root:/tmva003_BDT/Method_BDT/BDT
Factory            : Training finished
                   : Ranking input variables (method specific)...
BDT                : Ranking result (top variable is best ranked)
                   : --------------------------------------
                   : Rank : Variable : Variable Importance
                   : --------------------------------------
                   :    1 : var4     : 3.940e-01
                   :    2 : var1     : 2.619e-01
                   :    3 : var2     : 1.849e-01
                   :    4 : var3     : 1.592e-01
                   : --------------------------------------
Factory            : === Destroy and recreate all methods via weight files for testing ===
                   : Reading weight file: tmva003_BDT/weights/tmva003_BDT.weights.xml
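The renormalisation rule printed in the log can be checked with a small calculation. In `EqualNumEvents` mode, the training weights of each class j are scaled so that their sum equals the number of training entries of class 0. The function `EqualNumEventsScale` below is a hypothetical sketch of that stated rule, not TMVA's implementation; it assumes the per-event training weights are supplied per class, with class 0 first.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical sketch of the "EqualNumEvents" rule quoted in the TMVA log:
// scale the training weights of every class j so that
//   Sum_i w_i(class j) == N_0,
// where N_0 is the number of training entries of class 0.
// weightsPerClass[j] holds the per-event training weights of class j.
std::vector<double> EqualNumEventsScale(const std::vector<std::vector<double>> &weightsPerClass)
{
   assert(!weightsPerClass.empty());
   const double n0 = static_cast<double>(weightsPerClass[0].size());
   std::vector<double> scale;
   for (const auto &w : weightsPerClass) {
      const double sumW = std::accumulate(w.begin(), w.end(), 0.0);
      assert(sumW > 0.0);
      // Factor applied to every training weight of this class
      scale.push_back(n0 / sumW);
   }
   return scale;
}
```

With 3000 unit-weight training events in each class, as in this example, both factors are 1, which is consistent with the log line `#events: (reweighted) sig: 3000 bkg: 3000`.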
Next, we load the model from the TMVA XML file.
RReader model("tmva003_BDT/weights/tmva003_BDT.weights.xml");
If you need a reminder of the names and order of the variables used during training, you can ask the model for them.
auto variables = model.GetVariableNames();
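Because the model expects its inputs in exactly this order, one practical use of the variable names is to assemble the input vector from named values. The helper `ToModelInput` below is a hypothetical sketch in plain C++ (it does not depend on TMVA); it assumes the event is available as a name-to-value map and that the names come from a call such as `model.GetVariableNames()`.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: order the values of one event according to the
// variable names the model was trained with (e.g. from GetVariableNames()).
std::vector<float> ToModelInput(const std::vector<std::string> &variableNames,
                                const std::map<std::string, float> &event)
{
   std::vector<float> input;
   input.reserve(variableNames.size());
   for (const auto &name : variableNames) {
      // at() throws std::out_of_range if a required variable is missing
      input.push_back(event.at(name));
   }
   return input;
}
```

For example, a map holding var4=1.5, var2=1.0, var1=0.5 and var3=-0.2 yields {0.5, 1.0, -0.2, 1.5}, the same input used in the single-event inference that follows.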
The model can now be applied in different scenarios:
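Event-by-event inference takes the values of the variables as a std::vector<float>. The return value is also a std::vector<float>, since a model may have more than one output; for this binary classifier, the score is element 0.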
auto prediction = model.Compute({0.5, 1.0, -0.2, 1.5});
std::cout << "Single-event inference: " << prediction[0] << "\n\n";
Single-event inference: 0.233873
For batch inference, the data must be structured as a matrix with one row per event and one column per variable. TMVA uses the RTensor class for this purpose. For convenience, we use RDataFrame and the AsTensor utility to read the data from the ROOT file.
ROOT::RDataFrame df("TreeS", filename);
auto df2 = df.Range(3); // Read only a small subset of the dataset
auto x = AsTensor<float>(df2, variables);
auto y = model.Compute(x);
std::cout << "RTensor input for inference on data of multiple events:\n" << x << "\n\n";
std::cout << "Prediction performed on multiple events: " << y << "\n\n";
RTensor input for inference on data of multiple events:
{ { -1.14361, -0.822373, -0.495426, -0.629427 } { 2.14344, -0.0189228, 0.26703, 1.26749 } { -0.443913, 0.486827, 0.139535, 0.611483 } }

Prediction performed on multiple events: { 0.173541, -0.0540229, 0.266502 }
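The printed tensor has one row per event and one column per variable, i.e. shape {3, 4}. Assuming the row-major layout (which, as far as we know, is RTensor's default), element (i, j) of such a matrix sits at flat index i * nVariables + j of one contiguous buffer. The plain-C++ helper `RowMajorAt` below is hypothetical and only illustrates that indexing with the values printed above; it does not use RTensor itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: read element (row, col) of a row-major matrix that is
// stored as one contiguous buffer with nCols columns.
float RowMajorAt(const std::vector<float> &data, std::size_t nCols, std::size_t row, std::size_t col)
{
   assert(col < nCols);
   assert(row * nCols + col < data.size());
   return data[row * nCols + col];
}

// The 3 x 4 input printed above: events are rows, var1..var4 are columns.
const std::vector<float> kBatch = {
   -1.14361f,  -0.822373f,  -0.495426f, -0.629427f, // event 0
    2.14344f,  -0.0189228f,  0.26703f,   1.26749f,  // event 1
   -0.443913f,  0.486827f,   0.139535f,  0.611483f  // event 2
};
```

Reading row 1 gives the four variables of the second event; the model returns one prediction per row, which is why the prediction vector above has three entries.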
To avoid code duplication, we write a small lambda function that runs the inference on a dataframe and returns a histogram of the BDT score.
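auto make_histo = [&](const std::string &treename) {
   ROOT::RDataFrame df(treename, filename);
   // Compute<4, float> wraps the reader so that Define can call it with the
   // four input columns (var1..var4) passed as float arguments
   auto df2 = df.Define("y", Compute<4, float>(model), variables);
   return df2.Histo1D({treename.c_str(), ";BDT score;N_{Events}", 30, -0.5, 0.5}, "y");
};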
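`Compute<4, float>(model)` adapts the reader to RDataFrame's `Define`, which calls a function with one argument per column. The sketch below illustrates the idea under simplifying assumptions: `MakeComputeFunctor`, `ComputeFunctor` and `MockModel` are hypothetical stand-ins (the mock returns the sum of its inputs), and the functor packs its N arguments into a `std::vector<float>` and forwards them to the model's vector-based `Compute`. It is not the TMVA implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for a model with a vector-in, vector-out Compute method.
struct MockModel {
   std::vector<float> Compute(const std::vector<float> &x) const
   {
      // Single output: the sum of all inputs
      return {std::accumulate(x.begin(), x.end(), 0.0f)};
   }
};

// Maps every index of a pack to the same type T.
template <std::size_t, typename T>
struct Repeat {
   using type = T;
};

template <typename Model, typename T, typename Indices>
struct ComputeFunctor;

// Functor taking exactly N arguments of type T (one per dataframe column).
template <typename Model, typename T, std::size_t... I>
struct ComputeFunctor<Model, T, std::index_sequence<I...>> {
   Model model;
   std::vector<float> operator()(typename Repeat<I, T>::type... args) const
   {
      // Pack the column values in order and forward them to the model
      return model.Compute({static_cast<float>(args)...});
   }
};

// Build a functor that accepts N values of type T.
template <std::size_t N, typename T, typename Model>
ComputeFunctor<Model, T, std::make_index_sequence<N>> MakeComputeFunctor(Model model)
{
   return {std::move(model)};
}
```

If, as this sketch assumes, the real functor also returns a std::vector<float>, the column `y` is a collection and `Histo1D` fills each of its elements; for this single-output BDT that amounts to one entry per event.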
auto sig = make_histo("TreeS");
auto bkg = make_histo("TreeB");
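The histogram model `{treename.c_str(), ";BDT score;N_{Events}", 30, -0.5, 0.5}` uses 30 equal-width bins between -0.5 and 0.5, so each bin is 1/30 (about 0.033) wide. The function `FindFixedBin` below is a hypothetical plain-C++ sketch of the usual fixed-width binning convention (bin 0 is the underflow, bins 1 to 30 are the regular bins, bin 31 is the overflow), similar in spirit to ROOT's `TAxis::FindBin`; it is an illustration only, not ROOT code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch of fixed-width binning with ROOT-style numbering:
// 0 = underflow, 1..nBins = regular bins, nBins + 1 = overflow.
int FindFixedBin(double x, int nBins, double xMin, double xMax)
{
   assert(nBins > 0 && xMax > xMin);
   if (x < xMin)
      return 0;
   if (x >= xMax)
      return nBins + 1;
   // Regular bins are half-open intervals [low, high)
   const int bin = 1 + static_cast<int>(std::floor(nBins * (x - xMin) / (xMax - xMin)));
   // Guard against rounding pushing a value just below xMax past the last bin
   return bin > nBins ? nBins : bin;
}
```

For instance, the single-event score 0.233873 printed earlier falls into bin 23, which covers roughly 0.233 to 0.267. Scores outside [-0.5, 0.5) would end up in the underflow or overflow bin and would not appear in the plot.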
Make plot
gStyle->SetOptStat(0);
auto c = new TCanvas("", "", 800, 800);
sig->SetLineColor(kRed);
bkg->SetLineColor(kBlue);
sig->SetLineWidth(2);
bkg->SetLineWidth(2);
bkg->Draw("HIST");
sig->Draw("HIST SAME");
TLegend legend(0.7, 0.7, 0.89, 0.89);
legend.SetBorderSize(0);
legend.AddEntry("TreeS", "Signal", "l");
legend.AddEntry("TreeB", "Background", "l");
legend.Draw();
c->DrawClone();