T M V A Regression Application

This macro provides a simple example on how to use the trained regression MVAs within an analysis module

  • Project : TMVA - a Root-integrated toolkit for multivariate data analysis
  • Package : TMVA
  • Executable: TMVARegressionApplication

Author: Andreas Hoecker
This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Saturday, September 18, 2021 at 09:51 AM.

In [1]:
%%cpp -d
#include <cstdlib>
#include <vector>
#include <iostream>
#include <map>
#include <string>

#include "TFile.h"
#include "TTree.h"
#include "TString.h"
#include "TSystem.h"
#include "TROOT.h"
#include "TStopwatch.h"

#include "TMVA/Tools.h"
#include "TMVA/Reader.h"

using namespace TMVA;

Arguments are defined.

In [2]:
TString myMethodList = "";

This loads the library

In [3]:
TMVA::Tools::Instance();

Default mva methods to be trained + tested

In [4]:
std::map<std::string,int> Use;

--- mutidimensional likelihood and nearest-neighbour methods

In [5]:
Use["PDERS"]           = 0;
Use["PDEFoam"]         = 1;
Use["KNN"]             = 1;

--- Linear Discriminant Analysis

In [6]:
Use["LD"]              = 1;

--- Function Discriminant analysis

In [7]:
Use["FDA_GA"]          = 0;
Use["FDA_MC"]          = 0;
Use["FDA_MT"]          = 0;
Use["FDA_GAMT"]        = 0;

--- Neural Network

In [8]:
Use["MLP"] = 0;
#ifdef R__HAS_TMVACPU
Use["DNN_CPU"] = 1;
#else
Use["DNN_CPU"] = 0;
#endif

--- Support Vector Machine

In [9]:
Use["SVM"]             = 0;

--- Boosted Decision Trees

In [10]:
Use["BDT"]             = 0;
Use["BDTG"]            = 1;

In [11]:
std::cout << std::endl;
std::cout << "==> Start TMVARegressionApplication" << std::endl;
==> Start TMVARegressionApplication

Select methods (don't look at this code - not of interest)

In [12]:
if (myMethodList != "") {
   for (std::map<std::string,int>::iterator it = Use.begin(); it != Use.end(); it++) it->second = 0;

   std::vector<TString> mlist = gTools().SplitString( myMethodList, ',' );
   for (UInt_t i=0; i<mlist.size(); i++) {
      std::string regMethod(mlist[i]);

      if (Use.find(regMethod) == Use.end()) {
         std::cout << "Method \"" << regMethod << "\" not known in TMVA under this name. Choose among the following:" << std::endl;
         for (std::map<std::string,int>::iterator it = Use.begin(); it != Use.end(); it++) std::cout << it->first << " ";
         std::cout << std::endl;
         return;
      }
      Use[regMethod] = 1;
   }
}

--- create the reader object

In [13]:
TMVA::Reader *reader = new TMVA::Reader( "!Color:!Silent" );

Create a set of variables and declare them to the reader

  • the variable names MUST corresponds in name and type to those given in the weight file(s) used
In [14]:
Float_t var1, var2;
reader->AddVariable( "var1", &var1 );
reader->AddVariable( "var2", &var2 );

Spectator variables declared in the training have to be added to the reader, too

In [15]:
Float_t spec1,spec2;
reader->AddSpectator( "spec1:=var1*2",  &spec1 );
reader->AddSpectator( "spec2:=var1*3",  &spec2 );

--- book the mva methods

In [16]:
TString dir    = "dataset/weights/";
TString prefix = "TMVARegression";

Book method(s)

In [17]:
for (std::map<std::string,int>::iterator it = Use.begin(); it != Use.end(); it++) {
   if (it->second) {
      TString methodName = it->first + " method";
      TString weightfile = dir + prefix + "_" + TString(it->first) + ".weights.xml";
      reader->BookMVA( methodName, weightfile );
   }
}
                         : Booking "BDTG method" of type "BDT" from dataset/weights/TMVARegression_BDTG.weights.xml.
                         : Reading weight file: dataset/weights/TMVARegression_BDTG.weights.xml
<HEADER> DataSetInfo              : [Default] : Added class "Regression"
                         : Booked classifier "BDTG" of type: "BDT"
                         : Booking "DNN_CPU method" of type "DL" from dataset/weights/TMVARegression_DNN_CPU.weights.xml.
                         : Reading weight file: dataset/weights/TMVARegression_DNN_CPU.weights.xml
                         : Booked classifier "DNN_CPU" of type: "DL"
                         : Booking "KNN method" of type "KNN" from dataset/weights/TMVARegression_KNN.weights.xml.
                         : Reading weight file: dataset/weights/TMVARegression_KNN.weights.xml
                         : Creating kd-tree with 1000 events
                         : Computing scale factor for 1d distributions: (ifrac, bottom, top) = (80%, 10%, 90%)
<HEADER> ModulekNN                : Optimizing tree for 2 variables with 1000 values
                         : <Fill> Class 1 has     1000 events
                         : Booked classifier "KNN" of type: "KNN"
                         : Booking "LD method" of type "LD" from dataset/weights/TMVARegression_LD.weights.xml.
                         : Reading weight file: dataset/weights/TMVARegression_LD.weights.xml
                         : Booked classifier "LD" of type: "LD"
                         : Booking "PDEFoam method" of type "PDEFoam" from dataset/weights/TMVARegression_PDEFoam.weights.xml.
                         : Reading weight file: dataset/weights/TMVARegression_PDEFoam.weights.xml
                         : Read foams from file: dataset/weights/TMVARegression_PDEFoam.weights_foams.root
                         : Booked classifier "PDEFoam" of type: "PDEFoam"

Book output histograms

In [18]:
TH1* hists[100];
Int_t nhists = -1;
for (std::map<std::string,int>::iterator it = Use.begin(); it != Use.end(); it++) {
   TH1* h = new TH1F( it->first.c_str(), TString(it->first) + " method", 100, -100, 600 );
   if (it->second) hists[++nhists] = h;
}
nhists++;

Prepare input tree (this must be replaced by your data source) in this example, there is a toy tree with signal and one with background events we'll later on use only the "signal" events for the test in this example.

In [19]:
TFile *input(0);
TString fname = "./tmva_reg_example.root";
if (!gSystem->AccessPathName( fname )) {
   input = TFile::Open( fname ); // check if file in local directory exists
}
else {
   TFile::SetCacheFileDir(".");
   input = TFile::Open("http://root.cern.ch/files/tmva_reg_example.root", "CACHEREAD"); // if not: download from ROOT server
}
if (!input) {
   std::cout << "ERROR: could not open data file" << std::endl;
   exit(1);
}
std::cout << "--- TMVARegressionApp        : Using input file: " << input->GetName() << std::endl;
--- TMVARegressionApp        : Using input file: ./files/tmva_reg_example.root
Info in <TFile::OpenFromCache>: using local cache copy of http://root.cern.ch/files/tmva_reg_example.root [./files/tmva_reg_example.root]

--- event loop

Prepare the tree

  • here the variable names have to corresponds to your tree
  • you can use the same variables as above which is slightly faster, but of course you can use different ones and copy the values inside the event loop
In [20]:
TTree* theTree = (TTree*)input->Get("TreeR");
std::cout << "--- Select signal sample" << std::endl;
theTree->SetBranchAddress( "var1", &var1 );
theTree->SetBranchAddress( "var2", &var2 );

std::cout << "--- Processing: " << theTree->GetEntries() << " events" << std::endl;
TStopwatch sw;
sw.Start();
for (Long64_t ievt=0; ievt<theTree->GetEntries();ievt++) {

   if (ievt%1000 == 0) {
      std::cout << "--- ... Processing event: " << ievt << std::endl;
   }

   theTree->GetEntry(ievt);

   // Retrieve the MVA target values (regression outputs) and fill into histograms
   // NOTE: EvaluateRegression(..) returns a vector for multi-target regression

   for (Int_t ih=0; ih<nhists; ih++) {
      TString title = hists[ih]->GetTitle();
      Float_t val = (reader->EvaluateRegression( title ))[0];
      hists[ih]->Fill( val );
   }
}
sw.Stop();
std::cout << "--- End of event loop: "; sw.Print();
--- Select signal sample
--- Processing: 10000 events
--- ... Processing event: 0
                         : Rebuilding Dataset Default
--- ... Processing event: 1000
--- ... Processing event: 2000
--- ... Processing event: 3000
--- ... Processing event: 4000
--- ... Processing event: 5000
--- ... Processing event: 6000
--- ... Processing event: 7000
--- ... Processing event: 8000
--- ... Processing event: 9000
--- End of event loop: Real time 0:00:03, CP time 3.200

--- write histograms

In [21]:
TFile *target  = new TFile( "TMVARegApp.root","RECREATE" );
for (Int_t ih=0; ih<nhists; ih++) hists[ih]->Write();
target->Close();

std::cout << "--- Created root file: \"" << target->GetName()
          << "\" containing the MVA output histograms" << std::endl;

delete reader;

std::cout << "==> TMVARegressionApplication is done!" << std::endl << std::endl;
--- Created root file: "TMVARegApp.root" containing the MVA output histograms
==> TMVARegressionApplication is done!