DataLoader Example

Declare Factory

In [1]:
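// Load the TMVA library.
TMVA::Tools::Instance();

// Input trees come from a remote ROOT file; training results go to a local one.
auto inputFile = TFile::Open("https://raw.githubusercontent.com/iml-wg/tmvatutorials/master/inputdata.root");
auto outputFile = TFile::Open("TMVAOutputCV.root", "RECREATE");

// A single Factory handles the methods booked on both DataLoaders below.
TMVA::Factory factory("TMVAClassification", outputFile,
                      "!V:ROC:!Correlations:!Silent:Color:!DrawProgressBar:AnalysisType=Classification");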
--- Factory                  : You are running ROOT Version: 6.07/07, Apr 1, 2016
--- Factory                  : 
--- Factory                  : _/_/_/_/_/ _|      _|  _|      _|    _|_|   
--- Factory                  :    _/      _|_|  _|_|  _|      _|  _|    _| 
--- Factory                  :   _/       _|  _|  _|  _|      _|  _|_|_|_| 
--- Factory                  :  _/        _|      _|    _|  _|    _|    _| 
--- Factory                  : _/         _|      _|      _|      _|    _| 
--- Factory                  : 
--- Factory                  : ___________TMVA Version 4.2.1, Feb 5, 2015
--- Factory                  : 
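
The option string passed to the Factory is a colon-separated list in which a bare name switches a flag on, a leading `!` switches it off, and `Name=Value` sets a value. The sketch below illustrates that layout in plain C++. `parseOptions` and `checkFactoryOptions` are hypothetical helpers written for this illustration; they are not part of TMVA and do not reproduce its own option handling.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not a TMVA function): split a colon-separated option
// string into name/value pairs.
//   "Name"       -> {"Name", "true"}
//   "!Name"      -> {"Name", "false"}
//   "Name=Value" -> {"Name", "Value"}
std::map<std::string, std::string> parseOptions(const std::string& options) {
    std::map<std::string, std::string> parsed;
    std::istringstream stream(options);
    std::string token;
    while (std::getline(stream, token, ':')) {
        if (token.empty()) continue;  // tolerate "a::b" or a trailing ':'
        const std::string::size_type eq = token.find('=');
        if (eq != std::string::npos) {
            parsed[token.substr(0, eq)] = token.substr(eq + 1);
        } else if (token[0] == '!') {
            parsed[token.substr(1)] = "false";
        } else {
            parsed[token] = "true";
        }
    }
    return parsed;
}

// Usage with the Factory option string from this notebook.
void checkFactoryOptions() {
    const auto opts = parseOptions(
        "!V:ROC:!Correlations:!Silent:Color:!DrawProgressBar:AnalysisType=Classification");
    assert(opts.at("V") == "false");             // verbose output off
    assert(opts.at("ROC") == "true");            // ROC flag on
    assert(opts.at("Correlations") == "false");  // correlation flag off
    assert(opts.at("AnalysisType") == "Classification");
}
```

Read this way, the Factory above is configured for classification with verbose output, correlations and the progress bar switched off.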

Declare DataLoader(s)

In [2]:
// First loader: three input variables read directly from the trees.
TMVA::DataLoader loader1("dataset1");

loader1.AddVariable("var1");
loader1.AddVariable("var2");
loader1.AddVariable("var3");

// Second loader: four tree variables plus two derived ones ("name := expression").
TMVA::DataLoader loader2("dataset2");

loader2.AddVariable("var1");
loader2.AddVariable("var2");
loader2.AddVariable("var3");
loader2.AddVariable("var4");
loader2.AddVariable("var5 := var1-var3");
loader2.AddVariable("var6 := var1+var2");
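
The `name := expression` form defines a new input variable whose value is computed per event from existing tree variables. As a rough illustration of what `var5` and `var6` contain, here is a plain C++ sketch. The structs and `buildLoaderTwoEvent` are hypothetical names used only here; TMVA evaluates the expressions itself.

```cpp
#include <cassert>

// Values read from one tree entry (hypothetical struct for illustration).
struct InputEvent {
    double var1, var2, var3, var4;
};

// The six inputs declared on loader2 (hypothetical struct for illustration).
struct LoaderTwoEvent {
    double var1, var2, var3, var4, var5, var6;
};

// Compute the two derived variables exactly as the expressions state.
LoaderTwoEvent buildLoaderTwoEvent(const InputEvent& in) {
    return {in.var1, in.var2, in.var3, in.var4,
            in.var1 - in.var3,   // var5 := var1-var3
            in.var1 + in.var2};  // var6 := var1+var2
}

// Usage: one event with easily checked values.
void checkDerivedVariables() {
    const LoaderTwoEvent e = buildLoaderTwoEvent({1.5, 0.25, 0.5, 2.0});
    assert(e.var5 == 1.0);
    assert(e.var6 == 1.75);
}
```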

Set Up Dataset(s)

In [3]:
// Fetch the signal and background trees from the input file.
TTree *tsignal = nullptr, *tbackground = nullptr;
inputFile->GetObject("Sig", tsignal);
inputFile->GetObject("Bkg", tbackground);

// Empty cuts: no preselection is applied to either class.
TCut mycuts, mycutb;

loader1.AddSignalTree    (tsignal,     1.0);   // signal weight = 1
loader1.AddBackgroundTree(tbackground, 1.0);   // background weight = 1
loader1.PrepareTrainingAndTestTree(mycuts, mycutb,
                                   "nTrain_Signal=1000:nTrain_Background=1000:SplitMode=Random:NormMode=NumEvents:!V");

loader2.AddSignalTree    (tsignal,     1.0);   // signal weight = 1
loader2.AddBackgroundTree(tbackground, 1.0);   // background weight = 1
loader2.PrepareTrainingAndTestTree(mycuts, mycutb,
                                   "nTrain_Signal=1000:nTrain_Background=1000:SplitMode=Random:NormMode=NumEvents:!V");
--- DataSetInfo              : Dataset[dataset1] : Added class "Signal"	 with internal class number 0
--- dataset1                 : Add Tree Sig of type Signal with 6000 events
--- DataSetInfo              : Dataset[dataset1] : Added class "Background"	 with internal class number 1
--- dataset1                 : Add Tree Bkg of type Background with 6000 events
--- dataset1                 : Preparing trees for training and testing...
--- DataSetInfo              : Dataset[dataset2] : Added class "Signal"	 with internal class number 0
--- dataset2                 : Add Tree Sig of type Signal with 6000 events
--- DataSetInfo              : Dataset[dataset2] : Added class "Background"	 with internal class number 1
--- dataset2                 : Add Tree Bkg of type Background with 6000 events
--- dataset2                 : Preparing trees for training and testing...
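
With `SplitMode=Random` and `nTrain_Signal=1000`, 1000 of the 6000 signal events are picked at random for training, and, since no test count is requested, the remaining 5000 are used for testing (the same holds for background). The following plain C++ sketch shows one way such a split could be done. `Split` and `randomSplit` are hypothetical names, and the shuffle-then-cut strategy is an assumption made for illustration, not a description of TMVA's internal implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Event indices assigned to each sample (hypothetical struct for illustration).
struct Split {
    std::vector<std::size_t> train;
    std::vector<std::size_t> test;
};

// Shuffle the indices 0..nEvents-1, give the first nTrain to training and
// everything that is left to testing.
Split randomSplit(std::size_t nEvents, std::size_t nTrain, unsigned seed) {
    assert(nTrain <= nEvents);
    std::vector<std::size_t> indices(nEvents);
    std::iota(indices.begin(), indices.end(), std::size_t{0});

    std::mt19937 rng(seed);  // fixed seed keeps the split reproducible
    std::shuffle(indices.begin(), indices.end(), rng);

    const auto cut = indices.begin() + static_cast<std::ptrdiff_t>(nTrain);
    Split split;
    split.train.assign(indices.begin(), cut);
    split.test.assign(cut, indices.end());
    return split;
}
```

Calling `randomSplit(6000, 1000, 42)` yields 1000 training and 5000 testing indices with no index shared between the two samples; the seed value 42 is arbitrary.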

Booking Methods

First dataset

In [4]:
// Boosted Decision Trees
factory.BookMethod(&loader1, TMVA::Types::kBDT, "BDT",
                   "!V:NTrees=200:MinNodeSize=2.5%:MaxDepth=2:BoostType=AdaBoost:AdaBoostBeta=0.5:UseBaggedBoost:BaggedSampleFraction=0.5:SeparationType=GiniIndex:nCuts=20");

// Multi-Layer Perceptron (Neural Network)
factory.BookMethod(&loader1, TMVA::Types::kMLP, "MLP",
                   "!H:!V:NeuronType=tanh:VarTransform=N:NCycles=100:HiddenLayers=N+5:TestRate=5:!UseRegulator");
--- Factory                  : Booking method: BDT DataSet Name: dataset1
--- DataSetFactory           : Dataset[dataset1] : Splitmode is: "RANDOM" the mixmode is: "SAMEASSPLITMODE"
--- DataSetFactory           : Dataset[dataset1] : Create training and testing trees -- looping over class "Signal" ...
--- DataSetFactory           : Dataset[dataset1] : Weight expression for class 'Signal': ""
--- DataSetFactory           : Dataset[dataset1] : Create training and testing trees -- looping over class "Background" ...
--- DataSetFactory           : Dataset[dataset1] : Weight expression for class 'Background': ""
--- DataSetFactory           : Dataset[dataset1] : Number of events in input trees (after possible flattening of arrays):
--- DataSetFactory           : Dataset[dataset1] :     Signal          -- number of events       : 6000   / sum of weights: 6000 
--- DataSetFactory           : Dataset[dataset1] :     Background      -- number of events       : 6000   / sum of weights: 6000 
--- DataSetFactory           : Dataset[dataset1] :     Signal     tree -- total number of entries: 6000 
--- DataSetFactory           : Dataset[dataset1] :     Background tree -- total number of entries: 6000 
--- DataSetFactory           : Dataset[dataset1] : Preselection: (will NOT affect number of requested training and testing events)
--- DataSetFactory           : Dataset[dataset1] :     No preselection cuts applied on event classes
--- DataSetFactory           : Dataset[dataset1] : Weight renormalisation mode: "NumEvents": renormalises all event classes 
--- DataSetFactory           : Dataset[dataset1] :  such that the effective (weighted) number of events in each class equals the respective 
--- DataSetFactory           : Dataset[dataset1] :  number of events (entries) that you demanded in PrepareTrainingAndTestTree("","nTrain_Signal=.. )
--- DataSetFactory           : Dataset[dataset1] :  ... i.e. such that Sum[i=1..N_j]{w_i} = N_j, j=0,1,2...
--- DataSetFactory           : Dataset[dataset1] :  ... (note that N_j is the sum of TRAINING events (nTrain_j...with j=Signal,Background..
--- DataSetFactory           : Dataset[dataset1] :  ..... Testing events are not renormalised nor included in the renormalisation factor! )
--- DataSetFactory           : Dataset[dataset1] : --> Rescale Signal     event weights by factor: 1
--- DataSetFactory           : Dataset[dataset1] : --> Rescale Background event weights by factor: 1
--- DataSetFactory           : Dataset[dataset1] : Number of training and testing events after rescaling:
--- DataSetFactory           : Dataset[dataset1] : ---------------------------------------------------------------------------
--- DataSetFactory           : Dataset[dataset1] : Signal     -- training events            : 1000 (sum of weights: 1000) - requested were 1000 events
--- DataSetFactory           : Dataset[dataset1] : Signal     -- testing events             : 5000 (sum of weights: 5000) - requested were 0 events
--- DataSetFactory           : Dataset[dataset1] : Signal     -- training and testing events: 6000 (sum of weights: 6000)
--- DataSetFactory           : Dataset[dataset1] : Background -- training events            : 1000 (sum of weights: 1000) - requested were 1000 events
--- DataSetFactory           : Dataset[dataset1] : Background -- testing events             : 5000 (sum of weights: 5000) - requested were 0 events
--- DataSetFactory           : Dataset[dataset1] : Background -- training and testing events: 6000 (sum of weights: 6000)
--- DataSetFactory           : Dataset[dataset1] : Create internal training tree
--- DataSetFactory           : Dataset[dataset1] : Create internal testing tree
--- DataSetInfo              : Dataset[dataset1] : Correlation matrix (Signal):
--- DataSetInfo              : --------------------------------
--- DataSetInfo              :             var1    var2    var3
--- DataSetInfo              :    var1:  +1.000  +0.386  +0.597
--- DataSetInfo              :    var2:  +0.386  +1.000  +0.696
--- DataSetInfo              :    var3:  +0.597  +0.696  +1.000
--- DataSetInfo              : --------------------------------
--- DataSetInfo              : Dataset[dataset1] : Correlation matrix (Background):
--- DataSetInfo              : --------------------------------
--- DataSetInfo              :             var1    var2    var3
--- DataSetInfo              :    var1:  +1.000  +0.856  +0.914
--- DataSetInfo              :    var2:  +0.856  +1.000  +0.927
--- DataSetInfo              :    var3:  +0.914  +0.927  +1.000
--- DataSetInfo              : --------------------------------
--- DataSetFactory           : Dataset[dataset1] :  
--- Factory                  : Booking method: MLP DataSet Name: dataset1
--- MLP                      : Dataset[dataset1] : Create Transformation "N" with events from all classes.
--- Norm                     : Transformation, Variable selection : 
--- Norm                     : Input : variable 'var1' (index=0).   <---> Output : variable 'var1' (index=0).
--- Norm                     : Input : variable 'var2' (index=1).   <---> Output : variable 'var2' (index=1).
--- Norm                     : Input : variable 'var3' (index=2).   <---> Output : variable 'var3' (index=2).
--- MLP                      : Building Network
--- MLP                      : Initializing weights
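
The log above describes `NormMode=NumEvents`: the training weights of each class are rescaled so that their sum equals the number of training events in that class, Sum[i=1..N_j]{w_i} = N_j. A minimal C++ sketch of that factor follows. `numEventsRescaleFactor` is a hypothetical helper, and it assumes the vector holds the training weights of a single class.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Factor f such that f * Sum(w_i) = N, where N is the number of training
// events of one class and w_i are their weights.
double numEventsRescaleFactor(const std::vector<double>& trainingWeights) {
    const double sumOfWeights =
        std::accumulate(trainingWeights.begin(), trainingWeights.end(), 0.0);
    assert(sumOfWeights > 0.0);  // the factor is undefined for an empty or zero-weight class
    return static_cast<double>(trainingWeights.size()) / sumOfWeights;
}
```

For 1000 training events of weight 1.0, as in this notebook, the factor is 1, which matches the "Rescale ... event weights by factor: 1" lines in the log.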

Second dataset

In [5]:
// Boosted Decision Trees
factory.BookMethod(&loader2, TMVA::Types::kBDT, "BDT",
                   "!V:NTrees=200:MinNodeSize=2.5%:MaxDepth=2:BoostType=AdaBoost:AdaBoostBeta=0.5:UseBaggedBoost:BaggedSampleFraction=0.5:SeparationType=GiniIndex:nCuts=20");

// Multi-Layer Perceptron (Neural Network)
factory.BookMethod(&loader2, TMVA::Types::kMLP, "MLP",
                   "!H:!V:NeuronType=tanh:VarTransform=N:NCycles=100:HiddenLayers=N+5:TestRate=5:!UseRegulator");
--- Factory                  : Booking method: BDT DataSet Name: dataset2
--- DataSetFactory           : Dataset[dataset2] : Splitmode is: "RANDOM" the mixmode is: "SAMEASSPLITMODE"
--- DataSetFactory           : Dataset[dataset2] : Create training and testing trees -- looping over class "Signal" ...
--- DataSetFactory           : Dataset[dataset2] : Weight expression for class 'Signal': ""
--- DataSetFactory           : Dataset[dataset2] : Create training and testing trees -- looping over class "Background" ...
--- DataSetFactory           : Dataset[dataset2] : Weight expression for class 'Background': ""
--- DataSetFactory           : Dataset[dataset2] : Number of events in input trees (after possible flattening of arrays):
--- DataSetFactory           : Dataset[dataset2] :     Signal          -- number of events       : 6000   / sum of weights: 6000 
--- DataSetFactory           : Dataset[dataset2] :     Background      -- number of events       : 6000   / sum of weights: 6000 
--- DataSetFactory           : Dataset[dataset2] :     Signal     tree -- total number of entries: 6000 
--- DataSetFactory           : Dataset[dataset2] :     Background tree -- total number of entries: 6000 
--- DataSetFactory           : Dataset[dataset2] : Preselection: (will NOT affect number of requested training and testing events)
--- DataSetFactory           : Dataset[dataset2] :     No preselection cuts applied on event classes
--- DataSetFactory           : Dataset[dataset2] : Weight renormalisation mode: "NumEvents": renormalises all event classes 
--- DataSetFactory           : Dataset[dataset2] :  such that the effective (weighted) number of events in each class equals the respective 
--- DataSetFactory           : Dataset[dataset2] :  number of events (entries) that you demanded in PrepareTrainingAndTestTree("","nTrain_Signal=.. )
--- DataSetFactory           : Dataset[dataset2] :  ... i.e. such that Sum[i=1..N_j]{w_i} = N_j, j=0,1,2...
--- DataSetFactory           : Dataset[dataset2] :  ... (note that N_j is the sum of TRAINING events (nTrain_j...with j=Signal,Background..
--- DataSetFactory           : Dataset[dataset2] :  ..... Testing events are not renormalised nor included in the renormalisation factor! )
--- DataSetFactory           : Dataset[dataset2] : --> Rescale Signal     event weights by factor: 1
--- DataSetFactory           : Dataset[dataset2] : --> Rescale Background event weights by factor: 1
--- DataSetFactory           : Dataset[dataset2] : Number of training and testing events after rescaling:
--- DataSetFactory           : Dataset[dataset2] : ---------------------------------------------------------------------------
--- DataSetFactory           : Dataset[dataset2] : Signal     -- training events            : 1000 (sum of weights: 1000) - requested were 1000 events
--- DataSetFactory           : Dataset[dataset2] : Signal     -- testing events             : 5000 (sum of weights: 5000) - requested were 0 events
--- DataSetFactory           : Dataset[dataset2] : Signal     -- training and testing events: 6000 (sum of weights: 6000)
--- DataSetFactory           : Dataset[dataset2] : Background -- training events            : 1000 (sum of weights: 1000) - requested were 1000 events
--- DataSetFactory           : Dataset[dataset2] : Background -- testing events             : 5000 (sum of weights: 5000) - requested were 0 events
--- DataSetFactory           : Dataset[dataset2] : Background -- training and testing events: 6000 (sum of weights: 6000)
--- DataSetFactory           : Dataset[dataset2] : Create internal training tree
--- DataSetFactory           : Dataset[dataset2] : Create internal testing tree
--- DataSetInfo              : Dataset[dataset2] : Correlation matrix (Signal):
--- DataSetInfo              : --------------------------------------------------------------
--- DataSetInfo              :               var1    var2    var3    var4 var1-var3 var1+var2
--- DataSetInfo              :      var1:  +1.000  +0.386  +0.597  +0.808    +0.442    +0.826
--- DataSetInfo              :      var2:  +0.386  +1.000  +0.696  +0.743    -0.350    +0.839
--- DataSetInfo              :      var3:  +0.597  +0.696  +1.000  +0.860    -0.456    +0.778
--- DataSetInfo              :      var4:  +0.808  +0.743  +0.860  +1.000    -0.065    +0.931
--- DataSetInfo              : var1-var3:  +0.442  -0.350  -0.456  -0.065    +1.000    +0.047
--- DataSetInfo              : var1+var2:  +0.826  +0.839  +0.778  +0.931    +0.047    +1.000
--- DataSetInfo              : --------------------------------------------------------------
--- DataSetInfo              : Dataset[dataset2] : Correlation matrix (Background):
--- DataSetInfo              : --------------------------------------------------------------
--- DataSetInfo              :               var1    var2    var3    var4 var1-var3 var1+var2
--- DataSetInfo              :      var1:  +1.000  +0.856  +0.914  +0.964    +0.122    +0.966
--- DataSetInfo              :      var2:  +0.856  +1.000  +0.927  +0.937    -0.248    +0.960
--- DataSetInfo              :      var3:  +0.914  +0.927  +1.000  +0.971    -0.290    +0.955
--- DataSetInfo              :      var4:  +0.964  +0.937  +0.971  +1.000    -0.101    +0.987
--- DataSetInfo              : var1-var3:  +0.122  -0.248  -0.290  -0.101    +1.000    -0.057
--- DataSetInfo              : var1+var2:  +0.966  +0.960  +0.955  +0.987    -0.057    +1.000
--- DataSetInfo              : --------------------------------------------------------------
--- DataSetFactory           : Dataset[dataset2] :  
--- Factory                  : Booking method: MLP DataSet Name: dataset2
--- MLP                      : Dataset[dataset2] : Create Transformation "N" with events from all classes.
--- Norm                     : Transformation, Variable selection : 
--- Norm                     : Input : variable 'var1' (index=0).   <---> Output : variable 'var1' (index=0).
--- Norm                     : Input : variable 'var2' (index=1).   <---> Output : variable 'var2' (index=1).
--- Norm                     : Input : variable 'var3' (index=2).   <---> Output : variable 'var3' (index=2).
--- Norm                     : Input : variable 'var4' (index=3).   <---> Output : variable 'var4' (index=3).
--- Norm                     : Input : variable 'var5' (index=4).   <---> Output : variable 'var5' (index=4).
--- Norm                     : Input : variable 'var6' (index=5).   <---> Output : variable 'var6' (index=5).
--- MLP                      : Building Network
--- MLP                      : Initializing weights
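
The MLP is booked with `VarTransform=N`, which normalises each input variable linearly so that its minimum maps to -1 and its maximum to +1 (the `TFHandler_MLP` table in the training output shows these ranges). The sketch below writes out that mapping in plain C++. `normalise` is a hypothetical helper, and the minimum and maximum are assumed to be supplied by the caller.

```cpp
#include <cassert>

// Linear map from [min, max] to [-1, +1].
double normalise(double x, double min, double max) {
    assert(max > min);  // a constant variable cannot be normalised this way
    return 2.0 * (x - min) / (max - min) - 1.0;
}
```

For example, with `min = 0` and `max = 4`, the values 0, 2 and 4 map to -1, 0 and +1.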

Train Methods

In [6]:
factory.TrainAllMethods();
--- Factory                  :  
--- Factory                  : Train all methods for Classification ...
--- Factory                  : 
--- Factory                  : current transformation string: 'I'
--- Factory                  : Dataset[dataset1] : Create Transformation "I" with events from all classes.
--- Id                       : Transformation, Variable selection : 
--- Id                       : Input : variable 'var1' (index=0).   <---> Output : variable 'var1' (index=0).
--- Id                       : Input : variable 'var2' (index=1).   <---> Output : variable 'var2' (index=1).
--- Id                       : Input : variable 'var3' (index=2).   <---> Output : variable 'var3' (index=2).
--- Id                       : Preparing the Identity transformation...
--- TFHandler_Factory        : -----------------------------------------------------------
--- TFHandler_Factory        : Variable        Mean        RMS   [        Min        Max ]
--- TFHandler_Factory        : -----------------------------------------------------------
--- TFHandler_Factory        :     var1:   0.088993     1.6612   [    -4.9583     4.3390 ]
--- TFHandler_Factory        :     var2:   0.095657     1.5731   [    -4.6474     3.9513 ]
--- TFHandler_Factory        :     var3:    0.10551     1.7363   [    -5.0373     4.2785 ]
--- TFHandler_Factory        : -----------------------------------------------------------
--- TFHandler_Factory        : Plot event variables for Id
--- TFHandler_Factory        : Create scatter and profile plots in target-file directory: 
--- TFHandler_Factory        : TMVAOutputCV.root:/dataset1/InputVariables_Id/CorrelationPlots
--- TFHandler_Factory        :  
--- TFHandler_Factory        : Ranking input variables (method unspecific)...
--- IdTransformation         : Ranking result (top variable is best ranked)
--- IdTransformation         : -----------------------------
--- IdTransformation         : Rank : Variable  : Separation
--- IdTransformation         : -----------------------------
--- IdTransformation         :    1 : var1      : 2.898e-01
--- IdTransformation         :    2 : var3      : 2.828e-01
--- IdTransformation         :    3 : var2      : 2.032e-01
--- IdTransformation         : -----------------------------
--- Factory                  : Train method: BDT for Classification
--- BDT                      : Dataset[dataset1] : Begin training
--- BDT                      :  found and suggest the following possible pre-selection cuts 
--- BDT                      : as option DoPreselection was not used, these cuts however will not be performed, but the training will see the full sample
--- BDT                      :  found cut: Bkg if var 0 < -2.99038
--- BDT                      :  found cut: Bkg if var 2 < -2.88493
--- BDT                      : <InitEventSample> For classification trees, 
--- BDT                      :  the effective number of backgrounds is scaled to match 
--- BDT                      :  the signal. Othersise the first boosting step would do 'just that'!
--- BDT                      : re-normlise events such that Sig and Bkg have respective sum of weights = 1
--- BDT                      :   sig->sig*1ev. bkg->bkg*1ev.
--- BDT                      : #events: (reweighted) sig: 1000 bkg: 1000
--- BDT                      : #events: (unweighted) sig: 1000 bkg: 1000
--- BDT                      : Training 200 Decision Trees ... patience please
--- BinaryTree               : The minimal node size MinNodeSize=2.5 fMinNodeSize=2.5% is translated to an actual number of events = 25.7 for the training sample size of 1028
--- BinaryTree               : Note: This number will be taken as absolute minimum in the node, 
--- BinaryTree               :       in terms of 'weighted events' and unweighted ones !! 
--- BDT                      : <Train> elapsed time: 0.118 sec                              
--- BDT                      : <Train> average number of nodes (w/o pruning) : 4
--- BDT                      : Dataset[dataset1] : End of training                                              
--- BDT                      : Dataset[dataset1] : Elapsed time for training with 2000 events: 0.123 sec         
--- BDT                      : Dataset[dataset1] : Create MVA output for Dataset[dataset1] : classification on training sample
--- BDT                      : Dataset[dataset1] : Evaluation of BDT on training sample (2000 events)
--- BDT                      : Dataset[dataset1] : Elapsed time for evaluation of 2000 events: 0.0283 sec       
--- BDT                      : Dataset[dataset1] : Creating weight file in xml format: dataset1/weights/TMVAClassification_BDT.weights.xml
--- BDT                      : Dataset[dataset1] : Creating standalone response class: dataset1/weights/TMVAClassification_BDT.class.C
--- BDT                      : Write monitoring histograms to file: TMVAOutputCV.root:/dataset1/Method_BDT/BDT
--- Factory                  : Training finished
--- Factory                  : Train method: MLP for Classification
--- Norm                     : Preparing the transformation.
--- TFHandler_MLP            : -----------------------------------------------------------
--- TFHandler_MLP            : Variable        Mean        RMS   [        Min        Max ]
--- TFHandler_MLP            : -----------------------------------------------------------
--- TFHandler_MLP            :     var1:   0.085752    0.35735   [    -1.0000     1.0000 ]
--- TFHandler_MLP            :     var2:    0.10321    0.36589   [    -1.0000     1.0000 ]
--- TFHandler_MLP            :     var3:    0.10411    0.37276   [    -1.0000     1.0000 ]
--- TFHandler_MLP            : -----------------------------------------------------------
--- MLP                      : Dataset[dataset1] : Begin training
--- MLP                      : Training Network
--- MLP                      : Dataset[dataset1] : End of training                                              
--- MLP                      : Dataset[dataset1] : Elapsed time for training with 2000 events: 0.955 sec         
--- MLP                      : Dataset[dataset1] : Create MVA output for Dataset[dataset1] : classification on training sample
--- MLP                      : Dataset[dataset1] : Evaluation of MLP on training sample (2000 events)
--- MLP                      : Dataset[dataset1] : Elapsed time for evaluation of 2000 events: 0.00232 sec       
--- MLP                      : Dataset[dataset1] : Creating weight file in xml format: dataset1/weights/TMVAClassification_MLP.weights.xml
--- MLP                      : Dataset[dataset1] : Creating standalone response class: dataset1/weights/TMVAClassification_MLP.class.C
--- MLP                      : Write special histos to file: TMVAOutputCV.root:/dataset1/Method_MLP/MLP
--- Factory                  : Training finished
--- Factory                  : 
--- Factory                  : Ranking input variables (method specific)...
--- BDT                      : Ranking result (top variable is best ranked)
--- BDT                      : --------------------------------------
--- BDT                      : Rank : Variable  : Variable Importance
--- BDT                      : --------------------------------------
--- BDT                      :    1 : var3      : 3.607e-01
--- BDT                      :    2 : var1      : 3.428e-01
--- BDT                      :    3 : var2      : 2.965e-01
--- BDT                      : --------------------------------------
--- MLP                      : Ranking result (top variable is best ranked)
--- MLP                      : -----------------------------
--- MLP                      : Rank : Variable  : Importance
--- MLP                      : -----------------------------
--- MLP                      :    1 : var1      : 3.074e+00
--- MLP                      :    2 : var3      : 2.229e+00
--- MLP                      :    3 : var2      : 1.144e+00
--- MLP                      : -----------------------------
--- Factory                  : 
--- Factory                  : === Destroy and recreate all methods via weight files for testing ===
--- Factory                  : 
--- MethodBase               : Dataset[dataset1] : Reading weight file: dataset1/weights/TMVAClassification_BDT.weights.xml
--- BDT                      : Dataset[dataset1] : Read method "BDT" of type "BDT"
--- BDT                      : Dataset[dataset1] : MVA method was trained with TMVA Version: 4.2.1
--- BDT                      : Dataset[dataset1] : MVA method was trained with ROOT Version: 6.07/07
--- MethodBase               : Dataset[dataset1] : Reading weight file: dataset1/weights/TMVAClassification_MLP.weights.xml
--- MLP                      : Dataset[dataset1] : Read method "MLP" of type "MLP"
--- MLP                      : Dataset[dataset1] : MVA method was trained with TMVA Version: 4.2.1
--- MLP                      : Dataset[dataset1] : MVA method was trained with ROOT Version: 6.07/07
--- MLP                      : Building Network
--- MLP                      : Initializing weights
--- Factory                  : 
--- Factory                  : current transformation string: 'I'
--- Factory                  : Dataset[dataset2] : Create Transformation "I" with events from all classes.
--- Id                       : Transformation, Variable selection : 
--- Id                       : Input : variable 'var1' (index=0).   <---> Output : variable 'var1' (index=0).
--- Id                       : Input : variable 'var2' (index=1).   <---> Output : variable 'var2' (index=1).
--- Id                       : Input : variable 'var3' (index=2).   <---> Output : variable 'var3' (index=2).
--- Id                       : Input : variable 'var4' (index=3).   <---> Output : variable 'var4' (index=3).
--- Id                       : Input : variable 'var5' (index=4).   <---> Output : variable 'var5' (index=4).
--- Id                       : Input : variable 'var6' (index=5).   <---> Output : variable 'var6' (index=5).
--- Id                       : Preparing the Identity transformation...
--- TFHandler_Factory        : -----------------------------------------------------------
--- TFHandler_Factory        : Variable        Mean        RMS   [        Min        Max ]
--- TFHandler_Factory        : -----------------------------------------------------------
--- TFHandler_Factory        :     var1:   0.088993     1.6612   [    -4.9583     4.3390 ]
--- TFHandler_Factory        :     var2:   0.095657     1.5731   [    -4.6474     3.9513 ]
--- TFHandler_Factory        :     var3:    0.10551     1.7363   [    -5.0373     4.2785 ]
--- TFHandler_Factory        :     var4:    0.27815     2.1526   [    -5.9505     4.6404 ]
--- TFHandler_Factory        :     var5:  -0.016515    0.91457   [    -3.1013     3.0035 ]
--- TFHandler_Factory        :     var6:    0.18465     3.0371   [    -8.1442     7.2697 ]
--- TFHandler_Factory        : -----------------------------------------------------------
--- TFHandler_Factory        : Plot event variables for Id
--- TFHandler_Factory        : Create scatter and profile plots in target-file directory: 
--- TFHandler_Factory        : TMVAOutputCV.root:/dataset2/InputVariables_Id/CorrelationPlots
--- TFHandler_Factory        :  
--- TFHandler_Factory        : Ranking input variables (method unspecific)...
--- IdTransformation         : Ranking result (top variable is best ranked)
--- IdTransformation         : -----------------------------
--- IdTransformation         : Rank : Variable  : Separation
--- IdTransformation         : -----------------------------
--- IdTransformation         :    1 : var4      : 3.585e-01
--- IdTransformation         :    2 : var6      : 3.225e-01
--- IdTransformation         :    3 : var1      : 2.898e-01
--- IdTransformation         :    4 : var3      : 2.828e-01
--- IdTransformation         :    5 : var2      : 2.032e-01
--- IdTransformation         :    6 : var5      : 5.558e-02
--- IdTransformation         : -----------------------------
--- Factory                  : Train method: BDT for Classification
--- BDT                      : Dataset[dataset2] : Begin training
--- BDT                      :  found and suggest the following possible pre-selection cuts 
--- BDT                      : as option DoPreselection was not used, these cuts however will not be performed, but the training will see the full sample
--- BDT                      :  found cut: Bkg if var 0 < -2.99038
--- BDT                      :  found cut: Bkg if var 2 < -2.88493
--- BDT                      :  found cut: Bkg if var 3 < -2.54088
--- BDT                      :  found cut: Bkg if var 5 < -5.0999
--- BDT                      : <InitEventSample> For classification trees, 
--- BDT                      :  the effective number of backgrounds is scaled to match 
--- BDT                      :  the signal. Othersise the first boosting step would do 'just that'!
--- BDT                      : re-normlise events such that Sig and Bkg have respective sum of weights = 1
--- BDT                      :   sig->sig*1ev. bkg->bkg*1ev.
--- BDT                      : #events: (reweighted) sig: 1000 bkg: 1000
--- BDT                      : #events: (unweighted) sig: 1000 bkg: 1000
--- BDT                      : Training 200 Decision Trees ... patience please
--- BinaryTree               : The minimal node size MinNodeSize=2.5 fMinNodeSize=2.5% is translated to an actual number of events = 25.7 for the training sample size of 1028
--- BinaryTree               : Note: This number will be taken as absolute minimum in the node, 
--- BinaryTree               :       in terms of 'weighted events' and unweighted ones !! 
--- BDT                      : <Train> elapsed time: 0.107 sec                              
--- BDT                      : <Train> average number of nodes (w/o pruning) : 5
--- BDT                      : Dataset[dataset2] : End of training                                              
--- BDT                      : Dataset[dataset2] : Elapsed time for training with 2000 events: 0.111 sec         
--- BDT                      : Dataset[dataset2] : Create MVA output for Dataset[dataset2] : classification on training sample
--- BDT                      : Dataset[dataset2] : Evaluation of BDT on training sample (2000 events)
--- BDT                      : Dataset[dataset2] : Elapsed time for evaluation of 2000 events: 0.0222 sec       
--- BDT                      : Dataset[dataset2] : Creating weight file in xml format: dataset2/weights/TMVAClassification_BDT.weights.xml
--- BDT                      : Dataset[dataset2] : Creating standalone response class: dataset2/weights/TMVAClassification_BDT.class.C
--- BDT                      : Write monitoring histograms to file: TMVAOutputCV.root:/dataset2/Method_BDT/BDT
--- Factory                  : Training finished
--- Factory                  : Train method: MLP for Classification
--- Norm                     : Preparing the transformation.
--- TFHandler_MLP            : -----------------------------------------------------------
--- TFHandler_MLP            : Variable        Mean        RMS   [        Min        Max ]
--- TFHandler_MLP            : -----------------------------------------------------------
--- TFHandler_MLP            :     var1:   0.085752    0.35735   [    -1.0000     1.0000 ]
--- TFHandler_MLP            :     var2:    0.10321    0.36589   [    -1.0000     1.0000 ]
--- TFHandler_MLP            :     var3:    0.10411    0.37276   [    -1.0000     1.0000 ]
--- TFHandler_MLP            :     var4:    0.17623    0.40650   [    -1.0000     1.0000 ]
--- TFHandler_MLP            :     var5:   0.010608    0.29962   [    -1.0000     1.0000 ]
--- TFHandler_MLP            :     var6:   0.080694    0.39407   [    -1.0000     1.0000 ]
--- TFHandler_MLP            : -----------------------------------------------------------
--- MLP                      : Dataset[dataset2] : Begin training
--- MLP                      : Training Network
--- MLP                      : Dataset[dataset2] : End of training                                              
--- MLP                      : Dataset[dataset2] : Elapsed time for training with 2000 events: 1.17 sec         
--- MLP                      : Dataset[dataset2] : Create MVA output for Dataset[dataset2] : classification on training sample
--- MLP                      : Dataset[dataset2] : Evaluation of MLP on training sample (2000 events)
--- MLP                      : Dataset[dataset2] : Elapsed time for evaluation of 2000 events: 0.00304 sec       
--- MLP                      : Dataset[dataset2] : Creating weight file in xml format: dataset2/weights/TMVAClassification_MLP.weights.xml
--- MLP                      : Dataset[dataset2] : Creating standalone response class: dataset2/weights/TMVAClassification_MLP.class.C
--- MLP                      : Write special histos to file: TMVAOutputCV.root:/dataset2/Method_MLP/MLP
--- Factory                  : Training finished
--- Factory                  : 
--- Factory                  : Ranking input variables (method specific)...
--- BDT                      : Ranking result (top variable is best ranked)
--- BDT                      : --------------------------------------
--- BDT                      : Rank : Variable  : Variable Importance
--- BDT                      : --------------------------------------
--- BDT                      :    1 : var4      : 2.831e-01
--- BDT                      :    2 : var1      : 1.620e-01
--- BDT                      :    3 : var6      : 1.619e-01
--- BDT                      :    4 : var3      : 1.432e-01
--- BDT                      :    5 : var2      : 1.391e-01
--- BDT                      :    6 : var5      : 1.107e-01
--- BDT                      : --------------------------------------
--- MLP                      : Ranking result (top variable is best ranked)
--- MLP                      : -----------------------------
--- MLP                      : Rank : Variable  : Importance
--- MLP                      : -----------------------------
--- MLP                      :    1 : var4      : 9.713e+00
--- MLP                      :    2 : var1      : 4.210e+00
--- MLP                      :    3 : var6      : 2.539e+00
--- MLP                      :    4 : var3      : 1.827e+00
--- MLP                      :    5 : var5      : 1.640e+00
--- MLP                      :    6 : var2      : 1.519e+00
--- MLP                      : -----------------------------
--- Factory                  : 
--- Factory                  : === Destroy and recreate all methods via weight files for testing ===
--- Factory                  : 
--- MethodBase               : Dataset[dataset2] : Reading weight file: dataset2/weights/TMVAClassification_BDT.weights.xml
--- BDT                      : Dataset[dataset2] : Read method "BDT" of type "BDT"
--- BDT                      : Dataset[dataset2] : MVA method was trained with TMVA Version: 4.2.1
--- BDT                      : Dataset[dataset2] : MVA method was trained with ROOT Version: 6.07/07
--- MethodBase               : Dataset[dataset2] : Reading weight file: dataset2/weights/TMVAClassification_MLP.weights.xml
--- MLP                      : Dataset[dataset2] : Read method "MLP" of type "MLP"
--- MLP                      : Dataset[dataset2] : MVA method was trained with TMVA Version: 4.2.1
--- MLP                      : Dataset[dataset2] : MVA method was trained with ROOT Version: 6.07/07
--- MLP                      : Building Network
--- MLP                      : Initializing weights
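
The two rankings above are printed on different scales: the BDT importances sum to about 1, while the MLP importances do not. As a rough aid for reading them side by side, the sketch below rescales a ranking so its entries sum to 1. It is plain C++ and does not call TMVA. The names `Importance`, `NormalizeImportance` and `MlpRankingFromLog` are hypothetical helpers introduced for illustration, and the numbers are copied from the MLP ranking in the log.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One line of a ranking table: variable name and its importance value.
// (Hypothetical helper type, not part of TMVA.)
struct Importance {
    std::string variable;
    double value;
};

// Rescale a ranking so that the importance values sum to 1.
// The order of the entries is left unchanged.
std::vector<Importance> NormalizeImportance(std::vector<Importance> ranking) {
    double total = 0.0;
    for (const auto& entry : ranking) total += entry.value;
    assert(total > 0.0);  // an empty or all-zero ranking cannot be rescaled
    for (auto& entry : ranking) entry.value /= total;
    return ranking;
}

// MLP ranking for dataset2, copied from the log output above.
std::vector<Importance> MlpRankingFromLog() {
    return {
        {"var4", 9.713}, {"var1", 4.210}, {"var6", 2.539},
        {"var3", 1.827}, {"var5", 1.640}, {"var2", 1.519},
    };
}
```

With these inputs, `var4` carries roughly 45% of the MLP total, compared with about 28% in the BDT ranking. The two methods define importance differently, so the rescaled values are only an indication, not a like-for-like comparison.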

Test and Evaluate Methods

In [7]:
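// Apply every trained method to the test sample of each dataset
factory.TestAllMethods();
// Fill the evaluation histograms and print the performance tables
factory.EvaluateAllMethods();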
--- Factory                  : Test all methods...
--- Factory                  : Test method: BDT for Classification performance
--- BDT                      : Dataset[dataset1] : Evaluation of BDT on testing sample (10000 events)
--- BDT                      : Dataset[dataset1] : Elapsed time for evaluation of 10000 events: 0.153 sec       
--- Factory                  : Test method: MLP for Classification performance
--- MLP                      : Dataset[dataset1] : Evaluation of MLP on testing sample (10000 events)
--- MLP                      : Dataset[dataset1] : Elapsed time for evaluation of 10000 events: 0.014 sec       
--- Factory                  : Test method: BDT for Classification performance
--- BDT                      : Dataset[dataset2] : Evaluation of BDT on testing sample (10000 events)
--- BDT                      : Dataset[dataset2] : Elapsed time for evaluation of 10000 events: 0.135 sec       
--- Factory                  : Test method: MLP for Classification performance
--- MLP                      : Dataset[dataset2] : Evaluation of MLP on testing sample (10000 events)
--- MLP                      : Dataset[dataset2] : Elapsed time for evaluation of 10000 events: 0.0209 sec       
--- Factory                  : Evaluate all methods...
--- Factory                  : Evaluate classifier: BDT
--- BDT                      : Dataset[dataset1] : Loop over test events and fill histograms with classifier response...
--- Factory                  : Write evaluation histograms to file
--- TFHandler_BDT            : Plot event variables for BDT
--- TFHandler_BDT            : -----------------------------------------------------------
--- TFHandler_BDT            : Variable        Mean        RMS   [        Min        Max ]
--- TFHandler_BDT            : -----------------------------------------------------------
--- TFHandler_BDT            :     var1: 0.00077102     1.6695   [    -5.8991     4.7639 ]
--- TFHandler_BDT            :     var2: -0.0063164     1.5765   [    -5.2454     4.8300 ]
--- TFHandler_BDT            :     var3:  -0.010870     1.7365   [    -5.3563     4.6430 ]
--- TFHandler_BDT            : -----------------------------------------------------------
--- TFHandler_BDT            : Create scatter and profile plots in target-file directory: 
--- TFHandler_BDT            : TMVAOutputCV.root:/dataset1/Method_BDT/BDT/CorrelationPlots
--- Factory                  : Evaluate classifier: MLP
--- TFHandler_MLP            : -----------------------------------------------------------
--- TFHandler_MLP            : Variable        Mean        RMS   [        Min        Max ]
--- TFHandler_MLP            : -----------------------------------------------------------
--- TFHandler_MLP            :     var1:   0.066774    0.35913   [    -1.2024     1.0914 ]
--- TFHandler_MLP            :     var2:   0.079492    0.36669   [    -1.1391     1.2044 ]
--- TFHandler_MLP            :     var3:   0.079125    0.37282   [    -1.0685     1.0783 ]
--- TFHandler_MLP            : -----------------------------------------------------------
--- MLP                      : Dataset[dataset1] : Loop over test events and fill histograms with classifier response...
--- Factory                  : Write evaluation histograms to file
--- TFHandler_MLP            : Plot event variables for MLP
--- TFHandler_MLP            : -----------------------------------------------------------
--- TFHandler_MLP            : Variable        Mean        RMS   [        Min        Max ]
--- TFHandler_MLP            : -----------------------------------------------------------
--- TFHandler_MLP            :     var1:   0.066774    0.35913   [    -1.2024     1.0914 ]
--- TFHandler_MLP            :     var2:   0.079492    0.36669   [    -1.1391     1.2044 ]
--- TFHandler_MLP            :     var3:   0.079125    0.37282   [    -1.0685     1.0783 ]
--- TFHandler_MLP            : -----------------------------------------------------------
--- TFHandler_MLP            : Create scatter and profile plots in target-file directory: 
--- TFHandler_MLP            : TMVAOutputCV.root:/dataset1/Method_MLP/MLP/CorrelationPlots
--- Factory                  : 
--- Factory                  : Evaluation results ranked by best signal efficiency and purity (area)
--- Factory                  : -------------------------------------------------------------------------------------------------------------------
--- Factory                  : DataSet              MVA              Signal efficiency at bkg eff.(error):                | Sepa-    Signifi- 
--- Factory                  : Name:                Method:          @B=0.01    @B=0.10    @B=0.30    ROC-integ    ROCCurve| ration:  cance:   
--- Factory                  : -------------------------------------------------------------------------------------------------------------------
--- Factory                  : dataset1             MLP            : 0.066(03)  0.427(06)  0.809(05)    0.831       0.831 | 0.357    1.043
--- Factory                  : dataset1             BDT            : 0.066(03)  0.400(06)  0.802(05)    0.826       0.826 | 0.349    1.004
--- Factory                  : -------------------------------------------------------------------------------------------------------------------
--- Factory                  : 
--- Factory                  : Testing efficiency compared to training efficiency (overtraining check)
--- Factory                  : -------------------------------------------------------------------------------------------------------------------
--- Factory                  : DataSet              MVA              Signal efficiency: from test sample (from training sample) 
--- Factory                  : Name:                Method:          @B=0.01             @B=0.10            @B=0.30   
--- Factory                  : -------------------------------------------------------------------------------------------------------------------
--- Factory                  : dataset1             MLP            : 0.066 (0.067)       0.427 (0.433)      0.809 (0.833)
--- Factory                  : dataset1             BDT            : 0.066 (0.095)       0.400 (0.451)      0.802 (0.838)
--- Factory                  : -------------------------------------------------------------------------------------------------------------------
--- Factory                  : 
--- Dataset:dataset1         : Dataset[dataset1] : Created tree 'TestTree' with 10000 events
--- Dataset:dataset1         : Dataset[dataset1] : Created tree 'TrainTree' with 2000 events
--- Factory                  : Evaluate classifier: BDT
--- BDT                      : Dataset[dataset2] : Loop over test events and fill histograms with classifier response...
--- Factory                  : Write evaluation histograms to file
--- TFHandler_BDT            : Plot event variables for BDT
--- TFHandler_BDT            : -----------------------------------------------------------
--- TFHandler_BDT            : Variable        Mean        RMS   [        Min        Max ]
--- TFHandler_BDT            : -----------------------------------------------------------
--- TFHandler_BDT            :     var1: 0.00077102     1.6695   [    -5.8991     4.7639 ]
--- TFHandler_BDT            :     var2: -0.0063164     1.5765   [    -5.2454     4.8300 ]
--- TFHandler_BDT            :     var3:  -0.010870     1.7365   [    -5.3563     4.6430 ]
--- TFHandler_BDT            :     var4:    0.14557     2.1608   [    -6.9675     5.0307 ]
--- TFHandler_BDT            :     var5:   0.011641    0.92093   [    -3.2430     3.5043 ]
--- TFHandler_BDT            :     var6: -0.0055454     3.0494   [    -9.8605     7.9024 ]
--- TFHandler_BDT            : -----------------------------------------------------------
--- TFHandler_BDT            : Create scatter and profile plots in target-file directory: 
--- TFHandler_BDT            : TMVAOutputCV.root:/dataset2/Method_BDT/BDT/CorrelationPlots
--- Factory                  : Evaluate classifier: MLP
--- TFHandler_MLP            : -----------------------------------------------------------
--- TFHandler_MLP            : Variable        Mean        RMS   [        Min        Max ]
--- TFHandler_MLP            : -----------------------------------------------------------
--- TFHandler_MLP            :     var1:   0.066774    0.35913   [    -1.2024     1.0914 ]
--- TFHandler_MLP            :     var2:   0.079492    0.36669   [    -1.1391     1.2044 ]
--- TFHandler_MLP            :     var3:   0.079125    0.37282   [    -1.0685     1.0783 ]
--- TFHandler_MLP            :     var4:    0.15120    0.40805   [    -1.1921     1.0737 ]
--- TFHandler_MLP            :     var5:   0.019832    0.30171   [    -1.0464     1.1641 ]
--- TFHandler_MLP            :     var6:   0.056015    0.39567   [    -1.2227     1.0821 ]
--- TFHandler_MLP            : -----------------------------------------------------------
--- MLP                      : Dataset[dataset2] : Loop over test events and fill histograms with classifier response...
--- Factory                  : Write evaluation histograms to file
--- TFHandler_MLP            : Plot event variables for MLP
--- TFHandler_MLP            : -----------------------------------------------------------
--- TFHandler_MLP            : Variable        Mean        RMS   [        Min        Max ]
--- TFHandler_MLP            : -----------------------------------------------------------
--- TFHandler_MLP            :     var1:   0.066774    0.35913   [    -1.2024     1.0914 ]
--- TFHandler_MLP            :     var2:   0.079492    0.36669   [    -1.1391     1.2044 ]
--- TFHandler_MLP            :     var3:   0.079125    0.37282   [    -1.0685     1.0783 ]
--- TFHandler_MLP            :     var4:    0.15120    0.40805   [    -1.1921     1.0737 ]
--- TFHandler_MLP            :     var5:   0.019832    0.30171   [    -1.0464     1.1641 ]
--- TFHandler_MLP            :     var6:   0.056015    0.39567   [    -1.2227     1.0821 ]
--- TFHandler_MLP            : -----------------------------------------------------------
--- TFHandler_MLP            : Create scatter and profile plots in target-file directory: 
--- TFHandler_MLP            : TMVAOutputCV.root:/dataset2/Method_MLP/MLP/CorrelationPlots
--- Factory                  : 
--- Factory                  : Evaluation results ranked by best signal efficiency and purity (area)
--- Factory                  : -------------------------------------------------------------------------------------------------------------------
--- Factory                  : DataSet              MVA              Signal efficiency at bkg eff.(error):                | Sepa-    Signifi- 
--- Factory                  : Name:                Method:          @B=0.01    @B=0.10    @B=0.30    ROC-integ    ROCCurve| ration:  cance:   
--- Factory                  : -------------------------------------------------------------------------------------------------------------------
--- Factory                  : dataset2             MLP            : 0.355(06)  0.790(05)  0.957(02)    0.934       0.933 | 0.597    1.689
--- Factory                  : dataset2             BDT            : 0.297(06)  0.731(06)  0.939(03)    0.919       0.919 | 0.547    1.449
--- Factory                  : -------------------------------------------------------------------------------------------------------------------
--- Factory                  : 
--- Factory                  : Testing efficiency compared to training efficiency (overtraining check)
--- Factory                  : -------------------------------------------------------------------------------------------------------------------
--- Factory                  : DataSet              MVA              Signal efficiency: from test sample (from training sample) 
--- Factory                  : Name:                Method:          @B=0.01             @B=0.10            @B=0.30   
--- Factory                  : -------------------------------------------------------------------------------------------------------------------
--- Factory                  : dataset2             MLP            : 0.355 (0.385)       0.790 (0.808)      0.957 (0.961)
--- Factory                  : dataset2             BDT            : 0.297 (0.345)       0.731 (0.793)      0.939 (0.942)
--- Factory                  : -------------------------------------------------------------------------------------------------------------------
--- Factory                  : 
--- Dataset:dataset2         : Dataset[dataset2] : Created tree 'TestTree' with 10000 events
--- Dataset:dataset2         : Dataset[dataset2] : Created tree 'TrainTree' with 2000 events
--- Factory                  :   
--- Factory                  : Thank you for using TMVA!
--- Factory                  : For citation information, please visit: http://tmva.sf.net/citeTMVA.html
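
The overtraining-check tables above list the signal efficiency on the test sample next to the value from the training sample. A simple way to read them is to look at the difference between the two. The sketch below does this for the @B=0.10 column, with the numbers copied from the log. It is plain C++ and independent of TMVA. `EfficiencyRow`, `OvertrainingGap`, `RowsAtB010` and `LargestGap` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One row of the overtraining-check table at a fixed background efficiency.
// (Hypothetical helper type, not part of TMVA.)
struct EfficiencyRow {
    std::string dataset;
    std::string method;
    double test;   // signal efficiency from the test sample
    double train;  // signal efficiency from the training sample
};

// Training-minus-test difference; a larger positive value means the method
// performs noticeably better on the events it was trained on.
double OvertrainingGap(const EfficiencyRow& row) {
    return row.train - row.test;
}

// The @B=0.10 column of both tables, copied from the log output above.
std::vector<EfficiencyRow> RowsAtB010() {
    return {
        {"dataset1", "MLP", 0.427, 0.433},
        {"dataset1", "BDT", 0.400, 0.451},
        {"dataset2", "MLP", 0.790, 0.808},
        {"dataset2", "BDT", 0.731, 0.793},
    };
}

// Return the row with the largest training-minus-test gap.
EfficiencyRow LargestGap(const std::vector<EfficiencyRow>& rows) {
    assert(!rows.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < rows.size(); ++i) {
        if (OvertrainingGap(rows[i]) > OvertrainingGap(rows[best])) best = i;
    }
    return rows[best];
}
```

For these numbers the BDT shows a larger gap than the MLP on both datasets, with the largest gap for the BDT on dataset2. The gap is only a quick indicator; the quoted errors on the efficiencies are not taken into account here.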

Plot ROC Curve

Enable JavaScript (JSROOT) visualisation so that the plots are rendered interactively in the notebook.

In [8]:
%jsroot on
In [9]:
// ROC curves of all methods booked for dataset1
auto c1 = factory.GetROCCurve(&loader1);
c1->Draw();
In [10]:
// ROC curves of all methods booked for dataset2
auto c2 = factory.GetROCCurve(&loader2);
c2->Draw();
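
The evaluation tables quote a ROC integral for each method. As a rough illustration of what such a number summarises, the sketch below integrates a ROC curve with the trapezoid rule, assuming the curve is given as points of (signal efficiency, background rejection) sorted by signal efficiency. This is not how TMVA computes its value; `RocPoint` and `RocIntegral` are hypothetical names, and the code is plain C++ without any ROOT dependency.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One point on a ROC curve. Background rejection is 1 - background efficiency.
// (Hypothetical helper type, not part of TMVA.)
struct RocPoint {
    double signalEff;
    double bkgRejection;
};

// Area under the curve by the trapezoid rule.
// Assumes the points are sorted by increasing signal efficiency.
double RocIntegral(const std::vector<RocPoint>& points) {
    assert(points.size() >= 2);
    double area = 0.0;
    for (std::size_t i = 1; i < points.size(); ++i) {
        const double width = points[i].signalEff - points[i - 1].signalEff;
        assert(width >= 0.0);  // input must be sorted
        area += 0.5 * width * (points[i].bkgRejection + points[i - 1].bkgRejection);
    }
    return area;
}
```

As a usage example, the three working points logged for the MLP on dataset2 (signal efficiency 0.355, 0.790 and 0.957 at background efficiency 0.01, 0.10 and 0.30), together with the end points (0, 1) and (1, 0), give an area of about 0.913. With only five points this coarse estimate falls below the logged ROC integral of 0.934, which is based on the full curve.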