DNN Example

Declare Factory

In [1]:
from ROOT import TMVA, TFile, TTree, TCut, TString
Welcome to JupyROOT 6.09/01
In [2]:
TMVA.Tools.Instance()

inputFile = TFile.Open("https://raw.githubusercontent.com/iml-wg/tmvatutorials/master/inputdata.root")
outputFile = TFile.Open("TMVAOutputDNN.root", "RECREATE")

factory = TMVA.Factory("TMVAClassification", outputFile,
        "!V:!Silent:Color:!DrawProgressBar:AnalysisType=Classification" )

Declare Variables in DataLoader

In [3]:
loader = TMVA.DataLoader("dataset_dnn")

loader.AddVariable("var1")
loader.AddVariable("var2")
loader.AddVariable("var3")
loader.AddVariable("var4")
loader.AddVariable("var5 := var1-var3")
loader.AddVariable("var6 := var1+var2")

Set Up Dataset(s)

In [4]:
tsignal = inputFile.Get("Sig")
tbackground = inputFile.Get("Bkg")

loader.AddSignalTree(tsignal)
loader.AddBackgroundTree(tbackground) 
loader.PrepareTrainingAndTestTree(TCut(""),
        "nTrain_Signal=1000:nTrain_Background=1000:SplitMode=Random:NormMode=NumEvents:!V")
DataSetInfo              : [dataset_dnn] : Added class "Signal"
                         : Add Tree Sig of type Signal with 6000 events
DataSetInfo              : [dataset_dnn] : Added class "Background"
                         : Add Tree Bkg of type Background with 6000 events
                         : Dataset[dataset_dnn] : Class index : 0  name : Signal
                         : Dataset[dataset_dnn] : Class index : 1  name : Background
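
The split options passed to PrepareTrainingAndTestTree are a colon-separated list of key=value pairs and flags. As a rough sketch in plain Python (no ROOT needed), the string can be taken apart to see how many events end up in each sample. The helper `parse_options` is hypothetical and is not part of TMVA. The arithmetic assumes that the events not used for training go to the test sample, which matches the 5000 testing events per class reported in the training log.

```python
def parse_options(option_string):
    """Split 'a=1:b=2:!V' into a dict; bare flags map to True, '!flag' to False."""
    parsed = {}
    for token in option_string.split(":"):
        if "=" in token:
            key, value = token.split("=", 1)
            parsed[key] = value
        elif token.startswith("!"):
            parsed[token[1:]] = False
        elif token:
            parsed[token] = True
    return parsed


split = parse_options(
    "nTrain_Signal=1000:nTrain_Background=1000:SplitMode=Random:NormMode=NumEvents:!V"
)

events_per_tree = 6000  # reported by DataSetInfo for both Sig and Bkg
n_train = int(split["nTrain_Signal"])
# Assumption: no nTest_* option is given, so the remainder is used for testing.
n_test = events_per_tree - n_train
print(n_train, n_test)
```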

Configure Network Layout

In [5]:
# General layout
layoutString = TString("Layout=TANH|128,TANH|128,TANH|128,LINEAR")

# Training strategies
training0 = TString("LearningRate=1e-1,Momentum=0.9,Repetitions=1,"
                        "ConvergenceSteps=2,BatchSize=256,TestRepetitions=10,"
                        "WeightDecay=1e-4,Regularization=L2,"
                        "DropConfig=0.0+0.5+0.5+0.5, Multithreading=True")
training1 = TString("LearningRate=1e-2,Momentum=0.9,Repetitions=1,"
                        "ConvergenceSteps=2,BatchSize=256,TestRepetitions=10,"
                        "WeightDecay=1e-4,Regularization=L2,"
                        "DropConfig=0.0+0.0+0.0+0.0, Multithreading=True")
trainingStrategyString = TString("TrainingStrategy=")
trainingStrategyString += training0 + TString("|") + training1

# General Options
dnnOptions = TString("!H:!V:ErrorStrategy=CROSSENTROPY:VarTransform=N:"
        "WeightInitialization=XAVIERUNIFORM")
dnnOptions.Append(":")
dnnOptions.Append(layoutString)
dnnOptions.Append(":")
dnnOptions.Append(trainingStrategyString)
Out[5]:
'!H:!V:ErrorStrategy=CROSSENTROPY:VarTransform=N:WeightInitialization=XAVIERUNIFORM:Layout=TANH|128,TANH|128,TANH|128,LINEAR:TrainingStrategy=LearningRate=1e-1,Momentum=0.9,Repetitions=1,ConvergenceSteps=2,BatchSize=256,TestRepetitions=10,WeightDecay=1e-4,Regularization=L2,DropConfig=0.0+0.5+0.5+0.5, Multithreading=True|LearningRate=1e-2,Momentum=0.9,Repetitions=1,ConvergenceSteps=2,BatchSize=256,TestRepetitions=10,WeightDecay=1e-4,Regularization=L2,DropConfig=0.0+0.0+0.0+0.0, Multithreading=True'
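
The option string above nests three separators: ":" between top-level options, "|" between training phases (and between activation and width in the layout), and "," between the settings of one phase. The following is a minimal sketch in plain Python of how such a string could be assembled from lists and dictionaries. The helpers `layout_option` and `strategy_option` are hypothetical names used for illustration only, and the result omits the space before Multithreading that appears in the original string.

```python
def layout_option(layers):
    """Join (activation, width) pairs; a width of None marks the output layer."""
    parts = [act if width is None else f"{act}|{width}" for act, width in layers]
    return "Layout=" + ",".join(parts)


def strategy_option(phases):
    """Join phases with '|' and the settings inside each phase with ','."""
    rendered = [",".join(f"{k}={v}" for k, v in phase.items()) for phase in phases]
    return "TrainingStrategy=" + "|".join(rendered)


layers = [("TANH", 128), ("TANH", 128), ("TANH", 128), ("LINEAR", None)]

# First phase: larger learning rate, dropout on the hidden layers.
phase0 = {
    "LearningRate": "1e-1", "Momentum": "0.9", "Repetitions": "1",
    "ConvergenceSteps": "2", "BatchSize": "256", "TestRepetitions": "10",
    "WeightDecay": "1e-4", "Regularization": "L2",
    "DropConfig": "0.0+0.5+0.5+0.5", "Multithreading": "True",
}
# Second phase: smaller learning rate, no dropout; everything else unchanged.
phase1 = dict(phase0, LearningRate="1e-2", DropConfig="0.0+0.0+0.0+0.0")

general = "!H:!V:ErrorStrategy=CROSSENTROPY:VarTransform=N:WeightInitialization=XAVIERUNIFORM"
options = ":".join([general, layout_option(layers), strategy_option([phase0, phase1])])
print(options)
```

Keeping the phases as dictionaries makes it easier to see that the two phases differ only in LearningRate and DropConfig.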

Booking Methods

In [6]:
# Standard implementation, no dependencies.
stdOptions =  dnnOptions + ":Architecture=STANDARD"
factory.BookMethod(loader, TMVA.Types.kDNN, "DNN", stdOptions)

# CPU implementation, using BLAS
#cpuOptions = dnnOptions + ":Architecture=CPU"
#factory.BookMethod(loader, TMVA.Types.kDNN, "DNN CPU", cpuOptions)
Out[6]:
<ROOT.TMVA::MethodDNN object ("DNN") at 0x564de70>
Factory                  : Booking method: DNN
                         : 
DNN                      : [dataset_dnn] : Create Transformation "N" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'var1' <---> Output : variable 'var1'
                         : Input : variable 'var2' <---> Output : variable 'var2'
                         : Input : variable 'var3' <---> Output : variable 'var3'
                         : Input : variable 'var4' <---> Output : variable 'var4'
                         : Input : variable 'var5' <---> Output : variable 'var5'
                         : Input : variable 'var6' <---> Output : variable 'var6'
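
The "N" transformation requested with VarTransform=N rescales each input variable linearly to the range [-1, 1]. The sketch below is plain Python, not TMVA's implementation, and the function name `normalise` is hypothetical. It uses the minimum and maximum of var1 that the TFHandler_Factory table prints in the training log, and checks that the unnormalised mean of var1 maps onto the normalised mean printed by TFHandler_DNN.

```python
def normalise(value, vmin, vmax):
    """Map value linearly so that vmin goes to -1 and vmax goes to +1."""
    return 2.0 * (value - vmin) / (vmax - vmin) - 1.0


# Range and mean of var1 as printed in the TFHandler_Factory table of the training log.
VAR1_MIN, VAR1_MAX = -4.9583, 4.3390
VAR1_MEAN = 0.088993

print(normalise(VAR1_MIN, VAR1_MIN, VAR1_MAX))   # lower edge of the range
print(normalise(VAR1_MAX, VAR1_MIN, VAR1_MAX))   # upper edge of the range
print(normalise(VAR1_MEAN, VAR1_MIN, VAR1_MAX))  # close to the 0.085752 in the log

# A value below the range used to fit the transformation maps below -1.
print(normalise(-5.5, VAR1_MIN, VAR1_MAX))
```

The last line illustrates why the test-sample tables in the log show minima and maxima slightly outside [-1, 1]: the range is fixed from one sample and then applied to events that can lie outside it.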

Train Methods

In [7]:
factory.TrainAllMethods()
Factory                  : Train all methods
DataSetFactory           : [dataset_dnn] : Number of events in input trees
                         : 
                         : 
                         : Number of training and testing events
                         : ---------------------------------------------------------------------------
                         : Signal     -- training events            : 1000
                         : Signal     -- testing events             : 5000
                         : Signal     -- training and testing events: 6000
                         : Background -- training events            : 1000
                         : Background -- testing events             : 5000
                         : Background -- training and testing events: 6000
                         : 
DataSetInfo              : Correlation matrix (Signal):
                         : --------------------------------------------------------------
                         :               var1    var2    var3    var4 var1-var3 var1+var2
                         :      var1:  +1.000  +0.386  +0.597  +0.808    +0.442    +0.826
                         :      var2:  +0.386  +1.000  +0.696  +0.743    -0.350    +0.839
                         :      var3:  +0.597  +0.696  +1.000  +0.860    -0.456    +0.778
                         :      var4:  +0.808  +0.743  +0.860  +1.000    -0.065    +0.931
                         : var1-var3:  +0.442  -0.350  -0.456  -0.065    +1.000    +0.047
                         : var1+var2:  +0.826  +0.839  +0.778  +0.931    +0.047    +1.000
                         : --------------------------------------------------------------
DataSetInfo              : Correlation matrix (Background):
                         : --------------------------------------------------------------
                         :               var1    var2    var3    var4 var1-var3 var1+var2
                         :      var1:  +1.000  +0.856  +0.914  +0.964    +0.122    +0.966
                         :      var2:  +0.856  +1.000  +0.927  +0.937    -0.248    +0.960
                         :      var3:  +0.914  +0.927  +1.000  +0.971    -0.290    +0.955
                         :      var4:  +0.964  +0.937  +0.971  +1.000    -0.101    +0.987
                         : var1-var3:  +0.122  -0.248  -0.290  -0.101    +1.000    -0.057
                         : var1+var2:  +0.966  +0.960  +0.955  +0.987    -0.057    +1.000
                         : --------------------------------------------------------------
DataSetFactory           : [dataset_dnn] :  
                         : 
Factory                  : [dataset_dnn] : Create Transformation "I" with events from all classes.
                         : 
                         : Transformation, Variable selection : 
                         : Input : variable 'var1' <---> Output : variable 'var1'
                         : Input : variable 'var2' <---> Output : variable 'var2'
                         : Input : variable 'var3' <---> Output : variable 'var3'
                         : Input : variable 'var4' <---> Output : variable 'var4'
                         : Input : variable 'var5' <---> Output : variable 'var5'
                         : Input : variable 'var6' <---> Output : variable 'var6'
TFHandler_Factory        : Variable        Mean        RMS   [        Min        Max ]
                         : -----------------------------------------------------------
                         :     var1:   0.088993     1.6612   [    -4.9583     4.3390 ]
                         :     var2:   0.095657     1.5731   [    -4.6474     3.9513 ]
                         :     var3:    0.10551     1.7363   [    -5.0373     4.2785 ]
                         :     var4:    0.27815     2.1526   [    -5.9505     4.6404 ]
                         :     var5:  -0.016515    0.91457   [    -3.1013     3.0035 ]
                         :     var6:    0.18465     3.0371   [    -8.1442     7.2697 ]
                         : -----------------------------------------------------------
                         : Ranking input variables (method unspecific)...
IdTransformation         : Ranking result (top variable is best ranked)
                         : -----------------------------
                         : Rank : Variable  : Separation
                         : -----------------------------
                         :    1 : var4      : 3.585e-01
                         :    2 : var6      : 3.225e-01
                         :    3 : var1      : 2.898e-01
                         :    4 : var3      : 2.828e-01
                         :    5 : var2      : 2.032e-01
                         :    6 : var5      : 5.558e-02
                         : -----------------------------
Factory                  : Train method: DNN for Classification
                         : 
TFHandler_DNN            : Variable        Mean        RMS   [        Min        Max ]
                         : -----------------------------------------------------------
                         :     var1:   0.085752    0.35735   [    -1.0000     1.0000 ]
                         :     var2:    0.10321    0.36589   [    -1.0000     1.0000 ]
                         :     var3:    0.10411    0.37276   [    -1.0000     1.0000 ]
                         :     var4:    0.17623    0.40650   [    -1.0000     1.0000 ]
                         :     var5:   0.010608    0.29962   [    -1.0000     1.0000 ]
                         :     var6:   0.080694    0.39407   [    -1.0000     1.0000 ]
                         : -----------------------------------------------------------
TFHandler_DNN            : Variable        Mean        RMS   [        Min        Max ]
                         : -----------------------------------------------------------
                         :     var1:   0.085752    0.35735   [    -1.0000     1.0000 ]
                         :     var2:    0.10321    0.36589   [    -1.0000     1.0000 ]
                         :     var3:    0.10411    0.37276   [    -1.0000     1.0000 ]
                         :     var4:    0.17623    0.40650   [    -1.0000     1.0000 ]
                         :     var5:   0.010608    0.29962   [    -1.0000     1.0000 ]
                         :     var6:   0.080694    0.39407   [    -1.0000     1.0000 ]
                         : -----------------------------------------------------------
TFHandler_DNN            : Variable        Mean        RMS   [        Min        Max ]
                         : -----------------------------------------------------------
                         :     var1:   0.066774    0.35913   [    -1.2024     1.0914 ]
                         :     var2:   0.079492    0.36669   [    -1.1391     1.2044 ]
                         :     var3:   0.079125    0.37282   [    -1.0685     1.0783 ]
                         :     var4:    0.15120    0.40805   [    -1.1921     1.0737 ]
                         :     var5:   0.019832    0.30171   [    -1.0464     1.1641 ]
                         :     var6:   0.056015    0.39567   [    -1.2227     1.0821 ]
                         : -----------------------------------------------------------
                         : Using Standard Implementation.Training with learning rate = 0.1, momentum = 0.9, repetitions = 1
                         : 
                         : 
                         : Elapsed time for training with 2000 events: 29.2 sec         
DNN                      : [dataset_dnn] : Evaluation of DNN on training sample (2000 events)
                         : Elapsed time for evaluation of 2000 events: 0.154 sec       
                         : Creating xml weight file: dataset_dnn/weights/TMVAClassification_DNN.weights.xml
                         : Creating standalone class: dataset_dnn/weights/TMVAClassification_DNN.class.C
Factory                  : Training finished
                         : 
                         : Ranking input variables (method specific)...
DNN                      : Ranking result (top variable is best ranked)
                         : -----------------------------
                         : Rank : Variable  : Importance
                         : -----------------------------
                         :    1 : var1      : 1.000e+00
                         :    2 : var2      : 1.000e+00
                         :    3 : var3      : 1.000e+00
                         :    4 : var4      : 1.000e+00
                         :    5 : var5      : 1.000e+00
                         :    6 : var6      : 1.000e+00
                         : -----------------------------
Factory                  : === Destroy and recreate all methods via weight files for testing ===
                         : 
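
To get a feeling for the size of the network trained above, the number of trainable parameters can be estimated from the layout string. This is a back-of-the-envelope sketch in plain Python with a hypothetical helper `count_parameters`. It assumes fully connected layers with one bias per node, six inputs (the variables declared in the DataLoader), and a single output node for the final LINEAR layer. The output width is an assumption and is not stated in the log.

```python
def count_parameters(widths):
    """Weights plus biases of a fully connected network with the given layer widths."""
    weights = sum(n_in * n_out for n_in, n_out in zip(widths, widths[1:]))
    biases = sum(widths[1:])
    return weights + biases


# 6 inputs, three hidden layers of 128 nodes, 1 output node (assumed).
widths = [6, 128, 128, 128, 1]
n_parameters = count_parameters(widths)
print(n_parameters)
```

Under these assumptions the parameter count is much larger than the 2000 training events, which is one motivation for the dropout and L2 regularisation used in the training strategy.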

Test and Evaluate Methods

In [8]:
factory.TestAllMethods()
factory.EvaluateAllMethods()
Factory                  : Test all methods
Factory                  : Test method: DNN for Classification performance
                         : 
DNN                      : [dataset_dnn] : Evaluation of DNN on testing sample (10000 events)
                         : Elapsed time for evaluation of 10000 events: 0.753 sec       
Factory                  : Evaluate all methods
Factory                  : Evaluate classifier: DNN
                         : 
TFHandler_DNN            : Variable        Mean        RMS   [        Min        Max ]
                         : -----------------------------------------------------------
                         :     var1:   0.066774    0.35913   [    -1.2024     1.0914 ]
                         :     var2:   0.079492    0.36669   [    -1.1391     1.2044 ]
                         :     var3:   0.079125    0.37282   [    -1.0685     1.0783 ]
                         :     var4:    0.15120    0.40805   [    -1.1921     1.0737 ]
                         :     var5:   0.019832    0.30171   [    -1.0464     1.1641 ]
                         :     var6:   0.056015    0.39567   [    -1.2227     1.0821 ]
                         : -----------------------------------------------------------
DNN                      : [dataset_dnn] : Loop over test events and fill histograms with classifier response...
                         : 
TFHandler_DNN            : Variable        Mean        RMS   [        Min        Max ]
                         : -----------------------------------------------------------
                         :     var1:   0.066774    0.35913   [    -1.2024     1.0914 ]
                         :     var2:   0.079492    0.36669   [    -1.1391     1.2044 ]
                         :     var3:   0.079125    0.37282   [    -1.0685     1.0783 ]
                         :     var4:    0.15120    0.40805   [    -1.1921     1.0737 ]
                         :     var5:   0.019832    0.30171   [    -1.0464     1.1641 ]
                         :     var6:   0.056015    0.39567   [    -1.2227     1.0821 ]
                         : -----------------------------------------------------------
                         : 
                         : Evaluation results ranked by best signal efficiency and purity (area)
                         : -------------------------------------------------------------------------------------------------------------------
                         : DataSet       MVA                       
                         : Name:         Method:          ROC-integ
                         : dataset_dnn   DNN            : 0.932
                         : -------------------------------------------------------------------------------------------------------------------
                         : 
                         : Testing efficiency compared to training efficiency (overtraining check)
                         : -------------------------------------------------------------------------------------------------------------------
                         : DataSet              MVA              Signal efficiency: from test sample (from training sample) 
                         : Name:                Method:          @B=0.01             @B=0.10            @B=0.30   
                         : -------------------------------------------------------------------------------------------------------------------
                         : dataset_dnn          DNN            : 0.350 (0.364)       0.781 (0.793)      0.954 (0.958)
                         : -------------------------------------------------------------------------------------------------------------------
                         : 
Dataset:dataset_dnn      : Created tree 'TestTree' with 10000 events
                         : 
Dataset:dataset_dnn      : Created tree 'TrainTree' with 2000 events
                         : 
Factory                  : Thank you for using TMVA!
                         : For citation information, please visit: http://tmva.sf.net/citeTMVA.html
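
The ROC-integ column in the evaluation table is the area under the ROC curve. As an illustration of what that number measures, the area can be computed as the probability that a randomly chosen signal event gets a higher classifier output than a randomly chosen background event. The sketch below is plain Python on made-up toy scores. It is not how TMVA computes the value, and `roc_integral` is a hypothetical helper.

```python
def roc_integral(signal_scores, background_scores):
    """Fraction of (signal, background) pairs ranked correctly; ties count one half."""
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))


# Toy classifier outputs, invented for this example.
toy_signal = [0.9, 0.8, 0.4]
toy_background = [0.7, 0.3, 0.2]

auc = roc_integral(toy_signal, toy_background)
print(auc)
```

A value of 0.5 corresponds to random guessing and 1.0 to perfect separation, so the 0.932 reported above indicates good but not perfect separation on the test sample.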

Plot ROC Curve

Enable JavaScript visualisation (JSROOT) so that the plots are interactive.

In [9]:
%jsroot on
In [10]:
c = factory.GetROCCurve(loader)
c.Draw()