
Book DNN, Training methods¶

In [1]:

import ROOT
from ROOT import TFile, TMVA, TCut

Welcome to JupyROOT 6.07/07

Enable JS visualization¶

In [2]:

%jsmva on

Declarations, building training and testing trees¶

For more details please see this notebook.

In [3]:

outputFile = TFile( "TMVA.root", 'RECREATE' )

TMVA.Tools.Instance()

factory = TMVA.Factory(JobName="TMVAClassification", TargetFile=outputFile,
            V=False, Color=True, DrawProgressBar=True, Transformations=["I", "D", "P", "G","D"],
                       AnalysisType="Classification")

dataset = "tmva_class_example"
loader  = TMVA.DataLoader(dataset)

loader.AddVariable( "myvar1 := var1+var2", 'F' )
loader.AddVariable( "myvar2 := var1-var2", "Expression 2", 'F' )
loader.AddVariable( "var3",                "Variable 3", 'F' )
loader.AddVariable( "var4",                "Variable 4", 'F' )

loader.AddSpectator( "spec1:=var1*2",  "Spectator 1",  'F' )
loader.AddSpectator( "spec2:=var1*3",  "Spectator 2",  'F' )

if ROOT.gSystem.AccessPathName( "./tmva_class_example.root" ) != 0: 
    ROOT.gSystem.Exec( "wget https://root.cern.ch/files/tmva_class_example.root")
    
# Use a name that does not shadow Python's built-in input()
inputFile = TFile.Open( "./tmva_class_example.root" )

# Get the signal and background trees for training
signal      = inputFile.Get( "TreeS" )
background  = inputFile.Get( "TreeB" )
    
# Global event weights (see below for setting event-wise weights)
signalWeight     = 1.0
backgroundWeight = 1.0

mycuts = TCut("")
mycutb = TCut("")

loader.AddSignalTree(signal, signalWeight)
loader.AddBackgroundTree(background, backgroundWeight)
loader.fSignalWeight = signalWeight
loader.fBackgroundWeight = backgroundWeight
loader.fTreeS = signal
loader.fTreeB = background

loader.PrepareTrainingAndTestTree(SigCut=mycuts, BkgCut=mycutb,
            nTrain_Signal=0, nTrain_Background=0, SplitMode="Random", NormMode="NumEvents", V=False)

DataSetInfo

Dataset: tmva_class_example

Added class "Signal"

Add Tree TreeS of type Signal with 6000 events

DataSetInfo

Dataset: tmva_class_example

Added class "Background"

Add Tree TreeB of type Background with 6000 events

Booking methods¶

Methods are booked as shown in the notebook referred to earlier. The new feature introduced here is how the training strategy is passed when booking a DNN. Instead of one long strategy string, we can pass a list of dictionaries, where each dictionary holds the options for the corresponding training phase.
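To make the relationship between the two forms concrete: in TMVA's option-string syntax the options of one strategy are written as comma-separated key=value pairs, and successive strategies are separated by "|". The sketch below shows how a list of dictionaries could be flattened into such a string. The helper strategy_to_string is hypothetical and is not part of ROOT or JsMVA; how JsMVA performs the conversion internally is not shown in this notebook.

```python
# Hypothetical helper, not part of ROOT or JsMVA: it only illustrates how a
# list of dictionaries could map onto a TMVA-style strategy string.

def strategy_to_string(phases):
    """Write each dict as 'key=value,...' and join the dicts with '|'."""
    return "|".join(
        # Keys are sorted only to make the output reproducible.
        ",".join("{}={}".format(key, value) for key, value in sorted(phase.items()))
        for phase in phases
    )

# A shortened two-entry strategy (assumed values, for illustration only).
example = [
    {"LearningRate": 1e-1, "Momentum": 0.0, "BatchSize": 20},
    {"LearningRate": 1e-2, "Momentum": 0.5, "BatchSize": 30},
]

strategy_string = strategy_to_string(example)
print(strategy_string)
```

The list form keeps each group of options visually separate and lets ordinary Python tools (loops, copies, updates) build or modify a strategy before it is handed to BookMethod.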

In [4]:

factory.BookMethod( DataLoader=loader, Method=TMVA.Types.kCuts, MethodTitle="Cuts",
                    H=False, V=False, FitMethod="MC", EffSel=True, SampleSize=200000, VarProp="FSmart" )

factory.BookMethod( DataLoader=loader, Method=TMVA.Types.kSVM, MethodTitle="SVM", 
                    Gamma=0.25, Tol=0.001, VarTransform="Norm" )

factory.BookMethod( DataLoader=loader, Method=TMVA.Types.kMLP, MethodTitle="MLP", 
                    H=False, V=False, NeuronType="tanh", VarTransform="N", NCycles=600, HiddenLayers="N+5",
                    TestRate=5, UseRegulator=False )

factory.BookMethod( DataLoader=loader, Method=TMVA.Types.kLD, MethodTitle="LD", 
                    H=False, V=False, VarTransform="None", CreateMVAPdfs=True, PDFInterpolMVAPdf="Spline2",
                    NbinsMVAPdf=50, NsmoothMVAPdf=10 )

trainingStrategy = [{
        "LearningRate": 1e-1,
        "Momentum": 0.0,
        "Repetitions": 1,
        "ConvergenceSteps": 300,
        "BatchSize": 20,
        "TestRepetitions": 15,
        "WeightDecay": 0.001,
        "Regularization": "NONE",
        "DropConfig": "0.0+0.5+0.5+0.5",
        "DropRepetitions": 1,
        "Multithreading": True
        
    }, {
        "LearningRate": 1e-2,
        "Momentum": 0.5,
        "Repetitions": 1,
        "ConvergenceSteps": 300,
        "BatchSize": 30,
        "TestRepetitions": 7,
        "WeightDecay": 0.001,
        "Regularization": "L2",
        "DropConfig": "0.0+0.1+0.1+0.1",
        "DropRepetitions": 1,
        "Multithreading": True
        
    }, {
        "LearningRate": 1e-2,
        "Momentum": 0.3,
        "Repetitions": 1,
        "ConvergenceSteps": 300,
        "BatchSize": 40,
        "TestRepetitions": 7,
        "WeightDecay": 0.001,
        "Regularization": "L2",
        "Multithreading": True
        
    },{
        "LearningRate": 1e-3,
        "Momentum": 0.1,
        "Repetitions": 1,
        "ConvergenceSteps": 200,
        "BatchSize": 70,
        "TestRepetitions": 7,
        "WeightDecay": 0.001,
        "Regularization": "NONE",
        "Multithreading": True
        
}]

factory.BookMethod(DataLoader=loader, Method=TMVA.Types.kDNN, MethodTitle="DNN", 
                   H = False, V=False, VarTransform="Normalize", ErrorStrategy="CROSSENTROPY",
                   Layout=["TANH|100", "TANH|50", "TANH|10", "LINEAR"],
                   TrainingStrategy=trainingStrategy)

factory.BookMethod(loader, TMVA.Types.kLikelihood, "Likelihood", 
                   "NSmoothSig[0]=20:NSmoothBkg[0]=20:NSmoothBkg[1]=10:NSmooth=1:NAvEvtPerBin=50",
                    H=True, V=False,TransformOutput=True,PDFInterpol="Spline2")

factory.BookMethod(DataLoader= loader, Method=TMVA.Types.kBDT, MethodTitle="BDT",
                   H=False,V=False,NTrees=850,MinNodeSize="2.5%",MaxDepth=3,BoostType="AdaBoost", AdaBoostBeta=0.5,
                   UseBaggedBoost=True,BaggedSampleFraction=0.5, SeparationType="GiniIndex", nCuts=20 )

Out[4]:

<ROOT.TMVA::MethodBDT object ("BDT") at 0x6260870>

[Plot: Correlation matrix (Signal), values on a scale from -100 to 100]
	var1+var2	var1-var2	var3	var4
var1+var2	100	0	77	92
var1-var2	0	100	-9	6
var3	77	-9	100	85
var4	92	6	85	100

[Plot: Correlation matrix (Background), values on a scale from -100 to 100]
	var1+var2	var1-var2	var3	var4
var1+var2	100	17	95	98
var1-var2	17	100	13	21
var3	95	13	100	97
var4	98	21	97	100

Factory

Booking method: Cuts

Use optimization method: "Monte Carlo"

Use efficiency computation method: "Event Selection"

Use "FSmart" cuts for variable: 'myvar1'

Use "FSmart" cuts for variable: 'myvar2'

Use "FSmart" cuts for variable: 'var3'

Use "FSmart" cuts for variable: 'var4'

Factory

Booking method: SVM

SVM

Dataset: tmva_class_example

Create Transformation "Norm" with events from all classes.

Norm

Transformation, Variable selection :

Input : variable 'myvar1' <---> Output : variable 'myvar1'

Input : variable 'myvar2' <---> Output : variable 'myvar2'

Input : variable 'var3' <---> Output : variable 'var3'

Input : variable 'var4' <---> Output : variable 'var4'

Factory

Booking method: MLP

MLP

Dataset: tmva_class_example

Create Transformation "N" with events from all classes.

Norm

Transformation, Variable selection :

Input : variable 'myvar1' <---> Output : variable 'myvar1'

Input : variable 'myvar2' <---> Output : variable 'myvar2'

Input : variable 'var3' <---> Output : variable 'var3'

Input : variable 'var4' <---> Output : variable 'var4'

MLP

Building Network.

Initializing weights

Factory

Booking method: LD

DataSetFactory

Dataset: tmva_class_example

Number of events in input trees

Number of training and testing events
Signal	training events	3000
	testing events	3000
	training and testing events	6000
Background	training events	3000
	testing events	3000
	training and testing events	6000

DataSetInfo

Correlation matrix (Signal)

DataSetInfo

Correlation matrix (Background)

DataSetFactory

Dataset: tmva_class_example

Factory

Booking method: DNN

DNN

Dataset: tmva_class_example

Create Transformation "Normalize" with events from all classes.

Norm

Transformation, Variable selection :

Input : variable 'myvar1' <---> Output : variable 'myvar1'

Input : variable 'myvar2' <---> Output : variable 'myvar2'

Input : variable 'var3' <---> Output : variable 'var3'

Input : variable 'var4' <---> Output : variable 'var4'

Factory

Booking method: Likelihood

Factory

Booking method: BDT

Train Methods¶

When you use the jsmva magic, the original C++ version of Factory::TrainAllMethods is replaced by a new training method that produces notebook-compatible output during training, so we can follow the process (progress bar, error plot). For some methods (MLP, DNN, BDT) a tracer plot is created: test and training error vs. epoch for MLP and DNN, and error fraction and boost weight vs. tree number for BDT. Other methods do not support interactive tracing; for these, only a short text is printed so that we know which method TrainAllMethods is currently training.

For methods whose training can be traced interactively, there is a stop button that stops the training process. This button stops only the training of the current method; it does not stop TrainAllMethods as a whole.
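The stop-button behaviour can be pictured with a small pure-Python sketch: each method gets its own stop flag, so interrupting one method leaves the remaining ones untouched. Everything here (StopFlag, train_method, train_all_methods, the press_at argument, and the step counts) is hypothetical and only mimics the control flow; it is not JsMVA's implementation.

```python
# Illustrative only: the names below are hypothetical and are not JsMVA's API.

class StopFlag(object):
    """Stands in for the notebook's stop button."""
    def __init__(self):
        self.pressed = False

def train_method(n_steps, stop, press_at=None):
    """Run up to n_steps training steps; return the number of steps completed."""
    done = 0
    for step in range(n_steps):
        if press_at is not None and step == press_at:
            stop.pressed = True   # simulate the user clicking the button
        if stop.pressed:
            break                 # abandon the current method only
        done += 1
    return done

def train_all_methods(methods):
    """Train every method in turn, giving each one a fresh stop flag."""
    results = {}
    for name, n_steps, press_at in methods:
        stop = StopFlag()         # an earlier stop does not carry over
        results[name] = train_method(n_steps, stop, press_at)
    return results

results = train_all_methods([
    ("MLP", 600, 100),   # the button is pressed at step 100
    ("DNN", 300, None),  # runs to completion
    ("BDT", 850, None),  # runs to completion
])
print(results)
```

In this sketch the MLP entry ends early, while DNN and BDT still run all of their steps, which mirrors the statement that the button does not stop TrainAllMethods completely.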

In [5]:

factory.TrainAllMethods()

Dataset: tmva_class_example

Train method: Cuts

Train method: SVM

Train method: MLP

Train method: LD

Training...

End

Train method: DNN

Train method: Likelihood

Training...

End

Train method: BDT

FitterBase

Sampling, please be patient ...

Elapsed time : 7.18 sec

Cuts

Cut values for requested signal efficiency: 0.1

Corresponding background efficiency : 0.0276667