import ROOT
from ROOT import TFile, TMVA, TCut
%jsmva on
Welcome to JupyROOT 6.09/01
outputFile = TFile( "TMVA.root", 'RECREATE' )
TMVA.Tools.Instance()
factory = TMVA.Factory(JobName="TMVAClassification", TargetFile=outputFile,
                       V=False, Color=True, DrawProgressBar=True,
                       Transformations=["I", "D", "P", "G", "D"],
                       AnalysisType="Classification")
dataset = "tmva_class_example"
loader = TMVA.DataLoader(dataset)
loader.AddVariable( "myvar1 := var1+var2", 'F' )
loader.AddVariable( "myvar2 := var1-var2", "Expression 2", 'F' )
loader.AddVariable( "var3", "Variable 3", 'F' )
loader.AddVariable( "var4", "Variable 4", 'F' )
loader.AddSpectator( "spec1:=var1*2", "Spectator 1", 'F' )
loader.AddSpectator( "spec2:=var1*3", "Spectator 2", 'F' )
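In plain Python terms, the per-event quantities defined above amount to the following (hypothetical helper mirroring the same expressions):

```python
def make_event(var1, var2, var3, var4):
    """Mirror the DataLoader expressions above in plain Python:
    myvar1 = var1 + var2 and myvar2 = var1 - var2 are derived inputs,
    var3 and var4 pass through unchanged, and spec1/spec2 are
    spectators stored with each event but not used for training."""
    inputs = (var1 + var2, var1 - var2, var3, var4)
    spectators = (var1 * 2, var1 * 3)
    return inputs, spectators

inputs, spectators = make_event(1.0, 0.5, -0.2, 3.0)
# inputs == (1.5, 0.5, -0.2, 3.0), spectators == (2.0, 3.0)
```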
if ROOT.gSystem.AccessPathName( "./tmva_class_example.root" ) != 0:
    ROOT.gSystem.Exec( "wget https://root.cern.ch/files/tmva_class_example.root")
input = TFile.Open( "./tmva_class_example.root" )
# Get the signal and background trees for training
signal = input.Get( "TreeS" )
background = input.Get( "TreeB" )
# Global event weights (see below for setting event-wise weights)
signalWeight = 1.0
backgroundWeight = 1.0
mycuts = TCut("")
mycutb = TCut("")
loader.AddSignalTree(signal, signalWeight)
loader.AddBackgroundTree(background, backgroundWeight)
loader.fSignalWeight = signalWeight
loader.fBackgroundWeight = backgroundWeight
loader.fTreeS = signal
loader.fTreeB = background
loader.PrepareTrainingAndTestTree(SigCut=mycuts, BkgCut=mycutb,
                                  nTrain_Signal=1000, nTrain_Background=1000,
                                  nTest_Signal=2000, nTest_Background=2000,
                                  SplitMode="Random", NormMode="NumEvents", V=False)
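With `SplitMode="Random"`, training and test events are drawn randomly from each tree. A minimal sketch of such a split (simplified; not TMVA's actual sampler):

```python
import random

def random_split(n_events, n_train, n_test, seed=0):
    """Randomly assign event indices to a training and a test set,
    roughly what SplitMode="Random" does per class."""
    rng = random.Random(seed)
    indices = list(range(n_events))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:n_train + n_test]

# 6000 events per tree; 1000 go to training, 2000 to testing, as requested above
train, test = random_split(6000, 1000, 2000)
```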
DataSetInfo : Add Tree TreeS of type Signal with 6000 events
DataSetInfo : Add Tree TreeB of type Background with 6000 events
factory.BookMethod( DataLoader=loader, Method=TMVA.Types.kMLP, MethodTitle="MLP",
                    H=False, V=False, NeuronType="tanh", VarTransform="N",
                    NCycles=600, HiddenLayers="N+5", TestRate=5, UseRegulator=False )
trainingStrategy = [{
"LearningRate": 1e-1,
"Momentum": 0.0,
"Repetitions": 1,
"ConvergenceSteps": 300,
"BatchSize": 20,
"TestRepetitions": 15,
"WeightDecay": 0.001,
"Regularization": "NONE",
"DropConfig": "0.0+0.5+0.5+0.5",
"DropRepetitions": 1,
"Multithreading": True
}, {
"LearningRate": 1e-2,
"Momentum": 0.5,
"Repetitions": 1,
"ConvergenceSteps": 300,
"BatchSize": 30,
"TestRepetitions": 7,
"WeightDecay": 0.001,
"Regularization": "L2",
"DropConfig": "0.0+0.1+0.1+0.1",
"DropRepetitions": 1,
"Multithreading": True
}, {
"LearningRate": 1e-2,
"Momentum": 0.3,
"Repetitions": 1,
"ConvergenceSteps": 300,
"BatchSize": 40,
"TestRepetitions": 7,
"WeightDecay": 0.001,
"Regularization": "L2",
"Multithreading": True
},{
"LearningRate": 1e-3,
"Momentum": 0.1,
"Repetitions": 1,
"ConvergenceSteps": 200,
"BatchSize": 70,
"TestRepetitions": 7,
"WeightDecay": 0.001,
"Regularization": "NONE",
"Multithreading": True
}]
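Under the hood, jsmva flattens this list of dictionaries into the pipe-separated `TrainingStrategy` option string that TMVA's DNN method parses. A rough sketch of that conversion (assumed and simplified; the real jsmva code may differ in detail):

```python
def to_training_strategy_string(strategies):
    """Join each training phase's settings with ',' and the phases
    with '|', producing a string like
    "LearningRate=0.1,Momentum=0.0|LearningRate=0.01,..."."""
    return "|".join(
        ",".join("{}={}".format(k, v) for k, v in phase.items())
        for phase in strategies
    )

opt = to_training_strategy_string([
    {"LearningRate": 0.1, "Momentum": 0.0},
    {"LearningRate": 0.01, "Momentum": 0.5},
])
# "LearningRate=0.1,Momentum=0.0|LearningRate=0.01,Momentum=0.5"
```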
factory.BookMethod(DataLoader=loader, Method=TMVA.Types.kDNN, MethodTitle="DNN",
                   H=False, V=False, VarTransform="Normalize",
                   ErrorStrategy="CROSSENTROPY",
                   Layout=["TANH|100", "TANH|50", "TANH|10", "LINEAR"],
                   TrainingStrategy=trainingStrategy, Architecture="CPU")
factory.BookMethod(DataLoader=loader, Method=TMVA.Types.kBDT, MethodTitle="BDT",
                   H=False, V=False, NTrees=850, MinNodeSize="2.5%", MaxDepth=3,
                   BoostType="AdaBoost", AdaBoostBeta=0.5, UseBaggedBoost=True,
                   BaggedSampleFraction=0.5, SeparationType="GiniIndex", nCuts=20)
<ROOT.TMVA::MethodBDT object ("BDT") at 0x39e6080>
Factory     : Booking method: MLP
MLP         : Transformation, Variable selection :
              Input : variable 'myvar1' <---> Output : variable 'myvar1'
              Input : variable 'myvar2' <---> Output : variable 'myvar2'
              Input : variable 'var3'   <---> Output : variable 'var3'
              Input : variable 'var4'   <---> Output : variable 'var4'
MLP         : Building Network.
              Initializing weights
Factory     : Booking method: DNN
DNN         : Transformation, Variable selection :
              Input : variable 'myvar1' <---> Output : variable 'myvar1'
              Input : variable 'myvar2' <---> Output : variable 'myvar2'
              Input : variable 'var3'   <---> Output : variable 'var3'
              Input : variable 'var4'   <---> Output : variable 'var4'
Factory     : Booking method: BDT
DataSetInfo : Correlation matrix (Signal)
DataSetInfo : Correlation matrix (Background)
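The BDT booked above uses AdaBoost with `AdaBoostBeta=0.5`: between trees, misclassified events are boosted by a factor alpha = ((1 - err) / err) ** beta, where err is the misclassification rate. A minimal sketch of one such reweighting step (illustrative; not TMVA's exact implementation):

```python
def adaboost_reweight(weights, misclassified, err, beta=0.5):
    """One conceptual AdaBoost step: multiply the weight of each
    misclassified event by alpha = ((1 - err) / err) ** beta,
    then renormalize so the weights sum to 1."""
    alpha = ((1.0 - err) / err) ** beta
    boosted = [w * alpha if bad else w
               for w, bad in zip(weights, misclassified)]
    total = sum(boosted)
    return [w / total for w in boosted]

# 4 equally weighted events, the first one misclassified (err = 0.25):
new_w = adaboost_reweight([0.25] * 4, [True, False, False, False], err=0.25)
```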
If we trained a neural network, its weights are saved to XML and C files. We can read back the XML file and visualize the network using the Factory.DrawNeuralNetwork function.
The arguments of this function:
Keyword | Can be used as positional argument | Default | Predefined values | Description |
---|---|---|---|---|
datasetName | yes, 1. | - | - | The name of dataset |
methodName | yes, 2. | - | - | The name of method |
This visualization is interactive. The synapses are drawn in two colors, one for positive and one for negative weights; the absolute value of each weight is scaled and mapped to the thickness of the line between the two nodes.
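The weight-to-style mapping described above can be sketched like this (hypothetical helper; not the actual jsmva drawing code):

```python
def synapse_style(weight, max_abs_weight, max_width=5.0):
    """Pick a color by the sign of the weight and a line width
    proportional to |weight| / max|weight|, as described above."""
    color = "positive" if weight >= 0 else "negative"
    width = max_width * abs(weight) / max_abs_weight
    return color, width

style = synapse_style(-0.5, 1.0)  # ("negative", 2.5)
```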
factory.DrawNeuralNetwork(dataset, "MLP")
The DrawNeuralNetwork function can also visualize deep neural networks; we just have to pass "DNN" as the method name. If the network is very big, with many thousands of neurons, drawing it will be slow and will need a lot of RAM, so be careful with this function.
This visualization is also interactive.
factory.DrawNeuralNetwork(dataset, "DNN")
The trained decision trees are also saved to XML, so we can read back the XML file and visualize the trees. This is the purpose of the Factory.DrawDecisionTree function.
The arguments of this function:
Keyword | Can be used as positional argument | Default | Predefined values | Description |
---|---|---|---|---|
datasetName | yes, 1. | - | - | The name of dataset |
methodName | yes, 2. | - | - | The name of method |
This function produces a small input box where you can enter the index of the tree you want to see (the total number of trees is shown next to this input box). After choosing a number, press the Draw button. The nodes of the tree are colored; the color reflects the node's signal efficiency.
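The node coloring can be thought of as a function of the node's signal purity S/(S+B). A toy version (assumed color scheme; not the one actually used by jsmva):

```python
def node_color(n_signal, n_background):
    """Interpolate from blue (background-like, purity 0) to
    red (signal-like, purity 1) based on S / (S + B)."""
    purity = n_signal / float(n_signal + n_background)
    return (int(255 * purity), 0, int(255 * (1.0 - purity)))

pure_signal = node_color(100, 0)   # (255, 0, 0)
mixed = node_color(50, 50)         # (127, 0, 127)
```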
The visualization of the tree is interactive, and you can do the following with it:

  * Mouseover (node, weight): shows the decision path
  * Zooming and grab-and-move are supported
  * Reset zoomed tree: double click
  * Expand all closed subtrees and turn off zoom: button at the bottom of the picture
  * Click on a node: hides or shows its subtree
factory.DrawDNNWeights(dataset, "DNN")
factory.DrawDecisionTree(dataset, "BDT")
outputFile.Close()