The class is implemented using only numpy, and visualizations are shown using matplotlib. The code for the decision tree class is given in the trees.py file.
import numpy as np
import matplotlib.pyplot as plt
from trees import ClassificationTree, RegressionTree
from sklearn import datasets
from sklearn.model_selection import train_test_split
Loading the data and splitting it into training and testing sets
# Load the iris dataset: 150 samples, 4 features, 3 classes.
data = datasets.load_iris()
X = data.data
y = data.target
# Feature names are optional; they only label the nodes in verbose/plot output.
feature_names = data.feature_names #Optional
# 70/30 train/test split; no random_state is set, so the split (and the
# accuracies below) differ on every run.
Xt,Xs, yt, ys = train_test_split(X,y,test_size=0.3)
print(X.shape,y.shape, Xt.shape, yt.shape, Xs.shape, ys.shape)
(150, 4) (150,) (105, 4) (105,) (45, 4) (45,)
# verbose=0: build the tree silently (only the header banner is printed).
clf = ClassificationTree()
clf.fit(Xt,yt,verbose=0,feature_names=feature_names)
Number of features:: 4 Number of samples :: 105 --------------------------------------- |Building the tree..................... | |.........................tree is buit! ---------------------------------------
# verbose=1: show a textual progress bar while the subtrees are built.
clf = ClassificationTree()
clf.fit(Xt,yt,verbose=1,feature_names=feature_names)
Number of features:: 4 Number of samples :: 105 --------------------------------------- |Building the tree..................... |subtrees::|100%|-------------------->|| |.........................tree is buit! ---------------------------------------
# verbose=2: print every split as it is chosen — feature, information gain,
# threshold and depth — plus each leaf value (see the transcript below).
clf = ClassificationTree()
clf.fit(Xt,yt,verbose=2,feature_names=feature_names)
Number of features:: 4 Number of samples :: 105 --------------------------------------- |Building the tree..................... |-Feature::3_petal length (cm) Gain::0.96 thr::3.3_Depth = 0 |->True branch (>>>).. |--Feature::3_petal length (cm) Gain::0.83 thr::4.8_Depth = 1 |-->True branch (>>>).. |---Feature::3_petal length (cm) Gain::0.18 thr::5.0_Depth = 2 |--->True branch (>>>).. |---->{Leaf Node:: value: 2 }_Depth =3 | |--->False branch (<<<).. |---Feature::4_petal width (cm) Gain::0.97 thr::1.8_Depth = 3 |--->False branch (<<<).. |--->{Leaf Node:: value: 1 }_Depth =4 | |--->True branch (>>>).. |---->{Leaf Node:: value: 2 }_Depth =4 | |-->False branch (<<<).. |-->{Leaf Node:: value: 1 }_Depth =2 | |->False branch (<<<).. |->{Leaf Node:: value: 0 }_Depth =1 | |.........................tree is buit! ---------------------------------------
# verbose=3: print a compact True/False branch trace (T/F path per node)
# with the depth of each node, instead of the full split details.
clf = ClassificationTree()
clf.fit(Xt,yt,verbose=3,feature_names=feature_names)
Number of features:: 4 Number of samples :: 105 --------------------------------------- |Building the tree..................... None 0 | True 1 | T True 2 | TT True 3 | TTT False 3 | TTF True 4 | TTFT False 4 | TTFF False 2 | TF False 1 | F | |.........................tree is buit! ---------------------------------------
# The interactive notebook backend is needed for verbose=4, which appears to
# draw the tree live while it is built (fit creates clf.fig) — TODO confirm
# in trees.py. randomBranch=True randomly picks which branch (True/False)
# to expand first.
%matplotlib notebook
clf = ClassificationTree()
clf.fit(Xt,yt,verbose=4,feature_names=feature_names, randomBranch=True)
# Close the figure created during the animated fit.
plt.close(clf.fig)
Number of features:: 4 Number of samples :: 105 --------------------------------------- |Building the tree..................... | |.........................tree is buit! ---------------------------------------
# Plot the learned tree twice: once with differently colored True/False
# branches and unscaled layout, once with the default options.
# (Exact semantics of scale/DiffBranchColor live in trees.py — see plotTree.)
plt.figure(figsize=(8,6))
clf.plotTree(show=True,DiffBranchColor=True,scale=False)
plt.figure(figsize=(8,6))
clf.plotTree(show=True)
# Evaluate on the train and test splits; accuracy = fraction of exact matches.
ytp = clf.predict(Xt)
ysp = clf.predict(Xs)
print('Training Accuracy: ',np.mean(ytp==yt))
print('Testing Accuracy: ',np.mean(ysp==ys))
Training Accuracy: 1.0 Testing Accuracy: 0.8888888888888888
# Restrict the tree to max_depth=2: a smaller tree trades a little training
# accuracy for less overfitting (compare with the full-depth tree above).
clf = ClassificationTree(max_depth=2)
clf.fit(Xt,yt,verbose=1,feature_names=feature_names)
#plt.figure(figsize=(15,8))
plt.figure(figsize=(5,5))
clf.plotTree(show=True,DiffBranchColor=True)
ytp = clf.predict(Xt)
ysp = clf.predict(Xs)
print('Training Accuracy: ',np.mean(ytp==yt))
print('Testing Accuracy: ',np.mean(ysp==ys))
Number of features:: 4 Number of samples :: 105 --------------------------------------- |Building the tree..................... |subtrees::|100%|-------------------->|| |.........................tree is buit! ---------------------------------------
Training Accuracy: 0.9809523809523809 Testing Accuracy: 0.8666666666666667
# Load the breast-cancer dataset: 569 samples, 30 features, binary target.
data = datasets.load_breast_cancer()
X = data.data
y = data.target
# Feature names are optional; they only label nodes in verbose/plot output.
feature_names = data.feature_names #Optional
# 70/30 split, again without a fixed random_state.
Xt,Xs, yt, ys = train_test_split(X,y,test_size=0.3)
print(X.shape,y.shape, Xt.shape, yt.shape, Xs.shape, ys.shape)
(569, 30) (569,) (398, 30) (398,) (171, 30) (171,)
While building the tree, to always expand the True branch first and then the False branch, set randomBranch=False.
# verbose=4 with randomBranch=False: per the note above, the True branch is
# always expanded before the False branch during the (live-plotted) build.
clf = ClassificationTree()
clf.fit(Xt,yt,verbose=4,feature_names=feature_names,randomBranch=False)
# Figure intentionally left open here (close was commented out).
#plt.close(clf.fig)
Number of features:: 30 Number of samples :: 398 --------------------------------------- |Building the tree..................... | |.........................tree is buit! ---------------------------------------
To randomly select the True or False branch, set randomBranch=True.
# Same build but with randomBranch=True: the branch to expand first is
# chosen at random at each node; close the animation figure afterwards.
clf = ClassificationTree()
clf.fit(Xt,yt,verbose=4,feature_names=feature_names,randomBranch=True)
plt.close(clf.fig)
Number of features:: 30 Number of samples :: 398 --------------------------------------- |Building the tree..................... | |.........................tree is buit! ---------------------------------------
# Plot the tree from the previous fit with colored branches and no scaling.
plt.figure(figsize=(6,6))
clf.plotTree(show=True,DiffBranchColor=True,scale=False)
# Refit a fresh full-depth tree with the progress-bar verbosity and plot it
# with default options.
clf = ClassificationTree()
clf.fit(Xt,yt,verbose=1,feature_names=feature_names)
plt.figure(figsize=(6,6))
clf.plotTree(show=True)
Number of features:: 30 Number of samples :: 398 --------------------------------------- |Building the tree..................... |subtrees::|100%|-------------------->|- |.........................tree is buit! ---------------------------------------
# Plot the fitted tree and report train/test accuracy.
plt.figure(figsize=(6,6))
clf.plotTree(show=True)
ytp = clf.predict(Xt)
ysp = clf.predict(Xs)
# 1.0 training accuracy with lower test accuracy indicates overfitting
# (see the note below).
print('Training Accuracy: ',np.mean(ytp==yt))
print('Testing Accuracy: ',np.mean(ysp==ys))
Training Accuracy: 1.0 Testing Accuracy: 0.9473684210526315
It's overfitting; try smaller trees by decreasing the max_depth of the classifier.
# Load a regression dataset for the RegressionTree demo.
# NOTE: datasets.load_boston was deprecated in scikit-learn 1.0 and removed
# in 1.2, so accessing it raises AttributeError on modern versions. Fall
# back to the California housing data (20640 samples, 8 features) there;
# the rest of the demo is dataset-agnostic.
if hasattr(datasets, "load_boston"):
    data = datasets.load_boston()
else:
    data = datasets.fetch_california_housing()
X = data.data
y = data.target
# Feature names are optional; they only label nodes in verbose/plot output.
feature_names = data.feature_names #Optional
# 70/30 train/test split; no random_state, so results vary between runs.
Xt,Xs, yt, ys = train_test_split(X,y,test_size=0.3)
print(X.shape,y.shape, Xt.shape, yt.shape, Xs.shape, ys.shape)
(506, 13) (506,) (354, 13) (354,) (152, 13) (152,)
# Fit a full-depth regression tree with the progress-bar verbosity.
rgr = RegressionTree()
rgr.fit(Xt,yt,verbose=1,feature_names = feature_names)
Number of features:: 13 Number of samples :: 354 --------------------------------------- |Building the tree..................... |subtrees::|100%|-------------------->|\ |.........................tree is buit! ---------------------------------------
# Switch back to the static inline backend for plain (non-animated) plots.
%matplotlib inline
plt.style.use('default')
plt.figure(figsize=(10,10))
rgr.plotTree(show=True,scale=False,DiffBranchColor=True)
# Larger, scaled rendering without title/direction labels
# (showtitle/showDirection semantics defined in trees.py — see plotTree).
plt.figure(figsize=(15,15))
rgr.plotTree(show=True,scale=True, showtitle =False, showDirection=False,DiffBranchColor=True)
# Interactive backend again for the verbose=4 (live-plotted) build,
# this time with the tree depth capped at 4.
%matplotlib notebook
rgr = RegressionTree(max_depth=4)
rgr.fit(Xt,yt,verbose=4,feature_names = feature_names)
Number of features:: 13 Number of samples :: 354 --------------------------------------- |Building the tree..................... | |.........................tree is buit! ---------------------------------------
# Plot the depth-4 tree and report mean squared error on both splits.
plt.figure(figsize=(10,6))
rgr.plotTree(show=True,scale=True, showtitle =False, showDirection=False,DiffBranchColor=True)
ytp = rgr.predict(Xt)
ysp = rgr.predict(Xs)
print('Training MSE: ',np.mean((ytp-yt)**2))
print('Testing MSE: ',np.mean((ysp-ys)**2))
Training MSE: 6.178914224439429 Testing MSE: 12.187349420981135
# Even smaller tree (max_depth=3) to compare train/test MSE trade-off.
rgr = RegressionTree(max_depth=3)
rgr.fit(Xt,yt,verbose=1,feature_names = feature_names)
Number of features:: 13 Number of samples :: 354 --------------------------------------- |Building the tree..................... |subtrees::|100%|-------------------->|\ |.........................tree is buit! ---------------------------------------
# Plot the depth-3 tree (with title this time) and report train/test MSE.
plt.figure(figsize=(8,6))
rgr.plotTree(show=True,scale=True, showtitle =True, showDirection=False,DiffBranchColor=True)
ytp = rgr.predict(Xt)
ysp = rgr.predict(Xs)
print('Training MSE: ',np.mean((ytp-yt)**2))
print('Testing MSE: ',np.mean((ysp-ys)**2))
Training MSE: 8.674185349676321 Testing MSE: 12.839154053729468