In [1]:
import numpy as np
import math
import plotly.offline as py
import plotly.graph_objs as go
py.init_notebook_mode(connected=True)

Perceptron

Let us implement a perceptron.

$ F(x) = {\sigma}\biggl(\sum_{i=1}^{n} W_i x_i + b\biggr) $

  • This can take $ n $ inputs.
  • Has $ n $ weights (plus a bias $ b $).
  • $ \sigma $ is the activation function; here we use a step function.
In [2]:
# Let us create a perceptron class


h = []
class Perceptron():
    """ The implementation of the perceptron model. """
    
    def __init__(self,num_inputs,lr):
        """ 
        Arguments
        num_inputs: number of inputs to the perceptron
        lr : Learning rate for the perceptron
        """    
        
        
        #Here we use num_inputs + 1 so the extra weight acts as the bias when the input is padded with a one.
        #Then the dot product of x and W gives us Wx + b
        self.W = np.random.randn(num_inputs+1) 
        
        self.lr = lr
    
    def step_function(self,x):
        """
        Arguments
        x : the input on which the step_function should be applied
        """
        if x > 0:
            return 1
        else:
            return 0
        
        
    def forward(self,x):
        """
        Arguments
        x : the input numpy array on which the perceptron is trained.
        """
        output = x.T.dot(self.W)
        return self.step_function(output)
    
    def loss(self,predict,label):
        """
        Arguments
        predict : value predicted by the perceptron
        label : original values to be predicted
        """
        
        l = label - predict
        return l
    
    def back_propagate(self,loss_value,x):
        """
        Arguments
        loss_value : the calculated loss for a set of label and predicted value
        x : the set of input values used for training
        """
        
        self.W += (self.lr*loss_value*x)
    
    def batch_train(self,x,label,epochs=2):
        """
        Arguments
        x : an array of data for training
        label : the original labels for the set of training data
        epochs : total number of the times the data is used to train
        """
        x = np.array(x)
        n = x.shape[0]
        bias_axis = np.ones([n,1])
        x = np.concatenate((x,bias_axis),axis=1)
        loss_hist = []
        assert x.shape[1] == (self.W.shape[0]), "Input shape does not match the specified length"
        for i in range(epochs):
            avg_loss = 0
            for j in range(x.shape[0]):
                pred = self.forward(x[j])
                l = self.loss(pred,label[j])
                self.back_propagate(l,x[j])
                avg_loss += abs(l)
                h.append(self.W.copy()) # copy, otherwise every entry in h aliases the same array
            loss_hist.append(avg_loss)
        return loss_hist

Let us test it

Here we train the perceptron to approximate the function of an AND gate.

AND gate

The AND gate is a logic gate that takes one or more inputs; its output is one only if all of its inputs are one, else it is zero.

Input1  Input2  Output
  0       0       0
  0       1       0
  1       0       0
  1       1       1
In [3]:
p = Perceptron(2,0.5)

# AND gate input
x = np.array([[0,0],
              [0,1],
              [1,0],
              [1,1]])

# Output of AND gate
y = np.array([0,0,0,1])

#training
hist = p.batch_train(x,y,12)

assert p.forward(np.array([0,0,1])) == 0, "Try using more epochs"
assert p.forward(np.array([0,1,1])) == 0, "Try using more epochs"
assert p.forward(np.array([1,0,1])) == 0, "Try using more epochs"
assert p.forward(np.array([1,1,1])) == 1, "Try using more epochs"
print ("Perceptron has been trained successfully")
Perceptron has been trained successfully

Visualization

We can infer a lot about our perceptron by using an array of visualization techniques. We will discuss some important visualizations here.

First, let us plot the loss to see the progress of the training.

In [4]:
trace = go.Scatter(
    x = list(range(len(hist))), 
    y = hist
)

layout = dict(title = 'Loss Value',
              xaxis = dict(title = 'Epochs'),
              yaxis = dict(title = 'Loss'),
              )

fig = dict(data=[trace], layout=layout)
py.iplot(fig)

From the plot above we can see that the model converged after the 5th epoch; the exact epoch may vary between runs because the weights are randomly initialised. After training, the model has learned the function of the AND gate. In the training cell we used assert statements to verify the behaviour of the perceptron: the first two values are the inputs, and a one is padded onto the input to account for the bias.

Decision plane

Let us now visualize the decision plane created by the perceptron. We have plotted the inputs in the XY-plane, with the output on the Z-axis.

In [5]:
def plot_decision_plane(w):
    x_point = [0.0,0.0,1.0,1.0]
    y_point = [0.0,1.0,0.0,1.0]
    z_point = [0.0,0.0,0.0,1.0]
    
    x_plane = np.linspace(-0.2, 1.2, num=4)
    y_plane = np.linspace(-0.2, 1.2, num=4)
    X_plane, Y_plane = np.meshgrid(x_plane, y_plane)
    Z_plane = -w[0]*X_plane - w[1]*Y_plane - w[2]
    
    data = [
        go.Surface(x=X_plane,y=Y_plane,z=Z_plane,colorscale = 'Viridis'),
        go.Scatter3d(
            x=x_point,
            y=y_point,
            z=z_point,
            mode='markers',
            marker=dict(
                size=12,
                line=dict(
                    width=0.5
                ),
                colorscale='Viridis',
                color=z_point,
                opacity=0.8
            )
        )]
    layout = go.Layout(
                title= 'Decision plane of And Gate',
                scene = dict(
                    xaxis = dict(title = 'Input 1'),
                    yaxis=dict(title = 'Input 2'),
                    zaxis=dict(title = 'Output')
                )
            )
    
    fig = go.Figure(data=data, layout=layout)
    py.iplot(fig)
    
    
plot_decision_plane(p.W)

As we can see above, the single point that yields 1 as output has been separated by the plane from the other three points, which yield an output of 0. This plane is called the decision plane. Every point (in the XY-plane) below the plane yields an output of 0, and every point above it yields 1.

Things to try out

  • You can try to approximate the OR, NOR and NAND gates.
  • Use different activations and see if anything changes.
  • Use a sigmoid activation and use the perceptron for regression instead of classification.