Feedforward neural network with regularization

Neural networks are a popular and powerful family of machine learning models, loosely inspired by the way the brain processes information. There are many types of neural networks; in this notebook I'll implement a basic feedforward network as described in Andrew Ng's course on Coursera.


The structure of this neural network is simple: an input layer, one hidden layer and an output layer. Regularization is added to prevent overfitting.

In [1]:
import numpy as np
import pandas as pd
from scipy.special import expit
from scipy.optimize import minimize
In [2]:
train = pd.read_csv('../input/train.csv')
test = pd.read_csv('../input/test.csv')

I add the same engineered interaction features as in my other notebook.

In [3]:
train['hair_soul'] = train['hair_length'] * train['has_soul']
train['hair_bone'] = train['hair_length'] * train['bone_length']
test['hair_soul'] = test['hair_length'] * test['has_soul']
test['hair_bone'] = test['hair_length'] * test['bone_length']
train['hair_soul_bone'] = train['hair_length'] * train['has_soul'] * train['bone_length']
test['hair_soul_bone'] = test['hair_length'] * test['has_soul'] * test['bone_length']
In [4]:
X = np.array(train.drop(['id', 'color', 'type'], axis=1))
X = np.insert(X, 0, 1, axis=1)            # add a bias column of ones
X_test = np.array(test.drop(['id', 'color'], axis=1))
X_test = np.insert(X_test, 0, 1, axis=1)  # add a bias column of ones
Y_train = np.array((pd.get_dummies(train['type'], drop_first=False)).astype(float))
#I'll need this for predictions.
monsters = (pd.get_dummies(train['type'], drop_first=False)).columns
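
For reference, Y_train is a one-hot matrix with one column per class, and monsters keeps the column order so that predictions can be mapped back to labels later. A quick look (the expected output assumes the usual Ghost/Ghoul/Goblin labels of this competition):

print(monsters)        # expected: Index(['Ghost', 'Ghoul', 'Goblin'], dtype='object')
print(Y_train.shape)   # (number of training rows, 3)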

These are the parameters of the neural network. I added an extra bias column to the variables, so the input size is 8. The number of nodes in the hidden layer is arbitrary; I chose 12 after some testing. Note that the variable named learning_rate is actually used as the regularization strength (lambda) in the cost function and gradients. params holds random initial weights matching the total number of connections in the network (a quick count is shown right after the cell below).

In [5]:
hidden_size = 12   # number of units in the hidden layer
learning_rate = 1  # regularization strength (lambda), despite the name
params = (np.random.random(size=hidden_size * (X.shape[1]) + Y_train.shape[1] * (hidden_size + 1)) - 0.5)  # random initial weights in [-0.5, 0.5)
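
As a quick sanity check on the sizes (a small sketch that is not part of the original pipeline): with 8 inputs (7 features plus the bias column), 12 hidden units and 3 output classes, the network has 12*8 + 3*13 = 135 weights in total, which should match the length of params:

print(hidden_size * X.shape[1] + Y_train.shape[1] * (hidden_size + 1))   # 135
print(params.shape)                                                      # (135,)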

Forward propagation. The input is multiplied by the first weight matrix and passed through the sigmoid to get the hidden-layer activations (with a bias unit prepended); the hidden activations are then multiplied by the second weight matrix and passed through the sigmoid again to produce the output.

In [6]:
def forward_propagate(X, theta1, theta2):
    z2 = X * theta1.T                           # hidden layer pre-activations
    a2 = np.insert(expit(z2), 0, 1, axis=1)     # hidden activations with a bias unit prepended
    a3 = expit(a2 * theta2.T)                   # output layer activations
    return z2, a2, a3
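
To see the dimensions involved, here is a small sketch (not part of the pipeline; t1 and t2 are just temporary names for the randomly initialised weights) that runs the forward pass on the first five rows:

t1 = np.matrix(params[:hidden_size * X.shape[1]].reshape(hidden_size, X.shape[1]))
t2 = np.matrix(params[hidden_size * X.shape[1]:].reshape(Y_train.shape[1], hidden_size + 1))
z2_demo, a2_demo, a3_demo = forward_propagate(np.matrix(X[:5]), t1, t2)
print(z2_demo.shape, a2_demo.shape, a3_demo.shape)   # expected: (5, 12) (5, 13) (5, 3)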

Backpropagation. The output error is propagated back through the network to compute the gradient of the cost with respect to each weight. The regularization term is added to the gradients here as well (the bias columns are not regularized).

In [7]:
def back_propagate(X, y, theta1, theta2, z2, a2, a3):
    D1 = np.zeros(theta1.shape)
    D2 = np.zeros(theta2.shape)
    
    for t in range(len(X)):
        z2t = z2[t,:]
        
        d3t = a3[t,:] - y[t,:]   # error at the output layer
        z2t = np.insert(z2t, 0, values=1)
        d2t = np.multiply((theta2.T * d3t.T).T, np.multiply(expit(z2t), (1 - expit(z2t))))   # error propagated to the hidden layer
        
        D1 += (d2t[:,1:]).T * X[t,:]
        D2 += d3t.T * a2[t,:]
        
    D1 = D1 / len(X)
    D2 = D2 / len(X)
    
    # Add the regularization term to the gradients; the bias columns are not regularized.
    D1[:,1:] += (theta1[:,1:] * learning_rate) / len(X)
    D2[:,1:] += (theta2[:,1:] * learning_rate) / len(X)
    return D1, D2
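
The gradients returned by back_propagate should have the same shapes as the weight matrices they correspond to. A quick check, reusing t1 and t2 from the sketch above (again not part of the pipeline):

z2_demo, a2_demo, a3_demo = forward_propagate(np.matrix(X), t1, t2)
D1_demo, D2_demo = back_propagate(np.matrix(X), np.matrix(Y_train), t1, t2, z2_demo, a2_demo, a3_demo)
print(D1_demo.shape == t1.shape, D2_demo.shape == t2.shape)   # expected: True True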

Cost function. The inputs and targets are converted to matrices and params is split into the two weight matrices. Forward propagation produces the outputs, from which the regularized cross-entropy loss is computed; backpropagation then returns the gradient that the optimizer will use to minimize the cost.

In [8]:
def cost(params, X, y, learningRate):  
    X = np.matrix(X)
    y = np.matrix(y)
    theta1 = np.matrix(np.reshape(params[:hidden_size * (X.shape[1])], (hidden_size, (X.shape[1]))))
    theta2 = np.matrix(np.reshape(params[hidden_size * (X.shape[1]):], (Y_train.shape[1], (hidden_size + 1))))

    z2, a2, a3 = forward_propagate(X, theta1, theta2)
    J = 0
    for i in range(len(X)):
        first_term = np.multiply(-y[i,:], np.log(a3[i,:]))
        second_term = np.multiply((1 - y[i,:]), np.log(1 - a3[i,:]))
        J += np.sum(first_term - second_term)
    
    J = (J + (float(learningRate) / 2) * (np.sum(np.power(theta1[:,1:], 2)) + np.sum(np.power(theta2[:,1:], 2)))) / len(X)
    
    #Backpropagation
    D1, D2 = back_propagate(X, y, theta1, theta2, z2, a2, a3)
    
    #Unravel the gradient into a single array.
    grad = np.concatenate((np.ravel(D1), np.ravel(D2)))
    return J, grad
#A quick check that the cost function runs and returns a gradient of the expected size.
J, grad = cost(params, X, Y_train, 1)
J, grad.shape
Out[8]:
(2.0055905698764396, (135,))
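
Before handing everything to the optimizer, the analytic gradient can be spot-checked against a finite-difference approximation of the cost. This is only a sketch (eps and the five randomly chosen indices are arbitrary); the two numbers printed on each line should be close:

eps = 1e-4
for i in np.random.choice(len(params), 5, replace=False):
    p_plus, p_minus = params.copy(), params.copy()
    p_plus[i] += eps
    p_minus[i] -= eps
    numeric = (cost(p_plus, X, Y_train, 1)[0] - cost(p_minus, X, Y_train, 1)[0]) / (2 * eps)
    print(grad[i], numeric)
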
In [9]:
#Minimize the cost with scipy's TNC solver. jac=True tells it that cost() returns both the objective and its gradient.
fmin = minimize(cost, x0=params, args=(X, Y_train, learning_rate), method='TNC', jac=True, options={'maxiter': 600})
In [10]:
#Get the optimized weights and use them to get output. 
theta1 = np.matrix(np.reshape(fmin.x[:hidden_size * (X.shape[1])], (hidden_size, (X.shape[1]))))
theta2 = np.matrix(np.reshape(fmin.x[hidden_size * (X.shape[1]):], (Y_train.shape[1], (hidden_size + 1))))
z2, a2, a3 = forward_propagate(X, theta1, theta2)
In [11]:
#Predictions come as probabilities for each class; take the class with the highest probability.
def pred(a):
    for i in range(len(a)):
        yield monsters[np.argmax(a[i])]
prediction = list(pred(a3))
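
An equivalent vectorised way to get the same labels (just an alternative sketch, not used in the rest of the notebook):

prediction_alt = list(monsters[np.asarray(a3.argmax(axis=1)).ravel()])
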
In [12]:
#Accuracy on training dataset.
accuracy = sum(prediction == train['type']) / len(train['type'])
print('accuracy = {0}%'.format(accuracy * 100))
accuracy = 76.01078167115904%
In [13]:
#Predict on test set.
z2, a2, a3_test = forward_propagate(X_test, theta1, theta2)
In [14]:
prediction_test = list(pred(a3_test))
In [15]:
submission = pd.DataFrame({'id':test['id'], 'type':prediction_test})
submission.to_csv('GGG_submission.csv', index=False)

I got an accuracy of ~0.741 with this neural network. That's a good result, considering that my ensemble got ~0.748.