About

This notebook demonstrates the neural network (NN) classifiers provided by the Reproducible Experiment Platform (REP) package.
REP contains wrappers for the following NN libraries:

theanets
neurolab
pybrain

In this notebook we show how to:

train a classifier
get predictions
measure quality
use pretraining and partial fitting
combine classifiers using meta-algorithms

Most of this is done in the same way as for other classifiers (see notebook 01-howto-Classifiers.ipynb).

The parameters used here were chosen to make training very fast; they are far from optimal.

Loading data

Download the particle identification data set from UCI

In [1]:

!mkdir -p toy_datasets && cd toy_datasets && wget -O MiniBooNE_PID.txt -nc --no-check-certificate https://archive.ics.uci.edu/ml/machine-learning-databases/00199/MiniBooNE_PID.txt

File `MiniBooNE_PID.txt' already there; not retrieving.

In [2]:

import numpy, pandas
from rep.utils import train_test_split
from sklearn.metrics import roc_auc_score

# features are separated by runs of whitespace; the first line holds the event counts
data = pandas.read_csv('toy_datasets/MiniBooNE_PID.txt', sep=r'\s+', skiprows=[0], header=None)
labels = pandas.read_csv('toy_datasets/MiniBooNE_PID.txt', sep=' ', nrows=1, header=None)
labels = [1] * labels[1].values[0] + [0] * labels[2].values[0]
data.columns = ['feature_{}'.format(key) for key in data.columns]
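The file layout assumed by the cell above can be illustrated without pandas. In the minimal sketch below, the helper read_labelled_rows and the tiny in-memory file are hypothetical stand-ins (made-up numbers, not the real MiniBooNE data). The point is that the first line stores the number of signal and background events, the signal rows come first, and the label list must have exactly one entry per row.

```python
import io


def read_labelled_rows(f):
    """Hypothetical parser for a MiniBooNE-style file: header counts, then feature rows."""
    # first line: number of signal events, then number of background events
    n_signal, n_background = [int(v) for v in f.readline().split()]
    # remaining lines: whitespace-separated features (split() ignores leading spaces)
    rows = [[float(v) for v in line.split()] for line in f if line.strip()]
    # signal rows come first, so labels are a block of 1s followed by a block of 0s
    labels = [1] * n_signal + [0] * n_background
    if len(labels) != len(rows):
        raise ValueError('header counts do not match the number of rows')
    return rows, labels


# toy stand-in for the real file (made-up values, 2 signal and 3 background rows)
toy_file = io.StringIO(u"  2 3\n  1.0  2.0\n  1.5  2.5\n  0.1  0.2\n  0.3  0.4\n  0.5  0.6\n")
rows, toy_labels = read_labelled_rows(toy_file)
```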

In [3]:

len(data)

Out[3]:

First rows of data

In [4]:

data[:5]

Out[4]:

	feature_0	feature_1	feature_2	feature_3	feature_4	feature_5	feature_6	feature_7	feature_8	feature_9	...	feature_40	feature_41	feature_42	feature_43	feature_44	feature_45	feature_46	feature_47	feature_48	feature_49
0	2.59413	0.468803	20.6916	0.322648	0.009682	0.374393	0.803479	0.896592	3.59665	0.249282	...	101.174	-31.3730	0.442259	5.86453	0.000000	0.090519	0.176909	0.457585	0.071769	0.245996
1	3.86388	0.645781	18.1375	0.233529	0.030733	0.361239	1.069740	0.878714	3.59243	0.200793	...	186.516	45.9597	-0.478507	6.11126	0.001182	0.091800	-0.465572	0.935523	0.333613	0.230621
2	3.38584	1.197140	36.0807	0.200866	0.017341	0.260841	1.108950	0.884405	3.43159	0.177167	...	129.931	-11.5608	-0.297008	8.27204	0.003854	0.141721	-0.210559	1.013450	0.255512	0.180901
3	4.28524	0.510155	674.2010	0.281923	0.009174	0.000000	0.998822	0.823390	3.16382	0.171678	...	163.978	-18.4586	0.453886	2.48112	0.000000	0.180938	0.407968	4.341270	0.473081	0.258990
4	5.93662	0.832993	59.8796	0.232853	0.025066	0.233556	1.370040	0.787424	3.66546	0.174862	...	229.555	42.9600	-0.975752	2.66109	0.000000	0.170836	-0.814403	4.679490	1.924990	0.253893

5 rows × 50 columns

Splitting into train and test

In [5]:

# Get train and test data
train_data, test_data, train_labels, test_labels = train_test_split(data, labels, train_size=0.25)
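With train_size=0.25, only a quarter of the events goes into the training part, which keeps training fast. A minimal sketch of such a shuffled split is below; split_indices is a hypothetical helper, and the exact rounding and shuffling of sklearn's train_test_split (which rep.utils.train_test_split is assumed to follow) may differ.

```python
import random


def split_indices(n_samples, train_size=0.25, seed=None):
    """Hypothetical helper: shuffle row indices, put a train_size fraction into training."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random permutation of the rows
    n_train = int(train_size * n_samples)  # rounding down is an assumption
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(100, train_size=0.25, seed=0)
```

Every row lands in exactly one of the two parts, so no event is used both for training and for measuring quality.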

Neural nets

All nets inherit from sklearn.BaseEstimator and have the same interface as the other wrappers in REP (for details, see 01-howto-Classifiers).

Neural network libraries support:

classification
multi-classification
regression
multi-target regression
additional fitting (using the partial_fit method)

and don't support:

staged prediction methods
sample weights

Variables used in training

In [6]:

variables = list(data.columns[:15])

Theanets

In [7]:

from rep.estimators import TheanetsClassifier
print TheanetsClassifier.__doc__

Classifier from Theanets library. 

    Parameters:
    -----------
    :param features: list of features to train model
    :type features: None or list(str)
    :param layers: a sequence of values specifying the **hidden** layer configuration for the network.
        For more information please see 'Specifying layers' in theanets documentation:
        http://theanets.readthedocs.org/en/latest/creating.html#creating-specifying-layers
        Note that theanets "layers" parameter included input and output layers in the sequence as well.
    :type layers: sequence of int, tuple, dict
    :param int input_layer: size of the input layer. If equals -1, the size is taken from the training dataset
    :param int output_layer: size of the output layer. If equals -1, the size is taken from the training dataset
    :param str hidden_activation: the name of an activation function to use on hidden network layers by default
    :param str output_activation: the name of an activation function to use on the output layer by default
    :param float input_noise: standard deviation of desired noise to inject into input
    :param float hidden_noise: standard deviation of desired noise to inject into hidden unit activation output
    :param input_dropouts: proportion of input units to randomly set to 0
    :type input_dropouts: float in [0, 1]
    :param hidden_dropouts: proportion of hidden unit activations to randomly set to 0
    :type hidden_dropouts: float in [0, 1]
    :param decode_from: any of the hidden layers can be tapped at the output. Just specify a value greater than
        1 to tap the last N hidden layers. The default is 1, which decodes from just the last layer
    :type decode_from: positive int
    :param scaler: scaler used to transform data. If False, scaling will not be used
    :type scaler: str or sklearn-like transformer or False (do not scale features)
    :param trainers: parameters to specify training algorithm(s)
        example: [{'optimize': sgd, 'momentum': 0.2}, {'optimize': 'nag'}]
    :type trainers: list[dict] or None
    :param int random_state: random seed


    For more information on available trainers and their parameters, see this page
    http://theanets.readthedocs.org/en/latest/training.html
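The docstring notes that REP's layers lists only the hidden layers, while theanets' own layers sequence also includes the input and output layers, whose sizes are taken from the data when input_layer and output_layer equal -1. The hypothetical helper below only illustrates that bookkeeping for integer layer sizes; it is not REP's code, and it assumes one output unit per class.

```python
def full_layer_spec(hidden_layers, n_features, n_classes, input_layer=-1, output_layer=-1):
    """Hypothetical helper: expand a REP-style list of hidden layer sizes into a
    full input -> hidden -> output size list, like theanets' own layers argument."""
    # -1 means "take the size from the training data", mirroring the docstring
    n_inputs = n_features if input_layer == -1 else input_layer
    n_outputs = n_classes if output_layer == -1 else output_layer
    return [n_inputs] + list(hidden_layers) + [n_outputs]


# the 15 training variables and two classes used in this notebook
spec = full_layer_spec([7], n_features=15, n_classes=2)
```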

Simple training

In [8]:

tn = TheanetsClassifier(features=variables, layers=[7], 
                        trainers=[{'optimize': 'nag', 'learning_rate': 0.1, 'min_improvement': 0.1}])

tn.fit(train_data, train_labels)
pass

Predicting probabilities, measuring the quality

In [9]:

prob = tn.predict_proba(test_data)
print prob

[[ 0.26320391  0.73679609]
 [ 0.81044349  0.18955651]
 [ 0.40544071  0.59455929]
 ..., 
 [ 0.90087309  0.09912691]
 [ 0.86900052  0.13099948]
 [ 0.90821799  0.09178201]]

In [10]:

print 'ROC AUC', roc_auc_score(test_labels, prob[:, 1])

ROC AUC 0.843440299528
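ROC AUC can be read as the probability that a randomly chosen signal event gets a higher score than a randomly chosen background event (ties count as one half), so 0.5 corresponds to a random ordering and 1.0 to a perfect one. The pure-Python sketch below computes it through the rank-sum (Mann-Whitney U) formula with average ranks for ties; it is meant to explain the number, not to replace sklearn's roc_auc_score.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # find the run of equal scores starting at position i
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for t in labels if t == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError('both classes are needed to compute ROC AUC')
    rank_sum_pos = sum(r for r, t in zip(ranks, labels) if t == 1)
    # U statistic of the positives, normalised by the number of (pos, neg) pairs
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

For example, roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]) gives 0.75: three of the four (signal, background) pairs are ordered correctly.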

Theanets multistage training

In some cases we need to continue training, e.g. when we have new data or the current trainer is no longer efficient.

For this purpose there is the partial_fit method, which lets you continue training with a different trainer or on different data.
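The contract behind fit and partial_fit can be shown on a toy model: fit always starts from freshly initialised weights, while partial_fit continues from the current ones, possibly with different training settings. The sketch below is a hypothetical one-layer logistic model trained by plain SGD, not theanets; switching the learning rate in partial_fit stands in for switching to a different trainer.

```python
import math


class ToyLogisticModel(object):
    """Hypothetical one-layer model illustrating fit() vs partial_fit()."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weights = None  # one weight per feature, last entry is the bias

    def _reset(self, n_features):
        self.weights = [0.0] * (n_features + 1)

    def predict_proba_one(self, x):
        # zip stops at len(x), so the bias (last weight) is added separately
        z = self.weights[-1] + sum(w * v for w, v in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def _sgd(self, X, y, epochs):
        for _ in range(epochs):
            for x, target in zip(X, y):
                error = self.predict_proba_one(x) - target  # d(log-loss)/dz
                for j, v in enumerate(x):
                    self.weights[j] -= self.learning_rate * error * v
                self.weights[-1] -= self.learning_rate * error

    def fit(self, X, y, epochs=5):
        self._reset(len(X[0]))  # fit() always starts from scratch
        self._sgd(X, y, epochs)
        return self

    def partial_fit(self, X, y, epochs=5, learning_rate=None):
        if self.weights is None:  # allow partial_fit on an unfitted model
            self._reset(len(X[0]))
        if learning_rate is not None:  # e.g. switch to a different step size
            self.learning_rate = learning_rate
        self._sgd(X, y, epochs)  # continues from the current weights
        return self


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
model = ToyLogisticModel().fit(X, y)
after_fit = list(model.weights)
model.partial_fit(X, y, learning_rate=0.05)  # second stage, different settings
after_partial = list(model.weights)
refit = list(ToyLogisticModel().fit(X, y).weights)  # a fresh fit() ignores history
```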

In [11]:

tn = TheanetsClassifier(features=variables, layers=[10, 10], 
                        trainers=[{'algo': 'rprop', 'min_improvement': 0.1}])

tn.fit(train_data, train_labels)
print('training complete')

training complete

Second stage of fitting

In [12]:

tn.partial_fit(train_data, train_labels, algo='adagrad', min_improvement=0.1)
print('training complete')

training complete

In [13]:

# predict probabilities for each class
prob = tn.predict_proba(test_data)
print prob

[[ 0.24486897  0.75513103]
 [ 0.78883091  0.21116909]
 [ 0.47429026  0.52570974]
 ..., 
 [ 0.90560846  0.09439154]
 [ 0.88662219  0.11337781]
 [ 0.9052761   0.0947239 ]]

In [14]:

print 'ROC AUC', roc_auc_score(test_labels, prob[:, 1])

ROC AUC 0.844713853906

Predictions of classes

In [15]:

tn.predict(test_data)

Out[15]:

array([1, 0, 1, ..., 0, 0, 0])
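The labels above are consistent with taking, for every event, the class with the highest predicted probability (compare them with the first rows printed by predict_proba after the second stage of fitting). Assuming that is how the wrappers derive predict from predict_proba, a minimal sketch is:

```python
def predict_from_proba(proba):
    """For each row of class probabilities, return the index of the most probable class.
    Ties go to the lower class index, because max() keeps the first maximum."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in proba]


# first three rows printed by predict_proba after the second stage of fitting
first_rows = [[0.24486897, 0.75513103],
              [0.78883091, 0.21116909],
              [0.47429026, 0.52570974]]
predicted = predict_from_proba(first_rows)
```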

Neurolab

In [16]:

from rep.estimators import NeurolabClassifier
print NeurolabClassifier.__doc__

Classifier from neurolab library. 

    Parameters:
    -----------
    :param features: features used in training
    :type features: list[str] or None
    :param list[int] layers: sequence, number of units inside each **hidden** layer.
    :param string net_type: type of network
        One of 'feed-forward', 'single-layer', 'competing-layer', 'learning-vector',
        'elman-recurrent', 'hopfield-recurrent', 'hemming-recurrent'
    :param initf: layer initializers
    :type initf: anything implementing call(layer), e.g. nl.init.* or list[nl.init.*] of shape [n_layers]
    :param trainf: net train function, default value depends on type of network
    :param scaler: transformer to apply to the input objects
    :type scaler: str or sklearn-like transformer or False (do not scale features)
    :param random_state: ignored, added for uniformity.
    :param dict kwargs: additional arguments to net __init__, varies with different net_types

    .. seealso:: https://pythonhosted.org/neurolab/lib.html for supported train functions and their parameters.

Let's train the network using the Rprop algorithm

In [17]:

import neurolab
nl = NeurolabClassifier(features=variables, layers=[10], epochs=5, trainf=neurolab.train.train_rprop)
nl.fit(train_data, train_labels)
print('training complete')

The maximum number of train epochs is reached
training complete

After training, you can still improve the network by calling partial_fit on other data:

nl.partial_fit(new_train_data, new_train_labels)

Predict probabilities and estimate quality

In [18]:

# predict probabilities for each class
prob = nl.predict_proba(test_data)
print prob

[[ 0.72909063  0.27090937]
 [ 0.73301084  0.26698916]
 [ 0.7261278   0.2738722 ]
 ..., 
 [ 0.72833376  0.27166624]
 [ 0.72881829  0.27118171]
 [ 0.72708209  0.27291791]]

In [19]:

print 'ROC AUC', roc_auc_score(test_labels, prob[:, 1])

ROC AUC 0.471191281317

In [20]:

# predict labels
nl.predict(test_data)

Out[20]:

array([0, 0, 0, ..., 0, 0, 0])

Pybrain

In [21]:

from rep.estimators import PyBrainClassifier
print PyBrainClassifier.__doc__

Implements classification from PyBrain library 

    Parameters:
    -----------
    :param features: features used in training.
    :type features: list[str] or None
    :param scaler: transformer to apply to the input objects
    :type scaler: str or sklearn-like transformer or False (do not scale features)
    :param bool use_rprop: flag to indicate whether we should use Rprop or SGD trainer
    :param bool verbose: print train/validation errors.
    :param random_state: ignored parameter, pybrain training isn't reproducible

    **Net parameters:**

    :param list[int] layers: indicate how many neurons in each hidden(!) layer; default is 1 hidden layer with 10 neurons
    :param list[str] hiddenclass: classes of the hidden layers; default is 'SigmoidLayer'
    :param dict params: other net parameters:
        bias and outputbias (boolean) flags to indicate whether the network should have the corresponding biases,
        both default to True;
        peepholes (boolean);
        recurrent (boolean) if the `recurrent` flag is set, a :class:`RecurrentNetwork` will be created,
        otherwise a :class:`FeedForwardNetwork`

    **Gradient descent trainer parameters:**

    :param float learningrate: gives the ratio of which parameters are changed into the direction of the gradient
    :param float lrdecay: the learning rate decreases by lrdecay, which is used to multiply the learning rate after each training step
    :param float momentum: the ratio by which the gradient of the last timestep is used
    :param boolean batchlearning: if set, the parameters are updated only at the end of each epoch. Default is False
    :param float weightdecay: corresponds to the weightdecay rate, where 0 is no weight decay at all

    **Rprop trainer parameters:**

    :param float etaminus: factor by which step width is decreased when overstepping (0.5)
    :param float etaplus: factor by which step width is increased when following gradient (1.2)
    :param float delta: step width for each weight
    :param float deltamin: minimum step width (1e-6)
    :param float deltamax: maximum step width (5.0)
    :param float delta0: initial step width (0.1)

    **Training termination parameters**

    :param int epochs: number of iterations of training; if < 0 then classifier trains until convergence
    :param int max_epochs: if is given, at most that many epochs are trained
    :param int continue_epochs: each time validation error decreases, try for continue_epochs epochs to find a better one
    :param float validation_proportion: the ratio of the dataset that is used for the validation dataset

    .. note::

        Details about parameters: http://pybrain.org/docs/
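Both the neurolab example (train_rprop) and the PyBrain wrapper (use_rprop) rely on Rprop, whose parameters are listed above. Rprop ignores the gradient magnitude and adapts a separate step width per weight: the step grows by etaplus while the gradient keeps its sign and shrinks by etaminus when the sign flips, clipped to [deltamin, deltamax]. The sketch below implements the simplified iRprop- variant (no weight backtracking) on plain Python lists. It illustrates the update rule only and is not the code used by either library; the default values are taken from the docstring above.

```python
def rprop_update(weights, grads, prev_grads, steps,
                 etaplus=1.2, etaminus=0.5, deltamin=1e-6, deltamax=5.0):
    """One iRprop- step; returns (new_weights, new_prev_grads, new_steps)."""
    new_w, new_prev, new_steps = [], [], []
    for w, g, g_prev, step in zip(weights, grads, prev_grads, steps):
        if g * g_prev > 0:
            # same sign as last time: accelerate
            step = min(step * etaplus, deltamax)
        elif g * g_prev < 0:
            # sign flipped, so we overstepped a minimum: slow down and skip this update
            step = max(step * etaminus, deltamin)
            g = 0.0
        # move against the gradient by the step width, ignoring the gradient magnitude
        if g > 0:
            w -= step
        elif g < 0:
            w += step
        new_w.append(w)
        new_prev.append(g)
        new_steps.append(step)
    return new_w, new_prev, new_steps


# usage: minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w, prev, steps = [0.0], [0.0], [0.1]  # delta0 = 0.1
for _ in range(200):
    grads = [2.0 * (w[0] - 3.0)]
    w, prev, steps = rprop_update(w, grads, prev, steps)
w_final = w[0]
```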

In [22]:

pb = PyBrainClassifier(features=variables, layers=[5], epochs=2, hiddenclass=['TanhLayer'])
pb.fit(train_data, train_labels)
print('training complete')

training complete

Predict probabilities and estimate quality

Again, we could continue training on a new dataset:

pb.partial_fit(new_train_data, new_train_labels)

In [23]:

prob = pb.predict_proba(test_data)
print 'ROC AUC:', roc_auc_score(test_labels, prob[:, 1])

ROC AUC: 0.856107824713

Predict labels

In [24]:

pb.predict(test_data)

Out[24]:

array([1, 0, 1, ..., 0, 0, 0])

Scaling of features

Initial prescaling of features is frequently crucial for getting reasonable results with neural networks.

By default, all the networks use StandardScaler from sklearn, but you can use any other transformer, e.g. MinMaxScaler or a self-written one, by passing it as the scaler parameter. All the networks support the scaler parameter in the same way.
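Whatever scaler is used, its parameters are estimated on the training data and then reused unchanged for new data. Below is a pure-Python sketch of the two scalings mentioned above, applied to one feature column; it assumes the population standard deviation (as sklearn's StandardScaler uses), and the helper names are hypothetical.

```python
def fit_standard(column):
    """Mean and population standard deviation of a training column."""
    mean = sum(column) / float(len(column))
    std = (sum((v - mean) ** 2 for v in column) / float(len(column))) ** 0.5
    return mean, std


def apply_standard(column, params):
    mean, std = params
    # a constant training column (std == 0) is mapped to 0
    return [(v - mean) / std if std > 0 else 0.0 for v in column]


def fit_minmax(column):
    """Minimum and maximum of a training column."""
    return min(column), max(column)


def apply_minmax(column, params):
    lo, hi = params
    span = hi - lo
    return [(v - lo) / span if span > 0 else 0.0 for v in column]


train_column = [1.0, 2.0, 3.0]
test_column = [4.0]
minmax_params = fit_minmax(train_column)  # learned on training data only
scaled_test = apply_minmax(test_column, minmax_params)
```

Note that test values outside the training range end up outside [0, 1] after min-max scaling; here 4.0 is mapped to 1.5.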

In [25]:

from sklearn.preprocessing import MinMaxScaler
# will use StandardScaler
NeurolabClassifier(scaler='standard')
# will use MinMaxScaler
NeurolabClassifier(scaler=MinMaxScaler())
# will not use any pretransformation of features
NeurolabClassifier(scaler=False)

Out[25]:

NeurolabClassifier(initf=<function init_rand at 0x112431f50>, layers=[10],
          net_type='feed-forward', random_state=None, scaler=False,
          trainf=None)

Advantages of common interface

Let's build an ensemble of neural networks using the bagging meta-algorithm.

Bagging over Theanets classifier

It is well known that the classification quality of a single neural network can be significantly improved by ensembling.

In the simplest case, we average the predictions of several neural networks. Bagging trains several classifiers on random subsets of the training data, which usually gives higher quality and more stable predictions.

You can try the same trick with any other network, not only Theanets.
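The bagging idea described above fits in a few lines. In this sketch, each base model is trained on a bootstrap sample (as many draws with replacement as there are training events) and the ensemble averages the models' predicted probabilities, which is in the spirit of sklearn.ensemble.BaggingClassifier with its default settings. The one-dimensional base learner fit_midpoint_stump is a hypothetical stand-in for a neural network.

```python
import random


def fit_midpoint_stump(X, y):
    """Hypothetical 1-D base learner: threshold halfway between the two class means."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    if not pos or not neg:
        # a bootstrap sample may contain only one class: predict its frequency
        rate = float(len(pos)) / len(y)
        return lambda x: rate
    mean_pos = sum(pos) / float(len(pos))
    mean_neg = sum(neg) / float(len(neg))
    threshold = (mean_pos + mean_neg) / 2.0
    direction = 1.0 if mean_pos > mean_neg else -1.0
    return lambda x: 1.0 if direction * (x - threshold) > 0 else 0.0


def bagging_fit(X, y, fit_base, n_estimators=10, seed=0):
    """Train n_estimators base models, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(fit_base([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict_proba(models, x):
    """Ensemble probability of class 1: the average over all base models."""
    return sum(model(x) for model in models) / float(len(models))


X = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]
y = [0, 0, 0, 0, 1, 1, 1, 1]
models = bagging_fit(X, y, fit_midpoint_stump, n_estimators=10)
```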

In [26]:

# Uncomment the code below to try it; this may take a long time.

# from sklearn.ensemble import BaggingClassifier
# base_tn = TheanetsClassifier(layers=[10, 7], trainers=[{'algo': 'adadelta'}])
# bagging_tn = BaggingClassifier(base_estimator=base_tn, n_estimators=10)
# bagging_tn.fit(train_data[variables], train_labels)
# prob = bagging_tn.predict_proba(test_data[variables])
# print 'AUC', roc_auc_score(test_labels, prob[:, 1])

Other advantages of common interface

There are many things you can do with neural networks now:

cloning
getting / setting parameters as dictionaries
using grid search to tune the sizes of hidden layers and other parameters
building pipelines (sklearn.pipeline)
using hierarchical training, training on subsets
passing classifiers over the network / training classifiers on other machines / distributed learning of ensembles

And you can replace classifiers at any moment.
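These features rely on the sklearn convention that every constructor argument is exposed through get_params / set_params. The sketch below uses a hypothetical ToyNetClassifier (not a REP class) to show how cloning and a small grid over hidden-layer sizes follow from that convention; the clone function only approximates sklearn.base.clone, which also clones nested parameters.

```python
import itertools


class ToyNetClassifier(object):
    """Hypothetical estimator following the sklearn convention: every
    constructor argument is stored as an attribute of the same name."""

    def __init__(self, layers=(10,), epochs=5, scaler='standard'):
        self.layers = layers
        self.epochs = epochs
        self.scaler = scaler

    def get_params(self, deep=True):
        return {'layers': self.layers, 'epochs': self.epochs, 'scaler': self.scaler}

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self.get_params():
                raise ValueError('unknown parameter: %s' % name)
            setattr(self, name, value)
        return self


def clone(estimator):
    # roughly sklearn.base.clone: a fresh, unfitted copy with equal parameters
    return type(estimator)(**estimator.get_params(deep=False))


def parameter_grid(grid):
    # every combination of the candidate values, as in a grid search
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


base = ToyNetClassifier(layers=(7,))
candidates = [clone(base).set_params(**params)
              for params in parameter_grid({'layers': [(5,), (10, 10)], 'epochs': [5, 20]})]
```

Because each candidate is a clone, tuning never modifies the original base estimator.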
