This notebook demonstrates the neural network (NN) classifiers provided by the Reproducible Experiment Platform (REP) package.
REP contains wrappers for the following NN libraries: theanets, neurolab, and pybrain.
Most of this is done in the same way as for other classifiers (see notebook 01-howto-Classifiers.ipynb).
The parameters used here were chosen to make training very fast; they are far from optimal.
!cd toy_datasets; wget -O MiniBooNE_PID.txt -nc --no-check-certificate https://archive.ics.uci.edu/ml/machine-learning-databases/00199/MiniBooNE_PID.txt
File `MiniBooNE_PID.txt' already there; not retrieving.
import numpy, pandas
from rep.utils import train_test_split
from sklearn.metrics import roc_auc_score
data = pandas.read_csv('toy_datasets/MiniBooNE_PID.txt', sep='\s*', skiprows=[0], header=None, engine='python')
labels = pandas.read_csv('toy_datasets/MiniBooNE_PID.txt', sep=' ', nrows=1, header=None)
labels = [1] * labels[1].values[0] + [0] * labels[2].values[0]
data.columns = ['feature_{}'.format(key) for key in data.columns]
len(data)
130064
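The label construction above relies on the layout of the file: the code reads two integers from the first line and treats them as the number of signal rows followed by the number of background rows, with all signal rows stored first. A minimal sketch of the same logic on a made-up header line (the counts 3 and 5 are illustrative and are not taken from the dataset):

```python
# Toy illustration of how a "counts" header line is expanded into labels.
# The header string below is made up; the real file has much larger counts.
header_line = " 3 5"

# Splitting on whitespace yields the two counts: signal first, then background.
n_signal, n_background = [int(token) for token in header_line.split()]

# Signal rows are assumed to come first, so their labels (1) come first too.
labels = [1] * n_signal + [0] * n_background

print(labels)
print(len(labels) == n_signal + n_background)
```

The same consistency check applies to the real data: the two counts should add up to the number of rows reported by len(data).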
data[:5]
| | feature_0 | feature_1 | feature_2 | feature_3 | feature_4 | feature_5 | feature_6 | feature_7 | feature_8 | feature_9 | ... | feature_40 | feature_41 | feature_42 | feature_43 | feature_44 | feature_45 | feature_46 | feature_47 | feature_48 | feature_49 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2.59413 | 0.468803 | 20.6916 | 0.322648 | 0.009682 | 0.374393 | 0.803479 | 0.896592 | 3.59665 | 0.249282 | ... | 101.174 | -31.3730 | 0.442259 | 5.86453 | 0.000000 | 0.090519 | 0.176909 | 0.457585 | 0.071769 | 0.245996 |
1 | 3.86388 | 0.645781 | 18.1375 | 0.233529 | 0.030733 | 0.361239 | 1.069740 | 0.878714 | 3.59243 | 0.200793 | ... | 186.516 | 45.9597 | -0.478507 | 6.11126 | 0.001182 | 0.091800 | -0.465572 | 0.935523 | 0.333613 | 0.230621 |
2 | 3.38584 | 1.197140 | 36.0807 | 0.200866 | 0.017341 | 0.260841 | 1.108950 | 0.884405 | 3.43159 | 0.177167 | ... | 129.931 | -11.5608 | -0.297008 | 8.27204 | 0.003854 | 0.141721 | -0.210559 | 1.013450 | 0.255512 | 0.180901 |
3 | 4.28524 | 0.510155 | 674.2010 | 0.281923 | 0.009174 | 0.000000 | 0.998822 | 0.823390 | 3.16382 | 0.171678 | ... | 163.978 | -18.4586 | 0.453886 | 2.48112 | 0.000000 | 0.180938 | 0.407968 | 4.341270 | 0.473081 | 0.258990 |
4 | 5.93662 | 0.832993 | 59.8796 | 0.232853 | 0.025066 | 0.233556 | 1.370040 | 0.787424 | 3.66546 | 0.174862 | ... | 229.555 | 42.9600 | -0.975752 | 2.66109 | 0.000000 | 0.170836 | -0.814403 | 4.679490 | 1.924990 | 0.253893 |
5 rows × 50 columns
# Get train and test data
train_data, test_data, train_labels, test_labels = train_test_split(data, labels, train_size=0.25)
All nets inherit from sklearn.BaseEstimator and have the same interface as the other wrappers in REP (see 01-howto-Classifiers for details).
Neural network libraries support:
- continued training (the partial_fit method)

and don't support:
variables = list(data.columns[:15])
from rep.estimators import TheanetsClassifier
print TheanetsClassifier.__doc__
Classifier from Theanets library.

Parameters:
-----------
:param features: list of features to train model
:type features: None or list(str)
:param layers: a sequence of values specifying the **hidden** layer configuration for the network.
    For more information please see 'Specifying layers' in theanets documentation:
    http://theanets.readthedocs.org/en/latest/creating.html#creating-specifying-layers
    Note that theanets "layers" parameter included input and output layers in the sequence as well.
:type layers: sequence of int, tuple, dict
:param int input_layer: size of the input layer. If equals -1, the size is taken from the training dataset
:param int output_layer: size of the output layer. If equals -1, the size is taken from the training dataset
:param str hidden_activation: the name of an activation function to use on hidden network layers by default
:param str output_activation: the name of an activation function to use on the output layer by default
:param float input_noise: standard deviation of desired noise to inject into input
:param float hidden_noise: standard deviation of desired noise to inject into hidden unit activation output
:param input_dropouts: proportion of input units to randomly set to 0
:type input_dropouts: float in [0, 1]
:param hidden_dropouts: proportion of hidden unit activations to randomly set to 0
:type hidden_dropouts: float in [0, 1]
:param decode_from: any of the hidden layers can be tapped at the output. Just specify a value greater than 1 to tap the last N hidden layers. The default is 1, which decodes from just the last layer
:type decode_from: positive int
:param scaler: scaler used to transform data. If False, scaling will not be used
:type scaler: str or sklearn-like transformer or False (do not scale features)
:param trainers: parameters to specify training algorithm(s)
    example: [{'optimize': sgd, 'momentum': 0.2}, {'optimize': 'nag'}]
:type trainers: list[dict] or None
:param int random_state: random seed

For more information on available trainers and their parameters, see this page
http://theanets.readthedocs.org/en/latest/training.html
tn = TheanetsClassifier(features=variables, layers=[7],
                        trainers=[{'optimize': 'nag', 'learning_rate': 0.1, 'min_improvement': 0.1}])
tn.fit(train_data, train_labels)
pass
prob = tn.predict_proba(test_data)
print prob
[[ 0.26320391 0.73679609] [ 0.81044349 0.18955651] [ 0.40544071 0.59455929] ..., [ 0.90087309 0.09912691] [ 0.86900052 0.13099948] [ 0.90821799 0.09178201]]
print 'ROC AUC', roc_auc_score(test_labels, prob[:, 1])
ROC AUC 0.843440299528
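ROC AUC, the metric used throughout this notebook, can be read as the probability that a randomly chosen signal event receives a higher score than a randomly chosen background event, with ties counted as one half. The pure-Python sketch below computes it that way on made-up scores; it illustrates the definition and is not the algorithm sklearn uses internally.

```python
def roc_auc_pairwise(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0   # pair ranked correctly
            elif p == n:
                correct += 0.5   # a tie counts as half
    return correct / (len(positives) * len(negatives))


# Made-up labels and scores, used only for illustration.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.4, 0.6, 0.1]
auc = roc_auc_pairwise(toy_labels, toy_scores)
print(auc)
```

A value of 0.5 corresponds to a random ranking and 1.0 to perfect separation of the two classes.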
In some cases we need to continue training, e.g. when new data arrives or the current trainer is no longer effective.
For this purpose there is the partial_fit method, which lets you continue training with a different trainer or different data.
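To make the difference between fit and partial_fit concrete, here is a minimal sketch built around a hypothetical one-feature logistic model. ToyLogisticModel and its parameters are invented for this illustration and are not part of REP or theanets; the sketch also assumes the usual sklearn convention that fit restarts from fresh parameters while partial_fit continues from the current ones.

```python
import math


class ToyLogisticModel(object):
    """Hypothetical one-feature logistic model; it only illustrates partial_fit."""

    def __init__(self, learning_rate=0.1, epochs=5):
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.weight, self.bias = 0.0, 0.0

    def _predict_one(self, x):
        return 1.0 / (1.0 + math.exp(-(self.weight * x + self.bias)))

    def _train(self, xs, ys, learning_rate, epochs):
        # Plain stochastic gradient descent on the log-loss.
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                error = self._predict_one(x) - y
                self.weight -= learning_rate * error * x
                self.bias -= learning_rate * error

    def fit(self, xs, ys):
        # fit starts from freshly initialised parameters.
        self.weight, self.bias = 0.0, 0.0
        self._train(xs, ys, self.learning_rate, self.epochs)
        return self

    def partial_fit(self, xs, ys, learning_rate=None, epochs=None):
        # partial_fit keeps the current parameters; settings may be overridden.
        self._train(xs, ys,
                    self.learning_rate if learning_rate is None else learning_rate,
                    self.epochs if epochs is None else epochs)
        return self

    def log_loss(self, xs, ys):
        total = 0.0
        for x, y in zip(xs, ys):
            p = self._predict_one(x)
            total -= math.log(p) if y == 1 else math.log(1.0 - p)
        return total / len(xs)


# Made-up, linearly separable toy data.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

model = ToyLogisticModel().fit(xs, ys)
loss_after_fit = model.log_loss(xs, ys)

# Continue training from the current state with different trainer settings.
model.partial_fit(xs, ys, learning_rate=0.5, epochs=50)
loss_after_partial_fit = model.log_loss(xs, ys)

print(loss_after_partial_fit < loss_after_fit)
```

The point of the sketch is only that the second call builds on the state left by the first one instead of discarding it.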
tn = TheanetsClassifier(features=variables, layers=[10, 10],
                        trainers=[{'algo': 'rprop', 'min_improvement': 0.1}])
tn.fit(train_data, train_labels)
print('training complete')
training complete
tn.partial_fit(train_data, train_labels, **{'algo': 'adagrad', 'min_improvement': 0.1})
print('training complete')
training complete
# predict probabilities for each class
prob = tn.predict_proba(test_data)
print prob
[[ 0.24486897 0.75513103] [ 0.78883091 0.21116909] [ 0.47429026 0.52570974] ..., [ 0.90560846 0.09439154] [ 0.88662219 0.11337781] [ 0.9052761 0.0947239 ]]
print 'ROC AUC', roc_auc_score(test_labels, prob[:, 1])
ROC AUC 0.844713853906
tn.predict(test_data)
array([1, 0, 1, ..., 0, 0, 0])
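For a two-class problem, the labels returned by predict are consistent with picking the most probable class in each row of predict_proba. The sketch below applies that rule to the first three probability rows printed above; treating predict as an argmax over predict_proba is an assumption about the wrapper's behaviour, although it agrees with the printed output.

```python
# First three rows of the predict_proba output shown above.
prob_rows = [
    [0.24486897, 0.75513103],
    [0.78883091, 0.21116909],
    [0.47429026, 0.52570974],
]

# For each row, take the index (class label) of the largest probability.
predicted = [max(range(len(row)), key=row.__getitem__) for row in prob_rows]
print(predicted)
```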
from rep.estimators import NeurolabClassifier
print NeurolabClassifier.__doc__
Classifier from neurolab library.

Parameters:
-----------
:param features: features used in training
:type features: list[str] or None
:param list[int] layers: sequence, number of units inside each **hidden** layer.
:param string net_type: type of network
    One of 'feed-forward', 'single-layer', 'competing-layer', 'learning-vector',
    'elman-recurrent', 'hopfield-recurrent', 'hemming-recurrent'
:param initf: layer initializers
:type initf: anything implementing call(layer), e.g. nl.init.* or list[nl.init.*] of shape [n_layers]
:param trainf: net train function, default value depends on type of network
:param scaler: transformer to apply to the input objects
:type scaler: str or sklearn-like transformer or False (do not scale features)
:param random_state: ignored, added for uniformity.
:param dict kwargs: additional arguments to net __init__, varies with different net_types

.. seealso:: https://pythonhosted.org/neurolab/lib.html for supported train functions and their parameters.
import neurolab
nl = NeurolabClassifier(features=variables, layers=[10], epochs=5, trainf=neurolab.train.train_rprop)
nl.fit(train_data, train_labels)
print('training complete')
The maximum number of train epochs is reached
training complete
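After training, the network can still be trained further on a new dataset with partial_fit (new_train_data and new_train_labels are placeholders for your own data):
nl.partial_fit(new_train_data, new_train_labels)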
# predict probabilities for each class
prob = nl.predict_proba(test_data)
print prob
[[ 0.72909063 0.27090937] [ 0.73301084 0.26698916] [ 0.7261278 0.2738722 ] ..., [ 0.72833376 0.27166624] [ 0.72881829 0.27118171] [ 0.72708209 0.27291791]]
print 'ROC AUC', roc_auc_score(test_labels, prob[:, 1])
ROC AUC 0.471191281317
# predict labels
nl.predict(test_data)
array([0, 0, 0, ..., 0, 0, 0])
from rep.estimators import PyBrainClassifier
print PyBrainClassifier.__doc__
Implements classification from PyBrain library

Parameters:
-----------
:param features: features used in training.
:type features: list[str] or None
:param scaler: transformer to apply to the input objects
:type scaler: str or sklearn-like transformer or False (do not scale features)
:param bool use_rprop: flag to indicate whether we should use Rprop or SGD trainer
:param bool verbose: print train/validation errors.
:param random_state: ignored parameter, pybrain training isn't reproducible

**Net parameters:**

:param list[int] layers: indicate how many neurons in each hidden(!) layer; default is 1 hidden layer with 10 neurons
:param list[str] hiddenclass: classes of the hidden layers; default is 'SigmoidLayer'
:param dict params: other net parameters:
    bias and outputbias (boolean) flags to indicate whether the network should have the corresponding biases, both default to True;
    peepholes (boolean);
    recurrent (boolean) if the `recurrent` flag is set, a :class:`RecurrentNetwork` will be created, otherwise a :class:`FeedForwardNetwork`

**Gradient descent trainer parameters:**

:param float learningrate: gives the ratio of which parameters are changed into the direction of the gradient
:param float lrdecay: the learning rate decreases by lrdecay, which is used to multiply the learning rate after each training step
:param float momentum: the ratio by which the gradient of the last timestep is used
:param boolean batchlearning: if set, the parameters are updated only at the end of each epoch. Default is False
:param float weightdecay: corresponds to the weightdecay rate, where 0 is no weight decay at all

**Rprop trainer parameters:**

:param float etaminus: factor by which step width is decreased when overstepping (0.5)
:param float etaplus: factor by which step width is increased when following gradient (1.2)
:param float delta: step width for each weight
:param float deltamin: minimum step width (1e-6)
:param float deltamax: maximum step width (5.0)
:param float delta0: initial step width (0.1)

**Training termination parameters**

:param int epochs: number of iterations of training; if < 0 then classifier trains until convergence
:param int max_epochs: if is given, at most that many epochs are trained
:param int continue_epochs: each time validation error decreases, try for continue_epochs epochs to find a better one
:param float validation_proportion: the ratio of the dataset that is used for the validation dataset

.. note:: Details about parameters: http://pybrain.org/docs/
pb = PyBrainClassifier(features=variables, layers=[5], epochs=2, hiddenclass=['TanhLayer'])
pb.fit(train_data, train_labels)
print('training complete')
training complete
Again, we could continue training on a new dataset (new_train_data and new_train_labels are placeholders for your own data):
pb.partial_fit(new_train_data, new_train_labels)
prob = pb.predict_proba(test_data)
print 'ROC AUC:', roc_auc_score(test_labels, prob[:, 1])
ROC AUC: 0.856107824713
pb.predict(test_data)
array([1, 0, 1, ..., 0, 0, 0])
Initial prescaling of features is frequently crucial for getting reasonable results from neural networks.
By default, all the networks use StandardScaler from sklearn, but you can use any other transformer, say MinMaxScaler or a self-written one, by passing an appropriate value as scaler. All the networks support the scaler parameter in the same way.
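To show what the two scalers mentioned here do to a single feature, the sketch below reimplements standard scaling and min-max scaling in plain Python. The helper functions are written for this illustration only (they are not the sklearn classes), and the input is a rounded copy of feature_2 from the first five rows shown earlier.

```python
def standard_scale(column):
    """Shift to zero mean and divide by the population standard deviation."""
    mean = sum(column) / float(len(column))
    variance = sum((v - mean) ** 2 for v in column) / float(len(column))
    std = variance ** 0.5 or 1.0   # avoid dividing by zero for constant columns
    return [(v - mean) / std for v in column]


def minmax_scale(column):
    """Map the column linearly onto the interval [0, 1]."""
    low, high = min(column), max(column)
    span = (high - low) or 1.0     # avoid dividing by zero for constant columns
    return [(v - low) / span for v in column]


# Rounded values of feature_2 from the first five rows; note the outlier.
column = [20.7, 18.1, 36.1, 674.2, 59.9]

standardized = standard_scale(column)
rescaled = minmax_scale(column)

print([round(v, 3) for v in standardized])
print([round(v, 3) for v in rescaled])
```

With a heavy outlier like the fourth value, min-max scaling squeezes the remaining values close to zero, which is one reason the choice of scaler can matter for a network.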
from sklearn.preprocessing import MinMaxScaler
# will use StandardScaler
NeurolabClassifier(scaler='standard')
# will use MinMaxScaler
NeurolabClassifier(scaler=MinMaxScaler())
# will not use any pretransformation of features
NeurolabClassifier(scaler=False)
NeurolabClassifier(initf=<function init_rand at 0x112431f50>, layers=[10], net_type='feed-forward', random_state=None, scaler=False, trainf=None)
Let's build an ensemble of neural networks using the bagging meta-algorithm.
It is well known that the classification quality of a single neural network can be significantly improved by ensembling.
In the simplest case, we average the predictions of several neural networks. Bagging trains several classifiers on random subsets of the training data and thus achieves higher quality and more stable predictions.
You can try the same trick with any other network, not only Theanets.
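The idea behind bagging can be sketched without any neural network at all. In the illustration below each base learner is a deliberately trivial one-feature threshold model (a stand-in invented for this example, not a REP estimator), trained on a bootstrap sample drawn with replacement; the ensemble prediction is the average of the individual votes.

```python
import random


def fit_threshold(xs, ys):
    """Toy base learner: threshold halfway between the two class means."""
    ones = [x for x, y in zip(xs, ys) if y == 1]
    zeros = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(ones) / float(len(ones)) + sum(zeros) / float(len(zeros))) / 2.0


def bagging_fit(xs, ys, n_estimators=10, seed=0):
    """Train each base learner on a bootstrap sample of the training data."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    thresholds = []
    while len(thresholds) < n_estimators:
        sample = [rng.choice(indices) for _ in indices]
        sample_ys = [ys[i] for i in sample]
        if len(set(sample_ys)) < 2:
            continue  # skip degenerate samples that contain a single class
        thresholds.append(fit_threshold([xs[i] for i in sample], sample_ys))
    return thresholds


def bagging_predict_proba(thresholds, x):
    """Average the votes of all base learners into a probability of class 1."""
    votes = [1.0 if x > t else 0.0 for t in thresholds]
    return sum(votes) / len(votes)


# Made-up one-dimensional data: small values are class 0, large ones class 1.
xs = [0.1, 0.4, 0.5, 0.9, 1.1, 1.6, 1.8, 2.2]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

ensemble = bagging_fit(xs, ys, n_estimators=10, seed=0)
print(len(ensemble))
print(bagging_predict_proba(ensemble, 0.0))
print(bagging_predict_proba(ensemble, 3.0))
```

Points far from the class boundary get unanimous votes, while points near it receive intermediate probabilities because the bootstrap samples place the threshold slightly differently each time.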
# uncomment the code below to try it; this may take a long time
# from sklearn.ensemble import BaggingClassifier
# base_tn = TheanetsClassifier(layers=[10, 7], trainers=[{'algo': 'adadelta'}])
# bagging_tn = BaggingClassifier(base_estimator=base_tn, n_estimators=10)
# bagging_tn.fit(train_data[variables], train_labels)
# prob = bagging_tn.predict_proba(test_data[variables])
# print 'AUC', roc_auc_score(test_labels, prob[:, 1])
There are many things you can do with neural networks now:
- use grid_search, play with sizes of hidden layers and other parameters
- build pipelines (sklearn.pipeline)

And you can replace classifiers at any moment.
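A grid search simply evaluates every combination of candidate parameter values and keeps the best one. The sketch below enumerates a small grid over hidden layer sizes and learning rates; toy_score is a made-up stand-in for training a classifier and measuring, say, ROC AUC on held-out data, and the parameter names are illustrative rather than REP's grid-search API.

```python
import itertools


def toy_score(layers, learning_rate):
    """Made-up stand-in for a validation score such as ROC AUC."""
    # Hypothetical behaviour: best at 10 hidden units and learning rate 0.1.
    return 1.0 - abs(layers[0] - 10) / 100.0 - abs(learning_rate - 0.1)


# Candidate values for each parameter (illustrative choices).
param_grid = {
    'layers': [[5], [10], [20]],
    'learning_rate': [0.01, 0.1, 0.5],
}

# Build one dict of parameters per point of the grid.
names = sorted(param_grid)
candidates = [dict(zip(names, values))
              for values in itertools.product(*(param_grid[n] for n in names))]

# Keep the combination with the highest score.
best = max(candidates, key=lambda params: toy_score(**params))

print(len(candidates))
print(best)
```

In a real search the scoring function would train a network for each combination, so the cost grows with the product of the numbers of candidate values.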
Sklearn-compatible libraries you can use within REP: