Modifying and expanding the included TensorFlow modules

In [1]:
__author__ = "Christopher Potts"
__version__ = "CS224u, Stanford, Spring 2018 term"


This repository contains a small framework for defining models in TensorFlow. I hope the classes are easily extended. The goal of this notebook is to illustrate a few such extensions to try to convey the overall design.

The class structure for the relevant files looks like this:

  • tf_model_base.TfModelBase
    • tf_shallow_neural_classifier.TfShallowNeuralClassifier
    • tf_autoencoder.TfAutoencoder
    • tf_rnn_classifier.TfRNNClassifier

To define a new subclass of TfModelBase, you need only fill in build_graph, train_dict, and test_dict. The first defines the model's core computation graph, and the other two tell the class how to handle incoming data during training and testing.
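As a sketch of that contract, here is a hypothetical subclass skeleton. `StubModelBase` is a stand-in defined inline only so the snippet is self-contained; in the repository you would subclass `tf_model_base.TfModelBase` and fill these methods with real TensorFlow code.

```python
# Hypothetical skeleton showing the three methods a subclass fills in.
# `StubModelBase` stands in for `TfModelBase` so the snippet runs alone.

class StubModelBase:
    def build_graph(self):
        raise NotImplementedError
    def train_dict(self, X, y):
        raise NotImplementedError
    def test_dict(self, X):
        raise NotImplementedError

class SketchClassifier(StubModelBase):
    def build_graph(self):
        # In a real subclass: define placeholders and parameters here,
        # and store the core computation graph in `self.model`.
        self.inputs = "inputs placeholder"
        self.outputs = "outputs placeholder"
        self.model = "core computation graph"

    def train_dict(self, X, y):
        # How data is fed during training (features and labels):
        return {self.inputs: X, self.outputs: y}

    def test_dict(self, X):
        # No labels at test time:
        return {self.inputs: X}

mod = SketchClassifier()
mod.build_graph()
print(mod.train_dict("X", "y"))
```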

Incidentally, the pure NumPy classes

  • nn_model_base.NNModelBase
    • rnn_classifier.RNNClassifier
    • tree_nn.TreeNN

have a very similar design, and so they should be just as extendable. However, you have to write your own backpropagation methods for them, so they are more challenging in that respect.
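For intuition about what those hand-written backpropagation methods involve, here is a NumPy sketch (not code from the repository) of the gradient for a single softmax layer with cross-entropy loss, checked against a numerical estimate:

```python
import numpy as np

# A NumPy sketch (not from the repository) of a hand-written gradient:
# for logits z = xW and one-hot target y, the gradient of the
# cross-entropy loss with respect to z is softmax(z) - y, so the
# gradient with respect to W is the outer product x (softmax(z) - y).

def softmax(z):
    ez = np.exp(z - z.max())
    return ez / ez.sum()

rng = np.random.RandomState(42)
x = rng.randn(4)            # one input example
W = rng.randn(4, 3)         # weights
y = np.array([0., 1., 0.])  # one-hot label

probs = softmax(x @ W)
grad_W = np.outer(x, probs - y)   # analytic gradient of the loss wrt W

# Numerical check on one entry of W:
eps = 1e-6
def loss(W):
    return -np.log(softmax(x @ W)[1])
W_plus = W.copy(); W_plus[0, 0] += eps
W_minus = W.copy(); W_minus[0, 0] -= eps
num_grad = (loss(W_plus) - loss(W_minus)) / (2 * eps)
print(abs(num_grad - grad_W[0, 0]) < 1e-5)  # the two gradients agree
```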


In [2]:
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import tensorflow as tf
from tf_model_base import TfModelBase
from tf_rnn_classifier import TfRNNClassifier
from tf_shallow_neural_classifier import TfShallowNeuralClassifier

Basic experiments to illustrate the models

The following code is here just to facilitate testing. It's not part of the framework.

In [3]:
def sklearn_evaluation(X, y, mod, random_state=None):
    """No frills random train/test split evaluations."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=random_state)
    mod.fit(X_train, y_train)
    predictions = mod.predict(X_test)
    print(classification_report(y_test, predictions))

def artificial_evaluation(mod, random_state=42):
    """sklearn random classification dataset generation, 
    designed to be challenging."""
    # Dataset parameters chosen to match the reported outputs
    # (3 classes; 1000 examples, of which 330 land in the test split):
    X, y = make_classification(
        n_samples=1000, n_classes=3, n_informative=10,
        random_state=random_state)
    sklearn_evaluation(X, y, mod, random_state=random_state)

A basic softmax classifier

A simple extension of TfModelBase is a softmax classifier:

$$y = \textbf{softmax}(xW + b)$$

Really all we need to do is define the parameters and computation graph.

Note: self.model has to be used to define the final output, because functions in TfModelBase assume this.

In [4]:
class TfSoftmaxClassifier(TfModelBase):
    def build_graph(self):
        # Input and output placeholders
        self.inputs = tf.placeholder(
            tf.float32, shape=[None, self.input_dim])
        self.outputs = tf.placeholder(
            tf.float32, shape=[None, self.output_dim])
        # Parameters:
        self.W = tf.Variable(
            tf.zeros([self.input_dim, self.output_dim]))
        self.b = tf.Variable(
            tf.zeros([self.output_dim]))
        # The graph:        
        self.model = tf.matmul(self.inputs, self.W) + self.b
    def train_dict(self, X, y):
        return {self.inputs: X, self.outputs: y}
    def test_dict(self, X):
        return {self.inputs: X}    
In [5]:
artificial_evaluation(TfSoftmaxClassifier(max_iter=100))
Iteration 100: loss: 0.5050749778747559
             precision    recall  f1-score   support

          0       0.71      0.73      0.72       107
          1       0.76      0.66      0.71       118
          2       0.61      0.68      0.64       105

avg / total       0.69      0.69      0.69       330

Softmax with a better optimizer

In TfModelBase, the get_optimizer method returns a tf.train.GradientDescentOptimizer. To change this in TfSoftmaxClassifier, you can define a very small subclass.

Note: self.eta and self.cost are set by the base class. The first is a keyword parameter, and the second is an attribute that gets set inside fit, as the return value of get_cost_function.

In [6]:
class TfSoftmaxClassifierWithAdaGrad(TfSoftmaxClassifier):
    def get_optimizer(self):
        return tf.train.AdagradOptimizer(self.eta).minimize(self.cost)
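For intuition about what swapping in `AdagradOptimizer` buys, here is a NumPy sketch (a simplification, not the framework's code) of AdaGrad's update rule: each parameter gets its own effective learning rate, which shrinks as squared gradients accumulate.

```python
import numpy as np

# A simplified NumPy sketch of the AdaGrad update rule that
# `tf.train.AdagradOptimizer` implements: per-parameter learning rates
# scaled down by the accumulated squared gradients.

eta = 0.1
theta = np.array([1.0, 1.0])
accum = np.zeros(2)
eps = 1e-8

for _ in range(3):
    grad = 2 * theta                # gradient of f(theta) = ||theta||^2
    accum += grad ** 2              # running sum of squared gradients
    theta -= eta * grad / (np.sqrt(accum) + eps)

print(np.all(np.abs(theta) < 1.0))  # parameters moved toward the minimum
```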
In [7]:
artificial_evaluation(TfSoftmaxClassifierWithAdaGrad(max_iter=100))
Iteration 100: loss: 0.4801039993762975
             precision    recall  f1-score   support

          0       0.70      0.74      0.72       107
          1       0.79      0.69      0.74       118
          2       0.63      0.68      0.65       105

avg / total       0.71      0.70      0.70       330

Softmax with L2 regularization

It is very easy in TensorFlow to add L2 regularization to the cost function. You really just write it down the way it appears in textbooks!

In [8]:
class TfSoftmaxClassifierL2(TfSoftmaxClassifier):
    def __init__(self, C=1.0, **kwargs):
        """`C` is the inverse regularization strength."""
        self.C = 1.0 / C
        super(TfSoftmaxClassifierL2, self).__init__(**kwargs)
    def get_cost_function(self, **kwargs):
        reg = self.C * tf.nn.l2_loss(self.W)
        cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(
            logits=self.model, labels=self.outputs)
        return tf.reduce_mean(reg + cross_entropy)                
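The regularized cost above can be sketched numerically in NumPy; note that `tf.nn.l2_loss(W)` computes `sum(W**2) / 2`, and the penalty is scaled by `1/C` before being added to the cross-entropy term (the cross-entropy value here is a stand-in):

```python
import numpy as np

# A NumPy sketch of the regularized cost in `TfSoftmaxClassifierL2`.
# `tf.nn.l2_loss(W)` computes sum(W**2) / 2; the cross-entropy value
# below is a stand-in, just to show how the two terms combine.

rng = np.random.RandomState(0)
W = rng.randn(5, 3)
cross_entropy = 0.9        # stand-in per-example cross-entropy value
C = 4.0                    # inverse regularization strength

l2_loss = np.sum(W ** 2) / 2.0       # what tf.nn.l2_loss computes
cost = (1.0 / C) * l2_loss + cross_entropy
print(cost > cross_entropy)  # the penalty strictly increases the cost
```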
In [9]:
artificial_evaluation(TfSoftmaxClassifierL2(C=4, max_iter=100))
Iteration 100: loss: 0.5413660407066345
             precision    recall  f1-score   support

          0       0.71      0.72      0.71       107
          1       0.76      0.66      0.71       118
          2       0.61      0.69      0.65       105

avg / total       0.69      0.69      0.69       330

Shallow neural network with Dropout

In this case, we extend TfShallowNeuralClassifier (a subclass of TfModelBase) with an additional dropout layer.

Dropout is another form of regularization for neural networks: during each pass, a random selection of dimensions of the target layer are masked, to try to encourage other dimensions to bear some of the weight, and to avoid correlations between dimensions that could lead to over-fitting.

Here's a funny tweet about dropout that is surprisingly good at getting the point across.
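The masking-and-rescaling behavior of `tf.nn.dropout` can be sketched in NumPy: during training, each unit is kept with probability `keep_prob`, and kept units are scaled by `1 / keep_prob` so the layer's expected value is unchanged. This is a sketch of the idea, not the framework's code.

```python
import numpy as np

# A NumPy sketch of what `tf.nn.dropout` does at training time: each
# unit is kept with probability `keep_prob`, and kept units are scaled
# by 1 / keep_prob so the expected value of the layer is unchanged.

rng = np.random.RandomState(1)
hidden = np.ones((2, 10))     # a toy hidden layer of all 1s
keep_prob = 0.8

mask = rng.binomial(1, keep_prob, size=hidden.shape)
dropped = hidden * mask / keep_prob

# Every unit is now either exactly 0 (masked) or 1.25 (kept, rescaled):
print(sorted(set(dropped.ravel())))
```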

In [10]:
class TfShallowNeuralClassifierWithDropout(TfShallowNeuralClassifier):
    def __init__(self, hidden_dim=50, keep_prob=0.8, **kwargs):
        self.hidden_dim = hidden_dim
        self.keep_prob = keep_prob
        super(TfShallowNeuralClassifierWithDropout, self).__init__(**kwargs)        
    def build_graph(self):
        # All the parameters of `TfShallowNeuralClassifier`:
        self.define_parameters()
        # Same hidden layer:
        self.hidden = tf.nn.relu(
            tf.matmul(self.inputs, self.W_xh) + self.b_h)
        # Drop-out on the hidden layer:
        self.tf_keep_prob = tf.placeholder(tf.float32)
        dropout_layer = tf.nn.dropout(self.hidden, self.tf_keep_prob)
        # `dropout_layer` instead of `hidden` to define full model:
        self.model = tf.matmul(dropout_layer, self.W_hy) + self.b_y            
    def train_dict(self, X, y):
        return {self.inputs: X, self.outputs: y, 
                self.tf_keep_prob: self.keep_prob}
    def test_dict(self, X):
        # No dropout at test-time, hence `self.tf_keep_prob: 1.0`:
        return {self.inputs: X, self.tf_keep_prob: 1.0}
In [11]:
artificial_evaluation(TfShallowNeuralClassifierWithDropout(max_iter=1000))
Iteration 1000: loss: 0.16015665233135223
             precision    recall  f1-score   support

          0       0.77      0.74      0.75       107
          1       0.81      0.79      0.80       118
          2       0.69      0.73      0.71       105

avg / total       0.76      0.75      0.76       330

A bidirectional RNN Classifier

As a final example, let's change TfRNNClassifier into a bidirectional model that makes its softmax prediction based on the concatenation of the two final states that it computes. Here, we just need to redefine build_graph (and it's actually the same as the base class up to self.cell, where the two designs diverge).

In [12]:
class TfBidirectionalRNNClassifier(TfRNNClassifier):
    def build_graph(self):

        self.inputs = tf.placeholder(
            tf.int32, [None, self.max_length])

        self.ex_lengths = tf.placeholder(tf.int32, [None])

        # Outputs as usual:
        self.outputs = tf.placeholder(
            tf.float32, shape=[None, self.output_dim])

        # This converts the inputs to a list of lists of dense vector
        # representations:
        self.feats = tf.nn.embedding_lookup(
            self.embedding, self.inputs)

        # Same cell structure as the base class, but we have
        # forward and backward versions:
        self.cell_fw = tf.nn.rnn_cell.LSTMCell(
            self.hidden_dim, activation=self.hidden_activation)
        self.cell_bw = tf.nn.rnn_cell.LSTMCell(
            self.hidden_dim, activation=self.hidden_activation)

        # Run the RNN:
        outputs, finals = tf.nn.bidirectional_dynamic_rnn(
            self.cell_fw, self.cell_bw,
            self.feats,
            dtype=tf.float32,
            sequence_length=self.ex_lengths)

        # `finals` is a pair of `LSTMStateTuple` objects, which are
        # themselves pairs of Tensors (x, y), where y is the output state,
        # according to the TensorFlow documentation.
        # Thus, we want the second member of these pairs:
        last_fw, last_bw = finals          
        last_fw, last_bw = last_fw[1], last_bw[1]
        last = tf.concat((last_fw, last_bw), axis=1)
        self.feat_dim = self.hidden_dim * 2               

        # Softmax classifier on the final hidden state:
        self.W_hy = self.weight_init(
            self.feat_dim, self.output_dim, 'W_hy')
        self.b_y = self.bias_init(self.output_dim, 'b_y')
        self.model = tf.matmul(last, self.W_hy) + self.b_y
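The reason `feat_dim` doubles can be seen in a one-line NumPy sketch: concatenating the forward and backward final states along axis 1 gives each example a feature vector of length `2 * hidden_dim`.

```python
import numpy as np

# A NumPy sketch of the concatenation step in the bidirectional model:
# stacking the forward and backward final states along axis 1 doubles
# the feature dimension fed to the softmax layer.

batch_size, hidden_dim = 4, 6
last_fw = np.zeros((batch_size, hidden_dim))  # stand-in forward states
last_bw = np.ones((batch_size, hidden_dim))   # stand-in backward states

last = np.concatenate((last_fw, last_bw), axis=1)
print(last.shape)  # (4, 12), i.e., hidden_dim * 2 features per example
```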