Bake-off: Stanford Sentiment Treebank

In [1]:
__author__ = "Christopher Potts"
__version__ = "CS224u, Stanford, Spring 2018 term"

Overview

The goal of this in-class bake-off is to achieve the highest average F1 score on the SST development set, with the binary class function.

The only restriction: you cannot make any use of the subtree labels.

Bake-off submission

  1. A description of the model you created.
  2. The value of f1-score in the avg / total row of the classification report.

Submission URL: https://docs.google.com/forms/d/1R41Zxxils7lOPzuThMdv2p1TKmFEy8c0DyUg-YkzTa0/edit

Methodological note

You don't have to use the experimental framework defined below (based on sst). However, if you don't use sst.experiment as below, then make sure you're training only on train, evaluating on dev, and that you report with

from sklearn.metrics import classification_report
classification_report(y_dev, predictions)

where y_dev = [y for tree, y in sst.dev_reader(class_func=sst.binary_class_func)]

Set-up

See the first notebook in this unit for set-up instructions.

In [2]:
from collections import Counter
from rnn_classifier import RNNClassifier
from sklearn.linear_model import LogisticRegression
import sst
import tensorflow as tf
from tf_rnn_classifier import TfRNNClassifier
from tree_nn import TreeNN
/Applications/anaconda/envs/nlu/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

Baseline

In [3]:
def unigrams_phi(tree):
    """The basis for a unigrams feature function.
    
    Parameters
    ----------
    tree : nltk.tree
        The tree to represent.
    
    Returns
    -------    
    defaultdict
        A map from strings to their counts in `tree`. (Counter maps a 
        list to a dict of counts of the elements in that list.)
    
    """
    return Counter(tree.leaves())
In [4]:
def fit_maxent_classifier(X, y):        
    mod = LogisticRegression(fit_intercept=True)
    mod.fit(X, y)
    return mod
In [5]:
_ = sst.experiment(
    unigrams_phi,                      # Free to write your own!
    fit_maxent_classifier,             # Free to write your own!
    train_reader=sst.train_reader,     # Fixed by the competition.
    assess_reader=sst.dev_reader,      # Fixed.
    class_func=sst.binary_class_func)  # Fixed.
Accuracy: 0.772
             precision    recall  f1-score   support

   negative      0.783     0.741     0.761       428
   positive      0.762     0.802     0.782       444

avg / total      0.772     0.772     0.772       872

By the way, with some informal hyperparameter search on a GPU machine, I found this model

tf_rnn_glove = TfRNNClassifier(
    sst_glove_vocab,
    embedding=glove_embedding, ## 100d version
    hidden_dim=300,
    max_length=52,
    hidden_activation=tf.nn.relu,
    cell_class=tf.nn.rnn_cell.LSTMCell,
    train_embedding=True,
    max_iter=5000,
    batch_size=1028,
    eta=0.001)

which finished with almost identical performance to the above:

             precision    recall  f1-score   support

   negative       0.78      0.75      0.76       428
   positive       0.77      0.80      0.78       444

avg / total       0.77      0.77      0.77       872

TfRNNClassifier wrapper

In [6]:
def rnn_phi(tree):
    return tree.leaves()    
In [7]:
def fit_tf_rnn_classifier(X, y):
    vocab = sst.get_vocab(X, n_words=3000)
    mod = TfRNNClassifier(
        vocab, 
        eta=0.05,
        batch_size=2048,
        embed_dim=50,
        hidden_dim=50,
        max_length=52, 
        max_iter=500,
        cell_class=tf.nn.rnn_cell.LSTMCell,
        hidden_activation=tf.nn.tanh,
        train_embedding=True)
    mod.fit(X, y)
    return mod
In [8]:
_ = sst.experiment(
    rnn_phi,
    fit_tf_rnn_classifier, 
    vectorize=False,  # For deep learning, use `vectorize=False`.
    assess_reader=sst.dev_reader)
Iteration 500: loss: 2.5404394865036012
Accuracy: 0.615
             precision    recall  f1-score   support

   negative      0.571     0.869     0.689       428
   positive      0.745     0.369     0.494       444

avg / total      0.660     0.615     0.590       872

TreeNN wrapper

In [9]:
def tree_phi(tree):
    return tree
In [10]:
def fit_tree_nn_classifier(X, y):
    vocab = sst.get_vocab(X, n_words=3000)
    mod = TreeNN(
        vocab, 
        embed_dim=100, 
        max_iter=100)
    mod.fit(X, y)
    return mod
In [11]:
_ = sst.experiment(
    rnn_phi,
    fit_tree_nn_classifier, 
    vectorize=False,  # For deep learning, use `vectorize=False`.
    assess_reader=sst.dev_reader)
Finished epoch 100 of 100; error is 0.8351342778738807
Accuracy: 0.510
             precision    recall  f1-score   support

   negative      0.501     0.498     0.499       428
   positive      0.519     0.523     0.521       444

avg / total      0.510     0.510     0.510       872