In this assignment we implement and improve the bag-of-vectors classifier discussed in the class notes. This is a first step towards using neural network models.
import numpy as np
from a1 import data, featurize
from a1.dense_classifiers import SoftmaxClassifier
from a1.bag_vectors import BagOfVectors
# %load_ext autoreload
# %autoreload 2
train_data, test_data = data.polarity(verbose=True)
def x_list(data):
return [ example['x'].split(' ') for example in data]
def y_array(data):
return np.array([0 if example['y']=='neg' else 1 for example in data])
train_sents = x_list(train_data)
bag_vectors = BagOfVectors(train_sents, max_vocab=10000, dim=100)
X_train = bag_vectors.data_matrix(x_list(train_data))
X_test = bag_vectors.data_matrix(x_list(test_data))
y_train = y_array(train_data)
y_test = y_array(test_data)
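The BagOfVectors class is provided in a1.bag_vectors, so its internals are not shown here. As a rough mental model (an assumption, not the a1 implementation), each sentence can be represented as the average of fixed random vectors for its words; a minimal sketch:

```python
import numpy as np

# Illustrative sketch only: the real BagOfVectors lives in a1.bag_vectors,
# and its vocab handling may differ. Here each word gets a fixed random
# vector, and a sentence is the mean of its words' vectors.
rng = np.random.default_rng(0)
vocab = {"good": 0, "bad": 1, "movie": 2}
E = rng.normal(size=(len(vocab), 4))  # one random vector per word, dim=4

def sentence_vector(tokens):
    ids = [vocab[t] for t in tokens if t in vocab]
    if not ids:                          # no known words: zero vector
        return np.zeros(E.shape[1])
    return E[ids].mean(axis=0)           # average the word vectors

X = np.stack([sentence_vector(s)
              for s in [["good", "movie"], ["bad", "movie"]]])
```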
Implement softmax_loss_vectorized, DenseLinearClassifier.train, and DenseLinearClassifier.predict in dense_classifiers.py.
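The exact signature expected in dense_classifiers.py is defined by the assignment skeleton; as a sketch under that caveat, a fully vectorized softmax loss with gradient typically looks like this:

```python
import numpy as np

def softmax_loss_vectorized(W, X, y, reg=0.0):
    """Sketch of a vectorized softmax loss; the signature in
    dense_classifiers.py may differ."""
    scores = X @ W                                  # (N, C) class scores
    scores -= scores.max(axis=1, keepdims=True)     # for numerical stability
    exp = np.exp(scores)
    probs = exp / exp.sum(axis=1, keepdims=True)    # row-wise softmax
    N = X.shape[0]
    # Mean cross-entropy of the correct classes, plus L2 regularization.
    loss = -np.log(probs[np.arange(N), y]).mean() + reg * np.sum(W * W)
    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1                   # softmax gradient
    dW = X.T @ dscores / N + 2 * reg * W
    return loss, dW
```

Subtracting the row max before exponentiating leaves the softmax unchanged but avoids overflow for large scores.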
Train and test using the above bag-of-vectors data.
Implement train, predict, and loss in neural_net.py.
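The layer sizes and parameter layout in neural_net.py are set by the skeleton; purely as an illustration (the names and shapes here are assumptions), the forward pass and loss of a one-hidden-layer net might look like:

```python
import numpy as np

def two_layer_loss(params, X, y):
    # Sketch of a one-hidden-layer network with ReLU; the actual interface
    # in neural_net.py may differ.
    W1, b1, W2, b2 = params
    h = np.maximum(0, X @ W1 + b1)                  # hidden layer, ReLU
    scores = h @ W2 + b2                            # (N, C) class scores
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    N = X.shape[0]
    return -np.log(probs[np.arange(N), y]).mean()   # mean cross-entropy
```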
One problem is that our random word vectors are underfitting. One way to improve this is to backpropagate into these random word vectors and modify them. In this part, make new classifiers, analogous to the linear classifier and the neural network classifier, that also update the word vectors via backpropagation.
Hint: these models should also have a BagOfVectors
as an instance state in addition to the regular model parameters.
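Since a sentence vector here is the mean of its word vectors, the gradient with respect to each word's embedding is the upstream gradient scaled by 1/length, scatter-added back into the embedding matrix. A minimal sketch of that backward step (names are illustrative, not the a1 API):

```python
import numpy as np

# Forward: average the word vectors of one sentence.
rng = np.random.default_rng(0)
E = rng.normal(size=(5, 3))          # embedding matrix (vocab=5, dim=3)
ids = np.array([0, 2, 2])            # token ids of one sentence
v = E[ids].mean(axis=0)              # averaged sentence vector

# Backward: distribute the upstream gradient dL/dv evenly over the words.
dv = np.ones(3)                      # pretend upstream gradient dL/dv
dE = np.zeros_like(E)
np.add.at(dE, ids, dv / len(ids))    # scatter-add handles repeated tokens
```

np.add.at is used instead of `dE[ids] += ...` because fancy-index assignment would not accumulate contributions from a token that appears more than once in the sentence.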
Describe some mistakes made by your model, and inspect/visualize the weights. Comment on any issues with the model based on your concrete observations.