In this notebook, you'll implement a recurrent neural network that performs sentiment analysis. Using an RNN rather than a feedforward network is more accurate since we can include information about the sequence of words. Here we'll use a dataset of movie reviews, accompanied by labels.
The architecture for this network is shown below.
Here, we'll pass in words to an embedding layer. We need an embedding layer because we have tens of thousands of words, so we'll need a more efficient representation for our input data than one-hot encoded vectors. You should have seen this before from the word2vec lesson. You can actually train up an embedding with word2vec and use it here. But it's good enough to just have an embedding layer and let the network learn the embedding table on its own.
From the embedding layer, the new representations will be passed to LSTM cells. These will add recurrent connections to the network so we can include information about the sequence of words in the data. Finally, the LSTM cells will go to a sigmoid output layer here. We're using the sigmoid because we're trying to predict if this text has positive or negative sentiment. The output layer will just be a single unit then, with a sigmoid activation function.
We don't care about the sigmoid outputs except for the very last one, so we can ignore the rest. We'll calculate the cost from the output of the last step and the training label.
import numpy as np
import tensorflow as tf
with open('../sentiment-network/reviews.txt', 'r') as f:
    reviews = f.read()
with open('../sentiment-network/labels.txt', 'r') as f:
    labels = f.read()
reviews[:2000]
'bromwell high is a cartoon comedy . it ran at the same time as some other programs about school life such as teachers . my years in the teaching profession lead me to believe that bromwell high s satire is much closer to reality than is teachers . the scramble to survive financially the insightful students who can see right through their pathetic teachers pomp the pettiness of the whole situation all remind me of the schools i knew and their students . when i saw the episode in which a student repeatedly tried to burn down the school i immediately recalled . . . . . . . . . at . . . . . . . . . . high . a classic line inspector i m here to sack one of your teachers . student welcome to bromwell high . i expect that many adults of my age think that bromwell high is far fetched . what a pity that it isn t \nstory of a man who has unnatural feelings for a pig . starts out with a opening scene that is a terrific example of absurd comedy . a formal orchestra audience is turned into an insane violent mob by the crazy chantings of it s singers . unfortunately it stays absurd the whole time with no general narrative eventually making it just too off putting . even those from the era should be turned off . the cryptic dialogue would make shakespeare seem easy to a third grader . on a technical level it s better than you might think with some good cinematography by future great vilmos zsigmond . future stars sally kirkland and frederic forrest can be seen briefly . \nhomelessness or houselessness as george carlin stated has been an issue for years but never a plan to help those on the street that were once considered human who did everything from going to school work or vote for the matter . most people think of the homeless as just a lost cause while worrying about things such as racism the war on iraq pressuring kids to succeed technology the elections inflation or worrying if they ll be next to end up on the streets . br br but what if y'
The first step when building a neural network model is getting your data into the proper form to feed into the network. Since we're using embedding layers, we'll need to encode each word with an integer. We'll also want to clean it up a bit.
You can see an example of the reviews data above. We'll want to get rid of those periods. Also, you might notice that the reviews are delimited with newlines \n. To deal with those, I'm going to split the text into each review using \n as the delimiter. Then I can combine all the reviews back together into one big string.
First, let's remove all punctuation. Then get all the text without the newlines and split it into individual words.
from string import punctuation
all_text = ''.join([c for c in reviews if c not in punctuation])
reviews = all_text.split('\n')
all_text = ' '.join(reviews)
words = all_text.split()
all_text[:2000]
'bromwell high is a cartoon comedy it ran at the same time as some other programs about school life such as teachers my years in the teaching profession lead me to believe that bromwell high s satire is much closer to reality than is teachers the scramble to survive financially the insightful students who can see right through their pathetic teachers pomp the pettiness of the whole situation all remind me of the schools i knew and their students when i saw the episode in which a student repeatedly tried to burn down the school i immediately recalled at high a classic line inspector i m here to sack one of your teachers student welcome to bromwell high i expect that many adults of my age think that bromwell high is far fetched what a pity that it isn t story of a man who has unnatural feelings for a pig starts out with a opening scene that is a terrific example of absurd comedy a formal orchestra audience is turned into an insane violent mob by the crazy chantings of it s singers unfortunately it stays absurd the whole time with no general narrative eventually making it just too off putting even those from the era should be turned off the cryptic dialogue would make shakespeare seem easy to a third grader on a technical level it s better than you might think with some good cinematography by future great vilmos zsigmond future stars sally kirkland and frederic forrest can be seen briefly homelessness or houselessness as george carlin stated has been an issue for years but never a plan to help those on the street that were once considered human who did everything from going to school work or vote for the matter most people think of the homeless as just a lost cause while worrying about things such as racism the war on iraq pressuring kids to succeed technology the elections inflation or worrying if they ll be next to end up on the streets br br but what if you were given a bet to live on the st'
words[:100]
['bromwell', 'high', 'is', 'a', 'cartoon', 'comedy', 'it', 'ran', 'at', 'the', 'same', 'time', 'as', 'some', 'other', 'programs', 'about', 'school', 'life', 'such', 'as', 'teachers', 'my', 'years', 'in', 'the', 'teaching', 'profession', 'lead', 'me', 'to', 'believe', 'that', 'bromwell', 'high', 's', 'satire', 'is', 'much', 'closer', 'to', 'reality', 'than', 'is', 'teachers', 'the', 'scramble', 'to', 'survive', 'financially', 'the', 'insightful', 'students', 'who', 'can', 'see', 'right', 'through', 'their', 'pathetic', 'teachers', 'pomp', 'the', 'pettiness', 'of', 'the', 'whole', 'situation', 'all', 'remind', 'me', 'of', 'the', 'schools', 'i', 'knew', 'and', 'their', 'students', 'when', 'i', 'saw', 'the', 'episode', 'in', 'which', 'a', 'student', 'repeatedly', 'tried', 'to', 'burn', 'down', 'the', 'school', 'i', 'immediately', 'recalled', 'at', 'high']
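To see what the cleaning step does on something small, here is a minimal sketch on a made-up two-review string. The names raw, toy_reviews and toy_words are illustrative only and are not part of the notebook.

```python
from string import punctuation

# Toy stand-in for the reviews file: two reviews separated by a newline.
raw = "great movie . loved it !\nterrible plot , bad acting ."

# Drop punctuation characters; letters, spaces and newlines are kept.
no_punct = ''.join(c for c in raw if c not in punctuation)

# One string per review, then a single flat list of words.
toy_reviews = no_punct.split('\n')
toy_words = ' '.join(toy_reviews).split()

print(toy_reviews)
print(toy_words)
```

Because split() with no argument collapses runs of whitespace, the double spaces left behind by the removed punctuation do not produce empty words.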
The embedding lookup requires that we pass in integers to our network. The easiest way to do this is to create dictionaries that map the words in the vocabulary to integers. Then we can convert each of our reviews into integers so they can be passed into the network.
Exercise: Now you're going to encode the words with integers. Build a dictionary that maps words to integers. Later we're going to pad our input vectors with zeros, so make sure the integers start at 1, not 0. Also, convert the reviews to integers and store the reviews in a new list called reviews_ints.
# Create your dictionary that maps vocab words to integers here
from collections import Counter
counts = Counter(words)
vocab = sorted(counts, key=counts.get, reverse=True)
vocab_to_int = {word: ii for ii, word in enumerate(vocab, 1)}
# Convert the reviews to integers, same shape as reviews list, but with integers
reviews_ints = []
for each in reviews:
    reviews_ints.append([vocab_to_int[word] for word in each.split()])
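The same encoding can be checked by hand on a tiny vocabulary. This is a sketch on invented data (toy_reviews and toy_vocab_to_int are hypothetical names), with word counts chosen to be distinct so the ordering is unambiguous.

```python
from collections import Counter

toy_reviews = ["good good movie", "good movie bad"]
toy_words = ' '.join(toy_reviews).split()

# Most frequent word gets the smallest integer; numbering starts at 1
# so that 0 stays free for padding.
counts = Counter(toy_words)
toy_vocab = sorted(counts, key=counts.get, reverse=True)
toy_vocab_to_int = {word: i for i, word in enumerate(toy_vocab, 1)}

toy_reviews_ints = [[toy_vocab_to_int[w] for w in review.split()]
                    for review in toy_reviews]

print(toy_vocab_to_int)
print(toy_reviews_ints)
```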
Our labels are "positive" or "negative". To use these labels in our network, we need to convert them to 0 and 1.
Exercise: Convert labels from positive and negative to 1 and 0, respectively.
# Convert labels to 1s and 0s for 'positive' and 'negative'
labels = labels.split('\n')
labels = np.array([1 if each == 'positive' else 0 for each in labels])
If you built labels correctly, you should see the next output.
from collections import Counter
review_lens = Counter([len(x) for x in reviews_ints])
print("Zero-length reviews: {}".format(review_lens[0]))
print("Maximum review length: {}".format(max(review_lens)))
Zero-length reviews: 1 Maximum review length: 2514
Okay, a couple of issues here. We seem to have one review with zero length. Also, the maximum review length is way too many steps for our RNN. Let's truncate to 200 steps. For reviews shorter than 200 words, we'll pad with 0s. For reviews longer than 200 words, we can truncate them to the first 200 words.
Exercise: First, remove the review with zero length from the reviews_ints list.
# Filter out that review with 0 length
non_zero_idx = [ii for ii, review in enumerate(reviews_ints) if len(review) != 0]
reviews_ints = [reviews_ints[ii] for ii in non_zero_idx]
labels = np.array([labels[ii] for ii in non_zero_idx])
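The important detail in the filter above is that the same indices are applied to both the reviews and the labels, so they stay aligned. A small sketch on made-up data (toy_ints and toy_labels are illustrative names):

```python
import numpy as np

toy_ints = [[5, 2, 9], [], [4, 1]]   # the middle review is empty
toy_labels = np.array([1, 0, 0])

# Indices of the reviews worth keeping.
keep = [i for i, review in enumerate(toy_ints) if len(review) != 0]

# Apply the same index list to both collections.
toy_ints = [toy_ints[i] for i in keep]
toy_labels = toy_labels[keep]

print(toy_ints, toy_labels)
```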
Exercise: Now, create an array features that contains the data we'll pass to the network. The data should come from reviews_ints, since we want to feed integers to the network. Each row should be 200 elements long. For reviews shorter than 200 words, left pad with 0s. That is, if the review is ['best', 'movie', 'ever'], [117, 18, 128] as integers, the row will look like [0, 0, 0, ..., 0, 117, 18, 128]. For reviews longer than 200, use only the first 200 words as the feature vector.
This isn't trivial and there are a bunch of ways to do this. But, if you're going to be building your own deep learning networks, you're going to have to get used to preparing your data.
seq_len = 200
features = np.zeros((len(reviews_ints), seq_len), dtype=int)
for i, row in enumerate(reviews_ints):
    features[i, -len(row):] = np.array(row)[:seq_len]
If you build features correctly, it should look like the cell output below.
features[:10,:100]
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 21275, 308, 6, 3, 1050, 207, 8, 2141, 32, 1, 171, 57, 15, 49, 81, 5817, 44, 382, 110, 140, 15, 5237, 60, 154, 9, 1, 4975, 5902, 475, 71, 5, 260, 12, 21275, 308, 13, 1978, 6, 74, 2403], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 63, 4, 3, 125, 36, 47, 7562, 1398, 16, 3, 4182, 505, 45, 17], [22628, 42, 71368, 15, 706, 17725, 3393, 47, 77, 35, 1824, 16, 154, 19, 114, 3, 1307, 5, 336, 147, 22, 1, 857, 12, 70, 281, 1168, 399, 36, 120, 283, 38, 169, 5, 382, 158, 42, 2277, 16, 1, 541, 90, 78, 102, 4, 1, 3248, 15, 43, 3, 408, 1068, 136, 8062, 44, 182, 140, 15, 3051, 1, 320, 22, 4826, 26509, 346, 5, 3100, 2093, 1, 18864, 18529, 42, 8062, 46, 33, 236, 29, 370, 5, 130, 56, 22, 1, 1928, 7, 7, 19, 48, 46, 21, 70, 344, 3, 2103, 5, 407, 22, 1, 1928, 16], [ 4505, 505, 15, 3, 3342, 162, 8464, 1655, 6, 4853, 56, 17, 4527, 5675, 140, 11811, 5, 996, 4949, 2934, 4465, 566, 1202, 36, 6, 1519, 96, 3, 744, 4, 26900, 13, 5, 27, 3481, 9, 10640, 4, 8, 111, 3020, 5, 1, 1027, 15, 3, 4400, 82, 22, 2051, 6, 4465, 538, 2769, 7099, 42120, 41, 463, 1, 8464, 46497, 302, 123, 15, 4228, 19, 1671, 923, 1, 1655, 6, 6178, 20489, 34, 1, 980, 1758, 22455, 646, 24972, 27, 106, 11929, 13, 14400, 15336, 17947, 2461, 466, 21746, 36, 3270, 1, 6371, 1020, 45, 17, 2701, 2501, 33], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 520, 119, 113, 34, 16449, 1818, 3755, 117, 885, 21830, 721, 10, 28, 124, 108, 2, 115, 137, 9, 1623, 7763, 26, 330, 5, 590, 1, 6129, 22, 386, 6, 3, 349, 15, 50, 15, 231, 9, 7565, 11449, 1, 
191, 22, 9045, 6, 82, 881, 101, 111, 3594, 4], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 20, 3657, 141, 10, 423, 23, 272, 60, 4359, 22, 32, 84, 3305, 22, 1, 172, 4, 1, 953, 506, 11, 5008, 5400, 5, 574, 4, 1155, 54, 53, 5329, 1, 261, 17, 41, 953, 125, 59, 1, 712, 137, 379, 627, 15, 111, 1509], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 6, 692, 1, 90, 2161, 20, 12010, 1, 2818, 5229, 249, 92, 3011, 8, 126, 24, 201, 3, 803, 634, 4, 22628, 1002], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 786, 295, 10, 122, 11, 6, 419, 5, 29, 35, 482, 20, 19, 1285, 33, 142, 28, 2656, 45, 1842, 32, 1, 2787, 37, 78, 97, 2442, 67, 3972, 45, 2, 24, 105, 256, 1, 134, 1575, 2, 12431, 452, 14, 319, 11, 63, 6, 98, 1323, 5, 105, 1, 3768, 4, 3], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 6, 24, 1, 779, 3699, 2818, 20, 8, 14, 74, 325, 2733, 73, 90, 4, 27, 99, 2, 165, 68], [ 54, 10, 14, 116, 60, 798, 552, 71, 364, 5, 1, 731, 5, 66, 8145, 8, 14, 30, 4, 109, 99, 10, 293, 17, 60, 798, 19, 11, 14, 1, 64, 30, 69, 2503, 45, 4, 234, 93, 10, 68, 114, 108, 8145, 363, 43, 1009, 2, 10, 97, 28, 1431, 45, 1, 357, 4, 60, 110, 205, 8, 48, 3, 1930, 10892, 2, 2130, 354, 412, 4, 13, 6666, 2, 2975, 5154, 2132, 1367, 6, 30, 4, 60, 502, 876, 19, 8145, 6, 34, 227, 1, 247, 412, 4, 582, 4, 27, 599, 9, 1, 13746, 396, 4, 14492]])
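The padding and truncation rule is easier to verify on a short sequence length. The helper below is a sketch, not the notebook's code: pad_features is a hypothetical name, and it adds a guard for empty rows, which the notebook avoids by filtering them out beforehand.

```python
import numpy as np

def pad_features(reviews_ints, seq_len):
    """Left-pad each review with zeros, or keep only its first seq_len words."""
    features = np.zeros((len(reviews_ints), seq_len), dtype=int)
    for i, row in enumerate(reviews_ints):
        row = row[:seq_len]                 # truncate long reviews
        if row:                             # skip empty rows (all zeros)
            features[i, -len(row):] = row   # right-align the words
    return features

toy = pad_features([[117, 18, 128], [1, 2, 3, 4, 5, 6, 7]], seq_len=5)
print(toy)
```

The first row is shorter than seq_len and gets two leading zeros; the second is longer and keeps only its first five integers.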
With our data in nice shape, we'll split it into training, validation, and test sets.
Exercise: Create the training, validation, and test sets here. You'll need to create sets for the features and the labels, train_x and train_y for example. Define a split fraction, split_frac, as the fraction of data to keep in the training set. Usually this is set to 0.8 or 0.9. The rest of the data will be split in half to create the validation and testing data.
split_frac = 0.8
split_idx = int(len(features) * split_frac)
train_x, val_x = features[:split_idx], features[split_idx:]
train_y, val_y = labels[:split_idx], labels[split_idx:]
test_idx = int(len(val_x) * 0.5)
val_x, test_x = val_x[:test_idx], val_x[test_idx:]
val_y, test_y = val_y[:test_idx], val_y[test_idx:]
print("\t\t\tFeature Shapes:")
print("Train set: \t\t{}".format(train_x.shape),
"\nValidation set: \t{}".format(val_x.shape),
"\nTest set: \t\t{}".format(test_x.shape))
Feature Shapes: Train set: (20000, 200) Validation set: (2500, 200) Test set: (2500, 200)
With train, validation, and test fractions of 0.8, 0.1, 0.1, the final shapes should look like:
Feature Shapes:
Train set: (20000, 200)
Validation set: (2500, 200)
Test set: (2500, 200)
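As a quick check of the 0.8 / 0.1 / 0.1 arithmetic, here is a sketch that wraps the split in a function and runs it on a small dummy array. split_data is a hypothetical helper name, and the zero-filled arrays stand in for the real features and labels.

```python
import numpy as np

def split_data(features, labels, split_frac=0.8):
    """Split into train / validation / test; the remainder is halved."""
    split_idx = int(len(features) * split_frac)
    train_x, rest_x = features[:split_idx], features[split_idx:]
    train_y, rest_y = labels[:split_idx], labels[split_idx:]
    half = len(rest_x) // 2
    return ((train_x, train_y),
            (rest_x[:half], rest_y[:half]),
            (rest_x[half:], rest_y[half:]))

dummy_x = np.zeros((100, 200), dtype=int)
dummy_y = np.zeros(100, dtype=int)
(tr_x, tr_y), (va_x, va_y), (te_x, te_y) = split_data(dummy_x, dummy_y)
print(tr_x.shape, va_x.shape, te_x.shape)
```

Note that this split takes contiguous slices, so it relies on the data not being ordered by label.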
Here, we'll build the graph. First up, defining the hyperparameters.
lstm_size: Number of units in the hidden layers in the LSTM cells. Usually larger is better performance-wise. Common values are 128, 256, 512, etc.
lstm_layers: Number of LSTM layers in the network. I'd start with 1, then add more if I'm underfitting.
batch_size: The number of reviews to feed the network in one training pass. Typically this should be set as high as you can go without running out of memory.
learning_rate: Learning rate
lstm_size = 256
lstm_layers = 1
batch_size = 500
learning_rate = 0.001
For the network itself, we'll be passing in our 200 element long review vectors. Each batch will be batch_size vectors. We'll also be using dropout on the LSTM layer, so we'll make a placeholder for the keep probability.
Exercise: Create the inputs_, labels_, and dropout keep_prob placeholders using tf.placeholder. labels_ needs to be two-dimensional to work with some functions later. Since keep_prob is a scalar (a 0-dimensional tensor), you shouldn't provide a size to tf.placeholder.
n_words = len(vocab_to_int) + 1 # Adding 1 because we use 0's for padding, dictionary started at 1
# Create the graph object
graph = tf.Graph()
# Add nodes to the graph
with graph.as_default():
    inputs_ = tf.placeholder(tf.int32, [None, None], name='inputs')
    labels_ = tf.placeholder(tf.int32, [None, None], name='labels')
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
Now we'll add an embedding layer. We need to do this because there are 74000 words in our vocabulary. It is massively inefficient to one-hot encode our classes here. You should remember dealing with this problem from the word2vec lesson. Instead of one-hot encoding, we can have an embedding layer and use that layer as a lookup table. You could train an embedding layer using word2vec, then load it here. But, it's fine to just make a new layer and let the network learn the weights.
Exercise: Create the embedding lookup matrix as a tf.Variable. Use that embedding matrix to get the embedded vectors to pass to the LSTM cell with tf.nn.embedding_lookup. This function takes the embedding matrix and an input tensor, such as the review vectors. Then, it'll return another tensor with the embedded vectors. So, if the embedding layer has 200 units and the input has shape [batch_size, seq_len], the function will return a tensor with size [batch_size, seq_len, 200].
# Size of the embedding vectors (number of units in the embedding layer)
embed_size = 300
with graph.as_default():
    embedding = tf.Variable(tf.random_uniform((n_words, embed_size), -1, 1))
    embed = tf.nn.embedding_lookup(embedding, inputs_)
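Conceptually, an embedding lookup is just row indexing into the embedding matrix: each integer in the input selects one row. The NumPy sketch below illustrates that idea on toy sizes; it is an analogy for what the lookup computes, not TensorFlow's implementation, and toy_embedding and toy_inputs are made-up names.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_n_words, toy_embed_size = 10, 3

# One row of length toy_embed_size per word id (row 0 is the padding id).
toy_embedding = rng.uniform(-1, 1, size=(toy_n_words, toy_embed_size))

# A batch of 2 reviews, each 4 word ids long: shape [batch, seq_len].
toy_inputs = np.array([[0, 0, 3, 7],
                       [1, 2, 2, 9]])

# Fancy indexing picks one embedding row per word id.
toy_embed = toy_embedding[toy_inputs]
print(toy_embed.shape)
```

With inputs of shape [batch, seq_len], the result has one vector per word, so its shape is [batch, seq_len, embed_size].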
Next, we'll create our LSTM cells to use in the recurrent network (TensorFlow documentation). Here we are just defining what the cells look like. This isn't actually building the graph, just defining the type of cells we want in our graph.
To create a basic LSTM cell for the graph, you'll want to use tf.contrib.rnn.BasicLSTMCell. Looking at the function documentation:
tf.contrib.rnn.BasicLSTMCell(num_units, forget_bias=1.0, input_size=None, state_is_tuple=True, activation=<function tanh at 0x109f1ef28>)
you can see it takes a parameter called num_units, the number of units in the cell, called lstm_size in this code. So then, you can write something like
lstm = tf.contrib.rnn.BasicLSTMCell(num_units)
to create an LSTM cell with num_units units. Next, you can add dropout to the cell with tf.contrib.rnn.DropoutWrapper. This just wraps the cell in another cell, but with dropout added to the inputs and/or outputs. It's a really convenient way to make your network better with almost no effort! So you'd do something like
drop = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=keep_prob)
Most of the time, your network will have better performance with more layers. That's sort of the magic of deep learning: adding more layers allows the network to learn really complex relationships. Again, there is a simple way to create multiple layers of LSTM cells with tf.contrib.rnn.MultiRNNCell:
cell = tf.contrib.rnn.MultiRNNCell([drop] * lstm_layers)
Here, [drop] * lstm_layers creates a list of cells (drop) that is lstm_layers long. The MultiRNNCell wrapper builds this into multiple layers of RNN cells, one for each cell in the list.
So the final cell you're using in the network is actually multiple (or just one) LSTM cells with dropout. But it all works the same from an architectural viewpoint, just a more complicated graph in the cell.
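To make "an LSTM cell with dropout on its outputs" concrete, here is a rough NumPy sketch of a single LSTM time step followed by output dropout. It uses the standard LSTM equations; the weight layout, gate ordering, and the names lstm_step and output_dropout are assumptions of this sketch, not TensorFlow's internals.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b, forget_bias=1.0):
    """One LSTM step.
    x: [batch, n_in]; h_prev, c_prev: [batch, n_units]
    W: [n_in + n_units, 4 * n_units]; b: [4 * n_units]
    """
    z = np.concatenate([x, h_prev], axis=1) @ W + b
    i, g, f, o = np.split(z, 4, axis=1)   # gate order is a choice of this sketch
    c = sigmoid(f + forget_bias) * c_prev + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def output_dropout(h, keep_prob, rng):
    """Zero each output with probability 1 - keep_prob and rescale the rest."""
    mask = rng.random(h.shape) < keep_prob
    return h * mask / keep_prob

rng = np.random.default_rng(0)
batch, n_in, n_units = 2, 3, 4
W = rng.normal(scale=0.1, size=(n_in + n_units, 4 * n_units))
b = np.zeros(4 * n_units)
h, c = lstm_step(rng.normal(size=(batch, n_in)),
                 np.zeros((batch, n_units)), np.zeros((batch, n_units)), W, b)
dropped = output_dropout(h, keep_prob=0.5, rng=rng)
print(h.shape, c.shape, dropped.shape)
```

With keep_prob set to 1, the dropout function returns its input unchanged, which is why the notebook feeds keep_prob: 1 during validation and testing.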
Exercise: Below, use tf.contrib.rnn.BasicLSTMCell to create an LSTM cell. Then, add dropout to it with tf.contrib.rnn.DropoutWrapper. Finally, create multiple LSTM layers with tf.contrib.rnn.MultiRNNCell.
Here is a tutorial on building RNNs that will help you out.
with graph.as_default():
    # Your basic LSTM cell
    lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
    # Add dropout to the cell
    drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
    # Stack up multiple LSTM layers, for deep learning
    cell = tf.contrib.rnn.MultiRNNCell([drop] * lstm_layers)
    # Getting an initial state of all zeros
    initial_state = cell.zero_state(batch_size, tf.float32)
Now we need to actually run the data through the RNN nodes. You can use tf.nn.dynamic_rnn to do this. You'd pass in the RNN cell you created (our multiple layered LSTM cell for instance), and the inputs to the network.
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial_state)
Above I created an initial state, initial_state, to pass to the RNN. This is the cell state that is passed between the hidden layers in successive time steps. tf.nn.dynamic_rnn takes care of most of the work for us. We pass in our cell and the input to the cell, then it does the unrolling and everything else for us. It returns outputs for each time step and the final_state of the hidden layer.
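"Unrolling" means applying the same cell to each time step in turn while carrying the state forward. The sketch below shows that loop in NumPy with a plain tanh cell standing in for the LSTM, to keep it short. The names unroll and tanh_step are hypothetical, and this is only an illustration of the idea, not how TensorFlow implements it.

```python
import numpy as np

def unroll(step_fn, inputs, initial_state):
    """Apply step_fn along the time axis of inputs ([batch, time, features]).
    Returns the per-step outputs ([batch, time, units]) and the final state."""
    state = initial_state
    outputs = []
    for t in range(inputs.shape[1]):
        out, state = step_fn(inputs[:, t, :], state)
        outputs.append(out)
    return np.stack(outputs, axis=1), state

rng = np.random.default_rng(0)
batch, time, n_in, n_units = 2, 5, 4, 3
Wx = rng.normal(scale=0.1, size=(n_in, n_units))
Wh = rng.normal(scale=0.1, size=(n_units, n_units))

def tanh_step(x, h):
    # Simple recurrent cell: the output and the new state are the same vector.
    h_new = np.tanh(x @ Wx + h @ Wh)
    return h_new, h_new

toy_outputs, toy_final_state = unroll(tanh_step,
                                      rng.normal(size=(batch, time, n_in)),
                                      np.zeros((batch, n_units)))
print(toy_outputs.shape, toy_final_state.shape)
```

For this simple cell the last output equals the final state; an LSTM's state additionally carries the cell memory.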
Exercise: Use tf.nn.dynamic_rnn to add the forward pass through the RNN. Remember that we're actually passing in vectors from the embedding layer, embed.
with graph.as_default():
    outputs, final_state = tf.nn.dynamic_rnn(cell, embed, initial_state=initial_state)
We only care about the final output, which we'll use as our sentiment prediction. So we need to grab the last output with outputs[:, -1], then calculate the cost from that and labels_.
with graph.as_default():
    predictions = tf.contrib.layers.fully_connected(outputs[:, -1], 1, activation_fn=tf.sigmoid)
    cost = tf.losses.mean_squared_error(labels_, predictions)
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
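The shapes involved in this step can be traced with a small NumPy sketch: take the last time step, apply a single sigmoid unit, and compare with the labels using mean squared error. All arrays here are random placeholders with made-up names (toy_outputs, w, b), standing in for the RNN outputs and the learned weights of the output layer.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, n_units = 3, 6, 4

toy_outputs = rng.normal(size=(batch, seq_len, n_units))  # [batch, time, units]
last = toy_outputs[:, -1]                                 # [batch, units]

# A single sigmoid output unit (weights would normally be learned).
w = rng.normal(scale=0.1, size=(n_units, 1))
b = np.zeros(1)
toy_predictions = 1.0 / (1.0 + np.exp(-(last @ w + b)))   # [batch, 1]

# Labels must be two-dimensional to line up with the predictions.
toy_labels = np.array([[1], [0], [1]])
toy_cost = np.mean((toy_labels - toy_predictions) ** 2)
print(toy_predictions.shape, toy_cost)
```

This is also why the training loop feeds labels as y[:, None]: it turns a vector of length batch_size into a [batch_size, 1] column.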
Here we can add a few nodes to calculate the accuracy which we'll use in the validation pass.
with graph.as_default():
    correct_pred = tf.equal(tf.cast(tf.round(predictions), tf.int32), labels_)
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
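The accuracy nodes round each sigmoid output to 0 or 1 and take the fraction that match the labels. The same computation on four hand-picked predictions, as a NumPy sketch with illustrative names:

```python
import numpy as np

toy_predictions = np.array([[0.9], [0.2], [0.6], [0.4]])
toy_labels = np.array([[1], [0], [0], [0]])

# Round to the nearest class, compare with the labels, then average.
toy_correct = np.round(toy_predictions).astype(int) == toy_labels
toy_accuracy = toy_correct.mean()
print(toy_accuracy)   # 0.75: three of the four predictions match
```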
This is a simple function for returning batches from our data. First it removes data such that we only have full batches. Then it iterates through the x and y arrays and returns slices out of those arrays with size [batch_size].
def get_batches(x, y, batch_size=100):
    n_batches = len(x)//batch_size
    x, y = x[:n_batches*batch_size], y[:n_batches*batch_size]
    for ii in range(0, len(x), batch_size):
        yield x[ii:ii+batch_size], y[ii:ii+batch_size]
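A short usage sketch shows the effect of keeping only full batches. The function is repeated here so the example runs on its own; toy_x and toy_y are made-up arrays of 10 items.

```python
import numpy as np

def get_batches(x, y, batch_size=100):
    n_batches = len(x) // batch_size
    # Drop the leftover items that do not fill a whole batch.
    x, y = x[:n_batches * batch_size], y[:n_batches * batch_size]
    for ii in range(0, len(x), batch_size):
        yield x[ii:ii + batch_size], y[ii:ii + batch_size]

toy_x = np.arange(10).reshape(10, 1)
toy_y = np.arange(10)

toy_batches = list(get_batches(toy_x, toy_y, batch_size=4))
for bx, by in toy_batches:
    print(bx.shape, by)
```

With 10 items and a batch size of 4, two batches are produced and the last two items are discarded. Full batches matter here because the initial state was built for a fixed batch_size.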
Below is the typical training code. If you'd rather write it yourself, feel free to delete all this code and implement your own. Before you run this, make sure the checkpoints directory exists.
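One way to make sure the directory is there is to create it from Python before training. This is a small sketch; the path "checkpoints" is taken from the save path used in the training code, relative to the notebook's working directory.

```python
import os

# Create the directory if it is missing; do nothing if it already exists.
os.makedirs("checkpoints", exist_ok=True)
print(os.path.isdir("checkpoints"))
```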
epochs = 10
with graph.as_default():
    saver = tf.train.Saver()
with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    iteration = 1
    for e in range(epochs):
        state = sess.run(initial_state)
        for ii, (x, y) in enumerate(get_batches(train_x, train_y, batch_size), 1):
            feed = {inputs_: x,
                    labels_: y[:, None],
                    keep_prob: 0.5,
                    initial_state: state}
            loss, state, _ = sess.run([cost, final_state, optimizer], feed_dict=feed)
            if iteration%5==0:
                print("Epoch: {}/{}".format(e, epochs),
                      "Iteration: {}".format(iteration),
                      "Train loss: {:.3f}".format(loss))
            if iteration%25==0:
                val_acc = []
                val_state = sess.run(cell.zero_state(batch_size, tf.float32))
                for x, y in get_batches(val_x, val_y, batch_size):
                    feed = {inputs_: x,
                            labels_: y[:, None],
                            keep_prob: 1,
                            initial_state: val_state}
                    batch_acc, val_state = sess.run([accuracy, final_state], feed_dict=feed)
                    val_acc.append(batch_acc)
                print("Val acc: {:.3f}".format(np.mean(val_acc)))
            iteration +=1
    saver.save(sess, "checkpoints/sentiment.ckpt")
Epoch: 0/10 Iteration: 5 Train loss: 0.235 Epoch: 0/10 Iteration: 10 Train loss: 0.244 Epoch: 0/10 Iteration: 15 Train loss: 0.216 Epoch: 0/10 Iteration: 20 Train loss: 0.219 Epoch: 0/10 Iteration: 25 Train loss: 0.190 Val acc: 0.724 Epoch: 0/10 Iteration: 30 Train loss: 0.196 Epoch: 0/10 Iteration: 35 Train loss: 0.221 Epoch: 0/10 Iteration: 40 Train loss: 0.225 Epoch: 1/10 Iteration: 45 Train loss: 0.185 Epoch: 1/10 Iteration: 50 Train loss: 0.168 Val acc: 0.758 Epoch: 1/10 Iteration: 55 Train loss: 0.213 Epoch: 1/10 Iteration: 60 Train loss: 0.191 Epoch: 1/10 Iteration: 65 Train loss: 0.214 Epoch: 1/10 Iteration: 70 Train loss: 0.217 Epoch: 1/10 Iteration: 75 Train loss: 0.202 Val acc: 0.642 Epoch: 1/10 Iteration: 80 Train loss: 0.202 Epoch: 2/10 Iteration: 85 Train loss: 0.135 Epoch: 2/10 Iteration: 90 Train loss: 0.167 Epoch: 2/10 Iteration: 95 Train loss: 0.195 Epoch: 2/10 Iteration: 100 Train loss: 0.148 Val acc: 0.751 Epoch: 2/10 Iteration: 105 Train loss: 0.171 Epoch: 2/10 Iteration: 110 Train loss: 0.172 Epoch: 2/10 Iteration: 115 Train loss: 0.131 Epoch: 2/10 Iteration: 120 Train loss: 0.138 Epoch: 3/10 Iteration: 125 Train loss: 0.098 Val acc: 0.784 Epoch: 3/10 Iteration: 130 Train loss: 0.136 Epoch: 3/10 Iteration: 135 Train loss: 0.124 Epoch: 3/10 Iteration: 140 Train loss: 0.121 Epoch: 3/10 Iteration: 145 Train loss: 0.114 Epoch: 3/10 Iteration: 150 Train loss: 0.105 Val acc: 0.732 Epoch: 3/10 Iteration: 155 Train loss: 0.103 Epoch: 3/10 Iteration: 160 Train loss: 0.130 Epoch: 4/10 Iteration: 165 Train loss: 0.079 Epoch: 4/10 Iteration: 170 Train loss: 0.149 Epoch: 4/10 Iteration: 175 Train loss: 0.139 Val acc: 0.747 Epoch: 4/10 Iteration: 180 Train loss: 0.149 Epoch: 4/10 Iteration: 185 Train loss: 0.115 Epoch: 4/10 Iteration: 190 Train loss: 0.114 Epoch: 4/10 Iteration: 195 Train loss: 0.106 Epoch: 4/10 Iteration: 200 Train loss: 0.110 Val acc: 0.774 Epoch: 5/10 Iteration: 205 Train loss: 0.085 Epoch: 5/10 Iteration: 210 Train loss: 0.105 Epoch: 
5/10 Iteration: 215 Train loss: 0.113 Epoch: 5/10 Iteration: 220 Train loss: 0.118 Epoch: 5/10 Iteration: 225 Train loss: 0.114 Val acc: 0.718 Epoch: 5/10 Iteration: 230 Train loss: 0.079 Epoch: 5/10 Iteration: 235 Train loss: 0.092 Epoch: 5/10 Iteration: 240 Train loss: 0.078 Epoch: 6/10 Iteration: 245 Train loss: 0.062 Epoch: 6/10 Iteration: 250 Train loss: 0.081 Val acc: 0.818 Epoch: 6/10 Iteration: 255 Train loss: 0.083 Epoch: 6/10 Iteration: 260 Train loss: 0.081 Epoch: 6/10 Iteration: 265 Train loss: 0.101 Epoch: 6/10 Iteration: 270 Train loss: 0.092 Epoch: 6/10 Iteration: 275 Train loss: 0.063 Val acc: 0.824 Epoch: 6/10 Iteration: 280 Train loss: 0.064 Epoch: 7/10 Iteration: 285 Train loss: 0.053 Epoch: 7/10 Iteration: 290 Train loss: 0.085 Epoch: 7/10 Iteration: 295 Train loss: 0.074 Epoch: 7/10 Iteration: 300 Train loss: 0.080 Val acc: 0.818 Epoch: 7/10 Iteration: 305 Train loss: 0.068 Epoch: 7/10 Iteration: 310 Train loss: 0.055 Epoch: 7/10 Iteration: 315 Train loss: 0.047 Epoch: 7/10 Iteration: 320 Train loss: 0.051 Epoch: 8/10 Iteration: 325 Train loss: 0.043 Val acc: 0.805 Epoch: 8/10 Iteration: 330 Train loss: 0.060 Epoch: 8/10 Iteration: 335 Train loss: 0.068 Epoch: 8/10 Iteration: 340 Train loss: 0.059 Epoch: 8/10 Iteration: 345 Train loss: 0.048 Epoch: 8/10 Iteration: 350 Train loss: 0.061 Val acc: 0.795 Epoch: 8/10 Iteration: 355 Train loss: 0.040 Epoch: 8/10 Iteration: 360 Train loss: 0.039 Epoch: 9/10 Iteration: 365 Train loss: 0.039 Epoch: 9/10 Iteration: 370 Train loss: 0.042 Epoch: 9/10 Iteration: 375 Train loss: 0.052 Val acc: 0.814 Epoch: 9/10 Iteration: 380 Train loss: 0.052 Epoch: 9/10 Iteration: 385 Train loss: 0.084 Epoch: 9/10 Iteration: 390 Train loss: 0.090 Epoch: 9/10 Iteration: 395 Train loss: 0.054 Epoch: 9/10 Iteration: 400 Train loss: 0.051 Val acc: 0.814
test_acc = []
with tf.Session(graph=graph) as sess:
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    test_state = sess.run(cell.zero_state(batch_size, tf.float32))
    for ii, (x, y) in enumerate(get_batches(test_x, test_y, batch_size), 1):
        feed = {inputs_: x,
                labels_: y[:, None],
                keep_prob: 1,
                initial_state: test_state}
        batch_acc, test_state = sess.run([accuracy, final_state], feed_dict=feed)
        test_acc.append(batch_acc)
    print("Test accuracy: {:.3f}".format(np.mean(test_acc)))
Test accuracy: 0.811