In [1]:
from utils import *
Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)
/home/ubuntu/anaconda2/lib/python2.7/site-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
  warnings.warn(warn)
Using Theano backend.

"Ok notebook, import a bunch of libraries that we're going to use to build this model".

If we look in the utils.py file, we can see that utils has over 50 import lines, including a bunch for different parts of Keras:

import keras
from keras import backend as K
from keras.utils.data_utils import get_file
from keras.utils import np_utils
[...]

For now, we don't need to know what they all are.

In [2]:
x = random((30,2))
y = np.dot(x, [2, 3]) + 1

This is our data - x is our input, and y is our desired output.

Notice that, like in our spreadsheet, x is a collection of arbitrarily selected random numbers: 30 rows and 2 columns of values between 0 and 1.

Unlike in our spreadsheet, where y was also just random numbers, here we've defined a relationship between x and y: each y is 2 times the first column of x, plus 3 times the second column, plus 1 (y = np.dot(x, [2, 3]) + 1). We're going to give our model the values of x and y, but not that relationship, and ask it for its best guess at what the relationship could be.
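To make that relationship concrete, here's a small NumPy sketch showing that the one-liner is just "weight each column, add them up, add 1". It generates its own random x (assuming, as the output suggests, that utils' `random` behaves like `np.random.random`), and the worked example plugs in the first row of x that the notebook prints a little further down.

```python
import numpy as np

x = np.random.random((30, 2))   # 30 rows, 2 columns of values in [0, 1)

# The one-liner from the notebook...
y_dot = np.dot(x, [2, 3]) + 1
# ...is the same as weighting each column and adding the constant by hand
y_by_hand = 2 * x[:, 0] + 3 * x[:, 1] + 1

print(np.allclose(y_dot, y_by_hand))   # True

# Worked example with the first row of x from the notebook output
first_row_y = 2 * 0.2006 + 3 * 0.7571 + 1
print(round(first_row_y, 4))   # 3.6725; the notebook shows 3.6724 because x is displayed rounded
```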

In [3]:
x[:5]
Out[3]:
array([[ 0.2006,  0.7571],
       [ 0.5218,  0.4857],
       [ 0.0908,  0.3997],
       [ 0.5502,  0.6391],
       [ 0.6486,  0.505 ]])

"Ok notebook, show me the first 5 values of x."

In [4]:
y[:5]
Out[4]:
array([ 3.6724,  3.5005,  2.3805,  4.0178,  3.8122])

"Ok notebook, do the same thing but for y."

In [5]:
lm = Sequential([Dense(1, input_dim=2)])

lm is our linear model. It can also be written as

lm = Sequential()
lm.add(Dense(1, input_dim=2))

or

lm = Sequential([Dense(1, input_shape=(2,))])

In general, Dense(n, input_dim=m) (or equivalently input_shape=(m,)) creates a layer with m inputs and n outputs. "dim" stands for "dimensions", by the way; it took me a while to work that out.

The docs describe Dense() as "just your fully connected NN layer". The model we create here will take as input any array of shape (*, 2) and output an array of shape (*, 1).

So all we're saying here is,

"Ok notebook, create a linear model that takes an array with any number of rows and two columns and returns an array with the same number of rows and one column."

In [6]:
print(x.shape)
print(y.shape)
(30, 2)
(30,)

Oh look - what a coincidence.

x has two columns and y has one value per row, so we now have a linear model that takes the right-sized inputs and produces the right-sized outputs.
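Under the hood, a layer like Dense(1, input_dim=2) boils down to a matrix multiplication plus a bias. This is a minimal NumPy sketch of that computation, not Keras's actual implementation; the names W and b and their placeholder random values are our own. It shows why a (30, 2) input comes out as a (30, 1) output.

```python
import numpy as np

x = np.random.random((30, 2))   # 30 rows, 2 columns, like our data

# A Dense(1, input_dim=2) layer holds a (2, 1) weight matrix and a (1,) bias.
# The values here are placeholders, not how Keras actually initializes them.
W = np.random.uniform(-1, 1, size=(2, 1))
b = np.zeros(1)

# Output = x . W + b; the bias is broadcast across every row
out = np.dot(x, W) + b
print(out.shape)   # (30, 1): same number of rows, one column
```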

Recall that at the end of our spreadsheet neural network implementation, we identified two things we could do to improve the output it was giving us:

  1. Make changes to the way we initialize our weights, so the random numbers in our matrices start out at a more sensible scale instead of being completely arbitrary
  2. Build in some kind of optimization process so the network can adjust its weights in response to the difference between our activations and desired outputs

I believe that our linear model is at the "just throw some random numbers at the problem and see what happens" stage, which is as far as we got yesterday, with one major difference: our spreadsheet neural network had no strategy for weight initialization, whereas Keras has initialization methods built in. In Keras 1, for example, a Dense layer's weights default to the 'glorot_uniform' initializer.
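For the curious, Glorot (also called Xavier) uniform initialization draws each weight from a uniform distribution on [-limit, limit], where limit = sqrt(6 / (fan_in + fan_out)), and the bias starts at zero. The `glorot_uniform` helper below is a hypothetical NumPy sketch of that rule, not Keras's own code; for our layer, fan_in is 2 (inputs) and fan_out is 1 (outputs).

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Sketch of Glorot uniform init: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.RandomState(0)

# Dense(1, input_dim=2): 2 inputs, 1 output
W = glorot_uniform(2, 1, rng)
b = np.zeros(1)                      # biases start at zero

limit = np.sqrt(6.0 / (2 + 1))       # sqrt(2), about 1.414
print(limit)
print(W)                             # two weights somewhere in [-1.414, 1.414]
```

The idea is that the range shrinks as a layer gets more inputs and outputs, which keeps the starting activations from being wildly too large or too small.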

So that's the first step taken care of.

In [7]:
lm.compile(optimizer=SGD(lr=0.1), loss="mse")

"Ok model, when you adjust your weights to improve your guesses, use stochastic gradient descent with a learning rate of 0.1. To keep track of whether your guesses are getting better or worse, use mean squared error." (compile() doesn't train anything yet; it just sets up these choices for later.)

We'll cover both stochastic gradient descent and mean squared error later on, but that's the second step taken care of.
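Mean squared error is simple enough to write down already: take the difference between each guess and the desired output, square it, and average. Here's a NumPy sketch using our own `mse` helper; for a single-output model like ours it should match what Keras reports for loss="mse".

```python
import numpy as np

def mse(predictions, targets):
    """Mean squared error: the average of the squared differences."""
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return np.mean((predictions - targets) ** 2)

# Guesses of 3 and 5 when the answers were 1 and 1:
# errors are 2 and 4, squared errors 4 and 16, mean 10
print(mse([3, 5], [1, 1]))   # 10.0
```

Squaring makes every error positive and punishes big misses much more than small ones, which is why a perfect model scores exactly 0.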

In [8]:
lm.evaluate(x, y, verbose=0)
Out[8]:
20.332971572875977

"Ok model, make your first guesses based on x, compare them against the actual values of y, and tell us how you did."

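Is this good or bad? The Wikipedia page for mean squared error says "values closer to zero are better", so I'm going to take that as "could be improved". For a sense of scale: an MSE of about 20 means a typical guess is off by roughly the square root of 20, about 4.5, while our y values all fall between 1 and 6.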

This shouldn't be surprising though - if you've been paying attention you'll realize that our model hasn't actually been trained.

In [9]:
lm.fit(x, y, nb_epoch=5, batch_size=1)
Epoch 1/5
30/30 [==============================] - 0s - loss: 1.8488     
Epoch 2/5
30/30 [==============================] - 0s - loss: 0.2749     
Epoch 3/5
30/30 [==============================] - 0s - loss: 0.1266     
Epoch 4/5
30/30 [==============================] - 0s - loss: 0.0485     
Epoch 5/5
30/30 [==============================] - 0s - loss: 0.0194     
Out[9]:
<keras.callbacks.History at 0x7f025d288810>

"Ok model, for each of the 30 rows in turn (batch_size=1), make a guess, check it against our desired output, and adjust your weights to improve the next guess; then repeat the whole pass five times." That's 30 weight updates per epoch, or 150 in total.

Again, we'll cover the gradient descent method used to actually improve our model weights later on, but notice how the loss shrinks after each epoch (an epoch is one complete pass through our data).
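Until we cover gradient descent properly, here's a rough NumPy sketch of what fit() is doing with batch_size=1: for each row in turn, make a guess, measure the error, and nudge the weights and bias against the gradient of the squared error. It mirrors the setup above (lr=0.1, and the rows are shuffled each epoch, as Keras does by default), but it is our own simplified loop, not Keras's code. The seed, the 100 epochs, and starting from zeros instead of Glorot-initialized weights are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.RandomState(42)     # fixed seed so the run is repeatable
x = rng.random_sample((30, 2))      # 30 rows, 2 columns in [0, 1)
y = np.dot(x, [2, 3]) + 1           # the hidden relationship

W = np.zeros(2)   # one weight per input column (starting at zero for simplicity)
b = 0.0           # the bias
lr = 0.1          # learning rate, as in SGD(lr=0.1)

losses = []
for epoch in range(100):                  # one epoch = one full pass over the data
    for i in rng.permutation(len(x)):     # batch_size=1: update after every row
        error = np.dot(x[i], W) + b - y[i]
        # Gradient of the squared error (guess - y)^2 is 2 * error * input,
        # and 2 * error for the bias
        W -= lr * 2 * error * x[i]
        b -= lr * 2 * error
    # Mean squared error over the whole dataset after this epoch
    losses.append(np.mean((np.dot(x, W) + b - y) ** 2))

print(losses[0], losses[-1])   # the loss should shrink dramatically
print(W, b)                    # should end up close to [2, 3] and 1
```

Because our data has no noise at all, each small step can keep moving the weights toward exactly [2, 3] and 1, which is why the loss keeps heading toward zero the longer we train.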

In [10]:
lm.evaluate(x, y, verbose=0)
Out[10]:
0.011976789683103561

That's a lot closer to zero!

In [11]:
lm.fit(x, y, nb_epoch=5, batch_size=1)
Epoch 1/5
30/30 [==============================] - 0s - loss: 0.0092     
Epoch 2/5
30/30 [==============================] - 0s - loss: 0.0036     
Epoch 3/5
30/30 [==============================] - 0s - loss: 0.0018     
Epoch 4/5
30/30 [==============================] - 0s - loss: 9.1857e-04     
Epoch 5/5
30/30 [==============================] - 0s - loss: 4.3288e-04     
Out[11]:
<keras.callbacks.History at 0x7f025d288d90>

"Ok model, do it five more times."

In [12]:
lm.evaluate(x, y, verbose=0)
Out[12]:
0.00023057886573951691

That's a lot closer to zero.

In [13]:
lm.get_weights()
Out[13]:
[array([[ 2.0156],
        [ 2.9427]], dtype=float32), array([ 1.0263], dtype=float32)]

"Ok model, what weights did you use to produce this output?"

Remember our original function for y?

y = np.dot(x, [2, 3]) + 1

The first array holds the model's two weights, which turned out pretty close to 2 and 3, and the second holds its bias, which turned out pretty close to 1!
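Because our data has no noise, we can also compute the exact best-fit weights directly with ordinary least squares and compare. This is a different technique from the gradient descent Keras uses (NumPy's `np.linalg.lstsq` solves the problem in closed form), so treat it only as a sanity check on the answer our model is approximating; the data here is freshly generated, not the notebook's.

```python
import numpy as np

x = np.random.random((30, 2))
y = np.dot(x, [2, 3]) + 1

# Add a column of ones so the bias is solved for as a third weight
A = np.hstack([x, np.ones((len(x), 1))])

# Find the weights that minimize the squared error exactly
coef, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
print(coef)   # approximately [2, 3, 1], up to floating-point error
```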