from IPython.display import Image
Theano is a Python library that lets you define, optimize, and evaluate mathematical expressions, especially ones involving multi-dimensional arrays. To install it:
$ pip install theano
Let's make sure Theano is working properly by running a computationally demanding deep learning example.
On Mac, make sure you have HDF5 installed first:
$ brew install homebrew/science/hdf5
$ pip install keras h5py
The network used is one trained for image recognition; the effect is similar to Google's Deep Dream: http://deepdreamgenerator.com/
Example through the Keras repo: https://github.com/fchollet/keras/blob/master/examples/deep_dream.py
Download weights file: https://gist.github.com/baraldilorenzo/8d096f48a1be4a2d660d
Image("img/cat_watermelon_compare.jpg")
Image("img/dream_at_iteration_0_1.png")
Ok, so we got Theano and Keras installed, working, and did some weird shit.
Now let's go back to Theano basics!
Let's write code in Python that builds an expression for Theano.
import theano
import theano.tensor as T
In Theano, all symbols must be typed.
T.dscalar is the type assigned to “0-dimensional arrays (scalar) of doubles (d)”.
# create variables representing floating-point scalars
x = T.dscalar('x')
y = T.dscalar('y')
# T.dscalar is not a class.
# Neither x nor y are actually instances of dscalar.
# They are instances of TensorVariable.
type(x)
theano.tensor.var.TensorVariable
# x and y are assigned the theano Type dscalar in their type field
x.type
TensorType(float64, scalar)
# create a simple expression
z = x + y
Use theano.pp to pretty-print the computation associated with a variable.
print(theano.pp(z))
(x + y)
If you don't provide a string argument to your variables, the symbol will be unnamed. Names are not required, but can be useful for debugging.
# declare two symbolic floating-point scalars
a = T.dscalar()
b = T.dscalar()
# create an expression
c = a + b
print(theano.pp(c))
(<TensorType(float64, scalar)> + <TensorType(float64, scalar)>)
f = theano.function([x, y], z)
Our function takes two arguments, x and y, and returns z as its output (the output can also be a list of variables). theano.function is the interface to a compiler that builds a callable object from a purely symbolic graph.
Behind the scenes, f was compiled into C code. Now call f to evaluate z:
f(1.5, 2.5)
array(4.0)
# declare variables
x = T.dmatrix('x')
y = T.dmatrix('y')
# write mathematical expression
z = x + y
# create a function
f = theano.function([x, y], z)
f([[1, 2], [3, 4]], [[10, 20], [30, 40]])
array([[ 11.,  22.],
       [ 33.,  44.]])
Image("img/logistic_function.png")
Image("img/logistic.png")
For this example, we want to compute the function elementwise on matrices of doubles, which means that we want to apply this function to each individual element of the matrix.
# declare variable
x = T.dmatrix('x')
# write mathematical expression
s = 1 / (1 + T.exp(-x))
# create a function
logistic = theano.function([x], s)
# evaluate
logistic([[0, 1], [-1, -2]])
array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])
Image("img/butwait.jpg")
# dmatrices produces as many outputs as names that you provide.
# it is a shortcut for allocating symbolic variables
a, b = T.dmatrices('a', 'b')
diff = a - b
abs_diff = abs(diff)
diff_squared = diff**2
f = theano.function([a, b], [diff, abs_diff, diff_squared])
f([[1, 1], [1, 1]], [[0, 1], [2, 3]])
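The function returns all three outputs in a list. Working the arithmetic by hand, a - b = [[1, 0], [-1, -2]], so we expect the difference, its absolute value, and its square:
[array([[ 1.,  0.],
       [-1., -2.]]), array([[ 1.,  0.],
       [ 1.,  2.]]), array([[ 1.,  0.],
       [ 1.,  4.]])]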
A shared variable is useful for things we want to give a definite value to but also want to update. These are variables with an internal value that may be shared between multiple functions.
from theano import shared
# create a shared variable
state = shared(0)
inc = T.iscalar('inc')
accumulator = theano.function(
    [inc],
    state,
    updates=[(state, state + inc)]
)
The value can be accessed and modified with the .get_value() and .set_value() methods.
state.get_value()
array(0)
accumulator(1)
state.get_value()
array(1)
accumulator(300)
state.get_value()
array(301)
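We can also use .set_value() to, for example, reset the accumulator's state back to zero:
# reset the shared state
state.set_value(0)
state.get_value()  # array(0)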
T.grad() gives us a symbolically differentiated expression of our function, which we can then pass to theano.function to compile a new, callable function.
This is pretty sweet.
Let's use the following function in this example:
$ f(x) = e^{\sin(x^2)} $
x = T.dscalar()
# build the expression
fx = T.exp(T.sin(x**2))
# "compile" this expression into a Theano function
f = theano.function(inputs=[x], outputs=[fx])
f(10)
[array(0.602681965908778)]
# wrt stands for 'with respect to'
fp = T.grad(fx, wrt=x)
fprime = theano.function([x], fp)
fprime(15)
array(4.347404090286685)
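As a sanity check, the chain rule gives $ f'(x) = e^{\sin(x^2)} \cdot \cos(x^2) \cdot 2x $, which we can evaluate directly with NumPy and compare against fprime:
import numpy as np
# analytic derivative of e^(sin(x^2)) via the chain rule
x_val = 15.0
print(np.exp(np.sin(x_val**2)) * np.cos(x_val**2) * 2 * x_val)  # should match fprime(15)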
A collection of software "neurons" is created and connected together. The network is then asked to solve a problem, which it attempts to do over and over, each time strengthening the connections that lead to success and diminishing those that lead to failure.
Daniel Smilkov and Shan Carter at Google put together this interactive learner for how a neural network works: http://playground.tensorflow.org/
Each node represents an artificial neuron and an arrow represents a connection from the output of one neuron to the input of another.
Image("img/neural_net2.jpeg")
Let's build a neural network that produces the following truth table, called 'exclusive or' or 'XOR' (true when either A or B is set, but not both):
Image("img/xor-table.png")
import theano
import theano.tensor as T
import theano.tensor.nnet as nnet
import numpy as np
The x variable will be the input features, a 2-element vector, e.g. [0, 1].
The y variable will be the labels (and eventually the output), a scalar, 0 or 1.
# symbolically define two Theano variables
x = T.dvector()
y = T.dscalar()
For each layer we need to define a Python function that appends a bias term to its input, multiplies by a weight matrix, and applies a sigmoid.
Q: Why are we using a sigmoid function?
A: Sigmoid functions are often used in artificial neural networks to introduce nonlinearity.
Image("img/logistic.png")
# basic layer output function
def layer(x, w):
    # b (the bias term)
    b = np.array([1], dtype='float64')
    # combine x and b into a single tensor
    new_x = T.concatenate([x, b])
    # shapes: theta_1.T (3x3) dot new_x (3x1) -> 3x1 ; theta_2.T (1x4) dot new_x (4x1) -> 1x1
    m = T.dot(w.T, new_x)
    # apply the sigmoid elementwise
    h = nnet.sigmoid(m)
    return h
The neural net needs a function that gives feedback each time it produces an output. This function takes a cost/error expression and a weight matrix. We will use Theano's grad() function to compute the gradient of the cost with respect to the given weight matrix and return an updated weight matrix.
def grad_desc(cost, theta):
    # learning rate
    alpha = 0.1
    return theta - (alpha * T.grad(cost, wrt=theta))
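Each call applies the standard gradient descent update rule with learning rate $\alpha = 0.1$:
$ \theta \leftarrow \theta - \alpha \nabla_{\theta} \, \mathrm{cost} $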
Since the weight matrices will take on definite values that change during training, we won't represent them as ordinary Theano variables; we'll define them as Theano shared variables.
# define the weight matrices and initialize them to random values
theta_1 = theano.shared(np.array(np.random.rand(3, 3), dtype='float64'))
theta_2 = theano.shared(np.array(np.random.rand(4, 1), dtype='float64'))
Start by computing the hidden layer's output using the previously defined layer function, passing in the Theano variable x and the theta_1 weight matrix.
# hidden layer: sigmoid(theta_1.T dot [x; bias]), a 3-element vector
hidden_layer_1 = layer(x, theta_1)
# output layer: run the hidden output through theta_2 and collapse the 1x1 result to a scalar
output_layer_1 = T.sum(layer(hidden_layer_1, theta_2))
# declare our cost expression
cost_expression = (output_layer_1 - y)**2
# compile the training function: computes the cost and updates the weights
cost = theano.function(inputs=[x, y], outputs=cost_expression,
                       updates=[
                           (theta_1, grad_desc(cost_expression, theta_1)),
                           (theta_2, grad_desc(cost_expression, theta_2))
                       ])
# the output layer expression to run the network
run_forward = theano.function(inputs=[x], outputs=output_layer_1)
Updates allow us to modify our shared variables according to an expression. updates expects a list of 2-tuples:
updates=[(shared_variable, update_expression), ...]
The second element of each tuple is an expression (or function call) that returns the new value for the first element.
We have two shared variables to update, theta_1 and theta_2, and we want our grad_desc function to give us the updated values.
The grad_desc function expects two arguments: a cost expression and a weight matrix.
So every time we call the cost function we compiled with Theano, it also updates our shared variables according to the grad_desc rule.
# training data X
inputs = np.array([[0, 1],[1, 0],[1, 1],[0, 0]]).reshape(4,2)
# training data Y
labels_y = np.array([1, 1, 0, 0])
#### Set up a for loop to iterate through the training epochs
cur_cost = 0
for i in range(10000):
    for k in range(len(inputs)):
        # call the Theano-compiled cost function, it will auto-update weights
        cur_cost = cost(inputs[k], labels_y[k])
    # only print the cost every 1000 epochs/iterations (to save space)
    if i % 1000 == 0:
        print('Cost: %s' % (cur_cost,))
Cost: 0.450344197495
Cost: 0.131175398493
Cost: 0.0339816822734
Cost: 0.0100899134867
Cost: 0.00419796098757
Cost: 0.00246249927827
Cost: 0.00169869648901
Cost: 0.0012825679369
Cost: 0.00102475142493
Cost: 0.000850819493948
print(run_forward([0, 1]))
print(run_forward([1, 1]))
print(run_forward([1, 0]))
print(run_forward([0, 0]))
0.964960943766
0.0328625291814
0.977921860597
0.0269406923839
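To make the comparison with the truth table explicit, we can round each output to the nearest label; a quick sketch using the inputs and labels_y arrays defined above:
for xi, yi in zip(inputs, labels_y):
    prediction = float(run_forward(xi))
    print('%s -> %d (expected %d)' % (xi, round(prediction), yi))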
Questions?
Twitter: @ifmoonwascookie