import addutils.toc ; addutils.toc.js(ipy_notebook=True)
import scipy.io
import numpy as np
import pandas as pd
from time import time
from sklearn.metrics import confusion_matrix
from IPython.display import Image
from addutils import css_notebook
css_notebook()
The main topic of this notebook is TensorFlow, a library for machine learning and deep learning open-sourced by Google in November 2015. For more information please visit the TensorFlow website.
"TensorFlow™ is an open source software library for numerical computation using data flow graphs."
In TensorFlow the computation is formally a graph, with nodes representing operations and edges representing tensors (multidimensional data) communicated between operations. According to the TensorFlow website, the flow of tensors through the graph is where TensorFlow gets its name. It is not intended to be only a neural network library, but to perform any computation that can be expressed as a graph. TensorFlow's automatic differentiation is especially suited for gradient-based machine learning algorithms. The library is written in C++ and has nice Python bindings. Moreover, it can run both on CPU and GPU.
Great, but why TensorFlow? There are a lot of frameworks for Deep Learning, each with its own features and use cases. For example, PyTorch is suitable for research because it is flexible enough to create new models, but it is less suitable for production because it does not scale well. Other frameworks such as Caffe and MXNet are scalable but lack flexibility. TensorFlow is both flexible and scalable, and allows users to move easily from research into production.
The notebooks on TensorFlow require a few additional packages. Please be sure to install them properly. We will outline how to do it in the following paragraphs.
To use TensorFlow to its full extent, you should take advantage of the parallelization provided by GPUs. Currently TensorFlow supports Nvidia GPUs via the CUDA Toolkit. Download and install the CUDA Toolkit for your operating system before continuing. At the time of writing the current release of the CUDA Toolkit is 9.1.
IMPORTANT: the TensorFlow build uses a prior version of CUDA, namely CUDA Toolkit 9.0. We highly recommend this version if you are going to install TensorFlow via pip. Remember to install cuDNN 7 with support for CUDA Toolkit 9.0 as well.
The NVIDIA® CUDA® Toolkit provides a development environment for creating high performance GPU-accelerated applications. With the CUDA Toolkit, you can develop, optimize and deploy your applications on GPU-accelerated embedded systems, desktop workstations, enterprise data centers, cloud-based platforms and HPC supercomputers. The toolkit includes GPU-accelerated libraries, debugging and optimization tools, a C/C++ compiler and a runtime library to deploy your application.
GPU-accelerated CUDA libraries enable drop-in acceleration across multiple domains such as linear algebra, image and video processing, deep learning and graph analytics. For developing custom algorithms, you can use available integrations with commonly used languages and numerical packages as well as well-published development APIs. Your CUDA applications can be deployed across all NVIDIA GPU families available on premise and on GPU instances in the cloud. Using built-in capabilities for distributing computations across multi-GPU configurations, scientists and researchers can develop applications that scale from single GPU workstations to cloud installations with thousands of GPUs.
Nvidia also provides a library of primitives designed specifically for Deep Learning called cuDNN. Installing cuDNN further improves performance and is highly advisable (note that you need to register as a developer to download it).
The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN is part of the NVIDIA Deep Learning SDK.
After these steps you can install TensorFlow via pip or directly from source. Even though you can benefit from an installation from source, as TensorFlow generally runs faster when built from source, we recommend installing via pip for the purpose of these tutorials. Activate your favourite environment (addfor_tutorials) and type:
pip install tensorflow-gpu
It should take a few minutes to download and install. If you want to build from source, clone the TensorFlow repository and follow this guide.
If you want to use this notebook on your laptop (without GPU support) type this command:
pip install tensorflow
This allows you to run the notebooks, although with poorer performance.
If the above commands fail, try:
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.5.0-cp36-cp36m-linux_x86_64.whl
(for GPU support)
pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.5.0-cp36-cp36m-linux_x86_64.whl
(for CPU only)
For further information about installing TensorFlow read the official documentation: https://www.tensorflow.org/install/.
First import TensorFlow and show the current version. At the time of writing we are using TensorFlow version 1.5.
import tensorflow as tf
tf.__version__
'1.5.0'
TensorFlow represents computations as graphs. Nodes in the graph are called ops (operations). An op takes zero or more Tensors and produces zero or more Tensors as output. A Tensor is a multidimensional array with a specified type. The graph is a description of a computation; in order to actually execute the computation, the graph must be launched in a session. A session executes a specific graph on one of the available devices (either CPUs or GPUs).
In the following paragraphs we will clarify these concepts.
TensorFlow programs are usually structured into a construction phase, that assembles a graph, and an execution phase that uses a session to execute ops in the graph.
For example, it is possible to represent and train a neural network in the construction phase, and then repeatedly execute a set of training ops in the graph in the execution phase.
It is possible to build a graph by starting with nodes that do not need any input, such as constant nodes, and then use the output of the constant nodes as input to other operations. TensorFlow uses a default graph to which operations are added; the default graph is created, empty, as soon as TensorFlow is imported. It is sufficient for most uses, but it is also possible to manage multiple graphs with the Graph class.
x = tf.constant(4.)
y = tf.constant(3.)
product = tf.multiply(x, y)
The code creates three nodes: two constants and an operation (multiplication) that takes two inputs (the two constants) and produces an output (product). To actually produce the output, it is necessary to run the graph in a session.
What is the difference between this expression and the corresponding plain Python code that multiplies two constants? Try to print the result:
print(product)
Tensor("Mul:0", shape=(), dtype=float32)
The key idea is that product does not compute the product of x and y, but rather adds the multiplication operator to a computation graph that will be executed later.
Computation graphs allow us to implement ML algorithms by creating and executing operations that interact with each other. The interaction between operations constitutes the graph.
What is a graph, in the end? It is a set of nodes (or vertices) interconnected by (directed) edges. Edges allow data to flow from one node to another. Each node represents an operation that produces an output, which is passed on through the graph. Operations can be of any kind, from math to logging operations (we will see more on this when we use TensorBoard).
The graph connectivity defines a set of node dependencies, and TensorFlow is able to optimize the computation based on these dependencies. Being able to identify the connectivity allows TensorFlow to distribute computation and to avoid performing redundant computations on irrelevant portions of the graph.
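To make the dependency idea concrete, here is a minimal sketch in plain Python (a toy illustration, not how TensorFlow is implemented): a graph of named nodes where evaluating a node recursively computes only the nodes it depends on.

```python
# Toy computation graph: each node is (function, list of dependency names).
graph = {
    'x': (lambda: 4.0, []),                       # constant node
    'y': (lambda: 3.0, []),                       # constant node
    'z': (lambda: 8.0, []),                       # constant node
    'product': (lambda a, b: a * b, ['x', 'y']),
    'addition': (lambda a, b: a + b, ['product', 'z']),
}

def run(node, computed=None):
    """Evaluate `node`, recursively computing only its dependencies."""
    if computed is None:
        computed = {}
    if node not in computed:
        fn, deps = graph[node]
        computed[node] = fn(*(run(d, computed) for d in deps))
    return computed[node]

print(run('product'))   # 12.0 -- 'z' and 'addition' are never evaluated
print(run('addition'))  # 20.0
```

Asking for 'product' never touches 'z' or 'addition': this is the "only run the relevant subgraph" behavior TensorFlow exploits when you fetch a node.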
We can add other operations to the graph. For example we can do a sum after the product.
z = tf.constant(8.)
addition = tf.add(product, z)
The next image represents the default graph for the operations defined before.
Image('images/computation_graph_example.jpg', width=600, height=600)
Remember that the variable names (x, y and product) should be regarded as the outputs of the operations and not as the operations themselves. Note that for some arithmetic and logical operations it is possible to use operator overloading instead of tf.<operator>. For example, the multiplication could be written as product = x * y.
It is possible to create additional graphs and control how operations and variables get associated with them. tf.Graph() creates a new graph. To check which graph is currently set as default, we use:
g_default = tf.get_default_graph()
print(g_default)
<tensorflow.python.framework.ops.Graph object at 0x7f7246592d68>
Now we create a new graph g.
g = tf.Graph()
print(g)
<tensorflow.python.framework.ops.Graph object at 0x7f7246592e80>
As you can see, they are different objects. Also, given a node, we can see which graph it is associated with:
print(x.graph is g)
print(x.graph is g_default)
False
True
Using the with statement we can control which graph is set as default and add new nodes to it. For example:
print(g is tf.get_default_graph())
with g.as_default():
print(g is tf.get_default_graph())
print(g_default is tf.get_default_graph())
False
True
True
TensorFlow uses graphs as an abstraction for various reasons. The main reason is that Neural Networks have a natural graph structure. Moreover, graphs split the computation into small differentiable pieces, which allows TensorFlow to compute the derivative of any node with respect to any other node in the graph; Neural Network learning algorithms are based on gradients. Finally, having the computation split up makes it much easier to distribute work across multiple CPUs, GPUs, and other computational devices. (There is one more reason that we will see shortly: saving computation, by running only the subgraphs connected to the values to fetch.)
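The "small differentiable pieces" idea can be illustrated numerically. Below is a hand-worked sketch of the chain rule on the toy graph above (product = x*y, addition = product + z), checked against a finite-difference estimate; this is not TensorFlow's autodiff, just the principle behind it.

```python
# Forward pass through the toy graph: product = x*y, addition = product + z.
x, y, z = 4.0, 3.0, 8.0
product = x * y
addition = product + z

# Backward pass: chain local derivatives of each small piece together.
d_addition_d_product = 1.0                              # d(product + z)/d(product)
d_product_d_x = y                                       # d(x*y)/dx
d_addition_d_x = d_addition_d_product * d_product_d_x   # chain rule -> 3.0

# Sanity check with a numerical (finite-difference) derivative.
eps = 1e-6
numeric = (((x + eps) * y + z) - (x * y + z)) / eps
assert abs(d_addition_d_x - numeric) < 1e-4
print(d_addition_d_x)  # 3.0
```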
Once the computation graph is defined, we can use it in a Session to actually obtain the result. Session objects are the part of the TensorFlow API responsible for the communication between Python objects and data on one side and the computational resources on the other. The computation is performed with the run() method of the session object. Without arguments, Session uses the default graph.
sess = tf.Session()
result = sess.run([addition])
print(result)
sess.close()
[20.0]
It is necessary to close a session after the last operation, to release computational resources; this is done with the method close(). Alternatively, it is possible to use the with ... as ... statement, which takes care of releasing resources when the computation is finished.
TensorFlow translates the graph into executable operations and distributes the computation automatically on the available resources. It uses the available GPU for as many operations as possible.
It is possible to use a specific device for a session with the with tf.device("/gpu:1"): statement; this example would execute the graph on the second GPU of the machine. Try changing the string to execute the code on the CPU (or on another GPU of your machine).
with tf.Session() as sess:
with tf.device("/cpu:0"): # optional
result = sess.run([addition])
print(result)
[20.0]
In this example we requested a particular node of the graph by passing it to the method run(). This is called fetching, and the arguments of the function are the fetches. It is possible to fetch more than one variable by passing them simultaneously to the run() command (session.run([var1, var2])). The result is a list of the fetched values.
As we said earlier, TensorFlow computes only the necessary portion of the graph: if we asked for only the product, only the output of the node product would be computed (involving only the nodes it depends on).
with tf.Session() as sess:
result = sess.run([product, addition])
print(result)
[12.0, 20.0]
In an interactive environment like IPython or Jupyter, where it can be useful to interleave graph construction and run operations, you can use an InteractiveSession. The only difference with a regular Session is that an InteractiveSession installs itself as the default session on construction. The methods Tensor.eval() and Operation.run() will use that session to run ops.
sess = tf.InteractiveSession()
c = tf.linspace(0.0, 1.0, 5)
print(c.eval())
[0. 0.25 0.5 0.75 1. ]
g = tf.get_default_graph()
operations = [op for op in g.get_operations()]
[op.name for op in operations]
['Const', 'Const_1', 'Mul', 'Const_2', 'Add', 'LinSpace/start', 'LinSpace/stop', 'LinSpace/num', 'LinSpace']
Running an operation makes TensorFlow run all the operations required to compute its inputs, but doesn't return anything. It is also possible to get tensors and operations by name from a graph.
Now we close the interactive session.
sess.close()
NOTE: the session runs the default graph. Using multiple graphs is generally avoided: each graph requires a separate session, and each session tends to use all available resources. Moreover, passing data between graphs is a bottleneck, because it usually requires going through Python/NumPy and does not work in a distributed environment. It is better to have a disconnected subgraph within the default graph.
TensorFlow uses Tensors to represent all data. Only tensors are passed between ops in the graph. A tensor is an n-dimensional array and, in TensorFlow, is described by rank, shape and type. The rank is the number of dimensions (not to be confused with matrix rank), the shape is the number of elements along each dimension, and the type is the data type assigned to the tensor.
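As an aside, the same rank/shape/type vocabulary can be illustrated with NumPy arrays, which tensors closely resemble:

```python
import numpy as np

# Rank is the number of dimensions, shape gives the size of each dimension.
scalar = np.array(3.0)                     # rank 0, shape ()
vector = np.array([1.0, 2.0, 3.0])         # rank 1, shape (3,)
matrix = np.array([[1., 2., 3.],
                   [4., 5., 6.]])          # rank 2, shape (2, 3)

print(scalar.ndim, scalar.shape, scalar.dtype)  # 0 () float64
print(vector.ndim, vector.shape)                # 1 (3,)
print(matrix.ndim, matrix.shape)                # 2 (2, 3)
```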
When we create an operation, for example with tf.add(), the operation is a node added to the graph, but the handle we get back is the tensor that results from the operation. This handle is an edge of the graph, and it passes (flows) the yet-to-be-computed result of the operation to other nodes.
Tensor objects have attributes and methods. For example, the tensor x has an empty shape, meaning it is a scalar, and its dtype is float32:
print(x)
Tensor("Const:0", shape=(), dtype=float32)
A tensor's dtype can be specified at creation time or changed later using the tf.cast operation. Tensors can be viewed as n-dimensional arrays: a rank-0 tensor is a scalar, a rank-1 tensor is a vector, a rank-2 tensor is a matrix, a rank-3 tensor is a three-dimensional array, and so on for any dimension. As with dtypes, TensorFlow can automatically infer the shape of a tensor object from the shape of the data. For example, to create a matrix with tf.constant and see what shape we get, we can use the following code:
w = tf.constant([[1,2,3],
[4,5,6]])
print(w.get_shape())
(2, 3)
Other operations that generate numbers without user input include random number generators, linspace (seen before), arrays of constant values (ones or zeros), and many others. They resemble the operations and API of NumPy.
An important operation that we will use often is matrix multiplication. It is needed to perform operations like $Ax=b$, where $A$ is a matrix and $x$ is a vector.
A = tf.constant([[1,2,3],
[4,5,6]])
print(A.get_shape())
x = tf.constant([[1],[0],[1]])
print(x.get_shape())
(2, 3)
(3, 1)
b = tf.matmul(A, x)
with tf.Session() as sess:
print(b.eval())
[[ 4]
 [10]]
NOTE: if you experience warnings, it means TensorFlow wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations. To suppress them use:
`import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'`
Tensors have a name as well; it can be assigned at creation time, or TensorFlow assigns one for you (as happened before). It can be displayed with the attribute name. The name of the object is simply the name of the operation followed by a colon and the index of that tensor among the outputs of the operation that produced it. If two or more operations would otherwise share a name, TensorFlow appends an underscore and a progressive number to avoid duplicate tensor names. In later code we will specify the names of operations at construction time.
b.name
'MatMul:0'
Sometimes it can be useful, especially in complicated graphs, to group together related tensors, for example those that belong to a certain portion of the graph. Name scopes are used for this purpose. Here's an example.
with tf.Graph().as_default():
a1 = tf.constant(4, dtype=tf.float64, name='variable_a1')
with tf.name_scope("group"):
group_a1 = tf.constant(4,dtype=tf.int32, name='variable_a1')
group_a2 = tf.constant(4,dtype=tf.int32, name='variable_a2')
print(a1.name)
print(group_a1.name)
print(group_a2.name)
variable_a1:0
group/variable_a1:0
group/variable_a2:0
TensorFlow provides an interface similar to NumPy; for example, it implements broadcasting. Note that a is a list with 2 elements while b is a 2-by-2 matrix: a is broadcast (repeated) to match the shape of b.
a = tf.constant([2, 2], name='a')
b = tf.constant([[0, 1], [2, 3]], name='b')
x = tf.multiply(a, b, name='mul')
with tf.Session() as sess:
print(sess.run(x))
[[0 2]
 [4 6]]
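For comparison, the identical broadcast in plain NumPy:

```python
import numpy as np

# The shape-(2,) array a is stretched against the shape-(2, 2) array b
# before the elementwise multiply, exactly as in the TensorFlow example.
a = np.array([2, 2])
b = np.array([[0, 1], [2, 3]])
print(a * b)  # [[0 2]
              #  [4 6]]
```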
Like NumPy, TensorFlow provides a way to create tensors filled with specific values. The next example creates a tensor of shape (2, 3) filled with zeros.
zeros = tf.zeros([2, 3], tf.int32, name='zeros')
#zeros = tf.identity(zeros, name='add_zeros')
with tf.Session() as sess:
print(sess.run(zeros))
[[0 0 0]
 [0 0 0]]
It is also possible to create a tensor with the same shape and type as an existing tensor; for example, we can create a tensor similar to the previous one, but filled with ones this time.
ones_like = tf.ones_like(zeros, name='ones')
with tf.Session() as sess:
print(sess.run(ones_like))
[[1 1 1]
 [1 1 1]]
It is also possible to fill a tensor with an arbitrary specific value.
Next we will create constant tensors, this time containing sequences of values.
lin_space = tf.lin_space(10.0, 13.0, 4, name='linspace_example')
ran_ge = tf.range(5, name='range_example')
with tf.Session() as sess:
print('lin_space: ', sess.run(lin_space))
print('range: ', sess.run(ran_ge))
lin_space:  [10. 11. 12. 13.]
range:  [0 1 2 3 4]
What if we want to generate random numbers? (We will see this is especially useful for initializing Neural Network weights.) For example, we can sample numbers from a normal distribution or from a truncated normal, which doesn't return any value more than two standard deviations away from its mean.
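What "truncated" means can be sketched in NumPy by rejection sampling (a conceptual illustration, not TensorFlow's implementation; the function name is ours):

```python
import numpy as np

# Sketch of a truncated normal: any draw that falls more than two standard
# deviations from the mean is resampled until it lands inside the bounds.
rng = np.random.default_rng(0)

def sample_truncated_normal(size, mean=0.0, std=1.0):
    values = rng.normal(mean, std, size)
    out_of_range = np.abs(values - mean) > 2 * std
    while out_of_range.any():
        values[out_of_range] = rng.normal(mean, std, out_of_range.sum())
        out_of_range = np.abs(values - mean) > 2 * std
    return values

samples = sample_truncated_normal(10000)
print(samples.min(), samples.max())  # both within (-2, 2)
```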
import bokeh.plotting as bk  # used below to plot the histogram

truncated_normal = tf.truncated_normal((10000,))
with tf.Session() as sess:
values = sess.run(truncated_normal)
hist, edges = np.histogram(values, density=True, bins=50)
fig = bk.figure(plot_width=600, plot_height=350, title=None)
fig.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:])
bk.show(fig)
TensorFlow offers a series of operations similar to NumPy's, for example tf.abs, tf.negative, tf.sqrt, tf.exp and so on. It also has operations for arrays (tf.concat, tf.split, tf.shape) and for control flow, checkpointing, and Neural Network building blocks. We will see them in the following notebooks and describe them when needed (most of them are self-explanatory).
TensorFlow supports native Python data types: if you pass a list or a NumPy array to a constant function like one of the above, the resulting tensor will have the corresponding data type. Single values are converted to zero-dimensional tensors (scalars), lists to one-dimensional tensors (vectors), nested lists to two-dimensional tensors (matrices), and so forth.
tensor_d0 = tf.constant(42, name='scalar')
tensor_d1 = tf.constant([42, 42], name='vector')
tensor_d2 = tf.constant([[42], [42]], name='matrix')
with tf.Session() as sess:
print(tensor_d0.eval())
print(tensor_d1.eval())
print(tensor_d2.eval())
42
[42 42]
[[42]
 [42]]
But TensorFlow also has its own data types, for example tf.int8 or tf.float16. TensorFlow integrates seamlessly with NumPy types and, as we have seen, it is possible to pass NumPy types to TensorFlow.
tf.int32 == np.int32
True
As we have seen, using Python types to specify TensorFlow types is easy and useful for quick prototyping. However, TensorFlow types are more specific: for example, Python has only one integer type, whereas TensorFlow has several, from int8 to int64. For bigger projects it is preferable to use native TensorFlow types.
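A quick NumPy illustration of why the specific widths matter (NumPy's dtypes mirror TensorFlow's, as the comparison above showed): choosing a narrower type trades numeric range for memory.

```python
import numpy as np

# The same 1000 elements take 1 KB as int8 but 8 KB as int64.
a8 = np.zeros(1000, dtype=np.int8)
a64 = np.zeros(1000, dtype=np.int64)
print(a8.nbytes, a64.nbytes)  # 1000 8000
```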
Aside from the fact that constants are, well, constant, there is one more caveat to their use: constants are stored in the graph definition. The graph definition is stored in a protobuf (protocol buffer), Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data.
with tf.Graph().as_default():
const_value = tf.constant([42.0, 42.0], name="const_value")
with tf.Session() as sess:
print(sess.graph.as_graph_def())
node {
  name: "const_value"
  op: "Const"
  attr {
    key: "dtype"
    value { type: DT_FLOAT }
  }
  attr {
    key: "value"
    value {
      tensor {
        dtype: DT_FLOAT
        tensor_shape { dim { size: 2 } }
        tensor_content: "\000\000(B\000\000(B"
      }
    }
  }
}
versions { producer: 24 }
This makes loading the graph expensive when constants are big, and can hit the 2 GB limit on the size of the graph definition. It is better to use constants only for primitive types.
For these reasons we introduce variables next.
Variables are used to maintain state across executions of the graph. In this example state is initialized to zero and updated each time update is run. Variables must be initialized after the graph is launched, that is, after creating a session: it is necessary to add an init operation and run it before all other operations.
state = tf.Variable(0, name='counter')
one = tf.constant(1, name='one')
new_value = tf.add(state, one, name='new_value')
update = tf.assign(state, new_value, name='update')
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init_op)
print('state: ' + str(sess.run(state)))
for _ in range(5):
sess.run(update)
print('state: ' + str(sess.run(state)))
state: 0
state: 1
state: 2
state: 3
state: 4
state: 5
Here assign is part of the computational graph, just like add and the other operators. Variables are typically used to represent the parameters of a model; for example, in neural networks they store the weight matrices, which are updated at every execution of the graph as a result of the optimization process.
A Variable exists outside the context of a single session.run call. Variables can be initialized to random values drawn from a distribution (in the previous example state was initialized to zero). For example, we can use a normal distribution with a given mean and standard deviation, as well as specify the resulting shape.
init_val = tf.random_normal((1, 5), 0, 1, name='init_val')
var = tf.Variable(init_val, name='var')
print('tensor object: {}'.format(var))
init = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init)
initialized_values = sess.run(var)
print('tensor values after init: {}'.format(initialized_values))
tensor object: <tf.Variable 'var:0' shape=(1, 5) dtype=float32_ref>
tensor values after init: [[-0.19331591 -2.5231225 -1.8801746 0.44765785 0.21164836]]
Note that if we run the code several times the output changes, but the object changes too (look at the name of the object!). A new variable is created every time the tf.Variable constructor is called.
A better approach would be to reuse the variable already created. There are two main approaches: pass the variable object around, or encapsulate the variable object inside a variable scope. As stated in the TensorFlow documentation, passing variables around is more explicit, but it is sometimes convenient to write TensorFlow functions that implicitly use variables in their implementation. In general, using tf.Variable directly is discouraged; the preferred method is tf.get_variable, which also allows reusing a variable multiple times.
def var_reuse():
with tf.variable_scope("reuse_example", reuse=tf.AUTO_REUSE):
new_var = tf.get_variable("new_var", initializer=init_val)
return new_var
The previous function defines the scope in which the variable is declared, and allows you to create a hierarchy of definitions (suitable, for example, for Neural Networks). The parameter reuse=tf.AUTO_REUSE tells TensorFlow to create the variable the first time the function is called and to reuse the existing one afterwards.
new_var1 = var_reuse() # Creates new_var.
new_var2 = var_reuse() # Gets the same, existing new_var.
assert new_var1 == new_var2
Both variable_scope and name_scope have the same effect on all operations, as well as on variables created using tf.Variable (the scope is added as a prefix to the operation or variable name). However, name scope is ignored by tf.get_variable.
Each session maintains its own copy of the variables.
new_var = var_reuse()
print('tensor object: {}'.format(new_var))
with tf.Session() as sess:
sess.run(tf.variables_initializer([new_var], name='new_var_init'))
initialized_values = sess.run(new_var)
print('tensor values after init: {}'.format(initialized_values))
tensor object: <tf.Variable 'reuse_example/new_var:0' shape=(1, 5) dtype=float32_ref>
tensor values after init: [[ 0.03998312 -1.1565742 0.15244694 0.41520226 0.38697594]]
So far we have covered how to store values in constants and how to use variables to hold updatable state (these operations are called source operations). Remember that the workflow of a TensorFlow program has two phases: assemble the graph, then use a session to execute operations in the graph. We still need a way to assemble the graph without knowing the values of some tensors needed for the computation (namely, the inputs of our model).
TensorFlow provides a way to pass values into the graph with a feed mechanism. A feed replaces the value of a tensor with a value that you provide. TensorFlow provides a special structure called a placeholder for feeding input values. Placeholders have an optional shape parameter, which can be set to None, meaning that the shape can be of any size (we will see later that this is common with mini-batches). To feed the values, we construct a dictionary whose keys are the placeholder variables and whose values are the lists or NumPy arrays to feed to each of them.
input1 = tf.placeholder(tf.float32, name='input1')
input2 = tf.placeholder(tf.float32, name='input2')
output = input1 * input2
One caveat of using operator overloading is that you cannot assign a name to the operation, because a tf.Operation (or tf.Tensor) is immutable once it has been created.
print(output.name)
mul_1:0
The typical way to rename an op is therefore to use tf.identity(), which has almost no runtime cost:
output = tf.identity(output, name='placeholder_multiplication')
NOTE: the recommended way to structure a name scope is to assign the name of the scope itself to the output from the scope (if there is a single output op):
with tf.name_scope('name_scope_multiplication') as scope:
input1 = tf.placeholder(tf.float32, name='input1')
input2 = tf.placeholder(tf.float32, name='input2')
output = tf.multiply(input1, input2, name=scope)
A placeholder expects a feed, and generates an error if one is not supplied.
with tf.Session() as sess:
print(sess.run([output], feed_dict={input1:[7.], input2:[2.]}))
[array([14.], dtype=float32)]
You can feed any feedable tensor via feed_dict; a placeholder is just a way to indicate that something must be fed. The convenience function tf.Graph.is_feedable(tensor) returns True if tensor is feedable.
init_placeholder = tf.global_variables_initializer()
with tf.Session() as sess:
sess.run(init_placeholder)
print(sess.run([addition], feed_dict={product: 4}))
[12.0]
The practice of feeding values for some operations is extremely helpful for testing: feeding dummy values into parts of a large graph avoids computing the portions of the graph upstream of them.
Remember: never declare an operation at execution time! It will add a new operation to the graph each time. We will illustrate this, as well as the concepts introduced so far, with the aid of a tool called TensorBoard.
The computations you'll use TensorFlow for - like training a massive deep neural network - can be complex and confusing. To make it easier to understand, debug, and optimize TensorFlow programs, we've included a suite of visualization tools called TensorBoard. You can use TensorBoard to visualize your TensorFlow graph, plot quantitative metrics about the execution of your graph, and show additional data like images that pass through it.
TensorBoard is a useful tool for visualizing and debugging the training of a Neural Network. It is used to babysit the learning process, display useful information like the graph of the model, and manage experiments.
We need to tell TensorFlow to save the graph of the model in order to visualize it.
writer = tf.summary.FileWriter('temp/graphs', tf.get_default_graph())
writer.close()
Then in the current directory (remember to activate your environment) type:
tensorboard --logdir=temp/graphs
A common source of errors is adding operations to the graph at run time. While this is perfectly normal in ordinary programming, it is a problem in TensorFlow. Let's look at it with TensorBoard.
x = tf.Variable(10, name='x')
y = tf.Variable(20, name='y')
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for _ in range(10):
sess.run(tf.add(x, y))
writer = tf.summary.FileWriter('temp/graphs', sess.graph)
writer.close()
This mistake is sometimes called lazy loading: the creation of the op is deferred until it is needed, so tf.add adds a new node to the graph at every iteration of the loop. The fix is to define the operation once, during the construction phase, and only run it inside the loop.
TensorFlow is especially useful for training Machine Learning models. We will briefly sketch the workflow of building simple models, and switch to more complex models (Neural Networks) in the following notebooks.
The simplest model is based on a target variable $y$ that we are trying to predict given input data $x$. We have a set of $x$ and $y$ values that we use to train the model. Suppose that $y$ takes continuous values; then we have a regression model, described by a simple equation: $f(x) = w^Tx + b$ and $y = f(x) + \epsilon$, where $f(x)$ is a linear combination of our training data $x$ with a set of weights $w$ and an intercept $b$. The target $y$ equals $f(x)$ plus added Gaussian noise.
We implement this simple regression with TensorFlow. First we need to create some input data, then we will create the computational graph, train the model, and look at the results.
Now we define a set of points to be used for training, together with the corresponding targets.
data = np.random.randn(2000, 3)
w_real = [0.3, 0.5, 0.1]
b_real = -0.2
noise = np.random.randn(2000) * 0.1
y_real = np.matmul(w_real, data.T) + b_real
target = y_real + noise
Phase 1: assemble the graph.
Step 1: create placeholders for input and target
X = tf.placeholder(tf.float32, shape=[None, 3])
y = tf.placeholder(tf.float32, shape=None)
Step 2: create variables for weights and bias
init_val = tf.random_normal((1, 3), 0, 1)
W = tf.get_variable(name='weights', dtype=tf.float32, initializer=init_val)
b = tf.get_variable(name='bias', dtype=tf.float32, initializer=0.0)
Step 3: define the model
linear_model = tf.matmul(W, X, transpose_b=True) + b
NOTE: how do we train such a linear model? There is a simple closed-form solution for linear regression, but instead we will define a loss function and an optimization operation, both inserted into the graph (this will be helpful later, when we get to Neural Networks). The loss function defines a way to evaluate model performance: it measures a distance between the predicted values and the target values, and it is used as input to the optimization procedure, which finds a set of parameters that minimizes the loss. The most common loss function is the Mean Squared Error (MSE), which computes the squared distance for each point and averages the results over all points. The optimization operation is gradient descent, which iteratively updates the weights in a way that decreases the loss over time. The update rule is based on the gradient of the loss function: if the loss is a multivariate function of the weights $F(\hat w)$, then in the neighborhood of a point $\hat w_0$ the steepest direction of decrease of $F(\hat w)$ is obtained by moving from $\hat w_0$ in the direction of the negative gradient of $F$ at $\hat w_0$. We will give more details on the gradient descent algorithm and its variants when we use it with Neural Networks.
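The same MSE gradient-descent update can be sketched directly in NumPy (a conceptual sketch using the same synthetic-data recipe as above, not TensorFlow's optimizer):

```python
import numpy as np

# Generate data as above: y = X @ w_real + b_real + noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 3))
w_real, b_real = np.array([0.3, 0.5, 0.1]), -0.2
y = X @ w_real + b_real + 0.1 * rng.standard_normal(2000)

# Gradient descent on the MSE loss mean((X @ w + b - y)**2).
w, b = np.zeros(3), 0.0
lr = 0.5
for _ in range(10):
    error = X @ w + b - y
    grad_w = 2 * X.T @ error / len(y)   # d(MSE)/dw
    grad_b = 2 * error.mean()           # d(MSE)/db
    w -= lr * grad_w                    # step against the gradient
    b -= lr * grad_b

print(w, b)  # close to [0.3, 0.5, 0.1] and -0.2
```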
Step 4: define the loss function
loss = tf.reduce_mean(tf.square(linear_model - y))
Step 5: define the optimizer and the training operation
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)
Phase 2: train the model
Step 1: define an init operation
init = tf.global_variables_initializer()
Step 2: run the optimizer with training data
NUM_STEPS = 10
with tf.Session() as sess:
    sess.run(init)
    tmp_ = sess.run([W, b])
    print(0, tmp_[0], tmp_[1])
    for step in range(NUM_STEPS):
        sess.run(train, {X: data, y: target})
        if ((step+1) % 5 == 0):
            tmp_ = sess.run([W, b])
            print(step+1, tmp_[0], tmp_[1])
0 [[-0.24243562 -0.5938537   1.0206949 ]] 0.0
5 [[0.29791233 0.5040582  0.09849343]] -0.1979366
10 [[0.2979123  0.5040581  0.09849343]] -0.19793658
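As a check, the closed-form least-squares solution mentioned in the note above can be computed directly with NumPy on data generated the same way (this is a sketch with its own random seed, so the numbers will differ slightly from the TensorFlow run):

```python
import numpy as np

np.random.seed(1)
data = np.random.randn(2000, 3)
noise = np.random.randn(2000) * 0.1
target = data @ np.array([0.3, 0.5, 0.1]) - 0.2 + noise

# Append a column of ones so the bias is fitted jointly with the weights.
A = np.hstack([data, np.ones((2000, 1))])
coef, *_ = np.linalg.lstsq(A, target, rcond=None)
print(coef)  # close to [0.3, 0.5, 0.1, -0.2]
```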
We will learn to classify MNIST handwritten digit images into their correct label (0-9). MNIST is a standard dataset hosted on Yann LeCun's website. The digits have been size-normalized and centered in a fixed-size image.
The importance of classical datasets is twofold. First they are good for people who want to try machine learning techniques while spending minimal efforts on preprocessing and formatting data. Second they are useful for comparing machine learning algorithms, since we know well how they work on these datasets.
Each image is 28 pixels by 28 pixels and represents a handwritten digit between 0 and 9.
Image('images/mnist1.png')
We can interpret this as a big array of numbers:
Image('images/mnist2.png', width=500, height=500)
Since each image has 28 by 28 pixels, we get a 28x28 array. We can flatten each array into a 28∗28=784 dimensional vector. Each component of the vector is a value between zero and one describing the intensity of the pixel. Thus, we generally think of MNIST as a collection of 784-dimensional vectors. Flattening the image does throw away information about the 2D structure of the image. Methods that look directly at the 2D structure exist and will be covered in later tutorials.
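The flattening step can be illustrated in a couple of NumPy lines (a toy image standing in for an MNIST digit):

```python
import numpy as np

image = np.random.rand(28, 28)   # pixel intensities in [0, 1]
flat = image.reshape(-1)         # row-major flattening: 28*28 = 784 values
print(flat.shape)                # (784,)
```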
Now we load the data and see how it is organized.
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('example_data/MNIST_data', one_hot=True)
Extracting example_data/MNIST_data/train-images-idx3-ubyte.gz Extracting example_data/MNIST_data/train-labels-idx1-ubyte.gz Extracting example_data/MNIST_data/t10k-images-idx3-ubyte.gz Extracting example_data/MNIST_data/t10k-labels-idx1-ubyte.gz
The dataset is split into three parts: one for training, one for testing and one for validation. Each part is an n-dimensional array with shape [number of examples, 784]. Each example is an image with a corresponding label, a number between 0 and 9 that represents the digit depicted in the image.
# The MNIST dataset has 10 classes, representing the digits 0 through 9.
NUM_CLASSES = 10
# The MNIST images are always 28x28 pixels.
IMAGE_SIZE = 28
IMAGE_PIXELS = IMAGE_SIZE * IMAGE_SIZE
print('train examples: ', mnist.train.num_examples)
print('test examples: ', mnist.test.num_examples)
print('validation examples: ', mnist.validation.num_examples)
train examples:  55000
test examples:  10000
validation examples:  5000
In order to classify the digits we will use an output layer with 10 units, one for each digit. For this reason our labels are encoded as "one-hot vectors". A one-hot vector is a vector which is 0 in most dimensions, and 1 in a single dimension. In this case the n-th digit will be represented as a vector which is 1 in the n-th dimension. For example, 3 would be [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]. Consequently, mnist.train.labels is a [55000, 10] array of floats.
print(mnist.train.labels[0])
[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
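One-hot encoding is easy to reproduce with NumPy; this hypothetical sketch encodes a small batch of digit labels by indexing into the identity matrix:

```python
import numpy as np

labels = np.array([7, 3, 0])
one_hot = np.eye(10)[labels]   # row n of the 10x10 identity = one-hot of n
print(one_hot[0])              # label 7 -> 1 in position 7
```

The first row matches the training label printed above, which encodes a 7.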
We will use a simple linear model for classification: logistic regression, which outputs a probability for each class. In particular, we will use softmax classification to handle the case of multiple classes.
Image('images/logistic_function.png', height="500", width="500")
where $z$ in the previous function has the value $W^Tx+b$ and the output $c$ of the function is the probability of belonging to that class, that is $P(Y=c \mid X=x)$, given $x$ and parametrized by $W$ and $b$.
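The softmax generalization used below for multiple classes can be sketched in a few lines of NumPy (names are illustrative): each logit is mapped to a probability, and the probabilities sum to one.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])  # example logits for three classes
p = softmax(z)
print(p, p.sum())              # three probabilities summing to 1
```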
Phase 1: build the graph.
Step 1: define two placeholder variables, one for the input $x$ and the other for the (true) output $y$.
x = tf.placeholder(tf.float32, shape=[None, IMAGE_PIXELS])
y = tf.placeholder(tf.int32, shape=[None, NUM_CLASSES])
Step 2: weights and biases
The first variable that must be optimized is called weights and is defined here as a TensorFlow variable initialized with zeros and with shape [IMAGE_PIXELS, NUM_CLASSES], so it is a 2-dimensional tensor (or matrix) with IMAGE_PIXELS rows and NUM_CLASSES columns. The second variable that must be optimized is called biases and is defined as a 1-dimensional tensor (or vector) of length NUM_CLASSES.
weights = tf.Variable(tf.zeros([IMAGE_PIXELS, NUM_CLASSES]))
biases = tf.Variable(tf.zeros([NUM_CLASSES]))
Step 3: build the model
This simple mathematical model multiplies the images in the placeholder variable x with the weights and then adds the biases. The result is a matrix of shape [num_images, NUM_CLASSES]: since x has shape [num_images, IMAGE_PIXELS] and weights has shape [IMAGE_PIXELS, NUM_CLASSES], their product is a matrix with shape [num_images, NUM_CLASSES], and the biases vector is then added to each row of that matrix.
logits = tf.matmul(x, weights) + biases
Now logits is a matrix with num_images rows and NUM_CLASSES columns, where the element in the $i$'th row and $j$'th column is an estimate of how likely the $i$'th input image is to be of the $j$'th class. However, these estimates are rough and difficult to interpret because the numbers may be very small or very large, so we normalize them so that each row of the logits matrix sums to one and each element is limited between zero and one. This is calculated using the so-called softmax function and the result is stored in y_pred.
y_pred = tf.nn.softmax(logits)
The predicted class can be calculated from the y_pred matrix by taking the index of the largest element in each row.
y_pred_cls = tf.argmax(y_pred, axis=1)
Step 4: define the loss function
To make the model better at classifying the input images, we must somehow change the variables weights and biases. To do this we first need to know how well the model currently performs, by comparing the predicted output y_pred to the desired output y.
The softmax is a generalization of the logistic function: it is used to calculate the probability associated with each class and it is useful in multiclass classification problems. A softmax regression has two steps: first we add up the evidence of our input being in certain classes, and then we convert that evidence into probabilities. Here a class is one of the 10 possible digits.
Next we define the cost function, in this case cross-entropy. Cross-entropy gives us a way to express how different two probability distributions are: the more different the distributions p and q are, the more the cross-entropy of p with respect to q exceeds the entropy of p (and symmetrically, the more the cross-entropy of q with respect to p exceeds the entropy of q). If the distributions are the same, this difference is zero, and it grows as the distributions diverge.
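These properties are easy to verify numerically. A minimal sketch (our own toy distributions, not the notebook's data): cross-entropy against a one-hot target penalizes confident wrong predictions far more than confident correct ones, and is essentially zero when the distributions match.

```python
import numpy as np

def cross_entropy(p, q):
    # Cross-entropy of the true distribution p with respect to prediction q.
    return -np.sum(p * np.log(q))

p = np.array([0.0, 1.0, 0.0])          # true one-hot distribution
q_good = np.array([0.05, 0.9, 0.05])   # confident, correct prediction
q_bad = np.array([0.9, 0.05, 0.05])    # confident, wrong prediction

print(cross_entropy(p, q_good))        # small
print(cross_entropy(p, q_bad))         # large
print(cross_entropy(p, p + 1e-12))     # ~0 when the distributions match
```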
The goal of optimization is therefore to minimize the cross-entropy, getting it as close to zero as possible by changing the weights and biases of the model.
TensorFlow has a built-in function for calculating the cross-entropy. Note that it takes the values of the logits, because it also calculates the softmax internally.
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits,
                                                           labels=y)
We have now calculated the cross-entropy for each of the image classifications so we have a measure of how well the model performs on each image individually. But in order to use the cross-entropy to guide the optimization of the model's variables we need a single scalar value, so we simply take the average of the cross-entropy for all the image classifications.
cost = tf.reduce_mean(cross_entropy)
Step 5: define the optimizer
Now that we have a cost measure that must be minimized, we can then create an optimizer. In this case it is the basic form of Gradient Descent where the learning rate is set at $0.5$.
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(cost)
We need a few more performance measures to display the progress to the user.
The first is a vector of booleans indicating whether the predicted class equals the true class of each image.
The second calculates the classification accuracy by type-casting the vector of booleans to floats, so that False becomes 0 and True becomes 1, and then taking the average of these numbers.
correct_prediction = tf.equal(tf.argmax(tf.nn.softmax(logits), 1),
tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
Phase 2: train the model
Step 1: initialize variables
There are 55,000 images in the training set. It would take a long time to calculate the gradient of the model using all of these images in every step, so we use Stochastic Gradient Descent, which only uses a small batch of images in each iteration of the optimizer.
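Mini-batch selection amounts to shuffling the data and slicing it into chunks. A toy sketch of the idea (our own names; mnist.train.next_batch does something similar internally):

```python
import numpy as np

X = np.arange(20).reshape(10, 2)      # 10 tiny "examples" with 2 features
BATCH = 4

idx = np.random.permutation(len(X))   # shuffle indices once per epoch
batches = [X[idx[i:i + BATCH]] for i in range(0, len(X), BATCH)]
print(len(batches), batches[0].shape) # 3 batches; the last one is smaller
```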
NUM_ITERATIONS = 1000
BATCH_SIZE = 100
session = tf.Session()
session.run(tf.global_variables_initializer())
Step 2: run optimizer
The following loop performs a number of optimization iterations so as to gradually improve the weights and biases of the model. In each iteration, a new batch of data is selected from the training set and TensorFlow executes the optimizer using those training samples.
for i in range(NUM_ITERATIONS):
    x_batch, y_batch = mnist.train.next_batch(BATCH_SIZE)
    feed_dict_train = {x: x_batch,
                       y: y_batch}
    session.run(optimizer, feed_dict=feed_dict_train)
Step 3: verify model accuracy
The dictionary feed_dict_test contains the test data, which the model has not seen during training.
feed_dict_test = {x: mnist.test.images,
y: mnist.test.labels}
acc = session.run(accuracy, feed_dict=feed_dict_test)
print("Accuracy on test set: {0:.1%}".format(acc))
Accuracy on test set: 92.0%
cls_true = np.argmax(mnist.test.labels, axis=1)
cls_pred = session.run(y_pred_cls, feed_dict=feed_dict_test)
cm = confusion_matrix(y_true=cls_true,
y_pred=cls_pred)
print(cm)
[[ 953    0    1    2    0   10    7    3    4    0]
 [   0 1105    2    2    1    2    4    2   17    0]
 [   9    7  891   18   11    9   14   12   51   10]
 [   3    1   13  908    0   39    3    8   23   12]
 [   1    1    2    1  900    0   12    2   10   53]
 [   8    3    3   20    7  799   12    4   29    7]
 [   9    3    3    2   13   22  899    3    4    0]
 [   2    9   19    8    6    1    0  932    3   48]
 [   4    5    4   14    9   29    9    4  885   11]
 [  10    6    0    6   26   11    1   13    9  927]]
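In a confusion matrix, rows are true classes and columns are predicted ones, so correct predictions sit on the diagonal. A small synthetic two-class example (not the MNIST matrix above) shows how to derive overall accuracy and per-class recall from it:

```python
import numpy as np

cm = np.array([[50,  2],     # true class 0: 50 correct, 2 mistaken for 1
               [ 5, 43]])    # true class 1: 43 correct, 5 mistaken for 0

accuracy = np.trace(cm) / cm.sum()        # diagonal = correct predictions
per_class = np.diag(cm) / cm.sum(axis=1)  # recall for each true class
print(accuracy, per_class)
```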
When using TensorFlow, the first step is to write a Python program that assembles the graph; second, the graph is run by the C++ runtime. In a sense TensorFlow is a declarative program, because the specification of the computation is separated from its execution.
From another perspective, TensorFlow is both a programming language and a compiler of machine learning models: its input is a description of a machine learning model (the graph), which it rewrites into an optimized and executable form.
The pros of this approach are:
The cons are:
standard debugging tools such as print and pdb are not effective.
For this reason, since TensorFlow 1.5, Eager execution is available. Eager is:
A NumPy-like library for numerical computation with support for GPU acceleration and automatic differentiation. A flexible platform for machine learning research and experimentation (see user guide).
import tensorflow.contrib.eager as tfe
tfe.enable_eager_execution()
The pros of eager are:
i = tf.constant(0)
while i < 10:
    i = tf.add(i, 1)
    print('New value is: {:d}'.format(i))
New value is: 1
New value is: 2
New value is: 3
New value is: 4
New value is: 5
New value is: 6
New value is: 7
New value is: 8
New value is: 9
New value is: 10
For these reasons there is no longer any need for placeholders, sessions, control dependencies, "lazy loading" or scopes.
For example, in this case there is no need to declare a placeholder for x, nor to create a session.
x = [[3.]]
m = tf.matmul(x, x)
print(m)
tf.Tensor([[9.]], shape=(1, 1), dtype=float32)
The most interesting thing to note is that tensors act like NumPy arrays, for example:
x = tf.constant([1.1, 2.2, 3.3])
print(type(x.numpy()) == np.ndarray)
True
and are compatible with numpy functions:
print(np.square(x))
[ 1.21 4.84 10.889999]
Tensors are also iterable:
for i in x:
    print('{:.1f}'.format(i))
1.1 2.2 3.3
Automatic differentiation is built into eager execution: in practice, operations are traced onto a tape and played back to compute gradients. For example:
def square(x):
    return tf.multiply(x, x)

grad = tfe.gradients_function(square)
print(square(3.))
print(grad(3.))
tf.Tensor(9.0, shape=(), dtype=float32) [<tf.Tensor: id=202, shape=(), dtype=float32, numpy=6.0>]
The API for automatic differentiation works even when eager execution is not enabled.
Another way to think about TensorFlow is as a collection of operations and a means to execute computations composed of such operations. A Session is one execution method; with eager enabled, Python itself becomes the execution method.
Use eager if you want flexibility when developing a new model or if you want to explore the TensorFlow API. At the moment eager is under active development and not all of the TensorFlow API is supported. Note also that with eager execution enabled, computations do not run automatically on GPUs: the user must explicitly specify that an operation should run on the GPU with tf.device.
Visit www.add-for.com for more tutorials and updates.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.