In this deep learning tutorial notebook, we will cover the following topics: setting up and verifying a GPU-accelerated Keras/TensorFlow installation, loading the CIFAR10 dataset, visualizing images with Holoviews, and preparing data for training (scaling inputs, one-hot encoding labels, and saving transformed arrays to HDF5).
Note about shell commands in Jupyter Notebooks: To avoid having to open a separate terminal window, we will be using a Jupyter feature that allows you to execute commands in the bash shell by starting the line with a !. For example:
! ls *.ipynb
will show all the Jupyter notebooks in the current directory. These commands could just as well be typed in a terminal window.
In this tutorial, we will focus on the Keras neural networking library. Keras provides an easy-to-use, high-level Python API for building and training neural networks, and has quickly become the most popular interface for deep learning. Part of the power of Keras is that it delegates the computation to one of the following deep learning frameworks: TensorFlow, Theano, or Microsoft Cognitive Toolkit (CNTK).
TensorFlow is the default Keras backend, and that is what we will use for this tutorial. This will enable us to make use of TensorFlow tools, like TensorBoard, in later tutorial units.
The benefit of libraries like Keras/TensorFlow is that they are designed to be hardware independent. The same model code (with few exceptions) will execute on either the CPU or the GPU. This allows you to train on a system with a GPU and deploy on a server with only CPUs, or any combination.
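For example, here is a minimal sketch of explicit device placement, using the same TF1-style API we use later in this notebook; changing '/cpu:0' to '/gpu:0' is the only edit needed to run the same computation on the first GPU:
import tensorflow as tf
# Pin these ops to the CPU; '/gpu:0' would place them on the first GPU instead
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0])
    b = a * 2
with tf.Session() as sess:
    print(sess.run(b))  # [2. 4. 6.]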
We will be using GPUs to speed up our training today. Any NVIDIA GPU from the last 5 years can be used with TensorFlow, although for most problems we suggest a GPU with at least 8 GB of memory. The GeForce product line is fine for learning, but generally the Titan or Quadro cards are better for workstations and the Tesla cards for servers. For deep learning, we generally suggest a Tesla-class card such as the K80, P100, or V100, all of which can be rented from most cloud computing providers.
For the best experience, we suggest using Linux for GPU-accelerated deep learning. We will be using 64-bit Ubuntu Linux 18.04 in this tutorial, but any Linux distribution supported by the official NVIDIA drivers will work. If you are using Anaconda, you do not need to install CUDA to use the GPU-accelerated packages, but the official NVIDIA drivers (not the free Nouveau drivers that come preinstalled with some Linux distributions) are required.
On Linux, we can check the GPU hardware available on our system with the nvidia-smi command:
! nvidia-smi
This table shows the GPU name, total and used GPU memory, temperature, power consumption, and utilization. For this tutorial, we will be using the slightly older Tesla P100 cards with 16 GB of memory per device.
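If you only need a one-line summary of each device, nvidia-smi also has a list mode:
! nvidia-smi -L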
Once you have access to a suitable NVIDIA GPU and operating system drivers, you can get started very quickly by installing Anaconda for 64-bit Linux.
Once Anaconda is installed, you can use conda to install the GPU-accelerated Keras/TensorFlow:
conda install keras-gpu
(For this tutorial, we also use the notebook, h5py, bokeh, and holoviews packages.)
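If you are setting up your own machine, you could install everything into a fresh conda environment in one step (dl-tutorial is just a hypothetical environment name):
conda create -n dl-tutorial keras-gpu notebook h5py bokeh holoviews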
We have already preloaded these packages into the tutorial environment, which you can verify with conda list:
! conda list 'keras|tensorflow|bokeh|holoviews|notebook|h5py' # Using a regular expression to only show some packages
Now that we have verified the correct conda packages are installed, let's start importing some libraries. First, we are going to import Holoviews (for plotting) and enable the Bokeh backend:
import numpy as np
np.warnings.filterwarnings('ignore') # Hide np.floating warning
import holoviews as hv
import colorcet as cc
hv.extension('bokeh')
Next, let's import Keras:
import keras
keras.__version__
Even though we are using Keras for the majority of the tutorial, we can also import and access TensorFlow directly.
Allocating and deallocating GPU memory can be slow (much slower than on the CPU), so most deep learning frameworks use memory pools to speed things up. When a TensorFlow session initializes the GPU, it grabs 90% of the GPU memory in one big block and then internally divides it up for different arrays. This is normally not a problem, but in this tutorial we will have several notebooks open at once, each with its own TensorFlow session. For that reason, we will use the following trick to tell TensorFlow it is OK to grow its allocation as needed, rather than grabbing all of that memory up front.
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth=True
sess = tf.Session(config=config)
Let's verify our GPUs have been detected by TensorFlow:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
Here we can see that TensorFlow has detected both CPU and GPU devices.
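We can also confirm that Keras itself has selected the TensorFlow backend:
import keras.backend as K
print(K.backend())  # should print 'tensorflow'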
Keras includes a datasets package that will automatically download and import standard test datasets into NumPy arrays.
For this tutorial, we will use the CIFAR10 image set, which consists of 60,000 32x32 RGB color images evenly divided among 10 classes, split into 50,000 training images and 10,000 test images.
Let's load the data:
from keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# The data only has numeric categories so we also have the string labels below
cifar10_labels = np.array(['airplane', 'automobile', 'bird', 'cat', 'deer',
                           'dog', 'frog', 'horse', 'ship', 'truck'])
The data is organized into NumPy arrays of image data and label data, which we can see from the shape information:
print('x_train shape:', x_train.shape)
print('y_train shape:', y_train.shape)
print('x_test shape:', x_test.shape)
print('y_test shape:', y_test.shape)
For the x arrays, the indices are ordered as (sample, row, column, channel).
We can pull out a single image and plot it:
# For more information on Holoviews configuration, see http://holoviews.org/user_guide/Customizing_Plots.html
i = 12
print(cifar10_labels[y_train[i][0]])
layout = hv.RGB(x_train[i])
hv.output(
    layout.opts(
        hv.opts.RGB(xaxis=None, yaxis=None),
    ),
    size=48
)
We can also split the image into its three color channels and plot each one in grayscale:
i = 12
print(cifar10_labels[y_train[i][0]])
layout = hv.Image(x_train[i,:,:,0], label='r') \
    + hv.Image(x_train[i,:,:,1], label='g') \
    + hv.Image(x_train[i,:,:,2], label='b')
hv.output(
    layout.opts(
        hv.opts.Image(xaxis=None, yaxis=None, cmap='gray')
    ),
    size=48
)
If we want to see a bunch of images from a particular class, we can use NumPy fancy indexing and Holoviews layouts to make a grid:
# Deer are class 4
deer = (y_test[:,0] == 4)
images = [hv.RGB(x_test[deer][i]) for i in range(24)]
hv.output(
    hv.Layout(images).cols(8).opts(
        hv.opts.RGB(xaxis=None, yaxis=None)
    ),
    size=32
)
Note that Holoviews/Bokeh/Matplotlib all assume images are in RGB format, but not all systems load images that way. For example, OpenCV loads images from files and webcams in BGR format. To get a sense of what that error looks like, we can simulate it by reversing the channel axis:
i = 12
layout = hv.RGB(x_train[i], label='Correct (RGB)') + hv.RGB(x_train[i,:,:,::-1], label='Flipped (BGR)')
hv.output(
    layout.opts(
        hv.opts.RGB(xaxis=None, yaxis=None)
    ),
    size=48
)
For reasons of numerical precision and stability, it is a good idea to scale inputs to the network to floating point numbers approximately between -1 and 1. For best training performance on the GPU, 32-bit floating point numbers are preferred, as they offer higher throughput (between 2x and 24x, depending on your GPU) than 64-bit floating point numbers.
We can use standard NumPy functions and expressions to make this conversion:
x_train_norm = x_train.astype('float32')
x_test_norm = x_test.astype('float32')
x_train_norm /= 255
x_test_norm /= 255
print('x_train dtype:', x_train_norm.dtype)
print('x_train min/max:', x_train_norm.min(), x_train_norm.max())
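Dividing by 255 maps the pixel values into [0, 1], which is within the suggested range. If we wanted inputs centered on zero instead, a common alternative is to scale into [-1, 1] (shown as a sketch; the rest of this tutorial sticks with the [0, 1] scaling above):
x_train_centered = x_train.astype('float32') / 127.5 - 1  # maps 0..255 to -1..1
print(x_train_centered.min(), x_train_centered.max())     # -1.0 1.0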
For the output labels of the network, we will use one-hot encoding. To use one-hot encoding, we need to know the number of categories (or "classes") in our data. For CIFAR10, that number is 10.
import keras.utils
num_classes = 10
y_train_norm = keras.utils.to_categorical(y_train, num_classes)
y_test_norm = keras.utils.to_categorical(y_test, num_classes)
We can look at the first 5 rows to see that there is a single 1 in each row:
print('y_train shape:', y_train_norm.shape)
y_train_norm[:5]
Note that if we ever want to convert one-hot encoding back to category numbers, we can use the NumPy argmax function to find the column number with the largest value along axis 1 for each row:
np.argmax(y_train_norm, axis=1)[:5]
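As an aside, one-hot encoding is equivalent to indexing into an identity matrix with the category numbers. A pure-NumPy sketch (just an illustration, not how to_categorical is implemented):
one_hot = np.eye(num_classes)[y_train[:, 0]]  # row i of the identity matrix one-hot encodes category i
print(np.array_equal(one_hot, y_train_norm))  # True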
Because these transformations of the input and category labels are quick, we will not load pre-transformed data from disk in future units. However, if we wanted to save the transformed arrays, we could save them in an HDF5 file:
import h5py
with h5py.File('transformed_data.hdf5', 'w') as f:
    f.create_dataset('x_train', data=x_train_norm)
    f.create_dataset('y_train', data=y_train_norm)
    f.create_dataset('x_test', data=x_test_norm)
    f.create_dataset('y_test', data=y_test_norm)
These files can get very big:
! ls -lh transformed_data.hdf5
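If disk space is a concern, h5py can compress datasets transparently with its built-in gzip filter, at some cost in read and write speed. A sketch (we do not use compression elsewhere in this tutorial, and transformed_data_compressed.hdf5 is just an example filename):
with h5py.File('transformed_data_compressed.hdf5', 'w') as f:
    f.create_dataset('x_train', data=x_train_norm, compression='gzip')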
To load the data later:
with h5py.File('transformed_data.hdf5', 'r') as f:
    x_train_norm = f['x_train'][:] # The extra slice [:] forces h5py to load into memory
    y_train_norm = f['y_train'][:]
    x_test_norm = f['x_test'][:]
    y_test_norm = f['y_test'][:]
print(x_train_norm.shape)
print(x_train_norm.dtype)
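As a quick sanity check on the round trip, we can recompute the normalized training data and compare it to what we just loaded (this assumes x_train is still in memory):
print(np.array_equal(x_train_norm, x_train.astype('float32') / 255))  # True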