While Google's TensorFlow and Facebook's PyTorch are popular libraries, and (with some caveats) are available in the default Anaconda channel, they are not installed by default. Let's briefly look at the major libraries, then we will install and play with those two.
Let's set up a Conda environment with the necessary libraries. While Anaconda does have PyTorch, it only has it for Linux, so let's add the pytorch channel and install pytorch from there.
conda create -n mlwork python==3.6 anaconda ipykernel tensorflow pytorch -c pytorch
You can replace the metapackage anaconda with the list of packages you will be using. If you are on OSC, logging out then back in should be enough to set this up for Jupyter. On other systems using the latest Anaconda:
conda activate mlwork
python -m ipykernel install --user --name <pick_name_here>
conda deactivate
Or, an older Anaconda:
source activate mlwork
python -m ipykernel install --user --name <pick_name_here>
source deactivate
(Skip source on Windows.)
All the libraries have CPU/GPU support, decent performance, etc. There are lots more; these are just the most popular.
You may have noticed that I'm covering ML libraries before covering ML. That's because we have already covered fitting, and ML is mostly fitting. Let's review fitting:
What's different in ML? We usually use a different metric instead of NLL, and the models are larger but made of simpler parts. That's about it! So how can we improve fitting?
Why use a framework instead of just writing plain Numpy? A few possible reasons:
Numpy classically had issues with temporaries. It's much better now (at least on Numpy 1.13+ on Linux and macOS)
import numpy as np
np.random.seed(42)
N = 1_000_000
a = np.random.rand(N)
b = np.random.rand(N)
c = np.random.rand(N)
%%timeit
s = a + b + c # How many arrays are in memory? (classic: 5, 1.13+: 4 on some systems)
%%timeit
ab = a + b
s = ab + c # Right here, how many arrays are in memory? (5)
del ab
%%timeit
s = a + b
s += c # (4)
Depending on your system, the first time should look like one of the other two times.
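If you want to avoid temporaries entirely (an option not shown in the timings above), NumPy ufuncs accept an `out=` argument that writes into a preallocated array; a minimal sketch:

```python
import numpy as np

np.random.seed(42)
N = 1_000_000
a = np.random.rand(N)
b = np.random.rand(N)
c = np.random.rand(N)

# Preallocate the output once, then reuse it: no temporaries at all.
s = np.empty_like(a)
np.add(a, b, out=s)   # s = a + b, written in place
np.add(s, c, out=s)   # s = s + c, also in place

assert np.allclose(s, a + b + c)
```

This is the manual version of what the in-place `s += c` form does, and it is also how libraries like numexpr squeeze out the remaining allocations.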
Let's make a small sample of data to fit.
from scipy.optimize import minimize
np.random.seed(42)
dist = np.random.normal(loc=1, scale=2., size=100_000)
def gaussian(x, μ, σ):
    return 1/np.sqrt(2*np.pi*σ**2) * np.exp(-(x-μ)**2/(2*σ**2))
def nll(params, dist):
    mean, sigma = params
    return -np.sum(np.log(gaussian(dist, mean, sigma)))
minimize(nll, (.5, 1.), args=(dist,))
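The fitted parameters come back on the result object's x attribute; repeating the fit above and reading them off (the printed values are expectations, not exact):

```python
import numpy as np
from scipy.optimize import minimize

np.random.seed(42)
dist = np.random.normal(loc=1, scale=2.0, size=100_000)

def gaussian(x, mu, sigma):
    return 1/np.sqrt(2*np.pi*sigma**2) * np.exp(-(x-mu)**2/(2*sigma**2))

def nll(params, dist):
    mean, sigma = params
    return -np.sum(np.log(gaussian(dist, mean, sigma)))

res = minimize(nll, (0.5, 1.0), args=(dist,))
fit_mean, fit_sigma = res.x
print(res.success, fit_mean, fit_sigma)  # should be close to 1 and 2
```

Since the sample was drawn with loc=1, scale=2, the fitted mean and sigma should land very near those values.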
When converting to Torch, let's notice a couple of quirks compared to Numpy: use the factory function torch.tensor to make Tensors, not the constructor torch.Tensor.
import torch
tdist = torch.tensor(dist, dtype=torch.float64)
tmean = torch.tensor([0.5], dtype=torch.float64, requires_grad=True)
tsigma = torch.tensor([0.5], dtype=torch.float64, requires_grad=True)
def tgaussian(x, μ, σ):
    return 1/torch.sqrt(2*np.pi*σ**2) * torch.exp(-(x-μ)**2/(2*σ**2))
result = -torch.sum(torch.log(tgaussian(tdist, tmean, tsigma)))
print(result.item())
result.backward()
print(tmean.grad.item(), tsigma.grad.item())
Unfortunately, this is not trivial to plug into minimize, since the autograd graph hangs off result and is rebuilt from scratch each time this code runs, and the gradients land on the tensors rather than being returned.
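One common workaround (a sketch, not from the text above) is to skip minimize entirely and let one of PyTorch's own optimizers drive the fit, re-running the forward pass and backward() in a loop; here using torch.optim.Adam with an assumed learning rate and step count:

```python
import numpy as np
import torch

np.random.seed(42)
dist = np.random.normal(loc=1, scale=2.0, size=100_000)
tdist = torch.tensor(dist, dtype=torch.float64)

tmean = torch.tensor([0.5], dtype=torch.float64, requires_grad=True)
tsigma = torch.tensor([1.0], dtype=torch.float64, requires_grad=True)

def tgaussian(x, mu, sigma):
    return 1/torch.sqrt(2*np.pi*sigma**2) * torch.exp(-(x-mu)**2/(2*sigma**2))

# lr and the number of steps are illustrative choices, not tuned values.
opt = torch.optim.Adam([tmean, tsigma], lr=0.05)
for step in range(500):
    opt.zero_grad()                 # clear gradients from the last step
    nll = -torch.sum(torch.log(tgaussian(tdist, tmean, tsigma)))
    nll.backward()                  # rebuilds and walks a fresh graph each iteration
    opt.step()                      # update tmean and tsigma in place

print(tmean.item(), tsigma.item())  # should approach 1 and 2
```

Each pass through the loop builds a new graph, which is exactly the behavior that made wiring this into scipy awkward; here it is just the natural structure of the training loop.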
TensorFlow does not make any pretense about looking or acting like Numpy.
import tensorflow as tf
# Make the distribution a constant Tensor;
# it does not change in iterations so TensorFlow can optimize for that.
x = tf.constant(dist)
# Make placeholders for values we are going to "feed" in
# (0D tensor == scalar)
tf_mean = tf.placeholder(dtype=tf.float64)
tf_sigma = tf.placeholder(dtype=tf.float64)
# tf_gaussian is a Tensor graph (like a function) that can compute this expression!
tf_gaussian = 1/tf.sqrt(2*np.pi*tf_sigma**2) * tf.math.exp(-(x-tf_mean)**2/(2*tf_sigma**2))
# This is still just a "description of what to do", no computation has been done yet
tf_nll = -tf.reduce_sum(tf.math.log(tf_gaussian))
# We can compute symbolic derivatives with the graph, as well
tf_grads = tf.gradients(tf_nll, [tf_mean, tf_sigma])
with tf.Session() as sess:
    loss_value = sess.run(tf_nll,
                          feed_dict={tf_mean: 0.5,
                                     tf_sigma: 0.5})
    grads = sess.run(tf_grads,
                     feed_dict={tf_mean: 0.5,
                                tf_sigma: 0.5})
print(loss_value, grads)
Notes: nothing is computed until you call sess.run; everything before that just builds the graph, and the actual computation inside sess.run is fast.