First, we need to load the mxnet gem.
require 'mxnet'
true
Initialize the variables that are used below.
@data_dir = File.expand_path("../data/mnist")
@data_ctx = MXNet.cpu
@model_ctx = MXNet.cpu
#@model_ctx = MXNet.gpu
#<MXNet::Context:0x00000000024232c0 @device_type_id=1, @device_id=0>
Note that if you have a CUDA-capable GPU available, uncommenting @model_ctx = MXNet.gpu
enables GPU execution for all the computation below.
Set up data loaders for both training and validation.
num_inputs = 784
num_outputs = 10
batch_size = 64
num_examples = 60000
train_iter = MXNet::IO::MNISTIter.new(
  image: File.join(@data_dir, 'train-images-idx3-ubyte'),
  label: File.join(@data_dir, 'train-labels-idx1-ubyte'),
  batch_size: batch_size,
  shuffle: true)
val_iter = MXNet::IO::MNISTIter.new(
  image: File.join(@data_dir, 't10k-images-idx3-ubyte'),
  label: File.join(@data_dir, 't10k-labels-idx1-ubyte'),
  batch_size: batch_size,
  shuffle: false)
nil
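As a quick sanity check, we can peek at the first batch and confirm its shape. This is a sketch that assumes MNISTIter mixes in Enumerable and exposes a reset method like its Python counterpart.
batch = train_iter.first
puts batch.data[0].shape.inspect   # expect [64, 1, 28, 28]
puts batch.label[0].shape.inspect  # expect [64]
train_iter.reset                   # rewind so training starts from the beginning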
Initialize the weights and biases of the neural network.
#######################
# Set some constants so it's easy to modify the network later
#######################
num_hidden = 256
weight_scale = 0.01
#######################
# Allocate parameters for the first hidden layer
#######################
@w1 = MXNet::NDArray.random_normal(shape: [num_inputs, num_hidden], scale: weight_scale, ctx: @model_ctx)
@b1 = MXNet::NDArray.random_normal(shape: [num_hidden], scale: weight_scale, ctx: @model_ctx)
#######################
# Allocate parameters for the second hidden layer
#######################
@w2 = MXNet::NDArray.random_normal(shape: [num_hidden, num_hidden], scale: weight_scale, ctx: @model_ctx)
@b2 = MXNet::NDArray.random_normal(shape: [num_hidden], scale: weight_scale, ctx: @model_ctx)
#######################
# Allocate parameters for the output layer
#######################
@w3 = MXNet::NDArray.random_normal(shape: [num_hidden, num_outputs], scale: weight_scale, ctx: @model_ctx)
@b3 = MXNet::NDArray.random_normal(shape: [num_outputs], scale: weight_scale, ctx: @model_ctx)
nil
Mark all the parameters so that their gradients are computed automatically.
@all_params = [@w1, @b1, @w2, @b2, @w3, @b3]
@all_params.each(&:attach_grad)
nil
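To see what attach_grad buys us, here is a minimal autograd check. It is a sketch that assumes MXNet::NDArray.array exists as in the Python API and that Autograd.record returns the value of its block, as the training loop below also relies on.
x = MXNet::NDArray.array([1.0, 2.0, 3.0], ctx: @model_ctx)
x.attach_grad
y = MXNet::Autograd.record { x * x }
y.backward
puts x.grad.inspect  # the gradient of x*x is 2x, so roughly [2, 4, 6]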
Define the ReLU activation function.
def relu(x)
  MXNet::NDArray.maximum(x, MXNet::NDArray.zeros_like(x))
end
:relu
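A quick check of relu on a small vector (again assuming MXNet::NDArray.array works as in the Python API): negative entries are clamped to zero, positive entries pass through.
x = MXNet::NDArray.array([-2.0, 0.0, 3.0])
puts relu(x).inspect  # expect [0, 0, 3]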
Define the softmax cross-entropy function that is used to compute prediction losses.
def softmax_cross_entropy(y_hat_linear, y)
  -MXNet::NDArray.nansum(y * MXNet::NDArray.log_softmax(y_hat_linear), axis: 0, exclude: true)
end
:softmax_cross_entropy
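As a sanity check, when the one-hot label picks the largest logit the loss should be small. A sketch, assuming MXNet::NDArray.array as above:
y_hat = MXNet::NDArray.array([[2.0, 0.5, 0.1]])
y     = MXNet::NDArray.array([[1.0, 0.0, 0.0]])
puts softmax_cross_entropy(y_hat, y).inspect  # one value per row; here about 0.32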
Define the neural network: two ReLU hidden layers followed by a linear output layer.
def net(x)
  # first hidden layer
  h1_linear = MXNet::NDArray.dot(x, @w1) + @b1
  h1 = relu(h1_linear)
  # second hidden layer
  h2_linear = MXNet::NDArray.dot(h1, @w2) + @b2
  h2 = relu(h2_linear)
  # output layer
  y_hat_linear = MXNet::NDArray.dot(h2, @w3) + @b3
  y_hat_linear
end
:net
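A quick shape check on a dummy batch (assuming MXNet::NDArray.ones mirrors the Python API): a batch of flattened 28x28 images should map to one logit per class.
dummy = MXNet::NDArray.ones([2, 784], ctx: @model_ctx)
puts net(dummy).shape.inspect  # expect [2, 10]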
Next, define the parameter optimizer. In this notebook, plain stochastic gradient descent is used: each parameter is updated in place as param = param - lr * param.grad. The in-place write via param[0..-1] keeps the original NDArray, with its attached gradient buffer, intact.
def sgd(params, lr)
  params.each do |param|
    param[0..-1] = param - lr * param.grad
  end
end
:sgd
The next function calculates the prediction accuracy for a given data set.
def evaluate_accuracy(data_iter)
  numerator = 0.0
  denominator = 0.0
  data_iter.each_with_index do |batch, i|
    data = batch.data[0].as_in_context(@model_ctx).reshape([-1, 784])
    label = batch.label[0].as_in_context(@model_ctx)
    output = net(data)
    predictions = MXNet::NDArray.argmax(output, axis: 1)
    numerator += MXNet::NDArray.sum(predictions == label)
    denominator += data.shape[0]
  end
  (numerator / denominator).as_scalar
end
:evaluate_accuracy
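Before training, the weights are random, so accuracy should be near chance (about 0.10 for 10 classes). Running the evaluation now is a cheap end-to-end check of the pipeline:
puts evaluate_accuracy(val_iter)  # expect roughly 0.1 before any training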
Execute the training loop for 10 epochs.
epochs = 10
learning_rate = 0.001
smoothing_constant = 0.01
epochs.times do |e|
  start = Time.now
  cumulative_loss = 0.0
  train_iter.each_with_index do |batch, i|
    data = batch.data[0].as_in_context(@model_ctx).reshape([-1, 784])
    label = batch.label[0].as_in_context(@model_ctx)
    label_one_hot = MXNet::NDArray.one_hot(label, depth: 10)
    loss = MXNet::Autograd.record do
      output = net(data)
      softmax_cross_entropy(output, label_one_hot)
    end
    loss.backward
    sgd(@all_params, learning_rate)
    cumulative_loss += MXNet::NDArray.sum(loss).as_scalar
  end
  val_accuracy = evaluate_accuracy(val_iter)
  train_accuracy = evaluate_accuracy(train_iter)
  duration = Time.now - start
  puts "Epoch #{e}. Loss: #{cumulative_loss/num_examples}, Train_acc #{train_accuracy}, Val_acc #{val_accuracy} (#{duration} sec)"
end
Epoch 0. Loss: 1.2580198553880055, Train_acc 0.8733324408531189, Val_acc 0.8747996687889099 (3.4784769 sec)
Epoch 1. Loss: 0.3402405142625173, Train_acc 0.925293505191803, Val_acc 0.9240785241127014 (2.9854487 sec)
Epoch 2. Loss: 0.2273845267256101, Train_acc 0.9493563175201416, Val_acc 0.9460136294364929 (3.1201378 sec)
Epoch 3. Loss: 0.16553548452655475, Train_acc 0.9613627195358276, Val_acc 0.9580328464508057 (3.3423264 sec)
Epoch 4. Loss: 0.1294232620060444, Train_acc 0.969266951084137, Val_acc 0.9645432829856873 (3.1851936 sec)
Epoch 5. Loss: 0.10531740031341712, Train_acc 0.9744197130203247, Val_acc 0.9686498641967773 (3.008946 sec)
Epoch 6. Loss: 0.08776766262004773, Train_acc 0.9786719679832458, Val_acc 0.9706530570983887 (2.9635756 sec)
Epoch 7. Loss: 0.07455756265173356, Train_acc 0.9818903207778931, Val_acc 0.97265625 (2.9956695 sec)
Epoch 8. Loss: 0.06412682893921931, Train_acc 0.9842416048049927, Val_acc 0.973557710647583 (2.9654095 sec)
Epoch 9. Loss: 0.05563920784989993, Train_acc 0.9862593412399292, Val_acc 0.9748597741127014 (2.9865925 sec)
10
Now we have a model with about 0.97 validation accuracy.
Let's use the trained model for prediction.
We need the following helper function to display the input image.
require 'chunky_png'
require 'base64'
def imshow(ary)
  height, width = ary.shape
  fig = ChunkyPNG::Image.new(width, height, ChunkyPNG::Color::TRANSPARENT)
  # scale pixel values into the 0..255 range
  ary = ((ary - ary.min) / (ary.max - ary.min)) * 255
  0.upto(height - 1) do |i|
    0.upto(width - 1) do |j|
      v = ary[i, j].round
      fig[j, i] = ChunkyPNG::Color.rgba(v, v, v, 255)
    end
  end
  src = 'data:image/png;base64,' + Base64.strict_encode64(fig.to_blob)
  IRuby.display "<img src='#{src}' width='#{width*2}' height='#{height*2}' />", mime: 'text/html'
end
:imshow
Define the function for prediction.
def predict(data)
output = net(data)
MXNet::NDArray.argmax(output, axis: 1)
end
:predict
Create a new data iterator for prediction and generate predictions for the first 10 samples.
sample_size = 10
sample_iter = MXNet::IO::MNISTIter.new(
  image: File.join(@data_dir, 't10k-images-idx3-ubyte'),
  label: File.join(@data_dir, 't10k-labels-idx1-ubyte'),
  batch_size: sample_size,
  shuffle: true,
  seed: rand(100))
sample_iter.each do |batch|
  data = batch.data[0].as_in_context(@model_ctx)
  label = batch.label[0]
  im = data.transpose(axes: [1, 0, 2, 3]).reshape([10*28, 28, 1])
  imshow(im[0..-1, 0..-1, 0].to_narray)
  pred = predict(data.reshape([-1, 784]))
  puts "model predictions are: #{pred.inspect}"
  puts
  puts "true labels: #{label.inspect}"
  break
end
model predictions are: [0, 2, 7, 8, 7, 1, 5, 4, 3, 9] <MXNet::NDArray 10 @cpu(0)>

true labels: [0, 2, 3, 8, 7, 1, 5, 8, 3, 9] <MXNet::NDArray 10 @cpu(0)>
In this notebook, a simple neural-network classifier for MNIST was implemented with the MXNet NDArray API.
A more complex example is available here: https://github.com/mrkn/mxnet.rb/blob/taiwan2018/example/scratch/resnet/wrn.rb