First, we need to load the mxnet gem.
require 'mxnet'
true
Initialize the variables that are used below.
@data_dir = File.expand_path("../data/mnist")
@data_ctx = MXNet.cpu
@model_ctx = MXNet.cpu
#@model_ctx = MXNet.gpu
#<MXNet::Context:0x00000000024232c0 @device_type_id=1, @device_id=0>
Note that if you have a CUDA-capable GPU available, uncommenting @model_ctx = MXNet.gpu
enables GPU execution for all the computation below.
Set up data loaders for both training and validation.
num_inputs = 784
num_outputs = 10
batch_size = 64
num_examples = 60000
train_iter = MXNet::IO::MNISTIter.new(
  image: File.join(@data_dir, 'train-images-idx3-ubyte'),
  label: File.join(@data_dir, 'train-labels-idx1-ubyte'),
  batch_size: batch_size,
  shuffle: true)
val_iter = MXNet::IO::MNISTIter.new(
  image: File.join(@data_dir, 't10k-images-idx3-ubyte'),
  label: File.join(@data_dir, 't10k-labels-idx1-ubyte'),
  batch_size: batch_size,
  shuffle: false)
nil
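As a quick sanity check, we can peek at the first batch and confirm its shape. This is a sketch that assumes MNISTIter mixes in Enumerable and exposes a reset method like its Python counterpart.
batch = train_iter.first
puts batch.data[0].shape.inspect   # expect [64, 1, 28, 28]
puts batch.label[0].shape.inspect  # expect [64]
train_iter.reset                   # rewind so training starts from the beginning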
Initialize the weights and biases of the neural network.
#######################
# Set some constants so it's easy to modify the network later
#######################
num_hidden = 256
weight_scale = 0.01
#######################
# Allocate parameters for the first hidden layer
#######################
@w1 = MXNet::NDArray.random_normal(shape: [num_inputs, num_hidden], scale: weight_scale, ctx: @model_ctx)
@b1 = MXNet::NDArray.random_normal(shape: [num_hidden], scale: weight_scale, ctx: @model_ctx)
#######################
# Allocate parameters for the second hidden layer
#######################
@w2 = MXNet::NDArray.random_normal(shape: [num_hidden, num_hidden], scale: weight_scale, ctx: @model_ctx)
@b2 = MXNet::NDArray.random_normal(shape: [num_hidden], scale: weight_scale, ctx: @model_ctx)
#######################
# Allocate parameters for the output layer
#######################
@w3 = MXNet::NDArray.random_normal(shape: [num_hidden, num_outputs], scale: weight_scale, ctx: @model_ctx)
@b3 = MXNet::NDArray.random_normal(shape: [num_outputs], scale: weight_scale, ctx: @model_ctx)
nil
Mark all the parameters so that their gradients are computed automatically.
@all_params = [@w1, @b1, @w2, @b2, @w3, @b3]
@all_params.each(&:attach_grad)
nil
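To see what attach_grad buys us, here is a minimal autograd check. It is a sketch that assumes MXNet::NDArray.array exists as in the Python API and that Autograd.record returns the value of its block, as the training loop below also relies on.
x = MXNet::NDArray.array([1.0, 2.0, 3.0], ctx: @model_ctx)
x.attach_grad
y = MXNet::Autograd.record { x * x }
y.backward
puts x.grad.inspect  # the gradient of x*x is 2x, so roughly [2, 4, 6]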
Define the ReLU activation function.
def relu(x)
  MXNet::NDArray.maximum(x, MXNet::NDArray.zeros_like(x))
end
:relu
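A quick check of relu on a small vector (again assuming MXNet::NDArray.array works as in the Python API): negative entries are clamped to zero, positive entries pass through.
x = MXNet::NDArray.array([-2.0, 0.0, 3.0])
puts relu(x).inspect  # expect [0, 0, 3]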
Define the softmax cross-entropy function that is used to compute prediction losses.
def softmax_cross_entropy(y_hat_linear, y)
  -MXNet::NDArray.nansum(y * MXNet::NDArray.log_softmax(y_hat_linear), axis: 0, exclude: true)
end
:softmax_cross_entropy
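As a sanity check, when the one-hot label picks the largest logit the loss should be small. A sketch, assuming MXNet::NDArray.array as above:
y_hat = MXNet::NDArray.array([[2.0, 0.5, 0.1]])
y     = MXNet::NDArray.array([[1.0, 0.0, 0.0]])
puts softmax_cross_entropy(y_hat, y).inspect  # one value per row; here about 0.32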
Define the neural network: two ReLU hidden layers followed by a linear output layer.
def net(x)
  # first hidden layer
  h1_linear = MXNet::NDArray.dot(x, @w1) + @b1
  h1 = relu(h1_linear)
  # second hidden layer
  h2_linear = MXNet::NDArray.dot(h1, @w2) + @b2
  h2 = relu(h2_linear)
  # output layer
  y_hat_linear = MXNet::NDArray.dot(h2, @w3) + @b3
  y_hat_linear
end
:net
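A quick shape check on a dummy batch (assuming MXNet::NDArray.ones mirrors the Python API): a batch of flattened 28x28 images should map to one logit per class.
dummy = MXNet::NDArray.ones([2, 784], ctx: @model_ctx)
puts net(dummy).shape.inspect  # expect [2, 10]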
Next, define the parameter optimizer. In this notebook, plain stochastic gradient descent is used: each parameter is updated in place as param = param - lr * param.grad. The in-place write via param[0..-1] keeps the original NDArray, with its attached gradient buffer, intact.
def sgd(params, lr)
  params.each do |param|
    param[0..-1] = param - lr * param.grad
  end
end
:sgd
The next function calculates the prediction accuracy for a given data set.
def evaluate_accuracy(data_iter)
  numerator = 0.0
  denominator = 0.0
  data_iter.each_with_index do |batch, i|
    data = batch.data[0].as_in_context(@model_ctx).reshape([-1, 784])
    label = batch.label[0].as_in_context(@model_ctx)
    output = net(data)
    predictions = MXNet::NDArray.argmax(output, axis: 1)
    numerator += MXNet::NDArray.sum(predictions == label)
    denominator += data.shape[0]
  end
  (numerator / denominator).as_scalar
end
:evaluate_accuracy
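Before training, the weights are random, so accuracy should be near chance (about 0.10 for 10 classes). Running the evaluation now is a cheap end-to-end check of the pipeline:
puts evaluate_accuracy(val_iter)  # expect roughly 0.1 before any training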
Execute the training loop for 10 epochs.
epochs = 10
learning_rate = 0.001
smoothing_constant = 0.01
epochs.times do |e|
  start = Time.now
  cumulative_loss = 0.0
  train_iter.each_with_index do |batch, i|
    data = batch.data[0].as_in_context(@model_ctx).reshape([-1, 784])
    label = batch.label[0].as_in_context(@model_ctx)
    label_one_hot = MXNet::NDArray.one_hot(label, depth: 10)
    loss = MXNet::Autograd.record do
      output = net(data)
      softmax_cross_entropy(output, label_one_hot)
    end
    loss.backward
    sgd(@all_params, learning_rate)
    cumulative_loss += MXNet::NDArray.sum(loss).as_scalar
  end
  val_accuracy = evaluate_accuracy(val_iter)
  train_accuracy = evaluate_accuracy(train_iter)
  duration = Time.now - start
  puts "Epoch #{e}. Loss: #{cumulative_loss/num_examples}, Train_acc #{train_accuracy}, Val_acc #{val_accuracy} (#{duration} sec)"
end
Epoch 0. Loss: 1.2580198553880055, Train_acc 0.8733324408531189, Val_acc 0.8747996687889099 (3.4784769 sec)
Epoch 1. Loss: 0.3402405142625173, Train_acc 0.925293505191803, Val_acc 0.9240785241127014 (2.9854487 sec)
Epoch 2. Loss: 0.2273845267256101, Train_acc 0.9493563175201416, Val_acc 0.9460136294364929 (3.1201378 sec)
Epoch 3. Loss: 0.16553548452655475, Train_acc 0.9613627195358276, Val_acc 0.9580328464508057 (3.3423264 sec)
Epoch 4. Loss: 0.1294232620060444, Train_acc 0.969266951084137, Val_acc 0.9645432829856873 (3.1851936 sec)
Epoch 5. Loss: 0.10531740031341712, Train_acc 0.9744197130203247, Val_acc 0.9686498641967773 (3.008946 sec)
Epoch 6. Loss: 0.08776766262004773, Train_acc 0.9786719679832458, Val_acc 0.9706530570983887 (2.9635756 sec)
Epoch 7. Loss: 0.07455756265173356, Train_acc 0.9818903207778931, Val_acc 0.97265625 (2.9956695 sec)
Epoch 8. Loss: 0.06412682893921931, Train_acc 0.9842416048049927, Val_acc 0.973557710647583 (2.9654095 sec)
Epoch 9. Loss: 0.05563920784989993, Train_acc 0.9862593412399292, Val_acc 0.9748597741127014 (2.9865925 sec)
10
Now we have a model with about 0.97 validation accuracy.
Let's use the trained model for prediction.
We need the following helper function to display the input image.
require 'chunky_png'
require 'base64'
def imshow(ary)
  height, width = ary.shape
  fig = ChunkyPNG::Image.new(width, height, ChunkyPNG::Color::TRANSPARENT)
  # scale pixel values into the 0..255 range
  ary = ((ary - ary.min) / (ary.max - ary.min)) * 255
  0.upto(height - 1) do |i|
    0.upto(width - 1) do |j|
      v = ary[i, j].round
      fig[j, i] = ChunkyPNG::Color.rgba(v, v, v, 255)
    end
  end
  src = 'data:image/png;base64,' + Base64.strict_encode64(fig.to_blob)
  IRuby.display "<img src='#{src}' width='#{width*2}' height='#{height*2}' />", mime: 'text/html'
end
:imshow
Define the function for prediction.
def predict(data)
output = net(data)
MXNet::NDArray.argmax(output, axis: 1)
end
:predict
Create a new data iterator for prediction and generate predictions for the first 10 samples.
sample_size = 10
sample_iter = MXNet::IO::MNISTIter.new(
  image: File.join(@data_dir, 't10k-images-idx3-ubyte'),
  label: File.join(@data_dir, 't10k-labels-idx1-ubyte'),
  batch_size: sample_size,
  shuffle: true,
  seed: rand(100))
sample_iter.each do |batch|
  data = batch.data[0].as_in_context(@model_ctx)
  label = batch.label[0]
  im = data.transpose(axes: [1, 0, 2, 3]).reshape([10*28, 28, 1])
  imshow(im[0..-1, 0..-1, 0].to_narray)
  pred = predict(data.reshape([-1, 784]))
  puts "model predictions are: #{pred.inspect}"
  puts
  puts "true labels: #{label.inspect}"
  break
end
model predictions are: [0, 2, 7, 8, 7, 1, 5, 4, 3, 9] <MXNet::NDArray 10 @cpu(0)>

true labels: [0, 2, 3, 8, 7, 1, 5, 8, 3, 9] <MXNet::NDArray 10 @cpu(0)>
In this notebook, a simple neural-network classifier for MNIST was implemented with the MXNet NDArray API.
A more complex example is available here: https://github.com/mrkn/mxnet.rb/blob/taiwan2018/example/scratch/resnet/wrn.rb