Comet + R with nnet

This notebook is based on:

It attempts to learn to identify species of Iris flowers based on some of their characteristics.

Getting Started with Comet

To get started with Comet and R, please see: https://www.comet.ml/docs/r-sdk/getting-started/

Specifically, you need to create a .comet.yml file or pass your Comet API key to create_experiment(). In this example, I've created a ~/.comet.yml file with these contents (replace the values with your own):

COMET_WORKSPACE: YOUR-COMET-USERNAME
COMET_PROJECT_NAME: PROJECT-NAME
COMET_API_KEY: YOUR-API-KEY
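Alternatively, the configuration can be passed in code rather than through the file. This is only a sketch, assuming the current cometr argument names (api_key, project_name, workspace_name); check ?create_experiment for your installed version:

```r
library(cometr)

# Inline configuration instead of ~/.comet.yml (argument names assumed;
# see ?create_experiment). Keys in code are easy to leak -- prefer the file.
exp <- create_experiment(
  api_key = "YOUR-API-KEY",
  project_name = "PROJECT-NAME",
  workspace_name = "YOUR-COMET-USERNAME"
)
```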

Learning the Iris Dataset

R libraries needed for this notebook:

install.packages("cometr")
install.packages("nnet")
install.packages("stringr")
install.packages("IRdisplay")

Ok, now we are ready to do some machine learning. First we import the needed libraries:

In [1]:
library(cometr)
library(nnet)
library(stringr)
library(IRdisplay)

Next, we create a Comet experiment marking what we would like to log to the server:

In [2]:
exp <- create_experiment(
  keep_active = TRUE,
  log_output = FALSE,
  log_error = FALSE,
  log_code = TRUE,
  log_system_details = TRUE,
  log_git_info = TRUE
)
Experiment created: https://www.comet.ml/dsblank/cometr/ae316b3138be4dfdb02b305b1fc438c4 

Note: the notebook source isn't logged through the experiment, but we'll log the entire notebook at the end.

Let's tag the experiment, so that the experiment will be easy to select in the Comet UI:

In [3]:
exp$add_tags(c("made with nnet"))

Iris Dataset

Next, for this example, we sample the iris data:

In [4]:
sample_size <- 25 # number of samples of each iris type
total_size <- 50

We note the sample_size as a hyperparameter:

In [5]:
exp$log_parameter("sample_size", sample_size)

And actually sample the dataset:

In [6]:
ir <- rbind(iris3[,,1], iris3[,,2], iris3[,,3])
targets <- class.ind(c(
  rep("s", total_size),
  rep("c", total_size),
  rep("v", total_size))
)
samp <- c(
  sample(1:total_size, sample_size),
  sample((total_size + 1):(total_size * 2), sample_size),
  sample(((total_size * 2) + 1):(total_size * 3), sample_size)
)
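The three sample() calls each draw 25 row indices from one block of 50 rows (one block per species), so samp is a stratified sample of 75 distinct rows. A quick base-R sanity check of this scheme (a sketch; samp_check is a hypothetical stand-in for samp above):

```r
# Draw 25 row indices from each block of 50 rows (one block per species),
# mirroring the stratified sampling used for `samp`.
samp_check <- c(
  sample(1:50, 25),      # setosa rows
  sample(51:100, 25),    # versicolor rows
  sample(101:150, 25)    # virginica rows
)
stopifnot(
  length(samp_check) == 75,            # 25 per species
  anyDuplicated(samp_check) == 0,      # no repeats
  sum(samp_check <= 50) == 25,         # exactly 25 from the first block
  sum(samp_check > 100) == 25          # exactly 25 from the last block
)
```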

Let's take a look at one of the samples:

In [7]:
ir[1,]
Sepal L. Sepal W. Petal L. Petal W.
     5.1      3.5      1.4      0.2

We can see the sepal length, sepal width, petal length, and petal width. What type of Iris is it?

In [8]:
targets[1,]
c s v
0 1 0

There are three types (c, s, and v) and this is type s. These stand for versicolor, setosa, and virginica, respectively.
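The column order here is a consequence of class.ind() sorting the labels alphabetically. A base-R sketch of the same one-hot encoding (using model.matrix as a stand-in for class.ind, which lives in nnet):

```r
# One-hot encode labels; columns follow the sorted factor levels: c, s, v.
labels <- factor(c("s", "c", "v", "s"))
onehot <- model.matrix(~ labels - 1)
colnames(onehot) <- levels(labels)   # "c" "s" "v"
onehot[1, ]                          # the first row is an "s": c=0, s=1, v=0
```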

Train the Network

Now, the hyperparameters for the actual experiment:

In [9]:
weight_decay <- 5e-4
epochs <- 200
hidden_layer_size <- 2
initial_random_weight_range <- 0.1

And log them as well:

In [10]:
exp$log_parameter("weight_decay", weight_decay)
exp$log_parameter("epochs", epochs)
exp$log_parameter("hidden_layer_size", hidden_layer_size)
exp$log_parameter("initial_random_weight_range", initial_random_weight_range)

Ideally, all we need to do next is:

nnet(
    ir[samp,],
    targets[samp,],
    size = hidden_layer_size,
    rang = initial_random_weight_range,
    decay = weight_decay,
    maxit = epochs
)

However, we wish to log the "loss" values from training. Unfortunately, the loop that does the processing is implemented in C, so we can't log from inside it. Instead, we capture the printed output, parse it, and then log it.

Now, we attempt to learn the categories using the train function, logging the metric "loss" (i.e., "error"):

In [11]:
ir1 <- NULL

train <- function() {
  ir1 <<- nnet(
    ir[samp,],
    targets[samp,],
    size = hidden_layer_size,
    rang = initial_random_weight_range,
    decay = weight_decay,
    maxit = epochs)
  ir1
}

# capture.output() already returns one string per line of output
output <- capture.output(train(), split = TRUE)

# e.g., "initial  value 57.703088 "
for (match in str_match(output, "^initial\\s+value\\s+([-+]?[0-9]*\\.?[0-9]+)")[,2]) {
  if (!is.na(match)) {
    exp$log_metric("loss", as.numeric(match), step = 0)
  }
}

# e.g., "iter  10 value 46.803951"
iters <- str_match(output, "^iter\\s+(\\d+)\\s+value\\s+([-+]?[0-9]*\\.?[0-9]+)")
for (i in 1:nrow(iters)) {
  match <- iters[i, ]
  if (!is.na(match[2])) {
    exp$log_metric("loss", as.numeric(match[3]), step = as.integer(match[2]))
  }
}
# weights:  19
initial  value 55.212711 
iter  10 value 32.054406
iter  20 value 25.084979
iter  30 value 24.785996
iter  40 value 18.337328
iter  50 value 17.420288
iter  60 value 17.196332
iter  70 value 17.071140
iter  80 value 17.025227
iter  90 value 16.961082
iter 100 value 16.960243
final  value 16.960180 
converged
a 4-2-3 network with 19 weights
options were - decay=5e-04
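To see what the parsing above extracts from a single log line, here is the same iteration pattern applied in base R (a sketch; the cell itself uses stringr):

```r
# Pull the step number and loss value out of one nnet progress line.
line <- "iter  10 value 46.803951"
m <- regmatches(line,
  regexec("^iter\\s+(\\d+)\\s+value\\s+([-+]?[0-9]*\\.?[0-9]+)", line))[[1]]
step <- as.integer(m[2])   # 10
loss <- as.numeric(m[3])   # 46.803951
```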

Testing the Model

Now, we'll test the trained model by creating a confusion matrix:

In [12]:
test.cl <- function(true, pred) {
    true <- max.col(true)
    cres <- max.col(pred)
    table(true, cres)
}
cm <- test.cl(targets[-samp,], predict(ir1, ir[-samp,]))
In [13]:
cm
    cres
true  1  2  3
   1 19  0  6
   2  0 25  0
   3  0  0 25
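The test.cl helper relies on max.col, which returns the column index of the largest entry in each row, converting both the one-hot targets and the network's output probabilities into class numbers. For example:

```r
# Each row of predicted probabilities maps to the index of its maximum.
pred <- rbind(
  c(0.90, 0.05, 0.05),   # clearly class 1
  c(0.10, 0.70, 0.20)    # clearly class 2
)
max.col(pred)   # 1 2
```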

Let's make a slightly better visualization for this confusion matrix.

First we pull out the matrix:

In [14]:
matrix <- sprintf("[%s,%s,%s]", 
                  sprintf("[%s]", paste(cm[1,], collapse=",")),
                  sprintf("[%s]", paste(cm[2,], collapse=",")),
                  sprintf("[%s]", paste(cm[3,], collapse=",")))
matrix
'[[19,0,6],[0,25,0],[0,0,25]]'

And set some labels:

In [15]:
title <- "Iris Confusion Matrix"
labels <- sprintf('["%s","%s","%s"]', "Setosa","Versicolor","Virginica")

We put those together in a template of the JSON format for the Comet confusion matrix:

In [16]:
template <- '{"version":1,"title":"%s","labels":%s,"matrix":%s,"rowLabel":"Actual Category","columnLabel":"Predicted Category","maxSamplesPerCell":25,"sampleMatrix":[],"type":"integer"}'
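Filling the template with toy values shows the JSON shape Comet expects (the template string is repeated here so the snippet stands alone):

```r
# Same template as above, filled with a small 2x2 example.
template <- '{"version":1,"title":"%s","labels":%s,"matrix":%s,"rowLabel":"Actual Category","columnLabel":"Predicted Category","maxSamplesPerCell":25,"sampleMatrix":[],"type":"integer"}'
json <- sprintf(template, "Demo", '["a","b"]', "[[1,0],[0,1]]")
```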

We log the confusion matrix to a file:

In [17]:
fp <- file("confusion_matrix.json")
writeLines(c(sprintf(template, title, labels, matrix)), fp)
close(fp)
In [18]:
exp$upload_asset("confusion_matrix.json", type = "confusion-matrix")

In the Comet UI, you should now see the logged confusion matrix on the Confusion Matrix tab.

And log some additional notes to the HTML tab on the Comet UI:

In [19]:
exp$log_html('
<h1>Comet nnet Example</h1>

<p>This example demonstrates using the nnet library on the iris dataset.</p>

<p>See the Output tab for the confusion matrix.</p>

<ul>
<li><a href="https://github.com/comet-ml/cometr/blob/master/inst/train-examples/nnet-example.R">github.com/comet-ml/cometr/inst/train-examples/nnet-example.R</a></li>
</ul>

<p>For help on the Comet R SDK, please see: <a href="https://www.comet.ml/docs/r-sdk/getting-started/">www.comet.ml/docs/r-sdk/getting-started/</a></p>
')

Mark the experiment as created by R:

In [20]:
exp$log_other(key = "Created by", value = "cometr")

Now, we show how you can display this experiment in the notebook:

In [21]:
url <- exp$get_url()
In [22]:
display_html(sprintf('<iframe width="900" height="900" src="%s"></iframe>', url))

At this point, save your notebook. We'll then upload it as an asset:

In [23]:
exp$upload_asset(
  "Comet-R-nnet.ipynb",
  type = "notebook"
)
In [24]:
exp$print()
exp$stop()
Comet experiment https://www.comet.ml/dsblank/cometr/ae316b3138be4dfdb02b305b1fc438c4