This is the first tutorial in our beginner series, which will take you through creating, training, and running inference on a neural network. In this tutorial, you will learn how to use the built-in Block to create your first neural network: a Multilayer Perceptron.
A neural network is a black box function. Instead of coding this function yourself, you provide many sample input/output pairs for it. The network is then trained to match the behavior of the function given only these input/output pairs. A better model with more data can match the function more accurately.
A Multilayer Perceptron (MLP) is one of the simplest deep learning networks. The MLP has an input layer which contains your input data, an output layer which is produced by the network and contains the data the network is supposed to be learning, and some number of hidden layers. The example below contains an input of size 3, a single hidden layer of size 3, and an output of size 2. The number and sizes of the hidden layers are determined through experimentation, but more layers enable the network to represent more complicated functions. Between each pair of layers is a linear operation (sometimes called a FullyConnected operation because each number in the input is connected to each number in the output by a matrix multiplication). Not pictured, there is also a non-linear activation function after each linear operation. For more information, see Multilayer Perceptron.
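To make the structure concrete, here is a sketch (not from the original tutorial) of what this example computes, where $x$ is the input of size 3, $\sigma$ is the non-linear activation function, and the weights $W_i$ and biases $b_i$ are the trained parameters:

$$h = \sigma(W_1 x + b_1), \qquad y = W_2 h + b_2$$

Here $W_1$ is a $3 \times 3$ matrix producing the hidden layer $h$ of size 3, and $W_2$ is a $2 \times 3$ matrix producing the output $y$ of size 2.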
This tutorial requires the installation of the Java Jupyter Kernel. To install the kernel, see the Jupyter README.
// Add the snapshot repository to get the DJL snapshot artifacts
// %mavenRepo snapshots https://oss.sonatype.org/content/repositories/snapshots/
// Add the maven dependencies
%maven ai.djl:api:0.4.0
%maven org.slf4j:slf4j-api:1.7.26
%maven org.slf4j:slf4j-simple:1.7.26
// See https://github.com/awslabs/djl/blob/master/mxnet/mxnet-engine/README.md
// for more MXNet library selection options
%maven ai.djl.mxnet:mxnet-native-auto:1.6.0
import ai.djl.*;
import ai.djl.nn.*;
import ai.djl.nn.core.*;
import ai.djl.training.*;
The MLP model uses a one-dimensional vector as both the input and the output. You should determine the appropriate size of this vector based on your input data and what you will use the output of the model for. In a later tutorial, we will use this model for MNIST image classification.
Our input vector will have size 28x28 because the input images have a height and width of 28 pixels and it takes only a single number to represent each pixel. For a color image, you would need to further multiply this by 3 for the RGB channels. Our output vector has size 10 because there are 10 possible classes for each image.
long inputSize = 28*28;
long outputSize = 10;
The core data type used for working with deep learning is the NDArray. An NDArray represents a multidimensional, fixed-size homogeneous array. Its behavior is very similar to that of the NumPy python package, with the addition of efficient computing. We also have a helper class, the NDList, which is a list of NDArrays that can have different sizes and data types.
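As a quick illustration (not part of the tutorial code), you could create an NDArray and an NDList like this. This sketch assumes the ai.djl.ndarray packages, which are not imported in the cell above:

import ai.djl.ndarray.*;
import ai.djl.ndarray.types.*;

// NDManager owns the native memory backing the arrays it creates
try (NDManager manager = NDManager.newBaseManager()) {
    NDArray ones = manager.ones(new Shape(2, 3));  // 2x3 array filled with 1s
    NDArray zeros = manager.zeros(new Shape(5));   // 1-D array of 5 zeros
    NDList list = new NDList(ones, zeros);         // an NDList can hold arrays of different shapes
    System.out.println(list);
}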
In DJL, Blocks serve a purpose similar to functions that convert an input NDList to an output NDList. They can represent single operations, parts of a neural network, and even the whole neural network. What makes blocks special is that they contain a number of parameters that are used in their function and are trained during deep learning. As these parameters are trained, the function represented by the block becomes more and more accurate.
When building these block functions, the easiest way is to use composition. Similar to how functions are built by calling other functions, blocks can be built by combining other blocks. We refer to the containing block as the parent and the sub-blocks as the children.
We provide several helpers to make it easy to build common block composition structures. For the MLP we will use the SequentialBlock, a container block whose children form a chain of blocks where each child block feeds its output to the next child block in a sequence.
SequentialBlock block = new SequentialBlock();
An MLP is organized into several layers. Each layer is composed of a Linear block and a non-linear activation function. If we just had two linear blocks in a row, it would be the same as a single combined linear block ($f(x) = W_2(W_1x) = (W_2W_1)x = W_{combined}x$). An activation function is interspersed between the linear blocks so that the network can represent non-linear functions. We will use the popular ReLU as our activation function.
The first and last layers have fixed sizes determined by your desired input and output sizes. However, you are free to choose the number and sizes of the middle layers in the network. We will create a smaller MLP with two middle layers that gradually decrease in size. Typically, you would experiment with different values to see what works best on your data set.
block.add(Blocks.batchFlattenBlock(inputSize)); // flatten each input into a vector of length 28*28
block.add(Linear.builder().setOutChannels(128).build()); // first hidden layer
block.add(Activation::relu);
block.add(Linear.builder().setOutChannels(64).build()); // second hidden layer
block.add(Activation::relu);
block.add(Linear.builder().setOutChannels(outputSize).build()); // output layer with one value per class
block
Now that you've successfully created your first neural network, you can use this network to train your model.
Next chapter: Train your first model
You can find the complete source code for this tutorial in the model zoo.