$\newcommand{\vec}[1]{\boldsymbol{#1}}$
Deep neural networks at last!
For example, doing visual pattern recognition:
Gradient descent with just 1,000 training images, trained over 500 epochs on the MNIST dataset.
This result was reported by:
But probably everybody who has worked with neural networks has eventually run an experiment like the previous one.
Let's picture a simple deep network:
"What makes deep networks hard to train?" is a complex question:
Many factors play a role in making deep networks hard to train, and understanding all those factors is still a subject of ongoing research.
%tikz -s 600,300 -sc 1.0 -l positioning -f svg \input{imgs/05/autoencoder.tikz}
%tikz -s 800,300 -sc 1.0 -l positioning -f svg \input{imgs/05/stacked-autoencoder.tikz}
The greedy layer-wise training procedure works like this:
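A minimal sketch of the idea in Python with Keras is shown below: each autoencoder is trained on the representation produced by the previous one, then the pretrained encoders are stacked and fine-tuned with a classifier on top. The names `X_train`/`y_train`, the layer sizes, and the training settings are illustrative assumptions, not part of the procedure itself.

```python
# Greedy layer-wise pretraining with stacked autoencoders (illustrative sketch).
# Assumes `X_train` (inputs) and `y_train` (integer labels) are already loaded.
from tensorflow import keras
from tensorflow.keras import layers

layer_sizes = [256, 64]   # hidden sizes, one per autoencoder (assumed values)
codes = X_train           # current representation of the data
encoders = []

for size in layer_sizes:
    inp = keras.Input(shape=(codes.shape[1],))
    hidden = layers.Dense(size, activation="sigmoid")(inp)
    output = layers.Dense(codes.shape[1], activation="sigmoid")(hidden)
    autoencoder = keras.Model(inp, output)
    autoencoder.compile(optimizer="adam", loss="mse")
    # 1) train this autoencoder to reconstruct the current representation
    autoencoder.fit(codes, codes, epochs=10, batch_size=128, verbose=0)
    # 2) keep only the encoder and push the data through it
    encoder = keras.Model(inp, hidden)
    encoders.append(encoder)
    codes = encoder.predict(codes, verbose=0)

# 3) stack the pretrained encoders, add a classifier on top, fine-tune end to end
model = keras.Sequential(
    [layers.Dense(size, activation="sigmoid") for size in layer_sizes]
    + [layers.Dense(10, activation="softmax")]
)
model.build(input_shape=(None, X_train.shape[1]))
for dense, enc in zip(model.layers[:-1], encoders):
    dense.set_weights(enc.layers[-1].get_weights())
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X_train, y_train, epochs=10, batch_size=128, verbose=0)
```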
Convolutional Neural Networks are very similar to ordinary neural networks:
Difference:
Convolutional Neural Networks take advantage of the fact that the input consists of images.
We will stack these layers to form a full ConvNet architecture.
The convolutional layer is the core building block of a convolutional network, and it does most of the computational heavy lifting.
For example:
How many neurons are there in the output volume, and how are they arranged? Three hyperparameters control the size of the output volume:
Corresponds to the number of filters we would like to use.
For example, if the first Convolutional Layer takes as input the raw image, then different neurons along the depth dimension may activate in the presence of various oriented edges, or blobs of color.
Produces a volume of size $W_2\times H_2\times D_2$ where: $$ W_2 = \frac{W_1 - F + 2P}{S} + 1,\ H_2 = \frac{H_1 - F + 2P}{S} + 1,\ D_2 = K $$
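A quick worked example of this formula; the input size and hyperparameter values below are illustrative assumptions:

```python
# Output-volume size for a convolutional layer (illustrative numbers).
W1, H1, D1 = 32, 32, 3    # input volume, e.g. a 32x32 RGB image (assumed)
K, F, S, P = 12, 5, 1, 2  # number of filters, filter size, stride, zero-padding

W2 = (W1 - F + 2 * P) // S + 1   # = 32
H2 = (H1 - F + 2 * P) // S + 1   # = 32
D2 = K                           # = 12
print(W2, H2, D2)                # 32 32 12
```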
Subsampling layers reduce the size of the input.
There are multiple ways to subsample, but the most popular are:
In max-pooling, a pooling unit simply outputs the maximum activation in the 2×2 input region:
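A small NumPy sketch of 2×2 max-pooling with stride 2 over a single feature map; the input values are an illustrative assumption:

```python
import numpy as np

# One 4x4 feature map (assumed values).
fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 0],
                 [7, 2, 9, 8],
                 [3, 4, 6, 5]], dtype=float)

# Split into non-overlapping 2x2 blocks and take the maximum of each block.
h, w = fmap.shape
pooled = fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
print(pooled)
# [[6. 4.]
#  [7. 9.]]
```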
We can see convolution as the application of a filter, or as a form of dimensionality reduction.
The intuition behind sharing the weights across the image is that features will be detected regardless of their location, while the multiplicity of filters allows each of them to detect a different set of features.
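The sketch below illustrates this with plain NumPy: the same small filter is slid over every location of the image (weight sharing), and each filter produces its own feature map. The image and filter values are illustrative assumptions.

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid cross-correlation of a 2D image with a 2D kernel (stride 1)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The same kernel weights are reused at every position.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(8, 8)  # assumed 8x8 grayscale input
filters = [
    np.array([[1., 0., -1.], [1., 0., -1.], [1., 0., -1.]]),  # vertical edges
    np.array([[1., 1., 1.], [0., 0., 0.], [-1., -1., -1.]]),  # horizontal edges
]
feature_maps = np.stack([convolve2d(image, f) for f in filters])
print(feature_maps.shape)  # (2, 6, 6): one 6x6 feature map per filter
```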
The overall goal is still the same: to use training data to train the network's weights and biases so that the network does a good job classifying inputs.
Restricted Boltzmann machines (RBMs) are generative stochastic neural networks that can learn a probability distribution over their set of inputs.
In other words, the net has a perception of how the input data must be represented, so it tries to reproduce the data based on this perception. If its reproduction isn’t close enough to reality, it makes an adjustment and tries again.
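A common way to implement this reconstruct-and-adjust loop is contrastive divergence. Below is a minimal NumPy sketch of a single CD-1 update for an RBM with binary units; the data and layer sizes are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 3, 0.1          # assumed sizes and learning rate
W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
b_v = np.zeros(n_visible)                    # visible biases
b_h = np.zeros(n_hidden)                     # hidden biases

data = rng.integers(0, 2, size=(10, n_visible)).astype(float)  # fake binary data

# Positive phase: drive the hidden units from the data.
p_h = sigmoid(data @ W + b_h)
h = (rng.random(p_h.shape) < p_h).astype(float)

# Negative phase: reconstruct the visible units, then the hidden units again.
p_v_recon = sigmoid(h @ W.T + b_v)
p_h_recon = sigmoid(p_v_recon @ W + b_h)

# Update: move towards the data statistics, away from the reconstruction's.
W += lr * (data.T @ p_h - p_v_recon.T @ p_h_recon) / len(data)
b_v += lr * (data - p_v_recon).mean(axis=0)
b_h += lr * (p_h - p_h_recon).mean(axis=0)
```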
Restricted Boltzmann machines can be stacked to create a class of neural networks known as deep belief networks (DBNs).
My current preference:
There are many successful applications, for example: