"ImageNet Classification with Deep Convolutional Neural Networks", Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, 2013
None of these were new ideas, but this paper brought it all together.
Deep Learning didn't come out of nothing.
The major people in the field today tried to push it for 20 years compared to other approaches.
Finally succeeded in the 2010's due to Alexnet.
# Alexnet Architecture (approximately)
model = nn.Sequential(
layers.Input("BDHW", sizes=(None, 3, 224, 224)),
flex.Conv2d(2*48, (11, 11)),
nn.ReLU(),
nn.MaxPool2d((2, 2)),
flex.Conv2d(2*192, (3, 3)),
nn.ReLU(),
flex.Conv2d(2*192, (3, 3)),
nn.ReLU(),
flex.Conv2d(2*128, (3, 3)),
layers.Reshape(0, [1, 2, 3]),
flex.Linear(4096),
nn.ReLU(),
flex.Linear(4096),
nn.ReLU(),
flex.Linear(1000)
)
NB: these are standard PCA/Gabor-jet style features from vision>
$\sigma(x) = (1 + e^{-x})^{-1}$, $\rho(x) = \max(0, x)$
property | sigmoid | ReLU |
---|---|---|
derivatives | infinite | f': discontinuous, f'': zero |
monotonicity> | monotonic | monotonic |
range | $(0, 1)$ | $(0, \infty)$ |
zero derivative | none | $(-\infty, 0)$ |
In later models, this is effectively replaced by batch normalization.
"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", S. Ioffe and C. Szegedy, 2015.
Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
def conv2d_block(d):
return nn.Sequential(
flex.Conv2d(d, 3, padding=1), flex.BatchNorm2d(), nn.ReLU(),
flex.Conv2d(d, 3, padding=1), flex.BatchNorm2d(), nn.ReLU(),
flex.MaxPool2d(),
)
def make_vgg_like_model():
return nn.Sequential(
*conv2d_block(64), *conv2d_block(128), *conv2d_block(256),
*conv2d_block(512), *conv2d_block(1024), *conv2d_block(2048),
# we have a (None, 2048, 4, 4) batch at this point
# switch to global classification
nn.Flatten(),
flex.Linear(4096), flex.BatchNorm(), nn.ReLU(),
flex.Linear(4096), flex.BatchNorm(), nn.ReLU(),
flex.Linear(1000)
)
Sermanet, Pierre, et al. "Overfeat: Integrated recognition, localization and detection using convolutional networks." arXiv preprint arXiv:1312.6229 (2013).
Ren, Shaoqing, et al. "Faster r-cnn: Towards real-time object detection with region proposal networks." Advances in neural information processing systems. 2015.