Convolutions

Import packages for:

  1. building a CNN from scratch;
  2. using built-in architectures.
In [1]:
from mxnet import np, npx
from mxnet.gluon import nn
npx.set_np()

The 2D cross-correlation operator:

In [2]:
def corr2d(X, K):
    """Compute the 2D cross-correlation of input X with kernel K."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    # Each output element is the sum of the elementwise product
    # between K and the matching window of X.
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i: i + h, j: j + w] * K).sum()
    return Y

For example, consider the two-dimensional cross-correlation of a $3\times 3$ input with a $2\times 2$ kernel. Sliding the kernel over the input yields four output elements:

\begin{align*} 0\times0+1\times1+3\times2+4\times3=19,\\ 1\times0+2\times1+4\times2+5\times3=25,\\ 3\times0+4\times1+6\times2+7\times3=37,\\ 4\times0+5\times1+7\times2+8\times3=43,\\ \end{align*}

Verifying these values in code:

In [3]:
X = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
K = np.array([[0, 1], [2, 3]])
corr2d(X, K)
Out[3]:
array([[19., 25.],
       [37., 43.]])

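One immediate use of cross-correlation is edge detection. A sketch (not an original cell), assuming a hypothetical 6×8 image X that is white (1) except for four black (0) middle columns; the kernel $[1, -1]$ responds only where horizontally adjacent pixels differ:

X = np.ones((6, 8))
X[:, 2:6] = 0
K = np.array([[1.0, -1.0]])
corr2d(X, K)
# Output columns of 1 mark white-to-black edges, columns of -1 the
# black-to-white edges; all other entries are 0.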

Convolutional Layers

$\mathbf Y = \mathbf X \star \mathbf W + b$

In [4]:
class Conv2D(nn.Block):
    def __init__(self, kernel_size, **kwargs):
        super(Conv2D, self).__init__(**kwargs)
        # Learnable kernel and scalar bias.
        self.weight = self.params.get('weight', shape=kernel_size)
        self.bias = self.params.get('bias', shape=(1,))

    def forward(self, x):
        # Cross-correlate with the kernel, then add the bias.
        return corr2d(x, self.weight.data()) + self.bias.data()
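
A minimal usage sketch (not an original cell): instantiate the layer with a hypothetical $2\times 2$ kernel, initialize its parameters, and apply it to a small input.

conv = Conv2D(kernel_size=(2, 2))
conv.initialize()
conv(np.ones((3, 3))).shape  # randomly initialized kernel; expected shape: (2, 2)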

A helper to initialize a convolutional layer and check its output shape for a 2D input:

In [5]:
def comp_conv2d(conv2d, X):
    conv2d.initialize()
    # Add batch and channel dimensions: (1, 1, h, w).
    X = X.reshape((1, 1) + X.shape)
    Y = conv2d(X)
    # Drop the batch and channel dimensions again.
    return Y.reshape(Y.shape[2:])

Padding & Stride
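
With a $3\times 3$ kernel and one row/column of padding on each side, the output keeps the input's shape. A quick sanity check (a sketch, not an original cell):

X = np.random.uniform(size=(8, 8))
conv2d = nn.Conv2D(channels=1, kernel_size=3, padding=1)  # stride defaults to 1
comp_conv2d(conv2d, X).shape  # expected: (8, 8)

Adding a stride of 2 shrinks the output: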

In [6]:
X = np.random.uniform(size=(8, 8))
conv2d = nn.Conv2D(channels=1, kernel_size=3, padding=1, strides=2)
comp_conv2d(conv2d, X).shape
Out[6]:
(4, 4)
Here $p_h$ and $p_w$ count the total padding (both sides together), so padding=1 gives $p_h = p_w = 2$:

\begin{align} \text{Output shape} & = \lfloor(n_h-k_h+p_h+s_h)/s_h\rfloor \times \lfloor(n_w-k_w+p_w+s_w)/s_w\rfloor \\ & = \lfloor(8 - 3 + 2 + 2) / 2\rfloor \times \lfloor(8 - 3 + 2 + 2) / 2\rfloor \\ & = (4, 4) \end{align}

A slightly more complicated example.

In [7]:
X = np.random.uniform(size=(8, 8))
conv2d = nn.Conv2D(1, kernel_size=(3, 5), padding=(0, 1), strides=(3, 4))
comp_conv2d(conv2d, X).shape
Out[7]:
(2, 2)
\begin{align} \text{Output shape} & = \lfloor(n_h-k_h+p_h+s_h)/s_h\rfloor \times \lfloor(n_w-k_w+p_w+s_w)/s_w\rfloor \\ & = \lfloor(8 - 3 + 0 + 3)/3\rfloor \times \lfloor(8 - 5 + 2 + 4)/4\rfloor \\ & = (2, 2) \end{align}
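
A small helper (a sketch, not part of the original cells) makes the formula easy to check; p here is the total padding, i.e. twice the per-side padding when it is symmetric:

def out_dim(n, k, p, s):
    # Integer floor division implements the floor in the formula.
    return (n - k + p + s) // s

print(out_dim(8, 3, 2, 2), out_dim(8, 3, 2, 2))  # In [6]: 4 4
print(out_dim(8, 3, 0, 3), out_dim(8, 5, 2, 4))  # In [7]: 2 2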

Pooling

A 2D pooling operator

In [8]:
def pool2d(X, pool_size, mode='max'):
    """Compute 2D max or average pooling with stride 1 and no padding."""
    p_h, p_w = pool_size
    Y = np.zeros((X.shape[0] - p_h + 1, X.shape[1] - p_w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            if mode == 'max':
                Y[i, j] = np.max(X[i: i + p_h, j: j + p_w])
            elif mode == 'avg':
                Y[i, j] = X[i: i + p_h, j: j + p_w].mean()
    return Y

X = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
pool2d(X, (2, 2))
Out[8]:
array([[4., 5.],
       [7., 8.]])
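
The same function computes average pooling when mode='avg' (a sketch, not an original cell):

pool2d(X, (2, 2), 'avg')  # expected: array([[2., 3.], [5., 6.]])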

Pooling with Padding and Stride

In [9]:
X = np.arange(16).reshape((1, 1, 4, 4))
print(X)
pool2d = nn.MaxPool2D(pool_size=3, padding=1, strides=2)
pool2d(X)
[[[[ 0.  1.  2.  3.]
   [ 4.  5.  6.  7.]
   [ 8.  9. 10. 11.]
   [12. 13. 14. 15.]]]]
Out[9]:
array([[[[ 5.,  7.],
         [13., 15.]]]])

Pooling with Multiple Channels

In [10]:
X = np.concatenate((X, X + 1), axis=1)
print(X)
print("Input shape :", X.shape)

pool2d = nn.MaxPool2D(pool_size=3, padding=1, strides=2)
pool2d(X)
[[[[ 0.  1.  2.  3.]
   [ 4.  5.  6.  7.]
   [ 8.  9. 10. 11.]
   [12. 13. 14. 15.]]

  [[ 1.  2.  3.  4.]
   [ 5.  6.  7.  8.]
   [ 9. 10. 11. 12.]
   [13. 14. 15. 16.]]]]
Input shape : (1, 2, 4, 4)
Out[10]:
array([[[[ 5.,  7.],
         [13., 15.]],

        [[ 6.,  8.],
         [14., 16.]]]])
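
Note that pooling acts on each channel independently, so the number of output channels equals the number of input channels. A quick shape check (a sketch, not an original cell):

pool2d(X).shape  # expected: (1, 2, 2, 2) -- both channels are preserved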