Title: Introduction to Convolutional Networks
Author: Thomas Breuel
Institution: UniKL
from pylab import *
from scipy.ndimage import filters
def F(a): return array(a,'f')  # helper: convert to a float32 array
Let's start by constructing a simple non-linear image processing filter. Can we learn this filter?
# load a page image and average the color channels to get grayscale
image = 1.0*(mean(imread("page.png"),2))
roi = (slice(500,1000),slice(500,1000))
image = image[roi]
image -= mean(image)
# target output: a rectified (hence non-linear) vertical Gaussian derivative
out = 1.0*(8*maximum(0,filters.gaussian_filter(image,2.0,order=(1,0))))
gray()
subplot(121); imshow(image)
subplot(122); imshow(out)
amax(out)
0.91207802
Note that if this were a linear filter, we might think we can derive it from the Fourier transform:
convolution (linear filter):
$$y = f * x$$

fundamental theorem:

$${\cal F}[y] = {\cal F}[f] \cdot {\cal F}[x]$$

solving the equation:

$${\cal F}[f] = \frac{{\cal F}[y]}{{\cal F}[x]}$$

transforming back:

$$f = {\cal F}^{-1} \left[ \frac{{\cal F}[y]}{{\cal F}[x]} \right]$$

However, this may or may not be the filter we want.
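As a quick illustration, here is a minimal sketch of this deconvolution idea; the `eps` regularizer and the crop window are assumptions for illustration, not part of the derivation. Because our target was produced with a rectification, no linear filter recovered this way can reproduce it exactly.

from numpy.fft import fft2, ifft2, fftshift
# divide the output spectrum by the input spectrum; eps is an ad-hoc
# regularizer that keeps near-zero frequency components from blowing up
eps = 1e-3
f_est = real(ifft2(fft2(out)/(fft2(image)+eps)))
# the estimated kernel lives around the origin; shift it to the center
imshow(fftshift(f_est)[230:270,230:270])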
def sigmoid(x): return 1/(1+exp(-x))
The simplest convolutional neural network has no hidden layer; it is the equivalent of a sigmoid regression. The formula is:

$$Y = \sigma(F*X+\theta)$$

Here, $X$ is the input image, $F$ the filter, $\theta$ a bias, and $\sigma$ the sigmoid nonlinearity.
Now, we can view the convolution as a large number of independent training problems, one per output pixel.
Consider the image $X_{\hat{p}}$, which is the image $X$ shifted by $-p$. If we keep the filter fixed and shift the image, then the output $Y$ at pixel $p$ is given by:

$$Y(p) = \sigma(F \cdot X_{\hat{p}} + \theta)$$

Generally, the filter $F$ has a small footprint, meaning that it is zero outside a small region around the origin.
In essence, this problem is just like training with many separate training instances, except that we implement it with a convolution operation; we just need to keep track of the coordinates correctly.
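To make the per-pixel view concrete, here is a small self-contained check (the 3x3 kernel `K` and the pixel `p` are arbitrary choices for illustration): the convolution output at one location equals the dot product of the kernel with the surrounding image patch, up to the kernel flip that `convolve` performs.

# throwaway kernel and an interior pixel for the check
K = F(randn(3,3))
p = (100,100)
patch = image[p[0]-1:p[0]+2,p[1]-1:p[1]+2]
# scipy's convolve flips the kernel, hence the [::-1,::-1]
y1 = filters.convolve(image,K)[p]
y2 = sum(K[::-1,::-1]*patch)
print y1, y2   # should agree up to floating point error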
$$ \begin{eqnarray} \frac{\partial}{\partial F_{ij}} \sum_p (~T(p)-Y(p)~)^2 &=& -\sum_p 2(T(p)-Y(p)) ~~ \sigma'(F\cdot X_{\hat{p}}+\theta) ~~ X_{\hat{p},i,j}\\ &=& -\sum_p 2(T(p)-Y(p)) ~~ Y(p)(1-Y(p)) ~~ X_{\hat{p},i,j} \end{eqnarray} $$

Here, we define $\delta(p)$ as before:

$$ \delta(p) = 2(T(p)-Y(p)) ~~ Y(p)(1-Y(p)) $$

Let's generate a random mask to start with.
r=5
filter = F(0.01*randn(2*r+1,2*r+1))  # small random 11x11 initial filter
Now let's look at the output this filter generates.
# prediction of the untrained random filter (bias implicitly zero)
pred = sigmoid(filters.convolve(image,filter))
subplot(121); imshow(pred)
subplot(122); imshow(pred,vmin=0,vmax=1)
amin(pred),amax(pred)
(0.48624954, 0.52211583)
As before, we compute a delta between the output and the desired output.
delta0 = (out-pred)
subplot(121); imshow(delta0)
delta = (out-pred)*pred*(1-pred)
subplot(122); imshow(delta)
Note that each weight in our random filter contributes to all output pixels. If we want to "backpropagate" the deltas, the gradient for each filter weight must sum the deltas over all output pixels, each weighted by the correspondingly shifted input.
# gradient for each filter tap (i,j): sum delta against the image shifted
# by (i,j); note that roll wraps around at the image borders
dw = array([[sum(delta*roll(roll(image,i,0),j,1)) for j in range(-r,r+1)] for i in range(-r,r+1)])
dw /= prod(image.shape)
imshow(dw)
amin(dw),amax(dw),dw.dtype
(-0.0034652944, 0.0011683544, dtype('float32'))
Let's put this all together into a gradient descent learning algorithm.
theta = 0.0
for iter in range(1000):
    # forward pass
    pred = sigmoid(filters.convolve(image,filter)+theta)
    err = sum((pred-out)**2)
    # backward pass: delta for the logistic output, averaged over pixels
    delta = (pred-out)*pred*(1-pred)
    delta /= prod(image.shape)
    # gradient of the error w.r.t. each filter tap
    dw = array([[sum(delta*roll(roll(image,i,0),j,1)) for j in range(-r,r+1)] for i in range(-r,r+1)])
    if iter%50==0:
        print iter,err,":",(amin(pred),amax(pred)),sum(abs(delta)),
        print (amin(dw),amax(dw)),(amin(filter),amax(filter)),theta
    # gradient descent step for the filter and the bias
    filter -= dw
    theta -= sum(delta)
0 51495.0 : (0.48624954, 0.52211583) 0.10982 (-0.0013520373, 0.0032668943) (-0.029455993, 0.026091032) 0.0
50 4397.94 : (0.072140738, 0.36709112) 0.0162744 (-0.00070879137, 0.0011495821) (-0.10338775, 0.063890524) -1.71587491222
100 2627.21 : (0.031862784, 0.45811501) 0.0103208 (-0.00061922497, 0.00093237666) (-0.15040764, 0.088849939) -2.03730937187
150 1910.8 : (0.017386325, 0.56336188) 0.00819295 (-0.00056234887, 0.00077779568) (-0.18951969, 0.11478845) -2.19306036993
200 1473.71 : (0.010572175, 0.65190011) 0.00688511 (-0.00050368829, 0.0006372091) (-0.2218695, 0.13822004) -2.29702537518
250 1181.12 : (0.006968664, 0.71730405) 0.00591087 (-0.00044778202, 0.00052048423) (-0.24828075, 0.16170618) -2.37981959002
300 977.117 : (0.0049024085, 0.76385123) 0.00515023 (-0.00039510831, 0.00043040476) (-0.26993901, 0.18278031) -2.45108571986
350 829.998 : (0.0036344689, 0.79730797) 0.0045502 (-0.00034930339, 0.0003636154) (-0.28797349, 0.20138238) -2.51411203877
400 720.586 : (0.002809946, 0.82201749) 0.00407363 (-0.00031081311, 0.00031881803) (-0.30326092, 0.21787542) -2.57035250252
450 637.084 : (0.0022468881, 0.84082115) 0.0036917 (-0.00027875858, 0.00028317625) (-0.31643328, 0.23260616) -2.62074463977
500 571.935 : (0.0018463387, 0.8555268) 0.00338272 (-0.00025198792, 0.00025411052) (-0.32794094, 0.24586843) -2.66604685766
550 520.144 : (0.001551502, 0.86730045) 0.00313121 (-0.00022944857, 0.00022991268) (-0.33810905, 0.25789988) -2.70691331301
600 478.324 : (0.0013281747, 0.87691414) 0.00292533 (-0.00021028086, 0.00020941216) (-0.34791788, 0.26889029) -2.74391074048
650 444.095 : (0.0011548706, 0.88489538) 0.00275525 (-0.00019381418, 0.00019179737) (-0.35690808, 0.27899081) -2.77752619621
700 415.747 : (0.0010175834, 0.8916145) 0.00261426 (-0.00017952557, 0.0001764989) (-0.36504495, 0.28832325) -2.80817601702
750 392.03 : (0.00090688199, 0.89733911) 0.00249657 (-0.00016702792, 0.0001630753) (-0.37244996, 0.29698664) -2.83621535654
800 372.01 : (0.00081623549, 0.90226686) 0.0023978 (-0.00015600196, 0.00015120726) (-0.37922135, 0.30506241) -2.86194675113
850 354.975 : (0.00074100756, 0.90654665) 0.00231418 (-0.00014620741, 0.00014063941) (-0.38543916, 0.31261787) -2.88562904819
900 340.375 : (0.00067783217, 0.91029274) 0.00224326 (-0.00013745073, 0.00013117767) (-0.39117, 0.31970966) -2.9074842775
950 327.774 : (0.00062422006, 0.91359425) 0.0021827 (-0.00012957772, 0.00012265667) (-0.39646918, 0.32638583) -2.9277037502
After this training, let's look at the output.
figsize(12,6)
subplot(121); imshow(out,interpolation='nearest')
subplot(122); imshow(pred,interpolation='nearest')
figsize(12,6)
subplot(121); imshow(out[100:200,100:200],interpolation='nearest')
subplot(122); imshow(pred[100:200,100:200],interpolation='nearest')
imshow(filter,interpolation='nearest')
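For comparison, we can render the linear part of the target by applying the same Gaussian derivative to a unit impulse; the impulse trick and showing it at the filter's size are illustration choices (border truncation distorts the rendered kernel slightly). Up to the kernel flip that `convolve` introduces and the effect of the rectification, the learned filter should resemble it.

# render the Gaussian-derivative kernel that generated `out` by
# filtering a unit impulse, then show it next to the learned filter
impulse = zeros((2*r+1,2*r+1),'f')
impulse[r,r] = 1.0
target_kernel = 8*filters.gaussian_filter(impulse,2.0,order=(1,0))
subplot(121); imshow(filter,interpolation='nearest')
subplot(122); imshow(target_kernel,interpolation='nearest')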