Title: Introduction to Convolutional Networks
Author: Thomas Breuel
Institution: UniKL
from pylab import *
from scipy.ndimage import filters
def F(a): return array(a,'f')  # helper: convert to a float32 array
Let's start by constructing a simple non-linear image processing filter. Can we learn this filter?
# load a page image and average the color channels to get grayscale
image = 1.0*(mean(imread("page.png"),2))
roi = (slice(500,1000),slice(500,1000))
image = image[roi]
image -= mean(image)
# target output: a rectified (hence non-linear) vertical Gaussian derivative
out = 1.0*(8*maximum(0,filters.gaussian_filter(image,2.0,order=(1,0))))
gray()
subplot(121); imshow(image)
subplot(122); imshow(out)
amax(out)
0.91207802
Note that if this were a linear filter, we might think we can derive it from the Fourier transform:
convolution (linear filter):
$$y = f * x$$

fundamental theorem:

$${\cal F}[y] = {\cal F}[f] \cdot {\cal F}[x]$$

solving the equation:

$${\cal F}[f] = \frac{{\cal F}[y]}{{\cal F}[x]}$$

transforming back:

$$f = {\cal F}^{-1} \left[ \frac{{\cal F}[y]}{{\cal F}[x]} \right]$$

However, this may or may not be the filter we want.
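As a quick illustration, here is a minimal sketch of this deconvolution idea; the `eps` regularizer and the crop window are assumptions for illustration, not part of the derivation. Because our target was produced with a rectification, no linear filter recovered this way can reproduce it exactly.

from numpy.fft import fft2, ifft2, fftshift
# divide the output spectrum by the input spectrum; eps is an ad-hoc
# regularizer that keeps near-zero frequency components from blowing up
eps = 1e-3
f_est = real(ifft2(fft2(out)/(fft2(image)+eps)))
# the estimated kernel lives around the origin; shift it to the center
imshow(fftshift(f_est)[230:270,230:270])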
def sigmoid(x): return 1/(1+exp(-x))
The simplest convolutional neural network has no hidden layer; it is the equivalent of a sigmoid regression. The formula is:

$$Y = \sigma(F*X+\theta)$$

Here, $X$ is the input image, $F$ the filter, $\theta$ a bias, and $\sigma$ the sigmoid nonlinearity.
Now, we can view the convolution as a large number of independent training problems, one per output pixel.
Consider the image $X_{\hat{p}}$, which is the image $X$ shifted by $-p$. If we keep the filter fixed and shift the image, then the output $Y$ at pixel $p$ is given by:

$$Y(p) = \sigma(F \cdot X_{\hat{p}} + \theta)$$

Generally, the filter $F$ has a small footprint, meaning that it is zero outside a small region around the origin.
In essence, this problem is just like training with many separate training instances, except that we implement it with a convolution operation; we just need to keep track of the coordinates correctly.
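To make the per-pixel view concrete, here is a small self-contained check (the 3x3 kernel `K` and the pixel `p` are arbitrary choices for illustration): the convolution output at one location equals the dot product of the kernel with the surrounding image patch, up to the kernel flip that `convolve` performs.

# throwaway kernel and an interior pixel for the check
K = F(randn(3,3))
p = (100,100)
patch = image[p[0]-1:p[0]+2,p[1]-1:p[1]+2]
# scipy's convolve flips the kernel, hence the [::-1,::-1]
y1 = filters.convolve(image,K)[p]
y2 = sum(K[::-1,::-1]*patch)
print y1, y2   # should agree up to floating point error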
$$ \begin{eqnarray} \frac{\partial}{\partial F_{ij}} \sum_p (~T(p)-Y(p)~)^2 &=& -\sum_p 2(T(p)-Y(p)) ~~ \sigma'(F\cdot X_{\hat{p}}+\theta) ~~ X_{\hat{p},i,j}\\ &=& -\sum_p 2(T(p)-Y(p)) ~~ Y(p)(1-Y(p)) ~~ X_{\hat{p},i,j} \end{eqnarray} $$

Here, we define $\delta(p)$ as before:

$$ \delta(p) = 2(T(p)-Y(p)) ~~ Y(p)(1-Y(p)) $$

Let's generate a random mask to start with.
r=5
filter = F(0.01*randn(2*r+1,2*r+1))  # small random 11x11 initial filter
Now let's look at the output this filter generates.
# prediction of the untrained random filter (bias implicitly zero)
pred = sigmoid(filters.convolve(image,filter))
subplot(121); imshow(pred)
subplot(122); imshow(pred,vmin=0,vmax=1)
amin(pred),amax(pred)
(0.48624954, 0.52211583)
As before, we compute a delta between the output and the desired output.
delta0 = (out-pred)
subplot(121); imshow(delta0)
delta = (out-pred)*pred*(1-pred)
subplot(122); imshow(delta)
Note that each weight in our random filter contributes to all output pixels. If we want to "backpropagate" the deltas, the gradient for each filter weight must sum the deltas over all output pixels, each weighted by the correspondingly shifted input.
# gradient for each filter tap (i,j): sum delta against the image shifted
# by (i,j); note that roll wraps around at the image borders
dw = array([[sum(delta*roll(roll(image,i,0),j,1)) for j in range(-r,r+1)] for i in range(-r,r+1)])
dw /= prod(image.shape)
imshow(dw)
amin(dw),amax(dw),dw.dtype
(-0.0034652944, 0.0011683544, dtype('float32'))
Let's put this all together into a gradient descent learning algorithm.
theta = 0.0
for iter in range(1000):
    # forward pass
    pred = sigmoid(filters.convolve(image,filter)+theta)
    err = sum((pred-out)**2)
    # backward pass: delta for the logistic output, averaged over pixels
    delta = (pred-out)*pred*(1-pred)
    delta /= prod(image.shape)
    # gradient of the error w.r.t. each filter tap
    dw = array([[sum(delta*roll(roll(image,i,0),j,1)) for j in range(-r,r+1)] for i in range(-r,r+1)])
    if iter%50==0:
        print iter,err,":",(amin(pred),amax(pred)),sum(abs(delta)),
        print (amin(dw),amax(dw)),(amin(filter),amax(filter)),theta
    # gradient descent step for the filter and the bias
    filter -= dw
    theta -= sum(delta)
0 51495.0 : (0.48624954, 0.52211583) 0.10982 (-0.0013520373, 0.0032668943) (-0.029455993, 0.026091032) 0.0
50 4397.94 : (0.072140738, 0.36709112) 0.0162744 (-0.00070879137, 0.0011495821) (-0.10338775, 0.063890524) -1.71587491222
100 2627.21 : (0.031862784, 0.45811501) 0.0103208 (-0.00061922497, 0.00093237666) (-0.15040764, 0.088849939) -2.03730937187
150 1910.8 : (0.017386325, 0.56336188) 0.00819295 (-0.00056234887, 0.00077779568) (-0.18951969, 0.11478845) -2.19306036993
200 1473.71 : (0.010572175, 0.65190011) 0.00688511 (-0.00050368829, 0.0006372091) (-0.2218695, 0.13822004) -2.29702537518
250 1181.12 : (0.006968664, 0.71730405) 0.00591087 (-0.00044778202, 0.00052048423) (-0.24828075, 0.16170618) -2.37981959002
300 977.117 : (0.0049024085, 0.76385123) 0.00515023 (-0.00039510831, 0.00043040476) (-0.26993901, 0.18278031) -2.45108571986
350 829.998 : (0.0036344689, 0.79730797) 0.0045502 (-0.00034930339, 0.0003636154) (-0.28797349, 0.20138238) -2.51411203877
400 720.586 : (0.002809946, 0.82201749) 0.00407363 (-0.00031081311, 0.00031881803) (-0.30326092, 0.21787542) -2.57035250252
450 637.084 : (0.0022468881, 0.84082115) 0.0036917 (-0.00027875858, 0.00028317625) (-0.31643328, 0.23260616) -2.62074463977
500 571.935 : (0.0018463387, 0.8555268) 0.00338272 (-0.00025198792, 0.00025411052) (-0.32794094, 0.24586843) -2.66604685766
550 520.144 : (0.001551502, 0.86730045) 0.00313121 (-0.00022944857, 0.00022991268) (-0.33810905, 0.25789988) -2.70691331301
600 478.324 : (0.0013281747, 0.87691414) 0.00292533 (-0.00021028086, 0.00020941216) (-0.34791788, 0.26889029) -2.74391074048
650 444.095 : (0.0011548706, 0.88489538) 0.00275525 (-0.00019381418, 0.00019179737) (-0.35690808, 0.27899081) -2.77752619621
700 415.747 : (0.0010175834, 0.8916145) 0.00261426 (-0.00017952557, 0.0001764989) (-0.36504495, 0.28832325) -2.80817601702
750 392.03 : (0.00090688199, 0.89733911) 0.00249657 (-0.00016702792, 0.0001630753) (-0.37244996, 0.29698664) -2.83621535654
800 372.01 : (0.00081623549, 0.90226686) 0.0023978 (-0.00015600196, 0.00015120726) (-0.37922135, 0.30506241) -2.86194675113
850 354.975 : (0.00074100756, 0.90654665) 0.00231418 (-0.00014620741, 0.00014063941) (-0.38543916, 0.31261787) -2.88562904819
900 340.375 : (0.00067783217, 0.91029274) 0.00224326 (-0.00013745073, 0.00013117767) (-0.39117, 0.31970966) -2.9074842775
950 327.774 : (0.00062422006, 0.91359425) 0.0021827 (-0.00012957772, 0.00012265667) (-0.39646918, 0.32638583) -2.9277037502
After this training, let's look at the output.
figsize(12,6)
subplot(121); imshow(out,interpolation='nearest')
subplot(122); imshow(pred,interpolation='nearest')
figsize(12,6)
subplot(121); imshow(out[100:200,100:200],interpolation='nearest')
subplot(122); imshow(pred[100:200,100:200],interpolation='nearest')
imshow(filter,interpolation='nearest')
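For comparison, we can render the linear part of the target by applying the same Gaussian derivative to a unit impulse; the impulse trick and showing it at the filter's size are illustration choices (border truncation distorts the rendered kernel slightly). Up to the kernel flip that `convolve` introduces and the effect of the rectification, the learned filter should resemble it.

# render the Gaussian-derivative kernel that generated `out` by
# filtering a unit impulse, then show it next to the learned filter
impulse = zeros((2*r+1,2*r+1),'f')
impulse[r,r] = 1.0
target_kernel = 8*filters.gaussian_filter(impulse,2.0,order=(1,0))
subplot(121); imshow(filter,interpolation='nearest')
subplot(122); imshow(target_kernel,interpolation='nearest')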