Title: Linear Threshold Units Author: Thomas M. Breuel Institution: UniKL
from pylab import *
from IPython.core.display import Image
def fig(x): return Image(filename="Figures/"+x+".png")
def figs(*args):
    for i,f in enumerate(args):
        subplot(1,len(args),i+1)
        axis("off")
        imshow(imread("Figures/"+f+".png"))
from scipy import linalg
(Continuous Time Models)
In previous lectures, we looked at various more-or-less realistic neuron models that treated neurons as dynamical systems operating in continuous time:
(Discrete Time Abstraction)
Assume time is discrete. Then the output of each neuron at time step $t+1$ is a function of its inputs at time step $t$.
# McCulloch-Pitts Neuron
figs("mp-concrete","mp-neuron-abstract")
(McCulloch-Pitts Neuron)
Component-wise notation:
$$ y = \phi(\sum_{j=1}^m w_j x_j+x_0) $$
Vector notation:
$$ y = \phi(w\cdot x + x_0) $$
(Multiple McCulloch-Pitts Neurons)
Component-wise notation:
$$ y_k = \phi(\sum_{j=1}^m w_{kj}x_j+x_0) $$
Matrix notation:
$$ y = \phi(W \cdot x + x_0) $$
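As a small illustration (added here, not part of the original slides), the matrix form can be evaluated directly with the pylab imports above; the weights, biases, and input below are made up, and a Heaviside step is assumed for $\phi$:
# sketch: a layer of McCulloch-Pitts neurons, y = phi(W.x + x_0)
phi = lambda a: 1.0*(a>=0.0)                   # assumed non-linearity (Heaviside step)
W = array([[1.0,1.0,-1.0],[-1.0,2.0,0.5]])     # made-up weights: 2 neurons, 3 inputs
x0 = array([-1.5,0.0])                         # made-up biases
x = array([1.0,0.0,1.0])                       # example input
print(phi(dot(W,x)+x0))                        # outputs of the two neurons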
(Rate Coding Justification)
The original McCulloch-Pitts justification treated this as a discrete time model. We can also take another view: the output of a unit can be read as the average firing rate of a neuron over a short time window (rate coding).
(Linear Threshold Unit)
$\phi(\cdot)$ is some non-linearity.
We commonly choose the Heaviside function:
$$ H(x) = \left\lfloor x \geq 0 \right\rfloor $$
This gives rise to linear threshold units or linear threshold neurons.
def H(x): return 1.0*(x>=0.0)
(Linear Threshold Units)
$$ f(x) = \left\lfloor \sum_{j=1}^N w_j x_j > \theta \right\rfloor $$
$$ ~~~ = H(\sum_{j=1}^N w_j x_j - \theta) $$
(Linear Threshold Units as Boolean Gates)
The $x_j$ are assumed to be binary input values. What can we compute with LTUs?
AND-gate
$$ \hbox{AND}(x_1,...,x_N) = \left\lfloor \sum_{j=1}^N x_j \geq N \right\rfloor $$
OR-gate
$$ \hbox{OR}(x_1,...,x_N) = \left\lfloor \sum_{j=1}^N x_j \geq 1 \right\rfloor $$
fig("gates-1")
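As a quick sketch (added, not from the original slides), both gates can be evaluated with the function H defined above; the test inputs are made up:
# LTU gates over binary inputs, using H from above (illustrative sketch)
def AND_ltu(x): return H(sum(x)-len(x))   # fires only if all inputs are 1
def OR_ltu(x):  return H(sum(x)-1)        # fires if at least one input is 1
for x in [array([1,1,1]),array([1,0,1]),array([0,0,0])]:
    print(x, AND_ltu(x), OR_ltu(x))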
(Negation)
To build negation, we need negative input weights.
This corresponds to inhibitory synapses.
NAND-gate (and NOT gate):
$$ \hbox{NAND}(x_1,...,x_N) = \left\lfloor \sum_{j=1}^N -1 \cdot x_j > -N \right\rfloor $$
fig("gates-2")
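Similarly (an added sketch), a NAND unit uses only inhibitory weights; for binary inputs, the strict inequality above can be re-expressed with the $\geq$-based H by shifting the threshold by one:
# NAND with inhibitory (negative) weights: fires unless all inputs are 1
def NAND_ltu(x): return H((len(x)-1)-sum(x))   # equivalent to sum(x) < N for binary inputs
def NOT_ltu(x):  return NAND_ltu(array([x]))   # single-input NAND is negation
print(NAND_ltu(array([1,1,1])), NAND_ltu(array([1,0,1])), NOT_ltu(1), NOT_ltu(0))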
(Neural Networks as Logic Circuits)
By setting the weights appropriately, we can create AND, OR, NAND, and NOT gates out of linear threshold units.
Each linear threshold unit also acts as a one time step delay gate.
Taken together, this means we can build arbitrary computations out of these circuits.
(Computational Universality)
Are networks of neurons Turing equivalent?
(Cellular Automata)
(3-Neighborhood 1D Cellular Automata)
(Rule 110)
Rule 110 is the rule whose outputs are the bits $01101110_2 = 110$ for the eight possible inputs $111_2$ down to $000_2$.
Rule 110 has been shown to be computationally universal: localized disturbances (particle-like structures in 1D) propagate and interact with each other in ways that can carry out computation.
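For concreteness (an added sketch, not part of the original notebook), the rule can also be written directly as a lookup into its 8-entry truth table:
# direct Rule 110 update: look up each 3-neighborhood in the rule's truth table
rule = [(110>>k)&1 for k in range(8)]   # output bit for neighborhood value k = 4*left+2*center+right
def rule110_step(v):
    n = len(v)
    return array([rule[int(4*v[(i-1)%n]+2*v[i]+v[(i+1)%n])] for i in range(n)])
print(rule110_step(array([0,0,0,1,0,0,1,1])))   # example usage on a made-up state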
# example output from Rule 110
fig("r110-example")
# interacting "particles" in Rule 110
fig("r110-interactions")
(Tag Systems for Turing Equivalence)
(Tag Systems and Rule 110 Example)
fig("rule110-construction")
# Rule 110 Zoom
# http://uncomp.uwe.ac.uk/genaro/rule110/ctsRule110.html
figs("rule110-zoom")
(Finite Circuits and Turing Equivalence)
problem: a fixed, finite circuit can only compute a fixed, finite function
solution: use a recurrent network whose state vector can be made as large as needed and let it emulate a computationally universal cellular automaton (Rule 110)
We start by defining a size for the input, a number of iterations to run, and an initial random boolean vector.
N = 100
niter = 100
vstart = 1.0 * (rand(N)>0.5)
(Encoding Rule 110 as Matrix Operations)
The idea behind encoding Rule 110 as matrix operations is the following:
Given a vector $(...,x_{i-1},x_i,x_{i+1},...)$, we can test whether all three bits are 1 by computing a dot product:
$$ (0,...,0,1,1,1,0,...,0) \cdot (...,x_{i-1},x_i,x_{i+1},...) \geq 3 $$
We can do this for every 3-neighborhood by constructing a circulant matrix.
# circulant matrix that sums each cell's 3-neighborhood
Mall = linalg.circulant(1*(roll(arange(N)<3,-1)))
imshow(Mall,interpolation='none',cmap=cm.gray)
print(1*(dot(Mall,vstart)>=3))
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0]
With the same idea, we test whether the two rightmost bits are zero.
Mlo = linalg.circulant(-1*(arange(N)<2))
We can now put these two ideas together and compute vectors $v_{111}$ and $v_{x00}$ that indicate whether we are in a $111$ or $x00$ neighborhood.
We then apply another threshold function that computes the NOR of these two vectors to obtain the final output (Rule 110 produces a 0 exactly when the neighborhood is $111$ or $x00$).
# Rule 110, using two threshold steps per update
v = vstart.copy()
result = []
for i in range(niter):
    result.append(v)
    v111 = (dot(Mall,v)>=3)      # neighborhood is 111
    vx00 = (dot(Mlo,v)>=0)       # neighborhood is x00
    on = (-1*v111-1*vx00>=0)     # NOR: cell is on unless 111 or x00
    v = 1*on
This looks very much like Rule 110 ought to look.
imshow(array(result),interpolation='nearest',cmap=cm.gray)
(Single Recurrent Neural Network)
This isn't quite a recurrent neural network yet.
To get that, we pack the vector $v$ together with two additional vectors that hold the intermediate states of the computation ($v_{111}$ and $v_{x00}$).
If you work out the matrix multiplications, you'll see that this outputs a row of the original output every other step.
Z = zeros((N,N))
v = concatenate([vstart,zeros(2*N)])    # state: [cells, v111 part, vx00 part]
M = array(bmat([[Z,-1*eye(N),-1*eye(N)],[Mall,Z,Z],[Mlo,Z,Z]]))   # block weight matrix
b = concatenate([zeros(N),3*ones(N),zeros(N)])                    # per-unit thresholds
(Weight Matrix)
This shows the weight matrix of the single layer recurrent network for Rule 110. Note that all the green entries are 0. We can greatly speed up such computations using a sparse matrix package.
imshow(M)
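As a rough sketch of that last point (added here; the actual speedup depends on $N$ and the sparsity), the weight matrix can be stored in sparse form with scipy.sparse and used in the same recurrent update:
# sketch: store the mostly-zero weight matrix in sparse form
from scipy import sparse
Msp = sparse.csr_matrix(M)
print(Msp.nnz, "nonzero entries out of", M.size)
# the recurrent update below works unchanged as v = H(Msp.dot(v)-b)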
After this packing, the entire inner loop becomes a single recurrent layer of linear threshold units.
result = []
for i in range(2*niter):
    result.append(v)
    v = H(dot(M,v)-b)
result = array(result)
imshow(result,interpolation='nearest',cmap=cm.gray)
To extract the Rule 110 result, we take every other output row and keep only its first $N$ components, ignoring the intermediate state.
imshow(result[::2,:N],interpolation='nearest',cmap=cm.gray)
(Summary)
Therefore, recurrent single-layer neural networks are Turing complete.
We could also emulate a Turing machine directly.