Artificial Neural Networks
%matplotlib widget
import matplotlib.pyplot as plt
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from IPython.display import SVG
Using TensorFlow backend.
Definition (neural network): A neural network $\operatorname{NN}$ is a tuple $\operatorname{NN}=(A_l, b_l, \sigma_l)_{1 \leq l \leq N_L}$ defined by

- a number of layers $N_L \in \mathbb{N}$ together with layer dimensions $n_0, n_1, \ldots, n_{N_L} \in \mathbb{N}$, where $n_i := n_0$ is called the input dimension and $n_o := n_{N_L}$ the output dimension,
- weight matrices $A_l \in \mathbb{R}^{n_{l-1} \times n_l}$, bias vectors $b_l \in \mathbb{R}^{n_l}$ and activation functions $\sigma_l:\mathbb{R} \to \mathbb{R}$ for all $1 \leq l \leq N_L$.

For any $1 \leq l \leq N_L$, the tuple $(A_l, b_l, \sigma_l)$ is called a layer. For $l=N_L$, the layer is called the output layer and for $1 \leq l<N_L$, the layer is called a hidden layer.
Definition (feedforward): Let $\operatorname{NN}=(A_l, b_l, \sigma_l)_{1 \leq l \leq N_L}$ be a neural network. Then for each $1 \leq l \leq N_L$, we define a function $$F_l:\mathbb{R}^{n_{l-1}} \to \mathbb{R}^{n_l}, \qquad v \mapsto \sigma_l(v^T A_l + b_l),$$ where we employ the convention that $\sigma_l$ is applied in every component. The composition $F:\mathbb{R}^{n_i} \to \mathbb{R}^{n_o}$, $F := F_{N_L} \circ \ldots \circ F_2 \circ F_1$ is called the feedforward of $\operatorname{NN}$. Any input vector $x \in \mathbb{R}^{n_i}$ is called an input layer.
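Translated directly into code, the feedforward is just a loop over the layers. The following is a minimal numpy sketch (the names feed_forward and layers are ours, not part of any library); each layer is a triple (A, b, sigma) with A of shape $(n_{l-1}, n_l)$:

def feed_forward(layers, v):
    # layers: list of triples (A_l, b_l, sigma_l); v: input vector of length n_i
    for A, b, sigma in layers:
        v = sigma(v @ A + b)  # for a 1-D array v, v @ A equals v^T A from the definition
    return v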
Warning: The formulation $F_l(v) = \sigma_l(A_l v + b_l)$ is also very common. Conceptually, this makes no difference as long as the dimensions of $A_l$ are chosen accordingly. The conventions in this notebook are consistent with keras.
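To see that the two conventions agree, note that the components of $v^T A_l + b_l$ and of $A_l^T v + b_l$ coincide. A quick numerical check (the shapes are arbitrary choices for illustration):

rng = np.random.default_rng(0)  # arbitrary example shapes: 3 inputs, 5 units
A = rng.normal(size=(3, 5))     # keras convention: A has shape (n_{l-1}, n_l)
b = rng.normal(size=5)
v = rng.normal(size=3)
np.testing.assert_array_almost_equal(v @ A + b, A.T @ v + b)  # the other convention uses A^T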
fig, ax = plt.subplots(figsize=(7, 5))
ax.axis("equal")
xs = 2
ys = 1.5
radius = 0.5
fsize = 20

def add_circle(x, y, label, color='r'):
    c1 = plt.Circle((x, y), radius, alpha=0.2, color=color, zorder=1)
    c2 = plt.Circle((x, y), radius, color='k', fill=False, zorder=1)
    ax.add_artist(c1)
    ax.add_artist(c2)
    ax.text(x - radius / 2, y - radius / 3, label, size=fsize)

def add_line(x1, x2, y1, y2):
    ax.plot([x1, x2], [y1, y2], alpha=0.1, color='k', zorder=-1)

# input layer: 3 units x_1, x_2, x_3 (top to bottom)
for unit in (1, 0, -1):
    add_circle(0, unit * ys, "$x_{%s}$" % str(3 - unit - 1), color='green')
# three hidden layers with 5 units v_{l;1}, ..., v_{l;5} each (top to bottom)
for hlayer in (1, 2, 3):
    for unit in (-2, -1, 0, 1, 2):
        add_circle(hlayer * xs, unit * ys, "$v_{%s;%s}$" % (str(hlayer), str(3 - unit)), color='blue')
# output layer: 2 units y_1, y_2
for unit in (-1, 1):
    add_circle(4 * xs, unit * ys, "$y_{%s}$" % ("1" if unit == 1 else "2"), color='red')
# connections from the input into the first hidden layer
for iunit in (-1, 0, 1):
    for hunit in (-2, -1, 0, 1, 2):
        add_line(0, xs, iunit * ys, hunit * ys)
# connections between consecutive hidden layers
for hlayer in (1, 2):
    for unita in (-2, -1, 0, 1, 2):
        for unitb in (-2, -1, 0, 1, 2):
            add_line(hlayer * xs, (hlayer + 1) * xs, unita * ys, unitb * ys)
# connections from the last hidden layer into the output layer
for hunit in (-2, -1, 0, 1, 2):
    for ounit in (-1, 1):
        add_line(3 * xs, 4 * xs, hunit * ys, ounit * ys)
ax.text(0, -4, "Input", size=fsize)
ax.text(1.7 * xs, -4, "Hidden", size=fsize)
ax.text(3.7 * xs, -4, "Output", size=fsize)
ax.set_xlim([0, 5])
ax.set_ylim([-4, 4])
_ = plt.axis('off')
[Figure: a feedforward network with an input layer of 3 units, three hidden layers of 5 units each, and an output layer of 2 units.]
Neural networks can be visualized as in the picture above: it shows a network $(A_l, b_l, \sigma_l)_{1 \leq l \leq N_L}$ with a total of $N_L=4$ layers, i.e. $3$ layers are hidden. Notice that the input layer is just a visualization of the input and is not part of the actual network topology.
The links between the nodes visualize the feedforward. This can be seen as follows: The input layer has $n_i=3$ units, i.e. any $x \in \mathbb{R}^3$ can be used as an input layer. The first hidden layer has $n_1=5$ units, i.e. the first function in the feedforward is a map $F_1: \mathbb{R}^3 \to \mathbb{R}^5$, $x \mapsto F_1(x) = \sigma_1(x^T A_1 + b_1)$. If we unwind the definition of the matrix-vector multiplication and compute the components $v_{1;i}$ of the result $v_1 := F_1(x)$, we obtain for any $1 \leq i \leq 5$ $$ v_{1;i} = \sigma_1 \Big( \sum_{j=1}^{3}{x_j A_{1;ji}} + b_{1;i} \Big).$$ For the first component $i=1$, we can see that it depends on all the inputs, i.e. on $x_1, x_2, x_3$. This is visualized by the $3$ lines connecting the respective units. The same holds for the other components.

The computation of the feedforward now proceeds through the second hidden layer in the same fashion as from the input layer into the first hidden layer, i.e. the computed output of the first layer $v_{1} \in \mathbb{R}^5$ can now be thought of as an input to the second layer. Here, the feedforward function is $F_2:\mathbb{R}^5 \to \mathbb{R}^5$ and the layer computes $v_2 := F_2(v_1)$, again for every component $1 \leq i \leq 5$, via $$ v_{2;i} = \sigma_2 \Big( \sum_{j=1}^{5}{v_{1;j} A_{2;ji}} + b_{2;i} \Big).$$ The computation for $F_3:\mathbb{R}^5 \to \mathbb{R}^5$ is analogous and we obtain $v_3 := F_3(v_2)$. Finally, the output layer applies $F_4:\mathbb{R}^5 \to \mathbb{R}^2$ and we obtain $v_4 := F_4(v_3)$, computed just like in the hidden layers. By definition, the result of the output layer is the final result of the complete computation, i.e. $y := F(x) = v_4 = F_4(v_3)$.
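The componentwise formulas and the matrix formulation compute the same values. A short numerical check for a hypothetical first layer (the weights here are random and purely illustrative):

def sigma(t):
    return 1 / (1 + np.exp(-t))

rng = np.random.default_rng(1)
A1 = rng.normal(size=(3, 5))  # n_i = 3 inputs, n_1 = 5 units
b1 = rng.normal(size=5)
x0 = rng.normal(size=3)

v1 = sigma(x0 @ A1 + b1)  # matrix form
for i in range(5):        # componentwise form, as in the formula above
    assert np.isclose(v1[i], sigma(sum(x0[j] * A1[j, i] for j in range(3)) + b1[i]))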
Neural networks can be conveniently set up in Python using keras. In order to train them, a backend is required; we chose tensorflow here.
We set up the neural network from the previous section in keras.
model = Sequential() # initializes the model with no layers yet
model.add(Dense(units=5, input_dim=3, activation='sigmoid')) # first hidden layer needs to know the input dimension
model.add(Dense(units=5, activation='sigmoid')) # second hidden layer
model.add(Dense(units=5, activation='sigmoid')) # third hidden layer
model.add(Dense(units=2)) # output layer (uses linear activation here as a default)
model.compile(optimizer='adam', loss='MAE') # this creates the actual model (optimizer and loss metric are needed for training later)
model.summary() # provides neat summary about the network topology and the parameters
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_1 (Dense)              (None, 5)                 20
_________________________________________________________________
dense_2 (Dense)              (None, 5)                 30
_________________________________________________________________
dense_3 (Dense)              (None, 5)                 30
_________________________________________________________________
dense_4 (Dense)              (None, 2)                 12
=================================================================
Total params: 92
Trainable params: 92
Non-trainable params: 0
_________________________________________________________________
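The parameter counts in the summary can be verified by hand: a Dense layer mapping $n_{l-1}$ to $n_l$ units has $n_{l-1} \cdot n_l$ matrix weights plus $n_l$ bias weights, i.e. $3 \cdot 5 + 5 = 20$, $5 \cdot 5 + 5 = 30$, $5 \cdot 5 + 5 = 30$ and $5 \cdot 2 + 2 = 12$, which sums to the $92$ trainable parameters reported above.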
# keras can also plot the model automatically (Graphviz needs to be installed and in PATH)
# from keras.utils.vis_utils import model_to_dot
# SVG(model_to_dot(model).create(prog='dot', format='svg'))
# after the model is set up the weights are initialized with random values
model.get_weights() # notice that the shapes of the matrices and vectors are as expected
[array([[ 0.7183232 ,  0.38307422,  0.0776127 ,  0.8320963 , -0.09762895],
        [-0.2450003 , -0.85496855, -0.35251206,  0.66085297, -0.47060898],
        [-0.52232635, -0.4491664 ,  0.41039938, -0.19148451, -0.11118329]],
       dtype=float32),
 array([0., 0., 0., 0., 0.], dtype=float32),
 array([[-0.23155713, -0.24680787, -0.0723573 , -0.66734   ,  0.7170485 ],
        [-0.6729239 , -0.5296908 ,  0.09631431, -0.06348813, -0.04985815],
        [-0.573444  , -0.5506638 , -0.39181206,  0.7385675 ,  0.5457883 ],
        [-0.6523951 , -0.69977224, -0.66549873,  0.57492626, -0.5063471 ],
        [ 0.28198433, -0.39915782,  0.2621491 ,  0.0493477 ,  0.69479597]],
       dtype=float32),
 array([0., 0., 0., 0., 0.], dtype=float32),
 array([[-0.6755266 ,  0.60235417, -0.13710588, -0.2865331 ,  0.5920446 ],
        [ 0.14386547,  0.43755114,  0.71060646,  0.20056444, -0.24053401],
        [-0.76083595,  0.14162123,  0.4353727 , -0.21385908, -0.25838703],
        [ 0.01410168,  0.640501  , -0.1288696 , -0.11130595, -0.02846223],
        [-0.25267512,  0.2775594 ,  0.54000807,  0.40621924,  0.7530713 ]],
       dtype=float32),
 array([0., 0., 0., 0., 0.], dtype=float32),
 array([[ 0.5556489 , -0.34421366],
        [-0.5555906 , -0.02127665],
        [-0.8151509 ,  0.861356  ],
        [-0.21263468,  0.13752174],
        [-0.8979237 ,  0.4667455 ]],
       dtype=float32),
 array([0., 0.], dtype=float32)]
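As a quick sanity check, the shapes can also be inspected directly; matrices of shapes $(3,5)$, $(5,5)$, $(5,5)$, $(5,2)$ alternate with the corresponding bias vectors:

print([w.shape for w in model.get_weights()])
# [(3, 5), (5,), (5, 5), (5,), (5, 5), (5,), (5, 2), (2,)]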
# we create an input layer
x = np.array([1,2,3])
X = x.reshape(1,3) # because outputs are usually computed for multiple inputs at once, the input layer needs an additional (batch) axis here
Y = model.predict(X) # computes the outputs for all of the current inputs
y = Y[0] # currently, we only have one input
y
array([-1.3470876, 0.7427888], dtype=float32)
In this section, we manually reproduce the computation of y from X.
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
A = model.get_weights()[::2] # extracting the matrix weights
b = model.get_weights()[1::2] # extracting the vector weights
# compute the feedforward
v = [sigmoid(x @ A[0] + b[0])]
v.append(sigmoid(v[0] @ A[1] + b[1]))
v.append(sigmoid(v[1] @ A[2] + b[2]))
v.append(v[2] @ A[3] + b[3])
print(v[3])
# result is in the last vector, which should agree with the keras computation (up to numerical errors)
np.testing.assert_array_almost_equal(y,v[3])
[-1.34708779 0.74278881]
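The optimizer and loss passed to compile are used for training, which is not covered in this notebook. As a minimal sketch, the model could be fitted on synthetic data (the inputs and targets here are random and purely illustrative):

# illustrative only: random inputs and targets, just to demonstrate the fit call
X_train = np.random.rand(100, 3)  # 100 samples with n_i = 3 features
Y_train = np.random.rand(100, 2)  # matching targets with n_o = 2 components
model.fit(X_train, Y_train, epochs=10, batch_size=16, verbose=0)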