In this chapter, we will:
[Warren Ellis] I try not to get involved in the business of prediction. It's a quick way to look like an idiot.
Previously, we have learned about the paradigm "Predict, Compare, Learn". In this chapter, we'll dive deep into the first step: Predict. For our first neural network, we will be predicting one data point at a time (single input, single output), like so:
The "shape" of the input has a significant impact on what a network looks like (its architecture).
We usually start by inputting all of the information that corresponds to the input entity (example: all pixel values of a cat image). If we are short on computational resources, we limit the inputs to the ones that we think will be most helpful for prediction.
As a result, we can create a network only after we understand the shape of the input and output data sets.
We are going to build a network with a single knob mapping from the input point to the output. In representation learning, these knobs are actually called "weights".
Here's our first network, with a single "weight" mapping from the input "# toes" to the output "win?":
Let's start with the simplest neural network possible:
# an empty network.
weight = .1
def neural_network(x, w):
prediction = x * w
return prediction
# inserting one input datapoint.
number_of_toes = [8.5, 9.5, 10, 9]
x0 = number_of_toes[0]
pred = neural_network(x0, weight)
print(pred)
0.8500000000000001
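As a quick sketch (same weight and data as above), we can run the network over each game's data point independently:

```python
# Minimal single-weight network, applied to every data point in turn.
weight = 0.1

def neural_network(x, w):
    return x * w

number_of_toes = [8.5, 9.5, 10, 9]
predictions = [neural_network(x, weight) for x in number_of_toes]
print(predictions)  # roughly [0.85, 0.95, 1.0, 0.9]
```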
From the previous example, we can say that a neural network consists of one or more weights that we multiply by the input data to make a prediction. The input data is a numerical value that we measure in the real world, and a prediction is what the neural network tells us, given the input data.
The prediction, however, is not always right. Sometimes the neural network makes mistakes, but what is important is that it learns from them. A neural network learns by following these steps:
A neural network, in its simplest form, uses the power of multiplication. Some weight values make parts of the input bigger, while others make other parts smaller.
A neural network considers the input value as "given information" and its weight values as knowledge. By using both, it outputs a prediction. A neural network uses the knowledge stored in its weights to interpret the information in the input. Weights can be interpreted as a measure of sensitivity between the input and the prediction (the weight is a volume knob).
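To make the volume-knob analogy concrete, here is a tiny sketch (hypothetical weight values) showing how the weight scales the same input:

```python
# The same input, "heard" at different volumes depending on the weight.
x = 8.5
for w in [0.0, 0.1, 0.2]:
    print(w, x * w)  # weight 0 mutes the input entirely
```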
As demonstrated in the previous example, the network had access to only one input instance. This implies that if we were to feed in number_of_toes[1], the network wouldn't remember the prediction it made at the previous time step. All in all, a neural network knows only what you feed it as input; it forgets everything else.
Later, we will learn how to give a neural network a "short-term memory" by feeding in multiple inputs at once.
Now we want to use multiple inputs or "features" that describe the same input instance.
# Our implementation.
def w_sum(W, X):
"""Calculates W*X"""
assert(len(W) == len(X))
muls = list()
    for i in range(len(W)):
muls.append(W[i] * X[i])
return sum(muls)
def w_sum(a,b):
"""Books implementation of W*X"""
assert(len(a) == len(b))
output = 0
for i in range(len(a)):
output += (a[i] * b[i])
return output
# An empty network with multiple inputs.
weights = [.1, .2, 0]
def neural_network(X, weights):
pred = w_sum(X, weights)
return pred
This dataset provides the following information at the beginning of each game for the first four games in a season:
toes = [8.5, 9.5, 9.9, 9]
wlrec = [.65, .8, .8, .9]
nfans = [1.2, 1.3, .5, 1]
x0 = [toes[0], wlrec[0], nfans[0]]
neural_network(x0, weights)
0.9800000000000001
The network multiplies each of the 3 inputs by its own knob (weight) and then sums the results. Because the input has 3 values, the network also has 3 weights; each input multiplied by its weight forms a local prediction, and the local predictions are summed. This operation is called the weighted sum, or the dot product.
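We can check that weighted sum by hand for the first game (x0 and weights taken from the code above):

```python
# Local predictions, one per feature, then the sum:
#   toes:  8.5  * 0.1 = 0.85
#   wlrec: 0.65 * 0.2 = 0.13
#   nfans: 1.2  * 0.0 = 0.00
x0 = [8.5, 0.65, 1.2]
weights = [0.1, 0.2, 0.0]
pred = sum(x * w for x, w in zip(x0, weights))
print(pred)  # roughly 0.98
```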
Being able to manipulate vectors is a cornerstone technique for representation learning. Let's implement simple functions that support the following operations:
def elementwise_multiplication(vec_a, vec_b):
"""Element-wise multiplication of two vectors of the same size."""
assert(len(vec_a) == len(vec_b))
muls = list()
for i in range(len(vec_a)):
muls.append(vec_a[i] * vec_b[i])
return muls
def elementwise_addition(vec_a, vec_b):
"""Element-wise addition of two vectors of the same size."""
assert(len(vec_a) == len(vec_b))
adds = list()
for i in range(len(vec_b)):
adds.append(vec_a[i] + vec_b[i])
return adds
def vector_sum(vec_a):
"""Sums the values of a vector."""
assert(type(vec_a) == list)
return sum(vec_a)
def vector_avg(vec_a):
"""Averages the values in a vector."""
assert(type(vec_a) == list)
return sum(vec_a) / len(vec_a)
a, b = [1,2,3], [4,5,6]
vector_sum(elementwise_multiplication(a, b))
32
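The remaining helpers can be exercised the same way; a quick self-contained sanity check:

```python
def elementwise_addition(vec_a, vec_b):
    """Element-wise addition of two vectors of the same size."""
    assert len(vec_a) == len(vec_b)
    return [vec_a[i] + vec_b[i] for i in range(len(vec_a))]

def vector_avg(vec_a):
    """Averages the values in a vector."""
    return sum(vec_a) / len(vec_a)

a, b = [1, 2, 3], [4, 5, 6]
print(elementwise_addition(a, b))  # → [5, 7, 9]
print(vector_avg(a))               # → 2.0
```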
The intuition behind the dot product operation is one of the most important parts of truly understanding how neural networks make predictions. Because we are summing up element-wise multiplications, the dot product gives us a measure of similarity between two vectors. Let's consider the following example:
a = [ 0, 1, 0, 1] w_sum(a,b) = 0
b = [ 1, 0, 1, 0] w_sum(b,c) = 1
c = [ 0, 1, 1, 0] w_sum(b,d) = 1
d = [.5, 0,.5, 0] w_sum(c,c) = 2
e = [ 0, 1,-1, 0] w_sum(c,e) = 0
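We can verify this similarity table with a self-contained w_sum (same logic as the implementation earlier):

```python
def w_sum(a, b):
    assert len(a) == len(b)
    return sum(a[i] * b[i] for i in range(len(a)))

a = [ 0, 1,  0, 1]
b = [ 1, 0,  1, 0]
c = [ 0, 1,  1, 0]
d = [.5, 0, .5, 0]
e = [ 0, 1, -1, 0]

print(w_sum(a, b))  # → 0   (no overlap)
print(w_sum(b, c))  # → 1
print(w_sum(b, d))  # → 1.0
print(w_sum(c, c))  # → 2   (perfect overlap with itself)
print(w_sum(c, e))  # → 0   (positive and negative overlap cancel out)
```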
We can equate the properties of the dot product to the logical AND. In this analogy, negative weights tend to imply a logical NOT operator, and, as we can observe, pairing a positive weight with a negative one drives the overall similarity score down.
Neural networks are also able to model partial ANDing. After the multiplication comes the summation, which behaves like an OR: if any feature results in a high product, it affects the overall score.
Amusingly, this gives us a kind of crude language for reading weights. The following examples assume we're performing the dot product; whenever the if statement returns True, we give back a high score:
W = [ 1, 0, 1] => if input[0] OR input[2]
W = [ 0, 0, 1] => if input[2]
W = [ 1, 0,-1] => if input[0] OR NOT input[2]
W = [-1, 0,-1] => if NOT input[0] OR NOT input[2]
W = [.5, 0, 1] => if BIG input[0] or input[2]
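A rough sanity check of this reading on binary inputs (w_sum redefined so the snippet stands alone):

```python
def w_sum(a, b):
    return sum(x * y for x, y in zip(a, b))

W = [1, 0, -1]  # reads as: if input[0] OR NOT input[2]
print(w_sum(W, [1, 0, 0]))  # → 1  (input[0] set: high score)
print(w_sum(W, [0, 0, 1]))  # → -1 (only input[2] set: the NOT drags the score down)
print(w_sum(W, [1, 0, 1]))  # → 0  (the two effects cancel)
```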
Takeaways:
- nfans is completely ignored in the prediction since its corresponding weight is 0.
- This analogy will help us significantly in the future, especially when putting networks together in increasingly complex ways.
Now's the time to use numpy (Numerical Python), a Python library, to optimize our neural network implementation:
import numpy as np
weights = np.array([.1, .2, 0])
toes = [8.5, 9.5, 9.9, 9]
wlrec = [.65, .8, .8, .9]
nfans = [1.2, 1.3, .5, 1]
def neural_network(W, X):
assert(W.shape[0] == X.shape[0])
    return np.dot(W, X)
neural_network(weights, np.array([toes[0], wlrec[0], nfans[0]]))
0.9800000000000001
Neural networks can also make multiple predictions using only a single input. Prediction occurs the same way as if there were 3 disconnected single-weight neural networks:
def neural_network(X, W):
    """Note: NumPy broadcasts the scalar input X across the weight vector W."""
    return X * W
neural_network(np.array([3]), np.array([.2, .7, 0]))
array([0.6, 2.1, 0. ])
weights = [.3, .2, .9]
def neural_network(W, X):
"""Book's Implementation."""
pred = ele_mul(X, W)
return pred
def ele_mul(c, l):
"""Element-wise scalar multiplication."""
    assert isinstance(c, (int, float))
result = list()
for i in range(len(l)):
result.append(c * l[i])
return result
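A usage sketch for this version, assuming the win/loss record is the single input (a hypothetical choice; any scalar feature works):

```python
# Single input, three outputs: one prediction per "neuron".
weights = [.3, .2, .9]

def ele_mul(c, l):
    """Element-wise scalar multiplication."""
    return [c * v for v in l]

wlrec = [0.65, 0.8, 0.8, 0.9]
pred = ele_mul(wlrec[0], weights)
print(pred)  # roughly [0.195, 0.13, 0.585]
```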
We should note that the three predictions are completely separate, unlike neural networks with multiple inputs and a single output, where we sum the products. This network truly behaves as three independent components, each receiving the same input data.
Finally, neural networks can also predict multiple outputs given multiple inputs:
# We set the weights (knowledge) values
#toes #%win #fans
weights = [[.1, .1, -.3],  # 1st neuron: hurt?
           [.1, .2, .0],   # 2nd neuron: win?
           [.0, 1.3, .1]]  # 3rd neuron: sad?
def neural_network(X, W):
pred = vect_mat_mult(X, W)
return pred
# Input: R^{1x3} ; Weights: R^{3x3}
def vect_mat_mult(vect, matrix):
"""Calculates X (vect) * W (matrix)"""
assert(len(vect) == len(matrix))
output = [0] * len(vect)
for i in range(len(vect)):
output[i] = w_sum(vect, matrix[i])
return output
# inputs.
toes = [8.5, 9.5, 9.9, 9.0]
wlrec = [.65, .8, .8, .9]
nfans = [1.2, 1.3, .5, 1.0]
# one column.
x0 = [toes[0], wlrec[0], nfans[0]]
pred = neural_network(x0, weights); pred
[0.555, 0.9800000000000001, 0.9650000000000001]
In the case of multiple inputs and outputs, the network performs three independent weighted sums (dot products) of the input to make three predictions.
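For example, the first output ("hurt?") can be checked by hand with the weights and inputs above:

```python
# hurt = toes * 0.1 + wlrec * 0.1 + nfans * (-0.3)
hurt = 8.5 * 0.1 + 0.65 * 0.1 + 1.2 * (-0.3)
print(hurt)  # roughly 0.555
```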
Neural networks can also be stacked:
We can take the output of one network and feed it as input to another network. This results in two consecutive vector-matrix multiplications. Let's try an example:
# A Network with multiple inputs and outputs
ih_wgt = [[0.1, 0.2, -0.1],
[-0.1,0.1, 0.9],
[0.1, 0.4, 0.1]]
hp_wgt = [[0.3, 1.1, -0.3],
[0.1, 0.2, 0.0],
[0.0, 1.3, 0.1]]
weights = [ih_wgt, hp_wgt]
def neural_network(X, W):
hid = vect_mat_mult(X, W[0])
pred = vect_mat_mult(hid, W[1])
return pred
pred = neural_network(x0, weights); pred
[0.21350000000000002, 0.14500000000000002, 0.5065]
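Tracing the first prediction by hand (same x0 and weight matrices as above) confirms the stacked computation:

```python
# Hidden layer: dot x0 = [8.5, 0.65, 1.2] with each row of ih_wgt.
hid0 = 8.5 * 0.1 + 0.65 * 0.2 + 1.2 * (-0.1)   # roughly 0.86
hid1 = 8.5 * -0.1 + 0.65 * 0.1 + 1.2 * 0.9     # roughly 0.295
hid2 = 8.5 * 0.1 + 0.65 * 0.4 + 1.2 * 0.1      # roughly 1.23
# First final prediction: dot the hidden vector with the first row of hp_wgt.
pred0 = hid0 * 0.3 + hid1 * 1.1 + hid2 * (-0.3)
print(pred0)  # roughly 0.2135
```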
The following example shows how we can perform the same computations using a convenient Python library called NumPy. Using libraries like NumPy makes our code faster and easier to read and write.
import numpy as np
ih_wgt = np.array(ih_wgt).transpose()
hp_wgt = np.array(hp_wgt).transpose()
weights = [ih_wgt, hp_wgt]
def neural_network(X, W):
out = X.dot(W[0])
pred = out.dot(W[1])
return pred
toes = np.array(toes)
wlrec = np.array(wlrec)
nfans = np.array(nfans)
x0 = np.array([toes[0], wlrec[0], nfans[0]])
pred = neural_network(x0, weights); pred
array([0.2135, 0.145 , 0.5065])
NumPy adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. Let's check some numpy examples:
a = np.array([0,1,2,3]) # a vector.
b = np.array([4,5,6,7]) # another vector.
c = np.array([[0,1,2,3], [4,5,6,7]]) # A Matrix.
d = np.zeros((2,4)) # 2x4 matrix of zeros.
e = np.random.rand(2,5) # 2x5 matrix of random numbers between 0 and 1.
print(a,b,c,d,e, sep='\n')
[0 1 2 3]
[4 5 6 7]
[[0 1 2 3]
 [4 5 6 7]]
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[0.1645878  0.29578112 0.18441271 0.14036276 0.07897252]
 [0.27901318 0.69096197 0.60646666 0.01898684 0.88302135]]
# element-wise multiplication.
print(a*.2)
[0. 0.2 0.4 0.6]
# element-wise multiplication.
print(c*.1)
[[0.  0.1 0.2 0.3]
 [0.4 0.5 0.6 0.7]]
# multiply two vectors (element wise).
print(a*b)
[ 0 5 12 21]
# complex element-wise multiplications.
print(a*b*.3)
[0. 1.5 3.6 6.3]
# element-wise row multiplications (because of compatible shapes).
print(a*c)
[[ 0  1  4  9]
 [ 0  5 12 21]]
# error in case of incompatible shapes.
print(a*e)
ValueError: operands could not be broadcast together with shapes (4,) (2,5)
When we multiply two variables using the * operator, numpy automatically detects what kinds of variables we are working with and tries to figure out the operation we want. For element-wise operators (+, -, *, /), the shapes of the two variables must be compatible: either the corresponding dimensions are equal, or one of them is 1 (in which case it is broadcast across the other).
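A quick sketch of that broadcasting rule, using element-wise addition:

```python
import numpy as np

a = np.zeros((2, 5))
col = np.ones((2, 1))   # one column: broadcast across a's 5 columns
row = np.ones((1, 5))   # one row: broadcast across a's 2 rows
print((a + col).shape)  # → (2, 5)
print((a + row).shape)  # → (2, 5)
```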
When we are performing an operation using numpy
, we should always keep the shapes of the inputs in mind.
a = np.zeros((1,4))
b = np.zeros((4,3))
c = a.dot(b)
print(c.shape)
(1, 3)
If we use NumPy's .dot operator, we should pay attention to the order of the operands, because the inner dimensions of the variables next to each other must match: dotting shapes (a, b) and (b, c) yields shape (a, c).
Let's check more examples that demonstrate the concept of shape
:
import numpy as np
a = np.zeros((2,4))
b = np.zeros((4,3))
c = a.dot(b)
print(c.shape)
(2, 3)
e = np.zeros((2,1))
f = np.zeros((1,3))
g = e.dot(f)
print(g.shape)
(2, 3)
h = np.zeros((5,4)).T
i = np.zeros((5,6))
j = h.dot(i)
print(j.shape)
(4, 6)
import numpy as np
h = np.zeros((5,4))
i = np.zeros((5,6))
j = h.dot(i)
print(j.shape)
ValueError: shapes (5,4) and (5,6) not aligned: 4 (dim 1) != 5 (dim 0)
To predict, neural networks perform repeated weighted sums of the input. The network's "intelligence" depends on the weight values we give it.
Everything we have done in this chapter is a form of what is called forward propagation.
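As a recap, the entire forward-propagation pipeline from this chapter fits in a few NumPy lines (same stacked weights as above):

```python
import numpy as np

# input -> hidden -> prediction: two consecutive vector-matrix multiplications.
ih_wgt = np.array([[0.1, 0.2, -0.1],
                   [-0.1, 0.1, 0.9],
                   [0.1, 0.4, 0.1]]).T
hp_wgt = np.array([[0.3, 1.1, -0.3],
                   [0.1, 0.2, 0.0],
                   [0.0, 1.3, 0.1]]).T

x0 = np.array([8.5, 0.65, 1.2])  # toes, win/loss record, fans for game 1
pred = x0.dot(ih_wgt).dot(hp_wgt)
print(pred)  # roughly [0.2135, 0.145, 0.5065]
```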