Give a high level (i.e., no equations) explanation of the topic, and why it is important.
Vector spaces are widely used in electrical engineering, computer science, physics, chemistry, and mathematics. They are a fundamental concept in abstract algebra and consequently show up all over the place. Defining a vector space means defining a mathematical system with two operations, an additive operation and a scalar multiplication operation, usually denoted + and *. Vector spaces can be defined over discrete-time signals (things like Rn and Cn), but they can also be defined over continuous-time signals (things like Pn, the set of all polynomials of degree n or less). Once a vector space is defined, it satisfies certain mathematical properties, and any transformation within that vector space is guaranteed to follow those properties.
There are some special vector spaces that always seem to come up, so they have special names. A couple of examples of these special vector spaces are Hilbert and Banach spaces. (These special spaces are covered in more detail in later topics.)
The following is the definition of a vector space, where x, y, and z are elements of a vector space V, 0 is the zero vector, and c and d are arbitrary scalars:

1. $\forall{x, y}$: $x+y = y+x$
2. $\forall{x, y, z}$: $(x+y)+z = x+(y+z)$
3. $\forall{x}$: $0+x = x+0 = x$
4. $\forall{x}$: $(-x) + x = x + (-x) = 0$
5. $\forall{x}$: $0x = 0$
6. $\forall{x}$: $1x = x$
7. $\forall{x, c, d}$: $(cd)x = c(dx)$
8. $\forall{x, y, c}$: $c(x+y) = cx + cy$
9. $\forall{x, c, d}$: $(c+d)x = cx + dx$

Any set with two operations that can be defined as + and * and that satisfy the above constraints is a vector space.
Axioms 1-4 describe how a vector space must behave under addition. Addition of any arbitrary elements from the vector space must be commutative and associative, and the space must contain an additive identity (the 0 vector) and an additive inverse for every element. Commutativity and associativity are commonly understood properties. An additive identity means there exists an element, the 0 vector, such that adding it to any element leaves that element unchanged. An additive inverse means that for every element there exists another element such that the two added together give the 0 vector.
Axioms 5-7 state how a vector space must behave under scalar multiplication: the scalar 0 times any element gives the 0 vector, there is a multiplicative identity, and scalar multiplication is associative. The multiplicative identity is the scalar 1, which times any element gives back that same element. Scalar multiplication being associative means that scaling by d and then by c is the same as scaling once by the product cd, i.e., (cd)x = c(dx).
Axioms 8-9 state that the distributive property holds, both for a scalar distributed over a vector sum and for a scalar sum distributed over a vector. In addition, it is helpful to note that x, y, and z must exist in the vector space, so any amount of adding or scaling must result in a vector that is also in the vector space (the space is closed under these operations).
The first vector space we will show is the real numbers R. In order to do this we need to verify all 9 axioms above. Rather than prove them formally, we will check numerically that the statements hold.
import numpy as np
# we will use this definition of equivalence to deal with floating point calculation errors
def equal(x, y):
    return abs(x - y) < 0.00000001
def print_result(a, b, c, d):
print(a, b, c, d)
print()
print("We will show below that each of the conditions are met for real numbers")
# first we initialize three random variables in R
x = np.random.sample(1)[0] * 100
y = np.random.sample(1)[0] * 100
z = np.random.sample(1)[0] * 100
# now we need to initialize a few random scalars
c = np.random.sample(1)[0] * 100
d = np.random.sample(1)[0] * 100
print("x = ", x)
print("y = ", y)
print("z = ", z)
print("c = ", c)
print("d = ", d)
print()
def check_if_is_valid_vector_space(x, y, z, c, d):
# 1. $\forall{x, y}$: $x+y = y+x$
print_result("1. x + y = y + x =", x + y, ":", equal(x + y, y + x))
    # 2. $\forall{x, y, z}$: $(x+y)+z = x+(y+z)$
    print_result("2. (x + y) + z = x + (y + z) =", (x + y) + z, ":", equal((x + y) + z, x + (y + z)))
    # 3. $\forall{x}$: $0+x = x+0 = x$
    print_result("3. 0 + x = x =", x + 0, ":", equal(x + 0, x))
    # 4. $\forall{x}$: $(-x) + x = x + (-x) = 0$
    print_result("4. x + -x = 0 =", x + -x, ":", equal(x + -x, 0))
# 5. $\forall{x}$: $0x = 0$
print_result("5. x * 0 = 0 =", x * 0, ":", equal(x * 0, 0) )
# 6. $\forall{x}$: $1x = x$
print_result("6. x * 1 = x =", x * 1, ":", equal(x * 1, x) )
# 7. $\forall{x, c, d}$: $(cd)x = c(dx)$
print_result("7. (cd)x = c(dx) =", (c * d) * x, ":", equal((c * d) * x, c * (d * x)) )
    # 8. $\forall{x, y, c}$: $c(x+y) = cx + cy$
print_result("8. c(x + y) = cx + cy =", c * (x + y), ":", equal(c * (x + y), (c * x) + (c * y)) )
# 9. $\forall{x, c, d}$: $(c+d)x = cx +dx$
print_result("9. (c + d)x = cx + dx =", (c + d) * x, ":", equal((c + d) * x, (c * x) + (d * x)) )
check_if_is_valid_vector_space(x, y, z, c, d)
We have shown numerically that R satisfies the axioms of a vector space. A vector space can be defined over any Rn. As an example we will show that R10 is also a vector space.
import numpy as np
# we will use this definition of equivalence to deal with floating point calculation errors
# (we compare element-wise with np.abs so that positive and negative differences cannot cancel)
def equal(x, y):
    return np.all(np.abs(x - y) < 0.00000001)
print("We will show below that each of the conditions are met for numbers in R^n")
# first we initialize three random variables in R^n
n = 10
x = np.random.sample(n) * 100
y = np.random.sample(n) * 100
z = np.random.sample(n) * 100
# now we need to initialize a few random scalars
c = np.random.sample(1)[0] * 100
d = np.random.sample(1)[0] * 100
print("x = ", x)
print("y = ", y)
print("z = ", z)
print("c = ", c)
print("d = ", d)
print()
check_if_is_valid_vector_space(x, y, z, c, d)
Hopefully now you can see how this could be applied to any Rn. A simple extension is to define a vector space over Cn, which we sketch next. We can even define a vector space over a set of functions.
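Here is a minimal sketch of the Cn case, reusing the equal and check_if_is_valid_vector_space definitions from above and drawing random real and imaginary parts (the sizes and scaling factors are arbitrary choices for illustration):

# a sketch: vectors and scalars in C^n, checked against the same axioms
n = 10
x = (np.random.sample(n) + 1j * np.random.sample(n)) * 100
y = (np.random.sample(n) + 1j * np.random.sample(n)) * 100
z = (np.random.sample(n) + 1j * np.random.sample(n)) * 100
# complex scalars this time
c = (np.random.sample(1)[0] + 1j * np.random.sample(1)[0]) * 100
d = (np.random.sample(1)[0] + 1j * np.random.sample(1)[0]) * 100
check_if_is_valid_vector_space(x, y, z, c, d)

Below we define P3 as a vector space. P3 is all polynomials of degree 3 or less.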
n = 3
p1 = np.random.sample(n + 1) * 100
p2 = np.random.sample(n + 1) * 100
p3 = np.random.sample(n + 1) * 100
c = np.random.sample(1)[0] * 100
d = np.random.sample(1)[0] * 100
# here we define how to print a polynomial
def polynomial_to_string(p1):
m = p1.shape[0]
string = ''
for i in range(m - 1, 0, -1):
string += str(p1[i]) + "v^" + str(i) + " + "
string += str(p1[0])
return string
# now we will want to print out our nice polynomial
def print_result(a, b, c, d):
print(a, polynomial_to_string(b), c, d)
print()
# And finally we check that P3 satisfies the vector space axioms
check_if_is_valid_vector_space(p1, p2, p3, c, d)
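As a quick sanity check, here is a sketch showing that arithmetic on coefficient vectors agrees with pointwise arithmetic on the underlying polynomials. It uses numpy's polynomial evaluation, which orders coefficients from lowest to highest degree, matching our representation; the evaluation point t is an arbitrary choice.

# sketch: coefficient-vector arithmetic matches pointwise polynomial arithmetic
from numpy.polynomial import polynomial as P
t = 2.5  # an arbitrary evaluation point
lhs = P.polyval(t, c * (p1 + p2))                  # evaluate c*(p1 + p2) at t
rhs = c * P.polyval(t, p1) + c * P.polyval(t, p2)  # evaluate c*p1 + c*p2 at t
print("c*(p1 + p2)(t) =", lhs, "| c*p1(t) + c*p2(t) =", rhs, "| equal:", equal(lhs, rhs))

This is exactly the distributive axiom (axiom 8) viewed through evaluation at a point.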
Provide a more sophisticated example showing one engineering example of the topic, complete with python code.
Neural networks are powerful models that often use vector representations. One of the first neural network models to come out of research is called a Perceptron. If you are unfamiliar, a perceptron is a single-layer neural network. One famous example of its use is identifying the type of flower in the Iris dataset. The Iris dataset contains 3 types of flowers: Iris-setosa, Iris-versicolor, and Iris-virginica. We will ignore Iris-virginica in order to simplify the problem. Each instance of a flower contains certain measurements: sepal length, sepal width, petal length, and petal width. All measurements are in cm.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def load_data():
URL_='https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
data = pd.read_csv(URL_, header = None)
data = data[:100]
print(data)
data[4] = np.where(data.iloc[:, -1]=='Iris-setosa', 0, 1)
return np.array(data)
data = load_data()
The task in this problem is to identify the type of flower from its features. In order to do this we will represent each feature of a flower as a dimension in our Iris flower vector space. In machine learning, this technique of creating a vector space to represent a list of attributes of an object for identification is also called a feature space. For the Iris dataset we end up with a vector space in R4. The first flower would be represented in 4-dimensional space as the vector x = [5.1, 3.5, 1.4, 0.2]T. We want to define the problem so that we input x and get out the class y, in this case Iris-setosa. We will represent Iris-setosa as class 0 and Iris-versicolor as class 1.
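To make the feature-space idea concrete, here is a small sketch (assuming the data array returned by load_data above): each row's first four columns form a vector in R4, and familiar vector space operations such as scaling and addition keep us inside R4.

# sketch: flower measurements as vectors in R^4
x1 = data[0, :4].astype(float)   # first flower, e.g. [5.1, 3.5, 1.4, 0.2]
x2 = data[1, :4].astype(float)   # second flower
print("x1 in cm:", x1)
print("x1 in mm (scalar multiplication):", 10 * x1)
print("midpoint of x1 and x2 (addition plus scaling):", 0.5 * (x1 + x2))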
Below is an implemented perceptron. For anyone unfamiliar, I tried to comment my code a lot so that it is easy to understand. During training the model runs through each example, and if it gets one wrong it takes a step in the right direction (using gradient descent) so that it will be right next time. Each run through the training data is called an epoch (pronounced like epic). It is important to run enough epochs so that the weights converge. If you have a hard time understanding what is going on in the Perceptron class, try looking at the next segment first.
import numpy as np
# Throughout this code the following symbols are used a is the learning rate,
# x is the feature vector, y is the class label, w is a weight vector
class Perceptron():
    def __init__(self):
        self.w = []             # weight vector, learned during training
        self.a = .1             # learning rate
        self.num_epochs = 1000  # number of passes through the training data
# calculates the accuracy of the machine learning model
def measure_accuracy(self, xs, ys):
correct_count = 0
prediction = []
print("Test Set Results")
for i in range(np.size(xs, axis=0)):
x = xs[i]
y = int(ys[i])
self.predict(x, prediction)
pred = int(prediction[0])
if pred == y:
correct_count += 1
print(" Test Sample", i + 1)
print(" Input:", xs[i])
print(" Predicted Class:", pred)
print(" True Class:", int(ys[i]))
print()
return correct_count / np.size(xs, axis=0)
# does the prediction at test time
def test(self, x, w):
n = np.dot(x, w)
z = 1 if n > 0 else 0
return z
    # applies the perceptron update rule and returns the updated weight vector
def delta_w(self, a, x, y, w):
n = np.dot(x, w)
z = 1 if n > 0 else 0
delta = (a * (y - z)) * x
return w + delta
# one run through the training set is also called one epoch
def epoch(self, a, x, y, w, N):
for i in range(N):
w = self.delta_w(a, x[i], y[i], w)
return w
# trains the model
def train(self, x, y):
self.w = np.random.randn(np.size(x, axis=1) + 1)
x = np.squeeze(np.asarray(x))
x = np.hstack((x, np.ones((np.size(x, axis=0), 1))))
for _ in range(self.num_epochs):
self.w = self.epoch(self.a, x, np.squeeze(y), self.w, np.size(x, axis=0))
return self.w
# a wrapper function for test (it predicts the output at test time)
def predict(self, x, y):
x = np.append(x, 1)
del y[:]
y += [(self.test(x, self.w))]
return y
Now that we have defined our model we need to shuffle the data, split it into training and testing sets, train our model, and finally test our model.
# load the model
learner = Perceptron()
# shuffle the data
np.random.shuffle(data)
# we will split our data into training and test sets
num_trn_examples = 80 # this means there are 20 test examples because there are 100 total examples
trn_data = data[:num_trn_examples]
tst_data = data[num_trn_examples:]
# split the input from the class labels
trn_x = trn_data[:, :-1]
trn_y = trn_data[:, -1]
tst_x = tst_data[:, :-1]
tst_y = tst_data[:, -1]
# train the perceptron
learner.train(trn_x, trn_y)
# test our model
accuracy = learner.measure_accuracy(tst_x, tst_y)
print("Test Set Accuracy", str(int(accuracy * 100)) + "%")
We were able to model the different Iris flowers in a vector space and consequently identify the type of Iris flower from several sepal/petal measurements. YAY! In machine learning there are many sophisticated examples where modeling with vector spaces is very useful. Hopefully now you can see other instances where vector spaces would also be useful.
Write code to numerically verify that the integer class Zn does not satisfy all of the conditions for a vector space. (hint: Are the results always in the same space?)
# todo: add your code here!
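One possible sketch of an answer (this is one interpretation of the exercise, shown for n = 1 since the same argument applies coordinate-wise in Zn; the random ranges are arbitrary): the integers are closed under addition, but multiplying by an arbitrary real scalar usually produces a non-integer, so the result leaves the space.

import numpy as np
# sketch: Z is closed under addition...
x = np.random.randint(-100, 100)
y = np.random.randint(-100, 100)
print("x + y =", x + y, "| still an integer:", float(x + y).is_integer())
# ...but not under multiplication by an arbitrary real scalar
c = np.random.sample(1)[0] * 100
print("c * x =", c * x, "| still an integer:", float(c * x).is_integer())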