(C) 2016 Steve Phelps

- Floating-point representation
- Arrays and Matrices with `numpy`
- Basic plotting with `matplotlib`
- Pseudo-random variates with `numpy.random`

Digital computers are inherently *discrete*. Real numbers $x \in \mathbb{R}$ cannot always be represented exactly in a digital computer.

They are stored in a format called *floating-point*. IEEE Standard 754 specifies a universal format across different implementations.

- As always, there are deviations from the standard.

There are two standard sizes of floating-point numbers: 32-bit and 64-bit.

64-bit numbers are called *double precision*, and are sometimes called *double* values. IEEE floating-point calculations are performed in *hardware* on modern computers.

How can we represent arbitrary real values using only 32 bits?

One way we could discretise continuous values is to represent them as two integers $x$ and $y$.

The final value is obtained by e.g. $r = x + y \times 10^{-5}$.

So the number $500.4421$ would be represented as the tuple $x = 500$, $y = 44210$.

The exponent $5$ is fixed for all computations.
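This fixed-exponent scheme can be sketched in a few lines of Python (a toy illustration; `encode` and `decode` are hypothetical helper names, not part of any library):

```python
# Toy fixed-point scheme: the exponent 5 is fixed for all values.
PRECISION = 5

def encode(r):
    x = int(r)                                  # integer part
    y = int(round((r - x) * 10 ** PRECISION))   # fractional part, scaled to an integer
    return x, y

def decode(x, y):
    return x + y * 10 ** -PRECISION

x, y = encode(500.4421)
print(x, y)   # 500 44210
```

Decoding recovers the original value up to the fixed precision of five decimal places.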

This number represents the *precision* with which we can represent real values. It corresponds to where we place the decimal *point*.

This scheme is called fixed precision.

It is useful in certain circumstances, but suffers from many problems.

In practice, we use variable precision, also known as *floating* point.

Humans also use a form of floating-point representation.

In scientific notation, all numbers are written in the form $m \times 10^n$.

When represented in ASCII, we abbreviate this as `<m>e<n>`, for example `6.72e+11` $= 6.72 \times 10^{11}$.

The number $m$ is called the *significand* or *mantissa*.

The integer $n$ is called the *exponent*.

The integer $10$ is the *base*.

- Python uses scientific notation when it displays floating-point numbers:

In [1]:

```
print(672000000000.0)
```

Note that internally, the value is not *represented* exactly like this. Scientific notation is a convention for writing or rendering numbers, *not* for representing them digitally.

Floating-point numbers use a base of $2$ instead of $10$.

Additionally, the mantissa and exponent are stored in binary.

The mantissa uses a sign-and-magnitude representation for positive and negative numbers:

- One bit is reserved as the sign-bit: 1 for negative, 0 for positive values.

The mantissa is normalised, so we assume that it starts with the digit $1$ (which is not stored).

We also need to represent signed *exponents*. The exponent does *not* use two's complement. Instead, a *bias* value is subtracted from the stored value to obtain the actual exponent. Double-precision values use a bias of $1023$, and single-precision uses a bias value of $127$.

For example, in double-precision the exponent $-1022$ would be stored as $1$.

The stored exponent values consisting of all 0s and all 1s are reserved for special values - discussed later.
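We can check the bias arithmetic directly by unpacking the bit pattern of a double with Python's standard `struct` module (a sketch; `double_exponent` is a hypothetical helper, not a library function):

```python
import struct

def double_exponent(x):
    # Reinterpret the 64-bit double as an unsigned integer and
    # extract the 11-bit stored exponent field.
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    return (bits >> 52) & 0x7FF

# 1.0 = 1.0 x 2^0, so the stored exponent is 0 + 1023 = 1023.
print(double_exponent(1.0))   # 1023
# 2.0 = 1.0 x 2^1, so the stored exponent is 1 + 1023 = 1024.
print(double_exponent(2.0))   # 1024
```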

The number of bits allocated to represent each integer component of a float is given below:

Format | Sign | Exponent | Mantissa | Total |
---|---|---|---|---|
single | 1 | 8 | 23 | 32 |
double | 1 | 11 | 52 | 64 |

Python normally works with 64-bit precision.

`numpy` allows us to specify the type when storing data in arrays. This is particularly useful for big data, where we may need to be careful about the storage requirements of our data-set.
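For example, we can compare the storage used by the same data at double and single precision (a small sketch):

```python
import numpy as np

a64 = np.array([1.0, 2.0, 3.0])                    # float64 by default
a32 = np.array([1.0, 2.0, 3.0], dtype=np.float32)  # request 32-bit storage

print(a64.dtype, a64.nbytes)   # float64 24
print(a32.dtype, a32.nbytes)   # float32 12
```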

We cannot represent every value in floating-point.

Consider single-precision (32-bit).

Let's try to represent $4,039,944,879$.

As a binary integer we write this:

`11110000 11001100 10101010 10101111`

This already takes up 32 bits.

The mantissa only allows us to store 24-bit integers.

So we have to *round*. We store it as:

`+1.1110000 11001100 10101011e+31`

- Which gives us

`+11110000 11001100 10101011 00000000`

$= 4,039,944,960$
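We can verify this rounding with `numpy`, assuming a standard IEEE 754 single-precision type:

```python
import numpy as np

# 4,039,944,879 needs more than 24 significant bits, so converting
# it to single precision rounds to the nearest representable value.
x = np.float32(4039944879)
print(int(x))   # 4039944960
```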

In single precision arithmetic, we *cannot* represent the following values:

Negative numbers less than $-(2-2^{-23}) \times 2^{127}$

Negative numbers greater than $-2^{-149}$

Positive numbers less than $2^{-149}$

Positive numbers greater than $(2-2^{-23}) \times 2^{127}$

Attempting to represent these numbers results in *overflow* or *underflow*.

Format | Binary | Decimal |
---|---|---|
single | $\pm (2-2^{-23}) \times 2^{127}$ | $\approx \pm 10^{38.53}$ |
double | $\pm (2-2^{-52}) \times 2^{1023}$ | $\approx \pm 10^{308.25}$ |
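A quick sketch of overflow at the single-precision limit:

```python
import numpy as np

# 1e38 is below the single-precision limit of roughly 10^38.53,
# so it is still representable.
print(np.float32(1e38))

# 1e39 is beyond the limit, so it overflows to infinity.
print(np.float32(1e39))   # inf
```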

Zero cannot be represented straightforwardly because we assume that all mantissa values start with the digit 1.

Zero is stored as a special case, by setting mantissa *and* exponent both to zero. The sign-bit can either be set or unset, so there are distinct positive and negative representations of zero.

In [2]:

```
x = +0.0
x
```

Out[2]:

In [3]:

```
y = -0.0
y
```

Out[3]:

- However, these are considered equal:

In [4]:

```
x == y
```

Out[4]:

Positive overflow results in a special value of infinity (in Python, `inf`). This is stored with an exponent consisting of all 1s, and a mantissa of all 0s.

The sign-bit allows us to differentiate between negative and positive overflow: $-\infty$ and $+\infty$.

This allows us to carry on calculating past an overflow event.

In [5]:

```
x = 1e300 * 1e100
x
```

Out[5]:

In [6]:

```
x = x + 1
x
```

Out[6]:

In [7]:

```
x > 0
```

Out[7]:

In [8]:

```
y = -x
y
```

Out[8]:

In [9]:

```
y < x
```

Out[9]:

Some mathematical operations on real numbers do not map onto real numbers.

These results are represented using the special value `NaN`, which represents "not a (real) number". `NaN` is represented by an exponent of all 1s, and a non-zero mantissa.
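Under the same IEEE 754 layout, we can confirm these bit patterns for `inf` and `NaN` by unpacking a double with the standard `struct` module (a sketch; `fields` is a hypothetical helper):

```python
import struct

def fields(x):
    # Reinterpret the 64-bit double as an unsigned integer.
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    exponent = (bits >> 52) & 0x7FF      # 11-bit stored exponent
    mantissa = bits & ((1 << 52) - 1)    # 52-bit mantissa
    return exponent, mantissa

e, m = fields(float('inf'))
print(e == 0x7FF, m == 0)    # True True: all-ones exponent, zero mantissa
e, m = fields(float('nan'))
print(e == 0x7FF, m != 0)    # True True: all-ones exponent, non-zero mantissa
```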

`NaN` in Python

In [10]:

```
from numpy import sqrt, inf, isnan, nan
x = sqrt(-1)
x
```

Out[10]:

In [11]:

```
y = inf - inf
y
```

Out[11]:

`nan` values in Python

- Beware of comparing `nan` values:

In [12]:

```
x == y
```

Out[12]:

- To test whether a value is `nan`, use the `isnan` function:

In [13]:

```
isnan(x)
```

Out[13]:

`NaN` is not the same as `None`

`None` represents a *missing* value. `NaN` represents an *invalid* floating-point value. These are fundamentally different entities:

In [14]:

```
nan is None
```

Out[14]:

In [15]:

```
isnan(None)
```

In e.g. simulation models or quantitative analysis, we typically repeatedly update numerical values inside long loops.

Programs such as these implement *numerical algorithms*. It is very easy to introduce bugs into code like this.

The round-off error associated with a result can be compounded in a loop.

If the error increases as we go round the loop, we say the algorithm is numerically *unstable*. Mathematicians design numerically stable algorithms using numerical analysis.

Suppose we have two real values $x$ and $y = x + \epsilon$.

$\epsilon$ is very small and $x$ is very large.

$x$ has an *exact* floating-point representation. However, because of lack of precision, $x$ and $y$ have the same floating-point representation.

- i.e. they are represented as the same sequence of 64 bits.

The absolute error is $r = y - x = (x + \epsilon) - x = \epsilon$.

The relative error is $R = r/x = (x + \epsilon)/x - 1 = \epsilon/x$.

So the relative error is tiny because $x$ is large relative to $\epsilon$.

OK, but how about the error in computing the *difference* between these values? If we perform this in floating-point we will get 0.

The correct value is $(x + \epsilon) - x = \epsilon$.

The relative error is infinite!

In [16]:

```
x = 3.141592653589793
x
```

Out[16]:

In [17]:

```
y = 6.022e23
x = (x + y) - y
```

In [18]:

```
x
```

Out[18]:

Avoid subtracting two nearly-equal numbers.

Especially in a loop!

Better yet, use a well-validated existing implementation in the form of a numerical library.
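As a small illustration of why library routines matter, Python's standard `math.fsum` tracks intermediate round-off and returns a correctly rounded sum, unlike a naive accumulation loop:

```python
import math

values = [0.1] * 10

# Naive accumulation compounds round-off error on every iteration.
naive = 0.0
for v in values:
    naive = naive + v

print(naive == 1.0)              # False: the errors have accumulated
print(math.fsum(values) == 1.0)  # True: fsum compensates for round-off
```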

In [19]:

```
import numpy as np
```

- We can now use the functions defined in this package by prefixing them with `np`.

Arrays represent a collection of values.

In contrast to lists:

- arrays typically have a *fixed length*; they can be resized, but this involves an expensive copying process.
- all values in the array are of the *same type*; typically we store floating-point values.

Like lists:

- arrays are *mutable*; we can change the elements of an existing array.

Arrays are provided by the `numpy` module. The function `array()` creates an array given a list.

In [20]:

```
import numpy as np
x = np.array([0, 1, 2, 3, 4])
x
```

Out[20]:

- We can index an array just like a list

In [21]:

```
x[4]
```

Out[21]:

In [22]:

```
x[4] = 2
x
```

Out[22]:

- Although this looks a bit like a list of numbers, it is a fundamentally different type of value:

In [23]:

```
type(x)
```

Out[23]:

- For example, we cannot append to the array:

In [24]:

```
x.append(5)
```

- When we use arithmetic operators on arrays, we create a new array with the result of applying the operator to each element.

In [25]:

```
y = x * 2
y
```

Out[25]:

- The same goes for numerical functions:

In [26]:

```
x = np.array([-1, 2, 3, -4])
y = abs(x)
y
```

Out[26]:

Note that not every function automatically works with arrays.

Functions that have been written to work with arrays of numbers are called *vectorized* functions. Most of the functions in `numpy` are already vectorized. You can create a vectorized version of any other function using the higher-order function `numpy.vectorize()`.

`vectorize` example

In [27]:

```
def myfunc(x):
    if x >= 0.5:
        return x
    else:
        return 0.0

fv = np.vectorize(myfunc)
```

In [28]:

```
x = np.arange(0, 1, 0.1)
x
```

Out[28]:

In [29]:

```
fv(x)
```

Out[29]:

- To populate an array with a range of values we use the `np.arange()` function:

In [30]:

```
x = np.arange(0, 10)
print(x)
```

- We can also use floating-point increments.

In [31]:

```
x = np.arange(0, 1, 0.1)
print(x)
```
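Because of round-off, a floating-point step can make `arange()` include or drop the endpoint unexpectedly. When we know how many points we want rather than the step size, `np.linspace()` is often a safer alternative (a brief sketch):

```python
import numpy as np

# 11 evenly spaced points from 0 to 1, with the endpoint included.
x = np.linspace(0, 1, 11)
print(x)
```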

We will use a module called `matplotlib` to plot some simple graphs. This module has a nested module called `pyplot`. By convention we import this with the alias `plt`. This module provides functions which are very similar to MATLAB plotting commands.

In [32]:

```
import matplotlib.pyplot as plt
%matplotlib inline
y = x*2 + 5
plt.plot(x, y)
```

Out[32]:

In [33]:

```
from numpy import pi, sin
x = np.arange(0, 2*pi, 0.01)
y = sin(x)
plt.plot(x, y)
```

Out[33]:

Numpy arrays can hold multi-dimensional data.

To create a multi-dimensional array, we can pass a list of lists to the `array()` function:

In [34]:

```
import numpy as np
x = np.array([[1,2], [3,4]])
x
```

Out[34]:

A multi-dimensional array is an array of arrays.

The outer array holds the rows.

Each row is itself an array:

In [35]:

```
x[0]
```

Out[35]:

In [36]:

```
x[1]
```

Out[36]:

- So the element in the second row, and first column is:

In [37]:

```
x[1][0]
```

Out[37]:
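numpy also accepts a single comma-separated subscript, which is the idiomatic form of `x[1][0]` for multi-dimensional indexing:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

# Second row, first column, in one subscript.
print(x[1, 0])   # 3
```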

- We can create a matrix from a multi-dimensional array.

In [38]:

```
M = np.matrix(x)
M
```

Out[38]:

- If we supply a matrix to `plot()` then it will plot the y-values taken from the *columns* of the matrix (notice the transpose in the example below).

In [39]:

```
from numpy import pi, sin, cos
x = np.arange(0, 2*pi, 0.01)
y = sin(x)
ax = plt.plot(x, np.matrix([sin(x), cos(x)]).T)
```

When we use `numpy` matrices in Python, the corresponding functions are linked with libraries written in C and FORTRAN. For example, see the BLAS (Basic Linear Algebra Subprograms) library.

These libraries are very fast, and can be configured so that operations are performed in parallel on multiple CPU cores, or GPU hardware.

To compute the transpose $M^{T}$

In [40]:

```
M.T
```

Out[40]:

To compute the inverse $M^{-1}$

In [41]:

```
M.I
```

Out[41]:

- The total number of elements, and the dimensions of the array:

In [42]:

```
M.size
```

Out[42]:

In [43]:

```
M.shape
```

Out[43]:

In [44]:

```
len(M.shape)
```

Out[44]:

- We can also create arrays directly from strings, which saves some typing:

In [45]:

```
I2 = np.matrix('2 0; 0 2')
I2
```

Out[45]:

- The semicolon starts a new row.

Now that we have two matrices, we can perform matrix multiplication:

In [46]:

```
M * I2
```

Out[46]:

- We can index and slice matrices using the same syntax as lists.

In [47]:

```
M[:,1]
```

Out[47]:

- If we use this in an assignment, we create a *reference* to the sliced elements, *not* a copy.

In [48]:

```
V = M[:,1] # This does not make a copy of the elements!
V
```

Out[48]:

In [49]:

```
M[0,1] = -2
V
```

Out[49]:

- To copy a matrix, or a slice of its elements, use the function `np.copy()`:

In [50]:

```
M = np.matrix('1 2; 3 4')
V = np.copy(M[:,1]) # This does copy the elements.
V
```

Out[50]:

In [51]:

```
M[0,1] = -2
V
```

Out[51]:

One way we *could* sum a vector or matrix is to use a `for` loop.

In [52]:

```
vector = np.arange(0.0, 100.0, 10.0)
vector
```

Out[52]:

In [53]:

```
result = 0.0
for x in vector:
    result = result + x
result
```

Out[53]:

- This is not the most *efficient* way to compute a sum.

Instead of using a `for` loop, we can use the numpy function `sum()`. This function is written in the C language, and is very fast.

In [54]:

```
vector = np.array([0, 1, 2, 3, 4])
print(np.sum(vector))
```

- When dealing with multi-dimensional data, the `sum()` function has a named argument `axis` which allows us to specify whether to sum over rows or columns.

In [55]:

```
matrix = np.matrix('1 2 3; 4 5 6; 7 8 9')
print(matrix)
```

- To sum over rows, giving the total of each column:

In [56]:

```
np.sum(matrix, axis=0)
```

Out[56]:

- To sum over columns, giving the total of each row:

In [57]:

```
np.sum(matrix, axis=1)
```

Out[57]:

- Suppose we want to compute $y_n = \sum_{i=1}^{n} x_i$ where $\mathbf{x}$ is a vector.

In [58]:

```
import numpy as np
x = np.array([0, 1, 2, 3, 4])
y = np.cumsum(x)
print(y)
```

In [59]:

```
x = np.matrix('1 2 3; 4 5 6; 7 8 9')
print(x)
```

In [60]:

```
np.cumsum(x, axis=0)
```

Out[60]:

In [61]:

```
np.cumsum(x, axis=1)
```

Out[61]:

- Similarly we can compute $y_n = \prod_{i=1}^{n} x_i$ using `cumprod()`:

In [62]:

```
import numpy as np
x = np.array([1, 2, 3, 4, 5])
np.cumprod(x)
```

Out[62]:

- We can compute cumulative products along rows and columns using the `axis` parameter, just as with the `cumsum()` example.

- The nested module `numpy.random` contains functions for generating random numbers from different probability distributions.

In [63]:

```
from numpy.random import normal, uniform, exponential, randint
```

Suppose that we have a random variable $\epsilon \sim N(0, 1)$.

In Python we can draw from this distribution like so:

In [64]:

```
epsilon = normal()
print(epsilon)
```

- If we execute another call to the function, we will make a *new* draw from the distribution:

In [65]:

```
epsilon = normal()
print(epsilon)
```

Strictly speaking, these are not random numbers.

They rely on an initial state value called the *seed*. If we know the seed, then we can predict with total accuracy the rest of the sequence, given any "random" number.

Nevertheless, statistically they behave like independently and identically-distributed values.

- Statistical tests for correlation and auto-correlation give insignificant results.

For this reason they are called *pseudo*-random numbers. The algorithms for generating them are called Pseudo-Random Number Generators (PRNGs).

Some applications, such as cryptography, require genuinely unpredictable sequences.

- never use a standard PRNG for these applications!
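For example, Python's standard `secrets` module draws from the operating system's cryptographically secure randomness source rather than a seeded PRNG (a brief sketch):

```python
import secrets

# Unlike a seeded PRNG, this value cannot be reproduced or predicted
# by re-running the program.
value = secrets.randbelow(100)
print(0 <= value < 100)   # True
```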

In some applications we need to reliably reproduce the same sequence of pseudo-random numbers that were used.

We can specify the seed value at the beginning of execution to achieve this.

Use the function `seed()` in the `numpy.random` module.

In [66]:

```
from numpy.random import seed
seed(5)
```

In [67]:

```
normal()
```

Out[67]:

In [68]:

```
normal()
```

Out[68]:

In [69]:

```
seed(5)
```

In [70]:

```
normal()
```

Out[70]:

In [71]:

```
normal()
```

Out[71]:

- To generate more than one number, we can specify the `size` parameter:

In [72]:

```
normal(size=10)
```

Out[72]:

- If you are generating very many variates, this will be *much* faster than using a `for` loop.

- We can also specify more than one dimension:

In [73]:

```
normal(size=(5,5))
```

Out[73]:

- We can plot a histogram of randomly-distributed data using the `hist()` function from matplotlib:

In [74]:

```
import matplotlib.pyplot as plt
%matplotlib inline
data = normal(size=10000)
ax = plt.hist(data)
```

- The function `histogram()` in the `numpy` module will count frequencies into bins and return the result as a pair of arrays: the counts, and the bin edges.
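For instance, we can unpack the counts and the bin edges separately; note that there is always one more edge than there are bins:

```python
import numpy as np
from numpy.random import normal

data = normal(size=1000)
counts, edges = np.histogram(data, bins=10)

print(len(counts), len(edges))   # 10 11
print(counts.sum())              # 1000: every value falls in some bin
```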

In [75]:

```
import numpy as np
np.histogram(data)
```

Out[75]:

In [76]:

```
np.mean(data)
```

Out[76]:

In [77]:

```
np.var(data)
```

Out[77]:

- These functions also have an `axis` parameter to compute means and variances of columns or rows of a multi-dimensional data-set.

`nan` values

- If the data contains `nan` values, then the summary statistics will also be `nan`.

In [78]:

```
from numpy import nan
import numpy as np
data = np.array([1, 2, 3, 4, nan])
np.mean(data)
```

Out[78]:

- To omit `nan` values from the calculation, use the functions `nanmean()` and `nanvar()`:

In [79]:

```
np.nanmean(data)
```

Out[79]:

The `randint()` function in `numpy.random` can be used to draw from a uniform discrete probability distribution. It takes two parameters: the low value (inclusive), and the high value (exclusive).

So to simulate one roll of a die, we would use the following Python code.

In [80]:

```
die_roll = randint(0, 6) + 1
die_roll
```

Out[80]:

Just as with the `normal()` function, we can generate an entire sequence of values. To simulate a Bernoulli process with $n=20$ trials:

In [81]:

```
bernoulli_trials = randint(0, 2, size = 20)
bernoulli_trials
```

Out[81]:

The earlier sections of this notebook were adapted from an article on floating-point numbers written by Steve Hollasch.