This notebook is an element of the risk-engineering.org courseware. It can be distributed under the terms of the Creative Commons Attribution-ShareAlike licence.

Author: Eric Marsden

This notebook introduces the SciPy library's support for various probability distributions. The library documentation is available online. We also show how the SymPy library for symbolic mathematics can be used to calculate various statistical properties analytically.

In [1]:

```
import numpy
import scipy.stats
import matplotlib.pyplot as plt
plt.style.use('ggplot')
%matplotlib inline
%config InlineBackend.figure_formats=['svg']
```

Let's generate 5 random variates from a continuous uniform distribution between 90 and 100 (the first argument to the `scipy.stats.uniform` function is the lower bound, and the second argument is the width). The object `u` will contain the "frozen distribution", that is, the distribution with its parameters fixed.

In [2]:

```
u = scipy.stats.uniform(90, 10)
u.rvs(5)
```

Out[2]:
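The two positional arguments correspond to SciPy's `loc` and `scale` keywords. The sketch below spells them out, reads the bounds back from the frozen distribution, and shows one way to make the draw repeatable. The seed value and the names `rng`, `lower`, `upper` and `sample` are illustrative choices, not part of the original notebook.

```python
import numpy
import scipy.stats

# loc is the lower bound and scale is the width, so the support is [90, 100]
u = scipy.stats.uniform(loc=90, scale=10)
lower, upper = u.support()

# Passing a seeded generator as random_state makes the draw repeatable
rng = numpy.random.default_rng(42)   # the seed 42 is an arbitrary choice
sample = u.rvs(5, random_state=rng)
print(lower, upper)
print(sample)
```

Without `random_state`, each call to `rvs` returns different values, which is why the outputs in this notebook change from one run to the next.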

Let's check that the mean of 1000 variates is close to the expected value of the distribution, which is 95.

In [3]:

```
u.rvs(1000).mean()
```

Out[3]:

Let's check that around 20% of the variates are less than 92.

In [4]:

```
(u.rvs(1000) < 92).sum() / 1000.0
```

Out[4]:
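The two simulated estimates above vary from run to run. As a cross-check, the frozen distribution can also report the exact values through its `mean` and `cdf` methods. This is a minimal sketch; the variable names `exact_mean` and `exact_prob` are only illustrative.

```python
import scipy.stats

u = scipy.stats.uniform(90, 10)

exact_mean = u.mean()    # (90 + 100) / 2
exact_prob = u.cdf(92)   # (92 - 90) / 10, the probability of a value below 92
print(exact_mean, exact_prob)
```

The simulated values should approach these figures as the number of variates grows.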

We can also use the `stats` module of the SymPy library to obtain the same information using an analytical (rather than stochastic) method.

In [5]:

```
import sympy.stats
us = sympy.stats.Uniform("unif", 90, 100)
# generate one random variate
sympy.stats.sample(us)
```

Out[5]:

Check that the expected value (the mean of the distribution) is 95.

In [6]:

```
sympy.stats.E(us)
```

Out[6]:

The probability of a random variate being less than 92:

In [7]:

```
sympy.stats.P(us < 92)
```

Out[7]:
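The same symbolic approach extends to other moments. As a sketch, the variance of a uniform distribution on $[a, b]$ is $(b - a)^2 / 12$, and SymPy returns it as an exact rational rather than a floating point approximation. The names `var` and `std` are illustrative.

```python
import sympy
import sympy.stats

us = sympy.stats.Uniform("unif", 90, 100)

var = sympy.stats.variance(us)   # exact value of (100 - 90)**2 / 12
std = sympy.stats.std(us)        # exact square root of the variance
print(var, std)
```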

Consider a Gaussian (normal) distribution centered on 5, with a standard deviation of 1.

In [8]:

```
norm = scipy.stats.norm(5, 1)
x = numpy.linspace(1, 9, 100)
plt.plot(x, norm.pdf(x));
```

Check that half the distribution is located to the left of 5.

In [9]:

```
norm.cdf(5)
```

Out[9]:

In [10]:

```
norm.ppf(0.5)
```

Out[10]:

Find the first percentile of the distribution (the value of $x$ which has 1% of realizations to the left). Check that it is also equal to the 99% survival quantile.

In [11]:

```
norm.ppf(0.01)
```

Out[11]:

In [12]:

```
norm.isf(0.99)
```

Out[12]:

In [13]:

```
norm.cdf(2.67365)
```

Out[13]:
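The cells above rely on three related methods: `ppf` inverts `cdf`, and `isf` inverts the survival function `sf`, so `isf(1 - p)` and `ppf(p)` give the same quantile. The following sketch makes those relations explicit; the variable names are illustrative.

```python
import scipy.stats

norm = scipy.stats.norm(5, 1)
p = 0.01

q_ppf = norm.ppf(p)            # value with probability p to its left
q_isf = norm.isf(1 - p)        # value with probability 1 - p to its right
round_trip = norm.cdf(q_ppf)   # feeding the quantile back into cdf recovers p
print(q_ppf, q_isf, round_trip)
```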

The central limit theorem states that the mean of a large number of independent, identically distributed random measurements tends to a normal distribution, whatever the shape of the original measurement distribution (provided that it has a finite variance). The same is true of the sum of the measurements. Let's test that in Python, simulating measurements from a uniform distribution between 30 and 40.

Procedure: take 100 measurements from the $U(30, 40)$ distribution, and calculate their mean. Repeat this 10000 times and plot a histogram of the means, which should be normally distributed.

In [14]:

```
N = 10000
sim = numpy.zeros(N)
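for i in range(N):
    # mean of 100 measurements drawn from U(30, 40)
    sim[i] = numpy.random.uniform(30, 40, 100).mean()
# density=True replaces the normed=True argument, which recent Matplotlib versions no longer accept
plt.hist(sim, bins=20, alpha=0.5, density=True);
```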
```
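The histogram can also be checked numerically. For $U(30, 40)$ the mean is 35 and the standard deviation is $10/\sqrt{12}$, so the mean of 100 measurements should have a standard deviation of about $10/\sqrt{12 \times 100} \approx 0.29$. The sketch below is a vectorized variant of the loop above and compares the simulated spread with that prediction. The seed and the names `rng`, `means` and `predicted_std` are illustrative assumptions.

```python
import numpy

rng = numpy.random.default_rng(0)   # arbitrary seed, for repeatability
n, N = 100, 10000

# One row per repetition, n measurements per row, then one mean per row
means = rng.uniform(30, 40, size=(N, n)).mean(axis=1)

mu = (30 + 40) / 2                      # mean of U(30, 40)
sigma = (40 - 30) / numpy.sqrt(12)      # standard deviation of U(30, 40)
predicted_std = sigma / numpy.sqrt(n)   # standard deviation of the sample mean

print(means.mean(), mu)
print(means.std(), predicted_std)
```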

Exercise: try this with other probability distributions.

The exponential distribution is often used in reliability engineering to represent failure of equipment that is not exposed to wear. The hazard function, or failure rate, of the exponential distribution is constant, equal to $\lambda$, which means that the equipment does not age. Let's check the property that the expected value of an exponential random variable is $\frac{1}{\lambda}$.

In [15]:

```
lda = 25
obs = scipy.stats.expon.rvs(scale=1/float(lda), size=1000)
obs.mean()
```

Out[15]:

In [16]:

```
1/float(lda)
```

Out[16]:

Indeed, those are quite close! Let's check another property of the exponential distribution: that the variance is equal to $\lambda^{-2}$.

In [17]:

```
obs.var()
```

Out[17]:

In [18]:

```
1/float(lda)**2
```

Out[18]:

And of course, since the standard deviation is the square root of the variance, it should be equal to the expected value, $\frac{1}{\lambda}$.

In [19]:

```
obs.std()
```

Out[19]:
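These properties can also be read directly from a frozen exponential distribution, without sampling. The sketch below also evaluates the hazard function $h(t) = f(t) / S(t)$, the density divided by the survival function, at a few times to confirm that it is constant and equal to $\lambda$. The names `e`, `t` and `hazard` are illustrative.

```python
import numpy
import scipy.stats

lda = 25
e = scipy.stats.expon(scale=1 / lda)

# Exact moments: mean 1/lda, variance 1/lda**2, standard deviation 1/lda
print(e.mean(), e.var(), e.std())

# Hazard function h(t) = pdf(t) / sf(t), evaluated at a few arbitrary times
t = numpy.linspace(0.0, 0.2, 5)
hazard = e.pdf(t) / e.sf(t)
print(hazard)
```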