# stats_arrays demonstration

The stats_arrays library is designed with five main objectives:

• A consistent interface to SciPy and NumPy statistical functions
• The ability to quickly load and save many parameter uncertainty distribution definitions in a portable format
• The ability to manipulate and switch parameter uncertainty distributions and variables
• Simple Monte Carlo random number generators that return a vector of parameter values to be fed into uncertainty or sensitivity analysis
• Something simple, extensible, documented, and tested

## Defining arrays

Let's create a simple array with two uncertain parameters from different uncertainty distributions:

In [1]:
from stats_arrays import *
import numpy as np
my_variables = UncertaintyBase.from_dicts(
    {'loc': 2, 'scale': 0.5, 'uncertainty_type': NormalUncertainty.id},
    {'loc': 1.5, 'minimum': 0, 'maximum': 10, 'uncertainty_type': TriangularUncertainty.id}
)
my_variables

Out[1]:
array([(2.0, 0.5, nan, nan, nan, False, 3),
(1.5, nan, nan, 0.0, 10.0, False, 5)],
dtype=[('loc', '<f8'), ('scale', '<f8'), ('shape', '<f8'), ('minimum', '<f8'), ('maximum', '<f8'), ('negative', '?'), ('uncertainty_type', 'u1')])

The returned array is a plain NumPy structured array; it is not wrapped in a class or given any special functionality. The arrays are deliberately simple, so that they are easy to save and load.

In [2]:
type(my_variables)

Out[2]:
numpy.ndarray
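Because the parameter array is an ordinary structured array, standard NumPy serialization works out of the box. A short sketch (the file path and values are illustrative, using the same dtype shown above):

```python
import os
import tempfile
import numpy as np

# Illustrative parameter array with the structured dtype stats_arrays uses.
params = np.array(
    [(2.0, 0.5, np.nan, np.nan, np.nan, False, 3),
     (1.5, np.nan, np.nan, 0.0, 10.0, False, 5)],
    dtype=[('loc', '<f8'), ('scale', '<f8'), ('shape', '<f8'),
           ('minimum', '<f8'), ('maximum', '<f8'),
           ('negative', '?'), ('uncertainty_type', 'u1')],
)

path = os.path.join(tempfile.mkdtemp(), 'params.npy')
np.save(path, params)      # the .npy format preserves the dtype exactly
loaded = np.load(path)
```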

One obvious thing that we want to do with uncertain parameters is to sample from their distributions.

In [3]:
my_rng = MCRandomNumberGenerator(my_variables)
for x in range(5):
    print(next(my_rng))

[ 1.97593555  0.8728425 ]
[ 1.53794138  3.00478287]
[ 2.08384221  3.87324292]
[ 1.20813909  3.58180859]
[ 2.2107861   2.21406803]


The Monte Carlo RNG can also be an iterator:

In [4]:
list(zip(my_rng, range(5)))

Out[4]:
[(array([ 1.66232583,  0.95309365]), 0),
(array([ 1.93255217,  0.74483594]), 1),
(array([ 2.22056889,  2.41335127]), 2),
(array([ 2.2502121 ,  2.29962017]), 3),
(array([ 1.33087407,  7.75322672]), 4)]
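Conceptually, each draw takes one sample per parameter from its distribution and returns them as a vector. A minimal pure-NumPy sketch of this behavior (not the library's actual implementation), using the same normal and triangular parameters defined above:

```python
import numpy as np

def mc_sampler(rng=None):
    """Yield vectors of [Normal(2, 0.5), Triangular(0, 1.5, 10)] draws,
    mimicking the generator defined above."""
    rng = rng or np.random.default_rng()
    while True:
        yield np.array([
            rng.normal(loc=2, scale=0.5),
            rng.triangular(left=0, mode=1.5, right=10),
        ])

sampler = mc_sampler()
samples = [next(sampler) for _ in range(5)]
```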

## Hypercubic sampling

We can also sample from a hypercube of parameter values.

Note that this is not technically Latin hypercube sampling, as there is no guarantee that each value is sampled only once.

In [5]:
lhc = LatinHypercubeRNG(my_variables)
lhc.hypercube

Out[5]:
array([[ 1.33241113,  1.54577107,  1.69770733,  1.82562215,  1.94290735,
2.05709265,  2.17437785,  2.30229267,  2.45422893,  2.66758887],
[ 1.16774842,  1.66060826,  2.13754607,  2.64534779,  3.19091516,
3.78418439,  4.44040551,  5.18524994,  6.06877303,  7.22020275]])
In [6]:
next(lhc)

Out[6]:
array([ 1.54577107,  1.66060826])
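The rows of the hypercube appear to be evenly spaced quantiles of each distribution, at probabilities i/(n+1) for i = 1..n. We can reproduce the normal row above using only the standard library (this reconstruction is an assumption about how the grid is built, not the library's documented algorithm):

```python
from statistics import NormalDist

n = 10
dist = NormalDist(mu=2, sigma=0.5)  # the first parameter defined above
row = [dist.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
# row[0] is approximately 1.3324 and row[4] approximately 1.9429,
# matching the first row of lhc.hypercube shown above.
```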

We can use the power of NumPy arrays to do some neat things. First, let's get a new array with just Gaussian-distributed variables.

In [7]:
my_variables = UncertaintyBase.from_dicts(
    {'loc': 3, 'scale': 0.5, 'uncertainty_type': NormalUncertainty.id},
    {'loc': 4, 'scale': 1, 'uncertainty_type': NormalUncertainty.id},
    {'loc': 5, 'scale': 2.5, 'uncertainty_type': NormalUncertainty.id},
    {'loc': 6, 'scale': 5, 'uncertainty_type': NormalUncertainty.id},
)


We can then add bounds. Let's exclude values less than zero for all variables, and values greater than 10 for variables whose standard deviation is greater than 2.

In [8]:
my_variables['minimum'] = 0.
my_variables['maximum'][my_variables['scale'] > 2] = 10.
my_variables

Out[8]:
array([(3.0, 0.5, nan, 0.0, nan, False, 3),
(4.0, 1.0, nan, 0.0, nan, False, 3),
(5.0, 2.5, nan, 0.0, 10.0, False, 3),
(6.0, 5.0, nan, 0.0, 10.0, False, 3)],
dtype=[('loc', '<f8'), ('scale', '<f8'), ('shape', '<f8'), ('minimum', '<f8'), ('maximum', '<f8'), ('negative', '?'), ('uncertainty_type', 'u1')])
In [9]:
my_rng = MCRandomNumberGenerator(my_variables)
some_samples = np.array([next(my_rng) for x in range(1000)]).ravel()
print((some_samples < 0).sum())

0


Bounds work throughout stats_arrays, including with hypercubic sampling and with the individual distribution functions described below.
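One common way to implement such bounds is rejection sampling: any draw that falls outside the interval is simply redrawn until it lands inside. This is a sketch of the general technique, not necessarily how stats_arrays implements it internally:

```python
import numpy as np

def bounded_normal(loc, scale, minimum, maximum, size, rng=None):
    """Draw from Normal(loc, scale), redrawing any values that fall
    outside [minimum, maximum]."""
    rng = rng or np.random.default_rng()
    samples = rng.normal(loc, scale, size)
    out_of_bounds = (samples < minimum) | (samples > maximum)
    while out_of_bounds.any():
        samples[out_of_bounds] = rng.normal(loc, scale, out_of_bounds.sum())
        out_of_bounds = (samples < minimum) | (samples > maximum)
    return samples

# The fourth variable above: Normal(6, 5) bounded to [0, 10].
draws = bounded_normal(6, 5, 0, 10, 1000)
```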

## Working with individual distributions

There are a number of functions for each uncertainty distribution, e.g. PDF, CDF, PPF.

In [10]:
ln = UncertaintyBase.from_dicts(
    {'loc': 0.2, 'scale': 0.1, 'uncertainty_type': LognormalUncertainty.id}
)

In [11]:
LognormalUncertainty.statistics(ln)

Out[11]:
{'lower': 0.9999999999999998,
'mean': 1.2275250649631777,
'median': 1.2214027581601699,
'mode': 1.2092495976572515,
'upper': 1.4918246976412706}
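These values follow from the standard lognormal formulas. With mu = 0.2 and sigma = 0.1 (the parameters of the underlying normal), median = exp(mu), mean = exp(mu + sigma^2 / 2), mode = exp(mu - sigma^2), and the reported lower/upper bounds appear to be exp(mu -/+ 2 sigma):

```python
import math

mu, sigma = 0.2, 0.1
median = math.exp(mu)                  # approximately 1.2214
mean = math.exp(mu + sigma ** 2 / 2)   # approximately 1.2275
mode = math.exp(mu - sigma ** 2)       # approximately 1.2092
lower = math.exp(mu - 2 * sigma)       # approximately 1.0000
upper = math.exp(mu + 2 * sigma)       # approximately 1.4918
```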
In [12]:
import matplotlib.pyplot as plt

xs = np.linspace(0.5, 2, 100)
ys = LognormalUncertainty.cdf(ln, xs.reshape((1, 100)))
plt.plot(xs, ys.ravel(), 'g-', lw=2)

Out[12]:
[<matplotlib.lines.Line2D at 0x1091dc050>]
In [13]:
import matplotlib.pyplot as plt

xs = np.linspace(0.5, 2, 100)
xs, ys = LognormalUncertainty.pdf(ln, xs)
plt.plot(xs, ys, 'g-', lw=2)

Out[13]:
[<matplotlib.lines.Line2D at 0x109011d10>]

## Defining new distributions

It's fairly easy to define a new distribution, especially if you only need to sample from it. See the documentation for details on how to completely define a new uncertainty distribution.

In [14]:
import numpy as np

class GammaUncertainty(UncertaintyBase):
    """
    Gamma distribution.

    shape : scalar > 0
        The shape of the gamma distribution.
    scale : scalar > 0, optional
        The scale of the gamma distribution. Default is equal to 1.
    """
    id = 99
    description = "Gamma uncertainty"

    @classmethod
    def validate(cls, params, transform=False):
        return

    @classmethod
    def random_variables(cls, params, size, seeded_random=None,
                         **kwargs):
        if not seeded_random:
            seeded_random = np.random

        data = seeded_random.gamma(
            shape=params['shape'],
            scale=params['scale'],
            size=(size, params.shape[0])).T
        data[params['negative'], :] = -1 * data[params['negative'], :]
        return data


Distributions need to be registered with uncertainty_choices.

In [15]:
uncertainty_choices.add(GammaUncertainty)
print(GammaUncertainty in uncertainty_choices)

True

In [16]:
gamma = UncertaintyBase.from_dicts(
    {'scale': 0.2, 'shape': 0.5, 'uncertainty_type': GammaUncertainty.id}
)
rng = MCRandomNumberGenerator(gamma)
next(rng)

Out[16]:
array([ 0.2103989])