stats_arrays demonstration

The stats_arrays library is designed with five main objectives:

  • Want a consistent interface to SciPy and NumPy statistical function
  • Want to be able to quickly load and save many parameter uncertainty distribution definitions in a portable format
  • Want to manipulate and switch parameter uncertainty distributions and variables
  • Want simple Monte Carlo random number generators that return a vector of parameter values to be fed into uncertainty or sensitivity analysis
  • Want something simple, extensible, documented and tested

Find out more:

Defining arrays

Let's create a simple array with two uncertain parameters from different uncertainty distributions:

In [1]:
from stats_arrays import *
import numpy as np
my_variables = UncertaintyBase.from_dicts(
    {'loc': 2, 'scale': 0.5, 'uncertainty_type': NormalUncertainty.id},
    {'loc': 1.5, 'minimum': 0, 'maximum': 10, 'uncertainty_type': TriangularUncertainty.id}
)
my_variables
Out[1]:
array([(2.0, 0.5, nan, nan, nan, False, 3),
       (1.5, nan, nan, 0.0, 10.0, False, 5)], 
      dtype=[('loc', '<f8'), ('scale', '<f8'), ('shape', '<f8'), ('minimum', '<f8'), ('maximum', '<f8'), ('negative', '?'), ('uncertainty_type', 'u1')])

The returned array is a normal NumPy array, and is not wrapped in a class or provided with special functionality. We want the arrays to be quite simple and easy to save and load.

In [2]:
type(my_variables)
Out[2]:
numpy.ndarray

One obvious thing that we want to do with uncertain parameters is to sample from their distributions.

In [3]:
my_rng = MCRandomNumberGenerator(my_variables)
for x in range(5):
    print my_rng.next()
[ 1.97593555  0.8728425 ]
[ 1.53794138  3.00478287]
[ 2.08384221  3.87324292]
[ 1.20813909  3.58180859]
[ 2.2107861   2.21406803]

The Monte Carlo RNG can also be an iterator:

In [4]:
zip(my_rng, xrange(5))
Out[4]:
[(array([ 1.66232583,  0.95309365]), 0),
 (array([ 1.93255217,  0.74483594]), 1),
 (array([ 2.22056889,  2.41335127]), 2),
 (array([ 2.2502121 ,  2.29962017]), 3),
 (array([ 1.33087407,  7.75322672]), 4)]

Hypercubic sampling

We can also sample from a hypercube of parameter values.

Note: that this is not technically Latin sampling, as there is no guarantee that each value is only sampled once

In [5]:
lhc = LatinHypercubeRNG(my_variables)
lhc.hypercube
Out[5]:
array([[ 1.33241113,  1.54577107,  1.69770733,  1.82562215,  1.94290735,
         2.05709265,  2.17437785,  2.30229267,  2.45422893,  2.66758887],
       [ 1.16774842,  1.66060826,  2.13754607,  2.64534779,  3.19091516,
         3.78418439,  4.44040551,  5.18524994,  6.06877303,  7.22020275]])
In [6]:
lhc.next()
Out[6]:
array([ 1.54577107,  1.66060826])

Adding bounds

We can use the power of NumPy arrays to do some neat things. First, let's get a new array with just Gaussian-distributed variables.

In [7]:
my_variables = UncertaintyBase.from_dicts(
    {'loc': 3, 'scale': 0.5, 'uncertainty_type': NormalUncertainty.id},
    {'loc': 4, 'scale': 1, 'uncertainty_type': NormalUncertainty.id},
    {'loc': 5, 'scale': 2.5, 'uncertainty_type': NormalUncertainty.id},
    {'loc': 6, 'scale': 5, 'uncertainty_type': NormalUncertainty.id},
)

We can then add bounds. Let's exclude values less than zero, and values more than 10 for variables who standard deviation is greater than 2.

In [8]:
my_variables['minimum'] = 0.
mask = my_variables['scale'] > 2
my_variables['maximum'][mask] = 10
my_variables
Out[8]:
array([(3.0, 0.5, nan, 0.0, nan, False, 3),
       (4.0, 1.0, nan, 0.0, nan, False, 3),
       (5.0, 2.5, nan, 0.0, 10.0, False, 3),
       (6.0, 5.0, nan, 0.0, 10.0, False, 3)], 
      dtype=[('loc', '<f8'), ('scale', '<f8'), ('shape', '<f8'), ('minimum', '<f8'), ('maximum', '<f8'), ('negative', '?'), ('uncertainty_type', 'u1')])
In [9]:
my_rng = MCRandomNumberGenerator(my_variables)
some_samples = np.array([my_rng.next() for x in xrange(1000)]).ravel()
print (some_samples < 0).sum()
0

Bounds work throughout stats_arrays, including with hypercubic sampling and with the individual distribution functions described below.

Working with individual distributions

There are a number of functions for each uncertainty distribution, e.g. PDF, CDF, PPF.

In [10]:
ln = UncertaintyBase.from_dicts(
    {'loc': 0.2, 'scale': 0.1, 'uncertainty_type': LognormalUncertainty.id}
)
In [11]:
LognormalUncertainty.statistics(ln)
Out[11]:
{'lower': 0.9999999999999998,
 'mean': 1.2275250649631777,
 'median': 1.2214027581601699,
 'mode': 1.2092495976572515,
 'upper': 1.4918246976412706}
In [12]:
xs = np.linspace(0.5, 2, 100)
ys = LognormalUncertainty.cdf(ln, xs.reshape((1, 100)))
plot(xs, ys.ravel(), 'g-', lw=2)
Out[12]:
[<matplotlib.lines.Line2D at 0x1091dc050>]
In [13]:
xs = np.linspace(0.5, 2, 100)
xs, ys = LognormalUncertainty.pdf(ln, xs)
plot(xs, ys, 'g-', lw=2)
Out[13]:
[<matplotlib.lines.Line2D at 0x109011d10>]

Defining new distributions

It's pretty easy to define a new distribution, especially if you only want to sample from it. See the documentation for the details for how to completely define a new uncertainty distribution.

In [14]:
import numpy as np


class GammaUncertainty(UncertaintyBase):
    """
Gamma distribution.

shape : scalar > 0
    The shape of the gamma distribution.
scale : scalar > 0, optional
    The scale of the gamma distribution.  Default is equal to 1.
    """
    id = 99
    description = "Gamma uncertainty"

    @classmethod
    def validate(cls, params, transform=False):
        return

    @classmethod
    def random_variables(cls, params, size, seeded_random=None,
                         **kwargs):
        if not seeded_random:
            seeded_random = np.random

        data = seeded_random.gamma(
            shape=params['shape'],
            scale=params['scale'],
            size=(size, params.shape[0])).T
        data[params['negative'], :] = -1 * data[params['negative'], :]
        return data

Distributions need to be registered with uncertainty_choices.

In [15]:
uncertainty_choices.add(GammaUncertainty)
print GammaUncertainty in uncertainty_choices
True
In [16]:
gamma = UncertaintyBase.from_dicts(
    {'scale': 0.2, 'shape': 0.5, 'uncertainty_type': GammaUncertainty.id}
)
rng = MCRandomNumberGenerator(gamma)
rng.next()
Out[16]:
array([ 0.2103989])