Monte Carlo sampling methods

This notebook is an element of the risk-engineering.org courseware. It can be distributed under the terms of the Creative Commons Attribution-ShareAlike licence.

Author: Eric Marsden [email protected]


This notebook contains an introduction to different sampling methods in Monte Carlo analysis: standard random sampling, latin hypercube sampling, and low-discrepancy sequences such as those of Sobol' and Halton. The notebook shows how to use Python with the SciPy and SymPy libraries. See the lecture slides at risk-engineering.org for more background information on stochastic simulation.

In [1]:
import math
import numpy
import scipy.stats

Let's start with a simple integration problem in 1D,

$\int_1^5 x^2\, dx$

This is easy to solve analytically, and we can use the SymPy library in case you have forgotten how to work out simple integrals by hand.

In [2]:
import sympy
x = sympy.Symbol('x')
i = sympy.integrate(x**2)
float(i.subs(x, 5) - i.subs(x, 1))
Out[2]:
41.333333333333336

We can estimate this integral using a standard Monte Carlo method, based on the fact that the expectation of a function of a random variable $X$ distributed uniformly over an interval $I$ is related to the integral of that function:

$\mathbb{E}[f(X)] = \frac{1}{|I|} \int_I f(x)\, dx$

We sample a large number $N$ of points uniformly in $I$, calculate the average of $f$ over these points, and multiply by $|I|$, the length of the range over which we are integrating.

In [3]:
N = 10000
accum = 0
for i in range(N):
    # draw a point uniformly in the integration interval
    x = numpy.random.uniform(1, 5)
    accum += x**2
# length of the interval over which we are integrating
volume = 5 - 1
volume * accum / float(N)
Out[3]:
41.32694614193652

If we increase $N$, the estimate converges to the analytical value.
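We can check this empirically, and also write the estimator more idiomatically by vectorizing it with NumPy. The sketch below prints the estimate and its absolute error for increasing values of $N$; the exact numbers will vary from run to run, but the error should shrink roughly like $1/\sqrt{N}$, the usual Monte Carlo convergence rate.

exact = 41.333333333333336
for n in [10**2, 10**3, 10**4, 10**5, 10**6]:
    # vectorized Monte Carlo estimate: mean of x^2 over n uniform
    # samples, multiplied by the length of the interval
    x = numpy.random.uniform(1, 5, n)
    estimate = (5 - 1) * (x**2).mean()
    print(n, estimate, abs(estimate - exact))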

Latin hypercube sampling

The LHS method consists of dividing the range of each input variable into $N$ equiprobable intervals, then drawing one random sample from each interval. We can use it conveniently in Python thanks to the pyDOE library, which you will probably need to install on your computer, using a command such as

pip install pyDOE

The lhs function in this library returns an "experimental design" consisting of points in the $[0, 1]^d$ hypercube, where $d$ is the dimension of your problem (here $d$ is 1). You need to scale these points to your input domain.
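Before using it for the integral, here is a small sketch illustrating the stratification property: with 10 samples in one dimension, lhs places exactly one point in each interval $[k/10, (k+1)/10)$ (the specific values within each interval are random).

from pyDOE import lhs

sample = lhs(1, 10).flatten()
print(numpy.sort(sample))
# the interval indices are a permutation of 0..9: one point per stratum
print(numpy.sort(numpy.floor(sample * 10).astype(int)))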

In [4]:
# obtain the pyDOE library from http://pythonhosted.org/pyDOE/
from pyDOE import lhs

seq = lhs(1, N)
accum = 0
for i in range(N):
    # scale the point from [0, 1] to the integration interval [1, 5]
    x = 1 + seq[i][0] * (5 - 1)
    accum += x**2
volume = 5 - 1
volume * accum / float(N)
Out[4]:
41.33332460017084

Note that the error in this estimate is significantly lower than that obtained using standard Monte Carlo sampling (and if you repeat this experiment many times, you should find that this is true in most cases).
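To make that comparison concrete, a sketch along the following lines estimates the root-mean-square error of each method over repeated trials (a smaller sample size keeps the runtime modest; the exact figures will vary, but the LHS error should be noticeably smaller):

trials = 100
n = 1000
exact = 41.333333333333336
err_mc = numpy.empty(trials)
err_lhs = numpy.empty(trials)
for t in range(trials):
    # standard Monte Carlo estimate of the integral
    x = numpy.random.uniform(1, 5, n)
    err_mc[t] = (5 - 1) * (x**2).mean() - exact
    # latin hypercube estimate, scaled from [0, 1] to [1, 5]
    x = 1 + lhs(1, n).flatten() * (5 - 1)
    err_lhs[t] = (5 - 1) * (x**2).mean() - exact
print("RMS error, standard MC:", numpy.sqrt((err_mc**2).mean()))
print("RMS error, LHS:", numpy.sqrt((err_lhs**2).mean()))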

Halton’s low-discrepancy sequences

A Halton sequence is deterministic rather than random: the coordinate of the $n$-th point in dimension $i$ is the radical inverse of $n$ in base $p_i$, where $p_1, p_2, p_3, \ldots$ are the first prime numbers (2, 3, 5, and so on). The resulting points cover the unit hypercube more evenly than pseudo-random points, which is what "low discrepancy" means.

In [5]:
# from https://mail.scipy.org/pipermail/scipy-user/2013-June/034744.html
def halton(dim, nbpts):
    h = numpy.empty(nbpts * dim)
    h.fill(numpy.nan)
    p = numpy.empty(nbpts)
    p.fill(numpy.nan)
    # one prime base per dimension
    P = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
    lognbpts = math.log(nbpts + 1)
    for i in range(dim):
        b = P[i]
        # number of base-b digits needed to represent nbpts
        n = int(math.ceil(lognbpts / math.log(b)))
        # precompute the negative powers b^-1, b^-2, ..., b^-n
        for t in range(n):
            p[t] = pow(b, -(t + 1))

        for j in range(nbpts):
            # radical inverse of j+1: mirror its base-b digits
            # around the radix point
            d = j + 1
            sum_ = math.fmod(d, b) * p[0]
            for t in range(1, n):
                d = math.floor(d / b)
                sum_ += math.fmod(d, b) * p[t]

            h[j*dim + i] = sum_
    return h.reshape(nbpts, dim)
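As a quick sanity check, the first few points follow the familiar radical-inverse pattern in bases 2 and 3 (the values are exact fractions, shown here rounded):

print(halton(2, 5))
# expected (base 2 in the first column, base 3 in the second):
# [[ 0.5     0.3333]
#  [ 0.25    0.6667]
#  [ 0.75    0.1111]
#  [ 0.125   0.4444]
#  [ 0.625   0.7778]]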
In [6]:
seq = halton(1, N)
accum = 0
for i in range(N):
    # scale the point from [0, 1] to the integration interval [1, 5]
    x = 1 + seq[i][0] * (5 - 1)
    accum += x**2
volume = 5 - 1
volume * accum / float(N)
Out[6]:
41.317080778527263
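The introduction also mentioned Sobol' sequences. If your SciPy installation is recent enough (version 1.7 or later, which is an assumption about your setup), the scipy.stats.qmc module provides a Sobol' generator that we can apply to the same integral:

from scipy.stats import qmc

sobol = qmc.Sobol(d=1, scramble=False)
# Sobol' points are balanced in powers of two; 2**13 = 8192 points
seq = sobol.random_base2(m=13)
x = 1 + seq[:, 0] * (5 - 1)     # scale from [0, 1] to [1, 5]
print((5 - 1) * (x**2).mean())  # should be close to 41.3333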

A higher-dimensional integral

Let us now analyze an integration problem in dimension 3, involving the Ishigami function. This is a well-known test function in uncertainty and sensitivity analysis, because it is strongly non-linear and non-monotonic.

In [7]:
def ishigami(x1, x2, x3):
    # Ishigami function with the commonly used parameters a=7, b=0.1
    return numpy.sin(x1) + 7*numpy.sin(x2)**2 + 0.1 * x3**4 * numpy.sin(x1)

We want to evaluate the integral of this function over $[-\pi, \pi]^3$. We start by solving the problem analytically, using SymPy.

In [8]:
x1 = sympy.Symbol('x1')
x2 = sympy.Symbol('x2')
x3 = sympy.Symbol('x3')
expr = sympy.sin(x1) + 7*sympy.sin(x2)**2 + 0.1 * x3**4 * sympy.sin(x1)
result = sympy.integrate(expr,
                         (x1, -sympy.pi, sympy.pi),
                         (x2, -sympy.pi, sympy.pi),
                         (x3, -sympy.pi, sympy.pi))
float(result)
Out[8]:
868.175747048395

The exact value is $28\pi^3 \approx 868.18$: the terms involving $\sin(x_1)$ integrate to zero by symmetry, and the remaining term contributes $7\pi \times (2\pi)^2 = 28\pi^3$. Now the result from a standard Monte Carlo sampling method:

In [9]:
N = 10000
accum = 0
for i in range(N):
    # draw a point uniformly in [-pi, pi]^3
    xx1 = numpy.random.uniform(-numpy.pi, numpy.pi)
    xx2 = numpy.random.uniform(-numpy.pi, numpy.pi)
    xx3 = numpy.random.uniform(-numpy.pi, numpy.pi)
    accum += ishigami(xx1, xx2, xx3)
volume = (2 * numpy.pi)**3
volume * accum / float(N)
Out[9]:
875.91412330323476

Using latin hypercube sampling:

In [10]:
seq = lhs(3, N)
accum = 0
for i in range(N):
    # scale the point from [0, 1]^3 to [-pi, pi]^3
    xx1 = -numpy.pi + seq[i][0] * numpy.pi * 2
    xx2 = -numpy.pi + seq[i][1] * numpy.pi * 2
    xx3 = -numpy.pi + seq[i][2] * numpy.pi * 2
    accum += ishigami(xx1, xx2, xx3)
volume = (2 * numpy.pi)**3
volume * accum / float(N)
Out[10]:
869.50363368623914

A low-discrepancy Halton sequence:

In [11]:
seq = halton(3, N)
accum = 0
for i in range(N):
    # scale the point from [0, 1]^3 to [-pi, pi]^3
    xx1 = -numpy.pi + seq[i][0] * numpy.pi * 2
    xx2 = -numpy.pi + seq[i][1] * numpy.pi * 2
    xx3 = -numpy.pi + seq[i][2] * numpy.pi * 2
    accum += ishigami(xx1, xx2, xx3)
volume = (2 * numpy.pi)**3
volume * accum / float(N)
Out[11]:
868.23892803059198
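And, for completeness, a sketch using the same Sobol' generator from scipy.stats.qmc (again assuming SciPy 1.7 or later) applied to the Ishigami integral:

from scipy.stats import qmc

sobol = qmc.Sobol(d=3, scramble=False)
seq = sobol.random_base2(m=13)   # 8192 quasi-random points in [0, 1]^3
# scale the points to the integration domain [-pi, pi]^3
pts = qmc.scale(seq, [-numpy.pi]*3, [numpy.pi]*3)
volume = (2 * numpy.pi)**3
# should be close to the analytical value 868.18
print(volume * ishigami(pts[:, 0], pts[:, 1], pts[:, 2]).mean())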