Today we are going to simulate stochastic processes (time-dependent random data) and fill a NumPy array with their discrete analogs (random walks).
Stochastic processes are random processes that evolve with time in a "stochastic" (random) fashion. Examples include stock prices, the position of a pollen grain in water, the amount of cash in your wallet during a night at a Las Vegas casino, and acoustic data from an earthquake simulation.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import random # this is the non-vectorized random module
# some examples here
type(1*True) # what just happened? bool is promoted to int in arithmetic
type(True+True) # True + True evaluates to 2, an int
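The answers above follow from a fact worth making explicit: in Python, `bool` is a subclass of `int`, so `True` behaves as `1` and `False` as `0` in arithmetic. This is the same trick used later to turn arrays of comparison results into gains of $\pm 1$. A minimal check:

```python
# bool is a subclass of int: True == 1 and False == 0,
# so arithmetic on booleans silently produces ints
print(isinstance(True, int))  # True
print(1 * True)               # 1
print(True + True)            # 2
```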
Suppose we want to simulate the following stochastic process.
Consider the following game: we start at time $t_0 = 0$, and at each subsequent $t_i = i$ ($i = 1, 2, \dots$) we flip a coin. If the coin lands on heads, we win $\$ 1$; otherwise we lose $\$ 1$. Let $M_i$ denote the money (in $\$ $) in our wallet at $t_i$, with $M_0 = 0$ (when the amount is $<0$, we owe money to the dealer).
We want to model how $M_i$ evolves over 1000 steps.
def coingame(num_flips):
    money = np.zeros(num_flips+1)
    for i in range(1, num_flips+1):
        if random.random() > 0.5:
            money[i] = money[i-1] + 1
        else:
            money[i] = money[i-1] - 1
    return money
money = coingame(1000)
plt.plot(money[:300])
# use np.cumsum()
def coingame_vec(num_flips):
    coinflip = np.random.random(num_flips)
    # a nice trick to convert booleans (logical 1s and 0s) to floats;
    # '>=' ensures a draw of exactly 0.5 counts as a loss, not a zero step
    gain = 1.0*(coinflip < 0.5) - 1.0*(coinflip >= 0.5)
    # method 2 (fair coin):
    # gain = np.random.choice([-1.0, 1.0], size=num_flips, p=[0.5, 0.5])
    money = np.cumsum(gain)
    return money
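As a quick sanity check, here is a self-contained sketch of the vectorized walk (the `>=` in the second term is an assumption so that a draw of exactly 0.5 still counts as a loss rather than a zero-gain step), verifying that every step changes the balance by exactly one dollar:

```python
import numpy as np

def coingame_vec(num_flips):
    coinflip = np.random.random(num_flips)
    # '>=' so that a draw of exactly 0.5 counts as a loss
    gain = 1.0 * (coinflip < 0.5) - 1.0 * (coinflip >= 0.5)
    return np.cumsum(gain)

money = coingame_vec(1000)
steps = np.diff(money)
print(np.all(np.abs(steps) == 1.0))  # every step moves the balance by exactly 1
print(abs(money[0]) == 1.0)          # the first entry is the outcome of flip 1
```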
# plot a few more simulation here
for i in range(10):
    money = coingame_vec(300)
    plt.plot(money)
num_steps = 300 # max length of a simulated random walk
N = 1000 # number of simulations
simulations = np.zeros([N,num_steps])
for i in range(N):
    simulations[i,:] = coingame_vec(num_steps) # each row is one simulation
for j in range(100):
    plt.plot(simulations[j,:])
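Before plotting, we can sanity-check the ensemble numerically. The sketch below (variable names and the seed are illustrative) uses the fact that at time $t$ a walk is the sum of $t$ independent $\pm 1$ steps, so across the simulations its standard deviation should be close to $\sqrt{t}$:

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed
N, num_steps = 1000, 300
steps = rng.choice([-1.0, 1.0], size=(N, num_steps))
walks = np.cumsum(steps, axis=1)  # each row is one random walk

# at time t a walk is the sum of t independent +/-1 steps, so across the
# ensemble the mean is 0 and the standard deviation is close to sqrt(t)
final_std = walks[:, -1].std()
print(final_std, np.sqrt(num_steps))
```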
Let's say we have a probability distribution that gets us some random $X$'s. We can then get a new probability distribution that takes a bunch of random $X$'s and gives the average:
$$Y_n = \frac{X_1 + X_2 + \dots + X_n}{n}.$$
How are the $Y_n$'s distributed?
For example, let each $X_i$ be the result of the $i$-th flip: if it's heads, I win 1 dollar, so $X_i = 1$; if it's tails, I lose 1 dollar, so $X_i = -1$. If I play this flipping game $n$ times, my average winning per game is $Y_n$. It's pretty clear that on average I should break even, but how likely is my average winning per game to be large? What's the distribution of my per-game winnings if I play 100 games of coin-flipping?
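As a concrete version of that question, the sketch below (sample sizes and seed are arbitrary choices) simulates many 100-flip games and measures how often the per-game average exceeds 0.2, which is two standard deviations of $Y_{100}$:

```python
import numpy as np

rng = np.random.default_rng(42)  # illustrative seed
n, trials = 100, 50_000
# each row is one 100-flip game; Y is the average winning per flip in that game
Y = rng.choice([-1, 1], size=(trials, n)).mean(axis=1)

# Y has mean 0 and standard deviation 1/sqrt(100) = 0.1, so an average above
# 0.2 per game (two standard deviations) should occur only around 2% of the time
print(Y.mean())
print((Y > 0.2).mean())
```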
Let's take 1000 $Y_1$'s, 1000 $Y_2$'s, 1000 $Y_3$'s, ... and see what we get. Let's start by plotting the mean of each one.
Let's design our experiment a little more carefully so that we can get the most information:
# This is the histogram fitting function hist_and_fit:
# plot a random variable's histogram, then fit it with a normal density.
from math import sqrt, pi, e
def hist_and_fit(X, num_bins=20):
    # sample mean and standard deviation
    mu = np.mean(X)
    sigma = np.std(X)
    Z = np.linspace(-1, 1, 300)
    plt.hist(X, num_bins, density=True, edgecolor='black')
    gaussian_func = lambda mu, sigma: lambda x: 1/(sqrt(2*pi)*sigma) * e**(-0.5*(x - mu)*(x - mu)/(sigma*sigma))
    plt.plot(Z, gaussian_func(mu, sigma)(Z))
np.random.seed(12345)
N = 1000 # number of sampling experiments
num_steps = 200 # each experiment has 200 steps
means = np.zeros(num_steps) # initialization
stdevs = np.zeros(num_steps) # initialization
for n in range(1, num_steps):
    # Y_n holds the average winning over n flips for each of the N simulations
    Y_n = np.zeros(N)
    for i in range(N):
        Y_n[i] = np.sum(np.random.choice([-1, 1], size=n)) / n  # average gain over n flips
    means[n] = np.mean(Y_n)
    stdevs[n] = np.std(Y_n)
    if n % 40 == 0:
        plt.figure()
        plt.title("n=" + str(n) + ", mean=" + str(means[n])[:10] + ", stdev=" + str(stdevs[n])[:5])
        hist_and_fit(Y_n, 20)
        plt.axis([-1, 1, 0, 8])
        plt.show()
The following code vectorizes the cell above: the double `for` loop over simulations and flips is replaced by a single `np.cumsum` over a matrix of flips.
# vectorized version of the code above
## YY_n holds the running average gain for all N simulations
# YY_n[i, n-1] is the average gain over the first n flips in the i-th simulation
np.random.seed(12345)
flips = np.random.choice([-1, 1], size=(N, num_steps))
YY_n = np.cumsum(flips, axis=1) / range(1, num_steps+1)  # broadcasting: column k is divided by k+1
means = np.mean(YY_n, axis=0)
stdevs = np.std(YY_n, axis=0)
## plot
for n in range(1, num_steps):
    if n % 40 == 0:
        # column n-1 of YY_n holds the average gain over the first n flips
        plt.figure()
        plt.title("n=" + str(n) + ", mean=" + str(means[n-1])[:10] + ", stdev=" + str(stdevs[n-1])[:5])
        hist_and_fit(YY_n[:, n-1], 10)
        plt.axis([-1, 1, 0, 8])
        plt.show()
If we take $n \rightarrow \infty$...
If we have $X_1, \dots, X_n$ drawn from the same probability distribution as $X$, and look at the probability distribution of the average
$$Y_n = \frac{X_1 + X_2 + \dots + X_n}{n},$$
then, as $n$ goes to infinity, $Y_n$ converges to $\mu$, the mean of $X$, with probability 1:
$$P\Big(\lim_{n\rightarrow \infty} Y_n = \mu \Big) = 1.$$
Remark: this holds regardless of the probability density of $X$, as long as the mean exists.
plt.plot(range(1, num_steps+1), means)
plt.plot(range(1, num_steps+1), stdevs)
plt.figure(figsize=(12,5))
plt.plot(range(1, num_steps+1), stdevs, 'bo-', markersize=3)
plt.plot(range(1, num_steps+1), 1 / np.sqrt(np.arange(1, num_steps+1)), 'r', linewidth=2)
So as $n\rightarrow \infty$, the mean of the $n$-th average goes to $0$ (the mean of the original distribution) and the standard deviation behaves like $\frac{1}{\sqrt{n}}$ (here $\sigma = 1$ for a fair $\pm 1$ flip).
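We can check the $\frac{1}{\sqrt{n}}$ scaling directly for a few values of $n$ (a sketch; the seed, variable names, and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)  # illustrative seed
reps = 20_000
for n in [25, 100, 400]:
    # average gain over n fair +/-1 flips, repeated reps times
    avgs = rng.choice([-1.0, 1.0], size=(reps, n)).mean(axis=1)
    # sigma = 1 for a fair +/-1 flip, so std(avgs) should be close to 1/sqrt(n)
    print(n, avgs.std(), 1 / np.sqrt(n))
```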
Let $X_1, X_2, \dots$ come from a probability distribution with mean $\mu$ and standard deviation $\sigma$, and let
$$Y_n = \frac{X_1 + X_2 + \dots + X_n}{n}$$
be their average. As $n \rightarrow \infty$, the distribution of $Y_n$ approaches the normal distribution
$$N\left(\mu, \Big(\frac{\sigma}{\sqrt{n}}\Big)^2\right)$$
with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$.
Note that this is independent of the shape of the original distribution of $X$. Whenever many independent draws are averaged, the result is close to a normal distribution; that is why the normal distribution shows up everywhere.
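To see that the original shape really doesn't matter, here is a sketch (seed and sample sizes are arbitrary) using a heavily skewed distribution, the exponential with mean 1 and skewness 2: the skewness of the $n$-draw averages drops toward 0 like $2/\sqrt{n}$, as the CLT predicts.

```python
import numpy as np

rng = np.random.default_rng(7)  # illustrative seed
n, reps = 200, 20_000
# X is exponential with mean 1: heavily skewed (skewness 2), mean 1, std 1
Y = rng.exponential(1.0, size=(reps, n)).mean(axis=1)

# CLT: Y ~ N(1, (1/sqrt(200))^2) despite the skew of X; the skewness of the
# averages should be about 2/sqrt(200) ~ 0.14, far below the original 2
centered = Y - Y.mean()
skew = np.mean(centered**3) / Y.std()**3
print(Y.mean(), Y.std(), skew)
```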
Now suppose the coin is biased, landing on heads with probability $p = 0.6$. Repeat the simulations above using
np.random.choice([-1, 1], size=num_steps, p=[0.4, 0.6])
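As a sketch of what to expect (the probabilities are from the text; the seed and sample sizes are arbitrary): a biased flip has mean $0.6 - 0.4 = 0.2$ and standard deviation $2\sqrt{0.6 \cdot 0.4} \approx 0.98$, so the law of large numbers pulls $Y_n$ toward $0.2$ and the CLT gives a spread of about $0.98/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(12345)  # illustrative seed
reps, n = 10_000, 200
# biased coin: +1 with probability 0.6, -1 with probability 0.4
flips = rng.choice([-1.0, 1.0], size=(reps, n), p=[0.4, 0.6])
Y_n = flips.mean(axis=1)

# per-flip mean is 0.2 and per-flip std is 2*sqrt(0.6*0.4) ~ 0.98, so Y_n
# concentrates at 0.2 with standard deviation about 0.98/sqrt(200) ~ 0.07
print(Y_n.mean(), Y_n.std())
```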