%pylab inline
Populating the interactive namespace from numpy and matplotlib
First let's generate some simulated data for our problem. This is how I made the data for you to download.
#Simulation input values
a = 1.0
b = 3.0
sigma = 1.5
s = 0.5
#Choose some observed points on the x axis
M_true = np.random.uniform(2.0, 8.0, size=50)
#Generate the intrinsic scatter
L_true = np.random.normal(a*M_true+b, sigma)
#Add noise from observations
L_obs = np.random.normal(L_true, s)
#Save the data so the students can use it
savetxt("simulated_data.txt", transpose([M_true, L_true]), header="M_i L_i_obs s")
#Plot
errorbar(M_true, L_obs, s, fmt='.')
M_line = arange(0,10,1.0)
plot(M_line, a*M_line+b, 'r-')
xlim(0,10)
ylim(0,15)
xlabel("M")
ylabel("L")
Now let's try to recover the underlying parameters, $a$, $b$, and $\sigma$ - we will assume we know the measurement errors; this is often true. Let's collect our parameters together as a vector to avoid having to keep writing them out: \begin{equation} p \equiv \left( a, b, \sigma\right) \end{equation}
Our model is first that our data points are independent: \begin{equation} P(L^\mathrm{obs}|p) = \prod_i P(L_i^\mathrm{obs}|p) \end{equation} and that the intrinsic scatter and noise are both Gaussian: \begin{equation} L^\mathrm{obs}_i \sim N(L_i, s_i^2) \end{equation} \begin{equation} L_i \sim N(\mu(M_i), \sigma^2) \end{equation} where $\mu(M_i) = aM_i + b$.
So that the likelihood for a single point is: \begin{equation} P(L_i^\mathrm{obs}|p) = \int \mathrm{d}L_i \, P(L_i^\mathrm{obs}|p, L_i) P(L_i|p) \end{equation}
and the total likelihood for all of them is the product of the individual $P(L_i^\mathrm{obs}|p)$, since the data points are independent. It's nearly always easier and more accurate to work with log probabilities, since then you can add them together instead of multiplying small numbers. Since logarithms are monotonically increasing you can also always maximize a log-likelihood to get the same answer as maximizing the likelihood.
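Here's a quick illustration of why this matters in practice (a toy aside, not part of the analysis - the value 1e-4 below is just a made-up per-point likelihood):
#Multiplying many small probabilities underflows double precision to zero,
#but summing their logs is perfectly stable
p_i = np.full(1000, 1e-4)
print(np.prod(p_i))
print(np.sum(np.log(p_i)))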
If you write in the definition of the Gaussians on the integral and complete the square you can integrate out the dependence on $L_i$ analytically. Because this is a very simple example everything is Gaussian and this is possible; often your likelihoods are more complex and you do your integrals numerically. \begin{equation} P(L_i^\mathrm{obs}|p) \propto \frac{1}{\sqrt{s^2+\sigma^2}} \exp\left(-\frac{\left(L_i^\mathrm{obs}-(aM_i+b)\right)^2}{2(s^2+\sigma^2)}\right) \end{equation}
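For the enthusiastic, the key step is the standard result that a Gaussian convolved with a Gaussian is another Gaussian whose variances add (a sketch only; here $N(x;\mu,\sigma^2)$ denotes a Gaussian density in $x$ with mean $\mu$ and variance $\sigma^2$): \begin{equation} \int \mathrm{d}L_i \, N(L_i^\mathrm{obs}; L_i, s^2) \, N(L_i; aM_i+b, \sigma^2) = N(L_i^\mathrm{obs}; aM_i+b, s^2+\sigma^2) \end{equation}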
Let's write a function to compute this likelihood
#One nice feature of python is that we can write this function so that
#it can take an array of values for a, b, or sigma.
def logP_i(L_i_obs, M_i, a, b, sigma, s):
    L_mu = a*M_i+b
    #Note that -log(alpha) - log(s) - log(sigma) = -0.5*log(sigma**2 + s**2),
    #which is the normalization of the marginalized Gaussian above
    alpha = (sigma**-2 + s**-2)**0.5
    logP = -0.5*(L_i_obs-L_mu)**2/(sigma**2+s**2) - np.log(alpha) - np.log(s) - np.log(sigma)
    return logP
#The total likelihood for all our data points
def logP(a, b, sigma, s):
    #Sum the log-likelihoods of the individual points (they are independent),
    #summing over axis 0 so that a, b, or sigma can be arrays of trial values
    L = np.array([logP_i(L_i_obs, M_i, a, b, sigma, s) for (L_i_obs, M_i) in zip(L_obs, M_true)])
    return np.sum(L, 0)
Remember: "real" answers to inference and stats questions are distributions, not single numbers, or even single numbers with error bars. In fact the really real answer is usually a multi-dimensional distribution! But as it's pretty hard to visualize distributions in more than about 2 dimensions we usually have to condense that information down.
The simplest thing you can do is take slices through a distribution. That's what we'll do here - slices in the $a$ and $\sigma$ parameters:
a_trial=arange(0.5, 2.0, 0.001)
plot(a_trial, exp(logP(a_trial, b, sigma, s)))
xlabel("$a$")
ylabel("Likelihood")
sigma_trial=arange(0.5, 2.0, 0.01)
plot(sigma_trial, exp(logP(a, b, sigma_trial, s)))
xlabel("$\sigma$")
ylabel("Likelihood")
As we'd hoped, these are pretty close to the true underlying values ($a=1.0$ and $\sigma=1.5$).
Remember that these are slices in likelihood - no more, no less - they are not constraints we have on the parameters. They are also not the likelihood for a single parameter - they are likelihoods after adding an additional, very strong piece of information (the exact value of the other parameters). When you look at something like this you have to be very careful about what question you're asking, and usually when you want a more useful result you have to marginalize over the other parameters in your space.
In particular, if our slice had been taken through points in the other parameters that were not the truth then we might not expect particularly good answers.
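To get a taste of what marginalizing looks like, here is a rough numerical sketch (not something we need for the rest of the notebook - it assumes flat priors on $b$ and $\sigma$ over the grid ranges chosen below, and it reuses the logP function and the known value of s):
#Numerically marginalize the likelihood for a over a coarse grid of b and sigma values
a_grid = np.arange(0.5, 2.0, 0.01)
marginal_a = np.zeros_like(a_grid)
for b_j in np.arange(2.0, 4.0, 0.1):
    for sigma_k in np.arange(0.5, 3.0, 0.1):
        #logP happily takes an array of a values, so each term here is a whole slice
        marginal_a += np.exp(logP(a_grid, b_j, sigma_k, s))
plot(a_grid, marginal_a/marginal_a.max())
xlabel("$a$")
ylabel("Marginal likelihood (unnormalized)")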
Let's try to visualize this and do something slightly more sophisticated - a 2D plot of the likelihood, on a grid, in the $a$ and $b$ parameters. This is still a slice, but a 2D slice instead of a 1D one.
#This is a handy numpy function to build two 2D grids of the parameter values
a, b = mgrid[0.0:2.0:0.01, 2.0:4.0:0.01]
#The T means transpose - numpy very irritatingly swaps the order around. I'm sure there are reasons.
imshow(exp(logP(a, b, sigma, s)).T, origin='lower', extent=(a.min(), a.max(), b.min(), b.max()), interpolation='nearest')
colorbar()
There is a clear degeneracy here - is it in the direction you would expect, and why?
A few more exercises for the enthusiastic:
- Do the derivation of the analytic form of the likelihood
- Find the maximum-likelihood values of the three parameters
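If you want a starting point for the second exercise, one possible approach (just a sketch, using scipy.optimize, which isn't used elsewhere in this notebook; the starting guess is arbitrary) is to minimize the negative log-likelihood:
from scipy.optimize import minimize

#Negative log-likelihood as a function of (a, b, sigma); s is the known measurement error
def negative_logP(params):
    a, b, sigma = params
    return -logP(a, b, sigma, s)

#Start from a rough guess and keep sigma positive with a bound
result = minimize(negative_logP, x0=[1.5, 2.5, 1.0], method='L-BFGS-B',
                  bounds=[(None, None), (None, None), (1e-3, None)])
print(result.x)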