Antonino Ingargiola -- tritemio AT gmail.com
ORCID 0000-0002-9348-1397
twitter @tritemio_sc
Date: April 2016
In this notebook I explore properties of different estimators of "rate" for random events generated by a Poisson process. The initial reason was computing the rate of photon detection events (for example in single-molecule spectroscopy experiments), but the treatment is general and valid for any Poisson process.
Let's assume that $\{d_i\}$ is a set of $n$ samples distributed as $D \sim\operatorname{Exp}(\lambda)$.
Then we can construct $m = n + 1$ random event times $\{t_i\}$ as:
$$t_0 = 0, \quad t_i = \sum_{j=0}^{i-1} d_j \quad\textrm{for}\quad i \geq 1$$These events are stochastic, statistically independent and occur at a fixed rate $\lambda$.
Given $\{t_i\}$ we can trivially compute $\{d_i\}$ as:
$$d_i = t_{i+1} - t_i$$Instead of $\lambda$, we can directly estimate $\tau = 1/\lambda$:
$$\hat{\tau} = \frac{1}{n}\sum_{i=0}^{n-1} d_i = \frac{T_n}{n}$$The estimator $\hat{\tau} \sim \frac{1}{n}\operatorname{Erlang}(n, \lambda)$. The Erlang distribution has mean $n/\lambda$ and variance $n/\lambda^2$.
Note that:
$$\hat{\tau} \sim \frac{1}{n}\operatorname{Erlang}(n, \lambda) = \operatorname{Erlang}(n, n\,\lambda)$$thus, $\hat{\tau}$ has mean $1/\lambda$ and variance $1/(n\,\lambda^2)$, i.e. it is an unbiased estimator of $1/\lambda$.
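We can check the mean and variance of $\hat\tau$ numerically. The values of $\lambda$, $n$ and the number of simulated samples below are arbitrary, illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n, N = 1000.0, 5, 200_000       # illustrative values

# Each row is one sample of n delays; each row gives one estimate τ̂ = T_n / n
d = rng.exponential(scale=1/lam, size=(N, n))
tau_hat = d.mean(axis=1)

print(tau_hat.mean() * lam)          # ≈ 1, since the mean of τ̂ is 1/λ
print(tau_hat.var() * n * lam**2)    # ≈ 1, since the variance of τ̂ is 1/(n λ²)
```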
However, the fact that $\hat{\tau}$ is an unbiased estimator of $1/\lambda$ does not mean that $1/\hat{\tau}$ is an unbiased estimator of $\lambda$. Indeed, as shown in the next section, the unbiased estimator of the rate is:
$$\hat\lambda_u = \frac{n-1}{T_n}$$It can be shown that $\hat{\lambda}_u$ also has the minimum variance among unbiased estimators of the rate, i.e. it is the minimum-variance unbiased estimator (MVUE); see the Lehmann–Scheffé and Rao–Blackwell theorems.
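A quick simulation illustrates both the upward bias of $n/T_n$ and the unbiasedness of $(n-1)/T_n$ (the values of $\lambda$, $n$ and the number of repetitions are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n, N = 1000.0, 5, 200_000       # illustrative values
T = rng.exponential(scale=1/lam, size=(N, n)).sum(axis=1)

print(np.mean(n / T) / lam)          # MLE n/T_n: biased, ≈ n/(n-1) = 1.25
print(np.mean((n - 1) / T) / lam)    # (n-1)/T_n: unbiased, ≈ 1.00
```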
For our sample of $n$ observations from an Exponential distribution, the likelihood function is
$$L(\lambda) = \prod_{i=1}^n f(d_i;\lambda) = \prod_{i=1}^n \lambda\, e^{-\lambda d_i}$$$$\log L(\lambda) = \sum_{i=1}^n \log (\lambda\, e^{-\lambda d_i} ) = n\log(\lambda) - \lambda \sum_{i=1}^n d_i = n\log(\lambda) - \lambda T_n$$Taking the first derivative and setting it to 0 we obtain:
$$\frac{\rm d}{{\rm d}\lambda} \log L(\lambda) = \frac{n}{\lambda} - T_n = 0$$$$\Rightarrow\quad \hat{\lambda}_{MLE} = \frac{n}{T_n} = \frac{n}{\sum_{i=1}^n d_i}$$Given an estimator $\Lambda_n$ (a R.V.), we define the mean square error (MSE) as:
$$\epsilon(\Lambda_n) = \mathrm{E}\{ (\Lambda_n - \lambda)^2 \} = \mathrm{Var}\{ \Lambda_n\} + (\mathrm{E}\{\Lambda_n\} - \lambda)^2$$In general the minimum MSE (MMSE) estimator is defined as:
$$\hat{\lambda}_{MSE} = \operatorname*{arg\,min}_{\Lambda_n} \epsilon(\Lambda_n)$$If $\Lambda_n$ has the form:
$$\Lambda_{n,c} = \frac{n - c}{T_n}$$we need to find the $c$ that minimizes $\epsilon(\Lambda_{n,c})$. It can be shown that the minimum is achieved for $c = 2$.
To derive this result use the expression for $\mathrm{Var}\{ \Lambda_{n,c}\}$ derived in Appendix 2.
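The minimum can also be located numerically by scanning the empirical MSE over a grid of $c$ values. The simulation parameters below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n, N = 1.0, 5, 200_000                      # illustrative values
T = rng.exponential(scale=1/lam, size=(N, n)).sum(axis=1)

cs = np.arange(-1.0, 3.5, 0.25)                  # candidate values of c
mse = [np.mean(((n - c) / T - lam)**2) for c in cs]
c_best = cs[np.argmin(mse)]
print(c_best)                                    # ≈ 2.0
```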
We consider the family of estimators
$$\Lambda_{n,c} = \frac{n - c}{T_n}$$where $T_n = \sum_i d_i \sim \operatorname{Erlang}(n, \lambda)$ and $c < n$. We can write:
$$\textrm{Erlang PDF:}\quad f(x \:|\: n, \lambda) = \frac{\lambda^n x^{n-1} e^{-\lambda x}}{(n-1)!\,}$$$$\mathrm{E}\{\Lambda_{n,c}\} = \int \frac{n - c}{x} \, f(x \:|\: n, \lambda)\, dx = (n - c) \int_0^\infty \frac{1}{x}\,f(x \:|\: n, \lambda)\, dx = (n - c) \int_0^\infty \frac{\lambda^n x^{n-2} e^{-\lambda x}}{(n-1)!\,}\, dx = \frac{(n - c)\lambda^n}{(n-1)!} \int_0^\infty x^{n-2} e^{-\lambda x}\, dx $$Knowing that (see Appendix 1):
$$ \int_0^{+\infty} x^{n-2} e^{-\lambda x}\, dx = \frac{(n-2)!}{\lambda^{n-1}}\qquad\qquad \textrm{(A)}$$we find that:
$$\mathrm{E}\{\Lambda_{n,c}\} = (n - c)\frac{\lambda}{n-1} = \overline{\Lambda}_{n,c}$$Note that for $c = 1$ the mean equals $\lambda$, confirming that $\hat\lambda_u = (n-1)/T_n$ is unbiased, while the MLE ($c = 0$) overestimates the rate by a factor $n/(n-1)$.
Furthermore, from simulations (see below), I found that for $c=1/3$ the estimator has median equal to the rate $\lambda$. This result should not be difficult to prove analytically.
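As a numerical check of the expectation formula, and of the median observation, we can simulate many sets of delays ($\lambda$, $n$ and $N$ below are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(4)
lam, n, N = 1.0, 5, 200_000             # illustrative values
T = rng.exponential(scale=1/lam, size=(N, n)).sum(axis=1)

# Empirical vs. theoretical mean of Λ_{n,c} = (n - c)/T_n
for c in (0, 1, 2):
    print(c, np.mean((n - c) / T), (n - c) * lam / (n - 1))

# For c = 1/3 the median is close to λ
print(np.median((n - 1/3) / T))
```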
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
n = 5 # number of delays
num_times = n + 1 # number of timestamps
λ = 1000 # simulated rate
num_iter = 100 * 1000
np.random.seed(5)
d1 = np.random.exponential(scale=1/λ, size=(num_iter, n))
t = np.cumsum(d1, axis=1)
We generate $n$ delays per iteration, exponentially distributed with a fixed rate. Their cumulative sum (the times) does not have a sharp cut-off:
plt.hist(t.ravel(), bins=50);
The delays are exponential by construction:
plt.hist(d1.ravel(), bins=40, histtype='step');
plt.yscale('log')
plt.title(r'$\{d_i\}\quad$ Mean = %.3e Std.Dev. = %.3e' % (d1.mean(), d1.std()));
Here we compare a set of estimators:
$$\hat{\Lambda}_{n,c} = \frac{n - c}{T_n}$$

kws = dict(bins=np.arange(0, 4, 0.05), histtype='step', lw=1.5, density=True)
print('{:>8} {:>6} {:>8} {:>8} {:>8} {:>8} {:>8}'
      .format('c', 'n', 'Mean', 'Std', 'MSE', 'Median', '% > λ'))

for c in (-1, 0, 1/3, 1, 2, 3):
    r_hc = (n - c) / d1.sum(axis=1) / λ
    r_hc_err_rms = np.sqrt(np.mean((r_hc - 1)**2))
    plt.hist(r_hc, **kws, label='c = %.1f' % c)
    print('%8.2f %6d %8.5f %8.2f%% %8.3f%% %8.3f %8.2f%%' %
          (c, n, r_hc.mean(), r_hc.std()*100, r_hc_err_rms*100, np.median(r_hc),
           (r_hc > 1).sum() * 100 / num_iter))

r_hm = 1 / np.median(d1, axis=1) / λ
r_hm_err_rms = np.sqrt(np.mean((r_hm - 1)**2))
plt.hist(r_hm, **kws, label='median')
print('%8s %6d %8.5f %8.2f%% %8.3f%% %8.3f %8.2f%%' %
      ('median', n, r_hm.mean(), r_hm.std()*100, r_hm_err_rms*100, np.median(r_hm),
       (r_hm > 1).sum() * 100 / num_iter))
plt.xlabel('Normalized Rate')
plt.axvline(1, color='k');
plt.legend()
plt.text(0.35, 0.75, r'$\Lambda_{n,c} = \frac{n - c}{T_n}$',
fontsize=24, transform=plt.gcf().transFigure);
       c      n     Mean      Std      MSE   Median    % > λ
   -1.00      5  1.49712   85.90%  99.247%    1.282   71.34%
    0.00      5  1.24760   71.58%  75.744%    1.068   55.67%
    0.33      5  1.16442   66.81%  68.804%    0.997   49.76%
    1.00      5  0.99808   57.27%  57.267%    0.854   36.93%
    2.00      5  0.74856   42.95%  49.769%    0.641   18.41%
    3.00      5  0.49904   28.63%  57.702%    0.427    5.27%
  median      5  1.92114  181.83% 203.832%    1.440   73.49%
We can observe that:
- for $c = 1$ the estimator is unbiased (normalized mean equal to 1);
- for $c = 2$ the estimator achieves the minimum MSE;
- for $c = 1/3$ the median is equal to the rate $\lambda$;
- the median-based estimator $1/\textrm{median}\{d_i\}$ performs poorly on all metrics.
By contrast, when estimating $\tau = \lambda^{-1}$, the MVUE is the empirical mean of the delays ($\hat{\tau}$). The distribution of $\hat{\tau}$ is shown below:
kws = dict(bins=np.arange(0, 4, 0.05), histtype='step', lw=1.5, density=True)
print('{:>6} {:>8} {:>8} {:>8} {:>8} {:>8}'
.format('n', 'Mean', 'Std', 'Err.RMS', 'Median', '% > 1/λ'))
d_h = np.mean(d1, axis=1) * λ
d_h_err_rms = np.sqrt(np.mean(((d_h - 1)**2)))
plt.hist(d_h, **kws, label = r'$\hat\tau$');
print('%6d %8.5f %8.2f%% %8.3f%% %8.3f %8.2f%%' %
(n, d_h.mean(), d_h.std()*100, d_h_err_rms*100, np.median(d_h),
(d_h > 1).sum() * 100 / num_iter))
plt.legend(fontsize=18)
plt.xlabel('Normalized Delays')
plt.axvline(1, color='k');
     n     Mean      Std  Err.RMS   Median  % > 1/λ
     5  1.00177   44.74%  44.742%    0.936   44.33%
Since both $\hat\tau = T_n / n$ and $\hat\lambda_u = (n-1)/T_n$ are unbiased estimators, their empirical means, computed over $N$ (`num_iter`) sets of $n$ delays, will converge to $\lambda^{-1}$ and $\lambda$ respectively. In symbols:

$$\frac{1}{N}\sum_{k=1}^{N} \hat\tau^{(k)} \xrightarrow[N\to\infty]{} \frac{1}{\lambda}, \qquad \frac{1}{N}\sum_{k=1}^{N} \hat\lambda_u^{(k)} \xrightarrow[N\to\infty]{} \lambda$$
Below we plot the convergence of the expected value, approximated by an empirical average with number of "trials" $N \to \infty$.
Let's start by plotting $1/\textrm{E}[\hat\tau]$ as $N \to \infty$:
rate_t = 1 / ((d1.mean(axis=1)).cumsum() / np.arange(1, num_iter+1)) / λ
plt.plot(rate_t[100:])
plt.ylim(0.98, 1.02)
plt.axhline(1, color='k');
And now, let's plot the same for $\textrm{E}[\hat\lambda_u]$:
rate_t2 = ((n - 1) / d1.sum(axis=1)).cumsum() / np.arange(1, num_iter+1) / λ
plt.plot(rate_t2[100:])
plt.ylim(0.98, 1.02)
plt.axhline(1, color='k');
In both cases there is convergence to $\lambda$.
Here we want to compute the integral:
$$ \int_0^{\infty} x^{n-2} e^{-\lambda x}\, dx $$The $\Gamma$ function is defined as:
$$\Gamma(s) = \int_0^{\infty} t^{s-1}\,e^{-t}\,{\rm d}t$$Therefore, with the variable substitution $t = \lambda x$:
$$ {\rm d}t = \lambda {\rm d} x$$$$\frac{1}{\lambda^{n-2}} \int_0^\infty t^{n-2} e^{-t}\,\frac{{\rm d}t}{\lambda} = \frac{1}{\lambda^{n-1}} \int_0^\infty t^{n-2} e^{-t}\,{\rm d}t = \frac{1}{\lambda^{n-1}} \Gamma(n-1) = \frac{(n-2)!}{\lambda^{n-1}}$$So we obtain:
$$ \int_0^{\infty} x^{n-2} e^{-\lambda x}\, dx = \frac{(n-2)!}{\lambda^{n-1}}\qquad\qquad \textrm{(A)}$$Finally, note that $n$ is the number of delays, while the number of times/events is $m = n+1$.
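The result (A) can be checked numerically with a simple trapezoidal approximation. The values of $\lambda$ and $n$ below are arbitrary illustrative choices (with $n > 2$):

```python
import numpy as np
from math import factorial

lam, n = 2.0, 6                       # illustrative values, n > 2
x = np.linspace(0, 40/lam, 200_001)   # the integrand is negligible beyond ~40/λ
y = x**(n - 2) * np.exp(-lam * x)

# Trapezoidal rule, written out explicitly
numeric = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
exact = factorial(n - 2) / lam**(n - 1)
print(numeric, exact)                 # both ≈ 0.75
```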
We want to compute the variance for the family of rate estimators $\Lambda_{n,c}$ defined as
$$\Lambda_{n,c} = \frac{n-c}{T_n}$$where $T_n = \sum d_i$ is the sum of the $n$ delays, and $c$ is a real parameter with $c < n$.
We start by writing:
$$\mathrm{Var}\{\Lambda_{n,c}\} = \mathrm{E}\left\{(\Lambda_{n,c} - \mathrm{E}\{\Lambda_{n,c}\})^2 \right\} = \int \left( \Lambda_{n,c} - \overline{\Lambda}_{n,c} \right)^2 f(x \:|\: n, \lambda)\, dx = \int \left(\frac{n - c}{x} - (n - c)\frac{\lambda}{n-1} \right)^2 f(x \:|\: n, \lambda)\, dx =$$$$= (n-c)^2 \int \left(\frac{1}{x} - \frac{\lambda}{n-1} \right)^2 f(x \:|\: n, \lambda)\, dx$$For brevity, let's define:
$$\left(\frac{1}{x} - \frac{\lambda}{n-1} \right)^2 = \frac{1}{x^2} + \left(\frac{\lambda}{n-1}\right)^2 - \frac{2\lambda}{x\,(n-1)} = \frac{1}{x^2} + a + \frac{b}{x}, \qquad a = \left(\frac{\lambda}{n-1}\right)^2, \quad b = -\frac{2\lambda}{n-1}$$From term 2:
$$(n-c)^2 \int a \, f(x \:|\: n, \lambda)\, dx = a (n-c)^2 = (n-c)^2\left( \frac{\lambda}{n-1} \right)^2$$From term 1:
$$(n-c)^2 \int \frac{1}{x^2} \, f(x \:|\: n, \lambda)\, dx = (n - c)^2 \int \frac{\lambda^n x^{n-3} e^{-\lambda x}}{(n-1)!\,}\, dx = \frac{(n - c)^2\lambda^n}{(n-1)!} \int x^{n-3} e^{-\lambda x}\, dx = \frac{(n - c)^2\lambda^n}{(n-1)!} \frac{(n-3)!}{\lambda^{n-2}} =$$$$= \frac{(n - c)^2\lambda^2}{(n-1)(n-2)}$$And from term 3:
$$(n-c)^2 \int \frac{b}{x} \, f(x \:|\: n, \lambda)\, dx = b(n - c)^2 \int \frac{\lambda^n x^{n-2} e^{-\lambda x}}{(n-1)!\,}\, dx = \frac{b(n - c)^2\lambda^n}{(n-1)!} \int x^{n-2} e^{-\lambda x}\, dx = \frac{b(n - c)^2\lambda^n}{(n-1)!} \frac{(n-2)!}{\lambda^{n-1}} =$$$$= \frac{b(n - c)^2\lambda}{(n-1)} = -\frac{2\lambda}{n-1} \frac{(n - c)^2\lambda}{(n-1)} = -2 \frac{(n - c)^2\lambda^2}{(n-1)^2}$$Summing these 3 terms we obtain the variance:
$$\mathrm{Var}\{\Lambda_{n,c}\} = \lambda^2(n-c)^2\left[ \frac{1}{(n-1)(n-2)} + \frac{1}{(n-1)^2} - \frac{2}{(n-1)^2}\right] = \lambda^2(n-c)^2\left[\frac{1}{(n-1)(n-2)} - \frac{1}{(n-1)^2}\right] = \frac{\lambda^2(n-c)^2}{(n-1)^2(n-2)}$$
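Summing the three terms gives $\mathrm{Var}\{\Lambda_{n,c}\} = \lambda^2(n-c)^2/\left[(n-1)^2(n-2)\right]$, which we can verify by simulation. The values of $\lambda$, $n$, $c$ and $N$ below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)
lam, n, c, N = 1.0, 6, 1, 300_000    # illustrative values (n > 2)
T = rng.exponential(scale=1/lam, size=(N, n)).sum(axis=1)

var_emp = ((n - c) / T).var()                                # empirical variance
var_theory = lam**2 * (n - c)**2 / ((n - 1)**2 * (n - 2))    # closed-form result
print(var_emp, var_theory)                                   # both ≈ 0.25
```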