by Kyle Cranmer, Dec 17, 2015
The correction for 1d look-elsewhere effect (LEE) presented in Trial factors or the look elsewhere effect in high energy physics by Ofer Vitells and Eilam Gross http://arxiv.org/abs/arXiv:1005.1891
It is often claimed that when one has two statistically independent searches for the same particle, then a peak in one tells you where to look for the other, thus eliminating the look-elsewhere effect. These searches might be from two different experiments (eg. ATLAS and CMS), two different decay modes, or two different time perieods (eg. run1 and run2). There are various flaws in this logic, as stressed by Bob Cousins in these slides. This issue quickly becomes subtle as the intuitive procedure of using the location of the excess in one search to inform where to look in the other is not fully specified. A few things can be said about the intuitive procedure
In what follows I will explore the behavior of the LEE correction for the combination of two searches (which can be trivially extended to more than two searches).
The starting point is to consider a search for a new particle with signal strength $\mu$ and unknown mass $\nu$ on top of a background distribution described by some nuisance parameters $\theta$. We perform the search by scanning over the mass $\nu$ and calculating the test statistic
\begin{equation} q(\nu) = -2 \log \frac{ \max_{\theta} L(\mu=0, \nu, \theta)}{ \max_{\mu, \theta} L(\mu, \nu, \theta)} \end{equation}Assuming the background-only is true, $q(\nu)$ is a chi-square random field (with 1 degree of freedom). That means that, for any point $\nu$, the quantity $q(\nu)$ would have a chi-square distribution if you repeated the experiment many times.
The maximum local significance is based on $Z_{local} = \sqrt{q_{max}}$, where $q_{max} = \max_\nu q(\nu)$. The correction from local to global significance is given by:
\begin{equation} p_{global} = p_{local} + N \exp(-(q_{max}-u_0)/2) \end{equation}where $N$ is the average number of upcrossings above level $u_0$ (i.e. times that $q(\nu) > u_0$). This $N$ characterizes the search -- searches with good mass resolution over a large mass range will have large values of $N$, while searches with poor mass resolution will have small values of $N$.
Creating many likelihood scans from pseudo-experiments (toy Monte Carlo) is somewhat time consuming, so here we make realizations of a chi-square random field by using a Gaussian Process. The main trick we will use is that a chi-square distribution for one degree of freedom is the same as the distribution of $x^2$ if $x$ is normally distributed. As you might have guessed, a Gaussian Process (GP) is like a chi-square random field, but it is Gaussian-distributed at each point.
Note, the distributions are not independent at each point, there is some covariance. So if the $q(\nu)$ is high at one point, you can expect it to be high near by. We can control this behavior via the GP's kernel. In particular, $K(\nu, \nu') = Cov[q(\nu)^2, q(\nu')^2]$. We can essentially specify what the mass resolution of our virtual search is via the length scale used in ther kernel.
For more on the theory of Gaussian Processes, the best resource is available for free online: Rasmussen & Williams (2006). We will george
-- a nice python package for Gaussian Processes (GP).
The next major conceptual pilar for this work is the asymptotic approximations for the likelihood ratio. Wilks's theorem states that assuming background-only ($\mu=0$) the distribution of the best fit signal strength $\hat{\mu}$ follows a Gaussian distribution $G(\hat{\mu} | \mu=0, \sigma)$, where $\sigma^2 = \textrm{Var}[\mu]$. With that assumption $q(\mu) = (\hat{\mu}/\sigma)^2$, hence $q(\mu)$ is chi-square distributed. In this way of thinking, the GP is generating results for $(\hat{\mu}/\sigma)$ as a function of the mass parameter $\nu$.
This allows us to quickly do combinations on these toy results. Fundamentally, we wish to perform a likelihood combination at every mass point. This is additive in the log-likelihood ratio: \begin{equation} q_{12}(\nu) = q_1(\nu) + q_2(\nu) + const \end{equation}
This is also a chi-square random field. In the asymptotic limit, the likelihood combination is equivalent to:
\begin{equation} \hat{\mu}_{12}(\nu) = \frac{\hat{\mu}_1(\nu) \sigma_1^{-2}(\nu) +\hat{\mu}_2(\nu)\sigma_2^{-2}(\nu)}{\sigma_1^{-2}(\nu)+\sigma_2^{-2}(\nu)} \end{equation}together with variance
\begin{equation} Var[\hat{\mu}_{12}(\nu)] \equiv\sigma_{12}^2(\nu) = \frac{1}{\sigma_1^{-2}(\nu)+\sigma_2^{-2}(\nu)} \end{equation}The important part here is that we can also work out the kernel for the Gaussian process that describes $(\hat{\mu}_{12}/\sigma_{12})^2(\nu)$. In particular, the covariance between $\nu_1$ and $\nu_2$ of the GP for combination can be derived from the covariance of the GP for searches 1 and 2.
Consider the simple case where the two searches have the same constant sensitivity: $\sigma_1(\nu) = \sigma_2(\nu) = \sigma$. Then $\hat{\mu}_{12}(\nu) = [\hat{\mu}_1(\nu)+\hat{\mu}_2(\nu)]/2$ and $\sigma_{12}^2 = \sigma^2/2$. So the kernel for the GP that describes the combination is given by:
\begin{equation} K_{12}(\nu, \nu') = Cov[(\hat{\mu}_{12}(\nu)/\sigma_{12})^2, (\hat{\mu}_{12}(\nu')/\sigma_{12})^2 = \frac{1}{2} K_{1}(\nu, \nu') + \frac{1}{2} K_{2}(\nu, \nu') \end{equation}Correlarry If two searches have the same mass resolution and statistical power, the effective $N$ needed to calculate the LEE for the combination is the same as the individual searches. (This can be demonstraited with the code below by setting ratio_of_correlations=1.
Note In what follows, I'll demonstrate this for the simple case, but this can be extended to have separate $\sigma(\nu)$ curves for searches 1 and 2.
We will create three different Gaussian Processes:
gp1
will generate results from experiment 1gp2
will generate results from experiment 2gp12
will be shown to have the same behavior as the combination of gp1
and gp2
%pylab inline --no-import-all
#plt.rc('text', usetex=True)
plt.rcParams['figure.figsize'] = (6.0, 6.0)
#plt.rcParams['savefig.dpi'] = 60
Populating the interactive namespace from numpy and matplotlib
import george
from george.kernels import ExpSquaredKernel, My2ExpLEEKernel, MySignificanceKernel
from scipy.stats import chi2, norm
length_scale_of_correaltion=3.
ratio_of_length_scales=10.
kernel1 = ExpSquaredKernel(length_scale_of_correaltion, ndim=1)
kernel2 = ExpSquaredKernel(ratio_of_length_scales**2*length_scale_of_correaltion, ndim=1)
kernel12 = 0.5*kernel1+0.5*kernel2
# Create the Gaussian process
# gp = george.GP(kernel)
gp1 = george.GP(kernel1, solver=george.HODLRSolver) #faster
gp2 = george.GP(kernel2, solver=george.HODLRSolver) #faster
gp12 = george.GP(kernel12, solver=george.HODLRSolver) #faster
a1,b1,a2,b2, dummy = 1.e-2,0.1,-1.e-2,1.1, 1e3
newkernel1 = My2ExpLEEKernel(a1=a1,b1=b1,a2=0.,b2=dummy,\
l1=length_scale_of_correaltion,\
l2=ratio_of_length_scales*length_scale_of_correaltion)
newkernel2 = My2ExpLEEKernel(a1=0.,b1=dummy,a2=a2,b2=b2,\
l1=length_scale_of_correaltion,\
l2=ratio_of_length_scales*length_scale_of_correaltion)
newkernel12 = My2ExpLEEKernel(a1=a1,b1=b1,a2=a2,b2=b2,\
l1=length_scale_of_correaltion,\
l2=ratio_of_length_scales*length_scale_of_correaltion)
#build equivalent combined kernel from basic parts and operations
# this is almost it, but still need to divide by ((1./sig11/sig11 + 1./sig21/sig21))
# and current implementation of MySignificanceKernel should be 1/sig not sig
#sig1kernel = MySignificanceKernel(a=a1,b=b1)
#sig2kernel = MySignificanceKernel(a=a2,b=b2)
#newkernel12 = sig1kernel*kernel1+sig2kernel*kernel2
# made up sensitivity curves
def sigma1(x):
return a1*x+b1
def sigma2(x):
return a2*x+b2
gp1 = george.GP(newkernel1) #,solver=george.HODLRSolver)
gp2 = george.GP(newkernel2) #,solver=george.HODLRSolver)
gp12 = george.GP(newkernel12) #,solver=george.HODLRSolver)
n_scan_points=250
x = np.linspace(0,100,n_scan_points)
# slow part: pre-compute internal stuff for the GP
gp1.compute(x)
gp2.compute(x)
gp12.compute(x)
# evaluate one realization of the GP
z1 = gp1.sample(x)
z2 = gp2.sample(x)
z12 = gp12.sample(x)
# plot the chi-square random field
plt.plot(x,z1, color='red')
plt.plot(x,z2, color='red')
plt.plot(x,z12)
plt.ylabel(r'$z(\nu)$')
plt.xlabel(r'$\nu$')
<matplotlib.text.Text at 0x1127d3450>
Now lets histogram the values of the random field.
Don't get confused here... if you pick a single point and histogram the value of over many instances, you expect a Gaussian. However, for a single instance, you don't expect the histogram for the value of the field to be Gaussian (because of the correlations). Thought experiments: if you make length_scale_of_correaltion
very small, then each point is essentially independent and you do expect to see a Gaussian; however, if length_scale_of_correaltion
is very large then you expect the field to be nearly constant and the histogram below would be a delta function.
def q_to_pvalue(q):
return (1.-chi2.cdf(q, 1))/2 #divide by 2 for 1-sided test
def pvalue_to_significance(p):
return -norm.ppf(p)
def significance_to_pvalue(Z):
return 1.-norm.cdf(Z)
def num_upcrossings(z):
"""count number of times adjacent bins change between 0,1"""
return np.sum((z-np.roll(z,1))**2)/2
def global_pvalue(u,u0, n):
#return (1.-chi2.cdf(u, 1))/2. + np.exp(-(u-u0)/2)*n #1-sided p-value
return (1.-chi2.cdf(u, 1)) + np.exp(-(u-u0)/2)*n # 2-sided p-value
u1 = 0.1
Check the code to count upcrossings and the LEE correction is working
n_samples = 1000
n_plots = 3
plt.figure(figsize=(9,n_plots*3))
z_array = gp1.sample(x,n_samples)
n_up = np.zeros(n_samples)
sig1 = sigma1(x)
sig2 = sigma2(x)
for scan_no, z in enumerate(z_array):
scan = (z/sig1)**2
exc1 = (scan>u1) + 0. #add 0. to convert from bool to double
n_up[scan_no] = num_upcrossings(exc1)
if scan_no < n_plots:
plt.subplot(n_plots,2,2*scan_no+1)
plt.plot(x,scan)
plt.plot([0,100],[u1,u1], c='r')
plt.subplot(n_plots,2,2*scan_no+2)
plt.plot(x,exc1)
plt.ylim(-.1,1.1)
print('experiment %d has %d upcrossings' %(scan_no, n_up[scan_no]))
n_av = np.mean(n_up)
print("average number of upcrossings in %d experiments is %f" %(n_samples, n_av))
experiment 0 has 13 upcrossings experiment 1 has 15 upcrossings experiment 2 has 14 upcrossings average number of upcrossings in 1000 experiments is 14.448000
u = np.linspace(5,25,100)
global_p = global_pvalue(u,u1,n_av)
n_samples = 10000
z_array = gp1.sample(x,n_samples)
q_max = np.zeros(n_samples)
for scan_no, z in enumerate(z_array):
scan = (z/sig1)**2
q_max[scan_no] = np.max(scan)
bins, edges, patches = plt.hist(q_max, bins=30)
icdf = 1.-np.cumsum(bins/n_samples)
icdf = np.hstack((1.,icdf))
icdf_error = np.sqrt(np.cumsum(bins))/n_samples
icdf_error = np.hstack((0.,icdf_error))
plt.xlabel('$q_{max}$')
plt.ylabel('counts / bin')
<matplotlib.text.Text at 0x1136e8fd0>
# plot the p-value
plt.plot(edges,icdf, c='r', label='toys')
plt.errorbar(edges,icdf,yerr=icdf_error)
plt.plot(u, global_p, label='prediction')
plt.xlabel('$u$')
plt.ylabel('$P(q_{max} >u)$')
plt.legend(('prediction','toys'))
#plt.ylabel('P(q>u)')
plt.ylim(1E-3,10)
plt.xlim(0,25)
plt.semilogy()
[]
Wow! that was awesome! Go math!
plt.plot(x,sig1)
plt.plot(x,sig2)
plt.ylim(0,4)
(0, 4)
n_samples = 10000
z_array1 = gp1.sample(x,n_samples)
z_array2 = gp2.sample(x,n_samples)
n_av1, n_av2, n_av12 = 0., 0., 0.
q_max = np.zeros((n_samples,3))
q_10 = np.zeros((n_samples,3))
n_plots = 3
plt.figure(figsize=(9,n_plots*3))
scan_no=0
for z1, z2 in zip(z_array1,z_array2):
scan1 = (z1/sig1)**2
scan2 = (z2/sig2)**2
scan12 = ((z1/sig1**2 + z2/sig2**2 )/(sig1**-2+sig2**-2))**2 # This is where the combination happens
scan12 = scan12*(sig1**-2 + sig2**-2)
if scan_no==0:
print sig1[5], sig2[5], scan1[5], scan2[5], scan12[5]
exc1 = (scan1>u1) + 0. #add 0. to convert from bool to double
exc2 = (scan2>u1) + 0. #add 0. to convert from bool to double
exc12 = (scan12>u1) + 0. #add 0. to convert from bool to double
if scan_no < n_plots:
aspect = 1.
#plt.subplot(n_plots,3,3*scan_no+1)
plt.subplot(n_plots,1,1*scan_no+1)
plt.plot(x,scan1, c='r', label='search 1')
#plt.subplot(n_plots,3,3*scan_no+2)
plt.subplot(n_plots,1,1*scan_no+1)
plt.plot(x,scan2, c='g', label='search 2')
#plt.subplot(n_plots,3,3*scan_no+3)
plt.subplot(n_plots,1,1*scan_no+1)
plt.plot(x,scan12, c='b', label='combined')
plt.legend(('search 1', 'search 2', 'combined'))
q_max[scan_no,:] = [np.max(scan1), np.max(scan2), np.max(scan12)]
q_10[scan_no,:] = [scan1[10],scan2[10], scan12[10]]
#print num_upcrossings(exc1)
n_av1 += 1.*num_upcrossings(exc1)/n_samples
n_av2 += 1.*num_upcrossings(exc2)/n_samples
n_av12 += 1.*num_upcrossings(exc12)/n_samples
scan_no +=1
print "n_av search 1, search 2, combined = ", n_av1, n_av2, n_av12
0.120080321285 1.07991967871 0.0611510142466 2.96467034774 0.00307908455561 n_av search 1, search 2, combined = 14.3972 1.6124 9.4427
#Simple scaling:
print "check simple scailing rule: prediction=%f, observed=%f" %(np.sqrt((n_av1**2+n_av2**2)/2), n_av12)
check simple scailing rule: prediction=10.244003, observed=9.442700
gp12
has the same behavior as the explicit combination of search 1 and search 2.¶z_array12 = gp12.sample(x,n_samples)
q12_max = np.zeros((n_samples))
n_up = np.zeros(n_samples)
for scan_no, z12 in enumerate(z_array12):
scan12 = (z12)**2 * (sig1**-2 + sig2**-2) #divide out the variance of the combined
q12_max[scan_no] = np.max(scan12)
n_up[scan_no] = num_upcrossings((scan12 > u1)+0.)
print("average number of upcrossings for combined GP = %f" %(np.mean(n_up)))
average number of upcrossings for combined GP = 9.568000
Compare $q_{max}$ distribution from direct combination with the prediction from gp12
bins, edges, patches = plt.hist(q_max[:,2], bins=50, alpha=0.1, color='r', label='explicit combination')
bins, edges, patches = plt.hist(q12_max, bins=edges, alpha=0.1, color='b', label='predicted')
plt.ylabel('counts/bin')
plt.xlabel('$q_{max}$')
plt.legend(('explicit combination', 'predicted'))
<matplotlib.legend.Legend at 0x113610a10>
u = np.linspace(5,25,100)
global_p = global_pvalue(u,u1,np.mean(n_up))
icdf = 1.-np.cumsum(bins/n_samples)
icdf = np.hstack((1.,icdf))
icdf_error = np.sqrt(np.cumsum(bins))/n_samples
icdf_error = np.hstack((0.,icdf_error))
plt.plot(edges,icdf, c='r', label='toys')
plt.errorbar(edges,icdf,yerr=icdf_error)
plt.plot(u, global_p, label='prediction')
plt.xlabel('$u$')
plt.ylabel('$P(q_{max} >u)$')
plt.legend(('prediction','toys'))
#plt.ylabel('P(q>u)')
plt.ylim(1E-3,10)
plt.xlim(0,25)
plt.semilogy()
[]
Bingo!