Kyle Cranmer, Nov 19, 2015
Based on Estimating the significance of a signal in a multi-dimensional search by Ofer Vitells and Eilam Gross http://arxiv.org/pdf/1105.4355v1.pdf
This is for the special case of a likelihood function of the form $L(\mu, \nu_1, \nu_2)$ where $\mu$ is a single parameter of interest and $\nu_1,\nu_2$ are two nuisance parameters that are not identified under the null. For example, $\mu$ is the signal strength of a new particle and $\nu_1,\nu_2$ are the unknown mass and width of the new particle. Under the null hypothesis, those parameters don't mean anything... aka they "are not identified under the null" in the statistics jargon. This introduces a 2-d look elsewhere effect.
The LEE correction in this case is based on
\begin{equation} E[ \phi(A_u) ] = P(\chi^2_1 > u) + e^{-u/2} (N_1 + \sqrt{u} N_2) \, \end{equation}where
The notebook is broken into two parts.
%pylab inline --no-import-all
Populating the interactive namespace from numpy and matplotlib
from lee2d import *
These are toy histograms, but they are supposed to represent scans of $ q(\nu_1, \nu_2)$ where
\begin{equation} q(\nu_1, \nu_2) = -2 \log \frac{ \max_\theta L(\mu=0, \nu_1, \nu_2, \theta)}{ \max_{\mu, \theta} L(\mu, \nu_1, \nu_2, \theta)} \end{equation}and $\theta$ are nuisance parameters for the background model, $\mu$ is parameter of interest, and $\nu_1, \nu_2$ are nuisance parameters for the signal that are not meaningful for the null (eg. the mass and width of a hypothetical particle).
In this setting, the maximum local significance is given by
\begin{equation} Z_{local} = \max_{\nu_1, \nu_2} \sqrt{q(\nu_1, \nu_2)} \end{equation}from create_test_histograms import *
create_test_histograms()
Welcome to ROOTaaS 6.05/03
#check to make sure test_hists.root exists
!ls *root
HiggsLWA.root HiggsLWANew.root test_hists.root
# Specify the necessary info
max_local_significance = 4.
# choose u1, u2 thresholds for doing the scan.
# these are arbitrary
# if there are enough toys the choice shouldn't matter, but
# we may want to do some tests with other choices
this_u1, this_u2 = 0.1, 0.9
# Specify the root file with the histograms and their names
root_file_name = 'test_hists.root'
num_toy_scans = 25
names_of_toy_likleihood_scans = [('scan_toy_%d' %(i)) for i in range(11,num_toy_scans)]
def convert_hist_to_numpy(hist):
"""a little helper script"""
temp = np.zeros((hist.GetNbinsX(), hist.GetNbinsY()))
for i in range(temp.shape[0]):
for j in range(temp.shape[1]):
temp[i,j] = hist.GetBinContent(i+1, j+1)
return temp
# Read in histograms, convert them to numpy arrays
inFile = ROOT.TFile(root_file_name, 'READ')
likelihoodScans = []
for histName in names_of_toy_likleihood_scans:
inHist = inFile.Get(histName)
temp = convert_hist_to_numpy(inHist)
likelihoodScans.append(temp)
from scipy.ndimage import grey_closing, binary_closing
def fill_holes(array):
zero_array = array==0.
temp = grey_closing(array, size=2)*zero_array
return temp+array
def get_euler_characteristics(listOfScans, u1=0.1, u2=0.9):
"""
loop through the likleihood scans and calculate expectation
of Euler characteristic for excursion sets above levels u1, u2
"""
n_plots = 3
plt.figure(figsize=(9,n_plots*3))
phis = np.zeros((len(listOfScans),2))
for scan_no, scan in enumerate(listOfScans):
# fill holes from failures in original likelihood
scan = fill_holes(scan)
#get excursion sets above those two levels
exc1 = (scan>u1) + 0. #add 0. to convert from bool to double
exc2 = (scan>u2) + 0.
#print '\nu1,u2 = ', u1, u2
if scan_no < n_plots:
plt.subplot(n_plots,3,3*scan_no+1)
aspect = 1.*scan.shape[0]/scan.shape[1]
plt.imshow(scan.T, cmap='gray', aspect=aspect)
plt.subplot(n_plots,3,3*scan_no+2)
plt.imshow(exc1.T, cmap='gray', aspect=aspect)
plt.subplot(n_plots,3,3*scan_no+3)
plt.imshow(exc2.T, cmap='gray', aspect=aspect)
phi1 = calculate_euler_characteristic(exc1)
phi2 = calculate_euler_characteristic(exc2)
#print 'phi1, phi2 = ', phi1, phi2
phis[scan_no] = [phi1, phi2]
plt.savefig('islands.png')
print 'Exp phi_0=%f, phi_2=%f' %(mean(phis[:,0]), mean(phis[:,1]))
return mean(phis[:,0]), mean(phis[:,1])
The columns of the below are for
$\left| \hspace{2em} q(\nu_1, \nu_2) \hspace{2em}\right| \hspace{2em}q(\nu_1, \nu_2)>u_1 \hspace{2em} \left| \hspace{2em} q(\nu_1, \nu_2) > u_2 \hspace{2em} \right|$
expphi1, expphi2 = get_euler_characteristics(likelihoodScans, u1=this_u1, u2=this_u2)
Exp phi_0=4.857143, phi_2=4.571429
global_p_value = do_LEE_correction(max_local_significance, this_u1, this_u2, expphi1, expphi2)
n1, n2 = 3.15777649996 3.66198277791 local p_value = 0.000032, local significance = 4.000000 global p_value = 0.006036, global significance = 2.510004