#!/usr/bin/env python
# coding: utf-8

# # The RR term in autocorrelations
# Author: Lehman Garrison (http://lgarrison.github.io)

# $\newcommand{\RR}{\mathrm{RR}}$
# $\newcommand{\DD}{\mathrm{DD}}$

# When computing a two-point correlation function estimator like
# $\xi(r) = \frac{\DD}{\RR} - 1$,
# the $\RR$ term can be computed analytically if the domain is a periodic box.
# In 2D, this is often done as
# $$\begin{align}
# \RR_i &= N A_i \bar\rho \\
# &= N A_i \frac{N}{L^2}
# \end{align}$$
# where $\RR_i$ is the expected number of random-random pairs in bin $i$, $N$ is the total number of points, $A_i$ is the area (or volume in 3D) of bin $i$, $L$ is the box size, and $\bar\rho$ is the average density in the box.
#
# However, using $\bar\rho = \frac{N}{L^2}$ is not quite right. When sitting on a particle, only $N-1$ particles are available to fall in a bin some distance away; the remaining particle is the one you're sitting on, which is always at distance $0$. Thus, the correct expression is
# $$\RR_i = N A_i \frac{N-1}{L^2}.$$
#
# The following notebook empirically demonstrates that using $N-1$ is correct, and that using $N$ introduces a bias of order $\frac{1}{N}$ into the estimator. This is a tiny correction for large-$N$ problems, but important for small $N$.
#
# Cross-correlations of two different particle sets don't suffer from this problem; the particle you're sitting on is never part of the set of particles under consideration for pair-making.
#
# This $N-1$ correction is implemented in the [Corrfunc](https://github.com/manodeep/Corrfunc) code.
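# The corrected expression can also be read as a counting statement: excluding self-pairs, there are $N(N-1)$ ordered (i.e. doubled) pairs, and for uniform randoms each one lands in bin $i$ with probability $\frac{A_i}{L^2}$ (valid while the bin fits inside the periodic box, $r < L/2$). A minimal sketch of that reading, using the same bins as the experiment below:

```python
import numpy as np

# Same parameters as the experiment below
N, L = 10, 1.0
bins = np.linspace(0.0, 0.49, 5)

# Annular bin areas
dA = np.pi*(bins[1:]**2 - bins[:-1]**2)

# N*(N-1) ordered pairs, each falling in bin i with probability dA_i/L**2
RR = N*(N - 1)*dA/L**2

# Summed over all bins out to r_max = 0.49, this recovers the total
# ordered-pair count times the covered area fraction, pi*0.49**2/L**2
print(RR.sum())
```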
# In[1]:


import numpy as np


# In[2]:


# Generates a set of N uniform random points in a periodic box of side L
# and returns binned pair counts.
# Pair counts are doubled (all ordered pairs), and self-pairs are counted.
def rand_DD(N, L, bins):
    # Make a 2D random set of N particles in [0,L)
    x = np.random.random((2,N))*L

    # Form the periodic distance array
    dist = x[:,None] - x[...,None]
    dist -= np.round(dist/L)*L
    dist = (dist**2).sum(axis=0)**.5

    # Count pairs in the separation bins
    DD = np.histogram(dist, bins=bins)[0]
    return DD


# In[3]:


# DD parameters
N = 10
L = 1.
bins = np.linspace(0.0, 0.49, 5)
n_iter = 100000  # number of times to repeat the experiment
seed = 42
np.random.seed(seed)

# First, calculate RR analytically
dA = (bins[1:]**2 - bins[:-1]**2)*np.pi  # The area of each 2D annular bin
RR = np.empty((2, len(dA)))
RR[:] = N/L**2*dA

# Calculate RR using N-1 and N
RR[0] *= N - 1
RR[1] *= N

# If the first bin includes 0, the random term must include the self-pairs
if bins[0] == 0.:
    RR[:,0] += N

xi_mean = np.zeros((2, len(bins)-1))
for i in range(n_iter):
    DD = rand_DD(N, L, bins)
    xi_mean += DD/RR
xi_mean /= n_iter

print('DD/RR using density (N-1) = {}'.format(xi_mean[0]))
print('DD/RR using density (N)   = {}'.format(xi_mean[1]))


# As we expected, using $N$ in the density leads to a $\frac{1}{N} = 10\%$ bias in $\frac{\DD}{\RR}$ when averaging over many realizations of $\DD$.
#
# The first bin has a smaller bias because we chose to include a bin that starts at $0$ separation, and thus includes self-pairs, which are independent of the $N$ or $N-1$ density term.
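# The cross-correlation claim above can be checked the same way: when counting pairs between two independent sets of sizes $N_1$ and $N_2$, all $N_1 N_2$ pairs are available, so $\RR_i = N_1 N_2 \frac{A_i}{L^2}$ with no $N-1$-style correction, and no self-pair term even when a bin starts at $0$. The following is a sketch of that check, not part of the notebook above; the set sizes `N1` and `N2` are arbitrary choices.

```python
import numpy as np

# Cross pair counts between two independent uniform sets (2D, periodic)
def rand_DR(N1, N2, L, bins, rng):
    x1 = rng.random((2, N1))*L
    x2 = rng.random((2, N2))*L
    dist = x1[:, :, None] - x2[:, None, :]   # shape (2, N1, N2)
    dist -= np.round(dist/L)*L
    dist = np.sqrt((dist**2).sum(axis=0))
    return np.histogram(dist, bins=bins)[0]

N1, N2, L = 10, 7, 1.0
bins = np.linspace(0.0, 0.49, 5)
dA = np.pi*(bins[1:]**2 - bins[:-1]**2)

# All N1*N2 cross pairs are available: no N-1 correction, no self-pairs
RR = N1*N2*dA/L**2

rng = np.random.default_rng(42)
n_iter = 20000
xi_mean = np.zeros(len(bins) - 1)
for _ in range(n_iter):
    xi_mean += rand_DR(N1, N2, L, bins, rng)/RR
xi_mean /= n_iter

# Each entry should scatter around 1, i.e. xi ~ 0, with no 1/N bias
print(xi_mean)
```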