The Lomb-Scargle Periodogram is a well-known method of finding periodicity in irregularly-sampled time-series data.
The common implementation of the periodogram is relatively slow: for $N$ data points, a frequency grid of $\sim N$ frequencies is required and the computation scales as $O[N^2]$.
In a 1989 paper, Press and Rybicki presented a faster technique which makes use of fast Fourier transforms to reduce this cost to $O[N\log N]$ on a regular frequency grid.
The `gatspy` package implements this in the `LombScargleFast` object, which we'll explore below.

But first, we'll motivate *why* this algorithm is needed at all.
We'll start this notebook with some standard imports:

In [1]:

```
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# use seaborn's default plotting styles for matplotlib
import seaborn; seaborn.set()
```

To begin, let's make a function which will create $N$ noisy, irregularly-spaced data points containing a periodic signal, and plot one realization of that data:

In [2]:

```
def create_data(N, period=2.5, err=0.1, rseed=0):
    rng = np.random.RandomState(rseed)
    t = np.arange(N, dtype=float) + 0.3 * rng.randn(N)
    y = np.sin(2 * np.pi * t / period) + err * rng.randn(N)
    return t, y, err

t, y, dy = create_data(100, period=20)
plt.errorbar(t, y, dy, fmt='o');
```

From this, our algorithm should be able to identify any periodicity that is present.

The Lomb-Scargle Periodogram works by evaluating a power for a set of candidate frequencies $f$. So the first question is, how many candidate frequencies should we choose?
It turns out that this question is *very* important. If you choose the frequency spacing poorly, it may lead you to miss a strong periodic signal in the data!

First, let's think about the frequency spacing we need in our grid. If you're asking about a candidate frequency $f$, then data with range $T$ contains $T \cdot f$ complete cycles. If our error in frequency is $\delta f$, then $T\cdot\delta f$ is the error in the number of cycles between the endpoints of the data. If this error is a significant fraction of a cycle, it will cause problems. This gives us the criterion $$ T\cdot\delta f \ll 1 $$ Commonly, we'll choose an oversampling factor of around 5 and use $\delta f = (5T)^{-1}$ as our frequency grid spacing.

Next, we need to choose the limits of the frequency grid. On the low end, we'd like to start near $f=0$, but zero itself causes some problems, so we'll go one step up and use $\delta f$ as our minimum frequency. On the high end, we need to make a choice: what's the highest frequency we'd trust our data to be sensitive to? At this point, many people are tempted to misapply the Nyquist-Shannon sampling theorem and choose some version of the Nyquist limit for the data. But this is entirely wrong! The Nyquist limit applies to regularly-sampled data, but irregularly-sampled data can be sensitive to much, much higher frequencies; the upper limit should be determined by the kinds of signals you are looking for.
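We can see this concretely with a tiny NumPy experiment: on a regular grid with spacing $\Delta t$, a signal at frequency $f$ is exactly degenerate with one at $f + 1/\Delta t$, but jittered sampling breaks that degeneracy. This is just an illustrative sketch with hypothetical sample times, not part of the analysis above:

```python
import numpy as np

rng = np.random.RandomState(42)
dt = 1.0
t_reg = np.arange(50) * dt            # regular sampling
t_irr = t_reg + 0.3 * rng.randn(50)   # jittered (irregular) sampling

f = 0.2
f_alias = f + 1.0 / dt  # exact alias of f on the regular grid

# On the regular grid, the two frequencies produce identical samples;
# on the irregular grid, they don't.
same_on_regular = np.allclose(np.sin(2 * np.pi * f * t_reg),
                              np.sin(2 * np.pi * f_alias * t_reg))
same_on_irregular = np.allclose(np.sin(2 * np.pi * f * t_irr),
                                np.sin(2 * np.pi * f_alias * t_irr))
print(same_on_regular, same_on_irregular)  # → True False
```

This is why irregular sampling can recover frequencies well above the nominal Nyquist limit of the average sampling rate.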

Still, a common (if dubious) rule-of-thumb is that the high frequency is some multiple of what Press & Rybicki call the "average" Nyquist frequency, $$ \hat{f}_{Ny} = \frac{N}{2T} $$ With this in mind, we'll use the following function to determine a suitable frequency grid:

In [3]:

```
def freq_grid(t, oversampling=5, nyquist_factor=3):
    T = t.max() - t.min()
    N = len(t)
    df = 1. / (oversampling * T)
    fmax = 0.5 * nyquist_factor * N / T
    Nf = int(fmax // df)
    return df + df * np.arange(Nf)
```

Now let's use the `gatspy` tools to plot the periodogram:

In [4]:

```
t, y, dy = create_data(100, period=2.5)
freq = freq_grid(t)
print(len(freq))
```

In [5]:

```
from gatspy.periodic import LombScargle
model = LombScargle().fit(t, y, dy)
period = 1. / freq
power = model.periodogram(period)
plt.plot(period, power)
plt.xlim(0, 5);
```

The algorithm finds a strong signal at a period of 2.5.
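Rather than reading the peak off the plot, we can extract it programmatically by taking the grid point with maximum power. Here's a self-contained sketch with a synthetic stand-in for the `period` and `power` arrays above:

```python
import numpy as np

# Hypothetical stand-in for the (period, power) arrays computed above:
# a periodogram-like curve with a known peak at period = 2.5.
period = np.linspace(0.1, 5, 1000)
power = np.exp(-0.5 * ((period - 2.5) / 0.05) ** 2)

# The best-fit period is the grid point with maximum power.
best_period = period[np.argmax(power)]
print(round(best_period, 2))  # → 2.5
```

Note that the precision of the recovered period is limited by the grid spacing; for a sharper estimate you could refine the grid around the peak.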

To demonstrate explicitly that the Nyquist rate doesn't apply to irregularly-sampled data, let's use a period shorter than the average sampling interval and show that we can find it:

In [6]:

```
t, y, dy = create_data(100, period=0.3)
period = 1. / freq_grid(t, nyquist_factor=10)
model = LombScargle().fit(t, y, dy)
power = model.periodogram(period)
plt.plot(period, power)
plt.xlim(0, 1);
```

With a data sampling rate of approximately $1$ time unit, we easily find a period of $0.3$ time units. The averaged Nyquist limit clearly does not apply for irregularly-spaced data! Nevertheless, short of a full analysis of the temporal window function, it remains a useful milepost in estimating the upper limit of frequency.

With these rules in mind, we see that the size of the frequency grid is approximately $$ N_f = \frac{f_{max}}{\delta f} \propto \frac{N/(2T)}{1/T} \propto N $$ So for $N$ data points, we will require some multiple of $N$ frequencies (with a constant of proportionality typically on the order of 10) to suitably explore the frequency space. This is the source of the $N^2$ scaling of the typical periodogram: finding periods in $N$ data points requires a grid of $\sim 10N$ frequencies, and thus $O[N^2]$ operations. When $N$ gets very, very large, this becomes a problem.
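As a quick check on this scaling, we can count the grid size produced by the rules above as $N$ grows. This standalone sketch mirrors the `freq_grid` function, assuming roughly one observation per time unit so that $T \approx N$:

```python
import numpy as np

def grid_size(N, T, oversampling=5, nyquist_factor=3):
    """Number of frequencies required by the rules derived above."""
    df = 1.0 / (oversampling * T)        # grid spacing: (oversampling * T)^-1
    fmax = 0.5 * nyquist_factor * N / T  # multiple of the "average" Nyquist rate
    return int(fmax // df)

# With T ~ N, the grid grows linearly with N
# (constant of proportionality = oversampling * nyquist_factor / 2 = 7.5):
for N in [100, 1000, 10000]:
    print(N, grid_size(N, T=N))
```

So each tenfold increase in data size brings a tenfold increase in grid size, and a hundredfold increase in the cost of the naive periodogram.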

## `LombScargleFast`

Finally we get to the meat of this discussion.

In a 1989 paper, Press and Rybicki proposed a clever method whereby a Fast Fourier Transform is used on a grid *extirpolated* from the original data, such that this problem can be solved in $O[N\log N]$ time. The `gatspy` package contains a pure-Python implementation of this algorithm, and we'll explore it here.
If you're interested in seeing how the algorithm works in Python, check out the code in the gatspy source.
It's far more readable and understandable than the Fortran source presented in Press *et al.*
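To give a feel for the *extirpolation* trick at the heart of the method, here is a minimal, illustrative sketch (not gatspy's actual implementation): each irregular sample is spread onto its few nearest regular grid points using Lagrange interpolation weights, chosen so that sums of smooth functions over the regular grid approximate the corresponding sums over the original points. Those grid sums are exactly what the FFT then evaluates quickly.

```python
import numpy as np

def extirpolate(t, y, grid_size, M=4):
    """Spread irregular samples (t, y) onto the integer grid 0..grid_size-1
    using Lagrange weights over the M nearest grid points (a sketch of the
    Press & Rybicki idea, not gatspy's API)."""
    out = np.zeros(grid_size)
    for tj, yj in zip(t, y):
        # choose the M grid points nearest to tj, clipped to the grid
        k0 = max(0, min(int(tj) - M // 2 + 1, grid_size - M))
        ks = np.arange(k0, k0 + M)
        for k in ks:
            others = ks[ks != k]
            # Lagrange basis polynomial for node k, evaluated at tj
            w = np.prod((tj - others) / (k - others))
            out[k] += yj * w
    return out

# Sums of smooth (low-order polynomial) functions over the grid reproduce
# the sums over the original points: here sum(out) equals sum(y).
t = np.array([1.3, 4.7, 7.1])
y = np.array([2.0, -1.0, 0.5])
out = extirpolate(t, y, 10)
print(np.allclose(out.sum(), y.sum()))  # → True
```

Because the weights are a Lagrange basis, any polynomial of degree below $M$ summed against the extirpolated values agrees exactly with the sum against the original samples, which is what makes the subsequent FFT a good approximation to the irregular-grid sums.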

For convenience, the implementation has a `periodogram_auto` method which automatically selects a frequency/period range based on an oversampling factor and a Nyquist factor:

In [7]:

```
from gatspy.periodic import LombScargleFast
help(LombScargleFast.periodogram_auto)
```

In [8]:

```
from gatspy.periodic import LombScargleFast
t, y, dy = create_data(100)
model = LombScargleFast().fit(t, y, dy)
period, power = model.periodogram_auto()
plt.plot(period, power)
plt.xlim(0, 5);
```

Here, to illustrate the different computational scalings, we'll evaluate the computation time for a number of inputs, using `LombScargleAstroML` (a fast implementation of the $O[N^2]$ algorithm) and `LombScargleFast`, which is the fast FFT-based implementation:

In [9]:

```
from time import time
from gatspy.periodic import LombScargleAstroML, LombScargleFast

def get_time(N, Model):
    t, y, dy = create_data(N)
    model = Model().fit(t, y, dy)
    t0 = time()
    model.periodogram_auto()
    t1 = time()
    result = t1 - t0

    # for fast operations, repeat several times and take the median
    if result < 0.1:
        times = []
        for i in range(5):
            t0 = time()
            model.periodogram_auto()
            t1 = time()
            times.append(t1 - t0)
        result = np.median(times)
    return result

N_obs = list(map(int, 10 ** np.linspace(1, 4, 5)))
times1 = [get_time(N, LombScargleAstroML) for N in N_obs]
times2 = [get_time(N, LombScargleFast) for N in N_obs]
```

In [10]:

```
plt.loglog(N_obs, times1, label='Naive Implementation')
plt.loglog(N_obs, times2, label='FFT Implementation')
plt.xlabel('N observations')
plt.ylabel('t (sec)')
plt.legend(loc='upper left');
```

For fewer than 100 observations, the naive implementation wins out, but as the number of points grows, we observe the clear trends in scaling: $O[N^2]$ for the Naive method, and $O[N\log N]$ for the fast method. We could push this plot higher, but the trends are already clear: for $10^5$ points, while the FFT method would complete in a couple seconds, the Naive method would take nearly two hours! Who's got the time for that plot?