This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.

# 10.1. Analyzing the frequency components of a signal with a Fast Fourier Transform¶

The data has been obtained here: http://www.ncdc.noaa.gov/cdo-web/datasets#GHCND.

1. Let's import the packages, including scipy.fftpack which includes many FFT-related routines.
In [ ]:
import datetime
import numpy as np
import scipy as sp
import scipy.fftpack
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

1. We import the data from the CSV file. The number -9999 is used for N/A values. Pandas can easily handle this. In addition, we tell Pandas to parse dates contained in the DATE column.
In [ ]:
df0 = pd.read_csv('data/weather.csv',
na_values=(-9999),
parse_dates=['DATE'])

In [ ]:
df = df0[df0['DATE']>='19940101']

In [ ]:
df.head()

1. Each row contains the precipitation and extremal temperatures recorded one day by one weather station in France. For every date, we want to get a single average temperature for the whole country. The groupby method provided by Pandas lets us do that easily. We also remove any NA value.
In [ ]:
df_avg = df.dropna().groupby('DATE').mean()

In [ ]:
df_avg.head()

1. Now, we get the list of dates and the list of corresponding temperature. The unit is in tenth of degree, and we get the average value between the minimal and maximal temperature, which explains why we divide by 20.
In [ ]:
date = df_avg.index.to_datetime()
temp = (df_avg['TMAX'] + df_avg['TMIN']) / 20.
N = len(temp)

1. Let's take a look at the evolution of the temperature.
In [ ]:
plt.figure(figsize=(6,3));
plt.plot_date(date, temp, '-', lw=.5);
plt.ylim(-10, 40);
plt.xlabel('Date');
plt.ylabel('Mean temperature');

1. We now compute the Fourier transform and the spectral density of the signal. The first step is to compute the FFT of the signal using the fft function.
In [ ]:
temp_fft = sp.fftpack.fft(temp)

1. Once the FFT has been obtained, one needs to take the square of its absolute value to get the power spectral density (PSD).
In [ ]:
temp_psd = np.abs(temp_fft) ** 2

1. The next step is to get the frequencies corresponding to the values of the PSD. The fftfreq utility function does just that. It takes as input the length of the PSD vector, as well as the frequency unit. Here, we choose an annual unit: a frequency of 1 corresponds to 1 year (365 days). We provide 1./365 because the original unit is in days.
In [ ]:
fftfreq = sp.fftpack.fftfreq(len(temp_psd), 1./365)

1. The fftfreq function returns positive and negative frequencies. We are only interested in positive frequencies here since we have a real signal (this will be explained in How it works...).
In [ ]:
i = fftfreq>0

1. We now plot the power spectral density of our signal, as a function of the frequency (in unit of 1/year).
In [ ]:
plt.figure(figsize=(8,4));
plt.plot(fftfreq[i], 10*np.log10(temp_psd[i]));
plt.xlim(0, 5);
plt.xlabel('Frequency (1/year)');
plt.ylabel('PSD (dB)');


We observe a peak for $f=1/year$: this is because the fundamental frequency of the signal is the yearly variation of the temperature.

1. Now, we cut out the frequencies higher than the fundamental frequency.
In [ ]:
temp_fft_bis = temp_fft.copy()
temp_fft_bis[np.abs(fftfreq) > 1.1] = 0

1. The next step is to perform an inverse FFT to convert the modified Fourier transform back to the temporal domain. This way, we recover a signal that mainly contains the fundamental frequency, as shown in the figure below.
In [ ]:
temp_slow = np.real(sp.fftpack.ifft(temp_fft_bis))

In [ ]:
plt.figure(figsize=(6,3));
plt.plot_date(date, temp, '-', lw=.5);
plt.plot_date(date, temp_slow, '-');
plt.xlim(datetime.date(1994, 1, 1), datetime.date(2000, 1, 1));
plt.ylim(-10, 40);
plt.xlabel('Date');
plt.ylabel('Mean temperature');


You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).

IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).