ThinkDSP¶

This notebook contains code examples from Chapter 8: Filtering and Convolution

License: Creative Commons Attribution 4.0 International

In [1]:

# Get thinkdsp.py

import os

if not os.path.exists('thinkdsp.py'):
    !wget https://github.com/AllenDowney/ThinkDSP/raw/master/code/thinkdsp.py

In [2]:

import numpy as np
import matplotlib.pyplot as plt

from thinkdsp import decorate

In [3]:

# suppress scientific notation for small numbers
np.set_printoptions(precision=3, suppress=True)

Smoothing¶

As the first example, I'll look at daily closing stock prices for Facebook, from its IPO on 2012-05-18 to 2020-03-27 (note: the dataset includes only trading days )

In [4]:

if not os.path.exists('FB_2.csv'):
    !wget https://github.com/AllenDowney/ThinkDSP/raw/master/code/FB_2.csv

In [5]:

import pandas as pd

df = pd.read_csv('FB_2.csv', header=0, parse_dates=[0])
df.head()

Out[5]:

	Date	Open	High	Low	Close	Adj Close	Volume
0	2012-05-18	42.049999	45.000000	38.000000	38.230000	38.230000	573576400
1	2012-05-21	36.529999	36.660000	33.000000	34.029999	34.029999	168192700
2	2012-05-22	32.610001	33.590000	30.940001	31.000000	31.000000	101786600
3	2012-05-23	31.370001	32.500000	31.360001	32.000000	32.000000	73600000
4	2012-05-24	32.950001	33.209999	31.770000	33.029999	33.029999	50237200

In [6]:

df.tail()

Out[6]:

	Date	Open	High	Low	Close	Adj Close	Volume
1972	2020-03-23	149.660004	152.309998	142.250000	148.100006	148.100006	29830800
1973	2020-03-24	155.210007	161.309998	152.570007	160.979996	160.979996	30440400
1974	2020-03-25	158.919998	162.990005	153.059998	156.210007	156.210007	35184300
1975	2020-03-26	158.250000	164.000000	157.020004	163.339996	163.339996	26556800
1976	2020-03-27	158.199997	160.089996	154.750000	156.789993	156.789993	24861900

Extract the close prices and days since start of series:

In [7]:

close = df['Close']
dates = df['Date']
days = (dates - dates[0]) / np.timedelta64(1,'D')

Make a window to compute a 30-day moving average and convolve the window with the data. The valid flag means the convolution is only computed where the window completely overlaps with the signal.

In [8]:

M = 30
window = np.ones(M)
window /= sum(window)
smoothed = np.convolve(close, window, mode='valid')
smoothed_days = days[M//2: len(smoothed) + M//2]

Plot the original and smoothed signals.

In [9]:

plt.plot(days, close, color='gray', alpha=0.6, label='daily close')
plt.plot(smoothed_days, smoothed, color='C1', alpha=0.6, label='30 day average')

decorate(xlabel='Time (days)', ylabel='Price ($)')

Smoothing sound signals¶

Generate a 440 Hz sawtooth signal.

In [10]:

from thinkdsp import SawtoothSignal

signal = SawtoothSignal(freq=440)
wave = signal.make_wave(duration=1.0, framerate=44100)
wave.make_audio()

Out[10]:

Make a moving average window.

In [11]:

window = np.ones(11)
window /= sum(window)
plt.plot(window)
decorate(xlabel='Index')

Plot the wave.

In [12]:

segment = wave.segment(duration=0.01)
segment.plot()
decorate(xlabel='Time (s)')

Pad the window so it's the same length as the signal, and plot it.

In [13]:

def zero_pad(array, n):
    """Extends an array with zeros.

    array: NumPy array
    n: length of result

    returns: new NumPy array
    """
    res = np.zeros(n)
    res[:len(array)] = array
    return res

In [14]:

N = len(segment)
padded = zero_pad(window, N)
plt.plot(padded)
decorate(xlabel='Index')

Apply the window to the signal (with lag=0).

In [15]:

prod = padded * segment.ys
np.sum(prod)

Out[15]:

-0.9001814882032255

Compute a convolution by rolling the window to the right.

In [16]:

smoothed = np.zeros(N)
rolled = padded.copy()
for i in range(N):
    smoothed[i] = sum(rolled * segment.ys)
    rolled = np.roll(rolled, 1)

In [17]:

plt.plot(rolled)
decorate(xlabel='Index')

Plot the result of the convolution and the original.

In [18]:

from thinkdsp import Wave

segment.plot(color='gray')
smooth = Wave(smoothed, framerate=wave.framerate)
smooth.plot()
decorate(xlabel='Time(s)')

Compute the same convolution using numpy.convolve.

In [19]:

segment.plot(color='gray')
ys = np.convolve(segment.ys, window, mode='valid')
smooth2 = Wave(ys, framerate=wave.framerate)
smooth2.plot()
decorate(xlabel='Time(s)')

Frequency domain¶

Let's see what's happening in the frequency domain.

Compute the smoothed wave using np.convolve, which is much faster than my version above.

In [20]:

convolved = np.convolve(wave.ys, window, mode='same')
smooth = Wave(convolved, framerate=wave.framerate)
smooth.make_audio()

Out[20]:

Plot spectrums of the original and smoothed waves:

In [21]:

spectrum = wave.make_spectrum()
spectrum.plot(color='gray')

spectrum2 = smooth.make_spectrum()
spectrum2.plot()

decorate(xlabel='Frequency (Hz)', ylabel='Amplitude')

For each harmonic, compute the ratio of the amplitudes before and after smoothing.

In [22]:

amps = spectrum.amps
amps2 = spectrum2.amps
ratio = amps2 / amps    
ratio[amps<280] = 0

plt.plot(ratio)
decorate(xlabel='Frequency (Hz)', ylabel='Amplitude ratio')

Plot the ratios again, but also plot the FFT of the window.

In [23]:

padded =  zero_pad(window, len(wave))
dft_window = np.fft.rfft(padded)

plt.plot(np.abs(dft_window), color='gray', label='DFT(window)')
plt.plot(ratio, label='amplitude ratio')

decorate(xlabel='Frequency (Hz)', ylabel='Amplitude ratio')

Gaussian window¶

Let's compare boxcar and Gaussian windows.

Make the boxcar window.

In [24]:

boxcar = np.ones(11)
boxcar /= sum(boxcar)

Make the Gaussian window.

In [25]:

import scipy.signal

gaussian = scipy.signal.gaussian(M=11, std=2)
gaussian /= sum(gaussian)

Plot the two windows.

In [26]:

plt.plot(boxcar, label='boxcar')
plt.plot(gaussian, label='Gaussian')
decorate(xlabel='Index')

Convolve the square wave with the Gaussian window.

In [27]:

ys = np.convolve(wave.ys, gaussian, mode='same')
smooth = Wave(ys, framerate=wave.framerate)
spectrum2 = smooth.make_spectrum()

Compute the ratio of the amplitudes.

In [28]:

amps = spectrum.amps
amps2 = spectrum2.amps
ratio = amps2 / amps    
ratio[amps<560] = 0

Compute the FFT of the window.

In [29]:

padded =  zero_pad(gaussian, len(wave))
dft_gaussian = np.fft.rfft(padded)

Plot the ratios and the FFT of the window.

In [30]:

plt.plot(np.abs(dft_gaussian), color='0.7', label='Gaussian filter')
plt.plot(ratio, label='amplitude ratio')

decorate(xlabel='Frequency (Hz)', ylabel='Amplitude ratio')

Combine the preceding example into one big function so we can interact with it.

In [31]:

from thinkdsp import SquareSignal

def plot_filter(M=11, std=2):
    signal = SquareSignal(freq=440)
    wave = signal.make_wave(duration=1, framerate=44100)
    spectrum = wave.make_spectrum()

    gaussian = scipy.signal.gaussian(M=M, std=std)
    gaussian /= sum(gaussian)

    ys = np.convolve(wave.ys, gaussian, mode='same')
    smooth =  Wave(ys, framerate=wave.framerate)
    spectrum2 = smooth.make_spectrum()

    # plot the ratio of the original and smoothed spectrum
    amps = spectrum.amps
    amps2 = spectrum2.amps
    ratio = amps2 / amps    
    ratio[amps<560] = 0

    # plot the same ratio along with the FFT of the window
    padded =  zero_pad(gaussian, len(wave))
    dft_gaussian = np.fft.rfft(padded)

    plt.plot(np.abs(dft_gaussian), color='gray', label='Gaussian filter')
    plt.plot(ratio, label='amplitude ratio')

    decorate(xlabel='Frequency (Hz)', ylabel='Amplitude ratio')
    plt.show()

Try out different values of M and std.

In [32]:

from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets

slider = widgets.IntSlider(min=2, max=100, value=11)
slider2 = widgets.FloatSlider(min=0, max=20, value=2)
interact(plot_filter, M=slider, std=slider2);

Convolution theorem¶

Let's use the Convolution theorem to compute convolutions using FFT.

I'll use the Facebook data again, and smooth it using np.convolve and a 30-day Gaussian window.

I ignore the dates and treat the values as if they are equally spaced in time.

In [33]:

window = scipy.signal.gaussian(M=30, std=6)
window /= window.sum()
smoothed = np.convolve(close, window, mode='valid')

len(close), len(smoothed)

Out[33]:

(1977, 1948)

Plot the original and smoothed data.

In [34]:

plt.plot(close, color='gray')
plt.plot(smoothed)
decorate(xlabel='Time (days)', ylabel='Price ($)')

Pad the window and compute its FFT.

In [35]:

N = len(close)
padded = zero_pad(window, N)
fft_window = np.fft.fft(padded)
plt.plot(np.abs(fft_window))
decorate(xlabel='Index')

Apply the convolution theorem.

In [36]:

fft_signal = np.fft.fft(close)
smoothed2 = np.fft.ifft(fft_signal * fft_window)
M = len(window)
smoothed2 = smoothed2[M-1:]

Plot the two signals (smoothed with numpy and FFT).

In [37]:

plt.plot(smoothed)
plt.plot(smoothed2.real)
decorate(xlabel='Time (days)', ylabel='Price ($)')

Confirm that the difference is small.

In [38]:

diff = smoothed - smoothed2
np.max(np.abs(diff))

Out[38]:

2.089768879877275e-13

scipy.signal provides fftconvolve, which computes convolutions using FFT.

In [39]:

smoothed3 = scipy.signal.fftconvolve(close, window, mode='valid')

Confirm that it gives the same answer, at least approximately.

In [40]:

diff = smoothed - smoothed3
np.max(np.abs(diff))

Out[40]:

8.526512829121202e-14

We can encapsulate the process in a function:

In [41]:

def fft_convolve(signal, window):
    fft_signal = np.fft.fft(signal)
    fft_window = np.fft.fft(window)
    return np.fft.ifft(fft_signal * fft_window)

And confirm that it gives the same answer.

In [42]:

smoothed4 = fft_convolve(close, padded)[M-1:]
len(smoothed4)

Out[42]:

In [43]:

diff = smoothed - smoothed4
np.max(np.abs(diff))

Out[43]:

2.089768879877275e-13

Autocorrelation¶

We can also use the convolution theorem to compute autocorrelation functions.

Compute autocorrelation using numpy.correlate:

In [44]:

corrs = np.correlate(close, close, mode='same')
corrs[:7]

Out[44]:

array([10515607.461, 10533254.812, 10550927.762, 10570279.165,
       10589752.032, 10609151.979, 10628488.485])

Compute autocorrelation using my fft_convolve. The window is a reversed copy of the signal. We have to pad the window and signal with zeros and then select the middle half from the result.

In [45]:

def fft_autocorr(signal):
    N = len(signal)
    signal = zero_pad(signal, 2*N)
    window = np.flipud(signal)

    corrs = fft_convolve(signal, window)
    corrs = np.roll(corrs, N//2+1)[:N]
    return corrs

Test the function.

In [46]:

corrs2 = fft_autocorr(close)
corrs2[:7]

Out[46]:

array([10515607.461-0.j, 10533254.812-0.j, 10550927.762+0.j,
       10570279.165-0.j, 10589752.032+0.j, 10609151.979-0.j,
       10628488.485-0.j])

Plot the results.

In [47]:

lags = np.arange(N) - N//2
plt.plot(lags, corrs, color='gray', alpha=0.5, label='np.convolve')
plt.plot(lags, corrs2.real, color='C1', alpha=0.5, label='fft_convolve')
decorate(xlabel='Lag', ylabel='Correlation')
len(corrs), len(corrs2)

Out[47]:

(1977, 1977)

Confirm that the difference is small.

In [48]:

diff = corrs - corrs2.real
np.max(np.abs(diff))

Out[48]:

2.60770320892334e-08

In [ ]: