ThinkDSP

This notebook contains code examples from Chapter 9: Differentiation and Integration

Copyright 2015 Allen Downey

License: Creative Commons Attribution 4.0 International

In [1]:
# Get thinkdsp.py

import os

if not os.path.exists('thinkdsp.py'):
    !wget https://github.com/AllenDowney/ThinkDSP/raw/master/code/thinkdsp.py
In [2]:
import numpy as np
import matplotlib.pyplot as plt

from thinkdsp import decorate

Difference

As the first example, let's look at the Facebook data again.

In [3]:
if not os.path.exists('FB.csv'):
    !wget https://github.com/AllenDowney/ThinkDSP/raw/master/code/FB.csv
In [4]:
import pandas as pd

df = pd.read_csv('FB.csv', header=0, parse_dates=[0])
len(df)
Out[4]:
1977
In [5]:
from thinkdsp import Wave

ys = df['Close']

# for these examples, we need the wave to have 
# an even number of samples
if len(ys) % 2:
    ys = ys[:-1]

close = Wave(ys, framerate=1)
len(close)
Out[5]:
1976

Here's what the time series looks like (ignoring the gaps between trading days).

In [6]:
close.plot()
decorate(xlabel='Time (days)', ylabel='Price ($)')

And here's the spectrum on a log-log scale.

In [7]:
close_spectrum = close.make_spectrum()
close_spectrum.plot()
decorate(xlabel='Frequency (1/day)', ylabel='Amplitude',
                 xscale='log', yscale='log')

The slope of the power spectrum is -1.86, which is similar to red noise (which should have a slope of -2).

In [8]:
close_spectrum.estimate_slope().slope
Out[8]:
-1.7895322348034641

We can use np.diff to compute the difference between successive elements, which is the daily change.

In [9]:
change = Wave(np.diff(close.ys), framerate=1)
change.plot()
decorate(xlabel='Time (days)', ylabel='Price change($)')
len(change)
Out[9]:
1975

And here's the spectrum of the daily changes:

In [10]:
change_spectrum = change.make_spectrum()
change_spectrum.plot()
decorate(xlabel='Frequency (1/day)', ylabel='Amplitude')

Recall that the spectrum of white noise looks like white noise.

Here's the spectrum on a log-log scale.

In [11]:
change_spectrum.plot()
decorate(xlabel='Frequency (1/day)', ylabel='Amplitude',
                 xscale='log', yscale='log')

The estimated slope is close to zero, which is consistent with white noise.

In [12]:
change_spectrum.estimate_slope().slope
Out[12]:
0.09137404305435813

We can think the diff operation as convolution with a difference window, [1, -1].

And convolution with this window corresponds to multiplication by a filter.

The following function computes the filter that corresponds to the window.

In [13]:
from thinkdsp import zero_pad

def make_filter(window, wave):
    """Computes the filter that corresponds to a window.
    
    window: NumPy array
    wave: wave used to choose the length and framerate
    
    returns: new Spectrum
    """
    padded = zero_pad(window, len(wave))
    window_wave = Wave(padded, framerate=wave.framerate)
    window_spectrum = window_wave.make_spectrum()
    return window_spectrum

And here's what the filter looks like for the difference window:

In [14]:
diff_window = np.array([1.0, -1.0])
diff_filter = make_filter(diff_window, close)
diff_filter.plot()
decorate(xlabel='Frequency (1/day)', ylabel='Amplitude ratio')

And the angles:

In [15]:
plt.plot(diff_filter.angles)
decorate(xlabel='Frequency (1/day)', ylabel='Phase offset (radians)')

So we could also compute the daily changes by multiplying the spectrum of closing prices by the diff filter:

In [16]:
change_spectrum2 = close_spectrum * diff_filter
change_spectrum2.plot()
decorate(xlabel='Frequency (1/day)', ylabel='Amplitude')

And then converting the spectrum to a wave.

In [17]:
change2 = change_spectrum2.make_wave()

# we have to trim the first element to avoid wrap-around
change2.ys = change2.ys[1:]
change2.ts = change2.ts[1:]

change.plot()
change2.plot()
decorate(xlabel='Time (day)', ylabel='Price change ($)')

Then we can confirm that we get the same result both ways (within floating point error).

In [44]:
change.max_diff(change2)
Out[44]:
1.2390088954816747e-13

Differentiation

This diff operation is an approximation of differentiation, and we can compute the filter for differentiation analytically: each complex component is multiplied by $2 \pi i f$.

In [19]:
#start with a filter that has the right size, then replace hs

PI2 = np.pi * 2
deriv_filter = close.make_spectrum()
deriv_filter.hs = PI2 * 1j * deriv_filter.fs
deriv_filter.plot()
decorate(xlabel='Frequency (1/day)', ylabel='Amplitude ratio')

Now we can apply the derivative filter to the spectrum of closing prices:

In [20]:
deriv_spectrum = close.make_spectrum().differentiate()

deriv_spectrum.plot()
decorate(xlabel='Frequency (1/day)', ylabel='Amplitude')

The results are similar to what we got from np.diff, with some differences due to (1) the difference window is only a coarse approximation of the derivative, especially at higher frequencies, and (2) the spectral derivative is based on the assumption that the signal is periodic, so the behavior at the beginning and end is different.

In [21]:
deriv = deriv_spectrum.make_wave()
len(deriv), len(change)
Out[21]:
(1976, 1975)
In [22]:
deriv = deriv_spectrum.make_wave()
change.plot(alpha=0.5)
deriv.plot(alpha=0.5)
decorate(xlabel='Time (day)')

We can see the differences more clearly by zooming in on a slice:

In [23]:
low, high = 0, 50
plt.plot(change.ys[low:high], label='diff')
plt.plot(deriv.ys[low:high], label='deriv')
decorate(xlabel='Time (day)', ylabel='Price change ($)')

The diffs and the spectral derivative are similar in many places, but sometimes substantially different.

Here's the difference between the derivative filter and the difference filter:

In [24]:
deriv_filter.plot()
diff_filter.plot()
decorate(xlabel='Frequency (1/day)', ylabel='Amplitude ratio')

The difference filter does not amplify the highest frequencies as much, which is why the diffs are smoother than the derivative.

Integration

Now let's think about integration. We can compute the filter for integration analytically: each frequency component gets divided by $2 \pi i f$.

I plot the result on a log-y scale so we can see it more clearly.

In [25]:
#start with a copy of the deriv filter and replace the hs
integ_filter = deriv_filter.copy()
integ_filter.hs[1:] = 1 / (PI2 * 1j * integ_filter.fs[1:])

# set the component at freq=0 to infinity
integ_filter.hs[0] = np.inf

integ_filter.plot()
decorate(xlabel='Frequency (1/day)', ylabel='Amplitude ratio', 
                 yscale='log')

We can confirm that the integration filter is correct by applying it to the spectrum of the derivative we just computed:

In [26]:
integ_spectrum = deriv_spectrum.copy().integrate()
integ_spectrum.plot()
decorate(xlabel='Frequency (1/day)', ylabel='Amplitude')
decorate(yscale='log')

And then converting back to a wave. The result is identical to the daily closing prices we started with, but shifted so the mean is 0.

The reason the mean is 0 is that the derivative clobbers the first element of the spectrum, which is the bias. Once the bias information is lost, integration can't restore it. So the result has an unspecified constant of integration.

In [27]:
close.plot(label='closing prices', alpha=0.7)

integ_spectrum.hs[0] = 0
integ_wave = integ_spectrum.make_wave()
integ_wave.plot(label='integrated derivative', alpha=0.7)
decorate(xlabel='Time (day)', ylabel='Price ($)')
In [28]:
shift = np.mean(close.ys) - np.mean(integ_wave.ys)

diff = integ_wave.ys - close.ys + shift
np.max(np.abs(diff))
Out[28]:
9.947598300641403e-14

Cumulative sum

In the same way that the diff operator approximates differentiation, the cumulative sum approximates integration.

I'll demonstrate with a Sawtooth signal.

In [29]:
from thinkdsp import SawtoothSignal

in_wave = SawtoothSignal(freq=50).make_wave(duration=0.1, framerate=44100)
in_wave.unbias()
in_wave.plot()
decorate(xlabel='Time (s)')

Here's the spectrum before the cumulative sum:

In [30]:
in_spectrum = in_wave.make_spectrum()
in_spectrum.plot()
decorate(xlabel='Frequency (Hz)', ylabel='Amplitude')

The output wave is the cumulative sum of the input

In [31]:
out_wave = in_wave.cumsum()
out_wave.unbias()
out_wave.plot()
decorate(xlabel='Time (s)')

And here's its spectrum

In [32]:
out_spectrum = out_wave.make_spectrum()
out_spectrum.plot()
decorate(xlabel='Frequency (Hz)', ylabel='Amplitude')

Now we compute the ratio of the output to the input:

In [33]:
sum(in_spectrum.amps < 1), len(in_spectrum)
Out[33]:
(1765, 2206)

In between the harmonics, the input componenents are small, so I set those ratios to NaN.

In [34]:
ratio_spectrum = out_spectrum.ratio(in_spectrum, thresh=1)
ratio_spectrum.plot(marker='.', ms=4, ls='')

decorate(xlabel='Frequency (Hz)',
         ylabel='Amplitude ratio',
         yscale='log')

To get the cumsum filter, I compute the diff filter again and invert it.

In [35]:
# compute the diff filter
diff_window = np.array([1.0, -1.0])
padded = zero_pad(diff_window, len(in_wave))
diff_wave = Wave(padded, framerate=in_wave.framerate)
diff_filter = diff_wave.make_spectrum()
In [36]:
# compute the cumsum filter by inverting the diff filter
cumsum_filter = diff_filter.copy()
cumsum_filter.hs[1:] = 1 / cumsum_filter.hs[1:]
cumsum_filter.hs[0] = np.inf
In [37]:
# compute the integration filter
integ_filter = cumsum_filter.copy()
integ_filter.hs[1:] = integ_filter.framerate / (PI2 * 1j * integ_filter.fs[1:])
integ_filter.hs[0] = np.inf
In [38]:
cumsum_filter.plot(label='cumsum filter', alpha=0.7)
integ_filter.plot(label='integral filter', alpha=0.7)

decorate(xlabel='Frequency (Hz)',
         ylabel='Amplitude ratio',
         yscale='log')

Finally, we can compare the computed ratios to the filter. They match, confirming that the cumsum filter is the inverse of the diff filter.

In [39]:
cumsum_filter.plot(label='cumsum filter')
ratio_spectrum.plot(label='ratio', marker='.', ms=4, ls='')
decorate(xlabel='Frequency (Hz)',
         ylabel='Amplitude ratio',
         yscale='log')

Now we can compute the output wave using the convolution theorem, and compare the results:

In [40]:
len(in_spectrum), len(cumsum_filter)
Out[40]:
(2206, 2206)
In [41]:
out_wave.plot(label='summed', alpha=0.7)

cumsum_filter.hs[0] = 0
out_wave2 = (in_spectrum * cumsum_filter).make_wave()
out_wave2.plot(label='filtered', alpha=0.7)

decorate(xlabel='Time (s)')

They are the same, within floating point error.

In [43]:
out_wave.max_diff(out_wave2)
Out[43]:
7.7982065249671e-13
In [ ]: