ThinkDSP¶

This notebook contains solutions to exercises in Chapter 4: Noise

License: Creative Commons Attribution 4.0 International

In [1]:

# Get thinkdsp.py

import os

if not os.path.exists('thinkdsp.py'):
    !wget https://github.com/AllenDowney/ThinkDSP/raw/master/code/thinkdsp.py

In [2]:

import numpy as np
import matplotlib.pyplot as plt

from thinkdsp import decorate

Exercise 1¶

``A Soft Murmur'' is a web site that plays a mixture of natural noise sources, including rain, waves, wind, etc. At http://asoftmurmur.com/about/ you can find their list of recordings, most of which are at http://freesound.org.

Download a few of these files and compute the spectrum of each signal. Does the power spectrum look like white noise, pink noise, or Brownian noise? How does the spectrum vary over time?

In [3]:

if not os.path.exists('132736__ciccarelli__ocean-waves.wav'):
    !wget https://github.com/AllenDowney/ThinkDSP/raw/master/code/132736__ciccarelli__ocean-waves.wav

In [4]:

from thinkdsp import read_wave

wave = read_wave('132736__ciccarelli__ocean-waves.wav')
wave.make_audio()

Out[4]:

I chose a recording of ocean waves. I selected a short segment:

In [5]:

segment = wave.segment(start=1.5, duration=1.0)
segment.make_audio()

Out[5]:

And here's its spectrum:

In [6]:

spectrum = segment.make_spectrum()
spectrum.plot_power()
decorate(xlabel='Frequency (Hz)',
         ylabel='Power')

Amplitude drops off with frequency, so this might be red or pink noise. We can check by looking at the power spectrum on a log-log scale.

In [7]:

spectrum.plot_power()

loglog = dict(xscale='log', yscale='log')
decorate(xlabel='Frequency (Hz)',
         ylabel='Power', 
         **loglog)

This structure, with increasing and then decreasing amplitude, seems to be common in natural noise sources.

Above $f = 10^3$, it might be dropping off linearly, but we can't really tell.

To see how the spectrum changes over time, I'll select another segment:

In [8]:

segment2 = wave.segment(start=2.5, duration=1.0)
segment2.make_audio()

Out[8]:

And plot the two spectrums:

In [9]:

spectrum2 = segment2.make_spectrum()

spectrum.plot_power(alpha=0.5)
spectrum2.plot_power(alpha=0.5)
decorate(xlabel='Frequency (Hz)',
         ylabel='Power')

Here they are again, plotting power on a log-log scale.

In [10]:

spectrum.plot_power(alpha=0.5)
spectrum2.plot_power(alpha=0.5)
decorate(xlabel='Frequency (Hz)',
         ylabel='Power',
         **loglog)

So the structure seems to be consistent over time.

We can also look at a spectrogram:

In [11]:

segment.make_spectrogram(512).plot(high=5000)
decorate(xlabel='Time(s)', ylabel='Frequency (Hz)')

Within this segment, the overall amplitude drops off, but the mixture of frequencies seems consistent.

Exercise: In a noise signal, the mixture of frequencies changes over time. In the long run, we expect the power at all frequencies to be equal, but in any sample, the power at each frequency is random.

To estimate the long-term average power at each frequency, we can break a long signal into segments, compute the power spectrum for each segment, and then compute the average across the segments. You can read more about this algorithm at http://en.wikipedia.org/wiki/Bartlett%27s_method.

Implement Bartlett's method and use it to estimate the power spectrum for a noise wave. Hint: look at the implementation of make_spectrogram.

In [12]:

from thinkdsp import Spectrum

def bartlett_method(wave, seg_length=512, win_flag=True):
    """Estimates the power spectrum of a noise wave.
    
    wave: Wave
    seg_length: segment length
    """
    # make a spectrogram and extract the spectrums
    spectro = wave.make_spectrogram(seg_length, win_flag)
    spectrums = spectro.spec_map.values()
    
    # extract the power array from each spectrum
    psds = [spectrum.power for spectrum in spectrums]
    
    # compute the root mean power (which is like an amplitude)
    hs = np.sqrt(sum(psds) / len(psds))
    fs = next(iter(spectrums)).fs
    
    # make a Spectrum with the mean amplitudes
    spectrum = Spectrum(hs, fs, wave.framerate)
    return spectrum

bartlett_method makes a spectrogram and extracts spec_map, which maps from times to Spectrum objects. It computes the PSD for each spectrum, adds them up, and puts the results into a Spectrum object.

In [13]:

psd = bartlett_method(segment)
psd2 = bartlett_method(segment2)

psd.plot_power()
psd2.plot_power()

decorate(xlabel='Frequency (Hz)', 
         ylabel='Power', 
         **loglog)

Now we can see the relationship between power and frequency more clearly. It is not a simple linear relationship, but it is consistent across different segments, even in details like the notches near 5000 Hz, 6000 Hz, and above 10,000 Hz.

Exercise 2¶

At coindesk you can download the daily price of a BitCoin as a CSV file. Read this file and compute the spectrum of BitCoin prices as a function of time. Does it resemble white, pink, or Brownian noise?

In [14]:

if not os.path.exists('BTC_USD_2013-10-01_2020-03-26-CoinDesk.csv'):
    !wget https://github.com/AllenDowney/ThinkDSP/raw/master/code/BTC_USD_2013-10-01_2020-03-26-CoinDesk.csv

In [15]:

import pandas as pd

df = pd.read_csv('BTC_USD_2013-10-01_2020-03-26-CoinDesk.csv', 
                 parse_dates=[0])
df

Out[15]:

	Currency	Date	Closing Price (USD)	24h Open (USD)	24h High (USD)	24h Low (USD)
0	BTC	2013-10-01	123.654990	124.304660	124.751660	122.563490
1	BTC	2013-10-02	125.455000	123.654990	125.758500	123.633830
2	BTC	2013-10-03	108.584830	125.455000	125.665660	83.328330
3	BTC	2013-10-04	118.674660	108.584830	118.675000	107.058160
4	BTC	2013-10-05	121.338660	118.674660	121.936330	118.005660
...	...	...	...	...	...	...
2354	BTC	2020-03-22	5884.340133	6187.042146	6431.873162	5802.553402
2355	BTC	2020-03-23	6455.454688	5829.352511	6620.858253	5694.198299
2356	BTC	2020-03-24	6784.318011	6455.450650	6863.602196	6406.037439
2357	BTC	2020-03-25	6706.985089	6784.325204	6981.720386	6488.111885
2358	BTC	2020-03-26	6721.495392	6697.948320	6796.053701	6537.856462

2359 rows × 6 columns

In [16]:

ys = df['Closing Price (USD)']
ts = df.index

In [17]:

from thinkdsp import Wave

wave = Wave(ys, ts, framerate=1)
wave.plot()
decorate(xlabel='Time (days)')

In [18]:

spectrum = wave.make_spectrum()
spectrum.plot_power()
decorate(xlabel='Frequency (1/days)',
         ylabel='Power', 
         **loglog)

In [19]:

spectrum.estimate_slope()[0]

Out[19]:

-1.733254093675893

Red noise should have a slope of -2. The slope of this PSD is close to 1.7, so it's hard to say if we should consider it red noise or if we should say it's a kind of pink noise.

Exercise 3¶

A Geiger counter is a device that detects radiation. When an ionizing particle strikes the detector, it outputs a surge of current. The total output at a point in time can be modeled as uncorrelated Poisson (UP) noise, where each sample is a random quantity from a Poisson distribution, which corresponds to the number of particles detected during an interval.

Write a class called UncorrelatedPoissonNoise that inherits from _Noise and provides evaluate. It should use np.random.poisson to generate random values from a Poisson distribution. The parameter of this function, lam, is the average number of particles during each interval. You can use the attribute amp to specify lam. For example, if the framerate is 10 kHz and amp is 0.001, we expect about 10 “clicks” per second.

Generate about a second of UP noise and listen to it. For low values of amp, like 0.001, it should sound like a Geiger counter. For higher values it should sound like white noise. Compute and plot the power spectrum to see whether it looks like white noise.

In [20]:

from thinkdsp import Noise

class UncorrelatedPoissonNoise(Noise):
    """Represents uncorrelated Poisson noise."""

    def evaluate(self, ts):
        """Evaluates the signal at the given times.

        ts: float array of times
        
        returns: float wave array
        """
        ys = np.random.poisson(self.amp, len(ts))
        return ys

Here's what it sounds like at low levels of "radiation".

In [21]:

amp = 0.001
framerate = 10000
duration = 1

signal = UncorrelatedPoissonNoise(amp=amp)
wave = signal.make_wave(duration=duration, framerate=framerate)
wave.make_audio()

Out[21]:

To check that things worked, we compare the expected number of particles and the actual number:

In [22]:

expected = amp * framerate * duration
actual = sum(wave.ys)
print(expected, actual)

10.0 11

Here's what the wave looks like:

In [23]:

wave.plot()

And here's its power spectrum on a log-log scale.

In [24]:

spectrum = wave.make_spectrum()
spectrum.plot_power()
decorate(xlabel='Frequency (Hz)',
         ylabel='Power',
         **loglog)

Looks like white noise, and the slope is close to 0.

In [25]:

spectrum.estimate_slope().slope

Out[25]:

-0.0011109546801804139

With a higher arrival rate, it sounds more like white noise:

In [26]:

amp = 1
framerate = 10000
duration = 1

signal = UncorrelatedPoissonNoise(amp=amp)
wave = signal.make_wave(duration=duration, framerate=framerate)
wave.make_audio()

Out[26]:

It looks more like a signal:

In [27]:

wave.plot()

And the spectrum converges on Gaussian noise.

In [28]:

import matplotlib.pyplot as plt

def normal_prob_plot(sample, fit_color='0.8', **options):
    """Makes a normal probability plot with a fitted line.

    sample: sequence of numbers
    fit_color: color string for the fitted line
    options: passed along to Plot
    """
    n = len(sample)
    xs = np.random.normal(0, 1, n)
    xs.sort()
    
    ys = np.sort(sample)
    
    mean, std = np.mean(sample), np.std(sample)
    fit_ys = mean + std * xs
    plt.plot(xs, fit_ys, color='gray', alpha=0.5, label='model')

    plt.plot(xs, ys, **options)

In [29]:

spectrum = wave.make_spectrum()
spectrum.hs[0] = 0

normal_prob_plot(spectrum.real, label='real')
decorate(xlabel='Normal sample',
        ylabel='Power')

In [30]:

normal_prob_plot(spectrum.imag, label='imag', color='C1')
decorate(xlabel='Normal sample')

Exercise 4¶

The algorithm in this chapter for generating pink noise is conceptually simple but computationally expensive. There are more efficient alternatives, like the Voss-McCartney algorithm. Research this method, implement it, compute the spectrum of the result, and confirm that it has the desired relationship between power and frequency.

Solution: The fundamental idea of this algorithm is to add up several sequences of random numbers that get updates at different sampling rates. The first source should get updated at every time step; the second source every other time step, the third source ever fourth step, and so on.

In the original algorithm, the updates are evenly spaced. In an alternative proposed at http://www.firstpr.com.au/dsp/pink-noise/, they are randomly spaced.

My implementation starts with an array with one row per timestep and one column for each of the white noise sources. Initially, the first row and the first column are random and the rest of the array is Nan.

In [31]:

nrows = 100
ncols = 5

array = np.empty((nrows, ncols))
array.fill(np.nan)
array[0, :] = np.random.random(ncols)
array[:, 0] = np.random.random(nrows)
array[0:6]

Out[31]:

array([[0.39517038, 0.33346713, 0.80747793, 0.23050029, 0.12329066],
       [0.93480849,        nan,        nan,        nan,        nan],
       [0.61007844,        nan,        nan,        nan,        nan],
       [0.74423927,        nan,        nan,        nan,        nan],
       [0.43421634,        nan,        nan,        nan,        nan],
       [0.07031545,        nan,        nan,        nan,        nan]])

The next step is to choose the locations where the random sources change. If the number of rows is $n$, the number of changes in the first column is $n$, the number in the second column is $n/2$ on average, the number in the third column is $n/4$ on average, etc.

So the total number of changes in the matrix is $2n$ on average; since $n$ of those are in the first column, the other $n$ are in the rest of the matrix.

To place the remaining $n$ changes, we generate random columns from a geometric distribution with $p=0.5$. If we generate a value out of bounds, we set it to 0 (so the first column gets the extras).

In [32]:

p = 0.5
n = nrows
cols = np.random.geometric(p, n)
cols[cols >= ncols] = 0
cols

Out[32]:

array([4, 1, 3, 2, 1, 3, 1, 2, 1, 3, 1, 1, 4, 2, 1, 2, 1, 1, 1, 1, 1, 0,
       1, 1, 2, 1, 2, 0, 2, 1, 1, 2, 0, 1, 3, 1, 2, 4, 0, 1, 1, 4, 1, 3,
       1, 1, 1, 3, 2, 1, 3, 0, 3, 3, 4, 1, 4, 2, 1, 1, 1, 1, 2, 2, 1, 1,
       1, 2, 2, 1, 1, 3, 2, 1, 1, 1, 1, 4, 1, 4, 3, 1, 3, 1, 2, 1, 0, 1,
       3, 2, 3, 3, 1, 0, 1, 1, 2, 2, 2, 2])

Within each column, we choose a random row from a uniform distribution. Ideally we would choose without replacement, but it is faster and easier to choose with replacement, and I doubt it matters.

In [33]:

rows = np.random.randint(nrows, size=n)
rows

Out[33]:

array([81, 74, 76, 24, 12, 56, 95,  4,  1, 12, 95, 44, 50, 76, 95, 34,  4,
       91, 39, 38, 51, 13, 48, 54, 30,  9, 31,  4, 28, 79, 75, 45, 29, 65,
       53, 83, 12, 53, 66, 88, 63, 76, 97, 28, 60, 99, 27, 84, 11,  8, 30,
       71, 18,  4,  6, 57, 19, 58, 74, 37, 32, 53, 12, 91, 72, 36, 26, 87,
        0, 94, 74, 69, 48, 91, 12, 50, 49, 64, 39,  3, 37, 91, 30, 27, 81,
       79, 39, 45, 76, 86, 47, 54, 79,  0, 36, 86, 76, 43, 48,  8])

Now we can put random values at rach of the change points.

In [34]:

array[rows, cols] = np.random.random(n)
array[0:6]

Out[34]:

array([[0.75218982, 0.33346713, 0.33151149, 0.23050029, 0.12329066],
       [0.93480849, 0.54326904,        nan,        nan,        nan],
       [0.61007844,        nan,        nan,        nan,        nan],
       [0.74423927,        nan,        nan,        nan, 0.12887454],
       [0.32223056, 0.44575854, 0.08476646, 0.88175009,        nan],
       [0.07031545,        nan,        nan,        nan,        nan]])

Next we want to do a zero-order hold to fill in the NaNs. NumPy doesn't do that, but Pandas does. So I'll create a DataFrame:

In [35]:

df = pd.DataFrame(array)
df.head()

Out[35]:

	0	1	2	3	4
0	0.752190	0.333467	0.331511	0.23050	0.123291
1	0.934808	0.543269	NaN	NaN	NaN
2	0.610078	NaN	NaN	NaN	NaN
3	0.744239	NaN	NaN	NaN	0.128875
4	0.322231	0.445759	0.084766	0.88175	NaN

And then use fillna along the columns.

In [36]:

filled = df.fillna(method='ffill', axis=0)
filled.head()

Out[36]:

	0	1	2	3	4
0	0.752190	0.333467	0.331511	0.23050	0.123291
1	0.934808	0.543269	0.331511	0.23050	0.123291
2	0.610078	0.543269	0.331511	0.23050	0.123291
3	0.744239	0.543269	0.331511	0.23050	0.128875
4	0.322231	0.445759	0.084766	0.88175	0.128875

Finally we add up the rows.

In [37]:

total = filled.sum(axis=1)
total.head()

Out[37]:

0    1.770959
1    2.163380
2    1.838650
3    1.978395
4    1.863380
dtype: float64

If we put the results into a Wave, here's what it looks like:

In [38]:

wave = Wave(total.values)
wave.plot()

Here's the whole process in a function:

In [39]:

def voss(nrows, ncols=16):
    """Generates pink noise using the Voss-McCartney algorithm.
    
    nrows: number of values to generate
    rcols: number of random sources to add
    
    returns: NumPy array
    """
    array = np.empty((nrows, ncols))
    array.fill(np.nan)
    array[0, :] = np.random.random(ncols)
    array[:, 0] = np.random.random(nrows)
    
    # the total number of changes is nrows
    n = nrows
    cols = np.random.geometric(0.5, n)
    cols[cols >= ncols] = 0
    rows = np.random.randint(nrows, size=n)
    array[rows, cols] = np.random.random(n)

    df = pd.DataFrame(array)
    df.fillna(method='ffill', axis=0, inplace=True)
    total = df.sum(axis=1)

    return total.values

To test it I'll generate 11025 values:

In [40]:

ys = voss(11025)
ys

Out[40]:

array([ 7.74931667,  8.11992447,  7.81759903, ..., 10.21400096,
        9.58848738, 10.08793761])

And make them into a Wave:

In [41]:

wave = Wave(ys)
wave.unbias()
wave.normalize()

Here's what it looks like:

In [42]:

wave.plot()

As expected, it is more random-walk-like than white noise, but more random looking than red noise.

Here's what it sounds like:

In [43]:

wave.make_audio()

Out[43]:

And here's the power spectrum:

In [44]:

spectrum = wave.make_spectrum()
spectrum.hs[0] = 0
spectrum.plot_power()
decorate(xlabel='Frequency (Hz)',
         ylabel='Power',
         **loglog)

The estimated slope is close to -1.

In [45]:

spectrum.estimate_slope().slope

Out[45]:

-0.9961871315528988

We can get a better sense of the average power spectrum by generating a longer sample:

In [46]:

seg_length = 64 * 1024
iters = 100
wave = Wave(voss(seg_length * iters))
len(wave)

Out[46]:

And using Barlett's method to compute the average.

In [47]:

spectrum = bartlett_method(wave, seg_length=seg_length, win_flag=False)
spectrum.hs[0] = 0
len(spectrum)

Out[47]:

It's pretty close to a straight line, with some curvature at the highest frequencies.

In [48]:

spectrum.plot_power()
decorate(xlabel='Frequency (Hz)',
         ylabel='Power',
         **loglog)

And the slope is close to -1.

In [49]:

spectrum.estimate_slope().slope

Out[49]:

-1.0020572884106542

In [ ]: