Remarque (vu avec Quentin dans le bus)

Filtrer l'échantillon de son avant d'appliquer l'algorithme HPS!!!!

In [ ]:

In this post, we're building a pitch tracking algorithm for the purpose of identifying the pitches of a guitar that needs to be tuned. This outline of this post is:

we analyze a guitar sound that I have recorded
we implement two algorithms for pitch detection that we apply to the previous sound
- HPS (harmonic product spectrum) method
- The cepstrum method

In [1]:

from pylab import *
%matplotlib inline

Analyzing a guitar sound¶

First, let us display the waveform and spectrum of our recording. The sound is a recording of the 6 strings of a fully tuned guitar. The strings are played one after another, two times. Our reference frequencies (the ones our algorithm should find) are thus:

82.4 Hz
110 Hz
146.8 Hz
196 Hz
247 Hz
329.6 Hz

In [2]:

from scipy.io import wavfile

In [3]:

rate, waveform = wavfile.read("files/tuning_6guitarstrings2.wav")

We define a timescale using the sampling frequency obtained from the wavefile.

In [4]:

t = arange(waveform.size) / float(rate)

In [5]:

figure(figsize=(12, 4))
plot(t, waveform)
xlabel('time (s)')
ylabel('amplitude (a. u.)')
grid(True)

To give your ear an idea how this sounds, we can embed this recording directly in this page using the IPython display tools:

In [6]:

from IPython.display import Audio

In [7]:

Audio(waveform, rate=rate)

Out[7]:

Let's now plot the spectrogram of this sound.

In [8]:

s = specgram(waveform, Fs=rate, NFFT=1024)
xlabel("time (microseconds)")
ylabel("frequency (Hz)")
title("Spectrogram of guitar sound")
ylim(0, rate/2)

Out[8]:

(0, 4000.0)

In the following, we're going to work with a single frame. To this end, we write a function that allows us to extract a single frame and times from our original sound:

In [9]:

def select_frame(t, waveform, start_time, duration):
    time_selection= (t>start_time) & (t<start_time + duration)
    return (t[time_selection], waveform[time_selection])

In [281]:

selected_t, frame = select_frame(t, waveform, 29, 1)

In [282]:

figure(figsize=(12, 4))
plot(selected_t, frame)
xlabel('time (s)')
ylabel('amplitude (a. u.)')
grid(True)

The cepstrum method¶

In [197]:

spectrum = fft(hanning(frame.size) * frame)
freqs = linspace(0, 1, spectrum.size) * rate

In [198]:

spectrum *= 1 / (1 + freqs**2)

In [199]:

cepstrum = ifft((log10(abs(spectrum)**2)))# - mean(spectrum))))

In [200]:

plot(abs(cepstrum))

Out[200]:

[<matplotlib.lines.Line2D at 0x109845160>]

In [194]:

print(argmax(abs(cepstrum[10:cepstrum.size/2])) + 10)
(argmax(abs(cepstrum[10:cepstrum.size/2])) + 10) * freqs.max()/freqs.size * 2

Out[194]:

108.01350168771097

In [195]:

def get_cepstrum_freq(frame, rate):
    spectrum = fft(frame)
    freqs = linspace(0, 1, spectrum.size) * rate
    spectrum *= 1 / (1 + freqs**2)
    cepstrum = fft((abs(spectrum)))
    max_index = argmax(abs(cepstrum[10:int(cepstrum.size/2)])) + 10
    return (freqs[1] - freqs[0]) * max_index

In [169]:

get_cepstrum_freq(frame, rate)

Out[169]:

54.013503375843968

The HPS algorithm¶

How it works¶

The principle behind this algorithm is simple: using the frequency spectrum of a sound, it sums a certain number of harmonics of a certain range of frequencies. The outcome of this process usually yields a number of peaks that define the possible candidate frequencies for detection.

Implementation¶

Let's try to implement this algorithm. First, we define a minimum and maximum frequency for our algorithm to work on:

In [170]:

min_freq, max_freq = 20., 1000.

Then, we compute the discrete Fourier transform of our signal and the discretized frequency spectrum.

In [201]:

spectrum = fft(hanning(frame.size) * frame)
freqs = linspace(0, 1, spectrum.size) * rate

The next step involves filtering the spectrum we just obtained. I just choose a low pass filter using the equation:

$$ a(f) = \frac{1}{1 + f^2} $$

In [202]:

spectrum *= 1 / (1 + freqs**2)

In [203]:

plot(freqs, abs(spectrum))
ylim(0, abs(spectrum).mean() * 50)
xlim(0, 1000)

Out[203]:

(0, 1000)

We now determine the indices range that we will take into account during our HPS computation.

In [204]:

min_index = (freqs > min_freq).nonzero()[0][0]
max_index = (freqs < max_freq).nonzero()[0][-1]

In [205]:

min_index, max_index

Out[205]:

(20, 999)

Finally, we define the number of harmonics to take into account and compute the harmonic product spectrum.

In [214]:

harmonics = 2

In [215]:

hps = abs(spectrum.copy())
hps[:min_index] = 0
for i in range(min_index, max_index):
    for j in range(1, harmonics):
        #print(i * j)
        hps[i] *= abs(spectrum[i * j])
        
    #break

In [216]:

fig = figure(figsize=(10, 4))
plot(freqs, hps)
xlim(50, 500)
grid(True)

In [209]:

argmax(hps)

Out[209]:

In [210]:

freqs[argmax(hps)]

Out[210]:

1329.3323330832709

In [186]:

freqs[argmax(hps)] / 146

Out[186]:

2.0142021806821568

As we can see, the spectrum is reduced to a succession of peaks. Here, the loudest measured pitch has the following frequency:

Let's now define a function that applies the HPS algorithm to any sound. I added the following "tricks" to the function:

zero padding of the input signal to give more precise pitch estimation (1 Hz resolution by default)
a "debug output mode" that allows to visualize the computed harmonic product spectrum

In [187]:

def compute_hps_pitch(waveform, rate, 
                      min_freq, max_freq, 
                      harmonics,
                      frequency_resolution=1.0,
                      debug=False):
    # adjusting frequency resolution by zero padding
    df = rate / float(waveform.size)
    n = waveform.size * df
    spectrum = rfft(waveform, n=int(n))
    freqs = linspace(0, 0.5, spectrum.size) * rate
    
    # computing indices
    min_index = (freqs > min_freq).nonzero()[0][0]
    max_index = (freqs < max_freq).nonzero()[0][-1] 
    
    # applying algorithm
    hps = zeros_like(spectrum)
    for i in range(min_index, max_index):
        ampl = abs(spectrum[i])
        for j in range(2, harmonics + 1):
            ampl *= abs(spectrum[i * j])
        hps[i] = ampl
    if debug:
        return freqs, hps
    else:
        return freqs[argmax(hps)]

Maximum likelihood estimator¶

Note: we use the real valued FFT in this case.

In [335]:

spectrum = rfft(hanning(frame.size) * (frame - frame.mean()))
freqs = linspace(0, 0.5, spectrum.size) * rate

In [336]:

freqs.shape

Out[336]:

(4000,)

In [337]:

min_freq, max_freq = 40., 1000.

In [338]:

spectrum[freqs < min_freq] = 0.

In [339]:

spectrum[freqs > 3 * max_freq] = 0.

In [340]:

#spectrum *= 1 / (1 + freqs**2)

In [347]:

plot(freqs, abs(spectrum))

Out[347]:

[<matplotlib.lines.Line2D at 0x110662860>]

We need to implement some sort of gates to build any sort of periodic function for windowing:

In [428]:

def make_window(frequency_vector, center_frequency, half_width=10):
    """
    args: frequencies from FFT (in Hz), center frequency for likelihood estimation (in Hz),
    half-width for estimation (in Hz)
    """
    df = frequency_vector[1] - frequency_vector[0]
    octave_index = int(center_frequency / df)
    frequency_window = zeros((int(frequency_vector.size / octave_index) + 1, 
                            octave_index))
    print(frequency_window.shape)
    #1/0
    for i in range(frequency_window.shape[0] - 1):
        width = int((i + 1) * half_width/df)
        if width == 0: width + 1
        print(i, width)
        frequency_window[i, -width:] = True
        frequency_window[i+1, :width] = True
    return frequency_window
    return frequency_window.ravel()[:frequency_vector.size]

In [429]:

imshow(make_window(freqs, 82), interpolation='nearest')
colorbar()

Out[429]:

<matplotlib.colorbar.Colorbar at 0x11a7f7588>

In [369]:

%pdb

Automatic pdb calling has been turned ON

In [420]:

plot(make_window(freqs, 82))

(50, 81)

Out[420]:

[<matplotlib.lines.Line2D at 0x112e7ffd0>]

In [421]:

50 * 81

Out[421]:

In [349]:

%timeit make_window(freqs, 82)

1000 loops, best of 3: 315 µs per loop

In [350]:

from IPython.html.widgets import interact

In [375]:

interact(lambda f: plot(make_window(freqs, f, 2)),
         f=(1, 1000))

We can now apply the algorithm in 1 Hz steps as a matrix vector product. We first build the matrix.

In [404]:

m = zeros((max_freq - min_freq, spectrum.size))
for ind, f in enumerate(range(int(min_freq), int(max_freq))):
    m[ind, :] = make_window(freqs, f, half_width=2)

In [405]:

matshow(m)

Out[405]:

<matplotlib.image.AxesImage at 0x112db6d68>

In [397]:

m.shape

Out[397]:

(960, 4000)

In [398]:

spectrum.shape

Out[398]:

(4000,)

Now the matrix vector product:

In [412]:

ml = dot(m, abs(spectrum))

In [414]:

ml = (dot(m, spectrum))

In [417]:

plot(range(int(min_freq), int(max_freq)), abs(ml))
xlim(0, 500)
grid(True)

In [418]:

argmax(abs(ml))

Out[418]:

In [ ]:

Analysis of the recorded guitar sound¶

We can now analyze the sound we recorded earlier chunk by chunk to find the underlying frequencies. We consider the pitch detection incorrect if the general amplitude of the sound is below 1/10 of the maximum amplitude of the sound we're analyzing due to the fact that the HPS algorithm always returns an output.

In [206]:

chunk_length = 3
chunks = arange(0, t.max(), chunk_length)
computed_freqs = zeros_like(chunks)
for ind, start_t in enumerate(chunks):
    time_selection= (t>start_t) & (t<start_t + chunk_length)
    frame = waveform[time_selection]
    if abs(frame).max() < abs(waveform).max() / 10.:
        computed_freqs[ind] = nan
    else:
        #computed_freqs[ind] = compute_hps_pitch(frame, rate, 20, 1000, 4)
        computed_freqs[ind] = get_cepstrum_freq(frame, rate)

In [207]:

plot(chunks + chunk_length/2., computed_freqs, label='pitch (Hz)', lw=3)
s = specgram(waveform, Fs=rate, NFFT=1024)
xlabel("time (seconds)")
ylabel("frequency (Hz)")
title("Spectrogram of guitar sound")
ylim(0, 2000)
#xticks(arange(0, 5, 0.5))
grid(True)
legend()

Out[207]:

<matplotlib.legend.Legend at 0x1166f2a20>

In [208]:

computed_freqs

Out[208]:

array([ 32.00266689,  30.66922244,  24.00200017,  24.00200017,
        24.00200017,  18.00150013,  18.00150013,  27.00225019,
        27.00225019,  10.66755563,  10.66755563,  32.00266689,
         8.00066672,   8.00066672,   8.59878835])

We can look at it in an interactive way:

In [47]:

s[2].shape

Out[47]:

(386,)

In [74]:

def interact_with_HPS(start_t, duration):
    time_selection= (t>start_t) & (t<start_t + duration)
    frame = waveform[time_selection]
    
    frequency = compute_hps_pitch(frame, rate, 20, 1000, 4)
    print(frequency)
    
    subplot(211)
    pcolormesh(s[2], s[1], 20*log10(s[0]), cmap='jet')
    xlabel("time (seconds)")
    ylabel("frequency (Hz)")
    title("Spectrogram of guitar sound")
    ylim(0, 2000)
    #xticks(arange(0, 5, 0.5))
    grid(True)

    vlines(start_t, ylim()[0], ylim()[1])   
    vlines(start_t + duration, ylim()[0], ylim()[1])
    
    subplot(212)
    plot(t[time_selection], frame)

In [75]:

from IPython.html.widgets import interact

In [76]:

interact(interact_with_HPS,
         start_t=(0, waveform.size / rate, 0.1),
         duration=(0.1, 2, 0.1))

443.110777694

The pitches we have found are the following:

Exploration of the output as a function of the algorithm parameters¶

This seems to work with the parameters I have chosen to use. However, it seems that the algorithm is quite sensitive to the number of harmonics and other things. To try this out, I will use the interactive IPython machinery to determine a good parameter set for my problem.

In [237]:

def redraw_spectrogram():
    # redraws spectrogram without recomputing it
    imshow((10 * log10(s[0])), origin='lower', aspect='auto', extent=(s[2].min(), s[2].max(), s[1].min(), s[1].max()))

In [244]:

def explore_HPS_params(chunk_length, min_freq, max_freq, harmonics):
    chunks = arange(0, t.max(), chunk_length)
    computed_freqs = zeros_like(chunks)
    for ind, start_t in enumerate(chunks):
        time_selection= (t>start_t) & (t<start_t + chunk_length)
        frame = waveform[time_selection]
        if abs(frame).max() < abs(waveform).max() / 10.:
            computed_freqs[ind] = nan
        else:
            computed_freqs[ind] = compute_hps_pitch(frame, rate, 20, 1000, 4)
    plot(chunks + chunk_length/2., computed_freqs)
    redraw_spectrogram()
    xlabel("time (microseconds)")
    ylabel("frequency (Hz)")
    title("Spectrogram of guitar sound")
    ylim(0, 700)
    grid(True);

In [245]:

from IPython.html.widgets import interact, fixed

In [246]:

interact(explore_HPS_params,
         chunk_length=(0.1, 1.5, 0.1),
         min_freq=fixed(20),
         max_freq=fixed(700),
         harmonics=(1, 10))

Based on my exploration, I reckon that a number of harmonics equal to 4 works quite well for my recorded guitar. We can also visualize the output of the HPS algorithm as a function of the chunk we are watching.

In [198]:

def explore_HPS_chunks(chunk_index, harmonics):
    start_t = chunks[chunk_index]
    time_selection= (t>start_t) & (t<start_t + chunk_length)
    frame = waveform[time_selection]
    freqs, hps = compute_hps_pitch(frame, rate, 20, 1000, harmonics, debug=True)
    subplot(211)
    vlines(start_t, 2 * waveform.min(), 2 * waveform.max(), 'r', lw=10)
    plot(t, waveform)
    subplot(212)
    plot(freqs, hps)
    xlim(0, 1000)
    

In [199]:

interact(explore_HPS_chunks, 
         chunk_index=(0, chunks.size - 1),
         harmonics=(1, 10))

Out[199]:

<function __main__.explore_HPS_chunks>

As we can see, the HPS algorithm outputs a series of peaks from which the highest is selected as the pitch.

In [ ]: