Remarque (vu avec Quentin dans le bus)
Filtrer l'échantillon de son avant d'appliquer l'algorithme HPS!!!!
In this post, we're building a pitch tracking algorithm for the purpose of identifying the pitches of a guitar that needs to be tuned. This outline of this post is:
from pylab import *
%matplotlib inline
First, let us display the waveform and spectrum of our recording. The sound is a recording of the 6 strings of a fully tuned guitar. The strings are played one after another, two times. Our reference frequencies (the ones our algorithm should find) are thus:
from scipy.io import wavfile
rate, waveform = wavfile.read("files/tuning_6guitarstrings2.wav")
We define a timescale using the sampling frequency obtained from the wavefile.
t = arange(waveform.size) / float(rate)
figure(figsize=(12, 4))
plot(t, waveform)
xlabel('time (s)')
ylabel('amplitude (a. u.)')
grid(True)
To give your ear an idea how this sounds, we can embed this recording directly in this page using the IPython display tools:
from IPython.display import Audio
Audio(waveform, rate=rate)
Let's now plot the spectrogram of this sound.
s = specgram(waveform, Fs=rate, NFFT=1024)
xlabel("time (microseconds)")
ylabel("frequency (Hz)")
title("Spectrogram of guitar sound")
ylim(0, rate/2)
(0, 4000.0)
In the following, we're going to work with a single frame. To this end, we write a function that allows us to extract a single frame and times from our original sound:
def select_frame(t, waveform, start_time, duration):
time_selection= (t>start_time) & (t<start_time + duration)
return (t[time_selection], waveform[time_selection])
selected_t, frame = select_frame(t, waveform, 29, 1)
figure(figsize=(12, 4))
plot(selected_t, frame)
xlabel('time (s)')
ylabel('amplitude (a. u.)')
grid(True)
spectrum = fft(hanning(frame.size) * frame)
freqs = linspace(0, 1, spectrum.size) * rate
spectrum *= 1 / (1 + freqs**2)
cepstrum = ifft((log10(abs(spectrum)**2)))# - mean(spectrum))))
plot(abs(cepstrum))
[<matplotlib.lines.Line2D at 0x109845160>]
print(argmax(abs(cepstrum[10:cepstrum.size/2])) + 10)
(argmax(abs(cepstrum[10:cepstrum.size/2])) + 10) * freqs.max()/freqs.size * 2
54
108.01350168771097
def get_cepstrum_freq(frame, rate):
spectrum = fft(frame)
freqs = linspace(0, 1, spectrum.size) * rate
spectrum *= 1 / (1 + freqs**2)
cepstrum = fft((abs(spectrum)))
max_index = argmax(abs(cepstrum[10:int(cepstrum.size/2)])) + 10
return (freqs[1] - freqs[0]) * max_index
get_cepstrum_freq(frame, rate)
54.013503375843968
The principle behind this algorithm is simple: using the frequency spectrum of a sound, it sums a certain number of harmonics of a certain range of frequencies. The outcome of this process usually yields a number of peaks that define the possible candidate frequencies for detection.
Let's try to implement this algorithm. First, we define a minimum and maximum frequency for our algorithm to work on:
min_freq, max_freq = 20., 1000.
Then, we compute the discrete Fourier transform of our signal and the discretized frequency spectrum.
spectrum = fft(hanning(frame.size) * frame)
freqs = linspace(0, 1, spectrum.size) * rate
The next step involves filtering the spectrum we just obtained. I just choose a low pass filter using the equation:
$$ a(f) = \frac{1}{1 + f^2} $$spectrum *= 1 / (1 + freqs**2)
plot(freqs, abs(spectrum))
ylim(0, abs(spectrum).mean() * 50)
xlim(0, 1000)
(0, 1000)
We now determine the indices range that we will take into account during our HPS computation.
min_index = (freqs > min_freq).nonzero()[0][0]
max_index = (freqs < max_freq).nonzero()[0][-1]
min_index, max_index
(20, 999)
Finally, we define the number of harmonics to take into account and compute the harmonic product spectrum.
harmonics = 2
hps = abs(spectrum.copy())
hps[:min_index] = 0
for i in range(min_index, max_index):
for j in range(1, harmonics):
#print(i * j)
hps[i] *= abs(spectrum[i * j])
#break
fig = figure(figsize=(10, 4))
plot(freqs, hps)
xlim(50, 500)
grid(True)
argmax(hps)
1329
freqs[argmax(hps)]
1329.3323330832709
freqs[argmax(hps)] / 146
2.0142021806821568
As we can see, the spectrum is reduced to a succession of peaks. Here, the loudest measured pitch has the following frequency:
Let's now define a function that applies the HPS algorithm to any sound. I added the following "tricks" to the function:
def compute_hps_pitch(waveform, rate,
min_freq, max_freq,
harmonics,
frequency_resolution=1.0,
debug=False):
# adjusting frequency resolution by zero padding
df = rate / float(waveform.size)
n = waveform.size * df
spectrum = rfft(waveform, n=int(n))
freqs = linspace(0, 0.5, spectrum.size) * rate
# computing indices
min_index = (freqs > min_freq).nonzero()[0][0]
max_index = (freqs < max_freq).nonzero()[0][-1]
# applying algorithm
hps = zeros_like(spectrum)
for i in range(min_index, max_index):
ampl = abs(spectrum[i])
for j in range(2, harmonics + 1):
ampl *= abs(spectrum[i * j])
hps[i] = ampl
if debug:
return freqs, hps
else:
return freqs[argmax(hps)]
Note: we use the real valued FFT in this case.
spectrum = rfft(hanning(frame.size) * (frame - frame.mean()))
freqs = linspace(0, 0.5, spectrum.size) * rate
freqs.shape
(4000,)
min_freq, max_freq = 40., 1000.
spectrum[freqs < min_freq] = 0.
spectrum[freqs > 3 * max_freq] = 0.
#spectrum *= 1 / (1 + freqs**2)
plot(freqs, abs(spectrum))
[<matplotlib.lines.Line2D at 0x110662860>]
We need to implement some sort of gates to build any sort of periodic function for windowing:
def make_window(frequency_vector, center_frequency, half_width=10):
"""
args: frequencies from FFT (in Hz), center frequency for likelihood estimation (in Hz),
half-width for estimation (in Hz)
"""
df = frequency_vector[1] - frequency_vector[0]
octave_index = int(center_frequency / df)
frequency_window = zeros((int(frequency_vector.size / octave_index) + 1,
octave_index))
print(frequency_window.shape)
#1/0
for i in range(frequency_window.shape[0] - 1):
width = int((i + 1) * half_width/df)
if width == 0: width + 1
print(i, width)
frequency_window[i, -width:] = True
frequency_window[i+1, :width] = True
return frequency_window
return frequency_window.ravel()[:frequency_vector.size]
imshow(make_window(freqs, 82), interpolation='nearest')
colorbar()
(50, 81) 0 9 1 19 2 29 3 39 4 49 5 59 6 69 7 79 8 89 9 99 10 109 11 119 12 129 13 139 14 149 15 159 16 169 17 179 18 189 19 199 20 209 21 219 22 229 23 239 24 249 25 259 26 269 27 279 28 289 29 299 30 309 31 319 32 329 33 339 34 349 35 359 36 369 37 379 38 389 39 399 40 409 41 419 42 429 43 439 44 449 45 459 46 469 47 479 48 489
<matplotlib.colorbar.Colorbar at 0x11a7f7588>
%pdb
Automatic pdb calling has been turned ON
plot(make_window(freqs, 82))
(50, 81)
[<matplotlib.lines.Line2D at 0x112e7ffd0>]
50 * 81
4050
%timeit make_window(freqs, 82)
1000 loops, best of 3: 315 µs per loop
from IPython.html.widgets import interact
interact(lambda f: plot(make_window(freqs, f, 2)),
f=(1, 1000))
We can now apply the algorithm in 1 Hz steps as a matrix vector product. We first build the matrix.
m = zeros((max_freq - min_freq, spectrum.size))
for ind, f in enumerate(range(int(min_freq), int(max_freq))):
m[ind, :] = make_window(freqs, f, half_width=2)
matshow(m)
<matplotlib.image.AxesImage at 0x112db6d68>
m.shape
(960, 4000)
spectrum.shape
(4000,)
Now the matrix vector product:
ml = dot(m, abs(spectrum))
ml = (dot(m, spectrum))
plot(range(int(min_freq), int(max_freq)), abs(ml))
xlim(0, 500)
grid(True)
argmax(abs(ml))
72
We can now analyze the sound we recorded earlier chunk by chunk to find the underlying frequencies. We consider the pitch detection incorrect if the general amplitude of the sound is below 1/10 of the maximum amplitude of the sound we're analyzing due to the fact that the HPS algorithm always returns an output.
chunk_length = 3
chunks = arange(0, t.max(), chunk_length)
computed_freqs = zeros_like(chunks)
for ind, start_t in enumerate(chunks):
time_selection= (t>start_t) & (t<start_t + chunk_length)
frame = waveform[time_selection]
if abs(frame).max() < abs(waveform).max() / 10.:
computed_freqs[ind] = nan
else:
#computed_freqs[ind] = compute_hps_pitch(frame, rate, 20, 1000, 4)
computed_freqs[ind] = get_cepstrum_freq(frame, rate)
plot(chunks + chunk_length/2., computed_freqs, label='pitch (Hz)', lw=3)
s = specgram(waveform, Fs=rate, NFFT=1024)
xlabel("time (seconds)")
ylabel("frequency (Hz)")
title("Spectrogram of guitar sound")
ylim(0, 2000)
#xticks(arange(0, 5, 0.5))
grid(True)
legend()
<matplotlib.legend.Legend at 0x1166f2a20>
computed_freqs
array([ 32.00266689, 30.66922244, 24.00200017, 24.00200017, 24.00200017, 18.00150013, 18.00150013, 27.00225019, 27.00225019, 10.66755563, 10.66755563, 32.00266689, 8.00066672, 8.00066672, 8.59878835])
We can look at it in an interactive way:
s[2].shape
(386,)
def interact_with_HPS(start_t, duration):
time_selection= (t>start_t) & (t<start_t + duration)
frame = waveform[time_selection]
frequency = compute_hps_pitch(frame, rate, 20, 1000, 4)
print(frequency)
subplot(211)
pcolormesh(s[2], s[1], 20*log10(s[0]), cmap='jet')
xlabel("time (seconds)")
ylabel("frequency (Hz)")
title("Spectrogram of guitar sound")
ylim(0, 2000)
#xticks(arange(0, 5, 0.5))
grid(True)
vlines(start_t, ylim()[0], ylim()[1])
vlines(start_t + duration, ylim()[0], ylim()[1])
subplot(212)
plot(t[time_selection], frame)
from IPython.html.widgets import interact
interact(interact_with_HPS,
start_t=(0, waveform.size / rate, 0.1),
duration=(0.1, 2, 0.1))
443.110777694
The pitches we have found are the following:
This seems to work with the parameters I have chosen to use. However, it seems that the algorithm is quite sensitive to the number of harmonics and other things. To try this out, I will use the interactive IPython machinery to determine a good parameter set for my problem.
def redraw_spectrogram():
# redraws spectrogram without recomputing it
imshow((10 * log10(s[0])), origin='lower', aspect='auto', extent=(s[2].min(), s[2].max(), s[1].min(), s[1].max()))
def explore_HPS_params(chunk_length, min_freq, max_freq, harmonics):
chunks = arange(0, t.max(), chunk_length)
computed_freqs = zeros_like(chunks)
for ind, start_t in enumerate(chunks):
time_selection= (t>start_t) & (t<start_t + chunk_length)
frame = waveform[time_selection]
if abs(frame).max() < abs(waveform).max() / 10.:
computed_freqs[ind] = nan
else:
computed_freqs[ind] = compute_hps_pitch(frame, rate, 20, 1000, 4)
plot(chunks + chunk_length/2., computed_freqs)
redraw_spectrogram()
xlabel("time (microseconds)")
ylabel("frequency (Hz)")
title("Spectrogram of guitar sound")
ylim(0, 700)
grid(True);
from IPython.html.widgets import interact, fixed
interact(explore_HPS_params,
chunk_length=(0.1, 1.5, 0.1),
min_freq=fixed(20),
max_freq=fixed(700),
harmonics=(1, 10))
Based on my exploration, I reckon that a number of harmonics equal to 4 works quite well for my recorded guitar. We can also visualize the output of the HPS algorithm as a function of the chunk we are watching.
def explore_HPS_chunks(chunk_index, harmonics):
start_t = chunks[chunk_index]
time_selection= (t>start_t) & (t<start_t + chunk_length)
frame = waveform[time_selection]
freqs, hps = compute_hps_pitch(frame, rate, 20, 1000, harmonics, debug=True)
subplot(211)
vlines(start_t, 2 * waveform.min(), 2 * waveform.max(), 'r', lw=10)
plot(t, waveform)
subplot(212)
plot(freqs, hps)
xlim(0, 1000)
interact(explore_HPS_chunks,
chunk_index=(0, chunks.size - 1),
harmonics=(1, 10))
<function __main__.explore_HPS_chunks>
As we can see, the HPS algorithm outputs a series of peaks from which the highest is selected as the pitch.