This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.

11.6. Applying digital filters to speech sounds

PYTHON 2 VERSION

You need the pydub package: you can install it with pip install pydub. (https://github.com/jiaaro/pydub/)

This package requires the open-source multimedia library FFmpeg for the decompression of MP3 files. (http://www.ffmpeg.org)

  1. Let's import the packages.
In [ ]:
import urllib
import urllib2
import cStringIO
import numpy as np
import scipy.signal as sg
import pydub
import matplotlib.pyplot as plt
from IPython.display import Audio, display
%matplotlib inline
  2. We create a Python function to generate a sound from an English sentence. This function uses Google's Text-To-Speech (TTS) API. We retrieve the sound in MP3 format and convert it to the Wave format with pydub. Finally, we retrieve the raw sound data by removing the wave header with NumPy.
In [ ]:
def speak(sentence):
    url = "http://translate.google.com/translate_tts?tl=en&q=" + \
          urllib.quote_plus(sentence)
    req = urllib2.Request(url, headers={'User-Agent': ''}) 
    mp3 = urllib2.urlopen(req).read()
    # We convert the mp3 bytes to wav.
    audio = pydub.AudioSegment.from_mp3(cStringIO.StringIO(mp3))
    wave = audio.export(cStringIO.StringIO(), format='wav')
    wave.reset()
    wave = wave.read()
    # We get the raw data by dropping the first 24 int16 values
    # (48 bytes), which skips past the WAV header.
    x = np.frombuffer(wave, np.int16)[24:] / 2.**15
    return x, audio.frame_rate
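
Slicing off a fixed number of values works when the header has the expected size. As a sketch of an alternative (not the recipe's method), the standard library's wave module can parse the header instead. The helper name wav_bytes_to_float and the synthetic 440 Hz test tone are assumptions made for illustration, and the helper only handles mono 16-bit PCM data.

```python
import contextlib
import io
import wave

import numpy as np


def wav_bytes_to_float(wav_bytes):
    """Decode mono 16-bit PCM WAV bytes into samples scaled to [-1, 1)."""
    with contextlib.closing(wave.open(io.BytesIO(wav_bytes), 'rb')) as f:
        if f.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        rate = f.getframerate()
        # readframes() returns only the sample data, without the header.
        frames = f.readframes(f.getnframes())
    x = np.frombuffer(frames, dtype='<i2') / 2.**15
    return x, rate


# Build a small in-memory WAV file to try the helper on:
# 0.1 s of a 440 Hz tone at 8 kHz, mono, 16-bit (hypothetical test data).
rate = 8000
t_demo = np.arange(int(0.1 * rate)) / float(rate)
tone = (0.5 * 2**15 * np.sin(2 * np.pi * 440. * t_demo)).astype('<i2')
buf = io.BytesIO()
with contextlib.closing(wave.open(buf, 'wb')) as f:
    f.setnchannels(1)
    f.setsampwidth(2)
    f.setframerate(rate)
    f.writeframes(tone.tobytes())

x_demo, fr_demo = wav_bytes_to_float(buf.getvalue())
```

Here x_demo holds the tone rescaled to floating point and fr_demo is the sampling rate read from the header, so no byte offset has to be hard-coded.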
  3. We create a function that plays a sound (represented by a NumPy vector) in the notebook, using IPython's Audio class.
In [ ]:
def play(x, fr, autoplay=False):
    display(Audio(x, rate=fr, autoplay=autoplay))
  4. Let's play the sound "Hello world". We also display the waveform with matplotlib.
In [ ]:
x, fr = speak("Hello world")
play(x, fr)
plt.figure(figsize=(6,3));
# float() avoids integer division in Python 2 when fr is an int.
t = np.linspace(0., len(x) / float(fr), len(x))
plt.plot(t, x, lw=1);
  5. Now, we will hear the effect of a Butterworth low-pass filter applied to this sound (500 Hz cutoff frequency).
In [ ]:
b, a = sg.butter(4, 500./(fr/2.), 'low')
x_fil = sg.filtfilt(b, a, x)
In [ ]:
play(x_fil, fr)
plt.figure(figsize=(6,3));
plt.plot(t, x, lw=1);
plt.plot(t, x_fil, lw=1);

We hear a muffled voice: the filter has removed most of the energy above 500 Hz.
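
To see what the low-pass filter does without the speech recording, here is a minimal sketch on a synthetic signal. The 16 kHz sampling rate and the 200 Hz and 3000 Hz test tones are assumptions chosen for illustration; the filter design is the same as in the step above.

```python
import numpy as np
import scipy.signal as sg

fr_syn = 16000                              # assumed sampling rate (Hz)
t_syn = np.arange(fr_syn) / float(fr_syn)   # one second of samples
low = np.sin(2 * np.pi * 200. * t_syn)      # tone below the cutoff
high = np.sin(2 * np.pi * 3000. * t_syn)    # tone above the cutoff
x_syn = low + high

# Same design as above: 4th-order Butterworth, 500 Hz cutoff,
# applied forward and backward with filtfilt (zero phase shift).
b, a = sg.butter(4, 500. / (fr_syn / 2.), 'low')
x_syn_fil = sg.filtfilt(b, a, x_syn)

# Away from the edges, the output should be close to the 200 Hz tone alone.
core = slice(fr_syn // 10, len(x_syn) - fr_syn // 10)
err = np.sqrt(np.mean((x_syn_fil[core] - low[core]) ** 2))
print("RMS difference from the low tone: %.5f" % err)
```

Because filtfilt runs the filter in both directions, the filtered signal stays aligned in time with the original, which is why the two curves can be compared sample by sample.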

  6. And now with a high-pass filter (1000 Hz cutoff frequency).
In [ ]:
b, a = sg.butter(4, 1000./(fr/2.), 'high')
x_fil = sg.filtfilt(b, a, x)
In [ ]:
play(x_fil, fr)
plt.figure(figsize=(6,3));
plt.plot(t, x, lw=1);
plt.plot(t, x_fil, lw=1);

It sounds like a phone call.
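
The effect of the high-pass filter can also be checked numerically by evaluating its frequency response with scipy.signal.freqz. This is a sketch under an assumed 16 kHz sampling rate (the real value is whatever speak() returns), and the three probe frequencies are arbitrary choices for illustration.

```python
import numpy as np
import scipy.signal as sg

fr_assumed = 16000.   # assumed sampling rate (Hz)
fc = 1000.            # cutoff frequency (Hz), as in the step above
b, a = sg.butter(4, fc / (fr_assumed / 2.), 'high')

# Evaluate the response at a few frequencies, given in radians per sample.
probe_hz = np.array([100., 1000., 4000.])
w, h = sg.freqz(b, a, worN=2 * np.pi * probe_hz / fr_assumed)

gain_single = np.abs(h)            # one pass of the filter
gain_filtfilt = gain_single ** 2   # filtfilt applies the filter twice

for f_hz, g1, g2 in zip(probe_hz, gain_single, gain_filtfilt):
    print("%6.0f Hz: single pass %.4f, filtfilt %.4f" % (f_hz, g1, g2))
```

A single pass of a Butterworth filter has a gain of about 0.707 (-3 dB) at the cutoff. Since filtfilt applies the filter forward and backward, the effective gain at the cutoff is about 0.5, and frequencies well below the cutoff are attenuated more strongly than with a single pass.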

  7. Finally, we can create a simple widget to quickly test the effect of a high-pass filter with an arbitrary cutoff frequency.
In [ ]:
from IPython.html import widgets
@widgets.interact(t=(100., 5000., 100.))
def highpass(t):
    b, a = sg.butter(4, t/(fr/2.), 'high')
    x_fil = sg.filtfilt(b, a, x)
    play(x_fil, fr, autoplay=True)

You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).

IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).