In audio processing, it is common to operate on one frame at a time, where a frame is a short, often overlapping segment of the signal. Frames are typically 10 to 100 ms long.
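At the 44.1 kHz sampling rate used below, 10 to 100 ms corresponds to 441 to 4410 samples. The 1024-sample frames used later last about 23 ms. The two helpers below are hypothetical names written for this note, not part of Essentia. They show the conversion in both directions:

```python
# Convert between frame duration (ms) and frame length (samples).
# fs = 44100 Hz matches the sampling rate used in this section.

def ms_to_samples(ms, fs=44100.0):
    """Number of samples in a frame lasting `ms` milliseconds."""
    return int(round(ms * 1e-3 * fs))

def samples_to_ms(n, fs=44100.0):
    """Duration in milliseconds of a frame of `n` samples."""
    return 1e3 * n / fs

print(ms_to_samples(10), ms_to_samples(100))  # 441 4410
print(round(samples_to_ms(1024), 1))          # 23.2
```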
essentia.standard.FrameGenerator
For this workshop, we will use essentia.standard.FrameGenerator to segment our audio signal.
Let's create a sine sweep whose frequency rises exponentially from 110 Hz to 880 Hz (three octaves) over three seconds. Then we will segment the signal and compute the zero crossing rate of each frame.
First, set our parameters:
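from numpy import logspace
T = 3.0       # duration in seconds
fs = 44100.0  # sampling rate in Hz
# Exponentially spaced frequencies: 440*2**-2 = 110 Hz up to just under 440*2**1 = 880 Hz.
f0 = 440*logspace(-2, 1, int(T*fs), endpoint=False, base=2.0)
print(f0.min(), f0.max())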
110.0 879.9861686
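The exponent passed to logspace runs linearly from -2 to 1 over the three seconds. So the frequency at time t (in seconds) is 440·2^(t-2) = 110·2^t Hz. In other words, it rises one octave per second. A quick check, using plain NumPy instead of the pylab namespace:

```python
import numpy as np

T = 3.0
fs = 44100.0
n = int(T * fs)
f0 = 440 * np.logspace(-2, 1, n, endpoint=False, base=2.0)

# The exponent steps linearly from -2 to 1, so f0(t) = 110 * 2**t:
# the frequency doubles (rises one octave) every second.
for sec in (0, 1, 2):
    print(sec, f0[int(sec * fs)])  # approximately 110, 220, 440 Hz
```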
Create the signal:
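import essentia
from numpy import cumsum, sin, pi
# Integrate the frequency to get the phase, so the pitch really sweeps from 110 Hz to 880 Hz.
phase = 2*pi*cumsum(f0)/fs
x = essentia.array(sin(phase))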
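A note on the phase: a sinusoid's instantaneous frequency is the time derivative of its phase divided by 2π. If the phase is written as 2π·f0(t)·t for a time-varying f0, the derivative picks up an extra t·f0'(t) term, so the pitch overshoots the intended sweep. Accumulating f0/fs sample by sample (a discrete integral of frequency) keeps the instantaneous frequency equal to f0. The NumPy-only comparison below rebuilds the same 110·2^t Hz sweep. inst_freq is a helper defined only for this check.

```python
import numpy as np

T, fs = 3.0, 44100.0
n = int(T * fs)
t = np.arange(n) / fs
f0 = 110 * 2.0 ** t  # same sweep: 110 Hz rising to just under 880 Hz

# Naive phase: multiply the time-varying frequency by t.
phase_naive = 2 * np.pi * f0 * t
# Integrated phase: a running sum of f0 / fs approximates the integral of frequency.
phase_integrated = 2 * np.pi * np.cumsum(f0) / fs

def inst_freq(phase):
    """Instantaneous frequency in Hz, from the phase increment per sample."""
    return np.diff(phase) * fs / (2 * np.pi)

print(round(inst_freq(phase_naive)[-1]))       # far above 880 Hz
print(round(inst_freq(phase_integrated)[-1]))  # about 880 Hz
```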
Listen to the signal:
from IPython.display import Audio
Audio(x, rate=fs)
For each frame, compute the zero crossing rate and plot it:
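from essentia.standard import FrameGenerator, ZeroCrossingRate
from matplotlib.pyplot import semilogy
zcr = ZeroCrossingRate()
frame_sz = 1024  # samples per frame (about 23 ms at 44.1 kHz)
hop_sz = 512     # samples between successive frame starts (50% overlap)
semilogy([zcr(frame) for frame in FrameGenerator(x, frameSize=frame_sz, hopSize=hop_sz)])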
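For a pure sinusoid, the zero crossing rate tracks frequency. A sine crosses zero twice per period, so its ZCR (sign changes per sample) is about 2f/fs. That is roughly 0.005 at 110 Hz and 0.04 at 880 Hz, which is why the curve climbs as the sweep rises. The helper below is a plain-NumPy definition written for this note, not Essentia's implementation. Essentia's ZeroCrossingRate has its own threshold parameter, so its values may differ slightly.

```python
import numpy as np

fs = 44100.0

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs (plain-NumPy definition)."""
    positive = frame >= 0
    return np.mean(positive[:-1] != positive[1:])

t = np.arange(int(fs)) / fs  # one second of sample times
for f in (110.0, 440.0, 880.0):
    x = np.sin(2 * np.pi * f * t)
    # A sine crosses zero twice per period, so ZCR is roughly 2 * f / fs.
    print(f, zero_crossing_rate(x), 2 * f / fs)
```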
for Loops
If you prefer not to use Essentia to segment your signal, you can use a standard for loop:
# Stop early enough that every slice is a full frame_sz samples long.
semilogy([zcr(x[i:i+frame_sz]) for i in range(0, len(x) - frame_sz + 1, hop_sz)])
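The list comprehension copies one slice per frame. With NumPy 1.20 or later, the same full-frame segmentation can be done without a Python loop by using sliding_window_view. It returns a read-only strided view instead of copies. A sketch (frame_signal is a name made up here):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def frame_signal(x, frame_sz, hop_sz):
    """Return a (n_frames, frame_sz) view of x, keeping only full frames."""
    return sliding_window_view(x, frame_sz)[::hop_sz]

x = np.arange(10)
frames = frame_signal(x, frame_sz=4, hop_sz=2)
print(frames.shape)  # (4, 4)
print(frames[-1])    # [6 7 8 9]

# Number of full frames: 1 + (len(x) - frame_sz) // hop_sz
assert len(frames) == 1 + (len(x) - 4) // 2
```

For a 3-second signal at 44.1 kHz with frame_sz = 1024 and hop_sz = 512, this gives 1 + (132300 - 1024) // 512 = 257 full frames.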
Segmentation is also the first step in computing a spectrogram: window each frame, then take its magnitude spectrum.
from numpy import array
from essentia.standard import Spectrum, Windowing, FrameGenerator
hamming_window = Windowing(type='hamming')
spectrum = Spectrum()  # magnitude spectrum only
spectrogram = array([spectrum(hamming_window(frame))
                     for frame in FrameGenerator(x, frameSize=1024, hopSize=500)])
print(spectrogram.shape)
(266, 513)
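Each row of spectrogram holds the 1024/2 + 1 = 513 non-negative-frequency bins of one 1024-sample frame. Bin k corresponds to k·fs/1024 Hz, so bins are about 43 Hz apart at 44.1 kHz. For comparison, here is a NumPy-only sketch of the same pipeline. It uses an unnormalized np.hamming window and keeps only full frames. Essentia's Windowing and FrameGenerator have their own defaults (e.g. normalization and padding), so magnitudes and frame counts will not match exactly.

```python
import numpy as np

def magnitude_spectrogram(x, frame_sz=1024, hop_sz=500):
    """Hamming-windowed magnitude spectra of full frames, one row per frame."""
    window = np.hamming(frame_sz)
    starts = range(0, len(x) - frame_sz + 1, hop_sz)
    frames = np.array([x[i:i + frame_sz] * window for i in starts])
    # rfft keeps only the non-negative frequencies: frame_sz // 2 + 1 bins.
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 44100.0
t = np.arange(int(fs)) / fs
S = magnitude_spectrogram(np.sin(2 * np.pi * 440.0 * t))
print(S.shape)  # (87, 513)

# Bin k is centred on k * fs / frame_sz Hz (about 43.07 Hz per bin here).
peak_bin = S.mean(axis=0).argmax()
print(peak_bin, peak_bin * fs / 1024)  # within one bin of 440 Hz
```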
from matplotlib.pyplot import imshow, xlabel, ylabel
imshow(spectrogram.T, origin='lower', aspect='auto', interpolation='nearest')
ylabel('Spectral Bin Index')
xlabel('Frame Index')
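The plot above uses frame and bin indices as axes. To label the axes in seconds and hertz instead, you can pass an extent to imshow. Because magnitudes span several orders of magnitude, converting to decibels usually helps too. The helpers below are hypothetical names for this sketch. They centre row k on k·fs/frame_sz Hz and give each frame a width of one hop.

```python
import numpy as np

def spectrogram_extent(n_frames, n_bins, hop_sz, frame_sz, fs):
    """(left, right, bottom, top) for imshow(..., extent=...), in seconds and Hz.

    Each column spans one hop; row k is centred on k * fs / frame_sz Hz.
    """
    bin_hz = fs / frame_sz
    return (0.0, n_frames * hop_sz / fs, -0.5 * bin_hz, (n_bins - 0.5) * bin_hz)

def to_db(S, floor=1e-10):
    """Convert magnitudes to decibels; the floor avoids log10(0)."""
    return 20 * np.log10(np.maximum(S, floor))

# Values from the spectrogram above: 266 frames, 513 bins, hop 500, frame 1024.
extent = spectrogram_extent(266, 513, hop_sz=500, frame_sz=1024, fs=44100.0)
print(extent)
```

You could then call imshow(to_db(spectrogram.T), origin='lower', aspect='auto', extent=extent) and label the axes 'Time (s)' and 'Frequency (Hz)'.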
(There are easier ways to display a spectrogram, e.g. matplotlib.pyplot.specgram or librosa.display.specshow. This example only illustrates segmentation in Essentia.)