I'm taking a short break from my Cyclone V Tutorials to read up a bit on digital signal processing. This will be useful background for the next project in the series, which involves real-time audio processing.
Like all of the other Columbia EE and CompE majors, I took the EE department's infamous Signals and Systems course, which provided an introduction to analog signal processing. However, I never got to chance to take the introductory digital signal processing course. So other than implementing an FFT in VHDL for the final project in my embedded systems course, I don't have much experience when it comes to DSP.
To make up for this, I've started reading the rather excellent Scientist and Engineer's Guide to Digital Signal Processing by Dr. Steven Smith. This book has very clear and understandable explanations of key DSP concepts. One thing it lacks, however, is up-to-date code examples. The few pieces of code it does have are written in either BASIC or SHARC DSP Assembly. I thought I would remedy this by writing some code examples in a modern scientific computing language. This gave me the perfect opportunity to learn more about the Julia Programming Language and IPython Notebook.
Julia is a new programming language focused on scientific computing which is being developed at MIT. Its goals are to combine high expressiveness and high performance. Its syntax is similar to MATLAB, so if you've ever used MATLAB before, the code examples should look very familiar.
The core signal processing concepts involve quite a bit of advanced mathematics. I've tried to gloss over the math as much as possible here. If you're interested in the details, I recommend reading The Scientist and Engineer's Guide.
Signal processing is the art of analyzing and manipulating signals. A signal is simply a measurable quantity that varies in time and/or space. The key insight of signal processing is that a signal in time can be represented by a linear combination of sinusoids at different frequencies. There exists an operation called the Fourier transform, which takes a function of time $x(t)$ (which is called the time-domain signal) and computes the weights of its sinusoidal components. These weights are represented as a function of frequency $X(f)$ (called the frequency-domain signal). The Fourier transform takes a continuous function and produces another continuous function. In digital signal processing, since we operate on signals in discrete time, we use the discrete Fourier transform (DFT). This takes a set of $N$ samples in time and produces weights at $N$ different frequencies. Julia's signal processing library, like most common signal processing software packages, computes DFTs using a class of algorithms known as fast Fourier transforms (FFT).
To explain the use of Fourier transforms, let's look at an example. Let's examine what happens when we take the Fourier transform of several periods of a sinusoid at 440 Hz.
using PyPlot
freq = 440.0
N = 256
duration = 8 / freq
t = linspace(0, duration, N)
x = cos(2 * pi * freq * t)
plot(t, x)
1-element Array{Any,1}: PyObject <matplotlib.lines.Line2D object at 0xe8694d0>
We use the rfft
function (the real FFT function), because our input signal is composed entirely of real numbers (as opposed to complex numbers). This allows us to optimize by only computing the weights for frequencies from $1$ to $N / 2 + 1$. The higher frequencies are simply a mirror image of the lower ones, and so do not contain any extra information. We need to use abs
on the output of rfft
because the outputs are complex numbers. Right now we only care about the magnitude, not the phase of the frequency domain signal.
X = rfft(x)
sampRate = N / duration
f = linspace(0, sampRate / 2, int(N / 2) + 1)
magX = abs(X)
plot(f, magX)
1-element Array{Any,1}: PyObject <matplotlib.lines.Line2D object at 0x659afd0>
As you can see, the frequency domain signal is just a single spike at 440 Hz. This is pretty much what we expected, since our time-domain signal is just a single sinusoid. Let's see what we get when we add two sinusoids together. Let's add our original sinusoid to another at twice the frequency but half the amplitude.
chord = cos(2 * pi * freq * t) + 0.5 * cos(4 * pi * freq * t)
plot(t, chord)
1-element Array{Any,1}: PyObject <matplotlib.lines.Line2D object at 0x6ca3710>
Chord = abs(rfft(chord))
plot(f, Chord)
1-element Array{Any,1}: PyObject <matplotlib.lines.Line2D object at 0xab42990>
So we still see our original spike at 440 Hz, but now there's another spike at 880 Hz, which is half the height. So we see that the weights given to the sinusoids in the time domain correspond to value of the frequency domain signal at those frequencies. This is an important result, since it works the other way as well. If we can change the values at certain frequencies in the frequency domain, we can emphasize or de-emphasize certain sinusoidal components of the time-domain signal. This is the basis for all digital filters.
A filter is an operation which takes some signal as input and manipulates it to produce a different signal as the output. The simplest class of filters used in DSP are convolutional or finite impulse response (FIR) filters. They are called convolutional filters because they produce the output by convolving the input with another function called the filter kernel. Convolution is an operation in which each element of the output is the linear combination of some elements of the input. The weights come from the elements of the kernel. More formally, the discrete convolution is defined as
$$(f * h)[n] = \sum\limits_{m = -\infty}^\infty {f[n - m] \cdot h[m]}$$The $ * $ is the convolution operator. The filter kernel $h$ is also known as the impulse response because it is what the output will be if the input is an impulse function (an impulse function is a spike at an origin). In the definition of convolution given above, the summation index $m$ is unbounded in either direction, but for FIR filters the kernel is a finite array (hence finite impulse response), so $m$ simply goes across the indices of the kernel.
The reason convolution is used for filters is that convolution in the time domain corresponds to multiplication in the frequency domain. That is,
$$f(t) * h(t) \xrightarrow{\mathcal{F}} F(f) \cdot H(f)$$The $\xrightarrow{\mathcal{F}}$ symbol means "Fourier transform of". You can see how multiplying the frequency domain of the input by some known function would be useful if our goal is to emphasize or deemphasize certain sinusoids.
The simplest FIR filter is a moving average filter. This is exactly what it sounds like. Each element of the output is a weighted average of a certain number of elements of the input. The kernel for this filter would just be a constant function in the time domain. If $N$ is the number of elements we want to average, our kernel $h$ is defined such that
$$ h[n] = \frac{1}{N}, 0 < n \le N $$Moving average filters are useful if you want to get clean up random noise in the input. As an example, let's take our 440 Hz sinusoid and add some small random noise to it.
noise = randn(length(x)) / 20
noisy_x = x + noise
plot(t, noisy_x)
1-element Array{Any,1}: PyObject <matplotlib.lines.Line2D object at 0x1043f290>
Now let's apply our moving average filter and see if it cleans up the noise.
mavg_n = 16
mavg_h = ones(Float64, mavg_n) / mavg_n
# conv is the convolution function
filtered_x = conv(noisy_x, mavg_h)
t = [0:length(filtered_x) - 1] / sampRate
plot(t, filtered_x)
1-element Array{Any,1}: PyObject <matplotlib.lines.Line2D object at 0x448a850>
That's not too bad! It's a little bit weird on the sides because there are less points to add. This becomes less of a problem with larger inputs.
Moving average filters work great for removing noise, but what if we want to selectively remove parts of the signal? For instance, what if we only care about certain frequencies, but want to filter out the others. This is very common in RF communications, where each "channel" is a different frequency.
The simplest filter of this sort is a low-pass filter. This is a filter which allows sinusoids below a critical frequency to go through unchanged, while attenuating signals above the critical frequency (hence why it is called low-pass). An ideal low-pass filter has a frequency response (the Fourier transform of the impulse response) defined as follows ...
$$ H(f) = \left \{ \begin{array}{lr} 1, & f < f_c \\ 0, & f \ge f_c \\ \end{array} \right. $$The impulse response of such a filter is the sinc function
$$ h(t) = sinc(2 \pi f_c t) = \frac{\sin(2 \pi f_c t)}{2 \pi f_c t} $$Unfortunately, the sinc function is infinite in either direction, but we can only use a finite kernel. Therefore, the filter we end up using is a portion of the sinc function centered around the origin, with the two tails on either side cut off. This is appropriately named a windowed-sinc filter.
sinc_w = 75
t = [-sinc_w:sinc_w] / sampRate
freqCrit = 660.0
sinc_h = sin(2 * pi * freqCrit * t) ./ (2 * pi * freqCrit * t)
# need to fix the singularity at the origin
sinc_h[sinc_w + 1] = 1
# normalize so that the sum is 1.0
sinc_h ./= sum(sinc_h)
plot(t, sinc_h)
1-element Array{Any,1}: PyObject <matplotlib.lines.Line2D object at 0xae3bd10>
Of course, since this function is cut off, the corresponding frequency response isn't ideal anymore. Let's see what it looks like.
sinc_H = abs(rfft(sinc_h))
f = linspace(0, sampRate / 2, sinc_w + 1)
plot(f, sinc_H)
1-element Array{Any,1}: PyObject <matplotlib.lines.Line2D object at 0x7436110>
That's not so great. There's a lot of ripple in the passband (the part below the critical frequency), and it goes to 0 a bit too slowly in the stopband (the part above the critical frequency). The problem here is the sharp discontinuity at the edges of the filter kernel. The DSP guide suggests that we can solve this problem by multiplying our sinc function with a Hamming window, a function which is 1.0 in the center and then rolls off at the edges.
M = 2 * sinc_w
i = [0:M]
hamming_w = 0.54 - 0.46 * cos(2 * pi * i / M)
plot(t, hamming_w)
1-element Array{Any,1}: PyObject <matplotlib.lines.Line2D object at 0x5f33250>
win_sinc_h = hamming_w .* sinc_h
win_sinc_h ./= sum(win_sinc_h)
plot(t, win_sinc_h)
1-element Array{Any,1}: PyObject <matplotlib.lines.Line2D object at 0x4987a10>
As you can see, multiplying by the Hamming window causes our filter kernel to roll off a lot more gradually at the edges. Let's see how that affects our frequency response.
Win_Sinc_H = abs(rfft(win_sinc_h))
plot(f, Win_Sinc_H)
1-element Array{Any,1}: PyObject <matplotlib.lines.Line2D object at 0xe936e50>
This frequency response is pretty close to ideal. The passband has very little ripple, and the stop band is fully attenuated, which is exactly what we want from a low-pass filter. The only difference between this and an ideal low-pass filter is that the slope at the critical frequency (known as the gain) is not infinite. We of course cannot expect infinite gain from a finite filter, but the gain can be adjusted by changing the size of our filter. The larger the filter, the larger the gain.
Let's see what happens when we run our chord signal through the low-pass filter.
filtered_x = conv(chord, win_sinc_h)
t = [0:length(filtered_x)-1] / sampRate
plot(t, filtered_x)
1-element Array{Any,1}: PyObject <matplotlib.lines.Line2D object at 0x10c95f90>
This looks pretty close to the original single sinusoid signal. But again, the edges are weird. Let's see what it looks like in the frequency domain.
filtered_X = abs(rfft(filtered_x))
f = linspace(0, sampRate / 2, length(filtered_X))
plot(f, filtered_X)
1-element Array{Any,1}: PyObject <matplotlib.lines.Line2D object at 0x8c52a50>
That's pretty good. It's a bit messier than the original, but there is clearly only one major spike.
What would we do if we wanted to filter out the lower frequency and isolate the higher one? For that, we would need the opposite of a low-pass filter, which is called, you guessed it, a high-pass filter. We can construct a high-pass filter from a low-pass filter quite easily by flipping the frequency response vertically. That is
$$H_{high}(f) = 1 - H_{low}(f)$$The impulse response of the high-pass filter is the inverse fourier transform of the frequency response. As we have seen before, the fourier transform is a linear operation, so the impulse response of the high-pass is the inverse Fourier transform of the constant 1 minus the impulse response of the low-pass filter.
$$h_{high}(t) = \delta(t) - h_{low}(f)$$The inverse fourier transform of 1 is $\delta(t)$, the delta impulse function. In discrete time, $\delta(t)$ is a function which is 1 at the origin and 0 everywhere else.
delta = zeros(Float32, 2 * sinc_w + 1)
delta[sinc_w + 1] = 1.0
t = [-sinc_w:sinc_w] / sampRate
plot(t, delta)
1-element Array{Any,1}: PyObject <matplotlib.lines.Line2D object at 0xe954c90>
Delta = abs(rfft(delta))
# set the edges to different numbers so you can clearly see the scale
f = linspace(0, sampRate / 2, sinc_w + 1)
ylim([0.0, 1.1])
plot(f, Delta)
1-element Array{Any,1}: PyObject <matplotlib.lines.Line2D object at 0xac0e1d0>
So subtracting our low-pass filter kernel from the delta function will give us our high-pass filter.
highp_h = delta - win_sinc_h
plot(t, highp_h)
1-element Array{Any,1}: PyObject <matplotlib.lines.Line2D object at 0xec96590>
Highp_H = abs(rfft(highp_h))
plot(f, Highp_H)
1-element Array{Any,1}: PyObject <matplotlib.lines.Line2D object at 0xecca850>
If we convolved the high-pass filter kernel with the chord, we can isolate the higher frequency.
filtered_x = conv(chord, highp_h)
t = [0:length(filtered_x) - 1] / sampRate
plot(t, filtered_x)
1-element Array{Any,1}: PyObject <matplotlib.lines.Line2D object at 0x412b1d0>
Filtered_X = abs(rfft(filtered_x))
f = linspace(0, sampRate / 2, length(Filtered_X))
plot(f, Filtered_X)
1-element Array{Any,1}: PyObject <matplotlib.lines.Line2D object at 0xdb4dc90>
The result of this filter has a lot more distortion than the low-pass filter. This is because our pass band is a lot wider.
So now we know what these signals and filters look like, but what do they sound like? To explore this, I decided to find a Creative Commons song from Jamendo and run it through a high pass and low pass filter.
The song I decided to use was Savarnah by Pasqualino Ubaldini and Paolo Pavan (CC-BY-NC-SA). Here's the original song.
Now here is the song put through a low-pass filter with a corner frequency of 440 Hz. You may want to turn your speakers up a bit for this one, since the low-pass filtering muffles the audio quite a bit. But you can hear the low notes (especially on the guitar) get emphasized quite a bit compared to the high notes. Also, the higher harmonics of the notes are filtered out, so the texture of the instruments' sounds (called the timbre in music theory terminology) is changed.
Here is the song put through a high-pass filter with a corner frequency of 880 Hz. The higher notes all go through, but now you can barely hear the lower notes. In fact, the low notes sound higher now because you are just hearing the harmonics and not the base note.