In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import scipy

from math import gcd

import scikits.samplerate as sk_samplerate
import scipy.signal as scipy_signal
import resampy

from scipy.signal.fir_filter_design import firwin
In [2]:
Q = 48000
P = 44100

offset = 2000
instfreq = np.exp(np.linspace(np.log(offset+100), np.log(offset+23900), 96000))-offset
deltaphase = 2*np.pi*instfreq/Q
cphase = np.cumsum(deltaphase)
sig = np.sin(cphase)

Sample Rate Conversion: It's all about the filter

by Joachim Thiemann, 2.5.2017, [email protected]

My previous post comparing various sample rate conversion (SRC) methods in Python was quite well received, and I have gotten some emails and comments on it. In a comment, Prof. Christian Muenker from the Munich University of Applied Sciences pointed me to another method that I had not been aware of: scipy.signal.resample_poly (it appeared in scipy 0.18.0). "Great!" I thought. A polyphase resampling method that is in scipy (and thus likely to already be installed by anyone who works on audio processing in python)!

As it turns out, it's a good polyphase filtering implementation - but the default filter it uses is plain unusable (for any sample rate conversion use-case I can think of, anyways). Below is the sweep as used in my previous resampling post, resampled by scipy.signal.resample_poly; that's some ugly aliasing.

In [3]:
plt.figure(figsize=(15,3))
plt.specgram(scipy_signal.resample_poly(sig, P, Q)*30, scale='dB', Fs=P, NFFT=256)
plt.colorbar()
_=plt.axis((0,2,0,Q/2))

To understand why - and to see how we can stick within scipy and still make a good SRC - let's look at some up- and downsampling theory.

Resampling by a rational factor

Pretty much every resampling algorithm relies on the original and target sampling rate being relating to each other by a fractional number. (So, no resampling by $\pi$!) Let's say we have data sampled at rate $Q$, and want to have it at rate $P$; so we want to upsample by $M = P/gcd(P, Q)$ and downsample by $N = Q/gcd(P, Q)$ so $\frac{M}{N} = \frac{P}{Q}$. In the typical example where $P = 48000$ Hz and $Q = 44100$ Hz, we get $M=160$ and $Q=147.$

Upsampling by a factor $M$ can conceptually be viewed as simply inserting $M-1$ zero-value samples inbetween the actual original samples; this creates alias copies of the signal in frequency domain which are removed using an antialias lowpass filter. (This can be efficiently implemented using a polyphase filter, so the extra zeros are never actually put in there.) Since the (normalized) frequency domain got compressed by a factor of $M$. the filter needs to remove all frequencies higher than $\frac{1}{2M}$ (note that many implementations normalize to the Nyquist frequency, not the Nyquist rate, thus this might end up being written as $\frac{1}{M}$).

Upsampling Diagram

Similarly, downsampling is simply the process of using only every $N$th sample, but before we do that, we need to ensure the bandwith of the signal does not exceed $1/N$ of the original possible bandwidth, that is, we filter at $\frac{1}{2N}$ of $Q$.

Downsampling Diagram

If we now consider resampling, we note we have two LP filters in series; these can be combined into a single LP filter whose cutoff frequency is determined by the minimum of $\frac{1}{2M}$ or $\frac{1}{2N}$. The quality of the resampler is completely determined by this one filter.

Filter Design

There are many (MANY!) ways to design filters; but for sample rate conversion there is only really one commonly used method, and that is to use a linear-phase FIR filter using a windowed sinc function. There are numerous reasons for this, mostly that they are stable and can be efficiently implemented in polyphase structures.

The sinc function computed at the correct frequency provides an ideal filter - if it were infinite. To be causal and computable, it needs to be finite, so we cut it off at some point. This determines the order. However, cutting it off degrades the "perfect"-ness of the filter: it will no longer block all frequencies above its cutoff and/or no longer pass the desired frequencies without modification. So, let's talk tradeoffs.

Filter order: lenght of window

The first know we can tweak is the order of the filter, or the length of the impulse response: the longer, the closer to ideal you can get, but in exchange the filtering needs more computational power, and in a real-time application, the delay of the SRC increases. This can become quite significant, but remember: the filter runs at rate $M\cdot Q$! In our example, that is 7.056 MHz; so even with a filter of several thousand taps, the delay is in the order of milliseconds. So, go big or go home! Kaiser gave a formula for estimating the order based on the desired stopband attenuation $A_s$ and transition bandwidth $\Delta\omega$, giving $N \approx \frac{A_s - 7.95}{14.36\Delta\omega/2\pi}$; picking randomly $\Delta\omega = 0.01$ and $A_s = 60dB$ we find that the filter should be about 2277 samples long.

The Window function

Next, we look at the window. Remember: if you don't choose one, you still have made a choice (we just give it a fancy name, like "boxcar" or "rectangular window"). Applying a window will smooth the frequency response of the filter, which removes ripples in the passband but widen it at the same time. Let's look at the effects of some windows on a filter of 1024 taps with cutoff frequency of 0.1.

In [4]:
fig, axs = plt.subplots(1, 2, figsize=(15,5))

for w in {'boxcar', 'hann', 'hamm', ('kaiser', 5.0)}:
    axs[0].plot(firwin(1024, 1e-99, window=w), label=w)
    axs[1].magnitude_spectrum(firwin(1024, .1, window=w), scale='dB', 
                              pad_to=2**16, window=mpl.mlab.window_none, label=w)
axs[1].vlines(.1, -100, 2, colors='r', linestyles='dashed')
axs[1].axis((0.09, 0.11, -100, 2)), axs[0].legend(), axs[1].legend();
/usr/local/lib/python3.5/dist-packages/matplotlib/axes/_axes.py:6833: RuntimeWarning: divide by zero encountered in log10
  Z = 20. * np.log10(spec)

We note the following: There is always some energy above the cutoff frequency. Some of it is in the main lobe (left of the first notch), and then in the side lobes. The 'boxcar' window has the notch nearest to the desired cutoff frequency, but it also has very large side lobes; it also has some passband ripple! All the other windows widen the main lobe by roughly the same amount, but have varying side lobe attenuations. DSP folk like the Kaiser window; the parameter (here, I picked $\beta=5.0$) allows us to approximate various (named) windows, even up to a Gaussian window, but it has other properties that are nice, too. I refer you to the literature.

The cutoff frequency

So, windowing (always implied if having a finite length impulse resonse) widens the main lobe. So, why not, to reduce aliasing, drop the cutoff frequency of the LPF? Good idea! How much though is a design decision, affecting how much of the (usable) signal is retained vs. how much aliasing you are willing to tolerate.

My argument is that - if possible - the cutoff should fall exactly on the first notch ("null"). That way, the maximum aliasing is given by the sidelobe attenuation, and we are guranteed to have no energy exactly at Nyquist; in the rest of the specrum, aliasing noise might get masked by the real signal (in an audio context anyways).
Unfortunately, while the location of the first null of the window itself is known (at $\frac{1}{L}\sqrt{1+(\frac{\beta}{\pi})^2}$, for a window of length $L$) this is not the case once you multiply it with the sinc function. If there is a closed-form solution, I do not know it; instead $\frac{1}{L}\sqrt{1+(\frac{\beta}{\pi})^2}$ seems to give us the max of the first sidelobe.

This is actually a useful thing, since now we have a range where we can numerically search for the minimum!

Code

So, here is my implementation of the above discussion. I still use scipy.signal.resample but compute a better filter. The function gives you 2 configurable parameters: $\beta$ and $L$, with reasonable defaults for audio processing (in other words, pulled out of my ...)

In [5]:
def resample_poly_filter(up, down, beta=5.0, L=16001):
    
    # *** this block STOLEN FROM scipy.signal.resample_poly ***
    # Determine our up and down factors
    # Use a rational approximation to save computation time on really long
    # signals
    g_ = gcd(up, down)
    up //= g_
    down //= g_
    max_rate = max(up, down)

    sfact = np.sqrt(1+(beta/np.pi)**2)
            
    # generate first filter attempt: with 6dB attenuation at f_c.
    filt = firwin(L, 1/max_rate, window=('kaiser', beta))
    
    N_FFT = 2**19
    NBINS = N_FFT/2+1
    paddedfilt = np.zeros(N_FFT)
    paddedfilt[:L] = filt
    ffilt = np.fft.rfft(paddedfilt)
    
    # now find the minimum between f_c and f_c+sqrt(1+(beta/pi)^2)/L
    bot = int(np.floor(NBINS/max_rate))
    top = int(np.ceil(NBINS*(1/max_rate + 2*sfact/L)))
    firstnull = (np.argmin(np.abs(ffilt[bot:top])) + bot)/NBINS
    
    # generate the proper shifted filter
    filt2 = firwin(L, -firstnull+2/max_rate, window=('kaiser', beta))
    
    return filt2

Sanity test: Upsample an impulse and make sure that the first null is at the right place ($F_s/2P$).

In [6]:
pw = 18
impulse = np.zeros(2**pw)
impulse[2**(pw-1)] = 1

Fs = 48000
US = 50
filt1 = resample_poly_filter(US, 1, L=10001, beta=5.0)
us = scipy_signal.resample_poly(impulse, US, 1, window=filt1)

plt.figure(figsize=(17,3))
plt.magnitude_spectrum(us, c='m', Fs=Fs, scale='dB', label='resample_poly_audio', 
                       window=mpl.mlab.window_none)
plt.axis((Fs/(2*US)-50, Fs/(2*US)+50, -120, 40))
_=plt.vlines(Fs/(2*US), -120, 3, colors='r', linestyles='dashed');

OK! Let's try our sweep again:

In [7]:
plt.figure(figsize=(15,3))
wfilt = resample_poly_filter(P, Q, L=2**16+1)
plt.specgram(scipy_signal.resample_poly(sig, P, Q, window=wfilt)*30, scale='dB', Fs=P, NFFT=256)
plt.colorbar()
_=plt.axis((0,2,0,Q/2))

And compare to resampy (which uses a fixed 64k sample hand-tuned window, hence the choice of $L$ above):

In [8]:
plt.figure(figsize=(15,3))
plt.specgram(resampy.resample(sig, Q, P)*30, scale='dB', Fs=P, NFFT=256)
plt.colorbar()
_=plt.axis((0,2,0,Q/2))

I daresay my method works pretty well! One quick test with a more radical downsampling, to 16k (ratio 1/3), with default parameters:

In [9]:
P2 = 16000
plt.figure(figsize=(15,3))
wfilt = resample_poly_filter(P2, Q)
plt.specgram(scipy_signal.resample_poly(sig, P2, Q, window=wfilt)*30, scale='dB', Fs=P2, NFFT=256)
plt.colorbar()
_=plt.axis((0,2,0,P2/2))

Yeah, I can live with that.

Conclusion

Sample rate conversion is tricky business, but with a bit of fiddling, you can get good results in python - now also using nothing but scipy as required package (and my little precomputation function).

While I haven't looked at the (extensive) literature if the method presented here has already been done before (after all, working on this is not my day job... I'm supposed to be writing a paper right now!) it makes sense to me, and I'd love to get some feedback on it. Please leave a comment on my blog!