Sound Processing with Short Time Fourier Transform

Important: Please read the installation page for details about how to install the toolboxes. $\newcommand{\dotp}[2]{\langle #1, #2 \rangle}$ $\newcommand{\enscond}[2]{\lbrace #1, #2 \rbrace}$ $\newcommand{\pd}[2]{ \frac{ \partial #1}{\partial #2} }$ $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$ $\newcommand{\umax}[1]{\underset{#1}{\max}\;}$ $\newcommand{\umin}[1]{\underset{#1}{\min}\;}$ $\newcommand{\uargmin}[1]{\underset{#1}{argmin}\;}$ $\newcommand{\norm}[1]{\|#1\|}$ $\newcommand{\abs}[1]{\left|#1\right|}$ $\newcommand{\choice}[1]{ \left\{ \begin{array}{l} #1 \end{array} \right. }$ $\newcommand{\pa}[1]{\left(#1\right)}$ $\newcommand{\diag}[1]{{diag}\left( #1 \right)}$ $\newcommand{\qandq}{\quad\text{and}\quad}$ $\newcommand{\qwhereq}{\quad\text{where}\quad}$ $\newcommand{\qifq}{ \quad \text{if} \quad }$ $\newcommand{\qarrq}{ \quad \Longrightarrow \quad }$ $\newcommand{\ZZ}{\mathbb{Z}}$ $\newcommand{\CC}{\mathbb{C}}$ $\newcommand{\RR}{\mathbb{R}}$ $\newcommand{\EE}{\mathbb{E}}$ $\newcommand{\Zz}{\mathcal{Z}}$ $\newcommand{\Ww}{\mathcal{W}}$ $\newcommand{\Vv}{\mathcal{V}}$ $\newcommand{\Nn}{\mathcal{N}}$ $\newcommand{\NN}{\mathcal{N}}$ $\newcommand{\Hh}{\mathcal{H}}$ $\newcommand{\Bb}{\mathcal{B}}$ $\newcommand{\Ee}{\mathcal{E}}$ $\newcommand{\Cc}{\mathcal{C}}$ $\newcommand{\Gg}{\mathcal{G}}$ $\newcommand{\Ss}{\mathcal{S}}$ $\newcommand{\Pp}{\mathcal{P}}$ $\newcommand{\Ff}{\mathcal{F}}$ $\newcommand{\Xx}{\mathcal{X}}$ $\newcommand{\Mm}{\mathcal{M}}$ $\newcommand{\Ii}{\mathcal{I}}$ $\newcommand{\Dd}{\mathcal{D}}$ $\newcommand{\Ll}{\mathcal{L}}$ $\newcommand{\Tt}{\mathcal{T}}$ $\newcommand{\si}{\sigma}$ $\newcommand{\al}{\alpha}$ $\newcommand{\la}{\lambda}$ $\newcommand{\ga}{\gamma}$ $\newcommand{\Ga}{\Gamma}$ $\newcommand{\La}{\Lambda}$ $\newcommand{\si}{\sigma}$ $\newcommand{\Si}{\Sigma}$ $\newcommand{\be}{\beta}$ $\newcommand{\de}{\delta}$ $\newcommand{\De}{\Delta}$ $\newcommand{\phi}{\varphi}$ $\newcommand{\th}{\theta}$ $\newcommand{\om}{\omega}$ $\newcommand{\Om}{\Omega}$

This numerical tour explores local Fourier analysis of sounds, and its application to source denoising.

Local Fourier analysis of sound.

A sound is a 1D signal that is locally highly oscillating and stationary. A local Fourier analysis is thus usefull to study the property of the sound such as its local amplitude and frequency.

First we load a sound, with a slight sub-sampling

In [5]:
n = 1024*16;
options.n = n;
[x,fs] = load_sound('bird', n);

You can actually play a sound. In case this does not work, you need to run the command |wavwrite(x(:)', 'tmp.wav')| and click on the saved file |'tmp.wav'| to read it.

In [5]:
sound(x(:)',fs);

We can display the sound.

In [6]:
clf;
plot(1:n,x);
axis('tight');
set_graphic_sizes([], 20);
title('Signal');

Local zooms on the sound show that it is highly oscilating.

In [7]:
p = 512;
t = 1:n;
clf;
sel = n/4 + (0:p-1);
subplot(2,1,1);
plot(t(sel),x(sel)); axis tight;
sel = n/2 + (0:p-1);
subplot(2,1,2);
plot(t(sel),x(sel)); axis tight;

Exercise 1

Compute the local Fourier transform around a point |t0| of |x|, which is the FFT (use the function |fft|) of the windowed signal |x.*h| where |h| is smooth windowing function located around |t0|. For instance you can use for |h| a Gaussian bump centered at |t0|. To center the FFT for display, use |fftshift|. enter for the Fourier analysis idth of the bump indow ft isplay signal isplay FFTs

In [8]:
exo1()
In [9]:
%% Insert your code here.

A good windowing function should balance both time localization and frequency localization.

In [10]:
t = linspace(-10,10,2048);
eta = 1e-5;
vmin = -2;

The block window has a sharp transition and thus a poor frequency localization.

In [11]:
h = double( abs(t)<1 );
hf = fftshift(abs(fft(h)));
hf = log10(eta+hf); hf = hf/max(hf);
clf;
subplot(2,1,1);
title('Block window');
plot(t, h); axis([-2 2, -.1, 1.1]);
subplot(2,1,2);
plot(t, hf); axis([-2 2, vmin, 1.1]);
title('Fourier transform');

A Hamming window is smoother.

In [12]:
h = cos(t*pi()/2) .* double(abs(t)<1);
hf = fftshift(abs(fft(h)));
hf = log10(eta+hf); hf = hf/max(hf);
clf;
subplot(2,1,1);
title('Hamming window');
plot(t, h); axis([-2 2, -.1, 1.1]);
subplot(2,1,2);
plot(t, hf); axis([-2 2, vmin, 1.1]);
title('Fourier transform');

A Haning window has continuous derivatives.

In [13]:
h = (cos(t*pi())+1)/2 .* double(abs(t)<1);
hf = fftshift(abs(fft(h)));
hf = log10(eta+hf); hf = hf/max(hf);
clf;
subplot(2,1,1);
title('Haning window');
plot(t, h); axis([-2 2, -.1, 1.1]);
subplot(2,1,2);
plot(t, hf); axis([-2 2, vmin, 1.1]);
title('Fourier transform');

A normalized Haning window has a sharper transition. It has the advantage of generating a tight frame STFT, and is used in the following.

In [14]:
h = sqrt(2)/2 * (1+cos(t*pi())) ./ sqrt( 1+cos(t*pi()).^2 ) .* double(abs(t)<1);
hf = fftshift(abs(fft(h)));
hf = log10(eta+hf); hf = hf/max(hf);
clf;
subplot(2,1,1);
title('Normalized Haning window');
plot(t, h); axis([-2 2, -.1, 1.1]);
subplot(2,1,2);
plot(t, hf); axis([-2 2, vmin, 1.1]);
title('Fourier transform');

Short time Fourier transform.

Gathering a local Fourier transform at equispaced point create a local Fourier transform, also called spectrogram. By carefully chosing the window, this transform corresponds to the decomposition of the signal in a redundant tight frame. The redundancy corresponds to the overlap of the windows, and the tight frame corresponds to the fact that the pseudo-inverse is simply the transposed of the transform (it means that the same window can be used for synthesis with a simple summation of the reconstructed signal over each window).

The only parameters of the transform are the size of the window and the overlap.

size of the window

In [15]:
w = 64*2;

overlap of the window

In [16]:
q = w/2;

Gabor atoms are computed using a Haning window. The atoms are obtained by translating in time and in frequency (modulation) the window.

In [17]:
t = 0:3*w-1;
t1 = t-2*w;
f = w/8;

Position 0, frequency 0.

In [18]:
g1 = sin( pi*t/w ).^2 .* double(t<w);

Position 2*w, frequency 0.

In [19]:
g2 = sin( pi*t1/w ).^2 .* double( t1<w & t1>=0 );

Position 0, frequency w/8

In [20]:
g3 = g1 .* sin( t * 2*pi/w * f);

Position 2*w, frequency w/8

In [21]:
g4 = g2 .* sin( t * 2*pi/w * f);

display

In [22]:
clf;
subplot(2,2,1);
plot(g1); axis('tight');
title('Position 0, frequency 0');
subplot(2,2,2);
plot(g2); axis('tight');
title('Position 2*w, frequency 0');
subplot(2,2,3);
plot(g3); axis('tight');
title('Position 0, frequency w/8');
subplot(2,2,4);
plot(g4); axis('tight');
title('Position 2*w, frequency w/8');

We can compute a spectrogram of the sound to see its local Fourier content. The number of windowed used is |(n-noverlap)/(w-noverlap)|

In [23]:
S = perform_stft(x,w,q, options);

To see more clearly the evolution of the harmonics, we can display the spectrogram in log coordinates. The top of the spectrogram corresponds to low frequencies.

display the spectrogram

In [24]:
clf; imageplot(abs(S)); axis('on');

display log spectrogram

In [25]:
plot_spectrogram(S,x);

The STFT transform is decomposing the signal in a redundant tight frame. This can be checked by measuring the energy conservation.

energy of the signal

In [26]:
e = norm(x,'fro').^2;

energy of the coefficients

In [27]:
eS = norm(abs(S),'fro').^2;
disp(strcat(['Energy conservation (should be 1)=' num2str(e/eS)]));
Energy conservation (should be 1)=1

One can also check that the inverse transform (which is just the transposed operator - it implements exactly the pseudo inverse) is working fine.

one must give the signal size for the reconstruction

In [28]:
x1 = perform_stft(S,w,q, options);
disp(strcat(['Reconstruction error (should be 0)=' num2str( norm(x-x1, 'fro')./norm(x,'fro') ) ]));
Reconstruction error (should be 0)=2.2401e-16

Audio Denoising

One can perform denosing by a non-linear thresholding over the transfomede Fourier domain.

First we create a noisy signal

In [29]:
sigma = .2;
xn = x + randn(size(x))*sigma;

Play the noisy sound.

In [30]:
sound(xn,fs);

Display the Sounds.

In [31]:
clf;
subplot(2,1,1);
plot(x); axis([1 n -1.2 1.2]);
set_graphic_sizes([], 20);
title('Original signal');
subplot(2,1,2);
plot(xn); axis([1 n -1.2 1.2]);
set_graphic_sizes([], 20);
title('Noisy signal');

One can threshold the spectrogram.

perform thresholding

In [32]:
Sn = perform_stft(xn,w,q, options);
SnT = perform_thresholding(Sn, 2*sigma, 'hard');

display the results

In [33]:
subplot(2,1,1);
plot_spectrogram(Sn);
subplot(2,1,2);
plot_spectrogram(SnT);

Exercise 2

A denoising is performed by hard or soft thresholding the STFT of the noisy signal. Compute the denosing SNR with both soft and hard thresholding, and compute the threshold that minimize the SNR. Remember that a soft thresholding should be approximately twice smaller than a hard thresholding. Check the result by listening. What can you conclude about the quality of the denoised signal ? etrieve best hard thresholding result isplay the error curves

In [34]:
exo2()
In [35]:
%% Insert your code here.

Exercise 3

Display and hear the results. What do you notice ? isplay ear

In [36]:
exo3()
In [37]:
%% Insert your code here.

Audio Block Thresholding

It is possible to remove musical noise by thresholding blocks of STFT coefficients.

Denoising is performed by block soft thresholding.

perform thresholding

In [38]:
Sn = perform_stft(xn,w,q, options);
SnT = perform_thresholding(Sn, sigma, 'block');

display the results

In [39]:
subplot(2,1,1);
plot_spectrogram(Sn);
subplot(2,1,2);
plot_spectrogram(SnT);

Exercise 4

Trie for various block sizes and report the best results. progressbar(k,length(bsX(:)));

In [40]:
exo4()
In [41]:
%% Insert your code here.