#!/usr/bin/env python
# coding: utf-8

# # Random Signals and LTI-Systems
#
# *This Jupyter notebook is part of a [collection of notebooks](../index.ipynb) on various topics of Digital Signal Processing. Please direct questions and suggestions to [Sascha.Spors@uni-rostock.de](mailto:Sascha.Spors@uni-rostock.de).*

# ## The Wiener Filter
#
# The [Wiener filter](https://en.wikipedia.org/wiki/Wiener_filter), named after [*Norbert Wiener*](https://en.wikipedia.org/wiki/Norbert_Wiener), aims at estimating an unknown random signal by filtering a noisy, distorted observation of the signal. It has a wide range of applications in signal processing, for instance noise reduction, system identification, deconvolution and signal detection. The Wiener filter is frequently used as a basic building block for algorithms that denoise audio signals, like speech, or remove noise from an image.

# ### Signal Model
#
# The Wiener filter is based on the following signal model
#
# ![Illustration: Signal model for the Wiener filter](model_wiener_filter.png)
#
# The random signal $s[k]$ is subject to distortion by the linear time-invariant (LTI) system $G(\mathrm{e}^{\,\mathrm{j}\,\Omega})$ and additive noise $n[k]$, resulting in the observed signal $x[k] = s[k] * g[k] + n[k]$. The additive noise $n[k]$ is assumed to be uncorrelated with $s[k]$. It is furthermore assumed that all random signals are wide-sense stationary (WSS). This distortion model holds for many practical problems, e.g. the measurement of a physical quantity by a sensor.
#
# The basic concept of the Wiener filter is to apply the LTI system $H(\mathrm{e}^{\,\mathrm{j}\,\Omega})$ to the observed signal $x[k]$ such that the output signal $y[k] = x[k] * h[k]$ matches $s[k]$ as closely as possible. In order to quantify the deviation between $y[k]$ and $s[k]$, the error signal
#
# \begin{equation}
# e[k] = y[k] - s[k]
# \end{equation}
#
# is introduced. The error $e[k]$ equals zero if the output signal $y[k]$ perfectly matches $s[k]$. In general, this goal cannot be achieved in a strict sense. As an alternative, one could aim at minimizing the linear average of the error $e[k]$. However, this quantity is not well suited for optimization since it is signed. Instead, the quadratic average of the error $e[k]$ is used in the Wiener filter and other techniques. This measure is known as the [*mean squared error*](https://en.wikipedia.org/wiki/Mean_squared_error) (MSE). It is defined as
#
# \begin{equation}
# E \{ |e[k]|^2 \} = \frac{1}{2 \pi} \int\limits_{-\pi}^{\pi} \Phi_{ee}(\mathrm{e}^{\,\mathrm{j}\,\Omega}) \,\mathrm{d}\Omega = \varphi_{ee}[\kappa] \Big\vert_{\kappa = 0}
# \end{equation}
#
# where the equalities follow from the properties of the power spectral density (PSD) and the auto-correlation function (ACF). The above quantity is referred to as the [*cost function*](https://en.wikipedia.org/wiki/Loss_function). We aim at minimizing the cost function, hence minimizing the MSE between the original signal $s[k]$ and its estimate $y[k]$. The solution is referred to as the [minimum mean squared error](https://en.wikipedia.org/wiki/Minimum_mean_square_error) (MMSE) solution.

# ### Transfer Function of the Wiener Filter
#
# At first, the Wiener filter shall only have access to the observed signal $x[k]$ and selected statistical measures.
# It is assumed that the cross-power spectral density $\Phi_{sx}(\mathrm{e}^{\,\mathrm{j}\,\Omega})$ between the original signal $s[k]$ and the observed signal $x[k]$, and the PSD $\Phi_{xx}(\mathrm{e}^{\,\mathrm{j}\,\Omega})$ of the observed signal are known. This knowledge can either be gained by estimating both from measurements taken at an actual system or by using suitable statistical models.
#
# The optimal filter is found by minimizing the MSE $E \{ |e[k]|^2 \}$ with respect to the transfer function $H(\mathrm{e}^{\,\mathrm{j}\,\Omega})$. The solution of this optimization problem goes beyond the scope of this notebook and can be found in the literature, e.g. [[Girod et al.](../index.ipynb#Literature)]. The transfer function of the Wiener filter is given as
#
# \begin{equation}
# H(\mathrm{e}^{\,\mathrm{j}\,\Omega}) = \frac{\Phi_{sx}(\mathrm{e}^{\,\mathrm{j}\,\Omega})}{\Phi_{xx}(\mathrm{e}^{\,\mathrm{j}\,\Omega})} = \frac{\Phi_{xs}(\mathrm{e}^{\,-\mathrm{j}\,\Omega})}{\Phi_{xx}(\mathrm{e}^{\,\mathrm{j}\,\Omega})}
# \end{equation}
#
# for $\Phi_{xx}(\mathrm{e}^{\,\mathrm{j}\,\Omega}) \neq 0$.

# No knowledge of the actual distortion process is required besides the assumption of an LTI system and additive noise. Only the PSDs $\Phi_{sx}(\mathrm{e}^{\,\mathrm{j}\,\Omega})$ and $\Phi_{xx}(\mathrm{e}^{\,\mathrm{j}\,\Omega})$ have to be known in order to estimate $s[k]$ from $x[k]$ in the MMSE sense. Care has to be taken that the filter $H(\mathrm{e}^{\,\mathrm{j}\,\Omega})$ is causal and stable in practical applications.

# **Example - Denoising of a Deterministic Signal**
#
# The following example considers the estimation of the original signal from a distorted observation. It is assumed that the original signal is the deterministic signal $s[k] = \cos(\Omega_0\,k)$, which is distorted by an LTI system and additive, wide-sense ergodic, normally distributed zero-mean white noise with PSD $N_0 = 0.1$. The PSDs $\Phi_{sx}(\mathrm{e}^{\,\mathrm{j}\,\Omega})$ and $\Phi_{xx}(\mathrm{e}^{\,\mathrm{j}\,\Omega})$ are estimated from $s[k]$ and $x[k]$ using the [Welch technique](../spectral_estimation_random_signals/welch_method.ipynb). The Wiener filter is applied to the observation $x[k]$ in order to compute the estimate $y[k]$ of $s[k]$. Note that the discrete signals are displayed as continuous lines for ease of illustration.
# In[1]:

import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig

N = 2**14  # number of samples
M = 256  # length of Wiener filter
Om0 = 0.1 * np.pi  # frequency of original signal
N0 = 0.1  # PSD of additive white noise

# generate original signal
s = np.cos(Om0 * np.arange(N))

# generate observed signal
g = 1 / 20 * np.asarray([1, 2, 3, 4, 5, 4, 3, 2, 1])
np.random.seed(1)
n = np.random.normal(size=N, scale=np.sqrt(N0))
x = np.convolve(s, g, mode="same") + n

# estimate (cross) PSDs using Welch technique
f, Pxx = sig.csd(x, x, nperseg=M)
f, Psx = sig.csd(s, x, nperseg=M)

# compute Wiener filter
H = Psx / Pxx
H = H * np.exp(
    -1j * 2 * np.pi / len(H) * np.arange(len(H)) * (len(H) // 2)
)  # shift for causal filter
h = np.fft.irfft(H)

# apply Wiener filter to observation
y = np.convolve(x, h, mode="same")

# plot (cross) PSDs
Om = np.linspace(0, np.pi, num=len(H))
plt.figure(figsize=(10, 4))
plt.subplot(121)
plt.plot(
    Om, 20 * np.log10(np.abs(0.5 * Pxx)), label=r"$| \Phi_{xx}(e^{j \Omega}) |$ in dB"
)
plt.plot(
    Om, 20 * np.log10(np.abs(0.5 * Psx)), label=r"$| \Phi_{sx}(e^{j \Omega}) |$ in dB"
)
plt.title("(Cross) PSDs")
plt.xlabel(r"$\Omega$")
plt.legend()
plt.axis([0, np.pi, -60, 40])
plt.grid()

# plot transfer function of Wiener filter
plt.subplot(122)
plt.plot(Om, 20 * np.log10(np.abs(H)))
plt.title("Transfer function of Wiener filter")
plt.xlabel(r"$\Omega$")
plt.ylabel(r"$| H(e^{j \Omega}) |$ in dB")
plt.axis([0, np.pi, -150, 3])
plt.grid()
plt.tight_layout()

# plot signals
idx = np.arange(500, 600)
plt.figure(figsize=(10, 4))
plt.plot(idx, x[idx], label=r"observed signal $x[k]$")
plt.plot(idx, s[idx], label=r"original signal $s[k]$")
plt.plot(idx, y[idx], label=r"estimated signal $y[k]$")
plt.title("Signals")
plt.xlabel(r"$k$")
plt.axis([idx[0], idx[-1], -1.5, 1.5])
plt.legend()
plt.grid()

# Let's listen to the observed signal $x[k]$ and the output of the Wiener filter $y[k]$.

# In[2]:

from scipy.io import wavfile

fs = 8000
wavfile.write("Wiener_observed_signal.wav", fs, np.int16(x * 16384))
wavfile.write("Wiener_output.wav", fs, np.int16(y * 16384))

# **Observed signal**
#
# [./Wiener_observed_signal.wav](./Wiener_observed_signal.wav)
#
# **Output of Wiener filter**
#
# [./Wiener_output.wav](./Wiener_output.wav)

# **Exercise**
#
# * Take a look at the PSDs and the resulting transfer function of the Wiener filter. How does the Wiener filter remove the noise from the observed signal?
# * Change the frequency `Om0` of the original signal $s[k]$ and the noise power `N0` of the additive noise. What changes?
#
# Solution: The cross-PSD $\Phi_{sx}(e^{j \Omega})$ has its maximum at the frequency $\Omega_0$ of the original signal $s[k]$. Away from this frequency, the PSD $\Phi_{xx}(e^{j \Omega})$ takes approximately the value of the PSD $\Phi_{nn}(e^{j \Omega}) = N_0$ of the additive noise. The resulting Wiener filter $H(e^{j \Omega})$ has an attenuation of 0 dB at $\Omega_0$ which increases away from this frequency. It has the characteristics of a band-pass filter centered at $\Omega_0$. By filtering the observed signal $x[k]$ with the Wiener filter, the major portion of the additive white noise is removed. However, around $\Omega_0$ both the original signal $s[k]$ and the noise pass the filter. This explains the remaining small deviations between the estimated signal $y[k]$ and the original signal $s[k]$. The adaptation of the Wiener filter to the original signal and the additive noise changes when their frequency and PSD are changed.
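# As a supplementary check, the MSE introduced above can be approximated numerically by the time average of the squared error, exploiting the assumed ergodicity. The following cell is a small sketch of such a check; it assumes that the variables `s`, `x` and `y` from the cells above are still defined and excludes the boundary samples affected by the convolution. The MSE of the filter output $y[k]$ should be considerably lower than that of the observation $x[k]$.

# In[ ]:

# supplementary check (assumes s, x, y, N, M from the example above are in scope):
# approximate the MSE by the time average of the squared error, excluding the
# first and last M samples to avoid boundary effects of the convolution
k = np.arange(M, N - M)
mse_observed = np.mean(np.abs(x[k] - s[k]) ** 2)
mse_estimate = np.mean(np.abs(y[k] - s[k]) ** 2)

print("MSE of observed signal x[k]:  {:.4f}".format(mse_observed))
print("MSE of estimated signal y[k]: {:.4f}".format(mse_estimate))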
# ### Wiener Deconvolution
#
# As discussed above, the general formulation of the Wiener filter is based on the knowledge of the PSDs $\Phi_{sx}(\mathrm{e}^{\,\mathrm{j}\,\Omega})$ and $\Phi_{xx}(\mathrm{e}^{\,\mathrm{j}\,\Omega})$, characterizing the distortion process and the observed signal, respectively. These PSDs can be derived from the PSDs of the original signal $\Phi_{ss}(\mathrm{e}^{\,\mathrm{j}\,\Omega})$ and the noise $\Phi_{nn}(\mathrm{e}^{\,\mathrm{j}\,\Omega})$, and the transfer function $G(\mathrm{e}^{\,\mathrm{j}\,\Omega})$ of the distorting system.
#
# Under the assumption that $n[k]$ is uncorrelated with $s[k]$, the cross-PSD $\Phi_{sx}(\mathrm{e}^{\,\mathrm{j}\,\Omega})$ can be derived as
#
# \begin{equation}
# \Phi_{sx}(\mathrm{e}^{\,\mathrm{j}\,\Omega}) = \Phi_{ss}(\mathrm{e}^{\,\mathrm{j}\,\Omega}) \cdot G(\mathrm{e}^{\,-\mathrm{j}\,\Omega})
# \end{equation}
#
# and
#
# \begin{equation}
# \Phi_{xx}(\mathrm{e}^{\,\mathrm{j}\,\Omega}) = \Phi_{ss}(\mathrm{e}^{\,\mathrm{j}\,\Omega}) \cdot |G(\mathrm{e}^{\,\mathrm{j}\,\Omega})|^2 + \Phi_{nn}(\mathrm{e}^{\,\mathrm{j}\,\Omega})
# \end{equation}
#
# Introducing these results into the general formulation of the Wiener filter yields
#
# \begin{equation}
# H(\mathrm{e}^{\,\mathrm{j}\,\Omega}) = \frac{\Phi_{ss}(\mathrm{e}^{\,\mathrm{j}\,\Omega}) \cdot G(\mathrm{e}^{\,-\mathrm{j}\,\Omega})}{\Phi_{ss}(\mathrm{e}^{\,\mathrm{j}\,\Omega}) \cdot |G(\mathrm{e}^{\,\mathrm{j}\,\Omega})|^2 + \Phi_{nn}(\mathrm{e}^{\,\mathrm{j}\,\Omega})}
# \end{equation}
#
# This specialization is also known as the [*Wiener deconvolution filter*](https://en.wikipedia.org/wiki/Wiener_deconvolution). The filter can be derived from the PSDs of the original signal $s[k]$ and the noise $n[k]$, and the transfer function $G(\mathrm{e}^{\,\mathrm{j}\,\Omega})$ of the distorting system. This form is especially useful when the PSDs can be modeled reasonably well by analytic expressions. In many cases, the additive noise can be modeled as white noise with $\Phi_{nn}(e^{j \Omega}) = N_0$.

# ### Interpretation
#
# The Wiener deconvolution filter derived above is also useful for an interpretation of the operation of the Wiener filter. It can be rewritten by introducing the frequency-dependent [signal-to-noise ratio](https://en.wikipedia.org/wiki/Signal-to-noise_ratio) $\text{SNR}(\mathrm{e}^{\,\mathrm{j}\,\Omega}) = \frac{\Phi_{ss}(\mathrm{e}^{\,\mathrm{j}\,\Omega})}{\Phi_{nn}(\mathrm{e}^{\,\mathrm{j}\,\Omega})}$ between the original signal and the noise as
#
# \begin{equation}
# H(\mathrm{e}^{\,\mathrm{j}\,\Omega}) = \frac{1}{G(\mathrm{e}^{\,\mathrm{j}\,\Omega})} \cdot \left( \frac{|G(\mathrm{e}^{\,\mathrm{j}\,\Omega})|^2}{|G(\mathrm{e}^{\,\mathrm{j}\,\Omega})|^2 + \frac{1}{\text{SNR}(\mathrm{e}^{\,\mathrm{j}\,\Omega})}} \right)
# \end{equation}
#
# Now two special cases are discussed:
#
# 1. If there is no additive noise, $\Phi_{nn}(\mathrm{e}^{\,\mathrm{j}\,\Omega}) = 0$, the bracketed expression is equal to 1. Hence, the Wiener filter is simply given as the inverse system to the distorting system $$H(\mathrm{e}^{\,\mathrm{j}\,\Omega}) = \frac{1}{G(\mathrm{e}^{\,\mathrm{j}\,\Omega})}$$
#
# 2. If the distorting system is just a pass-through, $G(\mathrm{e}^{\,\mathrm{j}\,\Omega}) = 1$, the Wiener filter is given as
#
# $$H(\mathrm{e}^{\,\mathrm{j}\,\Omega}) = \frac{\text{SNR}(\mathrm{e}^{\,\mathrm{j}\,\Omega})}{\text{SNR}(\mathrm{e}^{\,\mathrm{j}\,\Omega}) + 1} = \frac{\Phi_{ss}(\mathrm{e}^{\,\mathrm{j}\,\Omega})}{\Phi_{ss}(\mathrm{e}^{\,\mathrm{j}\,\Omega}) + \Phi_{nn}(\mathrm{e}^{\,\mathrm{j}\,\Omega})}$$
#
# Two extreme cases are considered:
#
# * for $\Phi_{ss}(\mathrm{e}^{\,\mathrm{j}\,\Omega}) \gg \Phi_{nn}(\mathrm{e}^{\,\mathrm{j}\,\Omega})$ (high $\text{SNR}(\mathrm{e}^{\,\mathrm{j}\,\Omega})$) at a given frequency $\Omega$, the transfer function approaches 1
# * for $\Phi_{ss}(\mathrm{e}^{\,\mathrm{j}\,\Omega}) \ll \Phi_{nn}(\mathrm{e}^{\,\mathrm{j}\,\Omega})$ (low $\text{SNR}(\mathrm{e}^{\,\mathrm{j}\,\Omega})$) at a given frequency $\Omega$, the transfer function approaches 0
#
# The discussed cases explain the operation of the Wiener filter for specific situations; the general operation is a combination of these.

# **Example - Denoising of a deterministic signal with the Wiener deconvolution filter**
#
# The preceding example of the general Wiener filter will now be reevaluated with the Wiener deconvolution filter.

# In[3]:

N = 2**14  # number of samples
M = 256  # length of Wiener filter
Om0 = 0.1 * np.pi  # frequency of original signal
N0 = 0.1  # PSD of additive white noise

# generate original signal
s = np.cos(Om0 * np.arange(N))

# generate observed signal
g = 1 / 20 * np.asarray([1, 2, 3, 4, 5, 4, 3, 2, 1])
np.random.seed(1)
n = np.random.normal(size=N, scale=np.sqrt(N0))
x = np.convolve(s, g, mode="same") + n

# estimate PSDs
f, Pss = sig.csd(s, s, nperseg=M)
f, Pnn = sig.csd(n, n, nperseg=M)

# compute Wiener filter
G = np.fft.rfft(g, M)
H = 1 / G * (np.abs(G) ** 2 / (np.abs(G) ** 2 + N0 / Pss))
H = H * np.exp(
    -1j * 2 * np.pi / len(H) * np.arange(len(H)) * (len(H) // 2 - 8)
)  # shift for causal filter
h = np.fft.irfft(H)

# apply Wiener filter to observation
y = np.convolve(x, h, mode="same")

# plot PSDs
Om = np.linspace(0, np.pi, num=len(H))
plt.figure(figsize=(10, 4))
plt.subplot(121)
plt.plot(
    Om, 20 * np.log10(np.abs(0.5 * Pss)), label=r"$| \Phi_{ss}(e^{j \Omega}) |$ in dB"
)
plt.plot(
    Om, 20 * np.log10(np.abs(0.5 * Pnn)), label=r"$| \Phi_{nn}(e^{j \Omega}) |$ in dB"
)
plt.title("PSDs")
plt.xlabel(r"$\Omega$")
plt.legend()
plt.axis([0, np.pi, -60, 40])
plt.grid()

# plot transfer function of Wiener filter
plt.subplot(122)
plt.plot(Om, 20 * np.log10(np.abs(H)))
plt.title("Transfer function of Wiener filter")
plt.xlabel(r"$\Omega$")
plt.ylabel(r"$| H(e^{j \Omega}) |$ in dB")
plt.axis([0, np.pi, -150, 10])
plt.grid()
plt.tight_layout()

# plot signals
idx = np.arange(500, 600)
plt.figure(figsize=(10, 4))
plt.plot(idx, x[idx], label=r"observed signal $x[k]$")
plt.plot(idx, s[idx], label=r"original signal $s[k]$")
plt.plot(idx, y[idx], label=r"estimated signal $y[k]$")
plt.title("Signals")
plt.xlabel(r"$k$")
plt.axis([idx[0], idx[-1], -1.5, 1.5])
plt.legend()
plt.grid()

# **Exercise**
#
# * Compare the transfer function of the Wiener deconvolution filter to the transfer function of the general Wiener filter from the preceding example.
# * What is different compared to the general Wiener filter? Why?
#
# Solution: The transfer function $H(e^{j \Omega})$ of the Wiener deconvolution filter is much smoother than the transfer function of the general Wiener filter from the preceding example. This is due to the knowledge of the transfer function $G(e^{j \Omega})$ of the distorting system, which the general Wiener filter has to estimate implicitly from the noisy observations.
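# The two limiting cases discussed in the interpretation above can also be illustrated numerically. The following cell is a small supplementary sketch: it evaluates the Wiener deconvolution filter for an assumed first-order distorting system `G_model` and an assumed model PSD `Pss_model` of the original signal (both chosen purely for illustration) and for different noise PSDs $N_0$. Without noise, the filter coincides with the inverse system $1/G(\mathrm{e}^{\,\mathrm{j}\,\Omega})$; with increasing noise, the frequencies with low SNR are increasingly attenuated.

# In[ ]:

# supplementary sketch with assumed (illustrative) models for G and Phi_ss:
# Wiener deconvolution filter H = Phi_ss * conj(G) / (Phi_ss * |G|^2 + Phi_nn)
Om = np.linspace(0.01, np.pi, 512)  # frequency grid (Om = 0 excluded)
G_model = 1 / (1 - 0.5 * np.exp(-1j * Om))  # assumed distorting system
Pss_model = 1 / np.abs(1 - 0.9 * np.exp(-1j * Om)) ** 2  # assumed model PSD of s[k]

plt.figure(figsize=(10, 4))
for N0 in (0, 0.1, 10):  # PSD of additive white noise
    H_model = Pss_model * np.conj(G_model) / (Pss_model * np.abs(G_model) ** 2 + N0)
    plt.plot(Om, 20 * np.log10(np.abs(H_model)), label=r"$N_0 = {}$".format(N0))

plt.plot(
    Om, 20 * np.log10(np.abs(1 / G_model)), "k--", label=r"inverse system $1/G(e^{j \Omega})$"
)
plt.title("Wiener deconvolution filter for varying noise levels")
plt.xlabel(r"$\Omega$")
plt.ylabel(r"$| H(e^{j \Omega}) |$ in dB")
plt.legend()
plt.grid()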
# **Copyright**
#
# This notebook is provided as [Open Educational Resource](https://en.wikipedia.org/wiki/Open_educational_resources). Feel free to use the notebook for your own purposes. The text is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/), the code of the IPython examples under the [MIT license](https://opensource.org/licenses/MIT). Please attribute the work as follows: *Sascha Spors, Digital Signal Processing - Lecture notes featuring computational examples*.