import numpy as np
import soundfile as sf
import sounddevice as sd
from scipy.signal import fftconvolve
import tools
This exercise examines the degree of coherence $k$ between the right and the left ear using noise signals. You can find more information in the (non-public) lecture notes on slides 4-29 to 4-31.
fs = 44100
dur = 1 # seconds
ks = np.linspace(0, 1, 11)
stddev = 0.2
length = int(dur * fs)
np.random.seed(7)
for k in ks:
    R = [[1, k],
         [k, 1]]
    L, V = np.linalg.eig(R)
    L.shape = 1, -1  # turn L into a row vector
    W = V * np.sqrt(L)
    # create a new random signal in each iteration:
    stereo_noise = np.random.normal(scale=stddev, size=(length, 2))
    outsig = np.dot(stereo_noise, W.T)  # matrix product
    sf.write('outdata/noise_k{:.1f}.wav'.format(k), outsig, fs)
This creates several WAV files containing noise. The degree of coherence $k$ between both channels is part of the respective filename.
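As a quick sanity check (not part of the original exercise), the empirical correlation coefficient between the two generated channels should come out close to the chosen $k$. A minimal self-contained sketch for $k = 0.5$:

```python
import numpy as np

np.random.seed(7)
length = 44100
k = 0.5
R = np.array([[1, k],
              [k, 1]])
L, V = np.linalg.eig(R)
W = V * np.sqrt(L)  # scale each eigenvector by the root of its eigenvalue
noise = np.random.normal(scale=0.2, size=(length, 2))
outsig = noise @ W.T
# empirical correlation coefficient between the two channels:
k_emp = np.corrcoef(outsig.T)[0, 1]
```

Since the covariance of `outsig` is `stddev**2 * W @ W.T = stddev**2 * R`, `k_emp` should deviate from $k$ only by the statistical error of the finite signal length.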
Exercise: Listen to the different files and note how you perceive different degrees of coherence.
$k=1.0$ outdata/noise_k1.0.wav
$k=0.9$ outdata/noise_k0.9.wav
$k=0.8$ outdata/noise_k0.8.wav
$k=0.7$ outdata/noise_k0.7.wav
$k=0.6$ outdata/noise_k0.6.wav
$k=0.5$ outdata/noise_k0.5.wav
$k=0.4$ outdata/noise_k0.4.wav
$k=0.3$ outdata/noise_k0.3.wav
$k=0.2$ outdata/noise_k0.2.wav
$k=0.1$ outdata/noise_k0.1.wav
$k=0.0$ outdata/noise_k0.0.wav
Create audio examples based on slide 4-33.
To do that, convolve the given HRIRs (stored in the files data/hrir00.wav, data/hrir45.wav and data/hrir90.wav) on the one hand with an arbitrary speech signal (e.g. from the data/ directory) and on the other hand with a noise signal (use the function numpy.random.normal()).
Add speech and noise in different combinations of angles of incidence. Amplify or attenuate speech and noise so that the speech is just barely intelligible.
Listen to the different combinations. In which combination can you understand the speech better or worse?
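The mixing step could be sketched as follows. Note that the HRIRs and the "speech" signal below are hypothetical random stand-ins so that the cell runs on its own; in the actual exercise, load the HRIRs with sf.read('data/hrir00.wav') etc. and use a real speech signal from the data/ directory.

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 44100

def binauralize(sig, hrir):
    """Convolve a mono signal with a two-channel HRIR (shape (N, 2))."""
    return np.column_stack([fftconvolve(sig, hrir[:, ch]) for ch in (0, 1)])

# Hypothetical stand-ins; replace with sf.read('data/hrir00.wav') etc.
hrir_speech = np.random.normal(scale=0.1, size=(128, 2))
hrir_noise = np.random.normal(scale=0.1, size=(128, 2))
speech = np.random.normal(size=fs)  # placeholder for a real speech signal
noise = np.random.normal(size=fs)

speech_gain_db = -6  # adjust until the speech is just barely intelligible
mix = 10**(speech_gain_db / 20) * binauralize(speech, hrir_speech) \
    + binauralize(noise, hrir_noise)
```

Varying which HRIR is used for the speech and which for the noise yields the different combinations of angles of incidence.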
Two important cues for spatial hearing are ITD and ILD.
Exercise: Click and learn:
Watch (and listen to!) the videos on this page: http://auditoryneuroscience.com/spatial-hearing/binaural-cues
Here is an interactive example: http://auditoryneuroscience.com/spatial-hearing/time-intensity-trading
Exercise (only if you happen to have access to Matlab®): Do the listening test described on this page: http://auditoryneuroscience.com/spatial-hearing/ITD-ILD-practical
Exercise: Try to calculate the ITD as a function of the angle of incidence using a very much simplified head model. Assume a spherical head with ear holes at exactly opposite points on the sphere. Make the head radius an adjustable parameter in your calculations; use 8.75 cm as the default value. Assume further that the sound source is sufficiently far away that the angle of incidence is the same over the whole sphere.
What is the maximum difference between the path lengths from the sound source to the two ear holes?
What time difference does that correspond to, assuming the speed of sound $c = 343 \text{ m/s}$?
Plot the ITD as a function of angle of incidence for a given head radius.
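One possible simplification (an assumption here; the lecture slides may use a different one, e.g. the straight-line difference $2r\sin\theta$) is the Woodworth-style model: the wave reaches the near ear directly and travels around the sphere to the far ear, giving a path difference of $r(\sin\theta + \theta)$. A sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

c = 343.0  # speed of sound in m/s

def itd(angle_deg, r=0.0875):
    """ITD for a spherical head with diametrically opposite ear holes.

    Far-field Woodworth-style path difference r * (sin(theta) + theta):
    direct path to the near ear, path around the sphere to the far ear.
    """
    theta = np.radians(angle_deg)
    return r * (np.sin(theta) + theta) / c

angles = np.linspace(0, 90, 91)
plt.plot(angles, 1000 * itd(angles))
plt.xlabel('angle of incidence / degrees')
plt.ylabel('ITD / milliseconds')
plt.title('ITD, spherical head model (r = 8.75 cm)')
```

With this model, the maximum path difference (at 90°) is $r(1 + \pi/2) \approx 22.5$ cm for $r = 8.75$ cm, corresponding to an ITD of roughly 0.66 ms.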
Simulate the ear signals of the listening experiments on slides 4-26 and 4-27.
Exercise: Create a function that takes a mono signal and returns two signals. The left signal is the same as the input, but the right signal is delayed and scaled.
def delay_and_attenuate(x, delay, gain, fs=44100):
    """Delay and attenuate the second channel of a mono signal.

    Parameters
    ----------
    x : array_like
        Mono signal
    delay : float
        Delay of the second channel (in milliseconds)
    gain : float
        Gain of the second channel (in dB)
    fs : int, optional
        Sampling frequency in Hz

    Returns
    -------
    ndarray
        Two-channel signal, normalized to a maximum of 0.6
    """
    d = int(np.round(delay * fs * 0.001))  # delay in samples
    # append zeros so the rolled channel does not wrap back into the signal:
    x = np.tile(np.concatenate((x, np.zeros(np.abs(d))), axis=-1), (2, 1)).T
    x[:, 1] = 10**(gain / 20) * np.roll(x[:, 1], d, axis=0)
    return tools.normalize(x, maximum=0.6)
Exercise: Use two HRIR pairs to create the ear signals.
def virtual_stereo(x, **kwargs):
    """Simulate the ear signals of a two-loudspeaker stereo setup.

    The HRIR pair of the left loudspeaker is obtained from that of the
    right loudspeaker by swapping the two channels (symmetric setup).
    """
    # load HRIRs for the loudspeaker at +30 degrees:
    hright, _ = sf.read('data/hrir30.wav')
    hleft = np.roll(hright, 1, axis=1)  # swap channels for -30 degrees
    return np.column_stack([fftconvolve(x[:, 0], ir, **kwargs) for ir in hleft.T]) \
        + np.column_stack([fftconvolve(x[:, 1], ir, **kwargs) for ir in hright.T])
Create the ear signals of the 'Virtual Stereo System'. Try it with different types of signals in 'data/' ('castanets.wav', 'xmax.wav', 'singing.wav').
delay = 0
gain = 0
# mono_signal, _ = sf.read('...')
stereo_signal = delay_and_attenuate(mono_signal, delay, gain)
y = tools.normalize(virtual_stereo(stereo_signal), maximum=0.6)
sd.play(y)
sd.wait()
Vary the relative gain between the left and right channels. Try to localize the sound source. Do you hear only one source? Compare your observations with the results on Slide 4-26.
Apply a delay to the second (right) signal, in the range of 0 -- 50 ms. Compare your observation with the results shown in Slide 4-27.
If you had problems solving some of the exercises, don't despair! Have a look at the example solutions.
To the extent possible under law,
the person who associated CC0
with this work has waived all copyright and related or neighboring
rights to this work.