Acoustics To Deep Learning

@stephencwelch

Originally featured at Welch Labs.

In [ ]:
#Link video here once live.

Sound is pretty simple. Everything you hear begins with motion. If I snap my fingers, that motion sets off a chain reaction. As my finger snaps against my thumb, the air molecules immediately in contact with my fingers are set into motion. These molecules bump into their neighbors, those neighbors bump into their neighbors, and so on, until the very last air molecule bumps into your eardrum…and…

We can capture sound by placing a microphone in the path of these moving air molecules. Microphones contain a diaphragm wired into a circuit that converts mechanical motion into electrical voltage. When air molecules bump into the diaphragm, the voltage across the circuit changes: the harder the air molecules push, the bigger the voltage.

What is absolutely remarkable is that if we take a signal recorded from our microphone and apply it across the wires of a speaker – we hear a sound that is incredibly similar to the one we started with. And all speakers do is move in response to the voltage applied across their terminals. If a large voltage is applied, the speaker moves a large distance. If a small voltage is applied, the speaker moves a short distance. Simply by pushing air molecules around in the same pattern in which they bumped into the microphone, we can recreate the hearing experience.

In [1]:
from scikits.audiolab import wavread, wavwrite
from matplotlib.pyplot import plot, grid, ylim, xlabel, ylabel
import numpy as np

data, fs, enc = wavread('sounds/Snap.wav')
In [2]:
plot(data[0:300, 0])
grid(1)
ylabel('Voltage')
xlabel('Samples')
Out[2]:
In [4]:
from IPython.display import Audio
Audio('sounds/Snap.wav')
Out[4]:
In [5]:
#Apply large positive voltage to speaker:

largeSignal = 1*np.ones(1000)

plot(largeSignal, linewidth=2)
grid(1)
ylim([0,1])
xlabel('Samples')
ylabel('Voltage')
Out[5]:
In [6]:
wavwrite(largeSignal, 'sounds/largeSignal.wav', fs)
Audio('sounds/largeSignal.wav')
Out[6]:
In [7]:
#Apply smaller positive voltage to speaker:

smallSignal = 0.25*np.ones(1000)

plot(smallSignal, linewidth=2)
grid(1)
ylim([0,1])
xlabel('Samples')
ylabel('Voltage')
Out[7]:
In [8]:
wavwrite(smallSignal, 'sounds/smallSignal.wav', fs)
Audio('sounds/smallSignal.wav')
Out[8]:
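The constant signals above hold the speaker cone at a fixed position; it is a time-varying voltage that produces a sustained tone. As a rough sketch (the 440 Hz frequency, the amplitude, and the filename are my own illustrative choices, not part of the original notebook), here is a sinusoidal signal built with NumPy that could be written out with `wavwrite` just like the cells above:

```python
import numpy as np

fs = 44100                     # sample rate in Hz, matching the recording above
f = 440.0                      # tone frequency in Hz (concert A) - illustrative
t = np.arange(fs) / float(fs)  # one second of sample times

# The voltage sweeps back and forth f times per second, so the speaker
# cone does too, pushing air molecules in the same sinusoidal pattern.
tone = 0.5 * np.sin(2 * np.pi * f * t)  # amplitude 0.5 leaves headroom

# Write and play it the same way as the signals above:
# wavwrite(tone, 'sounds/tone440.wav', fs); Audio('sounds/tone440.wav')
```

Unlike the constant signals, this one keeps the air molecules moving, so the speaker produces a steady pitch rather than a single click.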

A simple voltage signal like this, applied across a speaker, can bring us our favorite music, the voice of someone we love, or even this video. Sound can entertain, educate, and make us feel happy or sad – and all of this is somehow captured in a voltage signal. Sound is one of the few ways we perceive the world around us, and in the last couple hundred years we have learned some incredible things about sound and hearing, and developed some really cool technologies.

Once we understand that we can express sounds, even really complicated ones, as a voltage signal, we can begin to understand the inner workings of technologies like records, CDs, iPods, cell phones, voice recognition, Shazam, and noise cancellation.

And these technologies are not just interesting from an engineering perspective; understanding how they work provides fascinating insight into how our minds function as they process the world around us.

It may seem like our brains work the way a microphone does: sound travels through the air, air molecules bump into our ears, and our brain turns these motions into electrical signals. There is truth to this, but faithfully recording every vibration this way would be inefficient and require way too much energy – our ancestors would have had to eat massive amounts of food just to support their hearing systems.

What we have instead is an incredibly efficient system full of fascinating complexity, such as feedback loops and non-linearities. What’s more, understanding the complexity of hearing systems is not just biologically interesting; it is a key part of modern technology. Without an understanding of how our brains hear and make sense of sounds, we wouldn’t be able to effectively compress audio, and MP3s – and much of digital music on the internet – would not have been possible. Instead of the original iPod being “1000 songs in your pocket”, it would have been more like 1 album in your pocket, which probably already fit…

In the coming months, we’ll explore the intersection of sound, music, audio, digital signal processing, perception, psychoacoustics, and machine learning. While machine learning may present itself as a new field, much of the work being done in it, especially on audio signals, is informed by, and only fully understood in the context of, the electrical engineering and signal processing work of the last century. We’ll start from here, cover lots of cool stuff along the way, and ultimately dive into the cutting edge of the field: deep learning on audio signals.

Next time, we’ll discuss sound and acoustics.

Why this series?

While we’re covering a lot of ground here, in some cases in substantial depth, this series is not meant to be exhaustive or even thorough. I’m going to talk about what I think is interesting and compelling, with as much depth and clarity as I can squeeze into a YouTube video. This is mostly because I believe that, given the number of resources already available, being thorough would be a waste of our time. Finally, we will cover some serious computational techniques here, but only in the context of examples. I think it’s important to remember that as cool as tools and techniques are, they are only, at most, a means of accomplishing something. The techniques shown here were not developed in a vacuum, but in the rich and complex world of application, and I believe that presenting tools and techniques in the absence of the appropriate context does you a disservice, and can even hinder learning – the why is just as important as the how.