This notebook is available as a public gist. If you like it, feel free to let me know on Twitter.

Recently my attention was drawn on Twitter towards a video from a 1984 episode of the Thames Television programme "Database" on sending e-mail via the phone line. What's most interesting is that the episode includes an experimental piece of "tele-software" which is broadcast as an audio signal over the end credits.

You can watch the video here:

In [1]:
from IPython.display import YouTubeVideo

Out[1]:

I wondered how hard it would be to load (and run) this software from 1984. I knew the software was for the BBC Micro which was a machine I had growing up. It'd be a nice nostalgia trip to try and load some software again.

The first thing to do is to download the video and snip out the audio section. Below I've got a little snippet of bash script (assuming you're running on a Unix-like machine) which will download and snip out the audio. If you're not running on a Unix machine, you can download the video and use a program like Audacity to save the audio segment. YOU MUST SAVE IT AS AN UNCOMPRESSED WAV FILE. Python cannot load compressed WAV files.

In [2]:
%%bash
# You must have yourtube-dl and ffmpeg installed to run this script

[ -f video.mp4 ] || youtube-dl -o video.mp4 szdbKz5CyhA

# Tele-software starts at around 5:38. Use ffmpeg to snip out the relevent section and to
# convert the audio track to an uncompressed wav.
[ -f database-telesoftware.m4a ] || ffmpeg -i video.mp4 -ss 5:38.5 -vn -acodec copy database-telesoftware.m4a
[ -f database-telesoftware.wav ] || ffmpeg -i database-telesoftware.m4a -acodec pcm_s16le database-telesoftware.wav


Let's check that our audio segment was clipped out correctly. Here I'm embedding the compressed AAC audio into the notebook to keep the size of the notebook down but you should be able to load the WAV file in the same way:

In [3]:
import os
from IPython.display import Audio

Out[3]:

Now we have the audio data, we can use the Python wave module from the standard library to load it.

In [4]:
import wave
import contextlib
import numpy as np

# Where is the audio file on disk?

# Use contextlib's closing wrapper on wave.open() since the returned object must
# have .close() called on it
with contextlib.closing(wave.open(audio_fn)) as wf:
# Record the sample rate, number of channels and load the raw samples as
# little-endian 16-bit values
sample_rate = wf.getframerate()
n_channels = wf.getnchannels()

# Downmix the samples to mono
samples = np.mean(samples.reshape((-1, 2), order='C'), axis=-1)

# Construct an array giving times in seconds for each sample
ts = np.arange(samples.shape[0]) / sample_rate


Let's have a look at the samples from somewhere in the middle of the data section:

In [5]:
# Enable IPython's matplotlib integration
%matplotlib inline

# Import matplotlib's pyplot API. Configure matplotlib to have slightly larger
# figures by default.
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (14, 9)

In [6]:
example_slice = slice(600000, 601500)
plt.plot(ts[example_slice], samples[example_slice])
plt.xlabel('Time [s]')
plt.ylabel('Sample value')

Out[6]:
<matplotlib.text.Text at 0x7f8584ddaeb8>

There's some documentation available on the Acorn cassette tape format. The physical layer is built around two tones, one at 1200Hz and one at 2400Hz. A physical "zero" bit is represented by one cycle of the 1200Hz tone and a physical "one" bit by two cycles of the 2400Hz one. We can see that in the samples above; there are high frequency and low frequency sections corresponding to "one" and "zero" bit regions.

Since one physical bit is transferred by one cycle of the 1200Hz tone, the physical data transfer rate is 1200 baud. The upper limit for our 10 second clip is therefore around 1KB.

In [7]:
# Set "zero" and "one" bit frequencies and the number of cycles per bit.

zero_freq = 1200 # Hz
zero_cycles_per_bit = 1

one_freq = 2 * zero_freq
one_cycles_per_bit = 2 * zero_cycles_per_bit


## Physical layer decoding¶

There are lots of ways to look for sinusoids of known frequencies in signals. I'm going to make use of a matched filter approach where we correlate the input signal with some version of our expected signal. The output will be large when our signal looks like our expected signal and low otherwise.

The filter here is essentially a windowed DFT filter for a specific frequency bin. We create a complex sinusoid for the "zero" and "one" freqencies. For the "zero" frequency we window the filter to cover one cycle. For the "one" frequency we window it to cover two cycles.

We expect the absolute value of the filter responses to be maximum when the signal matches. In the case of the "zero" filter, when the signal is a single-cycle of a 1200Hz sinusoid and in the case of the "one" filter, when the signal is two-cycles of a 2400Hz sinusoid.

In [8]:
filter_length = zero_cycles_per_bit * sample_rate / zero_freq # samples
filter_ts = np.arange(-int(np.ceil(filter_length*0.5)), int(np.ceil(filter_length*0.5))+1) / sample_rate
window = np.hanning(filter_ts.shape[0])
window /= np.sqrt(np.square(window).sum())
zero_filter = window * np.exp(1j * zero_freq*filter_ts*np.pi*2)

plt.plot(filter_ts, np.real(zero_filter), label='Real')
plt.plot(filter_ts, np.imag(zero_filter), label='Imaginary')
plt.plot(filter_ts, np.abs(zero_filter), label='Magnitude')
plt.legend(loc='best')
plt.xlabel('Time [s]')
plt.ylabel('Filter')
plt.grid(True)