Inspired by this tweet from Scott Weingart, I thought I'd demonstrate some ways to convert a skyline image into a waveform! (An opportunity for some basic sonification.)

To be clear, we'll work with a basic 2D skyline where all we have are heights. It might be interesting to think of each building as its own wavelet and mix them together in sequence, or use some techniques from the Sonification Handbook, but the roof contours of individual buildings are a little harder to obtain, so let's stick to a 2D outline for now.

Some basic image processing with scikit-image will allow us to extract the skyline itself. There are then two simple ways to turn it into sound: directly converting the skyline into an amplitude envelope, or treating the image as a frequency-domain representation and using an inverse FFT.

Method 1: Amplitude envelope

In this case, we create a waveform that has the same envelope as the skyline image. This means that the resulting sound's amplitude is proportional to the buildings' heights, so taller buildings sound louder.
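Here is a minimal sketch of the idea on invented data: normalize the heights, hold each one for a fixed number of samples, and multiply by a carrier. The heights, the 440 Hz sine carrier, and the 1/8 second per column are arbitrary choices for illustration. The notebook itself uses white noise as the carrier.

```python
import numpy as np

sr = 22050  # sample rate in Hz, matching the notebook's playback rate

# Hypothetical building heights in pixels (invented for illustration)
heights = np.array([40, 40, 120, 120, 80, 200, 200, 60], dtype=float)

# Normalize to [0, 1] so the tallest building maps to full amplitude
envelope = heights / heights.max()

# Hold each height for a fixed number of samples so it lasts long enough to hear
samples_per_column = sr // 8  # 1/8 second per column (arbitrary choice)
stretched = np.repeat(envelope, samples_per_column)

# Carrier: a 440 Hz sine wave of the same length (the notebook uses noise instead)
t = np.arange(stretched.size) / sr
carrier = np.sin(2 * np.pi * 440.0 * t)

# Amplitude modulation: the waveform's peak level in each segment follows the skyline
waveform = stretched * carrier
```

Any carrier with values in [-1, 1] works the same way. Only the envelope decides how loud each stretch of the sound is.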

In [1]:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt

from skimage import feature, filters, io, morphology
from scipy import ndimage
import librosa
import librosa.display # must be imported separately

from IPython.display import Audio, Image
In [2]:
# Load the Boston skyline image
# by Wikimedia Commons user Fcb981
# CC BY-SA 3.0

skyline_url = ""
skyline = io.imread(skyline_url)
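The next cell binarizes the image with filters.threshold_isodata. Roughly speaking, scikit-image documents the ISODATA (Ridler-Calvard) threshold as a value t that satisfies t = (mean of pixels ≤ t + mean of pixels > t) / 2. Below is a simplified sketch of that fixed-point iteration. The function isodata_threshold and its tol and max_iter parameters are hypothetical. Because scikit-image works from a histogram, its result may differ slightly from this version.

```python
import numpy as np

def isodata_threshold(image, tol=0.5, max_iter=100):
    """Simplified ISODATA: iterate t = (mean below t + mean above t) / 2."""
    values = np.asarray(image, dtype=float).ravel()
    t = values.mean()  # start from the global mean
    for _ in range(max_iter):
        below = values[values <= t]
        above = values[values > t]
        if below.size == 0 or above.size == 0:
            break  # only one class left, nothing to split
        new_t = 0.5 * (below.mean() + above.mean())
        converged = abs(new_t - t) < tol
        t = new_t
        if converged:
            break
    return t

# Toy "photo": bright sky (200) above dark buildings (30)
toy = np.full((10, 10), 200.0)
toy[6:, :] = 30.0
t = isodata_threshold(toy)  # → 115.0
sky_mask = toy > t          # True for the 60 sky pixels
```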
In [3]:
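# Cropping and thresholding to capture just the outline.

lx, ly, _ = skyline.shape  # rows (height), columns (width), color channels

# Drop roughly the bottom quarter of the image and keep only the blue channel
skyline_cropped = skyline[: -lx // 4, :, 2]

threshold = filters.threshold_isodata(skyline_cropped)
thresholded = skyline_cropped > threshold  # True = bright sky, False = buildings

# argmin finds the first False (topmost building pixel) in each column;
# subtracting that row index from the image height gives the building height
outline = np.argmin(thresholded, axis=0)
outline = skyline_cropped.shape[0] - outline

# Flip vertically so row 0 is the bottom of the skyline; with origin='lower'
# the image is displayed upright
thresholded = thresholded[::-1, :]

fig, axes = plt.subplots()
axes.plot(outline, color='r')
axes.imshow(thresholded, origin='lower')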
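The outline step relies on a small trick: np.argmin on a boolean array returns the index of the first False in each column, which is the topmost building pixel. The toy example below uses an invented mask, and it also shows a pitfall. A column with no building at all contains no False, so argmin returns 0 and that column is read as full height. Masking such columns, as heights_fixed does, is one possible fix and is not part of the original notebook.

```python
import numpy as np

# Toy thresholded image: True = sky (bright), False = building (dark).
# Row 0 is the top of the picture, as in io.imread output.
mask = np.array([
    [True,  True,  True,  True],
    [True,  False, True,  True],
    [False, False, True,  True],
    [False, False, False, True],
])

# argmin returns the index of the first False (first building pixel) per column
first_building_row = np.argmin(mask, axis=0)  # → [2, 1, 3, 0]
heights = mask.shape[0] - first_building_row  # → [2, 3, 1, 4]

# The last column is all sky, yet it comes out as full height (4).
# Zero out columns that contain no building pixel at all.
has_building = ~mask.all(axis=0)
heights_fixed = np.where(has_building, heights, 0)  # → [2, 3, 1, 0]
```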

Armed with the outline as a series of y-values, we can create the corresponding amplitude envelope in the range [0.0, 1.0] and multiply it by white noise. (We stretch out the envelope by repeating each value 10 times, so that the evolution of the sound can be heard.)

In [4]:
envelope = outline / outline.max()  # normalize heights to [0.0, 1.0]
stretched_envelope = envelope.repeat(10)  # hold each value for 10 samples

# generate white noise in the range [-1.0, 1.0)
waveform = (np.random.rand(*stretched_envelope.shape) - 0.5) * 2

# impose envelope on noise
waveform *= stretched_envelope

Audio(waveform, rate=22050)
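envelope.repeat(10) holds each value for 10 samples, which produces a staircase with a small jump at every column boundary. If those jumps are audible, linear interpolation with np.interp is one alternative way to stretch the envelope. Here is a sketch, where stretch_envelope is a hypothetical helper and the three-point envelope is made up:

```python
import numpy as np

def stretch_envelope(envelope, factor):
    """Linearly interpolate an envelope to `factor` times its length."""
    n = envelope.size
    old_x = np.arange(n)                       # positions of the original points
    new_x = np.linspace(0, n - 1, n * factor)  # denser grid spanning the same range
    return np.interp(new_x, old_x, envelope)

env = np.array([0.2, 1.0, 0.5])
stair = env.repeat(4)              # piecewise-constant: jumps at every boundary
smooth = stretch_envelope(env, 4)  # linear ramps between neighbouring values
```

Because linspace(0, n - 1, n * factor) includes both endpoints, the stretched envelope starts and ends at the same values as the original and has the same length as the repeat version.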

Sounds, well, like noise.

It's also possible to simply play back the envelope itself, although it isn't good for your speakers: every sample is positive, so the signal carries a large DC offset. Since sound is essentially variation in air pressure, and the envelope varies relatively little around that constant offset, this doesn't sound like much.

In [5]:
Audio(stretched_envelope, rate=22050)
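The DC offset mentioned above is simply the envelope's mean, which is its 0 Hz component. The minimal sketch below uses a made-up envelope (not the Boston data) to show how to measure that offset and how to remove it by subtracting the mean. The centered version is the part that actually varies around zero.

```python
import numpy as np

# Hypothetical stretched envelope: all values are positive, as in the notebook
envelope = np.repeat(np.array([0.3, 0.9, 0.6, 1.0]), 10)

dc_offset = envelope.mean()      # constant (0 Hz) component, ≈ 0.7 here
centered = envelope - dc_offset  # the variation that remains after removing it

# Size of the remaining variation, for comparison with the offset
rms_variation = np.sqrt(np.mean(centered ** 2))  # ≈ 0.27

# The same offset appears as the 0 Hz FFT bin divided by the signal length
dc_from_fft = np.fft.rfft(envelope)[0].real / envelope.size
```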

Method 2: Inverse FFT

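The other common way of turning images into sound is to use an inverse FFT. This treats the image's y-axis as frequency and its x-axis as time: each pixel row becomes a frequency bin (the bottom row is 0 Hz) and each column becomes a short frame of audio. Because the sky pixels are True in the thresholded image, the energy sits above the buildings, so the roof contours trace the lowest frequencies present at each moment.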

NB: The resulting audio is much louder than the above, and quite shrill.

In [6]:
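# Treat each pixel row as a frequency bin and each column as a frame;
# cast the boolean mask to float magnitudes first
skyline_ifft = librosa.istft(thresholded.astype(float))

D = librosa.stft(skyline_ifft)

fig, axes = plt.subplots()

# amplitude_to_db expects magnitudes, so drop the phase with np.abs
librosa.display.specshow(librosa.amplitude_to_db(np.abs(D), ref=np.max),
                         y_axis='log', x_axis='time')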
plt.title('Power spectrogram')
plt.colorbar(format='%+2.0f dB')

Audio(skyline_ifft, rate=22050)
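To make "the y-axis is frequency" concrete: when no n_fft is given, librosa.istft infers the FFT size from the number of rows as n_fft = 2 * (rows - 1). Pixel row k then stands for k * sr / n_fft Hz, running from 0 Hz at the bottom row to sr / 2 at the top. The sketch below uses plain NumPy and a hypothetical 257-row image. It shows a single frame only, whereas istft also windows and overlap-adds successive frames.

```python
import numpy as np

sr = 22050                 # playback rate used in the notebook
n_rows = 257               # hypothetical image height in pixels (= number of bins)
n_fft = 2 * (n_rows - 1)   # FFT size implied by that many one-sided bins

# Frequency (Hz) represented by each pixel row, row 0 = 0 Hz
row_freqs = np.arange(n_rows) * sr / n_fft

# One "column" of the image with a single bright pixel at row 40
column = np.zeros(n_rows)
column[40] = 1.0

# The inverse real FFT turns the column into one frame of audio
frame = np.fft.irfft(column, n=n_fft)

# The frame is a sinusoid at the frequency of that row
peak_bin = np.argmax(np.abs(np.fft.rfft(frame)))  # → 40
peak_hz = row_freqs[peak_bin]                     # → 1722.65625
```

In the notebook, every True (sky) pixel in a column lights up its bin in the same way, so each frame is a sum of many such sinusoids.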