In [2]:
%pylab inline
from IPython.display import display, HTML, Audio
display(HTML(data="<style>div#notebook-container{width:95%;}div#menubar-container{width:65%;}div#maintoolbar-container{width:99%;}</style>"))
rcParams['figure.figsize'] = (20, 8) #wide graphs by default
Populating the interactive namespace from numpy and matplotlib

Sonifying Unemployment in Europe Using Wavelet Analysis

Marc Evans

Abstract: This project explores the sonification of unemployment data for European countries during the lead-up to and aftermath of the 2008 financial crisis. The main tool used to analyze and transform the data is Wavelet Analysis. Like Fourier Analysis, Wavelet Analysis allows us to identify important frequency components in the data, but it has the added advantage of pinpointing where in the data those frequency components are most present.

Inspiration

I first became interested in this through several discussions with my parents about the financial crisis, and the severe economic consequences for countries like Greece and Spain. I'm not an economist, but everyone in my immediate family either is currently or was an economics professor, so we talk about things like this.

What struck me most was my parents' comment that sharing a currency entangles countries in a way that makes them like brothers and sisters. It is this familial dynamic that most interests me: here we have countries that are in some ways very independent and in other ways deeply dependent on one another.

Moreover, this entanglement was suggestive to me of the interrelationships of musical processes. Most music involves parallel processes that are neither fully independent nor fully dependent upon one another, and so I felt that this was fertile ground for musical exploration.

The Data

I downloaded unemployment data for several European countries for the period 2000-2018, in CSV form, from:

https://www.euro-area-statistics.org/macro-economic-indicators?cr=eur&lg=en&page=1&visited=1#

First we use pandas to read in the data from a csv file:

In [3]:
import pandas as pd
df1 = pd.read_csv("EuropeUnemploymentData.csv")
df2 = df1.set_index("Country")
print(df2)
            Statistical Data Warehouse code  2018-03  2018-02  2018-01  \
Country                                                                  
Austria        STS.M.AT.S.UNEH.RTT000.4.000     5.04     5.02     5.19   
Belgium        STS.M.BE.S.UNEH.RTT000.4.000     6.37     6.36     6.31   
Cyprus         STS.M.CY.S.UNEH.RTT000.4.000     9.07     9.45     9.93   
Germany        STS.M.DE.S.UNEH.RTT000.4.000     3.43     3.47     3.50   
Estonia        STS.M.EE.S.UNEH.RTT000.4.000     6.25     6.48     6.49   
Spain          STS.M.ES.S.UNEH.RTT000.4.000    16.10    16.23    16.35   
Finland        STS.M.FI.S.UNEH.RTT000.4.000     8.20     8.25     8.34   
France         STS.M.FR.S.UNEH.RTT000.4.000     8.84     8.91     8.97   
Greece         STS.M.GR.S.UNEH.RTT000.4.000    20.43    20.54    20.65   
Ireland        STS.M.IE.S.UNEH.RTT000.4.000     6.06     6.14     6.24   
Italy          STS.M.IT.S.UNEH.RTT000.4.000    11.02    10.98    11.16   
Lithuania      STS.M.LT.S.UNEH.RTT000.4.000     7.53     7.33     7.34   
Luxembourg     STS.M.LU.S.UNEH.RTT000.4.000     5.38     5.40     5.34   
Latvia         STS.M.LV.S.UNEH.RTT000.4.000     7.89     8.04     8.26   
Malta          STS.M.MT.S.UNEH.RTT000.4.000     3.35     3.45     3.59   
Netherlands    STS.M.NL.S.UNEH.RTT000.4.000     3.94     4.06     4.19   
Portugal       STS.M.PT.S.UNEH.RTT000.4.000     7.37     7.63     7.87   
Slovenia       STS.M.SI.S.UNEH.RTT000.4.000     5.15     5.27     5.40   
Slovakia       STS.M.SK.S.UNEH.RTT000.4.000     7.45     7.52     7.60   
Euro area      STS.M.U2.S.UNEH.RTT000.4.000     8.47     8.53     8.62   

             2017-12  2017-11  2017-10  2017-09  2017-08  2017-07   ...     \
Country                                                             ...      
Austria         5.39     5.44     5.41     5.41     5.47     5.42   ...      
Belgium         6.24     6.34     6.58     6.85     7.10     7.20   ...      
Cyprus         10.46    10.40    10.06    10.14    10.36    10.51   ...      
Germany         3.55     3.59     3.63     3.67     3.70     3.74   ...      
Estonia         5.71     5.56     5.45     5.99     5.56     6.00   ...      
Spain          16.48    16.58    16.65    16.72    16.82    16.89   ...      
Finland         8.41     8.44     8.45     8.46     8.49     8.55   ...      
France          8.96     9.01     9.18     9.37     9.55     9.60   ...      
Greece         20.76    21.10    21.00    20.84    20.82    20.91   ...      
Ireland         6.33     6.39     6.50     6.62     6.67     6.74   ...      
Italy          10.95    11.07    11.12    11.12    11.19    11.33   ...      
Lithuania       6.84     6.71     6.72     6.87     6.77     6.90   ...      
Luxembourg      5.44     5.37     5.39     5.53     5.56     5.62   ...      
Latvia          8.25     8.27     8.31     8.41     8.65     8.84   ...      
Malta           3.79     3.78     3.81     4.01     3.98     3.94   ...      
Netherlands     4.37     4.38     4.47     4.66     4.71     4.83   ...      
Portugal        7.94     8.07     8.41     8.53     8.76     8.90   ...      
Slovenia        5.55     5.74     6.00     6.36     6.56     6.61   ...      
Slovakia        7.64     7.71     7.79     7.87     7.93     8.03   ...      
Euro area       8.63     8.70     8.79     8.87     8.96     9.04   ...      

             2000-10  2000-09  2000-08  2000-07  2000-06  2000-05  2000-04  \
Country                                                                      
Austria         3.85     3.87     3.83     3.76     3.63     3.67     3.84   
Belgium         6.84     6.89     6.99     6.75     6.75     6.81     6.89   
Cyprus          4.79     4.77     4.67     4.61     4.75     5.34     5.36   
Germany         7.83     7.86     7.89     7.92     7.96     7.99     8.03   
Estonia        14.09    14.24    14.26    13.87    14.04    13.93    14.53   
Spain          11.66    11.74    11.85    11.94    11.92    11.90    11.98   
Finland         9.51     9.58     9.56     9.55     9.63     9.79    10.03   
France          8.84     8.89     8.98     9.07     9.15     9.27     9.39   
Greece         10.63    11.20    11.25    11.29    11.41    11.47    11.51   
Ireland         4.03     4.20     4.39     4.51     4.66     4.76     4.86   
Italy           9.63     9.94     9.98    10.03    10.21    10.26    10.31   
Lithuania      16.52    16.93    17.05    17.01    16.79    16.53    16.18   
Luxembourg      2.07     2.24     2.29     2.26     2.30     2.32     2.28   
Latvia         14.51    14.40    14.35    14.34    14.36    14.36    14.32   
Malta           6.65     6.61     6.59     6.58     6.62     6.72     6.81   
Netherlands     3.44     3.49     3.55     3.62     3.67     3.74     3.80   
Portugal        5.02     5.15     5.15     5.07     5.01     5.09     5.02   
Slovenia        6.31     6.58     6.80     7.03     7.16     7.17     7.06   
Slovakia       18.76    18.62    18.62    19.68    19.41    19.23    18.90   
Euro area       8.31     8.41     8.47     8.51     8.57     8.62     8.69   

             2000-03  2000-02  2000-01  
Country                                 
Austria         4.14     4.26     4.28  
Belgium         6.89     7.10     7.33  
Cyprus          5.06     4.77     4.64  
Germany         8.08     8.12     8.18  
Estonia        14.18    14.90    14.50  
Spain          12.17    12.42    12.76  
Finland        10.23    10.24    10.14  
France          9.47     9.61     9.76  
Greece         11.56    11.58    11.52  
Ireland         4.77     4.87     5.01  
Italy          10.52    10.55    10.60  
Lithuania      15.74    15.60    15.57  
Luxembourg      2.33     2.29     2.28  
Latvia         14.31    14.34    14.38  
Malta           6.91     6.94     6.91  
Netherlands     3.84     3.90     3.93  
Portugal        5.08     5.29     5.47  
Slovenia        6.82     6.76     6.80  
Slovakia       18.70    18.49    18.26  
Euro area       8.79     8.90     9.02  

[20 rows x 220 columns]

We can then extract each country's data into an array and place all the arrays into a dictionary indexed by country name:

In [4]:
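time_series_by_country = {
    # [-1:1:-1] reverses the columns into chronological order and stops before index 1,
    # so it skips the code column (index 0) and also the newest month, 2018-03 (index 1).
    # The builtin float replaces np.float, which newer NumPy versions no longer provide.
    country: df2.loc[country, :].values[-1:1:-1].astype(float) for country in df2.index.values
}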
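
To see exactly which columns the slice `[-1:1:-1]` keeps, here is a minimal sketch on a made-up row (the code string and the values are placeholders, not the real data). It shows that the slice drops both the code column and the newest month. Whether leaving out the newest month is intended isn't stated; `[:0:-1]` would keep it, but the date axis built later would then need the same change.

```python
import numpy as np

# Hypothetical miniature row: a code string, then monthly values from newest to oldest
row = np.array(["SOME.CODE", 5.04, 5.02, 5.19, 5.39], dtype=object)

# The slice used in the notebook: walk backwards from the last column, stop before index 1
chronological = row[-1:1:-1].astype(float)

# Alternative that stops before index 0 instead, so only the code column is dropped
all_months = row[:0:-1].astype(float)

print(chronological)  # oldest first, newest month missing
print(all_months)     # oldest first, newest month included
```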

If we plot all the countries' data together, the point at which the financial crisis hits comes through loud and clear:

In [5]:
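from datetime import datetime

headers = df2.columns
# pd.datetime is no longer available in current pandas, so use the standard library directly
dateparse = lambda x: datetime.strptime(x, '%Y-%m')
# same slice as the data: chronological order, without the code column and 2018-03
times = np.array([dateparse(x) for x in headers[:1:-1]])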

for country in time_series_by_country:
    plot(times, time_series_by_country[country], label=country)
ylabel("Unemployment %")
legend()
Out[5]:
<matplotlib.legend.Legend at 0x7f584da28cf8>

Simple Sine Wave Sonification

Some Utility Functions

Since the arrays above are very short compared with the number of samples in even a second of audio, it's useful to define a function that can resample arrays to any length:

In [6]:
from scipy.interpolate import interp1d

def normalized_interp(an_array):
    return interp1d(np.linspace(0, 1, len(an_array)), an_array)

def resample_array(an_array, new_length):
    return normalized_interp(an_array)(np.linspace(0, 1, new_length))
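
As a quick check of what this kind of resampling does, the sketch below performs the same linear interpolation with `np.interp` rather than `interp1d` (the name `resample_linear` is only for illustration and is not part of the notebook). Both the old and the new sample positions are spread evenly over the interval from 0 to 1, so the endpoints are always preserved.

```python
import numpy as np

def resample_linear(an_array, new_length):
    """Linearly interpolate an_array onto new_length evenly spaced points."""
    old_positions = np.linspace(0, 1, len(an_array))
    new_positions = np.linspace(0, 1, new_length)
    return np.interp(new_positions, old_positions, an_array)

# A three-point "triangle" stretched to five points gains the two midpoints
stretched = resample_linear(np.array([0.0, 1.0, 0.0]), 5)
print(stretched)
```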

For this notebook, we'll be using a 44100 Hz sample rate. However, I will often decimate the audio to 22050 Hz when presenting intermediate results to save space.

We'll also define a function that converts a frequency curve in Hz to a phase curve that we can easily take the sine of. Note that, since frequency is the derivative of phase, phase is the integral (or cumulative sum) of frequency. (For the units to work, though, we have to convert frequency to radians / sample.)

In [7]:
SAMPLE_RATE = 44100

def phase_array_from_frequency_array(frequency_array):
    # frequency is the derivative of phase, so to go the other way we integrate,
    # but first we need to express frequency in radians / sample.
    # It is currently in cycles / second, so we multiply by radians / cycle (2*pi)
    # and by seconds / sample (1 / SAMPLE_RATE), i.e. by 2*pi / SAMPLE_RATE
    scaled_frequency_array = frequency_array * 2 * pi / SAMPLE_RATE
    return np.cumsum(scaled_frequency_array)
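
A small sanity check of the unit conversion, as a sketch: a constant 441 Hz frequency curve lasting one second should accumulate exactly 441 full cycles of phase. The function is repeated here (with `np.pi`) only so that the example runs on its own.

```python
import numpy as np

SAMPLE_RATE = 44100

def phase_array_from_frequency_array(frequency_array):
    # cycles / second -> radians / sample, then integrate with a cumulative sum
    return np.cumsum(frequency_array * 2 * np.pi / SAMPLE_RATE)

# One second of a constant 441 Hz tone
one_second_of_441_hz = np.full(SAMPLE_RATE, 441.0)
phase = phase_array_from_frequency_array(one_second_of_441_hz)

# Total phase divided by 2*pi gives the number of cycles completed
cycles_completed = phase[-1] / (2 * np.pi)
print(cycles_completed)
```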

It will also be useful to define a function that maps the data to a range using an exponential, so that it sounds linear in pitch:

In [8]:
def map_to_exponential_range(data, exp_min, exp_max):
    data_min, data_max = np.min(data), np.max(data)
    normalized_data = (data - data_min) / (data_max - data_min)
    return exp((1-normalized_data) * log(exp_min) + normalized_data * log(exp_max))

plot(map_to_exponential_range(time_series_by_country["Austria"], 30, 900))
Out[8]:
[<matplotlib.lines.Line2D at 0x7f584c0edda0>]
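
To make "linear in pitch" concrete, here is a minimal sketch with toy values (not the unemployment data): equally spaced data points come out as equally spaced musical intervals, that is, each step multiplies the frequency by the same ratio. The function is restated with explicit `np.` calls so the example is self-contained.

```python
import numpy as np

def map_to_exponential_range(data, exp_min, exp_max):
    data_min, data_max = np.min(data), np.max(data)
    normalized_data = (data - data_min) / (data_max - data_min)
    # interpolate linearly between log(exp_min) and log(exp_max), then exponentiate
    return np.exp((1 - normalized_data) * np.log(exp_min) + normalized_data * np.log(exp_max))

# Three equally spaced values mapped onto 100 Hz .. 400 Hz (a two-octave range)
frequencies = map_to_exponential_range(np.array([0.0, 1.0, 2.0]), 100, 400)
ratios = frequencies[1:] / frequencies[:-1]
print(frequencies)  # the middle value lands one octave above the bottom
print(ratios)       # each step is the same frequency ratio
```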

Sonification

Putting this all together, we can go ahead and sonify the above curves as sine waves by mapping all the curves to a frequency range, resampling to audio rate, converting to a phase array, and then taking the sine. I also normalized and added a little fade in and out.

In [9]:
all_the_curves_sonified = np.zeros(SAMPLE_RATE * 10)

# convert all the curves to frequency curves
frequency_curves = map_to_exponential_range(np.array(list(time_series_by_country.values())), 100, 4000)

for frequency_curve in frequency_curves:
    # the curves are already exponentially mapped to frequency, so each one just needs
    # resampling to audio rate, integrating to phase, and taking the sine
    all_the_curves_sonified += sin(phase_array_from_frequency_array(resample_array(frequency_curve, SAMPLE_RATE * 10)))

# normalize by the number of curves summed (not by the number of points in one curve)
all_the_curves_sonified /= len(frequency_curves)
# fade in and out
all_the_curves_sonified[:1000] *= linspace(0, 1, 1000)
all_the_curves_sonified[-5000:] *= linspace(1, 0, 5000)
Audio(data=all_the_curves_sonified[::2], rate=SAMPLE_RATE // 2)
Out[9]:
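
For reference, the whole chain for a single curve can be sketched as one self-contained function. This is an illustrative condensation, not the notebook's own code: the name `sonify_curve`, the toy data, the 200-800 Hz range and the fade length are all assumptions chosen for the example, and `np.interp` stands in for `interp1d`.

```python
import numpy as np

SAMPLE_RATE = 44100  # same rate as the notebook

def sonify_curve(data, f_min, f_max, seconds):
    """Turn a short data series into a sine glide (illustrative helper)."""
    data = np.asarray(data, dtype=float)
    # 1. map the data to frequencies exponentially, so equal data steps are equal pitch steps
    normalized = (data - data.min()) / (data.max() - data.min())
    freqs = np.exp((1 - normalized) * np.log(f_min) + normalized * np.log(f_max))
    # 2. resample the frequency curve to audio rate by linear interpolation
    n = int(SAMPLE_RATE * seconds)
    freqs_audio = np.interp(np.linspace(0, 1, n), np.linspace(0, 1, len(freqs)), freqs)
    # 3. integrate frequency (in radians / sample) to get phase, then take the sine
    signal = np.sin(np.cumsum(freqs_audio * 2 * np.pi / SAMPLE_RATE))
    # 4. short linear fades at both ends to avoid clicks
    fade = min(1000, n // 2)
    signal[:fade] *= np.linspace(0, 1, fade)
    signal[-fade:] *= np.linspace(1, 0, fade)
    return signal

# Toy data only; in a notebook the result could be passed to Audio(data=glide, rate=SAMPLE_RATE)
glide = sonify_curve([4.0, 5.5, 5.0, 9.0], 200, 800, seconds=1)
```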

Exploring the Continuous Wavelet Transform

The continuous wavelet transform essentially consists of convolving a signal with a special kind of test signal, the wavelet, which is stretched and compressed to highlight different parts of the frequency spectrum. I'll first explore it "manually", using the Ricker wavelet (also known delightfully as the "Mexican Hat Wavelet").

In [10]:
from scipy.signal import ricker
from numpy import convolve
# 'ricker' takes two arguments: the number of points, and the size of the main bulge
plot(ricker(200, 20))
Out[10]:
[<matplotlib.lines.Line2D at 0x7f583e33e4e0>]
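
Recent SciPy releases deprecated and then removed `scipy.signal.ricker`, so on a newer installation the import above may fail. As a fallback, here is a minimal NumPy sketch of the standard Ricker formula, A (1 - (t/a)^2) exp(-t^2 / (2 a^2)) with A = 2 / (sqrt(3a) pi^(1/4)). It is written to take the same two arguments, but treat it as a stand-in under that assumption rather than as SciPy's own code.

```python
import numpy as np

def ricker(points, a):
    """Ricker ("Mexican hat") wavelet sampled at `points` points, with width parameter `a`."""
    amplitude = 2 / (np.sqrt(3 * a) * np.pi ** 0.25)
    # sample positions centred on zero, one unit apart
    t = np.arange(points) - (points - 1.0) / 2
    return amplitude * (1 - (t / a) ** 2) * np.exp(-t ** 2 / (2 * a ** 2))

# An odd number of points puts one sample exactly at the centre of the main bulge
wavelet = ricker(201, 20)
centre = len(wavelet) // 2
print(wavelet[centre])       # peak height, equal to the amplitude factor
print(wavelet[centre + 20])  # the wavelet crosses zero at a distance of `a` from the centre
```

The width parameter `a` sets where the main bulge crosses zero, so larger values of `a` stretch the wavelet and smaller values compress it.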

Let's start by using this to analyse the unemployment time series for Austria:

In [11]:
austria_unemployment = time_series_by_country["Austria"]
plot(austria_unemployment)
Out[11]:
[<matplotlib.lines.Line2D at 0x7f583e0c6f98>]