Analyzing a Corpus of Sheet Music

In [1]:
%%html 
<style>
  table {margin-left: 0 !important;}
  img {margin-left: 0 !important;}
</style>
In [2]:
%load_ext autoreload
%autoreload 2
import jh, os
import pandas as pd
from IPython.display import display_html, IFrame
from IPython.display import HTML
from IPython.display import Video
import plotly.graph_objects as go
import plotly.express as px
import cufflinks as cf # for creating plots from pandas on the fly
cf.go_offline()
cf.set_config_file(theme='ggplot')
import numpy as np
import plotly.figure_factory as ff
import pickle

def display_side_by_side(title_df_dict):
    """Pass a {'title': DataFrame} dict."""
    stylers = [df.style.set_table_attributes("style='display:inline'").set_caption(title)._repr_html_() for title, df in title_df_dict.items()]
    display_html(''.join(stylers), raw=True)

data_ms3 = 'data/MuseScore_3'
data_tsv = 'data/tsv'
classif_dances = ['walzer', 'deutscher', 'ländler', 'menuett', 'trio']
color_dances = {'walzer': 'red',
 'ländler': 'yellow',
 'ecossaise': 'brown',
 'deutscher': 'orange',                
 'trio': 'lightblue',
 'menuett': 'blue',
 'galopp': 'gray',
 'cotillon': 'green',
 }
In [3]:
HTML('''<script>
code_show=true; 
function code_toggle() {
 if (code_show){
 $('div.input').hide();
 } else {
 $('div.input').show();
 }
 code_show = !code_show
} 
$( document ).ready(code_toggle);
</script>
The raw code for this IPython notebook is by default hidden for easier reading.
To toggle on/off the raw code, click <a href="javascript:code_toggle()">here</a>.''')
Out[3]:
The raw code for this IPython notebook is by default hidden for easier reading. To toggle on/off the raw code, click here.

Characterizing a data set of 435 short piano dances by Franz Schubert

The Authors

initials name
GC Gabriele Cecchetti
JH Johannes Hentschel
JR Johannes Rüther
SR Sébastien Rouault

Abstract

Everybody has emotional access to music, which in turn transports listeners into specific moods, atmospheres and settings. This intuitive understanding of musical pieces is grounded in structural elements such as melodic motives and rhythms, with their variations and repetitions. Our project aims at gaining an objective insight into how these patterns engender different shades of musical engagement. Specifically, we will investigate the structure and composition of Franz Schubert's "dances", a large set of 435 piano pieces classified into seven different types according to the conventions of their time. The conventionality of these short and catchy pieces makes them an ideal testing ground for computer-assisted analysis, but also makes them highly representative of the taste of early-nineteenth-century listeners. With our statistical investigation we hope to open a window onto this past: what did listeners expect of appealing dance-like music, and did different dance types correspond to different musical features or just to different social perspectives on the same musical features?

The data set

In this section we introduce the data set of sheet music that we have analyzed. First, we give a brief introduction to its historical background; then we explain the data representation that we used.

Historical Background

Between 1813 and his death in 1828, the composer Franz Schubert wrote a large number of short piano pieces, the "Dances". Throughout his life, spent entirely in Vienna, Schubert assembled a good number of collections of such pieces, aimed at meeting the taste of his hometown's bourgeoisie. While Schubert did write the long and more complex compositions for which he is famous nowadays, at the time he was better known for small and not very ambitious forms such as these "Dances". Since this genre can be considered commercial music aimed at a broad audience (a bit like pop music today), the pieces are musically quite conventional and therefore represent an ideal data set for computer-assisted analysis of the musical tastes of the early 19th century.

The data set is comprised of the following dance types:

In [4]:
files = pd.read_csv(os.path.join(data_tsv, 'files.tsv'), sep='\t', index_col=[0])
pd.DataFrame(files.dance.value_counts()).reset_index().iplot('pie', labels='index', values='dance', color=color_dances.values(), title="Dance types in Schubert's dances")

The Viennese dance floor

All the dances represented in the corpus made up the typical soundtrack of the Viennese ballrooms.

The “queen of all dances” (Feldtenstein, 1767) was undoubtedly the French Menuett. Like most dance forms of the classical repertoire, the minuet originated as an appropriation of folk tunes into the stereotyped taste of nobility and courts, and in this form it was then exported throughout Europe. Traditionally, each minuet would be coupled with a contrasting dance, the Trio: hard to characterise collectively, trios were individually crafted to provide variety by conveying a different character from the related minuet. In general, trios were more explicit in emulating the rural origins of the dance.

In [5]:
IFrame("https://www.youtube.com/embed/ZJ-jzZtARlw", 280, 160)
Out[5]:

Vid. First minuet & trio of the data set: D. 41, no. 1

Closer to their rural origins, both the Cotillon and the Ecossaise are French country dances, the latter inspired by the rhythms of Scottish folk tunes. The Galopp was also a fast-paced dance that eventually turned into the nowadays better-known Polka.

In [6]:
IFrame('https://www.youtube.com/embed/J2fUU6_DDvU', 280, 160)
Out[6]:

Vid. Galopp und 8 Ecossaisen, D. 735

Of strictly German origin are the Ländler and the Walzer. The former was characterised by a variety of body motions interrupted by clapping and feet stomping, evoking the scenery of countryside peasant dances. By contrast, the Walzer was a fast and smooth spinning dance that became explosively popular because of its intrinsic sensuality, to the point that it was formally banned by the authorities several times: "In this dance, everything is circle-shaped and whirling movement, everything designed to provoke giddiness and seduce the senses." (Vieth, 1794).

In [7]:
IFrame("https://www.youtube.com/embed/94uW9Bbuwm8", 280, 160)
Out[7]:

Vid. Walzer D. 779, no. 23

In [8]:
IFrame("https://www.youtube.com/embed/BwqjeWEp3dM", 280, 160)
Out[8]:

Vid. Ländler D. 681, no. 1

The term Deutscher Tanz, literally "German dance", was used in association with both the Walzer and the Ländler, and it is unclear whether it also identified a dance type in its own right: in a few instances, Schubert himself happened to label the same dances as Ländler or Deutscher.

In [9]:
IFrame("https://www.youtube.com/embed/29GOPJY34hU", 280, 160)
Out[9]:

Vid. Deutscher D. 643, no. 1

To know more: McKee E. (2014). Ballroom Dances of the Late Eighteenth Century in Danuta Mirka ed., The Oxford handbook of topic theory. Oxford University Press.

Handling the data

Music scores are complex graphical representations of music. Trained musicians can read music scores and use them to imagine or reproduce the symbolically encoded sounds. This section describes how we transformed the scores into our working representation, namely one note list and one measure list for the entire corpus.

The dataset consists of 435 music scores in the XML format of the open source notation software MuseScore 3 and has been compiled for this project by GC & JH using funds of EPFL's Digital and Cognitive Musicology Lab (DCML).

Fig. The first two measures of the Ecossaise D. 977, no. 2. Left: Representation in MuseScore 3. Right: Abbreviated source code of the first measure in the right hand. For explanations of the terms, refer to the section 'Hints for music notation agnostics' below.

After parsing the scores, the music seen in the figure looks like this (for details concerning the parsing and the note list features, cf. jh.ipynb):

In [10]:
note_list = jh.read_note_list(os.path.join(data_tsv, 'note_list_expanded.tsv'), index_col=[0,1,2])
note_list.loc[[424]].iloc[:11]
Out[10]:
mc mn onset duration gracenote nominal_duration scalar tied tpc midi staff voice volta timesig beats beatsize note_names octaves beat subbeat
id section ix
424 0 0 0 1 0 1/4 NaN 1/4 1 NaN -5 49 2 1 NaN 2/4 1 1/4 Db 3 1 0
1 0 1 0 1/4 NaN 1/4 1 NaN -1 77 1 1 NaN 2/4 1 1/4 F 5 1 0
2 0 1 0 1/8 NaN 1/8 1 NaN -1 77 1 2 NaN 2/4 1 1/4 F 5 1 0
3 0 1 1/8 1/8 NaN 1/8 1 NaN -1 65 1 2 NaN 2/4 1.1/2 1/4 F 4 1 1/2
4 0 1 1/4 1/4 NaN 1/4 1 NaN -1 53 2 1 NaN 2/4 2 1/4 F 3 2 0
5 0 1 1/4 1/4 NaN 1/4 1 NaN -4 56 2 1 NaN 2/4 2 1/4 Ab 3 2 0
6 0 1 1/4 1/4 NaN 1/4 1 NaN -5 61 2 1 NaN 2/4 2 1/4 Db 4 2 0
7 0 1 1/4 1/8 NaN 1/8 1 NaN -4 68 1 2 NaN 2/4 2 1/4 Ab 4 2 0
8 0 1 1/4 1/4 NaN 1/4 1 NaN -5 73 1 1 NaN 2/4 2 1/4 Db 5 2 0
9 0 1 3/8 1/8 NaN 1/8 1 NaN -1 65 1 2 NaN 2/4 2.1/2 1/4 F 4 2 1/2
10 1 2 0 1/4 NaN 1/4 1 NaN -4 44 2 1 NaN 2/4 1 1/4 Ab 2 1 0

Hints for music notation agnostics

This section briefly explains some of the main elements that music scores contain. Please compare the explanations with the example score above.

The words 'measure' and 'bar' can be used interchangeably. Measures are separated by vertical bar lines designating music chunks of equal length. They also give information about the temporal hierarchy of the notes, with the ones directly after the bar line generally being the most accented.

Note that MuseScore XML files contain one \<Measure> node for every bar line and that we address these units with running numbers starting at zero, which we call measure counts (mc). The measure numbers (mn), however, are those that are displayed/printed in the score. Since, say, a measure of length 3/4 (see time signature) can be split into two measures of irregular length due to musical and notational conventions, measure numbers frequently span multiple measure counts and had to be computed by the parser following the usual conventions.

A staff (pl. 'staves') is a set of usually five horizontal lines where the positioning of note heads on or between lines determines their pitch, i.e. their vertical position in terms of note name and octave.

A clef determines which line represents which pitch.

A key signature (KeySig) further specifies the exact pitches and reveals information as to which key the piece is in. A key could be defined as a hierarchy between the occurring pitches. In the example figure above, the KeySig of 5 $\flat$ means that the key of this piece is either D$\flat$ major or B$\flat$ minor and that the pitches B, E, A, D, G have to be lowered throughout the piece. Lowering ($\flat$) or raising ($\sharp$) a pitch can be understood as playing the pitch just below or just above, often resulting on the piano in playing the neighbouring black key instead of a white key.

Fig. One octave on a piano keyboard with the most common pitch names and their tonal pitch class (tpc). If you take the leftmost C to be middle C on the piano, C4, these keys have the MIDI numbers 60-72.

A time signature (TimeSig) stipulates the measure length and the metre, i.e. which onset positions ('beats') are the most important. In our data set, the only time signatures are 2/4 (2 quarter notes per measure, 'double meter') and 3/4 (3 quarter notes per measure, 'triple meter').

Dynamic markings contain information on how loud something is to be played.

Slurs reveal phrasing information, meaning which parts of the music are to be thought of and played as units.

Research questions and hypotheses

With our statistical investigation we hope to open a window into the past: what did listeners expect of appealing dance-like music, and did different dance types correspond to different musical features or just to different social perspectives on the same musical features?

Recurring Patterns (JH & SR)

What recurring patterns and regularities make Schubert's dances so intuitively appealing and easy to grasp? To support our claim that the dances are rather conventional music, we want to show that a large portion of the corpus is made up of just a handful of recurring patterns. We will investigate the most common patterns on the levels of

* rhythms
* harmonies
* formal sections

Dance Type Classification (GC & JR)

Are the different dance types musically distinguishable? Which features will a classifier use to most accurately distinguish the different dance types?

Whether the clues that make different dances sound different are hidden in the score or rather in unwritten conventions is an open question: the score does not contain all of the audible properties of a musical performance. However, it is possible that at least some of the audible features that make a dance different from another are represented in the score, and thus in our dataset. Will our machine learning model be able to learn from the score, without any access to the shared social understanding of the music?

This task allows for two possible outcomes: if we manage to design features that enable a satisfying automatic classification through a machine learning algorithm, this would show that the different dance types are indeed characterized by features contained in some way or another in their scores. If, on the other hand, a satisfying classification is not possible, this can either mean that our approach has proven inadequate or that dance types have been attributed somewhat arbitrarily to quite similar compositions.

Recurring patterns

Most common rhythms

To get a first overview of the most common rhythms appearing in Schubert's dances, we extract and count the onset patterns of every measure. We define an onset pattern as an ordered list of unique onsets. Consider for example the first measure of the Walzer D. 924, no. 1. In the left hand, there are only three quarter notes, which corresponds to the onset pattern [0, 1/4, 1/2], whereas the right hand also has eighth notes and the onset pattern [0, 1/4, 3/8, 1/2, 5/8]. The latter is also the onset pattern of the whole measure since it has the more fine-grained onset distribution and the left-hand rhythm only duplicates onsets. Since lists of fractions are unintuitive feature representations, we decided to translate them into rhythmical language by mapping all possible divisions, the rhythmical atoms, to syllables as they are frequently used by music pedagogues. The following table lists example onset patterns together with their lengths (a code sketch of the pattern extraction follows the table):

pattern onset pattern length
A [0] 3/4
B [0] 1/2
C [0, 3/8] 1/2
D [0] 1/4
E [0, 1/8] 1/4
F [0, 1/16, 1/8, 3/16] 1/4
G [0, 3/16] 1/4
H [0, 1/32, 1/16, 3/32, 1/8, 7/32] 1/4
I [0, 1/12, 1/6] 1/4
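
To make the extraction concrete, here is a minimal sketch (not the actual notebook pipeline; column names follow the note list shown earlier, with 'onset' given in fractions of a whole note):

from fractions import Fraction
import pandas as pd

notes = pd.DataFrame({
    'id':    [0, 0, 0, 0, 0],
    'mc':    [1, 1, 1, 1, 1],
    'onset': [Fraction(0), Fraction(0), Fraction(1, 4),
              Fraction(3, 8), Fraction(1, 2)],
})

# One onset pattern per measure: the sorted tuple of unique onsets.
patterns = (notes.groupby(['id', 'mc'])['onset']
                 .apply(lambda s: tuple(sorted(s.unique()))))
print(patterns.iloc[0])  # (Fraction(0, 1), Fraction(1, 4), Fraction(3, 8), Fraction(1, 2))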

Most frequent patterns in double meter (2/4)

In this plot we show the most frequent onset patterns for all pieces with time signature 2/4 throughout the corpus. The analysis is split into the patterns of entire measures, those of the right hand, and those of the left hand. The most frequent patterns are very common and rhythmically transparent combinations of the atoms

  • Tao (half note)
  • Ta (quarter note)
  • Titi (two eighths)
  • Tigitigi (four sixteenths)
  • Triole (triplet of eighths)

Two spikes among the left-hand patterns (the slower rhythms TaTa and Tao) show that these are particularly common for the left hand, which mostly fulfills an accompanying function.

In [11]:
pd.read_csv(os.path.join(data_tsv, 'onset_pattern_double.tsv'), sep='\t', index_col=0).iplot('bar', 
               layout=dict(yaxis={'type': 'log', 
                                  'title': 'Total count'},
                           xaxis={'title': 'Onset pattern'},
                           title='Most frequent 2/4 onset patterns throughout the corpus'))

Most frequent rhythms in triple meter (3/4)

The most frequent rhythms among the triple meter dances are once again composed of the basic rhythmic cells mentioned above, although the right hand in particular also frequently features the dotted rhythms Taiti (dotted quarter plus eighth) and Timgi (dotted eighth plus sixteenth).

In both cases, the patterns including faster note values (eighths and sixteenths) occur more rarely or not at all in the left hand. This reflects the general listening impression (see below) that very often the left hand sticks to stereotypical chordal accompaniments while the right hand adds more animated melodies above. The most typical chordal accompaniments - both consisting of quarter notes only - can be onomatopoetically baptized "oom-pah" and "oom-pah-pah" (see examples in the next sections).

In [12]:
pd.read_csv(os.path.join(data_tsv, 'onset_pattern_triple.tsv'), sep='\t', index_col=0).iplot('bar', 
               layout=dict(yaxis={'type': 'log', 
                                  'title': 'Total count'},
                           xaxis={'title': 'Onset pattern'},
                           title='Most frequent 3/4 onset patterns throughout the corpus'))

Most stereotypical pieces in double meter

To find the rhythmically most stereotypical pieces, we take the most frequent onset pattern in the right hand, TitiTiti, and the one in the left hand, TaTa, and check which pieces have the highest percentage of these patterns among all their measures. Here we have two typical instances of an "oom-pah" accompaniment:

id TitiTiti_right TaTa_left piece
91 0.8125 0.8750 D145ecossaise02.mscx
257 0.8750 0.8750 D697ecossaise03.mscx

Fig. Ecossaise D. 145, no. 2

In [13]:
IFrame("https://www.youtube.com/embed/BMDKatK31P4?start=37", 260, 180)
Out[13]:

Fig. Ecossaise D. 697, no. 3

In [14]:
IFrame("https://www.youtube.com/embed/899EiZ5Se8o?start=71", 260, 180)
Out[14]:

Most stereotypical pieces in triple meter

To find the rhythmically most stereotypical pieces, we take the most frequent onset pattern in the right hand, TitiTitiTiti, and the one in the left hand, TaTaTa, and check which pieces have the highest percentage of these patterns among all their measures. The Walzer shows a typical instance of the oom-pah-pah accompaniment, whereas the Ländler features a different left-hand figure, although with the same onset pattern.

id TitiTitiTiti_right TaTaTa_left piece
163 0.9583 0.8750 D365walzer14.mscx
269 0.9167 0.9167 D734ländler08.mscx

Fig. Walzer D. 365, no. 14

In [15]:
IFrame("https://www.youtube.com/embed/oSue9KysoQY", 260, 180)
Out[15]:

Fig. Ländler D. 734, no. 8

In [16]:
IFrame("https://www.youtube.com/embed/8SiqQdPsNO4?start=377", 260, 180)
Out[16]:

Analyzing harmonic patterns

The words 'harmony' and 'chord' are used interchangeably in this section.

Analyzing harmonies is generally seen as the task of separating chord tones from non-chord tones. The more complex the music, the more disagreement there is amongst human experts about this separation. If, however, as we claim, the dances are written in a very conventional style, simple heuristics should be sufficient to analyse most of the harmonies, and very few harmonies should account for a large part of the corpus.

In general, the problem of chord classification is linked to a score segmentation problem: at which time points in the score do we start considering different notes as chord and non-chord tones? But if we consider the four previous music examples, three striking facts about their harmonic content become evident:

  • In none of the examples do we see more than one harmony per measure.
  • The bass note of every chord is always the lowest note in the measure.
  • In the four measures of every example, no harmonies other than tonic (I) and dominant (V) appear.

We use these observations to create our heuristics for the chord classification. Since the simplest piece would contain tonic and dominant chords exclusively, we will afterwards check whether such overly simplistic pieces indeed appear.

Creating chord labels

To make the note constellations of all pieces comparable, we have transposed them to the key of C major/A minor, so that a C major chord can be considered the tonic in major pieces, and an A minor chord the tonic in minor pieces:

mode tonic harmony dominant harmony
major C major G major or G dominant7
minor A minor E major or E dominant7
Roman numeral I V or V7

From our previous observations we have constructed the following heuristics:

  • For every measure, we assume one harmony (i.e., one chord)
  • and consider the lowest note as being the bass note.

Then we express all other notes as intervals relative to the bass note and map interval constellations to chord types. In the following table, you see the result of our chord classification for the four previous music examples (four measures each). For every measure, you see the bass note name, the intervals of all sounding notes above the bass and the label that has been assigned to the chord type. As you can see, dominants, as well as all other chords, can occur in several permutations, depending on which of the chord tones is the bass note. The permutations are indicated by (thorough bass) numbers ending the label.
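
As an illustration, here is a minimal sketch of such a bass-relative classification. It is a toy version using semitone intervals and a heavily abbreviated, hypothetical lookup table; the actual classification uses tonal intervals (M3, P5, ...) as shown in the table below.

# Toy bass-relative chord classification: the lowest note is the bass,
# all other notes are reduced to semitone intervals above it (mod 12),
# and the resulting constellation is looked up in a table.
CHORD_TYPES = {                      # hypothetical, heavily abbreviated
    frozenset({4, 7}):     'major',
    frozenset({3, 7}):     'minor',
    frozenset({4, 7, 10}): 'dominant7',
}

def classify_measure(midi_pitches):
    bass = min(midi_pitches)
    intervals = frozenset((p - bass) % 12 for p in midi_pitches if p != bass)
    return bass, CHORD_TYPES.get(intervals, 'others')

print(classify_measure([48, 64, 67]))      # C major triad -> (48, 'major')
print(classify_measure([55, 71, 74, 77]))  # G dominant7   -> (55, 'dominant7')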

In [17]:
chord_profiles = jh.read_chord_profiles(os.path.join(data_tsv, 'chord_profiles.tsv'))
pd.concat([chord_profiles.loc[id, ['bass', 'intervals', 'labels']].iloc[:4] for id in [91, 257, 163, 269]], keys=[91, 257, 163, 269], names=['id', 'mn'])
Out[17]:
bass intervals labels
id mn
91 1 C (M3, P5) major
2 G (M3, m7) dominant7
3 C (M3, P5, M6) major
4 G (M3, P5, m7, M7) dominant7
257 1 G (M3, P5, M7, m7) dominant7
2 C (M3, P5, M6) major
3 G (M3, P5, m7) dominant7
4 C (M3, P5) major
163 1 C (M3, A4, P5) major
2 C (M2, M3, P5) major
3 B (m3, D5, P5, m6) dominant65
4 G (M2, M3, P5, M7) major
269 1 C (M3, P5, M6) major
2 D (m3, P4, P5, M6) dominant43
3 G (M3, P5, M6, m7) dominant7
4 C (M3, P5) major

Chord distributions

The vast majority of bass notes (95 %) are those of the diatonic scale (i.e. white keys of the piano in our C major case). Only 5 % of all bass notes have accidentals (black piano keys). 65 % of all bass notes are on C or G. Depending on which chord types are sounding above these bass notes, this is already a good indicator of an easy-to-grasp harmonic language.
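
The helper jh.cumulative_fraction used below presumably amounts to something like the following sketch: normalised value counts, cumulatively summed, so that the curve shows how few distinct values cover most of the corpus.

import pandas as pd

def cumulative_fraction(series):
    # Fraction of the corpus covered by the most frequent values,
    # accumulated from the most to the least frequent value.
    return (series.value_counts(normalize=True)
                  .cumsum()
                  .rename('y')
                  .to_frame())

# cumulative_fraction(chord_profiles.bass) would then show how few
# distinct bass notes account for most of all bass notes.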

In [18]:
jh.cumulative_fraction(chord_profiles.bass).iplot('scatter', y='y', xTitle='bass notes', yTitle='cumulative percentage', title='Which bass notes together make up for how much of all bass notes', )

If we look at the different chord types, we see once again a similar pattern, namely that very few types together (over 70 % are unpermuted major, dominant7 and minor chords) account for the largest part of the corpus.

In [19]:
jh.cumulative_fraction(chord_profiles.labels).iplot('scatter', y='y', xTitle='chord types', yTitle='cumulative percentage', title='Which chord types together make up for how much of all chord tokens')

If we aggregate the permutations of the different chord types, the picture becomes even clearer: even when considering the chord types exclusively, the harmonic language seems to be quite conventional. Otherwise, we would have seen more chords in the aggregation category 'others', which accounts for measures that were ambiguous because of too many or too few notes. Only 8 % of chords could not be attributed to one of the basic chord types.

In [20]:
type_dist = pd.read_csv(os.path.join(data_tsv, 'type_distribution.tsv'), sep='\t', index_col=0)
type_dist.sort_values(by='fraction', ascending=False).iplot('bar', xTitle='chord category', yTitle='percentage of the corpus', title='Fractions of the different chord categories')

To see how complex the harmonic development is in Schubert's dances, we use as a rough measure the number of distinct chords per piece, where we define a chord as a tuple (bass_note, chord_type). A tonic chord would be ('C', 'major') in major and ('A', 'minor') in minor, and so on.

In [21]:
chord_tuples = pd.read_csv(os.path.join(data_tsv, 'chord_tuples.tsv'), sep='\t', index_col=[0, 1], converters={'chords': jh.parse_tuples})
(chord_tuples.groupby('id').nunique().chords.value_counts() / 435).iplot('bar', xTitle='#distinct chords in one piece', yTitle='percentage', title='How many pieces have how many distinct chords?')

If we look at the distribution over the occurring chords of the corpus, a striking 80 % consists of only tonic chords and their permutations (C major, C suspension, E major6, G major64, A minor), dominant chords and their permutations (G dominant7, B dominant65, D dominant43, F dominant2, G major, E dominant7, G# dominant65, E major, E major6), and predominant chords (F major, D minor, D dominant7). From more complex or unconventional music we would expect many more chords to diverge from this tonic region.

In [22]:
chord_dist = jh.cumulative_fraction(chord_tuples.chords)
chord_dist.index = chord_dist.index.map(str)
chord_dist['y'].iplot('scatter', y='y', dimensions=(3000,500))

Repeating pattern detection with auto-correlation

One common task in musicology is to identify repeating patterns in a piece.

For instance, a piece may be composed of 3 parts A, B, C, repeated as: A-B-A-C.

Here we will focus on automating this detection process in our dataset of dance pieces.

In [23]:
# Compute all the graphs for this part of the data story 
from tools import datastory_patterns as dsp

How to identify a repeating part from a new part?

The criterion for distinguishing one part from another might be considered listener-dependent. One listener could, for instance, identify a new part whenever the overall melody changes, but what counts as enough melodic change is left to arbitrary judgement.

As an arguably more unanimous (and computable) way to distinguish parts, we will consider that segment A repeats in segment B when a sufficient fraction of the notes of A appears in B. Our first step to identify repeating patterns will then be to look for common notes between every two segments A and B.

In classical music, a part necessarily spans a whole number of measures. To find repetitions, we will therefore compare every measure of a piece with every other measure of the same piece.

The operations we will perform are called auto-correlations.

Definition of an auto-correlation for a musical piece

Let a piece $A$ be a sequence of measures (in the musical sense) $a_x, x \in \left\lbrace 0 ~..~ n \right\rbrace$.

The auto-correlation of a segment $A_{\alpha, \beta} \triangleq \left[ a_\alpha ~..~ a_\beta \right]$ of $A$ over $A$ is defined by: $$C_{A, \alpha, \beta}\left( \delta \right) = \sum\limits_{i=\alpha}^{\beta}{a_i \cdot a_{i + \delta}}$$

We then need to define a product function '$\cdot$' that counts how many notes match between the two given measures $a_x$ and $a_y$, that is, the number of notes having the same pitch and the same onset (i.e., start time relative to the beginning of the measure). When a measure lies outside the piece, it is considered empty: its product with any other measure is 0. There is a small peculiarity though: a sharp C is not equivalent to a flat D, although the same key would be pressed on a piano. Writing a flat D in a composition instead of a sharp C may reflect a global key change, which may be a hint for a section change.

So we will consider two product functions: one matching onset and piano key (C♯4 matches D♭4, but C♯4 does not match C♯5), another matching onset and pitch class (C♯4 does not match D♭4, but C♯4 matches C♯5). The former function will be called the key-product, and the latter the class-product.
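
The following sketch shows one way these definitions could be implemented. The note representation is an assumption: each measure is a list of dicts with 'onset', 'midi' and 'tpc' fields, as in the note list shown earlier.

def key_product(m1, m2):
    # Count notes matching on onset and piano key (MIDI number):
    # C#4 matches Db4, but C#4 does not match C#5.
    a = {(n['onset'], n['midi']) for n in m1}
    return len(a & {(n['onset'], n['midi']) for n in m2})

def class_product(m1, m2):
    # Count notes matching on onset and tonal pitch class (tpc):
    # C#4 does not match Db4, but C#4 matches C#5.
    a = {(n['onset'], n['tpc']) for n in m1}
    return len(a & {(n['onset'], n['tpc']) for n in m2})

def autocorrelation(measures, product, delta):
    # C(delta) over the whole piece; measures outside the piece are
    # empty, so their products contribute 0 and can be skipped.
    return sum(product(measures[i], measures[i + delta])
               for i in range(len(measures) - delta))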

Using auto-correlation for initial section detection

If a piece $A$ has repeating sections, we should expect to observe relatively high products $a_x \cdot a_y$ for $x \neq y$.

Plotting each pairwise product $a_x · a_y$ as a heatmap for the first piece (D41 menuett n°1), we obtain, for the key- and class-products respectively:

In [24]:
dsp.heatmap_piece1_pitch
In [25]:
dsp.heatmap_piece1_class

A first trivial observation is that these matrices are symmetric, as both product functions are commutative. Another observation is that the diagonal stands out. No surprise: every note of a measure is present in that same measure, hence the generally higher products on the diagonal, which correspond to the number of notes in each measure.

The heatmaps also show a kind of "supplementary diagonals", which indicate part repetitions. Indeed, suppose a piece is composed of two distinct parts arranged as A-B-A, each part lasting 4 measures. Then one can expect such a supplementary diagonal to start at measure index 8, marking the repetition of A.

Repetitions can also be guessed by looking at the rows of a few measures. Taking for instance measure two, we see that it repeats in 3 places, with one "black" spot. This tends to indicate a structure A-A-B-A.

There is a problem though. Quite often measures do not repeat much within a part, especially the last measure of a part (take for instance measures 3, 7, 11, 16, but also measure 0 of D41 menuett n°1). This makes it hard to estimate part lengths.

To overcome this issue, we will instead auto-correlate the whole piece with itself. The rationale is that when the offset $\delta$ corresponds to a part offset (in our first example 8, and in the menuett above 4, 8, 12), a spike should appear, as we then auto-correlate a whole part and not single measures. This approach is expected to work well when part lengths are of fixed size (or multiples of one another), which can be expected for many classical pieces.

The autocorrelation of the first piece of our corpus, D41 menuett n°1, gives:

In [26]:
dsp.autocor_piece1
Out[26]:

From initial section detection to dance structure

For sections of the same size, spikes correspond to part boundaries. In the example above, the part boundaries are at (0-4), (4-8), (8-12) and (12-16). To automate this detection, we need a filter detecting these spikes. We use a simple rolling window $w$ of size 3 and detect the offsets $k$ for which $w\!\left[k-1\right] < w\!\left[k\right]$ and $w\!\left[k\right] \ge w\!\left[k+1\right]$.
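
A minimal implementation of this filter could look as follows:

def detect_spikes(w):
    # An offset k is a spike when w[k-1] < w[k] and w[k] >= w[k+1].
    return [k for k in range(1, len(w) - 1)
            if w[k - 1] < w[k] and w[k] >= w[k + 1]]

print(detect_spikes([5, 2, 9, 9, 1, 7, 3]))  # -> [2, 5]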

Now that we have the part boundaries, we still need to determine which segments correspond to the same part. To answer this question, each segment (0-4), (4-8), ... is correlated (always at $\delta = 0$) with each of the other segments of the same size. This allows us to identify where a part repeats, e.g. in D41 menuett n°1: (0-4) repeats in (4-8) and (12-16), as these segments have a high-enough pairwise correlation, but not in (8-12).

"High enough" is an arbitrary criterion. In our automated process, we use the notion of a trigger (a threshold, strictly speaking): if the auto-correlation of detected parts X and Y, divided by the number of notes in X, is below the trigger (a value between 0 and 1), the parts are considered distinct (i.e. Y is not X). The normalization is a primitive form of regularization, favoring the repetition of less complex measures. It also makes the parameter interpretable: it is the minimal fraction of notes that must repeat (same onset and same key or pitch class) for a repetition to be detected.

There is also a mechanism to merge adjacent parts for cosmetic reasons. For instance if a piece has a detected structure C-C-D-D-C-C, then the parts C-C and D-D can be rewritten as A = C-C and B = D-D, resulting in a visually more readable structure A-B-A.

At this point, we have a function that translates a piece (a list of notes) into its structure (a list of letters), given a trigger value. The higher the trigger, the more certain we are that repetitions actually occurred. Now we add one more piece of prior knowledge: most of the studied pieces are short and actually contain only 2, or more rarely 3, sections. Using this information, for each piece we run the structure detection algorithm for triggers in $\left[ 0.1 ~..~ 0.9 \right]$ in steps of $0.1$, keeping the highest trigger that leads to a structure with a minimal number of different parts (and, of course, at least two parts).
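
Sketched in code, with a hypothetical detect_structure(piece, trigger) function standing in for the algorithm described above:

def best_structure(piece, detect_structure):
    # Sweep triggers 0.1 .. 0.9; among the structures with the fewest
    # distinct parts (at least two), keep the one with the highest trigger.
    best = ([], 0.0)
    fewest = float('inf')
    for t in (x / 10 for x in range(1, 10)):
        structure = detect_structure(piece, trigger=t)
        n_parts = len(set(structure))
        if 2 <= n_parts <= fewest:   # '<=' keeps the highest trigger on ties
            fewest = n_parts
            best = (structure, t)
    return best                      # ([], 0.0) means no structure found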

The results are laid out in the table below. HR stands for the key-product and TR for the class-product. The confidence column is only for readability (it is directly derived from the trigger, e.g. $0.4 \le trigger < 0.6$ maps to "presumably"). A trigger of $0$ means that no structure was found.

In [27]:
dsp.structures
Out[27]:
name HR-structure TR-structure HR-confidence TR-confidence HR-trigger TR-trigger
0 D41 menuett n°1 [A, A, B, A] [A, A, B, A] confident confident 0.6 0.7
1 D41 trio n°1 [A, A, A, A, A, B, A, C] [A, A, A, A, A, B, A, C] wild guess unsure 0.1 0.2
2 D41 menuett n°2 [A, B, A, A] [A, B, A] wild guess confident 0.1 0.6
3 D41 trio n°2 [A, A, A, B, A, A, A, A] [A, A, A, B, A, A, A, A] unsure unsure 0.2 0.3
4 D41 menuett n°3 [A] [A, B, A] --- confident 0.0 0.6
... ... ... ... ... ... ... ...
430 D978 walzer n°1 [A, A, A, B, A, A] [A, A, A, B, A, A] unsure presumably 0.3 0.4
431 D979 walzer n°1 [A] [A, B, A, B, C] --- presumably 0.0 0.5
432 D980 walzer n°1 [A, A, A, B, A, A, C] [A, A, B, A, A, A, A] unsure unsure 0.2 0.2
433 D980 walzer n°2 [A, B, A, C] [A, B, A, C] unsure confident 0.3 0.6
434 D679 ländler n°2 [A, B, A, C] [A, B, A, B, B, C] certain confident 0.8 0.6

435 rows × 7 columns

Classifying dance types

This huge repertoire of piano pieces was intended for the private enjoyment of the Viennese bourgeoisie, whose social gatherings invariably involved music and, of course, group dancing. Accordingly, each piece in the corpus bears as a title the name of one of the most common dance types of Schubert’s times. The title could not be assigned randomly, as the music had to smoothly match the common-sense expectations regarding the specific choreography that characterised each dance. Musicians and dancers were often amateurs, and yet, whenever the word “Walzer” or “Ländler” was heard, everyone would have known how the music would sound and how to move on the dancing floor. However, it is by no means trivial to tell what the exact differences among dance types were, and whether these differences were encoded in the score or rather in the unspoken conventions of a specific social group, the bourgeoisie, striving to establish its status and identity in the powerful and dynamic environment of early-19th-century Vienna.

For instance, the Walzer and the Ländler could be experienced as the extremes of a continuum in a role-playing game across social classes, with the refined Walzer representing the high-class status eventually degenerating, but just for apotropaic fun, into the rural Ländler (Witzmann, 1976).

In the following, we will attempt to single out musical features from the score that can be hypothesised to characterise different dance types, and assess to what extent they account for the original destination of the pieces as intended by the composer.

Features

Since there are only a few instances of Galopp and Cotillon in the corpus, we will not include these dances in our investigation. Among the remaining dances, the Ecossaises are the only ones in duple meter (oom-pah-oom-pah rather than oom-pah-pah-oom-pah-pah): the time signature alone would provide perfect accuracy in recognising this dance type. Accordingly, we will focus on the non-trivial task of classifying the five triple-meter dance types: the French Menuet and Trio, and the German Ländler, Walzer and Deutscher.

Complexity

Simplicity is the keyword for a popular musical product aimed at universally meeting the taste, and the keyboard skills, of the Viennese bourgeoisie. A mass product for private consumption, somewhat similar to today’s playlists, these dances were to be enjoyed in rather informal settings within the familiar social atmosphere of the ballroom. All in all, the piano dance was not the ideal testing ground for compositional experimentalism. The entropy of the note distribution over the corpus reflects this: compared to music from earlier and later composers, the dances show a more selective and hierarchically distributed usage of pitches (lower entropy). In one word, they are remarkably simple.
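
As a sketch of what we mean by entropy here (assuming the standard Shannon entropy of the pitch distribution):

import numpy as np

def pitch_entropy(pitches):
    # Shannon entropy (bits) of the distribution of pitches in a piece:
    # low entropy = few pitches used, with a strongly skewed distribution.
    _, counts = np.unique(pitches, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

print(pitch_entropy(['C', 'C', 'G', 'C']))  # -> ~0.81 bits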

In [28]:
entropy_comparisons = pd.read_csv(os.path.join(data_tsv, 'entropy_comparisons.tsv'), sep='\t', index_col=0)
fig = go.Figure(data=go.Scatter(
        mode = 'markers',
        x = entropy_comparisons.index.tolist(),
        y = entropy_comparisons['mean'].tolist(),
        error_y=dict(
            type='data', # value of error bar given in data coordinates
            array=entropy_comparisons['95% CI: +-'].tolist(),
            visible = True),
    ))

fig.add_shape(
        # Line Horizontal
        go.layout.Shape(
            type="line",
            x0 = 0,
            x1 = len(entropy_comparisons),
            y0=entropy_comparisons.loc["Schubert (1797-1828): Dances"]['mean'],
            y1=entropy_comparisons.loc["Schubert (1797-1828): Dances"]['mean'],
            line=dict(
                color="red",
                width=1,
                dash="dot",
            ),
    ))

fig.update_layout(
    title="Mean entropies of Schubert, DCML and Snyder corpora",
    yaxis_title="Mean entropy (95% CI)",
    height = 800)
fig.show()

Fig. Mean entropies with 95% confidence intervals from the Schubert Dances corpus, the DCML corpus and the corpus analyzed in Snyder, J. L. (1990). Entropy as a Measure of Musical Style: The Influence of A Priori Assumptions, Music Theory Spectrum 12(1).

Among the various dance types, both Trios and Ländler have a significantly lower entropy on average than Deutscher, Walzer and Menuetts. In terms of pitch content, they are simpler than the other dance types.

What about the rhythm? The following plots show the entropy of the distribution of rhythmic durations and of rhythmic onsets: the higher the corresponding entropy, the more varied the notated and perceived rhythmic texture, respectively. In general, the “German” dances tend towards a simpler texture than the “French” ones, perhaps reflecting the stronger link to their folk origins.

In [29]:
fig = go.Figure()

for dance in classif_dances:
    fig.add_trace(
        go.Box( 
             y= files.set_index('dance', append=True).entropy.unstack(level=1)[dance].dropna().to_list(),
               name=dance,
             line = dict(color = color_dances[dance])
        )
    )
    
for dance in classif_dances:
    fig.add_trace(
        go.Box( 
             y= files.set_index('dance', append=True).duration_entropy.unstack(level=1)[dance].dropna().to_list(),
               name=dance,
            visible = False,
            line = dict(color = color_dances[dance])
    ))

for dance in classif_dances:
    fig.add_trace(
        go.Box( 
             y= files.set_index('dance', append=True).onset_entropy.unstack(level=1)[dance].dropna().to_list(),
               name=dance,
            visible = False,
            line = dict(color = color_dances[dance])
    ))


fig.update_layout(
    updatemenus=[
        go.layout.Updatemenu(
            active=0,
            buttons=list([
                dict(label="Pitch entropy",
                     method="update",
                     args=[{"visible": [True]*len(classif_dances)+[False]*len(classif_dances)+[False]*len(classif_dances)},
                           {"title": "Entropy of pitch distribution",
                            }]),
                dict(label="Duration entropy",
                     method="update",
                     args=[{"visible": [False]*len(classif_dances)+[True]*len(classif_dances)+[False]*len(classif_dances)},
                           {"title": "Entropy of duration distribution",
                           }]),
                dict(label="Onset entropy",
                     method="update",
                     args=[{"visible": [False]*len(classif_dances)+[False]*len(classif_dances)+[True]*len(classif_dances)},
                           {"title": "Entropy of onset distribution",
                            }]),    
            ]),
        )
    ])

fig.update_layout(title_text="Entropy of pitch and rhythm distributions")

fig.show()

Key

Intro

A piece of tonal music is like a journey across different regions, each being characterised by specific sonorities, moods, colours. These features of a tonal region arise as a consequence of the choice of notes and relationships among notes that the composer assembles in that particular spot in the music. In one word, a key: with this term, we refer to (1) a set of notes and (2) a hierarchy among them, where one note (the tonic) can be thought of as the center while all other notes have different functions with respect to the tonic itself.

In each piece, one key is particularly important, as it marks the final goal and often also the starting point of the music: it is the global key of the piece. However, just as painters choose colours from their palette, composers can change the “colour” of the music by shifting from key to key as the piece unfolds.

The most striking distinction is between so-called major and minor keys, that roughly correspond to happy and melancholic moods respectively. Furthermore, with respect to the global key of a piece, each other major and minor key conveys different shades of these moods.

Methodology

The musicological understanding of a piece relies on a detailed analysis of what is written in the score to infer the keys that are encountered as it unfolds. However, to what extent does this score-based mode identification agree with the perceptual “feel” of a key?

Extensive research has shown that Western listeners have consistent expectations about how well individual notes fit into a key. For example, the following plot shows how much each of the twelve notes is perceived as fitting into the C major or C minor key: these are the so-called C major and C minor key profiles.

[Krumhansl, C.L. & Kessler, E. J. (1982). Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review, 89(4), 334–368.]

In [30]:
maj_min_key_profiles = pd.DataFrame(np.array([[6.20, 2.55, 3.45, 2.85, 4.22, 4.57, 2.67, 5.25, 2.45, 3.35, 2.70, 2.70],[6.03, 3.35, 3.67, 5.28, 2.58, 3.55, 2.87, 4.80, 4.35, 2.67, 2.50, 3.42]]).transpose(), columns = ['major', 'minor'], index = ['C', 'C#/Db',  'D',  'D#/Eb', 'E','F', 'F#/Gb', 'G','G#/Ab', 'A', 'A#/Bb', 'B']).iplot('scatter', mode = 'lines+markers', title = 'Major and minor key profiles (Krumhansl & Kessler 1982)', xTitle = 'pitch class', yTitle = 'rating')
maj_min_key_profiles

Any fragment of music yields a bag of notes, i.e. the frequency distribution of the notes occurring in the fragment itself. The correlation between this bag of notes and an individual key profile is a proxy for how perceptually close the fragment is to that key. A fragment can closely match one key, i.e. sound decidedly in that key, based on the corresponding perceptual expectations, or it can be ambiguously equidistant to several keys.

[Cf. Krumhansl, C. L. (1990). Cognitive Foundations of Musical Pitch. Oxford University Press.]
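
A minimal sketch of this correlation measure (our assumption of the computation, using the Krumhansl-Kessler C major profile from the plot above):

import numpy as np

KK_MAJOR = np.array([6.20, 2.55, 3.45, 2.85, 4.22, 4.57,
                     2.67, 5.25, 2.45, 3.35, 2.70, 2.70])

def key_correlation(bag_of_notes, profile=KK_MAJOR):
    # Pearson correlation between a 12-dimensional pitch-class
    # distribution and a key profile; rolling the profile (np.roll)
    # yields the other eleven transpositions.
    return float(np.corrcoef(bag_of_notes, profile)[0, 1])

bag = np.zeros(12)
bag[[0, 4, 7]] = [4, 3, 3]             # a fragment made of C, E and G only
print(round(key_correlation(bag), 2))  # high correlation with C major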

Tracking key

It is possible to track the emotional journey of a dance by following the unfolding of keys in the piece, each key corresponding to a different musical colour or shade. Listen to the animated example: can you follow the shifting keys and interpret the trajectories?

In [31]:
IFrame("https://www.youtube.com/embed/6cwMEQ8ODz0", width = 800, height=450)
Out[31]:

Now you can explore the emotional path of a few dances of your choice by inspecting the corresponding key trajectories. In some cases, there is one clear dominant key, while there are also moments where several keys compete with each other for tonal primacy, resulting in a more ambiguous and tormented expression.

In [32]:
with open(os.path.join('data', 'pickle', 'key_trajectory_dict.pkl'), 'rb') as f:
    key_traj_dict = pickle.load(f)
    
# Specify the dance IDs that should appear in the dropdown
selection = [1, 2, 3, 24, 200, 300, 400]

buttons = []
lines = []
lines_idx_dict = {}
counter = 0

# Create the lines for all trajectories
for idx, dance in enumerate(selection):
    counter_before = counter
    key_trajectories = key_traj_dict[dance]
    if idx == 0:
        visible=True
    else:
        visible=False

    for key in key_trajectories:
        key_string = str(key[1])+' '+str(key[0])
        line = go.Scatter(x=np.arange(len(key_trajectories[key])),
                          y=key_trajectories[key].to_list(),
                          name=key_string,
                          visible=visible,
                          mode = 'lines',
                          showlegend=True)
        
        lines.append(line)
        counter = counter + 1
        
    lines_idx_dict[dance] = (counter_before, counter)
    
# Create the buttons that toggle the visibility
for idx, dance in enumerate(selection):
    true_false_list = [False] * len(lines)
    label = 'D' + str(files.loc[dance]['D']) + ' ' + str(files.loc[dance]['dance']).capitalize() + ' N. ' + str(files.loc[dance]['no'])    
    visible_list = true_false_list
    
    for i in range(lines_idx_dict[dance][0], lines_idx_dict[dance][1]):
        true_false_list[i] = True
    
    button = dict(label=label,
                  method='update',
                  args=[{'visible': true_false_list}])
    buttons.append(button)
    
updatemenus = [dict(buttons=buttons, direction='down', showactive=True)]

layout = go.Layout(
    title='Key Trajectories',
    updatemenus=updatemenus,
    xaxis_title='Bars',
    yaxis_title='Correlation',
    yaxis=dict(range=[-0.5, 1.1], autorange=False, zeroline=False)
)

figure = go.Figure(data=lines, layout=layout)
figure.show()

Variety

Let us look at some details. Many dances start and end in the same key, but this is by no means a necessity.

In [33]:
start_end_key = pd.DataFrame(files.groupby('dance').start_end_key.value_counts()).rename(columns = {'start_end_key': 'counts' })
start_end_key.reset_index(inplace = True)
start_end_key['start_end_key'] = list(map(int, start_end_key['start_end_key']))
px.bar(start_end_key.reset_index(), x = 'dance', y = 'counts', color = 'start_end_key', barmode = 'stack', title = 'Does the dance start and end in the same key?')

Dances have a twofold expressive purpose: being simple enough to be enjoyable without dedicated intellectual engagement, while at the same time materialising well-identified moods possibly taken to their extremes and contrasted with one another in sudden, although temporary, emotional clashes. As each key corresponds to an emotional region, how many keys does each dance cross during its journey? A plausible hypothesis is that certain dance-types are more likely to exhibit complex key trajectories than others.

Indeed, more than half of the dances do not touch more than three keys, and unsurprisingly the rural Ländler tend to be more monolithic in this respect than all other dance types.

In [34]:
files[files.dance.isin(classif_dances)].set_index('dance', append=True)\
     .num_keys\
     .unstack(level=1)\
     .iplot('hist', barmode='stack', colors=color_dances)

Mode identification

According to our musicological ground truth analysis, most dances are in the major mode, consistently with the enjoyable character expected of a dance.

In [35]:
dances_mode = pd.DataFrame(files.groupby('dance').gt_mode.value_counts())
dances_mode.rename(columns = {'gt_mode': 'count'}, inplace = True)
dances_mode.reset_index(inplace = True)
px.bar(dances_mode, x = 'dance', y = 'count', color = 'gt_mode')

How does this musicological classification correspond to the feel of the dances themselves? To answer this question, we focus on the first phrase of a piece: the similarity with the major and minor profile determines the coordinates of a “mode space” where each dance occupies a point based on its incipit.

In [36]:
fig = px.scatter(files, x = 'major', y = 'minor', color = 'gt_mode')
fig.update_layout(
    title="Mood map",
    xaxis_title="correlation with major key profile",
    yaxis_title="correlation with minor key profile")
fig.add_shape(
        # Line Horizontal
        go.layout.Shape(
            type="line",
            x0 = -0.1,
            x1 = 1,
            y0=-0.1,
            y1=1,
            line=dict(
                color="black",
                width=2,
                dash="dot",
            ),
    ))

fig.show()

The diagonal of this plot sets predominantly positively and predominantly negatively valenced dances apart. The classification provided by this divider is indeed in good agreement with the musicological ground truth (Cohen’s Kappa = 0.847). In other words, the initial mood of a piece is what determines the musical character of the piece as a whole.

In particular, we can also compute a coefficient of modal ambiguity for each dance: once again, the Ländler stand out as being less modally ambiguous than the representatives of other dance types.
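
A sketch of how such a coefficient can be obtained from the two correlations computed above (our assumption for the maj_min_first_abs column used below):

# Modal ambiguity as the absolute difference between the correlations
# with the major and the minor profile: small values = ambiguous mode.
files['maj_min_first_abs'] = (files['major'] - files['minor']).abs()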

In [37]:
files[files.dance.isin(classif_dances)].set_index('dance', append=True)\
     .maj_min_first_abs\
     .unstack(level=1)\
     .iplot('box', title = 'Modal ambiguity', yTitle = '|major correlation - minor correlation|',
            color=color_dances)

Rhythmic patterns

Characteristic rhythmic patterns

The individual rhythmic patterns singled out in Part I might contribute to characterise different dance types. In particular, the rhythmic texture of the left hand, which represents the metrical skeleton of the music, and of the right hand, which usually articulates the melody, might help the classification task.

Event density

A surface-level characterisation of the texture of a dance is how "crowded" the music is, i.e. how many notes populate each time interval on average. All dances are somewhat similar in overall density, but even more relevant for the auditory experience of music is the density of event onsets, which marks the superficial rhythmic texture. French dances tend to be denser in this respect.

In [38]:
files[files.dance.isin(classif_dances)].set_index('dance', append=True)\
     .onset_density\
     .unstack(level=1)\
     .iplot('box', xTitle = 'Onset density', color=color_dances)

Downbeat accent

Another potentially important feature for a dance is the relationship between the downbeat and the other beats of each bar, as this maps to the typical steps of the corresponding choreography.

For example, one might expect that a dance that is markedly "in one" (i.e., with one "heavy" step and two lighter ones), such as the spinning Walzer, would have a higher downbeat vs. offbeat density ratio compared to heavier dances that are markedly "in three" (e.g. the Ländler). However, this does not seem to be the case: Ländler do not generally have lower downbeat vs. offbeat ratios than Walzer. This is biased, however, by a notational issue: in the well-known "oom-pah-pah" pattern of the Walzer, the first beat in the left hand is usually a single note, while the second and third beats are chords; in these cases it is the performer who modulates the weight of the beats to convey the appropriate metrical feel, which is not reflected in the notation. This exceeds the scope of a score-based approach: especially in this case, music is a tool for social interaction, and much of the music-related knowledge is diffused in the social norms of the group rather than written down in the score.

However, compared to the German dances, French dances have heavier downbeats in terms of overall event density, considering both the events contained in the beats and those strictly happening on the beats.

By contrast, if we look at the ratio of onset densities between downbeats and offbeats, Menuets are significantly lower and Ländler significantly higher on average than other dances.

In [39]:
fig = go.Figure()

# Add Traces
for dance in classif_dances:
    fig.add_trace(
        go.Box( 
             y= files.set_index('dance', append=True).ratio_downbeat_non_downbeat.unstack(level=1)[dance].dropna().to_list(),
               name=dance,
             line = dict(color = color_dances[dance])
        )
    )
    
for dance in classif_dances:
    fig.add_trace(
        go.Box( 
             y= files.set_index('dance', append=True).ratio_downbeat_non_downbeat_onset.unstack(level=1)[dance].dropna().to_list(),
               name=dance,
            visible = False,
            line = dict(color = color_dances[dance])
    ))

for dance in classif_dances:
    fig.add_trace(
        go.Box( 
             y= files.set_index('dance', append=True).ratio_downbeat_non_downbeat_strictly.unstack(level=1)[dance].dropna().to_list(),
               name=dance,
            visible = False,
            line = dict(color = color_dances[dance])
    ))


fig.update_layout(
    updatemenus=[
        go.layout.Updatemenu(
            active=0,
            buttons=list([
                dict(label="Event density",
                     method="update",
                     args=[{"visible": [True]*len(classif_dances)+[False]*len(classif_dances)+[False]*len(classif_dances)},
                           {"title": "Ratio of downbeat vs. offbeat event density",
                            "annotations": []}]),
                dict(label="Onset density",
                     method="update",
                     args=[{"visible": [False]*len(classif_dances)+[True]*len(classif_dances)+[False]*len(classif_dances)},
                           {"title": "Ratio fo downbeat vs. offbeat onset density",
                            "annotations": []}]),
                dict(label="Strict",
                     method="update",
                     args=[{"visible": [False]*len(classif_dances)+[False]*len(classif_dances)+[True]*len(classif_dances)},
                           {"title": "Ratio of density of strictly downbeat vs. offbeat events",
                            "annotations": []}]),
               
            ]),
        )
    ])

# Set title
fig.update_layout(title_text="Ratio of downbeat and offbeat events")

fig.show()

Characteristic intervals

Registral accent

A form of accent that is not reflected in the rhythm of the music is the registral accent, i.e. the abrupt change of height (register) within one voice. It may be hypothesised that this feature betrays, on a notational level, the bar-by-bar flow typical of some dance types (see, e.g., the prototypical bass line in the example from Walzer D. 924, no. 1 above). In our case, the Ländler and the Walzer share the feature of having stronger registral accents on average than other dance types.

In [40]:
files[files.dance.isin(classif_dances)].set_index('dance', append=True)\
     .interval_downbeat_offbeat\
     .unstack(level=1)\
     .iplot('box', title = 'Average ratio of melodic intervals between downbeat and offbeat',
            yTitle = 'average ratio',
            color=color_dances)

Sixths and thirds

As a final classification feature, we track the use of peculiar pitch intervals in the various dances: different harmonic intervals correspond to different sonorities, which might be characteristic of certain dances. Intervals of a sixth and a third, in particular, turn out to be less frequent in French dances than in German ones.

In [41]:
fig = go.Figure()

# Add Traces
for dance in classif_dances:
    fig.add_trace(
        go.Box( 
             y= files.set_index('dance', append=True).sixths_count.unstack(level=1)[dance].dropna().to_list(),
               name=dance,
             line = dict(color = color_dances[dance])
        )
    )
    
for dance in classif_dances:
    fig.add_trace(
        go.Box( 
             y= files.set_index('dance', append=True).thirds_count.unstack(level=1)[dance].dropna().to_list(),
               name=dance,
            visible = False,
            line = dict(color = color_dances[dance])
    ))



fig.update_layout(
    updatemenus=[
        go.layout.Updatemenu(
            active=None,
            buttons=list([
                dict(label="Sixths",
                     method="update",
                     args=[{"visible": [True]*len(classif_dances)+[False]*len(classif_dances)+[False]*len(classif_dances)},
                           {"title": "Frequency of sixth intervals",
                            "annotations": []}]),
                dict(label="Thirds",
                     method="update",
                     args=[{"visible": [False]*len(classif_dances)+[True]*len(classif_dances)+[False]*len(classif_dances)},
                           {"title": "Frequency of third intervals",
                            "annotations": []}]),
    
               
            ]),
        )
    ])

# Set title
fig.update_layout(title_text="Frequency of sixth and third intervals")

fig.show()

Classification: dancing in the forest

Although specific biases can be recognised for the different dance types with respect to the features discussed above, this may not be enough to classify individual dances. A data-driven machine-learning approach is precious in this case: our poor computer has never enjoyed the beauty of these short dances, which makes it the perfect naive candidate to learn their characteristics from scratch.

The task is indeed daunting, and the fact that a random forest model trained on the features described above performs significantly above chance (Cohen's Kappa = 0.46, 95% CI = (0.37, 0.55)) is already interesting. Dance types are not only social stereotypes: they are characterised by musical features that can be spotted when listening.
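
A minimal sketch of such a setup (the feature subset below is illustrative, not the full feature set actually used):

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import cross_val_predict

df = files[files.dance.isin(classif_dances)]
feature_cols = ['entropy', 'onset_density',          # illustrative subset,
                'ratio_downbeat_non_downbeat']       # not the full feature set
X, y = df[feature_cols].fillna(0), df['dance']

clf = RandomForestClassifier(n_estimators=500, random_state=0)
pred = cross_val_predict(clf, X, y, cv=10)
print(cohen_kappa_score(y, pred))  # chance level would be around 0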

In [42]:
global_confusion = pd.read_csv(os.path.join(data_tsv, 'global_confusion.tsv'), sep='\t', index_col=0)
dance_dict = {'deutscher':0, 'ländler':2, 'menuett':3, 'trio':4, 'walzer':5}
dance_list = list(dance_dict.keys())

fig = ff.create_annotated_heatmap(z=np.array(global_confusion), x=dance_list, y=dance_list)

#fig = go.Figure(data=go.Heatmap(z=cm, x=dance_list, y=dance_list))
fig.update_layout(title='Confusion Map For Dance Classification',
                  xaxis_title="Predicted labels", yaxis_title="True labels")
fig.data[0].update(zmin=0, zmax=1)
fig.show()

Among all of the features, the following are the most relevant for the classification task:

In [43]:
feature_ranking = pd.read_csv(os.path.join(data_tsv, 'feature_ranking.tsv'), sep='\t', index_col=0)
top_features = feature_ranking[feature_ranking.importance >.02]
fig = go.Figure()
fig.add_trace(go.Bar(
    name='Feature importance: top 20',
    x=top_features.index, y=top_features.importance,
    error_y=dict(type='data', array=top_features.CI)
))

fig.update_layout(barmode='group', yaxis_title = 'Feature importance')
fig.show()

The various forms of metrical accent appear to be particularly informative in distinguishing the various dance types. A few of the rhythmic patterns are also important cues that should help the listener recognise dance types: the left-hand and right-hand versions of the "oom-pah-pah" are, unsurprisingly, among these, as well as the uniform eighth-note rhythm. Among the top-ranking features we also find the rhythmic texture (onset density) and various forms of complexity (entropies).

The few top-ranking features shown above are successful in unearthing some structure in the dancing fog of this corpus.

In [44]:
tsne_coord = pd.read_csv(os.path.join(data_tsv, 'tsne.tsv'), sep='\t', index_col=0)
px.scatter(tsne_coord, x = 'TSNE1', y = 'TSNE2', color = 'dance', title = 't-SNE scatter plot')

The unsupervised spatialization of dances induced by our top-ranking features shows the segregation of the French, towards the top-left corner, and German dances, towards the bottom-right one. Indeed, these features contain enough information to recognise at least the cultural model that conventionally inspires the character of each dance, be it the noble Versailles court or the rural Austrian countryside.

In [45]:
national_classif = pd.read_csv(os.path.join(data_tsv, 'cm_national.tsv'), sep='\t', index_col=0)

fig = ff.create_annotated_heatmap(z=np.array(national_classif), x=['french', 'german'], y=['french', 'german'])
fig.update_layout(title='Confusion Map For Dance Classification',
                  xaxis_title="Predicted labels", yaxis_title="True labels")
fig

Can you do better?

Clearly, the performance of the model is not ideal: it could hardly be otherwise, since the variety within each dance type is so high that one can only recognise minor trends. This should not be surprising, though: even for a trained musician, with implicit access to many more features than the ones we considered here and, more importantly, with appropriate means to prioritize among them, it is very hard to distinguish dance types. The performance of the model should then be gauged against the intrinsic difficulty of the task: do you think you can do better? Here is a small game to test your classifying skills!

In [46]:
%%HTML
<span id="round"> Round 1/5 </span> <p>

<button type="button" onclick="dancePlay()">Play</button>
<button type="button" onclick="dancePause()">Pause</button> <p>
<audio id="dance">
<source type="audio/mpeg" />
</audio>

<span> Guess the dance type: </span> </br>
    
<select id="dropdown">
    <option value="default">Choose...</option>
    <option value="deutscher">Deutsche</option>
    <option value="ländler">Ländler</option>
    <option value="walzer">Waltz</option>
    <option value="menuett">Menuet</option>
    <option value="trio">Trio</option>
    <option value="ecossaise">Ecossaise</option>
</select>

<button style="button" onclick="select()">Select</button> <p>
<span id="answer"> </span> <p>
<button id='nextRound' style="button" onclick="nextRound()">Next round</button>
<button id='playAgain' style="button" onclick="reset()">Play again</button> <p> 

<script src='quiz.js'>

</script>
Round 1/5

Guess the dance type:

This music, that a couple of centuries ago represented the essence of simplicity, is to our ears somewhat obscure. It hides a code of social practices, class ambitions, shy or audacious sexuality. Learning the keys to crack this code would allow us to feel familiar with the inhabitants of a distant age in the most natural way: dancing together.