Cover song identification using 2D Fourier transform sequences

Prem Seetharaman and Zafar Rafii, Summer 2016

This notebook presents a cover song identification algorithm based on the magnitude 2D Fourier transform (2DFT).

Abstract: We approach cover song identification using a novel time-series representation of audio: a sequence of magnitude 2D Fourier transforms (2DFTs). This representation is robust to key changes, timbral changes, and small local tempo deviations. We compute the cross-similarity between these time series and extract a distance measure that is invariant to changes in music structure. Our approach is state-of-the-art on a recent cover song dataset, and it expands on previous work using the 2DFT for music representation and on work in live song recognition.
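To make the representation concrete, here is a minimal sketch of a magnitude 2DFT sequence and of why it tolerates key changes. It is an illustration under stated assumptions, not the implementation used in this notebook: the function name `magnitude_2dft_sequence`, the window and hop sizes, and the random matrix standing in for a log-frequency spectrogram are all hypothetical. A key change is modeled as a circular shift along the frequency axis; on a real CQT the shift is not circular, so the invariance only holds approximately.

```python
import numpy as np

def magnitude_2dft_sequence(spec, win=8, hop=4):
    """Slide a window along time (axis 1) and take the magnitude 2DFT of each patch.

    win and hop are illustrative values, not the notebook's settings.
    """
    n_frames = spec.shape[1]
    starts = range(0, n_frames - win + 1, hop)
    patches = [spec[:, s:s + win] for s in starts]
    # Taking the magnitude discards the phase, which is where shifts are encoded.
    return np.stack([np.abs(np.fft.fft2(p)) for p in patches])

# Toy stand-in for a log-frequency spectrogram: 24 frequency bins x 32 frames.
rng = np.random.default_rng(0)
spec = rng.random((24, 32))

# Model a key change as a circular shift of 3 bins along the frequency axis.
transposed = np.roll(spec, 3, axis=0)

seq_a = magnitude_2dft_sequence(spec)
seq_b = magnitude_2dft_sequence(transposed)

# The two sequences match because a circular shift only changes the 2DFT phase.
same = np.allclose(seq_a, seq_b)
print(seq_a.shape, same)
```

Each element of the sequence is one magnitude 2DFT patch, so two recordings can then be compared patch by patch regardless of the key they are played in.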

Background

A cover version of a song is one performed by someone other than the original artist. Many things can change between the original and a cover, such as:

  • Key
  • Tempo
  • Instrumentation
  • Music structure
  • Genre

Successful automatic cover song identification approaches try to be invariant to these changes while keeping the aspects of music that are transferred from the original to the cover, such as:

  • Chord progression
  • Melody
  • Signature rhythmic or harmonic patterns

Here are some examples! Each example shows the constant-Q transform (CQT), the LiveID fingerprint (explained later), and the audio file.

Can't Help Falling In Love - Elvis Presley

First, the original song:

In [8]:
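# Load the original recording and display it.
# load_and_display is presumably defined in an earlier cell of this notebook.
load_and_display("../datasets/Elvis Presley - Can't Help Falling In Love/Can't Help Falling In Love-5V430M59Yn8.mp3",
                 "Can't Help Falling In Love - Elvis Presley (original)", 2)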