from pyannote import Segment, Timeline
from matplotlib import pyplot as plt
uri stands for uniform resource identifier.
uri = 'GameOfThrones.Season01.Episode01'
Let's start by loading the reference (i.e. manual) segmentation into scenes. It is stored in data/GameOfThrones.Season01.Episode01/scenes.txt.
with open('data/{uri}/scenes.txt'.format(uri=uri), 'r') as f:
    lines = [line.split() for line in f.readlines()]
pyannote.Timeline objects are used to store a set of pyannote.Segment instances (one per scene). A pyannote.Segment corresponds to a time range (with a start time and an end time, in seconds).
reference = Timeline(uri=uri)
for start_time, end_time, _ in lines:
    segment = Segment(start=float(start_time), end=float(end_time))
    reference.add(segment)
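Conceptually, a Segment is just a (start, end) pair and a Timeline an ordered collection of them. The toy sketch below mimics this parsing step with plain tuples; the boundary values are made-up sample data, not the actual scene boundaries:

```python
# Toy stand-ins for pyannote's Segment and Timeline, using plain tuples.
# The scene boundaries below are made-up sample values, not real data.
scene_lines = [
    "0.0 132.5 scene_1",
    "132.5 410.0 scene_2",
]

timeline = []
for line in scene_lines:
    start_time, end_time, _ = line.split()
    segment = (float(start_time), float(end_time))  # a "Segment"
    timeline.append(segment)

# Each segment spans a time range with a well-defined duration.
durations = [end - start for start, end in timeline]
print(durations)  # [132.5, 277.5]
```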
Now, we will initialize an extractor of MFCC features (including energy and the first 12 coefficients).
from pyannote.feature.yaafe import YaafeMFCC
mfcc_extractor = YaafeMFCC(e=True, coefs=12)
Once initialized, it can be used to extract the actual features.
Beware, it may take a while (a few seconds for a one hour episode).
features = mfcc_extractor.extract('data/{uri}/english.wav'.format(uri=uri))
pyannote.Features instances have several handy methods. crop is one of them: it returns all the features for a given pyannote.Segment as a numpy.array.
x = features.crop(Segment(0., 10.))
print x.shape
(625, 13)
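The (625, 13) shape is consistent with 13-dimensional vectors (energy + 12 coefficients) extracted every 16 ms, i.e. 62.5 frames per second. The sketch below shows a hypothetical time-to-frame-index conversion that crop could rely on; the 16 ms step is an assumption inferred from the shape above, not taken from the pyannote source:

```python
# Hypothetical time-to-frame-index conversion behind crop(), assuming a
# fixed 16 ms step between feature vectors (inferred from the 625 frames
# observed for a 10-second segment).
frame_step = 0.016  # seconds between consecutive feature vectors (assumed)

def time_range_to_frames(start, end, step=frame_step):
    """Map a time range (in seconds) to half-open frame indices [first, last)."""
    first = int(round(start / step))
    last = int(round(end / step))
    return first, last

first, last = time_range_to_frames(0.0, 10.0)
print(last - first)  # 625 frames of 13-dimensional features
```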
Let's plot audio signal energy for the first 3 minutes.
plt.plot(features.crop(Segment(0., 3*60))[:,0])
plt.ylim(7, 17);
Now, we are going to segment the episode using Gaussian divergence.
Two sliding windows (left and right) of 20 seconds each are used, with a step of 1 second.
from pyannote.algorithm.segmentation import SegmentationGaussianDivergence
segmenter = SegmentationGaussianDivergence(duration=20, step=1)
One can use segmenter to compute the Gaussian divergence d between left and right windows for each position t of the sliding windows...
T, D = zip(*[(t, d) for (t, d) in segmenter.iterdiff(features)]);
... and consequently plot $d = f(t)$ alongside the actual position of scene boundaries.
for segment in reference:
    plt.plot([segment.start, segment.start], [0, 20], 'r')
plt.plot(T, D)
plt.xlim(0, 2000)
plt.ylim(0,20);
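For reference, the quantity plotted above can be approximated by a symmetric divergence between two Gaussians fitted on the left and right windows. The sketch below uses the symmetrized Kullback-Leibler divergence for diagonal-covariance Gaussians; pyannote's exact definition of Gaussian divergence may differ:

```python
import numpy as np

def symmetric_kl_divergence(left, right):
    """Symmetrized KL divergence between diagonal-covariance Gaussians
    fitted on two windows of features (rows = frames, columns = dimensions).
    This is one common divergence for segmentation; pyannote's may differ."""
    mu_l, var_l = left.mean(axis=0), left.var(axis=0)
    mu_r, var_r = right.mean(axis=0), right.var(axis=0)
    delta = mu_l - mu_r
    # Per-dimension symmetric KL for diagonal covariances (log terms cancel).
    d = 0.5 * (var_l / var_r + var_r / var_l - 2
               + delta ** 2 * (1.0 / var_l + 1.0 / var_r))
    return d.sum()

# Identical windows yield zero divergence; shifted ones do not.
rng = np.random.RandomState(0)
window = rng.randn(1250, 13)          # ~20 s of 13-dim features at 62.5 fps
print(symmetric_kl_divergence(window, window))        # 0.0
print(symmetric_kl_divergence(window, window + 5.0))  # large positive value
```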
It looks like setting a detection threshold $\theta = 7$ might do (some of) the trick.
segmenter = SegmentationGaussianDivergence(duration=20, step=1, threshold=7)
hypothesis = segmenter.apply(features)
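A plausible sketch of what apply does with that threshold (the actual pyannote implementation may differ): keep every position where the divergence is a local maximum above the threshold, and place a boundary there:

```python
def detect_boundaries(T, D, threshold=7.0):
    """Hypothetical boundary detection: keep times t where the divergence d
    is a local maximum above the threshold. This is a sketch of one common
    peak-picking scheme, not pyannote's actual apply() implementation."""
    boundaries = []
    for i in range(1, len(D) - 1):
        if D[i] > threshold and D[i] >= D[i - 1] and D[i] >= D[i + 1]:
            boundaries.append(T[i])
    return boundaries

# Toy divergence curve with two clear peaks above the threshold.
T = [0, 1, 2, 3, 4, 5, 6]
D = [1, 2, 9, 3, 2, 8, 1]  # peaks at t=2 (d=9) and t=5 (d=8)
print(detect_boundaries(T, D))  # [2, 5]
```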
Let's evaluate the results visually (reference in green, hypothesis in red).
for segment in hypothesis:
    plt.plot([segment.start, segment.start], [0, 1], 'r')
for segment in reference:
    plt.plot([segment.start, segment.start], [1, 2], 'g')
One can also use evaluation metrics:
from pyannote.metric.segmentation import SegmentationPurity, SegmentationCoverage
from pyannote.metric import f_measure
purity = SegmentationPurity()
coverage = SegmentationCoverage()
p = purity(reference, hypothesis)
c = coverage(reference, hypothesis)
f = f_measure(p, c)
print "Purity {p:.1f}% / Coverage {c:.1f}% / F-Measure {f:.1f}".format(p=100*p,c=100*c,f=100*f)
Purity 77.6% / Coverage 85.9% / F-Measure 81.5
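Here f_measure is simply the harmonic mean of purity and coverage; recomputing it from the printed values is a quick sanity check:

```python
# F-measure as the harmonic mean of purity and coverage.
def f_measure(p, c):
    return 2 * p * c / (p + c)

p, c = 0.776, 0.859
print("{:.1f}".format(100 * f_measure(p, c)))  # 81.5
```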