from pyannote import Segment, Timeline
from matplotlib import pyplot as plt
uri stands for uniform resource identifier.
uri = 'GameOfThrones.Season01.Episode01'
Let's start by loading the reference (i.e. manual) segmentation into scenes. It is stored in data/GameOfThrones.Season01.Episode01/scenes.txt.
with open('data/{uri}/scenes.txt'.format(uri=uri), 'r') as f:
    lines = [line.split() for line in f.readlines()]
pyannote.Timeline objects are used to store a set of pyannote.Segment instances (one per scene). A pyannote.Segment corresponds to a time range (with a start time and an end time, in seconds).
reference = Timeline(uri=uri)
for start_time, end_time, _ in lines:
    segment = Segment(start=float(start_time), end=float(end_time))
    reference.add(segment)
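Conceptually, a Segment is just a (start, end) pair and a Timeline an ordered collection of them. The toy sketch below mimics this parsing step with plain tuples; the boundary values are made-up sample data, not the actual scene boundaries:

```python
# Toy stand-ins for pyannote's Segment and Timeline, using plain tuples.
# The scene boundaries below are made-up sample values, not real data.
scene_lines = [
    "0.0 132.5 scene_1",
    "132.5 410.0 scene_2",
]

timeline = []
for line in scene_lines:
    start_time, end_time, _ = line.split()
    segment = (float(start_time), float(end_time))  # a "Segment"
    timeline.append(segment)

# Each segment spans a time range with a well-defined duration.
durations = [end - start for start, end in timeline]
print(durations)  # [132.5, 277.5]
```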
Now, we will initialize an extractor of MFCC features (including energy and the first 12 coefficients).
from pyannote.feature.yaafe import YaafeMFCC
mfcc_extractor = YaafeMFCC(e=True, coefs=12)
Once initialized, it can be used to extract the actual features.
Beware, it may take a while (a few seconds for a one hour episode).
features = mfcc_extractor.extract('data/{uri}/english.wav'.format(uri=uri))
pyannote.Features instances have several handy methods. crop is one of them: it returns all the features for a given pyannote.Segment as a numpy.array.
x = features.crop(Segment(0., 10.))
print x.shape
(625, 13)
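The (625, 13) shape is consistent with 13-dimensional vectors (energy + 12 coefficients) extracted every 16 ms, i.e. 62.5 frames per second. The sketch below shows a hypothetical time-to-frame-index conversion that crop could rely on; the 16 ms step is an assumption inferred from the shape above, not taken from the pyannote source:

```python
# Hypothetical time-to-frame-index conversion behind crop(), assuming a
# fixed 16 ms step between feature vectors (inferred from the 625 frames
# observed for a 10-second segment).
frame_step = 0.016  # seconds between consecutive feature vectors (assumed)

def time_range_to_frames(start, end, step=frame_step):
    """Map a time range (in seconds) to half-open frame indices [first, last)."""
    first = int(round(start / step))
    last = int(round(end / step))
    return first, last

first, last = time_range_to_frames(0.0, 10.0)
print(last - first)  # 625 frames of 13-dimensional features
```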
Let's plot audio signal energy for the first 3 minutes.
plt.plot(features.crop(Segment(0., 3*60))[:,0])
plt.ylim(7, 17);
Now, we are going to segment the episode using Gaussian divergence.
Two sliding windows (left and right) of 20 seconds each are used, with a step of 1 second.
from pyannote.algorithm.segmentation import SegmentationGaussianDivergence
segmenter = SegmentationGaussianDivergence(duration=20, step=1)
One can use segmenter to compute the Gaussian divergence d between left and right windows for each position t of the sliding windows...
T, D = zip(*[(t, d) for (t, d) in segmenter.iterdiff(features)]);
... and consequently plot $d = f(t)$ alongside the actual position of scene boundaries.
for segment in reference:
    plt.plot([segment.start, segment.start], [0, 20], 'r')
plt.plot(T, D)
plt.xlim(0, 2000)
plt.ylim(0,20);
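For reference, the quantity plotted above can be approximated by a symmetric divergence between two Gaussians fitted on the left and right windows. The sketch below uses the symmetrized Kullback-Leibler divergence for diagonal-covariance Gaussians; pyannote's exact definition of Gaussian divergence may differ:

```python
import numpy as np

def symmetric_kl_divergence(left, right):
    """Symmetrized KL divergence between diagonal-covariance Gaussians
    fitted on two windows of features (rows = frames, columns = dimensions).
    This is one common divergence for segmentation; pyannote's may differ."""
    mu_l, var_l = left.mean(axis=0), left.var(axis=0)
    mu_r, var_r = right.mean(axis=0), right.var(axis=0)
    delta = mu_l - mu_r
    # Per-dimension symmetric KL for diagonal covariances (log terms cancel).
    d = 0.5 * (var_l / var_r + var_r / var_l - 2
               + delta ** 2 * (1.0 / var_l + 1.0 / var_r))
    return d.sum()

# Identical windows yield zero divergence; shifted ones do not.
rng = np.random.RandomState(0)
window = rng.randn(1250, 13)          # ~20 s of 13-dim features at 62.5 fps
print(symmetric_kl_divergence(window, window))        # 0.0
print(symmetric_kl_divergence(window, window + 5.0))  # large positive value
```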
It looks like setting a detection threshold $\theta = 7$ might do (some of) the trick.
segmenter = SegmentationGaussianDivergence(duration=20, step=1, threshold=7)
hypothesis = segmenter.apply(features)
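A plausible sketch of what apply does with that threshold (the actual pyannote implementation may differ): keep every position where the divergence is a local maximum above the threshold, and place a boundary there:

```python
def detect_boundaries(T, D, threshold=7.0):
    """Hypothetical boundary detection: keep times t where the divergence d
    is a local maximum above the threshold. This is a sketch of one common
    peak-picking scheme, not pyannote's actual apply() implementation."""
    boundaries = []
    for i in range(1, len(D) - 1):
        if D[i] > threshold and D[i] >= D[i - 1] and D[i] >= D[i + 1]:
            boundaries.append(T[i])
    return boundaries

# Toy divergence curve with two clear peaks above the threshold.
T = [0, 1, 2, 3, 4, 5, 6]
D = [1, 2, 9, 3, 2, 8, 1]  # peaks at t=2 (d=9) and t=5 (d=8)
print(detect_boundaries(T, D))  # [2, 5]
```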
Let's evaluate the results visually (reference in green, hypothesis in red).
for segment in hypothesis:
    plt.plot([segment.start, segment.start], [0, 1], 'r')
for segment in reference:
    plt.plot([segment.start, segment.start], [1, 2], 'g')
One can also use evaluation metrics:
from pyannote.metric.segmentation import SegmentationPurity, SegmentationCoverage
from pyannote.metric import f_measure
purity = SegmentationPurity()
coverage = SegmentationCoverage()
p = purity(reference, hypothesis)
c = coverage(reference, hypothesis)
f = f_measure(p, c)
print "Purity {p:.1f}% / Coverage {c:.1f}% / F-Measure {f:.1f}".format(p=100*p,c=100*c,f=100*f)
Purity 77.6% / Coverage 85.9% / F-Measure 81.5
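Here f_measure is simply the harmonic mean of purity and coverage; recomputing it from the printed values is a quick sanity check:

```python
# F-measure as the harmonic mean of purity and coverage.
def f_measure(p, c):
    return 2 * p * c / (p + c)

p, c = 0.776, 0.859
print("{:.1f}".format(100 * f_measure(p, c)))  # 81.5
```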