Introduction to SPTK.jl

This notebook gives a brief introduction to SPTK.jl, with a focus on mel-generalized cepstrum analysis.

Preparation

  • Configure visualization settings (using PyPlot in this notebook)
  • Load audio files (using the WAV package)
In [1]:
using PyCall
matplotlib = pyimport("matplotlib")
PyDict(matplotlib["rcParams"])["figure.figsize"] = (12, 5)
using PyPlot
INFO: Loading help data...
Warning: using PyPlot.matplotlib in module Main conflicts with an existing identifier.
In [2]:
using WAV
import SPTK
In [3]:
filepath = joinpath(Pkg.dir("SPTK"), "examples", "test16k.wav")
x, fs = wavread(filepath, format="native")
x = convert(Vector{Float64}, vec(x)) # mono (single channel)
fs = convert(Int, fs)

# Visualize the speech signal in time-domain
plot(1:endof(x), x, label="a speech signal")
xlim(1, endof(x))
xlabel("sample")
legend()
Out[3]:
PyObject <matplotlib.legend.Legend object at 0x7f97df24f190>

Windowing

In [4]:
# Pick a short segment
pos = 3000
fftlen = 1024
# Note that mel-generalized cepstrum analysis assumes that the window is power-normalized.
xw = x[pos+1:pos+fftlen] .* SPTK.blackman(fftlen)

plot(1:endof(xw), xw, linewidth=2, label="a windowed speech signal")
xlim(1, endof(xw))
xlabel("sample")
legend()
Out[4]:
PyObject <matplotlib.legend.Legend object at 0x7f97bb3ac810>
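To make "power-normalized" concrete: the window is scaled so that the sum of its squared samples equals 1. The sketch below builds such a Blackman window in plain Julia. The helper name `unitpower_blackman` is hypothetical, and the code is an illustration of the idea rather than the implementation behind `SPTK.blackman`, which may differ in detail.

```julia
# Hypothetical helper (not part of SPTK.jl): a Blackman window scaled
# so that the sum of its squared samples is 1.
function unitpower_blackman(n)
    # Classic Blackman coefficients, evaluated at k = 0, ..., n-1
    w = [0.42 - 0.5 * cos(2pi * k / (n - 1)) + 0.08 * cos(4pi * k / (n - 1)) for k in 0:n-1]
    # Divide by the square root of the window power
    w / sqrt(sum(w .^ 2))
end

w = unitpower_blackman(1024)
power = sum(w .^ 2)  # equals 1 up to floating-point rounding
```

With this scaling, multiplying a frame by the window does not change the overall power level that the analysis sees.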

Mel-generalized cepstrum analysis

In [5]:
# Plotting utility for visualizing spectral envelope estimate
function pplot(sp, envelope; title="envelope")
    plot(sp, "b-", linewidth=2, label="Original log spectrum 20log|X(ω)|")
    plot(20/log(10)*(envelope), "r-", linewidth=3, label=title)
    xlim(1, length(sp))
    xlabel("frequency bin")
    ylabel("log amplitude")
    legend()
end
Out[5]:
pplot (generic function with 1 method)
In [6]:
# Compute the log-amplitude spectrum 20log10|X(ω)| (in dB) of the windowed signal
sp = 20log10(abs(rfft(xw)));
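The factor `20/log(10)` in `pplot` suggests that the envelope passed in is a natural-log amplitude spectrum, while `sp` is already in decibels. Under that assumption, the conversion that puts both curves on the same scale is:

```latex
20 \log_{10} |H(\omega)|
  = 20 \, \frac{\ln |H(\omega)|}{\ln 10}
  = \frac{20}{\ln 10} \, \ln |H(\omega)|
  \approx 8.686 \, \ln |H(\omega)|
```

Since `rfft` of a real signal of length `fftlen` returns `fftlen/2 + 1` bins, both curves have 513 points here.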
In [7]:
# Linear Cepstrum
c = SPTK.mgcep(xw, 20, 0.0, 0.0)
pplot(sp, real(SPTK.mgc2sp(c, 0.0, 0.0, fftlen)), title="Linear frequency cepstrum based envelope")
Out[7]:
PyObject <matplotlib.legend.Legend object at 0x7f97b87033d0>
In [8]:
# Mel-Cepstrum
mc = SPTK.mcep(xw, 20, 0.41)
pplot(sp, real(SPTK.mgc2sp(mc, 0.41, 0.0, fftlen)), title="Mel-cepstrum based envelope")
Out[8]:
PyObject <matplotlib.legend.Legend object at 0x7f97b864b1d0>
In [9]:
# LPC Cepstrum 
mgc = SPTK.mgcep(xw, 20, 0.0, -1.0)
pplot(sp, real(SPTK.mgc2sp(mgc, 0.0, -1.0, fftlen)), title="LPC cepstrum based envelope")
Out[9]:
PyObject <matplotlib.legend.Legend object at 0x7f97b85918d0>
In [10]:
# Warped LPC
mgc = SPTK.mgcep(xw, 20, 0.41, -1.0)
pplot(sp, real(SPTK.mgc2sp(mgc, 0.41, -1.0, fftlen)), title="Warped LPC based envelope")
Out[10]:
PyObject <matplotlib.legend.Legend object at 0x7f97b84d9fd0>
In [11]:
# Generalized Cepstrum
mgc = SPTK.gcep(xw, 20, -0.35)
pplot(sp, real(SPTK.mgc2sp(mgc, 0.0, -0.35, fftlen)), title="Generalized cepstrum based envelope")
Out[11]:
PyObject <matplotlib.legend.Legend object at 0x7f97b83af6d0>
In [12]:
# Mel-Generalized Cepstrum
mgc = SPTK.mgcep(xw, 20, 0.41, -0.35)
pplot(sp, real(SPTK.mgc2sp(mgc, 0.41, -0.35, fftlen)), title="Mel-generalized cepstrum based envelope")
Out[12]:
PyObject <matplotlib.legend.Legend object at 0x7f97b82f6d90>
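The six cells above differ only in the pair (α, γ) passed to the analysis. The notebook does not state the underlying model; the block below writes out the usual formulation from the mel-generalized cepstral analysis literature, given here as background rather than as a description of SPTK.jl internals. M is the analysis order (20 in the cells above).

```latex
\tilde{z}^{-1} = \frac{z^{-1} - \alpha}{1 - \alpha z^{-1}}, \qquad
H(z) =
\begin{cases}
  \left( 1 + \gamma \sum_{m=0}^{M} c_{\alpha,\gamma}(m) \, \tilde{z}^{-m} \right)^{1/\gamma}, & -1 \le \gamma < 0 \\[6pt]
  \exp \sum_{m=0}^{M} c_{\alpha,\gamma}(m) \, \tilde{z}^{-m}, & \gamma = 0
\end{cases}
```

In these terms the examples read: (α, γ) = (0, 0) is the linear cepstrum, (0.41, 0) the mel-cepstrum, (0, -1) LPC, (0.41, -1) warped LPC, (0, -0.35) the generalized cepstrum, and (0.41, -0.35) the mel-generalized cepstrum.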

Matrix-to-matrix transform

Almost all vec2vec transforms have a mat2mat counterpart that operates column-wise. Demonstrations follow.
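As a rough picture of what "column-wise" means, the sketch below applies a vector-to-vector function to each column of a matrix and collects the results as columns of a new matrix. The helper `colwise` is hypothetical and only illustrates the semantics; it is not how SPTK.jl implements its matrix methods.

```julia
# Hypothetical helper (not part of SPTK.jl): apply a vector-to-vector
# function f to every column of X and stack the results column by column.
function colwise(f, X)
    y1 = f(X[:, 1])                                   # first column fixes the output size
    Y = zeros(eltype(y1), length(y1), size(X, 2))
    Y[:, 1] = y1
    for j in 2:size(X, 2)
        Y[:, j] = f(X[:, j])
    end
    Y
end

# Example: cumulative sum of each column
A = colwise(v -> cumsum(v), [1 2; 3 4])  # [1 2; 4 6]
```

For an analysis function, each column would be one windowed frame and each output column the coefficient vector for that frame.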

In [13]:
# Utilities for splitting a time sequence into overlapping frames
countframes(x::AbstractVector, framelen, hopsize) = div(length(x) - framelen, hopsize) + 1

function splitframes(x::AbstractVector, framelen=1024, hopsize=framelen>>1)
    N = countframes(x, framelen, hopsize)
    frames = Array(eltype(x), framelen, N)
    @inbounds for i = 1:N
        frames[:,i] = x[(i-1)*hopsize+1:(i-1)*hopsize+framelen]
    end
    frames
end
Out[13]:
splitframes (generic function with 3 methods)
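A small worked example of the frame arithmetic may help. The sketch below restates the frame-count formula for a signal length `n` and lists the sample range of each frame; the names `nframes` and `framerange` are hypothetical and chosen so that they do not clash with the definitions above.

```julia
# Hypothetical helpers mirroring the arithmetic of countframes/splitframes
nframes(n, framelen, hopsize) = div(n - framelen, hopsize) + 1
framerange(i, framelen, hopsize) = ((i - 1) * hopsize + 1):((i - 1) * hopsize + framelen)

# A 10-sample signal, 4-sample frames, hop of 2 samples
N = nframes(10, 4, 2)                         # 4 frames
ranges = [framerange(i, 4, 2) for i in 1:N]   # 1:4, 3:6, 5:8, 7:10
```

Samples that do not fill a complete final frame are dropped, which is why the frame count uses integer division.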
In [14]:
winlen = 1024
hopsize = winlen>>1
xw = splitframes(x, winlen, hopsize) .* SPTK.blackman(winlen);
@show size(xw)
size(xw) => (1024,117)
Out[14]:
(1024,117)
In [15]:
# See how the spectral envelope estimate depends on the order of the mel-cepstrum.
# Here the envelope is estimated by mel-cepstrum analysis with order = 20.
logsp = real(SPTK.mgc2sp(SPTK.mcep(xw, 20, 0.41), 0.41, 0.0, winlen))
imshow(20/log(10)*logsp, origin="lower", aspect="auto")
colorbar()