In this example, we'll search for a short audio clip in a a longer recording. The sound we're looking for is stored in the file fish_pattern1.wav
and the recording we're looking through is stored in fish_data.wav
. The fiels are available on google drive. We start by loading some packages
@show ENV["PYTHON"], ENV["JULIA_NUM_THREADS"], ENV["IJULIA_DEBUG"]
(ENV["PYTHON"], ENV["JULIA_NUM_THREADS"], ENV["IJULIA_DEBUG"]) = ("python3", "4", "false")
("python3", "4", "false")
using SpectralDistances, DynamicAxisWarping, DSP, TotalLeastSquares, WAV, LPVSpectral, Distances, Dates, Plots
using DynamicAxisWarping: lastlength
plotly()
function seconds2ms(s)
s = round(Int, s)
h = s÷3600
s -= 3600h
m = s÷60
s -= 60m
"$m:$s"
end
default(xformatter=seconds2ms, colorbar=false)
cd(@__DIR__)
We then set some parameters for the pre-processing. To cut down on processing time, we select a rather coarse resolution along both frequency and time axes. The radius rad
is the maximum allowed time warping. Increasing this parameter makes the problem take longer time to solve.
nfft = 256
rad = 6 # This is the maximum allowed warping radius
timevec(res) = range(0,stop=length(y)/fs, length=length(res))
function n01(x) # Normalization for plotting
ql = quantile(filter(isfinite,x), 0.001)
x[x .< ql] .= ql
x = x .- ql
qu = quantile(filter(isfinite,x), 0.95)
x[x .> qu] .= qu
x ./= qu
x
end
ff(x,n=10) = filtfilt(ones(n),[n],x)
plotvecs(res) = timevec(res), n01(res);
We're now ready to load the data and estimate some spectrograms. Here we take care to only estimate spectrograms for the frequencies that are improtant for the task, 200-550Hz.
function read_and_downsample(path, newf=5000)
q, fs = wavread(path)
clamp!(q, -0.012, 0.012) # Clamp filtering against snapping shrimp, is quite effective
q = DSP.resample(vec(q), newf/fs)
q = v1(Float32.(q)) # Normalize variance
q, round(Int, newf)
end
q,fs = read_and_downsample("fish_pattern1.wav")
Q = melspectrogram(q, nfft, nmels=30, fmin = 200, fmax = 550, fs = fs)
n, m = size(Q.power)
plot(Q)
y, fs = read_and_downsample("fish_data.wav")
Y = melspectrogram(y, nfft, nmels=30, fmin = 200, fmax = 550, fs = fs)
# plot(Y) # This plot is very large
LPVSpectral.MelSpectrogram{Float32,LinRange{Float32},StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}}(Float32[1.7270035f-5 2.9852868f-5 … 6.759083f-6 3.5077665f-5; 7.931863f-6 1.3710968f-5 … 3.1043442f-6 1.6110638f-5; … ; 1.1150629f-5 1.1372093f-6 … 8.214455f-6 2.6396156f-5; 7.617088f-6 1.8829588f-6 … 3.2400949f-6 9.794874f-7], range(3.0f0, stop=8.25f0, length=30), 0.0256:0.0256:300.0064)
Next, we estiamte linear models that will be used for some of the detection methods
fm = TimeWindow(inner = LS(na = 7, λ=1e-3), n = nfft, noverlap = nfft÷2)
Qm = fm(q) |> change_precision(Float64)
Ym = fm(y) |> change_precision(Float64);
res_nn = dtwnn(
log1p.(Q.power),
log1p.(Y.power),
DiscreteGridTransportDistance(Cityblock(), eltype(Q.power), n, n),
rad,
saveall = true,
);
This method performs DTW along the time axis and simple Euclidean distance along the frequency axis on the raw spectrograms
res_naive = dtwnn(log1p.(Q.power), log1p.(Y.power), SqEuclidean(), rad, saveall=true);
This method performs DTW along the time axis and an approximation to optimal transport along the frequency axis using the distance between the roots of the estimated linear models
res_dtwrd = dtwnn(
Qm,
Ym,
EuclideanRootDistance(p = 1, weight = simplex_residueweight),
rad,
saveall = true,
);
This method performs DTW along the time axis and an performs true optimal transport along the frequency axis using the location of the roots of the estimated linear models
res_dtwotrd = dtwnn(
Qm,
Ym,
OptimalTransportRootDistance(p = 1, weight = simplex_residueweight),
rad,
saveall = true,
tol = 1e-3,
);
This method performs optimal transport along both frequency and time axes using the location of the roots of the estimated linear models
dist = TimeDistance(
inner = OptimalTransportRootDistance(p = 1, β = 0.5, weight = simplex_residueweight),
tp = 1,
c = 0.1,
);
res_tt = distance_profile(dist, Qm, Ym, tol=1e-3, check_interval=10);
# Plotting
In the following plot, a low score indicates that the pattern is detected. If the plot is made with Plotly, you can deactivate some series to increase legibility by clicking on the corresponding legend entry.
fig1 = plot(Y, layout=@layout([a{1.0w,0.3h}; b]), link=:x, colorbar=false, xlabel="Time [minutes:seconds]");
plot!(plotvecs(ff(res_nn.dists,3))..., sp=2, lab = "DTW-OT", ylabel="Score");
plot!(plotvecs(res_naive.dists)..., sp=2, lab = "DTW-E")
plot!(plotvecs(res_tt)..., sp=2, lab = "Time transport")
plot!(plotvecs(res_dtwrd.dists)..., sp=2, lab = "DTW-WRD")
plot!(plotvecs(res_dtwotrd.dists)..., sp=2, lab = "DTW-OTRD")