%pylab inline
import torch
from torch import nn
from torchmore import layers, flex
OCR: "The ca? drives down the road." ? = r or n
Correct: "The car drives down the road."
$P(\hbox{next character} | \hbox{previous characters})$
E.g.,
$$P(c | \hbox{"neur"}) = ...$$

High for $c \in \{"a", "o"\}$, low for $c \in \{"t", "x"\}$.
For $n$-gram models, we simply tabulate $P(s_i | s_{i-1} \ldots s_{i-n+1})$ for all valid strings $s$.
from collections import Counter
import re
training_set = [ s.strip().lower() for s in open("/usr/share/dict/words").readlines()]
training_set = [ s for s in training_set if not re.search(r"[^a-z]", s)]
print(len(training_set))
print(training_set[:10])
282695 ['a', 'amd', 'aol', 'aws', 'aachen', 'aalborg', 'aalesund', 'aaliyah', 'aalst', 'aalto']
n = 3
suffixes, ngrams = Counter(), Counter()
for s in training_set:
    s = "_"*n + s                  # pad so initial characters have context
    for i in range(n, len(s)):
        suffixes[s[i-n:i-1]] += 1  # (n-1)-character context
        ngrams[s[i-n:i]] += 1      # context plus next character
Now we can estimate probabilities easily:
def prob(s):
    # maximum-likelihood estimate of P(last char of s | first n-1 chars of s)
    return ngrams[s] / float(suffixes[s[:-1]])
probabilities = sorted([(prob("aa"+chr(i)), "aa"+chr(i)) for i in range(ord("a"), ord("a")+26)], reverse=True)
probabilities[:10]
[(0.2603550295857988, 'aal'), (0.1893491124260355, 'aar'), (0.08875739644970414, 'aan'), (0.08284023668639054, 'aat'), (0.08284023668639054, 'aas'), (0.07100591715976332, 'aam'), (0.03550295857988166, 'aai'), (0.029585798816568046, 'aag'), (0.029585798816568046, 'aab'), (0.023668639053254437, 'aaf')]
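Since the trigram counts define a full conditional distribution, we can also sample plausible pseudo-words from the model. The sketch below is a self-contained variant: it rebuilds the counters from a tiny inline word list (so the cell runs on its own; in the notebook you would reuse the dictionary-based counters above) and adds an explicit end-of-word marker, which the counting loop above does not use.

```python
import random
from collections import Counter

# stand-in corpus so this cell runs on its own; in the notebook, reuse the
# dictionary-based `ngrams`/`suffixes` computed above instead
words = ["cat", "car", "can", "care", "cart", "road", "read", "ride"]

n = 3
suffixes, ngrams = Counter(), Counter()
for w in words:
    s = "_" * (n - 1) + w + "_"    # pad start; final "_" marks end of word
    for i in range(n, len(s) + 1): # include the n-gram ending at the marker
        suffixes[s[i-n:i-1]] += 1
        ngrams[s[i-n:i]] += 1

def sample_word(maxlen=20):
    s = "_" * (n - 1)              # initial all-padding context
    while len(s) < maxlen + n - 1:
        ctx = s[-(n - 1):]
        # next-character candidates observed after this context, with counts
        cands = [(g[-1], c) for g, c in ngrams.items() if g[:-1] == ctx]
        chars, counts = zip(*cands)
        c = random.choices(chars, weights=counts)[0]
        if c == "_":               # sampled the end-of-word marker
            break
        s += c
    return s[n - 1:]

random.seed(0)
print([sample_word() for _ in range(5)])
```

Sampled words recombine observed trigrams, so the output mixes real words from the corpus with novel but plausible ones (e.g. blends of "care" and "read").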
Under some assumptions, the optimal solution is given by:
$$D(x) = \arg\max_s P(s) \prod_i P(s_i | x_i)$$

$P(s)$: language model; $P(s_i | x_i)$: "acoustic" model.
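This decision rule can be made concrete on the opening OCR example. A minimal sketch, with made-up "acoustic" confidences for the ambiguous character and made-up language-model probabilities for the two candidate words (all numbers here are illustrative, not from a real recognizer):

```python
import math

# hypothetical recognizer confidences P(s_i | x_i) for the ambiguous reading
# of "ca?": the recognizer slightly prefers "n"
acoustic = {"car": 0.40, "can": 0.60}

# hypothetical language-model probabilities P(s) for the context
# "The __ drives down the road."
lm = {"car": 0.020, "can": 0.00001}

def decode(candidates):
    # D(x) = argmax_s  log P(s) + sum_i log P(s_i | x_i)
    return max(candidates, key=lambda s: math.log(lm[s]) + math.log(acoustic[s]))

print(decode(["car", "can"]))  # prints "car"
```

Even though the recognizer prefers "can" in isolation, the language model's strong preference for "car" in this context dominates the combined log-score.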
Basis:
Complications:
Reasoning:
Figure: Prabhavalkar, Rohit, et al. "A Comparison of Sequence-to-Sequence Models for Speech Recognition." Interspeech. 2017.
Chorowski, Jan, et al. "End-to-end continuous speech recognition using attention-based recurrent NN: First results." arXiv preprint arXiv:1412.1602 (2014).
Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014.
For a seq2seq model, we need an encoder, a decoder, and an output map:

from torch import nn

class Seq2Seq(nn.Module):
    def __init__(self, ninput, noutput, nhidden, pre=None, nlayers=1):
        super().__init__()
        self.encoder = nn.LSTM(ninput, nhidden, num_layers=nlayers)
        self.decoder = nn.LSTM(noutput, nhidden, num_layers=nlayers)
        self.out = nn.Linear(nhidden, noutput)
        self.logsoftmax = nn.LogSoftmax(dim=0)
    def forward(self, inputs, targets, forcing=0.5):
        _, state = self.encoder(inputs)
        decoder_input = hotone(0)          # start symbol
        outputs = []
        for i in range(len(targets)):
            decoder_output, state = self.decoder(decoder_input.view(1, 1, -1), state)
            logprobs = self.logsoftmax(self.out(decoder_output)[0, 0])
            outputs.append(logprobs)
            _, pred = logprobs.topk(1)
            # teacher forcing: feed the ground truth with probability `forcing`
            if torch.rand(1).item() < forcing:
                decoder_input = targets[i]
            else:
                decoder_input = hotone(pred)
        return torch.stack(outputs)
    def greedy_predict(self, inputs):
        _, state = self.encoder(inputs)
        decoder_input = hotone(0)          # start symbol
        result = []
        for i in range(MAXLEN):
            decoder_output, state = self.decoder(decoder_input.view(1, 1, -1), state)
            _, pred = self.logsoftmax(self.out(decoder_output)[0, 0]).topk(1)
            decoder_input = hotone(pred)
            result.append(pred)
            if pred == 0:                  # stop symbol ends decoding
                break
        return result
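The listing above leaves `hotone` and `MAXLEN` implicit. Here is a self-contained sketch that supplies those pieces and runs greedy decoding on an untrained model; the alphabet size, hidden size, and the convention that class 0 is the stop symbol are all illustrative assumptions, not part of the original code.

```python
import torch
from torch import nn

NCLASSES = 8   # toy alphabet size; class 0 doubles as start/stop symbol (assumption)
MAXLEN = 20    # cap on decoded sequence length

def hotone(index, n=NCLASSES):
    # map a class index (int or 1-element tensor) to a one-hot vector
    v = torch.zeros(n)
    v[int(index)] = 1.0
    return v

class Seq2Seq(nn.Module):
    # compact standalone restatement of the class above (forward omitted)
    def __init__(self, ninput, noutput, nhidden, nlayers=1):
        super().__init__()
        self.noutput = noutput
        self.encoder = nn.LSTM(ninput, nhidden, num_layers=nlayers)
        self.decoder = nn.LSTM(noutput, nhidden, num_layers=nlayers)
        self.out = nn.Linear(nhidden, noutput)
        self.logsoftmax = nn.LogSoftmax(dim=0)

    def greedy_predict(self, inputs):
        _, state = self.encoder(inputs)
        decoder_input = hotone(0, self.noutput)   # start symbol
        result = []
        for _ in range(MAXLEN):
            out, state = self.decoder(decoder_input.view(1, 1, -1), state)
            _, pred = self.logsoftmax(self.out(out)[0, 0]).topk(1)
            decoder_input = hotone(pred, self.noutput)
            result.append(int(pred))
            if int(pred) == 0:                    # stop symbol
                break
        return result

model = Seq2Seq(ninput=4, noutput=NCLASSES, nhidden=16)
xs = torch.randn(5, 1, 4)      # 5 input frames, batch size 1, 4 features each
print(model.greedy_predict(xs))
```

Since the model is untrained, the decoded symbols are arbitrary; the point is only that the shapes line up and decoding terminates, either at the stop symbol or at MAXLEN.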
Bluche, Théodore, Jérôme Louradour, and Ronaldo Messina. "Scan, attend and read: End-to-end handwritten paragraph recognition with MDLSTM attention." 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR). Vol. 1. IEEE, 2017.
Wigington, Curtis, et al. "Start, follow, read: End-to-end full-page handwriting recognition." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
Questions:
Applications: