Define a class Trie
that can be used for fast lookups of strings.
Such classes are frequently used in many natural language processing applications.
(Writing such a class is also a common interview question; you need to be able to do it in real time.)
Have your class update a global variable nops
for each node traversal during
add
, lookup
, and remove
operations.
class Trie:
def __init__(self):
pass
def add(self,s,value):
"""Add the string `s` to the `Trie` and
map it to the given value."""
global nops
nops += 1 # this is just a placeholder
pass
def lookup(self,s,default=None):
"""Look up the value corresponding to the
string `s`."""
def remove(self,s):
"""Remove the string s from the Trie.
Returns True if the string was a member."""
pass
def prefix(self,s):
"""Check whether the string `s` is a prefix
of some member."""
pass
def items(self):
"""Return an iterator over the items of the `Trie`."""
Write some unit tests demonstrating that your class works as intended. The next cell gives some examples, but you need to write additional tests for other methods and common sequences of operations.
trie = Trie()
trie.add("hello",1)
trie.add("world",2)
assert not trie.lookup("street")
assert not trie.lookup("house")
assert trie.lookup("hello")
assert trie.lookup("world")
--------------------------------------------------------------------------- AssertionError Traceback (most recent call last) <ipython-input-21-18e07a553991> in <module>() 4 assert not trie.lookup("street") 5 assert not trie.lookup("house") ----> 6 assert trie.lookup("hello") 7 assert trie.lookup("world") AssertionError:
trie = Trie()
nops = 0
trie.add("hello",1)
assert nops>0
Next, let's measure how Trie
performance scales on real data.
import re
words = re.findall(r'\w+',open("tomsawyer.txt").read())
words = [w.lower() for w in words]
print len(words),words[:10]
74354 ['the', 'adventures', 'of', 'tom', 'sawyer', 'mark', 'twain', 'harper', 'and', 'brothers']
counts = []
for n in range(1000,70000,1000):
trie = Trie()
nops = 0
for i,w in enumerate(words[:n]):
trie.add(w,i)
counts.append((n,nops))
counts = array(counts)
plot(counts[:,0],counts[:,1])
[<matplotlib.lines.Line2D at 0x2fd9a10>]