Can you find the "edit distance" between two strings? The edit distance is smallest number of changes you can make to get from s1 to s2 (one character at time). The only changes you're allowed to make are:
Note: the strings will consist of only lowercase english letters
Examples:
s1: "addis" s2: "adis" -> 1 (delete second 'd')
s1: "ready" s2: "really" -> 2 (replace 'd' with 'l', insert 'l')
def check(actual, expected):
if expected != actual:
print(f"Function should return the value {expected}, it is returning value {actual}")
else:
print(f"Congratulations, the test cases passed!")
Your solution:
Highlight the invisible text below to see the solution:
How could we break this problem into subproblems? Write down the subproblem in words.
Your solution:
Highlight the invisible text below to see the solution:
What are the "easy" subproblems or base cases? What are their solutions?
Your solution:
Highlight the invisible text below to see the solution:
If the first string is empty (m is 0), we have to insert all the letters from the second string, so the edit distance is the length of the second string.
If the second string is empty (n is 0), we have to delete all the letters from the first string, so the edit distance is the length of the first string.
if m==0: return nif n==0: return m
How can we get from one subproblem to another subproblem? What are the cases?
More specifically, if we want to know the edit_distance(s1[:m], s2[:n])
, how can we calculate that in terms of s1[:m-1], s2[:n-1]), s1[:m]
and s2[:n]
?
Your solution: (write your explanation here)
Highlight the invisible text below to see the solution:
Solution: There are two cases: First Case: the last characters in each word are the same In this case, we don't need to increase the edit distance. We can just move on to the two previous characters. return edit_distance(s1[:m-1], s2[:n-1]) Second Case: the last characters in each word are different In the second case, we have three options: we can either delete, replace, or insert a character to make them the same. The edit distance of s1[:m], s2[:n] will be the minimum of the edit distance resulting from deleting, inserting, or changing a character to make them the same. Don't forget to add one because we made an edit! return 1 + min(edit_distance(s1[:m], s2[:n-1]), # Insert edit_distance(s1[:m-1], s2[:n]), # Remove edit_distance(s1[:m-1], s2[:n-1]) # Replace )
Put these pieces together to write a function edit_distance
that uses recursion to solve the problem!
def edit_distance(str1, str2, m , n):
# write your code here
pass
Test your recursive solution on the following 2 test cases before you proceed to the next exercises (just execute the following cells).
def test_small(fn):
def test_1():
s1 = "ztlqqmtppm"
s2 = "zfewmvuc"
m, n = len(s1), len(s2)
check(fn(s1, s2, m, n), 8)
def test_2():
s1 = "eodh"
s2 = "hmowpx"
m, n = len(s1), len(s2)
check(fn(s1, s2, m, n), 5)
test_1()
test_2()
test_small(edit_distance)
Highlight the invisible text below to see one possible solution:
def edit_distance(str1, str2, m , n): if m==0: return n if n==0: return m if str1[m-1]==str2[n-1]: return edit_distance(str1,str2,m-1,n-1) return 1 + min(edit_distance(str1, str2, m, n-1), # Insert edit_distance(str1, str2, m-1, n), # Remove edit_distance(str1, str2, m-1, n-1) # Replace )
We can use memoization to improve our solution (remember that memoization means keeping track of the subproblems we've already solved so we don't do unnecessary work).
What would our data structure look like to store the subproblems?
Your solution:
Highlight the invisible text below to see the solution:
Solution: We notice that every subproblem is completely defined by the state (m, n). That means we can use a 2-D array to represent every state (0, 0), (0, 1) . . . (m, n). Also note that we need to initialize our memo table with an invalid value to denote that we have not visited this state yet. Here we put 'None'.m = len(s1) n = len(s2) memo = [[None for i in range(n+1)] for j in range(m+1)]
Write a new function called edit_distance_memo
that uses memoization to save the answers into a memo table.
def edit_distance_memo(str1, str2, m, n):
# write your code here. Hint: you may need to create new functions
pass
Now test your solution on the following 4 test cases (just execute the following cell). Also make sure that you have already tested your recursive solution.
def test_large(fn):
def test_1():
s1 = "unfhabkcodvvnhywehylksbxqrpcogowrwhfxppmekrxlmwzvpigguswoigazmwnvkblbcnpjcejetsnafbapjvaykdetadfxtapsgspvacmrzyhhtmquvrltnibowfvjsypixrkfryoaxzrmiyhmzygusdngbcobbvdsvucutgrelrgruipbljesxjudogkqmnqwwqotapqdijmslphcaylzbelbyvbuhghneq"
s2 = "qyquxbguzngqxvqifkizoyelwpnfbcbpcjjnhwxxcuaxjxigibwfwspuouzukzdrobbpcdszhihsizewhtsgtkjncyezdxlfagpgukultsotfxheydwgmjgjrjalxaojblsddnnoavseroauchhryhzjgtgpczlozhikkiaycsteolhecwfutoyrhkrmerdrhmyeecukly"
m, n = len(s1), len(s2)
check(fn(s1, s2, m, n), 199)
def test_2():
s1 = "mzvwhhkckeszquyxfmkvjffintsnhszyvsnjbyoudmlinismsestwagqemblfmrmlakcerxphlyiqxoqmxuqmvrkjapqynworjgjndchvnyawpygkbwqiknhbjflboaauzmkexigmfhkpsckamsgqvtbmpwmnaovdvxbmfowlmarkxwnhldtrwvtbdifi"
s2 = "ovkruoltszfadqmitjvkjakrxljydrbcoxuyiglwoebvhhhqzkopgtyjjrajlpbtkvqcnokttiaurqpueczbzqdtifwwltyxrllihsdskzdqcivcefbpobrinmmmlpodkzimqxrmhzfvdopohtgxeqdtqaugxagqwvsjmvsktjxtlcsixxomkcrrcetjbymuviwcyvssngdkudczeurhbecpjaozavlftpdubowvhfdwqvkijmbvroko"
m, n = len(s1), len(s2)
check(fn(s1, s2, m, n), 209)
test_1()
test_2()
test_small(edit_distance_memo)
test_large(edit_distance_memo)
Highlight the invisible text below to see the solution:
def edit_distance_memo(str1, str2, m, n): memo = [[None for i in range(n+1)] for j in range(m+1)] return dp(str1, str2, m, n, memo) def dp(str1, str2, m, n, memo): if m == 0: return n if n == 0: return m if memo[m][n] != None: return memo[m][n] if str1[m-1] == str2[n-1]: memo[m][n] = dp(str1, str2, m-1, n-1, memo) return memo[m][n] memo[m][n] = 1 + min(dp(str1, str2, m, n-1, memo), # Insert dp(str1, str2, m-1, n, memo), # Remove dp(str1, str2, m-1, n-1, memo) # Replace ) return memo[m][n]
As you're solving other dynamic programming problems, remember the steps you followed here to break down the problem.
How could we break this problem into subproblems? Write down the subproblem in words.
What are the "easy" subproblems or base cases? What are their solutions?
How can we get from one subproblem to another subproblem? What are the cases?
Can you incorporate memoization?