This notebook was prepared by Donne Martin. Source and license info is on GitHub.

# Solution Notebook¶

## Constraints¶

• Can we assume the inputs are valid?
• No
• Can we assume the strings are ASCII?
• Yes
• Is this case sensitive?
• Yes
• Is a subsequence a non-contiguous block of chars?
• Yes
• Do we expect a string as a result?
• Yes
• Can we assume this fits memory?
• Yes

## Test Cases¶

• str0 or str1 is None -> Exception
• str0 or str1 equals 0 -> ''
• General case

str0 = 'ABCDEFGHIJ' str1 = 'FOOBCDBCDE'

result: 'BCDE'

## Algorithm¶

We'll use bottom up dynamic programming to build a table.

The rows (i) represent str0.
The columns (j) represent str1.

str1
-------------------------------------------------
|   |   | A | B | C | D | E | F | G | H | I | J |
-------------------------------------------------
|   | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| F | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| O | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
s | O | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
t | B | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
r | C | 0 | 0 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
0 | D | 0 | 0 | 1 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| B | 0 | 0 | 1 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| C | 0 | 0 | 1 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| D | 0 | 0 | 1 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| E | 0 | 0 | 1 | 2 | 3 | 4 | 4 | 4 | 4 | 4 | 4 |
-------------------------------------------------

if str1[j] != str0[i]:
T[i][j] = max(
T[i][j - 1],
T[i - 1][j])
else:
T[i][j] = T[i - 1][j - 1] + 1


Complexity:

• Time: O(m * n), where m is the length of str0 and n is the length of str1
• Space: O(m * n), where m is the length of str0 and n is the length of str1

## Code¶

In [1]:
class StringCompare(object):

def longest_common_subseq(self, str0, str1):
if str0 is None or str1 is None:
raise TypeError('str input cannot be None')
# Add one to number of rows and cols for the dp table's
# first row of 0's and first col of 0's
num_rows = len(str0) + 1
num_cols = len(str1) + 1
T = [[None] * num_cols for _ in range(num_rows)]
for i in range(num_rows):
for j in range(num_cols):
if i == 0 or j == 0:
T[i][j] = 0
elif str0[j - 1] != str1[i - 1]:
T[i][j] = max(T[i][j - 1],
T[i - 1][j])
else:
T[i][j] = T[i - 1][j - 1] + 1
results = ''
i = num_rows - 1
j = num_cols - 1
# Walk backwards to determine the subsequence
while T[i][j]:
if T[i][j] == T[i][j - 1]:
j -= 1
elif T[i][j] == T[i - 1][j]:
i -= 1
elif T[i][j] == T[i - 1][j - 1] + 1:
results += str1[i - 1]
i -= 1
j -= 1
else:
raise Exception('Error constructing table')
# Walking backwards results in a string in reverse order
return results[::-1]


## Unit Test¶

In [2]:
%%writefile test_longest_common_subseq.py
import unittest

class TestLongestCommonSubseq(unittest.TestCase):

def test_longest_common_subseq(self):
str_comp = StringCompare()
self.assertRaises(TypeError, str_comp.longest_common_subseq, None, None)
self.assertEqual(str_comp.longest_common_subseq('', ''), '')
str0 = 'ABCDEFGHIJ'
str1 = 'FOOBCDBCDE'
expected = 'BCDE'
self.assertEqual(str_comp.longest_common_subseq(str0, str1), expected)
print('Success: test_longest_common_subseq')

def main():
test = TestLongestCommonSubseq()
test.test_longest_common_subseq()

if __name__ == '__main__':
main()

Overwriting test_longest_common_subseq.py

In [3]:
%run -i test_longest_common_subseq.py

Success: test_longest_common_subseq