Recursion recursion recursion!
We saw two recursive functions for finding the maximal element in a list. We discussed quicksort. Then we wrote three recursive functions: subset sum, change and counting paths. We've briefly discussed a combinatorial solution for the counting paths problem.
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
The maximum is the maximal value between lst[0] and the result of recursively finding the max in lst[1:].
Let $n$ denote the size of lst.
Recursion depth: $O(n)$
Time complexity: $O(n^2)$
def max1(L):
    if len(L) == 1:
        return L[0]
    return max(max1(L[1:]), L[0])
The maximum is the maximal value between the result of recursively finding the max in lst[:n//2] and the result of recursively finding the max in lst[n//2:], where $n$ denotes the size of lst.
Recursion depth: $O(\log{n})$
Time complexity: $O(n\log{n})$
def max2(L):
    if len(L) == 1:
        return L[0]
    l = max2(L[:len(L) // 2])
    r = max2(L[len(L) // 2:])
    return max(l, r)
Since slicing is a costly operation, we can do better. Instead of slicing the list on each call, we will maintain indices delimiting the "active" part of the list (like we did for binary search) and simply recurse after updating the indices according to the same logic.
We also add envelope (wrapper) functions for more user-friendly code.
How does time/depth change for each function? The depth is clearly unaffected. Time, however, is much better.
Since we now do only $O(1)$ work per call (no slicing), the total running time is proportional to the number of nodes in the recursion tree, which is $O(n)$ in both cases.
def max11(L, left, right):
    if left == right:
        return L[left]
    return max(L[left], max11(L, left + 1, right))
def max22(L, left, right):
    if left == right:
        return L[left]
    mid = (left + right) // 2
    l = max22(L, left, mid)
    r = max22(L, mid + 1, right)
    return max(l, r)
def max1_slice(L):
    return max11(L, 0, len(L) - 1)

def max2_slice(L):
    return max22(L, 0, len(L) - 1)
import time
import random
import sys

sys.setrecursionlimit(3000)

for f in [max1, max2]:
    print(f.__name__)
    for n in [500, 1000, 2000]:
        L = [i for i in range(n)]
        random.shuffle(L)
        tic = time.process_time()
        for x in range(10):
            m = f(L)
        toc = time.process_time()
        print("n=", n, ": ", toc - tic)
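For comparison, we can run the same experiment on the index-based versions. This is a sketch that repeats the definitions of `max11` and `max22` so the cell stands alone; since each version now does $O(n)$ total work, both should be noticeably faster than their slicing counterparts.

```python
import random
import sys
import time

sys.setrecursionlimit(3000)

def max11(L, left, right):
    # linear recursion over L[left:right + 1], no slicing
    if left == right:
        return L[left]
    return max(L[left], max11(L, left + 1, right))

def max22(L, left, right):
    # binary recursion over L[left:right + 1], no slicing
    if left == right:
        return L[left]
    mid = (left + right) // 2
    return max(max22(L, left, mid), max22(L, mid + 1, right))

for f in [max11, max22]:
    print(f.__name__)
    for n in [500, 1000, 2000]:
        L = [i for i in range(n)]
        random.shuffle(L)
        tic = time.process_time()
        for x in range(10):
            m = f(L, 0, len(L) - 1)
        toc = time.process_time()
        print("n=", n, ": ", toc - tic)
```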
Quicksort has a very simple recursive logic:
import random

def quicksort(lst):
    """ quick sort of lst """
    if len(lst) <= 1:
        return lst
    else:
        pivot = random.choice(lst)  # select a random element from lst
        smaller = [elem for elem in lst if elem < pivot]
        equal = [elem for elem in lst if elem == pivot]
        greater = [elem for elem in lst if elem > pivot]
        return quicksort(smaller) + equal + quicksort(greater)  # two recursive calls
def det_quicksort(lst):
    """ sort using deterministic pivot selection """
    if len(lst) <= 1:
        return lst
    else:
        pivot = lst[0]  # select the first element of lst
        smaller = [elem for elem in lst if elem < pivot]
        equal = [elem for elem in lst if elem == pivot]
        greater = [elem for elem in lst if elem > pivot]
        return det_quicksort(smaller) + equal + det_quicksort(greater)  # two recursive calls
The worst-case and best-case analyses were discussed in class. It is interesting to note that with the deterministic first-element pivot, an already-sorted input triggers the $O(n^2)$ worst case, while a random pivot gives $O(n\log{n})$ expected time on any input.
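To see the gap concretely, here is a sketch that instruments the deterministic variant to count comparisons (the counting version and its name are mine): on an already-sorted input the first-element pivot peels off only one element per call, so the count is quadratic, while a shuffled input typically splits much more evenly.

```python
import random

def det_quicksort_count(lst):
    """First-element-pivot quicksort that also counts comparisons."""
    if len(lst) <= 1:
        return lst, 0
    pivot = lst[0]
    smaller = [e for e in lst if e < pivot]
    equal = [e for e in lst if e == pivot]
    greater = [e for e in lst if e > pivot]
    s_sorted, s_cnt = det_quicksort_count(smaller)
    g_sorted, g_cnt = det_quicksort_count(greater)
    # the three comprehensions above perform 3 * len(lst) comparisons
    return s_sorted + equal + g_sorted, s_cnt + g_cnt + 3 * len(lst)

n = 200
sorted_input = list(range(n))
shuffled = sorted_input[:]
random.shuffle(shuffled)
out1, worst = det_quicksort_count(sorted_input)
out2, avg = det_quicksort_count(shuffled)
print(worst, avg)  # sorted input incurs 3 * (2 + ... + n) = 60297 comparisons
```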
The subset sum problem is described as follows: given a list of integers $L$ and a value $s \in \mathbb{Z}$, is there a subset $S \subseteq L$ such that: $$\sum_{x \in S} x = s$$ If such an $S$ exists we return True, otherwise False.
Examples: for $L = [4, -7, 12, 5, 1]$ and $s = 6$ we return True (take $S = \{5, 1\}$), while for $L = [1, 2, 4, 8, 16]$ and $s = 32$ we return False (the entire list sums to only $31$).
The base cases are pretty straightforward: if $s = 0$ we return True (the empty subset sums to $0$), and if $L = []$ while $s \neq 0$ we return False.
What about the recursive call? Well, if $s \neq 0, L \neq []$, then the following holds: either $L[0]$ participates in the subset $S$, or it does not.
So what do we do? We recursively check whether $(L[1:], s - L[0])$ or $(L[1:], s)$ can be solved, and we return False only if both fail. Let's code:
def subset_sum(L, s):
    if s == 0:
        return True
    if L == []:
        return False
    with_first = subset_sum(L[1:], s - L[0])
    without_first = subset_sum(L[1:], s)
    return with_first or without_first
# Sanity checks
L1 = [4, -7, 12, 5, 1]
L2 = [1, 2, 4, 8, 16]
s1 = 6
s2 = 32
print(subset_sum(L1, s1))
print(subset_sum(L2, s2))
What is the running time of the code in relation to the size of the list $|L| = n$?
The recurrence relation is clearly $T(n) = 2 \cdot T(n-1) + O(1)$, which yields $T(n) = O(2^n)$.
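We can verify this bound experimentally by counting calls on an input where no base case ever prunes the tree (a sketch; the counting function and the choice $s = -1$ are mine — with positive list elements, $s$ only becomes more negative, so $s = 0$ is never reached and the recursion tree is a full binary tree).

```python
calls = 0

def subset_sum_counted(L, s):
    """subset_sum that also counts the total number of calls made."""
    global calls
    calls += 1
    if s == 0:
        return True
    if L == []:
        return False
    with_first = subset_sum_counted(L[1:], s - L[0])
    without_first = subset_sum_counted(L[1:], s)
    return with_first or without_first

n = 10
subset_sum_counted([1] * n, -1)
print(calls)  # a full binary tree of depth n has 2**(n + 1) - 1 = 2047 nodes
```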
In class we've seen how to transform recursive algorithms into iterative ones, sometimes turning exponential running time into linear (e.g., Fibonacci).
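As a quick reminder, here is that transformation sketched for Fibonacci (function names are mine): the naive recursion recomputes the same subproblems exponentially many times, while the iterative version computes each value exactly once.

```python
def fib_rec(n):
    """Naive recursion: T(n) = T(n-1) + T(n-2) + O(1), exponential time."""
    if n <= 1:
        return n
    return fib_rec(n - 1) + fib_rec(n - 2)

def fib_iter(n):
    """Iterative version: O(n) time, O(1) extra space."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_iter(30))  # 832040, computed without the exponential blow-up
```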
Can you think of a better solution? One that works in time $O(n^2)$? How about $O(n^{100})$?
Incredibly, there is a widely held belief that this problem admits no algorithm running in time $\mathrm{poly}(n)$. More on that later (much later, say, next year in computational models)...
A bus driver needs to give exact change, and she has coins of a limited set of types, with infinitely many coins of each type. Given the amount of change ($amount$) and the coin types (the list $coins$), how many ways are there to make the amount?
def change(amount, coins):
    if amount == 0:
        return 1
    elif amount < 0 or coins == []:
        return 0
    return change(amount, coins[:-1]) + \
           change(amount - coins[-1], coins)
change(5, [1,2,3])
Consider the case where $amount = n^2$ and $coins = [1,2,\ldots, n]$. When we call $change(amount, coins)$, the first level of the recursion calls $change(n^2, [1,\ldots, n-1])$ and $change(n^2 - n, [1,\ldots, n])$. This means that in the recursive calls the list size is at least $n-1$ and $amount \geq n^2 - n$.
One can show (using induction) that in the first $k \leq n$ levels of recursion the list size is at least $n - k$ and $amount \geq n k$, thus there are two recursive calls at each of these layers.
This gives a running time of at least $2^n$, by the same argument as the one we applied to the subset sum problem.
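One standard way to tame this blow-up is memoization (a sketch, not covered above; the name `change_mem` is mine): cache each result under the key $(amount, \text{number of remaining coin types})$. The key is valid because we only ever drop coins from the end of the list, so `coins` is always a prefix of the original list and its length determines it uniquely.

```python
def change_mem(amount, coins, memo=None):
    """change() with memoization on (amount, number of remaining coin types)."""
    if memo is None:
        memo = {}
    if amount == 0:
        return 1
    if amount < 0 or coins == []:
        return 0
    key = (amount, len(coins))  # coins is always a prefix of the original list
    if key not in memo:
        memo[key] = change_mem(amount, coins[:-1], memo) + \
                    change_mem(amount - coins[-1], coins, memo)
    return memo[key]

print(change_mem(5, [1, 2, 3]))  # 5, the same answer as the naive version
```

Each distinct key is computed once, so the running time drops to $O(amount \cdot len(coins))$ table entries instead of an exponential tree.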
It is interesting to note that while the two problems share some similarities, one major difference is that in the subset sum problem we ask whether a solution exists, while in the change problem we count the number of valid solutions.
This is a recurring theme in CS that we will encounter many times in the future.
Question 2(a) from the 2015 fall semester exam (Moed B).
Given a list $L$ of non-negative integers with $len(L) = d$ we consider $L$ as a point in a $d$-dimensional space.
For example: $L = [0, 0, 0]$ represents the origin in a $3$-dimensional space.
Our goal is to find how many ways we have of getting from $[0, \ldots, 0]$ to $L$ by advancing only forward.
For example, if $L=[1, 2]$ then we have three such paths: $[0,0] \to [1,0] \to [1,1] \to [1,2]$, $[0,0] \to [0,1] \to [1,1] \to [1,2]$, and $[0,0] \to [0,1] \to [0,2] \to [1,2]$.
Again, we first think of the base case, and then reduce big problems to smaller ones.
This gives rise to a simple recursive algorithm:
def cnt_paths(L):
    if all_zeros(L):
        return 1
    result = 0
    for i in range(len(L)):
        if L[i] != 0:
            L[i] -= 1
            result += cnt_paths(L)
            L[i] += 1
    return result
def all_zeros(L):
    for i in L:
        if i != 0:
            return False
    return True
print(cnt_paths([3,4,5]))
Let's take a simple case where $L = [n, n, \ldots, n]$ and $|L| = d$. I.e. - we are in a $d$-dimensional space and we want to get to $[n, \ldots, n]$.
What does the recursion tree for this problem look like? At each node we have $k$ recursive calls, where $k$ is the number of non-zero coordinates in the current list. At the $i$th level of the recursion we have $sum(L) = nd - i$, so each path in the tree has length exactly $nd$, and in particular the depth of the tree is $nd$.
Additionally, the leaves in the tree are exactly the "legal paths" which we count.
Since the result grows by $1$ for each legal path, the running time is at least the number of legal paths.
How big can this number get, and can we do better?
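The combinatorial solution mentioned at the beginning gives a closed form: a path to $L$ is a sequence of $sum(L)$ unit steps in which we choose which coordinate advances at each step, so the number of paths is the multinomial coefficient $\frac{(\sum_i L[i])!}{\prod_i L[i]!}$. A sketch (the function name is mine), assuming non-negative entries:

```python
from math import factorial

def cnt_paths_formula(L):
    """Multinomial coefficient: sum(L)! / (L[0]! * L[1]! * ... * L[-1]!)."""
    result = factorial(sum(L))
    for a in L:
        result //= factorial(a)  # each intermediate quotient is an integer
    return result

print(cnt_paths_formula([1, 2]))    # 3, matching the three paths listed above
print(cnt_paths_formula([3, 4, 5])) # 27720, matching cnt_paths([3, 4, 5])
```

For $L = [n, \ldots, n]$ of length $d$ this is $\frac{(nd)!}{(n!)^d}$, which grows exponentially in $nd$ — so the closed form is exponentially faster than enumerating the paths one by one.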