Many times, recursion gives us a clean way to think about problems and solve them.
But a recursive program is often slower than a non-recursive version.
So sometimes, after finding a recursive solution, we want to transform it into a non-recursive one.
Understanding how the non-recursive version works also helps us understand the recursive version better.
Recall the recursive code for binary search:
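```python
def bin_search(L, item):
    """Return an index of item in the sorted list L, or -1 if it is absent."""
    n = len(L)
    if n == 0:
        return -1
    m = n // 2                      # middle index
    if L[m] == item:
        return m
    if L[m] > item:
        # item can only be in the left half, whose indices are the same as in L
        return bin_search(L[:m], item)
    # item can only be in the right half, which starts at index m+1 of L
    res = bin_search(L[m+1:n], item)
    return -1 if res == -1 else m + 1 + res
```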
To make it non-recursive, we keep the current search range in two indices, `left` (inclusive) and `right` (exclusive), instead of slicing the list:
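```python
def bin_search_nr(L, item):
    left = 0
    right = len(L)              # the search range is L[left:right]
    while right - left > 0:
        m = (left + right) // 2
        if L[m] == item:
            return m
        if L[m] > item:
            right = m           # continue in L[left:m]
        else:
            left = m + 1        # continue in L[m+1:right]
    return -1                   # the range is empty: item is not in L
```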
```python
L = range(0, 200, 2)            # the even numbers 0, 2, ..., 198
print(bin_search_nr(L, 100))    # 50
print(bin_search_nr(L, 101))    # -1
```
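For comparison, Python's standard library already provides binary search in the `bisect` module. The sketch below wraps `bisect.bisect_left`; the name `bin_search_bisect` is ours, for illustration only. `bisect_left` returns the leftmost position where `item` could be inserted into the sorted sequence `L` without breaking its order, so `item` is present exactly when it sits at that position.

```python
from bisect import bisect_left

def bin_search_bisect(L, item):
    """Return an index of item in the sorted sequence L, or -1 if it is absent.

    Hypothetical helper: a thin wrapper around the standard bisect module.
    """
    i = bisect_left(L, item)        # leftmost insertion point for item
    if i < len(L) and L[i] == item:
        return i
    return -1

L = range(0, 200, 2)
print(bin_search_bisect(L, 100))    # 50
print(bin_search_bisect(L, 101))    # -1
```

One difference from `bin_search_nr` concerns repeated values. When `item` occurs several times, `bisect_left` always finds the first occurrence, while `bin_search_nr` may return any of them.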
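Selection sort is another example. We first need a non-recursive helper that finds the index of the smallest element of a list:
```python
def find_min_index(L):
    """Return the index of the smallest element of the non-empty list L."""
    current_index = 0
    current_min = L[0]
    for j in range(1, len(L)):
        if current_min > L[j]:
            current_min = L[j]
            current_index = j
    return current_index
```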
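The recursive version moves the minimum to the front and then sorts the rest:
```python
def selection_sort(L):
    if len(L) <= 1:
        return L                         # an empty or one-element list is already sorted
    min_idx = find_min_index(L)          # non-recursive helper function
    L[0], L[min_idx] = L[min_idx], L[0]  # move the minimum to the front
    return [L[0]] + selection_sort(L[1:])  # then sort the rest recursively
```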
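The non-recursive version does the same thing with a loop. In step `i`, it moves the minimum of `L[i:]` to position `i`:
```python
def selection_sort_nr(L):
    for i in range(len(L)):
        # index of the minimum of L[i:], translated back to an index of L
        min_idx = i + find_min_index(L[i:])
        L[i], L[min_idx] = L[min_idx], L[i]
    return L

print(selection_sort_nr([3, 1, 4, 1, 5, 9, 2]))   # [1, 1, 2, 3, 4, 5, 9]
```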
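The slice `L[i:]` inside `selection_sort_nr` copies part of the list on every iteration. Below is a minimal sketch of a variant that avoids the copies; the name `selection_sort_inplace` is hypothetical. It scans indices directly and uses the built-in `min` with a `key` function to find the position of the smallest remaining element:

```python
def selection_sort_inplace(L):
    """Sort the list L in place by repeatedly selecting the minimum; returns L.

    Hypothetical variant of selection_sort_nr that never slices L.
    """
    n = len(L)
    for i in range(n):
        # position of the smallest element among L[i], ..., L[n-1]
        min_idx = min(range(i, n), key=lambda k: L[k])
        L[i], L[min_idx] = L[min_idx], L[i]
    return L

print(selection_sort_inplace([3, 1, 4, 1, 5, 9, 2]))   # [1, 1, 2, 3, 4, 5, 9]
```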
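Merge sort needs a helper that merges two sorted lists into one sorted list:
```python
def merge_lists(L1, L2):
    """Merge two sorted lists into a single sorted list."""
    i = 0
    j = 0
    res = []
    while i < len(L1) and j < len(L2):
        # append the smaller of the two front elements
        if L1[i] < L2[j]:
            res.append(L1[i])
            i += 1
        else:
            res.append(L2[j])
            j += 1
    res += L1[i:] + L2[j:]   # one of these is empty; the other holds the leftovers
    return res
```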
```python
def merge_sort(L):
    if len(L) <= 1:
        return L
    m = len(L) // 2
    L1 = merge_sort(L[0:m])   # sort each half recursively
    L2 = merge_sort(L[m:])
    return merge_lists(L1, L2)

print(merge_sort([3, 1, 4, 1, 5, 9, 2]))   # [1, 1, 2, 3, 4, 5, 9]
```
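The non-recursive version works bottom-up. It starts from one-element lists and repeatedly merges adjacent pairs until a single sorted list remains:
```python
def merge_sort_nr(L):
    if not L:
        return []                       # nothing to sort (avoids lists[0] on an empty list)
    lists = [[x] for x in L]            # n sorted lists of length one
    while len(lists) > 1:
        new_lists = []
        if len(lists) % 2:
            lists.append([])            # pad to an even number of lists
        for i in range(0, len(lists) - 1, 2):
            new_lists.append(merge_lists(lists[i], lists[i + 1]))
        lists = new_lists               # half as many lists, each about twice as long
    return lists[0]

print(merge_sort_nr([3, 1, 4, 1, 5, 9, 2]))   # [1, 1, 2, 3, 4, 5, 9]
```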
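The sketch below shows the bottom-up process at work by recording the lists after each round of merging; the name `merge_sort_passes` is hypothetical. It makes two substitutions, both assumptions for this illustration. It uses the standard library's `heapq.merge` in place of `merge_lists`. It also carries an unpaired last list over to the next round unchanged instead of padding with an empty list, which has the same effect.

```python
import heapq

def merge_sort_passes(L):
    """Bottom-up merge sort that returns the list of sorted runs after every round."""
    runs = [[x] for x in L]
    history = [runs]
    while len(runs) > 1:
        # merge runs 0&1, 2&3, ...; an unpaired last run is carried over as is
        runs = [list(heapq.merge(*runs[i:i + 2])) for i in range(0, len(runs), 2)]
        history.append(runs)
    return history

for runs in merge_sort_passes([3, 1, 4, 1, 5, 9, 2]):
    print(runs)
```

For `[3, 1, 4, 1, 5, 9, 2]` this prints four rounds:

- seven one-element lists;
- `[[1, 3], [1, 4], [5, 9], [2]]`;
- `[[1, 1, 3, 4], [2, 5, 9]]`;
- `[[1, 1, 2, 3, 4, 5, 9]]`.

Each round halves the number of lists, so there are about $\log_2 n$ rounds.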
Often in computation we have data from the world, and a question we want to answer about these data.
To do so, we need to find a model for the data and a way to translate our question into a mathematical question about the model.
Here are some examples:
Suppose you have a map of Addis Ababa and want to find out what's the fastest way to get from the national museum to the market.
Suppose you are Facebook and you are trying to figure out how many friends of friends the average Ethiopian has.
Suppose you are a geneticist, and are trying to figure out which genes are related to a particular type of colon cancer.
What is perhaps most surprising is that these and many other questions all use the same mathematical model: a graph.
A graph is just a way to store connections between pairs of entities:
The graph of Addis's roads could be composed of all street intersections, with a connection between intersection $u$ and intersection $v$ if they are directly connected by a road.
The Facebook graph is composed of all Facebook users, with a connection between user $u$ and user $v$ if they are friends.
The gene-symptom interaction graph is composed of all genes and all "symptoms" (also known as phenotypes: some observable differences in people), where gene $u$ is connected to symptom $v$ if there is a correlation between people having the gene $u$ and symptom $v$.
Mathematically, a graph is a set $V$ of vertices and a set $E$ of pairs of these vertices which is known as the set of edges. We say that a vertex $u\in V$ is connected to $v\in V$ if the pair $(u,v)$ is in $E$.
A graph where $(u,v)\in E$ if and only if $(v,u)\in E$ is known as an undirected graph. Undirected graphs form an important special case, and we will mostly be interested in them.
Sometimes the edges (or vertices) of the graph are labeled (often by a number). For example, in the case of the road network, we might label every road segment with the average time it takes to travel from one end to the other.
There are two main representations for graphs. We can always assume the vertices are simply identified by the numbers $0$ to $n-1$ for some $n$ (the convention used by the Python code here).
The adjacency list representation is an array $L$ where $L[i]$ is the list of all neighbors of the vertex $i$ (i.e., all $j$ such that $(i,j)\in E$).
The adjacency matrix representation is an $n\times n$ two-dimensional array $M$ (i.e., matrix) such that $M[i][j]$ equals $1$ if $j$ is a neighbor of $i$ and equals $0$ otherwise.
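Below is a minimal sketch of converting between the two representations; the function names are hypothetical. It assumes, as in the Python code in this section, that vertices are numbered $0$ to $n-1$ and that `G` has one entry per vertex. The adjacency list uses space proportional to the number of vertices plus the number of edges. The matrix always uses $n^2$ entries, but it can tell whether $(i,j)$ is an edge with a single lookup.

```python
def list_to_matrix(G):
    """Convert an adjacency list G (G[i] = neighbors of i) to an n x n 0/1 matrix.

    Assumes every neighbor index is smaller than len(G).
    """
    n = len(G)
    M = [[0] * n for _ in range(n)]   # n separate rows, not n references to one row
    for i in range(n):
        for j in G[i]:
            M[i][j] = 1               # the edge (i, j) is present
    return M

def matrix_to_list(M):
    """Convert a 0/1 adjacency matrix back to an adjacency list."""
    return [[j for j, bit in enumerate(row) if bit] for row in M]

G = [[1], [2], [3], [0]]              # directed cycle 0 -> 1 -> 2 -> 3 -> 0
for row in list_to_matrix(G):
    print(row)
```

This prints the rows `[0, 1, 0, 0]`, `[0, 0, 1, 0]`, `[0, 0, 0, 1]` and `[1, 0, 0, 0]`.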
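```python
G = [[1], [2], [3], [0]]   # directed cycle 0 -> 1 -> 2 -> 3 -> 0
draw_graph(G)              # plotting helper from the course notebook (not shown here)
```
[Figure: G drawn with a shell layout]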
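```python
G = [[1, 2, 3, 4, 5, 6]]   # a "star": vertex 0 points to vertices 1, ..., 6
draw_graph(G)
```
[Figure: the star graph drawn with a shell layout]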
```python
n = 20
G = [[(i + 1) % n] for i in range(n)]   # directed cycle 0 -> 1 -> ... -> 19 -> 0
draw_graph(G)
```
[Figure: the 20-vertex cycle drawn with a spectral layout]
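```python
def grid_neighbors(i, j, n):
    """Out-neighbors of vertex (i, j) in an n x n grid; vertex (i, j) is numbered i*n + j."""
    if i == n-1 and j == n-1:
        return []                        # last vertex: no neighbors to point to
    if i == n-1:
        return [i*n + j + 1]             # last row: only (i, j+1)
    if j == n-1:
        return [(i+1)*n + j]             # last column: only (i+1, j)
    return [i*n + j + 1, (i+1)*n + j]    # (i, j+1) and (i+1, j)

n = 5
G = [grid_neighbors(i, j, n) for i in range(n) for j in range(n)]
draw_graph(G, 'grid_layout')
```
[Figure: the 5 by 5 grid graph drawn with a grid layout]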
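The sketch below is a quick sanity check of the vertex numbering. It builds the same grid in an equivalent way; the name `grid_graph` is hypothetical. Vertex $(i,j)$ gets the number $i\cdot n+j$, and each vertex points to $(i,j+1)$ and $(i+1,j)$ when they exist. Each of the $n$ rows contributes $n-1$ edges of the first kind, and each of the $n$ columns contributes $n-1$ edges of the second kind, for $2n(n-1)$ edges in total. `divmod` recovers $(i,j)$ from a vertex number.

```python
def grid_graph(n):
    """n x n grid as a directed adjacency list; vertex (i, j) is numbered i*n + j."""
    G = []
    for i in range(n):
        for j in range(n):
            nbrs = []
            if j < n - 1:
                nbrs.append(i * n + j + 1)     # edge to (i, j+1)
            if i < n - 1:
                nbrs.append((i + 1) * n + j)   # edge to (i+1, j)
            G.append(nbrs)
    return G

G = grid_graph(5)
print(sum(len(nbrs) for nbrs in G))   # 40 edges: 2 * 5 * 4
print(divmod(13, 5))                  # (2, 3): vertex 13 is (i, j) = (2, 3)
```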
The connectivity problem: given vertices $i,j$ and a graph $G$, find out whether $j$ is connected to $i$ in $G$, perhaps indirectly through a path of several edges.
Here is a natural suggestion for a recursive algorithm:
$connected(i,j,G)$ is True if $i$ is a neighbor of $j$, and otherwise it is True if there is some neighbor $k$ of $i$ such that $k$ is connected to $j$.
Let's code it up and see what happens:
```python
import sys

def connected(i, j, G):
    sys.stdout.write('.')    # write a dot on every call, so we can count the calls
    if j in G[i]:
        return True
    return any([connected(k, j, G) for k in G[i]])
```
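```python
def undir(G):
    """Return an undirected version of G: for every edge (i, j), also add (j, i)."""
    # number of vertices: one more than the largest vertex mentioned, but at least len(G)
    n = max(max(G[i]) if G[i] else 0 for i in range(len(G)))
    n = max(n + 1, len(G))
    _G = [[] for i in range(n)]
    for i in range(len(G)):
        for j in G[i]:
            if j not in _G[i]:
                _G[i].append(j)
            if i not in _G[j]:
                _G[j].append(i)
    return _G
```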
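```python
G = [[1], [2], [3], [4], []]   # directed path 0 -> 1 -> 2 -> 3 -> 4
draw_graph(G)
```
[Figure: G drawn with a spectral layout]
```python
G = undir(G)
print(G)   # [[1], [0, 2], [1, 3], [2, 4], [3]]
```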
```python
connected(0, 1, G)   # writes "." (one call) and returns True
connected(0, 2, G)   # writes ".." (two calls) and returns True
connected(0, 3, G)   # writes hundreds of dots, then fails with:
# RuntimeError: maximum recursion depth exceeded while calling a Python object
# (Python 3.5 and later report this as RecursionError, a subclass of RuntimeError)
```
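The problem is that we are getting into an infinite loop! `connected(0,3,G)` calls `connected(1,3,G)`, which in turn calls `connected(0,3,G)` again, and so on until Python runs out of recursion depth. We can fix this by remembering which vertices we have already visited.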
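Below is a sketch of one way to apply this fix while keeping the recursive structure; the name `connected_rec` and its `visited` parameter are hypothetical. Each vertex is expanded at most once, so the recursion can no longer go around a cycle. Unlike the version above, it also treats a vertex as connected to itself.

```python
def connected_rec(i, j, G, visited=None):
    """Return True if j can be reached from i by following edges of G.

    Hypothetical recursive fix: remembers visited vertices in a set.
    """
    if visited is None:
        visited = set()
    if i == j:
        return True
    visited.add(i)                       # never expand i a second time
    for k in G[i]:
        if k not in visited and connected_rec(k, j, G, visited):
            return True
    return False

G = [[1], [0, 2], [1, 3], [2, 4], [3]]   # the undirected path 0 - 1 - 2 - 3 - 4
print(connected_rec(0, 3, G))            # True
```

The recursion can still go as deep as the longest path it explores. On large graphs it may therefore hit Python's default recursion limit of 1000 frames. A version that keeps an explicit list of vertices to visit avoids this limit.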
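```python
def grid_input(n):
    """Return (source, target, G): an undirected n by n grid plus one isolated vertex.

    The source is vertex 0 and the target is the isolated vertex, so they are not connected.
    """
    G = [grid_neighbors(i, j, n) for i in range(n) for j in range(n)]
    G.append([])                 # the isolated vertex, numbered n*n
    G = undir(G)
    return (0, len(G) - 1, G)
```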
```python
def connected(source, target, G):
    added = [False for i in range(len(G))]  # added[v]: has v ever been put in to_visit?
    added[source] = True
    to_visit = [source]  # to visit: vertices that are definitely connected to the source
    while to_visit:
        step_pc()  # count how many times the while loop is executed (course helper, not shown)
        i = to_visit.pop()  # take the most recently added vertex
        if i == target:
            return True
        for j in G[i]:
            if not added[j]:  # each vertex is added at most once, so the loop terminates
                added[j] = True
                to_visit.append(j)
    return False
```
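```python
G = undir([[1], [2], [0], []])   # triangle 0-1-2 plus the isolated vertex 3
draw_graph(G)
```
[Figure: G drawn with a spring layout]
```python
print(connected(0, 1, G), connected(0, 3, G))   # True False
```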
[Figure: running time of the connectivity algorithm; about 0.684 microseconds per step, with the fitted curve for the number of steps being $n$]
Let's see how the algorithm evolves on a typical graph:
```python
def connected_viz(source, target, G, layout_method=None):
    # initialize_animation, color and step_pc are helpers from the course notebook (not shown)
    initialize_animation(G, my_layout_method=layout_method)
    visited = [False for i in range(len(G))]
    to_visit = [source]  # to visit: vertices that are definitely connected to the source
    while to_visit:
        step_pc()  # count how many times the while loop is executed
        i = to_visit.pop()
        color(i, 'r')  # red: observed
        if i == target:
            return True
        visited[i] = True
        for j in G[i]:
            if not visited[j]:
                to_visit.append(j)
                color(j, 'g')  # green: waiting to be visited
    return False
```
```python
(s, t, G) = grid_input(5)
draw_graph(G, 'grid_layout')
```
[Figure: the 5 by 5 grid plus an isolated vertex, drawn with a grid layout]
```python
print(connected_viz(s, t, G, 'grid_layout'))   # False
show_animation()
```
```python
def connected_FIFO(source, target, G):
    added = [False for i in range(len(G))]
    added[source] = True
    to_visit = [source]  # to visit: vertices that are definitely connected to the source
    while to_visit:
        i = to_visit.pop(0)  # remove the FIRST element (first in, first out)
        if i == target:
            return True
        for j in G[i]:
            if not added[j]:
                added[j] = True
                to_visit.append(j)
    return False
```
```python
def connected_FIFO_viz(source, target, G, layout_method=None):
    initialize_animation(G, my_layout_method=layout_method)
    added = [False for i in range(len(G))]
    added[source] = True
    to_visit = [source]  # to visit: vertices that are definitely connected to the source
    while to_visit:
        step_pc()  # count how many times the while loop is executed
        i = to_visit.pop(0)  # remove the first element
        color(i, 'r')  # red: observed
        if i == target:
            return True
        for j in G[i]:
            if not added[j]:
                added[j] = True
                to_visit.append(j)
                color(j, 'g')  # green: added to queue
    return False
```
```python
(s, t, G) = grid_input(5)
connected_FIFO_viz(s, t, G, 'grid_layout')
show_animation()
```
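The function `connected` is known as depth-first search (DFS), and `connected_FIFO` is known as breadth-first search (BFS). The only difference between them is which vertex is removed from `to_visit` next. `connected` takes the most recently added vertex (`pop()`, a last-in-first-out stack), while `connected_FIFO` takes the earliest added one (`pop(0)`, a first-in-first-out queue).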
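`list.pop(0)` has to shift every remaining element of the list, so on large graphs it can dominate the running time of `connected_FIFO`. Below is a minimal sketch of the same breadth-first search using `collections.deque`, whose `popleft` removes the first element in constant time. The name `connected_BFS` is hypothetical.

```python
from collections import deque

def connected_BFS(source, target, G):
    """Breadth-first connectivity check, using a deque as the FIFO queue."""
    added = [False] * len(G)
    added[source] = True
    to_visit = deque([source])
    while to_visit:
        i = to_visit.popleft()   # O(1), unlike list.pop(0)
        if i == target:
            return True
        for j in G[i]:
            if not added[j]:
                added[j] = True
                to_visit.append(j)
    return False

G = [[1, 2], [0, 2], [0, 1], []]   # triangle 0-1-2 plus the isolated vertex 3
print(connected_BFS(0, 2, G), connected_BFS(0, 3, G))   # True False
```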
This week you managed to do some pretty impressive work. Congratulations!
What I hope you learned:
Coding is about understanding what problem you need to solve, then breaking it into smaller problems.
This is not about typing or computers but about thinking, just like math.
My main hope:
You are always welcome to contact me:
Email: b@boazbarak.org
Web page: http://www.boazbarak.org
Implement a function `hasElementSum(n, L)`, where `n` is an `int` and `L` is a list of `int`s. The function should return `False` if no two distinct elements in `L` (that is, elements at two different positions) sum to `n`. Otherwise, it should return a list of size two whose elements are two elements of `L` that sum to `n`. There can be multiple valid return values.
```python
def hasElementSum(n, L):
    # write your code here
    pass

print(hasElementSum(5, [1, 2, 3, 4]))
# can return either [1,4], [4,1], [2,3], or [3,2]
print(hasElementSum(8, [1, 2, 3, 4]))
# should return False
print(hasElementSum(4, [2, 2]))
# should return [2,2]
```
Implement a function `hasElementSumSorted(n, L)`, where `n` is an `int` and `L` is a sorted list of `int`s. The function should return `False` if no two distinct elements in `L` sum to `n`. Otherwise, it should return a list of size two whose elements are two elements of `L` that sum to `n`. There can be multiple valid return values. Your code should be able to handle very large lists (for example, of size one million). Hint: use binary search.
```python
# now L is sorted, from smallest to biggest
def hasElementSumSorted(n, L):
    # write your code here
    pass

print(hasElementSumSorted(750000, range(1, 1000000)))
# there are many correct return values [a,b], but a and b should be different,
# sum to 750,000, and be in the range from 1 to 999,999
```
Define a function `flooredSquareRoot(n)` which takes a positive `int` (or `long`, in Python 2) `n` and computes its square root, rounded down to the nearest integer. Python's `math` module has a `sqrt` function which could be helpful here, but don't use it. You also should not use the exponentiation operator `**`. Your code should run quickly as long as `n` is not bigger than 1,000,000.
```python
def flooredSquareRoot(n):
    # write your code here
    pass

print(flooredSquareRoot(10))
# should print 3
print(flooredSquareRoot(25))
# should print 5
print(flooredSquareRoot(1000001))
# should print 1000
```
Write a function `flooredSquareRootFast(n)` which works just as above but is fast even for very large numbers (see below). Use binary search.
```python
def flooredSquareRootFast(n):
    # write your code here
    pass

t = 10**50 + 1
print(flooredSquareRootFast(t))
# should print 10000000000000000000000000 (that's 25 zeroes)
# note: if you didn't use the Fast version, this would take a really long time!
```
Implement a function `calcNthSmallest(n, intervals)` which takes as input a nonnegative `int` `n` and a list of intervals $[[a_1, b_1], \ldots, [a_m, b_m]]$. It should calculate the $n$th smallest number (0-indexed) in the union of all the intervals taken with repetition. For example, if the intervals were $[1, 5], [2, 4], [7, 9]$, their union with repetition would be $\{1, 2, 2, 3, 3, 4, 4, 5, 7, 8, 9\}$. Note that 2, 3 and 4 each appear twice, since they are in both of the intervals $[1, 5]$ and $[2, 4]$. For this list of intervals, the 0th smallest number would be 1, and the 3rd and 4th smallest would both be 3.
Your implementation should run quickly even when the $a_i, b_i$ can be very large (like, one trillion) and there are several intervals (use binary search). First try a version without binary search that works fast when the $a_i$ and $b_i$ are small.
You may find it useful to implement the helper functions below.
```python
# compute the index of the first time x appears in the union of intervals
def firstTime(x, intervals):
    pass

# compute the index of the last time x appears in the union of intervals
def lastTime(x, intervals):
    pass

def calcNthSmallest(n, intervals):
    # write your code here
    pass
```