# More Sorting

In [4]:
sort([3,1,4,1,5,9,2])

Out[4]:
[1, 1, 2, 3, 4, 5, 9]
In [24]:
# running time of our sort() function on inputs from size 0 to 3000

[Plot: running time of sort() against input size. Fitted curve (steps): $0.4n^2$, at about 0.161 microseconds per step.]


What is the shape of this curve? Have you seen it before?

It turns out that for sorting a list of $n$ elements, our algorithm will take about $10^{-7}n^2$ seconds.

Facebook has $1,000,000,000 = 10^9$ users. If they wanted to sort the list of their users using this algorithm, it would take them about $10^{-7}(10^9)^2 = 10^{-7}\cdot 10^{18}= 10^{11}$ seconds.

In [7]:
print (10**11 / (60*60*24*365)), " years!"

3170  years!


The problem is that when $n$ becomes big, $n^2$ becomes much bigger.

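If we had an algorithm that runs in $10^{-7}n$ seconds, then we could sort Facebook's users in about $10^{-7}\cdot 10^9 = 100$ seconds. Even an algorithm that runs in time $10^{-2}n$, with a constant $100{,}000$ times worse, would take about $10^7$ seconds, which is under four months rather than thousands of years.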

We see that the effect of $n$ vs. $n^2$ is much more important than the effect of the constant.
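To make the comparison concrete, here is a small sketch that plugs $n = 10^9$ into these running-time estimates. The constants are the ones quoted in the text; the dictionary `estimates` and the helper `seconds_to_years` are illustrative names, not part of the lecture code.

```python
# Estimated running times for n = 10**9 items under different growth rates.
# The constants come from the text; the names below are made up for this sketch.
n = 10**9

estimates = {
    "1e-7 * n^2": 1e-7 * n**2,   # the sort() measured above
    "1e-7 * n": 1e-7 * n,        # linear time, same constant
    "1e-2 * n": 1e-2 * n,        # linear time, constant 100,000 times worse
}

def seconds_to_years(seconds):
    """Convert seconds to years, using 365-day years."""
    return seconds / (60 * 60 * 24 * 365.0)

for name in sorted(estimates):
    print("{}: {:.3g} seconds = {:.3g} years".format(
        name, estimates[name], seconds_to_years(estimates[name])))
```

Even the linear estimate with the much worse constant stays far below the quadratic one.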

Where does the $n^2$ come from?

In [9]:
import sys

def find_min_index(L):
    current_index = 0
    current_min = L[0]
    for j in range(1,len(L)):
        sys.stdout.write('*')  # print one star per comparison
        if current_min > L[j]:
            current_min = L[j]
            current_index = j
    return current_index

In [10]:
def sort(L):
    if len(L)<=1:
        return L # a list with at most one element is always sorted
    min_idx = find_min_index(L)
    print("")  # end the row of stars printed by find_min_index
    L[0], L[min_idx] = L[min_idx], L[0]
    # switch minimum element to first location

    return [L[0]] + sort(L[1:len(L)])

In [11]:
sort([10,9,8,7,6,5,4,3,2,1])

*********
********
*******
******
*****
****
***
**
*

Out[11]:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

The number of steps we take to sort a list of length $n$ is about $n+(n-1)+(n-2)+\cdots+2+1 = \tfrac{n(n+1)}{2}=0.5n^2+0.5n$
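The count can be checked directly. The sketch below is an iterative version of the same selection sort (a rewrite for counting purposes, not the recursive code above), and `count_steps` is a hypothetical helper name. It counts one step per comparison, which is one star in the trace above. The exact count for this loop is $\tfrac{n(n-1)}{2}$, which agrees with the estimate above up to lower-order terms.

```python
def count_steps(n):
    """Count the comparisons selection sort makes on a list of length n."""
    L = list(range(n, 0, -1))   # reversed list, as in the example above
    steps = 0
    for i in range(len(L)):
        min_idx = i
        for j in range(i + 1, len(L)):
            steps += 1          # one comparison = one '*' in the trace
            if L[j] < L[min_idx]:
                min_idx = j
        # move the minimum of L[i:] to position i
        L[i], L[min_idx] = L[min_idx], L[i]
    return steps

for n in [10, 100, 1000]:
    print("n = {}: {} steps, n^2/2 = {}".format(n, count_steps(n), n * n // 2))
```

For $n = 10$ this gives 45 steps, the same as the number of stars printed above.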

Can we do better?

It turns out that the answer is yes.

There is a different sorting algorithm for which the running time looks like:

In [21]:
# running time of alternative sorting algorithm for inputs from size 0 to 200,000

[Plot: running time of the alternative sorting algorithm. Fitted curve (steps): $0.2n\log n$.]


That is, sorting $n$ elements takes about $5\cdot 10^{-7}n\log_2 n$ seconds. $\log_2 n$ is much smaller than $n$ ($\log_2 (10,000,000) < 30$) and so this is much better.

In [30]:
# compare n*log(n) vs n^2
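As a minimal sketch of this comparison, the table below prints $n\log_2 n$ next to $n^2$ for a few input sizes. The helper name `n_log_n` and the chosen sizes are assumptions made for illustration.

```python
import math

def n_log_n(n):
    """Return n * log2(n)."""
    return n * math.log(n, 2)

print("{:>12} {:>16} {:>22}".format("n", "n*log2(n)", "n^2"))
for n in [10, 1000, 10**6, 10**9]:
    print("{:>12} {:>16.3g} {:>22.3g}".format(n, n_log_n(n), float(n) ** 2))
```

The gap between the two columns widens quickly as $n$ grows.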


In particular, this algorithm will take about $5\cdot 10^{-7}\cdot 10^9 \cdot 30 = 15{,}000$ seconds, or a little over 4 hours, to sort the Facebook user list.

This is more than $6$ million times faster than the $10^{11}$ seconds the slower sorting algorithm would take!

Even if Facebook has a computer that is a million times faster than my laptop, it would still take them more time than for me to sort this list.
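The arithmetic behind this claim can be sketched as follows, using the estimates $10^{-7}n^2$ and $5\cdot 10^{-7}n\log_2 n$ from this lecture with $\log_2 n \approx 30$. The variable names are illustrative.

```python
n = 10**9

# n^2 sort on the laptop
slow_algorithm_seconds = 1e-7 * n**2
# the same n^2 sort on a machine assumed to be a million times faster
fast_machine_seconds = slow_algorithm_seconds / 1e6
# n log n sort on the laptop, taking log2(n) to be about 30
fast_algorithm_seconds = 5e-7 * n * 30

print("n^2 sort, fast machine: {:.0f} seconds".format(fast_machine_seconds))
print("n log n sort, laptop:   {:.0f} seconds".format(fast_algorithm_seconds))
```

The laptop running the better algorithm still finishes first.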

The cleverness of the algorithm is more important than the speed of the machine! Next time we will learn a more clever sorting algorithm.

In [16]:
# comparison of running time of selection sort and merge sort

[Plot: running time of selection sort and merge sort.]


### Sorting with different keys

Recall the bonus homework exercise of sorting an array by last name:

In [12]:
names = ['abinet mulugeta', 'urgie  huseien', 'yonatan wosenyeleh', 'amanuel asfaw', 'tibebu solomon', 'hailegbrel wudneh', 'gatluk chuol', 'elsabet buzuneh', 'eden ketema', 'maeden seid', 'mikyas legese', 'meskerem birhanu demeke', 'kumneger worku', 'shambel abate', 'hailmeskel shimeles', 'tsega hailu', 'dawit fikeru', 'asmare habitamo', 'zelalem ades', 'betelehem eshetu', 'yosef tadiwos', 'haymanot gidena', 'henock mersha', 'binyam kidane', 'mohammed nur', 'bethelehem walelegn', 'lewi mekonnen', 'wondimu yohanes', 'hodo mukitar', 'yonas adugna', 'tigabu gebrecherkos', 'nardos gesese', 'mohammed nur', 'abdurezak temam', 'shambel elena', 'adem mohamed', 'zakira tebarek', 'lidya gegnaw', 'knesa desta', 'ibrahim ahmed', 'betlehem desalegn', 'adonay geremew', 'kalkidan muluneh', 'haile gebreselasie', 'eden tekilu tilahun', 'ayantu aleneh', 'yosef nosha', 'mebrihity girmay', 'finet hailu', 'elisa feloh', 'bezawit gebremariam', 'nigusu terefe', 'amina bedrie', 'kiflom leuel', 'hana tariku', 'nejat beshir', 'mesfen tamiru', 'shafi abdi', 'kelbesa ambesa', 'abrham tuna', 'daniel hagos', 'yordanos jemberu', 'aman musa', 'habene abdi', 'kawuser jemal', 'tariku erina', 'mesigina gebretsadik', 'yetnayet birhanu', 'semer abrar', 'nur ahmed', 'eman hasen', 'natol gizaw', 'banchayehu asrat', 'hilina thewodros', 'hasen ali', 'mebrihatu lebelo', 'yosef enawgaw', 'nesera teyib', 'mekdes muluneh', 'surafel sewutu', 'mentesenot tefera']

In [13]:
'zelalem ades' < 'betelehem eshetu'

Out[13]:
False
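The result is False because Python compares strings character by character from the left, so the first names decide the comparison. A short sketch with a few extra pairs, chosen here purely for illustration:

```python
# Each pair (x, y) is tested with x < y; Python compares strings
# character by character, starting from the first character.
pairs = [
    ('zelalem ades', 'betelehem eshetu'),  # 'z' comes after 'b'
    ('abc', 'abd'),                        # first two characters tie, 'c' < 'd'
    ('ab', 'abc'),                         # a prefix sorts before the longer string
]
results = [x < y for (x, y) in pairs]

for (x, y), r in zip(pairs, results):
    print("{!r} < {!r}: {}".format(x, y, r))
```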

### Our approach:

We define a function last_name such that:

In [15]:
last_name('zelalem ades')

Out[15]:
'ades'
In [16]:
last_name('betelehem eshetu')

Out[16]:
'eshetu'
In [17]:
last_name('zelalem ades') < last_name('betelehem eshetu')

Out[17]:
True

Recall that the code of sort was the following:

Now all that is left is to write the function last_name, which will be an exercise for you.
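As a sketch of how a key function can plug into the selection sort from earlier, the version below takes the key as a parameter. The names `find_min_index_by_key` and `sort_by_key` are hypothetical, and passing the key as an argument is an assumption about the approach, not the lecture's own code. The demonstration uses `len` as the key so that `last_name` remains an exercise.

```python
def find_min_index_by_key(L, key):
    """Return the index of the element of L whose key is smallest."""
    current_index = 0
    for j in range(1, len(L)):
        if key(L[j]) < key(L[current_index]):
            current_index = j
    return current_index

def sort_by_key(L, key):
    """Selection sort that compares key(x) instead of x itself."""
    if len(L) <= 1:
        return L
    min_idx = find_min_index_by_key(L, key)
    # switch the element with the minimum key to the first location
    L[0], L[min_idx] = L[min_idx], L[0]
    return [L[0]] + sort_by_key(L[1:], key)

print(sort_by_key(['ccc', 'a', 'bb'], len))
```

Once `last_name` is written, calling `sort_by_key(names, last_name)` would sort the list by last name in the same way.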

# Sorting in Python

Python provides a built-in function sorted:

In [22]:
sorted(names)

Out[22]:
['abdurezak temam',
'abinet mulugeta',
'abrham tuna',
'aman musa',
'amanuel asfaw',
'amina bedrie',
'asmare habitamo',
'ayantu aleneh',
'banchayehu asrat',
'betelehem eshetu',
'bethelehem walelegn',
'betlehem desalegn',
'bezawit gebremariam',
'binyam kidane',
'daniel hagos',
'dawit fikeru',
'eden ketema',
'eden tekilu tilahun',
'elisa feloh',
'elsabet buzuneh',
'eman hasen',
'finet hailu',
'gatluk chuol',
'habene abdi',
'haile gebreselasie',
'hailegbrel wudneh',
'hailmeskel shimeles',
'hana tariku',
'hasen ali',
'haymanot gidena',
'henock mersha',
'hilina thewodros',
'hodo mukitar',
'ibrahim ahmed',
'kalkidan muluneh',
'kawuser jemal',
'kelbesa ambesa',
'kiflom leuel',
'knesa desta',
'kumneger worku',
'lewi mekonnen',
'lidya gegnaw',
'maeden seid',
'mebrihatu lebelo',
'mebrihity girmay',
'mekdes muluneh',
'mentesenot tefera',
'mesfen tamiru',
'meskerem birhanu demeke',
'mikyas legese',
'mohammed nur',
'mohammed nur',
'nardos gesese',
'natol gizaw',
'nejat beshir',
'nesera teyib',
'nigusu terefe',
'nur ahmed',
'semer abrar',
'shafi abdi',
'shambel abate',
'shambel elena',
'surafel sewutu',
'tariku erina',
'tibebu solomon',
'tigabu gebrecherkos',
'tsega hailu',
'urgie  huseien',
'wondimu yohanes',
'yetnayet birhanu',
'yonatan wosenyeleh',
'yordanos jemberu',
'yosef enawgaw',
'yosef nosha',
'zakira tebarek',
'zelalem ades']

The function can even sort in reverse:

In [23]:
sorted(names, reverse=True)

Out[23]:
['zelalem ades',
'zakira tebarek',
'yosef nosha',
'yosef enawgaw',
'yordanos jemberu',
'yonatan wosenyeleh',
'yetnayet birhanu',
'wondimu yohanes',
'urgie  huseien',
'tsega hailu',
'tigabu gebrecherkos',
'tibebu solomon',
'tariku erina',
'surafel sewutu',
'shambel elena',
'shambel abate',
'shafi abdi',
'semer abrar',
'nur ahmed',
'nigusu terefe',
'nesera teyib',
'nejat beshir',
'natol gizaw',
'nardos gesese',
'mohammed nur',
'mohammed nur',
'mikyas legese',
'meskerem birhanu demeke',
'mesfen tamiru',
'mentesenot tefera',
'mekdes muluneh',
'mebrihity girmay',
'mebrihatu lebelo',
'maeden seid',
'lidya gegnaw',
'lewi mekonnen',
'kumneger worku',
'knesa desta',
'kiflom leuel',
'kelbesa ambesa',
'kawuser jemal',
'kalkidan muluneh',
'ibrahim ahmed',
'hodo mukitar',
'hilina thewodros',
'henock mersha',
'haymanot gidena',
'hasen ali',
'hana tariku',
'hailmeskel shimeles',
'hailegbrel wudneh',
'haile gebreselasie',
'habene abdi',
'gatluk chuol',
'finet hailu',
'eman hasen',
'elsabet buzuneh',
'elisa feloh',
'eden tekilu tilahun',
'eden ketema',
'dawit fikeru',
'daniel hagos',
'binyam kidane',
'bezawit gebremariam',
'betlehem desalegn',
'bethelehem walelegn',
'betelehem eshetu',
'banchayehu asrat',
'ayantu aleneh',
'asmare habitamo',
'amina bedrie',
'amanuel asfaw',
'aman musa',
'abrham tuna',
'abinet mulugeta',
'abdurezak temam']

It can also take a key:

In [24]:
sorted(names,key=last_name)

Out[24]:
['shambel abate',
'shafi abdi',
'habene abdi',
'semer abrar',
'ibrahim ahmed',
'nur ahmed',
'ayantu aleneh',
'hasen ali',
'kelbesa ambesa',
'amanuel asfaw',
'banchayehu asrat',
'amina bedrie',
'nejat beshir',
'meskerem birhanu demeke',
'yetnayet birhanu',
'elsabet buzuneh',
'gatluk chuol',
'betlehem desalegn',
'knesa desta',
'shambel elena',
'yosef enawgaw',
'tariku erina',
'betelehem eshetu',
'elisa feloh',
'dawit fikeru',
'tigabu gebrecherkos',
'bezawit gebremariam',
'haile gebreselasie',
'lidya gegnaw',
'nardos gesese',
'haymanot gidena',
'mebrihity girmay',
'natol gizaw',
'asmare habitamo',
'daniel hagos',
'tsega hailu',
'finet hailu',
'eman hasen',
'urgie  huseien',
'kawuser jemal',
'yordanos jemberu',
'eden ketema',
'binyam kidane',
'mebrihatu lebelo',
'mikyas legese',
'kiflom leuel',
'lewi mekonnen',
'henock mersha',
'hodo mukitar',
'abinet mulugeta',
'kalkidan muluneh',
'mekdes muluneh',
'aman musa',
'yosef nosha',
'mohammed nur',
'mohammed nur',
'maeden seid',
'surafel sewutu',
'hailmeskel shimeles',
'tibebu solomon',
'mesfen tamiru',
'hana tariku',
'zakira tebarek',
'mentesenot tefera',
'eden tekilu tilahun',
'abdurezak temam',
'nigusu terefe',
'nesera teyib',
'hilina thewodros',
'abrham tuna',
'bethelehem walelegn',
'kumneger worku',
'yonatan wosenyeleh',
'hailegbrel wudneh',
'wondimu yohanes']
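A smaller example makes the effect of the `key` and `reverse` arguments easier to see. The list `words` is made up for illustration, and `len` stands in for any key function.

```python
words = ['pear', 'fig', 'banana', 'kiwi']

alphabetical = sorted(words)
reverse_alphabetical = sorted(words, reverse=True)
shortest_first = sorted(words, key=len)
longest_first = sorted(words, key=len, reverse=True)

print(alphabetical)
print(reverse_alphabetical)
print(shortest_first)
print(longest_first)
print(words)  # sorted returns a new list; the original is unchanged
```

Words with the same key keep their original relative order, so 'pear' stays ahead of 'kiwi' when sorting by length.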

It is also quite fast:

In [47]:
# merge sort vs Python built-in sorted algorithm

[Plot: running time of merge sort and Python's built-in sorted.]
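A rough way to time the built-in function yourself is sketched below. It uses an iterative selection sort as a stand-in for a hand-written algorithm (an assumption made for this sketch, since merge sort has not been written yet), and the input size of 2000 is arbitrary. Exact timings depend on the machine.

```python
import random
import timeit

def selection_sort(L):
    """Iterative selection sort; returns a new sorted list."""
    L = list(L)
    for i in range(len(L)):
        min_idx = i
        for j in range(i + 1, len(L)):
            if L[j] < L[min_idx]:
                min_idx = j
        L[i], L[min_idx] = L[min_idx], L[i]
    return L

data = [random.randint(0, 10**6) for _ in range(2000)]

start = timeit.default_timer()
ours = selection_sort(data)
middle = timeit.default_timer()
builtin = sorted(data)
end = timeit.default_timer()

print("selection sort:  {:.4f} seconds".format(middle - start))
print("built-in sorted: {:.4f} seconds".format(end - middle))
```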


# Why teach sorting?

The built-in sorted is even faster than the best algorithm we could write.

So why force you to learn and code sorting algorithms?

Answer 1: Computer Science and programming are more than just Python.

Answer 2: I am not trying to teach you to sort numbers. I am trying to teach you how to think.

# Lab work

## Exercise 1

Write the function sort4(L) that takes a list of 4 elements and sorts it. The last line of the function must be return [L[0]]+sort3(L[1:4])

In [ ]:
def sort4(L):
    #
    #
    return [L[0]]+ sort3(L[1:4])


Here are some example calls to try:

In [ ]:
sort4([7,8,1,2])

In [ ]:
sort4([1,9,2,3])

In [ ]:
sort4(['Mickey','Donald','Goofy','Minney'])


## Exercise 2

Suppose that you are given the function sort9 that sorts a list of 9 elements. Write a function sort10(L) that sorts a list L of 10 elements. The last line of the function must be return [L[0]]+sort9(L[1:10])

In [ ]:
# you can use this function as a "black box" but there's no need to read it or understand its code
def sort9(L):
    return sorted(L[0:9])

In [ ]:
def sort10(L):
    #
    #
    return [L[0]] + sort9(L[1:10])

In [ ]:
sort10([0, 1, 6, 10, 9, 3, 3, 9, 9, 5])

In [ ]:
sort10([15, 16, 19, 13, 5, 1, 7, 19, 12, 4])

In [ ]:
sort10([5, 9, 13, 8, 15, 17, 20, 9, 10, 8])


The array below contains the names of all the students who were registered for the course. Compute an array that contains these students in alphabetical order by first name. Use the function you wrote to sort it by first name.

In [ ]:
L = ['abinet mulugeta', 'urgie  huseien', 'yonatan wosenyeleh', 'amanuel asfaw', 'tibebu solomon', 'hailegbrel wudneh', 'gatluk chuol', 'elsabet buzuneh', 'eden ketema', 'maeden seid', 'mikyas legese', 'meskerem birhanu demeke', 'kumneger worku', 'shambel abate', 'hailmeskel shimeles', 'tsega hailu', 'dawit fikeru', 'asmare habitamo', 'zelalem ades', 'betelehem eshetu', 'yosef tadiwos', 'haymanot gidena', 'henock mersha', 'binyam kidane', 'mohammed nur', 'bethelehem walelegn', 'lewi mekonnen', 'wondimu yohanes', 'hodo mukitar', 'yonas adugna', 'tigabu gebrecherkos', 'nardos gesese', 'mohammed nur', 'abdurezak temam', 'shambel elena', 'adem mohamed', 'zakira tebarek', 'lidya gegnaw', 'knesa desta', 'ibrahim ahmed', 'betlehem desalegn', 'adonay geremew', 'kalkidan muluneh', 'haile gebreselasie', 'eden tekilu tilahun', 'ayantu aleneh', 'yosef nosha', 'mebrihity girmay', 'finet hailu', 'elisa feloh', 'bezawit gebremariam', 'nigusu terefe', 'amina bedrie', 'kiflom leuel', 'hana tariku', 'nejat beshir', 'mesfen tamiru', 'shafi abdi', 'kelbesa ambesa', 'abrham tuna', 'daniel hagos', 'yordanos jemberu', 'aman musa', 'habene abdi', 'kawuser jemal', 'tariku erina', 'mesigina gebretsadik', 'yetnayet birhanu', 'semer abrar', 'nur ahmed', 'eman hasen', 'natol gizaw', 'banchayehu asrat', 'hilina thewodros', 'hasen ali', 'mebrihatu lebelo', 'yosef enawgaw', 'nesera teyib', 'mekdes muluneh', 'surafel sewutu', 'mentesenot tefera']


## Exercise 3

Sort the array above in reverse alphabetical order by first name (so that L[0] will be the name that is last in alphabetical order and L[80] will be the name that is first).

In [ ]:
# your code goes here


## Exercise 4

Sort the array in alphabetical order by last name.

In [ ]:
#your code goes here


## Exercise 5

Write a function last_name that on input a string $s$, will find the first space character ' ' in $s$, and will return the rest of $s$. You don't have to worry about strings that don't contain spaces or contain more than one space.

In [34]:
last_name('boaz barak')

Out[34]:
'barak'