cs1001.py , Tel Aviv University, Spring 2019

Recitation 3

We talked about binary numbers (with some theory) and base conversions and implemented an algorithm for base conversion (from decimal to 2<=b<=36). We examined Python's memory model and analyzed the efficiency of constructing a list using + or += operators.

Takeaways:

  1. Make sure you understand binary numbers and base conversions (including the algorithms for conversion to and from a base b to decimal). It is a very useful tool in computer science.
  2. Elements of a list can be changed from inside a function, if this list is given as a parameter. Note that one should use dedicated list functions or the [] operator for mutating the list.
  3. Use Python tutor in order to understand what's going on in terms of memory. It can be very helpful.
  4. Try to analyze the number of operations your function does to see how will its runtime scale as a function of the input (we will elabore on this soon).

Python tutor guidelines:

Before you click "Visualize Execution" button, you may want to use the following settings (can be adjusted via the drop boxes next to the textbox):

  1. Python 3.6
  2. Show exited frames (Python)
  3. Render all objects on the heap
  4. Draw pointers as arrows [default]

The binary system and base conversions

We use decimal numbers, that is, numbers in base 10. These numbers are represented using 10 digits (0-9). Binary numbers (in base 2) use two possible digits: 0-1. In general, a number in base $b$ will be represented using $b$ possible digits.

Code for printing several outputs in one cell (not part of the recitation):

In [2]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

Base conversions

Using Python functions:

In [3]:
bin(10)
type(bin(10))
hex(161)
int("345")
int("1101", 2)
int("a1", 16)
int("a1", 2)
Out[3]:
'0b1010'
Out[3]:
str
Out[3]:
'0xa1'
Out[3]:
345
Out[3]:
13
Out[3]:
161
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-45b2071e6420> in <module>
      5 int("1101", 2)
      6 int("a1", 16)
----> 7 int("a1", 2)

ValueError: invalid literal for int() with base 2: 'a1'

Converting from binary to decimal

Let $x_{base 2} = a_{n-1} ... a_1 a_0$ be a binary number in base 2 with $n$ digits. The following formula returns its decimal representation: $$x_{base 10} = \sum_{0 \le k \le n-1} a_k \cdot 2^k$$

Converting from decimal to binary

Converting from decimal to binary is done by integer division and modulo operations. Here is an implementation of the algorithm for conversion from decimal to a general base $b$.

Our version for convert_base:

In [6]:
def convert_base(n,b):
    '''convert_base(int, int)->string
        Returns the textual representation 
        of n(decimal) in base 2<=b<=10
    '''
    result = ""
    while n != 0:
        digit = n % b
        n //= b
        result = str(digit) + result
        
    return result
    
    
    
    
In [7]:
convert_base(10,2)
Out[7]:
'1010'
In [8]:
convert_base(0, 2)
Out[8]:
''

convert_base above mishandles n = 0

Improved version of convert_base:

In [26]:
def convert_base(n,b):
    '''convert_base(int, int)->string
        Returns the textual representation 
        of n(decimal) in base 2<=b<=10
    '''
    if n == 0:
        return "0"
    
    result = ""
    while n != 0:
        digit = n % b
        n //= b
        result = str(digit) + result
        
    return result
    
    
In [10]:
convert_base(0,2)
convert_base(12,3)
Out[10]:
'0'
Out[10]:
'110'
In [11]:
convert_base(161,16)
Out[11]:
'101'

The improved version above fails for 10<b (why ?) This issue is fixed in the final version below (improvement #2) (what happens when b = 1?)

In [ ]:
alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
In [ ]:
def convert_base(n,b):
    '''convert_base(int, int)->string
        Returns the textual representation 
        of n(decimal) in base 2<=b<=10
    '''
    assert 2 <= b <= 36
    
    if n == 0:
        return "0"
    
    result = ""
    alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
    while n != 0:
        digit = n % b
        n //= b
        result = alphabet[digit] + result
        
    return result
    
    
In [13]:
convert_base(161, 16)
convert_base(10, 2)
convert_base(10, 80)
Out[13]:
'a1'
Out[13]:
'1010'
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-13-d75904b47e95> in <module>()
      1 convert_base(161, 16)
      2 convert_base(10, 2)
----> 3 convert_base(10, 80)

<ipython-input-12-c7d9cdac293e> in convert_base(n, b)
      4         of n(decimal) in base 2<=b<=10
      5     '''
----> 6     assert 2 <= b <= 36
      7 
      8     if n == 0:

AssertionError: 

Memory model

By understanding Python's memory model we

  1. can distinguish between objects that can be mutated and those that cannot. We can also understand how to mutate objects.
  2. can write a more efficient code.

Motivation examples:

In [14]:
def change_num(num):
    num = 999

    
x = 30
change_num(x)
x

    
    
Out[14]:
30
In [15]:
def add_to_list(lst):
    lst.append(999)
    
mylst = [1,2,3]
add_to_list(mylst)
mylst
Out[15]:
[1, 2, 3, 999]
In [16]:
def change_lst(lst):
    lst = []

mylst = [1,2,3]
change_lst(mylst)
mylst
Out[16]:
[1, 2, 3]

Programs examined with Python tutor:

In [ ]:
#1 basics  #####################################################
a = 1000                #1
b = "hello"             #2
a += len(b)             #3
c = 2*b[:2]             #4
if b!=c:                #5
    c = b               #6
del c                   #7
In [ ]:
#2 lists  #####################################################
a = 1000                #1
d = [a,2]               #2
d[1] = -1               #3
a = 1003                #4
for x in d:             #5a
    x = 7               #5b
In [ ]:
#3 funcs  #####################################################
a = 1000                #1
b = "hello"             #2
def is_palindrom(a):    #3a
    b = a[::-1]         #3b
    return a==b         #3c
is_pal = is_palindrom   #4
x = is_pal(b)           #5
In [ ]:
#4 lists+funcs  #####################################################
a = 1000                #1
d = [a,2]               #2
def f(a,d):             #3a
    a = 2000            #3b
    d[0] = a            #3c
    d = []              #3d
    return d            #3e
x = f(a,d)              #4

Extend(), append(), sort(), and sorted():

In [28]:
#5 – lists operators and methods #####################################################
lst = [3,2,1]
id(lst)
lst
lst = lst+[0] 
id(lst)
lst
lst.append(4)
id(lst)
lst
lst.insert(1,5)
id(lst)
lst
lst.extend([6,7])
id(lst)
lst
lst += [8] 
id(lst)
lst
Out[28]:
1787938575240
Out[28]:
[3, 2, 1]
Out[28]:
1787938858056
Out[28]:
[3, 2, 1, 0]
Out[28]:
1787938858056
Out[28]:
[3, 2, 1, 0, 4]
Out[28]:
1787938858056
Out[28]:
[3, 5, 2, 1, 0, 4]
Out[28]:
1787938858056
Out[28]:
[3, 5, 2, 1, 0, 4, 6, 7]
Out[28]:
1787938858056
Out[28]:
[3, 5, 2, 1, 0, 4, 6, 7, 8]
In [29]:
lst
id(lst)
#sorted returns a sorted copy of the list, and does not change the original one
lst2 = sorted(lst)
id(lst)
lst
#list.sort() sorts in place, and does not return anything
lst.sort()
id(lst)
lst
Out[29]:
[3, 5, 2, 1, 0, 4, 6, 7, 8]
Out[29]:
1787938858056
Out[29]:
1787938858056
Out[29]:
[3, 5, 2, 1, 0, 4, 6, 7, 8]
Out[29]:
1787938858056
Out[29]:
[0, 1, 2, 3, 4, 5, 6, 7, 8]

comparison between + and +=

We use + or += operators in order to construct the same list.

Using operator +: less efficient, since a new list is constructed every iteration the number of integers written to memory equals 1+2+ ... + n = (n+1)*n/2 (= asumptotically speaking, grows quadratically with n)

Using operator +=: more efficient, since the same list is extended again and again the number of integers written to memory equals n (= asumptotically speaking, grows linearly with n)

memory and running times

In [5]:
import time
In [12]:
n= 20000

lst = []
t0 = time.perf_counter()
for i in range(n):
	lst = lst + [i]
t1 = time.perf_counter()
print("Extending a list n=", n, "times, using the + operator took",t1-t0, "seconds")



lst = []
t0 = time.perf_counter()
for i in range(n):
	lst += [i]
t1 = time.perf_counter()
print("Extending a list n=", n, "times, using the += operator (extend()) took",t1-t0, "seconds")
Extending a list n= 20000 times, using the + operator took 0.5090059999999994 seconds
Extending a list n= 20000 times, using the += operator (extend()) took 0.0018683000000692118 seconds

We double the size of the input.

In [13]:
n= 40000

lst = []
t0 = time.perf_counter()
for i in range(n):
	lst = lst + [i]
t1 = time.perf_counter()
print("Extending a list n=", n, "times, using the + operator took",t1-t0, "seconds")



lst = []
t0 = time.perf_counter()
for i in range(n):
	lst += [i]
t1 = time.perf_counter()
print("Extending a list n=", n, "times, using the += operator (extend()) took",t1-t0, "seconds")
Extending a list n= 40000 times, using the + operator took 2.10560119999991 seconds
Extending a list n= 40000 times, using the += operator (extend()) took 0.004132899999945039 seconds

In accordance with the theory, the running time of the inefficient code increases by approximately 2**2 = 4 as n increases by 2, while the running time of the efficient code increases by approximately 2. Apparently n is large enough for the constants not to play a serious role, so that the measurements agree with the asymptotic analysis.