cs1001.py , Tel Aviv University, Winter 2020

Recitation 3

We talked about binary numbers (with some theory) and base conversions and implemented an algorithm for base conversion (from decimal to 2<=b<=36). We examined Python's memory model and analyzed the efficiency of constructing a list using + or += operators.

Takeaways:

  1. Make sure you understand binary numbers and base conversions (including the algorithms for conversion to and from a base b to decimal). It is a very useful tool in computer science.
  2. Elements of a list can be changed from inside a function, if this list is given as a parameter. Note that one should use dedicated list functions or the [] operator for mutating the list.
  3. Use Python tutor in order to understand what's going on in terms of memory. It can be very helpful.
  4. Try to analyze the number of operations your function does to see how will its runtime scale as a function of the input (we will elabore on this soon).

Python tutor guidelines:

Before you click "Visualize Execution" button, you may want to use the following settings (can be adjusted via the drop boxes next to the textbox):

  1. Python 3.6
  2. Show exited frames (Python)
  3. Render all objects on the heap
  4. Draw pointers as arrows [default]

The binary system and base conversions

When we count natural numbers, we use the decimal system. That is, we represent numbers in base 10, using 10 digits (0-9). Binary representation (base 2) uses two possible digits: 0-1. In general, a number in base $b$ will be represented using $b$ possible digits.

Code for printing several outputs in one cell (not part of the recitation):

In [ ]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

Base conversions

Python has some built-in conversion methods from-and-to decimal, binary and other bases:

In [ ]:
bin(10)
type(bin(10))
In [ ]:
hex(161)
int("345")
In [ ]:
int("1101", 2)
int("a1", 16)
int("a1", 2)

Converting from binary to decimal

Let $x_{[2]} = a_{n-1} ... a_1 a_0$ be the binary representation of a number with $n$ binary digits. The following formula returns its decimal representation: $$x_{[10]} = \sum_{0 \le k \le n-1} a_k \cdot 2^k$$

Similarly, if $x_{[b]} = a_{n-1} ... a_0$ is represented in an arbitrary base with $b$ digits then: $$x_{[10]} = \sum_{0 \le k \le n-1} a_k \cdot b^k$$

Converting from decimal to binary

Converting from decimal to binary is done by integer division and modulo operations. Here is an implementation of the algorithm for conversion from decimal to a general base $b$.

Our version for convert_base:

In [ ]:
def convert_base(n,b):
    '''convert_base(int, int)->string
        Returns the textual representation 
        of n(decimal) in base 2<=b<=10
    '''
    result = ""
    while n != 0:
        digit = n % b
        n //= b
        result = str(digit) + result
        
    return result
    
    
    
    
In [ ]:
convert_base(10,2)
In [ ]:
convert_base(0, 2)

Is the code above correct? What happens in the case where $n = 0$?

Improved version of convert_base:

In [ ]:
def convert_base(n,b):
    '''convert_base(int, int)->string
        Returns the textual representation 
        of n(decimal) in base 2<=b<=10
    '''
    if n == 0:
        return "0"
    
    result = ""
    while n != 0:
        digit = n % b
        n //= b
        result = str(digit) + result
        
    return result
    
    
In [ ]:
convert_base(0,2)
convert_base(12,3)
In [ ]:
convert_base(161,16)

The improved version above fails for $10<b$ (why ?) This issue is fixed in the final version below (improvement #2) (what happens when $b = 1$?)

In [ ]:
alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
In [ ]:
def convert_base(n,b):
    '''convert_base(int, int)->string
        Returns the textual representation 
        of n(decimal) in base 2<=b<=10
    '''
    assert 2 <= b <= 36
    
    if n == 0:
        return "0"
    
    result = ""
    alphabet = "0123456789abcdefghijklmnopqrstuvwxyz"
    while n != 0:
        ## Complete
        
    return result
    
    
In [ ]:
convert_base(161, 16)
convert_base(10, 2)
convert_base(10, 80)

Memory model

Understanding Python's memory model means understanding the way Python stores our variables in the memory and the way Python manipulates these variables when we perform operations on them.

Memory management depends on various properties, including:

  • The precide method we use to change our variables
  • Whether our variable is of a mutable type or not
  • The specific location of our operation within the scope of the code

Understanding this model will allow us to write better and more efficient code.

Motivating examples:

In [ ]:
def change_num(num):
    num = 999

    
x = 30
change_num(x)
x

    
    
In [ ]:
def add_to_list(lst):
    lst.append(999)
    
mylst = [1,2,3]
add_to_list(mylst)
mylst
In [ ]:
def change_lst(lst):
    lst = []

mylst = [1,2,3]
change_lst(mylst)
mylst

Programs examined with Python tutor:

In [ ]:
#1 basics  #####################################################
a = 1000                #1
b = "hello"             #2
a += len(b)             #3
c = 2*b[:2]             #4
if b!=c:                #5
    c = b               #6
del c                   #7
In [ ]:
#2 lists  #####################################################
a = 1000                #1
d = [a,2]               #2
d[1] = -1               #3
a = 1003                #4
for x in d:             #5a
    x = 7               #5b
In [ ]:
#3 funcs  #####################################################
a = 1000                #1
b = "hello"             #2
def is_palindrom(a):    #3a
    b = a[::-1]         #3b
    return a==b         #3c
is_pal = is_palindrom   #4
x = is_pal(b)           #5
In [ ]:
#4 lists+funcs  #####################################################
a = 1000                #1
d = [a,2]               #2
def f(a,d):             #3a
    a = 2000            #3b
    d[0] = a            #3c
    d = []              #3d
    return d            #3e
x = f(a,d)              #4

Extend(), append(), sort(), and sorted():

In [ ]:
#5 – lists operators and methods #####################################################
lst = [3,2,1]
id(lst)
lst
lst = lst+[0] 
id(lst)
lst
lst.append(4)
id(lst)
lst
lst.insert(1,5)
id(lst)
lst
lst.extend([6,7])
id(lst)
lst
lst += [8] 
id(lst)
lst
In [ ]:
lst
id(lst)
#sorted returns a sorted copy of the list, and does not change the original one
lst2 = sorted(lst)
id(lst)
lst
#list.sort() sorts in place, and does not return anything
lst.sort()
id(lst)
lst

comparison between + and +=

In what follows we have two different implementations of the same operation: given an input $n$ we construct a list of the range $[0,...,n-1]$. The difference between the implementation is that in build_range1 we append items to our list using the command $lst = lst + [i]$ and in build_range2 we do this via the command $lst += [i]$, which, as we recall, invokes $lst.extend([i])$.

memory and running times

In [ ]:
import time
In [ ]:
def build_range1(n):
    lst = []
    t0 = time.perf_counter()
    for i in range(n):
        lst = lst + [i]
    t1 = time.perf_counter()
    print("Extending a list n=", n, "times, using the + operator took",t1-t0, "seconds")


def build_range2(n):
    lst = []
    t0 = time.perf_counter()
    for i in range(n):
        lst += [i]
    t1 = time.perf_counter()
    print("Extending a list n=", n, "times, using the += operator (extend()) took",t1-t0, "seconds")

build_range1(20000)
build_range2(20000)

Empirically, calling $lst = lst + [i]$ is less efficient. Why is that?

Since a new list is constructed every iteration the number of integers written to memory equals $1+2+ ... + n = \frac{n \cdot (n+1)}{2} \approx \frac{n^2}{2}$. Asymptotically speaking, the number of copy operations grows quadratically with $n$.

Using invoke, the same list is extended again and again. The number of integers written to memory equals $n$. Asymptotically speaking, the number of copy operations grows linearly with $n$.

What do we expect will happen if we double the input size $n=40000$?

In [ ]:
build_range1(40000)
build_range2(40000)

In accordance with the theory, the running time of the inefficient code increases by approximately $2^2 = 4$ as $n$ increased by a factor of $2$, while the running time of the efficient code increases by a factor of approximately $2$.