Modules and Functions

Functions

Last week, we learned how to run Python code, assign variables, and write control flow statements, which allows us to write programs that can do calculations. In fact, this is all you really need to write programs (except for being able to read in and write out data, which we will talk about later). However, if you try to write more complicated programs than those you wrote for Problem Set 1, the programs will quickly become very long and unreadable. So one very important rule in programming is to avoid repetition.

Let's say you want to write a program that checks if certain numbers are prime, and then you want to check later in the program if another set of values is prime. You could write:

In [1]:

for value in [3,5,6,8,9]:
    prime = True
    for i in range(2, int(value ** 0.5) + 1):
        if value % i == 0:
            prime = False
    if prime:
        print value, "is prime" 

3 is prime
5 is prime

then:

In [2]:

for value in [34,12,27,21,23,13]:
    prime = True
    for i in range(2, int(value ** 0.5) + 1):
        if value % i == 0:
            prime = False
    if prime:
        print value, "is prime" 

23 is prime
13 is prime

But this is wasteful, because you are duplicating the code that checks if a value is prime, and any later fix has to be made in every copy. This is why, if you ever need to do something twice or more, and it is a well-defined task, you should think about putting it in a function.

For example:

In [3]:

def is_prime(value):
    prime = True
    for i in range(2, int(value ** 0.5) + 1):
        if value % i == 0:
            prime = False
    return prime

Once the function is defined, you can then do:

In [4]:

for value in [3,5,6,8,9]:
    if is_prime(value):
        print value, "is prime" 

for value in [34,12,27,21,23,13]:
    if is_prime(value):
        print value, "is prime" 

3 is prime
5 is prime
23 is prime
13 is prime

The syntax for a function is:

def function_name(arguments):
    # code here
    return values

Functions are the building blocks of programs - think of them as basic units that are given a certain input and accomplish a certain task. Over time, you can build up more complex programs while preserving readability.

As with if statements and for and while loops, indentation is very important because it shows where the body of the function starts and ends.

Note: it is a common convention to always use lowercase names for functions.

A function can take multiple arguments...

In [5]:

def add(a, b):
    return a + b

print add(1,3)
print add(1.,3.2)
print add(4,3.)

4
4.2
7.0

... and can also return multiple values:

In [6]:

def double_and_halve(value):
    return value * 2., value / 2.

print double_and_halve(5.)

(10.0, 2.5)

If multiple values are returned, you can store them in separate variables.

In [7]:

d, h = double_and_halve(5.)

In [8]:

print d

10.0

In [9]:

print h

2.5
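
What the function actually returns in this case is a single tuple, and the assignment unpacks it into the separate variables. The short sketch below illustrates this with a hypothetical helper, min_and_max, which is not part of the examples above:

```python
# print(...) with a single argument behaves the same in Python 2 and 3
def min_and_max(values):
    # Return the smallest and largest element as a pair
    return min(values), max(values)

result = min_and_max([4, 1, 7])
print(result)                   # the pair comes back as one tuple
print(type(result) is tuple)

# Unpack the tuple into two separate variables
lowest, highest = result
print(lowest)
print(highest)
```

The number of variables on the left has to match the number of returned values, otherwise Python raises an error.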

Functions can call other functions:

In [10]:

def is_divisible_by(value, other_value):
    return value % other_value == 0

def is_prime(value):
    prime = True
    for i in range(2, int(value ** 0.5) + 1):
        if is_divisible_by(value, i):
            prime = False
    return prime

In [11]:

for value in range(2, 30):
    if is_prime(value):
        print value, "is prime" 

2 is prime
3 is prime
5 is prime
7 is prime
11 is prime
13 is prime
17 is prime
19 is prime
23 is prime
29 is prime
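
Because return ends a function immediately, is_prime does not have to keep looping once a divisor has been found. The following is a sketch of that variant, not a replacement for the version above; the extra check for values below 2 is an assumption added here, since the original example never passes such values:

```python
# print(...) with a single argument behaves the same in Python 2 and 3
def is_prime(value):
    # Assumption added in this sketch: 0, 1 and negatives are not prime
    if value < 2:
        return False
    for i in range(2, int(value ** 0.5) + 1):
        if value % i == 0:
            # A divisor was found, so leave the function right away
            return False
    # No divisor was found in the loop
    return True

primes = []
for value in range(2, 30):
    if is_prime(value):
        primes.append(value)

print(primes)
```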

Just because you can put code in functions doesn't mean you always should. It's best to try and break up the code into units that make sense - in the end, your function should ideally have a name that everyone can understand.

If we take the example from above, the following, in my opinion, is not as good, even though the loop that calls the function is one line shorter:

In [12]:

def is_prime(value):
    prime = True
    for i in range(2, int(value ** 0.5) + 1):
        if value % i == 0:
            prime = False
    if prime:
        print value, "is prime"

for value in [3,5,6,8,9]:
    is_prime(value)

3 is prime
5 is prime

The issue is that to me, is_prime means - just from the name - that it will return True or False depending on whether the value passed is prime, and doesn't say anything about printing. If you wanted to do this, the function should ideally be called print_message_if_prime or something similar. So don't define functions based on just making the shortest possible program, but also take into account that functions should be basic units that make sense conceptually.
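
One possible way to keep both behaviors, sketched here under the naming suggested above, is to leave is_prime as a pure True/False check and put the printing in a second function that calls it:

```python
# print(...) with a single argument behaves the same in Python 2 and 3
def is_prime(value):
    # Only answers the question; does not print anything
    prime = True
    for i in range(2, int(value ** 0.5) + 1):
        if value % i == 0:
            prime = False
    return prime

def print_message_if_prime(value):
    # The name says exactly what happens: a message, and only for primes
    if is_prime(value):
        print(str(value) + " is prime")

for value in [3, 5, 6, 8, 9]:
    print_message_if_prime(value)
```

With this split, is_prime can still be reused anywhere a True/False answer is needed.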

Built-in functions

Some of you may have already noticed that there are a few functions that are defined by default in Python:

In [13]:

x = [1,3,6,8,3]

In [14]:

len(x)

Out[14]:

5

In [15]:

sum(x)

Out[15]:

21

In [16]:

int(1.2)

Out[16]:

1

A full list of built-in functions is available in the Python documentation. Note that there are not that many - these are only the most common functions. Most functions are in fact kept inside modules, which we will now cover.
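
A few more built-in functions, applied to the same list as above, give a feel for what is available without importing anything. This is only a small selection:

```python
# print(...) with a single argument behaves the same in Python 2 and 3
x = [1, 3, 6, 8, 3]

print(min(x))       # smallest element
print(max(x))       # largest element
print(sorted(x))    # a new list with the elements in increasing order
print(abs(-2.5))    # absolute value
print(float(3))     # convert an integer to a floating-point number
```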

Modules

One of the strengths of Python is that there are many built-in add-ons - or modules - that contain existing functions, classes, and variables, and that allow you to do complex tasks in only a few lines of code. In addition, there are many third-party modules (e.g. NumPy, SciPy, Matplotlib) that can be installed, and you can also develop your own modules that include functionality you commonly use.

The built-in modules are referred to as the Standard Library, and you can find a full list of the available functionality in the Python Documentation.

To use modules in your Python session or script, you need to import them. The following example shows how to import the built-in math module, which contains a number of useful mathematical functions:

In [17]:

import math

You can then access functions and other objects in the module with math.<function>, for example:

In [18]:

math.sin(2.3)

Out[18]:

0.7457052121767203

In [19]:

math.factorial(20)

Out[19]:

2432902008176640000

In [20]:

math.pi

Out[20]:

3.141592653589793

Because these modules exist, if what you want to do is very common, it probably already exists, and you won't need to write it yourself (which also makes your code easier to read).
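
As a small illustration, the length of the hypotenuse of a right-angled triangle can be computed with a hand-written function or with math.hypot from the Standard Library. The function name hypotenuse below is made up for this sketch:

```python
# print(...) with a single argument behaves the same in Python 2 and 3
import math

def hypotenuse(a, b):
    # Hand-written version: square root of the sum of the squares
    return (a ** 2 + b ** 2) ** 0.5

print(hypotenuse(3., 4.))   # our own function
print(math.hypot(3., 4.))   # the existing function from the math module
```

Both calls give the same answer, but the second one needs no extra code and its name is already familiar to other Python users.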

For example, the numpy module, which we will talk about next week, contains useful functions for finding e.g. the mean, median, and standard deviation of a sequence of numbers:

In [21]:

import numpy as np

In [22]:

li = [1,2,7,3,1,3]
np.mean(li)

Out[22]:

2.8333333333333335

In [23]:

np.median(li)

Out[23]:

2.5

In [24]:

np.std(li)

Out[24]:

2.0344259359556172

Notice that in the above case, we used import numpy as np instead of import numpy. The as np part gives the module a shorter name inside the program, so that there is less to type each time a function from it is used.

Finally, it's also possible to simply import the functions needed directly:

In [25]:

from math import sin, cos
sin(3.4)
cos(3.4)

Out[25]:

-0.9667981925794611

You may find examples on the internet that use e.g. from module import *, but this is not recommended. It makes programs harder to debug, since common debugging tools that rely on just looking at the source of a program will not know which functions are being imported (more on this later).
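
A related risk is that two modules can define functions with the same name, in which case the later import silently replaces the earlier one. The sketch below shows the effect with explicit imports of sqrt from math and cmath (the complex-number counterpart of math in the Standard Library); with from module import * the same replacement happens without the name ever appearing in the code:

```python
# print(...) with a single argument behaves the same in Python 2 and 3
from math import sqrt
print(sqrt(4.0))          # real-valued result from math

from cmath import sqrt    # same name: replaces the function imported above
print(sqrt(4.0))          # now a complex number

# Importing the modules themselves keeps the two functions apart
import math
import cmath
print(math.sqrt(4.0))
print(cmath.sqrt(4.0))
```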

Where to find modules and functions [demo]

How do you know which modules exist in the first place? The Python documentation contains a list of modules in the Standard Library, but you can also simply search the web (Google is your friend!). Once you have a module that you think should contain the right kind of function, you can either look at the documentation for that module, or you can use the tab-completion in IPython:

In [2]: math.<TAB>
math.acos       math.degrees    math.fsum       math.pi
math.acosh      math.e          math.gamma      math.pow
math.asin       math.erf        math.hypot      math.radians
math.asinh      math.erfc       math.isinf      math.sin
math.atan       math.exp        math.isnan      math.sinh
math.atan2      math.expm1      math.ldexp      math.sqrt
math.atanh      math.fabs       math.lgamma     math.tan
math.ceil       math.factorial  math.log        math.tanh
math.copysign   math.floor      math.log10      math.trunc
math.cos        math.fmod       math.log1p      
math.cosh       math.frexp      math.modf
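
Outside IPython, a similar listing can be produced with the built-in dir function, and each function carries a short description in its __doc__ attribute (the built-in help function shows the same text in a longer form). A minimal sketch:

```python
# print(...) with a single argument behaves the same in Python 2 and 3
import math

# Collect the names in the module, skipping internal ones that start with "_"
public_names = []
for name in dir(math):
    if not name.startswith("_"):
        public_names.append(name)

print(len(public_names))         # how many names the module provides
print("sqrt" in public_names)    # check whether a given function exists
print(math.sqrt.__doc__)         # short description of one function
```

The exact number of names depends on the Python version installed.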

Variable scope

Consider the following example:

In [26]:

a = 1

def show_var():
    print a,b
  
b = 2
show_var()

1 2

In this case, the function knows about the variables defined outside the function.

Now consider the following example:

In [27]:

a = 1

def show_var():
    print a
    a = 2
  
show_var()

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-27-e84ac8cb6bcc> in <module>()
      5     a = 2
      6 
----> 7 show_var()

<ipython-input-27-e84ac8cb6bcc> in show_var()
      2 
      3 def show_var():
----> 4     print a
      5     a = 2
      6 

UnboundLocalError: local variable 'a' referenced before assignment

What happened? To understand this behavior, we have to talk about variable scope.

Variables defined anywhere inside a function are part of the local scope of the function. Any variable in the local scope takes precedence over any other variable of the same name, even on lines that come before the one where it is assigned:

In [28]:

def show_var():
    print a
    a = 2

In this case, a is defined inside the function and so it doesn't matter if a is used anywhere else in the Python code. The above function will therefore not work because a is used before it is defined.
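
The effect can be checked without stopping the program by catching the error. This is only a demonstration sketch; the variable failed is introduced here just to record what happened:

```python
# print(...) with a single argument behaves the same in Python 2 and 3
a = 1

def show_var():
    # 'a' is assigned further down, so it is local in the whole function
    print(a)
    a = 2

try:
    show_var()
    failed = False
except UnboundLocalError:
    # Raised because the local 'a' is read before it has a value
    failed = True

print(failed)   # True: the call did not succeed
print(a)        # the variable outside the function is untouched
```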

Variables defined at the top level of a script are in the global scope. If a variable is not defined inside a function, then the variable of that name from the global scope is used.

In [29]:

a = 1
def show_var():
    print a

In this case, a gets used from the global scope.

This is very useful because it means that modules don't have to be imported inside functions; you can import them once at the top level:

In [30]:

import numpy as np

def normalize(x):
    return x / np.mean(x)

This works because modules are objects in the same sense as any other variable.

However, this does not mean that you should write code such as:

In [31]:

a = 1
def show_var():
    print a

because it makes the code harder to read. The exceptions are modules, and variables that remain constant during the execution of the program.
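
A common alternative is to pass the value to the function as an argument, so that everything the function depends on is visible where it is called. A minimal sketch, using a hypothetical function scaled:

```python
# print(...) with a single argument behaves the same in Python 2 and 3
def scaled(value, factor):
    # Everything the function needs arrives through its arguments
    return value * factor

factor = 2
print(scaled(5, factor))     # the dependence on factor is explicit
print(scaled(5, 10))         # and a different factor is easy to try
```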

PEP8 style

We just touched on the idea of constants being used in functions - but Python does not really have constants, so how do we recognize these? We now need to speak about coding style.

There is a set of style guidelines referred to as PEP8, which is published on the Python website. These guidelines are not compulsory, but you should follow them as much as possible, especially when you have to work with other people or need other people to read your code.

You don't need to read the guidelines now. I will first give a couple of examples, and then show you a tool that can help you follow the guidelines. The following example does not follow the style guidelines:

In [32]:

pi = 3.1415926
def CalculateValues(x):
    return x*pi

Constants should be written in uppercase, and function names should be lowercase with words separated by underscores (the so-called camel-case used above is reserved for classes).

This is the correct way to write the code:

In [33]:

PI = 3.1415926

def calculate_values(x):
    return x * PI
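
The three naming conventions mentioned above can be seen side by side in the following sketch. All of the names are made up for illustration, and the class is left empty since classes have not been covered yet:

```python
# print(...) with a single argument behaves the same in Python 2 and 3

# Constant: uppercase, words separated by underscores
SECONDS_PER_MINUTE = 60.0


# Function: lowercase, words separated by underscores
def minutes_to_seconds(minutes):
    return minutes * SECONDS_PER_MINUTE


# Class: each word capitalized, no underscores
class ExperimentLog(object):
    pass


print(minutes_to_seconds(1.5))
```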

Other guidelines include that indentation should always be 4 spaces, etc. In practice, you can check your code with the pep8.py script. Download the script to the folder where you are writing code, and do:

python pep8.py my_script.py

where my_script.py is the script you want to check. For example, you might see:

my_script.py:2:1: W191 indentation contains tabs

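Each message gives the file name, the line number, and the position in the line, followed by a code and a description of the problem. In general, try to make sure that your scripts do not produce any such messages!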