This is the first session of the Python 201 course that concerns Python code. The focus will therefore be on brushing up on the basics, while introducing a few smaller new topics.
The primary focus will be on writing well-structured functions and keeping code modular.
Static code analysis means analysing the code without executing it. The opposite, dynamic code analysis, performs the analysis by executing the code on a real or virtual processor.
These types of analysis have many use cases. The most important ones you should know are summarised below.
Autocompletion is a tool that constantly tries to predict the rest of what the user is typing. It is a must-have and is activated by default in most editors. For Python, Jedi is one of the best. It has been heavily tested and seems to have a deeper understanding of Python than its competitors. The new kid on the block is Kite, which uses deep learning to do multi-line predictions and more.
Code linters check that the formatting of the code is in line with a selected style. This is a very useful tool that helps the user write clearer and more readable code. New users especially can benefit a lot from this.
We recommend using pycodestyle (formerly called pep8), which highlights deviations from the well-known PEP 8 Style Guide https://www.python.org/dev/peps/pep-0008/.
Python is a dynamically typed language, with all the pros and cons that come with it. However, it is still possible to achieve some of the benefits of a statically typed language by using type hinting (specifying types without impacting runtime) and then using static analysis to detect type errors and inconsistencies. Note that this is primarily relevant for larger Python projects.
Type hinting:
age: int = 1  # This is how you declare the type of a variable
child: bool   # You don't need to initialize a variable to annotate it
if age < 18:
    child = True
else:
    child = False

# Built-in types
from typing import List, Set, Dict, Tuple, Optional, Sequence

# This is how you annotate a function definition
def f(x: int, s: Sequence[int]) -> List[int]:
    ...
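With the hints in place, a static type checker can catch type errors before the code runs. Below is a minimal sketch, assuming the checker mypy is installed (pip install mypy); the file and function names are just for illustration:

# typed_example.py
def double(x: int) -> int:
    '''Return twice the input value.'''
    return 2 * x

# Running `mypy typed_example.py` flags this call,
# since a str is passed where an int is expected
double('hi')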
Unit testing means that you predefine a series of code inputs for which you know what the outcome should be. Through dynamic code analysis, these tests can be set up to run automatically on every single code commit, immediately informing the user if any of the tests fail. Unit testing and test coverage become relevant when you have a project many people rely on and you need to ensure correct behaviour during further development. We recommend using pytest.
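As a minimal sketch of what such a test could look like with pytest (add_numbers is a made-up function for illustration; pytest collects functions whose names start with test_ and runs them when you invoke the pytest command):

# test_example.py
def add_numbers(a, b):
    '''Return the sum of two numbers.'''
    return a + b

def test_add_numbers():
    # pytest reports a failure if any assert evaluates to False
    assert add_numbers(2, 3) == 5
    assert add_numbers(-1, 1) == 0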
Once a program grows to more than a few hundred lines it might make sense to separate code into multiple files. This can help to maintain a better overview of the structure of the program by storing connected pieces of logic together.
Before moving on let's distinguish between two types of Python files:
- A module is a .py file that gets imported into another file so the code can be used there.
- A script is a .py file that is meant to be run directly.
A quick recap of how module imports work in Python, here shown for the math library:
# Lets you access everything in the math library by dot-notation (e.g. math.pi)
import math
# Lets you use pi directly
from math import pi
# Lets you use everything in the math library directly
from math import *
The last one is not considered good practice, since it becomes hard to trace where imported names came from. It can be handy for quick tests though.
This same principle works for your own modules as well. So if we create my_module.py:
# my_module could be your own python file
import my_module
If my_module contains a function called my_func, you can call it after the import as my_module.my_func().
Similar to the math module example, we could do
# Lets you use my_func directly
from my_module import my_func
The import search

When the Python interpreter sees a statement like import module_name, Python will search for a file called module_name.py.
It searches in certain directories among which are the current working directory and the directory where you installed Python (if it is added to your path).
It's easy to see the directories within which Python searches. Just add the following lines to your code:
# Import module that has information about the Python interpreter
import sys
# Print a list of directories where interpreter searches
print(sys.path)
You can control which directories Python should search. This is as simple as adding a directory path to the list with sys.path.append(directory_path). It should be noted that the search is sequential, so if you want a new directory to be searched first, you can use sys.path.insert(0, directory_path), which inserts directory_path as the first element to be searched. This can for example be used to make sure that the top level (root) of a project directory is always searched.
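As a small sketch of both calls (the directory path is just a placeholder):

# Make a directory searchable (appended, so searched last)
import sys
sys.path.append('/path/to/my/project')

# Or make it the first directory to be searched
sys.path.insert(0, '/path/to/my/project')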
It is important to stress that when importing a file, the actual code inside the file is run.
Suppose you have a module moduleA
# moduleA.py
print(2+2)
and a module moduleB
# moduleB.py
import moduleA
If you open moduleB in your editor and run the code, it will output 4. This is because importing moduleA literally runs the code that resides inside moduleA.
Try to recreate this small example to see it in action.
One example of this could be:
- One module called utils.py containing all utility functions that the program uses. This could e.g. be code for reading in, cleaning or manipulating data.
- A module plotting.py containing all code responsible for plotting of results.
- A script file main.py that imports the modules and is used to run the code.
# main.py
import utils
import plotting
# Create some example data as lists
x = [i for i in range(100)]
y = [i*i for i in x]
# Modify the data by means of a function inside utils.py
x_modified, y_modified = utils.func_inside_utils(x, y)
# Plot the modified lists by means of function inside plotting
plotting.func_inside_plotting(x_modified, y_modified)
This approach has some advantages, as it:
- creates modularity by grouping related code inside modules and functions
- makes the code more readable, reusable and maintainable
- encourages the creator to make the code in utils and plotting more project independent
- makes the actual working file (main.py) cleaner, since most of the complicated logic sits inside separate modules
See here for a good resource on Python import statements.
Even though a module is meant to be imported into another file, it's possible to make it runnable as a script. The way to do this is to add the following line, typically at the end of the module level code:
if __name__ == '__main__':
    # Code here will only be run if this exact .py file is run directly
    # It will not run if the file is imported
This can be used in various ways, for example for setting up simple runnable test examples in each module.
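A minimal sketch of how such a test example could look inside a module (the module and function names are made up for illustration):

# my_module.py
def my_func(x):
    '''Return the square of x.'''
    return x * x

if __name__ == '__main__':
    # Simple test example; runs only with `python my_module.py`, not on import
    print(my_func(3))   # Prints 9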
*args

def func(*args):
    '''Print the arguments that were inputted to the function.'''
    print(args)
# Call the function with various number of arguments
func()
func(2)
func(2, 4)
func(2, 4, 8)
()
(2,)
(2, 4)
(2, 4, 8)
Recall that a standard function with a fixed number of arguments will raise an error if it does not receive the exact number of arguments that it expects.
Note that default arguments can be omitted when calling standard functions though.
def standard_func(string):
    '''Print the string that was inputted.'''
    print(string)
# Run the standard function with a single argument (as it expects)
standard_func('Hi')
Hi
# Run the function with two arguments when it expects only a single argument
standard_func('Hi', 'Ho')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-5340979a5eb5> in <module>
      1 # Run the function with two arguments when it expects only a single argument
----> 2 standard_func('Hi', 'Ho')

TypeError: standard_func() takes 1 positional argument but 2 were given
As seen, this raises an error.
The naming *args is not forced, but just the standard convention when talking about this behavior. You could call it whatever you want, as long as there is an * in front. In the example below, the name numbers is used in place of args.
def sum_numbers(*numbers):
    '''Return the sum of all given arguments.'''
    return sum(numbers)
# Call function with different number of arguments
print(sum_numbers(2))
print(sum_numbers(2, 7))
print(sum_numbers(2, 7, 5, 1))
2
9
15
If you already have a list of values from any previous code, you can unpack the list while calling the function by using an *.
# Define a list of values
a = [2, 7, 5, 1]
# Call function and unpack at the same time
sum_numbers(*a)
15
The *args feature can be useful for creating flexibility in a function, allowing the caller more options and freedom. A good example of an *args implementation is the built-in zip, which takes a dynamic number of iterables and generates values for looping over them simultaneously. In the Python documentation this is written as zip(*iterables), see here.
See how zip can be used with a varying number of arguments in the demo below.
# Using zip with two arguments
for n, s in zip([1, 2], ['Hi', 'Ho']):
    print(n, s)
1 Hi
2 Ho
# Using zip with three arguments
for n, s1, s2 in zip([1, 2], ['Hi', 'Ho'], ['Yi', 'Ha']):
    print(n, s1, s2)
1 Hi Yi
2 Ho Ha
**kwargs

The functionality of *args can also be used with so-called keyword arguments. It works very much the same way as *args, except there is a keyword assigned to each argument value. The syntax uses a double asterisk, as in **kwargs, and it works as demonstrated below.
def kw_func(**kwargs):
    '''Print the keyword arguments that were inputted to the function.'''
    print(kwargs)
# Call the function with various numbers of keyword arguments
kw_func(p=1, q=2)
kw_func(bla=327, blo=('hi', 'ho'))
{'p': 1, 'q': 2}
{'bla': 327, 'blo': ('hi', 'ho')}
One might say that *args relates to tuples and **kwargs relates to dictionaries. Also here, the naming kwargs is just a standard convention, short for keyword arguments. You can call it whatever you want; as long as there are ** in front, it will create this behavior.
It might not be very often that *args or **kwargs are needed for your own functions, but they do have their place, and they are used in many functions and methods in third-party libraries.
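A typical pattern in such libraries is a function that simply forwards its *args and **kwargs to another function. A small sketch with made-up function names:

def log_and_call(func, *args, **kwargs):
    '''Print the arguments, then forward them unchanged to func.'''
    print(f'Calling {func.__name__} with args={args} and kwargs={kwargs}')
    return func(*args, **kwargs)

def power(base, exponent=2):
    '''Return base raised to exponent.'''
    return base ** exponent

log_and_call(power, 3, exponent=3)   # Prints the call info and returns 27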
NumPy and SciPy have a great number of useful functions for doing all sorts of operations. Only a few were mentioned in the Python 101 course.
The brief description below gives an overview of some functions that might be useful for solving the exercises.
np.arange

np.arange is sort of like Python's standard range(start, stop [, step]). However, the standard Python version only allows integer values, whereas the NumPy version is more flexible. See a small demonstration below.
import numpy as np
# Create an array from -2 to 2 with step 0.5
a = np.arange(-2, 2, 0.5)
a
array([-2. , -1.5, -1. , -0.5, 0. , 0.5, 1. , 1.5])
Notice that the start point is included, but not the endpoint.
A related NumPy function is np.linspace, which was used in the previous course. It lets the user specify the number of elements in the resulting array rather than the step, as for np.arange.
Docs for np.arange
Docs for np.linspace
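For comparison, here is a small sketch of np.linspace producing the same array as the np.arange example above. Note that np.linspace includes the endpoint:

# Create an array from -2 to 1.5 with exactly 8 elements
np.linspace(-2, 1.5, 8)

array([-2. , -1.5, -1. , -0.5, 0. , 0.5, 1. , 1.5])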
A boolean array, also called a mask array, can be created by simply stating a condition on an existing array.
# Using the `a` array defined above
# Create a mask array for a with a condition
a < 0.2
array([ True, True, True, True, True, False, False, False])
Multiple conditions are also allowed with a simple syntax.
# Create a mask array for a with multiple conditions
(a > -1.4) & (a < 1.4)
array([False, False, True, True, True, True, True, False])
np.where

This function returns the indices where a boolean array has elements True.
# Extract indices where elements in a are less than 0.2
np.where(a < 0.2)
(array([0, 1, 2, 3, 4], dtype=int64),)
# Repeating from above
a_bool = (a > -1.4) & (a < 1.4)
# Get indices where the multiple conditions are True
np.where(a_bool)
(array([2, 3, 4, 5, 6], dtype=int64),)
Notice that the returned value is a tuple. If you want the first element, which is an array, extract it by indexing.
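For example, reusing a and the condition from above:

# Index into the returned tuple to get the array of indices
np.where(a < 0.2)[0]

array([0, 1, 2, 3, 4], dtype=int64)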
Docs for np.where
Another explanation (arguably better)
np.append

Appending one array to another is demonstrated below.
# Create two arrays
e = np.array([1, 2, 3])
f = np.array([11, 12])
# Combine them by appending one to the other
np.append(e, f)
array([ 1, 2, 3, 11, 12])
Doc for np.append.
NumPy arrays can be indexed by other arrays. This is an extremely useful feature. See the demonstration below.
# Given an array (`a` from before)
a
array([-2. , -1.5, -1. , -0.5, 0. , 0.5, 1. , 1.5])
# And another array representing indices
b = np.array([3, 4])
b
array([3, 4])
# a can be indexed by b as
a[b]
array([-0.5, 0. ])
This concept can be extended to more complicated use cases.
NumPy/SciPy explanation for fancy indexing
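One such extension is boolean mask indexing: indexing an array with a mask of the same shape selects the elements where the mask is True. A quick sketch, reusing a from above:

# Select the values of a that satisfy the condition
a[a < 0.2]

array([-2. , -1.5, -1. , -0.5, 0. ])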
find_peaks

The SciPy library has many useful functions for scientific computing. One of them is find_peaks, which is found in the signal module (scipy.signal). This means that it can be imported as from scipy.signal import find_peaks.
Given an array of values, it finds all the peaks by comparing each value to its two neighboring points. Note that it does not find the valleys!
It has some useful optional parameters, e.g. setting a minimum height below which no peaks are returned.
The function returns a tuple with two values. The first is the array indices where peaks were found, the next is a property dictionary with some information about the analysis results.
Saving a function's return values to variables can be done by unpacking. If a tuple with two elements is returned, the returned values can be saved like this:
# Save variables from a function that returns two variables
a, b = function_call()
# Or if only the first variable is needed in the code
a, _ = function_call()
Docs for find_peaks
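A minimal sketch of find_peaks in action (the signal values are just a made-up example):

import numpy as np
from scipy.signal import find_peaks

# A small signal with peaks at indices 1 and 3
signal = np.array([0, 2, 0, 3, 0])

# Unpack the returned tuple into peak indices and a property dictionary
peak_indices, properties = find_peaks(signal)
print(peak_indices)   # [1 3]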
From solving the exercises in Session 0 you should have a local Git repository with a README.md and a .gitignore. We encourage you to use this repository while working on the exercises in this course to gain familiarity with the Git workflow. In Session 1 you linked the local repo to a remote one on your GitHub account. You can push when you want to sync your local work to the remote.
The exercises in this session focus on:
- numpy and scipy
- **kwargs as a function parameter (and *args if you do the bonus exercises)

In case you do not yet have a local Git repository set up for this course, you can do the exercises from Session 0 before doing these. If you just want to get going with these exercises, you can run git init inside the directory where you want to store your code and start.
Create a branch
As you're about to do a larger chunk of work with a specific topic in mind, create a branch with a descriptive name (e.g. session2) and check it out.
Recall that git branch <branch_name> creates a branch and git checkout <branch_name> checks it out. This can be done in one operation as git checkout -b <branch_name>.
Create a folder and a module
Create a folder for Session 2 and a module called utils.py inside it.
The utils.py module will store all the utility functions that can be called upon to do a lot of the groundwork.
Inside utils.py, write a function called extract_interval, which has its skeleton presented below. Read the docstring for details.
def extract_interval(x, x_bounds):
    '''
    Return the indices of `x` that are within `x_bounds`, with both values
    of `x_bounds` being inclusive.

    Parameters
    ----------
    x : 1D numpy array
        x-coordinates sorted in ascending order.
    x_bounds : tuple
        A two-element tuple defining the interval within which to extract
        x-indices.

    Returns
    -------
    1D numpy array
        Array of indices for `x`-values that reside within `x_bounds`.

    Assumptions
    -----------
    The function assumes that the input array `x` is sorted in
    ascending order.
    '''
    # Your code here!
# Basic test 1
x1 = np.arange(0, 10)
idx1 = extract_interval(x=x1, x_bounds=(3, 7))
# Basic test 2
x2 = np.arange(-0.6, 3.8, 0.4)
idx2 = extract_interval(x=x2, x_bounds=(-0.3, 1.5))
Results should be idx1 = [3 4 5 6 7] and idx2 = [1 2 3 4 5].
Place test runs like the above at the bottom of the file in an if __name__ == '__main__': block. This way they will not get run if the module is imported from elsewhere.
Once you're happy with the function, you can commit your changes to create a snapshot of your repository state.
Write a function called extrema_indices inside utils.py that takes as input an array of y-coordinates and returns the indices of all local extrema (peaks and valleys). Use the find_peaks function from scipy.signal.
Be sure to write a good docstring for the function. You can use the same structure as in the previous exercise as guideline.
Recall that find_peaks only finds the peaks and not the valleys, so you have to do a slight workaround to get the valleys.
Test the code by running yy from below through the function:
# A dummy graph to test against
xx = np.linspace(1, 20, 40)
yy = np.sin(xx) * 3 * np.cos(xx**2)
It should return this array:
array([ 1.83936866, 2.00743698, 1.76511072, 2.68372999, -0.10032392,
1.99429972, 0.27309389, 2.23407394, 0.12795831, 2.51053515,
0.2974298 , -2.00646398, -0.63018205, -2.36255017, -2.42571128,
-0.65672365, -0.37642334, -2.60571339, -1.05772914, -1.4625172 ,
-0.7508744 , -2.34019749, -1.50602509])
Create a commit when you have finished the exercise.
Write the function described in the docstring below.
def arrays_todict(x_arr, y_arr):
    '''
    Return a dictionary with y-values in string form as keys
    and x- and y-values as values.

    Parameters
    ----------
    x_arr : 1D numpy array
        Array of x-values.
    y_arr : 1D numpy array
        Array of y-values.

    Returns
    -------
    dict
        Dictionary of the form
        {'y1': (x1, y1), 'y2': (x2, y2), ..., 'yn': (xn, yn)}
    '''
    # Your code here!
You might want to format the decimals for the keys. See here for how to do that using f-strings.
Create a commit when you have tested and finished the exercise.
This can be solved by a one-liner with a dictionary comprehension. They work similarly to list comprehensions, but instead of a single value at each element, one must supply a key and a value. See a simple demonstration of a dictionary comprehension below.
# Example of dict comprehension
{f'2*{x}': 2*x for x in range(5)}
{'2*0': 0, '2*1': 2, '2*2': 4, '2*3': 6, '2*4': 8}
The built-in zip might also be useful. It enables iterating over two iterables simultaneously.
Docs for zip.
Notice that the Python documentation writes the signature of zip as zip(*iterables). So it uses the *args concept described above.
Create a new module plotting.py for storing plotting-related code. Inside it, write a function called annotate_points.
def annotate_points(points_to_annotate, ax, **kwargs):
    '''
    Annotate points with corresponding text.

    Parameters
    ----------
    points_to_annotate : dict
        Dictionary with desired annotation text as keys and the (x, y)-
        coordinates of the annotation as values.
    ax : matplotlib axis object
        Axis object on which to plot the annotations.
    **kwargs : keyword arguments
        Arguments to be forwarded to the ax.annotate call.
    '''
    # Your code here
The purpose of the function in these exercises is to annotate the extreme values, which are to be stored inside a dictionary of the form {'extr_val1': (x1, y1), 'extr_val2': (x2, y2), ...}.
For annotating points, use plt.annotate or ax.annotate. They work similarly, but plt is for the MATLAB-like API and ax is for the object-oriented API of matplotlib.
Docs for plt.annotate
Docs for ax.annotate
Note the intended use of **kwargs in the annotate_points function. It is meant to simply transfer the many possible keyword arguments from your custom function to the matplotlib function. By using this, your own function does not limit the customizability of the underlying plotting code. Try for example to add the argument color='green' to the function call when testing.
You can test the function by creating a plot and annotating the extreme values. Create some dummy values for a graph yourself or use the ones provided in Exercise 3.
Commit your changes when you are done.
Create a module main.py and import utils and plotting into it.
This is now a "clean" file that has access to all of our custom functions without looking cluttered.
Inside main.py, create a graph and use the functions previously created to annotate only the extreme values within a certain x-interval of your choice.
You can use the test code from Exercise 3 and 5 if you want.
The exercises are basically done. If you want, you can keep improving the functions. Some good improvements could be:
- Make extract_interval work with multiple intervals by use of *args.
- Specify the number of decimal points for the dict keys that are returned from the arrays_todict function. This would be prettier when the strings that are stored as keys are eventually annotated in the plot.
If you start altering your existing, working code, remember to make sure that you have that code stored in a commit (snapshot). This way you can always revert to that state if need be.