This is the first session of the Python 201 course that concerns Python code. The focus will therefore be on brushing up on the basics, while introducing a few smaller new topics.
The primary focus will be on writing well-structured functions and keeping code modular.
Static code analysis means analysing the code without executing it. The opposite, dynamic code analysis, performs the analysis by executing the code on a real or virtual processor.
These types of analysis have many use cases. The most important ones you should know are summarised below.
Autocompletion is a tool that constantly tries to predict the rest of what the user is typing. It is a must-have and is activated by default in most editors. For Python, Jedi is one of the best. It has been heavily tested and seems to have a deeper understanding of Python than its competitors. The new kid on the block is Kite, which uses deep learning to do multi-line predictions and more.
Code linters check that the formatting of the code is in line with a selected style. This is a very useful tool that helps the user write clearer and more readable code. New users especially can benefit a lot from this.
We recommend using pycodestyle (formerly called pep8), which highlights deviations from the well-known PEP 8 Style Guide https://www.python.org/dev/peps/pep-0008/.
Python is a dynamically typed language, with all the pros and cons that come with it. However, it is still possible to achieve some of the benefits of a statically typed language by using type hinting (specifying types without impacting runtime) and then using static analysis to detect type errors and inconsistencies. Note that this is primarily relevant for larger Python projects.
Type hinting:
age: int = 1  # This is how you declare the type of a variable
child: bool   # You don't need to initialize a variable to annotate it
if age < 18:
    child = True
else:
    child = False

# Built-in types
from typing import List, Set, Dict, Tuple, Optional, Sequence

# This is how you annotate a function definition
def f(x: int, s: Sequence[int]) -> List[int]:
    ...
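With the hints in place, a static type checker can catch type errors before the code runs. Below is a minimal sketch, assuming the checker mypy is installed (pip install mypy); the file and function names are just for illustration:

# typed_example.py
def double(x: int) -> int:
    '''Return twice the input value.'''
    return 2 * x

# Running `mypy typed_example.py` flags this call,
# since a str is passed where an int is expected
double('hi')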
Unit testing means that you predefine a series of code inputs for which you know what the outcome should be. Through dynamic code analysis, these tests can be set up to run automatically on every single code commit, immediately informing the user if any of the tests fail. Unit testing and test coverage become relevant when you have a project many people rely on and you need to ensure correct behaviour during further development. We recommend using pytest.
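As a minimal sketch of what such a test could look like with pytest (add_numbers is a made-up function for illustration; pytest collects functions whose names start with test_ and runs them when you invoke the pytest command):

# test_example.py
def add_numbers(a, b):
    '''Return the sum of two numbers.'''
    return a + b

def test_add_numbers():
    # pytest reports a failure if any assert evaluates to False
    assert add_numbers(2, 3) == 5
    assert add_numbers(-1, 1) == 0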
Once a program grows to more than a few hundred lines it might make sense to separate code into multiple files. This can help to maintain a better overview of the structure of the program by storing connected pieces of logic together.
Before moving on let's distinguish between two types of Python files:
- A module is a .py file that gets imported into another file so the code can be used there.
- A script is a .py file that is meant to be run directly.
A quick recap of how module imports work in Python, here shown for the math library:
# Lets you access everything in the math library by dot-notation (e.g. math.pi)
import math
# Lets you use pi directly
from math import pi
# Lets you use everything in the math library directly
from math import *
The last one is not considered good practice, since it becomes hard to trace where imported names came from. It can be handy for quick tests though.
This same principle works for your own modules as well. So if we create my_module.py:
# my_module could be your own python file
import my_module
If my_module contains a function called my_func, you can call it after the import as my_module.my_func().
Similar to the math module example, we could do
# Lets you use my_func directly
from my_module import my_func
The import search

When the Python interpreter sees a statement like import module_name, Python will search for a file called module_name.py.
It searches in certain directories among which are the current working directory and the directory where you installed Python (if it is added to your path).
It's easy to see the directories within which Python searches. Just add the following lines to your code:
# Import module that has information about the Python interpreter
import sys
# Print a list of directories where interpreter searches
print(sys.path)
You can control which directories Python should search. This is as simple as adding a directory path to the list with sys.path.append(directory_path). It should be noted that the search is sequential, so if you want a new directory to be searched first, you can use sys.path.insert(0, directory_path), which inserts directory_path as the first element to be searched. This can for example be used to make sure that the top level (root) of a project directory is always searched.
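As a small sketch of both calls (the directory path is just a placeholder):

# Make a directory searchable (appended, so searched last)
import sys
sys.path.append('/path/to/my/project')

# Or make it the first directory to be searched
sys.path.insert(0, '/path/to/my/project')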
It is important to stress that when importing a file, the actual code inside the file is run.
Suppose you have a module moduleA
# moduleA.py
print(2+2)
and a module moduleB
# moduleB.py
import moduleA
If you open moduleB in your editor and run the code, it will output 4. This is because importing moduleA literally runs the code that resides inside moduleA.
Try to recreate this small example to see it in action.
One example of this could be:
- One module called utils.py containing all utility functions that the program uses. This could e.g. be code for reading in, cleaning or manipulating data.
- A module plotting.py containing all code responsible for plotting of results.
- A script file main.py that imports the modules and is used to run the code.
# main.py
import utils
import plotting
# Create some example data as lists
x = [i for i in range(100)]
y = [i*i for i in x]
# Modify the data by means of a function inside utils.py
x_modified, y_modified = utils.func_inside_utils(x, y)
# Plot the modified lists by means of function inside plotting
plotting.func_inside_plotting(x_modified, y_modified)
This approach has some advantages, as it:
- creates modularity by grouping related code inside modules and functions
- makes the code more readable, reusable and maintainable
- encourages the creator to make the code in utils and plotting more project independent
- makes the actual working file (main.py) cleaner, since most of the complicated logic sits inside separate modules
See here for a good resource on Python import statements.
Even though a module is meant to be imported into another file, it's possible to make it runnable as a script. The way to do this is to add the following line, typically at the end of the module level code:
if __name__ == '__main__':
    # Code here will only be run if this exact .py file is run directly
    # It will not run if the file is imported
This can be used in various ways, for example for setting up simple runnable test examples in each module.
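A minimal sketch of how such a test example could look inside a module (the module and function names are made up for illustration):

# my_module.py
def my_func(x):
    '''Return the square of x.'''
    return x * x

if __name__ == '__main__':
    # Simple test example; runs only with `python my_module.py`, not on import
    print(my_func(3))   # Prints 9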
*args

def func(*args):
    '''Print the arguments that were inputted to the function.'''
    print(args)
# Call the function with various number of arguments
func()
func(2)
func(2, 4)
func(2, 4, 8)
()
(2,)
(2, 4)
(2, 4, 8)
Recall that a standard function with a fixed number of arguments will raise an error if it does not receive the exact number of arguments that it expects.
Note that default arguments can be omitted when calling standard functions though.
def standard_func(string):
    '''Print the string that was inputted.'''
    print(string)
# Run the standard function with a single argument (as it expects)
standard_func('Hi')
Hi
# Run the function with two arguments when it expects only a single argument
standard_func('Hi', 'Ho')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-5340979a5eb5> in <module>
      1 # Run the function with two arguments when it expects only a single argument
----> 2 standard_func('Hi', 'Ho')

TypeError: standard_func() takes 1 positional argument but 2 were given
As seen, this raises an error.
The naming *args is not forced, but just the standard convention when talking about this behavior. You could call it whatever you want, as long as there is an * in front. In the example below, the name numbers is used in place of args.
def sum_numbers(*numbers):
    '''Return the sum of all given arguments.'''
    return sum(numbers)
# Call function with different number of arguments
print(sum_numbers(2))
print(sum_numbers(2, 7))
print(sum_numbers(2, 7, 5, 1))
2
9
15
If you already have a list of values from any previous code, you can unpack the list while calling the function by using an *.
# Define a list of values
a = [2, 7, 5, 1]
# Call function and unpack at the same time
sum_numbers(*a)
15
The *args feature can be useful for creating flexibility in a function, allowing the caller more options and freedom. A good example of an *args implementation is the built-in zip, which takes a dynamic number of iterables and generates values for looping over them simultaneously. In the Python documentation this is written as zip(*iterables), see here.
See how zip can be used with a varying number of arguments in the demo below.
# Using zip with two arguments
for n, s in zip([1, 2], ['Hi', 'Ho']):
    print(n, s)
1 Hi
2 Ho
# Using zip with three arguments
for n, s1, s2 in zip([1, 2], ['Hi', 'Ho'], ['Yi', 'Ha']):
    print(n, s1, s2)
1 Hi Yi
2 Ho Ha
**kwargs

The functionality of *args can also be used with so-called keyword arguments. It works very much the same way as *args, except there is a keyword assigned to each argument value. The syntax uses a double asterisk, as in **kwargs, and it works as demonstrated below.
def kw_func(**kwargs):
    '''Print the keyword arguments that were inputted to the function.'''
    print(kwargs)
# Call the function with various numbers of keyword arguments
kw_func(p=1, q=2)
kw_func(bla=327, blo=('hi', 'ho'))
{'p': 1, 'q': 2}
{'bla': 327, 'blo': ('hi', 'ho')}
One might say that *args relates to tuples and **kwargs relates to dictionaries. Also here, the naming kwargs is just a standard convention, short for keyword arguments. You can call it whatever you want; as long as there are ** in front, it will create this behavior.
It might not be very often that *args or **kwargs are needed for your own functions, but they do have their place, and they are used in many functions and methods in third-party libraries.
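A typical pattern in such libraries is a function that simply forwards its *args and **kwargs to another function. A small sketch with made-up function names:

def log_and_call(func, *args, **kwargs):
    '''Print the arguments, then forward them unchanged to func.'''
    print(f'Calling {func.__name__} with args={args} and kwargs={kwargs}')
    return func(*args, **kwargs)

def power(base, exponent=2):
    '''Return base raised to exponent.'''
    return base ** exponent

log_and_call(power, 3, exponent=3)   # Prints the call info and returns 27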
NumPy and SciPy have a great number of useful functions for doing all sorts of operations. Only a few were mentioned in the Python 101 course.
The brief description below gives an overview of some functions that might be useful for solving the exercises.
np.arange

np.arange is sort of like Python's standard range(start, stop [, step]). However, the standard Python version only allows integer values, whereas the NumPy version is more flexible. See a small demonstration below.
import numpy as np
# Create an array from -2 to 2 with step 0.5
a = np.arange(-2, 2, 0.5)
a
array([-2. , -1.5, -1. , -0.5, 0. , 0.5, 1. , 1.5])
Notice that the start point is included, but not the endpoint.
A related NumPy function is np.linspace, which was used in the previous course. It lets the user specify the number of elements in the resulting array rather than the step, as for np.arange.
Docs for np.arange
Docs for np.linspace
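For comparison, here is a small sketch of np.linspace producing the same array as the np.arange example above. Note that np.linspace includes the endpoint:

# Create an array from -2 to 1.5 with exactly 8 elements
np.linspace(-2, 1.5, 8)

array([-2. , -1.5, -1. , -0.5, 0. , 0.5, 1. , 1.5])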
A boolean array, also called a mask array, can be created by simply stating a condition on an existing array.
# Using the `a` array defined above
# Create a mask array for a with a condition
a < 0.2
array([ True, True, True, True, True, False, False, False])
Multiple conditions are also allowed with a simple syntax.
# Create a mask array for a with multiple conditions
(a > -1.4) & (a < 1.4)
array([False, False, True, True, True, True, True, False])
np.where

This function returns the indices where a boolean array has elements True.
# Extract indices where elements in a are less than 0.2
np.where(a < 0.2)
(array([0, 1, 2, 3, 4], dtype=int64),)
# Repeating from above
a_bool = (a > -1.4) & (a < 1.4)
# Get indices where the multiple conditions are True
np.where(a_bool)
(array([2, 3, 4, 5, 6], dtype=int64),)
Notice that the returned value is a tuple. If you want the first element, which is an array, extract it by indexing.
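For example, reusing a and the condition from above:

# Index into the returned tuple to get the array of indices
np.where(a < 0.2)[0]

array([0, 1, 2, 3, 4], dtype=int64)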
Docs for np.where
Another explanation (arguably better)
np.append

Appending one array to another is demonstrated below.
# Create two arrays
e = np.array([1, 2, 3])
f = np.array([11, 12])
# Combine them by appending one to the other
np.append(e, f)
array([ 1, 2, 3, 11, 12])
Doc for np.append.
NumPy arrays can be indexed by other arrays. This is an extremely useful feature. See the demonstration below.
# Given an array (`a` from before)
a
array([-2. , -1.5, -1. , -0.5, 0. , 0.5, 1. , 1.5])
# And another array representing indices
b = np.array([3, 4])
b
array([3, 4])
# a can be indexed by b as
a[b]
array([-0.5, 0. ])
This concept can be extended to more complicated use cases.
NumPy/SciPy explanation for fancy indexing
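One such extension is boolean mask indexing: indexing an array with a mask of the same shape selects the elements where the mask is True. A quick sketch, reusing a from above:

# Select the values of a that satisfy the condition
a[a < 0.2]

array([-2. , -1.5, -1. , -0.5, 0. ])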
find_peaks

The SciPy library has many useful functions for scientific computing. One of them is find_peaks, which is found in the signal module (scipy.signal). This means that it can be imported as from scipy.signal import find_peaks.
Given an array of values, it finds all the peaks by comparing each value to its two neighboring points. Note that it does not find the valleys!
It has some useful optional parameters, e.g. setting a minimum height below which no peaks are returned.
The function returns a tuple with two values. The first is the array indices where peaks were found, the next is a property dictionary with some information about the analysis results.
Saving a function's return values to variables can be done by unpacking. If a tuple with two elements is returned, the returned values can be saved like this:
# Save variables from a function that returns two variables
a, b = function_call()
# Or if only the first variable is needed in the code
a, _ = function_call()
Docs for find_peaks
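A minimal sketch of find_peaks in action (the signal values are just a made-up example):

import numpy as np
from scipy.signal import find_peaks

# A small signal with peaks at indices 1 and 3
signal = np.array([0, 2, 0, 3, 0])

# Unpack the returned tuple into peak indices and a property dictionary
peak_indices, properties = find_peaks(signal)
print(peak_indices)   # [1 3]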
From solving the exercises in Session 0 you should have a local Git repository with a README.md and a .gitignore. We encourage you to use this repository while working on the exercises in this course to gain familiarity with the Git workflow. In Session 1 you linked the local repo to a remote one on your GitHub account. You can push when you want to sync your local work to the remote.
The exercises in this session focus on:
- numpy and scipy
- **kwargs as a function parameter (and *args if you do the bonus exercises)

In case you do not yet have a local Git repository set up for this course, you can do the exercises from Session 0 before doing these. If you just want to get going with these exercises, you can run git init inside the directory where you want to store your code and start.
Create a branch
As you're about to do a larger chunk of work with a specific topic in mind, create a branch with a descriptive name (e.g. session2) and check it out.
Recall that git branch <branch_name> creates a branch and git checkout <branch_name> checks it out. This can be done in one operation as git checkout -b <branch_name>.
Create a folder and a module
Create a folder for Session 2 and a module called utils.py inside it.
The utils.py module will store all the utility functions that can be called upon to do a lot of the groundwork.
Inside utils.py, write a function called extract_interval, which has its skeleton presented below. Read the docstring for details.
def extract_interval(x, x_bounds):
    '''
    Return the indices of `x` that are within `x_bounds`, with both values
    of `x_bounds` being inclusive.

    Parameters
    ----------
    x : 1D numpy array
        x-coordinates sorted in ascending order.
    x_bounds : tuple
        A two-element tuple defining the interval within which to extract
        x-indices.

    Returns
    -------
    1D numpy array
        Array of indices for `x`-values that reside within `x_bounds`.

    Assumptions
    -----------
    The function assumes that the input array `x` is sorted in
    ascending order.
    '''
    # Your code here!
# Basic test 1
x1 = np.arange(0, 10)
idx1 = extract_interval(x=x1, x_bounds=(3, 7))
# Basic test 2
x2 = np.arange(-0.6, 3.8, 0.4)
idx2 = extract_interval(x=x2, x_bounds=(-0.3, 1.5))
Results should be idx1 = [3 4 5 6 7] and idx2 = [1 2 3 4 5].
Place test runs like the above at the bottom of the file in an if __name__ == '__main__': block. This way they will not get run if the module is imported from elsewhere.
Once you're happy with the function, you can commit your changes to create a snapshot of your repository state.
Write a function called extrema_indices inside utils.py that takes as input an array of y-coordinates and returns the indices of all local extrema (peaks and valleys). Use the find_peaks function from scipy.signal.
Be sure to write a good docstring for the function. You can use the same structure as in the previous exercise as guideline.
Recall that find_peaks only finds the peaks and not the valleys, so you have to do a slight workaround to get the valleys.
Test the code by running yy from below through the function:
# A dummy graph to test against
xx = np.linspace(1, 20, 40)
yy = np.sin(xx) * 3 * np.cos(xx**2)
It should return this array:
array([ 1.83936866, 2.00743698, 1.76511072, 2.68372999, -0.10032392,
1.99429972, 0.27309389, 2.23407394, 0.12795831, 2.51053515,
0.2974298 , -2.00646398, -0.63018205, -2.36255017, -2.42571128,
-0.65672365, -0.37642334, -2.60571339, -1.05772914, -1.4625172 ,
-0.7508744 , -2.34019749, -1.50602509])
Create a commit when you have finished the exercise.
Write the function described in the docstring below.
def arrays_todict(x_arr, y_arr):
    '''
    Return a dictionary with y-values in string form as keys
    and x- and y-values as values.

    Parameters
    ----------
    x_arr : 1D numpy array
        Array of x-values.
    y_arr : 1D numpy array
        Array of y-values.

    Returns
    -------
    dict
        Dictionary of the form
        {'y1': (x1, y1), 'y2': (x2, y2), ..., 'yn': (xn, yn)}
    '''
    # Your code here!
You might want to format the decimals for the keys. See here for how to do that using f-strings.
Create a commit when you have tested and finished the exercise.
This can be solved by a one-liner with a dictionary comprehension. They work similarly to list comprehensions, but instead of a single value at each element, one must supply a key and a value. See a simple demonstration of a dictionary comprehension below.
# Example of dict comprehension
{f'2*{x}': 2*x for x in range(5)}
{'2*0': 0, '2*1': 2, '2*2': 4, '2*3': 6, '2*4': 8}
The built-in zip might also be useful. It enables iterating over two iterables simultaneously.
Docs for zip.
Notice that the Python documentation writes the signature of zip as zip(*iterables). So it uses the *args concept described above.
Create a new module plotting.py for storing plotting-related code. Inside it, write a function called annotate_points.
def annotate_points(points_to_annotate, ax, **kwargs):
    '''
    Annotate points with corresponding text.

    Parameters
    ----------
    points_to_annotate : dict
        Dictionary with desired annotation text as keys and the (x, y)-
        coordinates of the annotation as values.
    ax : matplotlib axis object
        Axis object on which to plot the annotations.
    **kwargs : keyword arguments
        Arguments to be forwarded to the ax.annotate call.
    '''
    # Your code here
The purpose of the function in these exercises is to annotate the extreme values, which are to be stored inside a dictionary of the form {'extr_val1': (x1, y1), 'extr_val2': (x2, y2), ...}.
For annotating points, use plt.annotate or ax.annotate. They work similarly, but plt is for the MATLAB-like API and ax is for the object-oriented API of matplotlib.
Docs for plt.annotate
Docs for ax.annotate
Note the intended use of **kwargs in the annotate_points function. It is meant to simply transfer the many possible keyword arguments from your custom function to the matplotlib function. By using this, your own function does not limit the customizability of the underlying plotting code. Try for example to add the argument color='green' to the function call when testing.
You can test the function by creating a plot and annotating the extreme values. Create some dummy values for a graph yourself or use the ones provided in Exercise 3.
Commit your changes when you are done.
Create a module main.py and import utils and plotting into it.
This is now a "clean" file that has access to all of our custom functions without looking cluttered.
Inside main.py, create a graph and use the functions previously created to annotate only the extreme values within a certain x-interval of your choice.
You can use the test code from Exercise 3 and 5 if you want.
The exercises are basically done. If you want, you can keep improving the functions. Some good improvements could be:
- Make extract_interval work with multiple intervals by use of *args.
- Specify the number of decimal points for the dict keys that are returned from the arrays_todict function. This would be prettier when the strings that are stored as keys are eventually annotated in the plot.
If you start altering your existing, working code, remember to make sure that you have that code stored in a commit (snapshot). This way you can always revert to that state if need be.