# Writing Modules and Functions

## Unit 14, Lecture 2

Numerical Methods and Statistics

# Writing Good, Reliable Documented Functions

We're going to focus now on what goes into writing a good Python function. If you want your function to be reusable, you need to store it in a text file whose name ends in .py. We can do this from a notebook using the %%writefile magic. Let's see an example:

In [50]:
%%writefile test.py

def hello_world():
    print('Hello World')

Overwriting test.py

In [12]:
import test

test.hello_world()

Hello World


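If you look in the file system, you'll see we have a file called test.py. If it's in the same directory as your notebook, you can get everything from the test file using import. Here are some examples for when it's somewhere else:

1. If test.py is in the parent directory of yours: a plain import cannot reach it (import ..test is not valid Python). Either add the parent directory to sys.path before running import test, or, if your code is itself inside a package, use the relative form from .. import test
2. If test.py is in a subdirectory called sub: import sub.test. It is good practice to also have an empty file called __init__.py inside of the sub folder, which marks it as a package; older versions of Python require it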
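As a minimal sketch of the subdirectory case, the cell below builds a throwaway folder, puts a small package inside it, and imports a function from that package. The names sub_demo and greet.py are hypothetical, chosen only so they don't clash with a real module, and the scratch directory stands in for your working directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a scratch directory that plays the role of "your" working directory.
workdir = Path(tempfile.mkdtemp())

# sub_demo/ is a hypothetical subdirectory; __init__.py marks it as a package.
package_dir = workdir / 'sub_demo'
package_dir.mkdir()
(package_dir / '__init__.py').write_text('')
(package_dir / 'greet.py').write_text(
    "def hello_world():\n"
    "    return 'Hello World'\n"
)

# Python only searches the directories listed in sys.path, so add the scratch one.
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()

import sub_demo.greet

message = sub_demo.greet.hello_world()
print(message)
```

The same sys.path trick is one way to reach a module in a parent directory: insert the parent's path and then use a plain import.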

# Modules

This file we've created is called a module, just like the math or numpy module. We can have multiple functions inside the module as well as variables.

In [13]:
%%writefile test.py

pi = 3.0

def square(x):
    return x*x

def hello_world():
    print('Hello World')

Overwriting test.py

In [14]:
import test
print('pi is exactly {}'.format(test.pi))

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-14-c48ab7ab36ef> in <module>()
1 import test
----> 2 print('pi is exactly {}'.format(test.pi))

AttributeError: module 'test' has no attribute 'pi'

Uh-oh! It is using an outdated version of test.py. To get Python to reload it, we can restart the kernel or use the reload function from importlib.

In [15]:
from importlib import reload
reload(test)
print('pi is exactly {}'.format(test.pi))

pi is exactly 3.0
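To make the stale-module behavior concrete, here is a small self-contained sketch. It writes a hypothetical module called reload_demo into a scratch directory, imports it, edits the file, and then reloads it. Turning off bytecode caching is an assumption made here only so the demonstration does not depend on file timestamps.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip bytecode caching so the reload below always re-reads the source file.
sys.dont_write_bytecode = True

workdir = Path(tempfile.mkdtemp())
sys.path.insert(0, str(workdir))
module_file = workdir / 'reload_demo.py'  # hypothetical module name

module_file.write_text('pi = 3.0\n')
importlib.invalidate_caches()
import reload_demo
before = reload_demo.pi

# Edit the file on disk; the already-imported module does not notice.
module_file.write_text('pi = 3.14159\n')
stale = reload_demo.pi

# reload re-executes the file and updates the existing module object.
importlib.reload(reload_demo)
after = reload_demo.pi

print(before, stale, after)
```

Note that a second import statement would not help here: Python sees the module is already loaded and hands back the old copy.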


# Documenting

You can add helpful documentation at the module level (top of the file) and at the function level.

In [16]:
%%writefile test.py
'''This module contains nonsense'''

pi = 3.0

def square(x):
    '''Want to square a number? This function will help'''
    return x*x

def hello_world():
    print('Hello World')

Overwriting test.py

In [17]:
reload(test)
help(test)

Help on module test:

NAME
    test - This module contains nonsense

FUNCTIONS
    hello_world()

    square(x)
        Want to square a number? This function will help

DATA
    pi = 3.0

FILE
    /home/jovyan/work/test.py
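The text that help displays comes from the docstring, which Python stores on the object itself. A short sketch, redefining square here so the cell runs on its own:

```python
import inspect

def square(x):
    '''Want to square a number? This function will help'''
    return x * x

# The raw docstring is stored on the function object.
print(square.__doc__)

# inspect.getdoc returns the same text with indentation cleaned up,
# which matters for multi-line docstrings.
doc = inspect.getdoc(square)
print(doc)
```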



# Writing a good function

The reason for creating a module like test.py is to write a function once and for all, so you don't need to copy and paste it everywhere you use it. Let's try this for confidence intervals of data. Here are the steps:

1. Document what your function should do (plan)
2. Get basic functionality working in a notebook (develop)
3. Move function to a file and import (deploy)
4. Write some cells in a notebook to test basic cases until you have everything working (test)
5. Finally polish off your code by testing bad inputs and trying to break it (more testing)

## Example: Writing a function to compute confidence intervals

Let's see this in action for computing confidence intervals.

## 1. Plan

I'll be writing out the documentation first. I'll use the NumPy docstring format, which the Sphinx extension Napoleon can parse. This is more complex than what we've seen before. We specify what the function does, how it works, examples, what it takes, and what it returns. It's important to write your documentation FIRST, so you know what code you need to write.

In [18]:
def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval

    Examples
    --------

    data = [4,3,2,5]
    center, width = conf_interval(data)
    print('The mean is {} +/- {}'.format(center, width))

    Parameters
    ----------
    data : list
        The list of data points
    interval_type : str
        What kind of confidence interval. Can be double, upper, lower.
    confidence : float
        The confidence of the interval
    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval.
        If a lower or upper is specified, width is the upper or lower value.
    '''
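One optional refinement, sketched here on a much simpler function: if the Examples section is written with >>> prompts, the standard-library doctest module can run the examples and confirm the documentation still matches the code. The square function below is a stand-in, not part of the confidence interval code.

```python
import doctest

def square(x):
    '''Return x multiplied by itself.

    Examples
    --------
    >>> square(3)
    9
    >>> square(-2.0)
    4.0

    Parameters
    ----------
    x : float
        The number to square

    Returns
    -------
    float
        The square of x
    '''
    return x * x

# Collect the >>> examples from the docstring and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for doctest_case in finder.find(square, 'square', globs={'square': square}):
    runner.run(doctest_case)

# results.failed counts examples whose output did not match the docstring.
results = runner.summarize(verbose=False)
print(results)
```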


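## 2. Develop

Let's first try to compute just a double-sided confidence interval. Note that a t-based interval for a mean conventionally uses len(data) - 1 degrees of freedom; the cells below pass len(data) to ss.t.ppf, so their intervals come out slightly narrower than the textbook ones.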

In [19]:
import scipy.stats as ss
import numpy as np

data = [4,3,5,3,6, 7]
interval_type = 'double'
confidence = 0.95

center = np.mean(data)
s = np.std(data, ddof=1)
ppf = 1 - (1 - confidence) / 2
t = ss.t.ppf(ppf, len(data))
width = s / np.sqrt(len(data)) * t

print(center, width, ppf)

4.66666666667 1.63127456586 0.975
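Written as a formula, the cell above computes the following, where $n$ is len(data), $\bar{x}$ is the sample mean, $s$ is the sample standard deviation (ddof=1), $c$ is the confidence, and $\nu$ is the degrees of freedom handed to ss.t.ppf (the cell passes $\nu = n$):

$$
p = 1 - \frac{1 - c}{2}, \qquad
w = t_{p,\nu}\,\frac{s}{\sqrt{n}}, \qquad
\text{interval} = \bar{x} \pm w
$$

With $c = 0.95$ this gives $p = 0.975$, which is the third number printed.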


Now let's try adding some logic for the interval_type of confidence interval

In [20]:
interval_type = 'lower'
if interval_type == 'lower':
    ppf = confidence
t = ss.t.ppf(ppf, len(data))
top = s / np.sqrt(len(data)) * t
print(center, top)

4.66666666667 1.29545352026


The lower confidence interval should run from negative infinity to a value above the mean, so we need to add the mean to the number we computed. We need to adjust the code.

In [21]:
interval_type = 'lower'
if interval_type == 'lower':
    ppf = confidence
t = ss.t.ppf(ppf, len(data))
top = s / np.sqrt(len(data)) * t
print(center, center + top)

4.66666666667 5.96212018693

In [22]:
interval_type = 'upper'
if interval_type == 'upper':
    ppf = 1 - confidence
t = ss.t.ppf(ppf, len(data))
top = s / np.sqrt(len(data)) * t
print(center, center + top)

4.66666666667 3.3712131464


We can see there is quite a bit of repeated code. Let's try to put the whole thing together without repeats.

In [23]:
import scipy.stats as ss
import numpy as np

data = [4,3,5,3,6, 7]
interval_type = 'lower'
confidence = 0.95

center = np.mean(data)
s = np.std(data, ddof=1)
if interval_type == 'lower':
    ppf = confidence
elif interval_type == 'upper':
    ppf = 1 - confidence
else:
    ppf = 1 - (1 - confidence) / 2
t = ss.t.ppf(ppf, len(data))
width = s / np.sqrt(len(data)) * t

if interval_type == 'lower' or interval_type == 'upper':
    width = width + center

print(center, width, ppf)

4.66666666667 5.96212018693 0.95
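Another way to remove the repetition, shown only as a sketch, is to pull the choice of probability into its own small function. The name tail_probability is hypothetical. Unlike the if/elif/else chain, which treats any unrecognized interval_type as 'double', the dictionary lookup raises a KeyError for a name it does not know.

```python
def tail_probability(interval_type, confidence):
    '''Return the probability to pass to the t-distribution ppf.'''
    lookup = {
        'lower': confidence,
        'upper': 1 - confidence,
        'double': 1 - (1 - confidence) / 2,
    }
    # An unknown interval_type raises KeyError instead of silently
    # falling back to the double-sided case.
    return lookup[interval_type]

for kind in ['lower', 'upper', 'double']:
    print(kind, tail_probability(kind, 0.95))
```

Keeping this decision in one place means the statistics code after it never has to look at interval_type to pick a probability.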


## 3. Deploy

Let's put everything together now into a file.

In [24]:
%%writefile utilities.py

import scipy.stats as ss
import numpy as np

def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval

    Examples
    --------

    data = [4,3,2,5]
    center, width = conf_interval(data)
    print('The mean is {} +/- {}'.format(center, width))

    Parameters
    ----------
    data : list
        The list of data points
    interval_type : str
        What kind of confidence interval. Can be double, upper, lower.
    confidence : float
        The confidence of the interval
    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval.
        If a lower or upper is specified, width is the upper or lower value.
    '''

    center = np.mean(data)
    s = np.std(data, ddof=1)
    if interval_type == 'lower':
        ppf = confidence
    elif interval_type == 'upper':
        ppf = 1 - confidence
    else:
        ppf = 1 - (1 - confidence) / 2
    t = ss.t.ppf(ppf, len(data))
    width = s / np.sqrt(len(data)) * t

    if interval_type == 'lower' or interval_type == 'upper':
        width = center + width
    return center, width

Overwriting utilities.py

In [25]:
import utilities
reload(utilities)

Out[25]:
<module 'utilities' from '/home/jovyan/work/utilities.py'>

I wrote some example code with the documentation. Let's see if it works

In [26]:
data = [4,3,2,5]
center, width = utilities.conf_interval(data)
print('The mean is {} +/- {}'.format(center, width))

The mean is 3.5 +/- 1.792187609015029


## 4. Test

Let's now test the code for a few different cases

In [27]:
#see if it recovers the correct mean
data = ss.norm.rvs(size=1000, loc=12.4)
print(utilities.conf_interval(data))

(12.413339150292749, 0.062443365827491451)

In [28]:
#see if it can handle upper/lower
print(utilities.conf_interval(data, 'upper'))

(12.413339150292749, 12.360949919612446)

In [29]:
print(utilities.conf_interval(data, 'lower'))

(12.413339150292749, 12.465728380973053)

In [30]:
#Check different confidence values
print(utilities.conf_interval(data, confidence=0.75))

(12.413339150292749, 0.036626408883252866)


## 5. Break it

In [31]:
utilities.conf_interval(data, confidence=95)

Out[31]:
(12.413339150292749, nan)

This is a pretty common mistake: passing the confidence as a percent instead of a probability. We should probably check that confidence is a valid probability.

In [32]:
utilities.conf_interval([3], confidence=0.5)

/opt/conda/lib/python3.5/site-packages/numpy/core/_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice
warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
/opt/conda/lib/python3.5/site-packages/numpy/core/_methods.py:116: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)

Out[32]:
(3.0, nan)

Uh-oh, only one value was given. We should probably warn if there are not enough values.

In [33]:
%%writefile utilities.py

import scipy.stats as ss
import numpy as np

def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval

    Examples
    --------

    data = [4,3,2,5]
    center, width = conf_interval(data)
    print('The mean is {} +/- {}'.format(center, width))

    Parameters
    ----------
    data : list
        The list of data points
    interval_type : str
        What kind of confidence interval. Can be double, upper, lower.
    confidence : float
        The confidence of the interval
    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval.
        If a lower or upper is specified, width is the upper or lower value.
    '''

    if(len(data) < 3):
        print('Not enough data given. Must have at least 3 values')

    center = np.mean(data)
    s = np.std(data, ddof=1)
    if interval_type == 'lower':
        ppf = confidence
    elif interval_type == 'upper':
        ppf = 1 - confidence
    else:
        ppf = 1 - (1 - confidence) / 2
    t = ss.t.ppf(ppf, len(data))
    width = s / np.sqrt(len(data)) * t

    if interval_type == 'lower' or interval_type == 'upper':
        width = center + width
    return center, width

Overwriting utilities.py

In [34]:
reload(utilities)
utilities.conf_interval([3])

Not enough data given. Must have at least 3 values

/opt/conda/lib/python3.5/site-packages/numpy/core/_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice
warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
/opt/conda/lib/python3.5/site-packages/numpy/core/_methods.py:116: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)

Out[34]:
(3.0, nan)

Ah, but notice it didn't actually stop the program!

## Exceptions

What we need is to produce one of those error messages you see a lot, the kind that stops the program. We can do that by raising an exception.

In [35]:
raise RuntimeError('This is a problem')

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-35-3649338a9130> in <module>()
----> 1 raise RuntimeError('This is a problem')

RuntimeError: This is a problem
In [36]:
raise ValueError('Your value is bad and you should feel bad')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-36-8f472bdb8b36> in <module>()

ValueError: Your value is bad and you should feel bad
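To see that raise really does stop execution, here is a small sketch that catches the exception with try/except. The function risky_step is hypothetical and exists only for this demonstration.

```python
def risky_step():
    raise ValueError('Your value is bad and you should feel bad')
    print('This line never runs')  # unreachable: raise exits the function

reached_end = False
try:
    risky_step()
    reached_end = True  # skipped, because the exception jumps to except
except ValueError as error:
    caught_message = str(error)
    print('Caught:', caught_message)

print('Reached the end of the try block?', reached_end)
```

This is the difference from print: code after a raise is skipped until something catches the exception, and if nothing does, the program stops with a traceback.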
In [37]:
%%writefile utilities.py

import scipy.stats as ss
import numpy as np

def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval

    Examples
    --------

    data = [4,3,2,5]
    center, width = conf_interval(data)
    print('The mean is {} +/- {}'.format(center, width))

    Parameters
    ----------
    data : list
        The list of data points
    interval_type : str
        What kind of confidence interval. Can be double, upper, lower.
    confidence : float
        The confidence of the interval
    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval.
        If a lower or upper is specified, width is the upper or lower value.
    '''

    if(len(data) < 3):
        raise ValueError('Not enough data given. Must have at least 3 values')

    center = np.mean(data)
    s = np.std(data, ddof=1)
    if interval_type == 'lower':
        ppf = confidence
    elif interval_type == 'upper':
        ppf = 1 - confidence
    else:
        ppf = 1 - (1 - confidence) / 2
    t = ss.t.ppf(ppf, len(data))
    width = s / np.sqrt(len(data)) * t

    if interval_type == 'lower' or interval_type == 'upper':
        width = center + width
    return center, width

Overwriting utilities.py

In [38]:
reload(utilities)
utilities.conf_interval([3])

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-38-7af562286d5c> in <module>()
----> 2 utilities.conf_interval([3])

/home/jovyan/work/utilities.py in conf_interval(data, interval_type, confidence)
29
30     if(len(data) < 3):
---> 31         raise ValueError('Not enough data given. Must have at least 3 values')
32
33     center = np.mean(data)

ValueError: Not enough data given. Must have at least 3 values
In [39]:
%%writefile utilities.py

import scipy.stats as ss
import numpy as np

def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval

    Examples
    --------

    data = [4,3,2,5]
    center, width = conf_interval(data)
    print('The mean is {} +/- {}'.format(center, width))

    Parameters
    ----------
    data : list
        The list of data points
    interval_type : str
        What kind of confidence interval. Can be double, upper, lower.
    confidence : float
        The confidence of the interval
    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval.
        If a lower or upper is specified, width is the upper or lower value.
    '''

    if(len(data) < 3):
        raise ValueError('Not enough data given. Must have at least 3 values')
    if(interval_type not in ['upper', 'lower', 'double']):
        raise ValueError('I do not know how to make a {} confidence interval'.format(interval_type))
    if(0 > confidence or confidence > 1):
        raise ValueError('Confidence must be between 0 and 1')

    center = np.mean(data)
    s = np.std(data, ddof=1)
    if interval_type == 'lower':
        ppf = confidence
    elif interval_type == 'upper':
        ppf = 1 - confidence
    else:
        ppf = 1 - (1 - confidence) / 2
    t = ss.t.ppf(ppf, len(data))
    width = s / np.sqrt(len(data)) * t

    if interval_type == 'lower' or interval_type == 'upper':
        width = center + width
    return center, width

Overwriting utilities.py

In [40]:
reload(utilities)
utilities.conf_interval([3])

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-40-7af562286d5c> in <module>()
----> 2 utilities.conf_interval([3])

/home/jovyan/work/utilities.py in conf_interval(data, interval_type, confidence)
29
30     if(len(data) < 3):
---> 31         raise ValueError('Not enough data given. Must have at least 3 values')
32     if(interval_type not in ['upper', 'lower', 'double']):
33         raise ValueError('I do not know how to make a {} confidence interval'.format(interval_type))

ValueError: Not enough data given. Must have at least 3 values
In [41]:
utilities.conf_interval([3,4,32], confidence=95)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-41-51802216abd6> in <module>()
----> 1 utilities.conf_interval([3,4,32], confidence=95)

/home/jovyan/work/utilities.py in conf_interval(data, interval_type, confidence)
33         raise ValueError('I do not know how to make a {} confidence interval'.format(interval_type))
34     if(0 > confidence or confidence > 1):
---> 35         raise ValueError('Confidence must be between 0 and 1')
36
37     center = np.mean(data)

ValueError: Confidence must be between 0 and 1
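Rather than triggering each error by hand, the bad inputs can be collected and checked in a loop. The sketch below copies the three checks into a hypothetical helper called check_inputs so that it runs without scipy; in practice you would call utilities.conf_interval directly inside the try block.

```python
def check_inputs(data, interval_type, confidence):
    '''Hypothetical helper holding the same three checks as conf_interval.'''
    if len(data) < 3:
        raise ValueError('Not enough data given. Must have at least 3 values')
    if interval_type not in ['upper', 'lower', 'double']:
        raise ValueError('I do not know how to make a {} confidence interval'.format(interval_type))
    if 0 > confidence or confidence > 1:
        raise ValueError('Confidence must be between 0 and 1')

bad_calls = [
    ([3], 'double', 0.95),           # too few values
    ([3, 4, 32], 'sideways', 0.95),  # unknown interval type
    ([3, 4, 32], 'double', 95),      # percent instead of probability
]

rejected = 0
for data, interval_type, confidence in bad_calls:
    try:
        check_inputs(data, interval_type, confidence)
    except ValueError as error:
        rejected += 1
        print('Rejected:', error)

# A valid call should pass through silently.
check_inputs([3, 4, 32], 'double', 0.95)
print(rejected, 'of', len(bad_calls), 'bad calls were rejected')
```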

Now we'll learn how to put all our files together into a package that we can always use.

You need to arrange your files and folders in a special way. Let's say I'm putting all my functions together into a package called che116. I need to arrange it like this:

che116-package/                   <-- the top directory
    setup.py                      <-- the file which gives info about the package
    che116/                       <-- a folder where the code is stored
        __init__.py               <-- a completely empty file. The name is important
        stats.py                  <-- where I would put some functions related to stats
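If you would rather create that layout from Python than by hand, here is a minimal sketch using pathlib. It builds the tree inside a scratch directory (an assumption made so the cell is safe to run anywhere); swap in your own location to use it for real.

```python
import tempfile
from pathlib import Path

# A scratch directory stands in for wherever you keep your work.
root = Path(tempfile.mkdtemp()) / 'che116-package'
code_dir = root / 'che116'
code_dir.mkdir(parents=True)

# Create empty placeholder files; their contents get filled in later.
for path in [root / 'setup.py', code_dir / '__init__.py', code_dir / 'stats.py']:
    path.touch()

# List everything that was created, relative to the top directory.
created = sorted(p.relative_to(root).as_posix() for p in root.rglob('*'))
print(created)
```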

Here are the contents of the three files we need to make. NOTE: You need to create the folders above before you can run this. Change the path after %%writefile to match where you want each file.

In [43]:
%%writefile unit_15/che116-package/setup.py

from setuptools import setup

setup(name = 'che116', #the name for install purposes
      author = 'Andrew White', #for your own info
      description = 'Some stuff I wrote for CHE 116', #displayed when install/update
      version='1.0',
      packages=['che116']) #This name should match the directory where you put your code

Overwriting unit_15/che116-package/setup.py

In [44]:
%%writefile unit_15/che116-package/che116/__init__.py
'''You can put some comments in here if you want. They should describe the package.'''

Overwriting unit_15/che116-package/che116/__init__.py

In [45]:
%%writefile unit_15/che116-package/che116/stats.py

import scipy.stats as ss
import numpy as np

def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval

    Examples
    --------

    data = [4,3,2,5]
    center, width = conf_interval(data)
    print('The mean is {} +/- {}'.format(center, width))

    Parameters
    ----------
    data : list
        The list of data points
    interval_type : str
        What kind of confidence interval. Can be double, upper, lower.
    confidence : float
        The confidence of the interval
    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval.
        If a lower or upper is specified, width is the upper or lower value.
    '''

    if(len(data) < 3):
        raise ValueError('Not enough data given. Must have at least 3 values')
    if(interval_type not in ['upper', 'lower', 'double']):
        raise ValueError('I do not know how to make a {} confidence interval'.format(interval_type))
    if(0 > confidence or confidence > 1):
        raise ValueError('Confidence must be between 0 and 1')

    center = np.mean(data)
    s = np.std(data, ddof=1)
    if interval_type == 'lower':
        ppf = confidence
    elif interval_type == 'upper':
        ppf = 1 - confidence
    else:
        ppf = 1 - (1 - confidence) / 2
    t = ss.t.ppf(ppf, len(data))
    width = s / np.sqrt(len(data)) * t

    if interval_type == 'lower' or interval_type == 'upper':
        width = center + width
    return center, width

Overwriting unit_15/che116-package/che116/stats.py


Once you're done, run pip install -e [path to your folder], where the path is the directory where you put the setup.py file. The -e means editable: pip links to your folder instead of copying it, so if you edit any of the above files, you do not need to reinstall.

In [46]:
%system pip install -e unit_15/che116-package

Out[46]:
['Obtaining file:///home/jovyan/work/unit_15/che116-package',
'Installing collected packages: che116',
'  Running setup.py develop for che116',
'Successfully installed che116-1.0']
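One way to confirm the install was registered, sketched with the standard-library importlib.metadata module (available in Python 3.8 and later). The helper name installed_version is hypothetical; it returns None instead of raising when the package is not found.

```python
from importlib import metadata

def installed_version(name):
    '''Return the installed version string of a package, or None if absent.'''
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

# Prints the version from setup.py if che116 is installed, otherwise None.
print(installed_version('che116'))
```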
In [47]:
#YOU MUST RESTART KERNEL FIRST TIME THROUGH
#after install + restart, you'll always have your package available
import che116.stats as cs

cs.conf_interval([4,3,4])

Out[47]:
(3.6666666666666665, 4.7274821017614208)
In [48]:
import che116
help(che116)

Help on package che116:

NAME
che116 - You can put some comments in here if you want. They should describe the package.

PACKAGE CONTENTS
stats

FILE
/home/jovyan/work/unit_15/che116-package/che116/__init__.py


In [49]:
help(che116.stats)

Help on module che116.stats in che116:

NAME
che116.stats

FUNCTIONS
    conf_interval(data, interval_type='double', confidence=0.95)
        This function takes in the data and computes a confidence interval

        Examples
        --------

        data = [4,3,2,5]
        center, width = conf_interval(data)
        print('The mean is {} +/- {}'.format(center, width))

        Parameters
        ----------
        data : list
            The list of data points
        interval_type : str
            What kind of confidence interval. Can be double, upper, lower.
        confidence : float
            The confidence of the interval
        Returns
        -------
        center, width
            Center is the mean of the data. Width is the width of the confidence interval.
            If a lower or upper is specified, width is the upper or lower value.

FILE
/home/jovyan/work/unit_15/che116-package/che116/stats.py