Writing Modules and Functions

Unit 14, Lecture 2

Numerical Methods and Statistics


May 1, 2018

Writing Good, Reliable Documented Functions

We're going to focus now on what goes into writing a good Python function. If you want your function to be reusable, you need to store it in a text file whose name ends in .py. We can do this from a notebook using the %%writefile cell magic. Let's see an example:

In [50]:
%%writefile test.py

def hello_world():
    print('Hello World')
Overwriting test.py
In [12]:
import test

test.hello_world()
Hello World

If you look in the file system, you'll see we have a file called test.py. If it's in the same directory as your notebook, you can get everything from the test file using import. Here are some examples for when it's somewhere else:

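  1. If test.py is in the parent directory and your code lives inside a package: from .. import test. Relative imports like this only work inside a package; from a notebook you would instead add the parent directory to sys.path.
  2. If test.py is in a subdirectory called sub: import sub.test. For that to work reliably, put a file called __init__.py (it can be empty) inside the sub folder.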
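
As a rough sketch of the subdirectory case, the cell below builds a throwaway folder containing sub/__init__.py and sub/test.py and then imports from it. The temporary folder and the sys.path line are assumptions made so the example runs anywhere; in a notebook, the notebook's own folder is normally already searched.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway project folder containing sub/__init__.py and sub/test.py
project = Path(tempfile.mkdtemp())
sub_dir = project / 'sub'
sub_dir.mkdir()
(sub_dir / '__init__.py').write_text('')  # marks sub as a package
(sub_dir / 'test.py').write_text("def hello_world():\n    return 'Hello World'\n")

# Python looks for modules in the folders listed in sys.path
sys.path.insert(0, str(project))
importlib.invalidate_caches()

import sub.test

message = sub.test.hello_world()
print(message)
```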

Modules

This file we've created is called a module, just like the math or numpy module. We can have multiple functions inside the module as well as variables.

In [13]:
%%writefile test.py

pi = 3.0

def square(x):
    return x*x


def hello_world():
    print('Hello World')
Overwriting test.py
In [14]:
import test
print('pi is exactly {}'.format(test.pi))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-14-c48ab7ab36ef> in <module>()
      1 import test
----> 2 print('pi is exactly {}'.format(test.pi))

AttributeError: module 'test' has no attribute 'pi'

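Uh-oh! It is using an outdated version of test.py. Python only runs a module's file the first time it is imported, so later edits to the file are not picked up. To get Python to reload it, we can restart the kernel or use the reload function.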

In [15]:
from importlib import reload
reload(test)
print('pi is exactly {}'.format(test.pi))
pi is exactly 3.0
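
One thing worth knowing about reload, sketched below under some assumptions: names copied out with "from module import name" are not updated, only attributes looked up through the module are. The module name reload_demo and the temporary folder are made up for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # keep this quick demo free of cached bytecode

# Hypothetical module, written to a temporary folder that we add to sys.path
folder = Path(tempfile.mkdtemp())
sys.path.insert(0, str(folder))
module_file = folder / 'reload_demo.py'

module_file.write_text('pi = 3.0\n')
importlib.invalidate_caches()
import reload_demo
from reload_demo import pi  # a separate name bound to the current value

# Edit the file, then ask Python to run it again
module_file.write_text('pi = 3.14159\n')
importlib.reload(reload_demo)

print(reload_demo.pi)  # looked up through the module: sees the new value
print(pi)              # copied earlier: still the old value
```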

Documenting

You can add helpful documentation at the module (top of file) and function level

In [16]:
%%writefile test.py
'''This module contains nonsense'''

pi = 3.0

def square(x):
    '''Want to square a number? This function will help'''
    return x*x


def hello_world():
    print('Hello World')
Overwriting test.py
In [17]:
reload(test)
help(test)
Help on module test:

NAME
    test - This module contains nonsense

FUNCTIONS
    hello_world()
    
    square(x)
        Want to square a number? This function will help

DATA
    pi = 3.0

FILE
    /home/jovyan/work/test.py
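
The text that help shows comes from the docstring, which Python stores on the object itself. A minimal sketch, using a local copy of the square function from above rather than importing test.py:

```python
import inspect


def square(x):
    '''Want to square a number? This function will help'''
    return x * x


# The raw docstring is stored in the __doc__ attribute
raw_doc = square.__doc__
# inspect.getdoc returns the same text with indentation cleaned up
clean_doc = inspect.getdoc(square)

print(raw_doc)
print(clean_doc)
```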


Writing a good function

The reason for creating a module like test.py is to write a function once and for all so you don't need to copy-pasta. Let's try this for confidence intervals of data. Here are the steps:

  1. Document what your function should do (plan)
  2. Get basic functionality working in a notebook (develop)
  3. Move function to a file and import (deploy)
  4. Write some cells in a notebook to test basic cases until you have everything working (test)
  5. Finally polish off your code by testing bad inputs and trying to break it (more testing)

Example: Writing a function to compute confidence intervals

Let's see this in action for computing confidence intervals

1. Plan

I'll start by writing out the documentation. I'll use the NumPy docstring style, which Sphinx's Napoleon extension can read. This is more complex than what we've seen before: we specify what the function does, examples, what it takes, and what it returns. It's important to write your documentation FIRST, so you know what you need to write.

In [18]:
def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval
    
    Examples
    --------

        data = [4,3,2,5]
        center, width = conf_interval(data)
        print('The mean is {} +/- {}'.format(center, width))
    
    Parameters
    ----------
        data : list
            The list of data points
        interval_type : str
            What kind of confidence interval. Can be double, upper, lower.
        confidence : float
            The confidence of the interval
    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval. 
        If a lower or upper is specified, width is the upper or lower value.
    '''

2. Develop

Let's try first of all to compute just a double-sided confidence interval

In [19]:
import scipy.stats as ss
import numpy as np

data = [4,3,5,3,6, 7]
interval_type = 'double'
confidence = 0.95

center = np.mean(data)
s = np.std(data, ddof=1)
ppf = 1 - (1 - confidence) / 2
t = ss.t.ppf(ppf, len(data))  # note: the standard t interval uses len(data) - 1 degrees of freedom
width = s / np.sqrt(len(data)) * t

print(center, width, ppf)
4.66666666667 1.63127456586 0.975

Now let's try adding some logic for the type of confidence interval, which is set by interval_type

In [20]:
interval_type = 'lower'
if interval_type == 'lower':
    ppf = confidence
    t = ss.t.ppf(ppf, len(data))
    top = s / np.sqrt(len(data)) * t
    print(center, top)
4.66666666667 1.29545352026

The lower confidence interval should run from negative infinity to a value above the mean, but we only printed the distance from the mean. We need to adjust the code to add the mean.

In [21]:
interval_type = 'lower'
if interval_type == 'lower':
    ppf = confidence
    t = ss.t.ppf(ppf, len(data))
    top = s / np.sqrt(len(data)) * t
    print(center, center + top)
4.66666666667 5.96212018693
In [22]:
interval_type = 'upper'
if interval_type == 'upper':
    ppf = 1 - confidence
    t = ss.t.ppf(ppf, len(data))
    top = s / np.sqrt(len(data)) * t
    print(center, center + top)
4.66666666667 3.3712131464

We can see there is quite a bit of repeated code. Let's try to put the whole thing together without repeats

In [23]:
import scipy.stats as ss
import numpy as np

data = [4,3,5,3,6, 7]
interval_type = 'lower'
confidence = 0.95

center = np.mean(data)
s = np.std(data, ddof=1)
if interval_type == 'lower':
    ppf = confidence
elif interval_type == 'upper':
    ppf = 1 - confidence
else:
    ppf = 1 - (1 - confidence) / 2
t = ss.t.ppf(ppf, len(data))
width = s / np.sqrt(len(data)) * t

if interval_type == 'lower' or interval_type == 'upper':
    width = width + center

print(center, width, ppf)
4.66666666667 5.96212018693 0.95
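
As an alternative to the if/elif chain, the choice of ppf could also be written as a dictionary lookup. This is only a sketch of one possible style, not the version used in the rest of the lecture, and the function name tail_probability is made up for illustration.

```python
def tail_probability(interval_type, confidence):
    '''Hypothetical helper: the probability to pass to the t-distribution ppf'''
    lookup = {
        'lower': confidence,
        'upper': 1 - confidence,
        'double': 1 - (1 - confidence) / 2,
    }
    # An unknown interval_type raises KeyError here instead of silently
    # falling through to the double-sided case
    return lookup[interval_type]


for kind in ['double', 'lower', 'upper']:
    print(kind, tail_probability(kind, 0.95))
```

A side effect of the lookup is that a misspelled interval_type fails immediately, whereas the else branch above treats anything unrecognized as double.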

3. Deploy

Let's put everything together now into a file

In [24]:
%%writefile utilities.py

import scipy.stats as ss
import numpy as np

def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval
    
    Examples
    --------

        data = [4,3,2,5]
        center, width = conf_interval(data)
        print('The mean is {} +/- {}'.format(center, width))
    
    Parameters
    ----------
        data : list
            The list of data points
        interval_type : str
            What kind of confidence interval. Can be double, upper, lower.
        confidence : float
            The confidence of the interval
    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval. 
        If a lower or upper is specified, width is the upper or lower value.
    '''

    center = np.mean(data)
    s = np.std(data, ddof=1)
    if interval_type == 'lower':
        ppf = confidence
    elif interval_type == 'upper':
        ppf = 1 - confidence
    else:
        ppf = 1 - (1 - confidence) / 2
    t = ss.t.ppf(ppf, len(data))
    width = s / np.sqrt(len(data)) * t
    
    if interval_type == 'lower' or interval_type == 'upper':
        width = center + width
    return center, width
Overwriting utilities.py
In [25]:
import utilities
reload(utilities)
Out[25]:
<module 'utilities' from '/home/jovyan/work/utilities.py'>

I wrote some example code with the documentation. Let's see if it works

In [26]:
data = [4,3,2,5]
center, width = utilities.conf_interval(data)
print('The mean is {} +/- {}'.format(center, width))
The mean is 3.5 +/- 1.792187609015029

4. Test

Let's now test the code for a few different cases

In [27]:
#see if it recovers the correct mean
data = ss.norm.rvs(size=1000, loc=12.4)
print(utilities.conf_interval(data))
(12.413339150292749, 0.062443365827491451)
In [28]:
#see if it can handle upper/lower
print(utilities.conf_interval(data, 'upper'))
(12.413339150292749, 12.360949919612446)
In [29]:
print(utilities.conf_interval(data, 'lower'))
(12.413339150292749, 12.465728380973053)
In [30]:
#Check different confidence values
print(utilities.conf_interval(data, confidence=0.75))
(12.413339150292749, 0.036626408883252866)

5. Break it

In [31]:
utilities.conf_interval(data, confidence=95)
Out[31]:
(12.413339150292749, nan)

This is a pretty common mistake. We should probably check that confidence is a valid probability.

In [32]:
utilities.conf_interval([3], confidence=0.5)
/opt/conda/lib/python3.5/site-packages/numpy/core/_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice
  warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
/opt/conda/lib/python3.5/site-packages/numpy/core/_methods.py:116: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Out[32]:
(3.0, nan)

Uh-oh, only one value was given. We should probably warn if there are not enough values.

In [33]:
%%writefile utilities.py

import scipy.stats as ss
import numpy as np

def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval
    
    Examples
    --------

        data = [4,3,2,5]
        center, width = conf_interval(data)
        print('The mean is {} +/- {}'.format(center, width))
    
    Parameters
    ----------
        data : list
            The list of data points
        interval_type : str
            What kind of confidence interval. Can be double, upper, lower.
        confidence : float
            The confidence of the interval
    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval. 
        If a lower or upper is specified, width is the upper or lower value.
    '''
    
    if(len(data) < 3):
        print('Not enough data given. Must have at least 3 values')

    center = np.mean(data)
    s = np.std(data, ddof=1)
    if interval_type == 'lower':
        ppf = confidence
    elif interval_type == 'upper':
        ppf = 1 - confidence
    else:
        ppf = 1 - (1 - confidence) / 2
    t = ss.t.ppf(ppf, len(data))
    width = s / np.sqrt(len(data)) * t
    
    if interval_type == 'lower' or interval_type == 'upper':
        width = center + width
    return center, width
Overwriting utilities.py
In [34]:
reload(utilities)
utilities.conf_interval([3])
Not enough data given. Must have at least 3 values
/opt/conda/lib/python3.5/site-packages/numpy/core/_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice
  warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
/opt/conda/lib/python3.5/site-packages/numpy/core/_methods.py:116: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Out[34]:
(3.0, nan)

Ah, but notice it didn't actually stop the program!

Exceptions

What we need is one of those error messages you see a lot, the kind that stops the program. We can do that by raising an exception

In [35]:
raise RuntimeError('This is a problem')
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-35-3649338a9130> in <module>()
----> 1 raise RuntimeError('This is a problem')

RuntimeError: This is a problem
In [36]:
raise ValueError('Your value is bad and you should feel bad')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-36-8f472bdb8b36> in <module>()
----> 1 raise ValueError('Your value is bad and you should feel bad')

ValueError: Your value is bad and you should feel bad
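
An exception stops the program unless the calling code chooses to handle it. The sketch below shows the caller's side with try/except; check_enough_data is a made-up helper for illustration, not part of utilities.py.

```python
def check_enough_data(data):
    '''Hypothetical helper: refuse to continue with too few values'''
    if len(data) < 3:
        raise ValueError('Not enough data given. Must have at least 3 values')
    return len(data)


try:
    check_enough_data([3])
    message = None  # only reached if no exception was raised
except ValueError as err:
    # Execution jumps here as soon as the exception is raised
    message = str(err)
    print('Caught a problem:', message)
```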
In [37]:
%%writefile utilities.py

import scipy.stats as ss
import numpy as np

def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval
    
    Examples
    --------

        data = [4,3,2,5]
        center, width = conf_interval(data)
        print('The mean is {} +/- {}'.format(center, width))
    
    Parameters
    ----------
        data : list
            The list of data points
        interval_type : str
            What kind of confidence interval. Can be double, upper, lower.
        confidence : float
            The confidence of the interval
    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval. 
        If a lower or upper is specified, width is the upper or lower value.
    '''
    
    if(len(data) < 3):
        raise ValueError('Not enough data given. Must have at least 3 values')

    center = np.mean(data)
    s = np.std(data, ddof=1)
    if interval_type == 'lower':
        ppf = confidence
    elif interval_type == 'upper':
        ppf = 1 - confidence
    else:
        ppf = 1 - (1 - confidence) / 2
    t = ss.t.ppf(ppf, len(data))
    width = s / np.sqrt(len(data)) * t
    
    if interval_type == 'lower' or interval_type == 'upper':
        width = center + width
    return center, width
Overwriting utilities.py
In [38]:
reload(utilities)
utilities.conf_interval([3])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-38-7af562286d5c> in <module>()
      1 reload(utilities)
----> 2 utilities.conf_interval([3])

/home/jovyan/work/utilities.py in conf_interval(data, interval_type, confidence)
     29 
     30     if(len(data) < 3):
---> 31         raise ValueError('Not enough data given. Must have at least 3 values')
     32 
     33     center = np.mean(data)

ValueError: Not enough data given. Must have at least 3 values
In [39]:
%%writefile utilities.py

import scipy.stats as ss
import numpy as np

def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval
    
    Examples
    --------

        data = [4,3,2,5]
        center, width = conf_interval(data)
        print('The mean is {} +/- {}'.format(center, width))
    
    Parameters
    ----------
        data : list
            The list of data points
        interval_type : str
            What kind of confidence interval. Can be double, upper, lower.
        confidence : float
            The confidence of the interval
    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval. 
        If a lower or upper is specified, width is the upper or lower value.
    '''
    
    if(len(data) < 3):
        raise ValueError('Not enough data given. Must have at least 3 values')
    if(interval_type not in ['upper', 'lower', 'double']):
        raise ValueError('I do not know how to make a {} confidence interval'.format(interval_type))
    if(0 > confidence or confidence > 1):
        raise ValueError('Confidence must be between 0 and 1')
    
    center = np.mean(data)
    s = np.std(data, ddof=1)
    if interval_type == 'lower':
        ppf = confidence
    elif interval_type == 'upper':
        ppf = 1 - confidence
    else:
        ppf = 1 - (1 - confidence) / 2
    t = ss.t.ppf(ppf, len(data))
    width = s / np.sqrt(len(data)) * t
    
    if interval_type == 'lower' or interval_type == 'upper':
        width = center + width
    return center, width
Overwriting utilities.py
In [40]:
reload(utilities)
utilities.conf_interval([3])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-40-7af562286d5c> in <module>()
      1 reload(utilities)
----> 2 utilities.conf_interval([3])

/home/jovyan/work/utilities.py in conf_interval(data, interval_type, confidence)
     29 
     30     if(len(data) < 3):
---> 31         raise ValueError('Not enough data given. Must have at least 3 values')
     32     if(interval_type not in ['upper', 'lower', 'double']):
     33         raise ValueError('I do not know how to make a {} confidence interval'.format(interval_type))

ValueError: Not enough data given. Must have at least 3 values
In [41]:
utilities.conf_interval([3,4,32], confidence=95)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-41-51802216abd6> in <module>()
----> 1 utilities.conf_interval([3,4,32], confidence=95)

/home/jovyan/work/utilities.py in conf_interval(data, interval_type, confidence)
     33         raise ValueError('I do not know how to make a {} confidence interval'.format(interval_type))
     34     if(0 > confidence or confidence > 1):
---> 35         raise ValueError('Confidence must be between 0 and 1')
     36 
     37     center = np.mean(data)

ValueError: Confidence must be between 0 and 1
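
Instead of reading tracebacks by eye, the "break it" step can be turned into a small loop that confirms each bad input is rejected. This is a sketch only: validate_inputs is a stand-in that copies the three checks from conf_interval so the example runs without scipy, and the 'sideways' interval type is an invented bad value.

```python
def validate_inputs(data, interval_type='double', confidence=0.95):
    '''Stand-in with the same three checks as conf_interval (no statistics)'''
    if len(data) < 3:
        raise ValueError('Not enough data given. Must have at least 3 values')
    if interval_type not in ['upper', 'lower', 'double']:
        raise ValueError('I do not know how to make a {} confidence interval'.format(interval_type))
    if 0 > confidence or confidence > 1:
        raise ValueError('Confidence must be between 0 and 1')


# Each entry should trigger exactly one of the checks above
bad_calls = [
    dict(data=[3]),
    dict(data=[3, 4, 32], interval_type='sideways'),
    dict(data=[3, 4, 32], confidence=95),
]

caught = []
for kwargs in bad_calls:
    try:
        validate_inputs(**kwargs)
    except ValueError as err:
        caught.append(str(err))
    else:
        # Reaching this branch means a bad input slipped through
        raise AssertionError('No error raised for {}'.format(kwargs))

print(len(caught), 'bad inputs were rejected')
```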

Packaging up your files

Now we'll learn how to put all our files together into a package that we can always use.

You need to arrange your files and folders in a special way. Let's say I'm putting all my functions together into a package called che116. I need to arrange it like this:

che116-package/                   <-- the top directory
    setup.py              <-- the file which gives info about the package
    che116/               <-- a folder where the code is stored
        __init__.py       <-- marks the folder as a package; it can be empty. The name is important
        stats.py          <-- where I would put some functions related to stats

Here are the contents of the three files we need to make. NOTE: You need to create the folders above before you can run this. Change the path after %%writefile to match where you want the files to go.
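
If you would rather create the folders from Python than from a file browser, something like the following sketch should work. The temporary root folder is an assumption so the example can run anywhere; in practice you would replace it with the folder you actually want, such as unit_15.

```python
import tempfile
from pathlib import Path

# Assumption: a temporary folder stands in for wherever you keep your work
root = Path(tempfile.mkdtemp())
code_dir = root / 'che116-package' / 'che116'

# parents=True also creates che116-package; exist_ok=True makes re-running safe
code_dir.mkdir(parents=True, exist_ok=True)

# List what was created, relative to the root folder
created = sorted(p.relative_to(root).as_posix() for p in root.rglob('*'))
print(created)
```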

In [43]:
%%writefile unit_15/che116-package/setup.py

from setuptools import setup

setup(name = 'che116', #the name for install purposes
      author = 'Andrew White', #for your own info
      description = 'Some stuff I wrote for CHE 116', #displayed when install/update
      version='1.0',
      packages=['che116']) #This name should match the directory where you put your code
Overwriting unit_15/che116-package/setup.py
In [44]:
%%writefile unit_15/che116-package/che116/__init__.py
'''You can put some comments in here if you want. They should describe the package.'''
Overwriting unit_15/che116-package/che116/__init__.py
In [45]:
%%writefile unit_15/che116-package/che116/stats.py


import scipy.stats as ss
import numpy as np

def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval
    
    Examples
    --------

        data = [4,3,2,5]
        center, width = conf_interval(data)
        print('The mean is {} +/- {}'.format(center, width))
    
    Parameters
    ----------
        data : list
            The list of data points
        interval_type : str
            What kind of confidence interval. Can be double, upper, lower.
        confidence : float
            The confidence of the interval
    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval. 
        If a lower or upper is specified, width is the upper or lower value.
    '''
    
    if(len(data) < 3):
        raise ValueError('Not enough data given. Must have at least 3 values')
    if(interval_type not in ['upper', 'lower', 'double']):
        raise ValueError('I do not know how to make a {} confidence interval'.format(interval_type))
    if(0 > confidence or confidence > 1):
        raise ValueError('Confidence must be between 0 and 1')
    
    center = np.mean(data)
    s = np.std(data, ddof=1)
    if interval_type == 'lower':
        ppf = confidence
    elif interval_type == 'upper':
        ppf = 1 - confidence
    else:
        ppf = 1 - (1 - confidence) / 2
    t = ss.t.ppf(ppf, len(data))
    width = s / np.sqrt(len(data)) * t
    
    if interval_type == 'lower' or interval_type == 'upper':
        width = center + width
    return center, width
Overwriting unit_15/che116-package/che116/stats.py

Installing your package

Once you're done, run pip install -e [path to your folder], where the path is the directory where you put the setup.py file. The -e means editable: if you edit any of the above files, you do not need to reinstall

In [46]:
%system pip install -e unit_15/che116-package
Out[46]:
['Obtaining file:///home/jovyan/work/unit_15/che116-package',
 'Installing collected packages: che116',
 '  Running setup.py develop for che116',
 'Successfully installed che116-1.0']
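
To check from a fresh kernel whether Python can find the package, one option is importlib.util.find_spec, sketched below. The helper name is_installed is made up for illustration.

```python
import importlib.util


def is_installed(package_name):
    '''Hypothetical helper: True if Python can find a top-level package with this name'''
    return importlib.util.find_spec(package_name) is not None


print(is_installed('json'))    # part of the standard library
print(is_installed('che116'))  # depends on whether the install above has been done
```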
In [47]:
#YOU MUST RESTART THE KERNEL THE FIRST TIME THROUGH
#after install + restart, you'll always have your package available
import che116.stats as cs

cs.conf_interval([4,3,4])
Out[47]:
(3.6666666666666665, 4.7274821017614208)
In [48]:
import che116  # "import che116.stats as cs" only defines the name cs
help(che116)
Help on package che116:

NAME
    che116 - You can put some comments in here if you want. They should describe the package.

PACKAGE CONTENTS
    stats

FILE
    /home/jovyan/work/unit_15/che116-package/che116/__init__.py


In [49]:
help(che116.stats)
Help on module che116.stats in che116:

NAME
    che116.stats

FUNCTIONS
    conf_interval(data, interval_type='double', confidence=0.95)
        This function takes in the data and computes a confidence interval
        
        Examples
        --------
        
            data = [4,3,2,5]
            center, width = conf_interval(data)
            print('The mean is {} +/- {}'.format(center, width))
        
        Parameters
        ----------
            data : list
                The list of data points
            interval_type : str
                What kind of confidence interval. Can be double, upper, lower.
            confidence : float
                The confidence of the interval
        Returns
        -------
        center, width
            Center is the mean of the data. Width is the width of the confidence interval. 
            If a lower or upper is specified, width is the upper or lower value.

FILE
    /home/jovyan/work/unit_15/che116-package/che116/stats.py