We're going to focus now on what goes into writing a good Python function. If you want your function to be reusable, you need to store it in a text file that ends in .py. We can do this using the %%writefile cell magic. Let's see an example:
%%writefile test.py
def hello_world():
    print('Hello World')
Overwriting test.py
import test
test.hello_world()
Hello World
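%%writefile is an IPython cell magic, so it only works inside Jupyter or IPython. Outside a notebook, a minimal sketch of the same idea is to write the text to a .py file yourself and then import it. Here the file goes into a temporary directory (an assumption, so the example cannot overwrite anything) and the module name demo_hello is hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source code for a tiny module, equivalent to the %%writefile cell above
source = '''
def hello_world():
    print('Hello World')
'''

# Write the module into a temporary folder (hypothetical location)
folder = Path(tempfile.mkdtemp())
(folder / 'demo_hello.py').write_text(source)

# Make the folder searchable by import, then import the new module by name
sys.path.insert(0, str(folder))
importlib.invalidate_caches()  # tell the import system a new file appeared
import demo_hello

demo_hello.hello_world()  # prints: Hello World
```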
If you look in the file system, you'll see we have a file called test.py. If it's in the same directory as you, you can get everything from the test file using import. Here are some examples for when it's somewhere else:

test.py is in the parent directory of yours: Python has no import ..test syntax, so first add the parent directory to Python's search path with import sys and sys.path.append('..'), then use import test
test.py is in a subdirectory called sub: import sub.test. To do that, you need an empty file called __init__.py inside the sub folder. (Since Python 3.3 the import also works without it, as a "namespace package", but the empty file is the conventional choice.)

This file we've created is called a module, just like the math or numpy module. We can have multiple functions inside the module as well as variables.
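To see the subdirectory case end to end, the sketch below builds a throwaway layout in a temporary directory: a folder sub with an empty __init__.py and a module test_mod.py (a hypothetical name, chosen to avoid clashing with Python's own test package), then imports it as sub.test_mod. Putting a folder on sys.path is the same trick that makes a module in a parent directory importable.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())  # stands in for "your" directory
sub_dir = root / 'sub'
sub_dir.mkdir()
(sub_dir / '__init__.py').write_text('')  # empty file marks sub as a package
(sub_dir / 'test_mod.py').write_text('def square(x):\n    return x * x\n')

# Python searches the folders listed in sys.path; put root first
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import sub.test_mod  # binds the name sub to the package
print(sub.test_mod.square(4))  # 16
```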
%%writefile test.py
pi = 3.0

def square(x):
    return x*x

def hello_world():
    print('Hello World')
Overwriting test.py
import test
print('pi is exactly {}'.format(test.pi))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-14-c48ab7ab36ef> in <module>()
      1 import test
----> 2 print('pi is exactly {}'.format(test.pi))

AttributeError: module 'test' has no attribute 'pi'
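Uh-oh! Python is still using the outdated version of test.py it imported earlier; running import again does not re-read a module that is already loaded. To get Python to reload it, we can restart the kernel or use the reload function from importlib: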
from importlib import reload
reload(test)
print('pi is exactly {}'.format(test.pi))
pi is exactly 3.0
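To watch reload work outside a notebook, here is a sketch that writes a module, imports it, rewrites the file with a new variable (as a second %%writefile would), and reloads it. The module name demo_reload and the temporary folder are assumptions for the example; bytecode caching is switched off so the rewritten source is always re-read.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # avoid a stale .pyc if the file changes quickly
folder = Path(tempfile.mkdtemp())
path = folder / 'demo_reload.py'
sys.path.insert(0, str(folder))

# First version of the module: no pi yet
path.write_text('def square(x):\n    return x * x\n')
importlib.invalidate_caches()
import demo_reload
print(hasattr(demo_reload, 'pi'))  # False

# Edit the file, then ask Python to re-read it
path.write_text('pi = 3.0\n\ndef square(x):\n    return x * x\n')
demo_reload = importlib.reload(demo_reload)
print(demo_reload.pi)  # 3.0
```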
You can add helpful documentation, called docstrings, at the module level (top of the file) and at the function level:
%%writefile test.py
'''This module contains nonsense'''
pi = 3.0

def square(x):
    '''Want to square a number? This function will help'''
    return x*x

def hello_world():
    print('Hello World')
Overwriting test.py
reload(test)
help(test)
Help on module test:

NAME
    test - This module contains nonsense

FUNCTIONS
    hello_world()

    square(x)
        Want to square a number? This function will help

DATA
    pi = 3.0

FILE
    /home/jovyan/work/test.py
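help() builds its page from these docstrings, and the same text is available in code through the __doc__ attribute or inspect.getdoc (which also cleans up indentation). A minimal sketch with the square function from above:

```python
import inspect

def square(x):
    '''Want to square a number? This function will help'''
    return x * x

print(square.__doc__)          # the raw docstring
print(inspect.getdoc(square))  # same text, with indentation cleaned up
```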
The reason for creating a module like test.py is to write a function once and for all so you don't need to copy-pasta. Let's see this in action for computing confidence intervals.
I'll start by writing out the documentation. I'll use the NumPy docstring style, which the Sphinx extension Napoleon can read. This is more complex than what we've seen before: we specify what the function does, examples, what it takes, and what it returns. It's important to write your documentation FIRST, so you know what you are going to write.
def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval

    Examples
    --------
    data = [4,3,2,5]
    center, width = conf_interval(data)
    print('The mean is {} +/- {}'.format(center, width))

    Parameters
    ----------
    data : list
        The list of data points
    interval_type : str
        What kind of confidence interval. Can be double, upper, lower.
    confidence : float
        The confidence of the interval

    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval.
        If a lower or upper is specified, width is the upper or lower value.
    '''
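The Examples section of a docstring can also be made executable: if the examples are written as >>> prompts followed by the expected output, the standard-library doctest module can run them and compare. A sketch with a hypothetical mean_of helper (not part of these notes):

```python
import doctest

def mean_of(data):
    '''Compute the arithmetic mean of a list of numbers

    Examples
    --------
    >>> mean_of([4, 3, 2, 5])
    3.5

    Parameters
    ----------
    data : list
        The list of data points

    Returns
    -------
    float
        The mean of the data
    '''
    return sum(data) / len(data)

# Collect the >>> examples from the docstring and run them
runner = doctest.DocTestRunner()
for test in doctest.DocTestFinder().find(mean_of, globs={'mean_of': mean_of}):
    runner.run(test)
results = runner.summarize(verbose=False)
print(results)  # TestResults(failed=0, attempted=1)
```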
First, let's compute just a double-sided confidence interval:
import scipy.stats as ss
import numpy as np
data = [4,3,5,3,6, 7]
interval_type = 'double'
confidence = 0.95
center = np.mean(data)
s = np.std(data, ddof=1)
ppf = 1 - (1 - confidence) / 2
t = ss.t.ppf(ppf, len(data))
width = s / np.sqrt(len(data)) * t
print(center, width, ppf)
4.66666666667 1.63127456586 0.975
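One way to sanity-check the formula is to compare it with scipy's built-in t.interval, which returns both ends of a double-sided interval. This sketch uses len(data) - 1 degrees of freedom, the usual choice for a t-interval on a sample mean; the cell above passes len(data), which gives a slightly narrower interval. ss.sem computes s / sqrt(n) with ddof=1.

```python
import numpy as np
import scipy.stats as ss

data = [4, 3, 5, 3, 6, 7]
confidence = 0.95
n = len(data)

center = np.mean(data)
sem = ss.sem(data)  # standard error of the mean, s / sqrt(n) with ddof=1
t = ss.t.ppf(1 - (1 - confidence) / 2, n - 1)  # n - 1 degrees of freedom
width = sem * t

# scipy's own double-sided interval: (center - width, center + width)
low, high = ss.t.interval(confidence, n - 1, loc=center, scale=sem)
print(center, width)
print(low, high)
```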
Now let's add some logic for the different values of interval_type, starting with 'lower':
interval_type = 'lower'
if interval_type == 'lower':
    ppf = confidence
t = ss.t.ppf(ppf, len(data))
top = s / np.sqrt(len(data)) * t
print(center, top)
4.66666666667 1.29545352026
That isn't right yet: a lower confidence interval should run from negative infinity to a value above the mean, so we need to add the mean to top.
interval_type = 'lower'
if interval_type == 'lower':
    ppf = confidence
t = ss.t.ppf(ppf, len(data))
top = s / np.sqrt(len(data)) * t
print(center, center + top)
4.66666666667 5.96212018693
interval_type = 'upper'
if interval_type == 'upper':
    ppf = 1 - confidence
t = ss.t.ppf(ppf, len(data))
top = s / np.sqrt(len(data)) * t
print(center, center + top)
4.66666666667 3.3712131464
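Because the t distribution is symmetric, ss.t.ppf(1 - confidence, df) equals -ss.t.ppf(confidence, df), so the 'lower' and 'upper' limits sit the same distance from the mean on opposite sides. A quick sketch that checks this with the same data (it uses n - 1 degrees of freedom, the usual choice for a t-interval on a mean):

```python
import numpy as np
import scipy.stats as ss

data = [4, 3, 5, 3, 6, 7]
confidence = 0.95
n = len(data)
center = np.mean(data)
sem = np.std(data, ddof=1) / np.sqrt(n)

# 'lower' interval: (-inf, upper_limit]; 'upper' interval: [lower_limit, +inf)
upper_limit = center + sem * ss.t.ppf(confidence, n - 1)
lower_limit = center + sem * ss.t.ppf(1 - confidence, n - 1)  # t is negative here

print(lower_limit, center, upper_limit)
print(np.isclose(upper_limit - center, center - lower_limit))  # True: symmetric
```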
There is quite a bit of repeated code here. Let's put the whole thing together without the repeats:
import scipy.stats as ss
import numpy as np
data = [4,3,5,3,6, 7]
interval_type = 'lower'
confidence = 0.95
center = np.mean(data)
s = np.std(data, ddof=1)
if interval_type == 'lower':
    ppf = confidence
elif interval_type == 'upper':
    ppf = 1 - confidence
else:
    ppf = 1 - (1 - confidence) / 2
t = ss.t.ppf(ppf, len(data))
width = s / np.sqrt(len(data)) * t
if interval_type == 'lower' or interval_type == 'upper':
    width = width + center
print(center, width, ppf)
4.66666666667 5.96212018693 0.95
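The if/elif chain that picks ppf can also be written as a lookup table, which lists the allowed interval types in one place. This is a stylistic alternative, not what these notes use, and the helper name pick_ppf is hypothetical. Unlike the chain above, which treats any unrecognized type as 'double', this version rejects unknown types.

```python
def pick_ppf(interval_type, confidence):
    '''Return the t-distribution quantile to use for each interval type'''
    ppf_for = {
        'lower': confidence,                 # one-sided, limit above the mean
        'upper': 1 - confidence,             # one-sided, limit below the mean
        'double': 1 - (1 - confidence) / 2,  # two-sided, split the tails
    }
    if interval_type not in ppf_for:
        raise ValueError('Unknown interval type: {}'.format(interval_type))
    return ppf_for[interval_type]

print(pick_ppf('double', 0.95))
```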
Now let's put everything together into a file:
%%writefile utilities.py
import scipy.stats as ss
import numpy as np


def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval

    Examples
    --------
    data = [4,3,2,5]
    center, width = conf_interval(data)
    print('The mean is {} +/- {}'.format(center, width))

    Parameters
    ----------
    data : list
        The list of data points
    interval_type : str
        What kind of confidence interval. Can be double, upper, lower.
    confidence : float
        The confidence of the interval

    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval.
        If a lower or upper is specified, width is the upper or lower value.
    '''

    center = np.mean(data)
    s = np.std(data, ddof=1)
    if interval_type == 'lower':
        ppf = confidence
    elif interval_type == 'upper':
        ppf = 1 - confidence
    else:
        ppf = 1 - (1 - confidence) / 2
    t = ss.t.ppf(ppf, len(data))
    width = s / np.sqrt(len(data)) * t
    if interval_type == 'lower' or interval_type == 'upper':
        width = center + width
    return center, width
Overwriting utilities.py
import utilities
reload(utilities)
<module 'utilities' from '/home/jovyan/work/utilities.py'>
I wrote some example code in the docstring. Let's see if it works:
data = [4,3,2,5]
center, width = utilities.conf_interval(data)
print('The mean is {} +/- {}'.format(center, width))
The mean is 3.5 +/- 1.792187609015029
Let's now test the code for a few different cases
#see if it recovers the correct mean
data = ss.norm.rvs(size=1000, loc=12.4)
print(utilities.conf_interval(data))
(12.413339150292749, 0.062443365827491451)
#see if it can handle upper/lower
print(utilities.conf_interval(data, 'upper'))
(12.413339150292749, 12.360949919612446)
print(utilities.conf_interval(data, 'lower'))
(12.413339150292749, 12.465728380973053)
#Check different confidence values
print(utilities.conf_interval(data, confidence=0.75))
(12.413339150292749, 0.036626408883252866)
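Spot checks like these confirm the mean is recovered, but they cannot tell whether a 95% interval really contains the true mean 95% of the time. A sketch of a coverage check: draw many small samples from a normal distribution with a known mean, build a double-sided interval for each, and count how often the true mean lands inside. The helper double_interval, the sample size of 10, the 2000 repetitions, and the seed are all assumptions for illustration; the helper uses n - 1 degrees of freedom, whereas conf_interval above passes len(data).

```python
import numpy as np
import scipy.stats as ss

def double_interval(data, confidence=0.95):
    '''Return (center, width) of a double-sided t confidence interval'''
    n = len(data)
    center = np.mean(data)
    s = np.std(data, ddof=1)
    t = ss.t.ppf(1 - (1 - confidence) / 2, n - 1)
    return center, s / np.sqrt(n) * t

rng = np.random.default_rng(0)  # fixed seed so the result is repeatable
true_mean = 12.4
trials = 2000
hits = 0
for _ in range(trials):
    sample = rng.normal(loc=true_mean, scale=1.0, size=10)
    center, width = double_interval(sample)
    if center - width <= true_mean <= center + width:
        hits += 1

coverage = hits / trials
print(coverage)  # should be close to 0.95
```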
utilities.conf_interval(data, confidence=95)
(12.413339150292749, nan)
Passing 95 instead of 0.95 is a common mistake, and here it silently gives nan. We should check that confidence is a valid probability.
utilities.conf_interval([3], confidence=0.5)
/opt/conda/lib/python3.5/site-packages/numpy/core/_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice
  warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
/opt/conda/lib/python3.5/site-packages/numpy/core/_methods.py:116: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
(3.0, nan)
Uh-oh, only one value was given, so the sample standard deviation (with ddof=1) is undefined and we get nan. We should warn if there are not enough values.
%%writefile utilities.py
import scipy.stats as ss
import numpy as np


def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval

    Examples
    --------
    data = [4,3,2,5]
    center, width = conf_interval(data)
    print('The mean is {} +/- {}'.format(center, width))

    Parameters
    ----------
    data : list
        The list of data points
    interval_type : str
        What kind of confidence interval. Can be double, upper, lower.
    confidence : float
        The confidence of the interval

    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval.
        If a lower or upper is specified, width is the upper or lower value.
    '''

    if(len(data) < 3):
        print('Not enough data given. Must have at least 3 values')

    center = np.mean(data)
    s = np.std(data, ddof=1)
    if interval_type == 'lower':
        ppf = confidence
    elif interval_type == 'upper':
        ppf = 1 - confidence
    else:
        ppf = 1 - (1 - confidence) / 2
    t = ss.t.ppf(ppf, len(data))
    width = s / np.sqrt(len(data)) * t
    if interval_type == 'lower' or interval_type == 'upper':
        width = center + width
    return center, width
Overwriting utilities.py
reload(utilities)
utilities.conf_interval([3])
Not enough data given. Must have at least 3 values
/opt/conda/lib/python3.5/site-packages/numpy/core/_methods.py:82: RuntimeWarning: Degrees of freedom <= 0 for slice
  warnings.warn("Degrees of freedom <= 0 for slice", RuntimeWarning)
/opt/conda/lib/python3.5/site-packages/numpy/core/_methods.py:116: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
(3.0, nan)
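Printing is one way to flag a problem; the standard-library warnings module is the more conventional tool for a non-fatal warning, because callers can filter it or turn it into an error. Like print, it does not stop the function. A sketch with a hypothetical check_size helper:

```python
import warnings

def check_size(data, minimum=3):
    '''Warn, without stopping, if there are too few data points'''
    if len(data) < minimum:
        warnings.warn('Not enough data given. Must have at least {} values'.format(minimum),
                      RuntimeWarning)
    return len(data)

# Record warnings instead of displaying them, to show one was issued
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    n = check_size([3])

print(n, len(caught), caught[0].category.__name__)  # 1 1 RuntimeWarning
```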
Ah, but notice it didn't actually stop the program!
What we need is one of those error messages you see a lot. We can produce one by raising an exception:
raise RuntimeError('This is a problem')
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-35-3649338a9130> in <module>()
----> 1 raise RuntimeError('This is a problem')

RuntimeError: This is a problem
raise ValueError('Your value is bad and you should feel bad')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-36-8f472bdb8b36> in <module>()
----> 1 raise ValueError('Your value is bad and you should feel bad')

ValueError: Your value is bad and you should feel bad
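A raised exception stops the function, but the code that called it can still decide what to do using try/except. A minimal sketch with a hypothetical needs_three function that mirrors the length check in conf_interval:

```python
def needs_three(data):
    '''Hypothetical example: refuse to work with fewer than 3 values'''
    if len(data) < 3:
        raise ValueError('Not enough data given. Must have at least 3 values')
    return sum(data) / len(data)

try:
    result = needs_three([3])
except ValueError as err:
    # The function stopped; handle the problem here instead of crashing
    result = None
    print('Could not compute:', err)

print(result)  # None
```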
%%writefile utilities.py
import scipy.stats as ss
import numpy as np


def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval

    Examples
    --------
    data = [4,3,2,5]
    center, width = conf_interval(data)
    print('The mean is {} +/- {}'.format(center, width))

    Parameters
    ----------
    data : list
        The list of data points
    interval_type : str
        What kind of confidence interval. Can be double, upper, lower.
    confidence : float
        The confidence of the interval

    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval.
        If a lower or upper is specified, width is the upper or lower value.
    '''

    if(len(data) < 3):
        raise ValueError('Not enough data given. Must have at least 3 values')

    center = np.mean(data)
    s = np.std(data, ddof=1)
    if interval_type == 'lower':
        ppf = confidence
    elif interval_type == 'upper':
        ppf = 1 - confidence
    else:
        ppf = 1 - (1 - confidence) / 2
    t = ss.t.ppf(ppf, len(data))
    width = s / np.sqrt(len(data)) * t
    if interval_type == 'lower' or interval_type == 'upper':
        width = center + width
    return center, width
Overwriting utilities.py
reload(utilities)
utilities.conf_interval([3])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-38-7af562286d5c> in <module>()
      1 reload(utilities)
----> 2 utilities.conf_interval([3])

/home/jovyan/work/utilities.py in conf_interval(data, interval_type, confidence)
     29
     30     if(len(data) < 3):
---> 31         raise ValueError('Not enough data given. Must have at least 3 values')
     32
     33     center = np.mean(data)

ValueError: Not enough data given. Must have at least 3 values
%%writefile utilities.py
import scipy.stats as ss
import numpy as np


def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval

    Examples
    --------
    data = [4,3,2,5]
    center, width = conf_interval(data)
    print('The mean is {} +/- {}'.format(center, width))

    Parameters
    ----------
    data : list
        The list of data points
    interval_type : str
        What kind of confidence interval. Can be double, upper, lower.
    confidence : float
        The confidence of the interval

    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval.
        If a lower or upper is specified, width is the upper or lower value.
    '''

    if(len(data) < 3):
        raise ValueError('Not enough data given. Must have at least 3 values')
    if(interval_type not in ['upper', 'lower', 'double']):
        raise ValueError('I do not know how to make a {} confidence interval'.format(interval_type))
    if(0 > confidence or confidence > 1):
        raise ValueError('Confidence must be between 0 and 1')

    center = np.mean(data)
    s = np.std(data, ddof=1)
    if interval_type == 'lower':
        ppf = confidence
    elif interval_type == 'upper':
        ppf = 1 - confidence
    else:
        ppf = 1 - (1 - confidence) / 2
    t = ss.t.ppf(ppf, len(data))
    width = s / np.sqrt(len(data)) * t
    if interval_type == 'lower' or interval_type == 'upper':
        width = center + width
    return center, width
Overwriting utilities.py
reload(utilities)
utilities.conf_interval([3])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-40-7af562286d5c> in <module>()
      1 reload(utilities)
----> 2 utilities.conf_interval([3])

/home/jovyan/work/utilities.py in conf_interval(data, interval_type, confidence)
     29
     30     if(len(data) < 3):
---> 31         raise ValueError('Not enough data given. Must have at least 3 values')
     32     if(interval_type not in ['upper', 'lower', 'double']):
     33         raise ValueError('I do not know how to make a {} confidence interval'.format(interval_type))

ValueError: Not enough data given. Must have at least 3 values
utilities.conf_interval([3,4,32], confidence=95)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-41-51802216abd6> in <module>()
----> 1 utilities.conf_interval([3,4,32], confidence=95)

/home/jovyan/work/utilities.py in conf_interval(data, interval_type, confidence)
     33         raise ValueError('I do not know how to make a {} confidence interval'.format(interval_type))
     34     if(0 > confidence or confidence > 1):
---> 35         raise ValueError('Confidence must be between 0 and 1')
     36
     37     center = np.mean(data)

ValueError: Confidence must be between 0 and 1
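These error checks can be locked in with automated tests, so a later edit cannot silently remove them. A sketch using the standard-library unittest module; validate is a hypothetical stand-alone copy of the three checks in conf_interval, used so the example runs without utilities.py.

```python
import unittest

def validate(data, interval_type='double', confidence=0.95):
    '''Hypothetical copy of the argument checks in conf_interval'''
    if len(data) < 3:
        raise ValueError('Not enough data given. Must have at least 3 values')
    if interval_type not in ['upper', 'lower', 'double']:
        raise ValueError('I do not know how to make a {} confidence interval'.format(interval_type))
    if 0 > confidence or confidence > 1:
        raise ValueError('Confidence must be between 0 and 1')

class TestValidate(unittest.TestCase):
    def test_too_few_values(self):
        with self.assertRaises(ValueError):
            validate([3])

    def test_bad_interval_type(self):
        with self.assertRaises(ValueError):
            validate([3, 4, 32], interval_type='sideways')

    def test_confidence_as_percent(self):
        with self.assertRaises(ValueError):
            validate([3, 4, 32], confidence=95)

    def test_valid_arguments(self):
        validate([3, 4, 32])  # should not raise

# Run the tests without exiting the interpreter (handy in a notebook)
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestValidate)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```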
Now we'll learn how to put all our files together into a package that we can always use.
You need to arrange your files and folders in a special way. Let's say I'm putting all my functions together into a package called che116. I need to arrange it like this:
che116-package/          <-- the top directory
    setup.py             <-- the file which gives info about the package
    che116/              <-- a folder where the code is stored
        __init__.py      <-- a completely empty file. The name is important
        stats.py         <-- where I would put some functions related to stats
Here are the contents of the three files we need to make. NOTE: you need to create the folders above before you can run these cells. Change the paths after %%writefile to match where you want the files.
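The folders can also be created from Python with pathlib. In this sketch the base folder is a temporary directory so it is safe to run anywhere; that is an assumption, and you would use Path('.') instead to build the layout next to your notebook.

```python
import tempfile
from pathlib import Path

# Stand-in base folder; use Path('.') to build the layout in your working directory
base = Path(tempfile.mkdtemp())
top = base / 'unit_15' / 'che116-package'  # the top directory (holds setup.py)
code_dir = top / 'che116'                   # the folder where the code is stored

# parents=True also creates unit_15/; exist_ok=True makes re-running harmless
code_dir.mkdir(parents=True, exist_ok=True)
(code_dir / '__init__.py').touch()  # the empty file that marks a package

print(sorted(p.name for p in code_dir.iterdir()))  # ['__init__.py']
```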
%%writefile unit_15/che116-package/setup.py
from setuptools import setup

setup(name = 'che116', #the name for install purposes
      author = 'Andrew White', #for your own info
      description = 'Some stuff I wrote for CHE 116', #displayed when install/update
      version='1.0',
      packages=['che116']) #This name should match the directory where you put your code
Overwriting unit_15/che116-package/setup.py
%%writefile unit_15/che116-package/che116/__init__.py
'''You can put some comments in here if you want. They should describe the package.'''
Overwriting unit_15/che116-package/che116/__init__.py
%%writefile unit_15/che116-package/che116/stats.py
import scipy.stats as ss
import numpy as np


def conf_interval(data, interval_type='double', confidence=0.95):
    '''This function takes in the data and computes a confidence interval

    Examples
    --------
    data = [4,3,2,5]
    center, width = conf_interval(data)
    print('The mean is {} +/- {}'.format(center, width))

    Parameters
    ----------
    data : list
        The list of data points
    interval_type : str
        What kind of confidence interval. Can be double, upper, lower.
    confidence : float
        The confidence of the interval

    Returns
    -------
    center, width
        Center is the mean of the data. Width is the width of the confidence interval.
        If a lower or upper is specified, width is the upper or lower value.
    '''

    if(len(data) < 3):
        raise ValueError('Not enough data given. Must have at least 3 values')
    if(interval_type not in ['upper', 'lower', 'double']):
        raise ValueError('I do not know how to make a {} confidence interval'.format(interval_type))
    if(0 > confidence or confidence > 1):
        raise ValueError('Confidence must be between 0 and 1')

    center = np.mean(data)
    s = np.std(data, ddof=1)
    if interval_type == 'lower':
        ppf = confidence
    elif interval_type == 'upper':
        ppf = 1 - confidence
    else:
        ppf = 1 - (1 - confidence) / 2
    t = ss.t.ppf(ppf, len(data))
    width = s / np.sqrt(len(data)) * t
    if interval_type == 'lower' or interval_type == 'upper':
        width = center + width
    return center, width
Overwriting unit_15/che116-package/che116/stats.py
Once you're done, run pip install -e [path to your folder], where the path is the directory containing the setup.py file. The -e means editable: if you edit any of the files above, you do not need to reinstall.
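After installing, one way to confirm that Python can find a package, without importing it, is importlib.util.find_spec, which returns None when nothing by that name is importable. The helper name is_importable is hypothetical, and the checks below use standard-library names so they run anywhere; you would pass 'che116'. In a kernel that was already running before the install, it may still return None until you restart, because the new install location is added to the search path at startup.

```python
import importlib.util

def is_importable(name):
    '''Return True if a top-level module or package called name can be found'''
    return importlib.util.find_spec(name) is not None

print(is_importable('json'))                       # True: part of the standard library
print(is_importable('surely_not_a_real_pkg_123'))  # False
# print(is_importable('che116'))  # True once the install has worked and the kernel restarted
```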
%system pip install -e unit_15/che116-package
['Obtaining file:///home/jovyan/work/unit_15/che116-package', 'Installing collected packages: che116', ' Running setup.py develop for che116', 'Successfully installed che116-1.0']
#YOU MUST RESTART THE KERNEL THE FIRST TIME THROUGH
#after install + restart, you'll always have your package available
import che116.stats as cs
cs.conf_interval([4,3,4])
(3.6666666666666665, 4.7274821017614208)
import che116  #'import che116.stats as cs' only binds the name cs, not che116
help(che116)
Help on package che116:

NAME
    che116 - You can put some comments in here if you want. They should describe the package.

PACKAGE CONTENTS
    stats

FILE
    /home/jovyan/work/unit_15/che116-package/che116/__init__.py
help(che116.stats)
Help on module che116.stats in che116:

NAME
    che116.stats

FUNCTIONS
    conf_interval(data, interval_type='double', confidence=0.95)
        This function takes in the data and computes a confidence interval

        Examples
        --------
        data = [4,3,2,5]
        center, width = conf_interval(data)
        print('The mean is {} +/- {}'.format(center, width))

        Parameters
        ----------
        data : list
            The list of data points
        interval_type : str
            What kind of confidence interval. Can be double, upper, lower.
        confidence : float
            The confidence of the interval

        Returns
        -------
        center, width
            Center is the mean of the data. Width is the width of the confidence interval.
            If a lower or upper is specified, width is the upper or lower value.

FILE
    /home/jovyan/work/unit_15/che116-package/che116/stats.py