In our last class, we used NumPy to work with arrays, created plots with matplotlib, and used for loops to repeat actions.
We'll continue working with those same data files and add to our reproducible workflow to automate data analysis. By the end of this class, you should be able to:
if
, elif
, else
)and
and or
The last section of the previous class used for loops to repeat actions across mutliple file names. We're ready to integrate this programming method into our data visualization for the arthritis inflammation data:
# plot average inflammation for each file in a separate plot
import glob
import numpy as np
import matplotlib.pyplot as plt
filenames = sorted(glob.glob("data/inflammation*.csv"))
for filename in filenames:
print(filename)
data = np.loadtxt(fname=filename, delimiter=',')
fig = plt.figure(figsize=(10.0, 3.0))
axes1 = fig.add_subplot(1, 3, 1)
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)
axes1.set_ylabel('average')
axes1.plot(np.mean(data, axis=0))
axes2.set_ylabel('max')
axes2.plot(np.max(data, axis=0))
axes3.set_ylabel('min')
axes3.plot(np.min(data, axis=0))
fig.tight_layout()
plt.show()
data/inflammation-01.csv
data/inflammation-02.csv
data/inflammation-03.csv
data/inflammation-04.csv
data/inflammation-05.csv
data/inflammation-06.csv
data/inflammation-07.csv
data/inflammation-08.csv
data/inflammation-09.csv
data/inflammation-10.csv
data/inflammation-11.csv
data/inflammation-12.csv
Challenge-show¶
Optional: We've noted that plt.show()
may not be necessary for plots to appear using some Python interpreters. Try removing this line from the following for loop and view the output. How do you explain the results?
Challenge-three¶
Optional: How could you modify the loop coded above so only the first three data files are visualized?
The results from the plots above indicate that at least one of our datasets may have suspicious data, likely related to data entry error.
In this section, we'll explore ways to use Python to help us determine which files are erroneous, and which errors apply to each file.
So far, we've been building our programming skills with Python to automate analyses by repeating actions. Eventually, though, we'll encounter circumstances where we need Python to run certain code under some circumstances, but to run other code under other circumstances. Running code based on certain criteria relies on conditional statements.
An example would be assessing whether a given value is greater than a certain number:
num = 3
if num > 0:
print(num, "is positive")
3 is positive
This example accurately assesses the condition,
and also demonstrates the general syntax for an if
statement:
if CONDITION:
ACTION
If we used a number that did not meet the condition, however, the output would be more cryptic, since nothing would be printed to the screen:
num = -3
print("before conditional...")
if num > 100:
print(num, "is positive")
print("...after conditional")
before conditional... ...after conditional
In the example above, the condition is not met, so no statements are printed as a part of the conditionals. Our additional print statements allow us to determine (by inference) that the condition is not met.
We could include an additional option that would let us know the condition has not been met:
num = -3
if num > 0:
print(num, "is positive")
else:
print(num, "is negative")
-3 is negative
This accurately represents this particular output. However, if we input zero, we'd be told it was negative as well.
We can include multiple alternatives using elif
:
num = 0
if num > 0:
print(num, "is positive")
elif num == 0:
print(num, "is zero")
else:
print(num, "is negative")
0 is zero
The double equal signs (==
) are required in the code above to denote mathematical equivalency,
and to differentiate from a single equal sign for parameter and variable assignment.
Developing this code has shown us a few features of if/else statements:
if
statements can be used aloneelif
provides an alternative condition;there can be multiple elif
included in the if
statement
else
accommodates any input that does not meet the if
and elif
conditions aboveWe can also combine conditions.
Use of and
indicates that both conditions must be true:
if (1 > 0) and (-1 < 0):
print("both parts are true")
else:
print("at least one part is false")
both parts are true
Try switching the numbers above to make one of the conditions false so you can obtain the alternative answer.
Combining conditions using if
indicates only one condition must be true:
if (1 < 0) or (-1 < 0):
print("at least one part is true")
else:
print("both parts are false")
at least one part is true
In the code above,
we've demonstrated combining conditions using if
,
as well as assessing statements using logical (true/false) outputs.
Challenge-conditionals¶
Given the following code, what answer (A, B, or C) do you expect to be correct? How would you rewrite the code to get another answer?
if 4 < 5:
print("A")
elif 4 == 5:
print("B")
elif 4 < 5:
print("C")
Now that we understand conditional statements, let's begin applying them to our inflammation data. We'll be using conditionals to assess whether our data possess any potential errors, since our visualizations in the previous class indicated possibly suspicious patterns.
We'll begin by setting up a comparison between two different days in the study:
import numpy as np
data = np.loadtxt(fname="data/inflammation-01.csv", delimiter=",")
max_inflammation_0 = np.max(data, axis=0)[0] # max for early day
max_inflammation_20 = np.max(data, axis=0)[20] # max for middle day
We can use the code above to evaluate whether the data entered for a specific day is exactly the same as the day, which would indicate an error in data entry:
# check if max equals day number
if max_inflammation_0 == 0 and max_inflammation_20 == 20:
print("Suspicious looking maxima!")
Suspicious looking maxima!
At least for our first data file, it seems like we have cause to be suspicious.
Next, we can evaluate whether there are any patients without inflammation present in our dataset:
# check if any patients are have zero total inflammation
if np.sum(np.min(data, axis=0)) == 0:
print("Minima add up to zero!")
We can combine these conditional statements together, while also testing on another data file:
# test both conditions on another data file
data = np.loadtxt(fname="data/inflammation-03.csv", delimiter=",")
max_inflammation_0 = np.max(data, axis=0)[0]
max_inflammation_20 = np.max(data, axis=0)[20]
if max_inflammation_0 == 0 and max_inflammation_20 == 20:
print("Suspicious looking maxima!")
elif np.sum(np.min(data, axis=0)) == 0:
print("Minima add up to zero!")
else:
print("Seems OK!")
Minima add up to zero!
Challenge-more-vowels¶
For loops and conditionals represent two powerful programming structures for automating our work. The last section of today's material will cover another programming approach: creating our own functions.
We'll begin by exploring some basic functions that convert temperatures:
# define a function that converts F to C
def fahr_to_celsius(temp):
return ((temp - 32) * (5/9))
# test function
fahr_to_celsius(32)
0.0
You may interpret from the function name that this converts Fahrenheit to Celsius.
As with other programming structures, the format here is formulaic:
def FUNCTION_NAME(VARIABLE):
return (ACTION)
Then we run the function:
FUNCTION_NAME(TEST_VALUE)
You may wonder why this function uses return()
instead of print()
.
This is because print()
is used to provide human readable output,
while return()
indicates that the function is complete,
and provides the output in a way Python can understand as well.
We can make the output of the print statement more meaningful for us to understand with improved print
statements,
and also test on another value:
print("freezing point of water:", fahr_to_celsius(32), "C")
print("boiling point of water:", fahr_to_celsius(212), "C")
freezing point of water: 0.0 C boiling point of water: 100.0 C
Similarly, we can write another function that converts Celsius to Kelvin:
def celsius_to_kelvin(temp_c):
return temp_c + 273.15
print("freezing point of water in Kelvin:", celsius_to_kelvin(0.))
freezing point of water in Kelvin: 273.15
We can then use these two functions to convert from Fahrenheit to Kelvin. Applying a function (or more) to creation of another function is called function composition:
# converting F to K
def fahr_to_kelvin(temp_f):
temp_c = fahr_to_celsius(temp_f)
temp_k = celsius_to_kelvin(temp_c)
return temp_k
print("boiling point of water in Kelvin:", fahr_to_kelvin(212.0))
boiling point of water in Kelvin: 373.15
This example highlights two important points related to writing functions:
starting with small actions and using them together to build complexity
since you'll be referencing them throughout your work
Challenge-fence¶
You can "add" strings together using +
,
such as: "a" + "b"
.
Write a function called fence
that takes two parameters called original
and wrapper
and returns a new string that has the wrapper character at the beginning and end of the original. A call to your function should look like this:
print(fence('name', '*'))
*name*
Let's now encase our data visualization code from the last class into a function called analyze
:
import matplotlib.pyplot as plt
def analyze(filename):
data = np.loadtxt(fname=filename, delimiter=",")
fig = plt.figure(figsize=(10.0, 3.0))
axes1 = fig.add_subplot(1, 3, 1)
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)
axes1.set_ylabel("average")
axes1.plot(np.mean(data, axis=0))
axes2.set_ylabel("max")
axes2.plot(np.max(data, axis=0))
axes3.set_ylabel("min")
axes3.plot(np.min(data, axis=0))
fig.tight_layout()
plt.show()
Next, let's create a function to detect errors in our data associated with the minima and maxima for each patient:
def detect_problems(filename):
data = np.loadtxt(fname=filename, delimiter=",")
if np.max(data, axis=0)[0] == 0 and np.max(data, axis=0)[20] == 20:
print("Suspicious looking maxima!")
elif np.sum(np.min(data, axis=0)) == 0:
print("Minima add up to zero!")
else:
print("Seems OK!")
Challenge-test¶
Write code to test your two new functions (analyze
and detect_problems
).
Now, we'll apply both functions across the first three files using a for loop:
import glob
filenames = sorted(glob.glob("data/inflammation*.csv"))
for f in filenames[:3]:
print(f)
analyze(f)
detect_problems(f)
data/inflammation-01.csv
Suspicious looking maxima! data/inflammation-02.csv
Suspicious looking maxima! data/inflammation-03.csv
Minima add up to zero!
Now that you've had a chance to write a few functions of your own, we'll conclude this section with a few additional guidelines for making your functions readable and readily useful to other people.
First, take a look at the following two examples:
import numpy as np
# Example 1
def s(p):
a=0
for v in p:
a+=v
m=a/len(p)
d=0
for v in p:
d+=(v-m)*(v-m)
return np.sqrt(d/(len(p)-1))
# Example 2
def std_dev(sample):
sample_sum = 0
for value in sample:
sample_sum += value
sample_mean = sample_sum / len(sample)
sum_squared_devs = 0
for value in sample:
sum_squared_devs += (value - sample_mean) * (value - sample_mean)
return np.sqrt(sum_squared_devs / (len(sample) - 1))
If you look at Example 1 long enough, you may eventually figure out that the goal is to calculate standard deviation for a given sample. Example 2 performs the same task, but includes two important modifications:
+
, =
) work alongside the color coding of our Python interpreter to help us differentiate parts of codeWhen combined with our previous guidelines for function names and function composition, you should know have a fundamental understanding of how to write code for functions that allows it to be readable, both within a function and throughout a project.
Challenge-ice-cream¶
The following code defines and tests a function to list the best and worst flavor of ice cream. Modify the code to improve its readability. Extra: Explain the difference between the use of print
and return
in the output.
def hippopotamus(x,y):
foo=["chocolate","vanilla","strawberry"]
print(foo[x])
return foo[y]
hippopotamus(1,0)
vanilla
'chocolate'
In this class,
we explored use of Python to make decisions about how code should be run (using if
, elif
, else
),
evaluate expressions containing and
and or
,
and learned to define our own readable functions.
In our next class, we'll continue developing good practices for creating robust, reusable functions, and also explore how to correct errors reported by Python.
Challenge-f2k¶
What does the following piece of code display when run — and why?
f = 0
k = 0
def f2k(f):
k = ((f-32)*(5.0/9.0)) + 273.15
return k
f2k(8)
f2k(41)
f2k(32)
print(k)
Challenge-swap¶
Explain the result of the following code:
a = 3
b = 7
def swap(a, b):
temp = a
a = b
b = temp
swap(a, b)
print(a, b)
Challenge-outer¶
If the variable s
refers to a string, then s[0]
is the string’s first character and s[-1]
is its last. Write a function called outer
that returns a string made up of just the first and last characters of its input. A call to your function should look like this:
print(outer('helium'))
hm