Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. - B. W. Kernighan and P. J. Plauger, The Elements of Programming Style
Code that cannot be tested is flawed.
Why do we never have time to do it right, but always have time to do it over?
Fast, good, cheap: pick any two. - Project management triangle
x = . 5
File "<ipython-input-3-572ce7711194>", line 1 x = . 5 ^ SyntaxError: invalid syntax
Often the error is precisely indicated, as above, but sometimes you have to search for the error on the previous line.
IndentationError: a line in the code has bad indentation.
a = 7
    b = 5
File "<ipython-input-15-a9531be39a36>", line 2 b = 5 ^ IndentationError: unexpected indent
This can be tricky at times, because sometimes the indentation looks fine but Python still complains. This usually happens when tabs and spaces are mixed: the lines look aligned in the editor, but Python sees different indentation.
The following errors are runtime errors: they only appear when the program is running. They can be elusive because they depend on variable values and program flow, so they do not show up on every run.
NameError: a name (variable, function, module) is not defined.
b = a + 2
--------------------------------------------------------------------------- NameError Traceback (most recent call last) <ipython-input-5-a2bbdfb341f2> in <module>() ----> 1 b = a + 2 NameError: name 'a' is not defined
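Look at the traceback to see where in the program the error occurs. The most common reasons for a NameError are a misspelled name, a variable that was never assigned, a function that was never defined, or a module that was not imported.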
Working in the IPython Notebook can introduce such errors when you forget to run a cell and use the variables from that cell in another cell.
TypeError: an object of the wrong type is used in an operation.
n = 1
x = '2'
product = (1.0/(n+1))*(x/(1.0+x))**(n+1)
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-6-3f0503c70393> in <module>() 1 n = 1 2 x = '2' ----> 3 product = (1.0/(n+1))*(x/(1.0+x))**(n+1) TypeError: unsupported operand type(s) for +: 'float' and 'str'
Print out objects and their types (here: print(x, type(x), n, type(n))), and you will most likely get a surprise. The reason for a TypeError is often far away from the line where the TypeError occurs.
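As a minimal sketch of that advice, the cell below prints the types and then converts the string before doing arithmetic. It assumes that x was meant to hold the number 2 (for example, a value read from a file or from input()); that intent is a guess, not something the failing cell states.

```python
n = 1
x = '2'  # a string, e.g. as returned by input() or read from a file

# Inspect values and types before the failing line
print(x, type(x), n, type(n))

# Assumption: x should be the number 2, so convert it explicitly
x = float(x)
product = (1.0/(n+1))*(x/(1.0+x))**(n+1)
print(product)
```

The conversion is done once, close to where the value enters the program, so that every later line can rely on x being a number.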
ValueError: an object has an illegal value.
import math
z = -1
math.sqrt(z)
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-12-c135460f1f42> in <module>() 1 import math 2 z = -1 ----> 3 math.sqrt(z) ValueError: math domain error
Print out the value of objects that can be involved in the error (here: print(z)).
IndexError: an index in a list, tuple, or string is out of range (usually too large).
values = [1,27,33,46,52]
n = 0
for i in range(len(values)):
    n += values[i+1]
--------------------------------------------------------------------------- IndexError Traceback (most recent call last) <ipython-input-13-bb5497f1acf2> in <module>() 2 n = 0 3 for i in range(len(values)): ----> 4 n += values[i+1] IndexError: list index out of range
Print out the length of the list, and the index if it involves a variable (here: print(len(values), i)).
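One possible repair is sketched below. It assumes the loop was meant to add up every element after the first one; the failing cell does not say so, and a different intent would need a different fix.

```python
values = [1, 27, 33, 46, 52]

# Assumed intent: add up every element after the first one.
n = 0
for i in range(len(values) - 1):  # stop one short so that i+1 stays a valid index
    n += values[i + 1]
print(n)  # prints 158

# Iterating over a slice avoids the index arithmetic altogether.
assert n == sum(values[1:])
```

Looping over the elements (or a slice) rather than over indices removes this whole class of off-by-one errors.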
KeyError: this is IndexError's cousin; it is raised when looking up non-existent keys in a dict.
Remember that you can use dict.get(key, default_value) to prevent this error.
d = {}
d['a']
--------------------------------------------------------------------------- KeyError Traceback (most recent call last) <ipython-input-5-f5351d7cf57c> in <module>() 1 d = {} ----> 2 d['a'] KeyError: 'a'
print(d.get('a'))
None
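The default value is what makes dict.get useful in practice. The sketch below counts characters in a string; the string and the variable names are made up for illustration.

```python
counts = {}
for base in "GATTACA":
    # get() returns 0 for a base we have not seen yet, so no KeyError is raised
    counts[base] = counts.get(base, 0) + 1
print(counts)

# The `in` operator tests for a key without raising an error
print('N' in counts)  # prints False
```

Use dict.get when a missing key has a sensible default, and a plain lookup when a missing key really is a bug you want to hear about.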
Let's solve the following bugs. Each notebook cell has a single program with at least one bug that may either cause an error or make the program incorrect (producing wrong results).
Fix the code.
x = '7'
y = 8
z = x + y
print(z)
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-1-4bebf08f57ff> in <module>() 1 x = '7' 2 y = 8 ----> 3 z = x + y 4 print(z) TypeError: Can't convert 'int' object to str implicitly
x = 1
y = 0
while x < 4:
y += x
print(y)
--------------------------------------------------------------------------- KeyboardInterrupt Traceback (most recent call last) <ipython-input-2-85cbd1a8ace2> in <module>() 2 y = 0 3 while x < 4: ----> 4 y += x 5 print(y) KeyboardInterrupt:
switch = 'on'
if switch = 'off':
print('go home')
File "<ipython-input-3-7d0bba41f18f>", line 2 if switch = 'off': ^ SyntaxError: invalid syntax
range()
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-4-7b0c968826c1> in <module>() ----> 1 range() TypeError: range expected 1 arguments, got 0
range(2.5)
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-5-a4d621a16ea3> in <module>() ----> 1 range(2.5) TypeError: 'float' object cannot be interpreted as an integer
range(2,3,0)
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-6-807eb8451527> in <module>() ----> 1 range(2,3,0) ValueError: range() arg 3 must not be zero
counter = 0
while counter < 5:
print('hello')
counter += 1
while counter < 5:
print('bye')
counter += 1
hello hello hello hello hello
Some bugs don't cause errors. These are risky because we can easily miss them. For example, this function for the sum of a geometric series:
$$ \sum_{k \ge 0} a r^k = \frac{a}{1-r} $$
def geosum(a, r):
    return a/(1 - r)
This works well for some values, causes errors for other values, and gives incorrect results for yet other values:
print("Correct:")
print(geosum(1,0), 1)
print(geosum(1,0.5), 2)
print(geosum(0,0.5), 0)
print(geosum(0,2), 0)
print("Incorrect:")
print(geosum(1,2), "\u221e")
print(geosum(-1,2), "-\u221e")
print(geosum(2,-1), "NaN")
print("Error:")
print(geosum(1,1))
Correct: 1.0 1 2.0 2 0.0 0 -0.0 0 Incorrect: -1.0 ∞ 1.0 -∞ 1.0 NaN Error:
--------------------------------------------------------------------------- ZeroDivisionError Traceback (most recent call last) <ipython-input-30-60b2b250addb> in <module>() 11 12 print("Error:") ---> 13 print(geosum(1,1)) <ipython-input-16-b31a35149d04> in geosum(a, r) 1 def geosum(a, r): ----> 2 return a/(1 - r) ZeroDivisionError: division by zero
For this kind of bug we have to write tests.
The simplest way to do this is using assert statements.
The assert statement checks a condition and, if it is False, raises an AssertionError. You can also attach a message explaining why the assertion failed:
assert geosum(1,0) == 1, "Bad value"
assert geosum(1,0.5) == 2, "Bad value"
assert geosum(0,0.5) == 0, "Bad value"
assert geosum(0,2) == 0, "Bad value"
assert geosum(1,2) == None, "Bad value"
assert geosum(-1,2) == None, "Bad value"
assert geosum(2,-1) == None, "Bad value"
assert geosum(1,1) == None, "Bad value"
--------------------------------------------------------------------------- AssertionError Traceback (most recent call last) <ipython-input-33-a6c9858ee4c4> in <module>() 4 assert geosum(0,2) == 0, "Bad value" 5 ----> 6 assert geosum(1,2) == None, "Bad value" 7 assert geosum(-1,2) == None, "Bad value" 8 assert geosum(2,-1) == None, "Bad value" AssertionError: Bad value
Let's fix the function:
def geosum(a, r):
    if a == 0:
        return 0.0 # always return same type
    elif abs(r) >= 1:
        return None # formula only defined for |r|<1
    return a/(1 - r)
assert geosum(1,0) == 1, "Bad value"
assert geosum(1,0.5) == 2, "Bad value"
assert geosum(0,0.5) == 0, "Bad value"
assert geosum(0,2) == 0, "Bad value"
assert geosum(1,2) == None, "Bad value"
assert geosum(-1,2) == None, "Bad value"
assert geosum(2,-1) == None, "Bad value"
assert geosum(1,1) == None, "Bad value"
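The assertions above compare floats with ==, which works for these inputs but can fail when the result is not exactly representable. A hedged sketch of a more tolerant test follows; it repeats the fixed geosum so that the cell runs on its own, and the extra test value r = 0.1 is chosen here for illustration.

```python
import math

def geosum(a, r):
    if a == 0:
        return 0.0
    elif abs(r) >= 1:
        return None
    return a/(1 - r)

# Floating point arithmetic is not exact:
assert 0.1 + 0.2 != 0.3
assert math.isclose(0.1 + 0.2, 0.3)

# So compare computed floats with a tolerance rather than with ==
assert math.isclose(geosum(1, 0.1), 10/9), "Bad value"

# `is None` is the idiomatic way to test for None
assert geosum(1, 2) is None, "Expected None for |r| >= 1"
print("Success")
```

math.isclose uses a relative tolerance by default, so it is not suitable for comparing against exactly 0 unless you also pass abs_tol.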
Below is a function that uses the Pythagorean theorem to calculate the length of the longest side (the hypotenuse) of a right triangle, given the lengths of the other two sides:
$$ a^2 + b^2 = c^2 $$
def pythagoras(a,b):
    return math.sqrt(a**2 + b**2)
Write a series of assertions to test the function.
# Your code goes here
We will write a program that looks for genes (or open reading frames) in a DNA sequence.
How do we go about it? Let's make a plan.
What exactly do we want?
What is the input and output?
Example
Algorithm - one function to check if a sequence is a gene; one function to look for gene candidates in a sequence
Implementation
We want to find all genes in a DNA sequence, including overlapping genes, but only on the sequence, not on its complement.
A valid gene (i) contains only the bases A, C, G and T, (ii) starts with a start codon (ATG), (iii) ends with a stop codon (TAG, TGA, TAA), (iv) has a length that is a multiple of three, and (v) doesn't contain an in-frame stop codon (one that starts at a position that is a multiple of three from the start) before its end.
Input: string.
Output: list of strings.
Our example is GCCGTTTGTACTCCATTCCAATGAGGTCGCTTC|ATGTCAGCGAGTTTTAACGTGGTTCTTCGCTGA|TGTGCTGTATATGA.
This is a good example because it is (i) not so short as to be trivial, (ii) not so long as to be unreadable, and it contains (iii) genes in at least two different open reading frames and (iv) overlapping genes.
GCCGTTTGTACTCCATTCCAATGAGGTCGCTTCATGTCAGCGAGTTTTAACGTGGTTCTTCGCTGATGTGCTGTATATGA
['ATGAGGTCGCTTCATGTCAGCGAGTTTTAA', 'ATGTCAGCGAGTTTTAACGTGGTTCTTCGCTGA', 'ATGTGCTGTATATGA']
Here is a skeleton of our program with a test case (i.e. an assertion), which, of course, fails for now. This is called test-driven development. It may sound like something only for "serious" programmers, but the contrary is true: it is very beneficial for less experienced programmers.
def is_gene(sequence):
    # check if sequence is a gene
    return False

def find_genes(sequence):
    # find all genes in sequence
    return []

seq = 'GCCGTTTGTACTCCATTCCAATGAGGTCGCTTCATGTCAGCGAGTTTTAACGTGGTTCTTCGCTGATGTGCTGTATATGA'
genes = find_genes(seq)
assert len(genes) == 3, "Found %d genes, expected 3" % len(genes)
print("Success")
--------------------------------------------------------------------------- AssertionError Traceback (most recent call last) <ipython-input-27-9857392a180c> in <module>() 9 seq = 'GCCGTTTGTACTCCATTCCAATGAGGTCGCTTCATGTCAGCGAGTTTTAACGTGGTTCTTCGCTGATGTGCTGTATATGA' 10 genes = find_genes(seq) ---> 11 assert len(genes) == 3, "Found %d genes, expected 3" % len(genes) 12 print("Success") AssertionError: Found 0 genes, expected 3
bases = "ACGT"
start = "ATG"
stops = ["TAg","TGG","TAA"]
def is_gene(sequence):
if len(seqeunce) < 5: # check minimum length
return False
if len(sequence) % 3 ! 0: # check length divides by 3
return False
if sequence[1:3] != start: # check start codon
return False
# check stop codon
if sequence[-3:] not in stops:
return False
# check only legal characters
for c in sequence:
if c not in bases:
return False
# check no stop codons in the middle
for i in range(0, len(sequence) - 3, 3):
if sequence[i:i+3] in stops:
return False
return "True"
def find_genes(sequence):
start_idx = []
for i in range(len(sequence)):
if sequence[i,i+3] == start:
start_idx.append(i)
stop_idx = []
for i in range(len(sequence)):
if sequence[i,i+3] == stops:
stop_idx.append(i)
for i == start_idx:
for j == stop_idx:
if j <= i and j-i % 3 == 0:
gene = sequence[i,j+3]
if is_gene(genes):
genes.append(genes)
return gene
seq = 'GCCGTTTGTACTCCATTCCAATGAGGTCGCTTCATGTCAGCGAGTTTTAACGTGGTTCTTCGCTGATGTGCTGTATATGA'
genes = find_genes(seq)
assert len(genes) == 3, "Found %d genes, expected 3" % len(genes)
print("Success")
File "<ipython-input-8-6b0a913f0f22>", line 7 return False ^ IndentationError: expected an indented block
We will use several strategies:
print statements to see variable values
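As a small illustration of the print strategy, the sketch below scans a short made-up sequence for start codons and prints the loop state on every iteration. The sequence and variable names are chosen for this example only.

```python
sequence = "CCATGAGGTAA"  # short, hypothetical test sequence
start = "ATG"

start_idx = []
for i in range(len(sequence) - 2):
    codon = sequence[i:i+3]
    # Temporary debug output: position, the slice we took, and the test result
    print(i, codon, codon == start)
    if codon == start:
        start_idx.append(i)

print("start_idx =", start_idx)  # prints start_idx = [2]
```

Printing the slice next to the comparison result shows at a glance whether the slicing or the comparison is the part that misbehaves. Remove the debug prints once the bug is found.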
Here is a working version of the code. How do we know it's working? Because the assertion succeeds.
bases = "ACGT"
start = "ATG"
stops = ["TAG","TGA","TAA"]

def is_gene(sequence):
    if len(sequence) < 6: # check minimum length
        return False
    if len(sequence) % 3 != 0: # check length divides by 3
        return False
    if sequence[:3] != start: # check start codon
        return False
    # check stop codon
    if sequence[-3:] not in stops:
        return False
    # check only legal characters
    for c in sequence:
        if c not in bases:
            return False
    # check no stop codons in the middle (every codon except the final one)
    for i in range(0, len(sequence) - 3, 3):
        if sequence[i:i+3] in stops:
            return False
    return True

def find_genes(sequence):
    start = "ATG"
    stops = ["TAG","TGA","TAA"]
    start_idx = []
    for i in range(len(sequence) - 2):
        if sequence[i:i+3] == start:
            start_idx.append(i)
    stop_idx = []
    for i in range(len(sequence) - 2):
        if sequence[i:i+3] in stops:
            stop_idx.append(i)
    genes = []
    for i in start_idx:
        for j in stop_idx:
            if j > i and (j-i)%3==0:
                gene = sequence[i:j+3]
                if is_gene(gene):
                    genes.append(gene)
    return genes

seq = 'GCCGTTTGTACTCCATTCCAATGAGGTCGCTTCATGTCAGCGAGTTTTAACGTGGTTCTTCGCTGATGTGCTGTATATGA'
genes = find_genes(seq)
assert len(genes) == 3, "Found %d genes, expected 3" % len(genes)
print("Success")
Success
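Counting the genes is a weak test: a program could return three wrong strings and still pass. A stricter test compares the result with the expected genes from the example and adds a few edge cases for is_gene. The cell below is a compact restatement written for this sketch so that it runs on its own; it is not the notebook's own implementation, and the edge cases are chosen here for illustration.

```python
bases = "ACGT"
start = "ATG"
stops = ["TAG", "TGA", "TAA"]

def is_gene(sequence):
    if len(sequence) < 6 or len(sequence) % 3 != 0:
        return False
    if sequence[:3] != start or sequence[-3:] not in stops:
        return False
    if any(c not in bases for c in sequence):
        return False
    # no in-frame stop codon between the start codon and the final codon
    return all(sequence[i:i+3] not in stops
               for i in range(3, len(sequence) - 3, 3))

def find_genes(sequence):
    genes = []
    for i in range(len(sequence) - 2):
        if sequence[i:i+3] != start:
            continue
        # j is the first index of a candidate stop codon in the same frame
        for j in range(i + 3, len(sequence) - 2, 3):
            if is_gene(sequence[i:j+3]):
                genes.append(sequence[i:j+3])
    return genes

seq = 'GCCGTTTGTACTCCATTCCAATGAGGTCGCTTCATGTCAGCGAGTTTTAACGTGGTTCTTCGCTGATGTGCTGTATATGA'
expected = ['ATGAGGTCGCTTCATGTCAGCGAGTTTTAA',
            'ATGTCAGCGAGTTTTAACGTGGTTCTTCGCTGA',
            'ATGTGCTGTATATGA']
assert find_genes(seq) == expected, "Unexpected genes: %s" % find_genes(seq)

# Edge cases for is_gene
assert is_gene('ATGTAA')          # shortest possible gene
assert not is_gene('ATGTAATAA')   # in-frame stop codon before the end
assert not is_gene('ATGNNNTAA')   # illegal characters
assert not is_gene('ATGCTAA')     # length is not a multiple of three
print("Success")
```

Small hand-checkable inputs like these are often more useful than one long sequence, because a failure points directly at the rule that is broken.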
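This plan outline can be changed according to the problem, but the basic idea stays the same: decide what exactly you want, define the input and output, work out an example, design the algorithm, and only then implement and test it.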
This example follows the outline of an example by Hans Petter Langtangen. The problem is borrowed from a Python for engineers exam.
Errors (also called exceptions) can be caught and handled, if you know how to handle them.
For example, trying to open a file that does not exist gives a FileNotFoundError:
filename = "myfile.txt"
with open(filename) as f:
    print(f.read())
--------------------------------------------------------------------------- FileNotFoundError Traceback (most recent call last) <ipython-input-52-76d746c5d10e> in <module>() 1 filename = "myfile.txt" ----> 2 with open(filename) as f: 3 print(f.read()) FileNotFoundError: [Errno 2] No such file or directory: 'myfile.txt'
You can catch the error using a try-except statement and either recover from the error (if you can) or handle it differently. For example, we can alert the user to the problem without the "ugly" error:
try-except
filename = "myfile.txt"
try:
    with open(filename) as f:
        print(f.read())
except FileNotFoundError:
    print("File",filename,"not found, please try a different filename")
File myfile.txt not found, please try a different filename
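When the program can carry on without the file, the same pattern can be wrapped in a function that recovers with a fallback value. This is only a sketch: read_text and its default parameter are hypothetical names introduced here, and the filename is assumed not to exist.

```python
def read_text(filename, default=""):
    """Return the contents of filename, or default if the file does not exist."""
    try:
        with open(filename) as f:
            return f.read()
    except FileNotFoundError:
        # Recover: tell the user and fall back to the default value
        print("File", filename, "not found, using the default instead")
        return default

# Assumption: no file with this name exists in the working directory
text = read_text("this_file_should_not_exist_12345.txt", default="fallback")
print(text)
```

Only FileNotFoundError is caught here. Other problems, such as a permissions error, still raise, which is usually what you want: catch only the errors you know how to handle.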
ValueError on parsing a number
number = input("Give me a number please: ")
number = int(number)
Give me a number please: python
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-14-1f741380e6f7> in <module>() 1 number = input("Give me a number please: ") ----> 2 number = int(number) ValueError: invalid literal for int() with base 10: 'python'
try-except
number = input("Give me a number please: ")
try:
    number = int(number)
except ValueError:
    print("I asked for a number and you gave me:", number)
Give me a number please: python I asked for a number and you gave me: python
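The same idea can be packaged as a small helper that is easy to test without typing at a prompt. parse_int and its default parameter are hypothetical names used for this sketch.

```python
def parse_int(text, default=None):
    """Return text converted to an int, or default if it is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return default

print(parse_int("42"))      # prints 42
print(parse_int("python"))  # prints None
```

Separating the parsing from the call to input() means the error handling can be checked with assertions, and the interactive part of the program stays a thin layer around it.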
Here's a nice little program that calculates the mass of a protein given the amino acid sequence of the protein.
from urllib import request
request.urlretrieve("https://raw.githubusercontent.com/Py4Life/TAU2015/master/aa_weights.txt", "aa_weights.txt")
('aa_weights.txt', <http.client.HTTPMessage at 0x39563f0>)
with open("aa_weights.txt") as f:
    weights = {}
    for line in f:
        aa,w = line.strip().split()
        w = float(w)
        weights[aa] = w
print(weights)
{'D': 115.02694, 'E': 129.04259, 'R': 156.10111, 'S': 87.03203, 'M': 131.04049, 'W': 186.07931, 'P': 97.05276, 'C': 103.00919, 'V': 99.06841, 'I': 113.08406, 'G': 57.02146, 'A': 71.03711, 'L': 113.08406, 'N': 114.04293, 'T': 101.04768, 'K': 128.09496, 'Q': 128.05858, 'H': 137.05891, 'F': 147.06841, 'Y': 163.06333}
def protein_mass(sequence):
    mass = 0
    for aa in sequence:
        if aa not in weights:
            raise ValueError("Input sequence contains an illegal aa: %s" % aa)
        mass += weights[aa]
    return mass
seq = 'SKADYEK'
assert round(protein_mass(seq), 3) == 821.392
print("Success")
Success
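The assertion above only tests the normal case. The error path deserves a test too: an illegal amino acid should raise a ValueError. The sketch below repeats protein_mass so that it runs on its own, uses only the subset of the weights table needed for the test sequence, and uses 'X' as an assumed illegal letter.

```python
# Subset of the weights table above, enough for the test sequence
weights = {'S': 87.03203, 'K': 128.09496, 'A': 71.03711,
           'D': 115.02694, 'Y': 163.06333, 'E': 129.04259}

def protein_mass(sequence):
    mass = 0
    for aa in sequence:
        if aa not in weights:
            raise ValueError("Input sequence contains an illegal aa: %s" % aa)
        mass += weights[aa]
    return mass

# Normal case
assert round(protein_mass('SKADYEK'), 3) == 821.392

# Error case: 'X' is not in the table, so a ValueError is expected
try:
    protein_mass('SKXDYEK')
except ValueError as e:
    print("Caught expected error:", e)
else:
    raise AssertionError("Expected a ValueError for an illegal amino acid")
print("Success")
```

The else branch runs only if no exception was raised, so the test fails loudly if the check for illegal amino acids is ever removed.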
Open the notebook on your computer and sabotage the program by hiding exactly 5 bugs in the code.
Now, change seats with a partner and find the bugs that your partner hid in the code.
The protein mass problem appears in Rosalind. The Sabotage exercise is borrowed from a post in the Teach Computing blog by Alan O'Donohoe.
Debugging in Python by Hans Petter Langtangen. Some of the material here is borrowed from or influenced by this wonderful resource. Check it out for more debugging tips, examples and methods.
Sabotage: Teach Debugging By Stealth by Alan O'Donohoe
This notebook is part of the Python Programming for Life Sciences Graduate Students course given at Tel-Aviv University, Spring 2015.
The notebook was written using Python 3.4.1 and IPython 3.1 (download from PyZo, update with conda update ipython ipython-notebook).
The code is available at https://github.com/Py4Life/TAU2015/blob/master/lecture5.ipynb.
The notebook can be viewed online at http://nbviewer.ipython.org/github/Py4Life/TAU2015/blob/master/lecture5.ipynb.
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.