Unit tests

This is an example of unit testing with nose. We are trying to make sure that the function calc_gc correctly calculates the GC fraction of a DNA sequence.

Problems worked through in class included:

  1. the sequence contained 'N's
  2. the sequence contained lowercase characters
  3. divide by zero for sequences with no A, T, C, G
In [1]:
%%file calc_gc.py
def calc_gc(sequence):
    sequence = sequence.upper()                    # make all chars uppercase
    n = sequence.count('T') + sequence.count('A')  # count only A, T,
    m = sequence.count('G') + sequence.count('C')  # C, and G -- nothing else (no Ns, Rs, Ws, etc.)
    if n + m == 0:
        return 0.                                  # avoid divide-by-zero
    return float(m) / float(n + m)

def test_1():
    result = round(calc_gc('ATGGCAT'), 2)
    print 'hello, this is a test; the value of result is', result
    assert result == 0.43
def test_2(): # test handling N
    result = round(calc_gc('NATGC'), 2)
    assert result == 0.5, result
def test_3(): # test handling lowercase
    result = round(calc_gc('natgc'), 2)
    assert result == 0.5, result
Overwriting calc_gc.py
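
The three tests above cover the first two problems but not the divide-by-zero case. A minimal sketch of such a test follows, with calc_gc repeated so the block runs on its own. The name test_no_acgt is hypothetical and was not part of the class material.

```python
def calc_gc(sequence):
    # same logic as calc_gc.py, repeated here so the block is self-contained
    sequence = sequence.upper()
    n = sequence.count('T') + sequence.count('A')
    m = sequence.count('G') + sequence.count('C')
    if n + m == 0:
        return 0.                  # avoid divide-by-zero
    return float(m) / float(n + m)

def test_no_acgt():  # hypothetical name: sequences with no A, T, C, or G
    assert calc_gc('NNNN') == 0.   # only ambiguous bases
    assert calc_gc('') == 0.       # empty sequence
```

Appended to calc_gc.py, this should be picked up by nosetests alongside the other tests.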

Running nosetests

Here, the 'nosetests' command looks through calc_gc.py, finds all functions whose names begin with 'test' (test_1, test_2, and test_3 here), and runs them.
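
To make the idea of name-based discovery concrete, here is a rough sketch of a tiny test runner. It is a simplified stand-in and not nose's actual implementation. The names find_tests, run_tests, and example_namespace are hypothetical, and the rule used here (a function whose name starts with 'test') is an assumption that is narrower than nose's real matching.

```python
import types

def find_tests(namespace):
    """Return (name, function) pairs whose names start with 'test', sorted by name."""
    return sorted(
        (name, obj) for name, obj in namespace.items()
        if isinstance(obj, types.FunctionType) and name.startswith('test')
    )

def run_tests(namespace):
    """Run each discovered test; map its name to 'ok' or 'FAIL'."""
    results = {}
    for name, func in find_tests(namespace):
        try:
            func()
            results[name] = 'ok'
        except AssertionError:
            results[name] = 'FAIL'
    return results

def example_namespace():
    """Build a small namespace with one passing test, one failing test, and a helper."""
    def test_passes():
        assert 1 + 1 == 2
    def test_fails():
        assert 1 + 1 == 3
    def helper():      # not collected: name does not start with 'test'
        pass
    return {'test_passes': test_passes, 'test_fails': test_fails, 'helper': helper}

results = run_tests(example_namespace())
```

A failing assert raises AssertionError, which is how both this sketch and the tests in calc_gc.py signal a failure.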

In [2]:
!nosetests calc_gc.py
Ran 3 tests in 0.001s


You can also run nosetests with a '-v' option:

In [3]:
!nosetests -v calc_gc.py
calc_gc.test_1 ... ok
calc_gc.test_2 ... ok
calc_gc.test_3 ... ok

Ran 3 tests in 0.001s


Regression testing

Here I'm going to set up some regression tests, where we're simply comparing the output of a previously run script with the output of that script now. If we're running on the same data, we should get the same answer... right?

The script just calculates the GC content of each sequence in 25k.fq.gz and prints the average of those per-sequence values.

In [4]:
%%file gc-of-seqs.py
import sys
import screed
import calc_gc

filename = sys.argv[1]    # take the sequence filename in from the command line
total_gc = []
for record in screed.open(filename):
    gc = calc_gc.calc_gc(record.sequence)
    total_gc.append(gc)   # collect the GC fraction of each sequence
print sum(total_gc) / float(len(total_gc))
Overwriting gc-of-seqs.py
In [5]:
# run the script and look at the output -- then write that output into the following file.
!python gc-of-seqs.py 25k.fq.gz
In [6]:
%%file test_gc_script.py
import subprocess

correct_output = "0.607911191366\n"   # this is taken from the previous exec'd cell

# the following function checks to see if running this script at the command line
# returns the right result.  make sure you're running this from *within* the python/ subdirectory
# of the 2012-11-scripps/ repository.
def test_run():
    p = subprocess.Popen('python gc-of-seqs.py 25k.fq.gz', shell=True, stdout=subprocess.PIPE)
    (stdout, stderr) = p.communicate()
    assert stdout == correct_output
Overwriting test_gc_script.py
In [7]:
!nosetests test_gc_script.py
Ran 1 test in 0.937s
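
Comparing stdout as an exact string ties the test to how Python formats floats, so it can fail when the printed precision changes even though the number itself has not. One alternative, sketched here, is to parse both outputs and compare them numerically. The helper name outputs_match and the tolerance value are assumptions chosen for illustration.

```python
def outputs_match(actual, expected, tolerance=1e-9):
    """Compare two script outputs as floats rather than as exact strings."""
    return abs(float(actual.strip()) - float(expected.strip())) <= tolerance

correct_output = "0.607911191366\n"   # the value recorded in test_gc_script.py

# same number printed with more digits: still counts as a match
same_value = outputs_match("0.6079111913660001\n", correct_output)

# a genuinely different answer: does not match
different_value = outputs_match("0.61\n", correct_output)
```

In test_run, the final line could then read `assert outputs_match(stdout, correct_output)`. The tolerance should be small enough that a real change in the result still fails the test.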