Remember to go to 'File' and 'Make a copy'¶

In [1]:

cd /mnt

/mnt

Change the name below to something unique to you¶

In [2]:

NetID='titus'

In [3]:

mkdir -p $NetID

In [4]:

cd $NetID

/mnt/titus

You should now be in '/mnt/USERNAME', where USERNAME is not CHANGEME

Make two random genomes with a 10:1 abundance¶

In [5]:

# configure some stuff...
import sys
sys.path.insert(0, '/usr/local/share/khmer/python')
import os
os.environ['PYTHONPATH'] = '/usr/local/share/khmer/python'
import khmer

In [6]:

import random
random.seed(1)

x = ["A"] + ["G"] + ["C"] + ["T"]
x = x*1000
random.shuffle(x)
x = "".join(x)

y = ["A"] + ["G"] + ["C"] + ["T"]
y = y*1000
random.shuffle(y)
y = "".join(y)

print 'x is', x[:100]
print 'y is', y[:100]

x is CAGACTTGGAAGCTGAGAGTCCGACGTCACTGCCTCAACTCGCGCAAATGTTCCCGCCAAATTGTATCCTAGGGATCTTCCATAAGCTTATATACGGGGG
y is ATTAGACCACATGTTGCTAGCTATTCGGAGGAAGGCAGTTTAGCTAAATATGTGAGGGCTGTGGGACGAATATCGGGGGATGACGGTCCCATAGCTAGTT

In [7]:

outfp = open('metagenome.fa', 'w')
print >>outfp, ">x 1"
print >>outfp, x
print >>outfp, ">y 2"
print >>outfp, y
outfp.close()

In [8]:

!python /usr/local/share/2012-paper-diginorm/pipeline/make-biased-reads.py metagenome.fa | head -100000 > reads.fa

Traceback (most recent call last):
  File "/usr/local/share/2012-paper-diginorm/pipeline/make-biased-reads.py", line 56, in <module>
    print '>read%d\n%s' % (i, read)
IOError: [Errno 32] Broken pipe

(Yes, you should see an error.)

Now, calculate a k-mer abundance histogram for k=20¶

In [9]:

!/usr/local/share/khmer/scripts/abundance-dist-single.py -k 20 -x 1e8 reads.fa reads.hist

PARAMETERS:
 - kmer size =    20 		(-k)
 - n hashes =     4 		(-N)
 - min hashsize = 1e+08 	(-x)

Estimated memory usage is 4e+08 bytes (n_hashes x min_hashsize)
--------
making hashtable
building tracking ht
K: 20
HT sizes: [100000007, 100000037, 100000039, 100000049]
outputting to reads.hist
consuming input, round 1 -- reads.fa
preparing hist from reads.fa...
consuming input, round 2 -- reads.fa

In [10]:

histdata = numpy.loadtxt('reads.hist')

In [11]:

plot(histdata[:,0], histdata[:,1])
xlabel("k-mer abundance")
ylabel("N of k-mers with that abundance")

Out[11]:

<matplotlib.text.Text at 0x7fdfac1fdc10>

In [12]:

plot(histdata[:,0], histdata[:,1])
axis(ymax=500)
xlabel("k-mer abundance")
ylabel("N of k-mers with that abundance")

Out[12]:

<matplotlib.text.Text at 0x7fdfac1f3450>

In [13]:

plot(histdata[:,0], histdata[:,1])
axis(xmax=10)

Out[13]:

(0.0, 10, 0.0, 160000.0)

Questions to answer¶

How do the 3 peaks (1, 100, 800) shift relative to each other with (a) changing coverage and (b) different k values?

The '-k' parameter is set up above in the abundance-dist-single command; the coverage is set by how many sequences you keep ('head -10000') in the make-biased-reads.py command.

In [13]: