!python sandbox/bin-reads-by-abundance.py data/stamps-reads.fa.gz
... 0 PING ... 25000 ... 50000 PING PING ... 75000 ... 100000 ... 125000 ... 150000 ... 175000 199999 7359 9413 175 183053
!python scripts/abundance-dist-single.py -s -k 20 -x 1e8 stamps-reads.fa.gz.bink.2 bink2.hist
!python scripts/abundance-dist-single.py -s -k 20 -x 1e8 stamps-reads.fa.gz.bink.1 bink1.hist
PARAMETERS:
 - kmer size = 20 (-k)
 - n hashes = 4 (-N)
 - min hashsize = 1e+08 (-x)
Estimated memory usage is 4e+08 bytes (n_hashes x min_hashsize)
--------
making hashtable
building tracking ht
K: 20
HT sizes: [100000007, 100000037, 100000039, 100000049]
outputting to bink2.hist
consuming input, round 1 -- stamps-reads.fa.gz.bink.2
preparing hist from stamps-reads.fa.gz.bink.2...
consuming input, round 2 -- stamps-reads.fa.gz.bink.2

PARAMETERS:
 - kmer size = 20 (-k)
 - n hashes = 4 (-N)
 - min hashsize = 1e+08 (-x)
Estimated memory usage is 4e+08 bytes (n_hashes x min_hashsize)
--------
making hashtable
building tracking ht
K: 20
HT sizes: [100000007, 100000037, 100000039, 100000049]
outputting to bink1.hist
consuming input, round 1 -- stamps-reads.fa.gz.bink.1
preparing hist from stamps-reads.fa.gz.bink.1...
consuming input, round 2 -- stamps-reads.fa.gz.bink.1
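The memory figure in the log above is simple arithmetic, and it can be useful to check it before choosing a value for `-x`. The following is a minimal sketch, not part of the khmer scripts: the function name `estimated_memory_bytes` is hypothetical, and the one-byte-per-entry assumption is inferred from the "n_hashes x min_hashsize" bytes figure that the log prints.

```python
# Sketch of the memory estimate printed in the log above.
# Assumption: each table entry is a one-byte counter, as the
# "n_hashes x min_hashsize" bytes figure in the log implies.

def estimated_memory_bytes(n_hashes, min_hashsize):
    """Estimate: number of tables times the requested minimum table size."""
    return n_hashes * min_hashsize

# Values from the log: n hashes = 4 (-N was not passed on the command
# line) and -x 1e8.
estimate = estimated_memory_bytes(4, 1e8)

# The table sizes the script actually reports ("HT sizes" in the log).
ht_sizes = [100000007, 100000037, 100000039, 100000049]
actual = sum(ht_sizes)

print("estimated: %.3g bytes" % estimate)
print("actual:    %d bytes" % actual)
print("overhead:  %d bytes" % (actual - int(estimate)))
```

Each reported table is slightly larger than the requested minimum, so the actual total sits just above the estimate. Under the same assumption, the later runs with `-x 1e7` would need about a tenth of this.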
raw = numpy.loadtxt('raw-reads.hist')
bink = numpy.loadtxt('bink-reads.hist')
bink1 = numpy.loadtxt('bink1.hist')
bink2 = numpy.loadtxt('bink2.hist')
plot(raw[:,0], raw[:,1])
plot(bink[:,0], bink[:,1])
axis(ymax=200)
(0.0, 3500.0, 0.0, 200)
plot(raw[:,0], raw[:,1])
plot(bink[:,0], bink[:,1])
axis(ymax=200, xmax=200)
(0.0, 200, 0.0, 200)
plot(bink1[:,0], bink1[:,1])
plot(bink2[:,0], bink2[:,1])
axis(ymax=800)
(0.0, 120.0, 0.0, 800)
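Comparing the two bins by eye works for a quick look; a numeric summary can make the comparison easier to repeat. The sketch below is an illustration only: `peak_abundance` is a hypothetical helper, the toy data is made up, and it assumes (as the plots above do) that the first column of each `.hist` file is the k-mer abundance and the second is the number of k-mers at that abundance.

```python
def peak_abundance(rows, min_abundance=2):
    """Return the (abundance, count) pair with the largest count.

    Rows with abundance below min_abundance are ignored; abundance-1
    k-mers are often dominated by sequencing errors, so skipping them
    is a common (but optional) choice.
    """
    candidates = [(a, c) for a, c in rows if a >= min_abundance]
    if not candidates:
        raise ValueError("no rows at or above min_abundance")
    # max() returns the first maximal pair if counts are tied.
    return max(candidates, key=lambda pair: pair[1])

# Made-up histogram for illustration only; not taken from the data above.
toy = [(1, 900), (2, 40), (3, 55), (4, 70), (5, 62), (6, 30)]
peak = peak_abundance(toy)
print(peak)
```

With the arrays loaded earlier, a call such as `peak_abundance(bink1[:, :2].tolist())` should give the peak of the first bin, and the same call on `bink2` the peak of the second.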
!python scripts/load-into-counting.py -x 1e7 -k 20 bink1.kh stamps-reads.fa.gz.bink.1
!python scripts/abundance-dist.py -s bink1.kh a.fa bink1.x.a.hist
!python scripts/abundance-dist.py -s bink1.kh b.fa bink1.x.b.hist
!python scripts/load-into-counting.py -x 1e7 -k 20 bink2.kh stamps-reads.fa.gz.bink.2
!python scripts/abundance-dist.py -s bink2.kh a.fa bink2.x.a.hist
!python scripts/abundance-dist.py -s bink2.kh b.fa bink2.x.b.hist
PARAMETERS:
 - kmer size = 20 (-k)
 - n hashes = 4 (-N)
 - min hashsize = 1e+07 (-x)
Estimated memory usage is 4e+07 bytes (n_hashes x min_hashsize)
--------
Saving hashtable to bink1.kh
Loading kmers from sequences in ['stamps-reads.fa.gz.bink.1']
making hashtable
consuming input stamps-reads.fa.gz.bink.1
saving bink1.kh
fp rate estimated to be 0.000
DONE.
hashtable from bink1.kh
K: 20
HT sizes: [10000019, 10000079, 10000103, 10000121]
outputting to bink1.x.a.hist
** squashing existing file bink1.x.a.hist
preparing hist...
hashtable from bink1.kh
K: 20
HT sizes: [10000019, 10000079, 10000103, 10000121]
outputting to bink1.x.b.hist
** squashing existing file bink1.x.b.hist
preparing hist...

PARAMETERS:
 - kmer size = 20 (-k)
 - n hashes = 4 (-N)
 - min hashsize = 1e+07 (-x)
Estimated memory usage is 4e+07 bytes (n_hashes x min_hashsize)
--------
Saving hashtable to bink2.kh
Loading kmers from sequences in ['stamps-reads.fa.gz.bink.2']
making hashtable
consuming input stamps-reads.fa.gz.bink.2
saving bink2.kh
fp rate estimated to be 0.000
DONE.
hashtable from bink2.kh
K: 20
HT sizes: [10000019, 10000079, 10000103, 10000121]
outputting to bink2.x.a.hist
** squashing existing file bink2.x.a.hist
preparing hist...
hashtable from bink2.kh
K: 20
HT sizes: [10000019, 10000079, 10000103, 10000121]
outputting to bink2.x.b.hist
** squashing existing file bink2.x.b.hist
preparing hist...
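The four `abundance-dist.py` calls above follow one pattern: each bin's counting table is compared against each reference file. A sketch of generating those command lines rather than typing them out is shown below. The helper name `abundance_dist_command` is hypothetical; the script path, the `-s` flag, and the file names are copied from the cells above.

```python
from itertools import product

# Bin and reference names as used in the commands above.
bins = ["bink1", "bink2"]
refs = ["a", "b"]

def abundance_dist_command(bin_name, ref_name):
    """Build one command line: <bin>.kh against <ref>.fa -> <bin>.x.<ref>.hist."""
    return (
        "python scripts/abundance-dist.py -s "
        "{b}.kh {r}.fa {b}.x.{r}.hist".format(b=bin_name, r=ref_name)
    )

# product() yields bink1 x a, bink1 x b, bink2 x a, bink2 x b, in that order.
commands = [abundance_dist_command(b, r) for b, r in product(bins, refs)]

for cmd in commands:
    print(cmd)
```

The printed lines can be pasted into `!` cells as above. Keeping the naming scheme in one place also makes it harder to mismatch a table and an output file when more bins are added.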
bink1xa = numpy.loadtxt('bink1.x.a.hist')
bink1xb = numpy.loadtxt('bink1.x.b.hist')
bink2xa = numpy.loadtxt('bink2.x.a.hist')
bink2xb = numpy.loadtxt('bink2.x.b.hist')
plot(bink1xa[:,0], bink1xa[:,1], label='bink1 x a')
plot(bink1xb[:,0], bink1xb[:,1], label='bink1 x b')
plot(bink2xa[:,0], bink2xa[:,1], label='bink2 x a')
plot(bink2xb[:,0], bink2xb[:,1], label='bink2 x b')
legend()
<matplotlib.legend.Legend at 0x10c13cfd0>