In [38]:

!python sandbox/bin-reads-by-abundance.py data/stamps-reads.fa.gz

... 0
PING
... 25000
... 50000
PING
PING
... 75000
... 100000
... 125000
... 150000
... 175000
199999 7359 9413 175 183053

In [46]:

!python scripts/abundance-dist-single.py -s -k 20 -x 1e8 stamps-reads.fa.gz.bink.2 bink2.hist
!python scripts/abundance-dist-single.py -s -k 20 -x 1e8 stamps-reads.fa.gz.bink.1 bink1.hist

PARAMETERS:
 - kmer size =    20 		(-k)
 - n hashes =     4 		(-N)
 - min hashsize = 1e+08 	(-x)

Estimated memory usage is 4e+08 bytes (n_hashes x min_hashsize)
--------
making hashtable
building tracking ht
K: 20
HT sizes: [100000007, 100000037, 100000039, 100000049]
outputting to bink2.hist
consuming input, round 1 -- stamps-reads.fa.gz.bink.2
preparing hist from stamps-reads.fa.gz.bink.2...
consuming input, round 2 -- stamps-reads.fa.gz.bink.2

PARAMETERS:
 - kmer size =    20 		(-k)
 - n hashes =     4 		(-N)
 - min hashsize = 1e+08 	(-x)

Estimated memory usage is 4e+08 bytes (n_hashes x min_hashsize)
--------
making hashtable
building tracking ht
K: 20
HT sizes: [100000007, 100000037, 100000039, 100000049]
outputting to bink1.hist
consuming input, round 1 -- stamps-reads.fa.gz.bink.1
preparing hist from stamps-reads.fa.gz.bink.1...
consuming input, round 2 -- stamps-reads.fa.gz.bink.1

In [47]:

raw = numpy.loadtxt('raw-reads.hist')
bink = numpy.loadtxt('bink-reads.hist')
bink1 = numpy.loadtxt('bink1.hist')
bink2 = numpy.loadtxt('bink2.hist')

In [48]:

plot(raw[:,0], raw[:,1])
plot(bink[:,0], bink[:,1])
axis(ymax=200)

Out[48]:

(0.0, 3500.0, 0.0, 200)

In [49]:

plot(raw[:,0], raw[:,1])
plot(bink[:,0], bink[:,1])
axis(ymax=200, xmax=200)

Out[49]:

(0.0, 200, 0.0, 200)

In [49]:

In [50]:

plot(bink1[:,0], bink1[:,1])
plot(bink2[:,0], bink2[:,1])
axis(ymax=800)

Out[50]:

(0.0, 120.0, 0.0, 800)

In [51]:

!python scripts/load-into-counting.py -x 1e7 -k 20 bink1.kh stamps-reads.fa.gz.bink.1
!python scripts/abundance-dist.py -s bink1.kh a.fa bink1.x.a.hist
!python scripts/abundance-dist.py -s bink1.kh b.fa bink1.x.b.hist

!python scripts/load-into-counting.py -x 1e7 -k 20 bink2.kh stamps-reads.fa.gz.bink.2
!python scripts/abundance-dist.py -s bink2.kh a.fa bink2.x.a.hist
!python scripts/abundance-dist.py -s bink2.kh b.fa bink2.x.b.hist

PARAMETERS:
 - kmer size =    20 		(-k)
 - n hashes =     4 		(-N)
 - min hashsize = 1e+07 	(-x)

Estimated memory usage is 4e+07 bytes (n_hashes x min_hashsize)
--------
Saving hashtable to bink1.kh
Loading kmers from sequences in ['stamps-reads.fa.gz.bink.1']
making hashtable
consuming input stamps-reads.fa.gz.bink.1
saving bink1.kh
fp rate estimated to be 0.000
DONE.
hashtable from bink1.kh
K: 20
HT sizes: [10000019, 10000079, 10000103, 10000121]
outputting to bink1.x.a.hist
** squashing existing file bink1.x.a.hist
preparing hist...
hashtable from bink1.kh
K: 20
HT sizes: [10000019, 10000079, 10000103, 10000121]
outputting to bink1.x.b.hist
** squashing existing file bink1.x.b.hist
preparing hist...

PARAMETERS:
 - kmer size =    20 		(-k)
 - n hashes =     4 		(-N)
 - min hashsize = 1e+07 	(-x)

Estimated memory usage is 4e+07 bytes (n_hashes x min_hashsize)
--------
Saving hashtable to bink2.kh
Loading kmers from sequences in ['stamps-reads.fa.gz.bink.2']
making hashtable
consuming input stamps-reads.fa.gz.bink.2
saving bink2.kh
fp rate estimated to be 0.000
DONE.
hashtable from bink2.kh
K: 20
HT sizes: [10000019, 10000079, 10000103, 10000121]
outputting to bink2.x.a.hist
** squashing existing file bink2.x.a.hist
preparing hist...
hashtable from bink2.kh
K: 20
HT sizes: [10000019, 10000079, 10000103, 10000121]
outputting to bink2.x.b.hist
** squashing existing file bink2.x.b.hist
preparing hist...

In [52]:

bink1xa = numpy.loadtxt('bink1.x.a.hist')
bink1xb = numpy.loadtxt('bink1.x.b.hist')
bink2xa = numpy.loadtxt('bink2.x.a.hist')
bink2xb = numpy.loadtxt('bink2.x.b.hist')

In [53]:

plot(bink1xa[:,0], bink1xa[:,1], label='bink1 x a')
plot(bink1xb[:,0], bink1xb[:,1], label='bink1 x b')
plot(bink2xa[:,0], bink2xa[:,1], label='bink2 x a')
plot(bink2xb[:,0], bink2xb[:,1], label='bink2 x b')
legend()

Out[53]:

<matplotlib.legend.Legend at 0x10c13cfd0>

In [53]:

In [ ]: