docker run -it --rm -v ${PWD}:/work -p 8888:8888 sfletcher/scram_docker
%matplotlib inline
# Display the output of every expression in a cell (e.g. several DataFrames), not just the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
import pandas
! tree
.
├── license
├── out_dir
├── ref
│   ├── GFP.fa
│   ├── TAIR10_transposable_elements.fa
│   └── ath_mir.fa
├── scram_demonstration.ipynb
└── seq
    ├── treatment_a_rep1.fa
    ├── treatment_a_rep2.fa
    ├── treatment_a_rep3.fa
    ├── treatment_b_rep1.fa
    ├── treatment_b_rep2.fa
    └── treatment_b_rep3.fa

3 directories, 11 files
!scram -h
Fast and simple small RNA read alignment v0.2.0

Usage:
  scram [command]

Available Commands:
  compare     Compare normalised alignment counts and standard errors for 2 read file sets
  help        Help about any command
  profile     Align reads of length l from 1 read file set to all sequences in a reference file

Flags:
      --adapter string         3' adapter sequence to trim - FASTA & FASTQ only (default "nil")
  -r, --alignTo string         path/to/FASTA reference file
  -1, --fastxSet1 string       comma-separated path/to/read file set 1. GZIPped files must have .gz file extension
  -h, --help                   help for scram
  -l, --length string          comma-separated read (sRNA) lengths to align
      --maxLen int             Maximum read length to include for RPMR normalization (default 32)
      --minCount float         Minimum read count for alignment and to include for RPMR normalization (default 1)
      --minLen int             Minimum read length to include for RPMR normalization (default 18)
      --noNorm                 Do not normalize read counts by library size (i.e. reads per million reads)
      --noSplit                Do not split alignment count for each read by the number of times it aligns
  -o, --outFilePrefix string   path/to/outfile prefix (len.csv will be appended)
  -t, --readFileType string    Read file type: cfa (collapsed FASTA), fa (FASTA), fq (FASTQ), clean (BGI clean.fa). (default "cfa")

Use "scram [command] --help" for more information about a command.
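The help text describes RPMR (reads per million reads) normalisation in terms of `--minLen`, `--maxLen` and `--minCount`, plus a per-read split that `--noSplit` turns off. A minimal sketch of that arithmetic follows, assuming the library size is simply the summed count of reads passing the length and count filters. This is an interpretation of the help text, not SCRAM's source code, and the function names are hypothetical.

```python
# Sketch of RPMR normalisation as described by the scram help text.
# ASSUMPTION: library size = summed count of reads within [min_len, max_len]
# whose count is >= min_count; SCRAM's actual implementation may differ.


def library_size(read_counts, min_len=18, max_len=32, min_count=1.0):
    """Total reads used for normalisation.

    read_counts maps a read sequence to its (collapsed) count.
    """
    return sum(
        count
        for seq, count in read_counts.items()
        if min_len <= len(seq) <= max_len and count >= min_count
    )


def rpmr(count, lib_size):
    """Scale a raw count to reads per million reads in the library."""
    return count * 1_000_000 / lib_size


def split_count(count, times_aligned):
    """Share a read's count equally across every position it aligns to
    (the default behaviour that --noSplit disables)."""
    return count / times_aligned


if __name__ == "__main__":
    reads = {"A" * 21: 600_000.0, "C" * 24: 400_000.0, "G" * 15: 50.0}
    size = library_size(reads)  # the 15 nt read falls below min_len
    print(size)
    print(rpmr(250.0, size))
    print(split_count(rpmr(250.0, size), 5))
```

The divisor in `split_count` corresponds to the "Times aligned" column that appears in the profile output later in this notebook.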
!scram compare -r ref/TAIR10_transposable_elements.fa \
-1 seq/treatment_a_rep1.fa,seq/treatment_a_rep2.fa,seq/treatment_a_rep3.fa \
-2 seq/treatment_b_rep1.fa,seq/treatment_b_rep2.fa,seq/treatment_b_rep3.fa \
-l 21,22,24 -o out_dir/treatment_a_vs_b
Loading reads
SCRAM is attempting to load read files in the default collapsed FASTA format
seq/treatment_a_rep1.fa - 7,916,958 reads processed
seq/treatment_a_rep2.fa - 7,827,082 reads processed
seq/treatment_a_rep3.fa - 7,897,787 reads processed
SCRAM is attempting to load read files in the default collapsed FASTA format
seq/treatment_b_rep3.fa - 9,185,811 reads processed
seq/treatment_b_rep2.fa - 8,311,241 reads processed
seq/treatment_b_rep1.fa - 8,203,718 reads processed
Loading reference
No. of reference sequences: 31189
Combined length of reference sequences: 23,315,940 nt
Aligning 21 nt reads
Aligning 22 nt reads
Aligning 24 nt reads
Alignment complete. Total time taken = 15.746887639s
comparison_alignment = pandas.read_csv('out_dir/treatment_a_vs_b_21.csv')
comparison_alignment.head()
| | Header | Mean count 1 | Std. err 1 | Mean count 2 | Std. err 2 |
|---|---|---|---|---|---|
| 0 | AT3TE56475\|+\|13785373\|13785436\|ATDNA12T3_2\|DNA... | 0.466 | 0.017302 | 0.427 | 0.022843 |
| 1 | AT3TE55845\|+\|13718067\|13718130\|ATDNA12T3_2\|DNA... | 0.953 | 0.019223 | 0.920 | 0.040779 |
| 2 | AT5TE53195\|+\|14754521\|14755395\|ATREP15\|RC/Heli... | 0.001 | 0.000003 | 0.000 | 0.000017 |
| 3 | AT3TE45810\|+\|11021715\|11022665\|BOMZH1\|DNA/MuDR... | 0.034 | 0.004299 | 0.027 | 0.004598 |
| 4 | AT3TE47200\|-\|11319279\|11319496\|HELITRONY3\|RC/H... | 0.039 | 0.002143 | 0.037 | 0.002122 |
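Each row holds the mean normalised count and standard error for read set 1 (treatment A, `-1`) and read set 2 (treatment B, `-2`). To rank references by how much they change between treatments, a sketch using only the standard library `csv` module is shown below. The `pseudocount` parameter is a hypothetical addition (not a SCRAM setting) that keeps zero means from dividing by zero.

```python
import csv
import io
import math


def log2_fold_changes(rows, pseudocount=0.01):
    """Rank references by log2(Mean count 2 / Mean count 1).

    rows: dicts from csv.DictReader over a `scram compare` CSV, keyed by the
    column names shown in the table above. Returns (header, log2FC) pairs,
    largest absolute change first; positive values are higher in set 2.
    """
    ranked = []
    for row in rows:
        mean_a = float(row["Mean count 1"]) + pseudocount
        mean_b = float(row["Mean count 2"]) + pseudocount
        ranked.append((row["Header"], math.log2(mean_b / mean_a)))
    ranked.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return ranked


if __name__ == "__main__":
    # In the notebook this would be open('out_dir/treatment_a_vs_b_21.csv', newline='');
    # a two-row stand-in keeps the example self-contained.
    demo = io.StringIO(
        "Header,Mean count 1,Std. err 1,Mean count 2,Std. err 2\n"
        "ref_x,1.0,0.01,4.0,0.02\n"
        "ref_y,2.0,0.01,1.0,0.02\n"
    )
    for header, lfc in log2_fold_changes(csv.DictReader(demo), pseudocount=0.0):
        print(header, lfc)
```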
%run /scram_plot/scram_plot.py compare -h
usage: scram_plot.py compare [-h] [-plot_type PLOT_TYPE] [-a ALIGNMENT] [-l LENGTH] [-xlab [X_LABEL [X_LABEL ...]]] [-ylab [Y_LABEL [Y_LABEL ...]]] [-html] [-pub] [-png] [-xylim XYLIM] [-fig_size FIG_SIZE]

optional arguments:
  -h, --help  show this help message and exit
  -plot_type PLOT_TYPE, --plot_type PLOT_TYPE  Bokeh plot type to display (log, log_error or all)
  -a ALIGNMENT, --alignment ALIGNMENT  sRNA alignment file prefix used by SCRAM profile (i.e. exclude _21.csv, _22.csv, _24.csv)
  -l LENGTH, --length LENGTH  Comma-separated list of sRNA lengths to plot. SCRAM alignment files must be available for each sRNA length. For an miRNA alignment file, use 'mir' instead of an integer
  -xlab [X_LABEL [X_LABEL ...]], --x_label [X_LABEL [X_LABEL ...]]  x label - corresponds to -s1 treatment in SCRAM arguments
  -ylab [Y_LABEL [Y_LABEL ...]], --y_label [Y_LABEL [Y_LABEL ...]]  y label - corresponds to -s2 treatment in SCRAM arguments
  -html, --html  If not using Jupyter Notebook, output interactive plot to browser as save to .html
  -pub, --publish  Remove all labels from profiles for editing for publication
  -png, --png  Export plot/s as 300 dpi .png file/s
  -xylim XYLIM, --xylim XYLIM  x and y max. axis limits
  -fig_size FIG_SIZE, --fig_size FIG_SIZE  Output plot dimensions
%run /scram_plot/scram_plot.py compare -a out_dir/treatment_a_vs_b \
-l 24 \
-xlab Treatment A (RPMR) \
-ylab Treatment B (RPMR)
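The `log_error` plot type shows standard errors alongside the means. As a tabular counterpart, the sketch below lists references whose difference in means exceeds a multiple of their combined standard error. It is a rough screen for points sitting well off the diagonal, not a statistical test, and the threshold `k=2.0` and the function names are hypothetical choices for illustration.

```python
import math


def exceeds_combined_error(row, k=2.0):
    """True if |Mean count 1 - Mean count 2| > k * sqrt(se1**2 + se2**2).

    row: one dict from csv.DictReader over a `scram compare` CSV.
    """
    mean_a, mean_b = float(row["Mean count 1"]), float(row["Mean count 2"])
    se_a, se_b = float(row["Std. err 1"]), float(row["Std. err 2"])
    # math.hypot(a, b) == sqrt(a*a + b*b)
    return abs(mean_a - mean_b) > k * math.hypot(se_a, se_b)


def flag_references(rows, k=2.0):
    """Headers of all rows that pass exceeds_combined_error."""
    return [row["Header"] for row in rows if exceeds_combined_error(row, k)]
```

For row 0 of the 21 nt table (0.466 vs 0.427, standard errors 0.017302 and 0.022843), the difference of 0.039 is below twice the combined error of about 0.029, so that reference would not be flagged.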
To align reads to miRNA sequences, the --mir flag is required, and the -l flag and its arguments are not needed, because reads of all lengths are aligned.
!scram compare -r ref/ath_mir.fa \
-1 seq/treatment_a_rep1.fa,seq/treatment_a_rep2.fa,seq/treatment_a_rep3.fa \
-2 seq/treatment_b_rep1.fa,seq/treatment_b_rep2.fa,seq/treatment_b_rep3.fa \
-o out_dir/treatment_a_vs_b_mir \
--mir
Loading reads
SCRAM is attempting to load read files in the default collapsed FASTA format
seq/treatment_a_rep3.fa - 7,897,787 reads processed
seq/treatment_a_rep1.fa - 7,916,958 reads processed
seq/treatment_a_rep2.fa - 7,827,082 reads processed
SCRAM is attempting to load read files in the default collapsed FASTA format
seq/treatment_b_rep3.fa - 9,185,811 reads processed
seq/treatment_b_rep1.fa - 8,203,718 reads processed
seq/treatment_b_rep2.fa - 8,311,241 reads processed
Loading reference
Alignment complete. Total time taken = 3.632545103s
To plot the miRNA comparison, enter mir as the argument to the -l flag so that all miRNA alignments are shown in a single plot.
%run /scram_plot/scram_plot.py compare -a out_dir/treatment_a_vs_b_mir \
-l mir \
-xlab Treatment A (RPMR) \
-ylab Treatment B (RPMR)
!scram profile -r ref/TAIR10_transposable_elements.fa \
-1 seq/treatment_a_rep1.fa,seq/treatment_a_rep2.fa,seq/treatment_a_rep3.fa \
-l 21,22,24 -o out_dir/treatment_a_profile
Loading reads
SCRAM is attempting to load read files in the default collapsed FASTA format
seq/treatment_a_rep2.fa - 7,827,082 reads processed
seq/treatment_a_rep3.fa - 7,897,787 reads processed
seq/treatment_a_rep1.fa - 7,916,958 reads processed
Loading reference
No. of reference sequences: 31189
Combined length of reference sequences: 23,315,940 nt
Aligning 21 nt reads
Aligning 22 nt reads
Aligning 24 nt reads
Alignment complete. Total time taken = 9.131281456s
profile_alignment = pandas.read_csv('out_dir/treatment_a_profile_21.csv')
profile_alignment.head()
| | Header | len | sRNA | Position | Strand | Count | Std. Err | Times aligned |
|---|---|---|---|---|---|---|---|---|
| 0 | AT1TE52125\|-\|15827287\|15838845\|ATHILA2\|LTR/Gyp... | 11559 | AAAAGGTCAAGAGACAAAGAT | 3237 | - | 0.004 | 0.000621 | 70 |
| 1 | AT1TE52125\|-\|15827287\|15838845\|ATHILA2\|LTR/Gyp... | 11559 | TAATCCGGATTTCTCTTTATC | 4253 | + | 0.003 | 0.000009 | 99 |
| 2 | AT1TE52125\|-\|15827287\|15838845\|ATHILA2\|LTR/Gyp... | 11559 | AGAAAACCTACTGTAAACTGT | 11514 | - | 0.068 | 0.008258 | 5 |
| 3 | AT1TE53750\|-\|16325184\|16327367\|ATREP3\|RC/Helit... | 2184 | ACTAGATTTTAACCCGCGGTA | 65 | + | 0.002 | 0.000265 | 164 |
| 4 | AT1TE53750\|-\|16325184\|16327367\|ATREP3\|RC/Helit... | 2184 | AAAAATAAATCGTCCCGCGGT | 87 | - | 0.007 | 0.000024 | 37 |
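Each profile row is one distinct sRNA aligned to one reference, with its strand and normalised count. To summarise sense versus antisense signal per reference, a sketch that sums the `Count` column per (`Header`, `Strand`) pair is shown below; the function name is hypothetical. Because counts are split across alignments by default (see `--noSplit`), the totals are in split RPMR units.

```python
from collections import defaultdict


def strand_totals(rows):
    """Sum 'Count' per (Header, Strand) for rows of a `scram profile` CSV.

    rows: dicts from csv.DictReader, keyed by the column names shown above.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[(row["Header"], row["Strand"])] += float(row["Count"])
    return dict(totals)
```

In the notebook, this could be applied as `strand_totals(csv.DictReader(open('out_dir/treatment_a_profile_21.csv', newline='')))` after `import csv`.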
%run /scram_plot/scram_plot.py profile -h
usage: scram_plot.py profile [-h] [-a ALIGNMENT] [-cutoff CUTOFF] [-s [SEARCH [SEARCH ...]]] [-l LENGTH] [-ylim YLIM] [-win WIN] [-pub] [-png] [-bin_reads]

optional arguments:
  -h, --help  show this help message and exit
  -a ALIGNMENT, --alignment ALIGNMENT  sRNA alignment file prefix used by SCRAM profile (i.e. exclude _21.csv, _22.csv, _24.csv)
  -cutoff CUTOFF, --cutoff CUTOFF  Min. alignment RPMR from the most abundant profile (if multi) to generate plot
  -s [SEARCH [SEARCH ...]], --search [SEARCH [SEARCH ...]]  Full header or substring of header. Without flag, all headers will be plotted
  -l LENGTH, --length LENGTH  Comma-separated list of sRNA lengths to plot. SCRAM alignment files must be available for each sRNA length
  -ylim YLIM, --ylim YLIM  +/- y axis limit
  -win WIN, --win WIN  Smoothing window size (default=auto)
  -pub, --publish  Remove all labels from profiles for editing for publication
  -png, --png  Export plot/s as 300 dpi .png file/s
  -bin_reads, --bin_reads  For plotting large profiles (i.e. chromosomes). Assigns reads to 10,000 bins prior to smoothing. X-axis shows bin, not reference position
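The `-bin_reads` option assigns reads to 10,000 bins before smoothing, so the x-axis shows bin rather than reference position. The sketch below shows one way to do equal-width binning of (position, count) pairs. It assumes positions are 1-based and is not scram_plot's own code; the function names are hypothetical.

```python
def bin_index(position, ref_length, n_bins=10_000):
    """Map a reference position to one of n_bins equal-width bins.

    ASSUMPTION: positions are 1-based and lie in [1, ref_length].
    """
    if not 1 <= position <= ref_length:
        raise ValueError(f"position {position} outside 1..{ref_length}")
    return (position - 1) * n_bins // ref_length


def bin_counts(position_counts, ref_length, n_bins=10_000):
    """Sum (position, count) pairs into a list of n_bins totals."""
    bins = [0.0] * n_bins
    for position, count in position_counts:
        bins[bin_index(position, ref_length, n_bins)] += count
    return bins
```

For a chromosome-scale reference this collapses millions of positions into a fixed-width array, which is what makes such profiles practical to plot.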
%run /scram_plot/scram_plot.py profile -s helitron \
-a out_dir/treatment_a_profile -l 21,22,24 \
-cutoff 30
Loading scram alignment files:
out_dir/treatment_a_profile_21.csv
out_dir/treatment_a_profile_22.csv
out_dir/treatment_a_profile_24.csv
Extracting headers:
Plotting: AT3TE33205|-|7920884|7922456|ATREP10B|RC/Helitron|1573 bp
Plotting: AT5TE60080|+|16641990|16642980|ATREP10D|RC/Helitron|991 bp
Plotting: AT4TE60630|-|12956832|12957720|ATREP3|RC/Helitron|889 bp
Plotting: AT3TE41655|+|10008501|10009413|ATREP5|RC/Helitron|913 bp
Plotting: AT2TE22655|+|5607445|5607739|ATREP10A|RC/Helitron|295 bp
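The search term `helitron` selected headers containing `RC/Helitron`, which suggests a case-insensitive substring match. The sketch below reproduces that selection, parses the pipe-delimited TAIR10 TE headers shown in the output, and applies one reading of `-cutoff`. The field names, the case-insensitivity, and the cutoff rule (the most abundant length profile's summed RPMR must reach the cutoff) are all assumptions.

```python
from typing import NamedTuple


class TEHeader(NamedTuple):
    """Fields of a header such as
    'AT3TE33205|-|7920884|7922456|ATREP10B|RC/Helitron|1573 bp'.
    The field names are hypothetical labels chosen for this sketch."""
    te_id: str
    orientation: str
    start: int
    end: int
    family: str
    superfamily: str
    length_bp: int


def parse_te_header(header):
    te_id, orientation, start, end, family, superfamily, length = header.split("|")
    return TEHeader(te_id, orientation, int(start), int(end),
                    family, superfamily, int(length.split()[0]))


def select_headers(headers, search):
    """ASSUMPTION: case-insensitive substring match on the full header."""
    needle = search.lower()
    return [h for h in headers if needle in h.lower()]


def passes_cutoff(total_by_length, cutoff):
    """ASSUMPTION: plot a header when its most abundant length profile
    (largest summed RPMR across the requested lengths) reaches cutoff."""
    return max(total_by_length.values(), default=0.0) >= cutoff


if __name__ == "__main__":
    headers = [
        "AT3TE33205|-|7920884|7922456|ATREP10B|RC/Helitron|1573 bp",
        "AT2TE22655|+|5607445|5607739|ATREP10A|RC/Helitron|295 bp",
    ]
    for h in select_headers(headers, "helitron"):
        te = parse_te_header(h)
        # start..end is inclusive, so end - start + 1 matches the 'bp' field
        print(te.te_id, te.superfamily, te.end - te.start + 1 == te.length_bp)
```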
!scram profile -r ref/GFP.fa \
-1 seq/treatment_a_rep1.fa,seq/treatment_a_rep2.fa,seq/treatment_a_rep3.fa \
-l 21,22,24 -o out_dir/treatment_a_profile_GFP
Loading reads
SCRAM is attempting to load read files in the default collapsed FASTA format
seq/treatment_a_rep3.fa - 7,897,787 reads processed
seq/treatment_a_rep1.fa - 7,916,958 reads processed
seq/treatment_a_rep2.fa - 7,827,082 reads processed
Loading reference
No. of reference sequences: 1
Combined length of reference sequences: 987 nt
Aligning 21 nt reads
Aligning 22 nt reads
Aligning 24 nt reads
Alignment complete. Total time taken = 1.719335469s
%run /scram_plot/scram_plot.py profile \
-a out_dir/treatment_a_profile_GFP -l 21,22,24
Loading scram alignment files:
out_dir/treatment_a_profile_GFP_21.csv
out_dir/treatment_a_profile_GFP_22.csv
out_dir/treatment_a_profile_GFP_24.csv
Extracting headers:
Plotting: GFP
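The GFP plot is built from three per-length files named `<prefix>_<length>.csv`, as the log shows. To compare length classes numerically before re-plotting a single one, a sketch that loads each file with `csv.DictReader` and sums its `Count` column is shown below. The function names are hypothetical, and the totals are in RPMR split across alignments (the default).

```python
import csv
from pathlib import Path


def load_profile_set(prefix, lengths):
    """Read <prefix>_<length>.csv for each length into lists of row dicts."""
    tables = {}
    for length in lengths:
        with Path(f"{prefix}_{length}.csv").open(newline="") as handle:
            tables[length] = list(csv.DictReader(handle))
    return tables


def total_count_by_length(tables):
    """Sum the 'Count' column of each length's profile."""
    return {
        length: sum(float(row["Count"]) for row in rows)
        for length, rows in tables.items()
    }
```

In the notebook, `total_count_by_length(load_profile_set('out_dir/treatment_a_profile_GFP', [21, 22, 24]))` would return one total per sRNA length.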
%run /scram_plot/scram_plot.py profile \
-a out_dir/treatment_a_profile_GFP -l 22
Loading scram alignment files:
out_dir/treatment_a_profile_GFP_22.csv
Extracting headers:
Plotting: GFP
%run /scram_plot/scram_plot.py profile \
-a out_dir/treatment_a_profile_GFP -l 22 -win 1
Loading scram alignment files:
out_dir/treatment_a_profile_GFP_22.csv
Extracting headers:
Plotting: GFP
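The `-win` flag sets the smoothing window size (default=auto), and the last plot uses `-win 1`. The smoothing method scram_plot applies is not described here. Assuming it behaves like a centred moving average, the sketch below shows why a window of 1 leaves the per-position profile unsmoothed. The function name is hypothetical.

```python
def moving_average(values, window):
    """Centred moving average over a per-position profile.

    window=1 returns the values unchanged (as floats). Near the ends, only
    the part of the window that overlaps the data is averaged, so the
    output has the same length as the input.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    left = (window - 1) // 2
    right = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - left): i + right + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

For example, `moving_average([0, 3, 0], 3)` returns `[1.5, 1.0, 1.5]`, whereas a window of 1 keeps the sharp peaks of the 22 nt GFP profile as they are.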