Developing Canonical Tracks for IGV Oyster Genome Browser

This is my attempt to derive fundamental genomic tracks for the oyster genome that can be easily visualized.

Will use the full genome as scaffold (should be in cnidaria)

Launch IGV

clicking this file opens the app

[Screenshot 5/27/13 12:14 PM]

Load genome via url

[Screenshot 5/27/13 12:16 PM]

Exons and Genes

Derived from

ftp://climb.genomics.cn/pub/10.5524/100001_101000/100030/gene_v9/

gene:
    gene_v9/oyster.v9.glean.final.rename.gff.gz gene feature of pacific oyster in gff format
    gene_v9/oyster.v9.glean.final.rename.gff.cds.gz coding sequence of pacific oyster in fasta format
    gene_v9/oyster.v9.glean.final.rename.gff.pep.gz protein sequence of pacific oyster in fasta format
In [3]:
!head /Volumes/Bay4\ scratch/oyster.v9.glean.final.rename.gff
C16582	GLEAN	mRNA	35	385	0.555898	-	.	ID=CGI_10000001;
C16582	GLEAN	CDS	35	385	.	-	0	Parent=CGI_10000001;
C17212	GLEAN	mRNA	31	363	0.999572	+	.	ID=CGI_10000002;
C17212	GLEAN	CDS	31	363	.	+	0	Parent=CGI_10000002;
C17316	GLEAN	mRNA	30	257	0.555898	+	.	ID=CGI_10000003;
C17316	GLEAN	CDS	30	257	.	+	0	Parent=CGI_10000003;
C17476	GLEAN	mRNA	34	257	0.998947	-	.	ID=CGI_10000004;
C17476	GLEAN	CDS	104	257	.	-	0	Parent=CGI_10000004;
C17476	GLEAN	CDS	34	74	.	-	2	Parent=CGI_10000004;
C17998	GLEAN	mRNA	196	387	1	-	.	ID=CGI_10000005;
In [12]:
!wc /Volumes/Bay4\ scratch/oyster.v9.glean.final.rename.gff
  224718 2022462 14179523 /Volumes/Bay4 scratch/oyster.v9.glean.final.rename.gff
In [5]:
#not quite a GFF!
!head /Volumes/Bay4\ scratch/oyster.v9.glean.final.rename.gff.pep
>CGI_10000780
MERYGARRLRMTIWETTRNGQLQTTHLGSILFILVMMYACVFCRVSLKNG
EEITQLREKGCNTVNRTSQTRNNTIVTTPGQKVHQKCRRDYINANSIKNY
MREKDVSITEPTRDLRSSTPDFEFQKNCLFCGYFAKFSECKRGIDVFPVR
TTDFSNTLRNICKKRNDEWSEIVLRRLNIAPSDLHAADAIYHQTCSVNFR
TGQQIPVSKQANKMVEKGIKTKHADADADVLIALTAIESAKTKPTVLLGE
DTDLLVLLLHHADVTSNSLIFKSGNVSKVNTHIKIWDILKTKVLLGEELC
TLLPLIHAISGCDTTSRMFGVSKAATLKKFAEHDFLKTRQLLCNANAKDD
VISAGENIISSLYNGAPYEELNVLRYRKFAARVLTNKTCVQIHTLPPTSN
AASFHSQRAYLQMKMWMNEDNLNPCEWGWKVANGNLVPVKCTVKLPLNC
In [7]:
#not quite a GFF!
!head /Volumes/Bay4\ scratch/oyster.v9.glean.final.rename.gff.CDS
>CGI_10000780
ATGGAAAGATATGGCGCCCGTAGATTAAGAATGACGATATGGGAGACAAC
TCGTAATGGTCAACTGCAGACGACGCATCTAGGTTCCATCCTTTTCATTC
TGGTAATGATGTATGCTTGTGTTTTTTGTCGGGTGTCTCTAAAAAATGGT
GAAGAAATAACACAACTAAGAGAAAAAGGATGTAACACAGTTAATAGGAC
CAGCCAAACCAGAAATAATACAATCGTCACAACTCCAGGACAAAAAGTTC
ATCAGAAATGTCGACGTGATTACATTAATGCTAACTCAATCAAGAATTAC
ATGCGAGAAAAGGATGTATCGATAACCGAGCCAACTCGTGACTTACGATC
TTCTACTCCTGATTTTGAGTTCCAGAAGAACTGTTTATTTTGTGGATATT
TTGCAAAATTTTCAGAATGCAAAAGGGGAATCGACGTGTTTCCTGTCAGG

Specifically, (/Volumes/Bay4\ scratch/oyster.v9.glean.final.rename.gff) was parsed to Exon (CDS) and full gene (mRNA).
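
The notebook does not show how the split was done, so the following is only a minimal sketch of one way to do it, assuming the split keys on the feature type in column 3. The function name `split_gff_by_feature` is hypothetical; the sample records are the first nine lines of the head output above.

```python
def split_gff_by_feature(lines, wanted=("mRNA", "CDS")):
    """Group GFF records by the feature type in column 3."""
    groups = {feature: [] for feature in wanted}
    for line in lines:
        record = line.rstrip("\n")
        # Skip blank lines and GFF comment/pragma lines.
        if not record.strip() or record.startswith("#"):
            continue
        fields = record.split("\t")
        if len(fields) >= 9 and fields[2] in groups:
            groups[fields[2]].append(record)
    return groups


# First nine records from the head output of the source GFF.
sample = [
    "C16582\tGLEAN\tmRNA\t35\t385\t0.555898\t-\t.\tID=CGI_10000001;",
    "C16582\tGLEAN\tCDS\t35\t385\t.\t-\t0\tParent=CGI_10000001;",
    "C17212\tGLEAN\tmRNA\t31\t363\t0.999572\t+\t.\tID=CGI_10000002;",
    "C17212\tGLEAN\tCDS\t31\t363\t.\t+\t0\tParent=CGI_10000002;",
    "C17316\tGLEAN\tmRNA\t30\t257\t0.555898\t+\t.\tID=CGI_10000003;",
    "C17316\tGLEAN\tCDS\t30\t257\t.\t+\t0\tParent=CGI_10000003;",
    "C17476\tGLEAN\tmRNA\t34\t257\t0.998947\t-\t.\tID=CGI_10000004;",
    "C17476\tGLEAN\tCDS\t104\t257\t.\t-\t0\tParent=CGI_10000004;",
    "C17476\tGLEAN\tCDS\t34\t74\t.\t-\t2\tParent=CGI_10000004;",
]

groups = split_gff_by_feature(sample)
print(len(groups["mRNA"]), len(groups["CDS"]))  # prints: 4 5
```

Because every record in the source file is either mRNA or CDS, the two output files should add up to the original line count, which is what the wc check further down confirms.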

In [8]:
!head /Volumes/web/cnidarian/oyster.v9.glean.final.rename.CDS.gff
C16582	GLEAN	CDS	35	385	.	-	0	Parent=CGI_10000001;
C17212	GLEAN	CDS	31	363	.	+	0	Parent=CGI_10000002;
C17316	GLEAN	CDS	30	257	.	+	0	Parent=CGI_10000003;
C17476	GLEAN	CDS	104	257	.	-	0	Parent=CGI_10000004;
C17476	GLEAN	CDS	34	74	.	-	2	Parent=CGI_10000004;
C17998	GLEAN	CDS	196	387	.	-	0	Parent=CGI_10000005;
C18346	GLEAN	CDS	174	551	.	+	0	Parent=CGI_10000009;
C18428	GLEAN	CDS	286	546	.	-	0	Parent=CGI_10000010;
C18964	GLEAN	CDS	203	658	.	-	0	Parent=CGI_10000011;
C18980	GLEAN	CDS	30	674	.	+	0	Parent=CGI_10000012;
In [9]:
!head /Volumes/web/cnidarian/oyster.v9.glean.final.rename.mRNA.gff
C16582	GLEAN	mRNA	35	385	0.555898	-	.	ID=CGI_10000001;
C17212	GLEAN	mRNA	31	363	0.999572	+	.	ID=CGI_10000002;
C17316	GLEAN	mRNA	30	257	0.555898	+	.	ID=CGI_10000003;
C17476	GLEAN	mRNA	34	257	0.998947	-	.	ID=CGI_10000004;
C17998	GLEAN	mRNA	196	387	1	-	.	ID=CGI_10000005;
C18346	GLEAN	mRNA	174	551	1	+	.	ID=CGI_10000009;
C18428	GLEAN	mRNA	286	546	0.555898	-	.	ID=CGI_10000010;
C18964	GLEAN	mRNA	203	658	0.999572	-	.	ID=CGI_10000011;
C18980	GLEAN	mRNA	30	674	0.555898	+	.	ID=CGI_10000012;
C19100	GLEAN	mRNA	160	681	0.999955	-	.	ID=CGI_10000013;
In [11]:
!wc /Volumes/web/cnidarian/oyster.v9.glean.final.rename.CDS.gff
  196691 1770219 12359791 /Volumes/web/cnidarian/oyster.v9.glean.final.rename.CDS.gff
In [13]:
!wc /Volumes/web/cnidarian/oyster.v9.glean.final.rename.mRNA.gff
   28027  252243 1819732 /Volumes/web/cnidarian/oyster.v9.glean.final.rename.mRNA.gff
In [16]:
#check to make sure files add up.
196691 + 28027
Out[16]:
224718
In [20]:
cp /Volumes/web/cnidarian/oyster.v9.glean.final.rename.CDS.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff
In [21]:
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff
C16582	GLEAN	CDS	35	385	.	-	0	Parent=CGI_10000001;
C17212	GLEAN	CDS	31	363	.	+	0	Parent=CGI_10000002;
C17316	GLEAN	CDS	30	257	.	+	0	Parent=CGI_10000003;
C17476	GLEAN	CDS	104	257	.	-	0	Parent=CGI_10000004;
C17476	GLEAN	CDS	34	74	.	-	2	Parent=CGI_10000004;
C17998	GLEAN	CDS	196	387	.	-	0	Parent=CGI_10000005;
C18346	GLEAN	CDS	174	551	.	+	0	Parent=CGI_10000009;
C18428	GLEAN	CDS	286	546	.	-	0	Parent=CGI_10000010;
C18964	GLEAN	CDS	203	658	.	-	0	Parent=CGI_10000011;
C18980	GLEAN	CDS	30	674	.	+	0	Parent=CGI_10000012;
In [22]:
cp /Volumes/web/cnidarian/oyster.v9.glean.final.rename.mRNA.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff
In [23]:
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff
C16582	GLEAN	mRNA	35	385	0.555898	-	.	ID=CGI_10000001;
C17212	GLEAN	mRNA	31	363	0.999572	+	.	ID=CGI_10000002;
C17316	GLEAN	mRNA	30	257	0.555898	+	.	ID=CGI_10000003;
C17476	GLEAN	mRNA	34	257	0.998947	-	.	ID=CGI_10000004;
C17998	GLEAN	mRNA	196	387	1	-	.	ID=CGI_10000005;
C18346	GLEAN	mRNA	174	551	1	+	.	ID=CGI_10000009;
C18428	GLEAN	mRNA	286	546	0.555898	-	.	ID=CGI_10000010;
C18964	GLEAN	mRNA	203	658	0.999572	-	.	ID=CGI_10000011;
C18980	GLEAN	mRNA	30	674	0.555898	+	.	ID=CGI_10000012;
C19100	GLEAN	mRNA	160	681	0.999955	-	.	ID=CGI_10000013;

All CGs

In [24]:
!wc /Volumes/web/cnidarian/TJGR_oyster_v9_CG.gff
 10035701 99934100 977314599 /Volumes/web/cnidarian/TJGR_oyster_v9_CG.gff
In [28]:
!fgrep -c "fuzznuc	nucleotide_motif" /Volumes/web/cnidarian/TJGR_oyster_v9_CG.gff
9978551
In [25]:
!head /Volumes/web/cnidarian/TJGR_oyster_v9_CG.gff
##gff-version 3
##sequence-region scaffold360 1 280
#!Date 2013-04-23
#!Type DNA
#!Source-version EMBOSS 6.5.7.0
scaffold360	fuzznuc	nucleotide_motif	60	61	2	+	.	ID=scaffold360.1;note=*pat pattern:CG
scaffold360	fuzznuc	nucleotide_motif	96	97	2	+	.	ID=scaffold360.2;note=*pat pattern:CG
scaffold360	fuzznuc	nucleotide_motif	120	121	2	+	.	ID=scaffold360.3;note=*pat pattern:CG
scaffold360	fuzznuc	nucleotide_motif	187	188	2	+	.	ID=scaffold360.4;note=*pat pattern:CG
##gff-version 3
In [ ]:
cp /Volumes/web/cnidarian/TJGR_oyster_v9_CG.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff
In [5]:
!sortBed -i /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff > /Volumes/web/cnidarian/TJGR_oyster_v9_CG_sorted.gff
In [8]:
!wc /Volumes/web/cnidarian/TJGR_oyster_v9_CG_sorted.gff
 9978551 99785510 976050492 /Volumes/web/cnidarian/TJGR_oyster_v9_CG_sorted.gff

Promoter

In [34]:
!head /Volumes/web/cnidarian/qDOD_scaffold_length.csv








In [36]:
!tr ',' "\t" </Volumes/web/cnidarian/qDOD_scaffold_length.csv> /Volumes/web/cnidarian/qDOD_scaffold_length.txt
In [37]:
!head /Volumes/web/cnidarian/qDOD_scaffold_length.txt








In [42]:
!flankBed -s -l 1000 -r 0 -i /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff -g /Volumes/web/cnidarian/qDOD_scaffold_length.txt > /Volumes/web/cnidarian/TJGR_Promoter_1k5p.gff
In [43]:
!head /Volumes/web/cnidarian/TJGR_Promoter_1k5p.gff
C16582	GLEAN	mRNA	386	395	0.555898	-	.	ID=CGI_10000001;
C17212	GLEAN	mRNA	1	30	0.999572	+	.	ID=CGI_10000002;
C17316	GLEAN	mRNA	1	29	0.555898	+	.	ID=CGI_10000003;
C17476	GLEAN	mRNA	258	491	0.998947	-	.	ID=CGI_10000004;
C17998	GLEAN	mRNA	388	559	1	-	.	ID=CGI_10000005;
C18346	GLEAN	mRNA	1	173	1	+	.	ID=CGI_10000009;
C18428	GLEAN	mRNA	547	611	0.555898	-	.	ID=CGI_10000010;
C18964	GLEAN	mRNA	659	714	0.999572	-	.	ID=CGI_10000011;
C18980	GLEAN	mRNA	1	29	0.555898	+	.	ID=CGI_10000012;
C19100	GLEAN	mRNA	682	743	0.999955	-	.	ID=CGI_10000013;
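
The output above shows the flank landing upstream of each gene relative to its strand and being clipped at the scaffold ends. A minimal sketch of that arithmetic follows, assuming 1-based inclusive GFF coordinates. `promoter_flank` is a hypothetical helper, not part of bedtools, and the scaffold lengths in the example are assumptions (395 is inferred from the clipped C16582 row; 500 is arbitrary because it is not used for a + strand gene).

```python
def promoter_flank(start, end, strand, scaffold_len, size=1000):
    """Return the upstream flank as (start, end), 1-based inclusive.

    Returns None when the gene sits at the scaffold edge and no
    upstream bases are available.
    """
    if strand == "+":
        # Upstream of a + strand gene is to the left of its start.
        flank_start, flank_end = max(1, start - size), start - 1
    else:
        # Upstream of a - strand gene is to the right of its end.
        flank_start, flank_end = end + 1, min(scaffold_len, end + size)
    if flank_start > flank_end:
        return None
    return flank_start, flank_end


# CGI_10000002: + strand gene at 31-363 (scaffold length is a placeholder).
print(promoter_flank(31, 363, "+", 500))
# CGI_10000001: - strand gene at 35-385, scaffold length assumed to be 395.
print(promoter_flank(35, 385, "-", 395))
```

On these short scaffolds most promoters come out far shorter than 1 kbp, which matches the head output.
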
In [44]:
!sed 's/mRNA/promoter/g' </Volumes/web/cnidarian/TJGR_Promoter_1k5p.gff> /Volumes/web/cnidarian/TJGR_Promoter_1k5p_b.gff
In [45]:
!head /Volumes/web/cnidarian/TJGR_Promoter_1k5p_b.gff
C16582	GLEAN	promoter	386	395	0.555898	-	.	ID=CGI_10000001;
C17212	GLEAN	promoter	1	30	0.999572	+	.	ID=CGI_10000002;
C17316	GLEAN	promoter	1	29	0.555898	+	.	ID=CGI_10000003;
C17476	GLEAN	promoter	258	491	0.998947	-	.	ID=CGI_10000004;
C17998	GLEAN	promoter	388	559	1	-	.	ID=CGI_10000005;
C18346	GLEAN	promoter	1	173	1	+	.	ID=CGI_10000009;
C18428	GLEAN	promoter	547	611	0.555898	-	.	ID=CGI_10000010;
C18964	GLEAN	promoter	659	714	0.999572	-	.	ID=CGI_10000011;
C18980	GLEAN	promoter	1	29	0.555898	+	.	ID=CGI_10000012;
C19100	GLEAN	promoter	682	743	0.999955	-	.	ID=CGI_10000013;
http://eagle.fish.washington.edu/cnidarian/TJGR_Promoter_1k5p_b.gff
In [46]:
cp /Volumes/web/cnidarian/TJGR_Promoter_1k5p_b.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff
In [27]:
#clean up in SQLShare
!head /Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2.gff








In [29]:
!tail -n +2 /Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2.gff > /Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2b.gff
In [30]:
!tr ',' "\t" </Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2b.gff> /Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2c.gff
In [31]:
!head /Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2c.gff








In [34]:
!cp /Volumes/web/cnidarian/Cgigas_v9_1k5p_gene_promoter_v2c.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff

Introns (Option 1)

In [70]:
!sed 's/Parent=/#Parent=/g' </Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff> /Volumes/web/cnidarian/TJGR_Cgigas_v9_exon_b.gff
In [75]:
!sed 's/ID=/#ID=/g' </Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff> /Volumes/web/cnidarian/TJGR_Cgigas_v9_gene_b.gff
In [71]:
!head /Volumes/web/cnidarian/TJGR_Cgigas_v9_exon_b.gff
C16582	GLEAN	CDS	35	385	.	-	0	#Parent=CGI_10000001;
C17212	GLEAN	CDS	31	363	.	+	0	#Parent=CGI_10000002;
C17316	GLEAN	CDS	30	257	.	+	0	#Parent=CGI_10000003;
C17476	GLEAN	CDS	104	257	.	-	0	#Parent=CGI_10000004;
C17476	GLEAN	CDS	34	74	.	-	2	#Parent=CGI_10000004;
C17998	GLEAN	CDS	196	387	.	-	0	#Parent=CGI_10000005;
C18346	GLEAN	CDS	174	551	.	+	0	#Parent=CGI_10000009;
C18428	GLEAN	CDS	286	546	.	-	0	#Parent=CGI_10000010;
C18964	GLEAN	CDS	203	658	.	-	0	#Parent=CGI_10000011;
C18980	GLEAN	CDS	30	674	.	+	0	#Parent=CGI_10000012;
In [76]:
!subtractBed -a /Volumes/web/cnidarian/TJGR_Cgigas_v9_gene_b.gff -b /Volumes/web/cnidarian/TJGR_Cgigas_v9_exon_b.gff > /Volumes/web/cnidarian/Cgigas_v9_intron.gff 
In [77]:
!head /Volumes/web/cnidarian/Cgigas_v9_intron.gff 
C17476	GLEAN	mRNA	75	103	0.998947	-	.	#ID=CGI_10000004;
C19392	GLEAN	mRNA	184	451	1	+	.	#ID=CGI_10000015;
C20262	GLEAN	mRNA	539	641	1	-	.	#ID=CGI_10000025;
C20262	GLEAN	mRNA	650	871	1	-	.	#ID=CGI_10000025;
C20334	GLEAN	mRNA	524	867	1	-	.	#ID=CGI_10000028;
C20412	GLEAN	mRNA	215	409	1	-	.	#ID=CGI_10000029;
C20412	GLEAN	mRNA	464	705	1	-	.	#ID=CGI_10000029;
C20462	GLEAN	mRNA	50	271	1	+	.	#ID=CGI_10000030;
C20462	GLEAN	mRNA	360	481	1	+	.	#ID=CGI_10000030;
C20462	GLEAN	mRNA	577	822	1	+	.	#ID=CGI_10000030;
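
What the subtraction does per gene can be sketched as follows: take the mRNA span and remove every CDS interval, leaving the gaps as introns. This is an illustrative sketch only, assuming 1-based inclusive GFF coordinates and ignoring strand (the subtractBed call above was also run without -s). `subtract_intervals` is a hypothetical name.

```python
def subtract_intervals(gene, exons):
    """Return the parts of gene (start, end) not covered by any exon.

    All coordinates are 1-based and inclusive, as in GFF.
    """
    start, end = gene
    gaps = []
    cursor = start  # first position not yet accounted for
    for ex_start, ex_end in sorted(exons):
        if ex_end < cursor:
            continue  # exon lies entirely before the cursor
        if ex_start > end:
            break  # exon lies entirely after the gene
        if ex_start > cursor:
            gaps.append((cursor, ex_start - 1))
        cursor = max(cursor, ex_end + 1)
    if cursor <= end:
        gaps.append((cursor, end))
    return gaps


# CGI_10000004: mRNA 34-257 with CDS 104-257 and 34-74.
print(subtract_intervals((34, 257), [(104, 257), (34, 74)]))  # prints: [(75, 103)]
```

A single-exon gene such as CGI_10000001 yields an empty list, which is why those genes do not appear in the intron file.
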
In [99]:
!sed 's/#ID=/Parent=/g' </Volumes/web/cnidarian/Cgigas_v9_intron.gff> /Volumes/web/cnidarian/Cgigas_v9_intron_b.gff 
In [105]:
!sed 's/GLEAN/subtractBed/g' </Volumes/web/cnidarian/Cgigas_v9_intron_b.gff> /Volumes/web/cnidarian/Cgigas_v9_intron_c.gff 
In [111]:
!sed 's/mRNA/_intron/g' </Volumes/web/cnidarian/Cgigas_v9_intron_c.gff> /Volumes/web/cnidarian/Cgigas_v9_intron_d.gff 
In [112]:
!head /Volumes/web/cnidarian/Cgigas_v9_intron_d.gff 
C17476	subtractBed	_intron	75	103	0.998947	-	.	Parent=CGI_10000004;
C19392	subtractBed	_intron	184	451	1	+	.	Parent=CGI_10000015;
C20262	subtractBed	_intron	539	641	1	-	.	Parent=CGI_10000025;
C20262	subtractBed	_intron	650	871	1	-	.	Parent=CGI_10000025;
C20334	subtractBed	_intron	524	867	1	-	.	Parent=CGI_10000028;
C20412	subtractBed	_intron	215	409	1	-	.	Parent=CGI_10000029;
C20412	subtractBed	_intron	464	705	1	-	.	Parent=CGI_10000029;
C20462	subtractBed	_intron	50	271	1	+	.	Parent=CGI_10000030;
C20462	subtractBed	_intron	360	481	1	+	.	Parent=CGI_10000030;
C20462	subtractBed	_intron	577	822	1	+	.	Parent=CGI_10000030;
In [ ]:
http://eagle.fish.washington.edu/cnidarian/Cgigas_v9_intron_d.gff 
In [113]:
cp /Volumes/web/cnidarian/Cgigas_v9_intron_d.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff
In [9]:
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff
C17476	subtractBed	_intron	75	103	0.998947	-	.	Parent=CGI_10000004;
C19392	subtractBed	_intron	184	451	1	+	.	Parent=CGI_10000015;
C20262	subtractBed	_intron	539	641	1	-	.	Parent=CGI_10000025;
C20262	subtractBed	_intron	650	871	1	-	.	Parent=CGI_10000025;
C20334	subtractBed	_intron	524	867	1	-	.	Parent=CGI_10000028;
C20412	subtractBed	_intron	215	409	1	-	.	Parent=CGI_10000029;
C20412	subtractBed	_intron	464	705	1	-	.	Parent=CGI_10000029;
C20462	subtractBed	_intron	50	271	1	+	.	Parent=CGI_10000030;
C20462	subtractBed	_intron	360	481	1	+	.	Parent=CGI_10000030;
C20462	subtractBed	_intron	577	822	1	+	.	Parent=CGI_10000030;
In [21]:
#will clean up in SQLSHARE
!head /Volumes/web/cnidarian/Cgigas_v9_intron_v2d.gff
#!tail -n +2 /Volumes/web/cnidarian/Cgigas_v9_intron_v2b.gff > /Volumes/web/cnidarian/Cgigas_v9_intron_v2c.gff








In [20]:
!sed 's/intron/intrn/g' </Volumes/web/cnidarian/Cgigas_v9_intron_v2c.gff> /Volumes/web/cnidarian/Cgigas_v9_intron_v2d.gff
In [22]:
cp /Volumes/web/cnidarian/Cgigas_v9_intron_v2d.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff
In [23]:
!wc /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff
  176049 1584441 12654996 /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff
In [24]:
!wc /Volumes/web/cnidarian/Cgigas_v9_intron_d.gff
  176049 1584441 13834641 /Volumes/web/cnidarian/Cgigas_v9_intron_d.gff

Intron (Option 2)

In [143]:
!complementBed -i /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff -g /Volumes/web/cnidarian/qDOD_scaffold_length.txt > /Volumes/web/cnidarian/TJGR_complement_exon.bed
In [144]:
!head /Volumes/web/cnidarian/TJGR_complement_exon.bed
C1	0	100
C10003	0	156
C10005	0	156
C10007	0	156
C10009	0	156
C1001	0	103
C10011	0	156
C10013	0	157
C10015	0	157
C10021	0	157
In [145]:
!intersectBed -a /Volumes/web/cnidarian/TJGR_complement_exon.bed -b /Volumes/web/cnidarian/TJGR_Cgigas_v9_gene_b.gff > /Volumes/web/cnidarian/TJGR_intron2.bed
In [146]:
!head /Volumes/web/cnidarian/TJGR_intron2.bed
C17476	74	103
C19392	183	451
C20262	538	641
C20262	649	871
C20334	523	867
C20412	214	409
C20412	463	705
C20462	49	271
C20462	359	481
C20462	576	822
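
Option 2 writes BED (0-based start, end not included), while Option 1 writes GFF (1-based, both ends included), so the two outputs describe the same introns even though the start columns differ by one. A small sketch of the conversion, using rows copied from the two head outputs; the helper names are hypothetical.

```python
def bed_to_gff(bed_start, bed_end):
    """Convert a 0-based half-open BED interval to 1-based inclusive GFF."""
    return bed_start + 1, bed_end


def gff_to_bed(gff_start, gff_end):
    """Convert a 1-based inclusive GFF interval to 0-based half-open BED."""
    return gff_start - 1, gff_end


# First three rows of TJGR_intron2.bed (Option 2).
option2_bed = [(74, 103), (183, 451), (538, 641)]
# First three rows of Cgigas_v9_intron.gff (Option 1).
option1_gff = [(75, 103), (184, 451), (539, 641)]

converted = [bed_to_gff(start, end) for start, end in option2_bed]
print(converted == option1_gff)  # prints: True
```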

Transposable Elements

Generating TE canonical GFF from RepeatProteinMask oyster v9

The starting file for this is the output of RepeatProteinMask performed by SR (look towards the bottom of this entry): https://www.evernote.com/shard/s10/sh/7dea995c-17ac-4bcf-bc38-963220e9e7c9/b28dacbbdbfe123960b88e42fa45a34a

The txt file (http://eagle.fish.washington.edu/cnidarian/qDOD_RepeatProteinMask_v9.txt) was uploaded to SQLShare.

A GFF was then derived using the following query:

SELECT  
SeqID as seqname,  
Method as source,  
Type as feature, 
[Begin] as [start],
[End] as [end],
Score as score,  
sym as strand,  
'.' as frame,  
'.' as attribute 
FROM [[email protected]].[qDOD_RepeatProteinMask_v9.txt]

The derived SQLShare dataset is shared publicly here: https://sqlshare.escience.washington.edu/sqlshare#s=query/mgavery%40washington.edu/qDOD_RepeatProteinMask_v9_asgff

The file was downloaded as a .gff and saved here: http://eagle.fish.washington.edu/bivalvia/wholegenomefiles_MBDbsSeq_gill/gffs/qDOD_RepeatProteinMask_v9_asgff.gff

In [115]:
!head /Volumes/web/bivalvia/wholegenomefiles_MBDbsSeq_gill/gffs/qDOD_RepeatProteinMask_v9_asgff.gff
C21242	TRF	Tandem_Repeat	38	100	72	+	.	.
C21306	TRF	Tandem_Repeat	35	143	112	+	.	.
C21306	TRF	Tandem_Repeat	574	947	208	+	.	.
C21306	TRF	Tandem_Repeat	574	901	313	+	.	.
C21372	TRF	Tandem_Repeat	643	671	58	+	.	.
C22542	TRF	Tandem_Repeat	1727	1774	96	+	.	.
C22728	TRF	Tandem_Repeat	426	491	105	+	.	.
C23428	TRF	Tandem_Repeat	130	415	202	+	.	.
C23796	TRF	Tandem_Repeat	547	608	97	+	.	.
C24440	TRF	Tandem_Repeat	1059	1089	62	+	.	.
In [116]:
cp /Volumes/web/bivalvia/wholegenomefiles_MBDbsSeq_gill/gffs/qDOD_RepeatProteinMask_v9_asgff.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE.gff

Other

Regions that are not gene, promoter, or TE (the complement of those three tracks)

$ cat file1 file2 ... fileN > file1-N.nonunique.bed
$ mergeBed -i file1-N.nonunique.bed > file1-N.merged.bed
In [126]:
!cat /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff > /Volumes/web/cnidarian/TJGR_gene_TE_promoter.gff
In [129]:
!head /Volumes/web/cnidarian/TJGR_gene_TE_promoter.gff
C16582	GLEAN	promoter	386	395	0.555898	-	.	ID=CGI_10000001;
C17212	GLEAN	promoter	1	30	0.999572	+	.	ID=CGI_10000002;
C17316	GLEAN	promoter	1	29	0.555898	+	.	ID=CGI_10000003;
C17476	GLEAN	promoter	258	491	0.998947	-	.	ID=CGI_10000004;
C17998	GLEAN	promoter	388	559	1	-	.	ID=CGI_10000005;
C18346	GLEAN	promoter	1	173	1	+	.	ID=CGI_10000009;
C18428	GLEAN	promoter	547	611	0.555898	-	.	ID=CGI_10000010;
C18964	GLEAN	promoter	659	714	0.999572	-	.	ID=CGI_10000011;
C18980	GLEAN	promoter	1	29	0.555898	+	.	ID=CGI_10000012;
C19100	GLEAN	promoter	682	743	0.999955	-	.	ID=CGI_10000013;
In [130]:
!sortBed -i /Volumes/web/cnidarian/TJGR_gene_TE_promoter.gff > /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s.gff
In [131]:
!head /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s.gff
C10153	WUBlastX	LTR_Pao	3	158	109	-	.	.
C10177	WUBlastX	LINE_L2	2	157	97	-	.	.
C10191	WUBlastX	LTR_Copia	2	157	174	-	.	.
C10245	WUBlastX	LINE_Penelope	5	154	59	-	.	.
C10291	WUBlastX	LTR_Copia	2	160	85	-	.	.
C10475	WUBlastX	LINE_L1-Tx1	3	149	50	-	.	.
C10673	WUBlastX	LTR_DIRS	37	162	59	+	.	.
C10675	WUBlastX	LINE_L2	1	165	132	+	.	.
C10805	WUBlastX	LINE_I	1	168	100	-	.	.
C10973	WUBlastX	LTR_Gypsy	3	167	186	+	.	.
In [134]:
!mergeBed -i /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s.gff > /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s_unique.bed
In [135]:
!head /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s_unique.bed
C10153	2	158
C10177	1	157
C10191	1	157
C10245	4	154
C10291	1	160
C10475	2	149
C10673	36	162
C10675	0	165
C10805	0	168
C10973	2	167
In [136]:
!complementBed -i /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s_unique.bed -g /Volumes/web/cnidarian/qDOD_scaffold_length.txt > /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s_unique_comp.bed
In [137]:
!head /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s_unique_comp.bed
C1	0	100
C10003	0	156
C10005	0	156
C10007	0	156
C10009	0	156
C1001	0	103
C10011	0	156
C10013	0	157
C10015	0	157
C10021	0	157
In [ ]:
http://eagle.fish.washington.edu/cnidarian/TJGR_gene_TE_promoter_s_unique_comp.bed
In [139]:
cp /Volumes/web/cnidarian/TJGR_gene_TE_promoter_s_unique_comp.bed /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_COMP_gene_prom_TE.bed
In [140]:
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_COMP_gene_prom_TE.bed
C1	0	100
C10003	0	156
C10005	0	156
C10007	0	156
C10009	0	156
C1001	0	103
C10011	0	156
C10013	0	157
C10015	0	157
C10021	0	157
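
The sort, merge, and complement steps above can be sketched for a single scaffold as follows. This is a simplified illustration in BED coordinates (0-based start, end not included), not a replacement for bedtools; the function names are hypothetical, and the scaffold length of 491 for C17476 is an assumption inferred from where its promoter was clipped.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching intervals (0-based, half-open)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


def complement(merged, scaffold_len):
    """Return the stretches of a scaffold not covered by merged intervals."""
    gaps = []
    cursor = 0
    for start, end in merged:
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < scaffold_len:
        gaps.append((cursor, scaffold_len))
    return gaps


# C17476: gene 34-257 and promoter 258-491 in GFF, converted to BED.
features = [(257, 491), (33, 257)]
merged = merge_intervals(features)
print(merged)                   # prints: [(33, 491)]
print(complement(merged, 491))  # prints: [(0, 33)]
# A scaffold with no features, such as C1 (length 100), is returned whole.
print(complement([], 100))      # prints: [(0, 100)]
```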

TEST - Verification that everything is covered

In [149]:
!cat /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE.gff > /Volumes/web/cnidarian/TJGR_CanTest
In [150]:
!head /Volumes/web/cnidarian/TJGR_CanTest
C16582	GLEAN	CDS	35	385	.	-	0	Parent=CGI_10000001;
C17212	GLEAN	CDS	31	363	.	+	0	Parent=CGI_10000002;
C17316	GLEAN	CDS	30	257	.	+	0	Parent=CGI_10000003;
C17476	GLEAN	CDS	104	257	.	-	0	Parent=CGI_10000004;
C17476	GLEAN	CDS	34	74	.	-	2	Parent=CGI_10000004;
C17998	GLEAN	CDS	196	387	.	-	0	Parent=CGI_10000005;
C18346	GLEAN	CDS	174	551	.	+	0	Parent=CGI_10000009;
C18428	GLEAN	CDS	286	546	.	-	0	Parent=CGI_10000010;
C18964	GLEAN	CDS	203	658	.	-	0	Parent=CGI_10000011;
C18980	GLEAN	CDS	30	674	.	+	0	Parent=CGI_10000012;
In [151]:
!sortBed -i /Volumes/web/cnidarian/TJGR_CanTest > /Volumes/web/cnidarian/TJGR_CanTest_s
In [155]:
!mergeBed -i /Volumes/web/cnidarian/TJGR_CanTest_s > /Volumes/web/cnidarian/TJGR_CanTest_s_unique.bed
In [156]:
!head /Volumes/web/cnidarian/TJGR_CanTest_s_unique.bed
C10153	2	158
C10177	1	157
C10191	1	157
C10245	4	154
C10291	1	160
C10475	2	149
C10673	36	162
C10675	0	165
C10805	0	168
C10973	2	167
In [ ]:
http://eagle.fish.washington.edu/cnidarian/TJGR_CanTest_s_unique.bed
In [157]:
!intersectBed -a /Volumes/web/cnidarian/TJGR_CanTest_s_unique.bed -b /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_COMP_gene_prom_TE.bed > /Volumes/web/cnidarian/TJGR_CanTest_s_unique_inter_COMP.bed
In [159]:
!wc /Volumes/web/cnidarian/TJGR_CanTest_s_unique_inter_COMP.bed
       0       0       0 /Volumes/web/cnidarian/TJGR_CanTest_s_unique_inter_COMP.bed
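
An empty intersect shows that the feature tracks and the complement track share no bases. A stricter check, sketched below for one scaffold, would also confirm that the two sets together add up to the scaffold length. This goes beyond what the notebook ran; the helper names are hypothetical and the scaffold length of 491 for C17476 is an assumption.

```python
def overlaps(a, b):
    """Return the overlapping pieces between two interval lists.

    Intervals are 0-based and half-open, as in BED.
    """
    hits = []
    for a_start, a_end in a:
        for b_start, b_end in b:
            lo, hi = max(a_start, b_start), min(a_end, b_end)
            if lo < hi:
                hits.append((lo, hi))
    return hits


def covered_bases(intervals):
    """Total bases covered, assuming the intervals do not overlap."""
    return sum(end - start for start, end in intervals)


# C17476 (assumed length 491): merged features and their complement.
features = [(33, 491)]
comp = [(0, 33)]

print(overlaps(features, comp))                            # prints: []
print(covered_bases(features) + covered_bases(comp) == 491)  # prints: True
```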

URLs for Canonical Genome Features

Gene
http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff

Exons
http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff

Intron
http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff

Promoter (= 1kbp 5' of genes)
http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff

Transposable Elements
http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE.gff

Complement to Gene, Promoter, and TE tracks
http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_COMP_gene_prom_TE.bed

All CGs
http://eagle.fish.washington.edu/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff


Import all tracks
http://eagle.fish.washington.edu/cnidarian/igv_session_073013.xml

Previews

In [123]:
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff
C16582	GLEAN	mRNA	35	385	0.555898	-	.	ID=CGI_10000001;
C17212	GLEAN	mRNA	31	363	0.999572	+	.	ID=CGI_10000002;
C17316	GLEAN	mRNA	30	257	0.555898	+	.	ID=CGI_10000003;
C17476	GLEAN	mRNA	34	257	0.998947	-	.	ID=CGI_10000004;
C17998	GLEAN	mRNA	196	387	1	-	.	ID=CGI_10000005;
C18346	GLEAN	mRNA	174	551	1	+	.	ID=CGI_10000009;
C18428	GLEAN	mRNA	286	546	0.555898	-	.	ID=CGI_10000010;
C18964	GLEAN	mRNA	203	658	0.999572	-	.	ID=CGI_10000011;
C18980	GLEAN	mRNA	30	674	0.555898	+	.	ID=CGI_10000012;
C19100	GLEAN	mRNA	160	681	0.999955	-	.	ID=CGI_10000013;
In [122]:
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_exon.gff
C16582	GLEAN	CDS	35	385	.	-	0	Parent=CGI_10000001;
C17212	GLEAN	CDS	31	363	.	+	0	Parent=CGI_10000002;
C17316	GLEAN	CDS	30	257	.	+	0	Parent=CGI_10000003;
C17476	GLEAN	CDS	104	257	.	-	0	Parent=CGI_10000004;
C17476	GLEAN	CDS	34	74	.	-	2	Parent=CGI_10000004;
C17998	GLEAN	CDS	196	387	.	-	0	Parent=CGI_10000005;
C18346	GLEAN	CDS	174	551	.	+	0	Parent=CGI_10000009;
C18428	GLEAN	CDS	286	546	.	-	0	Parent=CGI_10000010;
C18964	GLEAN	CDS	203	658	.	-	0	Parent=CGI_10000011;
C18980	GLEAN	CDS	30	674	.	+	0	Parent=CGI_10000012;
In [25]:
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_intron.gff








In [35]:
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff








In [124]:
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_TE.gff
C21242	TRF	Tandem_Repeat	38	100	72	+	.	.
C21306	TRF	Tandem_Repeat	35	143	112	+	.	.
C21306	TRF	Tandem_Repeat	574	947	208	+	.	.
C21306	TRF	Tandem_Repeat	574	901	313	+	.	.
C21372	TRF	Tandem_Repeat	643	671	58	+	.	.
C22542	TRF	Tandem_Repeat	1727	1774	96	+	.	.
C22728	TRF	Tandem_Repeat	426	491	105	+	.	.
C23428	TRF	Tandem_Repeat	130	415	202	+	.	.
C23796	TRF	Tandem_Repeat	547	608	97	+	.	.
C24440	TRF	Tandem_Repeat	1059	1089	62	+	.	.
In [142]:
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_COMP_gene_prom_TE.bed
C1	0	100
C10003	0	156
C10005	0	156
C10007	0	156
C10009	0	156
C1001	0	103
C10011	0	156
C10013	0	157
C10015	0	157
C10021	0	157
In [6]:
!head /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff
##gff-version 3
##sequence-region scaffold360 1 280
#!Date 2013-04-23
#!Type DNA
#!Source-version EMBOSS 6.5.7.0
scaffold360	fuzznuc	nucleotide_motif	60	61	2	+	.	ID=scaffold360.1;note=*pat pattern:CG
scaffold360	fuzznuc	nucleotide_motif	96	97	2	+	.	ID=scaffold360.2;note=*pat pattern:CG
scaffold360	fuzznuc	nucleotide_motif	120	121	2	+	.	ID=scaffold360.3;note=*pat pattern:CG
scaffold360	fuzznuc	nucleotide_motif	187	188	2	+	.	ID=scaffold360.4;note=*pat pattern:CG
##gff-version 3
IGV session file available: http://eagle.fish.washington.edu/cnidarian/oyster_v9_igv_session.xml

Want to convert the gill methylation data to IGV format

[Screenshot 5/28/13 8:53 PM]

SELECT 
chr as seqname,  
pos - 1 as start, -- compensating for going to zero-based?
pos + 1 as [end], 
'CG' as feature, 
ratio as score  

FROM [[email protected]].     
[BiGill_methratio_v9_A.txt] yel 
where 
context like '__CG_' --_=single character wildcard
and
CT_Count >= 5
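
The row logic of the query above can be sketched in Python as follows. This assumes that pos is the 1-based position of the C and that context is the five-base string centered on it, which is how the `'__CG_'` pattern reads; neither is confirmed in the notebook. The function names and the ratio value 0.8 are hypothetical.

```python
def passes_filter(context, ct_count, min_coverage=5):
    """Mirror the WHERE clause: CG at bases 3-4 of a 5-base context."""
    return (
        len(context) == 5
        and context[2:4] == "CG"
        and ct_count >= min_coverage
    )


def cg_to_igv_row(chrom, pos, ratio, feature="CG"):
    """Build one IGV row covering the two bases of a CG site.

    pos is assumed to be 1-based, so pos - 1 is the 0-based start
    and pos + 1 is the exclusive end of the two-base interval.
    """
    return (chrom, pos - 1, pos + 1, feature, ratio)


# A CG at scaffold360:60-61 (from the All CGs track) with a made-up ratio.
if passes_filter("AACGT", 7):
    print(cg_to_igv_row("scaffold360", 60, 0.8))
```

Under those assumptions the interval spans exactly the C and the G, which would explain why the coordinates line up with the CG track when loaded in IGV.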

python fetchdata.py -d "[[email protected]].[BiGill_methratio_v9_IGV]" -f tsv -o /Volumes/web/cnidarian/BiGill_meth_v9_5x.igv
In [4]:
!head /Volumes/web/cnidarian/BiGill_meth_v9_5x.igv








Imported into IGV, and the coordinates look OK

[Screenshot 5/28/13 9:11 PM]


[Screenshot 5/28/13 9:13 PM]


Male Gonad Methylation data

Developing IGV file format

SELECT 
chr as seqname,  
pos - 1 as start, -- compensating for going to zero-based?
pos + 1 as [end], 
'CG' as feature, 
ratio as score  

FROM [[email protected]].     
[BiGO_betty_plain_methratio_v1.txt] yel 
where 
context like '__CG_' --_=single character wildcard
and
CT_Count >= 5

python fetchdata.py -d "[[email protected]].[BiGO_betty_methratio_v1_IGV]" -f tsv -o /Volumes/web/cnidarian/BiGO_betty_methratio_v1.igv

[Screenshot 5/29/13 4:01 PM]

IGV Session resaved http://eagle.fish.washington.edu/cnidarian/oyster_v9_igv_session.xml

Details on sperm exon level expression available here

Gene level expression is in SQLShare, originally derived from CLC RNA-Seq

Gill Expression data

https://sqlshare.escience.washington.edu/sqlshare#s=query/sr320%40washington.edu/qDOD_Zhang_Gil_gene_RNA-seq

[Screenshot 6/2/13 8:14 AM]

SQLShare Query

SELECT 
Chromosome,
 "Chromosome region start" - 1 as start,
 "Chromosome region end" as [end],
'gene' as feature,
 RPKM 

  FROM [[email protected]].[qDOD_Zhang_Gil_gene_RNA-seq]

Resulting file https://sqlshare.escience.washington.edu/sqlshare#s=query/sr320%40washington.edu/Zhang_Gil_gene_RNA-seq_IGV

Downloading
python fetchdata.py -d "[[email protected]].[Zhang_Gil_gene_RNA-seq_IGV]" -f tsv -o /Volumes/web/cnidarian/Zhang_Gil_gene_RNA-seq.igv

Needs to be sorted in IGV
http://eagle.fish.washington.edu/cnidarian/Zhang_Gil_gene_RNA-seq.sorted.igv
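
The notebook does not show how the sorted file was produced. As a rough sketch of what the sort amounts to, the rows can be ordered by chromosome name and then by numeric start, keeping a header line (if present) on top. The function name and the sample RPKM values below are hypothetical.

```python
def sort_igv_text(text):
    """Sort tab-separated IGV rows by chromosome, then numeric start."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = []
    # Treat the first line as a header if its start column is not a number.
    if lines and not lines[0].split("\t")[1].isdigit():
        header, lines = lines[:1], lines[1:]
    lines.sort(key=lambda line: (line.split("\t")[0], int(line.split("\t")[1])))
    return "\n".join(header + lines) + "\n"


# Made-up rows in the column layout produced by the query above.
sample = (
    "Chromosome\tstart\tend\tfeature\tRPKM\n"
    "C17212\t30\t363\tgene\t1.5\n"
    "C16582\t34\t385\tgene\t0.0\n"
    "C17212\t10\t20\tgene\t2.0\n"
)
sorted_text = sort_igv_text(sample)
print(sorted_text, end="")
```

Converting the start column with int() matters: sorting it as text would place 100 before 20.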

Sperm Gene level expression

File in SQLShare https://sqlshare.escience.washington.edu/sqlshare#s=query/sr320%40washington.edu/qDOD_Zhang_Mgo_gene_RNA-seq

SQLShare Query

SELECT 
Chromosome,
"Chromosome region start" - 1 as start,
"Chromosome region end" as [end],
'gene' as feature,
RPKM as Mgo_RPKM
FROM [[email protected]].[qDOD_Zhang_Mgo_gene_RNA-seq]

New Dataset https://sqlshare.escience.washington.edu/sqlshare#s=query/sr320%40washington.edu/Zhang_Mgo_gene_RNA-seq_IGV

Downloading
python fetchdata.py -d "[[email protected]].[Zhang_Mgo_gene_RNA-seq_IGV]" -f tsv -o /Volumes/web/cnidarian/Zhang_Mgo_gene_RNA-seq.igv

Sorted
http://eagle.fish.washington.edu/cnidarian/Zhang_Mgo_gene_RNA-seq.sorted.igv


New IGV Browser ...

[Screenshot 6/2/13 9:48 AM]
http://eagle.fish.washington.edu/cnidarian/oyster_v9_igv_session.xml
