Thinking about comparing CpG islands (evolutionary deplete with methylation) with experiementally determined regions to see what might be transient
!head /Volumes/web/cnidarian/TJGR_v9_cpgreport_1.output
CPGREPORT of scaffold360 from 1 to 280 Sequence Begin End Score CpG %CG CG/GC scaffold360 60 61 17 1 100.0 - scaffold360 96 97 17 1 100.0 - scaffold360 120 121 17 1 100.0 - scaffold360 187 188 17 1 100.0 -
!head /Volumes/web/cnidarian/TJGR_v9_cpgreport_1.gff
##gff-version 3 ##sequence-region scaffold360 1 1964554 #!Date 2013-08-14 #!Type DNA #!Source-version EMBOSS 6.4.0.0 scaffold360 cpgreport sequence_feature 1 13910 13703.000 + . ID=scaffold360.5581277 scaffold360 cpgreport sequence_feature 1 7092 9397.000 + . ID=scaffold360.5530319 scaffold360 cpgreport sequence_feature 1 2866 3093.000 + . ID=scaffold360.5465795 scaffold360 cpgreport sequence_feature 1 1259 452.000 + . ID=scaffold360.5437400 scaffold360 cpgreport sequence_feature 1 725 572.000 + . ID=scaffold360.5433333
!head /Volumes/web/cnidarian/TJGR_v9_newcpgreport_1.output
ID scaffold360 280 BP. XX DE CpG Island report. XX CC Obs/Exp ratio > 0.60. CC % C + % G > 50.00. CC Length > 200. XX FH Key Location/Qualifiers FT no islands detected
Failed
Window size [100]: Minimum length of an island [200]: Minimum observed/expected [0.6]: Minimum percentage [50.]:
!head /Volumes/web/cnidarian/TJGR_v9cpgplot1.gff
##gff-version 3 ##sequence-region scaffold360 1 0 #!Date 2013-08-26 #!Type DNA #!Source-version EMBOSS 6.5.7.0 ##gff-version 3 ##sequence-region scaffold18356 1 0 #!Date 2013-08-26 #!Type DNA #!Source-version EMBOSS 6.5.7.0
!fgrep -c "sequence_feature" /Volumes/web/cnidarian/TJGR_v9cpgplot1.gff
7536
again
Window size [100]: Minimum length of an island [200]: 150 Minimum observed/expected [0.6]: Minimum percentage [50.]: 40 Output file [scaffold360.cpgplot]: /Users/sr320/Desktop/v9.cpgplot Graph type [x11]: none Features output [scaffold360.gff]: /Volumes/web/cnidarian/TJGR_v9cpgplot2.gff
!head /Volumes/web/cnidarian/TJGR_v9cpgplot2.gff
##gff-version 3 ##sequence-region scaffold360 1 0 #!Date 2013-08-26 #!Type DNA #!Source-version EMBOSS 6.5.7.0 ##gff-version 3 ##sequence-region scaffold18356 1 0 #!Date 2013-08-26 #!Type DNA #!Source-version EMBOSS 6.5.7.0
!fgrep -c "sequence_feature" /Volumes/web/cnidarian/TJGR_v9cpgplot2.gff
121844
again
Window size [100]: Minimum length of an island [200]: Minimum observed/expected [0.6]: Minimum percentage [50.]: 45 Output file [scaffold360.cpgplot]: /Users/sr320/Desktop/v9.cpgplot Graph type [x11]: none Features output [scaffold360.gff]: /Volumes/web/cnidarian/TJGR_v9cpgplot3.gff
!head /Volumes/web/cnidarian/TJGR_v9cpgplot3.gff
##gff-version 3 ##sequence-region scaffold360 1 0 #!Date 2013-08-26 #!Type DNA #!Source-version EMBOSS 6.5.7.0 ##gff-version 3 ##sequence-region scaffold18356 1 0 #!Date 2013-08-26 #!Type DNA #!Source-version EMBOSS 6.5.7.0
!fgrep -c "sequence_feature" /Volumes/web/cnidarian/TJGR_v9cpgplot3.gff
22410
#how about intersecting methylation with CpG islands and plot histogram?
!head /Volumes/web/cnidarian/BiGill_island3_intersect_mCpG.txt
scaffold23910 cpgplot sequence_feature 1279 1555 . + . ID=scaffold23910.1 0 scaffold32586 cpgplot sequence_feature 1692 1941 . + . ID=scaffold32586.1 0 scaffold32586 cpgplot sequence_feature 5695 6035 . + . ID=scaffold32586.2 1 scaffold32586 cpgplot sequence_feature 6602 6806 . + . ID=scaffold32586.3 0 scaffold900 cpgplot sequence_feature 2406 3042 . + . ID=scaffold900.1 0 scaffold33640 cpgplot sequence_feature 4077 4288 . + . ID=scaffold33640.1 0 scaffold32024 cpgplot sequence_feature 8005 8296 . + . ID=scaffold32024.1 2 scaffold33132 cpgplot sequence_feature 8510 8775 . + . ID=scaffold33132.1 0 scaffold33132 cpgplot sequence_feature 8834 9115 . + . ID=scaffold33132.2 0 scaffold32140 cpgplot sequence_feature 480 821 . + . ID=scaffold32140.1 0
!wc /Volumes/web/cnidarian/BiGill_island3_intersect_mCpG.txt
22410 224100 1713353 /Volumes/web/cnidarian/BiGill_island3_intersect_mCpG.txt
from pandas import *
# read data from data file into a pandas DataFrame
island = read_table("/Volumes/web/cnidarian/BiGill_island3_intersect_mCpG.txt", # name of the data file
#sep="\t", # what character separates each column?
#na_values=["", " "], # what values should be considered "blank" values?
header=None)
island[9].hist(bins=50);
#Axis limits are changed using the axis([xmin, xmax, ymin, ymax]) function.
#plt.axis([0, 30, 0, 1000]);
plt.title('island_meth');
#also need to get total CG...
!intersectbed -c -a /Volumes/web/cnidarian/TJGR_v9cpgplot3.gff -b /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff > /Volumes/web/cnidarian/BiGill_island3_intersect_CG.txt
!head /Volumes/web/cnidarian/BiGill_island3_intersect_CG.txt
scaffold23910 cpgplot sequence_feature 1279 1555 . + . ID=scaffold23910.1 17 scaffold32586 cpgplot sequence_feature 1692 1941 . + . ID=scaffold32586.1 10 scaffold32586 cpgplot sequence_feature 5695 6035 . + . ID=scaffold32586.2 30 scaffold32586 cpgplot sequence_feature 6602 6806 . + . ID=scaffold32586.3 16 scaffold900 cpgplot sequence_feature 2406 3042 . + . ID=scaffold900.1 36 scaffold33640 cpgplot sequence_feature 4077 4288 . + . ID=scaffold33640.1 15 scaffold32024 cpgplot sequence_feature 8005 8296 . + . ID=scaffold32024.1 17 scaffold33132 cpgplot sequence_feature 8510 8775 . + . ID=scaffold33132.1 16 scaffold33132 cpgplot sequence_feature 8834 9115 . + . ID=scaffold33132.2 10 scaffold32140 cpgplot sequence_feature 480 821 . + . ID=scaffold32140.1 23
!wc /Volumes/web/cnidarian/BiGill_island3_intersect_CG.txt
22410 224100 1733171 /Volumes/web/cnidarian/BiGill_island3_intersect_CG.txt
# read data from data file into a pandas DataFrame
iscg = read_table("/Volumes/web/cnidarian/BiGill_island3_intersect_CG.txt", # name of the data file
#sep="\t", # what character separates each column?
#na_values=["", " "], # what values should be considered "blank" values?
header=None)
iscg[9].hist(bins=50);
#Axis limits are changed using the axis([xmin, xmax, ymin, ymax]) function.
#plt.axis([0, 1000, 0, 200]);
plt.title('island_cg');
#now need to do some math (SQLShare?) for each island mCG/totalCG to get %methylation
print island[9],iscg[9],island[9]/iscg[9]
--------------------------------------------------------------------------- NameError Traceback (most recent call last) <ipython-input-3-d40ab837282e> in <module>() ----> 1 print island[9],iscg[9],island[9]/iscg[9] 2 NameError: name 'island' is not defined
GAVE UP -- excelled it
File "<ipython-input-70-31e33965a1a2>", line 1 GAVE UP -- excelled it ^ SyntaxError: invalid syntax
!head /Volumes/web/cnidarian/BiGill_island3_intersect_PERmCpG.txt
scaffold23910 cpgplot sequence_feature 1279 1555 . + . ID=scaffold23910.1 0 17 0 scaffold32586 cpgplot sequence_feature 1692 1941 . + . ID=scaffold32586.1 0 10 0 scaffold32586 cpgplot sequence_feature 5695 6035 . + . ID=scaffold32586.2 1 30 0.033333333 scaffold32586 cpgplot sequence_feature 6602 6806 . + . ID=scaffold32586.3 0 16 0 scaffold900 cpgplot sequence_feature 2406 3042 . + . ID=scaffold900.1 0 36 0 scaffold33640 cpgplot sequence_feature 4077 4288 . + . ID=scaffold33640.1 0 15 0 scaffold32024 cpgplot sequence_feature 8005 8296 . + . ID=scaffold32024.1 2 17 0.117647059 scaffold33132 cpgplot sequence_feature 8510 8775 . + . ID=scaffold33132.1 0 16 0 scaffold33132 cpgplot sequence_feature 8834 9115 . + . ID=scaffold33132.2 0 10 0 scaffold32140 cpgplot sequence_feature 480 821 . + . ID=scaffold32140.1 0 23 0
# read data from data file into a pandas DataFrame
perMcg = read_table("/Volumes/web/cnidarian/BiGill_island3_intersect_PERmCpG.txt", # name of the data file
#sep="\t", # what character separates each column?
#na_values=["", " "], # what values should be considered "blank" values?
header=None)
perMcg[11].hist(bins=50);
#Axis limits are changed using the axis([xmin, xmax, ymin, ymax]) function.
plt.axis([0, 1, 0, 2000]);
plt.title('Island percent_mcg');
# Develop track of CpG islands that are >80% methylated
!head /Volumes/web/cnidarian/TJGR_v9cpgplot3_80.gff
scaffold1260 cpgplot sequence_feature 113426 113836 . + . ID=scaffold1260.1 scaffold1260 cpgplot sequence_feature 138056 138259 . + . ID=scaffold1260.2 scaffold1050 cpgplot sequence_feature 63117 63359 . + . ID=scaffold1050.2 scaffold560 cpgplot sequence_feature 162469 162709 . + . ID=scaffold560.8 scaffold560 cpgplot sequence_feature 166332 166575 . + . ID=scaffold560.9 scaffold40246 cpgplot sequence_feature 67350 67561 . + . ID=scaffold40246.1 scaffold42236 cpgplot sequence_feature 31313 31549 . + . ID=scaffold42236.1 scaffold42236 cpgplot sequence_feature 32151 32455 . + . ID=scaffold42236.2 scaffold39720 cpgplot sequence_feature 44558 44761 . + . ID=scaffold39720.3 scaffold37178 cpgplot sequence_feature 3880 4081 . + . ID=scaffold37178.1
!wc /Volumes/web/cnidarian/TJGR_v9cpgplot3_80.gff
1723 15516 129211 /Volumes/web/cnidarian/TJGR_v9cpgplot3_80.gff
Represents 1723 islands that are 80% methylated or higher
#ID promoters that do not overlap gene bodies
!subtractbed -a /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff -b /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_gene.gff > /Volumes/web/cnidarian/TJGR_prom_subtract_gene1.gff
!head /Volumes/web/cnidarian/TJGR_prom_subtract_gene1.gff
!tail /Volumes/web/cnidarian/TJGR_prom_subtract_gene1.gff
!wc /Volumes/web/cnidarian/TJGR_prom_subtract_gene1.gff
26706 240354 1803061 /Volumes/web/cnidarian/TJGR_prom_subtract_gene1.gff
!wc /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff
28023 252207 1892891 /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_1k5p_gene_promoter.gff
take promoter track and keep only contain CpG Islands as defined by /Volumes/web/cnidarian/TJGR_v9cpgplot3.gff
Window size [100]: Minimum length of an island [200]: Minimum observed/expected [0.6]: Minimum percentage [50.]: 45 Output file [scaffold360.cpgplot]: /Users/sr320/Desktop/v9.cpgplot
!intersectbed -wa -a /Volumes/web/cnidarian/TJGR_prom_subtract_gene1.gff -b /Volumes/web/cnidarian/TJGR_v9cpgplot3.gff > /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland1.gff
!head /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland1.gff
!tail /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland1.gff
!wc /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland1.gff
2649 23841 178121 /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland1.gff
!wc /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland1u.gff
2464 22176 165912 /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland1u.gff
!tail /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland1u.gff
#again with more generous definition of CpG Island
Window size [100]: Minimum length of an island [200]: 150 Minimum observed/expected [0.6]: Minimum percentage [50.]: 40 Output file [scaffold360.cpgplot]: /Users/sr320/Desktop/v9.cpgplot Graph type [x11]: none Features output [scaffold360.gff]: */Volumes/web/cnidarian/TJGR_v9cpgplot2.gff*
!intersectbed -wa -a /Volumes/web/cnidarian/TJGR_prom_subtract_gene1.gff -b /Volumes/web/cnidarian/TJGR_v9cpgplot2.gff > /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland2.gff
#removing duplicates lines
!uniq /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland2.gff > /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland2u.gff
!wc /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland2.gff
12286 110574 830426 /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland2.gff
!wc /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland2u.gff
9961 89649 673382 /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland2u.gff
!head /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland2u.gff
!intersectbed -c -a /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland2u.gff -b /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff > /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland2u_in_CG.gff
!head /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland2u_in_CG.gff
8 21 2 20 15 16 31 32 66 21
!intersectbed -a /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland2u.gff -b /Volumes/web/trilobite/Crassostrea_gigas_v9_tracks/Cgigas_v9_CG.gff -c > /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland2u_in_CG2.gff
!tail /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland2u_in_CG2.gff
16 24 4 4 2 25 21 33 31 26
!tr "\n \t" ' ' </Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland2u_in_CG2.gff> /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland2u_in_CG3.gff
!head /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland2u_in_CG3.gff
26
Only "promoters" with islands and that do not overlap with gene bodies
#absolute methylation
#using the conserved model
#Window size [100]:
#Minimum length of an island [200]:
#Minimum observed/expected [0.6]:
#Minimum percentage [50.]: 45
#promoter does not overlap gene
!intersectbed -c -a /Volumes/web/cnidarian/TJGR_prom_notgene_cpgIsland1u.gff -b /Volumes/web/bivalvia/wholegenomefiles_MBDbsSeq_gill/finalSupp_figshare/Cgigas_gill_HTbisulfiteSeq_CG_methylated.gff > /Volumes/web/cnidarian/BiGill_pro_island1u_intersect_mCpG.txt
!head /Volumes/web/cnidarian/BiGill_pro_island1u_intersect_mCpG_mod.txt
C18346 flankbed promoter 1 173 . + . CGI_10000009 0 C18428 flankbed promoter 547 611 . - . CGI_10000010 0 C19356 flankbed promoter 1 354 . + . CGI_10000014 0 C19532 flankbed promoter 602 843 . - . CGI_10000017 0 C20578 flankbed promoter 1 698 . + . CGI_10000034 0 C21046 flankbed promoter 813 1275 . - . CGI_10000042 0 C21254 flankbed promoter 1 669 . + . CGI_10000047 0 C21260 flankbed promoter 1176 1343 . - . CGI_10000048 0 C22036 flankbed promoter 1 947 . + . CGI_10000068 0 C22346 flankbed promoter 756 1755 . - . CGI_10000079 0
!wc /Volumes/web/cnidarian/BiGill_pro_island1u_intersect_mCpG_mod.txt
2463 24640 158606 /Volumes/web/cnidarian/BiGill_pro_island1u_intersect_mCpG_mod.txt
from pandas import *
# read data from data file into a pandas DataFrame
npmpg = read_table("/Volumes/web/cnidarian/BiGill_pro_island1u_intersect_mCpG_mod.txt", # name of the data file
#sep="\t", # what character separates each column?
#na_values=["", " "], # what values should be considered "blank" values?
header=None)
npmpg[9].hist(bins=50);
#Axis limits are changed using the axis([xmin, xmax, ymin, ymax]) function.
#plt.axis([0, 40, 0, 500]);
plt.title('nProm_mCpG');
#next step plot absolute methylation against gene expression as provided by Mac
SQLShare
SELECT * FROM [sr320@washington.edu].[BiGill_pro_island1u_intersect_mCpG_mod.txt]pls left join [sr320@washington.edu].[ExpressedGeneGil.txt]z on pls.Column9 = z.ID