!head /Volumes/web/cnidarian/BioSys_mCpGbind_492643.fasta
>gi|528955085|ref|XP_005208771.1| PREDICTED: DNA (cytosine-5)-methyltransferase 1 isoform X5 [Bos taurus]
MAEKGKPPKPVSRLYTPRRSKSDGETKSEVSSSPRITRKTTRQTTITSHFPRGPAKRKPEEEPEKVKSDD
SVDEEKDQEEKRRRVTSRERVAGLLPAEEPGRVRPGTHMEEEGRDDKEEKRLRSQTKEPTPKHKAKEEPD
RDVRPGGAQAEMNEGEDKDEKRHRSQPKDLASKRRPEEKEPERVKPQVSDEKDEDEKFWRIQFTYQSTSR
EEKRRRTTYRELTEKKMTRTKIAVVSKTNPPKCTECLQYLDDPELRYEQHPPDAVEEIQILTNERLSIFD
ANESGFESYEDLPQHKLTCFSVYCKRGHLCPIDTGLIEKDVELLFSGSAKPIYEDDPSPEGGINGKNFGP
INEWWIAGFDGGEKALLGFSTSFAEYILMDPSPEYAPLFSVMQEKIYISKIVVEFLQSNPDSTYEDLINK
IETTVPPCMLNLNRFTEDSLLRHAQFVVEQVESYDRAGDSDEQPIFLSPCMRDLIKLAGVTLGKRRAERR
QTIRQPAKEKDKGPTKATTTKLVYQIFDTFFAEQIEKDDKEDKENAFKRRRCGVCEICQQPECGKCKACK
DMVKFGGSGRSKQACQKRRCPNMAMKEADDDEEVDDNIPEMPSPKKMHQGKKKKQNKNRISWVGDAVKTD
!wc /Volumes/web/cnidarian/BioSys_mCpGbind_492643.fasta
4947 7661 316301 /Volumes/web/cnidarian/BioSys_mCpGbind_492643.fasta
!makeblastdb -in /Volumes/Bay3/Software/ncbi-blast-2.2.28\+/db/oyster_v9p.fa -out /Volumes/Bay3/Software/ncbi-blast-2.2.28\+/db/oyster_v9p -dbtype prot
Building a new DB, current time: 08/16/2013 07:28:01
New DB name: /Volumes/Bay3/Software/ncbi-blast-2.2.28+/db/oyster_v9p
New DB title: /Volumes/Bay3/Software/ncbi-blast-2.2.28+/db/oyster_v9p.fa
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 28027 sequences in 1.79495 seconds.
!blastp -query /Volumes/web/cnidarian/BioSys_mCpGbind_492643.fasta -db /Volumes/Bay3/Software/ncbi-blast-2.2.28\+/db/oyster_v9p -out /Volumes/web/cnidarian/mCpG_BP_blastp_v9_out.txt -outfmt 6 -max_target_seqs 10
!head /Volumes/web/cnidarian/mCpG_BP_blastp_v9_out.txt
gi|528955085|ref|XP_005208771.1|	CGI_10021920	61.43	1006	353	15	294	1287	1	983	0.0	1202
gi|528955085|ref|XP_005208771.1|	CGI_10004707	45.13	113	59	2	689	798	11	123	2e-22	103
gi|528955085|ref|XP_005208771.1|	CGI_10025673	48.98	49	20	1	542	585	703	751	2e-07	55.8
gi|528955085|ref|XP_005208771.1|	CGI_10020946	25.71	210	120	9	1028	1227	1	184	7e-07	53.1
gi|528955085|ref|XP_005208771.1|	CGI_10003574	41.67	60	32	2	527	585	110	167	5e-06	50.8
gi|528955085|ref|XP_005208771.1|	CGI_10021919	46.88	32	17	0	234	265	239	270	0.96	33.5
gi|528955085|ref|XP_005208771.1|	CGI_10013639	25.00	132	92	2	5	136	1444	1568	1.3	33.5
gi|528955085|ref|XP_005208771.1|	CGI_10002526	22.58	279	186	6	1	261	153	419	5.8	31.2
gi|528955083|ref|XP_005208770.1|	CGI_10021920	61.43	1006	353	15	408	1401	1	983	0.0	1202
gi|528955083|ref|XP_005208770.1|	CGI_10004707	45.13	113	59	2	803	912	11	123	3e-22	103
!wc /Volumes/web/cnidarian/mCpG_BP_blastp_v9_out.txt
4694 56328 389245 /Volumes/web/cnidarian/mCpG_BP_blastp_v9_out.txt
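The `-outfmt 6` table has twelve tab-separated columns, in BLAST+'s default order: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. A minimal sketch for reading such rows into named fields follows; the function name `parse_outfmt6` is hypothetical, and the sample data are two of the rows shown above rather than the full output file.

```python
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_COLS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_COLS = {"pident", "evalue", "bitscore"}


def parse_outfmt6(handle):
    """Yield one dict per hit row from a tab-delimited outfmt 6 stream."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError("expected 12 columns, got %d" % len(fields))
        row = dict(zip(COLUMNS, fields))
        for key in INT_COLS:
            row[key] = int(row[key])
        for key in FLOAT_COLS:
            row[key] = float(row[key])
        yield row


# Two rows copied from the head of mCpG_BP_blastp_v9_out.txt.
SAMPLE = (
    "gi|528955085|ref|XP_005208771.1|\tCGI_10021920\t61.43\t1006\t353\t15"
    "\t294\t1287\t1\t983\t0.0\t1202\n"
    "gi|528955085|ref|XP_005208771.1|\tCGI_10004707\t45.13\t113\t59\t2"
    "\t689\t798\t11\t123\t2e-22\t103\n"
)

hits = list(parse_outfmt6(io.StringIO(SAMPLE)))
for hit in hits:
    print(hit["qseqid"], hit["sseqid"], hit["evalue"], hit["bitscore"])
```

To read the real file, an open file handle can be passed in place of the `io.StringIO` wrapper.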
!blastp -query /Volumes/web/cnidarian/BioSys_mCpGbind_492643.fasta -db /Volumes/Bay3/Software/ncbi-blast-2.2.28\+/db/oyster_v9p -out /Volumes/web/cnidarian/mCpG_BP_blastp_v9_out2.txt -evalue 1E-20 -outfmt 6 -max_target_seqs 10
!wc /Volumes/web/cnidarian/mCpG_BP_blastp_v9_out2.txt
365 4380 30402 /Volumes/web/cnidarian/mCpG_BP_blastp_v9_out2.txt
# 13 oyster sequences identified with the 1E-20 e-value cutoff
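The notebook does not show how the sequence count was obtained. One way to get such a count is to collect the distinct subject IDs (column 2) from the tabular output, optionally re-applying the e-value threshold (column 11). The sketch below is an assumption about the approach, not the author's method; `unique_subjects` is a hypothetical helper and the sample rows are taken from the unfiltered output shown earlier.

```python
def unique_subjects(lines, max_evalue=1e-20):
    """Return the sorted distinct subject IDs whose e-value is <= max_evalue."""
    subjects = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        if float(fields[10]) <= max_evalue:
            subjects.add(fields[1])
    return sorted(subjects)


# Rows copied from the head of mCpG_BP_blastp_v9_out.txt (tab-delimited).
SAMPLE_ROWS = [
    "gi|528955085|ref|XP_005208771.1|\tCGI_10021920\t61.43\t1006\t353\t15\t294\t1287\t1\t983\t0.0\t1202",
    "gi|528955085|ref|XP_005208771.1|\tCGI_10004707\t45.13\t113\t59\t2\t689\t798\t11\t123\t2e-22\t103",
    "gi|528955085|ref|XP_005208771.1|\tCGI_10025673\t48.98\t49\t20\t1\t542\t585\t703\t751\t2e-07\t55.8",
    "gi|528955083|ref|XP_005208770.1|\tCGI_10021920\t61.43\t1006\t353\t15\t408\t1401\t1\t983\t0.0\t1202",
    "gi|528955083|ref|XP_005208770.1|\tCGI_10004707\t45.13\t113\t59\t2\t803\t912\t11\t123\t3e-22\t103",
]

candidates = unique_subjects(SAMPLE_ROWS)
print(len(candidates), candidates)
```

On these five sample rows only two subjects pass the cutoff; the count of 13 reported in the notebook refers to the full `mCpG_BP_blastp_v9_out2.txt` file.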
!head /Volumes/web/cnidarian/mCpG_BP_candidates.fa
>CGI_10021920
QHKITNFSVYDKNTHLCPFDTGLIEKNVFLYFSGVVKPIYDENSSPEGGI
RACKMGPINEWWTAGFDGGENALIGFSTAYAEYILMSPSEAYKPYMDTMR
EKIHMSKVVIEFMQNNQEATYEDLLNKIQTTVPPTGLSSLTEDSLLRHAQ
FVLDQVQSYDEAAEEDEGLLITTPCMRALIKLAGVTLGKRRQMRKELRKT
KDKVKKPAFTMATTTRLVTQIFDSLFQGEIDDKSGQGSKRRRCGICEICQ
QPDCGKCTACKDMVKFGGSGKAKQACINRRCPNMAMKEADEDDILDDDDT
DEKLETTKLSWVGDPVLQDGKNSYYSAVLINDEKVSFGDFISIKPEDVAI
PVYIAMVNYLWENASGNKMCHVQWLCRGSDTILGETGDPLELFFVDDCES
IKLESSLRKVKVLHKETSPDWFMQGGIEHPEKDFPIEDDSNTFYYQKWYD
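The notebook does not record how `mCpG_BP_candidates.fa` was assembled. A plausible route, sketched here as an assumption, is to pull the records for the candidate IDs out of the oyster protein FASTA. The helper names `read_fasta` and `select_records` are hypothetical, and the toy records are made-up placeholders rather than real oyster sequences.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_records(lines, wanted_ids):
    """Yield records whose ID (first word of the header) is in wanted_ids."""
    wanted = set(wanted_ids)
    for header, seq in read_fasta(lines):
        if header.split()[0] in wanted:
            yield header, seq


# Toy input: placeholder IDs and sequences, NOT real data.
TOY_FASTA = [
    ">toy_1 first record",
    "MAEKGK",
    "PPKPVS",
    ">toy_2 second record",
    "MKTAYIAK",
    ">toy_3",
    "GGSGRS",
]

picked = list(select_records(TOY_FASTA, {"toy_2", "toy_3"}))
for header, seq in picked:
    print(">" + header)
    print(seq)
```

With real data, `TOY_FASTA` would be replaced by an open handle on the oyster FASTA and the ID set by the subject IDs that passed the BLAST cutoff.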
#Batch Web CD search
!head /Volumes/web/cnidarian/mCpG_BP_candidates_CDhitdata.txt
#Batch CD-search tool	NIH/NLM/NCBI
#cdsid	QM3-qcdsearch-1A5793D4DABAFA61-10538AEF94374D59
#datatype	hits Concise data
#status	0
#Start time	2013-08-16T14:49:55	Run time	0:00:00:20
#status	success

Query	Hit type	PSSM-ID	From	To	E-Value	Bitscore	Accession	Short name	Incomplete	Superfamily
Q#1 - >CGI_10021920	specific	240107	332	457	2.95408e-47	165.713	cd04760	BAH_Dnmt1_I	-	cl02608
Q#1 - >CGI_10021920	superfamily	243106	332	457	2.95408e-47	165.713	cl02608	BAH superfamily	-	-
Two hard matches: CGI_10023379 and CGI_10011651
CGI_10023379 O95243 3e-66 Methyl-CpG-binding domain protein 4
CGI_10011651 Q9UBB5 4e-77 Methyl-CpG-binding domain protein 2
!head /Volumes/web/cnidarian/TJGR_CCD_feature.txt
#Batch CD-search tool	NIH/NLM/NCBI
#cdsid	QM3-qcdsearch-1DCFE6033927026F-7DD9BF65B71B020B
#datatype	feats
#status	0
#Start time	2013-08-16T14:31:22	Run time	0:11:44:47
#status	success

Query	Type	Title	coordinates	complete size	mapped size	source domain
Q#2 - >CGI_10000456	specific	active site residues	K116,K123	2	2	238385
Q#2 - >CGI_10000456	specific	MoaE homodimer interface	S17,T21,G26,I28,S29,I30,F31,V32,I34,R36,A90,P99,R101,E109,D113	15	15	238385
!head /Volumes/web/cnidarian/TJGR_CCD_dom_con_S_define.txt
#Batch CD-search tool	NIH/NLM/NCBI
#cdsid	QM3-qcdsearch-1DCFE6033927026F-5D508E148AAAD632
#datatype	hits Concise data(Superfamily only)
#status	0
#Start time	2013-08-16T14:31:22	Run time	0:11:44:47
#status	success

Query	Hit type	PSSM-ID	From	To	E-Value	Bitscore	Accession	Short name	Incomplete	Superfamily	Definition
Q#2 - >CGI_10000456	superfamily	241841	11	135	8.33438e-46	147.279	cl00399	MoaE superfamily	-	-	MoaE family. Members of this family are involved in biosynthesis of the molybdenum cofactor (Moco), an essential cofactor for a diverse group of redox enzymes. Moco biosynthesis is an evolutionarily conserved pathway present in eubacteria, archaea and eukaryotes. Moco contains a tricyclic pyranopterin, termed molybdopterin (MPT), which carries the cis-dithiolene group responsible for molybdenum ligation. This dithiolene group is generated by MPT synthase in the second major step in Moco biosynthesis. MPT synthase is a heterotetramer consisting of two large (MoaE) and two small (MoaD) subunits.
Q#4 - >CGI_10000774	superfamily	220249	54	121	1.85274e-18	74.564	cl09695	H_lectin superfamily	-	-	H-type lectin domain; The H-type lectin domain is a unit of six beta chains, combined into a homo-hexamer. It is involved in self/non-self recognition of cells, through binding with carbohydrates. It is sometimes found in association with the F5_F8_type_C domain pfam00754.
!head /Volumes/web/cnidarian/TJGR_CCD_dom_con.txt
#Batch CD-search tool	NIH/NLM/NCBI
#cdsid	QM3-qcdsearch-1DCFE6033927026F-517F715027090CD6
#datatype	hits Concise data
#status	0
#Start time	2013-08-16T14:31:22	Run time	0:11:44:47
#status	success

Query	Hit type	PSSM-ID	From	To	E-Value	Bitscore	Accession	Short name	Incomplete	Superfamily
Q#2 - >CGI_10000456	specific	238385	11	135	8.33438e-46	147.279	cd00756	MoaE	-	cl00399
Q#2 - >CGI_10000456	superfamily	241841	11	135	8.33438e-46	147.279	cl00399	MoaE superfamily	-	-
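The Batch CD-search hit files share one layout: `#`-prefixed metadata lines, a column header beginning with `Query`, then one row per hit. A minimal parsing sketch follows, assuming the downloaded files are tab-delimited (the captured output above does not make the delimiter visible). `parse_cd_hits` and the added `query_id` key are hypothetical names; the sample rows are copied from `TJGR_CCD_dom_con.txt`.

```python
def parse_cd_hits(lines):
    """Yield one dict per hit row, keyed by the file's own column header."""
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # metadata and blank lines
        fields = line.split("\t")
        if header is None:
            header = fields  # first non-comment line is the column header
            continue
        row = dict(zip(header, fields))
        # "Q#2 - >CGI_10000456" -> "CGI_10000456"
        row["query_id"] = row["Query"].split(">", 1)[-1]
        yield row


SAMPLE = [
    "#Batch CD-search tool\tNIH/NLM/NCBI",
    "#datatype\thits Concise data",
    "#status\tsuccess",
    "",
    "Query\tHit type\tPSSM-ID\tFrom\tTo\tE-Value\tBitscore\tAccession"
    "\tShort name\tIncomplete\tSuperfamily",
    "Q#2 - >CGI_10000456\tspecific\t238385\t11\t135\t8.33438e-46\t147.279"
    "\tcd00756\tMoaE\t-\tcl00399",
    "Q#2 - >CGI_10000456\tsuperfamily\t241841\t11\t135\t8.33438e-46\t147.279"
    "\tcl00399\tMoaE superfamily\t-\t-",
]

rows = list(parse_cd_hits(SAMPLE))
for row in rows:
    print(row["query_id"], row["Hit type"], row["Short name"], row["E-Value"])
```

Because the header is read from the file itself, the same function should also handle the superfamily file with its extra `Definition` column.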
File with definitions uploaded to SQLShare, then queried for domains whose definition mentions CpG:
SELECT *
FROM [sr320@washington.edu].[TJGR_CCD_domain_concise_Superfamily_def]
WHERE Definition LIKE '%CpG%'
!wc /Volumes/web/cnidarian/TJGR_CpG_domain.csv
18 1699 13383 /Volumes/web/cnidarian/TJGR_CpG_domain.csv
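For readers without SQLShare access, the `LIKE '%CpG%'` filter can be approximated locally as a substring match on the `Definition` column. This is a sketch under assumptions: whether SQL `LIKE` is case-sensitive depends on the database collation, so the hypothetical helper `like_contains` exposes that as a parameter. The second sample row is a made-up placeholder, not a real CD-search hit.

```python
def like_contains(rows, column, needle, case_sensitive=False):
    """Keep rows whose `column` contains `needle` (analogue of LIKE '%needle%')."""
    if not case_sensitive:
        needle = needle.lower()
    kept = []
    for row in rows:
        value = row.get(column) or ""
        haystack = value if case_sensitive else value.lower()
        if needle in haystack:
            kept.append(row)
    return kept


ROWS = [
    # Real row (definition abbreviated) from TJGR_CCD_dom_con_S_define.txt.
    {"Query": "Q#2 - >CGI_10000456",
     "Short name": "MoaE superfamily",
     "Definition": "MoaE family. Members of this family are involved in "
                   "biosynthesis of the molybdenum cofactor (Moco)"},
    # Made-up placeholder row, included only so the filter has a match.
    {"Query": "toy_query",
     "Short name": "toy_domain",
     "Definition": "Placeholder text mentioning a methyl-CpG binding domain."},
]

matches = like_contains(ROWS, "Definition", "CpG")
print([row["Query"] for row in matches])
```

The rows could come from any parser that returns dicts keyed by the CD-search column names.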