#So if you completed Module 1, you might have some data that looks like...
If you ran BLAST against UniProt-SwissProt, you will have some ID numbers (e.g., P12234).
Before going much further, I would like to clean up the sequence title to something more meaningful and separate the SPID (e.g., P12234) into its own column. Spoiler alert: eventually we will be joining this table with another table so that we can glean associated information, including the gene name and gene ontology.
!head /Volumes/web/cnidarian/Ab_4denovo_CLC6_a_uniprot_blastx.tab
solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_3 sp|O42248|GBLP_DANRE 82.46 171 30 0 1 513 35 205 1e-101 301
solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_5 sp|Q08013|SSRG_RAT 75.38 65 16 0 3 197 121 185 1e-27 104
solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_6 sp|P12234|MPCP_BOVIN 76.62 77 18 0 2 232 286 362 2e-23 98.6
solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_9 sp|Q41629|ADT1_WHEAT 82.26 62 11 0 3 188 170 231 3e-27 104
solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_13 sp|Q32NG4|PDDC1_XENLA 54.44 90 40 1 1 270 140 228 1e-27 106
solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_23 sp|Q9GNE2|RL23_AEDAE 97.22 72 2 0 67 282 14 85 1e-42 142
solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_31 sp|Q3V1H3|HPHL1_MOUSE 53.38 133 59 1 2 391 23 155 5e-42 153
solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_32 sp|Q641Y2|NDUS2_RAT 88.03 117 14 0 2 352 334 450 1e-70 224
solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_37 sp|Q9D3D9|ATPD_MOUSE 56.10 123 54 0 2 370 46 168 7e-42 144
solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_39 sp|Q39613|CYPH_CATRO 75.00 120 23 1 55 393 1 120 7e-49 160
!head /Volumes/web/cnidarian/Ab_4denovo_CLC6_a_uniprot_blastx2.tab
Haliotis_cra_4_contig_3 sp|O42248|GBLP_DANRE 82.46 171 30 0 1 513 35 205 1e-101 301
Haliotis_cra_4_contig_5 sp|Q08013|SSRG_RAT 75.38 65 16 0 3 197 121 185 1e-27 104
Haliotis_cra_4_contig_6 sp|P12234|MPCP_BOVIN 76.62 77 18 0 2 232 286 362 2e-23 98.6
Haliotis_cra_4_contig_9 sp|Q41629|ADT1_WHEAT 82.26 62 11 0 3 188 170 231 3e-27 104
Haliotis_cra_4_contig_13 sp|Q32NG4|PDDC1_XENLA 54.44 90 40 1 1 270 140 228 1e-27 106
Haliotis_cra_4_contig_23 sp|Q9GNE2|RL23_AEDAE 97.22 72 2 0 67 282 14 85 1e-42 142
Haliotis_cra_4_contig_31 sp|Q3V1H3|HPHL1_MOUSE 53.38 133 59 1 2 391 23 155 5e-42 153
Haliotis_cra_4_contig_32 sp|Q641Y2|NDUS2_RAT 88.03 117 14 0 2 352 334 450 1e-70 224
Haliotis_cra_4_contig_37 sp|Q9D3D9|ATPD_MOUSE 56.10 123 54 0 2 370 46 168 7e-42 144
Haliotis_cra_4_contig_39 sp|Q39613|CYPH_CATRO 75.00 120 23 1 55 393 1 120 7e-49 160
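The second file above carries shorter contig names, but the command that produced it is not shown. As a minimal sketch of one way such a rename could be done, assuming the long sequencer-derived prefix is identical on every row, the Python below swaps it for a short label. The function name `rename_query` and the two constants are hypothetical and are not part of the original workflow.

```python
# Hypothetical constants: the long prefix seen in the first file and the
# short label seen in the second file.
OLD_PREFIX = "solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed"
NEW_PREFIX = "Haliotis_cra_4"


def rename_query(line, old=OLD_PREFIX, new=NEW_PREFIX):
    """Replace the prefix only when it starts the line (the query ID column)."""
    if line.startswith(old):
        return new + line[len(old):]
    # Lines that do not start with the prefix are returned unchanged.
    return line


example = (
    "solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed_contig_6"
    "\tsp|P12234|MPCP_BOVIN\t76.62"
)
print(rename_query(example))
```

Anchoring the replacement to the start of the line keeps the rename confined to the query ID column, so the same text appearing elsewhere in a row would be left alone.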
#Now let's get rid of the pipes
!tr '|' '\t' < /Volumes/web/cnidarian/Ab_4denovo_CLC6_a_uniprot_blastx2.tab > /Volumes/web/cnidarian/Ab_4denovo_CLC6_a_uniprot_blastx3.tab
!head /Volumes/web/cnidarian/Ab_4denovo_CLC6_a_uniprot_blastx3.tab
Haliotis_cra_4_contig_3 sp O42248 GBLP_DANRE 82.46 171 30 0 1 513 35 205 1e-101 301
Haliotis_cra_4_contig_5 sp Q08013 SSRG_RAT 75.38 65 16 0 3 197 121 185 1e-27 104
Haliotis_cra_4_contig_6 sp P12234 MPCP_BOVIN 76.62 77 18 0 2 232 286 362 2e-23 98.6
Haliotis_cra_4_contig_9 sp Q41629 ADT1_WHEAT 82.26 62 11 0 3 188 170 231 3e-27 104
Haliotis_cra_4_contig_13 sp Q32NG4 PDDC1_XENLA 54.44 90 40 1 1 270 140 228 1e-27 106
Haliotis_cra_4_contig_23 sp Q9GNE2 RL23_AEDAE 97.22 72 2 0 67 282 14 85 1e-42 142
Haliotis_cra_4_contig_31 sp Q3V1H3 HPHL1_MOUSE 53.38 133 59 1 2 391 23 155 5e-42 153
Haliotis_cra_4_contig_32 sp Q641Y2 NDUS2_RAT 88.03 117 14 0 2 352 334 450 1e-70 224
Haliotis_cra_4_contig_37 sp Q9D3D9 ATPD_MOUSE 56.10 123 54 0 2 370 46 168 7e-42 144
Haliotis_cra_4_contig_39 sp Q39613 CYPH_CATRO 75.00 120 23 1 55 393 1 120 7e-49 160
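Once the pipes are gone, the SwissProt accession (SPID) sits in its own third column. The same split can be sketched in Python, assuming tab-delimited input where the second field has the form `sp|ACCESSION|ENTRY_NAME`. The helper name `split_subject` is hypothetical.

```python
def split_subject(line):
    """Split a BLAST tabular row and expand the sp|ACCESSION|NAME field."""
    fields = line.rstrip("\n").split("\t")
    # fields[1] looks like "sp|P12234|MPCP_BOVIN"; replace that one field
    # with its three pipe-separated parts.
    fields[1:2] = fields[1].split("|")
    return fields


row = split_subject("Haliotis_cra_4_contig_6\tsp|P12234|MPCP_BOVIN\t76.62\t77")
print(row[2])  # the SPID, now the third column
```

Unlike the `tr` call, which converts every pipe in the file, this sketch only touches the second column.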
For an introduction to SQLShare, see https://github.com/sr320/qdod/wiki/SQLShare-Intro
A BLAST output table will be joined with gene descriptions.
Explanation of code
SELECT *
FROM [wearh@washington.edu].[Oly_Blast_uniprot_swissprot.txt] blast
LEFT JOIN [samwhite@washington.edu].[UniprotProtNamesReviewed_yes20130610] unp
ON blast.Column3 = unp.SPID
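To see what the left join does, here is a small self-contained sketch that uses Python's built-in `sqlite3` module in place of SQLShare. The table layouts, the `ProteinName` column, and the description text are made up for illustration only; the SPIDs come from the BLAST output above.

```python
import sqlite3

# In-memory database standing in for SQLShare (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blast (Column1 TEXT, Column2 TEXT, Column3 TEXT)")
conn.execute("CREATE TABLE unp (SPID TEXT, ProteinName TEXT)")

# Two BLAST hits; only one of them has a matching annotation row.
conn.executemany(
    "INSERT INTO blast VALUES (?, ?, ?)",
    [
        ("Haliotis_cra_4_contig_6", "sp", "P12234"),
        ("Haliotis_cra_4_contig_9", "sp", "Q41629"),
    ],
)
# Placeholder annotation text, not a real UniProt description.
conn.execute("INSERT INTO unp VALUES (?, ?)", ("P12234", "placeholder description"))

rows = conn.execute(
    "SELECT blast.Column1, blast.Column3, unp.ProteinName "
    "FROM blast LEFT JOIN unp ON blast.Column3 = unp.SPID "
    "ORDER BY blast.Column1"
).fetchall()

for row in rows:
    print(row)
```

A left join keeps every BLAST row. Hits with no matching SPID in the annotation table still appear, with the annotation columns left empty (`None` in Python).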