Intermine-Python: Tutorial 10: Enrichment Calculations on Lists

This tutorial shows how to perform enrichment calculations on lists that you have access to.

In [1]:
from intermine.webservice import Service
In [2]:
service = Service("https://www.flymine.org/flymine/service")

An InterMine service provides various widgets, each of which performs a different function. These widgets are stored in a dictionary on the Service object, keyed by widget name. To view all of them, use service.widgets.

In [3]:
service.widgets
Out[3]:
{'flyatlas_for_gene': {'name': 'flyatlas_for_gene',
  'chartType': 'BarChart',
  'description': 'For each tissue in the fly, the number of genes from this list for which the levels of expression are significantly high (Up) or low (Down) according to <a href="http://www.flyatlas.org/" target="_new">FlyAtlas</a> AffyCall.',
  'startClass': 'FlyAtlasResult',
  'title': 'Gene Expression in the Fly (FlyAtlas)',
  'targets': ['Gene'],
  'widgetType': 'chart',
  'labels': {'x': 'Tissue', 'y': 'Up (+) or Down (-) gene count'}},
 'go_enrichment_for_gene': {'startClassDisplay': 'primaryIdentifier',
  'enrichIdentifier': 'goAnnotation.ontologyTerm.parents.identifier',
  'name': 'go_enrichment_for_gene',
  'description': 'GO terms enriched for items in this list.',
  'enrich': 'goAnnotation.ontologyTerm.parents.name',
  'filters': 'biological_process,cellular_component,molecular_function',
  'startClass': 'Gene',
  'title': 'Gene Ontology Enrichment',
  'targets': ['Gene'],
  'widgetType': 'enrichment'},
 'bdgp': {'name': 'bdgp',
  'chartType': 'ColumnChart',
  'description': 'Expression patterns of Drosophila mRNAs during embryogenesis - data from  <a href="http://www.fruitfly.org/cgi-bin/ex/insitu.pl" target="_new">BGDP</a>.  Note that not all genes have been assayed by <a href="http://www.fruitfly.org/cgi-bin/ex/insitu.pl" target="_new">BGDP</a>. ',
  'startClass': 'Gene',
  'title': 'BDGP expression patterns',
  'targets': ['Gene'],
  'widgetType': 'chart',
  'labels': {'x': 'Stage', 'y': 'Gene count'}},
 'pathway_enrichment': {'startClassDisplay': 'primaryIdentifier',
  'enrichIdentifier': 'pathways.identifier',
  'name': 'pathway_enrichment',
  'description': 'Pathways enriched for genes in this list - data from KEGG and Reactome',
  'enrich': 'pathways.name',
  'filters': 'All,KEGG pathways data set,Reactome data set',
  'startClass': 'Gene',
  'title': 'Pathway Enrichment',
  'targets': ['Gene'],
  'widgetType': 'enrichment'},
 'flyfish': {'name': 'flyfish',
  'chartType': 'ColumnChart',
  'description': 'Expression patterns of Drosophila mRNAs at the subcellular level during early embryogenesis - data from  <a href="http://fly-fish.ccbr.utoronto.ca/" target="_new">Fly-FISH</a>.  Note that not all genes have been assayed by <a href="http://fly-fish.ccbr.utoronto.ca/" target="_new">Fly-FISH</a>.',
  'startClass': 'Gene',
  'title': 'mRNA subcellular localisation (fly-FISH)',
  'targets': ['Gene'],
  'widgetType': 'chart',
  'labels': {'x': 'Stage', 'y': 'Gene count'}},
 'prot_dom_enrichment_for_protein': {'startClassDisplay': 'primaryIdentifier',
  'enrichIdentifier': 'proteinDomainRegions.proteinDomain.primaryIdentifier',
  'name': 'prot_dom_enrichment_for_protein',
  'description': 'Protein Domains enriched for items in this list.',
  'enrich': 'proteinDomainRegions.proteinDomain.name',
  'startClass': 'Protein',
  'title': 'Protein Domain Enrichment',
  'targets': ['Protein'],
  'widgetType': 'enrichment'},
 'protein_features': {'startClassDisplay': 'primaryIdentifier',
  'enrichIdentifier': 'features.type',
  'name': 'protein_features',
  'description': 'UniProt features enriched for proteins in this list.',
  'enrich': 'features.type',
  'startClass': 'Protein',
  'title': 'UniProt Features',
  'targets': ['Protein'],
  'widgetType': 'enrichment'},
 'flyatlas_for_probeset': {'name': 'flyatlas_for_probeset',
  'chartType': 'BarChart',
  'description': 'For each tissue, the number of genes from this list for which the levels of expression are significantly high (Up) or low (Down) according to <a href="http://www.flyatlas.org/" target="_new">FlyAtlas</a> AffyCall.',
  'startClass': 'FlyAtlasResult',
  'title': 'Gene Expression in Fly Tissues (FlyAtlas Data)',
  'targets': ['ProbeSet'],
  'widgetType': 'chart',
  'labels': {'x': 'Tissue', 'y': 'Up (+) or Down (-) gene count'}},
 'uniprot_keywords': {'startClassDisplay': 'primaryIdentifier',
  'name': 'uniprot_keywords',
  'description': 'UniProt keywords enriched for proteins in this list.',
  'enrich': 'keywords.name',
  'startClass': 'Protein',
  'title': 'UniProt Keywords',
  'targets': ['Protein'],
  'widgetType': 'enrichment'},
 'orthologues': {'name': 'orthologues',
  'description': 'Counts of genes in list with orthologues.',
  'title': 'Orthologues',
  'targets': ['Gene'],
  'widgetType': 'table'},
 'publication_enrichment': {'startClassDisplay': 'primaryIdentifier',
  'enrichIdentifier': 'publications.pubMedId',
  'name': 'publication_enrichment',
  'description': 'Publications enriched for genes in this list.',
  'enrich': 'publications.title',
  'startClass': 'Gene',
  'title': 'Publication Enrichment',
  'targets': ['Gene'],
  'widgetType': 'enrichment'},
 'bdgp_enrichment': {'startClassDisplay': 'primaryIdentifier',
  'name': 'bdgp_enrichment',
  'description': 'ImaGO terms enriched for genes in this list - data from <a href="http://www.fruitfly.org/cgi-bin/ex/insitu.pl" target="_new">BDGP</a>.  Note that not all genes have been assayed by BDGP.',
  'enrich': 'mRNAExpressionResults.mRNAExpressionTerms[MRNAExpressionTerm].name',
  'startClass': 'Gene',
  'title': 'BDGP Enrichment',
  'targets': ['Gene'],
  'widgetType': 'enrichment'},
 'chromosome_distribution_for_gene': {'name': 'chromosome_distribution_for_gene',
  'chartType': 'ColumnChart',
  'description': 'Actual: number of items in this list found on each chromosome.  Expected: given the total number of items on the chromosome and the number of items in this list, the number of items expected to be found on each chromosome.',
  'filters': 'organism.name=[list]',
  'startClass': 'SequenceFeature',
  'title': 'Chromosome Distribution',
  'targets': ['SequenceFeature'],
  'widgetType': 'chart',
  'labels': {'x': 'Chromosome', 'y': 'Count'}},
 'prot_dom_enrichment_for_gene': {'startClassDisplay': 'primaryIdentifier',
  'enrichIdentifier': 'proteins.proteinDomainRegions.proteinDomain.primaryIdentifier',
  'name': 'prot_dom_enrichment_for_gene',
  'description': 'Protein Domains enriched for items in this list.',
  'enrich': 'proteins.proteinDomainRegions.proteinDomain.name',
  'startClass': 'Gene',
  'title': 'Protein Domain Enrichment',
  'targets': ['Gene'],
  'widgetType': 'enrichment'},
 'miranda_enrichment': {'startClassDisplay': 'symbol',
  'name': 'miranda_enrichment',
  'description': 'MiRNAs enriched for items in this list.',
  'enrich': 'transcripts[MRNA].miRNAinteractions.mirnagene.symbol',
  'startClass': 'Gene',
  'title': 'MiRNA Enrichment',
  'targets': ['Gene'],
  'widgetType': 'enrichment'}}
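
Because the widgets are held in a plain dictionary keyed by name, a compact overview can be built by grouping the names by their widgetType. The following is a minimal sketch: sample_widgets is a hand-copied subset of the output above, used so that the snippet runs without a network connection, and group_by_type is a hypothetical helper, not part of the intermine package.

```python
from collections import defaultdict

# Hand-copied subset of service.widgets (assumption: stands in for the
# live dictionary so the example runs offline).
sample_widgets = {
    "flyatlas_for_gene": {"widgetType": "chart", "targets": ["Gene"]},
    "go_enrichment_for_gene": {"widgetType": "enrichment", "targets": ["Gene"]},
    "orthologues": {"widgetType": "table", "targets": ["Gene"]},
    "uniprot_keywords": {"widgetType": "enrichment", "targets": ["Protein"]},
}


def group_by_type(widgets):
    """Map each widgetType to the sorted list of widget names of that type."""
    groups = defaultdict(list)
    for name, info in widgets.items():
        groups[info["widgetType"]].append(name)
    return {wtype: sorted(names) for wtype, names in groups.items()}


groups = group_by_type(sample_widgets)
for wtype, names in sorted(groups.items()):
    print(wtype, names)
```

With a live connection, passing service.widgets in place of sample_widgets should give the same kind of summary for the full set of widgets.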

To view only the widgets related to enrichment, filter on the widget type. The cell below prints every widget whose widgetType is "enrichment".

In [4]:
# Iterate over the widget descriptions and keep only the enrichment widgets
for widget in service.widgets.values():
    if widget["widgetType"] == "enrichment":
        print(widget)
{'startClassDisplay': 'primaryIdentifier', 'enrichIdentifier': 'goAnnotation.ontologyTerm.parents.identifier', 'name': 'go_enrichment_for_gene', 'description': 'GO terms enriched for items in this list.', 'enrich': 'goAnnotation.ontologyTerm.parents.name', 'filters': 'biological_process,cellular_component,molecular_function', 'startClass': 'Gene', 'title': 'Gene Ontology Enrichment', 'targets': ['Gene'], 'widgetType': 'enrichment'}
{'startClassDisplay': 'primaryIdentifier', 'enrichIdentifier': 'pathways.identifier', 'name': 'pathway_enrichment', 'description': 'Pathways enriched for genes in this list - data from KEGG and Reactome', 'enrich': 'pathways.name', 'filters': 'All,KEGG pathways data set,Reactome data set', 'startClass': 'Gene', 'title': 'Pathway Enrichment', 'targets': ['Gene'], 'widgetType': 'enrichment'}
{'startClassDisplay': 'primaryIdentifier', 'enrichIdentifier': 'proteinDomainRegions.proteinDomain.primaryIdentifier', 'name': 'prot_dom_enrichment_for_protein', 'description': 'Protein Domains enriched for items in this list.', 'enrich': 'proteinDomainRegions.proteinDomain.name', 'startClass': 'Protein', 'title': 'Protein Domain Enrichment', 'targets': ['Protein'], 'widgetType': 'enrichment'}
{'startClassDisplay': 'primaryIdentifier', 'enrichIdentifier': 'features.type', 'name': 'protein_features', 'description': 'UniProt features enriched for proteins in this list.', 'enrich': 'features.type', 'startClass': 'Protein', 'title': 'UniProt Features', 'targets': ['Protein'], 'widgetType': 'enrichment'}
{'startClassDisplay': 'primaryIdentifier', 'name': 'uniprot_keywords', 'description': 'UniProt keywords enriched for proteins in this list.', 'enrich': 'keywords.name', 'startClass': 'Protein', 'title': 'UniProt Keywords', 'targets': ['Protein'], 'widgetType': 'enrichment'}
{'startClassDisplay': 'primaryIdentifier', 'enrichIdentifier': 'publications.pubMedId', 'name': 'publication_enrichment', 'description': 'Publications enriched for genes in this list.', 'enrich': 'publications.title', 'startClass': 'Gene', 'title': 'Publication Enrichment', 'targets': ['Gene'], 'widgetType': 'enrichment'}
{'startClassDisplay': 'primaryIdentifier', 'name': 'bdgp_enrichment', 'description': 'ImaGO terms enriched for genes in this list - data from <a href="http://www.fruitfly.org/cgi-bin/ex/insitu.pl" target="_new">BDGP</a>.  Note that not all genes have been assayed by BDGP.', 'enrich': 'mRNAExpressionResults.mRNAExpressionTerms[MRNAExpressionTerm].name', 'startClass': 'Gene', 'title': 'BDGP Enrichment', 'targets': ['Gene'], 'widgetType': 'enrichment'}
{'startClassDisplay': 'primaryIdentifier', 'enrichIdentifier': 'proteins.proteinDomainRegions.proteinDomain.primaryIdentifier', 'name': 'prot_dom_enrichment_for_gene', 'description': 'Protein Domains enriched for items in this list.', 'enrich': 'proteins.proteinDomainRegions.proteinDomain.name', 'startClass': 'Gene', 'title': 'Protein Domain Enrichment', 'targets': ['Gene'], 'widgetType': 'enrichment'}
{'startClassDisplay': 'symbol', 'name': 'miranda_enrichment', 'description': 'MiRNAs enriched for items in this list.', 'enrich': 'transcripts[MRNA].miRNAinteractions.mirnagene.symbol', 'startClass': 'Gene', 'title': 'MiRNA Enrichment', 'targets': ['Gene'], 'widgetType': 'enrichment'}
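
Each widget also records the classes it applies to under 'targets', so the same filtering idea can narrow the enrichment widgets down to those usable with a list of a given class. This is a minimal sketch under stated assumptions: enrichment_widgets_for is a hypothetical helper, and sample_widgets is a hand-copied subset of the output above rather than the live dictionary.

```python
def enrichment_widgets_for(widgets, target_class):
    """Return the names of enrichment widgets whose 'targets' include target_class."""
    return sorted(
        name
        for name, info in widgets.items()
        if info.get("widgetType") == "enrichment"
        and target_class in info.get("targets", [])
    )


# Hand-copied subset of the output above (assumption: stands in for
# service.widgets so the example runs offline).
sample_widgets = {
    "pathway_enrichment": {"widgetType": "enrichment", "targets": ["Gene"]},
    "publication_enrichment": {"widgetType": "enrichment", "targets": ["Gene"]},
    "protein_features": {"widgetType": "enrichment", "targets": ["Protein"]},
    "bdgp": {"widgetType": "chart", "targets": ["Gene"]},
}

gene_widgets = enrichment_widgets_for(sample_widgets, "Gene")
protein_widgets = enrichment_widgets_for(sample_widgets, "Protein")
print(gene_widgets)
print(protein_widgets)
```

The chart widget bdgp is excluded even though it targets Gene, because only widgets of type "enrichment" are kept.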

Next, we use the list manager to fetch the list on which we want to perform the analysis.

In [5]:
lm = service.list_manager()
In [6]:
l = lm.get_list(name="PL FlyAtlas_brain_top")

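To perform an enrichment analysis on the list, call its calculate_enrichment method with the name of an enrichment widget. The maxp argument sets the largest p-value a result may have and still be returned. The results are stored in r, which we then iterate over to print each result's identifier, description and p-value.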

In [7]:
r = l.calculate_enrichment(widget="publication_enrichment", maxp=0.01)
In [8]:
for i in r:
    print(i.identifier, i.description, i.p_value)
29671739 Drosophilamidbrain revealed by single-cell transcriptomics. 4.311080647146878e-05
22683328 Mutation of Drosophila dopamine receptor DopR leads to male-male courtship behavior. 0.0004231640537470922
23895060 Temporally dimorphic recruitment of dopamine neurons into stress response circuitry in Drosophila. 0.0004231640537470922
24128361 Sexually dimorphic recruitment of dopamine neurons into the stress response circuitry. 0.0004231640537470922
15987944 Rapid, nongenomic responses to ecdysteroids and catecholamines mediated by a novel Drosophila G-protein-coupled receptor. 0.0016901138347928336
17986026 Suppression of excitatory cholinergic synaptic transmission by Drosophila dopamine D1-like receptors. 0.0016901138347928336
24303109 Pharmacological analysis of dopamine modulation in the Drosophila melanogaster larval heart. 0.0016901138347928336
28902472 The mushroom body D1 dopamine receptor controls innate courtship drive. 0.0016901138347928336
27160913 Cell-Type-Specific Transcriptome Analysis in the Drosophila Mushroom Body Reveals Memory-Related Changes in Gene Expression. 0.0027171143620512823
27571359 Visual Attention in Flies-Dopamine in the Mushroom Bodies Mediates the After-Effect of Cueing. 0.004218770306957829
29779874 A GABAergic Feedback Shapes Dopaminergic Input on the Drosophila Mushroom Body to Promote Appetitive Long-Term Memory. 0.004218770306957829
30397017 DrosophilaMushroom Bodies. 0.004460444207028984
27487216 Operation of a homeostatic sleep switch. 0.005793537089956694
21286249 Sleep deprivation during early-adult development results in long-lasting learning deficits in adult Drosophila. 0.008424700608528294
25632118 Behavioral and circuit basis of sucrose rejection by Drosophila females in a simple decision-making task. 0.008424700608528294
27292538 Dopaminergic Circuitry Underlying Mating Drive. 0.008424700608528294
28473588 Branch-specific plasticity of a bifunctional dopamine circuit encodes protein hunger. 0.008424700608528294
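
The results can also be collected into rows, filtered by a stricter threshold and sorted by p-value before printing. The sketch below is illustrative only: significant_rows is a hypothetical helper, and sample_results uses stand-in objects built from three rows of the output above, on the assumption that real results expose the same identifier, description and p_value attributes (the identifiers are written as strings here, which is also an assumption).

```python
from types import SimpleNamespace

# Stand-ins for three rows of the output above (assumption: real result
# objects expose the same identifier, description and p_value attributes).
sample_results = [
    SimpleNamespace(
        identifier="27487216",
        description="Operation of a homeostatic sleep switch.",
        p_value=0.005793537089956694,
    ),
    SimpleNamespace(
        identifier="27292538",
        description="Dopaminergic Circuitry Underlying Mating Drive.",
        p_value=0.008424700608528294,
    ),
    SimpleNamespace(
        identifier="28902472",
        description="The mushroom body D1 dopamine receptor controls innate courtship drive.",
        p_value=0.0016901138347928336,
    ),
]


def significant_rows(results, threshold):
    """Return (identifier, description, p_value) tuples with p_value <= threshold,
    sorted from most to least significant."""
    rows = [
        (r.identifier, r.description, float(r.p_value))
        for r in results
        if float(r.p_value) <= threshold
    ]
    return sorted(rows, key=lambda row: row[2])


rows = significant_rows(sample_results, threshold=0.006)
for identifier, description, p in rows:
    # Two-decimal scientific notation keeps the p-values easy to compare
    print(f"{identifier}\t{p:.2e}\t{description}")
```

With a live connection, passing r in place of sample_results should work the same way, provided the attribute names match those used in the loop above.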

This is how you can perform enrichment calculations on lists of your choice.