Enrichment analysis and visualisations

This is loosely adapted from one of the InterMineR vignettes https://bioconductor.org/packages/release/bioc/vignettes/InterMineR/inst/doc/Enrichment_Analysis_and_Visualization.html

Background

The InterMine gene set enrichment tool looks for annotations to a set of genes that occur more than would be expected by chance, given a background population of annotations. The hypergeometric distribution is used to calculate a P-value for each annotation and a choice of correction methods for multiple testing (Bonferonni, Holm-Bonferonni and Benjamini Hochberg) are available (Smith et al. 2012; Kalderimis et al. 2014).

InterMine provides Gene Ontology enrichment statistics as well as enrichment statistics for other annotation types including protein domains, pathways, human diseases, mammalian phenotypes and publications. The default background population is all genes in the genome with the specified annotation type. However, the background population can be changed by specifying another list. More information can be found in the online documentation.

Goal for this exercise

  1. Use InterMine's enrichment widgets on the HumanMine public list PL_GenomicsEngland_GenePanel%:Genetic_Epilepsy_Syndromes to see if the list is enriched for any GO Terms.
  2. Visualise the results using the GeneAnswers class.

Setup

Let's start by initialising InterMineR and choosing which InterMine we'll be running queries against:

In [ ]:
# Begin by activating the InterMineR library
# Syntax: library(LIBRARY_NAME_HERE)

We want to query human data - so let's look and see what InterMines are available, using the listMines() function:

In [ ]:

Okay, let's select HumanMine from this list, and store it in a variable called humanMine:

In [ ]:
# Syntax: listMines()["MINE _NAME_HERE"] 

# Print the value to make sure it's what we expect

And now we need to tell InterMineR to query HumanMine specifically, using the function initInterMine(). Let's store this in a variable called im.

In [ ]:
# Syntax: initInterMine(mine=MINE_URL_HERE) or, if you're accessing a personal list, you'll also need your API token: 
#         initInterMine(mine=MINE_URL_HERE, "YOUR_TOKEN_HERE")

Getting started with Enrichment

Performing enrichment analysis with InterMineR is preceded by two steps:

  1. Get the enrichment widget name which indicates the annotation types that you want to investigate for enrichment (e.g. Gene Ontology Terms, protein domains, KEGG and Reactome Pathways, human diseases, mammalian phenotypes and publications).
  2. Retrieve the list of bioentities of interest (Genes, Proteins, SNPs, etc.) for which the analysis will be performed.
In [ ]:
# First, we'll retrieve the widgets available from HumanMine and store it in a variable called `humanWidgets`. 
# This will tell us what types of enrichment are available.
# 
# Syntax: getWidgets(im)


# Print out the widgets so we can see what there is

humanWidgets

Take a look through the widgets returned, especially the columns targets and widgetType. Since we plan to enrich the gene list PL_GenomicsEngland_GenePanel:Genetic_Epilepsy_Syndromes, we're only interested in widgets that target Genes. Similarly, we don't need to look at any widgets that aren't enrichment type widgets.

In [ ]:
# We'll put the widget list `humanWidgets` in to a data frame, so it's easy to filter.
#
# Syntax: as.data.frame(LIST_NAME_HERE)

humanWidgetsDataFrame <- #put data frame code here

Now let's ask the data frame to give us a subset of the widgets.
We want the following filters:

  • widgetType should be 'enrichment'
  • targets should be Gene
In [ ]:
# Syntax: subset(YOU_DATA_FRAME, COLUMN1 == 'VALUE1' & COLUMN2 == 'VALUE2' )

Syntax Note: = vs ==

Assignment operator: = or <-

In the code example above, we use the double equals sign ==. This is because a single equals sign is used for assignment, e.g. kittens = 5, which would assign the value 5 to the variable kittens.

In R we also use <- for assignment, so kittens = 5 and kittens <- 5 do (basically) the same thing.

Comparison operator: ==

The double equals, however, is used for comparison - when we say widgetType == 'enrichment' we're saying that the code should compare all widgetTypes, and only give us ones that are equal to enrichment.

Perform enrichment analysis

Looking at the filtered list of widgets above, it looks like the widget we want to use to check for enriched GO Terms is named go_enrichment_for_gene. We'll use the doEnrichment method to get the enrichment results from HumanMine.

In [ ]:
# We're using the widget `go_enrichment_for_gene` against the list name 
# `PL_GenomicsEngland_GenePanel:Genetic_Epilepsy_Syndromes`
#
#Syntax: 
#
# GOEnrichmentResult <- doEnrichment(
#   im = YOUR_CHOSEN_INTERMINE_HERE
#   list = LIST_NAME_SAVE_ON_YOUR_CHOSEN_INTERMINE,
#   widget = "go_enrichment_for_gene" #  Or whichever widget you're using
#   organism = "Homo sapiens" # optional if results from more than one organism are retrieved
# )

GOEnrichmentResult <- doEnrichment(
    # Put enrichment details here
)

# Print the results 
GOEnrichmentResult

Visualise it with Gene Answers

For the visualisation, we'll need both the enrichment results and the list of genes we originally enriched - but the list is still on the server. Let's run a query to get the primaryIdentifiers for the genes in PL_GenomicsEngland_GenePanel:Genetic_Epilepsy_Syndromes.

Views to Select:

  • "Gene.primaryIdentifier"

Constraints to set:

  • Path: "Gene"
  • Operator: IN,
  • Value: "PL_GenomicsEngland_GenePanel:Genetic_Epilepsy_Syndromes"
In [ ]:
# We'll be using the setQuery method here
# 
# Syntax: 
# setQuery( 
#  select = c("VIEW.NAME1", "VIEW.NAME2", .... ),
#  where = setConstraints(
#    paths = c("CONSTRAINT.PATH.A", "CONSTRAINT.PATH.B"),
#    operators = c("CONSTRAINT.OPERATOR.A", "CONSTRAINT.OPERATOR.B"),
#    values = list("CONSTRAINT.VALUE.A","CONSTRAINT.VALUE.B")
#  )
#)
queryEpilepsy <- setQuery(
    # fill out your view and constraints here
)

Okay, looking good! Now that we've set that query up, we need to actually run it.

In [ ]:
# Run the query using the syntax `runQuery(im, QUERY_HERE)` 
queryEpilepsyResults <- # Add run query details here

# print the results:

Format data for GeneAnswers

We'll be using GeneAnswers to generate some visualisations of the data we've just fetched.

In [ ]:
# load GeneAnswers package. Remember the syntax is `library(LIBRARYNAME)`

We'll need to convert our InterMine data into a shape that GeneAnswers can read. Luckily, there's a nice method for that in InterMineR: convertToGeneAnswers. We'll need to pass it the following arguments:

  • GOEnrichmentResult - the enriched GO term results.
  • queryEpilepsyResults as the original list of IDs we enriched
  • "Gene.primaryIdentifier" as the type of IDs we're providing (as opposed to symbol or secondaryidentifiers).
In [ ]:
# convert InterMineR Gene Ontology Enrichment analysis results to GeneAnswers object
# Syntax: 
# convertToGeneAnswers(
#  enrichmentResult = AN_INTERMINE_ENRICHMENT_RESULT,
#  geneInput = data.frame(GeneID = ORIGINAL_LIST_OF_IDS_WE_ENRICHED_ON, 
#                         stringsAsFactors = FALSE),
#  geneInputType = "TYPE_OF_IDS_PROVIDED" (e.g. "Gene.primaryIdentifier" or "Gene.symbol"),
#  categoryType = "GO" # could also be null for a user-provided annotation list
#)

queryEpilepsyForGeneAnswers <- convertToGeneAnswers(
    # Put the arguments here
    )

We can use the summary(geneAnswersObjectHere) function to see interesting info about our list + enrichment. Try it now:

In [ ]:
# Run the summary on the Gene Answers object we made in the previous code cell. 
# The variable name we used was `queryEpilepsyForGeneAnswers`

It's time for some graphs

For more info on how the graphs are constructed and what arguments you can pass them, visit the GeneAnswers Bioconductor page - the vignettes and especially the manual have lots of helpful information.

We'll start with a concept-gene network that shows which genes are associated with our given GO terms:

In [ ]:
geneAnswersConceptNet(queryEpilepsyForGeneAnswers, # This is the Gene Answers object we created
                      colorValueColumn=NULL,       # Default colours
                      centroidSize='correctedPvalue', # the column used to size graph nodes
                      output='fixed',              # don't spawn a new pop-up window 
                      geneSymbol = TRUE)           # Show gene symbols as labels

GO structure network

We can plot the GO terms that our list is enriched for to show their relationships to one another:

In [ ]:
geneAnswersConceptRelation(queryEpilepsyForGeneAnswers, # This is the Gene Answers object we created
                           directed=TRUE,               # show arrows to indicate the direction of relationship
                           output="fixed",              # don't spawn a new pop-up window 
                           netMode='connection')

Gene Interaction Network

In this example, GeneAnswers uses the annotation package ‘org.Hs.eg.db’, which was used previously in convertToGeneAnswers function, to retrieve gene interactions for the specified genes.

In [ ]:
buildNet(getGeneInput(queryEpilepsyForGeneAnswers)[,1], # queryEpilepsyForGeneAnswers
         idType='GeneInteraction', 
         layers=3,
         output="fixed",            # don't spawn a new pop-up window 
         filterGraphIDs=getGeneInput(queryEpilepsyForGeneAnswers)[,1],
         filterLayer=2,             # How many layers deep to look for interactors. Try changing it to 3! 
         netMode='connection')

Gene Interaction Network for PL_GenomicsEngland_GenePanel:Genetic_Epilepsy_Syndromes.

The large black dots with yellow frame stand for the six given genes. They also connect to other genes by dark-blue-purple edges. Small black dots represent the other genes from getGeneInput. Small white dots are genes that are not in the genes from getGeneInput, but interact with these genes.

Genes vs GO Term heat map

Let's plot the genes which were enriched for GO terms in a heat map...

In [ ]:
geneAnswersHeatmap(queryEpilepsyForGeneAnswers, 
                   catTerm=TRUE, 
                   geneSymbol=TRUE,
                   cex.axis = 0.75)