This is loosely adapted from one of the InterMineR vignettes https://bioconductor.org/packages/release/bioc/vignettes/InterMineR/inst/doc/Enrichment_Analysis_and_Visualization.html
The InterMine gene set enrichment tool looks for annotations to a set of genes that occur more than would be expected by chance, given a background population of annotations. The hypergeometric distribution is used to calculate a P-value for each annotation and a choice of correction methods for multiple testing (Bonferonni, Holm-Bonferonni and Benjamini Hochberg) are available (Smith et al. 2012; Kalderimis et al. 2014).
InterMine provides Gene Ontology enrichment statistics as well as enrichment statistics for other annotation types including protein domains, pathways, human diseases, mammalian phenotypes and publications. The default background population is all genes in the genome with the specified annotation type. However, the background population can be changed by specifying another list. More information can be found in the online documentation.
Let's start by initialising InterMineR and choosing which InterMine we'll be running queries against:
# Begin by activating the InterMineR library
# Syntax: library(LIBRARY_NAME_HERE)
We want to query human data - so let's look and see what InterMines are available, using the listMines()
function:
Okay, let's select HumanMine from this list, and store it in a variable called humanMine
:
# Syntax: listMines()["MINE _NAME_HERE"]
# Print the value to make sure it's what we expect
And now we need to tell InterMineR to query HumanMine specifically, using the function initInterMine()
. Let's store this in a variable called im
.
# Syntax: initInterMine(mine=MINE_URL_HERE) or, if you're accessing a personal list, you'll also need your API token:
# initInterMine(mine=MINE_URL_HERE, "YOUR_TOKEN_HERE")
Performing enrichment analysis with InterMineR is preceded by two steps:
# First, we'll retrieve the widgets available from HumanMine and store it in a variable called `humanWidgets`.
# This will tell us what types of enrichment are available.
#
# Syntax: getWidgets(im)
# Print out the widgets so we can see what there is
humanWidgets
Take a look through the widgets returned, especially the columns targets
and widgetType
. Since we plan to enrich the gene list PL_GenomicsEngland_GenePanel:Genetic_Epilepsy_Syndromes
, we're only interested in widgets that target Gene
s. Similarly, we don't need to look at any widgets that aren't enrichment
type widgets.
# We'll put the widget list `humanWidgets` in to a data frame, so it's easy to filter.
#
# Syntax: as.data.frame(LIST_NAME_HERE)
humanWidgetsDataFrame <- #put data frame code here
Now let's ask the data frame to give us a subset of the widgets.
We want the following filters:
widgetType
should be 'enrichment'
targets
should be Gene
# Syntax: subset(YOU_DATA_FRAME, COLUMN1 == 'VALUE1' & COLUMN2 == 'VALUE2' )
=
vs ==
¶=
or <-
¶In the code example above, we use the double equals sign ==
. This is because a single equals sign is used for assignment, e.g. kittens = 5
, which would assign the value 5 to the variable kittens
.
In R we also use <-
for assignment, so kittens = 5
and kittens <- 5
do (basically) the same thing.
==
¶The double equals, however, is used for comparison - when we say widgetType == 'enrichment'
we're saying that the code should compare all widgetTypes
, and only give us ones that are equal to enrichment
.
Looking at the filtered list of widgets above, it looks like the widget we want to use to check for enriched GO Terms is named go_enrichment_for_gene
. We'll use the doEnrichment
method to get the enrichment results from HumanMine.
# We're using the widget `go_enrichment_for_gene` against the list name
# `PL_GenomicsEngland_GenePanel:Genetic_Epilepsy_Syndromes`
#
#Syntax:
#
# GOEnrichmentResult <- doEnrichment(
# im = YOUR_CHOSEN_INTERMINE_HERE
# list = LIST_NAME_SAVE_ON_YOUR_CHOSEN_INTERMINE,
# widget = "go_enrichment_for_gene" # Or whichever widget you're using
# organism = "Homo sapiens" # optional if results from more than one organism are retrieved
# )
GOEnrichmentResult <- doEnrichment(
# Put enrichment details here
)
# Print the results
GOEnrichmentResult
For the visualisation, we'll need both the enrichment results and the list of genes we originally enriched - but the list is still on the server. Let's run a query to get the primaryIdentifiers for the genes in PL_GenomicsEngland_GenePanel:Genetic_Epilepsy_Syndromes
.
Views to Select:
"Gene.primaryIdentifier"
Constraints to set:
"Gene"
IN
,"PL_GenomicsEngland_GenePanel:Genetic_Epilepsy_Syndromes"
# We'll be using the setQuery method here
#
# Syntax:
# setQuery(
# select = c("VIEW.NAME1", "VIEW.NAME2", .... ),
# where = setConstraints(
# paths = c("CONSTRAINT.PATH.A", "CONSTRAINT.PATH.B"),
# operators = c("CONSTRAINT.OPERATOR.A", "CONSTRAINT.OPERATOR.B"),
# values = list("CONSTRAINT.VALUE.A","CONSTRAINT.VALUE.B")
# )
#)
queryEpilepsy <- setQuery(
# fill out your view and constraints here
)
Okay, looking good! Now that we've set that query up, we need to actually run it.
# Run the query using the syntax `runQuery(im, QUERY_HERE)`
queryEpilepsyResults <- # Add run query details here
# print the results:
We'll be using GeneAnswers to generate some visualisations of the data we've just fetched.
# load GeneAnswers package. Remember the syntax is `library(LIBRARYNAME)`
We'll need to convert our InterMine data into a shape that GeneAnswers can read. Luckily, there's a nice method for that in InterMineR: convertToGeneAnswers
. We'll need to pass it the following arguments:
GOEnrichmentResult
- the enriched GO term results.queryEpilepsyResults
as the original list of IDs we enriched"Gene.primaryIdentifier"
as the type of IDs we're providing (as opposed to symbol or secondaryidentifiers).# convert InterMineR Gene Ontology Enrichment analysis results to GeneAnswers object
# Syntax:
# convertToGeneAnswers(
# enrichmentResult = AN_INTERMINE_ENRICHMENT_RESULT,
# geneInput = data.frame(GeneID = ORIGINAL_LIST_OF_IDS_WE_ENRICHED_ON,
# stringsAsFactors = FALSE),
# geneInputType = "TYPE_OF_IDS_PROVIDED" (e.g. "Gene.primaryIdentifier" or "Gene.symbol"),
# categoryType = "GO" # could also be null for a user-provided annotation list
#)
queryEpilepsyForGeneAnswers <- convertToGeneAnswers(
# Put the arguments here
)
We can use the summary(geneAnswersObjectHere)
function to see
interesting info about our list + enrichment. Try it now:
# Run the summary on the Gene Answers object we made in the previous code cell.
# The variable name we used was `queryEpilepsyForGeneAnswers`
For more info on how the graphs are constructed and what arguments you can pass them, visit the GeneAnswers Bioconductor page - the vignettes and especially the manual have lots of helpful information.
We'll start with a concept-gene network that shows which genes are associated with our given GO terms:
geneAnswersConceptNet(queryEpilepsyForGeneAnswers, # This is the Gene Answers object we created
colorValueColumn=NULL, # Default colours
centroidSize='correctedPvalue', # the column used to size graph nodes
output='fixed', # don't spawn a new pop-up window
geneSymbol = TRUE) # Show gene symbols as labels
We can plot the GO terms that our list is enriched for to show their relationships to one another:
geneAnswersConceptRelation(queryEpilepsyForGeneAnswers, # This is the Gene Answers object we created
directed=TRUE, # show arrows to indicate the direction of relationship
output="fixed", # don't spawn a new pop-up window
netMode='connection')
In this example, GeneAnswers uses the annotation package ‘org.Hs.eg.db’, which was used previously in convertToGeneAnswers function, to retrieve gene interactions for the specified genes.
buildNet(getGeneInput(queryEpilepsyForGeneAnswers)[,1], # queryEpilepsyForGeneAnswers
idType='GeneInteraction',
layers=3,
output="fixed", # don't spawn a new pop-up window
filterGraphIDs=getGeneInput(queryEpilepsyForGeneAnswers)[,1],
filterLayer=2, # How many layers deep to look for interactors. Try changing it to 3!
netMode='connection')
Gene Interaction Network for PL_GenomicsEngland_GenePanel:Genetic_Epilepsy_Syndromes.
The large black dots with yellow frame stand for the six given genes. They also connect to other genes by dark-blue-purple edges. Small black dots represent the other genes from getGeneInput. Small white dots are genes that are not in the genes from getGeneInput, but interact with these genes.
Let's plot the genes which were enriched for GO terms in a heat map...
geneAnswersHeatmap(queryEpilepsyForGeneAnswers,
catTerm=TRUE,
geneSymbol=TRUE,
cex.axis = 0.75)