This is loosely adapted from one of the InterMineR vignettes https://bioconductor.org/packages/release/bioc/vignettes/InterMineR/inst/doc/Enrichment_Analysis_and_Visualization.html
The InterMine gene set enrichment tool looks for annotations to a set of genes that occur more than would be expected by chance, given a background population of annotations. The hypergeometric distribution is used to calculate a P-value for each annotation and a choice of correction methods for multiple testing (Bonferonni, Holm-Bonferonni and Benjamini Hochberg) are available (Smith et al. 2012; Kalderimis et al. 2014).
InterMine provides Gene Ontology enrichment statistics as well as enrichment statistics for other annotation types including protein domains, pathways, human diseases, mammalian phenotypes and publications. The default background population is all genes in the genome with the specified annotation type. However, the background population can be changed by specifying another list. More information can be found in the online documentation.
Let's start by initialising InterMineR and choosing which InterMine we'll be running queries against:
# Begin by activating the InterMineR library # Syntax: library(LIBRARY_NAME_HERE)
We want to query human data - so let's look and see what InterMines are available, using the
Okay, let's select HumanMine from this list, and store it in a variable called
# Syntax: listMines()["MINE _NAME_HERE"] # Print the value to make sure it's what we expect
And now we need to tell InterMineR to query HumanMine specifically, using the function
initInterMine(). Let's store this in a variable called
# Syntax: initInterMine(mine=MINE_URL_HERE) or, if you're accessing a personal list, you'll also need your API token: # initInterMine(mine=MINE_URL_HERE, "YOUR_TOKEN_HERE")
Performing enrichment analysis with InterMineR is preceded by two steps:
# First, we'll retrieve the widgets available from HumanMine and store it in a variable called `humanWidgets`. # This will tell us what types of enrichment are available. # # Syntax: getWidgets(im) # Print out the widgets so we can see what there is humanWidgets
Take a look through the widgets returned, especially the columns
widgetType. Since we plan to enrich the gene list
PL_GenomicsEngland_GenePanel:Genetic_Epilepsy_Syndromes, we're only interested in widgets that target
Genes. Similarly, we don't need to look at any widgets that aren't
enrichment type widgets.
# We'll put the widget list `humanWidgets` in to a data frame, so it's easy to filter. # # Syntax: as.data.frame(LIST_NAME_HERE) humanWidgetsDataFrame <- #put data frame code here
Now let's ask the data frame to give us a subset of the widgets.
We want the following filters:
# Syntax: subset(YOU_DATA_FRAME, COLUMN1 == 'VALUE1' & COLUMN2 == 'VALUE2' )
In the code example above, we use the double equals sign
==. This is because a single equals sign is used for assignment, e.g.
kittens = 5, which would assign the value 5 to the variable
In R we also use
<- for assignment, so
kittens = 5 and
kittens <- 5 do (basically) the same thing.
The double equals, however, is used for comparison - when we say
widgetType == 'enrichment' we're saying that the code should compare all
widgetTypes, and only give us ones that are equal to
Looking at the filtered list of widgets above, it looks like the widget we want to use to check for enriched GO Terms is named
go_enrichment_for_gene. We'll use the
doEnrichment method to get the enrichment results from HumanMine.
# We're using the widget `go_enrichment_for_gene` against the list name # `PL_GenomicsEngland_GenePanel:Genetic_Epilepsy_Syndromes` # #Syntax: # # GOEnrichmentResult <- doEnrichment( # im = YOUR_CHOSEN_INTERMINE_HERE # list = LIST_NAME_SAVE_ON_YOUR_CHOSEN_INTERMINE, # widget = "go_enrichment_for_gene" # Or whichever widget you're using # organism = "Homo sapiens" # optional if results from more than one organism are retrieved # ) GOEnrichmentResult <- doEnrichment( # Put enrichment details here ) # Print the results GOEnrichmentResult
For the visualisation, we'll need both the enrichment results and the list of genes we originally enriched - but the list is still on the server. Let's run a query to get the primaryIdentifiers for the genes in
Views to Select:
Constraints to set:
# We'll be using the setQuery method here # # Syntax: # setQuery( # select = c("VIEW.NAME1", "VIEW.NAME2", .... ), # where = setConstraints( # paths = c("CONSTRAINT.PATH.A", "CONSTRAINT.PATH.B"), # operators = c("CONSTRAINT.OPERATOR.A", "CONSTRAINT.OPERATOR.B"), # values = list("CONSTRAINT.VALUE.A","CONSTRAINT.VALUE.B") # ) #) queryEpilepsy <- setQuery( # fill out your view and constraints here )
Okay, looking good! Now that we've set that query up, we need to actually run it.
# Run the query using the syntax `runQuery(im, QUERY_HERE)` queryEpilepsyResults <- # Add run query details here # print the results:
# load GeneAnswers package. Remember the syntax is `library(LIBRARYNAME)`
We'll need to convert our InterMine data into a shape that GeneAnswers can read. Luckily, there's a nice method for that in InterMineR:
convertToGeneAnswers. We'll need to pass it the following arguments:
GOEnrichmentResult- the enriched GO term results.
queryEpilepsyResultsas the original list of IDs we enriched
"Gene.primaryIdentifier"as the type of IDs we're providing (as opposed to symbol or secondaryidentifiers).
# convert InterMineR Gene Ontology Enrichment analysis results to GeneAnswers object # Syntax: # convertToGeneAnswers( # enrichmentResult = AN_INTERMINE_ENRICHMENT_RESULT, # geneInput = data.frame(GeneID = ORIGINAL_LIST_OF_IDS_WE_ENRICHED_ON, # stringsAsFactors = FALSE), # geneInputType = "TYPE_OF_IDS_PROVIDED" (e.g. "Gene.primaryIdentifier" or "Gene.symbol"), # categoryType = "GO" # could also be null for a user-provided annotation list #) queryEpilepsyForGeneAnswers <- convertToGeneAnswers( # Put the arguments here )
We can use the
summary(geneAnswersObjectHere) function to see
interesting info about our list + enrichment. Try it now:
# Run the summary on the Gene Answers object we made in the previous code cell. # The variable name we used was `queryEpilepsyForGeneAnswers`
For more info on how the graphs are constructed and what arguments you can pass them, visit the GeneAnswers Bioconductor page - the vignettes and especially the manual have lots of helpful information.
We'll start with a concept-gene network that shows which genes are associated with our given GO terms:
geneAnswersConceptNet(queryEpilepsyForGeneAnswers, # This is the Gene Answers object we created colorValueColumn=NULL, # Default colours centroidSize='correctedPvalue', # the column used to size graph nodes output='fixed', # don't spawn a new pop-up window geneSymbol = TRUE) # Show gene symbols as labels
We can plot the GO terms that our list is enriched for to show their relationships to one another:
geneAnswersConceptRelation(queryEpilepsyForGeneAnswers, # This is the Gene Answers object we created directed=TRUE, # show arrows to indicate the direction of relationship output="fixed", # don't spawn a new pop-up window netMode='connection')
In this example, GeneAnswers uses the annotation package ‘org.Hs.eg.db’, which was used previously in convertToGeneAnswers function, to retrieve gene interactions for the specified genes.
buildNet(getGeneInput(queryEpilepsyForGeneAnswers)[,1], # queryEpilepsyForGeneAnswers idType='GeneInteraction', layers=3, output="fixed", # don't spawn a new pop-up window filterGraphIDs=getGeneInput(queryEpilepsyForGeneAnswers)[,1], filterLayer=2, # How many layers deep to look for interactors. Try changing it to 3! netMode='connection')
Gene Interaction Network for PL_GenomicsEngland_GenePanel:Genetic_Epilepsy_Syndromes.
The large black dots with yellow frame stand for the six given genes. They also connect to other genes by dark-blue-purple edges. Small black dots represent the other genes from getGeneInput. Small white dots are genes that are not in the genes from getGeneInput, but interact with these genes.
geneAnswersHeatmap(queryEpilepsyForGeneAnswers, catTerm=TRUE, geneSymbol=TRUE, cex.axis = 0.75)