Creating an InterMine workflow using the API

We start by importing the Service class from InterMine's webservice module. To access your HumanMine account from code you need an API token, which you can get by logging in to HumanMine and opening the account details tab within MyMine. Copy and paste your token into the code below.

In [ ]:
from intermine.webservice import Service
service = Service("http://www.humanmine.org/humanmine/service", token = "YOUR TOKEN HERE")

First Query: Pax6 Targets

Our first query looked at whether the Pax6 targets (from the list PL_Pax6_Targets) are expressed in the pancreas. In the web interface we used a template to run this query, but here we will create a query object against the Gene class.

In [ ]:
# Create a new query against the root class "Gene"
# Syntax: query = service.new_query("root_class_here")

query = service.new_query("Gene")

First we will define the output columns that we want in our result, i.e. the view. We want to add fields (attributes) from both the Gene class and the proteinAtlasExpression class that it references.

Note that we started our query from the Gene class in the previous step; symbol and primaryIdentifier are attributes of this class. The proteinAtlasExpression class is referenced from the Gene class, so to return the protein atlas information we give the path from the Gene class, e.g. proteinAtlasExpression.level.

In [ ]:
# Now select the following views: 
#
#    "primaryIdentifier", "symbol", "proteinAtlasExpression.cellType",
#    "proteinAtlasExpression.level", "proteinAtlasExpression.reliability",
#    "proteinAtlasExpression.tissue.name"
#
# The syntax to do so is query.add_view("comma", "separated", "set", "of", "views")
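
As a hedged sketch, the view selection described in the cell above could look like the following. The helper name add_expression_views is hypothetical, and it assumes that query is the Gene query created in the previous step.

```python
# Output columns for the Pax6 expression query, as listed in the cell above.
EXPRESSION_VIEWS = [
    "primaryIdentifier",
    "symbol",
    "proteinAtlasExpression.cellType",
    "proteinAtlasExpression.level",
    "proteinAtlasExpression.reliability",
    "proteinAtlasExpression.tissue.name",
]


def add_expression_views(query):
    """Add the six output columns to a query rooted at Gene (hypothetical helper)."""
    # add_view accepts any number of paths, so the list is unpacked with *.
    query.add_view(*EXPRESSION_VIEWS)
    return query
```

In the notebook this would be called as add_expression_views(query). Calling query.add_view(...) directly with the six paths is equivalent.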

Next, add the constraints to your query. We want to constrain the Gene class to the genes in the PL_Pax6_Targets list.

In [ ]:
# Syntax: query.add_constraint("view_name", "operator", "value", code = "SINGLE_CAPITAL_LETTER_HERE")
#
# Let's set "Gene" to be "IN" the list named "PL_Pax6_Targets". 
# 
# Note the final argument is the "code" - this is simply a unique identifier for
# each constraint that allows you to revisit them later if needed. Usually it's easiest
# just to assign letters of the alphabet in order - so in this case, set it to A, next time to B, etc.

# e.g. code = "A".

# Here's the first one written for you as an example: 

query.add_constraint("Gene", "IN", "PL_Pax6_Targets", code = "A")

We also need to constrain the expression level to be "High" or "Medium" and the tissue to be "Pancreas".

In [ ]:
# Syntax reminder: query.add_constraint("view_name", "operator", "value", code = "constraint_code")
#
# Let's add two constraints: 
# - Set "proteinAtlasExpression.tissue.name" to be equal to "Pancreas", with code "B"
# - Set "proteinAtlasExpression.level" to be "ONE OF" the options "Medium" or "High", with code "C" 
#   (Note: when you use the ONE OF operator, each option needs to be part of an array,
#          i.e. in this case we would write ["Medium", "High"] as the value.)
#
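
One possible way to write the two constraints described above is sketched below. The helper name add_pancreas_constraints is hypothetical; the paths, operators, values and codes are the ones given in the comments, with "=" used as the operator for "equal to".

```python
def add_pancreas_constraints(query):
    """Constrain tissue and expression level on the Gene query (hypothetical helper)."""
    # Constraint B: keep only rows whose tissue is the pancreas.
    query.add_constraint(
        "proteinAtlasExpression.tissue.name", "=", "Pancreas", code="B"
    )
    # Constraint C: ONE OF takes a list of allowed values rather than a single value.
    query.add_constraint(
        "proteinAtlasExpression.level", "ONE OF", ["Medium", "High"], code="C"
    )
    return query
```

In the notebook this would be called as add_pancreas_constraints(query), after constraint A has been added.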

Now, let's check what the query returns by looping through the rows and printing the results:

In [ ]:
# We've filled this one out for you :) 

for row in query.rows():
    print(row["primaryIdentifier"], row["symbol"], row["proteinAtlasExpression.cellType"],
          row["proteinAtlasExpression.level"], row["proteinAtlasExpression.reliability"],
          row["proteinAtlasExpression.tissue.name"])

We want to save this set of genes (i.e. genes from the Pax6 target set that are expressed in the pancreas) for further analysis. To do this we define a Python list and loop through our results again. This time, instead of printing the results, we append just the primary identifiers to our list.

In [ ]:
# let's make an empty python list called UpinPancreas
UpinPancreas = list()

# now let's use a for loop on query results and select just 
# the gene primary identifiers 
# then append them to our UpinPancreas list. 
#
# To append an item to a python list, it's ListName.append(someValueHere)
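
The loop described above can be sketched with plain Python, shown here on placeholder rows so that it runs without a HumanMine connection. The helper name collect_identifiers and the identifiers in example_rows are made up for illustration. Because the query returns one row per cell type, the same gene can appear more than once, so this variant also skips repeats, which the exercise itself does not require.

```python
def collect_identifiers(rows, field="primaryIdentifier"):
    """Return the value of `field` from each row, in order, without repeats."""
    seen = set()
    identifiers = []
    for row in rows:
        value = row[field]
        if value not in seen:
            seen.add(value)
            identifiers.append(value)
    return identifiers


# Placeholder rows standing in for query.rows(); these identifiers are invented.
example_rows = [
    {"primaryIdentifier": "ID_1", "symbol": "GENE_A"},
    {"primaryIdentifier": "ID_1", "symbol": "GENE_A"},
    {"primaryIdentifier": "ID_2", "symbol": "GENE_B"},
]
print(collect_identifiers(example_rows))  # ['ID_1', 'ID_2']
```

In the notebook, the equivalent call would be UpinPancreas = collect_identifiers(query.rows()).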

Then check that the list we have created looks correct:

In [ ]:
print(UpinPancreas)

We now need to save the list to our InterMine account so we can use it again in a later query. The ListManager Python class provides methods to manage list contents and operations.

In [ ]:
# first let's make a new list manager assigned to the variable lm
# the syntax to make a list manager is service.list_manager()
 

    
# next, we want to put the contents of UpinPancreas into an InterMine list.
# The syntax is lm.create_list(content=a_list_of_ids, list_type="identifier_class", name="some name")
# In this case, you'll want to set the following arguments:
# - content should be our python list UpinPancreas
# - list_type is "Gene"
# - name - could be anything you want, but let's be consistent and call it "UpinPancreas"
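
A minimal sketch of these two steps, using only the calls named in the comments above (service.list_manager() and lm.create_list(...)). The helper name save_gene_list is hypothetical, and it assumes service is the authenticated Service object created at the start of the notebook.

```python
def save_gene_list(service, identifiers, name):
    """Upload identifiers as a Gene list called `name` (hypothetical helper).

    Returns the list manager as well, so later cells can reuse it.
    """
    # Create a list manager bound to this service.
    lm = service.list_manager()
    # Upload the identifiers; list_type names the class the identifiers belong to.
    saved = lm.create_list(content=identifiers, list_type="Gene", name=name)
    return lm, saved
```

In the notebook this could be used as lm, pancreas_list = save_gene_list(service, UpinPancreas, "UpinPancreas").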

Log in to HumanMine and check your list has been created.

Second Query: Diabetes Genes

Our second query (which we created using the query builder) found genes that are associated with the disease diabetes. Re-create this query using code as follows:

In [ ]:
query2 = service.new_query("Gene")

# Let's add views for "primaryIdentifier" and "symbol" using query2.add_view()


# And let's give it some constraints using
# query2.add_constraint("view_name", "operator", "value", code = "constraint_code")
# 
# Constraint A: organism.name should equal Homo sapiens
# Constraint B: diseases.name should contain diabetes (operator is CONTAINS) 


# We've written the code to print it out for you. 
for row in query2.rows():
    print(row["primaryIdentifier"], row["symbol"])
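
The views and constraints described in the cell above could be filled in along the following lines. This is a sketch rather than the only correct answer: the helper name build_diabetes_query is hypothetical, and "=" is used as the operator for "should equal".

```python
def build_diabetes_query(service):
    """Build the Gene query for human genes associated with diabetes (hypothetical helper)."""
    query2 = service.new_query("Gene")
    # Output columns.
    query2.add_view("primaryIdentifier", "symbol")
    # Constraint A: restrict to human genes.
    query2.add_constraint("organism.name", "=", "Homo sapiens", code="A")
    # Constraint B: disease name must contain the text "diabetes".
    query2.add_constraint("diseases.name", "CONTAINS", "diabetes", code="B")
    return query2
```

In the notebook this would be used as query2 = build_diabetes_query(service), followed by the printing loop.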

Then save the set of genes returned as a list:

In [ ]:
# Make a python list of gene identifiers
diabetesGenes = list()
for row in query2.rows():
    diabetesGenes.append(row["primaryIdentifier"])

In [ ]:
# One last time, we'll create a list and save it to our HumanMine account
#
# syntax: lm.create_list(content=a_list_of_ids, list_type="identifier_class", name="some name")
# 
# - content should be diabetesGenes
# - list_type is "Gene"
# - name is "diabetesGenes"
# Try it now: 

Next, we used a list intersection to find the genes that are upregulated in the pancreas and also associated with diabetes. We need to intersect the first (UpinPancreas) and second (diabetesGenes) lists that we created, which we can do using the intersect method from the ListManager class.

In [ ]:
# The syntax to create an InterMine list intersection is
# lm.intersect(["comma_separated", "list", "of_intermine_lists"], "name for new list")
#
# We want to intersect the last two lists we created - 
# "UpinPancreas" and "diabetesGenes"
# try it now: 
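
A hedged sketch of the intersection step follows. The first helper wraps the lm.intersect call described above, and the new list name "intersectedList" is an assumption chosen to match the name used later in the notebook. The second helper is an optional local cross-check on the Python lists using set intersection. Both helper names are hypothetical and the sample identifiers are made up.

```python
def intersect_gene_lists(lm, list_names=("UpinPancreas", "diabetesGenes"),
                         new_name="intersectedList"):
    """Ask InterMine to intersect the saved lists and store the result as `new_name`."""
    return lm.intersect(list(list_names), new_name)


def local_overlap(first_ids, second_ids):
    """Identifiers present in both Python lists, sorted for stable output."""
    return sorted(set(first_ids) & set(second_ids))


# Invented identifiers, only to show what an intersection keeps.
print(local_overlap(["ID_1", "ID_2", "ID_3"], ["ID_3", "ID_2", "ID_9"]))  # ['ID_2', 'ID_3']
```

In the notebook, intersect_gene_lists(lm) would create the server-side list, and local_overlap(UpinPancreas, diabetesGenes) gives a quick preview of which identifiers to expect in it.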

The result of the list intersection was stored in our HumanMine account, so we need to use the get_list method to retrieve it from HumanMine.

In [ ]:
# Syntax: lm.get_list("name of the intersected list you just created")
# Store it in a variable called intersectedList, so we can print it in the next step

In [ ]:
print(intersectedList)

Final Query: GWAS

Finally, we fed the intersected list from above back into another query to see if there was any association of these genes with diabetes phenotypes according to GWAS studies. Note that we now start our query from the GWAS class:

In [ ]:
# We've supplied some code for you to get you started: 
query = service.new_query("GWAS")

In [ ]:
# Adding the columns we'd like to view: 
query.add_view(
    "results.associatedGenes.primaryIdentifier",
    "results.associatedGenes.symbol", "results.associatedGenes.name",
    "results.SNP.primaryIdentifier", "results.pValue", "results.phenotype",
    "firstAuthor", "name", "publication.pubMedId",
    "results.associatedGenes.organism.shortName"
)

We only want results with a significant pValue and a diabetes phenotype, and we want the associated genes to be in the intersectedList we just created. So, a few more constraints:

In [ ]:
# Syntax reminder: query.add_constraint("view_name", "operator", "value", code="constraint_code")
# 
# Let's add three constraints: 
# - Set "results.pValue" to be less than or equal to "1e-04" (the operator you'll want is "<=")
# - Set "results.phenotype" to contain "diabetes" (the contain operator you're looking for here is "CONTAINS")
# - Set "results.associatedGenes" to be IN the list "intersectedList"
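
The three constraints listed above might be written as in the sketch below. The helper name add_gwas_constraints is hypothetical, the codes A, B and C are an arbitrary choice, and the default list name assumes the intersection was saved to HumanMine as "intersectedList".

```python
def add_gwas_constraints(query, list_name="intersectedList"):
    """Add the p-value, phenotype and gene-list constraints to the GWAS query."""
    # Keep only associations at or below the significance threshold.
    query.add_constraint("results.pValue", "<=", "1e-04", code="A")
    # Phenotype description must mention diabetes.
    query.add_constraint("results.phenotype", "CONTAINS", "diabetes", code="B")
    # Associated genes must belong to the saved InterMine list.
    query.add_constraint("results.associatedGenes", "IN", list_name, code="C")
    return query
```

In the notebook this would be called as add_gwas_constraints(query) on the GWAS query created above.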

Right, let's take a look at the results of our query:

In [ ]:
for row in query.rows():
    print(row["results.associatedGenes.primaryIdentifier"], row["results.associatedGenes.symbol"],
          row["results.associatedGenes.name"], row["results.SNP.primaryIdentifier"],
          row["results.pValue"], row["results.phenotype"], row["firstAuthor"], row["name"],
          row["publication.pubMedId"], row["results.associatedGenes.organism.shortName"])

In [ ]:
for row in query.rows():
    print(row["results.associatedGenes.symbol"])

That's it - all done!