Intermine-Python: Tutorial 7: Templates

Templates are exactly like queries and can do everything that a query can do.

While writing your own queries, you have probably noticed that a lot of effort gets duplicated. Templates are predefined queries that can be run repeatedly, with certain values easily changed between runs.

Everything that you can do with a query (adding constraints, adding views, processing the results with the results iterator) can be done with templates as well.

We will begin with a simple example. Let's say you want to extract the publication information for the genes of an organism. A template already exists for this task. We begin by importing the Service class and then create a template object. The parameter that we pass to the get_template method is the name of the template.

In [1]:
from intermine.webservice import Service
In [2]:
service = Service("https://www.flymine.org/flymine/service")
In [3]:
template=service.get_template('All_Genes_In_Organism_To_Publications')
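A template name has to be known before it can be passed to get_template. The following is a minimal sketch for searching the available names, assuming the service object exposes a `templates` mapping keyed by template name. The helper name `find_templates` is hypothetical and not part of the library.

```python
def find_templates(service, keyword):
    """Return the sorted names of templates whose name contains `keyword`.

    Assumption: `service.templates` is a mapping keyed by template name.
    The match is case-insensitive.
    """
    needle = keyword.lower()
    return sorted(name for name in service.templates if needle in name.lower())


# Usage against a live service (needs network access, so it is not run here):
#   from intermine.webservice import Service
#   service = Service("https://www.flymine.org/flymine/service")
#   for name in find_templates(service, "publication"):
#       print(name)
```

Any name returned this way can then be passed to service.get_template.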

To check the columns that the results will have, use template.views. To add a column, use the add_view or add_views method.

In [4]:
template.views
Out[4]:
['Gene.secondaryIdentifier',
 'Gene.publications.pubMedId',
 'Gene.publications.firstAuthor',
 'Gene.publications.journal',
 'Gene.publications.year',
 'Gene.organism.name']
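Since template.views is a plain list of path strings, it can be checked before a column is added. A small sketch of that idea follows; the helper name `add_missing_views` is hypothetical, and `Gene.symbol` in the usage comment is only an example path.

```python
def add_missing_views(template, *paths):
    """Add each path as an output column unless the template already has it.

    Returns the list of paths that were actually added, in order.
    """
    added = []
    for path in paths:
        # Skip columns that are already present or were just added.
        if path not in template.views and path not in added:
            template.add_view(path)
            added.append(path)
    return added


# Usage with the template loaded above (not run here):
#   add_missing_views(template, "Gene.symbol", "Gene.secondaryIdentifier")
#   template.views
```

This avoids requesting the same column twice when a notebook cell is re-run.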
In [5]:
template.constraint_dict
Out[5]:
{'A': <TemplateBinaryConstraint: Gene.organism.name = Drosophila melanogaster (editable, locked)>}

If you want to look at the current constraints, use template.constraint_dict. There is only one constraint, which is editable, i.e. you can change its value or operator if you want. However, even for editable constraints you are not allowed to change the path of the constraint.

To view the results we can use the results iterator as we learned previously.

In [6]:
for row in template.results(row="rr",size=10):
    print(row)
Gene: secondaryIdentifier='CG10000' publications.pubMedId='10731132' publications.firstAuthor='Adams M D' publications.journal='Science' publications.year=2000 organism.name='Drosophila melanogaster'
Gene: secondaryIdentifier='CG10000' publications.pubMedId='11925450' publications.firstAuthor='Schwientek Tilo' publications.journal='J. Biol. Chem.' publications.year=2002 organism.name='Drosophila melanogaster'
Gene: secondaryIdentifier='CG10000' publications.pubMedId='12429111' publications.firstAuthor='Luque Teresa' publications.journal='Insect Biochem. Mol. Biol.' publications.year=2002 organism.name='Drosophila melanogaster'
Gene: secondaryIdentifier='CG10000' publications.pubMedId='12537569' publications.firstAuthor='Stapleton Mark' publications.journal='Genome Biol.' publications.year=2002 organism.name='Drosophila melanogaster'
Gene: secondaryIdentifier='CG10000' publications.pubMedId='12537572' publications.firstAuthor='Misra Sima' publications.journal='Genome Biol.' publications.year=2002 organism.name='Drosophila melanogaster'
Gene: secondaryIdentifier='CG10000' publications.pubMedId='12829714' publications.firstAuthor='Ten Hagen Kelly G' publications.journal='J. Biol. Chem.' publications.year=2003 organism.name='Drosophila melanogaster'
Gene: secondaryIdentifier='CG10000' publications.pubMedId='16251381' publications.firstAuthor='Tian E' publications.journal='Glycobiology' publications.year=2006 organism.name='Drosophila melanogaster'
Gene: secondaryIdentifier='CG10000' publications.pubMedId='20220848' publications.firstAuthor='Schnorrer Frank' publications.journal='Nature' publications.year=2010 organism.name='Drosophila melanogaster'
Gene: secondaryIdentifier='CG10000' publications.pubMedId='20371600' publications.firstAuthor='Zhang Liping' publications.journal='J. Biol. Chem.' publications.year=2010 organism.name='Drosophila melanogaster'
Gene: secondaryIdentifier='CG10000' publications.pubMedId='20807760' publications.firstAuthor='Zhang Liping' publications.journal='J. Biol. Chem.' publications.year=2010 organism.name='Drosophila melanogaster'

Let's say that you want to extract information for Drosophila erecta rather than Drosophila melanogaster. You can override the constraint when calling the results method. In the code shown below, A is the code of the constraint. This code can be viewed using template.constraint_dict as shown above.

In [7]:
for row in template.results(row="rr",A={"op":"=","value":"Drosophila erecta"},size=10):
    print(row)
Gene: secondaryIdentifier=None publications.pubMedId='10486967' publications.firstAuthor='Wang S' publications.journal='Mol. Biol. Evol.' publications.year=1999 organism.name='Drosophila erecta'
Gene: secondaryIdentifier=None publications.pubMedId='10486967' publications.firstAuthor='Wang S' publications.journal='Mol. Biol. Evol.' publications.year=1999 organism.name='Drosophila erecta'
Gene: secondaryIdentifier=None publications.pubMedId='10486967' publications.firstAuthor='Wang S' publications.journal='Mol. Biol. Evol.' publications.year=1999 organism.name='Drosophila erecta'
Gene: secondaryIdentifier=None publications.pubMedId='10486967' publications.firstAuthor='Wang S' publications.journal='Mol. Biol. Evol.' publications.year=1999 organism.name='Drosophila erecta'
Gene: secondaryIdentifier=None publications.pubMedId='10486967' publications.firstAuthor='Wang S' publications.journal='Mol. Biol. Evol.' publications.year=1999 organism.name='Drosophila erecta'
Gene: secondaryIdentifier=None publications.pubMedId='10486967' publications.firstAuthor='Wang S' publications.journal='Mol. Biol. Evol.' publications.year=1999 organism.name='Drosophila erecta'
Gene: secondaryIdentifier=None publications.pubMedId='10486967' publications.firstAuthor='Wang S' publications.journal='Mol. Biol. Evol.' publications.year=1999 organism.name='Drosophila erecta'
Gene: secondaryIdentifier=None publications.pubMedId='10486967' publications.firstAuthor='Wang S' publications.journal='Mol. Biol. Evol.' publications.year=1999 organism.name='Drosophila erecta'
Gene: secondaryIdentifier=None publications.pubMedId='10545459' publications.firstAuthor='Takano-Shimizu T' publications.journal='Genetics' publications.year=1999 organism.name='Drosophila erecta'
Gene: secondaryIdentifier=None publications.pubMedId='10545459' publications.firstAuthor='Takano-Shimizu T' publications.journal='Genetics' publications.year=1999 organism.name='Drosophila erecta'
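The constraint code does not have to be hard-coded as a keyword argument. A minimal sketch, assuming only the results call shown above, builds the override dictionary from variables and unpacks it with `**`. The helper name `rows_with_override` is hypothetical.

```python
def rows_with_override(template, code, value, op="=", size=10):
    """Run a template with one constraint overridden and return the rows.

    `code` is the constraint code from template.constraint_dict (e.g. "A").
    """
    # Same shape as A={"op": "=", "value": ...}, but keyed by a variable.
    overrides = {code: {"op": op, "value": value}}
    return list(template.results(row="rr", size=size, **overrides))


# Usage with the template loaded above (needs network access, not run here):
#   for row in rows_with_override(template, "A", "Drosophila erecta"):
#       print(row)
```

Keeping the code in a variable makes it easy to loop over several organisms or to reuse the helper with templates whose editable constraint has a different code.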

This is how you can take a predefined query and modify it to suit your needs. I would suggest that you visit the FlyMine website and take a look at some of the templates that have been defined there. Try running them using Python and change the constraints and views.

While exploring templates, you may come across constraints that can be switched on or off. To switch off a constraint that is currently on, use template.get_constraint('B').switch_off(), where B is the code of the constraint in the constraint dictionary. In our example, the code is A, since there is only one constraint.

To check whether a particular constraint is switchable, use the get_switchable_status method. This method can return three possible values: locked, on or off. Locked means that the constraint is fixed and cannot be switched on or off. If a constraint is switchable, the method returns on or off depending on its current status.

In [8]:
template.get_constraint('A').get_switchable_status()
Out[8]:
'locked'

If you try to switch off a constraint that is "locked" or not switchable, you will get an error.

In [9]:
# template.get_constraint('A').switch_off()
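To avoid that error, the status can be checked before switching. The sketch below uses only the get_constraint, get_switchable_status and switch_off calls described above. The helper name `switch_off_if_allowed` is hypothetical.

```python
def switch_off_if_allowed(template, code):
    """Switch off the constraint with the given code if it is currently on.

    Returns True if the constraint was switched off, and False if it was
    locked or already off (in which case nothing is changed).
    """
    constraint = template.get_constraint(code)
    if constraint.get_switchable_status() == "on":
        constraint.switch_off()
        return True
    return False


# Usage with the template loaded above (not run here):
#   switch_off_if_allowed(template, "A")   # constraint A is locked
```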

It is also possible to modify constraints on the templates, as discussed above. Both the operator and the value may be altered. Here is an example using the Gene_IntronChromosome template.

We first see the status of the constraints on this template.

In [10]:
from intermine.webservice import Service
service = Service("https://www.flymine.org/flymine/service")
# Load the template by name and inspect its constraints
template = service.get_template('Gene_IntronChromosome')
template.constraint_dict
Out[10]:
{'A': <TemplateTernaryConstraint: Gene LOOKUP CG10016 (editable, on)>,
 'B': <TemplateBinaryConstraint: Gene.organism.name = Drosophila melanogaster (editable, on)>}

Now we modify constraint A so that the output contains only data for the gene with secondary identifier CG10023.

In [11]:
from intermine.webservice import Service
service = Service("https://www.flymine.org/flymine/service")
template = service.get_template('Gene_IntronChromosome')
rows = template.rows(A={"op": "=", "value": "CG10023"})
for row in rows:
    print(row)
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:10_FBgn0020440:9' transcripts.introns.length=55 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19434109 transcripts.introns.chromosomeLocation.end=19434163 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:11_FBgn0020440:10' transcripts.introns.length=61 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19434309 transcripts.introns.chromosomeLocation.end=19434369 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:12_FBgn0020440:11' transcripts.introns.length=57 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19434503 transcripts.introns.chromosomeLocation.end=19434559 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:13_FBgn0020440:12' transcripts.introns.length=62 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19434693 transcripts.introns.chromosomeLocation.end=19434754 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:14_FBgn0020440:13' transcripts.introns.length=63 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19435251 transcripts.introns.chromosomeLocation.end=19435313 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:15_FBgn0020440:14' transcripts.introns.length=58 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19435550 transcripts.introns.chromosomeLocation.end=19435607 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:16_FBgn0020440:15' transcripts.introns.length=59 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19435718 transcripts.introns.chromosomeLocation.end=19435776 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:17_FBgn0020440:16' transcripts.introns.length=68 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19436480 transcripts.introns.chromosomeLocation.end=19436547 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:18_FBgn0020440:16' transcripts.introns.length=68 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19436480 transcripts.introns.chromosomeLocation.end=19436547 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:19_FBgn0020440:17' transcripts.introns.length=214 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19436644 transcripts.introns.chromosomeLocation.end=19436857 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:20_FBgn0020440:17' transcripts.introns.length=407 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19436644 transcripts.introns.chromosomeLocation.end=19437050 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:21_FBgn0020440:17' transcripts.introns.length=571 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19436644 transcripts.introns.chromosomeLocation.end=19437214 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:2_FBgn0020440:1' transcripts.introns.length=64 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19431563 transcripts.introns.chromosomeLocation.end=19431626 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:3_FBgn0020440:2' transcripts.introns.length=54 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19431738 transcripts.introns.chromosomeLocation.end=19431791 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:4_FBgn0020440:2' transcripts.introns.length=54 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19431738 transcripts.introns.chromosomeLocation.end=19431791 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:5_FBgn0020440:3' transcripts.introns.length=62 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19432241 transcripts.introns.chromosomeLocation.end=19432302 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:5_FBgn0020440:4' transcripts.introns.length=56 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19432247 transcripts.introns.chromosomeLocation.end=19432302 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:6_FBgn0020440:5' transcripts.introns.length=203 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19432393 transcripts.introns.chromosomeLocation.end=19432595 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:7_FBgn0020440:5' transcripts.introns.length=1109 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19432393 transcripts.introns.chromosomeLocation.end=19433501 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:8_FBgn0020440:6' transcripts.introns.length=61 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19433642 transcripts.introns.chromosomeLocation.end=19433702 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:8_FBgn0020440:7' transcripts.introns.length=61 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19433642 transcripts.introns.chromosomeLocation.end=19433702 transcripts.introns.chromosomeLocation.strand='-1'
Gene: secondaryIdentifier='CG10023' symbol='Fak' transcripts.introns.primaryIdentifier='intron_FBgn0020440:9_FBgn0020440:8' transcripts.introns.length=61 transcripts.introns.chromosome.primaryIdentifier='2R' transcripts.introns.chromosomeLocation.start=19433843 transcripts.introns.chromosomeLocation.end=19433903 transcripts.introns.chromosomeLocation.strand='-1'

Thus the output contains only results that satisfy the modified constraint.
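The rows returned by a template can be processed like any other query results. As a minimal sketch, the helper below finds the longest intron, assuming each row can be indexed by its full column path and that the columns are named `Gene.transcripts.introns.primaryIdentifier` and `Gene.transcripts.introns.length`, as the printed output suggests. The helper name `longest_intron` is hypothetical.

```python
def longest_intron(rows):
    """Return (intron identifier, length) for the longest intron in `rows`.

    Assumption: rows can be indexed by full column path. Returns None if
    `rows` is empty.
    """
    best = None
    for row in rows:
        length = row["Gene.transcripts.introns.length"]
        if best is None or length > best[1]:
            best = (row["Gene.transcripts.introns.primaryIdentifier"], length)
    return best


# Usage with the template loaded above (needs network access, not run here):
#   rows = template.rows(A={"op": "=", "value": "CG10023"})
#   print(longest_intron(rows))
```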

This tutorial showed how templates let you reuse commonly run queries and adapt them by changing only a few constraint values, which saves you from rebuilding the same query each time.