Intermine-Python: Tutorial 3: More about Constraints

In the previous tutorial, we learnt how to add constraints to a query to filter its results. In this tutorial we will look at the different types of constraint that InterMine supports, with an example of each.

In [1]:
from intermine.webservice import Service
In [2]:
service = Service("https://www.flymine.org/flymine/service")
query=service.new_query("Gene") 
Unary Constraint

The first type of constraint that we will look at is the Unary Constraint. A unary constraint takes no value; it checks whether a particular attribute is present or absent. The unary operators are IS NULL and IS NOT NULL. Here is a small example.

In [3]:
query.add_constraint("primaryIdentifier","IS NOT NULL")
Out[3]:
<UnaryConstraint: Gene.primaryIdentifier IS NOT NULL>
In [4]:
for row in query.rows(size=10):
    print(row)
Gene: briefDescription=None cytoLocation='-' description=None id=1000219 length=12653 name='zydeco' primaryIdentifier='FBgn0265767' score=None scoreType=None secondaryIdentifier='CG2893' symbol='zyd'
Gene: briefDescription=None cytoLocation='-' description=None id=1006102 length=12892 name='Rho GTPase activating protein at 1A' primaryIdentifier='FBgn0025836' score=None scoreType=None secondaryIdentifier='CG40494' symbol='RhoGAP1A'
Gene: briefDescription=None cytoLocation='-' description=None id=1010363 length=21475 name='verthandi' primaryIdentifier='FBgn0260987' score=None scoreType=None secondaryIdentifier='CG17436' symbol='vtd'
Gene: briefDescription=None cytoLocation='-' description=None id=1015832 length=14286 name='Maf1' primaryIdentifier='FBgn0267861' score=None scoreType=None secondaryIdentifier='CG40196' symbol='Maf1'
Gene: briefDescription=None cytoLocation='-' description=None id=1019267 length=12844 name=None primaryIdentifier='FBgn0039941' score=None scoreType=None secondaryIdentifier='CG17167' symbol='CG17167'
Gene: briefDescription=None cytoLocation='-' description=None id=1019385 length=11613 name=None primaryIdentifier='FBgn0040031' score=None scoreType=None secondaryIdentifier='CG12061' symbol='CG12061'
Gene: briefDescription=None cytoLocation='-' description=None id=1019845 length=21673 name=None primaryIdentifier='FBgn0039955' score=None scoreType=None secondaryIdentifier='CG41099' symbol='CG41099'
Gene: briefDescription=None cytoLocation='-' description=None id=1020575 length=11099 name=None primaryIdentifier='FBgn0040056' score=None scoreType=None secondaryIdentifier='CG17698' symbol='CG17698'
Gene: briefDescription=None cytoLocation='-' description=None id=1020775 length=32088 name='kelch like family member 10' primaryIdentifier='FBgn0040038' score=None scoreType=None secondaryIdentifier='CG12423' symbol='klhl10'
Gene: briefDescription=None cytoLocation='-' description=None id=1021531 length=5696 name=None primaryIdentifier='FBgn0025835' score=None scoreType=None secondaryIdentifier='CG17707' symbol='CG17707'
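To make the semantics concrete without contacting the server, here is a minimal plain-Python sketch that emulates IS NULL and IS NOT NULL on a few rows copied from the output above. It tests the name attribute, because several of these genes have name=None. The helpers `is_null` and `is_not_null` are hypothetical and are not part of the intermine package. The real constraint is evaluated by the server.

```python
# Plain-Python emulation of the two unary operators (not the intermine API).
# A few Gene rows copied from the output above, as dictionaries.
genes = [
    {"symbol": "zyd", "name": "zydeco"},
    {"symbol": "RhoGAP1A", "name": "Rho GTPase activating protein at 1A"},
    {"symbol": "CG17167", "name": None},
    {"symbol": "CG12061", "name": None},
]


def is_null(rows, attr):
    """Keep rows whose attribute has no value (emulates IS NULL)."""
    return [row for row in rows if row.get(attr) is None]


def is_not_null(rows, attr):
    """Keep rows whose attribute has a value (emulates IS NOT NULL)."""
    return [row for row in rows if row.get(attr) is not None]


# With the real client, the first filter would be written as
# query.add_constraint("name", "IS NULL").
print([g["symbol"] for g in is_null(genes, "name")])      # ['CG17167', 'CG12061']
print([g["symbol"] for g in is_not_null(genes, "name")])  # ['zyd', 'RhoGAP1A']
```

Every row lands in exactly one of the two results, which is what makes the pair of unary operators complementary.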
Binary Constraint

The next type is the Binary Constraint, which compares a path against a single value. Most of the constraints that we looked at in the second tutorial were binary constraints, and they form the largest group of constraints. The operators are =, !=, <, <=, > and >=.

In [5]:
query.add_constraint("length",">=","12000")
Out[5]:
<BinaryConstraint: Gene.length >= 12000>
In [6]:
for row in query.rows(size=10):
    print(row)
Gene: briefDescription=None cytoLocation='-' description=None id=1000219 length=12653 name='zydeco' primaryIdentifier='FBgn0265767' score=None scoreType=None secondaryIdentifier='CG2893' symbol='zyd'
Gene: briefDescription=None cytoLocation='-' description=None id=1006102 length=12892 name='Rho GTPase activating protein at 1A' primaryIdentifier='FBgn0025836' score=None scoreType=None secondaryIdentifier='CG40494' symbol='RhoGAP1A'
Gene: briefDescription=None cytoLocation='-' description=None id=1010363 length=21475 name='verthandi' primaryIdentifier='FBgn0260987' score=None scoreType=None secondaryIdentifier='CG17436' symbol='vtd'
Gene: briefDescription=None cytoLocation='-' description=None id=1015832 length=14286 name='Maf1' primaryIdentifier='FBgn0267861' score=None scoreType=None secondaryIdentifier='CG40196' symbol='Maf1'
Gene: briefDescription=None cytoLocation='-' description=None id=1019267 length=12844 name=None primaryIdentifier='FBgn0039941' score=None scoreType=None secondaryIdentifier='CG17167' symbol='CG17167'
Gene: briefDescription=None cytoLocation='-' description=None id=1019845 length=21673 name=None primaryIdentifier='FBgn0039955' score=None scoreType=None secondaryIdentifier='CG41099' symbol='CG41099'
Gene: briefDescription=None cytoLocation='-' description=None id=1020775 length=32088 name='kelch like family member 10' primaryIdentifier='FBgn0040038' score=None scoreType=None secondaryIdentifier='CG12423' symbol='klhl10'
Gene: briefDescription=None cytoLocation='-' description=None id=1057447 length=133933 name=None primaryIdentifier='FBgn0058006' score=None scoreType=None secondaryIdentifier='CG40006' symbol='CG40006'
Gene: briefDescription=None cytoLocation='-' description=None id=1059401 length=76790 name='neverland' primaryIdentifier='FBgn0287185' score=None scoreType=None secondaryIdentifier='CG40050' symbol='nvd'
Gene: briefDescription=None cytoLocation='-' description=None id=1119953 length=214356 name='WD40 Y' primaryIdentifier='FBgn0267449' score=None scoreType=None secondaryIdentifier='CG45799' symbol='WDY'

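Because this binary constraint was added to the same query as the unary constraint above, both constraints apply: the results are genes that have a primaryIdentifier and a length of at least 12000.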
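The same emulation idea extends to binary constraints. This sketch maps each operator string to a function from Python's `operator` module, then applies the unary and binary filters in turn, mirroring how the two constraints on `query` combine with AND. The helper `binary_filter` is hypothetical, and treating missing values as non-matching is an assumption. The sketch passes 12000 as an integer; the tutorial sends the string "12000", and the output shows the server compared lengths numerically (CG17707, with length 5696, is absent, although "5696" >= "12000" would hold as a string comparison).

```python
import operator

# Plain-Python emulation of binary constraints (not the intermine API).
# Each operator string from the list above maps to a comparison function.
BINARY_OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

# Rows copied from the unary-constraint output above.
genes = [
    {"symbol": "zyd", "primaryIdentifier": "FBgn0265767", "length": 12653},
    {"symbol": "CG12061", "primaryIdentifier": "FBgn0040031", "length": 11613},
    {"symbol": "klhl10", "primaryIdentifier": "FBgn0040038", "length": 32088},
    {"symbol": "CG17707", "primaryIdentifier": "FBgn0025835", "length": 5696},
]


def binary_filter(rows, attr, op, value):
    """Keep rows where `row[attr] <op> value` holds.

    Assumption: rows with a missing value never match.
    """
    compare = BINARY_OPS[op]
    return [r for r in rows if r.get(attr) is not None and compare(r[attr], value)]


# Constraints on one query are combined with AND, so apply them one after another.
has_id = [r for r in genes if r.get("primaryIdentifier") is not None]
long_genes = binary_filter(has_id, "length", ">=", 12000)
print([g["symbol"] for g in long_genes])  # ['zyd', 'klhl10']
```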

Ternary Constraint

We will now look at Ternary Constraints. A ternary constraint takes one required value and one optional extra value. InterMine currently supports only one ternary operator: LOOKUP. LOOKUP searches the identifying fields of a class (such as symbol, name and primaryIdentifier) for the value you supply. In the example below, it searches the Gene class for any gene whose identifying fields match "zen". The advantage is that you do not need to remember whether zen is a symbol, a name or a primaryIdentifier. However, the same value may match more than one object, so you can use the optional extra_value parameter to narrow the search; for genes, the extra value is the organism.

In [7]:
query2=service.new_query()
In [8]:
query2.add_constraint("Gene","LOOKUP","zen",extra_value="D. melanogaster")
Out[8]:
<TernaryConstraint: Gene LOOKUP zen IN D. melanogaster>
In [9]:
for row in query2.rows():
    print(row)
Gene: briefDescription=None cytoLocation='84A5-84A5' description=None id=1007877 length=1331 name='zerknullt' primaryIdentifier='FBgn0004053' score=None scoreType=None secondaryIdentifier='CG1046' symbol='zen'
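Here is a simplified plain-Python sketch of what LOOKUP does, written without the intermine client. It assumes that LOOKUP matches the value exactly against a fixed set of identifying fields (`KEY_FIELDS` is an assumption; each mine configures its own) and that extra_value filters on the organism. The third record is invented purely to show why extra_value helps, and the helper `lookup` is hypothetical.

```python
# Plain-Python emulation of LOOKUP (not the intermine API).
# Assumption: these are the fields LOOKUP consults; each mine configures its own.
KEY_FIELDS = ("primaryIdentifier", "secondaryIdentifier", "symbol", "name")

genes = [
    # Copied from the LOOKUP output above.
    {"primaryIdentifier": "FBgn0004053", "secondaryIdentifier": "CG1046",
     "symbol": "zen", "name": "zerknullt", "organism": "D. melanogaster"},
    # Copied from the earlier Gene output; its organism is assumed here.
    {"primaryIdentifier": "FBgn0265767", "secondaryIdentifier": "CG2893",
     "symbol": "zyd", "name": "zydeco", "organism": "D. melanogaster"},
    # Hypothetical record, invented to show why extra_value is useful.
    {"primaryIdentifier": "HYPOTHETICAL1", "secondaryIdentifier": None,
     "symbol": "zen", "name": None, "organism": "Hypothetical species"},
]


def lookup(rows, value, extra_value=None):
    """Match `value` against any key field, then optionally filter by organism.

    Simplification: exact string matching only.
    """
    hits = [r for r in rows if any(r.get(f) == value for f in KEY_FIELDS)]
    if extra_value is not None:
        hits = [r for r in hits if r["organism"] == extra_value]
    return hits


print([r["primaryIdentifier"] for r in lookup(genes, "zen")])
# ['FBgn0004053', 'HYPOTHETICAL1']
print([r["primaryIdentifier"] for r in lookup(genes, "zen", extra_value="D. melanogaster")])
# ['FBgn0004053']
```

Because every key field is checked, looking up the secondary identifier "CG1046" finds the same gene as looking up the symbol "zen".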
Multi-Value Constraint

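The next constraint type is the Multi-Value Constraint, which takes a list of values. The two allowed operators are ONE OF, which keeps rows whose value matches any value in the list, and NONE OF, which excludes them. The example below returns genes whose symbol is neither zen nor eve.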

In [10]:
query3=service.new_query("Gene")
In [11]:
query3.add_constraint("symbol","NONE OF",['zen','eve'])
Out[11]:
<MultiConstraint: Gene.symbol NONE OF ['zen', 'eve']>
In [12]:
for row in query3.rows(size=10):
    print(row)
Gene: briefDescription=None cytoLocation='-' description=None id=1000219 length=12653 name='zydeco' primaryIdentifier='FBgn0265767' score=None scoreType=None secondaryIdentifier='CG2893' symbol='zyd'
Gene: briefDescription=None cytoLocation='-' description=None id=1006102 length=12892 name='Rho GTPase activating protein at 1A' primaryIdentifier='FBgn0025836' score=None scoreType=None secondaryIdentifier='CG40494' symbol='RhoGAP1A'
Gene: briefDescription=None cytoLocation='-' description=None id=1010363 length=21475 name='verthandi' primaryIdentifier='FBgn0260987' score=None scoreType=None secondaryIdentifier='CG17436' symbol='vtd'
Gene: briefDescription=None cytoLocation='-' description=None id=1015832 length=14286 name='Maf1' primaryIdentifier='FBgn0267861' score=None scoreType=None secondaryIdentifier='CG40196' symbol='Maf1'
Gene: briefDescription=None cytoLocation='-' description=None id=1019267 length=12844 name=None primaryIdentifier='FBgn0039941' score=None scoreType=None secondaryIdentifier='CG17167' symbol='CG17167'
Gene: briefDescription=None cytoLocation='-' description=None id=1019385 length=11613 name=None primaryIdentifier='FBgn0040031' score=None scoreType=None secondaryIdentifier='CG12061' symbol='CG12061'
Gene: briefDescription=None cytoLocation='-' description=None id=1019845 length=21673 name=None primaryIdentifier='FBgn0039955' score=None scoreType=None secondaryIdentifier='CG41099' symbol='CG41099'
Gene: briefDescription=None cytoLocation='-' description=None id=1020575 length=11099 name=None primaryIdentifier='FBgn0040056' score=None scoreType=None secondaryIdentifier='CG17698' symbol='CG17698'
Gene: briefDescription=None cytoLocation='-' description=None id=1020775 length=32088 name='kelch like family member 10' primaryIdentifier='FBgn0040038' score=None scoreType=None secondaryIdentifier='CG12423' symbol='klhl10'
Gene: briefDescription=None cytoLocation='-' description=None id=1021531 length=5696 name=None primaryIdentifier='FBgn0025835' score=None scoreType=None secondaryIdentifier='CG17707' symbol='CG17707'
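A plain-Python sketch of the two multi-value operators, using gene symbols that appear in this tutorial. The helpers `one_of` and `none_of` are hypothetical, not part of the intermine package, and the handling of missing values is not modelled.

```python
# Plain-Python emulation of multi-value constraints (not the intermine API).
symbols = ["zen", "eve", "zyd", "vtd", "Maf1"]


def one_of(values, allowed):
    """Keep values that appear in `allowed` (emulates ONE OF)."""
    allowed = set(allowed)
    return [v for v in values if v in allowed]


def none_of(values, excluded):
    """Drop values that appear in `excluded` (emulates NONE OF)."""
    excluded = set(excluded)
    return [v for v in values if v not in excluded]


# The real-client equivalent of the first call would be
# query3.add_constraint("symbol", "ONE OF", ["zen", "eve"]).
print(one_of(symbols, ["zen", "eve"]))   # ['zen', 'eve']
print(none_of(symbols, ["zen", "eve"]))  # ['zyd', 'vtd', 'Maf1']
```

Logically, ONE OF behaves like several = tests joined with OR, and NONE OF like several != tests joined with AND, so one multi-value constraint can stand in for a chain of binary ones.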
List Constraint

List constraints let you create a named list of objects and then refer to it in queries with the operators IN and NOT IN. The constrained path must always be a class (for example, Gene), not an attribute. The lists available in FlyMine can be found at: http://www.flymine.org/flymine/bag.do?subtab=view . Here is an example.

In [13]:
query4=service.new_query()
In [14]:
query4.add_constraint("Gene","IN","PL FlyAtlas_brain_top")
Out[14]:
<ListConstraint: Gene IN PL FlyAtlas_brain_top>
In [15]:
for row in query4.rows(size=10):
    print(row)
Gene: briefDescription=None cytoLocation='10A3-10A3' description=None id=1104775 length=2075 name=None primaryIdentifier='FBgn0030259' score=None scoreType=None secondaryIdentifier='CG1545' symbol='CG1545'
Gene: briefDescription=None cytoLocation='11D8-11D8' description=None id=1068742 length=90456 name='radish' primaryIdentifier='FBgn0265597' score=None scoreType=None secondaryIdentifier='CG44424' symbol='rad'
Gene: briefDescription=None cytoLocation='14A1-14A1' description=None id=1039501 length=26224 name='mind-meld' primaryIdentifier='FBgn0259110' score=None scoreType=None secondaryIdentifier='CG42252' symbol='mmd'
Gene: briefDescription=None cytoLocation='16F3-16F5' description=None id=1058279 length=138941 name='Shaker' primaryIdentifier='FBgn0003380' score=None scoreType=None secondaryIdentifier='CG12348' symbol='Sh'
Gene: briefDescription=None cytoLocation='18C2-18C3' description=None id=1074968 length=21373 name='nicotinic Acetylcholine Receptor alpha7' primaryIdentifier='FBgn0086778' score=None scoreType=None secondaryIdentifier='CG32538' symbol='nAChRalpha7'
Gene: briefDescription=None cytoLocation='19A4-19A4' description=None id=1068367 length=58027 name='Dopamine 2-like receptor' primaryIdentifier='FBgn0053517' score=None scoreType=None secondaryIdentifier='CG33517' symbol='Dop2R'
Gene: briefDescription=None cytoLocation='24C9-24D1' description=None id=1008553 length=88786 name='friend of echinoid' primaryIdentifier='FBgn0051774' score=None scoreType=None secondaryIdentifier='CG31774' symbol='fred'
Gene: briefDescription=None cytoLocation='25B4-25B4' description=None id=1693557 length=1564 name=None primaryIdentifier='FBgn0031650' score=None scoreType=None secondaryIdentifier='CG14044' symbol='CG14044'
Gene: briefDescription=None cytoLocation='30D1-30E1' description=None id=1428781 length=92934 name='nicotinic Acetylcholine Receptor alpha6' primaryIdentifier='FBgn0032151' score=None scoreType=None secondaryIdentifier='CG4128' symbol='nAChRalpha6'
Gene: briefDescription=None cytoLocation='42D4-42D6' description=None id=1176763 length=10005 name=None primaryIdentifier='FBgn0033108' score=None scoreType=None secondaryIdentifier='CG15236' symbol='CG15236'
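Because a list holds objects rather than attribute values, membership is decided per object, which is why the constrained path must be a class. The sketch below models a list as a set of internal object ids (the `id` values shown in the output). `saved_list`, `in_list` and `not_in_list` are hypothetical; the real PL FlyAtlas_brain_top list is stored on the server and contains at least the ten genes shown above.

```python
# Plain-Python emulation of list constraints (not the intermine API).
# A list holds objects, modelled here by their internal ids.
# Hypothetical two-gene list: ids of CG1545 and rad from the output above.
saved_list = {1104775, 1068742}

genes = [
    {"id": 1104775, "symbol": "CG1545"},
    {"id": 1068742, "symbol": "rad"},
    {"id": 1000219, "symbol": "zyd"},
]


def in_list(rows, bag):
    """Keep objects whose id is in the list (emulates IN)."""
    return [r for r in rows if r["id"] in bag]


def not_in_list(rows, bag):
    """Keep objects whose id is not in the list (emulates NOT IN)."""
    return [r for r in rows if r["id"] not in bag]


print([g["symbol"] for g in in_list(genes, saved_list)])      # ['CG1545', 'rad']
print([g["symbol"] for g in not_in_list(genes, saved_list)])  # ['zyd']
```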
Sub-Class Constraints

The InterMine data model is hierarchical: a class can have sub-classes. A sub-class constraint restricts a path to one of its sub-classes, so that only items of that sub-class are returned for the path. In the example below, Gene.ontologyAnnotations is restricted to GOAnnotation.

In [16]:
query5=service.new_query("Gene")
In [17]:
query5.add_constraint("ontologyAnnotations","GOAnnotation")
Out[17]:
<SubClassConstraint: Gene.ontologyAnnotations ISA GOAnnotation>
In [18]:
for row in query5.rows(size=10):
    print(row)
Gene: briefDescription=None cytoLocation='-' description=None id=1000219 length=12653 name='zydeco' primaryIdentifier='FBgn0265767' score=None scoreType=None secondaryIdentifier='CG2893' symbol='zyd'
Gene: briefDescription=None cytoLocation='-' description=None id=1006102 length=12892 name='Rho GTPase activating protein at 1A' primaryIdentifier='FBgn0025836' score=None scoreType=None secondaryIdentifier='CG40494' symbol='RhoGAP1A'
Gene: briefDescription=None cytoLocation='-' description=None id=1010363 length=21475 name='verthandi' primaryIdentifier='FBgn0260987' score=None scoreType=None secondaryIdentifier='CG17436' symbol='vtd'
Gene: briefDescription=None cytoLocation='-' description=None id=1015832 length=14286 name='Maf1' primaryIdentifier='FBgn0267861' score=None scoreType=None secondaryIdentifier='CG40196' symbol='Maf1'
Gene: briefDescription=None cytoLocation='-' description=None id=1019267 length=12844 name=None primaryIdentifier='FBgn0039941' score=None scoreType=None secondaryIdentifier='CG17167' symbol='CG17167'
Gene: briefDescription=None cytoLocation='-' description=None id=1019385 length=11613 name=None primaryIdentifier='FBgn0040031' score=None scoreType=None secondaryIdentifier='CG12061' symbol='CG12061'
Gene: briefDescription=None cytoLocation='-' description=None id=1019845 length=21673 name=None primaryIdentifier='FBgn0039955' score=None scoreType=None secondaryIdentifier='CG41099' symbol='CG41099'
Gene: briefDescription=None cytoLocation='-' description=None id=1020575 length=11099 name=None primaryIdentifier='FBgn0040056' score=None scoreType=None secondaryIdentifier='CG17698' symbol='CG17698'
Gene: briefDescription=None cytoLocation='-' description=None id=1020775 length=32088 name='kelch like family member 10' primaryIdentifier='FBgn0040038' score=None scoreType=None secondaryIdentifier='CG12423' symbol='klhl10'
Gene: briefDescription=None cytoLocation='-' description=None id=1021531 length=5696 name=None primaryIdentifier='FBgn0025835' score=None scoreType=None secondaryIdentifier='CG17707' symbol='CG17707'

Unlike the other constraint types, a sub-class constraint takes no operator argument: you pass only the path and the name of the sub-class, and the constraint is displayed with the ISA operator.
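In Python terms, a sub-class constraint resembles an `isinstance` check. The sketch below is an analogy, not the intermine API: it assumes, as the example implies, that GOAnnotation is a sub-class of OntologyAnnotation, and keeps only the GOAnnotation items. The class bodies, the `isa` helper and the term names are hypothetical.

```python
# Plain-Python analogy for a sub-class (ISA) constraint (not the intermine API).
# Assumption: GOAnnotation is a sub-class of OntologyAnnotation.


class OntologyAnnotation:
    def __init__(self, term):
        self.term = term


class GOAnnotation(OntologyAnnotation):
    pass


def isa(items, subclass):
    """Keep only items that are instances of `subclass` (Python's isinstance)."""
    return [item for item in items if isinstance(item, subclass)]


annotations = [
    GOAnnotation("go-term-1"),         # hypothetical term names
    OntologyAnnotation("other-term"),
    GOAnnotation("go-term-2"),
]

print([a.term for a in isa(annotations, GOAnnotation)])  # ['go-term-1', 'go-term-2']
```

Restricting to the parent class keeps every item, since each GOAnnotation is also an OntologyAnnotation.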

Loop Constraints

Loop Constraints assert that two paths refer to the same object. The valid operators are IS and IS NOT. Both the path and the loop path must refer to a class (for example, Gene), not an attribute. In XML serialisation, IS and IS NOT are written as = and !=. The example below starts from the human genes in a list and uses IS NOT to exclude homologue entries whose homologue is the starting gene itself.

In [19]:
query=service.new_query("Gene")
In [20]:
query.add_view("homologues.gene.primaryIdentifier","homologues.homologue.primaryIdentifier")
Out[20]:
<intermine.query.Query at 0x7fc4b4b76a20>
In [21]:
query.add_constraint("Gene", "IN", "H. sapiens orthologues of FlyTF_site_specific_1", code = "A")
Out[21]:
<ListConstraint: Gene IN H. sapiens orthologues of FlyTF_site_specific_1>
In [22]:
query.add_constraint("homologues.homologue", "IS NOT", "Gene", code = "B")
Out[22]:
<LoopConstraint: Gene.homologues.homologue IS NOT Gene>
In [23]:
for row in query.rows(size=10):
    print(row["homologues.gene.primaryIdentifier"],row["homologues.homologue.primaryIdentifier"])
1390 1385
1390 466
1390 FBgn0265784
1390 MGI:88495
1390 RGD:2402
1390 WBGene00000793
1390 ZDB-GENE-030131-7031
1390 ZDB-GENE-050417-150
6935 25988
6935 9839
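A loop constraint compares object identity, not attribute values, much like Python's `is` operator. The sketch below models a gene and its homologue entries as plain objects, using identifiers from the output above plus one hypothetical self-match, and emulates IS and IS NOT. All class and function names are hypothetical and not part of the intermine package.

```python
from dataclasses import dataclass

# Plain-Python emulation of a loop constraint (not the intermine API).
# eq=False keeps the default identity-based comparison, so two Gene objects
# are "the same" only if they are literally the same object.


@dataclass(eq=False)
class Gene:
    primary_identifier: str


@dataclass(eq=False)
class Homologue:
    gene: Gene
    homologue: Gene


root = Gene("1390")
homologues = [
    Homologue(root, Gene("1385")),
    Homologue(root, Gene("FBgn0265784")),
    Homologue(root, root),  # hypothetical self-match, the kind IS NOT removes
]


def loop_is(root, rows):
    """Keep rows whose homologue is the root object itself (emulates IS)."""
    return [h for h in rows if h.homologue is root]


def loop_is_not(root, rows):
    """Keep rows whose homologue is a different object (emulates IS NOT)."""
    return [h for h in rows if h.homologue is not root]


print([h.homologue.primary_identifier for h in loop_is_not(root, homologues)])
# ['1385', 'FBgn0265784']
print(len(loop_is(root, homologues)))  # 1
```

In this sketch, a separate Gene object that merely carries the same identifier ("1390") is not matched by `loop_is`, because identity rather than equality is tested.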
Range Constraints

Range Constraints test where a value lies relative to a set of ranges. The value of the constrained path (here, a sequence feature's chromosomeLocation) must stand in the relationship named by the operator to the ranges you pass, each written as chromosome:start..end. Valid operators are OVERLAPS, DOES NOT OVERLAP, WITHIN, OUTSIDE, CONTAINS and DOES NOT CONTAIN. Here is an example of a range constraint.

In [24]:
query=service.new_query()
In [25]:
query.add_view("SequenceFeature.organism.shortName", "SequenceFeature.chromosomeLocation.locatedOn.primaryIdentifier", "SequenceFeature.chromosomeLocation.start", "SequenceFeature.chromosomeLocation.end")
Out[25]:
<intermine.query.Query at 0x7fc4b4bd0c18>
In [26]:
query.add_constraint("chromosomeLocation", "OVERLAPS", ["X:94248091..143371935"])
Out[26]:
<RangeConstraint: SequenceFeature.chromosomeLocation OVERLAPS ['X:94248091..143371935']>
In [27]:
for row in query.rows(size=4): 
    print(row)
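The relationships tested by the main range operators can be sketched in plain Python. This sketch assumes inclusive coordinates, ranges written as chrom:start..end as in the example, and that a positive operator succeeds if the feature matches any of the supplied ranges. The helper names are hypothetical, and OUTSIDE is deliberately left out because its precise definition is not given here.

```python
# Plain-Python emulation of range constraints (not the intermine API).
# Assumptions: inclusive coordinates, "chrom:start..end" range strings, and a
# positive operator succeeds if the feature matches ANY of the ranges passed.


def parse_range(text):
    """Split "X:94248091..143371935" into ("X", 94248091, 143371935)."""
    chrom, span = text.split(":", 1)
    start, end = span.split("..")
    return chrom, int(start), int(end)


def overlaps(f, r):
    """Feature and range share at least one position on the same chromosome."""
    return f[0] == r[0] and f[1] <= r[2] and r[1] <= f[2]


def within(f, r):
    """Feature lies entirely inside the range."""
    return f[0] == r[0] and r[1] <= f[1] and f[2] <= r[2]


def contains(f, r):
    """Feature entirely covers the range."""
    return f[0] == r[0] and f[1] <= r[1] and r[2] <= f[2]


POSITIVE = {"OVERLAPS": overlaps, "WITHIN": within, "CONTAINS": contains}
NEGATED = {"DOES NOT OVERLAP": "OVERLAPS", "DOES NOT CONTAIN": "CONTAINS"}
# OUTSIDE is not modelled: its exact definition is not given in this tutorial.


def range_test(feature, op, ranges):
    """Return True if `feature` (chrom, start, end) satisfies `op` for `ranges`."""
    if op in NEGATED:
        return not range_test(feature, NEGATED[op], ranges)
    parsed = [parse_range(r) for r in ranges]
    return any(POSITIVE[op](feature, r) for r in parsed)


feature = ("X", 100, 200)  # hypothetical feature location
print(range_test(feature, "OVERLAPS", ["X:150..300"]))           # True
print(range_test(feature, "WITHIN", ["X:150..300"]))             # False
print(range_test(feature, "WITHIN", ["X:50..250"]))              # True
print(range_test(feature, "CONTAINS", ["X:120..180"]))           # True
print(range_test(feature, "DOES NOT OVERLAP", ["2L:150..300"]))  # True
```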

This tutorial covered the main constraint types. In the next tutorial we will look at some other features of a query.