Tutorial 15 - Simple ways to manipulate the model

This tutorial shows how to use the intermine module to get details about the data model of various mines. These examples complement the ones covered elsewhere in this series. Each class (data type) in the model describes its fields, which are attributes, references to other objects in the data model, or collections of such references.

Let's begin by showing all the fields a data type contains (Attributes, References and Collections, listed alphabetically):

In [1]:
from intermine.webservice import Service
service = Service("http://flymine.org/flymine/service")
model = service.model
datatype = model.get_class("Gene")
datatype.fields
Out[1]:
[CDSs is a group of CDS objects, which link back to this as gene,
 UTRs is a group of UTR objects, which link back to this as gene,
 alleles is a group of Allele objects, which link back to this as gene,
 briefDescription is a String,
 childFeatures is a group of SequenceFeature objects,
 chromosome is a Chromosome,
 chromosomeLocation is a Location,
 clones is a group of CDNAClone objects, which link back to this as gene,
 crossReferences is a group of CrossReference objects, which link back to this as subject,
 cytoLocation is a String,
 dataSets is a group of DataSet objects, which link back to this as bioEntities,
 description is a String,
 diseases is a group of Disease objects, which link back to this as genes,
 downstreamIntergenicRegion is a IntergenicRegion,
 exons is a group of Exon objects, which link back to this as gene,
 flankingRegions is a group of GeneFlankingRegion objects, which link back to this as gene,
 goAnnotation is a group of GOAnnotation objects,
 homologues is a group of Homologue objects, which link back to this as gene,
 id is a Integer,
 interactions is a group of Interaction objects, which link back to this as participant1,
 introns is a group of Intron objects, which link back to this as genes,
 length is a Integer,
 locatedFeatures is a group of Location objects, which link back to this as locatedOn,
 locations is a group of Location objects, which link back to this as feature,
 mRNAExpressionResults is a group of MRNAExpressionResult objects, which link back to this as gene,
 miRNAtargets is a group of MiRNATarget objects, which link back to this as mirnagene,
 microArrayResults is a group of MicroArrayResult objects, which link back to this as genes,
 name is a String,
 ontologyAnnotations is a group of OntologyAnnotation objects, which link back to this as subject,
 organism is a Organism,
 overlappingFeatures is a group of SequenceFeature objects,
 pathways is a group of Pathway objects, which link back to this as genes,
 primaryIdentifier is a String,
 probeSets is a group of ProbeSet objects, which link back to this as genes,
 proteins is a group of Protein objects, which link back to this as genes,
 publications is a group of Publication objects, which link back to this as entities,
 regulatoryRegions is a group of RegulatoryRegion objects, which link back to this as gene,
 rnaSeqResults is a group of RNASeqResult objects, which link back to this as gene,
 rnaiResults is a group of RNAiResult objects, which link back to this as gene,
 score is a Double,
 scoreType is a String,
 secondaryIdentifier is a String,
 sequence is a Sequence,
 sequenceOntologyTerm is a SOTerm,
 strain is a Strain, which links back to this as features,
 symbol is a String,
 synonyms is a group of Synonym objects, which link back to this as subject,
 transcripts is a group of Transcript objects, which link back to this as gene,
 upstreamIntergenicRegion is a IntergenicRegion]

Suppose instead that you have a class name and want the details of one particular field it contains. You can do the following:

In [2]:
datatype = model.get_class("Protein")
datatype.get_field('genes')
Out[2]:
genes is a group of Gene objects, which link back to this as proteins

Note that if you enter a field name that does not belong to the class, then you will receive an error.

In [3]:
datatype.get_field('puppies')
---------------------------------------------------------------------------
ModelError                                Traceback (most recent call last)
<ipython-input-3-dd69e236017c> in <module>
----> 1 datatype.get_field('puppies')

~/opt/anaconda3/lib/python3.7/site-packages/intermine-1.12.0-py3.7.egg/intermine/model.py in get_field(self, name)
    330         else:
    331             raise ModelError("There is no field called %s in %s" %
--> 332                              (name, self.name))
    333 
    334     def isa(self, other):

ModelError: 'There is no field called puppies in Protein'
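
If you would rather test for a field than let the error stop your notebook, one option is to catch the exception. The following is a minimal sketch, not part of the library: find_field is a hypothetical helper, and FakeProtein is a hypothetical stand-in for model.get_class("Protein") so that the example runs without a network connection. The fallback ModelError class is only used when intermine is not installed.

```python
try:
    from intermine.model import ModelError
except ImportError:
    class ModelError(Exception):
        """Stand-in used only when intermine is not installed."""


def find_field(datatype, name):
    """Return the field descriptor, or None if the class has no such field."""
    try:
        return datatype.get_field(name)
    except ModelError:
        return None


class FakeProtein:
    """Hypothetical stand-in for model.get_class("Protein")."""
    name = "Protein"
    _fields = {"genes": "genes is a group of Gene objects"}

    def get_field(self, name):
        if name in self._fields:
            return self._fields[name]
        raise ModelError("There is no field called %s in %s" % (name, self.name))


print(find_field(FakeProtein(), "genes"))
print(find_field(FakeProtein(), "puppies"))
```

With a live service you would pass the real class instead, for example find_field(model.get_class("Protein"), 'puppies').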

Now suppose you want to find out the nature of each field (i.e. whether it is an 'Attribute' or a 'Collection'/'Reference'). You can iterate over the fields of the class and test each one:

In [4]:
datatype = model.get_class("Protein")
for field in datatype.fields:
    # str(field) yields the bare field name, e.g. "genes"
    name = str(field)
    path = model.make_path("Protein." + name)
    # Collections are reported as references, so both print REFERENCE
    if path.is_reference():
        print(name, '-', 'REFERENCE')
    if path.is_attribute():
        print(name, '-', 'ATTRIBUTE')
        
CDSs - REFERENCE
canonicalProtein - REFERENCE
comments - REFERENCE
components - REFERENCE
crossReferences - REFERENCE
dataSets - REFERENCE
ecNumber - ATTRIBUTE
ecNumbers - REFERENCE
features - REFERENCE
genbankIdentifier - ATTRIBUTE
genes - REFERENCE
id - ATTRIBUTE
interactions - REFERENCE
isFragment - ATTRIBUTE
isUniprotCanonical - ATTRIBUTE
isoforms - REFERENCE
keywords - REFERENCE
length - ATTRIBUTE
locatedFeatures - REFERENCE
locations - REFERENCE
md5checksum - ATTRIBUTE
molecularWeight - ATTRIBUTE
name - ATTRIBUTE
ontologyAnnotations - REFERENCE
organism - REFERENCE
pathways - REFERENCE
primaryAccession - ATTRIBUTE
primaryIdentifier - ATTRIBUTE
proteinDomainRegions - REFERENCE
publications - REFERENCE
secondaryIdentifier - ATTRIBUTE
sequence - REFERENCE
symbol - ATTRIBUTE
synonyms - REFERENCE
transcripts - REFERENCE
uniprotAccession - ATTRIBUTE
uniprotName - ATTRIBUTE

Notice that we used make_path() in the process. This function helps us construct a path and check whether it is valid or, as here, use the path to get more information from the model.
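
The same loop can collect the names instead of printing them, which is handy if you want to reuse the result. This is a minimal sketch under stated assumptions: group_fields is a hypothetical helper that relies only on the calls used above (get_class, fields, make_path, is_attribute, is_reference), and FakePath, FakeClass and FakeModel are hypothetical stand-ins with three made-up entries so that the example runs offline.

```python
from collections import defaultdict


def group_fields(model, class_name):
    """Map 'ATTRIBUTE' / 'REFERENCE' to lists of field names."""
    groups = defaultdict(list)
    for field in model.get_class(class_name).fields:
        name = str(field)
        path = model.make_path(class_name + "." + name)
        if path.is_attribute():
            groups["ATTRIBUTE"].append(name)
        elif path.is_reference():
            groups["REFERENCE"].append(name)
    return dict(groups)


class FakePath:
    """Hypothetical stand-in for the object returned by make_path()."""
    def __init__(self, kind):
        self._kind = kind

    def is_attribute(self):
        return self._kind == "ATTRIBUTE"

    def is_reference(self):
        return self._kind == "REFERENCE"


class FakeClass:
    """Hypothetical stand-in for a class descriptor."""
    def __init__(self, fields):
        self.fields = fields


class FakeModel:
    """Hypothetical stand-in for service.model with three sample fields."""
    KINDS = {"genes": "REFERENCE", "length": "ATTRIBUTE", "name": "ATTRIBUTE"}

    def get_class(self, class_name):
        return FakeClass(sorted(self.KINDS))

    def make_path(self, path_string):
        return FakePath(self.KINDS[path_string.split(".")[-1]])


groups = group_fields(FakeModel(), "Protein")
print(groups)
```

Against a real mine the call would be group_fields(service.model, "Protein").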

Now let's look at what information we can get about the inheritance of a class.

The isa(input) function tells us whether 'input' belongs to the ancestry of a given class, that is, whether it is a parent class, a parent of a parent class, and so on.

In [5]:
datatype = model.get_class("Protein")
datatype.isa('Allele')
Out[5]:
False

Thus Allele is not an ancestor of Protein.

Now let's see how we can retrieve the ancestry of a particular class:

In [6]:
datatype = model.get_class("Protein")
classes = model.to_ancestry(datatype)
for i in classes:
    print(i.name)
BioEntity
Annotatable
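
To make the idea of ancestry concrete, here is a minimal sketch of a wrapper that returns the ancestor names as a list. ancestor_names is a hypothetical helper built on get_class and to_ancestry as used above. FakeClass and FakeModel are hypothetical stand-ins, and the parent table in FakeModel is an assumption chosen to mirror the output shown, not a copy of the real FlyMine model; its traversal only illustrates "parent, then parent of parent" and is not the library's implementation.

```python
def ancestor_names(model, class_name):
    """Return the names of every ancestor of class_name."""
    datatype = model.get_class(class_name)
    return [ancestor.name for ancestor in model.to_ancestry(datatype)]


class FakeClass:
    """Hypothetical stand-in for a class descriptor."""
    def __init__(self, name):
        self.name = name


class FakeModel:
    """Hypothetical stand-in for service.model with an assumed parent table."""
    PARENTS = {
        "Protein": ["BioEntity"],
        "BioEntity": ["Annotatable"],
        "Annotatable": [],
    }

    def get_class(self, name):
        return FakeClass(name)

    def to_ancestry(self, datatype):
        # Walk upwards: direct parents first, then their parents, and so on.
        found = []
        queue = list(self.PARENTS[datatype.name])
        while queue:
            name = queue.pop(0)
            if name not in found:
                found.append(name)
                queue.extend(self.PARENTS[name])
        return [FakeClass(name) for name in found]


names = ancestor_names(FakeModel(), "Protein")
print(names)
print("Allele" in names)
```

With a live service, ancestor_names(service.model, "Protein") would give the same information as the loop in the cell above.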

This tutorial explained how to get information about a class, i.e. its fields and the inheritance it shares, without having to retrieve the entire model!