Intermine-Python: Tutorial 6: Advanced Results Management

In the previous tutorials we learnt how to add filters to limit the query results. However, it's often useful to view the results without filters, to get an idea about the shape of the data. This tutorial will show you how you can process, sort and analyse the query results.

We begin by creating a query object. Our example for this tutorial uses RNA-seq expression results (the RNASeqResult class).

In [1]:
from intermine.webservice import Service
In [2]:
service = Service("https://www.flymine.org/flymine/service")
In [3]:
query = service.new_query()
In [4]:
query.add_views("RNASeqResult.expressionScore RNASeqResult.expressionLevel RNASeqResult.gene.symbol")
Out[4]:
<intermine.query.Query at 0x7fb6a9c70dd8>

We will now sort the results in descending order of their expression score.

In [5]:
query.add_sort_order("RNASeqResult.expressionScore", "desc")
Out[5]:
<intermine.query.Query at 0x7fb6a9c70dd8>

We will now print the first ten rows.

In [6]:
for r in query.rows(size=10):
    print(r)
RNASeqResult: expressionScore=273318 expressionLevel='Extremely high expression' gene.symbol='Sgs4'
RNASeqResult: expressionScore=162095 expressionLevel='Extremely high expression' gene.symbol='Yp1'
RNASeqResult: expressionScore=143098 expressionLevel='Extremely high expression' gene.symbol='Yp1'
RNASeqResult: expressionScore=132342 expressionLevel='Extremely high expression' gene.symbol='lncRNA:CR40469'
RNASeqResult: expressionScore=130637 expressionLevel='Extremely high expression' gene.symbol='Sgs7'
RNASeqResult: expressionScore=119145 expressionLevel='Extremely high expression' gene.symbol='Yp1'
RNASeqResult: expressionScore=116020 expressionLevel='Extremely high expression' gene.symbol='Yp1'
RNASeqResult: expressionScore=115148 expressionLevel='Extremely high expression' gene.symbol='Yp1'
RNASeqResult: expressionScore=114795 expressionLevel='Extremely high expression' gene.symbol='Yp1'
RNASeqResult: expressionScore=111450 expressionLevel='Extremely high expression' gene.symbol='Yp1'

Note that we have not added any constraints, so the query matches every available result; size=10 only limits how many rows are returned here.

Now, let's say that we want to sort all the results into three dictionaries (or maps): results with an expressionScore greater than 25 go into one, results with a score greater than 10 but less than or equal to 25 go into another, and all remaining results go into a third.

We begin by declaring these three dictionaries.

In [7]:
high_dict = {}
medium_dict = {}
low_dict = {}
In [8]:
for r in query.rows():
    # Bucket each row by its expression score.
    if r["expressionScore"] > 25:
        high_dict[r["gene.symbol"]] = r["expressionLevel"]
    elif r["expressionScore"] > 10:
        medium_dict[r["gene.symbol"]] = r["expressionLevel"]
    else:
        low_dict[r["gene.symbol"]] = r["expressionLevel"]
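
One caveat with the loop above: a gene symbol can occur in many rows (Yp1 appears several times in the first ten results), and a dictionary keyed on the symbol keeps only the last value written for each key. If every measurement should be retained, a sketch along the following lines may help. It assumes each row can be read like a mapping, as in the loop above. The sample_rows list is a stand-in built from the output printed earlier, and bucket_rows is a hypothetical helper name, not part of the intermine library.

```python
from collections import defaultdict

# Stand-in for query.rows(): values copied from rows printed earlier in this tutorial.
sample_rows = [
    {"expressionScore": 273318, "expressionLevel": "Extremely high expression", "gene.symbol": "Sgs4"},
    {"expressionScore": 162095, "expressionLevel": "Extremely high expression", "gene.symbol": "Yp1"},
    {"expressionScore": 143098, "expressionLevel": "Extremely high expression", "gene.symbol": "Yp1"},
    {"expressionScore": 130637, "expressionLevel": "Extremely high expression", "gene.symbol": "Sgs7"},
]


def bucket_rows(rows, high=25, medium=10):
    """Group scores per gene symbol into high/medium/low buckets (hypothetical helper)."""
    buckets = {
        "high": defaultdict(list),
        "medium": defaultdict(list),
        "low": defaultdict(list),
    }
    for r in rows:
        score = r["expressionScore"]
        if score > high:
            name = "high"
        elif score > medium:
            name = "medium"
        else:
            name = "low"
        # Append instead of assigning, so repeated gene symbols keep every score.
        buckets[name][r["gene.symbol"]].append(score)
    return buckets


buckets = bucket_rows(sample_rows)
print(dict(buckets["high"]))
```

With real data, query.rows() would take the place of sample_rows.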

If you now want to view the items stored in the high_dict dictionary, you can print them out as shown below.

In [ ]:
for k,v in high_dict.items():
    print(k,v)

Let's say that you want to compute the average score of the most highly expressed results, here taken to be those with an expressionScore above 1000. This can be done as follows.

In [ ]:
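total = 0
count = 0
for r in query.rows():
    if r["expressionScore"] > 1000:
        total += r["expressionScore"]
        count += 1
# Guard against division by zero when no row passes the threshold.
if count:
    print("Average is", total / count)
else:
    print("No rows with expressionScore > 1000")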
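
The same running-total approach can be extended to one average per gene rather than a single overall figure. The following is a minimal sketch under a few assumptions: rows are read like mappings, sample_rows is a stand-in copied from the ten rows printed earlier, and mean_score_by_gene is a hypothetical helper name rather than anything provided by the intermine library.

```python
from collections import defaultdict

# Stand-in for query.rows(): symbols and scores copied from the ten rows printed earlier.
sample_rows = [
    {"gene.symbol": "Sgs4", "expressionScore": 273318},
    {"gene.symbol": "Yp1", "expressionScore": 162095},
    {"gene.symbol": "Yp1", "expressionScore": 143098},
    {"gene.symbol": "lncRNA:CR40469", "expressionScore": 132342},
    {"gene.symbol": "Sgs7", "expressionScore": 130637},
    {"gene.symbol": "Yp1", "expressionScore": 119145},
    {"gene.symbol": "Yp1", "expressionScore": 116020},
    {"gene.symbol": "Yp1", "expressionScore": 115148},
    {"gene.symbol": "Yp1", "expressionScore": 114795},
    {"gene.symbol": "Yp1", "expressionScore": 111450},
]


def mean_score_by_gene(rows):
    """Return {gene symbol: mean expressionScore} (hypothetical helper)."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for r in rows:
        symbol = r["gene.symbol"]
        totals[symbol] += r["expressionScore"]
        counts[symbol] += 1
    # Every symbol in totals has a count of at least one, so the division is safe.
    return {symbol: totals[symbol] / counts[symbol] for symbol in totals}


means = mean_score_by_gene(sample_rows)
# Print genes from highest to lowest mean score.
for symbol, mean in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
    print(symbol, round(mean, 1))
```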

An advantage of extracting all the results first and then processing them on the client side is that you do not need to keep re-running the query with different constraints. If you change your mind about the constraints, you can iterate over the same results again and store them in a different collection simply by changing the "if" condition.
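
This workflow can be sketched as follows: keep the rows in a list, then apply as many client-side filters as needed without going back to the server. In a live session the list would typically be built once with list(query.rows()). Here sample_rows stands in for it, using values from the output printed earlier, and select is a hypothetical helper name.

```python
# Stand-in for list(query.rows()): symbols and scores copied from the ten rows printed earlier.
sample_rows = [
    {"gene.symbol": "Sgs4", "expressionScore": 273318},
    {"gene.symbol": "Yp1", "expressionScore": 162095},
    {"gene.symbol": "Yp1", "expressionScore": 143098},
    {"gene.symbol": "lncRNA:CR40469", "expressionScore": 132342},
    {"gene.symbol": "Sgs7", "expressionScore": 130637},
    {"gene.symbol": "Yp1", "expressionScore": 119145},
    {"gene.symbol": "Yp1", "expressionScore": 116020},
    {"gene.symbol": "Yp1", "expressionScore": 115148},
    {"gene.symbol": "Yp1", "expressionScore": 114795},
    {"gene.symbol": "Yp1", "expressionScore": 111450},
]

# Materialise the rows once; every later filter reuses this list.
rows = list(sample_rows)


def select(rows, min_score):
    """Return the rows whose expressionScore exceeds min_score (hypothetical helper)."""
    return [r for r in rows if r["expressionScore"] > min_score]


# Two different thresholds, applied to the same in-memory rows.
above_150k = select(rows, 150000)
above_120k = select(rows, 120000)
print(len(above_150k), len(above_120k))
```

Holding every row in memory trades RAM for speed, so for very large result sets it may be preferable to stream the rows and apply all the filters in a single pass.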