In the previous tutorials we learnt how to add filters to limit the query results. However, it is often useful to view the results without filters first, to get an idea of the shape of the data. This tutorial shows how you can process, sort and analyse the query results.
We begin by creating a query object. Our example for this tutorial is going to be related to RNA Sequences.
from intermine.webservice import Service
service = Service("https://www.flymine.org/flymine/service")
query = service.new_query()
query.add_views("RNASeqResult.expressionScore RNASeqResult.expressionLevel RNASeqResult.gene.symbol")
<intermine.query.Query at 0x7fb6a9c70dd8>
We will now sort the results in descending order of their expression score.
query.add_sort_order("RNASeqResult.expressionScore", "desc")
<intermine.query.Query at 0x7fb6a9c70dd8>
We will now print the first ten rows.
for r in query.rows(size=10):
    print(r)
RNASeqResult: expressionScore=273318 expressionLevel='Extremely high expression' gene.symbol='Sgs4'
RNASeqResult: expressionScore=162095 expressionLevel='Extremely high expression' gene.symbol='Yp1'
RNASeqResult: expressionScore=143098 expressionLevel='Extremely high expression' gene.symbol='Yp1'
RNASeqResult: expressionScore=132342 expressionLevel='Extremely high expression' gene.symbol='lncRNA:CR40469'
RNASeqResult: expressionScore=130637 expressionLevel='Extremely high expression' gene.symbol='Sgs7'
RNASeqResult: expressionScore=119145 expressionLevel='Extremely high expression' gene.symbol='Yp1'
RNASeqResult: expressionScore=116020 expressionLevel='Extremely high expression' gene.symbol='Yp1'
RNASeqResult: expressionScore=115148 expressionLevel='Extremely high expression' gene.symbol='Yp1'
RNASeqResult: expressionScore=114795 expressionLevel='Extremely high expression' gene.symbol='Yp1'
RNASeqResult: expressionScore=111450 expressionLevel='Extremely high expression' gene.symbol='Yp1'
Note that we have not added any constraints, so the query matches every available result; the size=10 argument only limits how many rows are fetched and printed.
Now, let's say that we want to sort all the results into three different dictionaries (or maps): results with an expression score greater than 25 go into one map, results with an expression score greater than 10 but less than or equal to 25 go into a second map, and all the remaining results go into a third map.
We begin by declaring these three dictionaries and then fill them while iterating over the rows.
high_dict = {}
medium_dict = {}
low_dict = {}
# Without constraints this loop streams every row, so it can take a while.
for r in query.rows():
    score = r["expressionScore"]
    if score > 25:
        high_dict[r["gene.symbol"]] = r["expressionLevel"]
    elif score > 10:
        medium_dict[r["gene.symbol"]] = r["expressionLevel"]
    else:
        low_dict[r["gene.symbol"]] = r["expressionLevel"]
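One caveat of the loop above is that a dictionary keyed by gene symbol keeps only the last row seen for each gene, and the sample output shows the same symbol (for example Yp1) on several rows. The following is a minimal sketch of one alternative that keeps every score per gene. It runs on a small hand-written list of plain dictionaries rather than on a live query; the function name bucket_rows, its threshold parameters, and the rows marked as invented are assumptions made purely for illustration.

```python
from collections import defaultdict


def bucket_rows(rows, high=25, medium=10):
    """Group expression scores by gene symbol into high / medium / low buckets."""
    buckets = {
        "high": defaultdict(list),
        "medium": defaultdict(list),
        "low": defaultdict(list),
    }
    for r in rows:
        score = r["expressionScore"]
        if score > high:
            name = "high"
        elif score > medium:
            name = "medium"
        else:
            name = "low"
        # Append instead of assigning, so repeated gene symbols are not overwritten.
        buckets[name][r["gene.symbol"]].append(score)
    return buckets


sample_rows = [
    {"expressionScore": 273318, "gene.symbol": "Sgs4"},  # from the sample output
    {"expressionScore": 162095, "gene.symbol": "Yp1"},   # from the sample output
    {"expressionScore": 143098, "gene.symbol": "Yp1"},   # from the sample output
    {"expressionScore": 18, "gene.symbol": "geneA"},     # invented row
    {"expressionScore": 25, "gene.symbol": "geneB"},     # invented row on the boundary
    {"expressionScore": 3, "gene.symbol": "geneC"},      # invented row
]

buckets = bucket_rows(sample_rows)
for gene, scores in buckets["high"].items():
    print(gene, scores)
```

The function only indexes each row by "expressionScore" and "gene.symbol", which is the same access pattern the tutorial uses on the rows returned by query.rows(), so it should accept those rows as well.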
If you now want to view the items stored in the high_dict dictionary, you can print them out as shown below.
for k, v in high_dict.items():
    print(k, v)
Let's say that you want the average score of the very highly expressed results, taken here to be those with an expression score above 1000. This can be done as follows.
total = 0
count = 0
for r in query.rows():
    if r["expressionScore"] > 1000:
        total += r["expressionScore"]
        count += 1
# Guard against division by zero when no row passes the threshold.
if count:
    print("Average is", total / count)
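The same average can be written more compactly with the standard library. This is a minimal sketch, not part of the InterMine client: average_above is a hypothetical helper name, and the scores are the first five values from the sample output earlier in this tutorial rather than a full result set.

```python
from statistics import mean


def average_above(scores, threshold):
    """Return the mean of the scores strictly above threshold, or None if none qualify."""
    selected = [s for s in scores if s > threshold]
    return mean(selected) if selected else None


# First five expression scores from the sample output.
scores = [273318, 162095, 143098, 132342, 130637]

print("Average is", average_above(scores, 1000))
print("Average is", average_above(scores, 10**6))  # no score qualifies here
```

Returning None for an empty selection avoids the error that dividing by a zero count would raise.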
An advantage of extracting all the results first and then processing them on the client side is that you do not need to keep re-running the query with different constraints. If you change your mind about the constraints, iterate over the same results again and store them in a different dictionary or list by simply changing the "if" condition.
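To iterate over the same results more than once without contacting the server again, one option is to store the rows in a list and filter that list with different conditions. The sketch below illustrates the idea on a hand-written list that stands in for the fetched rows. The helper name split_by and the variable cached are hypothetical, and keeping everything in a list assumes the result set fits in memory.

```python
def split_by(rows, predicate):
    """Return (matching, rest) for an already-fetched list of rows."""
    matching, rest = [], []
    for r in rows:
        (matching if predicate(r) else rest).append(r)
    return matching, rest


# Stand-in for rows fetched once from the query (values taken from the sample output).
cached = [
    {"expressionScore": 273318, "gene.symbol": "Sgs4"},
    {"expressionScore": 132342, "gene.symbol": "lncRNA:CR40469"},
    {"expressionScore": 130637, "gene.symbol": "Sgs7"},
    {"expressionScore": 111450, "gene.symbol": "Yp1"},
]

# Two different "if" conditions applied to the same cached rows.
above_130k, _ = split_by(cached, lambda r: r["expressionScore"] > 130000)
sgs_genes, _ = split_by(cached, lambda r: r["gene.symbol"].startswith("Sgs"))

print([r["gene.symbol"] for r in above_130k])
print([r["gene.symbol"] for r in sgs_genes])
```

Changing the condition only means passing a different predicate; the cached rows themselves are not fetched again.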