%matplotlib inline
from IPython.display import HTML
import matplotlib.pyplot as plt
import requests
import pandas as pd
import numpy as np
import seaborn as sns
sns.set_style('white')
sns.set_context('talk', font_scale=1.2)
HTML('<blockquote class="twitter-tweet" lang="he"><p lang="en" dir="ltr">Hmm, I don\'t know about this caterpillar rearing manual. I thought P.rapae had an obligate association w/ Brassica. <a href="http://t.co/M10dqbOYlN">pic.twitter.com/M10dqbOYlN</a></p>&mdash; Christie Bahlai (@cbahlai) <a href="https://twitter.com/cbahlai/status/597462491166150656">May 10, 2015</a></blockquote><script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>')
HTML('<blockquote class="twitter-tweet" lang="he"><p lang="en" dir="ltr">This is a terrible dataset about caterpillar diet. How did it got published? <a href="http://t.co/XkAq51HxEP">pic.twitter.com/XkAq51HxEP</a></p>&mdash; Timothée Poisot (@tpoi) <a href="https://twitter.com/tpoi/status/591041490618552320">April 23, 2015</a></blockquote><script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>')
HTML('<blockquote class="twitter-tweet" data-partner="tweetdeck"><p lang="und" dir="ltr"><a href="https://twitter.com/tpoi">@tpoi</a> <a href="https://twitter.com/kara_woo">@kara_woo</a> <a href="https://twitter.com/cbahlai">@cbahlai</a> <a href="http://t.co/5lj9EzuKjW">pic.twitter.com/5lj9EzuKjW</a></p>— Yoav Ram (@yoavram) <a href="https://twitter.com/yoavram/status/597518650082365440">May 10, 2015</a></blockquote><script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>')
HTML('<blockquote class="twitter-tweet" data-partner="tweetdeck"><p lang="en" dir="ltr">[blog] How hungry are caterpillars anyway? <a href="http://t.co/SvImkHYHhR">http://t.co/SvImkHYHhR</a> <a href="https://twitter.com/hashtag/opendata?src=hash">#opendata</a></p>— Timothée Poisot (@tpoi) <a href="https://twitter.com/tpoi/status/597518409203589122">May 10, 2015</a></blockquote><script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>')
We will learn how to use the Global Biotic Interactions (GloBI) API with Python to check "How hungry are caterpillars anyway?" (sort of).
First, have a look at the API and the API docs. It is a RESTful API that returns responses in JSON format over HTTP, which Python can parse into a dict.
Let's try it, following Poisot's lead on The Very Hungry Caterpillar.
We will use requests - a Python HTTP library for humans.
response = requests.get("http://api.globalbioticinteractions.org/interaction?sourceTaxon=Pieris&interactionType=eats")
print("OK:", response.ok)
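The query string in the request above can also be built from a dict instead of being typed by hand. Here is a minimal sketch using only the standard library; the helper name build_interaction_url is made up for this example, and the endpoint and parameter names are the ones used in the request above.

```python
from urllib.parse import urlencode

# Base endpoint taken from the request above.
BASE_URL = "http://api.globalbioticinteractions.org/interaction"


def build_interaction_url(source_taxon, interaction_type):
    """Return the query URL for one source taxon (hypothetical helper)."""
    # urlencode escapes special characters, e.g. a space becomes "+".
    query = urlencode({"sourceTaxon": source_taxon,
                       "interactionType": interaction_type})
    return BASE_URL + "?" + query


url = build_interaction_url("Pieris", "eats")
print(url)
```

With requests, the same dict can be passed as the params argument of requests.get, which does the encoding for us.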
The response payload is in JSON format. Calling the json method returns the payload as a dict:
payload = response.json()
print(len(payload))
print(payload.keys())
The response has two fields, columns and data, corresponding to the data frame's column names and rows. That's great because we can push it right into a pandas.DataFrame:
print(payload['columns'])
print(payload['data'][0])
df = pd.DataFrame(payload['data'], columns=payload['columns'])
print(df.shape)
df.head()
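To see why the columns/data layout maps so neatly onto a table, here is a small plain-Python sketch of the same pairing. The payload below is made up for illustration: only source_taxon_name and target_taxon_name appear in this notebook, the interaction_type column and all the values are assumptions, not real GloBI records.

```python
# Made-up payload with the same two-field shape as the API response;
# the values are illustrative only, not real GloBI records.
payload = {
    "columns": ["source_taxon_name", "interaction_type", "target_taxon_name"],
    "data": [
        ["Species A", "eats", "Plant X"],
        ["Species A", "eats", "Plant Y"],
        ["Species B", "eats", "Plant X"],
    ],
}

# Pair each row's values with the column names, one dict per row.
records = [dict(zip(payload["columns"], row)) for row in payload["data"]]

print(len(records))
print(records[0]["target_taxon_name"])
```

Each entry of data becomes one row, and the position of a value in the row decides which column it belongs to. That is the same pairing the DataFrame constructor does for us.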
Let's see what each caterpillar eats. We asked for the eats interactions only, so let's keep just the source and target taxon names:
cols = df.columns.tolist()
cols.remove('source_taxon_name')
cols.remove('target_taxon_name')
print(cols)
df.drop(labels=cols, axis=1, inplace=True)
df.head()
Next, we count how many target taxa occur for each source taxon. For that, we group by source and aggregate by length (I made sure beforehand that each source-target pair appears only once. How?).
The groupby made source_taxon_name become an index rather than a column, and that's why we call reset_index.
table = df.groupby(by='source_taxon_name').aggregate(len).reset_index()
table.head()
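Counting by length only gives the number of distinct diet items if no source-target pair is repeated. One way to check this, sketched here in plain Python on made-up pairs (not real data), is to compare the number of pairs with the number of unique pairs, and to count over sets so that repeats cannot inflate the result.

```python
from collections import defaultdict

# Illustrative (source, target) pairs, including one repeat; not real data.
pairs = [
    ("Species A", "Plant X"),
    ("Species A", "Plant Y"),
    ("Species A", "Plant X"),  # repeated pair
    ("Species B", "Plant X"),
]

# A repeat exists if deduplicating the pairs shortens the list.
has_duplicates = len(pairs) != len(set(pairs))

# Collect the distinct targets of each source, then count them.
targets_by_source = defaultdict(set)
for source, target in pairs:
    targets_by_source[source].add(target)
diet_size = {source: len(targets) for source, targets in targets_by_source.items()}

print(has_duplicates)
print(diet_size)
```

On a data frame, the comparable checks would be df.duplicated().any() to detect repeats and df.drop_duplicates() to remove them before grouping.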
Finally, we rename the columns to make them more meaningful and sort the table by the number of target taxa. Then we print and plot:
table = table.rename(columns={'source_taxon_name':'Pieris species', 'target_taxon_name': 'Number of known items in diet'})
table = table.sort_values('Number of known items in diet', ascending=False)
table
table.plot(x="Pieris species", y="Number of known items in diet", kind="barh", legend=False)
plt.xlabel('Number of known items in diet')
plt.grid(False)
sns.despine()
Pieris rapae
Pieris brassicae