In this notebook, we'll take a look at EventVestor's Clinical Trials dataset, available on the Quantopian Store. This dataset spans January 01, 2007 through the current day, and documents announcements of key phases of clinical trials by biotech/pharmaceutical companies.
Before we dig into the data, here is how Quantopian Store datasets are generally accessed. They are available through an API service known as Blaze, which gives Quantopian users a convenient interface to very large datasets.
Blaze serves an important purpose here: some of these datasets contain many millions of records, and bringing that data directly into Quantopian Research is not viable. Blaze instead provides a simple querying interface and shifts the burden of computation to the server side.
It is common to use Blaze to reduce your dataset in size, convert it to pandas, and then use pandas for further computation, manipulation, and visualization.
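To make the reduce-then-convert pattern concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the remote backend. This is an analogy under that assumption, not how Blaze is implemented: the row filter and column selection run inside the database, and only the small result is handed back for local work. The table is a simplified copy of the sample records shown later in this notebook.

```python
import sqlite3

# Assumption: an in-memory SQLite database plays the role of the remote data server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clinical_trials (sid INTEGER, clinical_phase TEXT, product_name TEXT)"
)
conn.executemany(
    "INSERT INTO clinical_trials VALUES (?, ?, ?)",
    [
        (3871, "Phase I", "IMC-3G3"),
        (24847, "Phase II", "Pertuzumab"),
        (8763, "Phase III", "Allovectin-7"),
    ],
)

# Reduce "server side": the WHERE clause and the column list run inside the database.
rows = conn.execute(
    "SELECT sid, product_name FROM clinical_trials WHERE clinical_phase = ?",
    ("Phase III",),
).fetchall()

# Continue locally with the much smaller result.
products = dict(rows)
print(products)  # {8763: 'Allovectin-7'}
```

Only one row crosses the "network" here; with millions of records, that difference is the whole point of filtering before converting.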
Once you've limited the size of your Blaze expression, you can convert it to a pandas DataFrame using:
import pandas as pd
from odo import odo
df = odo(expr, pd.DataFrame)  # expr is your size-limited Blaze expression
One other key caveat: to protect against runaway memory usage, we limit the number of results returned from any given expression to 10,000. To be clear, all of the data remains accessible server side; only the size of the responses sent back from Blaze is limited.
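One way to live within the 10,000-row cap is to check whether a result came back exactly at the limit (a sign it may have been cut off) and, if so, split the query into smaller date windows. The sketch below covers only that bookkeeping in plain Python; both helper names are hypothetical, and each window would still need to become its own filtered Blaze expression.

```python
from datetime import date

ROW_LIMIT = 10_000  # per-expression response cap described above


def possibly_truncated(n_rows, limit=ROW_LIMIT):
    """Hypothetical helper: a result at (or above) the cap may be incomplete."""
    return n_rows >= limit


def yearly_windows(start_year, end_year):
    """Hypothetical helper: half-open [Jan 1, next Jan 1) windows, one per year."""
    return [(date(y, 1, 1), date(y + 1, 1, 1)) for y in range(start_year, end_year + 1)]


# The dataset starts in 2007; query one year at a time if a full pull hits the cap.
windows = yearly_windows(2007, 2009)
print(possibly_truncated(10_000), possibly_truncated(9_999))  # True False
print(windows[0])  # (datetime.date(2007, 1, 1), datetime.date(2008, 1, 1))
```

Half-open windows avoid counting an event twice when it falls exactly on a boundary date.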
There is a free version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.
With the preamble in place, let's get started:
# import the dataset
from quantopian.interactive.data.eventvestor import clinical_trials
# or, to import the free sample of the dataset instead, use:
# from quantopian.interactive.data.eventvestor import clinical_trials_free
# import data operations
from odo import odo
# import other libraries we will use
import pandas as pd
# Use Blaze's dshape attribute to get a sense of the data's schema
clinical_trials.dshape
dshape("""var * { event_id: ?float64, asof_date: datetime, trade_date: ?datetime, symbol: ?string, event_type: ?string, event_headline: ?string, clinical_phase: ?string, clinical_scope: ?string, clinical_result: ?string, product_name: ?string, event_rating: ?float64, timestamp: datetime, sid: ?int64 }""")
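In a datashape, a leading ? marks an option (nullable) type, so the schema above says that only asof_date and timestamp are guaranteed to be present in every record. The short sketch below pulls that out programmatically with a hand-rolled regular expression; it is not the datashape library's own parser and only handles flat records like this one.

```python
import re

# The dshape string printed above, copied verbatim.
DSHAPE = (
    "var * { event_id: ?float64, asof_date: datetime, trade_date: ?datetime, "
    "symbol: ?string, event_type: ?string, event_headline: ?string, "
    "clinical_phase: ?string, clinical_scope: ?string, clinical_result: ?string, "
    "product_name: ?string, event_rating: ?float64, timestamp: datetime, sid: ?int64 }"
)


def parse_record_fields(dshape_text):
    """Return (name, type, nullable) for each field of a flat 'var * { ... }' record."""
    body = dshape_text[dshape_text.index("{") + 1 : dshape_text.rindex("}")]
    fields = []
    for name, typ in re.findall(r"(\w+):\s*(\??\w+)", body):
        nullable = typ.startswith("?")  # '?' marks an option (nullable) type
        fields.append((name, typ.lstrip("?"), nullable))
    return fields


fields = parse_record_fields(DSHAPE)
required = [name for name, _, nullable in fields if not nullable]
print(required)  # ['asof_date', 'timestamp']
```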
# And how many rows are there?
# N.B. we're using a Blaze function to do this, not len()
clinical_trials.count()
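Counting on the server matters because of the 10,000-row response cap: if you pulled the rows back and called len() on them, you would count at most the rows that were returned. The sketch below illustrates this with sqlite3 standing in for the backend, assuming the cap behaves like a SQL LIMIT; it is an analogy, not Blaze's internals.

```python
import sqlite3

ROW_LIMIT = 10_000  # assumed to behave like a SQL LIMIT on each response

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id INTEGER)")
# 25,000 synthetic ids, purely to exceed the cap; these are not real dataset rows.
conn.executemany("INSERT INTO events VALUES (?)", ((i,) for i in range(25_000)))

# Aggregate on the "server": a single number comes back.
(server_count,) = conn.execute("SELECT COUNT(*) FROM events").fetchone()

# Pull rows back under the cap and count locally: this silently undercounts.
capped_rows = conn.execute("SELECT event_id FROM events LIMIT ?", (ROW_LIMIT,)).fetchall()
local_count = len(capped_rows)

print(server_count, local_count)  # 25000 10000
```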
# Let's see what the data looks like. We'll grab the first three rows.
clinical_trials[:3]
| | event_id | asof_date | trade_date | symbol | event_type | event_headline | clinical_phase | clinical_scope | clinical_result | product_name | event_rating | timestamp | sid |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 138303 | 2007-01-03 | 2007-01-03 | IMCL | Clinical Trials | ImClone Systems Commences Patient Treatment in... | Phase I | NaN | NaN | IMC-3G3 | 1 | 2007-01-04 | 3871 |
1 | 138180 | 2007-01-04 | 2007-01-04 | DNA | Clinical Trials | Genentech Announces Positive Results From Rand... | Phase II | NaN | Positive | Pertuzumab | 1 | 2007-01-05 | 24847 |
2 | 952759 | 2007-01-04 | 2007-01-04 | VICL | Clinical Trials | Vical Initiates Pivotal Phase 3 Trial of Allov... | Phase III | NaN | NaN | Allovectin-7 | 1 | 2007-01-05 | 8763 |
Let's go over the columns:
We've done much of the data processing for you. Fields like timestamp and sid are standardized across all our Store datasets, so the datasets are easy to combine; in particular, sid is standardized across all our equity databases.
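Because sid is a shared key, combining two datasets amounts to a join on sid. The sketch below does an inner join in plain Python on values copied from outputs in this notebook (the symbols of the first three clinical_trials rows and a few phase_three rows); the helper name is hypothetical. With DataFrames you would typically reach for pandas.merge(..., on='sid') instead.

```python
# sid -> symbol, from the first three clinical_trials rows in this notebook.
symbols_by_sid = {3871: "IMCL", 24847: "DNA", 8763: "VICL"}

# (timestamp, sid, product_name), from the phase_three output in this notebook.
phase_three_rows = [
    ("2007-01-05", 8763, "Allovectin-7"),
    ("2007-01-09", 1416, "FENTORA"),
    ("2007-01-11", 3871, "ERBITUX"),
    ("2007-02-23", 24847, "Avastin"),
]


def inner_join_on_sid(rows, lookup):
    """Hypothetical helper: keep rows whose sid is in lookup and append its value."""
    return [(ts, sid, product, lookup[sid]) for ts, sid, product in rows if sid in lookup]


joined = inner_join_on_sid(phase_three_rows, symbols_by_sid)
for row in joined:
    print(row)
```

The FENTORA row (sid 1416) drops out because that sid has no entry in the lookup, which is exactly what an inner join does.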
We can select columns and rows with ease. Below, we'll fetch all Phase III announcements and display only the timestamp, sid, and drug name (product_name) columns.
phase_three = clinical_trials[clinical_trials.clinical_phase == "Phase III"][['timestamp', 'sid','product_name']].sort('timestamp')
# Displaying a Blaze expression shows only a truncated preview of the result, not every row.
phase_three
| | timestamp | sid | product_name |
---|---|---|---|
0 | 2007-01-05 | 8763 | Allovectin-7 |
1 | 2007-01-09 | 1416 | FENTORA |
2 | 2007-01-11 | 3871 | ERBITUX |
3 | 2007-01-25 | 8763 | Allovectin-7 |
4 | 2007-02-09 | 24415 | Xibrom |
5 | 2007-02-23 | 24847 | Avastin |
6 | 2007-04-05 | 3871 | ERBITUX (Cetuximab) |
7 | 2007-04-11 | 3871 | ERBITUX |
8 | 2007-04-17 | 3871 | ERBITUX (Cetuximab) |
9 | 2007-04-26 | 23846 | BEMA Fentanyl |
10 | 2007-04-27 | 5847 | Nuvion |
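The Blaze expression above applies a row filter, a column projection, and a sort, in that order. For readers who want those three steps spelled out, here is a plain-Python equivalent run on the three sample records shown earlier; it is a sketch of the logic only, since Blaze evaluates the expression on the server instead. ISO-formatted date strings sort chronologically, which is why a plain string sort on timestamp works here.

```python
from operator import itemgetter

# The three sample records shown earlier, reduced to the fields used here.
records = [
    {"timestamp": "2007-01-04", "sid": 3871, "clinical_phase": "Phase I", "product_name": "IMC-3G3"},
    {"timestamp": "2007-01-05", "sid": 24847, "clinical_phase": "Phase II", "product_name": "Pertuzumab"},
    {"timestamp": "2007-01-05", "sid": 8763, "clinical_phase": "Phase III", "product_name": "Allovectin-7"},
]


def select_phase(rows, phase, columns):
    """Filter rows by clinical_phase, keep only `columns`, then sort by timestamp."""
    matching = (r for r in rows if r["clinical_phase"] == phase)  # row filter
    projected = [{c: r[c] for c in columns} for r in matching]    # column projection
    return sorted(projected, key=itemgetter("timestamp"))         # ascending sort


result = select_phase(records, "Phase III", ["timestamp", "sid", "product_name"])
print(result)
```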
Finally, suppose we want a DataFrame of GlaxoSmithKline Phase III announcements, sorted by date in descending order:
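# look up GSK's sid with the research environment's symbols() function
gsk_sid = symbols('GSK').sid
# filter to GSK on the server side, newest announcements first
gsk = clinical_trials[clinical_trials.sid == gsk_sid].sort('timestamp', ascending=False)
gsk_df = odo(gsk, pd.DataFrame)
# now filter down to the Phase III trials
gsk_df = gsk_df[gsk_df.clinical_phase == "Phase III"]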
gsk_df
| | event_id | asof_date | trade_date | symbol | event_type | event_headline | clinical_phase | clinical_scope | clinical_result | product_name | event_rating | timestamp | sid |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1937202 | 2015-09-27 | 2015-09-28 | GSK | Clinical Trials | GlaxoSmithKline Reports Positive Results From ... | Phase III | NaN | Positive | Anoro Ellipta | 1 | 2015-09-29 11:17:13.121838 | 3242 |
3 | 1836852 | 2015-02-09 | 2015-02-09 | GSK | Clinical Trials | GlaxoSmithKline and Theravance Initiate Phase ... | Phase III | NaN | NaN | fluticasone furoate/umeclidinium/vilanterol ( | 1 | 2015-02-10 00:00:00 | 3242 |
4 | 1817331 | 2014-12-18 | 2014-12-18 | GSK | Clinical Trials | GlaxoSmithKline Reports ZOE-50 Phase 3 Study M... | Phase III | NaN | Positive | ZOE-50 | 1 | 2014-12-19 00:00:00 | 3242 |
5 | 1745987 | 2014-07-16 | 2014-07-16 | GSK | Clinical Trials | GlaxoSmithKline & Theravance Initiates Phase I... | Phase III | NaN | NaN | IMPACT | 1 | 2014-07-17 00:00:00 | 3242 |
6 | 1738566 | 2014-06-25 | 2014-06-25 | GSK | Clinical Trials | GlaxoSmithKline Initiates Phase 3 Study with E... | Phase III | NaN | NaN | Eltrombopag | 1 | 2014-06-26 00:00:00 | 3242 |
7 | 1735091 | 2014-06-13 | 2014-06-13 | GSK | Clinical Trials | GlaxoSmithKline Reports Phase 3 PETIT2 Study M... | Phase III | NaN | Positive | PETIT2 | 1 | 2014-06-14 00:00:00 | 3242 |
8 | 1734216 | 2014-06-11 | 2014-06-11 | GSK | Clinical Trials | GlaxoSmithKline Reports Positive Results from ... | Phase III | NaN | Positive | Incruse Ellipta | 1 | 2014-06-12 00:00:00 | 3242 |
10 | 1707265 | 2014-04-22 | 2014-04-22 | GSK | Clinical Trials | GlaxoSmithKline and Theravance Starts Phase II... | Phase III | NaN | NaN | FF/VI | 1 | 2014-04-23 00:00:00 | 3242 |
11 | 1700157 | 2014-04-02 | 2014-04-02 | GSK | Clinical Trials | GlaxoSmithKline to Stop MAGE-A3 Cancer Immunot... | Phase III | NaN | NaN | MAGRITi | 1 | 2014-04-03 00:00:00 | 3242 |
12 | 1695526 | 2014-03-20 | 2014-03-20 | GSK | Clinical Trials | GlaxoSmithKline's MAGE-A3 Cancer Immunotherape... | Phase III | NaN | Negative | MAGE-A3 | 1 | 2014-03-21 00:00:00 | 3242 |
13 | 1693181 | 2014-03-14 | 2014-03-14 | GSK | Clinical Trials | GlaxoSmithKline & Theravance Reports Positve R... | Phase III | NaN | Positive | Anoro Ellipta | 1 | 2014-03-15 00:00:00 | 3242 |
15 | 1653485 | 2013-12-06 | 2013-12-06 | GSK | Clinical Trials | GlaxoSmithKline and Theravance Announces Posit... | Phase III | NaN | Positive | Fluticasone Furoate | 1 | 2013-12-07 00:00:00 | 3242 |
17 | 1647384 | 2013-11-12 | 2013-11-12 | GSK | Clinical Trials | GlaxoSmithKline Announces Phase III Stability ... | Phase III | NaN | Negative | Darapladib | 1 | 2013-11-13 00:00:00 | 3242 |
18 | 1620476 | 2013-09-05 | 2013-09-05 | GSK | Clinical Trials | GlaxoSmithKline's MAGE-A3 Vaccine Fails to Mee... | Phase III | NaN | Negative | MAGE-A3 | 1 | 2013-09-06 00:00:00 | 3242 |
19 | 1521603 | 2012-12-19 | 2012-12-20 | GSK | Clinical Trials | GlaxoSmithKline, Amicus Therapeutics Announce ... | Phase III | NaN | Negative | Migalastat HCl | 1 | 2012-12-20 00:00:00 | 3242 |
21 | 1474291 | 2012-08-24 | 2012-08-24 | GSK | Clinical Trials | GlaxoSmithKline, Theravance Complete Phase III... | Phase III | NaN | NaN | LAMA/LABA | 1 | 2012-08-25 00:00:00 | 3242 |
22 | 1451483 | 2012-07-11 | 2012-07-11 | GSK | Clinical Trials | Shionogi-ViiV Healthcare Reports Positive Init... | Phase III | NaN | Positive | ING114467 | 1 | 2012-07-12 00:00:00 | 3242 |
23 | 1451624 | 2012-07-11 | 2012-07-11 | GSK | Clinical Trials | GlaxoSmithKline Reports Positive Results in Ph... | Phase III | NaN | Positive | Albiglutide | 1 | 2012-07-12 00:00:00 | 3242 |
24 | 1448947 | 2012-07-02 | 2012-07-02 | GSK | Clinical Trials | Theravance and GlaxoSmithKline Report Positive... | Phase III | NaN | Positive | LAMA/LABA | 1 | 2012-07-03 00:00:00 | 3242 |
26 | 1414886 | 2012-04-03 | 2012-04-03 | GSK | Clinical Trials | GlaxoSmithKline Reports Further Positive Resul... | Phase III | NaN | Positive | Albiglutide | 1 | 2012-04-04 00:00:00 | 3242 |
27 | 1381734 | 2012-01-09 | 2012-01-09 | GSK | Clinical Trials | GlaxoSmithKline, Theravance Report Initial Res... | Phase III | NaN | Partial | Relovair | 1 | 2012-01-10 00:00:00 | 3242 |
28 | 1376242 | 2011-12-15 | 2011-12-15 | GSK | Clinical Trials | GlaxoSmithKline and Human Genome Initiate Phas... | Phase III | NaN | NaN | BENLYSTA | 1 | 2011-12-16 00:00:00 | 3242 |
29 | 1352859 | 2011-10-18 | 2011-10-18 | GSK | Clinical Trials | GlaxoSmithKline Reports Positive Results from ... | Phase III | NaN | Positive | RTS,S | 1 | 2011-10-19 00:00:00 | 3242 |
30 | 1351145 | 2011-10-11 | 2011-10-11 | GSK | Clinical Trials | GlaxoSmithKline and Pfizer JV Initiates Phase ... | Phase III | NaN | NaN | Celsentri/Selzentry; emtricitabine/tenofovir | 1 | 2011-10-12 00:00:00 | 3242 |
31 | 1336573 | 2011-09-12 | 2011-09-12 | GSK | Clinical Trials | GlaxoSmithKline and Amicus Therapeutics Initia... | Phase III | NaN | NaN | Amigal | 1 | 2011-09-13 00:00:00 | 3242 |
32 | 1335427 | 2011-08-15 | 2011-08-15 | GSK | Clinical Trials | GlaxoSmithKline Reports Positive Results for I... | Phase III | NaN | Positive | IPX066 | 1 | 2011-08-16 00:00:00 | 3242 |
33 | 1332195 | 2011-07-26 | 2011-07-26 | GSK | Clinical Trials | GlaxoSmithKline Reports Positive Data from Pro... | Phase III | NaN | Positive | ENABLE-1 | 1 | 2011-07-27 00:00:00 | 3242 |
34 | 1301739 | 2011-06-02 | 2011-06-02 | GSK | Clinical Trials | GlaxoSmithKline and Theravance Announce Positi... | Phase III | NaN | Positive | Relovair | 1 | 2011-06-03 00:00:00 | 3242 |
36 | 1249526 | 2011-02-07 | 2011-02-07 | GSK | Clinical Trials | GlaxoSmithKline, Human Genome Announce Positiv... | Phase III | NaN | Positive | BENLYSTA | 1 | 2011-02-08 00:00:00 | 3242 |
37 | 1249289 | 2011-02-03 | 2011-02-03 | GSK | Clinical Trials | GSK and Theravance Announce Progression of LAM... | Phase III | NaN | Positive | GSK573719/vilanterol | 1 | 2011-02-04 00:00:00 | 3242 |
39 | 1188282 | 2010-10-21 | 2010-10-21 | GSK | Clinical Trials | GlaxoSmithKline JV Initiates Phase III Trial f... | Phase III | NaN | NaN | S/GSK1349572 | 1 | 2010-10-22 00:00:00 | 3242 |
42 | 1126301 | 2010-06-17 | 2010-06-17 | GSK | Clinical Trials | GlaxoSmithKline Announces Positive Result in P... | Phase III | NaN | Partial | BENLYSTA | 1 | 2010-06-18 00:00:00 | 3242 |
43 | 1126332 | 2010-06-17 | 2010-06-17 | GSK | Clinical Trials | GlaxoSmithKline Announces Positive Phase 3 Res... | Phase III | NaN | Positive | BENLYSTA | 1 | 2010-06-18 00:00:00 | 3242 |
44 | 1089424 | 2010-04-20 | 2010-04-20 | GSK | Clinical Trials | GlaxoSmithKline, Human Genome Announce Failure... | Phase III | Limited Indications | Negative | BENLYSTA | 1 | 2010-04-21 00:00:00 | 3242 |
46 | 1004093 | 2009-11-02 | 2009-11-02 | GSK | Clinical Trials | GlaxoSmithKline Reports Positive Results in Se... | Phase III | NaN | Positive | BENLYSTA | 1 | 2009-11-03 00:00:00 | 3242 |
47 | 1000032 | 2009-10-27 | 2009-10-27 | GSK | Clinical Trials | GlaxoSmithKline Commences Phase III Horizon Pr... | Phase III | NaN | NaN | COPD | 1 | 2009-10-28 00:00:00 | 3242 |
48 | 976852 | 2009-10-20 | 2009-10-20 | GSK | Clinical Trials | GlaxoSmithKline Announces Positive Phase3 Resu... | Phase III | NaN | Positive | Belimumab | 1 | 2009-10-21 00:00:00 | 3242 |
53 | 537694 | 2009-02-17 | 2009-02-17 | GSK | Clinical Trials | GSK Initiates Phase III Programme for Novel Ty... | Phase III | NaN | NaN | GLP-1 | 1 | 2009-02-18 00:00:00 | 3242 |
56 | 522433 | 2008-12-06 | 2008-12-08 | GSK | Clinical Trials | GSK, Valeant's Retigabine Reduces Seizures in ... | Phase III | NaN | Positive | Retigabine | 1 | 2008-12-07 00:00:00 | 3242 |
70 | 147654 | 2008-02-28 | 2008-02-28 | GSK | Clinical Trials | GlaxoSmithKline and XenoPort Get Positive Resu... | Phase III | NaN | Positive | XP13512 | 1 | 2008-02-29 00:00:00 | 3242 |
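Once the data is in pandas, summary statistics are cheap to compute locally; for example, gsk_df.clinical_result.value_counts(dropna=False) would tally outcomes including missing ones. The same tally is sketched below in plain Python with collections.Counter over the clinical_result column copied from the table above, with None standing in for NaN.

```python
from collections import Counter

# clinical_result for each GSK Phase III row above, top to bottom; None stands in for NaN.
results = [
    "Positive", None, "Positive", None, None, "Positive", "Positive", None, None, "Negative",
    "Positive", "Positive", "Negative", "Negative", "Negative", None, "Positive", "Positive",
    "Positive", "Positive",
    "Partial", None, "Positive", None, None, "Positive", "Positive", "Positive", "Positive",
    "Positive",
    None, "Partial", "Positive", "Negative", "Positive", None, "Positive", None, "Positive",
    "Positive",
]

tally = Counter(results)
print(tally.most_common())
```

Of the 40 Phase III announcements listed, 21 are tagged Positive, 5 Negative, 2 Partial, and 12 have no result recorded; judging by their headlines, most of those 12 announce the start of a trial rather than its outcome.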