In this notebook, we'll take a look at EventVestor's Impairments and Charges dataset, available on the Quantopian Store. This dataset spans January 01, 2007 through the current day, and documents goodwill impairments and other one time charges reported by companies.
Before we dig into the data, we want to tell you about how you generally access Quantopian Store data sets. These datasets are available through an API service known as Blaze. Blaze provides the Quantopian user with a convenient interface to access very large datasets.
Blaze provides an important function for accessing these datasets. Some of these sets are many millions of records. Bringing that data directly into Quantopian Research directly just is not viable. So Blaze allows us to provide a simple querying interface and shift the burden over to the server side.
It is common to use Blaze to reduce your dataset in size, convert it over to Pandas and then to use Pandas for further computation, manipulation and visualization.
Helpful links:
Once you've limited the size of your Blaze object, you can convert it to a Pandas DataFrames using:
from odo import odo
odo(expr, pandas.DataFrame)
One other key caveat: we limit the number of results returned from any given expression to 10,000 to protect against runaway memory usage. To be clear, you have access to all the data server side. We are limiting the size of the responses back from Blaze.
There is a free version of this dataset as well as a paid one. The free one includes about three years of historical data, though not up to the current day.
With preamble in place, let's get started:
# import the dataset
from quantopian.interactive.data.eventvestor import impairments_and_charges
# or if you want to import the free dataset, use:
# from quantopian.interactive.data.eventvestor import impairments_and_charges_free
# import data operations
from odo import odo
# import other libraries we will use
import pandas as pd
# Let's use blaze to understand the data a bit using Blaze dshape()
impairments_and_charges.dshape
dshape("""var * { event_id: ?float64, asof_date: datetime, trade_date: ?datetime, symbol: ?string, event_type: ?string, event_headline: ?string, charge_amount: ?float64, amount_units: ?string, event_rating: ?float64, timestamp: datetime, sid: ?int64 }""")
# And how many rows are there?
# N.B. we're using a Blaze function to do this, not len()
impairments_and_charges.count()
# Let's see what the data looks like. We'll grab the first three rows.
impairments_and_charges[:3]
event_id | asof_date | trade_date | symbol | event_type | event_headline | charge_amount | amount_units | event_rating | timestamp | sid | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 131321 | 2007-01-05 | 2007-01-08 | GT | Impairments/Charges | Goodyear To Record $155M To $160M Charges in 1... | 160 | $M | 1 | 2007-01-06 | 3384 |
1 | 110962 | 2007-01-08 | 2007-01-09 | MO | Impairments/Charges | Altria Group Subsidiary To Record $245M Asset ... | 245 | $M | 1 | 2007-01-09 | 4954 |
2 | 1182869 | 2007-01-16 | 2007-01-16 | FRX | Impairments/Charges | Forest Labs to Record $494M Charge in 4Q 07 | 494 | $M | 1 | 2007-01-17 | 3014 |
Let's go over the columns:
amount_units
We've done much of the data processing for you. Fields like timestamp
and sid
are standardized across all our Store Datasets, so the datasets are easy to combine. We have standardized the sid
across all our equity databases.
We can select columns and rows with ease. Below, we'll fetch all 2012 charges greater than $200M.
twohundreds = impairments_and_charges[('2011-12-31' < impairments_and_charges['asof_date']) &
(impairments_and_charges['asof_date'] <'2013-01-01') &
(impairments_and_charges.charge_amount > 200)&
(impairments_and_charges.amount_units == "$M")]
# When displaying a Blaze Data Object, the printout is automatically truncated to ten rows.
twohundreds.sort('asof_date')
event_id | asof_date | trade_date | symbol | event_type | event_headline | charge_amount | amount_units | event_rating | timestamp | sid | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1382496 | 2012-01-11 | 2012-01-12 | XL | Impairments/Charges | XL Group to Record Upto $220M Charges | 220 | $M | 1 | 2012-01-12 | 8340 |
1 | 1382455 | 2012-01-11 | 2012-01-12 | RF | Impairments/Charges | Regions Financial to Record Upto $745M Impairm... | 745 | $M | 1 | 2012-01-12 | 34913 |
2 | 1383159 | 2012-01-13 | 2012-01-16 | ADM | Impairments/Charges | Archer Daniels to Record Upto $360M Charge in ... | 360 | $M | 1 | 2012-01-14 | 128 |
3 | 1383004 | 2012-01-13 | 2012-01-13 | NVS | Impairments/Charges | Novartis to Record $1.22B Charges | 1220 | $M | 1 | 2012-01-14 | 21536 |
4 | 1383880 | 2012-01-18 | 2012-01-18 | HES | Impairments/Charges | Hess Corp to Record $525M Charge in 4Q 11 on R... | 525 | $M | 1 | 2012-01-19 | 216 |
5 | 1384387 | 2012-01-19 | 2012-01-19 | ECL | Impairments/Charges | Ecolab to Record $480M Charges by FY 13 | 480 | $M | 1 | 2012-01-20 | 2427 |
6 | 1385496 | 2012-01-23 | 2012-01-24 | MUR | Impairments/Charges | Murphy Oil Unit to Record $370M Asset Impairme... | 370 | $M | 1 | 2012-01-24 | 5126 |
7 | 1386032 | 2012-01-24 | 2012-01-25 | BBOX | Impairments/Charges | Black Box to Record $320M Charges in 3Q 12 | 320 | $M | 1 | 2012-01-25 | 11732 |
8 | 1385962 | 2012-01-24 | 2012-01-25 | RE | Impairments/Charges | Everest Re Group to Record $245M Catastrophe L... | 245 | $M | 1 | 2012-01-25 | 13720 |
9 | 1388133 | 2012-01-30 | 2012-01-30 | X | Impairments/Charges | United States Steel to Record Upto $450M Charg... | 450 | $M | 1 | 2012-01-31 | 8329 |
10 | 1388719 | 2012-01-31 | 2012-01-31 | X | Impairments/Charges | United States Steel to Record Up to $450M Char... | 450 | $M | 1 | 2012-02-01 | 8329 |
Now suppose we want a DataFrame of the Blaze Data Object above, and we only want the sid, charge_amount and the asof_date.
df = odo(twohundreds, pd.DataFrame)
df = df[['sid', 'asof_date','charge_amount']].dropna()
# When printing a pandas DataFrame, the head 30 and tail 30 rows are displayed. The middle is truncated.
df
sid | asof_date | charge_amount | |
---|---|---|---|
0 | 34913 | 2012-01-11 | 745.0 |
1 | 8340 | 2012-01-11 | 220.0 |
2 | 128 | 2012-01-13 | 360.0 |
3 | 21536 | 2012-01-13 | 1220.0 |
4 | 216 | 2012-01-18 | 525.0 |
5 | 2427 | 2012-01-19 | 480.0 |
6 | 5126 | 2012-01-23 | 370.0 |
7 | 11732 | 2012-01-24 | 320.0 |
8 | 13720 | 2012-01-24 | 245.0 |
9 | 8329 | 2012-01-30 | 450.0 |
10 | 8329 | 2012-01-31 | 450.0 |
11 | 351 | 2012-03-05 | 703.0 |
12 | 7334 | 2012-03-13 | 293.0 |
13 | 1335 | 2012-03-24 | 700.0 |
14 | 2263 | 2012-04-02 | 350.0 |
15 | 6116 | 2012-04-05 | 372.0 |
16 | 23112 | 2012-04-10 | 400.0 |
17 | 32902 | 2012-04-17 | 370.0 |
18 | 24838 | 2012-04-19 | 260.0 |
19 | 2351 | 2012-04-30 | 420.0 |
20 | 24838 | 2012-05-17 | 280.0 |
21 | 754 | 2012-05-22 | 350.0 |
22 | 3735 | 2012-05-23 | 1700.0 |
23 | 14388 | 2012-06-05 | 425.0 |
24 | 4151 | 2012-06-08 | 600.0 |
25 | 11673 | 2012-06-14 | 1000.0 |
26 | 88 | 2012-06-21 | 439.0 |
27 | 26204 | 2012-06-25 | 272.0 |
29 | 5061 | 2012-07-02 | 6200.0 |
30 | 903 | 2012-07-06 | 210.0 |
... | ... | ... | ... |
61 | 5520 | 2012-10-26 | 275.0 |
62 | 166 | 2012-11-01 | 2000.0 |
63 | 42173 | 2012-11-01 | 250.0 |
65 | 26169 | 2012-11-15 | 400.0 |
66 | 7671 | 2012-11-15 | 325.0 |
67 | 24833 | 2012-11-17 | 400.0 |
68 | 161 | 2012-11-20 | 290.0 |
69 | 23998 | 2012-11-26 | 400.0 |
71 | 24838 | 2012-11-28 | 1075.0 |
72 | 5092 | 2012-11-30 | 267.5 |
73 | 5862 | 2012-12-04 | 300.0 |
74 | 1335 | 2012-12-05 | 1000.0 |
75 | 7041 | 2012-12-05 | 650.0 |
76 | 239 | 2012-12-07 | 2000.0 |
77 | 8580 | 2012-12-10 | 380.0 |
78 | 1274 | 2012-12-11 | 880.0 |
79 | 14064 | 2012-12-11 | 370.0 |
80 | 8340 | 2012-12-12 | 350.0 |
81 | 4488 | 2012-12-13 | 750.0 |
82 | 25305 | 2012-12-17 | 300.0 |
83 | 25955 | 2012-12-18 | 220.0 |
84 | 8369 | 2012-12-18 | 288.0 |
85 | 21462 | 2012-12-19 | 240.0 |
86 | 40430 | 2012-12-19 | 400.0 |
87 | 10025 | 2012-12-19 | 240.0 |
88 | 24783 | 2012-12-20 | 2000.0 |
89 | 13720 | 2012-12-20 | 220.0 |
90 | 17395 | 2012-12-21 | 4300.0 |
91 | 34334 | 2012-12-21 | 333.1 |
92 | 7543 | 2012-12-26 | 1100.0 |
89 rows × 3 columns