Most finance websites show quotes or charts for various financial markets. Where is this data freely available and how can we download it, wrangle it into a useable format, and visualize it ourselves?
A quote for a financial instrument like a stock, currency, or commodity is a snapshot in time reflecting the sum of individual trades happening on the market for this instrument. Data points like price or trading volume are collected periodically to form a historical time series. If we chart this time series, we can visualize the activity of individual instruments, or the market as a whole with an index, and gain a quick overview, and perhaps some insights into the data.
Python is an excellent language for working with this kind of financial market data. The syntax is compact and helpful for exploratory data analysis using an interactive shell like iPython. This also makes it easy to capture ideas quickly, and concisely display code and data in a logbook format like a Jupyter Notebook. There is also strong support for downloading, manipulating, and visualizing financial market data through popular open source libraries like Pandas and Matplotlib.
# Download free end of day historical stock data
# Use pandas-datareader and Yahoo finance
from datetime import datetime
import pandas_datareader as pdr
end = datetime.now()
start = datetime(end.year - 5, end.month, end.day)
df = pdr.get_data_yahoo('SPY', start, end)
df.tail()
Open | High | Low | Close | Adj Close | Volume | |
---|---|---|---|---|---|---|
Date | ||||||
2017-10-16 | 255.210007 | 255.509995 | 254.820007 | 255.289993 | 255.289993 | 38221700 |
2017-10-17 | 255.229996 | 255.520004 | 254.979996 | 255.470001 | 255.470001 | 31561000 |
2017-10-18 | 255.899994 | 255.949997 | 255.500000 | 255.720001 | 255.720001 | 40888300 |
2017-10-19 | 254.830002 | 255.830002 | 254.350006 | 255.789993 | 255.789993 | 61903800 |
2017-10-20 | 256.700012 | 257.140015 | 255.770004 | 257.109985 | 257.109985 | 82477300 |
The above is a Pandas DataFrame, a two-dimensional tabular, column-oriented data structure with rich, high-performance time series functionality built on top of NumPy's array-computing features. A DataFrame provides many of the capabilities of a spreadsheet and relational database with flexible handling of missing data and integration with Matplotlib for visualization.
# summary statistics accross the whole DataFrame
df.describe()
Open | High | Low | Close | Adj Close | Volume | |
---|---|---|---|---|---|---|
count | 1258.000000 | 1258.000000 | 1258.000000 | 1258.000000 | 1258.000000 | 1.258000e+03 |
mean | 198.976081 | 199.786065 | 198.090239 | 199.021717 | 189.901637 | 1.084079e+08 |
std | 27.420185 | 27.392707 | 27.440104 | 27.424731 | 31.270607 | 4.908361e+07 |
min | 135.899994 | 136.490005 | 134.699997 | 135.699997 | 122.511452 | 3.156100e+07 |
25% | 182.962494 | 183.642505 | 181.845005 | 182.922497 | 169.699688 | 7.428145e+07 |
50% | 202.840004 | 203.855004 | 201.740005 | 203.114998 | 193.377548 | 9.750890e+07 |
75% | 213.127498 | 214.035003 | 212.500000 | 213.164997 | 208.331898 | 1.310476e+08 |
max | 256.700012 | 257.140015 | 255.770004 | 257.109985 | 257.109985 | 5.072443e+08 |
Slicing a DataFrame's column yields a Series that can be operated on alone as seen below.
# Closing price for most recent 5 trading days
df['Close'].tail()
Date 2017-10-16 255.289993 2017-10-17 255.470001 2017-10-18 255.720001 2017-10-19 255.789993 2017-10-20 257.109985 Name: Close, dtype: float64
# volume statistics
vol = df['Volume']
print("Min: %s Max: %s Average: %s" % (vol.min(), vol.max(), vol.mean()))
Min: 31561000 Max: 507244300 Average: 108407877.34499206
A wrapper around Matplotlib produces preformatted two-dimensional charts.
%matplotlib inline
import seaborn as sns
import matplotlib.pyplot as plt
# plot the historical closing prices and volume using matplotlib
plots = df[['Close', 'Volume']].plot(subplots=True, figsize=(10, 10))
plt.show()
# chart a basic 50 period moving average of the closing price
import pandas as pd
df['ma50'] = df.Close.rolling(window=50).mean()
df['ma200'] = df.Close.rolling(window=200).mean()
data = df[['Close', 'ma50', 'ma200']][-500:]
plots = data.plot(subplots=False,figsize=(10, 4))
plt.show()