Financial market data manipulation and visualization with Python

In finance, market data is price and trade data for a given instrument like a stock, currency pair, or futures contract. Often times this data is visualized as a chart of historical data.

Python is an excellent language for working with financial data. Python syntax is compact for data exploration using an interactive shell like iPython. There is also strong support for downloading, manipulating, and visualizing financial market data through popular open source libraries like Pandas and Matplotlib.

In [2]:
# download free end of day historical stock data from Yahoo finance using Pandas
import pandas.io.data as web
from datetime import datetime

end = datetime.now()
start = datetime(end.year - 5, end.month, end.day)
df = web.DataReader("SPY", 'yahoo', start, end)

print df.tail()
              Open    High     Low   Close     Volume  Adj Close
Date                                                            
2014-06-09  195.35  196.05  195.17  195.58   65119000     195.58
2014-06-10  195.34  195.64  194.92  195.60   57129000     195.60
2014-06-11  194.90  195.12  194.48  194.92   68772000     194.92
2014-06-12  194.69  194.80  193.11  193.54  106350000     193.54
2014-06-13  193.92  194.32  193.30  194.13   81991900     194.13

[5 rows x 6 columns]

The above is a Pandas DataFrame, a two-dimensional tabular, column-oriented data structure with rich, high-performance time series functionality built on top of NumPy's array-computing features. A DataFrame provides many of the capabilities of a spreadsheet and relational database with flexible handling of missing data and integration with Matplotlib for visualization.

In [3]:
# summary statistics accross the whole DataFrame
df.describe()
Out[3]:
Open High Low Close Volume Adj Close
count 1258.000000 1258.000000 1258.000000 1258.000000 1.258000e+03 1258.000000
mean 136.798720 137.534714 136.000914 136.837981 1.683925e+08 131.023339
std 25.741783 25.688088 25.790178 25.744986 7.967646e+07 28.508414
min 87.700000 88.490000 87.000000 87.960000 3.967750e+07 79.860000
25% 116.262500 117.352500 115.202500 116.417500 1.150927e+08 108.050000
50% 132.595000 133.235000 131.715000 132.565000 1.519644e+08 125.480000
75% 155.157500 155.630000 154.260000 155.075000 2.040617e+08 151.905000
max 195.350000 196.050000 195.170000 195.600000 7.178287e+08 195.600000

8 rows × 6 columns

Slicing a DataFrame's column yields a Series that can be operated on alone as seen below.

In [4]:
# Closing price for most recent 5 trading days
df[['Close']].tail()
Out[4]:
Close
Date
2014-06-09 195.58
2014-06-10 195.60
2014-06-11 194.92
2014-06-12 193.54
2014-06-13 194.13

5 rows × 1 columns

In [5]:
# volume statistics
vol = df[['Volume']]
print "Min: %s Max: %s Average: %s" % (vol.min().values[0], vol.max().values[0], vol.mean().values[0])
Min: 39677500 Max: 717828700 Average: 168392548.649

A wrapper around Matplotlib produces preformatted two-dimensional charts.

In [6]:
# plot the historical closing prices and volume using matplotlib
plots = df[['Close', 'Volume']].plot(subplots=True, figsize=(10, 10))
plt.show()
In [7]:
# chart a basic 50 period moving average of the closing price
import pandas as pd
df['ma50'] = pd.rolling_mean(df['Close'], 50)
df['ma200'] = pd.rolling_mean(df['Close'], 200)
plots = df[['Close', 'ma50', 'ma200']].plot(subplots=False, figsize=(10, 4))
plt.show()