#!/usr/bin/env python
# coding: utf-8

# # Intro to Python for Financial Market Data

# Most finance websites show quotes or charts for various financial markets. Where is this data freely available, and how can we download it, wrangle it into a usable format, and visualize it ourselves?

# A quote for a financial instrument such as a stock, currency, or commodity is a snapshot in time that reflects the individual trades happening on the market for that instrument. Data points like price and trading volume are collected periodically to form a historical time series. Charting this time series lets us visualize the activity of an individual instrument, or of the market as a whole through an index, to get a quick overview and perhaps some insight into the data.

# Python is an excellent language for working with this kind of financial market data. Its compact syntax suits exploratory data analysis in an interactive shell like IPython, which makes it easy to capture ideas quickly and to display code and data concisely in a logbook format like a Jupyter Notebook. Popular open source libraries such as pandas-datareader, Pandas, and Matplotlib also provide strong support for downloading, manipulating, and visualizing financial market data.

# ## Fetching and Exploring Data

# In[28]:


# Download free end-of-day historical data for the SPY ETF (which tracks the S&P 500)
# using pandas-datareader and Yahoo Finance.
# Note: Yahoo has changed its endpoints over time, so this reader may fail with
# recent library versions; check the pandas-datareader docs if it does.
from datetime import datetime

import pandas as pd
import pandas_datareader as pdr

end = datetime.now()
# Go back five calendar years; DateOffset also handles Feb 29 without raising
start = end - pd.DateOffset(years=5)
df = pdr.get_data_yahoo('SPY', start, end)
df.tail()


# The above is a Pandas DataFrame: a two-dimensional, tabular, column-oriented data structure with rich, high-performance time series functionality built on top of NumPy's array-computing features. A DataFrame provides many of the capabilities of a spreadsheet or relational database, with flexible handling of missing data and integration with Matplotlib for visualization.
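
# The time series features described above can be tried without a network connection. This is a minimal sketch on made-up data: the dates, prices, and volumes below are illustrative assumptions, not real SPY quotes. It builds a small DataFrame on a business-day `DatetimeIndex`, selects one column as a Series, and computes a 3-period rolling mean, where pandas marks the first two rows as missing (`NaN`) because the window is not yet full.

```python
# Offline sketch: a small synthetic price table (hypothetical data, not real quotes)
import numpy as np
import pandas as pd

# Business-day DatetimeIndex covering 10 hypothetical trading days
dates = pd.bdate_range("2024-01-01", periods=10)

# Made-up closing prices and volumes, for illustration only
close = np.array([100.0, 101.0, 102.0, 101.5, 103.0,
                  104.0, 103.5, 105.0, 106.0, 107.0])
volume = np.arange(1, 11) * 1_000

df = pd.DataFrame({"Close": close, "Volume": volume}, index=dates)

# Selecting one column yields a Series that shares the DataFrame's index
closes = df["Close"]

# 3-period simple moving average; the first 2 values are NaN
# because fewer than 3 observations are available there
df["ma3"] = closes.rolling(window=3).mean()

# Each non-NaN value is the plain mean of the preceding 3 closes
manual = closes.iloc[-3:].mean()

print(df.tail())
print(df["ma3"].iloc[-1], manual)
```

# The same `rolling(window=n).mean()` pattern, with a larger `n`, underlies a moving-average chart of real closing prices.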

# In[29]:


# Summary statistics across the whole DataFrame
df.describe()


# Selecting a single column of a DataFrame yields a Series, which can be operated on by itself, as seen below.

# In[30]:


# Closing prices for the 5 most recent trading days
df['Close'].tail()


# In[31]:


# Volume statistics
vol = df['Volume']
print("Min: %s Max: %s Average: %s" % (vol.min(), vol.max(), vol.mean()))


# ## Charting and Visualization

# The Pandas `plot` method is a wrapper around [Matplotlib](http://matplotlib.org/) that produces preformatted two-dimensional charts.

# In[32]:


get_ipython().run_line_magic('matplotlib', 'inline')
import seaborn as sns
import matplotlib.pyplot as plt

# Apply seaborn's default styling to the Matplotlib charts below (seaborn 0.11+)
sns.set_theme()

# Plot the historical closing prices and volume as separate subplots
plots = df[['Close', 'Volume']].plot(subplots=True, figsize=(10, 10))
plt.show()


# In[33]:


# Chart 50- and 200-period simple moving averages of the closing price
# alongside the close itself, for the most recent 500 trading days
df['ma50'] = df['Close'].rolling(window=50).mean()
df['ma200'] = df['Close'].rolling(window=200).mean()
data = df[['Close', 'ma50', 'ma200']].iloc[-500:]
plots = data.plot(subplots=False, figsize=(10, 4))
plt.show()


# ## References

# * [Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython](http://amzn.to/2zG3dM4)
# * [10 Minutes to pandas](http://pandas.pydata.org/pandas-docs/stable/10min.html)
# * [pandas-datareader](https://pandas-datareader.readthedocs.io/en/latest/)
# * [pandas: Visualization](http://pandas.pydata.org/pandas-docs/stable/visualization.html)
# * [Seaborn](http://seaborn.pydata.org/)