This notebook contains course material from CBE40455 by Jeffrey Kantor (jeff at nd.edu); the content is available on Github. The text is released under the CC-BY-NC-ND-4.0 license, and code is released under the MIT license.
How much does one earn relative to the amount invested?
This is the basic concept of return, and one of the fundamental measurements of financial performance. This notebook examines the different ways in which return can be measured.
As will be shown below, pandas-datareader provides a convenient means access and manipulate financial data using the Pandas library. The pandas-datareader is normally imported separately from pandas. Typical installation is
pip install pandas-datareader
from a terminal window, or executing
!pip install pandas-datareader
in a Jupyter notebook cell. Google Colab environment now includes pandas-datareader, so separate installation is required.
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import datetime
import pandas as pd
import pandas_datareader as pdr
This notebook uses the price of stocks and various commodity goods for the purpose of demonstrating returns. Price data is available from a number of sources. Here we demonstrate the process of obtaining price data on financial goods from Yahoo Finance and downloading price data sets from Quandl.
The most comprehensive repositories of financial data are commercial enterprises. Some provide a free tier of service for limited use, typically 50 inquires a day or several hundred a month. Some require registration to access the free tier. These details are a constantly changing. A listing of free services is available from awesome-quant, but please note that details change quickly.
Stock price data is usually indexed and accessed by stock symbols. Stock symbols are unique identifiers for a stock, commodity, or other financial good on a specific exchanges. The following function looks up details of stock symbol on yahoo finance..
# python libraray for accessing internet resources
import requests
def lookup_yahoo(symbol):
"""Return a list of all matches for a symbol on Yahoo Finance."""
url = f"http://d.yimg.com/autoc.finance.yahoo.com/autoc?query={symbol}®ion=1&lang=en"
return requests.get(url).json()["ResultSet"]["Result"]
lookup_yahoo("XOM")
[{'exch': 'NYQ', 'exchDisp': 'NYSE', 'name': 'Exxon Mobil Corporation', 'symbol': 'XOM', 'type': 'S', 'typeDisp': 'Equity'}, {'exch': 'NMS', 'exchDisp': 'NASDAQ', 'name': 'XOMA Corporation', 'symbol': 'XOMA', 'type': 'S', 'typeDisp': 'Equity'}, {'exch': 'YHD', 'exchDisp': 'Industry', 'name': 'Exxon Mobil Corporation', 'symbol': 'XOM.BA', 'type': 'S', 'typeDisp': 'Equity'}, {'exch': 'YHD', 'exchDisp': 'Industry', 'name': 'Exxon Mobil Corporation', 'symbol': 'XOM.MX', 'type': 'S', 'typeDisp': 'Equity'}, {'exch': 'DUS', 'exchDisp': 'Dusseldorf Stock Exchange', 'name': 'XOMA CORP. DL -,0005', 'symbol': 'X0M1.DU', 'type': 'S', 'typeDisp': 'Equity'}, {'exch': 'STU', 'exchDisp': 'Stuttgart', 'name': 'XOMA Corp. Registered Shares DL', 'symbol': 'X0M1.SG', 'type': 'S', 'typeDisp': 'Equity'}, {'exch': 'TLO', 'exchDisp': 'TLX Exchange', 'name': 'Exxon Mobil Corporation', 'symbol': 'XOM-U.TI', 'type': 'S', 'typeDisp': 'Equity'}, {'exch': 'VIE', 'exchDisp': 'Vienna', 'name': 'Exxon Mobil Corporation', 'symbol': 'XOM.VI', 'type': 'S', 'typeDisp': 'Equity'}, {'exch': 'BUE', 'exchDisp': 'Buenos Aires', 'name': 'EXXON MOBIL CORP', 'symbol': 'XOMD.BA', 'type': 'S', 'typeDisp': 'Equity'}]
def get_symbol(symbol):
"""Return exact match for a symbol."""
result = [r for r in lookup_yahoo(symbol) if symbol == r['symbol']]
return result[0] if len(result) > 0 else None
get_symbol('X')
{'exch': 'NYQ', 'exchDisp': 'NYSE', 'name': 'United States Steel Corporation', 'symbol': 'X', 'type': 'S', 'typeDisp': 'Equity'}
Yahoo Finance provides historical Open, High, Low, Close, and Volume date for quotes on traded securities. In addition, Yahoo Finance provides historical Adjusted Close price data that corrects for splits and dividend distributions. Adjusted Close is a useful tool for computing the return on long-term investments.
The following cell demonstrates how to download historical Adjusted Close price for a selected security into a pandas DataFrame.
symbol = 'TSLA'
symbol = 'AAPL'
# get symbol data
symbol_data = get_symbol(symbol)
assert symbol_data, f"Symbol {symbol} wasn't found."
# start and end of a three year interval that ends today
end = datetime.datetime.today().date()
start = end - datetime.timedelta(3*365)
# get stock price data
S = pdr.data.DataReader(symbol, "yahoo", start, end)['Adj Close']
# plot data
plt.figure(figsize=(10,4))
title = f"{symbol_data['name']} ({symbol_data['exchDisp']} {symbol_data['typeDisp']} {symbol_data['symbol']})"
S.plot(title=title)
plt.ylabel('Adjusted Close')
plt.grid()
Note that S
is an example of a Pandas time series.
S
Date 2017-11-06 41.842739 2017-11-07 41.977203 2017-11-08 42.320599 2017-11-09 42.234150 2017-11-10 42.094368 ... 2020-10-27 116.599998 2020-10-28 111.199997 2020-10-29 115.320000 2020-10-30 108.860001 2020-11-02 108.769997 Name: Adj Close, Length: 753, dtype: float64
Pandas time series are indexed by datetime entries. There is a large collection of functions in Pandas for manipulating time series data.
S['2018'].shift(1)
Date 2018-01-02 NaN 2018-01-03 41.513580 2018-01-04 41.506344 2018-01-05 41.699139 2018-01-08 42.173893 ... 2018-12-24 36.870335 2018-12-26 35.916355 2018-12-27 38.445637 2018-12-28 38.196133 2018-12-31 38.215702 Name: Adj Close, Length: 251, dtype: float64
Quandl is a searchable source of time-series data on a wide range of commodities, financials, and many other economic and social indicators. Data from Quandl can be downloaded as files in various formats, or accessed directly using the Quandl API or software-specific package. Here we use demonstrate use of the Quandl Python package.
The first step is execute a system command to check that the Quandl package has been installed.
Here are examples of energy datasets. These were found by searching Quandl, then identifying the Quandl code used for accessing the dataset, a description, the name of the field containing the desired price information.
%%capture
capture = !pip install quandl
code = 'CHRIS/MCX_CL1'
description = 'NYMEX Crude Oil Futures, Continuous Contract #1 (CL1) (Front Month)'
field = 'Close'
import quandl
end = datetime.datetime.today().date()
start = end - datetime.timedelta(5*365)
S = quandl.get(code, collapse='daily', trim_start=start.isoformat(), trim_end=end.isoformat())[field]
plt.figure(figsize=(10,4))
S.plot()
plt.title(description)
plt.ylabel('Price $/bbl')
plt.grid()
--------------------------------------------------------------------------- LimitExceededError Traceback (most recent call last) <ipython-input-14-24e5f07348b3> in <module>() 4 start = end - datetime.timedelta(5*365) 5 ----> 6 S = quandl.get(code, collapse='daily', trim_start=start.isoformat(), trim_end=end.isoformat())[field] 7 8 plt.figure(figsize=(10,4)) /usr/local/lib/python3.6/dist-packages/quandl/get.py in get(dataset, **kwargs) 46 if dataset_args['column_index'] is not None: 47 kwargs.update({'column_index': dataset_args['column_index']}) ---> 48 data = Dataset(dataset_args['code']).data(params=kwargs, handle_column_not_found=True) 49 # Array 50 elif isinstance(dataset, list): /usr/local/lib/python3.6/dist-packages/quandl/model/dataset.py in data(self, **options) 45 updated_options = Util.merge_options('params', params, **options) 46 try: ---> 47 return Data.all(**updated_options) 48 except NotFoundError: 49 if handle_not_found_error: /usr/local/lib/python3.6/dist-packages/quandl/operations/list.py in all(cls, **options) 13 options['params'] = {} 14 path = Util.constructed_path(cls.list_path(), options['params']) ---> 15 r = Connection.request('get', path, **options) 16 response_data = r.json() 17 Util.convert_to_dates(response_data) /usr/local/lib/python3.6/dist-packages/quandl/connection.py in request(cls, http_verb, url, **options) 36 abs_url = '%s/%s' % (ApiConfig.api_base, url) 37 ---> 38 return cls.execute_request(http_verb, abs_url, **options) 39 40 @classmethod /usr/local/lib/python3.6/dist-packages/quandl/connection.py in execute_request(cls, http_verb, url, **options) 48 **options) 49 if response.status_code < 200 or response.status_code >= 300: ---> 50 cls.handle_api_error(response) 51 else: 52 return response /usr/local/lib/python3.6/dist-packages/quandl/connection.py in handle_api_error(cls, resp) 112 klass = d_klass.get(code_letter, QuandlError) 113 --> 114 raise klass(message, resp.status_code, resp.text, resp.headers, code) LimitExceededError: (Status 429) (Quandl Error QELx01) You have exceeded the anonymous user limit of 50 calls per day. To make more calls today, please register for a free Quandl account and then include your API key with your requests.
The statistical properties of financial series are usually studied in terms of the change in prices. There are several reasons for this, key among them is that the changes can often be closely approximated as stationary random variables whereas prices are generally non-stationary sequences.
A common model is
$$S_{t} = R_{t} S_{t-1}$$so, recursively,
$$S_{t} = R_{t} R_{t-1} \cdots R_{0} S_{0}$$The gross return $R_t$ is simply the ratio of the current price to the previous, i.e.,
$$R_t = \frac{S_t}{S_{t-1}}$$$R_t$ will typically be a number close to one in value. The return is greater than one for an appreciating asset, or less than one for a declining asset.
symbol = 'AAPL'
end = datetime.datetime.today().date()
start = end - datetime.timedelta(3*365)
# get stock price data
S = pdr.data.DataReader(symbol, "yahoo", start, end)['Adj Close']
R = S/S.shift(1)
# plot data
plt.figure(figsize=(10, 5))
plt.subplot(2, 1, 1)
S.plot(title=symbol)
plt.ylabel('Adjusted Close')
plt.grid()
plt.subplot(2, 1, 2)
R.plot()
plt.ylabel('Returns')
plt.grid()
plt.tight_layout()
Perhaps the most common way of reporting returns is simply the fractional increase in value of an asset over a period, i.e.,
$$r^{lin}_t = \frac{S_t - S_{t-1}}{S_{t-1}} = \frac{S_t}{S_{t-1}} - 1 $$Obviously
$$r^{lin}_t = R_t - 1$$symbol = 'AAPL'
end = datetime.datetime.today().date()
start = end - datetime.timedelta(3*365)
# get stock price data
S = pdr.data.DataReader(symbol,"yahoo",start,end)['Adj Close']
rlin = S/S.shift(1) - 1
# plot data
plt.figure(figsize=(10,5))
plt.subplot(2,1,1)
S.plot(title=symbol)
plt.ylabel('Adjusted Close')
plt.grid()
plt.subplot(2,1,2)
rlin.plot()
plt.title('Linear Returns (daily)')
plt.grid()
plt.tight_layout()
Suppose you put money in an asset that returns 10% interest in even numbered years, but loses 10% in odd numbered years. Is this a good investment for the long-haul?
If we look at mean linear return
\begin{align} \bar{r}^{lin} & = \frac{1}{T}\sum_{t=1}{T} r^{lin}_t \\ & = \frac{1}{T} (0.1 - 0.1 + 0.1 - 0.1 + \cdots) \\ & = 0 \end{align}we would conclude this asset, on average, offers zero return. What does a simulation show?
S = 100
log = [[0,S]]
r = 0.10
for k in range(1,101):
S = S + r*S
r = -r
log.append([k,S])
df = pd.DataFrame(log,columns = ['k','S'])
plt.plot(df['k'],df['S'])
plt.xlabel('Year')
plt.ylabel('Value')
Text(0, 0.5, 'Value')
Despite an average linear return of zero, what we observe over time is an asset declining in price. The reason is pretty obvious --- on average, the years in which the asset loses money have higher balances than years where the asset gains value. Consequently, the losses are somewhat greater than the gains which, over time, leads to a loss of value.
Here's a real-world example of this phenomenon. For a three year period ending October 24, 2017, United States Steel (stock symbol 'X') offers an annualized linear return of 15.9%. Seems like a terrific investment opportunity, doesn't it? Would you be surprised to learn that the actual value of the stock fell 18.3% over that three-year period period?
What we can conclude from these examples is that average linear return, by itself, does not provide us with the information needed for long-term investing.
symbol = 'X'
end = datetime.datetime(2017, 10, 24)
start = end-datetime.timedelta(3*365)
# get stock price data
S = pdr.data.DataReader(symbol, "yahoo", start, end)['Adj Close']
rlin = S/S.shift(1) - 1
print('Three year return :', 100*(S[-1]-S[0])/S[0], '%')
# plot data
plt.figure(figsize=(10,5))
plt.subplot(2,1,1)
S.plot(title=symbol)
plt.ylabel('Adjusted Close')
plt.grid()
plt.subplot(2,1,2)
rlin.plot()
plt.title('Mean Linear Returns (annualized) = {0:.2f}%'.format(100*252*rlin.mean()))
plt.grid()
plt.tight_layout()
Three year return : -18.27174276977313 %
Compounded, or log returns, are defined as
$$r^{log}_{t} = \log R_t = \log \frac{S_{t}}{S_{t-1}}$$The log returns have a very useful compounding property for aggregating price changes across time
$$ \log \frac{S_{t+k}}{S_{t}} = r^{log}_{t+1} + r^{log}_{t+2} + \cdots + r^{log}_{t+k}$$If the compounded returns are statistically independent and identically distributed, then this property provides a means to aggregate returns and develop statistical price projections.
symbol = 'AAPL'
end = datetime.datetime.today().date()
start = end - datetime.timedelta(3*365)
# get stock price data
S = pdr.data.DataReader(symbol, "yahoo", start, end)['Adj Close']
rlog = np.log(S/S.shift(1))
# plot data
plt.figure(figsize=(10,5))
plt.subplot(2,1,1)
S.plot(title=symbol)
plt.ylabel('Adjusted Close')
plt.grid()
plt.subplot(2,1,2)
rlin.plot()
plt.title('Log Returns (daily)')
plt.grid()
plt.tight_layout()
For long-term financial decision making, it's important to understand the relationship between $r_t^{log}$ and $r_t^{lin}$. Algebraically, the relationships are simple.
$$r^{log}_t = \log \left(1+r^{lin}_t\right)$$$$r^{lin}_t = e^{r^{log}_t} - 1$$The linear return $r_t^{lin}$ is the fraction of value that is earned from an asset in a single period. It is a direct measure of earnings. The average value $\bar{r}^{lin}$ over many periods this gives the average fractional earnings per period. If you care about consuming the earnings from an asset and not about growth in value, then $\bar{r}^{lin}$ is the quantity of interest to you.
Log return $r_t^{log}$ is the rate of growth in value of an asset over a single period. When averaged over many periods, $\bar{r}^{log}$ measures the compounded rate of growth of value. If you care about the growth in value of an asset, then $\bar{r}^{log}$ is the quantity of interest to you.
The compounded rate of growth $r_t^{log}$ is generally smaller than average linear return $\bar{r}^{lin}$ due to the effects of volatility. To see this, consider an asset that has a linear return of -50% in period 1, and +100% in period 2. The average linear return is would be +25%, but the compounded growth in value would be 0%.
A general formula for the relationship between $\bar{r}^{log}$ and $\bar{r}^{lin}$ is derived as follows:
$$\begin{align*} \bar{r}^{log} & = \frac{1}{T}\sum_{t=1}^{T} r_t^{log} \\ & = \frac{1}{T}\sum_{t=1}^{T} \log\left(1+r_t^{lin}\right) \\ & = \frac{1}{T}\sum_{t=1}^{T} \left(\log(1) + r_t^{lin} - \frac{1}{2} (r_t^{lin})^2 + \cdots \right) \\ & = \frac{1}{T}\sum_{t=1}^{T} r_t^{lin} - \frac{1}{2}\frac{1}{T}\sum_{t=1}^{T} (r_t^{lin})^2 + \cdots \\ & = \bar{r}^{lin} - \frac{1}{2}\left(\frac{1}{T}\sum_{t=1}^{T} (r_t^{lin})^2\right) + \cdots \\ & = \bar{r}^{lin} - \frac{1}{2}\left((\bar{r}^{lin})^2 + \frac{1}{T}\sum_{t=1}^{T} (r_t^{lin}-\bar{r}^{lin})^2\right) + \cdots \end{align*}$$For typical values $\bar{r}^{lin}$ of and long horizons $T$, this results in a formula
$$\begin{align*} \bar{r}^{log} & \approx \bar{r}^{lin} - \frac{1}{2} \left(\sigma^{lin}\right)^2 \end{align*}$$where $\sigma^{lin}$ is the standard deviation of linear returns, more commonly called the volatility.
The difference $- \frac{1}{2} \left(\sigma^{lin}\right)^2$ is the volatility drag imposed on the compounded growth in value of an asset due to volatility in linear returns. This can be significant and a source of confusion for many investors.
It's indeed possible to have a positive average linear return, but negative compounded growth. To see this, consider a $100 investment which earns 20% on even-numbered years, and loses 18% on odd-numbered years. The average linear return is 1%, and the average log return is -0.81%.
symbol = 'AAPL'
end = datetime.datetime.today().date()
start = end - datetime.timedelta(3*365)
# get stock price data
S = pdr.data.DataReader(symbol, "yahoo", start, end)['Adj Close']
rlin = (S - S.shift(1))/S.shift(1)
rlog = np.log(S/S.shift(1))
# plot data
plt.figure(figsize=(10,6))
plt.subplot(3,1,1)
S.plot(title=symbol)
plt.ylabel('Adjusted Close')
plt.grid()
plt.subplot(3,1,2)
rlin.plot()
plt.title('Linear Returns (daily)')
plt.grid()
plt.tight_layout()
plt.subplot(3,1,3)
rlog.plot()
plt.title('Log Returns (daily)')
plt.grid()
plt.tight_layout()
print("Mean Linear Return (rlin) = {0:.7f}".format(rlin.mean()))
print("Linear Volatility (sigma) = {0:.7f}".format(rlin.std()))
print("Volatility Drag -0.5*sigma**2 = {0:.7f}".format(-0.5*rlin.std()**2))
print("rlin - 0.5*vol = {0:.7f}\n".format(rlin.mean() - 0.5*rlin.std()**2))
print("Mean Log Return = {0:.7f}".format(rlog.mean()))
Mean Linear Return (rlin) = 0.0015108 Linear Volatility (sigma) = 0.0218788 Volatility Drag -0.5*sigma**2 = -0.0002393 rlin - 0.5*vol = 0.0012715 Mean Log Return = 0.0012704
symbols = ['AAPL','MSFT','F','XOM','GE','X']
end = datetime.datetime.today().date()
start = end - datetime.timedelta(3*365)
rlin = []
rlog = []
sigma = []
for symbol in symbols:
# get stock price data
S = pdr.data.DataReader(symbol, "yahoo", start, end)['Adj Close']
r = (S - S.shift(1))/S.shift(1)
rlin.append(r.mean())
rlog.append((np.log(S/S.shift(1))).mean())
sigma.append(r.std())
import seaborn as sns
N = len(symbols)
idx = np.arange(N)
width = 0.2
p0 = plt.bar(idx - 1.25*width, rlin, width)
p1 = plt.bar(idx, -0.5*np.array(sigma)**2, width, bottom=rlin)
p2 = plt.bar(idx + 1.25*width, rlog, width)
for k in range(0,N):
plt.plot([k - 1.75*width, k + 0.5*width],[rlin[k],rlin[k]],'k',lw=1)
plt.plot([k - 0.5*width, k + 1.75*width],[rlog[k],rlog[k]],'k',lw=1)
plt.xticks(idx,symbols)
plt.legend((p0[0],p1[0],p2[0]),('rlin','0.5*sigma**2','rlog'))
plt.title('Components of Linear Return')
Text(0.5, 1.0, 'Components of Linear Return')