John Stachurski and Quentin Batista
A well-known phenomenon in economics is that financial returns tend to have a fat-tailed distribution. Broadly speaking, this means that extreme events are more probable under this distribution than under a normal distribution with the same mean and variance. When tail probabilities decay like a power of the return size, the pattern is called a power law, and it has been well documented for a wide range of securities in many different countries. In this notebook, we attempt to determine whether such a power law holds for cryptocurrencies by estimating the tail index of their return distributions.
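As a rough illustration of what "fat-tailed" means, the sketch below compares the probability of a move larger than `k` standard deviations under a normal distribution and under a Student-t distribution with 3 degrees of freedom rescaled to unit variance. The choice of the t(3) distribution and the function names are assumptions made for this example; they are not part of the analysis that follows.

```python
import math
from statistics import NormalDist


def normal_two_sided_tail(k):
    """P(|Z| > k) for a standard normal variable Z."""
    return 2 * (1 - NormalDist().cdf(k))


def t3_two_sided_tail(k):
    """P(|X| > k) where X = T / sqrt(3) and T is Student-t with 3 degrees of freedom.

    T has variance 3, so X has unit variance. The closed-form CDF of T is
    F(t) = 1/2 + (1/pi) * (u / (1 + u**2) + atan(u)) with u = t / sqrt(3),
    and the event |X| > k corresponds to u = k.
    """
    return 1 - (2 / math.pi) * (k / (1 + k ** 2) + math.atan(k))


for k in (3, 4, 6):
    p_normal = normal_two_sided_tail(k)
    p_t3 = t3_two_sided_tail(k)
    print(f"k={k}: normal {p_normal:.2e}, t(3) {p_t3:.2e}, ratio {p_t3 / p_normal:.0f}")
```

Both distributions have mean zero and variance one, yet the t(3) distribution assigns far more probability to large deviations, and the gap widens as `k` grows.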
We obtain data on cryptocurrency prices from Poloniex, a US-based cryptocurrency exchange platform. The data is available at different frequencies. We choose the highest frequency (5-minute interval) in order to maximize the number of data points. Below is a list of the different currencies we analyze:
import pandas as pd
import numpy as np
def get_data(symbol, frequency):
    # Params: String symbol, int frequency = 300, 900, 1800, 7200, 14400, 86400
    # Returns: df from first available date
    url = ('https://poloniex.com/public?command=returnChartData&currencyPair=' + symbol
           + '&end=9999999999&period=' + str(frequency) + '&start=0')
    df = pd.read_json(url)
    df.set_index('date', inplace=True)
    return df
tickers = ['BTC', 'ETC', 'XRP', 'ETH', 'XMR', 'LTC', 'STR', 'BCH', 'NXT', 'ZEC', 'DASH', 'REP']
tickers = ['USDT_' + ticker for ticker in tickers]
# Sometimes fails -- run again if it does
prices_df = pd.DataFrame()
for ticker in tickers:
    prices_df = prices_df.join(get_data(ticker, 300)['close'].rename(ticker), how='outer')
We use the first-order difference of the natural log of prices as a measure of return.
# Log difference
five_min_returns = np.log(prices_df)
five_min_returns = five_min_returns-five_min_returns.shift(1)
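To make the return definition concrete, here is a small standalone sketch with a made-up price path (the three prices are hypothetical). It shows that log returns are additive across periods, which simple percentage returns are not.

```python
import math

# Hypothetical price path: up 10%, then down 10%
prices = [100.0, 110.0, 99.0]

# Log return for each period: log(p_t) - log(p_{t-1})
log_returns = [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

# Simple returns for comparison
simple_returns = [p1 / p0 - 1 for p0, p1 in zip(prices, prices[1:])]

# Summing log returns gives the log return over the whole window
total_log_return = sum(log_returns)
print(log_returns, simple_returns, total_log_return)
```

For a pandas DataFrame of prices, `np.log(prices_df).diff()` computes the same quantity as the two lines above.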
We begin by visualizing the data to find patterns suggesting the presence of heavy tails. Using kernel density estimates, we plot the distribution of the returns.
import seaborn as sns
density_df = pd.DataFrame(five_min_returns.stack(), columns=['return']).reset_index()
%matplotlib inline
kde_plot = (sns.FacetGrid(density_df,
                          hue='level_1',
                          height=8,  # this argument was named `size` before seaborn 0.9
                          xlim=(min(density_df['return']), max(density_df['return'])))
            .map(sns.kdeplot, 'return')
            .add_legend())
This plot suggests that returns can sometimes take on very extreme values as evidenced by the heavy tails of the estimated distribution. However, not only is it difficult to differentiate between currencies in this plot, it is also hard to see what exactly happens in the tails. To better understand this pattern, we plot the returns over time with some deviation statistics.
import matplotlib.pyplot as plt
def find_fattails(data, name):
    series = data[data.notnull()]  # Remove null entries from series
    std = series.std()  # Find standard deviation
    mean = series.mean()  # Find mean
    print(name)
    plt.figure(figsize=(15, 7))
    # Solid lines at +/- 2 standard deviations, dashed lines at +/- 3
    plt.axhline(mean + std * 2, c='k')
    plt.axhline(mean + std * 3, c='k', ls='--')
    plt.axhline(mean - std * 2, c='k')
    plt.axhline(mean - std * 3, c='k', ls='--')
    plt.plot(series)
    plt.axhline(mean, c='k')
    plt.title('log price first-order difference: ' + name)
    plt.xlim(min(series.index), max(series.index))
    plt.show()
    for n in (3, 4):
        outliers = series[(series > mean + std * n) | (series < mean - std * n)]
        outliers_pct = len(outliers) / len(series) * 100
        print(str(round(outliers_pct, 2)) + "% of observations are greater than " + str(n) + " standard deviations from the mean")
    # Largest absolute deviation from the mean, in units of standard deviations
    max_dev = (series - mean).abs().max() / std
    print("The maximum deviation is " + str(round(max_dev, 2)) + " deviations from the mean")

for ticker in tickers:
    find_fattails(five_min_returns[ticker], ticker)
USDT_BTC
1.44% of observations are greater than 3 standard deviations from the mean
0.76% of observations are greater than 4 standard deviations from the mean
The maximum deviation is 198.46 deviations from the mean

USDT_ETC
1.6% of observations are greater than 3 standard deviations from the mean
0.69% of observations are greater than 4 standard deviations from the mean
The maximum deviation is 70.16 deviations from the mean

USDT_XRP
0.42% of observations are greater than 3 standard deviations from the mean
0.22% of observations are greater than 4 standard deviations from the mean
The maximum deviation is 174.45 deviations from the mean

USDT_ETH
0.31% of observations are greater than 3 standard deviations from the mean
0.17% of observations are greater than 4 standard deviations from the mean
The maximum deviation is 330.58 deviations from the mean

USDT_XMR
0.38% of observations are greater than 3 standard deviations from the mean
0.27% of observations are greater than 4 standard deviations from the mean
The maximum deviation is 131.79 deviations from the mean

USDT_LTC
0.26% of observations are greater than 3 standard deviations from the mean
0.13% of observations are greater than 4 standard deviations from the mean
The maximum deviation is 251.05 deviations from the mean

USDT_STR
0.11% of observations are greater than 3 standard deviations from the mean
0.06% of observations are greater than 4 standard deviations from the mean
The maximum deviation is 154.13 deviations from the mean

USDT_BCH
1.38% of observations are greater than 3 standard deviations from the mean
0.6% of observations are greater than 4 standard deviations from the mean
The maximum deviation is 66.82 deviations from the mean

USDT_NXT
0.15% of observations are greater than 3 standard deviations from the mean
0.09% of observations are greater than 4 standard deviations from the mean
The maximum deviation is 387.61 deviations from the mean

USDT_ZEC
0.95% of observations are greater than 3 standard deviations from the mean
0.49% of observations are greater than 4 standard deviations from the mean
The maximum deviation is 80.28 deviations from the mean

USDT_DASH
0.11% of observations are greater than 3 standard deviations from the mean
0.08% of observations are greater than 4 standard deviations from the mean
The maximum deviation is 216.55 deviations from the mean

USDT_REP
1.82% of observations are greater than 3 standard deviations from the mean
0.8% of observations are greater than 4 standard deviations from the mean
The maximum deviation is 37.31 deviations from the mean
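These plots confirm our previous observation: some returns lie extremely far from the mean. For comparison, under a normal distribution only about 0.27% of observations would fall more than 3 standard deviations from the mean and about 0.006% more than 4. Here the share beyond 4 standard deviations ranges from 0.06% to 0.8%, and the largest deviations range from roughly 37 to 388 standard deviations.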
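A minimal sketch of how such exceedance rates can be set against a Gaussian benchmark is shown below. It uses only the standard library and a simulated normal sample, so the helper names `exceedance_share` and `normal_share` and the simulated data are assumptions for illustration, not part of the notebook's pipeline.

```python
import random
from statistics import NormalDist, fmean, stdev


def exceedance_share(values, k):
    """Share of observations more than k sample standard deviations from the sample mean."""
    mu = fmean(values)
    sigma = stdev(values)
    hits = sum(1 for v in values if abs(v - mu) > k * sigma)
    return hits / len(values)


def normal_share(k):
    """Share expected under a normal distribution: P(|Z| > k)."""
    return 2 * (1 - NormalDist().cdf(k))


# Simulated Gaussian data stands in for a return series here
random.seed(0)
gaussian_sample = [random.gauss(0.0, 1.0) for _ in range(100_000)]

for k in (3, 4):
    empirical = 100 * exceedance_share(gaussian_sample, k)
    benchmark = 100 * normal_share(k)
    print(f"{k} sd: empirical {empirical:.4f}%, normal benchmark {benchmark:.4f}%")
```

On Gaussian data the two columns should roughly agree. Applying `exceedance_share` to a cleaned return series instead would give the kind of percentages printed above, which can then be read against `normal_share`.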
One of the ways to measure the degree of heavy-tailedness of a distribution is to use a tail index which characterizes the rate of power decay of the distribution tails. A low index value corresponds to a high probability mass in the tails. A popular way of estimating these indices is to use an OLS log-log rank-size regression. With optimal shift, the process is as follows:
The estimated value of $\beta$ is the estimate for the tail index.
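Written out, and inferred from the regression code used later in this notebook rather than from a stated formula, the procedure appears to be the rank-size regression with a shift of 1/2 applied to the ranks of the largest observations. The symbols $C$, $\alpha$, $\varepsilon_i$ and $n$ are notation assumed for this sketch; $n$ denotes the sample size supplied to the standard-error formula.

```latex
% Power-law tail with tail index \beta, where C > 0 is a constant
P\left(|X| > x\right) \sim C\, x^{-\beta} \quad \text{as } x \to \infty

% Rank-size regression with the 1/2 shift, run on the largest observations
\log\left(\mathrm{Rank}_i - \tfrac{1}{2}\right) = \alpha - \beta \log\left(\mathrm{Size}_i\right) + \varepsilon_i

% Standard error and 95% confidence interval as computed in the code
\mathrm{s.e.}(\hat{\beta}) = \sqrt{\tfrac{2}{n}}\,\hat{\beta},
\qquad
\hat{\beta} \pm 1.96\,\mathrm{s.e.}(\hat{\beta})
```

Here $\mathrm{Size}_i$ is the absolute return of observation $i$ and $\mathrm{Rank}_i$ is its rank when observations are sorted from largest to smallest, so a steeper slope means that large observations become rare more quickly.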
First, we create a log-log rank-size scatter plot to determine whether the linear model is appropriate. Then, we carry out the regressions using the statsmodels package.
abs_five_min_returns = abs(five_min_returns)
quantile = 0.9
plt.figure(figsize=(12, 30))
for i, ticker in enumerate(tickers):
    # Keep only observations above the chosen quantile of absolute returns
    truncation_level = abs_five_min_returns[ticker].quantile(quantile)
    condition = abs_five_min_returns[ticker] > truncation_level
    x = np.log(abs_five_min_returns.rank(ascending=False)[condition][ticker])
    y = np.log(abs_five_min_returns[condition][ticker])
    plt.subplot(int(len(tickers) / 2), 2, i+1)
    plt.scatter(x, y)
    plt.title(ticker)
    plt.xlabel('Rank')
    plt.ylabel('Size')
    plt.xlim(min(x), max(x))
plt.show()
A linear model seems to provide a reasonable approximation to the relationship in the data despite the fact that some of the highest ranked observations are often outliers.
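Before turning to the actual regressions, the estimator can be checked on synthetic data whose tail index is known. The sketch below is a standard-library reimplementation under stated assumptions, not the notebook's function: the name `tail_index_rank_minus_half` is hypothetical, ranks are assigned by sort order rather than with pandas' tie handling, and the standard error uses the number of tail observations kept, which differs from the `n` passed to the notebook's own function.

```python
import math
import random


def tail_index_rank_minus_half(values, quantile=0.9):
    """OLS slope of log(rank - 1/2) on -log(size) over the largest observations."""
    sizes = sorted((abs(v) for v in values if v != 0), reverse=True)
    k = max(2, round(len(sizes) * (1 - quantile)))  # number of tail observations kept
    tail = sizes[:k]
    x = [-math.log(s) for s in tail]
    y = [math.log(rank - 0.5) for rank in range(1, k + 1)]
    x_bar = sum(x) / k
    y_bar = sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx                # slope = estimated tail index
    se = math.sqrt(2 / k) * beta    # assumption: k = tail observations used
    return beta, se


# Synthetic Pareto sample with a known tail index
random.seed(0)
true_alpha = 2.0
sample = [random.paretovariate(true_alpha) for _ in range(100_000)]

beta, se = tail_index_rank_minus_half(sample, quantile=0.9)
print(f"true index {true_alpha}, estimate {beta:.2f}, "
      f"95% CI [{beta - 1.96 * se:.2f}, {beta + 1.96 * se:.2f}]")
```

If the estimate lands close to `true_alpha`, that gives some reassurance that the slope of the rank-size regression is measuring the tail index as intended.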
import statsmodels.api as sm
def log_log_rank_size_reg(ticker, quantile, n):
    # Keep only observations above the chosen quantile of absolute returns
    truncation_level = abs_five_min_returns[ticker].quantile(quantile)
    condition = abs_five_min_returns[ticker] > truncation_level
    # Regress log(rank - 1/2) on -log(size); the slope is the tail index estimate
    Y = np.log(abs_five_min_returns.rank(ascending=False)[condition][ticker] - 1/2)
    X = -np.log(abs_five_min_returns[condition][ticker])
    X = sm.add_constant(X)
    model = sm.OLS(Y, X)
    results = model.fit()
    se = np.sqrt(2/n)*results.params[1]
    output = 'Top ' + str(round(100-quantile*100)) + '% -- ' + 'Estimated Tail Index: ' + str(round(results.params[1], 2)) + (
        ' -- 95% Confidence Interval: [' + str(round(results.params[1]-1.96*se, 2)) + ', ' + str(round(results.params[1]+1.96*se, 2)) + ']')
    print(output)
for ticker in tickers:
    print(ticker)
    n = abs_five_min_returns[ticker].dropna().shape[0]
    print("total number of observations: " + str(n))
    log_log_rank_size_reg(ticker, 0.9, n)
    log_log_rank_size_reg(ticker, 0.95, n)
    print("")
USDT_BTC
total number of observations: 353925
Top 10% -- Estimated Tail Index: 2.07 -- 95% Confidence Interval: [2.06, 2.08]
Top 5% -- Estimated Tail Index: 2.3 -- 95% Confidence Interval: [2.29, 2.31]

USDT_ETC
total number of observations: 202933
Top 10% -- Estimated Tail Index: 2.76 -- 95% Confidence Interval: [2.74, 2.78]
Top 5% -- Estimated Tail Index: 2.97 -- 95% Confidence Interval: [2.95, 2.99]

USDT_XRP
total number of observations: 353627
Top 10% -- Estimated Tail Index: 1.79 -- 95% Confidence Interval: [1.78, 1.79]
Top 5% -- Estimated Tail Index: 1.9 -- 95% Confidence Interval: [1.89, 1.91]

USDT_ETH
total number of observations: 305120
Top 10% -- Estimated Tail Index: 1.89 -- 95% Confidence Interval: [1.88, 1.9]
Top 5% -- Estimated Tail Index: 1.9 -- 95% Confidence Interval: [1.89, 1.91]

USDT_XMR
total number of observations: 354692
Top 10% -- Estimated Tail Index: 1.67 -- 95% Confidence Interval: [1.67, 1.68]
Top 5% -- Estimated Tail Index: 1.59 -- 95% Confidence Interval: [1.58, 1.6]

USDT_LTC
total number of observations: 349288
Top 10% -- Estimated Tail Index: 2.07 -- 95% Confidence Interval: [2.06, 2.08]
Top 5% -- Estimated Tail Index: 2.04 -- 95% Confidence Interval: [2.03, 2.05]

USDT_STR
total number of observations: 348131
Top 10% -- Estimated Tail Index: 1.78 -- 95% Confidence Interval: [1.77, 1.79]
Top 5% -- Estimated Tail Index: 1.82 -- 95% Confidence Interval: [1.81, 1.82]

USDT_BCH
total number of observations: 92896
Top 10% -- Estimated Tail Index: 2.56 -- 95% Confidence Interval: [2.54, 2.58]
Top 5% -- Estimated Tail Index: 2.69 -- 95% Confidence Interval: [2.67, 2.72]

USDT_NXT
total number of observations: 353909
Top 10% -- Estimated Tail Index: 1.87 -- 95% Confidence Interval: [1.86, 1.88]
Top 5% -- Estimated Tail Index: 2.0 -- 95% Confidence Interval: [1.99, 2.01]

USDT_ZEC
total number of observations: 176230
Top 10% -- Estimated Tail Index: 2.13 -- 95% Confidence Interval: [2.11, 2.14]
Top 5% -- Estimated Tail Index: 2.1 -- 95% Confidence Interval: [2.09, 2.11]

USDT_DASH
total number of observations: 354690
Top 10% -- Estimated Tail Index: 2.13 -- 95% Confidence Interval: [2.12, 2.14]
Top 5% -- Estimated Tail Index: 2.01 -- 95% Confidence Interval: [2.0, 2.02]

USDT_REP
total number of observations: 183135
Top 10% -- Estimated Tail Index: 2.57 -- 95% Confidence Interval: [2.56, 2.59]
Top 5% -- Estimated Tail Index: 2.79 -- 95% Confidence Interval: [2.77, 2.8]
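All our estimates fall in the $\left[1.5,3\right]$ interval, which corresponds to a relatively high degree of heavy-tailedness: under a power law, a tail index below 3 implies an infinite third moment, and an index below 2, as estimated for several of the currencies, implies infinite variance.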
To put this in context, we run the same regression on daily returns of SPY, an exchange-traded fund that tracks the S&P 500, using the top 5% of returns.
import pandas_datareader as pdr
import datetime
start = datetime.datetime(1993, 2, 1)
sp_prices = pdr.data.DataReader('SPY', data_source='yahoo', start=start)['Adj Close']
sp_returns = (np.log(sp_prices) - np.log(sp_prices).shift(1)).dropna()
n = sp_returns.shape[0]
quantile = 0.95
truncation_level = sp_returns.quantile(quantile)
condition = sp_returns > truncation_level
Y = np.log(sp_returns.rank(ascending=False)[condition] - 1/2)
X = -np.log(sp_returns[condition])
X = sm.add_constant(X)
model = sm.OLS(Y,X)
results = model.fit()
se = np.sqrt(2/n)*results.params[1]
output = 'Top ' + str(round(100-quantile*100)) + '% -- ' + 'Estimated Tail Index: ' + str(round(results.params[1], 2)) + (
' -- 95% Confidence Interval: [' + str(round(results.params[1]-1.96*se, 2)) + ', ' + str(round(results.params[1]+1.96*se, 2)) + ']')
print(output)
Top 5% -- Estimated Tail Index: 2.91 -- 95% Confidence Interval: [2.81, 3.02]
The estimated tail index is higher than that of most cryptocurrencies, which suggests that the distribution of cryptocurrency returns has fatter tails than the distribution of returns for the average U.S.-based large-cap company.
We set out to determine whether cryptocurrency returns, like the returns of other financial securities, have a fat-tailed distribution. Our analysis suggests not only that they do, but also that the degree of heavy-tailedness is relatively high compared with the average U.S.-based large-cap company.
[1] Andrei Ankudinov, Rustam Ibragimov, and Oleg Lebedev. Heavy tails and asymmetry of returns in the Russian stock market. Emerging Markets Review, 32:200–219, 2017.
[2] Thomas Lux and Simone Alfarano. Financial power laws: Empirical evidence, models, and mechanisms. Chaos, Solitons & Fractals, 88:3–18, 2016.