An R/Finance talk by Stuart Reid, Chief Engineer @NMRQL and Blogger @TuringFinance
There is a legend that Professor Burton G. Malkiel, author of A Random Walk Down Wall Street, constructed a price chart by flipping a coin and presented it to a world-renowned chartist to analyze. The chartist studied the movements of the fictitious security and exclaimed that it was a textbook pattern and a once-in-a-lifetime opportunity ...
One line is a jump diffusion process and the other is an asset taken from the JSE Top 40 over a random one-year period
The Random Walk Hypothesis is a 116-year-old empirical assumption that security prices can be modelled by a random walk, a stochastic process.
The Random Walk Hypothesis is not an economic theory nor is it even an assertion that markets are literally random. It is merely an assumption.
And it is an assumption which lies at the heart of everything we quants hold dear: risk metrics, derivatives, mean-variance optimization, even Sharpe ratios!
It was only in the 1970s that the empirical assumption of randomness was finally justified by an economic theory called the Efficient Market Hypothesis.
The Nobel-prize worthy Efficient Market Hypothesis explained the Random Walk Assumption using the counter-intuitive lexicon of Classical Information Theory.
Consider the following two binary strings,
Which string contains more information?
The first string contains almost no information because all of the bits are exactly the same ad infinitum. Knowing one bit is sufficient to know what every other bit will be.
The first string is called deterministic, predictable, or simply just boring.
The second string contains lots of information because knowing one bit does not seem to be sufficient to know what any other bit will most likely be.
The second string is called stochastic, unpredictable, or simply just interesting.
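This intuition can be made concrete. Here is a rough sketch in base R (not part of any package shown later) that estimates the Shannon entropy, in bits per symbol, of each string from its symbol frequencies:

```r
# Estimate the Shannon entropy (bits per symbol) of a string
# from its observed symbol frequencies.
entropy_bits <- function(s) {
  p <- table(strsplit(s, "")[[1]]) / nchar(s)
  -sum(p * log2(p))
}

boring      <- strrep("1", 37)
interesting <- "1101000101001000100010000100110100110"

entropy_bits(boring)       # exactly 0 bits: knowing one bit tells you everything
entropy_bits(interesting)  # close to 1 bit: each bit looks nearly like a coin flip
```

Note this only measures the symbol frequencies, not sequential structure, but it captures the intuition: the constant string carries no information per symbol, while the "random-looking" one carries close to the maximum of one bit.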
The Efficient Market Hypothesis argues that investors trading on new information reflect that information into the price of the security, thereby making it more random; the Efficient Market Hypothesis is a byproduct of participation by many intelligent agents.
The Efficient Market Hypothesis is an Economic Theory because it allows for investors to be compensated for taking on risk when they invest in the market. This return, called the market risk premium, is the expected rate of return of a diversified portfolio. This premium explains why buy-and-hold investors and index funds have a non-zero expected return!
So when proponents of the Efficient Market Hypothesis say "active investors can't beat the market", what they really mean is that "active investors can't generate returns above the market risk premium with less risk than the market (a.k.a. abnormal returns)".
The Efficient Market Hypothesis distinguishes between three forms of market efficiency: weak-form, semi-strong-form, and strong-form. These differ according to the set of information, $\Phi$, which is reflected by investors into the price of securities,
The Random Walk Hypothesis typically deals with just the first form of market efficiency, but theoretically it could be used to test the hypothesis that the market is efficient with respect to any subset of information, $\Phi_i$. More on that later ...
If the current price of a security reflects all historical price data, then the expected price of the security tomorrow with respect to historical price data is its price today,
$E[S_{t+1}|\Phi_{weak}] = S_{t}$ or alternatively,
Let $r_{t+1} = \ln\Big(\frac{S_{t+1}}{S_{t}}\Big)$ then,
$E[r_{t+1}|\Phi_{weak}] \approx \ln\Big(\frac{E[S_{t+1}|\Phi_{weak}]}{S_{t}}\Big) = \ln(S_{t}/S_{t}) = \ln(1) = 0$
The above model is called a Martingale and it is the purest type of random walk! But, it does not reflect the equity risk premium ...
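A quick sanity check of the Martingale property, using nothing beyond base R: simulate a driftless walk and verify that the sample mean of its log-returns is statistically indistinguishable from zero.

```r
# Simulate log-returns of a driftless random walk and check that their
# sample mean is indistinguishable from zero, as the Martingale model implies.
set.seed(42)
r <- rnorm(100000, mean = 0, sd = 0.01)  # driftless log-returns
S <- exp(cumsum(r))                      # the implied price path
mean(r)                                  # very close to zero
t.test(r, mu = 0)$p.value                # p-value for H0: E[r] = 0
```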
If the current price of a security reflects all historical price data, then the expected price of the security tomorrow with respect to historical price data is greater than today's price because securities are risky assets and we should be rewarded for taking on that risk,
$E[S_{t+1}|\Phi_{weak}] \geq S_{t}$ or alternatively,
$E[r_{t+1}|\Phi_{weak}] \geq 0$
This is called a Submartingale random walk and it is the underlying assumption of most quantitative models. The Submartingale Random Walk Hypothesis is a consequence of the Efficient Market Hypothesis, which is itself a consequence of active investment,
$\textbf{Intelligent Investors} \rightarrow \textbf{Efficient Market} \rightarrow \textbf{Random Walks}$
One, often overlooked, aspect of market efficiency is the set of models, $\mathcal{M}$, used by investors. Consider that "random string" we looked at earlier, "1101000101001000100010000100110100110", and now consider the below model:
model <- function(binstr) {
  # Interpret the string as a binary number and return its decimal value
  bits <- rev(unlist(strsplit(as.character(binstr), "")) == "1")
  sum(2^(which(bits) - 1))
}
print(model("1101000101001000100010000100110100110"))
[1] 112358132134
Does it look familiar? 1, 1, 2, 3, 5, 8, 13, 21, 34, ... The sequence may look random to our mental model in binary, but to our model above it is decidedly not random! Therefore models are an important aspect of market efficiency,
Quantitative investing is the belief that with respect to any set of information, $\Phi_{i}$, and some set of models, $\mathcal{M}$, security prices consist of a signal component and a noise component. So for some combinations of $\Phi_{i}$ and $\mathcal{M}$ the market may be totally random, but for some other combinations of $\Phi_{i}$ and $\mathcal{M}$ it may only be semi-random.
Quantitative investors try to find combinations of $\Phi_{i}$ and $\mathcal{M}$ that reduce the randomness of the security we are trading. We seek information-rich data and build powerful models.
If the markets are random walks, what type of distribution do they have?
Generally speaking you get two types of randomness tests: parametric and nonparametric.
A good test of the random walk hypothesis makes few assumptions about the data ... but may, as a result, be less powerful than another more specific test.
This, and biases in randomness tests, are the reason why I believe we should ensemble randomness tests together when testing the random walk hypothesis.
And if the markets are random walks, are they random in all frequencies?
Most statistical tests of randomness are conducted on returns computed over a specific period of time, usually daily. But just because daily returns are random doesn't automatically imply that weekly or monthly returns are random too.
As such most randomness tests are conducted at multiple frequencies or, rather, across multiple lags. Eugene Fama's original paper looked at lags from one to ten days.
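Resampling to a lower frequency is simple for log-returns, since they aggregate additively: an n-day log-return is just the sum of its n daily log-returns. A minimal base-R sketch (the emh package introduced below has its own resampling helpers):

```r
# Aggregate a daily log-return series into non-overlapping n-day log-returns.
# Log-returns add, so each n-day return is the sum of its n daily returns.
to_n_day <- function(r, n) {
  m <- floor(length(r) / n) * n               # drop the ragged tail
  colSums(matrix(r[seq_len(m)], nrow = n))    # one column per n-day block
}

r_daily  <- rnorm(252, 0, 0.01)   # a year of simulated daily log-returns
r_weekly <- to_n_day(r_daily, 5)
length(r_weekly)                  # 50 weekly observations
```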
All of these issues are what inspired me to write the emh package for R. This package, which we will be going through shortly, makes it incredibly easy to run a suite of randomness tests on a financial time series object and extract the results of each test in the suite on the data sampled at different frequencies.
library(emh)
For weak-form market efficiency testing there are five types of randomness tests,
These are all univariate tests done to determine whether a time series of returns, $r$, is random with respect to itself, $\Phi = r$. Multivariate tests do exist but have, unfortunately, not been applied very often in the context of market efficiency testing ...
A run is a continuous sequence of either (-)'s - down days - or (+)'s - up days. In the early 1940s Abraham Wald and Jacob Wolfowitz proved that, for large samples, the number of runs is approximately normally distributed with,
$\mu = \frac{2 N_{+} N_{-}}{N} + 1$ and
$\sigma^2 = \frac{2 N_{+} N_{-}(2 N_{+} N_{-} - N)}{N^2 (N-1)} = \frac{(\mu - 1)(\mu - 2)}{N - 1}$
where $N_{+}$ is the number of +'s and $N_{-}$ is the number of -'s
This test is nonparametric, meaning that you can have 90% of your days being (+) and still test whether or not the sequence was random. It is quite a simple test and was used by Fama in his original papers on market randomness.
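For illustration, here is a minimal base-R sketch of the runs test z-statistic applied to the signs of a return series, using the Wald-Wolfowitz moments (a sketch only, not the emh implementation):

```r
# Wald-Wolfowitz runs test on the signs of a return series.
# Counts runs of consecutive +/- days and standardizes against the
# large-sample normal approximation.
runs_z <- function(r) {
  s     <- sign(r)[sign(r) != 0]       # +1 / -1 for up / down days
  runs  <- 1 + sum(diff(s) != 0)       # a new run starts at every sign change
  n_pos <- sum(s > 0); n_neg <- sum(s < 0); n <- n_pos + n_neg
  mu    <- 2 * n_pos * n_neg / n + 1
  sig2  <- (mu - 1) * (mu - 2) / (n - 1)
  (runs - mu) / sqrt(sig2)             # approximately N(0, 1) for large n
}

set.seed(1)
runs_z(rnorm(1000))   # typically within (-1.96, 1.96) for iid noise
```

A strictly alternating series (+, -, +, -, ...) has far too many runs and yields a large positive z, while a strongly trending series has too few runs and yields a large negative z.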
The Durbin-Watson test is named after James Durbin and Geoffrey Watson. It is a test for statistically significant levels of serial correlation in a time series. The Durbin-Watson test is conducted on the residuals, $\epsilon$, from a regression of the returns on their own lagged values.
As such, the test does take into account drift ... but only if you assume that drift is stationary and constant. The test can be done on the residuals of a moving average,
$d = \frac{\sum^{\tau}_{t=2}(\epsilon_{t} - \epsilon_{t-1})^2}{\sum^{\tau}_{t=1} \epsilon_{t}^2}$
James Durbin and Geoffrey Watson derived the distribution of the test statistic $d$; under the null hypothesis of no serial correlation $d$ is close to 2, and they tabulated upper and lower critical bounds for it.
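As a rough illustration, the statistic can be computed by hand in base R on the residuals of a regression of returns on their own lag (a sketch, not the emh implementation; the lmtest package also ships a packaged dwtest):

```r
# Durbin-Watson statistic computed on the residuals of a regression of
# returns on their own lag (an AR(1)-style fit).
dw_stat <- function(r) {
  fit <- lm(r[-1] ~ r[-length(r)])   # regress r_t on r_{t-1}
  e   <- residuals(fit)
  sum(diff(e)^2) / sum(e^2)          # d is close to 2 under randomness
}

set.seed(7)
dw_stat(rnorm(2000))   # close to 2 for white noise
```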
The Ljung-Box Test is named after Greta Ljung and George Box. It tests whether any of the autocorrelations, $\rho_i$, computed at lags 1 to $h$ for a time series is significantly different from zero,
$Q = \tau(\tau + 2)\sum^{h}_{k=1}\frac{\hat{\rho}^{2}_{k}}{\tau - k}$
where $\tau$ is the length of the time series and $\hat{\rho}^{2}_{k}$ is the squared autocorrelation calculated at lag $k$. Greta Ljung and George Box proved that the test statistic, $Q$, is asymptotically distributed according to the Chi-Squared distribution with $h$ degrees of freedom.
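Conveniently, the Ljung-Box test ships with base R as stats::Box.test, so you can try it directly. A quick check on white noise (which should not reject) versus an AR(1) series (which should):

```r
# Ljung-Box test via base R: white noise should not reject randomness,
# an autocorrelated AR(1) series should reject it decisively.
set.seed(11)
white <- rnorm(1000)
ar1   <- as.numeric(arima.sim(list(ar = 0.5), n = 1000))

Box.test(white, lag = 10, type = "Ljung-Box")$p.value  # typically large
Box.test(ar1,   lag = 10, type = "Ljung-Box")$p.value  # effectively zero
```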
The Durbin-Watson Test and the Ljung-Box Test are used very often, however some studies indicate that they are biased toward the null hypothesis. In other words, they are more likely to say that a time series is random than non-random.
The Breusch-Godfrey Test was developed by Trevor S. Breusch and Leslie G. Godfrey and is considered a more powerful test for autocorrelations than either the Durbin-Watson or the Ljung-Box test. The Breusch-Godfrey test also tests for statistically significant autocorrelation in the residuals, $\epsilon$, from a regression analysis.
Breusch and Godfrey proved that if you fit an auxiliary regression to the original data and the lagged residuals from a linear regression the statistic, $n R^{2}$, is asymptotically distributed according to the Chi-Squared distribution with degrees of freedom equal to the number of lagged residuals included,
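To make the auxiliary-regression idea concrete, here is a minimal base-R sketch (the lmtest package's bgtest offers a packaged version):

```r
# Breusch-Godfrey auxiliary regression for serial correlation up to order p
# in the residuals of an AR(1)-style fit of returns on their own lag.
bg_stat <- function(r, p = 5) {
  x   <- r[-length(r)]; y <- r[-1]
  e   <- residuals(lm(y ~ x))            # residuals of the original fit
  n   <- length(e)
  lag <- sapply(1:p, function(j) c(rep(0, j), e[1:(n - j)]))  # lagged residuals
  aux <- lm(e ~ x + lag)                 # auxiliary regression
  n * summary(aux)$r.squared             # ~ Chi-squared(p) under H0
}

set.seed(3)
bg_stat(rnorm(1000))   # small for white noise: compare with qchisq(0.95, 5) = 11.07
```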
Some of the most powerful randomness tests which exist are called Variance Ratio tests. In a random walk process the variance should scale linearly in the sampling interval.
We can express this relationship as the ratio of variances at two different frequencies,
$\sigma^2_{f=n} \approx n \sigma^2_{f=1}$, where $f$ denotes the sampling interval in periods ... therefore,
$\frac{\sigma^2_{f=n}}{n \sigma^2_{f=1}} \approx 1$
However if there are patterns or cycles in the returns data then the variances will not scale linearly in the sampling interval and the variance ratios will be off.
# Let's simulate a simple GBM random walk.
dates <- seq.Date(Sys.Date(), Sys.Date() + 2520, 1)
gbm_walk <- emh::simulate_brownian_motion(n = 2520)
gbm_walk <- zoo::zoo(gbm_walk, dates)
plot(emh::as_levels(gbm_walk))
# Now let's print its variance at different frequencies.
variances <- c()
for (i in 1:55) {
  suppressWarnings(variances <- c(variances,
                                  var(emh::as_frequency(gbm_walk, i))))
}
plot(variances)
# Now let's add a small noisy sine wave to the random walk.
dates <- seq.Date(Sys.Date(), Sys.Date() + 2520, 1)
gbm_walk <- gbm_walk + (sin(seq(1, 2521)) / 25)
gbm_walk <- zoo::zoo(gbm_walk, dates)
plot(emh::as_levels(gbm_walk))
# Now let's print its variance at different frequencies.
variances <- c()
for (i in 1:55) {
  suppressWarnings(variances <- c(variances,
                                  var(emh::as_frequency(gbm_walk, i))))
}
plot(variances)
In 1941 John von Neumann introduced a test of randomness based on the ratios of variances computed at different sampling intervals. The Von Neumann Ratio Test is a very good test of randomness under the assumption of normality.
In 1982 Robert Bartels created a nonparametric, rank-based version of the Von Neumann Ratio Test which doesn't assume that the data is normally distributed,
$RVN = \frac{\sum^{n-1}_{i=1}(\mathcal{R}_{i} -\mathcal{R}_{i+1})^2}{\sum^{n}_{i=1} (\mathcal{R}_{i} - (n + 1) / 2)^2}$
where $\mathcal{R}_i$ is the rank of the logarithmic return $r_{i}$ and $n$ is the length of the time series. Bartels proved that the statistic, $\frac{RVN - 2}{\sigma}$, is asymptotically standard normal with,
$\sigma^2 = \frac{4(n-2)(5n^2 - 2n - 9)}{5n(n+1)(n-1)^2}$
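Here is a minimal base-R sketch of the statistic as defined above (a sketch only; the emh package has its own implementation):

```r
# Bartels' rank version of the von Neumann ratio test: RVN is near 2 for
# random data, and (RVN - 2) / sigma is asymptotically standard normal.
bartels_z <- function(r) {
  R   <- rank(r); n <- length(r)
  rvn <- sum(diff(R)^2) / sum((R - (n + 1) / 2)^2)
  sig <- sqrt(4 * (n - 2) * (5 * n^2 - 2 * n - 9) / (5 * n * (n + 1) * (n - 1)^2))
  (rvn - 2) / sig
}

set.seed(5)
bartels_z(rnorm(1000))    # approximately N(0, 1) for iid data
```

A monotonically trending series produces successive ranks that barely change, driving RVN toward 0 and the z-statistic strongly negative.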
The Lo-MacKinlay Variance Ratio Test was first introduced in 1987 in a paper entitled "Stock Market Prices Do Not Follow Random Walks". The test is a parametric variance ratio test which works for all random walk models with finite variances,
Our data plus an interval, $q$,
Now compute the variance estimates of $X$ given $q$
Now $\sigma^2_{a}$ and $\sigma^2_{c}(q)$ should be approximately equal in a random walk,
Since $\sigma^2_{c}(q) \approx \sigma^2_{a}$, the statistic $\hat{M}_{r}(q) = \frac{\sigma^2_{c}(q)}{\sigma^2_{a}} - 1 \approx 0$ for a random walk
The test is extremely sensitive to deviations in $\hat{M}_{r}(q)$ from 0 which makes it a very powerful test of the random walk hypothesis assuming finite variances. Furthermore the test is consistent with stochastic volatility which is a well known stylized fact of market returns! Oh and $Z^*$ is asymptotically standard normal!
At this stage you may be wondering what the asymptotic variance actually is, because the above overview of the Lo-MacKinlay variance ratio test went something like this,
The asymptotic variance, $\hat{\theta}$, here refers to the variance of $\hat{M}_{r}(q)$ when the sample size approaches infinity. In other words it is the limit of the variance of $\hat{M}_{r}(q)$. It's somewhat tricky to calculate - as it requires two calculations - but here it is,
$\hat{\delta}(j) = \frac{nq \sum^{nq}_{k = j + 1} \big(X_{k} -X_{k-1} - \hat{\mu} \big)^2 \big(X_{k-j} -X_{k-j-1} - \hat{\mu}\big)^2}{\Bigg[ \sum^{nq}_{k=1} \big( X_{k} -X_{k-1} - \hat{\mu} \big)^2 \Bigg]^2}$
$\hat{\theta}(q) \equiv \sum^{q-1}_{j=1} \Big[ \frac{2(q-j)}{q} \Big]^2\hat{\delta}(j)$
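To make the pieces concrete, here is a minimal base-R sketch of the heteroskedasticity-robust $z^*$ statistic assembled from the estimators above (a sketch under the simplified variance formulas shown here; the emh implementation is the tested one):

```r
# Lo-MacKinlay variance ratio z-statistic with the heteroskedasticity-
# consistent asymptotic variance. X is a vector of log-prices and q the
# aggregation interval.
lomackinlay_z <- function(X, q) {
  nq  <- length(X) - 1
  mu  <- (X[nq + 1] - X[1]) / nq
  d1  <- diff(X, 1) - mu                       # one-period deviations
  dq  <- diff(X, q) - q * mu                   # q-period deviations
  s2a <- sum(d1^2) / nq                        # one-period variance estimate
  s2c <- sum(dq^2) / (q * (nq - q + 1) * (1 - q / nq))  # q-period estimate
  Mr  <- s2c / s2a - 1                         # variance ratio minus one
  dsq <- d1^2
  delta <- sapply(1:(q - 1), function(j)       # delta-hat(j) as defined above
    nq * sum(dsq[(j + 1):nq] * dsq[1:(nq - j)]) / sum(dsq)^2)
  theta <- sum(((2 * (q - 1:(q - 1))) / q)^2 * delta)
  sqrt(nq) * Mr / sqrt(theta)                  # z* is asymptotically N(0, 1)
}

set.seed(9)
lomackinlay_z(cumsum(rnorm(2000, 0, 0.01)), q = 2)   # near zero for a random walk
```

A strongly mean-reverting series (one-period returns that alternate in sign) has q-period variance far below q times the one-period variance, so its z-statistic is hugely negative.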
The good news is that I took the time to code and test this really cool randomness test in the emh package, so you don't have to worry about the stuff I just told you :-).
suppressMessages(library(Quandl))
Quandl.api_key("t6Rn1d5N1W6Qt4jJq_zC")
suppressMessages(library(PerformanceAnalytics))
Geometric Brownian Motion is a very popular stochastic process used in all areas of finance. The stochastic differential equation (SDE) for GBM is given below and the R code for simulating it is included in emh,
$dS_t = \mu S_t dt + \sigma S_t dW_t$
where $dW_t$ is a Wiener Process, $\mu$ is the annualized rate of return, $\sigma$ is the annualized volatility, and $dt$ is the rate of change of time (for daily this is $\frac{1}{252}$)
gbm_walk <- emh::simulate_brownian_motion(drift = 0.1, n = 3780)
dates <- seq.Date(Sys.Date(), Sys.Date() + 3779, 1)
gbm_walk <- emh::as_levels(zoo::zoo(gbm_walk, dates))
PerformanceAnalytics::charts.PerformanceSummary(emh::as_returns(gbm_walk))
df_gbm_walk <- emh::is_random(gbm_walk, a = 0.9999,
freqs1 = seq(1, 20),
freqs2 = c("Mon", "Tue", "Wed",
"Thu", "Fri", "Week"))
emh:::.plot_results_frequency(df_gbm_walk)
emh:::.plot_results_test_name(df_gbm_walk)
The Merton Jump Diffusion Model is another popular stochastic process. It combines the Geometric Brownian Motion process with a Poisson process, which is used to add discontinuities - a.k.a. jumps, a.k.a. stock market crashes - to the time series through time,
$d S_t = \mu S_t dt + \sigma S_t dW_t + d J_t$
where $J_t$ is the jump component given by,
$d J_t = S_t d \big( \sum^{N_t}_{i=0} (Y_i - 1) \big)$
where $N_t$ is the Poisson process with rate $\lambda$ and $Y_i$ is a random variable which follows a log-normal distribution. The R code for simulating the Jump Diffusion process is also included in the emh package. In later versions I will include calibrators as well.
jump_walk <- emh::simulate_merton_model(drift = 0.1, n = 3780,
jlambda = 0.4)
jump_walk <- emh::as_levels(zoo::zoo(jump_walk, dates))
PerformanceAnalytics::charts.PerformanceSummary(emh::as_returns(jump_walk))
df_jump_walk <- emh::is_random(jump_walk, a = 0.9999,
freqs1 = seq(1, 20),
freqs2 = c("Mon", "Tue", "Wed",
"Thu", "Fri", "Week"))
emh:::.plot_results_frequency(df_jump_walk)
emh:::.plot_results_test_name(df_jump_walk)
usdzar <- Quandl("CURRFX/USDZAR", type = "zoo")$Rate
PerformanceAnalytics::charts.PerformanceSummary(emh::as_returns(usdzar))
df_usdzar <- emh::is_random(usdzar, a = 0.9999,
freqs1 = seq(1, 20),
freqs2 = c("Mon", "Tue", "Wed",
"Thu", "Fri", "Week"))
emh:::.plot_results_frequency(df_usdzar)
emh:::.plot_results_test_name(df_usdzar)
stx40 <- Quandl("GOOG/JSE_STX40",
type = "zoo")$Close
PerformanceAnalytics::charts.PerformanceSummary(emh::as_returns(stx40))
df_stx40 <- emh::is_random(stx40, a = 0.9999,
freqs1 = seq(1, 20),
freqs2 = c("Mon", "Tue", "Wed",
"Thu", "Fri", "Week"))
emh:::.plot_results_frequency(df_stx40)
emh:::.plot_results_test_name(df_stx40)
stxfin <- Quandl("GOOG/JSE_STXFIN",
type = "zoo")$Close
PerformanceAnalytics::charts.PerformanceSummary(emh::as_returns(stxfin))
df_stxfin <- emh::is_random(stxfin, a = 0.9999,
freqs1 = seq(1, 20),
freqs2 = c("Mon", "Tue", "Wed",
"Thu", "Fri", "Week"))
emh:::.plot_results_frequency(df_stxfin)
emh:::.plot_results_test_name(df_stxfin)
One problem with the Lo-MacKinlay variance ratio test is that it is parametric, meaning that there are two possible interpretations when the test rejects.
Either the market is not random or the variance of security prices is infinite and the distribution of returns is described by a stable distribution.
A few people believe that returns are distributed according to a stable distribution. This version of the random walk hypothesis is known as the Stable Paretian Hypothesis and was first proposed by Benoit Mandelbrot. If true, then most of modern quantitative finance is wrong because the following things are not possible and / or meaningful,
One counter-argument to it being true is that if daily returns have infinite variance then so should returns at every other frequency, because the characteristic exponent of a stable distribution is invariant in the sampling interval ... but here's a simpler test:
If the failed Lo-MacKinlay variance ratio tests are due to the distribution of returns and not because there are patterns which cause the variance ratios to deviate from one, then if I shuffle the returns I should get the same result (the market is non-random).
Let's try it and see what happens,
stxfin_logrets <- emh::as_logreturns(stxfin)
stxfin_sim <- emh::simulate_permutation(logrets = stxfin_logrets,
window = 126)
stxfin_logrets <- head(stxfin_logrets, length(stxfin_logrets) - 14)
Warning message in emh::simulate_permutation(logrets = stxfin_logrets, window = 126): “logrets length is not a multiple of the window size discared the last 14 logrets in the series.”
PerformanceAnalytics::charts.PerformanceSummary(merge(exp(stxfin_logrets)-1,
exp(stxfin_sim)-1))
# Just so you believe me, here are the moments:
suppressMessages(library(moments))
for (i in seq(1, 5)) {
  print(paste(moment(stxfin_logrets, i) == moment(stxfin_sim, i),
              moment(stxfin_logrets, i)))
}
[1] "TRUE 0.00047281265294371"
[1] "TRUE 0.000244905060180935"
[1] "TRUE 1.7181101847927e-07"
[1] "TRUE 4.5431159026863e-07"
[1] "TRUE 4.09480614576749e-10"
df_stxfin_sim <- emh::is_random(emh::as_levels(stxfin_sim),
a = 0.9999,
freqs1 = seq(1, 20),
freqs2 = c("Mon", "Tue", "Wed",
"Thu", "Fri", "Week"))
emh:::.plot_results_frequency(df_stxfin_sim)
emh:::.plot_results_test_name(df_stxfin_sim)
Security market prices cannot be fully described by random walks. There is little evidence to support the Efficient Market or Random Walk Hypotheses so the true story is probably that security prices contain some noise and some signal.
As such, many of our quantitative models are incorrect meaning that assets are mispriced and risk is misunderstood. But let's not "throw the baby out with the bathwater". Random walks aren't perfect but they are the best tool we have right now and they do work.
That said, in the long run I think that we, as an industry, need to adopt semi-stochastic models learnt by Machine Learning algorithms. If ML can learn to write like Shakespeare, I'm sure it can learn how to simulate stock markets better than random walks!
Thank you for listening, I really hope you enjoyed the talk. Keep your eye on the emh package because I am going to be adding a tonne of new randomness tests to it over the following months. For more information about this topic, please check out the following: