The emh package allows you to test the efficiency of any univariate zoo time series object in R.
The package achieves this by downsampling the series into multiple lower frequencies and running a battery of randomness tests at each one.
In addition to randomness tests, emh also includes a number of stochastic process models.
The first step to using the package is to download and install it using the devtools R package:
library(devtools)
suppressMessages(install_github(repo = "stuartgordonreid/emh",
                                force = TRUE))
Now check that you can load the package:
suppressMessages(library(emh))
The emh package includes a few functions which allow you to download a bunch of global stock market indices from Quandl.com right off the bat. You can, of course, also pass in your own data. I recommend sticking with zoo objects when using emh because of how the downsampling works.
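If you would rather pass in your own data, a zoo series can be built from a numeric vector and a matching vector of dates. The following is a minimal sketch on simulated prices; the names dates, prices, and my_series are purely illustrative and are not part of emh.

```r
library(zoo)

set.seed(42)  # make the simulated example reproducible

# Hypothetical data: 500 daily observations of a simulated price path.
dates  <- seq(as.Date("2020-01-01"), by = "day", length.out = 500)
prices <- 100 * exp(cumsum(rnorm(500, mean = 0, sd = 0.01)))

# A univariate zoo series: numeric values ordered by a Date index.
my_series <- zoo(prices, order.by = dates)

head(my_series)
```

A series built this way has the same shape as a single downloaded index, so it should be usable wherever one of those is used in the rest of this walkthrough.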
# This may take some time. Use the $ operator to access individual datasets.
global_indices <- emh::data_quandl_downloader(data_quandl_indices())
[1] "DOWNLOADING DATASETS ..."
|======================================================================| 100%
Generating a data.frame with the results from each of the randomness tests is as easy as passing a zoo object into the is_random function in emh. This function will downsample the data into multiple lower frequencies and run a battery of tests on each subfrequency.
Frequencies are specified by the freqs1 and freqs2 arguments:
results <- is_random(S = global_indices$'YAHOO/INDEX_SML',
                     a = 0.99, # Use a 99% confidence level
                     freqs1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                     freqs2 = c("Mon", "Tue", "Wed", "Thu", "Fri", "Week", "Month"))
|======================================================================| 100%
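To build intuition for what the two frequency arguments mean, here is a base R sketch of one plausible way to downsample a daily series: keep every k-th observation for the integer frequencies, and keep only the observations that fall on a given weekday for the named ones. This is an illustration only. The helper names every_kth and on_weekday are hypothetical, and this is not emh's internal implementation.

```r
# Hypothetical daily series: one value per calendar day, starting on a Monday.
dates  <- seq(as.Date("2021-01-04"), by = "day", length.out = 28)
values <- seq_along(dates)

# Keep every k-th observation (k = 2 gives a t-2 to t sampling interval).
every_kth <- function(x, k) x[seq(k, length(x), by = k)]

# Keep observations on one ISO weekday (1 = Monday, ..., 7 = Sunday).
# format(d, "%u") is used instead of weekdays() so the result is locale independent.
on_weekday <- function(x, d, iso_day) x[format(d, "%u") == as.character(iso_day)]

every_kth(values, 2)          # every second observation
on_weekday(values, dates, 1)  # Mondays only
```

The Sample_Size column in the results table further down (7031, 3515, 2343, 1757, 1406) is consistent with this kind of every-k-th sampling, since each value equals 7031 divided by k and rounded down.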
You can now view the results (a data.frame object) or plot some interesting statistics:
head(results, 30)
Test_Name | Frequency | Sample_Size | Statistic | Two_sided_p | Z_Score | Non_Random |
---|---|---|---|---|---|---|
Independent Runs | (t-1 to t) | 7031 | 3222.000000 | 0.000000 | -6.401018 | TRUE |
Durbin-Watson | (t-1 to t) | 7031 | 2.000085 | 0.997102 | 2.759057 | TRUE |
Ljung-Box | (t-1 to t) | 7031 | 40.018562 | 0.000451 | -3.319694 | TRUE |
Breusch-Godfrey | (t-1 to t) | 7031 | 1.493007 | 0.221750 | -0.766295 | FALSE |
Bartell Rank | (t-1 to t) | 7031 | -4.581588 | 0.000005 | -4.434487 | TRUE |
Variance-Ratio LoMac | (t-1 to t) | 7031 | 0.524304 | 0.612558 | 0.285993 | FALSE |
Independent Runs | (t-2 to t) | 3515 | 1615.000000 | 0.000025 | -4.056609 | TRUE |
Durbin-Watson | (t-2 to t) | 3515 | 1.999914 | 0.997877 | 2.859250 | TRUE |
Ljung-Box | (t-2 to t) | 3515 | 35.686137 | 0.001962 | -2.884263 | TRUE |
Breusch-Godfrey | (t-2 to t) | 3515 | 0.002772 | 0.958014 | 1.728094 | FALSE |
Bartell Rank | (t-2 to t) | 3515 | -1.561292 | 0.118455 | -1.182746 | FALSE |
Variance-Ratio LoMac | (t-2 to t) | 3515 | -0.298505 | 0.427464 | -0.182833 | FALSE |
Independent Runs | (t-3 to t) | 2343 | 1107.000000 | 0.034269 | -1.821457 | FALSE |
Durbin-Watson | (t-3 to t) | 2343 | 2.000828 | 0.983218 | 2.125265 | FALSE |
Ljung-Box | (t-3 to t) | 2343 | 36.372465 | 0.001562 | -2.955341 | TRUE |
Breusch-Godfrey | (t-3 to t) | 2343 | 1.496615 | 0.221193 | -0.768171 | FALSE |
Bartell Rank | (t-3 to t) | 2343 | 0.044739 | 0.964315 | 1.803117 | FALSE |
Variance-Ratio LoMac | (t-3 to t) | 2343 | 2.593742 | 0.889218 | 1.222378 | FALSE |
Independent Runs | (t-4 to t) | 1757 | 861.000000 | 0.467113 | -0.082529 | FALSE |
Durbin-Watson | (t-4 to t) | 1757 | 1.993175 | 0.886997 | 1.210711 | FALSE |
Ljung-Box | (t-4 to t) | 1757 | 18.857366 | 0.220270 | -0.771281 | FALSE |
Breusch-Godfrey | (t-4 to t) | 1757 | 0.064177 | 0.800012 | 0.841665 | FALSE |
Bartell Rank | (t-4 to t) | 1757 | 1.664227 | 0.096067 | -1.304292 | FALSE |
Variance-Ratio LoMac | (t-4 to t) | 1757 | 2.066387 | 0.898717 | 1.274274 | FALSE |
Independent Runs | (t-5 to t) | 1406 | 713.000000 | 0.905517 | 1.313647 | FALSE |
Durbin-Watson | (t-5 to t) | 1406 | 1.995872 | 0.938608 | 1.543194 | FALSE |
Ljung-Box | (t-5 to t) | 1406 | 14.701537 | 0.473122 | -0.067424 | FALSE |
Breusch-Godfrey | (t-5 to t) | 1406 | 0.446644 | 0.503933 | 0.009859 | FALSE |
Bartell Rank | (t-5 to t) | 1406 | 1.576687 | 0.114868 | -1.201041 | FALSE |
Variance-Ratio LoMac | (t-5 to t) | 1406 | 0.509927 | 0.623519 | 0.314736 | FALSE |
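The Z_Score and Non_Random columns can be read against Two_sided_p and the confidence level a. Judging only from the values printed above (this is an inference from the output, not something taken from the package source), Z_Score looks like the standard normal quantile of the p-value, and a row is flagged when the p-value falls in either tail beyond (1 - a) / 2. A sketch in base R using three rows from the table:

```r
a <- 0.99  # confidence level passed to is_random above

# Two_sided_p values copied from the (t-1 to t) rows of the table:
# Ljung-Box, Breusch-Godfrey, Durbin-Watson.
p <- c(0.000451, 0.221750, 0.997102)

# Assumed relationship: Z_Score is the standard normal quantile of p.
z <- qnorm(p)

# Assumed rule: flag a result when p lies in either tail of width (1 - a) / 2.
tail_width <- (1 - a) / 2
non_random <- p < tail_width | p > 1 - tail_width

data.frame(p = p, z = round(z, 3), non_random = non_random)
```

Under this reading, a p-value very close to 1 is flagged just like one very close to 0, which would explain why the Durbin-Watson rows with p-values above 0.995 are marked TRUE.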
plot_results(results)
In the two graphs above we can see that there are a large number of non-random results at the $t-1$ to $t$ and $t-2$ to $t$ frequencies.
This might imply that the selected market, the S&P SmallCap 600 index, is non-random at those frequencies. When we look at the second graph we see that most of the results were produced by the Ljung-Box and Durbin-Watson statistical tests, which implies that there might exist some serial correlations in the data which are significantly different from zero. These tests do not tell us how economically significant the serial correlations are, nor do they tell us where in the data these serial correlations were observed. These questions are best left to the intrepid quant trader to answer.
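One way to start on that last question is to estimate the size of the serial correlation directly rather than only testing for its presence. The sketch below uses base R's Box.test and acf on simulated returns with a deliberately small autocorrelation. The data and the 0.1 coefficient are made up for illustration and are unrelated to the index above.

```r
set.seed(1)

# Simulated returns from an AR(1) process with a small, assumed coefficient of 0.1.
returns <- as.numeric(arima.sim(model = list(ar = 0.1), n = 5000, sd = 0.01))

# Ljung-Box test of the null hypothesis of no autocorrelation up to lag 5.
lb <- Box.test(returns, lag = 5, type = "Ljung-Box")

# Sample autocorrelations; element 1 is lag 0, so lag 1 is element 2.
rho  <- acf(returns, lag.max = 5, plot = FALSE)$acf
rho1 <- rho[2]

lb$p.value  # statistical significance of the serial correlation
rho1        # its magnitude, which is what matters economically
rho1^2      # rough share of next-period variance explained by the previous return
```

With a sample this large the test can reject randomness comfortably even when the previous return explains only a percent or so of the variance of the next one, which is the gap between statistical and economic significance.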