First, we set up HaskellR.
:ext QuasiQuotes
import qualified H.Prelude as H
H.initialize H.defaultConfig
Then, we set up R, loading all the packages we need.
[r|
library(keras)
library(dplyr)
library(ggplot2)
library(lubridate)
library(tidyr)
library(zoo)
library(forecast)
library(xts) |]
0x00007f64e42449a0
Let's assume our data is available as part of a larger trading application, written in Haskell.
Here we just load stock market data from files.
getAsDouble :: String -> [Double]
getAsDouble = map read . lines
googl <- fmap getAsDouble (readFile "googl.csv")
intl <- fmap getAsDouble (readFile "intl.csv")
nvda <- fmap getAsDouble (readFile "nvda.csv")
dates <- fmap lines (readFile "dates.csv")
Now we'd like to visualize our data (as time series, preferredly) and do some exploration.
How? Well - we do what we always do - use ggplot2...
The data shown are stock returns (= relative price differences). We compare returns for Google, Intel and Nvidia starting from 2017-01-01.
[rgraph|
df <- data.frame(tstamp = ymd(dates_hs), googl = googl_hs, intl = intl_hs, nvda = nvda_hs)
tss <<- read.zoo(df)
autoplot(tss) + facet_free() |]
We can quickly look at (auto-) correlations...
[rgraph|
acf(tss, na.action = na.pass) |]
And sure, we can quickly do an auto.arima on one of the series!
[rgraph|
intl_ts <- as.xts(tss[,2])
fit <- auto.arima(intl_ts)
fc <- forecast(fit, h=7)
plot(fc)|]
We need a TensorFlow installation for this, so let's check if R can find it!
[rprint| reticulate::py_config() |]
python: /nix/store/5kfn0xxh3ipjdyjly2d5wrmh4cidsm8k-python3-3.5.3/bin/python libpython: /nix/store/5kfn0xxh3ipjdyjly2d5wrmh4cidsm8k-python3-3.5.3/lib/libpython3.5m.so pythonhome: /nix/store/5kfn0xxh3ipjdyjly2d5wrmh4cidsm8k-python3-3.5.3:/nix/store/5kfn0xxh3ipjdyjly2d5wrmh4cidsm8k-python3-3.5.3 version: 3.5.3 (default, Jan 17 2017, 07:57:56) [GCC 5.4.0] numpy: /nix/store/7xk7ylqagd3w72y1p39rlcsamx3hcxl3-python3.5-numpy-1.12.1/lib/python3.5/site-packages/numpy numpy_version: 1.12.1 tensorflow: /nix/store/qmr3h3f1s1x1dr64sb61igig8469imn6-python3.5-tensorflow-1.1.0/lib/python3.5/site-packages/tensorflow python versions found: /nix/store/5kfn0xxh3ipjdyjly2d5wrmh4cidsm8k-python3-3.5.3/bin/python /usr/bin/python /usr/bin/python3
First, we prepare our time series so it's in the correct shape for an LSTM with 10 timesteps...
[rprint|
lstm_num_timesteps <<- 7
intl <<- unclass(tss[,2])
# difference
intl_start <- intl[1]
intl_diff <- diff(intl)
# normalize
minval <<- min(intl_diff)
maxval <<- max(intl_diff)
normalize <- function(vec, min, max) {
(vec-min) / (max-min)
}
denormalize <<- function(vec,min,max) {
vec * (max - min) + min
}
intl_diff <- normalize(intl_diff, minval, maxval)
# create timesteps
X_train <<- t(sapply(1:(length(intl_diff) - lstm_num_timesteps), function(x) intl_diff[x:(x + lstm_num_timesteps - 1)]))
y_train <<- sapply((lstm_num_timesteps + 1):(length(intl_diff)), function(x) intl_diff[x])
# Keras LSTMs expect the input array to be shaped as (no. samples, no. time steps, no. features)
dim(X_train) <<- c(dim(X_train)[1], dim(X_train)[2], 1)
num_samples <- dim(X_train)[1]
num_steps <<- dim(X_train)[2]
num_features <<- dim(X_train)[3]
c(num_samples, num_steps, num_features) |]
[1] 105 7 1
Now, we create the model!
-- at this point, there is no model yet
[rprint| model |]
[rprint|
batch_size <<- 1
epochs <<- 20
lstm_units <<- 4
model <<- keras_model_sequential()
|]
Model ________________________________________________________________________________ Layer (type) Output Shape Param # ================================================================================ Total params: 0 Trainable params: 0 Non-trainable params: 0 ________________________________________________________________________________
[rprint|
model %>%
layer_lstm(units = lstm_units, input_shape = c(num_steps, num_features)) %>%
layer_dense(units = 1) %>%
compile(
loss = 'mean_squared_error',
optimizer = 'adam'
)
model %>% summary()
|]
Model ________________________________________________________________________________ Layer (type) Output Shape Param # ================================================================================ lstm_1 (LSTM) (None, 4) 96 ________________________________________________________________________________ dense_1 (Dense) (None, 1) 5 ================================================================================ Total params: 101.0 Trainable params: 101 Non-trainable params: 0.0 ________________________________________________________________________________ NULL
... and we train it!
[rprint|
model %>% fit(X_train, y_train, batch_size = batch_size, epochs = epochs)
# model %>% save_model_hdf5(filepath = paste0(model_name, ".h5"))
|]
Epoch 1/20 1/105 [..............................] - ETA: 78s - loss: 0.0579 11/105 [==>...........................] - ETA: 6s - loss: 0.1364 24/105 [=====>........................] - ETA: 2s - loss: 0.1503 35/105 [=========>....................] - ETA: 1s - loss: 0.1336 46/105 [============>.................] - ETA: 1s - loss: 0.1304 58/105 [===============>..............] - ETA: 0s - loss: 0.1208 70/105 [===================>..........] - ETA: 0s - loss: 0.1082 82/105 [======================>.......] - ETA: 0s - loss: 0.0968 95/105 [==========================>...] - ETA: 0s - loss: 0.0861 105/105 [==============================] - 1s - loss: 0.0790 Epoch 2/20 1/105 [..............................] - ETA: 0s - loss: 0.0044 13/105 [==>...........................] - ETA: 0s - loss: 0.0421 25/105 [======>.......................] - ETA: 0s - loss: 0.0261 37/105 [=========>....................] - ETA: 0s - loss: 0.0242 49/105 [=============>................] - ETA: 0s - loss: 0.0215 61/105 [================>.............] - ETA: 0s - loss: 0.0276 73/105 [===================>..........] - ETA: 0s - loss: 0.0241 85/105 [=======================>......] - ETA: 0s - loss: 0.0218 97/105 [==========================>...] - ETA: 0s - loss: 0.0197 105/105 [==============================] - 0s - loss: 0.0197 Epoch 3/20 1/105 [..............................] - ETA: 0s - loss: 0.0033 12/105 [==>...........................] - ETA: 0s - loss: 0.0269 25/105 [======>.......................] - ETA: 0s - loss: 0.0225 37/105 [=========>....................] - ETA: 0s - loss: 0.0200 50/105 [=============>................] - ETA: 0s - loss: 0.0226 62/105 [================>.............] - ETA: 0s - loss: 0.0190 74/105 [====================>.........] - ETA: 0s - loss: 0.0181 86/105 [=======================>......] - ETA: 0s - loss: 0.0171 98/105 [===========================>..] - ETA: 0s - loss: 0.0164 105/105 [==============================] - 0s - loss: 0.0189 Epoch 4/20 1/105 [..............................] - ETA: 0s - loss: 0.0043 14/105 [===>..........................] - ETA: 0s - loss: 0.0365 25/105 [======>.......................] - ETA: 0s - loss: 0.0298 37/105 [=========>....................] - ETA: 0s - loss: 0.0236 47/105 [============>.................] - ETA: 0s - loss: 0.0234 58/105 [===============>..............] - ETA: 0s - loss: 0.0208 70/105 [===================>..........] - ETA: 0s - loss: 0.0185 82/105 [======================>.......] - ETA: 0s - loss: 0.0189 92/105 [=========================>....] - ETA: 0s - loss: 0.0174 103/105 [============================>.] - ETA: 0s - loss: 0.0183 105/105 [==============================] - 0s - loss: 0.0189 Epoch 5/20 1/105 [..............................] - ETA: 0s - loss: 0.0192 14/105 [===>..........................] - ETA: 0s - loss: 0.0148 25/105 [======>.......................] - ETA: 0s - loss: 0.0113 37/105 [=========>....................] - ETA: 0s - loss: 0.0105 47/105 [============>.................] - ETA: 0s - loss: 0.0101 58/105 [===============>..............] - ETA: 0s - loss: 0.0147 68/105 [==================>...........] - ETA: 0s - loss: 0.0177 79/105 [=====================>........] - ETA: 0s - loss: 0.0215 89/105 [========================>.....] - ETA: 0s - loss: 0.0205 101/105 [===========================>..] - ETA: 0s - loss: 0.0188 105/105 [==============================] - 0s - loss: 0.0190 Epoch 6/20 1/105 [..............................] - ETA: 0s - loss: 0.0111 14/105 [===>..........................] - ETA: 0s - loss: 0.0175 26/105 [======>.......................] - ETA: 0s - loss: 0.0114 39/105 [==========>...................] - ETA: 0s - loss: 0.0122 50/105 [=============>................] - ETA: 0s - loss: 0.0130 63/105 [=================>............] - ETA: 0s - loss: 0.0133 75/105 [====================>.........] - ETA: 0s - loss: 0.0196 88/105 [========================>.....] - ETA: 0s - loss: 0.0181 99/105 [===========================>..] - ETA: 0s - loss: 0.0173 105/105 [==============================] - 0s - loss: 0.0184 Epoch 7/20 1/105 [..............................] - ETA: 0s - loss: 0.0051 13/105 [==>...........................] - ETA: 0s - loss: 0.0068 25/105 [======>.......................] - ETA: 0s - loss: 0.0132 36/105 [=========>....................] - ETA: 0s - loss: 0.0150 48/105 [============>.................] - ETA: 0s - loss: 0.0129 59/105 [===============>..............] - ETA: 0s - loss: 0.0138 72/105 [===================>..........] - ETA: 0s - loss: 0.0199 83/105 [======================>.......] - ETA: 0s - loss: 0.0205 96/105 [==========================>...] - ETA: 0s - loss: 0.0192 105/105 [==============================] - 0s - loss: 0.0186 Epoch 8/20 1/105 [..............................] - ETA: 0s - loss: 0.0014 14/105 [===>..........................] - ETA: 0s - loss: 0.0145 25/105 [======>.......................] - ETA: 0s - loss: 0.0117 38/105 [=========>....................] - ETA: 0s - loss: 0.0123 50/105 [=============>................] - ETA: 0s - loss: 0.0194 63/105 [=================>............] - ETA: 0s - loss: 0.0220 74/105 [====================>.........] - ETA: 0s - loss: 0.0216 87/105 [=======================>......] - ETA: 0s - loss: 0.0191 97/105 [==========================>...] - ETA: 0s - loss: 0.0179 105/105 [==============================] - 0s - loss: 0.0187 Epoch 9/20 1/105 [..............................] - ETA: 0s - loss: 0.0030 12/105 [==>...........................] - ETA: 0s - loss: 0.0073 25/105 [======>.......................] - ETA: 0s - loss: 0.0234 36/105 [=========>....................] - ETA: 0s - loss: 0.0188 48/105 [============>.................] - ETA: 0s - loss: 0.0192 58/105 [===============>..............] - ETA: 0s - loss: 0.0194 70/105 [===================>..........] - ETA: 0s - loss: 0.0191 80/105 [=====================>........] - ETA: 0s - loss: 0.0183 91/105 [=========================>....] - ETA: 0s - loss: 0.0183 102/105 [============================>.] - ETA: 0s - loss: 0.0182 105/105 [==============================] - 0s - loss: 0.0180 Epoch 10/20 1/105 [..............................] - ETA: 0s - loss: 0.0168 11/105 [==>...........................] - ETA: 0s - loss: 0.0263 22/105 [=====>........................] - ETA: 0s - loss: 0.0183 33/105 [========>.....................] - ETA: 0s - loss: 0.0253 44/105 [===========>..................] - ETA: 0s - loss: 0.0201 55/105 [==============>...............] - ETA: 0s - loss: 0.0194 68/105 [==================>...........] - ETA: 0s - loss: 0.0215 79/105 [=====================>........] - ETA: 0s - loss: 0.0199 92/105 [=========================>....] - ETA: 0s - loss: 0.0194 102/105 [============================>.] - ETA: 0s - loss: 0.0179 105/105 [==============================] - 0s - loss: 0.0183 Epoch 11/20 1/105 [..............................] - ETA: 0s - loss: 0.0036 14/105 [===>..........................] - ETA: 0s - loss: 0.0249 26/105 [======>.......................] - ETA: 0s - loss: 0.0198 38/105 [=========>....................] - ETA: 0s - loss: 0.0187 50/105 [=============>................] - ETA: 0s - loss: 0.0237 62/105 [================>.............] - ETA: 0s - loss: 0.0222 74/105 [====================>.........] - ETA: 0s - loss: 0.0199 85/105 [=======================>......] - ETA: 0s - loss: 0.0200 97/105 [==========================>...] - ETA: 0s - loss: 0.0181 105/105 [==============================] - 0s - loss: 0.0177 Epoch 12/20 1/105 [..............................] - ETA: 0s - loss: 0.1567 13/105 [==>...........................] - ETA: 0s - loss: 0.0303 25/105 [======>.......................] - ETA: 0s - loss: 0.0208 36/105 [=========>....................] - ETA: 0s - loss: 0.0258 49/105 [=============>................] - ETA: 0s - loss: 0.0267 61/105 [================>.............] - ETA: 0s - loss: 0.0227 73/105 [===================>..........] - ETA: 0s - loss: 0.0208 83/105 [======================>.......] - ETA: 0s - loss: 0.0203 92/105 [=========================>....] - ETA: 0s - loss: 0.0193 105/105 [==============================] - 0s - loss: 0.0183 Epoch 13/20 1/105 [..............................] - ETA: 0s - loss: 1.1790e-04 13/105 [==>...........................] - ETA: 0s - loss: 0.0085 26/105 [======>.......................] - ETA: 0s - loss: 0.0089 37/105 [=========>....................] - ETA: 0s - loss: 0.0089 49/105 [=============>................] - ETA: 0s - loss: 0.0154 60/105 [================>.............] - ETA: 0s - loss: 0.0148 72/105 [===================>..........] - ETA: 0s - loss: 0.0175 82/105 [======================>.......] - ETA: 0s - loss: 0.0161 93/105 [=========================>....] - ETA: 0s - loss: 0.0166 103/105 [============================>.] - ETA: 0s - loss: 0.0175 105/105 [==============================] - 0s - loss: 0.0176 Epoch 14/20 1/105 [..............................] - ETA: 0s - loss: 0.0368 12/105 [==>...........................] - ETA: 0s - loss: 0.0129 22/105 [=====>........................] - ETA: 0s - loss: 0.0116 33/105 [========>.....................] - ETA: 0s - loss: 0.0117 43/105 [===========>..................] - ETA: 0s - loss: 0.0139 54/105 [==============>...............] - ETA: 0s - loss: 0.0137 64/105 [=================>............] - ETA: 0s - loss: 0.0148 74/105 [====================>.........] - ETA: 0s - loss: 0.0152 87/105 [=======================>......] - ETA: 0s - loss: 0.0195 98/105 [===========================>..] - ETA: 0s - loss: 0.0183 105/105 [==============================] - 0s - loss: 0.0177 Epoch 15/20 1/105 [..............................] - ETA: 0s - loss: 0.0085 13/105 [==>...........................] - ETA: 0s - loss: 0.0262 26/105 [======>.......................] - ETA: 0s - loss: 0.0250 38/105 [=========>....................] - ETA: 0s - loss: 0.0182 51/105 [=============>................] - ETA: 0s - loss: 0.0174 63/105 [=================>............] - ETA: 0s - loss: 0.0159 76/105 [====================>.........] - ETA: 0s - loss: 0.0169 88/105 [========================>.....] - ETA: 0s - loss: 0.0192 101/105 [===========================>..] - ETA: 0s - loss: 0.0179 105/105 [==============================] - 0s - loss: 0.0175 Epoch 16/20 1/105 [..............................] - ETA: 0s - loss: 0.0016 13/105 [==>...........................] - ETA: 0s - loss: 0.0172 25/105 [======>.......................] - ETA: 0s - loss: 0.0158 37/105 [=========>....................] - ETA: 0s - loss: 0.0147 49/105 [=============>................] - ETA: 0s - loss: 0.0198 60/105 [================>.............] - ETA: 0s - loss: 0.0179 72/105 [===================>..........] - ETA: 0s - loss: 0.0168 83/105 [======================>.......] - ETA: 0s - loss: 0.0180 95/105 [==========================>...] - ETA: 0s - loss: 0.0175 105/105 [==============================] - 0s - loss: 0.0176 Epoch 17/20 1/105 [..............................] - ETA: 0s - loss: 0.0018 12/105 [==>...........................] - ETA: 0s - loss: 0.0055 25/105 [======>.......................] - ETA: 0s - loss: 0.0119 37/105 [=========>....................] - ETA: 0s - loss: 0.0142 50/105 [=============>................] - ETA: 0s - loss: 0.0125 61/105 [================>.............] - ETA: 0s - loss: 0.0191 73/105 [===================>..........] - ETA: 0s - loss: 0.0171 85/105 [=======================>......] - ETA: 0s - loss: 0.0164 98/105 [===========================>..] - ETA: 0s - loss: 0.0180 105/105 [==============================] - 0s - loss: 0.0171 Epoch 18/20 1/105 [..............................] - ETA: 0s - loss: 0.0091 11/105 [==>...........................] - ETA: 0s - loss: 0.0257 19/105 [====>.........................] - ETA: 0s - loss: 0.0164 27/105 [======>.......................] - ETA: 0s - loss: 0.0170 34/105 [========>.....................] - ETA: 0s - loss: 0.0248 40/105 [==========>...................] - ETA: 0s - loss: 0.0215 46/105 [============>.................] - ETA: 0s - loss: 0.0194 54/105 [==============>...............] - ETA: 0s - loss: 0.0197 61/105 [================>.............] - ETA: 0s - loss: 0.0195 69/105 [==================>...........] - ETA: 0s - loss: 0.0184 75/105 [====================>.........] - ETA: 0s - loss: 0.0176 83/105 [======================>.......] - ETA: 0s - loss: 0.0193 90/105 [========================>.....] - ETA: 0s - loss: 0.0187 99/105 [===========================>..] - ETA: 0s - loss: 0.0179 105/105 [==============================] - 0s - loss: 0.0171 Epoch 19/20 1/105 [..............................] - ETA: 0s - loss: 0.0022 13/105 [==>...........................] - ETA: 0s - loss: 0.0280 23/105 [=====>........................] - ETA: 0s - loss: 0.0176 35/105 [=========>....................] - ETA: 0s - loss: 0.0161 46/105 [============>.................] - ETA: 0s - loss: 0.0170 57/105 [===============>..............] - ETA: 0s - loss: 0.0150 70/105 [===================>..........] - ETA: 0s - loss: 0.0168 81/105 [======================>.......] - ETA: 0s - loss: 0.0189 94/105 [=========================>....] - ETA: 0s - loss: 0.0175 105/105 [==============================] - 0s - loss: 0.0170 Epoch 20/20 1/105 [..............................] - ETA: 0s - loss: 4.3640e-04 14/105 [===>..........................] - ETA: 0s - loss: 0.0201 25/105 [======>.......................] - ETA: 0s - loss: 0.0128 38/105 [=========>....................] - ETA: 0s - loss: 0.0224 48/105 [============>.................] - ETA: 0s - loss: 0.0187 61/105 [================>.............] - ETA: 0s - loss: 0.0207 72/105 [===================>..........] - ETA: 0s - loss: 0.0195 85/105 [=======================>......] - ETA: 0s - loss: 0.0196 96/105 [==========================>...] - ETA: 0s - loss: 0.0182 105/105 [==============================] - 0s - loss: 0.0173 Trained on 105 samples, validated on NULL samples (batch_size=1, epochs=20) Final epoch (plot to see history): loss: 0.01734
Let's see how well the model predicts the data (of course, in reality we'd have a test series)!
[rgraph|
pred_train <- model %>% predict(X_train, batch_size = 1)
pred_train <- denormalize(pred_train, minval, maxval)
pred_train_undiff <- pred_train + intl[(lstm_num_timesteps+1):(length(intl)-1)]
c(length(intl), length(pred_train))
df <- data_frame(time_id = 1:113,
train = intl,
pred_train = c(rep(NA, lstm_num_timesteps+1), pred_train_undiff))
df <- df %>% gather(key = 'type', value = 'value', train:pred_train)
ggplot(df, aes(x = time_id, y = value)) + geom_line(aes(color = type)) + theme(aspect.ratio=0.8)
|]
Not bad for such a short training time, is it? :-)