Using R from Haskell: Stock market demo

Setup

First, we set up HaskellR.

In [1]:
:ext QuasiQuotes
import qualified H.Prelude as H
H.initialize H.defaultConfig

Then, we set up R, loading all the packages we need.

In [2]:
[r|
  library(keras)
  library(dplyr)
  library(ggplot2)
  library(lubridate)
  library(tidyr)
  library(zoo)
  library(forecast)
  library(xts) |]
0x00007f64e42449a0

Our data's available in Haskell

Let's assume our data is available as part of a larger trading application, written in Haskell.

Here we just load stock market data from files.

In [3]:
getAsDouble :: String -> [Double]
getAsDouble =  map read . lines
googl <- fmap getAsDouble (readFile "googl.csv") 
intl <- fmap getAsDouble (readFile "intl.csv") 
nvda <- fmap getAsDouble (readFile "nvda.csv") 
dates <- fmap lines (readFile "dates.csv")

Now we'd like to visualize our data (as time series, preferredly) and do some exploration.

Explore the data - using R

How? Well - we do what we always do - use ggplot2...

The data shown are stock returns (= relative price differences). We compare returns for Google, Intel and Nvidia starting from 2017-01-01.

In [7]:
[rgraph|
  df <- data.frame(tstamp = ymd(dates_hs), googl = googl_hs, intl = intl_hs, nvda = nvda_hs) 
  tss <<- read.zoo(df)
  autoplot(tss) + facet_free() |]

We can quickly look at (auto-) correlations...

In [8]:
[rgraph|
  acf(tss, na.action = na.pass) |]

And sure, we can quickly do an auto.arima on one of the series!

Let's get forecasting - using R's forecast package, of course!

In [9]:
[rgraph|
  intl_ts <- as.xts(tss[,2])
  fit <- auto.arima(intl_ts)
  fc  <- forecast(fit, h=7)
  plot(fc)|]

But deep learning is all the hype now... anything we can do?

Sure: use keras from R, with the keras package!

We need a TensorFlow installation for this, so let's check if R can find it!

In [10]:
[rprint| reticulate::py_config() |]
python:         /nix/store/5kfn0xxh3ipjdyjly2d5wrmh4cidsm8k-python3-3.5.3/bin/python
libpython:      /nix/store/5kfn0xxh3ipjdyjly2d5wrmh4cidsm8k-python3-3.5.3/lib/libpython3.5m.so
pythonhome:     /nix/store/5kfn0xxh3ipjdyjly2d5wrmh4cidsm8k-python3-3.5.3:/nix/store/5kfn0xxh3ipjdyjly2d5wrmh4cidsm8k-python3-3.5.3
version:        3.5.3 (default, Jan 17 2017, 07:57:56)  [GCC 5.4.0]
numpy:          /nix/store/7xk7ylqagd3w72y1p39rlcsamx3hcxl3-python3.5-numpy-1.12.1/lib/python3.5/site-packages/numpy
numpy_version:  1.12.1
tensorflow:     /nix/store/qmr3h3f1s1x1dr64sb61igig8469imn6-python3.5-tensorflow-1.1.0/lib/python3.5/site-packages/tensorflow

python versions found: 
 /nix/store/5kfn0xxh3ipjdyjly2d5wrmh4cidsm8k-python3-3.5.3/bin/python
 /usr/bin/python
 /usr/bin/python3

First, we prepare our time series so it's in the correct shape for an LSTM with 10 timesteps...

In [11]:
[rprint| 
  lstm_num_timesteps <<- 7
  
  intl <<- unclass(tss[,2])
  
   # difference
  intl_start <- intl[1]
  intl_diff <- diff(intl)
  
  # normalize
  minval <<- min(intl_diff)
  maxval <<- max(intl_diff)
  normalize <- function(vec, min, max) {
    (vec-min) / (max-min)
  }
  denormalize <<- function(vec,min,max) {
    vec * (max - min) + min
  }
  intl_diff <- normalize(intl_diff, minval, maxval) 
  
  # create timesteps
  X_train <<- t(sapply(1:(length(intl_diff) - lstm_num_timesteps), function(x) intl_diff[x:(x + lstm_num_timesteps - 1)]))
  y_train <<- sapply((lstm_num_timesteps + 1):(length(intl_diff)), function(x) intl_diff[x])
  
  # Keras LSTMs expect the input array to be shaped as (no. samples, no. time steps, no. features)
  dim(X_train) <<- c(dim(X_train)[1], dim(X_train)[2], 1)
  num_samples <- dim(X_train)[1]
  num_steps <<- dim(X_train)[2]
  num_features <<- dim(X_train)[3]
  c(num_samples, num_steps, num_features) |]
[1] 105   7   1

Now, we create the model!

In [12]:
-- at this point, there is no model yet
[rprint| model |]
R Runtime Error: Error in (function () : object 'model' not found
In [13]:
[rprint| 
 
  batch_size <<- 1
  epochs <<- 20
  lstm_units <<- 4

  model <<- keras_model_sequential()
  |]
Model
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #     
================================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
________________________________________________________________________________
In [14]:
[rprint| 
 
  model %>% 
    layer_lstm(units = lstm_units, input_shape = c(num_steps, num_features)) %>% 
    layer_dense(units = 1) %>% 
    compile(
      loss = 'mean_squared_error',
      optimizer = 'adam'
    )
  model %>% summary()
  |]
Model
________________________________________________________________________________
Layer (type)                        Output Shape                    Param #     
================================================================================
lstm_1 (LSTM)                       (None, 4)                       96          
________________________________________________________________________________
dense_1 (Dense)                     (None, 1)                       5           
================================================================================
Total params: 101.0
Trainable params: 101
Non-trainable params: 0.0
________________________________________________________________________________

 
NULL

... and we train it!

In [15]:
[rprint| 
 
  model %>% fit(X_train, y_train, batch_size = batch_size, epochs = epochs)
  # model %>% save_model_hdf5(filepath = paste0(model_name, ".h5"))
  |]
Epoch 1/20

  1/105 [..............................] - ETA: 78s - loss: 0.0579
 11/105 [==>...........................] - ETA: 6s - loss: 0.1364 
 24/105 [=====>........................] - ETA: 2s - loss: 0.1503
 35/105 [=========>....................] - ETA: 1s - loss: 0.1336
 46/105 [============>.................] - ETA: 1s - loss: 0.1304
 58/105 [===============>..............] - ETA: 0s - loss: 0.1208
 70/105 [===================>..........] - ETA: 0s - loss: 0.1082
 82/105 [======================>.......] - ETA: 0s - loss: 0.0968
 95/105 [==========================>...] - ETA: 0s - loss: 0.0861
105/105 [==============================] - 1s - loss: 0.0790     
Epoch 2/20

  1/105 [..............................] - ETA: 0s - loss: 0.0044
 13/105 [==>...........................] - ETA: 0s - loss: 0.0421
 25/105 [======>.......................] - ETA: 0s - loss: 0.0261
 37/105 [=========>....................] - ETA: 0s - loss: 0.0242
 49/105 [=============>................] - ETA: 0s - loss: 0.0215
 61/105 [================>.............] - ETA: 0s - loss: 0.0276
 73/105 [===================>..........] - ETA: 0s - loss: 0.0241
 85/105 [=======================>......] - ETA: 0s - loss: 0.0218
 97/105 [==========================>...] - ETA: 0s - loss: 0.0197
105/105 [==============================] - 0s - loss: 0.0197     
Epoch 3/20

  1/105 [..............................] - ETA: 0s - loss: 0.0033
 12/105 [==>...........................] - ETA: 0s - loss: 0.0269
 25/105 [======>.......................] - ETA: 0s - loss: 0.0225
 37/105 [=========>....................] - ETA: 0s - loss: 0.0200
 50/105 [=============>................] - ETA: 0s - loss: 0.0226
 62/105 [================>.............] - ETA: 0s - loss: 0.0190
 74/105 [====================>.........] - ETA: 0s - loss: 0.0181
 86/105 [=======================>......] - ETA: 0s - loss: 0.0171
 98/105 [===========================>..] - ETA: 0s - loss: 0.0164
105/105 [==============================] - 0s - loss: 0.0189     
Epoch 4/20

  1/105 [..............................] - ETA: 0s - loss: 0.0043
 14/105 [===>..........................] - ETA: 0s - loss: 0.0365
 25/105 [======>.......................] - ETA: 0s - loss: 0.0298
 37/105 [=========>....................] - ETA: 0s - loss: 0.0236
 47/105 [============>.................] - ETA: 0s - loss: 0.0234
 58/105 [===============>..............] - ETA: 0s - loss: 0.0208
 70/105 [===================>..........] - ETA: 0s - loss: 0.0185
 82/105 [======================>.......] - ETA: 0s - loss: 0.0189
 92/105 [=========================>....] - ETA: 0s - loss: 0.0174
103/105 [============================>.] - ETA: 0s - loss: 0.0183
105/105 [==============================] - 0s - loss: 0.0189     
Epoch 5/20

  1/105 [..............................] - ETA: 0s - loss: 0.0192
 14/105 [===>..........................] - ETA: 0s - loss: 0.0148
 25/105 [======>.......................] - ETA: 0s - loss: 0.0113
 37/105 [=========>....................] - ETA: 0s - loss: 0.0105
 47/105 [============>.................] - ETA: 0s - loss: 0.0101
 58/105 [===============>..............] - ETA: 0s - loss: 0.0147
 68/105 [==================>...........] - ETA: 0s - loss: 0.0177
 79/105 [=====================>........] - ETA: 0s - loss: 0.0215
 89/105 [========================>.....] - ETA: 0s - loss: 0.0205
101/105 [===========================>..] - ETA: 0s - loss: 0.0188
105/105 [==============================] - 0s - loss: 0.0190     
Epoch 6/20

  1/105 [..............................] - ETA: 0s - loss: 0.0111
 14/105 [===>..........................] - ETA: 0s - loss: 0.0175
 26/105 [======>.......................] - ETA: 0s - loss: 0.0114
 39/105 [==========>...................] - ETA: 0s - loss: 0.0122
 50/105 [=============>................] - ETA: 0s - loss: 0.0130
 63/105 [=================>............] - ETA: 0s - loss: 0.0133
 75/105 [====================>.........] - ETA: 0s - loss: 0.0196
 88/105 [========================>.....] - ETA: 0s - loss: 0.0181
 99/105 [===========================>..] - ETA: 0s - loss: 0.0173
105/105 [==============================] - 0s - loss: 0.0184     
Epoch 7/20

  1/105 [..............................] - ETA: 0s - loss: 0.0051
 13/105 [==>...........................] - ETA: 0s - loss: 0.0068
 25/105 [======>.......................] - ETA: 0s - loss: 0.0132
 36/105 [=========>....................] - ETA: 0s - loss: 0.0150
 48/105 [============>.................] - ETA: 0s - loss: 0.0129
 59/105 [===============>..............] - ETA: 0s - loss: 0.0138
 72/105 [===================>..........] - ETA: 0s - loss: 0.0199
 83/105 [======================>.......] - ETA: 0s - loss: 0.0205
 96/105 [==========================>...] - ETA: 0s - loss: 0.0192
105/105 [==============================] - 0s - loss: 0.0186     
Epoch 8/20

  1/105 [..............................] - ETA: 0s - loss: 0.0014
 14/105 [===>..........................] - ETA: 0s - loss: 0.0145
 25/105 [======>.......................] - ETA: 0s - loss: 0.0117
 38/105 [=========>....................] - ETA: 0s - loss: 0.0123
 50/105 [=============>................] - ETA: 0s - loss: 0.0194
 63/105 [=================>............] - ETA: 0s - loss: 0.0220
 74/105 [====================>.........] - ETA: 0s - loss: 0.0216
 87/105 [=======================>......] - ETA: 0s - loss: 0.0191
 97/105 [==========================>...] - ETA: 0s - loss: 0.0179
105/105 [==============================] - 0s - loss: 0.0187     
Epoch 9/20

  1/105 [..............................] - ETA: 0s - loss: 0.0030
 12/105 [==>...........................] - ETA: 0s - loss: 0.0073
 25/105 [======>.......................] - ETA: 0s - loss: 0.0234
 36/105 [=========>....................] - ETA: 0s - loss: 0.0188
 48/105 [============>.................] - ETA: 0s - loss: 0.0192
 58/105 [===============>..............] - ETA: 0s - loss: 0.0194
 70/105 [===================>..........] - ETA: 0s - loss: 0.0191
 80/105 [=====================>........] - ETA: 0s - loss: 0.0183
 91/105 [=========================>....] - ETA: 0s - loss: 0.0183
102/105 [============================>.] - ETA: 0s - loss: 0.0182
105/105 [==============================] - 0s - loss: 0.0180     
Epoch 10/20

  1/105 [..............................] - ETA: 0s - loss: 0.0168
 11/105 [==>...........................] - ETA: 0s - loss: 0.0263
 22/105 [=====>........................] - ETA: 0s - loss: 0.0183
 33/105 [========>.....................] - ETA: 0s - loss: 0.0253
 44/105 [===========>..................] - ETA: 0s - loss: 0.0201
 55/105 [==============>...............] - ETA: 0s - loss: 0.0194
 68/105 [==================>...........] - ETA: 0s - loss: 0.0215
 79/105 [=====================>........] - ETA: 0s - loss: 0.0199
 92/105 [=========================>....] - ETA: 0s - loss: 0.0194
102/105 [============================>.] - ETA: 0s - loss: 0.0179
105/105 [==============================] - 0s - loss: 0.0183     
Epoch 11/20

  1/105 [..............................] - ETA: 0s - loss: 0.0036
 14/105 [===>..........................] - ETA: 0s - loss: 0.0249
 26/105 [======>.......................] - ETA: 0s - loss: 0.0198
 38/105 [=========>....................] - ETA: 0s - loss: 0.0187
 50/105 [=============>................] - ETA: 0s - loss: 0.0237
 62/105 [================>.............] - ETA: 0s - loss: 0.0222
 74/105 [====================>.........] - ETA: 0s - loss: 0.0199
 85/105 [=======================>......] - ETA: 0s - loss: 0.0200
 97/105 [==========================>...] - ETA: 0s - loss: 0.0181
105/105 [==============================] - 0s - loss: 0.0177     
Epoch 12/20

  1/105 [..............................] - ETA: 0s - loss: 0.1567
 13/105 [==>...........................] - ETA: 0s - loss: 0.0303
 25/105 [======>.......................] - ETA: 0s - loss: 0.0208
 36/105 [=========>....................] - ETA: 0s - loss: 0.0258
 49/105 [=============>................] - ETA: 0s - loss: 0.0267
 61/105 [================>.............] - ETA: 0s - loss: 0.0227
 73/105 [===================>..........] - ETA: 0s - loss: 0.0208
 83/105 [======================>.......] - ETA: 0s - loss: 0.0203
 92/105 [=========================>....] - ETA: 0s - loss: 0.0193
105/105 [==============================] - 0s - loss: 0.0183     
Epoch 13/20

  1/105 [..............................] - ETA: 0s - loss: 1.1790e-04
 13/105 [==>...........................] - ETA: 0s - loss: 0.0085    
 26/105 [======>.......................] - ETA: 0s - loss: 0.0089
 37/105 [=========>....................] - ETA: 0s - loss: 0.0089
 49/105 [=============>................] - ETA: 0s - loss: 0.0154
 60/105 [================>.............] - ETA: 0s - loss: 0.0148
 72/105 [===================>..........] - ETA: 0s - loss: 0.0175
 82/105 [======================>.......] - ETA: 0s - loss: 0.0161
 93/105 [=========================>....] - ETA: 0s - loss: 0.0166
103/105 [============================>.] - ETA: 0s - loss: 0.0175
105/105 [==============================] - 0s - loss: 0.0176     
Epoch 14/20

  1/105 [..............................] - ETA: 0s - loss: 0.0368
 12/105 [==>...........................] - ETA: 0s - loss: 0.0129
 22/105 [=====>........................] - ETA: 0s - loss: 0.0116
 33/105 [========>.....................] - ETA: 0s - loss: 0.0117
 43/105 [===========>..................] - ETA: 0s - loss: 0.0139
 54/105 [==============>...............] - ETA: 0s - loss: 0.0137
 64/105 [=================>............] - ETA: 0s - loss: 0.0148
 74/105 [====================>.........] - ETA: 0s - loss: 0.0152
 87/105 [=======================>......] - ETA: 0s - loss: 0.0195
 98/105 [===========================>..] - ETA: 0s - loss: 0.0183
105/105 [==============================] - 0s - loss: 0.0177     
Epoch 15/20

  1/105 [..............................] - ETA: 0s - loss: 0.0085
 13/105 [==>...........................] - ETA: 0s - loss: 0.0262
 26/105 [======>.......................] - ETA: 0s - loss: 0.0250
 38/105 [=========>....................] - ETA: 0s - loss: 0.0182
 51/105 [=============>................] - ETA: 0s - loss: 0.0174
 63/105 [=================>............] - ETA: 0s - loss: 0.0159
 76/105 [====================>.........] - ETA: 0s - loss: 0.0169
 88/105 [========================>.....] - ETA: 0s - loss: 0.0192
101/105 [===========================>..] - ETA: 0s - loss: 0.0179
105/105 [==============================] - 0s - loss: 0.0175     
Epoch 16/20

  1/105 [..............................] - ETA: 0s - loss: 0.0016
 13/105 [==>...........................] - ETA: 0s - loss: 0.0172
 25/105 [======>.......................] - ETA: 0s - loss: 0.0158
 37/105 [=========>....................] - ETA: 0s - loss: 0.0147
 49/105 [=============>................] - ETA: 0s - loss: 0.0198
 60/105 [================>.............] - ETA: 0s - loss: 0.0179
 72/105 [===================>..........] - ETA: 0s - loss: 0.0168
 83/105 [======================>.......] - ETA: 0s - loss: 0.0180
 95/105 [==========================>...] - ETA: 0s - loss: 0.0175
105/105 [==============================] - 0s - loss: 0.0176     
Epoch 17/20

  1/105 [..............................] - ETA: 0s - loss: 0.0018
 12/105 [==>...........................] - ETA: 0s - loss: 0.0055
 25/105 [======>.......................] - ETA: 0s - loss: 0.0119
 37/105 [=========>....................] - ETA: 0s - loss: 0.0142
 50/105 [=============>................] - ETA: 0s - loss: 0.0125
 61/105 [================>.............] - ETA: 0s - loss: 0.0191
 73/105 [===================>..........] - ETA: 0s - loss: 0.0171
 85/105 [=======================>......] - ETA: 0s - loss: 0.0164
 98/105 [===========================>..] - ETA: 0s - loss: 0.0180
105/105 [==============================] - 0s - loss: 0.0171     
Epoch 18/20

  1/105 [..............................] - ETA: 0s - loss: 0.0091
 11/105 [==>...........................] - ETA: 0s - loss: 0.0257
 19/105 [====>.........................] - ETA: 0s - loss: 0.0164
 27/105 [======>.......................] - ETA: 0s - loss: 0.0170
 34/105 [========>.....................] - ETA: 0s - loss: 0.0248
 40/105 [==========>...................] - ETA: 0s - loss: 0.0215
 46/105 [============>.................] - ETA: 0s - loss: 0.0194
 54/105 [==============>...............] - ETA: 0s - loss: 0.0197
 61/105 [================>.............] - ETA: 0s - loss: 0.0195
 69/105 [==================>...........] - ETA: 0s - loss: 0.0184
 75/105 [====================>.........] - ETA: 0s - loss: 0.0176
 83/105 [======================>.......] - ETA: 0s - loss: 0.0193
 90/105 [========================>.....] - ETA: 0s - loss: 0.0187
 99/105 [===========================>..] - ETA: 0s - loss: 0.0179
105/105 [==============================] - 0s - loss: 0.0171     
Epoch 19/20

  1/105 [..............................] - ETA: 0s - loss: 0.0022
 13/105 [==>...........................] - ETA: 0s - loss: 0.0280
 23/105 [=====>........................] - ETA: 0s - loss: 0.0176
 35/105 [=========>....................] - ETA: 0s - loss: 0.0161
 46/105 [============>.................] - ETA: 0s - loss: 0.0170
 57/105 [===============>..............] - ETA: 0s - loss: 0.0150
 70/105 [===================>..........] - ETA: 0s - loss: 0.0168
 81/105 [======================>.......] - ETA: 0s - loss: 0.0189
 94/105 [=========================>....] - ETA: 0s - loss: 0.0175
105/105 [==============================] - 0s - loss: 0.0170     
Epoch 20/20

  1/105 [..............................] - ETA: 0s - loss: 4.3640e-04
 14/105 [===>..........................] - ETA: 0s - loss: 0.0201    
 25/105 [======>.......................] - ETA: 0s - loss: 0.0128
 38/105 [=========>....................] - ETA: 0s - loss: 0.0224
 48/105 [============>.................] - ETA: 0s - loss: 0.0187
 61/105 [================>.............] - ETA: 0s - loss: 0.0207
 72/105 [===================>..........] - ETA: 0s - loss: 0.0195
 85/105 [=======================>......] - ETA: 0s - loss: 0.0196
 96/105 [==========================>...] - ETA: 0s - loss: 0.0182
105/105 [==============================] - 0s - loss: 0.0173     Trained on 105 samples, validated on NULL samples (batch_size=1, epochs=20)
Final epoch (plot to see history):
loss: 0.01734

Let's see how well the model predicts the data (of course, in reality we'd have a test series)!

In [16]:
[rgraph| 
 
   pred_train <- model %>% predict(X_train, batch_size = 1)

   pred_train <- denormalize(pred_train, minval, maxval)
   pred_train_undiff <- pred_train + intl[(lstm_num_timesteps+1):(length(intl)-1)] 
   c(length(intl), length(pred_train))
   df <- data_frame(time_id = 1:113,
                    train = intl,
                    pred_train = c(rep(NA, lstm_num_timesteps+1), pred_train_undiff))       
   df <- df %>% gather(key = 'type', value = 'value', train:pred_train)
   ggplot(df, aes(x = time_id, y = value)) + geom_line(aes(color = type)) + theme(aspect.ratio=0.8)
   |]

Not bad for such a short training time, is it? :-)