Varliadation of LSTM Stock Forecasting (HarmonicStockOscillator)

I was struggling with the DL4J LSTM network to be used to forecast stock data and I was not sure where the issue was:

  1. was the data not suited for forcasting ?
  2. did I have some errors in the construction of the Iterator ?
  3. did i have an issue with scaling the data ?
  4. was the network setup not correct ?

So I decided to simplify the problem by using generated sinusoid data instead of real stock values in order to proof that the LSTM model and the StockData3DIterator are working as expected.

Setup

We add the necessary java libraries with the help of Maven

In [193]:
%classpath config resolver maven-public1 http://nuc.local:8081/repository/maven-public/
%%classpath add mvn 
ch.pschatzmann:investor:LATEST
ch.pschatzmann:investor-dl4j:LATEST
org.nd4j:nd4j-native:1.0.0-beta2
org.deeplearning4j:deeplearning4j-core:1.0.0-beta2
In [194]:
import java.util.Arrays
import java.util.Date
import org.ta4j.core.Indicator
import org.ta4j.core.num.Num
import ch.pschatzmann.stocks.forecasting._
import ch.pschatzmann.stocks.Context
import ch.pschatzmann.stocks.ta4j.indicator._
import ch.pschatzmann.stocks.integration.dl4j._
import ch.pschatzmann.stocks.integration.StockTimeSeries
import ch.pschatzmann.display.Table

import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.deeplearning4j.eval._;
import org.deeplearning4j.nn.conf._
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers._;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.deeplearning4j.optimize.listeners.ScoreIterationListener;
import org.deeplearning4j.datasets.datavec._;
import org.deeplearning4j.evaluation._
import org.deeplearning4j.nn.api.OptimizationAlgorithm
import org.datavec.api.records.reader.RecordReader
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator
import org.nd4j.linalg.learning.config._
import org.nd4j.linalg.lossfunctions.LossFunctions
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction;
import org.nd4j.linalg.dataset.api.preprocessor._
import org.nd4j.linalg.dataset.api._
import org.ta4j.core.indicators.helpers._
import org.ta4j.core.Indicator
import org.ta4j.core.num.Num
import scala.collection.mutable.ListBuffer
import scala.collection.Map
Out[194]:
import java.util.Arrays
import java.util.Date
import org.ta4j.core.Indicator
import org.ta4j.core.num.Num
import ch.pschatzmann.stocks.forecasting._
import ch.pschatzmann.stocks.Context
import ch.pschatzmann.stocks.ta4j.indicator._
import ch.pschatzmann.stocks.integration.dl4j._
import ch.pschatzmann.stocks.integration.StockTimeSeries
import ch.pschatzmann.display.Table
import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator
import org.deeplearning4j.eval._
import org.deeplearning4j.nn.conf._
import org.deeplearning4j.nn.conf.inputs.InputType
import org.deeplearning4j.nn.conf.layers._
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.deeplearning4j.nn.weights.WeightInit
import org.deeplearning4j.optimize.listeners.ScoreIterationListener
import org.deeple...

Data Generation

We can generate some sinoid data with the help of the HarmonicStockOscillator class. Then we use the MinMaxScaledIndicator to scale the data between -1 and 1. This data can be used as output labels and we use the OffsetIndicator to shift this data by one period back to generate the input data.

Now we can show the data in a table...

In [195]:
var osc = new HarmonicStockOscillatorForecast(Context.date("2010-01-01"))
var forecastIndicator = new ForecastIndicator(osc, Context.date("2019-01-01"));
var actual = new MinMaxScaledIndicator(forecastIndicator, -1.0, 1.0)
var history = new OffsetIndicator(actual, -1)

var table = Table.create(forecastIndicator,history,actual)

... and we generate a Chart

In [196]:
new SimpleTimePlot {
    data = table.seq()
    columns = Seq("ForecastIndicator", "ForecastIndicator-MinMaxScaled","ForecastIndicator-MinMaxScaled-1")
}

Test Data

We split the data and generate a StockData3DIterator for the training data. We use

  • batches of 50 entries.
  • 100 periods as input
  • 100 periods as output (with no masking)
  • a sliding window of 100 periods (so that each batch contains no overlapping data)
In [197]:
var splitDate = Context.date("2018-01-01")
var inputTrain = IndicatorSplitter.split(Arrays.asList(history).asInstanceOf[java.util.List[Indicator[Num]]], splitDate, true)
var outputTrain = IndicatorSplitter.split(Arrays.asList(actual).asInstanceOf[java.util.List[Indicator[Num]]], splitDate, true)
var iteratorTrain = new StockData3DIterator(inputTrain, outputTrain, 50, 100, 100, 100);    
iteratorTrain.setScalingPerDataset(false)
Out[197]:
null
In [198]:
iteratorTrain.next()
Out[198]:
===========INPUT===================
[[[    1.0000,    0.9997,    0.9988  ...   -0.7540   -0.7702,   -0.7859]], 

 [[   -0.8011,   -0.8159,   -0.8301  ...    0.2110    0.2354,    0.2596]], 

 [[    0.2837,    0.3075,    0.3312  ...    0.4159    0.3931,    0.3700]], 

  ..., 

 [[         0,         0,         0  ...         0         0,         0]], 

 [[         0,         0,         0  ...         0         0,         0]], 

 [[         0,         0,         0  ...         0         0,         0]]]
=================OUTPUT==================
[[[    0.9997,    0.9988,    0.9972  ...   -0.7702   -0.7859,   -0.8011]], 

 [[   -0.8159,   -0.8301,   -0.8437  ...    0.2354    0.2596,    0.2837]], 

 [[    0.3075,    0.3312,    0.3547  ...    0.3931    0.3700,    0.3466]], 

  ..., 

 [[         0,         0,         0  ...         0         0,         0]], 

 [[         0,         0,         0  ...         0         0,         0]], 

 [[         0,         0,         0  ...         0         0,         0]]]
===========INPUT MASK===================
[[    1.0000,    1.0000,    1.0000  ...    1.0000    1.0000,    1.0000], 
 [    1.0000,    1.0000,    1.0000  ...    1.0000    1.0000,    1.0000], 
 [    1.0000,    1.0000,    1.0000  ...    1.0000    1.0000,    1.0000], 
  ..., 
 [         0,         0,         0  ...         0         0,         0], 
 [         0,         0,         0  ...         0         0,         0], 
 [         0,         0,         0  ...         0         0,         0]]
===========OUTPUT MASK===================
[[    1.0000,    1.0000,    1.0000  ...    1.0000    1.0000,    1.0000], 
 [    1.0000,    1.0000,    1.0000  ...    1.0000    1.0000,    1.0000], 
 [    1.0000,    1.0000,    1.0000  ...    1.0000    1.0000,    1.0000], 
  ..., 
 [         0,         0,         0  ...         0         0,         0], 
 [         0,         0,         0  ...         0         0,         0], 
 [         0,         0,         0  ...         0         0,         0]]

LSTM Network

We define the LSTM multilayer network

In [199]:
val seed = 12345;

val periods = iteratorTrain.inputPeriods() + iteratorTrain.outcomePeriods()
val lstmLayer1Size = periods*2;
val lstmLayer2Size = periods;
val denseLayerSize = periods;
var truncatedBPTTLength = 250
val dropoutRatio = 0.8;
var nIn = iteratorTrain.inputColumns()
var nOut = iteratorTrain.totalOutcomes()

var conf = new NeuralNetConfiguration.Builder()
    .seed(seed)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .weightInit(WeightInit.XAVIER)
    .updater(Updater.ADAGRAD)  // RMSPROP or ADAGRAD
    .l2(1e-2)
    .list()
    .layer(0, new GravesLSTM.Builder()
            .nIn(nIn)
            .nOut(lstmLayer1Size)
            .gateActivationFunction(Activation.SOFTSIGN)
            .dropOut(dropoutRatio)
            .build())
    .layer(1, new GravesLSTM.Builder()
            .nIn(lstmLayer1Size)
            .nOut(lstmLayer2Size)
            .gateActivationFunction(Activation.SOFTSIGN)
            .dropOut(dropoutRatio)
            .build())
    .layer(2, new RnnOutputLayer.Builder()
            .nIn(lstmLayer2Size)
            .nOut(nOut)
            .activation(Activation.IDENTITY)
            .lossFunction(LossFunctions.LossFunction.MSE)
            .build())
    .backpropType(BackpropType.TruncatedBPTT)
    .tBPTTForwardLength(truncatedBPTTLength)
    .tBPTTBackwardLength(truncatedBPTTLength)
    .pretrain(false)
    .backprop(true)
    .build();

var net = new MultiLayerNetwork(conf);
net.init()

conf
Out[199]:
{
  "backprop" : true,
  "backpropType" : "TruncatedBPTT",
  "cacheMode" : "NONE",
  "confs" : [ {
    "cacheMode" : "NONE",
    "epochCount" : 0,
    "iterationCount" : 0,
    "layer" : {
      "@class" : "org.deeplearning4j.nn.conf.layers.GravesLSTM",
      "activationFn" : {
        "@class" : "org.nd4j.linalg.activations.impl.ActivationSigmoid"
      },
      "biasInit" : 0.0,
      "biasUpdater" : null,
      "constraints" : null,
      "dist" : null,
      "distRecurrent" : null,
      "forgetGateBiasInit" : 1.0,
      "gateActivationFn" : {
        "@class" : "org.nd4j.linalg.activations.impl.ActivationSoftSign"
      },
      "gradientNormalization" : "None",
      "gradientNormalizationThreshold" : 1.0,
      "idropout" : {
        "@class" : "org.deeplearning4j.nn.conf.dropout.Dropout",
        "p" : 0.8,
        "pschedule" : null
      },
      "iupdater" : {
        "@class" : "org.nd4j.linalg.learning.config.AdaGrad",
        "epsilon" : 1.0E-6,
        "learningRate" : 0.1
      },
      "l1" : 0.0,
      "l1Bias" : 0.0,
      "l2" : 0.01,
      "l2Bias" : 0.0,
      "layerName" : "layer0",
      "nin" : 1,
      "nout" : 400,
      "pretrain" : false,
      "weightInit" : "XAVIER",
      "weightInitRecurrent" : null,
      "weightNoise" : null
    },
    "maxNumLineSearchIterations" : 5,
    "miniBatch" : true,
    "minimize" : true,
    "optimizationAlgo" : "STOCHASTIC_GRADIENT_DESCENT",
    "pretrain" : false,
    "seed" : 12345,
    "stepFunction" : null,
    "variables" : [ "W", "RW", "b" ]
  }, {
    "cacheMode" : "NONE",
    "epochCount" : 0,
    "iterationCount" : 0,
    "layer" : {
      "@class" : "org.deeplearning4j.nn.conf.layers.GravesLSTM",
      "activationFn" : {
        "@class" : "org.nd4j.linalg.activations.impl.ActivationSigmoid"
      },
      "biasInit" : 0.0,
      "biasUpdater" : null,
      "constraints" : null,
      "dist" : null,
      "distRecurrent" : null,
      "forgetGateBiasInit" : 1.0,
      "gateActivationFn" : {
        "@class" : "org.nd4j.linalg.activations.impl.ActivationSoftSign"
      },
      "gradientNormalization" : "None",
      "gradientNormalizationThreshold" : 1.0,
      "idropout" : {
        "@class" : "org.deeplearning4j.nn.conf.dropout.Dropout",
        "p" : 0.8,
        "pschedule" : null
      },
      "iupdater" : {
        "@class" : "org.nd4j.linalg.learning.config.AdaGrad",
        "epsilon" : 1.0E-6,
        "learningRate" : 0.1
      },
      "l1" : 0.0,
      "l1Bias" : 0.0,
      "l2" : 0.01,
      "l2Bias" : 0.0,
      "layerName" : "layer1",
      "nin" : 400,
      "nout" : 200,
      "pretrain" : false,
      "weightInit" : "XAVIER",
      "weightInitRecurrent" : null,
      "weightNoise" : null
    },
    "maxNumLineSearchIterations" : 5,
    "miniBatch" : true,
    "minimize" : true,
    "optimizationAlgo" : "STOCHASTIC_GRADIENT_DESCENT",
    "pretrain" : false,
    "seed" : 12345,
    "stepFunction" : null,
    "variables" : [ "W", "RW", "b" ]
  }, {
    "cacheMode" : "NONE",
    "epochCount" : 0,
    "iterationCount" : 0,
    "layer" : {
      "@class" : "org.deeplearning4j.nn.conf.layers.RnnOutputLayer",
      "activationFn" : {
        "@class" : "org.nd4j.linalg.activations.impl.ActivationIdentity"
      },
      "biasInit" : 0.0,
      "biasUpdater" : null,
      "constraints" : null,
      "dist" : null,
      "gradientNormalization" : "None",
      "gradientNormalizationThreshold" : 1.0,
      "hasBias" : true,
      "idropout" : null,
      "iupdater" : {
        "@class" : "org.nd4j.linalg.learning.config.AdaGrad",
        "epsilon" : 1.0E-6,
        "learningRate" : 0.1
      },
      "l1" : 0.0,
      "l1Bias" : 0.0,
      "l2" : 0.01,
      "l2Bias" : 0.0,
      "layerName" : "layer2",
      "lossFn" : {
        "@class" : "org.nd4j.linalg.lossfunctions.impl.LossMSE",
        "configProperties" : false,
        "numOutputs" : -1
      },
      "nin" : 200,
      "nout" : 1,
      "pretrain" : false,
      "weightInit" : "XAVIER",
      "weightNoise" : null
    },
    "maxNumLineSearchIterations" : 5,
    "miniBatch" : true,
    "minimize" : true,
    "optimizationAlgo" : "STOCHASTIC_GRADIENT_DESCENT",
    "pretrain" : false,
    "seed" : 12345,
    "stepFunction" : null,
    "variables" : [ "W", "b" ]
  } ],
  "epochCount" : 0,
  "inferenceWorkspaceMode" : "ENABLED",
  "inputPreProcessors" : { },
  "iterationCount" : 0,
  "pretrain" : false,
  "tbpttBackLength" : 250,
  "tbpttFwdLength" : 250,
  "trainingWorkspaceMode" : "ENABLED"
}

Fitting the Network

Next we train the network in with n epochs:

In [200]:
println("Training: ")
var nEpochs = 10
var client = NamespaceClient.getBeakerX()
for(i <- 0  to nEpochs-1 ) {
    iteratorTrain.reset(); 
    while (iteratorTrain.hasNext()) {
        var data = iteratorTrain.next()
        var idx = iteratorTrain.currentIndex()
        var maxIdx = iteratorTrain.maxIndex()
        client.showProgressUpdate("", ((i * maxIdx + idx) * 100) / (nEpochs * maxIdx) )
        net.fit(data); 
    }
}

"Done"
Training: 
Out[200]:
Done

Evaluation

Next we setup the test data in order to evaluate the predictions. We will use the following parameter values for the iterator

  • number of batches per iteration: 1
  • input size: 100 (like in the training)
  • output size: 100 (like in the training)
  • sliding window: 100 (so that we do not get any overlapping data points)

We decrease the split index by half of the sliding window so that we continue with the starting index position where we stopped the training.

In [201]:
import scala.collection.JavaConverters._

var actualIndicator = new MinMaxScaledIndicator(forecastIndicator, -1.0, 1.0).asInstanceOf[Indicator[Num]]
var actualList = List(actualIndicator).asJava
var historyIndicator = new OffsetIndicator(actualIndicator, -1).asInstanceOf[Indicator[Num]]
var historyList = List(historyIndicator).asJava
var splitDate = Context.date("2018-01-01")
var pos = IndicatorSplitter.getSplitPos(actualList, splitDate)
var inputTest = IndicatorSplitter.split(actualList, pos, false)
var outputTest = IndicatorSplitter.split(actualList, pos, false)

var iteratorTest = new StockData3DIterator(inputTest, outputTest,1,100,100,100);
Out[201]:

Then we determine forecasted values and compare them with the actual values: In each batch we need to take the all values where the mask returns a 1.0 value:

In [202]:
var resultMap = new ListBuffer[Map[String,Any]] 
var ev = new RegressionEvaluation(1)
iteratorTest.reset();
net.rnnClearPreviousState()

while (iteratorTest.hasNext()) {
    var data = iteratorTest.next()
    var mask = data.getLabelsMaskArray() 
    var labels = data.getLabels()
    var prediction = net.rnnTimeStep(data.getFeatures());
    
    ev.eval(labels, prediction)
    for (j <- 0 to iteratorTest.inputPeriods()-1) {
        if (mask.getDouble(0l,j) == 1.0) {    
            resultMap += scala.collection.Map("actual" -> labels.getDouble(0l,0l, j), "predict" -> prediction.getDouble(0l,0l,j))
        }
    }
}

ev.stats
Out[202]:
Column    MSE            MAE            RMSE           RSE            PC             R^2            
col_0     1.08503e-02    6.65588e-02    1.04165e-01    3.49023e-02    9.83712e-01    9.65098e-01    
In [203]:
import scala.collection.JavaConverters._

resultMap.map(r => r.asJava).asJava
In [204]:
val actualLine = new Line() {
    x = 1 to resultMap.size
    y = resultMap.map(map => map.get("actual").get.asInstanceOf[Double])
    displayName = "actual"
}
val predictLine = new Line() {
    x = 1 to resultMap.size
    y = resultMap.map(map => map.get("predict").get.asInstanceOf[Double])
    displayName = "predict"
}

new Plot().add(Seq(actualLine, predictLine))

Full Data

And finally here is the unscaled result over the full Data

In [205]:
import scala.collection.JavaConverters._

var resultList = new ListBuffer[Double] 
var iterator = new StockData3DIterator(Arrays.asList(history), Arrays.asList(actual),1,100,100,100);    
var scaler = actual.getScaler()
net.rnnClearPreviousState()

while (iterator.hasNext()) {
    var data = iterator.next()
    var mask = data.getLabelsMaskArray() 
    var labels = data.getLabels()
    var prediction = net.rnnTimeStep(data.getFeatures());
    
    ev.eval(labels, prediction)
    for (j <- 0 to iteratorTest.inputPeriods()-1) {
        if (mask.getDouble(0l,j) == 1.0) {    
            resultList +=  scaler.denormalizeValue(prediction.getDouble(0l,0l,j))
        }
    }
}

var actualValues = forecastIndicator.toHistoricValues()
val actualLine = new Line() {
    x = 1 to actualValues.size()
    y = actualValues.getValues().asScala
    displayName = "actual"
}

val predictLine = new Line() {
    x = 1 to resultList.size
    y = resultList
    displayName = "predictLine"
}

new Plot().add(Seq(actualLine, predictLine))

Consclusions

From this excercises we can draw the following conclusions:

  • The input and output data should be scaled between -1 and 1
  • The StockData3DIterator is providing the data in a correct format that can be used by the DL4J RNN networks. The parameters in the constructor provide a great deal of flexibility.
  • The network is able to predict the test data reasonably well
  • At the beginning of the prediction the system needs some start up period to build up internal state and to synchronize with the real data
In [ ]: