I was struggling to get a DL4J LSTM network to forecast stock data and I was not sure where the issue was.
So I decided to simplify the problem and use generated sinusoid data instead of real stock values, in order to prove that the LSTM model and the StockData3DIterator work as expected.
We add the necessary Java libraries with the help of Maven:
%classpath config resolver maven-public1 http://nuc.local:8081/repository/maven-public/
%%classpath add mvn
ch.pschatzmann:investor:LATEST
ch.pschatzmann:investor-dl4j:LATEST
org.nd4j:nd4j-native:1.0.0-beta2
org.deeplearning4j:deeplearning4j-core:1.0.0-beta2
import java.util.Arrays
import java.util.Date
import org.ta4j.core.Indicator
import org.ta4j.core.num.Num
import ch.pschatzmann.stocks.forecasting._
import ch.pschatzmann.stocks.Context
import ch.pschatzmann.stocks.ta4j.indicator._
import ch.pschatzmann.stocks.integration.dl4j._
import ch.pschatzmann.stocks.integration.StockTimeSeries
import ch.pschatzmann.display.Table
import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.deeplearning4j.eval._;
import org.deeplearning4j.nn.conf._
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers._;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.deeplearning4j.optimize.listeners.ScoreIterationListener;
import org.deeplearning4j.datasets.datavec._;
import org.deeplearning4j.evaluation._
import org.deeplearning4j.nn.api.OptimizationAlgorithm
import org.datavec.api.records.reader.RecordReader
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator
import org.nd4j.linalg.learning.config._
import org.nd4j.linalg.lossfunctions.LossFunctions
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction;
import org.nd4j.linalg.dataset.api.preprocessor._
import org.nd4j.linalg.dataset.api._
import org.ta4j.core.indicators.helpers._
import scala.collection.mutable.ListBuffer
import scala.collection.Map
We can generate some sinusoid data with the help of the HarmonicStockOscillatorForecast class. Then we use the MinMaxScaledIndicator to scale the data between -1 and 1. This scaled data serves as the output labels, and we use the OffsetIndicator to shift it back by one period to generate the input data.
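The scaling and offsetting can be illustrated in plain Scala. This is only a sketch of the idea, not the library's implementation; `raw`, `scaled` and `inputs` are illustrative names, and it assumes that OffsetIndicator(actual, -1) yields the previous period's value:

```scala
// generated sinusoid, a stand-in for the oscillator values
val raw = (0 until 200).map(i => math.sin(i * 0.1) * 50.0 + 100.0)

// min-max scaling to [-1, 1]: s = -1 + 2 * (x - min) / (max - min)
val lo = raw.min
val hi = raw.max
def scaled(x: Double): Double = -1.0 + 2.0 * (x - lo) / (hi - lo)
val labels = raw.map(scaled)

// one-period offset: the input at time t is the label of period t - 1
val inputs = labels.head +: labels.init
```

The network thus learns to predict the current scaled value from the previous one.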
Now we can show the data in a table...
var osc = new HarmonicStockOscillatorForecast(Context.date("2010-01-01"))
var forecastIndicator = new ForecastIndicator(osc, Context.date("2019-01-01"));
var actual = new MinMaxScaledIndicator(forecastIndicator, -1.0, 1.0)
var history = new OffsetIndicator(actual, -1)
var table = Table.create(forecastIndicator,history,actual)
... and we generate a Chart
new SimpleTimePlot {
    data = table.seq()
    columns = Seq("ForecastIndicator", "ForecastIndicator-MinMaxScaled", "ForecastIndicator-MinMaxScaled-1")
}
We split the data and generate a StockData3DIterator for the training data, with the following parameters:
var splitDate = Context.date("2018-01-01")
var inputTrain = IndicatorSplitter.split(Arrays.asList(history).asInstanceOf[java.util.List[Indicator[Num]]], splitDate, true)
var outputTrain = IndicatorSplitter.split(Arrays.asList(actual).asInstanceOf[java.util.List[Indicator[Num]]], splitDate, true)
var iteratorTrain = new StockData3DIterator(inputTrain, outputTrain, 50, 100, 100, 100);
iteratorTrain.setScalingPerDataset(false)
null
iteratorTrain.next()
===========INPUT=================== [[[ 1.0000, 0.9997, 0.9988 ... -0.7540 -0.7702, -0.7859]], [[ -0.8011, -0.8159, -0.8301 ... 0.2110 0.2354, 0.2596]], [[ 0.2837, 0.3075, 0.3312 ... 0.4159 0.3931, 0.3700]], ..., [[ 0, 0, 0 ... 0 0, 0]], [[ 0, 0, 0 ... 0 0, 0]], [[ 0, 0, 0 ... 0 0, 0]]] =================OUTPUT================== [[[ 0.9997, 0.9988, 0.9972 ... -0.7702 -0.7859, -0.8011]], [[ -0.8159, -0.8301, -0.8437 ... 0.2354 0.2596, 0.2837]], [[ 0.3075, 0.3312, 0.3547 ... 0.3931 0.3700, 0.3466]], ..., [[ 0, 0, 0 ... 0 0, 0]], [[ 0, 0, 0 ... 0 0, 0]], [[ 0, 0, 0 ... 0 0, 0]]] ===========INPUT MASK=================== [[ 1.0000, 1.0000, 1.0000 ... 1.0000 1.0000, 1.0000], [ 1.0000, 1.0000, 1.0000 ... 1.0000 1.0000, 1.0000], [ 1.0000, 1.0000, 1.0000 ... 1.0000 1.0000, 1.0000], ..., [ 0, 0, 0 ... 0 0, 0], [ 0, 0, 0 ... 0 0, 0], [ 0, 0, 0 ... 0 0, 0]] ===========OUTPUT MASK=================== [[ 1.0000, 1.0000, 1.0000 ... 1.0000 1.0000, 1.0000], [ 1.0000, 1.0000, 1.0000 ... 1.0000 1.0000, 1.0000], [ 1.0000, 1.0000, 1.0000 ... 1.0000 1.0000, 1.0000], ..., [ 0, 0, 0 ... 0 0, 0], [ 0, 0, 0 ... 0 0, 0], [ 0, 0, 0 ... 0 0, 0]]
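The structure of the dump above can be illustrated in plain Scala: the features and labels are rank-3 arrays of shape [miniBatch, numFeatures, timeSteps], and batch rows that are not completely filled are zero-padded and flagged with a 0/1 mask. The `windows` helper below is a simplified illustration of this layout, not the StockData3DIterator implementation:

```scala
// build [miniBatch][features][timeSteps] windows over a series, with 0/1 masks
def windows(series: Seq[Double], batch: Int, steps: Int)
    : Seq[(Array[Array[Array[Double]]], Array[Array[Double]])] =
  series.grouped(batch * steps).toSeq.map { chunk =>
    val rows  = chunk.grouped(steps).toSeq
    val feats = Array.ofDim[Double](batch, 1, steps) // zero-padded features
    val mask  = Array.ofDim[Double](batch, steps)    // 1 = real value, 0 = padding
    for ((row, b) <- rows.zipWithIndex; (v, t) <- row.zipWithIndex) {
      feats(b)(0)(t) = v
      mask(b)(t) = 1.0
    }
    (feats, mask)
  }

val chunks = windows((1 to 250).map(_.toDouble), batch = 2, steps = 100)
// the last chunk only has 50 real time steps in its first row;
// the rest of the features stay 0 and the mask marks them as padding
val (feats, mask) = chunks.last
```

This matches the trailing all-zero rows in the INPUT and the corresponding zeros in the INPUT MASK above.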
We define the LSTM multilayer network
val seed = 12345;
val periods = iteratorTrain.inputPeriods() + iteratorTrain.outcomePeriods()
val lstmLayer1Size = periods*2;
val lstmLayer2Size = periods;
val denseLayerSize = periods;
var truncatedBPTTLength = 250
val dropoutRatio = 0.8;
var nIn = iteratorTrain.inputColumns()
var nOut = iteratorTrain.totalOutcomes()
var conf = new NeuralNetConfiguration.Builder()
    .seed(seed)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .weightInit(WeightInit.XAVIER)
    .updater(Updater.ADAGRAD) // RMSPROP or ADAGRAD
    .l2(1e-2)
    .list()
    .layer(0, new GravesLSTM.Builder()
        .nIn(nIn)
        .nOut(lstmLayer1Size)
        .gateActivationFunction(Activation.SOFTSIGN)
        .dropOut(dropoutRatio)
        .build())
    .layer(1, new GravesLSTM.Builder()
        .nIn(lstmLayer1Size)
        .nOut(lstmLayer2Size)
        .gateActivationFunction(Activation.SOFTSIGN)
        .dropOut(dropoutRatio)
        .build())
    .layer(2, new RnnOutputLayer.Builder()
        .nIn(lstmLayer2Size)
        .nOut(nOut)
        .activation(Activation.IDENTITY)
        .lossFunction(LossFunctions.LossFunction.MSE)
        .build())
    .backpropType(BackpropType.TruncatedBPTT)
    .tBPTTForwardLength(truncatedBPTTLength)
    .tBPTTBackwardLength(truncatedBPTTLength)
    .pretrain(false)
    .backprop(true)
    .build();
var net = new MultiLayerNetwork(conf);
net.init()
conf
{ "backprop" : true, "backpropType" : "TruncatedBPTT", "cacheMode" : "NONE", "confs" : [ { "cacheMode" : "NONE", "epochCount" : 0, "iterationCount" : 0, "layer" : { "@class" : "org.deeplearning4j.nn.conf.layers.GravesLSTM", "activationFn" : { "@class" : "org.nd4j.linalg.activations.impl.ActivationSigmoid" }, "biasInit" : 0.0, "biasUpdater" : null, "constraints" : null, "dist" : null, "distRecurrent" : null, "forgetGateBiasInit" : 1.0, "gateActivationFn" : { "@class" : "org.nd4j.linalg.activations.impl.ActivationSoftSign" }, "gradientNormalization" : "None", "gradientNormalizationThreshold" : 1.0, "idropout" : { "@class" : "org.deeplearning4j.nn.conf.dropout.Dropout", "p" : 0.8, "pschedule" : null }, "iupdater" : { "@class" : "org.nd4j.linalg.learning.config.AdaGrad", "epsilon" : 1.0E-6, "learningRate" : 0.1 }, "l1" : 0.0, "l1Bias" : 0.0, "l2" : 0.01, "l2Bias" : 0.0, "layerName" : "layer0", "nin" : 1, "nout" : 400, "pretrain" : false, "weightInit" : "XAVIER", "weightInitRecurrent" : null, "weightNoise" : null }, "maxNumLineSearchIterations" : 5, "miniBatch" : true, "minimize" : true, "optimizationAlgo" : "STOCHASTIC_GRADIENT_DESCENT", "pretrain" : false, "seed" : 12345, "stepFunction" : null, "variables" : [ "W", "RW", "b" ] }, { "cacheMode" : "NONE", "epochCount" : 0, "iterationCount" : 0, "layer" : { "@class" : "org.deeplearning4j.nn.conf.layers.GravesLSTM", "activationFn" : { "@class" : "org.nd4j.linalg.activations.impl.ActivationSigmoid" }, "biasInit" : 0.0, "biasUpdater" : null, "constraints" : null, "dist" : null, "distRecurrent" : null, "forgetGateBiasInit" : 1.0, "gateActivationFn" : { "@class" : "org.nd4j.linalg.activations.impl.ActivationSoftSign" }, "gradientNormalization" : "None", "gradientNormalizationThreshold" : 1.0, "idropout" : { "@class" : "org.deeplearning4j.nn.conf.dropout.Dropout", "p" : 0.8, "pschedule" : null }, "iupdater" : { "@class" : "org.nd4j.linalg.learning.config.AdaGrad", "epsilon" : 1.0E-6, "learningRate" : 0.1 }, "l1" : 0.0, 
"l1Bias" : 0.0, "l2" : 0.01, "l2Bias" : 0.0, "layerName" : "layer1", "nin" : 400, "nout" : 200, "pretrain" : false, "weightInit" : "XAVIER", "weightInitRecurrent" : null, "weightNoise" : null }, "maxNumLineSearchIterations" : 5, "miniBatch" : true, "minimize" : true, "optimizationAlgo" : "STOCHASTIC_GRADIENT_DESCENT", "pretrain" : false, "seed" : 12345, "stepFunction" : null, "variables" : [ "W", "RW", "b" ] }, { "cacheMode" : "NONE", "epochCount" : 0, "iterationCount" : 0, "layer" : { "@class" : "org.deeplearning4j.nn.conf.layers.RnnOutputLayer", "activationFn" : { "@class" : "org.nd4j.linalg.activations.impl.ActivationIdentity" }, "biasInit" : 0.0, "biasUpdater" : null, "constraints" : null, "dist" : null, "gradientNormalization" : "None", "gradientNormalizationThreshold" : 1.0, "hasBias" : true, "idropout" : null, "iupdater" : { "@class" : "org.nd4j.linalg.learning.config.AdaGrad", "epsilon" : 1.0E-6, "learningRate" : 0.1 }, "l1" : 0.0, "l1Bias" : 0.0, "l2" : 0.01, "l2Bias" : 0.0, "layerName" : "layer2", "lossFn" : { "@class" : "org.nd4j.linalg.lossfunctions.impl.LossMSE", "configProperties" : false, "numOutputs" : -1 }, "nin" : 200, "nout" : 1, "pretrain" : false, "weightInit" : "XAVIER", "weightNoise" : null }, "maxNumLineSearchIterations" : 5, "miniBatch" : true, "minimize" : true, "optimizationAlgo" : "STOCHASTIC_GRADIENT_DESCENT", "pretrain" : false, "seed" : 12345, "stepFunction" : null, "variables" : [ "W", "b" ] } ], "epochCount" : 0, "inferenceWorkspaceMode" : "ENABLED", "inputPreProcessors" : { }, "iterationCount" : 0, "pretrain" : false, "tbpttBackLength" : 250, "tbpttFwdLength" : 250, "trainingWorkspaceMode" : "ENABLED" }
Next we train the network for n epochs:
println("Training: ")
var nEpochs = 10
var client = NamespaceClient.getBeakerX()
for (i <- 0 to nEpochs - 1) {
    iteratorTrain.reset();
    while (iteratorTrain.hasNext()) {
        var data = iteratorTrain.next()
        var idx = iteratorTrain.currentIndex()
        var maxIdx = iteratorTrain.maxIndex()
        client.showProgressUpdate("", ((i * maxIdx + idx) * 100) / (nEpochs * maxIdx))
        net.fit(data);
    }
}
"Done"
Training:
Done
Next we set up the test data in order to evaluate the predictions. We will use the following parameter values for the iterator.
We decrease the split index by half of the sliding window, so that we continue at the index position where the training stopped.
import scala.collection.JavaConverters._
var actualIndicator = new MinMaxScaledIndicator(forecastIndicator, -1.0, 1.0).asInstanceOf[Indicator[Num]]
var actualList = List(actualIndicator).asJava
var historyIndicator = new OffsetIndicator(actualIndicator, -1).asInstanceOf[Indicator[Num]]
var historyList = List(historyIndicator).asJava
var splitDate = Context.date("2018-01-01")
var pos = IndicatorSplitter.getSplitPos(actualList, splitDate)
var inputTest = IndicatorSplitter.split(historyList, pos, false)
var outputTest = IndicatorSplitter.split(actualList, pos, false)
var iteratorTest = new StockData3DIterator(inputTest, outputTest,1,100,100,100);
ch.pschatzmann.stocks.integration.dl4j.StockData3DIterator@4c552ea8
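The half-window adjustment described above amounts to simple index arithmetic. The values below are hypothetical; the real split position comes from IndicatorSplitter.getSplitPos:

```scala
val windowSize = 100                        // sliding-window length of the iterator
val splitPos = 2000                         // hypothetical index of the split date
val testStart = splitPos - windowSize / 2   // resume half a window before the split
```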
Then we determine the forecasted values and compare them with the actual values. In each batch we take all values where the mask returns 1.0:
var resultMap = new ListBuffer[Map[String,Any]]
var ev = new RegressionEvaluation(1)
iteratorTest.reset();
net.rnnClearPreviousState()
while (iteratorTest.hasNext()) {
    var data = iteratorTest.next()
    var mask = data.getLabelsMaskArray()
    var labels = data.getLabels()
    var prediction = net.rnnTimeStep(data.getFeatures());
    ev.eval(labels, prediction)
    for (j <- 0 to iteratorTest.inputPeriods() - 1) {
        if (mask.getDouble(0L, j) == 1.0) {
            resultMap += scala.collection.Map("actual" -> labels.getDouble(0L, 0L, j), "predict" -> prediction.getDouble(0L, 0L, j))
        }
    }
}
ev.stats
Column   MSE          MAE          RMSE         RSE          PC           R^2
col_0    1.08503e-02  6.65588e-02  1.04165e-01  3.49023e-02  9.83712e-01  9.65098e-01
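The error metrics reported by RegressionEvaluation can be reproduced by hand. Here is a minimal pure-Scala sketch of MSE, MAE and RMSE (not the DL4J implementation, which additionally handles masks and multiple columns):

```scala
// mean squared error over paired actual/predicted sequences
def mse(actual: Seq[Double], pred: Seq[Double]): Double =
  actual.zip(pred).map { case (a, p) => (a - p) * (a - p) }.sum / actual.size

// mean absolute error
def mae(actual: Seq[Double], pred: Seq[Double]): Double =
  actual.zip(pred).map { case (a, p) => math.abs(a - p) }.sum / actual.size

// root mean squared error
def rmse(actual: Seq[Double], pred: Seq[Double]): Double =
  math.sqrt(mse(actual, pred))

val a = Seq(1.0, 2.0, 3.0)
val p = Seq(1.0, 2.5, 2.5)
// mse = (0.0 + 0.25 + 0.25) / 3
```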
import scala.collection.JavaConverters._
resultMap.map(r => r.asJava).asJava
val actualLine = new Line() {
    x = 1 to resultMap.size
    y = resultMap.map(map => map.get("actual").get.asInstanceOf[Double])
    displayName = "actual"
}
val predictLine = new Line() {
    x = 1 to resultMap.size
    y = resultMap.map(map => map.get("predict").get.asInstanceOf[Double])
    displayName = "predict"
}
new Plot().add(Seq(actualLine, predictLine))
And finally, here is the unscaled result over the full data:
import scala.collection.JavaConverters._
var resultList = new ListBuffer[Double]
var iterator = new StockData3DIterator(Arrays.asList(history), Arrays.asList(actual),1,100,100,100);
var scaler = actual.getScaler()
net.rnnClearPreviousState()
while (iterator.hasNext()) {
    var data = iterator.next()
    var mask = data.getLabelsMaskArray()
    var labels = data.getLabels()
    var prediction = net.rnnTimeStep(data.getFeatures());
    ev.eval(labels, prediction)
    for (j <- 0 to iterator.inputPeriods() - 1) {
        if (mask.getDouble(0L, j) == 1.0) {
            resultList += scaler.denormalizeValue(prediction.getDouble(0L, 0L, j))
        }
    }
}
var actualValues = forecastIndicator.toHistoricValues()
val actualLine = new Line() {
    x = 1 to actualValues.size()
    y = actualValues.getValues().asScala
    displayName = "actual"
}
val predictLine = new Line() {
    x = 1 to resultList.size
    y = resultList
    displayName = "predict"
}
new Plot().add(Seq(actualLine, predictLine))
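The denormalization used in the loop above simply inverts the min-max scaling. Assuming the scaler was fitted to [-1, 1], the inverse is x = min + (s + 1) / 2 * (max - min); a pure-Scala sketch (not the library's scaler, and the price range is illustrative):

```scala
// forward scaling to [-1, 1], as applied to the labels earlier
def normalize(x: Double, min: Double, max: Double): Double =
  -1.0 + 2.0 * (x - min) / (max - min)

// inverse of the [-1, 1] min-max scaling
def denormalize(s: Double, min: Double, max: Double): Double =
  min + (s + 1.0) / 2.0 * (max - min)

// round trip with an illustrative price range of [50, 150]
denormalize(normalize(120.0, 50.0, 150.0), 50.0, 150.0)  // gives back ≈ 120.0
```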
From this exercise we can draw the following conclusions: