Smile - Predicting the Direction of Stock Market Prices using a Random Forest Classifier

In this demo we show how to forecast whether the closing price of the NASDAQ-100 will move up or down, using a Random Forest classifier. I tried to replicate the results of a research paper by Luckyson Khaidem, Snehanshu Saha and Sudeepa Roy Dey, which uses the following setup:

  • Data Preprocessing
    • Exponential smoothing
  • Features
    • Relative Strength Index
    • Stochastic Oscillator
    • Williams %R
    • Moving Average Convergence Divergence
    • Price Rate of Change
    • On Balance Volume
  • Labels
    • $\operatorname{sign}(\text{close}_{i+d} - \text{close}_i)$, where $d$ is the forecast horizon
  • Ticker Symbols
    • NDX
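For reference, the label and the usual textbook definitions of the listed features can be written as below, with $C_t$, $H_t$, $L_t$, $V_t$ the close, high, low and volume at time $t$, $n$ the lookback period, $d$ the forecast horizon, and $H^{(n)}_t$, $L^{(n)}_t$ the highest high and lowest low over the last $n$ periods. These are the standard definitions, stated here as an assumption; the ta4j implementations may differ in details such as the averaging used inside RSI.

```latex
\begin{aligned}
\text{label}_i &= \operatorname{sign}(C_{i+d} - C_i) \\
\mathrm{RSI}_t &= 100 - \frac{100}{1 + RS_t}, \qquad RS_t = \frac{\text{average gain over } n \text{ periods}}{\text{average loss over } n \text{ periods}} \\
\%K_t &= 100 \cdot \frac{C_t - L^{(n)}_t}{H^{(n)}_t - L^{(n)}_t} \\
\%D_t &= \text{3-period simple moving average of } \%K_t \\
\%R_t &= -100 \cdot \frac{H^{(n)}_t - C_t}{H^{(n)}_t - L^{(n)}_t} \\
\mathrm{MACD}_t &= \mathrm{EMA}_{12}(C)_t - \mathrm{EMA}_{26}(C)_t \\
\mathrm{ROC}_t &= 100 \cdot \frac{C_t - C_{t-n}}{C_{t-n}} \\
\mathrm{OBV}_t &= \mathrm{OBV}_{t-1} +
  \begin{cases}
    V_t & \text{if } C_t > C_{t-1} \\
    -V_t & \text{if } C_t < C_{t-1} \\
    0 & \text{otherwise}
  \end{cases}
\end{aligned}
```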

The solution is implemented in Scala in Jupyter, using the BeakerX kernel and the following libraries:

  • Investor
  • Smile (Machine Learning Framework for Java and Scala)

Setup

We add the necessary Java libraries with the help of Maven...

In [52]:
%classpath config resolver maven-public http://software.pschatzmann.ch/repository/maven-public/
%%classpath add mvn 
ch.pschatzmann:investor:LATEST
ch.pschatzmann:jupyter-jdk-extensions:LATEST
com.github.haifengl smile-scala_2.11 1.5.2

... and we import all relevant packages:

In [53]:
import org.ta4j.core.Indicator
import org.ta4j.core.num.Num
import org.ta4j.core.indicators._
import org.ta4j.core.indicators.volume._
import org.ta4j.core.indicators.helpers._

import ch.pschatzmann.stocks.Context
import ch.pschatzmann.stocks.ta4j.indicator._
import ch.pschatzmann.stocks.integration._
import ch.pschatzmann.display._

Data Generation

First, we use StockTimeSeriesEMA to exponentially smooth the original input time series, and we plot the closing prices for several smoothing periods:

In [54]:
import scala.collection.JavaConverters._

// Use exponentially smoothed time series
val timeSeries = Context.getStockData("NDX").toTimeSeries()

// Smooth the original series with periods 0, 5, 10 and 20
val timeSeries0 = StockTimeSeriesEMA.create(timeSeries, 0)
val timeSeries5 = StockTimeSeriesEMA.create(timeSeries, 5)
val timeSeries10 = StockTimeSeriesEMA.create(timeSeries, 10)
val timeSeries20 = StockTimeSeriesEMA.create(timeSeries, 20)

// Close prices of the original and the smoothed series, named for the plot legend
val close = new ClosePriceIndicator(timeSeries)
val close0 = new NamedIndicator(new ClosePriceIndicator(timeSeries0), "close-0")
val close5 = new NamedIndicator(new ClosePriceIndicator(timeSeries5), "close-5")
val close10 = new NamedIndicator(new ClosePriceIndicator(timeSeries10), "close-10")
val close20 = new NamedIndicator(new ClosePriceIndicator(timeSeries20), "close-20")

val table = Table.create(close, close0, close5, close10, close20)
In [55]:
new SimpleTimePlot {
    data = table.seq()
    columns = Seq("ClosePriceIndicator","close-0","close-5","close-10","close-20")
    showLegend = true
}
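To illustrate what the smoothing does, here is a minimal plain-Scala sketch of exponential smoothing, assuming the common recursion $S_0 = Y_0$, $S_t = \alpha Y_t + (1 - \alpha) S_{t-1}$ with $\alpha = 2/(n+1)$ for a period $n$. It is not the source of StockTimeSeriesEMA: the object name `ExponentialSmoothing`, the choice of $\alpha$ and the treatment of period 0 as "no smoothing" are all assumptions.

```scala
object ExponentialSmoothing {
  /** Exponentially smooths `values` with period `n`, using alpha = 2 / (n + 1).
    * A period of 0 (or less) returns the input unchanged (assumed convention). */
  def smooth(values: Seq[Double], n: Int): Seq[Double] =
    if (n <= 0 || values.isEmpty) values
    else {
      val alpha = 2.0 / (n + 1)
      // Seed with the first observation, then blend each new value into the running average
      values.tail.scanLeft(values.head)((prev, y) => alpha * y + (1 - alpha) * prev)
    }
}
```

For example, with n = 3 (so α = 0.5) the series 10, 12, 11 becomes 10, 11, 11. A larger period gives a smaller α, so each new price moves the smoothed value less.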

We can easily generate the input features and output labels described above with ta4j indicators, and then display the data as a table.

In [56]:
import scala.collection.JavaConverters._

// Use the series smoothed with a period of 10
val timeSeries = timeSeries10

// Relative Strength Index (10 bars)
val close = new ClosePriceIndicator(timeSeries)
val rsi = new RSIIndicator(close, 10)

// Stochastic Oscillator: %K over 10 bars and %D (a moving average of %K)
val sk = new StochasticOscillatorKIndicator(timeSeries, 10)
val sd = new StochasticOscillatorDIndicator(sk)

// Williams %R (10 bars)
val williamsR = new WilliamsRIndicator(timeSeries, 10)

// Moving Average Convergence Divergence (default short/long periods)
val macd = new MACDIndicator(close)

// Price Rate of Change (10 bars)
val roc = new ROCIndicator(close, 10)

// On Balance Volume
val obv = new OnBalanceVolumeIndicator(timeSeries)

// Label: sign of the difference between the close and the close offset by 10 bars
val offsetIndicator = new OffsetIndicator(close, +10)
val difference = new DifferenceIndicator(close, offsetIndicator)
val label = new SignIndicator(difference)

// Input features and output label
val in: List[Indicator[Num]] = List(rsi, sk, sd, williamsR, macd, roc, obv)
val out: List[Indicator[Num]] = List(label)

val table = Table.create(rsi, sk, sd, williamsR, macd, roc, obv, label)
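To make some of these definitions concrete without ta4j, here is a small plain-Scala sketch of three features and of the label, computed on in-memory price arrays. The object `FeatureSketch` and its methods are hypothetical; the formulas follow the standard definitions (rate of change in percent, Williams %R over the last n bars, on-balance volume) and the label follows sign(close_{i+d} − close_i). Returning 0 for Williams %R when the high/low range is flat is an assumed convention.

```scala
object FeatureSketch {
  /** Price rate of change in percent: 100 * (c(t) - c(t-n)) / c(t-n); undefined for t < n. */
  def roc(close: IndexedSeq[Double], n: Int, t: Int): Option[Double] =
    if (t < n) None else Some(100.0 * (close(t) - close(t - n)) / close(t - n))

  /** Williams %R over the window [t-n+1, t]: -100 * (highestHigh - close) / (highestHigh - lowestLow).
    * Returns 0 when the window has no range (assumed convention). */
  def williamsR(high: IndexedSeq[Double], low: IndexedSeq[Double],
                close: IndexedSeq[Double], n: Int, t: Int): Double = {
    val from = math.max(0, t - n + 1)
    val hh = high.slice(from, t + 1).max
    val ll = low.slice(from, t + 1).min
    if (hh == ll) 0.0 else -100.0 * (hh - close(t)) / (hh - ll)
  }

  /** On-balance volume: running sum of +volume on up closes and -volume on down closes. */
  def obv(close: IndexedSeq[Double], volume: IndexedSeq[Double]): IndexedSeq[Double] =
    close.indices.scanLeft(0.0) { (acc, t) =>
      if (t == 0) acc
      else if (close(t) > close(t - 1)) acc + volume(t)
      else if (close(t) < close(t - 1)) acc - volume(t)
      else acc
    }.tail // drop the seed so the result has one value per bar

  /** Label sign(close(i + d) - close(i)); None when i + d lies beyond the data. */
  def label(close: IndexedSeq[Double], d: Int, i: Int): Option[Int] =
    if (i + d >= close.length) None else Some(math.signum(close(i + d) - close(i)).toInt)
}
```

The `None` at the end of the series mirrors the fact that a forward-looking label cannot be computed for the last d bars.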

Shuffling and Splitting the Data

Before using the data for machine learning, we shuffle it and split it into a training dataset (80%) and a test dataset (20%):

In [57]:
table.shuffle()
val tuple = table.split(0.8)
val training = tuple.x
val testing = tuple.y

s"Size: ${training.size} / ${testing.size}"
Out[57]:
Size: 1423 / 355
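The shuffle-and-split step can be sketched in plain Scala as follows. This assumes the rows fit in an in-memory `Seq` and uses `scala.util.Random` with a fixed seed for reproducibility; the helper `shuffleSplit` is hypothetical, not how the Investor `Table` class is implemented, and the rounding of the split point in `Table.split` may differ.

```scala
import scala.util.Random

object SplitSketch {
  /** Shuffles `rows` with a seeded RNG and splits them into (training, testing) by `fraction`. */
  def shuffleSplit[A](rows: Seq[A], fraction: Double, seed: Long = 42L): (Seq[A], Seq[A]) = {
    val shuffled = new Random(seed).shuffle(rows)
    // Round the split point to the nearest row, e.g. 0.8 of 10 rows gives 8 training rows
    val cut = math.round(rows.size * fraction).toInt
    shuffled.splitAt(cut)
  }
}
```

Fixing the seed makes the split repeatable between runs, which helps when comparing model settings on the same test set.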

Model

Smile needs arrays as input. We can convert our Table data into arrays by calling to1DArray and to2DArray. The SignIndicator values are -1, 0 or +1, but Smile expects class labels as integers starting from 0. We therefore convert the values to integers and add 1, so that we get 0 for down, 1 for unchanged and 2 for up.
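The label shift described above is a simple one-to-one mapping. Here is a minimal sketch with hypothetical names `toClass` and `fromClass`, independent of the Investor Table API, that converts sign values into Smile class indices and back.

```scala
object LabelEncoding {
  /** Maps a sign value in {-1, 0, +1} to a Smile class index in {0, 1, 2}. */
  def toClass(sign: Double): Int = {
    require(sign == -1.0 || sign == 0.0 || sign == 1.0, s"not a sign value: $sign")
    sign.toInt + 1
  }

  /** Inverse mapping: class index 0/1/2 back to the direction -1/0/+1. */
  def fromClass(cls: Int): Int = cls - 1
}
```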

We select the table columns by listing the fields to include, or the fields to exclude with a - prefix. Because the numbers are returned as java.lang.Double, we need to convert them to Scala Doubles:

In [58]:
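import smile.classification.RandomForest

// Class labels: shift the sign values -1/0/+1 to 0/1/2
val trainLabel = training.to1DArray("SignIndicator").map(n => n.toInt + 1)
// Features: all columns except the label and the timestamp, unboxed to Scala Doubles
val trainData = training.to2DArray("-SignIndicator", "-time").map(_.map(_.toDouble))
// Note: this passes the number of distinct classes (at most 3) as the number of trees,
// which builds a very small forest; a random forest normally uses many more trees
val numberOfTrees = trainLabel.toSet.size
val model = new RandomForest(trainData, trainLabel, numberOfTrees)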
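In the standard random forest algorithm, a row is classified by letting every tree vote and returning the most common class, which is why the number of trees matters. The sketch below shows only this voting step, with stand-in "trees" as plain functions; `VotingSketch` is hypothetical and is not Smile's RandomForest, which also builds each tree on a bootstrap sample and samples features at each split. Breaking ties toward the smallest class index is an assumed convention.

```scala
object VotingSketch {
  /** Majority vote over the class predictions of several classifiers.
    * Ties go to the smallest class index (assumed convention). */
  def vote(trees: Seq[Array[Double] => Int], x: Array[Double], numClasses: Int): Int = {
    val counts = new Array[Int](numClasses)
    // Each classifier casts one vote for the class it predicts
    trees.foreach(tree => counts(tree(x)) += 1)
    counts.indexOf(counts.max)
  }
}
```

With only two or three voters, a single tree can swing the result, whereas a larger forest averages out the errors of individual trees.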

Accuracy

Here we calculate the accuracy of our predictions on the test data:

In [59]:
import smile.validation._

val testLabel = testing.to1DArray("SignIndicator").map(n => n.toInt + 1)
val testData = testing.to2DArray("-SignIndicator", "-time").map(_.map(_.toDouble))

// Predict the class of every test row
val labelPredict = testData.map(x => model.predict(x))

// Fraction of test rows whose predicted class matches the true class
val accuracyResult = accuracy(testLabel, labelPredict)
Out[59]:
0.7492957746478873
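Accuracy is the fraction of test rows whose predicted class equals the true class. The following sketch computes that ratio in plain Scala, together with a confusion matrix that shows which directions get mixed up; the names are hypothetical and this is not Smile's implementation.

```scala
object AccuracySketch {
  /** Fraction of positions where the prediction equals the truth. */
  def accuracy(truth: Array[Int], predicted: Array[Int]): Double = {
    require(truth.length == predicted.length && truth.nonEmpty, "arrays must be non-empty and of equal length")
    truth.indices.count(i => truth(i) == predicted(i)).toDouble / truth.length
  }

  /** Confusion matrix m(t)(p) = number of rows with true class t predicted as class p. */
  def confusion(truth: Array[Int], predicted: Array[Int], numClasses: Int): Array[Array[Int]] = {
    val m = Array.ofDim[Int](numClasses, numClasses)
    truth.indices.foreach(i => m(truth(i))(predicted(i)) += 1)
    m
  }
}
```

For instance, `AccuracySketch.accuracy(testLabel, labelPredict)` would compute the same ratio as the cell above, and the diagonal of the confusion matrix holds the correctly classified rows per class.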