Smile - Predicting the Direction of Stock Market Prices using a Random Forest Classifier

In this demo we show how to forecast whether the closing price of the NASDAQ-100 is moving up or down. We do this with the help of a Random Forest Classifier. I tried to replicate the result from a research paper authored by Luckyson Khaidem, Snehanshu Saha and Sudeepa Roy Dey, using the following setup:

Data Preprocessing
- exponential smoothing
Features
- Relative Strength Index
- Stochastic Oscillator
- Williams %R
- Moving Average Convergence Divergence
- Price Rate of Change
- On Balance Volume
Labels
- $\mathrm{Sign}(\mathrm{close}_{i+d} - \mathrm{close}_i)$
Ticker Symbols
- NDX
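
The label definition above can be sketched in plain Scala, independent of any library. This is only an illustration: the object name `DirectionLabels`, the method `labels` and its parameters are hypothetical and not part of Investor, ta4j or Smile.

```scala
object DirectionLabels {
  // Sign of the price change over the next d periods: close(i + d) - close(i).
  // Returns one label (-1, 0 or +1) per index i for which close(i + d) exists,
  // so the result is d elements shorter than the input.
  def labels(close: Seq[Double], d: Int): Seq[Int] = {
    require(d > 0, "the horizon d must be positive")
    close.zip(close.drop(d)).map { case (now, later) =>
      math.signum(later - now).toInt
    }
  }
}
```

For example, `DirectionLabels.labels(Seq(10.0, 11.0, 9.0, 11.0, 12.0), 2)` compares 10 with 9, 11 with 11 and 9 with 12 and therefore yields `Seq(-1, 0, 1)`.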

The solution has been implemented in Scala, in Jupyter with the BeakerX kernel, using the following libraries:

Investor
Smile (Machine Learning Framework for Java and Scala)

Setup

We add the necessary Java libraries with the help of Maven...

In [52]:

%classpath config resolver maven-public http://software.pschatzmann.ch/repository/maven-public/
%%classpath add mvn 
ch.pschatzmann:investor:LATEST
ch.pschatzmann:jupyter-jdk-extensions:LATEST
com.github.haifengl smile-scala_2.11 1.5.2

... and we import all relevant packages

In [53]:

import org.ta4j.core.Indicator
import org.ta4j.core.num.Num
import org.ta4j.core.indicators._
import org.ta4j.core.indicators.volume._
import org.ta4j.core.indicators.helpers._

import ch.pschatzmann.stocks.Context
import ch.pschatzmann.stocks.ta4j.indicator._
import ch.pschatzmann.stocks.integration._
import ch.pschatzmann.display._

Data Generation

First we use StockTimeSeriesEMA to smooth the original input time series exponentially. We generate a plot of the closing prices for different smoothing periods:

In [54]:

import scala.collection.JavaConverters._

// Use exponentially smoothed time series
var timeSeries = Context.getStockData("NDX").toTimeSeries()

var timeSeries0 = StockTimeSeriesEMA.create(timeSeries, 0)
var timeSeries5 = StockTimeSeriesEMA.create(timeSeries, 5)
var timeSeries10 = StockTimeSeriesEMA.create(timeSeries, 10)
var timeSeries20 = StockTimeSeriesEMA.create(timeSeries, 20)

var close = new ClosePriceIndicator(timeSeries)
var close0 = new NamedIndicator(new ClosePriceIndicator(timeSeries0),"close-0")
var close5 = new NamedIndicator(new ClosePriceIndicator(timeSeries5),"close-5")
var close10 = new NamedIndicator(new ClosePriceIndicator(timeSeries10),"close-10")
var close20 = new NamedIndicator(new ClosePriceIndicator(timeSeries20),"close-20")

var table = Table.create(close, close0,close5,close10,close20)

In [55]:

new SimpleTimePlot {
    data = table.seq()
    columns = Seq("ClosePriceIndicator","close-0","close-5","close-10","close-20")
    showLegend = true
}
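
To make the smoothing step more concrete, here is a minimal sketch of simple exponential smoothing in plain Scala. It is not the implementation of StockTimeSeriesEMA: the object `Smoothing` and its methods are hypothetical, and the mapping from a period to the smoothing factor (2 / (period + 1)) is the common EMA convention, which we only assume here.

```scala
object Smoothing {
  // Simple exponential smoothing:
  //   s(0) = y(0)
  //   s(t) = alpha * y(t) + (1 - alpha) * s(t - 1)
  def exponential(values: Seq[Double], alpha: Double): Seq[Double] = {
    require(alpha > 0.0 && alpha <= 1.0, "alpha must be in (0, 1]")
    if (values.isEmpty) Seq.empty
    else values.tail.scanLeft(values.head) { (previous, y) =>
      alpha * y + (1 - alpha) * previous
    }
  }

  // Assumption: the usual EMA convention for deriving alpha from a period.
  def alphaForPeriod(period: Int): Double = {
    require(period >= 1, "the period must be at least 1")
    2.0 / (period + 1)
  }
}
```

A smaller alpha (that is, a longer period) gives more weight to the history and produces a smoother curve, while alpha = 1 returns the input unchanged.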

We can easily generate the described input features and output labels with the help of ta4j indicators, and then we display the data as a table.

In [56]:

import scala.collection.JavaConverters._

var timeSeries = timeSeries10

// Relative Strength Index
var close = new ClosePriceIndicator(timeSeries)
var rsi = new RSIIndicator(close, 10)

// Stochastic Oscillator
var sk = new StochasticOscillatorKIndicator(timeSeries, 10 )
var sd = new StochasticOscillatorDIndicator(sk)

// Williams %R
var williamsR = new WilliamsRIndicator(timeSeries,10)

// Moving Average Convergence Divergence
var macd = new MACDIndicator(close)

// Price Rate of Change
var roc = new ROCIndicator(close, 10)

// On Balance Volume
var obv = new OnBalanceVolumeIndicator(timeSeries)

// Label
var offsetIndicator= new OffsetIndicator(close, +10)
var difference = new DifferenceIndicator(close, offsetIndicator)
var label = new SignIndicator(difference) 

var in:List[org.ta4j.core.Indicator[org.ta4j.core.num.Num]] = List(rsi,sk,sd,williamsR,macd,roc,obv) 
var out:List[org.ta4j.core.Indicator[org.ta4j.core.num.Num]] = List(label)

val table = Table.create(rsi,sk,sd,williamsR,macd,roc,obv, label)
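
For readers who are not familiar with these indicators, two of them can be sketched by hand using their textbook formulas. The object `IndicatorSketch` and its methods are hypothetical helpers for illustration only; the ta4j classes used above remain the actual source of the features.

```scala
object IndicatorSketch {
  // Price Rate of Change in percent:
  //   100 * (close(t) - close(t - n)) / close(t - n)
  // It is defined from index n onwards, so the result is n elements shorter.
  def rateOfChange(close: Seq[Double], n: Int): Seq[Double] = {
    require(n > 0, "the period n must be positive")
    close.drop(n).zip(close).map { case (now, before) =>
      100.0 * (now - before) / before
    }
  }

  // Williams %R for the last bar of a window:
  //   -100 * (highestHigh - close) / (highestHigh - lowestLow)
  // Flat windows (highestHigh == lowestLow) are not handled in this sketch.
  def williamsR(highs: Seq[Double], lows: Seq[Double], lastClose: Double): Double = {
    val highest = highs.max
    val lowest = lows.min
    -100.0 * (highest - lastClose) / (highest - lowest)
  }
}
```

Williams %R ranges from -100 (close at the lowest low of the window) to 0 (close at the highest high).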


Shuffling and Splitting the Data

Before we use the data for machine learning, we shuffle it and split it into a training and a test dataset:

In [57]:

table.shuffle()
val tuple = table.split(0.8)
val training = tuple.x
val testing = tuple.y

s"Size: ${training.size} / ${testing.size}"

Out[57]:

Size: 1423 / 355
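
The same idea can be sketched on a plain Scala collection. This is an assumption-laden illustration, not the implementation behind `table.shuffle()` and `table.split(0.8)`: the object `DataSplit`, the fixed seed and the rounding of the cut position are our own choices, so the resulting sizes may differ by one row from those of the Table class.

```scala
import scala.util.Random

object DataSplit {
  // Shuffles the rows with a seeded generator (for reproducibility) and
  // cuts them into a (training, testing) pair at the given fraction.
  def shuffleAndSplit[A](rows: Seq[A], trainFraction: Double, seed: Long): (Seq[A], Seq[A]) = {
    require(trainFraction >= 0.0 && trainFraction <= 1.0, "trainFraction must be in [0, 1]")
    val shuffled = new Random(seed).shuffle(rows)
    val cut = math.round(rows.size * trainFraction).toInt
    shuffled.splitAt(cut)
  }
}
```

Passing the seed explicitly makes the split repeatable, which helps when we want to compare the accuracy of different model settings on the same test rows.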

Model

Smile needs arrays as input. We can convert our Table data into arrays by calling to1DArray and to2DArray. The SignIndicator takes the values -1, 0 and +1, but Smile expects the class labels as integers starting from 0. We therefore convert the values to integers and add 1, so that we get 0 for negative, 1 for unchanged and 2 for positive.

We select the table columns by naming the fields to include or, with a - prefix, the fields to exclude. Because we get the numbers as java.lang.Double, we need to convert them to Scala Doubles:

In [58]:

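import smile.classification.RandomForest

// Class labels: shift the sign values -1/0/+1 to 0/1/2
val trainLabel = training.to1DArray("SignIndicator").map(n => n.toInt + 1)
// Features: all columns except the label and the time stamp
val trainData = training.to2DArray("-SignIndicator","-time").map(_.map(_.toDouble))
// Note: the third constructor argument is the number of trees; it is set here to the number of distinct labels
val numberOfTrees = trainLabel.toSet.size
val model = new RandomForest(trainData, trainLabel, numberOfTrees)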

Out[58]:

smile.classification.RandomForest@112b049f
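
The two conversions described above can be checked in isolation. The following sketch uses a hypothetical object `LabelEncoding`; it only assumes that the values arrive as java.lang.Double, as stated in the text.

```scala
object LabelEncoding {
  // Maps a sign value (-1.0, 0.0 or +1.0) to a class index:
  // 0 = down, 1 = unchanged, 2 = up.
  def toClassIndex(sign: java.lang.Double): Int = sign.intValue + 1

  // Unboxes a matrix of java.lang.Double into primitive Scala Doubles.
  def toPrimitiveRows(rows: Array[Array[java.lang.Double]]): Array[Array[Double]] =
    rows.map(_.map(_.doubleValue))
}
```

Shifting the labels by one keeps the three classes distinct while satisfying the requirement that class indices start at 0.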

Accuracy

Here we calculate the accuracy of our predictions on the test data:

In [59]:

import smile.validation._

val testLabel = testing.to1DArray("SignIndicator").map(n => n.toInt + 1)
val testData = testing.to2DArray("-SignIndicator","-time").map(_.map(_.toDouble))
// Predict the class of every test row
val labelPredict = testData.map(row => model.predict(row))

val accuracyResult = accuracy(testLabel, labelPredict)

Out[59]:

0.7492957746478873
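
Accuracy is simply the share of test rows for which the predicted class equals the true class. A minimal hand-written version, using a hypothetical object `AccuracySketch` rather than the Smile function, could look like this:

```scala
object AccuracySketch {
  // Fraction of positions where the predicted class equals the true class.
  def accuracy(truth: Array[Int], prediction: Array[Int]): Double = {
    require(truth.length == prediction.length, "both arrays must have the same length")
    require(truth.nonEmpty, "at least one sample is required")
    val hits = truth.zip(prediction).count { case (t, p) => t == p }
    hits.toDouble / truth.length
  }
}
```

With 355 test rows, the value shown above is consistent with 266 correct predictions (266 / 355).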
