In my last blog I demonstrated how to build a model that predicts whether a stock is going up or down based on news headlines, using Spark MLlib. In this demo I will do the same, but with the help of OpenNLP.
The solution is built from the components described below.
We import the necessary libraries with the help of Maven...
%classpath config resolver maven-public http://192.168.1.10:8081/repository/maven-public/
%%classpath add mvn
org.apache.opennlp:opennlp-tools:LATEST
ch.pschatzmann:news-digest:LATEST
ch.pschatzmann:investor:LATEST
com.github.habernal:confusion-matrix:1.0
Added new repo: maven-public
...and we import the relevant packages or classes
import java.io._
import java.util.Date
import org.ta4j.core.indicators._
import org.ta4j.core.indicators.helpers._
import ch.pschatzmann.news._
import ch.pschatzmann.stocks._
import ch.pschatzmann.stocks.ta4j.indicator._
import ch.pschatzmann.stocks.integration.HistoricValues
import ch.pschatzmann.dates.CalendarUtils
import opennlp.tools.ml.maxent.quasinewton.QNTrainer
import opennlp.tools.doccat.DocumentSample
import opennlp.tools.tokenize._
import opennlp.tools.util.TrainingParameters
import opennlp.tools.doccat.DoccatFactory
import opennlp.tools.doccat.BagOfWordsFeatureGenerator
import opennlp.tools.util.CollectionObjectStream
import opennlp.tools.doccat.DocumentCategorizerME;
import com.twosigma.beakerx.widget.Output
import com.github.habernal.confusionmatrix.ConfusionMatrix
We label the stock movement using closing prices: the label is the sign of the difference between the next and the prior day's close. We also need to handle dates (e.g. Saturdays or Sundays) for which we have no stock quotes; in this case we just use the prior working day.
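The labeling rule can be sketched in plain Scala, independent of the ta4j indicators used below. The price values are made up for illustration, and the label strings are assumed to match what SignIndicator.getLabel produces:

```scala
// Minimal sketch of the labeling rule: sign of close(t+1) - close(t-1).
// The prices are hypothetical; the label strings are an assumption.
def label(diff: Double): String =
  if (diff > 0) "positive" else if (diff < 0) "negative" else "neutral"

// hypothetical closing prices indexed by trading day
val closes = Map(1 -> 30.95, 2 -> 30.96, 3 -> 30.77)

// sentiment for day 2 compares the next close with the prior close
val sentiment = label(closes(3) - closes(1))
```

The ta4j version below expresses the same arithmetic with OffsetIndicator, DifferenceIndicator and SignIndicator.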
var stockData = Context.getStockData("MSFT")
var series = stockData.toTimeSeries()
var close = new ClosePriceIndicator(series)
var closePrior = new OffsetIndicator(close, -1)
var closeNext = new OffsetIndicator(close, +1)
var sentiment = new SignIndicator(new DifferenceIndicator(closeNext, closePrior)).toHistoricValues()
def getSentiment(date:Date):String = {
var result = sentiment.getValue(date)
if (result==null){
val altDate = CalendarUtils.priorWorkDay(date)
result = sentiment.getValue(altDate)
}
if (result==null){
result = 0;
}
return SignIndicator.getLabel(result)
}
getSentiment(Context.date("2010-01-04"))
negative
We need to provide the data as a collection of DocumentSample objects. A sample consists of the sentiment label (calculated as shown above) and the tokenized text of the news headline. So we define a tokenizer first:
val tokenizer = SimpleTokenizer.INSTANCE
tokenizer.tokenize("this is a sentence. And this as well");
[this, is, a, sentence, ., And, this, as, well]
Now we can query our Solr search engine for all articles related to a ticker symbol and convert the result to DocumentSample objects.
import scala.collection.JavaConverters._
val ticker = "MSFT"
val query = Utils.companyNameByTickerSearch(ticker)
val store = new SolrDocumentStore()
val data = store.pagedDocuments(query).asScala
.map(page => page.values.asScala)
.flatMap(l => l)
.map(doc => new DocumentSample(getSentiment(doc.date), tokenizer.tokenize(doc.content)))
data.size
18509
Finally, we split our data into training and test sets.
val shuffled = scala.util.Random.shuffle(data)
val pos = (shuffled.size * 0.8).toInt
val train = shuffled.slice(0, pos)
val test = shuffled.slice(pos, shuffled.size)
train.size
14807
data.last
neutral Companies like Facebook , Apple , Amazon , Netflix , Google and Microsoft aren ’ t the same . Investors shouldn ’ t act as though they are , our columnist says .
Now we can build our model by calling DocumentCategorizerME.train.
import scala.collection.JavaConverters._
var factory = new DoccatFactory()
var params = TrainingParameters.defaultParams();
params.put(TrainingParameters.CUTOFF_PARAM, Integer.toString(2));
params.put(TrainingParameters.ITERATIONS_PARAM, Integer.toString(200));
var model = DocumentCategorizerME.train("en", new CollectionObjectStream(train.asJava), params, factory);
Indexing events with TwoPass using cutoff of 2
Computing event counts...  done. 14807 events
Indexing...  done.
Sorting and merging events... done. Reduced 14807 events to 14546.
Done indexing in 3.04 s.
Incorporating indexed data for training...  done.
  Number of Event Tokens: 14546
      Number of Outcomes: 3
    Number of Predicates: 44075
...done.
Computing model parameters ...
Performing 200 iterations.
  1:  ... loglikelihood=-16267.152158306564  0.4888903896805565
  2:  ... loglikelihood=-16139.393733735737  0.7857769973661106
  ...
199:  ... loglikelihood=-11769.681767440543  0.8146822448841764
200:  ... loglikelihood=-11763.250363480696  0.8146822448841764
opennlp.tools.doccat.DoccatModel@48b5fd47
val modelOut = new BufferedOutputStream(new FileOutputStream("model.bin"));
model.serialize(modelOut);
modelOut.close();
null
var myCategorizer = new DocumentCategorizerME(model);
def predict(text:Array[String]):String = {
var outcomes = myCategorizer.categorize(text)
var category = myCategorizer.getBestCategory(outcomes)
return category
}
predict(tokenizer.tokenize("big Loss"))
negative
We calculate some KPIs on our test data with the help of the ConfusionMatrix class (https://github.com/habernal/confusion-matrix).
var confusionMatrix = new ConfusionMatrix();
test.foreach(t => confusionMatrix.increaseValue(t.getCategory(), predict(t.getText()), 1))
println(confusionMatrix);
println(confusionMatrix.printNiceResults())
s"Accuracy ${confusionMatrix.getAccuracy()}"
↓gold\pred→  negative  neutral  positive
negative         1406        0       478
neutral            29        0        11
positive         1229        0       549

Macro F-measure: 0.336 (CI at .95: 0.015), micro F-measure (acc): 0.528
Accuracy 0.5280929227444624
confusionMatrix.getPrecisionForLabels()
confusionMatrix.getRecallForLabels()
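Per-label precision and recall can be recomputed by hand from the matrix above; a minimal sketch in plain Scala, with the counts copied from the printed matrix and a majority-class baseline added for context:

```scala
// Confusion matrix counts from the output above (rows = gold, cols = predicted)
val labels = Seq("negative", "neutral", "positive")
val m = Map(
  ("negative","negative") -> 1406, ("negative","neutral") -> 0, ("negative","positive") -> 478,
  ("neutral","negative")  -> 29,   ("neutral","neutral")  -> 0, ("neutral","positive")  -> 11,
  ("positive","negative") -> 1229, ("positive","neutral") -> 0, ("positive","positive") -> 549)

// precision = diagonal / column sum; recall = diagonal / row sum
def precision(l: String) = {
  val predicted = labels.map(g => m((g, l))).sum
  if (predicted == 0) 0.0 else m((l, l)).toDouble / predicted
}
def recall(l: String) = m((l, l)).toDouble / labels.map(p => m((l, p))).sum

// majority-class baseline: always predict the most frequent gold label
val total    = m.values.sum
val majority = labels.map(g => labels.map(p => m((g, p))).sum).max
val baseline = majority.toDouble / total
```

Note that the model never predicts neutral, and its 0.528 accuracy sits only slightly above the ~0.509 baseline of always predicting negative.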
We conclude this blog with a Scala class which combines all the concepts described above:
import scala.collection.JavaConverters._
class StockClassification(ticker:String) {
var stockData = Context.getStockData(ticker)
var series = stockData.toTimeSeries()
var close = new ClosePriceIndicator(series)
val tokenizer = SimpleTokenizer.INSTANCE
// calculate the stock movement for the indicated date
def getSentiment(date:Date, offset:Int):String = {
var closePrior = new OffsetIndicator(close, -1)
var closeNext = new OffsetIndicator(close, +offset)
var sentiment = new SignIndicator(new DifferenceIndicator(closeNext, closePrior)).toHistoricValues()
var result = sentiment.getValue(date)
if (result==null){
val nextDate = CalendarUtils.nextWorkDay(date)
result = sentiment.getValue(nextDate)
}
if (result==null){
result = 0;
}
return SignIndicator.getLabel(result)
}
// determines the stock movement and the headlines per date
def getData(offset: Int):scala.collection.mutable.Buffer[DocumentSample] = {
val query = Utils.companyNameByTickerSearch(ticker)
val store = new SolrDocumentStore()
val data = store.pagedDocuments(query).asScala
.map(page => page.values.asScala)
.flatMap(l => l)
.map(doc => new DocumentSample(getSentiment(doc.date, offset), tokenizer.tokenize(doc.content)))
return data
}
// calculates the accuracy
def getAccuracy(offset: Int, iterations:Int):Double = {
return getConfusionMatrix(offset, iterations).getAccuracy()
}
// calculates the confusion matrix
def getConfusionMatrix(offset: Int, iterations:Int):ConfusionMatrix = {
val data = this.getData(offset)
val shuffled = scala.util.Random.shuffle(data)
val pos = (shuffled.size * 0.8).toInt
val train = shuffled.slice(0, pos)
val test = shuffled.slice(pos, shuffled.size)
var factory = new DoccatFactory()
var params = TrainingParameters.defaultParams();
params.put(TrainingParameters.CUTOFF_PARAM, Integer.toString(2));
params.put(TrainingParameters.ITERATIONS_PARAM, Integer.toString(iterations));
// Alternative algorithms: NaiveBayesTrainer.NAIVE_BAYES_VALUE; here we use QNTrainer.MAXENT_QN_VALUE
params.put(TrainingParameters.ALGORITHM_PARAM, QNTrainer.MAXENT_QN_VALUE);
// do not print progress
OutputManager.setStandardOutput(new Output())
var model = DocumentCategorizerME.train("en", new CollectionObjectStream(train.asJava), params, factory);
var myCategorizer = new DocumentCategorizerME(model);
var confusionMatrix = new ConfusionMatrix();
test.foreach(t => confusionMatrix.increaseValue(t.getCategory(), predict(myCategorizer, t.getText()), 1))
// activate printing again
OutputManager.clear
return confusionMatrix
}
// categorize the text
def predict(myCategorizer:DocumentCategorizerME, text:Array[String]):String = {
var outcomes = myCategorizer.categorize(text)
var category = myCategorizer.getBestCategory(outcomes)
return category
}
}
defined class StockClassification
We can calculate the accuracy:
new StockClassification("AAPL").getAccuracy(1, 100)
0.5208955223880597
new StockClassification("AAPL").getAccuracy(20, 100)
0.582089552238806
We can do this calculation for all Nasdaq 100 stocks:
import ch.pschatzmann.stocks.data.index.Nasdaq100Index
var stocks = new Nasdaq100Index().listID().asScala.map(id => id.getTicker())
var result = stocks.map(ticker => Map("ticker"-> ticker, "accuracy" -> new StockClassification(ticker).getAccuracy(20, 100)) )
[[Map(ticker -> MSFT, accuracy -> 0.5364667747163695), Map(ticker -> AAPL, accuracy -> 0.5741293532338309), Map(ticker -> AMZN, accuracy -> 0.5228844254198222), Map(ticker -> GOOG, accuracy -> 0.7035413153456999), Map(ticker -> FB, accuracy -> 0.8712), Map(ticker -> GOOGL, accuracy -> 0.6800539993250084), Map(ticker -> CSCO, accuracy -> 0.510376282782212), Map(ticker -> INTC, accuracy -> 0.5367047308319739), Map(ticker -> CMCSA, accuracy -> 0.5449983271997324), Map(ticker -> PEP, accuracy -> 0.6232219649354945), Map(ticker -> AMGN, accuracy -> 0.49264454327745466), Map(ticker -> ADBE, accuracy -> 0.5586253369272237), Map(ticker -> NFLX, accuracy -> 0.47262714609653383), Map(ticker -> AVGO, accuracy -> 0.8325915782266348), Map(ticker -> PYPL, accuracy -> 0.9755427315196482), Map(ticker -> COST, accuracy -> 0.5518796992481203), Map(ticker -> TXN, accuracy -> 0.5500310752019888), Map(ticker -> NVDA, accuracy -> 0.5431754874651811), Map(ticker -> SBUX, accuracy -> 0.5842329080798168), Map(ticker -> GILD, accuracy -> 0.5589254766031195), Map(ticker -> BKNG, accuracy -> 0.5436945415434257), Map(ticker -> WBA, accuracy -> 0.5596172718351324), Map(ticker -> CHTR, accuracy -> 0.8820095893266625), Map(ticker -> QCOM, accuracy -> 0.5632183908045977), Map(ticker -> MDLZ, accuracy -> 0.519559902200489), Map(ticker -> BIIB, accuracy -> 0.5360082304526749), Map(ticker -> TSLA, accuracy -> 0.8644010767160162), Map(ticker -> ADP, accuracy -> 0.5819485744271471), Map(ticker -> KHC, accuracy -> 0.8978129802577137), Map(ticker -> CSX, accuracy -> 0.5998610628690517), Map(ticker -> ISRG, accuracy -> 0.5231259968102073), Map(ticker -> TMUS, accuracy -> 0.6511010362694301), Map(ticker -> ESRX, accuracy -> 0.666754478398314), Map(ticker -> INTU, accuracy -> 0.5461723309303124), Map(ticker -> FOXA, accuracy -> 0.5419777918606894), Map(ticker -> BIDU, accuracy -> 0.6626506024096386), Map(ticker -> ILMN, accuracy -> 0.5325301204819277), Map(ticker -> CELG, accuracy -> 0.574869109947644), 
Map(ticker -> VRTX, accuracy -> 0.5617283950617284), Map(ticker -> MU, accuracy -> 0.49945286853212445), Map(ticker -> REGN, accuracy -> 0.505358882754141), Map(ticker -> CTSH, accuracy -> 0.5817278747742324), Map(ticker -> FOX, accuracy -> 0.5491692612900559), Map(ticker -> MAR, accuracy -> 0.5865639709028669), Map(ticker -> ATVI, accuracy -> 0.5631421942326815), Map(ticker -> AMAT, accuracy -> 0.47537368292085275), Map(ticker -> ADI, accuracy -> 0.519979242345615), Map(ticker -> FISV, accuracy -> 0.6013793103448276), Map(ticker -> ROST, accuracy -> 0.6156035493052068), Map(ticker -> ADSK, accuracy -> 0.5959352394075095), Map(ticker -> MNST, accuracy -> 0.627906976744186), Map(ticker -> EBAY, accuracy -> 0.5182225063938619), Map(ticker -> ORLY, accuracy -> 0.6261767385362891), Map(ticker -> SIRI, accuracy -> 0.5556152606125739), Map(ticker -> XEL, accuracy -> 0.6315624017604526), Map(ticker -> NXPI, accuracy -> 0.8939393939393939), Map(ticker -> ALXN, accuracy -> 0.5349139330951608), Map(ticker -> EA, accuracy -> 0.5842682622787335), Map(ticker -> PAYX, accuracy -> 0.5426008968609866), Map(ticker -> WDAY, accuracy -> 0.9463861554122837), Map(ticker -> XLNX, accuracy -> 0.5101759227319765), Map(ticker -> LRCX, accuracy -> 0.5053617483371793), Map(ticker -> JD, accuracy -> 0.9562142135399124), Map(ticker -> DLTR, accuracy -> 0.5727127447063524), Map(ticker -> PCAR, accuracy -> 0.6119351500517419), Map(ticker -> NTES, accuracy -> 0.5277681959296309), Map(ticker -> VRSK, accuracy -> 0.8779661016949153), Map(ticker -> CERN, accuracy -> 0.5883785664578984), Map(ticker -> CTAS, accuracy -> 0.5717277486910994), Map(ticker -> ALGN, accuracy -> 0.4898690773067332), Map(ticker -> MCHP, accuracy -> 0.548993288590604), Map(ticker -> IDXX, accuracy -> 0.5870406189555126), Map(ticker -> CHKP, accuracy -> 0.5345616421417089), Map(ticker -> BMRN, accuracy -> 0.58044744100521), Map(ticker -> EXPE, accuracy -> 0.5232132808103546), Map(ticker -> FAST, accuracy -> 0.574737786751921), 
Map(ticker -> MYL, accuracy -> 0.41379310344827586), Map(ticker -> ULTA, accuracy -> 0.8101591760299626), Map(ticker -> AAL, accuracy -> 0.5004637143519592), Map(ticker -> CTXS, accuracy -> 0.5871959614502065), Map(ticker -> MXIM, accuracy -> 0.5264503441494592), Map(ticker -> MELI, accuracy -> 0.8209366391184573), Map(ticker -> KLAC, accuracy -> 0.5242582897033159), Map(ticker -> INCY, accuracy -> 0.5488826815642458), Map(ticker -> SYMC, accuracy -> 0.5488695652173913), Map(ticker -> SHPG, accuracy -> 0.5657894736842105), Map(ticker -> SNPS, accuracy -> 0.5629527423249396), Map(ticker -> CTRP, accuracy -> 0.40652753108348133), Map(ticker -> HSIC, accuracy -> 0.6255716842314576), Map(ticker -> LBTYK, accuracy -> 0.4922607879924953), Map(ticker -> SWKS, accuracy -> 0.5219135802469136), Map(ticker -> CDNS, accuracy -> 0.5466441536513297), Map(ticker -> TTWO, accuracy -> 0.5584084672677382), Map(ticker -> ASML, accuracy -> 0.5399728997289973), Map(ticker -> WYNN, accuracy -> 0.46490663232453316), Map(ticker -> HOLX, accuracy -> 0.5941379310344828), Map(ticker -> WDC, accuracy -> 0.5390340218712029), Map(ticker -> STX, accuracy -> 0.4162758620689655), Map(ticker -> HAS, accuracy -> 0.5716734588265735), Map(ticker -> JBHT, accuracy -> 0.6190684452219549), Map(ticker -> VOD, accuracy -> 0.5201972956884089), Map(ticker -> QRTEA, accuracy -> 0.9982647496281606), Map(ticker -> LBTYA, accuracy -> 0.47701688555347094)]]
new TableDisplay(result)
And we can calculate the accuracy for different prediction periods:
val days = (0 to 100)
val prediction = new StockClassification(ticker)
val accuracies = days.map(n => prediction.getAccuracy(n, 100))
[[0.4929767693138844, 0.4970286331712588, 0.5016207455429498, 0.5186385737439222, 0.5316045380875203, 0.5367368989735278, 0.5183684494867639, 0.5132360886007563, 0.5432198811453268, 0.5650999459751486, 0.5310642895732036, 0.541599135602377, 0.5256618044300378, 0.5615883306320908, 0.5189086980010805, 0.5475418692598595, 0.46947595894111294, 0.5213398163155051, 0.5499729875742841, 0.5629389519178822, 0.5421393841166937, 0.5367368989735278, 0.5378173960021609, 0.5648298217179902, 0.5564559697460832, 0.5510534846029174, 0.5470016207455429, 0.541599135602377, 0.5380875202593193, 0.5048622366288493, 0.5451107509454349, 0.5440302539168017, 0.5726634251755808, 0.52215018908698, 0.5424095083738519, 0.5253916801728795, 0.5583468395461912, 0.5310642895732036, 0.5164775796866559, 0.5502431118314425, 0.5343057806591032, 0.5548352242031335, 0.5229605618584549, 0.5305240410588871, 0.5299837925445705, 0.5259319286871961, 0.5070232306861157, 0.5667206915180983, 0.5359265262020529, 0.5429497568881686, 0.5578065910318747, 0.5583468395461912, 0.550783360345759, 0.5537547271745002, 0.5297136682874122, 0.5699621826039979, 0.5561858454889249, 0.5426796326310103, 0.5418692598595354, 0.5383576445164776, 0.526742301458671, 0.5521339816315505, 0.5621285791464073, 0.5729335494327391, 0.5359265262020529, 0.54430037817396, 0.5791464073473798, 0.5823878984332793, 0.5372771474878444, 0.5499729875742841, 0.5721231766612642, 0.5483522420313344, 0.5253916801728795, 0.5410588870880605, 0.5551053484602917, 0.5726634251755808, 0.5364667747163695, 0.5515937331172339, 0.5783360345759049, 0.557266342517558, 0.5780659103187467, 0.5767152890329552, 0.5399783900594274, 0.5607779578606159, 0.5316045380875203, 0.5243111831442464, 0.5334954078876283, 0.5926526202052944, 0.5418692598595354, 0.5375472717450027, 0.5788762830902215, 0.5253916801728795, 0.5256618044300378, 0.5578065910318747, 0.5334954078876283, 0.5378173960021609, 0.5729335494327391, 0.5742841707185306, 0.5367368989735278, 0.5796866558616964, 
0.5745542949756888]]
val plot = new Plot
plot.add(new Line { x = days; y = accuracies })
plot