I am planning to use the Edgar data to determine and calculate some financial KPIs and feed these into a Neural Network. In my prior posts I described how to use my Webservices to request and display Edgar information with the help of Python and Pandas.
In this instalment I show how we can direcly use the 'built in' Java Query functionality of Smart-Edgar from Scala in order to calculated some financial KPIs.
We install the dependencies with the help of Maven
%classpath config resolver maven-public http://192.168.1.10:8081/repository/maven-public/
%%classpath add mvn
ch.pschatzmann:smart-edgar:LATEST
org.apache.spark:spark-sql_2.11:2.3.2
Added new repo: maven-public
I also import the relevant packages or classes
import ch.pschatzmann.edgar.reporting.company._
import ch.pschatzmann.edgar.utils.Utils
import ch.pschatzmann.edgar.reporting._
import java.util.Arrays
import ch.pschatzmann.edgar.reporting.company._ import ch.pschatzmann.edgar.utils.Utils import ch.pschatzmann.edgar.reporting._ import java.util.Arrays
We also need to provide the login information to the database via system properties or environment variables:
First we run the query w/o any filter to display all available data. We see that we get different forms and number of months. We potentially have duplicate information.
%%time
val values = new CompanyEdgarValues(new CompanySelection().setTradingSymbol("AAPL"))
.setParameterNames("NetIncomeLoss")
values.size
CPU times: user 0 ns, sys: 442 µs, total: 442 µs Wall Time: 859 ms
31
We can display the information in a table. Unfortunatly BeakerX can not handle the custom List types correctly so we wrap the result in an ArrayList
values.setFilter(new NoFilter(), false)
new java.util.ArrayList(values.toList)
In order to plot the data we need to convert our Java objects into Scala and select the relevant fields
import scala.collection.JavaConverters._
def plotEdgar(edgar:CompanyEdgarValues):TimePlot = {
val list = new java.util.ArrayList(edgar.toList)
val df = new java.text.SimpleDateFormat("yyyy-MM-dd")
val xList = list.asScala.map(m => m.get("date").asInstanceOf[String]).map(ds => df.parse(ds))
val yList = list.asScala.map(m => m.get("NetIncomeLoss").asInstanceOf[java.math.BigDecimal])
val plot = new TimePlot()
plot.add(new Line { x = xList ; y = yList })
return plot
}
plotEdgar(values)
This filter provides only the valid cumulated quarterly values. We also filter out duplicate informarmation
values.setFilter(new FilterQuarterlyCumulated(), true)
new java.util.ArrayList(values.toList)
plotEdgar(values)
values.getTable.asInstanceOf[ch.pschatzmann.common.table.TableConsolidated].getMessages
[Replaced [2009-03-28, APPLE INC, AAPL, 0000320193, CA, CA, ELECTRONIC COMPUTERS]: 1620000000.00 -> 3875000000.00, Replaced [2010-03-27, APPLE INC, AAPL, 0000320193, CA, CA, ELECTRONIC COMPUTERS]: 3074000000.00 -> 6452000000.00, Replaced [2011-03-26, APPLE INC, AAPL, 0000320193, CA, CA, ELECTRONIC COMPUTERS]: 5987000000.00 -> 11991000000.00, Replaced [2012-03-31, APPLE INC, AAPL, 0000320193, CA, CA, ELECTRONIC COMPUTERS]: 11622000000.00 -> 24686000000.00, Replaced [2013-03-30, APPLE INC, AAPL, 0000320193, CA, CA, ELECTRONIC COMPUTERS]: 9547000000.00 -> 22625000000.00, Replaced [2014-03-29, APPLE INC, AAPL, 0000320193, CA, CA, ELECTRONIC COMPUTERS]: 10223000000.00 -> 23295000000.00, Replaced [2015-03-28, APPLE INC, AAPL, 0000320193, CA, CA, ELECTRONIC COMPUTERS]: 13569000000.00 -> 31593000000.00, Replaced [2016-03-26, APPLE INC, AAPL, 0000320193, CA, CA, ELECTRONIC COMPUTERS]: 10516000000.00 -> 28877000000.00, Replaced [2017-04-01, APPLE INC, AAPL, 0000320193, CA, CA, ELECTRONIC COMPUTERS]: 11029000000.00 -> 28920000000.00, Replaced [2018-03-31, APPLE INC, AAPL, 0000320193, CA, CA, ELECTRONIC COMPUTERS]: 13822000000.00 -> 33887000000.00]
This filter provides only the valid 10-K filings.
values.setFilter(new FilterYearly())
new java.util.ArrayList(values.toList)
plotEdgar(values)
We can convert the data to html...
new MIMEContainer("text/html", values.toHtml)
date | companyName | tradingSymbol | identifier | incorporation | location | sicDescription | NetIncomeLoss |
---|---|---|---|---|---|---|---|
2007-09-29 | APPLE INC | AAPL | 0000320193 | CA | CA | ELECTRONIC COMPUTERS | 3495000000.00 |
2008-09-27 | APPLE INC | AAPL | 0000320193 | CA | CA | ELECTRONIC COMPUTERS | 6119000000.00 |
2009-09-26 | APPLE INC | AAPL | 0000320193 | CA | CA | ELECTRONIC COMPUTERS | 8235000000.00 |
2010-09-25 | APPLE INC | AAPL | 0000320193 | CA | CA | ELECTRONIC COMPUTERS | 14013000000.00 |
2011-09-24 | APPLE INC | AAPL | 0000320193 | CA | CA | ELECTRONIC COMPUTERS | 25922000000.00 |
2012-09-29 | APPLE INC | AAPL | 0000320193 | CA | CA | ELECTRONIC COMPUTERS | 41733000000.00 |
2013-09-28 | APPLE INC | AAPL | 0000320193 | CA | CA | ELECTRONIC COMPUTERS | 37037000000.00 |
2014-09-27 | APPLE INC | AAPL | 0000320193 | CA | CA | ELECTRONIC COMPUTERS | 39510000000.00 |
2015-09-26 | APPLE INC | AAPL | 0000320193 | CA | CA | ELECTRONIC COMPUTERS | 53394000000.00 |
2016-09-24 | APPLE INC | AAPL | 0000320193 | CA | CA | ELECTRONIC COMPUTERS | 45687000000.00 |
2017-09-30 | APPLE INC | AAPL | 0000320193 | CA | CA | ELECTRONIC COMPUTERS | 48351000000.00 |
2018-09-29 | APPLE INC | AAPL | 0000320193 | CA | CA | ELECTRONIC COMPUTERS | 59531000000.00 |
... or data to csv
values.toCsv
date;companyName;tradingSymbol;identifier;incorporation;location;sicDescription;NetIncomeLoss 2007-09-29;APPLE INC;AAPL;0000320193;CA;CA;ELECTRONIC COMPUTERS;3495000000.00 2008-09-27;APPLE INC;AAPL;0000320193;CA;CA;ELECTRONIC COMPUTERS;6119000000.00 2009-09-26;APPLE INC;AAPL;0000320193;CA;CA;ELECTRONIC COMPUTERS;8235000000.00 2010-09-25;APPLE INC;AAPL;0000320193;CA;CA;ELECTRONIC COMPUTERS;14013000000.00 2011-09-24;APPLE INC;AAPL;0000320193;CA;CA;ELECTRONIC COMPUTERS;25922000000.00 2012-09-29;APPLE INC;AAPL;0000320193;CA;CA;ELECTRONIC COMPUTERS;41733000000.00 2013-09-28;APPLE INC;AAPL;0000320193;CA;CA;ELECTRONIC COMPUTERS;37037000000.00 2014-09-27;APPLE INC;AAPL;0000320193;CA;CA;ELECTRONIC COMPUTERS;39510000000.00 2015-09-26;APPLE INC;AAPL;0000320193;CA;CA;ELECTRONIC COMPUTERS;53394000000.00 2016-09-24;APPLE INC;AAPL;0000320193;CA;CA;ELECTRONIC COMPUTERS;45687000000.00 2017-09-30;APPLE INC;AAPL;0000320193;CA;CA;ELECTRONIC COMPUTERS;48351000000.00 2018-09-29;APPLE INC;AAPL;0000320193;CA;CA;ELECTRONIC COMPUTERS;59531000000.00
This filter only provides the values for 3 months
values.setFilter(new FilterQuarterValues())
new java.util.ArrayList(values.toList)
plotEdgar(values)
We can also search for companies by indicating the search field name and the corresponding search values. We can also filter out all companies without a ticker symbol:
%%time
val companies = new CompanySearch("companyName","A%","B%").onlyCompaniesWithTradingSymbol(true)
new java.util.ArrayList(companies.toList)
CPU times: user 0 ns, sys: 210 µs, total: 210 µs Wall Time: 376 ms
or we can just select all companies
%%time
val companies = new CompanySearch().onlyCompaniesWithTradingSymbol(false)
new java.util.ArrayList(companies.toList)
CPU times: user 0 ns, sys: 259 µs, total: 259 µs Wall Time: 563 ms
java.lang.NullPointerException at ch.pschatzmann.edgar.reporting.company.CompanySearch.setup(CompanySearch.java:77) at ch.pschatzmann.edgar.reporting.company.CompanySearch.toList(CompanySearch.java:111) ... 50 elided
In order to get a better overview of the available parameters which are available in Edgar for AAPL we can create the following query: we show the count of parameterName by form:
val model = new EdgarModel().create()
model.getTableField("values", "unitref").setFilterValues(Arrays.asList("USD"))
model.getTableField("values", "segment").setFilterValues(Arrays.asList(""))
model.getTableField("values", "segmentdimension").setFilterValues(Arrays.asList(""))
val table = new Table()
val valueField = model.getTable("values").getValueField().asInstanceOf[ValueField]
valueField.setSelectedFunction("COUNT(%fld)")
table.setValueField(valueField)
table.addRow(model.getNavigationField("company", "tradingSymbol").setFilterValues("AAPL"))
table.addRow(model.getNavigationField("values", "parameterName"))
table.addColumn(model.getNavigationField("values", "form"))
table.execute(model)
new java.util.ArrayList(table.toList)
Finally we demonstrate how different KPIs can be easily calclated with the help of Spark. We create and start a SparkSession with the help of the BeakerX Spark magic.
%%spark --start
val spark = SparkSession.builder()
.appName("Edgar")
.master("local[*]")
.config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.config("spark.ui.enabled", "false")
We avoid the complexity around the conversion of java to scala objects and just write the Edgar data to a temporary csv file which is then used by Spark.
We use the parameter filter to select the relevant parameters only:
%%time
import spark.implicits._
import org.apache.spark.sql.functions.expr
import org.apache.spark.sql.expressions._
import org.apache.spark.sql.functions._
import ch.pschatzmann.edgar.reporting.company._
import ch.pschatzmann.edgar.utils.Utils
val values = new CompanyEdgarValues(new CompanySelection().setTradingSymbol("AAPL"))
.setFilter(new FilterYearly())
.setParameterNames("NetIncomeLoss", "OperatingIncomeLoss", "ResearchAndDevelopmentExpense",
"CashAndCashEquivalentsAtCarryingValue", "AvailableForSaleSecuritiesCurrent", "AccountsReceivableNetCurrent",
"Revenues", "SalesRevenueNet", "InventoryNet", "AssetsCurrent", "LiabilitiesCurrent", "Assets",
"EarningsPerShareBasic", "StockholdersEquity")
val file = Utils.createTempFile(values.toCsv());
println(file)
val edgarAAPLYear = spark.read.format("csv")
.option("delimiter", ";")
.option("header", "true")
.load(file.getAbsolutePath)
edgarAAPLYear.display(1000)
/tmp/edgar943078531091713431.tmp
CPU times: user 0 ns, sys: 301 µs, total: 301 µs Wall Time: 7 s
org.apache.spark.sql.SparkSession$implicits$@22452219
In Spark we can add additional columns by defining a formula with the help of the expr function.
We add some calculated KPIs to the dataframe:
%%time
// Current ratio = current assets/current liabilities
// Quick ratio = cash + Account Receivables + Short Term investments / current liabilities
// Net profit margin = net profit/total revenue
// Inventory Turnover = Sales / Inventory
var edgarAAPLYear1 = edgarAAPLYear
.withColumn("Revenue",expr("coalesce(Revenues, SalesRevenueNet)"))
.withColumn("QuickRatio",expr("(CashAndCashEquivalentsAtCarryingValue + AccountsReceivableNetCurrent + AvailableForSaleSecuritiesCurrent) / LiabilitiesCurrent"))
.withColumn("CurrentRatio",expr("AssetsCurrent / LiabilitiesCurrent"))
.withColumn("InventoryTurnover",expr("Revenue / InventoryNet"))
.withColumn("NetProfitMargin",expr("NetIncomeLoss / Revenue"))
.withColumn("SalesResearchRatio%",expr("ResearchAndDevelopmentExpense / Revenue *100"))
.withColumn("NetIncomeResearchRatio%",expr("ResearchAndDevelopmentExpense / NetIncomeLoss * 100"))
.drop($"Revenues")
.drop($"SalesRevenueNet")
edgarAAPLYear1.display(100)
CPU times: user 0 ns, sys: 203 µs, total: 203 µs Wall Time: 1 s
null
We can also calculate the percent change compared with the prior period with the help of the lag function. We do this for
%%time
val windowSpec = Window.partitionBy("companyName").orderBy("date")
val netIncomeChangeFormula = ($"NetIncomeLoss" - lag("NetIncomeLoss", 1).over(windowSpec)) / lag("NetIncomeLoss", 1).over(windowSpec) * 100
val salesChangeFormula = ($"Revenue" - lag("Revenue", 1).over(windowSpec)) / lag("Revenue", 1).over(windowSpec) * 100
val researchChangeFormula = ($"ResearchAndDevelopmentExpense" - lag("ResearchAndDevelopmentExpense", 1).over(windowSpec)) / lag("ResearchAndDevelopmentExpense", 1).over(windowSpec) * 100
val edgarAAPLYear2 = edgarAAPLYear1
.withColumn("NetIncomeChange%", netIncomeChangeFormula )
.withColumn("RevenueChange%", salesChangeFormula )
.withColumn("ResearchAndDevelopmentChange%", researchChangeFormula )
edgarAAPLYear2.display(100)
CPU times: user 0 ns, sys: 199 µs, total: 199 µs Wall Time: 7 s
null
Finally here we have the consolidated example that is wrapping the logic which has been explained above in a simple Object so that we can reuse the functionlity for different ticker symbols:
%%time
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Row
object EdgarCompanyData {
def getDataset(ticker:String):Dataset[Row] = {
// Edgar data selection
val values = new CompanyEdgarValues(new CompanySelection().setTradingSymbol(ticker))
.setFilter(new FilterYearly())
.setParameterNames("NetIncomeLoss","OperatingIncomeLoss","ResearchAndDevelopmentExpense",
"CashAndCashEquivalentsAtCarryingValue","AvailableForSaleSecuritiesCurrent","AccountsReceivableNetCurrent",
"Revenues","SalesRevenueNet","InventoryNet","AssetsCurrent","LiabilitiesCurrent","Assets","EarningsPerShareBasic",
"StockholdersEquity")
// formulas
val file = Utils.createTempFile(values.toCsv());
val windowSpec = Window.partitionBy("identifier").orderBy("date")
val netIncomeChangeFormula = ($"NetIncomeLoss" - lag("NetIncomeLoss", 1).over(windowSpec)) / lag("NetIncomeLoss", 1).over(windowSpec) * 100
val salesChangeFormula = ($"Revenue" - lag("Revenue", 1).over(windowSpec)) / lag("Revenue", 1).over(windowSpec) * 100
val researchChangeFormula = ($"ResearchAndDevelopmentExpense" - lag("ResearchAndDevelopmentExpense", 1).over(windowSpec)) / lag("ResearchAndDevelopmentExpense", 1).over(windowSpec) * 100
val edgarYear = spark.read.format("csv")
.option("delimiter", ";")
.option("header", "true")
.load(file.getAbsolutePath)
.withColumn("Revenue",expr("coalesce(Revenues, SalesRevenueNet)"))
.withColumn("QuickRatio",expr("(CashAndCashEquivalentsAtCarryingValue + AccountsReceivableNetCurrent + AvailableForSaleSecuritiesCurrent) / LiabilitiesCurrent"))
.withColumn("CurrentRatio",expr("AssetsCurrent / LiabilitiesCurrent"))
.withColumn("InventoryTurnover",expr("Revenue / InventoryNet"))
.withColumn("NetProfitMargin",expr("NetIncomeLoss / Revenue"))
.withColumn("SalesResearchRatio%",expr("ResearchAndDevelopmentExpense / Revenue *100"))
.withColumn("NetIncomeResearchRatio%",expr("ResearchAndDevelopmentExpense / NetIncomeLoss * 100"))
.withColumn("NetIncomeChange%", netIncomeChangeFormula )
.withColumn("RevenueChange%", salesChangeFormula )
.withColumn("ResearchAndDevelopmentChange%", researchChangeFormula )
.drop($"Revenues")
.drop($"SalesRevenueNet")
return edgarYear
}
}
EdgarCompanyData.getDataset("AAPL").display(100)
CPU times: user 0 ns, sys: 288 µs, total: 288 µs Wall Time: 7 s
null
And now we try another ticker symbol
EdgarCompanyData.getDataset("MSFT").display(100)