A previous notebook showed how to get an overview of the values of data in OCRE. This notebook shows you how to summarize and graph distributions of values for OCRE properties. It uses version 1.6.0 of the nomisma library.
First, configure the Jupyter notebook. In addition to the nomisma library, we will use plotly for plotting graphs and the histoutils package to simplify working with histograms.
// 1. Add maven repository where we can find our libraries
val myBT = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")
interp.repositories() ++= Seq(myBT)
// 2. Make libraries available with `$ivy` imports:
import $ivy.`edu.holycross.shot::nomisma:1.6.0`
import $ivy.`edu.holycross.shot::histoutils:2.2.0`
import $ivy.`org.plotly-scala::plotly-almond:0.7.1`
import edu.holycross.shot.nomisma._
val ocreCex = "https://raw.githubusercontent.com/neelsmith/nomisma/master/cex/ocre-cite-ids.cex"
val ocre = OcreSource.fromUrl(ocreCex)
// Sanity check:
require(ocre.size > 50000)
A previous notebook (available on mybinder) showed how to check the values for properties of an Ocre object. Let's review how many valid values OCRE includes for denomination. We'll use the hasDenomination function to get only issues with valid data values for the denomination property, then apply denominationList to that result.
println("Number of valid values for denomination: " + ocre.hasDenomination.denominationList.size)
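The two steps above (keep only issues with a value, then list the values) can be sketched with plain Scala collections. This is only an illustration, not the nomisma API: the SampleIssue class and its data are hypothetical, and it assumes that a missing denomination is modeled as None and that the resulting list holds distinct values.

```scala
// Hypothetical stand-in for an OCRE issue; not a nomisma class.
case class SampleIssue(id: String, denomination: Option[String])

// Made-up sample data, not drawn from OCRE.
val sampleIssues = Vector(
  SampleIssue("a", Some("denarius")),
  SampleIssue("b", None),
  SampleIssue("c", Some("aureus")),
  SampleIssue("d", Some("denarius"))
)

// Step 1: keep only issues that have a denomination.
val withDenomination = sampleIssues.filter(_.denomination.isDefined)

// Step 2: collect the distinct denomination values.
val sampleDenominations = withDenomination.flatMap(_.denomination).distinct

println("Distinct denominations in sample: " + sampleDenominations.size)
```

Filtering first and listing second mirrors the order of the chained calls in the cell above.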
Seventy-one seems like a lot. How often does each denomination appear?
Ocre includes a function to create a Histogram object for a named property. The Histogram has a Vector of Frequency objects, so if we sort the frequencies by count, we can look at the first and last entries to see the most and least common values in OCRE for denomination.
import edu.holycross.shot.histoutils._
val denominationHisto: edu.holycross.shot.histoutils.Histogram[String] = ocre.histogram("denomination").sorted
println("Entries in histogram of denominations: " + denominationHisto.size)
println("Most frequent denomination: " + denominationHisto.frequencies.head)
println("Least frequent denomination: " + denominationHisto.frequencies.last)
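To show what a sorted histogram amounts to, here is a minimal sketch that counts and sorts values with the Scala standard library only. It does not use histoutils; the sample values are hypothetical, and the descending sort order is an assumption based on the head of the sorted list being the most frequent entry.

```scala
// Hypothetical sample values; not drawn from OCRE.
val sampleValues = Vector("denarius", "aureus", "denarius", "as", "denarius", "aureus")

// Count occurrences of each value, then sort by descending count.
val sampleFrequencies: Vector[(String, Int)] =
  sampleValues
    .groupBy(identity)
    .map { case (item, occurrences) => (item, occurrences.size) }
    .toVector
    .sortBy { case (_, count) => -count }

println("Most frequent: " + sampleFrequencies.head)
println("Least frequent: " + sampleFrequencies.last)
```

Each (item, count) pair plays the role that a Frequency plays in the histogram above.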
It's straightforward to visualize histograms as bar graphs using the plotly library.
// Import plotly libraries, and set display defaults suggested for use in Jupyter notebooks:
import plotly._, plotly.element._, plotly.layout._, plotly.Almond._
repl.pprinter() = repl.pprinter().copy(defaultHeight = 3)
Plotly can construct a bar graph from two parallel lists of values for the x and y axes. The Frequency object in our histogram has item and count properties we can use for x and y respectively.
val denominationValues = denominationHisto.frequencies.map(_.item)
val denominationCounts = denominationHisto.frequencies.map(_.count)
val denominationPlot = Seq(
  Bar(x = denominationValues, y = denominationCounts)
)
plot(denominationPlot)
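The idea of parallel lists can be illustrated without plotly. The sketch below uses hypothetical (item, count) pairs in place of Frequency objects and splits them with the standard library's unzip, so that position i in one list corresponds to position i in the other.

```scala
// Hypothetical (item, count) pairs, already sorted by descending count.
val samplePairs = Vector(("denarius", 3), ("aureus", 2), ("as", 1))

// unzip splits the pairs into two parallel sequences of equal length.
val (sampleItems, sampleCounts) = samplePairs.unzip

// The two sequences must stay aligned for a bar graph to be meaningful.
require(sampleItems.size == sampleCounts.size)
```

Mapping over the same sequence twice, as the cell above does with item and count, gives the same alignment because both maps preserve order.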
Let's take a second example: how frequently are issues struck in different geographic regions over the five centuries of data in OCRE?
val regionHisto: edu.holycross.shot.histoutils.Histogram[String] = ocre.histogram("region").sorted
val regionValues = regionHisto.frequencies.map(_.item)
val regionCounts = regionHisto.frequencies.map(_.count)
val regionPlot = Seq(
  Bar(x = regionValues, y = regionCounts)
)
plot(regionPlot)
It would be nice to look further into the uneven distribution of issues outside of Italy. In a subsequent notebook, we'll take OCRE's information about specific mints and generate geographic maps.