This notebook shows you how to load OCRE data from a CEX file over the internet and build a corpus of text citable by CTS URN. It uses version 1.7.0 of the `nomisma` library.
First configure the Jupyter notebook. In addition to the `nomisma` library, we'll need the `cite` and `ohco2` libraries from the CITE architecture.
// 1. Add maven repository where we can find our libraries
val myBT = coursierapi.MavenRepository.of("https://dl.bintray.com/neelsmith/maven")
interp.repositories() ++= Seq(myBT)
// 2. Make libraries available with `$ivy` imports:
import $ivy.`edu.holycross.shot::nomisma:1.7.0`
import $ivy.`edu.holycross.shot::ohco2:10.16.0`
import $ivy.`edu.holycross.shot.cite::xcite:4.1.1`
import edu.holycross.shot.nomisma._
val ocreCex = "https://raw.githubusercontent.com/neelsmith/nomisma/master/cex/ocre-cite-ids.cex"
val ocre = OcreSource.fromUrl(ocreCex)
// Sanity check:
require(ocre.size > 50000)
You can build an OHCO2 corpus with the `corpus` function.
import edu.holycross.shot.ohco2._
import edu.holycross.shot.cite._
val corpus: Corpus = ocre.corpus
println("Citable nodes of text in corpus: " + corpus.size)
The `OcreIssue` class includes a `textNodes` function that creates a Vector of 0-2 `CitableNode`s. There will be two text nodes if the issue has both an obverse and a reverse legend. Let's examine the CTS URNs of an issue that has both obverse and reverse legends.
val issueId = "3.com.43"
val randomIssue = ocre.issue(issueId).get
println("In issue " + issueId + ", made " + randomIssue.textNodes.size + " text nodes")
for (n <- randomIssue.textNodes) {
  println("\nReference: " + n.urn)
  println("Text content: " + n.text)
}
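To see why `textNodes` can return anywhere from zero to two nodes, here is a minimal, library-free sketch. It does not use the nomisma API: `LegendSketch` and `legendNodes` are hypothetical names, and the legend strings are placeholders rather than OCRE data. It only illustrates how two optional legends collapse into a Vector of 0-2 entries.

```scala
// Hypothetical stand-in for an issue; NOT the nomisma OcreIssue class.
final case class LegendSketch(id: String, obverse: Option[String], reverse: Option[String])

// Pair each legend that is present with a label for its side.
// Missing legends (None) drop out when the Vector of Options is flattened.
def legendNodes(issue: LegendSketch): Vector[(String, String)] =
  Vector(
    issue.obverse.map(text => ("obverse", text)),
    issue.reverse.map(text => ("reverse", text))
  ).flatten

// Placeholder values, invented for illustration only.
val both = LegendSketch("example.1", Some("OBVERSE LEGEND"), Some("REVERSE LEGEND"))
val obverseOnly = LegendSketch("example.2", Some("OBVERSE LEGEND"), None)
val neither = LegendSketch("example.3", None, None)

for (issue <- Vector(both, obverseOnly, neither)) {
  println(issue.id + ": " + legendNodes(issue).size + " node(s)")
}
```

Modeling each legend as an `Option` means an issue with no recorded legend simply contributes nothing, so no special case is needed for empty results.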
Let's parse the components of the URN.
It belongs to the CTS namespace `hcnum` and the text group `issues`.
Within that group, its document identifier is `ric`, and the specific version identifier is `raw`. When we process the corpus (e.g., to generate a fully expanded version of abbreviated terms), we will use a different version identifier, but the rest of the URN will be the same.
The passage component is directly adapted from the nomisma.org identifier: `3.com.43` identifies RIC volume 3, Commodus, issue 43. The final piece of the passage component distinguishes obverse text from reverse text.
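The breakdown above can be reproduced with plain string handling. The following is a rough sketch only: `UrnParts` and `splitCtsUrn` are hypothetical helpers written for this illustration, not part of the `cite` library, and the sketch assumes a URN with a full group.work.version hierarchy and a passage component. In real code you would rely on the library's `CtsUrn` class instead.

```scala
// Hypothetical helper types for illustration; NOT the CITE library's CtsUrn.
final case class UrnParts(
  namespace: String,
  textGroup: String,
  work: String,
  version: String,
  passage: Vector[String]
)

// Split a CTS URN of the form urn:cts:NAMESPACE:GROUP.WORK.VERSION:PASSAGE.
// Assumes all three work-hierarchy levels and a passage are present.
def splitCtsUrn(urn: String): UrnParts = {
  val fields = urn.split(":")
  require(
    fields.length == 5 && fields(0) == "urn" && fields(1) == "cts",
    "Expected a CTS URN with a passage component: " + urn
  )
  val workParts = fields(3).split("\\.")
  require(workParts.length >= 3, "Expected group.work.version in: " + urn)
  UrnParts(fields(2), workParts(0), workParts(1), workParts(2), fields(4).split("\\.").toVector)
}

val parts = splitCtsUrn("urn:cts:hcnum:issues.ric.raw:3.com.43")
println("Namespace: " + parts.namespace)
println("Text group: " + parts.textGroup)
println("Document: " + parts.work)
println("Version: " + parts.version)
println("Passage levels: " + parts.passage.mkString(" > "))
```

Because only the version field differs between editions of the same text, a processed edition would change `version` while leaving every other field untouched.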
The `corpus` function in `Ocre` creates 0-2 `CitableNode`s for each issue and compiles them into a text `Corpus`.
As in any CTS environment, we can then select texts identified at any level of the passage and work hierarchies.
val commodus43 = corpus.nodes.filter(_.urn <= CtsUrn("urn:cts:hcnum:issues.ric.raw:3.com.43"))
println("**OBV** " + commodus43.map(_.text).mkString(" **REV** "))
val allCommodus = corpus.nodes.filter(_.urn <= CtsUrn("urn:cts:hcnum:issues.ric.raw:3.com"))
println("All legends in coins of Commodus: " + allCommodus.size)
val allRIC3 = corpus.nodes.filter(_.urn <= CtsUrn("urn:cts:hcnum:issues.ric.raw:3"))
println("All legends in RIC 3: " + allRIC3.size)
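The three filters above select progressively larger parts of the passage hierarchy. As a rough illustration of the idea, and not of how the CITE library implements URN comparison, the sketch below treats a passage reference as a sequence of dot-separated levels and tests containment level by level. `passageContains` is a hypothetical helper, and every sample reference other than `3.com.43` is made up for the example.

```scala
// Hypothetical helper: true if `candidate` sits at or below `container`
// in the passage hierarchy. Levels are compared one by one, so "3.com.4"
// does not contain "3.com.43" even though it is a string prefix of it.
def passageContains(container: String, candidate: String): Boolean = {
  val outer = container.split("\\.").toVector
  val inner = candidate.split("\\.").toVector
  inner.startsWith(outer)
}

// Sample passage references; all but "3.com.43" are invented.
val samples = Vector("3.com.43", "3.com.4", "3.xyz.1", "4.xyz.1")

println("Under 3.com.43: " + samples.filter(passageContains("3.com.43", _)))
println("Under 3.com: " + samples.filter(passageContains("3.com", _)))
println("Under 3: " + samples.filter(passageContains("3", _)))
```

Comparing whole levels rather than raw strings is the design point here: a plain `String.startsWith` test would wrongly place issue 43 under issue 4.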