You'll need the CoreNLP jars on disk, for example by declaring CoreNLP as a dependency in a Maven project so that Maven downloads them into your local repository.
We load them this way because the CoreNLP models artifact uses the classifier 'models', and jupyter-scala's dependency-loading syntax currently doesn't support classifiers.
import ammonite.ops.Path
import $ivy.`org.slf4j:slf4j-api:1.7.6`
import $ivy.`com.google.protobuf:protobuf-java:3.0.0`
import $ivy.`joda-time:joda-time:2.9.4`
import $ivy.`de.jollyday:jollyday:0.5.1`
val coreNLPVersion = "3.8.0"
val myHome = "" // set this to your home directory (the one that contains .m2/repository)
val pathPrefix = s"$myHome/.m2/repository"
val stanfordPrefix = s"${pathPrefix}/edu/stanford/nlp/stanford-corenlp/$coreNLPVersion"
interp.load.cp(
  Seq(
    Path(s"${stanfordPrefix}/stanford-corenlp-$coreNLPVersion.jar"),
    Path(s"${stanfordPrefix}/stanford-corenlp-$coreNLPVersion-models.jar")
  )
)
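The jar paths above follow the standard local Maven repository layout (group directories, then artifact, then version). As a minimal sketch of that layout, assuming the default repository location `~/.m2/repository`, the paths could also be derived from the Maven coordinates. The helper `mavenJarPath` is hypothetical and is not part of Ammonite, jupyter-scala, or CoreNLP.

```scala
import java.nio.file.{Files, Path => JPath, Paths}

// Hypothetical helper: builds the path of a jar inside a local Maven
// repository from its coordinates. The classifier, if any, is appended
// to the file name after the version.
def mavenJarPath(
    repoRoot: JPath,
    groupId: String,
    artifactId: String,
    version: String,
    classifier: Option[String] = None
): JPath = {
  val suffix = classifier.map("-" + _).getOrElse("")
  // "edu.stanford.nlp" becomes the nested directories edu/stanford/nlp
  val groupDir = groupId.split('.').foldLeft(repoRoot)((dir, part) => dir.resolve(part))
  groupDir.resolve(artifactId).resolve(version).resolve(s"$artifactId-$version$suffix.jar")
}

// Assumption: the local repository lives in the default place under the user's home.
val repoRoot: JPath = Paths.get(sys.props("user.home"), ".m2", "repository")

// The main jar (no classifier) and the models jar (classifier "models").
val coreNlpJars: Seq[JPath] = Seq[Option[String]](None, Some("models")).map { classifier =>
  mavenJarPath(repoRoot, "edu.stanford.nlp", "stanford-corenlp", "3.8.0", classifier)
}

// Report which jars are actually present before trying to load them.
coreNlpJars.foreach(jar => println(s"$jar exists: ${Files.exists(jar)}"))
```

In the notebook, each resulting path could then be wrapped as `Path(jar.toString)` and passed to `interp.load.cp` as in the cell above.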
import edu.stanford.nlp.simple._
import scala.collection.JavaConverters._
val sentenceText = "Chomsky's colorless green ideas sleep furiously."
val sentence = new Sentence(sentenceText)
sentenceText: String = "Chomsky's colorless green ideas sleep furiously."
sentence: Sentence = Chomsky's colorless green ideas sleep furiously.
sentence.nerTags
res4: java.util.List[String] = [PERSON, O, O, O, O, O, O, O]
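The NER output has one tag per token, with `O` marking tokens that are outside any named entity. As a small sketch of how to read it, the snippet below pairs tokens with tags and keeps only the tagged ones. The token and tag lists are copied by hand from this notebook's outputs so that the snippet runs without CoreNLP; in the notebook they would presumably come from `sentence.words.asScala.toList` and `sentence.nerTags.asScala.toList`.

```scala
// Tokens and NER tags transcribed from the notebook outputs (plain Scala lists,
// so no CoreNLP classes are needed here).
val words = List("Chomsky", "'s", "colorless", "green", "ideas", "sleep", "furiously", ".")
val nerTags = List("PERSON", "O", "O", "O", "O", "O", "O", "O")

// Pair each token with its tag and drop tokens labelled "O" (outside any entity).
val entities: List[(String, String)] =
  words.zip(nerTags).filter { case (_, tag) => tag != "O" }

entities.foreach { case (word, tag) => println(s"$word -> $tag") }
```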
sentence.posTags
res5: java.util.List[String] = [NNP, POS, JJ, JJ, NNS, VBP, RB, .]
val parseTree = sentence.parse()
parseTree.indentedListPrint
ROOT
  S
    NP
      NP
        NNP
          Chomsky
        POS
          's
      JJ
        colorless
      JJ
        green
      NNS
        ideas
    VP
      VBP
        sleep
      ADVP
        RB
          furiously
    .
      .
parseTree: edu.stanford.nlp.trees.Tree = (ROOT (S (NP (NP (NNP Chomsky) (POS 's)) (JJ colorless) (JJ green) (NNS ideas)) (VP (VBP sleep) (ADVP (RB furiously))) (. .)))
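To make the relation between the bracketed tree and the indented listing concrete, here is a rough sketch that prints one label per line in pre-order, indented by depth. It uses a tiny stand-in tree type rather than CoreNLP's `edu.stanford.nlp.trees.Tree`; the names `Node`, `n`, and `indentedLines` are hypothetical, and the two-space indent is an assumption, not taken from CoreNLP's implementation.

```scala
// Minimal stand-in tree type, for illustration only.
final case class Node(label: String, children: List[Node] = Nil)

// Small constructor helper to keep the literal tree readable.
def n(label: String, kids: Node*): Node = Node(label, kids.toList)

// Pre-order traversal: emit the node's label, then its children one level deeper.
def indentedLines(node: Node, depth: Int = 0): List[String] =
  (("  " * depth) + node.label) :: node.children.flatMap(indentedLines(_, depth + 1))

// The bracketed parse from the notebook output, transcribed by hand.
val tree =
  n("ROOT",
    n("S",
      n("NP",
        n("NP", n("NNP", n("Chomsky")), n("POS", n("'s"))),
        n("JJ", n("colorless")),
        n("JJ", n("green")),
        n("NNS", n("ideas"))),
      n("VP",
        n("VBP", n("sleep")),
        n("ADVP", n("RB", n("furiously")))),
      n(".", n("."))))

val lines = indentedLines(tree)
lines.foreach(println)
```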
Next, we'll try to parse some garden-path sentences, that is, sentences whose most natural initial reading turns out to be wrong.
val gardenPathSentences = List(
  "The government plans to raise taxes were approved.",
  "The complex houses married and single soldiers and their families.",
  "The horse raced past the barn fell.",
  "The old man the boat."
).map(str => new Sentence(str))
// Pair every token with its part-of-speech tag.
gardenPathSentences.map { sent =>
  sent.words.asScala.toList.zip(
    sent.posTags.asScala.toList
  )
}
gardenPathSentences: List[Sentence] = List(
  The government plans to raise taxes were approved.,
  The complex houses married and single soldiers and their families.,
  The horse raced past the barn fell.,
  The old man the boat.
)
res7_1: List[List[(String, String)]] = List(
  List(
    ("The", "DT"),
    ("government", "NN"),
    ("plans", "VBZ"),
    ("to", "TO"),
    ("raise", "VB"),
    ("taxes", "NNS"),
    ("were", "VBD"),
    ("approved", "VBN"),
    (".", ".")
  ),
...
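One way to see where the tagger was led down the garden path is to compare its tags with the tags of the intended reading. The sketch below does this for the first sentence only. The predicted tags are copied from the output above; the reference tags are my own hand annotation of the intended reading (where "government plans" is a noun compound), not something CoreNLP provides.

```scala
// Tagger output for the first sentence, transcribed from the notebook output.
val tagged = List(
  "The" -> "DT", "government" -> "NN", "plans" -> "VBZ", "to" -> "TO",
  "raise" -> "VB", "taxes" -> "NNS", "were" -> "VBD", "approved" -> "VBN", "." -> "."
)

// Assumption: hand-annotated tags for the intended reading, with "plans" as a plural noun.
val reference = List("DT", "NN", "NNS", "TO", "VB", "NNS", "VBD", "VBN", ".")

// Collect (word, predicted, expected) wherever the tagger disagrees with the reference.
val mismatches: List[(String, String, String)] =
  tagged.zip(reference).collect {
    case ((word, predicted), expected) if predicted != expected =>
      (word, predicted, expected)
  }

mismatches.foreach { case (word, predicted, expected) =>
  println(s"$word: tagged $predicted, intended reading needs $expected")
}
```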
gardenPathSentences.foreach { sent =>
  println("Sentence:")
  println(sent)
  println()
  println("Parse tree:")
  sent.parse.indentedListPrint
}
Sentence:
The government plans to raise taxes were approved.

Parse tree:
ROOT S NP DT The NN government VP VBZ plans S VP TO to VP VB raise SBAR S NP NNS taxes VP VBD were VP VBN approved . .
Sentence:
The complex houses married and single soldiers and their families.

Parse tree:
ROOT NP NP DT The ADJP JJ complex NNS houses NP NP VBN married CC and JJ single NNS soldiers CC and NP PRP$ their NNS families . .
Sentence:
The horse raced past the barn fell.

Parse tree:
ROOT S NP DT The NN horse VP VBD raced SBAR S NP IN past DT the NN barn VP VBD fell . .
Sentence:
The old man the boat.

Parse tree:
ROOT NP NP DT The JJ old NN man NP DT the NN boat . .
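A quick way to summarize these parses is to look at the category directly under ROOT: the listing is in pre-order, so it is the second label printed. The sketch below works on the label sequences copied by hand from the output above (not on live CoreNLP trees) and picks out the sentences that were parsed as a bare noun phrase rather than a clause.

```scala
// Sentence -> node labels in print order, transcribed from the notebook output.
val flatParses = List(
  "The government plans to raise taxes were approved." ->
    "ROOT S NP DT The NN government VP VBZ plans S VP TO to VP VB raise SBAR S NP NNS taxes VP VBD were VP VBN approved . .",
  "The complex houses married and single soldiers and their families." ->
    "ROOT NP NP DT The ADJP JJ complex NNS houses NP NP VBN married CC and JJ single NNS soldiers CC and NP PRP$ their NNS families . .",
  "The horse raced past the barn fell." ->
    "ROOT S NP DT The NN horse VP VBD raced SBAR S NP IN past DT the NN barn VP VBD fell . .",
  "The old man the boat." ->
    "ROOT NP NP DT The JJ old NN man NP DT the NN boat . ."
)

// In a pre-order listing, the label right after ROOT is ROOT's first child.
val topCategories: List[(String, String)] =
  flatParses.map { case (sentence, labels) => sentence -> labels.split(" ")(1) }

// Sentences whose top-level category is NP were parsed without a main verb.
val parsedAsFragment: List[String] =
  topCategories.collect { case (sentence, "NP") => sentence }

parsedAsFragment.foreach(println)
```

For the two sentences it selects, the parser found no main verb: "houses" and "man" were tagged as nouns (NNS and NN) in the output above.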