Stanza pipeline

If gatenlp has been installed with the stanza extra (pip install gatenlp[stanza] or pip install gatenlp[all]), you can run a Stanford Stanza pipeline on a document and get the result as gatenlp annotations.
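You can check whether the extra is present before importing anything. The following is a minimal sketch that uses only the standard library. The helper name `has_module` is hypothetical and is not part of gatenlp. It only checks that the `stanza` package can be imported, not that any Stanza models have been downloaded.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module with this name is importable (hypothetical helper)."""
    # find_spec returns None when no module of that name is found on the import path
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("stanza"):
        print("stanza is available")
    else:
        print("stanza is missing, try: pip install gatenlp[stanza]")
```

Pass only top-level names such as `"stanza"`. For dotted names, `find_spec` imports the parent package first, and it can raise an exception if that parent is missing.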

In [1]:
from gatenlp import Document
from gatenlp.lib_stanza import AnnStanza
import stanza
In [2]:
# To use the English pipeline with Stanza, the English models have to be downloaded first
stanza.download('en')
Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/master/resources_1.1.0.json: 122kB [00:00, 65.9MB/s]                    
2020-11-29 16:43:27,915|INFO|stanza|Downloading default packages for language: en (English)...
2020-11-29 16:43:31,366|INFO|stanza|File exists: /home/johann/stanza_resources/en/default.zip.
2020-11-29 16:43:36,801|INFO|stanza|Finished downloading models and saved to /home/johann/stanza_resources.
In [3]:
doc = Document.load("https://gatenlp.github.io/python-gatenlp/testdocument2.txt")
doc

Annotating the document using Stanza

To annotate one or more documents with Stanza, first create an AnnStanza annotator object and then run the document(s) through this annotator:

In [4]:
stanza_annotator = AnnStanza(lang="en")
2020-11-29 16:43:37,651|INFO|stanza|Loading these models for language: en (English):
=========================
| Processor | Package   |
-------------------------
| tokenize  | ewt       |
| pos       | ewt       |
| lemma     | ewt       |
| depparse  | ewt       |
| sentiment | sstplus   |
| ner       | ontonotes |
=========================

2020-11-29 16:43:37,680|INFO|stanza|Use device: cpu
2020-11-29 16:43:37,681|INFO|stanza|Loading: tokenize
2020-11-29 16:43:37,918|INFO|stanza|Loading: pos
2020-11-29 16:43:38,649|INFO|stanza|Loading: lemma
2020-11-29 16:43:38,673|INFO|stanza|Loading: depparse
2020-11-29 16:43:39,477|INFO|stanza|Loading: sentiment
2020-11-29 16:43:40,340|INFO|stanza|Loading: ner
2020-11-29 16:43:40,918|INFO|stanza|Done loading processors!
In [5]:
doc = stanza_annotator(doc)
doc
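Loading the Stanza models takes a few seconds (see the log above), so it makes sense to create one annotator and reuse it for many documents. The sketch below defines two hypothetical helpers that are not part of gatenlp:

- `annotate_all` runs any callable annotator over a list of documents and keeps the returned documents, the same way `doc = stanza_annotator(doc)` does above.
- `count_annotation_types` tallies annotations by their `type` attribute.

The only assumption about the annotations is that each has a `type` attribute, which gatenlp annotations do. The sketch makes no assumption about which annotation type names Stanza produces.

```python
from collections import Counter
from typing import Callable, Iterable, List, TypeVar

D = TypeVar("D")


def annotate_all(annotator: Callable[[D], D], docs: Iterable[D]) -> List[D]:
    """Run each document through the same annotator and collect the returned documents.

    Hypothetical helper: reusing one annotator avoids reloading the models per document.
    """
    return [annotator(doc) for doc in docs]


def count_annotation_types(annotations: Iterable) -> Counter:
    """Count annotations by their `type` attribute (hypothetical helper)."""
    return Counter(ann.type for ann in annotations)
```

For example, `count_annotation_types(doc.annset())` would summarize the default annotation set of the document annotated above. This assumes AnnStanza wrote its annotations to the default set.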