Web Service Client Annotators

Web service client annotators are annotators that use a web service to annotate documents: for each document that is processed, data is sent to an HTTP endpoint and processed there, and the information sent back is used to annotate the document.
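The round trip described above can be sketched in plain Python. This is only an illustration under stated assumptions: `fake_service` is a hypothetical stand-in for the remote HTTP endpoint, and the JSON layout it returns is invented for this example; it is not the format used by any of the real services or by gatenlp itself.

```python
import json


def fake_service(text):
    """Hypothetical stand-in for the remote endpoint (no network access).

    Returns a JSON string with character offsets, in an invented format.
    """
    entities = []
    for name, etype in [("Barack Obama", "Person"), ("Microsoft", "Organization")]:
        start = text.find(name)
        if start >= 0:
            entities.append({"start": start, "end": start + len(name), "type": etype})
    return json.dumps({"entities": entities})


def annotate(text, call_service=fake_service):
    """Send the text to the service, parse the reply, build annotation records."""
    response = json.loads(call_service(text))
    return [
        {
            "type": ent["type"],
            "start": ent["start"],
            "end": ent["end"],
            "text": text[ent["start"]:ent["end"]],
        }
        for ent in response["entities"]
    ]


annotations = annotate("Barack Obama visited Microsoft in New York last May.")
for ann in annotations:
    print(ann["type"], ann["start"], ann["end"], ann["text"])
```

A real client annotator would replace `fake_service` with an HTTP request and store the results as annotations on the document rather than as plain dictionaries.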

Currently the following client annotators are implemented:

  • GateCloudAnnotator: this annotator connects to one of the many services on the GATE Cloud platform (https://cloud.gate.ac.uk/). Services include named entity recognition for tweets or standard texts in several languages, and entity disambiguation and linking to Wikipedia, MeSH or Snomed. NOTE: some services require input that is not just plain text; at the moment, only services that can annotate plain text are supported.
  • TagMeAnnotator: this annotator connects to the TagMe mention disambiguation and linking service (https://sobigdata.d4science.org/group/tagme/tagme) to perform either the task "tag" (disambiguation and linking of mentions) or the task "spot" (finding mentions only).
  • TextRazorTextAnnotator: this annotator connects to the TextRazor API endpoint (see https://www.textrazor.com/) to annotate the text of the document.
  • ElgTextAnnotator: this annotator connects to one of the public endpoints from the European Language Grid project (see https://live.european-language-grid.eu).
In [3]:
from gatenlp import Document
from gatenlp.processing.client import GateCloudAnnotator

Let's try annotating a document with the English Named Entity Recognizer on GATE Cloud (https://cloud.gate.ac.uk/shopfront/displayItem/annie-named-entity-recognizer).

The information page for that service shows that the following annotation types can be requested. The first five are returned by default if no alternative list is specified:

  • Address (included by default)
  • Date (included by default)
  • Location (included by default)
  • Organization (included by default)
  • Person (included by default)
  • Money
  • Percent
  • Token
  • SpaceToken
  • Sentence

We create a GateCloudAnnotator and specify the full list of all supported annotation types. We also specify the URL of the service endpoint as provided on the info page, and specify that the annotations should be put into the annotation set "ANNIE". Note that a limited number of documents can be annotated for free and without authentication, so we do not need to specify the api_key and api_password parameters.

In [4]:
annotator = GateCloudAnnotator(
    url="https://cloud-api.gate.ac.uk/process-document/annie-named-entity-recognizer", 
    out_annset="ANNIE", 
    ann_types=":Address,:Date,:Location,:Organization,:Person,:Money,:Percent,:Token,:SpaceToken,:Sentence"
)
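The ann_types string used above is a comma-separated list in which each type name is prefixed with a colon, so it can be built from a plain list of names instead of being typed by hand. The following is a minimal sketch of such a helper. The function `gate_cloud_kwargs` and the environment variable names `GATE_CLOUD_API_KEY` and `GATE_CLOUD_API_PASSWORD` are hypothetical and not part of gatenlp; only the parameter names `url`, `out_annset`, `ann_types`, `api_key` and `api_password` are taken from the text above.

```python
import os


def gate_cloud_kwargs(url, out_annset, types, env=os.environ):
    """Build keyword arguments for GateCloudAnnotator (hypothetical helper).

    The ann_types value mirrors the format used above: each type name
    prefixed with ':' and the entries joined with commas.
    """
    kwargs = {
        "url": url,
        "out_annset": out_annset,
        "ann_types": ",".join(":" + name for name in types),
    }
    # Assumed environment variable names; credentials are added only if both are set.
    key = env.get("GATE_CLOUD_API_KEY")
    password = env.get("GATE_CLOUD_API_PASSWORD")
    if key and password:
        kwargs["api_key"] = key
        kwargs["api_password"] = password
    return kwargs


kwargs = gate_cloud_kwargs(
    url="https://cloud-api.gate.ac.uk/process-document/annie-named-entity-recognizer",
    out_annset="ANNIE",
    types=["Address", "Date", "Location", "Organization", "Person"],
    env={},  # empty mapping: behaves like an unauthenticated request
)
print(kwargs["ann_types"])
```

The resulting dictionary could then be passed on as `GateCloudAnnotator(**kwargs)`. Reading credentials from the environment keeps them out of the notebook itself.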
In [6]:
# an example document to annotate
doc = Document("Barack Obama visited Microsoft in New York last May.")
In [7]:
# Run the annotator and show the annotated document
doc = annotator(doc)
doc
Out[7]: