OpenCog Natural Language Processing

The OpenCog Framework (OCF) is a toolkit for software developers. At its foundation are the OpenCog Core and the OpenCog runtime, consisting of the AtomSpace and the CogServer. The OCF also includes source code and stable APIs for developing AI applications that run on the OpenCog Core. It is designed to be used with specific cognitive algorithms (e.g. MOSES and PLN, which are developed as plug-ins to the OCF) to build AI applications as collections of interacting OpenCog MindAgents.

Pipeline

For natural language comprehension, OpenCog currently has three major steps:

  • Link Grammar

    The Link Grammar Parser is a syntactic parser of English, based on link grammar, an original theory of English syntax. Given a sentence, the system assigns to it a syntactic structure, which consists of a set of labeled links connecting pairs of words. The parser also produces a "constituent" representation of a sentence (showing noun phrases, verb phrases, etc.).

  • RelEx

    RelEx, a narrow-AI component of OpenCog, is an English-language semantic dependency relationship extractor, built on the Link Grammar parser. It uses a series of graph rewriting rules to identify subject, object, indirect object, and many other syntactic dependency relationships between the words in a sentence; that is, it generates the grammatical relations of a dependency grammar. The set of dependency relations it employs resembles those of Dekang Lin's MiniPar and the Stanford parser (and it has an explicit compatibility mode). It is inspired in part by the ideas of Hudson's Word Grammar.

    Unlike most dependency parsers, RelEx attempts a greater degree of semantic normalization for questions, comparatives, entities, and prepositional relationships, whereas other parsers (such as the Stanford parser) stick to a literal presentation of the syntactic structure of the text. For example, RelEx pays special attention to determining when a sentence or phrase implies that its referents are hypothetical rather than factive, to isolating the query variables in a question, and to determining whether a question seeks specific information or questions the truth-value of a proposition. Both of these aspects are intended to make RelEx well suited to question-answering and semantic comprehension/reasoning systems. In addition, RelEx tags words with features including part of speech, noun number, verb tense, gender, and determinacy.

  • RelEx2Logic

    RelEx2Logic uses the information recorded in both the Link Grammar and RelEx representations to generate OpenCog 'atoms': entities, expressed in Scheme code, which form the nodes and links of the hypergraph operated on by automated reasoning components such as MOSES and PLN, and which constitutes the central representation of all knowledge in OpenCog.
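The graph rewriting step at the heart of RelEx can be caricatured in a few lines of Python. This is a toy model, not RelEx's actual rule engine (the rule table and head/dependent conventions below are simplified assumptions): Link Grammar emits labeled links between word pairs, and a rewrite rule pattern-matches on the link label to emit a dependency relation.

```python
# Toy illustration of RelEx-style graph rewriting (not the real RelEx rules).
# Hypothetical simplified table: an "S" (subject) link between a noun and a
# verb yields a _subj relation; an "O" (object) link yields _obj.
REWRITE_RULES = {
    "S": "_subj",   # subject links, e.g. the Sp*i link in "I think ..."
    "O": "_obj",    # object links, e.g. Os
}

def rewrite(links):
    """Map labeled Link Grammar links to RelEx-style dependency relations."""
    relations = []
    for label, left, right in links:
        base = label[0]  # the major link type is the leading capital letter
        if base in REWRITE_RULES:
            if base == "S":
                # RelEx writes _subj(verb, subject): the verb is the head
                relations.append(f"{REWRITE_RULES[base]}({right}, {left})")
            else:
                relations.append(f"{REWRITE_RULES[base]}({left}, {right})")
    return relations

links = [("Wd", "LEFT-WALL", "I"), ("Sp*i", "I", "think"),
         ("TH", "think", "that"), ("Sp", "dogs", "fly")]
print(rewrite(links))   # ['_subj(think, I)', '_subj(fly, dogs)']
```

The real RelEx rule set is far larger and also handles feature tagging, normalization of questions and comparatives, and the other phenomena described above.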

Example

We begin by starting the OpenCog and RelEx servers.

Important: The CogServer and RelEx servers must be running in the background and the REST API must be loaded before running this notebook.

In [9]:
from opencog import *
In [ ]:
# Start the OpenCog server
server = Server()
#server.start()
In [18]:
# Start the RelEx server
relex_server = RelExServer()
#relex_server.start()

Then we can use Link Grammar and RelEx to parse this sentence:

I think that all dogs can fly.

In [14]:
relex("I think that all dogs can fly.")
(S (NP I) (VP think (SBAR that (S (NP all dogs) (VP can (VP fly))))) .)


    +-------------------------Xp------------------------+       
    |                     +------------CV---------->+   |       
    +---->WV---->+        +-----Cet-----+           |   |       
    +--Wd--+-Sp*i+---TH---+      +--Dmc-+--Sp-+--I--+   +--RW--+
    |      |     |        |      |      |     |     |   |      |
LEFT-WALL I.p think.v that.j-c all.a dogs.n can.v fly.v . RIGHT-WALL 


Parse confidence: 0.7866
cost vector = (UNUSED=0.0 DIS=0.0 LEN=12.0)

======

Dependency relations:

    that(think, fly)
    _subj(think, I)
    _quantity(dog, all)
    _subj(fly, dog)

Attributes:

    tense(think, present)
    subscript-TAG(think, .v)
    pos(think, verb)
    subscript-TAG(can, .v)
    pos(can, verb)
    subscript-TAG(dog, .n)
    pos(dog, noun)
    noun_number(dog, plural)
    subscript-TAG(all, .a)
    pos(all, adj)
    pos(., punctuation)
    tense(fly, present_future)
    HYP(fly, T)
    subscript-TAG(fly, .v)
    pos(fly, verb)
    pronoun-FLAG(I, T)
    gender(I, person)
    definite-FLAG(I, T)
    subscript-TAG(I, .p)
    pos(I, noun)
    noun_number(I, singular)
    subscript-TAG(that, .j-c)
    pos(that, conjunction)

The above shows the linkage produced by Link Grammar, which RelEx uses to extract the relations and attributes. To apply RelEx2Logic and generate the atoms in the OpenCog AtomSpace for reasoning, we first need to make sure both the OpenCog and RelEx servers are running; we then run the following, which executes Scheme code in the OpenCog Scheme shell.

In [19]:
to_logic("I think that all dogs can fly.")
((ImplicationLink (stv 0.99000001 0.99000001)
   (PredicateNode "[email protected]" (stv 0.001 0.99000001))
   (PredicateNode "think" (stv 0.001 0.99000001))
)
 (InheritanceLink (stv 0.99000001 0.99000001)
   (ConceptNode "[email protected]" (stv 0.001 0.99000001))
   (ConceptNode "I" (stv 0.001 0.99000001))
)
 (EvaluationLink (stv 0.99000001 0.99000001)
   (PredicateNode "[email protected]" (stv 0.001 0.99000001))
   (ListLink (stv 0.99000001 0.99000001)
      (ConceptNode "[email protected]" (stv 0.001 0.99000001))
   )
)
 (InheritanceLink (stv 0.99000001 0.99000001)
   (PredicateNode "[email protected]" (stv 0.001 0.99000001))
   (ConceptNode "present" (stv 0.001 0.99000001))
)
 (ImplicationLink (stv 0.99000001 0.99000001)
   (PredicateNode "[email protected]" (stv 0.001 0.99000001))
   (PredicateNode "fly" (stv 0.001 0.99000001))
)
 (EvaluationLink (stv 0.99000001 0.99000001)
   (PredicateNode "that" (stv 0.001 0.99000001))
   (ListLink (stv 0.99000001 0.99000001)
      (PredicateNode "[email protected]" (stv 0.001 0.99000001))
      (PredicateNode "[email protected]" (stv 0.001 0.99000001))
   )
)
 (InheritanceLink (stv 0.99000001 0.99000001)
   (ConceptNode "[email protected]" (stv 0.001 0.99000001))
   (ConceptNode "dog" (stv 0.001 0.99000001))
)
 (InheritanceLink (stv 0.99000001 0.99000001)
   (PredicateNode "[email protected]" (stv 0.001 0.99000001))
   (ConceptNode "present_future" (stv 0.001 0.99000001))
)
 (EvaluationLink (stv 0.99000001 0.99000001)
   (PredicateNode "definite" (stv 0.001 0.99000001))
   (ListLink (stv 0.99000001 0.99000001)
      (ConceptNode "[email protected]" (stv 0.001 0.99000001))
   )
)
 (ForAllLink (stv 0.99000001 0.99000001)
   (VariableNode "$X" (stv 0.001 0.99000001))
   (ImplicationLink (stv 0.99000001 0.99000001)
      (InheritanceLink (stv 0.99000001 0.99000001)
         (VariableNode "$X" (stv 0.001 0.99000001))
         (ConceptNode "[email protected]" (stv 0.001 0.99000001))
      )
      (EvaluationLink (stv 0.99000001 0.99000001)
         (PredicateNode "[email protected]" (stv 0.001 0.99000001))
         (ListLink (stv 0.99000001 0.99000001)
            (VariableNode "$X" (stv 0.001 0.99000001))
         )
      )
   )
)
)

The above is formatted in Scheme programming language notation. Each atom consists of a node or link type, followed by the specific information defining that atom and by two numbers used for probabilistic reasoning, discussed further below.

The links and nodes returned above represent the logic of the sentence in terms of predicate logic. For example, the (ForAllLink ...) roughly means that for every variable $X which inherits from the concept dog, applying the predicate fly to $X yields a high truth value. The (EvaluationLink ... think ... I ...), on the other hand, represents "I think", with think treated as a predicate and I as its argument. In general, each link, and some node types (such as PredicateNodes), may take two or more atoms as arguments, relating them to each other and to the concept defined by the governing node, to form natural-language dependency relations such as subject-verb-object, or basic logical relations such as inheritance.

Combining all the information so represented forms a hypergraph capturing the semantics of the sentence "I think that all dogs can fly." in a format suitable for automated reasoning.
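The hypergraph structure can be modeled in plain Python as a toy illustration (this is not the actual AtomSpace API, which is a C++ store with Scheme and Python bindings): atoms are typed nodes or links, and a link takes other atoms as its outgoing set, so links can contain links.

```python
from dataclasses import dataclass

# Toy model of AtomSpace atoms -- illustrative only.
@dataclass(frozen=True)
class Node:
    type: str       # e.g. "ConceptNode", "PredicateNode", "VariableNode"
    name: str

@dataclass(frozen=True)
class Link:
    type: str       # e.g. "InheritanceLink", "EvaluationLink", "ForAllLink"
    outgoing: tuple  # the atoms this link connects (may include other links)

# "for all $X inheriting from dog, fly($X)" -- the shape of the ForAllLink
# in the example output above (truth values and node UUIDs omitted)
x = Node("VariableNode", "$X")
for_all = Link("ForAllLink", (
    x,
    Link("ImplicationLink", (
        Link("InheritanceLink", (x, Node("ConceptNode", "dog"))),
        Link("EvaluationLink", (Node("PredicateNode", "fly"),
                                Link("ListLink", (x,)))),
    )),
))
```

Because links nest, a single top-level atom such as `for_all` denotes an entire sub-hypergraph, which is what makes the representation convenient for automated reasoning.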

In OpenCog, reasoning is done by PLN (Probabilistic Logic Networks). PLN is a novel conceptual, mathematical, and computational approach to uncertain inference. It provides a framework for probabilistic inference that is compatible with both term logic and predicate logic, and that scales up to operate in real time on large dynamic knowledge bases. The two numbers following the 'stv' notation in each atom above define the 'strength' and 'confidence' of that atom in PLN.
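To give a flavor of how PLN uses these strengths, here is a sketch of one inference rule, deduction, which estimates the strength of A->C from the strengths of A->B and B->C plus the term probabilities of B and C. The formula follows the independence-based form given in the PLN literature; treat the exact expression as an assumption here, and note that real PLN also propagates confidence, which this sketch omits.

```python
def pln_deduction_strength(s_ab, s_bc, s_b, s_c):
    """Independence-based PLN deduction strength (sketch, strengths only):
    estimate s(A->C) from s(A->B), s(B->C), s(B), and s(C)."""
    if s_b >= 1.0:
        # degenerate case: B covers everything, so A->C reduces to s(C)
        return s_c
    return s_ab * s_bc + (1.0 - s_ab) * (s_c - s_b * s_bc) / (1.0 - s_b)

# If A->B and B->C are both certain, A->C comes out certain as well:
print(pln_deduction_strength(1.0, 1.0, 0.5, 0.5))  # 1.0
```

Applied to the atoms above, such rules let OpenCog chain InheritanceLinks and ImplicationLinks together, with each conclusion carrying its own derived strength and confidence.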

Other sentences to try

To whom did you send that message?

In [20]:
relex("To whom did you send that message?")
(S to whom did (NP you) (VP send (NP that message)) ?)


    +--------------------------Xp-------------------------+       
    +------->WV------->+                                  |       
    |       +----Qd----+----I*d----+-------Os-------+     |       
    +---Wj--+-Jw-+     +-SIp-+     |       +---Dsu--+     +--RW--+
    |       |    |     |     |     |       |        |     |      |
LEFT-WALL to.r whom did.v-d you send.v that.j-d message.n ? RIGHT-WALL 


Parse confidence: 0.4965
cost vector = (UNUSED=0.0 DIS=2.0 LEN=15.0)

======

Dependency relations:

    to(send, _$qVar)
    _obj(send, message)
    _subj(send, you)
    _det(message, that)

Attributes:

    pronoun-FLAG(you, T)
    gender(you, person)
    pos(you, noun)
    tense(send, past_infinitive)
    HYP(send, T)
    subscript-TAG(send, .v)
    pos(send, verb)
    subscript-TAG(do, .v-d)
    pos(do, verb)
    QUERY-TYPE(_$qVar, who)
    pronoun-FLAG(_$qVar, T)
    relative-FLAG(_$qVar, T)
    interrogative-FLAG(_$qVar, T)
    pos(_$qVar, noun)
    pos(?, punctuation)
    definite-FLAG(message, T)
    subscript-TAG(message, .n)
    pos(message, noun)
    noun_number(message, singular)
    subscript-TAG(that, .j-d)
    pos(that, det)
    subscript-TAG(to, .r)
    pos(to, prep)
In [21]:
to_logic("To whom did you send that message?")
((ImplicationLink (stv 0.99000001 0.99000001)
   (PredicateNode "[email protected]" (stv 0.001 0.99000001))
   (PredicateNode "send" (stv 0.001 0.99000001))
)
 (InheritanceLink (stv 0.99000001 0.99000001)
   (ConceptNode "[email protected]" (stv 0.001 0.99000001))
   (ConceptNode "you" (stv 0.001 0.99000001))
)
 (InheritanceLink (stv 0.99000001 0.99000001)
   (ConceptNode "[email protected]" (stv 0.001 0.99000001))
   (ConceptNode "message" (stv 0.001 0.99000001))
)
 (EvaluationLink (stv 0.99000001 0.99000001)
   (PredicateNode "[email protected]" (stv 0.001 0.99000001))
   (ListLink (stv 0.99000001 0.99000001)
      (ConceptNode "[email protected]" (stv 0.001 0.99000001))
      (ConceptNode "[email protected]" (stv 0.001 0.99000001))
      (VariableNode "_$qVar" (stv 0.001 0.99000001))
   )
)
 (InheritanceLink (stv 0.99000001 0.99000001)
   (PredicateNode "[email protected]d0-2f43-48fe-9166-1a8c79f269ae" (stv 0.001 0.99000001))
   (ConceptNode "past_infinitive" (stv 0.001 0.99000001))
)
 (InheritanceLink (stv 0.99000001 0.99000001)
   (VariableNode "$9ZTKrfM3TuQYIXPmcF92hkGLUJmL6y5fS9uP" (stv 0.001 0.99000001))
   (ConceptNode "message" (stv 0.001 0.99000001))
)
 (EvaluationLink (stv 0.99000001 0.99000001)
   (PredicateNode "definite" (stv 0.001 0.99000001))
   (ListLink (stv 0.99000001 0.99000001)
      (ConceptNode "[email protected]" (stv 0.001 0.99000001))
   )
)
)

How did you learn that?

In [22]:
relex("How did you learn that?")
(S how did (NP you) (VP learn (NP that)) ?)


    +------------------Xp------------------+       
    +---->WV---->+----I*d----+             |       
    +--Wq--+--Q--+-SIp-+     +---Os---+    +--RW--+
    |      |     |     |     |        |    |      |
LEFT-WALL how did.v-d you learn.v that.j-p ? RIGHT-WALL 


Parse confidence: 0.6703
cost vector = (UNUSED=0.0 DIS=1.0 LEN=10.0)

======

Dependency relations:

    _obj(learn, that)
    how(learn, _$qVar)
    _subj(learn, you)

Attributes:

    pronoun-FLAG(you, T)
    gender(you, person)
    pos(you, noun)
    tense(learn, past_infinitive)
    HYP(learn, T)
    subscript-TAG(learn, .v)
    pos(learn, verb)
    pos(?, punctuation)
    pronoun-FLAG(that, T)
    relative-FLAG(that, T)
    demonstrative-FLAG(that, T)
    subscript-TAG(that, .j-p)
    pos(that, noun)
    noun_number(that, uncountable)
    subscript-TAG(do, .v-d)
    pos(do, verb)
    QUERY-TYPE(_$qVar, how)
    pos(_$qVar, adv)
In [9]:
to_logic("How did you learn that?")
((ImplicationLink (stv 0.99000001 0.99000001)
   (PredicateNode "[email protected]" (stv 0.001 0.99000001))
   (PredicateNode "learn" (stv 0.001 0.99000001))
)
 (InheritanceLink (stv 0.99000001 0.99000001)
   (ConceptNode "[email protected]" (stv 0.001 0.99000001))
   (ConceptNode "you" (stv 0.001 0.99000001))
)
 (InheritanceLink (stv 0.99000001 0.99000001)
   (ConceptNode "[email protected]" (stv 0.001 0.99000001))
   (ConceptNode "that" (stv 0.001 0.99000001))
)
 (EvaluationLink (stv 0.99000001 0.99000001)
   (PredicateNode "[email protected]" (stv 0.001 0.99000001))
   (ListLink (stv 0.99000001 0.99000001)
      (ConceptNode "[email protected]" (stv 0.001 0.99000001))
      (ConceptNode "[email protected]" (stv 0.001 0.99000001))
   )
)
 (EvaluationLink (stv 0.99000001 0.99000001)
   (PredicateNode "InManner" (stv 0.001 0.99000001))
   (ListLink (stv 0.99000001 0.99000001)
      (VariableNode "$qVar")
      (PredicateNode "[email protected]" (stv 0.001 0.99000001))
   )
)
 (InheritanceLink (stv 0.99000001 0.99000001)
   (PredicateNode "[email protected]" (stv 0.001 0.99000001))
   (ConceptNode "past_infinitive" (stv 0.001 0.99000001))
)
)

To point out the way RelEx2Logic and the AtomSpace are tailored for natural-language reasoning, note that in "To whom did you send that message?" the VariableNode $qVar (the indirect object) is an argument of the PredicateNode "send", while in "How did you learn that?" both $qVar and the verb 'learn' are arguments of the PredicateNode "InManner", a semantic relation which is implicit in the grammar of the original sentence and in the use of the word "how". This is a good example of how RelEx2Logic does more than translate grammatical relations into a new format: it infers the logic of the original sentence and makes it explicit for the purposes of automated reasoning. In the attribute lists above, RelEx records the query type as 'who' or 'how'; RelEx2Logic then uses these notations to generate links such as the "InManner" link seen above.
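The query-type dispatch just described can be sketched as a small table (a hypothetical toy; RelEx2Logic's real rules are Scheme-based and far richer):

```python
# Toy sketch of how a RelEx QUERY-TYPE tag selects the logical form of a
# question (hypothetical dispatch, not RelEx2Logic's actual rule engine).
def logical_form(query_type, verb):
    if query_type == "who":
        # the query variable fills an argument slot of the verb's predicate
        return f"EvaluationLink {verb} (... _$qVar ...)"
    if query_type == "how":
        # manner is a separate implicit relation over the verb itself
        return f"EvaluationLink InManner (_$qVar, {verb})"
    raise ValueError(f"unhandled query type: {query_type}")

print(logical_form("how", "learn"))  # EvaluationLink InManner (_$qVar, learn)
```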

When did you bake the cake that you gave to Susan?

In [10]:
relex("When did you bake the cake that you gave to Susan?")
(S when did (NP you) (VP bake (NP (NP the cake) (SBAR (WHNP that) (S (NP you) (VP gave (PP to (NP Susan))))))) ?)


    +-------------------------------------Xp------------------------------------+       
    |                                    +---------Bs---------+                 |       
    +----->WV---->+----I*d----+----Os----+       +-----CV---->+                 |       
    +---Wq--+--Q--+-SIp-+     |    +Ds**c+---R---+--Cr-+--Sp--+--MVp-+--Js-+    +--RW--+
    |       |     |     |     |    |     |       |     |      |      |     |    |      |
LEFT-WALL when did.v-d you bake.v the cake.s that.j-r you gave.v-d to.r Susan.f ? RIGHT-WALL 


Parse confidence: 0.5488
cost vector = (UNUSED=0.0 DIS=1.0 LEN=20.0)

======

Dependency relations:

    _%atTime(bake, _$qVar)
    _obj(bake, cake)
    _subj(bake, you)
    to(give, Susan)
    _obj(give, cake)
    _subj(give, you)
    that(cake, give)

Attributes:

    pronoun-FLAG(you, T)
    gender(you, person)
    pos(you, noun)
    tense(bake, past_infinitive)
    HYP(bake, T)
    subscript-TAG(bake, .v)
    pos(bake, verb)
    subscript-TAG(do, .v-d)
    pos(do, verb)
    QUERY-TYPE(_$qVar, when)
    pos(_$qVar, adv)
    tense(give, past)
    HYP(give, T)
    subscript-TAG(give, .v-d)
    pos(give, verb)
    pos(?, punctuation)
    gender(Susan, feminine)
    definite-FLAG(Susan, T)
    person-FLAG(Susan, T)
    subscript-TAG(Susan, .f)
    pos(Susan, noun)
    noun_number(Susan, singular)
    pronoun-FLAG(you, T)
    gender(you, person)
    pos(you, noun)
    subscript-TAG(to, .r)
    pos(to, prep)
    subscript-TAG(that, .j-r)
    pos(that, conjunction)
    definite-FLAG(cake, T)
    subscript-TAG(cake, .s)
    pos(cake, noun)
    noun_number(cake, singular)
    pos(the, det)
In [11]:
to_logic("When did you bake the cake that you gave to Susan?")
((ImplicationLink (stv 0.99000001 0.99000001)
   (PredicateNode "[email protected]" (stv 0.001 0.99000001))
   (PredicateNode "bake" (stv 0.001 0.99000001))
)
 (InheritanceLink (stv 0.99000001 0.99000001)
   (ConceptNode "[email protected]" (stv 0.001 0.99000001))
   (ConceptNode "you" (stv 0.001 0.99000001))
)
 (InheritanceLink (stv 0.99000001 0.99000001)
   (ConceptNode "[email protected]" (stv 0.001 0.99000001))
   (ConceptNode "cake" (stv 0.001 0.99000001))
)
 (EvaluationLink (stv 0.99000001 0.99000001)
   (PredicateNode "[email protected]" (stv 0.001 0.99000001))
   (ListLink (stv 0.99000001 0.99000001)
      (ConceptNode "[email protected]" (stv 0.001 0.99000001))
      (ConceptNode "[email protected]" (stv 0.001 0.99000001))
   )
)
 (AtTimeLink (stv 0.99000001 0.99000001)
   (VariableNode "$qVar")
   (PredicateNode "[email protected]" (stv 0.001 0.99000001))
)
 (InheritanceLink (stv 0.99000001 0.99000001)
   (PredicateNode "[email protected]" (stv 0.001 0.99000001))
   (ConceptNode "past_infinitive" (stv 0.001 0.99000001))
)
 (ImplicationLink (stv 0.99000001 0.99000001)
   (PredicateNode "[email protected]" (stv 0.001 0.99000001))
   (PredicateNode "give" (stv 0.001 0.99000001))
)
 (InheritanceLink (stv 0.99000001 0.99000001)
   (ConceptNode "[email protected]" (stv 0.001 0.99000001))
   (ConceptNode "you" (stv 0.001 0.99000001))
)
 (InheritanceLink (stv 0.99000001 0.99000001)
   (ConceptNode "[email protected]" (stv 0.001 0.99000001))
   (ConceptNode "Susan" (stv 0.001 0.99000001))
)
 (EvaluationLink (stv 0.99000001 0.99000001)
   (PredicateNode "[email protected]" (stv 0.001 0.99000001))
   (ListLink (stv 0.99000001 0.99000001)
      (ConceptNode "[email protected]" (stv 0.001 0.99000001))
      (ConceptNode "[email protected]" (stv 0.001 0.99000001))
      (ConceptNode "[email protected]" (stv 0.001 0.99000001))
   )
)
 (InheritanceLink (stv 0.99000001 0.99000001)
   (PredicateNode "[email protected]" (stv 0.001 0.99000001))
   (ConceptNode "past" (stv 0.001 0.99000001))
)
 (InheritanceLink (stv 0.99000001 0.99000001)
   (SpecificEntityNode "[email protected]" (stv 0.001 0.99000001))
   (ConceptNode "female" (stv 0.001 0.99000001))
)
 (InheritanceLink (stv 0.99000001 0.99000001)
   (SpecificEntityNode "[email protected]" (stv 0.001 0.99000001))
   (ConceptNode "Susan" (stv 0.001 0.99000001))
)
 (EvaluationLink (stv 0.99000001 0.99000001)
   (PredicateNode "definite" (stv 0.001 0.99000001))
   (ListLink (stv 0.99000001 0.99000001)
      (ConceptNode "[email protected]" (stv 0.001 0.99000001))
   )
)
 (EvaluationLink (stv 0.99000001 0.99000001)
   (PredicateNode "definite" (stv 0.001 0.99000001))
   (ListLink (stv 0.99000001 0.99000001)
      (ConceptNode "[email protected]" (stv 0.001 0.99000001))
   )
)
)

This sentence gives some idea of the complexity the OpenCog NLP pipeline can handle. Several features of this representation are worth noting. The "AtTime" relation has its own link type, reflecting the fundamental status of that relation in cognition. Note also that RelEx gives the verbs 'bake' and 'give' HYP() attributes, indicating that, as far as the system knows, the propositions that you baked a cake and gave it to Susan are hypothetical to the speaker.

Further stages of processing available in the OpenCog pipeline are reasoning (MOSES and PLN), action planning, and expression. For example, PLN can be applied to search the AtomSpace for sub-hypergraphs that contain the information asked for in a question, and a Scheme function called cognitive-binding can then bind the query variable to the node that answers the question. The expressive end of the pipeline consists of two subsystems, microplanning and surface realization, both currently under development. The microplanning subsystem selects the set of atoms to express in a sentence, and surface realization generates the output sentence.
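The question-answering step can be caricatured as pattern matching with variable binding. The toy unifier below works over flat tuples; OpenCog's actual pattern matcher operates on the AtomSpace hypergraph, and the facts here are invented for illustration.

```python
# Toy sketch of binding a query variable against stored relations
# (illustrative only; not OpenCog's pattern matcher).
knowledge = [
    ("AtTime", "yesterday", "bake"),   # hypothetical stored answer
    ("_obj", "bake", "cake"),
]

def bind(query, facts):
    """Return bindings for variables (tokens starting with '$') in `query`
    against the first matching fact, or None if nothing matches."""
    for fact in facts:
        bindings = {}
        for q, f in zip(query, fact):
            if q.startswith("$"):
                bindings[q] = f       # bind the query variable
            elif q != f:
                break                 # constant mismatch: try the next fact
        else:
            return bindings
    return None

# "When did you bake ...?" -> find the time slot of the bake event
print(bind(("AtTime", "$qVar", "bake"), knowledge))  # {'$qVar': 'yesterday'}
```

In OpenCog the analogous match is found against the hypergraph built by RelEx2Logic, and the binding of $qVar is what the expression subsystems then render back into an English answer.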

Future work

The OpenCog NLP pipeline raises a number of philosophical issues. It is certainly not intended to be a close imitation of the way human beings process natural language; rather, it is tailored for reasoning by the OpenCog system. As a guiding principle, it is designed to comprehend, reason about, and produce natural-language expressions in a way that is more natural, and more amenable to general intelligence, than AI NLP systems which use heuristics and statistics to simulate language comprehension.

On the other hand, although OpenCog NLP strives for a relatively faithful representation of the grammar and logic of natural language, it shares with other narrow-AI systems the limitation that its knowledge is not yet sufficient to comprehend, and participate intelligently in, an indefinite range of linguistic interactions.

For that reason, we are also interested in the possibility of automated, unsupervised extraction of dependency grammars and associated syntax-to-semantic-relationship mappings from large text corpora. This proposed research is described in Learning Language from a Large (Unannotated) Corpus.