In [64]:

%%html
<script>
  function code_toggle() {
    if (code_shown){
      $('div.input').hide('500');
      $('#toggleButton').val('Show Code')
    } else {
      $('div.input').show('500');
      $('#toggleButton').val('Hide Code')
    }
    code_shown = !code_shown
  }

  $( document ).ready(function(){
    code_shown=false;
    $('div.input').hide()
  });
</script>
<form action="javascript:code_toggle()"><input type="submit" id="toggleButton" value="Show Code"></form>
<style>
.rendered_html td {
    font-size: xx-large;
    text-align: left !important;
}
.rendered_html th {
    font-size: xx-large;
    text-align: left !important;
}
</style>

In [65]:

%%capture
%load_ext autoreload
%autoreload 2
import sys
sys.path.append("..")
from statnlpbook.util import execute_notebook
import statnlpbook.parsing as parsing
from statnlpbook.transition import *
from statnlpbook.dep import *
import pandas as pd
from io import StringIO
from IPython.display import display, HTML

execute_notebook('transition-based_dependency_parsing.ipynb')

$$ \newcommand{\Xs}{\mathcal{X}} \newcommand{\Ys}{\mathcal{Y}} \newcommand{\y}{\mathbf{y}} \newcommand{\balpha}{\boldsymbol{\alpha}} \newcommand{\bbeta}{\boldsymbol{\beta}} \newcommand{\aligns}{\mathbf{a}} \newcommand{\align}{a} \newcommand{\source}{\mathbf{s}} \newcommand{\target}{\mathbf{t}} \newcommand{\ssource}{s} \newcommand{\starget}{t} \newcommand{\repr}{\mathbf{f}} \newcommand{\repry}{\mathbf{g}} \newcommand{\x}{\mathbf{x}} \newcommand{\prob}{p} \newcommand{\a}{\alpha} \newcommand{\b}{\beta} \newcommand{\vocab}{V} \newcommand{\params}{\boldsymbol{\theta}} \newcommand{\param}{\theta} \DeclareMathOperator{\perplexity}{PP} \DeclareMathOperator{\argmax}{argmax} \DeclareMathOperator{\argmin}{argmin} \newcommand{\train}{\mathcal{D}} \newcommand{\counts}[2]{\#_{#1}(#2) } \newcommand{\length}[1]{\text{length}(#1) } \newcommand{\indi}{\mathbb{I}} $$

In [66]:

%load_ext tikzmagic

Parsing¶

Schedule¶

Parsing motivation
Background: parsing (10 min.)
Exercise: multi-word expressions (10 min.)
Background: Universal Dependencies (5 min.)
Background: transition-based parsing (10 min.)
Break (10 min.)
Example: transition-based parsing (5 min.)
Motivation: natural language understanding (5 min.)
Background: learning to parse (10 min.)
Math: dependency parsing evaluation (5 min.)
Examples: dependency parsers (5 min.)
Background: semantic parsing (15 min.)

Motivation: information extraction¶

Dechra Pharmaceuticals, which has just made its second acquisition, had previously purchased Genitrix.

Trinity Mirror plc, the largest British newspaper, purchased Local World, its rival.

Kraft, owner of Milka, purchased Cadbury Dairy Milk and is now gearing up for a roll-out of its new brand.

Check out UDPipe and Stanza.

Motivation: question answering by reading comprehension¶

(from [Rajpurkar et al., 2016](https://aclanthology.org/D16-1264))

Motivation: question answering from knowledge bases¶

(from [Reddy et al., 2017](https://aclanthology.org/D17-1009/))

Parsing is the process of constructing these graphs:

very important for downstream applications
researched in academia and industry

How is this done?

Syntactic Dependencies¶

Lexical Elements: words
Syntactic Relations: subject, direct object, nominal modifier, etc.

Task: determine the syntactic relations between words

Grammatical Relations¶

Kraft, owner of Milka, purchased Cadbury Dairy Milk and is now gearing up for a roll-out of its new brand.

Subject of purchased: Kraft
Object of purchased: Cadbury

In [53]:

conllu = """
# ID	FORM	LEMMA	UPOS	XPOS	FEATS	HEAD	DEPREL	DEPS	MISC
1	Kraft	Kraft	NOUN	NN	_	7	nsubj	_	_
2	,	,	PUNCT	,	_	1	punct	_	_
3	owner	owner	NOUN	NN	_	1	appos	_	_
4	of	of	ADP	IN	_	5	case	_	_
5	Milka	Milka	PROPN	NNP	_	3	nmod	_	_
6	,	,	PUNCT	,	_	7	punct	_	_
7	purchased	purchase	VERB	VBD	_	0	root	_	_
8	Cadbury	Cadbury	PROPN	NNP	_	7	dobj	_	_
9	Dairy	Dairy	PROPN	NNP	_	8	flat	_	_
10	Milk	milk	PROPN	NNP	_	8	flat	_	_
"""
arcs, tokens = to_displacy_graph(*load_arcs_tokens(conllu))
render_displacy(arcs, tokens,"1200px")

Out[53]:

Anatomy of a Dependency Tree¶

Nodes (vertices):
- Words of the sentence (+ punctuation tokens)
- a ROOT node
Arcs (edges):
- Directed from syntactic head to dependent
- Each non-ROOT token has exactly one head
  - the word that controls its syntactic function, or
  - the word "it depends on"
ROOT has no head
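
The properties above can be checked mechanically. The following is a minimal sketch, assuming the tree is given as a list of head indices (one per token, 1-based, with 0 standing for ROOT); the function name `is_valid_tree` is made up for this illustration and is not part of `statnlpbook`.

```python
def is_valid_tree(heads):
    """Check that a head list describes a tree rooted at ROOT.

    heads[k] is the head of token k+1 (tokens are numbered from 1);
    the value 0 stands for the ROOT node. Storing one head per token
    already guarantees "exactly one head" for every non-ROOT token.
    """
    n = len(heads)
    # Every head must be ROOT (0) or another token of the sentence.
    for token, head in enumerate(heads, start=1):
        if head < 0 or head > n or head == token:
            return False
    # Following head pointers from any token must reach ROOT without a cycle.
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


# Heads of "Kraft , owner of Milka , purchased Cadbury Dairy Milk" (HEAD column above).
kraft_heads = [7, 1, 1, 5, 3, 7, 0, 7, 8, 8]
print(is_valid_tree(kraft_heads))
print(is_valid_tree([2, 1]))  # two tokens pointing at each other: a cycle
```

The sketch deliberately does not require that ROOT has a single dependent, since the slide does not state that constraint.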

In [54]:

conllu = """
# ID	FORM	LEMMA	UPOS	XPOS	FEATS	HEAD	DEPREL	DEPS	MISC
1	Kraft	Kraft	NOUN	NN	_	7	nsubj	_	_
2	,	,	PUNCT	,	_	1	punct	_	_
3	owner	owner	NOUN	NN	_	1	appos	_	_
4	of	of	ADP	IN	_	5	case	_	_
5	Milka	Milka	PROPN	NNP	_	3	nmod	_	_
6	,	,	PUNCT	,	_	7	punct	_	_
7	purchased	purchase	VERB	VBD	_	0	root	_	_
8	Cadbury	Cadbury	PROPN	NNP	_	7	dobj	_	_
9	Dairy	Dairy	PROPN	NNP	_	8	flat	_	_
10	Milk	milk	PROPN	NNP	_	8	flat	_	_
"""
arcs, tokens = to_displacy_graph(*load_arcs_tokens(conllu))
render_displacy(arcs, tokens,"1200px")

Out[54]:

Example¶

(in CoNLL-U Format)

In [55]:

conllu = """
# ID	FORM	LEMMA	UPOS	XPOS	FEATS	HEAD	DEPREL	DEPS	MISC
1	Alice	_	_	_	_	2	nsubj	_	_
2	saw	_	_	_	_	0	root	_	_
3	Bob	_	_	_	_	2	dobj	_	_
"""
display(HTML(pd.read_csv(StringIO(conllu), sep="\t").to_html(index=False)))
arcs, tokens = to_displacy_graph(*load_arcs_tokens(conllu))
render_displacy(arcs, tokens,"900px")

# ID	FORM	LEMMA	UPOS	XPOS	FEATS	HEAD	DEPREL	DEPS	MISC
1	Alice	_	_	_	_	2	nsubj	_	_
2	saw	_	_	_	_	0	root	_	_
3	Bob	_	_	_	_	2	dobj	_	_

Out[55]:
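
To make the format concrete, here is a small sketch of how the ten tab-separated CoNLL-U columns could be turned into arcs. It is not the `load_arcs_tokens` helper used in the cells above; `read_conllu` is a hypothetical name, and the sketch assumes plain integer IDs (no multi-word token ranges or empty nodes).

```python
FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS", "FEATS",
          "HEAD", "DEPREL", "DEPS", "MISC"]


def read_conllu(text):
    """Return (arcs, forms) for one sentence in CoNLL-U format.

    arcs is a list of (head, dependent, label) triples using 1-based
    token IDs, with 0 standing for ROOT. Comment lines start with '#'.
    """
    arcs, forms = [], []
    for line in text.strip().splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        forms.append(row["FORM"])
        arcs.append((int(row["HEAD"]), int(row["ID"]), row["DEPREL"]))
    return arcs, forms


# The "Alice saw Bob" example, rebuilt with explicit tab separators.
rows = [
    ["1", "Alice", "_", "_", "_", "_", "2", "nsubj", "_", "_"],
    ["2", "saw", "_", "_", "_", "_", "0", "root", "_", "_"],
    ["3", "Bob", "_", "_", "_", "_", "2", "dobj", "_", "_"],
]
example = "# " + "\t".join(FIELDS) + "\n" + "\n".join("\t".join(r) for r in rows)

arcs, forms = read_conllu(example)
print(forms)
print(arcs)
```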

https://ucph.padlet.org/dh/mw ¶

Need for Universal Syntax¶

https://cl.lingfil.uu.se/~nivre/docs/NivreCLIN2020.pdf ¶

Universal Syntax¶

English and Danish are syntactically similar, while other languages are more distant.

Left: clustering based on syntactic dependencies; right: genetic tree (from Bjerva et al., 2019)

Danish Example¶

In [56]:

conllu = """
# ID	FORM	LEMMA	UPOS	XPOS	FEATS	HEAD	DEPREL	DEPS	MISC
1	Alice	Alice	NOUN	_	_	2	nsubj	_	_
2	så	se	VERB	_	_	0	root	_	_
3	Bob	Bob	PROPN	_	_	2	obj	_	_
"""
display(HTML(pd.read_csv(StringIO(conllu), sep="\t").to_html(index=False)))
arcs, tokens = to_displacy_graph(*load_arcs_tokens(conllu))
render_displacy(arcs, tokens,"900px")

# ID	FORM	LEMMA	UPOS	XPOS	FEATS	HEAD	DEPREL	DEPS	MISC
1	Alice	Alice	NOUN	_	_	2	nsubj	_	_
2	så	se	VERB	_	_	0	root	_	_
3	Bob	Bob	PROPN	_	_	2	obj	_	_

Out[56]:

Korean Example¶

In [57]:

conllu = """
# ID	FORM	LEMMA	UPOS	XPOS	FEATS	HEAD	DEPREL	DEPS	MISC
1	앨리스는	앨리스+는	NOUN	_	_	3	nsubj	_	_
2	밥을	밥+을	NOUN	_	_	3	obj	_	_
3	보았다	보+았+다	VERB	_	_	0	root	_	_
"""
display(HTML(pd.read_csv(StringIO(conllu), sep="\t").to_html(index=False)))
arcs, tokens = to_displacy_graph(*load_arcs_tokens(conllu))
render_displacy(arcs, tokens,"900px")

# ID	FORM	LEMMA	UPOS	XPOS	FEATS	HEAD	DEPREL	DEPS	MISC
1	앨리스는	앨리스+는	NOUN	_	_	3	nsubj	_	_
2	밥을	밥+을	NOUN	_	_	3	obj	_	_
3	보았다	보+았+다	VERB	_	_	0	root	_	_

Out[57]:

Longer English Example¶

In [58]:

conllu = """
# ID	FORM	LEMMA	UPOS	XPOS	FEATS	HEAD	DEPREL	DEPS	MISC
1	Kraft	Kraft	NOUN	NN	_	7	nsubj	_	_
2	,	,	PUNCT	,	_	1	punct	_	_
3	owner	owner	NOUN	NN	_	1	appos	_	_
4	of	of	ADP	IN	_	5	case	_	_
5	Milka	Milka	PROPN	NNP	_	3	nmod	_	_
6	,	,	PUNCT	,	_	7	punct	_	_
7	purchased	purchase	VERB	VBD	_	0	root	_	_
8	Cadbury	Cadbury	PROPN	NNP	_	7	dobj	_	_
9	Dairy	Dairy	PROPN	NNP	_	8	flat	_	_
10	Milk	milk	PROPN	NNP	_	8	flat	_	_
"""
display(HTML(pd.read_csv(StringIO(conllu), sep="\t").to_html(index=False)))
arcs, tokens = to_displacy_graph(*load_arcs_tokens(conllu))
render_displacy(arcs, tokens,"1200px")

# ID	FORM	LEMMA	UPOS	XPOS	FEATS	HEAD	DEPREL	DEPS	MISC
1	Kraft	Kraft	NOUN	NN	_	7	nsubj	_	_
2	,	,	PUNCT	,	_	1	punct	_	_
3	owner	owner	NOUN	NN	_	1	appos	_	_
4	of	of	ADP	IN	_	5	case	_	_
5	Milka	Milka	PROPN	NNP	_	3	nmod	_	_
6	,	,	PUNCT	,	_	7	punct	_	_
7	purchased	purchase	VERB	VBD	_	0	root	_	_
8	Cadbury	Cadbury	PROPN	NNP	_	7	dobj	_	_
9	Dairy	Dairy	PROPN	NNP	_	8	flat	_	_
10	Milk	milk	PROPN	NNP	_	8	flat	_	_

Out[58]:

Universal Dependencies¶

Annotation framework featuring 37 syntactic relations
Treebanks in over 90 languages
Large project with over 200 contributors
Linguistically universal annotation guidelines

UD Dependency Relations¶

	Nominals	Clauses	Modifier words	Function Words
Core arguments	nsubj obj iobj	csubj ccomp xcomp
Non-core dependents	obl vocative expl dislocated	advcl	advmod discourse	aux cop mark
Nominal dependents	nmod appos nummod	acl	amod	det clf case

Coordination	MWE	Loose	Special	Other
conj cc	fixed flat compound	list parataxis	orphan goeswith reparandum	punct root dep

Universal POS Tags (UPOS)¶

As opposed to language-specific POS tags (XPOS).

Open class words	Closed class words	Other
ADJ	ADP	PUNCT
ADV	AUX	SYM
INTJ	CCONJ	X
NOUN	DET
PROPN	NUM
VERB	PART
	PRON
	SCONJ

Dependency Parsing¶

Predict head and relation for each word.
Structured prediction, just like POS tagging.
Or is it?

In [59]:

conllu = """
# ID	FORM	LEMMA	UPOS	XPOS	FEATS	HEAD	DEPREL	DEPS	MISC
1	Alice	_	_	_	_	2	nsubj	_	_
2	saw	_	_	_	_	0	root	_	_
3	Bob	_	_	_	_	2	dobj	_	_
"""
display(HTML(pd.read_csv(StringIO(conllu), sep="\t").to_html(index=False)))

# ID	FORM	LEMMA	UPOS	XPOS	FEATS	HEAD	DEPREL	DEPS	MISC
1	Alice	_	_	_	_	2	nsubj	_	_
2	saw	_	_	_	_	0	root	_	_
3	Bob	_	_	_	_	2	dobj	_	_

Dependency Parsing Approaches¶

Graph-based: score all possible parts (e.g. word pairs), find best combination (e.g. maximum spanning tree)
Transition-based: incrementally build the tree, one arc at a time, by applying a sequence of actions

Transition-Based Parsing¶

Learn to perform the right action (transition) at each step of a bottom-up, left-to-right parser
Train classifiers $p(y|\x)$ where $y$ is an action, and $\x$ is the parser state
Many possible transition systems; shown here: arc-standard (Nivre, 2004)

Configuration (Parser State)¶

Consists of a buffer, stack and set of arcs created so far.

Buffer¶

of tokens waiting for processing

In [26]:

render_transitions_displacy(transitions[0:1], tokenized_sentence)

Out[26]:

stack	buffer	parse	action
ROOT	Alice saw Bob

Stack¶

of tokens currently being processed

In [27]:

render_transitions_displacy(transitions[2:3],tokenized_sentence)

Out[27]:

stack	buffer	parse	action
ROOT Alice saw	Bob		shift

Parse (set of arcs)¶

tree built so far

In [60]:

render_transitions_displacy(transitions[6:7], tokenized_sentence)

Out[60]:

stack	buffer	parse	action
ROOT			rightArc-root

We use the following actions:

Actions¶

Shift¶

Push the word at the front of the buffer onto the stack.

$$ (S, i|B, A)\rightarrow(S|i, B, A) $$

In [29]:

render_transitions_displacy(transitions[0:2], tokenized_sentence)

Out[29]:

stack	buffer	parse	action
ROOT	Alice saw Bob
ROOT Alice	saw Bob		shift

rightArc-[label]¶

Add a labeled arc from the second-topmost node of the stack, $i$, to the top of the stack, $j$. Pop the top of the stack.

$$ (S|i|j, B, A) \rightarrow (S|i, B, A\cup\{(i,j,l)\}) $$

In [61]:

render_transitions_displacy(transitions[4:7], tokenized_sentence)

Out[61]:

stack	buffer	parse	action
ROOT saw Bob			shift
ROOT saw			rightArc-dobj
ROOT			rightArc-root

leftArc-[label]¶

Add a labeled arc from the top of the stack, $j$, to the second-topmost node of the stack, $i$. Remove the second-topmost node from the stack.

$$ (S|i|j, B, A) \rightarrow (S|j, B, A\cup\{(j,i,l)\}) $$

In [62]:

render_transitions_displacy(transitions[2:4], tokenized_sentence)

Out[62]:

stack	buffer	parse	action
ROOT Alice saw	Bob		shift
ROOT saw	Bob		leftArc-nsubj

Full Example¶

In [63]:

render_transitions_displacy(transitions[:], tokenized_sentence)

Out[63]:

stack	buffer	parse	action
ROOT	Alice saw Bob
ROOT Alice	saw Bob		shift
ROOT Alice saw	Bob		shift
ROOT saw	Bob		leftArc-nsubj
ROOT saw Bob			shift
ROOT saw			rightArc-dobj
ROOT			rightArc-root
ROOT
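
The table above can be reproduced by replaying the action sequence. This is a minimal sketch of the three arc-standard actions as defined on the preceding slides; `apply_action` is an illustrative name, not the function used by `render_transitions_displacy`, and words are used directly as node names (which assumes no word occurs twice in the sentence).

```python
def apply_action(stack, buffer, arcs, action):
    """Apply one arc-standard action; return the new (stack, buffer, arcs)."""
    if action == "shift":
        # (S, i|B, A) -> (S|i, B, A)
        return stack + [buffer[0]], buffer[1:], arcs
    name, label = action.split("-", 1)
    i, j = stack[-2], stack[-1]
    if name == "rightArc":
        # (S|i|j, B, A) -> (S|i, B, A + {(i, j, l)}): i heads j, j is popped.
        return stack[:-1], buffer, arcs + [(i, j, label)]
    if name == "leftArc":
        # (S|i|j, B, A) -> (S|j, B, A + {(j, i, l)}): j heads i, i is removed.
        return stack[:-2] + [j], buffer, arcs + [(j, i, label)]
    raise ValueError(f"unknown action: {action}")


words = ["Alice", "saw", "Bob"]
actions = ["shift", "shift", "leftArc-nsubj",
           "shift", "rightArc-dobj", "rightArc-root"]

stack, buffer, arcs = ["ROOT"], words, []
for action in actions:
    stack, buffer, arcs = apply_action(stack, buffer, arcs, action)
    print(" ".join(stack), "|", " ".join(buffer), "|", action)

print(arcs)
```

After the last action the stack holds only ROOT and the buffer is empty, and `arcs` contains the three (head, dependent, label) triples of the tree.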

Summary: Configuration¶

Configuration:

Stack $S$: a last-in, first-out memory to keep track of words to process later
Buffer $B$: words not processed so far
Arcs $A$: the dependency edges predicted so far

We further define two special configurations:

initial: buffer is initialised to the words in the sentence, stack contains root, and arcs are empty
terminal: buffer is empty, stack contains only root
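
One possible way to represent a configuration in code, as a sketch: the class name `Configuration` and its methods are assumptions made for illustration and do not come from `statnlpbook`.

```python
from dataclasses import dataclass, field


@dataclass
class Configuration:
    stack: list                              # S: words currently being processed
    buffer: list                             # B: words not processed so far
    arcs: set = field(default_factory=set)   # A: (head, dependent, label) triples

    @classmethod
    def initial(cls, words):
        """Initial configuration: ROOT on the stack, all words in the buffer, no arcs."""
        return cls(stack=["ROOT"], buffer=list(words))

    def is_terminal(self):
        """Terminal configuration: empty buffer and only ROOT left on the stack."""
        return not self.buffer and self.stack == ["ROOT"]


start = Configuration.initial(["Alice", "saw", "Bob"])
print(start)
print(start.is_terminal())
```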

Summary: Actions¶

shift: Push the word at the front of the buffer onto the stack: $(S, i|B, A)\rightarrow(S|i, B, A)$
rightArc-label: Add a labeled arc from the second-topmost node of the stack, $i$, to the top of the stack, $j$, then pop the top of the stack: $(S|i|j, B, A) \rightarrow (S|i, B, A\cup\{(i,j,l)\})$
leftArc-label: Add a labeled arc from the top of the stack, $j$, to the second-topmost node of the stack, $i$, then remove the second-topmost node from the stack: $(S|i|j, B, A) \rightarrow (S|j, B, A\cup\{(j,i,l)\})$
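
Each action is only applicable in some configurations. The sketch below reads the preconditions off the definitions: shift needs a non-empty buffer, and both arc actions need two nodes on the stack. The extra rule that leftArc may not make ROOT a dependent follows from "ROOT has no head"; the function name `valid_actions` is an assumption for illustration, and labels are left out.

```python
def valid_actions(stack, buffer):
    """Return the action types whose preconditions hold in this configuration."""
    actions = []
    if buffer:
        # shift moves the front of the buffer onto the stack.
        actions.append("shift")
    if len(stack) >= 2:
        # rightArc: second-topmost node becomes head of the top node.
        actions.append("rightArc")
        # leftArc would give the second-topmost node a head, so it must not be ROOT.
        if stack[-2] != "ROOT":
            actions.append("leftArc")
    return actions


print(valid_actions(["ROOT"], ["Alice", "saw", "Bob"]))
print(valid_actions(["ROOT", "Alice", "saw"], ["Bob"]))
print(valid_actions(["ROOT", "saw"], []))
```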

Syntactic Ambiguity¶

In [51]:

conllu = """
# ID	FORM	LEMMA	UPOS	XPOS	FEATS	HEAD	DEPREL	DEPS	MISC
1	I	_	_	_	_	2	nsubj	_	_
2	saw	_	_	_	_	0	root	_	_
3	the	_	_	_	_	4	det	_	_
4	star	_	_	_	_	2	dobj	_	_
5	with	_	_	_	_	7	case	_	_
6	the	_	_	_	_	7	det	_	_
7	telescope	_	_	_	_	2	obl	_	_
"""
display(HTML(pd.read_csv(StringIO(conllu), sep="\t").to_html(index=False)))
arcs, tokens = to_displacy_graph(*load_arcs_tokens(conllu))
render_displacy(arcs, tokens,"900px")

# ID	FORM	LEMMA	UPOS	XPOS	FEATS	HEAD	DEPREL	DEPS	MISC
1	I	_	_	_	_	2	nsubj	_	_
2	saw	_	_	_	_	0	root	_	_
3	the	_	_	_	_	4	det	_	_
4	star	_	_	_	_	2	dobj	_	_
5	with	_	_	_	_	7	case	_	_
6	the	_	_	_	_	7	det	_	_
7	telescope	_	_	_	_	2	obl	_	_

Out[51]:

Syntactic Ambiguity¶

In [52]:

conllu = """
# ID	FORM	LEMMA	UPOS	XPOS	FEATS	HEAD	DEPREL	DEPS	MISC
1	I	_	_	_	_	2	nsubj	_	_
2	saw	_	_	_	_	0	root	_	_
3	the	_	_	_	_	4	det	_	_
4	star	_	_	_	_	2	dobj	_	_
5	with	_	_	_	_	7	case	_	_
6	the	_	_	_	_	7	det	_	_
7	telescope	_	_	_	_	4	nmod	_	_
"""
display(HTML(pd.read_csv(StringIO(conllu), sep="\t").to_html(index=False)))
arcs, tokens = to_displacy_graph(*load_arcs_tokens(conllu))
render_displacy(arcs, tokens,"900px")

# ID	FORM	LEMMA	UPOS	XPOS	FEATS	HEAD	DEPREL	DEPS	MISC
1	I	_	_	_	_	2	nsubj	_	_
2	saw	_	_	_	_	0	root	_	_
3	the	_	_	_	_	4	det	_	_
4	star	_	_	_	_	2	dobj	_	_
5	with	_	_	_	_	7	case	_	_
6	the	_	_	_	_	7	det	_	_
7	telescope	_	_	_	_	4	nmod	_	_

Out[52]:
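
The two analyses differ in a single arc: "telescope" attaches either to the verb "saw" (as `obl`) or to the noun "star" (as `nmod`). A short sketch that compares the HEAD and DEPREL columns of the two tables makes this explicit; the variable names are illustrative only.

```python
words = "I saw the star with the telescope".split()

# (HEAD, DEPREL) per token, copied from the two tables above.
verb_attachment = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "dobj"),
                   (7, "case"), (7, "det"), (2, "obl")]
noun_attachment = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "dobj"),
                   (7, "case"), (7, "det"), (4, "nmod")]

# Collect every token whose head or label differs between the two parses.
differences = [
    (words[k], first, second)
    for k, (first, second) in enumerate(zip(verb_attachment, noun_attachment))
    if first != second
]
print(differences)  # [('telescope', (2, 'obl'), (4, 'nmod'))]
```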

Learning a Transition-Based Parser¶

Decompose parse tree into a sequence of actions
Learn to score individual actions
- Structured prediction problem!
- Sequence labeling? Sequence-to-sequence?

How to decide what action to take?

Learn a discriminative classifier $p(y | \x)$ where
- $\x$ is a representation of buffer, stack and parse
- $y$ is the action to choose
Current state-of-the-art systems use neural networks as classifiers (Bi-LSTMs, Transformers, BERT)
Use greedy search or beam search to find the highest scoring sequence of steps

(from Hershcovich et al., 2018)
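
Greedy search can be sketched as a loop that, in every configuration, applies the highest-scoring applicable action. In the sketch below, `toy_score` is a hand-written stand-in for a trained classifier $p(y|\x)$ and is purely hypothetical; labels are omitted to keep the example short.

```python
def greedy_parse(words, score):
    """Greedy arc-standard parsing: always take the best-scoring valid action.

    score(stack, buffer, action) returns a number; higher is better.
    Returns unlabeled (head, dependent) arcs.
    """
    stack, buffer, arcs = ["ROOT"], list(words), []
    while buffer or len(stack) > 1:
        candidates = {}
        if buffer:
            candidates["shift"] = score(stack, buffer, "shift")
        if len(stack) >= 2:
            candidates["rightArc"] = score(stack, buffer, "rightArc")
            if stack[-2] != "ROOT":  # ROOT must not get a head
                candidates["leftArc"] = score(stack, buffer, "leftArc")
        action = max(candidates, key=candidates.get)
        if action == "shift":
            stack.append(buffer.pop(0))
        elif action == "rightArc":
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
        else:  # leftArc
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
    return arcs


def toy_score(stack, buffer, action):
    """Hypothetical scores standing in for a learned classifier."""
    if action == "leftArc":
        return 1.0 if stack[-1] == "saw" else 0.0
    if action == "rightArc":
        return 0.9 if not buffer else 0.0
    return 0.5  # shift


parse = greedy_parse(["Alice", "saw", "Bob"], toy_score)
print(parse)
```

Beam search would instead keep several partial action sequences at every step and extend each of them, at the cost of more computation.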

Oracle¶

How do we get training data for the classifier?

Training data: whole trees labelled as correct
We need to design an oracle
- function that, given a sentence and its dependency tree, recovers the sequence of actions used to construct it
- can also be thought of as reverse-engineering a tree into a sequence of actions
An oracle does this for every possible parse tree
Oracle can also be thought of as a human demonstrator teaching the parser
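
A minimal sketch of such an oracle for the arc-standard system, under the assumption that the gold tree is projective (the sketch raises an error otherwise). It attaches a dependent as soon as its head is adjacent on the stack, but delays rightArc until the dependent has collected all of its own dependents. The function name `oracle` and its input format are choices made for this illustration.

```python
def oracle(heads, labels):
    """Recover an arc-standard action sequence from a gold dependency tree.

    heads[k] is the 1-based head of token k+1 (0 = ROOT); labels[k] is its relation.
    """
    n = len(heads)
    head = {k + 1: heads[k] for k in range(n)}
    label = {k + 1: labels[k] for k in range(n)}
    # Number of dependents each node still has to collect.
    pending = {node: 0 for node in range(n + 1)}
    for h in heads:
        pending[h] += 1

    stack, buffer, actions = [0], list(range(1, n + 1)), []
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            i, j = stack[-2], stack[-1]
            if i != 0 and head[i] == j:
                # Top of stack heads the second-topmost node.
                actions.append(f"leftArc-{label[i]}")
                stack.pop(-2)
                pending[j] -= 1
                continue
            if head[j] == i and pending[j] == 0:
                # Second-topmost heads the top, and the top is complete.
                actions.append(f"rightArc-{label[j]}")
                stack.pop()
                pending[i] -= 1
                continue
        if not buffer:
            raise ValueError("tree cannot be derived (non-projective?)")
        actions.append("shift")
        stack.append(buffer.pop(0))
    return actions


# Gold tree of "Alice saw Bob".
gold_actions = oracle([2, 0, 2], ["nsubj", "root", "dobj"])
print(gold_actions)
```

For "Alice saw Bob" this yields the same six actions as the full example shown earlier; each (configuration, action) pair along the way can serve as a training instance for the classifier.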

Dependency Parsing Evaluation¶

Unlabeled Attachment Score (UAS): % of words with correct head
Labeled Attachment Score (LAS): % of words with correct head and label

Always 0 $\leq$ LAS $\leq$ UAS $\leq$ 100%.
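
Both scores are simple token-level accuracies. A short sketch, assuming gold and predicted parses are given as lists of (head, label) pairs per token; the function name `attachment_scores` is illustrative.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as fractions between 0 and 1.

    gold and predicted are equally long lists of (head, label) pairs,
    one pair per token.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    correct_both = sum(g == p for g, p in zip(gold, predicted))
    return correct_heads / len(gold), correct_both / len(gold)


# "Alice saw Bob": all heads are right, but the label of "Bob" is wrong.
gold = [(2, "nsubj"), (0, "root"), (2, "dobj")]
predicted = [(2, "nsubj"), (0, "root"), (2, "nsubj")]
uas, las = attachment_scores(gold, predicted)
print(f"UAS = {uas:.0%}, LAS = {las:.0%}")
```

Because a token only counts towards LAS if its head is also correct, LAS can never exceed UAS.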

Example: LAS and UAS¶

$\mathrm{UAS}=\frac{8}{12}\approx 67\%$ $\mathrm{LAS}=\frac{7}{12}\approx 58\%$

Beyond Dependency Parsing: Meaning Representations¶

https://danielhers.github.io/dikubits_20200218.pdf ¶

Summary¶

Dependency parsing predicts word-to-word dependencies
Simple annotations in many languages, thanks to UD
Fast parsing, e.g. transition-based
Sufficient for most downstream applications
More sophisticated meaning representations are more informative but harder to parse

Background Material¶

Arc-standard transition-based parsing system (Nivre, 2004)
EACL 2014 tutorial
Jurafsky & Martin, Speech and Language Processing (Third Edition): Chapter 15, Dependency Parsing.
