#!/usr/bin/env python # coding: utf-8 # # Constituent Parsing Exercises # # # # In the lecture we took a look at a simple tokenizer and sentence segmenter. In this exercise we will expand our understanding of the problem by asking a few important questions, and looking at the problem from a different perspectives. # ## Setup 1: Load Libraries # In[1]: get_ipython().run_cell_magic('capture', '', '%load_ext autoreload\n%autoreload 2\n%matplotlib inline\n# %cd .. \nimport sys\nsys.path.append("..")\nimport math \nimport statnlpbook.util as util\nimport statnlpbook.parsing as parsing\n') # ## Task 1: Understanding parsing # # Be sure you understand [grammatical categories and structures](http://webdelprofesor.ula.ve/humanidades/azapata/materias/english_4/grammatical_categories_structures_and_syntactical_functions.pdf) and brush up on your [grammar skils](http://www.ucl.ac.uk/internet-grammar/intro/intro.htm). # # Then re-visit the [Enju online parser](http://www.nactem.ac.uk/enju/demo.html), and parse the following sentences... # # What is wrong with the parses of the following sentences? Are they correct? # - Fat people eat accumulates. # - The fat that people eat accumulates in their bodies. # - The fat that people eat is accumulating in their bodies. # # What about these, is the problem in the parser or in the sentence? # - The old man the boat. # - The old people man the boat. # # These were examples of garden path sentences, find out what that means. # # What about these sentences? Are their parses correct? # - Time flies like an arrow; fruit flies like a banana. # - We saw her duck. # ## Task 2: Parent Annotation # # # Reminisce the lecture notes in parsing, and the mentioned parent annotation. (grand)*parents, matter - knowing who the parent is in a tree gives a bit of context information which can later help us with smoothing probabilities, and approaching context-dependent parsing. # # in that case, each non-terminal node should know it's parent. We'll do this exercise on a single tree, just to play around a bit with trees and their labeling. # # # Given the following tree: # In[3]: x = ('S', [('Subj', ['He']), ('VP', [('Verb', ['shot']), ('Obj', ['the', 'elephant']), ('PP', ['in', 'his', 'pyjamas'])])]) parsing.render_tree(x) # Construct an `annotate_parents` function which will take that tree, and annotate its parents. The final annotation result should look like this: # In[4]: y = ('S^?', [('Subj^S', ['He']), ('VP^S', [('Verb^VP', ['shot']), ('Obj^VP', ['the', 'elephant']), ('PP^VP', ['in', 'his', 'pyjamas'])])]) parsing.render_tree(y) # ## Solutions # # You can find the solutions to this exercises [here](parsing_solutions.ipynb)