Week 5: Context-free languages

In [1]:
from tock import *

Monday reading

Read Section 2.1, but skip "Chomsky Normal Form."

Tuesday class

We're beginning the second unit of the course, in which we look at context-free grammars and pushdown automata. Recall that in the first class we briefly introduced Turing machines, and then introduced finite automata as being like Turing machines but restricted to have only a one-way, read-only tape. Pushdown automata are a little bit less restricted: they have a one-way, read-only tape, but they also have a stack. But the book starts with context-free grammars, probably because they're a little more familiar and/or easier to understand.

Context-free grammars

Context-free grammars were invented in the late 1950s by Noam Chomsky as a way of describing the syntax of human languages (he called them phrase-structure grammars). Later, they were appropriate by the inventors of the programming language ALGOL-60 as a way to describe that language's syntax (which came to be called Backus-Naur form).

There is some variation in terminology that the book doesn't mention. Most importantly, variables are also called nonterminal symbols or nonterminals. You'll hear me call them nonterminals almost exclusively -- I can't get myself to call them variables.

We can start with a simpler example than the book's. \begin{align*} S &\rightarrow \mathtt{a}~S~\mathtt{b} \\ S &\rightarrow \varepsilon \end{align*} Question. What language does this CFG generate?

And here's a version of Example 2.3 that is rewritten using parentheses: \begin{align*} S &\rightarrow \mathtt{(}~S~\mathtt{)} \\ S &\rightarrow S~S \\ S &\rightarrow \varepsilon \end{align*}

Real-world CFGs

You can see a CFG for C in the draft standard, starting on page 458, or as a YACC grammar.

Generalized Phrase Structure Grammar was an attempt to write a CFG for English. The Berkeley Parser builds a CFG for English partly using human input and partly automatically.

L-systems are a kind of visual grammar; context-free L-systems can be used to draw surprisingly natural-looking images, especially of trees:

Dragon trees

Wednesday reading

Read Section 2.2, up to but not including "Equivalence with Context-Free Grammars".

Thursday class

Pushdown automata

Pushdown automata are equipped with, in addition to the input tape, a pushdown store, said to have been inspired by the tray dispenser in a university dining hall:

Tray dispenser

Nowadays, a pushdown store is more commonly known as a stack, which you should be quite familiar with. Here's an example (2.14):

In [2]:
m1 = read_csv("pda-m1.csv")
m1
%3 _START 0 q1 _START->0 1 q2 0->1 ε,ε → $ 1->1 0,ε → 0 2 q3 1->2 1,0 → ε 2->2 1,0 → ε 3 q4 2->3 ε,$ → ε
In [3]:
run(m1, "0 0 0 1 1 1")
%3 _START 13 q1,ε _START->13 0 q3,[0] 0 $ 1 q3,[0] $ 0->1 5 q3,$ 1->5 2 q2,[0] 0 $ 3 q2,[0] 0 0 … 2->3 3->0 4 q4,ε 5->4 6 7 6->7 0 8 7->8 0 9 8->9 0 10 9->10 1 11 10->11 1 12 11->12 1 14 q2,$ 13->14 15 q2,[0] $ 14->15 15->2

The run graph shows both the state and the stack at each time step. The square brackets just indicate the top of the stack. Note that when the stack gets deeper, ellipses (…) are used to keep the node labels compact.

This happens to be a deterministic PDA: at every time step, there's only one possibility for the next step.

More examples from the book:

In [4]:
m2 = read_csv("pda-m2.csv")
m2
%3 _START 0 q1 _START->0 1 q2 0->1 ε,ε → $ 1->1 a,ε → a 5 q3 1->5 ε,ε → ε 6 q5 1->6 ε,ε → ε 2 q6 2->2 c,a → ε 4 q7 2->4 ε,$ → ε 3 q4 3->3 c,ε → ε 5->3 ε,$ → ε 5->5 b,a → ε 6->2 ε,ε → ε 6->6 b,ε → ε

Question. Does the above PDA accept the strings $\mathtt{aabb}$? $\mathtt{aabc}$? $\mathtt{aacc}$?

In [5]:
m3 = read_csv("pda-m3.csv")
m3
%3 _START 2 q1 _START->2 0 q3 0->0 0,0 → ε 1,1 → ε 1 q4 0->1 ε,$ → ε 3 q2 2->3 ε,ε → $ 3->0 ε,ε → ε 3->3 0,ε → 0 1,ε → 1
In [6]:
run(m3, "0 1 0 0 1 0")
%3 _START 8 q1,ε _START->8 0 q3,$ 13 q4,ε 0->13 1 2 1->2 0 3 2->3 1 4 3->4 0 5 4->5 0 6 5->6 1 7 6->7 0 9 q2,$ 8->9 9->0 10 q2,[0] $ 9->10 11 q2,[1] 0 $ 10->11 12 q3,[0] $ 10->12 14 q3,[1] 0 $ 11->14 15 q2,[0] 1 0 … 11->15 16 q3,[0] 1 0 … 15->16 17 q2,[0] 0 1 … 15->17 19 q3,[1] 0 $ 16->19 18 q3,[0] 0 1 … 17->18 20 q2,[1] 0 0 … 17->20 21 q3,[0] $ 19->21 22 q3,[1] 0 0 … 20->22 24 q2,[0] 1 0 … 20->24 23 q3,$ 21->23 25 q4,ε 23->25 26 q3,[0] 1 0 … 24->26

Question. Design a PDA that recognizes the language of matching left and right parentheses (like Example 2.3).

Question. Design a PDA that recognizes the language over $\Sigma = \{\mathtt{a}, \mathtt{u}, \mathtt{c}, \mathtt{g}\}$ such that every symbol is paired with exactly one other symbol -- $\mathtt{a}$ with $\mathtt{u}$ and $\mathtt{c}$ with $\mathtt{g}$, and the pairings are nested like parentheses in the previous question.

Question. Do you think a queue automaton would be more or less powerful than a pushdown (stack) automaton?