Homework 06

PDA

Your next mission is to write a parser by implementing a PDA in Python.

I won't give you starter code this time, but in class we will discuss implementation choices. Broadly, you have two options:

1) Implement a PDA as in Definition 2.13 (page 111) using appropriate data structures, and then process input using an algorithm similar to NFA.process().

2) Implement the algorithm in the informal PDA description on page 116. If you choose this option, you will have to think about how to implement the nondeterminism.

This assignment comes in three levels of difficulty:

1) Basic: write a function called Grammar.process that takes a string and returns 'accept' if the string is in the language described by the Grammar, and 'reject' otherwise.

2) Advanced: extend your function to return a parse tree for the string, if it exists, or None otherwise. If you take this on, you might want to read about "shift-reduce parsers."

3) Extra advanced: if the Grammar is ambiguous, return all possible parse trees.

Test your PDA with a grammar that checks for balanced parentheses. Here is a simple test suite you can use:

variables = set('ST')

terms = set('()')

start = 'S'

g = pda.Grammar(variables, terms, start)

g.add_rule('S', 'TS')

g.add_rule('S', '')

g.add_rule('T', '(S)')

self.assertNotEqual(g.process('()'), None)

self.assertNotEqual(g.process('(())'), None)

self.assertNotEqual(g.process('()()'), None)

self.assertEqual(g.process('(()'), None)

self.assertEqual(g.process('(()))'), None)

Parse lambda calculus

Write a grammar for the lambda calculus and implement a parser for it. It should accept strings like this:

print g.process('(x)')

print g.process('Lx.x')

print g.process('(Lx.x)(Lx.x)')

print g.process('Lpab.pab')

print g.process('Lp.La.Lb.((pa)b)')

Your parser should recognize any lower-case letter as an identifier. You might find it helpful to pre-process strings to remove whitespace and replace identifiers with a token like V. This is a kind of mini-tokenization. See http://en.wikipedia.org/wiki/Tokenization

Hints

Based on the test cases, above, you can infer a few things about the interface I implemented:

- The module pda.py provides a class named Grammar (which is part of the public interface) and a class named PDA, which is used internally, but not part of the public interface.
- Grammar provides a method named process that takes a string and returns either None or the PDA object that parsed the string.

Your implementation doesn't have to work exactly the same way, but it should provide the same interface. To "accept" a string, Grammar.process can return any non-None value.

You should instantiate and test two Grammar objects, one that recognizes balanced parentheses and one that recognizes lambda calculus.

Turnin

In our shared Dropbox folder, please turn in pda.py, which contains your implementation of the PDA, and hw06.py, which tests your lambda calculus parser.