Homework 10

Parsing CFLs

1) Read about Dynamic programming.

2) Read about the the Cocke-Younger-Kasami (CYK) algorithm for recognizing a CFL. Can you convince yourself that this algorithm is O(n^3) where n is the length of the string?

3) Download parzz_test.py, which tests a module named parzz.py that provides a class named Grammar. Here is the code that builds the grammar and tests it:

"""Create a grammar that recognizes a single S-expression:

1) the whole thing has to be in paretheses

2) the middle can be empty

3) the middle can be a list of S-expressions

"""

variables = set('STORC')

terminals = set('()')

start = 'S'

g = parzz.Grammar(variables, terminals, start)

g.add_rule('S', 'OC')

g.add_rule('S', 'OR')

g.add_rule('T', 'OC')

g.add_rule('T', 'OR')

g.add_rule('T', 'TT')

g.add_rule('R', 'TC')

g.add_rule('O', '(')

g.add_rule('C', ')')

# test parse

self.assertTrue(g.parse('()'))

self.assertTrue(g.parse('(())'))

self.assertTrue(g.parse('(()())'))

self.assertTrue(g.parse('(()(())())'))

self.assertFalse(g.parse('()()'))

self.assertFalse(g.parse('(()'))

self.assertFalse(g.parse('(()))'))

Note that the grammar is in CNF, so it is a little longer than it would be otherwise.

4) Write a module named parzz.py that passes all tests. A couple of hints:

a) The pseudocode on the Wikipedia page assumes that string indices start from 1, and that a range like 1 to n includes n.

b) You don't have to use an array of booleans for the cache. A set of tuples might be better.