Homework 10
Parsing CFLs
1) Read about Dynamic programming.
2) Read about the the Cocke-Younger-Kasami (CYK) algorithm for recognizing a CFL. Can you convince yourself that this algorithm is O(n^3) where n is the length of the string?
3) Download parzz_test.py, which tests a module named parzz.py that provides a class named Grammar. Here is the code that builds the grammar and tests it:
"""Create a grammar that recognizes a single S-expression:
1) the whole thing has to be in paretheses
2) the middle can be empty
3) the middle can be a list of S-expressions
"""
variables = set('STORC')
terminals = set('()')
start = 'S'
g = parzz.Grammar(variables, terminals, start)
g.add_rule('S', 'OC')
g.add_rule('S', 'OR')
g.add_rule('T', 'OC')
g.add_rule('T', 'OR')
g.add_rule('T', 'TT')
g.add_rule('R', 'TC')
g.add_rule('O', '(')
g.add_rule('C', ')')
# test parse
self.assertTrue(g.parse('()'))
self.assertTrue(g.parse('(())'))
self.assertTrue(g.parse('(()())'))
self.assertTrue(g.parse('(()(())())'))
self.assertFalse(g.parse('()()'))
self.assertFalse(g.parse('(()'))
self.assertFalse(g.parse('(()))'))
Note that the grammar is in CNF, so it is a little longer than it would be otherwise.
4) Write a module named parzz.py that passes all tests. A couple of hints:
a) The pseudocode on the Wikipedia page assumes that string indices start from 1, and that a range like 1 to n includes n.
b) You don't have to use an array of booleans for the cache. A set of tuples might be better.