The Early Parser

Reading Assignment

Take a quick look at this page to get an idea of what we’re looking at:

http://www.cs.toronto.edu/~sengels/tutorials/parsing.html

Then try to understand how it really works by going through this page from UC Berkeley. Be sure to click through the examples.

http://inst.eecs.berkeley.edu/~cs164/fa10/earley/earley.html

Reading Questions

(Toronto) 1. CYK increases the length of the string until it is recognized by a rule of the grammar. What is different about how the Early parser steps through input?

(Berkley) 1. Click through the first example. Each edge is labeled with a rule from the grammar, and includes a period. What does the period represent? (If you’re lost, compare step 4 to step 5 – then go through the entire example again.)

(Berkley) 2. In the first example, at what step do we know that the string has been parsed successfully? How do we know?

(Berkley) 3. In the second example, why don’t we also add an edge for the “C-> .c” rule?

(Berkley) 4. To create a parse tree, we work backwards. What is common to the labels of all the edges that will be used to back-out the parse tree? The author gives these edges a specific name – what is it?

(Berkley) 5. Identify the other two types of edges that make up the early algorithm, and quickly explain each (in general terms – what does it do?).

Read the pseudo-code and see what you can make of it – don’t worry about getting it 100%

(Berkley) 6. Skim the last section on Disambiguation. Explain in simple terms what this problem is (thinking about how to build a parse tree is probably best.)

Lecture Notes

Yes, it’s just another parsing method. Some cool things:

Early parser runs in O(n3) in the general case – and faster in special cases. O(n2) for unambiguous and linear time O(n) for most LR(k) grammars. (source: http://en.wikipedia.org/wiki/Early_parser)

Can parse all context free languages. (a step above LR and LL parsers)

The Steps: (in terms that are possibly too simplistic to be useful)

The predictor. Looks for what rules might apply to the current input character

The scanner. Consumes one character.

The completer. Applies a rule/reduction when the final necessary character for that rule has been parsed.

Homework:

First, download this implementation of an early parser written in python: https://github.com/tomerfiliba/tau/blob/master/earley3.py

Add a quick comment to each line of code in the "predict" "scan" and "complete" functions so that you understand what is going on with this implementation.

Following the example grammar at the bottom of the code (lines 170 and below), implement the grammar for simple arithmetic used for the example on wikipedia. http://en.wikipedia.org/wiki/Early_parser#Example.