LR Parsers (Mandy Korpusik)

Reading: Wikipedia article on “LR Parsers”

Reading Questions

1. How does LR parsing differ from the parsing we have been doing (i.e. recursive descent parsing)?

2. What are two benefits of LR parsing?

3. List the three types of LR parsers in increasing order of language recognition power.

4. How are LR parsers often used?

5. What is a parser table?

6. What are the five parts of the parser? If it helps, draw a picture.

7. Describe the shift and reduce operations.

8. How does the canonical LR(1) parser differ from the LR(0) parser? Hint: see the Wikipedia article on "canonical LR parser".

9. What is an item set?

10. Why does LR parsing sometimes result in conflicts?

Lecture Notes

Bottom-up Parsing

Recursive ascent parsers
Shift-reduce parsers
- LR parser
- Precedence parser

Top-down Parsing

Recursive descent parsers
LL parser

Top-down Parsing

Works its way downwards and rightwards in the parse tree
Inefficient because it predicts the parse tree’s structure without considering the input first

Bottom-up Parsing

Visualize a parse tree: start at the bottom left corner, then slowly work your way upwards and rightwards until you finally reach the top node
Unlike top-down, it builds the parse tree after inspecting the input!

Shift-reduce Parsing

Shifts each leftmost symbol of the input string onto a stack
Matches the items on the top of the stack with the right-hand side of a grammar rule, and reduces the matched symbols to the variable on the left-hand side of the rule
Example (this requires installing and importing nltk): run nltk.app.srparser()

LR Parsers

LR(0) parser
Reads input from Left to right and produces a Rightmost derivation
Uses the shift-reduce algorithm
Three additional types that can parse more grammars than LR(0): SLR (simple LR parser), LALR (look-ahead LR parser), and canonical LR parser

LR Parser Architecture

LR parser is essentially a pushdown automaton with five components:

o Input string (and $ to represent the empty stack)

o Stack with all the states the parser has been in

o Goto table that determines the next state of the parser

o Action table that specifies the shift or reduce operation to be done

o Grammar rules

Example Action and Goto Table

Example Parsing Procedure

Conflicts

Shift-reduce conflict: A cell in an action table contains both shift and reduce actions.

o Example: when the parser encounters the terminal ‘1’, it can reduce to rule (2) or shift

(1) E → 1 E

(2) E → 1

Yacc deals with conflicts by preferring shift over reduce and the first of two reduces

Item Sets

Item sets are used to construct the action and goto tables
Items are grammar rules with a dot added to the right-hand side, positioned after the portion of the input string that has already been recognized, but before the symbol it expects next
Example: E → E • + B
If an item contains a dot before a variable, then you must add items to the item set constructed from each rule for that variable, where a dot is placed at the beginning of the right-hand side
Continue to move the dot to the right to form the rest of the item sets

Canonical Parser (most language recognition power)

LR(1) parser
Items contain a look-ahead, the terminal expected after the right-hand side of a rule
Example:

o A → B C

o A → B • C, a (parser expects a string corresponding to C, followed by terminal ‘a’ next)

Enables these parsers to deal with a large class of grammars by making better reduce decisions (may avoid reduce-reduce conflicts)

LALR Parser (less general)

Same look-ahead approach as canonical, but merges item sets that are identical except for the look-ahead
Became popular in 1960s because had same performance as LR parsers, but produced smaller tables
Generated by Yacc, GNU bison, and other compiler-compilers

SLR Parser (parses even fewer grammars)

Uses more general follow sets to construct reduce actions, instead of specific look-ahead sets
Follow set contains all symbols that could appear after a string corresponding to the variable on the left-hand side of an item set, whereas look-ahead sets depend on the parser state and the right-hand side of the rule as well

Applications

LR parsers are often used by compilers to analyze source code syntax
Usually created by parser generators

In-class Activities

Exercise 1) Given the following grammar, shift-reduce the string “the dog jumps” on the whiteboard:

S → NP VP

NP → Art Noun

VP → Verb | Adverb Verb

Art → the | a

Verb → jumps | sings

Noun → dog | cat

Answer:

Stack Input Sequence Action

[] the dog jumps

[the] dog jumps Shift

[Art] dog jumps Reduce

[Art dog] jumps Shift

[Art Noun] jumps Reduce

[NP] jumps Reduce

[NP jumps] ε Shift

[NP Verb] ε Reduce

[NP VP] ε Reduce

[S] ε Success

Exercise 2) What kind of conflict do you think this grammar will create?

(1) E → A 1

(2) E → B 2

(3) A → 1

(4) B → 1

Answer: reduce-reduce conflict because a terminal ‘1’ can reduce to rule (3) and rule (4)

Exercise 3) Given the first item, what must be added to the item set?

E → E * • B

B → 0

B → 1

Answer: B → • 0 and B → • 1

Quiz Questions

1) Shift-reduce the input string “101” for the following grammar:

S → ASA

S → 0

A → 1

2) Write a grammar that contains a reduce-reduce conflict.

3) Construct the closed item set for the item S → • NP VP, given the following rules:

NP → D N

D → the

N → man

VP → VP PP

PP → P NP

P → in

4) Explain the difference between a follow set and a look-ahead.

Homework Exercise

Implement an LR parser for the grammar with the following rules:

(1) E → E * B

(2) E → E + B

(3) E → B

(4) B → 0

(5) B → 1

Test it by parsing the strings "1+1", "1*1", and "1+1*1".

Hint: see the example in the Wikipedia article on the "LR parser".

How do you handle syntax errors? Try parsing "*".