LR Parsers (Mandy Korpusik)

Reading: Wikipedia article on “LR Parsers”

Reading Questions

1. How does LR parsing differ from the parsing we have been doing (i.e. recursive descent parsing)?

2. What are two benefits of LR parsing?

3. List the three types of LR parsers in increasing order of language recognition power.

4. How are LR parsers often used?

5. What is a parser table?

6. What are the five parts of the parser? If it helps, draw a picture.

7. Describe the shift and reduce operations.

8. How does the canonical LR(1) parser differ from the LR(0) parser? Hint: see the Wikipedia article on "canonical LR parser".

9. What is an item set?

10. Why does LR parsing sometimes result in conflicts?

Lecture Notes

Bottom-up Parsing

  • Recursive ascent parsers

  • Shift-reduce parsers

    • LR parser

    • Precedence parser

Top-down Parsing

  • Recursive descent parsers

  • LL parser

Top-down Parsing

  • Works its way downwards and rightwards in the parse tree

  • Inefficient because it predicts the parse tree’s structure without considering the input first

Bottom-up Parsing

  • Visualize a parse tree: start at the bottom left corner, then slowly work your way upwards and rightwards until you finally reach the top node

  • Unlike top-down, it builds the parse tree after inspecting the input!

Shift-reduce Parsing

  • Shifts each leftmost symbol of the input string onto a stack

  • Matches the items on the top of the stack with the right-hand side of a grammar rule, and reduces the matched symbols to the variable on the left-hand side of the rule

  • Example (this requires installing and importing nltk): run nltk.app.srparser()

LR Parsers

  • LR(0) parser

  • Reads input from Left to right and produces a Rightmost derivation

  • Uses the shift-reduce algorithm

  • Three additional types that can parse more grammars than LR(0): SLR (simple LR parser), LALR (look-ahead LR parser), and canonical LR parser

LR Parser Architecture

  • LR parser is essentially a pushdown automaton with five components:

o Input string (and $ to represent the empty stack)

o Stack with all the states the parser has been in

o Goto table that determines the next state of the parser

o Action table that specifies the shift or reduce operation to be done

o Grammar rules

Example Action and Goto Table

Example Parsing Procedure

Conflicts

  • Shift-reduce conflict: A cell in an action table contains both shift and reduce actions.

o Example: when the parser encounters the terminal ‘1’, it can reduce to rule (2) or shift

(1) E → 1 E

(2) E → 1

  • Yacc deals with conflicts by preferring shift over reduce and the first of two reduces

Item Sets

  • Item sets are used to construct the action and goto tables

  • Items are grammar rules with a dot added to the right-hand side, positioned after the portion of the input string that has already been recognized, but before the symbol it expects next

  • Example: E → E • + B

  • If an item contains a dot before a variable, then you must add items to the item set constructed from each rule for that variable, where a dot is placed at the beginning of the right-hand side

  • Continue to move the dot to the right to form the rest of the item sets

Canonical Parser (most language recognition power)

  • LR(1) parser

  • Items contain a look-ahead, the terminal expected after the right-hand side of a rule

  • Example:

o A → B C

o A → B • C, a (parser expects a string corresponding to C, followed by terminal ‘a’ next)

  • Enables these parsers to deal with a large class of grammars by making better reduce decisions (may avoid reduce-reduce conflicts)

LALR Parser (less general)

  • Same look-ahead approach as canonical, but merges item sets that are identical except for the look-ahead

  • Became popular in 1960s because had same performance as LR parsers, but produced smaller tables

  • Generated by Yacc, GNU bison, and other compiler-compilers

SLR Parser (parses even fewer grammars)

  • Uses more general follow sets to construct reduce actions, instead of specific look-ahead sets

  • Follow set contains all symbols that could appear after a string corresponding to the variable on the left-hand side of an item set, whereas look-ahead sets depend on the parser state and the right-hand side of the rule as well

Applications

  • LR parsers are often used by compilers to analyze source code syntax

  • Usually created by parser generators

In-class Activities

Exercise 1) Given the following grammar, shift-reduce the string “the dog jumps” on the whiteboard:

S → NP VP

NP → Art Noun

VP → Verb | Adverb Verb

Art → the | a

Verb → jumps | sings

Noun → dog | cat

Answer:

Stack Input Sequence Action

[] the dog jumps

[the] dog jumps Shift

[Art] dog jumps Reduce

[Art dog] jumps Shift

[Art Noun] jumps Reduce

[NP] jumps Reduce

[NP jumps] ε Shift

[NP Verb] ε Reduce

[NP VP] ε Reduce

[S] ε Success

Exercise 2) What kind of conflict do you think this grammar will create?

(1) E → A 1

(2) E → B 2

(3) A → 1

(4) B → 1

Answer: reduce-reduce conflict because a terminal ‘1’ can reduce to rule (3) and rule (4)

Exercise 3) Given the first item, what must be added to the item set?

E → E * • B

B → 0

B → 1

Answer: B → • 0 and B → • 1

Quiz Questions

1) Shift-reduce the input string “101” for the following grammar:

S → ASA

S → 0

A → 1

2) Write a grammar that contains a reduce-reduce conflict.

3) Construct the closed item set for the item S → • NP VP, given the following rules:

NP → D N

D → the

N → man

VP → VP PP

PP → P NP

P → in

4) Explain the difference between a follow set and a look-ahead.

Homework Exercise

Implement an LR parser for the grammar with the following rules:

(1) E → E * B

(2) E → E + B

(3) E → B

(4) B → 0

(5) B → 1

Test it by parsing the strings "1+1", "1*1", and "1+1*1".

Hint: see the example in the Wikipedia article on the "LR parser".

How do you handle syntax errors? Try parsing "*".