References‎ > ‎

Parser Rules


The grammar file is structure as follows: The top of the file contains declarations for the lexer and disambiguation rules. The declaration section ends with %% on its own line.  The next section contains the list of grammar rules. The left-hand-side if the first rule is the start nonterminal.

In this section, we explain the directives.  Lexer directives and lexing rules are detailed here

  • %ignore defines a list of token which are thrown away by the lexer. This is useful to handle white-space characters for instance. The first declaration ignores comments (i.e. lines starting with #). The second, which you will find handy, ignores whitespace. 
    %ignore  /#.*/
    %ignore  /[ \t\v\f\r\n]+/
  • %left and %right define associativity and precedence of binary and unary operators. The order of these declarations matters: later declarations have higher precedence than the earlier ones. Operators on the same line have the same precedence. Here we define both + and * as left associative. * has a higher precedence than +.  
    In this project, you will disambiguate the precedence defined by %left and %right but not associativity defined by these directives (the latter is done more easily once you have your parse tree built, in PA6).  
    Important note:  when distinguishing between outputting "resolved" vs "unresolved", your parser must assume that associativity has been resolved.  For example, the input 1+2+3 in the third grammar above is resolved.

    %left '+' '-
    %left '*' '/'
  • %dprec defines the priority of a rule. The higher the number, the higher the disambiguation priority. The disambiguation priority is what determines which of conflicting edges will be selected by the parser from among the conflicting edges. The example below uses %dprec to disambiguate the famous dangling else problem.  Because if-then has higher priority than if-then-else, if-then wins when these two rules are in conflict and thus if-then will be higher up in the parse tree and, consequently, if-then-else will be lower in the tree.  

    E  -> 'if' '(' E ')' E           %dprec 2  
        | 'if' '(' E ')' E 'else' E  %dprec 1  
        | /[0-9]+/ ;

    A common confusion is to conflate precedence (%left, %right) and disambiguation priority (%dprec).  Note the meaning of %dprec: higher dprec value does not mean higher *precedence* (as in "operator * has higher precedence than operator +"). Instead, higher dprec value means higher *priority* in being selected from among the conflicting edges. Precedence and priority are different notions; you may ask why we do not define precedence for rules like if-then-else.  It's because these rules are not operators, hence defining precedence in the same way as for operators is not appropriate.
    • %prec defines the priority of a rule to be the same as that of an operator that appears in the list of %left, %right declarations.  Here, the function call is given the same precedence as the filed access operator '.' .  Because calls and field accesses are of the same precedence (and left associative), the expression p.field() will be parsed as call(dot(p,field)) rather than as dot(p,call(field)).

      %left '+' '-'

      %left '*' '/'
      %left '.'
      E -> ...
      | E '(' E_list ')' %prec '.'
      | E '.' Id ;

    Resolving ambiguities

    This handout will not delve deeply into the mechanisms of disambiguation. The general algorithm for choosing between two conflicting parse trees p1, p2 is:

    Disambiguate (p1, p2):   
      if p1.operatorPrecedence is defined and p2.operatorPrecedence is defined
          if p1.operatorPrecedence = p2.operatorPrecedence     
              return the p1 or p2 that has correct associativity  [see note below] 
              return the p1 or p2 with lower operator precedence  
      else if p1.dprec is defined and p2.dprec is defined  
          return the p1 or p2 with higher dprec   
          # declaritve rules do not provide disambiguate between p1 and p2
          return either p1 or p2

    There a few additional issues we will discuss.

    %left, %right:  Associativity and Operator Precedence

    Any number of terminals with associativity/precedence declarations (i.e., %left or %right) could appear in any production. However, you may assume that the grammar writer has only included such terminals in productions of the form:

    L --> N1 Term N2
    L -->Term N
    L --> N Term

    You do not have to check if productions of other forms have terminals with associativity/precedence declarations. If there are such productions in a grammar AST, your parser's behavior is undefined.

    Operator precedence is determined by the order of the associativity declarations. The lower the declaration in he list, the higher the precedence.  For example, the code

    %left '+' '-'
    %left '*' '/'
    %right '!'

    gives negation higher precedence than multiplication and division, which themselves have higher precedence than addition and subtraction.  Operators on the same line have the same precedence.

    Any terminal symbol without a %left or %right declaration has undefined associativity and operator precedence. 


    Productions with both %dprec declarations and terminals with associativity/precedence declarations represent a poorly-designed grammar at best, and a semantic error at worst. However, you do not need to check for this. The disambiguation algorithm above considers operator precedence and associativity before it considers the dprec value.

    Any production without a %dprec declaration has undefined dprec value.


    The construct %prec '!' is useful when handling unary operators. A rule %prec '!' says that the precedence of the rule where %prec '!' appears is the same as that of the rule with the operator '!'.

    Example: in this specific grammar, these three rules have the same precedence because +E and -E are given the same precedence as !E.

    | '!' E
    | '+' E  %prec '!'
    | '-' E  %prec '!'

    A bigger lesson: what you should learn in cs164 is how to come up with the rationale for this design choice (ie, why did we need to introduce %prec?).  The obvious (but incorrect) solution is to set he precedence of the unary operators + and - in the following way.

    %left '+' '-'
    %left '*' '/'
    %right '!' '+' '-'

    Why would this not work?  Imagine the rules E+E and E*E.  You cannot disambiguate these rules with the above precedence declarations just because + appears twice in them, once it has a higher priority than *, once it has lower.  In other words, these declarations are ambiguous.

    To get around this ambiguity, we decided set the precedence of the *unary* + and - by reference to the unary operator !.

    Nonassociative operators

    ++ and -- are non-associative, in the sense that you have no choice just to associate them to the right if they appear as prefix operators

        ++ ++ x     is ( ++ ( ++ x ) ) 

    and to the left if they appear as postfix operators

        x ++ ++    is ( ( x ++ ) ++ ) 

    Note that their precedence (with respect to ohter operators) is still important!  For example, is

        ++ x + x 

    intended to be

        ++ ( x + x ) 


        ( ++  x ) + x  ?

    Which is why you want to list these operators among the %left/%right declarations.  For example, in a JavaScript grammar, you may want to say

    %left  ','
    %right '='  '+='  '-='  '*='  '/='  '%='  '<<='  '>>=' '>>>='  '&='  '^='  '|='
    %left  '||'
    %left  '&&'
    %left  '|'
    %left  '^'
    %left  '&'
    %left  '=='   '!='  '==='  '!=='
    %left  '<'  '<='  '>'  '>='  'in'  'instanceof'
    %left  '<<'  '>>'  '>>>'
    %left  '+'  '-'
    %left  '*'  '/'  '%'
    %right '!'  '~'  'typeof'  'void'  'delete'
    %left  '++'  '--'  // these are non-associative, so you could
                       // as well say %right '++' '--'

    Epsilon Productions

    In our grammar files, we use _ to represent the empty string, commonly called epsilon.  Thus the following is a grammar for matched parentheses:
    P -> '(' P ')' | P P | _ ;
    The grammar above thus accepts the following strings: '' and '()' and '()()' and '(())' and '((()))(((((())))))'.