Yapps 2.5

When you come to a fork in the road, take it. 

About Yapps

Yapps (Yet Another Python Parser System) is an easy-to-use parser generator that is written in Python and generates Python code.  You can find out more about it here.  The most recent version is 2.1.1; unfortunately, it hasn't been worked on since August 2003.

I recently found myself in need of a parser for the Inform 6 programming language.  As it happens, the Inform language is primarily defined by its compiler, which is written in C and uses a rather ad hoc, hand-written recursive descent parser.  I faced two challenges:  turning the C code into a formal grammar specification, and adapting Yapps to understand that grammar.  This page details the work that I've done on the second issue.

Changes to Yapps 2.1.1

Case sensitivity 

The first problem that I faced is that Inform is case insensitive.  Yapps has an option statement that I extended, so that any flags that can be passed to Python's regular expression compiler may now be specified as options to Yapps.

option: "re.IGNORECASE"
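As a rough illustration of what such an option amounts to (this is a sketch, not the actual Yapps patch; the helper name parse_re_flags is made up for the example), the flag names can be looked up in Python's re module and handed to re.compile when the scanner's patterns are built:

```python
import re

def parse_re_flags(option_text):
    """Hypothetical helper: turn "re.IGNORECASE" or "re.IGNORECASE | re.MULTILINE"
    into the combined flag value accepted by re.compile()."""
    flags = 0
    for name in option_text.split("|"):
        name = name.strip()
        if name.startswith("re."):
            name = name[3:]
        flags |= getattr(re, name)  # raises AttributeError for unknown flag names
    return flags

flags = parse_re_flags("re.IGNORECASE")

# A token pattern compiled with the flag matches any capitalization.
include_token = re.compile(r"include", flags)
strict_token = re.compile(r"include")

print(bool(include_token.match("Include")))  # case is ignored
print(bool(strict_token.match("Include")))   # case is significant
```

With the flag set, a single pattern covers "include", "Include" and "INCLUDE", which is what a case-insensitive language like Inform needs.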

Wildcards in Rule Names 

Inform supports a large number of statements called "directives", so I had a rule that listed all of the alternatives.  Every time I coded the grammar for another directive, I needed to add that rule to the list.  About the third or fourth time I forgot this second step, I decided that Yapps needed to support wildcards in rule names.  Glob-like wildcards were out of the question, but the "%" symbol, used for wildcards in SQL, wasn't being used.  Now my rule listing all of the directives looks like this:

rule statements: %_directive | code |

When Yapps begins to generate code, it scans for rules like this and creates a rule that matches any non-terminal whose name ends with "_directive"; in effect, this rule is added to my grammar:

rule %_directive:
     constant_directive
   | include_directive
   | ...
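The expansion step can be sketched in a few lines of Python.  This is only an approximation of the idea, assuming the generator has a list of all rule names available; the function name expand_wildcard and the sample rule list are invented for the example:

```python
import re

def expand_wildcard(pattern, rule_names):
    """Return the rule names matched by an SQL-style pattern,
    where "%" stands for any run of characters (possibly empty)."""
    # Escape the literal pieces, then join them with ".*" in place of "%".
    pieces = [re.escape(piece) for piece in pattern.split("%")]
    regex = re.compile(".*".join(pieces))
    return [name for name in rule_names if regex.fullmatch(name)]

# Hypothetical set of rules already defined in the grammar.
rules = ["statements", "constant_directive", "include_directive", "code"]

alternatives = expand_wildcard("%_directive", rules)

# Emit the synthesized rule in the same layout as the grammar text.
print("rule %_directive:\n     " + "\n   | ".join(alternatives))
```

Because the list is computed from the grammar itself, adding a new rule whose name ends in "_directive" is enough to make it one of the alternatives.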

Doc-strings in Methods 

The parser object now has doc-strings for all of its methods, similar to the input doc-strings used by SPARK.  Also, since the derived scanner and parser classes have names that may not relate to the name of the generated module, the module now defines an __all__ variable that lists the names in a defined order.

Additional Plans

Bug in Terminal Symbol Definition 

I've discovered a small bug in Yapps's design.  Yapps allows terminal symbols to be defined as Python-like strings, using either single or double quotes.  It then treats those as distinct symbols.  For example, consider these two rules:

rule ImportStmt: "import" NAME
rule FromStmt: 'from' NAME 'import' list

Since they use different quoting, the "import" in the first rule and the 'import' in the second become different terminals in the parser; however, the scanner will only generate tokens for the first one.  This means that the second rule will never be able to match anything.  I plan to convert all terminals into a canonical form to avoid this issue.
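One possible canonical form, sketched here under the assumption that terminals really are valid Python string literals, is to evaluate each literal and re-quote the result with repr().  The function name canonical_terminal is hypothetical, and this is not necessarily how the fix will end up being implemented:

```python
import ast

def canonical_terminal(literal):
    """Map a quoted terminal, as written in the grammar, to one canonical spelling."""
    value = ast.literal_eval(literal)  # safely evaluate the string literal
    if not isinstance(value, str):
        raise ValueError("terminal is not a string literal: %r" % (literal,))
    return repr(value)                 # repr() picks a single, consistent quoting

double_quoted = canonical_terminal('"import"')
single_quoted = canonical_terminal("'import'")

print(double_quoted, single_quoted, double_quoted == single_quoted)
```

After this step both spellings name the same symbol, so the scanner and the parser agree on which token they are talking about.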

Warn About Case-sensitive Non-terminals

I actually found the preceding bug when I accidentally used different capitalizations for what should have been the same token in different rules.  (Remember, Inform 6 is case insensitive, so I'm ignoring the case of the terminal symbols.)  In Python terms, I'd defined something similar to this:

rule ImportStmt: "Import" NAME
rule FromStmt: "From" NAME "import" list

Besides resolving to use consistent capitalization in the future, I plan to do a case-insensitive comparison of all non-terminals and issue a warning if any match.
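The check itself is simple.  A minimal sketch, assuming the symbols are available as a list of strings (the function name case_conflicts and the sample symbols are made up for the example):

```python
from collections import defaultdict

def case_conflicts(symbols):
    """Group symbols that differ only in capitalization."""
    groups = defaultdict(set)
    for symbol in symbols:
        groups[symbol.casefold()].add(symbol)
    # Only groups with more than one distinct spelling are suspicious.
    return [sorted(group) for group in groups.values() if len(group) > 1]

# Symbols collected from the two example rules above.
symbols = ['"Import"', "NAME", '"From"', "NAME", '"import"', "list"]

conflicts = case_conflicts(symbols)
for group in conflicts:
    print("warning: symbols differ only by case: " + ", ".join(group))
```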

Lessen the Use of Regular Expressions

All terminal symbols used in Yapps are interpreted as regular expressions.  This is fine for tokens, but much less so for ordinary terminals.  I'd like to fix this, perhaps with one or more options to modify the behavior.
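To show why this matters, here is a small sketch in which literal terminals are passed through re.escape before compilation, while named tokens keep their regular expression meaning.  The helper terminal_pattern and its is_regex parameter are assumptions for illustration, not existing Yapps options:

```python
import re

def terminal_pattern(text, is_regex=False):
    """Return the pattern source for a terminal.
    Literal terminals are escaped so that characters such as "+" or "*"
    lose their special regular expression meaning."""
    return text if is_regex else re.escape(text)

# A literal operator: without escaping, "+" alone is not even a valid pattern.
plus = re.compile(terminal_pattern("+"))

# A genuine token definition: still treated as a regular expression.
number = re.compile(terminal_pattern(r"[0-9]+", is_regex=True))

print(plus.match("+") is not None)
print(number.match("42").group())
```

This is also why the operators in the precedence example below are written as '[+]' and '[*]': the brackets are a workaround for the regular expression interpretation.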

Special Support for Precedence

Expressions in Inform 6 use about 50 binary operators that are grouped into 14 precedence levels.  To support this, I could write 14 different rules, but I'd like to add an understanding of precedence to Yapps.  Precedence climbing seems a useful foundation for this.  I'm thinking about adding syntax similar to this:

binary_op B:
    left: '[+]' '[-]'
    left: '[*]' '[/]'
    right: '[^]'
unary_op U:
    unary: '[-]'

rule Expression:
    Term ( B Expression )*
rule Term:
    U? Name

The idea is to list terminal symbols in a way similar to YACC's %left and %right declarations.
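Precedence climbing itself is independent of Yapps and can be sketched in plain Python.  The operator table below mirrors the binary_op example, with each line of that declaration becoming one precedence level; the table layout, the tuple-based tree, and the choice to let unary minus bind tighter than every binary operator are all assumptions made for this sketch, not a description of generated code:

```python
import re

# Hypothetical operator table: symbol -> (precedence level, associativity).
BINARY = {
    "+": (1, "left"), "-": (1, "left"),
    "*": (2, "left"), "/": (2, "left"),
    "^": (3, "right"),
}

def tokenize(text):
    """Split the input into names and single-character operators."""
    return re.findall(r"[A-Za-z_]\w*|[-+*/^]", text)

def parse_term(tokens, pos):
    """Term: U? Name, with unary minus binding tighter than any binary operator."""
    if tokens[pos] == "-":
        operand, pos = parse_term(tokens, pos + 1)
        return ("neg", operand), pos
    return tokens[pos], pos + 1

def parse_expression(tokens, pos=0, min_prec=1):
    """Precedence climbing: consume operators whose level is at least min_prec."""
    left, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in BINARY:
        op = tokens[pos]
        prec, assoc = BINARY[op]
        if prec < min_prec:
            break
        # Left-associative operators require a strictly higher level on the right.
        next_min = prec + 1 if assoc == "left" else prec
        right, pos = parse_expression(tokens, pos + 1, next_min)
        left = (op, left, right)
    return left, pos

tree, _ = parse_expression(tokenize("a + b * c"))
print(tree)
```

A single loop driven by the table replaces one grammar rule per precedence level, which is the attraction for a language with 14 levels.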

Future Directions

Zed Shaw is supposed to be developing a fork of Yapps called Zapps, but his web site is lacking in details.  The HTTP headers for the page claim that it was last modified on May 6, 2008, while (judging from the Internet Archive cache of the page) Zed wasn't mentioned until sometime after August 14, 2007.  I need to email him and find out the project's status.
