When you come to a fork in the road, take it.
About Yapps

Yapps (Yet Another Python Parser System) is an easy-to-use parser generator that is written in Python and generates Python code. You can find out more about it here. The most recent version is 2.1.1; unfortunately, it hasn't been worked on since August 2003.

I recently found myself in need of a parser for the Inform 6 programming language. As it happens, the Inform language is primarily defined by its compiler, which is written in C and uses a rather ad hoc, hand-written, recursive descent parser. I faced two challenges: turning the C code into a formal grammar specification, and adapting Yapps to understand that grammar. This page details the work that I've done on the second issue.

Changes to Yapps 2.1.1

Case sensitivity

The first problem that I faced is that Inform is case insensitive. Yapps has an option statement, which I extended so that any flag that can be passed to Python's regular expression compiler may now be specified as an option to Yapps.

    option: "re.IGNORECASE"

Wildcards in Rule Names

Inform supports a large number of statements called "directives", so I had a rule that listed all of the alternatives. Every time I coded the grammar for another directive, I needed to add that rule to the list. About the third or fourth time I forgot this second step, I decided that Yapps needed to support wildcards in rule names. Glob-like wildcards were out of the question, but the "%" symbol, used for wildcards in SQL, wasn't being used. Now my rule listing all of the directives looks like this:

    rule statements: %_directive | code |

When Yapps begins to generate code, it scans for rules like this and creates a rule that matches any non-terminal whose name ends with "_directive"; in effect, this rule is added to my grammar:

    rule %_directive:

Doc-strings in Methods

The parser object now has doc-strings for all of its methods, similar to the input doc-strings used by SPARK.
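To make the wildcard expansion from "Wildcards in Rule Names" concrete, here is a minimal Python sketch. It is not the actual Yapps patch: the `wildcard_regex` and `expand_wildcards` helpers, the dictionary representation of the grammar, and the sample rule names are all assumptions made for illustration.

```python
import re


def wildcard_regex(pattern):
    """Turn an SQL-style pattern ('%' matches any run of characters) into a regex."""
    pieces = [re.escape(piece) for piece in pattern.split("%")]
    return re.compile(".*".join(pieces))


def expand_wildcards(rules):
    """Return a copy of `rules` with one synthesized rule per wildcard symbol.

    `rules` is a hypothetical grammar representation: it maps a rule name to
    a list of alternatives, each written as space-separated symbols.
    """
    concrete = [name for name in rules if "%" not in name]
    expanded = dict(rules)
    for alternatives in rules.values():
        for alternative in alternatives:
            for symbol in alternative.split():
                if symbol[0] in "\"'":
                    continue  # quoted terminal, not a rule reference
                if "%" in symbol and symbol not in expanded:
                    matcher = wildcard_regex(symbol)
                    # The synthesized rule lists every matching rule name.
                    expanded[symbol] = [
                        name for name in concrete if matcher.fullmatch(name)
                    ]
    return expanded


# Hypothetical grammar fragment; the rule names are illustrative only.
grammar = {
    "statements": ["%_directive", "code"],
    "include_directive": ['"Include" STRING'],
    "constant_directive": ['"Constant" NAME'],
    "code": ["NAME"],
}

expanded = expand_wildcards(grammar)
print(expanded["%_directive"])  # ['include_directive', 'constant_directive']
```

With this approach, adding a new rule whose name ends in "_directive" is enough for it to be picked up; the rule that lists the alternatives never needs to be edited by hand.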
Because the derived scanner and parser classes have names that may not relate to the name of the generated module, the module now also defines an __all__ variable that lists the names in a defined order.

Additional Plans

Bug in Terminal Symbol Definition

I've discovered a small bug in Yapps's design. Yapps allows terminal symbols to be defined as Python-like strings, using either single or double quotes. It then treats those as distinct symbols. For example, consider these two rules:

    rule ImportStmt: "import" NAME

Since they use different quoting, these rules generate different terminal symbols in the parser; however, the scanner will only generate tokens for the first one. This means that the second rule will never be able to match anything. I plan to convert all terminals into a canonical form to avoid this issue.

Warn About Case-sensitive Non-terminals

I actually found the preceding bug when I accidentally used different capitalizations for what should have been the same token in different rules. (Remember, Inform 6 is case insensitive, so I'm ignoring the case of the terminal symbols.) In Python terms, I'd defined something similar to this:

    rule ImportStmt: "Import" NAME

Besides resolving to use consistent capitalization in the future, I plan to do a case-insensitive comparison of all non-terminals and issue a warning if any match.

Lessen the Use of Regular Expressions