Item Grammar & Generator Encoding

Item grammars are parametrized in GIGL, i.e. the encoding of item grammars in item types are parametrized PCFGs. The back bone of the item grammars, or the PCFGs, are CFGs that are defined with nonterminals and rules etc.. Note that terminals are also part of CFGs as well as item grammars, but they are concrete problem specific types, that can be any C++/C or GIGL compatible types, thus does not need specific GIGL features to support. In addition to the definition of the grammar itself, the definition of generators and how attributes relates on different nodes behaves and relates to each other is also part of the item type. The discussion about attributes can be seen in [Here]. The syntax for nonterminal blocks and rule blocks are in [Here] and [Here] respectively.

Nonterminals

    • Nonterminal types (or nonterminal symbols) are declared in nonterminal blocks. Each nonterminal type must be declared before it is referenced (except for those reference in wrapper and node blocks).

    • Nonterminal types can be declared with arguments (called nonterminal arguments or arguments to a nonterminal type), and they are used as arguments to node generators. They are used to pass down information from the top of the tree in the generation process, and often related to some context sensitive control on the stochastic generation process (see [Here] for examples of how these arguments are related to the context senstive control). An example is the "depth" in the Quiz demo, which is declared with the "Expr" nonterminal as:

    • Expr(int depth);

    • and the way to invoke the node generator for an "Expr" type node can be something like:

    • generate Expr(depth + 1);

    • which pass down the incremented depth value to this generated node. Another way to invoke the node generator, which is not shown in the example is to specify the rule name, which effective skips the selection step and force the specified rule to be selected, such as

    • generate addExpr(depth + 1);

    • would guarantee generate an additive sub-expression. Both ways uses the same signature for arguments, i.e. the one declared along with the nonterminal type.

    • Nonterminal nodes can also be created with node constructors (in addition to node generators) in a different way. For the discussion of generate and construct, please refer to [Here].

    • Nonterminal type can inherit C++ classes to be able to use there members without adding node attributes, with the inherit keyword. This feature is experimental and has not been shown in any demos yet.

Rules

    • All rules in GIGL are named (like the rules denoted in an abstract syntax tree) to enable convenient and accurate referencing. For example, in the Quiz demo, the code

    • Expr :=

    • | mulExpr: Expr* expr, Expr* expr { ... }

    • | addExpr: Expr* expr, Expr* expr { ... }

    • | intExpr: int val { ... }

    • encodes a grammar

    • Expr :=

    • | mulExpr(Expr, Expr)

    • | addExpr(Expr, Expr)

    • | intExpr(int)

    • where the "mulExpr", "addExpr" and "intExpr" are names (or labels) of the rules that expands the nonterminal type "Expr". What's declared after the : symbol in the code are the children of the rule (written in parenthesis in the grammar version), which are the set of node types (symbols) the rule expands into. In GIGL code, each children must be associated with some variable so that they can be referenced.

    • Children of rules that are of nonterminal types (i.e. not terminals) must be declared as pointers and used as pointers (and they are implemented as pointers).

    • Children of rules that are of array types have special semantics in GIGL, which is discussed [Here].

    • Two different rules (different names) can take the same set of child types, e.g. the "mulExpr" and "addExpr" in the example above, but their semantics are generally different (otherwise only one rule would be needed).

    • Currently GIGL requires all rules expanded from the same nonterminal type grouped together (using the format showed above with the first | symbol being optional), this may or may not be changed later.

    • Each rule declaration is followed with a block (the omitted parts wrapped in { and } above) that defines implementations associated to this rule, including node functional attribute implementation, the generator (constructor, destructor), etc..

    • The node variable attributes is often assigned in the generator implementation, and alternatively, they can also be written outside generator implementations, which still implicitly inserts them inside there (would insert to the node constructor as well in that case so it is not always exactly equivalent).

    • The node functional attributes must be implemented outside the generator implementations, and starts with only function name but not function signatures (the function signature automatically takes one declared in the node block). They may also take the form of assignments, but it is equivalent to define a function with a simple return statement. An example is in the HelloWord demo, the

    • Print = "Hello World!";

    • in the rule "helloWorld" for assign the functional attribute Print is just an alias of

    • Print { return "Hello World!"; }

    • in the same place. The function signature for "Print" is

    • string Print()

    • which is declared in the node block.

Generators

    • Item generators are defined in the wrapper block (also called wrapper generators), which serves as interfaces to items of the item type. Node generators are defined inside each rule (similar for constructors and destructors). They are also discussed in the topic wrapper vs. node.

    • By default, an item generator is implemented to call the generator of the root node with no parameters (this may change later) if the root is not explicitly assigned, a node generator is implemented to call the generator of its child nodes that are of nonterminal (pointer) types, if they are not explicitly called (it checks for each nonterminal child separately). The convention applies analogously to destructors (but not constructors).

    • Calling the node generator with nonterminal type name (the "generate Expr" version above) first implicitly go through a stochastic selection process (after a pre-selection process if a pre-selector is defined) and then execute what is implicitly or explicitly defined as the implementation of the generator in the rule that the selection process selects. The arguments are the arguments to this nonterminal type as mentioned above (e.g. an integer representing depth in the example above).

    • Calling the node generator with rule name (the "generate addExpr" version) skips the selection process (and the pre-selection process etc. if any) and directly excutes the implementation of the generator in the specified rule. However, note that the argument signature does not change when using rule names to call (as opposed to using nonterminal name to call), still from those arguments to the nonterminal type the rule expands from (e.g. still an integer representing depth in the example above).

    • Node attributes are only available after the stochastic rule selection step, because a node is only created when its sub-type (rule) is determined.

Execution flow for node generators and different ways to invoke it (red code). Steps with blue backgrounds are always implicit in GIGL. For pre-selectors please refer to [Here]. For configure parameters, please refer to [Here].