Background & Terminologies

Formal Language Terminologies

The grammatical representations in GIGL builds on top of context-free grammars (CFG), or probabilistic context-free grammars (PCFG) to be more accurate. A CFG uses a set of nonterminal and terminal types and a set of rules to encode a set of strings. Any string in that set can be generated by starting from the start nonterminal and recursively applying one of the rules to expand each nonterminal node into a set of nonterminals or terminals. SCFGs associate probabilities to the rules to represent the chances of them being taken when expanding a certain type of nonterminal nodes. We use grammars as generative models such that the rules are stochastically applied to generate items (used in a general fashion to refer to instances of content to be procedurally generated), and with the additional feature of allowing context sensitive probabilities etc. (explained later).

  • Rule, Nonterminal, Terminal

We adopt and adapt many terminologies from CFGs and SCFGs. The following is a rule-based description of a CFG, which encode any algebraic expression containing the muliplication and addition operations on one variable 'a' (this is an unambiguous grammar).

E := E '+' T

E := T

T := T '*' F

T := F

F := '(' E ')'

F := 'a'

Each line above is a rule (or production rule, rewrite rule, replacement rule) which stands for a way we may choose to replace a certain symbol with a set of others. Nonterminal (node) types or nonterminal symbols refer to type of representations (or node, symbols) that we can apply rules on (i.e. those appearing on the left side of rules; e.g. E, T and in the example above). Nonterminal nodes refers to instances of these types. The term nonterminal on itself can refer to either a type or an instance, based on the context. The start nonterminal (or start symbol) is where the rule application starts. It is the first nonterminal appear in the rule list (e.g. E in the example above) by default. The counterpart of nonterminal is terminal, which refer to the types/symbols (or instances of them) that is only a result of some rules but cannot be further applied with rules (i.e. those only appearing on the right side of rules but not on the left; e.g. the '+', '(', ')' and 'a' in the example above). The term expand or expansion have two meanings. The expansion of a node (an instance) means an instance of action of applying a rule, while the expansion of a type means the set of rules starting from that nonterminal type (e.g. the first two rules above are the expansion of nonterminal E), which may alternatively be called expansion rules or the rule set for that nonterminal type. The symbols (nonterminals and terminals) on the right hand side of a rule are called the children of the rule (or rule children). We may collect the rules together based on what nonterminal they expand from, using a bar symbol (|) to stand for alternatives, as follows:

E := E '+' T | T

T := T '*' F | F

F := '(' E ')' | 'a'

It is important to note that this is just another way of display the same grammar and it does not change the number of rules. Here we still have six rules. Rule probabilities (or probabilities of rules) which means the probability to choose a certain rule in SCFG when expanding a type of nonterminal. The rule probabilities in the expansion of each nonterminal should add up to 1.0 by default (e.g. the first two rules above should have their probabilities add up to 1.0, so do the middle two, and the last two). Another relevant terminology is syntax trees or parse trees, which are instances of tree structures formed applying rules, which contain instances of terminals, nonterminals and expansions that form connection between the nodes. One grammar often corresponds to multiple parse trees. There may be some nuance in the terminology of concrete syntax tree or abstract syntax tree depending on how the grammar is represented, we are not trying to emphases it here.

  • Item Grammar and Item Tree

We call the grammars in the GIGL framework as item grammars, which may appear somewhat different from how we mention grammar rules above. Here is an example item grammar for a math quiz containing a list of arithmetic expressions containing multiplication and addition operations on integers. Note: this is still not GIGL syntax but just a close version that encode the idea of the representation when using GIGL.

Quiz := seqQuiz(Expr, Quiz) | emptyQuiz()

Expr := mulExpr(Expr, Expr) | addExpr(Expr, Expr) | intExpr(int)

Here there are two nonterminal types (named Quiz and Expr) and five rules (named seqQuiz, emptyQuiz, mulExpr, addExpr and intExpr) in total. The rules are named and we are focusing on less on encoding concrete information, which is close to the notion of abstract syntax used in many programming language studies.

Item trees are instances of tree structures formed by applying rules in item grammars. This corresponds to the notion of syntax trees or parse trees in CFG. Because of the features of item grammars, the item trees in GIGL is close to the notion of abstract syntax tree. The following is one possible item tree generated by the item grammar above, where ε is the empty terminal symbol. This item tree correspond to a quiz containing to arithmetic questions, "(1 + 2) * 3" and "6 * 9".

An alternative style for illustrating item tree structures is as follows, which appears first in our AIIDE paper. It present more faithfully the data structures within the C++ objects generated, and does not include the empty terminal symbol (i.e. empty terminal symbol does not need memory to store).

GIGL Programming Language Terminologies

Grammatical item generation language (GIGL) is a domain specific language (DSL) designed to support the grammatical representation framework as mentioned above. The domain here is PCG with grammatical approaches. In general, GIGL extends from C/C++, inheriting their general programming power (there are some nuances in current implementation), while adding new syntax features for domain specific tasks. As a programming language, GIGL itself is a context-free language (CFL), which has its own grammar (which is a CFG), which is different to the item grammars it encodes. Therefore, please note whether it refers to the item grammar or the programming language grammar when reading about grammar terminologies. We prefer using "productions" over "rules" when referring to the production rules in the GIGL programming language grammar (this choice is consistent to the terminology in Silver, the programming language in which GIGL is implemented in).

    • item types: Item types are custom defined data types for (parametrized) item grammars and their associated generators. Each item type encodes the grammar components (nonterminals, rules etc.), attributes, and certain generator behaviors. Item types in GIGL is analogous to classes in C++. See [Here] for a big picture the role of item type in a GIGL grammar, see [Here] for an illustration of data structure, and [Here] for details about the encoding schemes.

    • item type options: Item type options are options to set some of the implicit parts of the item type data structures and generator behaviors. They are associated to each item type. Each option has default setting so this part is optional. See [Here] for more details.

    • wrapper: In GIGL, the wrapper is the interface C++ object for a (generated) item. See [Here] for an illustration of data structure and [Here] for more discussions.

    • node: In GIGL, unless otherwise specified, nodes refers to nonterminal nodes (usually instances of them, as opposed to nonterminal types or symbols). In terms of data structure, this is often mentioned in contrast to the wrapper. See [Here] for an illustration of data structure and [Here] for more discussions.

    • attributes: Attributes are associated data that are not part of the backbone of the grammar itself (i.e. not data used represent rules, nonterminals and terminals etc.). They can be considered regular C++ class members. Attributes can be variables (called variable attributes) or functions (called functional attributes), and they can be defined on the wrapper (called wrapper attributes) or each of the node (called node attributes). See [Here] for an illustration of data structure and [Here] for more discussions.

    • configure parameter: Configure parameters are parameters that are parameters for item types, used to parameterize the item grammar and the associated generation process. They are set in the generator configurations. See [Here] fore more details.

    • generator configuration: A generator configuration is a data structure that stores all information that are used to set the configure parameters for an item type. See [Here] for more details.

    • generation time: Generation time refer to the time when a generator is being executed as it is generating items or nodes in the items. This is used often in discussing when (the point of time) some expression gets evaluated, and in contrast to configuration time. Note that this is different from the usage of generate time and access time as the length of time periods in evaluating runtime performance related metrics.

    • configuration time: Generation time refer to the time when a generator configuration is being constructed. This is used often in discussing when (the point of time) some expression gets evaluated, and in contrast to generation time. Note that this is different from the usage of generate time and access time as the length of time periods in evaluating runtime performance related metrics.

    • lambda configuration: Lambda configuration is an important feature in GIGL, which states that the setting (binding) of each configure parameter in the generator configurations is considered a lambda expression, i.e. they are evaluated at generation time rather than at configuration time. See [Here] for more details.

    • generate: Generate (items or nodes), in contrast to construct, refers to the action or process of creating instances of items (or parts of them) through stochastic expansion via the grammar, i.e. selecting expansion rules stochastically. A generator is the code for executing the corresponding generation process, (implicitly or explicitly) defined in the wrapper for the item level and in the rule block for the node level. See [Here] for more details.

    • construct: Construct (items or nodes), in contract to generate, refers to the action or process of creating instances of items (or parts of them) through deterministically specifying item trees, i.e. specifying selected rules concretely. A constructor is the code for executing the corresponding construction process, (implicitly or explicitly) defined in the wrapper for the item level and in the rule block for the node level. We used the keyword and terminology gencontor, as as a way to define shared statement between generators and constructors. See [Here] for more details.

    • maybe semantics: Maybe semantics is a GIGL feature, which states that each rule child, if of nonterminal type, can optionally be non-existent (represented by a null pointer). It can be made into effect with a maybe generator calls. See [Here] for more details.

    • maybe (node) generator calls: Maybe (node) generator calls are expressions to generator a nonterminal node (with its sub-syntax trees), but only do so with certain (specified) probability (otherwise return null pointers). See [Here] for more details.

    • default rule: A default rule is not actually a rule in grammars, but rather a definition of elements (attribute implementation etc.) that are shared by all rules expanding from the same nonterminal type. The choice of this naming comes from the convention of the default production in Silver. See [Here] for more details.

    • pre-selector: A pre-selector is a block of statements to be executed before selecting rules to expand from a nonterminal. It often contains statements that temporary manipulating rule probabilities, including forcing or forbidding certain rules, according to local conditions. Declared inside the default rule blocks. See [Here] for more details. Other analogous keywords and terminologies corresponding to blocks that are declared in the default rule blocks are pre-generator, pre-constructor, pre-gencontor, post-destructor, which are used to define generator/constructor/destructor parts that are shared among all rules expanded from the same nonterminal type. See [Here] for more details.

    • rule child options: Rule child options are options that can (optionally) be attach when declaring each rule child. They are mostly related to implicit allocation, initialization, destruction behaviors many of which are related to array typed rule children. See [Here] for more details.

    • array (rule) child: A rule child that is of array type is called an array (rule) child. See [Here] for more details.

    • array child helpers: Array child helpers refers to syntax features (mostly for statements) for helping conveniently express operations on array type of children, often by eliminating the need to write explicit loops. See [Here] for more details.

    • smart rule probability configuration: Smart rule probability configuration refers to the flexible usage of a range of syntax options for conveniently express the setting of rule probabilities in generator configurations, rather than always doing them explicitly and in standard ways. On this aspect, GIGL provides some shorthanded syntax with some implicit behavior, or by providing syntax features for referencing rule probabilities. See [Here] for more details.