Array Rule Child & Rule Child Options

Array Rule Child

GIGL allows rule children to be declared as arrays, similar to C/C++ arrays but with some differences. This is convenient for rules that expand into a list of children of the same type. For the syntax of declaring rule children, please refer to [Here].

- Elements of the arrays can be nonterminals (pointers to the nonterminal types) or terminals (C++ types etc.).
- Multidimensional arrays are allowed.
- The size of an array may be an expression evaluated at run time, and the expression may use the configure parameters (the item-level ones and the ones for this rule). This differs from most C/C++ versions. For example, a rule like the following is allowed (assuming A, B, and C are declared nonterminal types):

      A :=
          ruleX{int m, int n} : B* bs[GetRandInt(m)+1][GetRandInt(n)+1], C* c {...}

  It means that ruleX expands a nonterminal of type A into a matrix of children of type B and a child of type C, where the first dimension of the matrix is an integer randomly chosen from 1 to m and the second dimension is an integer randomly chosen from 1 to n (m and n are configure parameters for this rule that are set in the generator configurations).
- Whether an array is stored on the heap or on the stack depends on the rule child options. By default, arrays are stored on the heap (to accommodate sizes evaluated at run time).
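
In plain C++ terms, the heap-stored default for a declaration such as "B* bs[GetRandInt(m)+1][GetRandInt(n)+1]" can be pictured roughly as follows. This is only a sketch, not GIGL's generated code: the B struct, getRandInt (assumed here to return an integer in 0 to k-1, consistent with the GetRandInt(10) example later on this page) and makeChildMatrix are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

struct B {};  // hypothetical stand-in for a nonterminal type

// Assumed behavior: a uniformly chosen integer in [0, k-1].
int getRandInt(int k, std::mt19937& rng) {
    assert(k > 0);
    std::uniform_int_distribution<int> dist(0, k - 1);
    return dist(rng);
}

// Both dimensions are only known at run time, so the storage has to live on
// the heap (here through std::vector). Every entry starts as a null pointer.
std::vector<std::vector<B*>> makeChildMatrix(int m, int n, std::mt19937& rng) {
    const std::size_t rows = static_cast<std::size_t>(getRandInt(m, rng) + 1);  // 1..m
    const std::size_t cols = static_cast<std::size_t>(getRandInt(n, rng) + 1);  // 1..n
    return std::vector<std::vector<B*>>(rows, std::vector<B*>(cols, nullptr));
}
```

Each dimension is evaluated once per expansion, so all rows of the resulting matrix have the same length.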

Rule Child Options

Each rule child can optionally be declared with an option field. The options mostly concern implicit allocation, initialization, and destruction behaviors, many of which relate to array-typed rule children. The default behavior of a rule child is as follows. (Currently, the order in which the default operations process the children is not well defined, i.e. do not assume it is the same as, or the reverse of, the declaration order; this may change in the future.)

- If the child is of array type, the array space is allocated on the heap at the beginning of the generator (after evaluating the configure parameters) and at the beginning of the constructor of the rule.
- For each non-array child, or each element of an array child (after array allocation): if it is of nonterminal (pointer) type, it is initialized with a generator call to that nonterminal type with no parameters; otherwise, if it is of some other pointer type, it is initialized to the null pointer; otherwise, there is no well-defined default initialization for it. This happens only in the generator and does not apply to the constructor.
- For each non-array child, or each element of an array child (before array deallocation): if it is of nonterminal (pointer) type or a regular pointer type, its memory is released (with a "delete" in C++) in the destructor of the rule.
- If the child is of array type, then the array space is deallocated at the end of the destructor.
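
The four default steps can be sketched in plain C++ as below. This is an illustration under assumptions, not what GIGL actually emits: Child and RuleNode are hypothetical types, and the C++ constructor here stands in for the GIGL generator (the GIGL constructor allocates the array but does not generate the elements).

```cpp
#include <cassert>
#include <cstddef>

struct Child {                 // hypothetical nonterminal node type
    static int live;           // number of nodes currently alive
    Child() { ++live; }
    ~Child() { --live; }
};
int Child::live = 0;

struct RuleNode {              // hypothetical rule with one array child: Child* kids[size]
    std::size_t size;
    Child** kids;

    explicit RuleNode(std::size_t n) : size(n), kids(nullptr) {
        assert(n > 0);
        kids = new Child*[size];                 // 1. allocate the array space
        for (std::size_t i = 0; i < size; ++i)
            kids[i] = new Child();               // 2. generate each element
    }
    ~RuleNode() {
        for (std::size_t i = 0; i < size; ++i)
            delete kids[i];                      // 3. release each element
        delete[] kids;                           // 4. deallocate the array space
    }
    // Copying would lead to double deletion, so it is disabled in this sketch.
    RuleNode(const RuleNode&) = delete;
    RuleNode& operator=(const RuleNode&) = delete;
};
```

Roughly speaking, 'noexpand' and 'nofill' correspond to skipping steps 1 to 2 and step 2, and 'nodestroy' and 'norelease' to skipping steps 3 to 4 and step 3.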

Rule child options can be set to deviate from the default behavior and give finer control over these mechanisms. For the syntax of setting rule child options, please refer to [Here] plus [Here]. The options are indicated by the following keywords, and each option is disabled by default. Multiple options may be enabled at the same time, although some combinations may not make much sense (e.g. noexpand + nofill is equivalent to just noexpand).

- 'static' indicates that this child is a static array (i.e. not dynamically allocated); it is only available for array-type children. This puts the array on the stack, which is often a little more efficient than putting it on the heap as GIGL does by default, and it skips the default array allocation and deallocation steps. Note that this should only be used when the array dimensions are sized with constants; otherwise it is likely to cause a compilation error. This option affects both the generator and the constructor of the rule.
- 'noexpand' indicates that the child is not implicitly generated when it is of nonterminal pointer type; in addition, if it is an array, the array space is not allocated (manual allocation is needed). If a statement in an explicit definition of the generator contains evidence of manual allocation of the space, including the use of array child helpers (discussed below) or an array access above the element level (i.e. a pointer to elements, a list of elements, or higher) on the LHS of an assignment-type expression, the option is enabled automatically. E.g. if a child is declared as "int a[2][3]", having an expression in the generator like "a[1] = new int[3]" automatically enables this option. This option only affects the generator, not the constructor (the constructor always allocates the array space when the array is not static).
- 'nodestroy' indicates that the child is not implicitly deleted when it is of nonterminal pointer type; in addition, if it is an array, the array space is not deallocated (manual deallocation is needed). If a statement in an explicit definition of the destructor contains evidence of manual deallocation of the space, including the use of array child helpers (discussed below) or a delete on an array access above the element level (i.e. a pointer to elements, a list of elements, or higher), the option is enabled automatically. E.g. if a child is declared as "int a[2][3]", having an expression in the destructor like "delete a[1]" automatically enables this option.
- 'nofill' indicates that the child is not implicitly generated when it is of pointer type (same as 'noexpand' for non-array children). If a statement in an explicit definition of the generator contains evidence of manual assignment to this child, including the use of array child helpers (discussed below) or having the child (at any level of array access) on the LHS of an assignment-type expression, the option is enabled automatically. This option only affects the generator, not the constructor (the constructor never generates the child nodes by default; in fact, it expects them to be already created and passed in as parameters).
- 'norelease' indicates that the child is not implicitly deleted when it is of pointer type (same as 'nodestroy' for non-array children). If a statement in an explicit definition of the destructor contains evidence of manual release of this child, including the use of array child helpers (discussed below) or a delete on the child (at any level of array access), the option is enabled automatically.

Array Child Helpers

Users of GIGL may generally use the arrays as regular C++ arrays, but some helper syntax can simplify operations over arrays and array elements by eliminating the need for explicit loops. The array child helpers are a set of special statements, each introduced by one of several GIGL keywords, as shown below. Many of the memory management operations are executed implicitly under the default rule child options, but users can override them. The detailed syntax can be seen at [Here].

- 'expand' introduces a statement that allocates the array space and fills each entry with the specified expression. E.g., "expand a with 3;" allocates space for array a and fills each entry of array a with the value 3 (assuming it is an integer array), regardless of whether it is a one-dimensional, two-dimensional (or three-dimensional ...) array. E.g., "expand b with generate Expr(...);" allocates space for array b and fills each entry of array b with (pointers to) generated nodes of Expr type (assuming Expr is a nonterminal type).
- 'expandzero' introduces a statement that allocates the array space and fills each entry with zero (equivalent to an 'expand' statement whose expression is the integer literal zero).
- 'destroy' introduces a statement that calls delete on each array entry (assuming the entries are pointers to valid memory) and then deallocates the array space. E.g., "destroy b;" following the second 'expand' example first deletes the Expr type nodes through the pointers stored in the array and then deallocates the array.
- 'allocate' introduces a statement that allocates the array space (but does not fill in the entries). E.g., "allocate a;" allocates the space for array a (regardless of dimensions) but does not initialize any values for the array entries.
- 'dealloc' introduces a statement that deallocates the array space (but does not call delete on each array entry). E.g., "dealloc a;" deallocates the space for array a. If array a does not store pointers, this is usually fine on its own. However, if it stores pointers to dynamically allocated contents (i.e. heap contents) and there are no other copies of those pointers, the program loses track of that memory, which is a typical memory leak. Therefore, it is better to let the system manage the whole deallocation/release part, or, in this case, to use 'destroy' instead of 'dealloc', or to make sure the memory those pointers point to is properly released in advance.
- 'fill' introduces a statement that fills each entry with the specified expression (it assumes the array space is already allocated and does not do the allocation). E.g., "fill b with generate Expr(...);" fills each entry of array b with (pointers to) generated nodes of Expr type, assuming the array space is already allocated. Overwriting an array of pointers to dynamically allocated contents can also leak memory, which can be conveniently avoided by using 'refill' (below) instead.
- 'fillzero' introduces a statement that fills each entry with zero (equivalent to a 'fill' statement whose expression is the integer literal zero).
- 'release' introduces a statement that calls delete on each array entry (but does not deallocate the array space). E.g., "release b;" following the example in 'fill' deletes the Expr type nodes through the pointers stored in the array but does not deallocate the array, so be aware of potential dangling pointer issues (accessing released memory). Using 'releasezero' (below) avoids the dangling pointer issue, but may cause null pointer dereference issues (in any case, once delete has been used on a pointer, that pointer should not be used for access until it is assigned the address of some new contents).
- 'refill' introduces a statement that first calls delete on each array entry and then fills it with the specified expression.
- 'releasezero' introduces a statement that first calls delete on each array entry and then fills it with zero (equivalent to a 'refill' statement whose expression is the integer literal zero); it is an alias of 'refillzero'.
- 'refillzero' introduces a statement that first calls delete on each array entry and then fills it with zero (equivalent to a 'refill' statement whose expression is the integer literal zero); it is an alias of 'releasezero'.
- 'iterate' introduces a statement that executes the specified statement for each array entry. The statement can be a single statement or a block statement (i.e. one wrapped in braces). This is usually meaningful only with the indexing features below (otherwise it can only repeat literally the same operation for each array entry).
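
To make the division of labor between the helpers concrete, the following sketch writes rough C++ analogs of them for a one-dimensional array of node pointers. It is an assumption-laden illustration rather than GIGL's implementation: Expr, ExprArray and all function names are hypothetical, and the array space is modeled with std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Expr {                          // hypothetical node type
    static int live;                   // number of nodes currently alive
    int value;
    explicit Expr(int v) : value(v) { ++live; }
    ~Expr() { --live; }
};
int Expr::live = 0;

using ExprArray = std::vector<Expr*>;
using Maker = std::function<Expr*()>;  // the "expression" to fill in

// 'allocate': create the array space only (entries are nulled here for safety,
// whereas the GIGL helper leaves them uninitialized).
void allocateArray(ExprArray& a, std::size_t n) {
    assert(n > 0);
    a.assign(n, nullptr);
}

// 'fill': overwrite every entry; the old pointers are NOT deleted.
void fillArray(ExprArray& a, const Maker& make) {
    for (Expr*& e : a) e = make();
}

// 'release': delete every entry but keep the space; entries dangle afterwards.
void releaseArray(ExprArray& a) {
    for (Expr* e : a) delete e;
}

// 'releasezero' / 'refillzero': delete every entry, then null it.
void releaseZeroArray(ExprArray& a) {
    for (Expr*& e : a) { delete e; e = nullptr; }
}

// 'refill': delete every entry, then fill it again.
void refillArray(ExprArray& a, const Maker& make) {
    for (Expr*& e : a) { delete e; e = make(); }
}

// 'expand' = allocate + fill.
void expandArray(ExprArray& a, std::size_t n, const Maker& make) {
    allocateArray(a, n);
    fillArray(a, make);
}

// 'destroy' = release + deallocate the space.
void destroyArray(ExprArray& a) {
    releaseArray(a);
    a.clear();
}
```

In this model, calling fillArray twice in a row leaks the first batch of nodes, which is the situation 'refill' is meant to avoid.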

All array operations can optionally have an index specification part for specifying the set of entries the operation applies on.

- A single integer-type expression means a single entry, as it normally does in C++, such as "a[3]" and "a[k + 1]" (where k is an integer variable). All indices count from zero, as in C/C++.
- Nothing means all entries, such as "a[]".
- Single comma means union (i.e. and, or, listing), such as "a[0, 3]" which means entry a[0] and a[3].
- Double comma means range, such as "a[0,,3]" which means entry a[0], a[1], a[2], and a[3]. This can combine with single comma, such as "a[0,,2, 4]" which means a[0], a[1], a[2] and a[4].
- Double comma with only one end means from the beginning or until the end, such as "a[,,2]" which means a[0], a[1], a[2], and "a[3,,]" which means a[3], a[4] assuming the size is 5. This can combine with single comma, such as "a[,,2, 4]" which means a[0], a[1], a[2] and a[4].
- Double comma with no ends is the same as nothing, such as "a[,,]" which is equivalent to "a[]". This cannot be combined with any other separators.
- The index specification part can contain specifications across multiple dimensions for multi-dimensional arrays. All other rules apply in the same way. E.g. "b[1, 3][,,2, 4]" means b[1][0], b[1][1], b[1][2], b[1][4], b[3][0], b[3][1], b[3][2] and b[3][4].
- Many of these operations, such as "fill" and "iterate", in most cases make sense at the element level of the array. For those statements, in the default and most common setting, only the element-level entries that the specification covers matter. E.g., "fill b with 3" is the same as "fill b[] with 3", and "fill b[1,,3] with 3" is the same as "fill b[1,,3][] with 3". For the others, that is, those with allocation/deallocation components such as "expand" and "dealloc", and for other special situations, e.g. filling the first dimension of a 2D array with prepared 1D arrays, the situation is a little more complicated; it is briefly discussed along with some index specification options below.
- The index specification for each dimension can optionally be labeled with an iteration variable, such as "a[i: 1,,3]". This is useful in examples like "fill a[i: 1,,3] with i+1;", which fills each entry with a different value, and "iterate over a[i: 1,,3] sum += a[i];", which sums the values of different entries.
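
The comma rules above can be summarized with a small C++ sketch that expands the text between one pair of brackets into the list of indices it selects. How GIGL itself parses index specifications is not described here, so this is only a model: expandIndexSpec and expandIndexSpec2D are hypothetical names, and only integer literals (no expressions or iteration-variable labels) are handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Expands one bracket group, e.g. "0,,2, 4", given the size of that dimension.
std::vector<int> expandIndexSpec(std::string spec, int size) {
    assert(size > 0);
    // Drop whitespace so "0,,2, 4" and "0,,2,4" read the same.
    spec.erase(std::remove_if(spec.begin(), spec.end(),
                              [](unsigned char c) { return std::isspace(c) != 0; }),
               spec.end());
    if (spec.empty()) spec = ",,";  // "a[]" is the same as "a[,,]"
    // Rewrite each double comma as ':' so a single comma is the only separator left.
    for (std::size_t pos = spec.find(",,"); pos != std::string::npos; pos = spec.find(",,"))
        spec.replace(pos, 2, ":");

    std::vector<int> indices;
    std::istringstream items(spec);
    std::string item;
    while (std::getline(items, item, ',')) {
        const std::size_t colon = item.find(':');
        if (colon == std::string::npos) {                        // single entry
            indices.push_back(std::stoi(item));
            continue;
        }
        const std::string lo = item.substr(0, colon);
        const std::string hi = item.substr(colon + 1);
        const int first = lo.empty() ? 0 : std::stoi(lo);        // ",,2" starts at 0
        const int last = hi.empty() ? size - 1 : std::stoi(hi);  // "3,," runs to the end
        for (int i = first; i <= last; ++i) indices.push_back(i);
    }
    return indices;
}

// Two bracket groups select the cross product of the two index lists.
std::vector<std::pair<int, int>> expandIndexSpec2D(const std::string& rowSpec, int rows,
                                                   const std::string& colSpec, int cols) {
    std::vector<std::pair<int, int>> entries;
    for (int r : expandIndexSpec(rowSpec, rows))
        for (int c : expandIndexSpec(colSpec, cols))
            entries.emplace_back(r, c);
    return entries;
}
```

For instance, expandIndexSpec2D("1, 3", 4, ",,2, 4", 5) produces the eight (row, column) pairs listed for "b[1, 3][,,2, 4]" above, assuming b has 4 rows and 5 columns.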

As mentioned above, for allocation/deallocation-related operations on multi-dimensional arrays, the index specification can be ambiguous. E.g., if b is a two-dimensional array, does "allocate b[1,,3];" mean literally allocating the memory for b[1], b[2] and b[3] themselves? Should it also first allocate the higher-level memory, i.e. the memory that holds b[0], b[1], b[2], b[3] ... (i.e. b = new ...)? What about the lower-level memory, i.e. the memory that holds each array element b[1][1], b[1][2] ...? While this distinction is not that important in practice (because those situations are rare and the default allocation/deallocation mechanisms are often sufficient), GIGL does have index specification options (optional, since there is a default setting) for completeness of the semantics. The option is specified with one of the four keywords below (the default is 'downfrom'). Note that selecting a non-default setting may also affect the semantics of other operations like "fill", "iterate" etc., which makes it possible to do very uncommon things such as filling the first dimension of a 2D array with prepared 1D arrays.

- 'downfrom' means allocation/deallocation applies to spaces starting from the lowest level in the index specification and goes to the bottom level, and fill/release/iterate applies to entries at the bottom level (element level).
- 'upto' means allocation/deallocation applies to spaces up to the lowest level in the index specification, and fill/release/iterate applies to that lowest level in the index specification (which may not be the bottom-level array entries).
- 'all' means allocation/deallocation applies to all dimension levels, and fill/release/iterate applies to entries at the bottom level (element level).
- 'only' means allocation/deallocation applies to spaces only at the lowest level in the index specification, and fill/release/iterate applies to that same level (which may not be bottom level array entries).

Some of the array helpers (those with expressions to fill in) treat those expressions as lambda expressions, similar to the lambda configuration feature, which greatly increases the usefulness of these helpers. Here it means that certain expressions are not evaluated once when the helper statement is specified, but rather each time the operation is applied to an applicable entry. It is not surprising that the expressions in "iterate" statements are evaluated per entry, since such a statement is basically a loop that executes the specified statement over those entries. For the other cases, namely the "expand", "fill" and "refill" statements, where an expression to be filled in must be specified, the evaluation of that expression is worth discussing.

- As lambda expressions, they are evaluated each time the operation executes on an applicable entry. E.g. "fill a with GetRandInt(10)" can fill each entry with a different integer randomly selected from 0 ~ 9. E.g. the example "fill b with generate Expr(...);" that appeared earlier on this page actually performs proper stochastic generation, as "generate Expr(...)" gives a different node for each entry of array b.
- The arguments to the lambda expressions are the index iteration variables, if they are labeled. E.g. "fill a[i: 1,,3] with GetRandInt(i)+1" has the random range for each entry dependent on the index of the entry.
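
The per-entry evaluation can be mimicked in C++ by passing the expression as a callable that receives the iteration variable. The sketch below models "fill a[i: first,,last] with <expr>" on an integer array; fillRange is a hypothetical name, and this is not a description of how GIGL compiles such statements.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// The expression is a callable evaluated once per selected entry, with the
// index i of that entry as its argument.
void fillRange(std::vector<int>& a, std::size_t first, std::size_t last,
               const std::function<int(std::size_t)>& expr) {
    assert(first <= last && last < a.size());
    for (std::size_t i = first; i <= last; ++i) a[i] = expr(i);
}
```

With a five-element array of zeros, fillRange(a, 1, 3, [](std::size_t i) { return static_cast<int>(i) + 1; }) mirrors "fill a[i: 1,,3] with i+1;" and leaves a as 0, 2, 3, 4, 0. Had the expression been evaluated only once, all three entries would have received the same value.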

Note that if a rule child is of array type, then in the rule constructor it should be passed in as a variable of type gigltable<...>, where inside the <...> is the type of each element of the array (it is like a C++ templated type, but C++ templated type syntax has not been included in GIGL yet). Such a variable can be constructed with <...> {...}, where inside the <...> is the type of each element and inside the {...} is the initialization list for the array. This special syntax is adopted because C++ does not allow passing an initialization list for array-type (or pointer-type) arguments (which means it might change later). See [Here] for the exact syntax.
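
The C++ limitation behind this design can be illustrated with a small wrapper type. The Table class below is a hypothetical stand-in written for this example only; it is not GIGL's gigltable and says nothing about how gigltable is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical wrapper that can be built from a braced initialization list.
template <typename T>
class Table {
public:
    Table(std::initializer_list<T> init) : data_(init) {}
    std::size_t size() const { return data_.size(); }
    const T& operator[](std::size_t i) const {
        assert(i < data_.size());
        return data_[i];
    }
private:
    std::vector<T> data_;
};

// A parameter of type `const int*` or `int[]` could not be called with
// {1, 2, 3}; a wrapper parameter accepts the braced list directly.
int sumTable(const Table<int>& t) {
    int total = 0;
    for (std::size_t i = 0; i < t.size(); ++i) total += t[i];
    return total;
}
```

A call such as sumTable({1, 2, 3}) compiles because the braced list is converted to the wrapper, which is the role a table type plays for array-typed constructor parameters.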