G-REX was originally created as a tool for performing pedagogical rule extraction using GP; see (Johansson, König & Niklasson 2003). During the following years its functionality was extended by the author, as it was used in several successful studies, including the author's licentiate thesis work (König 2009). Rule extraction is essentially a predictive modeling task, which naturally meant that most of the functionality implemented in G-REX could also be applied to predictive modeling. Hence, it eventually became natural to adapt G-REX to fully support predictive modeling and to launch it as an open source GP framework for predictive modeling, which was done in (König, Johansson & Niklasson 2008). The following sections present the general design of G-REX's GP engine.

### Representation

G-REX can optimize arbitrary representations, since the GP optimization is inherently independent of the representation. A combination of constrained syntactic and strongly typed GP is used to give the user full control over the structure of the programs without having to bother with dataset-specific details, such as defining how each independent variable should be handled. More specifically, functions are handled syntactically, while the input variables are strongly typed as categorical or numerical.
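As a hedged illustration of this idea, the sketch below declares a representation in which functions constrain the syntactic form of their children while variables carry a categorical or numerical type. The names and the grammar-style structure are assumptions for illustration, not G-REX's actual API.

```python
# Illustrative sketch of a constrained syntactic + strongly typed
# representation (hypothetical structure, not G-REX's actual API).

CATEGORICAL = "categorical"
NUMERICAL = "numerical"

representation = {
    "functions": {
        # an if-node takes a condition and two subtrees (syntactic constraint)
        "if": ["condition", "subtree", "subtree"],
        # comparisons are only legal between a numerical variable and a constant
        "<": [NUMERICAL, "constant"],
        # equality tests are only legal for categorical variables
        "==": [CATEGORICAL, "category"],
    },
    # each input variable is strongly typed, so the user never has to
    # specify how individual variables are handled
    "variables": {"age": NUMERICAL, "income": NUMERICAL, "color": CATEGORICAL},
}

def legal_children(function):
    """Return the child types a function accepts under this grammar."""
    return representation["functions"][function]
```

During tree construction, only nodes whose type matches an entry returned by `legal_children` would be eligible, which is what keeps every generated program legal under the chosen representation.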

#### Scoring

A fundamental design choice in G-REX is to use *leaf scoring*, i.e., to score instances when they reach a leaf node, rather than *program scoring*, i.e., evaluating the program one instance at a time by propagating the prediction made in a leaf back to the instance. More specifically, when using leaf scoring the training set is presented to the root node of a program and is then partitioned into smaller sets by the child nodes. When a leaf is reached, all training instances that should be scored by that leaf are present and can be used to calculate a suitable prediction. Hence, predictions do not need to be made using ERCs but can instead be calculated in ways that simplify the search process, e.g., by predicting the majority class for classification tasks or the average value for regression tasks.
Furthermore, additional metrics such as probability estimates can be calculated from the training instances in a leaf, giving more informed predictions. Another advantage of the leaf scoring design is that it is well suited to handling large search spaces, since a local search can be made in each node. Consider, for example, the problem of optimizing AUC, which requires a probability estimate: *leaf scoring* reduces the problem to finding the best partitioning of the dataset, since the probability estimates can be calculated from the distribution of the training instances in each leaf; *program scoring* using ERCs would, in addition to optimizing the partitioning of the dataset, require a search for the best probability estimate for each partition, which is of course a much harder problem.

A drawback of the leaf scoring design is that the whole training set must be held in memory during training, which can be a limitation for extremely large datasets.
However, even though it is possible to save the training instances in each leaf node, this is normally not necessary, since the calculated prediction is all that is needed to score novel instances.

Finally, it should be noted that the *program scoring* design could apply similar functionality in a second optimization phase, before the actual scoring, if not only the prediction but also the predicting leaf were registered for each prediction. This approach would, however, require that the local search functionality be implemented separately from the leaf class, resulting in classes with lower cohesion and higher coupling, which runs contrary to object-oriented design principles.
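The core of leaf scoring can be sketched as follows: once the training instances that reach a leaf are known, the prediction and the probability estimates fall out directly, with no ERC search needed. This is a minimal illustration of the principle, not G-REX's actual leaf class.

```python
from collections import Counter

def score_leaf(instances, task="classification"):
    """Compute a leaf's prediction from the training instances that reach it.

    Each instance is an (input, target) pair. With leaf scoring the
    prediction need not be evolved as an ERC: the majority class
    (classification) or mean target (regression) is derived directly, and
    the class frequencies double as probability estimates (e.g. for AUC).
    """
    targets = [y for _, y in instances]
    if task == "classification":
        counts = Counter(targets)
        majority, _ = counts.most_common(1)[0]
        probs = {c: n / len(targets) for c, n in counts.items()}
        return majority, probs
    # regression: the mean of the targets minimizes squared error in the leaf
    return sum(targets) / len(targets), None
```

For example, a leaf reached by two class-"A" instances and one class-"B" instance predicts "A" with estimated probabilities {A: 2/3, B: 1/3}, which is exactly the local computation that makes optimizing AUC tractable under leaf scoring.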

#### Creation of initial population

G-REX supports grow, full, and ramped half-and-half initialization, with the addition that the chosen representation is followed. The first function is selected randomly, and each parameter of the function is then selected from the parameter list with uniform probability. This means that, by default, functions and terminals are selected with the same probability, rather than the 80%/20% split that Koza recommends. However, the exact selection probability can easily be changed by repeating the name of a function in the parameter list. This approach is more flexible and lets the user decide the probability of each function and terminal used in a representation.
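The repetition mechanism can be sketched in a few lines: a uniform draw over a list in which a name appears more than once is equivalent to a weighted draw. The names in the example list are illustrative.

```python
import random

def pick_symbol(parameter_list, rng=random):
    """Select the next node uniformly from a parameter list.

    Repeating a name in the list raises its effective selection
    probability: a function listed twice among four entries is drawn
    with probability 2/4 = 0.5 instead of 1/4.
    """
    return rng.choice(parameter_list)

# Illustrative parameter list: "if" appears twice, so it is drawn
# twice as often as either terminal.
parameter_list = ["if", "if", "x1", "x2"]
```

Repeating entries rather than attaching explicit weights keeps the configuration format trivial while still letting the user approximate any proportion, including Koza's 80%/20% split, simply by duplicating names.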

### Evolution

The evolution process implemented in G-REX follows Koza's (1992) original description closely, with a few exceptions. Fitness functions in G-REX are minimized and have a built-in parsimony pressure according to equation 34. All predefined fitness functions in G-REX are normalized, so that the same level of parsimony pressure has a similar effect across functions. To speed up fitness calculation, *n*-threaded concurrent evaluation is supported. Furthermore, only the instances predicted by a part of a program affected by crossover or mutation are recalculated each generation.
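Equation 34 is not reproduced in this section; as a hedged illustration, a minimized fitness with length-based parsimony pressure typically takes the following "error plus penalty times size" form. The coefficient name and the exact normalization are assumptions, not G-REX's actual formula.

```python
def fitness(error, size, parsimony=0.001):
    """Generic minimized fitness with length-based parsimony pressure.

    `error` is assumed to be normalized to [0, 1], which is what lets the
    same parsimony coefficient have a comparable effect across different
    fitness functions; `size` is the number of nodes in the program.
    Smaller values are better, so the penalty pushes toward shorter programs.
    """
    return error + parsimony * size
```

With a normalized error term, a coefficient of, say, 0.001 means that ten extra nodes cost as much as a one-percentage-point increase in error, regardless of which score function is being optimized.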

A unique G-REX feature is its support for rule extraction from other PM techniques (implemented in WEKA). When performing rule extraction, G-REX replaces the actual value of the target variable for all training instances with the prediction of the predictive model, i.e., the pedagogical approach described in section 2.4.6. Hence, all fitness functions can also be used for rule extraction.
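The target replacement step can be sketched as follows. The `predict` method name is illustrative of a generic learner interface; G-REX delegates the actual opaque models to WEKA.

```python
def make_oracle_targets(model, X_train):
    """Pedagogical rule extraction: replace the true targets with the
    opaque model's predictions. Any fitness function run on the rewritten
    training set then optimizes fidelity to the model rather than accuracy
    on the original labels. `model` is assumed to expose a per-instance
    predict method (a generic learner interface, used here for illustration).
    """
    return [model.predict(x) for x in X_train]
```

Because only the targets change, the extracted rule set is evolved with exactly the same machinery as an ordinary predictive model.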

G-REX supports both roulette wheel and tournament selection and creates a mating pool of the same size as the original population. Crossover, mutation, and reproduction are then applied in user-specified proportions. One difference from Koza's implementation of the genetic operators is that the programs produced by the crossover and mutation operators are guaranteed to be legal according to the current representation. Another difference is that the values of constants used in combination with a variable, e.g., *x*₁ < *constant*, are represented by an internal value which is translated to a value of the corresponding variable. This ensures that the meaning of a constant is retained even if it becomes associated with another variable due to a genetic operation. In other words, a low internal value will always correspond to a low value of the associated variable, regardless of the variable's scale or interval; see section 7.8.1 for more details.
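One way to realize such scale-independent internal constants is a quantile mapping over the training values of the variable the constant is currently paired with. This is an illustrative sketch of the idea, not G-REX's actual implementation (section 7.8.1 describes the real mechanism).

```python
def translate_constant(internal, values):
    """Map an internal constant in [0, 1] to a concrete threshold for the
    variable it is currently paired with, via the empirical quantile of
    that variable's training values.

    A low internal value then always corresponds to a low value of the
    associated variable, whatever its scale or interval, so the constant
    keeps its meaning even if crossover or mutation pairs it with a
    different variable.
    """
    ordered = sorted(values)
    index = min(int(internal * len(ordered)), len(ordered) - 1)
    return ordered[index]
```

For example, the same internal value 0.0 translates to the smallest observed value whether the paired variable ranges over [0, 1] or [0, 100000], which is precisely the property that survives a genetic operation.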

### Predictive modeling using G-REX

When evaluated against the identified GP-PM criteria, G-REX meets a majority of the proposed criteria. G-REX can solve both regression and *n*-class classification, has predefined fitness functions for most typical score functions, handles the accuracy versus comprehensibility tradeoff, allows tailoring of the representation, and facilitates benchmarking against many other PM techniques.

However, like all other evaluated frameworks, G-REX does not fulfill all criteria. More specifically, G-REX lacks the ability to remove introns, to handle large search spaces, and to handle the inherent inconsistency of GP, and it has no explicit functionality to promote the discovery of interesting rules.