Genetic Programming - Design

From the beginning, G-REX was created as a tool for performing pedagogical rule extraction using GP; see (Johansson, König & Niklasson 2003). During the following years the functionality of G-REX was extended by the author, as it was used in several successful studies, including the author's licentiate thesis work (König 2009). Rule extraction is essentially a predictive modeling task, which meant that most of the functionality implemented in G-REX could also be applied to predictive modeling. Hence, it eventually became natural to adapt G-REX to fully support predictive modeling and to launch it as an open source GP framework for predictive modeling, which was done in (König, Johansson & Niklasson 2008). The following sections present the general design of G-REX's GP engine.

Representation

G-REX can optimize an arbitrary representation, since the GP optimization is inherently independent of the representation. A combination of constrained syntactic and strongly typed GP is used to give the user full control over the structure of the programs, without having to bother with dataset-specific details such as defining how each independent variable should be handled. More specifically, functions are handled syntactically, while the input variables are strongly typed as categorical or numerical variables.
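As a rough illustration of the idea, a representation could be declared as a function set with syntactic constraints on each argument slot, together with a terminal set whose variables carry an explicit type. The sketch below is only a minimal, hypothetical declaration; the class and field names are illustrative and do not mirror G-REX's actual API.

```java
// Illustrative sketch of a constrained, typed representation declaration.
// All names below are hypothetical and do not mirror G-REX's actual API.
import java.util.List;

enum TerminalType { CATEGORICAL, NUMERICAL }

// A terminal is an input variable with an explicit type (strongly typed part).
record Terminal(String name, TerminalType type) { }

// A function node constrains which kinds of nodes may appear as its children
// (constrained syntactic part), e.g. a "less" test expects a numerical
// variable on the left and a constant on the right.
record FunctionSpec(String name, List<String> allowedChildKinds) { }

class Representation {
    final List<FunctionSpec> functions;
    final List<Terminal> terminals;

    Representation(List<FunctionSpec> functions, List<Terminal> terminals) {
        this.functions = functions;
        this.terminals = terminals;
    }
}

class RepresentationExample {
    public static void main(String[] args) {
        Representation decisionTreeRep = new Representation(
            List.of(
                new FunctionSpec("if", List.of("condition", "node", "node")),
                new FunctionSpec("less", List.of("numerical", "constant")),
                new FunctionSpec("equals", List.of("categorical", "category"))
            ),
            List.of(
                new Terminal("age", TerminalType.NUMERICAL),
                new Terminal("color", TerminalType.CATEGORICAL)
            )
        );
        System.out.println(decisionTreeRep.functions.size() + " functions declared");
    }
}
```

With a declaration of this kind, the GP engine never needs to know what the individual variables mean; it only needs to respect the syntactic constraints and the terminal types when building and modifying programs.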

Scoring

A fundamental design choice in G-REX is to use leaf scoring, i.e. to score instances when they reach a leaf node, rather than program scoring, i.e. evaluating the program one instance at a time by propagating the prediction made in a leaf back to the instance. More specifically, when using leaf scoring the training set is presented to the root node of a program and is then partitioned into smaller sets by the subsequent child nodes. When a leaf is reached, all training instances that should be scored by that leaf are present and can be used to calculate a suitable prediction. Hence, the predictions do not need to be made using ERCs but can instead be calculated in different ways to simplify the search process, e.g. by predicting the majority class for classification tasks or the average value for regression tasks. Furthermore, additional metrics such as probability estimates can also be calculated based on the training instances in a leaf, thus giving more informed predictions.

Another advantage of the leaf scoring design is that it is well suited to handle large search spaces, since a local search can be made in each node. Consider, for example, the problem of optimizing AUC, which requires a probability estimate. Leaf scoring reduces the problem to finding the best partitioning of the dataset, since the probability estimates can be calculated from the distribution of the training instances in each leaf. Program scoring using ERCs would, in addition to optimizing the partitioning of the dataset, require a search for the best probability estimate for each partition, which of course is a much harder problem.

A drawback of the leaf scoring design is that the whole training set must be held in memory during training, which can be a limitation for extremely large datasets. However, even if it is possible to store the training instances in each leaf node, this is normally not necessary, since the calculated prediction is all that is needed to score novel instances.

Finally, it should be noted that the program scoring design could apply similar functionality in a second optimization phase, before the actual scoring, if not only the prediction but also the predicting leaf were registered for each prediction. This approach would, however, require that the local search functionality be implemented separately from the leaf class, resulting in classes with lower cohesion and higher coupling, which runs counter to object-oriented design principles.
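To make the mechanism concrete, the sketch below shows leaf scoring for a simple classification tree: the training set is pushed down the tree, each split partitions it further, and every leaf derives its prediction (majority class) and class probabilities from the instances that reach it. This is a minimal sketch assuming a binary-split tree; the class names and structure are illustrative and are not taken from G-REX.

```java
// Minimal sketch of leaf scoring for classification; illustrative only.
import java.util.*;
import java.util.function.Predicate;

abstract class Node {
    // Assign predictions by pushing the training set down the tree.
    abstract void fit(List<double[]> X, List<String> y);
    abstract String predict(double[] x);
}

class Leaf extends Node {
    String prediction;                       // majority class of the instances reaching this leaf
    Map<String, Double> classProbabilities;  // relative class frequencies, usable e.g. for AUC

    void fit(List<double[]> X, List<String> y) {
        if (y.isEmpty()) { prediction = "unknown"; classProbabilities = Map.of(); return; }
        Map<String, Long> counts = new HashMap<>();
        for (String label : y) counts.merge(label, 1L, Long::sum);
        prediction = Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
        classProbabilities = new HashMap<>();
        counts.forEach((label, c) -> classProbabilities.put(label, c / (double) y.size()));
    }

    String predict(double[] x) { return prediction; }
}

class Split extends Node {
    final Predicate<double[]> test;  // e.g. x -> x[0] < 3.5
    final Node left, right;

    Split(Predicate<double[]> test, Node left, Node right) {
        this.test = test; this.left = left; this.right = right;
    }

    void fit(List<double[]> X, List<String> y) {
        List<double[]> lX = new ArrayList<>(), rX = new ArrayList<>();
        List<String> lY = new ArrayList<>(), rY = new ArrayList<>();
        for (int i = 0; i < X.size(); i++) {
            if (test.test(X.get(i))) { lX.add(X.get(i)); lY.add(y.get(i)); }
            else                     { rX.add(X.get(i)); rY.add(y.get(i)); }
        }
        left.fit(lX, lY);   // each leaf sees exactly the instances it will score
        right.fit(rX, rY);
    }

    String predict(double[] x) { return test.test(x) ? left.predict(x) : right.predict(x); }
}
```

Note how the probability estimates fall out of the partitioning itself, which is precisely why leaf scoring removes the need to co-evolve ERC predictions for criteria such as AUC.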

Creation of initial population

G-REX supports Grow, Full and Ramped half-and-half initialization, with the addition that the chosen representation is always followed. The first function is selected randomly, and each parameter of the function is then selected from the parameter list with a uniform probability. This means that functions and terminals are, by default, selected with the same probability, rather than the 80%/20% split that Koza recommends. However, the exact selection probability can easily be changed by repeating the name of a function in the parameter list, as illustrated below. This approach is more flexible and lets the user decide the probability for each function and terminal used in a representation.
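The following sketch shows one way such grow-style creation could look: every parameter slot is filled by a uniform draw from a user-supplied parameter list, and repeating an entry raises its selection probability. The function set, arities and names are hypothetical and chosen only for illustration; this is not G-REX's actual initialization code.

```java
// Illustrative grow-style creation with a uniform draw over a parameter list.
// Names, arities and the tiny symbol set are assumptions made for the example.
import java.util.List;
import java.util.Random;

class GrowInitializer {
    private final Random rng = new Random();

    // A parameter list mixing functions and terminals; "if" appears twice,
    // so it is drawn with twice the probability of the other entries.
    private final List<String> parameterList = List.of("if", "if", "less", "x1", "x2", "class");

    String grow(int depth, int maxDepth) {
        String symbol = (depth == 0)
                ? pick(List.of("if", "less"))   // the root is always a function
                : pick(parameterList);          // uniform draw over the parameter list
        if (depth >= maxDepth || isTerminal(symbol)) {
            return isTerminal(symbol) ? symbol : pick(List.of("x1", "x2", "class"));
        }
        int arity = symbol.equals("if") ? 3 : 2;
        StringBuilder sb = new StringBuilder("(").append(symbol);
        for (int i = 0; i < arity; i++) sb.append(' ').append(grow(depth + 1, maxDepth));
        return sb.append(')').toString();
    }

    private boolean isTerminal(String s) { return !s.equals("if") && !s.equals("less"); }

    private String pick(List<String> options) { return options.get(rng.nextInt(options.size())); }
}
```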

Evolution

The evolution process implemented in G-REX follows Koza's (1992) original description closely, with a few exceptions. Fitness functions in G-REX are minimized and include a built-in parsimony pressure according to equation 34. All predefined fitness functions in G-REX are normalized, so that the same level of parsimony pressure has a similar effect across fitness functions. To speed up fitness calculation, n-threaded concurrent evaluation is supported. Furthermore, only the instances predicted by a part of a program affected by crossover or mutation are recalculated each generation.
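As a rough sketch of this kind of length-penalized, minimized fitness, the snippet below combines a normalized error term with a penalty proportional to program size. The exact form of equation 34 is not reproduced here, and both the error-rate normalization and the names are assumptions made purely for illustration.

```java
// Sketch of a minimized, length-penalized fitness in the spirit described above.
// The normalization (error rate in [0, 1]) and the names are assumptions; the
// actual definition is given by equation 34 in the text.
class ParsimonyFitness {
    private final double parsimonyCoefficient;  // user-chosen pressure, e.g. 0.001

    ParsimonyFitness(double parsimonyCoefficient) {
        this.parsimonyCoefficient = parsimonyCoefficient;
    }

    // Lower is better: normalized error plus a penalty proportional to program size.
    double evaluate(int misclassified, int numInstances, int programSize) {
        double errorRate = misclassified / (double) numInstances;
        return errorRate + parsimonyCoefficient * programSize;
    }
}
```

Because the error term is normalized to the same range for all predefined fitness functions, a given parsimony coefficient exerts a comparable pressure regardless of which score function is being optimized.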

A unique G-REX feature is that it supports rule extraction from other PM techniques (implemented in WEKA). When performing rule extraction, G-REX replaces the actual value of the target variable of all training instances with the prediction of the predictive model, i.e. the pedagogical approach described in section 2.4.6. Hence, all fitness functions can also be used for rule extraction.
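The pedagogical step itself amounts to relabeling the training data with the opaque model's predictions before GP extraction starts. The sketch below shows this using standard WEKA calls; the surrounding class and method are illustrative and not part of G-REX.

```java
// Sketch of the pedagogical relabeling step, assuming a trained WEKA classifier.
// The helper class is hypothetical; only the WEKA calls are standard API.
import weka.classifiers.Classifier;
import weka.core.Instances;

class PedagogicalOracle {
    // Returns a copy of the training data whose class values are the model's predictions.
    static Instances relabel(Classifier opaqueModel, Instances training) throws Exception {
        Instances relabeled = new Instances(training);   // copy, leave the original untouched
        for (int i = 0; i < relabeled.numInstances(); i++) {
            double predicted = opaqueModel.classifyInstance(relabeled.instance(i));
            relabeled.instance(i).setClassValue(predicted);
        }
        return relabeled;   // GP now fits the opaque model's behavior, not the raw targets
    }
}
```

Since the relabeled dataset has exactly the same form as an ordinary training set, every fitness function works unchanged for rule extraction.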

G-REX supports both roulette wheel and tournament selection and creates a mating pool of the same size as the original population. Crossover, mutation and reproduction are then applied in user-specified proportions. One difference from Koza's implementation of the genetic operators is that the programs produced by the crossover and mutation operators are guaranteed to be legal according to the current representation. Another difference is that constants used in combination with a variable, e.g. x1 < constant, are represented by an internal value which is translated to a value in the range of the associated variable. This ensures that the meaning of a constant is retained even if it becomes associated with another variable due to a genetic operation. In other words, a low internal value will always correspond to a low value of the associated variable, regardless of the variable's scale or interval; see section 7.8.1 for more details.
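One possible reading of this internal-constant scheme is sketched below: the constant is stored as a normalized position and translated to the range of whatever variable it is currently paired with. This interpretation is an assumption made for illustration; the actual G-REX mechanism is described in section 7.8.1.

```java
// Possible interpretation of the internal-constant scheme: the constant is a
// normalized position in [0, 1], translated to the range of the paired variable.
// This is an assumption for illustration; see section 7.8.1 for the real scheme.
class InternalConstant {
    private final double normalized;  // position in [0, 1], independent of any variable scale

    InternalConstant(double normalized) { this.normalized = normalized; }

    // Translate to a concrete threshold for the variable the constant is paired with.
    double valueFor(double variableMin, double variableMax) {
        return variableMin + normalized * (variableMax - variableMin);
    }
}

// Example: the same internal value 0.1 stays "low" whether it is paired with
// age in [0, 100] (-> 10.0) or with income in [10000, 90000] (-> 18000.0).
```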

Predictive modeling using G-REX

When evaluated against the identified GP-PM criteria, G-REX meets a majority of the proposed criteria. G-REX can solve both regression and n-class classification, has predefined fitness functions for most typical score functions, handles the accuracy versus comprehensibility tradeoff, allows tailoring of the representation, and facilitates benchmarking against many other PM techniques. Like all other evaluated frameworks, however, G-REX does not fulfill all criteria. More specifically, G-REX lacks the ability to remove introns, to handle large search spaces, and to handle the inherent inconsistency of GP, and it has no explicit functionality to promote the discovery of interesting rules.

Comments