Learning Classifier Systems
Links:
http://championship.mql5.com/2010/en/news/21
http://code.google.com/p/betterea/downloads/detail?name=xcs.pdf&can=2&q=
http://code.google.com/p/betterea/downloads/detail?name=xcslib-1.0.tar.gz&can=2&q=
2.3 XCS Introduction from Stock Trading Perspective
XCS stands for eXtended Classifier System. It is an accuracy-based classifier
system, which differs from other classifier systems in that classifier fitness is
derived from the estimated accuracy of reward predictions rather than from the reward
predictions themselves. XCS is an online learning machine: it improves its behavior
over time through interaction with the environment. XCS learns through reinforcement,
and the aim is not merely to obtain more reward but to maximize the value (the sum of
all rewards in the long run). The system is given the least possible amount of prior
information, so that most of its knowledge results from adaptation to the environment.
We do not tell it how to do things; it learns from the inputs it is fed, the actions it
takes and the reinforcement it receives. If it does well, we give it a positive reward;
otherwise we penalize it in some form.
2.3.1 XCS Input and Output
XCS Input Unit: The input to XCS is a binary vector, e.g. 10010110, where each bit
can be thought of as indicating whether the continuous-valued output of some sensor
has crossed a threshold. In our XCS, every day the system receives the previous day's
data for the individual stock: price (open, high, low and close) and volume. The meta-agents
present in the system apply technical analysis to this information to generate buy (1)
or sell (0) signals. A very simple example is the moving average of price and volume.
Say a particular agent calculates the moving averages of price and volume over the past
10 days. If the closing price of the stock is greater than the 10-day moving average of
price, this suggests the price may go up, which is a buy (1) signal. Similarly, if the
volume of the stock is greater than the 10-day moving average of volume, this also
predicts a buy (1) signal. The input string fed into the XCS would then be 11. This is
a very simple example; in practice the system uses more advanced technical analysis to
form the input binary string, which may range from 6 to 9 bits in length. Each bit is
either 0 (sell) or 1 (buy). More detail on how the problem is defined for the XCS
system using technical analysis can be found in section 3.1.
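The moving-average agent described above can be sketched as follows. This is an illustrative sketch only: the function names, the 10-day window, and the inclusion of the current day in the average are assumptions, not the exact agents of section 3.1.

```python
# Sketch of a meta-agent that turns price/volume data into input bits.
# A bit is 1 (buy) if today's value is above its 10-day moving average,
# else 0 (sell). Names and window length are illustrative assumptions.

def moving_average(values, window=10):
    """Simple moving average of the last `window` values."""
    return sum(values[-window:]) / window

def price_volume_bits(closes, volumes, window=10):
    """Return a 2-bit input string: bit 1 from price, bit 2 from volume."""
    price_bit = "1" if closes[-1] > moving_average(closes, window) else "0"
    volume_bit = "1" if volumes[-1] > moving_average(volumes, window) else "0"
    return price_bit + volume_bit

closes = [100, 101, 99, 102, 103, 104, 102, 105, 106, 108, 110]
volumes = [5000, 5200, 4800, 5100, 5300, 5000, 4900, 5200, 5400, 5600, 6100]
print(price_volume_bits(closes, volumes))  # both above their 10-day MA -> "11"
```

Each additional agent would contribute one more bit, yielding the 6- to 9-bit strings described above.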
XCS Output: The XCS output is a discrete action or decision; in our case it is
either 0 (sell) or 1 (buy). The final aim of the XCS learning cycle is to learn what
action to take for each combination of input bits. Note that XCS uses reinforcement
learning: at no point do we tell it what is right and what is wrong. It has to find
this out through trial and error and the reward mechanism. Depending on the input
string, it is sometimes easy to predict the correct action. For example, if the input
string is 101111, i.e. 5 of the 6 bits suggest buying the stock, then the correct
action is presumably to buy. At other times the correct action is not so evident. If
the input string is 111000, the buy and sell signals are in equal proportion, and we
expect the system to learn what weight to give to each individual signal. Even in a
real-life scenario, on a given day an actual trader may face a situation in which the
combination of technical indicators gives no clear sign to buy or sell. He or she then
has to judge from experience which indicators should be given more weight and decide
accordingly.
2.3.2 XCS Framework [15]
XCS contains a population of classifiers. Each classifier in the population is
characterized by five main components:
1) Condition part C, which specifies the problem instances to which the classifier
is applicable.
2) Action part A, which specifies what action the classifier takes when condition C
is fulfilled.
3) Reward prediction P, which estimates the payoff or reward the classifier can
expect on executing the action.
4) Reward prediction error ε, which estimates the mean absolute difference between
P and the actual reward R.
5) Fitness F, which estimates the scaled, relative accuracy of the classifier with
respect to the other overlapping classifiers in the action sets in which it appears.
In short, a classifier is a triple
<Condition> : <Action> => <Prediction>
where the prediction is analogous to the payoff estimate in reinforcement learning.
E.g. 01#1## : 1 => 693.2
{0: sell, 1: buy, #: don't care}
This classifier says: if the first bit is 0, the second is 1 and the fourth is 1, and I
don't care about the others, then after taking action 1 (buy) the expected payoff is
693.2. The payoffs are updated as the system learns. The above classifier's condition
has three don't-care bits and therefore matches 2^3 = 8 input strings, and its payoff
estimate aggregates the rewards experienced across all these cases:
010100
010110
010101
.......... (8 such cases in all)
XCS differs from other action-based systems such as neural networks in that, in a
neural network, the payoff information for any input is distributed over the whole
network. Each XCS classifier, by contrast, acts on only a subset of problems: it checks
whether the given input is one on which it can act, and if its condition matches, it
acts on it and predicts a certain payoff.
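The matching behavior of a condition with don't-care bits, as in the 01#1## example above, can be sketched as follows (an illustration, not the thesis implementation):

```python
# Minimal sketch of classifier condition matching with don't-care (#) bits.

from itertools import product

def matches(condition, inp):
    """True if every non-# bit of the condition equals the input bit."""
    return all(c == "#" or c == b for c, b in zip(condition, inp))

def matching_inputs(condition):
    """Enumerate all binary input strings matched by the condition."""
    slots = [("0", "1") if c == "#" else (c,) for c in condition]
    return ["".join(bits) for bits in product(*slots)]

cond = "01#1##"
print(matches(cond, "010110"))     # True
print(len(matching_inputs(cond)))  # 8 (one per assignment of the three # bits)
```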
Figure 1: XCS Framework [15]. Legend: [P]: classifier population; [M]: match set;
p: predicted value; ε: error in prediction; F: fitness of prediction, proportional to 1/ε.
2.3.3 XCS Learning Cycle
In the starting the classifier population [P] is generally empty. The agents use
the stock price and volume data to form the input string (For details please see section
3.1). The Input is fed into classifier population [P], which detects if there is any match.
The 4 classifier marked with -- in fig 1 matches the Input 0011. They are put in a
match set [M]. If no classifier matches the given input, XCS creates classifier by
covering mechanism (A rule is created at random and has random action and is
assigned a low prediction). A new rule has a certain number of don’t care sign (#) in
random position. The # sign give classifier an initial generality due to which it can be
tested on many input problem instances.
Covering is necessary only initially and vast majority of new rules are derived
from existing rules.
For example, suppose Input string is 11000101 and there is no classifier which
matches this input .Then the rule created is 1##0010# : 01 10
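The covering mechanism can be sketched as below. This is a hedged illustration: the classifier representation, the don't-care probability `p_hash`, and the initial parameter values are assumptions (typical XCS defaults), not the thesis's exact settings.

```python
# Sketch of covering: when no classifier matches the input, build one
# from the input itself, with #'s at random positions and a random
# action. p_hash and the initial values are assumed defaults.

import random

def cover(inp, actions=("0", "1"), p_hash=0.33, init_prediction=10.0):
    """Create a new classifier whose condition matches `inp`."""
    condition = "".join("#" if random.random() < p_hash else bit for bit in inp)
    return {
        "condition": condition,
        "action": random.choice(actions),   # random action
        "prediction": init_prediction,      # low initial prediction
        "error": 0.0,
        "fitness": 0.01,
    }

cl = cover("11000101")
# a covered classifier always matches the input that triggered covering
assert all(c == "#" or c == b for c, b in zip(cl["condition"], "11000101"))
```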
Continuing the process, after creation of the match set, XCS estimates the payoff for
each possible action by forming a prediction array P(A). In fig. 1, two classifiers in
the match set advocate action 01 and two advocate action 11. For each action we take
the fitness-weighted average of the predictions:

P(A) = Σ (prediction × fitness) / Σ fitness, over classifiers in [M] advocating A

E.g. P(action = 01) = (43 × 99 + 27 × 3) / (99 + 3) ≈ 42.53
Similarly, P(action = 11) = 16.6
Hence P(A) holds the fitness-weighted average of the reward prediction estimates of
all classifiers in [M] that advocate action A. The system follows an ε-greedy policy:
it takes the best action most of the time, but with a small probability ε (the
exploration probability) it instead chooses a random action from those in the
prediction array. All classifiers in the match set [M] that specify the chosen action
A form the action set [A]. In fig. 1, the action with the maximum prediction, 01, has
been chosen, and the two classifiers with this action are put into the action set.
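The prediction array and ε-greedy selection can be sketched as follows, using the figures from the worked example above. The dictionary representation of classifiers is an assumption for illustration.

```python
# Sketch of forming the prediction array P(A) and ε-greedy selection
# over a match set. Classifier dicts are an assumed representation.

import random

def prediction_array(match_set):
    """Fitness-weighted average prediction for each advocated action."""
    sums, weights = {}, {}
    for cl in match_set:
        a = cl["action"]
        sums[a] = sums.get(a, 0.0) + cl["prediction"] * cl["fitness"]
        weights[a] = weights.get(a, 0.0) + cl["fitness"]
    return {a: sums[a] / weights[a] for a in sums}

def select_action(pa, epsilon=0.1):
    """Best action most of the time; random action with probability epsilon."""
    if random.random() < epsilon:
        return random.choice(list(pa))
    return max(pa, key=pa.get)

match_set = [
    {"action": "01", "prediction": 43.0, "fitness": 99.0},
    {"action": "01", "prediction": 27.0, "fitness": 3.0},
    {"action": "11", "prediction": 16.6, "fitness": 50.0},
]
pa = prediction_array(match_set)
print(round(pa["01"], 2))  # (43*99 + 27*3) / (99 + 3) = 42.53
```

With ε = 0 the system always exploits, choosing action 01 here; a small ε keeps some exploration going.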
The system executes the prescribed action. The next day, the correctness of the action
taken is judged by the stock price movement: a reward of 1000 is given for every
correct action, and a reward of 0 for a wrong one. This differs from the usual
reinforcement learning methodology in that no negative reward is given for an
incorrect action. For example, suppose the system predicts a rise in the price of the
stock and buys the share; if the price goes up the next day, a reward of 1000 is
given. This reward is used to update the parameters of the classifiers in the action
set [A].
2.3.3.1 Updating XCS Parameters
When a classifier is created, it is given a very low prediction value. After the
reward R for the executed action is received, the parameters of each classifier j in
[A] are updated as follows:

Prediction: Pj ← Pj + α(R − Pj), where α is the learning rate (~0.2)

If R > Pj, then Pj is increased, i.e. the classifier's prediction goes up. As can be
seen, if a particular classifier is updated many times, Pj tends toward R, i.e. the
predicted value tends toward the actual return from the process.
Similarly, the other parameters are updated as:

Error: εj ← εj + α(|R − Pj| − εj)

Accuracy: κj = (εj / ε0)^(−n) if εj > ε0, else κj = 1

Relative accuracy: κj′ = κj / Σ κ over [A]

The relative accuracy measures the accuracy of a classifier relative to the other
classifiers in the action set.

Fitness: Fj ← Fj + α(κj′ − Fj)

The fitness of a classifier is thus an estimate of its accuracy with respect to the
accuracies of the other classifiers in the action sets in which it occurs.
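The update equations above can be sketched in code as below. This is a hedged sketch: the threshold ε0 = 10 and exponent n = 5 are typical XCS defaults assumed here, not values from the thesis, and the exact ordering of the prediction and error updates varies between XCS descriptions.

```python
# Sketch of the classifier parameter updates after receiving reward R.
# e0 (error threshold) and n (accuracy exponent) are assumed defaults.

def update_action_set(action_set, reward, alpha=0.2, e0=10.0, n=5.0):
    # error and prediction updates (Widrow-Hoff style); error uses the
    # prediction from before its own update
    for cl in action_set:
        cl["error"] += alpha * (abs(reward - cl["prediction"]) - cl["error"])
        cl["prediction"] += alpha * (reward - cl["prediction"])
    # accuracy: 1 if the error is within the threshold e0, else a
    # power-law decay in (error / e0)
    kappas = [1.0 if cl["error"] <= e0 else (cl["error"] / e0) ** -n
              for cl in action_set]
    total = sum(kappas)
    # fitness tracks relative accuracy within this action set
    for cl, k in zip(action_set, kappas):
        cl["fitness"] += alpha * (k / total - cl["fitness"])
```

Repeated application of these updates drives each prediction toward the expected reward and each fitness toward the classifier's relative accuracy.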
2.3.3.2 Genetic Algorithm Role and Rule Evolution [15]
XCS applies a genetic algorithm (GA) for rule evolution. If the average time since the
GA was last applied exceeds a certain threshold, genetic reproduction is invoked in the
current action set [A]. The GA selects two parental classifiers based on their relative
fitness in the action set [A]. Two offspring are generated by reproducing the parents
and applying crossover and mutation. Parents and offspring then compete in the same
population [P]. Niche mutation is applied, which means that a mutated classifier still
matches the current problem instance, i.e. the input binary string its parent was able
to act on. If the offspring's condition is subsumed by some other classifier, then it
is not inserted into the population; instead, the numerosity of the subsuming
classifier is increased by 1. The classifier population size is fixed, and deletion is
performed when it is exceeded. Excess classifiers are deleted from [P] with probability
proportional to the action set size estimate of the sets in which they occur;
classifiers that are more experienced but less fit have a higher probability of
deletion [15]. For more information, readers are encouraged to read chapter 4 of
Martin Butz's book.
Classifiers that are more general will more often be part of an action set and
thus undergo more reproduction events, so they propagate faster. The GA process is
therefore expected to evolve an accurate, maximally general solution as the final
outcome.
For example, the classifiers below undergo crossover (at the position marked |) to
give the offspring on the right-hand side:
10##|11 : 1    →    10##1# : 1    -----(1)
#000|1# : 2    →    #00011 : 2    -----(2)
Please note the results of crossing:
a classifier (1) which is more general than both parents, and
a classifier (2) which is more specific than both.
A more specific classifier covered by a more general one can never be less accurate
than it. Although this does not hold in every case, the process on balance tends to
search along the generality-specificity dimension, using pieces of existing
higher-accuracy classifiers. It is clear that the population will tend toward
classifiers with greater accuracy [15].
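The one-point crossover in the example above can be sketched directly (in XCS the cut point is chosen at random; here it is fixed at 4 to reproduce the example):

```python
# Sketch of one-point crossover on two classifier conditions, reproducing
# the 10##|11 x #000|1# example. The cut point is random in XCS.

def one_point_crossover(cond1, cond2, point):
    """Swap the tails of two conditions after index `point`."""
    child1 = cond1[:point] + cond2[point:]
    child2 = cond2[:point] + cond1[point:]
    return child1, child2

c1, c2 = one_point_crossover("10##11", "#0001#", 4)
print(c1, c2)  # 10##1# #00011
```

Child 1 has three #'s (more general than either parent, which have two each), while child 2 has only one (more specific than both), illustrating the search along the generality-specificity dimension.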
2.3.4 Deviations from Other LCS-Based Systems
• XCS reproduces classifiers by selecting from the current action set instead of
from the whole population.
• A relative-accuracy-based fitness measures the performance of a classifier.
• Reproduction favors classifiers whose conditions match and that appear more often
in action sets.
• Deletion occurs from the whole population.