Learning Classifier Systems

Links:

http://championship.mql5.com/2010/en/news/21

http://code.google.com/p/betterea/downloads/detail?name=xcs.pdf&can=2&q=

http://code.google.com/p/betterea/downloads/detail?name=Foreign%20Exchange%20Trading%20using%20a%20Learning.pdf&can=2&q=

http://code.google.com/p/betterea/downloads/detail?name=xcslib-1.0.tar.gz&can=2&q=

2.3 XCS Introduction from Stock Trading Perspective

         XCS stands for eXtended Classifier System. It is an accuracy-based classifier system, differing from other classifier systems in that classifier fitness is derived from the estimated accuracy of reward predictions rather than from the reward predictions themselves. XCS is an online learning machine that improves its behavior over time through interaction with its environment. It learns through reinforcement, and its aim is not merely to obtain more reward now but to maximize value (the sum of all rewards in the long run). The system is given the least possible amount of prior information, so that most of its knowledge results from adaptation to the environment. We do not tell it how to do things; instead it learns from the inputs it is fed, the actions it takes, and the reinforcement it receives. If it does well, we give it a positive reward; otherwise it is penalized in some form.

2.3.1 XCS Input and Output

XCS Input Unit: The input to XCS is a binary vector, e.g. 10010110, where each bit can be thought of as indicating whether the continuous-valued output of some sensor has crossed a threshold. In our XCS, every day the system receives the previous day's data for the individual stock: price (open, high, low, and close) and volume. The meta agents present in the system apply technical analysis to this information to generate buy (1) or sell (0) signals. A very simple example is the moving average of price and volume. Say a particular agent calculates the 10-day moving averages of price and volume. If the closing price of the stock is greater than the 10-day moving average of price, this suggests the price may go up, which is a buy (1) signal. Similarly, if the volume of the stock is greater than the 10-day moving average of volume, this also predicts a buy (1) signal. The input string fed into XCS would then be 11. This is a very simple example; in practice the system uses more advanced technical analysis to form the input binary string, which may range from 6 to 9 bits in length. Each bit is either 0 (sell) or 1 (buy). More detail about how the problem is presented to XCS using technical analysis can be found in section 3.1.
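The moving-average agents described above can be sketched as follows. This is an illustrative example, not the system's actual agents: the price and volume series are invented, and `moving_average` and `encode_input` are hypothetical helper names.

```python
# Sketch of the signal-encoding step: two meta agents compare the latest
# close price and volume against their 10-day moving averages and emit
# buy (1) / sell (0) bits, concatenated into the binary input string.

def moving_average(values, window=10):
    """Simple moving average over the last `window` values."""
    return sum(values[-window:]) / window

def encode_input(closes, volumes, window=10):
    """Build the binary input string from the price and volume agents."""
    price_bit = "1" if closes[-1] > moving_average(closes, window) else "0"
    volume_bit = "1" if volumes[-1] > moving_average(volumes, window) else "0"
    return price_bit + volume_bit

# Illustrative 10-day history: both close and volume end above their averages.
closes = [100, 101, 99, 102, 103, 104, 102, 105, 106, 108]
volumes = [5000, 5200, 4800, 5100, 5300, 5000, 4900, 5400, 5600, 6000]
print(encode_input(closes, volumes))  # prints "11"
```

A real run would append one such bit per technical-analysis agent, giving the 6- to 9-bit strings mentioned above.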

XCS Output: The XCS output is a discrete action or decision; in our case it is either 0 (sell) or 1 (buy). The ultimate aim of the XCS learning cycle is to learn what action to take for each particular combination of input bits. Note that XCS uses reinforcement learning rather than supervised learning: at no point do we tell it what is right and what is wrong. It has to discover this through trial and error and the reward mechanism. Depending on the input string, the correct action is sometimes easy to predict. For example, if the input string is 101111, i.e. 5 of the 6 bits suggest buying the stock, then the correct action is most likely to buy. At other times, however, the correct action is not so evident. If the input string is 111000, the proportions of buy and sell signals are equal, and we expect the system to learn what weight is appropriate for each individual signal. Even in real life, on a given day an actual trader may face a situation in which the combination of technical indicators gives no clear buy or sell sign. He or she then has to judge from experience which indicator's information should be given more weight, and decide accordingly.

2.3.2 XCS Framework [15]

         XCS contains a population of classifiers. Each classifier in the population is characterized by 5 main components:

     1) A condition part C, which specifies the problem instances to which the classifier applies.

     2) An action part A, which specifies the action the classifier takes when condition C is fulfilled.

     3) A reward prediction P, which estimates the payoff or reward the classifier can expect on executing the action.

     4) A reward prediction error ε, which estimates the mean absolute difference between P and the actual reward R.

     5) A fitness F, which estimates the scaled, relative accuracy of the classifier with respect to the other overlapping classifiers in the action sets in which it appears.

In short, a classifier is a triple

<Condition> : <Action> => <Prediction>

where the prediction is similar to the payoff in reinforcement learning.

Eg 01#1## : 1 => 693.2

{0 : sell, 1 : buy, # : don't care}

This classifier says: if the first bit is 0, the second is 1, and the fourth is 1 (the remaining bits don't matter), then after taking action 1 (buy), the expected payoff is 693.2. The payoffs are updated as the system learns. The above classifier's condition matches the following 8 input strings, and its prediction estimates the payoff averaged over these cases:

010100

010110                   (there are 8 such cases)

010101

..........

XCS differs from other action-based systems such as neural networks in that, in a neural network, the payoff information for any input is distributed over the whole network. In XCS, each classifier acts on only a subset of problems: it checks whether the given condition is one on which it can act, and if so, it acts on it and predicts a certain payoff.

Figure 1: XCS framework [15]. ([P]: classifier population; [M]: match set; p: predicted value; ε: error in prediction; F: fitness of prediction, proportional to 1/ε.)

2.3.3 XCS Learning Cycle

        At the start, the classifier population [P] is generally empty. The agents use the stock price and volume data to form the input string (for details please see section 3.1). The input is fed into the classifier population [P], which is checked for matches. The 4 classifiers marked with -- in fig. 1 match the input 0011; they are put into a match set [M]. If no classifier matches the given input, XCS creates one through the covering mechanism: a rule is created whose condition matches the input, with a random action and a low initial prediction. A new rule has a certain number of don't-care signs (#) in random positions; the # signs give the classifier an initial generality, so that it can be tested on many input problem instances.

          Covering is necessary only initially; the vast majority of new rules are derived from existing rules.

For example, suppose the input string is 11000101 and no classifier matches this input. A rule such as 1##0010# : 01 => 10 might then be created, with a random action and a low initial prediction (here 10).
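The matching and covering steps can be sketched as follows, assuming the simple string representation used in this chapter. The `matches` and `cover` helper names, the wildcard probability, and the initial prediction value are illustrative assumptions.

```python
# Sketch of matching and covering: conditions are strings over {0, 1, #},
# where # matches either bit. If no classifier in the population matches
# the input, covering creates a new rule whose condition copies the input
# with #s placed at random positions, a random action, and a low prediction.
import random

def matches(condition, inp):
    """A condition matches when every non-# bit equals the input bit."""
    return all(c == "#" or c == b for c, b in zip(condition, inp))

def cover(inp, actions=("0", "1"), p_hash=0.3, init_prediction=10.0):
    """Create a covering classifier for an otherwise unmatched input."""
    condition = "".join("#" if random.random() < p_hash else b for b in inp)
    return {"condition": condition,
            "action": random.choice(actions),
            "prediction": init_prediction}

population = [{"condition": "01#1##", "action": "1", "prediction": 693.2}]
inp = "010110"
match_set = [cl for cl in population if matches(cl["condition"], inp)]
if not match_set:                    # covering only fires on no match
    match_set = [cover(inp)]
print(match_set[0]["condition"])     # prints "01#1##"
```

By construction, a covering rule's condition always matches the input that triggered it, since every non-# position is copied from that input.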

Continuing the process, after creation of the match set, XCS estimates the payoff for each possible action by forming a prediction array P(A). In fig. 1, 2 classifiers in the match set predict action 01 and two predict 11. We take the fitness-weighted average of the predictions for each action:

P(A) = Σ (prediction × fitness) / Σ fitness

Eg P(action = 01) = (43 × 99 + 27 × 3) / (99 + 3) = 42.53

Similarly, P(action = 11) = 16.6

Hence P(A) holds the fitness-weighted average of all reward prediction estimates of the classifiers in [M] that advocate classification A. The system follows an ε-greedy policy: it takes the best action most of the time, but with a small probability ε (the exploration probability) it takes a suboptimal action, chosen at random from those in the prediction array. All classifiers in the match set [M] that specify the chosen action A form the action set [A]. In fig. 1, the action with maximum prediction, i.e. 01, was chosen, and the 2 classifiers having this action are put in the action set. The system executes the prescribed action. The next day, the correctness of the action is judged by the stock price movement: every correct action receives a reward of 1000, and every wrong action a reward of 0. This differs from the usual reinforcement learning methodology in that no negative reward is given for an incorrect action. For example, suppose the system predicts a rise in the price of the stock and buys the share; if the price goes up the next day, a reward of 1000 is given. This reward is used to update the parameters of the classifiers in the action set [A].
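The prediction-array and ε-greedy steps above can be sketched as follows, reusing the figure's numbers (predictions 43 and 27 with fitnesses 99 and 3 for action 01). The third classifier's fitness and the ε value are invented for illustration.

```python
# Sketch of prediction-array formation and epsilon-greedy action selection.
import random

def prediction_array(match_set):
    """Fitness-weighted average prediction for each action in [M]."""
    pa = {}
    for action in {cl["action"] for cl in match_set}:
        cls = [cl for cl in match_set if cl["action"] == action]
        num = sum(cl["prediction"] * cl["fitness"] for cl in cls)
        den = sum(cl["fitness"] for cl in cls)
        pa[action] = num / den
    return pa

def select_action(pa, epsilon=0.1):
    """Best action most of the time; random action with probability epsilon."""
    if random.random() < epsilon:
        return random.choice(list(pa))
    return max(pa, key=pa.get)

match_set = [
    {"action": "01", "prediction": 43.0, "fitness": 99.0},
    {"action": "01", "prediction": 27.0, "fitness": 3.0},
    {"action": "11", "prediction": 16.6, "fitness": 50.0},
]
pa = prediction_array(match_set)
print(round(pa["01"], 2))  # (43*99 + 27*3) / (99 + 3) = 42.53
```

The classifiers sharing the selected action would then form the action set [A] and receive the reward update the next day.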

2.3.3.1 Updating XCS Parameters

        Initially, on creation, a classifier is given a very low prediction value. After the reward for the executed action is received, its parameters are updated as follows:

Prediction:          Pj ← Pj + α(R − Pj)

where α is the learning rate (~0.2). If R > Pj then Pj is increased, i.e. the classifier's prediction goes up. As can be seen, if this particular classifier is updated many times, Pj tends toward R, i.e. the predicted value tends toward the actual return from the process.

Similarly, the other parameters are updated as:

Error:               εj ← εj + α(|R − Pj| − εj)

Accuracy:            κj = (εj / ε0)^(−n)  if εj > ε0,  else κj = 1

Relative accuracy:   κj' = κj / Σ κj   (summed over [A])

The relative accuracy shows the accuracy of a classifier relative to the other classifiers in the action set.

Fitness:             Fj ← Fj + α(κj' − Fj)

The fitness of a classifier is thus an estimate of its accuracy with respect to the accuracies of the other classifiers in the action sets in which it occurs.
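A minimal sketch of these updates follows. The parameter values (α = 0.2, error threshold ε0 = 10, exponent n = 5) and the two-classifier action set are assumptions for illustration only.

```python
# Sketch of the parameter updates for classifiers in the action set [A]:
# error and prediction move toward the received reward, then fitness moves
# toward the classifier's relative accuracy within [A].

ALPHA, EPS0, N = 0.2, 10.0, 5.0   # assumed parameter values

def update_classifier(cl, reward):
    """Update error (using the old prediction) and then the prediction."""
    cl["error"] += ALPHA * (abs(reward - cl["prediction"]) - cl["error"])
    cl["prediction"] += ALPHA * (reward - cl["prediction"])

def accuracy(cl):
    """kappa = (eps/eps0)^(-n) if eps > eps0, else 1."""
    return (cl["error"] / EPS0) ** -N if cl["error"] > EPS0 else 1.0

def update_fitness(action_set):
    """Fitness moves toward the classifier's relative accuracy in [A]."""
    total = sum(accuracy(cl) for cl in action_set)
    for cl in action_set:
        cl["fitness"] += ALPHA * (accuracy(cl) / total - cl["fitness"])

action_set = [{"prediction": 500.0, "error": 50.0, "fitness": 0.5},
              {"prediction": 900.0, "error": 5.0, "fitness": 0.5}]
for cl in action_set:
    update_classifier(cl, reward=1000.0)  # correct action -> reward 1000
update_fitness(action_set)
# The low-error classifier ends with the higher fitness.
```

Repeated updates drive each prediction toward the average reward the classifier actually receives, while fitness concentrates on the classifiers whose predictions are most accurate.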

2.3.3.2 Genetic Algorithm role and rule evolution[15]

XCS applies a genetic algorithm for rule evolution. If the average time since the GA was last applied exceeds a certain threshold, genetic reproduction is invoked in the current action set [A]. The GA selects 2 parental classifiers based on their relative fitness in [A]. Two offspring are generated by reproducing the parents and applying crossover and mutation. Parents and offspring both compete in the same population [P]. Niche mutation is applied to the classifiers, which means that the mutated classifiers still match the current problem instance, i.e. the input binary string on which they were able to act previously. If an offspring's condition is subsumed by some other classifier, it is not inserted into the population; instead, only the numerosity of the subsuming classifier is increased by 1. The classifier population size is fixed, and deletion is performed if it is overpopulated. Excess classifiers are deleted from [P] with probability proportional to the estimated size of the action sets in which they occur; more experienced classifiers with low fitness have a higher probability of deletion [15]. For more information, readers are encouraged to read chapter 4 of Martin Butz's book.

        Classifiers that are more general will more often be part of an action set, thus undergo more reproduction events, and thus propagate faster. The GA process is therefore expected to evolve an accurate, maximally general solution as the final outcome.

For example, the classifiers below undergo crossover (at the point marked |) to give the offspring on the right-hand side:

10##|11 : 1        →        10##1# : 1                -----(1)

#000|1# : 2        →        #00011 : 2                ------(2)

Note the results of the crossing:

        a classifier (1) which is more general than both parents, and

        a classifier (2) which is more specific than both.

A more specific classifier can never be less accurate. The outcome is not always this clean, but on balance the process tends to search along the generality-specificity dimension, using pieces of existing higher-accuracy classifiers. It is clear that the population will tend toward classifiers with greater accuracy [15].
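The crossover example above can be sketched as one-point crossover on the condition strings. The cut point is fixed here for illustration; XCS chooses it at random.

```python
# Sketch of one-point crossover on classifier conditions, reproducing the
# example above: crossing 10##|11 with #000|1# at the marked point yields
# offspring 10##1# (more general than either parent) and #00011 (more
# specific than either parent).

def one_point_crossover(cond1, cond2, point):
    """Swap the tails of two parent conditions after `point`."""
    child1 = cond1[:point] + cond2[point:]
    child2 = cond2[:point] + cond1[point:]
    return child1, child2

c1, c2 = one_point_crossover("10##11", "#0001#", point=4)
print(c1, c2)  # prints "10##1# #00011"
```

Each offspring keeps one parent's head and the other parent's tail, which is how crossover recombines pieces of existing higher-accuracy classifiers.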

2.3.4 Deviations from other LCS-based Systems

    •   XCS reproduces classifiers by selecting from the current action set instead of from the whole population.

    •   A relative-accuracy-based fitness measures the performance of a classifier.

    •   Reproduction favors classifiers whose conditions match and that appear more often in the action set.

    •   Deletion occurs from the whole population.