An Introduction to Learning Classifier Systems

by
Dayle Majors
Abstract
This paper discusses research in learning classifier systems. It introduces the systems and then
surveys some current research to show the directions in which the field is now headed.
Keywords: Genetic Algorithms, Learning Classifiers, Reinforcement Learning, Fuzzy
Logic, XCS system
Introduction
In his book [Hof79], Douglas Hofstadter made a case for learning arising from
interactions with the environment, especially in artificial intelligence. His research
reported in [Hof95] expanded on that theme with discussions of computer models of
thought processes. Learning Classifier Systems are a genetic algorithm approach to
learning arising from interactions with the environment.
Learning is acquiring knowledge about an environment or situation so that one can
accomplish a task or respond to the situation. Most humans spend many years at home
and in school learning how to live in the world. Machines do not normally exhibit
knowledge; interactions with machines require very explicit instructions. For some time,
scientists have dreamed of having machines that could react to the environment much
as we humans do. This, they believe, would make machines easier to work with.
Learning from the environment is studied as reinforcement learning for both real and
artificial entities. An entity is given some learning task, and when it completes the task,
the entity is reinforced by some reward. Punishment may be viewed as a negative reward
and may or may not be given, depending on the particular learning experiment. The
learning task may be short, with frequent feedback, or it may be long, as in playing games
such as checkers or chess, where many moves are made before feedback is received. As
we will see, Learning Classifier Systems fit into this reinforcement learning research
model.
This paper will discuss Learning Classifier Systems: their origins, structure, and
functionality, and some of the research issues that surround them. After laying the
foundation by describing the structure and how knowledge is represented, the paper will
discuss some of the issues surrounding their use. Next, the paper will discuss how
research in the past decade has modified components of the structure in an attempt to
improve the learning embodied in the model. Finally, the paper will draw some
conclusions regarding research in learning classifier systems.
Historical Perspective
What is a Learning Classifier System?
The learning classifier system was first introduced by John H. Holland in a 1975 book
[Hol75, Hol92]. In this computational model, a learning classifier system has three major
components: a performance system, an apportionment of credit procedure, and a rule
discovery system. See Figure 1 below:
[Figure 1 shows the learning classifier system interacting with its environment through
sensors (input), effectors (output), and a reward signal. Within the system, rules and
messages pass through rule conflict resolution in the performance system, supported by
the apportionment of credit system and the rule discovery system.]
Figure 1: Components of a Learning Classifier System
The performance system consists of a message list and a collection of rules. The rules
represent the knowledge base of the system. At any time during processing, the collection
of rules is the system's best approximation of the knowledge it has acquired.
However, all communication within the system involves messages. Input from the
environment is formatted by the sensors into messages and placed on the message list.
The rules specify conditions for their execution and produce messages, or perhaps output
in the case of effectors. The rules may be thought of as if-then statements.
An input message is processed by matching it against the condition portion of the rules,
including effector conditions. That is, an effector appears in the rule collection just as
other rules do; the difference is that it produces output rather than another message. One
or more rules may match the message. One is selected, and if it is not an effector, it posts
a new message to the list. This may be repeated several times before a message placed on
the list matches an effector's condition.
The rule discovery system is much like a basic genetic algorithm. In classical learning
classifier systems, the if-then rules are represented as fixed length strings over some small
alphabet. Holland used a three character alphabet consisting of 0, 1, and # where #
denotes “don’t care” in the if portion of the rule and “copy input” in the then part of the
rule.
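As an illustrative sketch (the rule strings and messages here are made up), the match and copy-input semantics of the ternary alphabet can be written as:

```python
def matches(condition: str, message: str) -> bool:
    """A condition matches when every position is '#' or equals the message bit."""
    return all(c == '#' or c == m for c, m in zip(condition, message))

def fire(condition: str, action: str, message: str) -> str:
    """Produce the output message; a '#' in the action copies the input bit."""
    return ''.join(m if a == '#' else a for a, m in zip(action, message))

print(matches("1#0#", "1101"))        # True: positions 1 and 3 agree
print(fire("1#0#", "01##", "1101"))   # '0101': the trailing bits are copied
```

Fixed-length strings over this small alphabet are what allow the standard genetic operators (crossover, mutation) to be applied directly to the rules.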
The fitness used in the genetic algorithm is not a direct function of the rule but rather the
result of the apportionment of credit procedure. The apportionment of credit procedure
described by Holland has become known as the “bucket brigade” algorithm because of
the way credit is passed back through the collection of rules. The following is the basic
algorithm using the bucket brigade algorithm:
0. Set an initial strength for all classifiers (rules). Clear the list and append all input
messages to it. Mark all classifiers as not active.
1. Mark as active all classifiers that match messages on the list, then clear the list.
2. For each active classifier, calculate a bid quantity.
3. Stochastically, with probability related to the bid quantity, choose classifiers to
add new messages to the list. (Each message is tagged with the classifier that
added it, to make step 4 possible.)
4. Each classifier that has successfully posted a new message pays the classifier that
posted the message that activated it a payment quantity, which is a
function of its strength and specificity (number of specific conditions).
5. Mark all classifiers as not active.
6. Add new environmental messages to the list and repeat the process starting at
step 1.
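The cycle above can be sketched in code. This is a simplified sketch under several stated assumptions: single-condition classifiers, a made-up bid ratio of 0.1, and a deterministic highest-bidder selection standing in for Holland's stochastic choice; copy-input action semantics and payment details are omitted.

```python
class Classifier:
    def __init__(self, condition, action, strength=100.0):
        self.condition, self.action, self.strength = condition, action, strength

    def matches(self, msg):
        return all(c == '#' or c == m for c, m in zip(self.condition, msg))

    def bid(self):
        # the bid rises with strength and with specificity (step 2)
        specificity = sum(c != '#' for c in self.condition) / len(self.condition)
        return 0.1 * specificity * self.strength

def cycle(classifiers, messages):
    """One pass of steps 1-5: match, bid, post the winner's action,
    and pass the payment back to the classifier that sent the message."""
    new_messages = []
    for msg, sender in messages:        # sender is None for environment input
        matchers = [c for c in classifiers if c.matches(msg)]
        if not matchers:
            continue
        winner = max(matchers, key=lambda c: c.bid())   # deterministic stand-in
        payment = winner.bid()
        winner.strength -= payment      # the winner pays its bid (step 4)...
        if sender is not None:
            sender.strength += payment  # ...to the rule that activated it
        new_messages.append((winner.action, winner))  # tag with poster (step 3)
    return new_messages

a = Classifier("00", "11")
b = Classifier("11", "00")
msgs = cycle([a, b], [("00", None)])   # environment posts "00"; a wins and pays
msgs = cycle([a, b], msgs)             # b matches a's message and pays a back
print(a.strength, b.strength)          # 100.0 90.0
```

The two-cycle run shows the "bucket" being passed: rule a's bid comes back to it once rule b fires on a's message, so credit flows backward along the activation chain.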
[Figure 2 shows a chain of rules: each rule places a bid, and a winning rule pays out to
the rule whose message activated it, passing credit back along the chain.]
Figure 2: Bid and payout flow in the bucket brigade
The strength of each classifier is updated on each cycle by adding any payment it
receives and subtracting what it pays out. If the payments received over several cycles
exceed the cost of posting messages, the rule accrues the excess as additional strength.
If they do not, the rule loses value. It is this value that is
used in the rule discovery system as the fitness of a given rule. Rules with low value will
cease to be able to win bids to post messages and ultimately will be discarded.
The representation of knowledge is the collection of rules in the system. Since it may
take several cycles of selecting rules and distributing payoff before the quality of the
rules is well enough established to be used reasonably in a genetic algorithm, the genetic
algorithm operates only after some number of cycles. This number has been mentioned
as being in the range of 50 to 100 [Hol92, Gol89]. In addition, the genetic algorithm was
described as stable, with only 5 to 10% of the population of rules being replaced on a
given reproductive cycle.
Where do Learning Classifiers fit?
Learning classifiers fit into the machine learning research in the general classification of
reinforcement learning methods. CN2 developed by Clark and Niblett [Cla89] and AQ
developed by Michelski [Mic78] are other reinforcement learning methods. Some of the
research in the last decade has compared the efficiency of learning classifiers systems to
these and other methods. .
So how do learning classifier systems differ from these other reinforcement learning
methods? Reinforcement learning methods develop descriptions of the problem by
“learning” rules that correctly classify information from the environment. The system can
then use those rules to react to the environment. The rules create a map from the
environmental input to some set of classes for the data. Most of the reinforcement
machine learning algorithms produce a homomorphism from the input to the classes of
data. The classes of data can be thought of as a model of the environment. A
homomorphism is a complete map in the sense that for any given input there is one
unique output class.
Learning classifier systems do not necessarily create a complete map in quite the same
way. Given that the credit assignment procedure is working appropriately, a hierarchy of
rules may be developed that will map the input to the correct output class but some
situations will be covered by “don’t care” conditions. As a result the classification map
will not be complete. This type of map is called a quasi-homomorphism. Such
representations of the input are usually smaller than the complete map produced by most
other reinforcement learning methods.
Goldberg [Gol89] discusses the necessary conditions for the creation of default
hierarchies as described above. He gives an implementation of a learning classifier
system in Pascal and uses that code sample to explain the relationship of the various
components of a learning classifier system. He details how default hierarchies are
encouraged to develop and how they are maintained. However, there are some problems
with maintaining default hierarchies.
Recent Research
Introduction
To give a flavor of current research topics in learning classifier systems, several research
papers will be reviewed. These are not a comprehensive survey of current research but
rather a few topics chosen to indicate the breadth of the field. Default hierarchies were
discussed by Goldberg [Gol89] but are still an active area. The XCS system discussed
below is an approach to learning classifier systems based on Q-learning. Fuzzy learning
classifiers are an approach based on fuzzy logic, where the boundaries of sets are not
sharp as they are in classical logic. Together, these approaches illustrate how diverse
current research in learning classifier systems is.
Experiments with Default Hierarchies
In Dorico’s paper [Dor91] is a discussion of default hierarchies. These are collections of
rules were there exist a default rule and other rules specify conditions more specific than
the default rule and produce actions that differ from the default rule. Consider the set of
rules: 00 → 0; 01 → 1; 10 → 1; 11 → 1. These four rules implement the logical function
or. A default hierarchy that represents the same function is: 00 → 0; ** → 1. That is if
the input is 00 then output is 0 else output is 1. Note that the default hierarchy captures
the information in two rules where the complete map contains four rules.
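The OR example above can be sketched directly, assuming a match procedure that tries rules in order from most to least specific so that the specific rule overrides the default:

```python
def matches(condition, message):
    """'*' is a wildcard; other positions must equal the message bit."""
    return all(c == '*' or c == m for c, m in zip(condition, message))

def classify(rules, message):
    """Try rules from most to least specific; the first match fires."""
    for condition, action in rules:
        if matches(condition, message):
            return action
    return None

hierarchy = [("00", "0"), ("**", "1")]   # specific rule listed first
for msg in ("00", "01", "10", "11"):
    print(msg, "->", classify(hierarchy, msg))   # only 00 maps to 0: logical OR
```

The ordering carries the hierarchy: remove the first rule and the default alone maps everything to 1.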
Dorigo explains how default hierarchies form in traditional learning classifier systems
but notes that when the default rule gets used much more often than the more specific
rules, the specific rules will tend to lose strength over time. This results in periodically
poor performance for the system. He suggests that if strength is associated with the
output message rather than the rule itself, the oscillation can be avoided. He gives
specific equations for implementing this approach to the bucket brigade algorithm, which
he calls the message-based bucket brigade.
Dorigo associates a strength, Mc, with each message that is the output of some classifier
in the current set of classifiers. The bid is calculated as B = Mc + αMp1 + βMp2, where Mp1
and Mp2 are the strengths of the messages that matched the first and second parts of the
classifier's condition. The weights α and β are the respective specificities of those
matches. A winning message pays an amount proportional to its strength to the messages
that invoked it. The message's strength will increase or decrease over time depending on
whether or not its income exceeds its payout. The paper is not specific about how the
message strength affects the genetic algorithm.
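The bid formula can be sketched as follows, assuming specificity is measured as the fraction of non-wildcard positions in a condition; the condition strings and strengths are illustrative only.

```python
def specificity(condition: str) -> float:
    """Fraction of non-wildcard positions in a condition string."""
    return sum(c != '#' for c in condition) / len(condition)

def bid(Mc, cond1, Mp1, cond2, Mp2):
    """Message-based bid B = Mc + a*Mp1 + b*Mp2, where the weights a and b
    are the specificities of the conditions the two messages matched."""
    return Mc + specificity(cond1) * Mp1 + specificity(cond2) * Mp2

print(bid(5.0, "1#0#", 8.0, "##11", 4.0))   # 5 + 0.5*8 + 0.5*4 = 11.0
```

Because the strengths Mc, Mp1, and Mp2 belong to messages rather than rules, a specific rule's contribution is not eroded merely because the default rule fires more often.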
The XCS system
Stewart W. Wilson introduced a new form of learning classifier system in 1995 [Wil95].
As discussed earlier, the bucket brigade distributes strength based on the usefulness of
the rules to the system. This new system, called XCS, bases the strength of a rule on the
accuracy of the rule's predictions. In [Wil98] Wilson discusses the current state of
research using this model. Basically, XCS works best in Markov decision environments.
However, he discusses how XCS can be modified to work in non-Markov environments.
He notes that an environment may be non-Markov simply because the sensors are not
able to distinguish all of the states through which the system passes. Note that a system is
Markov if a finite history is able to distinguish the states, that is, if a finite sequence of
states determines the next correct action.
Wilson describes his system as developing a map from the Cartesian product of all states
and all actions to the reward or payoff. While his classifiers may use defaults to capture
the state, these generalizations should be correct to properly represent it. By keeping two
counts, one for correctly predicted payoffs and one for incorrect predictions, one can
assign a strength value for use in the genetic algorithm for rule discovery: the correct
count divided by the sum of the two counts. Thus a classifier that correctly predicts the
payoff would be stronger than one that predicts a high payoff incorrectly. (One of the
problems discussed by several of the authors in [Hol98] was over-general classifiers
becoming too strong and pushing out more accurate classifiers.)
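The counting scheme outlined above can be sketched as follows. This is only the two-count idea as the text describes it, not Wilson's actual XCS fitness, which is computed from prediction error relative to the other classifiers in an action set; the class name and tolerance are assumptions.

```python
class AccuracyClassifier:
    """Tracks how often a classifier's payoff prediction was right."""
    def __init__(self):
        self.correct = 0      # times the predicted payoff matched the actual payoff
        self.incorrect = 0    # times it did not

    def record(self, predicted: float, actual: float, tol: float = 1e-6):
        if abs(predicted - actual) <= tol:
            self.correct += 1
        else:
            self.incorrect += 1

    def strength(self) -> float:
        """Accuracy-based strength: correct / (correct + incorrect)."""
        total = self.correct + self.incorrect
        return self.correct / total if total else 0.0

c = AccuracyClassifier()
for predicted, actual in [(10, 10), (10, 10), (10, 3)]:
    c.record(predicted, actual)
print(round(c.strength(), 3))   # 2 correct of 3 -> 0.667
```

Under this measure a rule that reliably predicts a small payoff outranks one that occasionally predicts a large payoff, which is the point of accuracy-based fitness.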
XCS is an attempt to avoid some of the stability issues experienced with older
learning classifier systems. The development of stable default hierarchies is still very
much an open question. Some of the authors, especially Wilson, believe that the small
rule set developed in default hierarchies is more of a liability than a benefit. However, in
situations of delayed and sparse feedback, Wilson's XCS has difficulty.
Fuzzy Systems
In [Bon98], a learning classifier system based on fuzzy logic is discussed. Fuzzy logic
uses real numbers to represent the truth value associated with a variable. In classical
logic the truth value is either zero or one; in fuzzy logic the variable can take on any
value from zero to one. People working with fuzzy logic have worked out how to
combine these values in a way consistent with classical logic. Basically, the truth value is
thought of as a set membership function. If the value is one, the point is in the set. If the
value is zero, the point is not in the set. Otherwise, the value is between zero and one and
can be interpreted as the degree to which the variable belongs to the set.
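The standard min/max/complement combination rules mentioned above can be sketched directly; at the endpoints 0 and 1 they reduce to classical logic.

```python
def fuzzy_and(a: float, b: float) -> float:
    return min(a, b)

def fuzzy_or(a: float, b: float) -> float:
    return max(a, b)

def fuzzy_not(a: float) -> float:
    return 1.0 - a

print(fuzzy_and(1.0, 0.0))   # 0.0, as in classical AND
print(fuzzy_or(0.7, 0.3))    # 0.7: partial membership is preserved
print(fuzzy_not(0.2))        # 0.8
```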
The sensors produce real-valued fuzzy logic values, and the rules produce real-valued
vectors rather than Boolean strings. That implies that the rule discovery system will be
similar to an evolution strategy: the crossover and mutation operators deal with
real-valued vectors in the rule discovery algorithm. Matching of rules to messages is
performed using fuzzy logic and a concept of closeness. The selection is then made from
the matched rules stochastically, as in regular learning classifier systems.
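One way closeness-based matching might look is sketched below; the Gaussian-style degree function and the width parameter are assumptions for illustration, since [Bon98] defines matching through its own membership functions.

```python
import math

def match_degree(rule_center, message, width=0.5):
    """Closeness in [0, 1]: 1.0 when the message sits exactly on the
    rule's center, decreasing with squared Euclidean distance."""
    d2 = sum((r - m) ** 2 for r, m in zip(rule_center, message))
    return math.exp(-d2 / (2 * width ** 2))

print(match_degree([0.5, 0.5], [0.5, 0.5]))            # 1.0: exact match
print(round(match_degree([0.5, 0.5], [0.9, 0.1]), 3))  # lower degree farther away
```

A graded match degree like this, rather than a yes/no match, is what lets stochastic selection weight nearby rules more heavily.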
Conclusions
Learning classifier systems have been presented along with several of the research
developments of the last decade. Default hierarchies, XCS, and fuzzy-logic-based
classifiers are three of the current learning classifier system research areas. There are
several other areas of research discussed in [Lan98]. With such diverse areas of research,
learning classifier systems will continue to be an active and dynamic field for years to
come.
However, this is but one of the approaches to machine learning. The desire to have
machines that can learn and interact with humans in a more “natural” way will continue
to be investigated. What will eventually prove to be a viable approach to developing
such machines is as yet unknown. Learning classifier systems are certainly one of the
approaches that hold promise.
Bibliography
[Cla89] The CN2 Induction Algorithm. Peter Clark and Tim Niblett, Machine
Learning, 3(4):261-283, 1989.
[Dor91] New Perspectives about Default Hierarchies Formation in Learning
Classifier Systems. Marco Dorigo, Proceedings of the II Italian Congress on Artificial
Intelligence, Palermo, Italy, E. Ardizzone, S. Gaglio and F. Sorbello (Eds.), Springer-Verlag, Berlin, 218-227, 1991.
[Gol89] Genetic Algorithms in Search, Optimization, and Machine Learning. David E.
Goldberg, Addison-Wesley, Reading, Mass., 1989
[Hof79] Gödel, Escher, Bach: an Eternal Golden Braid. Douglas R. Hofstadter, Basic
Books, New York, 1979
[Hof95] Fluid Concepts and Creative Analogies: Computer Models of the Fundamental
Mechanisms of Thought. Douglas R. Hofstadter and the Fluid Analogies Research Group,
Basic Books, New York, 1995
[Hol75, Hol92] Adaptation in Natural and Artificial Systems. John H. Holland, University
of Michigan Press, Ann Arbor, 1975. Republished by MIT Press, 1992
[Hol98] What Is a Learning Classifier System? John H. Holland, et al, in [Lan98]
[Lan98] Learning Classifier Systems from Foundations to Applications Pier L. Lanzi,
Wolfgang Stoltzmann, Stewart W. Wilson, eds. Springer-Verlag, New York, 1998
[Mic78] Selection of Most Representative Training Examples and Incremental
Generation of VL1 Hypotheses: An Underlying Methodology and a Description of
Programs ESEL and AQ11, Report No. 867, R. S. Michalski and J. B. Larson,
Department of Computer Science, University of Illinois, Urbana, 1978.
[Wil95] Classifier Fitness Based on Accuracy, Stewart W. Wilson, Evolutionary
Computation, 3(2):149-175, 1995
[Wil98] State of XCS Classifier System Research, Stewart W. Wilson, in [Lan98]