An Introduction to Learning Classifier Systems

Dayle Majors

Abstract

This paper discusses research on learning classifier systems. It covers their introduction and then continues with a discussion of some current research to show the directions in which that research is now headed.

Keywords: Genetic Algorithms, Learning Classifiers, Reinforcement Learning, Fuzzy Logic, XCS system

Introduction

In his book [Hof79], Douglas Hofstadter made a case for learning arising from interactions with the environment, especially in artificial intelligence. His research reported in [Hof95] expanded on that theme with discussions of computer models of thought processes. Learning Classifier Systems are a genetic algorithm approach to learning arising from interactions with the environment.

Learning is acquiring knowledge about an environment or situation so that one can accomplish a task or respond to the situation. Most humans spend many years at home and in school learning how to live in the world. Machines do not normally exhibit knowledge; interactions with machines require very explicit instructions. For some time, scientists have dreamed of having machines that could react to the environment much as we humans do. This, they believe, would make machines easier to work with.

Learning from the environment is studied as reinforcement learning, for both real and artificial entities. An entity is given some learning task, and when it completes the task the entity is reinforced by some reward. Punishment may be viewed as a negative reward and may or may not be given, depending on the particular learning experiment. The learning task may be short, with frequent feedback, or it may be long, as in playing games such as checkers or chess, where many moves are made before feedback is received. As we will see, Learning Classifier Systems fit into this reinforcement learning research model.

This paper will discuss Learning Classifier Systems: their origins, structure, and functionality, and some of the research issues that surround them. After laying the foundation by describing the structure and how knowledge is represented, the paper will discuss some of the issues surrounding their use. Next it will discuss how research in the past decade has modified components of the structure in an attempt to improve the learning embodied in the model. Finally it will draw some conclusions regarding the research in learning classifier systems.

Historical Perspective

What is a Learning Classifier System?

The learning classifier system was first introduced by John H. Holland in a book in 1975 [Hol75, Hol92]. In this computational model, a learning classifier system has three major components: a performance system, an apportionment of credit procedure, and a rule discovery system. See Figure 1.

[Figure 1: Components of a Learning Classifier System. The performance system, holding the rules and messages, connects to the environment through sensors and effectors; reward from the environment feeds the apportionment of credit system, which together with conflict resolution and the rule discovery system modifies the rule collection.]

The performance system consists of a message list and a collection of rules. The rules represent the knowledge base of the system: at any time during processing, the collection of rules is the system's best approximation to the knowledge needed for its task. However, all communication within the system involves messages. Input from the environment is formatted by the sensors into messages and placed on the message list. The rules specify conditions for their execution and produce either messages or, in the case of effectors, output to the environment. A rule may be thought of as an if-then statement. An input message is processed by matching it against the condition portion of the rules, including effector conditions; an effector appears in the rule collection like any other rule, the difference being that it produces output rather than another message. One or more rules may match a message. One is selected, and if it is not an effector it posts a new message to the list. This may be repeated several times before a message placed on the list matches an effector's condition.
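This match-and-post cycle can be made concrete with a small sketch. The following is a minimal illustration, not Holland's implementation: it assumes the ternary {0, 1, #} condition encoding introduced in the next section, and the particular rules, message strings, and function names are hypothetical.

```python
# Minimal sketch of the performance-system cycle (illustrative only).
# A condition is a string over {0, 1, #}; '#' matches either input bit.

import random

def matches(condition: str, message: str) -> bool:
    """A condition matches a message if every non-# position agrees."""
    return all(c == '#' or c == m for c, m in zip(condition, message))

# Hypothetical rules: (condition, action, is_effector).
# Non-effectors post their action as a new internal message;
# effectors send their action to the environment instead.
rules = [
    ("1#0", "011", False),
    ("011", "110", False),
    ("11#", "act", True),
]

def run_cycle(message: str, max_steps: int = 10) -> str:
    """Repeatedly match and post until an effector fires."""
    for _ in range(max_steps):
        matched = [r for r in rules if matches(r[0], message)]
        if not matched:
            return "no action"
        cond, action, is_effector = random.choice(matched)  # conflict resolution
        if is_effector:
            return action          # output to the environment
        message = action           # post a new internal message
    return "no action"

print(run_cycle("100"))
```

Conflict resolution here is a uniform random choice among the matching rules; in a real classifier system the choice is driven by the bid competition described next.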
The rule discovery system is much like a basic genetic algorithm. In classical learning classifier systems, the if-then rules are represented as fixed-length strings over some small alphabet. Holland used a three-character alphabet consisting of 0, 1, and #, where # denotes "don't care" in the if portion of a rule and "copy input" in the then portion. The fitness used in the genetic algorithm is not a direct function of the rule but rather the result of the apportionment of credit procedure.

The apportionment of credit procedure described by Holland has become known as the "bucket brigade" algorithm because of the way credit is passed back through the collection of rules. The basic algorithm is:

0. Assign an initial strength to all classifiers (rules). Clear the message list and append all input messages to it. Mark all classifiers as not active.
1. Mark as active all classifiers that match messages on the list, then clear the list.
2. For each active classifier, calculate a bid quantity.
3. Stochastically choose classifiers to add new messages to the list, with probability related to the bid quantity. (Each message is tagged with the classifier that added it, to make step 4 possible.)
4. Each classifier that has successfully posted a new message pays a payment quantity, a function of its strength and specificity (the number of specific conditions), to the classifier that posted the message that activated it.
5. Mark all classifiers as not active.
6. Add new environmental messages to the list and repeat from step 1.

[Figure 2: Bids and payouts passing along a chain of rules in the bucket brigade.]

The strength of each classifier is updated on each cycle by adding any payment it receives and subtracting what it pays out. If the payments received over several cycles exceed the cost of posting messages, the rule accrues the excess as additional strength; if not, the rule loses strength. It is this strength that is used as the fitness of a rule in the rule discovery system. Rules with low strength cease to be able to win bids to post messages and are ultimately discarded. The knowledge of the system is represented by its collection of rules.
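The credit flow of steps 2 through 4 can be sketched in a few lines. This is an illustrative simplification rather than Holland's exact formulation: the bid is assumed proportional to strength times specificity, the coefficient and the two-rule chain are hypothetical, and refinements such as taxes are omitted.

```python
# Sketch of one bucket-brigade payment step (illustrative simplification).
# bid = K * strength * specificity; the winner pays its bid to the rule
# that posted the message which activated it.

from typing import Optional

K = 0.1  # assumed bid coefficient

def specificity(condition: str) -> float:
    """Fraction of non-# positions in the condition."""
    return sum(c != '#' for c in condition) / len(condition)

def bid(rule: dict) -> float:
    return K * rule["strength"] * specificity(rule["condition"])

def pay(winner: dict, supplier: Optional[dict]) -> None:
    """The winner pays its bid; the supplier (the rule whose message
    activated the winner) receives it. Environmental messages have no
    supplier, so in that case the bid is simply deducted."""
    amount = bid(winner)
    winner["strength"] -= amount
    if supplier is not None:
        supplier["strength"] += amount

# Hypothetical two-rule chain: r1's message activated r2.
r1 = {"condition": "01#", "strength": 10.0}
r2 = {"condition": "011", "strength": 10.0}
pay(r2, r1)
print(r1["strength"], r2["strength"])  # r1 gains exactly what r2 bid
```

Over many cycles, rules sitting on chains that end in environmental reward recoup more than they pay out, which is the sense in which strength comes to measure usefulness.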
Since it may take several cycles of selecting rules and distributing payoff before the quality of the rules is well enough established to be used reliably in a genetic algorithm, the genetic algorithm only operates after some number of cycles. This number has been mentioned as being in the range of 50 to 100 [Hol92, Gol89]. In addition, the genetic algorithms were described as stable, with only 5 to 10% of the population of rules being replaced on a given reproductive cycle.

Where do Learning Classifiers fit?

Learning classifier systems fall within machine learning research in the general class of reinforcement learning methods. CN2, developed by Clark and Niblett [Cla89], and AQ, developed by Michalski [Mic78], are other reinforcement learning methods. Some of the research in the last decade has compared the efficiency of learning classifier systems to these and other methods.

So how do learning classifier systems differ from these other reinforcement learning methods? Reinforcement learning methods develop descriptions of the problem by "learning" rules that correctly classify information from the environment. The system can then use those rules to react to the environment. The rules create a map from the environmental input to some set of classes for the data, and those classes can be thought of as a model of the environment. Most reinforcement machine learning algorithms produce a homomorphism from the input to the classes of data: a complete map, in the sense that for any given input there is one unique output class. Learning classifier systems do not necessarily create a complete map in quite the same way. Given that the credit assignment procedure is working appropriately, a hierarchy of rules may be developed that maps the input to the correct output class, but some situations will be covered by "don't care" conditions. As a result the classification map will not be complete. This type of map is called a quasi-homomorphism. Such representations of the input are usually smaller than the complete maps produced by most other reinforcement learning methods.

Goldberg [Gol89] discusses the necessary conditions for the creation of default hierarchies as described above. He gives an implementation of a learning classifier system in Pascal and uses that code to explain the relationship of the various components of a learning classifier system. He details how default hierarchies are encouraged to develop and be maintained. However, there are some problems in maintaining default hierarchies.

Recent Research

Introduction

To give a flavor of current research topics in learning classifier systems, several research papers will be reviewed. These are not a comprehensive list of current research but rather a few topics chosen to indicate the breadth of the research. Default hierarchies were discussed by Goldberg [Gol89] but are still an active area. The XCS system discussed below is an approach to learning classifiers based on Q-learning. Fuzzy learning classifiers are an approach based on fuzzy logic, where the boundaries of sets are not sharp as they are in classical logic. These are diverse approaches to classification and learning, as is current research in learning classifier systems generally.

Experiments with Default Hierarchies

Dorigo's paper [Dor91] discusses default hierarchies. These are collections of rules in which there is a default rule together with other rules that specify conditions more specific than the default rule and produce actions that differ from it. Consider the set of rules: 00 → 0; 01 → 1; 10 → 1; 11 → 1. These four rules implement the logical OR function. A default hierarchy that represents the same function is: 00 → 0; ## → 1. That is, if the input is 00 the output is 0, otherwise the output is 1. Note that the default hierarchy captures in two rules the information that the complete map holds in four.
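A default hierarchy only pays off if the specific rule reliably beats the default when both match. The toy sketch below hard-wires that preference through specificity-based conflict resolution; it is illustrative only, not Dorigo's formulation.

```python
# Toy default hierarchy for OR: the specific rule 00 -> 0 overrides
# the default ## -> 1 whenever both match (illustrative only).

def matches(condition: str, message: str) -> bool:
    return all(c == '#' or c == m for c, m in zip(condition, message))

def specificity(condition: str) -> int:
    return sum(c != '#' for c in condition)

# Two rules instead of the four needed for a complete map of OR.
hierarchy = [("00", "0"), ("##", "1")]

def classify(message: str) -> str:
    # Prefer the most specific matching rule; the default fires otherwise.
    matched = [r for r in hierarchy if matches(r[0], message)]
    return max(matched, key=lambda r: specificity(r[0]))[1]

for m in ["00", "01", "10", "11"]:
    print(m, "->", classify(m))   # reproduces the four-rule OR map
```

In an actual classifier system this preference is not hard-wired: it must emerge from the bid competition, which is precisely where the instability analyzed next comes from.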
Dorigo explains how default hierarchies form in traditional learning classifier systems, but notes that when the default rule is used much more often than the more specific rules, the specific rules tend to lose strength over time. This results in periodically poor performance for the system. He suggests that if strength is associated with the output message rather than with the rule itself, the oscillation can be avoided, and he gives specific equations implementing this approach to the bucket brigade, which he calls the message-based bucket brigade.

Dorigo associates a strength Mc with each message that is the output of some classifier in the current set of classifiers. The bid is calculated as

B = Mc + α·Mp1 + β·Mp2

where Mp1 and Mp2 are the strengths of the messages that matched the first and second parts of the classifier's condition, and the weights α and β are the respective specificities of those messages. A winning message pays an amount proportional to its strength to the messages that invoked it. A message's strength will over time increase or decrease depending on whether or not its income exceeds its payout. The paper is not specific about how message strength affects the genetic algorithm.

The XCS system

Stewart W. Wilson introduced a new form of learning classifier system in 1995 [Wil95]. As discussed earlier, the bucket brigade distributes strength based on the usefulness of rules to the system. This new system, called XCS, instead bases the strength of a rule on the accuracy of the rule's payoff predictions. In [Wil98] Wilson discusses the current state of research using this model. Basically, XCS works best in Markov decision environments, though Wilson discusses how XCS can be modified to work in non-Markov environments. He notes that an environment may be non-Markov simply because the sensors are unable to distinguish all of the states through which the system passes. A system is Markov if a finite history is able to distinguish the states; that is, if a finite sequence of states determines the next correct action.

Wilson describes his system as developing a map from the Cartesian product of all states and all actions to the reward, or payoff. While his classifiers may use defaults to generalize over states, these generalizations must be accurate to properly represent the payoff. By keeping two counts, one for correctly predicted payoffs and one for incorrect predictions, one can assign a strength value for use in the rule discovery genetic algorithm: the correct count divided by the sum of the two counts. Thus a classifier that correctly predicts the payoff is stronger than one that incorrectly predicts a high payoff. (One of the problems discussed by several of the authors in [Hol98] was overgeneral classifiers becoming too strong and pushing out more accurate classifiers.)
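The two-count scheme just described can be sketched directly. This is a simplified illustration, not the full XCS bookkeeping (XCS maintains separate prediction, prediction-error, and fitness estimates with their own update rules); the tolerance used to judge a prediction "correct" is an assumption made for the example.

```python
# Sketch of accuracy-based fitness via two counts (simplified; real XCS
# tracks prediction, prediction error, and fitness with update rules).

TOLERANCE = 0.05  # assumed: relative error under which a prediction counts as correct

class Classifier:
    def __init__(self, condition: str, action: str, prediction: float):
        self.condition = condition
        self.action = action
        self.prediction = prediction  # predicted payoff
        self.correct = 0
        self.incorrect = 0

    def update(self, payoff: float) -> None:
        """Count the prediction as correct if it is close to the payoff."""
        if abs(self.prediction - payoff) <= TOLERANCE * max(abs(payoff), 1.0):
            self.correct += 1
        else:
            self.incorrect += 1

    def fitness(self) -> float:
        """Strength for rule discovery: correct / (correct + incorrect)."""
        total = self.correct + self.incorrect
        return self.correct / total if total else 0.0

# A classifier predicting a high payoff it only sometimes receives ends up weak.
c = Classifier("1#0", "act", prediction=100.0)
for payoff in [100.0, 20.0, 20.0, 100.0]:
    c.update(payoff)
print(c.fitness())  # 0.5
```

Under this measure an accurate predictor of a modest payoff outranks an overgeneral rule that sometimes promises a large payoff it does not receive.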
XCS is an attempt to avoid some of the stability issues experienced with the older learning classifier systems. The development of stable default hierarchies is still very much an open question. Some authors, especially Wilson, believe that the small rule sets developed in default hierarchies are more of a liability than a benefit. However, in situations of delayed and sparse feedback, Wilson's XCS has difficulty.

Fuzzy Systems

In [Bon98], a learning classifier system based on fuzzy logic is discussed. Fuzzy logic uses real numbers to represent the truth value associated with a variable. In classical logic the truth value is either zero or one; in fuzzy logic the variable can take on any value from zero to one. People working with fuzzy logic have worked out how to combine these values in a way consistent with classical logic. Basically, the truth value is thought of as a set membership function. If the value is one, the point is in the set; if the value is zero, the point is not in the set. Otherwise the value lies between zero and one and can be interpreted as the degree to which the variable belongs to the set.

In the fuzzy classifier system, the sensors produce real-valued fuzzy logic values and the rules produce real-valued vectors rather than Boolean strings. This implies that the rule discovery system will be similar to an evolution strategy: the crossover and mutation operators deal with real-valued vectors. Matching of rules to messages is performed using fuzzy logic and a concept of closeness. The selection is then made from the matched rules stochastically, as in regular learning classifier systems.

Conclusions

Learning classifier systems have been presented along with several of the research developments of the last decade. Default hierarchies, XCS, and fuzzy logic based classifiers are three of the current learning classifier system research areas; several others are discussed in [Lan98]. With such diverse areas of research, learning classifier systems will continue to be an active and dynamic field for years to come. However, this is but one of the approaches to machine learning. The desire to have machines that can learn and interact with humans in a more "natural" way will continue to drive investigation. What will eventually prove to be a viable approach to developing such machines is as yet unknown; learning classifier systems are certainly one of the approaches that hold promise.

Bibliography

[Cla89] The CN2 Induction Algorithm. Peter Clark and Tim Niblett, Machine Learning, 3(4):261-283, 1989.

[Dor91] New Perspectives about Default Hierarchies Formation in Learning Classifier Systems. Marco Dorigo, Proceedings of the II Italian Congress on Artificial Intelligence, Palermo, Italy, E. Ardizzone, S. Gaglio and F. Sorbello (Eds.), Springer-Verlag, Berlin, 218-227, 1991.

[Gol89] Genetic Algorithms in Search, Optimization, and Machine Learning. David E. Goldberg, Addison-Wesley, Reading, Mass., 1989.

[Hof79] Gödel, Escher, Bach: an Eternal Golden Braid. Douglas R. Hofstadter, Basic Books, New York, 1979.

[Hof95] Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought. Douglas R. Hofstadter and the Fluid Analogies Research Group, Basic Books, New York, 1995.

[Hol75, Hol92] Adaptation in Natural and Artificial Systems. John H. Holland, University of Michigan Press, Ann Arbor, 1975. Republished by MIT Press, 1992.

[Hol98] What Is a Learning Classifier System? John H. Holland, et al., in [Lan98].

[Lan98] Learning Classifier Systems: From Foundations to Applications. Pier L. Lanzi, Wolfgang Stolzmann, Stewart W. Wilson, eds., Springer-Verlag, New York, 1998.

[Mic78] Selection of Most Representative Training Examples and Incremental Generation of VL1 Hypotheses: An Underlying Methodology and a Description of Programs ESEL and AQ11. R. S. Michalski and J. B. Larson, Report No. 867, Department of Computer Science, University of Illinois, Urbana, 1978.

[Wil95] Classifier Fitness Based on Accuracy. Stewart W. Wilson, Evolutionary Computation, 3(2):149-175, 1995.

[Wil98] State of XCS Classifier System Research. Stewart W. Wilson, in [Lan98].