From: AAAI-00 Proceedings. Copyright © 2000, AAAI (www.aaai.org). All rights reserved.
Combining Classification and Temporal Learning
Matthew Winston Mitchell
School of Computer Science and Software Engineering,
Faculty of Information Technology, Monash University
P.O. Box 197, Caulfield East, 3145, Australia
matt@insect.sd.monash.edu.au
Background
This paper introduces TRACA (Temporal Reinforcement-learning
and Classification Architecture), a connectionist learning
system for solving problems in large state spaces. Such
problems, robot control among them, commonly involve
irrelevant attributes and hidden-state. TRACA is capable of
dealing with both irrelevant information and hidden-state
while addressing two common shortcomings of other learning
systems. The first shortcoming is requiring a large number of
training examples, which is unrealistic for learning in the
real world. The second is having to pre-determine or
constrain network structure and size.
System Overview
TRACA dynamically develops a model of its environment
while learning. This model consists of combination groups,
which are used to construct general rules, and temporal
groups, which implement a memory mechanism.
Groups represent one or more situations and are connected to
detector inputs and/or other groups by arcs, which are used
to pass a variety of messages. Based on the situations they
represent, groups contain nodes which store estimates of
action-values (Sutton & Barto 1998) and maintain transition
probabilities to other situations.
New groups are created incrementally during learning by
combining existing nodes, selected by a localized
probabilistic mechanism. Each new combination of nodes is
given a small number of trials to determine its usefulness
and is then retained only while it demonstrates an improved
value estimate over those of its lower-level component nodes.
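The trial-and-retention scheme above can be sketched as follows. This is an illustrative reconstruction rather than TRACA's published algorithm: the trial budget, the running-average value estimate, and the node representation are all assumptions.

```python
import random

# Illustrative sketch (not TRACA's published algorithm): combine two
# existing nodes into a candidate group, give it a fixed trial budget,
# and retain it only if its value estimate beats its best component.
TRIAL_BUDGET = 5  # assumed "small number of trials"

def propose_combination(nodes, rng=random):
    """Probabilistically pick two distinct nodes to combine."""
    a, b = rng.sample(nodes, 2)
    return {"components": (a, b), "value": 0.0, "trials": 0}

def evaluate(group, observed_value):
    """Record one trial; once the budget is spent, decide retention."""
    group["trials"] += 1
    # Running average of the value observed while the group is active.
    group["value"] += (observed_value - group["value"]) / group["trials"]
    if group["trials"] < TRIAL_BUDGET:
        return None  # still on trial
    best_component = max(n["value"] for n in group["components"])
    return group["value"] > best_component  # retain only if it improves
```

The key property this captures is that a combination must justify its existence against its own components, so structure grows only where it pays off.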
Relationship to Other Work
TRACA has several distinguishing features. It is able to
reduce the complexity of its structures by representing NOT
and XOR using only logical AND combinations, in conjunction
with the organisation of nodes into groups and a suppression
mechanism. In contrast to neural networks (Lin 1993), nodes
in TRACA store value estimates independently, allowing it to
learn from only a few training examples. Finally, TRACA will
not continue to solve
hidden-state problems if the solution provides no useful
improvement in achieving reinforcement. This avoids the
problem of having to choose between a fixed-size history
window and artificially restricting the number of temporal
nodes (McCallum 1996; Ring 1994).
TRACA’s creation of new groups through combinations
has strong parallels to both Holland’s Learning Classifier
Systems (Holland 1975) and Drescher’s Schema Mechanism
(Drescher 1991).
Results
Experimental results have demonstrated that TRACA is capable
of representing a number of problems, including those with
hidden-state, without having to pre-determine network size or
structure. TRACA's performance in these experiments indicates
that it can match the accuracy of a number of other systems
with a relatively small number of training examples.
References
Drescher, G. 1991. Made-Up Minds. The MIT Press.
Holland, J. 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press.
Lin, L. 1993. Reinforcement Learning for Robots Using
Neural Networks. Ph.D. Dissertation, School of Computer
Science, Carnegie Mellon University, Pittsburgh, PA.
McCallum, A. 1996. Reinforcement Learning With Selective Perception and Hidden State. Ph.D. Dissertation,
Department of Computer Science, University of Rochester,
NY.
Ring, M. 1994. Continual Learning in Reinforcement Environments. Ph.D. Dissertation, The University of Texas at
Austin.
Sutton, R., and Barto, A. 1998. Reinforcement Learning: An Introduction. The MIT Press.