From: AAAI Technical Report SS-01-06. Compilation copyright © 2001, AAAI (www.aaai.org). All rights reserved.
The Roles of Machine Learning in Robust Autonomous Systems

David Kortenkamp
NASA Johnson Space Center - ER2
Houston, TX 77058
kortenkamp@jsc.nasa.gov
Extended Abstract
Robust autonomous systems will need to be adaptable to changes in the environment and changes in the underlying physical system. This is especially critical for long-duration missions. For example, autonomous robots that explore other planets for years will need to adapt to degradation in their capabilities and to unforeseen environmental factors. Another NASA domain requiring robust, adaptable autonomy is control of closed-loop systems that will provide life support to crews on long-duration missions. We have been investigating autonomous control of advanced life support systems for many years (Schreckenghost et al. 1998; Kortenkamp, Keirn-Schreckenghost, & Bonasso 2000). Recently we have begun investigating learning with respect to advanced life support systems (Kortenkamp, Bonasso, & Subramanian 2001). In this abstract we briefly discuss some of the roles machine learning can play in the control of advanced life support systems, or of any complex, real-time system.
Detecting signatures
Certain sensor "signatures" require specific responses from the autonomous control system. These signatures are often hand-coded by the programmer. For example, the programmer might state that if the temperature is above 100 and the pressure is above 1000 then vent the tank. This is a trivial example; real-world signatures will be more complex and involved. As a result, hand-coded signatures may not accurately capture the event, especially if the environment or the physical plant is changing. A variety of machine learning techniques could be used to examine the history of the system and adjust the signatures automatically (a sketch follows the challenge below). A specific research challenge in this area is:

• Can we learn to parse real-world data streams into recognizable system modes or events?
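To make the shift from a hand-coded signature to a learned one concrete, here is a minimal sketch. The sensor values, thresholds, and labeled history are hypothetical, and the decision tree (scikit-learn assumed available) merely stands in for the variety of techniques that could be applied:

```python
# A hand-coded signature versus one learned from the system's history.
# All data and thresholds below are invented for illustration.
from sklearn.tree import DecisionTreeClassifier

def hand_coded_signature(temperature, pressure):
    """The kind of fixed rule a programmer might write."""
    return temperature > 100 and pressure > 1000  # True => vent the tank

# Historical telemetry: (temperature, pressure) samples labeled with
# whether venting was actually required. In practice the labels would
# come from logged operator actions or post-hoc analysis.
history = [(95, 900), (105, 1100), (110, 950), (102, 1200), (98, 1050)]
labels = [False, True, False, True, False]

learned_signature = DecisionTreeClassifier(max_depth=3)
learned_signature.fit(history, labels)

# Unlike the fixed rule, the learned signature can be retrained as the
# environment or the physical plant drifts.
print(learned_signature.predict([(103, 1150)]))
```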
Optimizing control/resource usage

Long-duration missions, both robotic and crewed, face severe resource constraints. Robots are often constrained in the amount of energy (from batteries or solar panels) that they have. Crewed missions are often constrained by life support consumables such as oxygen, water and food. Given adequate simulations, machine learning techniques such as reinforcement learning or genetic algorithms can search through combinations of control actions to discover a control policy that optimizes usage of a particular resource (or combination of resources). The resulting policy can be implemented by the autonomous control system to increase mission duration. We have conducted experiments using genetic algorithms and reinforcement learning to optimize resource utilization in a simulated advanced life support system (see (Kortenkamp, Bonasso, & Subramanian 2001) for details).
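As a rough illustration of this kind of policy search, the sketch below runs tabular Q-learning against a deliberately tiny, invented life support model (a single water store and two control actions); the dynamics, rewards, and constants are placeholders, not values from our experiments:

```python
# Q-learning over a toy simulation to find a resource-conserving policy.
import random

ACTIONS = ["run_recycler", "idle"]

def simulate_step(water_level, action):
    """Hypothetical one-step dynamics: recycling recovers water at an
    energy cost; idling lets consumption dominate."""
    if action == "run_recycler":
        water_level = min(10, water_level + 1)
        reward = -0.2  # energy cost of running the recycler
    else:
        water_level = max(0, water_level - 1)
        reward = 0.0
    if water_level == 0:
        reward = -10.0  # exhausting the resource is catastrophic
    return water_level, reward

q = {(s, a): 0.0 for s in range(11) for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.95, 0.1

for episode in range(2000):
    state = 5
    for step in range(50):
        if random.random() < epsilon:
            action = random.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
        next_state, reward = simulate_step(state, action)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next -
                                       q[(state, action)])
        state = next_state

# The greedy policy extracted from q is what would be handed to the
# autonomous control system.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(11)}
```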
Refining models
Many autonomous controllers contain a model of the system they are trying to control, either explicitly or implicitly. Typically, these models are hand-coded and do not change. However, in long-duration missions the underlying physical system may change dramatically due to damage or degradation. For example, a robot's solar panels may accumulate dust that reduces their power output. Or a robot's wheels may slip more than anticipated. Or a filter in a life support system may clog more frequently than expected. In each of these cases, machine learning techniques could be used to modify the internal model of the system based on real-world data (a sketch follows the challenge below). This is especially necessary if other machine learning techniques are using these models for optimization (as discussed in the previous section). A specific research challenge in this area is:

• Can feedback from actual operation of the system be used to automatically refine our simulations and models?
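The sketch below illustrates the idea on an invented version of the solar panel example: the model's efficiency parameter is refined from hypothetical telemetry using a simple exponentially weighted update (recursive least squares or a Kalman filter would be natural heavier-weight alternatives):

```python
# Refining one internal model parameter from operational data.
class SolarPanelModel:
    """Controller's model: predicted power = efficiency * irradiance."""

    def __init__(self, efficiency=0.20, learning_rate=0.05):
        self.efficiency = efficiency        # parameter to refine
        self.learning_rate = learning_rate  # how fast to track the data

    def predict_power(self, irradiance):
        return self.efficiency * irradiance

    def update(self, irradiance, measured_power):
        """Nudge the efficiency estimate toward what the data implies."""
        implied = measured_power / irradiance
        self.efficiency += self.learning_rate * (implied - self.efficiency)

model = SolarPanelModel()
# Invented (irradiance W/m^2, measured output W) pairs showing the
# gradual degradation that dust accumulation would cause.
for irradiance, power in [(1000, 195), (1000, 190), (1000, 182), (1000, 175)]:
    model.update(irradiance, power)
print(round(model.efficiency, 3))  # drifts below the nominal 0.20
```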
Learning/optimizing sequences

The behavior of most autonomous systems can be characterized as a sequence of actions that lead to some desired result. These sequences can be generated by planners, be hand-coded by programmers, or "emerge" from the interactions of independent behaviors. In any case,
sequences are at the heart of a robust autonomous system. Machine learning techniques can be used to optimize sequences, especially sequences that are generated by planners or are hand-coded. Optimization of sequences can occur either by experimenting with a simulation or by examining data from previous executions of sequences in the real world (a sketch follows the list below). Some research challenges in this area are:
• How can the system learn not only sequences but also the contexts in which the sequences apply? Or perhaps the system starts with a standard set of sequences and learns when and where each sequence applies.
• What are the tradeoffs between having the system learn sequences from scratch as opposed to "tweaking" existing, working sequences?
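As a sketch of optimizing sequences against a simulation, the fragment below applies a bare-bones genetic algorithm to fixed-length action sequences; the action names and the stand-in fitness function are hypothetical placeholders for a real life support simulator:

```python
# Genetic-algorithm search over action sequences, scored by a simulator.
import random

ACTIONS = ["crop_harvest", "water_recycle", "air_scrub", "idle"]
SEQ_LEN, POP, GENERATIONS = 8, 30, 40

def simulate(sequence):
    """Stand-in fitness; a real system would execute the sequence in a
    life support simulation and return, e.g., days of resources left."""
    score = 0.0
    for i, action in enumerate(sequence):
        score += {"crop_harvest": 2.0, "water_recycle": 1.5,
                  "air_scrub": 1.0, "idle": 0.0}[action]
        if i > 0 and action == sequence[i - 1]:
            score -= 1.0  # penalize repeating an action back to back
    return score

def mutate(sequence):
    """Copy the sequence and randomly replace ("tweak") one action."""
    child = list(sequence)
    child[random.randrange(SEQ_LEN)] = random.choice(ACTIONS)
    return child

population = [[random.choice(ACTIONS) for _ in range(SEQ_LEN)]
              for _ in range(POP)]
for generation in range(GENERATIONS):
    population.sort(key=simulate, reverse=True)
    survivors = population[:POP // 2]  # keep the best half
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(POP - len(survivors))]

best_sequence = max(population, key=simulate)
```

Note that seeding the initial population with existing, working sequences rather than random ones is exactly the "tweaking" alternative raised in the last challenge above.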
Control system design methodology

The solution space for a closed-loop control policy in any complex interacting system is enormous. We believe that machine learning techniques are a useful way to "probe" the solution space and give control system designers an idea as to its topology and the important control variables. In this way, machine learning algorithms become not just a tool for adjusting the on-line control system, but also a tool for helping programmers design an a priori control policy. There are some important research questions that need to be answered with respect to using machine learning techniques as "probes" of the solution space (a sketch follows this list), including:

• For what class of dynamical systems is open loop analysis (e.g., genetic algorithms) helpful for guiding state space design for closed loop policy search?
• How do we design reinforcement learning algorithms that can automatically design the right state space features?
• Which machine learning techniques (e.g., reinforcement learning, memory-based learning, genetic algorithms, etc.) are the most useful for probing the solution space?
• How can the discoveries of machine learning algorithms be presented to control system designers so that they can easily understand the topology of the search space and the important control variables and interactions?
• How detailed must the simulation be for results from a machine learning algorithm that probes the solution space to be applicable to the real physical system?
• How can we provide safety guarantees when transferring a controller learned from simulations into the real system?
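The fragment below sketches what such a probe might look like: two hypothetical controller parameters are sampled at random, scored by a placeholder simulator, and the spread of each parameter among the top performers gives a crude picture of which variables must be set precisely:

```python
# Random probing of a policy space to expose its important variables.
import random

def simulate(setpoint_gain, vent_threshold):
    """Placeholder for a real simulation run, returning a performance
    score; the quadratic form just gives the probe something to find."""
    return -(setpoint_gain - 0.6) ** 2 - 0.3 * (vent_threshold - 0.4) ** 2

samples = [(random.random(), random.random()) for _ in range(1000)]
scored = sorted(((simulate(g, v), g, v) for g, v in samples), reverse=True)
top = scored[:50]  # the best-performing corner of the solution space

# A tight spread among top performers suggests the variable is critical
# and must be set precisely; a wide spread suggests it matters little.
for name, idx in [("setpoint_gain", 1), ("vent_threshold", 2)]:
    values = [row[idx] for row in top]
    print(f"{name}: {min(values):.2f} .. {max(values):.2f}")
```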
Integration with autonomous control architectures
A large complex system will have existing control procedures and possibly even an overarching control architecture. Very few current control architectures incorporate learning or adaptation. Integration of learning into an existing (or proposed) autonomous control system raises a number of important research issues, such as:
• Dealing with non-stationary dynamics
• Global optimization of large numbers of parameters
• Issues of safety and performance guarantees (verification and validation of machine learning techniques)
• Inspectability (so crew members know why certain actions have been chosen)
• Long-term (or life-long) learning
• Coping with drastic changes in the underlying environment or physical system
• Integration of machine learning techniques with traditional AI techniques like planners, sequencers, and model-based reasoners
• How good does the initial set of control behaviors or rules need to be for learning to be effective? Can we start tabula rasa or do we need very effective initial strategies?
• What are the differences between learning control information, learning procedural information, learning qualitative modeling information and learning planning information?
• What are the criteria whereby the autonomous control system turns learning on or off?
• How does the autonomous control system decide when to use new learned actions?

Conclusions

The use of machine learning to increase the robustness of autonomous control of long-duration missions poses some specific challenges to the machine learning community, in particular the research questions raised in the sections above.
Acknowledgements
The ideas in this paper grew out of conversations with the Advanced Life Support (ALS) Machine Learning Group, which consists of R. Peter Bonasso (NASA JSC/Metrica), Justin Boyan (NASA Ames), Daniel Clancy (NASA Ames), Greg Dorais (NASA Ames), Land Fleming (NASA JSC/Lockheed), Matt MacMahon (NASA JSC/SKE), Jane Malin (NASA JSC), Jeff Schneider (Carnegie Mellon University), Debra Schreckenghost (NASA JSC/Metrica), Jay Schroeder (NASA JSC), Alan Schultz (Naval Research Laboratory), Richard Simpson (NASA JSC/Metrica) and Devika Subramanian (Rice University). This work is supported by a NASA CETDP Thinking Space Systems grant entitled "Evaluating the Application of Machine Learning to the Control of Advanced Life Support Systems."
References

Kortenkamp, D.; Bonasso, R. P.; and Subramanian, D. 2001. Distributed autonomous control of space habitats. In Proceedings of the IEEE Aerospace Conference.

Kortenkamp, D.; Keirn-Schreckenghost, D.; and Bonasso, R. P. 2000. Real-time autonomous control of space habitats. In AAAI 2000 Spring Symposium on Real-Time Autonomous Control.

Schreckenghost, D.; Ryan, D.; Thronesbery, C.; Bonasso, R. P.; and Poirot, D. 1998. Intelligent control of life support systems for space habitats. In Proceedings of the Conference on Innovative Applications of Artificial Intelligence.