Learning Behavioral Parameterization Using Spatio-Temporal Case-Based Reasoning
Maxim Likhachev, Michael Kaess, and Ronald C. Arkin
Mobile Robot Laboratory
Georgia Tech
This research was funded under the DARPA MARS program.
Motivation
• Constant parameterization of robotic behavior results in
inefficient robot performance
• Manual selection of “right” parameters is difficult and
tedious work
Motivation (cont’d)
• Use of Case-Based Reasoning (CBR) methodology
– an automatic selection of optimal parameters at run-time (ICRA’01)
– each case is a set of behavioral parameters indexed by environmental
features
– example cases: “clear-to-goal” case, “front-obstructed” case
Motivation for the Current Research
• The CBR module
– improves robot performance (in simulations and on real
robots)
– avoids the manual configuration of behavioral
parameters
• The CBR module still required the creation of a
case library which
– is dependent on a robot architecture
– needs extensive experimentation to optimize cases
– requires good understanding of how CBR works
• Solution: to extend the CBR module to learn
– new cases from scratch or optimize existing cases
– in a separate training process or during missions
Related Work
• Use of Case-Based Reasoning in the selection of
behavioral parameters
– ACBARR [Georgia Tech ’92], SINS [Georgia Tech ’93]
– KINS [Chagas and Hallam]
• Automatic optimization of behavioral parameters
– genetic programming (e.g., GA-ROBOT [Ram et al.])
– reinforcement learning (e.g., Learning Momentum [Lee et al.])
Behavioral Control and CBR Module
CBR Module controls (case output parameters):
• Weights for each behavior
• Noise Persistence
• BiasMove Vector
• Obstacle Sphere
Case Indices: Environmental Features
Spatial features: traversability vector
• split environment into K = 4 angular regions
• compute obstacle density within each region
• transform the density into traversability
Temporal features:
• Short-term velocity towards the goal
• Long-term velocity towards the goal
[Figure: two example environments with their feature vectors]
• Example 1: Vspatial = (f0=0.92, f1=0.58, f2=1.00, f3=0.68); Vtemporal: ShortTerm Rs=1.0, LongTerm Rl=0.7
• Example 2: Vspatial = (f0=0.02, f1=0.22, f2=0.63, f3=0.02); Vtemporal: ShortTerm Rs=0.01, LongTerm Rl=1.0
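The spatial feature computation described on this slide can be sketched roughly as follows. The slides do not give the density measure or the density-to-traversability transform, so the distance weighting, the linear mapping, the sensor range, and the name `traversability_vector` are all assumptions made for illustration.

```python
import math


def traversability_vector(obstacles, goal_bearing=0.0, k=4, max_range=5.0):
    """Return k traversability values in [0, 1], one per angular region.

    obstacles: list of (bearing_rad, distance) readings relative to the robot.
    Region 0 is centered on goal_bearing (an assumption; the slides only say
    that the environment is split into K angular regions).
    """
    width = 2.0 * math.pi / k
    density = [0.0] * k
    for bearing, dist in obstacles:
        if dist > max_range:
            continue  # ignore readings beyond the assumed sensor range
        # Shift the angle so that region 0 is centered on the goal direction.
        rel = (bearing - goal_bearing + width / 2.0) % (2.0 * math.pi)
        region = int(rel // width) % k
        # Hypothetical density measure: closer obstacles weigh more.
        density[region] += 1.0 - dist / max_range
    # Hypothetical transform: traversability falls linearly with density.
    return [max(0.0, 1.0 - d) for d in density]
```

With no obstacles every region is fully traversable (all values 1.0); a single obstacle directly behind the robot at half the sensor range lowers only the rear region.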
Overview of non-learning CBR Module
[Figure: block diagram of the non-learning CBR module]
• Feature Identification: current environment → spatial & temporal feature vectors
• Spatial Features Vector Matching (1st stage of Case Selection): all the cases in the library (from the Case Library) → set of spatially matching cases
• Temporal Features Vector Matching (2nd stage of Case Selection): → set of spatially and temporally matching cases
• Random Selection Process (3rd stage of Case Selection): → best matching case
• Case Switching Decision Tree: → best matching or currently used case
• Case Adaptation: → case ready for application
• Case Application: → case output parameters (behavioral assemblage parameters)
Making the CBR Module Learn
[Figure: block diagram of the learning CBR module]
• Feature Identification: current environment → spatial & temporal feature vectors
• Spatial Features Vector Matching (1st stage of Case Selection): all the cases in the library (from the Case Library) → set of spatially matching cases
• Temporal Features Vector Matching (2nd stage of Case Selection): → set of spatially and temporally matching cases
• Random Selection Biased by Case Success and Spatial and Temporal Similarities: → best matching case
• Case Switching Decision Tree: → best matching or currently used case
• Old Case Performance Evaluation: last K cases → last K cases with adjusted performance history
• New Case Creation (if necessary): → new or existing best matching case
• Case Adaptation: → case ready for application
• Case Application: → case output parameters (behavioral assemblage parameters)
Extensive Exploration of Cases:
Modified Case Selection Process
• Random selection of cases with the probability of
the selection proportional to:
– spatial similarity with the environment (1st step)
– temporal similarity with the environment (2nd step)
– weighted sum of the case past performance and spatial
and temporal similarities (3rd step)
[Figure: three plots of P(selection), each ranging from 0.0 to 1.0]
• 1st step: P(selection) vs. spatial similarity for cases C1 to C5 → set of spatially matching cases: {C1, C2, C4}
• 2nd step: P(selection) vs. temporal similarity for C1, C2, C4 → set of spatially & temporally matching cases: {C1, C4}
• 3rd step: P(selection) vs. weighted sum of spatial and temporal similarities and case success for C1, C4 → best matching case: C1
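The three selection steps can be sketched as below. This is an illustration under assumptions, not the authors' implementation: each case is a hypothetical dict with a `success` value in [0, 1], the first two steps keep a case with probability equal to its similarity divided by the best similarity (so the best match is always kept), and the weights of the final weighted sum are made-up values.

```python
import random


def stochastic_filter(cases, score, rng):
    """Keep each case with probability score/best_score (the best is always kept)."""
    best = max(score(c) for c in cases)
    if best <= 0.0:
        return list(cases)  # nothing matches at all: keep every case
    return [c for c in cases if rng.random() < score(c) / best]


def select_case(cases, spatial_sim, temporal_sim, rng, weights=(0.5, 0.25, 0.25)):
    """Three-step random case selection; the weights are hypothetical values."""
    # 1st step: random selection biased by spatial similarity.
    spatial_set = stochastic_filter(cases, spatial_sim, rng)
    # 2nd step: random selection biased by temporal similarity.
    both_set = stochastic_filter(spatial_set, temporal_sim, rng)
    # 3rd step: one case drawn with probability proportional to the weighted
    # sum of case success and the two similarities.
    w_success, w_spatial, w_temporal = weights
    scores = [
        w_success * c["success"] + w_spatial * spatial_sim(c) + w_temporal * temporal_sim(c)
        for c in both_set
    ]
    if sum(scores) <= 0.0:
        return rng.choice(both_set)
    return rng.choices(both_set, weights=scores, k=1)[0]
```

Because selection is random rather than greedy, cases that are not the closest match still get tried occasionally, which is what allows the library to be explored during training.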
Positive and Negative Reinforcement:
Case Performance Evaluation
• Criterion for evaluating case performance:
the average velocity with which the robot approaches its
goal during the application of the case
– provides opportunities for intermediate case performance evaluations
– may not always be the right criterion
• some cases exhibit no positive velocity towards the goal
• therefore the evaluation of the performance is delayed by K (= 2) cases
– case_success (represents case performance) is:
• increased if the average velocity is increased or sustained high
• decreased otherwise
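One way to read the delayed evaluation is sketched below: a case is credited or penalized using the average velocity observed K cases after it was applied. The class name, the step size, the "high velocity" threshold, the [0, 1] range of `success`, and this particular interpretation of the delay are assumptions; the slides only state the increase/decrease rule and K = 2.

```python
from collections import deque


class SuccessTracker:
    """Delayed case_success update (hypothetical sketch)."""

    def __init__(self, delay=2, step=0.1, high_velocity=0.8):
        self.delay = delay                  # K from the slide
        self.step = step                    # assumed size of each adjustment
        self.high_velocity = high_velocity  # assumed "sustained high" threshold
        self.pending = deque()              # cases still awaiting evaluation
        self.prev_velocity = 0.0

    def case_finished(self, case, avg_velocity):
        """Record a finished case; evaluate the case applied `delay` cases ago."""
        self.pending.append(case)
        improved = avg_velocity > self.prev_velocity
        sustained = avg_velocity >= self.high_velocity
        self.prev_velocity = avg_velocity
        if len(self.pending) <= self.delay:
            return None  # not enough history yet
        old = self.pending.popleft()
        delta = self.step if (improved or sustained) else -self.step
        old["success"] = min(1.0, max(0.0, old["success"] + delta))
        return old
```

The delay gives credit to a case that made no direct progress towards the goal but set up later progress.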
Maximization of Reinforcement:
Case Adaptation
• Maximize case_success as a noisy function of case
output parameters (behavioral assemblage parameters)
– maintain the adaptation vector A(C) for each case C
– if the last series of adaptations resulted in an increase of
case_success, then continue the adaptation:
O(C) = O(C) + A(C)
– otherwise switch the direction of the adaptation, add a random
component and scale proportionally to case_success:
A(C) = −λ·A(C) + ν·R
O(C) = O(C) + A(C)
where O(C) is the output parameter vector of case C, R is a random vector, and λ and ν are scaling coefficients
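A minimal sketch of one adaptation step, assuming the output parameters and the adaptation vector are plain lists of floats. The two scaling coefficients are passed in as `lam` and `nu` (names and default values are assumptions); how they depend on case_success is not specified on the slide, so the caller is expected to supply them.

```python
import random


def adapt_case(output, adaptation, success_increased, rng, lam=0.5, nu=0.1):
    """Return the updated (output, adaptation) vectors for one case.

    output:     O(C), the case output parameters
    adaptation: A(C), the current adaptation vector
    """
    if not success_increased:
        # Reverse the search direction and add a random component R,
        # drawn here uniformly from [-1, 1] per parameter (an assumption).
        adaptation = [-lam * a + nu * rng.uniform(-1.0, 1.0) for a in adaptation]
    # O(C) = O(C) + A(C)
    output = [o + a for o, a in zip(output, adaptation)]
    return output, adaptation
```

This behaves like a noisy hill climb: the search keeps moving in a direction for as long as case_success improves and otherwise backs off and perturbs the direction.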
Maximization of Reinforcement:
Case Adaptation (cont’d)
• Incorporate prior knowledge into the search:
– fixed adaptation of the Noise_Gain and Noise_Persistence
parameters based on the short- and long-term velocities of the
robot
• Constrain the search:
– limit Obstacle_Gain to be higher than the sum of the other
schema gains (to avoid collisions)
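The Obstacle_Gain constraint can be enforced after every adaptation step, for example as in the sketch below. The dictionary layout, the gain names other than Obstacle_Gain and Noise_Gain, and the small `margin` are assumptions.

```python
def enforce_obstacle_gain(gains, margin=0.01):
    """Return a copy of gains with Obstacle_Gain above the sum of the others."""
    others = sum(v for name, v in gains.items() if name != "Obstacle_Gain")
    fixed = dict(gains)
    # Raise Obstacle_Gain only when the constraint is violated.
    fixed["Obstacle_Gain"] = max(gains["Obstacle_Gain"], others + margin)
    return fixed


# Usage: the hypothetical MoveToGoal_Gain and Noise_Gain sum to 1.5, so an
# Obstacle_Gain of 1.0 is raised to just above that sum.
adjusted = enforce_obstacle_gain(
    {"Obstacle_Gain": 1.0, "MoveToGoal_Gain": 1.0, "Noise_Gain": 0.5}
)
```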
The Growth of the Case Library:
Case Creation Decision
• To avoid divergence a new case is created whenever:
– case_success of the selected case is high and spatial and
temporal similarities with the environment are low to
moderate
– case_success of the selected case is low to moderate and
spatial and temporal similarities are low
• Limit the maximum size of the library (10 in this work)
• New case is initialized with:
– the spatial and temporal features of the environment
– the output parameter values of the selected case
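The case creation rule can be sketched as a small predicate. The numeric thresholds for "high", "moderate", and "low" and the choice to require both similarities to fall under the threshold are assumptions; only the library limit of 10 comes from the slide.

```python
def should_create_case(case_success, spatial_sim, temporal_sim, library_size,
                       max_size=10, high_success=0.7,
                       moderate_sim=0.7, low_sim=0.4):
    """Decide whether to add a new case for the current environment."""
    if library_size >= max_size:
        return False  # the library is full
    # Both similarities must be under the threshold, so compare the larger one.
    similarity = max(spatial_sim, temporal_sim)
    if case_success >= high_success:
        # A successful case is kept unless the match is only low to moderate.
        return similarity <= moderate_sim
    # A less successful case is replaced only when the match is poor;
    # otherwise it is left in place to be adapted further.
    return similarity <= low_sim
```

Under this reading, a well-performing case is protected from being adapted toward a dissimilar environment, while a weaker but closely matching case keeps being optimized instead of duplicated.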
Experimental Analysis: Example
Learning CBR: first run (starting with an empty library)
Experimental Analysis: Example
Learning CBR: a run after 54 training runs on various environments
• library of ten cases was learned
• 36 percent shorter travel distance
[Figure: simulation run with two callouts]
• A case of a “clear-to-goal” strategy is learned for such environments
• A case of a “squeezing” strategy is learned for such environments
Experiments: Statistical Results
Simulation results (after 250 training runs for learning CBR system)
[Figure: bar charts comparing the non-adaptive, CBR, and learning CBR systems on mission completion rate (%) and average number of steps, for a heterogeneous environment and for homogeneous environments with 15% and 20% obstacle density]
Real Robot Experiments: In Progress
• RWI ATRV-Jr
• Sensors:
– SICK laser scanners in
front and back
– Compass
– Gyroscope
• Experiments in progress,
no statistical results yet
Conclusions
• New and existing cases are learned and optimized during a
training process or as part of mission executions
• Performance:
– substantially better than that of a non-adaptive system
– comparable to a non-learning CBR system
• Neither manual selection of behavioral parameters nor careful
creation and optimization of case library is required from a user
• Future Work
– real robot experiments
– case “forgetting” component
– integration with other adaptation & learning methods (e.g., Learning
Momentum, RL for Behavioral Assemblage Selection)