Learning Behavioral Parameterization Using Spatio-Temporal Case-Based Reasoning
Maxim Likhachev, Michael Kaess, and Ronald C. Arkin
Mobile Robot Laboratory, Georgia Tech
This research was funded under the DARPA MARS program.

Motivation
• Constant parameterization of robotic behavior results in inefficient robot performance
• Manual selection of the “right” parameters is difficult and tedious

Motivation (cont’d)
• Use of the Case-Based Reasoning (CBR) methodology
  – automatic selection of optimal parameters at run-time (ICRA’01)
  – each case is a set of behavioral parameters indexed by environmental features
[Figure: example “clear-to-goal” and “front-obstructed” cases]

Motivation for the Current Research
• The CBR module
  – improves robot performance (in simulation and on real robots)
  – avoids manual configuration of behavioral parameters
• However, the CBR module still required the creation of a case library, which
  – depends on the robot architecture
  – needs extensive experimentation to optimize the cases
  – requires a good understanding of how CBR works
• Solution: extend the CBR module to learn
  – new cases from scratch, or optimize existing cases
  – in a separate training process or during missions

Related Work
• Use of Case-Based Reasoning in the selection of behavioral parameters
  – ACBARR [Georgia Tech ’92], SINS [Georgia Tech ’93]
  – KINS [Chagas and Hallam]
• Automatic optimization of behavioral parameters
  – genetic algorithms (e.g., GA-ROBOT [Ram et al.])
  – reinforcement learning (e.g., Learning Momentum [Lee et al.])

Behavioral Control and CBR Module
• The CBR module controls the following case output parameters:
  – weights for each behavior
  – Noise Persistence
  – Bias Move Vector
  – Obstacle Sphere

Case Indices: Environmental Features
• Spatial features: traversability vector
  – split the environment into K = 4 angular regions
  – compute the obstacle density within each region
  – transform the density into traversability
• Temporal features:
  – short-term velocity towards the goal
  – long-term velocity towards the goal
• Examples:
  – Vspatial: f0 = 0.92, f1 = 0.58, f2 = 1.00, f3 = 0.68; Vtemporal: short-term Rs = 1.0, long-term Rl = 0.7
  – Vspatial: f0 = 0.02, f1 = 0.22, f2 = 0.63, f3 = 0.02; Vtemporal: short-term Rs = 0.01, long-term Rl = 1.0

Overview of the Non-Learning CBR Module
[Figure: data flow of the non-learning CBR module]
• Feature Identification: current environment → spatial and temporal feature vectors
• Spatial Features Vector Matching (1st stage of case selection): all cases in the library → set of spatially matching cases
• Temporal Features Vector Matching (2nd stage): → set of spatially and temporally matching cases
• Random Selection Process (3rd stage): → best matching case
• Case switching decision tree: best matching case vs. currently used case
• Case Adaptation: → case ready for application
• Case Application: → case output parameters (behavioral assemblage parameters)
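As a minimal sketch of the case indices above, the following Python computes a K = 4 traversability vector and the two temporal features. The region layout (region 0 centred on the goal direction), the density measure, the linear density-to-traversability mapping, `saturation_density`, and the window lengths are all assumptions for illustration, not the authors' definitions.

```python
import math
from dataclasses import dataclass

K = 4  # number of angular regions (K = 4 on the slide)

@dataclass
class Obstacle:
    x: float
    y: float
    radius: float

def traversability_vector(robot, goal, obstacles, sensing_range=10.0,
                          saturation_density=0.2):
    """Spatial features f0..f(K-1), each in [0, 1] (1 = fully traversable).

    Assumptions (hypothetical): region 0 is centred on the goal direction,
    density is the obstacle-disc area inside a region divided by the region's
    sector area, and traversability falls linearly to 0 at saturation_density.
    """
    goal_angle = math.atan2(goal[1] - robot[1], goal[0] - robot[0])
    region_width = 2 * math.pi / K
    region_area = math.pi * sensing_range ** 2 / K
    covered = [0.0] * K
    for obs in obstacles:
        dx, dy = obs.x - robot[0], obs.y - robot[1]
        if math.hypot(dx, dy) > sensing_range:
            continue  # outside the sensing range
        rel = math.atan2(dy, dx) - goal_angle
        # shift by half a region so that region 0 straddles the goal direction
        idx = int(((rel + region_width / 2) % (2 * math.pi)) // region_width)
        covered[min(idx, K - 1)] += math.pi * obs.radius ** 2
    densities = [c / region_area for c in covered]
    return [1.0 - min(1.0, d / saturation_density) for d in densities]

def temporal_features(goal_velocities, max_speed, short_window=5, long_window=50):
    """(Rs, Rl): short- and long-term velocity towards the goal, normalized to [0, 1].

    goal_velocities holds the robot's per-step velocity component towards the
    goal; the window lengths are hypothetical.
    """
    def normalized_mean(window):
        recent = goal_velocities[-window:]
        if not recent:
            return 0.0
        return max(0.0, min(1.0, (sum(recent) / len(recent)) / max_speed))
    return normalized_mean(short_window), normalized_mean(long_window)
```

For a robot at the origin with its goal on the positive x-axis and one obstacle of radius 1 three units ahead, `traversability_vector((0.0, 0.0), (10.0, 0.0), [Obstacle(3.0, 0.0, 1.0)])` gives approximately [0.8, 1.0, 1.0, 1.0]: only the region facing the goal is partly blocked.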
Making the CBR Module Learn
[Figure: data flow of the learning CBR module]
• Feature Identification: current environment → spatial and temporal feature vectors
• Spatial Features Vector Matching (1st stage of case selection): all cases in the library → set of spatially matching cases
• Temporal Features Vector Matching (2nd stage): → set of spatially and temporally matching cases
• Random Selection Biased by Case Success and Spatial and Temporal Similarities (3rd stage): → best matching case
• Case switching decision tree: best matching case vs. currently used case
• Old Case Performance Evaluation: last K cases → last K cases with adjusted performance history (stored in the case library)
• New Case Creation (if necessary): → new or existing best matching case
• Case Adaptation: → case ready for application
• Case Application: → case output parameters (behavioral assemblage parameters)

Extensive Exploration of Cases: Modified Case Selection Process
• Cases are selected randomly, with the probability of selection proportional to:
  – spatial similarity with the environment (1st step)
  – temporal similarity with the environment (2nd step)
  – a weighted sum of the case’s past performance and its spatial and temporal similarities (3rd step)
[Figure: P(selection) at each step. Step 1 (spatial similarity) narrows cases C1 to C5 to the spatially matching set {C1, C2, C4}; step 2 (temporal similarity) narrows it to the spatially and temporally matching set {C1, C4}; step 3 (weighted sum of spatial and temporal similarities and case success) selects C1 as the best case.]
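The three-step selection above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: steps 1 and 2 keep each case with probability equal to its similarity, similarity is one minus the mean absolute feature difference, and step 3 is a roulette-wheel draw over a weighted sum with hypothetical weights. `stochastic_filter` and `select_case` are illustrative names.

```python
import random
from dataclasses import dataclass

@dataclass
class Case:
    name: str
    spatial: list         # traversability vector f0..f(K-1), values in [0, 1]
    temporal: tuple       # (Rs, Rl), values in [0, 1]
    success: float = 0.5  # case_success in [0, 1]

def similarity(a, b):
    """Hypothetical similarity in [0, 1]: 1 minus the mean absolute difference."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def stochastic_filter(cases, scores, rng):
    """Keep each case with probability equal to its score (assumed reading of
    'probability proportional to similarity'); never return an empty set."""
    kept = [c for c, s in zip(cases, scores) if rng.random() < s]
    if not kept:
        best_index = max(range(len(cases)), key=lambda i: scores[i])
        kept = [cases[best_index]]
    return kept

def select_case(library, env_spatial, env_temporal, rng, weights=(0.4, 0.3, 0.3)):
    """Three-step biased random selection; the weights are hypothetical."""
    s_sim = {id(c): similarity(c.spatial, env_spatial) for c in library}
    t_sim = {id(c): similarity(c.temporal, env_temporal) for c in library}
    # Step 1: spatial similarity -> set of spatially matching cases
    spatial_set = stochastic_filter(library, [s_sim[id(c)] for c in library], rng)
    # Step 2: temporal similarity -> spatially and temporally matching cases
    matching = stochastic_filter(spatial_set, [t_sim[id(c)] for c in spatial_set], rng)
    # Step 3: roulette-wheel choice weighted by similarities and case_success
    ws, wt, wc = weights
    scores = [ws * s_sim[id(c)] + wt * t_sim[id(c)] + wc * c.success for c in matching]
    # a tiny epsilon keeps random.choices valid if every score is zero
    return rng.choices(matching, weights=[s + 1e-9 for s in scores], k=1)[0]
```

Drawing a case at random instead of always taking the highest-scoring one is what lets less-proven cases still be applied, evaluated, and adapted.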
Positive and Negative Reinforcement: Case Performance Evaluation
• Criterion for evaluating case performance: the average velocity with which the robot approaches its goal while the case is applied
  – allows intermediate evaluations of case performance
  – may not always be the right criterion: some cases exhibit no positive velocity towards the goal, so the evaluation of performance is delayed by K (=2) cases
  – case_success (which represents case performance) is
    • increased if the average velocity has increased or stayed high
    • decreased otherwise

Maximization of Reinforcement: Case Adaptation
• Maximize case_success as a noisy function of the case output parameters O(C) (behavioral assemblage parameters)
  – maintain an adaptation vector A(C) for each case C
  – if the last series of adaptations increased case_success, continue the adaptation:
    $O(C) = O(C) + A(C)$
  – otherwise, reverse the direction of the adaptation, add a random component, and scale proportionally to case_success:
    $A(C) = -\lambda \cdot A(C) + \eta \cdot R$
    $O(C) = O(C) + A(C)$
    where λ and η are scaling factors and R is a random vector

Maximization of Reinforcement: Case Adaptation (cont’d)
• Incorporate prior knowledge into the search:
  – fixed adaptation of the Noise_Gain and Noise_Persistence parameters based on the short- and long-term velocities of the robot
• Constrain the search:
  – limit Obstacle_Gain to be higher than the sum of the other schema gains (to avoid collisions)
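The evaluation and adaptation rules above can be sketched together. The sketch credits each case with the mean goal-approach velocity over itself and the K = 2 cases that follow it. Everything beyond the slides is hypothetical: the "sustained high" threshold, the success step, the values of λ and η (`lam`, `eta`), judging improvement by the most recent adaptation only, reading "scale proportionally to case_success" as a multiplicative factor, and the gain names other than Obstacle_Gain and Noise_Gain. The fixed rule for Noise_Gain and Noise_Persistence is not modelled; those parameters are simply excluded from the search.

```python
import random
from collections import deque

K_DELAY = 2          # evaluation delayed by K (=2) cases, as on the slide
HIGH_VELOCITY = 0.7  # hypothetical "sustained high" normalized velocity
SUCCESS_STEP = 0.1   # hypothetical change of case_success per evaluation
FIXED_RULE_PARAMS = {"Noise_Gain", "Noise_Persistence"}  # adapted by a fixed rule, not searched

class AdaptiveCase:
    def __init__(self, params):
        self.params = dict(params)  # O(C): case output parameters
        # A(C): adaptation vector over the searched parameters only
        self.adapt = {k: 0.0 for k in params if k not in FIXED_RULE_PARAMS}
        self.success = 0.5          # case_success in [0, 1]
        self.success_at_last_adapt = self.success

class PerformanceEvaluator:
    """Credits each case after K more cases have been applied (hypothetical scheme)."""
    def __init__(self):
        self.window = deque(maxlen=K_DELAY + 1)  # (case, average velocity towards goal)
        self.prev_velocity = 0.0

    def record(self, case, avg_velocity):
        self.window.append((case, avg_velocity))
        if len(self.window) < self.window.maxlen:
            return
        oldest = self.window[0][0]
        velocity = sum(v for _, v in self.window) / len(self.window)
        if velocity > self.prev_velocity or velocity >= HIGH_VELOCITY:
            oldest.success = min(1.0, oldest.success + SUCCESS_STEP)  # positive reinforcement
        else:
            oldest.success = max(0.0, oldest.success - SUCCESS_STEP)  # negative reinforcement
        self.prev_velocity = velocity

def enforce_obstacle_gain(params, margin=0.01):
    """Keep Obstacle_Gain above the sum of the other schema gains."""
    others = sum(v for k, v in params.items() if k.endswith("_Gain") and k != "Obstacle_Gain")
    if params["Obstacle_Gain"] <= others:
        params["Obstacle_Gain"] = others + margin

def adapt_case(case, rng, lam=0.5, eta=0.2):
    """One adaptation step; lam and eta are hypothetical values of the scaling factors."""
    if case.success <= case.success_at_last_adapt:
        # no improvement: reverse direction, add a random component, and scale
        # by case_success (our reading of "scale proportionally to case_success")
        for k in case.adapt:
            case.adapt[k] = (-lam * case.adapt[k] + eta * rng.uniform(-1.0, 1.0)) * case.success
    # O(C) = O(C) + A(C), clipped at zero
    for k, step in case.adapt.items():
        case.params[k] = max(0.0, case.params[k] + step)
    case.success_at_last_adapt = case.success
    enforce_obstacle_gain(case.params)
```

Applying the constraint after every step, rather than penalizing violations through case_success, keeps every parameter set the search visits on the collision-avoiding side.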
The Growth of the Case Library: Case Creation Decision
• To avoid divergence, a new case is created whenever:
  – the case_success of the selected case is high, and its spatial and temporal similarities with the environment are low to moderate
  – the case_success of the selected case is low to moderate, and its spatial and temporal similarities are low
• The maximum size of the library is limited (to 10 cases in this work)
• A new case is initialized with:
  – the spatial and temporal features of the environment
  – the output parameter values of the selected case

Experimental Analysis: Example
• Learning CBR: first run (starting with an empty library)

Experimental Analysis: Example
• Learning CBR: a run after 54 training runs on various environments
  – a library of ten cases was learned
  – 36 percent shorter travel distance
  – a case with a “clear-to-goal” strategy and a case with a “squeezing” strategy are each learned for the type of environment in which they apply

Experiments: Statistical Results
• Simulation results (after 250 training runs for the learning CBR system)
[Figure: mission completion rate (%) and average number of steps for the non-adaptive, CBR, and learning CBR systems in heterogeneous and homogeneous environments at 15% and 20% obstacle density]
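The case creation decision from the case library slide can be sketched as below. The numeric thresholds for "high", "low", and "moderate" are assumptions (the slides give only qualitative levels), as is the initial case_success of 0.5 for new cases; the library limit of 10 is from the slide.

```python
import copy
from dataclasses import dataclass

MAX_LIBRARY_SIZE = 10  # library size limit used in this work
HIGH_SUCCESS = 0.7     # hypothetical threshold for "high" case_success
LOW_SIM = 0.3          # hypothetical upper bound of "low" similarity
MODERATE_SIM = 0.6     # hypothetical upper bound of "moderate" similarity

@dataclass
class Case:
    spatial: list         # traversability vector
    temporal: tuple       # (Rs, Rl)
    params: dict          # case output parameters
    success: float = 0.5  # hypothetical initial case_success

def should_create_case(selected, s_sim, t_sim, library_size):
    if library_size >= MAX_LIBRARY_SIZE:
        return False
    if selected.success >= HIGH_SUCCESS:
        # a good case applied to a low-to-moderately similar environment
        return s_sim < MODERATE_SIM and t_sim < MODERATE_SIM
    # a low-to-moderate case applied to a dissimilar environment
    return s_sim < LOW_SIM and t_sim < LOW_SIM

def maybe_create_case(library, selected, env_spatial, env_temporal, s_sim, t_sim):
    """Return the case to apply: a new one if the decision fires, else selected."""
    if not should_create_case(selected, s_sim, t_sim, len(library)):
        return selected
    new_case = Case(spatial=list(env_spatial), temporal=tuple(env_temporal),
                    params=copy.deepcopy(selected.params))
    library.append(new_case)
    return new_case
```

Copying the selected case's output parameters means a new case starts from a configuration that already worked nearby, and the adaptation step then specializes it to the new region of feature space.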
Real Robot Experiments: In Progress
• RWI ATRV-Jr
• Sensors:
  – SICK laser scanners in front and back
  – compass
  – gyroscope
• Experiments in progress; no statistical results yet

Conclusions
• New and existing cases are learned and optimized during a training process or as part of mission execution
• Performance:
  – substantially better than that of a non-adaptive system
  – comparable to that of a non-learning CBR system
• Neither manual selection of behavioral parameters nor careful creation and optimization of a case library is required of the user
• Future work:
  – real robot experiments
  – a case “forgetting” component
  – integration with other adaptation and learning methods (e.g., Learning Momentum, RL for Behavioral Assemblage Selection)