Interactive Task-Plan Learning

Shuonan Dong
Massachusetts Institute of Technology
77 Massachusetts Ave. 32-224, Cambridge, MA 02139
dongs@mit.edu

Abstract

Low-level direct commanding of space robots can be time-consuming or impractical for complex systems with many degrees of freedom. My research will adaptively raise the level of interaction between the operator and the robot by (1) allowing the robot to learn implicit plans by detecting patterns in the interaction history, and (2) enabling the human to demonstrate continuous motions through teleoperation. Learned tasks and plans are recorded for future use. I introduce a novel representation of continuous actions, called parameterized probabilistic flow tubes, that I hypothesize will more closely encode a human's intended motions and provide flexibility during execution in new situations. I also introduce the use of planning for plan recognition in the domain of hybrid tasks.

Introduction

Today, most space robots are directly commanded from Earth by hand. While manageable for simple robots, this control scheme is tedious for robots with many degrees of freedom, such as JPL's ATHLETE (All-Terrain Hex-Legged Extra-Terrestrial Explorer), which has 36 independent joints. Currently, operators can either issue low-level joint angle commands or choose among a handful of pre-programmed higher-level tasks. Attempting to pre-program all possible tasks is unreasonable on such a complex robot. Instead, I present a learn-as-you-go approach, which will adaptively raise the level of interaction between the operator and the robot according to the operator's level of comfort.

The interactive task-plan learning system will operate in two modes: task commanding and motion learning. Task commanding mode is the default, in which a user can specify desired commands. If a command is new to the robot, i.e., not in the plan library, the robot can prompt the user to demonstrate the new task. The system then switches to motion learning mode, in which the user can specify lower-level commands or demonstrate samples of a continuous motion through teleoperation. A key point is that tasks are composed of hybrid (discrete or continuous) actions. The newly learned task is added to the plan library for future use.

Operators are not required to explicitly specify which task they are trying to accomplish. Instead, the task commanding interface passively monitors the interaction history for command patterns that could indicate implicit plans the user is trying to perform. If the robot becomes confident enough of an implicit plan, it can ask the user whether the discovered plan should be stored as a new entry in the plan library for future use.

Research Problem

The interactive task-plan learning system encompasses two research problems, motion learning and implicit plan discovery, which are summarized as follows:

1. Motion learning: Using multiple human demonstrations of a continuous movement (such as ones captured through teleoperation), learn a generalized representation of the motion that can be applied to new situations.

2. Implicit plan discovery: Using the execution history of commands, infer the most likely set of implicit plans the user has performed.

We now set up the problem more formally. We define a plan library as $P = \{p_i\}$, where a plan is recursively composed from primitive activities or lower-level plans, $p = a \mid \mathrm{composition}(p_i, p_j)$, with $\mathrm{composition} = \mathrm{sequence} \mid \mathrm{parallel} \mid \mathrm{choice}$.
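To make the recursive structure concrete, the following is a minimal sketch of one way such a plan library could be represented. It assumes Python, and the class and plan names (Activity, Composition, plan_library) are illustrative, not taken from the actual system.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Activity:
    """A primitive activity a, e.g. a single discrete or continuous command."""
    name: str

@dataclass(frozen=True)
class Composition:
    """A plan composed from sub-plans p_i, p_j via one of the three operators."""
    kind: str                     # "sequence" | "parallel" | "choice"
    children: Tuple["Plan", ...]  # the composed sub-plans

Plan = Union[Activity, Composition]

# The plan library P = {p_i}: named plans, each a primitive or a composition.
plan_library = {
    "pick_up_and_drop_off": Composition(
        "sequence", (Activity("pick_up"), Activity("drop_off"))
    ),
}
```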
The motion learning problem is stated as follows:

Definition 1. Given $B$ objects or points of interest in the environment, the human performs $N$ demonstrations of a motion, where each demonstration is recorded as the time-evolved states of all objects and points of interest in the environment, i.e., $S_n = [s_{t_1}, \ldots, s_{t_2}]^T : n = 1 \ldots N$. We define $R = \{r_j\}$ as the set of statistically important motion parameters obtained from the location $x$ and orientation $\theta$ of an object and the robot end effector upon change of contact. The motion learning problem is to determine a parameterized probabilistic flow tube $\langle \mu(s_{t_1}), \Sigma(s_{t_1}) \rangle$ that best describes the learned motion.

The implicit plan discovery problem is stated as follows:

Definition 2. Given a plan library $P = \{p_i\}$, a command history $C = (c_1, \ldots, c_R)$, and a positive real value $q_{\mathrm{threshold}}$, find the set of implicit plans $M = \{m_k\}$ with corresponding confidence measures $Q = \{q_k \mid q_k > q_{\mathrm{threshold}}\}$, such that each implicit plan either exists in the plan library, $m_k \in P$, or is composed of elements in a subsequence of the command history, $m_k = \mathrm{composition}(c_{r_1}, \ldots, c_{r_2}) : 1 \le r_1 < r_2 \le R$.

Proposed Research Plan

My approach to the motion learning problem is summarized in the following steps. First, the demonstrated motion trajectories are checked for statistically important characteristics by detecting patterns in the position $x$ or orientation $\theta$ relations between the robot end effector $\mathit{eff}$ and the objects $\mathit{obj}$ in the environment. For example, the action "put ball in bin" has the characteristic that when the robot releases the ball, the position of the robot end effector is most likely above the bin, regardless of initial locations. Characteristic relations are checked at points (denoted $\alpha, \beta, \ldots$) at which the robot gains or loses contact with an object, and specifically include relations that are absolute (e.g., $x^{\alpha}_{\mathit{obj}}$), relative (e.g., $x^{\alpha}_{\mathit{obj}} - x^{\beta}_{\mathit{obj}}$), or relative to an object (e.g., $x^{\alpha}_{\mathit{eff}} - x^{\alpha}_{\mathit{obj}}$). Next, the trajectories are spatially normalized through centering and rotation, and temporally aligned using dynamic time warping (Myers, Rabiner, and Rosenberg 1979).

I introduce a probabilistic version of a flow tube (Hofmann and Williams 2006), derived by computing the mean and covariance of the normalized trajectories. During execution, a flow tube's covariance determines how much penalty is placed on deviating from the mean trajectory.
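The sketch below illustrates this construction, assuming the demonstrations have already been spatially normalized and temporally aligned (e.g., by dynamic time warping) to a common length; the function names and the small regularization term are illustrative choices, not details of the system.

```python
import numpy as np

def learn_flow_tube(demos):
    """Estimate a probabilistic flow tube from N aligned demonstrations.

    demos: list of N arrays of shape (T, D), already spatially normalized
    and temporally aligned so that all share a common duration T.
    Returns the per-timestep mean (T, D) and covariance (T, D, D).
    """
    S = np.stack(demos)              # (N, T, D)
    mu = S.mean(axis=0)              # mean trajectory
    N, T, D = S.shape
    Sigma = np.empty((T, D, D))
    for t in range(T):
        diff = S[:, t, :] - mu[t]
        # A small diagonal term keeps each covariance invertible
        # when only a few demonstrations are available.
        Sigma[t] = diff.T @ diff / max(N - 1, 1) + 1e-6 * np.eye(D)
    return mu, Sigma

def deviation(traj, mu, Sigma):
    """Covariance-weighted deviation of a trajectory from the tube mean;
    the covariance sets how much penalty each timestep's departure incurs."""
    return sum(float(d @ np.linalg.solve(Sigma[t], d))
               for t, d in enumerate(traj - mu))
```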
Finally, since humans do not necessarily scale motions linearly to different situations, how to apply a learned flow tube to a new situation remains an open research problem. I propose to derive a spatial warping function from the demonstrated motion trajectories using Tikhonov regularization (Tikhonov and Arsenin 1977), so that learned probabilistic flow tubes can be appropriately "warped" to new situations. Further research is planned to determine the feasibility and effectiveness of this concept.

In the literature, the approach most similar to mine is that of Muhlig et al. (2009), who represent learned motions as Gaussian mixture models. In my next paper, I will compare the two approaches on how well a learned motion classifies new test motions. Another related work is that of Pastor et al. (2009), in which a demonstrated movement is represented by a set of differential equations that model its dynamics. I hypothesize that my approach will generate more human-like movements.

The second aspect of my research moves beyond learning motions to discovering an operator's implicit plans. The approach is outlined as follows. First, the input command history is examined for recurring subsequence patterns using motif discovery (Chiu, Keogh, and Lonardi 2003). Since the command patterns can be noisy, we need a way to generalize. I propose to use a hybrid planner (Li and Williams 2008) to generate a plan that fulfills the range of end states reached by a command pattern. To ensure that the resulting plan closely resembles the human's intended commands, the planner can be initialized with the command pattern. Finally, to compute the confidence measure of a plan, I first expand all continuous components of the plan into their corresponding probabilistic flow tubes; then, for each trajectory generated by an occurrence of the command pattern, I compute the deviation of the trajectory from the mean flow tube trajectory according to the covariances. The idea of using planning for recognition has been explored in a discrete grid world (Ramirez and Geffner 2009), but to my knowledge it has not been used for learning and recognizing complex hybrid plans.
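Under the same assumptions, the confidence computation might look roughly like the following sketch, which reuses the deviation function from the flow-tube sketch above. The exponential mapping from accumulated deviation to a confidence in (0, 1] is one simple possibility, shown here only for illustration.

```python
import numpy as np

def plan_confidence(pattern_trajs, flow_tubes):
    """Confidence q_k that a recurring command pattern instantiates a plan.

    pattern_trajs: one observed trajectory per continuous component of the
    candidate plan, each of shape (T, D) and aligned to its flow tube.
    flow_tubes: matching list of (mu, Sigma) pairs (see learn_flow_tube);
    deviation() is the covariance-weighted measure defined earlier.
    """
    total = sum(deviation(traj, mu, Sigma)
                for traj, (mu, Sigma) in zip(pattern_trajs, flow_tubes))
    # Map accumulated deviation to (0, 1]: larger deviation, lower confidence.
    return float(np.exp(-total / max(len(flow_tubes), 1)))

# Per Definition 2, a discovered plan is kept (and proposed to the user)
# only if its confidence clears the threshold:
#     if plan_confidence(trajs, tubes) > q_threshold: ...
```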
I plan to measure the performance of the interactive task-plan learning system on three platforms. First, a 2D simulation of an environment with a few simple objects is used for initial testing of the motion learning approach. Second, a few hundred hybrid control sequences recorded from underwater vehicle deployments at MBARI will be used to test implicit plan discovery. Third, through a collaboration with JPL, I will test my system on the ATHLETE robot.

Progress to Date

Last semester, I completed and successfully defended my thesis proposal to my doctoral committee, and I completed all course requirements for my major program of study. Through JPL's Strategic University Research Partnership program, our group received a grant to apply our research to the ATHLETE robot. Last July, I demonstrated a preliminary implementation of motion learning on ATHLETE hardware. In March, I demonstrated the newer capability of parameterized motion learning, in which ATHLETE was shown a few demonstrations of a pick-up and drop-off task. The robot successfully generalized the learned motion to new situations in which the object was initially placed at different locations.

References

Chiu, B.; Keogh, E.; and Lonardi, S. 2003. Probabilistic discovery of time series motifs. In ACM SIGKDD.
Hofmann, A., and Williams, B. 2006. Exploiting spatial and temporal flexibility for plan execution of hybrid, under-actuated systems. In AAAI, 948–955. AAAI Press.
Li, H., and Williams, B. 2008. Generative planning for hybrid systems based on flow tubes. In ICAPS.
Muhlig, M.; Gienger, M.; Hellbach, S.; Steil, J. J.; and Goerick, C. 2009. Task-level imitation learning using variance-based movement optimization. In ICRA.
Myers, C. S.; Rabiner, L. R.; and Rosenberg, A. E. 1979. Performance trade-offs in dynamic time warping algorithms for isolated word recognition. Journal of the Acoustical Society of America 66(S1):S34–S35.
Pastor, P.; Hoffmann, H.; Asfour, T.; and Schaal, S. 2009. Learning and generalization of motor skills by learning from demonstration. In ICRA.
Ramirez, M., and Geffner, H. 2009. Plan recognition as planning. In IJCAI.
Tikhonov, A. N., and Arsenin, V. Y. 1977. Solutions of Ill-Posed Problems. Washington: Winston.