
Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10)
Interactive Task-Plan Learning
Shuonan Dong
Massachusetts Institute of Technology
77 Massachusetts Ave. 32-224, Cambridge, MA 02139
dongs@mit.edu
Abstract
Low-level direct commanding of space robots can be time-consuming or impractical for complex systems
with many degrees of freedom. My research will adaptively raise the level of interaction between the operator
and the robot by (1) allowing the robot to learn implicit
plans by detecting patterns in the interaction history,
and (2) enabling the human to demonstrate continuous
motions through teleoperation. Learned tasks and plans
are recorded for future use. I introduce a novel representation of continuous actions called parameterized probabilistic flow tubes that I hypothesize will more closely
encode a human’s intended motions and provide flexibility during execution in new situations. I also introduce the use of planning for plan recognition in the domain of hybrid tasks.
Introduction
Today, most space robots are directly commanded from
Earth by hand. While manageable for simple robots, this
control scheme is tedious when controlling robots with
many degrees of freedom, such as JPL's ATHLETE (All-Terrain Hex-Limbed Extra-Terrestrial Explorer), which has
36 independent joints. Currently, operators can either use
low-level joint angle commands or choose among a handful
of pre-programmed higher-level tasks. Attempting to pre-program all possible tasks is unreasonable for such a complex robot. Instead, I present a learn-as-you-go approach,
which will adaptively raise the level of interaction between
the operator and the robot according to the operator’s level
of comfort.
The interactive task-plan learning system will operate in
two modes: task commanding and motion learning. Task
commanding mode is the default, where a user can specify desired commands. If a command is new to the robot,
i.e., not in the plan library, the robot can prompt the user to
demonstrate the new task. The system then switches to motion learning mode, where the user can specify lower-level
commands or demonstrate samples of a continuous motion
through teleoperation. A key point is that tasks are composed of hybrid (discrete or continuous) actions. The newly
learned task is added to the plan library for future use.

Operators are not required to explicitly specify which task they are trying to accomplish. Instead, the task commanding interface passively monitors the interaction history for command patterns that could indicate implicit plans the user is trying to perform. If the robot becomes confident enough of an implicit plan, it can ask the user whether the discovered plan should be stored as a new entry in the plan library for future use.
Research Problem
The interactive task-plan learning system encompasses two research problems: motion learning and implicit plan discovery, which are summarized as follows:
1. Motion learning: Using multiple human demonstrations of a continuous movement (such as ones captured through teleoperation), learn a generalized representation of the motion that can be applied to new situations.
2. Implicit plan discovery: Using the execution history of commands, infer the most likely set of implicit plans the user has performed.
We now set up the problem more formally. We define a plan library as $P = \{p_i\}$, where a plan is recursively composed from primitive activities or lower-level plans: $p = a \mid composition(p_i, p_j)$, and $composition = sequence \mid parallel \mid choice$.
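As a minimal illustration of this recursive structure (the class and field names here are my own, not part of the system), the plan library maps onto a small algebraic data type:

from dataclasses import dataclass
from typing import List, Union

@dataclass
class Activity:
    """A primitive activity a."""
    name: str

@dataclass
class Composition:
    """composition(p_i, p_j, ...) of sub-plans."""
    kind: str                  # "sequence" | "parallel" | "choice"
    children: List["Plan"]

Plan = Union[Activity, Composition]

# Example plan library P = {p_i} with one sequential plan.
plan_library = [Composition("sequence", [Activity("grasp"), Activity("lift")])]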
The motion learning problem is stated as follows:
Definition 1 Given $B$ objects or points of interest in the environment, the human performs $N$ demonstrations of a motion, where each demonstration is recorded as the time-evolved states of all objects and points of interest in the environment, i.e., $S_n = [s_{t_1}, \ldots, s_{t_2}]^T : n = 1 \ldots N$. We define $R = \{r_j\}$ as the set of statistically important motion parameters obtained from the location $x$ and orientation $\theta$ of an object and robot end effector upon change of contact. The motion learning problem is to determine a parameterized probabilistic flow tube $\langle \mu(s_{t_1}), \Sigma(s_{t_1}) \rangle$ that best describes the learned motion.
The implicit plan discovery problem is stated as follows:
Definition 2 Given a plan library $P = \{p_i\}$, command history $C = (c_1, \ldots, c_R)$, and positive real value $q_{threshold}$, $M = \{m_k\}$ is a set of implicit plans with corresponding confidence measures $Q = \{q_k \mid q_k > q_{threshold}\}$, such that each implicit plan either exists in the plan library, $m_k \in P$, or is composed of elements in a subsequence of the command history, $m_k = composition(c_{r_1}, \ldots, c_{r_2}) : 1 \leq r_1 < r_2 \leq R$.
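Operationally, Definition 2 asks for the set M together with its confidence measures Q. The following minimal sketch (hypothetical names; the real system must also generate the candidate plans and their confidences) shows only that output contract:

def discover_implicit_plans(candidates, q_threshold):
    """candidates: (plan, confidence) pairs, where each plan is either a
    library entry m_k in P or a composition over a subsequence
    c_{r1}, ..., c_{r2} of the command history."""
    M, Q = [], []
    for plan, q in candidates:
        if q > q_threshold:    # keep only sufficiently confident plans
            M.append(plan)
            Q.append(q)
    return M, Q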
Proposed Research Plan
My approach to the motion learning problem is summarized
in the following steps:
First, the demonstrated motion trajectories are checked
for statistically important characteristics by detecting patterns in the position x or orientation θ relations of the robot
end effector eff and objects obj in the environment. For example, an action “put ball in bin” has the characteristic that
when the robot releases the ball, the position of the robot
end effector is most likely above the bin, regardless of initial
locations. Characteristic relations are checked at points (denoted α, β, . . .) when the robot gains or loses contact with
an object, and specifically include relations that are absolute (e.g., $x^{\alpha}_{obj}$), relative (e.g., $x^{\beta}_{obj} - x^{\alpha}_{obj}$), or relative to an object (e.g., $x^{\alpha}_{eff} - x^{\alpha}_{obj}$).
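The following sketch shows one way such a relation check could look, assuming each demonstration records positions at the contact-change points (here called a and b); the data layout and variance threshold are my own illustrative choices:

import numpy as np

def characteristic_relations(demos, var_threshold=1e-2):
    """demos: one dict per demonstration, each holding positions at the
    contact-change points, e.g. keys "x_eff_a", "x_obj_a", "x_obj_b"."""
    candidates = {
        "absolute x_obj^a": lambda d: d["x_obj_a"],
        "relative x_obj^b - x_obj^a": lambda d: d["x_obj_b"] - d["x_obj_a"],
        "to-object x_eff^a - x_obj^a": lambda d: d["x_eff_a"] - d["x_obj_a"],
    }
    found = {}
    for name, rel in candidates.items():
        values = np.stack([rel(d) for d in demos])      # N x dim
        if np.all(values.var(axis=0) < var_threshold):  # consistent across demos
            found[name] = values.mean(axis=0)           # learned parameter r_j
    return found

A relation survives only if it is nearly constant across all N demonstrations, which is exactly what makes “end effector above the bin at release” statistically important in the example above.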
Next, the trajectories are spatially normalized through
centering and rotation and temporally aligned using dynamic
time warping (Myers, Rabiner, and Rosenberg 1979). I introduce a probabilistic version of a flow tube (Hofmann and
Williams 2006) that can be derived by computing the mean
and covariance of the normalized trajectories. During execution, a flow tube’s covariance determines how much penalty
is placed on deviating from the mean trajectory.
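A minimal sketch of this construction, assuming the demonstrations have already been spatially normalized and time-aligned to a common length T (shapes and names are illustrative):

import numpy as np

def probabilistic_flow_tube(trajectories):
    """trajectories: (N, T, dim) array of normalized, aligned demos (N >= 2).
    Returns the mean trajectory and a covariance matrix per time step."""
    X = np.asarray(trajectories)
    mu = X.mean(axis=0)                                  # (T, dim)
    sigma = np.stack([np.cov(X[:, t, :], rowvar=False)   # (T, dim, dim)
                      for t in range(X.shape[1])])
    return mu, sigma

At execution time, sigma[t] plays the role of the penalty weight: directions in which the demonstrations varied widely are penalized less than directions in which they agreed.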
Finally, since humans do not necessarily scale motions
linearly to different situations, how to apply a learned flow
tube to a new situation remains an open research problem. I propose to derive a spatial warping function from
the demonstrated motion trajectories using Tikhonov regularization (Tikhonov and Arsenin 1977) methods, so that
the learned probabilistic flow tubes can be appropriately
“warped” to new situations. Further research is planned to
determine the feasibility and effectiveness of this concept.
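One concrete (and deliberately simple) way to realize such a warp is ridge regression over corresponding anchor points in the old and new situations; the linear form below is my own assumption, since the actual warping function remains open:

import numpy as np

def fit_tikhonov_warp(src, dst, lam=1e-2):
    """src, dst: (K, dim) corresponding anchor points (e.g. object poses
    at contact changes). Finds A, b minimizing
    ||src @ A + b - dst||^2 + lam * ||params||^2 (Tikhonov regularization)."""
    K, dim = src.shape
    S = np.hstack([src, np.ones((K, 1))])                 # homogeneous coords
    P = np.linalg.solve(S.T @ S + lam * np.eye(dim + 1), S.T @ dst)
    return P[:dim], P[dim]                                # A: (dim, dim), b: (dim,)

def warp_trajectory(traj, A, b):
    return traj @ A + b    # apply the warp to a flow tube's mean trajectory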
In the literature, the most similar approach to mine is by
Mühlig et al. (2009), who represented learned motions as Gaussian mixture models. In my next paper, I will compare the performance of our respective approaches based on
how well a learned motion is able to classify new test motions. Another related work is by Pastor et al. (2009), where
a demonstrated movement is represented by a set of differential equations that model the dynamics. I hypothesize that
my approach will generate more humanlike movements.
The second aspect of my research moves beyond learning
motions into discovering an operator’s implicit plans. The
approach is outlined as follows: First, the input command
history is examined for recurring subsequence patterns using
motif discovery (Chiu, Keogh, and Lonardi 2003). Since the
command patterns can be noisy, we need a way to generalize. I propose to use a hybrid planner (Li and Williams 2008)
to generate a plan that fulfills the range of end states reached
by a command pattern. To ensure the resulting plan will
closely resemble a human's intended commands, the planner can be initialized with the command pattern. Finally, to compute the confidence measure of a plan, I first expand all continuous components in the plan into their corresponding probabilistic flow tubes; then, for each trajectory generated by an occurrence of the command pattern, I compute the deviation of the trajectory from the mean flow tube trajectory according to the covariances (a minimal sketch of this computation appears at the end of this section).

The idea of using planning for recognition has been explored in a discrete grid world (Ramirez and Geffner 2009), but to my knowledge it has not been used for learning and recognizing complex hybrid plans.

I plan to measure the performance of the interactive task-plan learning system on three platforms. First, a 2D simulation of an environment with a few simple objects is used for initial testing of the motion learning approach. Second, a few hundred hybrid control sequences recorded from underwater vehicle deployments at MBARI will be used to test implicit plan discovery. Third, through a collaboration with JPL, I will test my system on the ATHLETE robot.
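The promised sketch of the confidence computation follows, with illustrative names and an assumed exponential mapping of the average Mahalanobis deviation into (0, 1]:

import numpy as np

def plan_confidence(traj, mu, sigma, eps=1e-6):
    """traj, mu: (T, dim) aligned trajectories; sigma: (T, dim, dim)
    covariances of the plan's probabilistic flow tube."""
    T, dim = traj.shape
    cost = 0.0
    for t in range(T):
        d = traj[t] - mu[t]
        inv = np.linalg.inv(sigma[t] + eps * np.eye(dim))  # regularized inverse
        cost += float(d @ inv @ d)                         # Mahalanobis deviation
    return float(np.exp(-cost / T))   # q_k, compared against q_threshold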
Progress to Date
Last semester, I completed and successfully defended my
thesis proposal to my doctoral committee, and completed all
course requirements for my major program of study.
Through JPL’s Strategic University Research Partnership
program, our group received a grant to apply our research
to the ATHLETE robot. Last July, I demonstrated a preliminary implementation of motion learning on ATHLETE
hardware. In March, I demonstrated the newer capabilities
of parameterized motion learning, during which ATHLETE
was shown a few demonstrations of a pick-up and drop-off task. The robot was able to successfully generalize the
learned motion to new situations where the object was initially placed in different locations.
References
Chiu, B.; Keogh, E.; and Lonardi, S. 2003. Probabilistic
discovery of time series motifs. In ACM SIGKDD.
Hofmann, A., and Williams, B. 2006. Exploiting spatial
and temporal flexibility for plan execution of hybrid, underactuated systems. In AAAI, 948–955. AAAI Press.
Li, H., and Williams, B. 2008. Generative planning for
hybrid systems based on flow tubes. In ICAPS.
Mühlig, M.; Gienger, M.; Hellbach, S.; Steil, J. J.; and Goerick, C. 2009. Task-level imitation learning using variance-based movement optimization. In ICRA.
Myers, C. S.; Rabiner, L. R.; and Rosenberg, A. E. 1979.
Performance trade-offs in dynamic time warping algorithms
for isolated word recognition. Journal of the Acoustical Society of America 66(S1):S34–S35.
Pastor, P.; Hoffmann, H.; Asfour, T.; and Schaal, S. 2009.
Learning and generalization of motor skills by learning from
demonstration. In ICRA.
Ramirez, M., and Geffner, H. 2009. Plan recognition as
planning. In IJCAI.
Tikhonov, A. N., and Arsenin, V. Y. 1977. Solutions of ill-posed problems. Washington: Winston.