AAAI Technical Report SS-12-02
Designing Intelligent Robots: Reintegrating AI
Towards Enabling a Robot to Effectively Assist People
in Human-Occupied Environments
Sachithra Hemachandra and Ross Finman and Sudeep Pillai and
Seth Teller and Matthew R. Walter
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
Cambridge, MA, USA
{sachih,rfinman,spillai,teller,mwalter}@csail.mit.edu
Abstract
Over the last decade, we have seen an increasing demand
for robots capable of coexisting with people. Enabling robots
to operate safely and effectively alongside human partners
within unstructured environments poses interesting research
challenges that broadly span the field of artificial intelligence.
This paper previews some of the challenges that we faced in
developing a robot “envoy” that operates for extended periods of time throughout an office-like environment, assisting
human occupants with everyday activities that include greeting and escorting guests as well as retrieving and delivering
objects. We see three skill areas as critical for the robot to
effectively perform these tasks. The first is shared situational
awareness—the robot must interpret its environment through
a world model that is shared with its human partners. Secondly, the robot should act in a safe, predictable manner and
be capable of intuitive interaction with people, through such
means as natural language speech. Thirdly, the robot, which
we initially treat as a rookie, should efficiently utilize information provided by human partners, requesting assistance
when necessary and learning from such assistance to become
more competent with time.
Introduction
Traditionally seen as tools used in isolation for industrial
automation, robots are becoming more pervasive within unstructured human-occupied environments. Our lab is currently developing a robot that is able to maintain a persistent presence alongside people in a workplace setting. The
Envoy Project is an effort to enable a mobile robot to serve
as an office assistant that works alongside people in a collaborative and socially acceptable manner. One task that the
robot would perform in this role is that of interacting with
lab visitors. The robot is currently capable of detecting and
interacting with visitors and can interpret and execute spoken commands that refer to escorting the user across floors
to their destination. Additionally, we would like the robot to
be able to retrieve known objects based upon a simple set
of directions, such as fulfilling the spoken request to “please
bring the blue notebook that I left on my desk.”
Copyright © 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

In this paper, we address three research areas that we feel are important to ensuring that the Envoy robot works effectively with and is accepted by people. First, the robot must exhibit shared situational awareness, which means that it interprets its environment, objects, and events in a manner that
is similar to our own world model and that the robot makes
its model transparent to the user. Secondly, the robot should
be capable of intuitive interaction with people that includes
the ability to understand natural language speech. Its actions
should be transparent and it should operate in ways that
are predictable and socially acceptable. Thirdly, the robot
should be able to request assistance when necessary so as to
avoid failure and utilize information that is provided as efficiently as possible. We treat the robot as initially being a
“rookie” that cannot perform certain tasks on its own (e.g.,
pushing elevator call buttons), but that can request assistance
and, in the case of current work, learn to become more competent with time. In developing these capabilities, we take
advantage of recent advancements in perception, navigation,
motion planning, speech interpretation, and learning. Effectively integrating these potentially competing methods into
a common framework is one of the challenges we face.
Figure 1: The Envoy robot platform.

Our platform (Fig. 1) consists of an electric wheelchair with three LIDARs, a Microsoft Kinect camera and boom microphone mounted to a panning head, an IMU, a barometric pressure sensor, a pair of speakers, and a laptop.
Research Challenges
We discuss our recent progress towards developing the
aforementioned capabilities for the Envoy platform and outline directions for future research.
Situational Awareness
Shared situational awareness on the part of the robot implies
the capacity to reason about its surroundings and actions using a model that is transparent to its human partners.
The robot maintains an occupancy map that it uses for
metric localization and a multi-floor semantic map of the environment. The semantic map consists of a coarse topology
that includes elevator locations, passageways, and rooms
bound to their colloquial names. The robot generates this
map through a guided tour scenario (Hemachandra et al.
2011), in which it autonomously follows a person throughout the building, learning the representation as they narrate.
We are currently working on a framework that allows people to convey richer semantic maps that model locations, objects, and their affordances through an online tour that emphasizes natural language speech.
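As a rough illustration of the kind of representation involved, the sketch below shows a minimal multi-floor topological semantic map in Python: places (rooms, passageways, elevators) are nodes bound to the colloquial names heard during the tour, with edges recording traversability. The class and field names (SemanticMap, Place, add_place, and so on) are our own illustrative choices, not the data structures used on the Envoy robot.

# Minimal sketch of a topological semantic map built from a narrated tour.
# Names, fields, and the example entries are illustrative assumptions only.
from dataclasses import dataclass, field

@dataclass
class Place:
    place_id: int
    kind: str                                    # e.g. "room", "passageway", "elevator"
    floor: int
    names: list = field(default_factory=list)    # colloquial names from the narration

class SemanticMap:
    def __init__(self):
        self.places = {}
        self.edges = set()                       # undirected pairs of place ids

    def add_place(self, place_id, kind, floor, names=()):
        self.places[place_id] = Place(place_id, kind, floor, list(names))

    def connect(self, a, b):
        self.edges.add(frozenset((a, b)))

    def lookup(self, name):
        """Return places whose colloquial names match a spoken reference."""
        return [p for p in self.places.values()
                if any(name.lower() in n.lower() for n in p.names)]

# Example built from tour utterances such as "this is the kitchen".
smap = SemanticMap()
smap.add_place(0, "room", 3, ["kitchen"])
smap.add_place(1, "elevator", 3, ["elevator lobby"])
smap.connect(0, 1)
print([p.place_id for p in smap.lookup("kitchen")])   # -> [0]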
In addition to recognizing modeled objects by name, we
believe the robot also needs to identify and track people
whether they are users or bystanders. Through color and
depth segmentation (Felzenszwalb and Huttenlocher 2004)
and learned geometric features, the robot detects people using the panning Kinect camera, and tracks them as they
move around the robot. The system also uses its LIDARs
to independently track users and bystanders.
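To make the detection idea concrete, the following sketch scores candidate segments from a color/depth segmentation stage with simple geometric features (height above the ground plane, width, aspect ratio). The segment format, features, and thresholds here are assumptions made for illustration; they stand in for the learned geometric features used on the robot rather than reproducing them.

# Toy person detector over pre-segmented depth regions; thresholds are
# placeholder values, not the robot's learned classifier.
def person_score(segment):
    height = segment["max_z"] - segment["min_z"]      # metres above the floor
    width = segment["max_x"] - segment["min_x"]
    aspect = height / max(width, 1e-3)
    score = 0.0
    if 1.2 <= height <= 2.1:
        score += 0.5
    if 0.3 <= width <= 0.9:
        score += 0.25
    if 1.5 <= aspect <= 5.0:
        score += 0.25
    return score

def detect_people(segments, threshold=0.75):
    return [s for s in segments if person_score(s) >= threshold]

# Example: one person-sized segment and one table-sized segment.
segments = [
    {"min_x": 0.0, "max_x": 0.5, "min_z": 0.0, "max_z": 1.7},
    {"min_x": 0.0, "max_x": 1.8, "min_z": 0.0, "max_z": 0.8},
]
print(len(detect_people(segments)))   # -> 1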
Uncertainty Mitigation and Learning

A fundamental challenge to operating in unstructured environments is dealing with uncertainty. We treat the robot
much like a rookie, recognizing that there are actions that it
may not yet have the competency to perform, such as summoning elevators, or tasks that can be made simpler with
additional information, such as identifying a particular object among clutter. In an effort to improve performance, we
enable the robot to request help from and judiciously use
information provided by the human. For example, until the
robot is able to push elevator buttons, it asks the user to summon the elevator, hold the door, and push the relevant floor at
the appropriate time. The challenge we face when generalizing this capability is deciding when assistance is required
and what assistance is most helpful.
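One simple way to frame the "when to ask" decision is as a comparison of expected costs: request help whenever the expected cost of attempting the task alone (failure probability times failure cost) exceeds the cost of interrupting the person. The sketch below illustrates that rule; the probabilities and costs are made-up placeholders, and the paper leaves open how such quantities should actually be estimated.

# Toy decision rule for requesting assistance; all numbers are illustrative.
def should_request_help(p_success, cost_of_failure, cost_of_interruption):
    expected_cost_alone = (1.0 - p_success) * cost_of_failure
    return expected_cost_alone > cost_of_interruption

# The robot cannot yet push elevator buttons, so p_success is effectively zero
# and asking the user to summon the elevator is clearly worthwhile.
print(should_request_help(p_success=0.0, cost_of_failure=10.0, cost_of_interruption=1.0))  # True

# Identifying an object in light clutter may not justify interrupting.
print(should_request_help(p_success=0.9, cost_of_failure=5.0, cost_of_interruption=1.0))   # False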
While we initially allow the robot to ask the user to open
doors and operate elevators, our goal is for it to become more
competent and independent as it develops. Learning from
demonstration (Argall et al. 2009) provides an approach to
acquiring new skills by observing a human repeatedly perform a series of tasks. One focus of our current research is
to extend the guided-tour scenario to manipulation, whereby
the robot opportunistically observes and learns from people
manipulating objects in the world, e.g., picking things up
and pushing elevator buttons. Among the many challenges is that
of being able to generalize the state/action sequences in the
context of the human to those that are consistent with the
robot’s perceptual and mobile manipulation capabilities. In
order to make this goal more tractable, we propose a strategy
that builds on work in natural language interpretation (Tellex
et al. 2011) to allow the user to provide supervision by using
speech to ground objects, actions, and causal relationships.
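As a minimal sketch of how such narrated demonstrations might be stored, the code below logs each observed manipulation as a state/action sequence keyed by the verb and object grounded from the narrator's speech, so that relevant demonstrations can later be retrieved for a new task. The structures and names (Demonstration, DemonstrationLibrary) are our own assumptions, not the system described here.

# Illustrative store of narrated demonstrations; format is an assumption.
from dataclasses import dataclass, field

@dataclass
class Demonstration:
    verb: str                                          # e.g. "push"
    grounded_object: str                               # e.g. "elevator call button"
    trajectory: list = field(default_factory=list)     # (state, action) pairs

class DemonstrationLibrary:
    def __init__(self):
        self.demos = []

    def record(self, verb, grounded_object, trajectory):
        self.demos.append(Demonstration(verb, grounded_object, list(trajectory)))

    def retrieve(self, verb, grounded_object):
        """Look up prior demonstrations relevant to a new task."""
        return [d for d in self.demos
                if d.verb == verb and d.grounded_object == grounded_object]

library = DemonstrationLibrary()
library.record("push", "elevator call button",
               [({"hand_xyz": (0.4, 0.1, 1.2)}, "reach"),
                ({"hand_xyz": (0.5, 0.1, 1.2)}, "press")])
print(len(library.retrieve("push", "elevator call button")))   # -> 1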
Human-Robot Interaction
In order for the robot to interact effectively with visitors, we
believe that it should be able to interpret task-constrained
natural language speech and synthesize responses. The ability to ground speech to the corresponding objects, locations,
and actions affords richer interactions (Walter et al. 2010;
Tellex et al. 2011). Situational awareness is a critical factor
in enabling this rich interaction.
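The deliberately simplified sketch below illustrates what grounding buys us: a spoken command is matched against the robot's world model to recover the referenced location. This keyword lookup is only a stand-in for the probabilistic grounding models cited above (Walter et al. 2010; Tellex et al. 2011), and the world-model entries are invented for the example.

# Keyword-based stand-in for language grounding; entries are illustrative.
WORLD_MODEL = {
    "conference room": {"floor": 3, "place_id": 7},
    "kitchen": {"floor": 3, "place_id": 2},
    "elevator lobby": {"floor": 3, "place_id": 5},
}

def ground_destination(command):
    """Return the world-model entry referenced by the command, if unambiguous."""
    text = command.lower()
    matches = [place for name, place in WORLD_MODEL.items() if name in text]
    return matches[0] if len(matches) == 1 else None

print(ground_destination("Please take me to the conference room"))
# -> {'floor': 3, 'place_id': 7}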
We believe it is equally important that the robot operates
safely and that its actions be transparent to bystanders. People are more willing to accept the robot in their presence
if its actions are predictable and adhere to social conventions. This means, for example, that the robot should follow
predictable paths through the environment, a challenge for
sample-based motion planning algorithms that, while being
very good at identifying feasible paths, often result in non-obvious solutions. To address this, we make the assumption that routes that are obvious to people are also efficient, and implement an RRT*-based planner (Karaman et al. 2011) to identify optimal paths.
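The sketch below illustrates the cost criterion behind this choice, not the planner itself: among feasible routes, the robot commits to the one minimizing a cost that people also tend to prefer (here, simple Euclidean path length), which is the quantity an asymptotically optimal planner such as RRT* drives down. It is a toy comparison of candidate paths, not an implementation of RRT* (Karaman et al. 2011).

# Toy illustration of preferring the shortest feasible route.
import math

def path_length(path):
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def pick_predictable_path(candidate_paths):
    """Choose the shortest feasible path, on the assumption that efficient
    routes are also the ones bystanders expect the robot to take."""
    return min(candidate_paths, key=path_length)

direct = [(0, 0), (5, 0), (10, 0)]
meandering = [(0, 0), (3, 4), (6, -3), (10, 0)]
print(pick_predictable_path([direct, meandering]) is direct)   # True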
It is also important that the robot adheres to social conventions that dictate how people interact, such as navigating down the right side of a corridor or letting people out
of an elevator before entering. Some challenges we’ve faced
include defining or learning these conventions and recognizing when they should govern the robot’s actions. Hard-coded
policies provide a partial solution in some cases (e.g., waiting to enter an elevator until people stop moving), but the
robot should also be capable of predicting the behavior of
people in its surroundings. Existing work in motion pattern
modeling (Joseph et al. 2011) can be incorporated within a
planning framework (Aoude et al. 2010) to support this behavior.
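As a small example of what a hard-coded convention looks like in practice, the sketch below encodes the elevator case: wait at the threshold until tracked people inside the car have stopped moving, then enter. The track format and speed threshold are assumptions made for this example, and such fixed rules are exactly what we would like prediction-based planning to eventually subsume.

# Hard-coded elevator-entry convention; track fields and threshold are assumed.
def people_still_exiting(tracks, speed_threshold=0.15):
    """Each track is a dict with a 'speed' (m/s) and an 'in_elevator' flag."""
    return any(t["in_elevator"] and t["speed"] > speed_threshold for t in tracks)

def elevator_entry_action(tracks):
    return "wait" if people_still_exiting(tracks) else "enter"

print(elevator_entry_action([{"in_elevator": True, "speed": 0.9}]))    # wait
print(elevator_entry_action([{"in_elevator": False, "speed": 1.2},
                             {"in_elevator": True, "speed": 0.0}]))    # enter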
Conclusion
We have outlined some areas of research that we believe are
important if our Envoy robot is to work safely and effectively
alongside humans. We briefly described our current research
and outlined directions for future research.
References
Aoude, G.; Luders, B.; Levine, D.; and How, J. 2010. Threat-aware path planning in uncertain urban environments. In IROS.
Argall, B.; Chernova, S.; Veloso, M.; and Browning, B. 2009. A
survey of robot learning from demonstration. Robotics and Autonomous Systems 57(5).
Felzenszwalb, P., and Huttenlocher, D. 2004. Efficient graph-based
image segmentation. IJCV 59(2).
Hemachandra, S.; Kollar, T.; Roy, N.; and Teller, S. 2011. Following and interpreting narrated guided tours. In ICRA.
Joseph, J.; Doshi-Velez, F.; Huang, A.; and Roy, N. 2011. A
Bayesian nonparametric approach to modeling motion patterns.
Autonomous Robots.
Karaman, S.; Walter, M.; Perez, A.; Frazzoli, E.; and Teller, S.
2011. Anytime motion planning using the RRT*. In ICRA.
Tellex, S.; Kollar, T.; Dickerson, S.; Walter, M.; Banerjee, A.;
Teller, S.; and Roy, N. 2011. Understanding natural language commands for robotic navigation and mobile manipulation. In AAAI.
Walter, M.; Friedman, Y.; Antone, M.; and Teller, S. 2010. Vision-based reacquisition for task-level control. In ISER.