Task Planning using a Semantic Map and Human Feedback
Kalesha Bullard, Ashley Edwards, Gabino Dabdoub, and Salim Dabdoub

Abstract— Robotic assistants are becoming increasingly useful
in many different domains, such as in the healthcare industry
for assisting with elderly or disabled patients, in outer space for
scientific exploration purposes, in the home for personal
assistance, and in an office setting. If robots are to act as
assistants or companions to humans, a human should be able to
ask a robot to provide support by performing some higher-level
task, just as they would ask another human, and the robot
should be able to respond accordingly. Our project seeks to
build an intelligent reasoning and planning framework that uses a
semantic map as the underlying knowledge representation and
integrates human-robot interaction in order to minimize the
robot’s uncertainty. A robot assistant is given a higher-level task to achieve by a human. We explore how a semantic
map may be used, along with human feedback, in order to help
the robot interpret its task, reason about how to achieve the
task in the given environment, and subsequently plan and
execute the task effectively. The robot re-plans dynamically as
it acquires new information.
I. INTRODUCTION
In interpersonal interactions, a person often asks another
person to perform some task, commonly referred to as doing a
favor. The favor may be asking the other person to retrieve
an item or to deliver a message to another person for them. If
robots are to act as assistants or companions to humans, a
human should be able to ask a robot to provide support by
performing some arbitrary task, just as they would ask
another human. In order to begin reasoning about its task(s),
the robot should have sufficient knowledge about the task(s) it
is being asked to perform as well as a working knowledge of
the specific domain and immediate environment it is
functioning in. However, like a human, a robot does not have
complete knowledge about the world, and its sensing
mechanisms are even more limited. Therefore, it should be
able to seek feedback from humans in order to help it make
sense of its world, enhance its knowledge base, and ultimately
achieve its task.
We consider the domain of an office building,
specifically the Robotics and Intelligent Machines (RIM)
Center in the College of Computing at Georgia Tech. The
robot plays the role of assistant to a human supervisor, who
may give the robot several tasks to achieve at any given point
in time. Given multiple higher-level tasks assigned by a
human, we seek to address how a semantic map can be used
with human feedback to help the robot interpret its task(s),
reason about how to achieve the tasks in the given
environment, and subsequently plan and execute each task
effectively.
II. RELATED WORK
A. Semantic Mapping
Semantic mapping integrates semantic domain
knowledge into traditional robot maps. It is a young
research topic in mobile robotics and holds much
promise for improving a robot’s autonomous reasoning
capabilities and interaction with humans. Galindo et al.
give a good introduction to semantic maps for planning
with robots [1].
It is often useful to automatically generate these
maps. For example, Nüchter and Hertzberg [3] introduce
a method for using a 3D sensor to generate a semantic
map. Nieto-Granda et al.’s work automatically
determines regions for semantic maps.
These approaches are useful for mapping unknown
environments. However, because we already have a floor
plan of the environment the robot will be acting in, it is
equally beneficial for our purposes to generate the map
manually.
Christensen et al.’s work [5] describes how humans
and cognitive robots can interact to perform
tasks. We are interested in using the information in
semantic maps to reduce the number of queries a robot
needs to ask a human when attempting to solve a task. In
this case, the origin of the map is not as important as the
amount of information these maps contain. More
information within the semantic map will reduce a
robot’s uncertainty about a task.
Furthermore, manually creating a map may be
useful and natural for non-experts. For example, one
could possibly upload a floor plan of their home into the
robot, manually divide the home into areas, and then
communicate with the system about the types of objects
that the area contains. Then, if the human has an object
that they would like the robot to retrieve, they could
simply tell it what room it was contained in, or what it
was located by. This type of interaction can be used in
interactive planning.
B. Active Learning and Interactive Planning
In interactive planning, robots are able to utilize help
from humans to solve a task. Rosenthal and Veloso
introduce a robot that is capable of navigating around its
environment and asking for a human’s help in achieving
some physical task, such as pressing a button on an
elevator [2]. Interactive planning and interactive learning
in general are similar to the field of Active Learning [4],
where an autonomous agent attempts to ask humans
about unknown parameters in its environment.
Humans, however, are a limited resource. They
could become bored from the robot asking them
questions, not know the answer to a question, or not be
in proximity to the robot. Therefore, we wish to combine
the power of semantic maps with interactive planning to
allow the robot to infer answers about its environment,
thus limiting the number of questions it needs to ask the
humans.
III. APPROACH
Figure 1. High-Level System Architecture
Figure 2. Artifact Ontology
Our goal was to design a system capable of reasoning
intelligently about what it should expect to find within its
environment (i.e. an office setting) and how it may leverage
the knowledge about its domain to aid it in planning and
executing tasks or asking questions when it is faced with a
problem that it cannot solve.
A. Semantic Map Representation
The semantic map representation uses domain-specific
ontologies in order to assign meaning to areas on a map.
Each annotation or label placed on an area of the map
conveys information about that area that the robot may
retrieve whenever it needs to reason, plan, learn, or make a
query. Our semantic map relates two different types of
concepts: areas and artifacts. Figs. 2 and 3 show the
ontologies.
Each of the two ontologies has concepts classified within a
hierarchy, which defines the relationships between the
concepts. We have kept our ontology relatively simple, due
to time constraints. Each area is defined by key artifacts
typically found within it. We selected specific artifacts that
we believe to be more easily distinguishable, so as not to clutter
the robot’s brain or sensors with too much detail. An artifact
is considered to be an atomic element in the map, meaning
that we do not consider properties or subcomponents of
artifacts. We assume that our robot is navigating the RIM
Center. Each area represents potential space that he may
occupy as he mobilizes, and each artifact represents items
that he may observe in those areas.
Figure 3. Area Ontology
For example, an office is a type of area found within our
semantic map. An office, as we define it, is typically
characterized by having a PC, desk, chair, books, papers,
bookshelf, and a board. Each artifact that comprises an area
is assigned a fuzzy likelihood: high, moderate, or low. This
likelihood represents the likelihood of finding an instance of
that artifact in an instance of that type of area. In the
example of the Office type, PC, Desk, Chair, Paper, and
Book have all been assigned a high likelihood of being
located in an Office. This conveys to the robot that if it is
searching for any one of those artifacts, an office may be a
good place to look because it has a high likelihood of
containing one of those items. Bookshelf has been assigned
a moderate likelihood, and Board has been assigned a low
likelihood. By assigning likelihoods, we are able to consider
even those artifacts that are seldom found within an area.
This way, the robot is made aware of all of its choices and is
able to make an informed decision.
Figure 4. Subset of RIM Floor Plan used to build Semantic Map
Figure 5. BDI Framework
These values may of course be modified, but they serve as
ground truth for the robot, meaning that he assumes them to
be true and reasons based on this information.
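For concreteness, a minimal sketch of how such area-to-artifact likelihoods could be encoded is shown below. The structure and helper names are illustrative rather than taken verbatim from our implementation; only the Office entry reflects the example above.

```python
# Illustrative sketch of the area/artifact knowledge used by the reasoner.
# Likelihood labels mirror the fuzzy values described in the text.
AREA_ARTIFACTS = {
    "Office": {
        "PC": "high", "Desk": "high", "Chair": "high",
        "Paper": "high", "Book": "high",
        "Bookshelf": "moderate", "Board": "low",
    },
    # ... other area types (Cubicle, Library, BreakArea, ...) defined similarly
}

def likelihood(area_type, artifact):
    """Fuzzy likelihood of finding `artifact` in an area of type `area_type`."""
    return AREA_ARTIFACTS.get(area_type, {}).get(artifact, "none")

def candidate_area_types(artifact):
    """Area types in which the artifact has any chance of being found."""
    return [a for a, arts in AREA_ARTIFACTS.items() if artifact in arts]
```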
We also incorporate other properties into specific
instances of areas, as opposed to a general type of area. One
important attribute is that an area may have a person assigned
to it. For example, office1 is assigned to Josie, whereas
office4 is assigned to Andrea. If the robot is looking for a
specific person who has an office or cubicle assigned to
him/her, the robot should expect that this is a good starting
place to check before searching other places where the person
may be located. When a person’s name is assigned to an
area, he/she is also associated with some relative likelihood
of being found in that area. This will differ for different
people because it depends on how much time the person
spends in their office/cubicle. Another property associated
with an area is the set of observations made in that area. In our system,
observations do not have likelihoods associated with them.
They are used in lieu of a real robot observing
artifacts/people in its environment using its sensors.
When the robot is tasked with finding some object or
person and it enters an area, it makes observations in order to
determine if it has found what it is seeking. If so, it has
achieved one of its goals and may progress forward with
remaining goals. If not, it must continue searching until it
finds what it seeks.
The other attributes assigned to an area are functional in
nature and are used to enable the robot to plan how to
navigate through the office building. We have encoded our
semantic map representation using an undirected graph,
where areas on the map represent nodes and two nodes are
connected by an edge if it is possible to directly navigate
from one to the other. Even though two area nodes are
adjacent to one another, they may not share an edge if they
are separated by a wall and there is no way to navigate
directly between them. Fig 4 shows an image of the portion
of the RIM Center we used for our semantic map. The map
has been divided into a grid, where each area node has an
(x,y) position and possible entrances through which to enter
and exit. If two nodes are vertically adjacent and the top
node has a southern entrance and the bottom node has a
northern entrance, then they will be connected by an edge.
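A sketch of how the edges could be derived from the gridded floor plan is given below. The node format and direction convention (y increasing toward the north) are assumptions made for the example, but the connection rule is the one just described.

```python
# Sketch: derive undirected edges between area nodes from grid positions and
# entrance directions. Assumed node format: {"pos": (x, y), "entrances": {...}},
# with y increasing toward the north.
def build_edges(areas):
    edges = set()
    names = sorted(areas)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (ax, ay), (bx, by) = areas[a]["pos"], areas[b]["pos"]
            if ax == bx and abs(ay - by) == 1:          # vertically adjacent
                top, bottom = (a, b) if ay > by else (b, a)
                if "S" in areas[top]["entrances"] and "N" in areas[bottom]["entrances"]:
                    edges.add((a, b))
            elif ay == by and abs(ax - bx) == 1:        # horizontally adjacent
                west, east = (a, b) if ax < bx else (b, a)
                if "E" in areas[west]["entrances"] and "W" in areas[east]["entrances"]:
                    edges.add((a, b))
    return edges
```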
B. Reasoner
Choice of representation for the semantic map is a primary
factor in how the robot retrieves, acquires, and reasons about
knowledge. Therefore, this was an extremely important
decision that laid the foundation for how we would need to
design and implement each of our algorithms. The power of
the semantic map representation is that it endows the robot
with the capability to reason about the meaning of concepts
within their spatial context. But if the representation was not
selected carefully, the reasoning could become very arduous
or inefficient. We implemented the system in Python, whose
dynamically created data structures made it straightforward to
modify the underlying structures containing the map
information as the robot acquired additional information.
Our reasoning system plays a key role in the robot’s
ability to plan and execute tasks effectively. We use a BDI
(Beliefs-Desires-Intentions) framework. Beliefs represent the
robot’s current view of itself and the world. They take into account the state of the
world from the robot’s perspective (i.e. observations from
sensory input) and state of the robot (e.g. battery
consumption, remaining time to complete task, etc.). Desires
represent goals. They take into account persistent goals and
priorities assigned to each. A persistent goal is a goal that
the robot keeps until it determines that the goal has been
achieved, is unachievable, or is no longer requested by
its human supervisor. Intentions represent the robot’s
current commitment to a selected desire/goal and a plan to
achieve that goal. Our reasoner is used to decipher current
intention/commitment based on the robot’s beliefs and
highest priority desires/goals. If the reasoner does not have
enough information to determine the current intention, the
robot acquires additional information by making a query.
Once the robot has selected the next goal state, it creates a
plan to get from the current state to the goal state, as discussed in
Section C. Fig 5 shows the diagram of the BDI framework
implemented.
Figure 6. Algorithm/State Machine for Achieving a Task
The belief store always holds the robot’s current beliefs.
The goal store holds all of its remaining persistent goals yet
to be achieved. The query store is the one component we
have not yet discussed.
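Before discussing the query store, the three stores can be pictured as simple data structures along the following lines; this is a hedged sketch with illustrative field names, not our exact implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Goal:
    description: str      # e.g. "find magazine" or "find Ashley"
    priority: int         # lower value = higher priority bracket
    achieved: bool = False
    achievable: bool = True

@dataclass
class BDIStores:
    beliefs: dict = field(default_factory=dict)  # world/robot state and observations
    goals: list = field(default_factory=list)    # persistent goals (desires)
    queries: list = field(default_factory=list)  # predefined questions the robot may ask

    def pending_goals(self):
        return [g for g in self.goals if not g.achieved and g.achievable]
```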
We selected our framework in order to enable the robot to
make sense of its environment and use its interpretation to
reason about how best to navigate the office building in order
to achieve its assigned tasks. But a primary component of our
system is the robot’s ability to make a query when it has
trouble reasoning or planning and is otherwise unable to find
a solution to its problem.
Figure 7. PseudoCode for Reasoning System
The query store contains a list of queries that the robot may make when
faced with a high degree of uncertainty. It contains questions
about objects, people, and locations, as well as general
queries regarding ambiguity or uncertainty. Fig 6 shows the
state machine used to determine the flow of performing a
task and when queries are made.
There was no precise, standard reasoning algorithm used
to figure out which task/goal to pursue next and in what
order to navigate to the possible locations where the target
object or person could be found. We attempted to create an
algorithm using informal logic and essentially relying on
how we believe humans attempt to synthesize information
and ascertain what step to take next. It is informal logic in
the sense that although there are no logic rules in the source
code, the robot inherently uses logic to reason. For example,
if an artifact is typically found in a specific type of area, then
I believe that I can expect to find the artifact there. And if I
can expect to find an artifact (or a person) in a specific
location, then I will go there to search for the artifact/person.
If I am unable to find what/who I am looking for, I will
search another area where I can expect to find the target. If I
cannot find another area where I can expect to find the target,
I will search an area where I have observed the target before.
If I have exhausted all of these options, I will query a human
in order to receive assistance. This line of logic, where p
implies q, may be represented by loops and if-else clauses.
The goal of the reasoner is to use sound deductions and
inferences based on observations and/or expectations from its
domain knowledge.
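The fallback ordering just described can be summarized by the following sketch; the function and argument names are illustrative, and a real query would go through the query store rather than a bare callback.

```python
def choose_search_area(target, expected_areas, observed_in, ask_human):
    """Sketch of the informal search logic: expectation first, then past
    observations, then a query to a human; None means 'unachievable'."""
    if expected_areas:            # areas where domain knowledge expects the target
        return expected_areas[0]
    if observed_in:               # fall back on areas where it was seen before
        return observed_in[-1]
    suggestions = ask_human(f"Where might I find {target}?")
    return suggestions[0] if suggestions else None
```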
At any given time, the robot may have several goals to
achieve. The first step before navigating to achieve any
task/goal is to figure out what goal to pursue next. The next
step is narrowing down the possible locations to achieve the
goal, if the number of possible locations exceeds some
predetermined maximum, and then figuring out in what order to
navigate to the possible locations. Figure 7 shows the main
algorithm used to do this; it does not include queries.
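One way the narrowing step can work, assuming the area-to-artifact table sketched earlier and a human-supplied list of artifacts the target is usually found near, is to keep only those candidate areas whose type is expected to contain all of them; the cap and helper names below are illustrative.

```python
MAX_LOCATIONS = 10   # illustrative cap before the robot asks another question

def narrow_candidates(candidates, nearby_artifacts, area_artifacts):
    """Keep only candidate areas whose type is expected to contain every
    artifact the target is said to be found near.
    candidates: list of (area_instance, area_type) pairs;
    area_artifacts: mapping of area type -> {artifact: likelihood}."""
    def area_type_ok(area_type):
        return all(a in area_artifacts.get(area_type, {}) for a in nearby_artifacts)
    narrowed = [(inst, t) for inst, t in candidates if area_type_ok(t)]
    needs_more_narrowing = len(narrowed) > MAX_LOCATIONS
    return narrowed, needs_more_narrowing
```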
C. Planner
Once the Reasoning System determines which goal to
achieve, the goal location is sent to the planner, which uses
A* search to generate a complete plan. The plan is optimal,
as A* uses the Euclidean distance between area positions as
an admissible heuristic. The semantic map is used to determine
which locations the planner can actually reach from its starting
position, given by the current location of the robot. The
semantic map allows the planner to produce sound plans. For
instance, the map shows which areas are adjacent. However,
from most adjacent regions in our map the robot must first
enter the hallway before reaching the neighboring area.
Therefore, we prune A*’s search tree by only expanding
adjacent areas that have a shared opening with the current node.
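A condensed sketch of the planner is shown below. The `neighbors` and `position` callbacks are assumed to come from the semantic map graph of Section III.A, and a uniform step cost between adjacent areas is assumed; the heuristic and the pruning rule are the ones described above.

```python
import heapq
import math

def astar(start, goal, neighbors, position):
    """A* over area nodes.
    neighbors(node) -> areas sharing an opening with `node` (prunes walls);
    position(node)  -> (x, y) grid coordinates for the Euclidean heuristic."""
    def h(n):
        (x1, y1), (x2, y2) = position(n), position(goal)
        return math.hypot(x1 - x2, y1 - y2)

    frontier = [(h(start), 0.0, start, [start])]
    best_cost = {start: 0.0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path                      # ordered list of areas to traverse
        for nxt in neighbors(node):          # only areas with a shared opening
            new_g = g + 1.0                  # uniform cost between adjacent areas
            if new_g < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_g
                heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt, path + [nxt]))
    return None                              # goal unreachable from start
```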
D. Extensions to Robot Planning
The planner generates an ordered list of areas that the robot
needs to go to before reaching its goal. This could be useful
for a mobile robot in a dynamic environment containing
obstacles. The robot could, for instance, use the optimal route
from the planner as a guide to find a path to the goal, as
opposed to attempting to find one from its initial position.
Take Fig. 8, for example. Suppose the robot
wants to get to the block in C1. There are many confined
spaces in this map, so a motion planner may have some
trouble finding the entrance to the goal. Our semantic plan,
however, will make this problem much easier. The map is
divided into areas that allow us to decompose the goal for the
mobile robot. We know that the robot should be able to reach
these goals (ignoring possible obstacles) since our plan told
us there was a shared entrance between each area. Therefore,
rather than having the robot attempt to move from its starting
position to the goal, we could have it move to each sub-goal,
or location in the plan. We believe that this could allow the
motion planner to find paths much faster,
while still remaining close to optimal. The current work has
focused on the interaction with the robot, but this would be
an interesting extension to investigate.
Figure 8. Example Mobile Robot Application
IV. EXPERIMENTS AND ANALYSIS
Our approach has been implemented and evaluated in software only.
Although our intention was to implement this on hardware as
well, we were unable to get our system successfully running
on one of the mobile robots in the RIM Center before the
completion of the class. We assume the softbot (i.e. software
robot) is given a full semantic map of the environment,
meaning that every area in the office is annotated with a label
from one of the two RIM ontologies, denoting what type of
area it is. Since each area has a label that gives information
about characteristics of that type of area and of that specific
instance, the softbot uses this information in order to provide
it with expectations of what it may find in a particular area.
We tested our system by giving the softbot different objects
or people to find within the ontologies, and we randomized
the priorities in order to see how the softbot responded. Let
us look at an example in more depth, in order to do a
qualitative analysis.
The softbot is given five tasks to achieve by his human
supervisor, each with a predefined priority. As he receives
them, he groups them by priority to ensure that tasks in a
higher priority bracket get achieved first. Within a priority
bracket, the softbot orders the tasks by the distance between
his current location and the most likely place he expects to
find the object or person. He pursues the first task in
ascending order, by expected distance. If he has no
expectation in terms of where to find a particular object or
person, the target simply gets placed at the end of the list within that
priority bracket. Hence, he pursues the goals he has some
expectations about first, and tries to do so autonomously,
then asks questions later, when he encounters uncertainty
about where to achieve a goal.
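The ordering policy just described amounts to the following sketch, where `expected_distance(task)` returns the distance to the most likely location, or None when the robot has no expectation at all; the names are illustrative.

```python
def order_tasks(tasks, expected_distance):
    """Group tasks by priority bracket, then sort within a bracket by the
    distance to the most likely location; tasks with no expectation go last."""
    def key(task):
        d = expected_distance(task)
        # (priority bracket, no-expectation flag, distance): unknowns sort last
        return (task.priority, d is None, d if d is not None else 0.0)
    return sorted(tasks, key=key)
```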
With his current set of tasks, he begins at the dining area in
the RIM center. We see in the Task Completion Order
column that, of his two high-priority goals, he pursues the
water first. He expects to find water in an area relatively close
to his current location. He must ask for its color for object
recognition purposes. He goes on to ask about the types of
areas where he may find a magazine. The human gives him
many options: “office cubicle library breakarea”. This yields
21 different possible locations, which is far too many to
search. This would consume too much of the robot’s time
and cause him to be much less productive as an assistant.
Therefore, he asks another question to help him narrow his
options. After asking what objects magazines are usually
found near and being told books and bookshelves, he is able
to narrow down to 10 options. Both offices and libraries
may generally contain books and bookshelves. Libraries
have been defined as having a high likelihood for possessing
both of these artifacts, whereas offices possess a high
likelihood of having a book but moderate likelihood of
having a bookshelf. Nonetheless, the robot is still currently
in the break area, which is right next to the entire suite of
offices, and the library is on the other side of the floor. This
is where the weighted average is used to compute which
location he should try first. As humans, we are oftentimes
likely to try an area that may yield a lower likelihood of
achieving our goal if it is located conveniently nearby, before
we go significantly out of our way to try another option, even if
it yields a higher likelihood of obtaining what we seek. We
experimented with a few different pairs of weights: [0.5, 0.5],
[0.7, 0.3], [0.6, 0.4], and even [0.8, 0.2]. But since the offices
were much closer, the softbot always tried them first and
then went to the library after he finished.
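The weighted average can be pictured as the sketch below, which scores each candidate location by combining its fuzzy likelihood with its proximity to the robot; the numeric mapping of the fuzzy labels and the distance normalization are assumptions made for the example, while the weight pairs are the ones we experimented with.

```python
LIKELIHOOD_VALUES = {"high": 1.0, "moderate": 0.6, "low": 0.3}   # illustrative mapping

def location_score(likelihood_label, distance, max_distance, weights=(0.7, 0.3)):
    """Weighted average of expected likelihood and proximity; the candidate
    with the highest score is tried first."""
    w_likelihood, w_proximity = weights              # e.g. (0.5, 0.5), (0.7, 0.3), ...
    proximity = 1.0 - (distance / max_distance if max_distance else 0.0)
    return w_likelihood * LIKELIHOOD_VALUES.get(likelihood_label, 0.0) + w_proximity * proximity
```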
It is interesting to note that the artifacts selected greatly impact the locations
the softbot comes up with. When just a book was selected,
he narrowed down the four area types to offices, cubicles,
and library. When a book, a bookshelf, and a chair were
selected, he narrowed it down to offices and library. When a
book and a chair were selected, he narrowed it down to
offices, cubicles, and library. When a book, table, and chair
were selected, with or without a bookshelf, the softbot
narrowed it down to only a library because tables were not
defined as common artifacts found in offices. These are
important considerations in figuring out the best ways to
design the questions. After the high-priority tasks, the robot
goes on to achieve the normal-priority tasks. For the people,
Ashley and Mike, the robot goes through a similar process
but asks slightly different questions. When he could not find
Ashley at her cubicle, he asked where else she could be. If
the user had responded no, he would have asked who she
may be found near. If the user had responded Kalesha, the
softbot would have still gone to the library because it knows
that Kalesha is someone who has a low likelihood of being
found in the RIM library. When asked to find Mike, he had
many more questions.
Figure 9. Qualitative Results for an Example Set of Tasks
First of all, he did not know where Mike’s office was
because it was not defined for him. Once he finds Mike’s
office based on knowing where Josie’s office is, he
realizes that Mike is not in his office. He must ask additional
questions to try to locate Mike. He finally finds him in
Tom’s office. As a note, the robot did not ask any questions
to find the other magazine because he already knew where
magazines were located and he did not even have to travel to
get to it, so he selected that as his first normal priority goal.
If we had asked him to go back and get Mike for some
reason as a low priority goal, he would have tried the place(s)
that he had previously observed Mike before asking any
questions of a human. It is also important to note that when the
robot has exhausted all of his query options and still cannot
come up with any possible locations, or when it has tried all of
the possible locations it knows of (even after asking for
human assistance), it will determine that the task is
unachievable. Then, it will let its human supervisor know it
cannot achieve the task and move forward with the next goal.
After each goal is achieved, the belief store and the goal store
are re-evaluated in order to determine how to proceed
forward.
V. DISCUSSION
There is a substantial amount of future work that could be
researched in order to build upon what we have begun. We
were not able to incorporate any substantial machine learning
algorithms, as we had hoped. We had initially planned to start with a
partial semantic map, and to endow our robot with more
learning capabilities and take a closer look at the integration
of planning and learning enabled by the semantic map
representation. In order to make this a realistic system, it
would need to have some natural language processing and
object recognition capabilities. It should also take into
account sensor uncertainty, as well as actuator uncertainty, as
the mobile robot is navigating in a highly dynamic and
uncertain world with limited sensing mechanisms. It would
have been great to build a more detailed and extensive
ontology because this would empower the robot with
additional domain knowledge and even different types of
domain knowledge in order to extend its reasoning and
decision-making capabilities. We have also reflected on the
possibilities of developing a dynamic Query Store, such that
the robot is not limited to a set of predefined questions. This
would be an interesting problem to explore.
Of course, the most obvious extension would be to
implement our system on a real robot. We had high hopes of
this, but faced some unanticipated challenges. In hindsight,
we would have begun trying to implement the system on an
actual robot quite a bit earlier. We did not begin to try and
do this until late November; we greatly underestimated the
challenges associated with getting the robot up and running
and integrating it with the software. As a matter of fact, two
of our group members spent the majority of their time trying
to get the hardware to work.
In reference to ROS and the TurtleBot, we spent many
hours over approximately two and a half weeks, and were not
able to get it working in the end. The goal was to enable the
TurtleBot to execute the plan output by our Planner. We
first focused on the navigation aspect. One of the researchers
at the RIM center who was already familiar with ROS helped
us to get familiar with the platform. He also helped us learn
to create a map image in order to upload it to the robot. There
was a substantial learning curve for all of this. We
initially struggled to decide between SLAM and uploading the map.
Once we had successfully uploaded the map, we tried various
ways of getting the robot to move. We were able to
teleoperate it, but were not able to get it to move
autonomously, even after going through all the ROS tutorials
on the subject. We also encountered issues with using vision
libraries. After much trial and error, it seemed that most of the
problems were due to errors with the initial
setup/configuration of the TurtleBot. The launch file created
several problems. Nonetheless, fixing these errors required a
fairly advanced knowledge of ROS, which none of us had. It
began to seem like an uphill battle, given the time
constraints. We then tried using a 2D simulator,
but soon realized that a simple simulation of the robot
moving around the map provided no real value for our
purposes; we really needed to implement the system on a real
robot.
In reference to the software, we started designing the
system architecture sometime in about the middle of
October; we wrote a proposal for the concept, with a defined
problem statement, in the beginning of November.
Nonetheless, we did not actually begin implementation until
mid to late November, around the end of Project 2. Even
with a strong design, implementation was not trivial,
to say the least, and many implementation details remain
unaccounted for until you actually go to implement the
system. We worked on the design of this project intermittently
throughout the end of October and beginning of November,
as we were also focusing our attention on Project 2. We
worked on implementing it fairly regularly after
submitting that project. If we had to condense all of the time
spent on the project, it would probably be a solid three and a half
weeks of regularly designing/redesigning and implementing
the software architecture and learning the hardware
simultaneously. To complete our initial goal, which was to
begin with a partial semantic map, learn the remaining
sections of the map, and incorporate natural language
processing, object recognition, and sensor/actuator
uncertainty, it could have easily taken over a year. Our
vision, although inspired, was overly ambitious. We worked
hard on our project, but did not make as much progress as we
hoped. It is difficult to compare our project to those of other
groups since it was so different from all of the others, especially
without really knowing the grading criteria. But all in all, we think
we deserve an A. Our project was significantly more
extensive than most other groups’ and required the
implementation of several different sub-systems. It required a lot of
thought and planning, in terms of the system architecture
design and the representation. Although we did not get the
robot working as hoped/intended, our work was by no means
trivial; in fact, the problem is one that is quite difficult to
solve, and we were definitely diligent at it for the time period
that we worked on it.
ACKNOWLEDGMENT
We would like to thank Dr. Henrik Christensen for
allowing us to use his robot and for providing us with insight
for the project. We would also like to thank Dr. Mike
Stilman and his lab for helping us throughout this process.
REFERENCES
[1] C. Galindo et al., “Robot Task Planning Using Semantic Maps,” Robotics and Autonomous Systems, 2008, pp. 955-966.
[2] S. Rosenthal and M. Veloso, “Mobile Robot Planning to Seek Help with Spatially-Situated Tasks,” AAAI, 2012.
[3] A. Nüchter and J. Hertzberg, “Towards Semantic Maps for Mobile Robots,” Robotics and Autonomous Systems, vol. 56, no. 11, 2008, pp. 915-926.
[4] B. Settles, “Active Learning Literature Survey,” Computer Science Technical Report 1648.
[5] K. Sjöö et al., “The Explorer System,” Cognitive Systems, 2010, pp. 395-421.