Task Planning using a Semantic Map and Human Feedback

Kalesha Bullard, Ashley Edwards, Gabino Dabdoub, and Salim Dabdoub

Abstract— Robotic assistants are becoming increasingly useful in many different domains, such as in the healthcare industry for assisting elderly or disabled patients, in outer space for scientific exploration, in the home for personal assistance, and in office settings. If robots are to act as assistants or companions to humans, a human should be able to ask a robot to provide support by performing some higher-level task, just as they would ask another human, and the robot should be able to respond accordingly. Our project builds an intelligent reasoning and planning framework that uses a semantic map as the underlying knowledge representation and integrates human-robot interaction in order to reduce the robot's uncertainty. A robot assistant is given a higher-level task to achieve by a human. We explore how a semantic map may be used, along with human feedback, to help the robot interpret its task, reason about how to achieve the task in the given environment, and subsequently plan and execute the task effectively. The robot re-plans dynamically as it acquires new information.

I. INTRODUCTION

In interpersonal interactions, a person often asks another person to perform some task, commonly referred to as doing a favor. The favor may be asking the other person to retrieve an item or to deliver a message to a third person. If robots are to act as assistants or companions to humans, a human should be able to ask a robot to provide support by performing some arbitrary task, just as they would ask another human. In order to begin reasoning about its task(s), the robot should have sufficient knowledge about the task(s) it is being asked to perform as well as a working knowledge of the specific domain and immediate environment it is functioning in. However, like a human, a robot does not have complete knowledge about the world, and its sensing mechanisms are even more limited. Therefore, it should be able to seek feedback from humans in order to help it make sense of its world, enhance its knowledge base, and ultimately achieve its task.

We consider the domain of an office building, specifically the Robotics and Intelligent Machines (RIM) Center in the College of Computing at Georgia Tech. The robot plays the role of assistant to a human supervisor, who may give the robot several tasks to achieve at any given point in time. Given multiple higher-level tasks assigned by a human, we seek to address how a semantic map can be used with human feedback to help the robot interpret its tasks, reason about how to achieve them in the given environment, and subsequently plan and execute each task effectively.

II. RELATED WORK

A. Semantic Mapping

Semantic mapping integrates semantic domain knowledge into traditional robot maps. It is a young research topic in mobile robotics and holds much promise for improving a robot's autonomous reasoning capabilities and its interaction with humans. Galindo et al. give a good introduction to semantic maps for planning with robots [1]. It is often useful to generate these maps automatically. For example, Nüchter and Hertzberg [3] introduce a method for using a 3D sensor to generate a semantic map, and Nieto-Granda et al.'s work automatically determines regions for semantic maps. These methods are useful for mapping unknown environments.
However, for our purposes, because we have a floor plan of the environment the robot will be acting in, it is equally beneficial to generate a map manually. Christensen et al.'s work [5] describes how humans and cognitive robots can interact to perform tasks. We are interested in using the information in semantic maps to reduce the number of queries a robot needs to ask a human when attempting to solve a task. In this case, the origin of the map is not as important as the amount of information the map contains: more information within the semantic map reduces a robot's uncertainty about a task. Furthermore, manually creating a map may be useful and natural for non-experts. For example, one could upload a floor plan of their home into the robot, manually divide the home into areas, and then tell the system about the types of objects each area contains. Then, if the human has an object that they would like the robot to retrieve, they could simply tell it what room the object is in, or what it is located near. This type of interaction can be used in interactive planning.

B. Active Learning and Interactive Planning

In interactive planning, robots are able to utilize help from humans to solve a task. Rosenthal and Veloso introduce a robot that is capable of navigating around its environment and asking for a human's help in achieving some physical task, such as pressing a button on an elevator [2]. Interactive planning, and interactive learning in general, is similar to the field of Active Learning [4], where an autonomous agent asks humans about unknown parameters in its environment. Humans, however, are a limited resource. They could become bored from the robot asking them questions, not know the answer to a question, or not be in proximity to the robot. Therefore, we wish to combine the power of semantic maps with interactive planning to allow the robot to infer answers about its environment, thus limiting the number of questions it needs to ask humans.

III. APPROACH

Figure 1. High-Level System Architecture

Figure 2. Artifact Ontology

Our goal was to design a system capable of reasoning intelligently about what it should expect to find within its environment (i.e., an office setting) and how it may leverage knowledge about its domain to aid it in planning and executing tasks, or in asking questions when it is faced with a problem that it cannot solve on its own.

A. Semantic Map Representation

The semantic map representation uses domain-specific ontologies in order to assign meaning to areas on a map. Each annotation or label placed on an area of the map conveys information about that area that the robot may retrieve whenever it needs to reason, plan, learn, or make a query. Our semantic map relates two different types of concepts, areas and artifacts. Figs. 2 and 3 show the ontologies. Each of the two ontologies has its concepts classified within a hierarchy, which defines the relationships between the concepts. We have kept our ontology relatively simple, due to time constraints. Each area is defined by key artifacts typically found within it. We selected specific artifacts that we believe to be more easily distinguishable, so as not to clutter the robot's knowledge base or sensors with too much detail. An artifact is considered to be an atomic element in the map, meaning that we do not consider properties or subcomponents of artifacts.
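As an illustration, the following minimal Python sketch shows one way such a concept hierarchy could be encoded; the class Concept and the specific instances listed are assumptions made for illustration, not the exact data structures of our implementation.

# Minimal sketch of the two concept hierarchies (illustrative names only).
# Each concept knows its parent, so the robot can reason over is-a relations.

class Concept:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def is_a(self, ancestor_name):
        """Walk up the hierarchy to test whether this concept specializes another."""
        node = self
        while node is not None:
            if node.name == ancestor_name:
                return True
            node = node.parent
        return False

# Area ontology (subset): every concrete area type specializes Area.
area = Concept("Area")
office = Concept("Office", parent=area)
library = Concept("Library", parent=area)
break_area = Concept("BreakArea", parent=area)

# Artifact ontology (subset): artifacts are atomic items the robot may observe.
artifact = Concept("Artifact")
pc = Concept("PC", parent=artifact)
book = Concept("Book", parent=artifact)

print(office.is_a("Area"), book.is_a("Area"))   # True False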
We assume that our robot is navigating the RIM Center. Each area represents potential space that he may occupy as he moves about, and each artifact represents an item that he may observe in those areas.

Figure 3. Area Ontology

For example, an office is a type of area found within our semantic map. An office, as we define it, is typically characterized by having a PC, desk, chair, books, papers, a bookshelf, and a board. Each artifact that comprises an area is assigned a fuzzy likelihood: high, moderate, or low. This value represents the likelihood of finding an instance of that artifact in an instance of that type of area. In the example of the Office type, PC, Desk, Chair, Paper, and Book have all been assigned a high likelihood of being located in an Office. This conveys to the robot that if it is searching for any one of those artifacts, an office may be a good place to look, because it has a high likelihood of containing one of those items. Bookshelf has been assigned a moderate likelihood, and Board a low likelihood. By assigning likelihoods, we are able to consider even those artifacts that are seldom found within an area. This way, the robot is made aware of all of its choices and is able to make an informed decision. These values may of course be modified, but they serve as ground truth for the robot, meaning that he assumes them to be true and reasons based on this information.

We also incorporate other properties into specific instances of areas, as opposed to a general type of area. One important attribute is that an area may have a person assigned to it. For example, office1 is assigned to Josie, whereas office4 is assigned to Andrea. If the robot is looking for a specific person who has an office or cubicle assigned to him/her, the robot should expect that this is a good starting place to check before searching other places where the person may be located. When a person's name is assigned to an area, he/she is also associated with some relative likelihood of being found in that area. This likelihood differs from person to person because it depends on how much time the person spends in their office or cubicle.

Another property associated with an area is the set of observations made in that area. In our system, observations do not have likelihoods associated with them. They are used in lieu of a real robot observing artifacts and people in its environment using its sensors. When the robot is tasked with finding some object or person and it enters an area, it makes observations in order to determine whether it has found what it is seeking. If so, it has achieved one of its goals and may progress with its remaining goals. If not, it must continue searching until it finds what it seeks.

The other attributes assigned to an area are functional in nature and are used to enable the robot to plan how to navigate through the office building. We have encoded our semantic map as an undirected graph, where areas on the map are nodes and two nodes are connected by an edge if it is possible to navigate directly from one to the other. Even though two area nodes may be adjacent to one another, they do not share an edge if they are separated by a wall and there is no way to navigate directly between them. Fig. 4 shows the portion of the RIM Center we used for our semantic map.

Figure 4. Subset of RIM Floor Plan used to build Semantic Map

The map has been divided into a grid, where each area node has an (x, y) position and a set of possible entrances through which to enter and exit. If two nodes are vertically adjacent, the top node has a southern entrance, and the bottom node has a northern entrance, then they are connected by an edge.
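The sketch below illustrates this graph encoding in Python. The concrete area instances, likelihood labels, entrance sets, and grid convention (y growing upward) are assumptions chosen for illustration; the actual map is built from the RIM floor plan in Fig. 4.

# Sketch of the semantic map: area instances as nodes of an undirected graph,
# annotated with fuzzy artifact likelihoods and connected via shared entrances.

areas = {
    "office1": {
        "type": "Office",
        "position": (2, 3),            # (x, y) grid cell
        "entrances": {"south"},        # sides of the cell with an opening
        "artifacts": {"PC": "high", "Desk": "high",
                      "Bookshelf": "moderate", "Board": "low"},
        "assigned_person": ("Josie", "high"),   # assumed likelihood label
        "observations": set(),         # stands in for sensed artifacts/people
    },
    "hallway1": {
        "type": "Hallway",
        "position": (2, 2),
        "entrances": {"north", "south", "east", "west"},
        "artifacts": {},
        "assigned_person": None,
        "observations": set(),
    },
}

def connected(a, b):
    """Two areas share an edge if they are grid-adjacent and the facing sides
    of both cells have an entrance (e.g. top node's south and bottom node's north)."""
    (ax, ay), (bx, by) = areas[a]["position"], areas[b]["position"]
    ea, eb = areas[a]["entrances"], areas[b]["entrances"]
    if (ax, ay - 1) == (bx, by):       # a directly above b (y grows upward)
        return "south" in ea and "north" in eb
    if (ax, ay + 1) == (bx, by):       # a directly below b
        return "north" in ea and "south" in eb
    if (ax + 1, ay) == (bx, by):       # a directly left of b
        return "east" in ea and "west" in eb
    if (ax - 1, ay) == (bx, by):       # a directly right of b
        return "west" in ea and "east" in eb
    return False

# Undirected adjacency list built from the entrance rule.
edges = {a: {b for b in areas if b != a and connected(a, b)} for a in areas}
print(edges)   # {'office1': {'hallway1'}, 'hallway1': {'office1'}}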
B. Reasoner

The choice of representation for the semantic map is a primary factor in how the robot retrieves, acquires, and reasons about knowledge. Therefore, this was an extremely important decision that laid the foundation for how we would need to design and implement each of our algorithms. The power of the semantic map representation is that it endows the robot with the capability to reason about the meaning of concepts within their spatial context. But if the representation were not selected carefully, the reasoning could become very arduous or inefficient. We implemented the system in Python, whose dynamic data structures worked well for modifying the underlying map information as the robot acquired additional knowledge.

Our reasoning system plays a key role in the robot's ability to plan and execute tasks effectively. We use a BDI (Beliefs-Desires-Intentions) framework. Beliefs capture the state of the world from the robot's perspective (i.e., observations from sensory input) and the state of the robot itself (e.g., battery consumption, remaining time to complete a task, etc.). Desires represent goals; they take into account persistent goals and the priorities assigned to each. A persistent goal is a goal that the robot keeps until it determines that the goal has been achieved, is unachievable, or is no longer requested by its human supervisor. Intentions represent the robot's current commitment to a selected desire/goal and a plan to achieve that goal. Our reasoner determines the current intention/commitment based on the robot's beliefs and its highest-priority desires/goals. If the reasoner does not have enough information to determine the current intention, the robot acquires additional information by making a query. Once the robot has selected the next goal state, it creates a plan to get from the current state to the goal state, discussed in Section C.

Figure 5. BDI Framework

Fig. 5 shows a diagram of the BDI framework we implemented. The belief store always holds the robot's current beliefs. The goal store holds all of its remaining persistent goals yet to be achieved. The query store is the one component we have not yet discussed. We selected this framework in order to enable the robot to make sense of its environment and use its interpretation to reason about how best to navigate the office building to achieve the tasks assigned. But a primary component of our system is the robot's ability to make a query when it has trouble reasoning or planning and is otherwise unable to find a solution to its problem. The query store contains a list of queries that the robot may make when faced with a high degree of uncertainty. It contains questions about objects, people, and locations, as well as general queries regarding ambiguity or uncertainty.

Figure 6. Algorithm/State Machine for Achieving a Task

Fig. 6 shows the state machine used to determine the flow of performing a task and when queries are made.
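To make the roles of the three stores concrete, the skeleton below sketches a simplified BDI-style loop. The class and method names, the query templates, and the toy goals are assumptions for illustration; this is not the pseudocode of Fig. 7.

# Simplified BDI-style control loop (illustrative structure only).
from dataclasses import dataclass, field

@dataclass(order=True)
class Goal:
    priority: int                              # lower value = higher priority
    description: str = field(compare=False)

class BDIAgent:
    def __init__(self):
        self.beliefs = {}                      # belief store: robot + world state
        self.goals = []                        # goal store: persistent goals
        self.queries = ["Where can I find {target}?",
                        "What objects is {target} usually near?"]   # query store

    def add_goal(self, goal):
        self.goals.append(goal)
        self.goals.sort()                      # keep highest-priority goals first

    def deliberate(self):
        """Commit to the highest-priority goal the agent can act on (the intention)."""
        for goal in self.goals:
            if self.enough_information(goal):
                return goal
            self.ask_human(goal)               # fall back to the query store
        return None

    def enough_information(self, goal):
        # Placeholder: would consult the semantic map and current beliefs.
        return goal.description in self.beliefs

    def ask_human(self, goal):
        print(self.queries[0].format(target=goal.description))

agent = BDIAgent()
agent.beliefs["water"] = "break_area"
agent.add_goal(Goal(priority=1, description="water"))
agent.add_goal(Goal(priority=2, description="magazine"))
print(agent.deliberate())                      # commits to the water goal first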
There was no precise, standard reasoning algorithm used to figure out which task/goal to pursue next and in what order to navigate to the possible locations where the target object or person could be found. Instead, we attempted to create an algorithm using informal logic, essentially relying on how we believe humans synthesize information and decide what step to take next. It is informal logic in the sense that, although there are no explicit logic rules in the source code, the robot inherently uses logic to reason. For example: if an artifact is typically found in a specific type of area, then I can expect to find the artifact there. If I can expect to find an artifact (or a person) in a specific location, then I will go there to search for it. If I am unable to find what or who I am looking for, I will search another area where I can expect to find the target. If I cannot find another area where I can expect to find the target, I will search an area where I have observed the target before. If I have exhausted all of these options, I will query a human in order to receive assistance. This line of logic, where p implies q, may be represented by loops and if-else clauses. The goal of the reasoner is to make sound deductions and inferences based on observations and/or expectations drawn from its domain knowledge.

At any given time, the robot may have several goals to achieve. The first step, before navigating to achieve any task/goal, is to figure out which goal to pursue next. The next step is to narrow down the possible locations for achieving the goal if their number exceeds some predetermined maximum, and then to decide in what order to navigate to the possible locations. Figure 7 shows the main algorithm used to do this; it does not include queries.

Figure 7. PseudoCode for Reasoning System

C. Planner

Once the reasoning system determines which goal to achieve, the goal location is sent to the planner, which uses A* search to generate a complete plan. The plan is optimal, since the Euclidean distance heuristic it uses is admissible. The semantic map is used to determine which locations the planner can actually reach from its starting position, given by the current location of the robot. The semantic map allows the planner to produce sound plans. For instance, the map shows which areas are adjacent. However, for most adjacent areas in our map, the robot must enter the hallway before reaching the neighboring area. Therefore, we prune A*'s search tree by only expanding adjacent areas that share an opening with the current node.

D. Extensions to Robot Planning

Figure 8. Example Mobile Robot Application

The planner generates an ordered list of areas that the robot needs to pass through before reaching its goal. This could be useful for a mobile robot in a dynamic environment containing obstacles. The robot could, for instance, use the optimal route from the planner as a guide to find a path to the goal, as opposed to attempting to find one directly from its initial position. Take Fig. 8 as an example, and suppose the robot wants to get to the block in C1. There are many confined spaces in this map, so a motion planner may have trouble finding the entrance to the goal. The plan over our semantic map, however, makes this problem much easier. The map is divided into areas that allow us to decompose the goal for the mobile robot. We know that the robot should be able to reach these sub-goals (ignoring possible obstacles), since our plan told us there was a shared entrance between each pair of consecutive areas. Therefore, rather than having the robot attempt to move from its starting position to the goal, we could have it move to each sub-goal, or location in the plan. We believe that this could allow the motion planner to find paths much faster, while still remaining close to optimal. Our work so far has focused on the interaction with the robot, but this extension could yield interesting results.
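A condensed sketch of the A* search over the area graph, using the Euclidean distance between grid positions as the heuristic, is shown below. The edges and positions dictionaries are assumed inputs in the same spirit as the map sketch given earlier; the returned path is the ordered list of areas that can serve as sub-goals for a motion planner.

import heapq
import math

def euclidean(p, q):
    return math.dist(p, q)

def astar(edges, positions, start, goal):
    """A* over the area graph. `edges` maps each area to the neighbours it shares
    an opening with; `positions` maps each area to its (x, y) grid position."""
    frontier = [(euclidean(positions[start], positions[goal]), 0.0, start, [start])]
    visited = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path                        # ordered list of areas to traverse
        if node in visited:
            continue
        visited.add(node)
        for nbr in edges[node]:
            if nbr in visited:
                continue
            g_new = g + euclidean(positions[node], positions[nbr])
            h = euclidean(positions[nbr], positions[goal])
            heapq.heappush(frontier, (g_new + h, g_new, nbr, path + [nbr]))
    return None                                # goal unreachable from start

# Toy example with assumed areas and positions.
edges = {"dining": {"hall"}, "hall": {"dining", "office1"}, "office1": {"hall"}}
positions = {"dining": (0, 0), "hall": (1, 0), "office1": (2, 0)}
print(astar(edges, positions, "dining", "office1"))   # ['dining', 'hall', 'office1']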
IV. EXPERIMENTS AND ANALYSIS

Our approach has been implemented in software only. Although our intention was to implement it on hardware as well, we were unable to get our system successfully running on one of the mobile robots in the RIM Center before the completion of the class. We assume the softbot (i.e., software robot) is given a full semantic map of the environment, meaning that every area in the office is annotated with a label from one of the two RIM ontologies denoting what type of area it is. Since each label gives information about the characteristics of that type of area and of that specific instance, the softbot uses this information to form expectations of what it may find in a particular area. We tested our system by giving the softbot different objects or people from the ontologies to find, and we randomized the priorities in order to see how the softbot responded.

Let us look at an example in more depth, in order to do a qualitative analysis. The softbot is given five tasks to achieve by his human supervisor, each with a predefined priority. As he receives them, he groups them by priority to ensure that tasks in a higher priority bracket get achieved first. Within a priority bracket, the softbot orders the tasks by the distance between his current location and the most likely place he expects to find the object or person, and pursues them in ascending order of expected distance. If he has no expectation of where to find a particular object or person, that target simply gets placed at the end of the list within its priority bracket. Hence, he first pursues the goals he has some expectations about, attempting them autonomously, and asks questions later when he encounters uncertainty about where to achieve a goal.

With his current set of tasks, he begins at the dining area in the RIM Center. We see in the Task Completion Order column of Fig. 9 that, out of his two high-priority goals, he pursues water first; he expects to find water in an area relatively close to his current location. He must ask what color it is, for object recognition purposes. He goes on to ask about the types of areas where he may find a magazine. The human gives him many options: "office cubicle library breakarea". This yields 21 different possible locations, which is far too many to search; it would consume too much of the robot's time and make him much less productive as an assistant. Therefore, he asks another question to help him narrow his options. After asking what objects magazines are usually found near and being told books and bookshelves, he is able to narrow the search down to 10 options. Both offices and libraries may generally contain books and bookshelves: libraries have been defined as having a high likelihood of possessing both of these artifacts, whereas offices have a high likelihood of having a book but only a moderate likelihood of having a bookshelf. Nonetheless, the robot is currently in the break area, which is right next to the entire suite of offices, while the library is on the other side of the floor. This is where a weighted average is used to compute which location he should try first. As humans, we are often willing to try an area with a lower likelihood of achieving our goal if it is conveniently nearby, before going significantly out of our way to try another option with a higher likelihood of containing what we seek. We experimented with a few different pairs of weights, [0.5, 0.5], [0.7, 0.3], [0.6, 0.4], and even [0.8, 0.2], but since the offices were much closer, the softbot always tried them first and then went to the library afterwards.
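A plausible sketch of how such a weighted average could be computed is given below. The fuzzy-to-numeric mapping, the way artifact likelihoods are combined into an area score, the proximity normalization, and the concrete numbers are all illustrative assumptions, not the exact formula used in our implementation.

# Plausible reconstruction of the weighted-average ranking of candidate locations.

LIKELIHOOD = {"high": 0.9, "moderate": 0.5, "low": 0.1}   # assumed numeric mapping

def area_likelihood(artifact_levels):
    """Combine the fuzzy likelihoods of the artifacts the human mentioned."""
    values = [LIKELIHOOD[level] for level in artifact_levels]
    return sum(values) / len(values)

def score(likelihood, distance, max_distance, w_like=0.7, w_prox=0.3):
    """Weighted average of expected success and proximity (higher is better)."""
    proximity = 1.0 - distance / max_distance   # 1.0 means right next door
    return w_like * likelihood + w_prox * proximity

# Magazine example: offices hold books (high) and bookshelves (moderate);
# the library holds both with high likelihood but is across the floor.
candidates = {
    "office1": (area_likelihood(["high", "moderate"]), 1.0),   # (likelihood, distance)
    "library": (area_likelihood(["high", "high"]), 12.0),
}
max_d = max(d for _, d in candidates.values())
ranked = sorted(candidates, key=lambda a: score(*candidates[a], max_d), reverse=True)
print(ranked)   # the nearby office outranks the more likely but distant library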
It is interesting to note that the artifacts selected greatly impact the locations the softbot comes up with. When just a book was selected, he narrowed the four area types down to offices, cubicles, and the library. When a book, a bookshelf, and a chair were selected, he narrowed it down to offices and the library. When a book and a chair were selected, he narrowed it down to offices, cubicles, and the library. When a book, a table, and a chair were selected, with or without a bookshelf, the softbot narrowed it down to only the library, because tables were not defined as common artifacts found in offices. These are important considerations in figuring out the best way to design the questions.

After the high-priority tasks, the robot goes on to achieve the normal-priority tasks. With the people, Ashley and Mike, the robot goes through a similar process but asks slightly different questions. When he could not find Ashley at her cubicle, he asked where else she could be. If the user had responded that they did not know, he would have asked who she might be found near. If the user had responded Kalesha, the softbot would have still gone to the library, because it knows that Kalesha is someone who has a low likelihood of being found in the RIM library.

Figure 9. Qualitative Results for an Example Set of Tasks

When asked to find Mike, he had many more questions. First of all, he did not know where Mike's office was because it was not defined for him. Once he finds Mike's office, based on knowing where Josie's office is, he realizes that Mike is not in his office. He must ask additional questions to try to locate Mike. He finally finds him in Tom's office. As a note, the robot did not ask any questions to find the other magazine because he already knew where magazines were located and did not even have to travel to get to it, so he selected that as his first normal-priority goal. If we had asked him, as a low-priority goal, to go back and get Mike for some reason, he would have tried the place(s) where he had previously observed Mike before asking any questions of a human.

It is also important to note that when the robot has exhausted all of his options for queries and still cannot come up with any possible locations, or when he has tried all of the possible locations he knows to try (even after asking for human assistance), he determines that the task is unachievable. He then lets his human supervisor know that he cannot achieve the task and moves forward with the next goal. After each goal is achieved, the belief store and the goal store are re-evaluated in order to determine how to proceed.

V. DISCUSSION

There is a substantial amount of future work that could build upon what we have begun. We were not able to incorporate any substantial machine learning algorithms, as we had hoped. We had initially hoped to start with a partial semantic map, endow our robot with more learning capabilities, and take a closer look at the integration of planning and learning enabled by the semantic map representation.
In order to make this a realistic system, it would need natural language processing and object recognition capabilities. It should also take into account sensor and actuator uncertainty, as the mobile robot is navigating a highly dynamic and uncertain world with limited sensing. It would also have been valuable to build a more detailed and extensive ontology, because this would give the robot additional, and different types of, domain knowledge with which to extend its reasoning and decision-making capabilities. We have also reflected on the possibility of developing a dynamic query store, so that the robot is not limited to a set of predefined questions; this would be an interesting problem to explore.

Of course, the most obvious extension would be to implement our system on a real robot. We had high hopes for this, but faced some unanticipated challenges. In hindsight, we would have begun trying to implement the system on an actual robot quite a bit earlier. We did not begin to do so until late November, and we greatly underestimated the challenges associated with getting the robot up and running and integrating it with the software. In fact, two of our group members spent the majority of their time trying to get the hardware to work.

With respect to ROS and the TurtleBot, we spent many hours over approximately two and a half weeks and were not able to get it working in the end. The goal was to enable the TurtleBot to execute the plan output by our planner. We first focused on the navigation aspect. One of the researchers at the RIM Center who was already familiar with ROS helped us get familiar with the platform. He also helped us learn to create a map image in order to upload it to the robot. There was a substantial learning curve on all of this. We initially struggled to decide between using SLAM and uploading the map. Once we had successfully uploaded the map, we tried various ways of getting the robot to move. We were able to teleoperate it, but were not able to get it to move autonomously, even after going through all of the ROS tutorials on the subject. We also encountered issues with using vision libraries. After much trial and error, it seemed that most of the problems were due to errors in the initial setup/configuration of the TurtleBot; the launch file created several problems. Fixing these errors required a fairly advanced knowledge of ROS, which none of us had, and it began to seem like an uphill battle given the time constraints. We then attempted to use a 2D simulator, but soon realized that a simple simulation of the robot moving around the map provided no real value for our purposes; we really needed to implement the system on a real robot.

With respect to the software, we started designing the system architecture around the middle of October and wrote a proposal for the concept, with a defined problem statement, at the beginning of November. Nonetheless, we did not actually begin implementation until mid-to-late November, around the end of Project 2. Even with a strong design, implementation was not trivial, to say the least, and many implementation details remain unaccounted for until you actually go to implement the system. We worked on the design of this project intermittently throughout the end of October and the beginning of November, as we were also focusing our attention on Project 2, and have worked on the implementation fairly regularly since submitting that project.
If we had to condense all of the time spent on the project, it would probably amount to a solid three and a half weeks of regularly designing, redesigning, and implementing the software architecture while learning the hardware simultaneously. To complete our initial goal, which was to begin with a partial semantic map, learn the remaining sections of the map, and incorporate natural language processing, object recognition, and sensor/actuator uncertainty, could easily have taken over a year. Our vision, although inspired, was overly ambitious. We worked hard on our project, but did not make as much progress as we had hoped. It is difficult to compare our project to the other groups' since it was so different from all of the others, especially without really knowing the grading criteria. But all in all, we think we deserve an A. Our project was significantly more extensive than most other groups' and required the implementation of several different sub-systems. It required a lot of thought and planning in terms of the system architecture design and the representation. Although we did not get the robot working as hoped/intended, our work was by no means trivial; in fact, the problem is quite difficult to solve, and we were diligent at it for the time period that we worked on it.

ACKNOWLEDGMENT

We would like to thank Dr. Henrik Christensen for allowing us to use his robot and for providing us with insight for the project. We would also like to thank Dr. Mike Stilman and his lab for helping us throughout this process.

REFERENCES

[1] C. Galindo et al., "Robot Task Planning Using Semantic Maps," Robotics and Autonomous Systems, 2008, pp. 955-966.
[2] S. Rosenthal and M. Veloso, "Mobile Robot Planning to Seek Help with Spatially-Situated Tasks," AAAI, 2012.
[3] A. Nüchter and J. Hertzberg, "Towards Semantic Maps for Mobile Robots," Robotics and Autonomous Systems, vol. 56, no. 11, 2008, pp. 915-926.
[4] B. Settles, "Active Learning Literature Survey," Computer Sciences Technical Report 1648, University of Wisconsin-Madison, 2009.
[5] K. Sjöö et al., "The Explorer System," in Cognitive Systems, 2010, pp. 395-421.