Designing Intelligent Robots: Reintegrating AI II: Papers from the 2013 AAAI Spring Symposium

Benchmarking Intelligent Service Robots through Scientific Competitions: The RoboCup@Home Approach

Luca Iocchi
Dept. of Computer, Control, and Management Engineering
Sapienza University of Rome, Italy
iocchi@dis.uniroma1.it

Dirk Holz
Autonomous Intelligent Systems Group, Department of Computer Science VI
University of Bonn, Germany
dirk.holz@ieee.org

Tijn van der Zant
Dept. of Artificial Intelligence, Faculty of Natural Sciences
University of Groningen, the Netherlands
tijn@ieee.org

Copyright © 2013, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

The dynamic and uncertain environments of domestic service robots, which include humans, require a rethinking of the benchmarking principles used to test these robots. Since 2006, RoboCup@Home has used statistical procedures to track and steer the progress of domestic service robots. This paper explains these procedures and shows the outcomes of these international benchmarking efforts. Although aspects such as shopping in a supermarket receive a fair amount of attention in the robotics community, the authors believe that a recently introduced test is the most important outcome of RoboCup@Home, namely the benchmarking of robot cognition.

Introduction

Benchmarking intelligent robotic systems is a fundamental task for successfully deploying intelligent robots in everyday activities. Moreover, benchmarking is of utmost importance for the marketability of intelligent robots. In this paper we focus on domestic and service robots (DSRs). They take a large variety of forms and are intended to solve several types of problems. Moreover, DSRs have as a primary objective fluid interaction with people in typical human environments (e.g., a house or an office). The development of intelligent DSRs requires the integration of many capabilities coming from various research fields: robotics, artificial intelligence, and human-robot interaction.

Two issues make the development, and consequently the benchmarking and performance evaluation, of such systems very difficult: 1) the interaction with humans in real environments and 2) the integration of several capabilities coming from different research fields. Developing methods to evaluate integrative research and realizing good benchmarks are thus harder for domestic and service robot applications because: 1) humans and real environments must be explicitly considered in the benchmarking methodology and 2) there is a large assortment of tasks that can be accomplished by a DSR.

In this paper we present a benchmarking methodology for domestic and service robots that addresses the previously mentioned problems. In particular, this methodology is based on scientific competitions and has been adopted in the past years by the RoboCup@Home competitions. It allows for testing robots on many integrated tasks (as opposed to single functionalities) in real environments, as well as for testing interaction with humans who are not the developers of the tested system. More specifically, this paper presents the "benchmarking through scientific competitions" approach of RoboCup@Home, showing the actual method for evaluating integrative research, the results of this evaluation over the years, a specific benchmark for the reasoning capabilities of the robots, and a discussion of how teams solved and integrated the many problems that must be handled for a successful implementation.
RoboCup@Home methodology

RoboCup@Home (http://www.robocupathome.org/) (Wisspeintner et al. 2009) is the largest and most common benchmarking effort for integrated robotic systems in the fields of domestic service robotics and human-robot interaction. RoboCup@Home is a competition that usually runs over a period of six days (including 1.5 days of setup) in a test arena configured as a realistic representation of a home environment. However, some tests are also executed in the competition venue or in real environments nearby, such as restaurants and shopping malls. The competition scenario (i.e., both the test arena and the real environments) is not specified beforehand. Teams have a short setup time (about one and a half days) before the competition starts to prepare for the particular scenario, to perform calibration, etc. For the real environments (real shops in 2010 and 2011, and a restaurant in 2012) there is no setup: the robots must function properly and immediately in a completely unknown environment. The competition includes several so-called tests that probe particular capabilities and their successful integration. In addition, the competition features open demonstrations where teams can showcase their latest research results or draw attention to problems that have not yet been considered in RoboCup@Home, possibly leading to future tests for the competition.

In the regular tests many different functionalities such as navigation, localization, object recognition, manipulation, and speech and gesture recognition are tested simultaneously in an integrated manner. Good results are obtained only when a complete system, implementing all the desired abilities, is successfully demonstrated during the competition. By requiring teams to participate in many different tests, we force the development of general solutions, as hacking specific solutions for many different problems would be ineffective. In order to avoid static benchmarking, where the risk of progressing towards local-optimum solutions may arise, tests are changed on a yearly basis. Tests change according to the decisions of the experts in the Technical Committee, which integrates both the suggestions of the teams and the results of the statistical analysis described below.

The main ingredients of the RoboCup@Home methodology can be summarized as: 1) benchmarking integrated systems (rather than single functionalities); 2) performing several different tests; 3) changing tests over time. In our opinion, these features, which are unique in the field of benchmarking intelligent service robots, are key factors able to drive researchers towards developing high-quality and robust scientific solutions, while penalizing ad-hoc implementations. However, these features also introduce problems that must be considered: 1) benchmarking integrated systems does not allow an easy evaluation of the performance of the single components of a complex system; 2) the dynamicity of the benchmark makes it difficult to relate the temporal evolution of performance (Feil-Seifer, Skinner, and Mataric 2007). These problems are even more acute for tests that require the robot to operate in the real world.

The answer to these problems provided by the RoboCup@Home methodology is to perform statistical analysis on the results. This is possible due to the participation of many teams in many tests, which provides good data for analysis. In the remainder of this section we first describe the implementation of the RoboCup@Home competitions and then discuss the results.
Implementation of the competition

The competition is organized in several tests that must be accomplished by the robots. These tests are integrated tests: each test comprises a set of functionalities that must be properly integrated to achieve good performance. However, the scoring system allows for giving partial credit if only a part of the test is achieved. More specifically, the total score of a test is the sum of partial scores, and each partial score is related to a specific functional ability (see the next section). Another important aspect of this organization is that different tests often require the use of the same functionality, but under different conditions. This enforces the development of robust and flexible solutions, as a specific ad-hoc solution for one test may not be appropriate for other tests, and developing as many ad-hoc solutions as there are tests is not feasible. Finally, tests change over the years in order to avoid the development of solutions that converge to "local optimum" performance, i.e., solutions that are overspecialized and optimized for a single test or scenario.
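As an illustration of the partial-credit scoring described above, the following minimal sketch computes a test's total score as the sum of the achieved, ability-tagged partial scores. The test name, abilities and point values are hypothetical examples, not the official rulebook scores.

```python
# Minimal sketch of partial-credit scoring: a test is a list of partial scores,
# each tagged with the functional ability it exercises. All names and point
# values below are hypothetical examples, not the official RoboCup@Home scores.

from dataclasses import dataclass

@dataclass
class PartialScore:
    ability: str    # functional ability this partial score is associated with
    points: int     # points awarded if this part of the test is achieved
    achieved: bool  # filled in by the referees during the test run

def total_score(partials):
    """Total test score is the sum of the achieved partial scores."""
    return sum(p.points for p in partials if p.achieved)

# Example: a hypothetical fetch-and-carry test run.
run = [
    PartialScore("Navigation",          points=150, achieved=True),
    PartialScore("Object Recognition",  points=200, achieved=True),
    PartialScore("Object Manipulation", points=300, achieved=False),
    PartialScore("Speech Recognition",  points=100, achieved=True),
]
print(total_score(run))  # 450: partial credit even though manipulation failed
```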
Statistical analysis

One important feature of the RoboCup@Home methodology for benchmarking analysis (see (Wisspeintner et al. 2009) for details) is the measurement of performance advances over the lifetime of an evolving benchmark. This is obtained by analyzing the partial scores of each test, which are associated with specific functionalities, and by evaluating the amount of score gained by the best teams for each functionality in each test. In this way we can measure the average increase in performance of the given skills over the years, even when the tests change. The functional abilities that have been considered in the competitions are the following:
• Navigation, the ability to plan a path and safely navigate to a specific target position in the environment, avoiding (dynamic) obstacles.
• Mapping, the ability to autonomously build a representation of a partially known or unknown environment on-line.
• Person Recognition, the ability to detect and recognize a person.
• Person Tracking, the ability to track the position of a person over time.
• Object Recognition, the ability to detect and recognize (known or unknown) objects in the environment.
• Object Manipulation, the ability to grasp, move or place an object.
• Speech Recognition, the ability to recognize and interpret spoken user commands (speaker dependent and speaker independent).
• Gesture Recognition, the ability to recognize and interpret human gestures.
• Cognition, the ability to understand and reason about the world.

As already mentioned, these functionalities are distributed throughout the tests and associated with partial scores, so that each partial score gained by a team can be related to specific functionalities. The details are not shown here for lack of space. In Table 1 we report the percentage of the available score for each functionality gained by the teams since 2008. The first value is the maximum score achieved by a team and the second value is the average score achieved by the finalist teams (usually there are five finalists). This table allows for many considerations, including: 1) which abilities have been most successfully implemented by the teams; 2) how difficult the tests are with respect to such abilities; 3) which tests and abilities need to be modified in order to guide future development in desired directions.

Ability              2008 [%]   2009 [%]     2010 [%]     2011 [%]     2012 [%]
Navigation           40 / 25    47 / 40      33 / 20      61 / 26      52 / 23
Mapping              100 / 44   100 / 92     21 / 10      33 / 10      10 / 4
Person Recognition   32 / 15    69 / 37      57 / 23      48 / 16      62 / 15
Person Tracking      100 / 81   100 / 69     100 / 72     100 / 76     62 / 33
Object Recognition   29 / 8     39 / 23      6 / 1        35 / 10      56 / 20
Object Manipulation  3 / 1      48 / 23      29 / 8       49 / 21      73 / 27
Speech Recognition   87 / 37    89 / 71      50 / 38      76 / 59      90 / 56
Gesture Recognition  0 / 0      0 / 0        62 / 26      100 / 49     88 / 37
Cognition            -          -            17 / 3       68 / 24      32 / 8
Average              41 / 21    61.5 / 44.4  41.6 / 22.4  63.3 / 32.5  58.2 / 24.8

Table 1: Achieved scores for the desired abilities. The first value is the maximum score achieved by a team, the second is the average score of the finalist teams.
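The per-ability statistics reported in Table 1 can be reproduced from ability-tagged partial scores. The following sketch assumes a hypothetical record layout (per-team, per-ability achieved and maximum available scores); it computes the two reported values per ability and year: the best team's percentage and the finalists' average percentage.

```python
# Sketch of the Table 1 aggregation: for each (year, ability), compute the best
# team's percentage of the available score and the average percentage over the
# finalist teams. The record layout and example data are hypothetical.

from collections import defaultdict

# (year, team, ability, achieved score, maximum available score across tests)
records = [
    (2012, "TeamA", "Navigation", 520, 1000),
    (2012, "TeamB", "Navigation", 230, 1000),
    (2012, "TeamA", "Cognition",  320, 1000),
    (2012, "TeamB", "Cognition",   80, 1000),
]
finalists = {2012: {"TeamA", "TeamB"}}  # usually five finalist teams per year

def ability_statistics(records, finalists):
    per_cell = defaultdict(list)  # (year, ability) -> [(team, percentage)]
    for year, team, ability, achieved, available in records:
        per_cell[(year, ability)].append((team, 100.0 * achieved / available))
    stats = {}
    for (year, ability), entries in per_cell.items():
        best = max(pct for _, pct in entries)
        finalist_pcts = [pct for team, pct in entries if team in finalists[year]]
        stats[(year, ability)] = (best, sum(finalist_pcts) / len(finalist_pcts))
    return stats

print(ability_statistics(records, finalists)[(2012, "Navigation")])  # (52.0, 37.5)
```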
From this table, two important aspects are evident: i) the effect of substantially changing the tests every two years is reflected in the scores: there is a general increase of performance in all functionalities from 2008 to 2009 and from 2010 to 2011, because the tests were similar within these pairs of years, while in 2010 and 2012 the tests changed in many respects and became more difficult; ii) some functionalities have almost been completely solved, while others remain difficult. In fact, teams obtained good results in navigation, mapping, person tracking and speech recognition (average above 50%, except for navigation). Notice that the low percentage score in navigation is not related to inabilities of the teams, but to the fact that part of the navigation score becomes available only after some other task has been achieved. The good results for speech recognition are very relevant, since the competition environment is much more challenging than a typical service or domestic application due to the large number of people present and a lot of background noise. On the other hand, the results for mapping and person tracking are due to the fact that these abilities were not tested in a very challenging setup. Person recognition and tracking performance is acceptable, and thus in 2010 we increased the difficulty of this functionality in the tests: for example, more unknown people are present during the tests, and a person passes between the robot and the guide during person following. This is to test the robustness of the developed methods. Object manipulation had the highest increase of performance in 2009. Although more robust solutions have been developed by the teams, there is still work to be done; therefore, for 2010 we did not increase the difficulty of object manipulation in the tests. Object recognition is also reaching good performance and will not become more difficult. Until 2009 gesture recognition was not implemented by the teams, and the increase of the available score was not sufficient to motivate teams to work in this direction. Thus, for 2010 we created situations in which speech is likely to fail and gesture is the only practical way to solve the problem, with the result that teams had to address this problem and integrate this feature into their systems as well. Finally, cognitive abilities have been introduced since 2010 and good progress has already been demonstrated.

Summarizing, by analyzing the results of team performance, it is possible to decide about the future development of the benchmarks. Possible adjustments are: 1) increasing the difficulty if the average performance is high; 2) merging abilities into high-level skills and more realistic tasks; 3) keeping or decreasing the difficulty if the observed performance is not satisfying; 4) introducing new abilities and tests. As the integration of abilities will play an increasingly important role for future general-purpose home robots, this aspect should especially be considered in future competitions.

Other important parameters to assess the success of a benchmark are the number of participating research groups (teams) and the general increase of performance over the years. Obviously, it is difficult to determine such measures in a quantitative way: the constant evolution of the competition, with its iterative modification of the rules and of the scores, does not allow a direct comparison. However, it is possible to define some metrics of the general increase of performance. They are based on the capability of a team to gain score in multiple tests, thus showing effectiveness not only in implementing the single functionalities, but also in integrating them into a working system, which includes the realization of a flexible and modular architecture allowing for the execution of different tasks.

In Table 2, the number of teams participating in the international competition is shown in the first row. The league has received a lot of interest since the beginning (2006). We registered a general increase, up to 24 teams (out of 32 requests) participating in RoboCup@Home 2010. The number of participating teams is influenced by external factors, such as the continent on which the competition is held and the economic situation of a team. The second row shows the increase in the total number of tests executed by all the teams during the competition (not all teams participate in all tests). The execution of over 100 tests since 2009, some of them outside the competition arena (i.e., in a real environment) since 2010, confirms the significance of the statistical analysis we are performing. The third row contains the percentage of successful tests, i.e., tests in which a score greater than zero was achieved, showing a significant and constant increase over the years, also when compared to the general increase in the difficulty of the tests. Finally, the fourth row contains the average number of successful tests per team. This is a very important measure, since the increase from 1.0 tests in 2006 to 7.3 in 2009 (and 6.3 in 2010 with more difficult tests, and 4.2 in 2012 with even more difficult ones) is a strong indication of an average increase in robot abilities and in overall system integration.

Measure                    2006  2007  2008  2009  2010  2011  2012
Number of teams            12    11    14    18    24    19    18
Total amount of tests      66    76    86    127   164   141   108
Percentage of succ. tests  17%   36%   59%   83%   74%   73%   58%
Avg. succ. tests p. team   1.0   2.5   4.9   7.3   6.3   6.5   4.2

Table 2: Measures indicating the general increase of performance.

A team successfully participating in an average of 7 tests (that are quite different from each other) demonstrates not only effective solutions and implementations of all the desired abilities, but also a flexible integrated system with features that are important for real-world applications. Notice that in this table all the teams were considered (not only the finalists). The results obtained by the analysis reported here clearly show that our methodology of dynamic and integrated benchmarking is producing quick and significant progress in domestic service robotics.
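The league-level measures of Table 2 are simple aggregates over the per-test results. A minimal sketch, assuming a hypothetical list of (team, test, score) tuples for one year:

```python
# Sketch of the Table 2 measures for a single year, assuming a hypothetical
# list of (team, test, score) tuples with one entry per executed test.

from collections import defaultdict

results = [  # hypothetical data for illustration
    ("TeamA", "FollowMe", 300), ("TeamA", "GPSR", 0), ("TeamA", "CleanUp", 150),
    ("TeamB", "FollowMe", 0),   ("TeamB", "GPSR", 50),
]

def league_measures(results):
    teams = {team for team, _, _ in results}
    total_tests = len(results)
    successful = [r for r in results if r[2] > 0]  # score greater than zero
    succ_per_team = defaultdict(int)
    for team, _, score in results:
        if score > 0:
            succ_per_team[team] += 1
    return {
        "number_of_teams": len(teams),
        "total_tests": total_tests,
        "pct_successful": 100.0 * len(successful) / total_tests,
        "avg_successful_per_team": sum(succ_per_team.values()) / len(teams),
    }

print(league_measures(results))
```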
General Purpose Service Robots

Benchmarking the cognitive abilities of intelligent robots is a fundamental step towards their deployment, but at the same time a very difficult task to undertake. The main difficulty is that, in order to test the cognitive abilities of a real system, many related problems must be solved as well (e.g., perception and locomotion). This is the reason why there are no benchmarks for the cognitive abilities of entire real systems, but only benchmarks for single components (e.g., planning competitions). RoboCup@Home provides an experimental setting that is able to overcome this difficulty. As the teams have developed service robotic systems that must demonstrate the effectiveness of the basic functionalities in the other tests, we can define a test focusing only on cognition. This test, called 'General Purpose Service Robots' (GPSR), was introduced in 2010 and focuses on the following aspects:
• There is no predefined order of actions to execute. This is to slowly transition away from state-machine-like behavior programming.
• There is increased complexity in speech recognition compared to the other tests. Possible commands are less restricted in both actions and arguments. Commands can include multiple objects, e.g., "put the mug next to the cup on the kitchen table".
• The test is about how much the robot understands about the environment and aims for high-level reasoning.

The test is executed as follows. The robot enters the arena by driving to a specified location. A task is generated on-the-fly by a computer program that randomizes actions, objects and locations. The resulting command may contain different synonyms for the same action. The exact command as generated by the computer program is given to the robot by an operator (not a team member). The robot then starts executing the task. Three tasks can be solved in 10 minutes, and partial scores are available for accomplishing portions of a task. It is important to notice that the basic actions, objects and locations are known to the robot, thanks to the participation in the previous tests. However, the robot does not know in advance the exact sequence of actions to be accomplished. Also notice that the robot can communicate with the referees, asking for more information when needed, thus planning "speech" actions to acquire more knowledge about the tasks to be executed.
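A minimal sketch of such an on-the-fly command generator is shown below. The template strings, vocabularies and synonym lists are hypothetical examples; the official generator and its vocabulary are not reproduced here.

```python
# Minimal sketch of a GPSR-style command generator: fill a random template with
# random known locations, objects and people, optionally using action synonyms.
# All template strings and vocabularies are hypothetical examples.

import random

LOCATIONS = ["kitchen shelf", "dining table", "couch"]
OBJECTS   = ["coke", "mug", "apple"]
PEOPLE    = ["John", "Mary"]
MOVE_SYNONYMS = ["Move to", "Go to", "Drive to"]

TEMPLATES = [  # category 1: fully specified sequences of basic actions
    "{move} the {loc1}, get the {obj} and put it on the {loc2}",
    "{move} the {loc1}, find {person} and follow him/her",
]

def generate_command(rng=random):
    template = rng.choice(TEMPLATES)
    loc1, loc2 = rng.sample(LOCATIONS, 2)
    return template.format(
        move=rng.choice(MOVE_SYNONYMS),
        loc1=loc1, loc2=loc2,
        obj=rng.choice(OBJECTS),
        person=rng.choice(PEOPLE),
    )

print(generate_command())
# e.g. "Drive to the kitchen shelf, get the coke and put it on the dining table"
```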
The requirements that all the robots are assumed to satisfy are: a set of template tasks T that can be solved by combining basic actions and functionalities already tested in previous tests during the competition (e.g., following a person, finding and recognizing people or objects, going to a location, grasping and delivering objects, speech interaction with a person, etc.); a set of people P (i.e., names of persons known to the robot); a set of objects O (known to the robots); and a set of locations L (known to the robots). The terms used in the task descriptions are not given beforehand to the teams, thus speech recognition and the understanding of commands need to be robust enough to handle synonyms for the basic actions. For example, "go to", "move to", and "drive to" all describe a navigation command. Moreover, objects are grouped into classes (known to the robots) such as drinks, food, etc. Each template task can have one or more locations, people, and objects as arguments. Tasks are also grouped into three categories that are explained below. Depending on the category, commands may include a sequence of tasks, an underspecified task, or erroneous information and hence an impossible task. Some examples are provided in Table 3.

Category 1:
  Move to <LOCATION 1>, get <OBJECT> and put it at <LOCATION 2>
  Go to <LOCATION 1>, find <PERSON> and follow him/her
  ...
Category 2:
  Bring me a drink
  Look for a person in the apartment
  ...
Category 3:
  Bring me an <OBJECT> from a <LOCATION> (but there is no such object in that location)
  ...

Table 3: Examples of template tasks.

A task t̄ is the instantiation of a template task t ∈ T obtained by selecting random locations, people and objects as needed. For example, a task generated from the first template in Table 3 could be "Move to the kitchen shelf, get a coke and put it on the dining table". The tasks are grouped in three categories of increasing difficulty. In the first category, the tasks are formed by a sequence of basic actions. The main difficulty for the robot is to understand a complex sequence of actions and to put the actions together at run-time to accomplish the task. In the second category, some information is not given to the robot; for example, the location in which the action must be performed is not specified, or only the object class instead of a particular object is mentioned. In this case the robot has to provide the capability of either asking for clarification to obtain more detailed information (e.g., "which beverage should I bring you?") or acting according to the incomplete specification (e.g., finding and offering different beverages). In the third category, the task contains erroneous information and cannot be accomplished. For example, the robot is asked to find an object at a location, but there is no such object at the given location. The robot has to come up with an explanation for not having solved the task. It has to report to the operator that something went wrong, and what exactly went wrong. That is, by conducting the task the robot has to find out which part of the information in the given command is erroneous and constitutes a conflict between the expected and the actual state of the world. Also, the robot can come up with an alternative solution, such as searching for the object at another location or offering a different but similar object, e.g., another beverage. It must be noted that the robot is not aware of the command category beforehand.
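The command-understanding side can be pictured with a small sketch: synonyms are normalized to canonical actions, and missing or conflicting arguments are detected against the robot's knowledge. The vocabulary, world model and helper names below are hypothetical illustrations, not the interface of any team's system.

```python
# Sketch of GPSR command handling: normalize action synonyms, then classify the
# request as fully specified, underspecified (ask a clarifying question), or
# impossible (report the conflict). Vocabulary and world model are hypothetical.

ACTION_SYNONYMS = {"go to": "navigate", "move to": "navigate", "drive to": "navigate",
                   "bring me": "fetch", "get": "fetch"}
OBJECT_CLASSES = {"drink": {"coke", "juice"}}
WORLD = {"kitchen shelf": {"juice"}, "dining table": set()}  # location -> objects there

def normalize(phrase):
    for syn, canonical in ACTION_SYNONYMS.items():
        if phrase.startswith(syn):
            return canonical, phrase[len(syn):].strip()
    return None, phrase

def interpret(action_phrase, location=None):
    action, argument = normalize(action_phrase)
    if action == "fetch":
        if argument in OBJECT_CLASSES:                         # category 2: only a class given
            return ("clarify", f"Which {argument} should I bring you?")
        if location and argument not in WORLD.get(location, set()):
            return ("report", f"There is no {argument} at the {location}.")  # category 3
        return ("execute", (action, argument, location))
    return ("execute", (action, argument))

print(interpret("bring me drink"))                      # ('clarify', ...)
print(interpret("get coke", location="dining table"))   # ('report', ...)
print(interpret("get juice", location="kitchen shelf")) # ('execute', ...)
```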
Results for this test are shown in Table 4, in terms of the number of teams with a non-zero score and of the average percentage of the score gained by the teams with respect to the maximum amount available. Note that in this table we report the total test score, considering both the cognition and the other abilities required within the task. The results for 2011 take into account the fact that the test was split into two parts; some teams participated in both parts and thus are counted twice. In general, it is possible to notice a slight increase of performance over the years, although the test is still very difficult. As a consequence of this difficulty, the test will not change significantly in the next years, providing chances for better overall results. The decline in performance for 2012 can be explained by the overall drop in scores due to the increased difficulty of the tested capabilities.

Year  Nr. teams  Non-zero score  Avg. score
2010  7          2               3%
2011  19         7               24%
2012  7          3               8%

Table 4: Evolution of results for the GPSR test.

Robots and teams

The robots demonstrated in RoboCup@Home are as diverse as the research foci of the groups behind the participating teams. Predominantly, teams consist of researchers working on projects related to human-robot interaction, mobile manipulation, and domestic service robotics in general, as well as of students directly working on problems addressed in RoboCup@Home in the form of practical courses, seminars or theses. In order to present an overview of the diversity of problems addressed in RoboCup@Home, and to provide a proof of concept for competitions as a benchmark for integrated systems, we present various capabilities successfully demonstrated by teams in the RoboCup@Home league. It is important to notice, however, that although we focus our descriptions on single functionalities, they have been demonstrated (and evaluated for performance) within integrated systems executing complex integrated tests.

Detecting, tracking and following humans

A common approach for detecting individual persons who are not standing closely together is to detect (pairs of) legs in 2D laser range scans. A very robust and efficient approach to laser-based people detection has been demonstrated by Team NimbRo from the University of Bonn in Germany. They use multiple 2D laser range scanners mounted at different heights and detect and track legs and torsos (Droeschel et al. 2011). In addition, tracked hypotheses are only considered to be humans once they have been verified through vision-based face detection at the tracked position.
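To illustrate the generic idea of laser-based leg detection (a simplified sketch only, not the NimbRo pipeline): the scan is split into segments at range discontinuities, and segments whose width is plausible for a human leg are kept as candidates.

```python
# Simplified sketch of leg detection in a 2D laser scan (illustrative only, not
# the method of any particular team): split the scan into segments at range
# discontinuities and keep segments whose width is plausible for a human leg.

import math

def scan_to_points(ranges, angle_min, angle_increment):
    """Convert a range scan to 2D points in the sensor frame."""
    return [(r * math.cos(angle_min + i * angle_increment),
             r * math.sin(angle_min + i * angle_increment))
            for i, r in enumerate(ranges)]

def segment(points, jump=0.15):
    """Group consecutive points separated by less than `jump` meters."""
    segments, current = [], [points[0]]
    for p, q in zip(points, points[1:]):
        if math.dist(p, q) < jump:
            current.append(q)
        else:
            segments.append(current)
            current = [q]
    segments.append(current)
    return segments

def leg_candidates(segments, min_width=0.05, max_width=0.25):
    """Keep segments whose end-to-end width looks like a leg."""
    return [seg for seg in segments
            if len(seg) >= 3 and min_width <= math.dist(seg[0], seg[-1]) <= max_width]
```

Real systems additionally track such candidates over time and verify them with a second modality, as described above for the NimbRo approach.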
People detection and gesture recognition based on 3D data (e.g., using RGB-D cameras) have also been studied. Team NimbRo uses an approach that first segments the different body parts in acquired 3D point clouds and then estimates body (part) postures. With this ability the team's robots can perceive locations in the environment where a human operator is pointing, and can recognize and grasp objects shown to the robot (Droeschel et al. 2011). For recognizing and tracking people when following a human guide through populated areas, team RobotAssist from the University of Technology Sydney in Australia has presented an efficient yet robust approach based on 3D features (Kirchner, Alempijevic, and Virgona 2012). The team analyzes the size and shape of a person's head-to-shoulder region in 3D point clouds. The approach is not only very fast, but also allows for recognizing people standing with their backs towards the robot. Thus the robot can constantly verify that it is following the correct person, and it does not need to request the guide to step in front of the robot (facing the robot) for re-verification after the robot has lost sight of the guide (as was part of the "Follow Me" test in RoboCup@Home until 2011). Team b-it-bots from Bonn-Rhein-Sieg University in Germany has defined a novel 3D feature descriptor based on local surface normals, together with a classifier for detecting persons (Hegger et al. 2012).

Especially in populated places, tracking and following a human guide becomes intractable in situations where the guide leaves the direct sight of the robot and the robot is not able to recover and find the user again. Team GOLEM from the Universidad Nacional Autónoma de México demonstrated successful speaker localization on a mobile robot (Rascon and Pineda 2012) during RoboCup 2012 in Mexico City. A human operator guiding the robot through an (unknown) apartment repeatedly left the robot's sight and hid somewhere in the robot's vicinity. Solely by calling the robot, the operator could resume the guiding task.

Team eR@sers from Japan has successfully demonstrated (human) motion learning. Their approach estimates reference points, intrinsic coordinate system types and, finally, probabilistic motion model parameters for generalizing motions. Hidden Markov Models are used for learning and generating trajectories (Sugiura et al. 2011).

Detecting, learning and recognizing objects

Besides human-robot interaction, mobile manipulation and interaction with objects constitute a large part of the abilities tested in RoboCup@Home. Of particular relevance are fast and reliable perception of objects and the robot's ability to safely pick up, place and hand over objects. A very fast yet robust approach to detecting objects has been demonstrated by Team NimbRo. Their robots approach the location where the object is expected to be, e.g., on a table, and capture the scene using an RGB-D camera. For detection and localization of objects in real time, they use a fast method to compute local surface normals and restrict all further processing to horizontal planes (Holz et al. 2011). This method is combined with very efficient approaches to grasp planning, which allows the team's robots to pick up objects only seconds after arriving at the object's approximate location (Stückler, Holz, and Behnke 2012). For modeling and tracking objects in real time they have developed fast methods using multi-resolution surfel maps, and they successfully demonstrated a robot tracking a table in 3D and carrying it cooperatively with a human (Stückler, Holz, and Behnke 2012).
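A strongly simplified sketch of the tabletop idea follows. It is not the Holz et al. method (which uses fast surface-normal computation); here the horizontal support plane is approximated by the dominant height in a gravity-aligned point cloud, and points above it are treated as object candidates.

```python
# Strongly simplified tabletop segmentation sketch (illustrative only, not the
# normals-based method of Holz et al. 2011): with a gravity-aligned point cloud,
# take the dominant height as the support plane and keep points above it as
# object candidates.

import numpy as np

def find_support_plane_height(cloud, bin_size=0.01):
    """Return the most populated height (z) bin, a crude stand-in for plane fitting."""
    z = cloud[:, 2]
    edges = np.arange(z.min(), z.max() + bin_size, bin_size)
    hist, edges = np.histogram(z, bins=edges)
    i = int(np.argmax(hist))
    return 0.5 * (edges[i] + edges[i + 1])

def object_candidate_points(cloud, plane_z, margin=0.02, max_height=0.40):
    """Keep points slightly above the support plane, up to a maximum object height."""
    z = cloud[:, 2]
    return cloud[(z > plane_z + margin) & (z < plane_z + max_height)]

# Usage with a synthetic cloud: a flat table at z = 0.75 m plus a small object on top.
table = np.column_stack([np.random.rand(2000), np.random.rand(2000), np.full(2000, 0.75)])
obj = np.column_stack([0.5 + 0.03 * np.random.rand(200),
                       0.5 + 0.03 * np.random.rand(200),
                       0.75 + 0.10 * np.random.rand(200)])
cloud = np.vstack([table, obj])
plane_z = find_support_plane_height(cloud)
print(plane_z, object_candidate_points(cloud, plane_z).shape)
```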
For on-line learning of objects (e.g., when a human user wants to teach the robot a new object), team eR@sers from Japan has successfully demonstrated the use of object names that are unknown to the robot and not contained in the robot's vocabulary. To efficiently learn such out-of-vocabulary words they use phoneme sequences (Nakamura et al. 2012). For interacting with the user using the learned word, they convert the waveform to the robot's voice by eigenvoice GMM. The same approach has also been successfully used in RoboCup@Home to learn (and recognize) the names of previously unknown persons.

Conclusion

RoboCup@Home provides the environment, methodology and community effort needed to discuss potential benchmarks in the field of domestic service robotics. It creates dynamic problems in order to drive the intelligent robotics research agenda. By executing the benchmarks, the community has gained experience in how to set up and evaluate benchmarks using statistical procedures. An outcome of RoboCup@Home that the authors deem important is the benchmarking of robot cognition, which requires robots that can already operate in uncertain circumstances. Many of these robust robots have been developed using the criteria set forth by the experts of the technical committees over the past seven years. Although the benchmarking of domestic service robots and of robot cognition is still a young field of research, the progress of the benchmark and of the participating robots has been impressive.

References

Droeschel, D.; Stückler, J.; Holz, D.; and Behnke, S. 2011. Towards joint attention for a domestic service robot – person awareness and gesture recognition using time-of-flight cameras. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 1205–1210.
Feil-Seifer, D.; Skinner, K.; and Mataric, M. J. 2007. Benchmarks for evaluating socially assistive robotics. Interaction Studies 8(3):423–439.
Hegger, F.; Hochgeschwender, N.; Kraetzschmar, G. K.; and Ploeger, P. G. 2012. People detection in 3D point clouds using local surface normals. In Proceedings of the RoboCup International Symposium (CD-ROM Proceedings).
Holz, D.; Holzer, S.; Rusu, R. B.; and Behnke, S. 2011. Real-time plane segmentation using RGB-D cameras. In Proceedings of the 15th RoboCup International Symposium, volume 7416 of Lecture Notes in Computer Science, 307–317. Istanbul, Turkey: Springer.
Kirchner, N.; Alempijevic, A.; and Virgona, A. 2012. Head-to-shoulder signature for person recognition. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 1226–1231.
Nakamura, T.; Sugiura, K.; Nagai, T.; Iwahashi, N.; Toda, T.; Okada, H.; and Omori, T. 2012. Learning novel objects for extended mobile manipulation. Journal of Intelligent and Robotic Systems 66(1-2):187–204.
Pineda, L.; Meza, I.; Avilés, H.; Gershenson, C.; Rascón, C.; Alvarado, M.; and Salinas, L. 2011. IOCA: Interaction-oriented cognitive architecture. Research in Computer Science 54:273–284.
Rascon, C., and Pineda, L. A. 2012. Lightweight multi-direction-of-arrival estimation on a mobile robotic platform. In Proceedings of the International Conference on Signal Processing and Imaging Processing, held under the World Congress on Engineering and Computer Science.
Stückler, J.; Holz, D.; and Behnke, S. 2012. RoboCup@Home: Demonstrating everyday manipulation skills in RoboCup@Home. IEEE Robotics & Automation Magazine 19(2):34–42.
Sugiura, K.; Iwahashi, N.; Kashioka, H.; and Nakamura, S. 2011. Learning, generation and recognition of motions by reference-point-dependent probabilistic models. Advanced Robotics 25(6-7):825–848.
Wisspeintner, T.; van der Zant, T.; Iocchi, L.; and Schiffer, S. 2009. RoboCup@Home: Scientific competition and benchmarking for domestic service robots. Interaction Studies 10(3):393–428.