AAMAS'10 Workshop on Agents Learning Interactively from Human Teachers (2010)

Impact of Human Communication in a Multi-teacher, Multi-robot Learning by Demonstration System

Murilo Fernandes Martins (murilo@ieee.org) and Yiannis Demiris (y.demiris@imperial.ac.uk)
Dept. of Electrical and Electronic Engineering, Imperial College London, London, UK

ABSTRACT
A wide range of architectures have been proposed within the areas of learning by demonstration and multi-robot coordination. These areas share a common issue: how humans and robots share information and knowledge among themselves. This paper analyses the impact of communication between human teachers during simultaneous demonstration of task execution in the novel Multi-robot Learning by Demonstration (MRLbD) domain, using the MRLbD architecture. Performance is analysed in terms of time to task completion, as well as the quality of the multi-robot joint action plans. Participants with different levels of skill taught real robots solutions to a furniture-moving task through teleoperation. The experimental results provide evidence that explicit communication between teachers does not necessarily reduce the time to complete a task, but contributes to the synchronisation of manoeuvres, thus enhancing the quality of the joint action plans generated by the MRLbD architecture.

Categories and Subject Descriptors: I.2.9 [Artificial Intelligence]: Robotics

General Terms: Algorithms, Design, Experimentation

Keywords: Learning by Demonstration, Multi-robot Systems

1. INTRODUCTION
The extensively studied field of Multi-robot Systems (MRS) is well known for its advantages and remarkable applications when compared to systems consisting of a single robot.
MRS bring benefits such as redundancy, flexibility, robustness and so forth (for a review of MRS, refer to [14]) to a variety of domains, including search and rescue, cleanup of hazardous waste, space exploration, surveillance and the widely recognised RoboCup domain. Likewise, a wide range of studies have addressed the challenge of developing Learning by Demonstration (LbD) architectures, in which a system learns not from expert designers or programmers, but from observation and imitation of desired behaviour (for a comprehensive survey, see [1]). However, very little work has been done attempting to merge LbD and MRS.

The work of [10] presented a novel approach to LbD in MRS – the Multi-robot Learning by Demonstration (MRLbD) architecture.

Figure 1: The P3-AT mobile robots used in the experiments.

This architecture enabled multiple robots to learn multi-robot joint action plans by observing the simultaneous execution of desired behaviour demonstrated by multiple teachers. This was achieved by first learning low-level behaviours at the single-robot level using the HAMMER architecture [6], and subsequently applying a spectral clustering algorithm to segment group behaviour [16].

Communication is a classic feature, common to the areas of MRS and LbD, which has been addressed by a wide variety of studies. The performance impact of communication in MRS is well known [2], [13]. However, an open question is whether the effects of communication between humans when teaching robots are correlated with those observed in MRS. To the knowledge of the authors, no prior work has analysed the effects of communication between teachers in the MRLbD domain. This paper presents an analysis of performance in terms of time to task completion and quality of the demonstration in a multiple-teacher, multiple-robot learning by demonstration scenario, making use of the MRLbD architecture.
The remainder of this paper is organised as follows: Section 2 provides background information on how communication has been used within the areas of MRS and LbD. Section 3 explains the teleoperation framework, the MRLbD architecture and the robots (pictured in Fig. 1), followed by Section 4, which presents the real scenario and the task specification in which the experiments were performed. Subsequently, Section 5 analyses the results obtained, and finally Section 6 presents the conclusions and further work.

2. BACKGROUND INFORMATION
Several studies discuss the properties and nature of communication among humans and robots. In this paper, communication is categorised into two types, whose definitions are discussed below. The impact of communication in MRS is also discussed in this section.

2.1 Implicit communication
Implicit communication is the information that can be gathered just by observing the environment; it relies on the perceptual capabilities of the observer, often involving computer vision. Implicit communication does not depend on communication networks or mechanisms. On the other hand, a mismatch between observed data and the observer's underlying capabilities can occur – a common issue known as the correspondence problem [11]. LbD techniques often make use of implicit communication, where the behaviour of a group or a single agent (human or robot) is observed, and predictions of intention are formulated. The group or single agent is not necessarily aware of the fact that it is being observed. In some cases, humans pedagogically demonstrate desired behaviour to a robot by executing a task (e.g., [6]) or teleoperating robots (e.g., [3], [10]). In other cases a robot observes humans and hypothesises the actions humans may be executing [9].
2.2 Explicit communication
Explicit communication occurs when robots deliberately exchange internal states by means other than observation. This type of communication is largely used in MRS as a means to gather data in order to maximise the utility function of a group. Thus, it is expected that a group of robots can improve performance by sharing locations, sensor data and so forth. As an example, the work of [17] presented an approach in which robots could form coalitions with other robots, sharing the locations of robots and objects, as well as sensor data. Market-based approaches for multi-robot task allocation also make use of state communication to define auctions and distribute resources among robots (for a review, refer to [7]). Within the LbD area, in [5] a teacher could explicitly teach two humanoid robots when information should be communicated, and also select the state features that should be exchanged using inter-robot communication. The approach presented in [12] allowed the teacher to provide the robot with additional information beyond the observed state by using verbal instructions, such as TAKE, DROP, START and DONE. These verbal instructions were given while the robot learnt to transport objects by following the teacher. Similarly, the framework presented in [15] enabled a robot to establish a verbal dialogue (using a pre-defined vocabulary of sentences) to learn how to perform a guided tour while following the teacher around.

2.3 Impact of Communication in MRS
Previous studies have analysed the effects of communication in MRS. In [2], the performance of a simulated group of reactively controlled robots was analysed in three distinct tasks. According to the results obtained in that study, explicit communication is not essential for tasks in which implicit communication is available. On the other hand, if very little of the environment's state is observable, then explicit communication significantly improves group performance.
In the work of [13], performance was analysed with regard to time to task completion; the results obtained led to conclusions similar to those of [2]. If the necessary information can be gathered through implicit communication, then explicit communication makes no significant difference to the time to complete a task. However, if the cost of executing redundant actions is high, then explicit communication can potentially reduce the time to task completion.

Figure 2: Overview of the MRLbD architecture.

The results presented in [10] demonstrated that multiple robots can learn joint action plans from multiple teacher demonstrations. In those experiments, no explicit communication between the teachers was permitted. This paper analyses whether the impact on performance caused by communication between teachers is correlated with the findings of [2] and [13].

3. PLATFORM FOR EXPERIMENTATION
This section briefly describes the teleoperation framework with which the P3-AT robots were controlled during the experiments, as well as the relevant details of the MRLbD architecture (whose block diagram can be seen in Fig. 2), employed to extract multi-robot joint action plans.

3.1 Teleoperation framework
The teleoperation framework within the MRLbD architecture consists of client/server software written in C++ to control the P3-AT robots. The server software comprises the robot cognitive capabilities and resides on the robot's onboard computer. The server is responsible for acquiring sensor data and sending motor commands to the robot, whereas the client software runs on a remote computer and serves as the interface between the human operator and the robot.
The P3-AT mobile robots – Each robot is equipped with an onboard computer, and its sensing capabilities comprise a Sick LMS-200 laser range scanner with a 180° field of view and centimetre accuracy, a FireWire camera providing colour images of 320x240 pixels at 30 frames per second, and 16 sonar range sensors, each with a 30° field of view, installed in a ring configuration around the robot. In addition, odometry sensors installed on the robot provide feedback of the robot pose (2D coordinates and orientation relative to a starting point).

The server software – Within the Robot cognitive capabilities block (Fig. 2), the server interfaces with the robot hardware by means of the robot control interface Player [8]. Initially, sensor data is requested from the robot and then processed. Laser data is combined with robot odometry values to build an incremental 2D map of the environment. Additionally, the images captured from the camera are used for object recognition and pose estimation, and are also compressed in JPEG format to be sent over the Wi-Fi network to the client software. The sonar data is used for obstacle detection, which disables the motors in case a specific motor command would result in crashing into an unrecognised object. Besides, joystick inputs representing motor commands are constantly received from the client. The data manipulated by the server is incrementally stored in a log file every 0.5 seconds, comprising a series of observations made by the robot during task execution.

Figure 3: A screenshot of the human-robot interface; camera is on top-left, 2D map is on top-right.

The client software – Within the Human-robot interface block (Fig. 2), the sensor data received from the server is displayed to the human demonstrator.
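The server's obstacle safeguard and periodic logging described above can be sketched in a few lines. This is a hedged illustration in Python rather than the paper's C++ implementation; only the 0.5-second logging period comes from the text, while the sonar stop distance, the command fields `v`/`w` and the record format are assumptions for the sketch.

```python
import json

SONAR_STOP_DIST_M = 0.35  # assumed safety threshold, not from the paper
LOG_PERIOD_S = 0.5        # the paper logs one observation every 0.5 seconds

def guard_motor_command(cmd, sonar_ranges_m):
    """Disable the motors if any sonar return is inside the stop radius,
    mirroring the server's obstacle-detection safeguard."""
    if min(sonar_ranges_m) < SONAR_STOP_DIST_M:
        return {"v": 0.0, "w": 0.0}  # stop translation and rotation
    return cmd

class ObservationLogger:
    """Append one observation record at most every LOG_PERIOD_S seconds."""
    def __init__(self):
        self.records = []
        self._last_t = None

    def maybe_log(self, t, pose, cmd):
        if self._last_t is None or t - self._last_t >= LOG_PERIOD_S:
            self.records.append(json.dumps({"t": t, "pose": pose, "cmd": cmd}))
            self._last_t = t
```

In use, the server would call `guard_motor_command` on every joystick command received from the client, and `maybe_log` on every sensor-processing cycle, so the log accumulates a uniformly sampled observation trace for the later plan-extraction stages.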
This data consists of the robot's onboard camera feed, laser and sonar range readings, odometry values, battery level and Wi-Fi signal strength. A screenshot of this interface can be seen in Fig. 3. To aid in teleoperating the robots, thicker line segments are extracted from the laser data, and recognised objects are represented by specific geometric shapes and colours (e.g., when another robot is recognised, a blue ellipse representing the observed robot pose is displayed on screen). In addition, an incremental 2D map of the environment is built using laser and odometry data, a map which can be substantially useful for location awareness during navigation. Meanwhile, the robot is teleoperated by the demonstrator through a joystick.

3.2 The MRLbD Architecture
It is worth noting that the MRLbD architecture does not necessarily require the aforementioned teleoperation framework; it can be integrated into any MRS in which robots are capable of generating appropriate observations of the state of the environment over time. The learning module of the MRLbD architecture is responsible for extracting a joint action plan for each set of simultaneously demonstrated data provided by multiple human teachers. Within the Plan Extraction block, three distinct stages, discussed below, process the demonstrated data.

Action recognition at single-robot level – When addressing the problem of recognising observed actions, a mismatch between observed data and robot internal states might occur. This is a common issue known as the correspondence problem [11]. The MRLbD architecture embeds a prediction-of-intent approach – the HAMMER architecture [6] – to recognise actions from observed data and manoeuvre commands, based upon the concept of multiple hierarchically connected inverse-forward models. As illustrated in Fig.
4, each inverse-forward model pair is a hypothesis of a primitive behaviour, and the predicted state is compared to the observed state to compute a confidence value.

Figure 4: Diagrammatic statement of the HAMMER architecture. In state st, inverse models (I1 to In) compute motor commands (M1 to Mn), with which forward models (F1 to Fn) form predictions of the next state st+1 (P1 to Pn), which are then verified at st+1.

This confidence value represents how correct that hypothesis is, thus determining the robot primitive behaviour which would result in the outcome most similar to the observed action. As in [10], this paper makes use of five inverse-forward model pairs: Idle (the robot remains stationary), Search (the robot searches for a specific object), Approach (the robot approaches the object once it is found), Push (the robot moves the object to a destination area) and Dock (the robot navigates back to its initial location). While the inverse models are hand-coded primitive behaviours, the forward models take into account the differential-drive kinematics model of the robot.

Group behaviour segmentation – A Matlab implementation of the spectral clustering algorithm presented in [16] is used for spatio-temporal segmentation of group behaviour based on the observations made by the robots during task demonstration. The current work makes use of the same parameters described in [10] for the clustering algorithm. These values were carefully chosen to preserve the generality of the algorithm when applied to different sets of concurrently demonstrated data. In addition, the same two events used in [10] were used in this implementation: the event SEESBOX occurs every time the robot recognises a known object within its field of view, and the event SEESROBOT occurs when the robot recognises another robot.
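The inverse-forward prediction-verification loop above can be illustrated with a minimal sketch. This is not the HAMMER implementation used in the paper: the Gaussian-shaped confidence function, the `sigma` tolerance and the toy state representation are assumptions made purely for illustration.

```python
import math

def confidence(predicted, observed, sigma=0.5):
    """Gaussian-shaped score: close to 1 when the forward model's
    prediction matches the observed next state. sigma is an assumed
    tolerance, not a value from the paper."""
    err = math.dist(predicted, observed)
    return math.exp(-(err * err) / (2.0 * sigma * sigma))

def recognise_action(state, observed_next, behaviours):
    """behaviours maps a name to an (inverse_model, forward_model) pair.
    Each inverse model proposes a motor command for the current state;
    the paired forward model predicts the next state; the hypothesis
    whose prediction best matches the observation wins."""
    scores = {}
    for name, (inverse, forward) in behaviours.items():
        cmd = inverse(state)              # hypothesised motor command
        predicted = forward(state, cmd)   # predicted next state
        scores[name] = confidence(predicted, observed_next)
    best = max(scores, key=scores.get)
    return best, scores
```

With toy Idle and Push model pairs over 2D positions, an observed forward displacement makes Push the highest-confidence hypothesis, mirroring how the architecture attributes primitive behaviours to observed manoeuvres.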
Multi-robot plan generation – In the MRLbD, the segmented group behaviour is combined with the recognised actions at single-robot level to build multi-robot action plans. Such a plan comprises a sequence of single-robot actions, with specific times to start and end action execution, that each robot should perform over time in an attempt to achieve the same goals as the task execution formerly demonstrated. Given the unlabelled group behaviour segmentation over time for each robot, the recognised single-robot actions are ranked based upon the forward-model predictions and the start/end times of that particular segment of group behaviour. For every robot and every cluster, a confidence level of the inverse-forward model is calculated. The action with the highest confidence level is then selected, and a sequence of individual actions for each robot is generated, defining the multi-robot joint action plan.

4. EXPERIMENTAL TESTS
Experiments were performed in a realistic furniture-moving scenario. This experiment differs from the one presented in [4] in the sense that demonstrators have full control of the robot through the joystick, as opposed to the selection of predefined actions comprising the robots' underlying capabilities. Moreover, in this work P3-AT mobile robots were used in a real environment. In comparison with the experiments performed in [10], the current experiments make use of a larger box, and the robots must move the box through a tighter, cluttered environment, demanding significantly more difficult coordinated manoeuvres to complete the task.

Figure 5: Time-based comparison of the demonstrated task executions, with (green bars) and without (blue bars) verbal communication allowed between participants.
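The plan-generation stage described in Section 3.2 (select, per robot and per group-behaviour segment, the highest-confidence primitive action) can be sketched as follows. The data structures are hypothetical stand-ins: the architecture derives segment boundaries from the spectral clustering stage and confidences from the HAMMER forward-model predictions, neither of which is reproduced here.

```python
def extract_joint_plan(segments):
    """segments maps a robot id to a list of group-behaviour segments,
    each given as (start, end, confidences), where confidences maps a
    primitive behaviour name (Idle, Search, Approach, Push, Dock) to
    the confidence computed for that segment. For every robot and every
    segment, the highest-confidence behaviour is kept, yielding a timed
    per-robot action sequence that together forms the joint action plan."""
    return {
        robot: [(start, end, max(conf, key=conf.get))
                for start, end, conf in segs]
        for robot, segs in segments.items()
    }
```

The resulting structure is a timed action sequence per robot, matching the paper's description of a joint plan as single-robot actions with specific start and end times.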
A total of 10 demonstrations were conducted, involving 7 participants: 3 novices, 2 intermediate and 2 experienced users of the teleoperation framework (a novice and the 2 experienced participants performed the experiments twice). The same instructions were given to all 7 demonstrators, including a 10-minute training session for familiarisation with the teleoperation framework. In pairs, the demonstrators were asked to remotely control 2 robots, search for a particular box whose location was initially unknown, and bring it to a specified destination, resulting in task completion. In order to analyse the impact of explicit communication between participants during task execution, the same pair of participants performed 2 task executions (comprising one trial): one demonstration with implicit communication only, and another demonstration with verbal communication permitted. An experienced and an intermediate user performed the first trial, 2 experienced users demonstrated the second, 2 novices carried out the third, the fourth trial was also executed by 2 novices, and lastly an experienced and an intermediate user carried out the final task demonstration.

5. RESULTS AND DISCUSSION
This section analyses the impact of explicit communication between teachers on the joint action plans extracted by the MRLbD architecture, discussing the results obtained from the 10 demonstrations of task execution with regard to time to task completion, as well as the quality of the multi-robot joint action plans generated.

5.1 Time to completion of task demonstrations
From the results obtained using the MRLbD architecture, Fig. 5 shows the time to completion of the 10 task executions, where each trial represents 2 demonstrations from the same pair of participants, with and without explicit (namely, verbal) communication allowed between participants.
A comparative analysis of the trials on a time basis provides no evidence of performance gains, which is consistent with the discussion presented in Section 2.3. In fact, no significant difference can be observed. On one hand, explicit communication can cause delays, as observed in trials 2 (13 secs.) and 4 (22 secs.) in Fig. 5. Delays could presumably be a consequence of participants naturally spending some time communicating with each other in order to synchronise and coordinate manoeuvres. On the other hand, trials 1 and 5 presented a slight reduction in the time demanded (30 secs. and 19 secs., respectively). Recalling the discussion in Section 2.3, implicit communication does not always provide enough information when the environment is partially observable. That is, sometimes a robot might lose track of its robot mate or the object being manipulated, thus requiring manoeuvres to look around and observe its surroundings. In such cases, one robot can provide the other with the relevant information by explicitly communicating its observation, avoiding unnecessary manoeuvres.

In trial 3, however, a substantial reduction in the time to completion of the task can be observed (4 min. and 11 secs.). Unexpectedly, this trial provided an excellent example of how group behaviour performance can benefit from explicit communication. In this particular trial, during the demonstration using implicit communication, both participants decided to navigate in the direction opposite to where the box was initially located (Fig. 6(a)), thus demanding more time for the box to be found by both participants. Searching for the box does not require collaboration between robots, but it is a classic example of a redundant action. Predictably, the participants benefited from explicit communication: one participant quickly found the box and immediately shared that information, telling the other participant the box location (Fig. 6(b)), hence reducing the time to task completion.
Interestingly, whether the participants demonstrating the task were experienced users or novices, allowing explicit communication between teachers did not result in any change in the time to completion of the task.

5.2 Quality of Joint Action Plans Generated
It is widely recognised that developing systems to work in real-world scenarios elicits real-world issues which must be taken into account, such as noise and the non-determinism of the environment. Furthermore, when developing systems in which robots learn from humans, the demonstrations of desired behaviour are unlikely to result in exactly the same data. Therefore, the multi-robot joint action plans generated by the MRLbD architecture are expected to present variation, mainly in the segmentation of group behaviour, where robot trajectories over time are taken into account.

Fig. 7(a) plots the joint action plan generated by processing the data demonstrated with implicit communication only, from trial 1. In that particular demonstration, a lack of coordination can be observed, showing that the manoeuvres were mostly out of synchrony.

Figure 6: Robot trajectories over time from trial 3. (a) No communication allowed: both robots navigated opposite to the box location. (b) Communication allowed: the box location was communicated to the other robot.

While robot 1 was already pushing the object (denoted by the character "P" within the
second action on the timeline), robot 2 was still approaching ("A") the object. Only at the fourth action did robot 1 realise that robot 2 was not in synchrony with its manoeuvres. This lack of coordination caused the HAMMER architecture to recognise the fourth and fifth individual actions of robot 2 as Dock. At that stage, robot 2 was moving towards the place where it was initially located, and presumably lost track of the object, leading the confidence level of the Dock action to be higher.

Figure 7: Joint action plans generated using demonstrations from trial 1. (a) Implicit communication only. (b) Explicit communication allowed.

Figure 8: Joint action plans generated using demonstrations from trial 2. (a) Implicit communication only. (b) Explicit communication allowed.

Conversely, the joint action plan extracted from the demonstration with explicit communication in trial 1 indicates reasonably synchronised actions, as can be seen in Fig. 7(b). Again, robot 1 started pushing the object first, while robot 2 was still approaching the object (second actions on the timeline). The participant teleoperating robot 1 was possibly notified by the other participant, which initiated a period in which the robots remained stationary (Idle action, denoted by "I"). The remaining actions were hence executed in coordination until task completion, thus suggesting that explicit communication can help the MRLbD architecture generate better coordinated joint action plans. Fig.
8 also provides evidence that explicit communication contributes to the extraction of better joint action plans. In trial 2, both participants were experienced users. Therefore, the result from the demonstration with implicit communication only (Fig. 8(a)) already presents a considerably coordinated, noticeably synchronised joint action plan. However, this synchrony is lost at the fifth action on the timeline; it was quickly re-established by robot 2, which presumably approached the object rapidly and then re-engaged in moving the object. In contrast with Fig. 8(a), Fig. 8(b) reveals a plan in which actions are jointly executed, tightly synchronised in time, when explicit communication was permitted between the experienced teachers. These results indicate that the use of explicit communication aids human teachers regardless of their level of skill, improving the quality of the demonstration, although that quality still depends on skill level.

6. CONCLUSION AND FUTURE WORK
This paper presented an analysis of the impact of communication between humans concurrently teaching multiple robots using the novel MRLbD architecture proposed in [10]. It was found that the impact of explicit communication on MRLbD system performance is strongly correlated with its impact in MRS. Moreover, the results provide evidence that explicit communication affects human teachers to the same extent, regardless of their level of skill. Regarding the time to task completion, explicit communication was found to result in no substantial alteration in performance. Explicit communication can only be beneficial in reducing the time to task completion if the cost (e.g., in time or battery life) of executing redundant actions is high. However, the quality of the joint action plans was shown to benefit from explicit communication.
It was revealed that those plans presented more tightly coordinated, temporally synchronised actions executed by the robots. Ongoing work focuses on refining the inverse-forward model pairs of the HAMMER architecture for action recognition and on investigating the use of other events within the spectral clustering algorithm.

7. ACKNOWLEDGEMENTS
The support of the CAPES Foundation, Brazil, under project No. 5031/06–0, is gratefully acknowledged.

8. REFERENCES
[1] B. D. Argall, S. Chernova, M. Veloso, and B. Browning. A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57(5):469–483, 2009.
[2] T. Balch and R. C. Arkin. Communication in reactive multiagent robotic systems. Auton. Robots, 1(1):27–52, 1994.
[3] S. Chernova and M. Veloso. Confidence-based policy learning from demonstration using Gaussian mixture models. In AAMAS '07: Proc. of the 6th Int. Joint Conf. on Autonomous Agents and Multiagent Systems, pages 1–8, New York, NY, USA, 2007. ACM.
[4] S. Chernova and M. Veloso. Multiagent collaborative task learning through imitation. In 4th Int. Symposium on Imitation in Animals and Artifacts, April 2007.
[5] S. Chernova and M. Veloso. Teaching multi-robot coordination using demonstration of communication and state sharing. In AAMAS '08: Proc. of the 7th Int. Joint Conf. on Autonomous Agents and Multiagent Systems, pages 1183–1186, Richland, SC, 2008. IFAAMAS.
[6] Y. Demiris and B. Khadhouri. Hierarchical attentive multiple models for execution and recognition of actions. Robotics and Autonomous Systems, 54(5):361–369, 2006.
[7] M. B. Dias, R. M. Zlot, N. Kalra, and A. T. Stentz. Market-based multirobot coordination: A survey and analysis. Technical report, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, April 2005.
[8] B. Gerkey, R. Vaughan, K. Stoy, A. Howard, G. Sukhatme, and M. Mataric. Most valuable player: a robot device server for distributed control. Intelligent Robots and Systems, 2001. Proc.
of the 2001 IEEE/RSJ Int. Conf. on, 3:1226–1231, 2001.
[9] R. Kelley, A. Tavakkoli, C. King, M. Nicolescu, M. Nicolescu, and G. Bebis. Understanding human intentions via hidden Markov models in autonomous mobile robots. In HRI '08: Proc. of the 3rd ACM/IEEE Int. Conf. on Human Robot Interaction, pages 367–374, New York, NY, USA, 2008. ACM.
[10] M. F. Martins and Y. Demiris. Learning multirobot joint action plans from simultaneous task execution demonstrations. In Proc. of the 9th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2010), to appear, Toronto, Canada, 2010.
[11] C. L. Nehaniv and K. Dautenhahn. The correspondence problem, pages 41–61. MIT Press, Cambridge, MA, USA, 2002.
[12] M. N. Nicolescu and M. J. Mataric. Natural methods for robot task learning: Instructive demonstrations, generalization and practice. In Proc. of the 2nd Int. Joint Conf. on Autonomous Agents and Multi-Agent Systems, pages 241–248, 2003.
[13] L. E. Parker. The effect of action recognition and robot awareness in cooperative robotic teams. In IROS '95: Proc. of the Int. Conf. on Intelligent Robots and Systems, Volume 1, page 212, Washington, DC, USA, 1995. IEEE Computer Society.
[14] L. E. Parker. Distributed intelligence: Overview of the field and its application in multi-robot systems. Journal of Physical Agents, 2(1):5–14, March 2008. Special issue on Multi-Robot Systems.
[15] P. Rybski, K. Yoon, J. Stolarz, and M. Veloso. Interactive robot task training through dialog and demonstration. In 2nd ACM/IEEE Int. Conf. on Human Robot Interaction, March 2007.
[16] B. Takacs and Y. Demiris. Balancing spectral clustering for segmenting spatio-temporal observations of multi-agent systems. In Data Mining, 2008. ICDM '08. Proc. of the 8th IEEE Int. Conf. on, pages 580–587, Dec. 2008.
[17] F. Tang and L. Parker. ASyMTRe: Automated synthesis of multi-robot task solutions through software reconfiguration. Robotics and Automation, 2005. ICRA 2005. Proc. of the 2005 IEEE Int. Conf.
on, pages 1501–1508, 18-22 April 2005.