AAAI Technical Report FS-12-07: Robots Learning Interactively from Human Teachers

Learning to Avoid Collisions

Elizabeth Sklar 1,4, Simon Parsons 1,4, Susan L. Epstein 2,4, A. Tuna Özgelen 4, J. Pablo Muñoz 4, Farah Abbasi 3, Eric Schneider 2 and Michael Costantino 3
1 Brooklyn College, 2 Hunter College, 3 College of Staten Island, 4 The Graduate Center
The City University of New York, New York, NY USA
sklar@sci.brooklyn.cuny.edu, parsons@sci.brooklyn.cuny.edu, susan.epstein@hunter.cuny.edu, tunaozgelen@gmail.com, jpablomch@gmail.com, farah.abbasi@cix.csi.cuny.edu, nitsuga@pobox.com, michael.costantino@cix.csi.cuny.edu

Copyright © 2012, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Members of a multi-robot team, operating within close quarters, need to avoid crashing into each other. Simple collision avoidance methods can be used to prevent such collisions, typically by computing the distance to other robots and stopping, perhaps moving away, when this distance falls below a certain threshold. While this approach may avoid disaster, it may also reduce the team's efficiency if robots halt for a long time to let others pass by or if they travel further to move around one another. This paper reports on experiments in which a human operator, through a graphical user interface, watches robots perform an exploration task. The operator can manually suspend robots' movements before they crash into each other, and then resume their movements when their paths are clear. Experiment logs record the robots' states when they are paused and resumed. A behavior pattern for collision avoidance is learned by classifying the states of the robots' environment when the human operator issues "wait" and "resume" commands. Preliminary results indicate that it is possible to learn a classifier which models these behavior patterns, and that different human operators consider different factors when making decisions about stopping and starting robots.

Introduction

We are interested in deploying human-robot teams to solve problems that are dangerous for human teams to tackle, but are beyond the capabilities of robots alone.
Examples of such tasks include urban search and rescue (Jacoff, Messina, and Evans 2000; Murphy, Casper, and Micire 2001) and humanitarian de-mining (Habib 2007; Santana, Barata, and Correia 2007). In urban search and rescue, robots explore an enclosed space, such as a collapsed building, and try to locate human victims. In humanitarian de-mining, robots explore an open space, such as a field in a war zone, and search for anti-personnel mines that may be concealed. The goal is to locate the mines so that they can be disarmed and the region rendered safe.

In both cases, teams of robots are deployed to locate targets of interest in terrain that is potentially unsafe for people, and in both cases the robots typically need a human operator to help with parts of the task that they cannot easily handle on their own. In urban search and rescue, this might be identifying a human victim; in de-mining, it might be defusing a device that the robot team has located. In our work (Sklar et al. 2011; 2012), we focus on inexpensive, limited-function robots, because we believe that teams of such robots are more practical for wide deployment on these kinds of tasks than teams of fewer, more expensive and more capable robots.

Although individual robots may require human assistance, large teams of robots present additional challenges for human-robot interaction. There are many ways in which robots can profitably learn from a human trainer, and perhaps lessen the need for human assistance in some circumstances. This paper reports on a feasibility study of one such instance, in which a human operator suspends robot movement in order to avert collisions. Our preliminary investigation collected data from 5 human subjects and assessed the possibility of learning effective behavior patterns from these data. The results detailed here are promising and warrant further investigation.

Related Work

The idea that an interactive system can improve its behavior through observation of human users' key strokes and mouse clicks, i.e., by data mining the clickstream, is not new. In the 1960s and 1970s, Teitelman developed an automatic error correction facility that grew into DWIM (Do What I Mean) (Teitelman 1979). In the early 1990s, Cypher created Eager, an agent that learned to recognize repetitive tasks in an email application and take them over from the user (Cypher 1991). Maes used machine learning techniques to train agents to help with email, filter news messages, and recommend entertainment; these agents gradually gained confidence in their understanding of users' preferences (Maes 1994).

In robotics, the idea that robots can learn from humans has been explored with learning from demonstration (Argall et al. 2009), also known as programming by demonstration (Hersch et al. 2008). This is commonly viewed as a form of supervised learning, in which a policy is learned from a sequence of state/action pairs (Argall et al. 2009). Other approaches to robots learning from people include (Katagami and Yamada 2000), where human teachers provide examples that seed evolutionary learning; (Lockerd and Breazeal 2004), where a robot tries to identify a human's goal and make its own plan to achieve it; and (Nicolescu and Matarić 2001), where a robot observes while the human carries out actions in its domain and learns the outcomes of its own actions from these observations. Little of this work is concerned with multiple robots, however. There is a long history of multi-robot learning, for example (Matarić 1997; Parker 2000; Bowling and Veloso 2003; Pugh and Martinoli 2006), but these approaches involve learning by trial and error, not learning from a human teacher.

In earlier related work, we learned behaviors from data collected while humans played video and educational games, using these data to train neural networks that then served as controllers for opponents playing these games (Sklar 2000; Sklar, Blair, and Pollack 2001). The aim was not to produce the best player, but rather to derive a population of players that represented different characteristics of play. This technique was later extended beyond games to generate populations of agents that emulated students performing at different skill levels on an educational assessment (Sklar et al. 2007; Sklar and Icke 2009). The same idea is applied here, to capture and then represent characteristic behaviors of different human operators as they interact with a multi-robot team performing a complex task.
Our Approach

The work reported here involves a single human operator interacting with a team of three robots. This section describes the physical environment in which the experiments were carried out and the design of the experiments.

Physical setup

Our experimental testbed, or arena, models the interior of a building, with a large space of six rooms and a hallway. The robots explore this space, as described below. The physical testbed is shown in Figure 1, and maps of the arena are shown in Figures 3 and 4. The area of the arena is approximately 400 square feet.

Figure 1: The robots' physical environment.

The robots used for the experiments described here are Surveyor SRV-1 Blackfins (http://www.surveyor.com/SRV_info.html), a small tracked platform equipped with a webcam and 802.11 wireless. The Blackfin is pictured in Figure 2a. Localization is provided by a network of overhead cameras (Sklar et al. 2011). To help these cameras distinguish between robots, a "hat" is mounted atop each Blackfin (see Figure 2b). Each hat contains a unique symbol, selected from the set of Braille letters that do not have any rotational symmetry; thus the hat provides orientation as well as position.

Figure 2: The Surveyor SRV-1 Blackfin: (a) unmodified; (b) with hat.

Because the Blackfin has limited on-board processing, the controller for these robots runs off-board, communicating with the robot over a wireless network. The robot controllers, the software that allocates tasks to robots (described in (Sklar et al. 2012)), and the software that extracts robot positions from the overhead cameras run on a small network of computers coordinated by a central server process. Details of the software architecture are described in (Sklar et al. 2011).

Motivation

As explained above, efficient exploration of the physical space is a key feature of the tasks that we would like our robot team to perform. We have been running experiments in which robots are allocated particular interest points to visit. As described in (Sklar et al. 2012), a market-based component allocates these points to the robots on the team. Each robot controller then calculates a path that spans all the points allocated to that robot, with no knowledge of what the other robots are planning to do. The robots then simultaneously manoeuvre to their allocated points.

Figure 3: Paths traced by three robots exploring the arena. The robots start in the lower left "room," at the points marked in black. Each robot visits an assigned subset of the 8 "interest points" indicated by red ×'s.

Given that the robots are in a restricted space, that they may start in the same part of the space (modeling situations where all robots enter a space from the same point), and that they have no knowledge of what the other robots are planning to do, the robots naturally get in each other's way. This mutual interference is clear in Figure 3, which shows the motion of three robots during one experimental run. Because of this interference, the robot controller is programmed to prevent collisions, and it does so very conservatively: it suspends any robots that get too close to one another, based on a fixed threshold distance. The robot that is closest to its target interest point is given right-of-way and plans a modified route around the stationary robots. Meanwhile, the other robot(s) wait until their paths are clear and are then allowed to move again. The amount of time the robots wait is recorded as delay time, which can accrue rapidly if many robots are trying to manoeuvre in the same small space.

This fixed-threshold method is a naïve way to prevent collisions, so we wanted to investigate whether a human operator could provide a better model of how to avoid collisions, in a way that ultimately reduces overall delay time (the baseline rule is sketched below). The next section describes our initial attempt to learn such behaviors from human operators.
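To make the baseline concrete, here is a minimal sketch in Python of the fixed-threshold rule just described: any robot that comes within a fixed distance of another is suspended, except the one closest to its target interest point, which keeps right-of-way. The data fields and loop structure are assumptions for illustration, not the controller code from (Sklar et al. 2011; 2012).

import math

THRESHOLD_CM = 50.0  # fixed distance threshold used by the baseline controller

def distance(a, b):
    # Straight-line distance between two (x, y) positions, in cm.
    return math.hypot(a[0] - b[0], a[1] - b[1])

def fixed_threshold_step(robots):
    # One control cycle of the naive collision-avoidance baseline.
    # `robots` is a list of dicts with hypothetical fields:
    #   'pos'     -- current (x, y) position in cm
    #   'target'  -- (x, y) of the current interest point
    #   'waiting' -- True if the robot is currently suspended
    for i, robot in enumerate(robots):
        conflicts = [other for j, other in enumerate(robots)
                     if j != i and distance(robot['pos'], other['pos']) < THRESHOLD_CM]
        if not conflicts:
            robot['waiting'] = False          # path clear: resume
            continue
        # Right-of-way goes to whichever conflicting robot is nearest its own goal.
        group = conflicts + [robot]
        winner = min(group, key=lambda r: distance(r['pos'], r['target']))
        robot['waiting'] = (winner is not robot)  # everyone else waits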
For each experimental run, three robots were positioned in the same starting locations (as per Figure 3), eight interest points were allocated, and the robots visited their assigned points while the human operator monitored the team for collisions. In each run, the operator made the robots wait when she thought stopping was necessary to avoid a collision, and she resumed their movement when she judged that the danger of collision was past. A run ended when the robots all reached their final interest point or when a robot collided with another robot or a wall. The robot team was set up in the configuration shown in Figure 3, with all three robots starting in the same room in the arena. The team was allocated eight interest points, and these were distributed among the robots as described in (Sklar et al. 2012). The robots then planned how to visit their assigned points and started to follow the paths that they had planned. The conservative collision avoidance mechanism mentioned above was disabled. In the experiments described here, collision avoidance was handled by the human operator. This person sat at a workstation physically remote from the robots’ arena, in a separate section of the research lab. The human operator could not see or hear anything from the arena. The only information that the operator had about the robots appeared on a user interface, shown in Figure 4. The interface dis- The results reported here were obtained from data collected on five human subjects (considered “trainers” for the purposes of the behavior learning described below), each of whom was an undergraduate researcher working in our lab (2 female, 3 male). Some of the trainers had previously tested the interface, while others had not used it before; but in neither case was any extensive training required. When a run ended with a robot colliding with a wall, we discarded the data and repeated the run. Thus, each trainer completed five runs which ended either with a robot-robot collision or with successful completion of the full task. During each run, the system logged data continuously, including the current positions of the robots and each robot’s mutual interference is clear in Figure 3, which shows the motion of three robots during one experimental run. Because of this interference, the robot controller is programmed to prevent collisions, and it does so very conservatively: it suspends any robots that get too close to one another, based on a fixed threshold distance. The robot that is closest to its target interest point is given right-of-way and plans a modified route around the stationary robots. Meantime, the other robot(s) wait until their paths are cleared and then they are allowed to move again. The amount of time the robots wait is recorded as delay time, which can accrue rapidly if many robots are trying to manoeuvre in the same small space. This fixed-threshold method is a naı̈ve way to prevent collisions. So we wanted to investigate the question of whether a human operator could provide a better model of how avoid collisions, in a way that ultimately reduces overall delay time. The next section describes our initial attempt to learn such behaviors from human operators. 
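As an illustration of equation (1), the sketch below assembles the eight-element state vector for one subject robot from a single logged time step. The record fields and the helper name are hypothetical; only the feature definitions follow the text above.

import math

def state_vector(subject, others):
    # Build <r1, theta1, r2, theta2, Vx, Vy, Hx, Hy> for one robot.
    # `subject` is a dict with hypothetical fields 'x', 'y', 'vx', 'vy'
    # (position in cm and velocity) and 'next_waypoint' = (x, y);
    # `others` is a list holding the two other robots' dicts.
    # Ranges and angles are straight-line values, ignoring walls, as in the text.
    features = []
    for other in others:                        # r_i, theta_i for each other robot
        dx, dy = other['x'] - subject['x'], other['y'] - subject['y']
        features.append(math.hypot(dx, dy))     # range in cm
        features.append(math.atan2(dy, dx))     # angle in radians
    features.extend([subject['vx'], subject['vy']])            # Vx, Vy
    wx, wy = subject['next_waypoint']
    features.extend([wx - subject['x'], wy - subject['y']])    # Hx, Hy
    return features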
We analyzed the collected data to address three questions: (1) the ability to learn behaviors from log data, (2) the ability to distinguish between trainers, and (3) the difference between the trainers' methods and the fixed-threshold method for collision avoidance.

Learning from log data

To address the question of whether the log files could be used to learn behavior patterns for collision avoidance, we first needed to label the logged data. As detailed below, we compare two methods of labeling the data. The first method assigned labels from four different categories, and the second method assigned labels from two different categories.

Table 1 describes the data collected from the five trainers, listed as H1 through H5. The first column contains the total number of samples in the training set produced by each operator; the next four columns contain the number of instances labeled in each of the four categories; and the last two columns contain the number of instances labeled in each of the two categories.

Table 1: Training data. Each Hi indicates one of the 5 human trainers.

Complete set of training data:
      samples   moving-moving   moving-waiting   waiting-moving   waiting-waiting   no-action   action
H1    20,800    20,768          16               16               0                 20,768      32
H2    17,103    17,058          23               22               0                 17,058      45
H3    22,503    22,457          23               23               0                 22,457      46
H4    15,221    15,148          35               35               3                 15,151      70
H5    17,451    17,389          31               31               0                 17,389      62

Balanced subset of training data:
      samples   moving-moving   moving-waiting   waiting-moving   waiting-waiting   no-action   action
H1    65        32              16               16               0                 32          32
H2    91        45              23               22               0                 45          45
H3    92        45              23               23               0                 45          46
H4    146       72              35               35               3                 70          70
H5    125       62              31               31               0                 62          62

The four-category labels are:
• moving-moving, which indicates that the robot was moving when the message was logged;
• moving-waiting, which indicates that the human operator sent a moving robot a command to suspend its motion;
• waiting-moving, which indicates that the human operator sent a suspended robot a command to resume its motion; and
• waiting-waiting, which indicates that the robot was waiting when the message was logged.

The two-category labels are:
• action, which indicates that the human operator did something: either sent a moving robot a command to suspend its motion or sent a suspended robot a command to resume its motion; and
• no-action, which indicates that the human operator did not enter any input when the message was logged.

First, we describe our efforts to learn which states, represented as in equation (1), belong to which of the four-category labels. The upper portion of Table 1, however, clearly indicates that the vast majority of instances are moving-moving. This imbalance has a serious negative impact on learning: most classifiers will simply default to the moving-moving label, since assignment of this overwhelming majority label will be correct 99.72% of the time. A robot guided by such a classifier would never wait.

In order to mitigate this situation, we created a balanced training set by selecting a subset of the complete training set of each human operator. The balanced subset was created by retaining all the non-moving-moving instances, and then selecting an equal number of moving-moving instances from the complete data set, at equally spaced intervals in the chronologically ordered log file (a sketch of this selection appears below). Statistics for the balanced training sets appear in the lower half of Table 1.
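The balanced subsets in the lower half of Table 1 can be produced with a selection like the following sketch, which keeps every non-moving-moving instance and draws an equal number of moving-moving instances at evenly spaced positions in the chronologically ordered log. The (state, label) pair representation is an assumption made for illustration.

def balance(instances):
    # `instances` is a chronologically ordered list of (state, label) pairs.
    # Keep every instance whose label is not 'moving-moving' and add an equal
    # number of 'moving-moving' instances taken at evenly spaced positions.
    minority = [x for x in instances if x[1] != 'moving-moving']
    majority = [x for x in instances if x[1] == 'moving-moving']
    k = min(len(minority), len(majority))
    step = len(majority) / max(k, 1)
    sampled = [majority[int(i * step)] for i in range(k)]
    return minority + sampled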
We employed the WEKA tool, version 3.6.7 (http://www.cs.waikato.ac.nz/ml/weka) (Witten, Frank, and Hall 2011), and tested 10 different classifiers, using 10-fold cross validation: k-nearest neighbors, C4.5 decision trees, a rule extractor from a decision tree (PART), naïve Bayes, Holte's OneR rule learner, a support vector machine (SMO), logistic regression, AdaBoost, logit boost, and decision stumps. The default parameters were used for each method (values are listed in the Appendix).

We ran each classifier on the balanced subset data, attempting to learn the four-category labels. Table 2 shows the best classifier for each trainer and the percentage of correctly classified instances. None is much better than random.

Table 2: The best results on learning 4 classes from the balanced subset data.
      classifier                 correctly classified
H1    rule learner               54.69%
H2    support vector machine     50.00%
H3    C4.5 decision tree         59.34%
H4    logistic regression        56.16%
H5    rule learner               56.80%

The resulting confusion matrices clearly indicate that the moving-waiting and waiting-moving labels are not reliably distinguished. For example, Table 3 shows the confusion matrix for the best result learning four categories from the balanced subset data for trainer H3 (corresponding to the third row in Table 2). Clearly, moving-waiting and waiting-moving are confused with each other more often than either is classified correctly. Inspecting the state data reveals that the conditions under which the human operator chooses to suspend a robot's movement because it is about to collide with another robot are almost identical to the conditions under which the operator determines that it is safe for a suspended robot to begin moving again.

Table 3: Confusion matrix for the best result learning four classes from the balanced subset data for H3. This training set did not contain any instances of waiting-waiting because H3 never invoked that state.
classified as →     moving-moving   moving-waiting   waiting-moving
moving-moving       38              4                3
moving-waiting      5               13               5
waiting-moving      5               15               3

Next, we describe our efforts to learn which states, represented as in equation (1), belong to which of the two-category labels. The two categories distinguish between the states in which the human performs an action and those in which the human does nothing. The moving-moving and waiting-waiting instances were relabeled as no-action instances, and the moving-waiting and waiting-moving instances were relabeled as action instances. We ran each classifier again, on the balanced subset data, this time attempting to learn the two-category labels. Table 4 shows the best results, which represent a considerable improvement over Table 2.

Table 4: The best stratified 10-fold cross validation results, learning 2 classes from the balanced subset data.
      classifier              correctly classified
H1    k-nearest neighbor      90.63%
H2    logit boost             76.67%
H3    logit boost             87.91%
H4    AdaBoost                82.86%
H5    k-nearest neighbor      87.10%

Table 5 shows the confusion matrix for the best of these classifiers, on the balanced subset data for trainer H1. Again, this is a substantial improvement over the results shown in Table 3. Our current work is investigating other techniques to address the imbalanced data, in order to be able to use more of the data for training.

Table 5: Confusion matrix for the best result learning 2 classes from the balanced subset data (H1).
classified as →     no-action   action
no-action           30          2
action              4           28

Distinguishing between trainers

To address the question of whether the data could be used to differentiate between trainers, we examine the decision trees produced by the C4.5 classifier. Although the operator-guided collision avoidance task appears relatively simple and straightforward, distinct differences between operators' behaviors are detected. Figure 5 shows the C4.5 trees learned for the two most accurately modeled trainers. It is interesting to note that the decision trees for different trainers depend on different aspects of the robot's state. The decisions of H1, as captured by the decision tree, depend on the direction in which the robot is heading (headingY (Hy)), its speed (velX (Vx) and velY (Vy)), and the directions in which the other two robots are heading (theta1 (θ1) and theta2 (θ2)). In contrast, the decisions of H3 depend on the distance from one of the other robots (range2 (r2)), the robot's heading (headingX (Hx) and headingY (Hy)), its speed (velX (Vx)), and the direction of the third robot (theta1 (θ1)), not the same robot as the one whose distance is considered. H1 appears to focus on trajectories alone, while H3 attends first to proximity.

Figure 5: Decision Trees (C4.5) for the two most accurately modeled trainers, H1 (85.94%) and H3 (86.81%). The percentage of correctly identified instances appears in parentheses.

We also analyzed the differences in behaviors identified by the decision stump results. Table 6 shows the rules obtained using this method for the five trainers. Again, it is clear that the decisions of different humans, as extracted by the decision stump algorithm, are influenced by different components of the robot's state.

Table 6: Decision Stump rules for all five trainers, with the percentage of correctly classified instances in parentheses.
H1 (68.75%): act if you are heading not too far to the south
H2 (66.67%): act if you are going quickly west
H3 (71.43%): act if you are closer than 77cm to one of the other robots
H4 (75.00%): act if you are heading not too far to the east
H5 (78.23%): act if you are closer than 124cm to one of the other robots

While these observations are preliminary, and we have not yet collected enough data from human subjects to produce definitive models of particular individuals' behaviors, these results do tell us that our direction is promising. Current work is focused on programming the decision trees and decision stump rules into the robot controllers in order to evaluate how well the system performs when following the strategies modeled on different human subjects.
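For readers who wish to reproduce this style of analysis outside WEKA, the following sketch uses scikit-learn purely as a stand-in: it runs stratified 10-fold cross-validation for a one-level decision stump and a depth-limited decision tree (rough analogues of the DecisionStump and C4.5 classifiers used above) on one trainer's balanced two-class data, and prints the induced rules. The loader function is hypothetical; the feature names follow equation (1).

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

FEATURES = ['range1', 'theta1', 'range2', 'theta2',
            'velX', 'velY', 'headingX', 'headingY']

# X: list of 8-element state vectors (equation 1); y: 'action' / 'no-action'.
# load_balanced_subset() is a hypothetical loader for one trainer's data.
X, y = load_balanced_subset('H1')

for name, model in [('stump', DecisionTreeClassifier(max_depth=1)),
                    ('tree', DecisionTreeClassifier(max_depth=4))]:
    scores = cross_val_score(model, X, y, cv=10)       # stratified 10-fold CV
    print(name, 'accuracy: %.2f%%' % (100 * scores.mean()))
    model.fit(X, y)
    print(export_text(model, feature_names=FEATURES))  # human-readable rules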
Comparison to the fixed-threshold method for collision avoidance

To address the question of how the collision avoidance decisions made by human operators compare to our fixed-threshold method, we analyzed the distances between robots when the trainers suspended and resumed the robots' motion. Our intuition and initial hypothesis was that the human operators would allow robots to get closer to each other than the fixed-threshold method would. As a result, the amount of time that a robot spent waiting for others to move out of the way would decrease. The fixed-threshold method used a distance of 50cm, with results reported in (Sklar et al. 2012).

Interestingly, the human operators who made decisions based on distances between robots (i.e., either r1 or r2) acted on distances greater than 50cm: 77cm and 124cm in the decision stump rules for H3 and H5, respectively; and 76cm, 61cm and 75cm in the decision trees for H3, H4 and H5, respectively. The other trainers, however, did not consider the distance to other robots at all. Instead, they examined other factors, such as the directions in which the robots were heading. In the decision trees, where more complex decision rules can be coded, even those trainers who did consider distance also considered the directions in which the robots were heading. So our preliminary conclusion is that the fixed-threshold method does not take into consideration all of the important factors in collision avoidance.

Our current work is analyzing the delay time for all the experiments described here, to identify whether the humans' performance on this metric is better than that of the system when using the fixed-threshold method for collision avoidance.
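A minimal sketch of the distance analysis described in this section: for every logged "wait" event, compute the distance from the suspended robot to its nearest teammate and compare it with the 50cm baseline threshold. The event record structure is an assumption for illustration.

import math

FIXED_THRESHOLD_CM = 50.0

def nearest_teammate_cm(event):
    # Distance from the suspended robot to its closest teammate at the moment
    # a "wait" command was logged. `event` is a hypothetical record holding
    # 'subject_pos' = (x, y) and 'other_pos' = list of teammate (x, y) positions.
    sx, sy = event['subject_pos']
    return min(math.hypot(ox - sx, oy - sy) for ox, oy in event['other_pos'])

def summarize_wait_distances(wait_events):
    # Report how often the human issued "wait" at a range above the 50cm baseline.
    dists = [nearest_teammate_cm(e) for e in wait_events]
    if not dists:
        return
    above = sum(d > FIXED_THRESHOLD_CM for d in dists)
    print('mean wait distance: %.1f cm' % (sum(dists) / len(dists)))
    print('waits issued beyond %.0f cm: %d of %d'
          % (FIXED_THRESHOLD_CM, above, len(dists)))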
Summary

This paper presents preliminary results from experiments in which a human guides a team of robots to avoid collisions. We recorded data from human operators who were instructed to help members of a robot team avoid collisions with one another. Operators could suspend or resume the robots' movement. Based on data recorded while the human subjects were engaged in this task, we trained different classifiers to predict when the robots should change state (toggling from moving to waiting and from waiting to moving). The best-fitting classifier achieved 90% accuracy. A more rigorous evaluation will use the learned classifier rules to control the robots autonomously and measure their ability to avoid collisions. This is necessary because the fact that we can learn a classifier that performs well on the data we logged does not mean that we have learned a classifier that will be good at stopping robots from colliding. Performing this validation step on the robots is the next major activity in this investigation.

Acknowledgments

This work was supported by the National Science Foundation under grants #IIS-1117000 and #CNS-0851901.

Appendix: WEKA defaults

weka.classifiers.trees.J48 (C4.5): confidence threshold for pruning = 0.25; minimum number of instances per leaf = 2; seed for random data shuffling = 1.

weka.classifiers.lazy.IBk (k-nn): nearest neighbour search algorithm = weka.core.neighboursearch.LinearNNSearch; number of nearest neighbours (k) used in classification = 1.

weka.classifiers.rules.PART: confidence threshold for pruning = 0.25; minimum number of instances per leaf = 2; seed for random data shuffling = 1.

weka.classifiers.bayes.NaiveBayes (naïve Bayes without kernel): use normal distribution for numeric attributes.

weka.classifiers.rules.OneR (OneR): minimum number of objects in a bucket = 6.

weka.classifiers.functions.SMO (svm): kernel = weka.classifiers.functions.supportVector.PolyKernel; complexity constant C = 1; normalize; epsilon for round-off error = 1.0e-12; use training data for internal cross-validation; random number seed = 1; size of the cache = 250007; exponent = 1.0; do not use lower-order terms.

weka.classifiers.functions.Logistic (logistic regression): maximum number of iterations = -1 (until convergence).

weka.classifiers.meta.AdaBoostM1 (AdaBoost): percentage of weight mass to base training on = 100; random number seed = 1; number of iterations = 10; base classifier = weka.classifiers.trees.DecisionStump.

weka.classifiers.meta.LogitBoost (logit boost): percentage of weight mass to base training on = 100; number of folds for internal cross-validation = 0 (no cross-validation); number of runs for internal cross-validation = 1; threshold on the improvement of the likelihood = -Double.MAX_VALUE; shrinkage parameter = 1; random number seed = 1; number of iterations = 10; base classifier = weka.classifiers.trees.DecisionStump.
References

Argall, B.; Chernova, S.; Browning, B.; and Veloso, M. 2009. A survey of robot learning from demonstration. Robotics and Autonomous Systems 57(5):469–483.

Bowling, M., and Veloso, M. 2003. Simultaneous adversarial multi-robot learning. In Proceedings of the 18th International Joint Conference on Artificial Intelligence.

Cypher, A. 1991. Eager: Programming repetitive tasks by example. In Proceedings of the ACM Conference on Human Factors in Computing Systems.

Habib, M. K. 2007. Humanitarian demining: Reality and the challenge of technology. International Journal of Advanced Robotic Systems 4(2):151–172.

Hersch, M.; Guenter, F.; Calinon, S.; and Billard, A. 2008. Dynamical system modulation for robot learning via kinesthetic demonstrations. IEEE Transactions on Robotics.

Jacoff, A.; Messina, E.; and Evans, J. 2000. A standard test course for urban search and rescue robots. In Proceedings of PerMIS.

Katagami, D., and Yamada, S. 2000. Interactive classifier system for real robot learning. In IEEE International Workshop on Robot and Human Interaction, 258–263.

Lockerd, A., and Breazeal, C. 2004. Tutelage and socially guided robot learning. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems.

Maes, P. 1994. Agents that reduce work and information overload. Communications of the ACM 37(7):31–40.

Matarić, M. 1997. Reinforcement learning in the multi-robot domain. Autonomous Robots 4:73–83.

Murphy, R. R.; Casper, J.; and Micire, M. 2001. Potential tasks and research issues for mobile robots in RoboCup Rescue. In Robot Soccer World Cup IV, volume 2019 of Lecture Notes in Artificial Intelligence. Springer.

Nicolescu, M. N., and Matarić, M. J. 2001. Learning and interacting in human-robot domains. IEEE Transactions on Systems, Man, and Cybernetics 31(5):419–430.

Parker, L. 2000. Multi-robot learning in a cooperative observation task. In Proceedings of the Fifth International Symposium on Distributed Autonomous Robotic Systems.

Pugh, J., and Martinoli, A. 2006. Multi-robot learning with particle swarm optimization. In Proceedings of the 5th International Conference on Autonomous Agents and Multiagent Systems.

Santana, P. F.; Barata, J.; and Correia, L. 2007. Sustainable robots for humanitarian demining. International Journal of Advanced Robotic Systems 4(2):207–218.

Sklar, E. I., and Icke, I. 2009. Using simulation to evaluate data-driven agents. In Multi-agent Based Simulation IX, volume 5269 of Lecture Notes in Artificial Intelligence. Springer-Verlag.

Sklar, E. I.; Salvit, J.; Camacho, C.; Liu, W.; and Andrewlevich, V. 2007. An agent-based methodology for analyzing and visualizing educational assessment data. In Proceedings of the Sixth International Conference on Autonomous Agents and Multiagent Systems.

Sklar, E. I.; Özgelen, A. T.; Muñoz, J. P.; Gonzalez, J.; Manashirov, M.; Epstein, S. L.; and Parsons, S. 2011. Designing the HRTeam framework: Lessons learned from a rough-'n-ready human/multi-robot team. In Proceedings of the Workshop on Autonomous Robots and Multirobot Systems.

Sklar, E. I.; Özgelen, A. T.; Schneider, E.; Costantino, M.; Muñoz, J. P.; Epstein, S. L.; and Parsons, S. 2012. On transfer from multiagent to multi-robot systems. In Proceedings of the Workshop on Autonomous Robots and Multirobot Systems.

Sklar, E. I.; Blair, A. D.; and Pollack, J. B. 2001. Training intelligent agents using human data collected on the internet. In Agent Engineering. Singapore: World Scientific. 201–226.

Sklar, E. I. 2000. CEL: A Framework for Enabling an Internet Learning Community. Ph.D. Dissertation, Department of Computer Science, Brandeis University.

Teitelman, W. 1979. A display oriented programmer's assistant. International Journal of Man-Machine Studies 11:157–187.

Witten, I. H.; Frank, E.; and Hall, M. A. 2011. Data Mining: Practical Machine Learning Tools and Techniques. Elsevier, third edition.