From: AAAI Technical Report FS-93-03. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved.

What Good Is Your Vacuuming Robot's Intelligence?

R. Peter Bonasso
Space Systems Division
The MITRE Corporation
7525 Colshire Drive
McLean, Virginia 22102
pbonasso@mitre.org

Introduction

Robot control is about choosing actions over time to carry out a task. How does one measure the efficacy of software used for robot control? If we are to realize an instantiated real-world agent, e.g., a vacuuming robot, we need to understand the contribution of the control software, particularly if we are to compare one method versus another.

The question is complex, because controlling robots is complex. To a certain extent, the ability or inability of the robot to carry out a task in a natural environment is dependent on the quality of the sensors and actuators and the rate of change of the environment. With perfect sensors and perfect actuators, the problem is to find the mapping between states and actions which accomplishes the goal. This problem -- the AI planning problem -- is known to be theoretically intractable [Chapman 87]. But let's say the mapping can be found for our vacuuming task, i.e., we have pruned the states and actions to those we really care about, and the mapping can be discovered in a reasonable amount of time. Then if the robot receives perfect state information and can perform all actions, but the time of its sense-act cycle is longer than that of the environmental change, the robot will ultimately fail. Let's assume this is not the case, and that we have found a mapping that has a response time well within that of the environment.

The problem would then be solved, except that this solution (finding the mapping) is based on the environment being predictable enough that the actions carried out will result in persistent changes in the environment. Else, the robot trying to stack Block B on Block C and Block A on Block B will never finish its task if Block B keeps getting moved by a mischievous agent. So a good deal of research into intelligent robot control centers on re-planning (finding a new mapping) based on sensed changes in the environment that do not conform to the model of the environment embodied in the mapping of states to actions.

But the problem is even tougher: the sensors and actuators are limited in the states they can sense and the actions that can be taken. If the sensors fall short of being perfect, some combination of computation, sensing and action (purposive sensing) needs to be carried out to approximate or infer the missing states from past states and a priori knowledge. The situation is the same with actuators: if they can't perform the perfect actions needed for the task, some combination of computation and action can approximate the needed actions.

So the state of robot control is as follows:

• We cannot accurately measure every state that is of interest for the task.

• We cannot execute perfectly every action required for the task. (A corollary to the above is that while we can endeavor to improve the sensing and actuation, there will always be environments and tasks which will defeat them.)

• Computation and robot motion are needed to overcome sensing and actuator limitations.

• We must limit the computations of at least the most critical sense-decide-act cycle (e.g., obstacle avoidance) in order to stay within the frequency of variability of the environment (e.g., the fastest moving object).

• We cannot find even a "satisficing" mapping of states to actions a priori (at compile time) to carry out a task since the environment is not completely predictable.

Despite these problems, we are seeing an increase in the number of examples of robots carrying out tasks reliably in field environments. There are a variety of reasons for this increase in competent robots. In some cases (e.g., [Dickmanns 86]), advantage is taken of a single reliably predictable part of the environment (the curvature of Autobahn highways). In others, the environment is partially engineered (placing of barcodes) in the vicinity of the task execution (the 1992 AAAI robot competition [Bonasso & Dean 92]). In all cases, there is a reliance on a near-continuous sensing of the environment along with the use of software processes that can be executed in parallel, and whose outputs to the actuators are selected with a prioritization (e.g., [Brooks 86], [Bonasso 92]). And it is clear that more than any advances in sensors or actuators, it is the software which makes these robots succeed.

More recently, there has been an emphasis in integrating the more traditional AI hallmarks of deliberative planning to what have been mostly successful reactive systems (e.g., [Bonasso 91], [Gat 91], [Connell 92], [McDermott 92], [Slack 92]). These efforts involve systems which have rapidly executing skills (chunks of sensing, computation and acting) which succeed in a wide variation of environments, asynchronous deliberative planning which will find alternate mappings of states to actions with partial orderings, and some mechanism by which to transform the discrete event reasoning of the planner to the continuous activity of the skills.

But will any of this make a difference in the performance of our vacuum cleaning robot? What is the value of adding deliberative planning, or eventually, learning and perhaps natural language capabilities to reactive robots? A reasonable conjecture is that robots which remember and can reason and carry on a discourse about their actions and the results of those actions over time will exhibit more intelligent behavior than purely reactive robots, which, while robust, may not be efficient when trying to achieve a goal (e.g., the random walk of Scarecrow [Bonasso & Dean 92]).
We have been developing a methodology for measuring, understanding, and thus predicting the contribution of the software in endowing a robot with task achieving capabilities in dynamic environments. This paper outlines that methodology with an example from the household vacuuming domain.

The Methodology

An insight gained at a recent AAAI symposium on mobile robots was that the scientific method, so prevalent in the physical sciences, is perhaps not appropriate for the development of intelligent robots [Kuipers 92]. This is because in the physical sciences the emphasis is on theorizing about the nature of an existing phenomenon, whereas in robotics we are actually creating the phenomenon. In our methodology there is room for hypotheses and tests once the phenomenon is functioning in a task environment, particularly when the behavior exhibited is unexpected based on preliminary analysis. Another insight gained in other discussions is that we are attempting to make statements about the behavior of a complex system, and thus we are limited in what we can say about the effects of the individual parts of the system as they contribute to the overall behavior.

Since we are evaluating complex phenomena and their interaction with complex environments, our methodology espouses an engineering philosophy, combining quantitative and qualitative analysis, the use of observational data, and controlled experimentation. It also relies on a detailed description of the task and the variability of the environment within which the task is to be carried out. Most importantly, in describing the methodology, we focus on measuring the value-added of a given piece of software to a given robot hardware suite. In the description that follows, a mythical robot (Cinderella) is to vacuum the floors of a one-story house.

Our methodology has ten steps:

1. Define the task within the context of the specific environment.
2. Define/describe the variability of the environment in the aspects that are critical to the task.
3. Define/describe the limits on the robot sensors and actuators.
4. Specify the extent to which the environment may be engineered.
5. State the specific robot capability one wishes to measure.
6. Define the measures by which "success" will be evaluated.
7. Determine through off-line analysis, to the extent possible, that it is feasible for the proposed system to accomplish the described task under the described environmental conditions.
8. Structure test cases tailored to capturing the data necessary for proving or disproving the claimed capability in step 5.
9. Run the tests, gathering the data necessary for analysis.
10. Analyze the results with respect to the robot hardware and the claims made for the software.

(The last two steps are iterated, concentrating on the areas in which the robot is having problems.)

Step five is the step which separates this methodology from simply a specification writing and acquisition testing procedure. We are concerned with measuring the contribution of the parts of the robot software system to the overall system behavior. Which parts we are interested in evaluating will dictate the measures we must use, the tests to be run, and the types of analysis to be performed. For example, if we are interested in determining whether algorithm A for determining the location of a barcode is better than algorithm B, we might run tests concerned with accuracy of barcode location (testing the individual skill), and tests of a full system task (find and visit) to understand whether the new algorithm's computation time slows or speeds up the robot's overall performance. If, however, we wish to understand the value added of a control system which uses long-term memory versus a reactive system without memory (as will be the case in the detailed example to follow), we may not be concerned at all about individual skill tests, but more with tests of speed, efficiency, and improved performance over time.

Define the Task (1)

Here one defines the minimum essential requirements for the task. Aesthetically pleasing performance may or may not be a requirement.

• Define the stepwise execution of the task, expected frequency and required constraints. For example, Cinderella is expected to keep the floors in the two bedrooms, the living room, dining room and den vacuumed at all times, but is not to vacuum at night or when there is anyone present in a given room.

• Define the limits of the environment, materials, and environmental conditions. A floor plan could be provided which would include the kind of lighting, the material of the furniture, etc. Perhaps a better approach would be to invite the potential designers over to "experience" the house.

Define/describe the Variability of the Environment (2)

Here it is most important to describe the "nominal" conditions under which the robot is required to operate.

• Climate, lighting, weather, e.g., indoor lighting will prevail except during the hours of 9 pm - 6 am, when many or all lights will be off.

• Number, type, speed, density, frequency of movement of objects. The frequency of movement of the furniture, the number of people who normally occupy the house and how fast they will be expected to move, etc., could be detailed, but "experiencing" a typical household environment (as in step 1) would be more practical (we're talking about Everyfamily here, not a government contract).

• Number, type, speed, activity of humans. The number and ages of the household occupants and a range of their activities could be specified. This is important since many of the measures of "goodness" for intelligent behavior hinge on interaction with humans.

Define the Limits on the Robot System (3)

• Size and speed of platform. We may allow, for example, a three foot robot that moves at about 1 foot per second (fps) when humans are present or 3 fps when no one is home.

• Sensor limitations. We probably don't want laser scanners used in a household situation.

• End-effector limitations. A typical requirement might read: The robot shall not use an end-effector whose operation will prove harmful to humans or to furniture or kitchen appliances.

• Power; e.g., only available wall outlets can be used for power.

• Autonomy. Although some tasks might be allowed to be done with a tether, we'd want Cinderella not to tangle humans or furniture up, so we might allow no more than a wireless tether.

Specify the Allowable Environmental Engineering (4)

The question here is how much do you want your house altered? If we are going to push for a high degree of intelligence, no altering should be allowed. But one might get a robot to do the job in this century if a few visual cues were allowed. However, putting baffles around tables because of inadequate proximity sensors is probably unacceptable.

State the Specific Robot Capability to be Measured (5)

In this step we are focusing on what we expect the robot to do from the standpoint of intelligent behavior. We aren't really that interested in the area of the floor that was covered or the quality of the cleaning, but how intelligently the job was done. In step six we need to carefully define the measures of goodness of that intelligence. An example might be that we want to determine the value added by the addition of long term memory to a reactive system. In essence, the robot will carry out the task with and without a map of information obtained in previous runs. We might hypothesize that we should see an improvement in the time it takes the robot to vacuum all the floors when the robot has a memory of where everything was the last time.

Define the Measures of Success (6)

Carl Friedlander [Friedlander 92] has suggested that there are behavior measures, inspection measures, and reduced functionality measures which could be brought to bear in evaluating robot performance. Behavior measures relate to the observed behavior of the total system as reflected by the software logic. Inspection measures are direct measurements of the outputs of the software module of concern. Reduced functionality measures are the behavior and inspection measures applied when parts of the software are removed or disabled. Of course the reverse of reduced functionality is improved functionality, the latter designed to measure a value added, and the former being perhaps more oriented to which parts of the system are the most critical.

We believe AI tests concern improved functionality and the measures are essentially all behavioral.

• Task completed. With regard to the addition of long term memory and our expectations of value added, we would add to completing the task the following: The less average time the robot spends doing the floors the better. The time to complete a single round of vacuuming should increase only in proportion to the number of changes in the environment since the previous round.

A purely reactive system might relocate and home in on each room on each round, thus ostensibly taking more time than a system which "remembered" the floor plan. The system with memory would move as directly as possible to the vicinity of the known previous location of each room before homing in on the door of the room.

It is important to note here the interplay of the environment and the allowable amount of environmental engineering in the expectations of outcome. If the environment was just a large great-room (no interior walls), or if markers were allowed to be placed on the rooms in such a way as to allow viewing by a long range sensor, the expected outcome would not be clear. A reactive system with a long range sensor might perform as well as the system which memorized the floor plan. As well, if the furniture was rearranged often (as by a wild bunch of ankle biters), knowing the floor plan might not be as important as maneuvering among obstacles.

• Safety. Part of intelligent behavior is cognizant failure [Gat 91]. Safety is not just a matter of halting or shutting down in light of an unsafe situation; it also involves some kind of "unwind-protect" on the current activity as well as user notification.

Off-line Analysis (7)

Somewhere in the process, the details of Cinderella's hardware and software need to be presented in order to conduct some analysis (computations and logical reasoning) prior to actual on-line tests. It can be augmented with a simulation of the gross system performance to get a sense of where the shortcomings might be. This analysis would include following through the logic of the software, comparing the sensor and end-effector limitations to the environmental conditions under which the robot must function, and simply matching claimed capabilities to the desired capabilities listed in the previous steps.

Structure Test Cases to Prove or Disprove the Claimed Capability (8)

For testing individual robot skills, a set of runs for each skill could be specified which are in the middle and extremes of each variable's range for those variables which can be easily controlled by the testers (lighting, number and separation of obstacles). In general, this is the method currently in use to debug skills. If the robot performs properly in these tests and if time permits, further tests can be conducted, for example to determine the limits of these skills beyond that stated for the tasks at hand.

But for examining overall system behavior, there are usually too many variables to practically control in a field setting. A more abstract level of runs will most likely be more useful in pointing out system problems which, in turn, would suggest what more detailed controlled runs are necessary. For example: for one week of workdays, we might run Cinderella twice a day, once in the morning and once in the afternoon at times the robot will typically be expected to do the task. We would analyze the results and if there are regularly occurring errors, we would return to this step and design subtask tests. So for example, the robot may have no trouble getting to the rooms, but doesn't always get through the door without catching the door frame. This could suggest a problem with the obstacle avoidance routine or an odometry problem. A set of test cases could then be designed involving running of the navigation system as Cinderella went from outside to inside a room.

Assuming the robot is generally successful in the runs described above, a second set of runs will involve taking observations of the robot's performance over a specified period of time during which there are more complex environmental conditions that cannot be predicted, easily controlled, or which would be too costly to examine (the presence of humans carrying out daily activities, climate, e.g., creating a humidity problem indoors during the winter season). In the example, Cinderella would perhaps be required to carry out the vacuuming for three weeks during the fall, winter, spring and summer seasons.

For the purposes of evaluating portions of the software architecture, the above runs would be repeated with as identical as possible conditions using different versions of the software.

Run the Robot Capturing the Data Necessary for Further Analysis (9)

The "data necessary" is usually a logging of the robot inputs, outputs, and configuration (position and orientation) at each cycle of a given run. This is the data to be used to inspect the outputs of the parts of the software system with which we are concerned. It is also used to debug software fixes; we can get an initial idea of how the robot will perform by running the new software on this data.

For detailed testing of individual skills, physical measurements must be taken such as the position of objects in the area of interest. An independent locating system for tracking the robot's global position may also be in order. But for answering questions of improved functionality via system behavior tests, a stopwatch and a set of observations is all that is initially necessary.

Analyze the Results (10)

Here we must remember that we are comparing/contrasting configurations of software to understand what advantage one configuration has over another. The temptation must be avoided to understand why a robot failed a given test run. In other words, don't dilute the measurements required by adding additional measurements that will not answer the test criteria.

For instance, perhaps in analyzing the data collected we found that Cinderella with memory vacuumed the first floor of the house on average faster than without memory except when the number of people present in the house exceeded a certain value. Now we move into a hypothesize and test mode, perhaps hypothesizing that the additional information from the map becomes less useful when any path from one point to another in the room is not very straight. We can then start from step five and repeat the process.

Often in the analysis phase we might be trying to find out why the robot, while not failing, did something that was not expected based on the testers' understanding of the software and hardware specifications of the system (Step 7). For example, maybe Cinderella didn't seem to perform any better with the map than without it. To conclude that memory is not an essential part of intelligent vacuuming may be premature. At this point, the users of this methodology would move to step 5 with a new hypothesis about the phenomenon in question, devising new measures and tests to verify or refute the hypothesis.

Summary

We have presented a methodology for determining the value added of a software module in the control system of a robot. This methodology stresses off-line analysis, test case observations and post test analysis based on the logic of the software design. Because of the complexity of determining why a robot system performs better or worse when an "intelligent" component is added or subtracted, this methodology also stresses examining the overall system behavior at the outset. In this manner, there is a good chance of seeing a clear improvement or non-improvement in system performance without requiring expensive instrumentation and empirical data acquisition.

References

[Bonasso 91] R. P. Bonasso. Integrating Reaction Plans and Layered Competences Through Synchronous Control, in Proceedings of the 12th International Joint Conference on Artificial Intelligence. Sydney, Australia. Morgan Kaufmann. 1991.

[Bonasso 92] Bonasso, R. P. Using Parallel Program Specifications For Reactive Control of Underwater Vehicles, in Journal of Applied Intelligence, June 1992.

[Bonasso & Dean 92] A Review of the First AAAI Robotics Competition. AAAI Proceedings of the Fall Symposium on Real World Robots, October 1992.

[Brooks 86] Rodney A. Brooks. A Robust Layered Control System for a Mobile Robot. IEEE Journal of Robotics and Automation, RA-2:14-23, April 1986.

[Chapman 87] D. Chapman. Planning for conjunctive goals. AI Vol. 32, No. 3. July 1987. Elsevier Science Publishers.

[Connell 92] Connell, Jonathan. 1992. SSS: A hybrid architecture applied to robot navigation, in Proceedings of the IEEE International Conference on Robotics and Automation, April.

[Dickmanns 86] E. D. Dickmanns and A. Zapp. A curvature-based scheme for improving road vehicle guidance by computer vision. In Mobile Robotics, SPIE Proc. Vol. 727, Cambridge, MA, 1986, pp. 161-168.

[Friedlander 92] Friedlander, Carl. Position paper on robot control metrics. DARPA UGV Workshop, Winter 1992.

[Gat 91] Gat, Erann. 1991. Taking the Second Left: Reliable Goal-Directed Reactive Control for Real-World Autonomous Robots, PhD Dissertation, VPI.

[Kuipers 92] Kuipers, Ben. Comments during a presentation of the AAAI Fall Symposium on Real World Robots, October 1992.

[McDermott 92] McDermott, Drew. Transformational Planning of Reactive Behavior. YALEU/CSD/RR #941. Dec 1992.

[Slack 92] Slack, Marc G. Sequencing Formally Defined Reactions for Robotic Activity: Integrating RAPS and GAPPS. Proceedings of SPIE OE/Technology conference on Sensor Fusion, Boston, November 1992.
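
The "task completed" measure of step 6 (the time for a round should increase only in proportion to the number of changes in the environment since the previous round) and the with/without-memory comparison of step 10 could be checked from stopwatch observations roughly as follows. This is a minimal sketch under stated assumptions, not part of the methodology as published: the record layout (Round), the helper names (fit_line, value_added), and every number in the sample data are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Round:
    """One observed vacuuming round (hypothetical record layout)."""
    minutes: float  # stopwatch time for the whole round
    changes: int    # environment changes since the previous round


def fit_line(rounds):
    """Least-squares fit of minutes = base + per_change * changes."""
    xs = [r.changes for r in rounds]
    ys = [r.minutes for r in rounds]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need rounds with differing numbers of changes")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    per_change = sxy / sxx
    base = y_bar - per_change * x_bar
    return base, per_change


def value_added(with_memory, without_memory):
    """Mean minutes saved per round by the configuration with memory."""
    return (mean(r.minutes for r in without_memory)
            - mean(r.minutes for r in with_memory))


# Invented observations, for illustration only.
with_memory = [Round(30, 0), Round(32, 1), Round(34, 2), Round(38, 4)]
without_memory = [Round(45, 0), Round(46, 1), Round(44, 2), Round(47, 4)]

base, per_change = fit_line(with_memory)
saved = value_added(with_memory, without_memory)
print(f"with memory: {base:.1f} min + {per_change:.1f} min per change")
print(f"mean time saved per round: {saved:.1f} min")
```

A fitted line that describes the with-memory rounds well would be consistent with the proportionality expectation, while large residuals would be a cue to return to step five with a new hypothesis. In keeping with step 10, the sketch compares configurations and does not try to explain any single failed run.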