When Being Reactive Just Won't Do

From: AAAI Technical Report SS-95-04. Compilation copyright © 1995, AAAI (www.aaai.org). All rights reserved.

Sharon Wood
School of Cognitive & Computing Sciences
University of Sussex, Brighton, BN1 9QH, U.K.
email: sharonw@cogs.susx.ac.uk

Abstract

Reactive systems are, generally, highly successful for dynamic, uncertain domains. Analysing why this should be so indicates the main criterion for success is that information apprehended about a given situation can render it certain enough to reliably inform action. A case is made for informing action by dynamic world modelling for domains where this does not hold. When the issue of attending selectively to situational information is introduced, the case for dynamic world modelling is enhanced. A system incorporating selective perception which dynamically models its domain is described.

Introduction

We are familiar with the idea that dynamic domains pose problems for deliberative planning. The reasons are well documented (cf Georgeff, 1987); to briefly summarise: deliberative planners are generally unable to deliver a timely response for domains where events outpace the planning process; they are generally unable to respond to events occurring during the planning process; and they are generally unable to handle the urgency with which problems must be addressed. To date, the solutions proposed to overcome these problems have generally resided in reactive systems. The main advantage these systems offer is that they can deliver a timely response to events; they have no planning process to interrupt, so they are able both to tackle each event as it arises and to do so with the requisite degree of urgency. Where some element of deliberative planning is preferred, we have seen the emergence of hybrid systems.
Often these are multi-layered systems comprising a reactive level and a deliberative level, sometimes more, with the capability to select between the solutions available from each level in order to deliver a timely solution to the current situation (cf Ferguson, 1992; Georgeff & Lansky, 1987; Ogasawara, 1993; Sanborn & Hendler, 1988; 1990). Wherever a real-time responsive capability is required, reactive systems, whether incorporated within a hybrid system or not, have generally appeared to offer a solution. If we analyse why this might be the case, I believe there are some interesting insights to be gained.

The Nature of Uncertainty

The analysis I wish to make involves, initially, moving away from a focus on the need to be reactive or fast. However, ultimately, I shall return to this issue as it is pertinent to the analysis. Instead the analysis rests on the relationship of reactivity to uncertainty in the execution environment.

The main problem posed by a deliberative approach to planning in a dynamic domain is that planning takes time. However, time is only a problem in so far as we are uncertain about key features of the execution environment which might inform our planning. In a perfectly known universe, plans may be laid in good time and executed at the apposite moment. This may be stating the obvious, but stating it allows us to make some important observations. The first is that until information about a given situation is apprehended, it is not possible to specify an appropriate course of action (or reaction). The second is that once information about a given situation is apprehended, it is certain enough to inform action. This second observation is crucial to the analysis of why reactive systems are successful for certain classes of problems or domains. (Re)action selection depends upon identifying characteristics of a given situation that indicate that a given response is appropriate.
This is usually 'hard-coded' as a rule whose antecedent describes situational conditions and whose consequent the (re)action(s) to take. However, a response will be appropriate only if it interacts successfully with the situation as it unfolds. In order for this to be the case, the situational conditions which trigger a rule must, to some extent, predict events which will ensue in that situation. There are two ways in which an action may be unsuccessful. The first is through a failure of execution. For example, you turn the handle of a door and pull but nothing happens; or it opens then immediately swings shut again. Herein lies the power of reactive systems: they simply take the current outcome as a new set of conditions which will trigger the appropriate response.¹ The second is that the response made can be described as the wrong response.² That is, some other response should have been made. It is a moot point whether this is, in fact, any different from the kind of failure described above; however, I think it helps to consider that it is. This is because, with the best will in the world, failures of execution will always occur and their consequences borne. Whereas failure to produce a correct response indicates a failure in the response selection process. This in turn points to a failure to correctly identify or evaluate the situational conditions which led to its invocation; an evaluation which depends upon correctly anticipating situational outcomes on the basis of current observations.

¹Herein also lies the argument that only a reactive approach will suffice for this kind of problem.
²Of course, there are degrees of 'wrongness', so the response might be just sub-optimal (compared to, say, a more deliberated response). However, this is not the issue addressed here.
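The condition-action scheme just described can be sketched minimally as follows. The rules, predicates, and the door example are illustrative assumptions (echoing the door-handle example above), not part of any particular reactive system:

```python
# A minimal sketch of reactive, rule-based (re)action selection.
# The rules and the situation representation are hypothetical illustrations.

def door_closed(situation): return not situation["open"]
def door_open(situation): return situation["open"]

# Each rule pairs an antecedent over the observed situation
# with a consequent action.
RULES = [
    (door_closed, "pull_handle"),
    (door_open,   "walk_through"),
]

def select_action(situation):
    """Return the action of the first rule whose antecedent matches."""
    for antecedent, action in RULES:
        if antecedent(situation):
            return action
    return None  # no rule applies

def run(situation, apply_action, steps=3):
    """Reactive execution: the outcome of each action simply becomes
    the new set of conditions that triggers the next response."""
    trace = []
    for _ in range(steps):
        action = select_action(situation)
        trace.append(action)
        situation = apply_action(situation, action)
    return trace
```

Note how a failure of execution (the door swinging shut again) needs no special handling: the resulting situation simply re-triggers the matching rule on the next cycle.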
The implication is that reactive systems can be successful, other than by chance, for uncertain, dynamic domains only if the information apprehended about a given situation can render predictions about the outcome to that situation certain enough to reliably inform action. Without the ability to predict, and hence resolve the uncertainty concerning the outcome to a situation as it unfolds, on the basis of current observations, action selection cannot be reliably performed.

The extent to which this need concern us will depend upon the characteristics of the domain with which the system must interact. As pointed out earlier, if the choice of action is incorrect and the system unsuccessful, then the reactive approach can overcome this by evaluating the new situation that happens to turn out and selecting a response as before, assuming that eventually the response made will be a good one. That is, it may matter more or less whether or not the system always makes the correct response if incorrect ones can be put right with little consequence.

However, there is another issue which is separate from our concern with how often the system makes the right or wrong response; although it will affect how often the system does get it wrong or right. This is the extent to which the outcomes to situations are inherently predictable when based upon the observable characteristics of those situations. If we are unable to correctly anticipate the outcome to a situation on this basis, then we require a means of supplementing the implicit prediction given us by situation-action rules. What we need to supplement it with is the ability to reason about the accuracy of the predictions on which we are to base action: to detect when naive prediction is inaccurate; to identify why the situation is unfolding in some other way; and to use that information to enhance the accuracy of subsequent predictions about the continuing development of the situation. At this point, I would like to return to the issue of the need for responses to be fast.
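A concrete sense of "detecting when naive prediction is inaccurate" can be given by a small monitor that compares a naive projection against subsequent observations. The linear extrapolation, the tolerance, and the sample observations are illustrative assumptions; only the idea of flagging deviation from expectation comes from the argument above:

```python
# Sketch: reasoning about the accuracy of a naive prediction.
# A naive projection extrapolates an observed quantity linearly;
# the monitor reports the time steps where observation departs from it.

def naive_projection(value, rate, steps):
    """Extrapolate a quantity (e.g. distance along a road) at a constant rate."""
    return [value + rate * t for t in range(1, steps + 1)]

def deviations(projected, observed, tolerance):
    """Return the time steps at which observation departs from prediction
    by more than the given tolerance."""
    return [t for t, (p, o) in enumerate(zip(projected, observed))
            if abs(p - o) > tolerance]

# A vehicle observed at 40.0 m, extrapolated at 1.25 m per step:
predicted = naive_projection(40.0, 1.25, 4)   # 41.25, 42.5, 43.75, 45.0
observed  = [41.25, 42.5, 43.0, 43.2]         # (assumed) the vehicle is slowing
late = deviations(predicted, observed, tolerance=0.5)
```

Once a deviation is flagged, the questions of why the situation is unfolding differently, and what to predict next, are exactly what the plan recognition machinery described later addresses.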
Discussion about action usually focusses on behaviour which directly achieves goals by bringing about changes in the world; however, actions for acquiring information also need to be carried out. In situations where there is continual change of an uncertain nature, information must be acquired before the need to act arises, so there is little time available. In addition, the opportunity for acquiring some kinds of information will be constrained to a limited time window. Not only, then, is the information to be apprehended continually changing and of an uncertain nature, it is not possible under all circumstances to apprehend all the information available. Whether by circumstance or design - that is, be it haphazard or informed - the apprehension of information must be selective. As a result action and prediction must be based on only a limited subset of the available information. This diminishes the inherent predictive power of the observations made; because had we made some other (possible) observations we might have made other overriding predictions. I have worded this statement this way because I believe it strengthens the case against a simple reactive rule-based approach as being appropriate for all classes of uncertain, dynamic domains. Namely, had other observations been made, a different rule might have been deemed correct. I believe it also strengthens the case for the appropriateness of a capability for reasoning about the accuracy of predictions upon which actions are to be based. Not least of all is the reason that predictions arising from more recent observations will be placed in the context of already existing expectations. Whether or not they entirely agree, posing a question over their accuracy, a more complete picture of the emerging situation will be built up than on the basis of those most recent observations alone.

Below, I describe a system, AUTODRIVE, which incorporates an initial attempt to implement the capabilities of predictive world modelling based on selective perception.
Dynamic World Modelling in AUTODRIVE

The AUTODRIVE system simulates the dynamic, uncertain domain of driving. A symbolic description of a situation and the changes that occur within it are presented as a series of 'snapshots'. Information about this world may be apprehended selectively by focussing on single domain entities over a sequence of such snapshots. The information is used to build up a dynamic world model for drivers of vehicles in the microworld. Every vehicle in the domain builds up its own unique dynamic world model based on the observations made from its own viewpoint. Of course, these viewpoints include drivers seeing each others' vehicles. The dynamic world model is a predictive model of how a situation will unfold over an eight second period. It is used to inform the actions of the vehicle driver using an integrated planning and dynamic decision-making approach which is not discussed further here but is described in detail in Wood (1993).

The Observation Mechanism

As each successive snapshot of the world is presented to a vehicle driver, data about the current situation are made accessible to an Observation Mechanism (OM). This mechanism selects data pertaining to a single object³ and passes it into a Short Term Information Store (STIS). During this selection process, the OM has access to information previously placed in the STIS. Objects which have previously been observed within a preceding interval of four seconds are ignored. The remaining observable objects are prioritised on an arbitrary preset rating scale according to class of object. This rating may be modified by requests received from the planning and decision-making processes, and processes for plan recognition, about the need to observe certain objects in particular. Once selected, an object is 'fixated' for 0.5 seconds (five time cycles in the current implementation).

³These information processing constraints do not apply to some categories of information, such as feedback to the driver on his own position, the curvature, gradient and width of the road and lane markings; that is, information that enables the driver to guide his vehicle. This decision was based on evidence that humans are able to acquire such data parafoveally, and is open to change.

The Short Term Information Store

Data describing each observed object, in a temporally qualified form, are stored in the STIS. The example below of data contained in the STIS describes a vehicle ahead of the driver, travelling at just under 30mph. Over a sequence of intervals the green car observes the red car travelling from a position 40.0 metres along the Lewes Road to 45.0 metres along the northern carriageway of the Lewes Road. The fixation start and end times are the final pair of figures in each record.

stis,[green_car,[[vehicle,ahead],car1,[red_car,[lewes_rd6,lewes_rd,brighton],40.0,n,0.5,3.0,2.0]],0.0,0.1]
stis,[green_car,[[vehicle,ahead],car1,[red_car,[lewes_rd6,lewes_rd,brighton],41.25,n,0.5,3.0,2.0]],0.1,0.2]
stis,[green_car,[[vehicle,ahead],car1,[red_car,[lewes_rd6,lewes_rd,brighton],42.50,n,0.5,3.0,2.0]],0.2,0.3]
stis,[green_car,[[vehicle,ahead],car1,[red_car,[lewes_rd6,lewes_rd,brighton],43.75,n,0.5,3.0,2.0]],0.3,0.4]
stis,[green_car,[[vehicle,ahead],car1,[red_car,[lewes_rd6,lewes_rd,brighton],45.0,n,0.5,3.0,2.0]],0.4,0.5]

The Plan Lookahead Temporal World Model

The construction of the Plan Lookahead Temporal World Model (PLTWM) constitutes AUTODRIVE's dynamic world model. It is based on the object location and vehicle descriptor data stored in the STIS. Once observed, an object's relative distance (fixed objects) or movements (vehicles) are modelled over a period of eight seconds. This model is represented by a sequence of data structures describing the object for each simulation cycle which will occur during the eight second period (that is, in this implementation eighty data structures are generated for each object) so that AUTODRIVE has access to the relevant projection descriptor for each interval during which decision-making will take place over this period. An example projection for a fixed object would look like this:

projection,[red_car,[[sign,no_entry],sign5,[54.0,61.0]],9.6,9.7]

This projects the relative distance, eg 54.0 metres and 61.0 metres, of a pair of no-entry signs from the viewpoint of the red car over a 0.1 second interval. The interval over which the object is predicted to be at the relative distance specified is given by the final pair of figures, eg 9.6 - 9.7. The PLTWM bases its projection of the relative distance of fixed objects upon the known movements of the observing driver, decreasing by precisely the distance travelled by the driver per interval. If the driver intends to change the pattern of his behaviour during the projection period, the modeller takes this into account in generating the projected sequence.

Because other agents move independently of the driver viewing them, projecting the future locations of other vehicles involves ascertaining their speed and trajectory in order to calculate the absolute distance they will travel over the projection period. It also involves predicting likely behaviour changes. This firstly requires interpretation of the observed behaviour sequence. This is segmented, if required, at points where the vehicle is observed to have either changed direction, such as when reversing, or when, say, starting off after having been stationary for a period of time. A projection is based only on that behaviour observed since one of those segmentation points. The PLTWM bases its calculation of vehicle movement upon the degree of movement which takes place over the fixation interval (0.5 seconds) segmented in this way. However, before the future movements of the vehicle can be projected, expectations about changes in behaviour in satisfying situational constraints and the personal intentions of the driver must be identified. This process is described in the next section.

The Plan Recogniser

The Plan Recogniser (PR) uses the implicit description of action contained in the STIS to invoke hypotheses about the goal(s) a vehicle might be pursuing. The PR does not reason about the actions of vehicles per se but about their deviation from expectations. The initial projections of a vehicle constitute expectations; it is upon subsequent observation that deviations from expectations may be detected and hypotheses generated regarding the personal intentions of the driver.

The current speed and trajectory of the observed vehicle, plus the expectation that it will attempt to travel at the speed limit, provides a base line for generating expectations. The current situation⁴ and the inferred personal intentions of the driver point up constraints on this behaviour. For example, a vehicle travelling below the speed limit will be expected to increase speed at a reasonable rate of acceleration. From this, one may also calculate the distance the vehicle will have travelled in achieving this desired speed⁵, assuming a reasonable rate of acceleration.

⁴The current situation here being the recorded position of all objects and vehicles previously observed and modelled in the PLTWM corresponding to the current time cycle.
⁵Calculations are based on the standard equations of motion involving uniform acceleration: v = u + at; v² = u² + 2as; s = t(u + v)/2.

speed_constraint,[green_car, red_car, 1.3, 8.0]

Here the green car hypothesises that the red car will behave in accordance with the constraint to drive at a speed of 1.3 metres per 0.1 second cycle (approx 30mph) and is expected to have complied with that constraint by the time it has travelled a distance of 8 metres. At this point the vehicle will be anticipated to level off its speed, and so the anticipated change in the vehicle's behaviour is also noted:

behaviour_change,[green_car, red_car, 8.0, 1.3, 0.0]

This describes the inferred intention of the red car to change its behaviour after it has travelled the distance of 8 metres in order to satisfy the constraint of driving at 1.3 metres per time cycle thenceforth. Because it is currently the red car's inferred intention to drive at the speed specified by the speed limit, this intention is also noted as the current immediate plan of the driver:

plan_constraint,[green_car, red_car, 1.3, 8.0]

All constraints are compared to ensure that the strictest constraint always applies (eg that which requires the greater rate of deceleration) through the constraint recording mechanism. For instance, a red traffic signal would lead the PR to anticipate the car ahead slowing down, and the speed constraint applying to that vehicle would be substituted with one reflecting this new information. Constraints might apply immediately, if appropriate, or at some point in the future as an anticipated behaviour change. In the latter case, the intention to change speed is noted as the vehicle's active plan_constraint.

For vehicles observed previously, earlier predictions are used to corroborate current expectations. A vehicle whose overall behaviour is not predicted on the basis of pre-existing or existing situational constraints invokes reasoning based on the possible personal goals a driver may have and typical plans for achieving those goals. The PR attempts to link inferred behaviour (across the two observations) to a known plan. Supplementary evidence is sought to support this reasoning. For example, if a vehicle has pulled out and is decelerating, the goal of turning right (left in the USA!) may be hypothesised; therefore, a road junction on the right, more or less in the vicinity of where the driver will eventually come to a halt, would be good evidence in support of this hypothesis.
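The projection arithmetic described above can be sketched briefly. The fixed-object rule (relative distance decreasing by exactly the driver's per-cycle travel) and the uniform-acceleration distance v² = u² + 2as are from the text and footnote 5; the function names and the sample current speed and acceleration are illustrative assumptions:

```python
# Sketch of PLTWM-style projection arithmetic (0.1 s cycles, 8 s horizon).

CYCLE = 0.1          # seconds per simulation cycle
HORIZON_CYCLES = 80  # the eight second projection period

def project_fixed_object(rel_dist, driver_dist_per_cycle, cycles=HORIZON_CYCLES):
    """The relative distance of a fixed object decreases by precisely the
    distance the observing driver travels per interval."""
    return [rel_dist - driver_dist_per_cycle * t for t in range(1, cycles + 1)]

def distance_to_reach_speed(u, v, a):
    """Distance travelled reaching speed v from speed u at uniform
    acceleration a, rearranged from v*v = u*u + 2*a*s."""
    return (v * v - u * u) / (2 * a)

# A no-entry sign 54.0 m ahead, driver covering 1.3 m per cycle:
ahead = project_fixed_object(54.0, 1.3, cycles=3)

# A vehicle below the limit of 13 m/s (1.3 m per cycle), accelerating
# from an assumed 11.0 m/s at an assumed "reasonable" 3.0 m/s^2:
s = distance_to_reach_speed(11.0, 13.0, 3.0)   # 8.0 m
```

With these assumed figures the vehicle is expected to comply with the speed constraint after 8 metres, the distance recorded in the speed_constraint and behaviour_change terms above.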
Similarly, one might hypothesise that a vehicle rapidly slowing down may have noticed an obstacle or a red light ahead, previously unseen by our own driver; he would then seek supporting evidence of this too. The process of recognising other drivers' plans, then, makes further demands on the selective attention of the driver for confirmatory evidence of one kind or another. If the driver fails to find corroborating evidence of any kind, it is assumed that the vehicle will pursue its current course of action to its conclusion, modified to take account of current situational constraints.

The projection of the vehicle in the PLTWM is consequently modelled according to the constraints and behaviour changes identified. For example, the vehicle would be projected as complying with currently active constraints until the next predicted behaviour change is deemed to come into effect. Thenceforth the vehicle will be modelled as complying with the newly active set of constraints. This process continues for each predicted behaviour change until modelling is completed for the eight second period. The form of the PLTWM, then, is a sequence of projected snapshots describing every observed object and agent. It is sufficiently precise to identify interactions between those objects and agents and the constraints they impose on the viewer's own behaviour.

Selective Perception in AUTODRIVE

In the last section mention was made of directing attention to information for corroboration of plan recognition. Similar direction of attention takes place during AUTODRIVE's decision-making activities. In both cases an attention 'request' is generated and held alongside similar requests which, at the end of each dynamic world modelling and decision-making cycle, are passed to the Observation Mechanism (OM) described above. For example, here is an attention request by the driver of the green car to observe a direction sign.
As it is as yet unobserved, the Data it will provide is unknown. The request has a priority rating of 6 on an arbitrary ordinal scale of 1 to 10.

attention_request,[green_car,[[sign,direction],Data,6.0]]

Objects for which observation has been specifically requested in this way are awarded a higher within-class priority for observation so that, for example, a sought-after junction would be prioritised above any other junction which just might come into view. However, another category, such as vehicles, would always receive a higher priority for attention than a junction.

Discussion

The AUTODRIVE system constitutes an initial attempt to incorporate dynamic world modelling, plan recognition, and a naive attentional mechanism into the planning process. As a vehicle for exploring these features AUTODRIVE has pointed up several limitations which are the focus of current and ongoing work. Once generated, descriptions contained within the dynamic world model provide data about the unfolding situation, in the form of a sequence of projected snapshots. AUTODRIVE uses this information to identify important interactions between a vehicle driver and other agents and objects. Identifying such interactions leads to decisions about appropriate actions for the vehicle driver, of course. It also leads to modification or updating of the series of projections described in the dynamic world model, to reflect expectations about how other agents are likely to change their behaviour as they interact with the situation as it unfolds. A crucial feature of the dynamic world model, then, is that it replaces naive projections about agent behaviour with informed anticipation about how that behaviour may change as a consequence of that agent's hypothesised goals and complex interactions with the world.
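The selection policy of the Observation Mechanism and attention requests, as described in the preceding sections, can be sketched as follows. The class ratings, the data layout, and the tie-breaking scheme are illustrative assumptions rather than AUTODRIVE's implementation; only the four-second recency window, the class-dominant ranking, and the within-class request boost come from the text:

```python
# Sketch of OM-style selection: ignore recently observed objects, rank
# the rest by class rating, and boost specifically requested objects
# within (but not across) their class.

RECENCY_WINDOW = 4.0   # seconds within which an observed object is ignored
FIXATION = 0.5         # seconds an object is fixated once selected

# Arbitrary preset per-class ratings (vehicles outrank junctions, etc.).
CLASS_RATING = {"vehicle": 3, "junction": 2, "sign": 1}

def select_object(objects, last_seen, requests, now):
    """objects: {name: class}; last_seen: {name: last observation time};
    requests: names awarded a within-class boost by attention requests."""
    candidates = [name for name in objects
                  if now - last_seen.get(name, -RECENCY_WINDOW) >= RECENCY_WINDOW]
    if not candidates:
        return None
    # Class rating dominates; a request only reorders within a class.
    return max(candidates,
               key=lambda n: (CLASS_RATING[objects[n]], n in requests))

objects = {"red_car": "vehicle", "junction1": "junction", "junction2": "junction"}
# The red car was fixated 0.2 s ago, so it is ignored; the requested
# junction wins over the unrequested one.
choice = select_object(objects, {"red_car": 9.8}, {"junction2"}, now=10.0)
```

The tuple key makes the class rating strictly dominant, matching the observation above that a vehicle would always outrank even a specifically requested junction.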
The dynamic world model then provides a more informed basis for anticipating correctly the likely behaviour of yet further agents interacting in this complex scenario, by identifying accurately the situation that will unfold. The dynamic world model lacks the power of a full ATMS. However, just as a truly reactive system escapes the need for any form of situational representation, so it seems the form of the dynamic world model serves the needs and purposes of AUTODRIVE well.

Dynamic world modelling provides a platform for responding to dynamic, uncertain domains by rendering those domains certain enough when this cannot be done purely on the basis of information apprehended at the moment of execution alone. In most cases this is because for such domains reacting comes too late - the need to act arises before the concrete evidence of perceptual information becomes available - and the greater the time lag between action and consequence, the greater the uncertainty about how the action will turn out. Information available in the dynamic world model replaces the need for constant sensing of all available information in order to direct behaviour. Furthermore, the information requirements for plan recognition (as well as for planning and decision-making) can be used to direct perception in a selective manner, helping to overcome real-time constraints on sensing. The form in which an attentional mechanism is currently implemented in AUTODRIVE is crude. However, it succeeds in directing selective perception and serves as a basis for exploring these issues further. Work on improving the basis for attentional behaviour is ongoing, and is guided by the work of Hayes-Roth (1991) and Ogasawara (1993), for example.

References

Ferguson, IA (1992) Touring Machines: An Architecture for Dynamic, Rational, Mobile Robots. Unpublished Thesis, Technical Report No 273, Computer Laboratory, University of Cambridge, Cambridge, CB2 3QG, UK.

Georgeff, MP (1987) 'Planning', Annual Review of Computer Science, 2, pp359-400.

Georgeff, MP and Lansky, AL (1987) 'Reactive Reasoning and Planning', AAAI-87, Vol II, Seattle, WA, pp677-682.

Hayes-Roth, B (1991) 'Making Intelligent Systems Adaptive'. In K VanLehn (Ed) Architectures for Intelligence, Hillsdale, NJ: Lawrence Erlbaum.

Sanborn, J & Hendler, J (1988) 'A Model of Reaction for Planning in Dynamic Environments', International Journal of Artificial Intelligence in Engineering, 3(2), pp95-102.

Ogasawara, GH (1993) RALPH-MEA: A Real-Time, Decision-Theoretic Agent Architecture. Unpublished Thesis, Report No UCB/CSD 93/777, Computer Science Division, University of California, Berkeley, CA 94720.

Sanborn, J & Hendler, J (1990) 'Knowledge Strata: Reactive planning with a multi-level architecture', Technical Report CS-TR-2564, University of Maryland, MD.

Wood, S (1993) Planning and Decision-Making in Dynamic Domains, Chichester: Ellis Horwood.