When Being Reactive Just Won’t Do

From: AAAI Technical Report SS-95-04. Compilation copyright © 1995, AAAI (www.aaai.org). All rights reserved.
Sharon Wood
School of Cognitive & Computing Sciences
University of Sussex,
Brighton, BN1 9QH, U.K.
email: sharonw@cogs.susx.ac.uk
Abstract
Reactive systems are, generally, highly successful for dynamic, uncertain domains. Analysing why this should be so indicates the main criterion for success is that information apprehended about a given situation can render it certain enough to reliably inform action. A case is made for informing action by dynamic world modelling for domains where this does not hold. When the issue of attending selectively to situational information is introduced, the case for dynamic world modelling is enhanced. A system incorporating selective perception which dynamically models its domain is described.
Introduction
We are familiar with the idea that dynamic domains pose problems for deliberative planning. The reasons are well documented (cf Georgeff, 1987); to briefly summarise: deliberative planning is generally unable to derive a timely response for domains where events outpace the planning process; it is generally unable to respond to events occurring during the planning process; and it is generally unable to handle the urgency with which problems must be addressed. To date, the solutions proposed to overcome these problems have generally resided in reactive systems. The main advantages these systems offer are that they can deliver a timely response to events; they have no planning process to interrupt, so they are able both to tackle each event as it arises and to do so with the requisite degree of urgency. Where some element of deliberative planning is preferred, we have seen the emergence of hybrid systems. Often these are multi-layered systems comprising a reactive level and a deliberative level, sometimes more, with the capability to select between the solutions available from each level in order to deliver a timely solution to the current situation (cf Ferguson, 1992; Georgeff & Lansky, 1987; Ogasawara, 1993; Sanborn & Hendler, 1988; 1990). Wherever a real-time responsive capability is required, reactive systems, whether incorporated within a hybrid system or not, have generally appeared to offer a solution. If we analyse why this might be the case, I believe there are some interesting insights to be gained.
The Nature of Uncertainty
The analysis I wish to make involves, initially, moving away from a focus on the need to be reactive or fast. However, ultimately, I shall return to this issue as it is pertinent to the analysis. Instead the analysis rests on the relationship of reactivity to uncertainty in the execution environment.
The main problem posed by a deliberative approach to planning in a dynamic domain is that planning takes time. However, time is only a problem in so far as we are uncertain about key features of the execution environment which might inform our planning. In a perfectly known universe, plans may be laid in good time and executed at the apposite moment. This may be stating the obvious, but stating it allows us to make some important observations. The first is that until information about a given situation is apprehended, it is not possible to specify an appropriate course of action (or reaction). The second is that once information about a given situation is apprehended, it is certain enough to inform action. This second observation is crucial to the analysis of why reactive systems are successful for certain classes of problems or domains.

(Re)action selection depends upon identifying characteristics of a given situation that indicate that a given response is appropriate. This is usually 'hard-coded' as a rule whose antecedent describes situational conditions and whose consequent the (re)action(s) to take. However, a response will be appropriate only if it interacts successfully with the situation as it unfolds. In order for this to be the case, the situational conditions which trigger a rule must, to some extent, predict events which will ensue in that situation. There are two ways in which an action may be unsuccessful. The first is through a failure of execution. For example, you turn the handle of a door and pull but nothing happens; or it opens then immediately swings shut again. Herein lies the power of reactive systems: they simply take the current outcome as a new set of conditions which will trigger the appropriate response.¹ The second is that the response made can be described as the wrong response.² That is, some other response should have been made. It is a moot point whether this is, in fact, any different from the kind of failure described above; however, I think it helps to consider that it is. This is because, with the best will in the world, failures of execution will always occur and their consequences borne; whereas failure to produce a correct response indicates a failure in the response selection process. This in turn points to a failure to correctly identify or evaluate the situational conditions which led to its invocation; an evaluation which depends upon correctly anticipating situational outcomes on the basis of current observations.

¹Herein also lies the argument that only a reactive approach will suffice for this kind of problem.
²Of course, there are degrees of 'wrongness', so the response might be just sub-optimal (compared to, say, a more deliberated response). However, this is not the issue addressed here.
The implication is that reactive systems can be successful, other than by chance, for uncertain, dynamic domains only if the information apprehended about a given situation can render predictions about the outcome to that situation certain enough to reliably inform action. Without the ability to predict, and hence resolve the uncertainty concerning the outcome to a situation as it unfolds, on the basis of current observations, action selection cannot be reliably performed.
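The situation-action rule scheme discussed above can be sketched in a few lines. This is a minimal illustration under assumed names, not code from any actual system; the point is that the response depends only on the currently observed conditions, and the outcome of acting simply becomes the next set of triggering conditions.

```python
# Illustrative situation-action rules: an antecedent predicate over the
# current observations paired with a consequent response.
RULES = [
    (lambda obs: obs.get("door") == "closed", "pull_handle"),
    (lambda obs: obs.get("door") == "open", "walk_through"),
]

def select_reaction(observations):
    """Return the response of the first rule whose antecedent matches
    the current observations; no memory, no lookahead."""
    for condition, response in RULES:
        if condition(observations):
            return response
    return None  # no rule applies

# A failure of execution is handled for free: the new outcome re-triggers.
print(select_reaction({"door": "closed"}))  # -> pull_handle
print(select_reaction({"door": "open"}))    # -> walk_through
```

Note that nothing in this scheme can detect that a *different* rule should have fired: the implicit prediction is only as good as the observed conditions.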
The extent to which this need concern us will depend upon the characteristics of the domain with which the system must interact. As pointed out earlier, if the choice of action is incorrect and the system unsuccessful then the reactive approach can overcome this by evaluating the new situation that happens to turn out and selecting a response as before, assuming that eventually the response made will be a good one. That is, it may matter more or less whether or not the system always makes the correct response if incorrect ones can be put right with little consequence.
However, there is another issue which is separate from our concern with how often the system makes the right or wrong response, although it will affect how often the system does get it wrong or right. This is the extent to which the outcomes to situations are inherently predictable when based upon the observable characteristics of those situations. If we are unable to correctly anticipate the outcome to a situation on this basis, then we require a means of supplementing the implicit prediction given us by situation-action rules. What we need to supplement it with is the ability to reason about the accuracy of the predictions on which we are to base action: to detect when naive prediction is inaccurate; to identify why the situation is unfolding in some other way; and to use that information to enhance the accuracy of subsequent predictions about the continuing development of the situation.
At this point, I would like to return to the issue of the need for responses to be fast. Discussion about action usually focusses on behaviour which directly achieves goals by bringing about changes in the world; however, actions for acquiring information also need to be carried out. In situations where there is continual change of an uncertain nature, information must be acquired before the need to act arises, so there is little time available. In addition, the opportunity for acquiring some kinds of information will be constrained to a limited time window. Not only, then, is the information to be apprehended continually changing and of an uncertain nature, it is not possible under all circumstances to apprehend all the information available. Whether by circumstance or design - that is, be it haphazard or informed - the apprehension of information must be selective. As a result action and prediction must be based on only a limited subset of the available information. This diminishes the inherent predictive power of the observations made, because had we made some other (possible) observations we might have made other overriding predictions. I have worded this statement this way because I believe it strengthens the case against a simple reactive rule-based approach as being appropriate for all classes of uncertain, dynamic domains. Namely, had other observations been made, a different rule might have been deemed correct.
I believe it also strengthens the case for the appropriateness of a capability for reasoning about the accuracy of predictions upon which actions are to be based. Not least of all is the reason that predictions arising from more recent observations will be placed in the context of already existing expectations. Whether or not they entirely agree, posing a question over their accuracy, a more complete picture of the emerging situation will be built up than on the basis of those most recent observations alone.

Below, I describe a system, AUTODRIVE, which incorporates an initial attempt to implement the capabilities of predictive world modelling based on selective perception.
Dynamic World Modelling in AUTODRIVE
The AUTODRIVE system simulates the dynamic, uncertain domain of driving. A symbolic description of a situation and the changes that occur within it are presented as a series of 'snapshots'. Information about this world may be apprehended selectively by focussing on single domain entities over a sequence of such snapshots. The information is used to build up a dynamic world model for drivers of vehicles in the microworld. Every vehicle in the domain builds up its own unique dynamic world model based on the observations made from its own viewpoint. Of course, these viewpoints include drivers seeing each others' vehicles. The dynamic world model is a predictive model of how a situation will unfold over an eight second period. It is used to inform the actions of the vehicle driver using an integrated planning and dynamic decision-making approach which is not discussed further here but is described in detail in Wood (1993).
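The arrangement just described - a stream of symbolic snapshots folded into a separate model per viewing vehicle - can be sketched roughly as follows. The class and field names are illustrative assumptions, not AUTODRIVE's actual data structures.

```python
# Hypothetical sketch: each vehicle accumulates its own unique world
# model from the snapshots observed from its own viewpoint.
class DynamicWorldModel:
    def __init__(self, owner):
        self.owner = owner      # the viewing vehicle
        self.entities = {}      # entity id -> latest observed state

    def observe(self, snapshot):
        """Fold one symbolic snapshot into this vehicle's model; a
        driver models the other entities, not itself."""
        for entity_id, state in snapshot.items():
            if entity_id != self.owner:
                self.entities[entity_id] = state

# Every vehicle builds its own model of the same shared world.
snapshot = {"red_car": {"pos": 40.0}, "green_car": {"pos": 10.0}}
green_view = DynamicWorldModel("green_car")
green_view.observe(snapshot)
print(sorted(green_view.entities))  # -> ['red_car']
```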
The Observation Mechanism
Aseach successivesnapshotof the worldis presentedto a
vehicle driver, data about the current situation are made
accessible to an Observation Mechanism(OlVl). This
mechanism
selects data pertaining to a single object3 and
passes it into a Short TermInformation Store (STIS).
During this selection process, the OMhas access to
informationpreviously placed in the STIS. Objects which
havepreviouslybeenobservedwithin a precedinginterval
of four seconds are ignored. The remaining observable
objects are prioritised on an arbitrary preset rating scale
accordingto class of object. This rating maybe modified
by requests received from the planning and decisionmakingprocesses, and processes for plan recognition.
about the needto observecertain objects in particular.
Onceselected, aa object is ’fixated’ for 0.5 seconds(five
time cycles in the current implementation).
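The selection policy described above might be sketched as below. The numeric ratings and data shapes are assumptions for illustration; only the policy itself (ignore anything seen in the preceding four seconds, then take the highest-priority remainder) comes from the description.

```python
IGNORE_WINDOW = 4.0  # objects observed this recently (seconds) are ignored
CLASS_RATING = {"vehicle": 3, "sign": 2, "junction": 1}  # arbitrary preset scale

def select_object(now, candidates, stis_last_seen, requests):
    """Pick the next object to fixate: skip anything observed within the
    preceding four seconds, then prefer the highest class rating, as
    modified by any attention requests for specific objects."""
    best, best_priority = None, None
    for obj_id, obj_class in candidates:
        if now - stis_last_seen.get(obj_id, -IGNORE_WINDOW) < IGNORE_WINDOW:
            continue  # observed too recently - ignore
        priority = CLASS_RATING[obj_class] + requests.get(obj_id, 0.0)
        if best_priority is None or priority > best_priority:
            best, best_priority = obj_id, priority
    return best

chosen = select_object(
    now=5.0,
    candidates=[("car1", "vehicle"), ("sign5", "sign")],
    stis_last_seen={"car1": 4.8},   # car1 was fixated 0.2s ago -> ignored
    requests={},
)
print(chosen)  # -> sign5
```

Although vehicles outrank signs on the preset scale, the recently-observed car1 is excluded, so the sign is fixated.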
The Short Term Information Store

Data describing each observed object, in a temporally qualified form, are stored in the STIS. The example below of data contained in the STIS describes a vehicle ahead of the driver, travelling at just under 30mph. Over a sequence of intervals the green car observes the red car travelling from a position 40.0 metres along the Lewes Road to 45.0 metres along the northern carriageway of the Lewes Road. The fixation start and end times are italicized.

stis,[green_car,[[vehicle,ahead],car1,[red_car,[lewes_rd6,lewes_rd,brighton],40.0,n,0.5,3.0,2.0]],0.0,0.1]
stis,[green_car,[[vehicle,ahead],car1,[red_car,[lewes_rd6,lewes_rd,brighton],41.25,n,0.5,3.0,2.0]],0.1,0.2]
stis,[green_car,[[vehicle,ahead],car1,[red_car,[lewes_rd6,lewes_rd,brighton],42.50,n,0.5,3.0,2.0]],0.2,0.3]
stis,[green_car,[[vehicle,ahead],car1,[red_car,[lewes_rd6,lewes_rd,brighton],43.75,n,0.5,3.0,2.0]],0.3,0.4]
stis,[green_car,[[vehicle,ahead],car1,[red_car,[lewes_rd6,lewes_rd,brighton],45.0,n,0.5,3.0,2.0]],0.4,0.5]

The Plan Lookahead Temporal World Model

The construction of the Plan Lookahead Temporal World Model (PLTWM) constitutes AUTODRIVE's dynamic world model. It is based on object location and vehicle descriptor data stored in the STIS. Once observed, an object's relative distance (fixed objects) or movements (vehicles) are modelled over a period of eight seconds. This model is represented by a sequence of data structures describing the object for each simulation cycle which will occur during the eight second period (that is, in this implementation eighty data structures are generated for each object) so that AUTODRIVE has access to the relevant projection descriptor for each interval during which decision-making will take place, over this period. An example projection for a fixed object would look like this:

projection,[red_car,[[sign,no_entry],sign5,[54.0,61.0]],9.6,9.7]

This projects the relative distance, eg 54.0 metres and 61.0 metres, of a pair of no-entry signs from the viewpoint of the red car over a 0.1 second interval. The interval over which the object is predicted to be at the relative distance specified is given by the italicized figures, eg 9.6 - 9.7.

The PLTWM bases its projection of the relative distance of fixed objects upon the known movements of the observing driver, decreasing by precisely the distance travelled by the driver per interval. If the driver intends to change the pattern of his behaviour during the projection period, the modeller takes this into account in generating the projected sequence.

Because other agents move independently of the driver viewing them, projecting the future locations of other vehicles involves ascertaining their speed and trajectory in order to calculate the absolute distance they will travel over the projection period. It also involves predicting likely behaviour changes. This firstly requires interpretation of the observed behaviour sequence. This is segmented, if required, at points where the vehicle is observed to have either changed direction, such as when reversing, or when, say, starting off after having been stationary for a period of time. A projection is based only on that behaviour observed since one of those segmentation points. The PLTWM bases its calculation of vehicle movement upon the degree of movement which takes place over the fixation interval (0.5 seconds) segmented in this way. However, before the future movements of the vehicle can be projected, expectations about changes in behaviour in satisfying situational constraints and the personal intentions of the driver must be identified. This process is described in the next section.

The Plan Recogniser

The Plan Recogniser (PR) uses the implicit description of action contained in the STIS to invoke hypotheses about the goal(s) a vehicle might be pursuing. The PR does not reason about the actions of vehicles per se but about their deviation from expectations. The initial projections of a vehicle constitute expectations; it is upon subsequent observation that deviations from expectations may be detected and hypotheses generated regarding the personal intentions of the driver.

³These information processing constraints do not apply to some categories of information, such as feedback to the driver on his own position, the curvature, gradient and width of the road and lane markings; that is, information that enables the driver to guide his vehicle. This decision was based on evidence that humans are able to acquire such data parafoveally, and is open to change.
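The PR's expectation-deviation idea can be sketched very simply: the initial projection constitutes the expectation, and a later observation that departs from it by more than some tolerance triggers hypothesis generation. The function name and tolerance below are assumptions for illustration, not AUTODRIVE's actual values.

```python
TOLERANCE = 0.5  # metres of disagreement treated as a deviation (assumed)

def check_expectation(projected_pos, observed_pos):
    """Compare an observed position against the projected one; a large
    enough discrepancy signals that hypotheses about the driver's
    personal intentions should be invoked."""
    deviation = observed_pos - projected_pos
    if abs(deviation) > TOLERANCE:
        return ("deviation", deviation)   # invoke goal hypotheses
    return ("as_expected", deviation)     # expectation corroborated

print(check_expectation(projected_pos=45.0, observed_pos=45.1)[0])  # -> as_expected
print(check_expectation(projected_pos=45.0, observed_pos=43.0)[0])  # -> deviation
```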
The current speed and trajectory of the observed vehicle plus the expectation that it will attempt to travel at the speed limit provide a base line for generating expectations. The current situation⁴ and the inferred personal intentions of the driver point up constraints on this behaviour. For example, a vehicle travelling below the speed limit will be expected to increase speed at a reasonable rate of acceleration. From this, one may also calculate the distance the vehicle will have travelled in achieving this desired speed,⁵ assuming a reasonable rate of acceleration.
speed_constraint,[green_car,red_car,1.3,8.0]
Here the green_car hypothesises that the red car will behave in accordance with the constraint to drive at a speed of 1.3m over a 0.1 second cycle (approx 30mph) and is expected to have complied with that constraint by the time it has travelled the distance of 8 metres. At this point the vehicle will be anticipated to level off its speed and so the anticipated change in the vehicle's behaviour is also noted:

behaviour_change,[green_car,red_car,8.0,1.3,0.0]

This describes the inferred intention of the red car to change its behaviour after it has travelled the distance of 8 metres in order to satisfy the constraint of driving at 1.3 metres per time cycle thenceforth. Because it is currently the red car's inferred intention to drive at the speed specified by the speed limit, this intention is also noted as the current immediate plan of the driver:

plan_constraint,[green_car,red_car,1.3,8.0]
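The 8-metre figure in these constraints is the kind of quantity footnote 5's equations of motion yield. As a worked sketch, the distance travelled while accelerating uniformly from u to v follows from v² = u² + 2as; the current speed and acceleration below are assumed "reasonable" values chosen purely so the numbers come out at 8 metres, not values taken from the paper.

```python
def distance_to_reach(u, v, a):
    """Distance s travelled while accelerating uniformly from speed u
    to speed v at rate a, rearranged from v*v = u*u + 2*a*s."""
    return (v * v - u * u) / (2.0 * a)

# Speeds in metres per 0.1s cycle, matching the constraint examples above.
u = 0.9    # assumed current speed: 0.9 m per cycle
v = 1.3    # target speed: the speed limit, 1.3 m per cycle (approx 30mph)
a = 0.055  # assumed reasonable acceleration, m per cycle squared

s = distance_to_reach(u, v, a)
print(round(s, 1))  # -> 8.0
```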
All constraints are compared to ensure that always the strictest constraint applies (eg that which requires the greater rate of deceleration) through the constraint recording mechanism. For instance, a red traffic signal would lead the PR to anticipate the car ahead slowing down, and the speed constraint applying to that vehicle would be substituted with one reflecting this new information. Constraints might apply immediately, if appropriate, or at some point in the future as an anticipated behaviour change. In the latter case, the intention to change speed is noted as the vehicle's active plan_constraint.
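Keeping only the strictest constraint might be sketched as below, where "stricter" is taken to mean the lower target speed (the greater implied deceleration). The structures are illustrative assumptions.

```python
def record_constraint(recorded, vehicle, target_speed, by_distance):
    """Substitute the stored constraint for a vehicle only if the new
    one is stricter, i.e. demands a lower target speed."""
    current = recorded.get(vehicle)
    if current is None or target_speed < current[0]:
        recorded[vehicle] = (target_speed, by_distance)
    return recorded

constraints = {}
record_constraint(constraints, "red_car", 1.3, 8.0)   # speed limit constraint
record_constraint(constraints, "red_car", 0.0, 20.0)  # red signal: stop ahead
print(constraints["red_car"])  # -> (0.0, 20.0): the stop constraint wins
```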
For vehicles observed previously, earlier predictions are used to corroborate current expectations. A vehicle whose overall behaviour is not predicted on the basis of pre-existing or existing situational constraints invokes reasoning based on the possible personal goals a driver may have and typical plans for achieving those goals. The PR attempts to link inferred behaviour (across the two observations) to a known plan. Supplementary evidence is sought to support this reasoning. For example, if a vehicle has pulled out and is decelerating, the goal of turning right (left in the USA!) may be hypothesised; therefore, a road junction on the right, more or less in the vicinity of where the driver will eventually come to a halt, would be good evidence in support of this hypothesis. Similarly, one might hypothesise that a vehicle rapidly slowing down may have noticed an obstacle or a red light ahead, previously unseen by our own driver; he would then seek supporting evidence of this too. The process of recognising other drivers' plans, then, makes further demands on the selective attention of the driver for confirmatory evidence of one kind or another. If the driver fails to find corroborating evidence of any kind, it is assumed that the vehicle will pursue its current course of action to its conclusion, modified to take account of current situational constraints.

⁴The current situation here being the recorded position of all objects and vehicles previously observed and modelled in the PLTWM corresponding to the current time cycle.
⁵Calculations are based on the standard equations of motion involving uniform acceleration: v = u + at; v² = u² + 2as; s = t(u + v)/2.
The projection of the vehicle in the PLTWM is consequently modelled according to the constraints and behaviour changes identified. For example, the vehicle would be projected as complying with currently active constraints until the next predicted behaviour change is deemed to come into effect. Thenceforth the vehicle will be modelled as complying with the newly active set of constraints. This process continues for each predicted behaviour change until modelling is completed for the eight second period. The form of the PLTWM, then, is a sequence of projected snapshots describing every observed object and agent. It is sufficiently precise to identify interactions between those objects and agents and the constraints they impose on the viewer's own behaviour.
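This piecewise projection - comply with the active constraint until the next behaviour change takes effect, then switch - can be sketched over the eighty 0.1-second cycles of the eight second horizon. The data shapes and the simplification to constant per-cycle speeds are assumptions for illustration.

```python
CYCLES = 80  # eight seconds at 0.1s per cycle

def project(start_pos, speed, behaviour_changes):
    """behaviour_changes: list of (trigger_distance, new_speed) pairs,
    each taking effect once the vehicle has travelled that far.
    Returns the projected position for every cycle."""
    pos, travelled = start_pos, 0.0
    pending = sorted(behaviour_changes)
    positions = []
    for _ in range(CYCLES):
        if pending and travelled >= pending[0][0]:
            speed = pending.pop(0)[1]   # next behaviour change takes effect
        pos += speed
        travelled += speed
        positions.append(pos)
    return positions

# Illustrative: hold 1.3 m/cycle until 8 metres travelled, then slow
# to an assumed 0.9 m/cycle for the remainder of the horizon.
path = project(40.0, 1.3, [(8.0, 0.9)])
print(len(path), round(path[-1], 1))  # -> 80 114.8
```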
Selective Perception in AUTODRIVE
In the last section mention was made of directing attention to information for corroboration of plan recognition. Similar direction of attention takes place during AUTODRIVE's decision-making activities. In both cases an attention 'request' is generated and held alongside similar requests which, at the end of each dynamic world modelling and decision-making cycle, are passed to the Observation Mechanism (OM) described above. For example, here is an attention request by the driver of the green car to observe a direction sign. As it is as yet unobserved the Data it will provide is unknown. The request has a priority rating of 6 on an arbitrary ordinal scale of 1 to 10.

[attention_request,[green_car,[[sign,direction],Data,6.0]]]

Objects for which observation has been specifically requested in this way are awarded a higher than within-class priority for observation so that, for example, a sought-after junction would be prioritised above any other junction which just might come into view. However, another category, such as vehicles, would always receive a higher priority for attention than a junction.
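The two-level priority scheme just described - class priority always dominates, while a request boosts an object only within its class - can be captured with an ordered pair. The numeric ratings are illustrative assumptions.

```python
CLASS_PRIORITY = {"vehicle": 3, "junction": 2, "sign": 1}  # assumed ordering

def attention_priority(obj_class, requested_rating=0.0):
    """Class rank is the major key; the request rating (1-10 scale)
    only reorders objects inside the same class."""
    return (CLASS_PRIORITY[obj_class], requested_rating)

sought_junction = attention_priority("junction", requested_rating=6.0)
other_junction = attention_priority("junction")
any_vehicle = attention_priority("vehicle")

print(sought_junction > other_junction)  # -> True: boosted within its class
print(any_vehicle > sought_junction)     # -> True: vehicles still dominate
```

Python's lexicographic tuple comparison gives exactly the required behaviour: the second element matters only when the first elements tie.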
Discussion
The AUTODRIVE system constitutes an initial attempt to incorporate dynamic world modelling, plan recognition, and a naive attentional mechanism into the planning process. As a vehicle for exploring these features AUTODRIVE has pointed up several limitations which are the focus of current and ongoing work.

Once generated, descriptions contained within the dynamic world model provide data about the unfolding situation, in the form of a sequence of projected snapshots. AUTODRIVE uses this information to identify important interactions between a vehicle driver and other agents and objects. Identifying such interactions leads to decisions about appropriate actions for the vehicle driver, of course. It also leads to modification or updating of the series of projections described in the dynamic world model, to reflect expectations about how other agents are likely to change their behaviour as they interact with the situation as it unfolds. A crucial feature of the dynamic world model, then, is that it replaces naive projections about agent behaviour with informed anticipation about how that behaviour may change as a consequence of that agent's hypothesised goals and complex interactions with the world. The dynamic world model then provides a more informed basis for anticipating correctly the likely behaviour of yet further agents interacting in this complex scenario, by identifying accurately the situation that will unfold.
The dynamic world model lacks the power of a full ATMS. However, just as a truly reactive system escapes the need for any form of situational representation, so it seems the form of the dynamic world model serves the needs and purposes of AUTODRIVE well. Dynamic world modelling provides a platform for responding to dynamic, uncertain domains by rendering those domains certain enough when this cannot be done purely on the basis of information apprehended at the moment of execution alone. In most cases this is because for such domains reacting comes too late - the need to act arises before the concrete evidence of perceptual information becomes available - and the greater the time lag between action and consequence, the greater the uncertainty about how the action will turn out.
Information available in the dynamic world model replaces the need for constant sensing of all available information in order to direct behaviour. Furthermore, the information requirements for plan recognition (as well as for planning and decision-making) can be used to direct perception in a selective manner, helping to overcome real-time constraints on sensing. The form in which an attentional mechanism is currently implemented in AUTODRIVE is crude. However, it succeeds in directing selective perception and serves as a basis for exploring these issues further. Work on improving the basis for attentional behaviour is ongoing, and is guided by the work of Hayes-Roth (1991) and Ogasawara (1993), for example.
References
Ferguson, IA (1992) TouringMachines: An Architecture for Dynamic, Rational, Mobile Robots. Unpublished Thesis, Technical Report No 273, Computer Laboratory, University of Cambridge, Cambridge, CB2 3QG, UK.

Georgeff, MP (1987) 'Planning', Annual Review of Computer Science, 2, pp359-400.

Georgeff, MP and Lansky, AL, (1987) 'Reactive Reasoning and Planning', AAAI-87, Vol II, Seattle, WA, pp677-682.

Hayes-Roth, B (1991) 'Making Intelligent Systems Adaptive'. In K VanLehn (Ed) Architectures for Intelligence, Hillsdale, NJ: Lawrence Erlbaum.

Ogasawara, GH, (1993) RALPH-MEA: A Real-Time, Decision-Theoretic Agent Architecture. Unpublished Thesis, Report No UCB/CSD 93/777, Computer Science Division, University of California, Berkeley, CA 94720.

Sanborn, J & Hendler, J, (1988) 'A Model of Reaction for Planning in Dynamic Environments', International Journal of Artificial Intelligence in Engineering, 3(2), pp95-102.

Sanborn, J & Hendler, J, (1990) 'Knowledge Strata: Reactive planning with a multi-level architecture', Technical Report CS-TR-2564, University of Maryland, MD.

Wood, S (1993) Planning and Decision-Making in Dynamic Domains, Chichester: Ellis Horwood.