
From: AAAI Technical Report WS-97-10. Compilation copyright © 1997, AAAI (www.aaai.org). All rights reserved.
Commencing Execution
Richard Goodwin
rgoodwin@watson.ibm.com
IBM, T.J. Watson Research Center
Route 134, Yorktown Heights
New York 10598
(914) 945-2651
Abstract
Real-time search algorithms interleave search with execution in order to accomplish a task efficiently. Deciding when to begin execution of the current best action is crucial for creating a high-performance real-time search algorithm. This paper explores the question of when to commence execution for an on-line search performed with limited computation. We examine Good's idealized algorithm for deciding when to begin execution and illustrate some of its limitations. We then present a revised, idealized algorithm that removes some of these limitations. The revised algorithm has been used as the basis for an algorithm that makes on-the-fly decisions for a simplified robot courier task. Previous empirical results show that this new algorithm outperforms previously used algorithms. In this paper, we show that the difference in performance is bounded by a ratio of two to one. We discuss the situations where this limit is approached and suggest that overlapping search with execution in these situations is most beneficial.
Introduction
Real-time search methods (Korf 1987), situated action (Suchman 1987) and anytime algorithms (Dean and Boddy 1988) all share the notion that computation and execution need to be interleaved in order to achieve good performance. All three recognized that, faced with a complicated task, some initial search can significantly increase the likelihood of success and increase efficiency. Conversely, sitting on one's hands, considering all possible consequences of all possible actions, does not accomplish anything. At some point,
action is called for. But when is this point reached?
Deciding when to begin execution is complex. There
is no way to know how much a solution will improve
with a given amount of additional search. The best
that can be achieved is to have some type of statistical model as is done in the anytime planning framework (Dean and Boddy 1988). In addition, the decision
should take into account the fact that it may be possible to execute one part of the plan while planning
another.
Approaches
The classical AI planning approach to deciding when to begin execution is to run the planner until it produces a plan that satisfies its goals and then to execute the plan. This approach recognizes that finding the plan with the highest expected utility may take too long and that using the first plan that satisfies all the constraints may produce the best results (Simon and Kadane 1974). It also relies on the heuristic that, in searching for a plan, a planner will tend to produce simple plans first, and simple plans tend to be more efficient and robust than unnecessarily complex plans.
Execution is delayed until a complete plan has been
generated. Only when there is a complete plan can
there be a guarantee that an action will actually be
needed and will not lead to a situation where the goals
are no longer achievable.
In contrast to the classical AI approach, the reactive approach is to pre-compile all plans and to always
perform the action suggested by the stimulus-response
rules applicable in the current situation. The answer
to the question of when to begin acting is always "now"
(Brooks 1986, Schoppers 1987). This approach is applicable when it is possible to pre-compile the plans
and in dynamic environments where quick responses
are required.
A third approach is to explicitly reason about when to start execution while planning is taking place (Russell and Wefald 1991). Using information about the task, the current state of the plan and the expected performance of the planner, the decision to execute the current best action is made on-line. This allows execution to begin before a complete plan is created and facilitates overlapping of planning and execution. This approach also allows execution to be delayed in favour of optimizing the plan, if the expected improvement so warrants.
Idealized Algorithms

In this section, we present Good's idealized algorithm for deciding when to begin execution and our revised version of the algorithm that accounts for overlapping planning and execution. Both algorithms are idealized because they assume that the decision maker knows the effect and duration of each of the possible computations and actions. Since this information is generally not available, we need to approximate it in order to create an operational system. In the next section we operationalize both algorithms and compare the results empirically.
Making on-line decisions about when to begin execution can be modeled as a control problem where a meta-level controller allocates resources and time to the planner and decides when to send plans to the execution controller for execution. The control problem is to select actions and computations that maximize overall expected utility. Good's idealized algorithm, which has been suggested by a number of researchers, selects the option from the set $\{a, C_1, \ldots, C_k\}$ that has the highest expected utility (Good 1971, Russell and Wefald 1991). In this list, $a$ is the current best execution action and the $C_i$'s represent possible computational actions. The utility of each computational action must take into account the duration of the computation, by which any act it recommends would be delayed. Of course, this ideal algorithm is non-operational, since the improvement that each computation will produce and the duration of the computation may not be known. A practical implementation will require these values to be estimated.
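To make the structure of this decision concrete, the following sketch shows one way Good's idealized selection could be phrased. It is a minimal illustration rather than an implementation from the paper: the estimator functions (value_of_acting_now, value_after, duration, delay_cost) are hypothetical placeholders for the quantities the idealized algorithm simply assumes are known.

```python
def goods_choice(current_best_action, computations,
                 value_of_acting_now, value_after, duration, delay_cost):
    """Return the single option (act now, or run one computation) with the
    highest expected utility, in the spirit of Good's idealized algorithm."""
    best_option = current_best_action
    best_utility = value_of_acting_now(current_best_action)

    for comp in computations:
        # Expected value of the action the computation would recommend,
        # discounted by the cost of delaying execution while it runs.
        utility = value_after(comp) - delay_cost(duration(comp))
        if utility > best_utility:
            best_option, best_utility = comp, utility

    return best_option
```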
Revised Idealized Algorithm
Good's algorithm ignores the fact that many agents can act and compute concurrently. A better control question to ask is not whether the agent should act or compute, but whether the agent should act, compute, or do both. The corresponding control problem is to select from the list $\{(a, C_1), \ldots, (a, C_k), (\phi, C_1), \ldots, (\phi, C_k)\}$, where each ordered pair represents an action to perform and a computation to do. $\phi$ represents the null action, corresponding to only computing. There is no "act only" pair since acting typically involves (non-planning) computation (e.g. reactive control) that requires some or all of the computational resources.
The advantage of considering pairs of actions and computations is that it allows the controller to accept the current action and forgo some expected improvement in favour of using the computational resources to improve the rest of the plan. For example, I may be able to improve the path I am taking to get to my next meeting in a way that saves me more time than the path planning takes. However, I might be better off if I go with the current plan and think about what I am going to say when I get to the meeting instead.
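A sketch of the revised selection, in the same hypothetical style as the previous one, makes the difference explicit: the controller now ranks (action, computation) pairs, including pairs built from the null action, rather than choosing between acting and computing. The pair_utility estimator is again a placeholder for quantities the idealized algorithm assumes are known.

```python
def revised_choice(current_best_action, computations, pair_utility):
    """Return the (action, computation) pair with the highest expected
    utility; None plays the role of the null action (compute only)."""
    candidates = [(act, comp)
                  for act in (current_best_action, None)  # None = null action
                  for comp in computations]
    # The agent may act and compute concurrently, so pairs are ranked as a
    # whole rather than choosing between acting and computing.
    return max(candidates, key=pair_utility)
```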
Robot-Courier Tour Planner

In this section, we show how to operationalize both Good's algorithm and our revised algorithm for use by a simple robot courier. The task of the courier robot is to visit a set of locations and return to the initial location as quickly as possible. We assume that Euclidean distance between locations is an accurate estimate of the travel time. Any ordering of locations is a valid tour, but some tours require more travel than others. The robot can use a simple edge-exchange algorithm (two-opt) to improve its tour, but this computation takes time. The decision to commence execution involves a tradeoff between reducing the length of a tour through more computation and delaying the start of execution.

We begin by characterizing the performance of the two-opt algorithm using a performance curve, as is done in anytime planning (Dean and Boddy 1988). We fit a curve to the empirical results of running the planner on multiple problems and use the curve to predict the results for new problems. This curve is used to supply the information needed to make each of the idealized algorithms operational.

The expected improvement in the tour length, as a function of computation time for the two-opt algorithm, is accurately described by equation 1. In the equation, $n$ is the number of locations and $t$ is the elapsed time spent improving the tour. The parameters $A$ and $B$ are determined empirically and depend on the implementation and the computer used. $A$ is a scaling factor that depends on the units used to measure distance; measuring distances in meters rather than kilometers would increase $A$ by a factor of 1000. $B$ depends on the relative speed of the computer; doubling the processing speed would double $B$.

$$I(t, n) = nA\left(1 - e^{-Bt}\right) \quad (1)$$

We can differentiate this equation to get the rate of improvement as a function of the elapsed time:

$$\dot{I}(t, n) = nABe^{-Bt} \quad (2)$$

The original idealized algorithm suggests either computing or acting, whichever one produces the best expected result at a given point in time. This corresponds to an execution control strategy that commences execution when the rate of tour improvement $\dot{I}(t, n)$ falls below the rate at which the robot can execute the tour, $R_{exe}$:

$$\dot{I}(t, n) = nABe^{-Bt} = R_{exe} \quad (3)$$

If the rate of execution ($R_{exe}$) and the parameters of the curve ($A$ and $B$) are known, then the start time is given by:

$$t_{start} = \frac{1}{B}\ln\!\left(\frac{nAB}{R_{exe}}\right) \quad (4)$$

That is, the robot courier should optimize until $t_{start}$ and then begin execution. We will call this the anytime method for deciding when to begin execution, since it is based solely on the performance curve of the two-opt algorithm, an anytime algorithm.
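As an illustration of how the anytime method could be operationalized, the short sketch below evaluates the performance curve and computes the start time of equation 4. The parameter values are made up for the example; in practice, $A$ and $B$ would be obtained by fitting equation 1 to empirical runs of the two-opt improver.

```python
import math

def improvement(t, n, A, B):
    """Equation 1: expected reduction in tour length after t seconds of two-opt."""
    return n * A * (1.0 - math.exp(-B * t))

def improvement_rate(t, n, A, B):
    """Equation 2: instantaneous rate of tour improvement."""
    return n * A * B * math.exp(-B * t)

def anytime_start_time(n, A, B, r_exe):
    """Equation 4: optimize until this time, then begin executing the tour."""
    return math.log(n * A * B / r_exe) / B

# Example with made-up parameters: 200 locations, assumed fitted A and B, and
# an execution rate of 0.5 distance units per second.
if __name__ == "__main__":
    n, A, B, r_exe = 200, 2.0, 0.01, 0.5
    t = anytime_start_time(n, A, B, r_exe)
    print(f"start execution after {t:.1f}s, when the improvement rate "
          f"has fallen to {improvement_rate(t, n, A, B):.3f} (= r_exe)")
```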
Step-Choice Algorithm
The decision algorithm based on the original idealized algorithm ignores the possible overlap of planning and execution. Taking this overlap into account gives a different decision procedure that can improve performance under some circumstances. When taking the overlap into account, our algorithm will make a choice about whether to begin execution of the next step or delay execution until the planner has had more time to improve the tour. We call our algorithm the step-choice algorithm because it makes discrete choices at each step rather than calculating a $t_{start}$ before beginning execution. In this section, we give the mathematical basis for the step-choice algorithm. We begin by outlining some assumptions that we make in order to simplify the problem. We then give the formula for deciding when to begin execution of the next action. This formula cannot be solved in closed form, but can be approximated. We conclude this section with a discussion of the implications of the step-choice algorithm.
In order to simplify the decision problem for the robot courier, we introduce three assumptions. We assume that computation is continuous and interruptible and that the expected future behaviour is predicted by the performance curve given in equation 1. This assumption is also implicitly used by the anytime method given in the previous section. In addition, we also assume that execution actions are atomic and that, once started, they must be completed. The reason for using this
assumption is to avoid complications associated with
interrupting an action in order to start another. Typically, interrupting an action incurs some communications overhead and some effort to put things in a safe
state before abandoning the action to begin another.
By treating actions as atomic, we avoid the problem
of modeling the cost of interrupting an action. We
also avoid considering optimizations that involve interrupting the action. Computing the value of such
an optimization may depend on how much of the action has been accomplished, which may introduce more
communications overhead between the planner and the
executer. For example, the distance from the current
location to another location is constantly changing as
the robot courier executes part of a tour. As a result,
the value of interrupting the current action and heading towards a new location is also constantly changing
and the robot would have to know where it was going
to be when an action was interrupted in order to evaluate the usefulness of interrupting an action. Finally,
we assume that execution does not affect the amount
of computation available for tour improvement. This
is realistic if tour improvement is carried out on a separate processor from the one used to monitor execution. If the same processor is used for both tour improvement and execution monitoring, then the assumption is almost true if the monitoring overhead is low. If this assumption does not hold, then our algorithm can be easily modified to account for the degradation in planner performance while executing. We make this same assumption when running the anytime algorithm and comparing it to the step-choice algorithm in the next
section.
Our revised, idealized algorithm suggests that the robot should begin execution when the expected rate of improvement due to tour planning alone is less than the expected rate of improvement due to planning and executing. As with the anytime algorithm, we characterize the expected improvement in the tour using a performance profile. The rate of tour improvement for computing alone is given in equation 5. When the robot chooses to act and compute, the robot is committing to that choice for the duration of the action, given our assumptions about atomic actions. The appropriate rate to use when evaluating this option is therefore not the instantaneous rate but the average rate over the duration of the action. Equation 6 gives the average expected rate of accomplishment for acting and computing. In this equation, $\Delta t$ is the duration of the first action in the tour.

$$\text{Compute-only rate} = \dot{I}(t, n) = nABe^{-Bt} \quad (5)$$

$$\text{Compute-and-act rate} = R_{exe} + \frac{1}{\Delta t}\int_{t}^{t+\Delta t} \dot{I}(\tau, n-1)\, d\tau \quad (6)$$

Equating 5 and 6 and solving for $\Delta t$ gives an expression for the time duration of a move the robot would be willing to execute as a function of time spent computing.
Unfortunately, the resulting expression has no closed-form solution (the result involves solving the omega function). If the integral in equation 6 is replaced by a linear approximation,

$$\text{Compute-and-act rate} \approx R_{exe} + \tfrac{1}{2}\left(\dot{I}(t, n-1) + \dot{I}(t+\Delta t, n-1)\right),$$

an expression for $\Delta t$ can be found (equation 7):

$$\Delta t = -t - \frac{1}{B}\ln\!\left(\frac{(n+1)ABe^{-Bt} - 2R_{exe}}{(n-1)AB}\right) \quad (7)$$

Using the linear approximation over-estimates the rate of computational improvement of the move-and-compute option and biases the robot towards moving. On the other hand, equation 6 only approximates the rate of optimization after taking a step: it under-estimates the effect of reducing the size of the optimization problem. Since two-opt is an $O(n^2)$ algorithm per exchange, reducing the problem size by 1 has more than the linear effect that equation 1 suggests. Since the estimates err in opposite directions, they tend to cancel each other out.
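To make equation 7 concrete, here is a small sketch, using the same made-up parameter values as the earlier anytime example, that evaluates $\Delta t$ and treats a non-positive logarithm argument as an unbounded acceptable step, matching the discussion below.

```python
import math

def acceptable_step_duration(t, n, A, B, r_exe):
    """Equation 7: longest move the robot should be willing to start after
    t seconds of optimization, under the linear approximation. Returns
    math.inf when the log argument is non-positive, i.e. when any remaining
    step should be executed."""
    arg = ((n + 1) * A * B * math.exp(-B * t) - 2.0 * r_exe) / ((n - 1) * A * B)
    if arg <= 0.0:
        return math.inf
    return -t - math.log(arg) / B

# With these illustrative values, only short moves qualify early on; as
# optimization time grows, the acceptable step grows without bound.
if __name__ == "__main__":
    n, A, B, r_exe = 200, 2.0, 0.01, 0.5
    for t in (0.0, 60.0, 120.0, 180.0):
        print(t, acceptable_step_duration(t, n, A, B, r_exe))
```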
Examining the form of equation 7, it is clear that $\Delta t$ is infinite if the argument to the $\ln()$ function reaches zero. This corresponds to an expected rate of tour improvement of about twice the rate of path execution (equation 8):

$$(n+1)ABe^{-Bt} = 2R_{exe}, \quad \text{i.e.,} \quad \dot{I}(t, n) \approx 2R_{exe} \quad (8)$$

At this point, the marginal rate of only computing is zero and the robot should execute any size step remaining in the plan. This suggests that $t_{start}$ should be chosen so that the anytime strategy waits only until the rate of improvement falls below twice the rate of execution ($\dot{I}(t, n) = nABe^{-Bt} = 2R_{exe}$). We call this the "Anytime-2" strategy.
In the range where $\Delta t$ is defined, increasing the rate of execution ($R_{exe}$) increases the size of an acceptable step. Similarly, increasing the rate of computation ($B$) decreases the size of the acceptable action. In the limit, the acceptable step correctly favours either always computing or always acting, as suggested by the relative speed of computing and execution.
In between the extremes, the robot courier will take an action whenever the time to perform it is less than $\Delta t$, and does not necessarily delay execution. Does it make sense to perform an action even when the planner is currently improving the plan more than twice as fast as the robot can execute it? It does if the action is small enough. The amount of potential optimization lost by taking an action is limited by the size of the action. At best, an action could be reduced to zero time (or cost) with further planning. The opportunity cost of forgoing more planning by executing an action is thus limited by the size of the action. Performing short actions at the beginning of the tour removes them from the optimization process and has the advantage that the optimizer is faced with a problem with fewer locations, but nearly the same length tour and nearly the same opportunity for improvement.
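Putting the pieces together, the following sketch shows the overall shape of a step-choice control loop. It is a schematic under the stated assumptions (atomic actions, continuous and interruptible optimization), not the implementation used in the experiments: optimizer, next_action_duration, and execute_next_action are hypothetical stand-ins for the robot's tour improver and execution interface, and acceptable_step_duration is equation 7, as in the previous sketch.

```python
import math

def acceptable_step_duration(t, n, A, B, r_exe):
    """Equation 7, as in the previous sketch (math.inf when unbounded)."""
    arg = ((n + 1) * A * B * math.exp(-B * t) - 2.0 * r_exe) / ((n - 1) * A * B)
    return math.inf if arg <= 0.0 else -t - math.log(arg) / B

def step_choice_loop(tour, optimizer, next_action_duration,
                     execute_next_action, A, B, r_exe, time_step=1.0):
    """Schematic step-choice controller: optimization runs continuously and
    the next move is started as soon as its duration fits within delta_t."""
    t = 0.0                                   # elapsed optimization time
    while tour:                               # locations not yet visited
        n = len(tour)
        delta_t = (math.inf if n < 2
                   else acceptable_step_duration(t, n, A, B, r_exe))
        if next_action_duration(tour) <= delta_t:
            # Commit to the move; planning continues on the remaining tour
            # while the robot drives (overlapping planning and execution).
            execute_next_action(tour)
        # Optimization is assumed continuous and interruptible.
        optimizer(tour, time_step)
        t += time_step
```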
Results and Limits

Using an on-line decision strategy that executes actions that take less than $\Delta t$ allows the robot to be opportunistic. If at any point in the tour-improvement process the current next move is smaller than $\Delta t$, then the agent can begin moving immediately. In this way, the agent takes advantage of the planner's performance on the given problem rather than relying only on its expected performance. If the robot gets lucky and the initial random tour contains short actions at the start of the tour, then the robot can immediately begin execution. This suggests that the robot might be better off generating an initial tour using a greedy algorithm and then using two-opt, rather than using two-opt on an initial random tour. In our empirical comparison, we ignored this potential optimization to avoid biasing the results in favour of the step-choice algorithm.
In previously reported results, we have shown that using the step-choice algorithm produces better performance than meta-level control schemes based on Good's algorithm (Goodwin 1994). In a test using 100 examples, each consisting of 200 randomly generated locations, the step-choice algorithm gave a 4.5% performance gain when compared to the anytime-algorithm approach, despite incurring a 2.5% overhead for performing on-line meta-level control.
The empirical results from the robot-courier tour planner show a slight improvement due to taking the overlapping of planning and execution into account when deciding when to begin execution. The question we consider in this section is how much of an improvement is possible. We present an argument that the improvement is limited to doubling performance by reducing the total time by a factor of two.
Figure 1. (Figure not reproduced; the horizontal axis is labeled Time.)
Consider a task where the planner is improving a plan at a rate just slightly faster than the rate at which the plan can be executed, $\dot{I}(t, n) = R_{exe} + \epsilon_1$ (figure 1). Good's original algorithm would continue to optimize the plan, but not execute any part of it. The step-choice algorithm would attempt to both plan and execute. The rate of task achievement is then $\dot{I}(t, n-1) + R_{exe} = 2R_{exe} + \epsilon_1 - \epsilon_2$, where $\epsilon_2 = \dot{I}(t, n) - \dot{I}(t, n-1)$. The ratio of the rates of
task accomplishment approaches 2 as $\epsilon_1$ and $\epsilon_2$ approach zero. If the rate of plan improvement remains relatively constant over the duration of the task, then the revised algorithm performs the task almost twice as quickly as the original algorithm. It is only when the two rates, improvement and execution, are almost equal that considering overlapping has a significant effect. If either rate is much higher than the other, then the relative improvement is not as significant.
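As a quick numeric check of this argument, with arbitrarily chosen small values for $\epsilon_1$ and $\epsilon_2$, the ratio of the two rates is easy to evaluate:

```python
# Rates of task accomplishment for a made-up execution rate and small epsilons:
# step-choice (plan and execute concurrently) vs. optimize-only (Good's choice).
r_exe, eps1, eps2 = 1.0, 0.05, 0.02
step_choice_rate = 2 * r_exe + eps1 - eps2    # I'(t, n-1) + R_exe
optimize_only_rate = r_exe + eps1             # I'(t, n) = R_exe + eps1
print(step_choice_rate / optimize_only_rate)  # ~1.93; tends to 2 as the epsilons shrink
```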
Discussion
The fact that overlapping execution with optimization can double performance is not surprising, since we are basically splitting a task into two parts and working on each one separately. What is surprising is that the straightforward strategy of waiting until the rate of optimization falls below the rate of execution can perform only half as well as the optimal strategy. This illustrates that we must be careful of the assumptions we make when designing meta-level control algorithms.
We conclude with a discussion of some of the factors that should be considered when deciding when to commence execution, beyond the obvious factors that we have already considered. We also consider some of the extensions needed to deal with less benign and probabilistic domains.
The step-choice meta-level control algorithm works for the simplified robot courier because the cost of undoing an action is the same as the cost of performing the action. In fact, because of the triangle inequality, it is rare that the action has to be fully undone. The robot courier would only retrace its path if the locations were co-linear and the new location was in the opposite direction, since the shortest route proceeds directly to the next location. At the other end of the spectrum are domains such as chess playing, where it is impossible to undo a move². In between, there are many domains with a distribution of recovery costs.

²You may be able to move a piece back to its previous location, but only after your opponent has had an opportunity to move, so the board will not be in the original configuration unless your opponent also undoes his move.
Consider, as an example, leaving for the office in the morning. Suppose that it is your first day at work, so there is no pre-compiled, optimized plan yet. The first step in your plan is to walk out the door and then to walk to the car. Suppose you do this and, while you are walking, your planner discovers that it would be better to bring your briefcase. The cost of recovery is relatively cheap: just walk back into the house and get it. On the other hand, if you forgot your keys, you may now be locked out of the house. In this sort of environment, using only the cost to do an action may not be the best strategy. The decision to begin execution must consider not only the time and resources needed to execute an action but also the time and resources needed to undo an action and the probability that the action will need to be undone.

In probabilistic domains, we face additional complications since we cannot necessarily predict the outcome of an action. One of the reasons for executing actions in a probabilistic environment is to gather information about the resulting state, which allows us to limit the number of states that need to be considered. However, the ability to predict the likely results of an action is useful for scheduling computation during execution. Computation should be focused on finding the best actions for the most likely next states. In (Goodwin 1994), we give an example where the expected performance increases when a robot operating in a probabilistic environment begins taking a step that may fail and continues planning as if the step had succeeded. If the action succeeds, the robot continues with its plan. If the action fails, the robot throws away the plan from the anticipated state and plans from its current state. Again, this works because no failures are catastrophic and recovery costs are small.
References
Rodney Brooks. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, RA-2:14-23, April 1986.

Thomas Dean and Mark Boddy. An analysis of time dependent planning. In Proceedings of the Seventh National Conference on Artificial Intelligence (AAAI-88), pages 49-54. AAAI, 1988.

I. J. Good. Twenty seven principles of rationality. In Proceedings of the Symposium on the Foundations of Statistical Inference. Rene Descartes Foundation, March 1971.

Richard Goodwin. Reasoning about when to start acting. In K. Hammond, editor, Proceedings of the Second International Conference on Artificial Intelligence Planning Systems (AIPS-94), pages 86-91, Chicago, Illinois, June 1994.

Richard E. Korf. Real-time heuristic search: First results. In Proceedings of the Sixth National Conference on Artificial Intelligence (AAAI-87), pages 133-138, 1987.

Stuart Russell and Eric Wefald. Do the Right Thing. MIT Press, 1991.

Marcel Schoppers. Universal plans for reactive robots in unpredictable environments. In Proceedings of the Tenth International Joint Conference on Artificial Intelligence (IJCAI-87), Milan, Italy, August 1987. IJCAI.

Herbert Simon and Joseph Kadane. Optimal problem-solving search: All-or-nothing solutions. Technical Report CMU-CS-74-41, Carnegie Mellon University, Pittsburgh, PA, USA, December 1974.

Lucy Suchman. Plans and Situated Actions: The Problem of Human Machine Communication. Cambridge University Press, 1987.