Optimal Schedules for Parallelizing Anytime Algorithms

From: AAAI Technical Report FS-01-04. Compilation copyright © 2001, AAAI (www.aaai.org). All rights reserved.

Lev Finkelstein, Shaul Markovitch and Ehud Rivlin
Computer Science Department
Technion, Haifa 32000
Israel
{lev,shaulm,ehudr}@cs.technion.ac.il
Abstract
The performance of anytime algorithms having a non-deterministic nature can be improved by simultaneously solving several instances of the algorithm-problem pair. These pairs may include different instances of a problem (such as starting from a different initial state), different algorithms (if several alternatives exist), or several instances of the same algorithm (for non-deterministic algorithms). A straightforward parallelization, however, usually results in only a linear speedup, while more effective parallelization schemes require knowledge about the problem space and/or the algorithm itself. In this paper we present a general framework for parallelization which uses only minimal information about the algorithm (namely, its probabilistic behavior, described by a performance profile), and obtains a super-linear speedup by optimal scheduling of different instances of the algorithm-problem pairs. We show a mathematical model for this framework, present algorithms for optimal scheduling, and demonstrate the behavior of optimal schedules for different kinds of anytime algorithms.
Introduction
Assume that we have two learning systems which need to learn a concept with a given success rate. One of them learns fast, but needs some preprocessing at the beginning. The other works more slowly, but requires no preprocessing. A question arises: can we benefit from using both learning systems to solve one learning task on a single-processor machine? What should be the order of their application? Does the situation change if the systems have a certain probability of failing?

Another example is a PROLOG expert system. Assume we are required to handle a query of the form Q1 ∨ Q2. We can parallelize this query and try to solve the subproblems Q1 and Q2 independently, maybe on a single processor. We would like to minimize the average time required for the query and, in the case of two processors, the time consumed by both of them. The question is: is it beneficial to parallelize the query? If not, which one of the subqueries should be handled first? If yes, should we solve them simultaneously, or should we use some special schedule?
What is common to the above examples?

• We are trying to benefit from the uncertainty in solving more than one instance of the algorithm-problem pair. We can use different algorithms (in the first example) and different problems (in the second example). For non-deterministic algorithms, we can also use different instances of the same algorithm.

• Each process is executed with the purpose of satisfying a given goal predicate. The task is considered accomplished when one of the instances is solved.

• If the goal predicate is satisfied at time t*, then it is also satisfied at any time t > t*. This property is equivalent to the utility monotonicity of anytime algorithms (Dean & Boddy 1988; Horvitz 1987), with utility restricted to Boolean values.

• The goal is to provide a schedule which minimizes the expected cost, possibly under some constraints (for example, the processes may share resources). Such a problem definition is typical of bounded-rational reasoning (Simon 1982; Russell & Wefald 1991).

This problem resembles contract algorithms (Russell & Zilberstein 1991; Zilberstein 1993). For contract algorithms, the task is to construct the algorithm providing the best possible solution given a time of execution. In our case, the task is to construct the (parallel) algorithm providing the best possible solution given a required utility.

Simple parallelization, with no information exchange between the processes, may speed up the process due to high diversity in solution times. For example, Knight (1993) showed that using many reactive instances of RTA* search (Korf 1990) is more beneficial than using a single deliberative RTA* instance. Yokoo & Kitamura (1996) used several search agents in parallel, with agent rearrangement after pre-given periods of time. Janakiram, Agrawal, & Mehrotra (1988) showed that for many common distributions of solution time, simple parallelization leads to at most linear speedup, while communication between the processors can result in a super-linear speedup.

A super-linear speedup is usually obtained as a result of information exchange between the processes (as in the work of Clearwater, Hogg, & Huberman (1992) devoted to cryptarithmetic problems) or as a result of a pre-given division of the search space (as in the works of Kumar and
Rao (1987; 1987; 1992) devoted to parallelizing standard
search algorithms).
Unfortunately, the techniques requiring a pre-given division of the search space or online communication between processes are usually highly domain-dependent. An interesting approach in a domain-independent direction has been investigated by Huberman, Lukose, & Hogg (1997). The authors proposed to provide the processes (agents) with different amounts of resources ("portfolio" construction), which made it possible to reduce both the expected resource usage and its variance. Their experiments showed the applicability of this approach to many hard computational problems. In the field of anytime algorithms, similar work has mostly concentrated on scheduling different anytime algorithms or decision procedures in order to maximize the overall utility (as in the work of Boddy & Dean (1994)). Their settings, however, are different from those presented above.
The goal of this research is to develop algorithms that design an optimal scheduling policy based on the statistical characteristics of the process(es). We present a formal framework for scheduling parallel anytime algorithms and study two cases: shared and non-shared resources. The framework assumes that we know the probability of the goal condition being satisfied as a function of time (a performance profile (Simon 1955; Boddy & Dean 1994) restricted to Boolean quality values). We analyze the properties of optimal schedules for the suspend-resume model and show that in most cases an extension of the framework to intensity control does not decrease the optimal cost. For the case of shared resources (a single-processor model), we show an algorithm for building optimal schedules. Finally, we demonstrate experimental results for the optimal schedules.
Motivation: a simple example
Before starting the formal discussion, we would like to give a simple example. Assume that two instances of DFS with random tie-breaking are applied to the very simple search space shown in Figure 1. We assume that each process uses a separate processor. There are two paths to the goal, one of length 10 and one of length 40. DFS has no guiding heuristic, and therefore its behavior at A is not determined. When one of the instances finds the solution, the task is considered accomplished.
Figure 1: A simple search task: two instances of DFS search for a path from A to B. Scheduling the processes may reduce cost.
We have two utility components: the time, which is the elapsed time required for the system to find a solution, and the resources, which is the total CPU time consumed by both processes. Assume first that we have control only over the launch time of the processes. If the two search processes start together, the expected time usage will be 10 × 3/4 + 40 × 1/4 = 17.5 units, while the expected resource usage will be 20 × 3/4 + 80 × 1/4 = 35 units. If we apply only one instance, both time and resource usage will be 10 × 1/2 + 40 × 1/2 = 25 units. If one of the processes waits for 10 time units, the expected time usage will be 10 × 1/2 + 20 × 1/4 + 40 × 1/4 = 20, and the expected resource usage will be 10 × 1/2 + 30 × 1/4 + 70 × 1/4 = 30. Intuitively, we can vary the delay to push the tradeoff towards time or resource usage as we like.
Assume now that we are allowed to suspend and resume the two processes after they start. This additional capability brings further improvement. Indeed, assume that the first process is active for the first 10 time units, then it stops and the second process is active for the next 10 time units, and then the second process stops and the first process continues the execution¹. Both the expected time and resource usage will be 10 × 1/2 + 20 × 1/4 + 50 × 1/4 = 22.5 units. If we consider the cost to be a sum of time and resource usage, it is possible to show that this result is better than that of any delay-based schedule.

¹In this scenario at each moment only one process is active, so we can use the same processor.
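To make the arithmetic above concrete, the following short sketch (ours, not part of the original paper) enumerates the four equally likely tie-breaking outcomes and reproduces the expected time and resource usage of the three strategies:

```python
# Illustrative sketch: expected time/resource cost for two DFS instances
# on the Figure 1 search space. Each instance independently picks the
# short path (length 10) or the long one (length 40) with probability 1/2.
from itertools import product

LENGTHS = (10, 40)

def simultaneous(a, b):
    # both start at t=0 on separate processors; done when the first finishes
    t = min(a, b)
    return t, 2 * t                      # (elapsed time, total CPU consumed)

def delayed(a, b, d=10):
    # the second instance is launched d time units after the first
    t = min(a, d + b)
    return t, t + max(0, t - d)          # first runs for t, second for t - d

def suspend_resume(a, b):
    # first runs on [0,10], second on [10,20], then the first resumes
    if a == 10:
        return 10, 10
    if b == 10:
        return 20, 20
    return 50, 50                        # both long: first finishes at 10+10+30

for name, policy in [("simultaneous", simultaneous),
                     ("delay 10", delayed),
                     ("suspend-resume", suspend_resume)]:
    outcomes = [policy(a, b) for a, b in product(LENGTHS, repeat=2)]
    e_time = sum(t for t, _ in outcomes) / 4
    e_res = sum(r for _, r in outcomes) / 4
    print(f"{name:15s} E[time] = {e_time:5.1f}  E[resource] = {e_res:5.1f}")
```

Running it prints 17.5/35, 20/30, and 22.5/22.5, matching the figures above.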
A framework for parallelization scheduling
In this section we formalize the intuitive description of parallelization scheduling. The first part of this framework is similar to our framework for monitoring anytime algorithms (Finkelstein & Markovitch 2001).
Let $S$ be a set of states, $t$ be a time variable with non-negative real values, and $\mathcal{A}$ be a random process such that each realization (trajectory) $A(t)$ of $\mathcal{A}$ represents a mapping from $\mathbb{R}^+$ to $S$. Let $G : S \to \{0, 1\}$ be a goal predicate, where 0 corresponds to False and 1 corresponds to True. Let $\mathcal{A}$ be monotonic over $G$, i.e., for each trajectory $A(t)$ of $\mathcal{A}$, the function $G_A(t) = G(A(t))$ is a non-decreasing function. Under the above assumptions, $G_A(t)$ is a step function with at most one discontinuity point, which we denote by $t^*_{A,G}$ (this is the first point after which the goal predicate is true). If $G_A(t)$ is always 0, we say that $t^*_{A,G}$ is not defined. Therefore, we can define a random variable $\xi = \xi_{\mathcal{A},G}$ which, for each trajectory $A(t)$ of $\mathcal{A}$ with $t^*_{A,G}$ defined, corresponds to $t^*_{A,G}$. The behavior of $\xi$ can be described by its distribution function $F(t)$. At the points where $F(t)$ is differentiable, we use the probability density $f(t) = F'(t)$.

This scheme resembles the one used in anytime algorithms. The goal predicate can be viewed as a special case of the quality measurement used in anytime algorithms, and the requirement for its non-decreasing value is a standard requirement of these algorithms. The trajectories of $\mathcal{A}$ correspond to conditional performance profiles (Zilberstein & Russell 1992; Zilberstein 1993).

In practice, not every trajectory of $\mathcal{A}$ leads to goal predicate satisfaction even after infinitely large time. That means that the set of trajectories where $t^*_{A,G}$ is undefined is not necessarily of measure zero. That is why we define the probability of success $p$ as the probability of $A(t)$ with $t^*_{A,G}$ defined².

²Another way to express the possibility that the process will not stop at all is to use profiles that approach $1 - p$ when $t \to \infty$. We prefer to use $p$ explicitly because, in order for $F$ to be a distribution function, it must satisfy $\lim_{t\to\infty} F(t) = 1$.
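As a small illustration of this convention (ours; the exponential profile is an arbitrary assumption), $F$ stays a proper distribution function while the chance that the goal is ever satisfied is carried separately by $p$:

```python
# The goal is met within subjective time t with probability p * F(t);
# as t grows this approaches p rather than 1.
import math

p = 0.8
F = lambda t: 1.0 - math.exp(-0.5 * t)   # assumed exponential profile
print([round(p * F(t), 3) for t in (1, 5, 20)])
```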
Assume now that we have a system of $n$ random processes $\mathcal{A}_1, \ldots, \mathcal{A}_n$ with corresponding distribution functions $F_1, \ldots, F_n$ and goal predicates $G_1, \ldots, G_n$. We define a schedule of the system as a set of binary functions $\{\theta_i\}$, where at each moment $t$ the $i$-th process is active if $\theta_i(t) = 1$ and idle otherwise. We refer to this scheme as suspend-resume scheduling.
A possible generalization of this framework is to extend the suspend-resume control to a more refined mechanism allowing us to determine the intensity with which each process acts. For software processes, that means varying the fraction of CPU usage; for tasks like robot navigation, it implies changing the speed of the robots. Mathematically, using intensity control is equivalent to replacing the binary functions $\theta_i(t)$ with continuous functions with a range between 0 and 1.
Note that scheduling makes the term time ambiguous. On one hand, we have the subjective time of each process, which is consumed only when the process is active; this kind of time corresponds to a resource consumed by the process. On the other hand, we have an objective time measured from the point of view of an external observer. The performance profile of each algorithm is defined over its subjective time, while the cost function (see below) may use both kinds of time. Since we are using several processes, all the formulae in this paper are based on the objective time.
Let us denote by $\sigma_i(t)$ the total time that process $i$ has been active before $t$. By definition,

$$\sigma_i(t) = \int_0^t \theta_i(x)\,dx \qquad (1)$$

(for the sake of readability, we omit $t$ in $\sigma_i(t)$). Under the suspend-resume model assumptions, the $\sigma_i$ must be differentiable functions with derivatives 0 or 1, which ensures correct values for $\theta_i$. Under the intensity control assumptions, the $\sigma_i$ must be differentiable functions with derivatives between 0 and 1.

In practice, $\sigma_i(t)$ provides the mapping from the objective time $t$ to the subjective time of the $i$-th process. Since $\theta_i$ can be obtained from $\sigma_i$ by differentiation, we often describe schedules by $\{\sigma_i\}$ instead of $\{\theta_i\}$.

The processes $\{\mathcal{A}_i\}$ with goal predicates $\{G_i\}$ running under schedules $\{\sigma_i\}$ result in a new process $\mathcal{A}$ with a goal predicate $G$. $G$ is the disjunction of the $G_i$ ($G(t) = \bigvee_i G_i(t)$), and therefore $\mathcal{A}$ is monotonic over $G$. We denote the distribution function of the corresponding random variable $\xi_{\mathcal{A},G}$ by $F_n(t, \sigma_1, \ldots, \sigma_n)$, and the corresponding distribution density by $f_n(t, \sigma_1, \ldots, \sigma_n)$.

Assume that we are given a monotonic non-decreasing cost function $u(t, t_1, \ldots, t_n)$, which depends on the objective time $t$ and the subjective times $t_i$ per process. Since the subjective times can be calculated by $\sigma_i(t)$, we actually have $\tilde{u} = u(t, \sigma_1(t), \ldots, \sigma_n(t))$. The expected cost of a schedule $\{\sigma_i\}$ can therefore be expressed as³

$$E_u(\sigma_1, \ldots, \sigma_n) = \int_0^{+\infty} u(t, \sigma_1, \ldots, \sigma_n)\, f_n(t, \sigma_1, \ldots, \sigma_n)\,dt. \qquad (2)$$

³The generalization to the case where the probability of success $p$ is not 1 is considered at the end of the next section.

We consider two alternative setups regarding resource sharing between the processes:

1. The processes share resources on a mutual exclusion basis. That means that exactly one process can be active at each moment, and the processes will be active one after another until the goal is reached by one of them. In this case the sum of the derivatives of the $\sigma_i$ is always one⁴. The case of shared resources corresponds to the case of several algorithms running on a single processor.

2. The processes are fully independent: there are no additional constraints on the $\sigma_i$. This case corresponds to $n$ independent algorithms (e.g., running on $n$ processors).

⁴This fact is obvious for the case of suspend-resume control; for intensity control it is reflected in Lemma 2.

Our goal is to find a schedule which minimizes the expected cost (2) under the corresponding constraints.

Suspend-resume based scheduling

In this section we consider the case of suspend-resume based control (the $\sigma_i$ are continuous functions with derivatives 0 or 1).

Claim 1 The expressions for the goal-time distribution $F_n(t, \sigma_1, \ldots, \sigma_n)$ and the expected cost $E_u(\sigma_1, \ldots, \sigma_n)$ are as follows:

$$F_n(t, \sigma_1, \ldots, \sigma_n) = 1 - \prod_{i=1}^{n} (1 - F_i(\sigma_i)), \qquad (3)$$

$$E_u(\sigma_1, \ldots, \sigma_n) = \int_0^{\infty} \Big( u'_t + \sum_{i=1}^{n} \sigma_i'\, u'_{t_i} \Big) \prod_{i=1}^{n} (1 - F_i(\sigma_i))\,dt. \qquad (4)$$

The proofs in this paper are omitted due to the lack of space.

In this section we assume that the total cost is a linear combination of the objective time and all the subjective times, and that the subjective times have the same weight:

$$u(t, \sigma_1, \ldots, \sigma_n) = at + b \sum_{i=1}^{n} \sigma_i(t). \qquad (5)$$

This assumption is made to keep the expressions more readable. The solution process remains the same for the general form of $u$.

In this case our minimization problem is

$$E_u(\sigma_1, \ldots, \sigma_n) = \int_0^{\infty} \Big( a + b \sum_{i=1}^{n} \sigma_i' \Big) \prod_{j=1}^{n} (1 - F_j(\sigma_j))\,dt \to \min, \qquad (6)$$

and for the case of shared resources on a mutual exclusion basis

$$E_u(\sigma_1, \ldots, \sigma_n) = (a + b) \int_0^{\infty} \prod_{j=1}^{n} (1 - F_j(\sigma_j))\,dt \to \min. \qquad (7)$$

In the rest of this section we show a formal solution (necessary conditions and the algorithm) for the framework with shared resources. For the sake of clarity, we start with two processes, present the formulae and the algorithm, and then generalize the solution to an arbitrary number of processes.

Necessary conditions for an optimal solution for two processes

Let $A_1$ and $A_2$ be two processes sharing a resource. While working, one process locks the resource(s), and the other is necessarily idle. We can show that such a dependency yields a strong constraint on the behavior of the processes, which allows us to build an effective algorithm for solving the minimization problem.

Let $A_1$ be active in the intervals $[t_{2k}, t_{2k+1}]$, and $A_2$ be active in the intervals $[t_{2k+1}, t_{2k+2}]$. That means that the processes work alternately, switching at the time points $t_j$ (if $A_2$ is active first then $t_1 = t_0$).

Let us denote by $\xi_k$ the total time that $A_1$ has been active before $t_k$, and by $\eta_k$ the total time that $A_2$ has been active before $t_k$. We can see that

$$\xi_k + \eta_k = t_k - t_0 = t_k. \qquad (8)$$

Since $A_1$ is active in the intervals $[t_{2k}, t_{2k+1}]$, the functions $\sigma_1$ and $\sigma_2$ on these intervals have the form

$$\sigma_1(t) = t - t_{2k} + \sigma_1(t_{2k}) = t - \eta_{2k}, \qquad \sigma_2(t) = \sigma_2(t_{2k}) = \eta_{2k}. \qquad (9)$$

Similarly, $A_2$ is active in the intervals $[t_{2k+1}, t_{2k+2}]$. Therefore, on these intervals $\sigma_1$ and $\sigma_2$ are defined as

$$\sigma_1(t) = \sigma_1(t_{2k+1}) = \xi_{2k+1}, \qquad \sigma_2(t) = t - t_{2k+1} + \sigma_2(t_{2k+1}) = t - \xi_{2k+1}. \qquad (10)$$

We can reduce the number of variables by denoting $\zeta_{2k+1} = \xi_{2k+1}$ and $\zeta_{2k} = \eta_{2k}$. Thus, the odd indices of $\zeta$ pertain to $A_1$, and the even indices to $A_2$. Note also that

$$\xi_{2k+1} = \xi_{2k+2} = t_{2k+1} - t_{2k} + \xi_{2k-1}, \qquad \eta_{2k+2} = \eta_{2k+3} = t_{2k+2} - t_{2k+1} + \eta_{2k}.$$

Substituting these values into (7) we obtain

$$E_u(\zeta_1, \ldots, \zeta_n, \ldots) = (a + b) \sum_{k=0}^{\infty} \Big[ (1 - F_2(\zeta_{2k})) \int_{\zeta_{2k-1}}^{\zeta_{2k+1}} (1 - F_1(x))\,dx + (1 - F_1(\zeta_{2k+1})) \int_{\zeta_{2k}}^{\zeta_{2k+2}} (1 - F_2(x))\,dx \Big] \qquad (11)$$

(to keep the general form, we assume $\zeta_{-1} = 0$). The minimization problem (11) is equivalent to the original problem (7), and the dependency between their solutions is described by (9) and (10). The only constraint for the new problem follows from the fact that the processes alternate for non-negative periods of time:

$$\zeta_0 = 0 \le \zeta_2 \le \ldots \le \zeta_{2k} \le \ldots, \qquad \zeta_1 \le \zeta_3 \le \ldots \le \zeta_{2k+1} \le \ldots \qquad (12)$$

By the Weierstrass theorem, (11) reaches its optimal values either when

$$\frac{dE_u}{d\zeta_k} = 0 \quad \text{for } k = 1, \ldots, n, \ldots, \qquad (13)$$

or on the border described by (12). However, for two processes we can, without loss of generality, ignore the border case. Thus, at each step the time spent by the processes is determined by (13). We can see that $\zeta_i$ appears in three subsequent terms of $E_u$, and by differentiating (11) by $\zeta_{2k}$ (and by $\zeta_{2k+1}$) we can prove the following theorem:

Theorem 1 (The chain theorem for two processes) The value of $\zeta_{i+1}$ for $i \ge 2$ can be computed for given $\zeta_{i-1}$ and $\zeta_i$ using the formulae

$$\frac{f_2(\zeta_{2k})}{1 - F_2(\zeta_{2k})} = \frac{F_1(\zeta_{2k+1}) - F_1(\zeta_{2k-1})}{\int_{\zeta_{2k-1}}^{\zeta_{2k+1}} (1 - F_1(x))\,dx}, \qquad (14)$$

$$\frac{f_1(\zeta_{2k+1})}{1 - F_1(\zeta_{2k+1})} = \frac{F_2(\zeta_{2k+2}) - F_2(\zeta_{2k})}{\int_{\zeta_{2k}}^{\zeta_{2k+2}} (1 - F_2(x))\,dx}. \qquad (15)$$

This theorem allows us to formulate an algorithm for building an optimal solution.

Optimal solution for two processes: an algorithm⁵

Assume that $A_1$ acts first ($\zeta_1 > 0$). From Theorem 1 we can see that the values of $\zeta_0 = 0$ and $\zeta_1$ determine the set of possible values for $\zeta_2$, the values of $\zeta_1$ and $\zeta_2$ determine the possible values for $\zeta_3$, and so on.

Therefore, a non-zero value for $\zeta_1$ provides us with a tree of possible values of $\zeta_i$. The branching factor of this tree is determined by the number of roots of (14) and (15). Each possible sequence $\zeta_1, \zeta_2, \ldots$ can be evaluated using (11). The series in that expression must converge, so we stop after a finite number of points. For each value of $\zeta_1$ we can find the best sequence using one of the standard search algorithms, such as Branch-and-Bound. Let us denote the value of the best sequence for each $\zeta_1$ by $E_u(\zeta_1)$. Performing global optimization of $E_u(\zeta_1)$ over $\zeta_1$ provides us with an optimal solution for the case where $A_1$ acts first.

Note that the value of $\zeta_1$ may also be 0 ($A_2$ acts first), so we need to compare the value obtained by optimization of $\zeta_1$ with the value obtained by optimization of $\zeta_2$ with $\zeta_1 = 0$.

⁵Due to the lack of space we present only the main idea of the algorithm.
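To illustrate the construction, here is a minimal numerical sketch (ours, not the authors' implementation) for two identical processes. It assumes the bimodal profile used later in the Experiments section, extends the chain greedily by taking the first root of (14)/(15) at each step (a full implementation would branch over all roots with Branch-and-Bound, as described above), scores a chain by the segment decomposition of (11) with $a + b = 1$, and grid-searches over $\zeta_1$:

```python
# Sketch of the chain-theorem construction (Theorem 1) for two identical
# processes with an assumed bimodal profile f = 0.5*N(4,0.5) + 0.5*N(12,0.5).
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm

MU1, MU2, SD, Z_MAX = 4.0, 12.0, 0.5, 15.0

F = lambda t: 0.5 * (norm.cdf(t, MU1, SD) + norm.cdf(t, MU2, SD))
f = lambda t: 0.5 * (norm.pdf(t, MU1, SD) + norm.pdf(t, MU2, SD))
S_int = lambda a, b: quad(lambda x: 1.0 - F(x), a, b)[0]  # integral of 1 - F

def next_zeta(z_prev, z_cur):
    """First root of (14)/(15) beyond z_prev, i.e. a candidate zeta_{i+1}
    given zeta_{i-1} = z_prev and zeta_i = z_cur (identical profiles)."""
    h = f(z_cur) / (1.0 - F(z_cur))       # hazard of the suspended process
    g = lambda z: (F(z) - F(z_prev)) - h * S_int(z_prev, z)
    zs = np.linspace(z_prev + 1e-2, Z_MAX, 120)
    vals = [g(z) for z in zs]
    for a, b, va, vb in zip(zs, zs[1:], vals, vals[1:]):
        if va * vb < 0:
            return brentq(g, a, b)
    return None                           # no root: the chain ends here

def chain(z1, max_len=8):
    zs = [0.0, z1]
    while len(zs) < max_len:
        z = next_zeta(zs[-2], zs[-1])
        if z is None:
            break
        zs.append(z)
    return zs

def cost(zs):
    """Expected cost (11) with a + b = 1: each activity segment contributes
    (1 - F(frozen process)) times the integral of 1 - F over the active span."""
    c = sum((1.0 - F(zs[i - 1])) * S_int(zs[i - 2] if i >= 2 else 0.0, zs[i])
            for i in range(1, len(zs)))
    # after the last switch, the newly active process runs until F is ~1
    return c + (1.0 - F(zs[-1])) * S_int(zs[-2], Z_MAX)

best = min((cost(chain(z1)), z1) for z1 in np.linspace(0.5, 6.0, 23))
print("best expected cost %.3f at zeta_1 = %.2f" % best)
```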
Necessary conditions for an optimal solution for an arbitrary number of processes

Assume now that we have $n$ processes $A_1, \ldots, A_n$ using shared resources. As before, let $t_0 = 0 \le t_1 \le \ldots \le t_j \le \ldots$ be the time points where the processes perform a switch. We also permit the case $t_i = t_{i+1}$, which gives us the possibility to assume (without loss of generality) that the processes are working in a round-robin manner, such that at $t_{kn+i}$ process $i$ becomes idle and process $i+1$ becomes active. Let $\mu_i$ stand for the index of the process which is active between $t_{i-1}$ and $t_i$, and let $\zeta_i$ be the total time spent by process $\mu_i$ before $t_i$. As in the case of two processes, there is a one-to-one correspondence between $\{t_i\}$ and $\{\zeta_i\}$, and the following equations hold:

$$\zeta_i - \zeta_{i-n} = t_i - t_{i-1}, \qquad (16)$$

$$\sum_{j=0}^{n-1} \zeta_{i-j} = t_i \quad \text{for } i \ge n. \qquad (17)$$

The first equation corresponds to the fact that the time between $t_{i-1}$ and $t_i$ is accumulated into the $\zeta$ values of the corresponding process, while the second equation states that at each switch the objective time of the system is equal to the sum of the subjective times of the processes. For the sake of uniformity we also denote

$$\zeta_{-n+1} = \ldots = \zeta_{-1} = \zeta_0 = 0.$$

For the following discussion we need to know the schedule of each process in the time interval $[t_{i-1}, t_i]$. By the construction of $\zeta_i$ we can see that on this interval the subjective time of process $k$ has the following form:

$$\sigma_k(t) = \begin{cases} \zeta_{k+(i-\mu_i)}, & k = 1, \ldots, \mu_i - 1,\\ t - t_{i-1} + \zeta_{i-n}, & k = \mu_i,\\ \zeta_{k+(i-\mu_i)-n}, & k = \mu_i + 1, \ldots, n. \end{cases} \qquad (18)$$

We need to minimize the expression given by (7). The only constraints are the monotonicity of the sequence of $\zeta$ values for each process, and therefore we obtain

$$\zeta_j \le \zeta_{j+n} \quad \text{for each } j. \qquad (19)$$

In order to get more compact expressions we denote $\zeta_{nk+l}$ by $\zeta^k_l$ (this notation does not imply $l \le n$). Given the expressions for $\sigma_i$, we can prove the following lemma:

Lemma 1 For a system of $n$ processes, the expression for the expected cost (7) can be rewritten as

$$E_u(\zeta_1, \ldots, \zeta_n, \ldots) = (a + b) \sum_{k=0}^{\infty} \sum_{i=1}^{n}\ \prod_{j=i+1}^{i+n-1} \big(1 - F_{\mu_j}(\zeta^{k-1}_j)\big) \int_{\zeta^{k-1}_i}^{\zeta^k_i} \big(1 - F_{\mu_i}(x)\big)\,dx. \qquad (20)$$

This lemma makes it possible to prove the chain theorem for an arbitrary number of processes:

Theorem 2 (The chain theorem) The value of $\zeta_j$ may either be $\zeta_{j-n}$ (the process skips its turn), or it can be computed given the previous $2n - 2$ values of $\zeta$ as a root of the stationarity condition of (20),

$$\frac{f_{\mu_m}(\zeta_m)}{1 - F_{\mu_m}(\zeta_m)} = \frac{\prod_{i=m-n+1}^{m-1} \big(1 - F_{\mu_i}(\zeta_i)\big) - \prod_{i=m+1}^{m+n-1} \big(1 - F_{\mu_i}(\zeta_i)\big)}{\sum_{l=m+1}^{m+n-1} \Big( \prod_{i=l-n+1,\, i \ne m}^{l-1} \big(1 - F_{\mu_i}(\zeta_i)\big) \Big) \int_{\zeta_{l-n}}^{\zeta_l} \big(1 - F_{\mu_l}(x)\big)\,dx}, \qquad (21)$$

taken for $m = j - n + 1$, in which $\zeta_j = \zeta_{m+n-1}$ is the only unknown. (For $n = 2$, (21) reduces to (14) and (15).)

Optimal solution for an arbitrary number of processes: an algorithm

As in the case of two processes, assume that $A_1$ acts first. By Theorem 2, given the values of $\zeta_0, \zeta_1, \ldots, \zeta_{2n-3}$ we can determine all the possibilities for the value of $\zeta_{2n-2}$ (either $\zeta_{n-2}$ if the process skips its turn, or one of the roots of (21)). Given the values up to $\zeta_{2n-2}$, we can determine the values for $\zeta_{2n-1}$, and so on.

The idea of the algorithm is similar to the algorithm for two processes. The first $2n - 2$ variables (including $\zeta_0 = 0$) determine the tree of possible values for $\zeta_i$. Optimization over the $2n - 3$ first variables, therefore, provides us with an optimal schedule (as before, we compare the results for the cases where the first $k < n$ variables are 0).

Optimal solution in the case of additional constraints

Assume now that the problem has additional constraints: the solution time is limited by $T$ (or each one of the processes has an upper limit $T_i$), and the probability of success $p_i$ of the $i$-th agent is not necessarily 1. It is possible to show that all the formulae used in the previous sections remain valid in the current settings, with two differences:

1. We use $p_j F_j$ instead of $F_j$ and $p_j f_j$ instead of $f_j$.

2. All the integrals are from 0 to $T$ instead of from 0 to $\infty$.

The first change may be easily incorporated into all the algorithms. The second condition adds a boundary condition to the chain theorems: for $n$ agents, for example, $\zeta_j$ may also take the value

$$\zeta_j = T - \sum_{l=1}^{n-1} \zeta_{j-l}.$$

This adds one more branch to the Branch-and-Bound algorithm and can be easily implemented.

Similar changes to the algorithms are performed in the case of a maximal allowed time $T_i$ per agent. In practice, we always use this limitation, setting $T_i$ such that the probability of $A_i$ reaching the goal after $T_i$, $p_i(1 - F_i(T_i))$, becomes negligible.
Process scheduling by intensity control⁶

Using intensity control is equivalent to replacing the binary functions $\theta_i(t)$ with continuous functions with a range between 0 and 1. It is easy to see that all the formulae for the distribution function and the expected cost from Claim 1 are still valid under the intensity control settings.

⁶Due to the lack of space, in this section we sketch only the main points.
The expression to minimize (6) remains exactly the same as in the suspend-resume model. The constraints, however, are different:

$$0 \le \sigma_i' \le 1 \qquad (22)$$

for the problem with no shared resources, or

$$0 \le \sum_{i=1}^{n} \sigma_i' \le 1 \qquad (23)$$

for the problem with shared resources.

Under the intensity control settings, we can prove an important theorem:

Theorem 3 If no time cost is taken into account ($a = 0$), the model with shared resources and the model with independent processes are equivalent. Namely, given a solution for the model with independent processes, we may reconstruct a solution of the same cost for the model with shared resources, and vice versa.

It is possible to show that the necessary conditions of Euler-Lagrange for the minimization problem (6) (which are usually used for solving this type of optimization problem) yield conditions of the form

$$\frac{f_{k_1}(\sigma_{k_1})}{1 - F_{k_1}(\sigma_{k_1})} = \frac{f_{k_2}(\sigma_{k_2})}{1 - F_{k_2}(\sigma_{k_2})}, \qquad (24)$$

which in most cases are only a false alarm. The functions

$$h_k(t) = \frac{f_k(\sigma_k(t))}{1 - F_k(\sigma_k(t))}$$

play a very important role in the intensity control framework. They are known as hazard functions or conditional failure density functions.

Since the necessary conditions for a weak minimum do not necessarily hold in the inner points, we should look for a solution on the boundary of the region described by (22) or (23). We start with the following lemma:

Lemma 2 Let the functions $\sigma_1, \ldots, \sigma_n$ be an optimal solution. Then they must satisfy the following constraints:

1. For the case of no shared resources, at each time $t$ there is at least one process working with full intensity, i.e., $\forall t\ \max_i \sigma_i'(t) = 1$.

2. For the case of shared resources, at each time $t$ all the resources are consumed, i.e., $\sum_{i=1}^{n} \sigma_i'(t) = 1$.

This lemma helps us to prove the following theorem for the case of shared resources:

Theorem 4 Let the set of functions $\{\sigma_i\}$ be a solution to the minimization problem (6) under the constraints (23). Then, at each point $t$ where the conditions (24) do not hold and the hazard functions are continuous, only one process will be active, and it will consume all the resources. Moreover, this process is the process having the maximal value of $h_k(t)$ among all the processes with non-zero resource consumption in the neighborhood of the point $t$.

This theorem implies two important corollaries:

Corollary 1 If the hazard function of one of the processes is greater than or equal to the hazard functions of the others at $t_0$ and is monotonically increasing in $t$, this process should be the only one to be activated.

Corollary 2 If a process becomes active at some point $t'$ and its hazard function is monotonically increasing in $t$ on $[t', \infty)$, it will remain active until the solution is found.

For the case of no shared resources, the following theorem can be proved:

Theorem 5 Let the set of functions $\{\sigma_i\}$ be a solution to the minimization problem (6) under the constraints (22). Then for each $i$ the following condition holds:

$$\sigma_i'(t) = 0 \ \text{ or } \ \sigma_i'(t) = 1 \qquad (25)$$

at each time $t$ except, maybe, a set of measure zero.

By Theorem 5, for a system with independent processes, intensity control is redundant. As for a system with shared resources, by Theorem 4 and Equation (24), intensity control may only be useful when the hazard functions of at least two processes are equal (it is possible to show, however, that even in this case the equilibrium is not always stable). Thus, the extension of the suspend-resume model to intensity control in many cases does not increase the power of the model.
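To make the hazard machinery concrete, a minimal sketch (ours; both profiles are assumptions, not from the paper) tabulates $h_k$ for an exponential profile (constant hazard) and a Weibull profile with shape 2 (monotonically increasing hazard). Once the increasing hazard overtakes the constant one, Corollaries 1 and 2 say that the Weibull process should run alone from that point on:

```python
# Hazard functions h = f/(1-F) for two assumed profiles: exponential
# (constant hazard) and Weibull with shape 2 (linearly increasing hazard).
lam = 1.0                  # exponential rate (assumed)
scale, shape = 2.0, 2.0    # Weibull parameters (assumed)

h_exp = lambda s: lam                                           # Exp(lam)
h_wei = lambda s: (shape / scale) * (s / scale) ** (shape - 1)  # Weibull

for s in (0.0, 0.5, 1.0, 2.0, 3.0, 4.0):
    leader = "exp" if h_exp(s) >= h_wei(s) else "weibull"
    print(f"sigma={s:3.1f}  h_exp={h_exp(s):4.2f}  h_wei={h_wei(s):4.2f}  max: {leader}")
```

Here the crossover happens at subjective time 2, after which the increasing-hazard process keeps the maximal hazard forever.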
Experiments
In this section we present the results of the optimal scheduling strategy for algorithms running on the same processor (the model with shared resources), which means that the cost is equal to the objective time spent. The first two experiments show how it is possible to benefit from the optimal schedule by using instances of the same non-deterministic algorithm. The third experiment demonstrates the application of optimal scheduling to two different deterministic algorithms solving the same problem. We have implemented the algorithm described in this paper and compared its solutions with two intuitive scheduling policies. The first policy runs the processes one after another, initiating the second process when the probability that the first one will find a solution becomes negligible. The second intuitive strategy runs the processes simultaneously. The processes are stopped when the probability that they can still find a solution becomes less than $10^{-6}$.
Optimal strategy for algorithms with a bimodal density function
Experiments show that running several instances of the same algorithm with a unimodal distribution function usually corresponds to the one-after-another strategy. That is why in this section we consider a less trivial distribution.

Assume that we have a non-deterministic algorithm with a performance profile expressed by a linear combination of two normal distributions with the same deviation. Namely, $f(t) = 0.5\,N(\mu_1, \sigma_1) + 0.5\,N(\mu_2, \sigma_2)$, where $\mu_1$ is fixed, $\sigma_1 = \sigma_2 = 0.5$, and the probability of success is $p = 0.8$. Figure 2 shows the average time required to reach the goal as a function of $\mu_2$ (the results have been averaged over 10000 simulated problems generated per $\mu_2$ according to $f(t)$).

The results show that up to a certain point the optimal strategy is only slightly better than the one-after-another strategy, but after that point it becomes better than both intuitive strategies. Optimal schedules for this experiment contain a single switch point.
Figure 2: Bimodal distribution: average time as a function of μ₂.
Optimal strategy for algorithms with a multimodal density function

The goal of the second experiment was to demonstrate how the number of peaks of the density function affects the solution. We consider a case of a partial uniform distribution, where the density is distributed over k identical peaks of length 1 placed symmetrically in the time interval from 0 to 100 (thus, the density function is equal to 1/k when t belongs to one of the peaks, and 0 otherwise). In this experiment we have chosen p = 1.

As before, we have generated 10000 examples per k. Figure 3 shows the dependency of the average time on k. We can see from the results that the one-after-another strategy requires approximately 50 time units, which is the average of the distribution. Simultaneous launch works much worse in this case, due to the "valleys" in the distribution function. The optimal strategy returns schedules where the processes switch after each peak, and it outperforms both trivial strategies.

Figure 3: Multimodal distribution: average time as a function of k.
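The k-peak result is easy to reproduce by simulation. The following Monte-Carlo sketch (ours; the even peak placement and the switch-per-peak schedule are our approximations of the setup described above) compares the three strategies on a single shared processor, with "simultaneous" modeled as both processes running at intensity 1/2:

```python
# Monte-Carlo sketch of the k-peak experiment: density 1/k on k unit-length
# peaks spread evenly over [0, 100], p = 1, one shared processor.
import random

def sample(k):
    j = random.randrange(k)
    start = (j + 0.5) * (100.0 / k) - 0.5   # assumed symmetric placement
    return start + random.random()

def one_after_another(t1, t2, k):
    return t1                                # the first process succeeds (p = 1)

def simultaneous(t1, t2, k):
    return 2.0 * min(t1, t2)                 # both at half intensity

def switch_per_peak(t1, t2, k):
    # run each process just past one peak, then switch to the other
    elapsed, prev = 0.0, 0.0
    for j in range(k):
        cp = (j + 0.5) * (100.0 / k) + 0.5   # subjective time covering peak j
        for t in (t1, t2):                   # process 1 moves first
            if t <= cp:
                return elapsed + (t - prev)
            elapsed += cp - prev
        prev = cp
    return elapsed

random.seed(0)
for k in (2, 4, 8):
    runs = [(sample(k), sample(k)) for _ in range(10000)]
    for strat in (one_after_another, simultaneous, switch_per_peak):
        avg = sum(strat(a, b, k) for a, b in runs) / len(runs)
        print(f"k={k}  {strat.__name__:17s} avg time = {avg:6.1f}")
```

As in Figure 3, the switch-per-peak schedule beats both baselines once k is large enough, while the simultaneous launch pays for the valleys between the peaks.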
Optimal strategy for two deterministic algorithms

The first two experiments have generated schedules that are rather intuitive (although they have been found in an optimal way). In this experiment we want to demonstrate very simple settings where the resulting schedule is non-trivial. Let us consider a simulation of the first example from the introduction. Assume that we have two learning systems, both having an exponential-like performance profile, which is typical for such systems. We also assume that one of the systems requires a delay for preprocessing but works faster. Thus, we assume that the first system has a distribution density $f_1(t) = \lambda_1 e^{-\lambda_1 t}$, and the second one has a density $f_2(t) = \lambda_2 e^{-\lambda_2 (t - t_2)}$, such that $\lambda_1 < \lambda_2$ (the second is faster) and $t_2 > 0$ (it also has a delay). Assume that both learning systems are deterministic over a given set of examples, and that they may fail to learn the concept with a probability $1 - p_1 = 1 - p_2 = 0.5$.

It turns out that, for example, for the values $\lambda_1 = 3$, $\lambda_2 = 10$, and $t_2 = 5$, the optimal schedule would be to apply the first system for 1.15136 time units, then (if it found no solution) to apply the second system for 5.77652 time units, then the first system for an additional 3.22276 time units, and finally the second system for 0.53572 time units. If at this point no solution has been found, both systems have failed with a probability of $1 - 10^{-6}$ each.

Figure 4 shows the dependency of the expected time as a function of $t_2$ for p = 0.8. As before, the results have been averaged over 10000 simulated examples. We can see that in this case running both of the learning systems decreases the average time even while running on a single processor.

Figure 4: Learning systems: average time as a function of t₂.
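A quick check (ours) of the quoted schedule: each system accumulates its subjective time over its two segments, and the residual success mass is $p_i(1 - F_i(T_i))$, which indeed comes out near $10^{-6}$ for both:

```python
# Residual success probabilities for the schedule quoted above, with
# F1(t) = 1 - exp(-3 t), F2(t) = 1 - exp(-10 (t - 5)) for t > 5, p1 = p2 = 0.5.
import math

T1 = 1.15136 + 3.22276                   # total subjective time of system 1
T2 = 5.77652 + 0.53572                   # total subjective time of system 2
residual1 = 0.5 * math.exp(-3.0 * T1)    # p1 * (1 - F1(T1))
residual2 = 0.5 * math.exp(-10.0 * (T2 - 5.0))
print(f"{residual1:.2e}  {residual2:.2e}")   # both ~ 1e-06
```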
Conclusions and future directions
In this work we present a theoretical framework for optimal scheduling of parallel anytime algorithms. We analyze the properties of optimal schedules for the suspend-resume model, show an extension of the framework to intensity control, and provide an algorithm for building optimal schedules when the processes share resources (e.g., run on the same processor). The information required for building optimal schedules is restricted to performance profiles. Experiments show that using optimal schedules speeds up the processes (even running on the same processor), thus providing a super-linear speedup.
Similar schemes can be applied to more elaborate setups:

• Scheduling a system of n anytime algorithms, where the overall utility of the system is defined as the maximal utility of its components;

• Scheduling with non-zero rescheduling costs;

• Providing dynamic scheduling algorithms able to handle changes in the environment;

• Building effective algorithms for the case of multiprocessor systems.

These directions are the subject of ongoing research.

We believe that using frameworks such as the above can be very helpful for developing more effective algorithms solely by statistical analysis and parallelization of the existing ones.
References
Boddy, M., and Dean, T. 1994. Decision-theoretic deliberation scheduling for problem solving in time-constrained environments. Artificial Intelligence 67(2):245-286.

Clearwater, S. H.; Hogg, T.; and Huberman, B. A. 1992. Cooperative problem solving. In Bernardo, H., ed., Computation: The Micro and Macro View. Singapore: World Scientific. 33-70.

Dean, T., and Boddy, M. 1988. An analysis of time-dependent planning. In Proceedings of the Seventh National Conference on Artificial Intelligence, 49-54.

Finkelstein, L., and Markovitch, S. 2001. Optimal schedules for monitoring anytime algorithms. Artificial Intelligence 126:63-108.

Horvitz, E. J. 1987. Reasoning about beliefs and actions under computational resource constraints. In Proceedings of the 1987 Workshop on Uncertainty in Artificial Intelligence.

Huberman, B. A.; Lukose, R. M.; and Hogg, T. 1997. An economic approach to hard computational problems. Science 275:51-54.

Janakiram, V. K.; Agrawal, D. P.; and Mehrotra, R. 1988. A randomized parallel backtracking algorithm. IEEE Transactions on Computers 37(12):1665-1676.

Knight, K. 1993. Are many reactive agents better than a few deliberative ones? In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, 432-437.

Korf, R. E. 1990. Real-time heuristic search. Artificial Intelligence 42:189-211.

Kumar, V., and Rao, V. N. 1987. Parallel depth-first search on multiprocessors part II: Analysis. International Journal of Parallel Programming 16(6):501-519.

Rao, V. N., and Kumar, V. 1987. Parallel depth-first search on multiprocessors part I: Implementation. International Journal of Parallel Programming 16(6):479-499.

Rao, V. N., and Kumar, V. 1992. On the efficiency of parallel backtracking. IEEE Transactions on Parallel and Distributed Systems.

Russell, S., and Wefald, E. 1991. Do the Right Thing: Studies in Limited Rationality. Cambridge, Massachusetts: The MIT Press.

Russell, S. J., and Zilberstein, S. 1991. Composing real-time systems. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence (IJCAI-91). Sydney: Morgan Kaufmann.

Simon, H. A. 1955. A behavioral model of rational choice. Quarterly Journal of Economics 69:99-118.

Simon, H. A. 1982. Models of Bounded Rationality. MIT Press.

Yokoo, M., and Kitamura, Y. 1996. Multiagent real-time-A* with selection: Introducing competition in cooperative search. In Proceedings of the Second International Conference on Multiagent Systems (ICMAS-96), 409-416.

Zilberstein, S., and Russell, S. J. 1992. Efficient resource-bounded reasoning in AT-RALPH. In Proceedings of the First International Conference on AI Planning Systems, 260-266.

Zilberstein, S. 1993. Operational Rationality Through Compilation of Anytime Algorithms. Ph.D. Dissertation, Computer Science Division, University of California, Berkeley.