From: AAAI Technical Report FS-01-04. Compilation copyright © 2001, AAAI (www.aaai.org). All rights reserved.

Optimal Schedules for Parallelizing Anytime Algorithms

Lev Finkelstein, Shaul Markovitch and Ehud Rivlin
Computer Science Department
Technion, Haifa 32000, Israel
{lev,shaulm,ehudr}@cs.technion.ac.il

Abstract

The performance of anytime algorithms having a non-deterministic nature can be improved by solving simultaneously several instances of the algorithm-problem pairs. These pairs may include different instances of a problem (such as starting from a different initial state), different algorithms (if several alternatives exist), or several instances of the same algorithm (for non-deterministic algorithms). A straightforward parallelization, however, usually results in only a linear speedup, while more effective parallelization schemes require knowledge about the problem space and/or the algorithm itself. In this paper we present a general framework for parallelization which uses only minimal information about the algorithm (namely, its probabilistic behavior, described by a performance profile), and obtains a super-linear speedup by optimal scheduling of different instances of the algorithm-problem pairs. We present a mathematical model for this framework, give algorithms for optimal scheduling, and demonstrate the behavior of optimal schedules for different kinds of anytime algorithms.

Introduction

Assume that we have two learning systems which need to learn a concept with a given success rate. One of them learns fast, but needs some preprocessing at the beginning. The other works more slowly, but requires no preprocessing. A question arises: can we benefit from using both learning systems to solve one learning task on a single-processor machine? What should be the order of their application? Does the situation change if the systems have a certain probability to fail?

Another example is a PROLOG expert system. Assume we are required to handle a query of the form Q1 ∨ Q2. We can parallelize this query and try to solve the subproblems Q1 and Q2 independently, maybe on a single processor. We would like to minimize the average time required for the query, and, in the case of two processors, the time consumed by both of them. The question is: is it beneficial to parallelize the query? If not, which one of the subqueries should be handled first? If yes, should we solve them simultaneously, or should we use some special schedule?

What is common to the above examples?
• We are trying to benefit from the uncertainty in solving more than one instance of the algorithm-problem pair. We can use different algorithms (in the first example) and different problems (in the second example). For non-deterministic algorithms, we can also use different instances of the same algorithm.
• Each process is executed with the purpose of satisfying a given goal predicate. The task is considered accomplished when one of the instances is solved.
• If the goal predicate is satisfied at time t*, then it is also satisfied at any time t > t*. This property is equivalent to utility monotonicity of anytime algorithms (Dean & Boddy 1988; Horvitz 1987), with utility restricted to Boolean values.
• The goal is to provide a schedule which minimizes the expected cost, possibly under some constraints (for example, the processes may share resources).

Such a problem definition is typical of bounded-rationality reasoning (Simon 1982; Russell & Wefald 1991). This problem resembles contract algorithms (Russell & Zilberstein 1991; Zilberstein 1993). For contract algorithms, the task is to construct the algorithm providing the best possible solution given a time of execution. In our case, the task is to construct the (parallel) algorithm providing the best possible solution given a required utility.

Simple parallelization, with no information exchange between the processes, may speed up the process due to high diversity in solution times. For example, Knight (1993) showed that using many reactive instances of RTA* search (Korf 1990) is more beneficial than using a single deliberative RTA* instance. Yokoo & Kitamura (1996) used several search agents in parallel, with agent rearrangement after pre-given periods of time. Janakiram, Agrawal, & Mehrotra (1988) showed that for many common distributions of solution time, simple parallelization leads to at most linear speedup, while communication between the processors can result in a super-linear speedup. A super-linear speedup is usually obtained as a result of information exchange between the processes (as in the work of Clearwater, Hogg, & Huberman (1992) devoted to cryptarithmetic problems), or as a result of a pre-given division of the search space (as in the works of Kumar and Rao (1987; 1987; 1992) devoted to parallelizing standard search algorithms). Unfortunately, the techniques requiring a pre-given division of the search space or online communication between processes are usually highly domain-dependent. An interesting approach in a domain-independent direction has been investigated by Huberman, Lukose, & Hogg (1997). The authors proposed to provide the processes (agents) with different amounts of resources ("portfolio" construction), which made it possible to reduce both the expected resource usage and its variance. Their experiments showed the applicability of this approach to many hard computational problems. In the field of anytime algorithms, similar works have mostly concentrated on scheduling different anytime algorithms or decision procedures in order to maximize overall utility (as in the work of Boddy & Dean (1994)). Their settings, however, are different from those presented above.

The goal of this research is to develop algorithms that design an optimal scheduling policy based on the statistical characteristics of the processes.
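The statistical characteristics in question are the performance profiles of the processes. As an illustration of how such a profile can be obtained in practice (the toy randomized solver and all names below are our own assumptions, not part of the paper), the Boolean-quality profile F(t) of a non-deterministic algorithm can be estimated empirically from repeated runs:

```python
import random

def random_restart_solve(target, rng):
    """Toy non-deterministic solver: guess bit strings until the target is hit.
    Returns the number of guesses, i.e. the 'solution time' of one run."""
    steps = 0
    while True:
        steps += 1
        guess = tuple(rng.randint(0, 1) for _ in target)
        if guess == target:
            return steps

def empirical_profile(times):
    """Empirical performance profile F(t): fraction of runs finished by time t."""
    times = sorted(times)
    n = len(times)
    return lambda t: sum(1 for x in times if x <= t) / n

rng = random.Random(0)
target = (1, 0, 1, 1)  # 4 bits, so each guess succeeds with probability 1/16
runs = [random_restart_solve(target, rng) for _ in range(2000)]
F = empirical_profile(runs)
# F is non-decreasing and approaches 1; the mean solution time is about 16.
```

Such an estimated profile is the only algorithm-specific input the scheduling framework below requires.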
We present a formal framework for scheduling parallel anytime algorithms and study two cases: shared and non-shared resources. The framework assumes that we know the probability of the goal condition being satisfied as a function of time (a performance profile (Simon 1955; Boddy & Dean 1994) restricted to Boolean quality values). We analyze the properties of optimal schedules for the suspend-resume model and show that in most cases an extension of the framework to intensity control does not decrease the optimal cost. For the case of shared resources (a single-processor model), we show an algorithm for building optimal schedules. Finally, we demonstrate experimental results for the optimal schedules.

Motivation: a simple example

Before starting the formal discussion, we would like to give a simple example. Assume that two instances of DFS with random tie-breaking are applied to the very simple search space shown in Figure 1. We assume that each process uses a separate processor. There are two paths to the goal, one of length 10, and one of length 40. DFS has no guiding heuristic, and therefore its behavior at A is not determined. When one of the instances finds the solution, the task is considered accomplished.

Figure 1: A simple search task: two instances of DFS search for a path from A to B. Scheduling the processes may reduce cost.

We have two utility components: the time, which is the elapsed time required for the system to find a solution, and the resources, which is the total CPU time consumed by both processes. Assume first that we have control only over the launch time of the processes. If the two search processes start together, the expected time usage will be 10 × 3/4 + 40 × 1/4 = 17.5 units, while the expected resource usage will be 20 × 3/4 + 80 × 1/4 = 35 units. If we apply only one instance, both time and resource usage will be 10 × 1/2 + 40 × 1/2 = 25 units. If one of the processes waits for 10 time units, the expected time usage will be 10 × 1/2 + 20 × 1/4 + 40 × 1/4 = 20, and the expected resource usage will be 10 × 1/2 + 30 × 1/4 + 70 × 1/4 = 30. Intuitively, we can vary the delay to push the tradeoff towards time or resource usage as we like.

Assume now that we are allowed to suspend and resume the two processes after they start. This additional capability brings further improvement. Indeed, assume that the first process is active for the first 10 time units, then it stops and the second process is active for the next 10 time units, and then the second process stops and the first process continues the execution (in this scenario at each moment only one process is active, so we can use the same processor). Both the expected time and resource usage will be 10 × 1/2 + 20 × 1/4 + 50 × 1/4 = 22.5 units. If we consider the cost to be a sum of time and resource usage, it is possible to show that this result is better than that of any delay-based schedule.

A framework for parallelization scheduling

In this section we formalize the intuitive description of parallelization scheduling. The first part of this framework is similar to our framework for monitoring anytime algorithms (Finkelstein & Markovitch 2001).

Let S be a set of states, t be a time variable with non-negative real values, and 𝒜 be a random process such that each realization (trajectory) A(t) of 𝒜 represents a mapping from ℝ⁺ to S. Let G : S → {0, 1} be a goal predicate, where 0 corresponds to False and 1 corresponds to True. Let 𝒜 be monotonic over G, i.e. for each trajectory A(t) of 𝒜, the function G_A(t) = G(A(t)) is a non-decreasing function. Under the above assumptions, G_A(t) is a step function with at most one discontinuity point, which we denote by t*_{A,G} (this is the first point after which the goal predicate is true). If G_A(t) is always 0, we say that t*_{A,G} is not defined. Therefore, we can define a random variable ξ = ξ_{A,G}, which for each trajectory A(t) of 𝒜 with t*_{A,G} defined corresponds to t*_{A,G}.
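The expected-cost arithmetic of the DFS example above can be checked mechanically. The sketch below (variable and function names are ours) enumerates the four equally likely tie-breaking outcomes of the two instances:

```python
from itertools import product

SHORT, LONG = 10, 40  # the two path lengths from Figure 1

def simultaneous(p1, p2):
    """Both instances start together on two processors."""
    finish = min(p1, p2)
    return finish, 2 * finish              # (time, total CPU resources)

def delayed(p1, p2, d=10):
    """Second instance is launched after a delay d."""
    finish = min(p1, d + p2)
    return finish, finish + max(0, finish - d)

def suspend_resume(p1, p2):
    """First runs in [0,10], second in [10,20], then the first resumes."""
    if p1 == SHORT:
        finish = 10
    elif p2 == SHORT:
        finish = 20
    else:
        finish = 10 + 10 + (p1 - 10)       # first resumes at t=20 with 30 left
    return finish, finish                  # one process active at a time

cases = list(product((SHORT, LONG), repeat=2))  # 4 outcomes, prob 1/4 each

def expect(policy):
    vals = [policy(p1, p2) for p1, p2 in cases]
    return sum(v[0] for v in vals) / 4, sum(v[1] for v in vals) / 4

# expect(simultaneous)   -> (17.5, 35.0)
# expect(delayed)        -> (20.0, 30.0)
# expect(suspend_resume) -> (22.5, 22.5)
```

The three expectations reproduce the figures quoted in the example, and the suspend-resume schedule indeed minimizes time plus resources.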
The behavior of ξ can be described by its distribution function F(t). At the points where F(t) is differentiable, we use the probability density f(t) = F′(t). This scheme resembles the one used in anytime algorithms. The goal predicate can be viewed as a special case of the quality measurement used in anytime algorithms, and the requirement for its non-decreasing value is a standard requirement of these algorithms. The trajectories of 𝒜 correspond to conditional performance profiles (Zilberstein & Russell 1992; Zilberstein 1993).

In practice, not every trajectory of 𝒜 leads to goal predicate satisfaction even after an infinitely large time. That means that the set of trajectories where t*_{A,G} is undefined is not necessarily of measure zero. That is why we define the probability of success p as the probability of a trajectory A(t) with t*_{A,G} defined. (An alternative way to express the possibility that the process will never stop is to use profiles that approach 1 − p when t → ∞. We prefer to use p explicitly because, in order for F to be a distribution function, it must satisfy lim_{t→∞} F(t) = 1.)

Assume now that we have a system of n random processes A_1, ..., A_n with corresponding distribution functions F_1, ..., F_n and goal predicates G_1, ..., G_n. We define a schedule of the system as a set of binary functions {θ_i}, where at each moment t the i-th process is active if θ_i(t) = 1 and idle otherwise. We refer to this scheme as suspend-resume scheduling. A possible generalization of this framework is to extend the suspend-resume control to a more refined mechanism allowing us to determine the intensity with which each process acts. For software processes that means varying the fraction of CPU usage; for tasks like robot navigation it implies changing the speed of the robots. Mathematically, using intensity control is equivalent to replacing the binary functions θ_i(t) with continuous functions with a range between 0 and 1.

Note that scheduling makes the term time ambiguous. On one hand, we have the subjective time of each process, which is consumed only when the process is active.
This kind of time corresponds to some resource consumed by the process. On the other hand, we have an objective time, measured from the point of view of an external observer. The performance profile of each algorithm is defined over its subjective time, while the cost function (see below) may use both kinds of time. Since we are using several processes, all the formulae in this paper are based on the objective time.

Let us denote by σ_i(t) the total time that process i has been active before t. By definition,

σ_i(t) = ∫_0^t θ_i(x) dx.    (1)

(For the sake of readability, we often omit t in σ_i(t).) In practice, σ_i(t) provides the mapping from the objective time t to the subjective time of the i-th process. Since θ_i can be obtained from σ_i by differentiation, we often describe schedules by {σ_i} instead of {θ_i}. Under the suspend-resume model assumptions, the σ_i must be differentiable functions with derivatives 0 or 1, which ensures correct values for θ_i. Under the intensity control assumptions, the σ_i must be differentiable functions with derivatives between 0 and 1.

We consider two alternative setups regarding resource sharing between the processes:
1. The processes share resources on a mutual exclusion basis. That means that exactly one process can be active at each moment, and the processes will be active one after another until the goal is reached by one of them. In this case the sum of the derivatives of the σ_i is always one. (This fact is obvious for the case of suspend-resume control; for intensity control it is reflected in Lemma 2.) The case of shared resources corresponds to the case of several algorithms running on a single processor.
2. The processes are fully independent: there are no additional constraints on the σ_i. This case corresponds to n independent algorithms (e.g., running on n processors).

The processes {A_i} with goal predicates {G_i} running under schedules {σ_i} result in a new process A with a goal predicate G. G is the disjunction of the G_i (G(t) = ∨_i G_i(t)), and therefore A is monotonic over G. We denote the distribution function of the corresponding random variable ξ_{A,G} by F_n(t, σ_1, ..., σ_n), and the corresponding distribution density by f_n(t, σ_1, ..., σ_n).

Assume that we are given a monotonic non-decreasing cost function u(t, t_1, ..., t_n), which depends on the objective time t and the subjective times t_i of the processes. Since the subjective times can be calculated by σ_i(t), we actually have u = u(t, σ_1(t), ..., σ_n(t)). The expected cost of a schedule {σ_i} can therefore be expressed as

E_u(σ_1, ..., σ_n) = ∫_0^{+∞} u(t, σ_1, ..., σ_n) f_n(t, σ_1, ..., σ_n) dt.    (2)

(The generalization to the case where the probability of success p is not 1 is considered at the end of the next section.) Our goal is to find a schedule which minimizes the expected cost (2) under the corresponding constraints.

Suspend-resume based scheduling

In this section we consider the case of suspend-resume based control (the σ_i are continuous functions with derivatives 0 or 1).

Claim 1 The expressions for the goal-time distribution F_n(t, σ_1, ..., σ_n) and the expected cost E_u(σ_1, ..., σ_n) are as follows:

F_n(t, σ_1, ..., σ_n) = 1 − ∏_{i=1}^n (1 − F_i(σ_i)),    (3)

E_u(σ_1, ..., σ_n) = ∫_0^∞ (u′_t + Σ_{i=1}^n σ′_i u′_{t_i}) ∏_{i=1}^n (1 − F_i(σ_i)) dt.    (4)

The proofs in this paper are omitted due to lack of space.

In this section we assume that the total cost is a linear combination of the objective time and all the subjective times, and that the subjective times have the same weight:

u(t, σ_1, ..., σ_n) = at + b Σ_{i=1}^n σ_i(t).    (5)

This assumption is made to keep the expressions more readable. The solution process remains the same for the general form of u. In this case our minimization problem is:

E_u(σ_1, ..., σ_n) = ∫_0^∞ (a + b Σ_{i=1}^n σ′_i) ∏_{j=1}^n (1 − F_j(σ_j)) dt → min.    (6)
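Once the schedule {σ_i} is fixed, the objective in (6) can be evaluated by direct numerical integration. The sketch below (our own illustration, not the paper's implementation) does this for two processes under the shared-resources constraint σ_1′ + σ_2′ = 1, and checks a consequence of the memoryless exponential profile: for exponential F_i with equal rate λ, every alternating schedule has the same cost (a + b)/λ.

```python
import math

def make_sigma(period1, period2):
    """Subjective-time map sigma_1 for a schedule alternating period1 units
    of process 1 with period2 units of process 2 (shared resources, so
    sigma_2(t) = t - sigma_1(t))."""
    def sigma1(t):
        cycle = period1 + period2
        full, rest = divmod(t, cycle)
        return full * period1 + min(rest, period1)
    return sigma1

def expected_cost(F1, F2, sigma1, a=1.0, b=1.0, T=60.0, dt=1e-3):
    """Riemann-sum evaluation of the integrand of (6) with sum of
    derivatives equal to 1: (a+b) * integral of prod_j (1 - F_j(sigma_j(t)))."""
    total, t = 0.0, 0.0
    while t < T:
        s1 = sigma1(t)
        total += (1 - F1(s1)) * (1 - F2(t - s1)) * dt
        t += dt
    return (a + b) * total

lam = 1.0
F = lambda s: 1 - math.exp(-lam * s)      # exponential performance profile
c1 = expected_cost(F, F, make_sigma(1.0, 1.0))
c2 = expected_cost(F, F, make_sigma(5.0, 2.0))
# both approximate (a + b)/lam = 2: the memoryless profile is schedule-invariant
```

For non-memoryless profiles the two schedules would generally give different costs, which is exactly what the optimization below exploits.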
For the case of shared resources on a mutual exclusion basis, the minimization problem (6) becomes

E_u(σ_1, ..., σ_n) = (a + b) ∫_0^∞ ∏_{j=1}^n (1 − F_j(σ_j)) dt → min.    (7)

In the rest of this section we show a formal solution (necessary conditions and the algorithm) for the framework with shared resources. For the sake of clarity, we start with two processes, presenting the formulae and the algorithm, and then generalize the solution to an arbitrary number of processes.

Necessary conditions for an optimal solution for two processes

Let A_1 and A_2 be two processes sharing a resource. While one process works, it locks the resource(s), and the other is necessarily idle. We can show that this dependency yields a strong constraint on the behavior of the processes, which allows us to build an effective algorithm for solving the minimization problem. Let A_1 be active in the intervals [t_{2k}, t_{2k+1}], and A_2 be active in the intervals [t_{2k+1}, t_{2k+2}]. That means that the processes work alternately, switching at the time points t_j (if A_2 is active first, then t_1 = t_0). Let us denote by ξ_k the total time that A_1 has been active before t_k, and by η_k the total time that A_2 has been active before t_k. We can see that

ξ_{2k+1} = ξ_{2k+2} = t_{2k+1} − t_{2k} + ξ_{2k−1},
η_{2k+2} = η_{2k+3} = t_{2k+2} − t_{2k+1} + η_{2k}.    (8)

Since A_1 is active in the intervals [t_{2k}, t_{2k+1}], the functions σ_1 and σ_2 on these intervals have the form

σ_1(t) = t − t_{2k} + σ_1(t_{2k}) = t − η_{2k},
σ_2(t) = σ_2(t_{2k}) = η_{2k}.    (9)

Similarly, A_2 is active in the intervals [t_{2k+1}, t_{2k+2}]; therefore, on these intervals σ_1 and σ_2 are defined as

σ_1(t) = σ_1(t_{2k+1}) = ξ_{2k+1},
σ_2(t) = t − t_{2k+1} + σ_2(t_{2k+1}) = t − ξ_{2k+1}.    (10)

We can reduce the number of variables by denoting ζ_{2k+1} = ξ_{2k+1} and ζ_{2k} = η_{2k}. Thus, the odd indices of ζ pertain to A_1, and the even indices to A_2. Note also that

ζ_k + ζ_{k−1} = t_k − t_0 = t_k,    (13)

so at each step the time spent by the processes is determined by (13). Substituting these values into (7), we obtain

E_u(ζ_1, ..., ζ_n, ...) = (a + b) Σ_{k=0}^∞ [ (1 − F_2(ζ_{2k})) ∫_{ζ_{2k−1}}^{ζ_{2k+1}} (1 − F_1(x)) dx + (1 − F_1(ζ_{2k+1})) ∫_{ζ_{2k}}^{ζ_{2k+2}} (1 − F_2(x)) dx ] → min.    (11)

The minimization problem (11) is equivalent to the original problem (7), and the dependency between their solutions is described by (9) and (10). The only constraint for the problem follows from the fact that the processes alternate for non-negative periods of time:

0 = ζ_0 ≤ ζ_2 ≤ ... ≤ ζ_{2k} ≤ ...,
ζ_1 ≤ ζ_3 ≤ ... ≤ ζ_{2k+1} ≤ ....    (12)

(To keep the general form, we assume ζ_{−1} = 0.) By the Weierstrass theorem, (11) reaches its optimal values either when

dE_u/dζ_k = 0 for k = 1, ..., n, ...,

or on the border described by (12). However, for two processes we can, without loss of generality, ignore the border case. We can see that ζ_i appears in three subsequent terms of E_u, and by differentiating (11) with respect to ζ_{2k} (and ζ_{2k+1}) we can prove the following theorem:

Theorem 1 (The chain theorem for two processes) The value of ζ_{i+1} for i ≥ 2 can be computed for given ζ_{i−1} and ζ_i using the formulae

f_2(ζ_{2k}) / (1 − F_2(ζ_{2k})) = [F_1(ζ_{2k+1}) − F_1(ζ_{2k−1})] / ∫_{ζ_{2k−1}}^{ζ_{2k+1}} (1 − F_1(x)) dx,    (14)

f_1(ζ_{2k+1}) / (1 − F_1(ζ_{2k+1})) = [F_2(ζ_{2k+2}) − F_2(ζ_{2k})] / ∫_{ζ_{2k}}^{ζ_{2k+2}} (1 − F_2(x)) dx.    (15)

This theorem allows us to formulate an algorithm for building an optimal solution.

Optimal solution for two processes: an algorithm

(Due to the lack of space we present only the main idea of the algorithm.) Assume that A_1 acts first (ζ_1 > 0). From Theorem 1 we can see that the values of ζ_0 = 0 and ζ_1 determine the set of possible values for ζ_2, the values of ζ_1 and ζ_2 determine the possible values for ζ_3, and so on. Therefore, a non-zero value of ζ_1 provides us with a tree of possible values of ζ_i. The branching factor of this tree is determined by the number of roots of (14) and (15). Each possible sequence ζ_1, ζ_2, ... can be evaluated using (11). The series in that expression must converge, so we stop after a finite number of points. For each value of ζ_1 we can find the best sequence using one of the standard search algorithms, such as Branch-and-Bound. Let us denote the value of the best sequence for each ζ_1 by E_u(ζ_1). Performing global optimization of E_u(ζ_1) over ζ_1 provides us with an optimal solution for the case where A_1 acts first. Note that the value of ζ_1 may also be 0 (A_2 acts first), so we need to compare the value obtained by optimization over ζ_1 with the value obtained by optimization over ζ_2 with ζ_1 = 0.

Necessary conditions for an optimal solution for an arbitrary number of processes

Assume now that we have n processes A_1, ..., A_n using shared resources. As before, let t_0 = 0 ≤ t_1 ≤ ... ≤ t_j ≤ ... be the time points where the processes perform a switch. We also permit the case t_i = t_{i+1}, which gives us the possibility to assume (without loss of generality) that the processes work in a round-robin manner, such that at t_{kn+i} process i becomes idle and process i+1 becomes active. Let μ_i stand for the index of the process which is active between t_{i−1} and t_i, and let ζ_i be the total time spent by process μ_i before t_i. As in the case of two processes, there is a one-to-one correspondence between {t_i} and {ζ_i}, and the following equations hold:

ζ_i − ζ_{i−n} = t_i − t_{i−1},    (16)

Σ_{j=0}^{n−1} ζ_{i−j} = t_i for i ≥ n.    (17)

The first equation corresponds to the fact that the time between t_{i−1} and t_i is accumulated in the ζ values of the corresponding process, while the second equation states that at each switch the objective time of the system equals the sum of the subjective times of the processes. For the sake of uniformity we also denote

ζ_{−n+1} = ... = ζ_{−1} = ζ_0 = 0.

For the future discussion we need to know the schedule of each process in the time interval [t_{i−1}, t_i]. By the construction of ζ_i we can see that on this interval the subjective time of process k has the following form:

σ_k(t) = ζ_{k+(i−μ_i)} for k = 1, ..., μ_i − 1,
σ_k(t) = t − t_{i−1} + ζ_{i−n} for k = μ_i,
σ_k(t) = ζ_{k+(i−μ_i)−n} for k = μ_i + 1, ..., n.    (18)

We need to minimize the expression given by (7). The only constraints are the monotonicity of the sequence of ζ values for each process, and therefore we obtain

ζ_j ≤ ζ_{j+n} for each j.    (19)

In order to get more compact expressions one may denote ζ_{nk+l} by ζ_k^l (this notation does not imply l ≤ n). Given the expressions for the σ_i, we can prove the following lemma:

Lemma 1 For a system of n processes, the expression for the expected cost (7) can be rewritten as

E_u(ζ_1, ..., ζ_n, ...) = (a + b) Σ_{i=1}^∞ ∏_{j=i−n+1}^{i−1} (1 − F_{μ_j}(ζ_j)) ∫_{ζ_{i−n}}^{ζ_i} (1 − F_{μ_i}(x)) dx.    (20)

This lemma makes it possible to prove the chain theorem for an arbitrary number of processes:

Theorem 2 (The chain theorem) The value of ζ_j may either be ζ_{j−n} (the process skips its turn), or can be computed given the previous 2n − 2 values of ζ from the stationarity condition

f_{μ_l}(ζ_l) / (1 − F_{μ_l}(ζ_l)) = [ ∏_{j=l−n+1}^{l−1} (1 − F_{μ_j}(ζ_j)) − ∏_{j=l+1}^{l+n−1} (1 − F_{μ_j}(ζ_j)) ] / [ Σ_{i=l+1}^{l+n−1} ∏_{j=i−n+1, j≠l}^{i−1} (1 − F_{μ_j}(ζ_j)) ∫_{ζ_{i−n}}^{ζ_i} (1 − F_{μ_i}(x)) dx ],    (21)

where the unknown ζ_j enters (21) as the largest index, j = l + n − 1. For n = 2, (21) reduces to (14) and (15).

Optimal solution for an arbitrary number of processes: an algorithm

As in the case of two processes, assume that A_1 acts first. By Theorem 2, given the values of ζ_0, ζ_1, ..., ζ_{2n−3} we can determine all the possibilities for the value of ζ_{2n−2} (either ζ_{n−2}, if the process skips its turn, or one of the roots of (21)). Given the values up to ζ_{2n−2}, we can determine the values for ζ_{2n−1}, and so on. The idea of the algorithm is similar to that of the algorithm for two processes. The first 2n − 2 variables (including ζ_0 = 0) determine the tree of possible values for ζ_i. Optimization over the first 2n − 3 variables therefore provides us with an optimal schedule (as before, we compare the results with the cases where the first k < n variables are 0).

Optimal solution in the case of additional constraints

Assume now that the problem has additional constraints: the solution time is limited by T (or each one of the processes has an upper limit T_i), and the probability of success p_i of the i-th agent is not necessarily 1. It is possible to show that all the formulae used in the previous sections remain valid in the current settings, with two differences:
1. We use p_j F_j instead of F_j and p_j f_j instead of f_j.
2. All the integrals are from 0 to T instead of from 0 to ∞.
The first change may be easily incorporated into all the algorithms. The second one adds a boundary condition to the chain theorems: for n agents, for example, ζ_j may also get the value

ζ_j = T − Σ_{l=1}^{n−1} ζ_{j−l}.

This adds one more branch to the Branch-and-Bound algorithm and can be easily implemented. Similar changes are made in the algorithms for the case of a maximal allowed time T_i per agent. In practice, we always use this limitation, setting T_i such that the probability of A_i reaching the goal after T_i, p_i(1 − F_i(T_i)), becomes negligible.

Process scheduling by intensity control

(Due to the lack of space, in this section we sketch only the main points.) Using intensity control is equivalent to replacing the binary functions θ_i(t) with continuous functions with a range between 0 and 1. It is easy to see that all the formulae for the distribution function and the expected cost from Claim 1 are still valid under the intensity control settings. The expression to minimize (6) remains exactly the same as in the suspend-resume model. The constraints, however, are different:

0 ≤ σ′_i ≤ 1    (22)

for the problem with no shared resources, and

0 ≤ Σ_{i=1}^n σ′_i ≤ 1    (23)

for the problem with shared resources. For the case of shared resources the following theorem holds (the hazard functions h_k used in its statement are defined below):

Theorem 4 Let the set of functions {σ_i} be a solution to the minimization problem (6) under the constraints (23). Then, at each point t where the conditions (24) below do not hold and the hazard functions are continuous, only one process will be active, and it will consume all the resources. Moreover, this process is the one having the maximal value of h_k(t) among all the processes with non-zero resource consumption in the neighborhood of the point t.
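The argmax-hazard rule of Theorem 4 can be illustrated numerically. In the sketch below (distributions and names chosen by us; for simplicity the hazards are compared as functions of a single time argument, ignoring the frozen subjective time of an idle process), the hazards of an exponential and a Weibull profile cross, and the rule switches the active process at the crossing point:

```python
def hazard_exponential(lam):
    # h(t) = f(t)/(1 - F(t)) = lam: the memoryless profile has constant hazard
    return lambda t: lam

def hazard_weibull(k, scale=1.0):
    # F(t) = 1 - exp(-(t/scale)**k)  =>  h(t) = (k/scale) * (t/scale)**(k-1)
    return lambda t: (k / scale) * (t / scale) ** (k - 1)

h1 = hazard_exponential(1.0)   # constant hazard 1
h2 = hazard_weibull(2.0)       # hazard 2t, monotonically increasing

# Argmax rule of Theorem 4: run the process with the larger hazard.
active = [1 if h1(t) > h2(t) else 2 for t in (0.1, 0.25, 0.4, 0.6, 1.0)]
# the hazards cross at t = 0.5: process 1 is chosen before, process 2 after
```

Once process 2 takes over, its hazard keeps growing, so by Corollary 2 it would remain active until the solution is found.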
Under the intensity control settings, we can prove an important theorem:

Theorem 3 If no time cost is taken into account (a = 0), the model with shared resources and the model with independent processes are equivalent. Namely, given a solution for the model with independent processes, we may reconstruct a solution of the same cost for the model with shared resources, and vice versa.

It is possible to show that the necessary conditions of Euler-Lagrange for the minimization problem (6) (which are usually used for solving this type of optimization problem) yield conditions of the form

f_{k1}(σ_{k1}(t)) / (1 − F_{k1}(σ_{k1}(t))) = f_{k2}(σ_{k2}(t)) / (1 − F_{k2}(σ_{k2}(t))),    (24)

which in most cases are only a false alarm. The functions

h_k(t) = f_k(σ_k(t)) / (1 − F_k(σ_k(t)))

play a very important role in the intensity control framework. They are known as hazard functions or conditional failure density functions. Since the necessary conditions for a weak minimum do not necessarily hold in the inner points, we should look for a solution on the boundary of the region described by (22) or (23). We start with the following lemma:

Lemma 2 Let the functions σ_1, ..., σ_n be an optimal solution. Then they must satisfy the following constraints:
1. For the case of no shared resources, at each time t there is at least one process working with full intensity, i.e., ∀t max_i σ′_i(t) = 1.
2. For the case of shared resources, at each time t all the resources are consumed, i.e., Σ_{i=1}^n σ′_i(t) = 1.

This lemma helps us to prove Theorem 4 above for the case of shared resources. Theorem 4 implies two important corollaries:

Corollary 1 If the hazard function of one of the processes is greater than or equal to the hazard functions of the others at t_0 and is monotonically increasing in t, this process should be the only one to be activated.

Corollary 2 If a process becomes active at some point t′ and its hazard function is monotonically increasing in t on [t′, ∞), it will remain active until the solution is found.

For the case of no shared resources, the following theorem can be proved:

Theorem 5 Let the set of functions {σ_i} be a solution to the minimization problem (6) under the constraints (22). Then for each i the following condition holds:

σ′_i(t) = 0 or σ′_i(t) = 1 at each time t except, maybe, a set of measure zero.    (25)

By Theorem 5, for a system with independent processes, intensity control is redundant. As for a system with shared resources, by Theorem 4 and Equation (24), intensity control may only be useful when the hazard functions of at least two processes are equal (it is possible to show, however, that even in this case the equilibrium is not always stable). Thus, the extension of the suspend-resume model to intensity control in many cases does not increase the power of the model.

Experiments

In this section we present the results of the optimal scheduling strategy for algorithms running on the same processor (the model with shared resources), which means that the cost is equal to the objective time spent. The first two experiments show how it is possible to benefit from the optimal schedule by using instances of the same non-deterministic algorithm. The third experiment demonstrates an application of optimal scheduling to two different deterministic algorithms solving the same problem.

We have implemented the algorithm described in this paper and compared its solutions with two intuitive scheduling policies. The first policy runs the processes one after another, initiating the second process when the probability that the first one will find a solution becomes negligible. The second intuitive strategy runs the processes simultaneously. The processes are stopped when the probability that they can still find a solution becomes less than 10⁻⁶.

Optimal strategy for algorithms with a bimodal density function

Experiments show that running several instances of the same algorithm with a unimodal distribution function usually corresponds to the one-after-another strategy.
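Before turning to specific distributions, the two baseline policies themselves can be simulated directly. The sketch below (all parameters are ours, not the exact experimental setup) estimates, for a two-peaked profile, the average time of a single instance versus two instances interleaved on one processor, where interleaving halves each instance's speed:

```python
import random

rng = random.Random(1)

def sample_bimodal():
    """Solution time drawn from 0.5*N(2, 0.5) + 0.5*N(8, 0.5), clipped positive."""
    mu = 2.0 if rng.random() < 0.5 else 8.0
    return max(0.01, rng.gauss(mu, 0.5))

N = 20000
# One instance running alone finishes at its own solution time.
single = sum(sample_bimodal() for _ in range(N)) / N
# Interleaving two instances on one processor halves each one's speed,
# so the first to finish does so at objective time 2*min(x1, x2).
inter = sum(2 * min(sample_bimodal(), sample_bimodal()) for _ in range(N)) / N
# For this profile the interleaved launch is worse than a single instance,
# consistent with the observation that naive simultaneity can lose on one CPU.
```

The gain reported below comes from a smarter use of the same processor: switching between instances at the right moments rather than splitting it evenly.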
That is whyin this section we considera less trivial distribution. Assumethat we have a non-deterministic algorithm with a performanceprofile expressed by a linear combinationof two normal distributions with the same deviation. Namely, f(t) = 0.SN(al,crl) + 0.SN(~s,~rs), where #1 o’1 = o’~ = 0.5, and the probability of success is p = 0.8. Figure 2 showsthe averagetime required to reach the goal as a function of #s (the results have been averagedover 10000 simulated problemsgenerated per #s accordingly to f(t)). The results show, that up to a certain point the optimal strategy is only slightly better than the one-after-another strategy, but after that point it becomesbetter than bothintuitive strategies. Optimalschedules for this experimentcontain a single switch point. . ...... 60 5O 2O 10 0 12 8 1 Mean valueof Ihe second peak 1 ’16 I 3 I 4 I 5 I I 6 7 Number of peaks I 8 I 9 10 Figure3: Multimodal distribution: averagetime as a function of k. Optimal strategy for two deterministic algorithms The first two experimentshave generated schedules that are rather intuitive (although they have beenfound in an optimal way). In this experimentwe want to demonstratea very simple settings, wherethe resulting schedulewill be non-trivial. Let us consider a simulation of the first examplefrom the introduction. Assumethat we have two learning systems, both having an exponential-like performance profile which is typical for such systems. Wealso assumethat one of the systems requires a delay for preprocessing but worksfaster. Thus, we assumethat the first systemhas a distribution density fl(t) Ale-:~t, and th e se cond one ha s a density fs(t) = Ase-A2(t-t2), such that A1 < As (the second is faster), and t2 > 0 (it also has a delay). Assumethat both learning systems are deterministic over a given set of examples, and that they mayfail to learn the concept with a probability 1 - Pz -- 1 - Ps = 0.5. 
It turns out, that, for example, for the values A1-- 3, As-- 10, and t 1 --- 5, the optimalschedulewouldbe to apply the first systemfor 1.15136time units, then (if it found no solution) to apply the secondsystemfor 5.77652 time units, then the first system will run for additional 3.22276 time units, and finally the second system will run for 0.53572 time units. If at this point no solution has been found, both systems have failed with a probability of 1 - 10-6 each. Figure 4 shows the dependencyof the expected time as a function ofts for p = 0.8. As before, the results have been averaged over 10000 simulated examples. Wecan see that in this case running both of the learning systems decreases the average time even while running on a single processor. .. y"" 6 ........................................................... ...................................................................................................... ~30 ’ optln~al -one-after-another simultaneous ........ 4 optln~al -one-alter-al~other ....... s m.~.lt~aou~....... "" 18 Figure2: Bimodal distribution: averagetimeas a functionof/~z. Optimal strategy for algorithms with a multimodal density function The goal of the second experiment was to demonstrate how the numberof peaks of the density function affects the solution. Weconsider a case of partial uniform distribution, wherethe density is distributed over k identical peaks of length 1 placed symmetrically in the time interval from 0 to 100 (thus, the density function will be equal to 1/k when t belongs to one of such peaks, and 0 otherwise). In this experiment we have chosen p = 1. As before, we have generated 10000examplesper k. Figure 3 shows the dependency of the average time from k. Wecan see fromthe results, that the one-after-another strategy requires approximately50 time units, which is the average of the distribution. Simultaneous launch works much worsein this case, due to the "valleys" in the distribution function. 
The optimal strategy returns schedules where the processes switch after each peak, and it outperforms both of the trivial strategies.

Figure 4: Learning systems: average time as a function of t2.

Conclusions and future directions

In this work we present a theoretical framework for optimal scheduling of parallel anytime algorithms. We analyze the properties of optimal schedules for the suspend-resume model, show an extension of the framework to intensity control, and provide an algorithm for building optimal schedules when the processes share resources (e.g., running on the same processor). The information required for building optimal schedules is restricted to performance profiles. Experiments show that using optimal schedules speeds up the processes (even when running on the same processor), thus providing a super-linear speedup. Similar schemes can be applied to more elaborate setups:
• Scheduling a system of n anytime algorithms, where the overall utility of the system is defined as the maximal utility of its components;
• Scheduling with non-zero rescheduling costs;
• Providing dynamic scheduling algorithms able to handle changes in the environment;
• Building effective algorithms for the case of multiprocessor systems.
These directions are the subject of ongoing research. We believe that a framework such as this can be very helpful for developing more effective algorithms merely through statistical analysis and parallelization of existing ones.

References

Boddy, M., and Dean, T. 1994. Decision-theoretic deliberation scheduling for problem solving in time-constrained environments. Artificial Intelligence 67(2):245-286.

Clearwater, S. H.; Hogg, T.; and Huberman, B. A. 1992. Cooperative problem solving. In Bernardo, H., ed., Computation: The Micro and Macro View. Singapore: World Scientific. 33-70.

Dean, T., and Boddy, M. 1988.
An analysis of time-dependent planning. In Proceedings of the Seventh National Conference on Artificial Intelligence, 49-54.

Finkelstein, L., and Markovitch, S. 2001. Optimal schedules for monitoring anytime algorithms. Artificial Intelligence 126:63-108.

Horvitz, E. J. 1987. Reasoning about beliefs and actions under computational resource constraints. In Proceedings of the 1987 Workshop on Uncertainty in Artificial Intelligence.

Huberman, B. A.; Lukose, R. M.; and Hogg, T. 1997. An economic approach to hard computational problems. Science 275:51-54.

Janakiram, V. K.; Agrawal, D. P.; and Mehrotra, R. 1988. A randomized parallel backtracking algorithm. IEEE Transactions on Computers 37(12):1665-1676.

Knight, K. 1993. Are many reactive agents better than a few deliberative ones? In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, 432-437.

Korf, R. E. 1990. Real-time heuristic search. Artificial Intelligence 42:189-211.

Kumar, V., and Rao, V. N. 1987. Parallel depth-first search on multiprocessors part II: Analysis. International Journal of Parallel Programming 16(6):501-519.

Rao, V. N., and Kumar, V. 1987. Parallel depth-first search on multiprocessors part I: Implementation. International Journal of Parallel Programming 16(6):479-499.

Rao, V. N., and Kumar, V. 1992. On the efficiency of parallel backtracking. IEEE Transactions on Parallel and Distributed Systems.

Russell, S., and Wefald, E. 1991. Do the Right Thing: Studies in Limited Rationality. Cambridge, Massachusetts: The MIT Press.

Russell, S. J., and Zilberstein, S. 1991. Composing real-time systems. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence (IJCAI-91). Sydney: Morgan Kaufmann.

Simon, H. A. 1955. A behavioral model of rational choice. Quarterly Journal of Economics 69:99-118.

Simon, H. A. 1982. Models of Bounded Rationality. MIT Press.

Yokoo, M., and Kitamura, Y. 1996. Multiagent real-time-A* with selection: Introducing competition in cooperative search.
In Proceedings of the Second International Conference on Multiagent Systems (ICMAS-96), 409-416.

Zilberstein, S., and Russell, S. J. 1992. Efficient resource-bounded reasoning in AT-RALPH. In Proceedings of the First International Conference on AI Planning Systems, 260-266.

Zilberstein, S. 1993. Operational Rationality Through Compilation of Anytime Algorithms. Ph.D. Dissertation, Computer Science Division, University of California, Berkeley.