From: ICAPS-04 Proceedings. Copyright © 2004, AAAI (www.aaai.org). All rights reserved.

The Value of Consensus in Online Stochastic Scheduling

Russell Bent and Pascal Van Hentenryck
Brown University
Providence, RI 02912
{rbent,pvh}@cs.brown.edu

Abstract

This paper reconsiders online packet scheduling in computer networks, where the goal is to minimize weighted packet loss and where the arrival distributions of packets, or approximations thereof, are available for sampling. Earlier work proposed an expectation approach, which chooses the next packet to schedule by approximating the expected loss of each decision over a set of scenarios. The expectation approach was shown to significantly outperform traditional approaches ignoring stochastic information. This paper proposes a novel stochastic approach for online packet scheduling, whose key idea is to select the next packet as the one which is scheduled first most often in the optimal solutions of the scenarios. This consensus approach is shown to outperform the expectation approach significantly whenever time constraints and the problem features limit the number of scenarios that can be solved before making a decision. More importantly perhaps, the paper shows that the consensus and expectation approaches can be integrated to combine the benefits of both approaches. These novel online stochastic optimization algorithms are generic and problem-independent, they apply to other online applications as well, and they shed new light on why existing online stochastic algorithms behave well.

Introduction

Online packet scheduling is a class of important problems arising in communication networks. These problems may vary significantly in scope and difficulty, but they are often challenging since they may require online solutions to complex combinatorial optimization problems.
This paper reconsiders the online packet scheduling model from (Chang, Givan, & Chong 2000) where jobs from different classes arrive stochastically and must be scheduled within a fixed amount of time or dropped. Each job has a weight and the goal is to minimize the weighted packet loss or, alternatively, to maximize the sum of the weights of the scheduled packets. The arrival distributions of the packets, which are quite complex in our experiments, are available for sampling. The offline version of this packet scheduling problem is relatively simple and can be solved in polynomial time. However, it is sufficiently rich to demonstrate and study the practical issues arising in online optimization. Indeed, the main purpose of (Chang, Givan, & Chong 2000) in studying this problem was to demonstrate that exploiting stochastic information may provide significant benefits in online packet scheduling. They proposed an expectation approach which chooses the next packet to schedule by approximating the expected loss of each packet class over the distribution. The approximation samples the distributions to build scenarios from available and future packets. The expectation algorithm was shown to significantly outperform traditional approaches which ignore stochastic information, such as greedy heuristics and algorithms optimizing the decisions based on known packets. However, the expectation algorithm may be quite demanding computationally, since it solves each scenario for each class of packets. As a consequence, it may not scale well when the number of classes increases and it may not be applicable when decisions must be taken within strong time constraints.
This paper reconsiders online packet scheduling and proposes a consensus approach which was inspired by our earlier work on online vehicle routing, where decisions must be taken quickly and the optimization algorithms are time-consuming (Bent & Van Hentenryck 2001; 2003). At each time t, the key idea underlying the consensus approach is to solve each scenario once and to select the packet which is scheduled first the most often in the optimal solutions of the scenarios. This consensus algorithm outperforms the expectation approach significantly when time limits restrict the number of scenarios which can be solved before making the decision. Indeed, in the consensus approach, solving a scenario produces global information on all possible actions, not on a single action as in the expectation approach. Moreover, the paper demonstrates that the consensus and expectation approaches can be hybridized to preserve most of their respective advantages. In particular, the hybridization provides the quality of the expectation algorithm when the number of available scenario optimizations is large and most of the quality of the consensus approach when there is only time to solve a few scenarios. The key contributions of this paper can thus be summarized as follows:

1. it proposes a consensus approach for a class of online stochastic optimization algorithms, which (1) reduces the computational burden of the expectation method and (2) should significantly outperform it when time constraints limit the number of scenarios that can be solved in between two events;

2. it applies the algorithm to online packet scheduling and demonstrates the significant improvement in quality under strict time constraints;

3. it proposes a hybrid algorithm which almost always outperforms the expectation and consensus approaches and combines their benefits;

4.
it provides experimental justification for the consensus approach in online vehicle routing, since these applications can only solve 2-10 scenarios in between events;

5. it provides experimental evidence that consensus approaches quickly converge to their optimal solution values in terms of the number of scenarios.

The rest of the paper is organized as follows. It first formulates online packet scheduling precisely as a scheduling problem and describes the online stochastic optimization framework. The next section discusses the relationships to Partially Observable Markov Decision Processes (POMDPs), since the packet scheduling problem was initially modelled as a POMDP. The paper then describes the polynomial-time offline algorithm, the oblivious online algorithms, and the online stochastic approaches. It reports the experimental results, describes related work, and gives the conclusion and future directions.

Problem Description

The online packet scheduling problem was initially proposed by (Chang, Givan, & Chong 2000) to study the benefits of stochastic information on the quality of the schedules. Its offline version can be specified as follows. We are given a set Jobs of jobs partitioned into a set of classes C. Each job j is characterized by its weight w(j), its arrival date a(j), and its class c(j). Jobs in the same class have the same weight (but different arrival times). We are also given a schedule horizon H = [H, H] during which jobs must be scheduled. Each job j requires a single time unit to process and must be scheduled in its time window [a(j), a(j) + d], where d is the same constant for all jobs (i.e., d represents the time a job remains available to schedule). In addition, no two jobs can be scheduled at the same time and jobs that cannot be served in their time windows are dropped. The goal is to find a schedule of maximal weight, i.e., a schedule which maximizes the sum of the weights of all scheduled jobs. This is equivalent to minimizing weighted packet loss.

More formally, assume, for simplicity and without loss of generality, that there is a job scheduled at each time step of the schedule horizon. Under this assumption, a schedule is a function σ : H → Jobs which assigns a job to each time in the schedule horizon. A schedule σ is feasible if

∀ t1, t2 ∈ H : t1 ≠ t2 → σ(t1) ≠ σ(t2)
∀ t ∈ H : a(σ(t)) ≤ t ≤ a(σ(t)) + d.

The weight of a schedule σ, denoted by w(σ), is given by

w(σ) = Σ_{t ∈ H} w(σ(t)).

The goal is to find a feasible schedule σ maximizing w(σ). In the online version, jobs are not known a priori but new jobs arrive at each time step, at most one per class. As in (Chang, Givan, & Chong 2000), we assume that the arrival distribution of the jobs, or an approximation thereof, can be sampled. In other words, we assume the existence of a black box which can be queried to return samples of future job requests. Once again, the goal is to find a schedule of maximal weight. Observe that, although classes are not significant in the offline version of the problem, they are used to reduce the complexity of the expectation algorithm. Note also that, in (Chang, Givan, & Chong 2000), the arrival distribution of jobs of a given class is governed by an independent Hidden Markov Model (HMM).

Online Stochastic Optimization

As mentioned earlier, our goal is to design and evaluate various approaches to online stochastic optimization. In particular, the main focus of this paper is on determining how to use sampling in order to improve the quality of the online solutions under various time constraints. Hence the techniques considered in this paper apply to a variety of online stochastic optimization problems and are not tailored to the packet scheduling problem. They only rely on the existence of an algorithm to solve (or approximate) the offline problem and on the ability to sample the distribution. More precisely, the approaches presented herein are expressed in terms of two black boxes:

1. a function OPTIMALSCHEDULE(J, H) which, given a set J of jobs and a schedule horizon H, returns an optimal schedule of J over H;

2. a function GETSAMPLE(H) which, given a schedule horizon H, returns a set of jobs to be scheduled over H by sampling the arrival distribution.

When facing another online stochastic optimization problem, it suffices to replace, or implement, these two functions for the application at hand. Moreover, all the approaches described in this paper share the same overall structure, which is depicted in Figure 1 and expressed in general terms to emphasize its independence from the packet scheduling problem. The approaches only differ in the way they implement function CHOOSEREQUEST.

ONLINEOPTIMIZATION(H)
1  J ← ∅;
2  w ← 0;
3  for t ∈ H do
4      J ← AVAILABLEREQUESTS(J, t) ∪ NEWREQUESTS(t);
5      j ← CHOOSEREQUEST(J, t);
6      SERVEREQUEST(j, t);
7      w ← w + w(j);
8      J ← J \ {j};

Figure 1: The Generic Online Algorithm

The online optimization schema simply considers the set of available and new requests at each time step and chooses a request j which is then served and removed from the set of available requests. In the case of packet scheduling, J denotes the set of jobs to schedule, function AVAILABLEREQUESTS(J, t) returns the set of jobs that are available at time t, i.e.,

AVAILABLEREQUESTS(J, t) = {j ∈ J | t ≤ a(j) + d}

and function SERVEREQUEST(j, t) simply schedules j at time t (σ(t) ← j). Subsequent sections specify the online algorithms, i.e., their implementations of function CHOOSEREQUEST. However, we first relate this framework to POMDPs.

Relationships to POMDPs

POMDPs have been used to represent and solve a broad class of problems with inherent uncertainty and the packet scheduling problem was initially modeled as a POMDP by (Chang, Givan, & Chong 2000). It is thus appropriate to relate this work to this line of research.
There is a wide variety of approaches to solve or approximate general POMDPs in the expected sense. Of particular relevance to this work is the sampling method of (McAllester & Singh 1999) (see also (Kearns, Mansour, & Ng 1999a; 1999b)). This approach computes an expectimax function on the POMDP. In other words, it determines the action/policy that achieves the best reward in the expected sense. Informally speaking, computing this function involves a randomized tree exploration: the method tries an action, samples the distribution to find a next state, tries another action, and repeats. Of course, to evaluate the best possible expected benefit of an action a at a node, all possible subsequent actions must be tried. The best of these subsequent actions determines the best possible benefit of a. An important theoretical result of expectimax sampling is that the actual expected value of an action can be formally approximated given enough samples and a sufficient depth. However, the resulting algorithm is exponential wrt the number of actions and the depth. Indeed, as pointed out by (Chang, Givan, & Chong 2000), when a guaranteed approximation is desired, the method is intractable for many problems, including the packet scheduling problem, even when dynamic programming and other techniques are used to reduce computation. When the parameters are chosen to ensure tractability, the algorithm is no longer competitive. However, (Chang, Givan, & Chong 2000) observed that the randomness in the POMDP is independent of the actions taken. The expectimax tree can then be collapsed to a single level, where each action is evaluated against a number of scenarios obtained through sampling. Algorithm E, which is presented later in the paper, was proposed in (Chang, Givan, & Chong 2000) as a general algorithm for POMDPs with this property and applied to the packet scheduling problem to show the value of stochastic information. 
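To make the cost of the expectimax computation concrete, a depth-limited sampled expectimax can be sketched as follows. This is a generic illustration, not the exact algorithm of the cited papers, and all names and signatures are ours:

```python
def expectimax(state, depth, actions, reward, sample_next, width):
    # Best expected value achievable from `state` with `depth` decisions left.
    # For each action, the expectation is approximated by `width` samples of
    # the stochastic next state; every subsequent action must then be tried
    # recursively to evaluate the best possible benefit of that action.
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in actions(state):
        total = 0.0
        for _ in range(width):
            nxt = sample_next(state, a)  # sample the stochastic outcome of a
            total += reward(state, a) + expectimax(
                nxt, depth - 1, actions, reward, sample_next, width)
        best = max(best, total / width)
    return best
```

The recursion visits on the order of (|actions| · width)^depth nodes, which is why a guaranteed approximation is intractable for problems like packet scheduling, and why collapsing the tree to a single level of scenarios is so attractive when the randomness is independent of the actions.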
The new algorithms presented herein would apply to the same class of POMDPs and may be of interest to other classes of POMDPs as well.

Offline Scheduling Algorithm

An interesting aspect of the offline scheduling problem is that it is solvable in polynomial time (Chang, Givan, & Chong 2000). The quadratic algorithm is loosely based upon earlier work in (Villareal & Bulfin 1983). Although a detailed understanding of the algorithm is not necessary for the rest of the paper, Figure 2 gives the pseudo-code for the optimal algorithm for completeness. The algorithm builds the optimal schedule σ from right to left by considering each job successively, starting with the job of highest weight and, in case of ties, of latest arrival date (line 2). For each job j, the algorithm finds the right-most time t before the deadline a(j) + d where j can be scheduled (line 3). If such a time t exists, there are two possibilities. If t ≥ a(j), then the algorithm simply schedules j at time t (line 6). Otherwise, the algorithm tries to shuffle the schedule to accommodate j after its arrival date by shifting existing jobs left (line 7). The shuffling successively swaps job j (initially scheduled at time t) with jobs on its right until j is scheduled after a(j) or no such swap is feasible. Once a schedule is built, the algorithm applies a postprocessing step to make sure that jobs with highest weights and, in case of ties, earliest arrival dates, are scheduled as early as possible (line 8). This postprocessing step is slightly more specific than in (Chang, Givan, & Chong 2000) to ensure that the offline schedule is most appropriate for an online use. Note that, in Figure 2, we use σ(t) = ⊥ to denote that no job is scheduled at time t at this stage. Moreover, FEASIBLESWAP(σ, t1, t2) holds if

a(j1) ≤ t2 ≤ a(j1) + d ∧ a(j2) ≤ t1 ≤ a(j2) + d,

where j1 = σ(t1) and j2 = σ(t2). We also use min S (resp. max S) to denote the minimum (resp. maximum) of a set S.
By convention, if S is empty, both expressions return ⊥.

Online Oblivious Algorithms

This section describes two existing online algorithms which do not use stochastic information.

Greedy (G): This heuristic schedules the job with highest weight and, in case of ties, the job with the earliest deadline. (Chang, Givan, & Chong 2000) refers to this heuristic as SP. It can be specified formally as

CHOOSEREQUEST-G(J, t)
1  σ ← OPTIMALSCHEDULE(J, [t, t]);
2  return σ(t);

Local Optimal (LO): This algorithm chooses the next job to schedule at time t by finding the optimal solution for the available jobs at t. The first job in the optimal solution is selected. (Chang, Givan, & Chong 2000) refers to this algorithm as CM. It can be specified formally as

CHOOSEREQUEST-LO(J, t)
1  σ ← OPTIMALSCHEDULE(J, [t, t + d]);
2  return σ(t);

Note that the optimal schedule cannot extend beyond time t + d by definition of the time windows.

OPTIMALSCHEDULE(Jobs, H)
1  σ(t) ← ⊥ (t ∈ H);
2  for j ∈ Jobs ordered by decreasing ⟨w(j), a(j)⟩ do
3      p ← max{t ∈ H | t ≤ a(j) + d & σ(t) = ⊥};
4      if p ≠ ⊥ then
5          if p ≥ a(j) then
6              σ(p) ← j;
7          else σ ← SHUFFLE(σ, j, p);
8  POSTPROCESS(σ, Jobs, H);
9  return σ;

SHUFFLE(σ, j, p)
1  σ′ ← σ;
2  σ(p) ← j;
3  while p < a(j) do
4      q ← min{t | p + 1 ≤ t ≤ p + d & a(σ(t)) ≤ p};
5      if q ≠ ⊥ then
6          SWAP(σ, p, q);
7          p ← q;
8      else return σ′;
9  return σ;

POSTPROCESS(σ, H)
1  for p ∈ H, q ∈ H : p < q do
2      if FEASIBLESWAP(σ, p, q) &
3         ⟨w(σ(p)), a(σ(p))⟩ < ⟨w(σ(q)), a(σ(q))⟩ then
4          SWAP(σ, p, q);

Figure 2: The Optimal Packet Scheduling Algorithm

Online Stochastic Algorithms

We now consider a variety of online approaches using stochastic information to make more informed decisions. We assume that each approach has the time to execute the offline algorithm nbOpt times for samples which extend ∆ time steps in the future. Both nbOpt and ∆ are parameters of the online stochastic algorithms.
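For concreteness, the right-to-left construction of Figure 2 (minus the postprocessing step) might be sketched in Python as follows. The job representation as (weight, arrival) pairs and all function names are ours, not the paper's:

```python
def optimal_schedule(jobs, horizon, d):
    # Build the schedule from right to left (Figure 2, postprocessing omitted).
    # jobs: list of (weight, arrival) pairs; horizon: iterable of time slots.
    sigma = {t: None for t in horizon}
    # highest weight first; ties broken by latest arrival date
    for job in sorted(jobs, key=lambda j: (j[0], j[1]), reverse=True):
        w, a = job
        free = [t for t in sigma if t <= a + d and sigma[t] is None]
        if not free:
            continue                       # job is dropped
        p = max(free)                      # right-most feasible free slot
        if p >= a:
            sigma[p] = job
        else:
            sigma = shuffle(sigma, job, p, d)
    return sigma

def shuffle(sigma, job, p, d):
    # Swap `job` rightward with earlier-arriving jobs until it sits at or
    # after its arrival date; undo everything if no feasible swap exists.
    backup = dict(sigma)
    sigma[p] = job
    a = job[1]
    while p < a:
        cands = [t for t in sigma
                 if p + 1 <= t <= p + d
                 and sigma[t] is not None and sigma[t][1] <= p]
        if not cands:
            return backup                  # no feasible swap: revert
        q = min(cands)
        sigma[p], sigma[q] = sigma[q], sigma[p]
        p = q
    return sigma
```

On the instance jobs = [(10, 0), (9, 0), (8, 2)] with horizon range(3) and d = 2, the shuffle step shifts the two weight-10 and weight-9 jobs left so that the late-arriving job fits at time 2, yielding the full weight 27.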
Note that function OPTIMALSCHEDULE is quadratic in the schedule horizon in the worst case.

Expectation (E): This is the primary method proposed by (Chang, Givan, & Chong 2000). It is referred to as SPM in that paper. Informally speaking, the method generates future jobs by sampling and evaluates each possible selection against that sample. A naive implementation can be specified as follows:

CHOOSEREQUEST-NE(J, t)
1  for j ∈ J do
2      f(j) ← 0;
3  for i ← 1 . . . nbOpt/|J| do
4      R ← J ∪ GETSAMPLE([t + 1, t + ∆]);
5      T ← [t + 1, t + 1 + ∆ + d];
6      for j ∈ J do
7          σ ← OPTIMALSCHEDULE(R \ {j}, T);
8          f(j) ← f(j) + w(j) + w(σ);
9  return argmax(j ∈ J) f(j);

Lines 1-2 initialize the evaluation function f(j) for each job j. The algorithm then generates a number of samples for future requests (lines 3-4). For each such sample, it computes the set R of all available and sampled jobs at time t (line 4) and an appropriate time horizon for the scenarios (line 5). The algorithm then considers each available job j successively: it implicitly schedules j at time t and applies the optimal offline algorithm (line 7) using R \ {j} and the time horizon. The evaluation of job j is updated in line 8 by incrementing it with its weight and the score of the corresponding optimal offline schedule σ. All scenarios are evaluated for all jobs and the algorithm then returns the job j ∈ J with the highest evaluation.

The above implementation is very inefficient when applied to the packet scheduling problem. Indeed, the algorithm can be considerably improved by considering only one job per class (i.e., the one with the earliest arrival date), since other choices are guaranteed not to produce better schedules. These jobs are called the representative jobs in the following. The representative job with the highest evaluation is then chosen. This reduces the number of iterations of the inner loop from |J| to |C|, which is typically much smaller.
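Once the offline solver and the sampler are abstracted, the naive expectation rule just described (and the consensus rule introduced in the next section) each reduce to a few lines. The following Python sketch uses our own names: `solve` stands in for OPTIMALSCHEDULE and returns the value of an optimal offline schedule, `solve_first` returns the job it schedules first, and jobs are (weight, id) pairs:

```python
from collections import Counter

def choose_request_expectation(J, t, solve, sample, nb_opt, delta):
    # Algorithm E (naive form): evaluate every candidate job against the
    # sampled scenarios; nb_opt optimizations are split among the |J| jobs.
    f = {j: 0.0 for j in J}
    for _ in range(max(1, nb_opt // len(J))):
        scenario = set(J) | sample(t + 1, t + delta)
        for j in J:
            # schedule j now, then solve the rest of the scenario offline
            f[j] += j[0] + solve(scenario - {j}, t + 1, delta)
    return max(J, key=lambda j: f[j])

def choose_request_consensus(J, t, solve_first, sample, nb_opt, delta):
    # Algorithm C: solve each scenario once, count how often each job is
    # scheduled first, and pick the job with the largest count.
    counts = Counter()
    for _ in range(nb_opt):
        scenario = set(J) | sample(t + 1, t + delta)
        counts[solve_first(scenario, t, delta)] += 1
    return max(J, key=lambda j: counts[j])
```

The structural difference is visible directly: a single scenario optimization in the consensus rule produces information about every candidate at once, while the expectation rule must solve each scenario once per candidate.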
The implementation of E can be specified as follows:

CHOOSEREQUEST-E(J, t)
1  for c ∈ C do
2      j(c) ← argmin(j ∈ J : c(j) = c) a(j);
3      f(j(c)) ← 0;
4  for i ← 1 . . . nbOpt/|C| do
5      R ← J ∪ GETSAMPLE([t + 1, t + ∆]);
6      T ← [t + 1, t + 1 + ∆ + d];
7      for c ∈ C do
8          σ ← OPTIMALSCHEDULE(R \ {j(c)}, T);
9          f(j(c)) ← f(j(c)) + w(j(c)) + w(σ);
10 c∗ ← argmax(c ∈ C) f(j(c));
11 return j(c∗);

Lines 1-3 initialize the representative jobs j(c) for each class c and their evaluations f(j(c)). The algorithm then proceeds as before but only considers representative jobs.¹

¹ It is possible to design an alternative sampling method that utilizes independent samples for each request. However, empirical studies in (Chang, Givan, & Chong 2000) and by these authors on this problem show this approach is less successful in general.

Algorithm E partitions the number of available optimizations among the various job classes. As a consequence, every class of jobs is evaluated against nbOpt/|C| scenarios. This is a significant drawback when time is limited (nbOpt is small) and/or when the number of requests is large. This is precisely why online vehicle routing algorithms (Bent & Van Hentenryck 2001; 2003) do not use this method, since the number of requests is very large (about 50 to 100), time between decisions is limited, and the optimization algorithm is demanding computationally.

Consensus (C): We now turn to a novel online algorithm. Algorithm C uses stochastic information in a fundamentally different way and can be viewed as an abstraction of the sampling method used in online vehicle routing with stochastic customers (Bent & Van Hentenryck 2001; 2003). Instead of evaluating each possible request at time t with respect to each sample, the key idea underlying algorithm C is to execute the optimization algorithm on the available and sampled jobs and to count the number of times a job is scheduled at time t in each resulting solution. The job with the highest count is selected. The motivation behind algorithm C is the least-commitment heuristic (Stefik 1981) often used in planning and scheduling. By choosing a job which is selected first the most often, algorithm C takes a decision which is consistent with the optimal solutions of many samples. Algorithm C can be specified as follows:

CHOOSEREQUEST-C(J, t)
1  for j ∈ J do
2      c(j) ← 0;
3  for i ← 1 . . . nbOpt do
4      R ← J ∪ GETSAMPLE([t + 1, t + ∆]);
5      σ ← OPTIMALSCHEDULE(R, [t, t + ∆ + d]);
6      c(σ(t)) ← c(σ(t)) + 1;
7  return argmax(j ∈ J) c(j);

Observe line 5, which calls the offline algorithm with all available and sampled jobs and a schedule horizon starting at t, and line 6, which increments the number of times job σ(t) is scheduled first. Line 7 simply returns the job with the largest count. Algorithm C has several appealing features. First, it does not partition the available optimizations between the job classes, which is a significant advantage when the number of available optimizations is small and/or when the number of classes is large. Second, it avoids the conceptual complexity of dealing with classes of jobs. The main strength of algorithm C is to identify (quickly and implicitly) the most promising requests. Its limitation is that it ignores the score of the schedules and may fail to discriminate between requests with similar or close consensus scores but different weights.

Consensus+Expectation (C+Ek): Algorithms E and C both have advantages and limitations. Algorithm C+Ek attempts to combine their strengths while minimizing their drawbacks. Its key idea is to run algorithm C first to identify a small set of k promising requests and to run the naive version of E to discriminate between them in a precise fashion. Letting nbOpt = Sc + kSe, the algorithm can be specified as follows:

CHOOSEREQUEST-C+Ek(J, t)
1  for j ∈ J do
2      c(j) ← 0;
3  for i ← 1 . . . Sc do
4      R ← J ∪ GETSAMPLE([t + 1, t + ∆]);
5      σ ← OPTIMALSCHEDULE(R, [t, t + ∆ + d]);
6      c(σ(t)) ← c(σ(t)) + 1;
7  P ← argmax(k)(j ∈ J) c(j);
8  for j ∈ P do
9      f(j) ← 0;
10 for i ← 1 . . . Se do
11     R ← P ∪ GETSAMPLE([t + 1, t + ∆]);
12     T ← [t + 1, t + 1 + ∆ + d];
13     for j ∈ P do
14         σ ← OPTIMALSCHEDULE(R \ {j}, T);
15         f(j) ← f(j) + w(j) + w(σ);
16 return argmax(j ∈ P) f(j);

Observe line 7, which collects in P the k jobs with the largest counts. The rest of the algorithm simply applies algorithm E on the jobs in P. Algorithm C+Ek applies OPTIMALSCHEDULE Sc + kSe = nbOpt times.

Low Weight  Rank    Medium Weight  Rank    High Weight  Rank
     5        1          600         2         1000        3
    10        4          800         5         2000        6
    20        7          500         8         1500        9
     1       10          700        11         1200       12
    15       13          750        14         3000       15
    50       16          550        17         2500       18
    30       19          650        20         3500       21
    25       22          400        23         4000       24
    40       25          450        26         5000       27
     3       28          575        29         4500       30

Table 1: Job Classes

Experimental Results

This section compares the various approaches on the packet scheduling problems. More precisely, it evaluates the effect of various choices of parameters on the quality of the schedule. In practice, these parameters depend on the specifics of the application (e.g., the available time to make a decision) and the problem features (e.g., the number of classes and the time windows). The experimental results are based on the problems of (Chang, Givan, & Chong 2000). In these problems, the arrival distributions are specified by independent HMMs, one for each job class. Each HMM has three states which correspond to low, medium, and high job arrival rates. The transitions between the states are drawn uniformly in the interval [0.9, 1.0]. In the low/medium/high arrival rate state, the probability of generating a job is drawn uniformly from the intervals [0.0, 0.01], [0.2, 0.5], and [0.7, 1.0], respectively.
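The arrival model just described can be sketched as follows. The exact transition structure of the HMMs is not spelled out here, so the self-transition-plus-uniform-switch scheme below is our assumption, as are all names:

```python
import random

LOW, MED, HIGH = 0, 1, 2

def make_hmm(rng):
    # Per-state parameters drawn as described in the text: transition
    # (here modeled as self-transition) probabilities in [0.9, 1.0], and
    # per-state job-generation probabilities in the three given intervals.
    stay = [rng.uniform(0.9, 1.0) for _ in range(3)]
    emit = [rng.uniform(lo, hi) for lo, hi in
            [(0.0, 0.01), (0.2, 0.5), (0.7, 1.0)]]
    return stay, emit

def sample_arrivals(hmm, steps, rng):
    # Simulate one job class for `steps` time units; return arrival times
    # (at most one job per class per time step, as in the model).
    stay, emit = hmm
    state, arrivals = LOW, []
    for t in range(steps):
        if rng.random() < emit[state]:
            arrivals.append(t)
        if rng.random() >= stay[state]:  # leave the current state
            state = rng.choice([s for s in (LOW, MED, HIGH) if s != state])
    return arrivals
```

A GETSAMPLE implementation would run one such sampler per job class over the sampling horizon and merge the resulting arrival streams.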
To create problems that are challenging enough, it is important to normalize the HMMs. First, each set of classes (low, medium, and high) should have roughly the same number of jobs generated in the long run. Thus, each HMM is simulated for a long period of time and the job generation probabilities of the HMMs are normalized such that E[low] ≈ E[medium] ≈ E[high]. Furthermore, too few and too many jobs make the problems easy, so the job generation probabilities are further normalized so that roughly 1.5 tasks arrive at each time unit.

The classes of jobs are also divided in three categories corresponding to low, medium, and high weights. Table 1 depicts the weights of the various job classes. A rank number is also associated with each class to show which weights match which classes. A problem with k classes includes the classes with rank numbers 1..k. For example, problems with 7 classes use the classes with ranks 1..7 and weights (5, 10, 20, 600, 800, 1000, 2000). These are exactly the same problems as those in (Chang, Givan, & Chong 2000).

Figure 3: Effect of the Number of Optimizations (nbOpt)
Figure 4: Optimizations per Actions wrt the Number of Optimizations (nbOpt)
Figure 5: Number of Promising Actions wrt the Number of Optimizations (nbOpt)

All experimental results are given for an online schedule consisting of 200,000 time steps. It is thus unrealistic to sample the future for so many steps (since the algorithm is quadratic in the time horizon).
In general, the sampling horizon ∆ (which corresponds to the depth of sampling in the POMDP algorithms) is chosen to be 50 (which seems to be an excellent compromise between time and quality when d = 20). The effect of the sampling horizon is also analyzed in detail.

Effect of the Number of Optimizations

Figure 3 compares the various approaches on the 7-class problem from (Chang, Givan, & Chong 2000) in terms of the number of offline optimizations. In these problems, the time window size d is 20 and the sampling horizon ∆ is 50. Since these problems have a small number of classes, they should be favorable to algorithm E (e.g., when compared to vehicle routing problems, which have many more actions and which are more demanding computationally). The results depict the packet loss for each of the algorithms as a function of the number of optimizations (i.e., small values denote a better algorithm), as well as the average offline solution, which should be viewed as an (unachievable) lower bound. These offline solutions are computed a posteriori (when the exact data is available) and are obtained by applying the offline algorithm.

The results are interesting in many ways. First, they clearly indicate the value of stochastic information, since algorithms E, C, and C+E3 are clearly superior to the oblivious algorithms. Second, observe that algorithm LO is dominated by the greedy heuristic. This confirms once again that local optimization may not be a good approach to online problems, since it may lead to solutions that cannot accommodate future requests easily (Bent & Van Hentenryck 2001; 2003). Finally, and perhaps most importantly, the results demonstrate the value of consensus on the packet scheduling problems. In particular, C is the best method when only a few scenarios can be solved before decision time, and C+Ek always dominates E, significantly so when only a few scenarios can be solved.
It is also worth pointing out that the consensus approach reaches its best value very quickly. In other words, it does not need many samples to identify the most promising packets to schedule.

Figure 4 depicts the average number of actions considered with respect to the number of scenarios (and the number of scenarios available per action) for algorithm E. The number of actions may vary because E's decisions may differ with different numbers of scenarios. The results indicate that there are about 5 classes of packets to consider at each time step for all values of nbOpt. The average number of classes to consider is slightly lower when there are few scenarios to work with. Figure 5 depicts the number of promising actions suggested by algorithm C as the number of scenarios increases, i.e., the number of packet classes which are scheduled first at time t. The figure indicates that there are at most 3 classes to consider, justifying why there is no need to go beyond algorithm C+E3 on average on these problems.

Effect of the Sampling Horizon

Figures 6, 7, and 8 evaluate the effect of the sampling horizon ∆. They consider the 7-class problem with d = 20 and nbOpt = 20, and evaluate the effect of varying ∆.

Figure 6: Effect of Sample Horizon on the Weighted Loss
Figure 7: Samples per Actions wrt the Sampling Horizon
Figure 9: Effect of the Number of Actions

Figure 6 shows that a reasonable sampling horizon is critical for E. For low sampling horizons, E is worse than the oblivious algorithms and it only becomes comparable to C when ∆ = 20.
More generally, our results indicate that E needs ∆ to be no smaller than d to perform reasonably. Note also that there is little change after ∆ = 50 and that C+E3 is very robust wrt the sampling horizon. Figure 7 shows the average number of actions for E as a function of the sampling horizon. Note how the number of actions sharply decreases initially when the sampling horizon is increased from 0 to 20 and then 40 time steps. Similarly, Figure 8 depicts the number of promising actions identified by C. Once again, the number of actions sharply increases when ∆ moves from 0 to 20 and then 40. It is basically stable after that. Once again, these results indicate the value of consensus, which makes the approach much less sensitive to the sampling horizon and always outperforms the oblivious algorithms. This is not the case for algorithm E, which exhibits pathological behavior for low sampling horizons.

Figure 8: Number of Promising Actions wrt the Sampling Horizon

Effect of the Number of Classes

Figure 9 depicts the effect of the number of packet classes on the behaviour of the algorithms. The stochastic algorithms are given a fixed number of optimizations (nbOpt = 30) and the sampling horizon is 50. The figure describes the quality of the algorithms (i.e., the weighted loss) as the number of classes grows and depicts their average weighted loss as a percentage over the theoretical offline optimum. The number of optimizations (nbOpt = 30) was chosen so that algorithm E has at least one sample per action (i.e., when there are 30 classes). The results show that, as the number of classes increases, algorithm C gets closer to algorithm E and outperforms it as soon as the number of classes reaches 20. Note also that algorithm C+E3 outperforms C and E, but the distance wrt C decreases as the number of classes increases.
Once again, these results shed light on when the consensus approach (and its hybridization with the expectation approach) is beneficial. They indicate that the benefits of the consensus approach become more dramatic as the number of classes increases. Note also that algorithm C remains between 20 and 25% from the optimum, while algorithm E starts at about 10% off the optimum but its quality quickly degrades to 40% or more as the number of classes increases.

Related Work

Online algorithms have been studied in computer science for many years; a good overview of these problems and of techniques for solving them can be found in (Fiat & Woeginger 1998). Research in online algorithms has traditionally focused on techniques that are oblivious to any information about the future, and many results are concerned with the worst case in the form of the competitive ratio (Karlin et al. 1988). Only recently have researchers begun to address online problems where information about future uncertainty is known, and to study how such information may improve the performance of algorithms. This includes scheduling problems (Chang, Givan, & Chong 2000), vehicle routing problems (Bent & Van Hentenryck 2003; 2001; Campbell & Savelsbergh 2002), and elevator dispatching (Nikovski & Branch 2003), to name a few. Research on these problems has varied widely, but the unifying theme is that knowing probabilistic information about the future significantly increases performance. The consensus approach proposed in this paper was motivated by the online stochastic vehicle routing algorithms of (Bent & Van Hentenryck 2003; 2001). In these algorithms, routing plans are generated continuously during plan execution; they are ranked by a consensus function and the chosen plan is used to make the departure decisions. In these routing problems, only 2 to 10 scenarios can be solved between two events and the number of events is large (e.g., 50 different requests).
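The consensus ranking discussed throughout this paper, and its hybridization with expectation in C+Ek, can be captured in a small sketch. This is a toy Python fragment with assumed interfaces (`solve_offline`, `loss_if_first`), not the authors' implementation:

```python
from collections import Counter

def consensus_decision(scenarios, solve_offline):
    """Consensus (C): each scenario's optimal schedule casts one vote
    for the class it serves first; the most-voted class is scheduled
    online.  solve_offline(scenario) is an assumed interface returning
    an optimal schedule as a list of classes, earliest first."""
    votes = Counter(solve_offline(s)[0] for s in scenarios)
    return votes.most_common(1)[0][0]

def hybrid_decision(scenarios, solve_offline, loss_if_first, k):
    """C+Ek: consensus votes pre-select the k most promising classes,
    and the expectation step is applied only to those candidates."""
    votes = Counter(solve_offline(s)[0] for s in scenarios)
    candidates = [c for c, _ in votes.most_common(k)]
    return min(candidates,
               key=lambda c: sum(loss_if_first(c, s) for s in scenarios))
```

The sketch makes the cost difference explicit: C uses one optimization per scenario regardless of the number of classes, while C+Ek bounds the expectation step at k candidates instead of evaluating every class.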
Conclusion

This paper reconsidered the online packet scheduling problem in computer networks proposed in (Chang, Givan, & Chong 2000) and made a number of contributions. First, it proposed a consensus approach to sampling and showed experimentally that the consensus approach outperforms the expectation approach of (Chang, Givan, & Chong 2000) when the number of scenarios that can be solved at decision time is small. Second, it showed how the consensus and expectation approaches may be hybridized to produce an effective algorithm outperforming both approaches most of the time; the hybrid algorithm uses the consensus approach to select the most promising actions, which are then evaluated using the expectation approach. More generally, the paper confirms the value of consensus in online stochastic optimization. It indicates that consensus is very effective whenever the number of scenarios that can be solved before decision time is small, which occurs in many practical problems and justifies the design of the online vehicle routing algorithms of (Bent & Van Hentenryck 2003; 2001). It also indicates that the consensus methodology complements the expectation approach even when large numbers of optimizations are available.

There are numerous directions for future work. First, it would be interesting to scale these algorithms to packet scheduling with parallel routers, as well as to more complicated problems such as online vehicle routing and job-shop scheduling. Second, these algorithms depend to a certain degree on the sampling accuracy. It would be interesting to integrate learning techniques to approximate the underlying distributions of the job arrivals (and perhaps to change the distributions over time), as well as to determine the robustness of the approaches and how well they adapt to changes.
Finally, it would be interesting to determine whether the consensus approach, and its hybridization with the expectation algorithm, are relevant to, and bring benefits for, general POMDPs.

Acknowledgments

Many thanks to the anonymous reviewers for some excellent remarks and suggestions. This work was partially supported by NSF ITR Awards DMI-0121495 and ACI-0121497.

References

Bent, R., and Van Hentenryck, P. 2001. Scenario-Based Planning for Partially Dynamic Vehicle Routing Problems with Stochastic Customers. Operations Research. (to appear).

Bent, R., and Van Hentenryck, P. 2003. Dynamic Vehicle Routing with Stochastic Requests. International Joint Conference on Artificial Intelligence (IJCAI'03).

Campbell, A., and Savelsbergh, M. 2002. Decision Support for Consumer Direct Grocery Initiatives. Report TLI-0209, Georgia Institute of Technology.

Chang, H.; Givan, R.; and Chong, E. 2000. On-line Scheduling Via Sampling. Artificial Intelligence Planning and Scheduling (AIPS'00), 62–71.

Fiat, A., and Woeginger, G. 1998. Online Algorithms: The State of the Art. Springer-Verlag.

Karlin, A.; Manasse, M.; Rudolph, L.; and Sleator, D. 1988. Competitive Snoopy Caching. Algorithmica 3:79–119.

Kearns, M.; Mansour, Y.; and Ng, A. 1999a. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes. International Joint Conference on Artificial Intelligence (IJCAI'99).

Kearns, M.; Mansour, Y.; and Ng, A. 1999b. Approximate Planning in Large POMDPs via Reusable Trajectories. Advances in Neural Information Processing Systems.

McAllester, D., and Singh, S. 1999. Approximate Planning for Factored POMDPs Using Belief State Simplification. Uncertainty in AI (UAI'99).

Nikovski, D., and Branch, M. 2003. Marginalizing Out Future Passengers in Group Elevator Control. Uncertainty in Artificial Intelligence (UAI'03).

Stefik, M. 1981. Planning with Constraints (MOLGEN: Part 1). Artificial Intelligence 16:111–139.

Villareal, F., and Bulfin, R. 1983.
Scheduling a Single Machine to Minimize the Weighted Number of Tardy Jobs. IIE Transactions 337–343.