Market-like Task Scheduling in Distributed Computing Environments

Thomas W. Malone
Richard E. Fikes
Kenneth R. Grant
Michael T. Howard

Massachusetts Institute of Technology
50 Memorial Drive
Cambridge, Massachusetts 02139

March 1986

CISR WP No. 139
Sloan WP No. 1785-86

(c) 1986 T.W. Malone, R.E. Fikes, K.R. Grant, M.T. Howard

Center for Information Systems Research
Sloan School of Management
Massachusetts Institute of Technology

Abstract

This paper focuses on a class of market-like methods for decentralized scheduling of tasks in distributed computing networks. In these methods, processors send out "requests for bids" on tasks to be done, and other processors respond with "bids" giving estimated completion times that can reflect such factors as machine speed and data availability. A simple and general protocol for such scheduling is described and simulated in a wide variety of situations (e.g., different network configurations, system loads, and message delay times). The protocol is found to provide substantial performance improvements over processing tasks on the machines at which they originate, even in the face of relatively large message delays and relatively inaccurate estimates of processing times. The protocol also performs well in comparison to one simpler and one more complex alternative. In the final section of the paper, a prototype system is described that uses this protocol for sharing tasks among personal workstations on a local area network.
Market-like Task Scheduling in Distributed Computing Environments

With the rapid spread of personal computer networks and the increasing availability of low cost VLSI processors, the opportunities for massive use of parallel and distributed computing are becoming more and more compelling ([Jon80]; [Gaj85]; [Dav81b]; [Ens81]; [Ber82]; [Bir82]). One of the fundamental problems that must be solved by all such systems is the problem of how to schedule tasks on processors. This is, of course, a well-known problem in traditional operating systems and scheduling theory, and there are a number of mathematical and software techniques for solving it in both single and multi-processor systems (e.g., [Bri73]; [Cof73]; [Con67]; [Jon80]; [Kri71]; [Lam68]; [Kle81]; [Wit80]; [Vnt81]). Almost all the traditional work in this area, however, deals with centralized scheduling techniques, where all the information is brought to one place and the decisions are made there. In highly parallel systems, where the information used in scheduling and the resulting actions are distributed over a number of different processors, there may be substantial benefits from developing decentralized scheduling techniques. For example, when a centralized scheduler fails, the entire system is brought to a halt, but systems that use decentralized scheduling techniques can continue to operate with the remaining nodes. Furthermore, much of the information used in scheduling is inherently distributed and rapidly changing (e.g., momentary system load). Thus, decentralized scheduling techniques can "bring the decisions to the information" rather than having to constantly transmit the information to a centralized decision maker.
Because of these advantages, a growing body of recent work has begun to explore such decentralized scheduling techniques in more detail (e.g., [Sta85]; [Sto77]; [Sto78]; [Cho79]; [Bry81]; [Sta84]; [Sin85]; [Liv82]; [Mal83]; [Lar82]; [Smi80]; [Ten81a]; [Ten81b]; [Str81]; [SRC85]; see Stankovic [Sta85] for a useful review). In this paper, we focus on a particular class of decentralized scheduling techniques: those involving market-like "bidding" mechanisms to assign tasks to processors (e.g., [Mal83]; [Far72]; [Far73]; [Smi80]; [Sin85]; [Sta84]). Such techniques are remarkably flexible in terms of the kinds of factors they can take into account: job characteristics, processor capacities and speeds, current network loading, and the current locations of data and related tasks (e.g., see [Sta84]). In succeeding sections of the paper, we will describe a simple but powerful bidding protocol and present detailed simulation and analytic results that explore the behavior of the protocol in a wide variety of situations. These results apply to many forms of parallel computation, regardless of whether or not the processors are geographically separated and whether or not they share memory.

Motivating example

Even though the simulation results are applicable in many situations, the example that motivated the development and analysis of our protocol was the increasingly common situation of large numbers of personal workstations connected by local area networks (e.g., [Bir82]; [Bog80]). In the final section of the paper, we describe a prototype system, called Enterprise, that uses the protocol to share tasks among workstations in such a network. One of the benefits of sharing tasks in such networks is that a new philosophy for designing distributed systems becomes possible. The traditional philosophy used in designing systems based on local area networks is to have dedicated personal workstations, which remain idle when not used by their owners, and dedicated special purpose servers such as file servers, print servers, and various kinds of data base servers (e.g., [Bir82]; [Sch84]). A system like Enterprise, that schedules tasks on the best processor available at run time (either remote or local), enables a more flexible design.
In this new philosophy, personal workstations are still dedicated to their owners, but during the (often substantial) periods of the day when their owners are not using them, these personal workstations become general purpose servers, available to other users on the network. "Server" functions can migrate and replicate as needed on otherwise unused machines (except for those, such as file servers and print servers, that are required to run on specific machines). Thus programs can be written to take advantage of the maximum amount of processing power and parallelism available on a network at any time, with little extra cost when there are few extra machines available.

Problem description and related work

In describing the problem on which we are focusing, it is useful, first of all, to distinguish between two components of task scheduling in distributed networks: (1) the assignment of tasks to processors, and (2) the sequencing of tasks once they have been assigned to processors. Much previous work on distributed load sharing has focused only on the task assignment problem (e.g., [Liv82]; [Sta85]; [Cho79]) and used first come-first served (FCFS) sequencing. It is clear, however, that task sequencing can have a major effect on system performance measures (e.g., see [Con67]). The task scheduling method we will describe solves both the task assignment and the task sequencing problems simultaneously.
Another common simplifying assumption in much previous work (e.g., [Liv82]; [Bry81]) is that all the processors in the network are identical. It is frequently the case, however, in our motivating example of workstations on local area networks and in many other distributed processing situations, that there are important differences between processors. These differences include factors such as speed, the ability to do certain tasks at all, and whether the necessary data or programs are already present on the processor or must be transmitted from elsewhere on the network. Our scheduling method is designed to accommodate a wide variety of such processor differences. In some cases (e.g., [Bry81]; [Sta84]), it is desirable to be able to have more than one task on a processor at the same time and to move tasks from one processor to another after the tasks have begun execution. These capabilities often add substantially to the complexity of implementing a real system, however, so we have simplified our analysis by omitting them. The scheduling methods we will consider can be applied in any situation that has the properties we have just described: tasks are sequenced one at a time on heterogeneous processors without being moved or preempted after execution begins.

Bidding mechanisms

We mentioned above a number of advantages (e.g., reliability and flexibility) of bidding mechanisms for task scheduling in distributed networks. We define a bidding mechanism for distributed task scheduling to be one in which (1) task descriptions are broadcast (by "clients") to a set of possible processors ("contractors"), (2) some subset of the contractors respond with "bids" indicating their availability (and possibly cost) for performing the task, and (3) one of the bidders is selected to perform the task. Note that there are many characteristics of human markets that are not included in this definition.
For example, bidding mechanisms as we have defined them here can assign tasks after only one "round" of bid submissions. They need not include any of the iterative price adjustment, based on supply and demand, that is widely discussed in microeconomic theory (e.g., [Arr71]) and that is included in the network channel access scheduler described by Kurose, Schwartz, and Yemini ([Kur85]). In fact, our early experiments with an iterative pricing mechanism for distributed task assignment [Mal82] discouraged us from pursuing this approach further because of the difficulties with guaranteeing convergence at all ([Arr60]; [Arr71]) and because of the extremely computationally intensive nature of the iterative process. Several of the systems mentioned above ([Far72]; [Sta84]; [Ram84]) use non-iterative bidding mechanisms, but for scheduling problems different from the one we are considering. One previous system, however, can be used for problems of the type in which we are interested: the contract net protocol ([Smi80]; [Smi81]; [Dav83]) is a very general protocol for distributing tasks in a network of heterogeneous processors based on whatever task specific criteria are provided by the system designer. Its "announcement, bid, award" sequence allows mutual selection of clients and contractors; that is, contractors choose which clients to serve and clients choose which contractors to use. A later system ([Sin85]), which is based in part on the contract net protocol and on an earlier report of the system described here ([Mal83]), shows how the protocol can be modified to substantially reduce the number of messages required. In order to achieve any particular scheduling objective with these systems, however, specific selection criteria must be developed. The most important way in which our scheduling protocol differs from the generalized contract net protocol is by specializing the selection criteria along two primary dimensions: (1) contractors select clients' tasks in the order of numerical task priorities, and (2) clients select among the contractors that satisfy the minimum requirements of a task on the basis of estimated completion times (from the time the task is submitted until the contractor estimates it will have performed the job).
We will see below how this simple specialization of the generalized contract net protocol allows us to make direct use of a variety of optimality results from traditional scheduling theory, including those about minimizing mean flow times, and to do detailed evaluations of the effect of numerous factors on scheduling performance.

One of the potential problems that arises in bidding systems like the contract net protocol stems from the fact that only idle processors submit bids. Thus assignment decisions are made in the absence of any information about the loads and capabilities of processors that are busy at the time the bids are evaluated. This may be undesirable, for example, in situations where only slow processors are available at the time the task arrives, but a much faster processor will become available soon. In the sections below, we consider two alternative solutions to this problem. Our primary scheduling method, called the Distributed Scheduling Protocol (DSP), uses the simple technique of canceling and restarting tasks if a later bid is sufficiently better. We also consider, in a later section, a much more elaborate alternative method in which processors maintain detailed future schedules of the tasks they have agreed to perform and use this information in submitting their bids. Our simulation results investigate the performance of both these methods in a variety of situations.

THE DISTRIBUTED SCHEDULING PROTOCOL

Figure 1 illustrates the steps in our primary scheduling process, the Distributed Scheduling Protocol (DSP):

1. The client broadcasts a "request for bids". The request for bids includes the priority of the task, any special requirements, and a summary description of the task that allows contractors to estimate its processing time.
2. Idle contractors respond with "bids" giving their estimated completion times. Busy contractors respond with "acknowledgements" and add the task to their queues (in order of priority).

3. When a contractor becomes idle, it submits a bid for the next task on its queue.

4. (a) If more than one bid has been received when the client evaluates bids, the task is sent to the best bidder. The length of time to wait before evaluating bids is a parameter that is set depending on the message delay time. (b) If no bids have been received when the client evaluates bids, the task is sent to the first subsequent bidder. (c) If a later bid is "significantly better" than the early one, the client cancels the task on the first bidder and sends the task to the later bidder. The criterion for deciding whether a late bid is "significantly better" is a parameter, the effect of which is examined in the simulations below. If the later bid is not significantly better (or if the task has side effects and cannot be restarted), the client sends a cancel message to the later bidder.

5. When a contractor finishes a task, it returns the result to the client.

6. When a client receives the result of a task, it broadcasts a "cancel" message so that all the contractors can remove the task from their queues.

A more detailed description of the DSP, including complete specifications of the message contents, is provided by Malone, Fikes, and Howard ([Mal83]).

Global Scheduling Objectives

One of the advantages of this protocol is that it separates the policy decisions about how priorities are assigned from the mechanism of actually scheduling tasks according to these priorities (e.g., [Cof73]; [Lam68]). Traditional schedulers for centralized computing systems often use sequencing jobs according to priorities as a basis for layering the design of a system: the scheduler at one level of the system sequences jobs according to their order in a priority list, while the policy decisions about how priorities are assigned are made at a higher level in the system (see [Lam68]). The DSP allows precisely the same kind of separation of policy and mechanism. In this approach, the protocol itself is concerned only with sequencing jobs according to priorities assigned at some higher level.
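As an illustration of the client's bid-evaluation rule in steps 4(a)-4(c), the logic might be sketched as follows; the function names, the dictionary layout, and the 25 percent improvement threshold are our own illustrative assumptions, not part of the protocol specification.

```python
# Hypothetical sketch of the DSP client's bid-evaluation rule (steps 4a-4c).

def evaluate_initial_bids(bids):
    """Step 4(a)/4(b): pick the bid with the earliest estimated completion
    time, or None if no bids arrived before the evaluation deadline."""
    if not bids:
        return None  # step 4(b): the task goes to the first subsequent bidder
    return min(bids, key=lambda b: b["estimated_completion"])

def accept_late_bid(current, late, improvement_threshold=0.25):
    """Step 4(c): cancel and restart only if the late bid is 'significantly
    better', i.e., improves the estimate by more than the threshold fraction
    (the threshold stands in for the 'late bid improvement' parameter)."""
    saving = current["estimated_completion"] - late["estimated_completion"]
    return saving > improvement_threshold * current["estimated_completion"]

bids = [{"contractor": "A", "estimated_completion": 120},
        {"contractor": "B", "estimated_completion": 90}]
best = evaluate_initial_bids(bids)                      # contractor B wins
late = {"contractor": "C", "estimated_completion": 60}
print(best["contractor"], accept_late_bid(best, late))  # B True
```

If the late bid had promised completion at, say, time 85, the saving (5 time units) would fall under the threshold and the client would simply send a cancel message to the later bidder.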
By assigning these priorities in different ways, the designers of distributed systems can achieve different global objectives. For example, it is well known that in systems of identical processors, the average waiting time of jobs is minimized by doing the shortest jobs first [Con67]. Thus, by assigning priorities in order of job length, the completely decentralized decisions based on priority result in a globally optimal sequencing of tasks on processors.

Optimality results for mean flow time and maximum flow time. Traditional scheduling theory (e.g., [Con67]) has been primarily concerned with minimizing one of two objectives: (1) the average flow time of jobs (Fave), the average time from the availability of a job until it is completed, and (2) the maximum flow time of jobs (Fmax), the time until the completion of the last job. Minimizing Fmax also maximizes the utilization of the processors being scheduled [Cof73]. (A third class of results from scheduling theory, involving the "tardiness" of jobs in relation to their respective deadlines, appears to be less useful in most computer system scheduling problems.) The most general forms of both these problems are NP-complete ([Bru74]; [Iba77]), so much of the literature has involved comparing scheduling heuristics in terms of bounds on computational complexity and "goodness" of the resulting schedules relative to optimal schedules (e.g., [Dav81a]; [Jai80]).
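The well-known single-machine result cited above (shortest jobs first minimizes average flow time) is easy to check numerically. The small exhaustive comparison below is our own illustration, not from the paper; the job lengths are arbitrary.

```python
# Check that shortest-processing-time-first (SPT) minimizes mean flow time
# on a single machine when all jobs are available at time zero.
from itertools import permutations

def mean_flow_time(order):
    t, total = 0, 0
    for p in order:
        t += p          # this job completes at the cumulative processing time
        total += t      # its flow time equals its completion time here
    return total / len(order)

jobs = [6, 2, 9, 4]                       # arbitrary processing times
spt = sorted(jobs)                        # shortest processing time first
best = min(permutations(jobs), key=mean_flow_time)
print(mean_flow_time(spt) == mean_flow_time(best))  # True
```

The SPT order [2, 4, 6, 9] gives completions 2, 6, 12, 21 and mean flow time 10.25, which the exhaustive search confirms is optimal for this instance.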
A number of results in this field suggest the value of using two simple heuristics, shortest processing time first (SPT) and longest processing time first (LPT), to achieve the objectives Fave and Fmax, respectively. First, we consider cases where all jobs are available at the same time and their processing times are known exactly. In these cases, if all the processors are identical, then SPT exactly minimizes Fave [Con67], and LPT is guaranteed to produce an Fmax no worse than 4/3 of the minimum value [Gra69]. If some processors are uniformly faster than others, then LPT is guaranteed to produce an Fmax no worse than twice the best possible value [Gon77]. Next, we consider cases where all jobs are available at the same time but their exact processing times are not known in advance. Instead, the processing times have certain random distributions (e.g., exponential) with different expected values for different jobs. In these cases, if the system contains identical processors on which preemptions and sharing are allowed, then SPT and LPT exactly minimize the expected values of Fave and Fmax, respectively ([Web82]; [Gla79]). Finally, we consider cases where the jobs are not all available at the same time but instead arrive randomly and have exponentially distributed processing times. In these cases, if the processors are identical and allow preemption, then LPT exactly minimizes Fmax [Van81].

Other scheduling objectives. DSP can also be used to achieve many other possible objectives besides the traditional ones of minimizing mean or maximum flow time for independent jobs. For example:

(1) Parallel heuristic search. Many artificial intelligence programs use various kinds of heuristics for determining which of several alternatives in a search space to explore next. For example, in a traditional "best first" heuristic search, the single most promising alternative at each point is always chosen to be explored next [Nil80].
By using the heuristic evaluation function to determine priorities for DSP, a system with n processors available can always be exploring the n most promising alternatives rather than only one. Furthermore, if the processors have different capabilities, each task will be executing on the best processor available to it, given its priority. The DSP protocol can also, of course, be used in a more straightforward way to assign a fixed set of subtasks to processors. For example, Singh and Genesereth's system ([Sin85]) uses a bidding protocol to assign deduction steps to distributed processors. Their use of the protocol appears to be aimed at minimizing Fmax by giving highest priority to the most costly tasks.

(2) Arbitrary market with priority points. Another obvious use of DSP is to assign each human user of the system a fixed number of priority points in each time period. Users (or their programs) can then allocate these priority points to tasks in any way they choose in order to obtain the response times they desire (see [Sut68] for a similar, though non-automated, scheme, and Mendelson ([Men85]) for a related analysis aimed at determining not micro-level priorities, but macro-level chargeback policies).

(3) Incentive market with priority points. If the personal computers on a network are assigned to different people, then a slight modification of the arbitrary market in (2) can be used to give people an incentive to make their personal computers available as contractors. In this modified scheme, people accumulate additional priority points for their own later use every time their machine acts as a contractor for someone else's task.

Estimating processing time

As described above, DSP requires bidders to estimate their completion times for different tasks. This information is used by DSP only to rank different tasks and different contractors, so only rough estimates are needed.
In some cases, historical processing times for similar jobs might provide a basis for making even more precise estimates, possibly using parameters such as the size of the input files. Our simulation studies below include an examination of the consequences of making these estimates very poorly.

Alternative Scheduling Protocols

For comparison purposes, we now consider two alternative protocols. The first protocol is a scheme designed to remedy one of the possible deficiencies of DSP. The second is a random assignment method that provides a comparison with designs where no attention is given to the scheduling decision.

Alternative 1 - Eager assignment

As discussed above, one of the possible deficiencies of DSP is that no estimates of completion times are provided by processors that are not ready to start immediately. That is, clients using DSP may start a task on a machine that is available immediately (possibly their own local machine), only to find that another much faster machine becomes available soon. If the task is canceled and restarted, all the processing time on the first machine is wasted. If not, the job finishes later than it could have. If reasonable estimates of completion times on currently busy machines were available, then clients would know enough to wait for faster machines that were not immediately available. In this alternative scheduling protocol, tasks are assigned to contractors as soon as possible after the tasks become available and then reassigned as necessary when conditions change. In this way, each contractor maintains a schedule of the tasks it is expected to do, along with their estimated start and finish times, and so the contractor can make estimates of when it could complete any new task that is submitted. By analogy with "lazy" evaluation of variables ([Fri76]; [Hen76]), the original DSP could be called "lazy assignment" because clients defer assigning a task to a specific contractor until the contractor is actually ready to start.
This alternative protocol, therefore, will be called "eager assignment," since it assigns tasks to contractors as soon as possible. This alternative may be thought of as a logical extension of the "immediate bid responses" used in the contract net protocol ([Smi80]; [Smi81]), and is also similar to the "STB" scheduling alternative considered by Livny and Melman ([Liv82]). In this protocol, all contractors bid on all tasks, even when they are currently busy. Then the client picks the best bid and sends the task to the contractor who submitted it. A contractor estimates its starting time for a new task by finding the first time in its schedule at which no task of higher priority is scheduled. When new tasks are added to a contractor's schedule, or when a task takes longer than expected to complete, the contractor notifies the owners of other, lower priority tasks later in its schedule that their reservations have been "bumped." These clients may then try to reschedule their tasks on other contractors. A related kind of bumping process occurs in the Distributed Computing System ([Far73]) when a processor that has bid on a task is no longer available by the time the task arrives. It is important to note that, while there can be a lot of bumping in rapidly changing situations, this scheduling process is guaranteed to converge: since tasks bump only the reservations of other tasks of lower priority, the scheduling of a new task can never cause more than a finite (but possibly large) number of bumps. To reduce the amount of rescheduling, bumping occurs only when the currently estimated completion time of a job exceeds its original estimate by more than an amount specified in the "bump tolerance" parameter. While this alternative is clearly more elaborate and may require much more message traffic than DSP, it may also result in better schedules in situations with processors of widely varying capabilities.
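The contractor's side of eager assignment can be sketched roughly as follows. This toy version (all names and the data layout are our own assumptions) keeps the schedule sorted by priority and reports which reservations a new arrival bumps.

```python
# Toy sketch of schedule insertion with bumping in eager assignment.

def insert_task(schedule, task):
    """schedule: list of task dicts sorted by priority (lower number =
    higher priority).  Inserts the new task in priority order and returns
    the lower-priority tasks whose reservations were bumped (their owners
    would be notified and might reschedule elsewhere)."""
    pos = len(schedule)
    for i, existing in enumerate(schedule):
        if task["priority"] < existing["priority"]:
            pos = i
            break
    schedule.insert(pos, task)
    return schedule[pos + 1:]   # every later (lower-priority) task is bumped

schedule = [{"name": "t1", "priority": 1}, {"name": "t3", "priority": 5}]
bumped = insert_task(schedule, {"name": "t2", "priority": 3})
print([t["name"] for t in bumped])  # ['t3']
```

Because an arriving task can displace only strictly lower priority reservations, each chain of bumps moves work monotonically down the priority order, which is the intuition behind the convergence argument above.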
Alternative 2 - Random assignment

In the second alternative protocol, clients pick the first contractor who responds to their request for bids, and contractors pick the first tasks they receive after an idle period. Contractors do not bid at all when they are executing a task, and they answer all requests for bids when they are idle. If a client does not receive any bids, it continues to rebroadcast the request for bids periodically. When contractors receive a new task after already beginning execution of another one, the task is rejected (with a "bump" message), and the client who submitted it continues trying to schedule it elsewhere. In the simulations discussed below, the selection of the first bidder, when more than one machine is available, and of the first task, when more than one task is waiting, are both modeled as random choices, since the delay times for message transmission and processing are presumably random. (In reality, fast contractor machines might often respond more quickly to requests for bids than slow ones and so would be more likely to be the first bidders. Thus the performance of this scheduling mechanism in a real system might be somewhat better than the simulated performance.)

SIMULATION RESULTS

In many environments, including our motivating example of workstations on a network, minimizing the mean flow time of independent jobs is likely to be the primary scheduling objective. Unfortunately, as we noted above, the problem of scheduling tasks to meet this objective is usually NP-hard, and analytic results about the effects of sequencing strategies (other than random) are quite scarce. It is therefore appropriate to rely heavily on simulations to investigate strategies for achieving this objective.
In this section, we summarize the results of a series of simulation studies that investigate the performance of DSP in a variety of situations and compare it to the two alternatives described above (eager assignment and random assignment). We also report several analytic results that are useful for comparisons to the simulation results.

Simulation Method

Priorities. Since the objective to be minimized by scheduling was assumed to be the mean flow time of jobs, priorities in all simulations were determined according to the shortest processing time first (SPT) heuristic, except in the random alternative, where priorities are not used.

Network configurations. Ten different configurations of machines on the network were defined. In all configurations, a total of 8 units of processing power was available, but in different cases this was achieved in different ways: a single machine of speed 8; or 8 machines of speed 1; or 1 machine of speed 4 and 2 machines of speed 2; etc. We will denote different network configurations below by a sequence of the machine speeds they contain (e.g., "422").

Job loads. For all the simulations, jobs were assumed to be independent of each other and suitable for processing on any machine in the network. Job arrivals were assumed to be a Poisson process, and the amount of processing in each job was assumed to be exponentially distributed (with a mean of 60 time units on a processor of speed 1). System utilization was defined as the expected amount of processing requested per time interval divided by the total amount of processing power in the system, and three different levels of system utilization (0.1, 0.5, and 0.9) were simulated.

In order to increase the power of comparisons between different simulations at the same utilization level, the variance reduction technique called common random numbers was used (see [Law82a], pp. 350-354). In this technique, the same sequence of random numbers is used to generate jobs in each of the different simulations.
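The common-random-numbers technique can be sketched as follows; the generator below is our own illustration (function name and parameters assumed), reusing one seeded stream of Poisson interarrival times and exponential job sizes across the simulations being compared.

```python
# Sketch of common random numbers: every scheduling method being compared
# is fed exactly the same seeded job stream.
import random

def generate_jobs(seed, n, mean_size=60.0):
    """Return n (interarrival_time, processing_amount) pairs.
    Exponential interarrival times model Poisson arrivals; job sizes are
    exponential with mean 60 units, as in the paper's job load model."""
    rng = random.Random(seed)
    return [(rng.expovariate(1.0), rng.expovariate(1.0 / mean_size))
            for _ in range(n)]

# Two "different simulations" see exactly the same job stream:
print(generate_jobs(42, 5) == generate_jobs(42, 5))  # True
```

Feeding identical job streams to DSP, eager assignment, and random assignment means that observed differences in mean flow time are due to the scheduling methods, not to sampling noise in the workload.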
By reducing the variation between simulations that is due to random job generation, this technique increases the statistical power of the comparisons between the factors of interest (different network configurations and scheduling methods).

Statistical tests for equilibrium. A generic problem in simulation studies is determining how many jobs to simulate. In all the simulations reported below, the procedure developed by Law and Carson ([Law79]) was used to determine when to terminate the simulation and what confidence intervals to report for the steady state mean. This procedure was recommended by Law and Kelton ([Law82a]; [Law82b]) after comprehensive surveys of both fixed-sample-size and sequential procedures. The procedure is based on the idea of comparing successive batches of jobs to determine whether the batch means are correlated. When the number of batches and the batch sizes are large enough for the batch means to be uncorrelated, then the grand mean and variance are unbiased estimates of the steady state values for the simulation (see [Law79]). In practice, this test was quite stringent. Each simulation was run until a 90 percent confidence interval could be computed with a width of less than 15 percent of the estimated mean. The number of jobs required ranged from 1200 jobs in some of the simulations for system utilization of 0.1 to over 75,000 jobs in some of the simulations for system utilization of 0.9. In many simulation studies, some number of jobs are discarded from the beginning of the analysis to remove the "start up" values from the overall mean. However, as Law and Kelton ([Law82b]) and Gafarian, Ancker, and Morisaku ([Gaf78]) have noted, there are no generally satisfactory methods for determining how many jobs to discard. Therefore, we adopted the conservative approach of not discarding any jobs.
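The batch-means idea behind the stopping rule described above can be sketched as follows. This is a simplified illustration of the general technique only (fixed batch count, normal critical value, synthetic data), not the full Law and Carson sequential procedure.

```python
# Simplified batch-means confidence interval: split the output sequence
# into batches and, treating the batch means as approximately uncorrelated,
# form a confidence interval for the steady-state mean.
import statistics

def batch_means_ci(flow_times, n_batches=10, z=1.645):  # ~90% normal CI
    size = len(flow_times) // n_batches
    means = [statistics.mean(flow_times[i * size:(i + 1) * size])
             for i in range(n_batches)]
    grand = statistics.mean(means)                      # grand mean
    half = z * statistics.stdev(means) / n_batches ** 0.5  # half-width
    return grand, half

times = [float(x % 7) + 1 for x in range(1000)]  # stand-in "flow times"
mean, half = batch_means_ci(times)
print(mean, half)
```

In the paper's procedure, the simulation keeps running (and the batches keep growing) until the batch means pass a correlation test and the interval half-width falls below 15 percent of the estimated mean.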
Since, if anything, not discarding jobs increases the variance between the early and later batches, it may increase the number of jobs necessary to pass the steady state test described above, but it will not result in biased estimates of the steady state means (see [Law82a]).

Accuracy of job processing time estimates. In addition to the actual amount of processing in each job, each job also included an estimate of how long the job would take (i.e., the estimate a user might have made of the amount of processing in the job). These estimates are used to determine job priorities and estimated completion times. In order to examine extreme cases, these estimates were either perfect (0 percent error) or relatively inaccurate (+/- 100 percent error). In the case of inaccurate estimates, the errors were uniformly randomly distributed over the range.

Communications delays. In order to simulate "pure" cases of the different scheduling mechanisms, most of our simulations treat communication among machines as perfectly reliable and instantaneous. In real situations, where communication delays are negligible relative to job processing times, this assumption of instantaneous communication is appropriate. In other simulations, designed to explore the effect of communication delays, we assumed constant delays for the transmission of all messages. The values for message delay that were simulated were equivalent to 0%, 5%, 10%, and 15%, respectively, of the average job processing time. The simulations include only the effect of message delays; they do not include other factors such as the processing overhead needed to transmit and receive messages. Even though, as we will see below, the effect of increasing communication delays is quite linear, the results cannot be obtained by simply adding the total message delay time per job (the delay required for 4 messages) to the results for no delays.
This simple approach does not work because it neglects the time saved by having the announcements of waiting jobs already queued at processors that are busy when the jobs are first announced.

Restarting after late bids. In most of the simulations of DSP, late bids are never accepted, no matter how much of an improvement they are over the earlier bids. In one series of simulations, designed to test the effect of this parameter, a range of values for the "late bid improvement" parameter is investigated.

Bumping and rebroadcasting. In keeping with the spirit of simulating "pure" scheduling methods, jobs in the eager assignment simulations are rescheduled every time their scheduled start time is delayed at all. In a real system, jobs would ordinarily have to be bumped by more than some tolerance before being rescheduled. In other words, the performance of the eager method could only get worse if fewer bumps were made. Similarly, in the random assignment simulations, clients rebroadcast their requests for bids in every time interval of the simulation until the job is successfully assigned to a contractor. Thus, this simulates the best scheduling performance the random method could achieve; if rebroadcasting occurred less often, the performance could only get worse.

Analytic methods

It is possible to derive analytically several simple results that can be used for comparison with the simulation results.

Random assignment and random sequencing on identical processors. The case where tasks are assigned randomly to identical processors as they become available and sequenced randomly on those processors is equivalent to an M/M/m queuing system, that is, a system with m servers for one queue. With total arrival rate mλ, service rate μ per server, and per-server utilization λ/μ < 1, the expected queue length and waiting time for such a system are, respectively:

L = (mλ/μ)^m (λ/μ) P0 / [m! (1 - λ/μ)^2]

W = L/(mλ) + 1/μ

where

P0 = 1 / [ Σ(i=0 to m-1) (mλ/μ)^i / i! + (mλ/μ)^m / (m! (1 - λ/μ)) ]
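The M/M/m formulas above are easy to check numerically. The sketch below is our own code for that check (names assumed); as a sanity test, with m = 1 it reduces to the familiar M/M/1 result W = 1/(μ - λ_total).

```python
# Numerical form of the M/M/m expected waiting time, with rho = lam/mu the
# per-server utilization and total arrival rate m*lam.
from math import factorial

def mmm_waiting_time(lam, mu, m):
    rho = lam / mu                      # per-server utilization, must be < 1
    a = m * lam / mu                    # offered load (m * lambda / mu)
    p0 = 1.0 / (sum(a**i / factorial(i) for i in range(m))
                + a**m / (factorial(m) * (1 - rho)))
    L = (a**m * rho * p0) / (factorial(m) * (1 - rho)**2)  # expected queue length
    return L / (m * lam) + 1.0 / mu     # W = L/(m*lambda) + 1/mu

# With m = 1, lam = 0.5, mu = 1: M/M/1 gives W = 1/(1 - 0.5) = 2.
print(mmm_waiting_time(0.5, 1.0, 1))  # 2.0
```

Such a function provides the analytic benchmark against which the random-assignment simulation results on identical-processor configurations can be validated.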
+ [(m\/n)'"/m!l / 1 - \/\i) = do, in fact, generate confidence intervals that include these values for the cases where they can be computed. In these cases, only the analytically derived results are reported. Optimal assignment (with jobs moving during execution and with random sequencing). Even though our simulations assume that jobs cannot be moved once they have begun execution there formula for the flow times that would result if jobs is a simple could be moved during execution in such a way that the fastest processors were always in use (see (Sta85]). This formula provides a lower bound for the results of optimal assignment. machines to However, it assumes that tasks are sequenced randomly on the which they are assigned Since our simulated scheduling methods include sequencing as well as assignment, they might do better or worse than these analytic results depending on the relative importance of sequencing approximate lower bound for j. Letting the in a given situation. They do, however, provide an rough comparison purposes. In order to obtain these results, yijifi^ and assignment we assume that the processor number of processors be N, we define service rates p, are ordered so that p, a 14 = M(0 l.V^j =1 . 7 = M(0) 1. Using straightforward assumptions about the state transition probabilities and the conservation of flow principle, Stankovic ([StaSS]) shows that the expected queue length and waiting time for such a system are, respectively: w where = PA P^=\ ly — — y j V \JlM{j)+ + N-l +a-\/\iiN)) -2 n. l-\/\i(N) —— I ) (1 -.\/p(JV))j. j=0 Local processing only. One question of interest in our study time that locally is possible from scheduling jobs is the amount of speedup anywhere on the network as opposed to in mean processing all flow jobs on the machine at which they originate. As noted above, analytic results are not available for the effect of sequencing in this situation. 
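The M/M/m expressions above are easy to check numerically; the following sketch (function name ours) computes L and W directly from the formulas for P0, L, and W:

```python
from math import factorial

def mmm_queue(lam, mu, m):
    """Expected queue length L and waiting time W for an M/M/m queue.

    lam is the arrival rate at each of the m processors (so the total
    arrival rate is m*lam) and mu is the service rate of each processor;
    requires lam < mu for a stable queue.
    """
    rho = lam / mu                       # per-processor utilization
    a = m * rho                          # offered load, m*lam/mu
    p0 = 1.0 / (sum(a ** i / factorial(i) for i in range(m))
                + a ** m / (factorial(m) * (1.0 - rho)))
    L = (a ** m * rho * p0) / (factorial(m) * (1.0 - rho) ** 2)
    W = L / (m * lam) + 1.0 / mu
    return L, W
```

For m = 1 this reduces to the familiar M/M/1 result W = 1/(μ - λ).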
However, it is possible to use the simulation results for the case with only one machine in the network to estimate the effect of sequencing, and then compute the overall flow times that would result from all machines performing their local scheduling similarly. To do this, we take advantage of the fact that the time units used in the simulation are arbitrary and can be scaled as desired. For example, the mean flow time for a single machine of speed 1 is s times that for a single machine of speed s, since the two simulations are indistinguishable except for the time units. In general, let n_i be the number of machines of type i, s_i be the speed of the machines of type i, C = Σ_i s_i n_i be the total processing capacity on the network, and W_i be the mean flow time in a network containing only a single machine of speed 1. Then the average flow time in the total network is

  W = Σ_i (n_i/C) W_i.

Results

Relative effects of system load, network configuration, and accuracy of processing time estimates

We first investigate the results of DSP scheduling with a variety of system loads, network configurations, and accuracies of processing time estimates. In order to focus on the effect of these factors, message delays were kept constant (at 0), and late bids were never accepted, no matter how much better they were (i.e., the "late bid improvement" parameter was effectively infinite). Table 1 lists these results, Figure 2 shows the effect of system load for a typical configuration (1 machine of speed 4 and 4 machines of speed 1), and Figure 3 shows the effect of network configuration, holding constant both system load (at 50% utilization) and accuracy of processing time estimates (at 0% error).
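A minimal sketch of this capacity-weighted combination (function and parameter names are ours; we read W_i as the per-type single-machine flow times measured in speed-1 time units):

```python
def network_mean_flow_time(machine_types, single_machine_flow_times):
    """Combine single-machine results into a network-wide mean flow time.

    machine_types: list of (n_i, s_i) pairs -- count and speed of each type.
    single_machine_flow_times: list of W_i, one per type, each the mean flow
    time in a network containing only a single machine of speed 1.
    Implements W = sum_i (n_i / C) * W_i with C = sum_i s_i * n_i.
    """
    C = sum(n * s for n, s in machine_types)  # total processing capacity
    return sum((n / C) * w
               for (n, _), w in zip(machine_types, single_machine_flow_times))
```

For example, the "41111" configuration (one machine of speed 4, four machines of speed 1) has total capacity C = 8.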
Both system load and network configuration have a major impact on mean flow time (as much as a factor of 4 for the range of conditions we studied), while the accuracy of processing time estimates does not appear to be a major factor in performance (at most a difference of about 15% when estimates have errors of up to 100%). Figure 3 shows that the effect of dividing the same amount of processing power into more and more processors is almost linear, and that the number of processors used has a much greater impact on flow time than the range of processor speeds. Even if we restrict our attention to the first part of the graph, where the number of processors ranges from 1 to 4, we see that this makes a difference of approximately a factor of 3 in flow time, while changing from a configuration with identical processors to one with a speed range of a factor of 4 makes only about a 12 percent difference in flow time.

Comparing the results in Table 1 for DSP with the analytically derived results for optimal assignment (with moving during execution and random sequencing), we see that in those cases where assignment is important (i.e., where processor speed ranges are large), DSP does reasonably well in comparison to this rough "lower bound". In the cases where assignment makes no difference (i.e., where processor speeds are identical), DSP does much better than the rough "lower bound" because DSP does intelligent sequencing as well as assignment.

Increasing the size of the bidding network while keeping overall utilization constant

The results we have just seen all involve configurations in which the total amount of processing power in the network is constant (a total of 8 processing units), but divided among processors in different ways. From the point of view of a network designer, another relevant question is how many processors to combine in one bidding network, that is, how many processors to group together for the purpose of sharing tasks.
To answer this question, we assume that the speed of each processor and the overall utilization remain constant, and then consider the effect of adding processors to the network. We can estimate this effect in two ways. The first, purely analytic, method uses the well-known formulas given above for random scheduling on identical machines (i.e., both random assignment and random sequencing). Since the machines are identical, random assignment is as good as any assignment method, but random sequencing is certainly not optimal (e.g., see [Con67]). Since simple ways of analytically computing the results of DSP (shortest processing time first) sequencing are not known, the second method uses simulations to estimate the effects of sequencing. These estimates are then adjusted analytically to refer to networks with different numbers of machines of the same speed (speed 1). To make this adjustment, we scale time units as described above. If W_{N,s} is the mean flow time for a network with N identical machines of speed s, then a network with N machines of speed s' would have a mean flow time of W_{N,s'} = (s/s') W_{N,s}.

Table 2 shows the results of both these methods, and Figure 4 plots the results for the second method. It is clear that, using either random or SPT sequencing, pooling work from several processors can, indeed, have a significant impact on flow time. This effect, however, is very dependent on the system utilization. There is essentially no benefit at low utilizations; at moderate utilizations, there is very little additional benefit after about 4 machines; and even at heavy utilizations, most of the benefits have been exhausted by about 8 machines.

It is important to realize that this result is quite general. The analytic results are based on well-known formulas for random scheduling; our simulation results show that the effect holds for SPT sequencing as well.
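This time-unit adjustment is a one-line computation; the sketch below (function name ours) implements W_{N,s'} = (s/s') W_{N,s}:

```python
def rescale_flow_time(w_ns, s, s_prime):
    """Rescale a mean flow time from machines of speed s to speed s'.

    Implements W_{N,s'} = (s / s') * W_{N,s}: the same N-machine network
    with uniformly faster machines finishes jobs proportionally sooner.
    """
    return (s / s_prime) * w_ns
```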
The shape of the curves shown in the figure does not depend on processor speeds or average job processing times; changes in these factors merely change the scale of the vertical axis. Our result does depend on the assumption of exponentially distributed service times, but the intuitive argument given by Sauer and Chandy [Sau81] to explain their results suggests that the benefits of pooling jobs from several processors should be even more pronounced in systems with greater coefficients of variation in service time. Their argument says that with high coefficients of variation, multiprocessor systems perform better than single processor systems because a few very long jobs can bottleneck a single processor, while in a multiprocessor system only one of the processors is monopolized and other jobs can still be processed on the other processors. This argument also suggests that in systems with very high coefficients of variation, the benefits of adding more processors might extend further than in systems with exponential service times.

The implication of this result for system design is quite important. It suggests that there can be significant benefits from pooling work generated by different machines in a network, but that there is no need to have large numbers of machines all on the same bidding network. In most situations, several separate networks of 8 to 10 processors each should perform as well, from the point of view of reducing mean flow time, as a single network including all the processors together.

Effects of message delays

One of the possible problems with pooling work from several machines (even if only a few machines are involved) is that the message delays required for scheduling and for transferring results back and forth might overwhelm the flow time benefits obtained from pooling. Figure 5 compares the results of pooling jobs by network scheduling to the results of processing all jobs locally on the machines at which they originate.
The configurations simulated were chosen from our total set of configurations to represent the situation in which the least benefit would result from pooling (only two identical machines) and the situations in which the most benefit would result (the maximum number of machines or the maximum range of processor speeds). The results in Figure 5 show that pooling of work can, in fact, be beneficial, even when message delays are quite substantial. With moderate loads (50%) and large numbers of processors, pooled scheduling is superior to strictly local processing even when message delays exceed 20% of the average job processing time! Even in the configuration where pooling has the least benefit (two identical machines), pooled scheduling is preferred at moderate loads up to about the point where message delays exceed 5% of the average job processing time.

Effects of "late bid improvement" parameter

Figure 6 shows the effect of varying the amount of improvement required in "late" bids before jobs will be cancelled on the machines to which they were originally sent and restarted on the late bidding machine. If we let tE be the estimated completion time in the earlier bid, tL be the estimated completion time in the late bid, and t be the time at which the late bid is evaluated, then the "improvement" is

  i = 1 - (tL - t)/(tE - t).

Late bids must exceed some criterion parameter i0 before they will be accepted. We have simulated the configuration in which this factor could make the most difference (the configuration with the maximum range of processor speeds). The most improvement possible for a single job in this situation is 0.75 (if a processor of speed 4 becomes available immediately after a bid has been awarded to a processor of speed 1). Therefore, setting i0 at 0.75 or greater will result in no rescheduling.
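The improvement criterion can be written directly from the definition (function names are ours):

```python
def late_bid_improvement(t_early, t_late, now):
    """Improvement of a late bid over the earlier accepted bid.

    t_early: estimated completion time of the earlier bid,
    t_late:  estimated completion time of the late bid,
    now:     time at which the late bid is evaluated.
    Implements i = 1 - (t_late - now) / (t_early - now).
    """
    return 1.0 - (t_late - now) / (t_early - now)

def accept_late_bid(t_early, t_late, now, i0):
    # The late bid is accepted (and the job restarted elsewhere) only
    # if its improvement exceeds the criterion parameter i0.
    return late_bid_improvement(t_early, t_late, now) > i0
```

In the paper's extreme case, a speed-4 processor bidding immediately after a speed-1 award cuts the remaining time to a quarter, giving i = 1 - 1/4 = 0.75.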
As Figure 6 shows, at low utilizations performance is improved by rescheduling for any improvement at all, no matter how small, but at moderate utilizations, rescheduling for very small improvements makes performance slightly worse. In both cases, the optimal setting for i0 appears to be somewhere in the range of 0.4 to 0.6. At moderate utilizations, the maximum benefit from using this parameter appears to be about 5 percent, while at low utilizations using this parameter may result in overall flow time improvements on the order of 20 to 25 percent. This result is sensible since the only cost of rescheduling is the processing time "wasted" on the first processor, and with low utilization, this processing time is plentiful anyway. As utilizations increase, the cost of "wasting" time on the first processor to which a job is assigned increases and the potential benefit from using this parameter becomes negligible.

Evaluation of alternative scheduling methods

Figure 7 shows the results of DSP and the two alternative scheduling methods, "eager" and "random." In all cases, DSP is at least as good as, and in some cases much better than, the more complicated and expensive "eager" scheduling method. With perfect estimates of processing amounts, both DSP and eager assignment are consistently as good as or better than random assignment. With poor processing time estimates, DSP suffers little performance degradation, but the performance of eager degrades quite substantially; in some cases eager becomes significantly worse than random. This is particularly striking in view of the fact that the eager method was motivated by problems arising with large numbers of processors and large speed differences among processors, yet the eager method performs worst in precisely those situations. We believe that two primary factors account for this result:

(1) "Stable world illusion." In the eager assignment method, each job is assigned to the machine that estimates the soonest completion time.
But if jobs of higher priority arrive later and are assigned to the same machine, then they will keep "bumping" the first job back to later and later times. In other words, jobs are assigned to machines on the assumption that no more jobs will arrive (i.e., that the world will remain stable). Even though, in the simulation, jobs are rescheduled every time new jobs arrive that delay their estimated start time, by the time the job is rescheduled, it may already have missed a chance to start on another machine that could have completed it before it will now be completed. In some of our simulations (not included here), the bids included an extra factor to correct for this effect; that is, bids included an estimate of how long the starting time of the job would be delayed by jobs that had not yet arrived, but could be expected to arrive before the job began execution. (See [Mal86] for the derivation of this correction factor.) Even though the inclusion of this correction factor did improve the performance of the eager assignment method somewhat, the changes were not substantial.

(2) Unexpected availability. When a job takes longer than expected, or when higher priority jobs arrive at a processor, all the clients who submitted jobs scheduled on that processor are notified with "bump" messages and given a chance to reschedule their jobs. When a job takes less time than expected, or when jobs scheduled on a processor are canceled, the processor may become available sooner than expected; but in these cases, the clients who submitted jobs that were scheduled elsewhere, but who might now want to reschedule on the newly available machine, are never notified. There can thus be situations where fast processors are idle while high priority jobs wait in queues on slower processors. This appears to be a serious weakness of the eager assignment method.
We have specified, but not implemented, an addition to the protocol that notifies all clients of such situations and allows them to reschedule their tasks. The cost of this addition would be even greater message traffic and system complexity, and we believe it unlikely that the resulting performance would be significantly better than the much simpler lazy assignment method.

IMPLEMENTATION OF THE ENTERPRISE SYSTEM

In this section, we describe several highlights of the Enterprise system implementation. A more detailed description is provided by Malone, Fikes, and Howard ([Mal83]). The system schedules and runs processes on a local area network of high-performance personal computers. Processes are assigned to the best machine available at run-time (whether that is the machine on which the task originated or another one). The assignment takes into account two primary factors that affect estimated completion time: machine speed and currently loaded files needed for the task. A prototype version of the system is implemented in Interlisp-D and runs on the Xerox 1100 (Dolphin), 1108 (Dandelion), and 1132 (Dorado) Scientific Information Processors connected with an Ethernet. The prototype version has received limited testing, but no significant operational experience has been obtained.

As shown in Figure 8, the system is partitioned into three layers of software. The first layer provides an Inter-Process Communication (IPC) facility by which different processes, either on the same or different machines, can send messages to each other. When the different processes are on different machines, the IPC protocol uses internetwork datagrams called PUPs (see [Bog80]) to provide reliable non-duplicated delivery of messages over a "best efforts" physical transport medium such as an Ethernet [Met76]. Enterprise uses a pre-existing implementation of remote procedure calls ([Bir84], [Tho83]), in which messages are passed to remote machines as highly optimized remote procedure calls.

The next layer of the Enterprise system is the Distributed Scheduling Protocol (DSP) which, using the IPC, assigns and sequences each task on the best available machine. Finally, the top layer is a Remote Process Mechanism, which uses both the DSP and IPC to create processes on different machines that can communicate with each other.

The implementation assumes that the owners of idle workstations voluntarily put their machines into a mode where the machines respond to requests for bids from the network. Shoch and Hupp [Sho82] describe an alternative mechanism for locating and activating idle machines on a network without their owners' intervention.

Several augmentations of the DSP protocol were made to account for (1) processor or communication failures and delays, (2) the unique status of the "local" processor, and (3) human users who might try to "game" the system.

Unreliable processors and communications

In addition to the messages involved in the bidding cycle, clients periodically query the contractors to which they have sent tasks about the status of the tasks. If a contractor fails to respond to a query (or any other message in the DSP), the client assumes the contractor has failed. Failures might result from hardware or software malfunctions or from a person preempting a machine for other uses. In any case, unless the task description specifically prohibits restarting failed tasks, the client automatically reschedules the task on another machine. Similarly, if a contractor fails to receive periodic queries from one of its clients, the contractor assumes the client has failed and aborts that client's task.

Since a task can be restarted several times during its lifetime (e.g., because of processor failures or because of a late bid improvement), there can be different "incarnations" of the same process [Nel81].
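A minimal sketch of this mutual failure detection (the class, its interface, and the timeout value are hypothetical; the paper does not specify them):

```python
import time

FAILURE_TIMEOUT = 15.0  # seconds of silence before assuming failure (hypothetical value)

class TaskWatchdog:
    """Mutual failure detection via periodic status queries.

    A client keeps one watchdog per contractor, resetting it whenever a
    status reply arrives; a contractor keeps one per client, resetting it
    whenever a periodic query arrives.
    """
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.last_heard = clock()

    def heard_from(self):
        # Reset the timer whenever any DSP message arrives from the other party.
        self.last_heard = self.clock()

    def presumed_failed(self):
        # Silence beyond the timeout means the other party is presumed failed:
        # the client then reschedules the task; the contractor aborts it.
        return self.clock() - self.last_heard > FAILURE_TIMEOUT
```

Injecting the clock makes the timeout logic testable without real waiting.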
Because messages can sometimes be delayed or lost, confusions might result from messages referring to earlier incarnations of a current task. To prevent such confusions, each task is assigned a task identifier that is guaranteed to be unique across time and space. In order to do this, a timestamp of the most recent "milestone event" in the life of the process is included in the task identifier. Milestone events are the sending of either a request for bids or a task message concerning the task. Both these events render obsolete all previous DSP messages concerning the task. Before responding to DSP messages about a particular task, therefore, both clients and contractors check to be sure the message concerns the most recent incarnation of the task. (These task identifiers serve the same purpose as the call identifiers used by Birrell and Nelson [Bir84].)

In view of the benefits described above from using "late bid improvement" rescheduling in lightly loaded systems (as ours was), and in view of the unpredictable delays in message transmission, the Enterprise system implements a variation of the DSP protocol in which the first bid for a task is always accepted, rather than waiting any fixed time to evaluate bids. Then, if a later bid is significantly better, the task is rescheduled.

The "remote or local" decision

With the variation of DSP just described, if the local machine submits bids for its own tasks (i.e., the client machine offers to be its own contractor), then the local machine will presumably always be the first bidder and will therefore receive every task. To prevent this from happening, the client waits for other bids during a specified interval before processing its own bid. Since contractor machines are assumed to be processing tasks for only one user at a time, the client machine's own bid is also inflated by a factor that reflects the current load on the client machine. (Human users of a processor can express their willingness to have tasks scheduled locally by setting either of these two parameters.)
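The milestone-stamped task identifiers described earlier in this section might be sketched as follows (all field and function names are our own; the paper specifies only that identifiers combine global uniqueness with a milestone timestamp):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskId:
    """Unique task identifier: stable origin fields plus a milestone timestamp.

    The milestone is updated whenever a request for bids or a task message
    is sent, so messages stamped with an older milestone refer to an
    obsolete incarnation and can be ignored.
    """
    origin_machine: str
    serial: int        # unique per origin machine
    milestone: float   # timestamp of the most recent milestone event

def is_current(message_id: TaskId, current_id: TaskId) -> bool:
    # A message concerns the current incarnation only if it names the same
    # task and carries the latest milestone timestamp for that task.
    same_task = (message_id.origin_machine == current_id.origin_machine
                 and message_id.serial == current_id.serial)
    return same_task and message_id.milestone == current_id.milestone
```

Both clients and contractors would run this check before acting on any DSP message.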
"Gaming" the system

If people supply their own estimates of processing times for their tasks, and these time estimates are also used to determine priority, there is a clear incentive for people to bias their processing time estimates in order to get higher priority. To counteract this incentive, the current implementation of Enterprise has an "estimation error tolerance" parameter. If a task takes significantly longer than it was estimated to take (i.e., more than the estimation error tolerance), the contractor aborts the task and notifies the client that it was "cut off." This cutoff feature prevents the possibility of a few people or tasks monopolizing an entire system.

Conclusion

Any designer of a parallel processing computing system, whether the processors are geographically distributed or not, must solve the problem of scheduling tasks on processors. In this paper, we presented a simple heuristic method for solving this problem and analyzed its performance with simulation studies of a wide variety of situations. This scheduling heuristic is particularly suited to decentralized implementation, in which separate decisions made by a set of geographically distributed processors lead to a globally coherent schedule. Our results were encouraging about the desirability of this and similar heuristics in such a situation: (1) substantial performance improvements result from sharing tasks among processors in systems with more than light loads; (2) in many cases, these benefits are still present even when message delay times are as much as 5 to 20 percent of the average task processing time; (3) the additional benefits from pooling tasks among more than 8 or 10 machines are small; and (4) large errors in estimating task processing times cause little degradation in scheduling performance.
Finally, we described a prototype system for personal workstations on a network in which programs can easily take advantage of the maximum amount of processing power and parallelism available on a network at any time, with little extra cost when there are few extra machines available.

Acknowledgements

The first part of this work was performed while three of the authors (Malone, Fikes, and Howard) were at the Xerox Palo Alto Research Center. The work has been supported by the Xerox Palo Alto Research Center, the Xerox University Grants Program, and the Center for Information Systems Research, MIT. Portions of this paper appeared previously in Malone, Fikes, and Howard ([Mal83]). The authors would like to thank Michael Cohen, Randy Davis, Larry Masinter, Mike Rothkopf, Vineet Singh, Henry Thompson, and Bill van Melle for helpful comments. They would also like to thank Rodney Adams and--especially--Debasis Bhaktiyar for running many of the simulations whose results are reported here.

References

[Arr60] Arrow, K., & Hurwicz, L. Decentralization and computation in resource allocation. In R.W. Pfouts (Ed.), Essays in Economics and Econometrics. Chapel Hill: University of North Carolina Press, 1960, pp. 34-104. (Reprinted in K.J. Arrow and L. Hurwicz (Eds.), Studies in Resource Allocation Processes. Cambridge: Cambridge University Press, 1977, pp. 41-95.)

[Arr71] Arrow, K., & Hahn, F. General Competitive Analysis. San Francisco, CA: Holden Day, 1971.

[Ber82] Bernhard, R. Computing at the speed limit. IEEE Spectrum, July 1982, 26-31.

[Bir84] Birrell, A. D., and Nelson, B. J. Implementing remote procedure calls. ACM Transactions on Computer Systems, 1984, 2(1), 39-59.

[Bir82] Birrell, A. D., Levin, R., Needham, R. M., and Schroeder, M. D. Grapevine: An exercise in distributed computing. Communications of the ACM, 25(4), April 1982.

[Bog80] Boggs, D. R., Shoch, J. F., Taft, E. A., and Metcalfe, R. M. Pup: An internetwork architecture.
IEEE Transactions on Communications, COM-28(4), April 1980.

[Bri73] Brinch Hansen, P. Operating Systems Principles. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1973.

[Bru74] Bruno, J., Coffman, E. G., & Sethi, R. Scheduling independent tasks to reduce mean finishing time. Communications of the ACM, 1974, 17, 382-387.

[Bry81] Bryant, R., & Finkel, R. A stable distributed scheduling algorithm. Proceedings of the Second International Conference on Distributed Computer Systems, April 1981.

[Cho79] Chow, Y. and Kohler, W. Models for dynamic load balancing in a heterogeneous multiple processor system. IEEE Transactions on Computers, C-28, May 1979.

[Cof73] Coffman, E. G., Jr., and Denning, P. J. Operating Systems Theory. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1973.

[Con67] Conway, R. W., Maxwell, W. L., and Miller, L. W. Theory of Scheduling. Addison-Wesley Publishing Company, Reading, Massachusetts, 1967.

[Dav81a] Davis, E. and Jaffe, J. M. Algorithms for scheduling tasks on unrelated processors. Journal of the ACM, 1981 (October), 28, 721-736.

[Dav81b] Davies, D., Holler, E., Jensen, E., Kimbleton, S., Lampson, B., LeLann, G., Thurber, K., and Watson, R. Distributed Systems--Architecture and Implementation: Lecture Notes in Computer Science, vol. 105. New York: Springer-Verlag, 1981.

[Dav83] Davis, R., and Smith, R. G. Negotiation as a metaphor for distributed problem solving. Artificial Intelligence, 20(1), January 1983.

[Ens81] Enslow, P. H. What is a distributed data processing system? Computer, vol. 11, June 1980.

[Far72] Farber, D. J. and Larson, K. C. The structure of the distributed computing system--Software. In J. Fox (Ed.), Proceedings of the Symposium on Computer-Communications Networks and Teletraffic, Brooklyn, NY: Polytechnic Press, 1972, pp. 539-545.

[Far73] Farber, D., et al. The distributed computer system. Proceedings of the 7th Annual IEEE Computer Society International Conference, February 1973.
[Fri76] Friedman, D. & Wise, D. CONS should not evaluate its arguments. Automata, Languages and Programming, Edinburgh University Press, 1976, 257-284.

[Gaj85] Gajski, D. and Peir, J. Essential issues in multiprocessor systems. IEEE Computer, June 1985, pp. 9-27.

[Gaf78] Gafarian, A. V., Ancker, C. J., & Morisaku, T. Evaluation of commonly used rules for detecting "steady state" in computer simulation. Nav. Res. Logist. Q., 1978, 25, pp. 511-529.

[Gla79] Glazebrook, K. D. Scheduling tasks with exponential service times on parallel processors. Journal of Applied Probability, 1979, 16, 685-689.

[Gon77] Gonzales, T., Ibarra, O. H., and Sahni, S. Bounds for LPT schedules on uniform processors. SIAM Journal of Computing, 1977, 6, 155-166 (as cited by Jaffe).

[Gra69] Graham, R. L. Bounds on multiprocessing timing anomalies. SIAM Journal of Applied Mathematics, 1969 (March), 17, 416-429 (summarized in Coffman and Denning, pp. 100-106).

[Hen76] Henderson, P. & Morris, J., Jr. A lazy evaluator. Record of the Third Symposium on Principles of Programming Languages, 1976, 95-103.

[Iba77] Ibarra, O. H. and Kim, C. E. Heuristic algorithms for scheduling independent tasks on nonidentical processors. Journal of the ACM, 1977 (April), 24, 280-289.

[Jaf80] Jaffe, J. M. Efficient scheduling of tasks without full use of processor resources. Theoretical Computer Science, 1980, 12, 1-17.

[Jon80] Jones, A. K., and Schwarz, P. Experience using multiprocessor systems: A status report. Computing Surveys, 12(2), June 1980.

[Kle81] Kleinrock, L., and Nilsson, A. On optimal scheduling algorithms for time-shared systems. Journal of the Association for Computing Machinery, 28(3), pp. 477-486, July 1981.

[Kor82] Kornfeld, W. A. Combinatorially implosive algorithms. Communications of the ACM, 1982 (October), 25, 734-738.

[Kri71] Kriebel, C. H., & Mikhail, O. I. Dynamic pricing of resources in computer networks. In C. H. Kriebel, R. L. Van Horn, & J. T.
Heames (Eds.), Management Information Systems: Progress and Perspectives. Pittsburgh: Carnegie Press, 1971, pp. 105-124.

[Kur85] Kurose, J., Schwartz, M., and Yemini, Y. A microeconomic approach to decentralized optimization of channel access policies in multiaccess networks. IEEE, 1985.

[Lam68] Lampson, B. W. A scheduling philosophy for multiprogramming systems. Communications of the ACM, 1968 (May), 11, 347-359.

[Lar82] Larsen, R., McEntire, P., and O'Reilly, J. Tutorial: Distributed Control. Silver Spring, MD: IEEE Computer Society Press, 1982.

[Law79] Law, A. M., & Carson, J. S. A sequential procedure for determining the length of a steady-state simulation. Operations Research, 1979, 27, pp. 1011-1025.

[Law82a] Law, A. M., & Kelton, W. D. Simulation Modeling and Analysis. New York: McGraw-Hill, 1982.

[Law82b] Law, A. M., & Kelton, W. D. Confidence intervals for steady-state simulations, II: A survey of sequential procedures. Management Science, 28(5), May 1982, pp. 550-562.

[Liv82] Livny, M., & Melman, M. Load balancing in homogeneous broadcast distributed systems. Proceedings of the Computer Network Performance Symposium, Maryland, 1982.

[Mal82] Malone, T. A decentralized method for assigning tasks to processors. Research memorandum, Cognitive and Instructional Sciences Group, Xerox Palo Alto Research Center, Palo Alto, Calif., August 9, 1982.

[Mal83] Malone, T., Fikes, R., and Howard, M. Enterprise: A market-like task scheduler for distributed computing environments. Working paper, Xerox Palo Alto Research Center, Palo Alto, CA, October 1983. (Also available as CISR WP No. 111, Center for Information Systems Research, Massachusetts Institute of Technology, Cambridge, MA, October 1983.)

[Mal86] Malone, T. W. and Rothkopf, M. H. Strategies for scheduling parallel processing computer systems. Paper in preparation, Sloan School of Management, MIT.

[Men85] Mendelson, H. Pricing computer services: Queueing effects. Communications of the ACM, 28(3), March 1985.
[Met76] Metcalfe, R. M., and Boggs, D. R. Ethernet: Distributed packet switching for local computer networks. Communications of the ACM, 19(7), July 1976.

[Nel81] Nelson, B. J. Remote Procedure Call. Xerox Palo Alto Research Center, CSL-81-9, May 1981.

[Nil80] Nilsson, N. J. Principles of Artificial Intelligence. Palo Alto, CA: Tioga Publishing Co., 1980.

[Ram84] Ramamritham, K. and Stankovic, J. Dynamic task scheduling in distributed hard real-time systems. IEEE Software, July 1984.

[Sau81] Sauer, C. H. & Chandy, K. M. Computer Systems Performance Modeling. Englewood Cliffs, N.J.: Prentice-Hall, 1981, pp. 296-297.

[Sch84] Schroeder, M., Birrell, A., and Needham, R. Experience with Grapevine: The growth of a distributed system. ACM Transactions on Computer Systems, 1984, 2(1), 3-23.

[Sho82] Shoch, J. F., and Hupp, J. A. The "Worm" programs--Early experience with a distributed computation. Communications of the ACM, 25(3), March 1982.

[Sin85] Singh, V. & Genesereth, M. A variable supply model for distributing deductions. Proceedings of the International Joint Conference on Artificial Intelligence, August 1985, Los Angeles, CA.

[Smi80] Smith, R. G. The contract net protocol: High-level communication and control in a distributed problem solver. IEEE Transactions on Computers, C-29(12), December 1980.

[Smi81] Smith, R. G. and Davis, R. Frameworks for cooperation in distributed problem solving. IEEE Transactions on Systems, Man, and Cybernetics, SMC-11(1), January 1981.

[Sta84] Stankovic, J. and Sidhu, I. An adaptive bidding algorithm for processes, clusters, and distributed groups. In Proceedings of the Fourth International Conference on Distributed Computing Systems, May 1984.

[Sta85] Stankovic, J. An application of Bayesian decision theory to decentralized control of job scheduling. IEEE Transactions on Computers, 1985, C-34(2), pp. 117-130.

[SRC85] Stankovic, J., Ramamritham, K., and Cheng, S.
Evaluation of a flexible task scheduling algorithm for distributed hard real-time systems. IEEE Transactions on computer, C-34, 12, December 1985. [Sto77] Stone, H. Multiprocessor scheduling with the aid of network flow algorithms. IEEE Transactions on Software Engineering, vol Se-3, January 1977. 28 [Sto78] Stone, H. and Bokhari, S. Control of distributed processes. Computer, vol. 11, pp. 97-106, July 1978. [Sut68] Sutherland, A I. E. futures market in the ACM, 1968 (June), 11,449-451. computer time. Communications of [Ten81a] Tenney, R., Strategies for distributed decisionmaking, IEEE Transactions on Systems, Man and Cybernetics, vol. SMC-11, pp. 527-538, August 1981. [Ten81b] Tenney, R. and Sandel, Jr., N. Structures for distributed decisionmaking. IEEE Transactions on Systems, Man and Cybernetics, vol. SMC-11, pp. 517-527, August 1981. [Tho83] Thompson, H. Remote Procedure Call. Unpublished documentation Interlisp-D system, Xerox Palo Alto Research Center, Palo Alto, CA; for January, 1983. [Van81] Van der Heyden, L. Scheduling jobs with exponential processing and arrival times on identical processors so as to minimize the expected makespan. Mathematics of Operations Research, 1981,6,305-312. [VntSl] Van Tilborg, A. M., & Wittie, L. D. Distributed task force scheduling in multi-microcomputer networks. Proceedings of the National Computer Conference, 1981, 50, 283-289. rWeb82] Weber, R. R. Scheduling jobs with stochastic processing requirements on parallel machines to minimize makespan or flowtime. Journal of Applied Probability, 1982, 19, 167-182. [Wit80] Wittie, L. and van Tilborg, A. MICROS, A distributed operationg system for micronet, A reconfigurable network computer. IEEE Transactions on Conputing, vol. C-29, December 1980. 
Table 1
Mean flow times and 90% confidence intervals for the distributed scheduling protocol in various situations (different network configurations, system loads, and accuracies of processing time estimates)
[Table 1 spans three parts, one per utilization level (10%, 50%, 90%), each listing network configurations; the table body is not recoverable from this copy.]

Table 2
Effect on mean flow time of adding processing capacity while keeping utilization constant (Random sequencing and SPT sequencing)

             Utilization 10%          Utilization 50%          Utilization 90%
No. of
machines   Random   SPT              Random   SPT              Random   SPT
   1       66.64    61.28 ± 3.04     120.00   101.60 ± 5.92    600.00   245.20 ± 18.16
   2       60.60    57.44 ± 2.64      80.00    74.28 ± 5.12    315.79   150.92 ± 10.28
   4       60.01    57.78 ± 2.42      65.22    65.04 ± 2.74    178.16   107.40 ±  6.34
   8       60.00    60.43 ± 2.60      60.89    60.66 ± 2.67    112.61    81.14 ±  2.52

Figure 1
Messages in the Distributed Scheduling Protocol
[Diagram: the messages exchanged between a client machine and a contractor machine: "Request for Bids", "Bid", and "Acknowledgement".]

Figure 2
Effect on mean flow time of system utilization and accuracy of processing time estimates (for DSP with network configuration 41111)
[Plot: mean flow time vs. system utilization; curves for ±100% error and for perfect estimates.]

Figure 3
Effect on mean flow time of network configuration
[Plot: mean flow time vs. number of machines (0-80); curves for speed range 1 and speed range 4.]

Figure 4
Effect on mean flow time of adding processing capacity while keeping overall utilization constant (for DSP with perfect processing time estimates and identical machines of speed 1)
[Plot: mean flow time vs. number of machines (0-8); curves for 10%, 50%, and 90% utilization.]

[Figure 5: rotated in source; caption not recoverable. Vertical axis labeled "Mean Flow Time".]
Figure 6
Effect on mean flow time of various settings for the "late bid improvement" criterion (DSP with perfect estimates of processing time, network configuration 41111)
[Plot: mean flow time vs. late bid improvement criterion (0-105); curves for 10% and 50% utilization.]

[Figure 7: rotated in source; caption not recoverable. Panel titles: "Perfect Estimates" and "Poor Estimates (±100% Error)".]

Figure 8
Protocol layers used in the Enterprise system
[Diagram: Level 2: ENTERPRISE System; Level 1: Pup Internetwork Datagram; Level 0: Ethernet, Arpanet, Packet radio.]
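The bidding exchange shown in Figure 1 can be sketched in a few lines of code. This is an illustrative reconstruction, not the paper's implementation: the names (Bid, request_bids, choose_contractor) and the completion-time formula (queue backlog plus task length divided by machine speed) are assumptions, chosen only to show how bids can reflect both machine speed and current load.

```python
# Hypothetical sketch of the Figure 1 message exchange: a client machine
# broadcasts a request for bids, contractor machines reply with estimated
# completion times, and the client acknowledges the best bidder.
from dataclasses import dataclass


@dataclass
class Bid:
    contractor: str             # machine offering to run the task
    est_completion_time: float  # estimate reflecting speed and current load


def request_bids(task_length: float,
                 contractors: dict[str, tuple[float, float]]) -> list[Bid]:
    """Broadcast a "request for bids"; each contractor answers with a bid
    whose estimated completion time reflects its speed and queue backlog."""
    return [Bid(name, backlog + task_length / speed)
            for name, (speed, backlog) in contractors.items()]


def choose_contractor(bids: list[Bid]) -> Bid:
    """The client machine accepts the earliest estimated completion time."""
    return min(bids, key=lambda b: b.est_completion_time)


# A fast but heavily loaded machine loses to a slow but idle one:
contractors = {"fast-busy": (4.0, 30.0), "slow-idle": (1.0, 0.0)}
best = choose_contractor(request_bids(20.0, contractors))
```

In the example, the fast machine bids 30 + 20/4 = 35 time units while the idle slow machine bids 0 + 20/1 = 20, so the task goes to the idle machine, which is exactly the load-balancing behavior the protocol is designed to produce.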