A Labeling Algorithm for Just-in-Time Scheduling in TDMA Networks

Charles G. Boncelet Jr.* and David L. Mills
Department of Electrical Engineering
University of Delaware
Newark, DE 19716
boncelet@udel.edu, mills@udel.edu

*Supported by the Defense Advanced Research Projects Agency under NASA Ames Research Center contract number NAG 2-638.

COMM'92-8/92/MD, USA. © 1992 ACM 0-89791-526-7/92/0008/0170...$1.50

Abstract

We address the issue of scheduling communication in high speed TDMA networks. The scheduling problem is to find a route from source to destination and determine the times at which bursts of data are transmitted. We show that simple variations of this problem are NP-complete. We present an algorithm that, within a simplifying assumption, finds a schedule if one exists. Simulations indicate that scheduling efficiencies of 80-90% may be achieved in practice. We also describe how to adapt the basic algorithm to find a repeating schedule.
1 Introduction

We address the issue of scheduling in high speed, time accurate TDMA networks. The network consists of directed links connecting nodes through crossbar switches, and the nodes in question may in turn connect to many other networks. Users transmit "bursts" of data on a bursty basis; at gigabit rates, a megabit burst takes 1 ms. The scheduling problem is how to schedule the times for the bursts so that as many bursts as possible can successfully reach their destinations.

2 Summary

In this paper, we propose two scheduling problems for these networks. We show that simple variations of these problems are NP-complete. We present algorithms for creating schedules and present the results of simulations which indicate that scheduling efficiencies of 80-90% can be achieved.

The work has its immediate application to the UD Highball Network, discussed briefly in Section 3. Other applications may lie in scheduling traffic on roads or scheduling tasks on machines.

The remainder of the paper is organized as follows: Section 4 specifies the scheduling problem and Section 5 presents the NP-completeness results. Algorithms are presented in Sections 6 and 8. Finally, the paper concludes with comments and suggestions in Section 9.
3 Overview of Highball

The University of Delaware Highball project is a research effort geared toward developing, analyzing, and prototyping a novel high speed computer network based on TDMA protocols [2, 4]. The network is designed for "big" users, such as supercomputers, visualization workstations, and immediate disk to disk transfers, who can individually demand gigabit or so rates, but usually on a bursty basis. The number of nodes is moderate, say between 10 and 100. (These nodes may, of course, be gateways.) The geographical size is large, so that typical link delays are on the order of 10 ms.

The network consists of dedicated links, probably fiber optic, connecting nodes. Distances involved are relatively large: individual links have delays from 3-18 ms, while the network diameter is approximately 30 ms. Links are connected together by crossbar switches that can connect any input to any one or several outputs. The switches are controlled locally. Each switch has an accurate clock (to within about a microsecond), so the switches can perform "just in time" switching just before a burst arrives. Nowhere within the network are the bursts stored or otherwise manipulated. All queuing is done at the source nodes.

Burst size is crucial to the nature of the network. Bursts that are long compared to link delays can be conveniently switched with virtual circuits. Conversely, bursts that are very short (e.g., ATM packets) are probably best handled with source routing techniques. The 1 ms burst is an interesting middle position and is convenient for many "big" users, such as supercomputers or visualization workstations. We use the word "bursts", rather than "packets", because the bursts do not contain routing information. The network does not require any sort of header on the burst (except possibly for synchronization).

The scheduling problem is perhaps best understood with an analogy: that of scheduling trains on a network. The trains cannot be stopped prior to reaching their destinations, and the trains must not collide. This analogy gives the Highball network its name: in railroad parlance, "highball" means the train has priority and can go as fast as the tracks allow. Maximum throughput demands that many trains are traversing the network at once. Furthermore, unlike real train networks, Highball cannot signal ahead of the trains, since the trains go as fast as the signals.
Presently, we are studying two scheduling paradigms. The first is termed Reservation-TDMA (R-TDMA). In R-TDMA, the nodes transmit requests to transmit bursts on a separate reservations channel (which may be physically inband, even though logically out of band). These requests are assumed to be much shorter than the actual bursts. All nodes receive these requests, schedule them, and arrive at the same schedule. At the appropriate time, the source transmits its burst and the intermediate switches are set "just in time."

The second paradigm is a repeating schedule determined once at network turn on time, and modified in an attempt to meet instantaneous demand. We refer to this paradigm as Adaptive Circular-TDMA (AC-TDMA). In AC-TDMA, nodes transmit requests to modify the repeating schedule for an extended number of bursts. The principal advantage of AC-TDMA versus R-TDMA is less computational overhead, while the disadvantage is possibly slow adaptation to instantaneous demand.

4 The Scheduling Problem

4.1 Problem Statement

We make the following assumptions about the network:

● Size: The network is "large", about 5000 km in diameter. At approximately 2/3 the speed of light, this size corresponds to a delay of about 30 ms. Typical links are 5-15 ms long.
● Number of Nodes: The number of nodes, n, is small to moderate, say 10-100.
● Links: Links are directed. Links will likely be fiber optic links at gigabit rates.
● Switches: Links are connected by crossbar switches. These switches can connect any input to any output or to several outputs. The switches can reconfigure quickly, although not instantaneously. For the purposes of this paper, we will ignore this reconfiguration time and assume the switches reconfigure instantaneously.
● Bursts: Bursts are approximately 1 megabit long. At gigabit rates, each burst is 1 ms long. Thus each link can have many bursts in transit at one time.

In abstract, the scheduling problem is as follows: Given a directed graph G = (V, A), with V being a set of vertices and A a set of directed arcs, a distance metric associated with each arc, and a list of reservation requests, find a schedule of starting times that satisfies the requests.

Note that it may be impossible to satisfy all the requests if there are lifetimes associated with them or if the requests are arriving over time with a rate that exceeds the scheduling capacity. Furthermore, it is not obvious what criteria should be optimized in selecting a schedule. The following are some of the criteria one might wish to optimize:

● Satisfy as many requests as possible.
● Minimize the arrival time of the final burst.
● Minimize the total transit time of all scheduled bursts.
● Minimize the total waiting time for transmission of bursts.
● Minimize the total time for the satisfied requests to be received. The total time is the sum of the transit and waiting times.
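To make the candidate criteria concrete, the Python sketch below scores a finished schedule against each of them. The `Request` record and its field names are hypothetical, not from the paper; times are in milliseconds, and a request that was never scheduled has no start or arrival time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    """One reservation request (hypothetical record layout)."""
    request_time: float            # when the request was made (ms)
    start_time: Optional[float]    # when the burst leaves the source, None if unscheduled
    arrival_time: Optional[float]  # when the burst reaches the destination


def schedule_criteria(requests: list[Request]) -> dict[str, float]:
    """Evaluate one schedule against the criteria listed in Section 4.1."""
    done = [r for r in requests if r.start_time is not None]
    transit = sum(r.arrival_time - r.start_time for r in done)
    waiting = sum(r.start_time - r.request_time for r in done)
    return {
        "satisfied": len(done),                                  # requests satisfied
        "final_arrival": max((r.arrival_time for r in done), default=0.0),
        "total_transit": transit,                                # sum of transit times
        "total_waiting": waiting,                                # sum of waiting times
        "total_time": transit + waiting,                         # transit + waiting
    }
```

For example, two satisfied requests with (request, start, arrival) times of (0, 5, 20) and (1, 2, 12) ms give a total transit of 25 ms, a total waiting of 6 ms, and a total time of 31 ms. Different criteria can rank the same pair of schedules differently, which is why the choice is not obvious.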
5 NP-Completeness

The general scheduling problem posed above is quite difficult. In this section, we show that several specific variations are NP-complete.

5.1 Transformations of the Partition Problem

The partition problem is the following: Given a set of integers, $x_1, x_2, \ldots, x_n$, with sum $S = \sum_{i=1}^{n} x_i$, is there a subset $A$ of the integers such that $\sum_{i \in A} x_i = S/2$? It is well known that this problem is NP-complete (see, for instance, [1, 3]). We show two different ways of solving the partition problem with a scheduling algorithm.

For the first solution, consider a simple network of n + 1 nodes. Order the nodes linearly from left to right and identify the first (the leftmost) as the Source and the last (the rightmost) as the Destination. Connect each node to the next in line by two arcs: one with zero delay, and one with delay equal to $x_i$. All arcs go from left to right. The network is illustrated in Figure 1. The partition problem can then be solved by the following scheduling problem: Is there a schedule for a single burst with arrival time S/2? In this case, a schedule is a single path.

Figure 1: Network configuration for the partition problem.

We need to show two things to establish that this scheduling problem is NP-complete. Firstly, the problem is in NP. This is trivial: an instance can easily be checked in polynomial time by adding the n delays (either $x_i$ or 0). Secondly, the construction correctly solves the partition problem. If there is a path with delay S/2, take A to be the nonzero links along the path; this set satisfies the partition problem. Conversely, if A is a solution of the partition problem, the path that takes the $x_i$ arc exactly for $i \in A$ has delay S/2.
This construction is particularly interesting since it shows that the problem of determining whether a single burst can arrive at a destination at a particular time is NP-complete. We therefore cannot expect any exact algorithm to run in less than exponential time (assuming P ≠ NP). In particular, the algorithm presented in Section 6 determines all the times at which a burst can arrive at all destinations, so it must run in exponential time in this example.

However, partitioning belongs to a class of NP-complete problems that can be solved in pseudo-polynomial time because (loosely speaking) they can be solved in polynomial time if the integers involved are bounded in size. In other words, if the integers can be represented in unary, rather than binary, then the problem can be solved in polynomial time [3]. In the scheduling problems considered here, the integers are bounded by the size of the network (about 30 ms) divided by the quantization size (about 0.25-1.0 ms). Thus we have good reason to believe that in practice the labeling algorithm will run in reasonably short times.
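The first construction can be checked directly. The Python sketch below labels the set of times at which the single burst can be at each node of the Figure 1 chain network, then backtracks from the Destination to recover the subset A from the nonzero-delay arcs. Because no time above S/2 is kept, each set holds at most S/2 + 1 values, so the work is O(nS): polynomial in the magnitude of the integers but not in their binary length, which is the pseudo-polynomial behavior described above. The function name is ours, not the paper's.

```python
from typing import Optional


def partition_by_schedule(xs: list[int]) -> Optional[list[int]]:
    """Solve partition by asking whether one burst can reach the Destination
    of the Figure 1 chain network at exactly time S/2."""
    total = sum(xs)
    if total % 2:
        return None                      # S/2 is not an integer: no partition
    target = total // 2
    # reach[i] = times at which the burst can be at node i (node 0 is the Source)
    reach = [{0}]
    for x in xs:
        prev = reach[-1]
        # two arcs to the next node: delay 0 and delay x (times above S/2 are useless)
        reach.append(set(prev) | {t + x for t in prev if t + x <= target})
    if target not in reach[-1]:
        return None
    # Backtrack from the Destination: if time t was not already reachable at the
    # previous node, the burst must have taken the x_i arc, so x_i belongs to A.
    subset, t = [], target
    for i in range(len(xs) - 1, -1, -1):
        if t not in reach[i]:
            subset.append(xs[i])
            t -= xs[i]
    return subset[::-1]
```

For instance, `partition_by_schedule([3, 1, 1, 2, 2, 1])` returns a subset summing to 5, while `[1, 1, 4]` (S/2 = 3 is unreachable) returns `None`.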
The second construction uses a trivial network consisting of a Source and a Destination and two links between them, from Source to Destination. Each link has zero delay. Assume there are n bursts to be scheduled, each with size $x_i$. Then the partition problem is equivalent to the following scheduling problem: Is there a schedule of these n bursts such that all bursts are finished by time S/2? Such a schedule must, by necessity, partition the bursts into two subsets, each with total size equal to S/2.

5.2 A Transformation of the Bin Packing Problem

Bin packing is the generalization of partition and is as follows: Given a collection of integers, $x_1, x_2, \ldots, x_n$, and a bin size B, partition the integers into the minimal number of subsets such that the sum of the integers in each subset does not exceed B. The decision version of this problem is NP-complete.

Bin packing is particularly relevant to Highball. For instance, if reservations are scheduled in-band, the network must periodically allocate access to the links for reservations, and normal traffic must be scheduled around them. Consider a trivial network with a single link connecting a Source and a Destination, and assume the link is slotted, with each period between reservations forming a bin of size B. Treating the n bursts of sizes $x_1, x_2, \ldots, x_n$ as the integers, transmitting all the bursts in the fewest periods is a bin packing problem. The difficulty, however, depends on the sizes of the integers involved. For example, if all the $x_i = 1$, then bin packing is trivial.
6 Labeling Algorithm for Scheduling

To reiterate the general scheduling problem: given a directed network and a set of burst requests, find a schedule for transmitting the bursts and setting the various crossbar switches along the way. In practice these requests are made over time and we would like to schedule each as soon as possible. In order to facilitate this task, we make a simplifying assumption: we freeze the current schedule and try to schedule the next burst without changing any of the previous bursts. In other words, we assume that we have scheduled the first L bursts and will not change any of that schedule in scheduling the (L+1)st. This assumption is a greedy one and leads to one-at-a-time processing. Greedy algorithms are widely used heuristics in solving combinatorial problems.

The current schedule creates holes or slots into which we can try to fit the next burst. This burst cannot overlap any of the previous bursts at any link. Thus the schedule is link based: each link has a timeline that stores the status of the beginning of that link at each time. A link can be in one of four states:

BUSY: The beginning of the link is occupied at this time by a previously scheduled burst.
MAYBE: The beginning of the link is unoccupied at this time and we do not know whether the burst can reach this link at this time.
YES: The beginning of the link is unoccupied at this time and the burst can reach this link at this time.
NO: The beginning of the link is unoccupied at this time, but the burst in question cannot arrive now because it would overlap with an already scheduled burst. In other words, the tail of this burst would overlap with the head of another burst.

Initially, before any bursts have been scheduled, all timelines are in the MAYBE state. As bursts are scheduled, the timelines are marked BUSY for the appropriate durations. In the effort to schedule the next burst, we must do two things. Firstly, find and mark the NO's where the burst to be scheduled cannot fit because it would overlap another burst. Note that whether or not a link state is NO is a function of the burst size. Secondly, try to turn MAYBE states into YES's. If the destination link can be marked YES at some time, then we have succeeded in finding a path.
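The NO rule is the only place where the burst size enters the labeling. A minimal Python sketch, assuming integer time steps and one list of states per link (the `State` enum and the name `mark_no` are ours): a MAYBE slot at time t becomes NO if any BUSY slot s lies strictly between t and t + b, so that the tail of the new burst would run into the head of an existing one.

```python
from enum import Enum


class State(Enum):
    BUSY = "BUSY"
    MAYBE = "MAYBE"
    YES = "YES"
    NO = "NO"


def mark_no(timelines: dict[str, list[State]], b: int) -> None:
    """First step of the labeling: mark MAYBE slots whose burst of b time
    steps would overlap a BUSY slot s with t < s < t + b."""
    for tl in timelines.values():
        T = len(tl)
        for t in range(T):
            if tl[t] is State.MAYBE and any(
                tl[s] is State.BUSY for s in range(t + 1, min(t + b, T))
            ):
                tl[t] = State.NO
```

With b = 3 and an earlier burst occupying steps 4 and 5, steps 2 and 3 become NO. With b = 1 nothing is ever marked NO, since a one-step burst has no tail to collide.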
Denote by s = SUCCESSOR(l) a link that can be connected to a given link l, i.e., s is a link that can be connected directly to l by a crossbar switch. The crucial idea is that, if link l is in state YES at time t and s is in state MAYBE at time t + delay(l), then s can be marked as YES at time t + delay(l). In this basic fashion, we propagate YES labels until possibly the destination link is marked YES. The basic scheduling algorithm is given in Figure 2.

One restriction that must be considered is that the burstsize must be smaller than the duration of the smallest loop. If not, the tail of a burst can overlap its own head (much like the need to mark the NO state). Since the loop time is at least twice the shortest link, we can say that as long as the burstsize is less than twice the shortest link, this overlap will not occur.

Furthermore, for each node connected directly to a crossbar switch, we augment the network with two zero-delay, fictitious links between that node and the crossbar switch. One link goes to the switch and the other to the node. These fictitious links carry the node state, both in sending and receiving. Thus, the entire schedule is now link based. Since these links are not part of any loops, they do not affect the minimal burstsize.

Figure 2: Labeling algorithm for scheduling in TDMA networks.

    Let b = burstsize, T = maximum time, and TL(l,t) denote link l's state at time t.

    First step, mark NO's:
      for each link, l,
        for (t = 0; t < T; t++)
          if (TL(l,t) == MAYBE and TL(l,s) == BUSY for any t < s < t + b)
            TL(l,t) = NO

    Second step, propagate YES's forward:
      mark all MAYBE's on the Source link as YES and put the Source link on the FIFO
      while FIFO not empty,
        pop link l off FIFO
        for each s = SUCCESSOR(l),
          for (t = 0; t < T; t++)
            if (TL(l,t) == YES and t + delay(l) < T and TL(s,t+delay(l)) == MAYBE) {
              TL(s,t+delay(l)) = YES
              if (s is not in FIFO) add s to FIFO
            }

    Third step, did we succeed?
      check the destination link for YES's; if so, backtrack to the Source,
      marking the appropriate YES's as BUSY's

    Fourth step, return NO's and YES's to MAYBE's:
      for each link, l,
        for (t = 0; t < T; t++)
          if (TL(l,t) == NO or TL(l,t) == YES)
            TL(l,t) = MAYBE
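The four steps of Figure 2 translate almost line for line into Python. The sketch below is ours, not the authors' code: timelines are lists of state strings indexed by integer time, `delay` and `successors` are plain dictionaries, and the function name `label_and_schedule` is hypothetical. To make the backtrack in the third step possible, the second step records, for each newly labeled (link, time) pair, the pair that labeled it; the third step then takes the earliest YES on the destination link and marks the chosen burst BUSY on every link of its path.

```python
from collections import deque

BUSY, MAYBE, YES, NO = "BUSY", "MAYBE", "YES", "NO"


def label_and_schedule(timelines, delay, successors, source, dest, b):
    """Try to schedule one burst of b time steps from `source` to `dest`.

    timelines:  link -> list of states, one entry per time step (length T)
    delay:      link -> integer delay of that link, in time steps
    successors: link -> links the crossbar switch can connect it to
    Returns the (link, start time) pairs of the chosen path, or None.
    """
    T = len(timelines[source])

    # First step: mark NO's where the tail of the burst would hit a BUSY slot.
    for tl in timelines.values():
        for t in range(T):
            if tl[t] == MAYBE and BUSY in tl[t + 1:t + b]:
                tl[t] = NO

    # Second step: propagate YES's forward from every free Source time.
    pred = {}  # (link, time) -> (link, time) that first labeled it YES
    for t in range(T):
        if timelines[source][t] == MAYBE:
            timelines[source][t] = YES
    fifo, queued = deque([source]), {source}
    while fifo:
        l = fifo.popleft()
        queued.discard(l)
        for s in successors.get(l, []):
            for t in range(T):
                u = t + delay[l]
                if timelines[l][t] == YES and u < T and timelines[s][u] == MAYBE:
                    timelines[s][u] = YES
                    pred[(s, u)] = (l, t)
                    if s not in queued:
                        fifo.append(s)
                        queued.add(s)

    # Third step: did we succeed? Take the earliest YES on the destination.
    path = None
    hits = [t for t in range(T) if timelines[dest][t] == YES]
    if hits:
        node = (dest, hits[0])
        path = [node]
        while node in pred:          # backtrack to the Source link
            node = pred[node]
            path.append(node)
        path.reverse()
        for link, t in path:         # freeze the burst: mark it BUSY
            for k in range(t, min(t + b, T)):
                timelines[link][k] = BUSY

    # Fourth step: return NO's and YES's to MAYBE's.
    for tl in timelines.values():
        for t in range(T):
            if tl[t] in (NO, YES):
                tl[t] = MAYBE
    return path
```

Calling it twice on a toy network (fictitious Source link, two parallel links of delay 3 and 5, fictitious destination link, b = 2) shows the greedy, one-at-a-time behavior: the first burst leaves at time 0 over the faster link, and the second leaves at time 2 because the first is frozen.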
The algorithm begins by determining and setting the NO's. Then all available times on the source link are marked YES and the link is placed on a FIFO. The FIFO contains links which have recently been marked YES. While the FIFO is not empty, a link is popped off and its successors checked. When this link is YES and its successor, offset by the link delay, is MAYBE, then the successor can be reached. It is marked YES and placed on the FIFO. This algorithm finds all links that can be reached from the source at all times. It is then a simple matter to check the destination link and see if and when it can be reached.

It is also a simple matter, if desired, to keep track of auxiliary quantities besides reachability. For instance, hop count, transit time, maximum or minimum starting time, etc., can be computed. These auxiliary quantities can be used to select among the possibly many times that the destination can be reached and to select from the possibly many different paths.

The complexity can be derived as follows. We assume that the maximum number of successors of a link is limited to W by the nature of the crossbar switch; typical values are W = 4-32. There are N links, each of which can be placed on the FIFO at most T times, and the inner loop over successors can be executed at most W times. Combining these, we obtain $O(NTW)$ operations. The exponential complexity comes from the fact that T is exponential in the size of the problem, since the integers involved (the link delays) are represented in binary.

There are various tricks to decrease the computational time somewhat. First of all, one can break out of the loops in Step 2 as soon as any path to the destination is found. Secondly, one can compute the minimal time required to travel from any intermediate node to the destination. Processing at any intermediate node can cease whenever the current time plus the minimal transit time to the destination exceeds the target time.

One interesting question is how the labeling algorithm scales as the network grows. We expect that W would remain bounded. It is a function of the crossbar switch technology and network topologies, which we believe might well remain constant or, at worst, grow slowly. We also expect that T will be constant as N grows. It is determined by how long users are willing to wait for their message to be transmitted and is not likely to increase just because the number of users increases. Thus, the complexity on a per burst basis scales linearly with N. One might expect the number of bursts to also increase linearly with N. Therefore, the overall scheduling complexity appears to scale as $N^2$.
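The second trick needs, for every link, a lower bound on the time to reach the destination. One way to obtain it (our choice; the paper does not say how) is Dijkstra's algorithm run backwards from the destination link over the link graph, ignoring all other bursts. The sketch below uses Python's `heapq`; the function names are hypothetical, and distances are measured from the beginning of a link, consistent with the link-based timelines.

```python
import heapq


def min_time_to_dest(delay, successors, dest):
    """Minimal time from the beginning of each link to the beginning of `dest`,
    ignoring all other bursts (a lower bound usable for pruning)."""
    preds = {}
    for l, ss in successors.items():
        for s in ss:
            preds.setdefault(s, []).append(l)
    dist = {dest: 0}
    heap = [(0, dest)]
    while heap:
        d, s = heapq.heappop(heap)
        if d > dist.get(s, float("inf")):
            continue                      # stale heap entry
        for l in preds.get(s, []):
            nd = d + delay[l]             # traverse link l, then continue from s
            if nd < dist.get(l, float("inf")):
                dist[l] = nd
                heapq.heappush(heap, (nd, l))
    return dist


def can_prune(link, t, target, dist):
    """True if a burst at the beginning of `link` at time t cannot reach the
    destination by `target`, so processing of this (link, t) may cease."""
    return t + dist.get(link, float("inf")) > target
```

The table is computed once per topology, not once per burst, so its cost is negligible next to the labeling itself. Links from which the destination is unreachable get no entry and are always pruned.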
7 Results

We simulated the performance of the labeling algorithm on a model of the NSF backbone. There are 13 nodes and 19 connections. The network is well connected, with many paths between pairs of nodes, so that it provides a good test of the path finding ability of the algorithm. We model the connections as pairs of fiber optic cables, one in each direction. Combining these 38 directed links with the 26 fictitious links (two for each node), we get 26 + 38 = 64 links. Thus the number of links is reasonable. Link delays were estimated by 2/3 the speed of light and were quantized to the half millisecond. The shortest link is 3 ms and the longest link 18 ms. Since the smallest loop is 6 ms, the labeling algorithm will work correctly.

Each 1 ms interval, a random number of requests, following a Poisson distribution with parameter λ, were generated. All bursts were 1 ms long. The sources and destinations were randomly chosen from a uniform distribution. Since there are 13 nodes, each able to start at most one 1 ms burst per interval, one cannot expect to satisfy all requests if λ > 13. In all cases, the simulations were run over 2000 time periods, generating at least 20,000 requests. We define efficiency as the fraction (in per cent) of all requests that are satisfied.

In the simulation, we assumed a horizon of 60 ms. If a request could not be satisfied so that it arrived within 60 ms of when that request was generated, we dropped that request and moved on to the next. The requests were taken in the order received. No effort was made to sort the requests to enhance scheduling efficiency. Among possibly many arrival times at the destination, we always selected the first, and we chose arbitrarily among routes that arrived at the same time.
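The request model used in the simulations is easy to reproduce. The sketch below draws a Poisson number of requests for each 1 ms interval and picks sources and destinations uniformly. Python's standard library has no Poisson sampler, so we use Knuth's multiplication method, which is adequate for λ around 10-14. Requiring the source and destination to differ is our assumption; the paper says only that both were chosen uniformly. The function names and the `seed` parameter are hypothetical.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method: multiply uniforms until the product
    drops to exp(-lam); the number of extra factors is Poisson(lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def generate_requests(lam: float, n_nodes: int, periods: int, seed: int = 0):
    """Return (interval, source, destination) triples, one per request."""
    rng = random.Random(seed)
    requests = []
    for t in range(periods):
        for _ in range(poisson(rng, lam)):
            # distinct source and destination (our assumption)
            src, dst = rng.sample(range(n_nodes), 2)
            requests.append((t, src, dst))
    return requests
```

With λ = 13, 13 nodes, and 2000 intervals this yields about 26,000 requests, in line with the "at least 20,000" reported for the runs. The requests are emitted in generation order, matching the unsorted processing described above.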
The results are presented in Table 1. Even for λ = 13, the scheduling efficiency is approximately 90%. It takes, on average, 25.6 ms for the burst to be started, and 48.0 ms before it arrives at the destination. The average transit time is about 21 ms, independent of the load.

Table 1: Scheduling efficiencies versus λ for the NSF backbone.

    λ                   10     11     12     13     14
    Efficiency (%)      99.9   99.0   94.6   90.1   84.3
    Start delay (ms)    11.4   17.8   22.3   25.6   27.7
    End time (ms)       32.1   39.8   44.8   48.0   50.0

Note that the times indicated above do not include any initial delays caused by the reservations channel. One can expect that this delay would be about 30 ms for each reservation. However, one reservation may request communications access for multiple bursts, so the reservations delay for many bursts may be zero.

8 Extension to Circular Schedules

One deficiency of the scheduling scheme considered so far is the computational burden of scheduling each burst one at a time. In the simulation discussed above, it took about 5 ms to schedule each 1 ms burst on a Sun SPARC workstation. We need to schedule 10-13 bursts each millisecond. Even with coding improvements and a faster processor, it may be impossible to get the necessary speed.

We have proposed an alternative scheduling strategy: Adaptive Circular TDMA (AC-TDMA). In AC-TDMA, we implement a repeating burst schedule that allows each node to communicate with each other node once during each frame. The frame is the repeating period. For instance, with n nodes, the frame cannot be shorter than n - 1 periods, since each node must transmit n - 1 bursts per frame.

We can adapt the labeling algorithm given above to compute repeating schedules as follows: Guess the frame time and compute all times modulo the frame time. Try to schedule the n(n - 1) bursts by picking an order of bursts and scheduling each one at a time. If successful, stop; else, try a different ordering or increase the frame time. Since the repeating schedule is computed rarely, perhaps only once, the computing time is almost irrelevant. Therefore, the "guessing" above is tolerable.
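The guess-and-retry loop for circular schedules can be sketched as follows. This is a deliberately simplified, hypothetical model, not the authors' procedure: each node is represented only by its two fictitious links (source side and destination side), each ordered pair (i, j) has a fixed delay `delay[i][j]` in slots, and interior links are ignored; a full version would run the labeling algorithm with all times taken modulo the frame. The frame search starts at the n - 1 lower bound, tries up to `max_orderings` orderings of the n(n - 1) bursts, and increases the frame when none succeeds.

```python
from itertools import islice, permutations
from typing import Optional


def occupy_mod(timeline: list[bool], start: int, b: int) -> bool:
    """Claim b consecutive slots from `start`, wrapping modulo F = len(timeline).
    Leaves the timeline unchanged and returns False on any collision."""
    F = len(timeline)
    slots = [(start + k) % F for k in range(b)]
    if any(timeline[s] for s in slots):
        return False
    for s in slots:
        timeline[s] = True
    return True


def place_burst(out_links, in_links, i, j, d, b, F) -> bool:
    """Greedily give burst i -> j the earliest start s (mod F) at which node i's
    sending link is free and, d slots later, node j's receiving link is free."""
    for s in range(F):
        arrive = (s + d) % F
        out_free = not any(out_links[i][(s + k) % F] for k in range(b))
        in_free = not any(in_links[j][(arrive + k) % F] for k in range(b))
        if out_free and in_free:
            occupy_mod(out_links[i], s, b)
            occupy_mod(in_links[j], arrive, b)
            return True
    return False


def find_frame(n, delay, b=1, max_frame=64, max_orderings=1000) -> Optional[int]:
    """Guess a frame length, try orderings of the n(n - 1) bursts one at a time,
    and increase the frame when no tried ordering fits."""
    bursts = [(i, j) for i in range(n) for j in range(n) if i != j]
    for F in range(max(n - 1, b), max_frame + 1):    # n - 1 is the lower bound
        for order in islice(permutations(bursts), max_orderings):
            out_links = [[False] * F for _ in range(n)]
            in_links = [[False] * F for _ in range(n)]
            if all(place_burst(out_links, in_links, i, j, delay[i][j], b, F)
                   for i, j in order):
                return F
    return None
```

For three nodes with zero delays the search succeeds at the lower bound of two slots. The ordering matters: scheduling the bursts in the naive order 0→1, 0→2, 1→0, 1→2, ... fails at the fourth burst, which is exactly why the text suggests trying different orderings before enlarging the frame.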
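9 Conclusions

We believe these results are very encouraging. In the simulations presented above we achieved scheduling efficiencies of approximately 90% and better. The Highball network promises high utilizations.

There are several improvements one can propose to the scheduling algorithms presented here. We have done limited experiments on some of these enhancements, but there is little room for improvement, owing to the high efficiencies already achieved. One enhancement that does seem to improve the results by a few percent is to choose the minimum hop path when choosing between alternative paths.

The simulations presented here do not take into account inefficiencies caused by in-band reservations scheduling. Before in-band reservations can be recommended with confidence, this effect must be quantified.

References

[1] A. V. Aho, J. E. Hopcroft, and J. D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass., 1974.
[2] D. L. Mills, C. G. Boncelet Jr., J. G. Elias, P. A. Schragger, and A. W. Jackson. Highball: a high speed, reserved-access wide area network. Technical Report 90-9-3, University of Delaware, Department of Electrical Engineering, September 1990.
[3] C. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Englewood Cliffs, NJ, 1982.
[4] P. Schragger. Scheduling algorithms for burst reservations on wide area high speed networks. In Proceedings of the IEEE INFOCOM '91, pages 6B.2.1–6B.2.8, April 1991.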