The Turn Model for Christopher Advanced Adaptive J. Glass and Computer Department Systems State Lansing, {glass MI that a model are deadlock free, livelock free, maximally adaptive. A unique is not based on adding physical topologies (though nels). Instead, which packets wormhole minimaf the model or nonminimal, to networks is based with on analyzing in a network can form. Prohibiting cles produces routing algorithms and Laboratory 48824-1027 .msu. edu extra that just enough turns to break algorithms that are deadlock network, local input devices, and network nique in the output and which by first into flow dividing coniroi routing the digits message the tits in the packet to become available. of the turns three in routing. Partially for 2D meshes, cubes. algorithms adaptive to prevent for 2D meshes permit routing k-ary adaptive are described n-cubes, show and highest The adaptiveness that which sustainable each router a few for tilts to store routing algo- on the pattern of message tratlic. For nonuniform trfic, adaptive routing algorithms perform better than nonones. ture for become constructing computers and net works scalable offer than Systems nodes, have interconnection multiprocessors, shared-memory massive based where large-scale scalable other a popular parallelism approaches on dh-ect each node such [1, 2, 3, 4, 5] and to multiprocessor networks are organized has its own processor, and broadcast Direct The majority of as ensembles locaf work are memory, of meshes in part by the Touchstone advantage. Its date the ACM appear, and Association requires copy for ACM fee are not copyright notice Computing a fee and/or @ 1992 without the copies specific all or made notice IS gven part of th]s or dmtrlbuted and that Machinery. To the copying copy title material for of direct the is by all nodes otherwise, IS granted publication on their have n-cube the or to republish, permission. 0=89791 -509.7/92/0005/0278 $1.50 = (Zj bors if k which is n-cubes. for O < i and of routing also permits them over multiple communications topologies and k-ary [2], the for wormhole n-cabcs, [9, 1]. routing particularly A 2D mesh Intel channels, possible Paragon, is used and the lowin the Symult MIT J-machine [11], and is used in the nCUBE2 the [1]. location in the the same number differs from that mesh. In a, k-ary of neighbors. The of an n-dimensional to k and two nodes n-cube definition mesh [8], of a in that X and Y are neighbors if z, = vi for all i, O < i < n— 1, except one, j, where 1) mod k. The change to modular arithmetic in the definition adds wraparound channels to the k- ary n-cube, giving it symmetry, Every node has n neighbors if k = 2 and 2n neigh- commercial permission In bors if and only if z, = y, for all i, O < i < :n – 1, except one, j, where y, = ZJ & 1. Thus, nodes have from n to 2n neighbors, if aud only to of are low. X is identified by n coordinates, (c., c1, . . . . Zn-2, Zn–l ), where 1. Two nodes X. and Y are neighO<xl<k,–lforO~i<n- YJ that latencies Formally, an n-dimensional mesh has k. x kl x . . . x kn-z x Icn-1 nodes, k, nodes along each dimension i, where k, ~ 2. Each node k-ary provided space attraction Wormhole hypercubes. DELTA all of the k,’s are equaf Permission Another reservation but are sep- 2010 [10]; a 3D mesh is used in the Caltech MOSAIC; and a hypercube grant EC S.88.14027. NSF network and virtual buffer Channel routing, and output meshes n-dimensional Intel and its neighbor by sending a message through one To handle the complexities of routing messages was supported switching. multicast dimensionaf to store and enough circuit switching. part of wormhole making are far more space routing, virtua~ cut-through, and hand, are proportional to the sum to travel. Wommhole routing ahro as multi- depending *This for wormhole on the other and distance in circuit the fashion. a packet the latencies for store-and-forward are of packet length ancl distance to travel e flits other supportive devices. The nodes communicate by sending messages. The network connects each node directly to only a few other nodes, its n eighbora. Which nodes are neighbors is defined by the topology of the network. A node communicates with a node that is not of its neighbors. its communication phases and routes the header flits available, all of buffer [7] require is that rout ers to repficat intercomection. enough routing of contention, to the product When channel store-and-forward for each channel. architec- multiprocessors. just The techniques has some advantages over and release are an integral arate networks requires channel. packet [s]. The latencies cir-ctiit switching, of packet length Introduction Direct each switching the absence proportional packets then where they are for a suitable channel of the attractions nf wormhole rout- an entire wormhole throughput wait One ing is that cut- through and hyper- and nonadaptive and hypercubes latencies deadlock. partial algorithms meshes, of partially has the lowest depends partially adaptive 1 be prohibited of the turns n-dimensional Simulations rithm must quarters into It tech- a message through the network in a, pipeline all of the routing information for topologies for wormhole routing, n-cubes, without extra channels. quarter connect switching wwitches or flits. and lead each packet through the network. reach a router that has no suitable output remaining router it to local which a popular Wormhole free, minimal or nonminimal, and maximally adaptive for the network. In this paper, we focus on the two most common network n-dimensional meshes and Ic-ary In an n-dimensional mesh, just a The connect channels, [6] is becoming tiits in each packet Header jlits contain all of the cyfree, livelock has a router. input networks. a network packets often channels, routing in direct along the turns each node and output routers. Worrahole chan- the dkections and the cycles direct controls it to neighboring feature of this model is that it or virtual channels to network it can be applied can turn routing Ni Science in the for designing M. * University ,nl}@cps Abstract We present Lionel of Computer Michigan East Routing 278 + >2. Another topology with symmetry is the hypercube, a special case of both n-dimensional meshes and k-ary A hypercube is an n-dimensional mesh in which k, = 2 < n – 1 or a 2-ary n-cube. Meshes and k-ary n-cubes simplify are pop&r In general, because topologies meshes cubes. are preferred over high-dimensional meshes and Low-dimensionaf meshes have low, fixed node which makes them and k-ary n-cubes. channel bandwidth routing. in part regular their idea for nonadaptive low-dimensional have extended to a physical k-ary ndegrees, channel cal channel, more scalable than high-dimensionaf meshes They also have fewer channels and higher per bisection density and have lower con- but tention and blocking latencies, which result in lower comunicm tion latencies and higher hot-spot throughputs [8]. On the other hand, high-dimensional vantages that partialf routing algorithms al, fully adaptive some admeshes. They They more tend to have lower diameters, which shortens path lengths. also have more paths between pairs of nodes, which permits fault tolerance. Finally, the symetry of k-ary n-cubes can make it easier to spread Algorithms topology for should have latency, high in VLSI. Features packet routing three network traflic more message characteristics: throughput, contributing low and a network communication ease of implementation to ease of implementation are lit- tual livelock, spreading Many fault packet of these tolerance, tratlic terms evenly, require have been used in different occurs when a packet waits example, a packet by a packet lease some may that routing packets and along routing packets some definition for a network is, in turn, waiting because toward only when and the destination. never leads routing n on minimal nation to re- (possibly away Minimal paths. (not occurs along from routing, Although when or equidistant routing may Deadlock destinations can keep and occurs many readily cludes preventive measures. are less likely to occur and for packets that encounter or hot spots unless from a routing feature, [12]. It because provides continuously in trat%c of adding algorithm r.tittng routes the a packet first higher and higher way ensures that but it avoid virtual also ensures deadlock, channels along dimensions. the zu and and possibly to networks. Another to provide Dally and a path prohibit Seitz more are deadlock and maximally partially free and respecadap- hypercubes in Section 6. waiting on packets wait. would deadlock at least same turns pair than would turns one of the been avoided. in a network, a altogether. The routing one turn in each of the time, every If only have the algorithm of nodes. necessary; In would addition, otherwise, the be reduced. wormhole maximally routing adaptive algorithms for a network, we The model involves analyzing the dican turn in the network and the cycles It prohibits just enough turns to break adaptive adaptive mesh-connected, connected faulty routing current cycle for routing k-ary the network. algorithms n-cube, networks. hexagonal, Adding allows The model for such basic extra the model produces topologies octagonal, and physical or virtual to produce fulfy algorithms, the topic of a forthcoming paper focuses on improving performance as cubechan- adaptive paper [18]. The without the ex- pense of extra channels, as might be done by changing the routing algorithms in existing routers. The steps of the turn model are given in more detail below. Unless specified otherwise, the term “channel” will be used to indicate either a virtual or physical channel. along 1. Partition the channels in the network the directions in which they route v channels in a physical dhection, to being is to add [14] introduced of all of tbe cycles. Routing algorithms that employ the remaining turns are deadlock free, livelock free, minimal or nonminimal, in- way by certain of designing their popular is caused the of the algorithm dead- adaptiveness, and are described deadlock between from then traffic many of partially 2D meshes up in a circular At reaching and in to prohibit propose the turn model. rections in which packets that the turns can form. Ordering the dimensions in thk e-cube algorithms avoid deadlock, nonadaptiveness. have cycles. nels to the topologies dimension to for 2D meshes, and hypercubes performance prevent that channels, or vir- 5 describe derived n-cubes, by prohibiting toler- algorithm feature physical it can be applied 3, 4, and the this To solve this problem sound then along the y dimension algorithm for hypercubes [13] lowest not adaptiveness patterns. O) and to leave wind might would it should phyiicaf or virtual channels. The popular zy for 2D meshes [10, 2] routes a packet first along the z dimension (dimension (dimension 1). The e-cube k-ary routing turned, that, possible have The drawback to previous routing algorithms for meshes and k-ary n-cubes is that they either sacrifice adaptiveness for deadlock freedom or achieve adaptiveness and deadlock freedom at the expense routing and not algorithm algorithm it contributes to alternative paths blocked left suggests many packets fault or non- A unique Model had routing its Indefinite postponement and livelock are generally easier to prevent. Adap- tiveness is also an important severaf of the other features hardware, or all packets This to the desti- more promising, nonminimal routing provides better ance. Of these features, the most important is freedom lock. and studying in wormhole to turn packets path) initially Turn try is possible restricts routing each other in a cycle. Figure 1 shows one way that deadlock can occur in a 2D mesh. Four packets travellhg in different directions injected In con- a predetermined and 2 the routing Llvelock in contrast, minimaf meshes Simulations Deadlock are always competBoth deadlock and rather and Harden the model is based on can turn in a network algorithms of message the paper. but (though Sections routing for different patterns Section 7 concludes it to its destination. is adaptive at times). to shortest Livelock channels. adaptive packet movement, A minimany of the minimal on adding topologies the stop of a packet based algorithms trast, physi- space and wormhole free, for networks. is not nonadaptive can stop a packet from being moving once it is in the network. progress extra tively. resources. Indefinite postponement is similar to deadlock, but occurs when a packet waits for an event that can happen but never does. For example, a packet may wait forever to acquire a packet’s it designing livelock adaptive tive indefinite postponement into a network or from for free, to be released a network resource for which other packets ing successfully. The issue here is fairness. buffer [17] and Lhder with extra channels). Instead, the directions in which packets n-dimensional In wormhole routing, such a circular wait condition will cause deadlock, because a packet holds resources while waiting and excludes other packets from acquiring the held does not a model to network the partially resource. livelock Dally are deadlock is that channels without they authors). Deadlock cannot happen. For first adding channel a new at the ends of the physical channel can share the physicaf channel and bandwidths of the virtual channels channel. An advantage of adding however, is that they can support resource for adding and the cycles that the turns can form. Section 2 presents the model in general terms and applies it to n-dimensional meshes paths, adaptively. (partly ways by d,fferent for an event that wait short presents model networks analyzing than [15, 12, 16] a virtual with a high degree of adaptiveness. algorithm can route packets along and maximally of this researchers Adding It involves in the topology. that minimal, several adapsuch an algorithm for 2D meshes. A partially cannot route packets along every shortest path. paper algorithms tle hardware for channels, buRem, and control logic. Features cent ributing to low latency and high throughput are freedom from deadlock, freedom from indefinite postponement, freedom from paths [16] describe tive algorithm This through free. logic to the two routers the virtual channels It also reduces the sharing the physical or physical channels, shortest evenly. packets and routing. is less expensive it is not control so that routers. already virtual meshes and k-ary n-cubes have y offset those of low-dimensional routing, it to adaptive in v dktinct virtual directions into sets according to packets. If each node has treat these channels as and divide them into v dktinct sets accordingly. Put any wraparound channels (for tori) in a separate set to be incorporated during Step 5. the 279 1 by modifying them to use channels only when they lead toward the destination. Nomninimal routing is more adaptive and fault ❑ lmil m tolerant, II ❑ though. To make this section the steps of the turn model clearer, the remainder of applies them to n-dimensional meshes, starting with 2D meshes. For 2D meshes, in the definition a“’n we first of n-dimensional simplify meshes. south, packet 4 rmd turns north. From can be formed: these left rmd four directions, right turns four turns cannot form a cycle; they tiveness. Deadlock can be prevented four ited, Prohibiting however, packet 1 Figmre as F@re and 1. 2. A wormhole Identify the possible turns other, ignoring 180-degree is only possible when direction. It represents to another when icaf direction, Identify four routers and will The three to the prohibited is possible of turns the partially four turns not (Figure can be prohibited adaptive routing 4c). prevent left turns right 3 describes deadlock algorithms in Figure in 4b, 41b are equivalent to bc,th cycles still exist Section to prevent turn deadlock, allowed based which and describes on these turns. one virtual direction to anO-degree turns. A O-degree there are multiple a transition different identifying topology from and from the two sets route but the cycles erally, 4. involving two in a 2D mesh. turn 3. deadlock west, abstract cycles prevents deadThe remaining also do not allow any adapby prohibiting fewer than right turns allowed in Figure left turn in Figure 4a. Thus, deadlock pairs any 4 illustrates. 4a are equivalent and the three the prohibited + 90-degree traveling turns, however. In fact, only two turns need to be prohibone from each abstract cycle, allowing for some adaptiveness in routing. + Figure eight when east, south, and north. The eight turns form two as shown in Figure 2. The ny routing algorithm lock by prohibiting four of the turns (Figure 3). packet 2 used O and 1 be- kO and kl, become become west, east, come z and V. The lengths of the dimensions, m and n. The directions —z, +a, -g, and +y cm 1 packets the terminology Dimensions virtual that these the simplest channels in one one set of channels packets in the same phys- directions. abstract turns cycles can form. in each Gen- plane F&gure 2. The possible Figure 3. Only abstract cycles and turns in a 2D of the is adequate. Prohibit one turn in each abstract cycle so as to prevent deadlock. The turns must be chosen carefully in order to break every possible cycle, including complex cycles not identified in Step 3. A useful approach is first to break the cycles more in each plane complex 5. Incorporate as wraparound one turn and then to check whether this allows from the set cycles. many channels, turns without for each wraparound as possible reintroducing channel cycles. can always of zy routing four turns (solid lines) are alllowed in the algorithm. At least be incor- porated. 6. Incorporate as many 180-degree and sible, without reintroducing cycles. Routing identified other algorithms in Step allowed that 1 and by Steps route packets use only the along turns 4, 5, and 6 are deadlock minimal or nonminimal, and maximal!y They are deadlock free because breaking circular routing O-degree turns as pos- the sets of channels from one set to anfree, livelock free, adaptive for the network. all of the cycles prevents waits. (We demonstrate deadlock freedom for individual algorithms later. ) Preventing circular waits in thk way decreasing (or increasing) that a network contains order a finite Figure 4. deadlock. [14]. This, together with the fact number of channels, means that a packet will reach its destination after Thus, routing is Iivelock free. Routing the network because the model prohibits turns from one dkection to another. produces will also be nomninimal, but (c) (a) means that it is possible to number the channels in the network so that each algorithm routes every packet along channels in strictly a limited nmuber of hops. is maximally adaptive for the minimum number of Six turns that complete the abstract cycles and allow In an n-dimensional mesh, each node has up to 2n input chanFor each of these 2n directions, nels and 2n output channels. there are 2n – 2 possible 90-degree turns to a different direction, The algorithms the model can easily be made minimal making 280 4n(n — 1 ) total turns. These turns form two abstract cy - cles in each of the n(n – 1)/2 of four turns. planes, Theorem number 1 The minimum making n(n – 1) total O! turns that hibited to prevent deadlock in an n-dimensional OT a quarter of the possible turns. Proofi The partitioned 4n(n into of the n(n one of the – 1) turns n(n cycles of four mesh a quarter 3 of the •1 Adaptive Routing in (a) The a 2D mesh, and two must there abstract are four cycles be prohibited dkections, of four turns. to prevent eight One deadlock. from these two turns, 12 prevent deadlock a deadlock account. case) and are unique if symmetry 3.1 three West-First Routing each cycle a packet must west-jimt t-outing sary, and then for the route south, direction. a packet east, and are shown represent nodes, requiring paths. in that is taken that Note packets that both This fist into suggests west, north. 5b. wait (dashed lines) algorithm. ■ ■ F Ii ■ LA ■ m ■ ■ ■ ■ A ■ paths fig- blocked or take and nonminimal ■ the In this and gray bars indicate minimal ■ if neces- Example in Figure blllil 4 for Algorithm algorithm squares channels native out adaptively west-first ure, black start algorithm: by the west-first ■ WI ways (see Figure Figure 5a shows one way to prohibit two turns in a 2D mesh. The prohibited turns are the two to the west. Therefore, to travel west, lines) — +“ turns, Of the 16 different to prohibit (solid U:m:: 2D 90-degree turn allowed A Meshes In six turns L.-i adapmesh, mesh, k-ary n-cube, and hypercube topologies. for the 3D mesh can be found in [19]. Partially r~ 1-d Each At least in order In the following sections, we introduce specific partially routing algorithms based on the turn model for 2D n-dimensional detailed study 1), can be each. – 1)/2 planes contains two of these cycles. four turns in each cycle must be prohibited -~ r be pro- is n(n– turns to prevent these cycles. Therefore, prohibiting turns is necessary to prevent deadlock. tive must mesh in an n-dimensional – 1) abstract cycles alter- paths are (b) Examples of the west-tirst algorithm in an 8 x 8 mesh. shown. Proof that the west-first partially adaptive algorithm is deadlock free is based on the work of Dally and Seitz [14], who show that a routing algorithm interconnection routes every creasing) ets first numbers sign is deadlock network packet numbers. along channels Because still Theorem lower numbers the farther 2 with if the they The west-first decreasing northward, Figure in the 5. The west-tirst routing algorithm for 2D meshes. the algorithm algorithm other directions, the farther west to eastward, east channels so that strictly the west-first west and then in the to westward channels channels free can be numbered routes (or inpack- we assign lower they are, and asand southward are. routing algorithm is deadlock free. 2m-2-2x@-y Proofi Assign each channel in the m x n mesh a two-digit number a, b in base r, where T is the greater of 3m — 2 and n — 1. Figure 6 shows the particular numbers to assign to the channels leaving node (z, V). Figure 7 illustrates the numbering of a 4 x 4 mesh. To show that west-first routes every packet along channels with strictly decreasing numbers, it is sutiicient to show that, for each channel into an arbltrru-y node, the algorithm can only route the packet shows the out four case, the channels numbers than t along a channel with a lower number. Figure 8 possible cases. Examination shows that, in each the used to route input can make a 180-degree imal routing. a packet channel. turn. This Note turn out of a node that is only in part useful 2m-2-2x,y-l have lower (c) a packet Figure for nonmin- the west-fist ❑ 281 6. Numbering routing of the channels algorithm. leaving each node (z, V) for 7,0 9,0 5,0 1,0 3.2 North-Last F@re 9a shows The prohibited turns fore, a packet should rection it algorithm: and ~1 6,1 4,1 4,1 2,1 then shown 2J 0,1 Figure 7. &o 9,0 5,0 3,0 1,0 &o 9,0 3,0 1,0 Numbering of a 4 x 4 mesh for the west-first north. Algorithm way to prohibit are the only two when travel needs to travel. route a packet north two turns in a 2D mesh. traveling north. when This suggests tlrst adaptively Exrunple in Figure paths for that There- is the last di- the north-last routing west,, south, and east, the north-last algorithm are 9b. 0,1 (a) The 7,0 Routing another six turns routing allowed (solid lines) L T ■ by the north-l=t algorithm. algorithm. 2m-2-2x,n-2-y 2m-2-2x,y Jl- 2m-1-2x,0 X,y T from west (b) input from J-t T 2m-l+x,O in an 8 x 8 mesh. routing algorithm for 2D meshes. north Proofi proof 6 rmd 2m-3-2x,0 routing is similar bers. algorithm to that 7 counterclockwise rections of the channels. routes every packet along 2m-2-2x,n-l-y east The Figures X,y 2m-2-2x,y-l 3 The north-last for is deadlock Theorem 90 degrees, and 2. reverse jree. Rotate the di- The figures now show that north-last channels with strictly increasing num•1 b 2m-3-2x,0 from north-laxt 9. The Theorem 2m-2-2x,n-2-y X,y (c) input Figure algorithm +2m-2-2x,y-l 2m-2-2x?n-2-y 2m-2+x,0 of the north-last X,y 2m-2-2x,y-l (a) input (b) Examples 2m-3-2x,0 2m-3-2x,0 (d) input from south 3.3 Negative-First Figure 10a shows The prohibited negative Figure 8. For each input channel, the output channels the west-first routing algorithm have lower numbers. allowed packet by negative-jirat west turns direction. must and Routing a third start Touting south, and Jesshope al routing. or nonminimal, and two from to travel in a negative algorithm: then paths two turns route in a 2D mesh. a positive direction in a negative direction. adaptively propose a similar The negative-tirst the nonminimal fault tolerant. Example shown in Figure 10b. 282 are the Therefore, out Algorithm way to prohibit a packet east and to a direction, This suggests a the th-st adaptively north. Yrmtchev algorithm [15], lbut only for minimalgorithm can lbe either minimal version being more adaptive and for the negative-first algorithm are Theorem jree. 4 The is a special proof meshes The negative- in Section fir8t routing case of the algorithm one given for is more deadlock n-dimensional adaptive the tive routing the source-destimtion algorithm algorithms, is. For the three partially adap however, SP = 1 for at least half of Another measure of the degree pairs. to which a minimal routing algorithm is adaptive is the ratio Salgm,thm/Sf. Sp/Sf ranges from O to 1, but averaged across 4. all source-destination pairs, Sp/Sf > 1/2, which indicates a significant degree of adaptiveness. Note that Sp = 1 for a sourcedestination pair does not imply that for that pair. The partially adaptive algorithm algorithms p is nonadaptive all permit non- rninimed routing. In the case of the negative-first algorithm, this means that rout ing can be adaptive even when d= < s= and dv ~ Su or when d= > s= and du < Sg. this, see the bottom path in Figure 10b. (a) The six turns allowed (solid lines) by the negative-tirst 4 ■ !9. ■ ■ an illustration of algo- rithm. 99 For Partially Adaptive n-Dimensional ■ 4 This from Routing in Networks section describes the adaptive routing algorithms the application of the turn model to n-dimensional and k-ary n-cubes. In general, these unless n is small or the k’s equal 2. 4.1 n-Dimensional Of the many per cycle, analogs algorithms are gorithm is the all-but- first adaptively for north-last, for 2D meshes. a packet formed noteworthy of the west-first, gorithms are not practical Meshes routing three topologies resnlting meshes The by prohibiting their one turn simplicity. They and negative-first analog of the one-negative-jirst west-first routing in the negative are routing al- routing al- algorithm: directions route of all but one dimension (n – 1) and then adaptively in the other directions. The analog of the north-last routing algorithm is the all-but-onepositive-last the negative (b) Examples Figure 3.4 How of the negative-tirst 10. The negative-fist Degree adaptive of are sion (0) and then the negative-tlmt u. mmm ■ algorithm routing muting Theorem 5 n-dimensional for 2D meshes. Proofi partially adaptive routing adaptive adaptive algorithms, algorithm, Ax (Ax ‘f = Ay partially a node = IdY – Sg [. Then, channel = { AZ!AY! &J!# if d% ~ s% 1 otherwise (.4a&2 ~ ~z!AY! Snorth_la.t = when channels { For minimal routing if dv ~ Sv < SY) or (d~ z so and dv ~ SU) 1 otherwise the the traveling but the only sum of the k, in a negative larger proof All for an of we present algorithm direction, K – n – X – 1, which –1, channels which leaving the the negative-first with strictly dimensional mesh. following theorem, if (dz < s= anddg algorithms, in the negative directions. foT n-dimensional along a K – n – X channels leaving the node If a packet ente~ a node it enters along a channel islessthan node K-n+X, thenurn- in the positive routes every algorithm increasing it enters is less than numbers directions. packet and is deadlock along free. ❑ The negative-fit algorithm is deadlock free as a result of prohibiting just one turn per abstract cycle, the turn from a positive direction to a negative direction. Therefore, prohibiting some quarter of the turns is sntiicient to prevent deadlock in an n- w = be K-n+X of the produces s negattve-fsrst K nmnbered Therefore, otherwise { adaptively positive and K – n + X, the numbers of the in the negative and positive directions. when traveling in a positive direction, + Ay)! ber -f:rst first in the The negative- fivst routing meshes is deadlock free. Let numbered s west a packet adaptively X be the snm of the z, for any node Zn-I ). Nnmber each channel leaving a node in a positive direction K — n + X and each channel leaving a node in a negative direction K — n — X. Then, if a packet enters algorithms? p be one of the three = Idz — s= 1, and route then mesh, and let (ZO , Z* , . . . . z.-2, Let Salgor,thm be the number of shortest paths the algorithm allows from source node (s=, SY) to destination node (d=, dv). Also, let j be a fully and these algorithms are deadlock free, is for the negative-fist algorithm. in an 8 x 8 mesh. algorithm adaptively in the other directions. The analog of routing algorithm is also called the n egative-jirst alg orithrn: directions Adaptiveness these routing algorithm: route a packet tirst adaptively in directions and the positive direction of one dimen- Theorem n(n — 1)) to prevent Salgor,t~~ is, the 283 maximally 6 Combining this with Theorem which supports our claim that adaptive Prohibiting routing some in an n-dimensional deadlock. algorithms. quarter mesh 1, we have the the tnrn model of the is necessary turns and (that suficient is, As the tially number adaptive messages k, of dimensions algorithms adaptively. are large. are SP But = averaged increases, more the likely to 1 less often, across all minimal be able especially Algorithm: Minimal p-cube routing for hypercubes. Input: Current address, C, and destination address, D. Procedure: par- to route when source-destination the pairs, 1. If C = D, exit. Sp/Sj >1 /2n–1, indicating that the degree of adaptiveness relative to fully adaptive algorithms decreases as n increases. Again, adaptiveness 4.2 can be increased k-ary The adaptive to use the way is to allow only on its lock free, wraparound than routing wraparound a packet first of the are routed mesh along k-ary n-cubes according to prove of k-ary along that channels, channels The the n-cubes. routing in strictly decreasing for meshes. Note that freedom all of these For k-ary routing n-cubes deadlock-free routing is a simple with that Linder and is deadlock ever, Harden free, enough minimal, channels subnetworks [16] construct with and fully to n + 1 levels per k-ary Overall, the paths, Routing for hypercubes n-dimensional algorithms meshes in h = 6, hO free They add, how- into 2n -1 and cases of the n-cubes. Proofs are corollaries routing routing especially channel algorithm 3, and algorithm when The offers compared in a di- for hypercubes. a choice of many to tlhe nonadaptive following table illustrates of the 36 possible transmitting for sends a example, shortest the message, e- this the source node (1011010100) node (0010111001). For this h] = 3. One For each node kn address the last paths the number of hop C. destination was m a posltwe address dmectlon, D; and p. nr”uti’gf”r channels 1. If C = D, route meshes routing 2. Ifp=l, and al- 3. Elseif for 4. Else R = C. algorithms that of the special address case of the negative-first of the node the header address has two phases. the packet to the local processor (CA D). and increased of the proofs algorithm flits destination adaptiveness the packet for 5. Route the currently node. occupy, The and fault tolerance, any dimension flits is D. p depends on which C is a unique input buffer constant the Figure distance of adaptiveness bet ween S and D. for the p-cube The routing the any packet i for which the first i for which Wits occupy measure available channel in a di- r, = 1. p-cube routing algorithm for hypercubes. dimension choices comment taken ‘- g: OO1OO1OOOL , . nnlm lnnnn I 7 ./ phase c, = 1 -.-..-”” “ 1 . 0010110001 I 1 0010111001 I , n 1 I ‘w and in the 6 Simulation Experiments To compare the partially adaptive routing algorithms all-but-one negative-fist (ABONF), all-but-one-positive-last (ABOPL), and negative-tirst (NF) with the nonadaptive routing algorithms Zy and e-cube, we simulate a 16 x 16 mesh and a binary 8-cube for three different tratlic patterns. Each of these topologies contains 256 nodes. of the degree Is Sp—ctibe /.$f 12. The minimal address a fully adaptive routing = l(S@D)[ is the Ham- other along CV the router. Konstantinidou proposes an algorithm similar to p-cube [20], but only for minimal routing. The number of shortest paths from S to D, SP_cti~e, is hl !ho !, where IX I represents the number of 1‘s in the binary number X, hl=l(S A ~)1, and ho=l(s A D)l. For algorithm, ~, Sf = h!, where h = hl +ho R= rout- routing, for each router, header O,then and D p-cube In the case of minimal along D= D. has a particu- and d, = 1. Then, the steps can be computed as shown in Figure 12. In both of these algorithms, the only input transmitted in the header CA R=CA the routing fist phase routes the packet along a dimension i for which c, = 1 and d, = O. When there is no such dimension, the second phase routes the packet along a dimension i for which c, = O and d, = 1. These steps are easily computed using bitwise logic operations as shown in Figure 11. If nomninimal routing is desired, because can also route then mension ing algorithm ming p-cube algorithm. = whether cases. be the binary of its p-cube Current that larly compact expression, the p-cube routing algorithm. Let S be the binary address of the source node for a packet, C be the binary available choices based on the p-cube routing is also shown. The number of choices in rmrentheses indicates the additional choices available with nonmi~mal routing. Hypercubes are special and k-ary are deadlock general The any ~i = 1. addhg n-cube Hypercubes are a special case of both n-dimensional k-ary n-cubes. Consequently, the partially adaptive more along exit. p-cube gorithms packet 10-cube where to the destination is shown. per level. 5 CAD. i for which routing a binary message nonmini- algorithm subnetwork and to construct a routing the the shortest cube Again, without adaptive. to partition R= 11. The minimal extra channels. This is a result of the many cycles that do not involve turns in the topology. By adding channels to a k-ary ncube, processor or- of the proof are strictly are minimal O,then Route Figure channel and then channels. k > 4, it is impossible algorithms local packets or increasing modification algorithms to the CAD. mension algorithm. Thus, a node at the east edge will have two channels to the west: a mesh immediately to its west and a wraparound of deadlock If R= packet dead- can be extended at the west edge of the mesh 3. 4. One is still in another way: classify each wraparound the direction in which it routes packets to a node R= channel on whether algorithm 2. the can be ex- a wraparound depending negative-tirst apPIY the negative-tit of the mesh channels channel to the node maL To for meshes channels to be routed hop. der in the proof. the proof algorithms number the mesh channels as before and assign the channels a number that is either greater than or less those channel routing. n-cubes partially tended by nouminimal route of neighboring = 1/(:,). 284 A pair of unidirectional routers and channels each router connects to its local each pair processor. All of the channels input channel The routers have into the same a router operate bandwidth, has a bufTer asynchronously 20 tlits/flsec. the and synchronize rithms about Each size of a single flit. (Figure 13). At low throughputs, the algorithms perform the same. For the nonuniform tratiic patterns, the partially adaptive to simult~ routing algorithms throughputs (Figures have the lower high are blocked the source mum sustainable throughput of the partially is four times that of the nonadaptive e-cube contain header an input flits selection consumed. waiting policy must local first-come-first-served that arrived indefinite channel has multiple selection policy decides in favor multiple arbitrate. first. This postponement. output in favor policy When channels channels channel, of the header a header policy along and queued at their the source network processors fit an output used is called ZY and the lowest dimension. Average communication latency and Teverse-jiip. any of the other the The uniform processors with matrix-transpose cessor at row umn i. In the by mapping bors in the then sent mesh. nodes resulting one d determined message the by from hypercube so that hypercube. the q ). The 0 100 200 Average 300 matrix transpose are in the reverse-tlip Figure 14. Comparison traffic in a 16 x 16 mesh. pattern Overall one at of routing in the hypercube, are for the partially These throughputs sends each able throughput algorithm and is not algorithms the highest I I I Average commun- 25 ication Z. due to shorter ( longer - path lengths for matrix-transpose than tratlic 15 tion - xv 10 west-first 1 0 0 100 Average 1 I 200 300 network negative-first I I 400 500 I 600 throughput for reverse-flip 800 packets. (flits/Psec) The maintained Despite The and simulations hypercube, latencies at high of routing tratfic for uniform happen tratlic From pattern that, nonadaptive throughputs for uniform routing than the traffic algorithms traffic probably partially in the mesh have adaptive hops) than algorithms routing to embody pattern. trafhc (11.34 routing adaptive result as well the is that as when superior for uniform provide better the The av- for uniform perform algorithms global, with for long-term a global, starts informa- long-term message bet- uniform point traffic of spread and the zv and e-cube algoadaptive algorithms, on the lower A traffic processes algo- applications, performance traiiic, 285 the will of uniform tratiic information is used. of the nonadaptive partially perforrmmce pattern is determined are mapped to the each node evenness global tratiic has been used in many we know of no real applications ind]cate the algorithms tratlic. tratlic is 4.27 hops, versus 4.01 in the mesh, the highest sustti’n- across the mesh or hypercube, maintain that evenness. The algorithms Figure 13. Comparison in a 16 x 16 mesh. for the e-cube in throughput other hand, select channels based on local, short-term information. These selections tend to benefit just the routed packet and ordy for the immediate future and tend to interfere with other -AI 700 they this the uniform evenly rithms -Q-- partially is that about view, -Q- north-last 5; * the tratiic. sustain- throughput in the mesh, which occurs for the uniform tratRc. Again, average path length is traflic (10.61 hops). The reason the nonadaptive ter throughputs and reverse-tlip the next highest is for the negative-&t algorithm and matrixThis throughput is 30~o higher than the second highest sustainable xy algorithm and latency (flsec) 800 for matrix-transpose in the hypercube, which occurs uniform tratiic. This improvement able throughput transpose trallic. i 35 30 700 (flits/flsec) sustainable adaptive algorithms are 50% higher than erage path length for reverse-flip hops for uniform tratlic. Overall I 600 sends each message at (ZO, ZI, zz, Z3, ZA, Z5, ~Ij, $7) to the I 500 throughput neigh- Messages (Z-7, Z-f, $–~, ~–~,Z-3, @, Z–I, Z-o). 40 400 network at row j and colpatterm is derived in the hypercube the processor by the pru- at (ZO, ZI, $2, Z3, X4, Z5, Z6, X7 ) to the (3Y4, W, Z6, X7, ZZO,r~, w, from each in the dictated pattern the processor message to are neighbors F pattern sends each message to equal probability. In the mesh, sends a 16 x 16 mesh mesh zo dependent. Three matrix-transpose, i and column j to the one hypercube, a matrix-transpose to the The from pattern 25 and bounded. is largely the message trafFic pattern, which is application network workloads are considered: uniform, algorithms in an input to it, is small performance adaptive algorithm. flits routing is minimal. For each simulation, two characteristics of network performance are measured: average communication latency (in Usec) and average sustainable net work throughput (in tlits delivered per psec). The throughput is sustainable when the number of Obviously, at therefore All packets especially For matrix-transpose used is called is fair available must arbitrate. The of the output channel input output The policy and decides in the router prevents When for the same available 16). traflic in both the mesh and hypercube, the maximum sustainable throughput of the partially adaptive algorithms is twice that of the nonadaptive algorithms. For reverse-fllp traftic, the maxi- from immediately entering the net work are queued at processor. Messages that arrive at a destination pro- cessor are immediately 14, 15, and latencies, neously transmit the tlits in a packet. The processors generate messages at time intervals chosen from a negative exponential distribution. Each message has an equal probability of being one packet of 10 or 200 tlits. Messages that previous that adaptive in real systems. is not routing algorithms Uniform simulation studies, but generate uniform traffic. by the application and how its nodes of the network. For most communicate with some nodes much more than algorithms 30 Average25 communication Z. latency . ABONF Q ABOPL p-cube = L presents illustrate, is often 7 Conclusions Our goal tions because has been to poor performance. and Future make the interconnection in which packets number they a problem for the are nonadaptive. Just maintain the evenness of uniform tratlic, the unevenness of nonuniform trafiic. The minimum use of net works. can turn of turns best break the channels Analyzing in a network that they blindly result, ss the Work cycles faulty ment spot hardware and decreases and livelock. Nonminimal avoidance and fault the produces free, minimal freedom and livelock freedom are essentiaf for routing algorithms. ness increases the chances that packets can avoid hot 15 - in the direc- and prohibiting all of the routing algorithms that are deadlock free, livelock or nonminimal, and maximally adaptive. Deadlock 10 Adaptivespots and the chances of indefinite postponerouting allows, even greater hot- tolerance. The turn model, urdike other 54 apprOmhes to designing ~aptive routing algorithms, is applicable to networks with only the channels required by the network o~ topologies channels). o 100 200 Average 300 400 network 500 throughput 600 700 without 800 tially (flits/psec) (as well Applied extra 15. Comparison in an 8-cube. ofroutingalgorittis formatrix-trampose While the disadvantages. as to networks with extra to n-dimensional meshes chanuels, the turn routing algorithms. adaptive adaptive routing than nonadaptive tratiic. Figure traffic tratlic as they maintain wormhole-routed 35 (psec) Nonuniform e-cube figures 40 others. ZY and algorithms algorithms turn model Adaptive trol logic for route this may increase the need for produces Simulations tion on more selection than delay. of the between information. bases one of the dimensions. a selection partially does nonadaptive Part to decide header typically new, par- indicate that they can perform better for nonuniform pat terns of message For a selection on the distance remaining and is due output to chan- Another part of the to base the route selecdimension-order on the For adaptive routing, complexity multiple nels, all of which lead to the destination. complexity is due to the need for a router a router severaf of these has many advantages, it also has some routing can require more complex con- node a router model physicaf or virtual and k-ary n-cubes distance routing, remaining routing, a, router in more than in must base one, or all, di- mensions. Every extra bit of header information that is required for the router to select an output channel increases router storage requirements of store 40 I 35 (flsec) e-cube * on network ~ the Q -,& cal channels. I tigate the turn effects model to apply octagonal, 15 10 o~ for future input In [18], to networks Other that models for networks. and more like those work. In [19], we inves- output selection we illustrate include designing Another mit adaptive routing topologies, the turns without are not stract necessarily cycles are not task butions, the extra virtuaf adaptive policies application of or physi- routing algo- obvious extension do for of our work is is the so that the the addition of channels. necessarily 90-degrees and formed identification results by four of realktic of future simulations turns. workload can In such the abA final distribe more meaningful. 500 network 1000 1500 throughput 2000 2500 (flits/psec) Acknowledgments The Figure 16. Comparison fic in an 6-cube. latencies the turn model to other topologies, such as hexagonal, and cube-connected cycle net works, all of which per- important u] Average directions of different performance. the enhanced o communication rithms are based on adding extra channels to networks, but not produce routing algorithms that are maximally adaptive .20 54 are mauy I ABOPL p-cube (NF) 25 There I ABONF 30 Average communication latency and makes and f&ward. of routing algorithms for reverse-flip authors develop traf- wish to thank Dr. Philip K. McKinley for helping us the simulator. References 1. NCUBE 2. Intel iion, 286 Company, Corporation, 1991. NCUBE A 6400 Touchdone P70ce88c,r DELTA Manual, System 1990. De8crip- 3. S. B. Borkar, Kung, R. Cohn, M. Lam, P. S. Tseng, J. Sutton, An integrated solution Proceedings 4. D. Lenoski, J. May A. and J. Webb, parallel Gharachorloo, multiprocessorfl on Agarwal, B.-H. Lim, A processor tiprocessor,” on Computer W. J. Dally D. Kranz, tmchitecture in Proc. of the Architecture, Networks, 9. vol. vol. 3, no. 4, pp. nection “Performance IEEE net works,” 39, pp. 775–785, J. J. Kubiatowicz, International torus May 1990. routing 1, no. 3, pp. X. Lh June 267–286, chipfl Journal 187–196, and L. M. Ni, “Deadlock-fi-ee networks,” International 116–125, C. L. Seitz, A new Computer 1979. multicast wormhole in Proceedings Symposium May 1986. 1990. ing in in multicomputer pp. mul- Symposium analysis of Ic-ary n-cube interconTransactions on Compute.., vol. C- Dally, Annual 10. and protocol 7. P. Kermani and L. Kleinrock, “Virtual cut-through: computer communication switching technique,” W. 20. 1988. multiprocessing 104–114, “The Computing, and for 17th pp. and C. L. Seitz, of Distributed 8. Conference the 17th Internapp. 148–159, of Architecture, on of Computer routthe 18th Architecture, 1991. W. Athas, C. C. M. Flaig, A. J. Martin, J. Seizovic, C. S. Steele, and W.-K. Su, “The arch] tecturc and programming of the Ametek Series 2010 multicomputer~ in Proceedings of the Third Conference current Computers and Applications, CA), pp. 1988. 11. 12. 13. W. J. 33–36, Association “The Dally, on Hypercube ConVolume I, (Pasadena, for Computing J-machine: System in Actors: Knowledge-Based Concument and eds.), 1989. Agha, MIT Press, Mach] nery, support for Jan. Actors) Computing (Hewitt W. J. DallY and H. Aoki, “Adaptive routing using virtual channels,” tech. rep., Massachusetts Institute of Technology, Laboratory for Computer Science, Sept. 1990. H. Sullivan fully Annu. and T. R. Bashkow, distributed Symp. parallel “A large machine? Architecture, Comput. scale, homogeneous, in Proceedings oj the lth vol. 5, pp. 105–124, Mar. 1977. 14. W. J. DallY and in multiprocessor tions 15. on Computers, J. T. Yantchev deadlock-free IEE 16. 18. and vol. C-36, Pt. routing for E, vol. W. J. Dally, “Fine-grain message ers,” in PTOC. of the Third rent Computers, vol. vol. pp. 1987. low latency, of processorsfl 178-186, May in 1988. “An adaptive and fault tolfor k-ary n-cubes ,“ IEEE 40, pp. passing Conference 1, (Pasadena, May “Adaptive, networks 136(3), D. H. Linder and J. C. Harden, erant wormhole routing strategy on Computers, message routing IEEE Tmnsac- pp. 547–553, C. R. Jesshope, packet Proceedings, Transactions 17. C. L. Seitz, “Deadlock-free interconnection networks,” on CA.), 2-12, Jan. concurrent Hypercube pp. 2–12, 1991. computConcurJan. C. J. Glass and L. M. N], “Adaptive routing in meshconnected networks,” in Proceedings of the 12th International on Distributed Computing System., June 1992. in Gupta, coherence in Proc. Computer Nov. A. cache “iWarp: computing: ’88, pp. 330-339, K. 19. H. T. L. Rankh, 1990. “APRIL: 6. J. Urbanski, directory-baaed Symposium T. Gross, J. Pieper, to high-speed Laudon, “The for the DASH tional S. Gleason, C. Peterson, oj Svpercompnting J. Hennessy, 5. G. Cox, B. Moore, 1988. C. J. GhSS and L. M. Ni, “Maximally fully adaptive routing in 2d meshes; Tech. Rep. MSU-CPS-ACS-51, Dept. of Computer Science, Michigan State University, East Lansing, Michigan, Jan. 1992. 287 S. Konstantinidou, “Adaptive, minimal routing in hypercubes, “ in Proc. of the 6th MIT Conference: Advanced Re1990. search in VLSI, pp. 139–153,