The turn model for adaptive routing

advertisement
The Turn
Model
CHRISTOPHER
for Adaptive
J.
GLASS
J!flchgatz State
Uni{er.~i[y,
Abstract.
paper
presents
model
is th~t
This
feature
of
networks
the
(although
[In analyzlng
can form.
on the
k-ary
the
directions
two
most
prohibited
fidaptwe,
Adaptive
k-my
network
routing
~t-cubes,
rithms
show which
on the
pattern
perform
better
algorlthm
with
The
traffic.
wormhole
three
for
latcncies
nonuniform
of the
turns
traffic,
adaptive
the
that
meshes
nonadaptive
and
must
routing
be
to be
n-dimensional
routing
throughput
routing
turns
focuses
turns
allow
sustainable
the
paper
meshes,
and
and highest
This
of
two-dimensional
of adaptive
that
direct
is based
algorithms
~z-dimensional
a quarter
quarters
Simulations
routing
adaptive.
unique
to
the model
and the cycles
routing,
just
A
channels
Instead,
and highly
networks,
are described
For
nonadaptive
for
such
remaining
hypercubes.
channels).
m a network
or nonminimal,
In
algorithms,
or virtual
all of the cycles produces
topologies
channels.
routing
physical
extra
can turn
has the lowest
of message
than
NI
wormhole
cm adding
to brezk
algorithms
and
designing
packets
mmimal
deadlock.
for
based
to networks
free,
extra
to prevent
meshes
is not
turns
common
without
M.
Mtchigan
enough
livelock
LIONEL
a model
lt
in which
just
free,
/z-cubes
East Lansuzg,
it can be applied
Prohibiting
are deadlock
AND
Routing
algorithms
algo-
depends
generally
ones.
Ctitegorms
and SubJect
Descriptors:
C 1.2 [Processor Architectures]:
Multiple Data Stream
Arch itectures—uiter c[,nnectim arc)zitectures: parallel processors: C,2. 1 [Computer-~ ommunication
izet~dx:
~zetwork comnunicat~ons;
Networks]: Network Architecture and I)csign-&fnlwtcd
tz~,ttvork topolom:
Ciencral
packet
Terms:
Additional
Kcy
computers,
medl
rzehvorks
Algorithms,
Design,
Words
Phrases.
md
topology,
Performance
wormhole
Adaptive
routing,
de~dlock,
hypcrcubc,
massively
parallel
routing
1. Introdzlction
Direct networks
have become a popular
interconnection
architecture
for consuch as multicomputers
and scalable
structing
large-scale
multiprocessors,
shared-memoly
multiprocessors.
Direct
networks
offer
massive
parallelism
[NCUBE
Company
1990; Intel
Corporation
1991; Borkar
et al. 1990; Agatwal
et al. 1990] and are more
This
work
in part
&f II’
93-04066.
was szrpportcd
Symposium
A shorter
on Computer
Authors’
address:
Michigm
State
Permission
of the
University.
or distributed
publication
Association
Computer
East
fee M
its date
for Computing
d
(h.
,A..ocI.lt)m
I
for
or part
appear,
Machinery.
Compul~ng
MI
M.wh~nay,
Laboratory,
48 S24-1027.
(NSF)
at the
other
grants
19th
Department
c-Mail:
of this material
commercial
specific permission.
G 19!)4 ACM
00(J4-541 1/94/0900-0874
J<wrn, tl
Foundfltlon
was presented
Systems
Lonsing,
for dmect
and
Science
of Lhls paper
et al, 1988; Lenoski
than
approaches
ECS 88-14027
Annual
and
International
Architecture.
Advanced
to copy without
not made
by National
vclwon
scalable
and
advantage,
notjce
To copy
{glass,
IS granted
provided
the ACM
is gwen
otherswse,
that
of Computer
copyright
copying
41,
N“
5, S.ptcmlxr
1YY4,
pp
that
notice
the copies
874–[ 102
requires
are
and the title
is by permission
or to republish,
$03.50
V<>l
Science,
ni}<~cps.msu.edu.
of the
a fce and/or
The Turn Model
to
for Adaptile
multiprocessor
organized
local
directly
defined
each
with
interconnection.
as an ensemble
memory,
Routing
and other
875
A
of nodes,
supportive
system
based
where
each
devices.
The
on
node
a direct
has its own
network
connects
neighbors
through
for
forwarding.
the direct
local
which
is
processor,
each node
to only a few other nodes, its neighbors. Which nodes are neighbors
is
by the topolog
of the network.
Neighboring
nodes communicate
with
other by sending
messages across the channels.
A node
a node that is not its neighbor
by sending
a message
both
network
network,
channels,
connect
To
which
handle
the
each node
connect
it to neighboring
complexities
often
of
has a router.
it to local
devices,
routing
The
and
communicates
to one of its
messages
router
network
controls
channels,
routers.
Wownhole
routing
[Dally
and Seitz 1986] is becoming
a popular
switching
technique
in direct networks.
Wormhole
routing
switches a message along a
network
by first dividing
the message into packets and the packets into flow
control digits or f?its. It then routes the flits in each packet through
the network
in a pipeline
fashion.
Header jlits contain
all of the routing
information
for a
packet and lead each packet through
a router
that has no suitable
output
the network.
When
channel
available,
the header flits reach
all of the flits in the
packet wait where they are for a suitable channel to become available.
the attractions
of wormhole
routing
is that each router
requires
only
One of
enough
buffer
space to store a few flits for each channel.
Another
attraction
is that
communication
latencies
are low. In the absence of contention,
latencies
are
proportional
to the sum of the packet length and the distance to travel [Dally
1990].
The
majority
of network
topologies
meshes and k-my n-cubes. particularly
A two-dimensional
mesh is used
Corporation
the Caltech
1991], the Intel
MOSAIC;
for wormhole
routing
are ~z-dirnensional
low-dimensional
meshes
in the Intel
Touchstone
Paragon,
a two-dimensional
the Symult
torus
2010 [Seitz
and hypercubes.
DELTA
[Intel
et al. 1988], and
is used in the Fujitsu
AP1OOO
[Ishihata
et al. 1991], a three-dimensional
mesh is used in the MIT J-machine
[Dally
1989], and a hypercube
is used in the nCUBE-2
[NCUBE
Company
1990].
Formally,
an n-dimensional
mesh has k. X kl X “”” X k,, - 1 nodes, k, nodes
along each dimension
i, where
is identified
by n coordinates,
each dimension
all
i
except
i. Two
one,
j,
nodes
where
k, > 2. Each node X in an n-dimensional
mesh
(.xn, 1-1. . . . . 1-,1_ 1) , where O<xi<k,
–l
for
X and Y are neighbors
if and only if x, = y, for
yj = x, + 1. Thus,
neighbors,
depending
on their location
1990], all nodes have the same number
nodes
in the mesh.
of neighbors.
have
from
n
to
2n
In a k-ary n-cube [Dally
The definition
of a k-ary
n-cube differs from that of an n-dimensional
mesh in that the k,’s are all equal
to k and in that two nodes X and Y are neighbors
if and only if xl = J’, for all i
except one, j, where y, = ( XJ & 1)mod k. The change to modular
arithmetic
in
the definition
adds wraparound
channels to the k-ary mcube, giving it symmetry. As a result, each node in a k-ary n-cube has n neighbors
if k = 2 and 2n
neighbors
if k > 2. Another
topology with symmetry
is a hypercube,
which is a
n-cubes. A hypercube
is
special case of both }z-dimensional
meshes and k-a-y
an n-dimensional
mesh in which k, = 2 for all i, or a 2-ary n-cube.
Meshes
and k-ary
n-cubes
are popular,
in part, because
their
regular
topologies
simplify the routing of messages. In general, low-dimensional
meshes
are preferred
because they have low, fixed node degrees, which makes them
876
C. J. GLASS
more
scalable
dimensional
per bisection
result
than
in lower
1990].
In
high-dimensional
meshes
density
communication
addition,
topological
meshes
also have fewer
and have lower
their
dimensions
latencies
function
being
sions. High-dimensional
lower diameters,
which
channels
contention
k-ary
L. M.
n-cubes.
NI
Low-
and higher channel
bandwidth
and blocking
]atencies, which
and higher
better
easier
and
AND
hot-spot
matches
their
to implement
throughputs
form,
two
in the three
meshes and k-ary n-cubes, on the
shortens path lengths. High-dimensional
[Dally
and
physical
three
dimen-
other hand,
topologies
have
also
have more paths between
pairs of nodes, which permits
more fault tolerance.
Finally,
the symmetry
of k-ary n-cubes can make it easier to spread message
traffic more evenly.
The sequence
of channels
a message packet traverses
on its way from its
source
the
node
to its destination
routers
of the
have three
features:
direct
node
network
low communication
ease of implementation
in VLSI.
provide
packet transmission
that
Factors
contributing
buffers,
switches,
high throughput
from
Iivelock,
routing
packets
some
by different
by the routing
A good
latency,
high
to ease of implementation
and control
along
short
definition
circuits.
authors).
Factors
are few
fault
tolerance.
they
have been
occurs
when
a packet
Deadlock
and
that
should
throughput,
and
algorithm
should
and practicality.
and
contributing
because
algorithm
algorithm
network
small
channels.
to low latency
deadlock,
freedom
from
traffic
evenly,
routing
paths,
(partly
routing
In other words, a routing
is high in quality,
quantity,
are freedom
from
spreading
packet
require
is determined
execute.
and
starvation,
freedom
packets
adaptively,
Many
of these
terms
used in different
waits
for
ways
an event
that
cannot happen. For example,
a packet may wait for a network
resource to be
released by a packet that is, in turn, waiting for the first packet to release some
resource.
In wormhole
routing,
such a circular waif condition
causes deadlock,
because a packet holds resources
from acquiring
the held resources.
when
a packet
a packet
packets
may
waits
wait
are always
for an event
forever
to
competing
while waiting
and
Starvation
is similar
that can happen
acquire
but never
a network
successfully.
excludes
other packets
to deadlock,
but occurs
Starvation
does. For example,
resource
for
which
is only possible
other
when
the
allocation
of network
resources
is unfair.
Both deadlock
and starvation
can
stop a packet from being injected
into a network
or from moving once it is in
the network.
In contrast,
Iivelock does not stop the movement
of a packet, but
rather
stops
its progress
toward
the
destination.
routing
of a packet never leads it to its destination.
when routing
is adaptile
(not along a predetermined
Lil’e/ock
Livelock
path)
occurs
when
the
is possible only
and nonrninimal
(away from,
or equidistant
to, the destination
at times). Minimal
routing, on the
other hand, restricts packets to shortest paths. Although
minimal
routing
may
initially
sound more promising,
nonminimal
routing
provides better fault toler-
ance.
Of the factors contributing
to good routing
algorithms,
the most important
one is freedom
from deadlock.
Deadlock
can keep many or all packets from
reaching
their
destinations
and occurs
readily
unless a routing
algorithm
contains
preventive
measures.
Starvation
and livelock
are less likely to occur
and are generally
easier to prevent.
Adaptiveness
is also an important
factor,
because it contributes
to several of the other factors [Dally and Aoki 1990]. It
provides alternative
paths for packets that encounter
unfair channel allocation,
Paulty hardware,
or hot spots in traffic patterns.
The adaptiveness
of a routing
The Tum
Model
algorithm
falls
the
into
algorithm
nonadaptive
McKinley
choose
a path
every
a partially
node
a packet
from
from
on the number
packets,
toward
algorithm
its source
A partially
multiple
leading
adaptive
depending
forwarding
is predetermined.
1993] can choose
from
Thus,
when
or the
of nodes
number
can route packets along. A nonadaptive
node for each packet
to be forwarded.
routes
that
877
categories,
from
the algorithm
of only one
algorithm
along
Routing
one of three
can choose
shortest paths
has a choice
node
for AduptiL’e
nodes
to its destination
adaptil’e
algorithm
for some packets,
the destination
cannot
node
route
node
packets
of
algorithm
Thus, a
[Ni and
but cannot
for every packet.
along
every
shortest
path in a network
topology.
A fidly adaptil]e
node leading
toward
the destination
node
algorithm
can choose from every
for every packet.
Thus, a fully
adaptive
topology.
every
algorithm
can route
packets
along
shortest
path
in a network
The drawback
to previous
methods
of designing
routing
algorithms
is that
they either sacrifice adaptiveness
for deadlock freedom
or achieve adaptiveness
and deadlock
freedom
at the expense of adding physical or virtual
channels.
The popular method of dimension ordering is based on ordering
the dimensions
in a network
Applied
and routing
packets
to two-dimensional
Corporation
along
meshes,
the
dimensions
it produces
1991; Seitz et al. 1988]; route
strictly
the .x-yrouting
a packet
first
along
in that
order.
algorithm
[Intel
the x dimension
(dimension
hypercubes,
0) and
1977]; route
and higher
a packet first along the lowest dimension
and then along higher
dimensions.
Ordering
the dimensions
ensures
that the xy and
e-cube
then
it produces
algorithms
avoid
along the
the e-cube
deadlock,
y dimension
(dimension
routi?zg algoiith?n
[Sullivan
but it also ensures
1). Applied
to
and Bashkow
nonadaptiveness.
Another
popular
method
of designing
routing
algorithms
that are deadlock
free, and
possibly adaptive, is to add Lirtual channels to a network.
Dally and Seitz [1987]
introduced
the method
for nonadaptive
routing,
and several
researchers
[Yantchev
have
and Jesshope
extended
1989; Dally
it to adaptive
and Aoki
routing.
1990: Linder
Adding
a virtual
and Harden
channel
1991]
to a physical
channel is less expensive
It involves adding buffer
than adding a new physical channel, but it is not free.
space and control
logic to the two routers at the ends
of the physical
channel
channel
and routers.
It
so that the virtual
channels
also reduces the bandwidths
already
sharing
the
physical
channel.
An
can share the physical
of the virtual
channels
advantage
of
adding
virtual
or
physical channels, however, is that it can produce routing
algorithms
that have
a high degree of adaptiveness.
For example,
Dally [1988], Linder
and Harden
[1991],
and
Glass
and
Ni
[1992]
describe
how
two-dimensional
mesh in order to produce
This paper presents a model for designing
are deadlock
free,
for networks.
A unique
physical
or virtual
livelock
free,
feature
channels
minimal
add
or nonminimal,
of the model
to networks
to
virtual
a fully adaptive
wormhole-routing
is that
(although
channels
to
a
routing
algorithm.
algorithms
that
and highly
it is not based
it can be applied
adaptive
on adding
to networks
with extra channels).
Instead, the model is based on analyzing
the directions
in
which packets can turn in a network
and the cycles that the turns can form.
Section 2 presents the model in general terms. Sections 3,4, 5, and 6 apply the
model to two-dimensional
meshes, n-dimensional
meshes, k-ary n-cubes, and
hypercubes
without
virtual
channels,
producing
a variety of partially
adaptive
routing
algorithms.
(Without
adding channels to meshes and k-ary n-cubes, it
is impossible
to produce
routing
algorithms
that are deadlock
free and fully
C. J. GLASS AND
878
adaptive
[Ni
rithms.
and
as well
Sections
2. The
McKinley
1993].
as previously
7 and 8. Section
of the
nonadaptive
9 concludes
partially
algorithms,
M.
adaptive
NI
algo-
are described
in
the paper.
Model
TLU71
Deadlock
Simulations
known,
L.
in wormhole
routing
is caused
by message
packets
waiting
on each
other in a cycle. One way that deadlock can occur in a two-dimensional
mesh is
shown in Figure 1. Four packets traveling
in different
directions
try to turn left
and wind up in a circular
wait. If only one of the packets had not turned,
deadlock
algorithm
turns.
the
would
might
have been avoided.
This possibility
suggests that a routing
prevent
deadlock
in a direct network
by prohibiting
certain
The routing
many
algorithm
possible
would
cycles
in
algorithm
would
have to leave
algorithm
should
not prohibit
wormhole
adaptive for a direct
directions
in which
routing
network,
a path
more
tiveness would be reduced.
The preceding
observations
designing
have to prohibit
the
between
turns
suggest
At
every pair
than
that
the
time,
the
Finally,
the
otherwise,
solution
are
in each of
same
of nodes.
necessary:
a new
algorithms
at least one turn
however,
to
deadlock
the
its adapproblem
free
and
of
highly
network.
the turn model. The model involves analyzing
packets can turn in the network
and the cycles that
the
the
turns can form. It prohibits
just enough of the turns to break all of the cycles.
Routing
algorithms
that employ the remaining
turns are deadlock free, livelock
free, minimal
or nonminimal,
and highly adaptive for the network.
The steps of
the turn
term,
model
are given
channel,
indicates
(1) Partition
directions
topological
in more
either
detail
below.
a physical
Unless
or virtual
specified
otherwise,
the channels
in the direct
network
into sets according
to the
in which they route packets. If each node has u channels
in a
direction,
l’irtua[
treat these channels
as being in L’ distinct
L’
distinct
sets accordingly.
directions
and divide
them
into
wraparound
channels
in a separate set to be incorporated
during
(2) Identify
the possible turns from one virtual
direction
to another,
O-degree
and 180-degree
turns.
A (1-degree turn
are multiple
channels
in a topological
from one set of channels
to another,
topological
(3)
Identify
simplest
the
channel.
direction,
but different
is only possible
Put any
Step (5).
omitting
when
there
direction.
It represents
a transition
where the two sets are in the same
virtual
directions.
the cycles that the turns
can form.
Generally,
cycles in each plane of the topology
is adequate.
identifying
the
(4) Prohibit
the minimum
number
of turns
so that at least one turn
is
prohibited
in each cycle. The turns must be chosen carefully
in order to
break every possible
cycle, including
very complex
cycles. A useful
approach is first to break the cycles in each plane and then check whether
doing so allows more complex cycles.
(5)
Incorporate
as many turns as possible
channels,
without
reintroducing
cycles.
wraparound
(6) Incorporate
reintroducing
Routing
channel
can always
as many
cyc~es.
algorithms
that
O-degree
route
in Step (1) and use only the turns
involving
At least
the set of wraparound
one turn involving
each
be incorporated.
and
180-degree
packets
from
along
turns
as possible,
the sets of channels
one set to another
allowed
without
identified
by Steps (4),
The Tum
Model
for Adaptile
Routing
879
K
packet3
n
—
It
..’’’””
II
•1
.,
“O
n
~
packet 4
*
❑☞
“’’...,.❑
I
~~::i:election
packet
0.,m
J-
buffer
t
packet 2
.
~
fl,
u
progression
................ +
packet
awaiting
resource
H /“” ❑
i >,, II
packet 1
-0
T
FIG.1.
Awormhole
deadlock
invoicing
four routers and four packets inatwo-dimensional
(5), and (6) are deadlock free, livelock free, minimal
or nonminimal,
adaptive
for the network.
They are deadlock
free because breaking
cycles involving
channel
turns
dependency
algorithms
are
check
the turn
that
channel
the
free
model
are
means
each
(or
and Seitz
in later
applied
that
algorithm
increasing)
to another
prevents
1982]. Proofs
presented
has been
graph
so that
decreasing
one set (direction)
[Dally
deadlock
dependency
network
strictly
from
graph
correctly.
that
it is possible
to number
packet
order.
The
cycles in the
possibility
routing
but
Preventing
routes
every
and highly
all of the
particular
sections,
mesh.
only
as a
cycles in the
the channels
along
of such
channels
in
in
a numbering,
together
with the fact that a network
contains
a finite
number
of channels,
means that a packet reaches its destination
after a limited
number
of hops.
Thus, the routing
algorithms
are livelock
free. They are also inherently
nonminimal,
only
but
when
can easily
they
lead
be made
toward
minimal
the
by modifying
destination.
them
Nonminimal
to use channels
routing,
more adaptive
and fault tolerant.
Finally,
the routing
algorithms
adaptive for the network
because the model prohibits
the minimum
turns
from
dependency
The turn
one direction
to another
graph.
model is applicable
necessary
to direct
to prevent
networks
that
however,
is
are highly
number
of
cycles in the channel
have directions
associ-
ated with their channels,
which is most existing
and feasible
networks.
The
model produces
partially
adaptive
routing
algorithms
for such common
topologies as mesh-connected,
k-ary
n-cube,
hexagonal,
octagonal,
and cubeconnected
cycle networks.
Adding
extra virtual
or physical
channels
to the
topologies
allows the model to produce
fully adaptive
routing
algorithms.
The
following
sections apply the model to a variety of meshes and k-ary n-cubes
without
extra
channels
and should
make
the model
much
clearer.
C. J. GLASS
880
3. Two-Dimensional
For
The
directions
meshes,
meshes
lengths
of
–.x,
L.
M.
NI
of
n-
Meshes
two-dimensional
dimensional
AND
the
+.x,
the
terminology
can be simplified.
dimensions,
–y,
and
k.
+?
used
Dimensions
and
become
k,,
in
become
west,
the
O and
m
definition
1 become
and
east, south,
x and
n. Finally,
y.
the
and north.
The four directions
permit
eight 90-degree
turns: left and right turns when
traveling
west, east, south, and north. The eight turns form two simple cycles,
as shown in Figure 2. The xy routing
algorithm
shown in Figure 3. Prohibiting
these four turns
four
remaining
turns
also
turns
do not
cannot
permit
vented by prohibiting
need to be prohibited,
routing.
Figure
a cycle.
four of the turns, as
deadlock
because the
Unfortunately,
any adaptiveness
in routing.
the four
Deadlock
any
The
two
turns
three
left
does
turns
not
prevent
allowed
deadlock,
in Figure
impossible
to route
Of the
cycles, only
into
most
16 different
12 prevent
packets
to the node
ways to prohibit
deadlock
in the northwest
one turn
however,
as
4(a) are equivalent
to the prohibited
right turn in Figure 4(b), and the three right
Figure 4(b) are equivalent
to the prohibited
left turn in Figure
cycles still exist and deadlock
is possible,
as shown in Figure
two turns as in Figure
4 also has another
serious drawback.
turns that are a combination
of the directions
north and west,
mesh.
remaining
can be pre-
fewer than four turns, however.
In fact, only two turns
one from each cycle, allowing
for some adaptiveness
in
Prohibiting
4 illustrates,
form
prohibits
prevents
turns allowed in
4(a). Thus, both
4(c). Prohibiting
It prohibits
all
which makes it
corner
of the
in each of the two simple
and only 3 are unique
if symmetries
are taken
5(a) shows
one way to
account.
3.1. THE
WEST-FIRST
ROUTING
Figure
ALGORITHM.
prohibit
two 90-degree
turns in a two-dimensional
Because the mesh has no wraparound
channels
step of the turn model is to incorporate
as many
without
reintroducing
can be so incorporated:
mesh and break every cycle.
and O-degree turns, the last
180-degree
turns as possible
cycles. Of the four possible
180-degree
turns. only two
the turn from west to east and either the turn from
south to north or the turn from north to south.
Because all turns to the west are prohibited,
a packet must start out in that
direction
in order to travel west. This requirement
suggests the wext-jirst routing
algorithm:
route
a packet
first
west,
if necessary,
and
then
adaptively
south,
east, and north.
Examples
of paths allowed
by the west-first
algorithm
are
shown in Figure 5(a). In this figure, black squares represent
nodes, and gray
bars indicate
blocked
channels
requiring
that packets wait (dashed lines) or
take alternative
paths (solid lines ). Note that both minimal
and nonminimal
paths are shown.
Proof that the west-first
routing
algorithm
is deadlock
free is based on the
work of Dally and Seitz [1987], who show that a routing
algorithm
is deadlock
free if the channels
in the direct
network
can be numbered
so that the
algorithm
routes
every packet
along
channels
with
strictly
decreasing
(or
increasing)
numbers.
Because the west-first
algorithm
routes packets first west
and then in the other directions,
the proof assigns lower numbers
to westward
channels the farther west they are, and assigns still lower numbers to eastward,
northward,
and southward
channels the farther
east they are.
THEOREM
1.
The west-first
routing
algorithm
is deadlock
free.
The Tum
Model
F
,
,
--l
Ill
ri
FIG. 4. Six turns that complete
cycles and allow deadlock.
Ill
the
~y
7
t.(a)
(b)
r, where
particular
illustrates
shows
in a m X n mesh
greater
of
3m – 2 and
every packet
four
possible
used to route
number
a. b in
6 shows
the
along
channels
with
strictly
decreasing
numbers,
it is
3.2. THE
cases.
Examination
a packet
out
shows
of a node
NORTI
way to prohibit
I-LAST
ROUTING
two 90-degree
turns
cast or the turn from east to west.
Because all turns when traveling
north
travel north when that is the last direction
in
each
case,
the
numbers
than
the
Tile tlorth-last
The proof
is similar
routing
to that
5(b)
shows
another
mesh and break
every
only two 180-degree
turns to be
and either the turn from west to
are prohibited,
a packet should only
it needs to travel. This requirement
suggests the north-lint
routing algorithm:
route
south,
and east, and then north.
Examples
north-last
algorithm
are shown in Figure 5(b).
2.
Figure
ALGORITHM.
in a two-dimensional
cycle. The remaining
90-degree
turns allow
incorporated:
the turn from south to north
THEOREM
that,
have lower
❑
channel.
PROOF.
a two-digit
n – 1. Figure
to show that, for each channel
into an arbitrary
node, the algorithm
route the packet out along a channel with a lower number.
Figure 8
the
channels
~ is the
numbers
to assign to the channels
leaving
node (x, y). Figure
7
algothe numbering
of a 4 x 4 mesh. To show that the west-first
routes
sufficient
can only
(c)
Assim-–– each channel
PROOF.
input
The four turns allowed by the w routing algorithm.
rl
~
....-1
1=
I_J
rithm
FIG. 3.
+--’
+--m
base
881
Fm. 2. The possible turns and simple cycles in a two-dimensional mesh.
L-l
,..
1
t---
4--,
,
,..~
Routing
r~
1--1
L-J
~
for Adaptile
algorithm
a packet first
of the paths
is deadlock
for Theorem
adaptively
west,
allowed
by the
ji-ee.
1. Rotate
Figures
6 and 7
counterclockwise
90 degrees, and reverse the directions
of the channels.
The
figures
now show that the north-last
algorithm
routes
eve~
packet
along
❑
channels with strictly increasing
numbers.
Figure 5(c) shows a third
3.3. THE NEGATIVE-FIRST
ROUTING
ALGORITHM.
way to prohibit
two 90-degree turns in a two-dimensional
mesh and break every
882
C. J. GLASS
,..
m
7
L-l
■
■
■
■
mmmm’um
❑ Eq!4m
■
■
AND
L. M.
NI
;.m
1
❑
.
(b)
r
4
--,
L.-j
r
..-.
Li
(c)
Fici. 5. The SIX turns allowed and ewimplcs of paths In an 8 X 8 mesh of’ the three addptlve
routing algorlthrns. (a) The west-first routing algorithm. (b) The north-least routing algorlthm. (c)
The nqytwe-first
routing algorithm.
cycle. The remaining
90-degree
turns allow only two 180-degree
turns to be
incorporated:
the turn from west to east and the turn from south to north.
Because
all turns from
a positive
direction
to a negative
direction
are
prohibited,
a packet must start out in a negative direction
in order to travel in
a negative
direction.
This requirement
suggests the negatiL’e-first
routing algorithm: Route a packet first adaptively
west and south, and then adaptively
east
The Turn Model
for AdaptiLe
883
Routing
2m-2-2x,n-2-y
t
2m-2+x,0
%Y
FIG. 6. Numbermg
of the channels leaving
node (.Y, y) for the west-first routing algorithm.
2m-3-2x,0
each
v
2m-2-2x,y-l
6,2
8,0
9,0
5,0
3,0
1,0
4,2
6,0
6,1
4,0
2,2
2,0
7,0
8,0
9,0
5,0
3,0
1,0
6,1
6,0
FIG. 7.
7,0
4,1
4,1
2,1
2,1
7,0
8,0
9,0
5,0
3,0
1,0
4,0
6,2
4,2
2,0
2,2
7,0
8,0
9,0
5,0
3,0
1,0
Numbering
of a 4 x 4 mesh for the west-first
0,2
0,0
0,1
0,1
0,0
0,2
routing
algorithm
and north. Yantchev
and Jesshope [1989] propose a similar algorithm,
but only
for minimal
routing.
The negative-first
algorithm
can be either
minimal
or
nonminimal;
the nonminimal
version,
which
allows
180-degree
turns, being
more
adaptive
negative-first
THEOREM
and
fault
algorithm
3.
tolerant.
Examples
are shown
The negatiue-jii-st
in Figure
routing
The proof
is a special
in Theorem
6.
❑
algorithm
case of
PROOF.
meshes
of
the
paths
allowed
by
the
5(c).
the
is deadlock
one
given
free.
for
n-dimensional
How adaptive are these partially
adaptive
3.4. DEGREES OF ADAPTIVENESS.
routing
algorithms?
Let S.lgO, ~th~ be the number
of shortest
paths an algorithm allows from source node (s,, Sy) to destination
node ( dx, dY ). Also, let f
be
a fully
algorithms,
adaptive
Ax
algorithm,
p
= Idx – SJ and Ay
(Ax+
S,=
f
one
of
the
three
Ay)!
Ax!Ay!
(Ax
s west-first
be
= Id} – S}l. Then,
+ Ay)!
if
=
[
b,
Ax!Ay!
dz > Sx
‘
otherwise
partially
adaptive
884
C. J. GLASS
2m-2-2x,n-2-y
AND
L. M.
NI
2m-2-2x,y
2m-’-20+03-2x2
&-3-2x’0
+
t
FIG, 8. For e~ch input channel, the
outrxtt channels Wowed bv the westf]rs;
routing
algorithm
have lower
numbers. (a) Input from west (b) Input from north. (c) Input from ext. (d)
Input from south.
2m-2-2x,y-l
2m-2-2x,y-l
(b)
(a)
2m-2-2x,n-2-y
2m-2-2x,n-2-y
2m-l+x,O
2m-2+x,0
2m-3-2x,0
X,y
%Y
2m-3-2x,0
+
2m-2-2x,n-l-y
+
2m-2-2x,y-l
(d)
(c)
1
(A.r
S,,<,,,,,.,,,,, =
For
the
minimal
algorithm
the
degree
to
otherwise
\l,
otherwise
algorithms,
the
which
d,, < s,
‘
\l,
three
ever, Sr = 1 for at least half
of
if
Ax!Ay!
routing
is. For
+ Ay)!
the
larger
partially
S.l~O,.{,l,,H is, the
adaptive
routing
of the source-destination
a minimal
routing
pairs.
algorithm
more
adaptive
algorithms,
Another
is
howmeasure
adaptive
is
the
SP/Sj
ranges from O to 1, but averaged across all sourcel-atlo su/g(l?[t/t,}t /Sf.
destination
pairs, S,,/Sf > 1/2, which indicates
a significant
degree of adaptiveness. Finally,
note that SP = 1 for a source-destination
pair does not imply
that algorithm
p is nonadaptive
for that pair. The partially
adaptive algorithms
all permit
nonminimal
nonminimal
routing.
In the case of the negative-first
routing
means that routing
can be adaptive even when
~Y > SY or when d,, > s, and dY < SY. For an illustration
tweness, see the bottom
path in Figure 5(c).
4. n-Dimensional
of this
algorithm,
d, s s, and
added
adap-
Meshes
In an n-dimensional
mesh, each node has up to 2n input channels
and Zn
output
channels.
For each of the 2n directions,
there
are 2n – 2 possible
90-degree
90-degree
turns to a different
direction,
making
4n(n – 1) total
turns. These turns form two simple cycles in each of the n(lz – 1)/2 planes,
making
n( n – 1) total cycles of four turns.
THEOREM
4.
The minimlun
number of 90-degree turns that must be prohibited
to prel)ent deadlock
in an n-dimensional
mesil is n( n – 1), or a quarter of the
possible tur?ls.
The Turn Model
for Adaptitle
The
PROOF.
4n(n
Routing
– 1) turns
885
in
an
n-dimensional
mesh
form
n( n – 1)
cycles of four turns each. At least one of the four turns in each cycle must
prohibited
in order to prevent these cycles. Therefore,
prohibiting
a quarter
the turns is necessary to prevent deadlock.
❑
be
of
THEOREM
5.
The minimum
and maximum
number of 180-degree turns that
must be prohibited
to prevent deadlock in an n-dimensional
mesh is n, or one half
of the possible
turns.
The
PROOF.
each 180-degree
180-degree
turn,
turns
in an n-dimensional
its reverse
is always
mesh come
a different
180-degree
in pairs:
turn.
for
Both
of
the turns in a pair cannot be allowed,
because they would create a cycle by
themselves.
Therefore,
at least half of the 180-degree
turns must be prohibited.
If a 180-degree
turn creates a cycle when allowed,
some combination
of the
other allowed turns must form the equivalent
of the reverse 180-degree
turn. If
both of the turns in a pair create cycles when allowed, the other allowed turns
must form the equivalents
of both of the turns, which implies
that the other
allowed turns can also form a cycle. This contradiction
implies that at least one
of the turns in
180-degree
turns
each
must
pair can be
be prohibited.
allowed.
❑
Therefore,
at most
Of the many routing
algorithms
formed
by prohibiting
per cycle and one 180-degree
turn per dimension,
three
their simplicity.
They are analogs of the west-first,
routing
algorithms
for two-dimensional
meshes.
routing
algorithm
packet
first
is the
adaptively
in
all-but-one-negatil~e-jirst
the
negative
of
one 90-degree
are noteworthy
the
turn
for
north-last,
and negative-first
The analog of the west-first
routing
directions
half
of
all
algorithm:
but
one
route
a
dimension
(n – 1) and then adaptively
in the other directions.
The analog of the north-last
routing
algorithm
is the all-but-one-positil!e-last
routing algorithm:
route a packet
first adaptively
in the negative
directions
and the positive
direction
of one
dimension
(0) and then
negative-first
routing
adaptively
algorithm
in the
other
is also called
directions.
the negative-first
The
analog
routing
of the
algorithm:
route a packet first adaptively
in the negative
the positive directions.
All of these algorithms
directions
and then adaptively
in
are deadlock
free, but the only
proof
algorithm.
presented
THEOREM
deadlock
the
channel
leaving
is for the negative-first
The negatil)e-first
routing
algorithm
for
n-dimensional
meshes is
$-ee.
PROOF.
be
6.
here
sum
Let K be the sum of the k, for an n-dimensional
mesh, and let X
each
of the x, for any node (xo, X1, ..., x,, _z, Xn _ !). Number
leaving a node in a positive
a node in a negative
direction
direction
K – n + X and each channel
K – n – X. Then, if a packet enters a
node when traveling
in a negative
direction,
it enters along a channel
numthe
bered K – H – X – 1,which is less than both K – n – X and K – n +X,
numbers
of the channels
leaving the node in the negative
and positive
directions. If a packet enters a node when traveling
in a positive direction,
it enters
along a channel numbered
K – n + X – 1, which is less than K – n + X, the
number
of the channels leaving the node in the positive directions.
Therefore,
the negative-first
algorithm
routes every packet along channels
with strictly
increasing
numbers
and is deadlock
free.
❑
~. J. GLASS AND
886
The
negative-first
one turn
algorithm
per cycle,
Therefore,
the turn
prohibiting
is deadlock
from
some
free
a positive
quarter
of
the
deadlock
in an n-dimensional
mesh. Combining
4, gives the following
theorem,
which supports
produces
highly
THEOREM
dimensional
7.
adaptive
routing
Prohibiting
4.1. DEGREES
turns
M.
of prohibiting
to a negative
is sufficient
this observation
the claim that
NI
just
direction.
to
prevent
with Theorem
the turn model
algorithms.
sonte
mesh is necessary aid
as a result
direction
L.
quarter
sufficient
OF ADAPTIVENESS.
of
the
4n( n – 1)
to pre[ent
As the
number
turns
in
an
n-
deadlock.
of dimensions
increases,
the minimal,
partially
adaptive
algorithms
are more likely to be able to route
message packets adaptively.
SP = 1 less often, especially when the k, are large.
But
that
averaged
across all source-destination
pairs, S,,/Sf
the degree of adaptiveness
relative
to fully adaptive
as iZ increases. As in the special case of two-dimensional
adaptiveness
can be increased
by nonminimal
routing.
5. k-sty
> 1/2”1. indicating
algorithms
decreases
meshes,
the degree
of
n-cubes
The adaptive
routing
wraparound
channels
algorithms
for meshes can be extended
of k-ary ~z-cubes. One way to extend them
to use the
is to allow a
packet to be routed along a wraparound
channel only on its first hop. To prove
that the routing
algorithm
is still deadlock
free, number
the mesh channels as
before
and assign the wraparound
channels
a number
that is either
greater
than
or less than
those
of the mesh
are routed along channels
in strictly
The negative-first
routing
algorithm
another
way: classify
each wraparound
channels,
depending
on whether
decreasing
or increasing
order.
can also be extended to k-ary
channel
as negative
packets
n-cubes
or positive
in
accord-
ing to how it routes packets relative
to the mesh channels
and then apply the
negative-first
algorithm.
For example,
a node at the east edge of the mesh
channels
has two channels
to the west: a mesh channel
to the node immediately to its west and a wraparound
channel
to a node at the west edge of the
mesh channels.
Either
channel
to the west can be used during
the negative
phase of routing
a packet. The negative
and positive phases in a 4-ary 2-cube
are illustrated
in Figure 9. The proof of deadlock
freedom
is, again, a simple
modification
of the proof for meshes,
All of the preceding
routing
algorithms
are strictly
nonminimal.
For k-ary
/l-cubes with k > 4, it is impossible
to design deadlock-free
routing
algorithms
that are minimal,
without
adding extra channels
[Ni and McKinley
1993]. To
design a routing
algorithm
that is deadlock
free, minimal,
and fully adaptive,
Linder
and Harden
[1991] add virtual
channels
to a k-ary n-cube. They add,
however,
enough
virtual
channels
to partition
the k-ary
n-cube
into 2“ -1
virtual
networks
with rz + 1 levels per virtual
network
and (n + 1)k” channels
per level. That is, an average of ((~z + l)z/n)2’z
z virtual channels per physical
channel.
Dividing
the physical channels in a k-ary /z-cube into fewer virtual
channels,
however,
is sufficient
for designing
routing
algorithms
that are deadlock
free,
minimal,
and partially
adaptive.
A suitable
design method
is inspired
by the
one Dally and Seitz [1987] use for unidirectional
k-ary
divide each physical channel
into two virtual
channels
n-cubes. Dally and Seitz
and connect the virtual
The Tum
Model
for Adaptive
Routing
887
~[”f~:
]~~]
FIG. 9. The channels used during the two
phases of the negative-first
routing
algorithm for 4-ary 2-cubes. (a) Negative phase.
(b)witiv.phs
LLA.
.*
(a) negative
(b) positive
phase
channels
4
in each k-ary
l-cube
phase
embedded
in the k-ary
n-cube
into
a spiral
that
circles
the k-ary
n-cube
nearly
twice.
Unfolding
the spirals
converts
the
unidirectional
k-ary n-cube
into a unidirectional,
n-dimensional
mesh with
2 k – 1 nodes along each dimension.
Each node of the k-ary n-cube appears in
the mesh up to 2” times. Dimension-order
routing
in the
deadlock.
Applying
the same unfolding
scheme to a bidirectional
yields
a bidirectional,
dimension.
channel.
The
Applying
are deadlock
n-dimensional
unfolding
again
the turn
free,
model
minimal,
mesh
requires
2k – 1 nodes
with
just
two virtual
to the mesh produces
and partially
mesh prevents
k-ary n-cube
adaptive.
routing
The
use in the k-ary n-cube are the ones that correspond
nearest copy of its destination
node in the mesh.
along
channels
each
per physical
algorithms
routing
that
algorithms
to routing
a packet
to
to the
6. Hypercubes
Hypercubes
n-cubes.
special
Proofs
are
a special
Consequently,
case
the
cases of the algorithms
for
that the routing
algorithms
proofs
for the more
The
special
expression,
general
case of the
the
p-cube
of
both
adaptive
n-dimensional
routing
algorithms
n-dimensional
are deadlock
meshes
for
and
k-ary
hypercubes
are
meshes and k-ary
free are corollaries
n-cubes.
of the
has a particularly
compact
cases.
negative-first
routing
algorithm
algorithm.
Let
S be the
binary
address
of the
source node for a packet, C be the binary address of the node the header
currently
occupy, and D be the bina~
address of the destination
node.
flits
The
p-cube routing
algorithm
has two phases. In the case of minimal
routing,
the
first phase routes the packet along any dimension,
i, for which
c1 = 1 and
d, = O. When
along
there
any dimension,
is no such dimension,
i, for
which
the second
c, = O and
phase
d, = 1. These
routes
the packet
steps
are easily
computed
using bitwise logic operations,
as shown in Figure 10. If nonminimal
routing
is desired, because of its increased
adaptiveness
and fault tolerance,
the
first
phase
can
also
route
the
packet
along
any dimension,
c, = 1 and d, = 1. Then, the steps can be computed
both of these algorithms,
the only input transmitted
is a unique constant for each router, and p depends
header
similar
flits occupy in the router. Konstantinidou
to p-cube, but only for minimal
routing.
i, for
which
as shown in Figure 11. In
in the header flits is D. C
on which input buffer the
[1990]
proposes
an algorithm
Let IX I represent
the number
of 1‘s in the
6.1. DEGREES OF ADAPTIVENESS.
binary number
X. The nu_mber of shorte~t paths from S to D, SP.<,,h,,, equals
routing
h, !}z, )!, where h, = I(S A D)l and hO = I(S A D)/. For a fully adaptive
888
C. J. GLASS
AND
L.
M.
NI
Algorithm:
Minimal
p-cube routing algorithm.
Input:
Current address, C, and destination address, D.
Procedure:
I
1. H C = D,
I
2. R=
I
3. If R=
1
the packet
route
to
the
processor
local
and exit.
CAD.
4. Route
O,then
R=
the packet
along any available
FIG. 10.
Algorithm:
CAD
The mnnmal
Nonminimal
Input:
Current
positil,e
direction.
p-cube routing
p-cube
address.
channel,
routing
C; destination
1, for which
algorithm
r, = 1,
for hypercubes,
algorithm.
address,
D; and whether
the last hop was in a
p,
Procedure:
1. If C = D, Ioute
tile packet
2. l[p=l,
R=
theu
3. fZJseif CA
D=
to the lncal processor
and exit.
CAD,
O,then
R=CV
(CAD).
.1. Llse R = C,
5. Route
the packet
FIG, 11.
along any availabl[> rhannel
The nonminimal
p-cube routing
t, for which
algor]thm
r, = 1.
for hypercubes.
f, S~= h!, where
algorithm,
between
S and
D.
s ~_cuA,,/Sf, which
Overall,
equals
the p-cube
h = h] + ho = I(S @ D)l, the
other
measure
of the degree
The
l/[fl,
routing
Hamming
distance
of adaptiveness
is
].
algorithm
offers
a choice
of many
especially when compared
to the nonadaptive
e-cube routing
I illustrates
this adaptiveness,
for a binary
10-cube
in
1011010100
sends
a packet
to the
node
shortest
paths,
algorithm.
Table
which
the node
0010111001.
In this example,
h = 6,
shows only one of the 36 possible
ho = 3, /11 = 3, and SP.C,,~,, = 36. The table
shortest
shows
paths.
the
For
number
each
of choices
node
transmitting
of output
channel
the
packet,
allowed
however,
by the
routing
algorithm.
The number
of choices in parentheses
tional choices available
with nonminimal
routing.
minimal
indicates
the
table
p-cube
the
addi-
7. The Simulator
The most versatile
way to study
and compare
routing
algorithms
is simulation.
A routing
algorithm
cannot
be simulated
in isolation,
however.
It must be
simulated
for a particular
direct network,
input selection
policy, output
selection policy, and pattern
of message traffic.
Each of these factors affects the
performance
of the routing
algorithm.
This section
describes
the kinds
of direct
The Turn Model
TABLE
for AdaptiL’e
I.
Routing
EXAMPLE OF ,4 PAW ALLOWED BY ‘ItI~ p-CuBE Rout
0011010000
1(+2)
3
6
5
phase 1
0010010000
0010110000
0010110001
2
1
0
3
phase ‘2
I 001011[001 I
networks.
as well
channel
selection
experiments
and their
pattern
is measured
in terms
latency
and average
sustainable
that
elapses
for transmission
The
of two
between
of a network
main
network
quantities:
is considered
source
nodes
the source
node
node
Each
tional
physical
a pair
pair
sustainable
of neighboring
Each
of unidirectional
width,
20 flits\ps.
single
flit.
The
ously transmit
input
operate
all of the flits
7.3. CHANNEL
routing,
multiple
SELECTION
input
the same available
channel
for USC.
per unit
of messages
by one pair
to its local
channels
in a router
but
the portion
of
queued
at
have
is
per
of unidirecprocessor
the same
by
band-
has a buffer
the
synchronize
to simultane-
of a packet
size of a
in the network.
During
both nonadaptive
and adaptive
in a router may contain header flits waiting for
channel.
Which
channel
to use is decided
Selection
All
asynchronously,
next is decided by the input
flit may have a choice of
Input
is connected
channels.
channel
header
7.3.1.
delivered
available
POLICIES.
channels
output
available
100 in the simulations,
is also connected
in a worm,
of u message
theoretical
throughput,
transmitted
flits. The
the number
limit,
communication
latency
network
studied in the simulations
has one processor
and one router
routers
router
of routing
algopolicy, and traffic
the message
some small
direct
that
physical
Each
routers
the actual
the message
of messages
when
does not exceed
channels.
in this paper.
describes
The
making
having
is the number
NETWORKS.
Each
7.2. DIRECT
treated
as part of a multicomputer
node.
studied
average
throughput.
It is measured
as a percent
of the maximum
would be achieved
if every channel
continuously
throughput
their
patterns
The next section
results.
and the destination
throughput
time.
which
and traffic
I
ion
The performance
of each combination
input selection
policy, output selection
7.1. PERFORMANCE.
rithm, direct network,
is the time
Dhase 2
I dmtiJlat
is measured.
AI.GORJTWM
ING
phase 2
I
policies,
as how performance
simulation
889
by the
Policies.
input
selection
multiple
output
Input
channel
gets to use the output
policy. During
adaptive routing,
a
output
channels.
Which
output
selection
selection
policy.
policies
can be based
on a
variety of packet and router characteristics.
Some of these are the length of the
packet, the location
of the source node, the location
of the destination
node,
the location
of the router, the distance travelled
already. the distance to travel
still, the time the message was sent, the time the packet arrived in the router,
the number
of output channels available
to the packet, and the input channels
chosen previously
in the router.
In this paper,
seven policies
are studied:
random,
no-turn,
local first-come-first-served,
global
first-come-first-served,
890
C.
distance -travelled,
least-adaptive,
one
channels
of the
channel
input
directly
of the input
waiting
across
channels
input
randomly.
the router
waiting.
channels
The
local
and distance-least.
randomly.
from
The
policy
GLASS
policy
channel,
the no-turn
policy
policies
three
the
input
L.
polic}
chooses
next
chooses
AND
The random
no-turn
the output
Otherwise,
randomly.
FCFS
The
J.
input
it is one
chooses
channel
NI
chooses
the
provided
also
M.
one of the
decide
containing
ties
the
packet that arrived
in the router
first. The global FCFS policy
chooses the
input channel
containing
the packet that was sent first. The distaizce-trtll’elled
policy chooses the input channel
containing
farthest.
The least-adaptiL1e policy
chooses
the packet that has travelled
the input
channel
containing
the
the
packet that has the fewest choices of output
channels.
It decides ties by using
the distance-travelled
policy.
Using a policy that is more complex
than the
random
policy
of output
is necessary
channels
upon
because,
which
in low-dimensional
packets
often be waiting for the same number
The distance-least
policy
is like the
decides
Each
ties by using
of these
can wait
of output channels,
distance-travelled
the least-adaptive
input
the random, no-turn,
routing
information.
selection
topologies,
is small
the number
and many
packets
will
making ties common.
policy,
except that it
policy.
policies
has its advantages.
An
advantage
of
and local FCFS policies is that they require
no additional
The global FCFS requires
that a global time stamp be
passed along in the header flits of packets. Generating
and passing the global
time stamps necessitates
establishing
a global clock and adding many bits to
the header flits. The distance-travelled,
least-adaptive,
and distance-least
policies require
of packets.
attempt
that the distance travelled
so far be passed along in the header flits
The distance takes up a few extra bits. All but the random
policy
to reduce
local
and global
their
latencies
high latencies
average
FCFS
from
communication
policies
increasing
included
is that
much
in the average.
latencies.
giving
more,
The
reasoning
priority
to older
thereby
decreasing
Because
the global
behind
packets
FCFS
the number
policy
the
prevents
of
is based
on more global information,
it should perform
better
than local FCFS. The
FCFS policies
have an added benefit:
they are fair and, therefore,
free from
starvation.
The distance-travelled,
least-adaptive,
distance-least,
and no-turn
policies,
on the other hand, are not
reasoning
behind
the distance-travelled
packets that have reserved
the most
possible,
network,
fair
and are subject to starvation.
The
policy is that giving priority
to the
channels
releases as many channels
as
as soon as possible,
and limits the number
of packets entering
the
thereby
reducing
contention
and latencies.
The least-adaptive
policy
tries to reduce
contention
by giving
making it less likely that the remaining
priority
packets
to the least adaptive
will have to wait, The
packet,
no-turn
policy tries to reduce contention
by giving priority
to the packet that does not
have to turn. Why such a priority
reduces
contention
will be discussed
in
connection
with output selection
policies.
7.3.2. Output Selection Policies.
Output
selection
policies
can be based on
many of the same packet and router characteristics
as input selection
policies.
Some of the more interesting
characteristics
are: which channels are available,
which lead toward
the destination
and which away, the distance
the packet
must
still
this paper,
the output
travel
in each
direction,
and the
direction
of the
input
channel.
In
three policies
are studied:
zigzag, xy, and no-turn.
When none of
channels
allowed by the routing
algorithm
is available,
the policies
The TurnModelfo
all
choose
channels
rAdaptile
the
first
one
is available,
Routing
891
to become
the
policies
available.
choose
When
the
only
available
one
of the
channel.
output
When
more
than one of the output
channels
are available,
the zigzag policy chooses the
output channel
in the direction
the packet must still travel the farthest.
More
precisely,
if the
current
node
is C and the
destination
node
is D, the zigzag
policy chooses a channel
in the dimension,
i, with the largest value Id, – c, 1.
The
The ay policy
chooses the output
channel
for the lowest
dimension.
no-turn policy chooses the output
channel
in the same direction
as the input
channel.
Each of these output
selection
in a different
reasoning
follow
way. The
a diagonal,
staircase
policies
tries to improve
behind
path
to
network
the zigzag policy
the
destination
performance
is that
node
preferring
is most
to
likely
to
preserve adaptiveness
along the way, thereby
reducing
latency. The disadvantage of this policy is that it tends to concentrate
traffic
near the center of a
mesh,
although
adaptiveness
mitigates
the concentration
attempts
to overcome
this disadvantage
the xy routing
algorithm,
which spreads
channels
of a mesh.
produce
channel
plus signs (“ +”)
Simulations
utilizations
represent
some.
The
by preferring
the channels
uniform
traffic very evenly
of the .V routing
algorithm
in tenths.
The main
thing
channels
across
for uniform
the
traffic
like those shown in Figure
12. In the figure, the
the nodes of a 10 X 10 mesh, and the numbers
surrounding
each plus sign indicate
the utilizations
of the output
the node. Each number
is the fraction
of the time that the channel
the eastward
xy policy
chosen by
to note
in Figure
in a column
12 is that
are very nearly
channels
of
is reserved,
the utilizations
equal.
The
for all of
same is true
of
all of the westward
channels
in a column,
all of the northward
channels
in a
row, and all of the southward
channels
in a row. The packets that cross a
particular
column
or row of channels
could not
across the channels than the xy routing
algorithm
be spread much more evenly
spreads them. The reasoning
behind
13. Each
onto
the no-turn
policy
a row or column,
along
that
row
is illustrated
it blocks
or column.
in Figure
other
In Figure
packets
traveling
13, the zigzag
path
time
a packet
turns
in the same direction
(solid
arrows
on the
upper left) blocks many northbound
and eastbound
packets (dashed arrows),
while the xy path (solid arrows on the lower right) blocks fewer. The impact is
greatest
heaviest.
when turning
onto the middle
of a row or column,
where traffic
is
Turns are not all bad, however.
Their benefit
is that they increase
adaptiveness.
Therefore,
what should be avoided
is what the no-turn
policy tries to do.
are unnecessary
turns,
which
PATTERNS.
In the simulated
multicomputers,
the processors
7.4. TRAFFIC
generate
messages at time intervals
chosen from a negative exponential
distribution. Each message has an equal probability
of being one packet of 10 or 200
flits. Messages
queued at the
are immediately
What
probably
that are blocked
source processor.
from immediately
Messages arriving
entering
the network
are
at a destination
processor
consumed.
affects
the performance
of the routing
algorithms
more
than
any of the preceding
factors
is the pattern
of message traffic,
that is. the
destinations
to which messages are sent. In this paper, six traffic patterns
are
studied:
uniform,
matrix-transpose,
matrix-rotate,
merge-south,
reverse-flip,
and address-rotate.
Some apply only to two-dimensional
meshes, and some
apply only to hypercubes.
The uniform
pattern
sends each message to any of
C.
GIJWS
AND
1+
L.
+2
1
1+3
2+4
3+1
4+4
4+4
4+3
1+2
3+1
1
1
1
1
1
1
1
1
1
1
0
1+3
~
1
2+3
1
3+4
1
1
1
0
1
1
4+3
~
3+2
2
1+1
~
4+3
2
2
2
3+1
,2
1
1
1
‘j
‘J
1
1
1
1+3
2+4
3+1
-1+4
:3+1
:)
2
3
2
4+2
~
2
+2
~
2
+2
3
2
+’2
1+3
2
2 +-t
4+1
3
3
4+
4+4
32
~
3+4
3
3
~
2
4+-L
:3
2
4+3
-f+z
2
.3
4+2
3
3+1
3
3
3
3
3
3
3
~+~
2
3+1
:3
:3
3
3
:+
3
3
3
3
32
3+4
1+3
2+4
3+4
-1+4
4+4
4+:3
3
3
2
2
2
3
3
3
;3
3
:3
3
33
1+3
2
:3+J
2
3+1
2
2
:3
-~+-t
2
~
4+3
2
3
2+4
2
~
1+3
2+4
3+4
4+4
4+4
2
,2
1
1
1
1
1+3
~
2
2
22
:3+4
3+2
22
2
2
,2
:3+1
1
1
2
1+2
2
3+1
1
1
1
1
1
-!+4
0
4+4
1
1
4+3
1
1
1
1
1
I
1
1
1
1
1+3
2+3
3+4
4+4
1+3
4+3
4+2
:3+1
FIG. 13.
1+
:3+’2
1
in a 10 ~ 10meshfor
~+
,2
1+3
utilization
3
3
3
1
Channel
2+
:~+1
2
2+4
+2
3
y
3
3
+1
~+
4+3
1+3
3
2+1
3
2
:3+1
3
3
2
+2
~
:3
:3
2
4+4
:3
3
:3
~
3
3
+’2
+2
2
1
2+
4+4
3
N[
g
3
3
M.
1+
3
~
:3
+2
FIG. 12.
J.
2+
1
~
2+
1+
uniform
traffic
and the .V routingalgorlthm.
❑
mmm
❑
❑
--f-p
Turns can increase contention.
:J;P
:
:~F~
the other
processors
with equal probability.
In a k x k mesh, the nzatrixtranspose pattern
sends each message from the processor at (x, y) to the one at
(k – 1 – y, k – 1 – x). In a binary
rived by
mesh are
dictated
binary
(x,,,
k
x1>x2,
X
8-cube, a matrix-transpose
pattern
is demapping
a 16 X 16 mesh to the hypercube
so that neighbors
in the
neighbors
in the hypercube.
Messages are then sent to the processors
by the matrix
transpose
in the mesh. The resulting
pattern
in the
8-cube
sends
each
message
from
the
processor
at
-~3>x4,
k mesh,
the
x5>x6!x7
)
matrix-rotate
to
the
pattern
one
at
(IA,
X5, XK, X7,
sends each message
ZO, XI,.YZ,
from
X3).
ln
the processor
a
The Turn Model
for Adaptive
Routing
893
at (x, y) to the one at (k – 1 – y, x). In a k x k mesh, the merge-south
reduction
pattern
sends each message
from
the processor
or tree
at (x, y) to the one at
(x/2
+ (y mod 2) * (k\2),
y/2). The merge-south
pattern
might occur if all the
children
in a binary tree had to send messages southward
to their parents. In a
binary 8-cube, the reL1erse-jZip pattern
sends each message from the processor
at (.to, xl, .Y2, x3, ~4, x5, ~6>~7 ) to the one at (2T, .i~, ~j, Zl, ~j, 2Z, il, .iO). In a
binary
8-cube,
message from
the
the
address-rotate
processor
at
pattern
[Konstantinidou
1990]
(xO, xl, xj, x3, X4, X5, xfj, X7 ) to
sends
the
each
one
at
(x,> x,, x,, x,, x“, -x,, x,, X3).
8. Sinmlation
Experiments
Numerous
simulations
algorithms
produced
have
been
by the turn
conducted
model
to
study
the
and the nonadaptive
adaptive
routing
produced
by dimension
ordering
for different
direct networks,
and output selection
policies,
and different
traffic patterns.
8.1. INPUT
no-turn,
SELECTION
local
FCFS,
The first
POLICIES.
global
FCFS,
cies. Each policy
is simulated
in
north-last,
and negative-first
routing
tion
policy
is zigzag,
the routing
simulations
input
is minimal,
and the message
throughputs,
distance-travelled
policy
global
FCFS policy produces
the second
no-turn,
and local FCFS policies
produce
produces
suggests
requires
the
and local
information.
poli-
the
is uniform.
(Figure
14). At
same. At high
lowest
latencies,
the
lowest latencies,
and the random,
the highest latencies.
Compared
to
and local FCFS policies,
the distance-travelled
sustainable
throughput
by 15%.
The most surprising
of these results is that
policies
do not perform
much better
than the
no-turn,
routing
the random,
selection
traffic
each routing
algorithm
are fairly consistent
the input
selection
policies
perform
the
the random,
no-turn,
increases the maximum
input
a 10 x 10 mesh for the W, west-first,
algorithms.
In all cases, the output
selec-
The results for
low throughputs,
the
different
compare
and distance-travelled
routing
algorithms
policy
the local FCFS and no-turn
random
policy. The random,
FCFS policies
are the three that do not require
additional
That they perform
the worst may not be a coincidence
and
that improving
performance
by changing
input
selection
policies
changing to a policy that is based on additional
information
carried in
header
flits.
outperforms
reducing
The
the FCFS
contention
other
main
policies
is more
result
is that
the
at high
throughputs.
important
than
distance-travelled
This
result
maintaining
policy
suggests
fairness.
In
that
the
simulations
of the distance-travelled
policy,
starvation,
though
theoretically
possible, is not a problem.
The next simulations
compare
the distance-travelled,
least-adaptive,
and
distance-least
input selection
policies
[Glass 1992]. These three policies
differ
only in whether
they emphasize
distance
or adaptiveness
and in how they
decide ties. The results for each routing
algorithm
are very consistent.
At all
throughputs,
These
the three
input
suggest
that
results
distance-travelled
marginal
value,
selection
policies
augmenting
policy,
to give priority
at least for two-dimensional
have ~early
the same latencies.
a
successful
policy,
to
the least adaptive
packet
is of
meshes. For higher-dimensional
such
as
the
meshes, the adaptiveness
of packets has a wider range of values, and the added
discriminating
power of the least-adaptive
and distance-least
policies may make
them more useful.
894
C.
40 r’~-
J.
GLASS
AND
L.
M.
NI
~o ~
o~
;~
o
.5
Pcr~ent
10
of max
o
1.5
25
30
20
throretlcal
throughput
.5
Percent
10
of max
(b)
(a)
10
I
35 .Jo ~~
I
I
40
I
I
l~ndom
*
35 -
no-turr
+
30 -
Fms
+
’20 – rlIst -travelled
~
local
latency
10 -
5~
5~
P[r(
ellt
20
15
illwl’etl(.al
25
+
Q-
-travelled
&
o~
o
(J~
1()
of Inclx
[
1!5 -
10 -
5
no-turn
local FCFS
20 -rflst
(pie,)
I
I
random *
Average p~
1.5 -
()
~o
2.5
:30
1.5
theore~lcal
throughput
:30
10
,5
Percent
throughput
of max
(c)
15
20
theoretical
2.5
30
throughput
(d)
FIG. 14,
Comparison of input selection policies for four different routing algorithms in a 10 ~ 10
mesh. (a) q routing algorithm. (b) West-first routing algor]thm. (c) North-last routing algorhhm.
(d) Ncgatwe-first
routing algorithm.
s~
OuTpuT
SELECT-ON poLIclEs.
The next simulations
compare
the zigzag,
W, and no-turn
output selection
policies. Each policy is simulated
in a 10 X 10
mesh for the west-first,
north-last,
and negative-first
routing
algorithms.
In all
cases, the input selection
policy is distance-travelled,
the routing
is minimal,
and the message traffic is uniform.
Figure
15 shows the results. For each of the routing
algorithms,
the no-turn
policy has the lowest latencies
and the zigzag policy has the highest latencies.
Depending
on the routing
algorithm.
the .xy policy is comparable
to either the
no-turn
or
increases
zigzag
policy.
the maximum
Relative
sustainable
to
the
zigzag
throughput
policy,
the
by an average
The relative performance
of the three output
due to two factors:
spreading
message traffic
no-turn
policy
of 5%.
selection
policies appears to be
evenly and avoiding
turns. The
zigzag policy tends to concentrate
uniform
traffic
in the center of the mesh,
increasing
contention,
and latencies.
Compensating
for this tendency somewhat
is the adaptiveness
of the routing
algorithms.
The xy policy attempts
to spread
the uniform
traffic
more evenly. When
contention
is high, however,
the xy
policy,
like the other
policies,
chooses the first available
channel,
ignoring
which
channel
is preferred.
This
behavior
tends
to route
more
packets
the center of the mesh. The better performance
of the .xy policy relative
zigzag policy, in general,
suggests that spreading
traffic
evenly is more
tant than
maintaining
maximum
adaptiveness.
The superior
performance
toward
to the
imporof the
The Turn Model
for Adaptiue
Routing
895
o~
o
.5
Percent
10
of max
15
jo
theoretical
30
25
tl]roughput
(a)
40
I
Average
latency
(/lsec)
I
I
I
t
zigzag *
35 -
,zg *
30 25 20 -
no-turn
@r
FIG. 15. Comparison of output selection
policies for three different
routing algorithms in a 10 x 10 mesh. (a) West-first
routing algorithm.
(b) North-last
routing
algorithm.
(c) Negative-first
routing algorithm.
1!5 -
10 .5~
o~
o
Percent
5
10
of Inzm
1<5
20
25
30
theorctlca]
throughp(lt
(b)
()~
o
5
10
Pcr( ~nt of nlax
1.5
2()
25
30
thmr~tlcd
throughput
(c)
no-turn
policy suggests that turns
sary turns is even more important
are indeed costly and that avoiding
than spreading
traffic
evenly.
unneces-
PATTERNS.
The next simulations
compare the adaptive routing
8.3. TRAFFIC
algorithms
produced
by the turn model with the nonadaptive
routing
algorithms
produced
by dimension
ordering
for different
direct
networks
and
different
patterns
of message traffic.
mesh and a binary 8-cube (Simulation
for a larger 2D 16 x 16 mesh have
The two networks
studied are a 10 x 10
results for a 3D 10 x 10 x 10 mesh and
a similar
behavior
and can be found
in
Glass [1992]). The six traffic
patterns
studied
are uniform,
matrix-transpose,
matrix-rotate,
merge-south,
reverse-flip,
and address-rotate.
For the 10 X 10
mesh, the input selection
policy is distance-travelled
and the output
selection
policy is no-turn.
For the binary
and the output selection
policy
8-cube, the input selection
is V. In all cases, routing
policy is local FCFS
is minimal.
896
C.
J.
GLASS
AND
L.
M.
NI
40
35
30
Average 2.5
latency 20
(~sec)
north-last
negative-first
10 .5~
I
n
n
1
{
5
10
1!5
F’wrcent of maxlmurn
I
%
&
ea t-west
1,5
10
5
o~
*
o
Xl
25
30
35
40
thecretlca]
throughput
Percent
.5
10
1.5
of mammum
(b)
(a)
40
I
35
I
I
I
I
I
40
I
I
I
I
I
I
I
I
35 -
–
30 -
30 -
Average 2.5 -
25
20 -
Zy
1,5 -
west-first
north-last
negative-first
10 5~
I
o
o
I
1
.5
Pel[.erlt
I
+
%
&
east-west
10
1.5 20
2.5
nf max theoretical
*
latency
20 -
(psf’c)
1.5 10 5~
*
0
30
35
40
throughput
I
I
O
5
Percent
+
*
%
4
-
30
35
40
throughput
(d)
for different traffic patterns in a 10 X
traffic pattern. (c) Matrk-rotate
traffic
FIG, 16. Comparison
of routing algorithms
Uniform traffic pattern. (b) Matrix-transpose
Merge-south traffic pattern.
each traffic
I
~11
west-tirsl
north-last
negative-first
ea+west
I
10
1.5 20
2.5
of mm
theoretical
(c)
For
20
2.5 30
3.5 40
theoretical
throughput
pattern,
a different
cies at the high throughputs.
16(a)); for matrix-transpose
routing
algorithm
10 mesh. (a)
pattern.
has the lowest
(d)
laten-
For uniform
traffic,
it is the ~y algorithm
(Figure
traffic,
the negative-first
algorithm
(Figures
16(b)
and 17(a)); for matrhcrotate,
merge-south,
and address-rotate
traffic,
the
north-last
or all-but-one-negative-first
(ABONF)
algorithm
(Figures 16(c), 16(d),
and 17(b)); and for reverse-flip
traffic, the west-first
or all-but-one-positive-last
(ABOPL)
algorithm
(Figure
17(c)). (The
16 will be discussed
later. ) In general,
perform
better
better for
hypercube,
for uniform
traffic,
east–west
routing
the nonadaptive
and the adaptive
routing
algorithm
routing
algorithms
in Figure
algorithms
perform
nonuniform
traffic.
For matrix-transpose
traffic
in the meshes and
the maximum
sustainable
throughput
of the adaptive
algorithms
is
twice that of the nonadaptive
third higher than the second
the xy algorithm
and uniform
algorithms.
In the meshes, this throughput
is a
highest sustainable
throughput,
which occurs for
traffic.
The latter improvement
in throughput
is
not due to shorter
path lengths for matrix-transpose
traffic
than for uniform
traffic:
the average path length for matrix-transpose
traffic
is 11.34 hops and
for uniform
traffic
is 10.61 hops. For reverse-flip
traffic
in the hypercube,
the
maximum
sustainable
throughput
of the adaptive
algorithms
is four times that
of the nonadaptive
e-cube algorithm
and one half higher than the next highest
sustainable
throughput,
which occurs for the e-cube algorithm
and uniform
traffic.
Again, average path length is longer for reverse-flip
traffic
(4.27 hops)
than for uniform
traffic (4.01 hops).
The Turtl
Model
40
for AdaptiLv
1 I
1
I
I
(psec)
I
I
I
35 -
e-cube
++
30 -
ABONF
ABOPL
&+-
Average ~~
latency
Routing
p-cube
897
I
+
~tJ IS
_
51
(J~
0246
Percent
S101’2
1416182022
of’ max. theoretical
throughput
(a)
40
1 I
I
35 -
I
I
f
1
e-cube
[1
1 I
+
ABC)NF -+
A130 F’[J -B--
:30 Average 2.5 latency 20 -
~~-cube &
FIG.
(/lscc) 15 10 -
Comparmm
for
binary
R-cube.
pattern.
.5~
17.
rithms
different
0246
Percent
routing
traffic
alg-
patterns
Matrix-transport
(b) Address-rotate
(c) Reverse-flip
o~
(a)
of
traffic
trtiffic
in
a
traffic
pattern.
pattern.
S101214161S20’22
of max. theoretical
throughput
(b)
40
I
I
35 -
e-cube
+
30 -
ABONF
ABOPL
+
+
I
I
Average 2.5
latency 20 -
I
I
[
I
I I
(psec) 1,5 -
10 .5~
o~
02468101214161820’
Percent of max theoretical
2’2
throughput
(c)
The
differences
in the performances
of the algorithms,
for
individual
patterns
and across traffic patterns,
are largely due to differences
of information
about
traffic
used in the algorithms.
The xy
algorithms
tion about
are nonadaptive,
but happen
the uniform
traffic
pattern.
traffic
in the types
and e-cube
to embody global, long-term
informaBy routing
packets
first along one
dimension
and then the other, they spread uniform
traffic as evenly as possible
across the channels
of a mesh or hypercube
in the long term. The adaptive
algorithms,
on the other
hand, select channels
based on local, short-term
information,
These selections
tend
to benefit
just the routed
pacliet,
and only
for the immediate
future,
and tend to interfere
with
other
packets.
The
selections
often
produce
zigzag paths, which
disturb
the global,
long-term
evenness of uniform
traffic.
The result is increased
contention
and decreased
performance.
C. J. GLASS AND L. M. NI
898
Despite
the superior
uniform
traffic,
the
in real
systems.
studies,
but
performance
adaptive
Uniform
is not
traffic
generated
mined
by the application
direct
network.
For
of the nonadaptive
algorithms
probably
has been
by many
used
most
applications,
in many
applications.
and how its processes
routing
provide
are mapped
for
performance
previous
A traffic
each process
algorithms
better
simulation
pattern
is deter-
to the nodes of the
communicates
with
some
processes much more than others. Nonuniform
traffic
presents a problem
for
the xy and e-cube algorithms
because
they are nonadaptive.
Just as they
maintain
the evenness of uniform
traffic,
they blindly
maintain
the unevenness
of nonuniform
traffic.
The result,
as the figures
illustrate,
is often
poor
performance.
8.4. SPREADING
simulations
TRAFFIC
in the
algorithm
is largely
channels
of a direct
MORE
preceding
determined
network.
The
EVENLY.
subsection
by
The
is that
how
overall
well
present
conclusion
the performance
it
spreads
subsection
from
traffic
discusses
the
of a routing
across
the
how to modify
partially
adaptive
routing
algorithms
so that they spread traffic more evenly.
Konstantinidou
[1990] describes
two separate
modifications
of the p-cube
routing
algorithm
that make it spread traffic
more evenly across a hypercube.
The first modification
is to route some packets first in positive
directions
and
then in negative
directions
and to route the other
directions
and then in positive directions.
To prevent
channels
in the positive
directions
is doubled.
The
negative-first
packets
the channels
in the negative
is heavier
for channels
use separate
channels
directions.
nearer
node
packets first in negative
deadlock,
the number
of
positive-first
packets and
in the positive
The
traffic
directions
of the positive-first
/Z – 1, and the traffic
but share
packets
of the negative-first
packets is heavier for channels
nearer node O. In total, the traffic
of the two
types of packets is fairly even. The second modification
is to route all packets
first in positive
directions,
then in negative
directions,
and then again in
positive
directions.
Deadlock
is again prevented
by doubling
the number
of
channels in the positive directions.
The total traffic of the three phases is fairly
even across the channels of the hypercube.
The
modifications
by
Konstantinidou
[1990]
can
be
generalized
partially
adaptive
routing
algorithm
and any direct network.
technique
for modifying
a routing
algorithm
to spread traffic
direct
network
is to create
multiple
virtual
networks
to
any
The first general
more evenly in a
and use a different
version
of the routing
algorithm
(or an entirely
different
routing
algorithm)
in each
virtual
network.
For example,
a north-last
routing
algorithm
might be used in
one virtual
network,
and a south-last
routing
algorithm
might be used in a
~ecorid
virtual
network.
Which
routing
algorithm
and
yirtual
network
arG used
to route a particular
message can be determined
randomly
(as Konstantinidou
suggests)
or according
to which
routing
algorithm
is best suited
for the
particular
source and destination
nodes. When the virtual
networks
are overlaid onto the physical network,
it may be possible for virtual
networks
to share
some virtual
channels
and still prevent
deadlock
(as Konstantinidou
demonstrates for the p-cube
routing
algorithm).
The overall
objective
is that the
channel
utilizations
in the virtual
networks
complement
each other such that
the channel utilizations
in the physical network
are as even as possible.
The second general
technique
for modifying
a partially
adaptive
routing
algorithm
to spread traffic
more evenly is again to create multiple
virtual
The Turn Model
networks
with
multiple
in sequence.
advance
for Adaptille
All
Routing
routing
messages
to the next
algorithms,
start
virtual
899
in the
network
but to connect
same
virtual
the virtual
~etwork
in the sequence.
networks
and
When
repeatedly
the last phase
of
the routing
algorithm
in one virtual
network
is similar to the first phase of the
routing
algorithm
in the next virtual
network,
the two virtual
networks
may
share virtual
tions in the
utilizations
The
channels.
Again,
virtual
networks
in the physical
drawback
to both
the overall
complement
network
objective
is that the channel utilizaeach other such that the channel
are as even as possible.
of these
general
techniques
k that
they
virtual
channels
be added to a direct
network.
Adding
virtual
creases the cost of a direct network.
It also creates the possibility
algorithm
designed
from
the start
for
a direct
network
with
require
that
channels
inthat a routing
that
many
virtual
channels might make better use of the channels and perform
better.
A third general
technique
for modifying
a partially
adaptive
routing
algorithm
to spread traffic
more evenly does not have this drawback
of adding
virtual channels to a direct network.
The technique
is to use different
versions
of the routing
parts
spreads
this
algorithm
of the direct
(or entirely
network.
The
traffic
the most evenly
technique
can be applied
routing
algorithm.
The result
the mesh uses a west-first
uses an east-first
destined
for the
different
objective
routing
in each part
of the direct
to two-dimensional
is the east–west
routing
algorithms)
is to use the routing
algorithm
network.
meshes
rowing
to route
and
algo~itlun:
packets,
the
west-first
the east half
of
and the west half
or south). Packets in the same half as their
primarily
if they are heading
away from the
center of the mesh, as shown in Figure 18(b). This combination
routing
toward
the center of the mesh and adaptive
routing
helps
that
For example,
routing
algorithm.
Thus, as shown in Figure
18(a), packets
other half of the mesh are routed
there nonadaptively
(i.e.,
east or west without
going north
destination
are routed
adaptively
center
in different
algorithm
keep adaptiveness
from
inadvertently
congesting
of nonadaptive
away from the
the center
of the
mesh.
To find
out how the east–west
routing
a 10 X 10 mesh for a variety
of traffic
policy
the
is distance-travelled,
algorithm
patterns.
output
performs,
it is simulated
In all cases, the input
selection
policy
is no-turn,
in
selection
and
the
routing
is minimal.
The results are shown in Figure 16. Overall,
the east–west
routing
algorithm
performs
better than the other adaptive
routing
algorithms.
For uniform
traffic,
the nonadaptive
xy routing
algorithm
still performs
the
best, but the east–west
9. Conclusions
This
paper
previous
model
network.
highly
algorithm
and Future
presents
models,
is based
This
adaptive
on
turn
Work
a new
which
is a close second.
model
are largely
analyzing
model
the
for
designing
based
directions
systematically
for a network.
For networks
routing
on adding
virtual
in which
designs
without
routing
algorithms.
Unlike
channels,
the new
packets
can
algorithms
turn
that
in a
are
virtual
channels,
the turn
model
produces
routing
algorithms
that
are partially
adaptive.
(For networks
with virtual
channels,
the turn model can produce
routing
algorithms
that are
fully adaptive [Glass and Ni 1992; Glass 1992].) Adaptiveness
is important
in a
routing
algorithm
because it reduces the effects of hot spots, unfair
channel
allocation,
and fwlts.
The turn model also designs routing
algorithms
that are
900
C. J. GLASS AND L. M. NI
FIG. 18.
halves
The
directions
in which
of a two-dimensional
cast–west
routing
packets
mesh
algorithm
in the two
are routed
during
by the
its two phases.
EHm
(b)
(a)
ideal
for
fault
addition
tolerance,
to highly
being
adaptive
minimal
[Glass
or nonminimal
and
livelock
free,
in
and Ni 1993].
This paper also presents the results of numerous
simulations.
Simulations
of
different
input selection
policies show the distance-travelled
policy to perform
the best,
than
implying
to provide
the no-turn
policy
to
that
fairness.
policy
it is more
to perform
reduce
traffic
patterns
sion-order
adaptive
routing
routing
than
to
and routing
algorithm
algorithms
for
a policy
of different
output
the best, implying
contention
different
important
Simulations
that
increase
it is more
to make
partially
9.1. IMPLEMENTATION
formance
expected
on two things:
the routing
factor
from
whether
algorithms
is the topic
adaptive
routing
show the nonadaptive
of
dimen-
algorithms
of a network
is
Three ways are
spread
traffic
Whether
the improvements
are realizable
in real systems
model
real traffic
for a
the best for uniform
traffic
and more
better
for nonuniform
traffic.
The
CONSIDERATIONS.
the turn
show
Simulations
implication
is that spreading
traffic evenly across the channels
very important
for good performance
by a routing
algorithm.
suggested
evenly.
contention
policies
important
adaptiveness.
algorithms
to perform
to perform
to reduce
selection
patterns
are highly
can be implemented
nonuniform
in hardware
more
in perdepends
and whether
efficiently.
The
latter
of this subsection.
Adaptive
routing
requires
that more or larger header flits be present
in a
router at one time, increasing
communication
latencies. Although
nonadaptive
routing
decisions
can be made based on the distance
remaining
in a single
dimension,
adaptive routing
in more than one dimension,
address
more
decisions must be based on the distance remaining
if not all dimensions.
For dimension
ordering,
the
of the destination
node
can be stored
for each dimension.
Only
the first
it is no longer
needed,
decision,
When
header
in a series of header
flit
is required
it can be discarded
flits,
one or
for each routing
and
the next
flit
used. For adaptive
routing,
the best approach
may be to place the least
significant
bits of each dimension
in the first header
flit, the next least
significmt
bits in the second header flit, and so forth. For example, the first flit
might contain bits O to 3, the second store bits 3 to 6, the third store bits 6 to 9,
and so forth. Then, when the distances in the first flit are reduced to zero, the
flit is discarded,
the distances in the second flit are decremented
by one, and a
new first flit is generated
by the router.
Adaptive
routing
also requires more complex control circuits. More informaand the decision
rules are more
tion is involved
in the routing
decision,
complex.
decision
the direct
The extra control
hardware
means more expense and possibly longer
times. How much longer decision times arc depends on the specifics of
network
and routing
algorithm.
Basing
the decision
on more
infor-
The Turn Model
mation
does not
complexity
Finally,
necessarily
of the decision
adaptive routing
of a message.
901
for AdaptiL1e Routing
require
more
time.
It is likely,
however,
that
the
rule increases decision times by a few clock cycles.
may increase the time for reassembling
the packets
Nonadaptive
routing
algorithms
usually
prevent
the packets
in a
message from passing each other on their way through
a network,
because all
of the packets follow
the same path. Adaptive
routing,
on the other hand,
makes no guarantee
that packets will arrive at the destination
in the order they
are sent. Thus,
be more
about
reassembling
packets
for
its proper
9.2. FUTURE
the most
future
a message
a message,
may require
too, since
sorting.
each packet
There
must
may have to
carry
information
order.
There are many
is identifying
realistic
possibilities
for
traffic patterns,
WORK.
important
simulations
can be more
meaningful.
Another
future work. One
so that the results
is exploring
of
of
the efficiency
with which the routing
algorithms
produced
by the turn model can be implemented in hardware.
There are many more input and output selection policies to be explored,
too,
especially
tion
ones based
policy
that
on additional
assigns packets
information.
priorities
For
example,
an input
based on the distance
selec-
travelled,
minus
the distance
remaining
and the packet length,
might release more channels
more quickly. Also, an output
selection
policy that waits a short period before
selecting
a nonpreferred
from the destination,
The turn
model
output
channel,
especially
a channel
may improve
performance.
has been applied
to n-dimensional
that
leads
away
and
lc-ary
meshes
n-cubes in this paper, but it can also be applied
to other topologies,
such as
hexagonal
networks,
octagonal
networks,
cube-connected
cycles, and express
cubes [Dally 1991]. In such topologies,
the turns are not necessarily
90-degrees
and the simple cycles are not necessarily
formed by four turns. The turn model
should
again
produce
interesting
So far, the research
message
help
has only one destination.
design
which
and useful
has involved
routing
each
applicable
algorithms
message
The
multiple
star model
algorithms.
communication,
question
for multicast
can have
to the multicast
routing
only unicast
is whether
and broadcast
destinations.
of multicast
in which
the turn
each
model
can
communication,
The
turn
model
communication
in
may be
[Lin
and Ni
1991].
The authors
the simulator
ACKNOWLEDGMENTS.
helping
us develop
suggestions
wish to thank Dr. Philip
and anonymous
referees
K. McKinley
for
for their helpful
and comments.
REFERENCES
AG~RWAL,
A.,
architecture
siwn
on Contpt~ter
BORKAR,
B.-H.,
KRANZ,
Arcizitectztre
S. B., COHN,
PETERSON,
1988.
LIM,
for multiprocessing
R., Cox,
C., PIEP~R,
integrated
3upercu)rz~)~tttr~g’[98
(Orkzrzdu,
W. J.
1988.
Fine-grain
Conference
on Hyperczlbe
New
pp. 2–12.
DALLY,
York,
W.
Concurrent
J.
1989.
Cornpz[talg,
The
AND KUBIATOWICZ,
(Seattle,
In
Wash.,
May
L.,
solution
Fla.,
to
28–31 ). IEEE,
high-speed
parallel
Nu~,
14--18).
IEEE,
message
passing
concurrent
Corrzpztters,
J-machine:
System
and Agha,
APRIL:
1990.
A
processor
of the 17th Zntcr-nat~o)za[ S,wrzpcJNew
York,
H. T., LAM,
pp.
104– 114.
M.,
MOORE,
B.,
P. S., SUmON, J., URBANSZU. J., .AND WEBB,
TSENG,
C’o)zczm’ent
Hewitt
J.
Proceedings
G., GLEASON, S., GROSS, T., KUNG.
J., RANKIN,
An
DALLY,
iWarp:
D.,
multiprocessor.
vol.
support
eds. MIT
New
computing.
Yofk,
for
Actors.
Rrx-eedm,q
J.
oj’
pp. 330–33Q
computers.
1 (Pasadena.
Press.
In
In Proccuii?zgs
Cal if.,
In
Cambridge,
Jan.
Actors:
Mass.
of tbe 3rcl
19-20).
ACM.
Kno)tledge-B~zsed
902
C. J. GLASS
DALLY, W. J.
Trans.
1990.
Compat.
DALLY.
W. J.
t]on
W.
(June)
1991.
networks.
DALLY,
Performance
C-39,
cubes.
Trans.
J., AND Aota,
Massachusetts
Institute
mcube
interconnectmn
L.
M.
networks.
NI
lILEE
775-785.
Express
IEEE
of k-ary
analysis
AND
Improving
Comput.
H.
the
40, (Sept.),
1990.
Adaptive
of Technology,
performance
of k-my
n-cube
lntercrmnec-
1016– 1023.
rout]ng
Laboratory
using
for
virtual
Computer
channels.
Science,
Tech.
Rep.,
Cambridge,
Mass.,
Sept.
DALLI, W. J., AND SEITZ, C. L.
1986.
The torus routing
DALLY, W. J., AND SEITZ, C. L.
1987
Deadlock-free
interconnection
GLASS,
C.
turn
networks.
J.
1992.
model.
Lansing,
Ph.D.
Mich..
IEEE
Designing
thesis,
Twrrzs Comput.
maximally
C-36,
State
1992.
Maximally
routing
m
for
Department
wormhole
of Computer
ings of the 1992 International
Cottference
GLASS, C. J., AND NI, L. M.
the 23rd Annurd
1993.
International
fully
adaptive
on Parallel
Fault-tolerant
Sympo~ium
routing
Proce.swzg,
message-passing
Japan.
Nov.),
S.
KONSTANTINIDOU,
MIT
Conference:
wormhole
routing
on Faalt-Tolerant
1. (Aug.),
pp
in meshes.
Compz{tzng
Adapt]ve.
Researclz
LAUDON,
cache
17th International
AP 1000. In
mm]mal
In
(Toronto,
Ont.,
protocol
L.
for
L.
M.
1991.
Proceedozgs
Canada,
May
k-ary
27-30).
COMPANY. 1990.
AND MCKINLEY,
P. K.
IEEE
1988.
DASH
1991.
multicmt
New
In
York,
An
A
of the
May
28-31)
of
the
3rd
Calif.,
Proceedings
pp.
adaptive
The
of the
IEEE,
New
in
multicomputer
on Compz[ter
Arclzuec
tare
116-125.
and
fault
tolerant
wormhole
rout]ng
40, (Jan. ) 2– 1‘2.
Manual,
survey
of wormhole
routing
techniques
m direct
A, J., SEIZOVIC, J., STEELE, C, S., AND Su,
programming
Conference
Jan.
routing
Symposil[m
on
19-20).
ACM,
1977.
of the
Hypercube
New
A large
of the %h ,%nual
Ametek
Series
Concametlt
York,
multlcomputer.
and
Apphcatzorzs,
pp. 33-36.
scale. homogeneous,
Sym,ooszarn
2010
Computers
orz Curnputer
fully
distributed
,4rchltecture
(Mar,
parallel
), IEEE,
pp. 105-124.
for networks
of processors.
In
1989.
IEEE
Adaptive.
Proceeding,
low
latency,
Pt. E, vol.
RECEIVED FEBRUARY 1993; REVISED IU1.Y 1993; ACCEPTED JULY 1993
Journal
1990.
Proceedings
62–76.
Y.ANTCHEV, J. T., AND JESSHOPE, C. R.
routing
Third-
P~-oceedmgs of the 6t/7
In
Wash.,
wormhole
York,
6400 Processor
and
SULLIVAN, H., AND BASHKOW, T. R.
New
In
HENNESSY, J
AND
tect~Lre (Seattle,
Compw,
1993.
26, (Feb.)
architecture
I, (Pasadena,
machine.
1991,
OFZSupercoozpzmng
multiprocessor.
International
ACM,
Trans.
NCUBE
Computer
The
Procecdmgs
Volume
in hypercubes.
GUPTA, A.,
the
C. L., ATHAS, W. C., FLAIG, C. M., MARTIN,
W.-K.
of
139-153.
Deadlock-free
IEEE
M.,
SEITZ,
In
for
of the 18th Annual
n-cubes.
networks.
pp
on Compz~terArc)z~
LINDER, D. H., AND HARDEN, J. C.
NCUBE
routing
ztz jrLS[.
J., GHARACHORLOO, K.,
coherence
Jymposwn
NI.
networks.
NI,
Proceeduzgs
pp. 240–249.
M,
Sympos[unz
Proceed-
148-159.
LIN, X., AND
strategy
Inter?zarional
In
(June),
In
101 – 104
pp. 46-55.
AdLsunced
dutctory-based
pp.
computer
1990.
LENOSIU, D,,
York,
The
East
In 2D meshes.
vol.
INTEL CORPORATION. 1991. .4 Touchstone
DEL X4 System Description
lSHIHATA, H., HORIE, T., INANO,
S,, SHIMIZU, T., KATO, S., AND IKEXAKA,
(Kyusyu,
muting:
Science,
(Aug.),
GLASS. C. J., AND Nt, L. M.
generation
3, 187-196.
multiprocessor
547–553,
algorithms
University.
Comput.1,
J. Iluf.
message
(May)
adaptive
Mich]gan
chip.
A~mcmt,on
for
Cornputrng
Ma~h~ncry.
Vd
dl,
No
5, September
1994
136(3).
deadlock-free
(May),
packet
pp. 178– 186.
Download