Turn Prohibition Based Routing in Irregular Computer Networks

TURN PROHIBITION BASED ROUTING IN IRREGULAR COMPUTER
NETWORKS
LEV ZAKREVSKI (1), MEHMET MUSTAFA (2), MARK KARPOVSKY (2)
(1) (zakr@adm.njit.edu) ECE Dept., New Jersey Institute of Technology, University Heights, Newark, NJ 07102
(2) ({mmustafa,markkar}@bu.edu) ECE Dept., Boston University, 8 St. Mary’s Street, Boston, MA 02215
ABSTRACT
In this paper we study turn prohibition based
multicast wormhole message routing in computer
networks with irregular interconnection graphs. Routing
is accomplished in two phases. The first phase is executed
whenever a topology change is detected. Here the
network graph is analyzed and a set of turns is created
whose prohibition breaks all cycles in the channel
dependency graph. The turns in this set are prohibited from
use during the second routing phase in order to guarantee
that the channel dependency graph is free from cycles and
that routing is deadlock-free. Turn prohibition based
routing, or TPBR, prohibits the use of at most 1/3 of all
turns in the graph.
In this paper turn prohibition based routing has
been extended to multicast wormhole routing in
irregular topologies. We assume that routers are
capable of selectively replicating worms as they pass
through. Furthermore, we assume that each
physical link has the resources to operate virtual channels
in a time-multiplexed manner. The performance gain
obtained by further extending our routing model to
multiple virtual channels has also been investigated. Our
simulation results are compared with those
of the tree-based up/down routing model.
Keywords: wormhole routing, multicasting, virtual
channels, turn model, deadlock prevention
1. INTRODUCTION
Previous work on efficient multicasting includes
several distinct approaches, some of which have been
generalized for irregular topologies. Tseng et al. [1] have
proposed a trip-based approach, which is a generalization
of traditional path-based models. The latter work only for
Hamiltonian networks and may not be applicable to
networks with irregular topologies or to regular topologies
that have become irregular due to system
faults. Work has also been done on unicast-based
multicast routing. These approaches involve a high
start-up cost but have the advantage that they do not
require special hardware components in the network and
in the routers. The U-Mesh algorithm by McKinley et al.
[2] is one such example; it is a minimum-time
deterministic algorithm based on dimension-ordered
routing. Kesavan et al. [3] have shown that dimension-ordered
routing is not always applicable in irregular
networks and have modified the original U-Mesh
algorithm to take chain ordering into account. They
have proposed three approaches, namely switch-based
ordering, switch-based hierarchical ordering and chain
concatenation ordering. The first two approaches order the
destination nodes so that contention is minimized during
the later stages of the multicast worms' lives. The chain
concatenation ordering improves
upon the first two by ordering the destination nodes in
such a way that contention is also minimal during the
initial stages of routing. Boppana et al. [4] have
compared various multicast routing algorithms. The
approaches they considered are unicast-based multicast,
dual-path and multipath Hamiltonian path based multicast,
column path based multicast routing, and multicast routing
conforming to base routing. For the last of these they considered
only the e-cube based approach as the underlying base
routing, due to its widespread use. They found that
dual-path and multipath are better in terms of average
additional traffic, column path performs best in terms of
network throughput, and multipath performs best in terms
of average latency.
Dally [5] reported that network throughput can be
increased by having several virtual channels use the same
physical link in a time-multiplexed manner, where each
virtual channel has its own buffer and control circuitry.
With this scheme, links left idle by blocked messages can
be used by other buffers that hold messages, which increases
the link utilization. Silla and Duato [6] proposed a virtual
channel flow control mechanism which utilizes physical
link length information, resulting in channel pipelining
and the use of control flits. In this scheme each data flit is
accompanied by a control flit indicating which virtual
channel the trailing flit should be transmitted on.
Unfortunately, this wastes half of the bandwidth. In a
subsequent study [7], the authors adopted an improved method in
which several data flits are transmitted once a control flit
takes control of the channel. When the whole message has
been transmitted, or when no new data flits can be transmitted
due to blockage, a different virtual channel transmits
its identifying control flit followed by its data flits.
Furthermore, in networks where message lengths are not
constant, long messages tend to monopolize the channels
at the expense of short ones and thus increase their
latency. This particular problem has also been studied by
the authors of [7]. They decided to limit the number of data flits
that can be transmitted after each control flit and
proposed a simple counter and comparator combination
as the implementation for each physical channel. The counter
starts cleared and is incremented with each data flit that is
transmitted. When the programmed threshold is reached,
the comparator fires, which clears the counter and grants
access to the next virtual channel with data to transmit.
If no other channel has any messages to
transmit, the original channel resumes
transmission.
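The counter and comparator arbitration described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `LinkArbiter`, the round-robin choice of the next virtual channel, and the decision not to send a fresh control flit when the owning channel resumes are illustrative choices, not details taken from [7].

```python
from collections import deque


class LinkArbiter:
    """Toy model of one physical link shared by several virtual channels.

    Assumptions (hypothetical, for illustration): each virtual channel is
    a queue of data flits, and one control flit is sent whenever the link
    is handed over to a different virtual channel.
    """

    def __init__(self, num_vcs, threshold):
        self.queues = [deque() for _ in range(num_vcs)]
        self.threshold = threshold  # max data flits per control flit
        self.counter = 0            # data flits sent since the last handover
        self.current = None         # virtual channel that owns the link

    def _next_ready(self):
        """Round-robin search (assumed policy) for another VC with data."""
        n = len(self.queues)
        start = 0 if self.current is None else self.current + 1
        for i in range(n):
            vc = (start + i) % n
            if vc != self.current and self.queues[vc]:
                return vc
        return None

    def step(self):
        """Send one flit: ('ctrl', vc), ('data', vc, flit), or None if idle."""
        owner_has_data = self.current is not None and bool(self.queues[self.current])
        limit_hit = self.counter >= self.threshold  # the comparator "fires"
        if not owner_has_data or limit_hit:
            nxt = self._next_ready()
            if nxt is not None:
                self.current, self.counter = nxt, 0
                return ("ctrl", nxt)
            if not owner_has_data:
                return None
            self.counter = 0  # nobody else is waiting: the owner resumes
        self.counter += 1
        return ("data", self.current, self.queues[self.current].popleft())


arbiter = LinkArbiter(num_vcs=2, threshold=2)
arbiter.queues[0].extend(["a", "b", "c"])
arbiter.queues[1].append("x")
trace = [arbiter.step() for _ in range(8)]
```

With a threshold of two, the long message on channel 0 is interrupted after two data flits so that the short message on channel 1 can pass, which is the effect the flit limit is meant to achieve.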
In this paper we present turn prohibition based
unicast routing simulation results for randomly generated
connected network graphs with two virtual channels, and
a novel approach to multicast routing in which multicast
tree lengths are minimized. Multicasting is studied only
in networks with one virtual channel. For the case of two
virtual channels the unicast algorithm is simple. According
to the TPBR-algorithm, none of the turns are prohibited in
the source node, where messages originate and enter the
router through the local injection channel. The
path chosen for the first virtual channel is the shortest
path. A worm always begins its trip on
the first virtual channel and remains there until it is
blocked, at which time it switches to the second virtual
channel, with the provision that the blocked worm is
treated as though it were originating in that node. Since
none of the turns are prohibited at the source node, this
approach offers a high probability of access to the second
channel. To prevent deadlocks, once on the second
channel, the worm is not permitted to switch back to the
first virtual channel.
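The one-way switch between the two virtual channels can be summarized as a small state transition. The sketch below is illustrative only: the function names and the `restart` flag (signalling that the worm is re-routed as though it originated at the current node) are hypothetical, and a worm that is blocked on the second channel is simply assumed to wait there.

```python
from typing import List, Tuple

FIRST_VC, SECOND_VC = 1, 2


def next_vc(current_vc: int, blocked: bool) -> Tuple[int, bool]:
    """Return (virtual channel to use, restart flag) for the next hop.

    restart is True only at the moment of switching, when the worm is
    treated as if it originated at the current node.
    """
    if current_vc == FIRST_VC and blocked:
        return SECOND_VC, True
    # Not blocked, or already on the second channel: never switch back.
    return current_vc, False


def trace_vcs(blocked_per_hop: List[bool]) -> List[int]:
    """Virtual channel used at each hop, given a per-hop blocking pattern."""
    vc, used = FIRST_VC, []
    for blocked in blocked_per_hop:
        vc, _ = next_vc(vc, blocked)
        used.append(vc)
    return used


channels = trace_vcs([False, True, False, True])
```

Because the transition from the second channel back to the first never occurs, a worm can change channels at most once along its path.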
To avoid the high startup cost of multicasting with
multiple startup stages, we assume that network routers
are able to selectively replicate worms so that they can
be routed out on several ports simultaneously. This
requires the router to modify the multi-destination
header of the incoming worm in such a way that subsets
of the destinations are present on the outgoing worms
on different ports. Since it is neither possible nor
necessary to make the destination subsets
equal in size, the router has to manage
transmission in such a way that
worms with longer headers are
started first. The payload portion of the multicast
message is then transmitted concurrently on all of the
output ports used for this message. We believe that with
the advent of high-density FPGAs and gate array devices,
incorporating the worm replication feature in new routers
is not a significant drawback. Assuming that routers have
this feature, the multicast transmission tree is constructed
to cover all destination nodes, with the source node as the
root of the tree. During the construction of the tree, the turn
restrictions imposed by the underlying unicast routing
algorithm are obeyed.
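The header handling at a replicating router can be sketched as follows. The function name `split_destinations` and the dictionary-based routing lookup are assumptions made for illustration; the destination set and next hops are those of the 14-node example in Section 3, where node 13 forwards through nodes 4, 7 and 12.

```python
def split_destinations(destinations, next_hop):
    """Group a multi-destination header by outgoing next hop.

    Returns (next hop, destination subset) pairs ordered so that the
    worm with the longest header is started first.
    """
    groups = {}
    for dest in destinations:
        groups.setdefault(next_hop[dest], []).append(dest)
    # Longer headers first, so that the payload can later be sent
    # concurrently on all output ports.
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)


# Next hops at source node 13 for D = {1, 6, 12, 3, 2}.
routing_at_13 = {1: 4, 6: 7, 12: 12, 3: 12, 2: 12}
worms = split_destinations([1, 6, 12, 3, 2], routing_at_13)
```

Here the worm toward node 12 carries three destinations and is therefore started before the two unicast worms.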
The simplest way to construct a multicast
transmission tree, which we call the 0-algorithm, involves
combining the message paths generated by the unicast
routing algorithm into a single multicast tree. The memory
complexity of the 0-algorithm is O(Nd), where d is the
degree of the node and N is the size of the network. The next
version of the algorithm, called the 1-algorithm, is
also based on the unicast routing algorithm, with minimal
additional overhead. Here we construct a set of output
buffers S_j associated with each destination a_j, which
includes the output ports that have the shortest distance
to destination a_j. Then the covering problem, in which
all S_j are covered with a minimum number of output
buffers, is solved. For
instance, if at the end of the covering stage buffer y covers
S_j, then the message to a_j will be transmitted from output port
y. The complexity of the 1-algorithm is O(Nd^2). The most
demanding algorithm, which we call the ∞-algorithm,
is global. Here the multicast tree is constructed as follows.
First, the shortest path (using the unicast routing table)
from the source node a_0 to one destination node a_1 is
constructed and added to the multicast tree. In subsequent
iterations a node v from the already constructed tree is
found such that the path from v to another multicast
destination node a_2 is the shortest. This path is then added
to the tree at node v. The process is complete when all
destination nodes are incorporated into the tree.
The complexity of this algorithm is O(N^2 d).
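The covering step of the 1-algorithm can be illustrated with a short sketch. How the covering problem is actually solved in the routers is not specified here, so the exhaustive search below (feasible because a router has only d output ports) is an assumption, as are the function name `min_port_cover` and the example candidate sets.

```python
from itertools import combinations


def min_port_cover(candidates):
    """Smallest set of output ports that serves every destination.

    candidates maps each destination to the set of output ports lying on
    a shortest path to it (the set S_j of that destination).
    Returns (chosen ports, destination -> assigned port).
    """
    ports = sorted(set().union(*candidates.values()))
    # Try port subsets in order of increasing size; the first subset that
    # intersects every S_j is a minimum cover.
    for size in range(1, len(ports) + 1):
        for chosen in combinations(ports, size):
            chosen_set = set(chosen)
            if all(s & chosen_set for s in candidates.values()):
                assignment = {dest: min(s & chosen_set)
                              for dest, s in candidates.items()}
                return chosen_set, assignment
    return set(), {}


# Hypothetical candidate sets for three destinations a, b and c.
example = {"a": {1, 2}, "b": {2, 3}, "c": {3}}
chosen_ports, port_of = min_port_cover(example)
```

In this example two ports suffice: destination a is served by port 1, while b and c share port 3, so only two worms leave the router instead of three.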
In order to describe the operation of the TPBR-algorithms,
let us consider the graph in Fig.1 with 14
nodes of degree 4. In the figure, prohibited turns are
denoted by arcs extending through the two involved edges.
For example, the turn from node 5 about node 4 to node 7,
represented as the three-tuple (5,4,7), is prohibited.
Altogether the TPBR-algorithm identified 21 turns that
break all cycles in the graph to prevent deadlocks from
occurring. Removing any one of these turns from the
prohibited set results in the introduction of a cycle. For
example, it is easy to see that permitting the turn
(11,10,12) will introduce the cycle (11,10,12,11). The
TPBR-algorithm begins by choosing a node to delete.
Since all nodes are of degree 4, any one of the nodes can
be chosen. Here node 0 is chosen, and since the graph
remains connected after its deletion, all turns about node
0 are prohibited. Node 9 is deleted next, since it has
minimal degree after the deletion of node 0. This process
continues until all nodes have been accounted for. For a
detailed explanation of the operation of the
TPBR-algorithm please refer to [8,12].
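Whether a given set of prohibited turns really breaks all cycles can be checked mechanically. The sketch below is not the TPBR-algorithm itself (see [8,12] for that); it only tests a candidate set. It assumes that a prohibited turn (a,b,c) is also prohibited in the reverse direction (c,b,a) and that U-turns are never taken; the function name is hypothetical.

```python
def has_dependency_cycle(edges, prohibited):
    """True if the channel dependency graph still contains a cycle.

    edges: undirected links (u, v); prohibited: turns (a, b, c), each
    assumed to be forbidden in both directions.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    banned = set(prohibited) | {(c, b, a) for a, b, c in prohibited}

    def successors(channel):
        u, v = channel
        # Channel (u, v) may be followed by (v, w) if the turn is allowed.
        return [(v, w) for w in adj[v] if w != u and (u, v, w) not in banned]

    WHITE, GREY, BLACK = 0, 1, 2
    color = {(u, v): WHITE for u in adj for v in adj[u]}
    for start in color:
        if color[start] != WHITE:
            continue
        color[start] = GREY
        stack = [(start, iter(successors(start)))]
        while stack:  # iterative depth-first search over channels
            channel, pending = stack[-1]
            nxt = next(pending, None)
            if nxt is None:
                color[channel] = BLACK
                stack.pop()
            elif color[nxt] == GREY:
                return True  # back edge: a cycle of channel dependencies
            elif color[nxt] == WHITE:
                color[nxt] = GREY
                stack.append((nxt, iter(successors(nxt))))
    return False


triangle = [(10, 11), (11, 12), (10, 12)]
cycle_without_prohibition = has_dependency_cycle(triangle, [])
cycle_with_prohibition = has_dependency_cycle(triangle, [(11, 10, 12)])
```

For the triangle formed by nodes 10, 11 and 12, prohibiting the single turn (11,10,12) is enough to remove the cycle (11,10,12,11) mentioned above.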
In the remainder of this paper we discuss virtual
channel experiments and results in Section 2. In Section 3
multicast-tree length based approaches are discussed with
examples and preliminary experimental results followed
by conclusions in the last section.
Fig. 1. A randomly generated graph of 14 nodes of
degree 4, with the prohibited turns indicated after the
TPBR-algorithm is applied.
2. VIRTUAL CHANNEL EXPERIMENTS
In this section we discuss our simulation results for
unicast routing with the TPBR-algorithm in networks with 2
virtual channels. To prevent deadlocks at the
consumption channels we provided a sufficient number of
them at every node. We performed flit-level simulations
and monitored the network latency versus the network load,
which is expressed as the probability of message generation.
For each randomly generated connected graph we
transmitted 10,000 messages between randomly chosen
source and destination pairs. We then repeated this set of
experiments for 100 different connected graphs. In all of
our experiments we used a fixed message size of 200 flits.
In Fig.2 we see average latencies in one and two virtual
channel networks, where the Up/Down and TPBR-algorithms
are compared. We note the significant gain attained with
two virtual channels as compared with one channel. For
example, in the 256-node networks the saturation point
improves by about 100 percent with two virtual channels.
We also observe that the Up/Down algorithm gets a
significant boost in performance, but in all of our
experiments the TPBR-algorithm outperformed
the Up/Down algorithm.
In Fig.2d we see that for very low message
generation rates, routing in networks with one virtual
channel performs better. This is due to the fact that our
multiplexing technique is based on TDM with fixed
time-slot assignment, in which the physical link is not
optimally used. This is
similar to the one control flit per data flit approach used
in [6].
We have also investigated the scalability of the two
algorithms. The corresponding results
are illustrated in Fig.3.
Fig. 2. Comparison of average latencies in one and
two virtual channel networks.
Fig. 3. Scalability studies for the Up/Down and TPBR
algorithms.
3. TPBR-BASED MULTICAST EXPERIMENTS
In order to demonstrate the operation of the 0-, 1- and
∞-algorithms, let us consider the 14-node network of Fig.1,
where node 13 is to multicast a message to the set of
destination nodes D={1,6,12,3,2}. According to the
0-algorithm (see Fig.4a), the source node replicates the
original worm into three smaller worms: one unicast
worm for next node 4, one for node 7, and one multicast
worm for node 12. Since this decision is made strictly
with the unicast routing table in the source node, it can
be seen that even though nodes 4 and 7 are not in the
destination set, they are used to forward the worms to
their ultimate destinations, nodes 1 and 6 respectively. This is
due to the fact that, from the source node, these nodes are
closest to nodes 1 and 6. When the last worm arrives at
node 12, it is further replicated by the router of node 12 into
two worms, one for node 2 and the other for node 3. The
multicast transmission tree shown in Fig.4a has
a tree length of 8 hops.
Fig. 4. Multicast trees according to the 0-, 1- and
∞-algorithms for the network graph of Fig. 1:
(a) 0-algorithm, (b) 1-algorithm, (c) ∞-algorithm.
Table 1. Covering Table For 1-Algorithm
        Node
Port    1    6    12   3    2
1       2    3    0    1    2
2       X    X    X    X    X
3       2    1    3    2    2
4       1    2    3    2    3
Finally, in Fig.4c we show the multicast
transmission tree for the ∞-algorithm, in which the tree
length is 6 hops. Also note that the number of
forwarding non-destination nodes is reduced to one. In
the global approach, first the shortest path from source
node 13 to a destination in the destination set is found and
added to the tree. This is node 12 on port 1 of the source
node. In subsequent iterations a node from the already
constructed tree is found which gives the shortest path
length to a new node in the destination set. In our case this
is node 3. This path is then added to the tree.
We then add to the tree node 6, which is also at two hops
from the tree. The process continues until all destination nodes
are incorporated into the tree.
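The iterative tree growth of the global approach can be sketched in a few lines. This is a simplified illustration, not the routing code used in the experiments: distances and next hops are computed by plain breadth-first search, so turn prohibitions are ignored, and the five-node graph and all function names are hypothetical.

```python
from collections import deque


def bfs_tables(adj):
    """Hop distances and next hops for every (node, target) pair."""
    dist, next_hop = {}, {}
    for target in adj:
        seen = {target: 0}
        queue = deque([target])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    next_hop[(w, target)] = u  # first hop from w to target
                    queue.append(w)
        for u, d in seen.items():
            dist[(u, target)] = d
    return dist, next_hop


def grow_multicast_tree(source, destinations, adj):
    """Repeatedly attach the destination closest to the current tree."""
    dist, next_hop = bfs_tables(adj)
    tree_nodes, tree_edges = {source}, []
    remaining = set(destinations) - tree_nodes
    while remaining:
        # Pair (tree node v, destination a) with the shortest path.
        v, a = min(((v, a) for v in tree_nodes for a in remaining),
                   key=lambda pair: (dist[pair], pair))
        node = v
        while node != a:  # add the path from v to a hop by hop
            nxt = next_hop[(node, a)]
            tree_edges.append((node, nxt))
            tree_nodes.add(nxt)
            node = nxt
        remaining -= tree_nodes
    return tree_edges


# Hypothetical connected graph: source 0, destinations 2 and 3.
graph = {0: [1, 4], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [0, 3]}
tree = grow_multicast_tree(0, [2, 3], graph)
```

In this small graph the tree reaches both destinations with three hops by branching at node 1, whereas two independent unicast paths from the source could need four.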
Fig. 5. Average multicast tree length vs. destination
set size.
High-level multicast experimental results can be
seen in Fig.5, where the average multicast tree length
versus the multicast destination set size is shown.
The percentage improvement achieved over the simple
0-algorithm versus the destination set size is shown in
Fig.6. It can be seen that the incremental enhancement in
the average tree length is approximately 1% for
the 1-algorithm and 12% for the ∞-algorithm. It is clearly
not very advantageous to bear the additional memory
overhead in the routers for a gain of only one percent over the
0-algorithm. We feel that this is due to the superiority of
the basic TPBR-algorithm.
Fig. 6. Average percentage enhancements of the 1-algorithm
and the ∞-algorithm over the simple 0-algorithm.
4. CONCLUSIONS
The TPBR-algorithm has been partially evaluated
for unicast routing in networks with one and two virtual
channels. Almost 100 percent gain in the saturation
point, or maximal sustainable throughput, can be
attained by moving from one to two virtual
channels. The scalability of the algorithm has also
been studied, with encouraging results. We observed that
with the TPBR-algorithm not only is there no degradation
with network size, but the performance of the network
actually improves (Fig.3). In contrast, the
Up/Down algorithm shows only marginal enhancement as the
network size is increased. We have also seen that the
multicast tree length improves only slowly as the
algorithm evolves from the strictly local 0-algorithm to
the global ∞-algorithm. A maximum performance gain
of only 15% is attained with the ∞-algorithm over the
local 0-algorithm (Fig.6).
5. ACKNOWLEDGEMENTS
The authors would like to thank Anurag Agarwal from
Boston University for assisting with the simulation
experiments. This work was supported by the NSF under
Grant MIP 9630096.
REFERENCES
[1] Y.-C. Tseng, D. K. Panda, and T. H. Lai, "A Trip-Based
Multicasting Model in Wormhole-Routed
Networks With Virtual Channels," IEEE Trans. on
Parallel and Distributed Systems, vol. 7, no. 2, pp.
138-150, 1996.
[2] P. McKinley, A. Esfahanian, and L. Ni, "Unicast-Based
Multicast Communication in Wormhole-Routed
Direct Networks," IEEE Trans. on Parallel
and Distributed Systems, vol. 5, no. 12, pp. 1254-1265,
1994.
[3] R. Kesavan, K. Bodalapati, and D. K. Panda,
"Multicast on Irregular Switch-based Networks with
Wormhole Routing," Third Int. Symp. on High
Performance Computer Architecture, pp. 48-57, 1997.
[4] R. V. Boppana, S. Chalasani, and C. S. Raghavendra,
"Resource Deadlocks and Performance of Wormhole
Multicast Routing Algorithms," IEEE Trans. on
Parallel and Distributed Systems, vol. 9, pp. 535-549,
1998.
[5] W. J. Dally, "Virtual-Channel Flow Control," IEEE
Trans. on Parallel and Distributed Systems, vol. 3,
pp. 194-205, 1992.
[6] F. Silla and J. Duato, "On the Use of Virtual Channels
in Networks of Workstations with Irregular
Topology," Proc. of the 1997 Parallel Computing,
Routing, and Communication Workshop, 1997.
[7] F. Silla, J. Duato, A. Sivasubramaniam, and C. R. Das,
"Virtual Channel Multiplexing in Networks of
Workstations with Irregular Topology," Proceedings
of the 5th Int. Conf. on High Performance Computing,
pp. 147-154, 1998.
[8] L. Zakrevski and M. Karpovsky, "Unicast Message
Routing in Communication Networks with Irregular
Topologies," Proc. of CAD-99, 1999.
[9] C. Glass and L. Ni, "The Turn Model for Adaptive
Routing," Journal of the ACM, vol. 41, no. 5,
pp. 874-902, 1994.
[10] L. Zakrevski and M. G. Karpovsky, "Fault-Tolerant
Message Routing in Computer Networks," Proc. of
Int. Conf. on PDPA-99, pp. 2279-2287, 1999.
[11] L. Zakrevski, S. Jaiswal, L. Levitin, and M. Karpovsky,
"A New Method for Deadlock Elimination in
Computer Networks With Irregular Topologies,"
Proc. of PDCS-99, 1999.
[12] S. Jaiswal, L. Zakrevski, M. Mustafa, and M. Karpovsky,
"Unicast Wormhole Message Routing in Irregular
Computer Networks," to be presented at PDCS
2000, 2000.