Introduction to Network Design

Interconnection Network Topology Design
Trade-offs
Organizational Structure

Processors
- datapath + control logic
- control logic determined by examining register transfers in the datapath

Networks
- links
- switches
- network interfaces

Link Design/Engineering Space

Cable of one or more wires/fibers with connectors at the ends, attached to switches or interfaces

Narrow: control, data and timing multiplexed on wire
Wide: control, data and timing on separate wires
Short: single logical value at a time
Long: stream of logical values at a time
Synchronous: source & dest on same clock
Asynchronous: source encodes clock in signal

Example: Cray MPPs

T3D: short, wide, synchronous (300 MB/s)
- 24 bits: 16 data, 4 control, 4 reverse-direction flow control
- single 150 MHz clock (including processor)
- flit = phit = 16 bits
- two control bits identify flit type (idle and framing): no-info, routing tag, packet, end-of-packet

T3E: long, wide, asynchronous (500 MB/s)
- 14 bits, 375 MHz, LVDS
- flit = 5 phits = 70 bits: 64 bits data + 6 control
- switches operate at 75 MHz
- framed into 1-word and 8-word read/write request packets

Switches

[Figure: switch organization. Input ports -> receiver -> input buffer -> crossbar -> output buffer -> transmitter -> output ports, with control logic for routing and scheduling.]

Switch Components

Output ports
- transmitter (typically drives clock and data)

Input ports
- synchronizer aligns data signal with local clock domain
- essentially FIFO buffer

Crossbar
- connects each input to any output
- degree limited by area or pinout

Buffering

Control logic
- complexity depends on routing logic and scheduling algorithm
- determine output port for each incoming packet
- arbitrate among inputs directed at same output
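
The arbitration step above can be illustrated with a small sketch. The slides do not say which scheduling algorithm a switch uses, so the round-robin policy here is an assumption, and the function name `arbitrate` and its argument layout are hypothetical.

```python
def arbitrate(requests, last_granted):
    """Grant at most one input per output port (round-robin, assumed policy).

    requests:     dict mapping input port -> requested output port
    last_granted: dict mapping output port -> input port served last time;
                  updated in place so the next call continues the rotation
    Returns a dict mapping output port -> winning input port.
    """
    # Group the competing inputs by the output they want.
    by_output = {}
    for inp, out in requests.items():
        by_output.setdefault(out, []).append(inp)

    grants = {}
    for out, inputs in by_output.items():
        prev = last_granted.get(out, -1)
        # Prefer the lowest-numbered input after the previous winner,
        # wrapping around to the lowest input if none is larger.
        later = sorted(i for i in inputs if i > prev)
        winner = later[0] if later else min(inputs)
        grants[out] = winner
        last_granted[out] = winner
    return grants


state = {}
first = arbitrate({0: 2, 1: 2, 3: 1}, state)   # inputs 0 and 1 both want output 2
second = arbitrate({0: 2, 1: 2, 3: 1}, state)  # same conflict one cycle later
```

With the same conflict repeated, input 0 wins output 2 in the first cycle and input 1 wins it in the second, so neither input starves.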

Interconnection Topologies

Topology
- Regular
  - Static: one-dimensional, two-dimensional, three-dimensional, hypercube, ...
  - Dynamic: single-stage, multistage (one-sided, two-sided, ...), crossbar
- Irregular

Static Connection Topologies

Mesh and Torus
- Illiac IV, MPP, DAP, CM-2, Paragon
- k-dimensional mesh: $N = n^k$, $d = 2k$, $D = k(n-1)$
- wraparound variation: Illiac IV
- Torus: n x n binary torus, $d = 4$, $D = 2\lfloor n/2 \rfloor$

Hypercubes
- iPSC, nCube, CM-2
- $N = 2^n$, $d = n$, $D = n$
- poor scalability, difficulty in packaging higher-dimensional hypercubes
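
The node count N, degree d and diameter D quoted above can be evaluated for concrete sizes. This is a minimal sketch; the function names are illustrative only, and n and k follow this slide's usage (n nodes per dimension, k dimensions for the mesh; n dimensions for the hypercube).

```python
def mesh_params(n: int, k: int) -> dict:
    """k-dimensional mesh with n nodes per dimension, no wraparound."""
    return {"N": n ** k, "degree": 2 * k, "diameter": k * (n - 1)}


def torus_2d_params(n: int) -> dict:
    """n x n torus: wraparound halves the distance in each dimension."""
    return {"N": n * n, "degree": 4, "diameter": 2 * (n // 2)}


def hypercube_params(n: int) -> dict:
    """Binary n-cube: one link per address bit."""
    return {"N": 2 ** n, "degree": n, "diameter": n}


for name, p in [("4x4 mesh", mesh_params(4, 2)),
                ("4x4 torus", torus_2d_params(4)),
                ("4-cube", hypercube_params(4))]:
    print(name, p)
```

All three examples have 16 nodes, which makes the trade between degree and diameter easy to compare side by side.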

Dynamic Interconnection Networks

Bus-based networks

Crossbar networks

Single Stage Networks
- Shuffle-exchange
- N input and N output
- Crossbar
- Recirculating networks

Multi-stage Networks
- more than one stage of switching elements
- switching box: straight, exchange, upper broadcast, lower broadcast
- network topology and control structure

Dynamic Interconnection Networks

Two-sided MIN
- connecting an arbitrary input to an arbitrary output
- blocking, rearrangeable, nonblocking networks
- blocking networks: Data manipulator, Omega, Flip, n-cube, Baseline
- rearrangeable networks: Benes network
- nonblocking networks: Clos, Crossbar

Interconnection Topologies

Logical properties: distance, degree

Physical properties: length, width

Fully connected network
- diameter = 1
- degree = N
- cost?
  - bus => O(N), but BW is O(1) (actually worse)
  - crossbar => $O(N^2)$ for BW O(N)

VLSI technology determines switch degree

Linear Arrays and Rings

[Figure: linear array; torus; torus arranged to use short wires]

Linear Array
- Diameter? N-1
- Average distance? about N/3
- Bisection bandwidth? 1
- Route A -> B given by relative address R = B-A
- Space O(N)

Torus? Or ring
- Examples: FDDI, SCI, FiberChannel Arbitrated Loop, KSR1
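
For the linear array, routing by relative address and the average distance can both be checked by brute force. This is a small sketch with hypothetical function names; nodes are assumed to be numbered 0 to N-1 along the array. Averaged over all distinct pairs, the distance works out to (N+1)/3.

```python
from itertools import combinations


def route_linear(a: int, b: int) -> list:
    """Nodes visited going from a to b; the sign of R = b - a gives the direction."""
    r = b - a
    step = 1 if r > 0 else -1
    return [a + step * i for i in range(abs(r) + 1)]


def average_distance(n: int) -> float:
    """Mean hop count over all distinct node pairs of an n-node linear array."""
    pairs = list(combinations(range(n), 2))
    return sum(b - a for a, b in pairs) / len(pairs)


print(route_linear(5, 2))       # walks left from node 5 to node 2
print(average_distance(8))      # compare with (8 + 1) / 3
print(average_distance(1024))   # compare with 1024 / 3
```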

Multidimensional Meshes and Tori

[Figure: 2D grid; 3D cube]

d-dimensional array
- $N = k_{d-1} \times \dots \times k_0$ nodes
- described by d-vector of coordinates $(i_{d-1}, \dots, i_0)$

d-dimensional k-ary mesh: $N = k^d$
- $k = \sqrt[d]{N}$
- described by d-vector of radix-k coordinates

d-dimensional k-ary torus (or k-ary d-cube)?

Properties

Routing
- relative distance: $R = (b_{d-1} - a_{d-1}, \dots, b_0 - a_0)$
- traverse $r_i = b_i - a_i$ hops in each dimension
- dimension-order routing

Average Distance (Wire Length?)
- about $d \times k/3$ for mesh
- about $d \times k/4$ for torus with bidirectional links

Degree?

Bisection bandwidth? (Partitioning?)
- $k^{d-1}$ bidirectional links

Physical layout?
- 2D in O(N) space, short wires
- higher dimension?
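
The dimension-order routing described above can be sketched as follows. This is a minimal illustration for a mesh without wraparound; the function name and the choice to correct dimension 0 first are assumptions made for the example.

```python
def dimension_order_route(a, b):
    """Path from node a to node b in a d-dimensional mesh.

    a and b are coordinate tuples (index 0 is dimension 0). The route
    traverses r_i = b_i - a_i hops in each dimension in turn, finishing
    one dimension before starting the next.
    """
    path = [tuple(a)]
    cur = list(a)
    for dim in range(len(a)):
        r = b[dim] - cur[dim]           # relative distance in this dimension
        step = 1 if r > 0 else -1
        for _ in range(abs(r)):
            cur[dim] += step
            path.append(tuple(cur))
    return path


path = dimension_order_route((0, 0), (2, 1))
print(path)             # nodes visited, source first
print(len(path) - 1)    # hop count equals the sum of |r_i|
```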
Real World 2D mesh

1824 node Paragon: 16 x 114 array

a single cabinet: 16 X 4 array

Embeddings in two dimensions

[Figure: 6 x 3 x 2 array embedded in two dimensions]

Embed multiple logical dimensions in one physical dimension using long wires

Trees

Diameter and average distance logarithmic
- k-ary tree, height $d = \log_k N$
- address specified as d-vector of radix-k coordinates describing path down from root

Fixed degree

Route up to common ancestor and down
- R = B xor A
- let i be position of most significant 1 in R, route up i+1 levels
- down in direction given by low i+1 bits of B

H-tree space is O(N) with $O(\sqrt{N})$ long wires

Bisection BW?
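
The up-then-down tree route above can be sketched for the binary case. The assumptions here are a binary tree (k = 2) with processors at the leaves, leaves addressed by d-bit integers, and a hypothetical function name `tree_route`.

```python
def tree_route(a: int, b: int):
    """Route between leaves a and b of a binary tree.

    Returns (levels_up, down_turns), where down_turns lists the child
    taken at each level on the way down (0 = left, 1 = right).
    """
    r = a ^ b                       # R = B xor A
    if r == 0:
        return 0, []                # same leaf, nothing to do
    i = r.bit_length() - 1          # position of the most significant 1 in R
    levels_up = i + 1               # climb to the lowest common ancestor
    # Descend following the low i+1 bits of B, most significant first.
    down_turns = [(b >> j) & 1 for j in range(i, -1, -1)]
    return levels_up, down_turns


print(tree_route(0b010, 0b011))  # siblings: one level up, then one turn down
print(tree_route(0b000, 0b100))  # opposite halves of 8 leaves: up to the root
```

The total hop count is twice the number of levels climbed, which is why distance in a tree depends on how high the common ancestor sits rather than on the numeric difference of the addresses.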
Fat-Trees

Fatter links (really more of them) as you go up, so
bisection BW scales with N

Butterflies

[Figure: 16-node butterfly and its building block]

- Tree with lots of roots!
- N log N switches (actually N/2 x log N)
- Exactly one route from any source to any dest
- R = A xor B, at level i use 'straight' edge if $r_i = 0$, otherwise cross edge
- Bisection N/2 vs $N^{(d-1)/d}$ (d-dimensional mesh) vs 1 (tree)
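
The straight/cross rule above can be traced in a few lines. Which address bit each level examines depends on how the butterfly is drawn; this sketch assumes level i examines bit i (least significant first), and the function name is hypothetical.

```python
def butterfly_route(a: int, b: int, n_bits: int):
    """Edge choices from input row a to output row b in a 2^n_bits-row butterfly.

    Returns (edges, final_row). Taking the cross edge at level i flips
    bit i of the current row (assumed bit-to-level mapping).
    """
    r = a ^ b                           # R = A xor B
    row = a
    edges = []
    for level in range(n_bits):
        if (r >> level) & 1:
            row ^= 1 << level           # cross edge changes this address bit
            edges.append("cross")
        else:
            edges.append("straight")    # bit already matches the destination
    return edges, row


edges, row = butterfly_route(0b0101, 0b0110, 4)
print(edges, row)
```

The route always has exactly n_bits steps and is fully determined by A and B, which is the "exactly one route" property stated above.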

Benes network and Fat Tree

[Figure: 16-node Benes network (unidirectional); 16-node 2-ary fat-tree (bidirectional)]

Back-to-back butterfly can route all permutations
- off line

What if you just pick a random mid point?

Hypercubes

Also called binary n-cubes. Number of nodes: $N = 2^n$.

O(log N) hops

Good bisection BW

Complexity
- Out degree is n = log N

[Figure: hypercubes of dimension 0-D, 1-D, 2-D, 3-D, 4-D and 5-D]
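
The out degree of n = log N and the O(log N) hop count both follow from the binary addressing of a hypercube: two nodes are neighbors when their addresses differ in one bit. The sketch below illustrates this; correcting bits from least to most significant is an assumed order, and the function names are hypothetical.

```python
def hypercube_neighbors(node: int, n: int) -> list:
    """The n neighbors of a node in a binary n-cube (flip one bit each)."""
    return [node ^ (1 << i) for i in range(n)]


def hypercube_route(a: int, b: int, n: int) -> list:
    """Nodes visited from a to b, fixing differing bits lowest first."""
    path = [a]
    cur = a
    for i in range(n):
        if ((cur ^ b) >> i) & 1:    # bit i still differs from the destination
            cur ^= 1 << i
            path.append(cur)
    return path


print(hypercube_neighbors(0, 3))
print(hypercube_route(0b000, 0b101, 3))
```

The number of hops equals the number of differing address bits, so it is at most n.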

Relationship of Butterflies to Hypercubes

Wiring is isomorphic

Except that the butterfly always takes log N steps

Topology Summary

Topology      Degree       Diameter           Ave Dist          Bisection     D (D ave) @ P=1024
1D Array      2            $N-1$              $N/3$             1             huge
1D Ring       2            $N/2$              $N/4$             2
2D Mesh       4            $2(N^{1/2}-1)$     $2/3\,N^{1/2}$    $N^{1/2}$     62 (21)
2D Torus      4            $N^{1/2}$          $1/2\,N^{1/2}$    $2N^{1/2}$    32 (16)
k-ary n-cube  $2n$         $n(k-1)$           $n(k-1)/2$        $2k^{n-1}$    27 (13.5) @n=3
Hypercube     $n=\log N$   $n$                $n/2$             $N/2$         10 (5)

All have some "bad permutations"
- many popular permutations are very bad for meshes (transpose)
- randomness in wiring or routing makes it hard to find a bad one!
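
The last column of the summary table can be reproduced by plugging P = 1024 into the diameter and average-distance formulas. This is a sketch with a hypothetical function name; for the k-ary n-cube row it assumes n = 3 and rounds k to the nearest integer (10). The 2D mesh formula $2(N^{1/2}-1)$ evaluates to 62.

```python
import math


def summary(N: int = 1024, n: int = 3) -> dict:
    """(diameter, average distance) per topology, using the table's formulas."""
    s = math.isqrt(N)                  # N^(1/2) for the 2D rows
    k = round(N ** (1 / n))            # nodes per dimension for the k-ary n-cube
    log_n = int(math.log2(N))          # hypercube dimension
    return {
        "2D mesh": (2 * (s - 1), 2 * s / 3),
        "2D torus": (s, s / 2),
        "k-ary n-cube": (n * (k - 1), n * (k - 1) / 2),
        "hypercube": (log_n, log_n / 2),
    }


for name, (diameter, ave) in summary().items():
    print(f"{name:13s} D = {diameter:3d}   D_ave = {ave:.1f}")
```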

Wire Efficient Communication Networks for Multicomputers

What makes a network efficient?
- Efficient use of the limiting resources

Limiting Factors
- Previously, only switches and pins were considered the limiting factors
- Wires are limiting factors because of power and delay as well as density
  - At the board level as well as at the chip level, the system interconnection is limited by wire density
  - Most of the power dissipated in the networks is $CV^2f$ power used to drive wires
  - Most of the delay is propagation delay over wires or RC delay in driving wires

In the 3D world

For n nodes, bisection area is $O(n^{2/3})$

For large n, bisection bandwidth is limited to $O(n^{2/3})$
- Bill Dally, IEEE TPDS, [Dal90a]
- For fixed bisection bandwidth, low-dimensional k-ary d-cubes are better (otherwise higher is better)
- i.e., a few short fat wires are better than many long thin wires
- What about many long fat wires?

The Design Objective of the Network

To minimize latency and maximize throughput

Latency $T(\lambda, L)$: the average time required to deliver a message
- Each node injects messages with average length L into the network at an average rate of $\lambda$ bits per cycle.
- Three independent variables: topology, routing, and flow control

Topology

Indirect Networks (k-ary d-flys: radix k and dimension d)
- Number of processing nodes: $N = k^d$
- $B_I = N/2$, $BW_I = Nw/2$: high bisection width
- $d_{in} = d_{out} = k$, $\delta = 2k$: low degree
- $D = d+1$: low diameter

[Figure: 2-ary 3-fly]

Wire Efficient Topology

Indirect Networks
- high bisection width, low degree, low diameter, long wires, symmetry
- the bisection width B = N/2 does not reflect the actual maximum wire density for this class of networks: the vertical partition (N wires) more accurately reflects the wiring problems
- wire area $O(N^2)$: plane mapping is expensive
- $N = k^d$. As one varies k and d with the number of processing nodes N and the bisection bandwidth BW held fixed:
  - the degree and diameter are directly controlled.
  - the channel width remains fixed at w = BW/B = 2BW/N.
  - B is independent of the choice of k and d.
  - disadvantage: it prevents the designer from trading off the bandwidth of a channel against the diameter of the network.

Wire Efficient Topology

Direct Networks (k-ary d-cubes)
- $B_D = 2N/k$, $BW_D = 2Nw/k$
- $d_{in} = d_{out} = d$, $\delta = 2d$
- $D = dk/2$

Compared with indirect networks
- $B_I = N/2$, $BW_I = Nw/2$: high bisection width
- $d_{in} = d_{out} = k$, $\delta = 2k$: low degree
- $D = d+1$: low diameter

For small d
- a low and controllable bisection width ($N = k^d$)
- low degree
- high diameter
- short wires ($d \le 3$)
- wiring complexity O(N)
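
The bisection widths above imply different channel widths when the bisection bandwidth BW is held fixed, since w = BW / B. The following sketch works this out for both network classes; the function names and the sample value of BW are assumptions chosen only for illustration.

```python
def channel_width_indirect(N: int, BW: float) -> float:
    """k-ary d-fly: B = N/2 regardless of k and d, so w = 2*BW/N."""
    return BW / (N / 2)


def channel_width_direct(N: int, k: int, BW: float) -> float:
    """k-ary d-cube: B = 2N/k, so w = k*BW/(2N) grows with k."""
    return BW / (2 * N / k)


N, BW = 1024, 1024.0    # BW in wire-bits across the bisection (assumed value)
print(channel_width_indirect(N, BW))        # same for every choice of k and d
print(channel_width_direct(N, 32, BW))      # d = 2, k = 32
print(channel_width_direct(N, 4, BW))       # d = 5, k = 4
```

For the direct network, lowering the dimension raises k and widens each channel, which is the trade-off the indirect network cannot make.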

How Many Dimensions?

d = 2 or d = 3
- Short wires, easy to build
- Many hops, low bisection bandwidth
- Requires traffic locality

$d \ge 4$
- Harder to build, more wires, longer average length
- Fewer hops, better bisection bandwidth
- Can handle non-local traffic

k-ary d-cubes provide a consistent framework for comparison
- $N = k^d$
- scale dimension (d) or nodes per dimension (k)
- assume cut-through

Traditional Scaling: Unloaded Latency(N)

[Figure: two plots of average latency versus machine size N (0 to 10000), one for T(m=40) and one for T(m=140), with curves for d=2, d=3, d=4, k=2 and the m/w term.]

Assumes equal channel width
- independent of node count or dimension
- dominated by average distance

Unit routing delay ($\Delta = 1$), w = 1
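
The shape of these latency curves can be approximated with a simple model. The sketch below assumes cut-through routing, so unloaded latency is the average distance times the per-hop routing delay plus the serialization time m/w. It takes the average distance as d(k-1)/2 with k = N^(1/d), the expression used on the average-distance plot later in these slides. The function name, parameter names and the exact form of the model are assumptions, not taken from the slides.

```python
def unloaded_latency(N: int, d: int, m: float,
                     w: float = 1.0, delta: float = 1.0) -> float:
    """Approximate unloaded cut-through latency in a k-ary d-cube.

    N: number of nodes, d: dimension, m: message length in bits,
    w: channel width in bits, delta: routing delay per hop in cycles.
    """
    k = N ** (1.0 / d)                 # nodes per dimension
    ave_dist = d * (k - 1) / 2         # average hop count (assumed model)
    return ave_dist * delta + m / w    # hop delay plus serialization time


for d in (2, 3, 4, 10):
    print(d, round(unloaded_latency(1024, d, m=40), 1))
```

With equal channel width the m/w term is the same for every curve, so the differences between dimensions come entirely from the average-distance term.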
Real Machines


Wide links, smaller routing delay
Tremendous variation

Average Distance

[Figure: average distance (0 to 100) versus dimension (0 to 25) for machine sizes 256, 1024, 16384 and 1048576; ave dist = d (k-1)/2.]

But equal channel width is not equal cost!

Higher dimension => more channels