Space-division fabrics
UNSW
School of Electrical Engineering and Telecommunications
Keshav Chapter 8
Copyright © Tim Moors, 10/03/2017
Textbook references
• Keshav: Ch. 8.4
• Varghese: Ch. 13
• switch cost: 13.9.1 of Varghese
• Appendix A.3 and A.4: space division switch fabrics
• other space division switch fabrics: Varghese pp. 443-4
Original References
Y. Yeh et al: “The Knockout Switch: A simple, modular architecture for
high-performance packet switching”, IEEE J. Sel. Areas in Comm.
5(8):1274-83
C. Clos: “A study of non-blocking switching networks”, Bell Sys. Tech. J.,
32(3):406-24, Mar. 1953
L. Goke and G. Lipovski: “Banyan networks for partitioning
multiprocessor systems”, Proc. 1st Annual Int. Symp. Comput.
Architecture, Dec. 1973, pp. 21-28
K. Batcher: “Sorting networks and their applications”, Proc. AFIPS
Spring Joint Computer Conference, pp. 307-14, 1968
A. Huang and S. Knauer: “Starlite: A wideband digital switch”, Proc.
Globecom, Dec. 1984, pp. 121-5
V. Benes, Mathematical Theory of Connecting Networks and Telephone
Traffic, Academic Press, New York, 1965
Tutorial/survey References
W. Kabacinski et al: “Guest editorial - 50th anniversary of
Clos networks”, IEEE Comm. Mag. 41(10):26-7
G. Broomell and J. Heath: “Classification categories and
historical development of circuit switching topologies”,
ACM Computing Surveys 15(2):95-133
Online texts (see the TELE9751 reading list):
A. Pattavina: Switching Theory, Wiley
H. Chao et al: Broadband Packet Switching Technologies
Ch. 13 of N. Mir: Computer and Communication Networks
Cisco Nexus context
“Each row in the crossbar is
associated with an input port, and
each group of four columns is
associated with an egress port; thus,
there are four cross-points per egress
port. In addition, there are four fabric
buffers with 10,240 bytes of memory
buffer per egress port.”
“The crossbar fabric on the Cisco Nexus 5500
Series is implemented as a single-stage fabric,
thus eliminating any bottleneck [blocking]
within the switches”
“The [fabric] is a single-stage high-performance 100-by-100 crossbar with an
integrated scheduler. The scheduler
coordinates the use of the crossbar
between inputs and outputs, allowing a
contention-free match between I/O pairs.
The scheduling algorithm is based on an
enhanced iSLIP algorithm.”
Outline
Single-stage: Crossbar switches
  Handling crossbar output port contention
    Knockout
    Arbitration
Staged switches
  Networks of basic elements
  Clos
  Banyan
    Structure
    Blocking & Motivation for sorting
    Batcher sorting networks
Crossbar switches
At the intersection of each input and each output, there is a
“cross point”, which can be selectively enabled, allowing
communication from input to output.
e.g. Intel 470 switches, Cisco Catalyst 6500, 12000 series routers
Feasible to implement in VLSI (e.g. PMC-Sierra PM9312),
but limited by the pin count needed for high-speed I/O to the chip
[Sketch from znet.net/~cdk14568/mpet/contacts/fig23-5.gif. Photos from Lucent]
[Photos: Bell System’s #4A toll crossbar switching system, circa 1957, from AT&T Archives and History Center, IEEE Spectrum, Feb. 2013. Source of switch element photo unknown.]
Advantages of crossbar switches
√ Simple structure
√ Suitable for circuits or packets
√ Low latency – minimal number of connecting points
between arbitrary input and output.
√ No internal blocking (may have output port blocking)
√ Multicast is easy with electrical crossbars
× Difficult with MEMS optical crossbars, since mirror will
(hopefully) reflect all signal power to one intended output.
× Easy to reach output, but scheduling multicast s.t. all outputs are
simultaneously available (other inputs aren’t transmitting to some)
can be tricky.
Disadvantages of crossbar switches
× Scalability:
× Number of crosspoints (Nx) grows with P², where P = number of ports
× Difficult to incrementally expand
× Output buffer speed ∝ P, not line speed (=> knockout
process).
× Fanout: Number of crosspoints on each line increases
linearly with number of ports, increasing capacitive
loading, slowing transmission (just as bad as time-division
bus, but inferior to space-division fabrics such as Banyan)
× No fault tolerance: each crosspoint is needed for one
connection or another.
Handling crossbar output port contention
Crossbar switches (like all switches) can suffer output port
blocking.
Responses:
• Internal buffering at crosspoints (next slide)
• Buffers at outputs. Issue: How to reduce speed of
buffer, or reduce number of low (input-port) speed
buffers? → Knockout tournaments
• Block at input. Issue: How to be fair to inputs so that none is
blocked forever? → arbitration systems.
Internal crossbar buffering
Internal buffering, e.g. to handle output contention
Fig. 8.14 from Keshav
Buffering at crossbar outputs
Multiple inputs might:
• Share, by time division, access to one output port buffer.
  • Small and simple, but needs a fast buffer.
  • Speedup problem: as the number of inputs increases, the speed of the
    buffer at the output must increase (like a shared-medium switch)
• Lead to multiple distinct buffers on each output port.
  • Low speed, but no sharing => more buffer space needed.
How about something in between: some L<P buffers, and a mechanism
(knockout tournament) to spread inputs evenly over the buffers?
Knockout switch
Can we exploit the directional nature of
traffic (few inputs will be directed to any one
output at a time) to reduce the output port
buffer capacity (size+speed)?
So use a “knockout tournament” to decide which
inputs get buffered:
• Competitors (inputs) play and those who win
progress to the next round.
At the end, one competitor is selected (transferred
to buffer); others have lost one match.
• Losers then compete again (clean slate) to
determine next competitor to be selected.
• Repeat to capacity of output port.
[Figure: Example 1 and Example 2 of a tournament; e.g. left-most occupied buffer “wins” a round; losers repeat]
For details, see Peterson and Davie, 1st edition, pp. 193-6; Keshav p. 199
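The tournament described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the names `knockout_select`, `play_round` and `KNOCKOUT_MAX_PORTS` are invented here, and every match is decided by the "left-most occupied wins" rule rather than at random, so this version is not fair to right-hand ports.

```c
#include <assert.h>

#define KNOCKOUT_MAX_PORTS 64   /* hypothetical upper bound on P */

/* Play one single-elimination tournament over inputs [lo, hi).
   competing[i] != 0 means input i still holds a packet for this output.
   In each match the left-most occupied side wins.
   Returns the index of the overall winner, or -1 if nobody competes. */
static int play_round(const int competing[], int lo, int hi)
{
    if (hi - lo == 1)
        return competing[lo] ? lo : -1;
    int mid = lo + (hi - lo) / 2;
    int left = play_round(competing, lo, mid);
    int right = play_round(competing, mid, hi);
    return left >= 0 ? left : right;
}

/* P:L knockout: choose up to l of the p inputs that have a packet.
   Winners are written to selected[] in the order they were chosen.
   Returns the number of packets selected; the rest are knocked out. */
int knockout_select(const int has_packet[], int p, int l, int selected[])
{
    int competing[KNOCKOUT_MAX_PORTS];
    int count = 0;

    assert(p > 0 && p <= KNOCKOUT_MAX_PORTS && l >= 0);
    for (int i = 0; i < p; i++)
        competing[i] = (has_packet[i] != 0);

    while (count < l) {
        int winner = play_round(competing, 0, p);  /* losers start afresh */
        if (winner < 0)
            break;                                 /* no packets left */
        selected[count++] = winner;
        competing[winner] = 0;                     /* winner leaves the tournament */
    }
    return count;
}
```

With P=8 and L=4, six simultaneous packets for one output would yield four selected inputs and two dropped packets.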
Knockout structure
Scenarios for one 2-input contest (P = packet from L/R input):
Inputs (L, R)   Outputs (L, R)
PL, PR          PL, PR, or @ random PR, PL (to be fair to RHS ports)
PL, -           PL, -
-, PR           PR, -
-, -            -, -
[Figure: Knockout structure with 8 inputs and 8 outputs: (A) packet filters, (B) P:L knockout concentrator, (C) shifter & shared buffers, with delay elements (D). The shifter and buffers are shown at times T0, T1 and T2.]
P:L knockout concentrator selects up to L pkts from P inputs (here P=8, L=4). Delay for pipeline. Shifter spreads load.
Drawings based on L. Peterson
Crossbar arbitration†
Problem: When multiple inputs have info
for one output, crossbar controller needs
to determine when each should be
switched through.
Aims: The schedule aims for
• Performance: Maximise throughput by
minimising transfer rounds; minimise
delay
• Fairness: each input should receive
equal access.
• Cost: Shouldn’t require much state info
or many iterations.
Generic method: May achieve this through
multiple iterations (per round) of
negotiation between inputs & outputs
[Figure: example arbitration schedules over rounds 1 to 4 for inputs A, B and C, comparing FIFO (6 rounds), iSLIP and “Optimal?”]
† Often (e.g. by Varghese) called “scheduling”, but called arbitration by others (e.g. Cisco), and “arbitration” avoids
confusion with the later lecture on “packet scheduling”. Scheduling affects the timing of packets: here, timing for
crossing the fabric; in later packet scheduling, timing when sent on a line.
Arbitration basis
• Each input maintains a separate “Virtual Output Queue” (VOQ) of cells
destined for each output. (Fixes Head of Line blocking)
• Each of P inputs maintains P VOQs
=> P² bits of activity info control arbitration. Feasible to optimise with
tens of ports (e.g. P=32 => 1024 bits)
• Need to
  • convey activity info between inputs and outputs
    • Inputs send requests to outputs
    • Each output offers to serve one request (must choose fairly)
    • Each input accepts one offer
  • coordinate inputs and outputs, e.g. multiple iterations: Reiterate to
    use outputs whose offers weren’t accepted.
• Trade-off between performance and number of iterations
Parallel Iterative Matching (PIM)
• Offers are made to random inputs
• Used in DEC AN-2 30 port switch
PIM example
Figure from Varghese
Correction: Requests from A to 1 should be gray; they are similar to other requests. “a=1” on input c is an artifact of the iSLIP example.
PIM vs iSLIP
• “PIM … requires a logarithmic number of iterations to
attain maximal matches”
vs “iSLIP … gets close to maximal matches after just one
or two iterations.”
(though “maximal” ≠ “close to maximal”, and the PIM example
uses 1 iteration while the iSLIP example uses 2 iterations)
• In terms of fairness mechanism,
PIM : Ethernet (randomness) as iSLIP : token ring (taking turns)
iSLIP
• Each port maintains a pointer to next port desired to use.
• Choose lowest port ≥ pointer; update pointer
• “≥” in circular sense: P+1 = 1st port
• Pointers initially all point to 1st port, but desynchronise as
randomly directed traffic passes through.
• Randomness creates fairness (as in Knockout & Ethernet); PIM
adds it directly; iSLIP derives it from random traffic.
• Used in Cisco GSR router
N. McKeown et al: “Scheduling Cells in an Input-Queued Switch”, IEE Electronics Letters, pp.2174-5, 1993
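A single request/offer/accept iteration with round-robin pointers might be sketched in C as below. This is a simplified illustration, not the Cisco or Varghese implementation: the names `islip_iteration`, `round_robin_pick`, `grant_ptr` and `accept_ptr` and the fixed `PORTS` size are assumptions made for the example, ports are numbered from 0, and a pointer is advanced only when the corresponding offer is accepted.

```c
#include <assert.h>

#define PORTS 4   /* hypothetical switch size for this sketch */

/* Lowest index >= start, in the circular sense, whose flag is set; -1 if none. */
static int round_robin_pick(const int flags[PORTS], int start)
{
    assert(start >= 0 && start < PORTS);
    for (int k = 0; k < PORTS; k++) {
        int idx = (start + k) % PORTS;
        if (flags[idx])
            return idx;
    }
    return -1;
}

/* One iteration of request, offer (grant) and accept.
   request[i][o] != 0 : input i has a cell in its VOQ for output o.
   grant_ptr[o]       : output o's round-robin pointer over inputs.
   accept_ptr[i]      : input i's round-robin pointer over outputs.
   match[i] receives the output matched to input i, or -1.
   Returns the number of input/output pairs matched. */
int islip_iteration(int request[PORTS][PORTS], int grant_ptr[PORTS],
                    int accept_ptr[PORTS], int match[PORTS])
{
    int grant[PORTS][PORTS] = {{0}};   /* grant[i][o]: output o offers to input i */
    int matched = 0;

    /* Offer phase: each output offers to one requesting input. */
    for (int o = 0; o < PORTS; o++) {
        int wants[PORTS];
        for (int i = 0; i < PORTS; i++)
            wants[i] = request[i][o];
        int chosen = round_robin_pick(wants, grant_ptr[o]);
        if (chosen >= 0)
            grant[chosen][o] = 1;
    }

    /* Accept phase: each input accepts one offer; only then do pointers move. */
    for (int i = 0; i < PORTS; i++) {
        int accepted = round_robin_pick(grant[i], accept_ptr[i]);
        match[i] = accepted;
        if (accepted >= 0) {
            grant_ptr[accepted] = (i + 1) % PORTS;
            accept_ptr[i] = (accepted + 1) % PORTS;
            matched++;
        }
    }
    return matched;
}
```

Calling the function twice with the same requests shows the desynchronisation effect: pointers that all start at port 0 move apart after the first call, so the second call can match more pairs.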
iSLIP round 1
Figure from Varghese (figure annotation: “a=3?!”)
Pointers only get updated after iteration 1 => B still points to 1 & 2 still points to A
B overhears A accepting 1 => knows that 1 is busy => doesn’t request 1
iSLIP round 2
Figure from Varghese
Correction: Round 2 iteration 1 should show C with packet for output 3
iSLIP round 3
Figure from Varghese
“Centralized” vs distributed switches
Switches can be:
• centralized (e.g. implemented in a single box) or
• distributed (consisting of multiple interconnected boxes)
Potential advantages of centralized switches:
√ Regular internal structure, c.f. heterogeneous distributed systems
√ Feasible to have central control, c.f. propagation delays in distributed
systems
We’ll consider centralized switches first, and later consider distributed
switching (Ethernet switches and inter-switch protocols)
Regular “centralized” space-division designs can conceivably be applied
to distributed switching, provided component consistency is possible.
(e.g. Clos switches in data centres)
Composition of switches
Small switching components can be connected to form
larger switching systems.
Examples from earlier lectures:
• stacking of switches using high-speed port
• a network forms a distributed switch
• backplane fabric of chassis-based switch
interconnects small switches within port processors
Application to space-division switches:
• Multistage switches are composed of “networks”† of
smaller switches (e.g. crossbars or shared-media)
A crosspoint is like a gate: It has the potential to connect one input to one output, but
it can be controlled so that the connection is enabled or disabled at will.
Staged switches
Potential benefits:
√ Fewer crosspoints than in a crossbar switch
√ Diversity of paths (e.g. see figure on next slide)
How?: Selection of switch in first and last stage is
determined by which input & output are being connected
=> use three or more stages to get benefits
Why?: Intermediate stages can offer choice of switch =>
√ Lower blocking probability
√ Increased reliability: can still connect input and
output even if a component switch has failed
† Here we are interested in networks within the switch. Of course switches can
also be interconnected to form external networks (the more common use of the word).
Clos switches
• Each switch (“array”) of a stage connects to each switch of neighboring stages
(# inputs may ≠ # outputs for internal switches)
• Simplest Clos switches have 3 stages, but more are possible by recursively
replacing middle stages by 3-stage Clos structures.
• Advantages:
√ can switch circuits (c.f. Banyan)
√ path diversity
√ fewer crosspoints for large P
[Figure: 3-stage Clos switch with P inputs and P outputs: P/n input arrays of size n×k, k middle arrays of size P/n × P/n, and P/n output arrays of size k×n. Ignoring the ellipsis, this example has P=8+, n=2, k=3+.]
$N_x = 2(n \times k)(P/n) + k\,((P/n) \times (P/n)) = 2Pk + kP^2/n^2$
What are the optimal values for n and k?
Data center applications: Clos Networks: What's Old Is New Again
k = Number of arrays in middle stage
To be non-blocking, must be able to create a connection in the presence of
existing connections on all other ports of input and output arrays.
Each array has only one connection to each switch in adjacent stages.
How many middle-stage arrays may be in use?
• n-1 may be tied up with other inputs from the same input array. (n=3 here)
• n-1 may be tied up with other outputs to the same output array.
Worst case: may have different mid-stage arrays tied up by inputs and outputs.
(Connections that don’t share both same input and same output array can reuse existing arrays.)
=> need at least (1+1)(n-1)+1 middle-stage arrays, i.e. $k \ge 2n-1$
=> $N_x = 2P(2n-1) + (2n-1)P^2/n^2$
[Figure: 3-stage Clos switch with P inputs and P outputs: P/n input arrays (n×k), k middle arrays (P/n × P/n) and P/n output arrays (k×n)]
n => Number of arrays in outer stages
Differentiate $N_x$ with respect to n and set the result to zero:
$0 = 4P + \dfrac{n^2 \cdot 2P^2 - (2n-1)P^2 \cdot 2n}{n^4}$
$0 = 4Pn^4 + 2P^2n^2 - 4P^2n^2 + 2P^2n$
$0 = 2n^3 - Pn + P$
For n large†, neglect the $+P \equiv +Pn^0$ term:
$n \approx \sqrt{P/2}$
$N_x(\min) \approx 4P(\sqrt{2P} - 1)$
Ports (P)   Nx (3-stage Clos)   Nx (crossbar)
8           96                  64
128         7,680               16,384
2048        516,096             4,194,304
† Bonus mark on offer if you can justify why n (not P) should be large.
Is it because large n leads to large k for non-blocking, and with large k, adding another array in the middle to provide fault tolerance with little overhead?
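The crosspoint counts in the table can be checked with a few lines of C. This is a small sketch, assuming n divides P exactly; the function names `clos_crosspoints` and `clos_nonblocking_crosspoints` are chosen here for illustration only.

```c
#include <assert.h>

/* Crosspoints in a 3-stage Clos switch with P ports, n ports per outer
   array and k middle arrays: Nx = 2Pk + k(P/n)^2. */
long clos_crosspoints(long p, long n, long k)
{
    assert(p > 0 && n > 0 && k > 0 && p % n == 0);
    long outer = 2 * (p / n) * (n * k);      /* P/n arrays of n x k, on each side */
    long middle = k * (p / n) * (p / n);     /* k arrays of (P/n) x (P/n) */
    return outer + middle;
}

/* Same count with the smallest k that guarantees no blocking, k = 2n - 1. */
long clos_nonblocking_crosspoints(long p, long n)
{
    return clos_crosspoints(p, n, 2 * n - 1);
}
```

For example, `clos_nonblocking_crosspoints(2048, 32)` uses n = sqrt(P/2) = 32 and gives the 516,096 entry of the table, against 2048 × 2048 = 4,194,304 for a crossbar.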
(Rearrangeably) non-blocking Clos
• k ≥ 2n-1 ensures Clos switch won’t block
• n ≤ k < 2n-1 ensures Clos switch is rearrangeably non-blocking, i.e. can connect input to output, but might need
to rearrange existing connections
e.g. n=k=2
• to connect each input to corresponding output requires pattern on left
• straight connections may end, but cross connections block a connection
from bottom input to top output
• need to rearrange
Clos switches
Used in some
commercial products:
• Juniper T-series routers
• NEC ATOM ATM
switch
• Myrinet switch
Clos architecture used in
many data centre designs
Photo from http://www.cs.utk.edu/~dongarra/lyon2000/Talk02-Chuck-Seitz.ppt
Pipelining
Pipeline: “A sequence of functional units ("stages") which performs a
task in several steps, like an assembly line in a factory. Each functional
unit takes inputs and produces outputs which are stored in its output
buffer. One stage's output buffer is the next stage's input buffer. This
arrangement allows all the stages to work in parallel thus giving greater
throughput than if each input had to pass through the whole pipeline
before the next input could enter.
The costs are greater latency and complexity due to the need to
synchronise the stages in some way so that different inputs do not
interfere. The pipeline will only work at full efficiency if it can be filled
and emptied at the same rate that it can process.” [http://foldoc.org/]
• Multistage switches can operate as pipelines, with each stage
operating (at any instant) on a different set of packets.
• Fixed-length packets (aka “cells”) facilitate synchronisation: all
switches in a stage operate for the same period. Port processors
segment variable length frames into cells & reassemble from cells.
Multistage networks: 1. Physically separate stages
Physically separated
stages form a pipeline
Figure from H. Peyravi
Multistage networks: 2. Recirculation
Multiple logically separate stages
(emulating physically separate stages)
can be created by recirculating, over
time, data through one physical stage.
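[Figure: over time, one physical stage of switches emulates several logical stages: SA ≡ S1 ≡ S5 ≡ S9, SB ≡ S2 ≡ S6 ≡ S10, SC ≡ S3 ≡ S7 ≡ S11, SD ≡ S4 ≡ S8 ≡ S12. Inputs of switches A-D: external or recirculated. Outputs of switches A-D: external or recirculated.]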
Pipelined multistage switches
• With multiple stages, each transfer (from input to output) passes
through successive stages
• At any instant:
• a circuit engages the complete path through the fabric
• packets need only engage one stage at a time
=> can pipeline packets through: at any instant, each stage
processes a separate wave of packets
• With packet switching, stages could act autonomously, each responding to
different packet header bits.
• If we identify output ports with binary addresses, successive stages
could handle progressively less significant bits of the address.
(Note: “address” here is for internal use within switch. Packet classifier will
take address from packet (e.g. IP) header to determine output port, and hence
internal address)
Banyan switches
Self-routing using binary representation of output
port (extra header for use inside switch)
Direction for each stage specified by bit corresponding to that
stage (MSb 1st):
1 => switch to right output port of switch in this stage
0 => switch to left output port
=> can’t multicast
P-port switch has log2 P stages, each with P/2 2×2 switches
=> $N_x = 2P\log_2 P$
[Figure: 8-port Banyan network with output ports 000 to 111, showing the path from input port 001 to dest port 110]
“There is a very large, famous Banyan tree that occupies more than 3 acres of land in a park
in Lahaina, on the island of Maui in the Hawaiian islands.” – Seifert p. 208
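The self-routing rule on this slide (one destination bit per stage, most significant bit first) can be sketched in C as follows. The wiring between stages is an assumption: this sketch uses a butterfly-style layout in which stage s sets bit s (counted from the MSb) of the current line number, which is one kind of Banyan and may not match the exact wiring of the slide's figure. The names `banyan_direction` and `banyan_route` are hypothetical.

```c
#include <assert.h>

/* Bit of the destination address examined at a given stage.
   Stage 0 looks at the most significant bit; 1 means "right", 0 means "left". */
int banyan_direction(unsigned dest, int stages, int stage)
{
    assert(stages > 0 && stage >= 0 && stage < stages);
    return (int)((dest >> (stages - 1 - stage)) & 1u);
}

/* Follow a packet from line src to port dest through an assumed
   butterfly-style wiring. trace[s] receives the line number after stage s.
   Returns the line on which the packet leaves the last stage. */
unsigned banyan_route(unsigned src, unsigned dest, int stages, unsigned trace[])
{
    unsigned line = src;
    for (int stage = 0; stage < stages; stage++) {
        unsigned mask = 1u << (stages - 1 - stage);
        if (banyan_direction(dest, stages, stage))
            line |= mask;      /* take the "1" output of this element */
        else
            line &= ~mask;     /* take the "0" output of this element */
        trace[stage] = line;
    }
    return line;
}
```

Routing from input 001 to destination 110 in a 3-stage network uses the directions 1, 1, 0 and ends on line 110 whatever the input port was, which is why no central controller is needed.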
Switch complexity compared
Number of crosspoints for a switch with P ports:
Fabric     Nx                                     Nx at P=2048
Crossbar   $N_x = P^2$                            4,194,304
Clos       $N_x \approx 4P(\sqrt{2P} - 1)$        516,096
Banyan     $N_x = 2P\log_2 P$                     45,056
Banyan switches
√ require the fewest crosspoints (of those covered)
× but suffer from the potential for blocking
× typically limited to packet switching
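The three figures for P=2048 can be reproduced with integer arithmetic. This is a rough sketch with invented function names; it assumes P is a power of two and, for the Clos formula, that 2P is a perfect square (true for P = 8, 128 and 2048), so the helper square root is exact.

```c
#include <assert.h>

/* log2 of a power of two, computed by shifting. */
static unsigned long ilog2(unsigned long p)
{
    unsigned long bits = 0;
    assert(p > 0 && (p & (p - 1)) == 0);
    while (p > 1) {
        p >>= 1;
        bits++;
    }
    return bits;
}

/* Integer square root (floor), by simple search; fine for small arguments. */
static unsigned long isqrt(unsigned long v)
{
    unsigned long r = 0;
    while ((r + 1) * (r + 1) <= v)
        r++;
    return r;
}

unsigned long crossbar_crosspoints(unsigned long p)
{
    return p * p;                          /* Nx = P^2 */
}

unsigned long clos_min_crosspoints(unsigned long p)
{
    return 4 * p * (isqrt(2 * p) - 1);     /* Nx ~ 4P(sqrt(2P) - 1) */
}

unsigned long banyan_crosspoints(unsigned long p)
{
    return 2 * p * ilog2(p);               /* Nx = 2P log2 P */
}
```

At P=8 the same functions give 64, 96 and 48, showing that the Clos advantage over a crossbar appears only for larger P.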
Project: Software Banyan
• Rudimentary (e.g. fixed size), but demonstrates the idea.
• Full source code in ... 3FabricBANYAN.c

typedef struct banyanElement banyanElement;  // lets the C compiler accept the bare type name

struct banyanElement {
    int elementNumber;
    // flags indicate if
    // outputs busy top=0,bot=1
    int topFlag;
    int bottomFlag;
    // pointers to next element
    banyanElement *zero;
    banyanElement *one;
};
banyanElement *elements[12];

void fabricElements() {
    // ... (the slide shows only an excerpt of the wiring)
    elements[7]->zero = elements[10];
    elements[7]->one = elements[11];
    /* Will print out graphical representation: an ASCII diagram of the
       12 elements, with inputs 000-111 entering elements 0-3 (first stage),
       then elements 4-7 (second stage) and elements 8-11 (third stage).
       The printf lines of the diagram are truncated in this copy. */
}
Project: Software Banyan (ctd)
if (out_addr[stage] == '0') {   // send from top node of element
    // Check whether output zero has been used. If not, allow this
    // packet to go through and record that it has been used; else
    // discard this packet.
    if (currentElement->topFlag == 0) {
        nextElement = currentElement->zero;
        currentElement->topFlag = 1;
    } else {   // element node busy, packet dropped
        blocked = 1;
    }
} else {   // send from bottom node of element
    if (currentElement->bottomFlag == 0) {
        nextElement = currentElement->one;
        currentElement->bottomFlag = 1;   // record that output one is now in use
    } else {   // element node busy, packet dropped
        blocked = 1;
    }
}
Types of Banyan blocking
Internal blocking (†): Contention for an
output port of a component switch in
one of the early stages.
Output port blocking (‡): Contention
for an output port of a switch in the
last stage.
Output port blocking may even occur
internally (§)
[Figure: 8-port Banyan network (output ports 000 to 111) carrying packets destined for 110 and 111, with the blocking points marked †, ‡ and §]
Probability of internal blocking
Blocking probability is
high, e.g. if priority is
given to leftmost input.
(Exercise: Find a formula
expressing how high,
assuming random
input→output mappings,
e.g. see wiki)
[Figure: 8-port Banyan network with destinations 000 010 111 100 001 011 110 101 presented at the inputs, and output port numbers 000 001 010 011 100 101 110 111]
Dealing with Banyan internal blocking
Buffering within the switching network
Dilation:
• Multiple planes of latter stages
• Move traffic blocked in one
plane to higher plane (R→G→B,Y)
• Expensive: number of switches
in each stage increases
exponentially with stage number
for the worst case.
Preface Banyan with network to
• reduce blocking (distribution network → Benes)
• avoid blocking (Sorting networks ...)
Motivation for sorting networks
The blocking could have been avoided if only the information
destined to each output was present on the correct input.
=> Preface Banyan network with a
sorting network,
and a Shuffle Exchange
[Figure: sort, shuffle and Banyan stages in series]
Destinations at inputs: 000 010 111 100 001 011 110 101
After Sort:             000 001 010 011 100 101 110 111
After Shuffle:          000 100 001 101 010 110 011 111
Banyan output ports:    000 001 010 011 100 101 110 111
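The shuffle step shown in this figure interleaves the first and second halves of the sorted list (a perfect shuffle). A minimal C sketch, with the hypothetical name `perfect_shuffle` and assuming an even number of lines:

```c
#include <assert.h>

/* Perfect shuffle: interleave the first half of in[] with the second half.
   out[] must have room for count entries and must not alias in[]. */
void perfect_shuffle(const unsigned in[], unsigned out[], int count)
{
    assert(count > 0 && count % 2 == 0);
    int half = count / 2;
    for (int i = 0; i < half; i++) {
        out[2 * i] = in[i];              /* element from the first half */
        out[2 * i + 1] = in[half + i];   /* element from the second half */
    }
}
```

Applied to the sorted destinations 000 to 111, this gives 000 100 001 101 010 110 011 111, the order presented to the Banyan inputs on the slide.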
Examples† of non-blocking with sorting
[Figure: three examples, each passing packets through Sort, Shuffle and an 8-port Banyan network with output ports 000 to 111]
• Destinations are multiples of 2: sorted as 000 010 100 110
• Destinations are multiples of 3: sorted as 000 011 110
• Destinations: 1, 3, 4, 5, 7: sorted as 001 011 100 101 111
† Not a proof that sorting+shuffling always prevents blocking
Sorting outline
• Isn’t sorting complicated? As complicated as switching?
• Sorting networks
• Unlike serial sorting algorithms, sequence of comparisons must be
independent of data values to allow fixed parallel implementation.
• Issues (covered in ensuing slides):
• Bitonic sequences
• Batcher’s sorting of Bitonic sequences
• How to create a bitonic sequence
• Batcher-Banyan switches
• Trapping sorted duplicates
http://en.wikipedia.org/wiki/Sorting_network
Sorting is simpler than switching
Both sorting and switching reorder items, but
1. Switch must spread after sorting to account for idle ports
Input port:   a     b     c     d     e     f     g     h
Destination:  101a  ---b  111c  001d  010e  110f  ---g  011h
Sorted:       001d  010e  011h  101a  110f  111c  ---x  ---x
Switched:     ---x  001d  010e  011h  ---x  101a  110f  111c
2. Switch must deal with duplicates
e.g. inputs a & d contend for port 101
Destination:  101a  ---b  111c  101d  010e  110f  ---g  011h
Sorted:       010e  011h  101a  101d  110f  111c  ---x  ---x
Switched:     ---x  ---x  010e  011h  ---x  101a  110f  111c
                                            101d
i.e. adding a sorter is not the same as (and costs less than) repeating the switching process
Aside: Bitonic sequences
A bitonic sequence is either:
A. a concatenation of an increasing sequence and a
decreasing sequence, i.e. $\{a_1, a_2, \dots, a_{2k}\}$ and
$\exists k : a_1 \le a_2 \le a_3 \dots \le a_k \ge a_{k+1} \ge a_{k+2} \dots \ge a_{2k}$
or
B. a sequence that can be shifted cyclically to become A.
e.g. 1 2 3 8 7 6 5 4;  8 7 1 2 3 4 5 6;  4 5 6 8 7 1 2 3
Theorem: Any bitonic sequence that has been arbitrarily
split in two and had the two parts reversed is also
bitonic.
e.g. 8 7 1 2 3 | 4 5 6 → 3 2 1 7 8 6 5 4
→ (cyclic shift) 1 7 8 6 5 4 3 2
Batcher’s sorting of bitonic sequences
Batcher showed† that given a bitonic sequence $\{a_1 \dots a_{2k}\}$
then the sequences
$\{\min(a_1,a_{k+1}), \min(a_2,a_{k+2}), \dots, \min(a_k,a_{2k})\}$ and
$\{\max(a_1,a_{k+1}), \max(a_2,a_{k+2}), \dots, \max(a_k,a_{2k})\}$
• are also both bitonic, and
• no number in the sequence of minima exceeds any number in the sequence
of maxima
e.g. {8 7 1 2 3 4 5 6} →
{3=min(8,3), 4=min(7,4), 1=min(1,5), 2=min(2,6)} and
{8=max(8,3), 7=max(7,4), 5=max(1,5), 6=max(2,6)}
i.e. can sort a bitonic sequence by dividing into two sub-sequences (one of
minima and one of maxima), and repeating on sub-sequences.
{3,4,1,2}{8,7,5,6}→{1,2}{3,4}{5,6}{8,7}→1,2,3,4,5,6,7,8
† K. Batcher: “Sorting networks and their applications”, Proc. AFIPS Spring Joint
Computer Conference, pp. 307-14, 1968
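The divide-and-repeat procedure above maps directly onto a short recursive routine. The following C sketch (function names `compare_exchange` and `bitonic_merge` are chosen here, and the length is assumed to be a power of two) sorts a sequence that is already bitonic; note that the pairs compared depend only on positions, never on the data, which is what lets hardware fix the comparators in advance.

```c
#include <assert.h>

/* Put the smaller value in *lo and the larger in *hi. */
static void compare_exchange(int *lo, int *hi)
{
    if (*lo > *hi) {
        int tmp = *lo;
        *lo = *hi;
        *hi = tmp;
    }
}

/* Sort a bitonic sequence a[0..len-1] into ascending order.
   len must be a power of two. */
void bitonic_merge(int a[], int len)
{
    assert(len > 0 && (len & (len - 1)) == 0);
    if (len < 2)
        return;
    int half = len / 2;
    /* Minima end up in the first half, maxima in the second half. */
    for (int i = 0; i < half; i++)
        compare_exchange(&a[i], &a[i + half]);
    /* Both halves are again bitonic, so repeat on each. */
    bitonic_merge(a, half);
    bitonic_merge(a + half, half);
}
```

Run on the slide's example {8 7 1 2 3 4 5 6}, the first pass produces {3 4 1 2} and {8 7 5 6}, and the recursion finishes with 1 2 3 4 5 6 7 8.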
How to create a bitonic sequence
Use an “odd-even mergesort”† to make sequence bitonic before feeding
into Batcher sorting network.
Aim: out.i < out.j ∀ i<j
• Sort groups of 2
if(a1<a2) {out.1=a1; out.2=a2;} else {out.1=a2; out.2=a1;}
• Merge pairs of groups of 2 to form groups of 4. Sort group of 4.
if(a1<a3) {out.1=a1; tmp.lo=a3;} else {out.1=a3; tmp.lo=a1;}
if(a2>a4) {out.4=a2; tmp.hi=a4;} else {out.4=a4; tmp.hi=a2;}
if(tmp.lo<tmp.hi) {out.2=tmp.lo; out.3=tmp.hi;}
else {out.2=tmp.hi; out.3=tmp.lo;}
note that prior sorting of sub-groups helps sorting of merged group
• repeat for larger groups of 8, 16, ...
† Like a conventional merge sort, but comparisons are independent of data
[Figure: worked 8-input example of an odd-even merge sort (ascending and descending sorts to make a bitonic list) followed by a Batcher sort, with a legend for the max/min comparison elements. Numerical example from C. Partridge: Gigabit Networks, p. 115]
Trap modules
Sorting alone doesn’t resolve output port contention
=> trap excessive (>1) packets destined to one port
Starlite switch: Recirculate duplicates
Resequencing is needed. Often give priority to
recirculated packets to avoid prolonged resequencing.
[Figure: block diagram with Conc., Sorter, Trap, Expand, Shuffle and Banyan blocks]
Benes networks
• Essentially two Banyan networks,
one the mirror image of the other
• 2nd half = standard Banyan network
• 1st half = “distributor”: used to shift
inputs so as to reduce blocking.
Switch controller assigns path
through distributor.
• A P×P Benes network has
$s = 2\log_2 P - 1$ stages,
with each stage having P/2
2×2 switching elements (4 crosspoints),
i.e. $N_x = 4P\log_2 P - 2P$
Fig. from A. Pattavina: Switching Theory, Wiley, p. 100
• The Cisco CRS-1 uses “a unique 3-Stage Benes topology switching fabric”
[http://www.cisco.com/en/US/products/ps5763/index.html]
Things to think about
• Critical thinking:
• Engineering methods:
  • Assembly lines are another example of the pipelining technique of
    multi-stage processing.
• Links to other areas:
  • Most sorting algorithms are designed for sequential processing. How
    might other algorithms be adapted for parallel processing?
  • FPGAs are constructed as arrays of programmable (disjoint or Wilton)
    “switch boxes”, like the smaller switches used to compose larger
    switching systems in this lecture
  • This lecture drew on sorting algorithms & randomisation (like Ethernet)
  • A key change from HTTP/1.1 to /2 is reducing “Head of line blocking”
• Independent learning:
  • Read about data center applications: Clos Networks: What's Old Is New
    Again