ECE 1749H:
Interconnection Networks for
Parallel Computer Architectures:
Flow Control
Prof. Natalie Enright Jerger
Announcements
• Project Progress Reports
– Due March 9, submit by e-mail
– Worth 15% of project grade
– 1 page
• Discuss current status of project
• Any difficulties/problems encountered
• Anticipated changes from original project proposal
Announcements (2)
• 2 presentations next week
– Elastic-Buffer Flow Control for On-Chip Networks
• Presenter: Islam
– Express Virtual Channels: Towards the Ideal Interconnection Fabric
• Presenter: Yu
• 1 Critique due
Switching/Flow Control Overview
• Topology: determines connectivity of network
• Routing: determines paths through network
• Flow Control: determines allocation of resources
to messages as they traverse network
– Buffers and links
– Significant impact on throughput and latency of
network
Flow Control
• Resources managed by flow control: control state, channel bandwidth, buffer capacity
• Control state records
– allocation of channels and buffers to packets
– current state of packet traversing node
• Channel bandwidth advances flits from this node to next
• Buffers hold flits waiting for channel bandwidth
Packets
• Messages: composed of one or more packets
– If message size is <= maximum packet size, only one packet is created
• Packets: composed of one or more flits
• Flit: flow control digit
• Phit: physical digit
– Subdivides flit into chunks equal to link width
Packets (2)
[Figure: a message is subdivided into packets, and each packet into flits. The head flit carries the route and sequence number; every flit carries a type field (head, body, tail, or head & tail) and a VC ID; body and tail flits carry payload. Flits are further subdivided into phits matching the link width (see the sketch below for an illustrative flit layout).]
• Off-chip: channel width limited by pins
– Requires phits
• On-chip: abundant wiring means phit size == flit size
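As a rough illustration of the fields above, a flit in a simple software model of a NoC might be laid out as in the sketch below; the field names and widths are assumptions for illustration, not taken from the slides.

    #include <cstdint>
    #include <vector>

    // Illustrative flit layout mirroring the fields described above.
    enum class FlitType { Head, Body, Tail, HeadTail };

    struct Flit {
        FlitType type;                 // head, body, tail, or head & tail
        uint8_t  vc_id;                // virtual channel the flit is assigned to
        uint16_t route;                // route/destination info (meaningful in head flits)
        uint16_t seq_num;              // packet sequence number (head flits)
        std::vector<uint8_t> payload;  // payload bytes; sent as one or more phits per link
    };

    int main() {
        Flit head{FlitType::Head, /*vc_id=*/0, /*route=*/5, /*seq_num=*/1, {}};
        return head.payload.empty() ? 0 : 1;   // trivial use so the sketch compiles and runs
    }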
Packets (3)
[Figure: two example packets. A cache-line packet is split into a head flit (route, type, VC ID, address, bytes 0-15), two body flits (bytes 16-31 and 32-47), and a tail flit (bytes 48-63). A coherence command fits in a single head & tail flit (route, type, VC ID, address, command).]
• Packet contains destination/route information
– Flits may not carry it, so all flits of a packet must take the same route
Switching
• Different flow control techniques based on
granularity
– Message-based: allocation made at message
granularity (circuit-switching)
– Packet-based: allocation made to whole packets
– Flit-based: allocation made on a flit-by-flit basis
Message-Based Flow Control
• Coarsest granularity
• Circuit-switching
– Pre-allocates resources across multiple hops
• Source to destination
• Resources = links
• Buffers are not necessary
– Probe sent into network to reserve resources
Circuit Switching
• Once probe sets up circuit
– Message does not need to perform any routing or
allocation at each network hop
– Good for transferring large amounts of data
• Can amortize circuit setup cost by sending data with very low per-hop overheads
• No other message can use those resources until
transfer is complete
– Throughput can suffer due to setup and hold time for circuits
– Links are idle until setup is complete
Circuit Switching Example
[Figure: node 0 sets up a circuit to node 5; a configuration probe reserves the path, an acknowledgement returns, then data flows over the reserved circuit.]
• Significant latency overhead prior to data
transfer
– Data transfer does not pay per-hop overhead for
routing and allocation
Circuit Switching Example (2)
[Figure: same network, but node 1 also wants to send to node 2; its path overlaps the circuit held between nodes 0 and 5.]
• When there is contention
– Significant wait time
– Message from 1 to 2 must wait
Time-Space Diagram: Circuit-Switching
[Time-space diagram: the setup flit S08 travels hop by hop through nodes 0, 1, 2, 5, 8, the acknowledgement A08 returns, and only then do the data flits D08 and tail T08 stream over the circuit. Annotations mark the time to set up and acknowledge the circuit from 0 to 8, and the interval during which a later setup S28 from 2 to 8 is blocked.]
Packet-based Flow Control
• Break messages into packets
• Interleave packets on links
– Better utilization
• Requires per-node buffering to store in-flight
packets
• Two types of packet-based techniques: store and forward, virtual cut-through
Store and Forward
• Links and buffers are allocated to entire packet
• Head flit waits at router until entire packet is
received before being forwarded to the next hop
• Not suitable for on-chip
– Requires buffering at each router to hold entire packet
• Packet cannot traverse link until buffering allocated to entire
packet
– Incurs high latencies (pays serialization latency at
each hop)
Store and Forward Example
[Figure: store and forward from node 0 to node 5. Total delay = 4 cycles per hop x 3 hops = 12 cycles.]
• High per-hop latency
– Serialization delay paid at each hop
• Larger buffering required
Time-Space Diagram: Store and Forward
[Time-space diagram: at each of nodes 0, 1, 2, 5, 8, the entire packet (H, B, B, B, T) is received before the head is forwarded, so the serialization delay is paid again at every hop.]
Packet-based: Virtual Cut Through
• Links and Buffers allocated to entire packets
• Flits can proceed to next hop before tail flit has been
received by current router
– But only if next router has enough buffer space for entire
packet
• Reduces the latency significantly compared to SAF
• But still requires large buffers
– Unsuitable for on-chip
Virtual Cut Through Example
[Figure: virtual cut-through from node 0 to node 5; 4 flit-sized buffers must be allocated at the next router before the head proceeds. Total delay = 1 cycle per hop x 3 hops + serialization = 6 cycles.]
• Lower per-hop latency
• Large buffering required
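One way to read the two example slides above as formulas, assuming the packet takes 4 cycles to serialize onto a link (the "4 cycles per hop" of the store-and-forward example), H = 3 hops, and one flit per cycle per link:

    T_{SAF} \approx H \cdot (L/b) = 3 \times 4 = 12 \text{ cycles}
    T_{VCT} \approx H \cdot t_h + (L/b - 1) = 3 \times 1 + 3 = 6 \text{ cycles}

Here L/b is the packet serialization latency in cycles and t_h the per-hop latency of the head flit; store and forward pays the full serialization at every hop, while cut-through pays it only once.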
Time-Space Diagram: VCT
[Time-space diagram: the head flit advances one node per cycle through 0, 1, 2, 5, 8 and the body and tail flits pipeline directly behind it, so serialization is paid only once.]
Virtual Cut Through
[Figure: the packet cannot proceed because only 2 flit buffers are available at the next router, even though the link is free.]
• Throughput suffers from inefficient buffer
allocation
Time-Space Diagram: VCT (2)
[Time-space diagram: the packet stalls at node 2 (insufficient buffers downstream) before it can continue on to nodes 5 and 8.]
Flit-Level Flow Control
• Helps routers meet tight area/power
constraints
• Flit can proceed to next router when there is
buffer space available for that flit
– Improves over SAF and VCT by allocating buffers
on a flit-by-flit basis
Wormhole Flow Control
• Pros
– More efficient buffer utilization (good for on-chip)
– Low latency
• Cons
– Poor link utilization: if head flit becomes blocked,
all links spanning length of packet are idle
• Cannot be re-allocated to different packet
• Suffers from head of line (HOL) blocking
Wormhole Example
[Figure: wormhole example with 6 flit buffers per input port. Red holds a channel that sits idle until red proceeds; another channel is idle but the red packet is blocked behind blue; blue cannot proceed because the downstream buffer is full, blocked by other packets.]
Time-Space Diagram: Wormhole
[Time-space diagram: the head flit hits contention at node 2, so the body and tail flits stall in place at upstream nodes, holding their buffers and links until the head advances.]
Virtual Channels
• First proposed for deadlock avoidance
– We’ll come back to this
• Can be applied to any flow control
– First proposed with wormhole
Virtual Channel Flow Control
• Virtual channels used to combat HOL blocking
in wormhole
• Virtual channels: multiple flit queues per
input port
– Share same physical link (channel)
• Link utilization improved
– Flits on different VCs can pass a blocked packet
Virtual Channel Flow Control (2)
[Figure: input and output VC buffers for two packets, A (in)/A (out) and B (in)/B (out), sharing one physical channel.]
Virtual Channel Flow Control (3)
[Figure: flits of packet A (AH, A1-A5, AT) arriving on In1 and packet B (BH, B1-B5, BT) arriving on In2 are interleaved cycle by cycle onto the shared output link and then onto the A and B downstream links.]
Virtual Channel Flow Control (4)
• Packets compete for VC on flit by flit basis
• In example: on downstream links, flits of each
packet are available every other cycle
• Upstream links throttle because of limited buffers
• Does not mean links are idle
– May be used by packets allocated to other VCs
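A minimal sketch of this flit-by-flit interleaving: two VC queues share one output link, and a round-robin pick each cycle chooses among the non-empty VCs, so flits of A and B alternate on the downstream link. The queue contents and the round-robin policy are illustrative assumptions, not the slide's exact allocator.

    #include <deque>
    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        // Two virtual channels sharing one physical output link.
        std::vector<std::deque<std::string>> vc = {
            {"AH", "A1", "A2", "A3", "AT"},   // packet A on VC 0
            {"BH", "B1", "B2", "B3", "BT"}    // packet B on VC 1
        };
        size_t rr = 0;   // round-robin pointer over VCs

        // Each cycle, forward one flit from some non-empty VC onto the link.
        for (int cycle = 0; !vc[0].empty() || !vc[1].empty(); ++cycle) {
            for (size_t i = 0; i < vc.size(); ++i) {
                size_t cand = (rr + i) % vc.size();
                if (!vc[cand].empty()) {
                    std::cout << "cycle " << cycle << ": link carries "
                              << vc[cand].front() << "\n";
                    vc[cand].pop_front();
                    rr = cand + 1;            // advance round-robin past the winner
                    break;
                }
            }
        }
    }

Running it prints AH, BH, A1, B1, ... so each packet's flits appear on the link every other cycle, matching the behaviour described above.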
Virtual Channel Example
[Figure: virtual channel example; blue cannot proceed because the downstream buffer is full and it is blocked by other packets, but flits on a different VC can still pass it.]
• 6 flit buffers/input port
• 3 flit buffers/VC
Summary of techniques

  Technique             Links      Buffers             Comments
  Circuit-Switching     Messages   N/A (buffer-less)   Setup & Ack
  Store and Forward     Packet     Packet              Head flit waits for tail
  Virtual Cut Through   Packet     Packet              Head can proceed before tail arrives
  Wormhole              Packet     Flit                HOL blocking
  Virtual Channel       Flit       Flit                Interleaves flits of different packets
Deadlock
• Using flow control to guarantee deadlock freedom gives
more flexible routing
– Recall: routing restrictions needed for deadlock freedom
• If routing algorithm is not deadlock free
– VCs can break resource cycle
• Each VC is time-multiplexed onto physical link
– Holding VC implies holding associated buffer queue
– Not tying up physical link resource
• Enforce order on VCs
Deadlock: Enforce Order
[Figure: four nodes A, B, C, D on a ring with two virtual-channel rings (VC 0 and VC 1) and a dateline placed on one link.]
• All messages sent on VC 0 until they cross the dateline
• After dateline, assigned to VC 1
– Cannot be allocated to VC 0 again
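A hedged sketch of this dateline rule on a small unidirectional ring; the 4-node ring, node numbering, and function name are assumptions for illustration.

    #include <iostream>

    // A packet starts on VC 0 and moves to VC 1 once it traverses the dateline
    // link (here: the wrap-around link from node num_nodes-1 to node 0). It is
    // never allocated to VC 0 again, which breaks the cyclic buffer dependency.
    int select_vc(int current_vc, int current_node, int next_node, int num_nodes) {
        bool crossing_dateline = (current_node == num_nodes - 1 && next_node == 0);
        return (current_vc == 1 || crossing_dateline) ? 1 : 0;
    }

    int main() {
        // A packet injected at node 2 of a 4-node ring, travelling to node 1.
        int vc = 0;
        int path[] = {2, 3, 0, 1};
        for (int i = 0; i + 1 < 4; ++i) {
            vc = select_vc(vc, path[i], path[i + 1], 4);
            std::cout << "hop " << path[i] << "->" << path[i + 1]
                      << " uses VC " << vc << "\n";
        }
    }

The packet uses VC 0 for the hop 2->3, switches to VC 1 on the dateline hop 3->0, and stays on VC 1 afterwards.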
Deadlock: Escape VCs
• Enforcing order lowers VC utilization
– Previous example: VC 1 underutilized
• Escape Virtual Channels
– Have 1 VC that is deadlock free
– Example: VC 0 uses DOR, other VCs use an arbitrary routing function (see the sketch after this list)
– Access to VCs arbitrated fairly: packet always has chance of
landing on escape VC
• Assign different message classes to different VCs to
prevent protocol level deadlock
– Prevent req-ack message cycles
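A hedged sketch of the escape-VC restriction from the example above, for a 2D mesh: VC 0 (the escape VC) only accepts hops consistent with dimension-order (XY) routing, while other VCs are unrestricted. The mesh, coordinates, and function name are assumptions for illustration.

    #include <cstdlib>

    // Returns whether a hop from (cur_x, cur_y) to (next_x, next_y) toward
    // (dest_x, dest_y) is allowed on the given VC. VC 0 is the escape VC and
    // only follows XY routing, so it can never be part of a resource cycle.
    bool hop_allowed_on_vc(int vc, int cur_x, int cur_y, int next_x, int next_y,
                           int dest_x, int dest_y) {
        if (vc != 0) return true;     // non-escape VCs: any routing function
        if (cur_x != dest_x)          // X dimension first...
            return next_y == cur_y && std::abs(next_x - dest_x) < std::abs(cur_x - dest_x);
        return next_x == cur_x && std::abs(next_y - dest_y) < std::abs(cur_y - dest_y);
    }

    int main() {
        // From (1,1) toward (3,2): the escape VC allows only the +X hop first.
        bool x_hop_ok = hop_allowed_on_vc(0, 1, 1, 2, 1, 3, 2);   // true
        bool y_hop_ok = hop_allowed_on_vc(0, 1, 1, 1, 2, 3, 2);   // false on VC 0
        return (x_hop_ok && !y_hop_ok) ? 0 : 1;
    }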
Buffer Backpressure
• Need mechanism to prevent buffer overflow
– Avoid dropping packets
– Upstream nodes need to know buffer availability at
downstream routers
• Significant impact on throughput achieved by
flow control
• Two common mechanisms
– Credits
– On-off
Credit-Based Flow Control
• Upstream router stores credit counts for each
downstream VC
• Upstream router
– When flit forwarded
• Decrement credit count
– Count == 0, buffer full, stop sending
• Downstream router
– When flit forwarded and buffer freed
• Send credit to upstream router
• Upstream increments credit count
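A minimal sketch of the upstream credit bookkeeping described above, reduced to a single VC; the struct and method names are assumptions for illustration.

    #include <cstdint>

    // Per-VC credit counter kept at the upstream router.
    struct CreditCounter {
        uint32_t credits;   // free buffer slots believed available downstream

        bool can_send() const { return credits > 0; }

        // Flit forwarded downstream: one downstream buffer slot is consumed.
        void on_flit_sent() { if (credits > 0) --credits; }

        // Credit returned from downstream: a buffer slot was freed.
        void on_credit_received() { ++credits; }
    };

    int main() {
        CreditCounter vc0{4};                        // downstream VC 0 has 4 flit buffers
        while (vc0.can_send()) vc0.on_flit_sent();   // send until count reaches 0: stop
        vc0.on_credit_received();                    // downstream freed a slot
        return vc0.can_send() ? 0 : 1;               // sending may resume
    }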
Credit Timeline
[Timeline: a flit departs Node 1 at t1 and is processed at Node 2; Node 2 returns a credit, Node 1 processes it, and only then (t5) can the next flit be sent to reuse that buffer entry. The interval t1 to t5 is the credit round-trip delay.]
• Round-trip credit delay:
– Time between when buffer empties and when next flit can
be processed from that buffer entry
– If only single entry buffer, would result in significant
throughput degradation
– Important to size buffers to tolerate credit turn-around
On-Off Flow Control
• Credit: requires upstream signaling for every flit
• On-off: decreases upstream signaling
• Off signal
– Sent when number of free buffers falls below threshold Foff
• On signal
– Sent when number of free buffers rises above threshold
Fon
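A small sketch of the downstream side of this on/off signalling: an off signal is raised when the free-buffer count falls below Foff, and an on signal when it rises back above Fon. The threshold values, names, and single-port view are illustrative assumptions.

    #include <cstdint>

    // Downstream router's view of on/off backpressure for one input port.
    struct OnOffReceiver {
        uint32_t free_buffers;
        uint32_t f_off;              // send OFF when free buffers fall below this
        uint32_t f_on;               // send ON when free buffers rise above this
        bool upstream_stopped;       // true after OFF has been signalled upstream

        // A flit arrived and consumed a buffer slot.
        void on_flit_arrived() {
            --free_buffers;
            if (!upstream_stopped && free_buffers < f_off)
                upstream_stopped = true;     // signal OFF upstream
        }

        // A flit departed and freed a buffer slot.
        void on_flit_departed() {
            ++free_buffers;
            if (upstream_stopped && free_buffers > f_on)
                upstream_stopped = false;    // signal ON upstream
        }
    };

    int main() {
        OnOffReceiver rx{8, 2, 6, false};                    // 8 buffers, Foff = 2, Fon = 6
        for (int i = 0; i < 7; ++i) rx.on_flit_arrived();    // buffers fill, OFF is signalled
        for (int i = 0; i < 6; ++i) rx.on_flit_departed();   // buffers drain, ON is signalled
        return rx.upstream_stopped ? 1 : 0;
    }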
On-Off Timeline
[Timeline: when Node 2's free-buffer count falls below Foff it sends an off signal; Foff is set so that flits already in flight (arriving before t4) do not overflow the buffers. When the count rises back above Fon it sends an on signal; Fon is set so that Node 2 does not run out of flits to send between t5 and t8.]
• Less signaling but more buffering
– On-chip buffers more expensive than wires
Buffer Utilization
[Table: cycle-by-cycle buffer utilization for a head flit, two body flits, and a tail flit. The downstream credit count falls from 2 to 0 as flits are buffered and move through the router pipeline (BW, VA/SA, SA, ST, LT), then recovers as credits are returned over the credit link and the freed slots are reused by later flits.]
Buffer Sizing
• Prevent backpressure from limiting throughput
– Buffer depth in flits must be >= credit turnaround time in cycles
• Assume:
– 1 cycle propagation delay for data and credits
– 1 cycle credit processing delay
– 3 cycle router pipeline
• At least 6 flit buffers
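Spelling the slide's assumptions out as a sum (the breakdown matches the figure that follows):

    t_turnaround = 1 (credit propagation) + 1 (credit processing)
                 + 3 (router pipeline) + 1 (flit propagation) = 6 cycles

so with at least 6 flit buffers, a new flit can arrive to reuse each buffer slot the moment its credit round trip completes.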
Actual Buffer Usage & Turnaround Delay
[Figure: buffer turnaround at Node 1. A flit arrives and uses the buffer; when it leaves, a credit is sent to Node 0 (1 cycle credit propagation), Node 0 processes the credit and reallocates the freed buffer to a new flit (1 cycle credit pipeline), the new flit traverses Node 0's pipeline (3 cycles) and the link (1 cycle), then arrives at Node 1 and reuses the buffer.]
Flow Control and MPSoCs
• Wormhole flow control
• Real time performance requirements
– Quality of Service
– Guaranteed bandwidth allocated to each node
• Time division multiplexing (see the sketch below)
• Irregularity
– Different buffer sizes
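A toy sketch of the time-division multiplexing idea above: each slot of a repeating frame on a link is statically owned by one node, which guarantees that node a fixed share of the link bandwidth regardless of other traffic. The frame length and slot assignment are invented for illustration.

    #include <array>
    #include <iostream>

    int main() {
        // Repeating 4-slot TDM frame: slot index -> owning node.
        std::array<int, 4> slot_owner = {0, 2, 1, 3};
        for (int cycle = 0; cycle < 8; ++cycle) {
            int owner = slot_owner[cycle % slot_owner.size()];
            std::cout << "cycle " << cycle << ": link reserved for node "
                      << owner << "\n";
        }
    }

Each node gets one of every four cycles on the link, i.e. a guaranteed quarter of its bandwidth.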
Flow Control Summary
• On-chip networks require techniques with lower
buffering requirements
– Wormhole or Virtual Channel flow control
• Avoid dropping packets in on-chip environment
– Requires buffer backpressure mechanism
• Complexity of flow control impacts router
microarchitecture (in 2 weeks)