ECE 1749H: Interconnection Networks for Parallel Computer Architectures: Flow Control
Prof. Natalie Enright Jerger
Winter 2011

Announcements
• Project Progress Reports
– Due March 9, submit by e-mail
– Worth 15% of project grade
– 1 page
• Discuss current status of project
• Any difficulties/problems encountered
• Anticipated changes from original project proposal

Announcements (2)
• 2 presentations next week
– Elastic-Buffer Flow Control for On-Chip Networks
• Presenter: Islam
– Express Virtual Channels: Toward the Ideal Interconnection Fabric
• Presenter: Yu
• 1 critique due

Switching/Flow Control Overview
• Topology: determines connectivity of network
• Routing: determines paths through network
• Flow control: determines allocation of resources to messages as they traverse the network
– Buffers and links
– Significant impact on throughput and latency of network

Flow Control
• Control state records
– Allocation of channels and buffers to packets
– Current state of packet traversing node
• Channel bandwidth advances flits from this node to the next
• Buffers hold flits waiting for channel bandwidth

Packets
• Messages: composed of one or more packets
– If message size is <= maximum packet size, only one packet is created
• Packets: composed of one or more flits
• Flit: flow control digit
• Phit: physical digit
– Subdivides flit into chunks equal to link width

Packets (2)
[Figure: a message becomes a packet with a header (route, seq#); the packet is split into a head flit, body flits, and a tail flit, each carrying a flit type, VC ID, and payload (a single-flit packet is head & tail); flits subdivide into phits]
• Off-chip: channel width limited by pins
– Requires phits
• On-chip: abundant wiring means phit size == flit size
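The message → packet → flit hierarchy above can be sketched as a simple segmentation routine. This is a minimal illustration, not from the slides; the packet and flit sizes are arbitrary assumptions.

```python
# Sketch: segment a message into packets, and packets into typed flits.
# Sizes below are illustrative assumptions, not from the lecture.
MAX_PACKET_BYTES = 64   # maximum packet payload
FLIT_BYTES = 16         # flit = flow control digit

def segment(message_bytes):
    """Return a list of packets; each packet is a list of (flit_type, payload)."""
    packets = []
    for p in range(0, len(message_bytes), MAX_PACKET_BYTES):
        payload = message_bytes[p:p + MAX_PACKET_BYTES]
        chunks = [payload[f:f + FLIT_BYTES]
                  for f in range(0, len(payload), FLIT_BYTES)]
        packet = []
        for i, chunk in enumerate(chunks):
            if len(chunks) == 1:
                ftype = "head&tail"        # single-flit packet
            elif i == 0:
                ftype = "head"             # carries route/seq#/VC ID
            elif i == len(chunks) - 1:
                ftype = "tail"
            else:
                ftype = "body"
            packet.append((ftype, chunk))
        packets.append(packet)
    return packets

pkts = segment(b"x" * 96)   # a 96-byte message -> 2 packets
```

With these sizes, a 96-byte message yields one full 4-flit packet (head, body, body, tail) and one 2-flit packet (head, tail), mirroring the cache-line example on the next slide.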
Packets (3)
[Figure: a cache-line packet — head flit with route, type, VC ID, and address plus bytes 0–15; body flits with bytes 16–47; tail flit with bytes 48–63. A coherence command fits in a single head & tail flit with route, type, VC ID, address, and command]
• Packet contains destination/route information
– Flits may not, so all flits of a packet must take the same route

Switching
• Different flow control techniques based on granularity
– Message-based: allocation made at message granularity (circuit switching)
– Packet-based: allocation made to whole packets
– Flit-based: allocation made on a flit-by-flit basis

Message-Based Flow Control
• Coarsest granularity
• Circuit switching
– Pre-allocates resources across multiple hops
• Source to destination
• Resources = links
• Buffers are not necessary
– Probe sent into network to reserve resources

Circuit Switching
• Once probe sets up circuit
– Message does not need to perform any routing or allocation at each network hop
– Good for transferring large amounts of data
• Can amortize circuit setup cost by sending data with very low per-hop overheads
• No other message can use those resources until transfer is complete
– Throughput can suffer due to setup and hold time for circuits
– Links are idle until setup is complete

Circuit Switching Example
[Figure: node 0 sends a configuration probe to node 5; an acknowledgement returns; data then follows the reserved circuit]
• Significant latency overhead prior to data transfer
– Data transfer does not pay per-hop overhead for routing and allocation

Circuit Switching Example (2)
[Figure: while the circuit from node 0 to node 5 is held, a message from node 1 to node 2 cannot use the shared links]
• When there is contention
– Significant wait time
– Message from 1 to 2 must wait
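The setup/acknowledgement overhead and its amortization can be made concrete with a small latency model. This is a sketch under assumed costs (1 cycle per hop for probe, ack, and data; no contention), not a model given in the lecture.

```python
# Sketch: circuit-switching latency model.
# Cycle costs are illustrative assumptions (1 cycle/hop, no contention).
def circuit_switching_latency(hops, num_flits, per_hop=1):
    setup = hops * per_hop     # probe travels source -> destination
    ack = hops * per_hop       # acknowledgement returns to source
    # Once the circuit is held, data pays only link traversal:
    # the first flit pipelines across the hops, the rest stream behind it.
    data = hops * per_hop + (num_flits - 1)
    return setup + ack + data

# Setup cost dominates small transfers but is amortized by large ones:
small = circuit_switching_latency(hops=3, num_flits=1)     # 9 cycles
large = circuit_switching_latency(hops=3, num_flits=100)   # 108 cycles
```

For the 1-flit message, two thirds of the latency is setup and acknowledgement; for the 100-flit message it is under 6%, which is the amortization argument on the slide.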
Time-Space Diagram: Circuit Switching
[Figure: setup (S), acknowledgement (A), data (D), and teardown (T) flits from node 0 to node 8 plotted over 20 cycles; setup of a circuit from node 2 to node 8 is blocked until the first circuit is torn down]

Packet-based Flow Control
• Break messages into packets
• Interleave packets on links
– Better utilization
• Requires per-node buffering to store in-flight packets
• Two types of packet-based techniques

Store and Forward
• Links and buffers are allocated to entire packet
• Head flit waits at router until entire packet is received before being forwarded to the next hop
• Not suitable for on-chip
– Requires buffering at each router to hold entire packet
• Packet cannot traverse link until buffering allocated to entire packet
– Incurs high latencies (pays serialization latency at each hop)

Store and Forward Example
[Figure: a 4-flit packet travels 3 hops from node 0 to node 5]
• Total delay = 4 cycles per hop x 3 hops = 12 cycles
• High per-hop latency
– Serialization delay paid at each hop
• Larger buffering required

Time-Space Diagram: Store and Forward
[Figure: head (H), body (B), and tail (T) flits; the entire packet is received at each node before the head moves on, so delivery completes at cycle 24]

Packet-based: Virtual Cut Through
• Links and buffers allocated to entire packets
• Flits can proceed to next hop before tail flit has been received by current router
– But only if next router has enough buffer space for entire packet
• Reduces the latency significantly compared to SAF
• But still requires large buffers
– Unsuitable for on-chip

Virtual Cut Through Example
[Figure: 4 flit-sized buffers are allocated at each downstream router before the head proceeds]
• Total delay = 1 cycle per hop x 3 hops + serialization = 6 cycles
• Lower per-hop latency
• Large buffering required

Time-Space Diagram: VCT
[Figure: the head flit advances one hop per cycle with body and tail flits pipelined behind it, so delivery completes at cycle 8]

Virtual Cut Through
[Figure: the packet cannot proceed because only 2 flit buffers are available at the next router]
• Throughput suffers from inefficient buffer allocation

Time-Space Diagram: VCT (2)
[Figure: with insufficient buffers downstream, the whole packet stalls until 4 flit buffers free up, so delivery completes at cycle 11]

Flit-Level Flow Control
• Helps routers meet tight area/power constraints
• Flit can proceed to next router when there is buffer space available for that flit
– Improves over SAF and VCT by allocating buffers on a flit-by-flit basis

Wormhole Flow Control
• Pros
– More efficient buffer utilization (good for on-chip)
– Low latency
• Cons
– Poor link utilization: if head flit becomes blocked, all links spanning length of packet are idle
• Cannot be re-allocated to different packet
• Suffers from head-of-line (HOL) blocking

Wormhole Example
[Figure: red holds a channel, which remains idle until red proceeds; another channel is idle, but the red packet is blocked behind blue; blue cannot proceed because a downstream buffer is full, blocked by other packets]
• 6 flit buffers/input port

Time-Space Diagram: Wormhole
[Figure: under contention the packet stalls mid-network, holding its buffers until the channel frees, so delivery completes at cycle 11]

Virtual Channels
• First proposed for deadlock avoidance
– We'll come back to this
• Can be applied to any flow control
– First proposed with wormhole

Virtual Channel Flow Control
• Virtual channels used to combat HOL blocking in wormhole
• Virtual channels: multiple flit queues per input port
– Share same physical link (channel)
• Link utilization improved
– Flits on different VCs can pass blocked packet

Virtual Channel Flow Control (2)
[Figure: two input ports, A and B, each with multiple per-VC flit queues time-multiplexed onto the shared output channels]

Virtual Channel Flow Control (3)
[Figure: packets A and B arrive on In1 and In2 and share output channel Out; their flits interleave cycle by cycle (AH, BH, A1, B1, A2, B2, ...) onto the A and B downstream links]

Virtual Channel Flow Control (4)
• Packets compete for VC on flit-by-flit basis
• In example: on downstream links, flits of each packet are available every other cycle
• Upstream links throttle because of limited buffers
• Does not mean links are idle
– May be used by packets allocated to other VCs

Virtual Channel Example
[Figure: blue cannot proceed because a downstream buffer is full, blocked by other packets; flits on another VC can still pass]
• 6 flit buffers/input port
• 3 flit buffers/VC

Summary of Techniques

Technique           | Links    | Buffers          | Comments
Circuit switching   | Messages | N/A (bufferless) | Setup & Ack
Store and forward   | Packet   | Packet           | Head flit waits for tail
Virtual cut through | Packet   | Packet           | Head can proceed
Wormhole            | Packet   | Flit             | HOL blocking
Virtual channel     | Flit     | Flit             | Interleave flits of different packets
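The flit-by-flit interleaving that virtual channels enable (flits of packets A and B alternating on a shared physical link) can be sketched with a simple round-robin arbiter. Round-robin is an assumed policy for illustration; the slides only show fair interleaving, not a specific allocator.

```python
from collections import deque

# Sketch: two virtual channels sharing one physical link, arbitrated
# round-robin one flit per cycle. Illustrative, not the lecture's allocator;
# credits/buffer limits are ignored here.
def interleave(vcs):
    """vcs: list of per-VC flit queues; returns the downstream flit order."""
    out = []
    i = 0
    while any(vcs):
        if vcs[i]:                     # this VC has a flit ready
            out.append(vcs[i].popleft())
        i = (i + 1) % len(vcs)         # rotate priority each cycle
    return out

a = deque(["AH", "A1", "A2", "AT"])    # packet A's flits
b = deque(["BH", "B1", "B2", "BT"])    # packet B's flits
link = interleave([a, b])
# Flits of A and B alternate on the shared link, so each packet's flits
# appear every other cycle downstream, as on the VC flow control slides.
```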
Deadlock
• Using flow control to guarantee deadlock freedom gives more flexible routing
– Recall: routing restrictions needed for deadlock freedom
• If routing algorithm is not deadlock free
– VCs can break resource cycle
• Each VC is time-multiplexed onto physical link
– Holding VC implies holding associated buffer queue
– Not tying up physical link resource
• Enforce order on VCs

Deadlock: Enforce Order
[Figure: a ring of nodes A, B, C, D with two VCs (0 and 1) per link; a dateline on the ring breaks the cyclic dependency]
• All messages sent through VC 0 until they cross dateline
• After dateline, assigned to VC 1
– Cannot be allocated to VC 0 again

Deadlock: Escape VCs
• Enforcing order lowers VC utilization
– Previous example: VC 1 underutilized
• Escape virtual channels
– Have 1 VC that is deadlock free
– Example: VC 0 uses DOR, other VCs use arbitrary routing function
– Access to VCs arbitrated fairly: packet always has chance of landing on escape VC
• Assign different message classes to different VCs to prevent protocol-level deadlock
– Prevent req-ack message cycles

Buffer Backpressure
• Need mechanism to prevent buffer overflow
– Avoid dropping packets
– Upstream nodes need to know buffer availability at downstream routers
• Significant impact on throughput achieved by flow control
• Two common mechanisms
– Credits
– On-off

Credit-Based Flow Control
• Upstream router stores credit counts for each downstream VC
• Upstream router
– When flit forwarded
• Decrement credit count
– Count == 0: buffer full, stop sending
• Downstream router
– When flit forwarded and buffer freed
• Send credit to upstream router
• Upstream increments credit count
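The credit protocol above can be sketched for a single VC. The class and method names are illustrative; this models only the counter discipline (decrement on send, stall at zero, increment on credit return), not signaling latency.

```python
from collections import deque

# Sketch: credit-based backpressure between an upstream and a downstream
# router port, single VC. Structure and names are illustrative.
class CreditLink:
    def __init__(self, buffer_depth):
        self.credits = buffer_depth    # free downstream buffer slots
        self.downstream = deque()      # downstream input buffer

    def send(self, flit):
        """Upstream: forward a flit only if a credit is available."""
        if self.credits == 0:
            return False               # buffer full: stop sending
        self.credits -= 1              # decrement credit count
        self.downstream.append(flit)
        return True

    def drain(self):
        """Downstream: forward a flit onward, freeing a buffer slot."""
        flit = self.downstream.popleft()
        self.credits += 1              # credit sent back upstream
        return flit

link = CreditLink(buffer_depth=2)
assert link.send("H") and link.send("B1")
assert not link.send("B2")    # count == 0: upstream must stall
link.drain()                  # downstream frees a slot, credit returns
assert link.send("B2")        # upstream may proceed again
```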
Credit Timeline
[Figure: a flit departs node 1 at t1; the credit travels back, is processed at t2–t4, and the next flit for that buffer entry can depart at t5 — the credit round-trip delay]
• Round-trip credit delay:
– Time between when buffer empties and when next flit can be processed from that buffer entry
– If only single-entry buffer, would result in significant throughput degradation
– Important to size buffers to tolerate credit turnaround

On-Off Flow Control
• Credit: requires upstream signaling for every flit
• On-off: decreases upstream signaling
• Off signal
– Sent when number of free buffers falls below threshold Foff
• On signal
– Sent when number of free buffers rises above threshold Fon

On-Off Timeline
[Figure: the Foff threshold is reached at node 2 and the off signal is processed at node 1 by t4 — Foff is set to prevent flits arriving before t4 from overflowing; the Fon threshold is reached later and flits resume by t8 — Fon is set so that node 2 does not run out of flits between t5 and t8]
• Less signaling but more buffering
– On-chip buffers more expensive than wires

Buffer Utilization
[Figure: cycle-by-cycle table of the upstream credit count as head, body, and tail flits move through the router pipeline stages (BW, VA/SA, SA, ST, LT) and credits return (C, C-LT, CUp)]

Buffer Sizing
• Prevent backpressure from limiting throughput
– Buffers must hold at least as many flits as the credit turnaround time (in cycles)
• Assume:
– 1 cycle propagation delay for data and credits
– 1 cycle credit processing delay
– 3 cycle router pipeline
• At least 6 flit buffers

Actual Buffer Usage & Turnaround Delay
[Figure: turnaround delay = 1 cycle credit propagation + 1 cycle credit pipeline + 3 cycle flit pipeline + 1 cycle flit propagation; a flit arrives at node 1 and uses a buffer, and when it leaves node 1 a credit is sent to node 0]
[Figure: node 0 receives and processes the credit, reallocating the freed buffer to a new flit; the new flit leaves node 0 and arrives at node 1, reusing the buffer]

Flow Control and MPSoCs
• Wormhole flow control
• Real-time performance requirements
– Quality of service
– Guaranteed bandwidth allocated to each node
• Time-division multiplexing
• Irregularity
– Different buffer sizes

Flow Control Summary
• On-chip networks require techniques with lower buffering requirements
– Wormhole or virtual channel flow control
• Avoid dropping packets in on-chip environment
– Requires buffer backpressure mechanism
• Complexity of flow control impacts router microarchitecture (in 2 weeks)
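The buffer-sizing rule above (buffers must cover the credit turnaround) can be checked with a one-line computation using the delays assumed on the slides. The function name and parameter names are illustrative.

```python
# Sketch: minimum buffer depth to cover credit turnaround, using the
# delays assumed on the slides: 1-cycle data/credit propagation,
# 1-cycle credit processing, 3-cycle router pipeline.
def min_buffers(credit_prop=1, credit_pipeline=1,
                flit_pipeline=3, flit_prop=1):
    # A buffer slot is unusable from the moment its flit departs until a
    # new flit (sent after the returned credit is processed) arrives to
    # reuse it; the buffer must hold one flit per turnaround cycle to
    # keep the link busy.
    return credit_prop + credit_pipeline + flit_pipeline + flit_prop

assert min_buffers() == 6   # matches the "at least 6 flit buffers" slide
```

With fewer than 6 slots under these delays, the upstream router runs out of credits before the first one returns, and backpressure, not the link, limits throughput.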