The StageNet Fabric for Constructing Resilient Multicore Systems
Shantanu Gupta, Shuguang Feng, Amin Ansari, Jason Blome and Scott Mahlke
University of Michigan, Ann Arbor
University of Michigan Electrical Engineering and Computer Science

Journey of Silicon Technology
[Figure: CPU performance (log scale) versus year, 1985 to 2015. Labels: 486, Pentium, Pentium II, Pentium III, Pentium 4, Core Duo, Core 2 Quad, Cell, IBM z servers. Eras: perfect transistors; memory redundancy; rising variability and defects; unreliable silicon.]

Reliability Threats
• Transient faults
• Hard faults (manufacturing defects and device wear-out)
• Manufacturing defects that escape testing (inefficient burn-in testing)
• Parametric variability (uncertainty in device and environment)
[Figure: transistor cross-section (source, gate, drain); intra-die variations in ILD thickness; thermal runaway (increased heating, higher power dissipation, higher transistor leakage).]
[Todd Austin, GSRC Sep 08]

Goal of this Research
• Reliability is developing into a first-class design constraint
• Design a computing substrate
  ► Provides scalable fault tolerance
  ► Highly reconfigurable
  ► Marginal overheads
Enable CMP designs capable of facing 100s of faults while maintaining useful throughput

Reconfiguration Granularity
For 100% area overhead (redundancy):
• CORE level: MTTF ↑ 100%
• STAGE level (FETCH, DEC, EXEC, MEM, WB): MTTF ↑ 170%
• MODULE level: MTTF ↑ 200%
Axes: better resource utilization; lower complexity
Trade-offs, in order from core level to module level:
  -- Poor MTTF gains
  + Easy to implement
  + Good MTTF gains
  + Circuit / architectural boundary
  + Full coverage
  + Best MTTF gains
  -- Complex implementation
Prior work cited:
• ElastIC, DT'06
• Reunion, MICRO'06
• Configurable Isolation, ISCA'07
• Online Diagnosis of Hard Faults, MICRO'05
• Ultra Low-Cost Defect Protection, ASPLOS'06

CMP Fabric
[Figure: four cores (Core 0 to Core 3), each a pipeline of Stage1 through StageN separated by latches.]

StageNet (SN) Fabric
[Figure: rows of Stage1 through StageN connected through crossbar switches; one row is a StageNet Slice (SNS); a configuration manager oversees the fabric.]
Wearout sensors
• Delay
• Temperature
• Current

SN – Benefits
[Figure: the SN fabric (rows of Stage1 through StageN) with the configuration manager.]

Outline
• SN Slice (SNS) architecture
• SNS performance results
• SN architecture
• Lifetime reliability evaluation

StageNet Slice (SNS) – Decoupled uArch
[Figure: a 5-stage pipeline (Gen PC, Branch Predictor, Fetch, Decode, Issue, Register File, Ex/Mem, WB) with latches, register writeback, branch resolution and bypass paths, next to the SNS, in which the stages (plus a Scoreboard) are separated by double buffers.]

SNS Performance Hit
> 5X slowdown
[Figure: commit time of ten instructions on the 5-stage pipeline versus the SNS pipeline, with a branch (BR) and a register dependency marked.]
Sources of the slowdown:
1. Control stall
2. Data forwarding
3. Transmission delays

1. Control Handling using Stream ID
• Stream-ID: 1 bit to represent the execution path
• Toggled upon a branch misprediction
• Wrong-path instructions are squashed
[Figure: SNS pipeline with a stream-ID (SID) register at the Fetch, Decode/Scoreboard, and Issue/Ex-Mem stages.]

Stream-ID Example
[Figure: on a branch mispredict the stream-ID is toggled, the wrong-path instructions are squashed, and execution continues on the right path; instructions are marked committed or squashed.]

SNS with Stream-ID
[Figure: commit time on the 5-stage pipeline versus the SNS pipeline with stream-ID. Labels: 1. Branch induced stall; 2. Data forwarding; 3. Transmission delays.]

SNS - Challenges and Solutions
1. Control Handling [CASES 08]
   Decentralized control
   Stream-ID takes care of this
2. Data Forwarding
   Reduce feedback links
   Bypass$ emulates data forwarding
   - Store previous results
   - Pass them on to new instructions
3. Transmission Delay
   Conserve bandwidth
   Macro-ops are used to amortize delay
   - Bundles of instructions
   - Increases system utilization

Simulation Infrastructure
• Trimaran Compiler
• Liberty Simulation Environment
Branch predictor:       Global, 16-bit, gshare predictor
Level 1 I/D cache:      4-way, 16KB, 1 cycle latency
Level 2 unified cache:  8-way, 64KB, 5 cycle latency
[Figure: tool flow. Labels: Benchmarks, Trimaran, Rebel Assembler, HPL-PD Assembly, HPL-PD Emulator (FUNCTIONAL), SN Architecture (TIMING), Liberty Simulation Framework.]

Final SNS Performance
[Figure: normalized runtime (axis 0 to 6) for three configurations: SNS + StreamID; SNS + StreamID + Bypass$; SNS + StreamID + Bypass$ + MOPs.]

SNS – Design Summary
1. StreamID – SID registers
2. Bypass$ – Bypass$, Scoreboard
3. Macro-ops – Packer, Buffer sizes
~12% area overhead, ~10% perf. overhead

SN – Architecture
• 5 SNSs combined to form SN
• SN architecture is resilient
  ► Broken stages can be isolated
  ► Crossbar switches are redundant
  ► Interconnection wires are relatively reliable
• Configuration manager acts upon failures
  ► Stage borrowing / lending
  ► Stage sharing

SN – Stage Borrowing
• Pipelines borrow / lend stages to form SNSs
• Exclusive use of stages by SNSs

SN – Stage Sharing
• Allow SNSs to share stages
• Degree of sharing is tunable (2-way, 3-way, ...)
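The effect of stage borrowing can be illustrated with a small counting model. This is a minimal sketch under stated assumptions, not the configuration manager from the talk: the `working` grid layout, the stage names, and all function names are hypothetical, and stage sharing is not modeled. It compares how many complete pipelines survive a set of stage failures in a traditional CMP versus an SN fabric where any working stage of a given type can be lent to any slice.

```python
# Hypothetical model: working[s][t] is True if stage type t of slice s still works.
STAGES = ["fetch", "decode", "issue", "exmem", "wb"]  # assumed stage names


def traditional_cmp_pipelines(working):
    """A conventional core is lost as soon as any one of its stages fails."""
    return sum(all(row) for row in working)


def stagenet_pipelines(working):
    """With borrowing, a working stage of a type can serve any slice,
    so the scarcest stage type bounds the number of complete pipelines."""
    per_type = [sum(row[t] for row in working) for t in range(len(STAGES))]
    return min(per_type)


def example_fabric():
    """5 slices with 3 faults, each in a different slice and stage type."""
    working = [[True] * len(STAGES) for _ in range(5)]
    working[0][0] = False  # slice 0 loses its fetch stage
    working[1][2] = False  # slice 1 loses its issue stage
    working[2][4] = False  # slice 2 loses its writeback stage
    return working


if __name__ == "__main__":
    fabric = example_fabric()
    print(traditional_cmp_pipelines(fabric))  # 2
    print(stagenet_pipelines(fabric))         # 4
```

In this example three faults disable three traditional cores, while the borrowing model still assembles four pipelines because no single stage type has lost more than one instance.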
Lifetime Reliability Experiments
• Monte Carlo study of ~300 lifetime experiments
• Each experiment involves
  ► Assigning a time to failure (TTF) to all the components
  ► Killing components at their failure times
  ► Reconfiguring the system to isolate broken components
  ► Computing instantaneous throughput
• Evaluation for three designs
  ► Traditional CMP
  ► SN + borrowing
  ► SN + borrowing + sharing

SN – Throughput
[Figure: throughput results; annotation: 4X.]

SN – Cumulative Work
[Figure: cumulative work results; annotation: 50%.]

SN Many-core Vision
• SN, as presented, cannot scale to many cores
• How to deploy SN in a 64-core system?
  ► Create SN blocks: an optimal number of cores tied together
  ► Deploy a sparse network between blocks
[Figure: a traditional many-core next to an SN many-core built from SN blocks.]

Conclusions
• Architectural innovations will be crucial in tackling the high failure rates.
• SN is a potential solution
  ► 50% more cumulative work
  ► Low overheads (10% performance, 12% area)
• SNS, a decoupled pipeline microarchitecture, forms its basis
  ► Stream-ID
  ► Bypass$ (not presented)
  ► Macro-ops (not presented)
• Ongoing work
  ► SNS design for aggressive cores
  ► Optimal SN configuration for many-core systems

Thank You
http://cccp.eecs.umich.edu

Back up

SN – Defect Tolerance
[Figure: # faults, Traditional CMP versus StageNet CMP.]

Scoreboard
• Scoreboard to handle RAW dependencies
• Stalls generate backpressure
[Figure: SNS pipeline with the Scoreboard; entry fields: REG ID, Valid.]

Area overhead breakdown
Router area for 32 and 64 bit configurations

Architectural Details

Stage modifications for SNS

2. Bypass$ for data forwarding
• Bypass cache
  - Fully associative structure
  - FIFO replacement policy
• Key benefits
  - Reduced stalls
  - Lower bandwidth consumption
[Figure: SNS pipeline with the Bypass$ at Ex/Mem; entry fields: REG ID, VALUE.]

SNS with Stream-ID, Bypass$
[Figure: commit time on the 5-stage pipeline versus the SNS pipeline. Labels: 2. Data forwarding; 3. Transmission delays.]

3. Transmission delay
• Multiple cycles for instruction transfer
• Low utilization

Hide delay with Macro-ops
• Need to improve utilization
  ► Balance transfer and compute time
• Send instruction bundles: macro-ops (MOP)
  ► Max length 4
  ► Max live-ins 2
  ► Greedy selection policy
• Advantages
  ► Removes temp. intermediates
  ► Parallelizes transfer and compute
[Figure: an example instruction sequence (LD, +, >>, /, &, <<, ST) grouped into macro-ops.]

SNS with Stream-ID, Bypass$, MOP
[Figure: commit time on the 5-stage pipeline versus the SNS pipeline with a Packer added. Label: 3. Transmission delays.]

Tolerating Permanent Faults
Traditional solutions
  ► TMR
  ► Tandem / HP Non-stop
  ► IBM zSeries
…are impractical
  ► Cost
  ► Power
  ► Low gain
Current approach
1. Detection
  ► Using sensors
  ► Redundant computation
2. Diagnosis
  ► BIST
3. Repair
  ► Replacement
  ► Reconfiguration
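The Bypass$ described in the backup slides (a fully associative structure with FIFO replacement that stores previous results and passes them on to new instructions) can be sketched in a few lines. This is an illustrative model only: the class name, method names, and the default capacity are assumptions, since the slides give the organization and replacement policy but no sizes or interfaces.

```python
from collections import OrderedDict


class BypassCache:
    """Fully associative register-value cache with FIFO replacement (illustrative)."""

    def __init__(self, capacity=4):  # capacity is an assumed value
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.entries = OrderedDict()  # reg_id -> value, oldest insertion first

    def record(self, reg_id, value):
        """Store a result produced by the execute stage."""
        if reg_id in self.entries:
            # Overwrite in place: FIFO order follows first insertion, not reuse.
            self.entries[reg_id] = value
            return
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry
        self.entries[reg_id] = value

    def lookup(self, reg_id):
        """Return the cached value, or None if the register is not cached."""
        return self.entries.get(reg_id)


if __name__ == "__main__":
    cache = BypassCache(capacity=2)
    cache.record(1, 10)
    cache.record(2, 20)
    cache.record(3, 30)      # evicts register 1, the oldest entry
    print(cache.lookup(1))   # None
    print(cache.lookup(3))   # 30
```

On a miss, the consumer in this sketch would fall back to the operand value supplied with the instruction. Whether a rewrite of a cached register should refresh its FIFO position is a design choice the slides do not specify; the sketch leaves the position unchanged.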