sgupta-micro08

The StageNet Fabric for Constructing
Resilient Multicore Systems
Shantanu Gupta, Shuguang Feng, Amin Ansari, Jason
Blome and Scott Mahlke
University of Michigan, Ann Arbor
University of Michigan
Electrical Engineering and Computer Science
Journey of Silicon Technology
[Figure: CPU performance (log scale) versus year, 1985 to 2015. Processors plotted: 486, Pentium, Pentium II, Pentium III, Pentium 4, Core Duo, Core 2 Quad, Cell. Annotations: perfect transistors, memory redundancy, IBM z servers, rising variability and defects, unreliable silicon.]
Reliability Threats
• Transient Faults
• Hard Faults (manufacturing defects and device wear-out)
• Manufacturing Defects That Escape Testing (inefficient burn-in testing)
• Parametric Variability (uncertainty in device and environment)
[Figure: transistor cross-section labeled Source, Gate, Drain, N+, and P, with + and - charge marks.]
[Figure: intra-die variations in ILD thickness.]
[Figure: thermal runaway loop linking increased heating, higher transistor leakage, and higher power dissipation.]
[Todd Austin, GSRC Sep 08]
Goal of this Research
• Reliability is developing into a first-class design constraint
• Design a computing substrate
  ► Provides scalable fault tolerance
  ► Highly reconfigurable
  ► Marginal overheads
Enable CMP designs capable of facing 100s of faults while maintaining useful throughput
Reconfiguration Granularity
For 100% area overhead (redundancy)
[Figure: a FETCH, DEC, EXEC, MEM, WB pipeline shown at three reconfiguration granularities, with arrows labeled "Better resource utilization" and "Lower complexity".]
• CORE level: 100% MTTF ↑
• STAGE level: 170% MTTF ↑
• MODULE level: 200% MTTF ↑
Trade-offs listed on the slide (in slide order):
-- Poor MTTF gains
+ Easy to implement
+ Good MTTF gains
+ Circuit / Architectural boundary
+ Full coverage
+ Best MTTF gains
-- Complex implementation
Related work cited on the slide:
• ElastIC, DT'06
• Reunion, MICRO'06
• Configurable Isolation, ISCA'07
• Online Diagnosis of Hard Faults, MICRO'05
• Ultra Low-Cost Defect Protection, ASPLOS'06
CMP Fabric
[Figure: four cores (Core 0 to Core 3), each a pipeline of Stage1, Stage2, Stage3, ..., StageN with latches between stages.]
StageNet (SN) Fabric
[Figure: four rows of Stage1, Stage2, Stage3, ..., StageN connected through crossbar switches. One row is marked as a StageNet Slice (SNS). A Configuration Manager oversees the fabric.]
Wearout Sensors
• Delay
• Temperature
• Current
SN – Benefits
[Figure: the StageNet fabric of four Stage1 ... StageN rows with the Configuration Manager.]
Outline
• SN Slice (SNS) architecture
• SNS performance results
• SN architecture
• Lifetime Reliability Evaluation
StageNet Slice (SNS) – Decoupled uArch
[Figure: 5 stage pipeline with Gen PC, Branch Predictor, Fetch, Decode, Issue, Register File, Ex/Mem, and WB, with latches between stages and paths labeled branch resolution, bypass, and register wb.]
[Figure: SNS with Gen PC, Branch Predictor, Fetch, Decode, Issue, Register File, Scoreboard, and Ex/Mem, with double buffers between stages.]
SNS Performance Hit
[Figure: the 5 stage pipeline (latches; branch resolution, bypass, and register wb paths) next to the SNS pipeline (double buffers, Scoreboard), with a commit-time chart for instructions 1 to 10 on each. Chart labels: BR, register dependency.]
> 5X slowdown
1. Control stall
2. Data forwarding
3. Transmission delays
1. Control Handling using Stream ID
[Figure: SNS pipeline (Gen PC, Branch Predictor, Fetch, Decode, Issue, Register File, Scoreboard, Ex/Mem, double buffers) with SID registers, initially 0, added to the stages.]
• Stream-ID: a 1-bit tag that represents the execution path
• Toggled when a branch is mispredicted
• Wrong-path instructions are squashed
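The Stream-ID bullets can be made concrete with a small model. The sketch below is only an illustration under stated assumptions: the class names (`FetchStage`, `ExecuteStage`), the `redirect` call, and the idea that Ex/Mem toggles its own SID and then informs Fetch are hypothetical choices, not details taken from the slides. It shows the core idea only: instructions carry the 1-bit SID they were fetched under, and a stage squashes anything whose tag no longer matches.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    sid: int                         # 1-bit stream-id attached at fetch
    mispredicted_branch: bool = False


class FetchStage:
    """Tags every fetched instruction with the current stream-id."""

    def __init__(self):
        self.sid = 0

    def fetch(self, name, mispredicted_branch=False):
        return Instr(name, self.sid, mispredicted_branch)

    def redirect(self, new_sid):
        # Hypothetical redirect message sent by Ex/Mem after a mispredict.
        self.sid = new_sid


class ExecuteStage:
    """Commits matching-SID instructions and squashes the rest."""

    def __init__(self):
        self.sid = 0
        self.committed = []
        self.squashed = []

    def execute(self, instr, fetch):
        if instr.sid != self.sid:
            self.squashed.append(instr.name)   # wrong-path instruction
            return
        self.committed.append(instr.name)
        if instr.mispredicted_branch:
            self.sid ^= 1                      # toggle the stream-id
            fetch.redirect(self.sid)


def run_example():
    fetch, execute = FetchStage(), ExecuteStage()
    # I1, a mispredicted branch, and two wrong-path instructions are
    # already in flight, all tagged with SID 0.
    in_flight = [fetch.fetch("I1"), fetch.fetch("BR", True),
                 fetch.fetch("W1"), fetch.fetch("W2")]
    for instr in in_flight:
        execute.execute(instr, fetch)
    # Right-path instructions are fetched after the toggle, so they carry SID 1.
    for name in ("R1", "R2"):
        execute.execute(fetch.fetch(name), fetch)
    return execute.committed, execute.squashed
```

In this model no global flush signal is needed: the stale tag alone identifies wrong-path work, which is what lets each stage make the squash decision locally.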
Stream-ID Example
[Figure: step-by-step example on the Fetch, Decode, Issue, Ex/Mem pipeline with Gen PC, Branch Predictor, Register File, and Scoreboard. A branch tagged with SID 0 (BR 0) mispredicts, the Stream-ID is toggled from 0 to 1, the wrong-path instructions are squashed, and execution continues on the right path (BR 1). Legend: committed, squashed.]
SNS with Stream-ID
[Figure: SNS pipeline with SID registers, and commit-time charts for instructions 1 to 10 on the 5 stage pipeline and the SNS pipeline. Chart labels: BR, register dependency, 1. Branch induced stall, 2. Data forwarding, 3. Transmission delays.]
SNS - Challenges and Solutions
1. Control Handling
[CASES 08]
Decentralized Control
Stream-ID takes care of this
2. Data Forwarding
Reduce Feedback Links
Bypass$ emulates data forwarding
- Store previous results
- Pass them on to new instructions
3. Transmission Delay
Conserve Bandwidth
Macro-ops are used to amortize delay
- Bundles of instructions
- Increases system utilization
Simulation Infrastructure
• Trimaran Compiler
• Liberty Simulation Environment
Branch predictor: Global, 16-bit, gshare predictor
Level 1 I/D cache: 4-way, 16KB, 1 cycle latency
Level 2 unified cache: 8-way, 64KB, 5 cycle latency
[Figure: tool flow with boxes labeled Benchmarks, Trimaran, Rebel, Assembler, HPL-PD Assembly, HPL-PD Emulator (FUNCTIONAL), and SN Architecture (TIMING), the last two within the Liberty Simulation Framework.]
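For readers unfamiliar with the branch predictor named in the configuration, the following is a minimal sketch of a generic gshare predictor with a 16-bit global history. The 2-bit saturating counters, their initial value, and the table size of 2^16 entries are assumptions made for illustration; the slides give only "Global, 16-bit, gshare".

```python
class Gshare:
    """Generic gshare: index = (PC XOR global history), 2-bit counters."""

    def __init__(self, history_bits=16):
        self.mask = (1 << history_bits) - 1
        self.history = 0
        # Assumed: 2-bit saturating counters, initialised to weakly not-taken.
        self.counters = [1] * (1 << history_bits)

    def _index(self, pc):
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

A branch that is always taken is predicted taken once the history register has filled with ones and the corresponding counter has been trained.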
Final SNS Performance
[Figure: normalized runtime (axis from 0 to 6) for three configurations: SNS + StreamID, SNS + StreamID + Bypass$, and SNS + StreamID + Bypass$ + MOPs.]
SNS – Design Summary
1. StreamID
   – SID registers
2. Bypass$
   – Bypass$, Scoreboard
3. Macro-ops
   – Packer, Buffer sizes
~12% area overhead, ~10% perf. overhead
[Figure: final SNS pipeline with Gen PC, Branch Predictor, Fetch, Packer, Decode, Issue, Register File, Scoreboard, Bypass $, Ex/Mem, SID registers, and double buffers.]
SN – Architecture
• 5 SNSs combined to form SN
• SN architecture is resilient
  ► Broken stages can be isolated
  ► Crossbar switches are redundant
  ► Interconnection wires are relatively reliable
• Configuration manager acts upon failures
  ► Stage borrowing / lending
  ► Stage sharing
SN – Stage Borrowing
• Pipelines borrow / lend stages to form SNSs
• Exclusive use of stages by SNSs
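A rough sketch of stage borrowing follows, assuming the configuration manager knows which stages are still healthy. The function names and the simple "pair up healthy stages in index order" policy are hypothetical; the slides only state that pipelines borrow and lend stages and that each stage is used exclusively by one SNS.

```python
def intact_pipelines(healthy):
    """Pipelines usable without borrowing: every one of their stages works.

    healthy[p][s] is True if stage s of pipeline p still works.
    """
    return [p for p, row in enumerate(healthy) if all(row)]


def form_slices(healthy):
    """Form logical SNSs by borrowing healthy stages across pipelines.

    Returns one list per SNS; entry s names the pipeline that lends stage s.
    Each physical stage appears in at most one SNS (exclusive use).
    """
    num_stages = len(healthy[0])
    donors = [[p for p, row in enumerate(healthy) if row[s]]
              for s in range(num_stages)]
    # The scarcest stage type limits how many complete SNSs can be formed.
    num_slices = min(len(d) for d in donors)
    return [[donors[s][k] for s in range(num_stages)]
            for k in range(num_slices)]
```

For example, with four 3-stage pipelines where pipeline 0 has lost stage 0 and pipeline 1 has lost stage 1, only two pipelines remain intact, while borrowing still yields three complete slices.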
SN – Stage Sharing
• Allow SNSs to share stages
• Degree of sharing is tunable (2-way, 3-way..)
Lifetime Reliability Experiments
• Monte Carlo simulation of ~300 lifetime experiments
• Each experiment involves
  ► Assigning a TTF to all the components
  ► Killing components at their failure times
  ► Reconfiguring system to isolate broken components
  ► Computing instantaneous throughput
• Evaluation for three designs
  ► Traditional CMP
  ► SN + borrowing
  ► SN + borrowing + sharing
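The experiment loop described above can be sketched as follows. This is a simplified illustration, not the authors' simulator: the exponential TTF distribution, the `mean_ttf` value, and the throughput models are assumptions, and stage sharing, crossbar failures, and the SN performance overhead are not modeled. Throughput counts working pipelines: fully intact cores for the traditional CMP, and the number of complete slices that borrowing can form for SN.

```python
import random


def throughput_cmp(alive):
    # A traditional core works only if all of its own stages are alive.
    return sum(all(core) for core in alive)


def throughput_sn(alive):
    # With borrowing, the scarcest stage type limits the number of SNSs.
    return min(sum(core[s] for core in alive) for s in range(len(alive[0])))


def one_lifetime(num_cores, num_stages, mean_ttf, rng):
    # Assign a time to failure (TTF) to every stage; exponential is assumed.
    events = sorted(
        (rng.expovariate(1.0 / mean_ttf), c, s)
        for c in range(num_cores)
        for s in range(num_stages)
    )
    alive = [[True] * num_stages for _ in range(num_cores)]
    now, work_cmp, work_sn = 0.0, 0.0, 0.0
    for t, c, s in events:
        dt = t - now
        # Throughput is constant between failures; accumulate work done.
        work_cmp += throughput_cmp(alive) * dt
        work_sn += throughput_sn(alive) * dt
        alive[c][s] = False          # kill the component at its failure time
        now = t
    return work_cmp, work_sn


def monte_carlo(trials=300, num_cores=4, num_stages=5, mean_ttf=10.0, seed=0):
    rng = random.Random(seed)
    total_cmp = total_sn = 0.0
    for _ in range(trials):
        w_cmp, w_sn = one_lifetime(num_cores, num_stages, mean_ttf, rng)
        total_cmp += w_cmp
        total_sn += w_sn
    return total_cmp / trials, total_sn / trials
```

Calling `monte_carlo()` returns the average cumulative work of the two designs in arbitrary time units. Under this model the SN figure can never fall below the CMP figure, because every intact core also counts toward every stage type.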
SN – Throughput
4X
SN – Cumulative Work
50%
SN Many-core Vision
• SN, as presented, cannot scale to many cores
• How to deploy SN in a 64 core system?
  ► Create SN blocks – optimal # cores tied together
  ► Deploy a sparse network b/w blocks
[Figure: a traditional many-core next to an SN many-core drawn as a 4 x 4 grid of SN blocks.]
Conclusions
• Architectural innovations will be crucial in tackling the high failure rates
• SN is a potential solution
  ► 50% more cumulative work
  ► Low overheads (10% performance, 12% area)
• SNS, a decoupled pipeline microarchitecture, forms its basis
  ► Stream-ID
  ► Bypass$ (not presented)
  ► Macro-ops (not presented)
• Ongoing work
  ► SNS design for aggressive cores
  ► Optimal SN configuration for many-core systems
Thank You
http://cccp.eecs.umich.edu
Back up
SN – Defect Tolerance
[Figure: a Traditional CMP and a StageNet CMP compared as the # Faults grows; the numeric labels are not recoverable from the extraction.]
Scoreboard
[Figure: SNS pipeline (Gen PC, Branch Predictor, Fetch, Decode, Issue, Register File, Ex/Mem, double buffers) with a Scoreboard table whose columns are REG ID and Valid.]
• Scoreboard to handle RAW dependencies
• Stalls generate backpressure
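As an illustration of how a valid-bit scoreboard resolves RAW dependencies, here is a minimal sketch. The method names and the choice to clear the destination's valid bit at issue and set it again at writeback are assumptions based on the common textbook scheme, not details confirmed by the slide.

```python
class Scoreboard:
    """One valid bit per register; a cleared bit means a write is pending."""

    def __init__(self, num_regs):
        self.valid = [True] * num_regs

    def can_issue(self, srcs):
        # RAW check: every source register must hold an up-to-date value.
        return all(self.valid[r] for r in srcs)

    def issue(self, dest, srcs):
        """Return True if issued, False if the instruction must stall."""
        if not self.can_issue(srcs):
            return False             # the stall propagates as backpressure
        if dest is not None:
            self.valid[dest] = False  # result is now pending
        return True

    def writeback(self, dest):
        self.valid[dest] = True
```

A consumer of a pending register is refused until the producer writes back, which is the stall that generates backpressure toward earlier stages.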
Area overhead breakdown
Router area for 32 and 64 bit configurations
Architectural Details
Stage modifications for SNS
2. Bypass$ for data forwarding
[Figure: SNS pipeline with SID registers and a Bypass $ table whose columns are REG ID and VALUE.]
• Bypass Cache
  - Fully associative structure
  - FIFO replacement policy
• Key benefits
  - Reduced stalls
  - Lower bandwidth consumption
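A small sketch of a fully associative cache with FIFO replacement, keyed by register id, is shown below. The capacity of 4 entries and the decision to update an existing entry in place without changing its FIFO position are assumptions for illustration; the slides specify only the associativity and the replacement policy.

```python
from collections import OrderedDict


class BypassCache:
    """Fully associative REG ID -> VALUE store with FIFO replacement."""

    def __init__(self, capacity=4):
        self.capacity = capacity          # hypothetical size
        self.entries = OrderedDict()      # insertion order is the FIFO order

    def insert(self, reg, value):
        if reg in self.entries:
            # Assumed: refresh the value, keep the original FIFO position.
            self.entries[reg] = value
            return
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict the oldest entry
        self.entries[reg] = value

    def lookup(self, reg):
        """Return the cached value, or None on a miss."""
        return self.entries.get(reg)
```

A hit lets a dependent instruction obtain a recent result locally, which is how such a cache can stand in for forwarding paths.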
SNS with Stream-ID, Bypass$
[Figure: SNS pipeline with SID registers and Bypass $, and commit-time charts for instructions 1 to 10 on the 5 stage pipeline and the SNS pipeline. Chart labels: BR, register dependency, 2. Data forwarding, 3. Transmission delays.]
3. Transmission delay
[Figure: SNS pipeline with SID registers and Bypass $.]
Multiple cycles for instruction transfer → Low utilization
Hide delay with Macro-ops
• Need to improve utilization
  ► Balance transfer and compute time
• Send instruction bundles
  ► Max length 4
  ► Max live-ins 2
Macro-ops (MOP)
Greedy selection policy
• Advantages
  ► Removes temp. intermediates
  ► Parallelizes transfer and compute
[Figure: dataflow graph of LD, +, /, &, >>, <<, and ST operations grouped into macro-ops.]
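The bundle limits above (max length 4, max live-ins 2) and the greedy policy can be illustrated with a simple packer. This is a sketch under assumptions: it only bundles consecutive instructions in program order, it treats a live-in as any source register not produced earlier in the same bundle, and the `Instr` type is hypothetical. The actual selection policy in the design may differ.

```python
from dataclasses import dataclass
from typing import Optional

MAX_LEN = 4        # "Max length 4" from the slide
MAX_LIVE_INS = 2   # "Max live-ins 2" from the slide


@dataclass(frozen=True)
class Instr:
    op: str
    dest: Optional[str]
    srcs: tuple


def live_ins(bundle):
    """Source registers that must be supplied from outside the bundle."""
    produced, needed = set(), set()
    for ins in bundle:
        needed.update(s for s in ins.srcs if s not in produced)
        if ins.dest is not None:
            produced.add(ins.dest)
    return needed


def pack_greedy(instrs):
    """Greedily grow each macro-op until a limit would be exceeded."""
    bundles, current = [], []
    for ins in instrs:
        candidate = current + [ins]
        too_big = (len(candidate) > MAX_LEN
                   or len(live_ins(candidate)) > MAX_LIVE_INS)
        if current and too_big:
            bundles.append(current)    # close the bundle, start a new one
            current = [ins]
        else:
            current = candidate
    if current:
        bundles.append(current)
    return bundles
```

Values produced and consumed inside one bundle never count as live-ins, which matches the stated advantage of removing temporary intermediates from inter-stage traffic.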
SNS with Stream-ID, Bypass$, MOP
[Figure: SNS pipeline with SID registers, Bypass $, and a Packer, and commit-time charts for instructions 1 to 10 on the 5 stage pipeline and the SNS pipeline. Chart labels: BR, register dependency, 3. Transmission delays.]
Tolerating Permanent Faults
Traditional solutions
  ► TMR
  ► Tandem / HP Non-stop
  ► IBM zSeries
…are impractical
  ► Cost
  ► Power
  ► Low gain
Current approach
1. Detection
2. Diagnosis
   Techniques listed for steps 1 and 2: using sensors, redundant computation, BIST
3. Repair
  ► Replacement
  ► Reconfiguration
K-pos DP-31/32