Concurrent Autonomous Self-Test
for Uncore Components in SoCs
Yanjing Li, Stanford University
Onur Mutlu, Carnegie Mellon University
Donald S. Gardner, Intel Corporation
Subhasish Mitra, Stanford University
1
Overcoming CMOS Reliability Challenges
[Figure: failure rate vs. time, spanning early-life failures, lifetime, and circuit aging]
• Early-life failures: burn-in difficult
• Soft errors: Built-In Soft Error Resilience (BISER)
• Circuit aging: guardbands expensive
• On-line self-test and diagnostics
2
Uncore Components Significant in SoCs
Uncore examples
• Controllers for cache & DRAM
• Crossbar
• I/O interfaces
SoCs shown, with uncore components marked:
• IBM Power 7 (© news.cnet.com)
• NVIDIA Tegra (© techvishal.wordpress.com)
• Cisco Network Processing Engine (© ciscosistemas.org)
3
Robust Uncore Essential
OpenSPARC T2 SoC: 8 cores, 64 threads
• Uncore: 12%
• Processor cores: 12%
• Memories: 76%
© opensparc.net

New on-line self-test for uncore

CASP for processor cores [Li DATE 08, ICCAD 09]

ECC, Memory BIST & repair for memories
4
Challenge 1: High Test Coverage

CASP: Concurrent, Autonomous, Stored Patterns
• High-coverage patterns → off-chip FLASH
• System-level on-line test access
• FLASH cheap, test compression pervasive

                 CASP       Logic BIST   Roving Emulation
Coverage         High       ?            Depends
Cost             Low        High         High
Design effort    Moderate   High         High
5
Challenge 2: Power, Performance, Area Costs

Stall-and-test inadequate
• 4-core Intel® Core™ i7 system results
[Figure: requests from multiple cores pass through caches and interconnects to the DRAM controller; while the DRAM controller is under on-line self-test, multiple cores stall. © intel.com]

Unresponsiveness or system hang
6
Naïve Approaches Inadequate for Uncore
Uncore CASP → new techniques required
• Stall-and-test: small area cost, but unresponsiveness or complete hang
• Spare unit for each uncore type: small performance impact, but 12% area overhead*

* OpenSPARC T2 design
7
New Uncore On-line Self-Test Principles
I. Resource reallocation and sharing (RRS)
II. No-performance-impact testing
III. Smart backup
< 1% area impact, < 3% performance impact
OpenSPARC T2 SoC (©opensparc.net)
8
I. Resource Reallocation and Sharing (RRS)

Components with “similar” functionality in SoCs

Temporary reallocation and sharing
• Small performance hit without replication

OpenSPARC T2, on-line self-test under the CASP controller:
1. Stall and drain requests
2. Transfer dirty lines
3. Invalidate
4. Reroute
[Figure: two groups of 4 cores, crossbar blocks, L2 banks, and the CASP controller. ©opensparc.net]
9
II. No-Performance-Impact Testing

Implication-relations among SoC components

Component(s) tested when idle
• During test of another component

[Figure: OpenSPARC T2 with two groups of 4 cores, crossbar blocks, L2 banks, and the CASP controller; with RRS in effect, a component marked IDLE undergoes on-line self-test. ©opensparc.net]
10
III. Smart Backup

Operations with different requirements

Backup unit for performance-critical operations
• Absolute minimal additional hardware

OpenSPARC T2 I/O interface operations: DMA for network, Programmed I/O, DMA for disks
Handling while the original unit is under test:
• Support in smart backup
• Stall or handle slowly via Programmed I/O
11
Application Performance Impact

Memory-centric: PARSEC benchmarks
• 1.5% performance impact
[Figure: execution time impact for PARSEC benchmarks; y-axis from 0% to 10%]

I/O-centric on 4-core Intel system (4-core Intel® Core™ i7, © intel.com)
• Disk access: 3% impact
• Uncore CASP emulated

No visible unresponsiveness
12
Area and Power Impact
• CASP controller (< 0.01% area)
• On-chip buffer (8 KB)
• Off-chip FLASH: 200 MB
© opensparc.net
Uncore on-line self-test principles applied
• Minimal area impact: < 1%
• Minimal power impact: < 1%
13
Test Results for Uncore Components

200 MB off-chip FLASH
• 10X test compression

7 ms – 300 ms test time per component

Fault model   Total pattern count   Test coverage
Stuck-at      5,577                 99.2% - 99.9%
Transition    11,049                92.8% - 97.8%
Inexpensive FLASH
Thorough on-line self-test
14
Uncore CASP vs. Existing Techniques
Logic BIST
Coverage
High with high costs
Concurrent BIST [Saluja IEEE TCAD 88]
Uncore CASP [This work]
Depends
Area cost
Design complexity
Performance impact
High
Low
High
Low with our uncore principles
High costs possible
Low
Moderate
Low
15
CASP Applicable for Other SoCs
I. RRS
II. No-performance-impact testing
III. Smart backup
IV. Core CASP
Example SoCs:
• IBM Power 7 (© news.cnet.com)
• NVIDIA Tegra (© techvishal.wordpress.com)
• Cisco Network Processing Engine (© ciscosistemas.org)
16
Conclusions

CASP → adaptive on-line self-test & diagnostics

3 new principles for uncore CASP
I. Resource reallocation and sharing (RRS)
II. No-performance-impact testing
III. Smart backup

Effective and practical
• High test coverage
• < 1% power, < 3% performance, < 1% area impact
17
Backup Slides
18
CASP on Actual Intel® Core™ i7 System

Intel Research collaboration

Quad-core Intel® Core™ i7 (3.2 GHz)
• Thermoelectric temperature controller
• Debug tool

Unique real-life experiment

Development of adaptive self-diagnostics
[Figure: test setup showing the temperature controller and the debug tool adapter]
19
CASP Flow
SoC with CASP controller (multi-core SoC proliferation)
Inexpensive off-chip FLASH (non-volatile storage technology)
1. Select uncore or core component
2. Isolate
3. Apply / analyze high-quality test patterns (test compression, at-speed test…)
   • Scan chain
4. Resume operation
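
The four steps of the CASP flow can be sketched as a small simulation. This is only an illustrative sketch, not the CASP controller itself: the `Component` class, `load_patterns`, and `apply_pattern` are hypothetical stand-ins, and keeping a failing component isolated is an assumption made here, not something the slides state.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical stand-in for one core or uncore component."""
    name: str
    faulty: bool = False
    isolated: bool = False
    log: list = field(default_factory=list)


def load_patterns(name):
    # Assumption: real patterns would be streamed from off-chip FLASH.
    return [f"{name}-pattern-{i}" for i in range(3)]


def apply_pattern(component, pattern):
    # Assumption: a real test shifts the pattern through scan chains
    # and compares the captured response with the expected one.
    return not component.faulty


def casp_round(components, index):
    """Run one test round on components[index]; return (name, passed)."""
    target = components[index]            # 1. select core or uncore component
    target.isolated = True                # 2. isolate
    target.log.append("isolated")
    patterns = load_patterns(target.name)
    passed = all(apply_pattern(target, p) for p in patterns)  # 3. apply / analyze
    target.log.append("tested")
    if passed:
        target.isolated = False           # 4. resume operation
        target.log.append("resumed")
    return target.name, passed
```

Calling `casp_round` repeatedly with a rotating index would visit every component in turn while the rest of the system keeps running.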
20
RRS Example: L2 Cache Banks
1. Stall cache controller
2. Drain outstanding requests
3a. Invalidate clean blocks; invalidate directory; invalidate L1
3b. Transfer necessary states (dirty blocks); write back to main memory if necessary
4. Route packets with destination {bank 0, bank 1} to bank 1
[Figure: crossbar above Bank 0 (under test) and Bank 1 (helper), each with tag, data, controller, etc.; DRAM Controller 0 below]
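
The RRS sequence for an L2 bank can be sketched in a few lines. This is a minimal model under stated assumptions: the `Bank` class, the `helper_capacity` limit, and the dictionary-based `route` and `memory` are hypothetical, and directory and L1 invalidation are omitted for brevity.

```python
class Bank:
    """Hypothetical L2 bank: maps address -> (value, dirty flag)."""

    def __init__(self, name):
        self.name = name
        self.lines = {}
        self.pending = []     # outstanding write requests: (addr, value)
        self.stalled = False


def reallocate(under_test, helper, route, memory, helper_capacity=4):
    """Hand the traffic of `under_test` over to `helper` before testing."""
    under_test.stalled = True                       # 1. stall cache controller
    while under_test.pending:                       # 2. drain outstanding requests
        addr, value = under_test.pending.pop(0)
        under_test.lines[addr] = (value, True)
    for addr, (value, dirty) in list(under_test.lines.items()):
        if dirty:
            if len(helper.lines) < helper_capacity:
                helper.lines[addr] = (value, True)  # 3b. transfer dirty block
            else:
                memory[addr] = value                # 3b. write back if necessary
        del under_test.lines[addr]                  # 3a. invalidate
    route[under_test.name] = helper.name            # 4. reroute packets to helper
    return route
```

After `reallocate` returns, the bank under test holds no state, so scan-based patterns can overwrite its contents without losing data.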
21
No-Performance-Impact Testing Example: CCX (Crossbar)
• Packets reallocated to helper
• Test at the same time
[Figure: 8 cores (64 threads) connect through CCX multiplexers and arbitration logic 0 … 7, each with separate scan chains, to L2 Bank 0 … L2 Bank 7]
22
Smart Backup Example: Non-Cacheable Unit
1. Stall
2. Drain outstanding requests
3. Turn on
4. Transfer states
5. Select outputs from backup
[Figure: original unit (under test) with PIO, boot ROM interface, config. status register interface, and interrupt processing; backup unit with PIO, reset, interrupt status table, and interrupt processing; a MUX selects between their outputs]
Minimize area costs at acceptable performance impact
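
The output selection in step 5 can be sketched as a dispatcher that sends supported operations to the backup and holds back the rest. This is a sketch under assumptions: the class `SmartBackupMux`, the operation names, and the choice of which operations the backup supports are hypothetical, based on one reading of the figure.

```python
from collections import deque


class SmartBackupMux:
    """Hypothetical selector between an original unit and its smaller backup."""

    def __init__(self, backup_ops):
        self.backup_ops = set(backup_ops)  # operations the backup can serve
        self.under_test = False
        self.stalled = deque()             # operations held until the test ends

    def start_test(self):
        # Assumes stall, drain, turn-on, and state transfer have already
        # happened; from now on outputs are selected from the backup.
        self.under_test = True

    def dispatch(self, op):
        if not self.under_test:
            return "original"
        if op in self.backup_ops:
            return "backup"
        self.stalled.append(op)
        return "stalled"

    def end_test(self):
        """Switch back to the original unit and return operations to replay."""
        self.under_test = False
        replay = list(self.stalled)
        self.stalled.clear()
        return replay
```

Keeping `backup_ops` small is what keeps the area cost low: only performance-critical operations need hardware in the backup, while the others wait for the test to finish.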
23
Naïve Approaches Inadequate for Uncore

Simple stall-and-test technique
Demonstration on actual 4-core Intel® Core™ i7 system
[Figure: OS timer interrupt handlers on core 1 … core i issue requests to DRAM and stall while the DRAM controller is under test]
• Infrequent test: noticeable unresponsiveness
• Frequent test: system hang

Identical backup units: 12% area overhead
24
Performance Impact
Tool: GEMS simulator (modified for RRS)
Workload: PARSEC benchmark suite
4 threads on 4 cores, CASP runs 1 sec. every 10 sec.
[Figure: simulated latency overhead (PARSEC benchmark suite) for 1, 2, and 4 threads; y-axis from 0.0% to 1.5%]
25
III. Smart Backup

Operations with different requirements

Backup unit for performance-critical operations
• Absolute minimal additional hardware

[Figure: OpenSPARC T2 network interface and I/O interface]
• Network interface: Ethernet port interface, Layer 2 packet process, Layers 3 and 4 acceleration
• I/O interface: DMA for network, Programmed I/O, DMA for disks
• Handling while the original unit is under test: support in smart backup; stall or handle slowly via Programmed I/O; OS orchestration
26