ICCAD'03 Review

CSE 597B
Lin Li
Outline

Overview
  Archive download URL
  Best paper award
  Paper from our group
  Interesting tutorial
Papers in related areas
  Power and energy optimization
  Interconnect-centric SoC design
  Reliability issues
  Performance optimization
  Simulation at the nanometer scale
Other areas in ICCAD
Archive Download URL

Papers and presentation slides can be
downloaded from:
http://www.iccad.com/archive.html
Best Paper Award

6C.1 - Noise Analysis for Optical Fiber
Communication Systems



Alper Demir
Koç University, Sariyer-Istanbul, Turkey
8B.1 - Block-Based Static Timing Analysis with
Uncertainty


Anirudh Devgan, Chandramouli Kashyap
IBM Research at Austin, IBM Microelectronics
Paper from Our Group

1A.1 - Adaptive Error Protection for Energy
Efficiency


Lin Li, N. Vijaykrishnan, Mahmut Kandemir, Mary
Jane Irwin
3C.1 - Array Composition and Decomposition
for Optimizing Embedded Applications

Guilin Chen, Mahmut Kandemir, Ugur Sezer, Avanti
Nadgir
Interesting Tutorial

2C.1 - Design and CAD Challenges in sub-90nm CMOS Technology



Kerry Bernstein, Ching-Te Chuang, Rajiv V. Joshi,
Ruchir Puri
IBM T.J. Watson
11B.1 - Formal Methods for Dynamic Power
Management


Rajesh K. Gupta, Sandeep Shukla, Sandy Irani
UCSD, UCI, and VT
2C.1 - Design and CAD Challenges in sub-90nm
CMOS Technology

Introduction
  CMOS device scaling
  New devices for high-performance logic
Planar device structures
  Partially-depleted (PD) SOI
  Fully-depleted (FD) SOI
  Strained-Si & high-k gate
Emerging technologies
  Double-gate MOSFETs
  3D integration and interconnects
  Carbon Nanotube Transistor (CNT)
  Molecular computing
CAD challenges
  Challenges of advanced device technologies
Major issues
  Power crisis
  Coping with variability
11B.1 - Formal Methods for Dynamic Power
Management



Overview of the formal methods that have been
explored in solving the system-level Dynamic
Power Management (DPM) problem.
Show how formal reasoning frameworks can
unify apparently disparate DPM techniques.
Approaches that treat the DPM problem as
one of stochastic optimization with
probabilistic guarantees on performance.
Power and Energy Optimization




Using dynamic voltage scaling in embedded
systems (Section 1B)
Using software techniques in embedded
systems (Section 3C)
Energy issues in systems design (Section 7B)
Power-aware design (Section 8C)
1B.1 - Generalized Network Flow Techniques
for Dynamic Voltage Scaling in Hard Real-Time
Systems



Vishnu Swaminathan, Krishnendu
Chakrabarty
ECE@Duke
Energy consumption must be carefully
balanced with real-time responsiveness in
hard real-time systems.
Present an optimal offline dynamic voltage
scaling (DVS) scheme for dynamic power
management in such systems.
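Generalized Network Flow
Models for the DVS problem
[Figure: network from source s to sink t. Job nodes j1 ... jn connect to speed nodes (s1h, s1l, s1i, ..., snh, snl, sni), which connect to interval nodes D1, D2, ..., D2n-2, D2n-1. Each edge (i, j) is labeled with lij, uij, Cij, mij.]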
1B.2 - Approaching the Maximum Energy
Saving on Embedded Systems with Multiple
Voltages


Shaoxiong Hua, Gang Qu
ECE@UMCP
For a multiple-voltage DVS system to serve a
set of applications {(ei, di, pi): i=1, 2, …, n}
without missing their deadlines,


if the system has m voltages {v1, v2,… ,vm},
determine the value of each vi to minimize the
energy consumption.
determine m and the value of each vi .
1B.2 - Approaching the Maximum Energy
Saving on Embedded Systems with Multiple
Voltages (Cont’d)

Voltage set-up is the fundamental problem for
multiple-voltage DVS systems.
  Application-specific
  2-voltage DVS system: analytic solutions and a
  linear search algorithm
  m-voltage DVS system: no analytic solution
  exists; an approximation method is used
Multiple-voltage DVS can come very close to the
maximal energy saving achievable by DVS.
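
One way to see why a small voltage set can get close to ideal DVS: a task whose ideal speed falls between two available speeds can run part of its cycles at each, so that it still finishes exactly at its deadline. The sketch below is an illustration under a simplified model that is not taken from the paper: speed is assumed proportional to voltage, and energy per cycle proportional to the square of the voltage. The function names and the numbers are hypothetical.

```python
def split_cycles(cycles, deadline, s_lo, s_hi):
    """Split `cycles` between two speeds so the task ends exactly at `deadline`.

    Solves c_lo + c_hi = cycles and c_lo / s_lo + c_hi / s_hi = deadline.
    Returns (c_lo, c_hi).
    """
    ideal = cycles / deadline
    if not (s_lo <= ideal <= s_hi):
        raise ValueError("ideal speed must lie between the two available speeds")
    if s_lo == s_hi:
        return float(cycles), 0.0
    c_lo = s_lo * (s_hi * deadline - cycles) / (s_hi - s_lo)
    return c_lo, cycles - c_lo


def energy(cycles_at_speed):
    """Total energy for a list of (cycles, speed) pairs.

    Assumption: voltage scales linearly with speed, energy per cycle ~ v**2.
    """
    return sum(c * s ** 2 for c, s in cycles_at_speed)


# Hypothetical task: 60 cycles due at time 100, so the ideal speed is 0.6.
ideal_energy = energy([(60, 0.6)])
full_speed_energy = energy([(60, 1.0)])

# Two voltages {0.5, 1.0}.
lo2, hi2 = split_cycles(60, 100, 0.5, 1.0)
two_level_energy = energy([(lo2, 0.5), (hi2, 1.0)])

# Three voltages {0.5, 0.75, 1.0}: the task uses the pair around 0.6.
lo3, hi3 = split_cycles(60, 100, 0.5, 0.75)
three_level_energy = energy([(lo3, 0.5), (hi3, 0.75)])
```

Under these assumptions the energy drops from full speed, to two voltages, to three voltages, and stays above the continuously variable ideal. How the voltage values themselves should be chosen is the set-up problem the paper addresses.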
1B.3 - Combined Dynamic Voltage Scaling and
Adaptive Body Biasing for Heterogeneous
Distributed Real-Time Embedded Systems


Le Yan, Jiong Luo, Niraj K. Jha
EE@Princeton
New scheduling algorithm that combines DVS
and adaptive body biasing (ABB) to
simultaneously optimize both dynamic power
consumption and leakage power consumption
for real-time distributed embedded systems.
1B.3 - Combined Dynamic Voltage Scaling and
Adaptive Body Biasing for Heterogeneous
Distributed Real-Time Embedded Systems

A novel two-phase approach
Phase I
Optimal tradeoff between supply and threshold voltages
Phase II
Trade off energy consumption and clock period
1B.3 - Combined Dynamic Voltage Scaling and
Adaptive Body Biasing for Heterogeneous
Distributed Real-Time Embedded Systems
[Flowchart: slack allocation]
Initializations
Phase I
Extensible tasks exist? No: Return. Yes: go to Phase II.
Phase II
  Allocate slack to reference task
  (reference task: highest energy_derivative)
  Allocate slack to each other task
  (energy_derivative: higher than reference level)
  EST+WCET>LFT? Yes: invalidate this slack allocation. No: continue.
3C.3 - Energy Optimization of Distributed
Embedded Processors by Combined Data
Compression and Functional Partitioning

Jinfeng Liu, Pai H. Chou
ECE@UCI
Goal
  Energy minimization for distributed embedded
  processors
Combined optimization
  Selection of optimal compression algorithm
  Functional partitioning
3C.3 - Energy Optimization of Distributed
Embedded Processors by Combined Data
Compression and Functional Partitioning
[Figure: schedules of nodes N1 and N2 (RECV, PROC, COMP, DECO, SEND, and IDLE slots against deadline D) at 150MHz and at 80MHz.]
Non-optimal without compression (150MHz):
a bad partitioning scheme that produces extra I/O load,
without compression
Optimal with compression (80MHz):
however, it could turn out optimal with compression,
if the data from N1 to N2 can be compressed well.
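
The trade-off in this figure can be put into numbers with a small model: compression adds computation but shortens I/O, and the time saved lets the processor meet the same deadline at a lower clock. The sketch below is only an illustration. The energy model (energy per cycle proportional to f squared, constant I/O power, idle energy ignored), the constants `k` and `p_io`, and the workload figures are all assumptions, chosen so that the required clocks come out at the 150 MHz and 80 MHz shown on the slide.

```python
def min_frequency(cycles, io_bytes, bandwidth, deadline):
    """Lowest clock frequency (Hz) that fits computation and I/O in one period."""
    io_time = io_bytes / bandwidth      # I/O time does not depend on the CPU clock
    compute_time = deadline - io_time   # what is left for computation
    if compute_time <= 0:
        raise ValueError("I/O alone exceeds the deadline")
    return cycles / compute_time


def node_energy(cycles, io_bytes, bandwidth, deadline, k=1e-25, p_io=0.1):
    """Return (frequency, energy) for one period under the assumed model.

    Assumed model: energy per cycle = k * f**2 (voltage scaled with frequency),
    I/O power = p_io watts while transferring, idle energy ignored.
    """
    f = min_frequency(cycles, io_bytes, bandwidth, deadline)
    return f, k * cycles * f ** 2 + p_io * io_bytes / bandwidth


# Hypothetical workload: 60M cycles of processing, 600 kB of I/O per 1 s period.
plain = node_energy(cycles=60e6, io_bytes=600_000, bandwidth=1e6, deadline=1.0)

# Same workload with 3:1 compression that costs 4M extra cycles.
packed = node_energy(cycles=64e6, io_bytes=200_000, bandwidth=1e6, deadline=1.0)
```

With these made-up numbers the compressed schedule runs at the lower clock and uses less energy even though it executes more cycles. If the data compressed poorly, the extra cycles could outweigh the I/O savings, which is why the choice of compression algorithm and the partitioning are optimized together.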
3C.4 - Energy-Aware Fault Tolerance in
Fixed-Priority Real-Time Embedded Systems




Ying Zhang, Krishnendu Chakrabarty, Vishnu
Swaminathan
ECE@Duke
Goal: low power, fault-tolerant real-time
systems
Fault tolerance is achieved via checkpointing
Power management is carried out using
dynamic voltage scaling (DVS).
7B.1 - A Game Theoretic Approach to Dynamic
Energy Minimization in Wireless Transceivers




Ali Iranli, Hanif E. Fatemi, Massoud Pedram
EE@USC
A hierarchical formulation for energy optimization of
wireless transceivers is proposed
A game theoretic approach to solve this energy
minimization is proposed, by which the energy
consumption is reduced by 15% for BER = 10^-5
The proposed hierarchical framework can be used in
general for energy optimization of server-client
systems
7B.1 - A Game Theoretic Approach to Dynamic
Energy Minimization in Wireless Transceivers
Transceiver Energy Optimization as a Stackelberg Game
Leader: Transmitter
  Leader's policy: transmit power & modulation level
  Leader's cost function: overall energy consumption
Follower: Receiver
  Follower's policy: truncation length
  Follower's cost function: receiver's energy consumption
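
A Stackelberg game over small discrete action sets can be solved by enumeration: for each leader action, find the follower's best response, then let the leader pick the action whose outcome is cheapest for the leader. The sketch below shows only that structure. The cost functions, the feasibility rule standing in for the BER requirement, and all numbers are hypothetical placeholders, not the models used in the paper.

```python
def stackelberg(leader_actions, follower_actions, leader_cost, follower_cost, feasible):
    """Return (leader_action, follower_action, leader_cost) at the Stackelberg solution.

    The follower minimizes its own cost given the leader's action; the leader
    anticipates that response and minimizes its own cost over its actions.
    """
    best = None
    for a in leader_actions:
        options = [b for b in follower_actions if feasible(a, b)]
        if not options:
            continue  # no follower action meets the constraint for this leader action
        b = min(options, key=lambda x: follower_cost(a, x))
        cost = leader_cost(a, b)
        if best is None or cost < best[2]:
            best = (a, b, cost)
    return best


# Toy instance (all values are made up for illustration).
powers = [1, 2, 3, 4]        # leader: transmit power levels
lengths = [10, 20, 30, 40]   # follower: truncation lengths


def meets_ber(p, length):
    # Placeholder for the BER target: more power or a longer truncation both help.
    return p * length >= 60


def receiver_energy(p, length):
    # Follower's cost: grows with truncation length.
    return 0.05 * length


def overall_energy(p, length):
    # Leader's cost: transmit energy plus the receiver's energy.
    return p + receiver_energy(p, length)


solution = stackelberg(powers, lengths, overall_energy, receiver_energy, meets_ber)
```

Enumeration is practical only for small action sets; it is used here to make the leader and follower roles explicit.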
7B.2 - Communication-Aware Task Scheduling
and Voltage Selection for Total Systems Energy
Minimization



Girish V. Varatkar, Radu Marculescu
ECE@CMU
Recent work in ES community: performance and
energy are crucial!
Voltage selection



Task scheduling algorithm should use the foresight that
voltage selection is going to follow the scheduling step
Schedule should provide the maximum slowing down
potential
This work brings the communication aspect into the
picture


A ‘communication-centric’ approach
A ‘voltage selection’ approach
7B.3 - LRU-SEQ: A Novel Replacement Policy
for Transition Energy Reduction in Instruction
Caches

Praveen G. Kalla, Xiaobo Sharon Hu, Joerg Henkel
CSE@Notre Dame

LRU to LRU-SEQ (Sequential LRU)




Constraining sequential fetches to the same bank (same
way) avoids bank transitions.
It also increases the sleep time for the banks, overcoming
break-even time requirements.
LRU nature has to be maintained, else associativity is lost
(hit-ratio is affected)
Distance between the last fetched line and the present line
is a parameter that will affect the performance of this
policy.
7B.3 - LRU-SEQ: A Novel Replacement Policy for
Transition Energy Reduction in Instruction
Caches
FOR (every cache access) DO
  IF (access == HIT) THEN
    P_way = C_way
  ELSE
    dist = abs(Curr_Addr - Prev_Addr);
    IF (dist <= SEQ_DST) THEN
      C_way = P_way
    ELSE
      C_way = LRU_Way
    END
  END
  Update LRU state for access.
END
P_( ) : Previous_( )
C_( ) : Current_( )
State Holder 1: P_way (entire cache)
State Holder 2: P_line (each cache way)
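
The pseudocode above can be exercised with a small set-associative cache model. The sketch below is one reading of the policy, not the authors' implementation: the class and attribute names are hypothetical, the previous way is updated after every access (the slide shows the update only on a hit), and the per-way P_line state holder is not modeled.

```python
class LruSeqCache:
    """Minimal set-associative cache with LRU-SEQ victim selection (a sketch)."""

    def __init__(self, num_sets, num_ways, line_size, seq_dst):
        self.num_sets = num_sets
        self.line_size = line_size
        self.seq_dst = seq_dst
        # tags[s][w] is the tag stored in way w of set s, or None if empty.
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        # lru[s] lists the ways of set s from least to most recently used.
        self.lru = [list(range(num_ways)) for _ in range(num_sets)]
        self.prev_addr = None
        self.prev_way = None
        self.way_transitions = 0  # how often consecutive accesses change way

    def access(self, addr):
        """Access one address and return True on a hit, False on a miss."""
        line = addr // self.line_size
        index = line % self.num_sets
        tag = line // self.num_sets
        ways = self.tags[index]
        hit = tag in ways
        if hit:
            way = ways.index(tag)
        else:
            sequential = (self.prev_addr is not None
                          and abs(addr - self.prev_addr) <= self.seq_dst)
            # Sequential miss: reuse the previous way. Otherwise: plain LRU victim.
            way = self.prev_way if sequential else self.lru[index][0]
            ways[way] = tag
        # Update the LRU state: the accessed way becomes most recently used.
        self.lru[index].remove(way)
        self.lru[index].append(way)
        if self.prev_way is not None and way != self.prev_way:
            self.way_transitions += 1
        self.prev_addr, self.prev_way = addr, way
        return hit


# Hypothetical configuration: 4 sets, 2 ways, 16-byte lines, SEQ_DST of 16 bytes.
cache = LruSeqCache(num_sets=4, num_ways=2, line_size=16, seq_dst=16)
results = [cache.access(a) for a in (0, 64, 80, 80)]
```

In this trace the access to address 80 follows address 64 sequentially, so its line is placed in the way used by the previous access rather than in the LRU way of its own set, and no extra way transition is counted.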
7B.4 - Compiler-Based Register Name
Adjustment for Low-Power Embedded
Processors





Peter Petrov, Alex Orailoglu
CSE@UCSD
Compiler-driven register name adjustment
for low-power was proposed
Register names reassigned without incurring
any performance or power overhead
No hardware support required whatsoever
Efficient algorithm for Register Name
Adjustment proposed with additional
frequency skew enhancing phase
8C.1 - Leakage Power Optimization Techniques
for Ultra Deep Sub-Micron Multi-Level Caches


Nam S. Kim, David Blaauw, Trevor N. Mudge
EECS@UMICH
Cost-effective # of VTH for cache leakage reduction
  depends on the target access time, but 1 or 2 high VTHs are
  enough for leakage reduction
Cache leakage



another design constraint in processor design
trade-off among delay / area / leakage
Incorporating w/ realistic cache miss statistics for the
leakage optimization
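
The delay / leakage trade-off behind the choice of VTH can be illustrated with two standard first-order device models: subthreshold leakage falls exponentially with VTH, while gate delay grows as VTH approaches the supply voltage (alpha-power law). The sketch below is not the paper's CACTI-based circuit model. The parameter values (`n`, `alpha`, `vdd`, the reference VTH) and the function names are assumptions picked for illustration.

```python
import math

THERMAL_VOLTAGE = 0.026  # volts, roughly kT/q at room temperature


def relative_leakage(vth, n=1.5):
    """Subthreshold leakage up to a constant factor: exp(-vth / (n * vT))."""
    return math.exp(-vth / (n * THERMAL_VOLTAGE))


def relative_delay(vth, vdd=1.0, alpha=1.3):
    """Alpha-power-law gate delay up to a constant factor."""
    return vdd / (vdd - vth) ** alpha


def highest_vth_meeting_delay(candidates, max_delay_ratio, vth_ref=0.2):
    """Pick the largest VTH whose delay stays within a budget.

    The budget is a ratio relative to the delay at `vth_ref`. A larger VTH
    means less leakage, so the largest feasible VTH minimizes leakage.
    Returns None if no candidate meets the budget.
    """
    base = relative_delay(vth_ref)
    ok = [v for v in candidates if relative_delay(v) / base <= max_delay_ratio]
    return max(ok) if ok else None


# Hypothetical candidates and a 30% access-time budget over the reference VTH.
choice = highest_vth_meeting_delay([0.2, 0.3, 0.4, 0.5], max_delay_ratio=1.3)
saving = relative_leakage(choice) / relative_leakage(0.2)
```

Under these assumed parameters a modest delay budget already allows a VTH step that cuts subthreshold leakage by more than an order of magnitude, which is in line with the slide's point that a tighter or looser target access time changes which VTH values are worth providing.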
8C.1 - Leakage Power Optimization Techniques
for Ultra Deep Sub-Micron Multi-Level Caches
Using high-k dielectric
reduces gate-oxide
leakage
ITRS 2002 projections with doubling of # of transistors every two years
8C.1 - Leakage Power Optimization Techniques
for Ultra Deep Sub-Micron Multi-Level Caches
[Figure: cache sub-bank organization. Components: Abus buffer w/ repeater, word-line decoder, memory cell, bit-line pair, sense-amp w/ I/O circuits, Dbus buffer w/ repeater. Threshold-voltage groups: VTH1, VTH2, VTH3, VTH4.]
Circuit model based on CACTI
70nm Berkeley predictive technology model
Interconnect R/C annotated
Repeaters used to minimize interconnect delay
8C.3 - Dynamic Platform Management for
Configurable Platform-Based System-on-Chips




Krishna Sekar, Kanishka Lahiri, Sujit Dey
ECE@UCSD
Described design techniques for dynamically
customizing a general-purpose configurable platform
Dynamic platform management helps combine
benefits of general-purpose & application-specific
approaches
Benefits
  Improved application performance
  More efficient platform resource usage
  Improved energy efficiency
Improving flexibility, time-to-market,
engineering cost, time-in-market
8C.3 - Dynamic Platform Management for
Configurable Platform-Based System-on-Chips
[Figure: platform customization techniques along a spectrum: General-Purpose Processors, General Purpose Configurable Platforms, Customized Platforms, Domain Specific Platforms, ASIC / Custom SoC. Axis label: improving performance, power, size.]
8C.3 - Dynamic Platform Management for
Configurable Platform-Based System-on-Chips
[Figure: Applications 1, 2, and 3, each with performance objectives, data properties, and processing requirements, together with power constraints, feed Dynamic Platform Management, which produces an optimized platform configuration.]
General-purpose Configurable Platform components:
  Programmable voltage regulator
  Programmable PLL
  Embedded processor
  PLD
  On-chip communication architecture
  Flexible on-chip SRAM
  Reconfigurable cache
  Parameterized co-processor
Interconnect-Centric SoC Design
1A.2 - SAMBA-Bus: A High Performance Bus
Architecture for System-on-Chips



Ruibing Lu, Cheng-Kok Koh
ECE@Purdue
Single Arbitration, Multiple Bus Accesses
Automatically delivers multiple bus
transactions
  High bandwidth
Bus transactions can be performed even
without explicit bus access grant from the
arbiter
  Communication latency increases only slightly
  even with high arbitration latency
1A.2 - SAMBA-Bus: A High Performance
Bus Architecture for System-on-Chips
[Figure: SAMBA-bus structure. Modules M1, M2, M3, M4 each attach through an interface unit to two sub-buses: a forward sub-bus and a backward sub-bus.]
1A.3 - The Y-Architecture for On-Chip
Interconnect: Analysis and Methodology





Hongyu Chen, Chung-Kuan Cheng, Andrew B. Kahng
et al.
CSE@UCSD
The Y-architecture for on-chip interconnect is based
on pervasive use of 0-, 120-, and 240-degree
oriented semi-global and global wiring.
Communication capability (throughput of meshes)
better than Manhattan architecture and X-architecture.
Better total wire length compared to both H and X
clock tree structures and better path length
compared to the H tree.
Achieve 8.5% less IR drop than an equally-resourced
power network in Manhattan architecture.
1A.3 - The Y-Architecture for On-Chip
Interconnect: Analysis and Methodology
7 x 7 meshes with different interconnect architectures:
(a) A 7 by 7 mesh using Y-architecture
(b) A 7 by 7 mesh using Manhattan architecture
(c) A 7 by 7 mesh using X-architecture
Reliability Issues
3B.4 - Vectorless Analysis of Supply Noise
Induced Delay Variation


Sanjay Pant, David Blaauw, Savithri Sundareswaran
UMICH, Motorola
Power Supply Integrity Issues

Functional Failure



Voltage fluctuations inject noise in the circuit
Performance Failure

Gate delay is becoming increasingly sensitive to supply voltage

±10% variation in supply can result in 30% delay increase
Proposed Approach

Vectorless

Conservative in estimating worst-case drop/delay increase

Takes into account both IR and LdI/dt drops
3B.4 - Vectorless Analysis of Supply Noise
Induced Delay Variation
Voltage Drop Estimation
[Figure: power grid and input vector search feed a simulator that gives the worst voltage drop; library characterization and STA then give worst-case timing.]
  Worst drop highly dependent on input vectors
  Slow simulation times allow only a few vectors to be tried
Worst-Case Voltage Budget Analysis
  Highly conservative
  Worst-case drop is localized
  Ignores voltage shifts between distant driver-receiver pairs
3B.4 - Vectorless Analysis of Supply Noise
Induced Delay Variation
Proposed flow:
  Divide chip into blocks
  Compute unit pulse response (power grid)
  Characterize gate delay
  Express delay/voltage using spatial/temporal superposition
  Formulate delay/voltage maximization as linear optimization,
  with the currents i(t) as variables
[Figure: power grid (VDD) and ground grid (GND) with voltage responses V(t) to a current i(t).]
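
Because the grid is treated as linear, the drop at a node is a weighted sum of the block currents, and the worst case is the maximum of that sum over all currents allowed by the constraints. The sketch below is a deliberately reduced, static version of that idea: one time step, one observed node, a per-block current limit, and a limit on total chip current. For this particular constraint structure a greedy fill is an exact solution of the linear program; a general formulation with temporal superposition would need an LP solver. The function name, the sensitivities, and the limits are hypothetical.

```python
def worst_case_drop(sensitivity, max_block_current, max_total_current):
    """Maximize sum(sensitivity[k] * i[k]) over block currents i.

    Constraints: 0 <= i[k] <= max_block_current[k] and sum(i) <= max_total_current.
    sensitivity[k] is the voltage drop at the observed node per unit current
    drawn by block k (taken from the block's unit response).
    Returns (worst_drop, currents).
    """
    order = sorted(range(len(sensitivity)), key=lambda k: sensitivity[k], reverse=True)
    remaining = max_total_current
    currents = [0.0] * len(sensitivity)
    for k in order:
        if remaining <= 0 or sensitivity[k] <= 0:
            break
        # Give as much current as allowed to the block that hurts the node most.
        currents[k] = min(max_block_current[k], remaining)
        remaining -= currents[k]
    drop = sum(s * i for s, i in zip(sensitivity, currents))
    return drop, currents


# Hypothetical three-block chip: sensitivities in volts per ampere, limits in amperes.
drop, currents = worst_case_drop(
    sensitivity=[0.02, 0.05, 0.01],
    max_block_current=[2.0, 1.0, 3.0],
    max_total_current=2.5,
)
```

No input vectors are simulated here: the bound comes only from the current constraints, which is what makes the approach vectorless and conservative.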
5B.2 - Fault-Tolerant Techniques for Ambient
Intelligent Distributed Systems

Diana Marculescu
ECE@CMU
Novel techniques for harnessing redundancy
as a way for increasing fault-tolerance
  Assume a large number of networked devices
  Idle devices can act as surrogates for failing ones
  via application migration or remapping
Scheduling techniques for optimizing system
lifetime
  Determine optimal migration schedule, under
  realistic battery models
8C.2 - Dynamic Fault-Tolerance and Metrics for
Battery Powered, Failure-Prone Systems


Phillip Stanley-Marbell, Diana Marculescu
ECE@CMU
Introduce the concept of adaptive fault-tolerance
management for failure-prone systems, and a
classification of local algorithms for achieving
system-wide reliability.
Performance Optimization
5B.1 - Cache Optimization For Embedded
Processor Cores: An Analytical Approach


Arijit Ghosh, Tony Givargis
CS@UCI
An efficient algorithm to directly compute
cache parameters satisfying desired
performance criteria.
5B.3 - Performance Efficiency of Context-Flow
System-On-Chip Platform


Rami Beidas, Jianwen Zhu
ECE@Toronto
A new programming model, called context-flow,
that is simple, safe, highly parallelizable
yet transparent to the underlying architectural
details.
Simulation at the Nanometer Scale
7A.1 - A Probabilistic-Based Design
Methodology for Nano-Scale Computation



Iris Bahar, Joseph Mundy, Jie Chen
Brown
Based on Markov random fields
Propose a new architectural framework designed to
handle faulty processes prevalent with nanoscale
devices
  Dynamically defect tolerant
  Adapts to errors as a natural consequence of probability
  maximization
  Removes need to actually detect faults
  Can handle both structure- and signal-based faults
7A.1 - A Probabilistic-Based Design
Methodology for Nano-Scale Computation

Carbon Nanotubes (CNTs)
  Excellent conductors
  Diodes, FETs, and memory arrays using CNTs have
  been demonstrated
  Physical placement of CNTs is an issue
  Alumina substrates have been proposed to fabricate
  arrays of CNTs
[Figure: carbon nanotubes, off junction and on junction.]
7A.1 - A Probabilistic-Based Design
Methodology for Nano-Scale Computation

Molecular devices
  Direct use of molecules and their electronic states
  Conduction achieved by changes in physical
  configuration or electronic state
  Diodes and memory have been demonstrated
[Figure labels: additional electron, switch on.]
7A.1 - A Probabilistic-Based Design
Methodology for Nano-Scale Computation

Quantum Cellular Automata (QCA)
  Based on local interaction of quantum dots arranged in
  cells
  Logic function is encoded into spatial patterns of the
  cells.
  Information propagates through chains of QCA devices
7A.2 - Modeling of Ballistic Carbon Nanotube
Field Effect Transistors for Efficient Circuit
Simulation





Arijit Raychowdhury, Saibal Mukhopadhyay,
Kaushik Roy
ECE@Purdue
Circuit/SPICE level model for Ballistic CNFETs
Removes self-consistent solutions of Poisson’s
and Schrödinger's Equations
Proposed model closely replicates the self
consistent numerical simulations
The model has been used to simulate simple
adders/multipliers
7A.2 - Modeling of Ballistic Carbon Nanotube
Field Effect Transistors for Efficient Circuit
Simulation
Carbon nanotubes are graphite sheets
rolled in the form of tubes. They act as
channel material for FETs.
Source: IBM
7A.2 - Modeling of Ballistic Carbon Nanotube
Field Effect Transistors for Efficient Circuit
Simulation
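[Figure: two CNFET structures with band diagram. One has an intrinsic CNT channel with Schottky barriers at source (S) and drain (D), labeled b=Eg/2; the other has n+ source and drain regions around an intrinsic CNT. Both show a top gate, ZrO2 dielectric, and a bottom gate.]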
7A.2 - Modeling of Ballistic Carbon Nanotube
Field Effect Transistors for Efficient Circuit
Simulation
• Performance of
CNFETs can be
evaluated only through
circuit simulations
• SPICE compatible
compact modeling is
essential for circuit
simulations
7A.3 - Circuit Simulation of Nanotechnology
Devices with Non-Monotonic I-V Characteristics


Jiayong Le, Larry Pileggi, Anirudh Devgan
ECE@CMU
Describes a circuit level simulator that can
accommodate an important class of
nanotechnology devices that are
characterized by nonmonotonic I-V
characteristics.
Other Areas in ICCAD



Placement, Routing, and Floorplanning
Analog design and Methodology
Verification



Formal Verification
Dynamic Verification
Timing Analysis



Delay and Signal Modeling
Statistical Static Timing
Retiming for Global Interconnects
Other Areas in ICCAD (Cont’d)

CAD Algorithms for Emerging Technologies






Reversible Logic Synthesis
DNA Probe Array Layout
MEMS
Design for Customized Processors
Synthesis
Testing