ECE 260B - CSE241A VLSI Digital Circuits

ECE260B – CSE241A
Winter 2005
Design Styles
Multi-Vdd/Vth Designs
Website: http://vlsicad.ucsd.edu/courses/ece260b-w05
ECE 260B – CSE 241A Design Styles 1
http://vlsicad.ucsd.edu
The Design Problem
Source: sematech97
A growing gap between design complexity and design productivity
Design Methodology
• Design process traverses iteratively between three abstractions:
behavior, structure, and geometry
• More and more automation for each of these steps
Behavioral Description of Accumulator
entity accumulator is
  port (
    DI  : in integer;
    DO  : inout integer := 0;
    CLK : in bit
  );
end accumulator;
architecture behavior of accumulator is
begin
  process(CLK)
    variable X : integer := 0; -- intermediate variable
  begin
    if CLK = '1' then
      X := DO + DI; -- variables are assigned with :=
      DO <= X;
    end if;
  end process;
end behavior;
Design described as set of input-output
relations, regardless of chosen
implementation
Data described at higher abstraction
level (“integer”)
Structural Description of Accumulator
entity accumulator is
  port ( -- definition of input and output terminals
    DI  : in bit_vector(15 downto 0); -- a 16-bit-wide vector
    DO  : inout bit_vector(15 downto 0);
    CLK : in bit
  );
end accumulator;
architecture structure of accumulator is
  component reg -- definition of register ports
    port (
      DI  : in bit_vector(15 downto 0);
      DO  : out bit_vector(15 downto 0);
      CLK : in bit
    );
  end component;
  component add -- definition of adder ports
    port (
      IN0  : in bit_vector(15 downto 0);
      IN1  : in bit_vector(15 downto 0);
      OUT0 : out bit_vector(15 downto 0)
    );
  end component;
  -- definition of accumulator structure
  signal X : bit_vector(15 downto 0);
begin
  add1 : add
    port map (DI, DO, X); -- defines port connectivity
  reg1 : reg
    port map (X, DO, CLK);
end structure;
Design defined as composition of
register and full-adder cells (“netlist”)
Data represented as {0,1,Z}
Time discretized and progresses with
unit steps
Description language: VHDL
Other options: schematics, Verilog
Implementation Methodologies
Digital Circuit Implementation Approaches
• Custom
• Semi-custom
  - Cell-Based
    - Standard Cells
    - Compiled Cells
    - Macro Cells
  - Array-Based
    - Pre-diffused (Gate Arrays)
    - Pre-wired (FPGA)
Full Custom







Hand drawn geometry
All layers customized
Digital and analog
Simulation at transistor level
High density
High performance
Long design time
Magic Layout Editor
(UC Berkeley)
Symbolic Layout
• Dimensionless layout entities
• Only topology is important
• Final layout generated by “compaction” program
[Figure: stick diagram of inverter, with VDD, GND, In, and Out labeled]
Standard Cells


• Organized in rows
• All layers customized
• Cells made as full custom by vendor (not user)
• Digital with possible special analog cells
• Simulation at gate level (digital)
• Medium-high density
• Medium-high performance
• Reasonable design time
[Figure: rows of cells, with feedthrough cell, logic cell, routing channel, and functional module (RAM, multiplier, ...) labeled]
Routing channel requirements are reduced by presence of more interconnect layers
Standard Cell — Example
[Brodersen92]
Standard Cell - Example
3-input NAND cell
(from Mississippi State Library)
characterized for fanout of 4 and
for three different technologies
Automatic Cell Generation
Random-logic layout
generated by CLEO
cell compiler (Digital)
Module Generators — Compiled Datapath
[Figure: compiled datapath built from bit-slices, with buffer, adder, reg1, reg0, mux, bus0, bus1, bus2, routing area, and feed-through labeled]
Advantage: one-dimensional placement/routing problem





Macrocell-Based Design
• Predefined macro blocks (uP, RAM, etc.)
• Macro blocks made as full custom by vendor (IP blocks)
• All layers customized
• Digital and some analog
• Simulation at behavior or gate level
• High density
• High performance
• Short design time
• Use standard on-chip busses
• “System on a chip” (SOC)
[Figure: macrocells, interconnect bus, and routing channel labeled]
Macrocell Design Methodology
Floorplan: defines overall topology of design, relative placement of modules, and global routes of busses, supplies, and clocks
[Figure: video-encoder chip [Brodersen92], with SRAM blocks, data paths, and standard cells labeled]
Gate Array









• Predefined transistors connected via metal
• Two types: channel based, sea of gates
• Only metal layers customized
• Fixed array sizes
• Digital cells in library
• Simulation at gate level (digital)
• Medium density
• Medium performance
• Reasonable design time
[Figure: rows of uncommitted cells separated by routing channels]
Gate Array — Primitive Cells
[Figure: uncommitted cell (polysilicon, metal, and possible contact sites, with VDD and GND rails) next to committed cell (4-input NOR with inputs In1 to In4 and output Out)]
Sea-of-gates
Random Logic
Memory
Subsystem
LSI Logic LEA300K
(0.6 µm CMOS)
Prewired Arrays
• Programmable logic blocks
• Programmable connections between logic blocks
• No layers customized (standard devices)
• Digital only
• Low-medium performance
• Low-medium density
• Programmable: SRAM, EPROM, Flash, Anti-fuse, etc.
• Easy and quick design changes
• Cheap design tools
• Low development cost
• High device cost
• NOT a real ASIC
Courtesy Altera Corp.
Programmable Logic Devices
[Figure: three panels showing PLA, PROM, and PAL]
Field-Programmable Gate Arrays - Fuse-based
[Figure: standard-cell-like floorplan, with rows of logic modules, routing channels, vertical routes, I/O buffers, and program/test/diagnostics logic labeled]
Interconnect
[Figure: programming interconnect using anti-fuses, with cells, input/output pins, horizontal tracks, vertical tracks, antifuses, and a programmed interconnection labeled]
Field-Programmable Gate Arrays - RAM-based
[Figure: array of CLBs, with switching matrix, interconnect points, and horizontal and vertical routing channels labeled]
RAM-based FPGA - Basic Cell (CLB)
[Figure: CLB with combinational logic (two blocks, each computing any function of up to 4 variables) and storage elements (flip-flops Q1 and Q2); labeled signals include A, B/Q1/Q2, C/Q1/Q2, D, E, Din, F, G, R, CE, and Clock. Courtesy of Xilinx]
RAM-based FPGA
Xilinx XC4025
High Performance Devices
• Mixture of full custom, standard cells, and macros
• Full custom for special blocks: adder (data path), etc.
• Macros for standard blocks: RAM, ROM, etc.
• Standard cells for non-critical digital blocks
Global Signaling and Layout
• Global signaling and layout optimization
• Multi-Vdd
• Static power analysis
• Multi-Vth + Vdd + sizing
D. Sylvester, DAC-2001
Global Signaling
• Current global signaling paradigm → insert large static CMOS repeaters to reduce wire RC delay
• Impending problems:
  - Too many repeaters
    - 180nm processors: 22K repeaters (Itanium), 70K (Power4)
    - Project 1-1.5M repeaters at 45-65nm technologies
  - Too much power
    - Many large repeaters = significant static and dynamic power
  - Too much noise
    - Repeater clustering complicates power distribution
    - Inductive coupling across wide bus structures
Cell Layout Optimization

• Advanced layout techniques must allow
  - Continuous individual device sizing
  - Variable p/n ratios
  - Tapered FET stacking sizes
  - Arbitrary Vth assignments within gates
• First cut: Cadabra → 15-22% power reduction using 1st two approaches under fixed footprint constraint
[Figure: GDSII import; optimize specific instances of standard gates; compact fixed width. Ref: Hurat, Cadabra]
Multi-Vdd

• Global signaling and layout optimization
• Multi-Vdd
• Static power analysis
• Multi-Vth + Vdd + sizing
Multi-Vdd Status
• Idea: Incorporate two Vdd levels to reduce dynamic power
• Limited to a few recent Japanese multimedia processors
  - Example: 0.3 µm, 75MHz, 3.3V media processor (Toshiba)
    - Total power savings of 47% in logic, 69% in clock
  - Dynamic voltage scaling of mobile processors
    - Transmeta Crusoe, Intel Speedstep, etc.
    - Not considered in this talk
• Very powerful technique currently applied only in low-performance designs
  - Mentality: today’s high performance parts aren’t “limited” by power
Lower Power Via Rich Replacement
• Media processors and other low speed designs have many non-critical paths
  - 60-70% of paths have delay ≤ half the clock period
• After replacement, most paths become near critical
• What about high-speed microprocessors?
[Figure: % of total paths vs. path delay (normalized to clock period)]
Similar Story For High-Performance
• IBM 480 MHz PowerPC shows over 50% of paths have delay less than half the clock period
  - Implies that high-performance designs can benefit from multi-Vdd
Ref: Akrout, JSSC98
Resizing Is Not The Right Answer
• Post-synthesis optimizations resize gates to recover power on non-critical paths
  - Looks similar to pre- and post-replacement figures in media processor…
[Figures: before post-synthesis resizing; after post-synthesis resizing]
This is the wrong approach for nanometer design!
Ref: Sirichotiyakul, DAC99
Multi-Vdd Instead of Sizing
• Power ~ $C V_{dd}^2 f$, where f is fixed
• Key: Reducing gate width impacts power sub-linearly
  - Interconnect capacitance is not affected
• Reducing supply voltage cuts power quadratically
  - All capacitive loads have lower voltage swing
• How can we minimize delay penalty at low Vdd?
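The sub-linear versus quadratic contrast in "Multi-Vdd Instead of Sizing" can be checked with a few lines of arithmetic. The sketch below is only illustrative: the 50/50 split between gate and wire capacitance, the 20% reductions, and the assumption that gate capacitance scales linearly with width are all hypothetical choices, not figures from the slides.

```python
def dynamic_power(c_gate, c_wire, vdd, f):
    """Dynamic power ~ C * Vdd^2 * f, with C split into gate and wire parts."""
    return (c_gate + c_wire) * vdd ** 2 * f

# Hypothetical normalized baseline: half of the switched capacitance is wire.
base = dynamic_power(c_gate=0.5, c_wire=0.5, vdd=1.0, f=1.0)

# Option A: shrink gate widths by 20%; interconnect capacitance is unchanged.
resized = dynamic_power(c_gate=0.5 * 0.8, c_wire=0.5, vdd=1.0, f=1.0)

# Option B: lower the supply by 20%; every capacitive load swings less.
low_vdd = dynamic_power(c_gate=0.5, c_wire=0.5, vdd=0.8, f=1.0)

resize_saving = 1 - resized / base  # 0.10 for this capacitance split
vdd_saving = 1 - low_vdd / base     # 0.36, independent of the split

print(f"resizing saves {resize_saving:.0%}, lower Vdd saves {vdd_saving:.0%}")
```

Under these assumptions the same 20% knob setting buys roughly 10% from resizing and 36% from the supply, and the resizing number shrinks further as the wire share of the capacitance grows.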
Challenges For Multi-Vdd
• Area overhead
  - Toshiba reported 7% rise in area due to placement restrictions, level converters, additional power grid routing
• EDA tool support for the above issues (placement, dual power routing)
• Noise analysis
  - Additional shielding required between Vdd,low and Vdd,high signals?
  - Including clock network
Static Power


• Global signaling and layout optimization
• Multi-Vdd
• Static power
• Multi-Vth + Vdd + sizing
Static Power

• Why do we care about static power in non-portable devices?
  - Standby power is “wasted” -- leaves fewer Watts for computation
  - Worsens reliability by raising die temperatures
• Leakage current is a function of Vth and subthreshold swing (Ss) (x10 at operating vs. room temp!)
  $I_{off} = 10 \times 10^{-V_{th}/S_s}\ \mu A/\mu m$
• Ss expected to remain at 80-85 mV/dec (room temp)
  - Device technology may cut this by ~20%
• Vth reductions are mandated by scaling Vdd
  - Vth has been around Vdd/5
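To get a feel for how steeply leakage depends on Vth, the leakage expression on this slide can be evaluated directly. This is a minimal sketch: the function name and the two Vth values are hypothetical, chosen so that they differ by exactly one subthreshold swing.

```python
def i_off_ua_per_um(vth_v, ss_v_per_dec):
    """Off current in uA/um from the slide's model: 10 * 10^(-Vth/Ss)."""
    return 10.0 * 10.0 ** (-vth_v / ss_v_per_dec)

SS = 0.085  # 85 mV/dec, the upper end of the room-temperature range quoted

high_vth = i_off_ua_per_um(0.300, SS)  # hypothetical Vth of 0.300 V
low_vth = i_off_ua_per_um(0.215, SS)   # hypothetical Vth lowered by one Ss

# Lowering Vth by one subthreshold swing multiplies leakage by ten.
ratio = low_vth / high_vth
print(f"leakage ratio for an 85 mV drop in Vth: {ratio:.1f}x")
```

With Vth tracking Vdd/5, each generation of supply scaling moves Vth down this exponential curve, which is why the slide treats Vth reduction as the main driver of static power.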
Leakage Suppression Approaches
• Dual-Vth (most common)
  - Low-Vth on critical paths, high-Vth elsewhere
  - Only cost is additional masks
• MTCMOS
  - Series inserted high-Vth device cuts leakage current when off (sleep mode)
  - Delay and area penalties, control device sizing is critical
• Other techniques
  - Substrate biasing to control Vth
  - Dual-Vth domino
    - Use low-Vth devices only in evaluate paths
[Figure: MTCMOS circuit, with Vdd, pull-up, Vout, pull-down, parasitic node, and high-Vth device controlled by Vcontrol labeled]
Can Gate-length biasing help leakage reduction?

• Reduce leakage?
[Figure: variation of leakage and delay (each normalized to 1) for an NMOS device in an industrial 130nm technology; x-axis: gate-length (nm), 130 to 140; y-axis: 0 to 1.2]
• Reduce leakage variability?
[Figure: leakage vs. gate-length sketches showing leakage variability, with biasing indicated]
Gate-length Biasing

• First proposed by Sirisantana et al.
  - Comparative study of effect of doping, tox and gate-length
  - Large bias used, significant slow down
• Small bias
  - Little reduction in leakage beyond 10% bias while delay degrades linearly
  - Preserves pin compatibility
  - Technique applicable as post-RET step
• Salient features
  - Design cycle is not disturbed
  - Zero cost (no additional masks)
Granularity

• Technology-level
  - All devices in all cells have one biased gate-length
• Cell-level
  - All devices in a cell have one biased gate-length
• Device-level
  - All devices have independent biased gate-length
  - Simplification: In each cell, NMOS devices have one gate-length and PMOS devices have another
Device-Level Leakage Reduction
Leakage saving with a delay penalty of up to 10% (simplified device-level biasing)
[Figure: bar chart for INVX4, NANDX4, BUFX4, and ANDX6, with Low Vt, Nom Vt, and High Vt series; y-axis: 0 to 40]
Circuit level



• Bias gate-length for non-critical cells
• Library extended with each cell having a biased version
• Benefits analyzed in conjunction with Multi-VT assignment and in isolation
  - SVT-SGL
  - DVT-SGL
  - SVT-DGL
  - DVT-DGL
Results: Leakage Reduction
[Figure: normalized leakage (0 to 1) of SVT-SGL, SVT-DGL, DVT-SGL, and DVT-DGL for c5315, c6288, c7552, and alu128]
With less than 2.5% delay penalty
• Design Compiler used for VT assignment and gate-length biasing
• Better results expected with Duet (academic sizer from Michigan)
Multi-Vth + Vdd + Sizing



• Global signaling and layout optimization
• Multi-Vdd
• Static power analysis
• Multi-Vth + Vdd + sizing
Multi-Everything
• Need an approach that selects between speed, static power, and dynamic power
• Should be scalable to nanometer design
  - Rules out dual-Vth domino or other dynamic logic families (low supplies kill performance advantages)
• Techniques mentioned so far
  - Flexible, optimized cell layouts
  - Multi-Vdd
  - Dual-Vth
• Put them all together
Multi-Vdd Can Leverage Vth’s
• Existing designs using multi-Vdd do not alter Vth in low-Vdd cells
  - Highly sub-optimal, delay is fully penalized
  - Limits cell replacement → limits power savings
• Much better solution: reduce Vth in low-Vdd cells to carefully balance delay, static power, and dynamic power
  - Enforce technology scaling within a chip: whenever we reduce Vdd, we also reduce Vth to maintain speed
Multi-Vdd + Vth Negates Delay Penalty
Delay ~ $C V_{dd} / I_{on}$
• Scenarios
  - Constant Vth (current paradigm)
  - Scale Vth to maintain constant static power
  - Scale Vth to reduce static power linearly with Vdd
• Delay penalty is substantially offset
• Ion is very sensitive to Vth at Vdd < 1V
• Pstatic reduces with Vdd due to linear term and smaller Ioff (Ion and DIBL ↓)
[Figure: normalized delay (1 to 4) vs. Vdd (0.2 V to 0.7 V) at 35 nm, nominal Vdd = 0.6V, for three cases: constant Vth (0.11V), scaled Vth with constant Pstatic, and conservatively scaled Vth]
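The trend in this plot can be reproduced qualitatively from the Delay ~ C·Vdd/Ion relation. The sketch below assumes an alpha-power-law drive current, Ion ~ (Vdd - Vth)^alpha with alpha = 1.3; that model, the exponent, and the lowered Vth of 0.07 V are illustrative assumptions and are not taken from the slides. Only the nominal Vdd of 0.6 V and the constant Vth of 0.11 V come from the figure labels.

```python
def relative_delay(vdd, vth, alpha=1.3):
    """Delay ~ C*Vdd/Ion, with Ion ~ (Vdd - Vth)^alpha (assumed model)."""
    return vdd / (vdd - vth) ** alpha

# Reference point from the figure labels: Vdd = 0.6 V, Vth = 0.11 V.
nominal = relative_delay(0.6, 0.11)

# Drop the supply to 0.4 V and keep Vth where it was (current paradigm).
constant_vth = relative_delay(0.4, 0.11) / nominal

# Same supply, but with Vth lowered along with it (hypothetical 0.07 V).
scaled_vth = relative_delay(0.4, 0.07) / nominal

print(f"normalized delay at 0.4 V: constant Vth {constant_vth:.2f}, "
      f"scaled Vth {scaled_vth:.2f}")
```

In this toy model, lowering Vth together with Vdd recovers a large part of the delay penalty, which is the qualitative point of the slide; the actual curves depend on the device model used in the original study.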
Now Add Sizing
• Multi-Vdd + multi-Vth + sizing/cell layout optimization attacks power from many angles (multi-dimensional)
• Depending on criticality and switching activities, non-critical gates can be:
  - Assigned Vdd,low
  - Assigned Vdd,low + lower Vth
  - Assigned Vth,high
  - Downsized (at the individual transistor level if advantageous)
  - Assigned Vdd,low and upsized
    - For gates that cannot tolerate Vdd,low delay, this can be power efficient
  - And others
Summary

• Power density must saturate to maintain affordable packaging options
  - 50 W/cm² means 200-250W for future large MPUs
  - Dynamic thermal management saves 25% on packaging power budget
• Multi-Vdd will leverage multiple Vth’s to offset delay penalty at low Vdd
  - More widespread re-assignment to Vdd,low
  - Use Vdd first instead of re-sizing to take advantage of large path slacks
  - Anticipated power savings of 50-80%
• Static power also addressed through multi-Vth + Vdd + sizing
  - Vth difficult to control in ultra-short channels
  - Intra-cell Vth assignment + MTCMOS/variants + sleep modes
Next Week: Project Meetings