Full-Custom Design Methodology

advertisement
ECE 5745 Complex Digital ASIC Design
Topic 4: Full-Custom Design Methodology
Christopher Batten
School of Electrical and Computer Engineering
Cornell University
http://www.csl.cornell.edu/courses/ece5745
Design Domains, Abstractions, and Principles
Full-Custom Design
Part 1: ASIC Design Overview
P P
M M
Topic 4
Full-Custom
Design
Methodology
Topic 6
Closing
the
Gap
Topic 1
Hardware
Description
Languages
Topic 5
Automated
Design
Methodologies
Topic 8
Testing and Verification
Topic 3
CMOS Circuits
Topic 2
CMOS Devices
ECE 5745
T04: Full-Custom Design Methodology
Topic 7
Clocking, Power Distribution,
Packaging, and I/O
2 / 53
• Design Domains, Abstractions, and Principles •
Full-Custom Design
Agenda
Design Domains, Abstractions, and Principles
Modularity
Hierarchy
Encapsulation
Regularity
Extensibility
Full-Custom Design
ECE 5745
T04: Full-Custom Design Methodology
3 / 53
• Design Domains, Abstractions, and Principles •
Full-Custom Design
Behavioral, Structural, and Physical Abstractions
Adapted from [Ellervee’04]
ECE 5745
T04: Full-Custom Design Methodology
4 / 53
• Design Domains, Abstractions, and Principles •
Full-Custom Design
Behavioral, Structural, and Physical Abstractions
Adapted from [Ellervee’04]
ECE 5745
T04: Full-Custom Design Methodology
5 / 53
• Design Domains, Abstractions, and Principles •
Full-Custom Design
Computer Engineering Stack Abstractions
Application
Algorithm
Programming Language
Operating System
Instruction Set Architecture
Microarchitecture
Register-Transfer Level
Gate Level
Circuits
Devices
Technology
ECE 5745
Processors
Memories
Control
Datapath
Logic
State
Transistors
T04: Full-Custom Design Methodology
Networks
Memories
Interconnect
Wires
6 / 53
• Design Domains, Abstractions, and Principles •
Full-Custom Design
Design Principles in VLSI Design
I Modularity – Decompose into components with well-defined interfaces
I Hierarchy – Recursively apply modularity principle
I Encapsulation – Hide implementation details from interfaces
I Regularity – Leverage structure at various levels of abstraction
I Extensibility – Include mechanisms/hooks to simplify future changes
ECE 5745
T04: Full-Custom Design Methodology
7 / 53
• Design Domains, Abstractions, and Principles •
Full-Custom Design
ontain all functionality
Design Principle: Modularity
modules
b_out;
mux_out;
A A
sel en
x
B B
sel en
B=0
A<B
zero?
lt
Network
s_bits_A),
),
el),
ut)
A
I$
I$
I$
I$
P
P
P
P
sub
B
_pf
out),
I Separate design into components
Network
w/ well-defined interfaces
6.375 Spring 2006 • L03 Verilog 2 - Design Examples • 21
I Reason, design, and test
D$
D$
D$
D$
components in isolation
I Interface may or may not
Network
encapsulate implementation
ECE 5745
T04: Full-Custom Design Methodology
8 / 53
• Design Domains, Abstractions, and Principles •
Full-Custom Design
Design Principle: Modularity
Modularity can also impact
electrical and physical
characteristics
Physical Modularity
Electrical Modularity
What happens if we cascade
many of these tranmission gate
multiplexers?
ECE 5745
pwr/gnd rails & wells in fixed
locations so they connect via
abutment Adapted from [Weste’11]
T04: Full-Custom Design Methodology
9 / 53
• Design Domains, Abstractions, and Principles •
Full-Custom Design
Design Principle: Hierarchy
GCD
Control Unit
Datapath Unit
Multi-Bit
Register
Multi-Bit
Multiplexer
Multi-Bit
Subtracter
Single-Bit
Multiplexer
Single-Bit
Flip-Flop
Full-Adder
Recursively apply modularity principle until
complexity of submodules is manageable
ECE 5745
T04: Full-Custom Design Methodology
10 / 53
• Design Domains, Abstractions, and Principles •
Full-Custom Design
Design Principle: Hierarchy
System
Network
I$
I$
I$
I$
Processor
P
P
P
D$
D$
Network
ECE 5745
Data Cache
Bank
P
Processor
Control
Network
D$
Instruction
Cache
Proc-to-D$
Network
Processor
Datapath
Router
Cache
Control
Cache
Datapath
D$
Router
Control
T04: Full-Custom Design Methodology
Router
Datapath
11 / 53
• Design Domains, Abstractions, and Principles •
Full-Custom Design
Design Principle: Encapsulation
I Modularity requires well-defined interfaces, but these interfaces might
still expose significant implementation details (e.g., interface in
control/datapath split reveals many details of the implementation)
I Choose interfaces that hide implementation details where possible to
enable more robost composition
I Lab 1 multipliers all use a latency-insenstive val/rdy message interface
to hide timing details, any one of these can be swaped into a
processor and should work without modification
. Fixed-latency iterative multiplier
. Variable-latency iterative multplier
. Pipelined multiplier
ECE 5745
T04: Full-Custom Design Methodology
12 / 53
• Design Domains, Abstractions, and Principles •
Full-Custom Design
Design Principle: Regularity
I Modularity, hierarchy, and
Network
I$
I$
I$
I$
P
P
P
P
Network
D$
D$
D$
D$
encapsulation can still lead to
many different kinds of
modules which can increase
design complexity
I Choose a hierarchical
decomposition to leverage
structure and thus faciliate
reuse and reduce complexity
I Both structural and physical
regularity can be exploited
Network
ECE 5745
T04: Full-Custom Design Methodology
13 / 53
• Design Domains, Abstractions, and Principles •
Full-Custom Design
Design Principle: Regularity
Common Cache Design
Network
Structural Regularity
in Ripple-Carry Adder
I$
I$
I$
I$
P
P
P
P
Network
D$
D$
D$
D$
Structural Regularity
in Network
Physical Regularity
in Datapaths
Physical Regularity
in Memories
Network
Common Network Design
ECE 5745
T04: Full-Custom Design Methodology
14 / 53
• Design Domains, Abstractions, and Principles •
Full-Custom Design
ntain all functionality
Design Principle: Extensibility
odules
out;
x_out;
Network
A A
sel en
B B
sel en
B=0
A<B
bits_A),
zero?
A
),
)
I$
I$
I$
I$
P
P
P
P
lt
sub
B
Network
f
t),
D$
Simple form of polymorphism
enables varying bitwidth of
operands
6.375 Spring 2006 • L03 Verilog 2 - Design Examples • 21
Difficult with full-custom design
methodology!
ECE 5745
D$
D$
D$
Network
Parameterization of network and
caches enables reuse; static
elaboration could enable varying the
number of cores and the types of
components
T04: Full-Custom Design Methodology
15 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Agenda
Design Domains, Abstractions, and Principles
Full-Custom Design
Cells
Datapaths
Memories
Control
ECE 5745
T04: Full-Custom Design Methodology
16 / 53
devices (Intel microprocessors, RF power amps for
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Requires
complete customization
of all layers of waf
Full Custom Design
Piece of full-custom multiplier array,
1.0 m 2-metal
Key is that all circuits and transistors
are optimized for specific context
2 Feb 2005
6.884 – Spring 2005
Intel 4004
ECE 5745
T04: Full-Custom Design Methodology
17 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Overview of Full Custom Design Methodology
Adapted from [Weste’11]
ECE 5745
T04: Full-Custom Design Methodology
18 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Cells: Lambda-Based Design Rules
Lambda-based design rules
One lambda = one half of the “minimum” mask dimension, typically the length
of a transistor channel. Usually all edges must be “on grid”, e.g., in the
MOSIS scalable rules, all edges must be on a lambda grid.
2x2
1
3
2
2
2
3
3
1
diffusion (active)
poly
1
2
2
3
2x2
3
metal1
contact
Adapted from [Terman’02]
More info at: http://www.mosis.org/Technical/Designrules/scmos/scmos-main.html
6.371 – Fall 2002
ECE 5745
10/16/02
T04: Full-Custom Design Methodology
L12 – CMOS Layout 2
19 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Cells: Sample “Lambda” Layout
Sample
vdd
A
Y
vss
Adapted from [Terman’02]
6.371 – Fall 2002
ECE 5745
10/16/02
T04: Full-Custom Design Methodology
L12 – CMOS Layout 3
20 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Cells: Lambda vs. Micron Rules
Lamba vs. Micron rules
Lambda-based design rules are based on the assumption that one can scale
a design to the appropriate size before manufacture. The assumuption is
that all manufacturing dimensions scale equally, an assumption that “works”
only over some modest span of time. For example: if a design is completed
with a poly width of 2 and a metal width of 3 then minimum width metal
wires will always be 50% wider than minimum width poly wires.
Consider the following
data from Weste,
contacted metal pitch
Table 3.2:
1/2 * contact size
contact surround
metal-to-metal spacing
contact surround
1/2 * contact size
lambda lambda
rule = 0.5u
1
0.5
1
0.5
3
1.5
1
0.5
1
0.5
8
4
micron
rule
0.375
0.5
1.0
0.5
0.375
2.75
Scaled design is legal
but much larger than
it needs to be!
Adapted from [Terman’02]
6.371 – Fall 2002
ECE 5745
10/16/02
T04: Full-Custom Design Methodology
L12 – CMOS Layout 5
21 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom
Cells:and
Sticks
and Compaction
Sticks
Compaction
Stick diagram
Compact X then Y
Horizontal constraints
for compaction in X
Compact Y then X
Compact X with jog
insertion, then Y
Adapted from [Terman’02]
6.3715745
– Fall 2002
ECE
10/16/02
T04: Full-Custom
Design Methodology
L12 – CMOS Layout227 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Cells: Example Stick Diagram
ECE 5745
T04: Full-Custom Design Methodology
23 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Cells: Example Stick Diagram
ECE 5745
T04: Full-Custom Design Methodology
24 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Cells: Cell “Styles”
Choosing a “style”
What about routing
signals between gates?
Note that both layouts
block metal/poly routing
inside the cell. Choices:
metal2 routing over the
cell or routing above/below
the cell.
avoid long (> 50
squares) poly runs
Vertical Gates
Good for circuits where fets
sizes are similar and each
gate has limited fanout.
Best choice for multiple
input static gates and for
datapaths.
Horizontal Gates
Good for circuits where long
and short fets are needed
or where nodes must
control many fets. Often
used in multiple-output
complex gates (e.g,
sum/carry circuits).
don’t “capture” white
space in a cell
don’t obsess over the
layout, instead make a
second pass, optimizing
where it counts
Adapted from [Terman’02]
6.371 – Fall 2002
ECE 5745
T04: Full-Custom10/16/02
Design Methodology
L12 – CMOS Layout258/
53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Cells: Optimizing Connections
Optimizing connections
Which does this gate do?
Which is the better gate layout?
Which is better considering node capacitances?
ECE 5745
T04: Full-Custom Design Methodology
considering node
capacitances?
Adapted from [Terman’02]
26 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Cells: Optimizing Large Transistors
area = 928
area = 1008
area = 1080
is better
considering
Which isWhich
the better
gate
layout? wire resistances?
Which is better considering node capacitances?
ECE 5745
considering node
capacitances?
T04: Full-Custom
Design Methodology
Adapted from [Terman’02]
27 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Cells: Optimizing Diffusion Sharing
Eliminating Gaps
A
B
C
D
C
E
B
A
A
C
D
E
ECE 5745
D
C
A
D
D
E
B
D
A
C
6.371 – Fall 2002
E
B
B
B
A
E
C
10/16/02
T04: Full-Custom Design Methodology
E
L12 – CMOS Layout 11
Adapted
from [Terman’02]
28 / 53
Replicating Cells
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Cells: Optimizing Across Cells
Whatdoes
does this
this cell
What
celldo?
do?
What if we want to replicate
What if we want to replicate th
this cell vertically to process
vertically, i.e., make a stack of
many bits in parallel?
cells, to process many bits in
parallel?
what nodes are shared
among the cells?
what nodes aren’t share
how should we arrange t
cells vertically?
Adapted from [Terman’02]
ECE 5745
T04: Full-Custom Design Methodology
29 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Vertical
Replication
Custom Cells: Optimizing
Across
Cells
Place shared geometry symmetrically
about shared boundary.
Place items that aren’t to be shared
1/2 minspacing rule from shared
boundary.
Reflect cell about X axis so that Pfets
are next to each other: this avoids large
ndiff/pdiff spacing.
Run shared control signals vertically -they’ll wire themselves up
automatically?
6.371 – Fall 2002
ECE 5745
10/16/02
T04: Full-Custom Design Methodology
L12from
– CMOS
Layout 13
Adapted
[Terman’02]
30 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Cells: Merging Simple Cells
Two latch implementations: Left implementation composes primitive
gates, while right implementation uses single tightly integrated
gate
Adapted from [Weste’11]
ECE 5745
T04: Full-Custom Design Methodology
31 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Datapaths, Memories, Control
decode
br_targ
jr
j_targ
pc_plus4
br_targ_X
pc_F
pc_plus4_D
+4
ir[25:0]
ir[15:0]
rst_vect
ir[10:6]
ir_D
addr
pc_mux
_sel_P
rdata
j_targ
br_targ
shamt
ir[25:21]
ir[20:16]
imem
ir[15:0]
ir[15:0]
branch
_cond_zero/neg_X
op0_mux
_sel_D
regfile
(read)
op0_mux
_out_X
alu_fn_X
rf_waddr_W
branch
_cond_eq_X
wb_mux_sel_M
result_W
alu_out_M
16
0
rf_wen_W
regfile
(write)
alu
proc2cop_data_W
zext
op1_mux
_out_X
sext
execute
_mux_sel_X
muldivreq_msg
_fn_X
muldiv
_mux_sel_X
md_mux
op1_mux
_sel_D
dmemresp
_mux_sel_M
subword
imuldiv
wdata_X
addr
wdata
dmem_msg
_rw_M
Fetch (F)
ECE 5745
Decode (D)
Execute (X)
T04: Full-Custom Design Methodology
rdata
dmem
Memory (M)
Writeback (W)
32 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Datapaths, Memories, Control
Adapted from [Weste’11]
ECE 5745
T04: Full-Custom Design Methodology
33 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Datapaths: PC Generation Unit
MIPS PC Generation Unit
32
32
PC for Link Register
Jump
Seq
+
+
4
Reset
Exc.
Branch
32
32
32
32
Jump Register PC
Instruction Register
PC to Instruction Memory
Routing every path as a
separate 32-bit bus
would take a large area
Adapted from [Terman’02]
ECE 5745
T04: Full-Custom Design Methodology
34 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom
Datapaths:
Bitslices
Datapath
Bitslices
Implement datapath as single bit slices, contain one bit
from each functional unit
Route each bus bit position within bitslice
Carry chain
connections
6.371 – Fall 2002
ECE 5745
+
+
Bit 2
+
+
Bit 1
+
+
Bit 0
9/27/02
T04: Full-Custom Design Methodology
Adapted from [Terman’02]
L08 – Logical Effort and ASIC Design Styles
17
35 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom
Datapaths:
Pitch
Datapath
Bit Bit
Pitch
Height of each bit slice depends on:
– height of tallest cell in entire bitslice
– maximum number of buses running through any cell in
bitslice
Two layers
metal, buses
run by side
of cells
Bit Pitch
Three+ layers
metal, buses run
over cells
Adapted from [Terman’02]
ECE 5745
6.371 – Fall 2002
T04: Full-Custom Design Methodology
9/27/02
36 / 53
L08 – Logical Effort and ASIC Design Styles
18
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Datapaths:
Gen Example
DatapathPCExample
Bus rip out at right angles
Buses
routed by
side of
cells
In 1.0 m, 2-metal CMOS process
Adapted from [Terman’02]
6.371 – Fall 2002
ECE 5745
9/27/02
T04: Full-Custom Design
Methodology
L08 – Logical Effort and ASIC Design37
Styles
/ 53
19
Design Domains, Abstractions, and Principles
• Full-Custom Design •
CustomDatapath
Datapaths: Library
Datapath Library
Cells Cells
Have to choose maximum datapath cell height
– Too high, wastes area in simple cells
– Too small, squeezes complex cells. Grow superlinearly
in length dimension, so also wastes area.
Compromise, around 8-12 metal tracks works OK
Power run
vertically in M2
VDD
GND
12 track pitch in M3
(buses fly over cell
horizontally)
Control lines run
vertically in M2,
e.g., mux selects,
clocks
6.371 – Fall 2002
ECE 5745
Width varies according
(example metal assignment,
to cell type
others possible)
9/27/02
T04: Full-Custom Design Methodology
Adapted
from
[Terman’02]
L08 – Logical Effort and ASIC
Design
Styles
20
38 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Datapaths: Optimizing Datapath Layout
+
mux
+
mux
Reduce congestion by rearranging datapath components to minimize
required number of vertical tracks per bitslice
+
+
Count number of wires
between each component
Datapath
Input or
Output
1
2
4
5
Bus Writer
Bus Reader
3
1
Bit cells must be >5 tracks high
2
4
4
3
Bit cells must be >4 tracks high
Count number of wires
crossing over each cell
Carefully account for buses
going in opposite directions
2
ECE 5745
4
5
5
2
T04: Full-Custom Design Methodology
4
4
4
39 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Datapaths: MIPS Datapath Track Allocation
Adapted from [Weste’11]
ECE 5745
T04: Full-Custom Design Methodology
40 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Datapaths: MIPS Datapath Example (1)
Adapted from [Weste’11]
ECE 5745
T04: Full-Custom Design Methodology
41 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Datapaths: MIPS Datapath Example (2)
MIPS Datapath Example
Bypass network
for six-stage Pipe
Adder
Memory address
leaves here
Fetch
address
leaves
here
32
bits
MIPS
register
file
6.371 – Fall 2002
ECE 5745
Load data
arrives here
Log shifter
layout
Multiply/Divide
Unit
Instruction bits
arrive here
(also in bypass)
9/27/02
T04: Full-Custom Design Methodology
Program
counter
generation
Adapted from [Terman’02]
L08 – Logical Effort and ASIC Design Styles
22
42 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Memories:
Array Structure
Adapted from [Weste’11]
ECE 5745
T04: Full-Custom Design Methodology
43 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Memories: Register File Circuits
Adapted from [Weste’11]
ECE 5745
T04: Full-Custom Design Methodology
44 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Memories: Register File Circuits
Adapted from [Weste’11]
ECE 5745
T04: Full-Custom Design Methodology
45 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Memories: SRAM Circuits
Adapted from [Weste’11]
ECE 5745
T04: Full-Custom Design Methodology
46 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Memories: SRAM Layut
Adapted from [Weste’11]
ECE 5745
T04: Full-Custom Design Methodology
47 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Custom Memories: SRAM
Memory Layout
Regular arrays built with cells that abut in two dimensions
– Have to pitch match in both dimensions
Word Line
Drivers
SRAM Cell
Final Decode
Power Buses
Address
Predecoders
Sense Amps
Adapted from [Terman’02]
6.371 – Fall 2002
ECE 5745
23/ 53
T04: Full-Custom9/27/02
Design Methodology L08 – Logical Effort and ASIC Design Styles 48
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Finite-State-Machine Control Unit
Adapted from [Weste’11]
ECE 5745
T04: Full-Custom Design Methodology
49 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Full Custom Control Logic with PLA
Adapted from [Weste’11]
ECE 5745
T04: Full-Custom Design Methodology
50 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Top-Level Chip Floorplan
Adapted from [Weste’11]
ECE 5745
T04: Full-Custom Design Methodology
51 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Final Full-Custom MIPS Processor
Adapted from [Weste’11]
ECE 5745
T04: Full-Custom Design Methodology
52 / 53
Design Domains, Abstractions, and Principles
• Full-Custom Design •
Acknowledgments
I [Weste’11] N. Weste and D. Harris, “CMOS VLSI Design: A Circuits and
Systems Perspective,” 4th ed, Addison Wesley, 2011.
I [Terman’02] C. Terman and K. Asanović, MIT 6.371 Introduction to VLSI
Systems, Lecture Slides, 2002.
I [Ellervee’04] P. Ellervee, IAY3714 VLSI Synthesis and HDLs, Lecture Slides,
2004.
ECE 5745
T04: Full-Custom Design Methodology
53 / 53
Download