L15: VLSI Integration and Performance Transformations Acknowledgement:

advertisement
L15: VLSI Integration and Performance
Transformations
Acknowledgement:
Materials in this lecture are courtesy of the following sources and are used with permission.
Curt Schurgers
J. Rabaey, A. Chandrakasan, B. Nikolic. Digital Integrated Circuits: A Design Perspective.
Prentice Hall/Pearson, 2003.
L15: 6.111 Spring 2006
Introductory Digital Systems Laboratory
1
Layout 101
3-D Cross-Section
p-type substrate
VDD
n-type well
metal/pdiff
contact
SiO2
+
n+
SiO2
n+
p+
p
Wp
p+
p+
n+
n
Lp
N-channel MOSFET
VDD
P-channel MOSFET
IN
OUT
Figure by MIT OpenCourseWare.
Wn
S
G
Circuit Representation
D
IN
GND
metal
OUT
S
L15: 6.111 Spring 2006
poly
n+
diff
p+
diff
Layout
D
G
contact
frommetal
to ndiff
Ln
Used with permission.
ƒ Follow simple design rules (contract
between process and circuit designers)
Introductory Digital Systems Laboratory
2
Custom Design/Layout
5-1 Mux
g64
CARRYGEN
SUMSEL
node1
2-1 Mux
9-1 Mux
ck1
SUMGEN
+ LU
b
1000um
REG
9-1 Mux
Itanium has 6 integer execution units like this
a
sum
sumb
to Cache
s0
s1
LU : Logical
Unit
From register files / Cache / Bypass
Multiplexers
Shifter
Adder stage 1
Adder stage 2
Wiring
Loopback Bus
Loopback Bus
Loopback Bus
Wiring
Die photograph of the
Itanium integer datapath
Courtesy Intel, as reprinted in Rabaey, et al. "Digital Integrated Circuits".
Bit slice 0
Bit slice 1
Sum Select
Bit slice 2
Bit slice 63
Adder stage 3
Bit-slice Design Methodology
To register files / Cache
ƒ Hand crafting the layout to achieve maximum clock rates (> 1Ghz)
ƒ Exploits regularity in datapath structure to optimize interconnects
L15: 6.111 Spring 2006
Introductory Digital Systems Laboratory
3
The ASIC Approach
Design Iteration
Design Capture
Pre-Layout
Pre-Layout
Simulation
Simulation
Behavioral
Verilog
Verilog(or
(orVHDL
VHDL))
Structural
Logic
LogicSynthesis
Synthesis
Floorplanning
Floorplanning
Post-Layout
Post-Layout
Simulation
Simulation
Circuit
Circuit
Extraction
Extraction
Placement
Placement
Physical
Routing
Routing
Tape-out
Most Common Design Approach for Designs up to 500Mhz
Clock Rates
L15: 6.111 Spring 2006
Introductory Digital Systems Laboratory
4
Standard Cell Example
Power Supply Line (VDD)
Delay in (ns)!!
3-input NAND cell
(from ST Microelectronics):
C = Load capacitance
T = input rise/fall time
Ground Supply Line (GND)
ƒ Each library cell (FF, NAND, NOR, INV, etc.) and the variations on size
(strength of the gate) is fully characterized across temperature, loading, etc.
L15: 6.111 Spring 2006
Introductory Digital Systems Laboratory
5
Standard Cell Layout Methodology
2-level metal technology
Current Day Technology
Cell-structure hidden under interconnect layers
ƒ With limited interconnect layers, dedicated routing channels
between rows of standard cells are needed
ƒ Width of the cell allowed to vary to accommodate complexity
ƒ Interconnect plays a significant role in speed of a digital circuit
L15: 6.111 Spring 2006
Introductory Digital Systems Laboratory
6
Verilog to ASIC Layout
(the push button approach)
After
Synthesis
module adder64 (a, b, sum);
input [63:0] a, b;
output [63:0] sum;
assign sum = a + b;
endmodule
After Routing
After
Placement
L15: 6.111 Spring 2006
Introductory Digital Systems Laboratory
7
Macro Modules
256×32 (or 8192 bit) SRAM Generated by hard-macro module generator
ƒ Generate highly regular structures (entire memories,
multipliers, etc.) with a few lines of code
ƒ Verilog models for memories automatically generated
based on size
L15: 6.111 Spring 2006
Introductory Digital Systems Laboratory
8
Clock Distribution
Clock skew
D
Q
Image removed due to
copyright restrictions.
D
Q
For 1Ghz clock, skew budget is 100ps.
Variations along different paths arise
from:
• Device: VT, W/L, etc.
• Environment: VDD, °C
• Interconnect: dielectric thickness
variation
L15: 6.111 Spring 2006
IBM Clock Routing
Introductory Digital Systems Laboratory
Image removed due to
copyright restrictions.
9
The Power Supply Wires are Not Ideal!
To VDD Grid
To VDD Grid
Ccoup
To VDD Grid
Receiver
Cint
Rd
Cd
Driver
GROUND GRID
Pad
Pad
The IR-drop problem causes internal power supply voltage
to be less than the external source
Used with permission.
L15: 6.111 Spring 2006
Introductory Digital Systems Laboratory
10
Analog Circuits: Clock Frequency
Multiplication (Phase Locked Loop)
up
down
„
VCO
produces high frequency square wave
„
Divider
divides down VCO frequency
„
PFD
compares phase of ref and div
„
Loop filter
extracts phase error information
Used widely in digital systems for clock synthesis
(a standard IP block in most ASIC flows)
Courtesy Michael Perrott. Used with permission.
L15: 6.111 Spring 2006
Introductory Digital Systems Laboratory
11
Scan Testing
...
0
1
ScanShift
shift out
0
1
CLK
ScanShift
shift in
ScanShift
shift in
Used with permission
Idea: have a mode in which all registers are chained
into one giant shift register which can be loaded/
read-out bit serially. Test remaining (combinational)
logic by
(1) in “test” mode, shift in new values for all
register bits thus setting up the inputs to the
combinational logic
(2) clock the circuit once in “normal” mode, latching
the outputs of the combinational logic back into
the registers
(3) in “test” mode, shift out the values of all
register bits and compare against expected
results.
Clk
ScanShift
Response To The Test
Vector Loaded
Primary
Inputs
Scan-Flops
Load/Unload Cycles
Load/Unload Cycles
Primary
Outputs
Normal System
Figure by MIT OpenCourseWare.
L15: 6.111 Spring 2006
Introductory Digital Systems Laboratory
12
Behavioral Transformations
ƒ There are a large number of implementations of the same
functionality
ƒ These implementations present a different point in the
area-time-power design space
ƒ Behavioral transformations allow exploring the design
space a high-level
Optimization metrics:
power
1. Area of the design
2. Throughput or sample time TS
3. Latency: clock cycles between
the input and associated output
change
4. Power consumption
5. Energy of executing a task
6. …
L15: 6.111 Spring 2006
area
time
Introductory Digital Systems Laboratory
13
Fixed-Coefficient Multiplication
Conventional Multiplication
Z=X·Y
Z7
X3 · Y3
Z6
X3 · Y2
X2 · Y3
Z5
X3 · Y1
X2 · Y2
X1 · Y3
Z4
X3
Y3
X3 · Y0
X2 · Y1
X1 · Y2
X0 · Y3
Z3
X2
Y2
X2 · Y0
X1 · Y1
X0 · Y2
X1
Y1
X1 · Y0
X0 · Y1
X0
Y0
X0 · Y0
Z2
Z1
Z0
Constant multiplication (become hardwired shifts and adds)
Z = X · (1001)2
Z7
Y = (1001)2 = 23 + 20
X3
Z6
X2
Z5
X1
Z4
X3
1
X3
X0
Z3
X
X1
0
X1
X0
1
X0
Z2
Z1
Z0
Z
<< 3
L15: 6.111 Spring 2006
X2
0
X2
Introductory Digital Systems Laboratory
shifts using wiring
14
Transform: Canonical Signed Digits (CSD)
Canonical signed digit representation is used to increase the number of
zeros. It uses digits {-1, 0, 1} instead of only {0, 1}.
Iterative encoding: replace
string of consecutive 1’s
1
0
1 … 1
1
1
0
2N-2 + … + 21 + 20
0 … 0 -1
2N-1 - 20
Worst case CSD has 50% non zero bits
01101111
0
1
1
0
1
1
1
1
1
1
1
0
0
0
-1
1
0
0
-1
0
0
0
-1
=
0
10010001
X
<< 7
Z
<< 4
Shift translates to re-wiring
L15: 6.111 Spring 2006
Introductory Digital Systems Laboratory
15
Algebraic Transformations
Distributivity
Commutativity
A
B
B
A
A
A
B
C
B
C
⇔
⇔
A+B=B+A
(A + B) C = AB + BC
Common sub-expressions
Associativity
A
C
B
B
C
⇔
Y
X
A
Y
X
X
⇔
A
B
A
B
(A + B) + C = A + (B+C)
L15: 6.111 Spring 2006
Introductory Digital Systems Laboratory
16
Transforms for Efficient Resource Utilization
A
B
C
D
E
FG
H
I
Time multiplexing: mapped to
3 multipliers and 3 adders
1
2
distributivity
A
C
B D
E
FG
H
I
1
Reduce number of operators
to 2 multipliers and 2 adders
2
L15: 6.111 Spring 2006
Introductory Digital Systems Laboratory
17
A Very Useful Transform: Retiming
Retiming is the action of moving delay around in the systems
ƒ Delays have to be moved from ALL inputs to ALL outputs or vice versa
D
D
D
D
D
Cutset retiming: A cutset intersects the edges, such that this would result in two disjoint
partitions of these edges being cut. To retime, delays are moved from the ingoing to the
outgoing edges or vice versa.
D
D
D
Benefits of retiming:
• Modify critical path delay
• Reduce total number of registers
L15: 6.111 Spring 2006
Introductory Digital Systems Laboratory
Courtesy of Prof. Charles E. Leiserson.
18
Retiming Example: FIR Filter
x(n)
h(0)
Symbol for multiplication
D
D
D
h(1)
h(2)
Direct form
h(3)
y (n) = h(n) ⊗ x(n) = ∑ x(n − i ) ⋅ h(i )
i =0
y(n)
x(n)
(10) h(0)
K
associativity of
the addition
D
D
D
h(1)
h(2)
h(3)
Tclk = 22 ns
y(n)
retime
(4)
x(n)
h(0)
y(n)
h(1)
D
h(2)
D
h(3)
Transposed form
Tclk = 14 ns
D
Note: here we use a first cut analysis that assumes the delay of a chain of operators is the sum
of their individual delays. This is not accurate.
L15: 6.111 Spring 2006
Introductory Digital Systems Laboratory
19
Pipelining, Just Another Transformation
(Pipelining = Adding Delays + Retiming)
Contrary to retiming,
pipelining adds extra registers
to the system
add input
registers
D
D
D
D
How to pipeline:
1. Add extra registers at
all inputs
2. Retime
retime
D
L15: 6.111 Spring 2006
D
D
D
D
Introductory Digital Systems Laboratory
20
The Power of Transforms: Lookahead
loop
unrolling
y(n)
x(n)
A
y(n)
x(n)
y(n) = x(n) + A y(n-1)
2D
A
D
A
D
y(n) = x(n) + A[x(n-1) + A y(n-2)]
Try pipelining
this structure
distributivity
How about pipelining
this structure!
y(n)
x(n)
2D
D
A
A
A
associativity
y(n)
x(n)
y(n)
x(n)
A
D
D
D
retiming
2D
D
A
A2
L15: 6.111 Spring 2006
A2
precomputed
Introductory Digital Systems Laboratory
21
Key Concern in Modern VLSI: Variations!
GATE
DRAIN
SOURCE
100
D
Tox
90
BODY
80
Leff
Random Dopant Fluctuations
60
50
10000
40
1000
Temp Variation & Hot spots
100
10
1000
500
250
130
65
32
Technology Node (nm)
Path Delay
With 100b transistors,
1b unusable (variations)
Probability
Mean Number of Dopant
Atoms
70
Temperature (C)
110
Due to
variations in:
Vdd, Vt, and
Temp
Delay
Deterministic design techniques inadequate in the future
L15: 6.111 Spring 2006
Courtesy of Shekhar Y. Borkar (Intel). Used with permission.
Introductory Digital Systems Laboratory
22
Trends: “Chip in a Day”
(Matlab/Simulink to Silicon…)
S reg
Mult1
Mac1
X reg
Add,
Sub,
Shift
Mult2
Mac2
Map algorithms directly to silicon - bypass writing Verilog!
(Courtesy of R. Brodersen. Used with permission.)
L15: 6.111 Spring 2006
Introductory Digital Systems Laboratory
23
Trends: Watermarking of Digital Designs
Fingerprinting is a technique to deter people from illegally
redistributing legally obtained IP by enabling the author of the IP to
uniquely identify the original buyer of the resold copy.
The essence of the watermarking approach is to encode the author's
signature. The selection, encoding, and embedding of the signature
must result in minimal performance and storage overhead.
Image removed due to
copyright restrictions.
L15: 6.111 Spring 2006
Image removed due to
copyright restrictions.
Introductory Digital Systems Laboratory
24
Download