Physical Synthesis 2.0 - International Workshop on Logic and

Physical Synthesis 2.0
Andrew B. Kahng
UCSD CSE and ECE Departments
abk@ucsd.edu
http://vlsicad.ucsd.edu
A. B. Kahng, Physical Synthesis 2.0, IWLS-2015 Keynote
Concept: “Design Principles”
• Partition the problem → divide and conquer, hierarchy
  • Different abstraction levels: RT-level, gate-level, switch-level, transistor-level
• Orthogonalize concerns
  • Function vs. implementation
  • Logic vs. timing vs. embedding
• Solve chicken-egg conundrums
• Constrain the design space to simplify the design process
  • Balance between design complexity and performance
  • E.g., standard-cell methodology → “freedom from choice”
ECE 260B – CSE 241A Intro and ASIC Flow
Andrew B. Kahng, UCSD
Concept: How the IC Design Flow is Evolving
• Flow expands in two directions
  • System-Level Design
  • Design for Manufacturability (DFM)
• More design care-abouts
  • Area, Timing, Power, Signal Integrity, Reliability, Cost
• Key challenges: loops, chicken-egg
  • “Design closure” through tight integrations
  • RTL, GDSII “signoffs” = business structure of semiconductor creation
• “One-pass flow”: required for Productivity, requires Predictability
  • By Guardbands?
  • By “Unifications”?
  • By Statistics?
  • By Methodology (to avoid issues)?
[Figure: design flow with stages Architecture Design, High Level Synthesis, RTL, Verification, Logic Synthesis, Gate Netlist, FP/Place/CTS/Opt, Updated Gate Netlist, Routing, Extraction/Timing/Physical Verification, GDSII, Manufacturing]
Outline
• Why Physical Synthesis
• Physical Synthesis 1.0
• Example Challenges / Stressors
  • FinFET
  • Noise and Chaos
  • Clock Skew
  • Complexity and Hyperlocality
  • Better (and, more complex) Signoff
  • New Mixed-Height Sweet Spot?
• Physical Synthesis 2.0 ?
Logic Design Needs Spatial Information
• High aspect ratio floorplan: shift one macro block from left to
right, and vary its shape (with constant area)
• 10% power range (post-route): center location, taller blockage
= more power, more contribution of wire (delays)
• Separation of logical, temporal, spatial must crumble
[Figure: post-route power (mW), axis range 190 to 230, vs. shift of the blockage location (0% to 100%), for macro sizes 260µm x 65µm and 184µm x 92µm]
How Do We Predict Spatial Information ?
• Predict by modeling
• Machine learning, regression, etc.
• (Don’t dismiss this!)
[SLIP15] http://vlsicad.ucsd.edu/Publications/Conferences/325/c325.pdf
[DAC00] http://vlsicad.ucsd.edu/Publications/Conferences/112/c112.pdf
[DATE13] http://vlsicad.ucsd.edu/Publications/Conferences/296/c296.pdf
[SLIP13] http://vlsicad.ucsd.edu/Publications/Conferences/300/c300.pdf
• Predict by assuming and enforcing
• Make a prediction, then make the prediction come true
• (Constant-delay methodology)
• Predict by doing
• Constructive prediction
• (Run under the hood – quick and dirty, else no leverage)
Synthesis vs. Physical Synthesis
• Synthesis (DC, RC)
  • Elaboration, mapping to generic gates
  • Clock gating
  • Apply timing constraints, remap / optimize
  • Multibit FF optimization
  • MBIST insertion
  • Scan chain stitching
  • Further optimization, area recovery
• Physical Synthesis (DCT/DCG, RCP)
  • LEF list
  • Tech file, map file
  • tluplus_{max,min}
  • floorplan DEF
  • {min,max}_routing_layer
Physical Synthesis
• In
• RTL + SDC + Library models + Floorplan DEF
• Out
• Better netlist (usually), at one (worst) corner
• Better netlist (usually) + placed DEF (not legalized)
• N.B.: very fast TAT required by customers
• Netlist (+ placed DEF) is passed to P&R + signoff
• Place, placeOpt, CTS, CTSOpt, route, routeOpt, leakage
recovery, timing closure
• Different companies and tools in a long tool chain
Example
[Figure: physical synthesis flow. Inputs to physical synthesis (e.g., DCT): RC tech file (tluplus, captable); libraries, LEF, tech files; floorplan information specified by designers (floorplan in DEF or physical guidance). Output: netlist + initial placement, passed to the P&R flow, which produces routed results.]
Note: “P&R + Signoff” is Complicated!
• N. MacDonald, Broadcom Corp., “Timing Closure in Deep
Submicron Designs”, 2010 DAC Knowledge Center article
[Figure: closure loop, about 5 iterations. Top-level and block-level netlist / SPEF; static timing analysis for all modes / corners; timing closed, or else breakdown of timing violations on a per-block basis followed by manual repair of timing failures]
Operations permitted at each iteration (in order of preference):
(1) Vt Swap, Resizing, Buffer Insertion, NDR Changes, Useful Skew
(2) Vt Swap, Resizing, Buffer Insertion, NDR Changes
(3) Vt Swap, Resizing, Buffer Insertion
(4) Vt Swap, Resizing
(5) Vt Swap
Violation classes addressed for each iteration (in order of priority):
(1) Electrical Rule Violations
(2) Noise Violations
(3) Setup Violations
(4) Hold Violations
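The shrinking set of permitted operations can be written down as a small lookup. This is only a sketch of the list on the slide: the names `OPERATIONS` and `permitted_operations` are hypothetical, and the slide does not describe any tool interface.

```python
# Operations from the slide, ordered so that each later iteration drops the last entry.
OPERATIONS = ["Vt Swap", "Resizing", "Buffer Insertion", "NDR Changes", "Useful Skew"]


def permitted_operations(iteration: int) -> list[str]:
    """Return the operations allowed at a given closure iteration (1-based)."""
    if not 1 <= iteration <= len(OPERATIONS):
        raise ValueError("iteration must be between 1 and 5")
    # Iteration 1 allows everything; iteration 5 allows only Vt Swap.
    return OPERATIONS[: len(OPERATIONS) - (iteration - 1)]


for i in range(1, 6):
    print(i, permitted_operations(i))
```

Each iteration keeps a prefix of the same list, so the changes allowed late in the loop are always a subset of those allowed earlier.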
[DAC15]
Since That Article Was Written:
[Figure: timeline of signoff concerns accumulating across technology nodes (90nm, 65nm, 45/40nm, 28nm, 20nm, 16/14nm, 10nm, ≤7nm): temp inversion, maxtrans, dynamic IR, PBA, fixed-margin spec, noise, EM, MCMM, multi-patterning, MOL/BEOL R, MIS, cell-POCV, phys-aware timing ECO, AOCV / POCV, min implant, LVF, BTI, BEOL/MOL variations, signoff criteria with AVS, SOC complexity, fill effects, layout rules]
How Can Physical Synthesis Possibly Work?
• “If it sounds too good to be true, it usually is …”
• What do we do with constraints at (physical) synthesis
stage?
  • Overconstrain the clock period in synthesis (was by 20%, now by ~10%)
  • Utilization: 60% target in synthesis (sometimes 50%, 55%) → 85+% post-placement
  • Which detailed placer, CTS tool, router, optimizer?
  • Complex tool “sensitivities” (noisy, chaotic behavior)
  • Information that is ignored (advanced manufacturing)
  • Information that is never available (CTS, SI)
• What explains “success”? Guardbands, low expectations…?
• Designers’ preoccupation with area and schedule helps…
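As a rough illustration of the two guardbands quoted above, the sketch below turns them into numbers. It assumes that "overconstrain by X%" means multiplying the signoff period by (1 - X), and that the floorplan area is unchanged between synthesis and placement; neither assumption is stated on the slide, and the function names are hypothetical.

```python
def synthesis_clock_period(signoff_period_ns: float, overconstraint: float) -> float:
    """Clock period handed to synthesis, tightened by a fractional margin (assumed meaning)."""
    return signoff_period_ns * (1.0 - overconstraint)


def area_growth_headroom(synth_util: float, placed_util: float) -> float:
    """Factor by which cell area could grow between the two utilization points,
    assuming the floorplan area stays the same."""
    return placed_util / synth_util


print(synthesis_clock_period(1.0, 0.20))            # older practice: 20% tighter
print(synthesis_clock_period(1.0, 0.10))            # current practice: ~10% tighter
print(round(area_growth_headroom(0.60, 0.85), 2))   # 60% -> 85% utilization
```

Under these assumptions, a 1.0ns signoff target is synthesized at 0.9ns, and the utilization gap leaves room for roughly 1.4x growth in cell area during placement and optimization.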
Challenges
• FinFET, BEOL scaling effects
• Drive
• Resistivity
• Gate-wire balance
• Clock effects
• Skew across corners
• Top-level clock distribution (CGCs, muxes, dividers, …)
• Useful skews = area vs. delay tradeoffs
• “Extreme localization” effects
• Advanced (multi-)patterning
• Pin access, congestion, coupling
• Breakdown of placement-optimization separation
Questions
• If Logic Synthesis can’t know outcomes at end of
Physical Design, can it be doing the right thing?
(Simple information arguments) (What margin is left on the table? Are
we seeing placebo effects (association vs. causation etc.)?)
• Can Logic Synthesis be made better aware of
future Physical Design outcomes?
• Is Logic Synthesis at risk of being eclipsed by
Physical Design? (Venus-Mars → Sun-Moon, etc.)
FinFET: Current Density + Discreteness
• Better electrostatic control + continued gate length scaling
• Drive current ↑ → cell height ↓ (e.g., 8.25T), better area density (w/ fin height ↑)
• Effective width 1.6x at equivalent area vs. planar devices
• Current density ↑, plus fin discreteness challenges
[Figure: multi-fin 3D FinFET. Source: http://www.synopsys.com/Company/Publications/DWTB/Pages/dwtb‐finfet‐jan2013.aspx]
[Figure: FinFET standard-cell layout showing NWell, fin, poly, active, MOL1, MOL2, M1, M2, VIA0 (MOLx → M1), VIA1 (M1 → M2), with dimension labels 1Pfin, 2Pfin, 3Pfin, 4Ppoly. Source: http://www.synopsys.com/Company/Publications/DWTB/Pages/dwtb‐finfet‐process‐soc‐2015q1.aspx]
FinFET: Aggressive Voltage Scaling
• FinFET enables voltage scaling for reduced dynamic
power
• Better electrostatic control → better performance at low supply
voltage
• High-performance mode: wire-dominated
• Low-performance mode: gate-dominated
C. H. Lin, VLSI‐TSA, 2012, p. 1‐2.
[DAC15]
Gate-Wire Balancing
• Unbalanced gate-wire delay causes severe delay variation
on data and clock paths across modes
• Delay variation in clock paths == skew variation
→ Increased difficulty for timing closure (“ping-pong effect”)
• Minimization of skew variation is important for timing closure
(Our DAC15 work uses global-local optimization and achieves a 22% reduction in skew variation)

Corner             Clock latency (Launch)   Clock latency (Capture)   Skew
SS, 0.7V, ‐25°C    1.0                      1.1                       ‐0.1
FF, 1.1V, ‐25°C    0.9                      0.7                       +0.2

[Figure: launch path and capture path clock latencies around a datapath; skew = -0.1 / +0.2 across the two corners]
• Low voltage: gate delay dominates
• High voltage: wire delay dominates
→ Skew reversal
→ Power/area overheads
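The skew column of the table above can be reproduced from the launch and capture latencies. The sketch uses the sign convention implied by the table (launch minus capture) and takes "skew variation" to be the spread of skew across the listed corners; that definition is an assumption for illustration and may differ from the one in the cited DAC15 work.

```python
# Clock latencies from the slide's table: corner -> (launch, capture).
LATENCIES = {
    "SS, 0.7V, -25C": (1.0, 1.1),
    "FF, 1.1V, -25C": (0.9, 0.7),
}


def skew(launch: float, capture: float) -> float:
    """Skew with the table's sign convention: launch latency minus capture latency."""
    return launch - capture


# Round to suppress floating-point noise in the subtraction.
skews = {corner: round(skew(*lat), 3) for corner, lat in LATENCIES.items()}
skew_variation = round(max(skews.values()) - min(skews.values()), 3)

print(skews)
print(skew_variation)
```

The sign of the skew flips between the two corners, which is the "skew reversal" named on the slide.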
FinFET: Less Body Effect, Richer Libraries?
• FinFET 4-input NAND ~ planar bulk 3-input NAND
• More complex cells / higher fan-in cells could be
made available to synthesis
[Figure: number of fan-in limited by body effect. Source: ‘Bulk FinFETs: Fundamentals, Modeling, and Application’, Jong‐Ho Lee, SNU]
[DAC15]
Pin Accessibility Below 20nm
• Routing challenged by complex rules for multi-patterning
[Figure: an inserted via blocks neighboring access points (< MinOverlap, < MinSpacing); metal pitch < via pitch]
• Limited pin access with small track cells
• Wider power rail for reliable connection → fewer pin access points
• Complex design rules + less pin access → pin accessibility problem
• Access point difficulty in routing → conflict between area reduction and routability
[Figure: 9T NAND2 layout (Poly, Fin, M1, V1, M2) with wider power rail]
Noise and Chaos
[ISQED02] http://vlsicad.ucsd.edu/Publications/Conferences/131/c131.pdf
[ISQED10] http://vlsicad.ucsd.edu/Publications/Conferences/267/c267.pdf
Slack vs. Layout Context
• Layout knobs: SRAM pitches and buffer keepout distances
• Post-P&R slacks of five embedded memories are “chaotic”
• Physical synthesis challenge: Logic optimization given “chaos”
[Figure: floorplan with five SRAMs (1 to 5), blockages, buffer keepouts, and the placement region for standard cells; sram_pitch marks the SRAM spacing]
[Figure: WNS of paths through SRAMs (ns), axis range ‐1.3 to ‐0.7, vs. SRAM pitch (um), 0 to 30, for slack‐1 through slack‐5; delta slack > 300ps]
Testcase: Logic from OpenCores GPU THEIA + SRAMs
[SLIP15]
Slack vs. Clock Period
• ∆path slack is 81ps at signoff clock period of 1.0ns
• Changing clock period to 0.82ns changes ∆path
slack to 143ps!
[Figure: max delta path slack (SI – non‐SI) (ns) vs. clock period (ns), 0.80 to 1.30; 81ps at signoff clock period, 143ps at tighter clock period]
[SLIP15]
Non-SI vs. SI
• Top-1000 critical paths from Viterbi design (clock period = 1.0ns)
• Slack diverges by 81ps !!! ~4 stages of logic at 28nm FDSOI
• Unfortunately, we don’t know coupling before routing !!!
[Figure: path slack in non‐SI mode (ns) vs. path slack in SI mode (ns), with ideal-correlation line; divergence of 81ps]
[DAC15]
WLM, RC (Interconnect proxy) Effects
[Figure: 3DIC power (mW), axis range 20.8 to 23, vs. WLM cap (pF), 0 to 1.2; power range of 1.35mW (6.43%)]
• Example: SOCE-based “Shrunk2D” (S2D) flow [1]
• Perform synthesis with different WLM caps, P&R with S2D flow
• Shown: total power (#buffers, #instances, instance area, WL, …
similar)
[1] Panth et al., “Design and CAD Methodologies for Low Power Gate‐Level Monolithic 3D ICs”, Proc. ISLPED, 2014, pp. 171‐176.
[SLIP13]
Sensitivity of CTS Outcomes to Layout Contexts
[Figure: fall delay (ps), axis range 0 to 800, vs. core aspect ratio (0.1 to 10.00), for clock entry points BL, BLM, B, RBM, R]
• Delay varies by up to 43% with clock entry point locations
• Delay varies by up to 45% with core aspect ratio
• NDRs, fill, buffer sizes, max fanout / max trans rules, …
→ 100ps impacts on insertion delays, skew, slacks
[ISQED14]
Useful Skew Improves Timing
• Useful skew optimization adjusts clock sink latencies to
improve timing
• Our predictive useful skew flow resolves the “chicken-and-egg
loop” → further improved timing
[Figure: zero-skew vs. useful-skew clocking of FF1, FF2, FF3, annotated with path delay/slack and clock latency]
[Figure: total negative slack over 6 testcases {3 RTLs x 2 clock periods}: zero skew -893, typical useful skew -197, predictive useful skew -60; useful skew improves timing]
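A minimal sketch of why shifting clock latencies can improve total negative slack. The clock period, path delays, and latencies below are hypothetical and are not the values in the slide's figure; setup time and hold checks are ignored for brevity.

```python
def setup_slack(period, delay, launch_latency, capture_latency):
    """Setup slack of one path: capture clock edge arrival minus data arrival."""
    return period + capture_latency - launch_latency - delay


def total_negative_slack(period, paths, latency):
    """Sum of the negative setup slacks over (launch, capture, delay) paths."""
    slacks = [setup_slack(period, d, latency[a], latency[b]) for a, b, d in paths]
    return sum(s for s in slacks if s < 0)


PERIOD = 10
PATHS = [("FF1", "FF2", 12), ("FF2", "FF3", 7)]  # hypothetical path delays

zero_skew = {"FF1": 0, "FF2": 0, "FF3": 0}
useful_skew = {"FF1": 0, "FF2": 2, "FF3": 0}  # delay FF2's clock by 2 units

print(total_negative_slack(PERIOD, PATHS, zero_skew))    # -2
print(total_negative_slack(PERIOD, PATHS, useful_skew))  # 0
```

Delaying FF2's clock lends two units of the slack on the FF2 to FF3 path to the failing FF1 to FF2 path, so both paths meet timing.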
Conventional Useful Skew Optimization
• Standard useful skew flow has chicken-egg problem
  • Netlist and placement assume zero skew
  • Useful skew optimization relies on placement
• One solution: Back-annotation flows (large runtime)
  • Wang et al. in DAC06 propose to back‐annotate useful skew from post‐placement to before‐synthesis
[Figure: flow with stages RTL netlist, Synthesis, Placement / Place Opt., CTS, CTS Opt., Skew_opt, Routing / Route Opt., and a back-annotation loop]
NOLO: No-Loop Useful Skew Optimization
• Our work: Cure the chicken-egg problem with delay prediction
[Figure: flow with stages RTL netlist, Synthesis w/ Multi-Vt, Synthesis w/ LVT (LVT-only netlist), Predictive Useful Skew, Placement/Place Opt., CTS/CTS Opt., Routing/Route Opt.]
• Use setup slacks from LVT-only synthesis
→ estimation of achievable slacks
• Use hold slacks from multi-VT synthesis
→ reduce pessimism
• Advantage: One-pass approach, not
constrained by placement
Experimental Results
• Predictive flow achieves similar or better timing with much
smaller runtime
[Figure: runtime (min) vs. TNS (ns) for aes_cipher, des_perf, jpeg_encoder, and mpeg2, comparing back annotation (BA), prediction (w/ LVT-only syn), prediction (w/o LVT-only syn), and the average of various BA flows]
BEOL Multi-Patterning Impacts
[Figure: mandrel, spacer, Mx metal, line-end cuts, line-end extensions, floating fill wires; dimensions Mwidth, Mspace, Swidth]
Wire1width = Mwidth
Wire2width = Mspace – 2*Swidth
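The two width relations on this slide (Wire1width = Mwidth, Wire2width = Mspace – 2*Swidth) can be checked numerically. The function name and the example dimensions below are hypothetical; only the two formulas come from the slide.

```python
def wire_widths(m_width: float, m_space: float, s_width: float) -> tuple[float, float]:
    """Return (Wire1width, Wire2width) from mandrel width, mandrel space, spacer width."""
    wire1 = m_width                  # Wire1width = Mwidth
    wire2 = m_space - 2 * s_width    # Wire2width = Mspace - 2*Swidth
    if wire2 <= 0:
        raise ValueError("Wire2width would be non-positive for these dimensions")
    return wire1, wire2


# Hypothetical dimensions in nm.
print(wire_widths(m_width=24, m_space=72, s_width=24))  # (24, 24)
```

The second wire's width is set by the mandrel spacing and spacer width rather than drawn directly, so the two wire populations respond differently to variation in those dimensions.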
[ICCAD15]
Placement-Sizing Interference
• New “interferences” between post-layout optimization
and P&R
• Rules for device layers (FEOL) become considerably
more complex and restrictive
• Minimum implant width rules for implant region
• Minimum notch and jog width rule for oxide diffusion (OD)
[Figure: neighboring HVT and LVT cells, showing implant regions, OD, and cell boundaries]
[ICCAD15]
Placement-Sizing Interference (cont.)
• Drain-to-drain abutment (DDA)
[Figure: abutting cells with drain (D) and source (S) terminals; legend: poly, active region, cell boundary, connection, power/ground]
• Example solution
[Figure: placement annotated with DDA, min implant width, and min jog/notch width violations]
→ Intertwine the historically separate tasks of P&R and post‐route optimization
[ISQED14]
I. Flexible Timing Models
• Setup time, hold time and clock-to-q (c2q) delay of FF
⇒ values interdependent, but NOT fixed
• Flexible FF timing model can exploit operating (function/test) modes
⇒ “Free” pessimism reduction in STA
• Goal: Find best {setup, hold, c2q} for each FF instance
• Sequential LP:
  • setup-c2q opt
  • hold-c2q opt
[Figure: setup‐hold‐c2q fixed model vs. setup‐hold‐c2q flexible model (c2q1 ... c2qn); c2q‐setup‐hold surface with c2q vs. setup and c2q vs. hold curves]
Flexible Timing Model → Recover Margin
• Independent datapaths in PBA: using fixed FF timing
model loses performance optimization opportunity
[Figure: paths among FF1, FF2, FF3; c2q and setup values of 10ps or 20ps, datapath delays of 460ps to 480ps, path totals annotated 500ps (and “520ps?”)]
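To illustrate the idea, the sketch below picks a (setup, c2q) operating point per flip-flop so that the worst path total is minimized. It uses brute-force enumeration over two hypothetical operating points, not the sequential LP named in the talk, and the path delays are invented for the example; hold is omitted.

```python
from itertools import product

# Hypothetical (setup, c2q) operating points in ps: lower setup costs higher c2q.
OPTIONS = [(10, 20), (20, 10)]
FLOPS = ["FF1", "FF2", "FF3"]
# Hypothetical (launch FF, capture FF, datapath delay in ps).
PATHS = [("FF1", "FF2", 480), ("FF2", "FF3", 460)]


def worst_path(choice):
    """Largest c2q(launch) + datapath + setup(capture) over all paths."""
    return max(choice[a][1] + d + choice[b][0] for a, b, d in PATHS)


# Fixed model: every flip-flop uses the same operating point.
fixed_worst = worst_path({ff: OPTIONS[0] for ff in FLOPS})

# Flexible model: try every per-flip-flop combination and keep the best.
best_worst = min(
    worst_path(dict(zip(FLOPS, combo)))
    for combo in product(OPTIONS, repeat=len(FLOPS))
)

print(fixed_worst, best_worst)  # 510 500
```

Enumeration grows exponentially with the number of flip-flops, which is one reason a real flow would need an LP-style formulation instead.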
Improved Timing Signoff Flow
[Figure: flow. Netlist (and SPEF, if routed) → Extract path timing information → LP formulation with flexible flip‐flop timing model → Solve Sequential LP (STA_FTmax, STA_FTmin) → Solution → Annotate new timing model for each flip‐flop → Timing signoff with annotated timing]
Takeaways
• Fix timing violations “for free”
• 48ps average improvement of slack over 5 designs in a foundry 65nm technology
Next
• Better exploitation of disjoint cycles/modes
• More accurate modeling of setup-hold-c2q tradeoff
• Circuit optimization should natively exploit FF timing model flexibility
[DATE13]
II. Signoff Definition (e.g., with AVS, Aging)
• VBTI : Voltage for BTI‐aging estimation
• Vlib : Supply voltage for timing library characterization
• Vfinal: Vdd of a circuit with AVS at end‐of‐lifetime
[Figure: Chicken & Egg Loop. Circuit implementation and signoff use a derated library and depend on VBTI and Vlib; VBTI and Vlib depend on aging during AVS (Vfinal); BTI degradation and AVS (|Vt|, Vfinal) depend on the circuit]
Observations and Heuristics
Observation #1: Vfinal is not sensitive to cells along the timing‐critical path
Observation #2: ΔVt with a constant Vfinal
throughout lifetime ≈ adaptive Vdd
Heuristic #1: Use average of
critical path replicas to
estimate Vfinal (Vheur)
Heuristic #2: approximate Vdd in AVS by constant Vheur
Solve “Chicken & Egg Loop” by having VBTI = Vlib = Vheur≈ Vfinal
“Knee” Point for Signoff Definition
• Optimistic aging library → large power penalty
• Ignore AVS → larger area
• Overly pessimistic aging library → large area penalty

             Low Vlib                      High Vlib
Low VBTI     Slower circuit, less aging    Faster circuit, less aging
High VBTI    Slower circuit, more aging    Faster circuit, more aging

Our method finds “Knee” point for balanced area and power tradeoff
Experiment setup:
• DC/AC BTI @ 125°C
• 32nm PTM technology
• 4 benchmark circuit implementations
Mixed Cell Height Implementation (!)
[ICCAD15]
• Large cell height → better timing, but large area and power
• Small cell height → smaller area/power per gate, but large delay
and more #buffers
• Mixing cell heights enables tradeoffs between performance and
area/power (recall FinFET introduction!) → better design QoR
• E.g., use large-height high-fanin cells to improve pin accessibility
• Already have flop trays, etc. as problematic multi-height instances
[Figure: mixed-height layout in 28nm LP; 12T cells in red = larger area, smaller delay; 8T cells in blue = smaller area, larger delay]
Cost of Mixing Cell Heights
• “Breaker cells” are required to align regions with different cell heights
→ Optimization must comprehend corresponding area cost
[Figure: breaker-cell geometry between 8T and 12T regions. X-directional shift: four sites; Y-directional shift: one M2 pitch; assume M2 pitch = 64nm (dimension labels 64nm, 48nm, 64nm); legend: cell boundary, P/G rail, no routing blockage, routing blockage on M1/M2]
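For a sense of scale, the sketch below converts the 8T and 12T track counts into heights using the slide's assumed M2 pitch of 64nm. It assumes an N-track cell is N M2 pitches tall, which the slide does not state explicitly, and the function name is hypothetical.

```python
from math import lcm

M2_PITCH_NM = 64  # "Assume: M2 pitch = 64nm" on the slide


def cell_height_nm(tracks: int, pitch_nm: int = M2_PITCH_NM) -> int:
    """Cell height, assuming an N-track cell is N routing pitches tall."""
    return tracks * pitch_nm


h8 = cell_height_nm(8)
h12 = cell_height_nm(12)
# Smallest height at which stacked 8T rows and stacked 12T rows end at the same y.
realign = lcm(h8, h12)

print(h8, h12)                                # 512 768
print(realign, realign // h8, realign // h12)  # 1536 3 2
```

Under that assumption, row boundaries of the two heights coincide only every three 8T rows (two 12T rows), which gives some intuition for why regions of different heights need explicit alignment.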
Optimization Flow
[Figure: flow. Synthesis → Initial placement → Partitioning → Legalization → Floorplan Update → Cell mapping → Routing / RouteOpt]
• Initial placement uses modified LEF
→ enable optimization with a conventional flow
• Slicing-based partition with DP to divide die area into regions with
different cell heights
• Internal-timer guided placement legalization
• Floorplan update with “breaker cell” penalty
• Row-based cell mapping places cells onto rows with corresponding heights
Example of Optimization Flow
[Figure: stages of the flow on the AES design in 28nm LP: initial placement (8T/12T cells are “freely” placed), partitioning (yellow blocks = regions), legalization, new floorplan, mixed-height placement; 8T cells are in blue, 12T cells are in red]
Benefits from Mixing Cell Heights
• Technology: 28nm LP (12T/8T) Design: AES
• 25% area reduction as compared to 12T-only design
• 20% performance improvement compared to 8T-only design
Physical Synthesis 2.0
• It’s the predictability! (and, prediction is challenged…)
• New devices and patterning technologies
• Complex PD tool chain; chaotic behavior of tools and flows
• Oblivious to clocks, corners, coupling → how can Physical
Synthesis be doing the right thing? (= target for margin recovery!)
• What will Physical Synthesis 2.0 look like?
• (1) Higher-level value: what Physical Design cannot do
• Datapath architecture selection
• Resource sharing
• Mux mapping
• (2) Other types of prediction (machine learning, big data, etc.) !
• (3) Constructive prediction deeper into implementation flow
• (More integration… ) Clock and MCMM awareness
• Hyperlocality awareness: coloring, congestion, coupling, interactions …
THANK YOU !