Physical Synthesis 2.0
Andrew B. Kahng
UCSD CSE and ECE Departments
abk@ucsd.edu
http://vlsicad.ucsd.edu
A. B. Kahng, Physical Synthesis 2.0, IWLS-2015 Keynote

[UCSD ECE 260B CSE 241A] Concept: “Design Principles”
• Partition the problem
  • Divide and conquer, hierarchy
  • Different abstraction levels: RT-level, gate-level, switch-level, transistor-level
• Orthogonalize concerns
  • Function vs. implementation
  • Logic vs. timing vs. embedding
  • Solve chicken-egg conundrums
• Constrain the design space to simplify the design process
  • Balance between design complexity and performance
  • E.g., standard-cell methodology: “freedom from choice”
ECE 260B – CSE 241A, Intro and ASIC Flow, Andrew B. Kahng, UCSD

[UCSD ECE 260B CSE 241A] Concept: How the IC Design Flow is Evolving
• Flow expands in two directions
  • System-Level Design
  • Design for Manufacturability (DFM)
• More design care-abouts
  • Area, Timing, Power, Signal Integrity, Reliability, Cost
• Key challenges: loops, chicken-egg
  • “Design closure” through tight integrations
• RTL, GDSII “signoffs” = business structure of semiconductor creation
• “One-pass flow”: required for Productivity, requires Predictability
  • By Guardbands? By “Unifications”? By Statistics? By Methodology (to avoid issues)?
[Figure: design flow with stages labeled Architecture Design; High Level Synthesis; RTL Verification; Logic Synthesis; Gate Netlist; FP, Place, CTS, Opt; Updated Gate Netlist; Extraction, Timing, Physical Verification; Routing; GDSII; Manufacturing]

Outline
• Why Physical Synthesis
• Physical Synthesis 1.0
• Example Challenges / Stressors
  • FinFET
  • Noise and Chaos
  • Clock Skew
  • Complexity and Hyperlocality
  • Better (and, more complex) Signoff
  • New Mixed-Height Sweet Spot?
• Physical Synthesis 2.0 ?

Logic Design Needs Spatial Information
• High aspect ratio floorplan: shift one macro block from left to right, and vary its shape (with constant area)
• 10% power range (post-route): center location, taller blockage = more power, more contribution of wire (delays)
• Separation of logical, temporal, spatial must crumble
[Figure: post-route power (mW, axis 190 to 230) vs. location of the shifted blockage (0% to 100%), for macro sizes 260µm x 65µm and 184µm x 92µm]

How Do We Predict Spatial Information ?
• Predict by modeling
  • Machine learning, regression, etc.
  • (Don’t dismiss this!)
  • [SLIP15] http://vlsicad.ucsd.edu/Publications/Conferences/325/c325.pdf
  • [DAC00] http://vlsicad.ucsd.edu/Publications/Conferences/112/c112.pdf
  • [DATE13] http://vlsicad.ucsd.edu/Publications/Conferences/296/c296.pdf
  • [SLIP13] http://vlsicad.ucsd.edu/Publications/Conferences/300/c300.pdf
• Predict by assuming and enforcing
  • Make a prediction, then make the prediction come true
  • (Constant-delay methodology)
• Predict by doing
  • Constructive prediction
  • (Run under the hood – quick and dirty, else no leverage)

Outline: Physical Synthesis 1.0

Synthesis vs. Physical Synthesis
• Synthesis (DC, RC)
  • Elaboration, mapping to generic gates
  • Clock gating
  • Apply timing constraints, remap / optimize
  • Multibit FF optimization
  • MBIST insertion
  • Scan chain stitching
  • Further optimization, area recovery
• Physical Synthesis (DCT/DCG, RCP) additionally takes
  • LEF list
  • Tech file, map file
  • tluplus_{max,min}
  • floorplan DEF
  • {min,max}_routing_layer

Physical Synthesis
• In
  • RTL + SDC + Library models + Floorplan DEF
• Out
  • Better netlist (usually), at one (worst) corner
  • Better netlist (usually) + placed DEF (not legalized)
  • N.B.: very fast TAT required by customers
• Netlist (+ placed DEF) is passed to P&R + signoff
  • Place, placeOpt, CTS, CTSOpt, route, routeOpt, leakage recovery, timing closure
  • Different companies and tools in a long tool chain

Example Physical Synthesis
[Figure: flow diagram with labels: RC tech file (tluplus, captable); floorplan information; floorplan specified by designers; physical information; libraries, LEF, tech files; e.g., DCT (Physical Synthesis); netlist + initial placement; floorplan in DEF or physical guidance; P&R flow; routed results]

Note: “P&R + Signoff” is Complicated!
• N. MacDonald, Broadcom Corp., “Timing Closure in Deep Submicron Designs”, 2010 DAC Knowledge Center article
• Inputs: top-level netlist / SPEF, block-level netlist / SPEF
• Static timing analysis for all modes / corners
• Breakdown of timing violations on a per-block basis
• Manual repair of timing failures
• About 5 iterations until timing is closed
• Operations permitted at each iteration (in order of preference)
  (1) Vt Swap, Resizing, Buffer Insertion, NDR Changes, Useful Skew
  (2) Vt Swap, Resizing, Buffer Insertion, NDR Changes
  (3) Vt Swap, Resizing, Buffer Insertion
  (4) Vt Swap, Resizing
  (5) Vt Swap
• Violation classes addressed for each iteration (in order of priority)
  (1) Electrical Rule Violations
  (2) Noise Violations
  (3) Setup Violations
  (4) Hold Violations
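
The repair scheme on the “P&R + Signoff” slide can be read as a menu of operations that shrinks with each iteration. The Python sketch below encodes that reading. Treating the numbered rows (1) to (5) as iteration indices is an assumption, and the names `REPAIR_OPS`, `VIOLATION_PRIORITY` and `permitted_ops` are hypothetical, not taken from the cited article.

```python
# Repair operations in the order they are listed on the slide.
REPAIR_OPS = ["Vt Swap", "Resizing", "Buffer Insertion", "NDR Changes", "Useful Skew"]

# Violation classes in the slide's priority order.
VIOLATION_PRIORITY = ["Electrical Rule", "Noise", "Setup", "Hold"]


def permitted_ops(iteration: int) -> list[str]:
    """Return the operations allowed at a given 1-based iteration.

    Assumption: row (k) on the slide describes iteration k, so every later
    iteration drops the last operation that was still allowed.
    """
    if not 1 <= iteration <= len(REPAIR_OPS):
        raise ValueError("iteration must be between 1 and 5")
    return REPAIR_OPS[: len(REPAIR_OPS) - iteration + 1]


for it in range(1, len(REPAIR_OPS) + 1):
    print(it, permitted_ops(it))
```

Under this reading the most disruptive knobs (useful skew, NDR changes) are only available early, and the final iteration is limited to Vt swaps.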

[DAC15] Since That Article Was Written:
[Figure: timeline of signoff care-abouts accumulating across nodes 90nm, 65nm, 45/40nm, 28nm, 20nm, 16/14nm, 10nm, ≤7nm. Items shown: temp inversion; maxtrans; dynamic IR; PBA; fixed-margin spec; noise; EM; MCMM; multi-patterning; MOL, BEOL R; MIS; cell-POCV; phys-aware timing ECO; AOCV / POCV; min implant; LVF; BTI; BEOL, MOL variations; signoff criteria with AVS; SOC complexity; fill effects; layout rules]

How Can Physical Synthesis Possibly Work?
• “If it sounds too good to be true, it usually is …”
• What do we do with constraints at (physical) synthesis stage?
  • Overconstrain the clock period in synthesis (was by 20%, now by ~10%)
  • Utilization: 60% target in synthesis (sometimes 50%, 55%) → 85+% post-placement
• Which detailed placer, CTS tool, router, optimizer?
• Complex tool “sensitivities” (noisy, chaotic behavior)
• Information that is ignored (advanced manufacturing)
• Information that is never available (CTS, SI)
• What explains “success”? Guardbands, low expectations…?
  • Designers’ preoccupation with area and schedule helps…

Challenges
• FinFET, BEOL scaling effects
  • Drive
  • Resistivity
  • Gate-wire balance
• Clock effects
  • Skew across corners
  • Top-level clock distribution (CGCs, muxes, dividers, …)
  • Useful skews = area vs. delay tradeoffs
• “Extreme localization” effects
  • Advanced (multi-)patterning
  • Pin access, congestion, coupling
  • Breakdown of placement-optimization separation

Questions
• If Logic Synthesis can’t know outcomes at end of Physical Design, can it be doing the right thing?
  • (Simple information arguments)
  • (What margin is left on the table? Are we seeing placebo effects (association vs. causation etc.)?)
• Can Logic Synthesis be made better aware of future Physical Design outcomes?
• Is Logic Synthesis at risk of being eclipsed by Physical Design? (Venus-Mars, Sun-Moon, etc.)
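
The clock-period guardband quoted on the “How Can Physical Synthesis Possibly Work?” slide is simple arithmetic, sketched below in Python. Reading “overconstrain by 10%” as “synthesize at a period 10% shorter than the signoff period” is an assumption, and the function name and the 1.0 ns example target are hypothetical.

```python
def overconstrained_period(signoff_period_ns: float, margin: float) -> float:
    """Clock period handed to synthesis after tightening by `margin`.

    Assumption: "overconstrain by 10%" means the synthesis period is
    10% shorter than the signoff period.
    """
    if not 0.0 <= margin < 1.0:
        raise ValueError("margin must be in [0, 1)")
    return signoff_period_ns * (1.0 - margin)


# Hypothetical 1.0 ns signoff target with the two margins quoted on the slide.
for margin in (0.20, 0.10):
    period = overconstrained_period(1.0, margin)
    print(f"margin {margin:.0%}: synthesize at {period:.2f} ns")
```

The difference between the two synthesis periods is the margin that better prediction has already recovered; the remaining ~10% is what the talk treats as still on the table.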

Outline: FinFET

FinFET: Current Density + Discreteness
• Better electrostatic control + continued gate length scaling
• Higher drive current → reduced cell height (e.g., 8.25T), better area density (w/ increased fin height)
• Effective width 1.6x at equivalent area vs. planar devices
• Higher current density, plus fin discreteness challenges
[Figure: multi-fin 3D FinFET. Source: http://www.synopsys.com/Company/Publications/DWTB/Pages/dwtb-finfet-jan2013.aspx]
[Figure: cell layout with layers Metal, VIA1 (M1 to M2), VIA0 (MOLx to M1), NWell, Poly, Fin, Active, M1, M2, MOL1, MOL2; devices labeled 1Pfin, 2Pfin, 3Pfin, 4Ppoly. Source: http://www.synopsys.com/Company/Publications/DWTB/Pages/dwtb-finfet-process-soc-2015q1.aspx]

FinFET: Aggressive Voltage Scaling
• FinFET enables voltage scaling for reduced dynamic power
• Better electrostatic control → better performance at low supply voltage
• High-performance mode: wire-dominated
• Low-performance mode: gate-dominated
C. H. Lin, VLSI-TSA, 2012, pp. 1-2.

[DAC15] Gate-Wire Balancing
• Unbalanced gate-wire delay causes severe delay variation on data and clock paths across modes
• Delay variation in clock paths == skew variation → increased difficulty for timing closure (“ping-pong effect”)
• Minimization of skew variation is important for timing closure (our work at DAC15 uses global-local optimization and achieves 22% skew variation reduction)
• Low voltage: gate delay dominates; high voltage: wire delay dominates → skew reversal, power/area overheads

Corner             Launch latency   Capture latency   Skew
SS, 0.7V, -25°C    1.0              1.1               -0.1
FF, 1.1V, -25°C    0.9              0.7               +0.2

[Figure: launch path, datapath and capture path annotated with per-corner clock latencies; skew = -0.1 / +0.2]

FinFET: Less Body Effect, Richer Libraries?
• FinFET 4-input NAND ~ planar bulk 3-input NAND
• More complex cells / higher fan-in cells could be made available to synthesis
[Figure: stacked devices w/ body effect; number of fan-in limited by body effect. Source: ‘Bulk FinFETs: Fundamentals, Modeling, and Application’, Jong-Ho Lee, SNU]

[DAC15] Pin Accessibility Below 20nm
• Routing challenged by complex rules for multi-patterning
• Limited pin access with small track cells
• Wider power rail for reliable connection → fewer pin access points
• Complex design rules + less pin access → pin accessibility problem
• Difficulty in routing: conflict between area reduction and routability
[Figure: 9T NAND2 layout (Poly, Fin, M1, V1, M2) with wider power rail; an inserted via blocks a neighboring access point (< MinOverlap, < MinSpacing); metal pitch < via pitch]
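
Returning to the Gate-Wire Balancing table, the skew reversal can be checked with a few lines of Python. The latencies are the ones tabulated on that slide; defining skew variation as the spread (max minus min) of skew across corners is an assumption made for this sketch, and the names are hypothetical.

```python
# Clock latencies from the Gate-Wire Balancing table (units as on the slide).
CORNERS = {
    "SS, 0.7V, -25C": {"launch": 1.0, "capture": 1.1},
    "FF, 1.1V, -25C": {"launch": 0.9, "capture": 0.7},
}


def skew(latency: dict[str, float]) -> float:
    """Skew as tabulated on the slide: launch latency minus capture latency."""
    return latency["launch"] - latency["capture"]


def skew_variation(corners: dict[str, dict[str, float]]) -> float:
    """Spread between the largest and smallest skew over all corners (assumed metric)."""
    skews = [skew(lat) for lat in corners.values()]
    return max(skews) - min(skews)


for name, lat in CORNERS.items():
    print(f"{name}: skew = {skew(lat):+.1f}")
print(f"skew variation = {skew_variation(CORNERS):.1f}")
```

The sign of the skew flips between the two corners, which is the “skew reversal” the slide warns about: a fix that helps one corner hurts the other.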

Outline: Noise and Chaos
[ISQED02] http://vlsicad.ucsd.edu/Publications/Conferences/131/c131.pdf
[ISQED10] http://vlsicad.ucsd.edu/Publications/Conferences/267/c267.pdf

Slack vs. Layout Context
• Layout knobs: SRAM pitches and buffer keepout distances
• Post-P&R slacks of five embedded memories are “chaotic”
• Physical synthesis challenge: Logic optimization given “chaos”
• Testcase: Logic from OpenCores GPU THEIA + SRAMs
[Figure: floorplan with five SRAMs (1 to 5) spaced by sram_pitch, blockages, buffer keepouts, and a placement region for standard cells; plot of WNS of paths through SRAMs (ns, about -1.3 to -0.7) vs. SRAM pitch (um, 0 to 30) for slack-1 through slack-5; delta slack > 300ps]

[SLIP15] Slack vs. Clock Period
• ∆path slack is 81ps at signoff clock period of 1.0ns
• Changing clock period to 0.82ns changes ∆path slack to 143ps!
[Figure: max delta path slack (SI – non-SI) (ns, 0.06 to 0.15) vs. clock period (ns, 0.80 to 1.30); 81ps at signoff clock period, 143ps at tighter clock period]

[SLIP15] Non-SI vs. SI
• Top-1000 critical paths from Viterbi design (clock period = 1.0ns)
• Slack diverges by 81ps !!! ~4 stages of logic at 28nm FDSOI
• Unfortunately, we don’t know coupling before routing !!!
[Figure: path slack in non-SI mode (ns) vs. path slack in SI mode (ns), with ideal correlation line; 81ps divergence]
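
The “max delta path slack” metric plotted on the two [SLIP15] slides can be sketched as follows. The slack values below are hypothetical and chosen only to make the arithmetic visible; they are not the Viterbi data. Taking the absolute per-path difference is also an assumption, since the slides only label the axis “SI – non-SI”.

```python
def max_delta_path_slack(non_si_ns: list[float], si_ns: list[float]) -> float:
    """Largest per-path gap between non-SI and SI slack, in ns.

    Both lists must describe the same paths in the same order.
    """
    if len(non_si_ns) != len(si_ns) or not non_si_ns:
        raise ValueError("need two non-empty slack lists of equal length")
    return max(abs(a - b) for a, b in zip(non_si_ns, si_ns))


# Hypothetical slacks (ns) for three paths, without and with SI analysis.
non_si = [0.100, 0.080, 0.060]
si = [0.070, 0.030, 0.055]
print(f"max delta path slack = {max_delta_path_slack(non_si, si) * 1000:.0f} ps")
```

Because coupling is unknown before routing, a synthesis-stage timer only ever sees the non-SI list; the metric quantifies how far its view can drift from signoff.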

[DAC15] 3DIC: WLM, RC (Interconnect proxy) Effects
• Example: SOCE-based “Shrunk2D” (S2D) flow [1]
• Perform synthesis with different WLM caps, P&R with S2D flow
• Shown: total power (#buffers, #instances, instance area, WL, … similar)
[Figure: power (mW, 20.8 to 23) vs. WLM cap (pF, 0 to 1.2); range of 1.35mW (6.43%)]
[1] Panth et al., “Design and CAD Methodologies for Low Power Gate-Level Monolithic 3D ICs”, Proc. ISLPED, 2014, pp. 171-176.

Outline: Clock Skew

[SLIP13] Sensitivity of CTS Outcomes to Layout Contexts
• Delay varies by up to 43% with clock entry point locations
• Delay varies by up to 45% with core aspect ratio
• NDRs, fill, buffer sizes, max fanout / max trans rules, … → 100ps impacts on insertion delays, skew, slacks
[Figure: fall delay (ps, 0 to 800) vs. core aspect ratio (0.1 to 10.00) for clock entry points BL, BLM, B, RBM, R]

[ISQED14] Useful Skew Improves Timing
• Useful skew optimization adjusts clock sink latencies to improve timing
• Our predictive useful skew flow resolves the “chicken-and-egg loop” → further improved timing
[Figure: zero-skew vs. useful-skew clocking of FF1, FF2, FF3, annotated with delay/slack and clock latency]
[Figure: total negative slack: zero skew -893, typical useful skew -197, predictive useful skew -60; 6 testcases {3 RTLs x 2 clock periods}]
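
How adjusting clock sink latencies improves timing can be shown on a toy pipeline. The Python sketch below uses hypothetical delays, latencies and clock period (not the values from the slide's figure), ignores flip-flop setup time and clock-to-q delay, and checks setup only; hold constraints, which bound how much skew can be applied in practice, are left out.

```python
def setup_slacks(period, latency, paths):
    """Setup slack per path: period + capture latency - launch latency - path delay."""
    return {
        (src, dst): period + latency[dst] - latency[src] - delay
        for src, dst, delay in paths
    }


def total_negative_slack(slacks):
    """Sum of all negative slacks (0 if every path meets timing)."""
    return sum(s for s in slacks.values() if s < 0)


# Hypothetical three-flop pipeline: (launch FF, capture FF, datapath delay).
PATHS = [("FF1", "FF2", 10.0), ("FF2", "FF3", 6.0)]
PERIOD = 8.0

ZERO_SKEW = {"FF1": 0.0, "FF2": 0.0, "FF3": 0.0}
USEFUL_SKEW = {"FF1": 0.0, "FF2": 2.0, "FF3": 0.0}  # delay FF2's clock by 2

for name, lat in (("zero skew", ZERO_SKEW), ("useful skew", USEFUL_SKEW)):
    slacks = setup_slacks(PERIOD, lat, PATHS)
    print(name, slacks, "TNS =", total_negative_slack(slacks))
```

Delaying FF2's clock lends the slow FF1→FF2 path two extra time units by borrowing them from the fast FF2→FF3 path, so both paths meet the period.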

Conventional Useful Skew Optimization
• Standard useful skew flow has chicken-egg problem
  • Netlist and placement assume zero skew
  • Useful skew optimization relies on placement
• One solution: Back-annotation flows (large runtime)
  • Wang et al. in DAC06 propose to back-annotate useful skew from post-placement to before-synthesis
[Figure: flow with stages RTL netlist; Synthesis; Placement / Place Opt.; CTS; CTS Opt.; Skew_opt; Routing / Route Opt.; back annotation into Synthesis]

NOLO: No-Loop Useful Skew Optimization
• Our work: Cure the chicken-egg problem with delay prediction
• Use setup slacks from LVT-only synthesis → estimation of achievable slacks
• Use hold slacks from multi-VT synthesis → reduce pessimism
• Advantage: One-pass approach, not constrained by placement
[Figure: flow with stages RTL netlist; Synthesis w/ Multi-Vt; Synthesis w/ LVT (LVT-only netlist); Predictive Useful Skew; Placement/Place Opt.; CTS/CTS Opt.; Routing/Route Opt.]

Experimental Results
• Predictive flow achieves similar or better timing and much smaller runtime
[Figure: runtime (min) vs. TNS (ns) for aes_cipher, des_perf, jpeg_encoder, mpeg2; series: back annotation (BA), prediction (w/ LVT-only syn), prediction (w/o LVT-only syn), average of various BA flows]

Outline: Complexity and Hyperlocality

BEOL Multi-Patterning Impacts
• Wire1width = Mwidth
• Wire2width = Mspace – 2*Swidth
[Figure: mandrel / spacer patterning of Mx metal, showing mandrel width (Mwidth), spacer width (Swidth), mandrel space (Mspace), line-end cuts, line-end extensions, floating fill wires]

[ICCAD15] Placement-Sizing Interference
• New “interferences” between post-layout optimization and P&R
• Rules for device layers (FEOL) become considerably more complex and restrictive
  • Minimum implant width rules for implant region
  • Minimum notch and jog width rule for oxide diffusion (OD)
[Figure: row of cells with OD and alternating HVT / LVT implant regions across cell boundaries]

[ICCAD15] Placement-Sizing Interference (cont.)
• Drain-to-drain abutment (DDA)
• Example solution
• Intertwine the historically separate tasks of P&R and post-route optimization
[Figure: abutting cells (poly, active region, cell boundary, source/drain connections, power/ground) with DDA violation, min implant width violations, and min jog/notch width violation marked]

Outline: Better (and, more complex) Signoff

[ISQED14] I. Flexible Timing Models
• Setup time, hold time and clock-to-q (c2q) delay of FF ⇒ values interdependent, but NOT fixed
• Flexible FF timing model can exploit operating (function/test) modes ⇒ “Free” pessimism reduction in STA
• Goal: Find best {setup, hold, c2q} for each FF instance
• Sequential LP:
  • setup-c2q opt
  • hold-c2q opt
[Figure: setup-hold-c2q fixed model vs. flexible model (c2q1 ... c2qn); c2q-setup-hold surface with setup-c2q and hold-c2q tradeoff curves]
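
The goal stated on the Flexible Timing Models slide, picking the best operating point per flip-flop instance, can be illustrated on a tiny example. The Python sketch below is not the sequential LP from the talk: it substitutes exhaustive search over two hypothetical (setup, c2q) operating points, considers setup checks only, and uses made-up path delays and a made-up 500ps period.

```python
from itertools import product

# Hypothetical per-flip-flop operating points as (setup_ps, c2q_ps):
# a shorter setup time comes with a longer clock-to-q delay and vice versa.
FF_OPTIONS = [(10, 20), (20, 10)]
FIXED = (10, 20)  # the single point a fixed timing model would use

# Hypothetical paths: (launch FF, capture FF, combinational delay in ps).
PATHS = [("FF1", "FF2", 480), ("FF2", "FF3", 460)]
PERIOD_PS = 500
FFS = ["FF1", "FF2", "FF3"]


def worst_slack(assign):
    """Worst setup slack: period - (launch c2q + path delay + capture setup)."""
    return min(
        PERIOD_PS - (assign[src][1] + delay + assign[dst][0])
        for src, dst, delay in PATHS
    )


def best_assignment():
    """Try every combination of operating points and keep the best worst slack."""
    best = None
    for combo in product(FF_OPTIONS, repeat=len(FFS)):
        assign = dict(zip(FFS, combo))
        slack = worst_slack(assign)
        if best is None or slack > best[0]:
            best = (slack, assign)
    return best


fixed_slack = worst_slack({ff: FIXED for ff in FFS})
flex_slack, flex_assign = best_assignment()
print("fixed model worst slack:", fixed_slack, "ps")
print("flexible model worst slack:", flex_slack, "ps with", flex_assign)
```

Exhaustive search grows exponentially with the number of flip-flops, which is one reason a formulation such as the LP named on the slide is needed for real designs.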

Flexible Timing Model Recovers Margin
• Independent datapaths in PBA: using fixed FF timing model loses performance optimization opportunity
[Figure: three flip-flops (FF1, FF2, FF3) connected by datapaths of 460ps to 480ps; c2q and setup values of 10ps / 20ps are traded per flip-flop; path totals annotated “Total: 500ps”, “500ps!” and “520ps?”]

Improved Timing Signoff Flow
• Flow: Netlist (and SPEF, if routed) → Extract path timing information → Solve Sequential LP (STA_FTmax, STA_FTmin) → Solution → Annotate new timing model for each flip-flop → Timing signoff with annotated timing
• Takeaways
  • LP formulation with flexible flip-flop timing model
  • Fix timing violations “for free”: 48ps average improvement of slack over 5 designs in a foundry 65nm technology
• Next
  • Better exploitation of disjoint cycles/modes
  • More accurate modeling of setup-hold-c2q tradeoff
  • Circuit optimization should natively exploit FF timing model flexibility

[DATE13] II. Signoff Definition (e.g., with AVS, Aging)
• VBTI: Voltage for BTI-aging estimation
• Vlib: Supply voltage for timing library characterization
• Vfinal: Vdd of a circuit with AVS at end-of-lifetime
• Chicken & Egg Loop
  • Circuit implementation depends on VBTI and Vlib
  • VBTI and Vlib depend on aging during AVS (Vfinal)
  • Vfinal depends on circuit
[Figure: loop through VBTI, |Vt|, Vlib, derated library, circuit implementation and signoff, circuit, BTI degradation and AVS, Vfinal]

Observations and Heuristics
• Observation #1: Vfinal is not sensitive to cells along the timing-critical path
• Observation #2: ΔVt with a constant Vfinal throughout lifetime ≈ adaptive Vdd
• Heuristic #1: Use average of critical path replicas to estimate Vfinal (Vheur)
• Heuristic #2: Approximate Vdd in AVS by constant Vheur
• Solve “Chicken & Egg Loop” by having VBTI = Vlib = Vheur ≈ Vfinal

“Knee” Point for Signoff Definition

             Low Vlib                      High Vlib
Low VBTI     Slower circuit, less aging    Faster circuit, less aging
High VBTI    Slower circuit, more aging    Faster circuit, more aging

• Optimistic aging library → large power penalty
• Ignore AVS → larger area
• Overly pessimistic aging library → large area penalty
• Our method finds “Knee” point for balanced area and power tradeoff
• Experiment setup: DC/AC BTI @ 125°C; 32nm PTM technology; 4 benchmark circuit implementations

Outline: A Mixed-Height Sweet Spot?

[ICCAD15] Mixed Cell Height Implementation (!)
• Large cell height → better timing, but large area and power
• Small cell height → smaller area/power per gate, but large delay and more #buffers
• Mixing cell height enables tradeoffs between performance and area/power (recall FinFET introduction!) → better design QoR
• E.g., use large-height high-fanin cells to improve pin accessibility
• Already have flop trays, etc. as problematic multi-height instances
• Technology: 28nm LP
[Figure: mixed-height placement; in red are 12T cells = larger area, smaller delay; in blue are 8T cells = smaller area, larger delay]

Cost of Mixing Cell Heights
• “Breaker cells” are required to align regions with different cell heights → optimization must comprehend corresponding area cost
• Assume: M2 pitch = 64nm
[Figure: X-directional shift between 8T and 12T cells (four sites); Y-directional shift between 12T cells (one M2 pitch); dimensions 64nm, 48nm, 64nm; legend: cell boundary, P/G rail, no routing blockage, routing blockage on M1/M2]
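
Why regions of different cell heights need explicit alignment follows from the row arithmetic, sketched below in Python. The 64nm M2 pitch is the value assumed on the slide; reading an “nT” cell as n M2 tracks tall is an assumption of this sketch, and the function names are hypothetical. The sketch does not model breaker cells or their area cost.

```python
from math import lcm

M2_PITCH_NM = 64  # value assumed on the slide


def cell_height_nm(tracks: int, pitch_nm: int = M2_PITCH_NM) -> int:
    """Cell height, assuming an 'nT' cell is n M2 routing tracks tall."""
    return tracks * pitch_nm


def rows_until_aligned(tracks_a: int, tracks_b: int) -> tuple[int, int]:
    """Smallest row counts at which stacks of the two row types reach equal height."""
    common = lcm(tracks_a, tracks_b)
    return common // tracks_a, common // tracks_b


print("8T height :", cell_height_nm(8), "nm")
print("12T height:", cell_height_nm(12), "nm")
print("rows until boundaries align (8T, 12T):", rows_until_aligned(8, 12))
```

Under these assumptions the row boundaries of an 8T region and a 12T region coincide only every three 8T rows (two 12T rows), so adjacent regions generally meet at mismatched power-rail positions.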

Optimization Flow
• Flow: Synthesis → Initial placement → Partitioning → Legalization → Floorplan update → Cell mapping → Routing / RouteOpt
• Initial placement uses modified LEF → enable optimization with a conventional flow
• Slicing-based partition with DP to divide die area into regions with different cell heights
• Internal-timer guided placement legalization
• Floorplan update with “breaker cell” penalty
• Row-based cell mapping places cells onto rows with corresponding heights

Example of Optimization Flow
• Technology: 28nm LP; Design: AES
[Figure: snapshots labeled Initial placement (8T/12T cells are “freely” placed); Partitioning (yellow blocks = regions); Legalization; Mixed-height placement; New floorplan. 8T cells are in blue, 12T cells are in red]

Benefits from Mixing Cell Heights
• Technology: 28nm LP (12T/8T); Design: AES
• 25% area reduction as compared to 12T-only design
• 20% performance improvement compared to 8T-only design

Outline: Physical Synthesis 2.0 ?

Physical Synthesis 2.0
• It’s the predictability! (and, prediction is challenged…)
  • New devices and patterning technologies
  • Complex PD tool chain; chaotic behavior of tools and flows
  • Oblivious to clocks, corners, coupling → how can Physical Synthesis be doing the right thing? (= target for margin recovery!)
• What will Physical Synthesis 2.0 look like?
  • (1) Higher-level value: what Physical Design cannot do
    • Datapath architecture selection
    • Resource sharing
    • Mux mapping
  • (2) Other types of prediction (machine learning, big data, etc.) !
  • (3) Constructive prediction deeper into implementation flow
    • (More integration… ) Clock and MCMM awareness
    • Hyperlocality awareness: coloring, congestion, coupling, interactions …

THANK YOU !