Timing Closure Today Lou Scheffer Cadence San Jose, CA Lou@cadence.com Hangzhou, April 2002 Lou Scheffer 1 Timing Closure Today Design Entry Synthesis Timing Place Timing •Timing more accurate as flow progresses •Sometimes an earlier stage thinks timing is OK, but it fails a later stage •Need to repeat one or more steps with tighter constraints • We have a timing closure problem when this process fails. Symptoms include: •Non-convergence Route •Too many iterations Timing •Solution achievable, but this flow cannot find it. Hangzhou, April 2002 Lou Scheffer 2 The Timing Closure Problem Performance of Circuit Test 7 Frequency (Target !00MHz) 100 100 99 99 96 95 90 pks regular 85 83 80 78 75 PKS/WLM P&R IPO P&R Stage Hangzhou Lou Scheffer II-3 Examples of Problems Worst slack / # misses Synthesis Placed Cycle time C1 -1 / 2000 -12 / 38k 7.5 ns .25 µm V1 0/0 -12 / 15k 7.5 ns .18 µm T1 -0.5 / 2000 -48 / 164k P1 -0.4 / 100 -97 / 43k 8 ns .25 µm V2 -0.5 / 500 -11 / 2000 7.5 ns .18 µm Design Hangzhou, April 2002 Lou Scheffer Tech 2.5-10 ns .18 µm 4 Agenda Timing Analysis Overview Traditional design flows Summary of DSM Problems Correction Methods Overview Hierarchy and Timing Closure Block Level Timing Closure Experimental Results Summary Hangzhou Lou Scheffer II-5 Timing Analysis Give accurate time values on each pin/port of the network Has to deal with design changes in optimization toolbox Static Timing Analysis Simulation far too slow in optimization environment Accuracy is more than enough Hangzhou Lou Scheffer II-6 Timing Analysis Requirements Choose combination of timing analyzer and delay calculator which are appropriate for level of design give the best accuracy for performance that can be tolerated Timing Analysis / Delay calculation must be able to cope with logic design changes Incremental Highest performance possible Non-linear delay models Hangzhou Lou Scheffer II-7 Timing Analysis Requirements Must handle… Difference between rising and falling delays Delay dependent on slew rate Slew and delay dependent on output load Non-linear delay equations Hangzhou Lou Scheffer II-8 Late Mode Analysis Definitions d ax ATa a ATb b y RATx x c Constraints: assertions at the boundaries – Arrival times: ATa, ATb – Required arrival time: RATx Delay from a to x is the longest time it takes to propagate a signal from a to x Slack is required arrival time - arrival time. Hangzhou Lou Scheffer II-9 Example SL y 1 2 1 SLa 0 0 0 ATa 0 a ATb 1 b SLb 0 1 1 AT y 2 RATx 2 y 1 1 c ATc 0 SLc 1 0 1 Hangzhou Lou Scheffer x ATx 3 SLx 2 3 1 II-10 Early mode analysis Definitions change as follows – longest becomes shortest – slack = arrival – required Not as important since early violations are easier to fix SL y 1 1 0 SLa 0 0 0 ATa 0 a ATb 1 b RATx 2 y 1 SLb 1 0 1 Hangzhou AT y 1 1 c ATc 0 SLc 0 1 1 Lou Scheffer x ATx 1 SLx 1 2 1 II-11 Delay modeling a b dax dbx x d tcl _ d o cl Propagation Arcs dcl _ o Timing Model Hangzhou Test Arc Lou Scheffer II-12 Agenda Timing Analysis Overview Traditional design flows Summary of DSM Problems Timing Correction Overview Approaches to Fixing Timing Closure Experimental Results Summary Hangzhou Lou Scheffer II-13 Traditional Design Flows Design Entry 1. Tech independent optimization Synthesis 2. Tech mapping 3. Rudimentary timing correction Timing Place Timing Route Timing Hangzhou Lou Scheffer II-14 Logic Synthesis Technology independent optimization General goal: reduce connections, literals, redundancies, area Technology mapping Map logic into technology library Timing correction added next Find and fix critical timing paths Fix electrical violations (load, slew) Hangzhou Lou Scheffer II-15 Traditional Design Flows Design Entry 1. Tech independent optimization Synthesis w/Timing 2. Tech mapping 3. Timing correction Place w/Timing Route Timing Integrate timing with synthesis and placement Hangzhou Lou Scheffer II-16 Traditional Design Flows Design Entry 1. Tech independent optimization Synthesis/Place ment w/Timing 2. Tech mapping 3. Placement 4. Timing Correction Global Route Detailed Route Timing Integrate timing with synthesis and placement Hangzhou Lou Scheffer II-17 Traditional Design Flows Design Entry Synthesis and Placement w/Timing and Global route 1. Tech independent optimization 2. Tech mapping 3. Placement 4. Timing Correction 5. Global route Detailed Route Timing Integrate timing with synthesis, placement and global route Hangzhou Lou Scheffer II-18 Agenda Timing Analysis Overview Traditional design flows Summary of DSM Problems Correction Methods Overview Hierarchy and Timing Closure Block Level Timing Closure Experimental Results Summary Hangzhou Lou Scheffer II-19 The Wall Logic designers concentrate on logic and timing (as understood by synthesis) Design work done in abstract world Was gates and wire load models Now may include placement and global route Throw design over the wall when complete Physical designers concentrate on layout and ability to route Effective method for many years Hangzhou Lou Scheffer II-20 General CMOS Problems Low drive strengths / low power Capacitance (not intrinsic delay) plays a large role in performance Huge variability – range between slowest possible and fastest possible Noise affects delay IR drop a big percentage of supply Crosstalk can change delay by a factor of 2 Hangzhou Lou Scheffer II-21 Additional DSM Problems High density / huge designs Very thin and resistive wires Very high frequencies Inductance becomes more important Smaller voltages IR drop a bigger fraction of signal swing Clock skew and latency Electromigration and noise Hangzhou Lou Scheffer II-22 Clock Distribution Problems Most common design approach requires close to zero skew CMOS / DSM problems all affect clocks Distribution problem increasing Number of latches/flip-flops growing significantly Power consumed in clock tree significant I Hangzhou and noise also of concern Lou Scheffer II-23 Process Designers are trying to help Many metal layers Different metal pitches Small pitch for local interconnect Big pitch/thick metal for long, fast wires Copper wires, thick metal to lower R SOI – Silicon On Insulator Low k dielectrics These help but are not enough Hangzhou Lou Scheffer II-24 Agenda Timing Analysis Overview Traditional design flows Summary of DSM Problems Correction Methods Overview Hierarchy and Timing Closure Block Level Timing Closure Experimental Results Summary Hangzhou Lou Scheffer II-25 Timing Correction Fix electrical violations (slew and load). Takes priority since needed for reliability. Resize cells Buffer nets Copy (clone) cells Fix timing problems Local transforms (bag of tricks) Path-based transforms Hangzhou Lou Scheffer II-26 Local Transforms Resize cells Buffer or clone to reduce load on critical nets Decompose large cells Swap connections on commutative pins or among equivalent nets Move critical signals forward Pad early paths Area recovery Hangzhou Lou Scheffer II-27 Transform Example ….. Double Inverter Delay = 4 Removal ….. ….. Delay = 2 Hangzhou Lou Scheffer II-28 Resizing ? b 0.2 e 0.2 f 0.3 d a d 0.05 0.04 0.03 0.02 0.01 0 0 a 0.2 A b 0.8 0.6 0.4 1 load 0.035 A B C a C b 0.026 Hangzhou Lou Scheffer II-29 d Cloning 0.05 0.04 0.03 0.02 0.01 0 0 0.2 0.4 0.6 0.8 1 load A a ? b d 0.2 e 0.2 f 0.2 g h 0.2 0.2 B C d A e f a B b g h Can also isolate critical sinks Hangzhou Lou Scheffer II-30 d Buffering 0.05 0.04 0.03 0.02 0.01 0 0 0.2 0.4 0.6 0.8 1 load A a ? b d 0.2 e 0.2 f 0.2 g h Hangzhou B C d 0.2 e 0.2 a B b 0.2 0.2 0.1 B f 0.2 g 0.2 0.2 h Lou Scheffer II-31 Redesign Fan-in Tree Arr(a)=4 Arr(b)=3 a b 1 e 1 Arr(c)=1 Arr(d)=0 c Arr(e)=6 1 d a b c d Hangzhou 1 e 1 Arr(e)=5 1 Lou Scheffer II-32 Redesign Fan-out Tree 3 3 1 1 1 1 1 1 1 1 2 1 Longest Path = 5 Hangzhou 1 Longest Path = 4 Slowdown of buffer due to load Lou Scheffer II-33 Decomposition Hangzhou Lou Scheffer II-34 Swap Commutative Pins 1 0 a 1 1 2 b 5 1 c 2 Simple Sorting on arrival times and delay works 1 2 3 c 1 1 0 b 1 a 2 Hangzhou Lou Scheffer II-35 Move Critical Signals Forward a b c d e a b e d Based on ATPG – linear in circuit size – Detects redundancies efficiently Efficiently find wires to be added and remove. – Based on mandatory assignments. c Hangzhou Lou Scheffer II-36 Path-based Transforms Path-based resizing Unmap / remap a path or cone Slack stealing Retiming Hangzhou Lou Scheffer II-37 Slack Stealing Take advantage of timing behavior of level sensitive registers (latches) C1 C2 0 Slack = -1 C1 1 2 Slack = +1 C2 C1 Slack = 0 C2 Hangzhou Lou Scheffer II-38 Retiming Backward Delay=3 Forward A more aggressive optimization Delay=2 since it changes the function Hangzhou Lou Scheffer II-39 Solutions to Timing Closure Carry hierarchical logic design into physical Hand / Custom design Improved analysis More sophisticated clock design Modify existing flows More physically knowledgeable tools Many variations: combined synthesis/place/route, gain based synthesis, etc. Hangzhou Lou Scheffer II-40 Agenda Analysis Methods Overview Traditional design flows Summary of DSM Problems Correction Methods Overview Hierarchy and Timing Closure Block Level Timing Closure Experimental Results Summary Hangzhou Lou Scheffer II-41 Hierarchy and Physical Design Logical hierarchy can be carried over into physical design Seems natural top-down approach, using floorplanning as a firm guide to physical design Use of hierarchy offers many advantages and many possible problems A new generation of tools for this problem Hangzhou Lou Scheffer II-42 Pin Assignment and Timing Budgeting Each block requires: Content definition Block 1 L Block 2 L Partitioning Pin locations Clock/timing definition Set_input_delay L Set_output_delay Set_drive Block 3 Set_load Path exceptions (false, multicycle paths) Hangzhou Lou Scheffer II-43 Hierarchy and Physical Design Advantages… Run time of P&R tools Blocks can be built independently Early (and valuable) knowledge of global wires Limited wire delay within macro may allows simpler methodologies Contains the problem size Extends naturally to SOC and mixed A/D chips May be the only real method available Hangzhou Lou Scheffer II-44 Physical Hierarchy Disadvantages Possible to overconstrain the design in many ways (see next slide) Hierarchy usually logic-based, not physically-based Designed for logical correctness, not physical implementation Hangzhou Lou Scheffer II-45 Physical Hierarchy Overconstraints Placement solution perhaps overconstrained Logical Ability to find a routable solution hindered Can’t gates may not fit naturally in a rectangle detour through neighboring cell Boundary conditions explode and must be managed carefully to avoid surprises A recent IBM design had 17,000 top level connections. A bad timing constraint on any one can make the whole design infeasible Hangzhou Lou Scheffer II-46 Hierarchy Example Plots Hangzhou Lou Scheffer II-47 Hierarchy Example Plots Hangzhou Lou Scheffer II-48 Hierarchy Example Plots Hangzhou Lou Scheffer II-49 The Challenges How to derive sensible partitioning? How to achieve die utilization similar to “flat” approach? How to achieve clock speed and skews similar to “flat” approach? How to automatically generate optimal pin assignments for each module? How to automatically come up with realistic timing budgets for each module? Hangzhou Lou Scheffer II-50 Basic Approach to solution Example tool – First Encounter Start with a Silicon Virtual Prototype •A near final quality ‘flat’ placement RTL / gates •Near legal routing •Known feasible solution for timing and routability •Use this solution to guide the final implemention Silicon Virtual Prototype hierarchical partitioning and placement Top Level Block Level Top level buffering, clock balancing, and power grid Physical synthesis / placement and routing •Partitioning, pin assignment, timing constraints •Build the blocks with more detailed tools. Hangzhou Chip assembly routing GDSII Lou Scheffer II-51 Basic Approach continued Logic Design: RTL, gates, IP, “black box” IP a=b+c const. netlist Physical Prototype Power IPO Clock Tree STA Delay Calc RC Extract Trial Route Scan Opt. (proves timing and routability) Silicon Virtual Prototype Placement Complete “flat” physical design Physical Data .lib Accurate timing and routability data in hours instead of days or weeks VERY FAST Full-Chip Physical Prototype Hand off a floorplan OR full placement Block-Level Physical Synthesis and/or Route Hangzhou Lou Scheffer Confidence the design will work once the blocks are re-assembled into the complete IC II-52 In-Context Hierarchical Partitioning Partitioning Hangzhou Pin assignment Timing budgeting Clock tree generation Power grid planning Independent block-level implementation Lou Scheffer SoC assembly II-53 In-Context Pin Assignment Accurate Physical Prototype Flat Full-Chip Top Level Partition View Full-chip prototype results in optimal pin placement Results in narrower channels and reduced die size Reduces the routing congestion Improves the chip timing Hangzhou Lou Scheffer II-54 In Context Timing Budgeting Block 1 L Block 2 L L Block 3 Each block requires: Clock definition Set_input_delay Set_output_delay Set_drive Set_load Path exceptions (false, multicycle paths) Accurate timing budgets result in predictable timing convergence Hangzhou Lou Scheffer II-55 Agenda Analysis Methods Overview Traditional design flows Summary of DSM Problems Correction Methods Overview Hierarchy and Timing Closure Block Level Timing Closure Experimental Results Summary Hangzhou Lou Scheffer II-56 Blocks have timing closure problems, too Didn’t the big flat placement guarantee blocks are feasible? No, because Block may not have been defined when global constraints were set Global placer does not deal with all DSM effects Block may be too hard for the relatively simple global placer (which must be very fast) Requirements change as project progresses Process technology may have changed ….. Hangzhou Lou Scheffer II-57 Hand/Custom Design Mentioned for completeness Hurts productivity Yields highest performance Can only fix a few things – for example: Can realistically fix timing or crosstalk problems on a few nets Cannot realistically change the size of blocks Hangzhou Lou Scheffer II-58 Improved Analysis Helps Plot shows slack by net for two designs A 10% timing delta -> many more bad nets Often the difference between success and failure 3500 Series1 Series2 3000 Number of Nets 2500 2000 1500 1000 500 0 -5 0 5 10 15 20 Slack Relative to Worst Net (ns) Hangzhou Lou Scheffer II-59 More accurate analysis Crosstalk induced delay approach – overestimate coupling C Better – compute nominal timing + xtalk delta Old Customer example from CadMos Ignore Not crosstalk completely 400 MHz an acceptable alternative Coupling Caps overestimated by 60% 300 MHz Nominal delays + computed crosstalk 333 MHz More accurate analysis gains 10% margin Hangzhou Lou Scheffer II-60 Increased accuracy helps Global/detailed route correlation Any global route better than Wire Load Models or Steiner trees, since global routes consider congestion But to get that last 10%, need global/detailed router link Knowing some nets must detour is good, but…. Which net takes which detour is needed for good correlation Hangzhou Lou Scheffer II-61 Modified clock design Zero skew is not necessary, and often not even desirable We have the freedom to adjust clock arrival times at memory elements This obtains more margin and thus helps convergence Similar to retiming but less disruptive Improvement very design dependent If worst path is flip-flop to itself, doesn’t help May impact scan chains Hangzhou Lou Scheffer II-62 Previous attempts to fix block closure Without the radical step of combining synthesis and placement, designers have tried: Allow placer to do sizing and buffering Do post placement optimization Simple transformations Use existing placement Do post placement re-synthesis Complex transformations allowed Needs incremental placement and extraction But these have not been fully successful Why? Re-examine the root cause of discrepancies Wire load models and their limitations Combined Synthesis/Placement/Routing Hangzhou Lou Scheffer II-63 Post-Placement Optimization Design Entry Synthesis w/Timing Place Re-run Synthesis w/Timing 1. In-place optimizations 2. Minimally disturb placement optimizations Route Timing Hangzhou Lou Scheffer II-64 Post-Placement Optimization In-place (little or no placement impact) Resizing (carefully) Pin swapping, some tree rebuilding Wire sizing / typing Minimally disruptive Resizing Buffering Cloning Tree rebuilding Cell removal Hangzhou Lou Scheffer II-65 In-place Optimization Not too difficult Can use extracted electrical data (C, RC) from placement tool Some changes affect pin locations, but may be ignored Tree rebuilding needs incremental extraction Can use timing reports for timing data But, accuracy suffers as changes are made Real RC data replaced by estimates again Hangzhou Lou Scheffer II-66 In-place Optimization Resize swap pins rebuild trees Placed netlist Placement & extraction Optimization C/RC data Opt’d netlist Hangzhou Lou Scheffer II-67 Place-disruptive Optimization Nets changing implies… Must be able to recompute C and RC May need to incrementally place new cells Need incremental timing capability Hangzhou Lou Scheffer II-68 Place-disruptive Optimization Resize buffer clone cell removal rebuild trees Placed netlist Optimization with placer, timer, extractor Placement & extraction C/RC data Opt’d netlist Hangzhou Lou Scheffer II-69 What are the problems? Getting the timing right Different timers used at different stages Do the optimizer and placer see the same worst paths as the static timer? Design size / tool capacity Using Hangzhou synthesis technology on flat designs Lou Scheffer II-70 More problems Incompatible tools, formats Placer, synthesizer, timer may all use different file format, may all be different vendors Basic interoperability issues Incremental placer needed for new cells Doesn’t have to be smart But might produce some infeasible solutions Must be integrated with optimizer Hangzhou Lou Scheffer II-71 Still more challenges/problems Extraction/Estimation of net data Any optimization which significantly alters net topology needs this ability Insert cells Remove cells Move connections from one cell to another Steiner tree estimation Net C and delay (RC) calculator Do results match detail router and other extraction tools? Hangzhou Lou Scheffer II-72 Sample Optimization Results Worst slack / # misses Synthesized Placed Opt Cycle time C1 -1 / 2000 -12 / 38k -2 / 1400 7.5 ns .25 µm V1 0/0 -12 / 15k -0.3 / 100 7.5 ns .18 µm T1 -0.5 / 2000 -48 / 164k -6 / 62k 2.5-10 ns .18 µm P1 -0.4 / 100 -97 / 43k -13 / 20k 8 ns .25 µm V2 -0.5 / 500 -11 / 2000 -4 / 1000 7.5 ns .18 µm Design Hangzhou Lou Scheffer Tech II-73 Root Problem is Wire Load Models Main problem: correlation between PreP&R estimates and Post-P&R extraction If correlation is good… Problems detected and potentially fixed early If correlation is bad… Problems detected late Not a good situation! Need to re-write RTL is worst case for timing closure. Hangzhou Lou Scheffer II-74 Why are Wire Load Models Used? Can’t complete layout until logic design is complete Can’t complete logic design without timing Can’t time without load and net delay data Can’t extract load and net delay data until layout is complete Can’t complete layout … Hangzhou Lou Scheffer II-75 WLM solution – use statistics Don’t know specific layout data But we know something about statistical properties Average net load, average net delay Further refine using other characteristics Number of sinks Size of design (number of circuits) Physical size Hangzhou Lou Scheffer II-76 Correlation Pre/Post-P&R using averages Wire load models give synthesis an estimate of physical design We can correlate averages pre- and postP&R as accurately as needed If specific design has average behavior, its timing, on average, can be predicted Otherwise, a pass through placement can provide correct WLM for a design, and get the averages right Hangzhou Lou Scheffer II-77 Timing and averages WLMs OK for area, power (properties that are sums are well handled by statistics) But, timing dictated by the worst specific path That path is built of individual nets One net can determine the speed of an entire design Reality: poor correlation for relatively few nets can cause major headaches Hangzhou Lou Scheffer II-78 Correlation Pre/Post-P&R Averages and Wire Loads 10 0 11 0 90 70 60 mean 50 40 30 20 median 80 30000 25000 20000 15000 10000 5000 0 0 10 Number of nets Distribution of C / fan-out pF per fan-out Note the very long tails of this distribution Hangzhou Lou Scheffer II-79 Correlation Pre/Post-P&R Cwire Data by Logic Design Cwire Number of fan-outs Hangzhou Lou Scheffer II-80 Better Wire Load Models How can we use information from one pass through physical design? Adjust wire load model coefficients Back annotate specific net load and delay data to the logic design New problem: correlation of logic pre- and postsynthesis But, there are fundamental limits to statistical models – a new approach is needed. Hangzhou Lou Scheffer II-81 A better (but harder) approach: Combine Synthesis, P & R Don’t use wire load models at all Synthesis does a trial placement as it runs Loading found from estimated routes For best results, must include global routing Then, feed global route to detailed router Or, do detailed route itself Much better correlation and timing closure No inter-tool data transfer headaches Hangzhou Lou Scheffer II-82 Example of Combined SP&R Video Graphics Engine 160k instances 70 macros (blocks) 5 layers, 0.18 micron Target freq: 100Mhz Hangzhou Lou Scheffer II-83 Conventional Flow Func. & Timing .lib More than 20 Iterations 89MHz best result w/manual changes Synthesis Static Timing PT syn2GCF SE Floorplan DEF Func. & Timing .TLF Physical LEF Placement base optimization Global route Detail route Extraction Delay calc Hangzhou DC Lou Scheffer DRC Pearl II-84 Combined SP&R Flow 100MHz final result, met timing Correlation within + - 2.1% One pass 12hrs 20min runtime Static Timing PT write_constraints TCL Constraints Floorplan DEF Func. & Timing .TLF Physical LEF EDIF netlist SE-PKS PKS Optimization Global Route Static Timing Detail route Extraction Delay calc Hangzhou Lou Scheffer DRC HE Pearl II-85 Slack Correlation PKS Routed Wire Load Based Hangzhou Lou Scheffer II-86 Enlargement of SP&R slack Hangzhou Lou Scheffer II-87 Results from combined SP&R Case 1 2 3 4 size instances (k) 350 250 50 160 Hangzhou macros 56 50 4 70 PKS timing error (%) + - 3% + - 3% + - 0.96% + - 2.1% Lou Scheffer max freq (MHz) conventional SP&R 140 140 97 100 93 95 89 100 II-88 Agenda Traditional design flows Summary of DSM Problems Analysis Methods Overview Correction Methods Overview Approaches to Fixing Timing Closure Experimental Results Summary Hangzhou Lou Scheffer II-89 Experimental Results For Hierarchical design, two objectives Should be faster and higher capacity than a fully detailed flat design, to find problems earlier Resulting partitions should be realizable For Block Design, compare different strategies Can overconstrain clock or wire models Can do IPO or not after placing Can allow placer to change size or not Can test combined synthesis/placement against running the two tools separately Hangzhou Lou Scheffer II-90 Hierarchical Experimental results Design Import Detail Place Hangzhou 57x 33x Detail Route* RC Extract Delay Calculation Lou Scheffer 6x 35 hr 40 min 5 hr 25 min 9 hr 1 hr 50 min 2 hr 15 min 56x 20 min 7 min 60x 6 min 1x 8 min 3 hr 20 min 2 hr 50 min 4 hr 4 min (*) SPC Trial Route 3 hr 50 min 5 hr 45 min Design 580K cells, 0.25um process, 5LM, 100MHz Data collected on a 500MHz processor workstation First Encounter Flow Resulting blocks were realizable Traditional Flow 7 hr 30 min 5x 7x Timing Analysis IPO Design Iteration II-91 How do different block design approaches compare? Jay McDougal of Agilent ran many flows on the same design 1. Overconstrain clock by various amounts 2. Accurate or conservative WLMs 3. 4. 5. Tried many levels of conservatism Allow placer to size or not Do post placement optimization or not Physically knowledgeable synthesis Hangzhou Lou Scheffer II-92 Characteristics of sample design Design not very difficult ColdFire processor 80K instances 0.25 micron library 5 layer process, not congestion dominated Design goal was 180 MHz, known to be possible with this design 85% of delay in gates; 15% in interconnect 0.18/0.13 micron, bigger designs will show bigger differences between techniques Hangzhou Lou Scheffer II-93 Key to the plot of results Basic flow – Design Compiler & QPlace TDD = timing driven design In addition to minimizing wire length and congestion, placer is given timing constraints and allowed to change gate sizes IPO and PBO are post placement optimizers – runs on synthesis DB with back annotation PBO – runs on physical DB with synthesis transforms IPO PKS = Physically Knowledgeable Synthesis (combined Synthesis/Place/Route) Hangzhou Lou Scheffer II-94 Clock cycle achieved Comparison of Approaches 9.5 9 8.5 8 7.5 7 6.5 6 5.5 5 0.95 Hangzhou No custom WLM 90% WLM 3ns;50%WL IPO 5ns NoWL IPO 3ns NoWL TDD/PBO 50%WL TDD/PBO 90%WL PKS Required Cycle time 1.05 1.15 Relative size Lou Scheffer 1.25 II-95 Clock cycle achieved Comparison of Approaches 9.5 9 8.5 8 7.5 7 6.5 6 5.5 5 0.95 Hangzhou Good area, but iterates between placement and synthesis, worst TTM, didn’t hit timing target No WLM 90% WLM 3ns;50%WL IPO 5ns NoWL IPO 3ns NoWL TDD/PBO 50%WL TDD/PBO 90%WL PKS One tool, no iteration, better TTM, hit timing target 1.05 1.15 Relative size Lou Scheffer 1.25 II-96 Agenda Traditional design flows Summary of DSM Problems Analysis Methods Overview Correction Methods Overview Approaches to Fixing Timing Closure Experimental Results Summary Hangzhou Lou Scheffer II-97 Good News At least we understand the problem Analysis of timing is well understood Transformations that help timing are well understood DSM effects are painful but can be controlled Hangzhou Lou Scheffer II-98 Bad News Cycle time and technology advances demand more and more sophisticated optimization techniques In previous flows, corrections must be applied in separate tools Disconnects among various tools involved increases turn-around-time and limits optimization Hangzhou Lou Scheffer II-99 Good News The Bad News is commonly recognized Many tool vendors, academics, in-house EDA researchers are working to solve these problems A new generation of tools is already available that was designed from the ground up to address timing closure Hierarchical Hangzhou and block design Lou Scheffer II-100 Bad News These problems won’t be the last! Each process generation brings new problems Increased size Weird process rules (antenna) Possible new effects (single event upset) Hangzhou Lou Scheffer II-101 Summary Timing closure is a very real problem Large chips need hierarchical tools to help with partitioning and budgeting Block tools must understand synthesis, placement and routing wire load models have serious limitations Best approach is combined synthesis/P&R Experimental data backs this up Hangzhou Lou Scheffer II-102 Acknowledgements Tony Drumm wrote the original set of slides for this lecture, including many of the examples. He credits: Alex Suess José Neves Bill Joyner IBM Rochester EDA folks But the conclusions, and any mistakes, are mine Hangzhou Lou Scheffer II-103 Hangzhou Lou Scheffer II-104