Development and Application of Tree Synthesis Algorithms

John Lillis
University of Illinois Chicago

Overview
• Part I: Buffer tree synthesis
  - Formulations
  - S/P/SP-tree
• Part II: Fanin tree embedding/replication
  - Optimization across gate boundaries
  - Interaction with placement

Part I: Buffer Tree Synthesis

Premises of Work
• MAIN PREMISE: Powerful Buffer Tree Synthesis is a Core for Modern Design
• Conservation of Resources Crucial
  - Estimate: 700-800K Buffers/Chip in Near Future
• Cost-Performance Tradeoffs
  - General Cost Model
• Topology / Embedding / Buffering Spaces Should be Explored Simultaneously
  - 2-Phase Approach Not Robust / Predictable
  - Particularly Troublesome in Presence of Blockages

Max Slack Weakness
[Figure: overoptimized subtrees; labels: Slack, Cost]

Problem Formulation
• Given:
  - Location of Driver and Sinks
  - Technology Parameters
  - Timing Requirements
  - Buffer Library
  - Target Routing Graph (Blockages)
• Find:
  - Topology in corresponding space
  - its Embedding
  - and Buffer Assignment
• Minimizing Cost
• s.t. Timing Constraints

Philosophy of Constraint Imposition
• Goals:
  - Predictable Behavior
  - Absence of ad-hoc Heuristics
• Main Idea:
  - Optimally Solve Constrained Variant of the Problem
  - Well-Designed Constraints Produce:
    Large Flexible Solution Space
    Tractability
• Constraints: Topology Space
[Figure: full space vs. constrained space]

Topology Embedding Flexibility
[Figure: alternative embeddings of one topology with source s and sinks a, b, c]

Target Routing Graph Construction
[Figure: routing graph with source s and sinks a, b, c; routing blockage; buffer blockage]

Algorithmic Description
• Timing-Driven Maze Routing
• Topology Embedding
• S-Tree
• P-Tree
• SP-Tree

Core Subroutine: Timing-Driven Maze Routing
• Generalization of [Hur et al.; TCAD Feb 2000]
• Single Target, Multiple Sources
• Finds non-dominated paths
• Simultaneous Buffer Insertion
• Handling of Blockages in Topology Synthesis
[Figure: target and sources]

Topology Embedding
• Goal: Obtain timing feasible embedding / buffering of given topology, minimizing cost
• Solution: Dynamic Programming (bottom-up)

Solution sets
• A(u,v) represents a set of solutions that correspond to
  - Vertex u in Topology
  - Vertex v in Target Graph
• A1b = Join(A1.left, A1.right)
• A1 = GenDijkstra(A1b)
[Figure: A(u,v), topology vertex u placed at graph vertex v]

S-Tree
• Notion of localities:
  - Spatial
  - Temporal
  - Polarity
• Partition sinks into 2 sets based on:
  - estimated timing criticality
  - signal polarity requirements
  - some other criteria...
• Subtrees can break the topology and "stitch" at a different place

S-Tree Topology Space
• Sink partition: {a,c,d} {b}
[Figure: S-Tree topologies for source s and sinks a, b, c, d]

S-Tree Recurrence
  A1b  = Join(A1.left, A1.right)
  A1   = GenDijkstra(A1b)
  A2b  = Join(A2.left, A2.right)
  A2   = GenDijkstra(A2b)
  A12b = Join(A12.left, A12.right) + Join(A1, A2)
  A12  = GenDijkstra(A12b)

S-Tree Topology Space
[Figure: initial topology for source s and sinks a to f, and the S-Tree topologies derived from it]

Incorporating polarity
• 4 sets:
  - critical & positive signal polarity
  - critical & negative
  - non-critical & positive
  - non-critical & negative
• Other partitioning schemes...
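The Join step in the recurrences above merges the solution sets of two subtrees that meet at the same graph vertex and keeps only non-dominated candidates. A minimal Python sketch follows, assuming each solution is a (cost, load capacitance, required arrival time) triple. That signature and the names `prune` and `join` are illustrative assumptions, not taken from the slides, and the GenDijkstra propagation across the routing graph is not modeled.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical solution signature: (cost, load_cap, required_time).
# Lower cost and load are better; a later required time is better.
Solution = Tuple[float, float, float]

def dominates(a: Solution, b: Solution) -> bool:
    """True if a is at least as good as b in every component and differs from b."""
    return a != b and a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2]

def prune(solutions: List[Solution]) -> List[Solution]:
    """Keep only non-dominated solutions, sorted, with duplicates removed."""
    unique = sorted(set(solutions))
    return [s for s in unique if not any(dominates(o, s) for o in unique)]

def join(left: List[Solution], right: List[Solution]) -> List[Solution]:
    """Merge two branches at one vertex: costs and loads add, and the
    required time is set by the more critical branch."""
    merged = [(lc + rc, lcap + rcap, min(lq, rq))
              for (lc, lcap, lq), (rc, rcap, rq) in product(left, right)]
    return prune(merged)

left = [(1.0, 2.0, 10.0), (3.0, 1.0, 12.0)]
right = [(2.0, 1.0, 9.0), (5.0, 1.0, 9.0)]
result = join(left, right)
print(result)
```

If the "+" in the A12b line is read as a union of candidate sets, it could be expressed in this sketch as `prune(join(a12_left, a12_right) + join(a1, a2))`.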
P-Tree Topology Space
• All Permutation-Constrained Topologies
[Figure: topologies over the sink order a, b, c, d, e with source s]

Limitations of P-Tree Space
• Isolation of Critical / Non-Critical Subtrees: "Temporal-Locality"
• Min WL May Not Produce Min Cost
[Figure: driver with critical and non-critical sinks, two alternative trees]

SP-Tree
• Combine everything said so far...
• From P-Tree
  - Spatial locality
  - Robustness
• From S-Tree
  - Temporal locality
  - Polarity locality
  - Ability to fix "topology problems" by "stitching"

Solution Space
[Figure: nested solution spaces: Entire space, SP-Tree, P-Tree, S-Tree, Fixed topo.]

Experiments
• Randomly generated nets
  - Non-uniform required arrival time
  - Non-uniform sink input capacitance
• Buffer-biased cost
• Interested in:
  - Min cost feasible solution
  - Max slack solution for verification
  - Runtime
• More details in the paper...

Algorithms for Experiments
• S-Tree
• P-Tree
• SP-Tree
• RMP [Cong, Yuan; DAC 2000]
• RMP-Quick [Cong, Yuan; DAC 2000]

Results
[Figure: bar charts for nets Net2-06, Net2-08 and Net2-12 comparing RMP, RMP-Qck, S-Tree, P-Tree and SP-Tree; groups: min cost feasible (wire, buf, cost, slack), max slack (max slack, wire, buf, cost), runtime; axis: # buffers]

SP-Tree vs. P-Tree
[Figure: SP-Tree vs. P-Tree comparison]

Conclusions
• Key Concepts:
  - General Cost Models: Routing Congestion, Buffer Congestion
  - Orthogonal Separation of Spatial and Temporal Locality
  - Polarity Requirements
  - Routing and Buffer Blockages
• Targets:
  - Small-to-Medium Sized Signal Nets
• Results Summary
  - Highly Cost-Efficient, High Performance Solutions
  - Substantially Outperforms Prior Approaches in Solution Quality and Runtime

Part II: Fanin Tree Embedding/Replication

Replication Overview
• Hrkic, Lillis, Beraudo (DAC04, IWLS04)
• Concept: Netlist structure limits potential of timing-driven placement
• Difficult for top-down synthesis to fix
• Main issue: inherently non-monotone paths
• Approach (Hrkic, Lillis; DAC04) touches on placement, synthesis (netlist perturbation) and routing.

Logic Replication
• Duplicate logic cell
• Preserve functionality
• Improve timing
  - Place / Move cells
  - Adjust connections
[Figure: cells A, B, C, D, E before and after replicating C as CR]

Early Work
• Use replication to straighten I/O paths
• Local monotonicity [Beraudo, Lillis, DAC 2003]
  - Sequence of 3 cells on the path
• Incremental framework
[Figure: cells A, B, C, D, E with replica CR]

Limitations of Local Monotonicity
• Local Monotonicity satisfied
• Still many non-monotone paths
[Figure: path through cells A, B, C, D, E, F]

Replication Tree Approach [Hrkic et al., DAC04]
• Identify critical sink
• Extract critical fan-in tree (Replication Tree)
• Optimize fan-in tree (Fan-in Tree Embedding)
• Legalize placement

Slowest Paths Tree
• Focus on slowest paths
• Find slowest paths tree from critical sink
• Include paths within epsilon of current critical delay
• Focus on most critical portions of fan-in cone

Replication Tree
• Most circuits do not contain large fan-in trees due to reconvergence
• Given a critical tree, temporarily replicate the entire tree
• Assign connections:
  - if (u,v) is a tree edge, connect uR to vR
  - else connect u to vR
[Figure: cells A to F and their replicas AR, BR, CR, DR, FR]

Placement cost
• Replication is temporary
• Placement cost is crucial
• Cost discount for placing a cell over its logical equivalent
  - low cost for placing DR over D
  - actual replication will never occur
  - multiple low-cost locations possible
[Figure: cells A to F and replicas AR, BR, CR, DR, FR]

Fan-in Tree Embedding
• Given:
  - Fan-in tree
  - Placement of sink and inputs
  - Arrival times at inputs
  - Placement and routing graph
• Find:
  - Placement of internal tree nodes (Gates)
• Minimizing Cost
• s.t. Timing Constraints
• cost / delay tradeoff

Fan-in Tree Embedding Example
[Figure: two embeddings of gates A, B, C toward the sink: higher delay with lower cost vs. lower delay with higher cost]

Fan-out and Fan-in Tree
[Figure: fan-out tree from a source (bottom-up) vs. fan-in tree to a sink (top-down), nodes A, B, C]

Fan-in Tree Embedding
• Adaptation of S-Tree algorithm [Hrkic, Lillis, DAC 2002]
• Keep:
  - Graph Model for Embedding Target
  - Modified Timing-Driven Maze Routing [S. Hur, J. Lillis, IEEE TCAD 2000]
    multiple sources, multiple targets
    at each vertex keep a list of non-dominated solutions
• Modify:
  - Top-down vs. Bottom-up
  - Solution signature (c,t):
    c - cost
    t - signal arrival time
  - Gate placement cost p(x,y)

Fan-in Tree Embedding
• Non-binary tree: multiple gate inputs
• Top-Down Dynamic Programming
  - Maze Routing to populate solutions
  - deferred backtracking
  - Join Solutions
    c = p(x,y) + c1 + ... + cn
    t = MAX(t1, ..., tn)
• Bottom-Up solution extraction
  - backtrack to extract maze route
  - extract gate placement
[Figure: modified maze routing and Join]

Aside: Legalization
• Use Modified Gain-Graph approach [Hur, Lillis; ICCAD00]
• Modified to incorporate timing information

Optimization Flow
• Identify critical sink (static timing analysis)
• Extract Fan-in Tree
  - Replication Tree
  - epsilon-Slowest Paths Tree
• Embed Fan-in Tree
• Decide which cells to Replicate / Unify
• Legalize placement
• Repeat while there is improvement

Enhancements
• Post-process unification
  - some cells placed close to their logical equivalents
  - no automatic unification
  - if one of the paths is non-critical, it is possible to unify without degrading performance
• Unification in legalizer
  - during ripple-move a cell may be placed on top of its replica
  - unify them and stop legalization
• epsilon-Slowest Paths Tree
  - no randomization
  - dynamically modify value of epsilon to enlarge the fan-in cone

Experiments
• Algorithms
  - Timing-Driven VPR (Versatile Place and Route) [http://www.eecg.toronto.edu/~vaughn/vpr/vpr.html]
  - Local Replication [Beraudo, Lillis, DAC-03]
  - RT-Embedding
• 20 MCNC Benchmark Circuits
• Interested in:
  - Critical delay
  - Amount of replication
  - Wire usage
• Tests performed in FPGA domain
• Promising results

Experimental Setup
• Obtain valid placement with Timing-Driven VPR placer
• Local Replication / Replication Tree Embedding
• Route and Evaluate with Timing-Driven VPR router

Results
Average values over all 20 circuits, normalized to VPR

                critical path delay        wire      blocks
                W_inf     W_low-stress     length
  Local Repl    0.925     0.927            1.020     1.003
  RT-Embed      0.858     0.869            1.084     1.004

• Delay improved for all circuits
• Best improvement for circuit pdc: 0.641
• Runtime penalty under 5% on the VPR flow

Replication Statistics
• Circuit ex1010: 38 replications, 12 unifications

Ongoing Work
• Generalize to ASICs
• Include simultaneous buffering
• Mitigation of legalization noise
  - Preventing (some) overlaps in embedding
  - More sophisticated placement cost
• Reconvergence - arborescence approach
• Simultaneous technology (re-)mapping
  - Explore multiple Tree Topologies simultaneously (Universal Tree solver engine: U-Tree)

Review
• Trees are everywhere!
  - Even in places where they seem to be absent
• Tree based algorithms can be very strong in generality of formulation and predictability
  - Enable connection to general placement/routing target
  - Can capture tradeoffs between complex objectives
• Can sometimes be applied to drive optimization of graph structures.
• References: http://cs.uic.edu/~jlillis/pubs.html
• S/P/SP-tree executables: http://eda.cs.uic.edu/software.html

Thank you

Timing-Driven Placement Legalization
• After embedding, cells could overlap in the placement
• Moving cells on critical path may harm timing
• Ripple-move strategy [Hur, Lillis, ICCAD 2000]
• Modified to include both timing and wiring information
[Figure: placement grid with an overlapped slot and an empty slot]

Timing-Driven Placement Legalization
• Identify overlap
• Identify up to 4 closest empty slots (one in each quadrant)
• Construct gain graph
  - monotone paths from congested to free slots
  - edges: gain of moving a cell to neighboring slot
  - wire and timing gain
• Find max-gain path and perform ripple-move
  - gain could be negative
[Figure: placement grid with an overlapped slot and two empty slots]
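The max-gain path search in the legalization steps above can be sketched as a small dynamic program in Python. The sketch assumes slots are integer grid coordinates, that a monotone path only steps one slot closer to the empty slot in x or in y, and that a caller-supplied `gain(a, b)` returns the combined wire and timing gain of moving the cell in slot `a` to neighboring slot `b`. The function name, the grid model and the gain values are hypothetical, not the authors' implementation.

```python
from functools import lru_cache
from typing import Callable, List, Tuple

Slot = Tuple[int, int]

def max_gain_path(src: Slot, dst: Slot,
                  gain: Callable[[Slot, Slot], float]) -> Tuple[float, List[Slot]]:
    """Best monotone path from an overlapped slot to an empty slot.

    Every step moves one slot closer to dst in x or in y, so the gain graph
    is acyclic and negative edge gains cause no trouble."""
    step_x = (dst[0] > src[0]) - (dst[0] < src[0])
    step_y = (dst[1] > src[1]) - (dst[1] < src[1])

    @lru_cache(maxsize=None)
    def best(slot: Slot) -> Tuple[float, Tuple[Slot, ...]]:
        if slot == dst:
            return 0.0, (dst,)
        options = []
        if slot[0] != dst[0]:
            options.append((slot[0] + step_x, slot[1]))
        if slot[1] != dst[1]:
            options.append((slot[0], slot[1] + step_y))
        candidates = []
        for nxt in options:
            total, path = best(nxt)
            candidates.append((gain(slot, nxt) + total, (slot,) + path))
        return max(candidates, key=lambda c: c[0])

    total, path = best(src)
    return total, list(path)

# Made-up edge gains; any edge not listed costs 1.0 (a negative gain).
gains = {((0, 0), (1, 0)): 2.0, ((1, 0), (1, 1)): -0.5, ((1, 1), (2, 1)): 1.0}
edge_gain = lambda a, b: gains.get((a, b), -1.0)

total, path = max_gain_path((0, 0), (2, 1), edge_gain)
print(total, path)

# With several candidate empty slots (up to one per quadrant), keep the best.
empties = [(2, 1), (-1, 0)]
best_total, best_path = max((max_gain_path((0, 0), e, edge_gain) for e in empties),
                            key=lambda r: r[0])
```

A ripple-move along the returned path would shift the cell in each slot one step toward the empty slot. Because the best total gain may still be negative, a caller should not assume legalization improves the objective.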
