ECE260B – CSE241A Winter 2007
Floorplanning, Partitioning and Placement
Website: http://vlsicad.ucsd.edu/courses/ece260b-w07
ECE 260B – CSE 241A Floorplanning, Partitioning and Placement
Andrew B. Kahng, UCSD

ECE260B – CSE241A Winter 2007
Floorplanning

Floorplanning Input
- Design netlist (required)
- Area requirements (required)
- Power requirements (required)
- Timing constraints (required)
- Physical partitioning information (required)
- Die size vs. performance vs. schedule trade-off (required)
- I/O placement (optional)
- Macro placement information (optional)

Floorplanning Output
- Die/block area
- I/Os placed
- Macros placed
- Power grid designed
- Power pre-routing
- Standard cell placement areas
→ Design ready for standard cell placement

[Figure: Floorplanning Output]

Floorplan
- Blocks inside a pad frame
- Routing inside, between blocks
- Different-sized blocks more difficult than standard cells to place and route
- Blocks
  - Hard, soft, semi-soft
  - Rectangular, L-shaped, T-shaped, rectilinear
  - Can rotate, mirror, …
[Figure labels: blocks, I/O pads, std cell, RAM, routing channels, data path]
Courtesy K. Yang, UCLA

Size Estimation
Why we care:
- If area is too small: P&R will not finish or meet timing, will run too long
- Schedule and die size inversely related
- Performance and die size have complex relationship
[Figure: physical design trade-off among Schedule, Perf, Die size]
Rule of thumb (must correct for power, clock, etc.):
- 3LM: Cell utilization 65 percent
- 3LM: Cell utilization 70 percent
- 5LM: Cell utilization 75 percent
- 6LM: Cell utilization 80 percent
Die size: what is utilization?
Floorplan metrics
- Low interconnect density → Cell util (standard cell area / standard cell row area)
- High interconnect density → "Net util" (number of nets / standard cell area)

Channels
- Channels end at block boundaries
- Alternate channel definitions possible, depending on position of blocks
[Figure: alternate channel definitions (ch 1, ch 2, ch 3) among blocks A, B, C]
Courtesy K. Yang, UCLA

Channel Intersection Graph
- Nodes are channels, edges correspond to pairs of channels that touch
- Channel graph shows paths between channels
- Channel graph can be used to guide global routing
[Figure: blocks A, B, C, D, E and their channel graph]
Courtesy K. Yang, UCLA

Slicing Floorplan Represented by Binary Tree
- A slicing floorplan can be recursively cut in two without cutting any blocks
- A slicing floorplan is guaranteed to have no "wheels", therefore guaranteed to have a feasible order of routing for the channels
- A slicing floorplan can be represented as a binary tree, with internal nodes representing slices in the floorplan and leaves representing blocks.
[Figure: slicing floorplan with cuts 1-4 and blocks A-E, and the corresponding binary tree]
Courtesy K. Yang, UCLA

O-Tree
- Partial ordering based on projection overlapping (with given physical locations)
- Transforming into binary trees by pivoting, etc.
- Coded in a node sequence given a tree traversal algorithm
  - E.g., OACBDEF for DFS
- Condensed solution space
[Figure: O-tree with root O over blocks A-F]
Courtesy K. Yang, UCLA
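The binary-tree view of a slicing floorplan described above can be sketched in a few lines. This is a minimal illustration, not the course's code: the names `Block`, `Cut`, and `bounding_box`, the "V"/"H" cut labels, and the block dimensions are all assumptions chosen for the example. Leaves are blocks, internal nodes are cuts, and the bounding box of the floorplan is computed bottom-up.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Block:
    """Leaf of the slicing tree: a rectangular block (dimensions are hypothetical)."""
    name: str
    width: float
    height: float


@dataclass
class Cut:
    """Internal node: 'V' puts children side by side, 'H' stacks them."""
    direction: str
    left: "Node"
    right: "Node"


Node = Union[Block, Cut]


def bounding_box(node):
    """Return (width, height) of the smallest rectangle holding the subtree."""
    if isinstance(node, Block):
        return node.width, node.height
    lw, lh = bounding_box(node.left)
    rw, rh = bounding_box(node.right)
    if node.direction == "V":   # vertical cut line: widths add up
        return lw + rw, max(lh, rh)
    if node.direction == "H":   # horizontal cut line: heights add up
        return max(lw, rw), lh + rh
    raise ValueError(f"unknown cut direction: {node.direction!r}")


# Block A next to a stack of B and C.
tree = Cut("V", Block("A", 2, 3), Cut("H", Block("B", 1, 1), Block("C", 1, 2)))
box = bounding_box(tree)
print(box)  # (3, 3)
```

In this example the block areas sum to 9, equal to the 3 x 3 bounding box, so the floorplan has no dead space. Swapping a cut direction or the children of a node gives a different floorplan from the same tree shape.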
Sequence Pair
- Based on layout partitions by nonoverlapping ascending/descending staircases
- Coded in two node sequences
  - E.g., CEDFAB for descending staircases and
  - ABCDEF for ascending staircases
- Larger solution space, finer representation
[Figure: blocks A-F]
Courtesy K. Yang, UCLA

ECE260B – CSE241A Winter 2007
Partitioning

Hypergraphs in VLSI CAD
- Circuit netlist represented by hypergraph
Courtesy K. Yang, UCLA

Hypergraph Partitioning in VLSI
- Circuit netlist represented by hypergraph
- Variants
  - directed/undirected hypergraphs
  - weighted/unweighted vertices, edges
  - constraints, objectives, …
- Human-designed instances
- Benchmarks
  - up to 4,000,000 vertices
  - sparse (vertex degree ≈ 4, hyperedge size ≈ 4)
  - small number of very large hyperedges
- Efficiency, flexibility: KL-FM style preferred
Courtesy K. Yang, UCLA

Example: Partitioning of a Circuit
- Input size: 48
- Cut 1 = 4, Cut 2 = 4
- Size 1 = 15, Size 2 = 16, Size 3 = 17
[Figure: circuit split into three parts by two cuts]
Courtesy K. Yang, UCLA
Hierarchical Partitioning
Levels of partitioning:
- System-level partitioning: Each sub-system can be designed as a single PCB
- Board-level partitioning: Circuit assigned to a PCB is partitioned into sub-circuits each fabricated as a VLSI chip
- Chip-level partitioning: Circuit assigned to the chip is divided into manageable subcircuits
  NOTE: physically not necessary

Delay at Different Levels of Partitions
[Figure: blocks A, B, C, D on PCB1 and PCB2, with relative delays x, 10x, 20x]

Context: Top-Down Placement
Speed
- 6,000 cells/minute to final detailed placement
- partitioning used only in top-down global placement
- implied partitioning runtime: 1 second for 25,000 cells, < 30 seconds for 750,000 cells
Structure
- tight balance constraint on total cell areas in partitions
- widely varying cell areas
- fixed terminals (pads, terminal propagation, etc.)

Fiduccia-Mattheyses (FM) Approach
Pass:
- start with all vertices free to move (unlocked)
- label each possible move with immediate change in cost that it causes (gain)
- iteratively select and execute a move with highest gain, lock the moving vertex (i.e., cannot move again during the pass), and update affected gains
- best solution seen during the pass is adopted as starting solution for next pass
FM:
- start with some initial solution
- perform passes until a pass fails to improve solution quality
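The pass structure above can be sketched as follows. This is a simplified illustration under stated assumptions, not the linear-time implementation from the FM paper: gains are recomputed from scratch by re-counting the cut instead of being kept in gain buckets, cells have unit area, and the balance rule is a plain cap `max_part_size` on each side. The names `cut_size`, `fm_pass`, `fm`, and the small netlist are hypothetical.

```python
def cut_size(nets, side):
    """Number of nets that have cells on both sides of the bipartition."""
    return sum(1 for net in nets if len({side[c] for c in net}) == 2)


def fm_pass(nets, side, max_part_size):
    """One pass: move each cell at most once, return the best solution seen."""
    side = dict(side)
    best_side, best_cut = dict(side), cut_size(nets, side)
    locked = set()
    while len(locked) < len(side):
        current = cut_size(nets, side)
        candidates = []
        for cell in side:
            if cell in locked:
                continue
            target = 1 - side[cell]
            target_size = sum(1 for s in side.values() if s == target)
            if target_size + 1 > max_part_size:
                continue  # move would violate the balance cap
            side[cell] = target                       # trial move
            gain = current - cut_size(nets, side)     # immediate change in cost
            side[cell] = 1 - target                   # undo trial move
            candidates.append((gain, cell))
        if not candidates:
            break
        gain, cell = max(candidates, key=lambda gc: gc[0])
        side[cell] = 1 - side[cell]   # execute highest-gain move, even if negative
        locked.add(cell)              # the cell cannot move again in this pass
        if current - gain < best_cut:
            best_cut, best_side = current - gain, dict(side)
    return best_side, best_cut


def fm(nets, side, max_part_size):
    """Repeat passes until a pass fails to improve the cut."""
    side, cut = dict(side), cut_size(nets, side)
    while True:
        new_side, new_cut = fm_pass(nets, side, max_part_size)
        if new_cut >= cut:
            return side, cut
        side, cut = new_side, new_cut


# Two triangles (a, b, c) and (d, e, f) joined by the single net (c, d).
nets = [("a", "b"), ("b", "c"), ("a", "c"),
        ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
start = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}
final_side, final_cut = fm(nets, start, max_part_size=4)
print(cut_size(nets, start), "->", final_cut)
```

Negative-gain moves are still executed once no improving move is left; this is what lets a pass climb out of a local minimum, and taking the best prefix of the move sequence keeps the result from getting worse.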
Cut During One Pass (Bipartitioning)
[Figure: cut size versus moves during one pass]

Multilevel Partitioning
[Figure: clustering, then refinement]

ECE260B – CSE241A Winter 2007
Placement

VLSI Design Flow and Physical Design Stage
[Figure: design flow boxes: IO Pad Placement; Power/Ground Stripes, Rings Routing; Global Placement; Detail Placement; Clock Tree Synthesis and Routing; Global Routing; Extraction and Delay Calc.; Timing Verification; Detail Routing]
Definitions:
- Cell: a circuit component to be placed on the chip area. In placement, the functionality of the component is ignored.
- Net: specifying a subset of terminals, to connect several cells.
- Netlist: a set of nets which contains the connectivity information of the circuit.

Placement Problem
Input:
- A set of cells and their complete information (a cell library).
- Connectivity information between cells (netlist information).
Output:
- A set of locations on the chip; one location for each cell
Goal:
- The cells are placed to produce a routable chip that meets timing and other constraints (e.g., low-power, noise, etc.)
Challenge:
- The number of cells in a design is very large (> 1 million)
- The timing constraints are very tight

Optimal Relative Order:
[Figure: cells A, B, C]

To spread ...
[Figure: cells A, B, C]

.. or not to spread
[Figure: cells A, B, C]

Place to the left
[Figure: cells A, B, C]
… or to the right
[Figure: cells A, B, C]

Optimal Relative Order:
[Figure: cells A, B, C]
Without "free" space, the placement problem is dominated by order

Placement Problem
[Figure: a bad placement and a good placement]

Global and Detailed Placement
- In global placement, we decide the approximate locations for cells by placing cells in global bins.
- In detailed placement, we make some local adjustment to obtain the final nonoverlapping placement.
[Figure: Global Placement, Detailed Placement]

Placement Footprints:
- Standard Cell
- Data Path
- IP - Floorplanning

Placement Footprints:
- Mixed Data Path & sea of gates
[Figure labels: Core, Control, IO, Reserved areas]

Placement Footprints:
- Perimeter IO
- Area IO

Placement objectives are subject to user constraints / design style
- Hierarchical Design Constraints
  - pin location
  - power rail
  - reserved layers
- Flat Design with Floorplan Constraints
  - Fixed Circuits
  - I/O Connections

Standard Cells
[Figure: standard cells]

Standard Cells
- Power connected by abutment, placed in sea-of-rows
- Rarely rotated
- DRC clean in any combination
- Circuit clean (i.e.
  no naked T-gates, no huge input capacitances)
- 8, 9, 10+ tracks in height
- Metal 1 only used (hopefully)
- Multi-height stdcells possible
- Buffers: sizes, intrinsic delay steps, optimal repeater selection
- Special clock buffers + gates (balanced P:N)
- Special metastability hardened flops
- Cap cells (metal1 used?)
- Gap fillers (metal1 used?)
- Tie-high, tie-low

Unconstrained Placement
[Figure: unconstrained placement]

Floorplanned Placement
[Figure: floorplanned placement]

Traditional Placement Algorithms
- Bi-Partitioning / Quadrisection
- Force Directed Placement
- Hybrid
- Simulated Annealing
- Quadratic Placement
[Figure axes: Netlist Granularity, Layout Coarseness, Algorithm, Cost Function]

Quadratic Placement
\[ \min F = (x_1 - x_3)^2 + (x_1 - x_2)^2 + (x_2 - x_4)^2 \]
\[ \partial F / \partial x_1 = 0, \quad \partial F / \partial x_2 = 0 \;\Rightarrow\; A x = B \]
\[ A = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}, \quad B = \begin{pmatrix} x_3 \\ x_4 \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \]
[Figure: nodes x1, x2, x3, x4]

Analytical Placement
- Get a solution with lots of overlaps
- What do we do with the overlap?

Pros and Cons of QP
Pros
- Very fast analytical solution
- Can handle large design sizes
- Can be used as an initial seed placement engine
Cons
- Can generate overlapped solutions: post-processing needed
- Not suitable for timing-driven placement
- Not suitable for simultaneous optimization of other aspects of physical design (clocks, crosstalk, …)
- Gives trivial solutions without pads (and close to trivial with pads)
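A small numeric check of the quadratic placement system above, as a sketch. It assumes x3 and x4 are fixed coordinates (for example pads) and x1, x2 are the movable cells, which is consistent with B holding x3 and x4; the function names and the pad positions 0 and 3 are hypothetical. The 2 x 2 system is solved with Cramer's rule so no library is needed.

```python
def quadratic_cost(x1, x2, x3, x4):
    """F = (x1 - x3)^2 + (x1 - x2)^2 + (x2 - x4)^2 for the three two-pin nets."""
    return (x1 - x3) ** 2 + (x1 - x2) ** 2 + (x2 - x4) ** 2


def solve_two_cell_qp(x3, x4):
    """Solve A x = B with A = [[2, -1], [-1, 2]] and B = [x3, x4]."""
    a11, a12, a21, a22 = 2.0, -1.0, -1.0, 2.0
    det = a11 * a22 - a12 * a21          # equals 3, so A is invertible
    x1 = (x3 * a22 - a12 * x4) / det     # Cramer's rule, first unknown
    x2 = (a11 * x4 - a21 * x3) / det     # Cramer's rule, second unknown
    return x1, x2


x1, x2 = solve_two_cell_qp(x3=0.0, x4=3.0)
print(x1, x2)  # 1.0 2.0

# The solution should not be beaten by small perturbations of either cell.
optimum = quadratic_cost(x1, x2, 0.0, 3.0)
print(optimum <= quadratic_cost(x1 + 0.1, x2, 0.0, 3.0))
print(optimum <= quadratic_cost(x1, x2 - 0.1, 0.0, 3.0))
```

The two cells end up evenly spaced between the fixed endpoints. Setting x3 = x4 makes both cells land on the same coordinate, a small instance of the overlap problem noted on the QP slides.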
Simulated Annealing Placement
- Initial placement improved through swaps and moves
- Accept a swap/move if it improves cost
- Accept a swap/move that degrades cost under some probability conditions
[Figure: cost versus time]

Pros and Cons of SA
Pros:
- Can reach globally optimal solution (given "enough" time)
- Open cost function
- Can optimize simultaneously all aspects of physical design
- Can be used for end case placement
Cons:
- Extremely slow process of reaching a good solution

Bi-Partitioning / Quadrisection
[Figure: bi-partitioning / quadrisection]

Pros and Cons of Partitioning Based Placement
Pros:
- More suitable to timing driven placement since it is move based
- New innovations (hMetis) in partitioning algorithms have made this extremely fast
- Open cost function
- Move based means simultaneous optimization of all design aspects possible
Cons:
- Not well understood
- Lots of "indifferent" moves
- May not work well with some cost functions

Cost Functions of Placement
- Net-cut
- Linear wirelength
- Quadratic wirelength
- Congestion
- Timing
- Coupling
- Other performance related cost functions
- Undiscovered: crossing
[Figure axes: Netlist Granularity, Layout Coarseness, Algorithm, Cost Function]

Net-cut Cost for Global Placement
- The net-cut cost is defined as the number of external nets between different global bins
- Minimizing net-cut in global placement tends to put highly connected cells close to each other.
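The annealing loop and the net-cut cost from the slides above can be combined in one short sketch. Several choices here are assumptions, not taken from the slides: the "probability condition" is the common Metropolis rule exp(-delta / T), the cooling schedule is geometric, only swaps between bins are tried so bin occupancy never changes, and the names `net_cut`, `anneal`, and the example netlist are hypothetical.

```python
import math
import random


def net_cut(nets, bin_of):
    """Number of nets whose cells lie in more than one global bin."""
    return sum(1 for net in nets if len({bin_of[c] for c in net}) > 1)


def anneal(nets, bin_of, t_start=2.0, t_end=0.01, alpha=0.9,
           moves_per_t=50, seed=0):
    """Swap-based annealing over a cell-to-bin assignment; returns best seen."""
    rng = random.Random(seed)
    bin_of = dict(bin_of)
    cells = sorted(bin_of)
    cost = net_cut(nets, bin_of)
    best, best_cost = dict(bin_of), cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            a, b = rng.sample(cells, 2)
            if bin_of[a] == bin_of[b]:
                continue  # swapping within one bin changes nothing
            bin_of[a], bin_of[b] = bin_of[b], bin_of[a]
            delta = net_cut(nets, bin_of) - cost
            # Always accept improvements; accept degradations with a
            # probability that shrinks as the temperature drops.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost += delta
                if cost < best_cost:
                    best, best_cost = dict(bin_of), cost
            else:
                bin_of[a], bin_of[b] = bin_of[b], bin_of[a]  # reject: undo swap
        t *= alpha  # geometric cooling (an assumed schedule)
    return best, best_cost


# Two tightly connected groups joined by one net, scattered over two bins.
nets = [("a", "b"), ("b", "c"), ("a", "c"),
        ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
initial = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}
best, best_cost = anneal(nets, initial)
print(net_cut(nets, initial), "->", best_cost)
```

Because the cost function is just a Python callable, any other cost listed on the slide could be substituted for `net_cut`, which is the "open cost function" advantage; the price is the large number of cost evaluations.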
Linear Wirelength Cost
- The linear length of a net between cell 1 and cell 2 is $l_{12} = |x_1 - x_2| + |y_1 - y_2|$
- The linear wirelength cost is the summation of the linear length of all nets.
[Figure: cell 1 at (x1, y1) and cell 2 at (x2, y2)]

Quadratic Wirelength Cost
- The quadratic length of a net between cell 1 and cell 2 is $l_{12} = (x_1 - x_2)^2 + (y_1 - y_2)^2$
- The quadratic wirelength cost is the summation of the quadratic length of all nets.
[Figure: cell 1 at (x1, y1) and cell 2 at (x2, y2)]

Timing Cost
- Delay of the circuit is defined as the longest delay among all possible paths from primary inputs to primary outputs.
- Interconnection delay becomes more and more important in deep submicron regime.
[Figure: critical path]

Timing Analysis
[Figure: timing graph between latches, annotated with gate and interconnect delays]
How do we get the delay numbers on the gate/interconnect?

Approaches
- Budgeting
  - Inaccurate information
  - Fast
- Path Analysis
  - Most accurate information
  - Very slow
- Path analysis with infrequent path substitution
  - Somewhere in between

Timing Metrics
- How do we assess the change in a delay due to a potential move during physical design?
- Whether it is channel routing or area routing, the problem is the same: translate geometrical change into delay change

Other costs: Coupling Cost
- Hard to model during placement
- Can run a global router in the middle of placement
- Even at the global routing level it is hard to model it
- Avoid it
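The linear and quadratic wirelength costs and the longest-path definition of circuit delay from the slides above can be written out directly. This sketch assumes two-pin nets, as in the wirelength slides, and a timing graph given as a DAG whose sinks are the primary outputs; the function names, coordinates, and delay values are hypothetical.

```python
def linear_wirelength(nets, pos):
    """Sum of |x1 - x2| + |y1 - y2| over all two-pin nets."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in nets)


def quadratic_wirelength(nets, pos):
    """Sum of (x1 - x2)^2 + (y1 - y2)^2 over all two-pin nets."""
    return sum((pos[a][0] - pos[b][0]) ** 2 + (pos[a][1] - pos[b][1]) ** 2
               for a, b in nets)


def critical_path_delay(edges, primary_inputs):
    """Longest total delay from any primary input to a sink of the DAG.

    edges maps a node to a list of (successor, delay) pairs.
    """
    memo = {}

    def longest_from(node):
        if node not in memo:
            memo[node] = max((delay + longest_from(succ)
                              for succ, delay in edges.get(node, [])),
                             default=0)
        return memo[node]

    return max(longest_from(n) for n in primary_inputs)


pos = {1: (0, 0), 2: (3, 4), 3: (3, 0)}
nets = [(1, 2), (2, 3)]
print(linear_wirelength(nets, pos))     # 11
print(quadratic_wirelength(nets, pos))  # 41

edges = {"in": [("g1", 2), ("g2", 1)], "g1": [("out", 3)], "g2": [("out", 5)]}
print(critical_path_delay(edges, ["in"]))  # 6
```

The quadratic cost penalizes the long net (1, 2) much more heavily than the linear cost does (25 versus 7), which is why quadratic placement tends to shorten long nets at the expense of short ones.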
Coupling Solutions
- Once we have some metrics for coupling, we can calculate sensitivities, and optimize the physical design...
[Figure labels: Spacing (extra space); Shielding (grounded shields); Segregation (noisy region, quiet region)]

Other Performance Costs
- Power usage of the chip
  - Weighted nets
  - Dual voltages (severe constraint on placement)
- Very little known about these cost functions and their interaction with other cost functions
- Fundamental research is needed to shed some light on the structure of them

Placement References
- C. J. Alpert, T. Chan, D. J.-H. Huang, I. Markov, and K. Yan, "Quadratic Placement Revisited", Proc. 34th IEEE/ACM Design Automation Conference, 1997, pp. 752-757
- C. J. Alpert, J.-H. Huang, and A. B. Kahng, "Multilevel Circuit Partitioning", Proc. 34th IEEE/ACM Design Automation Conference, 1997, pp. 530-533
- U. Brenner and A. Rohe, "An Effective Congestion Driven Placement Framework", International Symposium on Physical Design, 2002, pp. 6-11
- A. E. Caldwell, A. B. Kahng, and I. L. Markov, "Can Recursive Bisection Alone Produce Routable Placements", Proc. 37th IEEE/ACM Design Automation Conference, 2000, pp. 477-482
- M. A. Breuer, "Min-Cut Placement", J. Design Automation and Fault Tolerant Computing, 1(4), 1977, pp. 343-362
- J. Vygen, "Algorithms for Large-Scale Flat Placement", Proc. 34th IEEE/ACM Design Automation Conference, 1997, pp. 746-751
- H. Eisenmann and F. M. Johannes, "Generic Global Placement and Floorplanning", Proc. 35th IEEE/ACM Design Automation Conference, 1998, pp. 269-274
- S.-L. Ou and M. Pedram, "Timing Driven Placement Based on Partitioning with Dynamic Cut-Net Control", Proc. 37th IEEE/ACM Design Automation Conference, 2000, pp. 472-476
- C. M. Fiduccia and R. M. Mattheyses, "A linear time heuristic for improving network partitions", Proc. ACM/IEEE Design Automation Conference
  (1982), pp. 175-181.

Reading Assignment (Posted on the web)
- C. M. Fiduccia and R. M. Mattheyses, "A linear time heuristic for improving network partitions", Proc. ACM/IEEE Design Automation Conference (1982), pp. 175-181.
- A. E. Caldwell, A. B. Kahng, and I. L. Markov, "Design and Implementation of the Fiduccia-Mattheyses Heuristic for VLSI Netlist Partitioning", Proc. Workshop on Algorithm Engineering and Experimentation (ALENEX), January 1999
- (Optional): C. J. Alpert and A. B. Kahng, "Recent Directions in Netlist Partitioning: A Survey", Integration: The VLSI Journal 19 (1995), pp. 1-81.

Homework – Friday 1/19
- If I model one wire segment with R, C = Rw, Cw by two segments in series, each with R, C = Rw/2, Cw/2, how does Elmore delay change?
- What are the differences between Kernighan-Lin and Fiduccia-Mattheyses?