VLSI Physical Design Automation
Spring Semester
Course Coordinator: Dr. R. M. Vemuri

Course Structure
• Final grade will depend on the following:
  • Two minors: 25% each
  • One major: 50%
• Ethical Conduct:
  • No cheating in any assignment or examination will be tolerated; maximum penalty as per university rules.
  • Plagiarism will be treated the same as cheating. Don't claim others' work as your own.
• Textbooks/references:
  • VLSI Physical Design Automation: Theory and Practice by Sadiq Sait & Habib Youssef
  • Algorithms for VLSI Design Automation by Sabih H. Gerez
  • Practical Problems in VLSI Physical Design Automation by Sung Kyu Lim
  • VLSI Physical Design: From Graph Partitioning to Timing Closure by Andrew Kahng et al.

Motivation
• First IC by Jack Kilby of TI in 1958: 1/16 x 7/16 inch, germanium
• Pentium 4, introduced in 2000: > 40M transistors on silicon, 16.7 × 17.6 mm²
• More recent processors have > 3B transistors

Design Size and Productivity Gap
• The number of transistors on a chip doubles every 1.5 to 2.0 years (Moore's law).
• Design complexity grows exponentially.
• Designer/engineer productivity cannot grow exponentially.
• This causes a productivity gap between what is required and what is actually achieved.

• VLSI (Very Large Scale Integration) refers to the technology through which circuits with more than 1M transistors can be implemented in silicon.
• VLSI has been used successfully to build microprocessors, DSPs, large-capacity memories, etc. on a single chip.
• Such rapid growth in integration technology would not be possible without automation (software/tools) of the various steps.
• In this course we will delve into the algorithms and methods for developing such automation.

Design Process
• How do you approach any complex problem?
• Divide and conquer!
• Break up a complex problem into smaller problems and tackle each smaller problem.
• Develop expertise in each smaller area.
• Integrate the smaller solutions into the overall solution.
• Automate, automate, automate.

VLSI Design Process
(Figure: design levels and their CAD subproblems: Idea (generic CAD tools); Behavioral/Architectural level, architectural design (simulation tools); Register Transfer Level, logical design (functional and logic minimization tools, simulation tools); Library cells and masks, physical design (APR tools).)

Solving the Design Complexity Problem
• Design abstraction
• Design partition
• Design automation
• Design reuse (soft IP or hard IP)

VLSI Design Cycle
Just like any design, the VLSI design cycle starts with a formal specification (spec) of a VLSI chip and follows a series of steps:
1. System Specification
  a. High-level representation of the system.
  b. Includes performance, functionality, dimensions, power, and fabrication technology.
2. Architectural Design
  a. What instruction set should be used, how many ALUs, which memory addressing modes?
  b. Whether to use pipelining, and what the pipeline depth should be.
  c. The outcome of this stage is a document called the MAS (micro-architectural specification).
  d. Predict performance, power, and die size.
  e. Early estimates are critical to determine the viability of the product.
3. Behavioral Design
  a. The main functional units of the design are identified.
  b. Interconnect requirements between units are also identified.
  c. PPA (performance, power, area) estimates are generated for each unit.
  d. Implementation specifics are still not identified at this stage. For example, it may identify that a multiplication is needed, but not the specific order in which it has to be implemented.
4. Logic Design
  a. Control flow, word widths, register allocation, and arithmetic/logical operations are derived in RTL.
  b. RTL is expressed in an HDL such as Verilog or VHDL.
  c. It consists of Boolean expressions and timing information.
  d. High-level synthesis tools can be used to generate RTL from the behavioral design.
5. Circuit Design
  a. A circuit representation is derived from the RTL.
  b. For digital circuits this is done mainly with logic synthesis tools; for analog designs, with schematic capture tools.
  c. Circuit simulation tools are used to verify timing correctness: static timing analysis for digital logic and SPICE simulation for analog.
  d. This representation is also known as a netlist.
6. Physical Design
  a. The netlist is converted into a geometric representation.
  b. This geometric representation is known as the layout.
  c. These geometric forms represent transistors and the multiple layers of wires connecting them. For example, a rectangle of polysilicon over a rectangle of diffusion forms one transistor.
  d. The layout has to follow strict design rules imposed by the process technology.
7. Fabrication
  a. After physical verification, the design goes through a process called tape-out.
  b. From the geometric shapes, masks are created to etch the patterns on silicon.
  c. The patterns are etched onto dies made of silicon; several dies exist on a single wafer.
8. Packaging and Post-Silicon Validation
  a. At this stage the wafer is diced into individual chips, which then go through packaging.
  b. Post-silicon validation is done on early silicon, using emulators, to detect any bugs.
• Some steps are usually combined, as in the flow below (stage, its outputs, and the tools used):
  Idea (problem description): no tools
  Architectural and Behavioral Design (high-level design decisions, early estimates of PPA): modeling and simulation tools
  Logic and Circuit Design (RTL, netlist, and schematic entry): synthesis and logic minimization tools
  Physical Design (cell level and layout): tools for partitioning, placement, and routing
  Fabrication and Packaging (actual hardware; industry term: "first silicon"): manufacturing and packaging tools, actual machines
  All stages before fabrication use software only; in fabrication and packaging, software only controls the machines.

Some photos from the web
(Figures: layout view of a processor [1]; silicon wafer [2]; packaged microprocessor [3].)
[1] https://superuser.com/questions/324284/what-is-meant-by-the-terms-cpu-core-die-and-package
[2] https://www.dreamstime.com/photos-images/silicon-wafer.html
[3] https://www.dreamstime.com/royalty-free-stock-images-microprocessor-image1915559

Physical Design Cycle
• The input to the physical design cycle is a netlist (circuit) and the output is a layout. This is accomplished in several stages:
➢ Partitioning:
✓Large designs consist of millions or billions of transistors, and it is not possible to work with the entire design at once because of memory and compute limitations. The chip must therefore be partitioned into sub-units, which are later integrated into the full chip after they go through layout and verification.
✓The process considers factors such as the function of each block, the size of each block, the total number of blocks, the number of interconnections between blocks (pins/ports), and the design team size.
✓The output of this stage is a set of blocks and the interconnections required between blocks.
✓Each block can recursively be partitioned into sub-blocks.

➢ Floorplanning and Placement:
✓Floorplanning looks at good layout alternatives for each block as well as for the full chip.
✓Area can be estimated from the types and number of components in the block.
✓Interconnect area is still only an estimate at this point.
✓Blocks are mostly rectangular, but this is not required.
✓Floorplanning is usually done by humans, as we are better at visualizing the entire floorplan than today's state-of-the-art software.
✓During placement, blocks are positioned exactly on the chip.
✓The goal is to find a minimum-area arrangement that allows completion of the interconnections between blocks while meeting the performance constraints.
✓Placement is usually done in two phases: in the first phase an initial placement is created, and in the second phase incremental adjustments are made.
✓The quality of a placement can be determined only after routing is completed.

➢ Routing:
✓The objective is to complete the interconnections between all pins to meet the functionality determined by the netlist.
✓First, the routing space is partitioned into rectangular regions called channels and switchboxes.
✓This includes the space between blocks as well as the space on top of the blocks.
✓The goal of a router is to complete all circuit connections using the shortest possible wire length and using only the channels and switchboxes.
✓The first phase of routing is called global routing: it determines the channels and switchboxes through which each wire should be routed.
✓The second phase is detailed routing: it determines the exact wire dimensions and the layers used for the wires.
✓Routing is an NP-hard problem, so much research has focused on heuristics to solve it.

➢ Compaction:
✓Squeezes the layout from all sides to eliminate any wasted space.
✓The goal is to reduce the overall area. By making the chip smaller, wire lengths are reduced, which in turn reduces signal delay, and more chips can be manufactured on the same wafer.
✓When more chips are made from the same wafer, the cost per chip also goes down.
✓Compaction is very compute intensive, so it is used only for very high-volume parts.

➢ Extraction and Verification:
✓Design Rule Checking (DRC) verifies that the geometric patterns on the chip comply with all the rules required by the fabrication process, such as wire-to-wire spacing, minimum wire widths, and antenna checks.
✓Layout Versus Schematic (LVS) extracts the circuit from the layout and compares it with the original schematic for functional accuracy.
✓Parasitic extraction calculates the R and C values of all wires based on their material properties and surroundings; these values are used to verify performance requirements.
✓Reliability verification determines the longevity of the chip as well as its reliability against effects such as electrostatic discharge.

Standard Cell
• The standard cell design style is also known as APR design, ASIC design, or sea-of-cells design.
• It uses a library of pre-designed cells known as standard cells (several thousand cells in a library).
• Each cell in the library is a rectangle, and all cells have the same height.
• The standard cell library goes through a characterization flow to determine and verify its functionality and timing.
• Connection points (pins/ports) are distributed on the surface or edges of the cells.
• Cells typically use no more than two layers to complete the interconnections within the cell.
• Cells have a single level of hierarchy, with no sub-hierarchies within.
• Designs using this methodology are not as compact as custom designs, but can be completed much faster.
• Automation starts after RTL and completes the full physical design; in industry this flow is also known as RTL2GDS.
• Developing the standard cell library requires a significant initial investment, but the library can be reused for any number of designs.

Gate Arrays
• A simplification of standard cell design.
• All the cells in a gate array are the same (e.g., all NAND or all NOR).
• Consists of arrays of gates separated by both vertical and horizontal channels.
• Sea of gates is an improvement on the gate array.
• The entire design has to be mapped to the same gate.
(Figure: gate array example.)
• Only the connectivity needs to be added.
• Reduced design time.
• Less chance of error.
• Increase in area.

Field Programmable Gate Array (FPGA)
• In FPGAs, cells and interconnect are pre-fabricated.
• Provides flexibility in design through software.
• Lower development cost and faster time to market.
• Easily reconfigurable in the field.
• Not as fast or power efficient as other design styles.
• High area overhead due to unusable space on the FPGA.

Impact of Fabrication on Physical Design
• Scaling: the process of shrinking the size of the layout is called scaling.
• Transistors and the interconnects that connect them are made smaller.
• As a transistor becomes smaller, it becomes faster, conducts more current, and consumes less power.
• The cost of producing a transistor goes down, and more of them can be packed onto one wafer.
• Example: a chip is designed in a 0.25 µm process and has dimensions m × m (area m²).
  • Assume a typical shrink factor of 0.7 from 0.25 µm to 0.18 µm.
  • The width and the height of the chip are each scaled by 0.7: 0.7m × 0.7m = 0.49m² ≈ 0.5m².
  • The scaled chip is therefore about half the size of the original chip.
• Transistor delay also scales accordingly, but interconnect delay does not.
• Interconnect delay becomes a larger factor as feature sizes shrink.

Scaling Methods
• Two basic types of scaling: full scaling and constant-voltage (CV) scaling.
• The table below shows the difference (scaling factor S):

  Parameter                                     Full scaling   CV scaling
  Dimensions: width, length, oxide thickness    1/S            1/S
  Voltages: power supply, threshold             1/S            1
  Gate capacitance                              1/S            1/S
  Current                                       1/S            S
  Propagation delay                             1/S            1/S²

  Table taken from "Algorithms for VLSI Physical Design Automation" by Naveed Sherwani, p. 77.

Parasitic Effects of Scaling
• Circuit elements come closer to one another with process scaling.
• This increases the inter-component capacitance values.
• The capacitance between signal paths and the capacitance from a signal path to ground are the two major parasitic capacitances.
• Another is the inherent capacitance of the MOS transistor.

Interconnect Delay/Signal Integrity
• Interconnect delay is typically 50-70% of the overall delay.
• The resistance of a wire is given by $R = \frac{\rho l_c}{h_c w_c}$, where $\rho$ is the resistivity of the material, $l_c$ the length of the wire, $h_c$ its height, and $w_c$ its width.
• The capacitance grows with wire height and width and shrinks with spacing: $C \propto h_c$, $C \propto w_c$, and $C \propto \frac{1}{s}$, where $s$ is the wire-to-wire spacing.
• With scaling, $h_c$ and $w_c$ shrink, so the resistance goes up significantly; the capacitance also goes up, but not as significantly.
• Other signal integrity issues, such as noise and crosstalk, have to be dealt with during design.

Heuristic Algorithms
• Heuristic algorithms are frequently used for solving NP-complete problems.
• A heuristic algorithm produces a solution but does not guarantee optimality.
• Heuristics have to be tested on benchmarks to verify their effectiveness.
• A good heuristic must have low time and space complexity and must produce a near-optimal solution.
• On average, these algorithms should produce acceptable (good) results.
• In many cases, O(n) heuristics have been developed even where an optimal O(n²) or O(n³) algorithm exists.
• Even when optimal algorithms exist but have high time complexity, it is desirable to use a heuristic that gives a near-optimal solution within a reasonable time.

Data Structures and Basic Algorithms
• VLSI design (going from HDL to silicon) can be viewed as a significant database management problem.
• The layout is captured as a database of polygons, organized as several layers of planar rectangles with certain properties.
• Each polygon is captured with great precision. This precision is necessary because the information has to be communicated to devices such as plotters, video displays, and finally to fabrication machines.
• Many VLSI problems can be represented as graphs, so graph theory can be used to understand them and find solutions.

Definition of a Graph
❑A graph is a non-empty finite set of vertices V together with a set of edges E, both ends of each edge belonging to V. Vertices that do not belong to any edge are called isolated.
❑Edges may be drawn straight or curved; the lengths of edges and the positions of vertices are arbitrary.
❑Notation: $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_n\}$, $V \neq \emptyset$, and $E \subseteq V \times V$.
(Figure: example graph with vertices v1 to v5 and edges e1 to e4.)

Some Basic Concepts
❑Let v1, v2 be vertices and e = ⟨v1, v2⟩ the edge connecting them. Then v1 and v2 are incident to edge e.
❑Two edges incident to a common vertex are called adjacent.
❑The number of vertices of a graph G is denoted by p, and the number of edges by q.
➢A graph is called complete if every two distinct vertices are connected by exactly one edge; a complete graph therefore has q = p(p − 1)/2 edges.

Some Basic Concepts
❑The degree of a vertex, denoted d(v) or deg(v), is the number of edges of the graph to which the vertex belongs. A vertex with d(v) = 0 is isolated; a vertex with d(v) = 1 is terminal.
(Figure: example graph on vertices a to e.) deg(d) = 3; deg(e) = 1, so e is a terminal vertex; deg(c) = 0, so c is an isolated vertex.
❑A vertex is called odd if d(v) is odd, and even if d(v) is even. The degree of each vertex of a complete graph is one less than the number of its vertices.

Properties of the Degree of Vertices
❑In a graph G(V, E), the sum of the degrees of all vertices is an even number, equal to twice the number of edges.
❑The number of odd vertices of any graph is even.
❑In any simple graph with n vertices (n ≥ 2), at least two vertices have the same degree.
❑If a simple graph with n vertices (n ≥ 2) has exactly two vertices of the same degree, then it has either exactly one vertex of degree 0 or exactly one vertex of degree n − 1.

Paths of the Graph
❑A route in a graph is an alternating sequence of vertices and edges in which any two adjacent elements are incident: v0, e1, v1, e2, v2, ..., ek, vk.
❑If v0 = vk, the route is closed; otherwise it is open.
❑If all edges are distinct, the route is called a chain.
❑If all vertices are distinct, the route is called a simple chain. In the chain v0, e1, v1, e2, v2, ..., ek, vk, the vertices v0 and vk are called the ends of the chain.
❑A closed chain is called a cycle; a closed simple chain is called a simple cycle.
❑A graph without cycles is called acyclic.
❑If M is the route v0, e1, v1, e2, v2, ..., ek, vk, then the length of M is k.

Undirected Graph
❑An undirected graph is a graph whose edges have no direction assigned to them.
❑Example: (Figure: undirected graph with vertices a, b, c, e, f and edges numbered 1 to 5.)

Undirected Graph
❑A graph is defined in terms of vertices and edges; two vertices are connected to each other by an edge.
❑A graph is G = (X, E), where X is the set of vertices and E is the set of edges connecting the vertices; each element of E is an unordered pair of elements of X.
❑Any system that contains a set of elements (the vertices) and relationships between pairs of elements (the edges) can be described by an undirected graph.

Directed Graph
❑A directed graph contains edges that are ordered pairs of vertices.
➢ In a directed graph, each edge has a direction associated with it, indicated by an arrow.
(Figure: the same example graph with directed edges.)

Network
❑ A network is a graph in which each edge (or arc) is associated with a number (value).
(Figure: example network with weighted edges e1 to e4, including a negative weight.)
❑ The actual meaning of the numbers depends on the application; in general they may be positive or negative.
❑ A network is also called a weighted graph.
(Figure: a graph containing a loop.)

Planar Graph
❑ A planar graph is a graph that can be drawn in the plane with no two edges crossing each other.
(Figure: a graph on vertices 1 to 4 drawn with crossing edges, and the same graph redrawn with no crossings.)
❑ This is a planar graph.

Tree
❑ A tree is an acyclic connected graph.
➢ For any pair of nodes in the tree, there is exactly one path between them.
❑ A binary tree is a tree in which every node has at most two children (so at most three neighbors, counting its parent).
➢ Every node other than a leaf has up to two child edges; leaf nodes have none.
(Figure: binary tree with internal nodes and leaf nodes labeled.)

Tree Terminology
⚫ Root: A
⚫ Internal nodes: A, C
⚫ Leaves: B, D, E
⚫ A's children: B, C
⚫ D's parent: C
⚫ C's sibling: B
⚫ E's grandparent: A
⚫ Height: 2
⚫ Shorter binary trees are better for most algorithms and data structures.
(Figure: tree with root A, whose children are B and C; C's children are D and E.)

Data Structures for Representation of Graphs
⚫ Adjacency matrix
⚫ List representation

Complexity Issues and NP-hardness
• Many algorithms and mathematical techniques are used for solving physical design problems in VLSI.
• We will study some algorithms that fall into the categories of greedy algorithms and heuristic algorithms.
• Due to the size of VLSI designs, all algorithms must have low time and space complexity.
• A major cause of concern is the absence of polynomial-time algorithms for the majority of problems encountered in physical design automation.

• Solvable problems are commonly classified using the classes P and NP.
• The class P consists of problems that can be solved in polynomial time by a deterministic Turing machine.
• NP problems can be solved in polynomial time by a non-deterministic Turing machine, which can be viewed as a parallel computer with as many processors as we need (not a realistic model).
• A problem X is NP-hard if every problem in NP can be reduced to X in polynomial time; if X is also in NP itself, X is NP-complete.

• Exponential Algorithms
  • If the problem size is small, exponential-time algorithms can be used. This is done when the solution is critical to the chip for some practical purpose, so it is worth taking the time penalty to obtain optimality. Integer programming, used for combinatorial optimization, is one such example.
• Special Case Algorithms
  • Not really a class in itself: a complex problem may be simplified by applying certain constraints. Sometimes an NP-complete problem becomes solvable in polynomial time when restrictions are applied. For example, placing cells of equal height (standard cells) in rows is easier than placing cells of unequal heights.
• Approximation Algorithms
  • Useful when near-optimality is sufficient. Such algorithms produce results that may be suboptimal but are guaranteed to be within a known bound of the optimum.

• Some Graph Algorithms: one significant advantage of using graph algorithms is that they have been well studied and understood.
• Graph search algorithms have many applications in VLSI physical design, where problems are modeled using graphs.
• Depth-first search (DFS): an edge is selected for exploration from the most recently visited vertex v. When all edges of v have been explored, the algorithm backtracks to the previous vertex that still has an unexplored edge.
• The time complexity of DFS is O(|V| + |E|), where V is the set of vertices and E is the set of edges.
• Breadth-first search (BFS): the algorithm starts from a source vertex and visits all of its adjacent vertices before exploring the adjacencies of other vertices.
  • Start with a source vertex v.
  • Explore all edges of v.
  • Put the reachable (adjacent) vertices in a queue and mark v as visited.
  • If a vertex is already marked, it is not queued again.
  • Repeat this process for each vertex in the queue.
• The time complexity of BFS is also O(|V| + |E|).

• Spanning Tree Algorithms
  • Many graph problems are subset selection problems.
  • Given a graph G = (V, E), select a subset V′ ⊆ V such that V′ has property P.
  • A spanning tree is a set of edges that spans all the vertices and forms a tree.
  • Each edge of the graph has a cost associated with it; a minimum spanning tree (MST) is a spanning tree with minimum total edge cost.
  • For example, the cost of each edge may be the distance between two pins in a design; the MST then gives the minimum total wire length among all spanning trees connecting the pins.

• Two classic algorithms for finding an MST are:
  • Kruskal's algorithm, and
  • Prim's algorithm
• We will study both Kruskal's algorithm and Prim's algorithm.

• Kruskal's algorithm for finding a minimum spanning tree of a graph with V vertices:
  1. Sort all the edges in non-decreasing order of their weight.
  2. Pick the smallest remaining edge and check whether it forms a cycle with the spanning tree formed so far:
     i. If a cycle is formed, discard the edge.
     ii. If a cycle is not formed, add the edge to the spanning tree.
  3. Repeat step 2 until all V vertices are connected (the tree has V − 1 edges).
• Kruskal's algorithm is a greedy algorithm.
• It always picks the minimum-weight remaining edge to add to the tree.
• In general, greedy algorithms can get stuck in local optima; for the MST problem, however, this greedy choice is provably optimal.

Prim's Algorithm
1: Choose an arbitrary vertex as the starting vertex of the MST.
2: Repeat steps 3 to 5 while there are vertices not yet included in the MST.
3: Find the edges connecting any tree vertex to a vertex not yet in the tree.
4: Find the minimum-weight edge among these.
5: Add the chosen edge and its new endpoint to the MST; because the edge joins a tree vertex to a non-tree vertex, it cannot form a cycle.

Shortest Path Algorithms
• Many routing problems in VLSI are shortest path problems, so these algorithms play a significant role in VLSI design.
• Single-source shortest path:
  • Given an edge-weighted graph G = (V, E) and two vertices u, v ∈ V, find a path of minimum total cost from u to v in G.
  • Let w(p, q) be the weight of edge (p, q), with w(p, q) ≥ 0 for each (p, q) ∈ E.
  • Dijkstra's algorithm solves this problem in O(n²) time, where n = |V|; it computes the shortest distances from the source to all vertices.

Dijkstra's Algorithm
    dist[s] ← 0                                  # distance to source vertex is zero
    for all v ∈ V − {s} do
        dist[v] ← ∞                              # set all other distances to infinity
    S ← ∅                                        # S, the set of visited vertices, is initially empty
    Q ← V                                        # Q, the queue, initially contains all vertices
    while Q ≠ ∅ do                               # while the queue is not empty
        u ← mindistance(Q, dist)                 # select the element of Q with the min. distance
        Q ← Q − {u}                              # remove u from the queue
        S ← S ∪ {u}                              # add u to the set of visited vertices
        for all v ∈ neighbors[u] do
            if dist[v] > dist[u] + w(u, v) then  # if a new shortest path is found
                dist[v] ← dist[u] + w(u, v)      # set new value of shortest path
    return dist
(Figure: example weighted graph with source s and vertices u, v, x, y, z.)

Simulation of Dijkstra's Algorithm
(Figure: example graph on vertices s, a, b, c, d, with a table listing, for each round, the vertex added and the current distances.)

More reading …
• Reading assignment (Sherwani's book):
  • Min-Cut and Max-Cut Algorithms, pages 110-115
  • Steiner Tree Algorithms
  • Atomic Operations for Layout Editors, pages 117-118
  • Corner Stitching, pages 123-129
• Become familiar with these algorithms/concepts; they will come up for discussion later in this course.
• Questions and discussion are encouraged in the next class.

Mid Term 1

Circuit Partitioning

Introduction
❖Introduction to partitioning
❖Problem definition
❖Cost function and constraints
❖Approaches to partitioning
  ▪ Kernighan-Lin heuristic
  ▪ Fiduccia-Mattheyses heuristic
  ▪ Simulated annealing
  ▪ Genetic algorithm

Partitioning Algorithms
❖Iterative partitioning algorithms
❖Spectral-based partitioning algorithms
❖Net partitioning vs. module partitioning
❖Multi-way partitioning
❖Multi-level partitioning
❖Further study in partitioning techniques (timing-driven …)

Problem Definition
❖Partitioning is the process of decomposing a system into a set of smaller sub-systems.
❖The system must be decomposed so that the sub-systems maintain the original functionality.
❖An interface specification, used to connect all the sub-systems, is generated during the decomposition.
  ▪ The decomposition should attempt to minimize the interface interconnections between any two sub-systems.
❖The decomposition process should be efficient, so that the time it requires remains only a small fraction of the total design time.

Motivation
• The circuit is too large to be designed as a single entity.
• Capacity limitations of simulation tools.
• The design is too large to be placed on a single chip.
• I/O pin limitations from packaging.

Partitioning Example
(Figure: a circuit with eight components, numbered 1 to 8, shown in panels (a), (b) and (c), with cuts C1 and C2.)

(Figure: Block A with cut size 5 and area 15; Block B with cut size 7 and area 10.)

• Problem Formulation
  • A graph G = (V, E) representing a partitioning problem can be constructed as follows:
  • Let V = {v1, v2, …, vn} be a set of vertices and E = {e1, e2, …, em} a set of edges, where each vertex represents a component (transistor, standard cell, gate, macro cell) of the design.
  • There is an edge joining vertices whenever the corresponding components are to be connected.
  • Each edge is a subset of the vertex set, i.e., ei ⊆ V for i = 1, 2, …, m, and represents a net in the design (so G is, in general, a hypergraph).
  • The area of each component is denoted a(vi), 1 ≤ i ≤ n.
  • The partitioning problem is to partition V into V1, V2, …, Vk such that $V_i \cap V_j = \emptyset$ for $i \neq j$ and $\bigcup_{i=1}^{k} V_i = V$.
  • A partition is also referred to as a cut, and the cost of a partition is called the cut size.
  • The cut size can be the number of edges crossing the cut.

Iterative Partitioning Algorithms
❖Deterministic and greedy iterative improvement algorithms
  ▪ Kernighan-Lin (1970)
  ▪ Fiduccia-Mattheyses (1982)
❖Non-deterministic and non-greedy iterative algorithms
  ▪ Simulated annealing
  ▪ Genetic algorithm

• Solutions for the Partitioning Problem
  • Even the simplest 2-way partitioning problem, with identical node sizes and unit edge weights, is NP-complete. It is a special case of k-way partitioning.
  • If there were no need to balance the two partitions, the max-flow min-cut algorithm could be used to get the minimum-size cut.
  • The total number of ways to separate a design with 2n elements into two equal-sized partitions is
    $P(2n) = \frac{1}{2} \cdot \frac{(2n)!}{n!\,(2n-n)!} = \frac{(2n)!}{2\,(n!)^2}$

    n     Number of possible 2-way partitions
    1     1
    2     3
    3     10
    4     35
    5     126
    6     462
    7     1716
    8     6435
    9     24310
    10    92378
    11    352716
    12    1352078
    13    5200300
    14    20058300
    15    77558760
    16    300540195
    17    1166803110
    18    4537567650
    19    17672631900
    20    68923264410

  (Figure: plots of the number of possible 2-way partitions against n, over two different ranges of n.)

K-L Algorithm
• Kernighan-Lin Algorithm:
  • An iterative improvement algorithm.
  • One of the most popular algorithms for solving the two-way partitioning problem (TWPP).
  • Can be extended to the more general k-way case.
  • The problem is characterized by a connectivity matrix C. Element c_ij represents the sum of the weights of the edges connecting elements i and j.
  • In the TWPP, since the edges have unit weights, c_ij simply counts the number of edges connecting i and j.
  • The output of the partitioning algorithm is a pair of sets A and B such that |A| = n = |B|, A ∩ B = ∅, and the size of the cutset T is minimized.
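The cutset cost T and the partition count P(2n) above can be made concrete with a tiny exhaustive search. The Python sketch below is illustrative only and assumes unit edge weights stored in a symmetric connectivity matrix C, as in the TWPP, and an even number of nodes; the helper names (cut_size, num_balanced_bipartitions, brute_force_bisection) and the six-node example graph are hypothetical. It enumerates every balanced split, which is exactly what the rapid growth of P(2n) in the table rules out for real designs, and why heuristics such as K-L are used instead.

```python
from itertools import combinations
from math import comb


def cut_size(C, A, B):
    """Cutset weight T: sum of C[a][b] over every a in A and b in B."""
    return sum(C[a][b] for a in A for b in B)


def num_balanced_bipartitions(n):
    """P(2n) = (1/2) * (2n)! / (n! * n!) ways to split 2n nodes into two halves."""
    return comb(2 * n, n) // 2


def brute_force_bisection(C):
    """Return (T, A, B) for a minimum-cut balanced split of the nodes of C.

    Assumes an even number of nodes. Node 0 is always placed in A, so each
    unordered split {A, B} is examined exactly once (P(2n) candidates in
    total). Only feasible for tiny graphs.
    """
    nodes = list(range(len(C)))
    half = len(nodes) // 2
    best = None
    for rest in combinations(nodes[1:], half - 1):
        A = [0, *rest]
        B = [v for v in nodes if v not in A]
        T = cut_size(C, A, B)
        if best is None or T < best[0]:
            best = (T, A, B)
    return best


# Hypothetical example: triangles {0, 1, 2} and {3, 4, 5} joined by edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
C = [[0] * 6 for _ in range(6)]
for i, j in edges:
    C[i][j] = C[j][i] = 1  # unit weights: c_ij counts the edges between i and j

print(num_balanced_bipartitions(3))       # 10
print(num_balanced_bipartitions(20))      # 68923264410
print(brute_force_bisection(C))           # (1, [0, 1, 2], [3, 4, 5])
print(cut_size(C, [0, 1, 3], [2, 4, 5]))  # 5
```

Swapping nodes 2 and 3 between the two blocks of the optimal split raises the cut from 1 to 5. Evaluating such pair swaps directly, instead of enumerating every split, is the idea behind the K-L heuristic.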
• The output of the partitioning algorithm is a pair of sets A and B such that |A| = n = |B| and A ∩ B = ∅, and such that the size of the cutset T is minimized, where
$T = \sum_{a \in A,\ b \in B} c_{ab}$

K-L Algorithm
• The Kernighan-Lin heuristic is an iterative improvement algorithm. It starts from an initial partition (A, B) such that |A| = n = |B| and A ∩ B = ∅.
• How can a given partition be improved?
• Let P* = {A*, B*} be the optimum partition and P = {A, B} be the current partition.
• Then, to attain P* from P, one has to swap a subset X ⊆ A with a subset Y ⊆ B such that
✓ |X| = |Y|
✓ X = A ∩ B*
✓ Y = A* ∩ B

K-L Algorithm
▪ A* = (A − X) ∪ Y and B* = (B − Y) ∪ X.
▪ The problem of identifying X and Y is as hard as that of finding P* = {A*, B*}.
[Figure: initial partition (A, B) and optimal partition (A*, B*), showing the subsets X and Y that must be swapped]

K-L Algorithm - Definitions
▪ Definition 1: Consider any node a in block A. The contribution of node a to the cutset is called the external cost of a and is denoted $E_a$:
$E_a = \sum_{v \in B} c_{av}$
▪ Definition 2: The internal cost $I_a$ of node a in block A is defined as
$I_a = \sum_{v \in A} c_{av}$
Moving node a from block A to block B would increase the cutset by $I_a$ and decrease it by $E_a$. Therefore, the net reduction in the cutset would be $D_a = E_a - I_a$

Example
▪ In the figure, consider node a in A and node b in B
▪ $I_a = 2$, $I_b = 3$, $E_a = 3$, $E_b = 1$, so $D_a = E_a - I_a = 1$ and $D_b = E_b - I_b = -2$
▪ Lemma: the gain obtained by swapping a and b is $g_{ab} = D_a + D_b - 2c_{ab}$
[Figure: nodes a and b on either side of the cut]

K-L Contd
▪ Solve Example 2-2 from the textbook by Sait and Youssef, page 53

Fiduccia Mattheyses (FM) Algorithm
❖C. M. Fiduccia and R. M. Mattheyses were researchers at the General Electric Research and Development Center in Schenectady, NY
❖Their paper, titled “A Linear-Time Heuristic for Improving Network Partitions”, was presented at the 1982 Design Automation Conference

Features of FM Algorithm
▪ Modification of the KL algorithm
▪ Iterative graph partitioning heuristic
▪ Operates in multiple passes
▪ Moves one vertex at a time to improve the cut
▪ Innovative use of data structures makes this heuristic very efficient

Features of FM Algorithm
▪ Similarities to KL
▪ Works in passes, iteratively improving the partition
▪ Locks nodes after a move
▪ Differences from KL
▪ Does not exchange pairs of nodes; moves only one node at a time
▪ Uses a gain bucket data structure

• Definition: The gain of a vertex is the reduction in the cut size when the vertex is moved to the other partition
[Figure: example vertices with gains +2, +1, 0, and −1]

Details of FM Algorithm
▪ The data structure used for choosing the next vertex to be moved is shown below
[Figure: BUCKET array indexed from +pmax down to −pmax, with a MAX GAIN pointer and a list of vertices at each gain; VERTEX array with entries 1, 2, 3, …, n]

Details of FM Algorithm
▪ Each component is represented as a vertex that can be moved
▪ The vertex gain is an integer, and each vertex has a gain in the range −pmax to +pmax, where pmax is the maximum vertex degree in the graph
▪ Since vertex gains take a restricted range of values, bucket sorting can be used to maintain a sorted list of vertex gains in an array BUCKET[−pmax … +pmax]
▪ The kth entry contains a doubly linked list of free vertices whose gains are currently equal to k
▪ Two such arrays are needed, one for each partition
▪ Each array is maintained by moving a vertex to the appropriate bucket whenever its gain changes due to the movement of one of its neighbors
▪ Direct access to each vertex from a separate field in the VERTEX array allows a vertex to be removed from its current list and moved to the head of its new bucket list in constant time

Details of FM Algorithm
▪ As only free vertices are allowed to move, only their gains are updated
▪ Whenever a base vertex is moved, it is locked, removed from its bucket list, and placed on a FREE VERTEX LIST, which is later used to reinitialize the BUCKET array for the next pass
▪ The FREE VERTEX LIST saves a lot of work when a large number of vertices have permanent block assignments and are no longer allowed to move
▪ Each BUCKET array has a MAXGAIN index, which keeps track of the bucket containing a vertex of highest gain
▪ This index is decremented whenever its bucket is found to be empty, and reset to a higher bucket whenever a vertex moves to a bucket above MAXGAIN

Pseudo Code for F-M Algorithm
[Figure: pseudocode for the F-M algorithm]

Simulated Annealing
▪ One of the most widely used iterative techniques for solving combinatorial optimization problems
▪ It is an adaptive heuristic and belongs to the class of nondeterministic algorithms
▪ First introduced in 1983 by Kirkpatrick, Gelatt, and Vecchi.
▪ Inspired by the metallurgical process of carefully cooling molten metals to obtain a good crystal structure
▪ In annealing, the metal is heated to a very high temperature and then slowly cooled at a proper rate to obtain the desired crystal structure

Simulated Annealing
▪ Every combinatorial optimization problem is a search problem through the state space of the combinatorial elements involved
▪ An iterative improvement scheme starts with some given state in the search space and examines a local neighborhood of states for a better solution
▪ A local neighborhood of a state S is the set of all states that can be reached from S by making a small change to S

Simulated Annealing
▪ When all the local neighbors have inferior costs, the algorithm is said to have converged to a local (or global) optimum in the search space
▪ Simulated Annealing is a non-greedy algorithm and can climb out of local optima
▪ An algorithm has no way of finding the global optimum unless it can climb the hill out of a local optimum
▪ The core of the algorithm is known as the Metropolis procedure, which simulates the annealing process at a given temperature T
▪ Metropolis (named after the scientist Nicholas Metropolis) receives as input the current solution S and a value M, which is the amount of time for which annealing must be applied at temperature T

Simulated Annealing
▪ The amount of time spent annealing at a given temperature is gradually increased as the temperature itself is lowered
▪ This is done using the parameter β > 1
▪ The variable Time keeps track of the time spent in each call to Metropolis
▪ The annealing procedure stops when Time exceeds the allowed time
▪ The pseudocode for the algorithm is given on the next slide

Simulated Annealing
[Figure: pseudocode for Simulated Annealing and the Metropolis procedure]
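The SA loop and the Metropolis procedure described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the pseudocode shown on the slide: the cooling factor `alpha`, all default parameter values, and the six-node example graph are hypothetical; `M`, `beta`, and the `elapsed` counter play the roles of M, β, and Time described above. The example uses a K-L style pairwise exchange as its neighbor function so the two blocks stay balanced.

```python
import math
import random


def metropolis(cur, cur_cost, best, best_cost, T, M, cost, neighbor, rng):
    """Apply M trial moves at the fixed temperature T."""
    for _ in range(M):
        new = neighbor(cur, rng)
        new_cost = cost(new)
        delta = new_cost - cur_cost
        # Downhill moves are always accepted; uphill moves with probability e^(-delta/T)
        if delta < 0 or rng.random() < math.exp(-delta / T):
            cur, cur_cost = new, new_cost
            if cur_cost < best_cost:
                best, best_cost = cur, cur_cost
    return cur, cur_cost, best, best_cost


def simulated_annealing(s0, cost, neighbor, T0=10.0, alpha=0.9, beta=1.1,
                        M=20, max_time=2000, seed=0):
    """Repeated Metropolis calls with a falling temperature and a growing M."""
    rng = random.Random(seed)
    T, elapsed = T0, 0
    cur, cur_cost = s0, cost(s0)
    best, best_cost = cur, cur_cost
    while elapsed < max_time:
        cur, cur_cost, best, best_cost = metropolis(
            cur, cur_cost, best, best_cost, T, M, cost, neighbor, rng)
        elapsed += M          # plays the role of the variable Time
        T *= alpha            # cool down (alpha is an assumed cooling factor)
        M = int(beta * M)     # spend more time at lower temperatures (beta > 1)
    return best, best_cost


# Hypothetical example: triangles {0,1,2} and {3,4,5} joined by edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]


def cut_size(side):
    """Number of edges whose endpoints lie in different blocks."""
    return sum(1 for u, v in edges if side[u] != side[v])


def swap_neighbor(side, rng):
    """Pairwise exchange: swap one vertex from each block, keeping sizes equal."""
    a = rng.choice([i for i, s in enumerate(side) if s == 0])
    b = rng.choice([i for i, s in enumerate(side) if s == 1])
    new = list(side)
    new[a], new[b] = 1, 0
    return tuple(new)


best, best_cost = simulated_annealing((0, 1, 0, 1, 0, 1), cut_size, swap_neighbor)
print(best, best_cost)
```

The sketch returns the best solution seen rather than the final one, because SA deliberately accepts some uphill moves and may end at a worse state than one it visited earlier.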
Partitioning Using Simulated Annealing
▪ Once again we consider a two-way partitioning problem
▪ The requirement is to generate an almost balanced partition with a minimum cutset
▪ To use SA to solve this problem, we first need a cost function that represents both the balance criterion and the cutset. Define Imbalance(S) as the size difference between the two blocks of solution S, and Cutset(S) as the total weight of the edges crossing the cut

Partitioning Using Simulated Annealing
▪ We can define the cost function as:
$\text{Cost}(S) = W_s \cdot \text{Imbalance}(S) + W_c \cdot \text{Cutset}(S)$
▪ Ws and Wc are constants in the range [0, 1]
▪ These constants indicate the importance given to the imbalance between the partitions and to the cutset of the edges between the two partitions

Neighbor Function for Partition Problem with SA
▪ The simplest neighbor function that can be used is the pairwise exchange mechanism of the K-L algorithm
▪ Other neighbor functions that can be used:
▪ Select those components whose contribution to the external cost is high
▪ Select those elements that have the minimum internal connections
▪ SA imposes no restrictions on the cost function or the neighbor selection function

Partition Problem Using Genetic Algorithms
▪ Developed by John Holland of the University of Michigan, inspired by natural evolution
▪ Probabilistic transition rules are used
▪ Like SA, this is a non-deterministic algorithm
▪ Allows hill climbing to get out of locally optimal points in the search space
▪ The representation of a solution point is critical for accurate modeling
▪ Searches for the optimum from a population of points
▪ GA has no memory other than the characteristics carried over from one generation to the next

Steps involved in Genetic Algorithms
▪ The reproductive process of GA is described by the following steps:
▪ Natural Selection: Similar to natural evolution, this captures the idea of survival of the fittest.
The algorithm chooses two dissimilar parents with high fitness to survive from one generation to the next
▪ Crossover: The biological equivalent of mating and producing offspring. Solutions with higher fitness are more likely to be chosen for crossover. In crossover, properties of both parent solutions are combined in the offspring
▪ Mutation: Similar to mutation in evolution, this is used to maintain diversity in the population of solutions. As in evolution, mutation occurs in GA with a very low probability

Steps involved in Genetic Algorithms
▪ Fitness or Cost Function: Used to evaluate the fitness of any solution (member of the population). The results are used to compare two individual solutions. For GA, the cost function can be exactly the same as in SA.
▪ Population Update: This process replaces the old population with new individuals to obtain a higher-quality population and maintain diversity
▪ Maintaining some good solutions from past generations is important. To avoid losing the best parent solutions, some of the highly fit solutions are carried over unchanged to the next generation.

Genetic Algorithms for Partitioning (TWPP)
▪ An initial population of p individuals can be generated randomly.
▪ A string representation works well for a TWPP
▪ The length of the string is the same as the number of components in the design
▪ Each position in the string is either 0 or 1 and indicates whether the component associated with that index is assigned to partition 0 or partition 1
▪ For example, in the string 0 1 1 1 1 0 0 0, partition 0 consists of components n0, n5, n6, n7, while partition 1 consists of n1, n2, n3, and n4

Genetic Algorithms for Partitioning (TWPP)
▪ For GA, the cost function has to give proper weight to both the imbalance factor and the cutset factor.
▪ GA can still produce a string of all 0's or all 1's, but the cost function ensures that such a string does not make it to the next generation.
▪ Within every generation, and from one generation to the next, it is important to keep track of the best solution found so far
▪ The population size (p) per generation should be fixed (30 to 50). The generation count (G) is relatively large, around 1000
▪ The mutation probability should be kept very small (about 0.005). It can be computed as $P_m = (1 - 0.9) \cdot \frac{p}{G}$; for example, p = 50 and G = 1000 give $P_m = 0.005$
▪ The stopping criterion can be the generation count (G)

Genetic Algorithms for Partitioning - Flow Chart
[Figure: flow chart of the GA for partitioning]
Source: “Multi-objective module partitioning design for dynamic and partial reconfigurable system-on-chip using genetic algorithm” by Nithiyanantham Janakiraman and Palanisamy Nirmal Kumar. Journal of Systems Architecture.
Elsevier Publication

Floorplanning
▪ At the floorplanning stage, the VLSI circuit is seen as a set of rectangular blocks interconnected by signal nets
▪ These rectangular blocks are placed on a two-dimensional surface such that no two blocks overlap, while optimizing certain objectives
▪ During floorplanning the overall area estimate is obtained, the pin and pad lo

• Problem Formulation for Floorplanning
➢ Input: Blocks B1, B2, …, Bn of a circuit with areas A1, A2, …, An, respectively. Associated with each block Bi are aspect-ratio bounds ri (lower bound) and si (upper bound).
➢ Output: The location of each block Bi along with its width and height. In addition to finding the location and shape, the floorplanning algorithm has to generate a valid placement that optimizes one of the following objectives:
✓ Minimize area
✓ Minimize wirelength
✓ Maximize routability
✓ Minimize delays, or
✓ A combination of two or more of the above criteria

• Minimize area
• Find a feasible floorplan with the smallest overall area.
• This falls into the category of the generalized two-dimensional bin-packing problem
• Even this simplified version of the floorplanning problem has been shown to be NP-hard [B. S. Baker et al., “Orthogonal packings in two dimensions”, SIAM J. Comput., 9:846–855, 1980].
• Several polynomial-time approximation algorithms exist.
• Minimize wirelength
• Find a feasible floorplan with minimum overall interconnect length
• A coarse measure of wirelength is used during floorplanning
• All I/O pins of a block are merged and assumed to reside at the block's center
• The overall wirelength is calculated as $L = \sum_{i,j} C_{ij} \cdot D_{ij}$, where $C_{ij}$ is the connectivity between blocks i and j, and $D_{ij}$ is the Manhattan distance between the centers of rectangles i and j.
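As a small illustration of the coarse wirelength measure above, and of the area objective, the following sketch evaluates a floorplan. It assumes each block is given as a hypothetical tuple (x, y, width, height) with (x, y) its lower-left corner, that all pins sit at the block center, and that each connected pair appears once in the connectivity dictionary; the three-block floorplan and its connectivity values are made up for illustration.

```python
def center(block):
    """Center of a block given as (x, y, width, height), (x, y) = lower-left corner."""
    x, y, w, h = block
    return (x + w / 2, y + h / 2)


def floorplan_wirelength(blocks, conn):
    """L = sum of C_ij * D_ij, with D_ij the Manhattan distance between block centers."""
    total = 0.0
    for (i, j), c_ij in conn.items():      # each unordered pair listed once
        (xi, yi), (xj, yj) = center(blocks[i]), center(blocks[j])
        total += c_ij * (abs(xi - xj) + abs(yi - yj))
    return total


def bounding_area(blocks):
    """Area of the smallest rectangle enclosing all blocks."""
    x0 = min(x for x, _, _, _ in blocks.values())
    y0 = min(y for _, y, _, _ in blocks.values())
    x1 = max(x + w for x, _, w, _ in blocks.values())
    y1 = max(y + h for _, y, _, h in blocks.values())
    return (x1 - x0) * (y1 - y0)


# Hypothetical floorplan: two 1 x 1 blocks side by side, a 2 x 1 block on top
blocks = {1: (0, 0, 1, 1), 2: (1, 0, 1, 1), 3: (0, 1, 2, 1)}
conn = {(1, 2): 3, (1, 3): 1, (2, 3): 2}
print(floorplan_wirelength(blocks, conn), bounding_area(blocks))
```

Here the centers are (0.5, 0.5), (1.5, 0.5), and (1.0, 1.5), so L = 3·1 + 1·1.5 + 2·1.5 = 7.5, and the enclosing rectangle is 2 × 2.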
Floorplanning
• Assume we have five blocks with the dimensions given in the table; some feasible floorplans are shown below.
• All these floorplans have the same area. If area is the only cost function, all of them are equally good.

Module  Width  Height
1       1      1
2       1      1
3       2      1
4       1      2
5       1      3

[Figure: several feasible floorplans of the five modules]

Another example of Floorplan
• In this example, both area and wirelength can be used in the cost function
• Many feasible solutions exist, and finding the optimal solution is once again an NP-hard problem
[Figure: another floorplan example]

Terminology
• Rectangular Dissection:
• A subdivision of a given rectangle by a finite number of horizontal and vertical line segments into a finite number of non-overlapping rectangles.
• Slicing Structure:
• A rectangular dissection that can be obtained by recursively subdividing rectangles horizontally or vertically into smaller rectangles
• Slicing Tree:
• A slicing structure can be modeled by a binary tree with n leaves and n − 1 internal nodes, where each internal node represents a vertical or horizontal cutline and each leaf a basic rectangle. A slicing tree is also known as a slicing floorplan tree.
• A skewed slicing tree is one in which no node has a right child with the same cut type (H or V) as itself.

Slicing Tree Example
[Figure: slicing tree example]

Slicing and Non-slicing Floorplans
• A floorplan that corresponds to a slicing structure is called a slicing floorplan; otherwise, it is called a nonslicing floorplan
• [Figure: nonslicing floorplans] Such floorplans are known as wheels. A wheel is the smallest nonslicing floorplan

Floor Planning Algorithms
• Classification of Floorplanning Algorithms
➢ Constructive
▪ Attempt to build a feasible solution by starting from a seed module and then adding other modules to the partial floorplan.
Example: Cluster Growth
➢ Iterative
▪ Start with an initial floorplan, which is repeatedly perturbed to obtain another feasible floorplan until no further improvement can be obtained. Examples: Simulated Annealing, GA
➢ Knowledge Based
▪ As the name suggests, a knowledge-based expert system is built with the help of human experts who understand the system well enough to suggest where certain rectangles should be placed.

Cluster Growth Algorithm
• A greedy algorithm. The floorplan is constructed one module at a time until every module is assigned a location in the floorplan
• A seed module is selected and placed in the lower-left corner of the floorplan
• The remaining modules are selected one at a time and added to the partial floorplan, while trying to grow it evenly
• In the example below, module a is placed in the lower-left corner, and modules b and c are placed so that the increase in the floorplan dimensions is minimal
[Figure: cluster growth example with modules a, b, and c]

Module Selection for Cluster Growth Algorithm
• The ordering of a particular module m depends on the types of nets attached to m
• Three categories of nets:
• Terminating Nets: have no other incident blocks that are unplaced
• New Nets: have no pins on any module of the partially constructed floorplan
• Continuing Nets: have at least one pin on a module of the partial floorplan and at least one pin on an unplaced module
• The module that completes the greatest number of “unfinished” nets should be placed first.
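The net categories above can be turned into a simple greedy selection rule. The sketch below is one possible interpretation, not the textbook algorithm: it treats a net as completed by module m when every other module on that net is already placed, breaks ties by the number of continuing nets, and breaks any remaining ties alphabetically. The function names and the five-module netlist are hypothetical.

```python
def classify_net(net, module, placed):
    """Classify a net (set of modules) as seen by the candidate `module`."""
    others = net - {module}
    if not (others & placed):
        return "new"            # no pin yet on the partial floorplan
    if others <= placed:
        return "terminating"    # placing `module` completes this net
    return "continuing"         # pins on both placed and unplaced modules


def next_module(nets, placed, unplaced):
    """Pick the module completing the most nets; tie-break on continuing nets."""
    def score(m):
        kinds = [classify_net(n, m, placed) for n in nets if m in n]
        return (kinds.count("terminating"), kinds.count("continuing"))
    # max() keeps the first maximal element, so sorting makes ties deterministic
    return max(sorted(unplaced), key=score)


def cluster_growth_order(nets, modules, seed):
    """Greedy order in which modules would be added to the growing cluster."""
    placed, order = {seed}, [seed]
    unplaced = set(modules) - placed
    while unplaced:
        m = next_module(nets, placed, unplaced)
        placed.add(m)
        unplaced.remove(m)
        order.append(m)
    return order


# Hypothetical netlist: each net is the set of modules it connects
nets = [{"a", "d"}, {"a", "c"}, {"b", "c", "d"}, {"b", "e"}]
order = cluster_growth_order(nets, "abcde", seed="a")
print(order)
```

Only the order is computed here; choosing the actual location of each module (keeping the floorplan growing evenly) is a separate step.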
(Read Example 3.2 in VLSI Physical Design Automation: Theory and Practice by Sait and Youssef)

Cluster Growth Algorithm
• A seed module is selected and placed in the lower-left corner of the floorplan
• The remaining modules are selected one at a time in an ordered way
• To determine the order, the modules are organized into a linear list that minimizes the number of nets cut by any vertical line drawn between two consecutive modules in the linear order
• The linear ordering algorithm starts with a seed module and enters a repeat loop
• At each iteration, a gain function is computed for each module in the set of remaining unordered modules.
• The module with the maximum gain is selected for inclusion
• In case of a tie, the module that terminates the maximum number of started nets is selected
• If this is also a tie, the module connected to the largest number of continuing nets is selected

Cluster Growth Algorithm
[Figure: cluster growth algorithm pseudocode]

Simulated Annealing
• First, an initial solution for the floorplan is selected; this may be done randomly or by using one of the deterministic algorithms
• A controlled walk through the search space is performed until no sizeable improvement can be made or we run out of time. How does one ensure that there are no overlaps in intermediate solutions?
• Two approaches are possible:
• Direct approach: SA is applied directly to the physical floorplan
• Indirect approach: SA is applied to an abstract representation of the actual floorplan

Simulated Annealing – Indirect Approach
• Restricted to slicing floorplans
• Slicing floorplans have the disadvantage of extra dead space but the advantage of computational ease

Simulated Annealing – Indirect Approach
• The hierarchical structure of a slicing structure can be represented by a binary tree with n leaves representing the n basic modules, and n − 1 internal nodes representing the dissection (slicing) operations
• A postorder traversal of a slicing tree produces a Polish expression with operators H and V and operands 1, 2, …, n
• In a postorder traversal of a binary tree, the tree is traversed by visiting, at each node, the left subtree, then the right subtree, and then the node itself
• Since there is only one way of performing a postorder traversal of a binary tree, there is a one-to-one correspondence between a slicing tree and its Polish expression

Polish expression
• The diagram below shows an example of a rectangular dissection and its corresponding slicing tree
[Figure: rectangular dissection and its slicing tree]
• The operators H and V have the following meanings:
• ijH means rectangle j on top of rectangle i
• ijV means rectangle i to the left of rectangle j

Algorithm for post order traversal of binary tree
[Figure: postorder traversal algorithm]
• A Polish expression E = e1e2…e2n−1 (n operands and n − 1 operators) is called normalized iff E has no two consecutive H's and no two consecutive V's

Simulated Annealing – Indirect Approach
• Floorplan solutions are represented by normalized Polish expressions
• Three types of perturbations are possible in a Polish expression:
M1: Swap two adjacent operands
M2: Complement some chain of operators of nonzero length (V → H and H → V)
M3: Swap an adjacent operand and operator
• Two normalized Polish expressions are called neighbors if one can be obtained from the other using one of the above three moves
• The algorithm has to ensure that neighbors of normalized expressions are also normalized. M1 and M2 always produce a normalized expression. If M3 produces a non-normalized expression, or an invalid one (every prefix of a valid Polish expression must contain fewer operators than operands), that move is rejected.

Simulated Annealing – Indirect Approach
• Checking that the new expression E does not contain two identical consecutive operators is straightforward and can be done in O(1) time.
• To measure how good a floorplan solution is, we use a cost function
• The cost function usually tries to minimize both the overall area of the floorplan and the overall interconnect length:
$\text{Cost}(F) = \alpha A + \lambda W$
where A is the area of the smallest rectangle enveloping the n modules, and W is a measure of the overall wirelength. What are α and λ?

Simulated Annealing – Example
[Figure: SA floorplanning example]

Simulated Annealing – Actual Algorithm
• When using the SA technique, several important decisions must be made:
• The choice of initial solution
• The choice of cooling schedule
• Initial temperature
• How long to stay at a temperature before reducing it
• Temperature reduction rate
• Perturbation function
• Termination condition for the algorithm

Simulated Annealing – Actual Algorithm
• Choice of initial solution: can be created randomly, as long as it is a feasible solution.
• Choice of cooling schedule
• Initial temperature: chosen so that the probability of accepting uphill moves is high.
• How long before reducing the temperature: at each temperature, trials are attempted until either N uphill moves have been made or the total number of moves exceeds 2N, where N is an increasing function of n, the number of modules
• Temperature reduction rate: a generally accepted value for the reduction factor λ (T ← λT) is 0.85
• Perturbation function
• Termination condition for the algorithm: when the fraction of accepted moves is too small (≤ 5%) or when the temperature falls below 0.1·T0

Simulated Annealing – Pseudocode
[Figure: SA floorplanning pseudocode]

Other Floorplanning Techniques
• Read pages 122–160 of “VLSI Physical Design Automation: Theory and Practice” by Sait and Youssef

Placement
▪ Circuit placement is the process of determining the location of each gate (or library cell) in the netlist
▪ The objectives usually include area, wirelength, timing, congestion, thermal hotspots, power consumption, power supply noise, and routability
▪ Placement is usually done in two phases:
▪ Global placement, where the rough location of each gate is determined. Overlap of cells is allowed at this stage.
▪ Detailed placement, where the cell locations are finalized and overlaps are removed

Complexity of Placement
▪ Placing cells so as to minimize the total wirelength is an NP-complete problem
▪ Even the simplest case of the problem, one-dimensional placement, is hard to solve.
▪ There are as many as $n!/2$ linear arrangements of n cells
▪ In practice, the number of cells to be placed can be several thousand or more
▪ Enumerating all possibilities and selecting the best is impractical
▪ A number of good heuristic techniques have been developed
▪ They provide good solutions, though not necessarily the best solution

Placement – Problem Definition
▪ Covered on the board; see page 164 of Sait and Youssef

Placement – Cost Function
▪ A placement is acceptable if 100% routing can be achieved within a given area
▪ However, routing is performed after placement is completed, so performing actual routing to compare placements is impractical
▪ Instead, estimates are used: the estimated wirelength serves as the measure of the “goodness” of a placement
▪ A commonly used objective function is to minimize L(P), the total wirelength over all signal nets for placement P.

Placement – Estimation of Wirelength
▪ The speed of estimation has a drastic effect on the performance of the placement algorithm.
▪ The estimation error must be uniform across all nets
➢ Assumes Manhattan routing
➢ For a two-pin net connecting module i to module j, the Manhattan length is $r_{ij} + c_{ij}$, where $r_{ij}$ and $c_{ij}$ are the numbers of rows and columns separating the locations of the two modules
➢ Since not all nets are two-pin nets, there are methods for estimating the length of nets connecting multiple pins

Placement – Methods for wirelength estimation
▪ Semi-perimeter Method
➢ The most widely used method. Find the smallest bounding box that encloses all the pins of the net to be connected. The estimated wirelength is then half the perimeter of that bounding box.
➢ For the most efficient wiring scheme, the Steiner tree, this estimate is exact for two- and three-pin nets and a lower bound for nets with more pins.
▪ Complete Graph
➢ For an n-pin net, the complete graph has n(n − 1)/2 edges
➢ A tree has n − 1 edges, which is 2/n times the number of edges in the complete graph. The estimated wirelength using this method is therefore 2/n times the total edge length of the complete graph:
$L = \frac{2}{n} \sum_{(i,j)} d_{ij}$

Placement – Methods for wirelength estimation
▪ Minimum Chain
➢ The nodes are assumed to lie on a chain, and each pin has at most two neighbors.
➢ Start from one vertex and connect it to the closest one, then to the next closest, and so on until all the vertices are connected.
➢ This technique is simpler than MST but results in longer wirelength
▪ Source to Sink Connection
➢ The output of a cell is assumed to connect to all other points of the net (the inputs of other cells) by separate wires
➢ Simplest to implement, but results in excessive wirelength

Placement – Methods for wirelength estimation
▪ Steiner Tree Approximation
➢ A Steiner tree is the shortest route for connecting a set of pins
➢ Wires can branch from any point along their length to connect other pins
➢ Finding a minimum Steiner tree is itself NP-complete
▪ Minimum Spanning Tree
➢ Unlike a Steiner tree, branching is allowed only at pin locations
➢ For an n-pin net, the tree can be constructed by computing the distances between all pairs of pins and connecting the smallest (n − 1) edges that do not form a cycle.
➢ Kruskal's algorithm finds the MST in polynomial time

Comparison of wirelength estimation algorithms
[Figure: comparison of wirelength estimation methods]

Minimizing total wire length
▪ The main objective of placement is a solution that is completely routable, with the area taken by the routing wires minimized
▪ One way to accomplish this is to place strongly connected cells close to one another
▪ A commonly used objective function is
$L(P) = \sum_{n \in N} w_n \cdot d_n$
where $d_n$ = estimated length of net n, $w_n$ = weight associated with net n, and N = set of nets
▪ In this estimate each net length is calculated independently, so the resulting area is only a rough estimate

Other cost functions
▪ Minimize maximum cut: read Sait and Youssef section 4.3.3
▪ Minimize maximum density: read Sait and Youssef section 4.3.4
▪ Maximize performance: read Sait and Youssef section 4.3.5

Placement Solution Approaches
▪ Since placement is in the class of NP-hard problems, heuristics are used to get “good” solutions, even if not the best.
▪ Even for linear placement, the number of orderings (n!) grows faster than exponentially with n. Example:
▪ With n = 3 blocks, there are 6 possible placements
▪ With n = 4 blocks, the number of possible placements increases to 24
▪ Heuristic algorithms for placement are classified into
▪ Constructive algorithms
▪ Iterative algorithms

Constructive Placement Solution Approach
▪ The layout surface is imagined to be divided into n slots, with one cell placed in each slot
▪ Place one cell at random into one of the slots.
After this first step, the algorithm needs to make two decisions:
▪ which cell to place next
▪ where to place the selected cell
▪ A possible heuristic for selecting the next cell is to choose the cell most strongly connected to the already placed cells
▪ Suppose the partial placement already has cells m1, m2, …, mi in the layout
▪ Find all cells connected to any of the already placed cells
▪ For each such cell, compute the connectivity
▪ Select the one that is most strongly connected to any of the placed cells, and place it close to that cell.
▪ This heuristic is known as the maxcon (maximum connectivity) strategy
▪ By its very nature, this is a greedy algorithm

Constructive Placement Algorithm
[Figure: constructive placement algorithm]

Partition based approaches
▪ Min-cut Placement: similar to the partitioning problem. Read Sait and Youssef pp. 179–189

Mincut Placement
▪ The given circuit is repeatedly partitioned into two sub-circuits
▪ Correspondingly, the layout region is divided either horizontally or vertically to accommodate the sub-circuits
▪ This process is repeated until each partition is occupied by a single cell
▪ The process should not create any overlaps, but if it does, a post-processing step is used to “legalize” the placement
▪ Minimizing the cutset at each stage tends to minimize the overall wirelength
▪ The original paper (by Breuer, 1977) presented several cut procedures

Example Mincut Placement
▪ Assume all gates are initially located at the center of the placement region
▪ When a bi-partitioning cutline is inserted to divide the block, the gates are located at the centers of the two sub-blocks
[Figure: mincut placement example]

Practice Problem (“Practical Problems in VLSI Physical Design Automation” by Sung Kyu Lim, pp. 103–111)
Consider the gate-level netlist shown in the table. The figure shows the undirected graph model of the netlist, where the thick and thin edges have weights of 1 and 0.5, respectively. The primary inputs and outputs do not need to be placed.
[Table and figure: gate-level netlist and its graph model]

Practice Problem Contd.
[Figures: solution steps]

Gordian Algorithm for Placement
▪ This algorithm is based on quadratic programming (QP)
▪ The placement problem is formulated as a sequence of QP problems derived from the connectivity information of the circuit
▪ A set of constraints restricts the freedom of movement of the gates at every iteration, which reduces overlap
▪ A top-down partitioning is performed so that the cells grouped into the same partition satisfy a “center of gravity” constraint: the area-weighted center of the cells in a partition must coincide with the center of its block.
▪ This alternating sequence of QP and top-down partitioning repeats until the partitions are small enough
▪ The goal is to minimize the squared wirelength among the cells, which is why a quadratic objective function is required
▪ Read pp. 112–121 of “Practical Problems in VLSI Physical Design Automation” by Sung Kyu Lim

Iterative Methods for Placement
▪ Simulated Annealing is one of the most popular algorithms for placement
▪ The perturbation function can be modified to suit the placement problem, and the “accept” criterion for a new placement has to be redefined
▪ Some possible perturbations:
▪ Select two neighboring cells and swap their positions
▪ Move a randomly selected cell from its current position to a randomly selected new position
▪ If layout rules allow, rotate or mirror a cell in its current location.
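The perturbations above, combined with the semi-perimeter wirelength estimate and the acceptance rule (accept when Δh < 0, otherwise accept with probability e^(−Δh/T)), give a compact annealing placer. This is a minimal sketch under assumptions: cells occupy discrete grid slots, the swap move picks any two cells rather than only neighbors, rotation and mirroring are omitted, and the cooling parameters, function names, and four-cell ring netlist are hypothetical.

```python
import math
import random


def hpwl(net, pos):
    """Semi-perimeter (half-perimeter) of the bounding box of a net's cells."""
    xs = [pos[c][0] for c in net]
    ys = [pos[c][1] for c in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(nets, pos):
    return sum(hpwl(net, pos) for net in nets)


def perturb(pos, slots, rng):
    """New placement: displace one cell to a free slot, or swap two cells."""
    new = dict(pos)
    cells = sorted(new)
    free = sorted(set(slots) - set(new.values()))
    if free and rng.random() < 0.5:
        new[rng.choice(cells)] = rng.choice(free)   # displacement move
    else:
        a, b = rng.sample(cells, 2)
        new[a], new[b] = new[b], new[a]             # pairwise swap
    return new


def anneal_placement(nets, pos, slots, T=5.0, alpha=0.95, moves_per_T=50,
                     T_min=0.01, seed=1):
    rng = random.Random(seed)
    cost = total_wirelength(nets, pos)
    best, best_cost = pos, cost
    while T > T_min:
        for _ in range(moves_per_T):
            cand = perturb(pos, slots, rng)
            dh = total_wirelength(nets, cand) - cost
            # Accept if dh < 0, or if random < e^(-dh/T)
            if dh < 0 or rng.random() < math.exp(-dh / T):
                pos, cost = cand, cost + dh
                if cost < best_cost:
                    best, best_cost = pos, cost
        T *= alpha
    return best, best_cost


# Hypothetical example: 4 cells connected in a ring, on a 3 x 2 grid of slots
slots = [(x, y) for x in range(3) for y in range(2)]
nets = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
start = {"a": (0, 0), "b": (2, 1), "c": (0, 1), "d": (2, 0)}
best, best_cost = anneal_placement(nets, start, slots)
print(total_wirelength(nets, start), best_cost)
```

Because every two-pin net spans at least one grid step, a total of 4 is the minimum for this ring; the starting placement has a total of 10.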
Iterative Methods for Placement (contd.)
▪ One possible measure of cost is the wirelength. Let $\Delta h = C(\text{New } S) - C(S)$, where C is the cost function and S is the current placement
▪ The move (e.g., a swap) is accepted if $\Delta h < 0$ (i.e., $C(\text{New } S) < C(S)$) or if the acceptance function is true:
  $\text{Acceptance Function} = \left(\text{random} < e^{-\Delta h / T}\right)$
  where random is a random number between 0 and 1, and T is the current value of the temperature

Timberwolf Algorithm for Placement
▪ TimberWolf (TW) is one of the most widely used placement tools
▪ Internally, TW uses simulated annealing
▪ Using the information provided by the user and the total width of the standard cells to be placed, TW computes the target lengths of the rows
▪ It then computes the initial positions of the cells to be placed, followed by the initial placement of the macro blocks, followed by the placement of the pads
▪ Pads and macro blocks retain their initial placement; only the standard cells change position during optimization

Timberwolf Algorithm for Placement (contd.)
▪ Following initial placement, the algorithm has three distinct stages:
  ▪ First stage: cells are placed so as to minimize the estimated wirelength
  ▪ Second stage: feedthrough cells are inserted into the layout as required, the wirelength is minimized again, and preliminary global routing is done
  ▪ Third stage: local changes are made to the placement to reduce the number of wiring tracks required
▪ Objective function: minimize the estimated interconnect length

Timberwolf Algorithm Perturb Function
▪ Since the underlying algorithm is SA, there needs to be a way to move from one point in the solution space to another.
Perturbation is done in three different ways:
  ▪ Move a single cell to a new location
  ▪ Swap two cells
  ▪ Mirror a cell about the x-axis
▪ In TW, cell mirroring is used in only about 10% of the cases in which a cell move is rejected

Timberwolf Algorithm Perturb Function (contd.)
▪ When moving a single cell to a new location or swapping two cells, cells may end up overlapping, or a cell may be pushed out of the layout region
▪ This is handled by adding a penalty function for moves that cause cell overlap or exceed the boundary
▪ The penalty makes such a move less attractive, so it is treated as a bad move
▪ Cells can have an associated LOCKED flag. If the LOCKED flag is set to 1, that cell cannot be selected for any further movement

Recent Developments for Placement
▪ Artificial neural networks (ANNs): based on a large number of artificial neurons that simulate the human brain or nervous system
▪ An artificial neuron receives several inputs X1, X2, …, Xn and generates a single output OUT
▪ The total input is given as $\mathrm{NET} = \sum_{i=1}^{n} W_i X_i$, where $W_i$ is the weight associated with input $X_i$
▪ The output is a function F of NET, where F is known as the activation function of the neuron
▪ A popular activation function is the sigmoid function $F(x) = \frac{1}{1 + e^{-x}}$; for sufficiently large x, F(x) approaches 1

ANN for Placement
▪ Several neurons connected together form a neural network
▪ [Figure: example of three neurons connected in a feed-forward formation or as a recurrent network]

ANN for Placement (contd.)
▪ Similar to temperature in SA, energy plays an important role in an ANN
▪ The set of all outputs $\mathrm{OUT}_i$ is known as the state of the network
▪ Assume that the activation function of each neuron is a threshold function, and consider a recurrent network
▪ As such a network changes state, its energy level changes; it is said to converge when it reaches a state of minimal (possibly local) energy. Convergence is guaranteed when the weight matrix W is symmetric and all of its diagonal entries are 0
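The neuron model and the energy idea above can be illustrated with a small sketch. It is not a placement formulation; it only shows NET = Σ W_i X_i, the sigmoid activation, and a recurrent network of 0/1 threshold neurons. The energy E = -1/2 Σ_i Σ_j W_ij OUT_i OUT_j is the standard Hopfield form, assumed here because the slide's own energy formula is not reproduced; the weights, inputs and starting state are made-up illustrative values. With a symmetric W and zero diagonal, updating one neuron at a time never increases E, so the network settles into a stable state.

```python
import math


def net_input(weights, inputs):
    """NET = sum_i W_i * X_i."""
    return sum(w * x for w, x in zip(weights, inputs))


def sigmoid(x):
    """F(x) = 1 / (1 + e^(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def threshold(x):
    """Threshold activation: output 1 if NET > 0, else 0."""
    return 1 if x > 0 else 0


def energy(W, out):
    """Assumed Hopfield-style energy E = -1/2 * sum_i sum_j W[i][j] * OUT_i * OUT_j."""
    n = len(out)
    return -0.5 * sum(W[i][j] * out[i] * out[j] for i in range(n) for j in range(n))


def run_recurrent(W, out, max_sweeps=100):
    """Update neurons one at a time until no output changes; record E after each sweep."""
    out = list(out)
    history = [energy(W, out)]
    for _ in range(max_sweeps):
        changed = False
        for k in range(len(out)):
            new = threshold(net_input(W[k], out))
            if new != out[k]:
                out[k] = new
                changed = True
        history.append(energy(W, out))
        if not changed:
            break
    return out, history


# Single neuron with hypothetical weights and inputs.
net = net_input([0.5, -1.0, 2.0], [1, 1, 1])  # NET = 0.5 - 1.0 + 2.0 = 1.5
print(net, sigmoid(net))

# Hypothetical symmetric weight matrix with zero diagonal (3 neurons).
W = [[0, 1, -2],
     [1, 0, 1],
     [-2, 1, 0]]
final, history = run_recurrent(W, [1, 0, 1])
print(final, history)  # [0, 1, 1] [2.0, -1.0, -1.0]
```

Each single-neuron update changes E by -ΔOUT_k · NET_k, which is never positive under the threshold rule when W is symmetric with zero diagonal; that is why the recorded energy history is non-increasing.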