VLSI Physical Design Automation
Course Coordinator: Dr. R. M. Vemuri
Course Structure
• The final grade will depend on the following:
• Two minors – 25% each
• One major – 50%
• Ethical conduct:
• No cheating in any assignment or examination will be tolerated; the maximum penalty applies as per university rules.
• Plagiarism will be treated the same as cheating. Don’t claim others’ work as your own.
• Textbooks/references:
• VLSI Physical Design Automation: Theory and Practice by Sadiq Sait & Habib Youssef
• Algorithms for VLSI Design Automation by Sabih H. Gerez
• Practical Problems in VLSI Physical Design Automation by Sung Kyu Lim
• VLSI Physical Design: From Graph Partitioning to Timing Closure by Andrew Kahng et al.
Spring Semester
VLSI Physical Design Automation
2
Motivation
• First IC by Jack Kilby of TI in 1958: 1/16 × 7/16 inch, germanium
• Pentium 4, introduced in 2000: > 40M transistors on silicon, 16.7 × 17.6 mm²
• More recent processors have > 3B transistors
Design Productivity Gap
• The number of transistors on a chip doubles every 1.5 to 2 years (Moore’s law).
• Design complexity therefore grows exponentially.
• Designer/engineer productivity cannot grow exponentially.
• This causes a productivity gap between the design effort required and what designers can actually deliver.
• VLSI (Very Large Scale Integration) refers to the technology through which circuits with more than 1M transistors can be implemented in silicon.
• VLSI has been used successfully to build microprocessors, DSPs, large-capacity memories, etc. on a single chip.
• Such rapid growth in integration technology would not be possible without automation (software/tools) of the various design steps.
• In this course we will study in depth the algorithms and methods used to build such automation.
Design process
• How do you approach any complex problem?
• Divide and conquer!
• Break up a complex problem into smaller problems and tackle each smaller problem.
• Develop expertise in each smaller area.
• Integrate the many smaller solutions into the overall solution.
• Automate, automate, automate.
VLSI Design Process
[Figure: levels of the design process (Idea, Behavioral/Architectural, Register Transfer Level, Library Cells/Masks), the corresponding design steps (Architectural Design, Logical Design, Physical Design), and the CAD tools used for each subproblem (generic CAD tools, simulation tools, functional and logic minimization tools, APR tools)]
Solving the Design Complexity Problem
• Design abstraction
• Design partitioning
• Design automation
• Design reuse (soft IP or hard IP)
VLSI Design Cycle
Just like any design, the VLSI design cycle starts with a formal specification (spec) of a VLSI chip and follows a series of steps:
1. System Specification
a. A high-level representation of the system.
b. Includes performance, functionality, dimensions, power, and fabrication technology.
2. Architectural Design
a. What instruction set should be used, how many ALUs, which memory addressing modes?
b. Whether to use pipelining, and what the pipeline depth should be.
c. The outcome of this stage is a micro-architectural specification (MAS) document.
d. Predict performance, power, and die size.
e. Early estimates are critical to determine the viability of the product.
3. Behavioral Design
a. The main functional units of the design are identified.
b. Interconnect requirements between units are also identified.
c. PPA (power, performance, area) estimates are generated for each unit.
d. Implementation specifics are still not decided at this stage. For example, it may identify that a multiplication is needed, but not the specific way in which it is to be implemented.
4. Logic Design
a. Control flow, word widths, register allocation, and arithmetic/logical operations are derived in RTL.
b. RTL is expressed in an HDL such as Verilog or VHDL.
c. It consists of Boolean expressions and timing information.
d. High-level synthesis tools can be used to generate RTL from the behavioral design.
5. Circuit Design
a. A circuit representation is derived from the RTL.
b. For digital circuits this is done mainly with logic synthesis tools, and for analog designs with schematic capture tools.
c. Circuit simulation tools are used to verify correctness and timing: static timing verification for digital logic, and SPICE simulation for analog circuits.
d. This representation is also known as a netlist.
6. Physical Design
a. The netlist is converted into a geometric representation.
b. This geometric representation is known as the layout.
c. These geometric shapes represent transistors and the multiple layers of wires connecting them. For example, a rectangle of poly over a rectangle of diffusion forms one transistor.
d. The layout has to follow strict design rules imposed by the process technology.
7. Fabrication
a. After physical verification, the design goes through a process called tape-out.
b. From the geometric shapes, masks are created and used to etch the patterns onto silicon.
c. The dies on which the shapes are etched are made from silicon, and several dies exist on a single wafer.
8. Packaging and post-silicon validation
a. At this stage the wafer is diced into individual chips, which then go through packaging.
b. Post-silicon validation is done on early silicon, using emulators, to detect any bugs.
• Some steps are usually combined, as shown in the flow diagram below:
[Figure: combined design flow]
➢ Idea (problem description, high-level design decisions, early estimates of PPA): no tools
➢ Architectural and Behavioral Design: modeling and simulation tools
➢ Logic and Circuit Design (RTL, netlist, and schematic entry): synthesis and logic minimization tools
➢ Physical Design (cell level and layout): tools for partitioning, placement, and routing
➢ Fabrication and Packaging (actual hardware; industry term: first silicon): manufacturing and packaging tools, i.e., actual machines
➢ All steps up to physical design are done entirely in software; fabrication and packaging use software only to control the machines.
Some photos from the web
[Figures: layout view of a processor (1), silicon wafer (2), packaged microprocessor (3)]
1 From: https://superuser.com/questions/324284/what-is-meant-by-the-terms-cpu-core-die-and-package
2 From: https://www.dreamstime.com/photos-images/silicon-wafer.html
3 From: https://www.dreamstime.com/royalty-free-stock-images-microprocessor-image1915559
Physical Design Cycle
• The input to the physical design cycle is a netlist (circuit) and the output is a layout. This is accomplished in several stages:
➢ Partitioning:
✓Large designs consist of millions or billions of transistors, so it is not possible to work with the entire design at the same time due to memory or compute limitations. The chip must therefore be partitioned into sub-units, which are later integrated into the full chip after going through layout and verification.
✓The process considers factors such as the function of each block, the size of each block, the total number of blocks, the number of interconnections between blocks (pins/ports), and the design team size.
✓The output of this stage is a set of blocks and the interconnections required between them.
✓Each block can recursively be partitioned into sub-blocks.
➢ Floorplanning and Placement:
✓Explores good layout alternatives for each block as well as for the full chip
✓Area can be estimated from the types and number of components in the block
✓Interconnect area is still only an estimate at this point
✓Blocks are mostly rectangular, but this is not required
✓This stage is usually done by humans, as we are better at visualizing the entire floorplan than today’s state-of-the-art software.
✓During placement, blocks are exactly positioned on the chip.
✓The goal is to find a minimum-area arrangement that allows completion of the interconnections between blocks while meeting the performance constraints
✓Usually done in two phases: in the first phase an initial placement is created, and in the second phase incremental adjustments are made
✓The quality of a placement can be determined only after routing is completed
➢ Routing:
✓The objective is to complete the interconnections between all pins, as specified by the netlist
✓First, the routing space is partitioned into rectangular regions called channels and switchboxes
✓This includes the space between blocks as well as on top of the blocks
✓The goal of a router is to complete all circuit connections using the shortest possible wire length, using only the channels and switchboxes
✓The first phase of routing is called global routing, which determines the channels and switchboxes through which each wire should be routed.
✓The second phase is detailed routing, which determines the exact wire geometry and the layers used for each wire.
✓Routing is an NP-hard problem, so much research has focused on heuristics to solve it.
➢ Compaction:
✓Squeezes the layout from all sides to eliminate wasted space
✓The goal is to reduce the overall area. A smaller chip has shorter wires, which reduces signal delay, and more chips can be manufactured on the same wafer.
✓When more chips are made from the same wafer, the cost per chip goes down.
✓Compaction is a very compute-intensive problem, so it is used only for very high-volume parts.
➢ Extraction and Verification:
✓Design Rule Checking (DRC) verifies that the geometric patterns on the chip comply with all the rules required by the fabrication process.
✓Examples include wire-to-wire spacing, minimum wire width, antenna checking …
✓Layout Versus Schematic (LVS) extracts the circuit from the layout and compares it with the original schematic to check functional equivalence.
✓Parasitic extraction calculates the R and C values of all the wires based on their material properties and surroundings; these values are used to verify performance requirements.
✓Reliability verification determines the longevity of the chip, as well as its robustness against effects such as electrostatic discharge.
Standard Cell
• The standard cell design style is also known as APR design, ASIC design, or sea-of-cells design.
• It uses a library of pre-designed cells known as standard cells (several thousand cells in a library).
• Each cell in the library is rectangular, and all cells have the same height.
• The standard cell library goes through a characterization flow to determine the functionality and timing of each cell and to verify it.
• Connection points (pins/ports) are distributed on the surface or edges of the cells.
• Cells typically use no more than two layers to complete the interconnections within the cell.
• Cells have a single level of hierarchy, with no sub-hierarchies within.
• Designs using this methodology are not as compact as custom designs, but can be completed much faster.
• Automation starts after RTL and completes the full physical design; this flow is also known as RTL2GDS in the industry.
• Development of the standard cell library requires a significant initial investment, but the library can be reused for any number of designs.
Gate Arrays
• A simplification of standard cell design.
• All the cells in a gate array are the same (NAND or NOR).
• Consists of arrays of gates separated by both vertical and horizontal channels.
• Sea of gates is an improvement on the gate array.
• The entire design has to be mapped to the same gate type. As a result:
• Only the connectivity needs to be added
• Design time is reduced
• There is less chance of error
• Area increases
Field Programmable Gate Array (FPGA)
• In FPGAs, cells and interconnect are pre-fabricated
• Provides flexibility in design through software
• Lowers development cost and gives faster time to market
• Easily reconfigurable in the field
• Not as fast or power efficient as other design styles
• High area overhead due to unused space on the FPGA
Impact of fabrication on Physical Design
• Scaling: the process of shrinking the size of the layout is called scaling
• Transistors and the interconnects that connect them are made smaller
• As a transistor becomes smaller, it becomes faster, conducts more current, and consumes less power.
• The cost of producing each transistor goes down, and more of them can be packed on one wafer
• Example: suppose a chip is designed on a 0.25 µm process and measures m × m (area m²)
• Assume a typical shrink factor of 0.7 from 0.25 µm to 0.18 µm
• The width and height are each scaled by 0.7, giving 0.7m × 0.7m = 0.49m² ≈ 0.5m²
• The scaled chip is therefore about half the size of the original chip
• Transistor delay also scales accordingly, but interconnect delay does not.
• Interconnect delay becomes an increasingly large fraction of the total delay as dimensions shrink
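A quick numeric check of the area example, as a minimal Python sketch. The function name `scaled_area` and the 10 mm die edge are hypothetical, chosen only for illustration; the 0.7 shrink factor is the one assumed in the example above.

```python
def scaled_area(width: float, height: float, shrink: float = 0.7) -> float:
    """Die area after applying the same linear shrink to width and height."""
    return (width * shrink) * (height * shrink)


m = 10.0  # hypothetical die edge in mm (illustrative value, not from the slides)
original = m * m
shrunk = scaled_area(m, m)
ratio = shrunk / original  # equals shrink**2 = 0.49, i.e. about half the area
print(f"original {original:.1f} mm^2, shrunk {shrunk:.1f} mm^2, ratio {ratio:.2f}")
```

Because the ratio is simply the shrink factor squared, it does not depend on the die size chosen.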
Scaling Methods
• Two basic types of scaling: full scaling and constant-voltage (CV) scaling
• The table below shows the difference (scaling factor S):
Parameter | Full Scaling | CV Scaling
Dimensions: width, length, oxide thickness | 1/S | 1/S
Voltages: power supply, threshold | 1/S | 1
Gate capacitance | 1/S | 1/S
Current | 1/S | S
Propagation delay | 1/S | 1/S²
Table taken from “Algorithms for VLSI Physical Design Automation” by Naveed Sherwani, p. 77
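The propagation-delay row can be derived from the rows above it under the common first-order model t_d ∝ C·V/I. That model is an assumption added here; the table itself does not state it. The sketch uses exact fractions to check both columns for an example factor S = 2.

```python
from fractions import Fraction


def delay_factor(cap: Fraction, volt: Fraction, current: Fraction) -> Fraction:
    """Scaling of delay under the first-order model t_d ~ C * V / I (assumed model)."""
    return cap * volt / current


S = Fraction(2)  # example scaling factor; any S > 1 gives the same pattern

# Factors taken from the table: (gate capacitance, voltage, current)
full = delay_factor(cap=1 / S, volt=1 / S, current=1 / S)
cv = delay_factor(cap=1 / S, volt=Fraction(1), current=S)
print(full, cv)  # 1/2 1/4, i.e. 1/S and 1/S^2
```

Under CV scaling the voltage stays fixed while the current grows by S, which is why the delay improves by 1/S² rather than 1/S.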
Parasitic Effects of Scaling
• Circuit elements come closer to one another with process scaling
• This increases the capacitance between components
• The capacitance between signal paths and the capacitance between a signal path and ground are the two major parasitic capacitances
• Another is the inherent capacitance of the MOS transistor
Interconnect Delay/Signal Integrity
• Interconnect delay is typically 50-70% of the overall delay
• The resistance of a conducting wire is given by
$R = \dfrac{\rho\, l_c}{h_c\, w_c}$
where ρ is the resistivity of the material, l_c is the length of the wire, h_c the height of the wire, and w_c the width of the wire.
• The capacitance grows with the wire dimensions and falls with the spacing to neighboring wires:
$C \propto h_c,\ w_c \qquad \text{and} \qquad C \propto \dfrac{1}{\text{wire-to-wire spacing}}$
• With scaling, the resistance goes up significantly, and the capacitance goes up but not as significantly
• Other signal integrity issues, like noise and crosstalk, have to be dealt with during design
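To see why resistance grows with scaling, the sketch below evaluates the resistance formula for a wire before and after shrinking its cross-section by a factor of 0.7 in each dimension. The wire dimensions and the copper resistivity value are illustrative assumptions, not figures from the slides.

```python
def wire_resistance(rho: float, length: float, height: float, width: float) -> float:
    """R = rho * l_c / (h_c * w_c), with lengths in metres and rho in ohm-metres."""
    return rho * length / (height * width)


RHO_CU = 1.7e-8  # approximate resistivity of copper in ohm*m (assumed value)
LENGTH = 1e-3    # hypothetical 1 mm wire whose length does not shrink

r_old = wire_resistance(RHO_CU, LENGTH, height=0.5e-6, width=0.5e-6)
# Shrink the cross-section by 0.7 in each dimension
r_new = wire_resistance(RHO_CU, LENGTH, height=0.35e-6, width=0.35e-6)
print(f"{r_old:.1f} ohm -> {r_new:.1f} ohm (x{r_new / r_old:.2f})")
```

Shrinking both height and width by 0.7 multiplies R by 1/0.49, roughly doubling it, which is one reason interconnect delay does not improve the way transistor delay does.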
• Heuristic Algorithms
• Heuristic algorithms are frequently used for solving NP-complete problems.
• A heuristic algorithm produces a solution but does not guarantee optimality
• Heuristics have to be tested on benchmarks to verify their effectiveness
• A good heuristic must have low time and space complexity and must produce a near-optimal solution
• On average, these algorithms should produce acceptable (good) results
• In many cases, O(n) heuristics have been developed even though an optimal O(n²) or O(n³) algorithm exists, because a near-optimal solution obtained in reasonable time is preferable to an optimal one that takes too long
Data Structures and Basic Algorithms
• VLSI design (going from HDL to silicon) can be viewed as a large database management problem
• The layout is captured as a database of polygons, mostly planar rectangles on several layers, each with certain properties.
• Each polygon is captured with great precision. This precision is necessary because this information has to be communicated to devices such as plotters, video displays, and finally the fabrication machines.
• Many VLSI problems can be represented as graphs, and graph theory can be used to understand and solve them
Definition of a Graph
❑A graph consists of a non-empty finite set of vertices V and a set of edges E, where both ends of every edge belong to V. Vertices that do not belong to any edge are called isolated.
❑Edges may be drawn straight or curved; the lengths of edges and the positions of vertices are arbitrary.
❑An example of a graph:
[Figure: example graph with vertices v1 to v5 and edges e1 to e4]
Designation of a graph:
$G(V, E) = (V, E),\quad V \neq \emptyset,\quad E \subseteq V \times V$, where $V = \{v_1, v_2, \ldots, v_n\}$
Some Basic Concepts
❑Let v1, v2 be vertices and e = ‹v1, v2› the edge connecting them. Then the vertices v1 and v2 are incident to edge e.
❑Two edges incident to the same vertex are called adjacent.
❑The number of vertices of a graph G is denoted by p, and the number of edges by q.
➢A graph is called complete if every two distinct vertices are connected by one and only one edge. A complete graph therefore has q = p(p − 1)/2 edges.
Some Basic Concepts
❑The degree of a vertex is the number of edges of the graph that this vertex belongs to; it is denoted d(v) or deg(v). A vertex with d(v) = 0 is called isolated; a vertex with d(v) = 1 is called terminal.
[Figure: example graph with vertices a, b, c, d, e]
deg(d) = 3; deg(e) = 1, so e is a terminal vertex; deg(c) = 0, so c is an isolated vertex
❑A vertex is called odd if d(v) is an odd number, and even if d(v) is an even number. The degree of each vertex of a complete graph is one less than the number of its vertices.
Properties of the Degree of Vertices
❑In a graph G(V, E), the sum of the degrees of all its vertices is an even number, equal to twice the number of edges.
❑The number of odd vertices of any graph is even.
❑In any simple graph with n vertices, where n ≥ 2, there are always at least two vertices with the same degree.
❑If in a simple graph with n vertices (n ≥ 2) exactly two vertices have the same degree, then the graph has either exactly one vertex of degree 0 or exactly one vertex of degree n − 1.
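These properties are easy to check mechanically. The sketch below computes vertex degrees from an edge list and asserts the first three properties. The edge list is a hypothetical graph chosen so that deg(d) = 3, deg(e) = 1, and deg(c) = 0, matching the degree example earlier; it is not necessarily the graph drawn in that figure.

```python
def degrees(vertices, edges):
    """Degree of every vertex of an undirected graph given as (u, v) edge pairs."""
    deg = dict.fromkeys(vertices, 0)  # isolated vertices keep degree 0
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1  # a loop (u, u) correctly adds 2 to deg[u]
    return deg


vertices = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "d"), ("b", "d"), ("d", "e")]  # hypothetical graph
deg = degrees(vertices, edges)

assert sum(deg.values()) == 2 * len(edges)      # sum of degrees = 2|E|
odd = [v for v in vertices if deg[v] % 2 == 1]
assert len(odd) % 2 == 0                        # number of odd vertices is even
assert len(set(deg.values())) < len(vertices)   # two vertices share a degree
print(deg, odd)
```

Here exactly two vertices (a and b) share a degree, and the graph has exactly one vertex of degree 0 (c), consistent with the last property.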
Paths of the Graph
❑A route in a graph is an alternating sequence of vertices and edges in which any two adjacent elements are incident:
$v_0, e_1, v_1, e_2, v_2, \ldots, e_k, v_k$
❑If $v_0 = v_k$, the route is closed; otherwise it is open.
❑If all edges are distinct, the route is called a chain.
❑If all vertices are distinct, the route is called a simple chain. In the chain $v_0, e_1, v_1, e_2, v_2, \ldots, e_k, v_k$ the vertices $v_0$ and $v_k$ are called the ends of the chain.
❑A closed chain is called a cycle; a closed simple chain (all vertices distinct except $v_0 = v_k$) is called a simple cycle.
❑A graph without cycles is called acyclic.
❑For a route $M = v_0, e_1, v_1, e_2, v_2, \ldots, e_k, v_k$, the length of M is k.
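The route definitions can be turned into a small classifier. This is an illustrative sketch: a route is assumed to be given as a Python list alternating vertex and edge labels, and the labels in the example are hypothetical.

```python
def classify_route(route):
    """Classify a route [v0, e1, v1, ..., ek, vk].

    chain: all edges distinct; simple chain: all vertices distinct;
    cycle: closed chain; simple cycle: closed simple chain.
    Returns (kind, length), where length is the number of edges k.
    """
    vertices, edges = route[0::2], route[1::2]
    closed = vertices[0] == vertices[-1]
    is_chain = len(set(edges)) == len(edges)
    # In a closed route v0 == vk, so the final vertex is not counted twice
    inner = vertices[:-1] if closed else vertices
    is_simple = is_chain and len(set(inner)) == len(inner)
    if not is_chain:
        kind = "closed route" if closed else "open route"
    elif closed:
        kind = "simple cycle" if is_simple else "cycle"
    else:
        kind = "simple chain" if is_simple else "chain"
    return kind, len(edges)


print(classify_route(["a", "e1", "b", "e2", "c"]))             # ('simple chain', 2)
print(classify_route(["a", "e1", "b", "e2", "c", "e3", "a"]))  # ('simple cycle', 3)
```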
Undirected Graph
❑An undirected graph is a type of graph whose edges have no direction assigned to them.
❑Below is an example of an undirected graph:
[Figure: example undirected graph, with callouts (a) a vertex and (b) an edge]
Undirected Graph
❑A graph is defined in terms of vertices and edges. Two
vertices are connected to each other by an edge.
❑A graph is G = (X, E), where X is the set of vertices and E is the set of edges connecting the vertices; hence each element of E is an unordered pair of elements of X.
❑Any situation (system) that contains a set of elements (the vertices) and relationships between pairs of elements (the edges) can be described by an undirected graph.
Directed Graph
❑A directed graph contains edges that are ordered pairs of vertices.
➢ In a directed graph, the edges have a direction associated with them, indicated by an arrow
[Figure: example directed graph, with callouts (a) a vertex and (b) an edge]
Network
❑ A network is a graph where each edge (or arc) is associated with a number (value).
[Figure: example network with edges e1 to e4 carrying positive and negative values, and a loop]
❑ The actual meanings of the numbers depend on the application. In general they may be positive or negative.
❑ A network is also called a weighted graph.
Planar Graph
❑ A planar graph is a graph that can be drawn in the plane with no two edges crossing each other.
❑ This is a planar graph …
[Figure: two drawings of a graph on vertices 1, 2, 3, 4]
Tree
❑ A tree is an acyclic connected graph
➢ For any pair of nodes in the tree, there is exactly one path between them
❑ A binary tree is a tree where every node has at most 3 neighbors
➢ Every node other than a leaf has at most two children (plus one parent, except for the root)
[Figure: binary tree with an internal node and a leaf node labeled]
Tree Terminology
⚫ Root: A
⚫ Internal nodes: A, C
⚫ Leaves: B, D, E
⚫ A's children: B, C
⚫ D's parent: C
⚫ C's sibling: B
⚫ E's grandparent: A
⚫ Height: 2
⚫ Shorter binary trees are better for most algorithms and data structures, since the cost of typical operations grows with the height of the tree
[Figure: tree with root A; A's children are B and C; C's children are D and E]
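The terminology can be checked on the example tree, stored here as a parent-to-children dictionary (a representation assumed for this sketch). Height is taken as the number of edges on the longest root-to-leaf path, which gives the height of 2 stated above.

```python
def height(children, node):
    """Number of edges on the longest path from `node` down to a leaf."""
    kids = children.get(node, [])
    if not kids:
        return 0
    return 1 + max(height(children, k) for k in kids)


# Example tree from the slide: A -> B, C and C -> D, E
children = {"A": ["B", "C"], "C": ["D", "E"]}
nodes = ["A", "B", "C", "D", "E"]
parent = {k: p for p, kids in children.items() for k in kids}

leaves = [n for n in nodes if not children.get(n)]
internal = [n for n in nodes if children.get(n)]
print(height(children, "A"), leaves, internal, parent["D"], parent[parent["E"]])
```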
Data Structures for Representation of Graphs
⚫ Adjacency Matrix
⚫ List Representation
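The two representations can be sketched as follows for an undirected graph with vertices numbered 0 to n − 1 (the example graph is hypothetical). The matrix needs O(|V|²) space regardless of the number of edges, whereas the list needs O(|V| + |E|), which is why adjacency lists are usually preferred for sparse graphs.

```python
def adjacency_matrix(n, edges):
    """n x n 0/1 matrix for an undirected graph on vertices 0..n-1."""
    m = [[0] * n for _ in range(n)]
    for u, v in edges:
        m[u][v] = m[v][u] = 1  # undirected: store both directions
    return m


def adjacency_list(n, edges):
    """List of neighbor lists for an undirected graph on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


edges = [(0, 1), (0, 2), (1, 2), (2, 3)]  # hypothetical example graph
for row in adjacency_matrix(4, edges):
    print(row)
print(adjacency_list(4, edges))
```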
Complexity Issues and NP-hardness
• Many algorithms and mathematical techniques are used for solving
physical design problems in VLSI
• We will study some algorithms that fall into the categories of greedy algorithms and heuristic algorithms
• Due to the size of VLSI designs, all algorithms must have low time and space complexity
• A major cause of concern is the absence of polynomial-time algorithms for the majority of problems encountered in physical design automation
• Solvable problems can be classified into the classes P and NP
• The class P consists of problems that can be solved in polynomial time by a deterministic Turing machine
• NP problems can be solved in polynomial time by a non-deterministic Turing machine, which can be viewed as a parallel computer with as many processors as we need: an unrealistic model
• If every problem in NP can be reduced in polynomial time to a problem Π, then Π is NP-hard; if Π is also in NP, it is NP-complete
• Exponential Algorithms
• If the problem size is small, exponential-time algorithms can be used. This is done when the solution is critical to the chip for some practical purpose, so it is worth taking the time hit to get optimality. Integer programming, used for combinatorial optimization, is one such example.
• Special Case Algorithms
• Not really a class in itself, but a complex problem may be simplified by applying certain constraints. Sometimes an NP-complete problem becomes solvable in polynomial time when restrictions are applied. For example, placing cells of equal height (standard cells) in rows is easier than placing cells of unequal heights.
• Approximation Algorithms
• Useful when near-optimality is sufficient. Such algorithms produce results that may be suboptimal, but are guaranteed to be within a known bound of the optimum.
• Some graph algorithms: one significant advantage of using graph algorithms is that they have been well studied and understood.
• Graph search algorithms have many applications in VLSI physical design, where problems are modeled using graphs.
• Depth First Search (DFS): in DFS, an edge is selected for exploration from the most recently visited vertex v that still has unexplored edges. When all edges of v have been explored, the algorithm backtracks to the previous vertex that has an unexplored edge.
• The time complexity of DFS is O(|V| + |E|), where V is the set of vertices and E is the set of edges
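A minimal recursive sketch of DFS, assuming the graph is given as a dictionary of adjacency lists (the example graph is hypothetical). Python's recursion limit makes this version suitable only for small graphs; an explicit stack avoids that limitation for large netlists.

```python
def dfs(adj, start):
    """Depth-first search; returns vertices in the order they are first visited.

    adj maps each vertex to a list of neighbors. Each vertex and edge is
    handled a constant number of times, giving O(|V| + |E|) time.
    """
    visited, order = set(), []

    def explore(v):
        visited.add(v)
        order.append(v)
        for w in adj.get(v, []):  # edges of the most recently visited vertex
            if w not in visited:
                explore(w)        # returning from this call is the backtrack step

    explore(start)
    return order


adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}  # hypothetical directed graph
print(dfs(adj, "a"))  # ['a', 'b', 'd', 'c']
```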
• Breadth First Search (BFS): this algorithm starts from some vertex and searches all the adjacent vertices before exploring the adjacencies of other vertices.
• Start with some source vertex v and mark it as visited
• Explore all edges of v
• Put the newly reached vertices in a queue and mark them as visited
• If a vertex is already marked, it is not queued again
• Repeat this process for each vertex in the queue
• The time complexity of BFS is also O(|V| + |E|)
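The BFS steps map onto a queue-based sketch; here `collections.deque` serves as the queue, and the adjacency-list dictionary and example graph are hypothetical.

```python
from collections import deque


def bfs(adj, source):
    """Breadth-first search; returns vertices in the order they are dequeued."""
    visited = {source}  # mark the source before it is queued
    queue = deque([source])
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj.get(v, []):  # explore all edges of v
            if w not in visited:  # already-marked vertices are not queued again
                visited.add(w)
                queue.append(w)
    return order


adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}  # hypothetical directed graph
print(bfs(adj, "a"))  # ['a', 'b', 'c', 'd']
```

Marking a vertex when it is queued (rather than when it is dequeued) guarantees each vertex enters the queue at most once.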
• Spanning Tree Algorithms
• Many graph problems are subset selection problems
• Given a graph G = (V, E), select a subset V’ ⊆ V such that V’ has property P
• A spanning tree is a set of edges that spans all the vertices and forms a tree.
• A Minimum Spanning Tree (MST) is a spanning tree with minimum total edge cost
• Each edge of the graph has a cost associated with it.
• For example, the cost of each edge may be the distance between two pins in a design. The spanning tree that gives the minimum wire length is the minimum spanning tree
• Two classical algorithms for finding an MST are
• Kruskal’s algorithm, and
• Prim’s algorithm
• We will study both Kruskal’s algorithm and Prim’s algorithm
• Kruskal’s algorithm for finding a minimum spanning tree of a graph with V vertices
1. Sort all the edges in non-decreasing order of their weight
2. Pick the smallest remaining edge and check whether it forms a cycle with the spanning tree formed so far:
i. If a cycle is formed, discard the edge.
ii. If a cycle is not formed, add the edge to the spanning tree
3. Repeat step 2 until all V vertices are connected (the tree has V − 1 edges)
• Kruskal’s algorithm is a greedy algorithm
• It always picks the minimum-weight remaining edge to add to the tree
• Greedy algorithms can in general get stuck in local optima, but for the MST problem this greedy choice is provably optimal
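A minimal Python sketch of Kruskal’s steps. The cycle check in step 2 is done here with a union-find (disjoint-set) structure, a common choice that the slides do not prescribe: an edge forms a cycle exactly when both endpoints already lie in the same component. Edges are assumed to be (weight, u, v) tuples on vertices 0 to n − 1, and the example weights are hypothetical pin distances.

```python
def kruskal(n, edges):
    """Minimum spanning tree of an undirected graph on vertices 0..n-1.

    edges: list of (weight, u, v). Returns (total_weight, [(u, v, weight), ...]).
    """
    parent = list(range(n))  # union-find forest: each vertex starts alone

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    tree, total = [], 0
    for w, u, v in sorted(edges):  # step 1: non-decreasing weight
        ru, rv = find(u), find(v)
        if ru == rv:               # step 2(i): edge would close a cycle
            continue
        parent[ru] = rv            # step 2(ii): add edge, merge components
        tree.append((u, v, w))
        total += w
        if len(tree) == n - 1:     # step 3: all vertices connected
            break
    return total, tree


edges = [(4, 0, 1), (1, 1, 2), (3, 0, 2), (2, 2, 3), (5, 1, 3)]  # hypothetical
total, tree = kruskal(4, edges)
print(total, tree)
```

Sorting dominates, so this sketch runs in O(|E| log |E|) time.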
Prim’s Algorithm
1: Choose an arbitrary vertex as the starting vertex of the MST.
2: Repeat steps 3 to 5 while there are vertices that are not included in the MST
3: Find the edges connecting any tree vertex with the non-tree vertices.
4: Find the minimum among these edges.
5: Add the chosen edge to the MST if it does not form a cycle.
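A minimal sketch of these steps using Python’s `heapq` module as a priority queue of candidate edges. Instead of checking step 5 explicitly, edges whose far end is already in the tree are skipped when popped, since adding them would form a cycle. The adjacency format (each vertex maps to a list of (weight, neighbor) pairs) and the example graph are assumptions for illustration.

```python
import heapq


def prim(adj, start):
    """Prim's MST on an undirected graph; adj[u] is a list of (weight, v) pairs."""
    in_tree = {start}                        # step 1: arbitrary starting vertex
    heap = [(w, start, v) for w, v in adj[start]]
    heapq.heapify(heap)
    tree, total = [], 0
    while heap and len(in_tree) < len(adj):  # step 2: until all vertices included
        w, u, v = heapq.heappop(heap)        # steps 3-4: minimum candidate edge
        if v in in_tree:
            continue                         # step 5: would form a cycle
        in_tree.add(v)
        tree.append((u, v, w))
        total += w
        for w2, x in adj[v]:                 # new edges leaving the tree
            if x not in in_tree:
                heapq.heappush(heap, (w2, v, x))
    return total, tree


adj = {  # hypothetical undirected graph, each edge listed from both ends
    0: [(4, 1), (3, 2)],
    1: [(4, 0), (1, 2), (5, 3)],
    2: [(1, 1), (3, 0), (2, 3)],
    3: [(2, 2), (5, 1)],
}
total, tree = prim(adj, 0)
print(total, tree)
```

With a binary heap this runs in O(|E| log |V|) time.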
Shortest Path Algorithms
• Many routing problems in VLSI are essentially shortest path problems, so these algorithms play a significant role in VLSI design.
• Single Source Shortest Path:
• Given an edge-weighted graph G = (V, E) and two vertices u, v ∈ V, select a set of vertices that form a path of minimum cost from u to v in G.
• Let w(p, q) be the weight of edge (p, q), with w(p, q) ≥ 0 for each (p, q) ∈ E
• Dijkstra’s algorithm solves this problem in O(n²) time, where n = |V|
Dijkstra’s Algorithm
dist[s] ← 0                                      # distance to source vertex is zero
for all v ∈ V − {s}
    do dist[v] ← ∞                               # set all other distances to infinity
S ← ∅                                            # S, the set of visited vertices, is initially empty
Q ← V                                            # Q, the queue, initially contains all vertices
while Q ≠ ∅                                      # while the queue is not empty
    do u ← mindistance(Q, dist)                  # select the element of Q with the min. distance
       Q ← Q − {u}                               # remove u from the queue
       S ← S ∪ {u}                               # add u to the set of visited vertices
       for all v ∈ neighbors[u]
           do if dist[v] > dist[u] + w(u, v)     # if a new shortest path is found
                 then dist[v] ← dist[u] + w(u, v)   # set the new value of the shortest path
return dist
[Figure: example weighted graph with source s and vertices u, v, x, y, z]
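The pseudocode above translates almost line by line into Python. This sketch keeps the O(n²) behavior by finding the minimum with a linear scan over Q; the adjacency format (each vertex maps to a list of (neighbor, weight) pairs) and the example graph are hypothetical, not the graph in the figure.

```python
import math


def dijkstra(vertices, adj, s):
    """Single-source shortest paths with non-negative edge weights.

    adj[u] is a list of (v, w) pairs. The linear scan in mindistance gives
    O(|V|^2) time overall.
    """
    dist = {v: math.inf for v in vertices}  # all distances start at infinity
    dist[s] = 0                             # except the source
    S = set()                               # visited vertices (mirrors S above)
    Q = set(vertices)                       # unvisited vertices
    while Q:
        u = min(Q, key=dist.__getitem__)    # mindistance(Q, dist)
        Q.remove(u)
        S.add(u)
        for v, w in adj.get(u, []):
            if dist[v] > dist[u] + w:       # a shorter path to v via u
                dist[v] = dist[u] + w
    return dist


vertices = ["s", "a", "b", "c", "d"]
adj = {  # hypothetical directed example graph
    "s": [("a", 1), ("c", 3)],
    "a": [("b", 4), ("c", 1)],
    "c": [("b", 2), ("d", 4)],
    "b": [("d", 1)],
}
dist = dijkstra(vertices, adj, "s")
print(dist)  # {'s': 0, 'a': 1, 'b': 4, 'c': 2, 'd': 5}
```

Replacing the linear scan with a binary heap reduces the running time to O((|V| + |E|) log |V|), which is preferable for sparse routing graphs.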
Simulation of Dijkstra’s Algorithm
[Figure: round-by-round trace (round, vertex added, current distances of s, a, b, c, d) of Dijkstra’s algorithm on an example graph with vertices s, a, b, c, d]
More reading …
• Reading assignment (Sherwani’s book)
• Min-Cut and Max-Cut Algorithms, pages 110-115
• Steiner Tree Algorithms
• Atomic Operations for Layout Editors, pages 117-118
• Corner Stitching, pages 123-129
• Become familiar with these algorithms/concepts
• They will come up for discussion later in this course
• Questions and discussion are encouraged in the next class
Mid Term 1
Circuit Partitioning
Introduction
❖Introduction to partitioning
❖Problem definition
❖Cost function and constraints
❖Approaches to partitioning
▪ Kernighan-Lin Heuristic
▪ Fiduccia-Mattheyses Heuristic
▪ Simulated Annealing
▪ Genetic Algorithm
Partitioning Algorithms
❖Iterative partitioning algorithms
❖Spectral based partitioning algorithms
❖Net partitioning vs. module partitioning
❖Multi-way partitioning
❖Multi-level partitioning
❖Further study in partitioning techniques (timing-driven …)
Problem Definition
❖Partitioning is the process of decomposing a system into a set of
smaller sub-systems
❖The system must be decomposed in such a way that the sub-systems together maintain the original functionality
❖An interface specification is generated during the decomposition and is used to connect all the sub-systems
▪ The decomposition should attempt to minimize the interface interconnections between any two sub-systems
❖The decomposition process should be efficient, so that the time it requires remains only a small fraction of the total design time.
Motivation
• Circuit is too large to be designed as a single entity
• Capacity limitations of simulation tools
• Design is too large to be placed on a single chip
• I/O pin limitations from packaging
Partitioning Example
[Figure: panels (a), (b), (c) showing eight components (1 to 8) and their grouping into partitions C1 and C2]
[Figure: partitioning of a circuit into Block A and Block B]
Block A: cut size = 5, area = 15
Block B: cut size = 7, area = 10
• Problem Formulation
• A graph G = (V, E) representing a partitioning problem can be constructed as follows:
• Let V = {v1, v2, …, vn} be a set of vertices and E = {e1, e2, …, em} a set of edges, where each vertex represents a component (transistor, standard cell, gate, macro cell) of the design.
• There is an edge joining vertices whenever the components corresponding to these vertices are to be connected.
• Each edge is a subset of the vertex set, i.e., ei ⊆ V for i = 1, 2, …, m. Each edge represents a net in the design
• The area of each component is denoted a(vi), 1 ≤ i ≤ n.
• The partitioning problem is to partition V into V1, V2, …, Vk such that
$V_i \cap V_j = \emptyset \text{ for } i \neq j, \qquad \bigcup_{i=1}^{k} V_i = V$
A partition is also referred to as a cut, and the cost of a partition is called the cut size.
The cut size can be the number of edges crossing the cut
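The definitions translate directly into code. In the sketch below each net is a set of component names (a hyperedge e_i ⊆ V), a partition is a dictionary mapping each component to a block number, and the cut size is taken to be the number of nets that span more than one block. The netlist, areas, and partition are hypothetical.

```python
def cut_size(nets, part):
    """Number of nets (hyperedges) with pins in more than one block."""
    return sum(1 for net in nets if len({part[v] for v in net}) > 1)


def block_areas(area, part):
    """Total component area a(v) assigned to each block."""
    totals = {}
    for v, blk in part.items():
        totals[blk] = totals.get(blk, 0) + area[v]
    return totals


# Hypothetical netlist: 6 components, 4 nets
nets = [{"v1", "v2"}, {"v2", "v3", "v4"}, {"v4", "v5"}, {"v5", "v6", "v1"}]
area = {"v1": 2, "v2": 1, "v3": 3, "v4": 2, "v5": 1, "v6": 3}
part = {"v1": 1, "v2": 1, "v3": 1, "v4": 2, "v5": 2, "v6": 2}
print(cut_size(nets, part), block_areas(area, part))
```

Reporting block areas alongside the cut size reflects the fact that a useful partition must keep the blocks balanced, not just minimize the cut.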
Iterative Partitioning Algorithms
❖Deterministic and Greedy iterative improvement algorithms
▪ Kernighan-Lin 1970
▪ Fiduccia-Mattheyses 1982
❖Non-deterministic and non-greedy iterative algorithms
▪ Simulated Annealing
▪ Genetic Algorithm
• Solutions for the Partitioning Problem
• Even the simplest 2-way partitioning problem, with identical node sizes and unit edge weights, is NP-complete. It is a special case of k-way partitioning.
• If there were no need to balance the two partitions, we could use the max-flow min-cut algorithm to get the minimum-size cut.
• For separating a design with 2n elements into two partitions of n elements each, the total number of ways is given by
$P(2n) = \frac{1}{2} \cdot \frac{(2n)!}{n!\,(2n-n)!} = \frac{(2n)!}{2\,(n!)^2}$
(the factor 1/2 accounts for the two blocks being interchangeable)
n | Number of possible 2-way partitions
1 | 1
2 | 3
3 | 10
4 | 35
5 | 126
6 | 462
7 | 1716
8 | 6435
9 | 24310
10 | 92378
11 | 352716
12 | 1352078
13 | 5200300
14 | 20058300
15 | 77558760
16 | 300540195
17 | 1166803110
18 | 4537567650
19 | 17672631900
20 | 68923264410
[Plots: number of possible 2-way partitions versus n, shown on two different vertical scales]
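The table can be regenerated from the formula, since (1/2)·(2n)!/(n!·n!) is half the binomial coefficient C(2n, n). A minimal sketch using Python’s standard `math.comb` (available from Python 3.8); the function name `num_bisections` is hypothetical.

```python
from math import comb


def num_bisections(n: int) -> int:
    """Ways to split 2n elements into two unlabeled blocks of n elements each.

    P(2n) = (1/2) * C(2n, n) = (2n)! / (2 * n! * n!)
    """
    return comb(2 * n, n) // 2  # C(2n, n) is always even for n >= 1


for n in (1, 2, 5, 10, 20):
    print(n, num_bisections(n))
```

The count grows roughly as 4ⁿ, which is why exhaustive enumeration is hopeless even for modest designs and heuristics such as Kernighan-Lin are used instead.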
K-L Algorithm
• Kernighan-Lin Algorithm:
• An iterative improvement algorithm
• One of the most popular for solving two-way partitioning problem
• Can be extended to the more general case
• The problem is characterized by a connectivity matrix C. Element cij
represents the sum of weights of the edges connecting elements i and j.
• In the two-way partitioning problem (TWPP), since the edges have unit weights, cij simply counts the number
of edges connecting i and j.
• The output of the partitioning algorithm is a pair of sets A and B such that
|A| = n = |B|, A ∩ B = ∅, and the size of the cutset T is
minimized.
K-L Algorithm
$T = \sum_{a \in A,\, b \in B} c_{ab}$
• Kernighan-Lin heuristic is an iterative improvement algorithm. It starts from an initial partition
(A, B) such that |A| = n = |B|, and A ∩B = ∅.
• How can a given partition be improved?
• Let P∗ = {A∗, B∗} be the optimum partition and P = {A, B} be the current partition.
• Then, in order to attain P∗ from P, one has to swap a subset X ⊆ A with a subset Y ⊆ B such
that:
✓ |X| = |Y|
✓ X = A ∩ B∗
✓ Y = A∗ ∩ B
K-L Algorithm
▪ A∗ = (A − X) + Y and B∗ = (B − Y) + X.
▪ The problem of identifying X and Y is as hard as that of finding P∗ = {A∗, B∗}.
[Figure: initial partition (A, B) and optimal partition (A∗, B∗); X moves from A to B and Y moves from B to A]
K-L Algorithm - Definitions
▪ Definition 1: Consider any node a in block A. The contribution of node
a to the cutset is called the external cost of a and is denoted as Ea, where
$E_a = \sum_{v \in B} c_{av}$
▪ Definition 2: The internal cost Ia of node a in block A is defined as
$I_a = \sum_{v \in A} c_{av}$
Moving node a from block A to block B would increase the value of the
cutset by Ia and decrease it by Ea. Therefore, the net reduction in the cutset would be
Da = Ea − Ia
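The definitions can be computed directly from the connectivity matrix. The following is a minimal Python sketch; the 4-node graph and the function names are hypothetical. The last line checks that moving a single node a lowers the cutset by exactly Da.

```python
def kl_costs(C, A, B):
    """Return (E, I, D) dicts for every node of the partition (A, B).

    C is a symmetric connectivity matrix: C[i][j] = total weight of the
    edges between nodes i and j.
    """
    E, I, D = {}, {}, {}
    for block, other in ((A, B), (B, A)):
        for a in block:
            E[a] = sum(C[a][v] for v in other)             # external cost
            I[a] = sum(C[a][v] for v in block if v != a)   # internal cost
            D[a] = E[a] - I[a]                             # cut reduction if a moves
    return E, I, D


def cut_size(C, A, B):
    """Cutset size T = sum of c_ab over a in A, b in B."""
    return sum(C[a][b] for a in A for b in B)


# Hypothetical 4-node example: edges 0-1, 0-2, 0-3, 1-3, 2-3 (unit weights)
C = [[0, 1, 1, 1],
     [1, 0, 0, 1],
     [1, 0, 0, 1],
     [1, 1, 1, 0]]
A, B = {0, 1}, {2, 3}
E, I, D = kl_costs(C, A, B)
print(cut_size(C, A, B), D)          # 3 {0: 1, 1: 0, 2: 0, 3: 1}
print(cut_size(C, {1}, {0, 2, 3}))   # 2 = 3 - D[0], i.e. node 0 moved alone
```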
Example
▪ In the figure below, consider nodes a in A and b in B
▪ Ia = 2, Ib = 3, Ea = 3, Eb = 1, Da = Ea − Ia = 1, and Db = Eb − Ib = −2
▪ Lemma: The gain obtained by exchanging a and b is gab = Da + Db − 2cab, where cab is the weight of the edges between a and b
[Figure: example graph with node a in block A and node b in block B]
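The lemma can be checked by brute force. The minimal Python sketch below, which uses a hypothetical 4-node graph, exchanges every pair (a, b) and compares the actual drop in cut size with Da + Db − 2cab. The −2cab term appears because the edge (a, b) is counted in both Ea and Eb, yet it is still cut after the exchange.

```python
from itertools import product


def cut_size(C, A, B):
    return sum(C[a][b] for a in A for b in B)


def d_value(C, node, own, other):
    """D = E - I for `node`, which lies in block `own`."""
    external = sum(C[node][v] for v in other)
    internal = sum(C[node][v] for v in own if v != node)
    return external - internal


def swap_gain(C, A, B, a, b):
    """Lemma: g_ab = D_a + D_b - 2 c_ab."""
    return d_value(C, a, A, B) + d_value(C, b, B, A) - 2 * C[a][b]


# Verify the lemma for every exchange on a hypothetical 4-node graph
C = [[0, 1, 1, 1],
     [1, 0, 0, 1],
     [1, 0, 0, 1],
     [1, 1, 1, 0]]
A, B = {0, 1}, {2, 3}
for a, b in product(A, B):
    new_A, new_B = (A - {a}) | {b}, (B - {b}) | {a}
    assert cut_size(C, A, B) - cut_size(C, new_A, new_B) == swap_gain(C, A, B, a, b)
print(swap_gain(C, A, B, 0, 3))   # 0
```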
K-L Contd
▪ Solve Example 2-2 from the textbook by Sait and Youssef (page 53)
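For checking hand-worked examples, the K-L procedure can be sketched in Python. Each pass greedily picks the best unlocked pair (a, b) by gab, tentatively swaps and locks the pair, and finally keeps the prefix of swaps with the largest positive cumulative gain. Passes repeat until no positive gain remains. The following points are illustrative choices of this sketch, not taken from the slides: D values are recomputed from scratch instead of being updated incrementally, ties go to the smallest node numbers, and the two-triangle test graph is hypothetical.

```python
def kl_pass(C, A, B):
    """One Kernighan-Lin pass; returns the improved (A, B) and its total gain."""
    A, B = set(A), set(B)
    n = len(C)
    side = {v: 0 for v in A}
    side.update({v: 1 for v in B})

    def D(v):
        # D = E - I with respect to the tentative sides of this pass
        return sum(C[v][u] if side[u] != side[v] else -C[v][u]
                   for u in range(n) if u != v)

    locked, swaps, gains = set(), [], []
    for _ in range(min(len(A), len(B))):
        best = None
        for a in sorted(A - locked):
            for b in sorted(B - locked):
                g = D(a) + D(b) - 2 * C[a][b]
                if best is None or g > best[0]:
                    best = (g, a, b)
        g, a, b = best
        side[a], side[b] = 1, 0          # tentative swap, then lock both nodes
        locked |= {a, b}
        swaps.append((a, b))
        gains.append(g)

    # Keep the prefix of swaps with the largest positive cumulative gain
    best_k = best_total = total = 0
    for k, g in enumerate(gains, start=1):
        total += g
        if total > best_total:
            best_k, best_total = k, total
    for a, b in swaps[:best_k]:
        A, B = (A - {a}) | {b}, (B - {b}) | {a}
    return A, B, best_total


def kernighan_lin(C, A, B):
    """Repeat passes until a pass yields no positive gain."""
    while True:
        A, B, gain = kl_pass(C, A, B)
        if gain <= 0:
            return A, B


def matrix(n, edges):
    C = [[0] * n for _ in range(n)]
    for u, v in edges:
        C[u][v] += 1
        C[v][u] += 1
    return C


# Hypothetical graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3
C = matrix(6, [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
A, B = kernighan_lin(C, {0, 1, 3}, {2, 4, 5})
print(sorted(A), sorted(B))   # [0, 1, 2] [3, 4, 5]
```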
Fiduccia Mattheyses (FM) Algorithm
❖C. M. Fiduccia and R. M. Mattheyses were researchers at General
Electric Research and Development Center in Schenectady, NY
❖Paper titled “A Linear-Time Heuristic for Improving Network
Partitions” was presented at the 1982 Design Automation Conference
Features of FM Algorithm
▪ Modification of KL Algorithm
▪ Iterative graph partitioning heuristic
▪ Operates in multiple passes
▪ Move one vertex at a time to improve the cut
▪ Innovative use of data structures makes this heuristic very efficient
Features of FM Algorithm
▪ Similarities to KL
▪ Works in passes - iteratively improves the partitions
▪ Locks nodes after a move
▪ Differences from KL
▪ Does not exchange pairs of nodes; moves only one node at a time
▪ Uses a gain bucket data structure
• Definition: The gain of a vertex is the reduction in cut size obtained when it
is moved to the other partition
[Figure: example vertices with gains +2, +1, 0 and −1]
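FM works on nets that may have more than two pins. A common way to compute the gain of a cell is to count the nets that become uncut (the cell is the net's only cell on its side) minus the nets that become cut (the whole net is on the cell's side). The minimal Python sketch below follows that formulation on a hypothetical netlist and checks the gains against a brute-force count of cut nets.

```python
def fm_gain(v, nets, side):
    """Gain of moving cell v to the other block (reduction in cut nets).

    nets: list of sets of cell names; side: dict cell -> block 0 or 1.
    """
    frm = side[v]
    gain = 0
    for net in nets:
        if v not in net or len(net) < 2:
            continue                      # a single-pin net is never cut
        on_from_side = sum(1 for c in net if side[c] == frm)
        if on_from_side == 1:             # v is the net's only cell on its side:
            gain += 1                     # moving v makes the net uncut
        elif on_from_side == len(net):    # the whole net is on v's side:
            gain -= 1                     # moving v makes the net cut
    return gain


def cut_nets(nets, side):
    return sum(1 for net in nets if len({side[c] for c in net}) > 1)


# Hypothetical netlist: cell a has gain +2, cell d has gain 0
nets = [{"a", "b"}, {"a", "c"}, {"a", "d", "e"}]
side = {"a": 0, "b": 1, "c": 1, "d": 0, "e": 1}
print(fm_gain("a", nets, side), fm_gain("d", nets, side))   # 2 0
```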
Details of FM Algorithm
▪ The data structure used for choosing the next vertex to be moved is
shown below
[Figure: BUCKET array indexed from +pmax down to −pmax, with a MAX GAIN pointer and a list of vertices in each bucket, alongside a VERTEX array indexed 1, 2, 3, …, n]
Details of FM Algorithm
▪ Each component is represented as a vertex that can be moved
▪ The vertex gain is an integer and each vertex has its gain in the range −pmax to +pmax, where pmax is the maximum vertex degree in the graph
▪ Since vertex gains have restricted values, bucket sorting can be used to
maintain a sorted list of vertex gains in an array BUCKET[−pmax … +pmax]
▪ The kth entry contains a doubly linked list of free vertices with gains that
are currently equal to k
▪ Two such arrays are needed - one for each partition
▪ Each array is maintained by moving a vertex to the appropriate bucket
whenever its gain changes due to the movement of one of its neighbors
▪ Direct access to each vertex from a separate field in the VERTEX array
allows removal of a vertex from its current list and its movement to the
head of its new bucket list in constant time
Details of FM Algorithm
▪ As only free vertices are allowed to move, only their gains are updated
▪ Whenever a base vertex is moved, it is locked and removed from its bucket
list and placed on a FREE VERTEX LIST, which is later used to reinitialize the
BUCKET array for the next pass.
▪ The FREE VERTEX LIST saves a lot of work when a large number of vertices
have permanent block assignments and are not allowed to move any more
▪ Each BUCKET has a MAXGAIN index which is used to keep track of the
bucket having a vertex of highest gain.
▪ This index is updated by decrementing it whenever its bucket is found to be
empty and resetting it to a higher bucket whenever a vertex moves to a
bucket above MAXGAIN
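The bucket array and its MAXGAIN index can be sketched in Python (3.8+ for `reversed` on a dict). The class name and methods are hypothetical. One assumption to note: each bucket is an insertion-ordered dict used as a set, standing in for the doubly linked lists, because a dict also supports insertion and removal by key in (average) constant time.

```python
class GainBuckets:
    """BUCKET[-pmax .. +pmax] for one block, with a MAXGAIN index."""

    def __init__(self, pmax):
        self.pmax = pmax
        self.buckets = {g: {} for g in range(-pmax, pmax + 1)}
        self.gain_of = {}            # cell -> current gain (role of the VERTEX array)
        self.max_gain = -pmax        # MAXGAIN index

    def insert(self, cell, gain):
        self.buckets[gain][cell] = None
        self.gain_of[cell] = gain
        if gain > self.max_gain:     # a cell moved above MAXGAIN: reset it upward
            self.max_gain = gain

    def remove(self, cell):
        del self.buckets[self.gain_of.pop(cell)][cell]

    def update(self, cell, new_gain):
        """Called when a neighbor's move changes this free cell's gain."""
        self.remove(cell)
        self.insert(cell, new_gain)

    def pop_max(self):
        """Remove and return (cell, gain) of a highest-gain cell, or None if empty."""
        while self.max_gain > -self.pmax and not self.buckets[self.max_gain]:
            self.max_gain -= 1       # decrement MAXGAIN past empty buckets
        bucket = self.buckets[self.max_gain]
        if not bucket:
            return None
        cell = next(reversed(bucket))    # most recently inserted = head of the list
        gain = self.max_gain
        self.remove(cell)
        return cell, gain


b = GainBuckets(pmax=3)
for cell, gain in [("a", 1), ("b", -2), ("c", 3), ("d", 1)]:
    b.insert(cell, gain)
print(b.pop_max())       # ('c', 3)
b.update("b", 2)
print(b.pop_max())       # ('b', 2)
print(b.pop_max())       # ('d', 1)
```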
Pseudo Code for F-M Algorithm
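A simplified FM pass can be sketched in Python. Assumptions made here, none of which come from the slides: cells have unit area, the balance criterion allows block sizes to differ by at most `max_imbalance`, ties go to the earliest cell in the input order, and gains are recomputed from scratch at every step instead of being kept in the bucket array. That last choice makes the sketch quadratic rather than linear time. As in the description above, each moved (base) cell is locked, and the best prefix of moves is kept.

```python
def fm_gain(v, nets, side):
    """Reduction in cut nets if cell v moves to the other block."""
    g = 0
    for net in nets:
        if v in net and len(net) > 1:
            same = sum(1 for c in net if side[c] == side[v])
            g += (same == 1) - (same == len(net))
    return g


def cut_nets(nets, side):
    return sum(1 for net in nets if len({side[c] for c in net}) > 1)


def fm_pass(nets, side, max_imbalance=2):
    """One simplified FM pass; returns the new sides and the total gain kept."""
    side = dict(side)
    cells = list(side)                        # fixed order for tie-breaking
    size = [sum(1 for c in cells if side[c] == blk) for blk in (0, 1)]
    free, moves, gains = set(cells), [], []
    while free:
        best = None
        for v in cells:
            if v not in free:
                continue
            frm = side[v]
            # balance criterion on |size0 - size1| after the move
            if abs((size[frm] - 1) - (size[1 - frm] + 1)) > max_imbalance:
                continue
            g = fm_gain(v, nets, side)
            if best is None or g > best[0]:
                best = (g, v)
        if best is None:                      # no free cell can move legally
            break
        g, v = best
        size[side[v]] -= 1
        size[1 - side[v]] += 1
        side[v] = 1 - side[v]                 # move the base cell and lock it
        free.discard(v)
        moves.append(v)
        gains.append(g)
    # Keep the best prefix of moves and undo the rest
    best_k = best_total = total = 0
    for k, g in enumerate(gains, start=1):
        total += g
        if total > best_total:
            best_k, best_total = k, total
    for v in moves[best_k:]:
        side[v] = 1 - side[v]
    return side, best_total


# Hypothetical netlist: triangles {0,1,2} and {3,4,5} joined by net {2,3}
nets = [{0, 1}, {1, 2}, {0, 2}, {3, 4}, {4, 5}, {3, 5}, {2, 3}]
start = {0: 0, 1: 0, 3: 0, 2: 1, 4: 1, 5: 1}
result, gain = fm_pass(nets, start)
print(cut_nets(nets, start), gain, cut_nets(nets, result))   # 5 4 1
```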
Simulated Annealing
▪ Most widely used iterative technique for solving combinatorial
optimization problems
▪ It is an adaptive heuristic and belongs to the class of nondeterministic algorithms
▪ First introduced in 1983 by Kirkpatrick, Gelatt, and Vecchi.
▪ Inspired by the metallurgical process of carefully cooling molten
metals in order to obtain a good crystal structure
▪ In annealing the metal is heated to a very high temperature, and then
slowly cooled at a proper rate to get proper crystal structure
Simulated Annealing
▪ Every combinatorial optimization problem is a search problem through the
state space of the combinatorial elements involved
▪ An iterative improvement scheme starts with some given state in the
search space and examines a local neighborhood of states for a better
solution
▪ A local neighborhood of a state S is the set of all states which can be
reached from S by making a small change to S
Simulated Annealing
▪ When all the local neighbors have inferior costs, the algorithm is said
to have converged to a local or global optimum point in the search space
▪ Simulated Annealing is a non-greedy algorithm and can climb out of
local optimum points
▪ An algorithm that has converged to a local optimum cannot find the global optimum unless it can
climb the hill out of that local optimum
▪ The core of the algorithm is known as the Metropolis procedure, which
simulates the annealing process at a given temperature T
▪ Metropolis (named after Nicholas Metropolis) receives as input the current
solution S and a value M, which is the amount of time for which
annealing must be applied at temperature T
Simulated Annealing
▪ Amount of time spent in annealing at a given temperature is gradually
increased as temperature itself is lowered
▪ This is done using the parameter β > 1
▪ The variable Time keeps track of the time being spent in each call to
the Metropolis
▪ The annealing procedure stops when Time exceeds the allowed time
▪ The pseudo code for the algorithm is given on the next slide
Simulated Annealing
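The annealing loop described above can be sketched in Python. The loop runs Metropolis for M trial moves at temperature T, adds M to Time, then cools T and multiplies M by β > 1 until Time exceeds the allowed time. The cooling factor `alpha`, all default parameter values, and the toy cost and neighbor functions are assumptions chosen for illustration.

```python
import math
import random


def metropolis(s, best, cost, neighbor, T, M, rng):
    """Anneal for M trial moves at fixed temperature T."""
    for _ in range(M):
        new = neighbor(s, rng)
        delta = cost(new) - cost(s)
        # Always accept improvements; accept uphill moves with prob. exp(-delta/T)
        if delta < 0 or rng.random() < math.exp(-delta / T):
            s = new
            if cost(s) < cost(best):
                best = s
    return s, best


def simulated_annealing(s0, cost, neighbor, T0=10.0, alpha=0.9, beta=1.1,
                        M=20, max_time=2000, seed=0):
    rng = random.Random(seed)
    s, best, T, time = s0, s0, T0, 0
    while time < max_time:
        s, best = metropolis(s, best, cost, neighbor, T, int(M), rng)
        time += int(M)
        T *= alpha          # cooling (alpha < 1 is an assumed schedule)
        M *= beta           # anneal longer at lower temperatures (beta > 1)
    return best


# Toy problem (hypothetical): minimize (x - 7)^2 over the integers
def cost(x):
    return (x - 7) ** 2


def neighbor(x, rng):
    return x + rng.choice((-1, 1))


print(simulated_annealing(0, cost, neighbor))
```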
Partitioning Using Simulated Annealing
▪ Once again we consider a two-way partitioning problem
▪ Requirement is to generate an almost balanced partition with minimum
cutset
▪ To use SA to solve this problem, first we need a cost function which can
represent both the balance criteria as well as the cutset. Define:
Partitioning Using Simulated Annealing
▪ We can define cost function as:
▪ Ws and Wc are constants in the range of [0,1]
▪ These constants indicate the importance given to the imbalance
between the partitions and the cutset of the edges between the two
partitions
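The defining formulas for the imbalance and cutset terms are not reproduced in this text. The minimal Python sketch below therefore assumes a weighted sum Cost = Wc · cutset + Ws · imbalance, with the imbalance taken as the absolute difference of the two block areas. Both the form and the default weights are hypothetical choices, not the slide's definition.

```python
def partition_cost(nets, side, area, Wc=0.5, Ws=0.5):
    """Assumed SA cost: Wc * cutset + Ws * imbalance."""
    cutset = sum(1 for net in nets if len({side[c] for c in net}) > 1)
    block_area = [0, 0]
    for cell, block in side.items():
        block_area[block] += area[cell]
    imbalance = abs(block_area[0] - block_area[1])   # assumed imbalance measure
    return Wc * cutset + Ws * imbalance


nets = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
area = {"a": 1, "b": 1, "c": 1, "d": 1}
balanced = {"a": 0, "b": 0, "c": 1, "d": 1}
lopsided = {"a": 0, "b": 0, "c": 0, "d": 1}
print(partition_cost(nets, balanced, area))   # 0.5  (cutset 1, imbalance 0)
print(partition_cost(nets, lopsided, area))   # 1.5  (cutset 1, imbalance 2)
```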
Neighbor Function for Partition Problem with SA
▪ Simplest neighbor function that can be used is the pairwise exchange
mechanism of K-L algorithm
▪ Other neighbor functions that can be used:
▪ Select those components whose contribution to the external cost is high
▪ Select those elements that have the minimum internal connections
▪ SA imposes no restrictions on the form of the cost function or the neighbor
selection function
Partition Problem Using Genetic Algorithms
▪ Developed by John Holland of the University of Michigan, drawing on
ideas from natural evolution
▪ Probabilistic transition rules are used
▪ Similar to SA, this too is a non-deterministic algorithm
▪ Allows hill climbing to get out of locally optimum points in the search
space
▪ How a solution point is represented (encoded) is critical for modeling it accurately
▪ Searches for the optimum from a population of points rather than from a single point
▪ GA has no memory, just characteristics carried over from one
generation to the next
Steps involved in Genetic Algorithms
▪ The reproductive process of GA is described by the following steps:
▪ Natural Selection: Similar to natural evolution, this conceptualizes
survival of the fittest. The algorithm chooses two dissimilar parents
with high fitness to survive from one generation to the next
▪ Crossover: The biological equivalent of mating and producing
offspring. Solutions that have a higher fitness are more likely to be
chosen for crossover. In crossover, properties of both parent
solutions are combined for the offspring
▪ Mutation: This is similar to mutation as seen in evolution and used to
maintain diversity in the population of solutions. Similar to evolution,
mutation occurs in GA with a very low probability
Steps involved in Genetic Algorithms
▪ Fitness or Cost Function: This is used for evaluating the fitness of any
solution (members of the population). The results are used to
compare two individual solutions. For GA, the cost function can be
exactly what it was in SA.
▪ Population Update: This process replaces the old population with
fresh individuals to obtain a higher-quality population and maintain
diversity
▪ Maintaining some good solutions from past generations is important.
To avoid the loss of the parent solutions in the new generation, some
of the highly fit solutions are moved over to the next generation.
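Crossover and mutation on 0/1 strings can be sketched in Python. The slides do not fix a crossover operator, so the sketch assumes a one-point crossover (the tails of the two parents are exchanged after a cut position). Mutation flips each bit independently with a small probability pm.

```python
import random


def one_point_crossover(p1, p2, point):
    """Exchange the tails of two parent strings after position `point`."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def mutate(chromosome, pm, rng):
    """Flip each bit independently with (small) probability pm."""
    return [1 - bit if rng.random() < pm else bit for bit in chromosome]


rng = random.Random(42)
child1, child2 = one_point_crossover([0, 0, 0, 0, 1, 1], [1, 1, 1, 1, 0, 0], 3)
print(child1, child2)   # [0, 0, 0, 1, 0, 0] [1, 1, 1, 0, 1, 1]
print(mutate(child1, 0.005, rng))
```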
Genetic Algorithms for Partitioning (TWPP)
▪ An initial population of p solutions can be generated randomly.
▪ A string representation would be good for a TWPP
▪ The length of the string would be the same as the number of components in the
design
▪ Each position in the string is either a 0 or a 1, indicating whether the
component associated with that index is assigned to partition 0 or partition 1
▪ For example: In a string 0 1 1 1 1 0 0 0
Partition 0 would consist of components n0, n5, n6, n7, while
partition 1 would consist of n1, n2, n3, and n4
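Decoding a chromosome into the two partitions is a single scan over the string. The minimal Python sketch below reproduces the example above. The component names n0, n1, … follow the slide, and the function name is illustrative.

```python
def decode(chromosome: str):
    """Map a 0/1 string to the two partitions; position i is component n_i."""
    bits = chromosome.split()
    partition0 = [f"n{i}" for i, bit in enumerate(bits) if bit == "0"]
    partition1 = [f"n{i}" for i, bit in enumerate(bits) if bit == "1"]
    return partition0, partition1


print(decode("0 1 1 1 1 0 0 0"))
# (['n0', 'n5', 'n6', 'n7'], ['n1', 'n2', 'n3', 'n4'])
```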
Genetic Algorithms for Partitioning (TWPP)
▪ For GA the cost function would have to give proper weight to
both the imbalance factor and the cutset factor.
▪ GA can still allow a string that has all 0’s or all 1’s, but the cost
function would ensure that it does not make it to the next generation.
▪ In every generation and from one generation to the next, it is
important to keep track of the best solution up to that point
▪ The population size p per generation should be fixed (typically 30–50). The
generation count G is relatively large, around 1000
▪ Mutation probability should be kept very small (0.005). It can be
computed as $P_m = 1 - 0.9^{p/G}$
▪ Stopping criteria can be generation count (G)
Genetic Algorithms for Partitioning - Flow Chart
Source: “Multi-objective module partitioning design for dynamic
and partial reconfigurable system-on-chip using genetic algorithm” by
Nithiyanantham Janakiraman and Palanisamy Nirmal Kumar, Journal of
Systems Architecture, Elsevier
Floorplanning
▪ At the floorplanning stage, the VLSI circuit is seen as a set of
rectangular blocks interconnected by signal nets
▪ These rectangular blocks are placed on a two-dimensional surface
such that no two blocks overlap while optimizing certain objectives
▪ During floorplanning the overall area estimate is obtained, the pin
and pad lo
• Problem Formulation for Floorplanning
➢ Input: Blocks B1, B2, …, Bn of circuits with areas A1, A2, …, An, respectively.
Associated with each block Bi are lower and upper bounds ri and si on its
aspect ratio.
➢ Output: Determine the location of each block Bi along with its width and
height. In addition to finding the location and shape, the floorplanning
algorithm has to generate a valid placement that optimizes one of the following
objectives:
✓ Minimize area
✓ Minimize wirelength
✓ Maximize routability
✓ Minimize delays, or
✓ A combination of two or more of the above criteria
• Minimize area
• Find a feasible floorplan with the smallest overall area.
• Falls into the category of generalized two-dimensional bin-packing problem
• Even this simplified version of the floorplanning problem has been shown to
be NP-hard [B. S. Baker, et al., “Orthogonal packing in two dimensions”, SIAM J.
Comput., 9:846–855, 1980].
• Several polynomial-time approximation algorithms exist.
• Minimize wirelength
• Find a feasible floorplan with minimum overall interconnect length
• A coarse measure of wirelength is used during floorplan
• All I/O pins of the block are merged and assumed to reside in the center
• Overall wirelength is calculated as $L = \sum_{i,j} C_{i,j} \cdot D_{i,j}$, where Ci,j is the connectivity
between blocks i and j, and Di,j is the Manhattan distance between the
centers of the rectangles i and j.
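The coarse wirelength measure merges each block's pins at its center and sums connectivity-weighted Manhattan distances. The following is a minimal Python sketch. The block coordinates and connectivity values are hypothetical, and each block is given by its lower-left corner and size.

```python
def floorplan_wirelength(blocks, conn):
    """L = sum of C_ij * D_ij over connected block pairs.

    blocks: name -> (x, y, width, height), with (x, y) the lower-left corner.
    conn:   {(i, j): C_ij}, each unordered pair listed once.
    """
    def center(name):
        x, y, w, h = blocks[name]
        return x + w / 2, y + h / 2

    total = 0.0
    for (i, j), c in conn.items():
        (xi, yi), (xj, yj) = center(i), center(j)
        total += c * (abs(xi - xj) + abs(yi - yj))   # Manhattan distance of centers
    return total


blocks = {"A": (0, 0, 2, 2), "B": (2, 0, 2, 2), "C": (0, 2, 4, 1)}
conn = {("A", "B"): 3, ("A", "C"): 1}
print(floorplan_wirelength(blocks, conn))   # 8.5
```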
Floorplanning
• Assume we have five blocks with dimensions as given in the table; some
feasible FPs are shown below.
• All these FPs have the same area. If area is the only cost function, all these
FPs are equally good.
Module   Width   Height
1        1       1
2        1       1
3        2       1
4        1       2
5        1       3
Another example of Floorplan
• In this, area and wirelength can be used in cost function
• Many feasible solutions exist and finding the optimal solution is once
again an NP-hard problem
Terminology
• Rectangular Dissection:
• It is a subdivision of a given rectangle by a finite number of horizontal and
vertical line segments into a finite number of non-overlapping rectangles.
• Slicing Structure:
• A rectangular dissection that can be obtained by iteratively subdividing
rectangles horizontally or vertically into smaller rectangles
• Slicing Tree:
• A slicing structure can be modeled by a binary tree with n leaves and n − 1
internal nodes, where each internal node represents a vertical or horizontal cutline and each
leaf a basic rectangle. A slicing tree is also known as a slicing floorplan tree.
• A skewed slicing tree is one in which no internal node has a right child with the same cut type (H or V).
Slicing Tree Example
Slicing and Non-slicing Floorplans
• A FP that corresponds to a slicing structure is called a slicing FP;
otherwise, it is called a nonslicing floorplan
• The simplest nonslicing floorplans are known as wheels. A wheel, with four rectangles arranged in a pinwheel around a central one, is the smallest
nonslicing floorplan
Floor Planning Algorithms
• Classification of Floorplanning Algorithms
➢ Constructive
▪ Attempt to build a feasible solution by starting from a seed module, then adding in other
modules to the partial floorplan. Example: Cluster Growth
➢ Iterative
▪ Start with initial floorplan, which is perturbed to obtain another feasible floorplan until
no further improvements can be obtained. Example: Simulated Annealing, GA
➢ Knowledge Based
▪ As the name suggests, an expert system is built with the help of
human experts who understand the system well enough to suggest where certain
rectangles should be placed.
Cluster Growth Algorithm
• Greedy algorithm. Floorplan is constructed one module at a time until each
module is assigned to a location of the floorplan
• A seed module is selected and placed into the lower left corner of the floorplan
• The remaining modules are selected one at a time and added to the partial
floorplan, while trying to grow evenly
• In example below, module a is placed in lower left corner, modules b and c are
placed so that the increase to the floorplan dimensions is minimum
Module Selection for Cluster Growth Algorithm
• The ordering of a particular module m depends on the types of nets
attached to m
• Three categories of nets:
• Terminating Nets - have no other incident blocks that are unplaced
• New Nets - have no pins on any module from the partially constructed FP
• Continuing Nets - have at least one pin on a module from the partial FP and at least one pin
on an unplaced module
• The module that completes the greatest number of “unfinished” nets
should be placed first. (Read Example 3.2 VLSI Physical Design Automation: Theory and Practice by Sait
and Youssef)
Cluster Growth Algorithm
• A seed module is selected and placed into the lower left corner of the floorplan
• Remaining modules are selected one at a time in an ordered way
• To determine the order the modules are organized into a linear list which
minimizes the number of nets that will be cut by any vertical line drawn between
any consecutive modules in the linear order
• The linear ordering algorithm starts with a seed module, and enters a repeat loop
• At each iteration, a gain function is computed for each module in the set of
remaining unordered modules.
• Module with the maximum gain is selected for inclusion
• In case of a tie, the module that terminates the maximum number of started nets is selected
• If this is also a tie, the module that is connected to the largest number of continuing nets is
selected
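The linear ordering loop can be sketched in Python. The gain function itself is not spelled out above, so the sketch assumes gain = (number of nets the module terminates) − (number of new nets it starts). Ties are broken as described: first by the number of terminated nets, then by the number of continuing nets. The netlist is hypothetical.

```python
def linear_order(modules, nets, seed):
    """Greedy linear ordering of modules (assumed gain: terminated - new nets)."""
    order = [seed]
    remaining = [m for m in modules if m != seed]
    while remaining:
        placed = set(order)

        def score(m):
            terminated = new = continuing = 0
            for net in nets:
                if m not in net or len(net) < 2:
                    continue
                others = net - {m}
                already = others & placed
                if already == others:
                    terminated += 1          # m is the last pin of this net
                elif not already:
                    new += 1                 # m starts a new net
                else:
                    continuing += 1
            return (terminated - new, terminated, continuing)

        best = max(remaining, key=score)     # first maximum wins on full ties
        order.append(best)
        remaining.remove(best)
    return order


nets = [{"a", "b"}, {"b", "c"}, {"a", "c", "d"}, {"c", "d"}]
print(linear_order(["a", "d", "c", "b"], nets, "a"))   # ['a', 'b', 'c', 'd']
```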
Cluster Growth Algorithm
Simulated Annealing
• First an initial solution for the floorplan is selected – may be done
randomly or using one of the deterministic algorithms
• Controlled walk through the search space is performed until no
sizeable improvement can be made or we run out of time. How does
one ensure that there are no overlaps in intermediate solutions?
• Two approaches possible –
• Direct approach – SA is applied directly to the physical floorplan
• Indirect approach – SA is applied to an abstract representation of the actual
floorplan
Simulated Annealing – Indirect Approach
• Restricted to slicing floorplans
• Slicing floorplans have the disadvantage of extra dead space but the
advantage of computational ease
Simulated Annealing – Indirect Approach
• The hierarchical structure of a slicing structure can be represented by
a binary tree with n leaves representing the n basic modules, and (n –
1) nodes representing the dissection/slicing operations
• Postorder traversal of a slicing tree produces a Polish expression
with operators H and V, and with operands 1, 2, … n
• In postorder traversal of a binary tree, the tree is traversed by visiting
at each node the left subtree, the right subtree, and then the node
itself
• Since there is only one way of performing a postorder traversal of a
binary tree, there is a one-to-one correspondence between a slicing tree
and its Polish expression; normalized Polish expressions (skewed slicing trees) correspond one-to-one with slicing floorplans
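The postorder traversal that turns a slicing tree into a Polish expression can be sketched recursively in Python. The `Node` class and the small example tree are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str                         # module number for a leaf, "H" or "V" for a cut
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def polish_expression(node: Optional[Node]) -> list:
    """Postorder traversal: left subtree, right subtree, then the node itself."""
    if node is None:
        return []
    return polish_expression(node.left) + polish_expression(node.right) + [node.label]


# Hypothetical slicing tree: V(H(1, 2), 3)
tree = Node("V", Node("H", Node("1"), Node("2")), Node("3"))
print(" ".join(polish_expression(tree)))   # 1 2 H 3 V
```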
Polish expression
• The diagram below shows an example of a rectangular dissection and its
corresponding slicing tree
• The operators H and V have the following meanings
• ijH means rectangle j on top of rectangle i
• ijV means rectangle i to the left of rectangle j
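With these meanings, a Polish expression can be evaluated with a stack to get the bounding width and height of the slicing floorplan. In the minimal Python sketch below, ijH stacks j on top of i (widths take the max, heights add) and ijV places i left of j (widths add, heights take the max). The module sizes come from the five-block table earlier. Modules are assumed not to rotate.

```python
def floorplan_size(expression, dims):
    """Width and height of the slicing floorplan described by a Polish expression.

    dims: module -> (width, height); modules are assumed not to rotate.
    """
    stack = []
    for token in expression.split():
        if token in ("H", "V"):
            wj, hj = stack.pop()          # right operand j
            wi, hi = stack.pop()          # left operand i
            if token == "H":              # ijH: j on top of i
                stack.append((max(wi, wj), hi + hj))
            else:                         # ijV: i to the left of j
                stack.append((wi + wj, max(hi, hj)))
        else:
            stack.append(dims[token])
    assert len(stack) == 1, "not a valid Polish expression"
    return stack[0]


# Module dimensions (width, height) from the five-block table
dims = {"1": (1, 1), "2": (1, 1), "3": (2, 1), "4": (1, 2), "5": (1, 3)}
print(floorplan_size("1 2 V 3 H 4 5 V H", dims))   # (2, 5): area 10
print(floorplan_size("1 2 V 3 H 4 5 V V", dims))   # (4, 3): area 12
```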
Algorithm for post order traversal of binary tree
• A Polish expression $E = e_1 e_2 \ldots e_{2n-1}$ is called normalized iff E has no
two consecutive identical operators (no HH or VV)
Simulated Annealing – Indirect Approach
• Floorplan solutions are represented by normalized Polish expressions
• Three types of perturbations are possible in Polish expression:
M1: Swap two adjacent operands
M2: Complement some chain of nonzero length (V => H; and H => V)
M3: Swap an adjacent operand and operator
• Two normalized Polish expressions are called neighbors if one can be
obtained from the other using one of the above three moves
• The algorithm has to ensure that neighbors of normalized expressions
are also normalized. M1 and M2 always produce a normalized
expression. If M3 produces an expression that is not normalized, or that violates the balloting property (every prefix must contain more operands than operators), that
move is rejected.
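The three moves can be sketched on token lists in Python. The function names are hypothetical. M3 is accepted only when the result is still a valid, normalized Polish expression. For M2, the caller is assumed to pass the first index of an operator chain.

```python
OPERATORS = {"H", "V"}


def is_normalized_polish(e):
    """Balloting property plus no two identical consecutive operators."""
    operands = operators = 0
    for k, t in enumerate(e):
        if t in OPERATORS:
            operators += 1
            if operators >= operands:             # balloting property violated
                return False
            if k > 0 and e[k - 1] == t:           # HH or VV: not normalized
                return False
        else:
            operands += 1
    return operators == operands - 1


def m1(e, i):
    """M1: swap the i-th and (i+1)-th operands (operators are skipped)."""
    pos = [k for k, t in enumerate(e) if t not in OPERATORS]
    e = list(e)
    a, b = pos[i], pos[i + 1]
    e[a], e[b] = e[b], e[a]
    return e


def m2(e, start):
    """M2: complement the operator chain that begins at index `start`."""
    e = list(e)
    k = start
    while k < len(e) and e[k] in OPERATORS:
        e[k] = "V" if e[k] == "H" else "H"
        k += 1
    return e


def m3(e, k):
    """M3: swap tokens k and k+1 (one operand, one operator); None if rejected."""
    if (e[k] in OPERATORS) == (e[k + 1] in OPERATORS):
        return None
    new = list(e)
    new[k], new[k + 1] = new[k + 1], new[k]
    return new if is_normalized_polish(new) else None


e = "1 2 V 3 H".split()
print(m1(e, 1))   # ['1', '3', 'V', '2', 'H']
print(m2(e, 2))   # ['1', '2', 'H', '3', 'H']
print(m3(e, 1))   # None: '1 V 2 3 H' violates the balloting property
print(m3(e, 2))   # ['1', '2', '3', 'V', 'H']
```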
Simulated Annealing – Indirect Approach
• Checking that the new expression E does not contain two identical
consecutive operators is straightforward and achievable in O(1) time, since after an M3 move only the tokens next to the swapped pair need to be examined.
• To measure how good a floorplan solution is, we use a cost function
• Cost function usually measures and tries to minimize the overall area
of the floorplan and the overall interconnect length
$\text{Cost}(F) = \alpha A + \lambda W$
where A is the area of the smallest rectangle enveloping the n
modules, and W is the measure of the overall wire length.
α and λ are user-chosen weights that set the relative importance of area and wirelength.
Simulated Annealing – Example
Simulated Annealing – Actual Algorithm
• When using SA technique, there are several important decisions that
must be made:
• A choice of initial solution
• A choice of cooling schedule
• Initial temperature
• how long before reducing the temperature
• temperature reduction rate
• Perturbation function
• Termination condition for the algorithm
Simulated Annealing – Actual Algorithm
• A choice of initial solution – Can be randomly created as long as it is a
feasible solution.
• A choice of cooling schedule
• Initial temperature: Chosen such that the probability of accepting uphill
moves is high at the start.
• How long before reducing the temperature: At each temperature, trials
are attempted until either N uphill moves have been made or the total number of moves
exceeds 2N, where N is an increasing function of n, the number of modules
• Temperature reduction rate: The temperature is updated as T ← λT; a generally accepted value for this λ (not the wirelength weight of the cost function) is 0.85
• Perturbation function: the moves M1, M2 and M3 on normalized Polish expressions
• Termination condition for the algorithm: When number of accepted
moves is too small (≤ 5%) or when temperature is below 0.1*T0
Simulated Annealing – Pseudocode
Other Floorplanning Techniques
• Read pages 122 – 160 from “VLSI Physical Design Automation: Theory
and Practice” by Sait and Youssef
Placement
▪ Circuit placement is the process of determining the location of each
gate (or library cell) in the netlist
▪ Usually the objectives include area, wirelength, timing, congestion,
thermal hotspots, power consumption, power supply noise, and
routability
▪ Placement is usually done in two phases
▪ Global placement where the rough location of each gate is determined.
Overlap of cells is allowed at this stage.
▪ Detailed placement is the step where the cell location is hardened and
overlaps are removed
Complexity of Placement
▪ The placement of cells in order to minimize the total wirelength is an
NP-complete problem
▪ Even the simplest case of the problem – one-dimensional placement – is hard to solve.
▪ There are as many as $n!/2$ linear arrangements of n cells
▪ In practice, the number of cells to be placed can be several thousands or
more
▪ Enumerating all possibilities and selecting the best is impractical
▪ A number of good heuristic techniques have been developed
▪ Provide good solution, not necessarily the best solution
Placement – Problem Definition
▪ On board from Page 164 – Sait and Youssef
Placement – Cost Function
▪ A placement is acceptable if 100% routing can be achieved within a given area
▪ However, routing is the step after placement is completed, so performing actual
routing and comparing placements is impractical
▪ Instead, estimates are used: the “goodness” of a placement is assessed by
measuring its estimated wirelength
▪ A commonly used objective function is to minimize L(P), the total wirelength
over all signal nets for placement P.
Placement – Estimation of Wirelength
▪ Speed of estimation has a drastic effect on the performance of the
placement algorithm.
▪ Estimation error must be uniform across all nets
➢ Assumes Manhattan routing
➢ For a two-pin net connecting module i to module j, the Manhattan length is
given as rij + cij, where rij and cij are the number of rows and columns
separating the locations of the two modules
➢ Since not all nets are 2-pin nets, we have methods to estimate nets
connecting multiple pins
Placement – Methods for wirelength estimation
▪ Semi-perimeter Method
➢ Most widely used method. Find the smallest bounding box that encloses all
the pins of the net to be connected. Estimated wirelength would then be the
half-perimeter of that bounding box.
➢ For two- and three-pin nets, the half-perimeter equals the length of the minimum rectilinear Steiner tree, the most efficient wiring scheme; for nets with more pins it is a lower bound on that length.
▪ Complete Graph
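➢ For an n-pin net, the complete graph consists of n(n − 1)/2 edges
➢ A tree has (n − 1) edges, which is 2/n times the number of edges in the
complete graph. The estimated wirelength using this method is therefore $\frac{2}{n} \sum_{(i,j)} d_{ij}$, where the sum runs over all pairs of pins and $d_{ij}$ is the distance between pins i and j.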
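Both estimates are short computations over the pin coordinates. The minimal Python sketch below uses a hypothetical 3-pin net and Manhattan distances. For this net, the semi-perimeter (7) equals the minimum Steiner tree length.

```python
from itertools import combinations


def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def semi_perimeter(pins):
    """Half-perimeter of the smallest bounding box enclosing all pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def complete_graph_estimate(pins):
    """(2/n) times the total length of all n(n-1)/2 pin-to-pin edges."""
    n = len(pins)
    return 2 * sum(manhattan(p, q) for p, q in combinations(pins, 2)) / n


pins = [(0, 0), (4, 0), (2, 3)]              # a hypothetical 3-pin net
print(semi_perimeter(pins))                  # 7
print(complete_graph_estimate(pins))         # (2/3) * (4 + 5 + 5), about 9.33
```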
Placement – Methods for wirelength estimation
▪ Minimum Chain
➢ Nodes are assumed to be on a chain and each pin has at most two neighbors.
➢ Start from one vertex and connect it to the closest one, then to the next
closest, and so on until all the vertices are connected.
➢ This technique is simpler than MST but results in longer wirelength
▪ Source to Sink Connection
➢ Output of a cell is assumed to connect to all other points of the net (inputs of
other cells) by separate wires
➢ Simplest to implement but results in excessive wirelength
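The two simplest estimators can be compared on a hypothetical 4-pin net whose pins lie on a line, with the source at pins[0]. The greedy chain always moves to the nearest unvisited pin. Source-to-sink gives every sink its own wire, which roughly doubles the estimate here.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def chain_estimate(pins):
    """Greedy chain: from pins[0], repeatedly connect to the closest unvisited pin."""
    current, rest, total = pins[0], list(pins[1:]), 0
    while rest:
        nearest = min(rest, key=lambda p: manhattan(current, p))
        total += manhattan(current, nearest)
        rest.remove(nearest)
        current = nearest
    return total


def source_to_sink_estimate(pins):
    """pins[0] is the driving output; every sink gets its own wire from it."""
    return sum(manhattan(pins[0], p) for p in pins[1:])


pins = [(0, 0), (1, 0), (2, 0), (3, 0)]   # hypothetical net, source at (0, 0)
print(chain_estimate(pins), source_to_sink_estimate(pins))   # 3 6
```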
Placement – Methods for wirelength estimation
▪ Steiner Tree Approximation
➢ A minimum Steiner tree is the shortest interconnection for a set of pins
➢ Wires can branch from any point along its length to connect other pins
➢ Problem of finding minimum Steiner tree is itself NP-complete
▪ Minimum Spanning Tree
➢ Unlike Steiner tree, branching is allowed only at the pin locations
➢ For an n-pin net, the tree can be constructed by determining the distances
between all possible pairs of pins, and connecting the smallest (n-1) edges
that do not form a cycle.
➢ Kruskal’s algorithm finds the MST in polynomial time (O(m log m) for m candidate edges)
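Kruskal's algorithm on the complete graph of a net's pins can be sketched in Python with a small union-find structure. Edges are scanned shortest first, and an edge is kept only if it joins two different trees. The pin sets are hypothetical.

```python
from itertools import combinations


def mst_length(pins):
    """Kruskal's algorithm on the complete graph of pins (Manhattan weights)."""
    parent = list(range(len(pins)))

    def find(i):                              # union-find root with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (abs(p[0] - q[0]) + abs(p[1] - q[1]), i, j)
        for (i, p), (j, q) in combinations(enumerate(pins), 2)
    )
    total, used = 0, 0
    for w, i, j in edges:                     # shortest edges first
        ri, rj = find(i), find(j)
        if ri != rj:                          # joins two trees, so no cycle
            parent[ri] = rj
            total += w
            used += 1
            if used == len(pins) - 1:
                break
    return total


print(mst_length([(0, 0), (4, 0), (2, 3)]))            # 9 (the Steiner tree is 7)
print(mst_length([(0, 0), (2, 0), (0, 2), (2, 2)]))    # 6
```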
Comparison of wirelength estimation algorithms
Minimizing total wire length
▪ Main objective of placement is to give a solution that is completely
routable and the area taken by the routing wires to be minimum
▪ One way to accomplish this is to place the strongly connected cells close to
one another
▪ A commonly used objective function is
$L(P) = \sum_{n \in N} w_n \cdot d_n$
where dn = estimated length of net n
wn = weight associated with net n
N = set of nets
▪ Because each net length is estimated independently, ignoring interactions between nets, the resulting routing area is only a rough
estimate
Other cost functions
▪ Minimize maximum cut – read Sait and Youssef section 4.3.3
▪ Minimize maximum density – read Sait and Youssef section 4.3.4
▪ Maximize performance – read Sait and Youssef section 4.3.5
Placement Solution Approaches
▪ Since placement is in the class of NP-hard problems, heuristics can be
used to get “good” solutions even if not the best.
▪ Even for linear placement, the number of possible solutions grows factorially with “n”.
Example:
▪ With (n=3) blocks, there are 6 possible placement solutions
▪ With (n=4) blocks the possible solutions increases to 24
▪ Heuristic algorithms for placement are classified into
▪ Constructive algorithms
▪ Iterative algorithms
Constructive Placement Solution Approach
▪ The layout surface is imagined to be divided into “n” slots to place one cell in
each slot
▪ Place one cell at random into one of the slots. After that, the
algorithm repeatedly needs to make two decisions:
▪ which cell to place next
▪ where to place the selected cell
▪ A possible heuristic for selecting the next cell could be whichever cell is most
strongly connected to the already placed cell
▪ Suppose the partial placement has m1, m2, … mi cells already in layout
▪ Find all cells connected to any of the already placed cells
▪ For each such cell, compute its connectivity to the placed cells
▪ Select the one that is most strongly connected to any of the placed cells and place the new
cell close to that cell.
▪ This heuristic is known as maxcon (maximum connectivity) strategy
▪ By its very nature, this is a greedy algorithm
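The "which cell next" decision of maxcon can be sketched in Python. The choice of slot for each selected cell is left out. One assumption here: "most strongly connected" is read as the total connection weight to all already-placed cells, and ties are broken alphabetically. The connection weights are hypothetical.

```python
def maxcon_order(conn, seed):
    """Order cells by the maxcon heuristic.

    conn: symmetric dict of dicts, conn[u][v] = connection weight.
    """
    placed = [seed]
    unplaced = set(conn) - {seed}
    while unplaced:
        def strength(cell):
            # assumed measure: total weight of connections to placed cells
            return sum(conn[cell].get(p, 0) for p in placed)
        best = max(sorted(unplaced), key=strength)   # sorted: deterministic ties
        placed.append(best)
        unplaced.remove(best)
    return placed


conn = {
    "a": {"b": 3, "c": 1},
    "b": {"a": 3, "c": 2, "d": 1},
    "c": {"a": 1, "b": 2, "d": 4},
    "d": {"b": 1, "c": 4},
}
print(maxcon_order(conn, "a"))   # ['a', 'b', 'c', 'd']
print(maxcon_order(conn, "d"))   # ['d', 'c', 'b', 'a']
```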
Constructive Placement Algorithm
Partition based approaches
▪ Min-cut Placement: Similar to partitioning problem. Read Sait and
Youssef pp 179-189
Mincut Placement
▪ The given circuit is repeatedly partitioned into two sub-circuits
▪ Correspondingly, the layout region is divided either horizontally or
vertically to accommodate the sub-circuits
▪ Repeat this process until each partition is occupied by a single cell
▪ Should not create any overlaps, but if it does, post-process to
“legalize placements”
▪ Minimizing the cutset at each stage tends to minimize the overall
wirelength
▪ Original paper (by Breuer, 1977) presented several cut procedures
Example Mincut Placement
▪ Assume all gates are located at center of the placement region
▪ If a bi-partitioning cutline is inserted to divide the given block, the
gates are located at the center of the two sub-blocks
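Recursive bisection with "place at the center of the sub-block" can be sketched in Python. The following are stand-ins chosen to keep the example short, not Breuer's cut procedures: a brute-force balanced min-cut replaces a real partitioner such as K-L or FM, the cutline is drawn across the longer side of the region, and there is no terminal propagation. The netlist is hypothetical.

```python
from itertools import combinations


def min_bisection(cells, nets):
    """Brute-force balanced bisection minimizing cut nets (tiny inputs only)."""
    cells = sorted(cells)
    best_cut, best = None, None
    for left in combinations(cells, len(cells) // 2):
        L = set(left)
        R = set(cells) - L
        cut = sum(1 for net in nets if net & L and net & R)
        if best_cut is None or cut < best_cut:
            best_cut, best = cut, (L, R)
    return best


def mincut_place(cells, nets, region):
    """Recursive bisection; each cell ends up at the center of its final sub-block."""
    x0, y0, x1, y1 = region
    if len(cells) == 1:
        (cell,) = cells
        return {cell: ((x0 + x1) / 2, (y0 + y1) / 2)}
    L, R = min_bisection(cells, nets)
    if x1 - x0 >= y1 - y0:                    # vertical cutline across the wider side
        xm = x0 + (x1 - x0) * len(L) / len(cells)
        halves = [(L, (x0, y0, xm, y1)), (R, (xm, y0, x1, y1))]
    else:                                     # horizontal cutline
        ym = y0 + (y1 - y0) * len(L) / len(cells)
        halves = [(L, (x0, y0, x1, ym)), (R, (x0, ym, x1, y1))]
    placement = {}
    for part, sub_region in halves:
        placement.update(mincut_place(part, nets, sub_region))
    return placement


nets = [{"a", "b"}, {"c", "d"}, {"b", "c"}]
print(mincut_place({"a", "b", "c", "d"}, nets, (0, 0, 2, 2)))
```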
Practice Problem (“Practical Problems in VLSI Physical Design Automation” by Sung Kyu Lim pp 103-111)
Consider the gate-level netlist shown in
the Table. The Figure shows the undirected
graph model of the netlist, where the
thick and thin edges have weights of
1 and 0.5, respectively. The primary
inputs and outputs do not need to be
placed.
Practice Problem Contd.
Practice Problem Contd.
Gordian Algorithm for Placement
▪ This algorithm is based on quadratic programming (QP)
▪ The placement problem is formulated as a sequence of QP problems derived from
the connectivity information of the circuit
▪ At every iteration, a set of constraints restricts the freedom of movement of
the gates, which reduces overlap
▪ Top-down partitioning is performed so that the cells grouped into the same
partition satisfy a “center of gravity” constraint: the area-weighted center of
the cells in a partition must coincide with the center of its block
▪ This alternating sequence of QP and top-down partitioning repeats until the
partitions are small enough
▪ The goal is to minimize the total squared wirelength between cells, which is
why the objective function is quadratic
▪ Read pp. 112 - 121 from “Practical Problems in VLSI Physical Design Automation”
by Sung Kyu Lim
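The quadratic-programming core of this approach can be illustrated in one dimension. Setting the derivative of $\sum w_{ij}(x_i - x_j)^2$ with respect to each movable coordinate to zero gives a linear system $A\mathbf{x} = \mathbf{b}$, which the sketch below solves by Gaussian elimination. It is a simplified illustration, not GORDIAN itself: the center-of-gravity constraints and the partitioning loop are omitted, only two-pin connections are modeled, and fixed pads (hypothetical here) are required to anchor the solution; without them A is singular. The x and y coordinates are independent, so the same routine applies to y. All names are hypothetical.

```python
def quadratic_place_1d(movable, fixed, edges):
    """Minimize sum w * (x_u - x_v)^2 over the movable cell coordinates.

    movable: list of movable cell names
    fixed:   dict pad name -> fixed coordinate
    edges:   list of (u, v, w) two-pin connections with weight w
    """
    idx = {c: i for i, c in enumerate(movable)}
    n = len(movable)
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    # Zero gradient for cell i: sum_w * x_i - sum_(movable j) w * x_j = sum_(pads) w * x_pad
    for u, v, w in edges:
        for p, q in ((u, v), (v, u)):
            if p in idx:
                i = idx[p]
                A[i][i] += w
                if q in idx:
                    A[i][idx[q]] -= w
                else:
                    b[i] += w * fixed[q]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = b[i] - sum(A[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / A[i][i]
    return dict(zip(movable, x))


edges = [("P0", "a", 1.0), ("a", "b", 1.0), ("b", "P1", 1.0)]
print(quadratic_place_1d(["a", "b"], {"P0": 0.0, "P1": 10.0}, edges))
```

For a chain of equal-weight connections between pads at 0 and 10, the quadratic objective spreads the two cells evenly at 10/3 and 20/3, which also shows why a pure QP solution tends to cluster cells and needs the extra constraints described above.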
Iterative Methods for Placement
▪ Simulated Annealing is one of the most popular algorithms for
placement
▪ The perturbation function can be adapted to the placement
problem, and the “accept” criterion for a new placement must be redefined accordingly
▪ Some possible ways to perturb a placement:
▪ Select 2 neighboring cells and swap their positions
▪ Displace a randomly selected cell from its current position to a randomly
selected new position
▪ If layout rules allow, perturb a cell by rotating or mirroring it in its
current location
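The perturbation moves listed above might look like this for cells on a grid of slots. Assumptions: the layout is a hypothetical dict from (row, col) slots to cell names (None for an empty slot), "neighboring" means horizontally adjacent, and orientation is tracked in a separate dict of mirror flags. Each function returns a new placement, so a rejected move can simply be dropped. All function names are hypothetical.

```python
import random


def swap_neighbors(grid, rng):
    """Swap the cells in two horizontally adjacent occupied slots.

    grid: dict {(row, col): cell name or None}. Returns a new dict.
    """
    pairs = [((r, c), (r, c + 1)) for (r, c) in grid
             if grid[(r, c)] is not None and grid.get((r, c + 1)) is not None]
    new = dict(grid)
    if pairs:
        s, t = rng.choice(pairs)
        new[s], new[t] = new[t], new[s]
    return new


def displace(grid, rng):
    """Move a randomly selected cell to a randomly selected empty slot."""
    occupied = [s for s, cell in grid.items() if cell is not None]
    empty = [s for s, cell in grid.items() if cell is None]
    new = dict(grid)
    if occupied and empty:
        src, dst = rng.choice(occupied), rng.choice(empty)
        new[dst], new[src] = new[src], None
    return new


def mirror(orientation, cell):
    """Flip one cell's mirror flag; the cell keeps its location."""
    new = dict(orientation)
    new[cell] = not new.get(cell, False)
    return new


rng = random.Random(0)
grid = {(0, 0): "a", (0, 1): "b", (0, 2): None}
print(swap_neighbors(grid, rng), displace(grid, rng), mirror({}, "a"))
```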
Iterative Methods for Placement
▪ One possible measure of cost is the wirelength
Let
$\Delta h = C(S_{\text{new}}) - C(S)$, where C is the cost
The swap is accepted if $\Delta h < 0$ (i.e. $C(S_{\text{new}}) < C(S)$) or if the
acceptance function is true:
$\text{Acceptance Function} = \left(\text{random} < e^{-\Delta h / T}\right)$
where random is a random number generated between 0 and 1, and
T is the current value of the temperature
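The acceptance test above is the Metropolis criterion. A direct transcription, with `accept` as a hypothetical helper name and the random source passed in so runs can be reproduced:

```python
import math
import random


def accept(delta_h, T, rng=random):
    """Simulated-annealing acceptance test.

    delta_h: C(new S) - C(S); T: current temperature (> 0).
    """
    if delta_h < 0:
        return True  # an improving move is always accepted
    # An uphill (or equal-cost) move is accepted with probability e^(-delta_h / T).
    return rng.random() < math.exp(-delta_h / T)


rng = random.Random(42)
hot = sum(accept(1.0, 100.0, rng) for _ in range(1000))
cold = sum(accept(1.0, 0.1, rng) for _ in range(1000))
print(hot, cold)
```

At high temperature almost every uphill move passes the test; as T falls, $e^{-\Delta h/T}$ shrinks and the search becomes nearly greedy, accepting almost only improvements.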
Timberwolf Algorithm for Placement
▪ TimberWolf (TW) is one of the most widely used placement tools
▪ Internally, TW uses the Simulated Annealing algorithm
▪ Using the information provided by the user and the total width of the
standard cells to be placed, TW computes the target lengths of the
rows
▪ It then computes the initial positions of the cells to be placed, followed
by the initial placement of macro blocks, followed by the placement of pads
▪ Pads and macro blocks retain their initial placement; only the standard
cells change their positions during optimization
Timberwolf Algorithm for Placement
▪ Following initial placement, the algorithm has three distinct stages
▪ First stage: cells are placed so as to minimize the estimated wire
length
▪ Second stage: feedthrough cells are inserted into the layout as
required, wirelength is minimized again, and preliminary global
routing is performed
▪ Third stage: local changes are made in the placement to reduce the
number of wiring tracks required
▪ Objective function: Minimize the estimated interconnect length
Timberwolf Algorithm Perturb Function
▪ Since the underlying algorithm is SA, there must be a way to move
from one point in the solution space to another. Perturbation
is done in three different ways:
▪ Move a single cell to a new location
▪ Swap two cells
▪ Mirror a cell about the x-axis
▪ In TW, cell mirroring is used in only about 10% of cases, when a cell
move has been rejected
Timberwolf Algorithm Perturb Function
▪ When a single cell is moved to a new location or two cells are swapped,
cells may overlap or a cell may be pushed out of the layout region
▪ This is handled by adding a penalty term to the cost of moves that cause
cell overlaps or exceed the boundary
▪ The penalty makes such a move less attractive, so it is treated as a
bad move
▪ Cells can have an associated LOCKED flag. If the LOCKED flag is set to 1,
that cell cannot be selected for any further movement
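One way to express such a penalized cost, sketched under assumptions that are not TimberWolf's exact formulation: cells are rectangles in rows, the overlap penalty is the total horizontal overlap between cells in the same row, the boundary penalty is the cell length lying outside [0, row_width], both are scaled by a hypothetical `weight` and added to a wirelength estimate computed elsewhere, and a `locked` field stands in for the LOCKED flag. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    row: int
    x: float              # left edge
    width: float
    locked: bool = False  # stands in for LOCKED = 1: never selected for a move


def overlap_penalty(cells):
    """Total pairwise horizontal overlap between cells in the same row."""
    total = 0.0
    for i, a in enumerate(cells):
        for b in cells[i + 1:]:
            if a.row == b.row:
                total += max(0.0, min(a.x + a.width, b.x + b.width) - max(a.x, b.x))
    return total


def boundary_penalty(cells, row_width):
    """Total cell length lying outside the row span [0, row_width]."""
    total = 0.0
    for c in cells:
        inside = max(0.0, min(c.x + c.width, row_width) - max(c.x, 0.0))
        total += c.width - inside
    return total


def cost(cells, wirelength, row_width, weight=10.0):
    """Wirelength estimate plus weighted penalties (weight is an assumed knob)."""
    return wirelength + weight * (overlap_penalty(cells) + boundary_penalty(cells, row_width))


def movable(cells):
    """Only unlocked cells may be chosen for perturbation."""
    return [c for c in cells if not c.locked]


cells = [Cell("a", 0, 0.0, 4.0), Cell("b", 0, 3.0, 2.0), Cell("c", 1, 9.0, 3.0, locked=True)]
print(cost(cells, wirelength=20.0, row_width=10.0), [c.name for c in movable(cells)])
```

Here cells a and b overlap by 1 unit and cell c sticks 2 units past the row end, so both penalties push the annealer away from this configuration without forbidding it outright.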
Recent Developments for Placement
▪ Artificial Neural Networks (ANNs) are built from a large number of artificial
neurons that simulate the human brain or nervous system
▪ An artificial neuron receives several inputs X1, X2, …, Xn and generates
a single output OUT
▪ The total input is given as $NET = \sum_{i=1}^{n} W_i X_i$, where $W_i$ is the
weight associated with input $X_i$
▪ The output is a function F of NET, where F is also known as the
activation function of the neuron
▪ A popularly used activation function is the sigmoid function
$F(x) = \frac{1}{1 + e^{-x}}$
If x is sufficiently large, F(x) approaches 1
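The neuron model above in a few lines of Python; the function names are hypothetical. The sigmoid is written in a numerically stable form so that large negative inputs do not overflow `math.exp`.

```python
import math


def net_input(weights, inputs):
    """NET = sum over i of W_i * X_i."""
    return sum(w * x for w, x in zip(weights, inputs))


def sigmoid(x):
    """F(x) = 1 / (1 + e^(-x)), evaluated without overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def neuron(weights, inputs):
    """Output OUT = F(NET) of a single artificial neuron."""
    return sigmoid(net_input(weights, inputs))


print(neuron([0.5, -1.0, 2.0], [1.0, 2.0, 0.5]))
```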
ANN for Placement
▪ Several neurons connected together form a neural network
▪ Example: 3 neurons connected together in a feed-forward
formation or as a recurrent network
ANN for Placement
▪ Energy plays an important role in ANNs; it is minimized much as the cost is in SA
▪ The set of all outputs OUTi is known as the state of the network
▪ Assume the activation function of each neuron is a threshold function
▪ For a recurrent network
▪ The network changes state (and hence energy level) and is said to converge
when the energy reaches a minimum; convergence is guaranteed when the weight
matrix W is symmetric and all diagonal entries of W are 0
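Because the slide's equations for the recurrent case did not survive, the sketch below assumes the standard Hopfield formulation: binary threshold units with outputs 0 or 1, thresholds $\theta_i$, and energy $E = -\frac{1}{2}\sum_i\sum_j W_{ij}\,OUT_i\,OUT_j + \sum_i \theta_i\,OUT_i$. With W symmetric and zero on the diagonal, each asynchronous threshold update can only keep E the same or lower it, so the network settles into a state where no unit changes. Function names and parameters are hypothetical.

```python
import random


def energy(W, out, theta):
    """E = -1/2 * sum_ij W_ij * OUT_i * OUT_j + sum_i theta_i * OUT_i."""
    n = len(out)
    quad = sum(W[i][j] * out[i] * out[j] for i in range(n) for j in range(n))
    return -0.5 * quad + sum(t * o for t, o in zip(theta, out))


def update_async(W, out, theta, rng, sweeps=10):
    """Update one threshold unit at a time until a full sweep changes nothing."""
    out = list(out)
    n = len(out)
    for _ in range(sweeps):
        changed = False
        for i in rng.sample(range(n), n):  # random update order each sweep
            net = sum(W[i][j] * out[j] for j in range(n))
            new = 1 if net > theta[i] else (0 if net < theta[i] else out[i])
            if new != out[i]:
                out[i], changed = new, True
        if not changed:
            break  # converged: no unit wants to change state
    return out


rng = random.Random(3)
n = 6
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        W[i][j] = W[j][i] = rng.uniform(-1, 1)  # symmetric, zero diagonal
theta = [0.0] * n
start = [rng.randint(0, 1) for _ in range(n)]
final = update_async(W, start, theta, rng, sweeps=100)
print(energy(W, start, theta), energy(W, final, theta))
```

The minimum reached this way is generally a local one, which is why such networks are often combined with annealing-style techniques.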