
© KLMH
EECS 527 Paper Presentation
Techniques for Fast Physical Synthesis
By Charles J. Alpert, Shrirang K. Karandikar,
Zhuo Li, Gi-Joon Nam, Stephen T. Quay,
Haoxing Ren, C. N. Sze, Paul G. Villarrubia,
and Mehmet C. Yildiz
Presented by Lingfeng Xu
Department of Electrical Engineering and Computer Science
University of Michigan, Ann Arbor
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 5: Global Routing
1
Lienig
11/2011
Outline

Introduction
• Buffering Trends
• Major Phases of Physical Synthesis
• Closer Look at Optimization

Selected Techniques
• Fast Timing-Driven Buffering
• Layout Aware Buffer Trees
• Diffusion Based Legalization
Q&A

Introduction

Purpose of physical synthesis
• Timing closure

Physical synthesis
• Iterations: designers alternate between manual design work and automatic physical synthesis

Philosophy
• As fast as possible, even if a little optimality is sacrificed

IBM’s physical synthesis tool
• PDS (Placement-Driven Synthesis) system
Buffering trends

“Buffering Explosion”
• Thinner wires increase wire resistance
• Wire delays increasingly dominate gate delays
• Saxena et al. [3] predict that half of all logic will consist of buffers
 20% - 25% buffers or inverters in today’s 90nm design
Figure: Percentage of block-level nets requiring repeaters [3]

Figure: Intra-block communication repeaters as a percentage of the total cell count for the block [3]

Buffering trends

Challenges
• Buffer insertion needs to be performed fast
• Area and power
• Layout awareness
• Buffering constricts or seeds global routing
Major Phases of Physical Synthesis

PDS stages
• Initial placement and optimization
• Timing-driven placement and optimization
• Timing-driven detailed placement
• Optimization techniques
• Clock insertion and optimization
• Routing and post-routing optimization
• Early-mode timing optimization
Closer look at Optimization

Optimization phases
• Electrical correction
• Critical path optimization
• Histogram compression
• Legalization

Figure: An example of physical synthesis breakdown into Phases 1 to 4 (steps include initial placement, timing-driven placement, timing-driven detailed placement, electrical correction, critical slack optimization, compression, and legalization)
How to Achieve Fast Physical Synthesis?

Selected Techniques
• Fast Timing-Driven Buffering
• Layout Aware Buffer Trees
• Diffusion Based Legalization
Fast Timing-Driven Buffering

Motivation
• Over a million buffers in a design
• Rebuffering rips up all buffers and reinserts them from scratch

Considerations
• Buffering resources vs. delay
• Runtime
• Slew, noise and capacitance constraints
Fast Timing-Driven Buffering

Classical Buffering Algorithm
• Goal: maximize the required arrival time (RAT) at the source
• Dynamic programming
• Candidate solutions are generated and propagated from the sinks toward the source

Each candidate solution at an internal node is characterized by (q, c, w)
• q: required arrival time
• c: downstream load capacitance
• w: accumulated cost of the buffer insertion decisions
• Example: a sink starts as (q = RAT, c = load capacitance, w = 0)
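A minimal Python sketch of how (q, c, w) candidates are propagated toward the source on a two-pin net, assuming the Elmore wire model and a linear buffer delay (intrinsic delay plus drive resistance times load). The names `Candidate`, `add_wire`, `add_buffer` and `propagate_two_pin`, the single buffer type, and the units (kΩ and fF, so products come out in ps) are illustrative assumptions, not the paper's code. Pruning is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    q: float  # required arrival time at this node (ps)
    c: float  # downstream load capacitance (fF)
    w: float  # accumulated buffer cost (here: number of buffers)


def sink_candidate(rat: float, load_cap: float) -> Candidate:
    # A sink starts with its own RAT, its pin capacitance and zero cost.
    return Candidate(q=rat, c=load_cap, w=0.0)


def add_wire(a: Candidate, r_wire: float, c_wire: float) -> Candidate:
    # Elmore delay of a wire segment: r * (c_wire / 2 + downstream load).
    delay = r_wire * (c_wire / 2 + a.c)
    return Candidate(a.q - delay, a.c + c_wire, a.w)


def add_buffer(a: Candidate, intrinsic: float, r_drive: float, c_in: float) -> Candidate:
    # Linear buffer delay model (an assumption): intrinsic + r_drive * load.
    # The buffer shields the downstream load, so upstream only sees its input cap.
    delay = intrinsic + r_drive * a.c
    return Candidate(a.q - delay, c_in, a.w + 1.0)


def propagate_two_pin(sink: Candidate, segments: List[Tuple[float, float]],
                      buf: Tuple[float, float, float]) -> List[Candidate]:
    """Walk a two-pin net from the sink toward the source (no pruning)."""
    cands = [sink]
    for r_wire, c_wire in segments:
        cands = [add_wire(a, r_wire, c_wire) for a in cands]
        # Candidate buffer location at the upstream end of this segment:
        # keep both the unbuffered and the buffered option.
        cands = cands + [add_buffer(a, *buf) for a in cands]
    return cands
```

Without pruning, the list doubles at every segment, which is exactly the growth the pruning rules are meant to control.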
Fast Timing-Driven Buffering
Classical Buffering Algorithm
• Two solutions α1, α2 at the same node:
• α2 dominates α1 if q2 ≥ q1, c2 ≤ c1 and w2 ≤ w1
• α1 is then redundant and can be pruned
At the end of the algorithm
• A set of solutions with different cost/RAT tradeoffs is obtained
• Choose one in the middle of the tradeoff curve
• “10 ps rule”: if the marginal RAT gain is more than 10 ps, choose the solution with the larger RAT
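Dominance pruning and the final pick can be sketched as below. The dominance test follows the slide directly; the selection function is only one possible reading of the 10 ps rule (walk the cost/RAT curve from the cheapest solution and stop once the next costlier solution gains 10 ps or less). All names are hypothetical, and source-level q values are assumed to already include the driver delay.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    q: float  # required arrival time (ps)
    c: float  # downstream load capacitance (fF)
    w: float  # buffer cost


def dominates(a: Candidate, b: Candidate) -> bool:
    # a dominates b if it is at least as good in RAT, load and cost.
    return a.q >= b.q and a.c <= b.c and a.w <= b.w


def prune_dominated(cands: List[Candidate]) -> List[Candidate]:
    kept = []
    for i, b in enumerate(cands):
        # Drop b if another candidate dominates it; of exact duplicates keep the first.
        redundant = any(
            dominates(a, b) and (a != b or j < i)
            for j, a in enumerate(cands) if j != i
        )
        if not redundant:
            kept.append(b)
    return kept


def pick_with_10ps_rule(solutions: List[Candidate], threshold: float = 10.0) -> Candidate:
    """Hypothetical reading of the rule: starting from the cheapest solution,
    move to the next costlier one only while it improves RAT by more than threshold."""
    best = {}
    for s in solutions:  # best RAT for each cost value
        if s.w not in best or s.q > best[s.w].q:
            best[s.w] = s
    curve = [best[w] for w in sorted(best)]
    chosen = curve[0]
    for nxt in curve[1:]:
        if nxt.q - chosen.q <= threshold:
            break
        chosen = nxt
    return chosen
```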
Fast Timing-Driven Buffering
Prebuffer Slack Pruning (PSP)
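• Applied at the node currently being processed, before buffers are inserted there
• For two solutions of equal cost, if q1 < q2, c1 < c2 and (q2 - q1)/(c2 - c1) ≤ Rmin, then α2 is pruned early (Rmin: minimum driving resistance among the library buffers)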
• Setting Rmin to the minimum buffer driving resistance guarantees optimality; a larger value prunes more aggressively without hurting solution quality in practice
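A minimal Python sketch of PSP, written in the form where α2 is the solution with both the larger RAT and the larger load (q1 < q2, c1 < c2) and is pruned when (q2 - q1)/(c2 - c1) ≤ Rmin. The reasoning: behind any driver of resistance R ≥ Rmin, the slack of α2 is q2 - R·c2 ≤ q1 - R·c1, so α2 can never win after buffering. The function name and the tuple layout are assumptions of this sketch.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    q: float  # required arrival time (ps)
    c: float  # downstream load capacitance (fF)
    w: float  # buffer cost


def prebuffer_slack_prune(cands: List[Candidate], r_min: float) -> List[Candidate]:
    """Prune a2 when some a1 of equal cost has q1 < q2, c1 < c2 and
    (q2 - q1) / (c2 - c1) <= r_min (r_min in kOhm, i.e. ps per fF)."""
    kept = []
    for a2 in cands:
        redundant = any(
            a1.w == a2.w and a1.q < a2.q and a1.c < a2.c
            and (a2.q - a1.q) / (a2.c - a1.c) <= r_min
            for a1 in cands
        )
        if not redundant:
            kept.append(a2)
    return kept
```

Raising `r_min` above the true minimum resistance prunes more candidates, which is the speed/optimality knob described on the slide.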
Fast Timing-Driven Buffering
Squeeze Pruning
• Three partial solutions α1, α2, α3 with the same cost, ordered by load (c1 < c2 < c3)
• If (q2 - q1)/(c2 - c1) ≤ (q3 - q2)/(c3 - c2), then α2 is pruned
• For a two-pin net, the middle solution is always dominated by either the first or the third one after buffering; for multi-sink nets optimality is not guaranteed, but the pruning causes no degradation in most cases
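Squeeze pruning amounts to keeping the upper concave hull of the (c, q) points for one cost value. A minimal sketch, assuming the candidates have already been cleared of dominated solutions (so their loads are distinct and q grows with c); the function name is hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    q: float  # required arrival time (ps)
    c: float  # downstream load capacitance (fF)
    w: float  # buffer cost


def squeeze_prune(cands: List[Candidate]) -> List[Candidate]:
    """Squeeze pruning for candidates that share one cost value.

    What survives is the upper concave hull of the (c, q) points."""
    kept: List[Candidate] = []
    for a3 in sorted(cands, key=lambda a: a.c):
        while len(kept) >= 2:
            a1, a2 = kept[-2], kept[-1]
            slope12 = (a2.q - a1.q) / (a2.c - a1.c)
            slope23 = (a3.q - a2.q) / (a3.c - a2.c)
            if slope12 <= slope23:
                kept.pop()  # a2 lies on or below the chord a1-a3: squeeze it out
            else:
                break
        kept.append(a3)
    return kept
```

For the two-pin case this matches the slide's claim: behind a driver of resistance R the slack is q - R·c, a linear function, and a linear function over a point set is maximized on its upper hull.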
Fast Timing-Driven Buffering

Library Lookup
• Every buffer in the library is examined at each candidate location
• With m buffer and inverter types and n candidate solutions, this yields mn new candidate solutions
• However, many of these candidates are not worth considering
• Precompute a buffer table and an inverter table
• Only 2n new candidate solutions: n with inverters and n with buffers
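A sketch of table-based library lookup under stated assumptions: a linear delay model, a load grid on which the actual load is rounded up to the next entry, and "best" meaning the minimum-delay buffer (or inverter) for that load. The `Buffer` type, the library values and the selection criterion are hypothetical, and inverter polarity tracking is omitted.

```python
from bisect import bisect_left
from typing import List, NamedTuple, Tuple


class Buffer(NamedTuple):
    name: str
    intrinsic: float  # intrinsic delay (ps)
    r_drive: float    # output resistance (kOhm)
    c_in: float       # input capacitance (fF)
    inverting: bool


def delay(b: Buffer, load: float) -> float:
    # Linear delay model (an assumption of this sketch).
    return b.intrinsic + b.r_drive * load


def build_tables(library: List[Buffer], cap_grid: List[float]):
    """For each grid load, precompute the min-delay buffer and min-delay inverter."""
    bufs = [b for b in library if not b.inverting]
    invs = [b for b in library if b.inverting]
    buf_table = [min(bufs, key=lambda b: delay(b, load)) for load in cap_grid]
    inv_table = [min(invs, key=lambda b: delay(b, load)) for load in cap_grid]
    return buf_table, inv_table


def lookup(table: List[Buffer], cap_grid: List[float], load: float) -> Buffer:
    # Round the load up to the next grid point (clamped to the largest one).
    i = bisect_left(cap_grid, load)
    return table[min(i, len(table) - 1)]


def expand(cands: List[Tuple[float, float]], buf_table, inv_table, cap_grid):
    """Each (q, c) candidate yields one buffered and one inverted candidate: 2n total."""
    out = []
    for q, c in cands:
        for table in (buf_table, inv_table):
            b = lookup(table, cap_grid, c)
            out.append((q - delay(b, c), b.c_in, b.name))
    return out
```

The table is built once per library, so the per-node work drops from m trials per candidate to two lookups.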
Fast Timing-Driven Buffering

Results and Summary
• Measured on 5000 high-capacitance nets from an ASIC chip
• 3% quality degradation and 20x speedup
• Philosophy: as fast as possible, even if a little optimality is sacrificed
• Ripping up and rebuffering with more accurate techniques can be performed later if desired
Layout Aware Fast and Flexible Buffer Trees

Layout problems in buffering
• (a) Alley
• (b) Pile-ups
• Holes in large blocks

Layout constraints
• Holes in large blocks
• Navigating around blocks and dense regions
• Critical and non-critical routes
• Avoiding routing congestion
Layout Aware Fast and Flexible Buffer Trees

Layout aware buffer tree flow
• Step 1: Construct a fast timing-driven Steiner tree
• Step 2: Reroute the Steiner tree, preserving its topology while navigating environmental constraints
• Step 3: Insert buffers (e.g., with Fast Timing-Driven Buffering)
This work focuses on Step 2

Layout Aware Fast and Flexible Buffer Trees

Algorithm
• Break the existing Steiner tree into disjoint 2-paths, i.e., paths that start and end at the source, a sink, or a Steiner point
• Route each 2-path in turn to minimize cost, starting from the sinks and ending at the source
• Use maze routing with a cost function for each 2-path
• If a Steiner point lies in a congested region, move it within a specified “plate region”
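The decomposition step can be sketched as a post-order walk of the rooted Steiner tree, which naturally emits the 2-paths in the sinks-to-source routing order. The dictionary representation and the "bend" label for degree-2 corner points are assumptions made for this illustration, not details from the paper.

```python
from collections import defaultdict
from typing import Dict, List


def two_paths(parent: Dict[str, str], kind: Dict[str, str]) -> List[List[str]]:
    """Split a rooted Steiner tree into 2-paths ordered from the sinks toward the source.

    parent maps each node to its parent (the source has no entry). kind maps each
    node to "source", "sink", "steiner" or "bend"; "bend" is a hypothetical label for
    a degree-2 corner that may only appear inside a 2-path. Each 2-path is listed
    from its downstream end to its upstream end."""
    children: Dict[str, List[str]] = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    source = next(n for n, k in kind.items() if k == "source")
    paths: List[List[str]] = []

    def walk(top: str) -> None:
        for child in children[top]:
            path = [top, child]
            while kind[path[-1]] == "bend":  # follow corners down to an endpoint
                (nxt,) = children[path[-1]]
                path.append(nxt)
            walk(path[-1])            # emit everything below this endpoint first ...
            paths.append(path[::-1])  # ... then this 2-path (post-order)

    walk(source)
    return paths
```

Routing the returned paths in order guarantees that both branches entering a Steiner point are routed before the 2-path that leaves it toward the source.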

Layout Aware Fast and Flexible Buffer Trees
General maze routing cost function
• Tradeoff parameter 0 ≤ K ≤ 1
• Tile cost: cost(t) = 1 + K·e(t), where e(t) is the environment cost of tile t
• Merging branches:
cost(t) = max(cost(L), cost(R)) + K·min(cost(L), cost(R))
• Sink initialization:
cost(s) = (K - 1)·RAT(s)/DpT, where DpT is the delay per tile
• Use K = 1 for electrical correction and K = 0.1 for critical-path optimization
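The cost terms can be written out as follows, together with a plain Dijkstra maze router over a grid of environment costs e(t). Note that the merge reduces to a max (worst branch, timing-like) when K = 0 and to a sum (total resources) when K = 1. The grid representation, the function names and the choice of Dijkstra as the maze search are assumptions of this sketch.

```python
import heapq
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def tile_cost(env: float, k: float) -> float:
    # cost(t) = 1 + K * e(t): one unit per tile plus the weighted environment cost.
    return 1.0 + k * env


def sink_init(rat: float, k: float, dpt: float) -> float:
    # cost(s) = (K - 1) * RAT(s) / DpT; for K < 1, sinks with a smaller RAT start higher.
    return (k - 1.0) * rat / dpt


def merge(cost_l: float, cost_r: float, k: float) -> float:
    # K = 0: worst branch only (timing-like); K = 1: sum of both branches.
    return max(cost_l, cost_r) + k * min(cost_l, cost_r)


def maze_route(env: List[List[float]], start: Cell, start_cost: float,
               goal: Cell, k: float) -> Tuple[float, List[Cell]]:
    """Dijkstra over a grid env[row][col]; entering a tile adds tile_cost(e(t), K)."""
    rows, cols = len(env), len(env[0])
    dist: Dict[Cell, float] = {start: start_cost}
    prev: Dict[Cell, Cell] = {}
    heap = [(start_cost, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            break
        if d > dist[(r, c)]:
            continue  # stale heap entry
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + tile_cost(env[nr][nc], k)
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    path = [goal]  # assumes the goal is reachable
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]
```

A 2-path starting at a sink would use `sink_init` as its `start_cost`; at a Steiner point the two incoming branch costs would be combined with `merge` before routing the next 2-path upward.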
Layout Aware Fast and Flexible Buffer Trees

Example and Summary
• A 7-pin net of an industrial design
• (a) K = 1.0: 4134 ps slack improvement
• (b) K = 0.1: 4646 ps slack improvement
Diffusion-Based Placement Techniques for Legalization

Classical legalization
• After optimization, local regions can be overfull
• Run periodically to snap overlapping cells to legal positions
• If one waits too long between two legalizations, cells may end up far from their optimal positions, which can severely hurt timing

Diffusion-Based Legalization
• Avoids moving cells too far away
• Fast: runs in minutes on designs with millions of gates
Diffusion-Based Placement Techniques for Legalization

Diffusion as a Physical Process
• Moves elements from a state with non-zero potential energy to a state of equilibrium
• Can be modeled by discretizing time into finite steps
• Relationship of material concentration with time and space, where $d_{x,y}(t)$ is the concentration at location (x, y) at time t:
$$\frac{\partial d_{x,y}(t)}{\partial t} = \nabla^2 d_{x,y}(t)$$
Diffusion-Based Placement Techniques for Legalization

Diffusion as a Physical Process
• Cell velocity
$$v^H_{x,y}(t) = -\frac{\partial d_{x,y}(t)/\partial x}{d_{x,y}(t)}, \qquad v^V_{x,y}(t) = -\frac{\partial d_{x,y}(t)/\partial y}{d_{x,y}(t)}$$
• Cell new location
$$x(t) = x(0) + \int_0^t v^H_{x(t'),y(t')}(t')\,dt', \qquad y(t) = y(0) + \int_0^t v^V_{x(t'),y(t')}(t')\,dt'$$
Diffusion-Based Placement Techniques for Legalization

Diffusion Based Placement
• Coordinates are scaled so that the width and height of each bin is one
• Location (x, y) lies in bin $(j, k) = (\lfloor x \rfloor, \lfloor y \rfloor)$
• Forward Time Centered Space (FTCS) scheme
New bin density
$$d_{j,k}(n+1) = d_{j,k}(n) + \frac{\Delta t}{2}\left(d_{j+1,k}(n) + d_{j-1,k}(n) - 2d_{j,k}(n)\right) + \frac{\Delta t}{2}\left(d_{j,k+1}(n) + d_{j,k-1}(n) - 2d_{j,k}(n)\right)$$
• Bin velocity
$$v^H_{j,k}(n) = -\frac{d_{j+1,k}(n) - d_{j-1,k}(n)}{2\,d_{j,k}(n)}, \qquad v^V_{j,k}(n) = -\frac{d_{j,k+1}(n) - d_{j,k-1}(n)}{2\,d_{j,k}(n)}$$
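A pure-Python sketch of one FTCS density update and the bin velocities of the diffusion scheme. The zero-flux boundary (an off-chip neighbour reuses the bin's own density, which conserves total density), the zeroing of velocities in the outermost bins, and the bound Δt ≤ 0.5 for stability of this explicit scheme are assumptions of this sketch; the function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def ftcs_step(d: Grid, dt: float) -> Grid:
    """One FTCS step: d(n+1) = d(n) + dt/2 * (x-Laplacian) + dt/2 * (y-Laplacian).

    d[j][k] is the density of bin (j, k), j along x and k along y. A neighbour
    outside the chip reuses the bin's own density (zero-flux boundary, assumed).
    dt <= 0.5 keeps this explicit scheme stable."""
    nx, ny = len(d), len(d[0])

    def at(j: int, k: int, jj: int, kk: int) -> float:
        return d[jj][kk] if 0 <= jj < nx and 0 <= kk < ny else d[j][k]

    new = [[0.0] * ny for _ in range(nx)]
    for j in range(nx):
        for k in range(ny):
            lap_x = at(j, k, j + 1, k) + at(j, k, j - 1, k) - 2 * d[j][k]
            lap_y = at(j, k, j, k + 1) + at(j, k, j, k - 1) - 2 * d[j][k]
            new[j][k] = d[j][k] + dt / 2 * lap_x + dt / 2 * lap_y
    return new


def bin_velocities(d: Grid):
    """vH = -(d[j+1][k] - d[j-1][k]) / (2 d[j][k]), vV analogous along k.

    Velocities are left at zero in the outermost bins (a simplification)."""
    nx, ny = len(d), len(d[0])
    vh = [[0.0] * ny for _ in range(nx)]
    vv = [[0.0] * ny for _ in range(nx)]
    for j in range(nx):
        for k in range(ny):
            if d[j][k] <= 0:
                continue  # empty bin: no cells to move
            if 0 < j < nx - 1:
                vh[j][k] = -(d[j + 1][k] - d[j - 1][k]) / (2 * d[j][k])
            if 0 < k < ny - 1:
                vv[j][k] = -(d[j][k + 1] - d[j][k - 1]) / (2 * d[j][k])
    return vh, vv
```

The minus sign makes cells flow from denser toward sparser bins, which is why repeated steps flatten density peaks.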
Diffusion-Based Placement Techniques for Legalization

Diffusion Based Placement
• Enforce vH = 0 at the left and right boundaries and vV = 0 at the top and bottom boundaries, so that no cell diffuses off the chip
• Two adjacent cells can be assigned very different velocities, which could change their relative order; interpolating each cell's velocity from the four closest bins remedies this
• New locations (x, y) for the next time step:
$$x(n+1) = x(n) + v^H_{x(n),y(n)}\,\Delta t, \qquad y(n+1) = y(n) + v^V_{x(n),y(n)}\,\Delta t$$
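The velocity interpolation and position update can be sketched with bilinear interpolation between the centres of the four closest bins. Bilinear weighting, the clamping at the chip edge and the function names are assumptions; the slide only states that the four closest bins are used.

```python
from typing import List

Grid = List[List[float]]


def interp_velocity(v: Grid, x: float, y: float) -> float:
    """Bilinear interpolation of a bin velocity grid v[j][k] at (x, y).

    With unit bins, bin (j, k) is centred at (j + 0.5, k + 0.5); the four closest
    centres are blended by distance. Assumes at least 2 x 2 bins; points beyond
    the outermost centres are clamped to them."""
    nx, ny = len(v), len(v[0])
    fx = min(max(x - 0.5, 0.0), nx - 1.0)
    fy = min(max(y - 0.5, 0.0), ny - 1.0)
    j0, k0 = min(int(fx), nx - 2), min(int(fy), ny - 2)
    tx, ty = fx - j0, fy - k0
    return ((1 - tx) * (1 - ty) * v[j0][k0] + tx * (1 - ty) * v[j0 + 1][k0]
            + (1 - tx) * ty * v[j0][k0 + 1] + tx * ty * v[j0 + 1][k0 + 1])


def move_cell(x: float, y: float, vh: Grid, vv: Grid, dt: float):
    # x(n+1) = x(n) + vH * dt and y(n+1) = y(n) + vV * dt, with interpolated velocities.
    return x + interp_velocity(vh, x, y) * dt, y + interp_velocity(vv, x, y) * dt
```

Because the interpolated velocity varies continuously with position, two neighbouring cells receive nearly equal velocities and keep their relative order.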
Diffusion-Based Placement Techniques for Legalization

Diffusion Based Placement: Getting It to Work
• The diffusion process reaches equilibrium only when every bin has the same (average) density, which can cause unnecessary spreading even if every bin’s density is already well below dmax
• Idea: run diffusion only in regions that require it
• Local diffusion: run diffusion on cells in a window around bins that violate the target density constraint
• If the FTCS error exceeds a threshold, recompute the density from the actual cell placement and restart the diffusion algorithm
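A sketch of the two control mechanisms of local diffusion: selecting the window of bins around density violations, and the restart test that compares the FTCS density with the density recomputed from real cell positions. The window half-width `margin`, the tolerance `tol` and both function names are hypothetical.

```python
from typing import List, Set, Tuple

Grid = List[List[float]]


def local_diffusion_bins(d: Grid, d_max: float, margin: int) -> Set[Tuple[int, int]]:
    """Bins within `margin` bins of any bin whose density exceeds d_max."""
    nx, ny = len(d), len(d[0])
    selected: Set[Tuple[int, int]] = set()
    for j in range(nx):
        for k in range(ny):
            if d[j][k] > d_max:  # target density violated
                for jj in range(max(0, j - margin), min(nx, j + margin + 1)):
                    for kk in range(max(0, k - margin), min(ny, k + margin + 1)):
                        selected.add((jj, kk))
    return selected


def needs_restart(d_model: Grid, d_actual: Grid, tol: float) -> bool:
    # FTCS error: largest gap between the modelled density and the density
    # recomputed from the real cell positions.
    err = max(abs(a - b) for ra, rb in zip(d_model, d_actual) for a, b in zip(ra, rb))
    return err > tol
```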
Diffusion-Based Placement Techniques for Legalization

Example
• Figure: the design before legalization, after traditional legalization, and after diffusion legalization
• 4% total wirelength savings
• 48% worst slack improvement
• 36% fewer negative-slack paths

Summary
• Diffusion-based legalization is less likely to disrupt the state of the design
Summary

Buffering trends
• “Buffer Explosion”

Physical synthesis phases
• 4 phases

Fast Timing-Driven Buffering

Layout Aware Buffer Trees

Diffusion-Based Legalization
Thanks!
Q&A