Standard Cell Based Design
(RTL-to-GDSII)
UNIST, Department of Electrical Engineering
Assistant Professor Heechun Park (박희천)
1
/ 210
Semiconductor Design Automation Laboratory
Lecturer’s Profile
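▪ Education
−’11.2: B.S., School of Electrical Engineering, Seoul National University
−’18.2: Ph.D., Department of Electrical and Computer Engineering, Seoul National University
▪ Career
−’18.3 ~ ’20.2: GeorgiaTech Post-doc. Fellow
−’20.4 ~ ’22.8: Seoul National University, Visiting Researcher – Senior Researcher – BK Assistant Professor
−’22.9 ~ ’24.2: Kookmin University, School of Electrical Engineering, Assistant Professor
−’24.3 ~ : UNIST, Department of Electrical Engineering, Assistant Professor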
▪ Research area: VLSI/SoC physical design methodologies
−Electronic design automation (EDA), Computer-aided design (CAD)
−https://sites.google.com/view/unist-seda
SeDA Lab.
Course Information (1)
▪ Course objective
−To understand the overall RTL-to-GDSII design process for standard-cell-based digital integrated circuits.
▪ Course overview
−“This course covers the RTL-to-GDSII design steps for relatively large-scale integrated circuits (ex. VLSI) based on standard cells.”
−“Starting from a digital circuit in HDL form (RTL) completed through logic design, it covers the whole sequence of steps: synthesis, place & route (P&R), and final analysis.”
Course Information (2)
▪ Time: 1/13~14 (2 days), 10:00 ~ 17:00
−Morning: 10:00~12:00
−Afternoon: 13:00~17:00
(12 hours in total)
▪ 7/4 (Thu)
−Morning (10:00~12:00): Introduction
−Afternoon (13:00~17:00): Synthesis, Floorplan, Placement (1)
▪ 7/5 (Fri)
−Morning (10:00~12:00): Placement (2), CTS
−Afternoon (13:00~17:00): Routing, Analysis
Reference
▪ Books (VLSI physical design)
− Sung Kyu Lim, “Practical Problems in VLSI Physical Design Automation”, Springer
− Andrew B. Kahng et al., “VLSI Physical Design: From Graph Partitioning to Timing
Closure”, Springer
▪ Lecture notes (VLSI physical design)
− GeorgiaTech – ECE6133 (Prof. Sung Kyu Lim)
− Washington State Univ. – EE582 (Prof. Dae Hyun Kim)
▪ Slides
− Synopsys University Education Program
− Cadence Customer Training Course
Introduction: VLSI, RTL-to-GDSII
VLSI (1)
(Figure: levels of abstraction, from top to bottom)
▪ Application
▪ System: SW (firmware / OS), HW
▪ Circuit: integrated circuit (IC), memory
−Standard cell (NAND/NOR/INV/…)
−Memory cell (SRAM/DRAM/…)
▪ Device: transistor
▪ Material: silicon, carbon, …
(Figure: example system with 8 Rocket tiles on a Rocket chiplet and an L2 cache on an L2 chiplet; labeled blocks include the system bus, periphery bus, NoC, L1 to L2 interface, bridges, FIFOs, mux/demux, Serdes, I/O drivers, IVRs, memory controller, Bootrom, Debug, PLIC, and CLINT)
VLSI (2)
▪ VLSI: Very Large Scale Integration (Integrated circuit)
−IC → LSI → VLSI (→ ULSI …)
▪ SoC: System-on-Chip
▪ VLSI: “Many” TRs in one chip
▪ SoC: “Many” IPs in one chip
(Figure: a system built from integrated circuits (IC) and memory; standard cells (NAND/NOR/INV/…), memory cells (SRAM/DRAM/…), transistors)
CAD / EDA for VLSI
▪ More TRs in a chip ➔ Integrated chip (circuit)
▪ IC → LSIC (large scale) → VLSIC (very large scale)
▪ Design complexity ↑↑ ➔ Use computer & algorithm
➔ CAD (computer-aided design)
➔“Automated” ➔ EDA (electronic design automation)
VLSI design flow
▪ System Specification: “What to do?”
▪ Architectural Design: system-level design
▪ Functional Design and Logic Design ➔ RTL (register transfer level)
−Ex) ENTITY test is port a: in bit; end ENTITY test;
▪ (Logic) Circuit Design ➔ Logic gates (standard cells)
▪ Physical Design ➔ (GDS) Layout
−GDS (graphic design system)
▪ Physical Verification and Signoff: DRC, LVS, ERC
−Design constraints, design rules, …
▪ Fabrication: “Produce”
▪ Packaging and Testing
▪ Chip
RTL (register transfer level)
▪ Target circuit → Hardware Description Language (HDL)
−Ex. SystemVerilog, SystemC, Verilog, VHDL, …
RTL
▪ Design (설계)
−High-level or RTL coding
▪ Verification (검증)
−Synopsys VCS, Cadence Xcelium, Xilinx(AMD) Vivado, …
−“Functional” check
RTL-to-GDS(II)
▪ Logical design: RTL ➔ Logic (+DfT)
▪ Physical design: ➔ GDS (logic, SRAM, memory)
▪ Fabrication
▪ GOAL: High performance, Low power
RTL-to-GDS
▪ Logic design (Synthesis)
−RTL ➔ Netlist
▪ Physical design
−Netlist ➔ Layout
▪ + Analysis (timing/power/…)
Synthesis
▪ RTL → Netlist
−In: RTL code, tech. library, design constraints, …
−Out: Gate-level netlist
(Figure: RTL ➔ Synthesis ➔ (Gate-level) Netlist)
Physical design
▪ Netlist → Layout
−In: Gate-level netlist, tech. library, design constraints, …
−Out: Physical layout
(Figure: (Gate-level) Netlist ➔ Physical design ➔ (GDS) Layout)
Physical design
▪ Place-and-Route (P&R)
Partitioning (Clustering)
Chip Planning (Floorplanning)
Placement
Clock Tree Synthesis
Routing
Timing Closure
Partition & Floorplan
▪ Partitioning (Clustering)
−Chip → 𝑛 modules (sub-circuits)
−Design complexity↓
▪ Floorplanning (Chip planning)
−Define chip size
−Macro placement
−Power distribution network (PDN)
−…
(Figure: floorplan of a chip partitioned into tile_0 to tile_7 and uncore)
Placement
▪ Place ALL standard cells (& I/O pins)
▪ Global placement → Detail placement (& legalization)
▼ Global placement (Coarse-grained)
▼ Detail placement (Fine-grained)
Clock Tree Synthesis (CTS)
▪ 1 clock source → ALL clock pins
−FFs, macro blocks, …
−Minimize clock “skew”
▪ Clock buffer is necessary
Bad
Good
▲ Clock tree topology
▲ Physical implementation
Routing
▪ Implement nets with metal layers
▪ Global routing → Detail routing
−Global route ➔ Routing guides ➔ Detail route
(Figure: cells in the FEOL connected by metal wires and vias in the BEOL)
Analysis & Optimization
▪ Static timing analysis (STA), power analysis, …
−ALL path delays should be shorter than the clock period
꞉ “Path” = FF-to-FF signal delay
▪ Optimization for timing / power / SI / PI / thermal, …
(Figure: FF ➔ Combinational Logic ➔ FF stages driven by a common Clock)
EDA tools (1)
▪ SW programs that support EDA
▪ EDA vendors
−Synopsys, Cadence, Siemens EDA, …
▪ Tools by vendor (Synopsys / Cadence / Siemens EDA)
−Synthesis: Design Compiler / Genus / Oasys-RTL
−Place & Route: IC Compiler / Innovus / Aprisa
−Analysis (Timing): PrimeTime / Tempus / -
EDA tools (2)
▪ “There’s no golden script”
▪ Synopsys ICC2
−Floorplan
−Placement & opt.: place_opt (5 stg)
−CTS & opt.: clock_opt (3 stg)
−Route & opt.: route_auto, route_opt
▪ Cadence Innovus
−Floorplan
−Placement & opt.: placeDesign, optDesign -preCTS (or place_opt_design)
−CTS & opt.: clockDesign, optDesign -postCTS (or ccopt_design)
−Route & opt.: routeDesign, optDesign -postRoute
EDA tools (3)
▪ OpenROAD project (Funded by DARPA, 4-year, $11.3M)
▪ “Open-source” EDA flow ➔ Free!
▼ https://theopenroadproject.org/
▼ https://github.com/The-OpenROAD-Project
EDA tools (4)
▪ Improvements from students
* Synopsys EDA flow
* Cadence EDA flow
Synthesis
Common theory
▪ Translation + Mapping + Optimization
Common theory
▪ (Example)
−Translation: Z = A & B; ZN = ~Z;
−Mapping: (and) gate producing Z from A, B, followed by (inv) producing ZN
−Optimization: ZN produced directly from A, B
−Transistor counts labeled in the figure: 6 TRs, 2 TRs, 4 TRs
−Optimization works on an And-Inverter Graph (AIG) or Majority-Inverter Graph (MIG)
Common tool flow
▪ Inputs to the synthesis tool
−RTL: .v, .sv, .vhdl, …
−Technology library: LIB (Cadence), DB (Synopsys)
−Design constraints
−(Tool command list)
▪ Output
−Gate-level netlist: .v, .sv, .vhdl, …
Technology library – LIB
▪ Liberty file (.lib) – Timing & Power information
−DB (.db) – Converted from LIB, specialized to Synopsys tools
▪ Ex. Nangate 45nm library (open-source)
−Common info.
Wireload model
Technology library – LIB
−Cell info. (INV_X1)
Timing info.
Technology library – LIB
−Cell info. (INV_X1)
Power info.
▪ Non-linear delay model (NLDM) – simple, inaccurate
▪ Current-Source model (CCS, ECSM) – complex, accurate
Design constraints
▪ .tcl script ➔ import to synthesis tool
−Example from Synopsys
Design constraints
▪ .tcl script ➔ import to synthesis tool
−Example from Synopsys
Synthesis flow in tool
▪ Analysis/Translation (Parsing, Elaboration) = “Translation”
▪ Logic Optimization / Compilation
−Technology-independent synthesis = “Optimization”
−Technology-dependent synthesis = “Mapping & Optimization”
Synthesis commands (Synopsys example)
▪ (after importing RTL, lib, constraints)
−Run synthesis (compile)
Commands (Synopsys example)
▪ (after importing RTL, lib, constraints)
−Report design
Commands (Synopsys example)
▪ (after importing RTL, lib, constraints)
−Write output files (netlist → .v, design constraints → .sdc)
Output: Gate-level netlist
▪ Verilog (.v), VHDL (.vhdl), …
Output: Design constraint
▪ “Commands” for P&R tools (~~.sdc)
Floorplan
Introduction to ‘physical design’
▪ ALL steps are in a P&R tool
−Synopsys IC Compiler 2, Cadence Innovus, Siemens EDA Aprisa
Partitioning (Clustering)
Chip Planning (Floorplanning)
Placement
Clock Tree Synthesis
Routing
Timing Closure
Common tool flow
▪ Inputs to the P&R tool
−(Gate-level) Netlist: .v, .sv, .vhdl, …
−Technology library: LIB/LEF/QRC (Cadence), NDM/TLUP (Synopsys)
−Design constraints: .sdc (from synthesis)
▪ Output
−Layout: GDSII, DEF, SPEF…
Technology library – LEF
▪ Library exchange format (.lef) ➔ Geometric information
▪ Tech LEF: Metal layers (BEOL)
Technology library – LEF
▪ Library exchange format (.lef) ➔ Geometric information
▪ Macro LEF: Instances (std. cell, macro blocks) (FEOL)
Logic gate → Standard cell (Mapping)
▪ Logic gate (logical) ➔ Standard cell (physical)
▪ Abstract view = Bottom-most design unit in physical design
▼ Logic gate symbol
▼ TR-level view
▼ Standard cell: Physical layout
▼ Standard cell: Abstracted layout
(Figure labels: Vdd, Gnd, In, Out)
Logic gate → Standard cell (Mapping)
▪ Ex) NAND2 gate
−Logic gate symbol: A0, A1, ZN
−NAND2_X1 standard cell (Nangate 15nm DK): pins A0, A1, ZN, Vdd, Gnd
▪ Ex) D-FF with reset
−Logic gate symbol: D, RN, CLK, Q
−DFFRNQ_X1 standard cell (Nangate 15nm DK): pins D, RN, CLK, Q, Vdd, Gnd
Technology library – QRC / TLUP
▪ Quantus RC (QRC) / Table Look-Up Plus (TLU+)
➔ Metal layer (BEOL) RC information
▪ Not human-readable (generated from a readable file)
−Cadence: .ict ➔ (Quantus techgen) ➔ .tch/.qrc
−Synopsys: .itf ➔ (StarRC) ➔ .tlup
▪ ** New Data Model (NDM) – for Synopsys tools (ICC2)
−LIB + LEF + TLUP → files that are not human-readable
Output – DEF
▪ Design exchange format – Geometric information
−COMPONENTS, PINS, NETS
Output – SPEF
▪ Standard parasitic exchange format
➔ RC information for nets
Intro. – Floorplan
▪ “Floorplan”
−Apartment, Chip
1. Chip size setting
Die boundary
Core boundary
▼ Chip
Core margin
I/O pins
(wires in core margin)
1. Chip size setting
▪ Parameter set 1: {core area (w, h), margins}
−Ex) {30, 30, 2, 2, 2, 2} ➔ core = 30 x 30, die = 34 x 34
▪ Parameter set 2: {utilization, aspect ratio, margins}
−Ex) total cell area = 1000, param set 2: {0.7, 1.0, (2.0, 2.0, 2.0, 2.0)}
➔Core area = 1000 / 0.7 = 1428.6
➔Core w x h = 37.8 x 37.8 (= sqrt(1428.6))
➔Die w x h = 41.8 x 41.8
▪ Example (Cadence Innovus)
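Independently of any tool syntax, the arithmetic behind the two parameter sets can be checked with a short Python sketch. The function names, the margin order (left, bottom, right, top), and the reading of aspect ratio as height / width are assumptions made for illustration; they are not Innovus options.

```python
import math


def die_from_core(core_w, core_h, margins):
    """Parameter set 1: die size from core size plus margins.

    margins is assumed to be (left, bottom, right, top).
    """
    left, bottom, right, top = margins
    return core_w + left + right, core_h + bottom + top


def core_from_utilization(total_cell_area, utilization, aspect_ratio):
    """Parameter set 2: core size from utilization and aspect ratio.

    aspect_ratio is assumed to mean height / width.
    """
    core_area = total_cell_area / utilization
    core_w = math.sqrt(core_area / aspect_ratio)
    core_h = core_w * aspect_ratio
    return core_w, core_h


# Parameter set 1 example from the slide: core 30 x 30, margins 2 each.
die1 = die_from_core(30, 30, (2, 2, 2, 2))

# Parameter set 2 example: cell area 1000, utilization 0.7, aspect ratio 1.0.
core_w, core_h = core_from_utilization(1000, 0.7, 1.0)
die2 = die_from_core(core_w, core_h, (2.0, 2.0, 2.0, 2.0))
```

With the slide's numbers, `die1` is 34 x 34 and `die2` rounds to 41.8 x 41.8.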
2. Macro placement
▪ Manual ➔ Algorithmic solution (auto)
(Figure: macro placement of tile_0 to tile_7 and uncore)
3. Power distribution network (PDN) setting
▪ Manual PDN
Analysis
−Metal layer, wire width / spacing / offset / …
−Ex) Alternating directions: M6 (H), M5 (V), M4 (H), M3 (V)
Floorplan > Macro Placement
▼ Large area ➔ “Bad”
▼ Small area ➔ “Good”
(Figure: two placements of blocks 1 to 8)
Floorplan Representation
▼ Slicing floorplan
▼ Non-slicing floorplan
(Figure: example floorplans of blocks a to f; cut lines are numbered (1) to (4))
Slicing Floorplan → Slicing Tree
▼ Slicing floorplan
▼ Slicing tree
−H: horizontal cut, children = (down, up)
−V: vertical cut, children = (left, right)
(Figure: slicing floorplan of blocks a to f and its slicing tree)
Slicing Tree → Polish Expression
▪ “left_child-right_child-parent”
▲ Slicing floorplan
▼ Slicing tree
▪ abHdefVHcHV
−6 leaves (a~f) = operands
−5 nodes (V, H) = operators
Slicing Tree → Polish Expression
▪ ∆a ∆b V ➔ (abH)(∆c cH)V ➔ abH(d∆d H)cHV
➔ abHd(efV)HcHV ➔ abHdefVHcHV
−∆a = abH
−∆b = ∆c cH
−∆c = d ∆d H
−∆d = efV
Floorplan Representation – Slicing FP
▪ Slicing FP ➔ Slicing tree ➔ Polish expression
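The "left_child-right_child-parent" rule is a post-order traversal of the slicing tree. A minimal Python sketch follows; the `Node` class and the `polish` function are hypothetical helpers, and the tree is the one implied by the expression abHdefVHcHV used in these slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str                      # block name (leaf) or cut type "H" / "V"
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def polish(node):
    """Post-order traversal: left child, right child, then the parent."""
    if node.left is None and node.right is None:
        return node.label
    return polish(node.left) + polish(node.right) + node.label


leaf = {name: Node(name) for name in "abcdef"}
tree = Node(
    "V",
    Node("H", leaf["a"], leaf["b"]),
    Node(
        "H",
        Node("H", leaf["d"], Node("V", leaf["e"], leaf["f"])),
        leaf["c"],
    ),
)
expr = polish(tree)
```

Here `expr` evaluates to the same string as the slide, with the 6 leaves as operands and the 5 internal nodes as operators.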
▼ Slicing floorplan
▼ Slicing tree
▼ Polish expression: abHdefVHcHV
Sequence Pair
▪ 2D plane ➔ Express with two sequences (pair)
▪ Direction x & y → Can’t make order for ‘overlapped’ blocks
−Orders read from the figure (blocks 1 to 8): (1,5,6) → (2,4) → 7, 3→8→7, 1→5→6→3, 4→2→8, 7→6→8
➔ NOT in a single sequence
Sequence Pair
▪ Solve: use diagonal directions ➔ Sequence pair
−S+ = <3 6 8 5 2 1 4 7>
−S- = <1 5 4 6 2 3 8 7>
(Figure: floorplan of blocks 1 to 8)
Sequence Pair
▪ Sequence pair → Floorplan: 1 SP makes 1 FP
−Relation between two blocks in sequences → ‘relative’ positions
−S+ = <…, x, …, y, …> // S- = <…, x, …, y, …>
➔ x is to the left of y (y is to the right of x)
−S+ = <…, x, …, y, …> // S- = <…, y, …, x, …>
➔ x is above y (y is below x)
▪ Ex) (S+, S-) = (<1 7 4 5 2 6 3 8>, <8 4 7 2 5 3 6 1>)
Sequence Pair → Floorplan
▪ Generate horizontal / vertical constraint graphs
−HCG, VCG
▪ Ex) (S+, S-) = (<1 7 4 5 2 6 3 8>, <8 4 7 2 5 3 6 1>)
−Block (w, h): (2,4), (1,3), (3,3), (3,5), (3,2), (5,3), (1,2), (2,4)
Sequence Pair → Floorplan
▪ Ex) (S+, S-) = (<1 7 4 5 2 6 3 8>, <8 4 7 2 5 3 6 1>)
−Block (w, h): (2,4), (1,3), (3,3), (3,5), (3,2), (5,3), (1,2), (2,4)
➔ Generate nodes: 1 node per block + s & t node
(Figure: one node per block (1 to 8) arranged by S+ and S-, with source node s and sink node t)
Sequence Pair → Floorplan
▪ Ex) (S+, S-) = (<1 7 4 5 2 6 3 8>, <8 4 7 2 5 3 6 1>)
−Block (w, h): (2,4), (1,3), (3,3), (3,5), (3,2), (5,3), (1,2), (2,4)
➔ Use table → Draw HCG & VCG
−Left - Right: HCG, Above - Below: VCG
−Exclude ‘transitive’ edge
▲VCG
▲HCG
Sequence Pair → Floorplan
▪ Ex) (S+, S-) = (<1 7 4 5 2 6 3 8>, <8 4 7 2 5 3 6 1>)
−Block (w, h): (2,4), (1,3), (3,3), (3,5), (3,2), (5,3), (1,2), (2,4)
➔ Record ‘weight’ for each node
−HCG: block width, VCG: block height
−s, t: 0
Sequence Pair → Floorplan
▪ Ex) (S+, S-) = (<1 7 4 5 2 6 3 8>, <8 4 7 2 5 3 6 1>)
−Block (w, h): (2,4), (1,3), (3,3), (3,5), (3,2), (5,3), (1,2), (2,4)
▪ Largest path weight from s = Block’s lower-left position
−(Exclude self weight)
▪ Largest path weight of s→ t = Total FP size
Block #   HCG (x)   VCG (y)
1         0         11
2         3         4
3         6         4
4         0         4
5         3         7
6         6         7
7         0         9
8         0         0
Sequence Pair → Floorplan
▪ Ex) (S+, S-) = (<1 7 4 5 2 6 3 8>, <8 4 7 2 5 3 6 1>)
−Block (w, h): (2,4), (1,3), (3,3), (3,5), (3,2), (5,3), (1,2), (2,4)
▪ FP result
−Size = 11 x 15
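The HCG / VCG longest-path computation can be sketched in a few lines of Python. This is an illustrative O(n²) version that never builds the graphs explicitly: it visits blocks in S- order, which guarantees that every block to the left of or below the current one has already been placed. The function name `sp_to_floorplan` is an assumption, not a tool API.

```python
def sp_to_floorplan(s_pos, s_neg, sizes):
    """Return ({block: (x, y)}, (width, height)) for a sequence pair.

    a is left of b  if a precedes b in both S+ and S-.
    a is below b    if a follows b in S+ and precedes b in S-.
    """
    pos = {b: i for i, b in enumerate(s_pos)}
    x, y = {}, {}
    placed = []
    for b in s_neg:  # every block left of / below b comes earlier in S-
        # Longest path in the HCG: right edge of the blocks to the left.
        x[b] = max((x[a] + sizes[a][0] for a in placed if pos[a] < pos[b]),
                   default=0)
        # Longest path in the VCG: top edge of the blocks below.
        y[b] = max((y[a] + sizes[a][1] for a in placed if pos[a] > pos[b]),
                   default=0)
        placed.append(b)
    width = max(x[b] + sizes[b][0] for b in placed)
    height = max(y[b] + sizes[b][1] for b in placed)
    return {b: (x[b], y[b]) for b in placed}, (width, height)


SIZES = {1: (2, 4), 2: (1, 3), 3: (3, 3), 4: (3, 5),
         5: (3, 2), 6: (5, 3), 7: (1, 2), 8: (2, 4)}
coords, size = sp_to_floorplan([1, 7, 4, 5, 2, 6, 3, 8],
                               [8, 4, 7, 2, 5, 3, 6, 1], SIZES)
```

For the example sequence pair this reproduces the lower-left coordinates in the table and the 11 x 15 floorplan size.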
Floorplan Algorithm – Simulated Annealing
▪ Meta-heuristic algorithm
−Aims to find the “global” optimum (can escape local optima by accepting worse states with some probability)
▪ Concept (find min-cost solution)
−(0) Init: High T
−(1) Change state (S → NewS)
−(2-1) ∆Cost < 0 ➔ Pick changed state (S ← NewS) = Move
−(2-2) ∆Cost ≥ 0 ➔ Pick changed state with prob. = Move
꞉ Higher T = Higher prob.
−(3) T↓
−Iterate (1)~(3) until T < 𝜖
SA – Pseudo-code
▪ T = T0                                   // (0) initialization
▪ i = 0
▪ curr_state = init_state
▪ curr_cost = COST(curr_state)
▪ while (T > Tmin)                         // Iterate (1)~(3) until T ≤ Tmin
▪   while (stopping criterion is not met)  // additional stopping conditions can be added
▪     i = i + 1
▪     trial_state = TRY_MOVE               // (1) Try another state
▪     trial_cost = COST(trial_state)       // check cost
▪     Δcost = trial_cost – curr_cost
▪     if (Δcost < 0)                       // (2-1) MOVE
▪       curr_state = trial_state
▪       curr_cost = trial_cost
▪     else                                 // (2-2) MOVE with prob. e^(–Δcost/T)
▪       r = RANDOM(0,1)
▪       if (r < e^(–Δcost/T))
▪         curr_state = trial_state
▪         curr_cost = trial_cost
▪   T = α∙T                                // (3) T↓ (0 < α < 1)
▪ Return curr_state or the best solution
** You only need COST, MOVE to solve any other problems!
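A runnable Python sketch of the same loop is shown here. The parameter names and default values (`t0`, `t_min`, `alpha`, `moves_per_t`) and the toy cost / move functions are assumptions chosen for illustration; only COST and MOVE need to be replaced to target floorplanning or placement.

```python
import math
import random


def simulated_annealing(init_state, cost, try_move, t0=100.0, t_min=1e-3,
                        alpha=0.9, moves_per_t=50, seed=0):
    """Generic SA loop; returns the best state seen and its cost."""
    rng = random.Random(seed)
    curr, curr_cost = init_state, cost(init_state)
    best, best_cost = curr, curr_cost
    t = t0                                        # (0) start at high T
    while t > t_min:
        for _ in range(moves_per_t):              # inner stopping criterion
            trial = try_move(curr, rng)           # (1) try another state
            delta = cost(trial) - curr_cost
            # (2-1) always accept improvements,
            # (2-2) accept worse states with probability e^(-delta/T)
            if delta < 0 or rng.random() < math.exp(-delta / t):
                curr, curr_cost = trial, curr_cost + delta
                if curr_cost < best_cost:
                    best, best_cost = curr, curr_cost
        t *= alpha                                # (3) cool down
    return best, best_cost


# Toy problem (hypothetical): find the integer closest to 7.
def toy_cost(x):
    return (x - 7) ** 2


def toy_move(x, rng):
    return x + rng.choice((-1, 1))


best, best_cost = simulated_annealing(50, toy_cost, toy_move)
```

Tracking the best state separately from the current state matches the "return curr_state or the best solution" line of the pseudo-code.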
FP with SA – Cost
▪ Goal = minimize total area
➔ Cost = total area
−Polish expression: Bottom-up calculation
꞉ H (blocks 1, 2 stacked): (w, h) = (max(w1, w2), h1+h2)
꞉ V (blocks 1, 2 side by side): (w, h) = (w1+w2, max(h1, h2))
−Sequence pair: HCG & VCG → largest path weight (see “Sequence Pair → Floorplan”)
▪ We only need ‘string’ to calculate FP state cost
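The bottom-up area calculation for a Polish expression maps naturally onto a stack. The sketch below assumes single-character block names and uses the H / V rules stated above; `pe_size` is a hypothetical helper name.

```python
def pe_size(expr, blocks):
    """Evaluate a normalized Polish expression to its (width, height)."""
    stack = []
    for tok in expr:
        if tok in "HV":
            w2, h2 = stack.pop()
            w1, h1 = stack.pop()
            if tok == "H":      # stacked: widths overlap, heights add
                stack.append((max(w1, w2), h1 + h2))
            else:               # side by side: widths add, heights overlap
                stack.append((w1 + w2, max(h1, h2)))
        else:
            stack.append(blocks[tok])
    if len(stack) != 1:
        raise ValueError("not a valid Polish expression")
    return stack[0]


BLOCKS = {"1": (2, 4), "2": (1, 3), "3": (3, 3), "4": (3, 5),
          "5": (3, 2), "6": (5, 3), "7": (1, 2), "8": (2, 4)}
size = pe_size("25V1H374VH6V8VH", BLOCKS)
```

With the block sizes used in these slides, the expression 25V1H374VH6V8VH evaluates to 11 x 15, so the SA cost is simply width times height of the returned pair.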
FP with SA – Cost
▪ Ex) P.E.: 25V1H374VH6V8VH (Normalized)
−Block (w, h): (2,4), (1,3), (3,3), (3,5), (3,2), (5,3), (1,2), (2,4)
−H: up / down, V: left / right
FP with SA – Move (P.E.)
▪ Ex) P.E.: 25V1H374VH6V8VH (Normalized)
−Block (w, h): (2,4), (1,3), (3,3), (3,5), (3,2), (5,3), (1,2), (2,4)
▪ Swap 3&7 ➔ 25V1H734VH6V8VH
−Area: 11x15 → 13x14
FP with SA – Move (P.E.)
▪ Ex) P.E.: 25V1H734VH6V8VH (Normalized)
−Block (w, h): (2,4), (1,3), (3,3), (3,5), (3,2), (5,3), (1,2), (2,4)
▪ Complement the chain ➔ 25V1H734VH6V8HV
−Area: 13x14 → 15x11
FP with SA – Move (P.E.)
▪ Ex) P.E.: 25V1H734VH6V8HV (Normalized)
−Block (w, h): (2,4), (1,3), (3,3), (3,5), (3,2), (5,3), (1,2), (2,4)
▪ Swap 6&V ➔ 25V1H734VHV68HV
−Area: 15x11 → 15x7
FP with SA – Move (S.P.)
▪ Ex) (S+, S-) = (<17452638>, <84725361>)
−Block (w, h): (2,4), (1,3), (3,3), (3,5), (3,2), (5,3), (1,2), (2,4)
Block #   HCG (x)   VCG (y)
1         0         11
2         3         4
3         6         4
4         0         4
5         3         7
6         6         7
7         0         9
8         0         0
FP with SA – Move (S.P.)
▪ Ex) (S+, S-) = (<17452638>, <84725361>)
−Block (w, h): (2,4), (1,3), (3,3), (3,5), (3,2), (5,3), (1,2), (2,4)
▪ Swap 1 & 3 in S+ → (<37452618>, <84725361>)
−Area: 11x15 → 13x14
Block #   HCG (x)   VCG (y)
1         11        4
2         3         4
3         0         11
4         0         4
5         3         7
6         6         4
7         0         9
8         0         0
FP with SA – Move (S.P.)
▪ Ex) (S+, S-) = (<37452618>, <84725361>)
−Block (w, h): (2,4), (1,3), (3,3), (3,5), (3,2), (5,3), (1,2), (2,4)
▪ Swap 4, 6 in both → (<37652418>, <86725341>)
−Area: 13x14 → 13x12
Block #   HCG (x)   VCG (y)
1         11        4
2         5         4
3         0         9
4         8         4
5         5         7
6         0         4
7         0         7
8         0         0
FP with SA – Example
Recent works – AI application
▪ Macro placement by AI (Reinforcement learning)
−(Google, Nature’20)
Placement
Intro.
Partitioning (Clustering)
Chip Planning (Floorplanning)
Placement
Clock Tree Synthesis
Routing
Timing Closure
Intro.
(Figure: “Good” and “Bad” placements of the same cells a to l)
Metric: Wirelength
▪ NOT routed → “estimated” values
▪ 2-pin net: Manhattan distance
▪ 3-pins or more: Half-perimeter wirelength (HPWL)
−HPWL = bounding box (BB) width + BB height
(Figure labels: Steiner point, BB width, BB height)
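HPWL is cheap to compute because it only needs the bounding box of a net's pins. A minimal Python sketch (the function name `hpwl` and the pin coordinates are illustrative):

```python
def hpwl(pins):
    """Half-perimeter wirelength of one net given its pin (x, y) locations."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


two_pin = hpwl([(0, 0), (3, 4)])            # equals the Manhattan distance
three_pin = hpwl([(1, 1), (4, 2), (2, 5)])  # BB width 3 + BB height 4
```

For a 2-pin net the value coincides with the Manhattan distance, which is why one estimator can serve every net.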
Placement – Overview
▪ 1. Global placement:
−“Minimize $WL$ s.t. $D_b(x, y) \le M_b$ for $\forall b$”
▼ Global placement result (~300 iterations)
−Initial iterations (Iter. 1, Iter. 3, Iter. 5) … ➔ Spread cells
Placement – Overview
▪ 2. Detailed placement
−Overlap → 0
−Align standard cells to rows
▼ Detail placement result
(Figure: NOT and NAND2 cells aligned to rows between VDD and VSS rails)
Global placement
▪ Partitioning-based: Min-cut placement (1980s)
▪ Stochastic: Simulated annealing (1990s)
▪ Analytic (2000s~): Quadratic placement, Force-directed placement
Min-cut placement: Concept
▪ 1 sub-netlist in 1 sub-area
▪ Min-cut = inter-area WL↓ = total WL ↓
Min-cut placement: Concept
▪ (Ex.)
(Figure: cells 1 to 6 are split recursively: cut1 first, then cut2L / cut2R, then cut3TL / cut3TR / cut3BL / cut3BR)
Min-cut placement: Example
Clique-based, NOT hypergraph
Thin edge weight: 0.5
Thick edge weight: 1
All node (instance) area = same
Min-cut placement: Example
▪ Cut 1: Full graph → L & R
▪ Cut 2: Left sub-graph → U & D
−o, k, g can be close to any partition
Min-cut placement: Example
▪ Cut 3: Right sub-graph → U & D
−n, j, f, e: “Terminals” ➔ Add dummy nodes (p1, p2)
Min-cut placement: Example
▪ Cut 4: Up-left sub-graph → L & R
−o, k, g: Close to R partition ➔ Add dummy node (p1, FIXED at R)
−e, f, a can be close to any partition → NO dummy node
Min-cut placement: Example
▪ Cut 5: Down-left sub-graph → L & R
−i, j, g: “Terminals” ➔ Add dummy nodes (p1, p2 and p3)
꞉ p1: FIXED at L, p2 & p3: FIXED at R
Min-cut placement: Example
▪ Cut 6: Up-right sub-graph → L & R
−n, j: Close to L partition ➔ Add dummy node (p1, FIXED at L)
−g, d, l can be close to any partition → NO dummy node
Min-cut placement: Example
▪ Cut 7: Down-right sub-graph → L & R
−j, f, b, o, k: Close to L partition ➔ Add p1 & p2, FIXED at L
−p, h: Close to R partition ➔ Add p3, FIXED at R
Min-cut placement: Example
▪ U & D for all 8 sub-graphs & sub-areas
▪ High-weight edges (strongly connected) are “short”
−Per-net HPWL (BB) in the figure: 1, 2, 2, 2, 2, 2, 1, 2, 3, 1, 1, 2, 2
➔ Total HPWL (BB) = 23 units
Min-cut placement: Example
▪ Better partitioning == Better HPWL
HPWL(BB) = 27
HPWL(BB) = 23
Simulated annealing – Concept
▪ SAME as in floorplan
−(0) Init: High T
−(1) Change state (S → NewS)
−(2-1) ∆Cost < 0 ➔ Pick changed state (S ← NewS) = Move
−(2-2) ∆Cost ≥ 0 ➔ Pick changed state with prob. = Move
꞉ Higher T = Higher prob.
−(3) T↓
꞉ = less prob. of move when ∆Cost ≥ 0 in (2-2)
−Iterate (1)~(3) until T < 𝜖
▪ Cost: HPWL, …
▪ Move:
SA – Example (TimberWolf 7.0)
SA – Example (TimberWolf 7.0)
▪ 1st move: Swap b & e
▪ Calculate ∆cost (= HPWL change)
SA – Example (TimberWolf 7.0)
▪ 2nd move: Swap m & o
▪ Calculate ∆cost
−Ex) 20 – 26 = –6
(Other values labeled in the figure: –12, –12, –4)
Analytic placement – Quadratic
▪ $L(P) = \sum_{i,j=1}^{n} c_{ij} \left[ (x_i - x_j)^2 + (y_i - y_j)^2 \right]$
−$P$: placement result
−$(x_i, y_i)$: Coordinate of Cell i
−$n$: Total cell count
−$c_{ij}$: Connection cost between cell i & cell j
▪ Convex optimization
➔ Local optimum == Global optimum
(Figure: convex vs. concave function)
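Because the objective is a convex quadratic, setting each partial derivative to zero places every movable cell at the weighted average of its neighbors. The sketch below solves the x-coordinates only (y is independent and handled identically) by repeating that averaging step; it is an illustrative Gauss-Seidel iteration on a tiny hypothetical netlist, not the solver used in production placers, and it assumes every movable cell has at least one connection.

```python
def quadratic_place_1d(movable, fixed, nets, iters=200):
    """Minimize sum of w * (x_a - x_b)^2 over 2-pin connections (a, b, w).

    fixed maps pad names to x-coordinates; movable cells start at 0.
    """
    x = dict(fixed)
    x.update({cell: 0.0 for cell in movable})
    for _ in range(iters):
        for cell in movable:
            num = den = 0.0
            for a, b, w in nets:
                if a == cell:
                    other = b
                elif b == cell:
                    other = a
                else:
                    continue
                num += w * x[other]
                den += w
            x[cell] = num / den   # dL/dx_cell = 0  =>  weighted average
    return {cell: x[cell] for cell in movable}


# Hypothetical chain: pad p0 at x=0, cells a and b, pad p1 at x=3.
result = quadratic_place_1d(
    movable=["a", "b"],
    fixed={"p0": 0.0, "p1": 3.0},
    nets=[("p0", "a", 1.0), ("a", "b", 1.0), ("b", "p1", 1.0)],
)
```

The two cells end up evenly spaced between the pads (x close to 1 and 2). Without fixed pads every cell would collapse to a single point, which is the overlap problem quadratic placement has to fix afterwards.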
Analytic placement – Quadratic
▪ Millions of equations (derivative)
➔ Cannot find P directly
➔ Find P that is closer to the optimal point
➔ Multiple iterations required
Analytic placement – Quadratic
▪ Overlap is not modeled ➔ Cells gather in the center
▪ Reduce overlap (~20%) & Maintain relative positions
Analytic placement – Force-directed
▪ Map cell placement problem to ‘force’ (≈ spring)
−Too far: Pulling each other → Closer
−Too close (overlap): Pushing each other → Apart
▪ Stage 1: Net force only, similar to quadratic-based
▪ Stage 2: Move force & hold force
−Density → ‘Potential’ ($\phi$)
−$\phi'$ (force) = 0
➔ $F^{net} + F^{move} + F^{hold} = 0$
Analytic placement – Force-directed
▪ Example
Stage 1
Iter. = 1
Iter. = 3
Stage 2
Iter. = 5
* Overlap remains
Iter. ~300
Analytic placement – Non-linear
▪ Non-differentiable ➔ Convert to differentiable equations
−ex) WL: ABS → log-sum-exp, weighted average, …
−ex) Density: Smoothed (bell-shaped, …)
▪ Various models
−mPL (UCLA); NTUplace (NTU); RePlAce (UCSD); …
Detailed placement
▪ Legalization
−Eliminate overlap (→ 0)
−Align to standard cell rows
꞉ orig. – flip – orig. – flip – …
(Figure: INV standard cells (Vdd, Gnd, In, Out) placed with overlap = 0 in standard cell rows; VDD and VSS rails alternate, so every other row is flipped)
Recent works – DREAMPlace
▪ Deep Learning Toolkit-Enabled GPU Acceleration for
Modern VLSI Placement
▪ Utilize GPU ➔ 30x↑ speedup (1.0)
▪ Open-source available
(https://github.com/limbo018/DREAMPlace)
Recent works – AutoDMP (NVIDIA, ISPD’23)
▪ Place macros & standard cells concurrently
−Better results // high complexity (‘cell’ size difference, …)
▪ Leveraged ‘open-source’ placement tool (DREAMPlace 4.0)
−Available at github (https://github.com/NVlabs/AutoDMP)
▪ https://github.com/NVlabs/AutoDMP/blob/main/images/mempool.gif
Recent works – AutoDMP (NVIDIA, ISPD’23)
▪ Utilize AI engine
▲ GPU-accelerated mixed-size placement (NVIDIA)
https://developer.nvidia.com/blog/autodmp-optimizes-macro-placement-for-chip-design-with-ai-and-gpus/
Placement with CAD tool
▪ Run with a command
−Ex. 1) Synopsys IC Compiler II
꞉ create_placement
꞉ place_opt (initial_place, initial_drc, initial_opto, final_place, final_opto)
−Ex. 2) Cadence Innovus
꞉ place_design
꞉ optDesign -preCTS
꞉ place_opt_design → 2-in-1
▪ Internal algorithm is “classified”
−Min-cut based + Analytic
−Parallel computing → lots of iterations
▪ Tune with parameters (100+)
−Turn on/off, iteration count, effort, …
Placement with CAD tool
▪ Ex) Cadence Innovus parameters
−Global placement
Placement with CAD tool
▪ Ex) Cadence Innovus parameters
−Detailed placement
Clock Tree Synthesis (CTS)
Intro.
Partitioning (Clustering)
Chip Planning (Floorplanning)
Placement
Clock Tree Synthesis
Routing
Timing Closure
Intro.: Clock signal
▪ Clock signal oscillates with clock period
−1/(min c.p.) = max clock frequency ➔ Performance (ex. n GHz)
(Figure: three Combinational Logic + FF stages driven by one Clock; the clock period is marked on the clock waveform)
▪ Clock source → N clock sinks (FFs, macro blocks)
Intro.: Clock skew
▪ Setup time constraint: $t_p + t_{c,\max} + t_{su} < t_{clk}$
−Clock arrives at FF1 later: signal arrives later than next clock edge
➔ Longer clock period for ‘margin’ ➔ less performance
➔ Less clock arrival time difference = better performance
▪ Clock skew: Max diff. of clock arrival times among sinks
−$skew(T) = \max_{s_i, s_j \in S} |t(s_0, s_i) - t(s_0, s_j)|$
(Figure: clock waveforms at FF1 and FF2 with the path delay limit; three Combinational Logic + FF stages driven by one Clock)
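The skew definition reduces to "latest arrival minus earliest arrival", since the largest pairwise difference is always between those two sinks. A small Python sketch with made-up arrival times follows; reading the setup symbols as clock-to-Q delay, maximum combinational delay, and setup time is an assumption, as the slide does not define them.

```python
def clock_skew(arrival):
    """Max |t(s0, si) - t(s0, sj)| over all sink pairs = max - min arrival."""
    times = list(arrival.values())
    return max(times) - min(times)


def setup_ok(t_p, t_c_max, t_su, t_clk):
    """Setup check t_p + t_c,max + t_su < t_clk (symbol meanings assumed)."""
    return t_p + t_c_max + t_su < t_clk


# Hypothetical clock arrival times (ns) from the source s0 to three sinks.
skew = clock_skew({"FF1": 1.25, "FF2": 1.0, "FF3": 1.5})
meets = setup_ok(t_p=0.1, t_c_max=0.6, t_su=0.05, t_clk=1.0)
fails = setup_ok(t_p=0.1, t_c_max=0.9, t_su=0.05, t_clk=1.0)
```

Any skew that cannot be removed has to be covered by extra margin in the clock period, which is why CTS tries to minimize it.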
Intro.: Clock network
▼Clock tree
▼Clock spine
▼Clock mesh
▪ Clock arrival time (source → sink) – mesh < spine < tree
▪ Power consumption – mesh >> spine > tree
−Clock power = 30%~40% of total power
➔ Clock tree: Easier to control clock skew & lightweight
CTS methods
▪ 1. Clock tree construction (global)
−Top-down: Method of Means and Medians (MMM)
−Bottom-up: Recursive Geometric Matching (RGM)
−Both: Deferred-Merge Embedding (DME)
▪ 2. Clock buffer insertion + skew optimization (detail)
Method of Means and Medians (MMM)
▪ Top-down approach (source → sink, ≈ partitioning)
▪ Builds a ‘balanced’ tree
−A special case of MMM = H-tree (regular distribution)
▪ Zero skew is not guaranteed
Recursive Geometric Matching (RGM)
▪ Bottom-up approach (sink → source, ≈ clustering)
▪ Zero-skew embedding
▪ Better than MMM (WL & balance)
▲ MMM: V cut, then H cut
▲ RGM
RGM – Exact Zero Skew
▪ Tapping point satisfies zero skew embedding (tree)
▪ (tp→s1) + (s1→Ts1 sinks) = (tp→s2) + (s2→Ts2 sinks)
(Figure: tapping point tp on the wire between s1 (root of subtree Ts1) and s2 (root of subtree Ts2))
Delay calculation: Elmore delay model
Source: Washington State Univ. – EE582 (Prof. Dae Hyun Kim)
RGM – Exact Zero Skew
▪ Wire s1~s2 (length L) is split at tapping point tp: w1 = tp→s1 (length zL), w2 = tp→s2 (length (1 – z)L)
−Tapping point tp: where Elmore delay to sinks is equalized
−Wire model: R(w) in series with C(w)/2 at each end; subtree Ts1 has delay t(Ts1) and load C(s1), subtree Ts2 has delay t(Ts2) and load C(s2)
▪ tp→s1 (w1 delay) = $R(w_1) \cdot \left( \frac{C(w_1)}{2} + C(s_1) \right) = R_w zL \cdot \left( \frac{C_w zL}{2} + C(s_1) \right)$ – ①
▪ s1→Ts1 sinks = $t(T_{s1})$ – ② (from last loop)
▪ tp→s2 (w2 delay) = $R(w_2) \cdot \left( \frac{C(w_2)}{2} + C(s_2) \right) = R_w (1-z)L \cdot \left( \frac{C_w (1-z)L}{2} + C(s_2) \right)$ – ③
▪ s2→Ts2 sinks = $t(T_{s2})$ – ④ (from last loop)
▪ ① + ② = ③ + ④ → get $z$ → get tp coordinate
−$z \to 0$ ➔ $tp \to s_1$ // $z \to 1$ ➔ $tp \to s_2$
−$z < 0$ or $z > 1$ ➔ Can’t find zero-skew point
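Solving ① + ② = ③ + ④ for z gives a closed form, because the quadratic terms in z cancel: z = (t(Ts2) − t(Ts1) + R_w·L·(C(s2) + C_w·L/2)) / (R_w·L·(C_w·L + C(s1) + C(s2))). The Python sketch below evaluates that expression and checks that both sides see the same Elmore delay. The function names and all numeric values are hypothetical; `r` and `c` stand for the per-unit-length wire resistance R_w and capacitance C_w.

```python
def tapping_point(r, c, length, t1, cap1, t2, cap2):
    """Return z so that the delay via s1 equals the delay via s2."""
    num = t2 - t1 + r * length * (cap2 + c * length / 2)
    den = r * length * (c * length + cap1 + cap2)
    return num / den


def delay_via(r, c, wire_len, t_sub, cap_sub):
    """Wire delay R*(C/2 + C_load) plus the subtree delay (terms 1+2 or 3+4)."""
    return r * wire_len * (c * wire_len / 2 + cap_sub) + t_sub


# Hypothetical values: r = 0.1, c = 0.2, L = 10, unbalanced subtrees.
r, c, L = 0.1, 0.2, 10.0
z = tapping_point(r, c, L, t1=5.0, cap1=1.0, t2=2.0, cap2=3.0)
left = delay_via(r, c, z * L, t_sub=5.0, cap_sub=1.0)
right = delay_via(r, c, (1 - z) * L, t_sub=2.0, cap_sub=3.0)
```

In this example the slower subtree Ts1 pulls the tapping point toward s1 (z is about 0.17), and identical subtrees give z = 0.5.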
Deferred-Merge Embedding (DME)
▪ Phase 1: Bottom-up (≈ RGM)
−Find merging segment (MS) ➔ zero skew
▪ Phase 2: Top-down
−Pick a point from each MS ➔ min. WL
(Figure: sinks s1 to s4, internal nodes u1 to u3, and source s0)
DME – Phase 1
▪ Children → Parent: Find MS
−Point & Point = Segment
꞉ Tilted Rectangular Region (TRR): Core + Radius
꞉ Ex) Manhattan distance (s1, s2) = 4 ➔ TRRs with cores s1, s2 and radius = 2 ➔ Merging Segment (MS)
DME – Phase 1
▪ Children → Parent: Find MS
−Segment & Segment = Segment
꞉ TRR for a segment: Core = segment, Radius (= 2 in the figure)
꞉ Ex) ms(u1) from s1, s2; ms(u2) from s3, s4; ms(u3) from trr(u1), trr(u2)
꞉ a+b=5, a+2=b+1 ➔ a=2, b=3
DME – Phase 1
▪ Example
(Figure: Phase 1 step by step for sinks s1 to s8 and source s0)
DME – Phase 2
▪ Parent → Children: Pick an optimal (min. distance) point
▼ Phase 2
▼ Phase 1
DME – Phase 2
▪ Example
(Figure: Phase 2 step by step for sinks s1 to s8 and source s0)
Clock buffer insertion
▪ Buffer: 2 serial inverters → NO logic function (Z = A)
▪ Reduce load capacitance
−Without buffer: vB drives d, e, f, g, h (C = 1 each) ➔ Load cap.: 5
−With buffer y (C(y) = 1) in front of f, g, h: vB drives d, e, y ➔ Load cap.: 3
Clock buffer insertion
▪ Clock: 1 source → Lots of sinks → Buffer is essential
+ @ (Skew optimization)
▪ Wire = metal + via (complex)
▪ Buffer = instance → Cannot place anywhere
▪ Extra Clock skew optimization
−Buffer sizing (ex. INV_X1/2/4/8/16)
−Wire sizing / snaking
CTS with CAD tool
▪ CTS + post-CTS optimization = 1 set
▪ ICC2: clock_opt (-build_clock, -route_clock, -final_opto)
▪ Innovus: clockDesign (CTS) + optDesign (opt.)
−ccopt_design (Clock Concurrent Optimization)
▪ Synopsys ICC2
−Floorplan
−Placement & opt.: place_opt (5 stg)
−CTS & opt.: clock_opt (3 stg)
−Route & opt.: route_auto, route_opt
▪ Cadence Innovus
−Floorplan
−Placement & opt.: placeDesign, optDesign -preCTS (or place_opt_design)
−CTS & opt.: clockDesign, optDesign -postCTS (or ccopt_design)
−Route & opt.: routeDesign, optDesign -postRoute
CTS with CAD tool
▪ Ex) Cadence Innovus parameters
−CTS
CTS with CAD tool
▪ Ex) Cadence Innovus parameters
−ccopt
CTS with CAD tool
▪ Example) AES-128, ASAP 7nm PDK
−118,370 cells
−10,688 FFs
−705 clock buffers/inverters
Recent work – Multi-bit flip-flop (MBFF)
▪ Multiple FFs share internal inverter(s) ➔ MBFF
▪ Banking / Debanking
−Banking: Power↓, Area↓, Performance↓ (timing degradation)
▲ Single-bit FF
▲ MBFF (2 bits)
(Figure sources: ICCAD’22, ICCAD’24 contest)
Routing
Intro.
Partitioning (Clustering)
Chip Planning (Floorplanning)
Placement
Clock Tree Synthesis
Routing
Timing Closure
Intro: Routing
▪ Connect instances (rectilinear)
▼ Placement
▼ Global Routing (GR) (Coarse-grained)
▼ Detail Routing (DR) (Fine-grained)
(Figure: nets N1, N2, N3 connecting the cells after each step)
Intro: Global Routing (GR)
▪ Input: Cell (pin) locations & connections (nets)
−Circuit ➔ Grid graph
꞉ 𝑽 – Global bin
꞉ 𝑬 – Edge with capacity 𝑐𝑖𝑗 (available wire count, “track”)
꞉ Ex) Capacity = 8 for every edge in the figure
Intro: Detailed Routing (DR)
▪ Input: GR result
▪ Output: ‘real’ routing result (ex. wires on M1, M2)
▪ ALL nets should be connected
▪ Zero (minimized) design rule violations (DRVs)
−Remaining DRVs: Engineering change order (ECO) after routing
−Ex) EOL spacing, parallel run spacing, minimum area
Routing algorithms
▪ Single-net
−2-pins: Maze routing, Pattern routing
−Multi-pins: RMST, RSMT
▪ Multi-net
−Sequential: Net ordering + Rip-up and Reroute (RRR)
−Concurrent: ILP
▪ Single layer (2D) → Multi-layer (3D)
−2D routing + Layer assignment
−3D routing
2-pin net: Maze routing
▪ Routing = Find path (S → T)
−Min-cost path = shortest routing result
(Figure: three S → T routes on a grid; one is marked “Shortest”)
Dijkstra algorithm
▪ (Prerequisite: ALL moving costs ≥ 0, i.e., no negative cost)
▪ Set S, T, Select S
▪ Repeat until T is selected:
−Update ‘cost’ & ‘from’ of neighboring grids
꞉ (‘Cost’ = accumulated cost from S)
−Select the non-visited grid with the minimum cost
6 5 4 5 6 7 8
5 4 3 4 5 T 7
4 3 2 3 4 5 6
3 2 1 2 3 4 5
2 1 S 1 2 3 4
3 2 1 2 3 4 5
▲ Accumulated cost from S in each grid
Dijkstra algorithm
▪ Example) Find min-cost path (S → T)
−Cost = ∑w1 + ∑w2
▪Set S, T, Select S
Edge weight: (w1, w2)
Node   Cost    From
1 (S)  0       X
2      -       -
3      -       -
4      -       -
5      -       -
6      -       -
7      -       -
8 (T)  -       -
9      -       -
(Figure: 9-node graph, S = node 1, T = node 8, edge weights (w1, w2))
Dijkstra algorithm – Example
▪ Update ‘cost’ & ‘from’ of neighboring grids
−Cost exists → update if the new cost is smaller
▪ Select a grid with non-visited & min-cost
−Tie-breaker = random
Node   Cost    From
1 (S)  0       X
2      8,6     1
3      -       -
4      1,4     1
5      -       -
6      -       -
7      -       -
8 (T)  -       -
9      -       -
(Figure: 9-node graph, S = node 1, T = node 8, edge weights (w1, w2))
Dijkstra algorithm – Example
▪ Update ‘cost’ & ‘from’ of neighboring grids
−Cost exists → update if the new cost is smaller
▪ Select a grid with non-visited & min-cost
−Tie-breaker = random
Node   Cost    From
1 (S)  0       X
2      8,6     1
3      -       -
4      1,4     1
5      10,11   4
6      -       -
7      9,12    4
8 (T)  -       -
9      -       -
(Figure: 9-node graph, S = node 1, T = node 8, edge weights (w1, w2))
Dijkstra algorithm – Example
▪ Update ‘cost’ & ‘from’ of neighboring grids
−Cost exists → update if the new cost is smaller
▪ Select a grid with non-visited & min-cost
−Tie-breaker = random
Node   Cost    From
1 (S)  0       X
2      8,6     1
3      9,10    2
4      1,4     1
5      10,11   4
6      -       -
7      9,12    4
8 (T)  -       -
9      -       -
Node 5: 10,12 (from 2) > 10,11 (from 4) ➔ NO update
(Figure: 9-node graph, S = node 1, T = node 8, edge weights (w1, w2))
Dijkstra algorithm – Example
▪ Update ‘cost’ & ‘from’ of neighboring grids
−Cost exists → update if the new cost is smaller
▪ Select a grid with non-visited & min-cost
−Tie-breaker = random
Node   Cost    From
1 (S)  0       X
2      8,6     1
3      9,10    2
4      1,4     1
5      10,11   4
6      18,18   3
7      9,12    4
8 (T)  -       -
9      -       -
(Figure: 9-node graph, S = node 1, T = node 8, edge weights (w1, w2))
Dijkstra algorithm – Example
▪ Update ‘cost’ & ‘from’ of neighboring grids
−Cost exists → update if the new cost is smaller
▪ Select a grid with non-visited & min-cost
−Tie-breaker = random
Node   Cost    From
1 (S)  0       X
2      8,6     1
3      9,10    2
4      1,4     1
5      10,11   4
6      12,19   5
7      9,12    4
8 (T)  12,19   5
9      -       -
Node 6: 12,19 (from 5) < 18,18 (from 3) ➔ update
(Figure: 9-node graph, S = node 1, T = node 8, edge weights (w1, w2))
Dijkstra algorithm – Example
▪ Update ‘cost’ & ‘from’ of neighboring grids
−Cost exists → update if the new cost is smaller
▪ Select a grid with non-visited & min-cost
−Tie-breaker = random
Node   Cost    From
1 (S)  0       X
2      8,6     1
3      9,10    2
4      1,4     1
5      10,11   4
6      12,19   5
7      9,12    4
8 (T)  12,14   7
9      -       -
Node 8 (T): 12,14 (from 7) < 12,19 (from 5) ➔ update
(Figure: 9-node graph, S = node 1, T = node 8, edge weights (w1, w2))
Dijkstra algorithm – Example
▪ Backtrack (T → S) with ‘from’
▪ Result: 1 → 4 → 7 → 8
− Cost: 12, 14
Node   Cost    From
1 (S)  0       X
2      8,6     1
3      9,10    2
4      1,4     1
5      10,11   4
6      12,19   5
7      9,12    4
8 (T)  12,14   7
9      -       -
(Figure: 9-node graph with the path 1 → 4 → 7 → 8 highlighted, edge weights (w1, w2))
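The worked example above can be replayed with a short Python sketch. The edge list is reconstructed from the cost tables, assuming the nine nodes form a 3×3 grid numbered row by row; the weights of edges 6–9 and 8–9 cannot be recovered from the tables and are assigned arbitrarily (they do not affect the S → T result). Ties are broken by node number here rather than randomly.

```python
import heapq

# Edge weights (w1, w2); topology assumed to be a 3x3 grid, nodes 1..9 row by row.
EDGES = {
    (1, 2): (8, 6), (2, 3): (1, 4),
    (4, 5): (9, 7), (5, 6): (2, 8),
    (7, 8): (3, 2), (8, 9): (3, 3),                    # (8, 9): assumed weight
    (1, 4): (1, 4), (2, 5): (2, 6), (3, 6): (9, 8),
    (4, 7): (8, 8), (5, 8): (2, 8), (6, 9): (4, 5),    # (6, 9): assumed weight
}


def dijkstra(edges, source, target):
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))

    cost = {source: (0, 0)}     # accumulated (sum of w1, sum of w2) from S
    prev = {source: None}       # the 'From' column
    visited = set()
    heap = [(0, source)]        # (scalar cost = sum w1 + sum w2, node)
    while heap:
        _, u = heapq.heappop(heap)
        if u in visited:
            continue            # stale heap entry
        visited.add(u)          # "select" the min-cost non-visited node
        if u == target:
            break
        for v, (w1, w2) in adj[u]:
            if v in visited:
                continue
            new = (cost[u][0] + w1, cost[u][1] + w2)
            if v not in cost or sum(new) < sum(cost[v]):   # update only if smaller
                cost[v], prev[v] = new, u
                heapq.heappush(heap, (sum(new), v))

    path, node = [], target     # backtrack T -> S with 'from'
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1], cost[target]


path, total = dijkstra(EDGES, source=1, target=8)
print(path, total)              # [1, 4, 7, 8] (12, 14)
```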
Best-first-search
▪ Cost in Dijkstra: S → here
▪ Cost in Best-first-search: here → T
▪ Redundant search ↓ // Result cannot guarantee min-cost
https://theory.stanford.edu/~amitp/GameProgramming/AStarComparison.html
▲ Dijkstra
▲ Best-first-search
A* search
▪ Cost in A*: Dijkstra cost + Best-first-search cost
−[S → here] + [here → T]
https://theory.stanford.edu/~amitp/GameProgramming/AStarComparison.html
≈
▲A* search
▲Best-first-search
A* search
▪ Redundant search ↓ // Guarantees min-cost (when the [here → T] estimate never overestimates the real cost)
https://theory.stanford.edu/~amitp/GameProgramming/AStarComparison.html
▲Dijkstra ▲Best-first-search ▲A* search
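A small Python sketch comparing the two searches on a unit-cost grid. The function `search` is a hypothetical helper: with `use_heuristic=True` it orders grids by [S → here] + Manhattan distance [here → T] (A*), and with `use_heuristic=False` the estimate is zero, which reduces it to Dijkstra. The 7×6 grid size and the S/T positions are assumptions chosen for illustration.

```python
import heapq


def search(width, height, start, goal, blocked=frozenset(), use_heuristic=True):
    """Return (path cost, number of expanded grids) on a 4-neighbor unit-cost grid."""
    def h(p):
        # Manhattan distance never overestimates on this grid, so A* stays optimal.
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1]) if use_heuristic else 0

    g = {start: 0}                              # cost S -> here
    heap = [(h(start), 0, start)]               # (g + h, g, grid)
    closed = set()
    while heap:
        _, cost, cur = heapq.heappop(heap)
        if cur in closed:
            continue
        closed.add(cur)
        if cur == goal:
            return cost, len(closed)
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if not inside or nxt in blocked:
                continue
            if nxt not in g or cost + 1 < g[nxt]:
                g[nxt] = cost + 1
                heapq.heappush(heap, (cost + 1 + h(nxt), cost + 1, nxt))
    return None, len(closed)


W, H, S, T = 7, 6, (2, 4), (5, 1)
a_cost, a_expanded = search(W, H, S, T, use_heuristic=True)
d_cost, d_expanded = search(W, H, S, T, use_heuristic=False)
print(a_cost, d_cost, a_expanded < d_expanded)
```

Both searches return the same path cost; A* simply expands fewer grids before reaching T.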
Pattern routing
▪ Connect wire with pre-defined ‘patterns’
−L-shape, Z-shape, U-shape, …
▪ Practical
−less memory, easy patterns, …
▪ Less flexible, NOT min. length (multi-nets)
▲ L-shape ▲ Z-shape ▲ Monotonic
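A minimal Python sketch of pattern routing for a 2-pin net, enumerating the L-shape and Z-shape candidates as lists of corner points. The helper names (`l_shapes`, `z_shapes`, `length`) are hypothetical; a real router would additionally score each candidate against congestion.

```python
def l_shapes(s, t):
    """The (up to) two L-shaped routes from s to t, as lists of corner points."""
    (sx, sy), (tx, ty) = s, t
    if sx == tx or sy == ty:
        return [[s, t]]                       # already a straight wire
    return [[s, (tx, sy), t],                 # horizontal first, then vertical
            [s, (sx, ty), t]]                 # vertical first, then horizontal


def z_shapes(s, t):
    """Z-shaped routes: two bends, middle segment strictly between s and t."""
    (sx, sy), (tx, ty) = s, t
    if sx == tx or sy == ty:
        return []                             # no bend needed
    routes = []
    step = 1 if tx > sx else -1
    for x in range(sx + step, tx, step):      # vertical middle segment at column x
        routes.append([s, (x, sy), (x, ty), t])
    step = 1 if ty > sy else -1
    for y in range(sy + step, ty, step):      # horizontal middle segment at row y
        routes.append([s, (sx, y), (tx, y), t])
    return routes


def length(route):
    return sum(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a, b in zip(route, route[1:]))


candidates = l_shapes((0, 0), (3, 2)) + z_shapes((0, 0), (3, 2))
print(len(candidates), {length(r) for r in candidates})
```

Every L/Z candidate of a 2-pin net has the same (minimum) length, so the choice among them is made on congestion, not wirelength.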
N-pin net – RMST / RSMT
▪ Rectilinear minimum spanning tree (RMST)
−n-pin net ➔ (n-1) 2-pin nets
꞉ Fast ($O(n \log n)$ ~ $O(n^2)$) // WL↑
▪ Rectilinear Steiner minimum tree (RSMT)
−Add Steiner point(s) → build tree with 2-pin nets
꞉ WL↓ // Finding the Steiner points for RSMT is NP-hard
▲ RMST: Length = 11
▲ RSMT (with a Steiner point): Length = 10
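A short Python sketch of the RMST/RSMT difference. `rmst` is Prim's algorithm on the complete Manhattan-distance graph (the $O(n^2)$ variant); `steiner_3pin` uses the fact that for exactly three pins the optimal Steiner point is the coordinate-wise median. The pin locations are hypothetical and do not correspond to the figure.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def rmst(pins):
    """Prim's algorithm on the complete Manhattan-distance graph, O(n^2)."""
    # best[p] = (distance to the tree, tree pin realizing that distance)
    best = {p: (manhattan(p, pins[0]), pins[0]) for p in pins[1:]}
    edges = []
    while best:
        p = min(best, key=lambda q: best[q][0])   # closest pin outside the tree
        dist, parent = best.pop(p)
        edges.append((parent, p, dist))           # one 2-pin net of the n-1
        for q in best:                            # relax distances through p
            d = manhattan(p, q)
            if d < best[q][0]:
                best[q] = (d, p)
    return edges


def steiner_3pin(pins):
    """For exactly three pins, the optimal Steiner point is the coordinate-wise median."""
    xs = sorted(p[0] for p in pins)
    ys = sorted(p[1] for p in pins)
    return (xs[1], ys[1])


pins = [(0, 0), (4, 0), (2, 3)]                   # hypothetical pin locations
rmst_len = sum(d for _, _, d in rmst(pins))
sp = steiner_3pin(pins)
rsmt_len = sum(manhattan(sp, p) for p in pins)
print(rmst_len, sp, rsmt_len)                     # 9 (2, 0) 7
```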
N-pin net – Heuristics
▪ Ex.) Consecutive 2-pin net routing ➔ Fine-tuning at last
▪ Ex.) Consecutive L-shape routing
(Figure: step-by-step routing order for the two heuristic examples; pins are labeled by number)
N-pin net – MST + RS(M)T
▪ (1) Build MST → (2) Convert to RS(M)T
▲ Initial MST ▲ Steiner tree (RSMT)
N-pin net – (1) Build MST
▪ Example graph: 9 nodes, 14 edges → “pick” 8 edges
https://www.geeksforgeeks.org/prims-minimum-spanning-tree-mst-greedy-algo-5/
N-pin net – (1) Build MST (Prim’s)
▪ Prim’s algorithm: pick a min. weight adjacent edge
N-pin net – (1) Build MST (Kruskal’s)
▪ Kruskal’s algorithm
−Pick a min. weight edge from the whole graph
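A minimal Python sketch of Kruskal's algorithm with a union-find structure. The edge list below is my transcription of the commonly used 9-node, 14-edge example graph and should be treated as an assumption, not as the exact figure on the slide.

```python
def kruskal(num_nodes, edges):
    """edges: iterable of (weight, u, v). Returns the picked MST edges."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    picked = []
    for w, u, v in sorted(edges):           # min. weight edge of the whole graph first
        ru, rv = find(u), find(v)
        if ru != rv:                        # skip edges that would close a cycle
            parent[ru] = rv
            picked.append((u, v, w))
    return picked


# Assumed example graph: 9 nodes (0..8), 14 edges given as (weight, u, v).
EDGES = [
    (4, 0, 1), (8, 0, 7), (8, 1, 2), (11, 1, 7), (7, 2, 3), (2, 2, 8), (4, 2, 5),
    (9, 3, 4), (14, 3, 5), (10, 4, 5), (2, 5, 6), (1, 6, 7), (6, 6, 8), (7, 7, 8),
]
mst = kruskal(9, EDGES)
print(len(mst), sum(w for _, _, w in mst))   # 8 37
```

Prim's algorithm grows a single tree from one node, while Kruskal's merges many small trees; both end with n − 1 edges of the same total weight.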
N-pin net – (2) MST → RS(M)T
▪ Heuristic (greedy) approach
−Ex.) 1-Steiner routing
(Figure: 1-Steiner iterations; rectilinear WL = 20 → 18 → 17 → 16)
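A hedged Python sketch of the greedy 1-Steiner idea: repeatedly add the single candidate point that shortens the rectilinear MST the most, and stop when no candidate helps. Candidates are taken from the Hanan grid (intersections of the pins' x and y coordinates). The function names and the pin set are hypothetical, and the usual clean-up of low-degree Steiner points is omitted.

```python
from itertools import product


def mst_length(points):
    """Rectilinear MST length via Prim's algorithm, O(n^2)."""
    first, rest = points[0], points[1:]
    dist = {p: abs(p[0] - first[0]) + abs(p[1] - first[1]) for p in rest}
    total = 0
    while dist:
        p = min(dist, key=dist.get)
        total += dist.pop(p)
        for q in dist:
            dist[q] = min(dist[q], abs(p[0] - q[0]) + abs(p[1] - q[1]))
    return total


def iterated_one_steiner(pins):
    """Greedily add the Hanan-grid point that shortens the MST the most."""
    xs = sorted({x for x, _ in pins})
    ys = sorted({y for _, y in pins})
    points = list(pins)
    while True:
        current = mst_length(points)
        candidates = [c for c in product(xs, ys) if c not in points]
        best = min(candidates, key=lambda c: mst_length(points + [c]), default=None)
        if best is None or mst_length(points + [best]) >= current:
            return points[len(pins):], current   # (Steiner points, final WL)
        points.append(best)


pins = [(0, 1), (2, 1), (1, 0), (1, 2)]          # hypothetical cross-shaped net
steiner_points, wl = iterated_one_steiner(pins)
print(mst_length(pins), steiner_points, wl)      # 6 [(1, 1)] 4
```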
Multi-net
▪ (Global & Detailed) routing: Route ALL nets
▪ Sequential approach
−Faster // Order-dependent (NOT optimal)
(Figure: nets A–A´ and B–B´ routed in two different orders)
▪ Concurrent approach
−Optimal solution // Very slow (NP-hard)
(Figure: nets A–A´ and B–B´ routed at the same time)
Concurrent approach: Routing with ILP
▪ Integer linear programming
▪ ILP formulation ➔ Mount into ILP solver (program)
−Runtime ∝ Problem size (#variable, #constraints, …)
▪ Usage: “small” designs
−Ex.) Standard cell → in-cell routing
▲ Sequential: Metal usage (length) = 292
▲ Concurrent: Metal usage (length) = 239
Sequential approach
▪ Net-by-net routing
−Route one net at a time → N! ordering sequences
▪ Net ordering criteria (heuristic)
−Short nets > Long nets
−high aspect ratio (close to ‘straight’) nets > low aspect ratio nets
−Timing critical nets (ex. long FF-to-FF path) > non-critical nets
−…
(Figure: routing result of nets A–A´ and B–B´ for two different net orders)
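The ordering criteria above can be expressed as a sort key. This is a sketch under assumptions: the `Net` class is hypothetical, and the relative priority of the criteria (timing-critical first, then short, then straight) is my choice for illustration; the slide does not rank them.

```python
from dataclasses import dataclass


@dataclass
class Net:
    name: str
    pins: list               # [(x, y), ...]
    critical: bool = False   # e.g. on a long FF-to-FF path

    def bbox(self):
        xs = [x for x, _ in self.pins]
        ys = [y for _, y in self.pins]
        return max(xs) - min(xs), max(ys) - min(ys)


def routing_priority(net):
    """Smaller key = routed earlier."""
    w, h = net.bbox()
    hpwl = w + h                                                 # half-perimeter wirelength
    squareness = min(w, h) / max(w, h) if max(w, h) else 0.0     # 0 = straight net
    return (not net.critical, hpwl, squareness)


nets = [
    Net("A", [(0, 0), (5, 0)]),
    Net("B", [(0, 0), (2, 2)]),
    Net("C", [(0, 0), (9, 9)], critical=True),
]
order = [n.name for n in sorted(nets, key=routing_priority)]
print(order)                                                     # ['C', 'B', 'A']
```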
Sequential approach
▪ Rip-up and reroute (RRR)
−Reset (rip-up) some nets and reroute with changed order/path
Multi-layer
▪ Routing is on multiple metal layers
−Metal (x, y) + via (z)
▪ (1) Route on (2D) grid graph → layer assignment
▪ (2) Route on 3D graph directly
−Implement all algorithms in 3D (neighbors per grid: 4 → 6)
−Better WL // Slow
Routing with CAD tool
▪ Route
−Synopsys ICC2: route_auto (=route_global + route_track + route_detail)
꞉ route_track = Track assignment (guide → track)
−Cadence Innovus: routeDesign
▪ + Post-route optimization
−Synopsys ICC2: route_opt
−Cadence Innovus: optDesign -postRoute
−Optimize (refine) power/timing based on ‘real’ (=routed) data
꞉ Placement, CTS – based on ‘estimated’ data → not accurate
▪ Route + post-route optimization ➔ Final result (GDS)
Routing with CAD tool
▪ Example) AES, ASAP 7nm PDK
−Metal 2~7 (Metal 1: Only used in routing within std cell)
(Figure: routed layout shown per metal layer, Metal 2 to Metal 7)
Timing Analysis & Closure
Intro.
Partitioning (Clustering)
Chip Planning (Floorplanning)
Placement
Clock Tree Synthesis
Routing
Timing Closure
Physical design metrics
▪ PPA(C)
−Power – Performance – Area (– Cost)
Better!
▪ Power (unit: W)
▪ Performance (unit: Hz) // Timing (unit: s)
−Clock period↓ ➔ Clock frequency↑ ➔ High performance
꞉ Ex.) 1ns → 1GHz, 0.5ns → 2GHz, …
−ALL path (FF-to-FF) delays < clock period ➔ Closed!
▪ Area: Chip size
▪ Cost: etc. (runtime, manpower, money, …)
▪ GOAL = Low power & High performance
Intro.: Timing constraints
(Figure: FF → Combinational Logic → FF loop, unrolled into Copy 1, Copy 2, Copy 3 along the Clock)
▪ Setup time constraint: Signal should be “setup” (stable) before the clock edge
▪ Hold time constraint: Signal should “hold” its value after the clock edge
(FF characteristics: “Setup” before here, “Hold” until here)
Timing analysis
▪ 𝑡𝑐𝑙𝑘→𝐹𝐹1 + 𝑡𝑝 + 𝑡𝑐.𝑀𝑎𝑥 + 𝑡𝑠𝑢 < 𝑡𝑐𝑙𝑘 + 𝑡𝑐𝑙𝑘→𝐹𝐹2 (Setup)
▪ 𝑡𝑝 + 𝑡𝑐.𝑚𝑖𝑛 > 𝑡ℎ (Hold)
−𝑡𝑝 : FF propagation delay (clock → Q)
−𝑡𝑠𝑢 / 𝑡ℎ : FF setup time / hold time
−𝑡𝑐𝑙𝑘 : Clock period
−𝑡𝑐𝑙𝑘→𝐹𝐹1 / 𝑡𝑐𝑙𝑘→𝐹𝐹2 : Clock source → FF1 / FF2 delay
−𝒕𝒄.𝑴𝒂𝒙 / 𝒕𝒄.𝒎𝒊𝒏 : Combinational logic (path) delay
(Figure: FF1 → Combinational Logic → FF2, both driven by Clock)
▪ 𝑡𝑝 , 𝑡𝑠𝑢 , 𝑡ℎ : Fixed, 𝑡𝑐𝑙𝑘 : Initial condition
▪ 𝑡𝑐𝑙𝑘→𝐹𝐹1 & 𝑡𝑐𝑙𝑘→𝐹𝐹2 : Clock skew (at CTS)
➔ 𝑡𝑐.𝑀𝑎𝑥 / 𝑡𝑐.𝑚𝑖𝑛 : Main analysis & modification target
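The two inequalities above can be checked numerically. A minimal Python sketch, with hypothetical function names and illustrative delay values (in ns) that are not taken from the slides; a positive slack means the constraint is met.

```python
def setup_slack(t_clk, t_p, t_c_max, t_su, t_clk_ff1=0.0, t_clk_ff2=0.0):
    """Slack of: t_clk_ff1 + t_p + t_c_max + t_su < t_clk + t_clk_ff2."""
    return (t_clk + t_clk_ff2) - (t_clk_ff1 + t_p + t_c_max + t_su)


def hold_slack(t_p, t_c_min, t_h):
    """Slack of: t_p + t_c_min > t_h (clock skew ignored, as in the formula above)."""
    return (t_p + t_c_min) - t_h


# Illustrative values only.
print(setup_slack(t_clk=1.0, t_p=0.1, t_c_max=0.8, t_su=0.05) > 0)    # setup met
print(hold_slack(t_p=0.1, t_c_min=0.02, t_h=0.15) > 0)                # hold violated
```

Delaying the capture clock (larger `t_clk_ff2`) relaxes the setup check, which is the useful-skew idea that appears later in the STA slides.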
Timing analysis
▪ Path delay = Gate delay + Wire delay
−Ex.) (Figure: FF → Combinational Logic → FF; inputs a, b, c, output f, gates x (1.0), y (2.0), z (2.0), w (2.0); wire delays in parentheses)
▪ Static timing analysis (STA)
−Faster // less accurate → pessimistic (= acceptable)
꞉ ex. AND2 with inputs (0, 0→1): output stays 0 (NO change), but STA still counts the path delay
STA
▪ Circuit (Path) → Graph (DAG, directed acyclic graph)
−Start node: FF output or primary input; End node: FF input or primary output
−Nodes (gate delay): s, a (0), b (0), c (0), x (1), y (2), z (2), w (2), f (0)
−Edges (wire delay): s→a (0), s→b (0), s→c (0.6), a→y (0.15), b→x (0.1), x→y (0.1), x→z (0.3), c→z (0.1), y→w (0.2), z→w (0.25), w→f (0.2)
꞉ Edges from the source node s carry the input delay
STA - Actual arrival time (AAT, AT)
▪ $AAT(v) = \max_{u \in FI(v)} \left[ AAT(u) + t(u, v) \right]$
−$FI(v)$: Fanin nodes of $v$, $t(u, v)$: u → v delay (wire + gate)
▪ s → output
−AAT of s = 0
Node   AAT
s      0
a      0
b      0
c      0.6
x      1.1
y      3.2
z      3.4
w      5.65
f      5.85
STA - Required arrival time (RAT)
▪ $RAT(u) = \min_{v \in FO(u)} \left[ RAT(v) - t(u, v) \right]$
−$FO(u)$: Fanout nodes of $u$, $t(u, v)$: u → v delay (wire + gate)
▪ output → s
−RAT of output = 𝒕𝒄𝒍𝒌 (+𝒕𝒔𝒌𝒆𝒘 − 𝒕𝒔𝒖 )
Node   RAT
f      5.5
w      5.3
y      3.1
z      3.05
x      0.75
a      0.95
b      -0.35
c      0.95
s      -0.35
STA - Slack
▪ 𝒔𝒍𝒂𝒄𝒌(𝒗) = 𝑹𝑨𝑻(𝒗) − 𝑨𝑨𝑻(𝒗)
−Slack > 0 = Timing closed (redundant resource?)
−Slack < 0 = Timing NOT closed
▪ Goal = Zero slack (➔ RAT↑, AAT↓)
−Slack: -100 < -90 < … < -10 < 0 == +10 == +20 == +30 == …
Node   AAT    RAT     Slack
s      0      -0.35   -0.35
a      0      0.95    0.95
b      0      -0.35   -0.35
c      0.6    0.95    0.35
x      1.1    0.75    -0.35
y      3.2    3.1     -0.1
z      3.4    3.05    -0.35
w      5.65   5.3     -0.35
f      5.85   5.5     -0.35
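The AAT, RAT, and slack values of this example can be reproduced with a short Python sketch. The graph below is reconstructed from the numbers on the slides (gate delay per node, wire delay per edge, RAT of the output f = 5.5), so the exact topology is an assumption; the dictionary and function names are hypothetical.

```python
GATE_DELAY = {"s": 0, "a": 0, "b": 0, "c": 0, "x": 1, "y": 2, "z": 2, "w": 2, "f": 0}
WIRE_DELAY = {
    ("s", "a"): 0, ("s", "b"): 0, ("s", "c"): 0.6,      # input delays
    ("a", "y"): 0.15, ("b", "x"): 0.1, ("x", "y"): 0.1,
    ("x", "z"): 0.3, ("c", "z"): 0.1,
    ("y", "w"): 0.2, ("z", "w"): 0.25, ("w", "f"): 0.2,
}
TOPO_ORDER = ["s", "a", "b", "c", "x", "y", "z", "w", "f"]   # a topological order of the DAG


def t(u, v):
    """u -> v delay: wire delay plus the gate delay of v."""
    return WIRE_DELAY[(u, v)] + GATE_DELAY[v]


def sta(rat_at_output):
    aat = {"s": 0.0}                                    # forward pass: s -> output
    for v in TOPO_ORDER[1:]:
        aat[v] = max(aat[u] + t(u, v) for (u, dst) in WIRE_DELAY if dst == v)
    rat = {"f": rat_at_output}                          # backward pass: output -> s
    for u in reversed(TOPO_ORDER[:-1]):
        rat[u] = min(rat[v] - t(u, v) for (src, v) in WIRE_DELAY if src == u)
    slack = {n: round(rat[n] - aat[n], 2) for n in TOPO_ORDER}
    return aat, rat, slack


aat, rat, slack = sta(rat_at_output=5.5)
print(slack)
```

The critical path is the chain of nodes sharing the worst slack (−0.35): s → b → x → z → w → f.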
STA - Zero slack
▪ Reduce gate/wire delay
▪ Clock period↑ → RAT↑ (Max operating frequency↓)
▪ (𝒕𝒄𝒍𝒌→𝑭𝑭𝟐 − 𝒕𝒄𝒍𝒌→𝑭𝑭𝟏 )↑ → RAT↑ (Useful skew, not general)
▪…
Ex) Reduced gate w delay (2 → 1.5)
Node   AAT    RAT    Slack
s      0      0.15   0.15
a      0      1.45   1.45
b      0      0.15   0.15
c      0.6    1.45   0.85
x      1.1    1.25   0.15
y      3.2    3.6    0.4
z      3.4    3.55   0.15
w      5.15   5.3    0.15
f      5.35   5.5    0.15
STA – WNS/TNS
▪ WS (worst slack): $\min_{\tau \in T} slack(\tau)$
−Minimum slack of ALL paths
−Clock period – WS = Min clock period (➔ Max freq.)
▪ WNS (worst negative slack): $\min_{\tau \in T,\ slack(\tau) \le 0} slack(\tau)$
−WNS = 0 ➔ Timing closed
▪ TNS (total negative slack): $\sum_{\tau \in T,\ slack(\tau) \le 0} slack(\tau)$
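A minimal Python sketch of the three summary metrics, given one slack value per path (or endpoint). The function name and the sample slack values are hypothetical.

```python
def timing_summary(slacks):
    """Return (WS, WNS, TNS) for a list of path slacks."""
    ws = min(slacks)                              # worst slack over ALL paths
    negative = [s for s in slacks if s < 0]
    wns = min(negative, default=0.0)              # 0 when no path violates timing
    tns = sum(negative)                           # total negative slack
    return ws, wns, tns


ws, wns, tns = timing_summary([0.2, -0.35, -0.1, 0.0])   # illustrative slacks
t_clk = 1.0                                               # illustrative clock period
min_period = t_clk - ws                                   # Clock period - WS
print(ws, wns, round(tns, 2), round(min_period, 2))
```

WNS shows how far the worst path is from closing; TNS indicates how many paths (and by how much) still need fixing.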
Analysis with CAD tool
▪ Synopsys PrimeTime, Cadence Tempus
−report_timing
Worst (negative) slack < 0
➔ NOT closed!!
Analysis with CAD tool
▪ Synopsys PrimeTime, Cadence Tempus
−report_power
Analysis with CAD tool
▪ Others (SI, PI, thermal, …)
−Lots of analysis tools exist
▲ IR drop map (Cadence Voltus) ▲ Thermal map (Ansys Fluent)
Timing closure (optimization)
▪ Timing-driven design: During design stages
−placement, routing ➔ Add timing-related parameter/constraints
▪ Timing closure: After each stage
−Move cell, gate sizing, buffer insertion, wire sizing/snaking, utilize useful skew, …
Timing closure
▪ Post-design modification
−Cell move, Gate sizing, buffer insertion, …
Stage            | Synopsys ICC2          | Cadence Innovus
Floorplan        | Floorplan              | Floorplan
Placement & opt. | place_opt (5 stg)      | placeDesign + optDesign -preCTS, or place_opt_design
CTS & opt.       | clock_opt (3 stg)      | clockDesign + optDesign -postCTS, or ccopt_design
Route & opt.     | route_auto + route_opt | routeDesign + optDesign -postRoute
Gate sizing
▪ Ex) Inverter: X1, X2, X4, X8, X16
(Figure: inverter cells (input A, output Z) of increasing size, …)
Gate sizing
▪ Gate delay = F(input slew, output load capacitance)
(Figure: gate delay as a function of input slew (= transition time) and output load capacitance)
Gate sizing
▪ Gate size↑ ➔ Delay changes less as load cap. varies
(Figure: Delay (ps) vs. Load Capacitance (fF), 0.5 to 3.0 fF, for gate sizes A, B, C; size: C > B > A)
Gate sizing
▪ Choose a gate with same function & different size
➔ GOAL: shorter delay or lower power
(Figure: gate v with inputs a, b drives d, e, f; C(d) = 1.5, C(e) = 1.0, C(f) = 0.5; Size: A < B < C)
−vA: t(vA) = 40, power↓
−vB: t(vB) = 33
−vC: t(vC) = 28, power↑
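The sizing choice above can be sketched as picking the smallest gate that still meets a delay requirement. The delays (40/33/28 ps at a total load of 3.0 fF) come from the example; the function name and the "smaller size means lower power" ordering used for selection are assumptions for illustration.

```python
# (name, delay in ps at load = C(d) + C(e) + C(f) = 3.0 fF), ordered by size A < B < C.
CANDIDATES = [("vA", 40), ("vB", 33), ("vC", 28)]


def pick_gate(candidates, required_delay_ps):
    """Smallest (assumed lowest-power) gate whose delay meets the requirement."""
    for name, delay in candidates:          # candidates are ordered small -> large
        if delay <= required_delay_ps:
            return name
    return None                             # no size is fast enough


print(pick_gate(CANDIDATES, 35))            # vB
print(pick_gate(CANDIDATES, 45))            # vA
print(pick_gate(CANDIDATES, 20))            # None
```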
Buffer insertion
▪ Load cap.↓→ gate delay↓
▪ Gate output slew↓ → next gate input slew↓ → next gate delay↓
▲ Without buffer: vB drives d, e, f, g, h (C = 1 each), Load cap.: 5 (Weak)
▲ With buffer y: vB drives d, e and y (C(y) = 1), Load cap.: 3 (Strong); y drives f, g, h
(Figure: transition time (slew) at the gate output for both cases)
Buffer insertion
▪ + Buffer delay, + Power
−Use on non-critical paths (don’t make neg. slack!)
−Fix hold violation (𝑡𝑝 + 𝑡𝑐.𝑚𝑖𝑛 ↑ > 𝑡ℎ )
−Insert as few (and as small) buffers as possible
▲ Without buffer: vB drives d, e, f, g, h (C = 1 each)
−C(vB) = 5 fF, t(vB) = 45 ps
▲ With buffer y: vB drives d, e and y; y drives f, g, h
−C(vB) = 3 fF, t(vB) = 33 ps
−C(y) = 3 fF, delay to f, g, h = t(vB) + t(y) = 66 ps
Etc.: Well tap cell
▪ Suppress the transistor “body effect” by tying the body to a supply
−PMOS body (n-well) → VDD
−NMOS body (p-substrate) → VSS
▪ “Tap” cell ➔ add periodically (empty space)
TAPCELL
(ASAP 7nm)
Etc.: Filler cell
▪ “Empty” cell – NO function
▲ INV_X1 (ASAP 7nm) ▲ FILLER ▲ FILLER_xp5 (x0.5)
Etc.
▪ Ex) ac97_ctrl (8K cells), ASAP 7nm
−14K Filler & tap cells
(Figure: layout before and after “add filler & tap cells”)
Thank you!
Contact: h.park@unist.ac.kr
https://sites.google.com/view/unist-seda