Futures at the IC Design-Manufacturing Interface

Andrew B. Kahng
UCSD VLSI CAD Laboratory
abk@ucsd.edu
http://vlsicad.ucsd.edu
CSE and ECE Departments
University of California, San Diego
Outline
• I. Motivation: Variability and Value
• II. Classical Design for Manufacturing: DAM, MAD
• III. Futures at the Design-Manufacturing Interface
Andrew B. Kahng 110328 2
Outline
• I. Motivation: Variability and Value
• Semiconductor trends
• Power and yield challenges
• How to maximize value from new technology: “DFM”
• II. Classical Design for Manufacturing: DAM, MAD
• III. Futures at the Design-Manufacturing Interface
What Does an IC Do?
[Figure: processing requirement on a log scale from 0.01 to 100 GOPS for five application categories: Video, Audio/Voice, Graphics, Communication, Recognition. Workloads shown include MPEG1, MPEG2, MPEG4 (extraction, compression), JPEG, Dolby-AC3, MPEG audio, word recognition, sentence translation, voice auto translation, 2D graphics, 3D graphics (10Mpps, 100Mpps), FAX, modem, VoIP modem, SW defined radio, voice print recognition, face recognition, and moving picture recognition.]
Requirement for multimedia processing (GOPS: Giga OPs/Sec)
2007 ITRS Consumer-Stationary SOC Model: 220 TFlops on single chip in 2022
What Does an IC Look Like?
• Qualcomm’s first 45nm tapeout: “Serra” (~July 2009)
• 53 mm², 419 signal IOs
• 170M transistors
• 13.7M bits RAM
• 1.1M bits ROM
• CDMA, UMTS, GSM
• 2 DSP, ARM9, ARM11
• WVGA, ATI LT, 8MP
• 6 metal layers + RDL
• 2 Vt, 2 Lgate
• 450uA leakage at TT / 25C / 1.125V
• 105 master, 230 total clock domains
• 24 analog, pad power domains
• 8 digital power islands
• …
Matt Severson, UCSD ECE 260B guest lecture, February 2011
How Is It Created?
Design: Behavior → Circuit → Layout
Manufacturing: Corrected Layout → Mask → Wafer → Packaged Die
Test: E-Test → Wafer Sort → Burn-In → Final Test
→ System
Manufacturing: Optical Lithography
• Feature size (28nm) << Wavelength of light (193nm)
Still 193nm
20nm in 2013
Numerical Technologies, Inc., 1998
Photomask Complexity
Photomask Complexity (Intel 65nm)
Kelin Kuhn, Intel, ICVC 2009
Manufacturing complexity brings challenges…
Challenge: Variability
• What is this chip’s frequency?
Figure courtesy Intel
S. Nassif, IBM
Across-wafer frequency variation
Challenge: Power
• Primary limiter of product features, performance, form factor
• Expensive: packaging, electricity, cooling, …
• In particular: Leakage power = “wasted” power
D. Blaauw, U. Michigan
Example of Power As Limiter: Form Factor
Phone Surface Temperature Rise [C]
Power → heating → thermal runaway, human discomfort
Surface Power Density [W / sq-in] (target = 0.1)
Huawei / M. Severson, Qualcomm
Challenge: Variability of Power
[Figure: normalized frequency (0.9 to 1.4) vs. normalized leakage (0 to 20) for 1000 Intel CPUs at the 130nm node; 30% spread in frequency, 20x spread in leakage]
• Subthreshold leakage varies exponentially with Lgate, Vt, tox …
  → 5-20X variation is common
• Gate length [== “critical dimension” (CD)] variation in manufacturing is a major source of leakage variation
• Parametric yield loss: cannot sell leaky chips or slow chips
Cost of Variability: Example
• Company hopes to sell 100M copies of its chip
• 300mm wafer in 45nm process: $5,000
• Die size: 10mm x 10mm → ~600 raw die per wafer
• 90% vs. 95% yield → parametric yield loss
• 540 vs. 570 good die per wafer
• 185,186 vs. 175,439 wafers needed → $50M difference
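The arithmetic on this slide can be checked with a short sketch. The function name `wafers_needed` and the constants are illustrative choices; the inputs (100M units, $5,000 per wafer, ~600 raw die, 90% vs. 95% yield) are taken from the bullets above.

```python
import math

def wafers_needed(units, raw_die_per_wafer, yield_fraction):
    """Wafers required to ship `units` good die at a given yield."""
    # round() guards against floating-point noise in the product
    good_die_per_wafer = round(raw_die_per_wafer * yield_fraction)
    return math.ceil(units / good_die_per_wafer)

UNITS = 100_000_000      # chips the company hopes to sell
WAFER_COST = 5_000       # dollars per 300mm wafer in 45nm
RAW_DIE = 600            # approximate raw die per wafer

wafers_at_90 = wafers_needed(UNITS, RAW_DIE, 0.90)
wafers_at_95 = wafers_needed(UNITS, RAW_DIE, 0.95)
extra_cost = (wafers_at_90 - wafers_at_95) * WAFER_COST

print(wafers_at_90, wafers_at_95, extra_cost)
```

The exact difference is 9,747 wafers, or about $48.7M, which the slide rounds to $50M.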
How Can Designers Deal With Variability?
• Guardbands?
  • Increase margins and over-design → more area, more power, less value
• Statistics?
  • “Probabilistic” instead of “worst-case” design → difficult since statistics are always changing
• Wait for better manufacturing equipment?
• My work: Connect IC Design and Manufacturing
  • Drive design requirements into manufacturing
  • Bring manufacturing awareness into design
  In practically useful ways
Value at the Design-Manufacturing Interface
Design: Behavior → Circuit → Layout
Manufacturing: Corrected Layout → Mask → Wafer → Packaged Die
Test: E-Test → Wafer Sort → Burn-In → Final Test
→ System
Create value HERE (at the design-manufacturing interface)
Outline
• I. Motivation: Variability and Value
• II. Classical Design for Manufacturing: DAM, MAD
• Design-Aware Manufacturing (DAM)
• Manufacturing-Aware Design (MAD)
• Variations: Measure, Model, and Monitor
• III. Futures at the Design-Manufacturing Interface
1. Design-Aware Manufacturing (DESIGN INFORMATION flows from Design to Manufacturing)
   KEY IDEA: Not all shapes equally important!
2. Manufacturing-Aware Design (MANUFACTURING INFORMATION flows from Manufacturing to Design)
   KEY IDEA: Mitigate and compensate systematic variations
3. Variation: Measure, Model, Monitor
Key Aspect of Design = Timing Slack
• Positive timing slack can be “converted” into power reductions
(smaller transistors, area, power, …)
[Figure: timing graph between clocked (CLK) endpoints; each node is annotated with its required time, arrival time, and resulting slack (e.g., 3 - 1 = +2, 7 - 7 = 0)]
Slack = Trequired – Tarrival
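A minimal sketch of how slack follows from arrival and required times on a timing graph. The graph, gate names, delays, and clock period below are hypothetical; the only thing taken from the slide is Slack = Trequired – Tarrival.

```python
from collections import defaultdict

def compute_slack(edges, delays, clock_period):
    """edges: (driver, load) pairs of a DAG; delays: gate -> delay."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Topological order (Kahn's algorithm); the list grows while we scan it.
    indegree = {n: len(pred[n]) for n in delays}
    order = [n for n in delays if indegree[n] == 0]
    for n in order:
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                order.append(m)

    # Forward pass: latest arrival time at each gate output.
    arrival = {}
    for n in order:
        arrival[n] = delays[n] + max((arrival[p] for p in pred[n]), default=0)

    # Backward pass: latest time each gate output may switch.
    required = {}
    for n in reversed(order):
        required[n] = min((required[s] - delays[s] for s in succ[n]),
                          default=clock_period)

    return {n: required[n] - arrival[n] for n in order}

slack = compute_slack(
    edges=[("g1", "g3"), ("g2", "g3"), ("g3", "g4")],
    delays={"g1": 2, "g2": 3, "g3": 1, "g4": 2},
    clock_period=7,
)
print(slack)
```

Here `g1` ends up with more slack than the other gates, so it is the natural candidate for a smaller, lower-power implementation.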
• Gates of positive-slack cells can have larger CD, variation budgets!
Transistor Gate-Length Biasing
[DAC’04, TCAD’06]
[Figure: Leakage and Delay vs. Gate Length]
• Bias Impact
  • Exponential: (Isub) leakage reduction, variability reduction
  • Linear: performance reduction
• Idea: Use fine-grain gate-length biases (e.g., +2nm, +4nm)
• Save leakage without changing chip timing
• Chip-scale optimization after detailed routing, before tapeout
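The idea can be illustrated with a toy greedy pass. This is a sketch only, not the published optimizer: the linear delay coefficient, the exponential leakage constant, the gate names, and the path list are all hypothetical.

```python
import math

BIASES_NM = (4, 2, 0)     # candidate gate-length biases, largest first
DELAY_PER_NM = 0.02       # assumed: +2% gate delay per nm of bias (linear)
LEAK_DECAY_NM = 6.0       # assumed: leakage ~ exp(-bias / 6nm) (exponential)

def bias_gates(gate_delay, paths, clock):
    """Give each gate the largest bias that keeps all of its paths within `clock`."""
    bias = {g: 0 for g in gate_delay}

    def path_delay(path):
        return sum(gate_delay[g] * (1 + DELAY_PER_NM * bias[g]) for g in path)

    for g in gate_delay:
        for b in BIASES_NM:          # the last candidate, 0, restores the nominal length
            bias[g] = b
            if all(path_delay(p) <= clock for p in paths if g in p):
                break
    return bias

def relative_leakage(bias):
    """Average leakage relative to the unbiased design (1.0 = no saving)."""
    return sum(math.exp(-b / LEAK_DECAY_NM) for b in bias.values()) / len(bias)

bias = bias_gates(
    gate_delay={"a": 100, "b": 100, "c": 100, "d": 100},   # ps
    paths=[["a", "b", "c"], ["d"]],
    clock=310,                                             # ps
)
print(bias, round(relative_leakage(bias), 3))
```

Gate `d` sits on a short path and takes the full +4nm bias, while the near-critical path `a-b-c` can absorb only one biased gate. A greedy order like this is sensitive to which gate is visited first; a chip-scale optimizer would weigh leakage savings against slack consumption.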
Design-Aware Manufacturing
• Transistor on non-critical path: target CD 70nm
• Transistor on near-setup-critical path: target CD 66nm
• Transistor on setup-critical path: target CD 65nm
• Design Win: 20+% less leakage, 30+% less leakage variability
• Manufacturing Win: Same process has more value
• UCSD-patented flow: marker shapes in GDSII sent to OPC, etc.
• Currently offered in TSMC’s Green “Power Trim” service
• Energy savings over lifetime of AMD/ATI Radeon GPU chips: O(annual output of nuclear power plant)
*Patent Pending
Equipment Can Also Trade Off Delay, Leakage
• ASML “DoseMapper” technology
  • Adjusts exposure dose to compensate across-chip, across-wafer variations
  • Up to +/-5% dose variation in both “slit” and “scan” directions
  • Dose sensitivity of linewidth: 1-2 nm/%
[Figure: DoseMap adjusts exposure dose along the slit-direction profile and the scan-direction profile]
Design-Aware DoseMap (DAC’08)
Original DoseMap: same CDs
→ Goal: same CD for all devices
→ No design awareness
Our DoseMap: different CDs
→ Setup-critical path: larger dose → faster transistors
→ Non-critical path: smaller dose → less leaky transistors
→ Improve timing yield with no leakage penalty
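A toy version of a design-aware dose map, under stated assumptions: a simple slack threshold decides the dose per grid (the DAC’08 formulation is an optimization, not a threshold rule), the 1.5 nm/% sensitivity is an assumed mid-point of the 1-2 nm/% range quoted for DoseMapper, and the grid coordinates and slacks are hypothetical.

```python
DOSE_LIMIT_PCT = 5.0     # from the DoseMapper slide: up to +/-5% dose variation
CD_NM_PER_PCT = 1.5      # assumed mid-point of the 1-2 nm/% dose sensitivity

def design_aware_dose(grid_slack_ns, critical_slack_ns=0.05):
    """Map each grid's worst setup slack to (dose change in %, CD change in nm)."""
    dose_map = {}
    for grid, slack in grid_slack_ns.items():
        # Setup-critical grids get more dose; all others get less.
        dose = DOSE_LIMIT_PCT if slack < critical_slack_ns else -DOSE_LIMIT_PCT
        # More dose -> smaller gate CD -> faster (and leakier) transistors.
        dose_map[grid] = (dose, -CD_NM_PER_PCT * dose)
    return dose_map

dose_map = design_aware_dose({(0, 0): 0.01, (0, 1): 0.40})
print(dose_map)
```

In practice the dose profile must also be smooth in the slit and scan directions, which this sketch ignores.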
2. Manufacturing-Aware Design (MANUFACTURING INFORMATION flows from Manufacturing to Design)
KEY IDEA: Mitigate and compensate systematic variations
Systematic Defocus-Layout Interaction
• Defocus: Non-ideal distance from light source to wafer surface in
lithography tool
• Up to ~70nm in 45nm process
[size of retrovirus = ~100nm]
• Interaction with Layout: defocus causes “isolated” transistors to
speed up, but “dense” transistors to slow down
[Figure: line width vs. defocus. Width of dense lines increases with defocus (SMILE, slower); width of isolated lines decreases (FROWN, faster). The best-case to worst-case variation if nothing is known about the layout pattern is much larger than the actual variation if dense-ness or iso-ness is known.]
• DAC’03: Awareness of defocus-layout interaction can save 40% of
best-worst timing analysis guardband
Self-Compensating Design (DAC’05)
• Idea: Isolated, Dense lines have opposite behavior under defocus → mitigate variation by mixing them in layout
• Goals for near-early and near-late paths
  [Figure: Lgate vs. defocus, and target delay vs. defocus curves for a near-early path and a near-late path]
• Self-compensation
  • In cell library layout: mix Iso and Dense transistor gates on timing arcs
  • In standard-cell layout: mix Iso and Dense cells on timing paths
Improved Delay Distribution Under Defocus
Required cycle time = 2.177ns
• Monte-Carlo simulation (1000 trials) of timing under defocus
• Self-compensated design achieves more robust timing with 0.4% area overhead
[Figure: histograms of number of samples vs. delay (2.10 to 2.25 ns) for the original design and the self-compensated design]
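The compensation effect can be reproduced with a toy Monte Carlo model. Everything numeric here is an assumption for illustration: a quadratic delay response to defocus (dense gates slow down, isolated gates speed up), the coefficients `k_dense` and `k_iso`, the nominal gate delay, and a unit-variance Gaussian defocus shared by all gates on the path.

```python
import random
import statistics

def path_delay(n_dense, n_iso, defocus, d0=0.2, k_dense=0.04, k_iso=0.03):
    """Delay (ns) of a path with n_dense dense gates and n_iso isolated gates."""
    dense = n_dense * d0 * (1 + k_dense * defocus * defocus)   # "smile": slower
    iso = n_iso * d0 * (1 - k_iso * defocus * defocus)         # "frown": faster
    return dense + iso

def delay_spread(n_dense, n_iso, trials=1000, seed=1):
    """Standard deviation of path delay over random defocus samples."""
    rng = random.Random(seed)
    samples = [path_delay(n_dense, n_iso, rng.gauss(0.0, 1.0))
               for _ in range(trials)]
    return statistics.pstdev(samples)

spread_all_dense = delay_spread(10, 0)
spread_mixed = delay_spread(5, 5)
print(spread_all_dense, spread_mixed)
```

With these made-up coefficients the mixed path shows a much narrower delay distribution than the all-dense path, which is the qualitative behavior the histograms on this slide report.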
Variation Mitigation in Design
• Observations from manufacturing
• Variations depend on layout pitch: e.g., Iso vs. Dense
• Some pitches are “forbidden”: poor patterning, small process window
• Manufacturing-side solutions exist…
Sub-Resolution Assist Feature (AF)
Auxiliary Pattern (AP)**
Enhances uniformity of the poly pitch
Cannot apply to wrong pitch!
Shields proximity effect
Requires extra spacing!
… but are not always feasible
Design must change to help manufacturing correctness → Our “-CORR” techniques
** U.S. patent issued to UCSD in January 2011
Dynamic Programming Optimizations
[Flow: Cell Library + RTL → synthesis (technology mapping, Boolean mapping) → Gate-Level Netlist → Placement → Placement Opt. → Routing → Manufacturing]
Placement Opt. (original placement → shift for AF):
① Shift cells in available whitespace
  → Make room for AF or AP
  → Improve timing yield (STI stress)
② Swap cells
  → Make leakage-optimal pitch
• AF-CORR [SPIE 2005, TCAD2007]: OPC error reduction by 83-100%
• AP-CORR [SPIE 2006, JM3 2008]: OPC runtime reduction by 5x-200x
• Leakage-CORR [ISLPED07]: Leakage reduction by 5-7%
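A simplified single-row sketch of the kind of dynamic program behind cell shifting. It is not the published -CORR formulation: the cost model (one unit per site of displacement plus a fixed penalty for each forbidden gap), the parameter names, and the example row are assumptions.

```python
INF = float("inf")

def shift_cells(x, w, forbidden_gaps, max_shift=1, gap_penalty=10):
    """x: left edges, w: widths (in sites). Returns (total cost, shift per cell)."""
    shifts = range(-max_shift, max_shift + 1)
    cost = {s: abs(s) for s in shifts}       # best cost with cell 0 shifted by s
    back = []                                # back[i-1][s] = best shift of cell i-1
    for i in range(1, len(x)):
        new_cost, choice = {}, {}
        for s in shifts:
            best, arg = INF, None
            for p in shifts:
                gap = (x[i] + s) - (x[i - 1] + p + w[i - 1])
                if gap < 0:
                    continue                 # cells would overlap
                c = cost[p] + abs(s)
                if gap in forbidden_gaps:
                    c += gap_penalty
                if c < best:
                    best, arg = c, p
            new_cost[s], choice[s] = best, arg
        cost = new_cost
        back.append(choice)

    s = min(cost, key=cost.get)              # cheapest final state
    total, result = cost[s], [s]
    for choice in reversed(back):            # trace the choices backwards
        s = choice[s]
        result.append(s)
    return total, result[::-1]

total, shifts = shift_cells(x=[0, 5, 11], w=[3, 4, 3], forbidden_gaps={2})
print(total, shifts)
```

In the example both original gaps are 2 sites (forbidden); moving the middle cell one site to the left turns them into gaps of 1 and 3 at a cost of one site of displacement.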
• Stress-CORR [ICCAD07]: STI width-dependent stress; timing speedup by 5% with no leakage penalty
3. Variation: Measure, Model, Monitor
KEY IDEA: “Random” Variation → Systematic (Then, Mitigate and Compensate)
Measure, Model
• Scales: Across wafer, field (mask), die
• Sources: litho, etch, CMP, stress, anneal, …
• Structures: scribe-line devices, ring oscillators, …
• Measurements: Idsat, fmax, I-V characteristic, …
→ A difficult task!
40nm foundry partner: Ids of 9 isolated devices; frequency of 17 ROs per field. 75 fields per 300mm wafer → huge data
[Figure: nine measurement locations in a field: TL, TC, TR, CL, C, CR, BL, BC, BR]
• Where to place test structures?
• How many to measure?
• Bad/noisy measurements?
• Interpolation?
• Decomposition?
• Model fitting?
• …
UCSD Website for Variation Mapping
• Various variation modeling techniques are available
Observed Variations in Foundry Data
• STMicroelectronics 40nm (900 fields, 9 points/field)
[ST visitor’s project = enable 28nm DoseMap]
Ids variation (%)
• Intrafield variation modeling
Location in field
Measured
Modeled
• IBM 65nm SOI (348 wafers, 100 dies/wafer, 14 points/die)
• Intrawafer variation modeling
Recent Direction: Natural Timing Paths
• Many test structures → large area, test cost
• Few test structures → inaccurate
• Natural timing paths in a design
• Do not require additional area
• Can measure variation in (natural) timing paths
• Automated measurements: speedpath test
• Gives rise to variation mapping problem:
  Given a 2D gridded region and measured delays (= compounded variations) of timing paths that span multiple grids, explain the compounded variations with a map of physical parameter variations in each grid
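A small sketch of the variation mapping problem as a linear system: each measured path deviation is the sum of per-grid deviations weighted by how many stages the path has in each grid. The solver below is plain Kaczmarz iteration, chosen only because it needs no libraries; it is not the compressed-sensing method discussed next, and the 2x2 grid, path matrix, and "true" map are made up.

```python
def map_variation(paths, measured, sweeps=2000):
    """paths[i][g] = stages of path i in grid g; measured[i] = delay deviation."""
    x = [0.0] * len(paths[0])
    for _ in range(sweeps):
        for row, b in zip(paths, measured):
            norm = sum(a * a for a in row)
            residual = (b - sum(a * v for a, v in zip(row, x))) / norm
            # Project the current estimate onto the hyperplane row . x = b
            x = [v + residual * a for a, v in zip(row, x)]
    return x

# Hypothetical 2x2 grid (4 unknowns) observed through 6 timing paths.
PATHS = [
    [3, 2, 0, 0],
    [0, 0, 2, 3],
    [4, 0, 1, 0],
    [0, 1, 0, 4],
    [1, 1, 1, 1],
    [2, 0, 0, 2],
]
TRUE_MAP = [0.02, -0.01, 0.03, 0.00]     # per-stage delay deviation in each grid
MEASURED = [sum(a * v for a, v in zip(row, TRUE_MAP)) for row in PATHS]

estimate = map_variation(PATHS, MEASURED)
print([round(v, 4) for v in estimate])
```

With more paths than grids and noise-free measurements the map is recovered exactly; the interesting regime for the research problem is the opposite one, with few paths and many grids.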
Compressed Sensing: TAU’11 to appear
• Scenario: Die stacking (1st die, 2nd die)
• Can we reconstruct variations of interconnect capacitance and gate CD across two stacked die?
[Figure: given vs. restored maps: 1st die CD map and 2nd die CD map; gate CD map and interconnect capacitance map. Reported max. errors: 0.70% and 0.39% (using 50 paths); 1.68% and 0.76% (using 90 paths)]
Monitor: Design-Dependent RO
• Problem: Measure real-time performance variation in an adaptive system
• Approach: Select gates to form design-dependent ring oscillators (DDROs) with similar delay sensitivity to variations (Lgate, Vth, Tox, V, T, …) as actual critical paths
• Potential benefits:
  • Specific to path’s rising or falling transition
  • Can cluster critical paths having similar sensitivities to reduce number of RO
  • Low area overhead
  • Automated design flow, standard cells only
[Figure: normalized delay sensitivities, (1/Delay_nom)·∂Delay/∂Vth vs. (1/Delay_nom)·∂Delay/∂Lgate, for Gate A, Gate B, the DDRO path (A+B), and the critical path]
DDRO Synthesis Flow
[Flow: gate sensitivities + critical path sensitivities → cluster critical paths (Cluster 1, Cluster 2, …) → for each cluster, synthesize a DDRO using integer linear program; the critical path-to-DDRO error is measured in sensitivity space]
• Synthesis result: 45nm SOI test chip, ARM Cortex M3
[Figure: delay sensitivity error (%) of INV. RO, CPRO, and DDRO for Cluster 1 through Cluster 5 and on average]
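A toy illustration of matching a ring oscillator to a critical-path sensitivity target. The slide's flow uses an integer linear program; this sketch swaps in exhaustive search over gate multisets, which is only practical for tiny libraries. The gate library, its delay and sensitivity numbers, and the target are hypothetical.

```python
from itertools import combinations_with_replacement

def synthesize_ro(gates, target, stages):
    """gates: name -> (delay, dDelay/dVth, dDelay/dLgate).
    target: normalized (Vth, Lgate) sensitivities of a critical-path cluster.
    Returns (error, gate multiset) for the best `stages`-stage oscillator."""
    best = None
    for combo in combinations_with_replacement(sorted(gates), stages):
        delay = sum(gates[g][0] for g in combo)
        s_vth = sum(gates[g][1] for g in combo) / delay   # (1/Delay) dDelay/dVth
        s_lg = sum(gates[g][2] for g in combo) / delay    # (1/Delay) dDelay/dLgate
        err = abs(s_vth - target[0]) + abs(s_lg - target[1])
        if best is None or err < best[0]:
            best = (err, combo)
    return best

GATES = {
    "INV": (10.0, 4.0, 1.0),
    "NAND2": (14.0, 7.0, 1.2),
    "NOR2": (16.0, 6.0, 2.4),
}
TARGET = (0.45, 0.12)

ddro_err, ddro_gates = synthesize_ro(GATES, TARGET, stages=5)
inv_err, _ = synthesize_ro({"INV": GATES["INV"]}, TARGET, stages=5)
print(ddro_gates, ddro_err, inv_err)
```

Because the inverter-only oscillator is one of the candidates, the selected mix can never match the target worse than it does, which mirrors the INV. RO vs. DDRO comparison on the slide.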
Monte Carlo Simulation Results (30 samples)
• Without within-die variation modeling
  [Figure: estimated delay (ns) vs. actual delay (ns), both 0.9 to 1.2 ns, in three panels: critical path RO, DDRO, Inv. RO. Estimation error ranges shown: -1.4% ~ 3.7%, -2.0% ~ 4.1%, -4.3% ~ 7.1%]
• With within-die variation modeling
  [Figure: same axes, three panels: DDRO, Inv. RO, critical path RO. Estimation error ranges shown: -0.5% ~ 3.7%, -1.3% ~ 3.6%, -1.7% ~ 5.1%]
Outline
• I. Motivation: Variability and Value
• II. Classical Design for Manufacturing: DAM, MAD
• III. Futures at the Design-Manufacturing Interface
• MAD for recent patterning technologies
• Living with variation: Resilience
• Emerging technologies and new problems: EUV, 3D, …
Double Patterning Lithography (DPL)
[Figure: desired pattern = first mask + second mask (combined exposure)]
Note: Shown here is the “Litho-Etch-Litho-Etch” (LELE) approach (e.g., TSMC 20nm node). Other DPL approaches offer different challenges.
DPL Layout Decomposition
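[Figure: example features with spacings d1, d2, d3 < t (color conflicts) and d3, d4 > t (no conflict), before and after splitting a feature]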
• Two features must be assigned opposite colors (= masks) if their
spacing is less than a given minimum coloring spacing t
• IF two features within minimum coloring spacing t cannot be
assigned different colors
• THEN at least one feature must be split into two or more parts
• Pattern split increases manufacturing cost and complexity
• Overlay (misalignment error) between exposures causes pattern errors
• Should choose robust splitting points in non-critical patterns
• Coloring spacing t induces a graph of color conflicts: resulting “graph
bipartization” formulations have been addressed in my group since 1998
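The two-coloring rule above can be sketched as a bipartiteness check on the conflict graph. This is a minimal illustration, not the ILP-based flow on the next slide: features are reduced to hypothetical 1-D intervals, and an odd conflict cycle is simply reported rather than resolved by splitting.

```python
from collections import deque

def color_features(features, t):
    """features: name -> (left, right) interval; t: minimum coloring spacing.
    Returns name -> mask (0 or 1), or None if an odd conflict cycle exists."""
    names = list(features)

    def spacing(a, b):
        (al, ar), (bl, br) = features[a], features[b]
        return max(bl - ar, al - br, 0)

    conflicts = {n: [m for m in names if m != n and spacing(n, m) < t]
                 for n in names}

    color = {}
    for start in names:                      # handle each connected component
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in conflicts[u]:
                if v not in color:
                    color[v] = 1 - color[u]  # neighbor goes on the other mask
                    queue.append(v)
                elif color[v] == color[u]:
                    return None              # odd cycle: a feature must be split
    return color

layout = {"a": (0, 10), "b": (12, 20), "c": (22, 30)}
print(color_features(layout, t=5))    # a/b and b/c conflict: two masks suffice
print(color_features(layout, t=15))   # a/c also conflict: odd cycle
```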
DPL Layout Decomposition Flow (ICCAD’08)
• Layout fracturing
  • Polygons → rectangles
• Graph construction
• Conflict cycle (CC) detection
• Overlap length computation
  • If there is a feasible dividing point → node splitting
  • Otherwise, report an unresolvable conflict cycle (uCC)
• Graph updating
• ILP based DPL color assignment
[Flowchart: layout fracturing → graph construction → conflict cycle detection. Conflict cycle? No → ILP. Yes → overlap length computation. Overlap margin? No → uCC. Yes → node splitting → graph update]
Example DPL Coloring Results
Poly layer
Metal layer
• Layouts are correctly decomposed with respect to a
prescribed overlap margin at splitting points (for overlay
robustness)
Bimodal CD distribution in DPL
• Two patterning steps → Two different CDs
  [Figure: green lines from 1st patterning, blue lines from 2nd patterning; gates from CD group1 and gates from CD group2]
• Two different colorings → Two different timings
  • C12-type cell: odd polys in BLUE, even polys in GREEN
  • C21-type cell: odd polys in GREEN, even polys in BLUE
Impact of Bimodality on Guardband
• Comparison of design guardband (Min-Max delay)
• Unimodal representation is too pessimistic
[Figure: best-case and worst-case delay (s), 0 to 3.0E-11, vs. CD mean difference (1 nm to 6 nm) for the large CD group, the small CD group, and the pooled CD]
Jeong et al. ASPDAC’09
Impact of Bimodality on Path Delay
• Bimodality can help reduce path delay variation
• Reduction of covariance when alternately colored
[Figure: uniform coloring (C12 + C12 + C12 + C12): variation (σ) is accumulated. Alternate coloring (C12 + C21 + C12 + C21): variation (σ) is compensated]
[Figure: SPICE simulation results: sigma / mean (%) of path delay, 0 to 25, vs. CD mean difference (0 to 6 nm) for uniform and alternate coloring]
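The accumulate-versus-compensate argument can be made concrete with a deliberately crude model. The assumptions are mine, not the slide's: a C12 cell's delay tracks CD group 1 and a C21 cell's delay tracks CD group 2, each group sits half the mean CD difference away from the pooled mean, and delay is linear in CD with a made-up coefficient.

```python
def path_delay_shift(coloring, cd_mean_diff_nm, delay_per_nm=1.0):
    """Delay shift (ps) of a path caused by the mean CD difference between groups.

    coloring: sequence of "C12" / "C21" cell types along the path.
    cd_mean_diff_nm: CD(group 1) - CD(group 2).
    """
    shift = 0.0
    for cell in coloring:
        sign = 1 if cell == "C12" else -1    # which group the cell tracks
        shift += sign * delay_per_nm * cd_mean_diff_nm / 2
    return shift

uniform = path_delay_shift(["C12"] * 4, cd_mean_diff_nm=4)
alternate = path_delay_shift(["C12", "C21", "C12", "C21"], cd_mean_diff_nm=4)
print(uniform, alternate)
```

Under uniform coloring every stage shifts in the same direction, so the shift grows with path length; under alternate coloring consecutive stages cancel.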
Impact of Bimodality on Clock Skew
• Different coloring sequences in a clock network → Clock skew

Case   Source to Sink A       Source to Sink B
1      C12+C12+C12+…+C12      C12+C12+C12+…+C12
2      C12+C12+C12+…+C12      C21+C21+C21+…+C21

[Figure: clock skew (s), 0 to 6.00E-11, vs. CD mean difference (0nm to 6nm); Case 2 lies above Case 1]
• Same color on all clock buffers is better!
Bimodal CD Distribution: 3 Key Facts
1. Design requires bimodal-aware timing models
• Unimodal representation is too pessimistic
2. Data paths benefit from alternate (mixed) coloring
• Exploit existence of two uncorrelated CD populations
• Minimize correlated variations in a given path
3. Clock paths benefit from uniform coloring
• Correlated variation between launch and capture paths
minimizes bimodality-induced clock skew
Bimodality-Aware Timing Model and Analysis
• Timing model
  [Figure: poly gates of a cell assigned alternately to CD groups G1 and G2]
• Two timing libraries:
• G1L-G2S: group1 has larger CD than group2
• G1S-G2L: group1 has smaller CD than group2
• Two coloring versions of a cell in each library
• C12: leftmost poly is in group1
• C21: leftmost poly is in group2
• Mean CD difference
• Chosen from process information
• E.g., 2nm, 4nm and 6nm
• Timing analysis
• Worse timing between G1L-G2S and G1S-G2L libraries is
regarded as the actual worst-case timing
DPL Layout-to-Mask Flow (ICCAD’09)
[Flowchart: RTL-to-GDS → DPL Mask Coloring → Bimodal-Aware Timing Analysis → Optimization 1 → Optimization 2]
• Optimization 1: Maximization of Alternate Coloring (Datapaths)
  • Alternate coloring using integer-linear programming
• Optimization 2: Placement Perturbation for Color Conflict Removal (Clock and Data paths)
  • Placement perturbation using dynamic programming (“CORR”)
  [Figure annotation: coloring conflict; spacing > minimum resolution]
Overall Timing Improvement
• Bimodal timing model → Reduce pessimism
• Alternate coloring → Improve timing
• Placement perturbation → Remove conflicts

                                                     Mean CD Difference
Stage                          #Conflict  Metric     2nm       4nm       6nm
Initial Coloring (Unimodal)    0          WNS (ns)   -1.113    -2.016    -2.902
                                          TNS (ns)   -671.1    -1776.3   -3348.5
Initial Coloring (Bimodal)     0          WNS (ns)   -0.191    -0.354    -0.527
                                          TNS (ns)   -8.17     -26.56    -64.64
Alternative Coloring           219        WNS (ns)   -0.090    -0.145    -0.267
                                          TNS (ns)   -1.48     -3.85     -22.40
DPL-Corr (+ECO Routing)        0          WNS (ns)   -0.104    -0.183    -0.295
                                          TNS (ns)   -3.43     -10.45    -28.42

Complete design methodology to mitigate bimodality impact (timing models, timing signoff, coloring, placement)
Summary: DFM = A Focus Area Since 1997
• Helps to see a big picture
• Manufacturing processes (litho, CMP, …)
• Designs (architectures, circuits, devices)
• Design methodology and CAD
• Large-scale optimization, algorithms
• Industry structure, interfaces, handoffs
• Fast-moving
• Always many more research problems than time/students to handle
• Spacer double-patterning, EUV lithography, 3D integration with
through-silicon vias, resilience, aging, …
• Relevant
• Real-world solutions are urgently needed by industry
• “Design = Equivalent Scaling” (this includes DFM) → high value
• (My research interests are more wide-ranging than this)
THANK YOU
BACKUP
Living With Variation: Resilience
Outline
• I. Motivation: Variability and Value
• II. Classical Design for Manufacturing: DAM, MAD
• III. Futures at the Design-Manufacturing Interface
• MAD for recent patterning technologies
• Living with variation: Resilience
• Emerging technologies and new problems: EUV, 3D, …
Why Resilience Is Required
Paradigm shift is forced by technology scaling (variability ↑)
• Traditional Worst-Case Design: Design-Time Verification and Optimization
• Resilient Design: Typical Case Optimization, Run-Time Verification
“Better Than Worst-Case Design” [Austin05]: Relaxing the requirement of correctness for designs can dramatically reduce costs of manufacturing, verification and test ...
Resilient Designs
• Resilience: dynamic (runtime) reliability management
• Tolerate errors with redundancy techniques
• Avoid errors with a sensor and adaptive control
• Error Tolerance: Recovery-Driven Design (detect and correct errors w/ Razor flip-flop)
• Error Avoidance: Adaptive System (performance monitoring, dynamic voltage and frequency scaling)
• Error Acceptance: Approximate Design (performance ↑, power ↓ w/ relaxation, e.g., human sense related application)
Recovery-Driven Design [DAC10]
• Low-power methodology for error-tolerant designs
  • Minimize power for a target error rate
  • Slack redistribution w/ functional information
• Steps
  • Voltage Scaling: reduce voltage until the error rate exceeds a target
  • Path Optimization: optimize frequently exercised, negative slack paths
  • Power Reduction: reducing power w/o affecting error rate
Recovery-Driven Design: Experimental Results
• Path extraction and error rate estimation: accurate, fast (20X)
• Power comparison at each design technique
  • 25% power savings w/ 2% error
  • 22% power savings w/ Razor flip-flop
[Figure: power consumption (W), 0.015 to 0.021, vs. error rate (0.13% to 8.00%) for Conventional P&R, Tight P&R, PCT, Slack Optimizer, and Power Optimizer]
Resilience Overhead
• Resilience incurs design overheads
i.e., additional circuits and operations
Razor flip-flop
- 2x area, 1.5x power
- additional energy on error recovery
• Tradeoff between resilience overhead and design cost reduction
  [Figure: tradeoff between # of Razor F/F and area (power) of fanin circuit]
Goal: Minimize a cost function of (area, power) using the tradeoffs
Resilience Overhead Reductions
1. Selective-endpoint optimization
: Optimize endpoint incrementally based on sensitivity function
p: negative slack path ,
fanin(p): # cell in fanin cone
2. Clock skew optimization
: Maximize timing tolerance
endpoint optimization
clock‐skew optimization
Approximate Arithmetic Designs
Approximation generates good enough results rather than totally accurate results (e.g., for applications related to human senses)
• Approximate Adder: cut carry chain; performance ↑; error rate: < 1% (Lu et al., Computer, 2004)
• Approximate Multiplier: under-designed multiplier; 45% power reduction w/ 3.3% avg. error (Kulkarni et al., VLSI Design, 2011)
Error-Correctable Approximate Adder (ECA)
• Proposed design: divide into sub-adders to cut carry chain
  [Figure: 16-bit adder implementation. Inputs A[15:12], A[11:8], A[7:4], A[3:0] and B[15:12], B[11:8], B[7:4], B[3:0]; outputs Co, S[15:12], S[11:8], S[7:0]]
• Error detection
  (case 1) carry[8] && S[11:8] == 1111(2)
  (case 2) carry[4] && S[7:4] == 1111(2)
• Error correction
  (case 1) S’ = S + 16’h1000
  (case 2) S’ = S + 16’h0100
• Both accurate and approximate results are available
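One way to read the detection and correction rules above is sketched below. The structure is an assumption, since the slide shows only slices and conditions: three overlapping 8-bit sub-adders without carry-in, where the lower half of each upper sub-adder is a redundant copy used for detection, and carry[8] is the carry out of bits [7:4] computed without carry[4]. Helper names are hypothetical.

```python
def _bits(value, low, width):
    """Extract `width` bits of `value` starting at bit `low`."""
    return (value >> low) & ((1 << width) - 1)

def eca_add(a, b):
    """Return (approximate, corrected) sums of two 16-bit operands."""
    low = _bits(a, 0, 8) + _bits(b, 0, 8)     # A[7:0] + B[7:0]   -> S[7:0]
    mid = _bits(a, 4, 8) + _bits(b, 4, 8)     # A[11:4] + B[11:4] -> S[11:8]
    top = _bits(a, 8, 8) + _bits(b, 8, 8)     # A[15:8] + B[15:8] -> Co, S[15:12]

    approx = ((top >> 4) << 12) | (_bits(mid, 4, 4) << 8) | _bits(low, 0, 8)

    carry4 = (_bits(a, 0, 4) + _bits(b, 0, 4)) >> 4   # carry out of bits [3:0]
    carry8 = (_bits(a, 4, 4) + _bits(b, 4, 4)) >> 4   # carry out of bits [7:4] alone
    # A carry is missed when it enters a sub-adder whose lower half is all ones.
    case1 = carry8 == 1 and _bits(top, 0, 4) == 0xF   # missed carry into bit 12
    case2 = carry4 == 1 and _bits(mid, 0, 4) == 0xF   # missed carry into bit 8

    corrected = approx + (0x1000 if case1 else 0) + (0x0100 if case2 else 0)
    return approx, corrected

print([hex(v) for v in eca_add(0x00FF, 0x0001)])   # case 2 fires
print([hex(v) for v in eca_add(0x0F80, 0x0080)])   # case 1 fires
```

Under this reading the two cases are mutually exclusive and the corrected result equals the exact sum for all operands, while the approximate result is available one sub-adder delay earlier.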
BACKUP
Unifying Litho/Mask, Design Rules,
and Electrical Metrics
Gate Line-End Patterning
• Problem:
  • Transistor is no longer rectangular → in particular, gate line-end suffers from tapering
  • Tapering results in large leakage increase
  • To suppress tapering, long line-end extension and complex RET are required
    → Increased manufacturing cost in terms of area and complexity
• Traditional line-end metrics
  • Line-end gap (LEG), line width at gate edge (LW0)
  • Have guided litho and RET for many years, but may be oblivious to tradeoff of area, cost, and variation-robustness
  [Figure: LW0 and LEG on a gate line-end; failure modes Line-End Shortening (LES) and Line-End Bridging (LEB)]
• Electrical model of line-end is required
How Does Line-End Affect Current?
• LEE (line-end extension) affects the current (Ion and Ioff) at the gate edge.
• As area of LEE increases, current at the gate edge increases sharply and the increase depends on the Ctaper
[Figure: current profile across the gate width (positions 1 to 55), 2.20E+07 to 2.70E+07, for line-end extensions of 10nm, 40nm, and 60nm; test structure of gate over diffusion with 80nm and 70nm fixed dimensions. * From DaVinci]
Impact of Line-End
• LEE vs. Capacitance (increasing LEE)
  • Line-end extension increases Cg because of fringe capacitance between line-end extension and channel
• Capacitance vs. Vth
  • Cg affects Vth, following Vth model equation:
    V_{th} = V_{fb} + 2\phi_B + \frac{\sqrt{4 \varepsilon_{si} q N_a \phi_B}}{C_{ox}}, evaluated with C_g in place of C_{ox}
  • Cg increase → Vth decrease
  • Cg decrease → Vth increase
• Vth vs. Current
  • Ion and Ioff are functions of Vth
  • Vth increase → Ion, Ioff decrease
  • Vth decrease → Ion, Ioff increase
[Figure: Ion (uA) and Ioff (pA) vs. line-end extension (0 to 100 nm)]
Line-End Shape Evaluation
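• Super-Ellipse Representation for line-end
  \left|\frac{x}{a}\right|^n + \left|\frac{y - k}{b}\right|^n = 1
  [Figure: super-ellipse in the x-y plane with origin o, semi-axis a along x, semi-axis b along y, and vertical offset k]
• Typical Line-End Shapes
  [Figure: (a) Tapering, (b) Bulge, (c) Necking, each drawn against the nominal gate length Lnom. Shapes are controlled by small vs. large n, small vs. large a, and small vs. large b, with mirroring; necking has a minimum-width location (lmin at height ylmin)]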
• Linewidth Model (tapering and bulge; necking)
  Linewidth l_i at height h_i, from the super-ellipse:
  l_i = 2a \left[ 1 - \left| \frac{h_i - k}{b} \right|^n \right]^{1/n}
Mask + Design Rule + Performance: Unify!
[Figure: OPC cost (minEdgeLength), design rule (line-end length), and electrical cost (leakage power) are linked; OPC cost increase and design area increase trade off against leakage]
[Figure: leakage current vs. (minEdgeLength, LEE), four panels: Dose=0.96, DoF=0nm; Dose=1.04, DoF=0nm; Dose=0.96, DoF=100nm; Dose=1.04, DoF=100nm. Each panel plots leakage against line-end length (100nm down to 30nm) for minEdgeLength settings of default, 100nm, 50nm, 30nm, and 10nm]
Analysis on Tapering: Area vs. Cost vs. Ioff
• Standard cell
  [Figure: standard-cell layout (poly, diffusion, NWell, contact) annotated with dimensions a, b, c1, c2, d, e, f, g and cell height H; misalignment: 11nm]
• New shape metric: Superellipse
  \left|\frac{x}{a}\right|^n + \left|\frac{y - k}{b}\right|^n = 1
  [Figure: Ioff (A), 3.00E-10 to 8.00E-10, and area reduction (%), 0 to 10, vs. LEE (100nm down to 20nm) for n = 2.5, 3.0, 3.5, 4.0, 4.5, 5.0 (small n to large n)]
Best OPC + Design Rule Combination
• Cell area can be reduced by 10%
• Ioff can be reduced by 29%
LEE (nm) \ LEG (nm):
         10    20    30    40    50    60    70    80    90    100
100      B     224   220   376   189   181   145   162   164   152
90       B     220   212   340   185   176   146   159   162   153
80       B     216   213   370   180   172   140   156   159   150
70       B     209   207   330   175   168   140   151   156   148
60       B     204   200   353   172   164   136   147   153   145
50       B,S   S     S     310   162   160   132   142   148   142
40       B,S   S     S     347   161   161   108   138   144   139
30       B,S   S     S     342   S     S     S     135   S     S
20       B,S   S     S     S     S     S     S     S     S     S
10       B,S   S     S     S     S     S     S     S     S     S
Legend: B = bridging, S = line-end shortening, F = broken
BACKUP
Emerging Technologies
Outline
• I. Motivation: Variability and Value
• II. Classical Design for Manufacturing: DAM, MAD
• III. Futures at the Design-Manufacturing Interface
• MAD for recent patterning technologies
• Living with variation: Resilience
• Emerging technologies and new problems: EUV, 3D, …
SADP Processes
• Self-aligned double patterning (SADP)
  [Figure: process on oxide: 1st litho-etch (mandrel) → spacer formation → oxide etch & Cu filling]
• Spacer is Dielectric (SID)
• Spacer is Metal (SIM)
• SID uses simpler processes
  [Figure: SID vs. SIM process steps, including resist, block mask, cut mask, spacer cut, ash, tone reversal, and oxide etch; legend: dielectric, metal (to be filled)]
Figure courtesy Y. Ma et al., GlobalFoundries, SPIE 2010
CD Variability in SID-SADP
Sources of CD variation (from ITRS LITH5B):
Source                                  Symbol   Value
Mandrel width variation                 sM       2.0n@2012
Block mask width variation              sB       2.0n@2012
Spacer thickness variation              sS       1.3n@2012
Mandrel-to-block overlay                sM-B     8n@2012

Feature                                 Variance (σ²)                      σ (nm)
Metal CD by mandrel                     sM²                                2.0
Metal CD by gap                         sM² + (2sS)²                       3.3
Metal CD by wide mandrel+block          (0.5sM)² + sM-B² + (0.5sB)²        8.1
Metal CD by wide gap+block              sM² + sS² + sM-B² + (0.5sB)²       8.4
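The σ column follows directly from the variance expressions and the ITRS values above. A quick check, assuming all four values are in nm; the variable and dictionary names are illustrative.

```python
import math

S_M = 2.0    # mandrel width variation
S_B = 2.0    # block mask width variation
S_S = 1.3    # spacer thickness variation
S_MB = 8.0   # mandrel-to-block overlay

sigma_nm = {
    "mandrel": math.sqrt(S_M**2),
    "gap": math.sqrt(S_M**2 + (2 * S_S)**2),
    "wide mandrel + block": math.sqrt((0.5 * S_M)**2 + S_MB**2 + (0.5 * S_B)**2),
    "wide gap + block": math.sqrt(S_M**2 + S_S**2 + S_MB**2 + (0.5 * S_B)**2),
}

for feature, sigma in sigma_nm.items():
    print(f"{feature}: {sigma:.1f} nm")
```

The overlay term dominates as soon as the block mask defines an edge, which is why block-defined line edges are discouraged on the next slide.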
Variability Vs. Mask Assignments
• Block mask patterns that define line edge should be avoided
[Figure: target metal pattern vs. final pattern with overlay error. Line edges defined by only spacers: no CD variation. Block mask (yellow) defines line edges: CD variation]
• Patterns by mandrel vs. patterns by gap
  • There is still a bimodality between A and B!
  • Important signal nets should be generated by mandrel
[Figure: Coloring1: A < B; Coloring2: A > B]
Extreme Ultraviolet Lithography (EUVL)
• What is EUV?
  • A form of soft X-rays
  • Much smaller λ and higher energy: ArF (193nm) → EUV (13.5nm)
  [Figure: spectrum from IR through UV, VUV, and EUV to soft and hard X-ray; wavelength axis 1000nm, 100nm, 10nm, 1nm; energy axis 1eV, 10eV, 100eV, 1KeV, 10KeV; ArF (193nm) and EUV marked, with 6.9eV and 92eV labeled. Figure courtesy G. Vandenberghe, IMEC, 2008]
• EUVL requires a new set of systems
  • Multilayer reflective mask
  • New illumination optics
  • High power source
  • Projection optics
  • Resist
  [Figure: EUV exposure system from source to wafer. Figure courtesy G. Vandentop, Intel, SPIE 2009]
• Benefits of EUV
  • Sharp and less corner rounding
  • Large process window
  → No complicated OPC
  → Reduced data volume / mask write time
  [Figure: Bossung curves (CD (nm) vs. focus (nm)) for EUV (28nm DR) vs. ArFi (40nm DR). Figure courtesy H. Meiling, ASML, SPIE 2009]
Research Directions for EUVL
• Development of high power EUV sources
• New resist materials to balance resolution, sensitivity and LER
• Defect-free mask generation
  • Buried blank defects in multilayer → CD errors
  • Defect density target: 1 defect/cm² today → ~0.003 defects/cm² for logic
  Figure courtesy T. Terasawa et al., Selete, 2009
• New inspection/metrology sensitive to small volume defects
  • Biggest gap exists between reality and requirement
• Design issues
  • Design methodology with reduced guardband
  • Fast simulator for impact of buried defects
  • Defect-aware reticle floorplan
Why 3-D Integration?
• Small footprint
• Multiple small dies; selective assembly; improved yield; reduced cost
• Reduced interconnect (long interconnect → short interconnect); reduced capacitance; ~50% lower power; ~40% higher speed
• Higher bandwidth
  • Off-chip package (CPU, MEM): BW < 100GB/s
  • 2D planar MCP (CPU, MEM): BW = 100-200GB/s
  • 3-D stacked die MCP (MEM on CPU): BW > 1TB/s
• Higher security (against reverse engineering)
• Enabling heterogeneous integration (different technology/foundry/devices/…)
Research Directions in 3-D Integration
• Challenges
• Design
  • Heat dissipation / power distribution / chip-package system co-design / 3-D design partitioning / fast simulation and verification / standardization
• Material
  • Stress-temperature resistant material / superior thermal-electrical characteristics / lower elasticity
• Process
  • Wafer thinning / stress relief / wafer-to-wafer alignment and bonding / TSV generation
• Test
  • Known-Good-Die (KGD) assurance / accessibility to individual wafer/die/TSV
[Figure: microchannel cooling. Figure courtesy A. Sridhar et al., IBM, 2010]
BACKUP
Compressed Sensing
DCT-Based Compressed Sensing
• Li et al., ICCAD-2009, “Virtual Probe (VP)”
• A variation can be restored when DCT coefficients G(u,v) are known, using inverse DCT
[Figure: example DCT coefficient matrix; only a few low-frequency entries are non-zero (-41.25, 20.16, -3.92, 2.17, -0.93, 0.73, -0.37, 0.32, -0.16, 0.13, -0.05), all other entries are 0.00]
A \eta = B, with
A = \begin{bmatrix} A_{1,1,1} & A_{1,1,2} & \cdots & A_{1,P,Q} \\ A_{2,1,1} & A_{2,1,2} & \cdots & A_{2,P,Q} \\ \vdots & \vdots & & \vdots \\ A_{PQ,1,1} & A_{PQ,1,2} & \cdots & A_{PQ,P,Q} \end{bmatrix}, \quad
\eta = \begin{bmatrix} G(1,1) \\ G(1,2) \\ \vdots \\ G(P,Q) \end{bmatrix}, \quad
B = \begin{bmatrix} g(1,1) \\ g(1,2) \\ \vdots \\ g(P,Q) \end{bmatrix}
A_{k,u,v} = \alpha_u \beta_v \cos\frac{\pi (2x - 1)(u - 1)}{2P} \cos\frac{\pi (2y - 1)(v - 1)}{2Q}
• Sparsity of DCT coefficients enables efficient restoration from a small number of samples → Minimize L1-norm of DCT coefficients
minimize: t_1 + t_2 + \cdots + t_{PQ}
subject to: A \eta = B
            -t_i \le \eta_i \le t_i, (i = 1, 2, \ldots, PQ)