Futures at the IC Design-Manufacturing Interface
Andrew B. Kahng
UCSD VLSI CAD Laboratory
abk@ucsd.edu
http://vlsicad.ucsd.edu
CSE and ECE Departments, University of California, San Diego
Andrew B. Kahng 110328

Outline
• I. Motivation: Variability and Value
  • Semiconductor trends
  • Power and yield challenges
  • How to maximize value from new technology: “DFM”
• II. Classical Design for Manufacturing: DAM, MAD
• III. Futures at the Design-Manufacturing Interface

What Does an IC Do?
[Figure: requirement for multimedia processing (GOPS: Giga OPs/sec), on a scale from 0.01 to 100 GOPS. Application groups shown: video (MPEG1, MPEG2, MPEG4, JPEG; extraction and compression), audio/voice (Dolby-AC3, MPEG, word recognition, sentence translation, voice auto translation), graphics (2D graphics, 3D graphics at 10 Mpps to 100 Mpps), communication (FAX, modem, VoIP modem, SW-defined radio), recognition (voice print, face, moving picture).]
• 2007 ITRS Consumer-Stationary SOC Model: 220 TFlops on single chip in 2022

What Does an IC Look Like?
• Qualcomm’s first 45nm tapeout: “Serra” (~July 2009)
  • 53 mm², 419 signal IOs
  • 170M transistors
  • 13.7M bits RAM
  • 1.1M bits ROM
  • CDMA, UMTS, GSM
  • 2 DSP, ARM9, ARM11
  • WVGA, ATI LT, 8MP
  • 6 metal layers + RDL
  • 2 Vt, 2 Lgate
  • 450uA leakage at TT / 25C / 1.125V
  • 105 master, 230 total clock domains
  • 24 analog, pad power domains
  • 8 digital power islands
  • …
Matt Severson, UCSD ECE 260B guest lecture, February 2011

How Is It Created?
• Design: System, Behavior, Circuit, Layout
• Manufacturing: Corrected Layout, Mask, Wafer, Packaged Die
• Test: E-Test, Wafer Sort, Burn-In, Final Test

Manufacturing: Optical Lithography
• Feature size (28nm) << Wavelength of light (193nm)
• Still 193nm at 20nm in 2013
[Figure: Numerical Technologies, Inc., 1998]

Photomask Complexity (Intel 65nm)
[Figures: Kelin Kuhn, Intel, ICVC 2009]
Manufacturing complexity brings challenges…

Challenge: Variability
• What is this chip’s frequency?
[Figure: across-wafer frequency variation. Figure courtesy Intel; S. Nassif, IBM]

Challenge: Power
• Primary limiter of product features, performance, form factor
• Expensive: packaging, electricity, cooling, …
• In particular: Leakage power = “wasted” power
D. Blaauw, U. Michigan

Example of Power As Limiter: Form Factor
• Power → heating → thermal runaway, human discomfort
[Figure: phone surface temperature rise [C] vs. surface power density [W / sq-in] (target = 0.1). Huawei / M. Severson, Qualcomm]

Challenge: Variability of Power
[Figure: normalized frequency (0.9 to 1.4) vs. normalized leakage (0 to 20) for 1000 Intel CPUs at the 130nm node; annotations: 30% (frequency), 20x (leakage)]
• Subthreshold leakage varies exponentially with Lgate, Vt, tox … → 5-20X variation is common
• Gate length [== “critical dimension” (CD)] variation in manufacturing is a major source of leakage variation
• Parametric yield loss: cannot sell leaky chips or slow chips

Cost of Variability: Example
• Company hopes to sell 100M copies of its chip
• 300mm wafer in 45nm process: $5,000
• Die size: 10mm x 10mm → ~600 raw die per wafer
• 90% vs. 95% yield after parametric yield loss → 540 vs. 570 good die per wafer
• 185,186 vs. 175,439 wafers needed → ~$50M difference (9,747 extra wafers x $5,000 ≈ $48.7M)

How Can Designers Deal With Variability?
• Guardbands?
  • Increase margins and over-design → more area, more power, less value
• Statistics?
  • “Probabilistic” instead of “worst-case” design → difficult since statistics are always changing
• Wait for better manufacturing equipment?
• My work: Connect IC Design and Manufacturing
  • Drive design requirements into manufacturing
  • Bring manufacturing awareness into design
  • In practically useful ways

Value at the Design-Manufacturing Interface
• Design: System, Behavior, Circuit, Layout
• Manufacturing: Corrected Layout, Mask, Wafer, Packaged Die
• Test: E-Test, Wafer Sort, Burn-In, Final Test
• Create value HERE (at the boundary between Design and Manufacturing)

Outline
• I. Motivation: Variability and Value
• II. Classical Design for Manufacturing: DAM, MAD
  • Design-Aware Manufacturing (DAM)
  • Manufacturing-Aware Design (MAD)
  • Variations: Measure, Model, and Monitor
• III. Futures at the Design-Manufacturing Interface

[Diagram: Design and Manufacturing exchange DESIGN INFORMATION and MANUFACTURING INFORMATION]
1. Design-Aware Manufacturing. KEY IDEA: Not all shapes equally important!
2. Manufacturing-Aware Design. KEY IDEA: Mitigate and compensate systematic variations
3. Variation: Measure, Model, Monitor

Key Aspect of Design = Timing Slack
• Positive timing slack can be “converted” into power reductions (smaller transistors, area, power, …)
• Slack = Trequired – Tarrival
• Gates of positive-slack cells can have larger CD, variation budgets!
[Figure: clocked timing graph annotated with arrival time, required time and slack at each pin]
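
The slack definition above (Slack = Trequired – Tarrival) can be made concrete with a small sketch. This is only an illustration under stated assumptions: the netlist is a hypothetical acyclic graph of (source, sink, delay) edges, the names `compute_slacks`, `arrival_at_inputs` and `required_at_outputs` are invented for this example, and real static timing analysis also handles rise/fall transitions, clocks and hold checks.

```python
from collections import defaultdict

def compute_slacks(edges, arrival_at_inputs, required_at_outputs):
    """Return {node: slack} for a DAG given as (src, dst, delay) edges."""
    succ, pred = defaultdict(list), defaultdict(list)
    nodes = set(arrival_at_inputs) | set(required_at_outputs)
    for s, d, w in edges:
        succ[s].append((d, w))
        pred[d].append((s, w))
        nodes |= {s, d}

    # Topological order (Kahn's algorithm).
    indeg = {n: len(pred[n]) for n in nodes}
    ready = [n for n in nodes if indeg[n] == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for d, _ in succ[n]:
            indeg[d] -= 1
            if indeg[d] == 0:
                ready.append(d)

    # Forward pass: latest arrival time at each node.
    arrival = {}
    for n in order:
        if pred[n]:
            arrival[n] = max(arrival[s] + w for s, w in pred[n])
        else:
            arrival[n] = arrival_at_inputs.get(n, 0)

    # Backward pass: earliest required time at each node.
    required = {}
    for n in reversed(order):
        if succ[n]:
            required[n] = min(required[d] - w for d, w in succ[n])
        else:
            required[n] = required_at_outputs[n]

    return {n: required[n] - arrival[n] for n in nodes}

# Hypothetical two-gate example: g1 is on the longest path, g2 is not.
edges = [("a", "g1", 2), ("b", "g1", 1), ("g1", "out", 3),
         ("b", "g2", 1), ("g2", "out", 1)]
slacks = compute_slacks(edges, {"a": 0, "b": 0}, {"out": 7})
```

In this toy case `g2` ends up with more positive slack than `g1`, which is the kind of cell the slides identify as a candidate for a larger CD or a looser variation budget.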

Transistor Gate-Length Biasing
[Figure: Leakage and Delay vs. Gate Length] [DAC’04, TCAD’06]
• Bias impact
  • Exponential: (Isub) leakage reduction, variability reduction
  • Linear: performance reduction
• Idea: Use fine-grain gate-length biases (e.g., +2nm, +4nm)
  • Save leakage without changing chip timing
  • Chip-scale optimization after detailed routing, before tapeout

Design-Aware Manufacturing
• Transistor on non-critical path: target CD 70nm
• Transistor on near setup-critical path: target CD 66nm
• Transistor on setup-critical path: target CD 65nm
• Design Win: 20+% less leakage, 30+% less leakage variability
• Manufacturing Win: Same process has more value
• UCSD-patented flow: marker shapes in GDSII sent to OPC, etc.
• Currently offered in TSMC’s Green “Power Trim” service
• Energy savings over lifetime of AMD/ATI Radeon GPU chips: O(annual output of nuclear power plant)
*Patent Pending

Equipment Can Also Trade Off Delay, Leakage
• ASML “DoseMapper” technology
  • Adjusts exposure dose to compensate across-chip, across-wafer variations
  • Up to +/-5% dose variation in both “slit” and “scan” directions
  • Dose sensitivity of linewidth: 1-2 nm/%
[Figure: DoseMap in slit and scan directions (slit direction profile, scan direction profile)]

Design-Aware DoseMap (DAC’08)
• Original DoseMap: same CDs
  • Goal: same CD for all devices
  • No design awareness
• Our DoseMap: different CDs
  • Setup-critical path: larger dose → faster transistors
  • Non-critical path: smaller dose → less leaky transistors
• Improve timing yield with no leakage penalty

Design-Aware Manufacturing. KEY IDEA: Not all shapes equally important!
[Diagram: Design and Manufacturing exchange DESIGN INFORMATION and MANUFACTURING INFORMATION; Variation: Measure, Model, Monitor]
2. Manufacturing-Aware Design. KEY IDEA: Mitigate and compensate systematic variations

Systematic Defocus-Layout Interaction
• Defocus: Non-ideal distance from light source to wafer surface in lithography tool
  • Up to ~70nm in 45nm process [size of retrovirus = ~100nm]
• Interaction with Layout: defocus causes “isolated” transistors to speed up, but “dense” transistors to slow down
[Figure: line width vs. defocus. Width of dense lines increases (SMILE) (slower); width of isolated lines decreases (FROWN) (faster). The best-case to worst-case variation assumed when nothing is known about the layout pattern is compared with the actual variation if dense-ness is known and the actual variation if iso-ness is known.]
• DAC’03: Awareness of defocus-layout interaction can save 40% of best-worst timing analysis guardband

Self-Compensating Design (DAC’05)
• Idea: Isolated, Dense lines have opposite behavior under defocus → mitigate variation by mixing them in layout
• Goal for near-early path, goal for near-late path: [Figures: target delay vs. defocus curves around 0 defocus; Lgate vs. defocus]
• Self-compensation
  • In cell library layout: mix Iso and Dense transistor gates on timing arcs
  • In standard-cell layout: mix Iso and Dense cells on timing paths

Improved Delay Distribution Under Defocus
• Required cycle time = 2.177ns
• Monte-Carlo simulation (1000 trials) of timing under defocus
• Self-compensated design achieves more robust timing with 0.4% area overhead
[Figure: histograms of # of samples vs. delay (2.10 to 2.25 ns) for the original and the self-compensated design]

Variation Mitigation in Design
• Observations from manufacturing
  • Variations depend on layout pitch: e.g., Iso vs. Dense
  • Some pitches are “forbidden”: poor patterning, small process window
• Manufacturing-side solutions exist…
  • Sub-Resolution Assist Feature (AF): enhances uniformity of the poly pitch. Cannot apply to wrong pitch!
  • Auxiliary Pattern (AP)**: shields proximity effect. Requires extra spacing!
• … but are not always feasible
• Design must change to help manufacturing correctness: our “-CORR” techniques
** U.S. patent issued to UCSD in January 2011

Dynamic Programming Optimizations
• Flow: Cell Library + RTL synthesis → Technology mapping / Boolean mapping → Gate-Level Netlist → Placement → Placement Opt. → Routing → Manufacturing
• Placement Opt. (original placement vs. shift for AF):
  ① Shift cells in available whitespace
    • Make room for AF or AP
    • Improve timing yield (STI stress)
  ② Swap cells
    • Make leakage-optimal pitch
• AF-CORR [SPIE 2005, TCAD2007]: OPC error reduction by 83-100%
• AP-CORR [SPIE 2006, JM3 2008]: OPC runtime reduction by 5x-200x
• Leakage-CORR [ISLPED07]: Leakage reduction by 5-7%
• Stress-CORR [ICCAD07]: STI width-dependent stress; timing speedup by 5% with no leakage

Design-Aware Manufacturing. KEY IDEA: Not all shapes equally important!
[Diagram: Design and Manufacturing exchange DESIGN INFORMATION and MANUFACTURING INFORMATION; Manufacturing-Aware Design]
3. Variation: Measure, Model, Monitor. KEY IDEA: “Random” Variation → Systematic (Then, Mitigate and Compensate)

Measure, Model
• Scales: Across wafer, field (mask), die
• Sources: litho, etch, CMP, stress, anneal, …
• Structures: scribe-line devices, ring oscillators, …
• Measurements: Idsat, fmax, I-V characteristic, …
• A difficult task!
  • 40nm foundry partner: Ids of 9 isolated devices; frequency of 17 ROs per field. 75 fields per 300mm wafer → huge data
  • Where to place test structures?
  • How many to measure?
  • Bad/noisy measurements?
  • Interpolation?
  • Decomposition?
  • Model fitting?
  • …
[Figure: nine measurement sites per field: TL, TC, TR, CL, C, CR, BL, BC, BR]

UCSD Website for Variation Mapping
• Various variation modeling techniques are available

Observed Variations in Foundry Data
• STMicroelectronics 40nm (900 fields, 9 points/field) [ST visitor’s project = enable 28nm DoseMap]
  • Intrafield variation modeling
  [Figure: Ids variation (%) vs. location in field, measured and modeled]
• IBM 65nm SOI (348 wafers, 100 dies/wafer, 14 points/die)
  • Intrawafer variation modeling

Recent Direction: Natural Timing Paths
• Many test structures → large area, test cost
• Few test structures → inaccurate
• ???
• Natural timing paths in a design
  • Do not require additional area
  • Can measure variation in (natural) timing paths
  • Automated measurements: speedpath test
• Gives rise to variation mapping problem:
  • Given a 2D gridded region and measured delays (= compounded variations) of timing paths that span multiple grids
  • Explain the compounded variations with a map of physical parameter variations in each grid

Compressed Sensing: TAU’11 to appear
• Scenario: Die stacking (1st die, 2nd die)
• Can we reconstruct variations of interconnect capacitance and gate CD across two stacked die?
[Figure: given vs. restored maps (1st die CD map, 2nd die CD map; gate CD map, interconnect capacitance map). Reported maximum errors: 0.70% and 0.39% (using 50 paths); 1.68% and 0.76% (using 90 paths).]

Monitor: Design-Dependent RO
• Problem: Measure real-time performance variation in an adaptive system
• Approach: Select gates to form design-dependent ring oscillators (DDROs) with similar delay sensitivity to variations (Lgate, Vth, Tox, V, T, …) as actual critical paths
• Potential benefits:
  • Specific to path’s rising or falling transition
  • Can cluster critical paths having similar sensitivities to reduce number of ROs
  • Low area overhead
  • Automated design flow, standard cells only
[Figure: normalized delay sensitivities $(1/Delay_{nom})\,\partial Delay/\partial V_{th}$ vs. $(1/Delay_{nom})\,\partial Delay/\partial L_{gate}$ for Gate A, Gate B, the DDRO path (A+B), and the critical path]

DDRO Synthesis Flow
• Inputs: gate sensitivities, critical path sensitivities
• Cluster critical paths (Cluster 1, Cluster 2, …)
• For each cluster, synthesize a DDRO using integer linear program
• 45nm SOI test chip: ARM Cortex M3
[Figure: synthesis result: delay sensitivity error (%) for Clusters 1 to 5 and their average, comparing Inv. RO, CPRO and DDRO]

Monte Carlo Simulation Results (30 samples)
[Figure: estimated delay (ns) vs. actual delay (ns), 0.9 to 1.2 ns, one panel each for DDRO, critical path RO and Inv. RO]
• Without within-die variation modeling, estimation error ranges of the three panels: -1.4% ~ 3.7%, -2.0% ~ 4.1%, -4.3% ~ 7.1%
• With within-die variation modeling, estimation error ranges of the three panels: -0.5% ~ 3.7%, -1.3% ~ 3.6%, -1.7% ~ 5.1%

Outline
• I. Motivation: Variability and Value
• II. Classical Design for Manufacturing: DAM, MAD
• III. Futures at the Design-Manufacturing Interface
  • MAD for recent patterning technologies
  • Living with variation: Resilience
  • Emerging technologies and new problems: EUV, 3D, …

Double Patterning Lithography (DPL)
[Figure: desired pattern = first mask + second mask (combined exposure)]
• Note: Shown here is the “Litho-Etch-Litho-Etch” (LELE) approach (e.g., TSMC 20nm node). Other DPL approaches offer different challenges.

DPL Layout Decomposition
[Figure: example features with spacings d1 < t, d2 < t, d3 < t or d3 > t, d4 > t]
• Two features must be assigned opposite colors (= masks) if their spacing is less than a given minimum coloring spacing t
• IF two features within minimum coloring spacing t cannot be assigned different colors
• THEN at least one feature must be split into two or more parts
  • Pattern split increases manufacturing cost and complexity
  • Overlay (misalignment error) between exposures causes pattern errors
  • Should choose robust splitting points in non-critical patterns
• Coloring spacing t induces a graph of color conflicts: resulting “graph bipartization” formulations have been addressed in my group since 1998
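
The coloring rule above (opposite masks for any two features closer than t) amounts to two-coloring a conflict graph. The sketch below is a minimal illustration, not the decomposition flow from the talk: it assumes features are axis-aligned rectangles `(x1, y1, x2, y2)`, uses plain breadth-first two-coloring instead of an ILP, and only reports same-color conflicts (odd cycles) rather than splitting features. The function names and the example layout are hypothetical.

```python
from collections import deque
from math import hypot

def spacing(r, s):
    """Euclidean spacing between two axis-aligned rectangles (x1, y1, x2, y2)."""
    dx = max(0, s[0] - r[2], r[0] - s[2])
    dy = max(0, s[1] - r[3], r[1] - s[3])
    return hypot(dx, dy)

def two_color(rects, t):
    """Assign mask 0/1 to each feature; return (colors, unresolved conflict edges)."""
    n = len(rects)
    # Conflict graph: an edge joins features whose spacing is below t.
    adj = [[j for j in range(n) if j != i and spacing(rects[i], rects[j]) < t]
           for i in range(n)]
    color = [None] * n
    conflicts = []
    for start in range(n):
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in adj[i]:
                if color[j] is None:
                    color[j] = 1 - color[i]
                    queue.append(j)
                elif color[j] == color[i] and i < j:
                    # Same mask on both ends: part of an odd conflict cycle.
                    conflicts.append((i, j))
    return color, conflicts

# Three parallel lines at a 30nm pitch (20nm gaps), minimum coloring spacing t = 30nm.
lines = [(0, 0, 10, 100), (30, 0, 40, 100), (60, 0, 70, 100)]
colors_ok, conflicts_ok = two_color(lines, 30)

# Adding a horizontal strap close to all three lines creates an odd cycle.
strap = (15, 110, 55, 120)
colors_bad, conflicts_bad = two_color(lines + [strap], 30)
```

A non-empty conflict list corresponds to the IF/THEN case on the slide: some feature would have to be split, or the layout changed, before a legal two-mask assignment exists.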

DPL Layout Decomposition Flow (ICCAD’08)
• Layout fracturing
  • Polygons → rectangles
• Graph construction
• Conflict cycle (CC) detection
• Overlap length computation
  • If there is a feasible dividing point → node splitting
  • Otherwise, report an unresolvable conflict cycle (uCC)
• Graph updating
• ILP based DPL color assignment
[Flowchart: Layout fracturing → Graph construction → Conflict cycle detection → Conflict cycle? (No: ILP; Yes: Overlap length computation → Overlap margin? (No: uCC; Yes: Node splitting → Graph update))]

Example DPL Coloring Results
[Figure: decomposed poly layer and metal layer]
• Layouts are correctly decomposed with respect to a prescribed overlap margin at splitting points (for overlay robustness)

Bimodal CD distribution in DPL
• Two patterning steps → Two different CDs
  • Green lines from 1st patterning
  • Blue lines from 2nd patterning
• Two different colorings → Two different timings
  • C12-type cell: Odd polys in BLUE, Even polys in GREEN
  • C21-type cell: Odd polys in GREEN, Even polys in BLUE
[Figure: gates from CD group1 and gates from CD group2]

Impact of Bimodality on Guardband
• Comparison of design guardband (Min-Max delay)
• Unimodal representation is too pessimistic
[Figure: best-case and worst-case delay (s), 0 to 3.0E-11, vs. CD mean difference (1 nm to 6 nm) for the large CD group, the small CD group, and pooled CD. Jeong et al. ASPDAC’09]

Impact of Bimodality on Path Delay
• Bimodality can help reduce path delay variation
• Reduction of covariance when alternately colored
  • Uniform (C12 + C12 + C12): variation (σ) is accumulated
  • Alternate (C12 + C21 + C12): variation (σ) is compensated
[Figure: SPICE simulation results: Sigma / Mean (%), 0 to 25, vs. CD mean difference (nm), 0 to 6, for uniform and alternate coloring]
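
Why alternate coloring reduces path-delay variation can be seen in a deliberately simplified model. The assumptions here are not from the slides: each stage delay shifts linearly with the CD deviation of one of the two populations, all gates in a population share the same deviation, the two deviations are independent with equal standard deviation, and every stage has the same sensitivity. The function name `path_sigma` and the numbers are hypothetical.

```python
from math import sqrt

def path_sigma(groups, sens, sigma_cd):
    """Std. dev. of path delay under the simplified two-population model.

    groups   : list of 0/1, the CD population that sets each stage's delay
    sens     : delay sensitivity per stage (e.g., ps per nm of CD)
    sigma_cd : std. dev. of each population's shared CD deviation (nm)
    """
    n1 = groups.count(0)
    n2 = groups.count(1)
    # Deviations add linearly within a population (fully correlated) and in
    # quadrature across the two independent populations.
    return sens * sigma_cd * sqrt(n1 ** 2 + n2 ** 2)

uniform = path_sigma([0] * 6, sens=1.0, sigma_cd=1.0)        # all stages in one population
alternate = path_sigma([0, 1] * 3, sens=1.0, sigma_cd=1.0)   # stages alternate populations
ratio = alternate / uniform
```

Under these assumptions the alternately colored path has 1/√2 of the delay sigma of the uniformly colored one; the SPICE results in the talk are the authoritative numbers, and this sketch only shows the direction of the effect.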

Impact of Bimodality on Clock Skew
• Different coloring sequences in a clock network

  Case  Source to Sink A      Source to Sink B
  1     C12+C12+C12+…+C12     C12+C12+C12+…+C12
  2     C12+C12+C12+…+C12     C21+C21+C21+…+C21

[Figure: clock skew (s), 0 to 6.00E-11, vs. CD mean difference (0nm to 6nm); Case2 lies above Case1]
• Same color on all clock buffers is better!

Bimodal CD Distribution: 3 Key Facts
1. Design requires bimodal-aware timing models
  • Unimodal representation is too pessimistic
2. Data paths benefit from alternate (mixed) coloring
  • Exploit existence of two uncorrelated CD populations
  • Minimize correlated variations in a given path
3. Clock paths benefit from uniform coloring
  • Correlated variation between launch and capture paths minimizes bimodality-induced clock skew

Bimodality-Aware Timing Model and Analysis
• Timing model
  • Two timing libraries:
    • G1L-G2S: group1 has larger CD than group2
    • G1S-G2L: group1 has smaller CD than group2
  • Two coloring versions of a cell in each library
    • C12: leftmost poly is in group1
    • C21: leftmost poly is in group2
  • Mean CD difference
    • Chosen from process information
    • E.g., 2nm, 4nm and 6nm
• Timing analysis
  • Worse timing between G1L-G2S and G1S-G2L libraries is regarded as the actual worst-case timing

DPL Layout-to-Mask Flow (ICCAD’09)
• RTL-to-GDS
• DPL Mask Coloring
• Bimodal-Aware Timing Analysis
• Optimization 1: Maximization of Alternate Coloring (Datapaths)
  • Alternate coloring using integer-linear programming
• Coloring conflict > Minimum resolution
• Optimization 2: Placement Perturbation for Color Conflict Removal (Clock and Data paths)
  • Placement perturbation using dynamic programming (“CORR”)

Overall Timing Improvement
• Bimodal timing model → Reduce pessimism
• Alternate coloring → Improve timing
• Placement perturbation → Remove conflicts

                                                    Mean CD Difference
  Stage                         #Conflict  Metric     2nm       4nm       6nm
  Initial Coloring (Unimodal)   0          WNS (ns)   -1.113    -2.016    -2.902
                                           TNS (ns)   -671.1    -1776.3   -3348.5
  Initial Coloring (Bimodal)    0          WNS (ns)   -0.191    -0.354    -0.527
                                           TNS (ns)   -8.17     -26.56    -64.64
  Alternative Coloring          219        WNS (ns)   -0.090    -0.145    -0.267
                                           TNS (ns)   -1.48     -3.85     -22.40
  DPL-Corr (+ECO Routing)       0          WNS (ns)   -0.104    -0.183    -0.295
                                           TNS (ns)   -3.43     -10.45    -28.42

• Complete design methodology to mitigate bimodality impact (timing models, timing signoff, coloring, placement)

Summary: DFM = A Focus Area Since 1997
• Helps to see a big picture
  • Manufacturing processes (litho, CMP, …)
  • Designs (architectures, circuits, devices)
  • Design methodology and CAD
  • Large-scale optimization, algorithms
  • Industry structure, interfaces, handoffs
• Fast-moving
  • Always many more research problems than time/students to handle
  • Spacer double-patterning, EUV lithography, 3D integration with through-silicon vias, resilience, aging, …
• Relevant
  • Real-world solutions are urgently needed by industry
  • “Design = Equivalent Scaling” (this includes DFM) → high value
• (My research interests are more wide-ranging than this)

THANK YOU

BACKUP: Living With Variation: Resilience

Why Resilience Is Required
• Paradigm shift is forced by technology scaling (increasing variability)
• Traditional Worst-Case Design: Design-Time Verification and Optimization
• Resilient Design: Typical Case Optimization, Run-Time Verification
• “Better Than Worst-Case Design” [Austin05]: Relaxing the requirement of correctness for designs can dramatically reduce costs of manufacturing, verification and test ...

Resilient Designs
• Resilience: dynamic (runtime) reliability management
  • Tolerate errors with redundancy techniques
  • Avoid errors with a sensor and adaptive control
• Error Tolerance: Recovery-Driven Design
  • Detect and correct errors w/ Razor flip-flop
• Error Avoidance: Adaptive System
  • Performance monitoring
  • Dynamic voltage and frequency scaling
• Error Acceptance: Approximate Design
  • Performance / power w/ relaxation (e.g., human sense related application)

Recovery-Driven Design [DAC10]
• Low-power methodology for error-tolerant designs
  • Minimize power for a target error rate
  • Slack redistribution w/ functional information
• Voltage Scaling: reduce voltage until the error rate exceeds a target
• Path Optimization: optimize frequently exercised, negative slack paths
• Power Reduction: reducing power w/o affecting error rate

Recovery-Driven Design: Experimental Results
• Path extraction and error rate estimation → accurate, fast (20X)
• Power comparison at each design technique
  • 25% power savings w/ 2% error
  • 22% power savings w/ Razor flip-flop
[Figure: power consumption (W), 0.015 to 0.021, vs. error rate (0.13% to 8.00%) for Conventional P&R, Tight P&R, PCT Slack Optimizer, and Power Optimizer]

Resilience Overhead
• Resilience incurs design overheads, i.e., additional circuits and operations
  • Razor flip-flop: 2x area, 1.5x power
  • Additional energy on error recovery
• Tradeoff between resilience overhead and design cost reduction
  • # of Razor F/F vs. area (power) of fanin circuit
• Goal: Minimize a cost function of (area, power) using the tradeoffs

Resilience Overhead Reductions
1. Selective-endpoint optimization: optimize endpoints incrementally based on a sensitivity function (p: negative slack path; fanin(p): # of cells in fanin cone)
2. Clock skew optimization: maximize timing tolerance
[Figure: endpoint optimization and clock-skew optimization]

Approximate Arithmetic Designs
• Approximation generates good enough results rather than totally accurate results (e.g., for applications related to human senses)
• Approximate Adder
  • Cut carry chain → performance
  • Error rate: < 1%
  • Lu et al. Computer, 2004
• Approximate Multiplier
  • Under-designed multiplier
  • 45% power reduction w/ 3.3% avg. error
  • Kulkarni et al. VLSI Design, 2011

Error-Correctable Approximate Adder (ECA)
• Proposed design: divide into sub-adders to cut carry chain (16-bit adder implementation)
  • Inputs: A[15:12], A[11:8], A[7:4], A[3:0] and B[15:12], B[11:8], B[7:4], B[3:0]
  • Outputs: {Co, S[15:12]}, S[11:8], S[7:0]
• Error detection
  • (case 1) carry[8] && S[11:8] == 1111 (binary)
  • (case 2) carry[4] && S[7:4] == 1111 (binary)
• Error correction
  • (case 1) S’ = S + 16’h1000
  • (case 2) S’ = S + 16’h0100
• Both accurate and approximate results are available
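
The detection and correction cases above can be exercised in software. The following is a behavioral sketch under assumptions of mine, not the circuit from the talk: it uses three overlapping 8-bit sub-adders on bits [7:0], [11:4] and [15:8] with carry-in 0, reads "S[7:4]" and "S[11:8]" in the detection conditions as the zero-carry-in nibble sums inside those windows, and applies each correction to its own output field (+1 on S[11:8] for case 2, +1 on {Co, S[15:12]} for case 1) rather than to the whole word. The name `eca_add` is hypothetical.

```python
import random

def eca_add(a, b, correct=True):
    """Behavioral model of a 16-bit approximate add with optional correction.

    Returns a 17-bit value: carry-out in bit 16, sum in bits [15:0].
    """
    m4, m8 = 0xF, 0xFF
    low = (a & m8) + (b & m8)                    # window [7:0], exact
    mid = ((a >> 4) & m8) + ((b >> 4) & m8)      # window [11:4], carry-in taken as 0
    high = ((a >> 8) & m8) + ((b >> 8) & m8)     # window [15:8], carry-in taken as 0

    s7_0 = low & m8
    s11_8 = (mid >> 4) & m4
    co_s15_12 = high >> 4                        # 5 bits: Co and S[15:12]

    if correct:
        carry4 = ((a & m4) + (b & m4)) >> 4      # true carry into bit 4
        carry8 = low >> 8                        # true carry into bit 8
        # Case 2: a carry into bit 4 would ripple through an all-ones [7:4].
        if carry4 and (mid & m4) == 0xF:
            s11_8 = (s11_8 + 1) & m4
        # Case 1: a carry into bit 8 would ripple through an all-ones [11:8].
        if carry8 and (high & m4) == 0xF:
            co_s15_12 += 1

    return (co_s15_12 << 12) | (s11_8 << 8) | s7_0

# 0x00FF + 0x0001: the carry out of bits [3:0] ripples past bit 8.
approx = eca_add(0x00FF, 0x0001, correct=False)
fixed = eca_add(0x00FF, 0x0001, correct=True)

# Spot-check the corrected result against exact addition on random operands.
rng = random.Random(0)
all_exact = all(
    eca_add(x, y) == x + y
    for x, y in ((rng.randrange(1 << 16), rng.randrange(1 << 16)) for _ in range(20000))
)
```

In this model the uncorrected output is wrong only when a missed carry has to ripple through an all-ones nibble, which is why both the approximate and the corrected results can be made available from the same hardware.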

BACKUP: Unifying Litho/Mask, Design Rules, and Electrical Metrics

Gate Line-End Patterning
• Problem:
  • Transistor is no longer rectangular → in particular, the gate line-end suffers from tapering
  • Tapering results in large leakage increase
  • To suppress tapering, long line-end extension (LEE) and complex RET are required → increased manufacturing cost in terms of area and complexity
• Traditional line-end metrics
  • Line-end gap (LEG), line width at gate edge (LW0)
  • Have guided litho and RET for many years, but may be oblivious to tradeoff of area, cost, and variation-robustness
[Figure: LW0 and LEG; Line-End Shortening (LES); Line-End Bridging (LEB)]
• Electrical model of line-end is required

How Does Line-End Affect Current?
• LEE affects the current (Ion and Ioff) at the gate edge.
• As the area of LEE increases, current at the gate edge increases sharply, and the increase depends on Ctaper
[Figure: current profile across the gate for line-end extensions of 10, 40 and 60nm, with the other dimensions fixed (80nm, 70nm); gate over diffusion. From DaVinci]

Impact of Line-End
• LEE vs. Capacitance
  • Line-end extension increases Cg because of fringe capacitance between line-end extension and channel
• Capacitance vs. Vth
  • Cg affects Vth, following the Vth model equation $V_{th} = V_{fb} + 2\psi_B + \sqrt{4\varepsilon_{si} q N_a \psi_B}/C_{ox}$, with $C_{ox}$ replaced by $C_g$: $V_{th} = V_{fb} + 2\psi_B + \sqrt{4\varepsilon_{si} q N_a \psi_B}/C_g$
  • Cg increase → Vth decrease
  • Cg decrease → Vth increase
• Vth vs. Current
  • Ion and Ioff are functions of Vth
  • Vth increase → Ion, Ioff decrease
  • Vth decrease → Ion, Ioff increase
[Figure: Ion (uA) and Ioff (pA) vs. line-end extension (nm), 0 to 100]

Line-End Shape Evaluation
• Super-Ellipse representation for line-end: $\left|\frac{x}{a}\right|^n + \left|\frac{y-k}{b}\right|^n = 1$
• Typical line-end shapes
  [Figure: effect of small vs. large n, a, and b; three shapes relative to the nominal gate length Lnom: (a) Tapering, (b) Bulge, (c) Necking (mirrored about the minimum necking location lmin)]
• Linewidth model (tapering and bulge; necking uses the mirrored form): $l_i = 2a\left(1 - \left|\frac{h_i - k}{b}\right|^n\right)^{1/n}$

Mask + Design Rule + Performance: Unify!
• OPC cost: minEdgeLength
• Design rule: line-end length
• Electrical cost: leakage power
[Figure: leakage current vs. (minEdgeLength, LEE), four panels: Dose=0.96 and Dose=1.04 at DoF=0nm, Dose=0.96 and Dose=1.04 at DoF=100nm. x-axis: line-end length (nm) from 100 down to 30; curves for minEdgeLength = default, 100nm, 50nm, 30nm, 10nm. Annotations: OPC cost increase; design area increase.]

Analysis on Tapering: Area vs. Cost vs. Ioff
• Standard cell
  [Figure: cell layout (Poly, Diffusion, NWell, Contact) with dimensions a to g and cell height H]
• New shape metric: Superellipse $\left|\frac{x}{a}\right|^n + \left|\frac{y-k}{b}\right|^n = 1$ (large n vs. small n)
• Misalignment: 11nm
[Figure: Ioff (A), 3.00E-10 to 8.00E-10, and area reduction (%), 0 to 10, vs. LEE (nm) from 100 down to 20, for n = 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
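
A superellipse line-end model yields a line width at any height along the line end, which is what an electrical model of tapering needs. The sketch below assumes the superellipse is $|x/a|^n + |(y-k)/b|^n = 1$ and evaluates the width $2a(1 - |(h-k)/b|^n)^{1/n}$ at height $h$; this reading of the shape parameters, the function name `linewidth`, and the example numbers are assumptions for illustration.

```python
def linewidth(h, a, b, n, k=0.0):
    """Width of the superellipse |x/a|^n + |(y-k)/b|^n = 1 at height y = h."""
    t = abs((h - k) / b)
    if t >= 1.0:
        return 0.0          # beyond the tip of the line end
    return 2.0 * a * (1.0 - t ** n) ** (1.0 / n)

# Hypothetical line end: half-width a = 20nm, extension b = 50nm.
w_base = linewidth(0.0, a=20.0, b=50.0, n=2.0)        # full width at the base
w_round = linewidth(40.0, a=20.0, b=50.0, n=2.0)      # elliptical tip (n = 2)
w_square = linewidth(40.0, a=20.0, b=50.0, n=4.0)     # squarer tip (n = 4)
```

A larger exponent n keeps the line closer to its full width near the tip (less tapering), which matches the "small n / large n" contrast in the figure.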

Best OPC + Design Rule Combination
• Cell area can be reduced by 10%
• Ioff can be reduced by 29%

  LEE (nm)                              LEG (nm)
            10    20    30    40    50    60    70    80    90    100
  100       B     224   220   376   189   181   145   162   164   152
  90        B     220   212   340   185   176   146   159   162   153
  80        B     216   213   370   180   172   140   156   159   150
  70        B     209   207   330   175   168   140   151   156   148
  60        B     204   200   353   172   164   136   147   153   145
  50        B,S   S     S     310   162   160   132   142   148   142
  40        B,S   S     S     347   161   161   108   138   144   139
  30        B,S   S     S     342   S     S     S     135   S     S
  20        B,S   S     S     S     S     S     S     S     S     S
  10        B,S   S     S     S     S     S     S     S     S     S

  Bridging: B   Line-end shortening: S   Broken: F

BACKUP: Emerging Technologies

SADP Processes
• Self-aligned double patterning (SADP)
  • Spacer is Dielectric (SID)
  • Spacer is Metal (SIM)
  • SID uses simpler processes
[Figure: process steps: 1st litho-etch (mandrel on oxide), spacer formation, oxide etch & Cu filling; block mask (resist), oxide etch, dielectric / metal (to be filled); cut mask, spacer cut, ash, tone reversal, oxide etch. Figure courtesy Y. Ma et al., GlobalFoundries, SPIE 2010]

CD Variability in SID-SADP
• Variation sources (values from ITRS LITH5B)
  • Mandrel width variation σM: 2.0nm @ 2012
  • Block mask width variation σB: 2.0nm @ 2012
  • Spacer thickness variation σS: 1.3nm @ 2012
  • Mandrel-to-block overlay σM-B: 8nm @ 2012

  Feature                               Variance (σ²)                      σ (nm)
  Metal CD by mandrel                   σM²                                2.0
  Metal CD by gap                       σM² + (2σS)²                       3.3
  Metal CD by wide mandrel + block      (0.5σM)² + σM-B² + (0.5σB)²        8.1
  Metal CD by wide gap + block          σM² + σS² + σM-B² + (0.5σB)²       8.4

Variability Vs. Mask Assignments
• Block mask patterns that define line edge should be avoided
  [Figure: target metal pattern vs. final pattern with overlay error. Line edges defined by only spacers: no CD variation. Block mask (yellow) defines line edges: CD variation.]
• Patterns by mandrel vs. patterns by gap
  • There is still a bimodality between A and B!
  • Important signal nets should be generated by mandrel
  [Figure: Coloring1: A < B; Coloring2: A > B]

Extreme Ultraviolet Lithography (EUVL)
• What is EUV? ArF (193nm) → EUV (13.5nm)
  • A form of soft X-rays
  • Much shorter wavelength and higher energy
  [Figure: spectrum from IR (1000nm, 1eV) through UV, VUV, EUV and soft X-ray to hard X-ray (10KeV), marking ArF (193nm) and EUV (13.5nm, 92eV). Figure courtesy G. Vandenberghe, IMEC, 2008]
• EUVL requires a new set of systems
  • Multilayer reflective mask, new illumination optics, high power source, projection optics, resist
  [Figure courtesy G. Vandentop, Intel, SPIE 2009]
• Benefits of EUV
  • Sharp and less corner rounding
  • Large process window → no complicated OPC → reduced data volume / mask write time
  [Figure: Bossung curves, CD (nm) vs. focus (nm), for EUV (28nm DR) vs. ArFi (40nm DR). Figure courtesy H. Meiling, ASML, SPIE 2009]

Research Directions for EUVL
• Development of high power EUV sources
• New resist materials to balance resolution, sensitivity and LER
• Defect-free mask generation
  • Buried blank defects in multilayer → CD errors
  • Defect density target: 1 defect/cm² today → ~0.003 defects/cm² for logic
  [Figure courtesy T. Terasawa et al., Selete, 2009]
• New inspection/metrology sensitive to small volume defects
  • Biggest gap exists between reality and requirement
• Design issues
  • Design methodology with reduced guardband
  • Fast simulator for impact of buried defects
  • Defect-aware reticle floorplan

Why 3-D Integration?
[Figure: benefits of 3-D stacking: small footprint; multiple small dies and selective assembly (improved yield, reduced cost); long interconnect replaced by short interconnect (reduced interconnect, reduced capacitance, lower power, higher speed, higher bandwidth); annotated savings of ~50% and ~40%]
• Off-chip package (CPU, MEM): BW < 100GB/s
• 2D planar MCP (CPU, MEM): BW = 100-200GB/s
• 3-D stacked die MCP (MEM on CPU): BW > 1TB/s
• Higher security (against reverse engineering)
• Enabling heterogeneous integration (different technology/foundry/devices/…)

Research Directions in 3-D Integration
• Challenges
  • Design
    • Heat dissipation / power distribution / chip-package system co-design / 3-D design partitioning / fast simulation and verification / standardization
  • Material
    • Stress- and temperature-resistant material / superior thermal-electrical characteristics / lower elasticity
  • Process
    • Wafer thinning / stress relief / wafer-to-wafer alignment and bonding / TSV generation
  • Test
    • Known-Good-Die (KGD) assurance / accessibility to individual wafer/die/TSV
[Figure: microchannel cooling. Figures courtesy A. Sridhar et al., IBM, 2010]

BACKUP: Compressed Sensing

DCT-Based Compressed Sensing
• Li et al., ICCAD-2009, “Virtual Probe (VP)”
• A variation map g(x, y) can be restored when its DCT coefficients G(u, v) are known, using the inverse DCT, written as the linear system $A \cdot \eta = B$:

$$
\begin{bmatrix}
A_{1,1,1} & A_{1,1,2} & \cdots & A_{1,P,Q} \\
A_{2,1,1} & A_{2,1,2} & \cdots & A_{2,P,Q} \\
\vdots & \vdots & & \vdots \\
A_{PQ,1,1} & A_{PQ,1,2} & \cdots & A_{PQ,P,Q}
\end{bmatrix}
\begin{bmatrix} G(1,1) \\ G(1,2) \\ \vdots \\ G(P,Q) \end{bmatrix}
=
\begin{bmatrix} g(1,1) \\ g(1,2) \\ \vdots \\ g(P,Q) \end{bmatrix},
\qquad
A_{k,u,v} = \alpha_u \beta_v \cos\frac{\pi(2x-1)(u-1)}{2P} \cos\frac{\pi(2y-1)(v-1)}{2Q}
$$

  where row k corresponds to the grid point (x, y).
[Figure: example DCT coefficient matrix in which only a few low-frequency entries are nonzero (e.g., -41.25, 20.16, -3.92, 2.17, -0.93, 0.73, -0.37, 0.32, -0.16, 0.13, -0.05) and all other entries are 0.00]
• Sparsity of DCT coefficients enables efficient restoration from a small number of samples → minimize the L1-norm of the DCT coefficients:

$$
\begin{aligned}
\text{minimize:}\quad & \theta_1 + \theta_2 + \cdots + \theta_{PQ} \\
\text{subject to:}\quad & A \cdot \eta = B \\
& -\theta_i \le \eta_i \le \theta_i \quad (i = 1, 2, \ldots, PQ)
\end{aligned}
$$
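
The restoration step (inverse DCT from a sparse set of coefficients) can be sketched as follows. This is an illustration only: it assumes the standard orthonormal DCT scaling ($\alpha_1 = \sqrt{1/P}$, $\alpha_u = \sqrt{2/P}$ for $u > 1$, and likewise for $\beta_v$ with Q), uses 1-based indices as in the formula for $A_{k,u,v}$, and does not solve the L1-minimization problem itself, which needs a linear-programming solver. The function names and the coefficient values are hypothetical.

```python
from math import cos, pi, sqrt

def basis(x, y, u, v, P, Q):
    """Entry A_{k,u,v} for grid point (x, y); all indices are 1-based."""
    au = sqrt((1 if u == 1 else 2) / P)   # assumed orthonormal scaling
    bv = sqrt((1 if v == 1 else 2) / Q)
    return (au * bv
            * cos(pi * (2 * x - 1) * (u - 1) / (2 * P))
            * cos(pi * (2 * y - 1) * (v - 1) / (2 * Q)))

def restore(G, P, Q):
    """Inverse DCT: sparse coefficients {(u, v): value} -> full P x Q map g."""
    return [[sum(c * basis(x, y, u, v, P, Q) for (u, v), c in G.items())
             for y in range(1, Q + 1)]
            for x in range(1, P + 1)]

def dct(g, P, Q):
    """Forward DCT of a full map, used here only to check the round trip."""
    return {(u, v): sum(g[x - 1][y - 1] * basis(x, y, u, v, P, Q)
                        for x in range(1, P + 1) for y in range(1, Q + 1))
            for u in range(1, P + 1) for v in range(1, Q + 1)}

# Hypothetical sparse spectrum on an 8 x 8 grid: three nonzero coefficients.
P = Q = 8
G_sparse = {(1, 1): -41.25, (1, 2): 20.16, (2, 1): 20.16}
g_map = restore(G_sparse, P, Q)
G_back = dct(g_map, P, Q)
max_err = max(abs(G_back[key] - G_sparse.get(key, 0.0)) for key in G_back)
```

Because only three of the 64 coefficients are nonzero here, far fewer than 64 measurements of g would in principle be enough to pin them down, which is the property the L1 formulation exploits.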