Demystifying TSV-Based 3D Stacked ICs Design
Paul Franzon
North Carolina State University, Raleigh, NC
paulf@ncsu.edu, 919.515.7351
© 2012 Paul D. Franzon

Outline
- 3DIC and Through Silicon Via (TSV) technology
- System Drivers for 3DIC
- 3D specific design
- 3D CAD Flow
- Open issues in 3DIC

3DIC with internal connections but NO TSV
- Face to face bonding
- 10 - 50 µm pitch, 30 µm typical

3DIC Without TSV: AC Coupled Interconnect
- Capacitive: face-to-face
- Inductive: face-to-face or face-to-back
- Typical: 30 µm pitch
- A few microns of slip
- Power consumption, best to date:
  - Capacitive: 2 pJ/bit
  - Inductive: 0.14 pJ/bit
- Efficient power delivery difficult
- For capacitive, oxides must be in contact
- Buried bump for DC (could be used in RDL)
(Figure labels: Heat, Power)

3DIC with TSVs
- Technology set (figure labels: Underfill, Wafer Thinning)

Simplified Process Flow
1. Etch TSV holes in substrate
   - Max. aspect ratio 10:1
   - Hole depth < 10x hole radius
2. Passivate side walls to isolate from bulk
3. Fill TSV with metal
   - Copper plating, or
   - Tungsten filling
4. Often the wafer is then attached to a carrier or another wafer before thinning
5. Back side grinding and etching to expose bottom of metal filled holes (wafer thinning)
6. Formation of backside microbumps
7. Wafer bonding and (sometimes) underfill distribution
Result: TSV enabled 3D stack

Transistor/TSV Integration Options
- Via-First/Via-Middle
- Via-Last
- Face-to-Face
- Face-to-Back
- Back-to-Back
Each option offers its own unique blend of:
- Cost
- Via Density
- Routing Congestion
- Heat Dissipation
(Figure label: SOI)

Coarse pitch TSV
- Pitch: 25 µm to 250 µm
- Advantages
  - Reduces need for wafer thinning
  - Established production route because of cell phone cameras (image: Samsung)
- Disadvantages
  - Limits architectural solutions
  - Really Advanced Packaging, not advanced integration
- Numerous vendors, captive and external

High Density TSV
- Pitch: 0.5 µm to 10 µm
- Advantages
  - Permits architectural optimization (image: MIT LL)
- Disadvantages
  - Adds processing cost
  - Adds complexity in design and test
  - Limited supply chain
- High Density TSVs create unique opportunities for 3D specific systems (image: Tezzaron)

3-Tier 3DIC Cross-Section
- Second DARPA Multiproject Run (3DM2)
- Two digital and one RF 180-nm 1.5 V FDSOI CMOS tiers
- Cvia ~ 0.4 fF
- 3DM2 process highlights:
  - 11 metal interconnect levels
  - 1.75-µm 3D via tier interconnect
  - Stacked 3D vias allowed
  - Tier-2 back-metal/back-via process
  - 2-µm-thick RF back metal
  - Tier-3 W gate shunt
  - Tier-3 silicide block
(Cross-section figure, MIT Lincoln Labs. Labels: Transistor Layers, RF Back Metal, 3D Via, Tier-3, Oxide Bond Interface, Tier-2, Tier-1, Tier-1 Transistor Layer; scale bar 20 µm)

Outline SOI Process Assembly
- Greatly simplifies TSV formation
1. Oxide-Oxide Bond
2. Silicon Etch
3. Via Formation
4. Repeat
(Figure labels: Oxide, Epi, Metal, Buried Oxide, Transistors, Bulk Silicon)

Tezzaron 3D Technology: 0.13 µm Bulk CMOS
- Cvia ~ 4 fF; Cvia ~ 25 fF
- 6.6 µm pitch; 25 µm pitch
(Images: Tezzaron)

“Commercial” TSV Options
- Tezzaron: down to 1.2 µm features, Tungsten
- IBM
- IMEC: ~40 µm pitch, Copper
- CETI/LEA (ST-Micro and others): 10 µm pitch Cu
- Tezzaron, TSMC, Samsung, Elpida, IMEC
- Packaging (“Via Last”)
  - AllVia
  - Amkor, ASE, ?
(Image: AllVia, 30 µm)

Chip to Wafer (C2W) vs. Wafer to Wafer (W2W)

Wafer to Wafer (W2W)
(Flow figure: Wafer 1, Thin, Mount, Wafer 2, Wafer 3, Thin, Mount, Bump)
- Advantages
  - Simpler
  - Lower cost
  - Higher via density
  - Better alignment
  - Thinner chips
- Disadvantages
  - Identical sized chips
  - Accumulated yield loss
- Accumulated yield:
  1 tier    90%
  2 tiers   81%
  3 tiers   73%
  4 tiers   65%

(Thick) Chip to Wafer (C2W)
- ONE chip to wafer (or wafer stack), face mounted
(Flow figure: Test Wafer 1, Dice, Test Wafer 2)
- Advantages
  - Known Good Die: no accumulated yield loss
  - Different die sizes
- Disadvantages
  - Higher cost: serial pick and place
  - Worse alignment
  - Wafer die size largest
  - Solder bump requires coarse TSVs in one layer
  - Limited stacking

(Thin) Chip to Wafer (C2W)
- Multiple chips to wafer (or wafer stack)
(Flow figure: Mount Wafer 1, Thin, Temporary Carrier, Test, Dice, Attach/Demount, Wafer 2, Wafer 3, Test)
- Advantages
  - Known Good Die in multiple chips
  - Thin TSV: little area loss to connections to solder bumps
- Disadvantages
  - Highest cost: temporary carrier
  - Still in research

Interposers and RDLs
- Redistribution layer = thick metal (layers) added to wafers to customize the interface to the next chip in the 3D stack
- Interposer = silicon or other carrier used to mount chips WITHIN the package
- Examples:
  - 1-2 µm thick metal; 50 - 200 µm
  - Modified legacy process
  - Modified 65 nm or 90 nm Back End of Line process

Possible Assemblies
- In a commercial product integrated amongst multiple vendors, the IO within the chip stack will not line up without standards
- Memory/ASIC configurations shown:
  - Face to Face
  - Chip on Substrate on Chip
  - TSV through ASIC with backside Redistribution Layer (RDL)
  - TSV through Memory with backside Redistribution Layer (RDL)

Substrate Alternatives
- Side-by-side mounting
  - Silicon interposer or thin film Multi-chip Module (no TSVs)
  - Face-up silicon interposer (TSV enabled)
- Top-to-bottom mounting
  - Conventional interposer, e.g. high density laminate
  - TSV enabled silicon interposer
(Figure labels: RAM, ASIC, Memory)
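The accumulated yield figures quoted for wafer-to-wafer stacking (90%, 81%, 73%, 65% for one to four tiers) are consistent with simply multiplying a 90% per-tier yield. A minimal sketch of that arithmetic follows; it assumes every tier has the same yield and that tier failures are independent, and the function name `stack_yield` is hypothetical, not something taken from the slides.

```python
def stack_yield(per_tier_yield: float, tiers: int) -> float:
    """Compound yield of a W2W stack.

    Assumption (not stated in the slides): every tier has the same yield
    and failures are independent, so the stack yield is the product of
    the per-tier yields.
    """
    if not 0.0 <= per_tier_yield <= 1.0:
        raise ValueError("per_tier_yield must be between 0 and 1")
    if tiers < 1:
        raise ValueError("tiers must be at least 1")
    return per_tier_yield ** tiers


if __name__ == "__main__":
    # 90% per tier is the value implied by the one-tier entry on the slide.
    for tiers in range(1, 5):
        print(f"{tiers} tier(s): {stack_yield(0.90, tiers):.1%}")
```

Under these assumptions the three- and four-tier results come out at about 72.9% and 65.6%, close to the slide's rounded 73% and 65%. Chip-to-wafer assembly avoids this compounding because only known good die are stacked.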
Impact of Substrate Alternatives

Side-by-side Silicon, no TSVs
- Advantages
  - Readily available
- Disadvantages
  - No power advantage
  - Routability limitations
  - Limited # of power/ground planes
  - Limited # of external pins and pin routing

Side-by-side Silicon with TSVs
- Advantages
  - Some availability
  - Can support high external pin count
- Disadvantages
  - No power advantage
  - Routability limitations
  - Limited # of power/ground planes

Top-to-bottom laminate
- Advantages
  - High availability
  - Can support multiple power/ground planes
  - Reduced IO power
- Disadvantages
  - Must use SerDes channels (higher interface clock rate)

Top-to-bottom Silicon with TSVs
- Advantages
  - Lowest IO power
- Disadvantages
  - Limited availability
  - Limited # of power/ground planes

Relative Manufacturing Costs
(Chart categories:)
- ASIC Chip
- DRAM Chip
- DRAM KGD Test/chip
- ASIC KGD Test/chip
- Assembled stack test
- W2W 3D steps / chip
- C2W 3D steps / chip
- Interposer / chip stack
- 2,000 pin package ($10)

Outline
- 3DIC and Through Silicon Via (TSV) technology
- System Drivers for 3DIC
  - Load Reduced Memory
  - Power Reduction
  - Optimized Technology Mix
  - Miniaturization
  - Yield improvement
  - Packaging cost reduction
  - High capacity memory systems
- 3D specific design
- 3D CAD Flow
- Open issues in 3DIC

Future 3DIC Product Space
(Roadmap figure plotted against time: Image sensor, 3D Mobile, Sensor Node, Heterogeneous, Server Memory, 3D Processor, Interposer, “Extreme” 3D Integration)

Load Reduced Memory
- Stacked memories to reduce capacitive load in servers
(Images: Samsung (ISSCC 10), IBM)

Dark Silicon
- Performance per unit power
- Systems are increasingly limited by power consumption, not number of transistors
- “Dark Silicon”: most of the chip will be OFF to meet thermal limits

Energy Efficient Memory Bandwidth
- Many applications are headed to requiring 1 TBps of memory bandwidth, e.g. multicore CPUs
(Image: Intel)
Bandwidth is going to be expensive
- Computers, games, networking
- TBps interfaces to main memory
- 3,000 pin package: $50 - $80
- 1 TBps: 400 pairs @ 20 Gbps
  - 240 W @ 30 mW/Gbps
  - 16 W @ 2 mW/Gbps
- 1 TBps in DDR3: 600 W
(Figure: DIMM modules and multi-core processor, courtesy of Intel)

Energy per Operation
(Various sources; 64 bit words)
  DDR3                               4.8 nJ/word
  Optimized DRAM core                128 pJ/word
  MIPS 64 core                       400 pJ/cycle
  11 nm 0.4 V core                   200 pJ/op
  45 nm 0.8 V FPU                    38 pJ/op
  SERDES I/O                         1.9 nJ/word
  20 mV I/O                          128 pJ/word
  LPDDR2                             512 pJ/word
  1 cm / high-loss interposer        300 pJ/word
  0.4 V / low-loss interposer        45 pJ/word
  On-chip/mm                         7 pJ/word
  TSV I/O (ESD)                      7 pJ/word
  TSV I/O (no ESD)                   2 pJ/word

Wire Power Reduction
- Exemplar power distribution (figure: FFT Processor)

Shorter wires: Modest Returns
- Relying on wire-length reduction alone is not enough
- 2D design, 0.13 µm
- Cell placement split across 6.6 µm face-to-face bump structure
- Results get less compelling with technology scaling, as the microbumps don’t scale

Memory on Logic
- Conventional
  - x32 to x128
  - 3.2 GHz @ >10 pJ/bit
- TSV enabled
  - N x 128 “wide I/O”
  - Less overhead
  - Flexible bank access
  - Less interface power
  - 1 GHz @ 0.3 pJ/bit
- Processor & SRAM
  - Flexible architecture
  - Short wires
  - Exploit dense face-to-face
(Figure labels: Nvidia, Mobile)

Mobile Graphics
- Problem: want more graphics capacity but total power is constrained
- Solution: trade power in the memory interface for power to spend on computation
- POP with LPDDR2: GPU at 532 M triangles/s
- TSV enabled (TSV IO LPDDR2): GPU at 695 M triangles/s
(Power consumption charts, Won Ha Choi)

Architectural Power Optimization
Re-architect the system to:
- Leverage low-power vertical wires
- Move data to save power when the opportunity presents
- Replace interface power with compute power
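The interface power figures above (240 W at 30 mW/Gbps and 16 W at 2 mW/Gbps for 1 TBps, carried on 400 pairs at 20 Gbps) follow from unit conversion alone, since 1 mW/Gbps equals 1 pJ/bit. The sketch below reproduces that arithmetic. It assumes decimal units (1 TBps = 8,000 Gbps) and the 64-bit word used in the energy-per-operation table; all function names are hypothetical.

```python
import math

BITS_PER_BYTE = 8
WORD_BITS = 64  # the energy-per-operation table is quoted per 64-bit word


def bandwidth_gbps(bandwidth_tbytes_per_s: float) -> float:
    """Convert TB/s to Gb/s, assuming decimal prefixes (1 TB = 1000 GB)."""
    return bandwidth_tbytes_per_s * 1000 * BITS_PER_BYTE


def interface_power_watts(bandwidth_tbytes_per_s: float, mw_per_gbps: float) -> float:
    """Interface power for a given bandwidth and I/O efficiency.

    mW/Gbps is numerically identical to pJ/bit, so either unit can be passed.
    """
    return bandwidth_gbps(bandwidth_tbytes_per_s) * mw_per_gbps / 1000.0


def lanes_needed(bandwidth_tbytes_per_s: float, lane_rate_gbps: float) -> int:
    """Number of signal pairs needed at a given per-pair data rate."""
    return math.ceil(bandwidth_gbps(bandwidth_tbytes_per_s) / lane_rate_gbps)


def energy_per_word_pj(pj_per_bit: float) -> float:
    """Energy to move one 64-bit word across the interface."""
    return pj_per_bit * WORD_BITS


if __name__ == "__main__":
    print(lanes_needed(1, 20), "pairs at 20 Gbps")
    print(interface_power_watts(1, 30), "W at 30 mW/Gbps")
    print(interface_power_watts(1, 2), "W at 2 mW/Gbps")
    print(energy_per_word_pj(30), "pJ/word at 30 pJ/bit")
    print(energy_per_word_pj(2), "pJ/word at 2 pJ/bit")
```

The same conversion links the two slides: 30 pJ/bit is about 1.9 nJ per 64-bit word (the SERDES I/O entry) and 2 pJ/bit is 128 pJ per word (the 20 mV I/O entry).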
Optimized Technology Mix
- Heterogeneous integration: older analog process and advanced digital process
  - Analog designs do not “shrink” well while synthesized digital designs do
  - Analog redesign in an advanced node is expensive, time-consuming and often brings little advantage
- Other possible examples:
  - III-V + silicon for power, opto-electronics, RF transmitters
  - Dedicated “passives” layer
    - Planar RF and inductive components
    - Decoupling capacitors for power noise reduction
    - IBM simulated a reduction in power noise from 300 mV to 20 mV by mating the chip to a layer of trench capacitors

3D Miniaturization
- Cell phone cameras
  - Height reduction through TSVs
- Miniature sensors
  - mm³ scale: implantable
  - cm³ scale: food safety & agriculture
- Real problem is power delivery and storage

3D Miniaturization
- Other applications: biomedical, industrial
- Main limitations are process flow and manufacturing related
(Stack figure: RF harvester/sensor + antenna; MEMS; low-power mixed signal ASIC; low power nonvolatile memory; secondary battery/ultracapacitor)

Yield Improvement
- Example: Xilinx multi-chip Virtex 7

Package Price Improvement
(Chart categories:)
- ASIC Chip
- DRAM Chip
- DRAM KGD Test/chip
- ASIC KGD Test/chip
- Assembled stack test
- W2W 3D steps / chip
- C2W 3D steps / chip
- Interposer / chip stack
- 2,000 pin package ($10)
(Chart annotations: 0.02c / pin*; 0.5c / pin)
* Actually cost largely independent of # pins

Outline
- 3DIC and Through Silicon Via (TSV) technology
- System Drivers for 3DIC
- 3D specific designs
  - Specific details: ESD protection and TSV parasitics
  - 3D specific logic design examples
    - 3D-optimized FFT processor
    - 3D-optimized radar processor
- 3D CAD Flow
- Open issues in 3DIC
ElectroStatic Discharge (ESD) Protection
- There are NO published definitive studies as to what level of ESD protection is needed
- Current “working” assumptions:
  - 3D integration through interposer
    - Need full ESD protection (~1 pF)
    - Can distribute amongst tiers
  - 3D integration through stacking in separate fabs
    - Need machine model ESD protection only (~250 fF)
  - 3D integration within fab
    - Fab can specify (Tezzaron: antenna diode)

IMEC TSV Parasitics
- Plas et al., ISSCC 2010
- 40 fF
- On-chip interconnect: ~70 - 300 fF/mm

Interposer and “Coarse” TSVs
- Single TSV equivalent model

Interposer and “Coarse” TSVs
- Frequency dependent losses

Interposer and “Coarse” TSVs
- Crosstalk models needed

Power - Signaling rate trade-off
(Plot: power (W), 0 to 7, versus signal swing V, 0 to 0.9, at 512 GBps total bandwidth. Curves: 1 Gbps at 3.4 pF; 8 Gbps at 3.4 pF; 8 Gbps at 2 pF; 1 Gbps at 2 pF. Courtesy of John Wilson, Rambus)
- More efficient to operate at 1 V
- 0.2 mW/Gbps (0.2 pJ/bit)

Synthetic Aperture Radar Processor
- Built FFT in Lincoln Labs 3D process

  Metric                   Undivided   Divided   Change (%)
  Bandwidth (GBps)         13.4        128.4     +854.9
  Energy per write (pJ)    14.48       6.142     -57.6
  Energy per read (pJ)     68.205      26.718    -60.8
  Memory pins (#)          150         2272      +1414.7
  Total area (mm²)         23.4        26.7      +16.8

3D FFT Floorplan
- All communication is vertical
- Supports multiple small memories WITHOUT an interconnect penalty
- AND gives 60% memory power savings
(Thor Thorolfsson)

2DIC vs. 3DIC Implementation

  Metric                   2D       3D       Change
  Total area (mm²)         31.36    23.4     -25.3%
  Total wire length (m)    19.107   8.238    -56.9%
  Max speed (MHz)          63.7     79.4     +24.6%
  Power @ 63.7 MHz (mW)    340.0    324.9    -4.4%
  FFT logic energy (µJ)    3.552    3.366    -5.2%
(Thor Thorolfsson)

RePartition FFT to Exploit Locality
- Every partition is a PE
- Every unique intersection is a memory

Thermal Evaluation
(Thor Thorolfsson, Samson Melamed, Rhett Davis, Gradient Inc.)

Tezzaron SAR Processor

  Metric                     2D      3D      Change    mPl 3D   Change
  Total wire length (mm)     588     487.3   -1.17%    464.8    -21%
  Max. frequency (MHz)       31.6    33.84   +7.1%     38.74    +22.6%
  Max performance (MFlops)   316.1   338.4   +7.1%     387.4    +22.6%
  Parasitic power (mW)       1.51    1.79    -15.5%    0.984    -45.2%
  Logic power (mW)           5.975   5.692   -4.8%     5.21     -12.9%
  Memory power (W)           10.3    3.1     -71%
(Thor Thorolfsson)

Tezzaron “Dis-integrated RAM”
- Mixed technology concept
  - DRAM arrays in low-leakage DRAM technology (at node N)
  - Peripheral circuits in high-performance logic process (at node N-1)
  - Bit and word lines fed vertically at array edge
  - No repair or test prior to assembly
  - BIST and CAM based remapping in logic layer
- Configuration
  - 90 nm DRAM on 130 nm logic
  - 8 x 128-bit ports
  - Density: 1 Gb/layer of DRAM
  - Burst access in page/port: 1 Gword/s (128 Gbps)
- Claimed results
  - Reduced overall cost/bit
  - Two metals only in DRAM tiers
  - Effective ~60-70% fill factor (?)
  - Faster timing on interfaces

High Capacity Memory Systems
- ~8 - 16 GB memory @ 1 TBps
- Many applications and advantages in computing, graphics, networking
- Challenges: cost; yield; test; repair
(Figure label: Interposer)

Outline
- 3DIC and Through Silicon Via (TSV) technology
- System Drivers for 3DIC
- 3D specific design
- 3D CAD Flow
- Open issues in 3DIC

CAD Flow
(Flow diagram, stages in order:)
- SystemC Models, e.g. NoC performance, power evaluator
- Architectural Evaluator
- SRAM evaluation
- Trial Designs
- Chip/package codesign, e.g. interconnect, thermal
- Partitioning
- TSV placement
- Improved Estimators
- Floorplanner
- True 3D Floorplanner
- Routability Evaluation
(Legend: NCSU tools, Commercial Tools, Desirable Tools)
(Floorplan figure: dual-CPU OpenRISC design with CPU 1 and CPU 2, IMMU 1/2, DMMU 1/2, Wishbone Traffic Cop, OpenRISC instruction and data Wishbone interface modules, instruction and data SRAMs with SRAM address pins, data memory controller, power rings and power stripe; dimensions marked 2520 µm and 2524 µm)

CAD Flow (continued)
(Flow diagram, stages in order:)
- Individual Tier Place and Route
- 3D DFM
- Chip Reassembly & Layout Verification
- Thermal Extraction
- LVS, DRC and Performance Verification
- Thermal Verification
- True 3D Design Kit: wafer thickness, TSV diameter, transistor keepout, TSV bond pitch
(Legend: NCSU tools, Commercial Tools, Desirable Tools)

Partitioning Alternatives
- Fix TSVs in one layer and propagate (Thorolfsson)
- Average TSVs across tiers (Schoenfloss, Davis)

Medium-Resolution Thermal Simulation
- Collaboration with Gradient
- Full-fidelity material model (including all vias, wires, etc.)
- Heat sources placed over individual channels
- Hot spots = clock buffers (local heating in SOI)
- 2.9 - 23.5 µm elements
- 8.5G RAM, 150 minute runtime
(Figure panels: Tier A, Tier B, Tier C)

Other CAD/Design Issues
- Potential for logic-on-logic stacking
  - Issues:
    - Increased heat density in high performance logic
    - Complexity of clock distribution
    - Limited via density
- 3D Place and Route
  - “True” 3D placement can be used to improve a multi-tier logic design and is relatively easy to implement
  - While 3D routing has been demonstrated, a fully compatible 3D router requires a lot of work
- Thermo-mechanical issues
  - Stress, especially package induced stress
- Power and signal integrity
  - TSV current ~1 - 10 mA
  - Particular difficulty designing “feed through” vias

Design For Stress with Cu TSVs
- Issues:
  - Stress gradient around Cu TSVs
    - Changes Vt of strained transistors
  - Package-chip interaction stress
    - Esp. with low-k dielectrics
- Solutions:
  - Keep out zones
  - Choose nearby circuits carefully
  - DFM Vt calculator
  - Use tungsten
- Example: DRAM sense amplifiers

Outline
- 3DIC and Through Silicon Via (TSV) technology
- System Drivers for 3DIC
- 3D specific design
- 3D CAD Flow
- Open issues in 3DIC

Challenges

Cost
- Problem: TSV processing adds ~10% to chip fab cost
- Solution(s): focus on adding value, or recover cost from simpler packaging

Test
- Problem: cannot probe ~10,000 microbumps cost-effectively
- Solution(s): use of BIST and a dedicated probe test port; good partitioning

Thermal
- Problem: hot spots; memory leakage
- Solution(s): better integration of tools; better thermal isolation and conduction layers

Codesign
- Problem: managing performance, thermal, power and signal integrity simultaneously
- Solution(s): ESL (SystemC) centric early design flow (“pathfinding”)

Challenge #1: Cost Issues
- Supply chain:
  - Who does what steps?
  - Who owns failed portions?
- Mobile market very cost sensitive
- TSVs, etc. increase wafer cost 5% - 10%
- Additional test steps increase assembly cost
Challenge #2: Test, Repair & Validation
Problems:
- Added test steps increase cost of test
- TSV/microbump test: probe, circuit-based test or ignore?
  - Probe = expensive probe card, microbump damage & ESD needed
  - Circuit-based test = extra area, test time
  - Ignore = test escape
- TSV redundancy
  - Not clear if needed
- Validation
  - How to debug in the middle of a chip stack?
(Figure: TSV circuit for self-test)

Challenge #3: Thermal and Power Delivery
- Power delivery to top chips, and heat removal through bottom chips, is THROUGH the chip stack. Requires:
  - Codesign of chips and package
  - CAD interchange when the chips come from different foundries
- Want DRAM temperature to be 85 C (for leakage control) while an HP processor might run at 105 C: how to prevent coheating?
(Figure labels: Heat Out (Watts), I/O (Gbps), Power In (Amps); Gerousis, Cadence)

Challenge #4: Codesign and CAD
- Chip-package codesign to manage power, thermal and stress, and to co-optimize floorplans of chips in the stack
- Interchange formats for codesign across corporate boundaries
- DFM and ESD rules
- DFT to minimize added test cost

SystemC Methodology for Pathfinding
(Block diagram, Won Ha Choi:)
- User configuration: ISA mode; arch. setup, execution control
- Power Library: unit energy, TSV constants
- ISA Library: commands; reference for control
- Power Model (TLM)
  1. Logic: command processor, streamer, etc.
  2. Memory
  3. Power Tracker
- Power Manager
  1. Stores power information
  2. Generates pre-placement related information
  3. Updates virtual coordinates and TSV constants
- Feedback paths: simulation results; update virtual coordinates; update TSV constants

Thermal and Physical Flow
(Flow diagram, Shivam Priyadarshi:)
- Inputs: comprehensive technology file; composite technology file; resolution of simulation (grid size); textual floor plan; power
- WireX: Thermal Extractor
- Thermal MNAM; power vector; hotspot only
- PETSc: Sparse Matrix Solver, producing the static thermal profile
- Transient simulator, e.g. HSPICE/fREEDA, producing the transient thermal profile

CAD Interchange Standards
- 3DIC requires more “intimate” chip to chip codesign than in 2D
  - Often in different tools from different vendors
- Current working group (led by Sematech and SRC) is trying to determine CAD interchange standards
  - Thermal
  - Electrical
  - Physical
- (Some more on this in my talk during the conference)

Conclusions
- 3DIC with TSVs is being vigorously pursued as a savior from the cost of lithography at 22 nm and below
- In the short term, coarse TSVs are used to improve “packaging” performance and size
- In the mid to long term, high density TSVs present a significant opportunity to improve performance and power
  - But the system needs to be re-architected to exploit dense TSV arrays
  - Memory needs to be repartitioned
- BUT issues include:
  - Cost recovery through clever design and process learning
  - Complexity of codesign
  - Test management

Acknowledgements
- Faculty: William Rhett Davis, Michael B. Steer
- Professionals: Steven Lipa, Neil DiSpigna
- Students: Hua Hao, Samson Melamed, Peter Gadfort, Akalu Lentiro, Shivam Priyadarshi, Christopher Mineo, Julie Oh, Won Ha Choi, Zhou Yang, Ambirish Sule, Gary Charles, Thor Thorolfsson
- Department of Electrical and Computer Engineering, NC State University