System and Circuit Level Power Modeling of Energy-Efficient 3D-Stacked Wide I/O DRAMs
Karthik Chandrasekar (TU Delft), Christian Weis$, Benny Akesson*, Norbert Wehn$ & Kees Goossens#

Overview
• Motivation for 3D-stacking of DRAMs
• Problem Statement - Power Modeling
• Circuit-level DRAM architecture & power model
• System-level DRAM power model (DRAMPower)
• Comparison: Results and Analysis
• Summary

19-Mar-13 Karthik Chandrasekar / TU Delft

Motivation: Why 3D-Stacked DRAMs?
State of the art: Mobile LPDDR/2/3
– Off-Chip Interconnects (on PCB)
– PoP (Package-on-Package) (μ bumps)
– Capacitance: 8 to 20 pF (-50% Power)
– 1 or 2 Channels (x32) (Low Bandwidth)
3D-Stacked Wide IO
– TSV (Through Silicon Via) - Many Dies
– Capacitance: ~2 pF (-85% Power)
– 4 Channels (x128) (High Bandwidth)
[I/O power per bit: 0.7 mW in TSV vs 2.3 mW in PoP vs 4.6 mW in Off-Chip – Samsung]

The Performance Vs. Power Factor
[Chart: Peak Bandwidth (GBps) and Power (mW/GBps) for SC LPDDR2 x32 (400), DC LPDDR2 x32 (533), DC LPDDR3 x32 (800) and QC Wide IO x128 (200)]
Images & Data Courtesy: HMC, JEDEC 42.6, FineTech, Nvidia, Samsung

What’s missing? [Problem Statement]
An accurate 3D-DRAM Power Model to design DRAM-stacked SoCs

Approaches to power modeling
• Circuit-level Power Model
– Models the DRAM architecture at the circuit level in SPICE
– Pros: Accurate and detailed
– Cons: Slow; requires circuit-level understanding of the DRAM architecture; technology specifications for DRAMs are not publicly available
• System-level Power Model (like Micron’s)
– Based on vendor-provided datasheet measures and JEDEC specifications
– Pros: Fast, easy to integrate & employs simple models for memory operations
– Cons: Accuracy is unclear. Not directly applicable to 3D-DRAMs and not verified against circuit-level models or hardware measurements.
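To make "simple models for memory operations" concrete, here is a minimal sketch of a datasheet-based estimate in the style of Micron's published power calculators: the read-burst current minus the active-standby background current, multiplied by the supply voltage. The function name and every current and voltage value below are hypothetical placeholders, not figures from any datasheet or from this talk.

```python
def read_power_mw(idd4r_ma: float, idd3n_ma: float, vdd_v: float) -> float:
    """Incremental read power in mW: (IDD4R - IDD3N) * VDD.

    IDD4R is the datasheet current during continuous read bursts and
    IDD3N the active-standby (background) current, both in mA.
    """
    if idd4r_ma < idd3n_ma:
        raise ValueError("read current should not be below background current")
    return (idd4r_ma - idd3n_ma) * vdd_v


# Hypothetical placeholder values, chosen only to exercise the function.
example_mw = read_power_mw(idd4r_ma=150.0, idd3n_ma=40.0, vdd_v=1.2)
print(f"incremental read power: {example_mw:.1f} mW")
```

A single supply voltage and a single pair of currents is all this form needs, which is why it is fast but says nothing about multiple voltage domains or I/O power.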
Need: Fast, Simple & Accurate Model

What’s the solution?
Develop a System-Level 3D-DRAM Power Model, i.e. one as accurate as a Circuit-Level 3D-DRAM Power Model

Circuit-Level DRAM Modeling
Baseline DRAM Model
• (Weis) DATE '11 and DAC '13
• NGSPICE - PTM/BSIM
• 1T1C Cell to Banks
2D to 3D (New)
• Based on DATE '11 & JEDEC Wide IO – x512
• 4 Banks/Channel
• 4 Channels
• TSV Routing
– Data, Cmd & Addr
– Control, Clock & Power
• No ODT (On Die Termination)
– Low Freq. & IO Capacitance
• No DLL (Delay Locked Loop)
• TSV model from IMEC/GaTech

System-Level Power Model (DRAMPower)
Comparison to Micron model
• Problem with Micron’s model:
– Not directly applicable for 3D-DRAMs (Multiple voltage domains and IO)
– Accuracy is unclear (State transitions not addressed & Approx. workload used)
– Not verified against circuit-level models or hardware power measurements.
• Adapting to 3D-DRAMs:
– Considers multiple voltage domains: (a) Core (b) Derived (Wordline)
– Includes IO power consumption (Incl. I/O Pads, Buffers, Bumps, Drivers & Pins)
– RD operation Energy (Generic equation): [equation not recoverable from the slide text]
• Modeling for Accuracy:
– Models memory state transitions – from active to power-down
– Models self-refresh accurately (functional correctness & timing difference)
– Most importantly: Is almost as accurate as the circuit-level model

Self-Refresh Operation - Accuracy
Micron
  Timings   Commands               Active Current   Bckgnd Current
  SREF      SREF, then 7 NOPs      (none)           IDD6
  XSDLL     SREX, then 5 NOPs      (none)           IDD2N
Actual
• Internal Refresh
• No DLL
  Timings   Commands               Active Current   Bckgnd Current
  RFC-RP    SREF, NOP, NOP         IDD5 - IDD3N     IDD3P0
  RP        NOP, NOP               IDD5 - IDD2N     IDD2P0
  SREF      NOP, NOP, NOP          (none)           IDD6
  XS        SREX, NOP, NOP, NOP    (none)           IDD2N
We furnish new equations in the system-level power model to address such accuracy issues

Comparison: Results & Analysis
• Experiment I:
– Different Operations
– Different Granularity
• Results:
– Less than 2% difference
– Adapted Micron SR (200): 72% diff.
• Experiment II:
– H.263 Encoder & EPIC Encoder
– JPEG Encoder & MPEG2 Decoder
– Different Loads and Power Modes
• Results:
– Less than 2% difference
– Adapted Micron: 12% diff. (SR 500MHz)
• The 2% difference is due to the use of JEDEC-specified averaged IDD currents.
Shows the accuracy of the system-level power model

Summary
Key Highlights:
• Presented an accurate datasheet-based system-level power model for Wide I/O 3D-stacked DRAMs.
• Verified the system-level model for accuracy against a detailed SPICE-based circuit-level 3D-DRAM architecture and power model.
• Observed < 2% difference in power and energy estimates for different memory operations and for any variations in memory load.
Other Important Contributions:
• Provided estimates for IDD current measures for different JEDEC 3D-DRAM configurations, in place of the as yet unavailable datasheets (in the paper).
• The system-level power model (DRAMPower) has been released online as an open-source 3D-DRAM power estimation tool.
Download link: www.drampower.info
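The interface figures quoted on the motivation slide can be cross-checked with a few lines of arithmetic. This is only a sketch under stated assumptions: the numbers in parentheses in the chart labels are read as clock frequencies in MHz, SC/DC/QC are read as single, dual and quad channel, LPDDR2/LPDDR3 are taken to transfer data on both clock edges and Wide IO on one. The function names are illustrative and not part of DRAMPower.

```python
def io_power_saving(power_mw: float, baseline_mw: float) -> float:
    """Fractional I/O power saving relative to a baseline interconnect."""
    return 1.0 - power_mw / baseline_mw


def peak_bandwidth_gbps(channels: int, width_bits: int,
                        clock_mhz: float, transfers_per_clock: int) -> float:
    """Peak bandwidth in GB/s for a given interface configuration."""
    bytes_per_transfer = width_bits / 8
    return channels * bytes_per_transfer * clock_mhz * transfers_per_clock / 1000.0


# Per-bit I/O power figures quoted on the slide (mW).
OFF_CHIP_MW, POP_MW, TSV_MW = 4.6, 2.3, 0.7

print(f"PoP saving vs off-chip: {io_power_saving(POP_MW, OFF_CHIP_MW):.0%}")
print(f"TSV saving vs off-chip: {io_power_saving(TSV_MW, OFF_CHIP_MW):.0%}")

# (channels, width in bits, clock in MHz, transfers per clock); the
# double/single data rate column is an assumption, not stated on the slide.
interfaces = {
    "SC LPDDR2 x32 (400)": (1, 32, 400, 2),
    "DC LPDDR2 x32 (533)": (2, 32, 533, 2),
    "DC LPDDR3 x32 (800)": (2, 32, 800, 2),
    "QC Wide IO x128 (200)": (4, 128, 200, 1),
}
for name, config in interfaces.items():
    print(f"{name}: {peak_bandwidth_gbps(*config):.1f} GB/s")
```

Under these assumptions the per-bit power figures reproduce the -50% and -85% labels, and the quad-channel Wide IO interface matches the peak bandwidth of dual-channel LPDDR3 at a quarter of the clock frequency.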