Mobile System Considerations for SDRAM Interface Trends
Andrew B. Kahng†‡, Vaishnav Srinivas‡¥
June 5th, 2011
CSE† and ECE‡ Departments, University of California, San Diego
Qualcomm Inc.¥

Outline
• SDRAM Memory Interfaces: Today and Tomorrow
• Motivation
• Trends in DRAM Density and Data Rate
• Trends in Mobile Processor Requirements
• Memory Interface Calculator
• Exploration Using the Calculator
• Summary and Next Steps
(2/13)

SDRAM Memory Interfaces Today and Tomorrow
• Various interconnect and signaling options exist:
  o Interconnect: Die stack/MCP, POP, DIMM, 3D-Stack
  o Signaling: DDR, XDR, Serial, Wide IO
• Exploration of these options based on the primary bounds (Capacity, Throughput, Power and Latency) is required for making the correct tradeoffs
(3/13)

Motivation
• The memory interface calculator includes:
  o IO switching, bias and termination power
  o IO/PHY and interconnect latencies
  o Input parameters for exploration:
    • Termination values
    • Loading
    • Number of data and strobe pins
    • Memory timing parameters
    • IO/PHY “retiming” power
• Predict gaps between offerings and requirements
• Integrating into CACTI can help exploration of system metrics
(4/13)

Trends in DRAM Capabilities
• DRAM densities to double every 3 years
• Projections for DRAM densities revised downwards over time
• Current densities at 4Gb/die
[Figure: DRAM Density (Gbits/die) versus year (Source: ITRS ITWG)]
• DRAM data rates to double every 4-5 years
• Projections for DRAM data rates revised upwards over time
• Current data-rates at 2.2 Gb/s
[Figure: DRAM I/O Rate (Gb/s) versus year (Source: ITRS ITWG)]
(5/13)

Trends in Mobile Processor Requirements
• Trends for mobile processor requirements
  o Capacity to scale 3-4x every 3 years
  o Throughput to double every 3 years
• The requirements are very dynamic!
• Quick exploration and projection for compatible memories is useful

Capacity Requirements in GB (Source: IDC)
Market     2010   2011   2012   2013   2014
Desktop    3.0    4.2    5.6    7.4    10.2
Laptop     2.0    3.3    4.6    6.3    8.0
Mobile     0.3    0.5    0.8    1.0    1.3

[Figure: Mobile Handset Throughput Requirements in GB/s, 2008-2018 (Source: Qualcomm)]
(6/13)

Memory Interface Calculator
Primary Bound    Parameters affected
Capacity         Number of ranks and channels; Memory density; Capacitive loading
Throughput       Data-rate, number of data lanes; Timing parameters; Signal integrity skew and jitter
Power            Termination scheme; Supply voltage; Activity factor
Latency          Number of pipeline stages; Interconnect delay; Memory access time
(7/13)

Memory Interface Calculator Summary
Bound                                    LPDDR2         TSS-Wide IO    DDR3           Serial            Mobile-XDR
Clock Speed (MHz)                        300-533, DDR   200-333, SDR   400-800, DDR   4-8 GHz, Serial   400-533, Octal
Throughput (GB/s)                        3-4.3          12-24          6-13           12-17             12-17
Peak IO Power Efficiency (mW/GBps)       ~40            ~10            ~120           ~60               ~20
Peak Core Power Efficiency (mW/GBps)     ~50            ~35            ~100           ~50               ~50
Total Peak Power Efficiency (mW/GBps)    ~90            ~45            ~220           ~110              ~70
Active Idle IO Power (mW)                ~6-10          ~2-4           ~500-600       ~450              ~200
Active Idle Core Power (mW)              ~20            ~20            ~150           ~20               ~20

Capacity (GB) (Current trends)
  o LPDDR2: 0.5-1 for x32 dual rank
  o TSS-Wide IO: 0.5-2 through multi-die stacking
  o DDR3: 2-8 for dual-rank DIMM
  o Serial: 0.5-1 for x32 dual rank
  o Mobile-XDR: 0.5-1 for x32 dual rank

Latency from MC-DRAM-MC
  o LPDDR2: ~50ns
  o TSS-Wide IO: ~40ns
  o DDR3: ~45ns, but penalty if DLL is off (~512 Tck)
  o Serial: ~65ns, PLL lock penalty if off
  o Mobile-XDR: ~60ns, DLL penalty if off
(8/13)

Memory Interface Calculator Summary
• The spider chart
highlights the design space covered
  o Wide IO covers the largest space for lower capacities
  o Large capacity systems still need DDR3/DDR4
• Alternatives to be explored outside the existing space?
[Figure: Memory Interface Design Space (spider chart). Axes: Throughput (2, 25), Max Power Efficiency (0.002, 0.04), Capacity (0, 8), 1/Latency (0.01, 0.04). Series: LPDDR2, LPDDR3, DDR3, DDR4, M-XDR, Serial, Wide IO]
• Before LPDDR3 came up in JEDEC, Wide-IO and Serial Memory were being explored.
• LPDDR3 was brought up as a way to fill this gap in the 2012-2014 timeframe
[Figure: Throughput in GB/s versus year, 2008-2018. Series: LPDDR2 (2 x32), WideIO (4 x128), Serial (x32), Mobile Req]
(9/13)

Exploration using the calculator
• How fast can LPDDR3 operate?
  o With terminations?
  o With DLL/better retiming?
  o With lower loading?
  o With better packaging?
  o POP versus MCP
• Wide IO exploration?
  o Transition to DDR for Wide IO?
  o Number of data lanes per strobe: 8, 16 or 32?
  o When do interface timing and signal/power integrity become an issue for Wide IO?
• High-capacity memory alternatives to DDR3/DDR4?
  o MCP with a larger number of wire-bonded dies?
  o TSS with a large number of stacks (8?)
  o TSS-MCP if stacking with the processor is a thermal risk?
(10/13)

LPDDR3 Exploration
Inputs to the calculator                     Value    Units
Number of memories on data pin               1
Number of memories on add pin                1
Number of memories on clk pin                1
Frequency of clock                           1250     MHz
Retiming current                             25       mA
Number of data pins                          32
Number of DQS pairs                          8
Termination RTT on DQ & DQS                  60       ohms
Termination RTT on CA                        60       ohms
Memory density for each memory core          4        Gb
TDS                                          100      ps
TDH                                          100      ps
TDQSQ                                        100      ps
TQHS                                         100      ps

Outputs of the calculator                    Value    Units
Signal Swing on DQ&DQS, Vsw.DQ               0.80     V
Switching Power on DQ                        52.80    mW
Switching Power on DQS                       52.80    mW
Switching Power on CLK + CLK diff termination  12.78  mW
Bias and Static Power                        30.00    mW
Signal Swing on CA, Vsw.DQ                   0.65     V
Switching Power on CA                        19.24    mW
Termination Power                            225.45   mW
I/O power for CPU chip                       393.07   mW
Throughput                                   10       GB/s
Capacity                                     0.5      GB
Latency                                      38.6     ns
Tskew                                        41       ps
Tjitter                                      29       ps
Terror                                       20       ps
Timing margin WRITE                          60       ps
Timing margin READ                           -5       ps
(11/13)

LPDDR3 Exploration
Maximum speeds for (•), with preliminary answers from the calculator (o):
• POP, Unterminated LPDDR3 with ~150ps memory timing parameters (tDS/tDH/tDQSQ/tQHS)?
  o 800MHz for single-rank
  o 800MHz for dual-rank will need careful architecture and design
• POP, Terminated LPDDR3 with ~100ps memory timing parameters?
  o 1250MHz
• External (MCP), Unterminated LPDDR3?
  o Even 533MHz for dual-rank is challenging and may need sophisticated retiming
• External (MCP), Terminated LPDDR3?
  o 1066MHz
(12/13)

Summary and Next Steps
• A simple framework to model interconnect and IO/PHY timing and power for existing and upcoming SDRAM memory interfaces
• Helps explore standards and design space
• Helps identify gaps between DRAM and SOCs
• Next Steps:
  o Integrate the memory interface models within CACTI
  o Challenge the calculator with future use cases for mobile products
  o Include more parameters, including silicon area, packaging options and number of data lanes per strobe pin
(13/13)
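
Worked cross-check of the LPDDR3 Exploration numbers
The throughput, capacity and I/O power outputs on the LPDDR3 Exploration slide can be sanity-checked with simple arithmetic on the listed inputs. The sketch below is not the calculator itself; its formulas are assumptions (two transfers per clock on every data pin for a DDR interface, capacity as density times the number of memories on the data pin, and total I/O power as the sum of the listed switching, bias and termination components). The function and variable names are hypothetical.

```python
# Hypothetical cross-check of the LPDDR3 Exploration slide.
# The formulas are assumptions, not the authors' calculator code.


def throughput_gb_per_s(clock_mhz, data_pins, transfers_per_clock=2):
    """Peak throughput in GB/s, assuming each data pin moves
    transfers_per_clock bits per clock cycle (2 for DDR)."""
    bits_per_second = clock_mhz * 1e6 * transfers_per_clock * data_pins
    return bits_per_second / 8 / 1e9


def capacity_gb(density_gbit, memories_on_data_pin):
    """Capacity in GB, assuming every memory on the data pin
    contributes its full density (8 bits per byte)."""
    return density_gbit * memories_on_data_pin / 8


# Power components in mW, copied from the calculator outputs on the slide.
IO_POWER_COMPONENTS_MW = {
    "switching_dq": 52.80,
    "switching_dqs": 52.80,
    "switching_clk_and_diff_termination": 12.78,
    "bias_and_static": 30.00,
    "switching_ca": 19.24,
    "termination": 225.45,
}

# Inputs copied from the slide: 1250 MHz clock, 32 data pins,
# 4 Gb per memory core, 1 memory on the data pin.
throughput = throughput_gb_per_s(clock_mhz=1250, data_pins=32)
capacity = capacity_gb(density_gbit=4, memories_on_data_pin=1)
total_io_power_mw = sum(IO_POWER_COMPONENTS_MW.values())

# Derived figure (not listed on the slide): I/O power per unit throughput.
io_efficiency_mw_per_gbps = total_io_power_mw / throughput

print(f"Throughput:    {throughput:.1f} GB/s")
print(f"Capacity:      {capacity:.2f} GB")
print(f"I/O power:     {total_io_power_mw:.2f} mW")
print(f"I/O efficiency: {io_efficiency_mw_per_gbps:.1f} mW/GBps")
```

Under these assumptions the sketch reproduces the slide's 10 GB/s throughput, 0.5 GB capacity and 393.07 mW I/O power. The latency and timing-margin outputs depend on models not described in the slides, so they are not recomputed here.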