NC STATE UNIVERSITY Retention-Aware Placement in DRAM (RAPID): Software Methods for Quasi-Non-Volatile DRAM Ravi K. Venkatesan Stephen Herr, Eric Rotenberg Center for Embedded Systems Research (CESR) Department of Electrical & Computer Engineering North Carolina State University NC STATE UNIVERSITY DRAM in Next-Generation Mobile Devices Feature-rich next-generation mobile devices Increase in memory requirements DRAM displacing SRAM – Samsung SGH i700 Triband Windows Smartphone : 32 MB DRAM – Motorola i930 cell phone has 32 MB DRAM Dram Makers Prep Multichip Cell Phone Memories. ElectroSpec, 1/3/2003. Innovative Mobile Products Require Early Access to Optimized DRAM Solutions. A Position Paper by Samsung Semiconductor, Inc., 2003. DRAM continuously drains battery even in standby – Periodic refresh needed to preserve stored information Need to reduce DRAM refresh power 2 Ravi Venkatesan © 2006 HPCA-12 02/14/2006 NC STATE UNIVERSITY DRAM – Cells 500 ms Bit Line Word Line from dummy cell DRAM capacitor 50 s Sense Amp 3 Ravi Venkatesan © 2006 HPCA-12 02/14/2006 NC STATE UNIVERSITY Retention Time Measurement ISSI DRAM Chip (IS42S16800A) 128 Mb 4 Ravi Venkatesan © 2006 HPCA-12 02/14/2006 NC STATE UNIVERSITY Retention Time Characterization Algorithm 10 trials are performed for Disable refresh – Writing all 1s – Writing all 0s Write all 1s to pagei Retention time for each page = minimum of the 20 trials wait(T) Read pagei Is pattern all 1s? NO Retention Time(pagei) = T - ΔT YES T = T + ΔT 5 Ravi Venkatesan © 2006 HPCA-12 02/14/2006 NC STATE UNIVERSITY DRAM – Retention Time Distribution Number of pages 800 600 400 200 0 worst page retention time (500 ms) 0 10000 20000 30000 40000 50000 Retention time (ms) 6 Ravi Venkatesan © 2006 HPCA-12 02/14/2006 NC STATE UNIVERSITY Cumulative Retention Time Distribution 99% DRAM usable with 3s refresh period % Usable DRAM 100 80 6X longer than worst page (500 ms at 25°C) 60 46X longer than default (64 ms at 70°C) 40 20 0 0 10000 20000 30000 40000 50000 Refresh period (ms) 7 Ravi Venkatesan © 2006 HPCA-12 02/14/2006 NC STATE UNIVERSITY DRAM – Retention Time Distribution 45°C 100 % Usable DRAM Number of pages 500 400 300 200 100 0 80 60 40 20 0 0 5000 10000 15000 Retention time (ms) 20000 0 5000 10000 15000 20000 Refresh period (ms) worst page retention time (320 ms) 8 Ravi Venkatesan © 2006 HPCA-12 02/14/2006 NC STATE UNIVERSITY DRAM – Retention Time Distribution 70°C 100 % Usable DRAM Number of pages 1000 800 600 400 200 80 60 40 20 0 0 0 1000 2000 3000 4000 5000 6000 Retention time (ms) 0 2000 4000 6000 Refresh period (ms) worst page retention time (150 ms) 9 Ravi Venkatesan © 2006 HPCA-12 02/14/2006 NC STATE UNIVERSITY DRAM Refresh Approaches Three classes of hardware techniques: Conventional – Refresh period based on worst cell – Worst-case temperature Temperature-Compensated Refresh (TCR) – Refresh period based on worst cell – Dynamically adjusts for temperature Better than worst-case design – Multiple refresh periods (refresh period per block of cells) – Can be coupled with TCR 10 Ravi Venkatesan © 2006 HPCA-12 02/14/2006 NC STATE UNIVERSITY RAPID (Retention-Aware Placement in DRAM) Novel software-only approach KEY IDEA: Allocate longer-retention pages before allocating shorterretention pages Single refresh period selected based on shortestretention page among populated pages, rather than shortest-retention page overall NOTE: All pages are still refreshed Extended refresh period safe only for populated pages OK for overall correctness 11 Ravi Venkatesan © 2006 HPCA-12 02/14/2006 NC STATE UNIVERSITY RAPID Conventional RAPID-1 Refresh Period RAPID-2 RAPID-3 0.5s 3s 37s 50s 41s 22s 41s 50s 22s 37s A A C C B B Retention Time Data Objects A, B, C, D 3s Program Sequence Allocate A Allocate B Allocate C 0.5s C D 37s C B B C D 41s D A A 50s D D 22s Allocate D Free B Free D 1s 12 Ravi Venkatesan © 2006 HPCA-12 02/14/2006 NC STATE UNIVERSITY RAPID - Recap Novel software-only approach – Can exploit off-the-shelf DRAMs to reduce refresh power RAPID - 1 – Eliminate outlier pages RAPID - 2 – RAPID - 1 and – Allocate longer-retention pages before allocating shorterretention pages RAPID - 3 – RAPID - 2 and – Continuously reconsolidate data to longest-retention pages possible 13 Ravi Venkatesan © 2006 HPCA-12 02/14/2006 NC STATE UNIVERSITY Coupling RAPID with Off-the-shelf DRAMs RAPID requires a single refresh period that can be adjusted Refresh options in off-the-shelf DRAMs: SELF-REFRESH AUTO-REFRESH DRAM issues refresh commands internally at a refresh period that is fixed or based on a particular temperature range Regularly timed refresh commands issued by memory controller Typically not programmable Typically programmable Low power High power Used in standby operation Used in active mode 14 Ravi Venkatesan © 2006 HPCA-12 02/14/2006 NC STATE UNIVERSITY Coupling RAPID with Off-the-shelf DRAMs (cont.) Neither refresh option is fully satisfactory – Self-refresh not programmable – Auto-refresh programmable, but power-inefficient Key observation – DRAMs allow enabling/disabling self-refresh via configuration register 15 Ravi Venkatesan © 2006 HPCA-12 02/14/2006 NC STATE UNIVERSITY Coupling RAPID with Off-the-shelf DRAMs (cont.) Solution – Set up a periodic timer interrupt (INT1) • Period = RAPID refresh period – Set up another interrupt (INT2) • Triggers 64 ms (self-refresh period) after INT1 – INT1 triggered: • Enable self-refresh – INT2 triggered: • Disable self-refresh INT2 INT1 INT2 INT1 64 ms 64 ms 10s INT2 INT1 64 ms 10s RAPID refresh period 16 Ravi Venkatesan © 2006 HPCA-12 02/14/2006 NC STATE UNIVERSITY RAPID – Implementation Modifications to routines which allocate and deallocate physical pages in memory INACTIVE LIST (free-list of inactive physical pages) Conventional HEAD RAPID-1 HEAD ... .... TAIL TAIL RAPID-1: 1% of outlier pages are excluded from Inactive List 17 Ravi Venkatesan © 2006 HPCA-12 02/14/2006 NC STATE UNIVERSITY RAPID – Implementation (cont.) INACTIVE LIST (free-list of inactive physical pages) RAPID-1 RAPID-2 and RAPID-3 H H ... ... ... ... ..... T ... T 50 – 45.3 seconds 45.3 – 40.6 40.6 – 35.9 7.7 – 3 Refresh Period = 40.6s 45.3s 35.9s RAPID-2 and RAPID-3 Single Inactive List transformed into Multiple Inactive Lists 18 Ravi Venkatesan © 2006 HPCA-12 02/14/2006 NC STATE UNIVERSITY Related Work Characterization [Hamamoto et al. 1998, IEEE Tran Electron Devices] On Retention Time Distribution of DRAM Dual Period Refresh [Yanagisawa 1988, US Patent] Semiconductor Memory Multiperiod Refresh [Kim and Papaefthymiou 2001, 2003, IEEE Tran VLSI Systems] Block-Based Multiperiod Refresh Multiperiod Refresh + Selective [Ohsawa et al. 1998, ISPLED] Optimizing DRAM refresh Count for Merged DRAM/Logic LSIs Placement + Single Refresh Period + Selective [Kai et al. 2002, US Patent] Semiconductor Circuit and Method of Controlling the Same [Le et al. 2005, US Patent] Bank Address Mapping According to Bank Retention Time in DRAM [Kawasaki et al. 2005, US Patent] DRAM and Refresh method thereof 19 Ravi Venkatesan © 2006 HPCA-12 02/14/2006 NC STATE UNIVERSITY Evaluation Methodology Active Mode (5%), Idle Mode (95%) Simulated timeline = 24 hours Divide 24-hour timeline into time slices Probability [time slice = active period] = 5% Probability [time slice = idle period] = 95% Inject random number of allocations/frees in active period – Vary DRAM utilization across timeline 20 Ravi Venkatesan © 2006 HPCA-12 02/14/2006 NC STATE UNIVERSITY Custom Hardware Techniques HW-Multiperiod (HW-M) – Each DRAM page is refreshed at a tailored refresh period, that is a multiple of shortest refresh period HW-Multiperiod-Occupied (HW-M-O) – Same as HW-Multiperiod – Refresh only currently occupied pages HW-Ideal (HW-I) – Each DRAM page is refreshed at its own tailored refresh period HW-Ideal-Occupied (HW-I-O) – Same as HW-Ideal – Refresh only currently occupied pages 21 Ravi Venkatesan © 2006 HPCA-12 02/14/2006 NC STATE UNIVERSITY 1 HW-I-O HW-M-O 0 HW-I HW-I-O HW-M-O HW-I HW-M R-3 R-2 R-1 TCR 0 0.2 HW-M 83% 93% 95% 0.2 0.4 R-3 0.4 0.6 R-2 0.6 45°C 0.8 R-1 0.8 1 TCR 25°C normalized energy w.r.t. TCR 1 83-95% energy savings at 25°C 70°C 0.8 Similar energy savings across all temperatures 0.6 0.4 Nearly as effective as custom hardware techniques 0.2 HW-I-O HW-I HW-M-O Ravi Venkatesan © 2006 HW-M R-3 R-2 R-1 0 TCR normalized energy w.r.t. TCR normalized energy w.r.t. TCR RAPID – Refresh Energy Savings 22 HPCA-12 02/14/2006 NC STATE UNIVERSITY 0.2 1.2 R-3 R-2 R-1 TCR 0 0.8 0.6 0.4 0.2 0 R-3 0.4 45°C 1 R-2 0.6 1.62 R-1 0.8 1.2 TCR 25°C 1 Energy Consumption (mW hours) 1.2 3.46 70°C 1 0.8 0.6 0.4 0.2 Ravi Venkatesan © 2006 R-3 R-2 R-1 0 TCR Energy Consumption (mW hours) Energy Consumption (mW hours) RAPID vs. TCR Worst-case non-temperature adjusted R-1 yields same or better energy than TCR R-1 simpler, cost-effective alternative to TCR R-2 and R-3 yield much higher energy savings than TCR 23 HPCA-12 02/14/2006 NC STATE UNIVERSITY 0.04 HW-I-O HW-M-O 0 HW-I HW-I-O HW-M-O HW-I HW-M R-3 R-2 R-1 0 0.08 HW-M 0.04 0.12 R-3 0.08 50% utilization (average) 0.16 R-2 0.12 0.2 R-1 75% utilization (average) 0.16 Energy Consumption (mW hours) 0.2 0.2 25% utilization (average) 0.16 0.12 0.08 0.04 Ravi Venkatesan © 2006 HW-I-O HW-M-O HW-I HW-M R-3 R-2 0 R-1 Energy Consumption (mW hours) Energy Consumption (mW hours) Refresh Energy – Different DRAM Utilizations R-1, HW-M and HW-I constant R-2 and R-3 yield more energy savings than HW-M and HW-I as utilization decreases R-2 and R-3 approach energy of HW-I-O HW-I-O not retention-aware 24 HPCA-12 02/14/2006 NC STATE UNIVERSITY Summary – RAPID RAPID reduces refresh power to vanishingly small levels – Quasi-non-volatile DRAM Software-only technique – No custom DRAM support required – Can exploit off-the-shelf DRAM Approaches energy levels of idealized techniques, which require hardware support 25 Ravi Venkatesan © 2006 HPCA-12 02/14/2006