SAPPER: Statistically-Aware Probabilistic Power-Efficient Refresh
Jamie Liu, Ben Jaiyen, Richard Veras, Onur Mutlu
October 27, 2011
SAFARI, Carnegie Mellon University

Outline: Background, Related Work, SAPPER, Evaluation, Conclusion

DRAM Overview
- DRAM is the dominant main memory technology due to high density and low cost
- Stores data as charge on a capacitor
- Charge leakage causes data to be lost
- DRAM cells must be periodically refreshed (charge on cells restored) to avoid data loss
- Retention time varies between DRAM cells

Key Challenges
- Refreshes interfere with memory accesses, reducing performance
- Refreshes consume energy
- Higher DRAM density means greater probability of a cell failure, with consequences for device yield
- Cost sensitivity of DRAM makes modifying DRAM devices unattractive

Status Quo
- Memory controller issues auto-refresh commands at a fixed time interval
- Refreshing is managed by the DRAM
- Control on DRAM must be very simple for cost reasons
- Every row is refreshed even if not strictly necessary
- No support for variability in retention time: all cells are refreshed at the minimum retention time across the entire device

Smart Refresh
- Count accesses as refreshes
- M. Ghosh and H.-H. S. Lee, “Smart Refresh: An Enhanced Memory Controller Design for Reducing Energy in Conventional and 3D Die-Stacked DRAMs,” MICRO 2007
- Very high storage overhead (768 KB in the memory controller for 32 GB of DRAM)
- No support for variability in retention time

Retention-Aware Placement
- Only refresh used pages and prefer use of high-retention pages to decrease refresh rate
- R. K. Venkatesan et al., “Retention-Aware Placement in DRAM (RAPID): Software Methods for Quasi-Non-Volatile DRAM,” HPCA 2006
- C. Isen and L. K. John, “Eskimo: Energy Savings Using Semantic Knowledge of Inconsequential Memory Occupancy for DRAM Subsystem,” MICRO 2009
- Either the OS performs refreshes (requires hard-deadline scheduling, high context switching overhead) ...
- ... or hardware has to track retention time for each row (extremely high storage overhead)

ECC
- Allow refresh lapses and correct errors using ECC
- J. Kim and M. C. Papaefthymiou, “Dynamic Memory Design for Low Data-Retention Power,” PATMOS 2000
- P. G. Emma, W. R. Reohr, and M. Meterelliyoz, “Rethinking Refresh: Increasing Availability and Reducing Power in DRAM for Cache Applications,” IEEE Micro 2008
- C. Wilkerson et al., “Reducing Cache Power with Low-Cost, Multi-Bit Error-Correcting Codes,” ISCA 2010
- ECC imposes significant storage overhead, which is very expensive in cost-sensitive commodity DRAM

Accepting Retention Errors
- Allow non-critical data to be refreshed at a lower rate
- S. Liu et al., “Flikker: Saving DRAM Refresh-Power Through Critical Data Partitioning,” ASPLOS 2011
- Requires significant programmer effort to determine which application data is non-critical and/or to recover from corruption of non-critical data

Refresh Scheduling
- Try to schedule refreshes for when the system is idle
- J. Stuecheli et al., “Elastic Refresh: Techniques to Mitigate Refresh Penalties in High Density Memory,” MICRO 2010
- Doesn’t decrease number of refreshes (no impact on energy consumption)
- No support for variability in retention time

DRAM Retention Distribution
[Figure: probability density of the retention time distribution. x-axis: Retention Time (s), log scale from 10^-2 to 10^3; y-axis: Probability, log scale from 10^-9 to 10^0.]

Minimum Retention Time and Yield
[Figure: Device Failure Probability (1 - Yield), log scale from 0.01% to 100%, versus Device Density (4 Gb, 8 Gb, 16 Gb, 32 Gb, 64 Gb, 128 Gb). Series: 128 ms, 64 ms, 32 ms.]

DRAM Retention Distribution
- Most refresh overhead is incurred because of a small number of low-retention rows
- Low-retention rows are difficult to eliminate
- Refreshing only the lowest-retention rows at the lowest refresh interval can allow significant energy savings

System Overview
- Modification to the memory controller (or cache controller in eDRAM, or logic die in stacked memory device)
- Keep track of rows in lowest-retention groups (e.g. 64–128 ms, 128–256 ms, 256–512 ms)
- Row counter counts through every row in the DRAM system
- Refresh tracked rows at their group refresh interval (64 ms, 128 ms, or 256 ms respectively)
- Refresh all other rows at the default interval (512 ms)

Efficient Row Tracking
- Naive way to track rows is to use a table
- J.-H. Oh, “Refresh for Dynamic Cells with Weak Retention,” U.S. Patent US20050099868
- Requires associative lookup to check if a row is in a retention group
- Fixed size: system fails to provide correctness if the table capacity is insufficient

Bloom Filters
- Bloom filters provide a space-efficient probabilistic row tracking mechanism
- False positives cause a row to be refreshed more frequently than needed (no correctness issue)
- A Bloom filter can contain any number of elements (no overflow issue)

Tolerating Temperature Variation
- Change in temperature causes retention time of all cells to change by a predictable factor
- Period scaling: increase the rate at which the row counter counts depending on the temperature
- Results in uniform refresh rate scaling for all rows

Period Scaling
[Diagram: Row Counter (n bits, where 2^n = # of supported rows), Period Scaler compared (=?) against the Period Scaler Roll-over Value, and Period Counter.]
- Increment Period Scaler on Row Counter roll-over
- Increment Period Counter on Period Scaler roll-over
- Use Period Counter to determine which Bloom filter needs refreshing
- Choose lengths based on design choices (# of Bloom filters, granularity of temperature scaling)

Methodology
- 32 GB DRAM
- 64–128 ms retention range: 256 B Bloom filter, 10 hash functions
- 128–256 ms retention range: 1 KB Bloom filter, 6 hash functions
- Default refresh interval: 256 ms

Initial Results
- 74.7% refresh reduction
- 21.2% idle energy reduction
- Active energy reduction still being analyzed

Sensitivity to Row Size
[Figure: Normalized Refreshes (y-axis, 0.20 to 0.28) versus Row Size (4 KB, 8 KB, 16 KB, 32 KB).]

Sensitivity to Tail Probability
[Figure: Normalized Refreshes (y-axis, 0.20 to 0.28) versus Tail Probability (1.00E-6, 1.25E-6, 1.50E-6, 1.75E-6, 2.00E-6).]

Conclusion
- 74.7% refresh reduction on average
- Low overhead: 1.25 KB in the memory controller for 32 GB of DRAM
- Low complexity in hardware
- Robust to DRAM parameter variation
- Enables higher-density memory systems
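
Sketch: Bloom filter row tracking
The row tracking described on the Bloom Filters and Methodology slides can be illustrated with a minimal software sketch. The filter sizes and hash counts (256 B with 10 hashes, 1 KB with 6 hashes) and the 256 ms default come from the Methodology slide. Everything else is an assumption made for illustration: the names `BloomFilter` and `refresh_interval_ms`, and the use of SHA-256 as a stand-in hash, which a hardware design would not use. The slides do not specify the hash functions.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array plus k hash functions (software stand-in)."""

    def __init__(self, size_bytes, num_hashes):
        self.num_bits = size_bytes * 8
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bytes)

    def _positions(self, row):
        # Assumption: derive k bit positions from SHA-256 of (index, row).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{row}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, row):
        for pos in self._positions(row):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, row):
        # May report a false positive, never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(row))


def refresh_interval_ms(row, filter_64, filter_128, default_ms=256):
    """Pick the shortest interval whose filter claims the row."""
    if row in filter_64:
        return 64
    if row in filter_128:
        return 128
    return default_ms


# Sizes and hash counts as listed on the Methodology slide.
filter_64 = BloomFilter(size_bytes=256, num_hashes=10)
filter_128 = BloomFilter(size_bytes=1024, num_hashes=6)
filter_64.add(3)        # hypothetical row with 64-128 ms retention
filter_128.add(12345)   # hypothetical row with 128-256 ms retention
```

A false positive only moves a row to a shorter interval than it needs, so the lookup errs on the safe side. The two filters together occupy 256 B + 1 KB = 1.25 KB, matching the overhead quoted on the Conclusion slide.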
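
Sketch: period scaling counter chain
The counter chain on the Period Scaling slide (Row Counter, Period Scaler, Period Counter) can be sketched as below. This is one possible reading of the diagram, not a confirmed description of the hardware. The class `RefreshCounters`, the parameter `scaler_rollover`, and the function `groups_due` are hypothetical names. The rule that a group with interval T is due every T/64 periods is also an assumption, chosen to match the 64 ms, 128 ms, and 256 ms intervals on the Methodology slide.

```python
class RefreshCounters:
    """Row Counter -> Period Scaler -> Period Counter roll-over chain."""

    def __init__(self, row_bits, scaler_rollover):
        self.num_rows = 1 << row_bits           # 2^n supported rows
        self.scaler_rollover = scaler_rollover  # Period Scaler Roll-over Value
        self.row_counter = 0
        self.period_scaler = 0
        self.period_counter = 0

    def step(self):
        """Advance the Row Counter by one row, propagating roll-overs."""
        self.row_counter += 1
        if self.row_counter == self.num_rows:
            self.row_counter = 0
            self.period_scaler += 1             # Row Counter rolled over
            if self.period_scaler == self.scaler_rollover:
                self.period_scaler = 0
                self.period_counter += 1        # Period Scaler rolled over


def groups_due(period_counter, intervals_ms=(64, 128, 256)):
    """Assumed rule: a group with interval T is due every T / base periods."""
    base = intervals_ms[0]
    return [t for t in intervals_ms if period_counter % (t // base) == 0]


counters = RefreshCounters(row_bits=2, scaler_rollover=3)
for _ in range(12):     # 4 rows x 3 sweeps = one full period
    counters.step()
```

In this sketch, lowering `scaler_rollover` makes the Period Counter advance after fewer Row Counter sweeps, which shortens every group's period by the same factor. The slides do not state whether the design adjusts this value, the Row Counter's rate, or both in response to temperature.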