E-RoC: Embedded RAIDs-on-Chip for Low Power Distributed Dynamically Managed Reliable Memories*
Luis Bathen, Nikil Dutt
University of California, Irvine
* This work was presented at DATE 2011
Luis A. Bathen, University of California, Irvine

Distributed Memories and Voltage Scaling
• Trend towards multicore platforms
  • Distributed on-chip memories
• By 2014 up to 94% of chip area may be memories
• Saving power?
  • Voltage scaling
• Process variations, technology scaling + environment
  • Increased vulnerability to soft errors!
[Figure: memory array across the voltage range. Labels: Overdriven Vdd, Nominal Vdd, Low Vdd, Aggressively Low Vdd; Parametric Manufacturing Errors; Errors intentionally introduced by aggressive Vdd scaling. Source: Kurdahi, Eltawil 2008]
• Reduced power consumption at the cost of introducing errors!

Related Work in Memory Reliability
• BIST/ECC
  • Makhzan et al. [ICCD 2007], Kim et al. [DATE '06], Lee et al. [CASES '06], Ghosh et al. [ITC 2004]
• Redundancy
  • Lucente et al. [CICC '90], Zhang et al. [ICS '04]
• ECC/replication hybrids
  • Zhang et al. [DSN '03], Li et al. [ICCAD '05]
• Memory characterization/BIST is very expensive
• ECC/hybrids incur high performance and power consumption overheads!
• RAID: very successful for reliable distributed data storage
• Can we exploit RAID notions for on-chip memories?

Towards Embedded RAIDs (E-RAIDs)
Traditional RAID: Storage Systems
[Figure: CPU and RAID Controller on a System Bus, with hard disks (HD) grouped as RAID 1 (Mirroring) and RAID 5 (Stripe + Mirroring)]
• Guarantee 24/7 uptime under heavy IO loads
• Software/Hardware RAID controllers
• Different RAID levels
  • For performance/reliability (RAID0, RAID1, RAID5…)
Embedded RAID: SoCs
[Figure: CPUs and SPMs on an On-Chip Bus / System Bus, together with the E-RAID Manager (E-RoC)]
• Introduce HW/SW E-RAID Manager
E-RoC Framework
[Figure: framework components: Different Platform Configurations (CMP, NoC, etc.); Embedded RAID Levels; Logical SPMs (Virtual Address Space); DSPAM Allocation Policies; Embedded RAIDs-on-Chip; Aggressive Voltage Scaling; E-RoC Manager]

Case for E-RAIDs
• 2KB SPM @ Nominal Vdd vs. E-RAID levels built from 512 B SPMs behind the E-RoC Manager IF
  • E-RAID 1 (Mirroring)
  • E-RAID 0 (1 Byte striping): a 32bit word is split into Byte 0, Byte 1, Byte 2, Byte 3 (4 x 8bit), one byte per 512 B SPM
• E-RAID levels
  • Provide same memory space
  • Parallel IOs
  • Voltage scaled
• Voltage scale induced errors handled automatically by E-RAID levels!
[Figure: Power reduction through aggressive voltage scaling. X axis: Vdd (0.9, 0.8, 0.75, 0.6). Y axes: Probability of Failure (SEU), 1.00E-20 to 1.00E-02; Power Reduction Percentage, -20 to 60]
• Data points from the figure:
  • 14% increase @ 2e-20 SEU
  • 8% savings @ 1e-15 SEU
  • 19% savings @ 6e-12 SEU
  • 46% savings @ 7e-2 SEU
• Incurs power consumption overhead at high Vdd
• Saves power at low Vdd

Embedded RAID Levels and Logical SPMs
Traditional RAIDs
• Statically defined
  • Create and use for entire app run
• Allocate an entire SPM to a RAID level
• Greatly limits SPM utilization
[Figure: CPU0 to CPU2 with 1K and 4K SPMs. App1: 1KB uses Mirroring, 2x1K; App2: 2KB requests Parity, 3x2K. App2 is not successful in creating its RAID. Inefficient SPM utilization!]
Embedded RAIDs
• Customized E-RAID levels
• Logical SPMs (LSPMs)
  • Associated with an E-RAID level
  • Expose LSPMs to the outside world
  • Managed as regular SPMs
• Efficient allocation policies
[Figure: E-RoC Manager layers: Address Virtualization Layer; E-RAID Level Layer (Mirroring, Parity, No E-RAID, etc.); Physical Level Layer. App1: 1KB gets an LSPM of 1K (Mirroring, 2x1K); App2: 2KB gets an LSPM of 2K (Parity, 3x2K). Successful allocation of both E-RAID levels!]
• Transparent and efficient utilization of SPM space!

Sample Experimental Results: Power & Performance Comparison
• Platform: 8 Core CMP with 8x4KB SPMs (32KB)
• Baseline: SPM @ Nominal Vdd
• All others: Voltage Scaled (Vdd = 0.65)
• Benchmarks: JPEG Encoder/Decoder, H263 Encoder
[Figure: platform diagram with eight CPUs, eight SPMs, main memory (MM), and E-RoC]
Power Consumption Overheads
[Figure: Normalized Power (0 to 4) per point of comparison for JPEGDEC, H263, JPEGENC]
• Traditional (ECC/DUP): High power consumption overhead (AVG: 64% increase)
• E-RoC: AVG savings of 76%
Performance Overhead
[Figure: Normalized Performance (0.9 to 1.25) per point of comparison for JPEGDEC, H263, JPEGENC]
• Traditional (ECC/DUP): High performance overhead (AVG: 9.2%)
• E-RoC: Minimal overall performance overhead (AVG: 2.3%)
• Reduced power consumption with minimal performance overhead!

Conclusion
• Introduced Embedded RAIDs-on-Chip (E-RoC)
• Key ideas are:
  1. Reliability via redundancy using E-RAID levels
  2. Custom E-RAID levels optimized for use in embedded SoCs
  3. Dynamic allocation of distributed SPMs
  4. Virtualization support (Logical SPMs)
• Use RAID-like policies to achieve a fully distributed low power and reliable on-chip memory subsystem
• Our experimental results show that E-RoC can attain
  • 76% average power reduction over ECC based approaches
  • Minimal performance overhead (2.3% AVG)
• To learn more come to my poster!
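
Sketch: E-RAID 0 and E-RAID 1 address mapping

The "Case for E-RAIDs" slide replaces one 2KB SPM with four 512 B SPMs using 1-byte striping or mirroring. The following is a minimal sketch of that mapping in Python, not the E-RoC implementation. The function names, the little-endian byte order, and the word-aligned addressing are assumptions made for illustration. Note that a two-copy mirror on its own can only detect a mismatch; how E-RoC corrects it is not stated on the slides.

```python
# Hypothetical sketch: names and layout are assumptions, not the E-RoC design.
SPM_SIZE = 512   # bytes per physical SPM, as on the slide
NUM_SPMS = 4     # four SPMs provide the same 2 KB space as the baseline


def stripe_write(spms, addr, word):
    """E-RAID 0: store a 32-bit word at a word-aligned logical address,
    placing one byte in each of the four SPMs (1-byte striping)."""
    data = word.to_bytes(4, "little")
    offset = addr // NUM_SPMS          # same offset inside every SPM
    for lane in range(NUM_SPMS):
        spms[lane][offset] = data[lane]


def stripe_read(spms, addr):
    """E-RAID 0: gather the four bytes (conceptually in parallel) into a word."""
    offset = addr // NUM_SPMS
    return int.from_bytes(
        bytes(spms[lane][offset] for lane in range(NUM_SPMS)), "little"
    )


def mirror_write(primary, mirror, addr, value):
    """E-RAID 1: write the same byte to both copies."""
    primary[addr] = value
    mirror[addr] = value


def mirror_read(primary, mirror, addr):
    """E-RAID 1: return (value, consistent); consistent is False when the
    two copies disagree, e.g. after a voltage-induced bit flip."""
    a, b = primary[addr], mirror[addr]
    return a, a == b
```

With `spms = [bytearray(SPM_SIZE) for _ in range(NUM_SPMS)]`, a word written at logical address 8 lands at offset 2 of every SPM, so all four 8-bit accesses can proceed in parallel.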
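
Sketch: parity-based recovery

The slides list Parity as one of the E-RAID levels but do not describe its layout. The sketch below assumes the conventional RAID approach: equally sized data chunks on distinct SPMs plus one XOR parity chunk, with the faulty chunk already identified by some other mechanism. Both function names are hypothetical.

```python
# Hypothetical sketch: XOR parity as used in conventional RAID; the actual
# E-RAID parity layout is not given on the slides.
from functools import reduce


def xor_parity(chunks):
    """Return the byte-wise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild(chunks, parity, lost):
    """Recompute chunk number `lost` from the surviving chunks and the parity."""
    survivors = [c for i, c in enumerate(chunks) if i != lost]
    return xor_parity(survivors + [parity])
```

Because XOR is its own inverse, XOR-ing the parity with every surviving chunk yields the missing one, which is why a single known-bad chunk can be tolerated.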
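
Sketch: whole-SPM allocation vs. Logical SPMs

The "Embedded RAID Levels and Logical SPMs" slide contrasts allocating entire SPMs to a RAID level with carving Logical SPMs out of shared physical space. The sketch below illustrates that contrast only. The pool of four 4 KB SPMs, the function names, and the "most free space first" placement rule are assumptions, not the allocation policies used by the E-RoC Manager.

```python
# Hypothetical sketch: pool size and placement rule are assumptions.

def alloc_whole_spms(free, total, copies):
    """Traditional policy: claim `copies` completely unused SPMs, or fail."""
    idle = [i for i in range(len(free)) if free[i] == total[i]]
    if len(idle) < copies:
        return None
    chosen = idle[:copies]
    for i in chosen:
        free[i] = 0                    # the whole SPM is consumed
    return chosen


def alloc_lspm(free, copies, chunk):
    """LSPM-style policy: carve `chunk` bytes from `copies` distinct SPMs."""
    fits = [i for i in range(len(free)) if free[i] >= chunk]
    if len(fits) < copies:
        return None
    # Prefer the SPMs with the most free space to balance usage.
    chosen = sorted(fits, key=lambda i: free[i], reverse=True)[:copies]
    for i in chosen:
        free[i] -= chunk               # only the requested chunk is consumed
    return chosen


total = [4096] * 4

# Traditional: mirroring takes two whole SPMs, so parity (three SPMs) fails.
free_trad = list(total)
trad_mirror = alloc_whole_spms(free_trad, total, copies=2)
trad_parity = alloc_whole_spms(free_trad, total, copies=3)

# LSPMs: Mirroring 2x1K and Parity 3x2K both fit in the same pool.
free_lspm = list(total)
lspm_mirror = alloc_lspm(free_lspm, copies=2, chunk=1024)
lspm_parity = alloc_lspm(free_lspm, copies=3, chunk=2048)
```

Under these assumptions `trad_parity` is `None`, mirroring the failed App2 request on the slide, while both LSPM requests succeed and leave free space in every SPM.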