Reducing Energy Consumption of Disk Storage Using Power Aware Cache Management

Qingbo Zhu, Francis M. David, Christo F. Deveraj, Zhenmin Li, Yuanyuan Zhou
Department of Computer Science, University of Illinois at Urbana-Champaign
Pei Cao*
*Cisco Systems Inc.
HPCA’04, 02/17/2004

Data Centers: Service-based Computing
  [Diagram labels: router, switch, web servers, application servers, database servers, SAN, storage servers]

Energy Problem Faced by Data Centers
  - Data centers
    - High electricity bills: up to 25% of TCO
    - $8M per year for a 30,000-square-foot data center [EERE news 2003]
    - Increasing by as much as 25% annually [Energy User News 2002]
  - Storage
    - 27% of the total energy consumed [Maximum Inc. 2002]

Disk Power Model
  - Disk power modes
    - Active/idle/standby/sleep
    - Spinup/down cost
    - Breakeven time
  - Metrics
    - Energy consumption
    - Average response time

Disk Power Management Schemes
  - Oracle scheme (off-line): spin the disk down when IdleTime > BreakEvenTime
  - Practical scheme (on-line): spin the disk down after it has been idle for BreakEvenTime
  [Timeline figure labels: access1, access2, wait time]

Current Research Status
  - The idle periods in server workloads are too short to justify the high spinup/down cost of server disks [ISCA’03][ISPASS’03][ICS’03]
    - IBM Ultrastar 36Z15: 135 J / 10.9 s
  - Multi-speed disk model [ISCA’03]
    - RPMs: multiple intermediate power modes
    - Smaller spinup/down costs
    - Can save energy for server workloads
  - Most previous work assumes that all requests go directly to physical disks

Observation
  - Many requests are filtered out by the storage cache
    - EMC Symmetrix storage system: up to 128GB storage cache
    - IBM ESS system: up to 64GB storage cache
  - Cache replacement and write policies affect the access sequences to physical disks
  [Diagram label: block-based storage system]

The Focus of Our Paper
  - Power-aware off-line and on-line cache replacement algorithms and write policies to reduce disk energy consumption
  - Clarification
    - The underlying disk power management scheme is NOT changed
    - The storage cache is always active

Outline
  - Motivation
  - Power aware cache management
    - Belady’s algorithm is NOT energy-optimal
    - Off-line power-aware greedy algorithm
    - On-line power-aware algorithm
    - Four write policies
  - Simulations
  - Conclusion
  - Limitations and future work

Breakeven-Time for Multiple Power Modes
  [Figure: energy consumption E(T) against idle period length T; labels: active mode, mode 0 to mode 3, spinup cost, t1, t2, t3]

Is Belady’s Algorithm Energy-Optimal?
  - Belady’s algorithm: performance-optimal
    - Minimizes the number of misses
    - Evicts the block with the longest future reference distance
  - Answer: NO!
    - It only considers the access sequence
    - It ignores the requests’ arrival times
    - It ignores the multiple-disk scenario

A Simple Example
  [Figure: access timeline over time t for blocks A, B, C, D on disk 0, comparing Belady’s algorithm with a power-aware algorithm]
  - An energy-optimal algorithm using dynamic programming

Off-line Power-Aware Greedy Algorithm
  - Idea: evict the block with the smallest energy penalty
  - Observation: take advantage of the knowledge about future bound-to-happen misses
    - Cold misses
    - Capacity misses due to previous evictions
  [Figure: cached blocks A, B, C; future access timeline D, A, E, B, F; D, E, F are bound-to-happen misses]

How to Calculate Energy Penalty of Evicting a Block
  Energy Penalty(A) = E(DA) + E(AE) - E(DE)
  Energy Penalty(B) = E(EB) + E(BF) - E(EF)
  Here E(XY) is the energy consumed during the idle period between accesses X and Y.
  [Figure: cached blocks A, B, C; future access timeline D, A, E, B, F; D, E, F are bound-to-happen misses]

Re-view
  [Figure: energy consumption against idle period length for mode 0 to mode 3; breakeven points t1, t2, t3]

On-line Power Aware Algorithm
  [Figure: energy consumption against idle period length for mode 0 to mode 3 with points t1 to t4; annotations: "energy penalty", "energy saving", "<< Energy Saving", "Super Linear"]
  - Idea: selectively keep blocks from inactive disks in the cache for a longer time
  - Make “inactive disks” more inactive

How to Measure Disk Activeness?
  - Characteristics of inactive disks
    - Small percentage of cold misses
    - Large idle period lengths with high probability

How to Keep Track of Cold Misses?
  - Bloom Filter: a space-efficient membership test method
    - A vector v of m bits
    - k independent hash functions, each ranging over {1..m}
    - Given an access to block a, check the bits at positions h1(a), h2(a), ..., hk(a)
      - If any of them is 0, a is a cold miss; then set all of these bits to 1
      - Otherwise, a is treated as not a cold miss, though this may be wrong
    - For 1.6M blocks with v = 2M bytes and k = 7, the accuracy is 99.18%

How to Keep Track of the Distribution of Idle Period Lengths?
  - Histogram-based estimation
  [Figure: histogram over idle period length]

Case Study: PA-LRU
  - Applies to all cache replacement algorithms
    - LRU, 2Q, MQ, etc.
  - PA-LRU
    - Two LRU stacks
      - LRU0: blocks from active disks
      - LRU1: blocks from inactive disks
    - Evict blocks from LRU0 first
    - The evaluation of disk activeness is epoch-based
      - Adapts to workload changes

Write Policy
  - Write back
  - Write through
  - Write back with eager updates (WBEU)
    - Eagerly write back all the dirty blocks when the target disk becomes active due to a read miss
  - Write through with deferred updates (WTDU)
    - Use a log disk which is always active
    - Write the blocks to the log disk if the target disk is not active
    - Flush back all the logged blocks when the target disk becomes active due to a read miss
    - Retains persistent semantics

Evaluation Methodology
  - Experiment setup
    - DiskSim
      - IBM Ultrastar 36Z15
      - Enhanced by a multi-speed disk power model
      - Enhanced by a CacheSim
  - Real system traces
    - OLTP
    - Cello96
  - Synthetic traces
    - Exponential distribution
    - Pareto distribution

Energy (OLTP)
  [Bar chart: energy for Infinite size, Belady, OPG, LRU, and PA-LRU under the Practical and Oracle schemes; y-axis from 0 to 1]
  - OPG: energy saving of 2% - 9% over Belady’s algorithm
  - PA-LRU: energy saving of 16% over LRU

Average Response Time (OLTP)
  [Bar chart: average response time for Infinite size, Belady, OPG, LRU, and PA-LRU under the Practical scheme; y-axis from 0 to 1]
  - OPG: 4% better than Belady’s algorithm
  - PA-LRU: 50% better than LRU (avoids expensive spinups)

Conclusion
  - Power aware cache management plays an important role in disk energy consumption
  - Belady’s algorithm is NOT energy-optimal
  - Evict the blocks with small energy penalty
  - Make inactive disks more inactive

Future Work and Acknowledgements
  - Limitations and future work
    - Design online algorithms for a single disk as well
    - Take prefetching into account
    - Real system experiments
  - Acknowledgements
    - Anonymous reviewers
    - Professor Lenny Pitt (UIUC)
    - CMU Parallel Data Lab (for DiskSim)
    - HP Lab (for Cello Trace)

Questions?
Thanks!

Backup Slides

Write Policies (Exponential Distribution)
  [Bar chart: energy for write through, write back, WBEU, and WTDU under the Practical scheme; y-axis from 0 to 1]
  - Write back: up to 20% saving over write through
  - WBEU: up to 60% saving over write through
  - WTDU: up to 55% saving over write through

Energy-optimal problem

Offline Energy-optimal Algorithm
  - Assumptions
    - Only two power states
    - Virtual time
    - Only one disk
  - Parameters
    - 1: active mode
    - 0: standby mode
    - b: the number of disk blocks
    - k: the number of cache blocks
    - n: the input size
    - m: threshold
  - Cache state (C, t, i)
    - The cache contains the blocks in set C after the first i+1 references, and the last t consecutive references were cache hits

Offline energy optimal algorithm
  - Minimize energy: maximize the time the disk can spend in standby mode
  - A(C,t,i): the maximum time that the disk spends in standby mode until (C,t,i) is reached
  - Dynamic programming: [recurrence missing from the extracted slide]
  - Extend to multiple disks: [formula missing from the extracted slide]

Time Breakdown
  [Chart; axis label: Mean Inter-arrival Time]

Simulation Results: Cello96
  - OPG: energy saving of 5% - 7% over Belady’s algorithm
  - PA-LRU: energy saving of 2% - 3%
  - Cello96: high cold miss ratio, larger than 65% for all disks

OPG is heuristic
  [Figure: cached blocks A, B, C; future access timeline D, A, B, E; D and E are bound-to-happen misses]

A Step Further…
  - Consider both miss ratio and energy penalty
  - Idea: don’t differentiate among blocks whose energy penalty is smaller than a threshold T
    - Energy penalty smaller than T: round up to T
    - T = 0: pure greedy algorithm
    - T large enough: Belady’s algorithm

Data Centers: Service-based Computing
  [Diagram labels: Internet, Ethernet, web servers, database servers, SAN, local storage, storage servers]
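
Sketch: victim selection with a penalty threshold T

The thresholded variant on the "A Step Further…" slide can be sketched as below. This is a minimal illustration, not the authors' implementation. The names Candidate, energy_penalty, and choose_victim are hypothetical. Two things are assumptions: that the merged idle period has length |DA| + |AE| (service time ignored), and that ties among equal rounded penalties are broken by the longest future reference distance, which is inferred from the slide's statement that a large enough T yields Belady's algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    """A cached block considered for eviction (hypothetical structure)."""
    block: str
    energy_penalty: float   # extra disk energy if this block is evicted
    next_use_distance: int  # future reference distance; larger = used later


def energy_penalty(energy: Callable[[float], float],
                   gap_before: float, gap_after: float) -> float:
    """E(DA) + E(AE) - E(DE) for a block A that would split idle period DE.

    `energy` maps an idle period length to the energy spent in it.
    Assumption: |DE| = |DA| + |AE|, i.e. service time is ignored.
    """
    return (energy(gap_before) + energy(gap_after)
            - energy(gap_before + gap_after))


def choose_victim(candidates: Sequence[Candidate], threshold: float) -> Candidate:
    """Pick the block with the smallest penalty after rounding up to threshold.

    Penalties below `threshold` are rounded up to it, so such blocks are not
    differentiated by energy. Assumed tie-break: farthest next use.
    """
    if not candidates:
        raise ValueError("no eviction candidates")

    def key(c: Candidate):
        rounded = max(c.energy_penalty, threshold)
        # Smaller rounded penalty first, then larger forward distance.
        return (rounded, -c.next_use_distance)

    return min(candidates, key=key)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    blocks = [
        Candidate("A", energy_penalty=5.0, next_use_distance=2),
        Candidate("B", energy_penalty=8.0, next_use_distance=10),
        Candidate("C", energy_penalty=30.0, next_use_distance=20),
    ]
    for t in (0.0, 10.0, 100.0):
        print(t, choose_victim(blocks, t).block)
```

With T = 0 the choice depends on energy penalty alone (the pure greedy case); as T grows, more blocks fall into the same rounded class and the forward distance decides among them, until the choice matches Belady's.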