Reducing Energy Consumption of Disk Storage
Using Power Aware Cache Management
Qingbo Zhu, Francis M. David, Christo F. Devaraj, Zhenmin Li, Yuanyuan Zhou
Department of Computer Science
University of Illinois at Urbana-Champaign
Pei Cao*
*Cisco Systems Inc.
HPCA’04
02/17/2004
Data Centers: Service-based Computing
[Figure: data center diagram with web servers, application servers, database servers, and storage servers, connected through a router, a switch, and a SAN]
Energy Problem Faced by Data Centers
Data centers
  High electricity bills: up to 25% of TCO
  $8M per year for a 30,000-square-foot data center [EERE news 2003]
  Increase as much as 25% annually [Energy User News 2002]
Storage
  27% of the total energy consumed [Maximum Inc. 2002]
Disk Power Model
Disk power modes
  Active/idle/standby/sleep
  Spinup/down cost
  Breakeven time
Metrics
  Energy consumption
  Average response time
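The breakeven time named above can be illustrated with a small calculation. This is a minimal sketch under a simplified two-mode model; the function name and all numbers are assumptions for illustration, not values from the slides. It equates the energy of staying idle for T seconds with the energy of one spin-down/spin-up round trip plus standby power for the rest of the period.

```python
def breakeven_time(p_idle, p_standby, e_transition, t_transition):
    """Idle length (s) at which spinning down costs the same as staying idle.

    Assumed model (not from the slides):
      stay idle:  p_idle * T
      spin down:  e_transition + p_standby * (T - t_transition)
    """
    if p_idle <= p_standby:
        raise ValueError("standby must draw less power than idle")
    t = (e_transition - p_standby * t_transition) / (p_idle - p_standby)
    # The idle period must at least cover the transition itself.
    return max(t, t_transition)


# Hypothetical disk: 10 W idle, 2 W standby, 150 J and 12 s per round trip.
t_be = breakeven_time(p_idle=10.0, p_standby=2.0,
                      e_transition=150.0, t_transition=12.0)
print(t_be)
```

Idle periods shorter than this value are cheaper to ride out at idle power; longer ones are cheaper in standby.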
Disk Power Management Schemes
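Oracle scheme (off-line)
  Spin down right after access1 only if the idle time until access2 exceeds BreakEvenTime
Practical scheme (on-line)
  Spin down after the disk has been idle for BreakEvenTime
  The next access then sees a wait time while the disk spins up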
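The two schemes can be compared on a single idle period. The sketch below uses an assumed two-mode energy model with hypothetical parameters (none of the numbers come from the slides); it only illustrates why the on-line threshold scheme spends more energy than the oracle on long idle periods and the same on short ones.

```python
def oracle_energy(idle, p_idle, p_standby, e_trans, t_trans, t_be):
    """Off-line scheme: the idle length is known in advance."""
    if idle > t_be:
        # Spin down immediately; the round trip costs e_trans and takes t_trans.
        return e_trans + p_standby * (idle - t_trans)
    return p_idle * idle


def practical_energy(idle, p_idle, p_standby, e_trans, t_trans, t_be):
    """On-line scheme: spin down only after idling for the threshold t_be."""
    if idle <= t_be:
        return p_idle * idle
    low_power_time = max(0.0, idle - t_be - t_trans)
    return p_idle * t_be + e_trans + p_standby * low_power_time


# Hypothetical parameters; t_be is the breakeven time for these values.
PARAMS = dict(p_idle=10.0, p_standby=2.0, e_trans=150.0, t_trans=12.0, t_be=15.75)

for idle in (10.0, 100.0):
    print(idle, oracle_energy(idle, **PARAMS), practical_energy(idle, **PARAMS))
```

The model ignores the response-time side of the trade-off: under the practical scheme the access that ends the idle period also has to wait for the spinup.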
Current Research Status
The idle periods in server workloads are too short to justify the high spinup/down cost of server disks [ISCA’03][ISPASS’03][ICS’03]
  IBM Ultrastar 36Z15: 135J/10.9s
Multi-speed disk model [ISCA’03]
  RPMs: multiple intermediate power modes
  Smaller spinup/down costs
  Able to save energy for server workloads
Most previous work assumes that all requests go directly to physical disks
Observation
Many requests are filtered out by the storage cache
  EMC Symmetrix storage system: up to 128GB storage cache
  IBM ESS system: up to 64GB storage cache
Cache replacement and write policies affect the access sequences to physical disks
[Figure: block-based storage system]
The Focus of Our Paper
Power-aware off-line and on-line cache replacement algorithms and write policies
  Reduce the disk energy consumption
Clarification
  The underlying disk power management scheme is NOT changed
  The storage cache is always active
Outline
Motivation
Power aware cache management
  Belady’s algorithm is NOT energy-optimal
  Off-line power-aware greedy algorithm
  On-line power-aware algorithm
  Four write policies
Simulations
Conclusion
Limitations and future work
Breakeven-Time for Multiple Power Modes
[Figure: energy consumption E(T) versus idle period length T, with one line per power mode (active mode, mode 0 to mode 3); marked on the plot: spinup cost and the times t1, t2, t3]
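One way to read the multi-mode breakeven picture is as a set of straight lines, one per mode, whose lower envelope gives the least energy for an idle period of length T. The sketch below assumes each mode is described by a power draw and a round-trip transition energy; the mode table is hypothetical, and the linear model is an assumption rather than the exact model of the talk.

```python
# (name, power in W, round-trip transition energy in J): hypothetical values.
MODES = [
    ("mode 0", 10.0, 0.0),
    ("mode 1", 8.0, 20.0),
    ("mode 2", 6.0, 60.0),
    ("mode 3", 4.0, 120.0),
]


def mode_energy(mode, idle):
    """Energy of spending an idle period of length `idle` in one mode."""
    _, power, transition = mode
    return transition + power * idle


def min_energy(idle, modes=MODES):
    """Lower envelope E(T): cheapest mode for this idle period."""
    best = min(modes, key=lambda m: mode_energy(m, idle))
    return mode_energy(best, idle), best[0]


def breakeven_times(modes=MODES):
    """Idle lengths at which each deeper mode starts to pay off."""
    times = []
    for (_, p_hi, c_hi), (_, p_lo, c_lo) in zip(modes, modes[1:]):
        times.append((c_lo - c_hi) / (p_hi - p_lo))
    return times


print(breakeven_times())   # crossing points between adjacent modes
print(min_energy(25.0))    # cheapest mode for a 25 s idle period
```

With these numbers the crossing points play the role of t1, t2, t3 on the slide: below t1 the disk should stay in mode 0, between t1 and t2 in mode 1, and so on.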
Is Belady’s Algorithm Energy-Optimal?
Belady’s algorithm: performance-optimal
  Minimizes the number of misses
  Evicts the block with the longest future reference distance
Answer: NO!
  Considers only the access sequence
  Ignores requests’ arrival times
  Ignores multi-disk scenarios
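For reference, Belady’s rule can be sketched in a few lines. Note what the function takes as input: only the order of references. It has no notion of when a request arrives or which disk it would hit, which is exactly the limitation listed above. The function name and the sample trace are illustrative assumptions.

```python
def belady_misses(refs, cache_size):
    """Count misses when evicting the block referenced farthest in the future."""
    cache = set()
    misses = 0
    for i, block in enumerate(refs):
        if block in cache:
            continue
        misses += 1
        if len(cache) >= cache_size:
            def next_use(b):
                # Position of the next reference to b, or infinity if none.
                for j in range(i + 1, len(refs)):
                    if refs[j] == b:
                        return j
                return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return misses


print(belady_misses(list("ABCDABEABCDE"), cache_size=3))
```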
A Simple Example
[Figure: timeline t of accesses to blocks A, B, C, D on Disk 0, comparing Belady’s algorithm with a power-aware algorithm]
An energy-optimal algorithm using dynamic programming
Off-line Power-Aware Greedy Algorithm
Idea: evict the block with the smallest energy penalty
Observation: take advantage of knowledge about future bound-to-happen misses
  Cold misses
  Capacity misses due to previous evictions
[Figure: access example with blocks A, B, C and future accesses D, A, E, B, F. D, E, F: bound-to-happen misses]
How to Calculate Energy Penalty of Evicting a Block
Energy Penalty (A) = E(DA) + E(AE) - E(DE)
Energy Penalty (B) = E(EB) + E(BF) - E(EF)
[Figure: timeline of accesses D, A, E, B, F, where D, E, F are bound-to-happen misses]
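The penalty formulas can be tried out numerically. The sketch below assumes all five accesses go to the same disk, takes E(T) to be the lower envelope of a hypothetical linear mode table, and invents the access times; only the shape of the formula (two short idle periods versus the one long period they would otherwise merge into) comes from the slide.

```python
# (power in W, round-trip transition energy in J): hypothetical mode table.
MODES = [(10.0, 0.0), (8.0, 20.0), (6.0, 60.0), (4.0, 120.0)]


def idle_energy(t):
    """E(T): least energy over all modes for an idle period of length t."""
    return min(cost + power * t for power, cost in MODES)


def energy_penalty(lead, follow):
    """Extra energy if a miss splits one idle period into `lead` and `follow`."""
    return idle_energy(lead) + idle_energy(follow) - idle_energy(lead + follow)


# Assumed access times (s) for the timeline D, A, E, B, F.
access_times = {"D": 0.0, "A": 2.0, "E": 4.0, "B": 24.0, "F": 44.0}

penalties = {
    "A": energy_penalty(access_times["A"] - access_times["D"],
                        access_times["E"] - access_times["A"]),
    "B": energy_penalty(access_times["B"] - access_times["E"],
                        access_times["F"] - access_times["B"]),
}
victim = min(penalties, key=penalties.get)
print(penalties, victim)
```

With these times, evicting A costs nothing extra because the disk is busy around A anyway, while evicting B would break a long idle period in two. Belady’s rule would pick B instead, since B is referenced later.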
Review
[Figure: energy consumption versus idle period length for mode 0 to mode 3, with times t1, t2, t3]
On-line Power Aware Algorithm
[Figure: plot over idle period length for mode 0 to mode 3, with times t1 to t4; annotations: super-linear energy saving; energy penalty << energy saving]
Idea: selectively keep blocks from inactive disks in the cache for a longer time
  Make “inactive disks” more inactive
How to Measure Disk Activeness?
Characteristics of inactive disks
  Small percentage of cold misses
  Large idle period lengths with high probability
How to Keep Track of Cold Misses?
Bloom Filter: a space-efficient membership test method
  A vector v of m bits
  k independent hash functions with range {1..m}
  Given an access to block a, check the bits at positions h1(a), h2(a), ..., hk(a)
    If any of them is 0, a is a cold miss; then set all of these bits to 1
    Otherwise, a is not treated as a cold miss, though this may be wrong (a false positive)
  For 1.6M blocks with v = 2M bytes and k = 7, the accuracy is 99.18%
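The cold-miss test above maps directly onto a few lines of code. This is a minimal sketch, not the implementation used in the talk: the class name is hypothetical, the k hash functions are simulated by salting SHA-256 (an assumption), and positions run from 0 to m-1 rather than 1 to m. The helper at the end uses the standard false-positive approximation (1 - e^(-kn/m))^k.

```python
import hashlib
import math


class ColdMissFilter:
    """Bloom filter that classifies each access as cold miss or not."""

    def __init__(self, m_bits, k):
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, block):
        # Assumption: k salted SHA-256 digests stand in for k independent hashes.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{block}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def access(self, block):
        """Return True if the block looks like a cold miss, then record it."""
        cold = False
        for pos in self._positions(block):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                cold = True
                self.bits[byte] |= mask
        return cold


def false_positive_rate(n_items, m_bits, k):
    """Standard approximation for a Bloom filter holding n_items."""
    return (1.0 - math.exp(-k * n_items / m_bits)) ** k


f = ColdMissFilter(m_bits=1024, k=3)
print(f.access(42), f.access(42))
print(false_positive_rate(1.6e6, 16e6, 7))
```

Reading 2M bytes as 16,000,000 bits, the approximation gives an error rate of roughly 0.82% for 1.6M blocks and k = 7, which is consistent with the 99.18% accuracy quoted on the slide.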
How to Keep Track of the Distribution of Idle Period Lengths?
[Figure: histogram over idle period length]
Histogram-based estimation
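A histogram-based estimator can be sketched as a set of counters over idle-length bins, from which an approximate cumulative distribution is read off. The class name, the bin edges, and the sample values below are assumptions for illustration; the slides do not give the bin layout.

```python
import bisect


class IdleHistogram:
    """Counts idle periods per bin and estimates their cumulative distribution."""

    def __init__(self, edges):
        self.edges = list(edges)                  # ascending bin upper bounds (s)
        self.counts = [0] * (len(self.edges) + 1)  # last bin: beyond every edge
        self.total = 0

    def record(self, idle):
        """Add one observed idle period to the bin that covers its length."""
        self.counts[bisect.bisect_left(self.edges, idle)] += 1
        self.total += 1

    def cdf(self, x):
        """Estimated P(idle <= x); exact at bin edges, a lower bound between them."""
        if self.total == 0:
            return 0.0
        n_bins = bisect.bisect_right(self.edges, x)
        return sum(self.counts[:n_bins]) / self.total


h = IdleHistogram(edges=[1, 5, 10, 20])
for idle in [0.5, 3, 3, 12, 25, 40, 8, 30]:
    h.record(idle)
print(h.cdf(10), 1 - h.cdf(10))
```

A disk for which a large share of idle periods exceeds some length of interest (here, half of them exceed 10 s) would count as having large idle periods with high probability.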
Case Study: PA-LRU
Applies to all cache replacement algorithms
  LRU, 2Q, MQ etc.
PA-LRU
  Two LRU stacks
    LRU0: blocks from active disks
    LRU1: blocks from inactive disks
    Evict blocks from LRU0 first
  The evaluation of disk activeness is epoch-based
    Adapt to workload changes
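The two-stack structure can be sketched with two ordered dictionaries. This is a simplified illustration, not the authors’ implementation: the class name is hypothetical, the set of inactive disks is passed in as a fixed input instead of being re-evaluated every epoch, and blocks are not migrated between stacks when a disk changes class.

```python
from collections import OrderedDict


class PALRU:
    """Two LRU stacks; blocks from active disks are evicted first."""

    def __init__(self, capacity, inactive_disks=()):
        self.capacity = capacity
        self.inactive = set(inactive_disks)  # assumed fixed here, epoch-based in the talk
        self.lru0 = OrderedDict()            # blocks from active disks
        self.lru1 = OrderedDict()            # blocks from inactive disks

    def access(self, disk, block):
        """Touch a block; return True on a cache hit."""
        key = (disk, block)
        hit = key in self.lru0 or key in self.lru1
        self.lru0.pop(key, None)
        self.lru1.pop(key, None)
        if not hit and len(self.lru0) + len(self.lru1) >= self.capacity:
            # Evict from LRU0 first; fall back to LRU1 only when LRU0 is empty.
            victim_stack = self.lru0 if self.lru0 else self.lru1
            victim_stack.popitem(last=False)
        target = self.lru1 if disk in self.inactive else self.lru0
        target[key] = True                   # most recently used goes last
        return hit


cache = PALRU(capacity=2, inactive_disks={1})
cache.access(1, "x")          # block from the inactive disk
cache.access(0, "a")
cache.access(0, "b")          # evicts (0, "a"), not the older (1, "x")
print(cache.access(1, "x"))   # still cached, so disk 1 is not disturbed
```

A plain LRU of the same size would have evicted the block from disk 1 at the third access and forced that disk to serve a miss later.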
Write Policy
Write back
Write through
Write back with eager updates (WBEU)
  Eagerly write back all the dirty blocks when the target disk becomes active due to a read miss
Write through with deferred updates (WTDU)
  Use a log disk which is always active
  Write the blocks to the log disk if the target disk is not active
  Flush back all the logged blocks when the target disk becomes active due to a read miss
  Retain persistent semantics
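The WTDU rules can be traced with a toy model. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, disks and the log are plain dictionaries, and spin-down is not modeled. It only shows the routing decision (write to the log while the target disk is inactive) and the flush that happens once a read miss wakes the disk.

```python
class WTDU:
    """Toy model of write through with deferred updates."""

    def __init__(self):
        self.log = {}        # disk -> {block: data} held on the always-active log disk
        self.disk_data = {}  # disk -> {block: data} on the data disks
        self.active = set()  # disks currently spun up

    def write(self, disk, block, data):
        """Write through to the disk if it is active, otherwise to the log."""
        if disk in self.active:
            self.disk_data.setdefault(disk, {})[block] = data
        else:
            self.log.setdefault(disk, {})[block] = data

    def read_miss(self, disk, block):
        """A read miss activates the disk; logged blocks are flushed back."""
        self.active.add(disk)
        self.disk_data.setdefault(disk, {}).update(self.log.pop(disk, {}))
        return self.disk_data[disk].get(block)


w = WTDU()
w.write(0, "a", 1)        # disk 0 is inactive: goes to the log
w.read_miss(0, "z")       # wakes disk 0 and flushes the log
w.write(0, "b", 2)        # disk 0 is now active: written through
print(w.log, w.disk_data)
```

Every write reaches persistent storage immediately, either on the log disk or on the target disk, which is how the policy keeps write-through semantics without waking an inactive disk.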
Evaluation Methodology
Experiment setup
  DiskSim:
    IBM Ultrastar 36Z15
    Enhanced by a multi-speed disk power model
    Enhanced by a CacheSim
  Real system traces:
    OLTP
    Cello96
  Synthetic traces:
    Exponential distribution
    Pareto distribution
Energy (OLTP)
[Figure: bar chart of energy for Infinite size, Belady, OPG, LRU, and PA-LRU under the Practical and Oracle schemes; y-axis from 0 to 1]
OPG: 2%-9% energy saving over Belady’s algorithm
PA-LRU: 16% energy saving over LRU
Average Response Time (OLTP)
[Figure: bar chart of average response time for Infinite size, Belady, OPG, LRU, and PA-LRU under the Practical scheme; y-axis from 0 to 1]
OPG: 4% better than Belady’s algorithm
PA-LRU: 50% better than LRU (avoids expensive spinups)
Conclusion
Power aware cache management plays an important role in disk energy consumption
  Belady’s algorithm is NOT energy-optimal
  Evict the blocks with small energy penalty
  Make inactive disks more inactive
Future Work and Acknowledgements
Limitations and future work
  Design online algorithms for a single disk as well
  Take prefetching into account
  Real system experiments
Acknowledgements
  Anonymous reviewers
  Professor Lenny Pitt (UIUC)
  CMU Parallel Data Lab (for DiskSim)
  HP Labs (for Cello Trace)
Questions?
Thanks!
Backup Slides
Write Policies (Exponential Distribution)
[Figure: bar chart comparing Write through, Write back, WBEU, and WTDU under the Practical scheme; y-axis from 0 to 1]
Write back: up to 20% saving over write through
WBEU: up to 60% saving over write through
WTDU: up to 55% saving over write through
Energy-optimal problem
Offline Energy-optimal Algorithm
Only two power states
  1: active mode
  0: standby mode
Virtual time
Only one disk
Parameters:
  b: the number of disk blocks
  k: the number of cache blocks
  n: the input size
  m: threshold
Cache State (C, t, i)
The cache contains the blocks in set C after the first i+1 references, and the last t consecutive references were cache hits
Offline energy optimal algorithm
Minimize energy: maximize the time the disk can spend in standby mode
A(C,t,i): the maximum time that the disk spends in the standby mode until (C,t,i) is reached
Dynamic programming:
Extend to multiple disks:
Time Breakdown
Mean Inter-arrival Time
Simulation Results: Cello96
OPG: energy saving 5% - 7% over Belady’s algorithm
PA-LRU: energy saving 2% - 3%
Cello96: high cold miss ratio, larger than 65% for all disks
OPG is heuristic
[Figure: access example with blocks A, B, C, D, A, B, E. D, E: bound-to-happen misses]
A Step Further…
Consider both miss ratio and energy penalty
Idea: don’t differentiate among blocks whose energy penalty is smaller than a threshold T
  Energy penalty smaller than T: round up to T
  T = 0: pure greedy algorithm
  T large enough: Belady’s algorithm
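The effect of the threshold T can be seen in a small victim-selection sketch. It assumes, as the two limiting cases suggest, that blocks whose rounded penalties tie are separated by Belady’s rule (farthest next reference); the function name and the candidate values are hypothetical.

```python
def choose_victim(candidates, threshold):
    """Pick the block to evict.

    candidates: list of (block, energy_penalty, next_reference_distance).
    Penalties below `threshold` are rounded up to it; ties are broken by the
    farthest next reference (assumed tie-break, following Belady's rule).
    """
    def key(candidate):
        _, penalty, distance = candidate
        return (max(penalty, threshold), -distance)

    return min(candidates, key=key)[0]


# Hypothetical candidates: (block, energy penalty in J, next reference distance).
candidates = [("A", 0.0, 5), ("B", 30.0, 50), ("C", 80.0, 100)]

print(choose_victim(candidates, threshold=0.0))    # pure greedy on energy penalty
print(choose_victim(candidates, threshold=40.0))   # A and B tie, farther one wins
print(choose_victim(candidates, threshold=100.0))  # all tie, same as Belady
```

Raising T trades energy awareness for a lower miss ratio: more blocks fall into the tie group, and within that group the choice is made purely on reference distance.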
Data Centers: Service-based Computing
[Figure: data center diagram with the Internet, web servers, Ethernet, database servers with local storage, SAN, and storage servers]