Energy Conservation in Datacenters through Cluster Memory Management and Barely-Alive Memory Servers

Vlasia Anagnostopoulou (vlasia@cs.ucsb.edu), Susmit Biswas, Alan Savage,
Ricardo Bianchini*, Tao Yang, Frederic T. Chong
Department of Computer Science, UC Santa Barbara
*Department of Computer Science, Rutgers University
Dependence on Internet-services…
• More online services
– new internet-services, email, informational sites, social networks…
• Example:
– Growth of web-search
[Figure: Rise in Daily Searches. Total Google searches by year (1998, 2001, 2004, 2007); data labels: 10,000; 500,000; 200,000,000; 450,000,000]
Environmental impact
• Internet-services live in datacenters
• Thousands of machines per datacenter
• Many datacenters across the globe
– Energy consumption: ~1.2% of the US total [Ref: EPA]
– Energy consumption growth:
Are Datacenters efficient?
• Strict performance standards for internet-services through SLAs
• Over-provisioning
– Machines are under-utilized most of the time
– Servers are inefficient at low or average utilization
[Ref: Barroso and Hölzle]
Current techniques for efficiency
• Under low load, reconfigure cluster
– Consolidate load into fewer machines
– Turn rest off
– Transition to low power idle state
– Memory is not accessible in these states
• Operate at lower frequency (VS)
• Performance problems
– For internet-services, the working set typically doesn’t shrink with load!
– Restarting is very slow (~sec) because it requires a reboot
– Memory has to be warmed up again afterwards
For web-search, memory is particularly critical
• Search dataset doesn’t change much with load!
• Searches have temporal locality -> Zipf distribution [Ref: Adamic]
• Intense database search
– May search up to hundreds of servers at a time
• But processing a search query is a fairly light CPU task
Memory can and should be managed wisely, in order not to lose performance!
Our technique for efficiency + performance:
• Barely-Alive state:
– CPU is turned off, memory is kept on
– Much lower power consumption
• Distributed middleware:
– Request distribution
– Transition servers to BA state
– Manage server memory content locally
– Allocate optimal memory to services globally
– Do not degrade performance (respect SLA)
Hardware requirements?
How would it be implemented
• Memory is accessed through the Memory Controller (MC)
• MC is on the CPU (bummer!)
• Install a small CPU on the NIC
• Memory accessible like DMA via new CPU + MC
• Turn off main CPU
Software requirements?
Basic request distribution algorithm in a self-managed cluster
• LARD: locality-aware request distribution [Ref: V.Pai+others]
• PRESS: its distributed version [Ref: Carrera+Bianchini]
• Main idea:
• Exploit locality in references by forwarding same requests to same machine
• But balance load evenly among machines
[Figure: servers comparing load; speech bubble "I am less loaded than", server label "Server 3"]
Challenges of integrating BA servers into the request distribution scheme:
• Transition from Active to BA and vice versa
• Stale content of BA servers
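The locality-aware idea (send the same request to the same machine, but move it when that machine is overloaded) can be sketched as follows. This is a simplified illustration in the spirit of LARD, not the published algorithm verbatim; the `Dispatcher` class, the load metric (active requests per server), and the thresholds `T_LOW` and `T_HIGH` are assumptions made for the example.

```python
# Simplified locality-aware request distribution (illustrative sketch).
# T_LOW / T_HIGH are hypothetical thresholds on active requests per server.
T_LOW, T_HIGH = 25, 65


class Dispatcher:
    def __init__(self, servers):
        self.load = {s: 0 for s in servers}  # active requests per server
        self.assigned = {}                   # request target -> server

    def _least_loaded(self):
        return min(self.load, key=self.load.get)

    def dispatch(self, target):
        """Pick a server for `target`, preferring the one that served it before."""
        server = self.assigned.get(target)
        if server is None:
            # First time we see this target: place it on the least-loaded server.
            server = self._least_loaded()
        else:
            # Keep locality unless the server is overloaded while another is
            # nearly idle, or the server is severely overloaded.
            imbalance = (self.load[server] > T_HIGH
                         and min(self.load.values()) < T_LOW)
            if imbalance or self.load[server] >= 2 * T_HIGH:
                server = self._least_loaded()
        self.assigned[target] = server
        self.load[server] += 1
        return server

    def complete(self, server):
        """Call when a request finishes on `server`."""
        self.load[server] -= 1
```

Repeated calls such as `Dispatcher(["s1", "s2"]).dispatch("/a")` keep returning the same server for `/a` until that server's load crosses the thresholds, at which point the target is reassigned to the least-loaded machine.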
Self-managed cluster with BA servers
• Transition Active to BA
– Application decides on global level
– Locally, if there are no processes or requests
– Make sure not to over-utilize active servers!
• Stale content of BA servers
– A store installs a new object (objects are immutable)
– Application may invalidate old objects at will
– On activation, a BA server updates its Directory from an active server
– Periodic activation or state swapping
– Space of obsolete objects can be reclaimed
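One possible reading of the stale-content handling above is sketched below. Everything here is an assumption for illustration: the `Server` class, the per-key version numbers, and the directory layout (key -> current version) are hypothetical and are not taken from the slides.

```python
# Illustrative sketch: immutable objects plus a directory that a barely-alive
# (BA) server refreshes from an active server when it is activated.
class Server:
    def __init__(self):
        self.directory = {}  # key -> current version (hypothetical layout)
        self.memory = {}     # (key, version) -> data; entries are never overwritten

    def store(self, key, data):
        """A store installs a new object version instead of modifying the old one."""
        version = self.directory.get(key, 0) + 1
        self.memory[(key, version)] = data
        self.directory[key] = version

    def lookup(self, key):
        """Return the data for the version the directory considers current, or None."""
        version = self.directory.get(key)
        return self.memory.get((key, version))

    def activate_from(self, active):
        """On activation, copy the active server's directory and reclaim the
        space of objects whose version is no longer current."""
        self.directory = dict(active.directory)
        obsolete = [kv for kv in self.memory
                    if self.directory.get(kv[0]) != kv[1]]
        for kv in obsolete:
            del self.memory[kv]
        return len(obsolete)
```

In this sketch a BA server that missed an update keeps serving nothing while asleep; after `activate_from` its old copy is recognized as obsolete and dropped, so the next lookup for that key is a miss rather than a stale hit.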
Optimal memory allocation? Multiple services? Energy efficiency?
Middleware for efficient memory management
• Optimal memory allocation
– Dynamically size memory to respect exactly the SLA requirement
– Translate SLA requirement -> target hit-ratio
– Use stack algorithm to predict optimal size from target hit-ratio
• Stack algorithm overview
– Measures contribution of cache size to the hit-ratio
– In a single pass, it calculates the cumulative hit-ratio for each size
Size   Hits   Hit-ratio
1      6/9    66.7%
2      1/9    77.8%
3      0/9    77.8%
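A minimal sketch of the stack algorithm for an LRU-managed memory follows. It assumes unit-sized objects and LRU replacement; the function names and the example trace are hypothetical, with the trace chosen so that the result matches the hit-ratios listed above (66.7%, 77.8%, 77.8%).

```python
from collections import Counter


def lru_hit_ratio_curve(trace, max_size):
    """One pass over the trace; returns cumulative hit ratios for sizes 1..max_size."""
    stack = []                 # LRU stack, most recently used object first
    hits_at_depth = Counter()  # stack depth -> number of hits at that depth
    for obj in trace:
        if obj in stack:
            depth = stack.index(obj) + 1  # 1-based stack distance
            hits_at_depth[depth] += 1
            stack.remove(obj)
        stack.insert(0, obj)              # referenced object moves to the top
    total = len(trace)
    curve, cumulative = [], 0
    for size in range(1, max_size + 1):
        # A memory of `size` objects hits every reference at depth <= size.
        cumulative += hits_at_depth[size]
        curve.append(cumulative / total)
    return curve


def smallest_size_for(curve, target_hit_ratio):
    """Smallest size whose predicted hit ratio meets the target, or None."""
    for size, ratio in enumerate(curve, start=1):
        if ratio >= target_hit_ratio:
            return size
    return None


# Hypothetical 9-reference trace: 6 hits at depth 1, 1 hit at depth 2.
curve = lru_hit_ratio_curve(list("aaaaaaaba"), max_size=3)
print([f"{r:.1%}" for r in curve])
print(smallest_size_for(curve, 0.75))
```

Because the hit counts are kept per stack depth, a single pass yields the predicted hit ratio for every size at once, which is what allows a target hit-ratio to be mapped back to a memory size.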
How to adapt the stack algorithm for resizing the
global cluster memory optimally?
Optimal memory allocation
• Distributed stack algorithm
– For each server:
– Keep track of memory size + hit ratio information
– On each time window, broadcast the size needed for the desired hit-ratio
– Resize local stack to the global average size
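The per-window step of the distributed version could look roughly like this. It is a sketch under stated assumptions: each server already has a local hit-ratio curve (index i holds the predicted hit ratio for size i+1, in some hypothetical size unit), a server that cannot reach the target reports its largest measured size, and the average is rounded up. None of these details are given in the slides.

```python
import math


def size_for_target(curve, target_hit_ratio):
    """Smallest size reaching the target; falls back to the largest measured size."""
    for size, ratio in enumerate(curve, start=1):
        if ratio >= target_hit_ratio:
            return size
    return len(curve)


def end_of_window(curves_by_server, target_hit_ratio):
    """Simulate one time window: every server broadcasts the size it needs,
    then all servers resize their local stack to the global average."""
    broadcasts = {server: size_for_target(curve, target_hit_ratio)
                  for server, curve in curves_by_server.items()}
    # Rounding up is an assumption, chosen to avoid under-provisioning.
    global_average = math.ceil(sum(broadcasts.values()) / len(broadcasts))
    return {server: global_average for server in broadcasts}


# Hypothetical local curves for three servers.
curves = {
    "s1": [0.50, 0.70, 0.80, 0.85],
    "s2": [0.60, 0.80, 0.90, 0.90],
    "s3": [0.20, 0.40, 0.60, 0.70],
}
print(end_of_window(curves, target_hit_ratio=0.80))
```

Averaging keeps every server at the same size, which suits a cluster where locality-aware distribution spreads the working set evenly; a skewed cluster might need a different aggregation.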
Extension for BA servers, variable sized objects, multiple
services…
Extension of distributed Stack Algo
• Include BA servers
– Contribute fixed amount of memory (passive)
• Multiple-size objects
– Separate stack for each object size
– This leverages directory look-up
• Multiple services
– Each service keeps its own stack in memory
– Memory partitioned across services
Energy efficiency of BA state (without the efficiency yielded from the memory management)
Power savings potential
• Synthetic search trace over 1 day (24h)
[Figure: Cumulative power savings]
Future work
• Currently looking into more online and offline apps (e.g. web-translation, sorting algorithm)
• Extend power consumption breakdown
• Sensitivity analysis of power savings to the simulation’s parameters
– (e.g. memory capacity, network assumptions, component access times, etc.)
• Evaluation of distributed algorithm
Conclusions
• Datacenters have a growing impact on the environment
• Machines in datacenters are inefficient
• Memory is critical to the performance of applications running on a cluster
• Exploit memory without degrading performance with Barely-Alive state + middleware
• Potential power savings up to 49%, without loss of performance
Questions?
• Thank you for your attention!
• vlasia@cs.ucsb.edu
• www.cs.ucsb.edu/~arch