Power Cost Reduction in Distributed Data Centers
Yuan Yao
University of Southern California
Joint work: Longbo Huang, Abhishek Sharma,
Leana Golubchik and Michael Neely
IBM Student Workshop for Frontiers of Cloud Computing 2011
Paper to appear in Infocom 2012
Background and motivation
• Data centers are growing in number and size…
– Number of servers: Google (~1M)
– Data centers built in multiple locations
• IBM owns and operates hundreds of data centers worldwide
• …and in power cost!
– Google spends ~$100M/year on power
– Goal: reduce power cost while maintaining QoS
Existing Approaches
• Power efficient hardware design
• System design/Resource management
– Use existing infrastructure
– Exploit options in routing and resource management within
data centers
Existing Approaches
• Power cost reduction through algorithm design
– Server level: power-speed scaling [Wierman09]
– Data center level: rightsizing [Gandhi10, Lin11]
– Inter data center level: geographical load balancing [Qureshi09]
Our Approach: SAVE
• We provide a framework that allows us to exploit options at
all of these levels
[Figure: SAVE exploits options at the server level, the data center level, and the inter data center level, following jobs from arrival to service while exploiting the volatility of power prices]
Our Model: data center and workload
• M geographically distributed data centers
• Each data center contains a front end server and a back end cluster
• Workloads Ai(t) (i.i.d.) arrive at the front end servers and are routed
to one of the back end clusters
Our Model: server operation and cost
• Back end cluster of data center i contains Ni servers
– Ni(t) servers active
• Service rate of each active server: bi(t) ∈ [0, bmax]
• Power price at data center i: pi(t) (i.i.d.)
• Power usage at data center i: Pi(t), which grows with Ni(t) and bi(t)
• Power cost at data center i: fi(t) = pi(t) · Pi(t)
Our Model: two time scale
• The system we model operates on two time scales
– At t = kT, change the number of active servers Nj(t)
– In every time slot, change the service rate bj(t)
Our Model: summary
• Input: power prices pi(t), job arrivals Ai(t)
• Two time scale control actions: routing decisions μij(t) and service
rates bj(t) every slot; active server counts Nj(t) every T slots
• Queue evolution: Qj(t+1) = max[Qj(t) − Nj(t)bj(t), 0] + Σi μij(t)
• Objective: minimize the time average power cost,
subject to all constraints on Π, and queue stability
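As a concrete illustration, here is a minimal sketch of one slot of queue evolution under the standard max[·, 0] queue dynamics; the function and variable names are illustrative, not from the paper:

```python
# One slot of back end queue evolution:
#   Q_j(t+1) = max(Q_j(t) - N_j(t)*b_j(t), 0) + sum_i mu_ij(t)
def queue_update(Q, N, b, mu):
    """Q  : current queue sizes Q_j(t), one per back end cluster j
    N  : active server counts N_j(t)
    b  : per-server service rates b_j(t)
    mu : mu[i][j], jobs routed from front end i to back end j this slot
    Returns the queue sizes at t+1."""
    arrivals = [sum(mu[i][j] for i in range(len(mu))) for j in range(len(Q))]
    return [max(Q[j] - N[j] * b[j], 0) + arrivals[j] for j in range(len(Q))]
```

For example, a cluster with backlog 10, two active servers at rate 3, and 4 newly routed jobs ends the slot with max(10 − 6, 0) + 4 = 8 jobs queued.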
SAVE: intuitions
• SAVE operates at both front end and back end
• Front end routing:
– When the back end queue Qj(t) is small, choose μij(t) > 0
• Back end server management:
– Choose small Nj(t) and bj(t) to reduce the power cost fj(t)
– When the queue Qj(t) is large, choose large Nj(t) and bj(t) to stabilize
the queue
SAVE: how it works
• Front end routing:
– In every time slot t, choose μij(t) to maximize a queue-differential
weighted sum, sending jobs toward short back end queues
• Back end server management: Choose V>0
– At time slots t = kT, choose Nj(t) to minimize a drift-plus-penalty
expression: V times the expected power cost minus the backlog-weighted
service provided
– In every time slot τ, choose bj(τ) to minimize the analogous per-slot
expression
• Serve jobs and update queue sizes
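The back end decisions above follow the standard Lyapunov drift-plus-penalty pattern. The sketch below is a simplified rendering of that pattern, not the paper's exact expressions; in particular the power model inside `power()` is an invented placeholder:

```python
# Drift-plus-penalty sketch: pick (N_j, b_j) by grid search to minimize
#   V * price * power(N, b)  -  Q_j * N * b
# i.e. trade scaled power cost against backlog-weighted service.
def choose_server_config(Q_j, price, V, N_max, b_max, b_grid=10):
    def power(N, b):
        # Placeholder power model: per-server idle draw plus rate-dependent draw.
        return N * (100.0 + 50.0 * b)

    best, best_val = (0, 0.0), float("inf")
    for N in range(N_max + 1):
        for k in range(b_grid + 1):
            b = b_max * k / b_grid
            val = V * price * power(N, b) - Q_j * N * b
            if val < best_val:
                best, best_val = (N, b), val
    return best
```

The behavior matches the intuition slide: with an empty queue the minimizer turns everything off, while a large backlog drives the choice toward the maximum number of servers and the maximum service rate.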
SAVE: performance
• Theorem on performance of our approach:
– Average delay of SAVE is O(V)
– Power cost of SAVE ≤ power cost of OPTIMAL + O(1/V)
– OPTIMAL can be any scheme that stabilizes the queues
• V controls the trade-off between average queue size
(delay) and average power cost.
• SAVE is suited for delay tolerant workloads
Experimental Setup
• We simulate data centers at 7 locations
– Real world power prices
– Poisson arrivals
• We use synthetic workloads that mimic MapReduce jobs
• Power cost = power price × power usage
– Power usage includes the consumption of active servers and of
servers in sleep
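The cost structure above can be written out directly. The default wattage figures below are invented for illustration only:

```python
# Power cost per slot = price * usage, where usage counts both active
# servers and sleeping servers (default wattages are illustrative).
def power_cost(price, n_active, n_total, p_active=200.0, p_sleep=10.0):
    """price    : power price this slot ($/unit of energy)
    n_active : number of active servers
    n_total  : total servers (the rest are asleep)"""
    usage = n_active * p_active + (n_total - n_active) * p_sleep
    return price * usage
```

For example, 3 active servers out of 5 at a price of 0.1 cost 0.1 × (3·200 + 2·10) = 62 per slot.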
Experimental Setup: Heuristics for comparison
• Local Computation
– Send jobs to local back end
• Load Balancing
– Evenly split jobs across all back ends (all servers are activated)
• Low Price (similar to [Qureshi09])
– Send more jobs to places with low power prices
• Instant On/Off
– Routing is the same as Load Balancing
– Data center i tunes Ni(t) and bi(t) every time slot to minimize its
power cost
– No additional cost for activating servers or putting them to sleep
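The routing behavior of the first three baselines can be sketched as simple routing-weight rules; this is a toy rendering for intuition, not the paper's implementation (Instant On/Off routes exactly like Load Balancing):

```python
# Toy routing weights over the M back ends for three of the baselines.
def route(policy, local_idx, prices):
    """Return the fraction of jobs sent to each back end cluster."""
    M = len(prices)
    if policy == "local":        # Local Computation: everything stays local
        return [1.0 if j == local_idx else 0.0 for j in range(M)]
    if policy == "balance":      # Load Balancing: even split across back ends
        return [1.0 / M] * M
    if policy == "low_price":    # Low Price: weight inversely to power price
        inv = [1.0 / p for p in prices]
        s = sum(inv)
        return [w / s for w in inv]
    raise ValueError(policy)
```

The inverse-price weighting under "low_price" is one plausible way to "send more jobs to places with low power prices"; any monotone decreasing weighting would fit the description.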
Experimental Results
[Figure: relative power cost reduction compared to Local Computation]
• As V increases, the power cost reduction grows from ~0.1% to ~18%
• SAVE is more effective for delay tolerant workloads.
Experimental Results: Power Usage
• We record the actual power usage (not cost) of all
schemes in our experiments
• Our approach saves power usage
Conclusions
• We propose a two time scale, non work conserving control
algorithm aimed at reducing power cost in distributed data centers
• Our work facilitates an explicit power cost vs. delay trade-off
• We derive analytical bounds on the time average power cost and
service delay achieved by our algorithm
• Through simulations we show that our approach can reduce the
power cost by as much as 18%, and that it reduces power usage as well
Future work
• Other problems on power reduction in data centers
– Scheduling algorithms to save power
– Delay sensitive workloads
– Virtualized environment, when migration is available
• Please check out our paper:
– "Data Centers Power Reduction: A Two Time Scale
Approach for Delay Tolerant Workloads", to appear in
Infocom 2012
• Contact info:
[email protected]