Energy Efficient Geographical Load Balancing via Dynamic Deferral of Workload
Muhammad Abdullah Adnan, Ryo Sugihara (Amazon.com), Rajesh K. Gupta
Department of CSE, University of California San Diego (UCSD)
Adnan . IEEE CLOUD 2012

Data Centers: Energy Consumption
• Energy expenses are becoming increasingly important
  – Data centers: 61 million MWh per year, costing about 4.5 billion dollars, and growing very fast
  – Millions of dollars for companies every year
• Increasing energy prices and the rise of cloud computing
  – Energy-efficient cloud
• Significant research on improving energy efficiency

Geographical Load Balancing
• Cloud computing can be utilized for energy-efficient computing.
  – Increasing energy prices.
  – Ability to dynamically track these price variations.
• Geographical load balancing techniques have been suggested for data centers hosting cloud computation
  – Exploit the electricity price differences across regions.

Qureshi et al. [ACM SIGCOMM 2009]
• Geographical Load Balancing
  – Reducing the electricity cost in a wholesale market environment.
  – Lower the electricity bill by adapting the load balancing to dynamic electricity price variation.
• Electricity Markets
  – Day-ahead markets (futures)
    • Hourly price predicted for the following day
  – Real-time markets (spot) [our work]
    • Prices are calculated every five minutes, based on actual conditions rather than expectations.
    • More volatile: provides opportunities for savings.

Buchbinder et al.’s Approach [IFIP Networking 2011]
• Online algorithms for migrating jobs between data centers
  – Fundamental tradeoff between energy and bandwidth costs.
• Sophisticated methods to reduce the computational complexity of the proposed heuristics.
• Drawbacks
  – Elementary cost vectors
    • Large number of iterations
  – Discretization of the continuous update rule
    • Computationally costly
  – Bounded competitive ratio
    • Constant/fixed workload: √
    • Varying workload: x
  – No deadline requirement.

Liu et al.’s Algorithm [ACM SIGMETRICS 2011]
• Distributed algorithms for geographical load balancing
  – Multiple sources of workload.
  – Incorporated capacity provisioning inside data centers
    • Only homogeneous servers
• Investigated how renewable energy can be used to lower the electricity price of brown energy.
• Drawbacks
  – No bound on the maximum delay.
  – No workload migration.

Dynamic Deferral
• Cloud Computing and Mobile Computing
  – More and more computation is being outsourced to the cloud.
  – Different types of workload
    • Delay sensitive, response time/throughput guarantee, completion time/deadline requirement.
• Service level agreement (SLA)
  – Latency requirement
  – Often has some flexibility
• We use the flexibility in different SLAs for geographical load balancing to reduce energy consumption.
  – Defer some of the workload to execute later, when the electricity price is low
  – Utilize the slack in the execution of jobs for energy savings.

Assumptions
• Temporal and geographical variation of electricity prices.
  – Variation is unpredictable.
  – Migrate jobs between data centers
    • Cloud service providers keep many replicas of their data.
• We consider data centers as computation units.
  – Homogeneous/heterogeneous
• Workloads arrive at a central dispatcher.
  – Dispatcher cannot store workload
  – Makes the load balancing decision
[Figure: cloud and dispatcher]

Geographical Load Balancing
[Figure: the dispatcher assigns workload x_{i,d,t} to data center i; workload z_{i,j,d,t} migrates from data center i to data center j]
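The deferral idea from the Dynamic Deferral slide can be illustrated with a toy calculation. The sketch below is not the authors' algorithm: it assumes a single data center, prices that are known in advance, a purely linear cost (price times load), and no capacity limit. The function names `cheapest_slot` and `deferral_cost` and all numbers are hypothetical, chosen only to show how a deadline of a few slots lets workload move to cheaper slots.

```python
def cheapest_slot(prices, release, deadline):
    """Index of the cheapest slot in the window [release, release + deadline]."""
    # Clip the window at the end of the known price horizon.
    window_end = min(release + deadline, len(prices) - 1)
    return min(range(release, window_end + 1), key=lambda t: prices[t])


def deferral_cost(prices, workloads, deadline):
    """Total cost when each slot's workload runs in its cheapest allowed slot.

    Assumption (not from the slides): prices are fully known and the cost
    of running `load` units in slot t is simply load * prices[t].
    """
    total = 0.0
    for release, load in enumerate(workloads):
        total += load * prices[cheapest_slot(prices, release, deadline)]
    return total


# Hypothetical prices per slot and workload released per slot.
prices = [5.0, 3.0, 4.0, 1.0, 6.0]
workloads = [10, 10, 10, 10, 10]

no_deferral = deferral_cost(prices, workloads, deadline=0)    # run immediately
with_deferral = deferral_cost(prices, workloads, deadline=2)  # may wait up to 2 slots
print(no_deferral, with_deferral)
```

With these made-up numbers, allowing a two-slot deadline lowers the total from 190 to 120. In the online setting of the talk the future prices are unknown, so this is only an upper bound on what deferral alone could save in such a toy case.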
Model Formulation
• Workload Model
  – Workload L_t released at time t has deadline D_t
• Cost Model
  – Energy cost
    • Proportional to the workload: $C_{i,t}(y_{i,t}) = \alpha_i + \beta_{i,t}\, y_{i,t}$
    • Piecewise linear function
  – Bandwidth cost
    • Cost of migration: $B_{i,t}(z_{i,j,t}) = b_{i,j}\, z_{i,j,t}$
[Figure: assignment x_{i,d,t} to data center i and migration z_{i,j,d,t} from i to j; workload L_t released at slot t with execution window t, t+1, ..., t+D]

Model Formulation
• Assumption: uniform deadline
  – The deadline is the same for all jobs
• The net amount of workload executed at data center i at time t is assigned + migrated in - migrated out:
  $y_{i,t} = x_{i,t} + \sum_{j=1}^{n}\sum_{d=1}^{D} z_{j,i,d,t-d} - \sum_{j=1}^{n}\sum_{d=1}^{D} z_{i,j,d,t-d}$

Offline Formulation
• Future prices known => there exists an optimal solution without migration.
  – Dispatcher can always make the correct assignment.
[Optimization problem shown on the slide. Annotations: objective = execution cost + migration cost; constraint: total assignment equals total released workload; constraint: total migration cannot exceed total assignment]

Online Challenges
[Figure: timeline from 0 to t; x_t and z_t must be decided online]
• Unpredictable future electricity cost.
  – How much to execute at the current time?
  – How much to defer to execute later?
  – How much to migrate, and where?
• Future workload is also unknown
  – Online algorithm

Our Approach
• Decouple migration from assignment.
• @ Dispatcher – Assignment
  – Based on the current electricity prices and future price predictions.
• @ DC – Migration Decision
  – The electricity prices predicted by the dispatcher may contain prediction errors.
  – Data centers correct that error by migrating jobs between each other at later time slots.

@ Dispatcher – Assignment
• The dispatcher distributes the workload among n data centers.
[Figure: workload L_t split across DC1, DC2, ..., DCn, each over slots t, t+1, ..., t+D]

@ DC
• Adjust the assignment to dynamic electricity price variation.
  – Moving workload to earlier time slots.
  – Migrating workload between data centers.
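The cost model and the "assigned + migrated in - migrated out" balance from the Model Formulation slides can be written out as a small sketch. The data layout here is an assumption made for illustration: `x[i][t]` holds the workload assigned to data center i for slot t, and `z[(i, j, d, t)]` holds workload migrated from i to j, decided at slot t and taking effect d slots later. The fixed term `alpha` and slope `beta` follow the α + βy form used later in the proofs; all numbers are hypothetical.

```python
def net_workload(x, z, i, t, n, D):
    """Net workload executed at data center i in slot t.

    Implements: assigned + migrated in - migrated out, where a migration
    stored under key (src, dst, d, s) affects slot s + d.
    """
    migrated_in = sum(
        z.get((j, i, d, t - d), 0.0) for j in range(n) for d in range(1, D + 1)
    )
    migrated_out = sum(
        z.get((i, j, d, t - d), 0.0) for j in range(n) for d in range(1, D + 1)
    )
    return x[i][t] + migrated_in - migrated_out


def energy_cost(alpha, beta, y):
    """Linear energy cost: fixed term alpha plus price beta per unit of workload y."""
    return alpha + beta * y


def bandwidth_cost(b, amount):
    """Migration cost: per-unit bandwidth price b times the migrated amount."""
    return b * amount


# Two data centers, three slots, deadline D = 2 (hypothetical values).
n, D = 2, 2
x = [[4.0, 4.0, 4.0],
     [2.0, 2.0, 2.0]]
# 1.5 units move from data center 0 to data center 1, decided at slot 1,
# taking effect one slot later (slot 2).
z = {(0, 1, 1, 1): 1.5}

y0 = net_workload(x, z, i=0, t=2, n=n, D=D)
y1 = net_workload(x, z, i=1, t=2, n=n, D=D)
print(y0, y1, energy_cost(1.0, 2.0, y0), bandwidth_cost(0.5, 1.5))
```

Migration only moves workload between data centers, so the total executed in slot 2 stays at 6.0 units; what changes is where the energy price is paid, at the expense of the bandwidth cost.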
Formulation w/o Migration
• Workload assigned to later time slots can only be moved to earlier time slots.
[Optimization problem shown on the slide. Annotations: total execution should be equal; execution cannot be less than the unexecuted workload]
[Figure: unexecuted workload at Data Center 1, Data Center 2, ..., Data Center n over slots t, t+1, ..., t+D]

Formulation with Migration
• Workload can migrate between data centers
[Optimization problem shown on the slide. Annotation: every data center does some work]
[Figure: unexecuted workload at Data Center 1, Data Center 2, ..., Data Center n over slots t, t+1, ..., t+D]

@ DC – Migration Decision
[Figure: over slots t, t+1, ..., t+D, assigned workload + unexecuted workload + migrated-in workload, less migrated-out workload]

How good is the algorithm?
Lemma: No online algorithm has a constant competitive ratio with respect to the offline formulation.
Proof (adversary method):
• Prices: $\beta_{t+i} = K'\beta_t$, $\beta_t = K\beta_{t+D}$, with $K' > K$
• CASE 1: $x_{D,t} \neq 0$, with $L_t = L_{t+1} = M$
  – Competitive ratio = Online Cost / Offline Cost = $K'/K$, arbitrary
• CASE 2: $x_{D,t} = 0$, with $L_t = M$
  – Competitive ratio = Online Cost / Offline Cost = $K$, arbitrary
[Figure: schedules of the offline solution and of any online algorithm over slots t, t+1, ..., t+D]

How good is the algorithm?
• Since the competitive ratio cannot be bounded, we compare the online algorithm with much simpler online algorithms.
• Suppose

  Online Algorithm | Prediction Error | Migration
  AEM              | √                | √
  AE               | √                | x
  A                | x                | x

How good is the algorithm?
Lemma: Cost(AEM) ≤ Cost(AE)
Proof:
• Let Δy = amount of migrated workload and y = amount of non-migrated workload.
• For the non-migrated workload, Cost_AEM(y) = Cost_AE(y).
• Migration happens only when
  cost of execution of Δy at the earlier time slot + cost of migration of Δy ≤ cost of execution of Δy at the later time slot,
  so Cost_AEM(Δy) ≤ Cost_AE(Δy).
• Cost(AEM) = Cost_AEM(y) + Cost_AEM(Δy) ≤ Cost_AE(y) + Cost_AE(Δy) ≤ Cost(AE)
[Figure: unexecuted workload at Data Center 1, Data Center 2, ..., Data Center n over slots t, t+1, ..., t+D]

How good is the algorithm?
Lemma: Cost(AEM) ≤ Cost(AE)
Lemma: Cost(AE) ≤ (1+ε) Cost(A)
Proof:
• Prediction error ε, predicted price β’, actual price β, with β’ – ε ≤ β ≤ β’ + ε
• $\frac{\mathrm{Cost}(AE)}{\mathrm{Cost}(A)} = \frac{\alpha + \beta' y}{\alpha + \beta y} \le 1 + \frac{\varepsilon}{\beta} \le 1 + \varepsilon$

How good is the algorithm?
Lemma: Cost(AEM) ≤ Cost(AE)
Lemma: Cost(AE) ≤ (1+ε) Cost(A)
Combining the two lemmas gives
Theorem: Cost(AEM) ≤ (1+ε) Cost(A)

Electricity Price Prediction
• We model future prices within a 24-hour time frame as Gaussian random variables with
  – Means: prices predicted by a moving average over the current day's prices.
  – Variance: estimated from the history by the weighted average price prediction filter.
• By using two different methods for the mean and the variance, we exploit both temporal and historical correlation of electricity prices.

Evaluation - Electricity Price
• Four data centers at four different geographical locations.
[Figure: five-minute locational marginal electricity prices in the real-time market on 15th February, 2012 for four different regions]

Evaluation - Workload
• Two MapReduce traces from Facebook
  – Cluster of 600 machines over 24 hours.
  – Time slot length of 5 minutes, because electricity prices vary at 5-minute intervals.
[Figure: Workload A, Workload B]

Evaluation - Deadline
• We vary the deadline from 1 to 12 slots and compare the cost reduction with respect to the greedy algorithm without deferral by Qureshi et al.
• Dynamic deferral can provide around 30% cost savings for deadlines of 12 slots (1 hour), and even for a deadline of one slot we get 5% cost savings.
[Figure: Workload A, Workload B]

Evaluation - Deadline
• We compare the total cost of the algorithms AEM and AE.
• The total cost of AEM is always less than that of AE, as claimed in the Lemma.
• As the deadline increases, the prediction error increases (AE), but the cost decreases (AEM) due to the flexibility of migration.
[Figure: Workload A, Workload B]

Non-uniform Deadline
• Workload is decomposed according to the associated deadline: $L_{d,t}$, $0 \le d \le D$
• Then we replace the release constraints in the formulations by
  $\sum_{d=0}^{D} L_{d,t} = L_t$
• Deadline assignment by k-means clustering based on sizes (map, shuffle and reduce bytes)
  – 15.64% cost reduction for Workload A
  – 9.23% cost reduction for Workload B

Summary of Findings
• Formulation for geographical load balancing with deferral
  – Uniform deadline
  – Non-uniform deadline
• Characterization of the optimal offline solution
• Online Algorithm
  – Formulation with migration
  – Formulation without migration
• Future work
  – Heterogeneity in data centers/cloud.
  – Availability of renewable energy.

Thank You ?