Geographically Distributed Datacenters with Load Reallocation
Indra Widjaja, Sem Borst, Iraj Saniee
Bell Labs
DIMACS Workshop on Cloud Computing, December 8-9, 2011
COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED. ALCATEL-LUCENT — CONFIDENTIAL — SOLELY FOR AUTHORIZED PERSONS HAVING A NEED TO KNOW — PROPRIETARY — USE PURSUANT TO COMPANY INSTRUCTION

Datacenter Alternatives
[Figure: two five-site layouts, "Geographically Centralized" and "Geographically Distributed"; legend: servers, potential DC site.]

Challenge
• Centralized datacenters cannot uniformly offer low-latency services to all end-users
• Distributed datacenters may not achieve elasticity

Toy Example of Distributed DC with Reallocation
[Figure: five-site network in two panels, "Without reallocation" and "With reallocation"; labels $\lambda_1$, $\mu_1$, $q_{1,1}$, $q_{1,3}$.]
• $\lambda_i$ = job arrival rate at site $i$, $\mu_i$ = processing capacity at site $i$
• $q_{i,j}$ = fraction of load reallocated from site $i$ to site $j$

Formal Model of Load (Re)Allocation in Geographically Distributed Datacenter
Let $\lambda_i^k$ be the arrival rate of type-$k$ jobs at site $i$, $\beta_k$ the service time of a type-$k$ job per server, and $\tau_{i,j}$ the round-trip delay between sites $i$ and $j$. The optimization problem to solve is:
[Equations not preserved; surviving annotations:]
• Objective: weighted average delay
• Decision variable: fraction of load at $i$ sent to $j$
• Subject to ("st") constraints, where the following quantities are defined:
  - normalized exogenous arrival rate at $i$
  - total exogenous arrival rate at all sites
  - total arrival rate at site $j$
  - utilization at site $j$ with $K_j$ servers
  - average processing delay with multiple-server approximation
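The annotations on the formal-model slide suggest a problem of roughly the following shape. This is one reading for a single job type (so $\lambda_i^k$ and $\beta_k$ reduce to $\lambda_i$ and $\beta$), not the authors' exact formulation. The symbols $p_i$, $\Lambda$, $\hat\lambda_j$ and $D_j$ are names chosen here, the constraints shown are the natural ones for allocation fractions and are an assumption, and the form of the multiple-server approximation is left open because the source text does not give it.

```latex
\begin{align*}
\min_{q}\quad
  & \sum_{i}\sum_{j} p_i\, q_{i,j}\,\bigl(\tau_{i,j} + D_j\bigr)
  && \text{(weighted average delay)} \\
\text{s.t.}\quad
  & \sum_{j} q_{i,j} = 1, \qquad q_{i,j} \ge 0, \qquad \rho_j < 1
  && \text{(assumed constraints)} \\
\text{where}\quad
  & p_i = \lambda_i / \Lambda
  && \text{(normalized exogenous arrival rate at $i$)} \\
  & \Lambda = \sum_{i} \lambda_i
  && \text{(total exogenous arrival rate at all sites)} \\
  & \hat\lambda_j = \sum_{i} \lambda_i\, q_{i,j}
  && \text{(total arrival rate at site $j$)} \\
  & \rho_j = \hat\lambda_j\, \beta / K_j
  && \text{(utilization at site $j$ with $K_j$ servers)} \\
  & D_j = D(\rho_j, K_j)
  && \text{(average processing delay; form not given here)}
\end{align*}
```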

Toy Example of Distributed DC with Reallocation
[Figure: five-node network with exogenous arrival rates $\lambda_1 = 2$, $\lambda_2 = 1.5$, $\lambda_3 = 1$, $\lambda_4 = 1.5$, $\lambda_5 = 2$.]

Node     λ       μ    τ    Delay
1        1.814   3    0    0.8432
1->3     0.186        1    1.6143
2        1.5     3    0    0.6667
3        1       3    0    0.6143
4        1.5     3    0    0.6667
5        1.814   3    0    0.8432
5->3     0.186        1    1.6143

Q =
0.907   0   0   0   0
0       1   0   0   0
0.093   0   1   0   0.093
0       0   0   1   0
0       0   0   0   0.907
(In this layout each column sums to 1: column $i$ gives the split of site $i$'s load, e.g. $q_{1,1} = 0.907$ and $q_{1,3} = 0.093$.)

Weighted Delay = 0.7842

Large-Scale Topology
32-node, 44-link network used in the experiment:
[Figure: network map with per-link delay labels. Node labels: SEA, SAI, SAL, SFO, DEN, LOS, PHO, PIT, RAL, ATL, NOR, HOU, JAC, TAM, MIA, PHI, BAL, WAS, ALB, CIN, NAS, ELP, CLE, CHI, KAN, DET, SPR, LAS, BUF, MIL, NYC, BOS.]
• Each link is associated with delay $\tau_{ij}$.
• The centralized datacenter is located in CHI

Comparison of Delays
[Plots: two panels, "Nearly-uniform job arrival rates" and "Non-uniform job arrival rates".]
• Nearly-uniform: $\lambda_i = 1.1\lambda$ if $i$ is odd, $0.9\lambda$ if $i$ is even
• Non-uniform: $\lambda_i = 1.5\lambda$ if $i$ is odd, $0.5\lambda$ if $i$ is even
• $\mu_i = 1$ for all $i$

Comparison of Elasticities
[Plots: two panels, "Moderate load variation" and "High load variation".]
In each trial, for each $i$:
• $\lambda_i \sim \mathrm{Uniform}(0.25, 1)$ for moderate load variation
• $\lambda_i \sim \mathrm{Uniform}(0, 1.5)$ for high load variation
Then rescale $\lambda_i$ such that system-wide utilization is fixed (to 0.5).
$\mu_i = 1$ for each $i$
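The figures in the five-node toy-example table above can be reproduced with a few lines of Python. This is a sketch under two assumptions that the slides do not state explicitly. First, the processing delay at a site is taken as 1 / (capacity - load), which happens to match the tabulated delays. Second, every round-trip delay between distinct sites is set to 1; only the 1->3 and 5->3 entries affect the result, because all other off-diagonal fractions are zero. The function name `weighted_delay` and the row-per-source layout of `q` are choices made here for illustration.

```python
def weighted_delay(lam, mu, tau, q):
    """Return (weighted delay, per-site load, per-site processing delay).

    q[i][j] is the fraction of site i's arrivals that is served at site j.
    """
    n = len(lam)
    # Total load offered to each site j after reallocation.
    load = [sum(lam[i] * q[i][j] for i in range(n)) for j in range(n)]
    # ASSUMPTION: single-queue delay 1 / (capacity - load); not stated in the slides.
    proc = [1.0 / (mu[j] - load[j]) for j in range(n)]
    total = sum(
        lam[i] * q[i][j] * (tau[i][j] + proc[j])
        for i in range(n)
        for j in range(n)
    )
    return total / sum(lam), load, proc


# Toy example: sites 1..5 are stored at indices 0..4.
lam = [2.0, 1.5, 1.0, 1.5, 2.0]
mu = [3.0] * 5
# ASSUMPTION: unit round-trip delay between distinct sites, zero locally.
tau = [[0.0 if i == j else 1.0 for j in range(5)] for i in range(5)]
# Start from "serve everything locally", then apply the slide's fractions.
q = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
q[0][0], q[0][2] = 0.907, 0.093  # site 1 keeps 90.7%, sends 9.3% to site 3
q[4][4], q[4][2] = 0.907, 0.093  # site 5 does the same

delay, load, proc = weighted_delay(lam, mu, tau, q)
print(round(delay, 4), [round(d, 4) for d in proc])
```

Under these assumptions the computed weighted delay and per-site processing delays agree with the slide's values to four decimals, which supports reading the Delay column as processing delay plus round-trip delay.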

Multiple Job Types
• Type-independent: jobs are reallocated from $i$ to $j$ with fraction $q_{i,j}$ regardless of their types
• Type-dependent: type-$k$ jobs are reallocated from $i$ to $j$ with fraction $q^k_{i,j}$
Example with 2 job types: [figure not preserved]

Distributed Algorithms for Load Reallocation
• Basic idea:
  - Each site $i$ computes the impact on the global objective function of sending an additional small fraction of jobs to each site $j$, i.e., the derivative $\alpha_{i,j}$.
  - Min-rule: site $i$ determines site $j_{\min}(i)$ such that $\alpha_{i,j_{\min}(i)}$ is the minimum derivative. It then reallocates load from other sites to site $j_{\min}(i)$.
  - Max-rule: site $i$ determines site $j_{\max}(i)$ such that $\alpha_{i,j_{\max}(i)}$ is the maximum derivative. It then reallocates load from site $j_{\max}(i)$ to other sites.

Distributed Algorithm with “min-rule”
• At site $i$: compute $\gamma_{i,j} = \alpha_{i,j} - \alpha_{i,j_{\min}(i)}$ for all $j \in N_i$, compute $\gamma_i = \sum_{j \in N_i,\, j \ne j_{\min}(i)} \gamma_{i,j}$, and $\delta = \min\{\kappa,\ (1 - \rho_{j_{\min}(i)})\, K_{j_{\min}(i)} / (\lambda_i \beta \gamma_i)\}$, where $j_{\min}(i) = \arg\min_{j \in N_i} \alpha_{i,j}$
• At site $i$: evaluate $\eta_{i,j} = \min\{q_{i,j},\ \delta \gamma_{i,j}\}$ for all $j \ne j_{\min}(i)$, $j \in N_i$, and $\eta_{i,j_{\min}(i)} = -\sum_{j \ne j_{\min}(i),\, j \in N_i} \eta_{i,j}$
• At site $i$: update $q_{i,j} = q_{i,j} - \eta_{i,j}$ for all $j \in N_i$; $q_{i,j} = 0$ for $j \notin N_i$
• Collect new measurement and go to next site (e.g., $i = i + 1 \bmod N$)
• Converged? No: repeat the steps above. Yes: detect changes in delay and utilization.
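One pass of the min-rule at a single site can be sketched as follows. This is an illustration of the update as read from the slide, not the authors' implementation. The function name, the dictionary-based data layout, and the early return when all derivatives are equal are assumptions introduced here. The derivatives are taken as given inputs, since the slides do not preserve the formula that defines them.

```python
def min_rule_step(q_i, deriv, rho, servers, lam_i, beta, kappa):
    """One min-rule update of site i's allocation.

    q_i     : dict, destination site -> current fraction of site i's load
    deriv   : dict, destination site -> derivative of the objective (given)
    rho     : dict, destination site -> current utilization
    servers : dict, destination site -> number of servers
    lam_i   : arrival rate at site i; beta: service time; kappa: step-size cap
    """
    # Destination with the smallest derivative: load is shifted towards it.
    j_min = min(deriv, key=deriv.get)
    gap = {j: deriv[j] - deriv[j_min] for j in deriv}
    total_gap = sum(g for j, g in gap.items() if j != j_min)
    if total_gap <= 0:
        # ASSUMPTION (not in the slides): nothing to do if all derivatives tie.
        return dict(q_i)
    # Step size, capped so the shifted load fits the spare capacity at j_min.
    spare = (1.0 - rho[j_min]) * servers[j_min] / (lam_i * beta * total_gap)
    step = min(kappa, spare)
    # Amount taken from each other destination, never more than it holds.
    shift = {j: min(q_i[j], step * gap[j]) for j in deriv if j != j_min}
    shift[j_min] = -sum(shift.values())
    return {j: q_i[j] - shift[j] for j in deriv}


# Hypothetical numbers for a site whose neighbourhood is {1, 2, 3}.
q_before = {1: 0.7, 2: 0.3, 3: 0.0}
q_after_min = min_rule_step(
    q_before,
    deriv={1: 0.9, 2: 0.5, 3: 0.6},
    rho={1: 0.8, 2: 0.5, 3: 0.4},
    servers={1: 3, 2: 3, 3: 3},
    lam_i=2.0,
    beta=1.0,
    kappa=0.5,
)
print(q_after_min)
```

Because the amounts removed from the other destinations are added back at the minimum-derivative destination, the fractions still sum to 1 after the update, and no fraction goes negative.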

Distributed Algorithm with “max-rule”
• At site $i$: compute $\gamma_{i,j} = \max\{\alpha_{i,j_{\max}(i)} - \alpha_{i,j},\ 0\}$ for all $j \in N_i$ and compute $\nu_{ij} = (1 - \rho_j)\, K_j / (\lambda_i \beta)$ for all $j \ne j_{\max}(i)$, $j \in N_i$, where $j_{\max}(i) = \arg\max_{j:\, q_{i,j} > 0} \alpha_{i,j}$
• At site $i$: compute $\delta = \min\{\kappa,\ q_{i,j_{\max}(i)} / \sum_{j \in N_i} \gamma_{i,j}\}$; evaluate $\eta_{i,j} = \min\{\nu_{ij},\ \delta \gamma_{i,j}\}$ for all $j \ne j_{\max}(i)$, $j \in N_i$, and $\eta_{i,j_{\max}(i)} = -\sum_{j \ne j_{\max}(i),\, j \in N_i} \eta_{i,j}$
• At site $i$: update $q_{i,j} = q_{i,j} + \eta_{i,j}$ for all $j \in N_i$; $q_{i,j} = 0$ for $j \notin N_i$
• Collect new measurement and go to next site (e.g., $i = i + 1 \bmod N$)
• Converged? No: repeat the steps above. Yes: detect changes in delay and utilization.

Scenario 1: Load Increases by 50% at One Site
Scenario 2: Load Increases by 100% at One Site
Scenario 3: Load Increases by 200% at One Site
Scenario 4: Two Back-to-Back Overloaded Sites
Scenario 5: Noisy versus Perfect Measurements
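The max-rule update described earlier in this part of the deck can be sketched in the same style. As before, this is a reading of the slide rather than the authors' code. The function name, the dictionary layout, and the early return when no destination has a smaller derivative are assumptions made here, and the derivatives are supplied as inputs.

```python
def max_rule_step(q_i, deriv, rho, servers, lam_i, beta, kappa):
    """One max-rule update of site i's allocation.

    Arguments have the same meaning as site-indexed dicts:
    q_i (fractions), deriv (derivatives), rho (utilizations),
    servers (server counts); lam_i, beta, kappa are scalars.
    """
    # Among destinations that currently receive load, pick the worst one.
    active = [j for j in deriv if q_i[j] > 0]
    j_max = max(active, key=deriv.get)
    # How much better each destination is than j_max (zero if it is not better).
    gap = {j: max(deriv[j_max] - deriv[j], 0.0) for j in deriv}
    total_gap = sum(gap.values())
    if total_gap <= 0:
        # ASSUMPTION (not in the slides): no better destination, so no change.
        return dict(q_i)
    # Spare capacity of each other destination, expressed as a fraction of lam_i.
    room = {
        j: (1.0 - rho[j]) * servers[j] / (lam_i * beta)
        for j in deriv
        if j != j_max
    }
    # Step size, capped so that no more than q_i[j_max] is moved away.
    step = min(kappa, q_i[j_max] / total_gap)
    shift = {j: min(room[j], step * gap[j]) for j in room}
    shift[j_max] = -sum(shift.values())
    return {j: q_i[j] + shift[j] for j in deriv}


# Hypothetical numbers for a site whose neighbourhood is {1, 2, 3}.
q_after_max = max_rule_step(
    {1: 0.7, 2: 0.3, 3: 0.0},
    deriv={1: 0.9, 2: 0.5, 3: 0.6},
    rho={1: 0.8, 2: 0.5, 3: 0.4},
    servers={1: 3, 2: 3, 3: 3},
    lam_i=2.0,
    beta=1.0,
    kappa=0.5,
)
print(q_after_max)
```

In contrast to the min-rule, which pulls load towards the single best destination, this update pushes load away from the single worst destination and spreads it over all better ones in proportion to how much better they are.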

Conclusions and Further Work
• Load reallocation provides a key instrument for achieving elasticity and reducing latency simultaneously
• Only processing-intensive applications have been considered so far; other applications will be considered in further work