WORKING DRAFT

Approximation Algorithm for Soft-Capacitated Connected Facility Location Problems
Data Centers Placement
7th Israeli Network Seminar 2012
Prof. Danny Raz and Assaf Rappaport
17/05/2012

Contents
▪ Data Centers
▪ Facility Location Problem
▪ Steiner Tree
▪ Connected Facility Location
▪ Google Case Study

Data centers are becoming the hosting platform for a wide spectrum of composite applications
▪ Data centers are used to run applications that handle the core business and operational data of organizations:
– SaaS – Software as a Service
– HaaS – Hardware as a Service
– PaaS – Platform as a Service
Examples of data center applications
1 Email services
2 Database services
3 File servers
4 Collaboration tools
5 CRM (Customer Relationship Management)
6 ERP (Enterprise Resource Planning)
7 E-Commerce

In recent years, large investments have been made in massive data centers supporting cloud services
[Figure: a list of companies that are running at least 50,000 servers]
SOURCE: Data Center Knowledge (DCK)

With an increasing trend towards communication intensive applications, the bandwidth usage within and between data centers is rapidly growing

Data centers placement presents challenging optimization problems
1 Graph with costs on edges
2 Set of locations where facilities may be placed
3 Set of demand nodes that must be assigned to an open facility

1 Number of facilities
2 Location
3 Assignment

The goal is to optimally place the applications and their related data over the available infrastructure
Consider the following scenario:
▪ An email application in the cloud depends on an authentication service
▪ We consider the problem of placing replicas of the authentication servers at multiple locations in the data center

Replica placement deals with the actual number and network location of the replicas
▪ We would like to minimize the network distance between an application server and the closest replica, so having more replicas helps
▪ A replica must be synchronized with the original content server in order to supply reliable service
▪ The synchronization traffic across the network depends on the number of replicas deployed in the network, the topology of the distributed update and the rate of updates in the content of the server
▪ Having more replicas is more expensive, so we need to model the cost

The general uncapacitated facility location problem (1/2)
Description
▪ Set of potential facility sites where a facility can be opened
▪ Set of demand points D that must be serviced
▪ We want the facilities to be as efficient as possible, thus we want to minimize the distance from each client to its closest facility
▪ There can be a cost associated with creating each facility that also must be minimized, otherwise all points would be facilities
▪ Minimize the sum of distances, plus the sum of opening costs of the facilities
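The objective in the description above can be made concrete on a toy instance. The sketch below is only an illustration: the facility names, opening costs and distances are hypothetical, and the exhaustive search over subsets is used for clarity, not as the approximation algorithm discussed in this talk.

```python
from itertools import combinations

# Hypothetical toy instance (not from the slides): opening cost per facility
# and distance from each client to each facility.
OPEN_COST = {"f1": 4, "f2": 3, "f3": 6}
DIST = {
    "c1": {"f1": 1, "f2": 5, "f3": 4},
    "c2": {"f1": 2, "f2": 4, "f3": 3},
    "c3": {"f1": 6, "f2": 1, "f3": 2},
    "c4": {"f1": 7, "f2": 2, "f3": 1},
}


def ufl_cost(open_set, open_cost, dist):
    """Opening costs of the chosen facilities plus each client's distance
    to its closest open facility."""
    opening = sum(open_cost[f] for f in open_set)
    service = sum(min(row[f] for f in open_set) for row in dist.values())
    return opening + service


def best_ufl(open_cost, dist):
    """Try every non-empty subset of facilities (exponential, toy sizes only)."""
    best_set, best_cost = None, None
    facilities = sorted(open_cost)
    for k in range(1, len(facilities) + 1):
        for subset in combinations(facilities, k):
            cost = ufl_cost(subset, open_cost, dist)
            if best_cost is None or cost < best_cost:
                best_set, best_cost = set(subset), cost
    return best_set, best_cost


chosen, total = best_ufl(OPEN_COST, DIST)
print(chosen, total)
```

On this instance, opening all three facilities gives the smallest service cost but a larger total than opening only two of them, which is the trade-off the opening costs are meant to capture.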
Input
▪ Set D of clients
▪ Set F of potential facility locations
▪ A distance function
▪ A cost function
Output
▪ A subset of facilities to open and an assignment of each client to one open facility, at minimal cost

The general uncapacitated facility location problem (2/2)
[Figure: facilities F with opening costs fj, customers D, connection distances dij]
Facility Location (FL) Problem: Open a subset of facilities & connect customers to one facility each at minimal cost

Uncapacitated facility location problem - History
▪ 17th century: The Fermat-Weber Problem
– The point minimizing the sum of distances to the sample points
– Given a set of m points and positive multipliers, find a point that minimizes the weighted sum of distances to the given points
▪ 1960s: Uncapacitated Facility Location Problem
– Plant location problem or warehouse location problem
– Stollsteimer - 1963
– Kuehn and Hamburger - 1963
– Balinski and Wolfe - 1963
– Manne - 1964
▪ 1997: Constant-factor approximation algorithm
– Shmoys, Tardos and Aardal give the first polynomial-time algorithm that finds a solution within a factor of 3.16 of the optimal

Steiner Tree Problem
Input
▪ Given:
– An undirected weighted graph G(V,E)
– A set of nodes S (subset of V)
Output
▪ Find the minimum cost tree that spans the nodes in S
[Figure: example weighted graph with green terminal nodes]
Which is the Steiner tree for the green nodes?
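A small sketch can make the question above concrete. The graph below is a hypothetical toy example (it is not the graph drawn on the slide): three terminals a, b, d and one optional node c. The exact Steiner cost is found by brute force over the optional nodes, and it is compared with the cost of the union of shortest paths from one terminal to the others.

```python
import heapq
from itertools import combinations

# Hypothetical toy graph: terminals a, b, d and one optional node c.
EDGES = {
    ("a", "c"): 3, ("b", "c"): 3, ("d", "c"): 3,
    ("a", "b"): 5, ("a", "d"): 5, ("b", "d"): 5,
}
TERMINALS = {"a", "b", "d"}


def adjacency(edges):
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def mst_cost(nodes, adj):
    """Prim's algorithm on the subgraph induced by `nodes`; None if disconnected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, total = {start}, 0
    heap = [(w, v) for v, w in adj[start].items() if v in nodes]
    heapq.heapify(heap)
    while heap and len(seen) < len(nodes):
        w, v = heapq.heappop(heap)
        if v in seen:
            continue
        seen.add(v)
        total += w
        for x, wx in adj[v].items():
            if x in nodes and x not in seen:
                heapq.heappush(heap, (wx, x))
    return total if len(seen) == len(nodes) else None


def steiner_cost(terminals, adj):
    """Exact minimum Steiner tree cost by trying every set of extra nodes."""
    others = [v for v in adj if v not in terminals]
    best = None
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            cost = mst_cost(set(terminals) | set(extra), adj)
            if cost is not None and (best is None or cost < best):
                best = cost
    return best


def shortest_path_tree_cost(root, terminals, adj):
    """Cost of the union of Dijkstra shortest paths from root to each terminal."""
    dist, parent = {root: 0}, {}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in adj[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    used = set()
    for t in terminals:
        while t != root:
            used.add(frozenset((t, parent[t])))
            t = parent[t]
    return sum(adj[u][v] for u, v in map(tuple, used))


ADJ = adjacency(EDGES)
print(steiner_cost(TERMINALS, ADJ), shortest_path_tree_cost("a", TERMINALS, ADJ))
```

In this toy graph the shortest paths from a use the direct edges, while the cheaper tree routes all three terminals through the optional node c. The brute force is exponential in the number of optional nodes and is meant only for illustration.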
Shortest path tree doesn’t equal Steiner tree
[Figure: two copies of the example weighted graph, comparing a shortest path tree with the Steiner tree]

Connected Facility Location
Input
▪ Given: Graph G=(V,E), costs {c_e} on edges and a parameter M ≥ 1
– F : set of facilities
– D : set of clients (demands)
– Facility i has facility cost f_i
– c_ij : distance between i and j in V
We want to:
▪ Pick a set A of facilities to open
▪ Assign each demand j to an open facility i(j)
▪ Connect all open facilities by a Steiner tree T
$\mathrm{Cost} = \sum_{i \in A} f_i + \sum_{j \in D} c_{i(j)j} + M \sum_{e \in T} c_e$
= facility opening cost + client assignment cost + cost of connecting facilities
[Figure: open facilities, facilities, clients and Steiner tree nodes]

Soft-ConFL algorithm – the first deterministic constant approximation algorithm for the soft capacitated connected facility location problem
▪ ρ-approximation algorithm for the Uncapacitated Facility Location Problem
▪ μ-approximation algorithm for the minimum Steiner Tree Problem
▪ Add a cost λ_i to each facility: this cost is defined as twice the minimum cost of satisfying M units of demand from facility i
▪ Modify the distance function by adding: [formula]
[Figure: facilities with costs fj and distances dij]

Deterministic constant approximation algorithm

Proof of lemma 1
▪ Convert into a binary tree
[Figure: tree with subtree labels <M and 3M>]

Google data centers
[Figure panels: Google data centers world wide; Google data centers in the USA; Google data centers in Europe]
▪ Google operates data centers in:
– 19 in the US
– 12 in Europe
– one in Russia
– one in South America
– 3 in Asia
▪ Not all of the locations are dedicated Google data centers

Google data centers – Case example
▪ 36 Google data centers
▪ How many replicas? Locations?
▪ Unified demand
▪ Unified cost
▪ Geographic distance

Google data centers: Greedy vs. CoFL
▪ Facility cost: 5,000-10,000
▪ Min SPT: 22,000
▪ Total demand: 36
[Figure panels: Greedy; CoFL]

Google data centers: Greedy vs. UFL vs. CoFL
▪ Facility cost: 5,000
▪ Min SPT: 22,000
▪ Total demand: 36
[Figure panels: Greedy; UFL; CoFL]

Google data centers: Greedy vs. UFL vs. CoFL
▪ Facility cost: 3,000
▪ Min SPT: 22,000
▪ Total demand: 36
[Figure panels: Greedy; UFL; CoFL]

CoFL
▪ Facility cost: 1,000
▪ Min SPT: 22,000
▪ Total demand: 36
[Figure: CoFL chart with labels 2.8%, 5.6%, 8.3%, 11.1%, 13.9%]
Chart legend: Mountain View, Calif.; Beijing; Portland, Oregon; Lenoir, North Carolina; Frankfurt, Germany; Pryor, Oklahoma; Mons, Belgium; Moscow, Russia; Sao Paulo, Brazil; Tokyo; Hong Kong; Atlanta, Ga. (two sites); Ashburn, Va.; Groningen, Netherlands; Other 22 Facilities
Data centers: Mountain View, Calif.; Pleasanton, Calif.; San Jose, Calif.; Los Angeles, Calif.; Palo Alto, Calif.; Seattle; Portland, Oregon; The Dalles, Oregon; Chicago; Atlanta, Ga. (two sites); Reston, Virginia; Ashburn, Va.; Virginia Beach, Virginia; Houston, Texas; Miami, Fla.; Lenoir, North Carolina; Goose Creek, South Carolina; Pryor, Oklahoma; Council Bluffs, Iowa; Toronto, Canada; Berlin, Germany; Frankfurt, Germany; Munich, Germany; Zurich, Switzerland; Groningen, Netherlands; Mons, Belgium; Eemshaven, Netherlands; Paris; London; Dublin, Ireland; Milan, Italy; Moscow, Russia; Sao Paulo, Brazil; Tokyo; Hong Kong; Beijing
[Figure panels: Greedy; UFL; CoFL]

The Steiner tree problem is NP-hard
Reduction
▪ We will show that a known NP-hard problem can be solved in polynomial time if the Steiner decision problem can be solved in polynomial time
Exact cover by 3-sets is NP-hard
▪ X = {x1, x2, …, x3p}
▪ C = {C1, C2, …, Cq}
▪ Ci ⊆ X, |Ci| = 3, i = 1, …, q
▪ Is it possible to select mutually disjoint subsets such that their union is X?
[Figure: node v, subset nodes C1 to C4, element nodes x1 to x10]
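The reduction can be sketched in code under some assumptions. The slide does not spell out the construction, so the sketch assumes the standard one suggested by the figure: a root v joined to one node per subset Ci, and each Ci joined to its three elements, with v and all elements as terminals. The edge budget of 4p is also an assumption of this sketch; it follows because any tree spanning v and the 3p elements must use at least p subset nodes. The instance X, C and all function names are hypothetical.

```python
from itertools import combinations


def x3c_to_steiner(elements, subsets):
    """Build the reduction graph: root v, one node per subset, one per element."""
    edges = set()
    for i, subset in enumerate(subsets):
        edges.add(("v", ("C", i)))              # root to subset node
        for x in subset:
            edges.add((("C", i), ("x", x)))     # subset node to its 3 elements
    terminals = {"v"} | {("x", x) for x in elements}
    budget = 4 * (len(elements) // 3)           # p root edges + 3p element edges
    return edges, terminals, budget


def find_exact_cover(elements, subsets):
    """Brute force: indices of p subsets whose union is X, or None.
    p sets of size 3 that cover 3p elements are necessarily disjoint."""
    p = len(elements) // 3
    for chosen in combinations(range(len(subsets)), p):
        covered = set().union(*(subsets[i] for i in chosen))
        if covered == set(elements):
            return chosen
    return None


def tree_from_cover(chosen, subsets):
    """Steiner tree edges induced by an exact cover."""
    tree = set()
    for i in chosen:
        tree.add(("v", ("C", i)))
        tree.update((("C", i), ("x", x)) for x in subsets[i])
    return tree


# Hypothetical instance with p = 2 (six elements, four candidate 3-sets).
X = ["x1", "x2", "x3", "x4", "x5", "x6"]
C = [{"x1", "x2", "x3"}, {"x3", "x4", "x5"}, {"x4", "x5", "x6"}, {"x2", "x5", "x6"}]

edges, terminals, budget = x3c_to_steiner(X, C)
cover = find_exact_cover(X, C)
tree = tree_from_cover(cover, C)
print(cover, len(tree), budget)
```

An exact cover yields a tree that meets the budget exactly, so a polynomial-time answer to "is there a Steiner tree with at most 4p edges?" would also decide exact cover by 3-sets. The brute-force cover search here is only a check on the toy instance.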