Data Centers Placement

WORKING DRAFT
Approximation Algorithm for
Soft-Capacitated Connected
Facility Location Problems
Data Centers Placement
7th Israeli Network Seminar 2012
Prof. Danny Raz and Assaf Rappaport
17/05/2012
Contents
▪ Data Centers
▪ Facility Location Problem
▪ Steiner Tree
▪ Connected Facility Location
▪ Google Case Study
Data centers are becoming the hosting platform for a wide spectrum of composite applications
▪ Data centers are used to run applications that handle the core business and operational data of organizations:
– SaaS – Software as a Service
– HaaS – Hardware as a Service
– PaaS – Platform as a Service
Examples of data center applications
1 Email services
2 Database services
3 File servers
4 Collaboration tools
5 CRM (Customer Relationship Management)
6 ERP (Enterprise Resource Planning)
7 E-Commerce
In recent years, large investments have been made in massive data centers supporting cloud services
[Figure: a list of companies that are running at least 50,000 servers. SOURCE: Data Center Knowledge (DCK)]
With an increasing trend towards communication-intensive applications, the bandwidth usage within and between data centers is growing rapidly
Data center placement presents challenging optimization problems
Given:
1 Graph with costs on edges
2 Set of locations where facilities may be placed
3 Set of demand nodes that must be assigned to an open facility
To decide:
1 Number of facilities
2 Location
3 Assignment
The goal is to optimally place the applications and their related data over the available infrastructure
Consider the following scenario:
▪ An email application in the cloud depends on an authentication service
▪ We consider the problem of placing replicas of the authentication servers at multiple locations in the data center
Replica placement deals with the actual number and network location of the replicas
▪ We would like to minimize the network distance between an application server and the closest replica, so having more replicas helps
▪ A replica must be synchronized with the original content server in order to supply reliable service
▪ The synchronization traffic across the network depends on the number of replicas deployed in the network, the topology of the distributed update and the rate of updates in the content of the server
▪ Having more replicas is more expensive, so we need to model the cost
The general uncapacitated facility location problem (1/2)
Description
▪ Set of demand points D that must be serviced
▪ Set of potential facility sites where a facility can be opened
▪ We want the facilities to be as efficient as possible, thus we want to minimize the distance from each client to its closest facility
▪ There can be a cost associated with creating each facility that also must be minimized, otherwise all points would be facilities
▪ Minimize the sum of distances, plus the sum of opening costs of the facilities
Input
▪ Set D of clients
▪ Set F of potential facility locations
▪ A distance function
▪ A cost function
Output
▪ A subset of facilities to open and an assignment of each client to an open facility
The general uncapacitated facility location problem (2/2)
Facility Location (FL) Problem:
Open a subset of facilities & connect customers to one facility each at minimal cost
[Figure: facilities F with opening costs f_j, customers D with connection costs d_ij]
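The FL objective can be made concrete with a small sketch. The code below is only an illustration, not the method from the talk: it enumerates every subset of facilities, which is feasible only for tiny instances. The instance data, the names `ufl_cost` and `solve_ufl_brute_force`, and the `dist[j][i]` layout are assumptions made for this example.

```python
from itertools import combinations

def ufl_cost(open_facilities, opening_cost, dist):
    """Opening costs of the chosen facilities plus each client's
    distance to its closest open facility."""
    opening = sum(opening_cost[i] for i in open_facilities)
    connection = sum(min(row[i] for i in open_facilities) for row in dist)
    return opening + connection

def solve_ufl_brute_force(opening_cost, dist):
    """Try every non-empty subset of facilities (exponential time)."""
    facilities = range(len(opening_cost))
    best_set, best_cost = None, float("inf")
    for k in range(1, len(opening_cost) + 1):
        for subset in combinations(facilities, k):
            cost = ufl_cost(subset, opening_cost, dist)
            if cost < best_cost:
                best_set, best_cost = subset, cost
    return best_set, best_cost

# Hypothetical instance: 3 candidate facilities, 4 clients.
opening_cost = [4, 3, 6]
# dist[j][i] = distance from client j to facility i
dist = [
    [1, 5, 9],
    [2, 4, 8],
    [7, 1, 3],
    [8, 2, 1],
]
best_set, best_cost = solve_ufl_brute_force(opening_cost, dist)
print(best_set, best_cost)
```

On this instance, opening a third facility shortens one connection but costs more to open than it saves, which is the trade-off the opening cost is meant to capture.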
Uncapacitated facility location problem - History
17th century: The Fermat-Weber Problem
▪ The point minimizing the sum of distances to the sample points
▪ Given a set of m points and positive multipliers, find a point that minimizes the weighted sum of distances to them
1960s: Uncapacitated Facility Location Problem
▪ Plant location problem or warehouse location problem
▪ Stollsteimer - 1963
▪ Kuehn and Hamburger - 1963
▪ Balinski and Wolfe - 1963
▪ Manne - 1964
1997: Constant-factor approximation algorithm
▪ Shmoys, Tardos and Aardal give the first polynomial-time algorithm that finds a solution within a factor of 3.16 of the optimal
Steiner Tree Problem
Input
▪ Given:
– An undirected weighted graph G(V,E)
– A set of nodes S (subset of V)
Output
▪ Find the minimum cost tree that spans the nodes in S
▪ Which is the Steiner tree for the green nodes?
▪ Shortest path tree doesn’t equal Steiner tree
[Figures: example weighted graphs comparing a shortest path tree with a Steiner tree for the green nodes]
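The difference between a shortest path tree and a Steiner tree can be checked on a toy graph. The sketch below is an illustration under assumptions: the graph, its weights and all function names are made up for this example and are not the graph from the slide. The exact Steiner cost is found by enumeration, using the fact that an optimal Steiner tree is a minimum spanning tree of the subgraph induced by its own node set.

```python
import heapq
from itertools import combinations

def make_graph(edges):
    """Undirected adjacency map: adj[u][v] = weight."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def shortest_path_tree_cost(adj, root, terminals):
    """Cost of the union of shortest paths from root to every terminal."""
    dist, parent, heap = {root: 0}, {}, [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    edges = set()
    for t in terminals:
        while t != root:
            edges.add(frozenset((t, parent[t])))
            t = parent[t]
    return sum(adj[u][v] for u, v in map(tuple, edges))

def mst_cost(adj, nodes):
    """Prim's algorithm on the subgraph induced by nodes; None if disconnected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, total = {start}, 0
    heap = [(w, v) for v, w in adj[start].items() if v in nodes]
    heapq.heapify(heap)
    while heap and len(seen) < len(nodes):
        w, v = heapq.heappop(heap)
        if v in seen:
            continue
        seen.add(v)
        total += w
        for x, wx in adj[v].items():
            if x in nodes and x not in seen:
                heapq.heappush(heap, (wx, x))
    return total if len(seen) == len(nodes) else None

def steiner_tree_cost(adj, terminals):
    """Exact cost by trying every set of extra (non-terminal) nodes."""
    others = [v for v in adj if v not in terminals]
    best = float("inf")
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            cost = mst_cost(adj, set(terminals) | set(extra))
            if cost is not None:
                best = min(best, cost)
    return best

# Hypothetical graph: terminals a, b, c and one optional hub s.
adj = make_graph([("a", "b", 5), ("a", "c", 5), ("b", "c", 5),
                  ("s", "a", 3), ("s", "b", 3), ("s", "c", 3)])
terminals = {"a", "b", "c"}
spt = shortest_path_tree_cost(adj, "a", terminals)
steiner = steiner_tree_cost(adj, terminals)
print(spt, steiner)
```

Here the shortest paths from `a` use the direct edges, while the cheaper tree routes all three terminals through the hub `s`. The enumeration is exponential in the number of non-terminal nodes, which is consistent with the problem being NP-hard.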
Connected Facility Location
Input
▪ Given:
– Graph G=(V,E), costs {c_e} on edges and a parameter M ≥ 1
– F : set of facilities
– D : set of clients (demands)
– Facility i has facility cost f_i
– c_ij : distance between i and j in V
We want to:
– Pick a set A of facilities to open
– Assign each demand j to an open facility i(j)
– Connect all open facilities by a Steiner tree T
$\text{Cost} = \sum_{i \in A} f_i + \sum_{j \in D} c_{i(j)j} + M \sum_{e \in T} c_e$
= facility opening cost + client assignment cost + cost of connecting facilities
[Figure legend: open facility, facility, client, Steiner tree node]
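The three terms of the cost can be evaluated for a candidate solution as follows. This is a minimal sketch: the function name `confl_cost`, the dictionary layout and the numbers are assumptions for illustration, and the function does not check that the given tree edges actually connect the open facilities.

```python
def confl_cost(open_facilities, assignment, tree_edges, f, c, edge_cost, M):
    """Facility opening cost + client assignment cost + M * tree cost.

    open_facilities: set A of opened facilities
    assignment:      dict client j -> open facility i(j)
    tree_edges:      edges of the tree T connecting the open facilities
    f, c, edge_cost: f[i], c[i][j] and edge_cost[e] as in the definition
    """
    if any(i not in open_facilities for i in assignment.values()):
        raise ValueError("every client must be assigned to an open facility")
    opening = sum(f[i] for i in open_facilities)
    assigning = sum(c[i][j] for j, i in assignment.items())
    connecting = M * sum(edge_cost[e] for e in tree_edges)
    return opening + assigning + connecting

# Hypothetical instance: two facilities, two clients, one connecting edge.
f = {"A": 10, "B": 8}
c = {"A": {"j1": 2, "j2": 6}, "B": {"j1": 5, "j2": 1}}
edge_cost = {("A", "B"): 4}
M = 3

both = confl_cost({"A", "B"}, {"j1": "A", "j2": "B"}, [("A", "B")],
                  f, c, edge_cost, M)
only_a = confl_cost({"A"}, {"j1": "A", "j2": "A"}, [], f, c, edge_cost, M)
print(both, only_a)
```

With these numbers, opening both facilities shortens the assignments but pays for a second facility and for the connecting edge scaled by M, so the single-facility solution is cheaper.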
Soft-ConFL algorithm – the first deterministic constant approximation algorithm for the soft capacitated connected facility location problem
▪ ρ-approximation algorithm for the Uncapacitated Facility Location Problem
▪ μ-approximation algorithm for the minimum Steiner Tree Problem
▪ f_j: Add a cost λ_i to each facility. This cost is defined as twice the minimum cost of satisfying M units of demand from facility i.
▪ d_ij: Modify the distance function by adding: [term not preserved in the extracted slide]
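One possible reading of the added facility cost λ_i is sketched below. It assumes every client has exactly one unit of demand, so that the cheapest way to satisfy M units from facility i is to serve its M nearest clients. That unit-demand assumption, the function name `lambda_costs` and the sample distances are not from the slides.

```python
def lambda_costs(c, M):
    """lambda_i = 2 * (sum of the M smallest distances from facility i).

    Assumption: unit demand per client, so M units of demand are
    satisfied most cheaply by the M clients nearest to the facility.
    c[i][j] is the distance between facility i and client j.
    """
    lam = {}
    for i, dists in c.items():
        if len(dists) < M:
            raise ValueError(f"facility {i} has fewer than {M} clients")
        lam[i] = 2 * sum(sorted(dists.values())[:M])
    return lam

# Hypothetical distances between two facilities and three clients.
c = {
    "A": {"j1": 2, "j2": 6, "j3": 4},
    "B": {"j1": 5, "j2": 1, "j3": 1},
}
lam = lambda_costs(c, M=2)
print(lam)
```

A facility that sits far from its nearest M units of demand receives a large λ_i, so the modified UFL instance is discouraged from opening it.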
Deterministic constant approximation algorithm
Proof of lemma 1
▪ Convert into a binary tree
[Figure: tree with parts labeled <M, 3M>, <M, <M]
Google data centers
[Maps: Google data centers world wide, in the USA and in Europe]
▪ Google operates data centers in:
– 19 in the US
– 12 in Europe
– one in Russia
– one in South America
– 3 in Asia
▪ Not all of the locations are dedicated Google data centers
Google data centers – Case example
▪ 36 Google data centers
▪ How many replicas? Locations?
▪ Unified demand
▪ Unified cost
▪ Geographic distance
Google data centers: Greedy vs. CoFL
▪ Facility cost: 5,000-10,000
▪ Min SPT: 22,000
▪ Total demand: 36
[Figure panels: Greedy, CoFL]
Google data centers: Greedy vs. UFL vs. CoFL
▪ Facility cost: 5,000
▪ Min SPT: 22,000
▪ Total demand: 36
[Figure panels: Greedy, UFL, CoFL]
Google data centers: Greedy vs. UFL vs. CoFL
▪ Facility cost: 3,000
▪ Min SPT: 22,000
▪ Total demand: 36
[Figure panels: Greedy, UFL, CoFL]
CoFL
▪ Facility cost: 1,000
▪ Min SPT: 22,000
▪ Total demand: 36
[Chart with axis ticks 2.8%, 5.6%, 8.3%, 11.1%, 13.9%; categories: Mountain View, Calif.; Beijing; Portland, Oregon; Lenoir, North Carolina; Frankfurt, Germany; Pryor, Oklahoma; Mons, Belgium; Moscow, Russia; Sao Paulo, Brazil; Tokyo; Hong Kong; Atlanta, Ga. (two sites); Ashburn, Va.; Groningen, Netherlands; Other 22 Facilities]
CoFL
▪ Facility cost: 1,000
▪ Min SPT: 22,000
▪ Total demand: 36
[Chart with axis ticks 2.8%, 5.6%, 8.3%, 11.1%, 13.9%; categories listed below]
Mountain View, Calif.
Pleasanton, Calif.
San Jose, Calif.
Los Angeles, Calif.
Palo Alto, Calif.
Seattle
Portland, Oregon
The Dalles, Oregon
Chicago
Atlanta, Ga. (two sites)
Reston, Virginia
Ashburn, Va.
Virginia Beach, Virginia
Houston, Texas
Miami, Fla.
Lenoir, North Carolina
Goose Creek, South Carolina
Pryor, Oklahoma
Council Bluffs, Iowa
Toronto, Canada
Berlin, Germany
Frankfurt, Germany
Munich, Germany
Zurich, Switzerland
Groningen, Netherlands
Mons, Belgium
Eemshaven, Netherlands
Paris
London
Dublin, Ireland
Milan, Italy
Moscow, Russia
Sao Paulo, Brazil
Tokyo
Hong Kong
Beijing
The Steiner tree problem is NP-hard
Reduction
▪ We will show that a known NP-hard problem can be solved in polynomial time if the Steiner decision problem can be solved in polynomial time
Exact cover by 3-sets is NP-hard
▪ X = {x1, x2, ..., x3p}
▪ C = {C1, C2, ..., Cq}
▪ Ci ⊆ X, |Ci| = 3, i = 1, ..., q
▪ Is it possible to select mutually disjoint subsets such that their union is X?
[Figure: reduction graph with a node v, set nodes C1 to C4 and element nodes x1 to x10]
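The reduction can be sketched in code. The slide's figure is not fully recoverable, so the construction below is the standard textbook one and is assumed rather than taken from the talk: a root `v` joined to every set node, each set node joined to its three elements, all edges of unit weight, and the terminals being `v` plus all elements. Under that construction a Steiner tree with at most 4p edges exists exactly when an exact cover exists. All function names and the sample instance are hypothetical.

```python
from itertools import combinations

def reduction_graph(X, C):
    """Build the unit-weight Steiner instance for an X3C instance.

    Returns (edges, terminals, budget), where budget = 4p counts
    p edges v-Ci plus 3p edges Ci-xj.
    """
    edges = [("v", ("C", i)) for i in range(len(C))]
    edges += [(("C", i), ("x", e)) for i, s in enumerate(C) for e in sorted(s)]
    terminals = {"v"} | {("x", e) for e in X}
    budget = 4 * (len(X) // 3)
    return edges, terminals, budget

def find_exact_cover(X, C):
    """Brute force: pick p = |X|/3 sets whose union is X.

    p sets of size 3 that cover 3p elements are necessarily disjoint.
    """
    p, remainder = divmod(len(X), 3)
    if remainder:
        return None
    for chosen in combinations(range(len(C)), p):
        if set().union(*(C[i] for i in chosen)) == set(X):
            return chosen
    return None

def tree_from_cover(chosen, C):
    """Steiner tree induced by a cover: v-Ci and Ci-x for the chosen sets."""
    return ([("v", ("C", i)) for i in chosen]
            + [(("C", i), ("x", e)) for i in chosen for e in sorted(C[i])])

# Hypothetical instance with p = 2.
X = {1, 2, 3, 4, 5, 6}
C = [{1, 2, 3}, {3, 4, 5}, {4, 5, 6}]
edges, terminals, budget = reduction_graph(X, C)
cover = find_exact_cover(X, C)
tree = tree_from_cover(cover, C)
print(cover, len(tree), budget)
```

The tree built from the cover uses exactly `budget` edges. In the other direction, any tree spanning the 3p + 1 terminals with t set nodes has 3p + t edges, so staying within 4p edges forces t = p and therefore disjoint sets.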