An economic approach for scalable and
highly-available distributed applications
Nicolas Bonvin, Thanasis G. Papaioannou and Karl Aberer
School of Computer and Communication Sciences
École Polytechnique Fédérale de Lausanne (EPFL)
1015 Lausanne, Switzerland
firstname.lastname@epfl.ch
2010 IEEE 3rd International Conference on Cloud Computing
1
Outline
• INTRODUCTION
• MOTIVATION - RUNNING EXAMPLE
• SCARCE: THE QUEST OF AUTONOMIC APPLICATIONS
• EVALUATION
• CONCLUSIONS
• COMMENT
2
INTRODUCTION
• A successful online application should
– be able to handle traffic spikes and flash crowds efficiently
– be resilient to all kinds of failures
(software, hardware, rack or even datacenter failures)
• A naive solution against load variations would be static overprovisioning of resources
– results in resource underutilization most of the time
• As the size of the cloud increases, its administrative overhead
becomes unmanageable
– The cloud resources for an application should be self-managed and
adaptive to load variations or failures
3
INTRODUCTION
• We propose a middleware
(“Scattered Autonomic Resources”, referred to as Scarce)
– to avoid underutilized computational resources
– to dynamically adapt to changing conditions (failures or load variations)
• Simplifies the development of online applications composed
of multiple independent components (e.g. web services)
– following the Service Oriented Architecture (SOA) principles
4
INTRODUCTION
• Components are treated as individually rational entities that
rent computational resources from servers
• Components migrate, replicate or stop according to their
economic fitness
– fitness expresses the difference between the utility offered by a
specific application component and the cost for retaining it in the
cloud
• Components of a certain application are dynamically
replicated to geographically-diverse servers according to the
availability requirements of the application
5
INTRODUCTION
• Our approach combines the following unique characteristics:
– Adaptive component replication for accommodating load variations
– Geographically-diverse placement of clone component instances
– Cost-effective placement of service components
– Decentralized self-management of the cloud resources for the application
6
RUNNING EXAMPLE
• Building an application that both provides robust guarantees against
failures (hardware, network, etc.) and handles dynamically load spikes is a
non-trivial task
• a simple web application for selling e-tickets, composed of 4 independent components:
– A web front-end: the entry point of the application that serves the HTML pages to end users
– A ticket manager managing the amount of available tickets of an event
– An e-ticket generator that produces the e-tickets in PDF format
– A user manager for managing the profiles of the customers
7
RUNNING EXAMPLE
• Each component can be regarded as a standalone and self-contained web service
– A token (or a session ID) is assigned to each customer’s browser by the web front-end and is passed to each component along with the requests
– This token is used as a key in the key-value database to store the details of the client’s shopping cart (number of tickets ordered)
• This application
– is highly sensitive to traffic spikes
– is business-critical (deployed on different geographical regions)
8
SCARCE
• THE QUEST OF AUTONOMIC APPLICATIONS:
– A. The approach
– B. Server agent
– C. Routing table
– D. Economic model
– E. Maintaining high-availability
9
The approach
• We consider applications formed by many
independent components
– interact together to provide a service to the end user
• A component
– is self-managing and self-healing, and is hosted by a server (allowed to host many different components)
– stops, migrates or replicates to a new server according to its load or availability
10
Server agent
• The server agent is a special component that resides at each
server
– responsible for starting and stopping the components of applications
– checking the health of the services (see the sketch below), by
1. verifying that the service process is still running
2. firing a test request and checking that the corresponding reply is correct
• The agent knows the properties of every service that
composes the application
– the path of the service executable
• This knowledge is acquired when the agent starts, by
contacting another agent (referred to as bootstrap agent)
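A minimal sketch of the two-step health check described above; the function names, the test URL and the restart logic are illustrative assumptions, not the paper's implementation:

```python
import subprocess
import urllib.request

def is_healthy(process, test_url, expected=b"OK"):
    """Illustrative two-step health check: process liveness plus a test request."""
    # 1. verify that the service process is still running
    if process is None or process.poll() is not None:
        return False
    # 2. fire a test request and check that the reply looks correct
    try:
        with urllib.request.urlopen(test_url, timeout=2) as reply:
            return expected in reply.read()
    except OSError:
        return False

def ensure_running(executable, process, test_url):
    """If the check fails, restart the component from its known executable path."""
    if is_healthy(process, test_url):
        return process
    return subprocess.Popen([executable])
```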
11
Server agent
• During the startup phase, the agent also retrieves the current
routing table from the bootstrap agent
• Assume that a server belongs to a rack, a room, a datacenter,
a city, a country and a continent.
• Each server is assigned a label of the form “continent-country-city-datacenter-room-rack-server”
– For example, a possible label for a server located in a data center in London could be “EU-UK-LON-D1-C03-R11-S07”
12
Routing table
• Keeps locally a mapping between components and servers
• It is maintained by a gossiping algorithm
– where each agent contacts a random subset (of size log(N), where N is the total number of servers) of remote agents
– and exchanges information about the services running on each server (see the sketch below)
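A rough sketch of one such gossip round, assuming each agent holds a routing table mapping component names to the set of servers hosting them; the data layout and the exchange callback are illustrative assumptions:

```python
import math
import random

def gossip_round(my_table, all_agents, exchange):
    """One gossip round: contact about log(N) random agents and merge routing tables.

    my_table   -- dict mapping component name -> set of server labels
    all_agents -- list of known remote agents
    exchange   -- callable(peer, table) that sends our table and returns the peer's table
    """
    if not all_agents:
        return my_table
    fanout = max(1, round(math.log(max(len(all_agents), 2))))
    for peer in random.sample(all_agents, fanout):
        remote_table = exchange(peer, my_table)           # the peer also learns our entries
        for component, servers in remote_table.items():   # merge what the peer knows
            my_table.setdefault(component, set()).update(servers)
    return my_table
```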
13
Choosing the replica
• We consider 4 different policies that a server s may use for choosing the replica of a component (see the sketch below)
– proximity-based policy:
the geographically nearest replica is chosen
– rent-based policy:
the least loaded server is chosen, based on the rent prices of the servers
– random-based policy:
a random replica is chosen
– net benefit-based policy:
the geographically closest and least loaded replica is chosen
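A minimal sketch of the four routing policies, assuming each candidate replica carries its server label (for a geographic distance) and its current rent (as a load proxy); the helper names and the net-benefit weighting are illustrative assumptions:

```python
import random

def geo_distance(label_a, label_b):
    """Count differing fields of two 'continent-country-city-datacenter-room-rack-server' labels."""
    return sum(1 for x, y in zip(label_a.split("-"), label_b.split("-")) if x != y)

def choose_replica(policy, my_label, replicas):
    """replicas: list of dicts such as {'label': 'EU-UK-LON-D1-C03-R11-S07', 'rent': 0.4}."""
    if policy == "proximity":   # the geographically nearest replica
        return min(replicas, key=lambda r: geo_distance(my_label, r["label"]))
    if policy == "rent":        # the least loaded server, using its rent price as a load indicator
        return min(replicas, key=lambda r: r["rent"])
    if policy == "random":      # a random replica
        return random.choice(replicas)
    # net benefit: trade proximity off against load (this particular weighting is an assumption)
    return min(replicas, key=lambda r: geo_distance(my_label, r["label"]) + r["rent"])
```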
14
Economic model
• Service replication should be highly adaptive to the processing
load and to failures
– the server agent acts as an individual optimizer for each component
1. to ascertain the pre-specified availability guarantees
2. to balance its economic fitness
• At each epoch, a service:
pays a virtual rent to the servers where it is running
– virtual rent corresponds to the usage of the server resources
(CPU, memory, network, disk)
may be replicated, migrated or stopped by the server agent
– based on the service demand, the renting cost and the maintenance
of high availability
15
Economic model
• The actions performed by the server agent are directly related
to the economic viability of a component, i.e. the difference between the utility it offers and the virtual rent it pays
• The utility of a component corresponds to the value it offers, computed from the following quantities:
– usage_c is a factor computed using the utilization of the server resources by the component c
– usage_threshold is a certain threshold that determines when a component should be considered fit enough to replicate (currently, this is set to 25% of server usage)
• Some components may be more business critical than others
16
Economic model
• The virtual rent paid by the component c to the server s depends on the following quantities:
– conf_s is a subjective estimation of the server quality and reliability
– usage_s is a factor that expresses the resource utilization of the server
• Other utility and rent functions could be used, as long as they are both increasing in the resource usage and result in comparable values (see the illustrative sketch below)
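The exact utility and rent formulas are not reproduced here; the sketch below only illustrates the stated requirement that both functions increase with resource usage and yield comparable values. The specific functional forms and constants are assumptions, not the paper's:

```python
def utility(usage_c, usage_threshold=0.25, criticality=1.0):
    """Illustrative utility: grows with the component's resource usage and its business
    criticality; usage_threshold marks when the component is 'fit enough' to replicate."""
    return criticality * (usage_c / usage_threshold)

def rent(usage_s, conf_s, base_price=1.0):
    """Illustrative rent: grows with the server's overall utilization, scaled by the
    confidence conf_s in the server's quality and reliability."""
    return base_price * conf_s * (1.0 + usage_s)

def balance(usage_c, usage_s, conf_s):
    """Economic fitness of a component on a server: utility minus rent."""
    return utility(usage_c) - rent(usage_s, conf_s)
```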
17
Migrate or Stop
• At the beginning of a new epoch, a component may:
• migrate or stop: if it has a negative balance for the last f epochs
– If the availability is satisfactory, the component stops
– Otherwise, it tries to find a less expensive server (migration)
• To avoid oscillations of a replica among servers, migration is only allowed if the following conditions hold:
– the minimum availability is still satisfied using the new server
– the absolute price difference between the current and the new server exceeds a certain threshold
– the usage_s of the current server s is above a soft limit
18
Replicate
• replicate: if it has a positive balance for the last f epochs
• For replication, a component also has to verify that it can afford the replication by having a positive balance b′ for f consecutive epochs:
– where rent_s′ is the current virtual rent of the candidate server s′ for replication
– the factor (1 + φ) accounts for a φ·100% increase in this rent price in the next epoch of the candidate server
(an upper bound of φ = 0.2 can typically be assumed)
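A compact sketch of the per-epoch decision described on this and the previous slide, assuming the agent tracks each component's balance history; the availability check and φ = 0.2 follow the slides, while the names and the simplified single-epoch affordability test are illustrative assumptions:

```python
PHI = 0.2  # assumed upper bound on the rent increase of a candidate server

def epoch_decision(balances, f, availability_satisfactory, utility_now, rent_candidate):
    """Return the action for a component at the start of an epoch."""
    recent = balances[-f:]
    if len(recent) < f:
        return "stay"                          # not enough history yet
    if all(b < 0 for b in recent):             # negative balance for the last f epochs
        if availability_satisfactory:
            return "stop"                      # availability still fine: the replica stops
        return "migrate"                       # otherwise: move to a less expensive server
    if all(b > 0 for b in recent):             # positive balance for the last f epochs
        # Only replicate if the component could still afford the candidate server,
        # whose rent may rise by up to PHI * 100% next epoch (single-epoch simplification).
        if utility_now - (1.0 + PHI) * rent_candidate > 0:
            return "replicate"
    return "stay"
```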
19
Availability
• The availability of a component should always be kept above a required minimum level th
– estimating the probability of each server to fail necessitates access to a large set of historical data and private information of the server
• We express the availability of a service i by the geographical diversity of the servers that host its replicas
• S_i = (s_1, s_2, …, s_n) is the set of servers hosting replicas of the service
• conf_i, conf_j ∈ [0, 1] are the confidence levels of the servers
20
Availability
• The diversity function returns a number calculated based on the geographical distance between each pair of servers
– This distance can be represented as an n-bit number
(continent, country, city, data center, room, rack, server)
• When two servers have the same location field, the corresponding proximity bit is set to 1, otherwise to 0
• A binary NOT operation is then applied to the proximity to get the diversity value (see the sketch below)
• Having more replicas in distinct servers, even located in the same location, always results in increased availability
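A small sketch of this bitwise diversity computation on the server labels introduced earlier; the field order follows the slides, while the integer encoding is an assumption:

```python
FIELDS = 7  # continent, country, city, datacenter, room, rack, server

def diversity(label_a, label_b):
    """Compare two labels such as 'EU-UK-LON-D1-C03-R11-S07' field by field:
    a proximity bit is 1 when the fields match, then the bits are inverted (NOT)
    so that more distant servers yield a larger diversity value."""
    proximity = 0
    for x, y in zip(label_a.split("-"), label_b.split("-")):
        proximity = (proximity << 1) | (1 if x == y else 0)
    return ~proximity & ((1 << FIELDS) - 1)   # bitwise NOT restricted to FIELDS bits

# Example: servers in the same datacenter but different racks
# diversity('EU-UK-LON-D1-C03-R11-S07', 'EU-UK-LON-D1-C03-R12-S01') -> 0b0000011 = 3
```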
21
Candidate server
• When the availability of a component falls below th, a new service instance should be started (i.e. replicated)
– maximize the net benefit between the diversity of the resulting set of replica locations for the service and the virtual rent of the new server
– where rent_j is the virtual rent of the candidate server j
– g_j is a weight related to the proximity of the server location to the geographical distribution of the client requests for the service (cf. [3])
(so services tend to be replicated closer to the components that heavily rely on them)
[3] N. Bonvin, T. G. Papaioannou, and K. Aberer, “Cost-efficient and differentiated data availability guarantees in data clouds,” in Proc. of ICDE, Long Beach, CA, USA, 2010.
22
Candidate server
• The components rank servers according to the net benefit (6)
– and randomly choose the target for replication among the top-k ones
(to avoid overloading the same destination server in an epoch; see the sketch below)
• The availability tends to be increased as much as possible at the minimum cost, while the network latency for the query reply also decreases
• The same approach according to (6) is used for choosing the candidate server for component migration
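A sketch of this ranking and top-k random selection, reusing the diversity helper from the earlier sketch; the net-benefit form (diversity gain weighted by g_j minus rent_j) follows the description of (6) on the previous slide, while the data layout and helper names are illustrative assumptions:

```python
import random

def pick_replication_target(current_labels, candidates, k=3):
    """candidates: list of dicts like
    {'label': 'EU-UK-LON-D1-C03-R11-S07', 'rent': 0.4, 'g': 0.8}
    where 'g' weights proximity to the client request distribution (cf. [3])."""
    def diversity_gain(label):
        # additional diversity obtained by adding this server to the current replica set
        return sum(diversity(label, existing) for existing in current_labels)

    def net_benefit(c):
        return c["g"] * diversity_gain(c["label"]) - c["rent"]

    ranked = sorted(candidates, key=net_benefit, reverse=True)
    # pick randomly among the top-k to avoid every component targeting the same server
    return random.choice(ranked[:k])
```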
23
Experimental Setup
• We employ two different testbed settings:
– a single application setup consisting of 7 servers
– a multi-application setup consisting of 15 servers
• The hardware specification of each server is Intel Core i7 920
@ 2.67 GHz, 8GB Ram, Linux 2.6.32-trunk-amd64
• We run two databases (MySQL 5.1 and Cassandra 0.5.0)
• One generator of client requests per application
(FunkLoad 1.10, http://funkload.nuxeo.org/), each running on its own dedicated server
24
Experimental Setup
• Each client performs the following actions:
1) request the main page that contains the list of entertainment events
2) request the details of an event A
3) request the details of an event B
4) request again the details of the event A
5) login into the application and view the user account
6) update some personal information
7) buy a ticket for the event A
8) download the corresponding ticket in PDF
• A client continuously performs this list of actions over a period
of 1 minute
25
Experimental Setup
• An epoch is set to 15 seconds
• An agent sends gossip messages every 5 seconds
• We consider two different placements of the components:
– A static approach:
each component is assigned to a server by the system administrator
– A dynamic approach:
all components are started on a single server and dynamically migrate / replicate / stop according to the load or to hardware failures
26
Dynamic vs Static Replica Placement
• The response time is lower bounded by that of the slowest component
(in our case, the one generating PDF tickets)
27
Scalability
• We employ the multi-application experimental setup
• Assume that all 10 servers reside in 1 datacenter
• The number of concurrent users is increased from 150 to 1500
• Requests are randomly routed among the replicas of a component
28
High-availability
• We employ the single-application setup consisting of 7 servers
• Assume that each component has 2 replicas that reside on separate servers
• 10 concurrent clients continuously send requests for 1 minute
• After 30 seconds, one random server among those hosting the replicas of a component fails
29
Adaptation to New Cloud Resources
• We employ the single-application experimental setup, but the number of available servers in the cloud ranges from 1 to 10
30
Evaluation of Routing Policies
• In this case, we employ the single-application setup
• The 4 servers of the cloud are located in 2 datacenters (2 servers per datacenter)
• The round-trip time between the datacenters is 50 ms
• The minimum availability (i.e. the number of replicas per component) is set to 2
31
CONCLUSIONS
• Proposed an economic approach for dynamic accommodation
of load spikes for composite web services deployed in clouds
• Server agents act as individual optimizers for the components, which autonomously replicate, migrate or stop based on their economic fitness
• This approach also offers high-availability guarantees by maintaining a certain number of replicas of the various components in geographically diverse locations
32
COMMENT
• Rents vs. Resources
– Using rents is a more flexible approach
• Distributed vs. Centralized
33