A Survey on Cloud Data Center Load Balancing Algorithms

International Journal of Research in Engineering Technology and Management
ISSN 2347 - 7539
Sonpreet Juneja1, Abhay Kothari2
1 Final Year M.Tech Scholar, CSE Department, Acropolis Institute of Technology & Research, Affiliated to Rajiv Gandhi Technical University, Bhopal, Madhya Pradesh, India, sonkaur24@gmail.com
2 Professor, CSE Department, Acropolis Institute of Technology & Research, Affiliated to Rajiv Gandhi Technical University, Bhopal, Madhya Pradesh, India, abhaykothari@acropolis.in
Abstract
Cloud Computing is an emerging technology in parallel and distributed computing that requires a large amount of infrastructure and resources. To serve the needs of clients all over the world optimally, providers have deployed data centers at different geographical locations. Load balancing is one of the key issues in cloud computing. A load can be a CPU load, memory capacity or network load. Load balancing is the process of distributing the workload equally across all servers in order to improve resource utilization and response time. Such algorithms mainly aim at avoiding a state where some nodes are heavily loaded while others are lightly loaded or idle. This paper presents a cloud perspective and a survey of different load balancing algorithms in a cloud environment.
Keywords: Load, Cloud Computing, Load balancing, Load balancing algorithm.
--------------------------------------------------------------------***----------------------------------------------------------------------
1. INTRODUCTION
Cloud Computing is an aggregation of two functional terms. First is “Cloud”, which is a pool of heterogeneous resources, and second is “Computing”, which is a mesh of large infrastructure through which applications are delivered to end users as services over the internet and data centres [1]. Cloud computing has come into existence as the next-generation platform that meets the huge demand for renting computing capacity to host applications with variable workloads and performance requirements.
The word “cloud” is used as a metaphor for the internet, but when it combines with “computing”, the meaning gets bigger and fuzzier. Cloud computing enables a wide range of users to access virtualized hardware or software infrastructure distributed over the internet on demand, and it is gaining momentum. Services provided by the cloud are Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS). The end users of the cloud share processing power, storage space, bandwidth, memory and software. These services are provided to customers on a pay-as-you-use basis regardless of their location.
1.1 Cloud Computing Elements
In the taxonomy of cloud computing, the different participants involved in the cloud, along with their attributes and technologies, are coupled to address their needs and services. Cloud is a kind of virtualization that abstracts the coupling between hardware and operating systems. The three main components of the cloud are data centers, clients and distributed servers. Cloud computing is used to host services that run client–server software at remote locations. They are represented as follows:
 Clients
 Distributed Servers
 Data Centers
Fig-1: Cloud Computing Elements
1.1.1 Clients: The end users interact with clients to access the various services provided by the cloud. Clients generally fall into three categories:
Mobile: Users access cloud computing through networked client devices such as laptops, tablets and smartphones running, for example, Android or iOS.
Thin: Such clients do not perform any computational work; they only display information. Servers do all the work for them. Thin clients do not have any internal memory.
Thick: Thick clients, sometimes called heavy or fat clients, are basically personal computers with their own operating system, storage and the ability to execute their own programs.
1.1.2 Data Centers: A data center is defined as a
centralized repository for the storage and management of
data and information.
_______________________________________________________________________________________
Volume: 02 Issue: 05 | Sep-2014, Available @ http://www.ijretm.com | Paper id - IJRETM-2014-02-05-014
A data center is a collection of servers hosting
different applications. End users contact data centers to access cloud services; a data center may be located at a large distance from the client.
1.1.3 Distributed Servers: Distributed servers are the different functional units of the cloud that together host internet applications. They create virtualization and transparency for users, who feel that they are accessing cloud services directly on their own machines.
1.2 Cloud Deployment Models
Cloud services can be deployed in different ways depending
upon domain or environment in which clouds are used;
clouds can be divided into three categories:
1.2.1 Public Clouds: In public clouds, the services provided over a network are open for public use on a pay-per-usage basis. Clients do not need to purchase hardware to get the service, and they can scale their use on demand.
1.2.2 Private Clouds: This cloud’s infrastructure is operated by a single organization; resources are deployed inside a firewall and managed by the client’s organization, which has full control over corporate data, security issues and system performance.
1.2.3 Community Cloud: In a community cloud,
organizations with common requirements of cloud resources
share a common infrastructure. Such a cloud is a
generalization of a private cloud.
1.2.4 Hybrid Cloud: A hybrid cloud is a combination of public and private clouds, possibly from different service providers. It allows clients to extend the capacity of their cloud service by integrating with another cloud service.
1.3 Virtualization
In cloud computing, virtualization means creating a transparent abstraction of device resources such as a server, storage, network or even an operating system. Virtualization presents something which is not real but provides all the facilities of the real resource. It enables end users to use the different services of a cloud. Essentially, it differs from cloud computing in that virtualization is software that manipulates hardware, while cloud computing refers to a service that results from that manipulation. Virtualization is of two types:
Full Virtualization is a complete abstraction between the underlying physical machine and the different applications of the cloud, which makes it highly secure. It is used for isolating users from each other and enables hardware on another machine to be accessed.
Para Virtualization is a virtualization technique that provides virtual machines with an interface similar to their underlying hardware. In this virtualization the services are provided partially.
Fig-2: Virtualization of cloud computing [8]
1.4 Cloud Perspectives [5]
Cloud has a different meaning for different stakeholders. There are three main stakeholders of the cloud:
1.4.1 End Users: These are the customers or consumers of the cloud. They use the various services (Infrastructure, Software and Platform) provided by the cloud. They use these services on demand and pay depending upon their usage.
Table-1: Cloud Perspectives
Type of stakeholder | Requirements
End Users           | Security; Provenance; Privacy; High Availability; Reduced Cost; Ease Of Use
Cloud Provider      | Managing Resources; Outsourcing; Resource Utilization; Energy Efficiency; Metering; Cost Efficiency; Meet end user requirements; Utility Computing
Cloud Developer     | Elasticity/Scalability; Virtualization; Agility and Adaptability; Data Management; Reliability; Programmability
1.4.2 Cloud Provider: Cloud providers can be public, private or hybrid. They are responsible for building the
cloud. They must accomplish their job of “Resource Provisioning”, which includes managing the huge bundle of resources that makes up the cloud and providing these resources to end users.
1.4.3 Cloud Developer: This entity lies between the end user and the cloud provider. It has the responsibility of taking both perspectives of the cloud into consideration.
2. LOAD BALANCING IN CLOUD COMPUTING
Load balancing in cloud computing provides efficient resource provisioning as well as scheduling of resources. Resource provisioning is defined as making resources available to users to meet their requirements, and scheduling of resources defines the manner in which an allocated resource is made available to end users. Load balancing is the concept of distributing workloads equally across all servers, such as the machines of a computer cluster, CPUs and disk drives. It ensures that every node in the computational network performs an equal amount of work at any instant of time. The availability of resources and the need to improve job response time are growing concerns in the IT industry. Load balancing is also fundamental to availability, i.e. the time that the system is up and running correctly and the length of time between failures. It is provided by dedicated software or hardware.
A real-life example of load balancing is the surfing of websites. Without an efficient load balancing algorithm, users experience delays, timeouts and long response times. With redundant servers, if one server fails, requests are forwarded to another server with spare capacity. Thus load balancing aims at optimizing resource use, maximizing throughput, minimizing response time and avoiding overload.
3. WHY IS LOAD BALANCING REQUIRED?
Load is the measure of the amount of work that a computation system performs. The random arrival of load in a cloud environment can unbalance the load on servers. Load balancing is essential because equal load distribution enhances system performance by transferring load from heavily loaded to lightly loaded servers. Also, efficient resource allocation and scheduling are critical characteristics of cloud computing on which the whole system’s performance is judged. In clouds, load balancing is a method applied across different data centers to overcome resource limitations and the failure to scale up with increased demands.
A load balancer is a device that acts as a proxy and distributes network traffic across a number of servers [11]. Various features of load balancers are:
1. Asymmetric Load: The load balancer accounts for asymmetry in the system through a manually assigned ratio, so that some servers receive a greater share of the workload than others.
2. Priority Activation: A cloud computation system suffers when the number of requests becomes greater than the number of available servers, and clients then face downtime. To overcome this, standby servers are brought online.
3. SSL Offload: A Secure Socket Layer (SSL) certificate provides authentication between server and client in a web application. As the workload increases, the authentication work on the web servers becomes a major burden. To overcome this, the balancer terminates SSL connections and sends plain HTTP requests to the web servers. Since SSL is then processed on a single balancer, the balancer itself can become a new bottleneck.
4. HTTP Compression: This is the capability of client and server to make better use of available bandwidth and reduce response time by utilizing gzip compression.
5. Checking: The balancer polls the servers, checks the amount of load on each server and removes failed servers from the pool.
6. Traffic Manipulation: There should be at least one balancer which allows the use of a scripting language to incorporate custom balancing methods and traffic manipulations.
7. Firewall: To ensure security, a firewall is used. A firewall is a set of rules that decides whether traffic may pass through it or not. It also prevents direct connections to backend servers.
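The checking feature above (polling servers and evicting failed ones from the pool) can be sketched as follows. This is a minimal illustration, not an implementation from the paper; the `Server` and `ServerPool` classes and the `is_alive()` probe are hypothetical stand-ins for a real health probe such as a TCP connect or HTTP GET.

```python
# Sketch of a load balancer's health-checking step (feature 5 above).

class Server:
    def __init__(self, name, alive=True, load=0):
        self.name = name
        self.alive = alive
        self.load = load

    def is_alive(self):
        # Stand-in for a real probe (TCP connect, HTTP GET, ...).
        return self.alive

class ServerPool:
    def __init__(self, servers):
        self.servers = list(servers)

    def health_check(self):
        # Poll every server and keep only the ones that respond.
        self.servers = [s for s in self.servers if s.is_alive()]

    def least_loaded(self):
        # Forward the next request to the server carrying the least load.
        return min(self.servers, key=lambda s: s.load)

pool = ServerPool([Server("s1", load=5), Server("s2", alive=False), Server("s3", load=2)])
pool.health_check()                    # the failed server s2 is removed
target = pool.least_loaded()           # s3 currently carries the least load
print([s.name for s in pool.servers])  # ['s1', 's3']
print(target.name)                     # s3
```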
4. TYPES OF LOAD BALANCING ALGORITHMS [5]
Load balancing algorithms follow two major classifications: depending upon how the load is distributed and the resources are allocated to nodes (system load), and depending upon the information status of the nodes (system topology). The important parameters to be considered while developing such an algorithm are the estimation of load, comparison of loads, performance of the system and interaction between nodes. The classification is represented as follows:
Load balancing algorithm
 Static
 Dynamic
   Centralized
   Distributed
Fig-3: Types of load balancing algorithms
4.1 Static: In a static environment, the cloud provider installs homogeneous resources, and the load balancing algorithm divides the traffic equally among the servers. Such an algorithm requires prior knowledge of system resources, so the decision of transferring load does not depend on the current state of the system. Static algorithms are easy to simulate but not well suited to heterogeneous environments, and they cannot adapt to changing loads and user requirements.
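A static scheme of this kind can be sketched as a fixed, capacity-weighted split of incoming requests, decided once from prior knowledge of the servers and never revisited at run time. The server names and capacities below are made-up illustrative values:

```python
# Sketch of a static load-balancing split: requests are divided in fixed
# proportion to server capacities known in advance; the assignment never
# consults the servers' current state.

def static_split(total_requests, capacities):
    """Return requests per server, proportional to fixed capacities."""
    total_capacity = sum(capacities.values())
    shares = {}
    assigned = 0
    for name, cap in capacities.items():
        shares[name] = total_requests * cap // total_capacity
        assigned += shares[name]
    # Hand any rounding remainder to the first server.
    first = next(iter(shares))
    shares[first] += total_requests - assigned
    return shares

# Hypothetical cluster: capacities fixed at provisioning time.
print(static_split(100, {"s1": 2, "s2": 1, "s3": 1}))
# {'s1': 50, 's2': 25, 's3': 25}
```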
Fig-4: Load Balancing Approach
4.2 Dynamic: In a dynamic load balancing algorithm, the most lightly loaded server in the whole network is searched for and preferred for balancing the load; this requires communication with the network. Here the current state of the system is used to manage the load, and run-time statistics are monitored to adapt to changes in load requirements. In a dynamic environment the cloud provider installs heterogeneous resources; such algorithms are therefore more difficult to simulate, but they are highly adaptable in a cloud computing environment. A dynamic load balancing algorithm may be centralized or distributed:
1. Centralized Approach: In this approach, a single server node is responsible for making the assignment decisions and for managing the distribution within the whole system. It is useful only in small networks with low loads, because this approach is not fault tolerant.
2. Distributed Approach: In this approach, each distributed node builds its own load vector by collecting load information from the other nodes, and decisions are made locally and independently. Distributed algorithms are complex in nature, and communication overhead also occurs.
5. EXISTING LOAD BALANCING ALGORITHMS
5.1 Ant Colony Optimization
Ant colony optimization is a probabilistic technique for solving computational problems, first proposed by Marco Dorigo in his PhD thesis in 1992. As a load balancing algorithm it monitors the capability of the potentially available resources and then analyzes parameters such as response time. The algorithm is based on an analogy in which ants and their food correspond to requests and web services. When an ant moves, it releases a chemical called a “pheromone”, which attracts other ants to follow the same path. In the context of cloud computing, the aim of ant colony optimization is to find an optimal path in a graph that depicts the behavior of ants seeking the path between their colony and a source of food, i.e. the web services.
AntColonyOptimization()
{
    Initialize the pheromone table
    Declare a threshold load value for all nodes
    For each ant moving through the nodes:
        If load < threshold
            Traverse the node with maximum pheromone
        Else
            Traverse the node with minimum pheromone
        Update the pheromone table
    Reassign resources if a node is under- or over-loaded
}
5.2 Round Robin Algorithm
This algorithm uses round-robin scheduling of processes and mainly focuses on distributing the load equally to all nodes. The scheduler allocates one virtual machine (VM) to each node in a cyclic manner; the process is repeated until every node has been allocated at least one VM, and then it returns to the first node again. The idea behind round robin is that multiple IP addresses are assigned to a single domain name, and clients are expected to choose the servers in turn.
Fig-5: Round Robin Process (processes 1-5 served in cyclic order)
Advantages:
 It utilizes all the resources in a balanced order.
 All the nodes are assigned an equal number of VMs, which ensures fairness.
Disadvantages:
 It does not consider the current load on each VM.
 It requires an additional load vector to decide the load distribution.
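The cyclic allocation described for Round Robin can be sketched as follows; the node and VM names are invented for illustration:

```python
# Sketch of round-robin VM allocation: VMs are handed to nodes in strict
# cyclic order, ignoring how loaded each node currently is.
from itertools import cycle

def round_robin_allocate(nodes, vms):
    """Assign each VM to the next node in cyclic order."""
    assignment = {n: [] for n in nodes}
    node_cycle = cycle(nodes)
    for vm in vms:
        assignment[next(node_cycle)].append(vm)
    return assignment

nodes = ["node1", "node2", "node3"]
vms = ["vm1", "vm2", "vm3", "vm4", "vm5"]
print(round_robin_allocate(nodes, vms))
# {'node1': ['vm1', 'vm4'], 'node2': ['vm2', 'vm5'], 'node3': ['vm3']}
```

Note that the assignment depends only on arrival order, which is exactly why the scheme is fair in VM count but blind to each node's actual load.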
5.3 Load Balancing Min-Min (LBMM)
LBMM is a static load balancing algorithm [7]. The main aim of the Min-Min algorithm is to minimize the execution and completion time of all tasks by managing a completion-time matrix and minimizing the overall makespan. It first finds the minimum execution time of every task and then chooses the smallest among them. The algorithm then assigns that task to the resource that gives it the minimum completion time. The completion times are updated, and the task is removed from the task list. The process is repeated until all tasks are scheduled.
Min-Min()
{
    Generate the completion-time matrix for the tasks in the task list
    While unscheduled tasks remain
    {
        Find the minimum completion time in the matrix
        Assign that task to the corresponding VM
        Update the completion times
        Remove the task from the task list
    }
}
Advantage:
In today’s computational world, where the number of small tasks exceeds the number of large tasks, this algorithm achieves better performance.
Disadvantages:
This approach can lead to starvation. It does not distinguish between low- and high-capacity machines and cannot handle heterogeneous tasks efficiently.
5.4 Active Clustering
Active clustering is a dynamic load balancing algorithm. It works on the principle of grouping similar nodes together and working on these groups, introducing the concept of clustering into cloud computing. A node initiates the process and selects another node, called the matchmaker node, from its neighbours, with the criterion that it must be of a different type than the initiator. The matchmaker node then forms a connection between the initiator and one of the matchmaker’s own neighbours that is of the same type as the initiator. Finally, the matchmaker node detaches the connection between itself and the initiator, and the process is repeated iteratively. The performance of the system is enhanced due to the efficient utilization of resources. Such an algorithm finds application in cluster-based multimedia web servers that dynamically generate video units to satisfy the bit rate and bandwidth requirements of clients.
Advantages:
 It supports heterogeneity.
 It is scalable.
 It lowers network congestion.
 There is no bottleneck, owing to its decentralized nature.
Disadvantages:
 Performance degrades as system diversity increases.
6. COMPARISON OF LOAD BALANCING ALGORITHMS
Load balancing algorithms are compared on the basis of the following metrics:
1. Throughput: Throughput is the total number of tasks that have completed execution. A high throughput is required for better performance of a system.
2. Overhead: Overhead refers to the cost associated with executing the algorithm; it should be minimized for a successful implementation of load balancing.
3. Fault Tolerance: A load balancing algorithm must be reliable, i.e. it should continue performing uniformly even when a server fails. It should be highly fault tolerant.
4. Migration Time: This is the time taken to transfer a task from one machine to another. It should be minimal in order to improve the performance of the system.
5. Response Time: Response time is the interval between sending a request and receiving its response. It should be minimized to boost the system’s performance.
6. Resource Utilization: This is the degree to which the resources of the system are utilized. A good load balancing algorithm must provide maximum resource utilization.
7. Scalability: A system must be highly scalable, i.e. when the number of nodes is increased or decreased according to requirements, the overall system performance should not degrade.
8. Performance: Performance is the measure of the efficiency of the system. It should be improved at a reasonable cost.
Table-2: Comparison of existing load balancing algorithms
Metrics/Techniques      | Throughput | Overhead | Fault Tolerant | Migration Time | Response Time | Scalability
Ant Colony Optimization | NO         | YES      | NO             | NO             | YES           | NO
Round Robin Algorithm   | YES        | YES      | NO             | NO             | YES           | YES
Load Balancing Min-Min  | NO         | NO       | NO             | NO             | YES           | YES
Active Clustering       | NO         | YES      | NO             | NO             | YES           | YES
All algorithms have their own pros and cons. The table above compares the different load balancing algorithms on these parameters; performance can be improved by varying them.
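To make the matchmaker interaction of Active Clustering (Section 5.4) concrete, one rewiring step can be sketched as follows. The node types, names and neighbour lists are invented for illustration only:

```python
# Sketch of one Active Clustering step: an initiator picks a matchmaker
# (a neighbour of a different type), the matchmaker links the initiator to
# one of its own neighbours of the initiator's type, then the
# matchmaker-initiator link is dropped.

class Node:
    def __init__(self, name, kind):
        self.name = name
        self.kind = kind
        self.neighbours = set()

def connect(a, b):
    a.neighbours.add(b)
    b.neighbours.add(a)

def disconnect(a, b):
    a.neighbours.discard(b)
    b.neighbours.discard(a)

def active_clustering_step(initiator):
    # Pick a matchmaker: any neighbour of a different type.
    matchmakers = [n for n in initiator.neighbours if n.kind != initiator.kind]
    if not matchmakers:
        return
    matchmaker = matchmakers[0]
    # The matchmaker introduces a same-type neighbour of its own.
    peers = [n for n in matchmaker.neighbours
             if n.kind == initiator.kind and n is not initiator]
    if not peers:
        return
    connect(initiator, peers[0])
    # Finally the matchmaker detaches itself from the initiator.
    disconnect(initiator, matchmaker)

a = Node("a", "web")   # initiator
m = Node("m", "db")    # matchmaker candidate: different type
b = Node("b", "web")   # same type as the initiator
connect(a, m)
connect(m, b)
active_clustering_step(a)
print(sorted(n.name for n in a.neighbours))  # ['b']
```

Repeated across the network, this step gradually clusters same-type nodes together, which is the decentralized behaviour the comparison above credits with avoiding a bottleneck.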
7. CONCLUSION
Load balancing is one of the most important aspects of cloud computing. It helps distribute the dynamic workload across all the nodes in order to achieve higher user and resource satisfaction. This paper first surveys the basic fundamentals of cloud computing. Secondly, it focuses on the concept of load balancing and on different load balancing algorithms in the cloud environment, comparing them on parameters such as throughput, overhead, response time and resource utilization, as stated in Table-2. The paper thereby provides information for evaluating and improving existing and new load balancing scenarios in cloud computing systems.
8. REFERENCES
[1] Jaspreet Kaur, “Comparison of load balancing algorithms in a Cloud”, International Journal of Engineering Research and Applications, 2009.
[2] Ram Prasad Padhy, Goutam Prasad Rao, “Load balancing in cloud computing systems”, Department of Computer Science and Engineering, National Institute of Technology, Rourkela-769 008, Orissa, India, May 2011.
[3] Akshay Daryapurkar, V. M. Deshmukh, “Efficient Load Balancing Algorithm in Cloud Environment”, International Journal of Computer Science and Applications, Apr 2013.
[4] Martin Randles, David Lamb, A. Taleb-Bendiab, “A Comparative Study into Distributed Load Balancing Algorithms for Cloud Computing”, 2010 IEEE 24th International Conference on Advanced Information Networking and Applications Workshops.
[5] Mayanka Katyal, Atul Mishra, “Comparative Study of Load Balancing Algorithms in Cloud Computing Environment”, http://www.publishingindia.com.
[6] Ranjan Kumar, G. Sahoo, K. Mukherjee, “Performance Analysis of Cloud Computing Using Ant Colony Optimization Approach”, International Journal of Innovative Research in Science, Engineering and Technology, Vol. 2, Issue 6, June 2013.
[7] Aayush Agrawal, Manish G, Raje Neha Milind, Shylaja S S, “A Survey of Cloud Based Load Balancing Techniques”, Proceedings of Int. Conf. on Electrical, Electronics, Computer Science & Mechanical Engg., 27th April 2014, Bangalore, India.
[8] Nusrat Pasha, Amit Agrawal, Ravi Rastogi, “Round Robin Approach for VM Load Balancing Algorithm in Cloud Computing Environment”, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 4, Issue 5, May 2014.
[9] Zenon Chaczko, Venkatesh Mahadevan, Shahrzad Aslanzadeh, Christopher Mcdermid, “Availability and Load Balancing in Cloud Computing”, 2011 International Conference on Computer and Software Modeling, IPCSIT vol. 14, IACSIT Press, Singapore, 2011.
[10] Bhaskar Prasad Rimal, Eunmi Choi, Ian Lumb, “A Taxonomy and Survey of Cloud Computing Systems”, 2009 Fifth International Joint Conference on INC, IMS and IDC.
[11] www.wikipedia.com