DYNAMIC LOAD BALANCING IN WEB SERVERS & PARALLEL COMPUTERS
By
Vidhya Balasubramanian
• Dynamic Load Balancing on Highly Parallel Computers
- dynamic balancing schemes which seek to minimize total execution
time of a single application running in parallel on a multiprocessor
system
1. Sender Initiated Diffusion (SID)
2. Receiver Initiated Diffusion (RID)
3. Hierarchical Balancing Method (HBM)
4. Gradient Model (GM)
5. Dimension Exchange Method (DEM)
• Dynamic Load Balancing on Web Servers
- dynamic load balancing techniques in distributed Web-server architectures, which schedule client requests among multiple nodes in a transparent way
1. Client-based approach
2. DNS-Based approach
3. Dispatcher-based approach
4. Server-based approach
Load balancing on Highly Parallel computers
• load balancing is needed to solve non-uniform problems on
multiprocessor systems
• load balancing aims to minimize the total execution time of a single application running in parallel on a multicomputer system
• General Model for dynamic load balancing includes four phases
* process load evaluation
* load balancing profitability determination
* task migration strategy
* task selection strategy
• the 1st and 4th phases are application dependent and hence can be handled independently
• load balancing overhead includes:
- communication costs of acquiring load information
- communication costs of informing processors of load migration decisions
- processing costs of evaluating load information to determine task transfers
Issues in DLB Strategies
1. Sender or Receiver initiation of balancing
2. Size and type of balancing domains
3. Degree of knowledge used in the decision process
4. Overhead, distribution and complexity
General DLB Model
• Assumption – each task is estimated to require equal computation time
• process load evaluation – count of number of tasks pending execution
• task selection is simple: no distinction is made between tasks
• inaccurate estimates of task requirements lead to unbalanced load distributions
• the imbalance is detected in phase 2, and an appropriate migration strategy is devised in phase 3
• centralized vs. distributed approach:
• centralized: more accurate, with a high degree of knowledge, but requires synchronization, which incurs overhead and delay
• distributed: less accurate, but with less overhead
Load Balancing Terminology
Load Imbalance Factor (f(t)):
It is a measure of the potential speedup obtainable through load balancing at time t.
It is defined as the difference between the maximum processor loads before and after load balancing, Lmax and Lbal respectively:
f(t) = Lmax - Lbal
Profitability:
Load balancing is profitable if the savings exceed the load balancing overhead, Loverhead, i.e.,
f(t) > Loverhead
Simplifying assumption: once a processor's load drops below a preset threshold, Koverhead, any balancing will improve system performance
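As a small worked illustration of f(t) and the profitability test, the sketch below computes the imbalance factor from per-processor task counts. It assumes, purely for illustration, that Lbal is the best achievable maximum load when all tasks cost the same (total tasks divided by the number of processors, rounded up) and that Loverhead is expressed in the same units; the function names are hypothetical.

```python
import math
from typing import List


def imbalance_factor(loads: List[int]) -> int:
    """f(t) = Lmax - Lbal for loads given as pending-task counts.

    Assumption (not from the slides): Lbal is the ideal balanced maximum,
    ceil(total / P), since every task is estimated to need equal time.
    """
    l_max = max(loads)
    l_bal = math.ceil(sum(loads) / len(loads))
    return l_max - l_bal


def is_profitable(loads: List[int], l_overhead: int) -> bool:
    """Balancing pays off only if the saving f(t) exceeds the overhead."""
    return imbalance_factor(loads) > l_overhead


loads = [12, 3, 5, 4]              # pending tasks on 4 processors
print(imbalance_factor(loads))     # 6  (12 - ceil(24 / 4))
print(is_profitable(loads, 2))     # True
```

With perfectly even loads, f(t) is 0 and no positive overhead can be recovered, so balancing is never triggered.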
Balancing Domains: the system is partitioned into individual groups of processors
Larger domains: more accurate migration strategies; smaller domains: reduced complexity
Gradient Model
• Underloaded processors inform other processors in the system of their state, and overloaded processors respond by sending a portion of their load to the nearest lightly loaded processor
• Threshold parameters: Low-Water-Mark (LWM) and High-Water-Mark (HWM)
• A processor's state is light if its load is less than LWM, and heavy if it is greater than HWM
• Proximity of a processor: the shortest distance from itself to the nearest lightly loaded node in the system
• wmax: the initial proximity, equal to the diameter of the system
• A processor's proximity is 0 if its state becomes light
• Proximity of a processor p with neighbors n_i is computed as:
$$\text{proximity}(p) = \min_i \big(\text{proximity}(n_i)\big) + 1$$
• Load balancing from p to q is profitable if:
$$L_p - L_q > HWM - LWM$$
• Complexity:
1. May perform inefficiently when too much or too little work is sent to an underloaded processor
2. In the worst case an update requires N log N messages (dependent on network topology)
3. Since the ultimate destination of migrating tasks is not explicitly known, intermediate processors must be interrupted to do the migration
4. The proximity map might change during a task's migration, altering its destination
[Figure: Gradient Model proximity map with overloaded, moderately overloaded and underloaded processors]
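To make the proximity recurrence concrete, here is a minimal sketch that relaxes proximity(p) = min_i(proximity(n_i)) + 1 over a neighbor graph until nothing changes, starting every non-light processor at wmax. The graph, loads, LWM/HWM values and the helper names (`gm_proximities`, `gm_next_hop`, `gm_profitable`) are illustrative assumptions; the real Gradient Model updates the map asynchronously through messages rather than in a single loop.

```python
from typing import Dict, List


def gm_proximities(neighbors: Dict[int, List[int]], loads: Dict[int, int],
                   lwm: int, w_max: int) -> Dict[int, int]:
    """Iterate proximity(p) = min_i proximity(n_i) + 1 to a fixed point.

    Light processors (load < LWM) have proximity 0; all others start at
    w_max (the system diameter) and are capped there.
    """
    prox = {p: 0 if loads[p] < lwm else w_max for p in neighbors}
    changed = True
    while changed:
        changed = False
        for p, nbrs in neighbors.items():
            if loads[p] < lwm:
                continue  # light processors stay at proximity 0
            new = min(w_max, min(prox[n] for n in nbrs) + 1)
            if new != prox[p]:
                prox[p] = new
                changed = True
    return prox


def gm_next_hop(p: int, neighbors: Dict[int, List[int]],
                prox: Dict[int, int]) -> int:
    """Hypothetical helper: move a task one hop 'downhill' on the gradient."""
    return min(neighbors[p], key=lambda n: prox[n])


def gm_profitable(l_p: int, l_q: int, hwm: int, lwm: int) -> bool:
    """Migration from p to q is profitable if L_p - L_q > HWM - LWM."""
    return l_p - l_q > hwm - lwm


# 4-processor chain 0-1-2-3 (diameter 3); only processor 3 is light
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
loads = {0: 10, 1: 9, 2: 8, 3: 1}
prox = gm_proximities(neighbors, loads, lwm=3, w_max=3)
print(prox)                             # {0: 3, 1: 2, 2: 1, 3: 0}
print(gm_next_hop(0, neighbors, prox))  # 1
```

Note how a task leaving processor 0 only knows its next hop, not its final destination, which is exactly why intermediate processors must be interrupted during migration.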
Sender Initiated Diffusion
• Local, near-neighbor diffusion approach that employs overlapping balancing domains to achieve global balancing
• Balancing is performed when a processor receives a load update message from a neighbor indicating that the neighbor's load l_k < L_low, where L_low is a preset threshold
• Average load in the domain of processor p with K neighbors:
$$\bar{L}_p = \frac{1}{K+1}\left(l_p + \sum_{k=1}^{K} l_k\right)$$
• Profitability: balancing is profitable if
$$l_p - \bar{L}_p > L_{threshold}$$
• Each neighbor k is assigned a weight h_k depending on its load
• The weights h_k are summed to find the local deficiency H_p
• The portion of processor p's excess load that is apportioned to neighbor k is given by
$$d_k = \frac{(l_p - \bar{L}_p)\, h_k}{H_p}$$
• Complexity
1. Number of messages for update = KN
2. Overhead incurred by each processor = K messages
3. Communication overhead for migration = (N/2)K transfers
[Figure: SID example with average load = 10, domain deficiency H = 20, surplus load S = 21]
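The SID formulas can be sketched as a single function computing d_k for each neighbor. The slides only say that h_k "depends on its load"; this sketch assumes the common choice h_k = max(0, L̄_p − l_k), so only neighbors below the domain average receive load. Function and parameter names are hypothetical.

```python
from typing import List


def sid_transfers(l_p: float, neighbor_loads: List[float],
                  l_threshold: float) -> List[float]:
    """Amount d_k of processor p's excess load to send to each neighbor k.

    Assumption: h_k = max(0, Lbar_p - l_k); the slides do not give h_k.
    """
    k = len(neighbor_loads)
    l_bar = (l_p + sum(neighbor_loads)) / (k + 1)   # domain average Lbar_p
    if l_p - l_bar <= l_threshold:                  # balancing not profitable
        return [0.0] * k
    h = [max(0.0, l_bar - l_k) for l_k in neighbor_loads]
    h_p = sum(h)            # local deficiency H_p (> 0 whenever l_p > Lbar_p)
    excess = l_p - l_bar
    return [excess * h_k / h_p for h_k in h]


print(sid_transfers(20, [10, 4, 6], l_threshold=1))  # [0.0, 6.0, 4.0]
```

Because the weights are normalized by H_p, the d_k values sum to p's excess l_p − L̄_p, so p would end at the domain average if every transfer completed.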
Receiver Initiated Diffusion
• Underloaded processors request load from overloaded processors
• Balancing is initiated by any processor whose load drops below a prespecified threshold L_low
• A processor will fulfill a request only up to half of its current load
• Underloaded processors take on the majority of the load balancing overhead
$$d_k = \frac{(l_p - \bar{L}_p)\, h_k}{H_p}$$
as in SID, except that d_k is the amount of load requested
• Balancing is activated when the load drops below the threshold and there are no outstanding requests
• Complexity
Number of messages for update = KN
Communication overhead for task migration = NK messages + (N/2)K transfers (due to the extra messages for requests)
As in SID, the number of iterations needed to achieve global balancing depends on topology and application
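A minimal sketch of the receiver side follows. It mirrors the SID computation with the roles reversed and takes the magnitude of the deficit, L̄_p − l_p. As an assumption, the weight is h_k = max(0, l_k − L̄_p), so only neighbors above the domain average are asked; the half-load cap on fulfillment comes from the slide. Names are hypothetical.

```python
from typing import List


def rid_requests(l_p: float, neighbor_loads: List[float]) -> List[float]:
    """Load that underloaded processor p requests from each neighbor k.

    Assumption: h_k = max(0, l_k - Lbar_p); the total requested equals
    p's deficit Lbar_p - l_p.
    """
    k = len(neighbor_loads)
    l_bar = (l_p + sum(neighbor_loads)) / (k + 1)
    deficit = l_bar - l_p
    if deficit <= 0:
        return [0.0] * k
    h = [max(0.0, l_k - l_bar) for l_k in neighbor_loads]
    h_p = sum(h)
    return [deficit * h_k / h_p for h_k in h]


def rid_fulfill(requested: float, current_load: float) -> float:
    """A neighbor grants a request only up to half of its current load."""
    return min(requested, current_load / 2)


reqs = rid_requests(0, [10, 20, 6])
print(reqs)                    # [0.75, 8.25, 0.0]
print(rid_fulfill(reqs[1], 12))  # 6.0 (neighbor's load fell to 12 since its update)
```

The cap matters mainly when load information is stale: the request is sized from the last reported load, but the grant is limited by the load the neighbor actually holds now.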
Hierarchical Balancing Method
• Processors in charge of the balancing process at level l_i receive load information from both lower-level l_{i-1} domains
• The size of balancing domains doubles from one level to the next
• Subtree load information is computed at intermediate nodes and propagated to the root
• The absolute value of the difference between the left domain load L_L and the right domain load L_R is compared to L_threshold:
$$|L_L - L_R| > L_{threshold}$$
• Processors within the overloaded subtree send a designated amount of load to their matching neighbor in the corresponding subtree
• Complexity:
1. Load transfer request messages = N/2
2. Total messages required = N(log N + 1)
3. Average cost per processor = log N + 1 sends and receives
4. Cost at leaves = 1 send + log N receives
5. Cost at root = log N receives + N - 1 sends + log N receives
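The bottom-up propagation and the |L_L − L_R| test can be sketched on a binary tree whose leaves are the processors in index order. This is a simplification under stated assumptions: N is a power of two, and the sketch only detects which domains are imbalanced from the initial loads; it does not perform the transfers between levels. The function name is hypothetical.

```python
from typing import List, Tuple


def hbm_imbalances(loads: List[int],
                   l_threshold: int) -> List[Tuple[int, int, int, int]]:
    """Walk the balancing tree bottom-up (domain sizes 2, 4, ..., N).

    Returns (domain_size, first_processor, L_L, L_R) for every domain whose
    halves satisfy |L_L - L_R| > L_threshold. Assumes N is a power of two.
    """
    n = len(loads)
    assert n > 0 and n & (n - 1) == 0, "N must be a power of two"
    sums, half = list(loads), 1        # subtree loads at the current level
    flagged = []
    while len(sums) > 1:
        next_sums = []
        for i in range(0, len(sums), 2):
            l_left, l_right = sums[i], sums[i + 1]
            if abs(l_left - l_right) > l_threshold:
                flagged.append((2 * half, i * half, l_left, l_right))
            next_sums.append(l_left + l_right)  # propagate toward the root
        sums, half = next_sums, half * 2
    return flagged


print(hbm_imbalances([4, 0, 2, 2, 9, 1, 3, 3], l_threshold=2))
# [(2, 0, 4, 0), (2, 4, 9, 1), (4, 4, 10, 6), (8, 0, 8, 16)]
```

The domain size doubles at each pass of the `while` loop, matching the statement that balancing domains double from one level to the next.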
Dimension Exchange Method
• Small domains are balanced first, then the entire system is balanced
• Synchronized approach
• In an N-processor hypercube, balancing is performed iteratively in each of the log N dimensions
• Balancing is initiated by a processor whose load drops below a threshold
• Complexity
1. Total communication overhead = 3N log N messages
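One synchronized sweep of dimension exchange on a hypercube can be sketched as follows: in dimension d, processor i pairs with i XOR 2^d and the pair splits its combined load evenly. The sketch assumes load is divisible work (floating-point values) rather than whole tasks; the function name is hypothetical.

```python
from typing import List


def dimension_exchange(loads: List[float]) -> List[float]:
    """One sweep of dimension exchange on an N = 2^d processor hypercube.

    Assumes load is divisible (floats); with whole tasks the pairs can only
    split to within one task of each other.
    """
    n = len(loads)
    dims = n.bit_length() - 1
    assert n == 1 << dims, "N must be a power of two"
    loads = list(loads)
    for d in range(dims):                  # log N synchronized steps
        for i in range(n):
            j = i ^ (1 << d)               # neighbor across dimension d
            if i < j:                      # handle each pair once
                avg = (loads[i] + loads[j]) / 2
                loads[i] = loads[j] = avg
    return loads


print(dimension_exchange([8.0, 0.0, 0.0, 0.0]))  # [2.0, 2.0, 2.0, 2.0]
```

After step d, every subcube spanned by the first d+1 dimensions is balanced, which is the "small domains first, then the entire system" behavior described above.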
Summary of Comparison Analysis
Category               GM               SID          DEM          HBM           RID
Initiation             Receiver         Sender       Designated   Designated    Receiver
Balancing domain       Variable         Overlapped   Variable     Variable      Overlapped
Knowledge              Global           Local        Global       Global        Local
Aging period           O(diameter(N))   F(u,K)       Constant     F(u,N)        F(u,K)
Overhead distribution  Uniform          Uniform      Uniform      Non-uniform   Uniform
u = load update factor: if u = ½, a processor must send an update message whenever its load has doubled or halved since its last update
Performance Analysis Graphs
[Figure: Speedup vs. number of processors]
Dynamic Load Balancing on Web Servers
• load balancing is required to route requests among distributed web
server nodes in a transparent way
• this improves throughput and provides high scalability and availability
• user: one who accesses the information
• client: a program, typically a Web browser
• the client obtains the IP address of a Web-server node through an address mapping request to the DNS server
• intermediate name servers, local gateways and browsers can cache the address mapping for some time
Requirements of the web server:
• transparency
• scalability
• load balancing
• availability
• applicability to existing Web standards (backward compatibility)
• geographic scalability (i.e., solutions applicable to both LAN and
WAN distributed systems)
Client-Based Approach
• In this approach, the client side itself routes the request to one of the servers in the cluster. This can be done by the Web browser or by a client-side proxy server.
1. Web Clients
• assumes Web clients know of the existence of the replicated servers of the Web-server system
• based on a protocol-centered description
• the Web client selects a node of the cluster, resolves its address and submits requests to the selected node
• Examples:
1. Netscape
* picks a random server i
* not scalable
2. Smart Clients
* a Java applet monitors node states and network delays
* scalable, but generates large network traffic
2. Client-Side Proxies
• combine caching and server replication
• a Web Location and Information Service can keep track of replicated URL addresses and route client requests appropriately
Advantages and Disadvantages:
- scalable, with high availability
- limited applicability
- lack of portability on the client side
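The two Web-client examples differ only in the information used to choose a node. The sketch below contrasts a stateless random pick with a smart-client pick driven by monitoring data. The scoring rule (network delay plus queue length times a per-request service time) and the metric names are hypothetical assumptions, not what any browser or applet actually implemented.

```python
import random
from typing import Dict, List


def netscape_pick(servers: List[str], rng: random.Random) -> str:
    """Stateless choice: pick a random replica i (not load-aware)."""
    return rng.choice(servers)


def smart_client_pick(observed: Dict[str, Dict[str, float]]) -> str:
    """Load- and latency-aware choice from monitored node state.

    Hypothetical score: estimated response time = network delay (ms)
    + reported queue length * per-request service time (ms).
    """
    def estimate(stats: Dict[str, float]) -> float:
        return stats["delay_ms"] + stats["queue"] * stats["service_ms"]
    return min(observed, key=lambda s: estimate(observed[s]))


observed = {
    "ws1": {"delay_ms": 20.0, "queue": 20, "service_ms": 5.0},  # near but busy
    "ws2": {"delay_ms": 80.0, "queue": 1, "service_ms": 5.0},   # far but idle
}
print(smart_client_pick(observed))  # ws2  (85 ms vs 120 ms)
```

The smart client's better choice is paid for by the monitoring traffic needed to keep `observed` fresh, which is the "scalable, but large network traffic" trade-off.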
DNS-Based Approach
• the cluster DNS routes requests to the corresponding server
• transparency at URL level
• through the translation process from the symbolic name to an IP address, it can select any node of the cluster
• the DNS also specifies a validity period for the mapping, known as the Time-To-Live (TTL)
• after the TTL expires, the address mapping request is forwarded to the cluster DNS
• factors limiting DNS control:
* TTL does not apply to browser caching
* intermediate name servers may not cooperate
* can become a potential bottleneck
• two classes of DNS-based algorithms:
* Constant TTL algorithms
* Adaptive TTL algorithms
[Figure: A DNS-based Web server cluster]
Constant TTL Algorithms
• classified by the system state information they use, with a constant TTL value
• System Stateless Algorithms:
- Round Robin DNS by NCSA
- load distribution is not very balanced, leaving some server nodes overloaded
- ignores server capacity and availability
• Server State Based Algorithms:
- simple feedback alarm mechanism
- selects the server with the lightest load
- limited applicability
• Client State Based Algorithms:
- typical load that can come from each connected domain
- Hidden Load: a measure of the average number of data requests sent from each domain to a Web site during the TTL caching period
- geographical location of the client
- Cisco DistributedDirector takes into account relative client-to-server topological proximity and client-to-server link latency
- Internet2 Distributed Storage Infrastructure uses round-trip delays
• Server and Client State Based Algorithms:
- DistributedDirector DNS uses both server availability and client proximity
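To contrast a system-stateless policy with a client-state one, the sketch below pairs a round-robin resolver with a resolver that assigns each requesting domain to the server with the smallest accumulated hidden load. The `hidden_load` estimates, the class names and the never-expiring accumulation are simplifying assumptions; a real implementation would age the totals per TTL period.

```python
import itertools
from typing import Dict, List


class RoundRobinDNS:
    """System-stateless: hand out server addresses in strict rotation."""

    def __init__(self, servers: List[str]):
        self._cycle = itertools.cycle(servers)

    def resolve(self, domain: str) -> str:
        return next(self._cycle)  # ignores domain, load and capacity


class HiddenLoadDNS:
    """Client-state: send each domain to the server with the least
    accumulated hidden load.

    Assumption: hidden_load[domain] estimates the requests that domain
    sends during one constant TTL period; unknown domains count as 1.
    """

    def __init__(self, servers: List[str], hidden_load: Dict[str, float]):
        self.assigned = {s: 0.0 for s in servers}
        self.hidden_load = hidden_load

    def resolve(self, domain: str) -> str:
        server = min(self.assigned, key=self.assigned.get)
        self.assigned[server] += self.hidden_load.get(domain, 1.0)
        return server


rr = RoundRobinDNS(["s1", "s2"])
print([rr.resolve("a.example") for _ in range(3)])  # ['s1', 's2', 's1']
dns = HiddenLoadDNS(["s1", "s2"], {"big.example": 100.0, "small.example": 5.0})
print([dns.resolve(d) for d in ["big.example", "small.example", "small.example"]])
# ['s1', 's2', 's2']
```

Round robin sends the third request to s1 even though s1 already carries the heavy domain's cached traffic; the hidden-load resolver avoids that.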
Adaptive TTL Algorithms
- use dynamic information from servers and/or clients to assign different TTL values
- two-step process:
* the DNS selects a server node, similarly to hidden load weight algorithms
* the DNS chooses an appropriate value for the TTL period
- TTL values are inversely proportional to the domain request rate
- popular domains get shorter TTL intervals
- scalable from LAN to WAN distributed Web-server systems
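The second step, choosing a TTL inversely proportional to the domain request rate, can be sketched as below. The baseline TTL for an average-rate domain and the clamping bounds are hypothetical parameters chosen for illustration.

```python
def adaptive_ttl(domain_rate: float, mean_rate: float, base_ttl: float = 240.0,
                 min_ttl: float = 10.0, max_ttl: float = 3600.0) -> float:
    """TTL (seconds) inversely proportional to a domain's request rate.

    Hypothetical parameters: base_ttl is the TTL for a domain with the
    mean request rate; the result is clamped to [min_ttl, max_ttl].
    """
    if domain_rate <= 0:
        return max_ttl  # a silent domain can cache the mapping longest
    ttl = base_ttl * mean_rate / domain_rate
    return max(min_ttl, min(max_ttl, ttl))


print(adaptive_ttl(40, 10))     # 60.0   popular domain: short TTL
print(adaptive_ttl(5, 10))      # 480.0  quiet domain: long TTL
print(adaptive_ttl(10000, 10))  # 10.0   clamped to the minimum
```

Shorter TTLs for popular domains let the cluster DNS regain control over their traffic more often, while quiet domains keep cached mappings longer.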
Dispatcher Based Approach
• provides full control on client requests and masks the request routing among
multiple servers
• the cluster has only one virtual IP address: the IP address of the dispatcher
• the dispatcher identifies the servers through unique private IP addresses
• Classes of routing
1. Packet single-rewriting by the dispatcher
2. Packet double-rewriting by the dispatcher
3. Packet forwarding by the dispatcher
4. HTTP redirection
Packet Single Rewriting
- the dispatcher reroutes client-to-server packets by rewriting their IP address
- requires modification of the kernel code of the servers, since IP address substitution occurs at the TCP/IP level
- provides high system availability
Packet Double Rewriting
- the dispatcher modifies all IP addresses, including those in the response packets
- two architectures are based on this:
* Magicrouter (fast packet interposing, where a user-level process acting as a switchboard intercepts client-to-server and server-to-client packets and modifies them)
* LocalDirector (modifies the IP address of client-server packets according to a dynamic mapping table)
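The dynamic mapping table behind packet double-rewriting can be sketched as follows: inbound packets addressed to the cluster's virtual IP have their destination rewritten to a private server address, remembered per client connection, and responses have their source rewritten back to the virtual IP. This is an illustration only; the `Packet` type, the round-robin server choice and the addresses are assumptions, and checksum recomputation and connection teardown are omitted.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    src_port: int
    dst_port: int


class DoubleRewritingDispatcher:
    """Sketch of packet double-rewriting with a dynamic mapping table.

    Assumption: new connections are assigned round robin; a real
    dispatcher would use a load-aware policy and fix up checksums.
    """

    def __init__(self, vip: str, servers: List[str]):
        self.vip, self.servers, self._rr = vip, servers, 0
        self.table: Dict[Tuple[str, int], str] = {}  # (client ip, port) -> server

    def inbound(self, pkt: Packet) -> Packet:
        key = (pkt.src, pkt.src_port)
        if key not in self.table:  # first packet of a new connection
            self.table[key] = self.servers[self._rr % len(self.servers)]
            self._rr += 1
        return replace(pkt, dst=self.table[key])  # rewrite destination

    def outbound(self, pkt: Packet) -> Packet:
        return replace(pkt, src=self.vip)  # hide the private server address


d = DoubleRewritingDispatcher("203.0.113.10", ["10.0.0.1", "10.0.0.2"])
print(d.inbound(Packet("198.51.100.7", "203.0.113.10", 5000, 80)).dst)  # 10.0.0.1
print(d.outbound(Packet("10.0.0.1", "198.51.100.7", 80, 5000)).src)     # 203.0.113.10
```

Every response passes back through the dispatcher, which is the source of the packet rewriting overhead attributed to this class of solutions.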
Packet Forwarding
* forwards client packets to servers instead of rewriting their IP address
* Network Dispatcher
- uses MAC addresses
- the dispatcher and the servers share the same IP-SVA address
- for WANs, a two-level dispatcher is used (the first level uses packet rewriting)
- transparent to both the client and the server
* ONE-IP address
- publicizes the same secondary IP address of all Web-server nodes as the IP-SVA of the Web-server cluster
- routing-based dispatching: the destination server is selected by a hash function
- broadcast-based dispatching: the router broadcasts the packets to every server in the cluster
- using a hash function restricts dynamic load balancing
- does not account for server heterogeneity
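Routing-based dispatching in ONE-IP picks the server with a fixed hash of the client address. The slides do not specify the hash, so this sketch assumes the simplest one, the integer value of the IPv4 address modulo the number of servers. It makes the stated limitations visible: the choice ignores current load and server capacity.

```python
import ipaddress
from typing import List


def one_ip_route(client_ip: str, servers: List[str]) -> str:
    """Routing-based dispatching: a fixed hash of the client address.

    Assumption: hash = integer value of the IPv4 address mod N. The same
    client always reaches the same node, whatever that node's load.
    """
    return servers[int(ipaddress.IPv4Address(client_ip)) % len(servers)]


servers = ["s0", "s1", "s2"]
print(one_ip_route("198.51.100.7", servers))  # s2
print(one_ip_route("198.51.100.8", servers))  # s0
```

Because the mapping is static, a burst from one busy client network cannot be shifted elsewhere, and a faster server receives no larger share than a slower one.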
HTTP Redirection
• requests are distributed among Web servers through the HTTP redirection mechanism
• redirection is transparent to the user
• Server state based dispatching
- each server periodically reports both the number of processes in its run queue and the number of requests received per second
• Location based dispatching
• applies to both LAN and WAN distributed Web-server systems
• doubles the number of necessary TCP connections
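Server state based dispatching with HTTP redirection can be sketched as: pick the server with the lowest score computed from the periodic reports, then answer the client with a 302 redirect. The linear score and its weights `alpha` and `beta` are hypothetical; the slides only say which two quantities are reported. The server URLs are placeholders.

```python
from typing import Dict, Tuple

Reports = Dict[str, Tuple[int, float]]  # server URL -> (run queue, requests/s)


def pick_server(reports: Reports, alpha: float = 1.0, beta: float = 0.1) -> str:
    """Choose the server with the lowest weighted load score.

    Hypothetical score: alpha * run-queue length + beta * requests/second.
    """
    return min(reports, key=lambda s: alpha * reports[s][0] + beta * reports[s][1])


def redirect_response(path: str, reports: Reports) -> Tuple[int, Dict[str, str]]:
    """Build an HTTP 302 redirect pointing the client at the chosen server."""
    target = pick_server(reports)
    return 302, {"Location": target.rstrip("/") + path}


reports = {"http://ws1.example.com": (8, 120.0), "http://ws2.example.com": (2, 50.0)}
print(redirect_response("/index.html", reports))
# (302, {'Location': 'http://ws2.example.com/index.html'})
```

The client must then open a second TCP connection to the `Location` target, which is why this approach doubles the number of connections.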
Server Based Approach
- uses a two-level dispatching mechanism
- the cluster DNS assigns requests to a server
- the server may redirect the request to another server in the cluster
- allows all servers to participate in load balancing (distributed)
- redirection is done in two ways:
- HTTP redirection
- packet redirection by packet rewriting
HTTP Redirection by the Server
Packet Redirection
- transparent to the client
- two balancing algorithms:
- use RR-DNS to schedule requests (static routing)
- periodic communication among servers about their current load
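The second level of the server-based scheme can be sketched as a local decision made by the server that the cluster DNS selected, using the load values the servers exchange periodically. The threshold rule below (serve locally unless overloaded, otherwise redirect to the least-loaded peer if it is lighter) is a hypothetical policy for illustration, as are the names.

```python
from typing import Dict, Optional


def server_side_decision(self_name: str, peer_loads: Dict[str, float],
                         threshold: float) -> Optional[str]:
    """Decide whether to serve a request locally or redirect it.

    peer_loads: latest load values exchanged among servers (including this
    one). Hypothetical rule: serve locally unless own load exceeds
    `threshold`; then redirect to the least-loaded server, if it is another.
    Returns None to serve locally, else the server to redirect to.
    """
    if peer_loads[self_name] <= threshold:
        return None
    target = min(peer_loads, key=peer_loads.get)
    return None if target == self_name else target


loads = {"ws1": 0.9, "ws2": 0.3, "ws3": 0.5}
print(server_side_decision("ws1", loads, threshold=0.75))  # ws2
print(server_side_decision("ws2", loads, threshold=0.75))  # None
```

Since every server runs the same rule on shared load information, control stays distributed and no single dispatcher becomes a bottleneck.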
Main Pros and Cons
Client-Based (scheduling: client-side, distributed)
  Pros: no server overhead; LAN & WAN solution
  Cons: limited applicability; medium coarse-grained balancing
DNS-Based (scheduling: cluster-side, centralized)
  Pros: no bottleneck; LAN & WAN solution
  Cons: partial control; coarse-grained balancing
Dispatcher-Based (scheduling: cluster-side, centralized)
  Pros: fine-grained balancing; full control
  Cons: dispatcher bottleneck; LAN solution; packet rewriting overhead
Server-Based (scheduling: cluster-side, distributed)
  Pros: distributed control; fine-grained balancing; LAN & WAN solution
  Cons: latency time increase (HTTP); packet rewriting overhead (DPR)
Performance of various distributed architectures
1. Exponential distribution model
2. Heavy-tailed distribution model
Conclusions
- consider performance constraints due to network bandwidth rather than server node capacity
- account for network load as well as client proximity