Web Server Load Balancing/Scheduling

Asima Silva
Tim Sutherland
Outline
Web Server Introduction
Information Management Basics
Load Sharing Policies
– FLEX
– WARD
– EquiLoad
– AdaptLoad
Summary
Conclusions
Future Work
Introduction to Web Server Load Balancing
A request enters a router
The load-balancing server determines which web server should serve the request
It sends the request to the appropriate web server
[Figure: Internet → Router → Load-Balancing Server → Web Servers, showing the request and response paths]
Traditional Web Cluster
How do we split up information?
[Figure: Content → ? → Server Farm]
Information Strategies
– Replication
– Partition
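A minimal sketch of the difference between the two strategies, assuming three hypothetical servers S1 to S3: under replication any server can answer any request, while under partition each file has exactly one home. Hashing the path with zlib.crc32 is just one illustrative way to partition, not a scheme taken from these slides.

```python
import zlib

SERVERS = ["S1", "S2", "S3"]  # hypothetical server names

def candidates_replicated(path: str) -> list[str]:
    """Replication: every server holds every file, so any server can answer."""
    return list(SERVERS)

def candidates_partitioned(path: str) -> list[str]:
    """Partition: each file lives on exactly one server.

    Hash partitioning is an illustrative choice for this sketch.
    """
    index = zlib.crc32(path.encode("utf-8")) % len(SERVERS)
    return [SERVERS[index]]

if __name__ == "__main__":
    for path in ["/index.html", "/logo.png"]:
        print(path, candidates_replicated(path), candidates_partitioned(path))
```

Replication maximizes choice at the cost of storage; partition avoids duplicate copies but forces each request toward a single server.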
Load Balancing Approaches
File Distribution      | Routing
Content/Locality Aware | DNS Server
Size Aware             | Centralized Router
Workload Aware         | Distributed Dispatcher
Issues
Efficiently processing requests with optimizations for load balancing:
– Send requests to a web server that already has the requested files in its cache
– Send requests to the web server with the fewest outstanding requests
– Send requests to a web server chosen according to the size of the requested file
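The three routing criteria above can be sketched as three selection functions. The Server record, its fields, and the size boundaries are hypothetical names chosen for illustration; a real balancer would track cache contents and load from live measurements.

```python
from dataclasses import dataclass, field

@dataclass
class Server:
    name: str
    cached_files: set[str] = field(default_factory=set)
    active_requests: int = 0

def pick_cache_aware(servers: list[Server], path: str) -> Server:
    """Prefer servers that already cache the file (otherwise consider all),
    then pick the least loaded candidate."""
    warm = [s for s in servers if path in s.cached_files]
    return min(warm or servers, key=lambda s: s.active_requests)

def pick_least_loaded(servers: list[Server]) -> Server:
    """Pick the server with the fewest outstanding requests."""
    return min(servers, key=lambda s: s.active_requests)

def pick_by_size(servers: list[Server], size_kb: float, boundaries_kb: list[float]) -> Server:
    """Pick a server by size range; boundaries_kb holds len(servers) - 1 ascending cut points."""
    index = sum(1 for b in boundaries_kb if size_kb >= b)
    return servers[index]
```

The cache-aware rule trades some load balance for fewer cache misses; the size rule ignores both cache and load and relies on the boundaries being well chosen.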
FLEX
(File distribution: Content/Locality Aware; Routing: DNS Server)
Locality-aware load-balancing strategy based on two factors:
– Accessed files (working set): memory requirements
– Access rates: load requirements
Partitions the content across all servers into equally balanced groups
Each server transfers the response directly to the browser, reducing the bottleneck at the router (TCP handoff)
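A rough sketch of the partitioning idea, assuming each hosted site is summarized by a working-set size W (MB) and an access rate Ar (requests/s). The greedy rule here (place the heaviest sites first, each into the group with the smallest combined normalized W + Ar) is our own illustration of balancing both factors at once, not the published FLEX algorithm, and the site data is hypothetical.

```python
def flex_partition(sites: dict[str, tuple[float, float]], n_servers: int) -> list[dict]:
    """Greedily split sites into n_servers groups with similar W and Ar.

    sites maps a site name to (working_set_mb, access_rate).
    """
    total_w = sum(w for w, _ in sites.values()) or 1.0
    total_ar = sum(ar for _, ar in sites.values()) or 1.0

    def weight(w: float, ar: float) -> float:
        # Combined demand, normalized so memory and load count equally.
        return w / total_w + ar / total_ar

    groups = [{"sites": [], "W": 0.0, "Ar": 0.0} for _ in range(n_servers)]
    # Place the heaviest sites first so later, smaller sites can even things out.
    for name in sorted(sites, key=lambda s: weight(*sites[s]), reverse=True):
        w, ar = sites[name]
        # Add the site to the group whose combined demand is currently smallest.
        target = min(groups, key=lambda g: weight(g["W"], g["Ar"]))
        target["sites"].append(name)
        target["W"] += w
        target["Ar"] += ar
    return groups

if __name__ == "__main__":
    demo = {"a": (100, 50), "b": (100, 50), "c": (50, 25), "d": (50, 25)}
    for i, g in enumerate(flex_partition(demo, 2), start=1):
        print(f"S{i}", g["sites"], g["W"], g["Ar"])
```

With the demo data each group ends up with W = 150 and Ar = 75. The resulting groups would then be written into the DNS routing table, which is why the table has to be rebuilt whenever the workload shifts.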
Flex Diagram
[Figure: requests go to the DNS server, which forwards each request to one of servers S1 through S6; the server sends the response to the client browser]
W(S1) ≈ W(S2) ≈ W(S3) ≈ … ≈ W(S6)
Ar(S1) ≈ Ar(S2) ≈ Ar(S3) ≈ … ≈ Ar(S6)
FLEX Cont.
Advantages:
– Highly scalable
– Reduces the bottleneck at the load balancer
– No additional software is required
– Reduces the number of cache misses
FLEX Cont. II
Disadvantages:
– Not dynamic: the routing table must be recreated
– Only compared against Round Robin (RR)
– The number of access logs required on each server could be tremendous
– Web servers carry the responsibility for both load balancing and transferring the response, so responsibilities are unorganized
– How often should access rates and working sets be updated, and how are they monitored?
WARD
(File distribution: Workload Aware; Routing: Distributed Dispatcher)
Workload-Aware Request Distribution Strategy
The server core is the set of essential files that accounts for the majority of expected requests
The server core is replicated on every server
WARD-analysis computes a nearly optimal core size from the workload access patterns and from:
– Number of nodes
– Node RAM
– TCP handoff overhead
– Disk access overhead
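To make the trade-off concrete, here is a deliberately simplified cost model in the spirit of ward-analysis. It uses the four parameters listed above and tries every core size k (the k most requested files). The model, the per-request costs, and the function name are assumptions for illustration; the published ward-analysis is more detailed.

```python
def ward_core_size(files, n_nodes, node_ram_mb, handoff_cost, disk_cost):
    """Choose how many of the most requested files to replicate as the core.

    files: list of (size_mb, requests) pairs.
    Simplified model (an assumption, not the published ward-analysis):
      - core files are replicated on every node and served locally from RAM;
      - a request for a non-core file is handed off to another node with
        probability (n_nodes - 1) / n_nodes;
      - non-core files that do not fit in the leftover cluster RAM are read from disk.
    Returns (best_k, best_cost), where cost is the expected overhead per request.
    """
    ranked = sorted(files, key=lambda f: f[1], reverse=True)
    total_requests = sum(r for _, r in ranked) or 1
    best_k, best_cost = 0, float("inf")
    core_mb = 0.0
    for k in range(len(ranked) + 1):
        if k > 0:
            core_mb += ranked[k - 1][0]
        if core_mb > node_ram_mb:
            break  # the core must fit into every node's RAM
        free_cluster_mb = n_nodes * (node_ram_mb - core_mb)
        handoffs = disk_reads = used_mb = 0.0
        for size_mb, reqs in ranked[k:]:
            handoffs += reqs * (n_nodes - 1) / n_nodes
            if used_mb + size_mb <= free_cluster_mb:
                used_mb += size_mb  # fits in some node's RAM
            else:
                disk_reads += reqs  # does not fit: served from disk
        cost = (handoff_cost * handoffs + disk_cost * disk_reads) / total_requests
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost
```

For example, with files = [(10, 100), (10, 10), (100, 1)], two nodes of 60 MB each, and unit costs, the function returns k = 2: replicating the two small, popular files removes most handoffs, while the large, rarely requested file stays partitioned.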
WARD Cont.
Three components: dispatcher (load balancer), distributor (router), and web server
Three progressive architectures:
[Figure: dispatcher, distributor, and server components connected by a switch and a LAN in each architecture]
– CARD: single front-end distributor, centralized dispatcher
– LARD: co-located distributor and server
– WARD: co-located distributor, server, and dispatcher
WARD Diagram
[Figure: requests pass through a switch to servers S1 through S6, each with its own queue]
Each computer is both a distributor and a dispatcher
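Because every node acts as both distributor and dispatcher, the node that receives a request can make the routing decision itself. A minimal sketch, assuming a precomputed owner map for the non-core files (hypothetical) and assuming that files missing from the map are served locally from disk.

```python
def ward_route(receiving_node: str, path: str, core: set[str],
               owner: dict[str, str]) -> tuple[str, bool]:
    """Return (serving_node, handed_off) for a request that arrived at receiving_node."""
    if path in core:
        # Core files are replicated everywhere: serve locally, no TCP handoff.
        return receiving_node, False
    # Non-core files live on one owner node; unknown files stay local (assumption).
    target = owner.get(path, receiving_node)
    return target, target != receiving_node

if __name__ == "__main__":
    core = {"/index.html"}
    owner = {"/big.iso": "S4", "/a.css": "S2"}
    print(ward_route("S2", "/big.iso", core, owner))
```

Only requests for non-core files owned by another node pay the handoff cost, which is why a well-chosen core keeps handoffs rare.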
WARD Cont. II
Similar to FLEX, WARD sends the response directly to the client
Minimizes forwarding overhead from handoffs for the most frequent files
Optimizes overall cluster RAM usage
“by mapping a small set of most frequent files to be served by multiple number of nodes, we can improve both locality of accesses and the cluster performance significantly”
WARD Cont. III
Advantages:
– No routing decision is needed for core files, since they are replicated on every server
– Minimizes both request transfers and disk reads, which are considered “equally bad”
– Outperforms Round Robin
– Efficient use of RAM
– Performance improves as the number of nodes increases
WARD Cont. IV
Disadvantages:
– The core files are chosen from the previous day’s data, which could decrease performance by up to 15%
– The distributed dispatcher increases the number of TCP request transfers
– If the core files are not selected correctly, the cache miss rate and the number of disk accesses increase
WARD Results
EquiLoad
(File distribution: Size Aware; Routing: Centralized Router)
Determines which server processes a request based on the size of the requested file
Splits the content across servers by file size, keeping queue sizes consistent
EquiLoad Solves Queue Length Problems
[Figure: example request queues holding files from 1k to 1000k, one arrangement labeled “This is bad” and another labeled “This is better”]
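The queue-length problem can be quantified with a small, hypothetical example: four requests of 1000k, 1k, 1k, and 2k on two servers, with service time taken to be proportional to file size. Size-blind round robin leaves a 1k request waiting behind the 1000k file, while a size-aware split keeps the small requests together.

```python
def fifo_waits(queue: list[int]) -> list[int]:
    """Time each request waits in a FIFO queue, taking service time = file size (KB)."""
    waits, elapsed = [], 0
    for size in queue:
        waits.append(elapsed)
        elapsed += size
    return waits

def mean_wait(queues: list[list[int]]) -> float:
    waits = [w for q in queues for w in fifo_waits(q)]
    return sum(waits) / len(waits)

requests_kb = [1000, 1, 1, 2]  # hypothetical arrival order

# Size-blind round robin: alternate servers, so one 1k request lands behind the 1000k file.
round_robin = [requests_kb[0::2], requests_kb[1::2]]
# Size-aware split: large files on one server, small files on the other.
by_size = [[s for s in requests_kb if s >= 100], [s for s in requests_kb if s < 100]]

print(mean_wait(round_robin))  # 250.25
print(mean_wait(by_size))      # 0.75
```

The total work is identical in both cases; only the mixing of large and small files in the same queue changes, and it dominates the average wait.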
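EquiLoad Diagram
[Figure: requests reach the front end (dispatcher and distributor), which periodically calculates partitions and forwards each request by file size: S1 1k-2k, S2 2k-3k, S3 3k-10k, S4 10k-20k, S5 20k-100k, S6 >100k; the server sends the response to the client browser]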
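With fixed partitions, routing by size reduces to a range lookup. A minimal sketch using Python's bisect module, assuming the diagram's ranges map to S1 through S6 in order and are measured in KB. Treating each range as (lower, upper], so that 2k goes to S1 and 100k to S5, and sending files under 1k to S1 are our assumptions, since the diagram does not say.

```python
from bisect import bisect_left

# Upper edges (KB) of the first five ranges in the diagram; S6 takes everything above 100k.
BOUNDARIES_KB = [2, 3, 10, 20, 100]
SERVERS = ["S1", "S2", "S3", "S4", "S5", "S6"]

def route_by_size(size_kb: float) -> str:
    """Return the server whose (lower, upper] size range contains size_kb."""
    return SERVERS[bisect_left(BOUNDARIES_KB, size_kb)]

if __name__ == "__main__":
    for size in (1.5, 2, 15, 100, 500):
        print(size, route_by_size(size))
```

The lookup is O(log n) in the number of servers, so the dispatcher's per-request cost stays small even as partitions are recalculated.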
EquiLoad
Advantages:
– Dynamic repartitioning
– Can be implemented at various levels: DNS, dispatcher, or server
– Minimal queue buildup
– Performs well under variable workloads and high system load
EquiLoad
Disadvantages:
– Cache affinity is neglected
– Requires a front-end dispatcher
– The distributor must communicate with the servers
– Thresholds for parameter adjustment must be chosen
EquiLoad → AdaptLoad
AdaptLoad improves upon EquiLoad by using “fuzzy boundaries”
– Allows more than one server to process requests whose sizes fall near a partition boundary
– Behaves better when server partitions are very close in size
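AdaptLoad Diagram
[Figure: requests reach the front end (distributor and dispatcher), which periodically calculates partitions and forwards each request by file size using overlapping ranges: S1 1k-3k, S2 2k-4k, S3 3k-10k, S4 8k-20k, S5 15k-100k, S6 >80k; the server sends the response to the client browser]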
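A sketch of how fuzzy boundaries change the lookup: a size can fall into more than one range, so the front end must choose among several eligible servers. The ranges below come from the diagram (in KB, read as half-open intervals, with S1 starting at 0 as an assumption); picking the eligible server with the shortest queue is a hypothetical rule for illustration, not necessarily AdaptLoad's.

```python
import math

# Overlapping size ranges in KB, [low, high), taken from the AdaptLoad diagram.
# S1's lower edge is set to 0 so files under 1k still have a server (assumption).
RANGES_KB = {
    "S1": (0, 3),
    "S2": (2, 4),
    "S3": (3, 10),
    "S4": (8, 20),
    "S5": (15, 100),
    "S6": (80, math.inf),
}

def eligible(size_kb: float) -> list[str]:
    """Servers whose fuzzy range contains the requested size."""
    return [s for s, (low, high) in RANGES_KB.items() if low <= size_kb < high]

def route_fuzzy(size_kb: float, queue_len: dict[str, int]) -> str:
    """Among the eligible servers, choose the one with the shortest queue.

    The shortest-queue rule is a hypothetical choice for this sketch.
    """
    return min(eligible(size_kb), key=lambda s: queue_len[s])

if __name__ == "__main__":
    queues = {"S1": 4, "S2": 1, "S3": 0, "S4": 0, "S5": 2, "S6": 7}
    print(eligible(2.5), route_fuzzy(2.5, queues))
```

The overlap lets a request near a boundary go to whichever neighboring server is less busy, which helps most when adjacent partitions are nearly the same size.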
AdaptLoad Results
Summary
Policy              | File Distribution      | Routing
FLEX                | Content/Locality Aware | DNS Server
EquiLoad, AdaptLoad | Size Aware             | Centralized Router
WARD                | Workload Aware         | Distributed Dispatcher
Conclusions
There is no “best” way to distribute content among servers.
No single policy is optimal for all website applications.
Certain strategies are geared toward particular website applications.
Future Work
Compare and contrast the three policies
Determine how often nodes should be repartitioned
Compare each policy against a standard benchmark
Determine which policy works best in a particular environment
Questions?
Anyone have one?