CPS 212 lecture(s): Internet Site Traffic Management

Server Traffic Management
Jeff Chase
Duke University, Department of Computer Science
CPS 212: Distributed Information Systems
The Server Selection Problem
server array A
server farm B
Which server?
Which network site?
“Contact the weather service.”
not-so-great solutions
static client binding
manual selection
HTTP forwarding
better old solutions
DNS round robin [Brisco, RFC 1794]
WebOS “smart clients” etc. [Vahdat97]
today’s buzzwords
content-aware traffic management
content switching (L4-L7)
server switching
web switching
Traffic Management for Clusters
Today we focus on the role of the network infrastructure in routing requests
to servers in a cluster.
Ignore the wide-area problem for now (DNS and other tricks later).
Relatively simple switches can support ACLs to filter traffic to specific TCP
or UDP ports from given addresses or subnets.
Current-generation server switches incorporate much richer L4 and content-aware switching features.
How much of the front end support can we build into the network
elements while preserving “wire speed” performance?
What request routing policies should server arrays use?
Key point: the Web is “the only thing that matters” commercially.
TCP with HTTP+SSL is established as lingua franca, so more
TCP/HTTP/SSL functionality migrates into hardware or firmware.
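As an illustration of the simple ACL filtering mentioned above (filter traffic to specific TCP or UDP ports from given addresses or subnets), here is a minimal sketch. The rule format, the names AclRule and acl_decision, and the first-match-wins semantics with a default of "deny" are assumptions made for the example, not the behavior of any particular switch.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AclRule:
    """Hypothetical ACL entry: match on source subnet, protocol, and port."""
    action: str               # "permit" or "deny"
    src: str                  # source subnet in CIDR notation
    proto: str                # "tcp" or "udp"
    dst_port: Optional[int]   # None matches any destination port


def acl_decision(rules, src_ip, proto, dst_port, default="deny"):
    """Return the action of the first matching rule, else the default."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        if (addr in ipaddress.ip_network(rule.src)
                and rule.proto == proto
                and rule.dst_port in (None, dst_port)):
            return rule.action
    return default


# Example rule set: only the 10.0.0.0/8 subnet may reach TCP port 80.
RULES = [
    AclRule("permit", "10.0.0.0/8", "tcp", 80),
    AclRule("deny", "0.0.0.0/0", "tcp", None),
]
```

A real switch evaluates such rules in hardware per packet; the linear scan here only shows the matching logic.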
Traffic Management for Clusters
Goals
server load balancing
failure detection
access control filtering
priorities/QoS
external VIP management
request locality
transparent caching
L4: TCP
L7: HTTP
SSL
etc.
[Figure: clients send requests to virtual IP addresses (VIPs) on a smart switch, which fronts the server array.]
What to switch/filter on?
L3 source IP and/or VIP
L4 (TCP) ports etc.
L7 URLs and/or cookies
L7 SSL session IDs
L4 Server Load Balancing (SLB)
Issues
switch redundancy
mechanics of L4 switching
handling return traffic
server failure detection (health checks)
[Figure: a load balancer in front of the server array.]
Policies
random
weighted round robin (WRR)
lightest load
least connections
Key point: the heavy lifting of server selection
happens only on connect request (SYN).
Performance metric: connections per second.
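Two of the policies listed above, weighted round robin and least connections, can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function and class names are hypothetical, weights are small positive integers, and the WRR schedule is the naive "repeat each server weight times" expansion rather than a smoothed variant. Selection runs once per connection, on the SYN.

```python
import itertools


def weighted_round_robin(weights):
    """Return an endless iterator of server names.

    Each server appears in the cycle as many times as its integer weight.
    """
    schedule = [name for name, w in weights.items() for _ in range(w)]
    return itertools.cycle(schedule)


class LeastConnections:
    """Pick the server with the fewest active connections (ties: first listed)."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def on_syn(self):
        # Server selection happens only here, on the connect request.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def on_close(self, server):
        self.active[server] -= 1
```

Because the choice is made once per connection, neither policy sees individual HTTP requests, which is the "connection-grained" limitation noted in these slides.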
Limitations
connection-grained
no request locality
no session locality
failover?
Mechanics of L4 Switching
[Figure: servers a, b, c, d behind a smart switch that owns VIP address x.]
“Client C at TCP port p1 requests connection to TCP server at port p2 at VIP address x.”
Smart switch:
1. recognizes connect request (TCP SYN)
2. selects specific server (d) for service at p2
3. replaces x with d in connect request packet
4. remembers connection {(C,p1),(d,p2)}
5. for incoming packets from (C,p1) for (x,p2)
replace virtual IP address x with d
forward to d
6. for outgoing packets from (d,p2) for (C,p1)
replace d with x
forward to C
an instance of network address translation (NAT)
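The six steps above can be sketched as a small connection table. This is an illustrative sketch only: packets are modeled as plain dicts, the class name L4Switch is hypothetical, and servers are chosen round robin where a real switch would apply whichever policy is configured. Checksum updates, connection teardown, and timeouts are omitted.

```python
class L4Switch:
    """Toy model of L4 switching as NAT between a VIP and real servers."""

    def __init__(self, vip, servers):
        self.vip = vip
        self.servers = list(servers)
        self.next_idx = 0
        # (client_ip, client_port, service_port) -> chosen server address
        self.conns = {}

    def incoming(self, pkt):
        """Steps 1-5: bind on SYN, then rewrite destination VIP -> server."""
        key = (pkt["src"], pkt["sport"], pkt["dport"])
        if pkt.get("syn") and key not in self.conns:
            self.conns[key] = self.servers[self.next_idx % len(self.servers)]
            self.next_idx += 1
        server = self.conns[key]  # KeyError if the connection is unknown
        return {**pkt, "dst": server}

    def outgoing(self, pkt):
        """Step 6: rewrite source server -> VIP for a known connection."""
        key = (pkt["dst"], pkt["dport"], pkt["sport"])
        if self.conns.get(key) == pkt["src"]:
            return {**pkt, "src": self.vip}
        return pkt  # not part of a switched connection; leave untouched
```

Only the SYN triggers server selection; every later packet is a table lookup plus an address rewrite, which is why connections per second is the relevant metric.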
Handling Return Traffic
[Figure label: fast, dumb switch]
incoming traffic routes to smart switch
smart switch changes MAC address
smart switch leaves dest VIP intact
all servers accept traffic for VIPs
[Figure labels: clients; slow, smart switch]
server responds to client IP
dumb switch routes outgoing traffic
server array
examples
IBM eNetwork Dispatcher (host-based)
Foundry, Alteon, Arrowpoint, etc.
simply a matter of configuration
alternatives
TCP handoff (e.g., LARD)
URL Switching
[Figure: a web switch routes the URL sets {a,b,c}, {d,e,f}, and {g,h,i} to different servers in the server array.]
Idea: switch parses the HTTP request,
retrieves the request URL, and uses the
URL to guide server selection.
Example: Foundry
host name
URL prefix
URL suffix
Substring pattern
URL hashing
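To make the matching options above concrete, here is a minimal sketch of prefix and suffix rules with URL hashing as a separate helper. The rule format and the function names are hypothetical and do not reflect Foundry's actual configuration syntax. CRC32 is used only because it is deterministic across runs; any stable hash would do.

```python
import zlib


def pick_by_url(url_path, prefix_rules, suffix_rules, default_group):
    """Return a server group name: prefix rules first, then suffix rules."""
    for prefix, group in prefix_rules:
        if url_path.startswith(prefix):
            return group
    for suffix, group in suffix_rules:
        if url_path.endswith(suffix):
            return group
    return default_group


def hash_url(url_path, servers):
    """URL hashing: the same URL always maps to the same server."""
    index = zlib.crc32(url_path.encode("utf-8")) % len(servers)
    return servers[index]
```

For example, a rule list such as [("/cgi-bin/", "dynamic")] with suffix rules [(".gif", "static")] separates static content from dynamic content, while hash_url spreads one group's URLs across its servers without duplicating content.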
Advantages
separate static content from dynamic
reduce content duplication
improve server cache performance
cascade switches for more complex policies
Issues
HTTP parsing cost
URL length
delayed binding
server failures
HTTP 1.1
session locality
hybrid SLB and URL
popular objects
The Problem of Sessions
In some cases it is useful for a given client’s requests to “stick” to a
given server for the duration of a session.
This is known as session affinity or session persistence.
• session state may be bound to a specific server
• SSL negotiation overhead
One approach: remember {source, VIP, port} and map to the same
server.
• The mega-proxy problem: what if the client’s requests filter through
a proxy farm? Can we recognize the source?
Alternative: recognize sessions by cookie or SSL session ID.
• cookie hashing
• cookie switching also allows differentiated QoS
Think “frequent flyer miles”.
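A minimal sketch of the two session-recognition approaches above: hash a session cookie if one is present, otherwise fall back to the {source, VIP, port} tuple. The function name and the key format are assumptions for illustration, and the source-based fallback hashes the tuple rather than remembering it in a table.

```python
import zlib


def sticky_server(servers, src_ip, vip, port, cookie=None):
    """Map a session to a server, preferring the cookie over the source.

    With a cookie, requests from one session reach the same server even if
    they arrive through different proxies (the mega-proxy problem).
    """
    if cookie is not None:
        key = "cookie|" + cookie
    else:
        key = "src|{}|{}|{}".format(src_ip, vip, port)
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]
```

Hashing keeps no per-session state in the switch, but every session remaps if the server list changes; a lookup table avoids that at the cost of memory.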
LARD
Idea: route requests based on request URL,
to maximize locality at back-end servers.
[Figure: the LARD front-end maps targets to servers, (a,b,c: 1), (d,e,f: 2), (g,h,i: 3), in the server array.]
LARD front-end maintains an LRU
cache of request targets and their
locations, and table of active
connections for each server.
LARD predates commercial URL switches,
and was concurrent with URL-hashing
proxy cache arrays (CARP).
Policies
1. LB (locality-based) is URL hashing.
2. LARD is locality-aware SLB: route to target’s
site if there is one and it is not “overloaded”, else
pick a new site for the target.
3. LARD/R augments LARD with replication for
popular objects.
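Policy 2 above can be sketched as follows. This is a simplified illustration, not the algorithm from the LARD paper: "overloaded" is reduced to a single hypothetical threshold (overload_limit) on active connections, the new site is the least-loaded server, and the LRU bounding of the target table is left out.

```python
class LardFrontEnd:
    """Toy locality-aware front end: keep a target on its server unless
    that server is overloaded, in which case reassign the target."""

    def __init__(self, servers, overload_limit):
        self.load = {s: 0 for s in servers}   # active connections per server
        self.site = {}                        # request target -> server
        self.overload_limit = overload_limit  # assumed overload threshold

    def route(self, target):
        server = self.site.get(target)
        if server is None or self.load[server] >= self.overload_limit:
            # No site yet, or the site is overloaded: pick a new site.
            server = min(self.load, key=self.load.get)
            self.site[target] = server
        self.load[server] += 1
        return server

    def done(self, server):
        """Call when a connection to the server finishes."""
        self.load[server] -= 1
```

Dropping the overload check leaves a pure locality policy similar in spirit to LB; keeping a set of servers per target instead of a single one is the direction LARD/R takes for popular objects.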
LARD Performance Study
LARD paper compares SLB/WRR and LB with LARD approaches:
• simulation study
small Rice and IBM web server logs
jiggle simulation parameters to achieve desired result
• Nodes have small memories with greedy-dual replacement.
• WRR combined with global cache-sharing among servers (GMS).
WRR/GMS is global cache LRU with duplicates and cache-sharing cost.
LB/GC is global cache LRU with duplicate suppression and no
cache-sharing cost.
LARD Performance Conclusions
1. WRR has the lowest cache hit ratios and the lowest throughput.
There is much to be gained by improving cache effectiveness.
2. LB* achieve slightly better cache hit ratios than LARD*.
WRR/GMS lags behind...it’s all about duplicates.
3. The caching benefit of LB* is minimal, and LB is almost as good
as LB/GC.
Locality-* request distribution induces good cache behavior at the
back ends: global cache replacement adds little.
4. Better load balancing in the LARD* strategies dominates the
caching benefits of LB*.
LARD/R and LARD achieve the best throughput and scalability;
LARD/R yields slightly better throughput.
LARD Performance: Issues and Questions
1. LB (URL switching) has great cache behavior but lousy
throughput.
Why? Underutilized time results show poor load balancing.
2. WRR/GMS has good cache behavior and great load balancing, but
not-so-great throughput.
Why? How sensitive is it to CPU speed and network speed?
3. What is the impact of front-end caching?
4. What is the effectiveness of bucketed URL hashing policies?
E.g., Foundry: hash URL to a bucket, pick server for bucket based
on load.
5. Why don’t L7 switch products support LARD? Should they?
[USENIX 2000]: use L4 front end; back ends do LARD handoff.
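The bucketed URL hashing raised in question 4 (hash the URL to a bucket, pick a server for the bucket based on load) might look like the sketch below. The details are assumptions for illustration: buckets are bound lazily to the least-loaded server on first use, load is a simple count of routed requests, and no rebalancing is done afterwards. This is not a description of Foundry's implementation.

```python
import zlib


class BucketedUrlHash:
    """Hash URLs into a fixed number of buckets; bind each bucket to a
    server based on load at the time the bucket is first used."""

    def __init__(self, servers, num_buckets):
        self.load = {s: 0 for s in servers}  # requests routed per server
        self.num_buckets = num_buckets
        self.bucket_owner = {}               # bucket index -> server

    def bucket_of(self, url):
        return zlib.crc32(url.encode("utf-8")) % self.num_buckets

    def route(self, url):
        bucket = self.bucket_of(url)
        if bucket not in self.bucket_owner:
            # Load-based choice is made per bucket, not per URL.
            self.bucket_owner[bucket] = min(self.load, key=self.load.get)
        server = self.bucket_owner[bucket]
        self.load[server] += 1
        return server
```

The front end stores one entry per bucket rather than one per URL, which is one way to approach LARD-like locality without a URL lookup table.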
Possible Projects
1. Study the impact of proxy caching on the behavior of the request
distribution policies.
“flatter” popularity distributions
2. Study the behavior of alternative locality-based policies that
incorporate better load balancing in the front-end.
How close can we get to the behavior of LARD without putting a
URL lookup table in the front-end?
E.g., look at URL switching policies in commercial L7 switches.
3. Implement request switching policies in the FreeBSD kernel, and
measure their performance over GigE.
Mods to FreeBSD for recording connection state and forwarding
packets are already in place.
4. How to integrate smart switches with protocols for group
membership or failure detection?