Locality-Aware Request Distribution in Cluster-based Network Servers
Presented by: Kevin Boos
Authors: Vivek S. Pai, Mohit Aron, et al., Rice University
ASPLOS 1998
*** Figures adapted from original presentation ***

Time Warp to 1998
- Rapid Internet growth
- Bandwidth limitations
- "Cheap" PCs and "fast" LANs
- Need for increased throughput

Clustered Servers
[Figure: clients connecting to a cluster of back-end nodes]

Weighted Round Robin (WRR)
[Figure: front-end node dealing requests for targets A, B, and C across the back-end nodes in rotation, regardless of content]

Pure Locality-Based Distribution
[Figure: front-end node sending every request for a given target (A, B, or C) to the same back-end node, regardless of load]

Motivation for Change
Weighted Round Robin:
- Disregards the content cached on back-end nodes
- Many cache misses
- Limited by disk performance
Pure Locality-Based Distribution:
- Disregards the current load on back-end nodes
- Uneven load distribution
- Inefficient use of resources

LARD Concepts
Locality-Aware Request Distribution
Goal: improve performance
- Higher throughput
- Higher cache hit rates
- Reduced disk access
- Even load distribution + content-based distribution
The best of both algorithms

Outline
- Basic LARD Algorithm
- Improvements to LARD
- TCP Handoff Protocol
- Simulation and Results
- Prototype Implementation and Testing

Basic LARD Algorithm
- Front-end maps target content to back-end nodes (a 1-to-1 mapping)
- The first request for each target is assigned to the least-loaded back-end node
- Subsequent requests for that target go to the same back-end node, per the mapping
- Unless that node is overloaded, in which case the target is re-assigned to a new back-end node

Flow of Basic LARD
[Figure: a client request flowing through the front-end to the back-end node mapped to its target]

Determining Load in Basic LARD
- Ask the server? That introduces unnecessary communication
- Instead, current load = number of open connections, tracked at the front-end node
- Use thresholds to determine when to re-balance: Tlow, Thigh, and Tlimit
- Re-balance when (load > Tlimit), or (load > Thigh and there is a "free" node with load < Tlow)

LARD Needs Improvement
- Only one back-end node serves each target, so a target's working set is a single node
- The front-end must limit total connections
- Throughput still needs to increase, and one node per content type is unrealistic
- ...add more back-end nodes per target?

LARD/R
LARD with Replication
- Maps target content to a set of back-end nodes
- A target's working set is several nodes with similar cache contents
- Sends each new request to the least-loaded node in the set
- Moves nodes to/from sets based on load imbalance: idle nodes in a low-load set are moved to a higher-load set
(A sketch of this dispatch logic follows below.)

Flow of LARD/R
[Figure: a client request flowing through the front-end to the least-loaded node in its target's server set]
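To make the dispatch logic of the preceding slides concrete, here is a minimal Python sketch of the basic LARD loop with its threshold-based re-balancing rule, plus the LARD/R "least-loaded node in the set" selection. The class names, method names, and threshold values are illustrative assumptions, not the authors' code; the real front-end tracks open connections inside the kernel.

```python
# Minimal sketch of LARD dispatch (assumed names and threshold values).
T_LOW, T_HIGH, T_LIMIT = 25, 65, 130  # re-balance thresholds (illustrative)

class Node:
    def __init__(self, name):
        self.name = name
        self.load = 0  # load = number of open connections, tracked at the front-end

class LardFrontEnd:
    def __init__(self, nodes):
        self.nodes = nodes
        self.server_for = {}  # target content -> assigned back-end node

    def least_loaded(self):
        return min(self.nodes, key=lambda n: n.load)

    def dispatch(self, target):
        node = self.server_for.get(target)
        if node is None:
            # First request for this target: assign the least-loaded node.
            node = self.least_loaded()
        elif node.load > T_LIMIT or (node.load > T_HIGH
                                     and self.least_loaded().load < T_LOW):
            # Assigned node is overloaded (load > Tlimit), or it is busy
            # while a "free" node exists: re-assign the target.
            node = self.least_loaded()
        self.server_for[target] = node
        node.load += 1  # connection opens; decremented elsewhere when it closes
        return node

class LardRFrontEnd(LardFrontEnd):
    """LARD/R sketch: map each target to a *set* of nodes and pick the
    least-loaded member; growing/shrinking the set on load imbalance
    is omitted here."""
    def dispatch(self, target):
        server_set = self.server_for.setdefault(target, {self.least_loaded()})
        node = min(server_set, key=lambda n: n.load)
        node.load += 1
        return node
```

For example, front_end.dispatch("/index.html") returns the node that should serve the request: the first call assigns the least-loaded node, and later calls stick to it unless the thresholds trigger a re-assignment.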
Determining Content Type
- How do we determine content in the front-end? The front-end must see the network traffic
- Standard TCP assumptions: requests are small and light; responses are big and heavy
- How do we forward requests?

Potential TCP Solutions
Simple TCP Proxy:
- Everything must flow through the front-end node
- The front-end can inspect all incoming content
- The back-end cannot respond directly to the client, but the front-end can also inspect all outgoing content
- Better for persistent connections
(A sketch of the simple proxy appears at the end of these notes.)

TCP Connection Handoff
- The front-end accepts the client connection and inspects the content
- It then hands the request off to a back-end node
- The response is returned directly to the client from the back-end node

Evaluation Goals
- Throughput: requests/second served by the entire cluster
- Hit rate: (requests that hit the memory cache) / (total requests)
- Underutilization time: time during which a node's load is ≤ 40% of Tlow

Simulation Model
- 300MHz Pentium II, 32MB memory (cache)
- 100Mbps Ethernet
- Traces from web servers at Rice and IBM

Simulation Results – Prior Work
Weighted Round Robin:
- Lowest throughput and highest cache miss ratio
- But lowest idle time
Pure Locality-Based:
- Increasing the number of nodes decreases the cache miss ratio
- But idle time increases (unbalanced load)
- Only a minor improvement over WRR

Simulation Results – LARD & LARD/R
- Throughput ~4x better (at 8 nodes); WRR would need nodes with a 10x larger cache to match
- CPU-bound after 8 nodes
- Cache miss rate decreases
- Only 1% idle time on average

Simulation Results – Throughput
[Graph: cluster throughput vs. number of nodes]

Simulation Results – Cache Misses
[Graph: cache miss ratio vs. number of nodes]

Simulation Results – Idle Time
[Graph: node idle time vs. number of nodes]

What Affects Performance?
- WRR is disk-bound; LARD/R is CPU-bound
- Increasing CPU speed improves LARD/R, not WRR
- Adding more disks improves WRR, not LARD/R (LARD/R shows no improvement once a node has more than 2 disks)
- WRR is not scalable

Prototype Implementation
- One front-end PC: 300MHz Pentium II, 128MB RAM
- 6 back-end PCs
- 7 client PCs: 166MHz Pentium Pro, 64MB RAM
- 100Mbps Ethernet, 24-port switch

Prototype Testing Results
[Graph: prototype throughput results]

Evaluation Shortcomings
- Which influences the results more: the LARD/R distribution strategy, or the TCP handoff protocol?

Conclusion
- LARD and LARD/R are significantly better than WRR: higher throughput, better CPU utilization, more frequent cache hits, and reduced disk access
- Combines the benefits of locality-based and load-balanced distribution
- Scalable at low cost
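For reference, here is a minimal Python sketch of the "simple TCP proxy" alternative from the Potential TCP Solutions slide, showing why every byte of the large response must cross the front-end on its way back to the client. The address, port, and one-request-per-connection handling are assumptions for illustration, not the paper's implementation.

```python
# Sketch of the "simple TCP proxy" approach: the front-end relays both the
# request and the entire response, so back-ends cannot reply to clients
# directly. Addresses, ports, and single-shot handling are assumptions.
import socket

BACKEND = ("backend-1.example.com", 8080)  # hypothetical back-end address

def proxy_one_connection(client_sock):
    request = client_sock.recv(4096)        # requests are small ("light")
    with socket.create_connection(BACKEND) as backend_sock:
        backend_sock.sendall(request)
        while True:
            chunk = backend_sock.recv(4096)  # responses are big ("heavy"):
            if not chunk:                    # every chunk crosses the
                break                        # front-end on its way back
            client_sock.sendall(chunk)
    client_sock.close()

def run_front_end(port=80):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("", port))
        srv.listen(128)
        while True:
            client, _addr = srv.accept()
            proxy_one_connection(client)
```

TCP connection handoff avoids exactly this relaying: the front-end inspects only the small request and hands the established connection to the chosen back-end, whose response then flows straight to the client.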