Integrated Resource Management for Cluster-based Internet Services

Kai Shen (Dept. of Computer Science, Univ. of Rochester)
Hong Tang, Tao Yang*, Lingkun Chu (Dept. of Computer Science, Univ. of California, Santa Barbara)
*: Ask Jeeves, Inc.

Background

- Large-scale resource-intensive Internet services hosted on server clusters: Yahoo, MSN, Google, Teoma/Ask Jeeves …
- Challenges/requirements for resource management:
  - Scalability and robustness;
  - Online users require interactive responses;
  - Resource (CPU, I/O)-hungry service processing and large user traffic require efficient resource utilization;
  - Fluctuating user traffic requires adaptive management;
  - Supporting differentiated services for different types of user requests.

6/20/2016 OSDI 2002

Architecture of Targeted Services: Document Search Engine

[Figure: document search engine cluster. Components: firewall/Web switch, Web server/query handlers, query caches, index servers (partitions 1, 2, 3), and doc servers, connected by a local-area network.]

"Neptune" Project Overview

- Programming and runtime support to aggregate and replicate stand-alone service components.
- Building blocks for scalable and robust service construction:
  1. Functionally-symmetric clustering architecture;
  2. Integrated resource management: quality, efficiency, and differentiation;
  3. Replication management.

Architecture of Targeted Services: Document Search Engine

[Figure: the same document search engine cluster with the Neptune runtime and SAP added. Components: firewall/Web switch, Web server/query handlers, query cache, index servers (partitions 1, 2, 3), and doc servers, connected by a local-area network.]

Neptune Deployments

- Service deployments:
  - Web document searching;
  - BLAST: protein sequence similarity matching;
  - Prototype database services: online discussion group, auction.
- Production system at the search engines Teoma/Ask Jeeves since 2000:
  - search indexes of more than 450M Web documents;
  - over 800 multiprocessor servers;
  - tens of millions of search queries per day.

Outline

- Project Overview
- Integrated Resource Management
- Multiple Resource Management Objectives
- Two-level Mechanism
- Trace-driven Performance Evaluation on a Linux Cluster
- Related Work and Conclusion

Quality-aware Resource Utilization Efficiency

- Throughput: measures resource utilization efficiency.
- Service response time: measures client-perceived service quality.
- Aggregate service yield: measures quality-aware resource utilization efficiency.
  - Fulfillment of each service request generates a quality-aware service yield, which is a function of the service response time.
  - Service yield function $Y(r)$: specified by service providers (flexibility).
  - System goal: maximize the aggregate service yield $\sum_{i} Y(r_i)$, where $r_i$ is the response time of request $i$.

Sample Service Yield Functions

<A> Maximizing throughput (with a deadline):

$$Y_{throughput}(r) = \begin{cases} C & \text{if } 0 \le r \le D, \\ 0 & \text{if } r > D. \end{cases}$$

<B> Minimizing mean response time (with a deadline)

<C> A hybrid metric

[Figure: three plots of service yield against response time. <A> is labeled with a constant yield and a deadline. <B> is labeled with a full yield and a deadline. <C> is labeled with a full yield, a drop penalty, a full-yield deadline, and a deadline.]

Service Differentiation

- Service class: a category of service accesses that enjoy the same level of QoS support.
  - Client identities: paid vs. unpaid, consumers vs. corporate partners.
  - Service types or data partitions: order placement vs. catalog browsing.
- Service differentiation in Neptune:
  - Differentiated service yield functions.
  - Proportional resource allocation guarantees.

Two-level Resource Management

[Figure: service clients send requests through cluster-level request distribution to the service nodes hosting the requested service; the service cluster also contains other nodes.]

Cluster-level: Partitioning or Not?

- Periodic server partitioning [Zhu2001]:
  - Determine resource allocation at each epoch.
  - Partition the server pool among service classes.
- Neptune does not partition servers at the cluster level:
  - Random polling-based load balancing evenly distributes requests for each service class to all nodes, so service differentiation happens inside each node.
- Advantages:
  - Functional symmetry and decentralization bring robustness and scalability.
  - Better handling of system state changes: demand spikes and node failures.
- Disadvantage:
  - Less isolation for misbehaving service classes.

Node-level Request Scheduling

Requests wait in per-class queues (Class 1 … Class N). The request scheduler:

1. Drops requests that are likely to generate zero yield.
2. Searches for an under-allocated service class.
3. If one is found, schedules the under-allocated service class; otherwise, schedules for high aggregate yield.

Scheduled requests are handed to worker threads.

Scheduling for High Aggregate Yield

- Offline optimal scheduling is NP-complete.

Policy     Priority (the smaller the higher)
EDF        Relative deadline
YID        Relative deadline divided by expected yield
Greedy     Expected resource consumption divided by expected yield
Adaptive   Dynamically switches between YID (in underload) and Greedy (in overload)

Evaluation Settings

- Evaluation platform: a cluster of Linux servers connected by switched Ethernet.
- Workload I: trace-driven.
  - Document search on a 2.5GB memory-mapped search index.
  - Based on 1.5M search queries selected from a one-week access trace at Ask Jeeves search in January 2002.
- "Service yield"-based priority order: Gold > Silver > Bronze.
- Workload II: CPU-spinning micro-benchmark.
  - Poisson process arrival; exponentially distributed service processing time.
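The node-level scheduling flow and the four priority policies (EDF, YID, Greedy, Adaptive) described in the slides can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Neptune's implementation: the `Request` fields, the zero-yield test (a request that cannot finish before its deadline), the `guarantees`/`usage` maps of per-class resource shares, and the `overloaded` flag are all hypothetical. The slides do not say how under-allocation or overload is detected.

```python
from dataclasses import dataclass


@dataclass
class Request:
    service_class: str     # e.g. "gold", "silver", "bronze"
    deadline: float        # absolute deadline, in seconds
    expected_cost: float   # expected resource consumption, in CPU seconds
    expected_yield: float  # expected yield if the request is served


def priority(policy, req, now, overloaded=False):
    """Priority key from the policy table; a smaller value is served first."""
    relative_deadline = req.deadline - now
    if policy == "EDF":
        return relative_deadline
    if policy == "YID":
        return relative_deadline / req.expected_yield
    if policy == "Greedy":
        return req.expected_cost / req.expected_yield
    if policy == "Adaptive":
        # YID in underload, Greedy in overload (the overload test is assumed).
        return priority("Greedy" if overloaded else "YID", req, now)
    raise ValueError(f"unknown policy: {policy}")


def pick_next(queue, now, policy, guarantees, usage, overloaded=False):
    """Make one scheduling decision; return (chosen request, remaining queue)."""
    # Step 1: drop requests likely to generate zero yield. Assumed test:
    # no yield at all, or the request cannot finish before its deadline.
    live = [r for r in queue
            if r.expected_yield > 0 and now + r.expected_cost <= r.deadline]
    if not live:
        return None, live
    # Step 2: look for classes whose resource share is below their guarantee.
    under = [r for r in live
             if usage.get(r.service_class, 0.0) < guarantees.get(r.service_class, 0.0)]
    # Step 3: serve an under-allocated class if one exists; otherwise
    # schedule for high aggregate yield using the chosen policy.
    candidates = under or live
    chosen = min(candidates, key=lambda r: priority(policy, r, now, overloaded))
    live.remove(chosen)
    return chosen, live


# Example: in overload, Adaptive uses the Greedy key (0.02 for silver, 0.05 for gold).
gold = Request("gold", deadline=1.0, expected_cost=0.5, expected_yield=10.0)
silver = Request("silver", deadline=2.0, expected_cost=0.02, expected_yield=1.0)
first, _ = pick_next([gold, silver], now=0.0, policy="Adaptive",
                     guarantees={}, usage={}, overloaded=True)
print(first.service_class)  # silver
```

Checking the guarantee before the yield-based policy mirrors the order of the flowchart: a class below its proportional share is served first, and the yield-driven policy only decides among the remaining requests.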

Evaluation on Scheduling Policies (16 nodes aggregate)

Performance metric:

$$\text{LossPercent} = \frac{\text{OfferedYield} - \text{RealizedYield}}{\text{OfferedYield}} \times 100\%$$

[Figure: loss percent vs. arrival demand for EDF, YID, Greedy, and Adaptive. Panel (A) Underload: arrival demand 0% to 100%, loss percent 0% to 6%. Panel (B) Overload: arrival demand 100% to 200%, loss percent 0% to 60%.]

- EDF and YID perform better than Greedy during system underload; Greedy performs better during system overload.
- Adaptive dynamically switches between YID and Greedy to achieve good performance in both situations.

Service Differentiation during a Demand Spike and a Node Failure (8 nodes)

[Figure: CPU demand/acquisition as a percentage of total system resource (0% to 100%) over a timeline of 0 to 300 seconds, with demand and acquisition curves for the Bronze, Silver, and Gold classes.]

- "Service yield"-based priority order: Gold > Silver > Bronze.
- 20% proportional resource guarantee for the low-priority Bronze class.
- Demand spike for the Silver class between time 50 and 150.
- One node fails at time 200 and recovers at 250.

Performance Scalability

[Figure: aggregate yield (normalized, 0 to 20) vs. number of service nodes (0 to 20) at demand levels of 200%, 125%, and 75%. Panel <A>: Differentiated Search. Panel <B>: Micro-benchmark.]

Related Work

- Software infrastructure for cluster-based Internet services: TACC [Fox1997], MultiSpace [Gribble1999], Porcupine [Saito1999], Ninja [von Behren2002].
- QoS and service differentiation in computer networks: Weighted Fair Queuing [Demers1990; Parekh1993], Leaky Bucket, LIRA [Stoica1998], [Dovrolis1999].
- QoS or real-time scheduling at the single host level: [Huang1989], [Haritsa1993], [Waldspurger1994], [Mogul1996], LRP [Druschel96], [Jones97], Eclipse [Bruno1998], Resource Container [Banga1999], [Steere1999].
- Resource management and QoS for Web servers: [Almeida1998], [Pandey1998], [Abdelzaher1999], [Bhatti1999], [Chandra2000], [Li2000], [Voigt2001].
- Resource management for clustered servers: LARD [Pai1998], Cluster Reserves [Aron2000], [Sullivan2000], DDSD [Zhu2001], [Chase2001].

Conclusion

- Multiple resource management objectives:
  - quality-aware resource utilization efficiency;
  - service differentiation.
- Two-level resource management mechanism:
  - non-partitioning at the cluster level;
  - adaptive scheduling at the node level.
- Trace-driven evaluations.
- Future work: other types of service qualities.
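
To make the yield-based objective and the LossPercent metric from the evaluation concrete, here is a small sketch. Only the throughput yield function (constant C up to deadline D, zero afterwards) and the LossPercent formula come from the slides. The linear shapes used for the response-time and hybrid functions, the parameter names (`drop_penalty`, `D_full`), the default values, and the definition of offered yield as full yield for every arriving request are assumptions for illustration.

```python
def yield_throughput(r, C=1.0, D=2.0):
    """<A> Constant yield C if the response time r meets deadline D, else 0."""
    return C if 0 <= r <= D else 0.0


def yield_resptime(r, C=1.0, D=2.0):
    """<B> Assumed shape: yield falls linearly from C at r = 0 to 0 at r = D."""
    return C * (1 - r / D) if 0 <= r <= D else 0.0


def yield_hybrid(r, C=1.0, drop_penalty=0.5, D_full=1.0, D=2.0):
    """<C> Assumed shape: full yield C up to the full-yield deadline D_full,
    then a linear fall to drop_penalty at deadline D, and 0 after D."""
    if 0 <= r <= D_full:
        return C
    if D_full < r <= D:
        return C - (C - drop_penalty) * (r - D_full) / (D - D_full)
    return 0.0


def aggregate_yield(response_times, yield_fn):
    """System goal from the slides: the sum of Y(r) over all requests."""
    return sum(yield_fn(r) for r in response_times)


def loss_percent(offered_yield, realized_yield):
    """LossPercent = (OfferedYield - RealizedYield) / OfferedYield * 100%."""
    return (offered_yield - realized_yield) / offered_yield * 100.0


# Example: four requests; the third one misses its deadline entirely.
responses = [0.5, 1.5, 2.5, 0.2]
realized = aggregate_yield(responses, yield_hybrid)   # 1.0 + 0.75 + 0.0 + 1.0
offered = len(responses) * 1.0                        # assumed: full yield each
print(loss_percent(offered, realized))  # 31.25
```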