Understanding & Addressing Blocking-Induced Server Latency
Yaoping Ruan, IBM T.J. Watson Research Center
Vivek Pai, Princeton University

Background – Web Servers
- Previous work focuses on throughput; SPECweb99 mixes throughput & latency
- Network delay has dominated end-user latency, but the server's contribution is increasing:
  - connection speeds are rising
  - multiple data centers reduce round-trip times

Paper Contributions
- Understand server-induced latency
  - Observed in both Flash & Apache
  - Identify blocking in filesystem-related queues
  - Quantify its effects – service inversion
- Address the problem
  - In both servers, using portable techniques, with scalable results
  - 5-50x latency reduction

Outline
- Experimental setup & measurement methodology
- Identifying blocking in Web servers
- New server design
- Results of the new servers

Experimental Setup
- Server-client setup: 3 GHz Pentium 4 with 1 GB memory, FreeBSD 4.6
- Web servers: Flash & Apache 1.3, fairly tuned for performance
- Workload: SPECweb99 static, 3 GB dataset, 1024 simultaneous connections

Latency Analysis Methodology
- Response time vs. load: measure the infinite-demand request rate, then run at 20, 40, 60, 80, 90, and 95% of it
- Record the mean and the 5th, 50th, and 95th percentiles of the latency CDF
- [Figure: Flash latency profile – 336 Mb/s]

Evidence for Blocking in Flash
- Event-driven model: select() or kevent(); each call returns the set of ready events
- Observed about 60-70 events per call
- But the CPU is not saturated – the loop should return more often, with fewer ready events per call
- [Figure: CDF of the number of ready events per call for Flash]

Evidence for Blocking in Apache
- Multiple processes, so some blocking is expected; excessive blocking is hard to identify
- Two configurations; sample the percentage of ready processes per second
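To make the ready-event measurement above concrete, here is a minimal, hypothetical Python sketch of one iteration of a select()-based event loop (illustrative only, not Flash's actual code). The size of the batch a single call returns is exactly the quantity profiled in the Flash CDF: a non-blocked loop should see small batches, while batches of 60-70 mean events piled up while the process was stalled elsewhere.

```python
import select
import socket

def ready_batch(socks, timeout=0.0):
    """One select() call, as in an event-driven server's main loop.

    Returns the descriptors that are ready for reading.  A lightly
    loaded, non-blocking loop returns small batches; large batches
    indicate the process was blocked (e.g. in the filesystem) while
    events accumulated.
    """
    readable, _, _ = select.select(socks, [], [], timeout)
    return readable

# Usage: a connected socket pair; writing on one end makes the other ready.
a, b = socket.socketpair()
assert ready_batch([b], timeout=0.0) == []   # nothing pending yet
a.sendall(b"hello")
assert ready_batch([b], timeout=1.0) == [b]  # one ready event this call
a.close(); b.close()
```

The same measurement works with kevent() on FreeBSD; select() is used here only because it is portable and easy to demonstrate.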
- Bimodal (0 or 60+); worse with 1024

Identifying Blocking Using DeBox
- Exclusive vnode locks: used to reduce complexity and avoid possible deadlocks
- Directory-walk locks: the parent and child directory locks overlap
- Locks during disk access: the parent's lock is downgraded only when the child needs disk access
- Result: lock convoys

Growth of Median Latencies
- The working set is < 200 MB, so medians shouldn't grow this fast
- [Figure: response time (ms), 0-250, vs. load level (20%-100%) for Apache and Flash]

Response Time vs. Dataset Size
- >99.5% cache hits
- [Figure]

Service Inversion
- CDF breakdowns: split the CDF by decile and group responses by size
- Service inversion = the difference between the actual service order and the ideal order
- [Figure: deciles 1 (smallest responses) through 10 (largest)]

Ideal Service Breakdown
- [Figure: ideal CDF breakdown by decile – small files complete first, large files last]

Flash Service Breakdown
- Flash CDF breakdown by decile at load level 0.95
- Service inversion at this level is 0.58
- [Figure: small files no longer finish ahead of large files]

Apache Service Breakdown
- Apache CDF breakdown by decile at load level 0.95
- Service inversion at this level is 0.58
- [Figure]

Solution
- Let blocking happen – elsewhere
- Move filesystem calls out of process:
  - shared backend
  - cache open file descriptors
  - perform misses via helper processes
  - prefetch cold disk blocks
- IPC is better than blocking

Flashpache
- [Figure: Apache vs. Flashpache architecture]

Flash Ready Events
- Mean events per call: Flash 61, fdpass 15, New-Flash 1.6
- [Figure]

Flashpache Ready Processes
- [Figure]

New-Flash Latency Profile
- Flash: 336 Mb/s; New-Flash: 450 Mb/s
- Latency improvement: 6x in mean, 43x in median
- Median & 95th percentile virtually flat
- [Figures: Flash and New-Flash latency profiles]

Flashpache Performance
- Apache: 241 Mb/s; Flashpache: 273 Mb/s
- Latency improvement: 15x in mean, 9x in median
- Median & 95th percentile virtually flat
- [Figures: Apache and Flashpache latency profiles]

Response Time vs. Dataset
- [Figure]

Flash Service Breakdowns
- [Figure: CDF breakdowns by decile for Flash and New-Flash]

Flashpache Service Breakdowns
- [Figure: CDF breakdowns by decile for Apache and Flashpache]

Latency Scalability
- Response time at the 0.95 load level on Pentium II, Pentium III, and Pentium 4 hardware
- [Figure: Apache vs. Flashpache, response time (msec), 0-250]

In the Paper
- More details, measurements, and breakdowns
- Quantifying service inversion

Conclusion
- Much server latency originates from head-of-line blocking
- Its impact on latency is higher than on throughput
- Blocking degrades service fairness
- The problem can be solved in the server application

Thank You
- www.cs.princeton.edu/nsg/papers
- Princeton University Network Systems Group

Backup: Apache on Linux
- [Figure: Apache vs. Flashpache]

Backup: Effects on Response Time
- [Figure: Flash latency CDFs]

Backup: Service Inversion
- [Figure]
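The service-inversion figures quoted in the deck (0.58 at the 0.95 load level) summarize how far the actual completion order strays from the ideal, size-based order. One simple way to compute such a number is the fraction of response pairs served in the opposite relative order from the ideal; this is an illustrative pairwise-inversion metric, not necessarily the paper's exact normalization.

```python
def service_inversion(actual_order, ideal_order):
    """Fraction of response pairs completed in the opposite relative
    order from the ideal (size-based) order.

    0.0 means perfectly fair service; 1.0 means completely inverted.
    Illustrative metric only; the paper's formula may differ.
    """
    rank = {resp: i for i, resp in enumerate(ideal_order)}
    n = len(actual_order)
    inversions = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if rank[actual_order[i]] > rank[actual_order[j]]
    )
    pairs = n * (n - 1) // 2
    return inversions / pairs if pairs else 0.0

# Ideal order: small responses (a, b) finish before large ones (c, d).
assert service_inversion(["a", "b", "c", "d"], ["a", "b", "c", "d"]) == 0.0
assert service_inversion(["d", "c", "b", "a"], ["a", "b", "c", "d"]) == 1.0
```

Blocking raises this number because small responses queue behind large ones that happened to trigger disk access.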
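The shared-backend design in the Solution slide depends on handing open file descriptors between processes, so that only helper processes ever block on disk while the frontend's event loop stays responsive. A minimal sketch of that fd-passing IPC using Python's SCM_RIGHTS wrappers (Python 3.9+; the helper names here are hypothetical, not the Flashpache code):

```python
import os
import socket
import tempfile

def backend_send(sock, path):
    """Backend helper: open the file (the only place a disk stall can
    occur) and pass the descriptor over a Unix-domain socket."""
    fd = os.open(path, os.O_RDONLY)
    socket.send_fds(sock, [b"ok"], [fd])  # SCM_RIGHTS under the hood
    os.close(fd)                          # the receiver holds its own copy

def frontend_recv(sock):
    """Server frontend: receive an already-open descriptor, so the
    event loop never issues a blocking filesystem call itself."""
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    return fds[0]

# Usage: a socketpair stands in for the frontend/backend channel.
front, back = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    path = f.name
backend_send(back, path)
fd = frontend_recv(front)
assert os.read(fd, 5) == b"hello"
os.close(fd)
front.close(); back.close()
os.unlink(path)
```

This is the sense in which "IPC is better than blocking": the frontend pays a small, bounded message cost per cache miss instead of stalling its entire event loop on the filesystem.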