Understanding & Addressing Blocking-Induced Server Latency
Yaoping Ruan, IBM T.J. Watson Research Center
Vivek Pai, Princeton University

Background – Web Servers
- Previous work focuses on throughput; SPECweb99 mixes throughput & latency
- Network delay has dominated end-user latency, but the server's contribution is increasing:
  - connection speeds are rising
  - multiple data centers reduce round-trip times

Paper Contributions
- Understand server-induced latency
  - Observed in both Flash & Apache
  - Identify blocking in filesystem-related queues
  - Quantify its effects – service inversion
- Address the problem
  - In both servers, using portable techniques, with scalable results
  - 5-50x latency reduction

Outline
- Experimental setup & measurement methodology
- Identifying blocking in Web servers
- New server design
- Results of the new servers

Experimental Setup
- Server-client setup: 3 GHz Pentium 4 with 1 GB memory, FreeBSD 4.6
- Web servers: Flash & Apache 1.3, fairly tuned for performance
- Workload: SPECweb99 static, 3 GB dataset, 1024 simultaneous connections

Latency Analysis Methodology
- Response time vs. load: measure the infinite-demand request rate, then run at 20, 40, 60, 80, 90, and 95% of it
- Record the mean and the 5th, 50th, and 95th percentiles of the latency CDF
- [Figure: Flash latency profile – 336 Mb/s]

Evidence for Blocking in Flash
- Event-driven model: select() or kevent(); each call returns the set of ready events
- Observed about 60-70 events per call
- But the CPU is not saturated – the loop should return more often, with fewer ready events per call
- [Figure: CDF of the number of ready events per call for Flash]

Evidence for Blocking in Apache
- Multiple processes, so some blocking is expected; excessive blocking is hard to identify
- Two configurations; sample the percentage of ready processes per second
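To make the ready-event measurement above concrete, here is a minimal, hypothetical Python sketch of one iteration of a select()-based event loop (illustrative only, not Flash's actual code). The size of the batch a single call returns is exactly the quantity profiled in the Flash CDF: a non-blocked loop should see small batches, while batches of 60-70 mean events piled up while the process was stalled elsewhere.

```python
import select
import socket

def ready_batch(socks, timeout=0.0):
    """One select() call, as in an event-driven server's main loop.

    Returns the descriptors that are ready for reading.  A lightly
    loaded, non-blocking loop returns small batches; large batches
    indicate the process was blocked (e.g. in the filesystem) while
    events accumulated.
    """
    readable, _, _ = select.select(socks, [], [], timeout)
    return readable

# Usage: a connected socket pair; writing on one end makes the other ready.
a, b = socket.socketpair()
assert ready_batch([b], timeout=0.0) == []   # nothing pending yet
a.sendall(b"hello")
assert ready_batch([b], timeout=1.0) == [b]  # one ready event this call
a.close(); b.close()
```

The same measurement works with kevent() on FreeBSD; select() is used here only because it is portable and easy to demonstrate.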
- Bimodal (0 or 60+); worse with 1024

Identifying Blocking Using DeBox
- Exclusive vnode locks: used to reduce complexity and avoid possible deadlocks
- Directory-walk locks: the parent and child directory locks overlap
- Locks during disk access: the parent's lock is downgraded only when the child needs disk access
- Result: lock convoys

Growth of Median Latencies
- The working set is < 200 MB, so medians shouldn't grow this fast
- [Figure: response time (ms), 0-250, vs. load level (20%-100%) for Apache and Flash]

Response Time vs. Dataset Size
- >99.5% cache hits
- [Figure]

Service Inversion
- CDF breakdowns: split the CDF by decile and group responses by size
- Service inversion = the difference between the actual service order and the ideal order
- [Figure: deciles 1 (smallest responses) through 10 (largest)]

Ideal Service Breakdown
- [Figure: ideal CDF breakdown by decile – small files complete first, large files last]

Flash Service Breakdown
- Flash CDF breakdown by decile at load level 0.95
- Service inversion at this level is 0.58
- [Figure: small files no longer finish ahead of large files]

Apache Service Breakdown
- Apache CDF breakdown by decile at load level 0.95
- Service inversion at this level is 0.58
- [Figure]

Solution
- Let blocking happen – elsewhere
- Move filesystem calls out of process:
  - shared backend
  - cache open file descriptors
  - perform misses via helper processes
  - prefetch cold disk blocks
- IPC is better than blocking

Flashpache
- [Figure: Apache vs. Flashpache architecture]

Flash Ready Events
- Mean events per call: Flash 61, fdpass 15, New-Flash 1.6
- [Figure]

Flashpache Ready Processes
- [Figure]

New-Flash Latency Profile
- Flash: 336 Mb/s; New-Flash: 450 Mb/s
- Latency improvement: 6x in mean, 43x in median
- Median & 95th percentile virtually flat
- [Figures: Flash and New-Flash latency profiles]

Flashpache Performance
- Apache: 241 Mb/s; Flashpache: 273 Mb/s
- Latency improvement: 15x in mean, 9x in median
- Median & 95th percentile virtually flat
- [Figures: Apache and Flashpache latency profiles]

Response Time vs. Dataset
- [Figure]

Flash Service Breakdowns
- [Figure: CDF breakdowns by decile for Flash and New-Flash]

Flashpache Service Breakdowns
- [Figure: CDF breakdowns by decile for Apache and Flashpache]

Latency Scalability
- Response time at the 0.95 load level on Pentium II, Pentium III, and Pentium 4 hardware
- [Figure: Apache vs. Flashpache, response time (msec), 0-250]

In the Paper
- More details, measurements, and breakdowns
- Quantifying service inversion

Conclusion
- Much server latency originates from head-of-line blocking
- Its impact on latency is higher than on throughput
- Blocking degrades service fairness
- The problem can be solved in the server application

Thank You
- www.cs.princeton.edu/nsg/papers
- Princeton University Network Systems Group

Backup: Apache on Linux
- [Figure: Apache vs. Flashpache]

Backup: Effects on Response Time
- [Figure: Flash latency CDFs]

Backup: Service Inversion
- [Figure]
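The service-inversion figures quoted in the deck (0.58 at the 0.95 load level) summarize how far the actual completion order strays from the ideal, size-based order. One simple way to compute such a number is the fraction of response pairs served in the opposite relative order from the ideal; this is an illustrative pairwise-inversion metric, not necessarily the paper's exact normalization.

```python
def service_inversion(actual_order, ideal_order):
    """Fraction of response pairs completed in the opposite relative
    order from the ideal (size-based) order.

    0.0 means perfectly fair service; 1.0 means completely inverted.
    Illustrative metric only; the paper's formula may differ.
    """
    rank = {resp: i for i, resp in enumerate(ideal_order)}
    n = len(actual_order)
    inversions = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if rank[actual_order[i]] > rank[actual_order[j]]
    )
    pairs = n * (n - 1) // 2
    return inversions / pairs if pairs else 0.0

# Ideal order: small responses (a, b) finish before large ones (c, d).
assert service_inversion(["a", "b", "c", "d"], ["a", "b", "c", "d"]) == 0.0
assert service_inversion(["d", "c", "b", "a"], ["a", "b", "c", "d"]) == 1.0
```

Blocking raises this number because small responses queue behind large ones that happened to trigger disk access.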
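The shared-backend design in the Solution slide depends on handing open file descriptors between processes, so that only helper processes ever block on disk while the frontend's event loop stays responsive. A minimal sketch of that fd-passing IPC using Python's SCM_RIGHTS wrappers (Python 3.9+; the helper names here are hypothetical, not the Flashpache code):

```python
import os
import socket
import tempfile

def backend_send(sock, path):
    """Backend helper: open the file (the only place a disk stall can
    occur) and pass the descriptor over a Unix-domain socket."""
    fd = os.open(path, os.O_RDONLY)
    socket.send_fds(sock, [b"ok"], [fd])  # SCM_RIGHTS under the hood
    os.close(fd)                          # the receiver holds its own copy

def frontend_recv(sock):
    """Server frontend: receive an already-open descriptor, so the
    event loop never issues a blocking filesystem call itself."""
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    return fds[0]

# Usage: a socketpair stands in for the frontend/backend channel.
front, back = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    path = f.name
backend_send(back, path)
fd = frontend_recv(front)
assert os.read(fd, 5) == b"hello"
os.close(fd)
front.close(); back.close()
os.unlink(path)
```

This is the sense in which "IPC is better than blocking": the frontend pays a small, bounded message cost per cache miss instead of stalling its entire event loop on the filesystem.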