I/O System Performance Debugging Using Model-driven Anomaly Characterization
Kai Shen, Ming Zhong, Chuanpeng Li
Dept. of Computer Science, Univ. of Rochester

Motivation
- Implementations of complex systems (e.g., operating systems) contain performance "problems" — over-simplification, mishandling of special cases, and so on. These problems degrade system performance and make system behavior unpredictable.
- Such problems are hard to identify and understand for complex systems: many system features and configuration settings, dynamic workload behaviors, and problems that manifest only under special conditions.
- Goal: comprehensively identify performance problems over wide ranges of system configurations and workload conditions.

Bird's Eye View of Our Approach
- Construct models to predict system performance: "simple" in that system components are modeled following their high-level design algorithms; "comprehensive" in that wide ranges of system configurations and workload conditions are considered.
- Model-driven anomaly characterization: discover performance anomalies (discrepancies between model prediction and measured actual performance), then characterize them and attribute them to possible causes.
- What can you do with the anomaly characterizations? Make the system perform better and more predictably through debugging, and identify problematic settings for avoidance.

Operating System Support for Disk I/O-Bound Online Servers
- Disk I/O-bound online servers: server processing accesses large disk-resident data. Examples: Web servers serving large Web data, index searching, and database-driven server systems. Their complex workload characteristics affect performance.
- Operating system support: I/O prefetching; disk I/O scheduling (elevator, anticipatory, ...); file system layout and meta-data management; memory caching.

A "Simple" Yet "Comprehensive" Throughput Model
- [Figure: workload characteristics, OS configuration, and storage properties feed a stack of four models — memory caching, I/O prefetching, I/O scheduling, and the storage device. Each layer transforms the workload (workload -> workload' -> workload'' -> workload''') and refines the predicted I/O throughput (throughput' -> throughput'' -> throughput''').]
- Decompose a complex system into weakly coupled subcomponents (layers).
- Each layer transforms the workload and alters the I/O throughput.
- Consider wide ranges of workloads and server concurrency.

Model-Driven Anomaly Characterization
- An OS implementation may deviate from model prediction due to over-simplification, mishandling of special cases, and so on. A "performance bug" may only manifest under specific system configurations or workload conditions.
- [Figure: sampled workload and configuration settings drive both real system measurement and performance model prediction; comparing the two yields anomalous settings, and statistical clustering and characterization turns these into representative anomalous settings and performance bug profiles (correlated system components and workload conditions).]

Parameter Sampling
- We choose a set of system configurations and workload properties at which to check for performance anomalies. Sample parameters are chosen from a parameter space whose dimensions include system configurations (x, y, ...) and workload properties (z, ...).
- If we choose samples randomly and independently, the chance of missing a bug decreases exponentially as the number of samples increases.

Sampling Parameter Space
- Workload properties: server concurrency; the I/O access pattern within each stream; application inter-I/O think time.
- OS configurations: prefetching enabled (with a prefetching depth) or disabled; I/O scheduling with the elevator or anticipatory scheduler; memory caching enabled or disabled.
- A sketch of the resulting sample/predict/measure/compare loop follows below.
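To make the loop concrete, here is a minimal Python sketch of sampling settings and flagging anomalies. Everything in it — the dimension value lists, the function names, the 0.2 flagging threshold, and the predict/measure callbacks — is an illustrative assumption rather than the authors' actual harness; the error metric, 1 − measured/predicted, is the one used in the results later in the talk.

```python
import random

# Hypothetical sketch of the sample/predict/measure/compare loop.
# Dimension values mirror the sampled parameter space described above
# but are illustrative, not the authors' exact sample set.
CONCURRENCY = [1, 2, 4, 8, 16, 32, 64, 128, 256]
STREAM_LENGTH_KB = [16, 64, 256, 1024, 4096]
SCHEDULER = ["elevator", "anticipatory"]
PREFETCHING = ["disabled", "depth-32", "depth-128"]

def sample_setting():
    """Draw one setting randomly and independently, so the chance of
    missing a bug region decays exponentially in the sample count."""
    return {
        "concurrency": random.choice(CONCURRENCY),
        "stream_length_kb": random.choice(STREAM_LENGTH_KB),
        "scheduler": random.choice(SCHEDULER),
        "prefetching": random.choice(PREFETCHING),
    }

def find_anomalous_settings(predict, measure, n_samples=400, threshold=0.2):
    """predict(setting) and measure(setting) return I/O throughput in
    MB/sec from the model and from a benchmark run; both are assumed
    to be supplied by the caller.  Flags settings whose error
    (1 - measured/predicted) exceeds an illustrative threshold."""
    anomalies = []
    for _ in range(n_samples):
        setting = sample_setting()
        error = 1.0 - measure(setting) / predict(setting)
        if error > threshold:
            anomalies.append((setting, error))
    return anomalies
```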
Anomaly Clustering
- [Figure: anomalous settings plotted in the parameter space (system configurations x and y, workload property z), forming cross-intersecting rectangular groups.]
- Anomalous settings may be due to multiple causes (bugs): it is hard to make observations from the set of all anomalous settings, so it is desirable to cluster them into groups likely attributable to individual causes.
- Existing clustering algorithms (EM, K-means) do not handle cross-intersected clusters.
- We therefore perform hyper-rectangle clustering (sketched after the next slide).

Anomaly Characterization
- It is hard to derive useful debugging information from a raw group of anomalous settings; succinct characterizations are desirable.
- Characterization is easy after hyper-rectangle clustering: simply project the hyper-rectangle onto all dimensions (see the sketch below).
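The slides name hyper-rectangle clustering without spelling out the algorithm, so the following is only one plausible greedy reading, under two assumptions: every sampled setting is encoded as a tuple of numbers (categorical knobs mapped to integers), and a box may grow only while it stays mostly free of non-anomalous samples. `grow_box`, `min_precision`, and the seed order are hypothetical choices, not the authors' method.

```python
def inside(box, point):
    """box is a list of (low, high) ranges, one per dimension."""
    return all(lo <= x <= hi for (lo, hi), x in zip(box, point))

def grow_box(seed, anomalous, normal, min_precision=0.9):
    """Grow an axis-aligned hyper-rectangle around one anomalous seed,
    one dimension at a time, while it keeps covering mostly anomalous
    samples.  min_precision is an illustrative knob."""
    box = [(x, x) for x in seed]
    grew = True
    while grew:
        grew = False
        for d in range(len(seed)):
            for value in sorted({p[d] for p in anomalous}):
                lo, hi = box[d]
                if lo <= value <= hi:
                    continue  # already covered along this dimension
                trial = list(box)
                trial[d] = (min(lo, value), max(hi, value))
                covered_a = sum(inside(trial, p) for p in anomalous)
                covered_n = sum(inside(trial, p) for p in normal)
                if covered_a / (covered_a + covered_n) >= min_precision:
                    box, grew = trial, True
    return box

def hyper_rectangle_clusters(anomalous, normal):
    """Cover all anomalous samples with greedily grown boxes.  Boxes
    grown from different seeds may overlap, so unlike EM or K-means
    the resulting clusters can legitimately cross-intersect."""
    uncovered, boxes = list(anomalous), []
    while uncovered:
        box = grow_box(uncovered[0], anomalous, normal)
        boxes.append(box)
        uncovered = [p for p in uncovered if not inside(box, p)]
    return boxes

# Characterization, as described above, is then just the projection:
# box[d] gives the (low, high) range along dimension d, e.g.
# "concurrency: 128 and above, stream length: 256KB and above".
```

Because each box is grown independently from its own seed, two clusters can share interior volume — exactly the cross-intersection case that partitioning algorithms cannot express — and the per-dimension projection directly yields the succinct range descriptions reported in the anomaly results below.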
Experimental Setup
- A micro-benchmark that can be configured to exhibit any desired workload pattern.
- Linux 2.6.10 kernel.
- Procedure: parameter sampling (400 samples); anomaly clustering and characterization for one possible bug; human debugging (assisted by a kernel tracing tool).

Result: Top 50 Model/Measurement Errors out of 400 Samples
- Error defined as: 1 - (measured throughput / model-predicted throughput).
- [Figure: model/measurement error (0-100%) over the sample parameter settings ranked by error, for the original Linux 2.6.10 and with cumulative bug fixes #1; #1, #2; #1, #2, #3; and #1, #2, #3, #4.]

Result: Anomaly #1
- Workload properties: concurrency 128 and above; stream length 256KB and above. System configuration: prefetching enabled.
- The cause: when the disk queue is "congested", prefetching is cancelled; however, the cancelled prefetching sometimes includes synchronously requested data, which is then resubmitted as single-page "makeup" I/O.
- Solutions: do not cancel prefetching that includes synchronously requested data, or block reads when the disk queue is "congested".

Result: Anomalies #2, #3, #4
- Anomaly #2 concerns the anticipatory I/O scheduler: it uses the average seek distance of past requests to estimate seek time.
- Anomaly #3 concerns the elevator I/O scheduler: it always searches from block address 0 for the next request after a "reset".
- Anomaly #4 concerns the anticipatory I/O scheduler: a large I/O operation is often split into small disk requests, and the anticipation timer is started after the first disk request returns.

Result: Overall Predictability
- [Figure: model-predicted vs. measured I/O throughput (0-35 MB/sec) over ranked sample parameter settings, for the original Linux 2.6.10 and after the four bug fixes.]

Support for Real Applications
- Index searching from the Ask Jeeves search engine: search workload following a 2002 Ask Jeeves trace; anticipatory I/O scheduler.
- Apache Web server: media-clips workload following the IBM 1998 World Cup trace; elevator I/O scheduler.
- [Figure: I/O throughput (MB/sec) vs. server concurrency (1 to 256) for the original Linux 2.6.10 and with cumulative bug fixes (#1 and #1, #3 for index searching; #1; #1, #2; and #1, #2, #4 for Apache).]

Related Work
- I/O system performance modeling: storage devices [Ruemmler & Wilkes 1994] [Kotz et al. 1994] [Worthington et al. 1994] [Shriver et al. 1998] [Uysal et al. 2001]; OS I/O subsystem [Cao et al. 1995] [Shenoy & Vin 1998] [Shriver et al. 1999].
- Performance debugging: fine-grain system instrumentation and simulation [Goldberg & Hennessy 1993] [Rosenblum et al. 1997]; analyzing online traces [Chen et al. 2002] [Aguilera et al. 2003].
- Correctness (non-performance) debugging: code analysis [Engler et al. 2001] [Li et al. 2004]; configuration debugging [Nagaraja et al. 2004] [Wang et al. 2004].

Summary
- Model-driven anomaly characterization is a systematic approach to assist performance debugging for complex systems over wide ranges of runtime conditions.
- For disk I/O-bound online servers, we discovered several performance bugs in the Linux 2.6.10 kernel.
- A Linux 2.6.10 kernel patch for bug fix #1 is available at http://www.cs.rochester.edu/~cli/Publication/patch1.htm