Skoll: A System for Distributed Continuous Quality Assurance

Atif Memon & Adam Porter
University of Maryland
{atif,aporter}@cs.umd.edu
http://www.cs.umd.edu/{projects/skoll,~atif/GUITARWeb/}

Quality Assurance for Large-Scale Systems

• Modern systems are increasingly complex
– Run on numerous platform, compiler & library combinations
– Have tens, hundreds, even thousands of configuration options
– Are evolved incrementally by geographically-distributed teams
– Run atop other frequently changing systems
– Have multi-faceted quality objectives
• How do you QA systems like this?

Distributed Continuous Quality Assurance

• QA processes conducted around the world, around the clock, on powerful virtual computing grids
– Grids can be made up of end-user machines, project-wide resources, or dedicated computing clusters
• General approach
– Divide QA processes into numerous tasks
– Intelligently distribute tasks to clients, who then execute them
– Merge and analyze incremental results to efficiently complete the desired QA process
• Expected benefits
– Massive parallelization allows more, better & faster QA
– Improved access to resources and environments not readily found in-house
– Carefully coordinated QA efforts enable more sophisticated analyses

Collaborators

• Doug Schmidt & Andy Gokhale's group
• Alex Orso
• Myra Cohen
• Murali Haran, Alan Karr, Mike Last, & Ashish Sanil
• Sandro Fouché, Alan Sussman, Cemal Yilmaz (now at IBM TJ Watson) & Il-Chul Yoon

Skoll DCQA Infrastructure & Approach

The Skoll server coordinates a pool of clients through five steps (presented as a figure sequence in the original slides):
1. Model the QA space
2. Reduce the model
3. Distribution: clients send test requests describing their test resources; the server assigns matching tasks
4. Feedback: clients return test results to the server
5. Steering: the server adapts subsequent task allocation to the results so far

See: A. Porter, C. Yilmaz, A. Memon, A. Nagarajan, D. C. Schmidt, and B. Natarajan, "Skoll: A Process and Infrastructure for Distributed Continuous Quality Assurance," IEEE Transactions on Software Engineering, 33(8):510-525, August 2007.
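To make the five-step loop concrete, here is a minimal Python sketch. It illustrates only the shape of the process, not the actual Skoll implementation: the options, the toy constraint, and helpers such as run_on_client are all invented.

```python
import itertools
import random

# Hypothetical option model; a real Skoll model has hundreds of options.
OPTIONS = {"opt_a": [0, 1], "opt_b": [0, 1], "opt_c": ["on", "off"]}

def valid(cfg):
    # Toy constraint: opt_b = 1 requires opt_a = 1.
    return not (cfg["opt_b"] == 1 and cfg["opt_a"] == 0)

def model():
    # Step 1: model the QA space as all option-setting combinations.
    names = list(OPTIONS)
    for values in itertools.product(*OPTIONS.values()):
        yield dict(zip(names, values))

def reduce_model(cfgs):
    # Step 2: reduce the model by dropping constraint-violating configurations.
    return [c for c in cfgs if valid(c)]

def run_on_client(task):
    # Step 3: stand-in for a remote client building/testing one configuration.
    return random.choice(["PASS", "FAIL"])

results = []
for task in reduce_model(model()):
    outcome = run_on_client(task)      # Step 3: distribute & execute
    results.append((task, outcome))    # Step 4: merge incremental feedback
    if outcome == "FAIL":
        pass  # Step 5: steering, e.g., prioritize tasks near the failure

print(len(results), "tasks completed")
```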
The ACE+TAO+CIAO (ATC) System

• ATC characteristics
– 2M+ line open-source CORBA implementation
– Maintained by 40+ geographically-distributed developers
– 20,000+ users worldwide
– Product-line architecture with 500+ configuration options; runs on dozens of OS and compiler combinations
– Continuously evolving: 200+ CVS commits per week
– Quality concerns include correctness, QoS, footprint, compilation time & more

Define QA Space

Option                       | Type              | Settings
Operating System             | compile-time      | {Linux, Windows XP, …}
TAO_HAS_MINIMUM_CORBA        | compile-time      | {True, False}
ORBCollocation               | runtime           | {global, per-orb, no}
ORBConnectionPurgingStrategy | runtime           | {lru, lfu, fifo, null}
ACE_version                  | component version | {v5.4.3, v5.4.4, …}
TAO_version                  | component version | {v1.4.3, v1.4.4, …}
run(ORT/run_test.pl)         | test case         | {True, False}

Constraints:
• TAO_HAS_AMI ⇒ ¬TAO_HAS_MINIMUM_CORBA
• run(ORT/run_test.pl) ⇒ ¬TAO_HAS_MINIMUM_CORBA
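As a rough illustration, the QA space and its constraints might be encoded as follows. This is our own sketch, not Skoll's input format: the space is trimmed to five options (TAO_HAS_AMI is included because the constraints mention it; the OS and version options are omitted for brevity).

```python
import itertools

# Abbreviated QA space from the table above.
QA_SPACE = {
    "TAO_HAS_AMI": [True, False],
    "TAO_HAS_MINIMUM_CORBA": [True, False],
    "ORBCollocation": ["global", "per-orb", "no"],
    "ORBConnectionPurgingStrategy": ["lru", "lfu", "fifo", "null"],
    "run_ORT_test": [True, False],  # run(ORT/run_test.pl)
}

# The slide's two constraints, written as implications (A => B is (not A) or B).
CONSTRAINTS = [
    lambda c: (not c["TAO_HAS_AMI"]) or (not c["TAO_HAS_MINIMUM_CORBA"]),
    lambda c: (not c["run_ORT_test"]) or (not c["TAO_HAS_MINIMUM_CORBA"]),
]

def valid_configs():
    names = list(QA_SPACE)
    for values in itertools.product(*QA_SPACE.values()):
        cfg = dict(zip(names, values))
        if all(check(cfg) for check in CONSTRAINTS):
            yield cfg

print(sum(1 for _ in valid_configs()), "valid configurations")
```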
Nearest Neighbor Search

[Four slides of figures in the original deck illustrate the search step by step.]

Fault Characterization

• We used machine learning techniques (classification trees) to model the option & setting patterns that predict test failures

[Figure: an example classification tree branching on CORBA_MESSAGING, AMI, AMI_POLLER, and AMI_CALLBACK, with leaves labeled OK, ERR-1, ERR-2, and ERR-3.]
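A minimal sketch of this modeling step, assuming pandas and scikit-learn: the option names echo the figure, but the records here are invented, and the real models were fitted to grid-scale test results.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented (configuration, outcome) records standing in for real test results.
records = [
    {"CORBA_MESSAGING": 0, "AMI": 0, "AMI_CALLBACK": 0, "outcome": "OK"},
    {"CORBA_MESSAGING": 1, "AMI": 0, "AMI_CALLBACK": 0, "outcome": "ERR-2"},
    {"CORBA_MESSAGING": 1, "AMI": 1, "AMI_CALLBACK": 0, "outcome": "OK"},
    {"CORBA_MESSAGING": 1, "AMI": 1, "AMI_CALLBACK": 1, "outcome": "ERR-1"},
]
df = pd.DataFrame(records * 10)  # replicate so the tree has data to fit

X, y = df.drop(columns="outcome"), df["outcome"]
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

# The printed tree is the fault-characterization model: each path to an
# ERR-* leaf is an option/setting pattern that predicts a test failure.
print(export_text(tree, feature_names=list(X.columns)))
```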
Applications & Feasibility Studies

• Compatibility testing of component-based systems
• Configuration-level fault characterization
• Test case generation & input-space exploration

Compatibility Testing of Component-Based Systems

Goal
• Given a component-based system, identify the components & specific versions that fail to build

Solution approach
• Sample the configuration space, efficiently test this sample & identify subspaces in which compilation & installation fail
– Initial focus on building & installing components; later work will add functional and performance testing

See: I. Yoon, A. Sussman, A. Memon, and A. Porter, "Direct-Dependency-based Software Compatibility Testing," International Conference on Automated Software Engineering, Nov. 2007 (to appear).

The InterComm (IC) Framework

• Middleware for coupling large scientific simulations
– Built from up to 14 other components (e.g., PVM, MPI, GCC, OS)
– Each component can have several actively maintained versions
– There are complex constraints between components, e.g.:
• Requires GCC version 2.96 or later
• When configured with multiple GNU compilers, all must have the same version number
• When configured with multiple components that use MPI, all must use the same implementation & version
– http://www.cs.umd.edu/projects/hpsl/chaos/ResearchAreas/ic
• Developers need help to
– Identify working/broken configurations
– Broaden the working set (to increase the potential user base)
– Rationally manage support activities

Annotated Component Dependency Graph

• ACDG = (CDG, Ann)
– CDG: a DAG capturing inter-component dependencies
– Ann: component versions & constraints
• Constraints for each configuration, e.g.:
– ver(gf) = x ⇒ ver(gcr) = x
– ver(gf) = 4.1.1 ⇒ ver(gmp) ≥ 4.0
• Configurations can be generated from the ACDG
– 3552 total configurations; building them all takes up to ~10,700 CPU hours

Comp | Versions                   | Description
ic   | 1.5                        | InterComm
ap   | 0.7.9                      | Array mgmt
pvm  | 3.2.6, 3.3.11, 3.4.5       | Parallel data comm.
lam  | 6.5.9, 7.0.6, 7.1.3        | MPI impl
mch  | 1.2.7                      | MPI impl
gf   | 4.0.3, 4.1.1               | GNU Fortran 95
gf77 | 3.3.6, 3.4.6               | GNU Fortran 77
pf   | 6.2                        | PGI Fortran
gxx  | 3.3.6, 3.4.6, 4.0.3, 4.1.1 | GNU C++
pxx  | 6.2                        | PGI C++
mpfr | 2.2.0                      | High-prec. floating point
gmp  | 4.2.1                      | Arbitrary prec. arithmetic
pc   | 6.2                        | PGI C
gcr  | 3.3.6, 3.4.6, 4.0.3, 4.1.1 | GNU C
fc   | 4.0                        | Fedora Core Linux OS

Improving Test Execution

• Configurations often share common build subsequences; this build effort should be reusable across configurations
• Combine all configurations into a data structure called a prefix tree
• Execute the implied test plan across the grid by (1) assigning subpaths to clients and (2) building each subconfiguration in a VM & caching the VMs to enable reuse
• Example: with 8 machines, each able to cache up to 8 VMs, exhaustive testing takes up to 355 hours

[Figure: a prefix-tree fragment with nodes such as fc 4.0, gcr v4.0.3, gmp v4.2.1, and pf 6.2.]

Direct-Dependency (DD) Coverage

• Hypothesis: a component's build process is most likely to be affected by the components on which it directly depends
– A directly depends on B iff there is a path (in the CDG) from A to B containing no other component nodes
• Sampling approach
– Identify all DDs between every pair of components
– Identify all valid instantiations of these DDs (version combinations that violate no constraints)
– Select a (small) set of configurations that covers all valid instantiations of the DDs

Executing the DD Coverage Test Suite

• The DD test suite is much smaller than the exhaustive one
– 211 configurations with 649 components vs. 3552 configurations with 9919 components
– For IC, no loss of test effectiveness (the same build failures were exposed)
• Speedups achieved using 8 machines with an 8-VM cache
– Actual case: 2.54 (18 vs. 43 hrs)
– Best case: 14.69 (52 vs. 355 hrs)

Summary

• Infrastructure in place & working
– Complete client/server implementation using VMware
– Simulator for large-scale tests on limited resources
• Initial results promising, but much work remains
• Ongoing activities
– Alternative algorithms & test execution policies
– More theoretical study of sampling & test execution approaches
– Applying the approach to more software systems

Configuration-Level Fault Characterization

Goal
• Help developers localize configuration-related faults

Current solution approach
• Use covering arrays to sample the configuration space, testing for subspaces in which (1) compilation fails or (2) regression tests fail
• Build models that characterize the configuration options and specific settings that define the failing subspace

See: C. Yilmaz, M. Cohen, and A. Porter, "Covering Arrays for Efficient Fault Characterization in Complex Configuration Spaces," ISSTA '04; journal version in IEEE TSE, 32(1).

Covering Arrays

• Compute the test schedule from t-way covering arrays: a set of configurations in which all ordered t-tuples of option settings appear at least once
• 2-way covering array example (three options, three settings each):

    C1  C2  C3  C4  C5  C6  C7  C8  C9
O1   0   0   0   1   1   1   2   2   2
O2   0   1   2   0   1   2   0   1   2
O3   0   1   2   1   2   0   2   0   1
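The defining property is easy to check mechanically. The sketch below (our own code, with invented names) verifies that for every pair of options in the example above, all nine ordered setting pairs appear in some configuration:

```python
from itertools import combinations, product

# The 2-way covering array from the table above: nine configurations
# (columns C1..C9) over three options with settings {0, 1, 2}.
ARRAY = {
    "O1": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "O2": [0, 1, 2, 0, 1, 2, 0, 1, 2],
    "O3": [0, 1, 2, 1, 2, 0, 2, 0, 1],
}
SETTINGS = [0, 1, 2]

def covers_all_pairs(array):
    # For every pair of options, every ordered pair of settings must occur
    # together in at least one configuration (column).
    for a, b in combinations(array, 2):
        if set(zip(array[a], array[b])) != set(product(SETTINGS, SETTINGS)):
            return False
    return True

print(covers_all_pairs(ARRAY))  # True
```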
Limitations

• Must choose the covering array's strength before computing it
– There is no way to know, a priori, what the right value is
– Our experience suggests failure patterns can change over time
• Choose too high:
– Run more tests than necessary
– Testing might not finish before the next release
– A non-uniform sample negatively affects classification performance
• Choose too low:
– A non-uniform sample negatively affects classification techniques
– Must repeat the process at a higher strength

Incremental Covering Arrays

• Start with traditional covering array(s) of low strength (usually 2)
• Execute the test schedule & classify observed failures
• If resources allow or classification performance requires:
– Increment the strength
– Build a new covering array, using previously run array(s) as seeds

See: S. Fouché, M. Cohen, and A. Porter, "Towards Incremental Adaptive Covering Arrays," ESEC/FSE 2007 (to appear).

Incremental Covering Arrays (cont.)

Multiple covering arrays at each level of t:
• Use t_1 as a seed for the first (t+1)-way array, (t+1)_1
• To create the i-th t-way array, t_i, create a seed of size |(t-1)_1| using non-seeded configurations from (t-1)_i
• If |seed| < |(t-1)_1|, complete the seed with configurations from (t+1)_1

MySQL Case Study

• Project background
– Widely-used, 2M+ line open-source database project
– Continuously evolving & maintained by geographically-distributed developers
– Dozens of configuration options; runs on dozens of OS/compiler combinations
• Case study using release 5.0.24
– Used 13 configuration options with 2-12 settings each (> 110K unique configurations)
– 460 tests per configuration across a grid of 50 machines
– Executed ~50M tests in total, using ~25 CPU-years

Results

• Built 3 traditional and 3 incremental covering arrays for 2 ≤ t ≤ 4
– Traditional sizes: 108, 324, 870
– Incremental sizes: 113, 336 (223), 932 (596)
• The incremental approach exposed & classified the same failures as the traditional approach
• Costs depend on t & the failure patterns
– Failures at level t: Inc > Trad (4-9%)
– Failures at level < t: Inc < Trad (65-87%)
– Failures at level > t: Inc < Trad (28-38%)

Summary

• New application driving infrastructure improvements
• Initial results encouraging
– Applied the process to a configuration space with over 110K configurations
– Found many test failures corresponding to real bugs
– The incremental approach is more flexible than the traditional one; it appears to offer substantial savings in the best case while incurring minimal cost in the worst case
• Ongoing extensions
– MySQL continuous build process
– Community involvement starting. Want to volunteer? Go to http://www.cs.umd.edu

GUI Test Cases – Executable by a "Robot"

• JFCUnit
– Covering other interactions is exponential in sequence length
• Capture/replay
– Tedious
• Testing "common" sequences
– A bad idea
• Model-based techniques
– GUITAR: guitar.cs.umd.edu

Modeling the Event-Interaction Space

• Event-flow graph (EFG)
– Nodes: all GUI events (with starting events marked)
– Edges: the "follows" relationship
• Reverse engineering
– The EFG is obtained automatically
• Test case generation
– Cover all edges

See: A. M. Memon and Q. Xie, "Studying the Fault-Detection Effectiveness of GUI Test Cases for Rapidly Evolving Software," IEEE Transactions on Software Engineering, 31(10):884-896, 2005.
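As a hedged sketch of edge-coverage generation, consider the toy EFG below; the graph and the greedy pairing are ours, not GUITAR's actual algorithm or API.

```python
# A toy event-flow graph: an edge a -> b means event b can immediately
# follow event a in the GUI. Events and edges are invented.
EFG = {
    "File":  ["Open", "Save"],
    "Open":  ["File", "Edit"],
    "Save":  ["File"],
    "Edit":  ["Copy", "Paste"],
    "Copy":  ["Paste", "Edit"],
    "Paste": ["Edit"],
}

def edge_covering_tests(efg):
    """One length-2 test case per EFG edge, so every edge is covered.
    A real generator would also prefix each pair with a path from a
    starting event, making the sequence executable from the initial state."""
    return [[a, b] for a, successors in sorted(efg.items()) for b in successors]

for test in edge_covering_tests(EFG):
    print(" -> ".join(test))
```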
Let's See How It Works!

• Point to the CVS head, push the button, and read the error report
• What happens:
– Gets the code from the CVS head
– Builds it
– Reverse engineers the event-flow graph
– Generates test cases to cover all the edges (2-way covering)
– Runs them
• Applied to four SourceForge.net applications: CrosswordSage, FreeMind, GanttProject, JMSN

[Figure: bar chart of the number of faults detected per application.]

Digging Deeper!

• Intuition
– Some events do not interact (e.g., Save, Find)
– Other events do interact (e.g., Copy, Paste)
• Key idea
– Identify interacting events
– Mark the corresponding EFG edges (an annotated graph)
– Generate 3-way, 4-way, … covering test cases for the interacting EFG events only

Identifying Interacting Events

• High-level overview of the approach
– Observe how events execute on the GUI
– Events interact if they influence one another's execution
• Execute event e2; then execute the event sequence <e1, e2>
• Did e1 influence e2's execution?
• If yes, the pair must be tested further; annotate the <e1, e2> edge in the graph
• Use feedback
– Generate a seed suite of 2-way covering test cases
– Run the test cases, collecting GUI run-time states as feedback
– Analyze the feedback to obtain the interacting event sets
– Generate new 3-way, 4-way, … covering test cases

Did We Do Better?

• Compare the feedback-based approach to 2-way coverage

[Figure: bar chart of the number of faults detected for CrosswordSage, FreeMind, GanttProject, and JMSN.]

Summary

• Manually developed test cases
– JFCUnit, capture/replay
– Can be deployed and executed by a "robot"
– Too many interactions to test (exponential)
• The GUITAR approach
– Develop a model of all possible interactions
– Use abstraction techniques to "sample" the model & develop adequacy criteria
– Generate an initial test suite; then iterate an "execute tests, collect feedback, annotate model, generate tests" cycle
– Feasibility study & results

Future Work

• Need volunteers for the MySQL Build Farm Project
– http://skoll.cs.umd.edu
• Looking for more example systems (help!)
• Continue improving the Skoll system
• New problem classes
– Performance and robustness optimization
– Improved use of test data
  • Test case ROI analysis
  • Configuration advice
  • Cost-aware testing (e.g., minimize power, network, disk usage)
• Use source code analysis to further reduce state spaces
• Extend test generation technology beyond GUI applications
• QA for distributed systems

The End