Efficient Time-Aware Prioritization with Knapsack Solvers
Sara Alspaugh, Kristen R. Walcott, Mary Lou Soffa (University of Virginia)
Michael Belanich, Gregory M. Kapfhammer (Allegheny College)
ACM WEASEL Tech, November 5, 2007

Test Suite Prioritization
- Testing occurs throughout the software development life cycle
- Challenge: testing is time consuming and costly
- Prioritization: reordering the test suite
  - Goal: find errors sooner in testing
  - Does not consider the overall time budget
- Alternative: time-aware prioritization
  - Goal 1: find errors sooner in testing
  - Goal 2: execute within the time constraint

Motivating Example
- Original test suite with fault information: T1 (4 faults, 2 min.), T2 (1 fault, 2 min.), T3 (2 faults, 2 min.), T4 (6 faults, 2 min.)
- Assume: each test has the same execution time and finds unique faults
- Prioritized test suite: T4 (6 faults), T1 (4 faults), T3 (2 faults), T2 (1 fault)
- Testing time budget: 4 minutes, so only the first two tests in the prioritization (T4 and T1) execute, finding 10 of the 13 faults

The Knapsack Problem for Time-Aware Prioritization
- Maximize $\sum_{i=1}^{n} c_i x_i$, where $c_i$ is the code coverage of test $i$ and $x_i$ is either 0 or 1.
- Subject to the constraint $\sum_{i=1}^{n} t_i x_i \le t_{max}$, where $t_i$ is the execution time of test $i$ and $t_{max}$ is the time budget.
- Example, assuming the test cases cover unique requirements: T1 (4 lines, 2 min.), T2 (1 line, 2 min.), T3 (2 lines, 2 min.), T4 (5 lines, 2 min.); time budget: 4 min. Selecting T4 and then T1 reduces the space remaining from 4 to 2 to 0 min. and yields a total value of 9.

The Extended Knapsack Problem
- The value of each test case depends on the test cases already in the prioritization
- Test cases may cover the same requirements
- Same example, same 4 min. budget: after T4 (5 lines) is selected, T1's value is updated from 4 to 0 because its lines are already covered, so T3 (2 lines) is selected next, for a total value of 7

Goals and Challenges
- Evaluate traditional and extended knapsack solvers for use in time-aware prioritization
  - Effectiveness: coverage-based metrics
  - Efficiency: time overhead and memory overhead
- How does overlapping code coverage affect the results of the traditional techniques?
- Is the cost of the extended knapsack algorithms worthwhile?

The Knapsack Solvers
- Random: select test cases at random
- Greedy by Ratio: order by coverage/time (a minimal sketch appears after the implementation details below)
- Greedy by Value: order by coverage
- Greedy by Weight: order by time
- Dynamic Programming: break the problem into sub-problems; use the sub-problem results to build the main solution
- Generalized Tabular: use large tables to store sub-problem solutions

The Knapsack Solvers (continued)
- Core: compute the optimal fractional solution, then exchange items until the optimal integral solution is found
- Overlap-Aware: uses a genetic algorithm to solve the extended knapsack problem for time-aware prioritization

The Scaling Heuristic
- Order the test cases $T_i$ by their coverage-to-execution-time ratio such that $\frac{c_1}{t_1} \ge \frac{c_2}{t_2} \ge \dots \ge \frac{c_n}{t_n}$.
- If $c_1 \times \left\lfloor \frac{t_{max}}{t_1} \right\rfloor \ge c_2 \times \left( \frac{t_{max}}{t_2} \right)$, then it is possible to find an optimal solution that includes $T_1$.
- Check the inequality for each test case $T_x$, $x \in [1, n]$, until it no longer holds; $\langle T_1, \ldots, T_{x-1} \rangle$ belong in the final prioritization.

Implementation Details
- Components: Program Under Test (P), Test Suite (T), Coverage Calculator, Test Transformer, Knapsack Solver, New Test Suite (T')
- Knapsack solver parameters:
  1. Selected solver
  2. Reduction preference
  3. Knapsack size
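As a concrete illustration of the time budget constraint and the Greedy by Ratio solver described above, the following is a minimal sketch applied to the four-test example from the slides. The TestCase fields and function name are assumptions made for illustration, not the authors' actual implementation.

```python
# Minimal sketch of a "Greedy by Ratio" knapsack solver for time-aware
# prioritization. Illustrative only; names and structure are assumptions.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TestCase:
    name: str
    coverage: float   # e.g., number of basic blocks (or lines) covered
    time: float       # execution time in minutes


def greedy_by_ratio(tests: List[TestCase], t_max: float) -> Tuple[List[TestCase], float]:
    """Order tests by coverage/time and pack them until the budget t_max is spent."""
    ordered = sorted(tests, key=lambda t: t.coverage / t.time, reverse=True)
    chosen, used = [], 0.0
    for t in ordered:
        if used + t.time <= t_max:
            chosen.append(t)
            used += t.time
    return chosen, used


if __name__ == "__main__":
    # The four-test example from the slides, with a 4-minute budget.
    suite = [
        TestCase("T1", coverage=4, time=2),
        TestCase("T2", coverage=1, time=2),
        TestCase("T3", coverage=2, time=2),
        TestCase("T4", coverage=5, time=2),
    ]
    prioritized, used = greedy_by_ratio(suite, t_max=4)
    print([t.name for t in prioritized], used)  # ['T4', 'T1'] 4.0
```

Note that this traditional greedy solver does not account for overlapping coverage between test cases; that is exactly the gap the extended knapsack formulation and the overlap-aware solver address.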
Evaluation Metrics
- Code coverage: percentage of requirements executed when the prioritization is run (basic block coverage used)
- Coverage preservation: proportion of the code covered by the prioritization versus the code covered by the entire original test suite
- Order-aware coverage: considers the order in which test cases execute in addition to overall code coverage
- (A small illustrative sketch of the first two metrics appears at the end of this document.)

Experiment Design
- Goals of the experiment:
  - Measure the efficiency of the algorithms and the scaling heuristic in terms of time and space overhead
  - Measure the effectiveness of the algorithms and the scaling heuristic in terms of three coverage-based metrics
- Case studies: JDepend and Gradebook
- Knapsack size: 25, 50, and 75% of the execution time of the original test suite

Summary of Experimental Results
- Prioritizer effectiveness:
  - The overlap-aware solver had the highest overall coverage for each time limit
  - The Greedy by Value solver performed well for Gradebook
  - All greedy solvers performed well for JDepend
- Prioritizer efficiency:
  - All algorithms required a small amount of time and memory except Dynamic Programming, Generalized Tabular, and Core
  - The overlap-aware solver required hours to run
  - Generalized Tabular had prohibitively large memory requirements
  - The scaling heuristic reduced overhead in some cases

Conclusions
- The most sophisticated algorithm is not necessarily the most effective or the most efficient
- Trade-off: effectiveness versus efficiency
  - If effectiveness is most important: the overlap-aware prioritizer
  - If efficiency is most important: a low-overhead prioritizer
- Prioritizer choice depends on the nature of the test suite:
  - Time versus coverage of each test case
  - Coverage overlap between test cases

Future Research
- Use larger case studies with bigger test suites
- Use case studies written in other languages
- Evaluate other knapsack solvers such as branch-and-bound and parallel solvers
- Incorporate other metrics such as APFD
- Use synthetically generated test suites

Questions? Thank you!
http://www.cs.virginia.edu/walcott/weasel.html

Case Study Applications
                        Gradebook   JDepend
Classes                     5          22
Functions                   73         305
NCSS                        591        1808
Test Cases                  28         53
Test Suite Exec. Time       7.008 s    5.468 s
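The sketch below roughly illustrates the code coverage and coverage preservation metrics from the Evaluation Metrics slide, assuming per-test coverage is available as sets of basic block identifiers. The function names, arguments, and data layout are assumptions for illustration, not the authors' tool or interface.

```python
# Illustrative sketch of two coverage-based evaluation metrics.
# Data layout (per-test sets of covered basic block ids) is an assumption.
from typing import Dict, List, Set


def code_coverage(prioritization: List[str],
                  covered: Dict[str, Set[str]],
                  all_blocks: Set[str]) -> float:
    """Percentage of all basic blocks executed when the prioritization is run."""
    executed: Set[str] = set()
    for test in prioritization:
        executed |= covered[test]
    return 100.0 * len(executed) / len(all_blocks) if all_blocks else 0.0


def coverage_preservation(prioritization: List[str],
                          original_suite: List[str],
                          covered: Dict[str, Set[str]]) -> float:
    """Proportion of the original suite's coverage retained by the prioritization."""
    original: Set[str] = set()
    for test in original_suite:
        original |= covered[test]
    kept: Set[str] = set()
    for test in prioritization:
        kept |= covered[test]
    return len(kept & original) / len(original) if original else 1.0


if __name__ == "__main__":
    # Toy data: blocks b1..b6, with T1's coverage fully overlapped by T4.
    cov = {"T1": {"b1", "b2", "b3", "b4"},
           "T3": {"b5", "b6"},
           "T4": {"b1", "b2", "b3", "b4", "b5"}}
    full = {"b1", "b2", "b3", "b4", "b5", "b6"}
    print(code_coverage(["T4", "T3"], cov, full))                        # 100.0
    print(coverage_preservation(["T4", "T3"], ["T1", "T3", "T4"], cov))  # 1.0
```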