Testing Games: Randomizing Regression Tests Using Game Theory
Nupul Kukreja, William G.J. Halfond, Milind Tambe & Manish Jain
Annual Research Review, April 30, 2014

Outline
• Motivation
• Problem(s) with traditional test scheduling
• Game theory and randomization
• Modeling software testing as a two-player game
• Evaluation
• Conclusion & future work

Motivation
(Cartoon: developers under deadline pressure. Devs: "Dude! Suite XX is not gonna run!" "The deadline is close too! Let's CODE NOW, FIX LATER.")

Motivating Problem(s)
• Existing test-case scheduling is deterministic
• Developers know which test cases will be executed, and when
• Developers can check in insufficiently tested code close to the delivery deadline
• High turnaround time for fixing bugs in low-priority features
• Random test scheduling helps, but it treats every test case as equally important

Software Testing as a Two-Player Game
• The tension between software testers and developers can be modeled as a two-player game
• We solve the game to answer the following question:
  – Given an adaptive adversary (the developers) and resource constraints (on the testers), what is the optimal test-scheduling strategy that maximizes the tester's expected payoff?

Game Theory
• The study of strategic decision making among multiple players: corporations, software agents, testers and developers, ordinary humans, etc.

Two-Player "Security" Game

                       Adversary attacks
  Defender covers      Terminal 1   Terminal 2
  Terminal 1 (60%)      5, -3        -1,  1
  Terminal 2 (40%)     -5,  5         2, -1
  (Each cell: defender payoff, adversary payoff.)

Security-game assumptions:
1. What is good for one player (a positive payoff) is bad for the other (a negative payoff)
2.
The adversary can conduct perfect surveillance of the defender's strategy and respond to it, i.e., these are sequential (Stackelberg) games, not simultaneous-move games

Testing Game

                              Developer
  Tester                      Check in ITC*   Check in PC*
  Test Requirement 1           5, -3           -1,  1
  Don't test Requirement 1    -5,  5            2, -1
  (Each cell: tester payoff, developer payoff.)

*ITC: insufficiently tested code  *PC: perfect code, i.e., 100% tested

Testing Game: Payoffs
• Payoffs are either positive or negative
• Payoffs are proportional to the value of the requirement, for both the tester and the developer
• Payoffs can be derived in many ways:
  – Directly from requirement priorities
  – Expert judgment and/or planning poker
  – Delphi methods
  – Directly from test-case priorities

Defining Test Requirements
• Can be black-box or white-box based
• If black-box, a test requirement may correspond to:
  – A module/component
  – A method
  – Or the requirement as a whole
• We group test cases by requirement
  – Each requirement is "covered" by one or more test cases (or suites)

Not All Developers Are the Same
• Commonly encountered personas:
  – Lazy/sloppy
  – New grad
  – Moderate/average
  – Seasoned developer
• Each persona has a probability of "screwing up", i.e., checking in insufficiently tested code
• These probabilities can be estimated from the team's composition

The Testing Game
(Figure: a Bayesian testing game played against a population of developer personas)
P(sloppy) = 3/10, P(avg) = 5/10, P(seasoned) = 2/10

Solving the Testing Game
Solving the game yields, for each test case i, the probability x_i of scheduling it.

Example Testing Game

               Req 1    Req 2    Req 3    Req 4    Req 5
  x_i          0.1398   0.1344   0.2307   0.4538   0.0414
  Tester        2, -10   7, -4    6, -1    9, -9    9, -9
  Developer    -7,  4   -1,  3   -6,  5   -3,  7  -10,  3
  (Each payoff pair: that player's payoff when the requirement's tests are scheduled / not scheduled.)

Create a test-case schedule of m test cases by sampling from the above distribution.

Evaluation
• Large simulation:
  – 1000 test requirements = 1 game
  – 1000 games randomly generated
  – Each game played/solved 1000 times
  – Payoffs range over [-10, 10]
  – Constraint: only 500 test cases can be scheduled/executed
• Compared with:
  – Deterministic test scheduling
  – Uniform random test scheduling
  – Weighted random test scheduling
    • Tester-only weights
    • Tester + developer based weights
Results
(Figure: expected tester payoffs for each scheduling strategy)

Limitations and Threats to Validity
• Developers are not truly adversarial
• Developers may choose to be sloppy only some of the time, with a particular probability
• We lack perfect historical observations of developer behavior
• Expected payoff is largely a mathematical notion

Conclusion & Future Work
• A new approach to test-case scheduling using game theory
  – Accounts for both the tester's and the developer's payoffs
• Randomizing test cases deters developers from checking in insufficiently tested code
• The test-case distribution is optimal under resource constraints and maximizes the tester's payoff against worst-case developer behavior: it is robust
• The simulation shows positive results and is a first step toward analyzing the tester/developer relationship

Thank you! Questions?
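Backup: the two core mechanics of the talk, committing to a mixed strategy that is optimal against worst-case developer behavior and then sampling a concrete schedule from the resulting distribution, can be sketched in a few lines of Python. This is a minimal illustrative sketch, not the solver used in the work: the grid search and the function names (`worst_case_value`, `sample_schedule`) are ours; the payoffs come from the single-requirement testing game and the marginals from the five-requirement example game.

```python
import random

# Tester payoffs from the single-requirement testing game:
# rows = tester's (Test, Don't test),
# cols = developer's (Check in ITC, Check in PC).
TESTER = [[5, -1],
          [-5, 2]]

def worst_case_value(p):
    """Tester's expected payoff when testing with probability p,
    against a developer who responds so as to minimize it."""
    return min(p * TESTER[0][j] + (1 - p) * TESTER[1][j] for j in range(2))

# Maximize the worst case over a grid of mixing probabilities.
best_p = max((i / 1000 for i in range(1001)), key=worst_case_value)

# Marginal scheduling probabilities from the five-requirement example game.
PROBS = [0.1398, 0.1344, 0.2307, 0.4538, 0.0414]

def sample_schedule(m, probs, seed=None):
    """Draw a schedule of m test executions by sampling requirement
    indices from the marginal distribution (with replacement)."""
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=m)

schedule = sample_schedule(500, PROBS, seed=1)
```

Always testing guarantees the tester only -1 (the developer simply checks in perfect code), while mixing at p ≈ 0.54 guarantees about 0.38; this gap is the deterrence effect the talk describes. In the full game a mathematical program computes one marginal per requirement at once, and the schedule of m test cases is drawn from those marginals.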