Testing Games: Randomizing Regression Tests Using Game Theory

Nupul Kukreja, William G.J. Halfond,
Milind Tambe & Manish Jain
Annual Research Review
April 30, 2014
Outline
• Motivation
• Problem(s) with traditional test scheduling
• Game Theory and Randomization
• Modeling software testing as a 2-player game
• Evaluation
• Conclusion & Future Work
Motivation
[Cartoon: DEVS saying “Dude! Suite XX is not gonna run!”, “Let’s CODE NOW, FIX LATER” and “The deadline is close too!”]
Motivating Problem(s)
• Existing test case scheduling activities are deterministic
• Developers know which test cases will be executed, and when
• Developers can check in insufficiently tested code close to the delivery deadline
• High turnaround time for fixing bugs in low-priority features
• Random test scheduling helps, but treats each test case as equally important
Software Testing as a 2-Player Game
• This tension between software testers and developers can be modeled as a two-player game
• We solve the game to answer the following question:
– Given an adaptive adversary (developers) and resource constraints (testers), what is the optimal test-scheduling strategy, i.e., the one that maximizes the tester’s expected payoff?
Game Theory
• Study of strategic decision making among multiple players: corporations, software agents, testers and developers, regular humans, etc.
Two-player “Security” Game
Each cell lists the payoffs as: Defender, Adversary
                             Adversary: Terminal 1   Adversary: Terminal 2
Defender: Terminal 1 (60%)   5, -3                   -1, 1
Defender: Terminal 2 (40%)   -5, 5                   2, -1
Security game assumptions:
1. What is good for one player (+ve payoff) is bad for the other (-ve payoff)
2. The adversary can conduct perfect surveillance and act appropriately, i.e., these are simultaneous-move games or Stackelberg games
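To make the 60%/40% split concrete, the following Python sketch computes both players' expected payoffs when the defender commits to that mix and the adversary then picks the best target. It rests on two assumptions that are not stated on the slide: that the matrix reads (defender, adversary) = (5, -3), (-1, 1), (-5, 5), (2, -1) for the four cells, and that the adversary breaks ties in the defender's favor (the usual strong Stackelberg convention). The function and variable names are illustrative only.

```python
from fractions import Fraction

# Payoffs as (defender, adversary), indexed [defender_row][adversary_column].
# Assumed reading of the slide's matrix; rows and columns are Terminal 1, Terminal 2.
PAYOFFS = [
    [(5, -3), (-1, 1)],   # defender guards Terminal 1
    [(-5, 5), (2, -1)],   # defender guards Terminal 2
]

def expected_payoffs(defender_mix, target):
    """Expected (defender, adversary) payoff when the adversary attacks `target`."""
    defender = sum(p * PAYOFFS[row][target][0] for row, p in enumerate(defender_mix))
    adversary = sum(p * PAYOFFS[row][target][1] for row, p in enumerate(defender_mix))
    return defender, adversary

def best_response(defender_mix):
    """Adversary's best target; ties go to the target the defender prefers."""
    outcomes = [expected_payoffs(defender_mix, t) for t in range(2)]
    # Compare on adversary payoff first, then on defender payoff as tie-breaker.
    target = max(range(2), key=lambda t: (outcomes[t][1], outcomes[t][0]))
    return target, outcomes

mix = [Fraction(3, 5), Fraction(2, 5)]  # 60% Terminal 1, 40% Terminal 2
target, outcomes = best_response(mix)
for t, (d, a) in enumerate(outcomes):
    print(f"attack Terminal {t + 1}: defender {float(d):.1f}, adversary {float(a):.1f}")
print(f"adversary attacks Terminal {target + 1}")
```

Under this reading the adversary's expected payoff is the same for both terminals at 60/40, so no target is more attractive than the other. Exact fractions are used so that the tie is not lost to floating-point rounding.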
Testing Game
Requirement 1; each cell lists the payoffs as: Tester, Developer
                     Developer: Check in ITC*   Developer: Check in PC*
Tester: Test         5, -3                      -1, 1
Tester: Don’t Test   -5, 5                      2, -1
*ITC: Insufficiently tested code
*PC: Perfect code, i.e., 100% tested
Testing Game – Payoffs
• Payoffs are either positive or negative
• Proportional to the value of the requirement for both the tester and the developer
• Payoffs can be derived in many ways:
– Directly from requirement priorities
– Expert judgment and/or planning poker
– Delphi methods
– Directly from test-case priorities
Defining Test Requirements
• Could be black-box or white-box based
• If black-box, a test requirement (TR) may correspond to:
– Module/component
– Method
– OR…the requirement as a whole
• “We” group test cases by requirements
– Each requirement is ‘covered’ by one or more test cases (or suites)
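The grouping described above can be sketched as a simple mapping from requirement to test cases. The test names and requirement labels below are hypothetical, chosen only to illustrate the data structure; the deck does not say how the authors store this mapping.

```python
from collections import defaultdict

# Hypothetical test cases, each tagged with the requirement it covers.
TEST_CASES = [
    ("test_login_ok", "Req 1"),
    ("test_login_bad_password", "Req 1"),
    ("test_checkout_total", "Req 2"),
    ("test_export_csv", "Req 3"),
]

def group_by_requirement(test_cases):
    """Map each requirement to the list of test cases that cover it."""
    groups = defaultdict(list)
    for name, requirement in test_cases:
        groups[requirement].append(name)
    return dict(groups)

suites = group_by_requirement(TEST_CASES)
print(suites["Req 1"])
```

With this layout, scheduling a requirement means running every test case in its group, so the game can be played over requirements rather than over individual tests.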
Not All Developers Are The Same
• Commonly encountered personality traits
– Lazy/sloppy
– New Grad
– Moderate/Average
– Seasoned Developer
• Each persona has a probability of “screwing up”, i.e., checking in insufficiently tested code
• We can compute these probabilities by looking at the team composition
The Testing Game
P(sloppy) = 3/10
P(avg) = 5/10
P(seasoned) = 2/10
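One way to arrive at probabilities like these is to count personas in the team. The sketch below assumes a hypothetical ten-person team with three sloppy, five average and two seasoned developers, which reproduces the figures above; the team list and function name are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical 10-person team, labelled by persona.
TEAM = ["sloppy"] * 3 + ["avg"] * 5 + ["seasoned"] * 2

def persona_prior(team):
    """Probability that a randomly chosen developer has each persona."""
    counts = Counter(team)
    return {persona: Fraction(n, len(team)) for persona, n in counts.items()}

prior = persona_prior(TEAM)
for persona, p in prior.items():
    print(f"P({persona}) = {p}")
```

Fractions are printed in lowest terms, so 5/10 appears as 1/2.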
Solving the Testing Game
Probability of scheduling test case ‘i’
Example Testing Game
Payoffs are listed as: requirement tested, requirement not tested
        Scheduling probability   Tester    Developer
Req 1   0.1398                   2, -10    -7, 4
Req 2   0.1344                   7, -4     -1, 3
Req 3   0.2307                   6, -1     -6, 5
Req 4   0.4538                   9, -9     -3, 7
Req 5   0.0414                   9, -9     -10, 3
Create a test schedule of ‘m’ test cases by sampling from the above distribution
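The sampling step can be sketched in a few lines of Python using the five probabilities from this example. The sketch assumes sampling with replacement at the requirement level, which the slide does not specify; the function name and the seed are illustrative.

```python
import random

REQUIREMENTS = ["Req 1", "Req 2", "Req 3", "Req 4", "Req 5"]
PROBABILITIES = [0.1398, 0.1344, 0.2307, 0.4538, 0.0414]

def sample_schedule(m, rng=None):
    """Draw m requirements (with replacement) according to PROBABILITIES."""
    if rng is None:
        rng = random.Random()
    return rng.choices(REQUIREMENTS, weights=PROBABILITIES, k=m)

# A fixed seed makes the example repeatable; a real schedule would not fix it,
# since predictability is exactly what the randomization is meant to avoid.
schedule = sample_schedule(10, rng=random.Random(2014))
print(schedule)
```

Each drawn requirement would then be expanded into the test cases that cover it.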
Evaluation
• Large simulation:
– 1000 test requirements = 1 Game
• 1000 Games randomly generated
– Each game played/solved 1000 times over
– Payoffs range from -10 to 10
– Constraint: Can only schedule/execute 500 test cases
• Compared with:
– Deterministic test scheduling
– Uniform Random test scheduling
– Weighted Random test scheduling
• Tester-only weights
• Tester+developer based weights
Results
Limitations and Threats to Validity
• Developers are not adversarial
• Developers may choose to be sloppy at times, with a particular probability
• Lack of perfect historical observations for developers
• Expected payoff is mostly a mathematical notion
Conclusion & Future Work
• New approach for test case scheduling using game theory
– Accounts for both the tester’s and the developer’s payoffs
• Randomizing test cases deters developers from checking in insufficiently tested code
• The test case distribution is optimal under resource constraints and maximizes payoff against worst-case developer behavior: robust!
• Simulation shows positive results and is a first step toward analyzing the tester/developer relationship
Thank you!
Questions?