Randomness and testing
Analysis, Lecture 12
Claire Le Goues
February 24, 2015
(c) 2015 C. Le Goues
Learning goals
• Define random testing, describe its benefits and
drawbacks, contrast with partition testing, and
enumerate an example tool.
• Describe fuzz testing, including its technical
distinction from regular random testing, and the
defects it’s particularly well suited to find.
• Explain how randomness can help with robustness,
usability, integration, and performance testing.
• Define mutation testing and explain why it’s useful.
Switch statements and V(G)
• …a more complicated question than
you’d think.
• Short answer: # of cases.
• Long answer: counting switch statements
in terms of actual number of edges can
lead to misleadingly high complexity
numbers.
• McCabe suggested just ignoring them.
Random testing
• Testing: verb
– The more or less thorough execution of the
software with the purpose of finding bugs before
the software is released for use and to establish that
the software performs as expected.
• Random: adjective
1. proceeding, made, or occurring without definite
aim, reason, or pattern: the random selection of
numbers.
2. Statistics. of or characterizing a process of
selection in which each item of a set has an equal
probability of being chosen.
2014, Dictionary.com
In a nutshell
• Select inputs independently at random from the program’s
input domain:
– Identify the input domain of the program.
– Map random numbers to that input domain.
– Select inputs from the input domain according to some
probability distribution.
– Determine if the program achieves the appropriate outputs on
those inputs.
• Random testing can provide probabilistic guarantees about
the likely faultiness of the program.
– E.g., random testing using ~23,000 inputs without failure (N = 23,000)
establishes that the program will not fail more than one time in 10,000
(F = 10^4), with a confidence of 90% (C = 0.9).
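A minimal sketch of this procedure (the method under test, isqrt, and the property used as an oracle are illustrative assumptions, not part of the lecture):

import java.util.Random;

// Sketch of random testing: map random numbers into the input domain,
// run the program on each input, and use a checkable property as the oracle.
public class RandomTester {
    public static void main(String[] args) {
        Random rng = new Random();
        int failures = 0;
        for (int i = 0; i < 100_000; i++) {
            long n = rng.nextInt(1_000_000_000);                 // random input from the domain [0, 10^9)
            long r = isqrt(n);                                   // execute the program under test
            if (r < 0 || r * r > n || (r + 1) * (r + 1) <= n) {  // oracle: r must be the integer square root
                failures++;
                System.out.println("FAIL for n = " + n + ", got " + r);
            }
        }
        System.out.println(failures + " failures in 100,000 random tests");
    }

    // Stand-in implementation of the hypothetical system under test.
    static long isqrt(long n) {
        return (long) Math.floor(Math.sqrt((double) n));
    }
}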
Random testing
The systematic variation of values through the input space with the purpose of
identifying abnormal output patterns. When such patterns are identified, a root
cause analysis is conducted to identify the source of the problem. In this case,
the “state 3” outputs seem to be missing.
When Only Random Testing Will Do, Dick Hamlet, 2006
Why use random testing?
• It’s cheap, assuming you solve the oracle problem.
– You may need more tests to find the same number of faults, but
generating tests is very easy.
• Can calculate the reliability of an application using established
probability theory (next slide).
– Can augment release criteria. E.g., “no random failures for 3 weeks prior to a
release under a given random testing protocol.”
• Good when:
– Lack of domain knowledge makes it difficult or meaningless to partition the
input space into equivalence classes.
– When the amount of state information is important.
– When large volumes of data are necessary, such as for load testing, stress
testing, robustness testing, or reliability calculations.
• Useful complement to partition testing.
When to use Random Testing
• Lack of domain knowledge makes it difficult or
meaningless to partition input into equivalence
classes.
• The amount of state information is important.
• When large volumes of data are necessary:
– Load testing
– Stress testing
– Robustness testing
– Reliability calculations
• To complement partition testing
Mathematical reliability
• Assume program P has a constant failure rate of q.
– The probability that P will fail a given test is q; that it will succeed, 1 - q.
• On N independent tests:
– probability of universal success: (1 - q)^N
– probability of at least one failure: e = 1 - (1 - q)^N
• e is the confidence C that a failure will occur no more often than once in 1/q runs,
i.e., that the mean time to failure (MTTF) is at least 1/q.
• Solving for 1/q, the MTTF:
1/q ≥ 1 / (1 - (1 - C)^(1/N))
• The number of tests required to attain confidence C in this MTTF:
N ≥ log(1 - C) / log(1 - q)
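A quick sanity check of the ~23,000 figure from the earlier slide, plugging in the assumed values C = 0.9 and q = 10^-4 (the calculation itself is not on the original slide):

public class ReliabilityCheck {
    public static void main(String[] args) {
        double C = 0.9;    // desired confidence
        double q = 1e-4;   // failure-rate bound: one failure per 10,000 runs
        // N >= log(1 - C) / log(1 - q) failure-free tests are required.
        double N = Math.log(1 - C) / Math.log(1 - q);
        System.out.println(Math.ceil(N));   // ~23,025, i.e., roughly 23,000 tests
    }
}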
Challenges
• Selecting or sampling from the input space:
– Uniform distribution: uniform random selection
– Equispaced: unsampled gaps are the same size
– Operational profile (only makes sense at the system level)
– Proportional sampling: sample according to subdomain distribution (partition the input space)
– Adaptive sampling: take the pattern of previously identified failure-causing inputs into consideration in the sampling strategy.
• The oracle problem: a test case is an input, an expected output, and a
mechanism for determining whether the observed output is consistent with the
expected output.
• Root cause analysis: how to debug using a randomly generated input.
if((((l_421 || (safe_lshift_func_uint8_t_u_u (l_421, 0xABE574F6L))) &&
(func_77(func_38((l_424 >= l_425), g_394, g_30.f0), func_8(l_408, g_345[2],
g_7, (*g_165), l_421), (*l_400), func_8(((*g_349) != (*g_349)), (l_426 !=
(*l_400)), (safe_lshift_func_int16_t_s_u((**g_349), 0xD5C55EF8L)), 0x0B1F0B62L,
g_95), (safe_add_func_uint32_t_u_u((*g_165),l_431)))^
((safe_rshift_func_uint8_t_u_s(((*g_165)>=(**g_349)),(safe_mul_func_int8_t_s_s
((*g_165), l_421)))) <= func_77((*g_129), g_95, 1L, l_408, (*l_400))))){
  struct S0 *l_443 = &g_30;
  (*l_400) = ((safe_mod_func_int16_t_s_s((safe_add_func_int16_t_s_s(l_421,
      (**g_164))), (**g_349))) && l_425);
  l_447 ^= (safe_sub_func_int16_t_s_s (0x27AC345CL, ((**g_250) <=
      func_66(l_446, g_19, g_129, (*g_129), l_407))));
  (*l_446) = func_22(l_431, -1L, l_421, (0x1B625347L <= func_22(g_394, l_447, -1L)));
} else {
  const uint32_t l_459 = 0x9671310DL;
  l_448 = (*g_186);
  (*l_400) = (0L & (0 == (*g_348)));
  (*l_400) = func_77((*g_31), ((*g_165) && 6L), l_426, func_77((*l_441),
      (safe_lshift_func_uint16_t_u_u ((((safe_mul_func_int16_t_s_s ((**g_349),
      (*g_165))) | ((*g_165) > l_426)) < (0 != (*g_129))), (&l_431 == &l_408))),
      (l_453 == &l_407), func_77(func_38((*l_400), (safe_mod_func_uint16_t_u_u
      ((l_420 < (*g_165)), func_77((*l_441), l_456, (*l_446), (*l_448), g_345[5]))),
      g_345[4]), g_287, (func_77((*g_129), l_421, (l_424 & (**g_349)), ((*l_453) !=
      (*g_129)), 0x6D4CA97DL) == (safe_div_func_int64_t_s_s (-1L, func_77((*g_129),
      l_459, l_447, (*l_446), l_459)))), g_95, g_19), l_420), (*l_446));
}
Solutions to the oracle problem
(Figure: four oracle arrangements, each driven by an input generator over the SUT.)
• Golden standard: compare the SUT’s output with a reference implementation via a comparator (pass/fail).
• Observer: watch the SUT for normal termination, crashes, or unexpected exceptions.
• Assertions: built-in checks decide pass/fail during execution.
• Parametric oracle: compute the expected result from the input parameters and compare (pass/fail).
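A minimal sketch of the first (golden standard) arrangement, assuming a hypothetical fastSort as the SUT and java.util.Arrays.sort as the reference implementation:

import java.util.Arrays;
import java.util.Random;

public class GoldenStandardOracle {
    public static void main(String[] args) {
        Random rng = new Random(42);
        for (int trial = 0; trial < 10_000; trial++) {
            int[] input = rng.ints(rng.nextInt(100), -1000, 1000).toArray();  // input generator
            int[] expected = input.clone();
            Arrays.sort(expected);                         // golden standard
            int[] actual = fastSort(input.clone());        // system under test
            if (!Arrays.equals(expected, actual)) {        // comparator
                System.out.println("Fail on " + Arrays.toString(input));
            }
        }
    }

    // Hypothetical SUT; the real implementation under test would go here.
    static int[] fastSort(int[] a) {
        Arrays.sort(a);
        return a;
    }
}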
Parametric oracle example
A Technique for Testing Command and Control Software, M. Watkins, Boeing, 1982
Criticisms of random testing
• Oracle problem
• Corner faults might escape detection
• Prediction problems
– Uniform-distribution based predictions can be easily
wrong
– Operational profiles are difficult to obtain, and what
suits one user might not suit another, e.g., "novice"
vs. "expert" users
– Domain based analysis does not account for
program size, only for the number of test points
The random vs. partition testing
debate
Metrics
• P-measure: probability of finding at least one failing test case.
– Random: P_r = 1 − (1 − q)^N
– Partition: P_p = 1 − ∏_{i=1..k} (1 − q_i)^(n_i)
where q_i is the probability that an input drawn from partition i causes a
failure, n_i is the number of test cases drawn from that partition, and k is
the number of partitions.
• E-measure: expected number of triggered failures.
– Random: E_r = N·q
– Partition: E_p = Σ_{i=1..k} n_i·q_i
• F-measure: expected number of test cases required to trigger an
error.
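A worked comparison of the P-measure using assumed numbers (not from the slides): two disjoint partitions, with all failure-causing inputs concentrated in a small partition.

public class PMeasureExample {
    public static void main(String[] args) {
        // Assumed scenario: partition 1 is 1% of the domain and half its inputs fail;
        // partition 2 is the remaining 99% and never fails.
        double q1 = 0.5, q2 = 0.0;        // subdomain failure rates
        double w1 = 0.01, w2 = 0.99;      // subdomain sizes as fractions of the input domain
        double q = w1 * q1 + w2 * q2;     // overall failure rate = 0.005
        int N = 2;                        // total test budget

        double Pr = 1 - Math.pow(1 - q, N);      // random testing with N = 2 tests
        double Pp = 1 - (1 - q1) * (1 - q2);     // partition testing, one test per partition
        System.out.printf("P_r = %.4f, P_p = %.4f%n", Pr, Pp);   // ~0.0100 vs. 0.5000
    }
}

This is the situation Weyuker & Jeng describe later: partition testing wins decisively when some subdomain is small and contains mostly failure-causing inputs.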
Mathematical implications
• Partition testing: guaranteed to perform at least as
well as random if number of test cases is
proportional to size of subdomain.
– Ratios of subdomain sizes may be too large, test case
counts thus infeasible.
• The E-measure is the same for both strategies when the failure rates of all
partitions are equal; partition testing does better if we concentrate test
cases on the buggy partitions, and random testing does better otherwise.
• P-measure better on partitions if the partitions are
all the same size and we choose the same # of test
cases from each partition.
The discussion: Part 1
• 1984, Duran & Ntafos. “Simulation results are presented which suggest
that random testing may often be more cost effective than partition
testing schemes”
• 1991, Weyuker & Jeng. “We have shown analytically that partition testing
can be an excellent testing strategy or a poor one” … “For a partition
testing strategy to be really effective and therefore worth doing…it is
necessary that some subdomains be relatively small and contain only
failure causing inputs, or at least nearly so”
• 1994, Chen & Yu. “Partition testing is guaranteed to perform at least as
well as random testing so long as the number of test cases selected is in
proportion to the size of the subdomains”
• 1996, Reid. “As expected, an implementation of BVA was found to be most
effective, with neither EP nor random testing half as effective. The random
testing results were surprising, requiring just 8 test cases per module to
equal the effectiveness of EP, although somewhere in the region of 50,000
random test cases were required to equal the effectiveness of BVA.”
The Discussion: Part 2
• 1998, Ntafos. “Proportional partition testing has been suggested as a
preferred way to perform partition testing because it assures performance
that is at least as good as that of random testing. We showed that this goal
for partition testing is rather questionable and even counterproductive.
Partition testing strategies can be expected to perform better than
random testing; the real issue is cost-effectiveness. Random testing may
be a good complementary strategy to use especially for final testing. It
has the advantage of relatively easy reliability estimation from test
outcomes.”
• Gutjahr, 1999. Even if no especially error-prone subdomains of the input
domain can be identified in advance, partition testing can provide
substantially better results than random testing.
• 2003, Boland et al. For equal sample sizes from all the subdomains,
partition testing is superior to random testing if the average of the
subdomain failure rates is larger than the overall failure rate of the
program.
Random vs. partition summary
• Most comparisons made on the basis of the methods’ ability to detect at
least one fault.
• Comparisons refer to “partition testing” in general.
• General agreement that “fault” oriented partitions work better than other
types of partitions when judged from the fault detection perspective.
• Fault detection ability is not the only thing that counts. Bear in mind the
assumptions behind the math!
– The non-overlapping subdomains assumption used in most studies ignores the
reality of common code across them.
– Most comparisons assume the same number of test cases in both instances.
– Most also assume the existence of an oracle.
• Consider the cost effectiveness of a technique for your particular
domain/testing problem. Do you have an automated oracle? What is your
computational/human budget?
Randoop
• Feedback-directed random testing
– Inputs
• Location of an assembly,
• A time limit after which test generation stops,
• An optional set of configuration files specifying what should be tested or
avoided (e.g., so the tool does not keep rediscovering the same error).
– Generates unit tests (sequences) and checks for assertion violations, access
violations, and unexpected program termination
– Before outputting an error-revealing sequence, Randoop attempts to minimize
it by iteratively omitting method calls that can be removed from the method
sequence while preserving its error-revealing behavior
AgitarOne
M. Boshernitsan, R. Doong, A. Savoia, From Daikon to Agitator: Lessons and Challenges in Building a Commercial Tool for Developer Testing, 2006
Fuzz testing
• Fuzz testing is a negative software testing
method that feeds malformed and unexpected
input data to a program, device, or system with
the purpose of finding security-related defects,
or any critical flaws leading to denial of service,
degradation of service, or other undesired
behavior (A. Takanen et al, Fuzzing for Software
Security Testing and Quality Assurance, 2008)
• Programs and frameworks that are used to
create fuzz tests or perform fuzz testing are
commonly called fuzzers.
Fuzzing process
Fuzzing approaches
• Generic: crude, random corruption of valid data without any regard to the
data format.
• Pattern-based: modify random data to conform to particular patterns. For
example, byte values alternating between a value in the ASCII range and
zero to “look like” Unicode.
• Intelligent: uses semi-valid data (that may pass a parser/sanity checker’s
initial line of defense); requires understanding the underlying data format.
For example, fuzzing the compression ratio for image formats, or fuzz PDF
header or cross-reference table values.
• Large volume: fuzz tests at large scale. The Microsoft Security
Development Lifecycle methodology recommends a minimum of 100,000
fuzzed data files.
• Exploit variant: vary a known exploitative input to take advantage of the
same attack vector with a different input; good for evaluating the quality
of a security patch.
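A minimal sketch of the “generic” approach; the seed file name, target command, and iteration count are placeholder assumptions:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

// Generic fuzzing: randomly corrupt a few bytes of a valid seed file, feed each
// variant to the target program, and use crashes (nonzero exit) as the oracle.
public class GenericFuzzer {
    public static void main(String[] args) throws Exception {
        byte[] seed = Files.readAllBytes(Paths.get("seed.pdf"));   // valid input to start from
        Random rng = new Random();
        for (int trial = 0; trial < 1_000; trial++) {
            byte[] fuzzed = seed.clone();
            int corruptions = 1 + rng.nextInt(16);                 // corrupt a handful of bytes
            for (int i = 0; i < corruptions; i++) {
                fuzzed[rng.nextInt(fuzzed.length)] = (byte) rng.nextInt(256);
            }
            Path input = Files.write(Paths.get("fuzzed.pdf"), fuzzed);
            Process p = new ProcessBuilder("./target-viewer", input.toString()).start();
            if (p.waitFor() != 0) {                                // crash or abnormal exit
                Files.copy(input, Paths.get("crash-" + trial + ".pdf"));
            }
        }
    }
}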
Some studies
• An Empirical Study of the Reliability of UNIX Utilities, B. Miller et al., 1990
– We have been able to crash 25-33% of the utility programs on any version of UNIX that was
tested
• An Empirical Study of the Robustness of Windows NT Applications Using Random
Testing, J. Forrester & B. Miller, 2000
– When subjected to random valid input that could be produced by using the mouse and
keyboard, we crashed 21% of applications tested (including Microsoft Office 97 and 2000,
Adobe Acrobat Reader, Eudora, Netscape 4.7, Visual C++ 6.0, Internet Explorer (IE) 4.0 and
5.0), and hung an additional 24% of applications. When subjected to raw random Win32
messages, we crashed or hung all the applications that we tested
• An Empirical Study of the Robustness of MacOS Applications Using Random Testing,
B. Miller et al., 2006
– Our testing crashed only 7% of the command-line utilities, a considerably lower rate of failure
than observed in almost all cases of previous studies. We found the GUI-based applications to
be less reliable: of the thirty that we tested, only eight did not crash or hang. Twenty others
crashed, and two hung. These GUI results were noticeably worse than either of the previous
Windows (Win32) or UNIX (X-Windows) studies
Types of faults found
• Pointer/array errors
• Not checking return codes
• Invalid/out-of-boundary data
• Data corruption
• Signed characters
• Race conditions
• Undocumented features
Fuzzers
• AxMan—A web-based ActiveX fuzzing engine
• Blackops SMTP Fuzzing Tool—Supports a variety of different SMTP commands and Transport Layer Security (TLS)
• BlueTooth Stack Smasher (BSS)—L2CAP layer fuzzer, distributed under the GPL license
• COMRaider—A tool designed to fuzz COM Object Interfaces
• Dfuz—A generic fuzzer
• FileFuzz—A graphical, Windows-based file format fuzzing tool. FileFuzz was designed to automate the creation of abnormal file formats and the execution of applications handling these files. FileFuzz also has built-in debugging capabilities to detect exceptions resulting from the fuzzed file formats
• Fuzz—The original fuzzer developed by Dr. Barton Miller at my alma mater, the University of Wisconsin-Madison, in 1990. Go badgers!
• fuzzball2—TCP/IP fuzzer
• radius fuzzer—C-based RADIUS fuzzer written by Thomas Biege
• ip6sic—Protocol stressor for IPv6
• Mangle—A fuzzer for generating odd HTML tags; it will also auto-launch a browser
• PROTOS Project—Software to fuzz Wireless Application Protocol (WAP), HTTP, Lightweight Directory Access Protocol (LDAP), Simple Network Management Protocol (SNMP), Session Initiation Protocol (SIP), and Internet Security Association and Key Management Protocol (ISAKMP)
• Scratch—A protocol fuzzer
• SMUDGE—A fault injector for many different types of protocols, written in the Python language
• SPIKE—Network protocol fuzzer
• SPIKEFile—Another file format fuzzer, for attacking ELF (Linux) binaries, from iDefense. Based off of SPIKE, listed above
• SPIKE Proxy—Web application fuzzer
• Tag Brute Forcer—Awesome fuzzer from Drew Copley at eEye for attacking all of those custom ActiveX applications. Used to find a bunch of nasty IE bugs, including some really hard-to-reach heap overflows
• beSTORM—Performs a comprehensive analysis, exposing security holes in your products during development and after release
• Hydra—Takes network fuzzing and protocol testing to the next level by corrupting traffic intercepted “on the wire,” transparent to both the client and server under test
M. Warnock, Look out! It’s the fuzz!, 2007
VARIATIONS ON A (RANDOM)
THEME…
Chaos monkey/Simian army
• A Netflix infrastructure testing system.
• “Malicious” programs randomly trample on
components, network, datacenters, AWS instances…
– Chaos monkey was the first – disables production
instances at random.
– Other monkeys include Latency Monkey, Doctor
Monkey, Conformity Monkey, etc… Fuzz testing at the
infrastructure level.
– Force failure of components to make sure that the
system architecture is resilient to unplanned/random
outages.
• Netflix has open-sourced their chaos monkey code.
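A toy sketch of the idea (instance names and the hourly cadence are invented; this is not Netflix’s actual implementation):

import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Periodically pick one production instance at random and disable it,
// then rely on monitoring to confirm that the system degrades gracefully.
public class ChaosMonkeySketch {
    public static void main(String[] args) throws InterruptedException {
        List<String> fleet = Arrays.asList("web-1", "web-2", "cache-1", "recommender-3");
        Random rng = new Random();
        for (int round = 0; round < 24; round++) {
            String victim = fleet.get(rng.nextInt(fleet.size()));
            System.out.println("Disabling instance " + victim);   // in production: terminate the VM/container
            Thread.sleep(60 * 60 * 1000L);                        // once an hour, for example
        }
    }
}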
Usability: A/B testing
• Controlled randomized experiment with two
variants, A and B, which are the control and
treatment.
• One group of users given A (current system);
another random group presented with B;
outcomes compared.
• Often used in web or GUI-based applications,
especially to test advertising or GUI element
placement or design decisions.
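A small sketch of the random assignment and comparison; the 1% treatment share matches the email example that follows, and the response rates are simulated assumptions:

import java.util.Random;

// Randomly assign users to control (A) or treatment (B), then compare response rates.
public class ABTestSketch {
    public static void main(String[] args) {
        Random rng = new Random();
        int[] assigned = new int[2];
        int[] responded = new int[2];
        for (int user = 0; user < 100_000; user++) {
            int variant = rng.nextDouble() < 0.01 ? 1 : 0;   // 1% of users see variant B
            assigned[variant]++;
            if (respond(variant, rng)) responded[variant]++;
        }
        System.out.printf("A: %.2f%%  B: %.2f%%%n",
                100.0 * responded[0] / assigned[0],
                100.0 * responded[1] / assigned[1]);
    }

    // Simulated stand-in for the real outcome being measured (e.g., clicking the ad).
    static boolean respond(int variant, Random rng) {
        return rng.nextDouble() < (variant == 0 ? 0.020 : 0.024);
    }
}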
Example
• A company sends an advertising email to
its customer database, varying the
photograph used in the ad...
Example: group A (99% of users)
• Act now!
Sale ends
soon!
Example: group B (1%)
• Act now!
Sale ends
soon!
Example
• A company sends an advertising email to
its customer database, varying the
photograph used in the ad...
• If more customers in the cat group than
the dog group respond to the
advertisement, this indicates a possibly
fruitful marketing direction.
Integration: object protocols
• Covers the space of possible API calls, or program “conceptual
states.”
• Develop test cases that involve representative sequences of operations on
objects
– Example: Dictionary structure: Create, AddEntry*, Lookup,
ModifyEntry*, DeleteEntry, Lookup, Destroy
– Example: IO Stream: Open, Read, Read, Close, Read, Open,
Write, Read, Close, Close
– Test concurrent access from multiple threads
• Example: FIFO queue for events, logging, etc.:
Create, Put, Put, Put, Get, Get, Get, Get, Put, Put, Get
• Approach
– Develop representative sequences – based on use cases, scenarios, profiles
– Randomly generate call sequences
• Also useful for protocol interactions within distributed designs.
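A minimal sketch of randomly generated call sequences against the FIFO example, using java.util.ArrayDeque purely as a stand-in for the class under test:

import java.util.ArrayDeque;
import java.util.Random;

// Generate random Put/Get sequences and check FIFO order with a shadow counter.
public class QueueProtocolTest {
    public static void main(String[] args) {
        Random rng = new Random(7);
        for (int seq = 0; seq < 1_000; seq++) {
            ArrayDeque<Integer> queue = new ArrayDeque<>();      // Create
            int nextPut = 0, nextGet = 0;
            for (int op = 0; op < 50; op++) {
                if (rng.nextBoolean()) {
                    queue.addLast(nextPut++);                    // Put
                } else if (!queue.isEmpty()) {
                    int got = queue.removeFirst();               // Get
                    if (got != nextGet++) {
                        throw new AssertionError("FIFO order violated in sequence " + seq);
                    }
                }
            }
        }
        System.out.println("all random call sequences preserved FIFO order");
    }
}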
Stress testing
• Robustness testing technique: test beyond
the limits of normal operation.
• Can apply at any level of system granularity.
• Stress tests commonly put a greater
emphasis on robustness, availability, and
error handling under a heavy load, than on
what would be considered “correct”
behavior under normal circumstances.
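A minimal sketch of the idea, with handleRequest standing in for the operation under test (an assumption, not from the slides):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Stress testing: drive the operation far beyond normal load and count errors;
// the emphasis is on robustness and error handling rather than exact outputs.
public class StressTest {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(200);  // many more clients than normal
        AtomicInteger errors = new AtomicInteger();
        for (int i = 0; i < 1_000_000; i++) {
            pool.submit(() -> {
                try {
                    handleRequest();
                } catch (Exception e) {
                    errors.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
        System.out.println("errors under load: " + errors.get());
    }

    // Placeholder for the system under test.
    static void handleRequest() { /* ... */ }
}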
Soak testing
• Problem: A system may behave exactly as
expected under artificially limited execution
conditions.
– E.g., Memory leaks may take longer to lead to
failure (also motivates static/dynamic analysis, but
we’ll talk about that later).
• Soak testing: testing a system with a significant
load over a significant period of time (positive).
• Used to check the reaction of the system under test in a
(possibly simulated) production-like environment, for a
given duration and a given load threshold.
Mutation testing
• Technique to evaluate the quality of a
(typically black box) test suite.
• Creates many random mutants of a program
and then runs the test cases on them.
• If any of the test cases now fail, the test suite
has “killed” the mutant.
• Test suite quality is measured by the
number/percentage of killed mutants.
Solid boxes are automatic; dashed are manual, though research
continues on identifying equivalent mutants automatically and
automatically generating test cases that kill previously unkilled
mutants.
A practical system for mutation testing: help for the common programmer. J. Offutt, 1994
Example mutation operators
• Delete a statement
• Change operators (example: && to ||)
• Replace expressions with true or
false
• Replace a variable with another variable.
• …There is considerable research into
more/better/more powerful mutation
types.
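A tiny illustration of the “&& to ||” operator, using an invented method and test input:

// One mutation operator applied by hand, plus a test input that kills the mutant.
public class MutationExample {
    // Original code under test.
    static boolean canVote(int age, boolean registered) {
        return age >= 18 && registered;
    }

    // Mutant: && replaced by ||.
    static boolean canVoteMutant(int age, boolean registered) {
        return age >= 18 || registered;
    }

    public static void main(String[] args) {
        // Test case: an unregistered 20-year-old must not be allowed to vote (expected: false).
        System.out.println("original passes: " + (canVote(20, false) == false));       // true
        System.out.println("mutant killed:   " + (canVoteMutant(20, false) != false)); // true: outputs differ
    }
}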
Example tools
• Java source code:
– Jester
– muJava
– Bacterio
– Judy
• Java byte code:
– Javalanche
– Jumble
– PIT
• Ruby:
– Mutant
– Heckle
• .NET and C#:
– NinjaTurtles
– Nester
• PHP: Mutagenesis
Duran & Ntafos, 1984. “Simulation results are presented which
suggest that random testing may often be more cost effective
than partition testing schemes”
(Simulation results over 50 trials.)
Weyuker & Jeng, 1991. “We have shown analytically that
partition testing can be an excellent testing strategy or a poor
one”
• P_p can be better, worse, or the same as P_r, depending on how the
partitioning is performed
Weyuker & Jeng, 1991. “For a partition testing strategy to be
really effective and therefore worth doing, it clearly has to
perform substantially better than in these examples. It is
therefore necessary that some subdomains be relatively small
and contain only failure causing inputs, or at least nearly so”
• Partition testing is most effective when one or more partitions contains
only inputs that produce incorrect outputs.
• Its fault-detection probability is minimized when a dominant partition
contains all the failure-causing inputs and is assigned just one test case.
Proportional testing
• Chen & Yu, 1994. “We also find that partition testing is guaranteed to
perform at least as well as random testing so long as the number of test
cases selected is in proportion to the size of the subdomains”
“There is, however, a practical consideration that affects the applicability of
Proposition 3. The exact ratios of the subdomain sizes may not be reducible to the
ratios of small integers, thereby rendering the total number of test cases too large”
• Ntafos, 1998. “Proportional partition testing has been suggested as a
preferred way to perform partition testing because it assures performance
that is at least as good as that of random testing. We showed that this
goal for partition testing is rather questionable and even
counterproductive. Partition testing strategies can be expected to perform
better than random testing; the real issue is cost-effectiveness. Random
testing may be a good complementary strategy to use especially for final
testing. It has the advantage of relatively easy reliability estimation from
test outcomes”
• Gutjahr, 1999
1996, S. Reid. Empirical Comparison of Random,
Equivalence Classes and Boundary Value Analysis
                          Number of test   Probability of   Faults
                          cases required   detection        detected
Equivalence Classes              8              .33             6
Random                           8              .32             5
Boundary Value Analysis         25              .73            12
Random                          25              .37             6

An Empirical Analysis of Equivalence Partitioning, Boundary Value Analysis and Random Testing, Stuart C. Reid, 1997
Chen & Yu, 1996. Overlapping subdomains & the use of the P-measure to evaluate test effectiveness
• Another merit of the E-measure is that it can distinguish the
capability of detecting more than one failure, while the P-measure
regards a testing strategy as good as another so long
as both can detect at least one failure.
• For the overlapping case, we notice that the crucial factor in
the relative performance of subdomain testing to random
testing is the aggregate of the differences between the subdomain
failure rate and the overall failure rate, weighted by the
number of test cases selected from that subdomain. Thus,
unlike the disjoint case, it is possible that all subdomain failure
rates are higher than the overall rate, in which case subdomain
testing is clearly better than random testing.
Gutjahr, 1999. The influence of uncertainty
• This paper compares partition testing and random testing on the assumption that program failure rates
are not known with certainty and should, therefore, be modeled by random variables.
• It is shown that under uncertainty, partition testing compares more favorably to random testing than
suggested by prior investigations concerning the deterministic case: the restriction to failure rates that are
known with certainty systematically favors random testing.
• The case above is a boundary case (the worst case for partition testing), and the fault detection probability
of partition testing can be up to k times higher than that of random testing, where k is the number of
subdomains.
• Finally, let us briefly summarize consequences of our results for the work of a practicing test engineer:
– In spite of (erroneous) conclusions that might possibly be drawn from previous investigations, partition-based
testing techniques are well-founded. Even if no especially error-prone subdomains of the input domain
can be identified in advance, partition testing can provide substantially better results than random testing.
– Because of the close relations between partition testing and other subdomain-based testing methods
(branch testing, all-uses, mutation testing, etc.), the superiority of the last-mentioned methods over
random testing can also be justified. The wide-spread practice of spending effort on satisfying diverse coverage
criteria instead of simply choosing random test cases is not a superstitious custom; it is a procedure the
merits of which can be understood by sufficiently subtle, but formally precise, models.
– The effort for satisfying partition-based coverage criteria is particularly well spent whenever the partition
leads to subdomains of largely varying sizes, each of which is processed by the program or system in a rather
homogeneous way (i.e., the processing steps are similar for all inputs of a given subdomain). On the contrary,
the advantages of partition testing are only marginal in the case of subdomains of comparable sizes and
heterogeneous treatment by the program. In any case, the partition should not be arbitrarily chosen, but
carefully derived from the structure or function of the program.
Boland et al., 2003. Comparing partition and random testing via majorization and Schur functions
• We establish a general result that states that, for equal sample sizes from all the subdomains, partition
testing is superior to random testing if the average of the subdomain failure rates is larger than the overall
failure rate of the program. This general result helps in identifying many situations where partition testing
will be more effective than random testing, giving strength to the partition testing approach.
• This generalizes the result established by Gutjahr, which states that, for samples of size one from each
subdomain, partition testing is better than or the same as random testing if all subdomain failure rates have the
same expected value.
• The most important results of our analysis are the following:
– For equal sample sizes from all the subdomains, partition testing outperforms random testing if the
average of the subdomain failure rates is larger than the overall failure rate of the program. Throughout
this paper, the (sub)domain failure rate is defined as the ratio of the number of failure-causing inputs in
the (sub)domain to the size of the (sub)domain.
– For equal sample sizes from all the subdomains, partition testing is superior if the subdomain failure rates
are inversely proportional to subdomain size.
– For unequal sample sizes, if …, then partition testing is superior to random testing if the average of the
subdomain failure rates is larger than the overall failure rate of the program.
– In cases when the numbers of failure-causing inputs in subdomains are assumed to be random variables (as
in Gutjahr) and samples of size one from each subdomain are taken, partition testing is better than or the same
as random testing if the average of the expected subdomain failure rates is larger than the expected failure
rate of the program.