Lecture 2: Design for Testability + The Idea of Coverage

Basic Definitions: Testing
 What is software testing?
• Running a program
• In order to find faults
  • a.k.a. defects
  • a.k.a. errors
  • a.k.a. flaws
  • a.k.a. faults
  • a.k.a. BUGS
 Hrm... that’s a lot of “a.k.a.”s
• Let’s refine this terminology a bit
Faults, Errors, and Failures
 Fault: a static flaw in a program
• What we usually think of as “a bug”
 Error: a bad program state that results
from a fault
• Not every fault always produces an error
 Failure: an observable incorrect behavior
of a program as a result of an error
• Not every error becomes a visible failure
To Expose a Fault with a Test
 Reachability: the test must actually reach and execute the location of the fault
 Infection: the fault must actually corrupt the program state (produce an error)
 Propagation: the error must persist and cause an incorrect output (a failure)
An Example
Find the fault:
int findLast (int a[], int n, int x) {
    // Returns index of last element in a
    // equal to x, or -1 if no such.
    // n is length of a
    int i;
    for (i = n-1; i > 0; i--) {
        if (a[i] == x)
            return i;
    }
    return -1;
}
An Example
int findLast (int a[], int n, int x) {
    // Returns index of last element in a
    // equal to x, or -1 if no such.
    // n is length of a
    int i;
    for (i = n-1; i > 0; i--) {
        if (a[i] == x)
            return i;
    }
    return -1;
}
Here’s a test case:
a = {}
n = 0
x = 2
Does not even reach the fault
An Example
int findLast (int a[], int n, int x) {
    // Returns index of last element in a
    // equal to x, or -1 if no such.
    // n is length of a
    int i;
    for (i = n-1; i > 0; i--) {
        if (a[i] == x)
            return i;
    }
    return -1;
}
Here’s another:
a = {3, 9, 4}
n = 3
x = 2
Reaches the fault
Infects state with error (the loop exits without examining a[0])
But no failure (2 is not in a, so -1 is still the correct answer)
An Example
int findLast (int a[], int n, int x) {
    // Returns index of last element in a
    // equal to x, or -1 if no such.
    // n is length of a
    int i;
    for (i = n-1; i > 0; i--) {
        if (a[i] == x)
            return i;
    }
    return -1;
}
And finally:
a = {2, 9, 4}
n = 3
x = 2
Reaches the fault
Infects state with error
And fails: returns -1 instead of 0
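The three test cases above can be replayed directly against the faulty routine. The following is a minimal sketch, not part of the original slides; the wrapper names (test_no_reach and so on) are hypothetical, and the empty array is modeled as a one-element buffer passed with n = 0 because C has no zero-length array literals.

```c
#include <assert.h>

/* Faulty version from the slides: the loop guard should be i >= 0. */
int findLast(int a[], int n, int x) {
    int i;
    for (i = n - 1; i > 0; i--) {
        if (a[i] == x)
            return i;
    }
    return -1;
}

/* Test 1: empty input. The loop body never runs. */
int test_no_reach(void) {
    int a[1] = {0};            /* stands in for the empty array; n = 0 */
    return findLast(a, 0, 2);
}

/* Test 2: a[0] is skipped (an error), but 2 is absent, so -1 is still right. */
int test_no_propagation(void) {
    int a[] = {3, 9, 4};
    return findLast(a, 3, 2);
}

/* Test 3: the only match is at index 0, so skipping a[0] becomes a failure. */
int test_failure(void) {
    int a[] = {2, 9, 4};
    return findLast(a, 3, 2);  /* the specification requires 0 here */
}

/* The first two tests pass even though the fault is present. */
void check_passing_tests(void) {
    assert(test_no_reach() == -1);
    assert(test_no_propagation() == -1);
}
```

Only the third test distinguishes the faulty routine from a correct one: asserting test_failure() == 0 would abort until the guard is changed to i >= 0.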
Controllability and Observability
 Goals for a test case:
• Reach a fault
• Produce an error
• Make the error visible as a failure
 In order to make this easy the program must be controllable and observable
• Controllability:
  • How easy it is to drive the program where we want to go
• Observability:
  • How easy it is to tell what the program is doing
Design for Testability
 If a program is not designed to be
controllable and observable, it generally
won’t be
 We have to start preparing for testing
before we write any code
• Testing as an after-the-fact, ad hoc exercise is often limited by earlier design choices
Test-Driven Development
 One way to design for testability is to write the
test cases before the code
• Idea arising from Extreme Programming and agile development
  • Write automated test cases first
  • Then write the code to satisfy tests
  • Helps focus attention on making software well-specified
  • Forces observability and controllability: you have to be able to handle the test cases you’ve already written (before deciding they were impractical)
  • Reduces temptation to tailor tests to idiosyncratic behaviors of implementation
Controllability: Simulation and Stubbing
 A key to controllable code is effective
simulation and stubbing
• Simulation of low-level hardware devices through a clean driver interface
  • Real hardware may be slow
  • May be impossible/expensive to induce some hardware failure modes on real hardware
  • Real hardware may be a limited resource
• Stubbing for other routines and code
  • Other code/modules may not be complete
  • May be slow and irrelevant to test
  • May need to simulate failure of other modules
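One way to picture a "clean driver interface" is a struct of function pointers that the code under test calls, with a simulated device behind it. The sketch below is illustrative only: the device type, the sim_* functions, the fail_after fault-injection knob, and the tiny page geometry are all assumptions made for the example, not an actual driver API.

```c
#include <assert.h>
#include <string.h>

#define SIM_PAGES 4
#define SIM_PAGE_SIZE 8

/* Driver interface: code under test only sees these function pointers. */
typedef struct device {
    int (*write_page)(struct device *dev, int page, const unsigned char *data);
    int (*read_page)(struct device *dev, int page, unsigned char *out);
    unsigned char store[SIM_PAGES][SIM_PAGE_SIZE]; /* simulated medium */
    int fail_after;  /* writes allowed before injected failure; -1 = never fail */
    int writes;      /* successful writes so far */
} device;

static int sim_write(device *dev, int page, const unsigned char *data) {
    if (page < 0 || page >= SIM_PAGES) return -1;
    if (dev->fail_after >= 0 && dev->writes >= dev->fail_after) return -1;
    memcpy(dev->store[page], data, SIM_PAGE_SIZE);
    dev->writes++;
    return 0;
}

static int sim_read(device *dev, int page, unsigned char *out) {
    if (page < 0 || page >= SIM_PAGES) return -1;
    memcpy(out, dev->store[page], SIM_PAGE_SIZE);
    return 0;
}

/* Set up a simulated device; fail_after controls fault injection. */
void sim_init(device *dev, int fail_after) {
    assert(dev != NULL);
    memset(dev, 0, sizeof *dev);
    dev->write_page = sim_write;
    dev->read_page = sim_read;
    dev->fail_after = fail_after;
}

/* Code under test: stores a record on two pages and reports any failure. */
int save_twice(device *dev, const unsigned char *data) {
    if (dev->write_page(dev, 0, data) != 0) return -1;
    if (dev->write_page(dev, 1, data) != 0) return -1;
    return 0;
}
```

Calling sim_init(&d, 1) makes the second write fail, which exercises the error path of save_twice without any real hardware fault.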
Simulation and Stubbing: JPL Example
 When testing JPL flash storage modules we
rely on software simulation of flash devices
• Real flash devices are slow
  • Can’t do aggressive random testing
• Real flash devices are expensive
  • JPL only has a few boards, so there is constant competition to test on these
  • Running hundreds of thousands of tests will wear the flash hardware out
• Enables us to introduce rare hardware failures
  • System resets, spontaneous bad blocks and write failures, etc.
Controllability: Downwards Scalability
 Another important aspect of controllability is
to make code “downwards scalable”
• Many faults cause an error only in a corner case due to a resource limit
• An effective strategy for finding errors is to reduce the resource limits
  • Test a version of the program with very tight bounds
  • Finding corner cases is easier if the corners are close together
• Too many programs hard-code resource limits or make assumptions about resources unconnected to defined limits
  • E.g., not checking the result of malloc
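A small sketch of what "downwards scalable" code can look like: the resource limit is a parameter rather than a hard-coded constant, and the result of malloc is checked. The buffer type and function names here are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bounded buffer whose limit is a parameter, not a constant. */
typedef struct {
    int *items;
    int capacity;
    int count;
} buffer;

/* Returns 0 on success, -1 if allocation fails (malloc result is checked). */
int buffer_init(buffer *b, int capacity) {
    assert(capacity > 0);
    b->items = malloc((size_t)capacity * sizeof *b->items);
    if (b->items == NULL)
        return -1;
    b->capacity = capacity;
    b->count = 0;
    return 0;
}

/* Returns 0 on success, -1 when the buffer is full. */
int buffer_push(buffer *b, int v) {
    if (b->count == b->capacity)
        return -1;
    b->items[b->count++] = v;
    return 0;
}

void buffer_free(buffer *b) {
    free(b->items);
    b->items = NULL;
    b->capacity = 0;
    b->count = 0;
}
```

A test can create the buffer with a capacity of 2, so the "full" corner case is reached on the third push instead of after thousands of operations.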
Downwards Scalability: JPL Example
 Flight flash hardware is usually a 1-4 GB device
• E.g., 64 blocks of 32 pages of 8192 bytes
 We primarily test with much smaller “devices”
(using software simulation)
• 6 blocks of 4 pages of 64 bytes
• Forces flash file system to compact storage
more often
• Tests assumptions about how space is used on
flash
• Forces more multi-page writes and directory
entries over multiple pages
Downwards Scalability: JPL Example
 Easier to explore various combinations of
states of blocks/pages of the device
(Figure legend: used page, free page, dirty page, bad block)
Controllability
 Other important themes for controllability
• Network/file access
  • If program reads from the network or from remote files, this is hard to control
  • Again, simulation and stubbing are key
• System calls
  • Similarly, reading the time from the operating system can be hard to control
  • Simulation and stubbing: Operating System Abstraction Layer etc.
• GUI control
  • Allow scripted control of GUI elements so tests can be automated
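As one illustration of stubbing a system call, the code under test can take its notion of "now" through a function pointer instead of calling the operating system directly. This is only a sketch; clock_fn, is_expired, and the two clock factories are hypothetical names invented for the example.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical abstraction: the code asks this function pointer for the time. */
typedef time_t (*clock_fn)(void);

/* Production clock: the real operating system time. */
static time_t real_clock(void) { return time(NULL); }

/* Test stub: returns whatever value the test has chosen. */
static time_t fake_now = 0;
static time_t fake_clock(void) { return fake_now; }

/* Code under test: has the deadline passed? */
int is_expired(clock_fn now, time_t deadline) {
    assert(now != NULL);
    return now() > deadline;
}

clock_fn production_clock(void) { return real_clock; }

/* Returns a clock frozen at time t, for use in tests. */
clock_fn test_clock(time_t t) {
    fake_now = t;
    return fake_clock;
}
```

With test_clock, a test can place "now" exactly at, before, or after a deadline, which is difficult to arrange with the real clock.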
Observability: Assertions
 Assertions improve observability by making
(some) errors into failures
• Even if the effect of a fault doesn’t propagate, it may be visible if an assertion checks the state at the right time
 Assertions also improve observability by making the error, rather than failure, visible
• Know how the state was corrupted directly, not just eventual effect
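A minimal sketch of an error that does not propagate to the output but is visible to a state check. The pool type and its functions are hypothetical, and pool_release_faulty is deliberately broken for the purpose of the illustration.

```c
#include <assert.h>

typedef struct {
    int total;
    int used;
    int free_count;   /* invariant: used + free_count == total */
} pool;

/* Returns 1 if the invariant holds; read-only. */
int pool_ok(const pool *p) {
    return p->used >= 0 && p->free_count >= 0 &&
           p->used + p->free_count == p->total;
}

/* Correct operation, guarded by assertions on entry and exit. */
void pool_take(pool *p) {
    assert(pool_ok(p));
    p->used++;
    p->free_count--;
    assert(pool_ok(p));
}

/* Deliberately faulty (for illustration): forgets to update free_count. */
void pool_release_faulty(pool *p) {
    p->used--;
    /* missing: p->free_count++; */
}
```

After pool_release_faulty the used count that a caller would observe is still correct, so no failure appears; placing assert(pool_ok(p)) at the end of that function would turn the corrupted state into an immediate, well-located failure.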
Observability: Invariant Checkers
 Can extend the idea of assertions to writing
“full” invariant checkers
• Do a crawl of code’s basic data structures
• Check various invariants that would be too expensive to check at runtime
• Invariant checker can be written in an easy-to-use style: recursion, memory allocation, etc.
  • Won’t run on actual system
• But be careful! If your invariant checker has a bug and changes the system state...
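As a sketch of a "full" invariant checker, the code below crawls a binary search tree recursively and verifies the ordering invariant. The node type and function names are assumptions for the example. The checker takes const pointers, which is one simple guard against the danger noted above of a checker that modifies the state it inspects.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef struct node {
    int key;
    struct node *left, *right;
} node;

/* Recursively checks that every key in the subtree lies in [lo, hi]
   and that the search-tree ordering holds (no duplicate keys).
   Uses long long bounds so key - 1 and key + 1 cannot overflow. */
static int bst_ok_range(const node *n, long long lo, long long hi) {
    if (n == NULL)
        return 1;
    if (n->key < lo || n->key > hi)
        return 0;
    return bst_ok_range(n->left, lo, (long long)n->key - 1) &&
           bst_ok_range(n->right, (long long)n->key + 1, hi);
}

/* Entry point: returns 1 if the whole tree satisfies the invariant. */
int bst_ok(const node *root) {
    return bst_ok_range(root, INT_MIN, INT_MAX);
}
```

In a test build, a call such as assert(bst_ok(root)) can follow every insert or delete; the recursion would be too costly or simply forbidden on the flight system, but that does not matter for a checker that only runs in testing.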
Observability
 Other important themes for observability
• Logging
  • Especially critical for GUI interfaces, to mirror GUI events in ordered parseable messages
• Network/file access
  • If program writes to the network or to remote files, this is hard to observe
Controllability & Observability: Memory Allocation
 More extreme case: embedded code for mission or safety critical systems
• May be running without memory protection
• Dynamic allocation often forbidden
 Design module to accept a static block allocated elsewhere, and only access this memory
• Controllability: allows us to introduce memory faults, simulate warm reboots
• Observability: allows us to easily instrument code with low-overhead checks to find memory safety violations during testing
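A minimal sketch of a module that works only inside a caller-supplied static block. The arena type and its functions are hypothetical, and the sketch ignores alignment for brevity. arena_owns is the kind of low-overhead check a test build can use to detect accesses outside the block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical module state: all memory comes from a caller-supplied block. */
typedef struct {
    unsigned char *base;
    size_t size;
    size_t used;
} arena;

/* Attach the module to a block; the block's contents are left untouched. */
void arena_init(arena *a, void *block, size_t size) {
    assert(block != NULL && size > 0);
    a->base = block;
    a->size = size;
    a->used = 0;
}

/* Bump allocation inside the block; returns NULL when it is exhausted. */
void *arena_alloc(arena *a, size_t n) {
    void *p;
    if (n > a->size - a->used)
        return NULL;
    p = a->base + a->used;
    a->used += n;
    return p;
}

/* Test-time check: does [p, p + n) lie entirely inside the module's block? */
int arena_owns(const arena *a, const void *p, size_t n) {
    uintptr_t lo = (uintptr_t)a->base;
    uintptr_t q = (uintptr_t)p;
    return n <= a->size && q >= lo && q - lo <= a->size - n;
}
```

Because arena_init does not clear the block, a test can re-initialize the module over memory left by a previous run to approximate a warm reboot, or corrupt bytes in the block beforehand to introduce a memory fault.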
Coverage
 Literature of software testing is primarily
concerned with various notions of coverage
 Ammann and Offutt identify four basic kinds of
coverage:
• Graph coverage
• Logic coverage
• Input space partitioning
• Syntax-based coverage
Graph Coverage
 Cover all the nodes, edges, or paths of
some graph related to the program
 Examples:
• Statement coverage
• Branch coverage
• Path coverage
• Data flow (def-use) coverage
• Model-based testing coverage
• Many more – most common kind of
coverage, by far
Graph Coverage
 Most FSM testing algorithms can be seen
as graph coverage
• Consider VC – computing a spanning tree
to nodes is standard graph exploration
• Beizer: “find a graph and cover it”
Statement/Basic Block Coverage
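if (x < y)
{
    y = 0;
    x = x + 1;
}
else
{
    x = y;
}
Control flow graph: node 1 goes to node 2 (y = 0; x = x + 1) on x < y and to node 3 (x = y) on x >= y; nodes 2 and 3 both go to node 4.
Statement coverage: cover every node of these graphs.
The two statements in node 2 are treated as one node because if one statement executes the other must also execute (the code is a basic block).

if (x < y)
{
    y = 0;
    x = x + 1;
}
Control flow graph: node 1 goes to node 2 (y = 0; x = x + 1) on x < y and directly to node 3 on x >= y; node 2 goes to node 3.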
Branch Coverage
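if (x < y)
{
    y = 0;
    x = x + 1;
}
else
{
    x = y;
}
Control flow graph: node 1 goes to node 2 (y = 0; x = x + 1) on x < y and to node 3 (x = y) on x >= y; nodes 2 and 3 both go to node 4.
Branch coverage vs. statement coverage: same for if-then-else.

if (x < y)
{
    y = 0;
    x = x + 1;
}
Control flow graph: node 1 goes to node 2 (y = 0; x = x + 1) on x < y and directly to node 3 on x >= y; node 2 goes to node 3.
But consider this if-then structure. For branch coverage we can’t just cover all nodes, but must cover all edges: get to node 3 both after 2 and without executing 2!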
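The difference between covering nodes and covering edges in the if-then graph can be made concrete with hand-written instrumentation. This is a sketch only: real coverage tools insert such probes automatically, and the flag arrays, enum names, and the x + y return value (added only so the function has an observable result) are assumptions for the example.

```c
#include <assert.h>

/* One flag per node and per edge of the if-then graph:
   edges are 1 -> 2 (x < y), 2 -> 3, and 1 -> 3 (x >= y). */
enum { N1, N2, N3, NUM_NODES };
enum { E12, E23, E13, NUM_EDGES };

static int node_hit[NUM_NODES];
static int edge_hit[NUM_EDGES];

void coverage_reset(void) {
    int i;
    for (i = 0; i < NUM_NODES; i++) node_hit[i] = 0;
    for (i = 0; i < NUM_EDGES; i++) edge_hit[i] = 0;
}

/* The if-then fragment, wrapped in a function and instrumented by hand. */
int fragment(int x, int y) {
    node_hit[N1] = 1;
    if (x < y) {
        edge_hit[E12] = 1;
        node_hit[N2] = 1;
        y = 0;
        x = x + 1;
        edge_hit[E23] = 1;
    } else {
        edge_hit[E13] = 1;
    }
    node_hit[N3] = 1;
    return x + y;
}

int nodes_covered(void) {
    int i, n = 0;
    for (i = 0; i < NUM_NODES; i++) n += node_hit[i];
    return n;
}

int edges_covered(void) {
    int i, n = 0;
    for (i = 0; i < NUM_EDGES; i++) n += edge_hit[i];
    return n;
}
```

A single call such as fragment(1, 5) hits all three nodes but only two of the three edges; a second call with x >= y is needed to cover the edge from node 1 straight to node 3.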
Path Coverage
if (x < y)
{
    y = 0;
    x = x + 1;
}
else
{
    x = y;
}
if (x < y)
{
    y = 0;
    x = x + 1;
}
Control flow graph: node 1 goes to node 2 (y = 0; x = x + 1) on x < y and to node 3 (x = y) on x >= y; nodes 2 and 3 both go to node 4; node 4 goes to node 5 (y = 0; x = x + 1) on x < y and directly to node 6 on x >= y; node 5 goes to node 6.
How many paths through this code are there? Need one test case for each to get path coverage.
To get statement and branch coverage, we only need two test cases: 1 2 4 5 6 and 1 3 4 6
Path coverage needs two more: 1 2 4 6 and 1 3 4 5 6
In general: exponential in the number of conditional branches!
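The exponential growth can be seen with a small sketch. It uses two independent conditions, w and z (hypothetical inputs), so that all four paths are actually executable; in the slide's own fragment the path 1 3 4 5 6 appears to be infeasible, because after x = y the second test x < y cannot be true.

```c
#include <assert.h>

/* Two independent decisions in sequence.
   Returns a path id: bit 0 is set if the first branch was taken,
   bit 1 is set if the second branch was taken. */
int path_id(int w, int z) {
    int id = 0;
    if (w)
        id |= 1;
    if (z)
        id |= 2;
    return id;
}

/* Number of paths through k sequential, independent if statements: 2^k. */
unsigned long path_count(unsigned k) {
    assert(k < 8 * sizeof(unsigned long));
    return 1UL << k;
}
```

Two tests, (w, z) = (1, 1) and (0, 0), already give branch coverage here, while path coverage needs all four combinations; with ten such decisions it would need 1024.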
Data Flow Coverage
x = 3;          // node 1: Def(x)
y = 3;          // node 2: Def(y)
if (w) {        // node 3
    x = y + 2;  // node 4: Def(x), Use(y)
}
if (z) {        // node 5
    y = x - 2;  // node 6: Def(y), Use(x)
}
n = x + y;      // node 7: Use(x), Use(y)
Control flow graph: 1, 2, 3 in sequence; node 3 goes to node 4 on w and directly to node 5 on !w; node 4 goes to node 5; node 5 goes to node 6 on z and directly to node 7 on !z; node 6 goes to node 7.
Annotate program with locations where variables are defined and used (very basic static analysis)
Def-use pair coverage requires executing all possible pairs of nodes where a variable is first defined and then used, without any intervening re-definitions
May be many pairs, some not actually executable
E.g., this path covers the pair where x is defined at 1 and used at 7: 1 2 3 5 6 7
But this path does NOT: 1 2 3 4 5 6 7
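Def-use pairs can be observed at run time by remembering where a variable was last defined. The sketch below does this by hand for x only, using the node numbers from the slide; the pair_hit table, the last_def_x variable, and the function names are assumptions made for the illustration.

```c
#include <assert.h>

/* pair_hit[d][u] is 1 if some run used x at node u while the most
   recent definition of x was at node d. Node numbers follow the slide:
   x is defined at nodes 1 and 4 and used at nodes 6 and 7. */
static int pair_hit[8][8];

void du_reset(void) {
    int d, u;
    for (d = 0; d < 8; d++)
        for (u = 0; u < 8; u++)
            pair_hit[d][u] = 0;
}

/* The slide's program, instrumented for def-use pairs of x. */
int run(int w, int z) {
    int x, y, n;
    int last_def_x;                   /* node of the latest definition of x */

    x = 3; last_def_x = 1;            /* node 1: Def(x) */
    y = 3;                            /* node 2: Def(y) */
    if (w) {                          /* node 3 */
        x = y + 2; last_def_x = 4;    /* node 4: Def(x), Use(y) */
    }
    if (z) {                          /* node 5 */
        pair_hit[last_def_x][6] = 1;
        y = x - 2;                    /* node 6: Def(y), Use(x) */
    }
    pair_hit[last_def_x][7] = 1;
    n = x + y;                        /* node 7: Use(x), Use(y) */
    return n;
}

int du_covered(int def_node, int use_node) {
    return pair_hit[def_node][use_node];
}
```

Running with w false and z true follows the path 1 2 3 5 6 7 and covers the pair (1, 7); running with both true follows 1 2 3 4 5 6 7, where node 4 redefines x, so only the pairs starting at node 4 are covered.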
Logic Coverage
What if, instead of:
if (x < y)
{
    y = 0;
    x = x + 1;
}
we have:
if (((a > b) || G) && (x < y))
{
    y = 0;
    x = x + 1;
}
Control flow graph: node 1 goes to node 2 (y = 0; x = x + 1) when ((a > b) || G) && (x < y) holds and directly to node 3 when ((a <= b) && !G) || (x >= y) holds; node 2 goes to node 3.
Now, branch coverage will guarantee that we cover all the edges, but does not guarantee we will do so for all the different logical reasons
We want to test the logic of the guard of the if statement
Active Clause Coverage
Predicate: ( (a > b) or G ) and (x < y)

Row  (a>b)  G  (x<y)  Predicate
1    T      F  T      T
2    F      F  T      F
3    F      T  T      T
4    F      F  T      F    (duplicate of row 2)
5    T      T  T      T
6    T      T  F      F

Rows 1-2: with these values for G and (x<y), (a>b) determines the value of the predicate
Rows 3-4: with these values for (a>b) and (x<y), G determines the value of the predicate
Rows 5-6: with these values for (a>b) and G, (x<y) determines the value of the predicate
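Whether a clause "determines the value of the predicate" can be checked mechanically: hold the other clauses fixed, flip the clause, and see whether the predicate flips. The sketch below does this for the slide's predicate; the function names and the 0/1/2 clause numbering are assumptions for the example.

```c
#include <assert.h>

/* The predicate from the slide: ((a > b) or G) and (x < y),
   with each clause passed in as a 0/1 truth value. */
int predicate(int a_gt_b, int g, int x_lt_y) {
    return (a_gt_b || g) && x_lt_y;
}

/* Clause index: 0 = (a > b), 1 = G, 2 = (x < y).
   Returns 1 if flipping that clause alone, with the other two held at
   the given values, changes the value of the predicate. */
int determines(int clause, int a_gt_b, int g, int x_lt_y) {
    int v[3];
    int with_true, with_false;
    assert(clause >= 0 && clause < 3);
    v[0] = a_gt_b;
    v[1] = g;
    v[2] = x_lt_y;
    v[clause] = 1;
    with_true = predicate(v[0], v[1], v[2]);
    v[clause] = 0;
    with_false = predicate(v[0], v[1], v[2]);
    return with_true != with_false;
}
```

For instance, determines(0, 0, 0, 1) is 1, matching rows 1 and 2 of the table, while determines(0, 0, 1, 1) is 0 because G being true masks (a > b).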
Input Domain Partitioning
 Partition scheme q of domain D
 The partition q defines a set of blocks, Bq = b1, b2, …, bQ
 The partition must satisfy two properties:
1. blocks must be pairwise disjoint (no overlap):
   $b_i \cap b_j = \emptyset, \ \forall i \neq j, \ b_i, b_j \in B_q$
2. together the blocks cover the domain D (complete):
   $\bigcup_{b \in B_q} b = D$
Coverage then means using at least one input from each of b1, b2, b3, ...
Input Domain Partitioning
 Some subtleties here…
 What’s wrong with this partition of file contents?
• b1: Sorted ascending file
• b2: Sorted descending file
• b3: Neither sorted ascending nor sorted descending
Recall the two properties:
$b_i \cap b_j = \emptyset, \ \forall i \neq j, \ b_i, b_j \in B_q$
$\bigcup_{b \in B_q} b = D$
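One way to probe a proposed partition is to count how many blocks each sample input falls into; a valid partition gives exactly one for every input. The sketch below models file contents as an int array and assumes "sorted ascending" means non-decreasing (and likewise for descending); both modeling choices and all function names are assumptions for the illustration.

```c
#include <assert.h>

/* 1 if v[0..n-1] is non-decreasing. */
int is_ascending(const int *v, int n) {
    int i;
    for (i = 1; i < n; i++)
        if (v[i - 1] > v[i]) return 0;
    return 1;
}

/* 1 if v[0..n-1] is non-increasing. */
int is_descending(const int *v, int n) {
    int i;
    for (i = 1; i < n; i++)
        if (v[i - 1] < v[i]) return 0;
    return 1;
}

/* Number of blocks (b1, b2, b3) that contain the input.
   Disjointness and completeness together require exactly 1. */
int blocks_containing(const int *v, int n) {
    int b1 = is_ascending(v, n);
    int b2 = is_descending(v, n);
    int b3 = !b1 && !b2;
    return b1 + b2 + b3;
}
```

Under these assumptions a one-element input, or one whose elements are all equal, lands in both b1 and b2, so the blocks are not pairwise disjoint.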
Syntax-Based Coverage
 Based on mutation testing (a pet topic of Ammann and Offutt, who are heavily into this research area)
 A bit different kind of creature than the other coverages we’ve looked at
 Idea: generate many syntactic mutants of the original program
 Coverage: how many mutants does a test suite kill (detect)?
Mutating Our Buggy Program
int findLast (int a[], int n, int x) {
    // Returns index of last element in a
    // equal to x, or -1 if no such.
    // n is length of a
    int i;
    for (i = n-1; i > 0; i--) {
        if (a[i] == x)
            return i;
    }
    return -1;
}
Mutant #1
int findLast (int a[], int n, int x) {
    // Returns index of last element in a
    // equal to x, or -1 if no such.
    // n is length of a
    int i;
    for (i = n; i > 0; i--) {   // mutated: was i = n-1
        if (a[i] == x)
            return i;
    }
    return -1;
}
Mutant #2
int findLast (int a[], int n, int x) {
    // Returns index of last element in a
    // equal to x, or -1 if no such.
    // n is length of a
    int i;
    for (i = n-1; i > 0; i--) {
        if (a[i] == x)
            return i;
    }
    return 0;   // mutated: was return -1
}
Mutant #3
int findLast (int a[], int n, int x) {
    // Returns index of last element in a
    // equal to x, or -1 if no such.
    // n is length of a
    int i;
    for (i = n-1; i > 0; i--) {
        if (a[i] != x)   // mutated: was ==
            return i;
    }
    return -1;
}
Mutant #4
int findLast (int a[], int n, int x) {
    // Returns index of last element in a
    // equal to x, or -1 if no such.
    // n is length of a
    int i;
    for (i = n-1; i > 0; i--) {
        if (a[i] == n)   // mutated: was x
            return i;
    }
    return -1;
}
Mutant #5: Wait, this one’s the fix!
int findLast (int a[], int n, int x) {
    // Returns index of last element in a
    // equal to x, or -1 if no such.
    // n is length of a
    int i;
    for (i = n-1; i >= 0; i--) {   // mutated: was i > 0
        if (a[i] == x)
            return i;
    }
    return -1;
}
Syntax-Based Coverage
(Figure: program P surrounded by the mutants of P)
100% coverage means you kill all the mutants with your test suite
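Counting killed mutants can be sketched as follows: run every test on the original program P and on each mutant, and call a mutant killed if some test makes the two disagree. The harness below is an illustration, not the slides' tooling; the function and type names are hypothetical, the loop guard and comparison follow the off-by-one version of findLast, and Mutant #1 is left out because it reads a[n], which is out of bounds.

```c
#include <assert.h>

#define MAX_LEN 3

typedef int (*find_fn)(int a[], int n, int x);
typedef struct { int a[MAX_LEN]; int n; int x; } test_case;

/* P: the original (faulty) program. */
int find_orig(int a[], int n, int x) {
    int i;
    for (i = n - 1; i > 0; i--)
        if (a[i] == x) return i;
    return -1;
}

/* Mutant #2: returns 0 instead of -1. */
int mutant2(int a[], int n, int x) {
    int i;
    for (i = n - 1; i > 0; i--)
        if (a[i] == x) return i;
    return 0;
}

/* Mutant #3: == replaced by !=. */
int mutant3(int a[], int n, int x) {
    int i;
    for (i = n - 1; i > 0; i--)
        if (a[i] != x) return i;
    return -1;
}

/* Mutant #4: compares against n instead of x. */
int mutant4(int a[], int n, int x) {
    int i;
    (void)x;
    for (i = n - 1; i > 0; i--)
        if (a[i] == n) return i;
    return -1;
}

/* Mutant #5: loop guard i >= 0 (this one is the fix). */
int mutant5(int a[], int n, int x) {
    int i;
    for (i = n - 1; i >= 0; i--)
        if (a[i] == x) return i;
    return -1;
}

/* A mutant is killed if some test makes it disagree with P. */
int killed(find_fn p, find_fn m, const test_case *tests, int ntests) {
    int t, i;
    for (t = 0; t < ntests; t++) {
        int a1[MAX_LEN], a2[MAX_LEN];
        assert(tests[t].n <= MAX_LEN);
        for (i = 0; i < MAX_LEN; i++)
            a1[i] = a2[i] = tests[t].a[i];
        if (p(a1, tests[t].n, tests[t].x) != m(a2, tests[t].n, tests[t].x))
            return 1;
    }
    return 0;
}

int count_killed(find_fn p, find_fn *mutants, int nmutants,
                 const test_case *tests, int ntests) {
    int k, count = 0;
    for (k = 0; k < nmutants; k++)
        count += killed(p, mutants[k], tests, ntests);
    return count;
}
```

With the three test inputs used earlier in the lecture (the empty array, {3, 9, 4}, and {2, 9, 4}, all with x = 2), mutants 2, 3, and 5 are killed and mutant 4 survives, so killing it would need an additional test, for example one where some a[i] with i > 0 equals n.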
Generation vs. Recognition
 Generation of tests based on coverage
means producing a test suite to achieve a
certain level of coverage
• As you can imagine, generally very hard
• Consider: generating a suite for 100%
statement coverage easily reaches
“solving the halting problem” level
• Obviously hard for, say, mutant-killing
 Recognition means seeing what level of
coverage an existing test suite reaches
Coverage and Subsumption
 Sometimes one coverage approach subsumes another
• If you achieve 100% coverage of criterion A, you are guaranteed to satisfy B as well
• For example, consider node and edge coverage
  • (there’s a subtlety here, actually: can you spot it?)
 What does this mean?
• Unfortunately, not a great deal
• If test suite X satisfies “stronger” criterion A and test suite Y satisfies “weaker” criterion B
  • Y may still reveal bugs that X does not!
  • For example, consider our running example and statement vs. branch coverage
• It means we should take coverage with a grain of salt, for one thing
Testing “for” Coverage
 Never seek to improve coverage just for the
sake of increasing coverage
• Well, unless it’s a command from on high
 Coverage is not the goal
• Finding failures that expose faults is the goal
• No amount of coverage will prove that the program cannot fail
“Program testing can be used to show the presence of bugs, but never to show their absence!” – E. Dijkstra, Notes On Structured Programming
The Purpose of Testing
“Program testing can be used to show the
presence of bugs, but never to show their
absence!” – E. Dijkstra, Notes On
Structured Programming
 Dijkstra meant this as a criticism of testing and an
argument in favor of more disciplined and total
approaches (proving programs correct)
 But he also points out what testing is good for:
exposing errors
 Coverage is valuable if and only if test sets with
higher coverage are more likely to expose failures
The Purpose of Testing
“Program testing can be used to show the
presence of bugs”
 When we first start “testing,” we often want to
“see that the program works”
• Try out some scenarios and watch the program
“do its stuff”
• Surprised (annoyed) when (if) the program fails
• This is not really testing: testing is not the
same as a demonstration
• Aim to break (your) code, if it can be broken
Levels of Testing
 Adapted from Beizer, by Ammann and Offutt
• Level 0: Testing is debugging
• Level 1: Testing is to show the program works
• Level 2: Testing is to show the program
doesn’t work
• Level 3: Testing is not to prove anything
specific, but to reduce risk of using program
• Level 4: Testing is a mental discipline that
helps develop higher quality software
What’s So Good About Coverage?
 Consider a fault that causes failure every time the code is executed
 Don’t execute the code: cannot possibly find the fault!
int findLast (int a[], int n, int x) {
    // Returns index of last element
    // in a equal to x, or -1 if no
    // such. n is length of a
    int i;
    for (i = n-1; i >= 0; i--) {
        if (a[i] == x)
            return i;
    }
    return 0;
}
 That’s a pretty good argument for statement coverage
What’s So Good About Coverage?
 We should have an argument for any kind of coverage:
• “If I don’t cover this, then there is more chance I’ll miss a fault like that”
• Backed with empirical data, preferably!
int findLast (int a[], int n, int x) {
    // Returns index of last element
    // in a equal to x, or -1 if no
    // such. n is length of a
    int i;
    for (i = n-1; i >= 0; i--) {
        if (a[i] == x)
            return i;
    }
    return 0;
}
Return to Our Example
int findLast (int a[], int n, int x) {
    // Returns index of last element in a
    // equal to x, or -1 if no such.
    // n is length of a
    int i;
    for (i = n-1; i > 0; i--) {
        if (a[i] == x)
            return i;
    }
    return -1;
}
Let’s write a tester for this version of the program (back to the first off-by-one bug)
Forget for a moment that we know what the bug is!
Return to Our Example
int findLast (int a[], int n, int x) {
    // Returns index of last element in a
    // equal to x, or -1 if no such.
    // n is length of a
    int i;
    for (i = n-1; i > 0; i--) {
        if (a[i] == x)
            return i;
    }
    return -1;
}
What kind of coverage might we want to think about when testing this code?
Return to Our Example
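#include <assert.h>
#include <stdio.h>

#define N 5 // 5 is “big enough”?

void random_assign(int a[], int n);   // helper defined elsewhere: fills a with random values
void print_array(int a[], int n);     // helper defined elsewhere: prints the elements of a
int findLast(int a[], int n, int x);

int testFind () {
    int a[N];
    int p, i;
    for (p = 0; p < N; p++) {
        random_assign(a, N);
        a[p] = 3;
        // make sure no later element equals 3, so p is the last match
        for (i = p + 1; i < N; i++) {
            if (a[i] == 3)
                a[i] = a[i] - 1;
        }
        printf("TEST: findLast({");
        print_array(a, N);
        printf("}, %d, 3)\n", N);
        assert(findLast(a, N, 3) == p);
    }
    return 0;
}
What kind of coverage does this tester exploit?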