Verification and Validation
John Morris
Computer Science/
Electrical and Computer Engineering
16-Mar-16
A hard day’s work ensuring that some Japanese colleagues
understand why Auckland is The City of Sails!
Terms
• Validation
• Ensuring that the specification is correct
• Determine that the software to be built is actually
what the user wants!
• Verification
• Ensuring that the software runs correctly
Validation or Verification?
Validation
Building the right software
Make sure it’s what the user wants
Verification
Building the software right
Make sure it works
Accurate, complete specification
essential!
Specifications
• Functional
• Define actions and operations of system
eg
• Each transaction shall be stored in a database
• GST at the current rate shall be applied to each invoice
• Can be verified by software tests
• Apply an input data set
• Compare output state to expected state
• Expected state is defined in the specifications (see the sketch below)
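A minimal sketch of such a test (the function name, the 15% GST rate and the expected value are illustrative assumptions, not taken from any real specification):

#include <assert.h>
#include <stdio.h>

/* Hypothetical function under test: applies GST to an invoice total.
   A 15% rate is assumed here purely for illustration. */
static double apply_gst(double invoice_total)
{
    const double GST_RATE = 0.15;   /* assumed "current rate" */
    return invoice_total * (1.0 + GST_RATE);
}

int main(void)
{
    /* Apply an input data set ... */
    double actual = apply_gst(100.00);

    /* ... and compare the output state to the expected state,
       where the expected state comes from the specification */
    double expected = 115.00;
    assert(actual > expected - 0.001 && actual < expected + 0.001);

    printf("GST test passed: 100.00 -> %.2f\n", actual);
    return 0;
}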
Specifications
• Functional
• Define actions and operations of system
• Can be verified by software tests
• Non-functional
• Performance
eg
• Searches will take <2 seconds
• Messages will be compressed by 60%
• Usability
eg
• A trained monkey shall be able to run this software
• Require special tests
Testing
• Aim
• Locate and repair defects
• Axiom
Testing only reveals the presence of defects,
it never proves their absence!!
• No matter how much testing you do, you can’t be
sure that there isn’t an error waiting to bite you!
Testing
The alternative?
• Formal verification
• Uses formal logic to prove that software is correct
• Currently:
• Prohibitively expensive
• Little automated support
• Mainly manual techniques
• Error prone
• Only feasible when cost of failure is extreme
• Usually when failure leads to loss of life
• Air and space craft control
• Medical systems
• Nuclear plants
Testing - Motivation
Definitely the least glamorous part of software
development!
• Possibly the most expensive!
• If not carried out thoroughly!
• Estimates of the economic cost of software failure
produce astronomic numbers
• US: $59.5 billion in 2002
• http://www.nist.gov/public_affairs/releases/n02-10.htm
• ~10% of projects are abandoned entirely
• Including some very large ones
Famous software failures
• July 28, 1962 Mariner I space probe
• A bug in the flight software for the Mariner 1 causes the rocket to divert
from its intended path on launch. Mission control destroys the rocket over
the Atlantic Ocean. The investigation into the accident discovers that a
formula written on paper in pencil was improperly transcribed into
computer code, causing the computer to miscalculate the rocket's trajectory.
Famous software failures
• 1982 -- Soviet gas pipeline.
• Operatives working for the Central Intelligence Agency allegedly plant a
bug in a Canadian computer system purchased to control the trans-Siberian
gas pipeline. The Soviets had obtained the system as part of a wide-ranging
effort to covertly purchase or steal sensitive U.S. technology. The CIA
reportedly found out about the program and decided to make it backfire with
equipment that would pass Soviet inspection and then fail once in operation.
The resulting event is reportedly the largest non-nuclear explosion in the
planet's history.
Famous software failures
• 1985-1987 -- Therac-25 medical accelerator
• Based upon a previous design, the Therac-25 was an "improved" therapy
system that could deliver two different kinds of radiation: either a low-power electron beam or X-rays. The Therac-25's X-rays were generated by
smashing high-power electrons into a metal target positioned between the
electron gun and the patient. A second "improvement" was the replacement
of the older Therac-20's electromechanical safety interlocks with software
control, a decision made because software was perceived to be more
reliable.
• What engineers didn't know was that both the 20 and the 25 were built upon
an operating system that had been kludged together by a programmer with
no formal training. Because of a subtle bug called a "race condition," a
quick-fingered typist could accidentally configure the Therac-25 so the
electron beam would fire in high-power mode but with the metal X-ray
target out of position. At least five patients die; others are seriously injured.
Famous software failures
• June 4, 1996 -- Ariane 5 Flight 501
• Working code for the Ariane 4 rocket is reused in the Ariane 5, but the
Ariane 5's faster engines trigger a bug in an arithmetic routine inside the
rocket's flight computer. The error is in the code that converts a 64-bit
floating-point number to a 16-bit signed integer. The faster engines cause
the 64-bit numbers to be larger in the Ariane 5 than in the Ariane 4,
triggering an overflow condition that results in the flight computer crashing.
• First, Flight 501's backup computer crashes, followed 0.05 seconds later by a
crash of the primary computer. As a result of these crashed computers, the
rocket's primary processor overpowers the rocket's engines and causes the
rocket to disintegrate 40 seconds after launch.
• More stories
• http://www.wired.com/software/coolapps/news/2005/11/69355
or
• ‘Software testing failures’ in Google!
Approach
1. Coding's finished
2. Run a few tests
3. System passes
4. Release
Result: Disaster
Inadequate design or poor coding produced many time bombs in the system!
Approach
1. Coding's finished
2. Run a few tests
3. System passes
4. Release
Here’s the problem ..
• Errors are inevitable (we’re human!)
• Testing did not reveal them
• Passing a few tests was assumed to mean
that the system was error-free
See the first axiom!!
Why testing is hard
• Let’s take a trivial example
• Test the addition operation on a 32-bit machine
c = a + b
• How many tests needed?
Why testing is hard
• Trivial example
• Test the addition operation on a 32-bit machine
c = a + b
• How many tests needed?
• Naïve strategy
• But simple and easily understood!
• How many values for a? 2^32
• How many values for b? 2^32
• Total possible input combinations? 2^32 x 2^32 = 2^64
• Assume:
• One addition test/10 instructions = 3x10^8 tests/sec
Why testing is hard (2)
• Total possible input combinations? 2^32 x 2^32 = 2^64
• Assume:
• 3GHz machine
• One addition test/~10 cycles = 3x10^8 tests/sec
• Time = 2^64 / 3x10^8 ≈ 1.8x10^19 / 3x10^8 ≈ 6x10^10 sec
≈ 2000 years!! (the brute-force loop is sketched below)
• Clearly need a smarter technique!!
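For concreteness, the naive exhaustive strategy is just a pair of nested loops over every possible operand (check_add is a stand-in oracle invented for this sketch); the loop body would have to run 2^64 times, which is exactly why the approach is hopeless here:

#include <stdint.h>

/* Stand-in oracle (illustrative only): does the machine's 32-bit addition
   of a and b give the mathematically expected wrapped result? */
static int check_add(uint32_t a, uint32_t b)
{
    uint64_t expected = ((uint64_t)a + (uint64_t)b) & 0xFFFFFFFFu;
    return (a + b) == (uint32_t)expected;
}

int main(void)
{
    /* Exhaustive strategy: every (a, b) pair - 2^32 x 2^32 = 2^64 tests.
       At ~3x10^8 tests/sec this takes on the order of 2000 years,
       so the loop is shown only to make the combinatorics concrete. */
    uint32_t a = 0, b = 0;
    do {
        do {
            if (!check_add(a, b))
                return 1;            /* a defect was found */
        } while (++b != 0);          /* wraps back to 0 after 2^32 values */
    } while (++a != 0);
    return 0;                        /* unreachable in any realistic run */
}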
Testing strategies
• Exhaustive testing - Try all possible inputs
• Naïve
• Simple (easy to implement)
• Easy to justify and
• Argue for completeness!
• Works for very small input sets only!!
• For inputs ai, i = 0, …, n-1
• If Ai = {ai0, ai1, …, ai(k-1)} is the set of all possible values of ai
• and |Ai| = k is the cardinality of Ai
• then
• Tests required = ∏ |Ai| (the product over all the inputs)
• Clearly only useful when all |Ai| are small!!
Exhaustive Testing
• Inefficient, naive?
• Never forget the KISS principle
• An automated test system can do a very large number of tests in a reasonable time
and do them while you’re designing the next test!
• Analysis needed is trivial
whereas
• Analysis for a more efficient test regime may be
quite complex and error-prone
• It’s easy to convince someone that an
exhaustively tested system is reliable
Efficient testing
• Many tests are redundant
• In the adder example, most tests are equivalent
• They don’t exercise any new part of the underlying
circuit!
• For example, you might argue that
• all additions of +ve numbers without overflow are
equivalent
• Addition of 0 to a +ve number is the same for all +ve
numbers
• Similarly for 0 + -ve number
etc
• This divides the tests into equivalence classes
• Only one representative of each class need be tested!
Equivalence Classes
• Key concept:
Only one representative of each class needs
to be tested!
• All other tests of inputs in the same equivalence
class just repeat the first one!
Dramatic reduction in total number of tests
No loss of ‘coverage’ or satisfaction that tests are
complete
Adder example
Test                 a        b        c
+ve, no overflow     20       40       60
+ve, overflow        2^31     2^20     overflow
+ve, 0               34       0        34
-ve, no overflow     -100     -30      -130
-ve, 0               -30      0        -30
-ve, overflow        -2^31    -2^31    underflow
Result 0             -30      30       0
?                    ?
Clearly, we’ve achieved a dramatic reduction in
number of required tests!
Disclaimer: A more careful analysis would look at
the circuitry needed to implement an adder!
Equivalence classes – formal definition
• A set of equivalence classes is a partition of a
set such that
• Each element of the set is a member of exactly
one equivalence class
• For a set, S, and a set of equivalence classes, Ci
• ∪i Ci = S
• Ci ∩ Cj = ∅ (the null set) unless i = j
Equivalence classes – formal definition
• A set of equivalence classes is a partition of
a set such that
• The elements of an equivalence class, C, are
classified by an equivalence relation, ~
• If a ∈ C and b ∈ C, then a ~ b
• The equivalence relation is
• Reflexive:   a ~ a
• Transitive:  if a ~ b and b ~ c, then a ~ c
• Symmetric:   if a ~ b then b ~ a
• A Representative of each class is an arbitrary
member of the class
• They’re all ‘equivalent’ – so choose any one!
Equivalence classes – verification
• Equivalence relation
• In the verification context, the elements of the set
are the sets of input values for a function under
test
eg we are verifying a function
f( int a, int b )
• The 2-tuples (1,1), (1,2), (1,3) .. (and many more!)
are the elements of the set of all possible inputs for f
• The equivalence relation is
“behaves the same way under testing”
• One common interpretation of this is:
• “follows the same path through the code”
Equivalence classes – verification
• Equivalence relation
• Consider this function
int max( int a, int b ){
if( a > b ) return a;
else return b;
}
• There are two paths through this code, so the inputs fall
into two classes
Those for which a > b and
the rest
• This implies that we have only two tests to make (sketched below):
• (a=5, b=3) and
• (a=4, b=6)
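A minimal test driver for this analysis (a sketch; a boundary case such as a == b would be added once class limits are considered on the later slides):

#include <assert.h>

int max( int a, int b ){
    if( a > b ) return a;
    else return b;
}

int main(void)
{
    /* One representative from each equivalence class */
    assert( max(5, 3) == 5 );   /* class: a > b             */
    assert( max(4, 6) == 6 );   /* class: the rest (a <= b) */
    return 0;
}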
Black Box and White Box Verification
• There are two scenarios for developing
equivalence classes
• Black Box
• Specification is available but no code
• Equivalence classes are derived from rules in the
specification
eg admission price: if age < 6, then free
if age < 16, then 50%
else full price
would lead to 3 equivalence classes:
age < 6; age ≥ 6 ∧ age < 16; age ≥ 16 (sketched in code below)
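As a sketch (the function name and the numeric return values are assumptions made for illustration; the specification above only gives the pricing rules), the three classes correspond to the three branches:

#include <assert.h>

/* Hypothetical implementation of the admission-price rule above;
   returns the fraction of full price to charge */
static double admission_fraction(int age)
{
    if (age < 6)  return 0.0;   /* class: age < 6        (free)       */
    if (age < 16) return 0.5;   /* class: 6 <= age < 16  (50%)        */
    return 1.0;                 /* class: age >= 16      (full price) */
}

int main(void)
{
    /* One representative per equivalence class (3, 9 and 29, as chosen later) */
    assert(admission_fraction(3)  == 0.0);
    assert(admission_fraction(9)  == 0.5);
    assert(admission_fraction(29) == 1.0);
    return 0;
}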
Black Box and White Box Verification
• Black Box
• Specification is available but no code
• White Box
• Code is available and can be analyzed
• Equivalence classes are derived from rules in the
specification and the code
White Box Verification
• White Box
• Equivalence classes are derived from rules in the
specification and the code
• These are not always the same
eg a database stored on a disc
• Specification might say,
if record exists, then return it
Black Box Testing
Two equivalence classes
• Record exists and
• record does not exist
White Box Verification
• White Box
• However, the code reveals that an m-way tree
(matched to disc block size for efficiency) is used
Many additional classes
• Disc block full
• Block split needed
• Only one record
• Record at start of block
• Record in middle of block
• Record at end of block
• Record in root block
• Record in leaf
• ….
Generating the Equivalence Classes
Specification
admission price: if age < 6, then free
if age < 16, then 50%
else full price
would lead to 3 equivalence classes:
age < 6; age ≥ 6 ∧ age < 16; age ≥ 16
Choose representatives 3, 9 and 29
(or many other sets)
[Number line: the three classes with representatives 3, 9 and 29 and class boundaries at 5|6 and 15|16]
Generating the Equivalence Classes
Formally
Choosing representatives 3, 9 and 29 is sufficient
However
An experienced tester knows that a very common
error is writing < for ≤ or > for ≥, or vice versa
So include class limits too! (see the sketch below)
[Number line: representatives 3, 9 and 29 plus the class limits 5, 6, 15 and 16]
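A sketch of why the class limits matter (the buggy variant below is invented for illustration): a version of the rule that writes <= where < was intended still passes all three representative tests, but the boundary value 6 exposes it:

#include <assert.h>

/* Buggy variant (illustrative): '<=' accidentally written instead of '<' */
static double admission_fraction_buggy(int age)
{
    if (age <= 6)  return 0.0;   /* BUG: 6-year-olds now get in free */
    if (age < 16)  return 0.5;
    return 1.0;
}

int main(void)
{
    /* The representatives alone do not expose the bug ... */
    assert(admission_fraction_buggy(3)  == 0.0);
    assert(admission_fraction_buggy(9)  == 0.5);
    assert(admission_fraction_buggy(29) == 1.0);

    /* ... but the class limits 5, 6, 15 and 16 do: */
    assert(admission_fraction_buggy(5)  == 0.0);
    assert(admission_fraction_buggy(6)  == 0.5);   /* fails here - bug caught */
    assert(admission_fraction_buggy(15) == 0.5);
    assert(admission_fraction_buggy(16) == 1.0);
    return 0;
}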
Generating the Equivalence Classes
Other special cases
• Nulls
• Identity under addition: x + 0 = x
• Unity
• Identity under multiplication: x × 1 = x
• Range Maxima and Minima
• May have (or need!) special code
• Illegal values
• Should raise exceptions or return errors
• Read the specification to determine behaviour!
[Number line: representatives now include -5, -1, 0, 1 and 999 as well as 3, 5, 6, 9, 15, 16 and 29]
Generating the Equivalence Classes
• Illegal values
• Should raise exceptions or return errors
• Read the specification to determine behaviour!
• Particularly important!
• Typical commercial code probably has as much code handling illegal or unexpected input as 'working' code!
• Treat every possible exception as an output! (see the sketch below)
[Number line as before, now with the illegal values -5 and 999 marked]
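One way to treat every possible exception as an output is to give the error its own expected value in the tests. A sketch, assuming (purely for illustration - the real behaviour must come from the specification) that a negative or absurdly large age is rejected with an error value:

#include <assert.h>

#define PRICE_ERROR -1.0   /* assumed error value for illegal input */

/* Variant of the admission rule that checks for illegal ages first */
static double admission_fraction_checked(int age)
{
    if (age < 0 || age > 150) return PRICE_ERROR;   /* illegal values */
    if (age < 6)  return 0.0;
    if (age < 16) return 0.5;
    return 1.0;
}

int main(void)
{
    /* The error value is just another expected output */
    assert(admission_fraction_checked(-5)  == PRICE_ERROR);
    assert(admission_fraction_checked(999) == PRICE_ERROR);
    assert(admission_fraction_checked(29)  == 1.0);
    return 0;
}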
Generating the Equivalence Classes
Other special cases
• This caused the set of representatives to expand
from 3 to 12
• Some are not really needed
eg code does process 1 in just the same way as 3
• However, this is a small price to pay for robust software!
• The cost of proving that a unity is not needed is more than the cost of testing it!
Experienced testers routinely include these special cases!
36
Generating the Equivalence Classes
Outputs
• Find equivalence classes that cover outputs also!
• Same general rules apply as for the inputs
• One representative of each class plus
• Boundaries
• Null output
eg No items in a report – does the user want a confirming report anyway?
• Just one output
eg Reports often have header and trailer sections - are these correctly generated for a short (<1 page) report?
Never neglect the null case!
• It's very easy to neglect at specification stage
• Required behaviour may be 'obvious'
• No need to write it down!
• It will require coding
• Experienced programmers know that it's a very common source of error!
Coverage in White Box Testing
• Black Box testing will not usually cover all
the special cases required to test data
structures
• Often, the functional goals of the specification
could be met by one of several data structures
• Specification may deliberately not prescribe the
data structure used
• Allows developers to choose one meeting performance goals
• Permits substitution of an alternative with better
performance (vs non-functional specifications)
• Coverage defines the degree to which white
box testing covers the code
• Measurement of completeness of testing
Coverage in White Box Testing
• Usually, at least some white box coverage
goals will have been met by executing test
cases designed using black-box strategies
• How would you know if this were the case
or not?
• In simple modules, which don’t use internal data
structures, black box classes may be adequate
• This is not the general case though!
• Various coverage criteria exist
Every statement at least once
Every branch taken in true and false directions
Every path through the code
Coverage in White Box Testing
• Coverage criteria (a short example follows this list)
• Logic coverage
• Statement: each statement executed at least once
• Branch: each branch traversed (and every entry
point taken) at least once
• Condition: each condition True at least once and
False at least once
• Branch/Condition: both Branch and Condition
coverage
• Compound Condition: all combinations of
condition values at every branch statement
covered (and every entry point taken)
• Path: all program paths traversed at least once
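To make a few of these criteria concrete, here is a small invented fragment together with the test sets that satisfy each criterion (the function and the test values are illustrative assumptions; the conditions are pre-computed so short-circuit evaluation doesn't complicate the picture):

#include <stdio.h>

/* Invented fragment: one branch controlled by two conditions */
static int positive_pair(int a, int b)
{
    int c1 = (a > 0);                 /* condition 1 */
    int c2 = (b > 0);                 /* condition 2 */
    int r = 0;
    if (c1 && c2)                     /* the only branch */
        r = 1;
    return r;
}

int main(void)
{
    /* Statement coverage:  ( 1, 1) alone reaches every statement.          */
    /* Branch coverage:     ( 1, 1) and (-1,-1) take the branch both ways.  */
    /* Condition coverage:  ( 1,-1) and (-1, 1) make each condition true    */
    /*   once and false once, yet never take the true branch - so condition */
    /*   coverage does not imply branch coverage.                           */
    /* Compound condition:  all four combinations                           */
    /*   ( 1, 1), ( 1,-1), (-1, 1), (-1,-1).                                */
    printf("%d %d %d %d\n",
           positive_pair(1, 1), positive_pair(1, -1),
           positive_pair(-1, 1), positive_pair(-1, -1));
    return 0;
}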
Pseudocode and Control Flow Graphs
input(Y)
if (Y <= 0) then
    Y = -Y
end
while (Y > 0) do
    input(X)
    Y = Y - 1
end
[Control flow graph of this segment: the statements and decisions are the "nodes"; the possible transfers of control between them are the "edges"]
Statement Coverage
• Statement Coverage requires that
each statement is executed at least once
• Simplest form of logic coverage
• Also known as Node Coverage
• What is the minimum number of test cases
required to achieve statement coverage for
the program segment given above?
Branch coverage
• Branch Coverage requires that each branch
will have been traversed, and that every
program entry point will have been taken, at
least once
• Also known as Edge Coverage
Branch Coverage – Entry points
• Why include "…and that every program entry point will have been taken, at least once"?
• Not common in HLLs (eg Java) now
• Common in scripting languages
• Any language that allows a goto and a statement label!
Procedure – Module Verification
Steps
• Obtain precise specification
• Should include definitions of exception or illegal input
behaviour
• For each input of module
• Determine equivalence classes (inc special cases)
• Choose representatives
• Determine expected outputs
• Repeat for outputs
• Many output equivalence classes are probably covered
by input equivalence classes
• Build test table
Procedure – Module Verification
Steps
• Write test program
• Putting the tests in tables is usually the best approach (a sketch follows below)
• Easily maintained
• Test programs need to be retained and run when any
change is made
• To make sure that something that worked isn’t broken
now!!
• Tables are easily augmented
• When you discover the case that you didn’t test for!
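A sketch of what tests in tables can look like in code, reusing the adder representatives from the earlier slide (the overflow rows are omitted here because the expected behaviour on overflow has to come from the specification):

#include <stdio.h>

/* One row of the test table: inputs and expected output */
struct add_test {
    const char *name;
    int a, b;
    int expected;
};

/* The table itself - easy to maintain and easy to augment
   when you discover the case you didn't test for */
static const struct add_test tests[] = {
    { "+ve, no overflow",   20,  40,   60 },
    { "+ve, 0",             34,   0,   34 },
    { "-ve, no overflow", -100, -30, -130 },
    { "-ve, 0",            -30,   0,  -30 },
    { "result 0",          -30,  30,    0 },
};

int main(void)
{
    int failures = 0;
    size_t n = sizeof tests / sizeof tests[0];

    for (size_t i = 0; i < n; i++) {
        int c = tests[i].a + tests[i].b;          /* operation under test */
        if (c != tests[i].expected) {
            printf("FAIL %s: %d + %d = %d, expected %d\n",
                   tests[i].name, tests[i].a, tests[i].b, c, tests[i].expected);
            failures++;
        }
    }
    printf("%zu tests, %d failures\n", n, failures);
    return failures != 0;
}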