SWE3643 2013 What Is Software Testing Whittaker

What is Software Testing? And Why is it So Hard?
J. Whittaker paper
(IEEE Software – Jan/Feb 2000)
Summarized by F. Tsui
Common Reasons “Bugs” Escape to Customers
• User executed an “untested” segment of code.
• User executed the code in a different sequence in actual use than in any of the test cases.
• User applied a different combination of input values than any of the test cases.
• User’s operating environment is different from any of the tested environments, or is an environment that was consciously left untested due to cost.
Note: These reasons give us a “hint” of where and what to test.
Proposed: 4 Major Testing “Phases” or “Activities”
1. Model the User’s Software Environment
2. Generate and Select the Test Scenarios
3. Run and Evaluate the Test Scenarios
4. Record the Test Results and Measure the Progress
1. Modeling the Software Environment
• Simulate the interactions between the software and its environment, mostly at the interfaces that the software uses:
– Human interface (e.g., GUI with mouse clicks, keyboard inputs, other devices)
– Software interface (e.g., interfaces to the operating system or to other modules or apps, especially in the areas of “error” return messages)
– File system interface (e.g., data written to or read from external files or a DB)
– Communications interface (e.g., network protocols, with both valid and invalid messages)
In testing an interface, the tester must consider two things:
1) the actual data values of the test inputs, and
2) the sequencing of the test inputs
e.g., the -1 input in the lock-stock-barrel problem
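To make the two considerations concrete, here is a minimal Python sketch of a lock-stock-barrel style input reader. The value ranges (locks 1..70, stocks 1..80, barrels 1..90) and the rule that a locks value of -1 ends the input are assumptions for illustration; the test data then probes bad values and bad sequences separately.

```python
# Sketch of a lock-stock-barrel style input reader (hypothetical).
# ASSUMPTIONS: each record is (locks, stocks, barrels); valid counts are
# locks 1..70, stocks 1..80, barrels 1..90; a record whose locks value is
# -1 marks the end of input.
LIMITS = (("locks", 1, 70), ("stocks", 1, 80), ("barrels", 1, 90))


def read_sales(records):
    """Total the valid records; raise ValueError on bad values or bad sequences."""
    totals = {name: 0 for name, _, _ in LIMITS}
    terminated = False
    for record in records:
        if terminated:
            # Sequencing fault: data arrived after the end-of-input marker.
            raise ValueError("record after -1 terminator")
        if record[0] == -1:
            terminated = True
            continue
        for (name, low, high), value in zip(LIMITS, record):
            if not low <= value <= high:
                # Value fault: a single input is outside its valid range.
                raise ValueError(f"{name}={value} out of range")
            totals[name] += value
    if not terminated:
        # Sequencing fault: the input ended without the -1 marker.
        raise ValueError("missing -1 terminator")
    return totals


# Legal values in a legal sequence.
print(read_sales([(10, 20, 30), (5, 5, 5), (-1,)]))
# Two sequencing faults, then one value fault (stocks = 81).
for bad in ([(10, 20, 30)], [(-1,), (1, 1, 1)], [(10, 81, 5), (-1,)]):
    try:
        read_sales(bad)
    except ValueError as err:
        print("rejected:", err)
```

Keeping value checks and sequence checks as separate failure paths lets each test case target exactly one of the two considerations.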
Interface Testing Suggestions
• Boundary value partitioning (a combination of what we called boundary value testing and equivalence class testing) is often used to choose input values.
• The sequencing of inputs must consider the “dependencies” among the inputs (our “strong normal” or “worst case” testing).
– Exercise both the “legal” combinations and sequences and the “illegal” ones.
• Use “models” to help set up sequences of inputs and transitions:
– Graphs
– State diagrams
– Logic tables, etc.
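As a sketch of using a state diagram to set up input sequences, the Python below encodes a small, hypothetical document-editor model (its states, events, and transitions are invented for illustration) and enumerates every event sequence of a given length, splitting them into legal sequences and illegal ones that the software should reject.

```python
from itertools import product

# Hypothetical state diagram: (state, event) -> next state.
# Any (state, event) pair not listed is an illegal input in that state.
TRANSITIONS = {
    ("closed", "open"): "opened",
    ("opened", "edit"): "modified",
    ("modified", "edit"): "modified",
    ("modified", "save"): "opened",
    ("opened", "close"): "closed",
}
EVENTS = ("open", "edit", "save", "close")


def run(sequence, start="closed"):
    """Walk the model; return the final state, or None at the first illegal event."""
    state = start
    for event in sequence:
        state = TRANSITIONS.get((state, event))
        if state is None:
            return None
    return state


def classify_sequences(length):
    """Split all event sequences of the given length into legal and illegal ones."""
    legal, illegal = [], []
    for sequence in product(EVENTS, repeat=length):
        (legal if run(sequence) is not None else illegal).append(sequence)
    return legal, illegal


legal, illegal = classify_sequences(3)
print(len(legal), "legal and", len(illegal), "illegal sequences of length 3")
print("legal:", legal)
```

Each legal sequence is a candidate positive test and each illegal one a candidate negative test; in a real project the transitions would come from the specification rather than being invented.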
2. Selecting Test Scenarios
• There are often too many test cases (possibly infinitely many) to run, so schedule and resource constraints force us to select a subset.
– Consider “coverage” as a criterion:
(my addition)
1. Code statement coverage – how much of the source code is executed at least once?
2. Input coverage – how many of the inputs, and how many variations of each input, are covered? (boundary value, robust testing, ..., decision table)
3. Output coverage – how many of the outputs are covered, and to what depth?
– Consider the execution paths:
• Sequences of inputs that execute different paths through the source code
– Major user paths in terms of “typical usage” scenarios
– Less well-defined (and thus less well coded) minor functional paths used only by a minority of users
This is a bit different from plain path testing: it requires knowing the usage, the users, and the application domain.
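For the input-coverage criterion, here is a minimal sketch of boundary value selection for one integer input in [low, high]. The normal values are min, min+, nominal, max-, and max, and robust testing adds the invalid neighbors min- and max+. The `input_coverage` helper is a hypothetical name; it reports what fraction of those target values a test suite actually uses.

```python
def boundary_values(low, high, robust=False):
    """Boundary values for an integer input in [low, high] (assumes high - low >= 2)."""
    nominal = (low + high) // 2
    values = [low, low + 1, nominal, high - 1, high]
    if robust:
        # Robust testing also probes just outside the valid range.
        values = [low - 1] + values + [high + 1]
    return values


def input_coverage(tested_values, low, high, robust=False):
    """Fraction of the boundary targets that appear among the tested values."""
    targets = set(boundary_values(low, high, robust))
    return len(targets & set(tested_values)) / len(targets)


print(boundary_values(1, 70))               # [1, 2, 35, 69, 70]
print(boundary_values(1, 70, robust=True))  # [0, 1, 2, 35, 69, 70, 71]
print(input_coverage([1, 35, 70], 1, 70))   # 0.6
```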
Execution Path Testing as Discriminating “Criteria”
• For control flow testing:
– Every source line executed
– Every branch (case, while, if-then-else) taken
Think about our lecture on path testing:
- code coverage
- branch coverage
- linearly independent paths
- all logical combinations
How far would you go?
• For data flow testing:
– Data “define” to data “usage” paths
– Data structure initialization to usage
How many of the D-U paths would you use: just D-to-C-use, D-to-P-use, or all D-U paths?
Would you include program slice testing to focus on certain data?
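To see why branch coverage is a stronger criterion than statement coverage, consider this small Python sketch. The function and its hand-written probe are hypothetical; a real project would use a coverage tool instead (for Python, coverage.py measures branches with `coverage run --branch`). A single negative input executes every statement, yet exercises only one of the two outcomes of the `if`.

```python
def clamp_negative(x, probe):
    """Return max(x, 0), recording which outcome of the decision was taken."""
    y = x
    taken = x < 0
    probe.add(taken)  # hand-written branch probe
    if taken:
        y = 0
    return y


def branch_coverage(test_inputs):
    """Fraction of the decision's two outcomes (True/False) exercised by the inputs."""
    probe = set()
    for x in test_inputs:
        clamp_negative(x, probe)
    return len(probe & {True, False}) / 2


print(branch_coverage([-5]))     # every statement runs, but only the True outcome
print(branch_coverage([-5, 3]))  # both outcomes exercised
```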
Fault Seeding and Discovery as “Criteria”
• Insert defects into the code.
• Design test scenarios to detect the seeded defects, which are not necessarily known to the testers.
• Use the proportion of seeded defects uncovered to predict how effective the testing methodology is.
e.g.:
- put in 10 seeded defects
- set the goal to detect, say, 90% of the seeded bugs
- but, say, we found only 6 seeded bugs and 25 “real” bugs
- 6/10 = 25/x
- x = 250/6 ≈ 41.7, which implies there are about 42 “real” bugs in total
- so we believe 42 - 25 = 17 real bugs are left
- the decision would be to keep on testing until we discover 90% of the seeded defects
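The arithmetic in the example follows from assuming real bugs are found at the same rate as seeded ones (seeded found / seeded total = real found / real total). A minimal Python sketch of that estimate and the 90% stopping rule, with hypothetical function names:

```python
def estimate_real_bugs(seeded_total, seeded_found, real_found):
    """Estimate total and remaining real bugs from the seeded-bug detection rate."""
    if seeded_found == 0:
        raise ValueError("no seeded bugs found, so the rate is undefined")
    # seeded_found / seeded_total = real_found / real_total
    real_total = round(real_found * seeded_total / seeded_found)
    return real_total, real_total - real_found


def keep_testing(seeded_total, seeded_found, goal=0.9):
    """Continue testing until the seeded-bug detection rate reaches the goal."""
    return seeded_found / seeded_total < goal


print(estimate_real_bugs(10, 6, 25))  # (42, 17): 250/6 is about 41.7
print(keep_testing(10, 6))            # True: 60% is below the 90% goal
```

The estimate is only as good as the assumption that seeded defects are as hard to find as real ones.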
Input Domain Testing as “Criteria”
• Select test cases to cover all physical inputs.
• Select test cases that cause each input interface control to be stimulated (e.g., window menus, radio buttons, drop-downs, etc.).
• Discriminating criteria for the above:
– Statistically equivalent to covering the complete set (equivalence partitioning)
– As stated before, typical or most likely to be executed by users
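A minimal sketch of equivalence partitioning as a discriminating criterion: the input domain of a single field is split into classes, one representative per class stands in for the whole class, and coverage is the fraction of classes a test suite touches. The "age" field and its class boundaries are hypothetical.

```python
# Hypothetical input domain for an "age" field, split into equivalence classes.
PARTITIONS = {
    "invalid: negative": range(-1000, 0),
    "valid: minor": range(0, 18),
    "valid: adult": range(18, 121),
    "invalid: too large": range(121, 1000),
}


def classify(value):
    """Name of the class containing value, or None if outside the modeled domain."""
    for name, members in PARTITIONS.items():
        if value in members:
            return name
    return None


def representatives():
    """One test value per class; here simply the middle member of each range."""
    return {name: members[len(members) // 2] for name, members in PARTITIONS.items()}


def partition_coverage(test_inputs):
    """Fraction of the classes exercised by at least one test input."""
    covered = {classify(v) for v in test_inputs} - {None}
    return len(covered) / len(PARTITIONS)


print(representatives())
print(partition_coverage([5, 30]))  # two of four classes
```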
3. Running and Evaluating Test Scenarios
• Executing the designed test scenarios: manually executing test scenarios is expensive, so testing often includes a tool that automates some aspects of test execution.
• Evaluating test results: comparing the actual test result against the expected test result. Does this need a human oracle to compare results?
– Use a formal specification to state the expected result, then compare it against the actual results, also specified in formal terms. (My example: use pre- and post-conditions with “assert”.)
– Embed some code to show the execution results (e.g., output variable values to be checked by an external person or an automated program) in some strategically picked program slices. (my words)
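Here is a minimal Python sketch of stating the expected result as pre- and post-conditions with `assert`, so the check itself acts as an automated oracle. The integer square root function is a hypothetical example. Python skips `assert` statements when run with the `-O` flag, so these checks suit testing rather than production input validation.

```python
def integer_sqrt(n):
    """Largest r with r * r <= n, guarded by pre- and post-conditions."""
    # Precondition: the caller must supply a non-negative integer.
    assert isinstance(n, int) and n >= 0, "precondition violated"
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    # Postcondition: a formal statement of the expected result, checked on
    # every call, so no human has to compare actual and expected output.
    assert r * r <= n < (r + 1) * (r + 1), "postcondition violated"
    return r


for n in (0, 1, 15, 16, 99):
    print(n, integer_sqrt(n))
```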
Regression Testing – just re-running old test cases?
• Regression testing is a form of testing changes (and fixes).
• An application often goes through several versions before final testing and release; each version may contain corrected code, modified code, and/or additional functional code:
– We need to test that the newer version, with these potential changes, did not “regress” what was working in previous versions. This is called regression testing. How much testing do we do?
1. Test only the modified areas
2. Test only the modified areas and their “immediate neighborhood”
3. Test the new/modified areas and also re-run all the previous test scenarios
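The three options differ only in which tests get selected. A minimal Python sketch, assuming a hypothetical map from each test to the modules it exercises and from each module to the modules that depend on it (standing in for the "immediate neighborhood"):

```python
# Hypothetical traceability data: which modules each test exercises, and
# which modules depend on (call into) each module.
TESTS = {
    "test_checkout": {"cart", "payment"},
    "test_login": {"auth"},
    "test_profile": {"profile"},
    "test_report": {"report"},
}
DEPENDENTS = {"auth": {"profile", "report"}, "payment": {"cart"}}


def select_tests(changed, strategy):
    """Pick regression tests for the changed modules under one of three strategies."""
    if strategy == "all":                 # option 3: re-run every previous test
        return sorted(TESTS)
    targets = set(changed)                # option 1: modified areas only
    if strategy == "neighborhood":        # option 2: add direct dependents
        for module in changed:
            targets |= DEPENDENTS.get(module, set())
    elif strategy != "modified":
        raise ValueError(f"unknown strategy: {strategy}")
    return sorted(name for name, modules in TESTS.items() if modules & targets)


for strategy in ("modified", "neighborhood", "all"):
    print(strategy, select_tests({"auth"}, strategy))
```

The cost grows from option 1 to option 3, while the risk of missing a regression shrinks; the quality of the selection depends entirely on how accurate the traceability data is.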
Some Other Concerns (when evaluating tests)
• Should code that is hard to test or buggy be rewritten?
– What is “hard to test” code?
– How much of the code should be rewritten if a piece of code is judged to be hard to test?
- We do not have a clear definition of hard-to-test code; is it “non-cohesive” code?
- Furthermore, are we writing code for users, or for testers and other developers?
• Recreating or reproducing a failed case is not always easy, because the state in which the code failed is not always understood.
– What do we do with non-reproducible failures?
• Report them but do nothing?
• Not even report them?
4. Measuring Test Progress
• Instead of just reporting “numbers” (of test cases designed or executed, of failures found, etc.), we should also ask:
– Have we considered “common” programming errors?
– Have we forced all data to be initialized?
– Have we found all seeded errors?
– Have we tried the expected user usage modes?
We talked about this when you reported bugs for inspection, e.g., problem type and severity; problems/page (rate).
• Testers and support staff would also like to know how many defects are left in the code and the chance of these showing up in the field. How do we deal with this?
More on Measurement and Models
• Testability (not sure how it is measured):
– If software has “high” testability, it is easier to test and thus more likely to have its defects found and removed before release to users. (This refers to Voas’ notion of testability, from another paper.) If we find a small number of bugs in highly testable code, we should be happy; but is finding lots of bugs in code with low testability also a good sign?
• Reliability models:
– Model the frequency of defects found through testing
– Use the model to predict the probability of defects being found in the user environment
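As one concrete instance of a reliability model (the slides do not name one), Musa's basic execution-time model assumes failure intensity falls linearly as failures are experienced: λ(μ) = λ0(1 − μ/ν0), where λ0 is the initial intensity, ν0 the total expected failures, and μ the failures experienced so far. With a constant intensity λ, the probability of running τ time units without failure is exp(−λτ). The parameter values below are invented; in practice they would be fitted to the failure data recorded during testing.

```python
import math


def failure_intensity(lambda0, nu0, mu):
    """Musa basic model: current failure intensity after mu failures have been experienced."""
    return lambda0 * (1 - mu / nu0)


def reliability(intensity, tau):
    """Probability of no failure during tau time units at a constant failure intensity."""
    return math.exp(-intensity * tau)


# Invented parameters: 10 failures/CPU-hour initially, 100 failures expected in total.
lambda0, nu0 = 10.0, 100.0
for mu in (0, 50, 90):
    lam = failure_intensity(lambda0, nu0, mu)
    print(f"after {mu} failures: {nu0 - mu:.0f} expected remaining, "
          f"intensity {lam:.2f}/h, P(no failure in 1 h) = {reliability(lam, 1.0):.3f}")
```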
Finally
• Hire the “smartest” and “qualified” people you can get.
• Equip them with proper tools and training.
• Give them the authority and time to perform their work.
• Listen to them when they speak about the “quality” of the product.
My addition