Traditional Software Testing is a Failure!

Linda Shafer
University of Texas at Austin

Don Shafer
Chief Technology Officer, Athens Group, Inc.
www.athensgroup.com
donshafer@athensgroup.com

Copyright © 2002 Linda and Don Shafer

Seminar Dedication
Enrico Fermi referred to these brilliant Hungarian scientists as “the Martians,” based on speculation that a spaceship from Mars dropped them all off in Budapest in the early 1900s.
And let us not forget….
1994
John Harsányi (b. 5/29/1920, Budapest, Hungary; d. 2000)
“For his pioneering analysis of equilibrium in the theory of non-cooperative games.”
Shared the prize with A Beautiful Mind’s John Nash
Special Thanks to:
Rex Black
RBCS, Inc.
31520 Beck Road
Bulverde, TX 78163 USA
Phone: +1 (830) 438-4830
Fax: +1 (830) 438-4831
www.rexblackconsulting.com
rex_black@rexblackconsulting.com
Gary Cobb’s testing presentations for SWPM as excerpted from:
Software Engineering: A Practitioner’s Approach, by Roger Pressman, McGraw-Hill, Fourth Edition, 1997
  Chapter 16: “Software Testing Techniques”
  Chapter 22: “Object-Oriented Testing”
UT SQI SWPM Software Testing Sessions from 1991 to 2002
… for all they provided in past test guidance and direct use of some material from the University of Texas at Austin Software Project Management Certification Program.
Traditional Development Life Cycle Model
[Diagram: process steps - Requirements Definition, High-Level Design, Detail Design, System Construction, Verification & Validation, System Delivery - separated by review gates, with Prototypes 1-3 each followed by a review and a Post-Implementation Review at the end.]
Project management support processes: Risk Reduction, Training, Planning, Configuration Management, Estimating, Metrics, Quality Assurance
Software Failures – NIMBY!
It took the European Space Agency 10 years and $7 billion to produce Ariane 5, a giant rocket capable
of hurling a pair of three-ton satellites into orbit with each launch and intended to give Europe
overwhelming supremacy in the commercial space business. All it took to explode that rocket less
than a minute into its maiden voyage last June, scattering fiery rubble across the mangrove swamps
of French Guiana, was a small computer program trying to stuff a 64-bit number into a 16-bit space.
http://www.around.com/ariane.html 1996 James Gleick
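The failure mode is easy to reproduce in miniature. Below is an illustrative Python sketch, not the actual flight code (which was written in Ada), showing how forcing a 64-bit value into a 16-bit signed field silently wraps around once the value exceeds the 16-bit range; the variable name is invented for illustration.

# Illustrative sketch of an Ariane-type fault: a 64-bit value squeezed into 16 bits.
def to_int16(value_64bit: float) -> int:
    """Careless conversion: keep only the low 16 bits and reinterpret as signed."""
    raw = int(value_64bit) & 0xFFFF
    return raw - 0x10000 if raw >= 0x8000 else raw

horizontal_bias = 32767.0                  # hypothetical sensor-derived value
print(to_int16(horizontal_bias))           # 32767 - still fits
print(to_int16(horizontal_bias + 1))       # -32768 - silent wrap-around

# A defensive conversion would range-check first and fail loudly instead:
def checked_int16(value: float) -> int:
    if not -32768 <= value <= 32767:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return int(value)

A range check like the one at the end is exactly the kind of guard that testing against boundary values is meant to force into the design.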
Columbia and other space shuttles have a history of computer glitches that have been linked to
control systems, including left-wing steering controls, but NASA officials say it is too early to
determine whether those glitches could have played any role in Saturday's shuttle disaster. … NASA
designed the shuttle to be controlled almost totally by onboard computer hardware and software
systems. It found that direct manual intervention was impractical for handling the shuttle during
ascent, orbit or re-entry because of the required precision of reaction times, systems complexity and
size of the vehicle. According to a 1991 report by the General Accounting Office, the investigative arm
of Congress, "sequencing of certain shuttle events must occur within milliseconds of the desired
times, as operations 10 to 400 milliseconds early or late could cause loss of crew, loss of vehicle, or
mission failure." http://www.computerworld.com/softwaretopics/software/story/0,10801,78135,00.html
After the hydraulic line on their V-22 Osprey aircraft ruptured, the four Marines on board had 30
seconds to fix the problem or perish. Following established procedures, when the reset button on the
Osprey's primary flight control system lit up, one of the pilots — either Lt. Col. Michael Murphy or Lt.
Col. Keith Sweaney — pushed it. Nothing happened. But as the button was pushed eight to 10 times in
20 seconds, a software failure caused the tilt-rotor aircraft to swerve out of control, stall and then
crash near Camp Lejeune, N.C. …The crash was caused by "a hydraulic system failure compounded by
a computer software anomaly," he said. Berndt released the results of a Corps legal investigation into
the crash at an April 5 Pentagon press briefing. http://www.fcw.com/fcw/articles/2001/0409/news-osprey-0409-01.asp
Software Failures – Offshore
Material Handling
Drive Off under Dynamic Positioning
Computer science also differs from physics in that it is not
actually a science. It does not study natural objects. Neither
is it, as you might think, mathematics; although it does use
mathematical reasoning pretty extensively. Rather, computer
science is like engineering - it is all about getting something
to do something, rather than just dealing with abstractions
as in pre-Smith geology.
Richard Feynman, Feynman Lectures on Computation, 1996, pg xiii
Material Handling Under Software Control
Drive Off
Drive-off is caused by a fault in the thrusters and/or their control system
and a fault in the position reference system. As for the drift-off incidents,
the failure frequencies are derived from the DPVOA database, which
consists of mainly DP Class 2 vessels. For the drive-off incident, these
figures are representative also for the DP Class 3 vessels. The major
difference between DP Class 2 and 3 is that only the latter is designed to
withstand fire and flooding in any one compartment. These two incident
types do not affect the drive-off frequency significantly due to the fact that
drive-off is caused by logical failures in the thrusters, their control system
and/or the position reference system. Hence, the DP Class 2 figures are
used directly also for the DP Class 3 vessel.
The frequency of fault in the thrusters and/or their control system can be
obtained from data in Ref. /A-4/. The control system is here meant to
include all relevant software and hardware. The statistics show that
the software failure is about four times as frequent as the
hardware failure and slightly more frequent than the pure
thruster failure.

KONGSBERG OFFSHORE A.S.
 DEMO 2000 - LIGHT WELL INTERVENTION
 HSE EVALUATION
 REPORT NO. 00-4002, REVISION NO. 00
IMCA* Software Incidents as a Percentage of the Total
[Chart: software incidents versus other incidents as a percentage of total reported incidents, by year from 1990 to 2000, plus the average.]
*International Marine Contractors Association

Class 1: “Serious loss of position with serious actual or potential consequences.”
General Testing Remarks
• a critical element of software quality assurance
• represents the ultimate review of specification, design, and coding
• may use up to 40% of the development resources
• the primary goal of testing is to think of as many ways as possible to bring the newly-developed system down
• demolishing the software product that has just been built
• requires learning to live with the humanness of practitioners
• requires an admission of failure in order to be good at it
Organizing for Software Testing
[Diagram: development activities (system engineering, software requirements, software design, code) paired with their test levels (system test, validation test, integration test, unit test).]
• The software developer is always responsible for testing the modules
• An independent test group removes the conflict of interest (COI) of expecting the developer to perform the other tests
• Customer-witnessed testing
Cost Impact of Software Defects
Benefits of formal reviews:
• early discovery of software defects
• design activities introduce between 50% and 65% of all defects of the development process
• formal reviews can find up to 75% of the design flaws
[Pie chart: design 1%, coding 8%, testing 16%, release 73%.]
[Chart: staffing model output over a 16-month schedule, covering activities (requirements analysis, product design, programming, test planning, V&V, project office, CM/QA, manuals), phases (REQ, PD, DD+CUT, I&T), and milestones (PDR, CDR, TRR, FCA/PCA).]
Adjusting the Software Development Life Cycle
[Chart: cost differential and percent effort by phase (REQ, PD, DD, CUT, IT, O&M), conventional vs. OO development, today vs. future. Additional labels on the slide: Cost, Schedule, Product Quality, Ease-of-Management, Customer Loyalty, Confidence Among Developers, Capitalized Reuse.]
Questions People Ask About Testing
• Should testing instill guilt?
• Is testing really destructive?
• Why aren’t there methods of eliminating bugs at the time they are being injected?
• Why isn’t a successful test one that inputs data correctly, gets all the right answers, and outputs them correctly?
• Why do test cases need to be designed?
• Why can’t you test most programs completely?
• Why can’t the software developer simply take responsibility for finding/fixing all his/her errors?
• How many testers does it take to change a light bulb?
Testing Objectives
Glenford J. Myers (IBM, 1979)
1. Testing is a process of executing a program with the intent of finding an error
2. A good test case is one that has a high probability of finding an as yet undiscovered error
3. A successful test is one that uncovers an as yet undiscovered error
Test Information Flow
[Diagram: the software configuration and test configuration feed testing; test results are compared with expected results in evaluation; errors feed debugging, and error-rate data feeds a reliability model that produces the predicted reliability.]
Testing Principles
• All tests should be traceable to customer requirements
• Tests should be planned long before testing begins
• The Pareto principle applies to software testing: 80% of all errors will likely be in 20% of the modules (a quick way to check this against your own defect data is sketched below)
• Testing should begin “in the small” and progress toward testing “in the large”
• Exhaustive testing is not possible
• Testing should be conducted by an independent third party
“The test team should always know how many errors still remain in the code - for when the manager comes by.”
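As a rough illustration of checking the Pareto claim against your own defect data, the sketch below sorts modules by defect count and reports what share of the defects the top 20% of modules contain. The module names and counts are invented for illustration.

# Hedged sketch: do ~20% of the modules hold most of the defects?
# Module names and defect counts below are hypothetical.
defects_per_module = {
    "parser": 42, "scheduler": 31, "ui_forms": 9, "report_gen": 7,
    "auth": 5, "config": 3, "logging": 2, "export": 1, "help": 1, "about": 0,
}

counts = sorted(defects_per_module.values(), reverse=True)
total = sum(counts)
top_20_percent = counts[:max(1, len(counts) // 5)]   # the busiest 20% of modules
share = sum(top_20_percent) / total

print(f"Top 20% of modules hold {share:.0%} of the {total} defects")

If the distribution really is this skewed, concentrating regression testing on those few modules buys the most defect detection per test hour.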
About Software Testing
Software testing - a multi-step strategy:
• series of test case design methods that help ensure effective error detection
• testing is not a safety net or a replacement for software quality
  • SQA typically conducts independent audits of a product’s compliance to standards
  • SQA assures that the test team follows the policies & procedures
• testing is not 100% effective in uncovering defects in software
  • A really good testing effort will ship no more than 2.5 defects/KSLOC (thousand source lines of code)
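As a quick arithmetic check of that rule of thumb, a minimal sketch follows; the release size and shipped-defect count are invented for illustration.

# Hedged sketch: shipped-defect density vs. the 2.5 defects/KSLOC rule of thumb.
# Both numbers below are hypothetical.
shipped_defects = 18          # field-reported defects after release
source_lines = 12_000         # delivered source lines of code

density = shipped_defects / (source_lines / 1000)
print(f"{density:.2f} defects/KSLOC "
      f"({'within' if density <= 2.5 else 'over'} the 2.5 defects/KSLOC target)")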
Test Planning
• Requirements Inspections
• Software Architecture Inspections
• Design Inspections
• Code Inspections
• Test Inspections
Test Specification Outline
I. Scope of testing
II. Test Plan
  A. Test phases and builds
  B. Schedule
  C. Overhead software
  D. Environment and resources
III. Test procedure n
  A. Order of integration
    1. Purpose
    2. Modules to be tested
  B. Unit tests for modules in build
    1. Description of tests for module m
    2. Overhead software description
    3. Expected results
  C. Test Environment
    1. Special tools or techniques
    2. Scaffolding software description
  D. Test case data
  E. Expected results for build n
IV. Actual test results
V. References
VI. Appendices

Glenford Myers’ Sandwich Method
Sample Review Checklist
• Have major test phases properly been identified and sequenced?
• Has traceability to validation criteria/requirements been established as part of software requirements analysis?
• Are major functions demonstrated early?
• Is the test plan consistent with the overall project plan?
• Has a test schedule been explicitly defined?
• Are test resources and tools identified and available?
• Has a test record-keeping mechanism been established?
• Have test drivers and stubs been identified and has work to develop them been scheduled?
• Has stress testing for software been specified?
Sample Review Checklist
Software Test Procedure
• Have both white and black box tests been specified?
• Have all the independent logic paths been tested?
• Have test cases been identified and listed with their expected results?
• Is error handling to be tested?
• Are boundary values to be tested?
• Are timing and performance to be tested?
• Has an acceptable variation from the expected results been specified?
Bach’s 1994 Testability Checklist
Testability - how easily a computer program can be tested - depends on the following:
• Operability
• Observability
• Controllability
• Decomposability
• Simplicity
• Stability
• Understandability
Some of the Art of Test Planning
• Test Execution Time Estimation
• How Long to Get the Bugs Out?
• Realism, not Optimism
• Rules of Thumb
• Lightweight Test Plan
How Can We Predict Test Execution Time?
When will you be done executing the tests?
Part of the answer is when you’ll have run all the planned tests once:
• Total estimated test time (sum for all planned tests)
• Total person-hours of tester time available per week
• Time spent testing by each tester
The other part of the answer is when you’ll have found the important bugs and confirmed the fixes:
• Estimate total bugs to find, bug find rate, bug fix rate, and closure period (time from find to close) for bugs
• Historical data really helps
• Formal defect removal models are even more accurate
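A minimal sketch of the first part of that calculation, assuming you already have per-test time estimates and know the tester hours available each week; every number here is hypothetical.

# Hedged sketch: weeks needed to run every planned test once.
# Per-test estimates and staffing figures are illustrative assumptions.
planned_test_hours = [2.0, 1.5, 4.0, 0.5, 3.0, 2.5, 1.0, 6.0]  # hours per planned test
testers = 3
productive_hours_per_tester_per_week = 5 * 4   # e.g. four focused test hours a day

total_test_hours = sum(planned_test_hours)
weekly_capacity = testers * productive_hours_per_tester_per_week
weeks_for_one_pass = total_test_hours / weekly_capacity

print(f"{total_test_hours} test-hours / {weekly_capacity} hours per week "
      f"= {weeks_for_one_pass:.2f} weeks per pass")

The second part of the answer - bug find, fix, and closure rates - is sketched a few slides further on.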
Copyright © 1996-2002 Rex Black; © 2002 Linda and Don Shafer
How Long to Run the Tests?
It depends a lot on how you run tests:
• Scripted vs. exploratory
• Regression testing strategy (repeat tests or just run once?)
What I often do:
• Plan for consistent test cycles (tests run per test release) and passes (running each test once)
• Realize that buggy deliverables and uninstallable builds slow test execution… and plan accordingly
• Try to understand the amount of confirmation testing, as a large number of bugs leads to lots of confirmation testing
• Check number of cycles with bug prediction
Testers spend less than 100% of their time testing:
• E-mail, meetings, reviewing bugs and tests, etc.
• I plan six hours of testing in a nine-to-ten hour day (contractor)
• Four hours of testing in an eight hour day is common (employee)
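Building on the one-pass estimate above, a hedged sketch that folds in the points from this slide: several test cycles per release, confirmation-testing overhead, and only part of each day spent hands-on testing. The cycle count and overhead factor are assumptions, not rules.

# Hedged sketch: calendar time when tests are repeated across several cycles.
# All inputs below are illustrative assumptions.
hours_per_pass = 20.5          # from the one-pass estimate sketched earlier
planned_cycles = 3             # e.g. one pass per test release
confirmation_overhead = 0.25   # assumed 25% extra time re-testing bug fixes
testers = 3
test_hours_per_day = 4         # "four hours of testing in an eight hour day"

total_hours = hours_per_pass * planned_cycles * (1 + confirmation_overhead)
hours_per_day = testers * test_hours_per_day
print(f"About {total_hours / hours_per_day:.1f} working days of test execution")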
How Long to Get the Bugs Out?
Using historical data and a test cycle model, you can develop a simple model for bugs
Predicting the total number of bugs is hard:
• Subject to many factors and nonlinear effects
• Common techniques: bugs per developer-day, KSLOC, or function point (FP)
• Project size is usually the easiest factor to estimate
Bug injection rates, fix rates, and closure periods are beyond the test team’s control:
• You can predict, but document assumptions
• The more historical data you have for similar (size, staff, technology, etc.) projects, the more accurate and confident you can be
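A toy version of such a bug model is sketched below, assuming a predicted bug total, a weekly find rate, a weekly fix rate, and a find-to-close lag. Every parameter is an assumption to be replaced with your own historical data.

# Hedged sketch of a simple find/fix model: how many weeks until the predicted
# bugs are found and closed? All parameters are illustrative assumptions.
predicted_total_bugs = 120    # e.g. from KSLOC or bugs-per-developer-day history
find_rate_per_week = 25       # bugs the test team finds in a typical week
fix_rate_per_week = 20        # bugs development closes in a typical week
closure_lag_weeks = 1         # average find-to-close delay

found = fixed = week = 0
while fixed < predicted_total_bugs:
    week += 1
    found = min(predicted_total_bugs, found + find_rate_per_week)
    if week > closure_lag_weeks:               # fixes trail the finds
        fixed = min(found, fixed + fix_rate_per_week)

print(f"All {predicted_total_bugs} predicted bugs found and closed by week {week}")

Plotting the weekly found and fixed counts from a model like this is what produces the kind of chart described next.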
[Chart: output of a simple bug find/fix-rate model built from historical data.] This chart was created in about one hour using historical data from a couple of projects and some simplified models of bug find/fix rates. The more data you have, the more accurate your model can be, and the better you can gauge its accuracy.
Objective: Realism, Not Optimism
• Effort, dependencies, and resources accurately forecast
• An equal likelihood for each task to be early as to be late
• Best-case and worst-case scenarios known
• Plan allows for corrections to early estimates
• Risks identified and mitigated, especially…
  • Gaps in skills
  • Risky technology
  • Logistical issues and other tight critical paths
  • Excessive optimism

“[Optimism is the] false assumption that… all will go well, i.e., that each task will take only as long as it ‘ought’ to take.”
--Fred Brooks, The Mythical Man-Month, 1975
How About Those “Rules of Thumb”?
Various rules of thumb exist for effort, team size, etc., e.g., developer-to-tester ratios, Jones’ Estimating Software Costs
How accurate are these rules?
• Jones’ test development rule: ~1 hr/case
• Across four projects, I averaged 9 hrs./case: almost an order of magnitude difference
• Imagine if I’d used Jones’ rule on the IA project
• Even on my own projects, lots of variation
However, historical data from previous projects with similar teams, technologies, and techniques is useful for estimation
You can adjust data from dissimilar projects if you understand the influential factors
[Bar chart: person-hours per test case developed varied widely across four projects - roughly IA 19, IVR 8, Bank 4, Mkt 4. Factors included test precision, number of conditions per case, automation, custom tool development, test data needs, and testware re-use.]
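A small sketch of why the choice of rule matters for the test development estimate: compare Jones’ ~1 hr/case figure with a rate taken from your own history. The planned case count is invented, and the 9 hrs./case figure follows the slide; treat both as examples, not constants.

# Hedged sketch: test development effort under two different rules of thumb.
planned_test_cases = 300            # hypothetical project
jones_hours_per_case = 1            # industry rule of thumb (~1 hr/case)
historical_hours_per_case = 9       # average observed across the four projects above

for label, rate in [("Jones' rule", jones_hours_per_case),
                    ("own history", historical_hours_per_case)]:
    hours = planned_test_cases * rate
    print(f"{label:12s}: {hours:5d} hours (~{hours / 40:.1f} person-weeks)")

An order-of-magnitude spread like this is exactly why the next slide recommends cross-checking estimates against several sources.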
Refining the Estimate
For each key deliverable, look at size and complexity:
• Small, medium, large and simple, moderate, complex
• Do you have data from past projects that can help you confirm the effort and time to develop deliverables of similar size and complexity?
Iterate until it works:
• Must fit the overall project schedule…
• …but not be disconnected from reality
• Check “bottom-up” (work-breakdown-structure) estimates against “top-down” rules of thumb, industry benchmarks, and models relating metrics like FP or KSLOC to test size
• Estimates will change (by as much as 200%) as the project proceeds (knowledge is gained, requirements evolve)
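One way to make that cross-check concrete is to compare the bottom-up (WBS) total with a top-down figure derived from a size metric. The sketch below uses invented task hours and an assumed hours-per-KSLOC benchmark purely for illustration.

# Hedged sketch: cross-check a bottom-up WBS estimate against a top-down
# KSLOC-based rule. All task hours and ratios are hypothetical.
wbs_task_hours = {
    "test planning": 40, "test case development": 380,
    "test environment setup": 60, "test execution (3 cycles)": 520,
    "reporting and closure": 50,
}
bottom_up_hours = sum(wbs_task_hours.values())

project_ksloc = 25
test_hours_per_ksloc = 40      # assumed benchmark from past similar projects
top_down_hours = project_ksloc * test_hours_per_ksloc

ratio = bottom_up_hours / top_down_hours
print(f"Bottom-up {bottom_up_hours} h vs. top-down {top_down_hours} h "
      f"(ratio {ratio:.2f}) - investigate if the two differ widely")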
A Lightweight Test Plan Template
• Overview
• Bounds
  • Scope
  • Definitions
  • Setting
• Quality Risks
• Schedule of Milestones
• Transitions
  • Entry Criteria
  • Continuation Criteria
  • Exit Criteria
• Test Configurations and Environments
• Test Development
• Test Execution
  • Key Participants
  • Test Case and Bug Tracking
  • Bug Isolation and Classification
  • Release Management
  • Test Cycles
  • Test Hours
• Risks and Contingencies
• Change History
• Referenced Documents
• Frequently Asked Questions

You can find this template, a case study using it, and three other test plan templates at www.rexblackconsulting.com
How well did you test???
Linda Shafer Bio:
Linda Shafer has been working with the software industry since
1965, beginning with NASA in the early days of the space
program. Her experience includes roles of programmer,
designer, analyst, project leader, manager, and SQA/SQE. She
has worked for large and small companies, including IBM,
Control Data Corporation, Los Alamos National Laboratory,
Computer Task Group, Sterling Information Group, and
Motorola. She has also taught for and/or been in IT shops at
The University of Houston, The University of Texas at Austin,
The College of William and Mary, The Office of the Attorney
General (Texas) and Motorola University. Ms. Shafer's
publications include 25 refereed articles and three books. She
currently works for the Software Quality Institute and co-authored an SQI Software Engineering Series book published by
Prentice Hall in 2002: Quality Software Project Management. She is
on the International Press Committee of the IEEE and an
author in the Software Engineering Series books for IEEE. Her
MBA is from the University of New Mexico.
Don Shafer Bio:
Don Shafer is a co-founder, corporate director and Chief Technology Officer of
Athens Group, Inc. Incorporated in June 1998, Athens Group is an employee-owned consulting firm integrating technology strategy and software solutions.
Prior to Athens Group, Shafer led groups developing and marketing hardware
and software products for Motorola, AMD and Crystal Semiconductor. He was
responsible for managing a $129 million-a-year PC product group that
produced award-winning audio components. From the development of low-level software drivers in yet-to-be-released Microsoft operating systems to the
selection and monitoring of Taiwan semiconductor fabrication facilities, Shafer
has led key product and process efforts. In the past three years he has led
Athens engineers in developing industry standard semiconductor fab
equipment software interfaces, definition of 300mm equipment integration
tools, advanced process control state machine data collectors and embedded
system control software agents. His latest patents are on joint work done with
Agilent Technologies in state-based machine control. He earned a BS degree
from the USAF Academy and an MBA from the University of Denver. Shafer’s
work experience includes positions held at Boeing and Los Alamos National
Laboratories. He is currently an adjunct professor in graduate software
engineering at Southwest Texas State University. His faculty web site is
http://www.cs.swt.edu/~donshafer/. With two other colleagues in 2002, he
wrote Quality Software Project Management for Prentice Hall, now used in both
industry and academia. Currently he is working on an SCM book for the IEEE
Software Engineering Series.