Cost / Benefits Arguments
for Automation and Coverage
Jeff Offutt
Professor, Software Engineering
George Mason University
Fairfax, VA USA
www.cs.gmu.edu/~offutt/
offutt@gmu.edu
Who Am I
PhD Georgia Institute of Technology, 1988
Professor at George Mason University since 1992
– BS, MS, PhD in Software Engineering (also CS)
Lead the Software Engineering MS program
– Oldest and largest in USA
Editor-in-Chief of Wiley’s journal of Software Testing,
Verification and Reliability (STVR)
Co-Founder of IEEE International Conference on
Software Testing, Verification and Validation (ICST)
Co-Author of Introduction to Software Testing
(Cambridge University Press)
NoVa TAIG, August 2011
© Jeff Offutt
Software is a Skin that Surrounds
Our Civilization
Quote due to Dr. Mark Harman
Costly Software Failures
NIST report, “The Economic Impacts of Inadequate
Infrastructure for Software Testing” (2002)
– Inadequate software testing costs the US alone between $22 and
$59 billion annually
– Better approaches could cut this amount in half
Huge losses due to web application failures
– Financial services : $6.5 million per hour (just in USA!)
– Credit card sales applications : $2.4 million per hour (in USA)
In Dec 2006, amazon.com’s BOGO offer turned into a
double discount
2007 : Symantec says that most security vulnerabilities are
due to faulty software
World-wide monetary loss due to poor software is staggering
Types of Test Activities
Testing can be broken up into four general types of activities
1. Test Design
1.a) Criteria-based
1.b) Human-based
2. Test Automation
3. Test Execution
4. Test Evaluation
Each type of activity requires different skills, background
knowledge, education and training
No reasonable software development organization uses the same
people for requirements, design, implementation, integration and
configuration control
Why do test organizations still use the same people
for all four test activities??
This clearly wastes resources
1. Test Design – (a) Criteria-Based
Design test values to satisfy coverage criteria
or other engineering goal
This is the most technical job in software testing
Requires knowledge of :
– Discrete math
– Programming
– Testing
Requires much of a traditional CS degree
This is intellectually stimulating, rewarding, and challenging
Test design is analogous to software architecture on the development
side
Using people who are not qualified to design tests is a sure way to
get ineffective tests
1. Test Design – (b) Human-Based
Design test values based on domain knowledge of
the program and human knowledge of testing
This is much harder than it may seem to developers
Criteria-based approaches can be blind to special situations
Requires knowledge of :
– Domain, testing, and user interfaces
Requires almost no traditional CS
– A background in the domain of the software is essential
– An empirical background is very helpful (biology, psychology, …)
– A logic background is very helpful (law, philosophy, math, …)
This is intellectually stimulating, rewarding, and challenging
– But not to typical CS majors – they want to solve problems and build things
Model-Driven Test Design – Steps
[Figure: the MDTD process. At the design abstraction level: software artifact, model / structure, test requirements, refined requirements / test specs. At the implementation abstraction level: input values, test cases (with prefix, postfix, and expected values), test scripts, test results, pass / fail. Arrow labels recoverable from the slide: analysis, domain analysis, criterion, refine, generate, automate, execute, evaluate.]
MDTD – Activities
Raising our abstraction level makes test design MUCH easier
[Figure: the MDTD process diagram (software artifact, model / structure, test requirements, refined requirements / test specs, input values, test cases, test scripts, test results, pass / fail), split into design and implementation abstraction levels and grouped into the activities Test Design, Test Execution, and Test Evaluation.]
Example Coverage Criteria
Statement coverage … more generally known as
node coverage on graphs
Branch coverage … more generally known as edge
coverage on graphs
Prime path coverage (graphs)
Predicate coverage (logic)
Modified condition / decision coverage (MCDC) …
also known as correlated active clause coverage
Input space partitioning
Mutation analysis coverage
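As a rough illustration of the first two criteria on this slide, the sketch below measures node and edge coverage for a set of test paths over a small graph. The graph, the test paths, and the function names are hypothetical and are not taken from the talk.

```python
# Hypothetical control flow graph for an "if" with no "else":
# node -> list of successor nodes.
GRAPH = {
    1: [2, 3],
    2: [3],
    3: [],
}


def node_coverage(graph, test_paths):
    """Fraction of graph nodes visited by at least one test path."""
    visited = {node for path in test_paths for node in path}
    return len(visited & set(graph)) / len(graph)


def edge_coverage(graph, test_paths):
    """Fraction of graph edges traversed by at least one test path."""
    edges = {(src, dst) for src, succs in graph.items() for dst in succs}
    traversed = {(a, b) for path in test_paths for a, b in zip(path, path[1:])}
    return len(traversed & edges) / len(edges)
```

With the single test path [1, 2, 3], every node is visited but the edge (1, 3) is not, so node coverage is complete while edge coverage is two thirds. Adding the path [1, 3] brings edge coverage to 100%.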
Test Coverage Criteria
Test coverage criteria use classic engineering
abstraction
– Civil engineers use algebra and calculus to model parts
of the real world
– Then solve problems with those models
– Instead of algebra and calculus, we use discrete math …
logic, graphs, grammar, sets
Why are test criteria growing in use now ?
– We need to use test automation before using criteria
– Tool support is essential
– Testers need to have more knowledge than in the past
Example Success Stories
These slides introduce some specific examples of
how some of these ideas are being used in
companies
Some companies are mentioned by name
– Some names cannot be mentioned
I discuss some general process notes
Then discuss examples of some of the specific
criteria being used
Google
Programmers spend up to half of their time testing
– Unit testing is measured as part of programmer productivity
– Programmers must solve all problems found in system testing, immediately
– If quality is bad, system testers refuse to help
Products are shipped daily
– Release and iterate cycle
– Focus on fast fixing instead of prevention
All tests are fully automated
Teams choose their own test criteria, but teams must use criteria
They have saved tens of millions of dollars
– Automation
– Developer responsibility
– Immediate feedback
Source – Patrick Copeland, Keynote Address, Intl Conf on
Software Testing, Verification and Validation (ICST 2010)
Amazon
All tests are automated and documented
Developers are educated in testing
Developers are measured by their unit tests’ quality
– Developers are rewarded for finding unit faults
– Developers are measured by the number of faults found
during system testing that trace back to them
They have lots of internal-use tools for automation
and measuring criteria
Source – visit to the company
Microsoft
Software Development Engineer in Test (SDET)
– Developers who specialize in testing (not SMEs)
Goal is to automate all tests
They use Input Space Partitioning for many of their
tests
Many groups use graph-based criteria (branch or
node coverage)
Source – How We Test Software at Microsoft,
by Page, Johnston, and Rollison
Major US Government Contractor
Last year a manager started applying these ideas in her
project
– Focused on unit / developer testing
– Held monthly reviews of documentation quality, code structure,
and unit tests
– Required use of test automation tools
– Required use of a simple graph criterion (all branches)
Established a test design expert and a test automation
expert
She received a commendation for saving tens of thousands
of dollars in a few months
– Is now teaching her approach to other managers on the project
Source – personal contact
Graph Criteria
Web software company (in Northern VA)
– Applying graph criteria to develop tests for new web applications
– Automation with HttpUnit
– Reduced deployment errors by 50%, reduced cost by 25%
– Updating automated tests is a lot of work
Government contractor of security assessment tools
– Applying graph criteria to test their threat assessment engines
– Automation with JUnit and internal automation framework
– Cut time to deploy new products by 20%, reduced development
cost by 15%
Sources – consulting / part-time student employee
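Applying a graph criterion starts by turning the criterion into concrete test requirements. The sketch below does this for prime path coverage, one of the graph criteria listed earlier in the talk: it enumerates the simple paths of a graph and keeps those that are not subpaths of any other simple path. The graph and all function names are hypothetical, and the brute-force enumeration is only meant for small graphs.

```python
# Hypothetical graph with a loop between nodes 2 and 3.
GRAPH = {1: [2], 2: [3, 4], 3: [2], 4: []}


def simple_paths(graph):
    """All simple paths: no node repeats, except first may equal last."""
    paths = []
    frontier = [[node] for node in graph]
    while frontier:
        path = frontier.pop()
        paths.append(path)
        if len(path) > 1 and path[0] == path[-1]:
            continue  # a completed cycle cannot be extended
        for nxt in graph[path[-1]]:
            if nxt not in path or nxt == path[0]:
                frontier.append(path + [nxt])
    return paths


def is_subpath(short, long):
    """True if `short` occurs as a contiguous run inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def prime_paths(graph):
    """Simple paths that are not a proper subpath of another simple path."""
    paths = simple_paths(graph)
    return sorted(
        p for p in paths
        if not any(p != q and is_subpath(p, q) for q in paths)
    )
```

For this graph, prime_paths(GRAPH) yields five test requirements: [1, 2, 3], [1, 2, 4], [2, 3, 2], [3, 2, 3], and [3, 2, 4]. Each one must be toured by at least one test path.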
Logic Criteria
Company that builds embedded, safety-critical, real-time
software for trains
– Applied CACC to post-deployment communication software
– Found over a dozen faults, 3 safety-critical, 2 real-time
– Fixed all problems before the software failed in the system
– Logic testing is now mandated on all safety-critical software
Aerospace company that manufactures planes
– Applied CACC to flight guidance software (embedded, real-time,
safety-critical)
– Found numerous problems
– Automation estimated to have saved 30% of testing cost
Sources – Student industry project / consulting
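To make CACC (correlated active clause coverage) concrete, the sketch below finds, for one clause of a predicate, every pair of tests that satisfies the criterion: in both tests the clause determines the predicate's value, the clause is true in one test and false in the other, and the predicate evaluates differently across the pair. The predicate and all names are hypothetical and are not from the projects described above.

```python
from itertools import product

NAMES = ["a", "b", "c"]


def predicate(a, b, c):
    """Hypothetical predicate under test."""
    return a and (b or c)


def determines(pred, clause, row):
    """True if flipping `clause` in `row` changes the predicate's value."""
    flipped = dict(row, **{clause: not row[clause]})
    return pred(**row) != pred(**flipped)


def cacc_pairs(pred, names, clause):
    """All (clause true, clause false) test pairs satisfying CACC."""
    rows = [dict(zip(names, vals))
            for vals in product([True, False], repeat=len(names))]
    active = [r for r in rows if determines(pred, clause, r)]
    true_rows = [r for r in active if r[clause]]
    false_rows = [r for r in active if not r[clause]]
    # The minor clauses may differ between the two tests, as long as the
    # predicate itself takes both values.
    return [(t, f) for t in true_rows for f in false_rows
            if pred(**t) != pred(**f)]
```

For clause b there is exactly one qualifying pair (a true, c false, with b true and then false), while clause a has nine, because any assignment that makes (b or c) true lets a determine the result.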
Input Space Partitioning
Freddie Mac (major financial services company)
– System testing on calculation engines
• Faults can cause millions of dollars in losses
– Test manager tested two similar products, one with their
traditional method and one using ISP
– Special purpose tools to support ISP
– ISP tests found 3.5 times as many faults, with half the effort
• ZERO defects reported in deployment (after 2 years)
– ISP is now being disseminated throughout the company
Dozens of companies in Northern Virginia have used ISP
over the past 15 years
– All saved money and found more faults
Sources – MS Thesis at GMU / part-time student employees
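A minimal sketch of input space partitioning, assuming a made-up input domain: each characteristic of the input is split into blocks, and tests are built by combining one block from each characteristic. The characteristics, block names, and functions below are hypothetical and do not describe the tools or products mentioned above.

```python
from itertools import product

# Hypothetical characteristics of a loan calculation input,
# each partitioned into blocks.
PARTITIONS = {
    "loan_type": ["fixed", "adjustable"],
    "term_years": [15, 30],
    "amount": ["zero", "typical", "over_limit"],
}


def all_combinations(partitions):
    """One test for every combination of blocks across characteristics."""
    names = list(partitions)
    return [dict(zip(names, combo)) for combo in product(*partitions.values())]


def each_choice(partitions):
    """Fewest tests such that every block appears in at least one test."""
    size = max(len(blocks) for blocks in partitions.values())
    return [
        {name: blocks[i % len(blocks)] for name, blocks in partitions.items()}
        for i in range(size)
    ]
```

Here all_combinations produces 12 tests and each_choice produces 3. The choice between them is a cost / benefit decision: more combinations cost more to run but exercise more interactions between characteristics.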
Mutation Testing
A major network router manufacturer
– One of my students applied mutation to an essential engine in a
router – embedded, real-time software
• Had already been in deployment for years
– Found 3 major problems, one of which had cost the company
over $70 million in downtime and lost revenue
– My student got a bonus of $800,000 (1999)
Telecommunications company
– Real-time, embedded software, plus web applications
– I helped apply mutation testing and graph criteria to 3 software
components – past testing, ready for deployment
– About 150 tests found over 50 separate issues – at 25% of the cost
of their usual system testing
Sources – student / consulting
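The idea behind mutation testing can be sketched in a few lines: make small changes (mutants) to the program, run the tests against each mutant, and count how many mutants produce output that differs from the original. The function, the hand-written mutants, and the names below are hypothetical; real mutation tools generate mutants automatically.

```python
def minimum(a, b):
    """Hypothetical program under test."""
    return a if a < b else b


# Hand-written mutants, each a small change to `minimum`.
MUTANTS = {
    "lt_to_gt": lambda a, b: a if a > b else b,  # relational operator replaced
    "return_a": lambda a, b: a,                  # always take the first branch
    "return_b": lambda a, b: b,                  # always take the second branch
}


def mutation_score(original, mutants, tests):
    """Return (fraction of mutants killed, names of surviving mutants).

    A mutant is killed when some test input makes its output differ
    from the original program's output.
    """
    killed = {
        name for name, mutant in mutants.items()
        if any(mutant(*t) != original(*t) for t in tests)
    }
    return len(killed) / len(mutants), sorted(set(mutants) - killed)
```

With only the test input (1, 2), the mutant return_a survives, which signals a gap in the tests. Adding (2, 1) kills it and raises the score to 1.0.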
Advantages of Criteria-Based
Test Design
Criteria maximize the “bang for the buck”
– Fewer tests that are more effective at finding faults
Comprehensive test set with minimal overlap
Traceability from software artifacts to tests
– The “why” for each test is answered
– Built-in support for regression testing
A “stopping rule” for testing—advance knowledge
of how many tests are needed
Natural to automate
Criteria-Based Testing Summary
• Many companies still use “monkey testing”
• A human sits at the keyboard, wiggles the mouse and
bangs the keyboard
• No automation
• Minimal training required
• Some companies automate human-designed tests
• Reduces execution cost
• Eases repeat testing
• But companies that use automation and criteria-based
test design
• Save money
• Find more faults
• Build better software
Contact
We are in the middle of a revolution in how software is
tested
Research is finally meeting practice
Jeff Offutt
offutt@gmu.edu
http://cs.gmu.edu/~offutt/