Cost / Benefits Arguments for Automation and Coverage

Jeff Offutt
Professor, Software Engineering
George Mason University
Fairfax, VA USA
www.cs.gmu.edu/~offutt/
offutt@gmu.edu

Who Am I

PhD Georgia Institute of Technology, 1988
Professor at George Mason University since 1992
– BS, MS, PhD in Software Engineering (also CS)
Lead the Software Engineering MS program
– Oldest and largest in USA
Editor-in-Chief of Wiley’s journal of Software Testing, Verification and Reliability (STVR)
Co-Founder of IEEE International Conference on Software Testing, Verification and Validation (ICST)
Co-Author of Introduction to Software Testing (Cambridge University Press)

NoVa TAIG, August 2011 © Jeff Offutt

Software is a Skin that Surrounds Our Civilization

Quote due to Dr. Mark Harman

Costly Software Failures

NIST report, “The Economic Impacts of Inadequate Infrastructure for Software Testing” (2002)
– Inadequate software testing costs the US alone between $22 and $59 billion annually
– Better approaches could cut this amount in half
Huge losses due to web application failures
– Financial services: $6.5 million per hour (just in USA!)
– Credit card sales applications: $2.4 million per hour (in USA)
In Dec 2006, amazon.com’s BOGO offer turned into a double discount
2007: Symantec says that most security vulnerabilities are due to faulty software
World-wide monetary loss due to poor software is staggering

Types of Test Activities

Testing can be broken up into four general types of activities
1. Test Design
   1.a) Criteria-based
   1.b) Human-based
2. Test Automation
3. Test Execution
4. Test Evaluation
Each type of activity requires different skills, background knowledge, education and training
No reasonable software development organization uses the same people for requirements, design, implementation, integration and configuration control
Why do test organizations still use the same people for all four test activities?
This clearly wastes resources

1. Test Design – (a) Criteria-Based

Design test values to satisfy coverage criteria or other engineering goal
This is the most technical job in software testing
Requires knowledge of:
– Discrete math
– Programming
– Testing
Requires much of a traditional CS degree
This is intellectually stimulating, rewarding, and challenging
Test design is analogous to software architecture on the development side
Using people who are not qualified to design tests is a sure way to get ineffective tests

1. Test Design – (b) Human-Based

Design test values based on domain knowledge of the program and human knowledge of testing
This is much harder than it may seem to developers
Criteria-based approaches can be blind to special situations
Requires knowledge of:
– Domain, testing, and user interfaces
Requires almost no traditional CS
– A background in the domain of the software is essential
– An empirical background is very helpful (biology, psychology, …)
– A logic background is very helpful (law, philosophy, math, …)
This is intellectually stimulating, rewarding, and challenging
– But not to typical CS majors; they want to solve problems and build things

Model-Driven Test Design – Steps

[Figure: model-driven test design flow.
 DESIGN ABSTRACTION LEVEL: software artifact -> model / structure -> test requirements -> refined requirements / test specs
 IMPLEMENTATION ABSTRACTION LEVEL: input values -> test cases -> test scripts -> test results -> pass / fail
 Arrow labels: analysis, domain analysis, criterion, refine, generate, prefix / postfix / expected, automate, execute, evaluate]

MDTD – Activities

[Figure: the same flow across the DESIGN and IMPLEMENTATION abstraction levels, with regions labeled Test Design, Test Execution, and Test Evaluation]
Raising our abstraction level makes test design MUCH easier

Example Coverage Criteria

Statement coverage … more generally known as node coverage on graphs
Branch coverage … more generally known as edge coverage on graphs
Prime path coverage (graphs)
Predicate coverage (logic)
Modified condition / decision coverage (MCDC) … also known as correlated active clause coverage
Input space partitioning
Mutation analysis coverage

Test Coverage Criteria

Test coverage criteria use classic engineering abstraction
– Civil engineers use algebra and calculus to model parts of the real world
– Then solve problems with those models
– Instead of algebra and calculus, we use discrete math … logic, graphs, grammar, sets
Why are test criteria growing in use now?
– We need to use test automation before using criteria
– Tool support is essential
– Testers need to have more knowledge than in the past

Example Success Stories

These slides introduce some specific examples of how some of these ideas are being used in companies
Some companies are mentioned by name
– Some names cannot be mentioned
I discuss some general process notes
Then discuss examples of some of the specific criteria being used

Google

Programmers spend up to half of their time testing
– Unit testing is measured as part of programmer productivity
– Programmers must solve all problems found in system testing, immediately
– If quality is bad, system testers refuse to help
Products are shipped daily
– Release and iterate cycle
– Focus on fast fixing instead of prevention
All tests are fully automated
Teams choose their own test criteria, but teams must use criteria
They have saved tens of millions of dollars
– Automation
– Developer responsibility
– Immediate feedback
Source – Patrick Copeland, Keynote Address, Intl Conf on Software Testing, Verification and Validation (ICST 2010)

Amazon

All tests are automated and documented
Developers are educated in testing
Developers are measured by their unit tests’ quality
– Developers are rewarded for finding unit faults
– Developers are measured by the number of faults found during system testing that trace back to them
They have lots of internal-use tools for automation and measuring criteria
Source – visit to the company

Microsoft

Software Development Engineer in Test (SDET)
– Developers who specialize in testing (not SMEs)
Goal is to automate all tests
They use Input Space Partitioning for many of their tests
Many groups use graph-based criteria (branch or node coverage)
Source – How We Test Software at Microsoft, by Page, Johnston, and Rollison

Major US Government Contractor

Last year a manager started applying these ideas in her project
– Focused on unit / developer testing
– Held monthly reviews of documentation quality, code structure, and unit tests
– Required use of test automation tools
– Required use of a simple graph criterion (all branches)
Established a test design expert and a test automation expert
She received a commendation for saving tens of thousands of dollars in a few months
– Is now teaching her approach to other managers on the project
Source – personal contact

Graph Criteria

Web software company (in Northern VA)
– Applying graph criteria to develop tests for new web applications
– Automation with httpunit
– Reduced deployment errors by 50%, reduced cost by 25%
– Updating automated tests is a lot of work
Government contractor of security assessment tools
– Applying graph criteria to test their threat assessment engines
– Automation with JUnit and internal automation framework
– Cut time to deploy new products by 20%, reduced development cost by 15%
Sources – consulting / part-time student employee

Logic Criteria

Company that builds embedded, safety-critical, real-time software for trains
– Applied CACC to post-deployment communication software
– Found over a dozen faults, 3 safety-critical, 2 real-time
– Fixed all problems before the software failed in the system
– Logic testing is now mandated on all safety-critical software
Aerospace company that manufactures planes
– Applied CACC to flight guidance software (embedded, real-time, safety-critical)
– Found numerous problems
– Automation estimated to have saved 30% of testing cost
Sources – student industry project / consulting

Input Space Partitioning

Freddie Mac (major financial service company)
– System testing on calculation engines
  • Faults can cause millions of dollars in losses
– Test manager tested two similar products, one with their traditional method and one using ISP
– Special purpose tools to support ISP
– ISP tests found 3.5 times as many faults, with half the effort
  • ZERO defects reported in deployment (after 2 years)
– ISP is now being disseminated throughout the company
Dozens of companies in Northern Virginia have used ISP over the past 15 years
– All saved money and found more faults
Sources – MS Thesis at GMU / part-time student employees

Mutation Testing

A major network router manufacturer
– One of my students applied mutation to an essential engine in a router (embedded, real-time software)
  • Already been in deployment for years
– Found 3 major problems, one of which had cost the company over $70 million in downtime and lost revenue
– My student got a bonus of $800,000 (1999)
Telecommunications company
– Real-time, embedded software, plus web applications
– I helped apply mutation testing and graph criteria to 3 software components that were past testing and ready for deployment
– About 150 tests found over 50 separate issues, at 25% the cost of their usual system testing
Sources – student / consulting

Advantages of Criteria-Based Test Design

Criteria maximize the “bang for the buck”
– Fewer tests that are more effective at finding faults
Comprehensive test set with minimal overlap
Traceability from software artifacts to tests
– The “why” for each test is answered
– Built-in support for regression testing
A “stopping rule” for testing: advance knowledge of how many tests are needed
Natural to automate

Criteria-Based Testing Summary

Many companies still use “monkey testing”
– A human sits at the keyboard, wiggles the mouse and bangs the keyboard
– No automation
– Minimal training required
Some companies automate human-designed tests
– Reduces execution cost
– Eases repeat testing
But companies that use automation and criteria-based test design
– Save money
– Find more faults
– Build better software

Contact

We are in the middle of a revolution in how software is tested
Research is finally meeting practice
Jeff Offutt
offutt@gmu.edu
http://cs.gmu.edu/~offutt/
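
Supplementary Example: Node Coverage vs. Edge Coverage

The Example Coverage Criteria slide describes statement coverage as node coverage and branch coverage as edge coverage on graphs. The sketch below is not from the talk. It is a minimal Python illustration of why the two criteria differ: one test path can visit every node and still miss an edge. The three-node control-flow graph and the helper names (all_edges, node_coverage, edge_coverage) are assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, List[int]]   # node -> list of successor nodes
Path = List[int]               # one test path through the graph

# Hypothetical control-flow graph for:  if (a) { s1 }  s2
# Node 1 is the decision, node 2 is s1, node 3 is s2.
CFG: Graph = {1: [2, 3], 2: [3], 3: []}


def all_edges(graph: Graph) -> Set[Tuple[int, int]]:
    """Return every (source, destination) edge in the graph."""
    return {(src, dst) for src, dsts in graph.items() for dst in dsts}


def node_coverage(graph: Graph, paths: List[Path]) -> float:
    """Fraction of graph nodes visited by at least one test path."""
    visited = {node for path in paths for node in path}
    return len(visited & set(graph)) / len(graph)


def edge_coverage(graph: Graph, paths: List[Path]) -> float:
    """Fraction of graph edges toured by at least one test path."""
    edges = all_edges(graph)
    toured = {(a, b) for path in paths for a, b in zip(path, path[1:])}
    return len(toured & edges) / len(edges)


if __name__ == "__main__":
    one_test = [[1, 2, 3]]            # takes the "true" branch only
    two_tests = [[1, 2, 3], [1, 3]]   # adds the "false" branch
    print(node_coverage(CFG, one_test), edge_coverage(CFG, one_test))
    print(node_coverage(CFG, two_tests), edge_coverage(CFG, two_tests))
```

The single path [1, 2, 3] visits all three nodes but tours only two of the three edges. Adding the path [1, 3] covers the remaining edge (1, 3), which corresponds to the branch where the condition is false.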