Automated Test Generation and Repair
Darko Marinov
Escuela de Verano de Ciencias Informáticas RÍO 2011
Rio Cuarto, Argentina
February 14-19, 2011

Why Testing?
• Goal: Increase software reliability
  – Software bugs cost the US economy $60B/year [NIST’02]
• Approach: Find bugs using testing
  – Estimated savings from better testing: $22B/year
• Challenge: Manual testing is problematic
  – Time-consuming, error-prone, expensive
• Research: Automate testing
  – Reduce cost, increase benefit

Topics to Cover
• Introduction: about bugs
• Randoop: random generation of OO tests
• Pex: dynamic symbolic generation of inputs
• UDITA: generation of complex data inputs
• ReAssert: repair of OO unit tests
• JPF: systematic testing of Java code

Introduction
• Why look for bugs?
• What are bugs?
• Where do they come from?
• How to find them?

Some Costly “Bugs”
• NASA Mars space missions
  – Priority inversion (2004)
  – Mismatched measurement units (1999)
• BMW airbag problems (1999)
  – Recall of >15,000 cars
• Ariane 5 crash (1996)
  – Uncaught exception on numerical overflow
  – Sample video
• Your own favorite example?

Some “Bugging” Bugs
• An example bug on my laptop
  – “Jumping” file after changing properties
    • Put a read-only file on the desktop
    • Change properties: rename it and make it not read-only
• Your own favorite example?
• What is important about software for you?
  – Correctness, performance, functionality

Terminology
• Anomaly
• Bug
• Crash
• Defect
• Error
• Failure, fault
• Glitch
• Hangup
• Incorrectness
• …

Dynamic vs. Static
• Incorrect (observed) behavior
  – Failure, fault
• Incorrect (unobserved) state
  – Error, latent error
• Incorrect lines of code
  – Fault, error

“Bugs” in IEEE 610.12-1990
• Fault
  – Incorrect lines of code
• Error
  – Faults cause incorrect (unobserved) state
• Failure
  – Errors cause incorrect (observed) behavior
• Not used consistently in the literature!
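To make the IEEE fault/error/failure terminology concrete, here is a minimal, hypothetical Java sketch (not from the slides): the fault is the incorrect line of code, the error is the resulting incorrect internal state, and the failure is the incorrect observable output.

class Sums {
    // Intended behavior: return the sum of the first n elements of a.
    static int sumOfFirst(int[] a, int n) {
        int sum = 0;
        // Fault: the loop bound should be i < n, not i < n - 1 (incorrect line of code).
        for (int i = 0; i < n - 1; i++) {
            sum += a[i];
        }
        // Error: at this point sum holds an incorrect, so far unobserved, value.
        return sum;
    }

    public static void main(String[] args) {
        // Failure: the incorrect state becomes observable output (prints 3 instead of 6).
        System.out.println(sumOfFirst(new int[] {1, 2, 3}, 3));
    }
}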
Correctness
• Common (partial) properties
  – Segfaults, uncaught exceptions
  – Resource leaks
  – Data races, deadlocks
  – Statistics-based
• Specific properties
  – Requirements
  – Specification

Traditional Waterfall Model (each phase paired with its checking activity)
• Requirements – Analysis
• Design – Checking
• Implementation – Unit testing
• Integration – System testing
• Maintenance – Verification

Phases (1)
• Requirements
  – Specify what the software should do
  – Analysis: eliminate/reduce ambiguities, inconsistencies, and incompleteness
• Design
  – Specify how the software should work
  – Split software into modules, write specifications
  – Checking: check conformance to requirements

Phases (2)
• Implementation
  – Specify how the modules work
  – Unit testing: test each module in isolation
• Integration
  – Specify how the modules interact
  – System testing: test module interactions
• Maintenance
  – Evolve software as requirements change
  – Verification: test changes, regression testing

Testing Effort
• Reported to be >50% of development cost [e.g., Beizer 1990]
• Microsoft: 75% of time spent on testing
  – 50% are testers, who spend all of their time testing
  – 50% are developers, who spend half of their time testing

When to Test
• The later a bug is found, the higher the cost
  – Orders-of-magnitude increase in later phases
  – Also a smaller chance of a proper fix
• Old saying: test early, test often
• New methodology: test-driven development (write tests before code)

Software is Complex
• Malleable
• Intangible
• Abstract
• Solves complex problems
• Interacts with other software and hardware
• Not continuous

Software Still Buggy
• Folklore: 1-10 (residual) bugs per 1000 non-blank, non-comment (NBNC) lines of code (after testing)
• Consensus: total correctness is impossible to achieve for (complex) software
  – Risk-driven finding/elimination of bugs
  – Focus on specific correctness properties

Approaches for Finding Bugs
• Software testing
• Model checking
• (Static) program analysis

Software Testing
• Dynamic approach
• Run code for some inputs, check outputs
• Checks correctness for some executions
• Main questions
  – Test-input generation
  – Test-suite adequacy
  – Test oracles

Other Testing Questions
• Maintenance
• Selection
• Minimization
• Prioritization
• Augmentation
• Evaluation
• Fault characterization
• …

Model Checking
• Typically a hybrid dynamic/static approach
• Checks correctness for “all” executions
• Some techniques
  – Explicit-state model checking
  – Symbolic model checking
  – Abstraction-based model checking

Static Analysis
• Static approach
• Checks correctness for “all” executions
• Some techniques
  – Abstract interpretation
  – Dataflow analysis
  – Verification-condition generation

Comparison
• Level of automation
  – Push-button vs. manual
• Type of bugs found
  – Hard vs. easy to reproduce
  – High vs. low probability
  – Common vs. specific properties
• Type of bugs (not) found

Soundness and Completeness
• Do we find all bugs?
  – Impossible for dynamic analysis
• Are reported bugs real bugs?
  – Easy for dynamic analysis
• Most practical techniques and tools are both unsound and incomplete!
  – False positives
  – False negatives
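A minimal, hypothetical Java/JUnit sketch of why testing is incomplete (a false negative): the code under test has a bug, but the suite below passes because no test exercises the one failing input. The class, method, and test names are made up for this illustration.

import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Hypothetical code under test: absolute value, buggy for Integer.MIN_VALUE.
class MathUtil {
    static int abs(int x) {
        return x < 0 ? -x : x; // -Integer.MIN_VALUE overflows back to a negative value
    }
}

public class MathUtilTest {
    @Test
    public void testTypicalInputs() {
        // These executions are checked and pass...
        assertEquals(5, MathUtil.abs(-5));
        assertEquals(7, MathUtil.abs(7));
        // ...but no test runs MathUtil.abs(Integer.MIN_VALUE), so the bug is
        // never reported: a false negative of this (incomplete) test suite.
    }
}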
Analysis for Performance
• Static compiler analysis, profiling
• Must be sound
  – Correctness of transformations: equivalence
• Improves execution time
• But programmer time is more important
• Programmer productivity
  – Not only finding bugs

Combining Dynamic and Static
• Dynamic and static analyses are equal in the limit
  – Dynamic: exhaustively try all possible inputs
  – Static: precisely model every possible state
• Synergistic opportunities
  – Static analysis can optimize dynamic analysis
  – Dynamic analysis can focus static analysis
  – More discussions than results so far

Current Status
• Testing remains the most widely used approach for finding bugs
• A lot of recent progress (within the last decade) on model checking and static analysis
  – Model checking: from hardware to software
  – Static analysis: from sound to practical
• Vibrant research in the area
• Gap between research and practice

Topics Related to Finding Bugs
• How to eliminate bugs?
  – Debugging
• How to prevent bugs?
  – Programming language design
  – Software development processes
• How to show absence of bugs?
  – Theorem proving
  – Model checking, program analysis

Our Focus: Testing
• More precisely, recent research on automated test generation and repair
  – More info in CS527 from Fall 2010
• Recommended general reading for research
  – How to Read an Engineering Research Paper by William G. Griswold
  – Writing Good Software Engineering Research Papers by Mary Shaw (ICSE 2003)
• If you have already read that paper, read one on another area

Writing Good SE Papers: Overview
• Motivation
  – Guidelines for writing papers for ICSE
• Approach
  – Analysis of papers submitted to ICSE 2002
  – Distribution across three dimensions
    • Question (problem)
    • Result (solution)
    • Validation (evaluation)
• Results
  – Writing matters; know your conferences!

Randoop
• Feedback-Directed Random Test Generation by Carlos Pacheco, Shuvendu K. Lahiri, Michael D. Ernst, and Thomas Ball (ICSE 2007)
  – (optional) Finding Errors in .NET with Feedback-Directed Random Testing by Carlos Pacheco, Shuvendu K. Lahiri, and Thomas Ball (ISSTA 2008)
  – Website: Randoop
• Slides courtesy of Carlos Pacheco

Randoop Paper Overview
• Problem (Question)
  – Generate unit tests (with high coverage?)
• Solution (Result)
  – Generate sequences of method calls
  – Random choice of methods and parameters
  – Publicly available tool for Java (Randoop)
• Evaluation (Validation)
  – Data structures (JPF is next lecture)
  – Checking API contracts
  – Regression testing (lecture next week)
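To illustrate the kind of unit test that feedback-directed random generation produces, here is a hand-written sketch in Randoop's style (illustrative, not actual Randoop output): a randomly chosen sequence of calls on a java.util class, with assertions that check recorded return values and general API contracts such as reflexivity of equals and consistency of hashCode.

import java.util.LinkedList;
import org.junit.Test;
import static org.junit.Assert.*;

public class RandomSequenceTest {
    // Sketch of one randomly generated call sequence; the generator builds such
    // sequences incrementally and uses execution feedback to discard sequences
    // that, e.g., throw exceptions or create redundant objects.
    @Test
    public void sequence1() {
        LinkedList<Integer> list = new LinkedList<>();
        list.add(-1);
        list.addFirst(100);
        int size = list.size();
        assertEquals(2, size);                          // regression-style oracle: recorded return value
        assertTrue(list.equals(list));                  // API contract: equals is reflexive
        assertEquals(list.hashCode(), list.hashCode()); // API contract: hashCode is consistent
    }
}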
Pex
• Pex – White Box Test Generation for .NET by Nikolai Tillmann and Jonathan de Halleux (TAP 2008)
  – (optional) Moles: Tool-Assisted Environment Isolation with Closures by Jonathan de Halleux and Nikolai Tillmann (TOOLS 2010)
  – Websites: Pex, TeachPex
• Slides courtesy of Tao Xie (and Nikolai Tillmann, Peli de Halleux, Wolfram Schulte)

Pex Paper Overview
• Problem (Question)
  – Generate unit tests (with high coverage)
• Solution (Result)
  – Describe test scenarios with parameterized unit tests (PUTs)
  – Dynamic symbolic execution
  – Tool for .NET (Pex)
• Evaluation (Validation)
  – Found some issues in a “core .NET component”

UDITA
• Test Generation through Programming in UDITA by Milos Gligoric, Tihomir Gvero, Vilas Jagannath, Sarfraz Khurshid, Viktor Kuncak, and Darko Marinov (ICSE 2010)
  – (optional) Automated Testing of Refactoring Engines by Brett Daniel, Danny Dig, Kely Garcia, and Darko Marinov (ESEC/FSE 2007)
  – Websites: UDITA, ASTGen
• Slides partially prepared by Milos Gligoric

UDITA Paper Overview
• Problem (Question)
  – Generate complex test inputs
• Solution (Result)
  – Combines the filtering approach (check validity) and the generating approach (valid by construction)
  – Java-based language with non-determinism
  – Tool for Java (UDITA)
• Evaluation (Validation)
  – Found bugs in Eclipse, NetBeans, javac, JPF...

ReAssert
• ReAssert: Suggesting Repairs for Broken Unit Tests by Brett Daniel, Vilas Jagannath, Danny Dig, and Darko Marinov (ASE 2009)
  – (optional) On Test Repair Using Symbolic Execution by Brett Daniel, Tihomir Gvero, and Darko Marinov (ISSTA 2010)
  – Website: ReAssert
• Slides courtesy of Brett Daniel

ReAssert Paper Overview
• Problem (Question)
  – When code evolves, passing tests may fail
  – How to repair tests that should be updated?
• Solution (Result)
  – Find small changes that make the tests pass
  – Ask the user to confirm proposed changes
  – Tool for Java/Eclipse (ReAssert)
• Evaluation (Validation)
  – Case studies, user study, open-source evolution

Java PathFinder (JPF)
• Model Checking Programs by W. Visser, K. Havelund, G. Brat, S. Park, and F. Lerda (J-ASE, vol. 10, no. 2, April 2003)
  – Note: this is a journal paper, so feel free to skip/skim some sections (3.2, 3.3, 4)
  – Website: JPF
• Slides courtesy of Peter Mehlitz and Willem Visser

JPF Paper Overview
• Problem
  – Model checking of real code
    • Terminology: systematic testing, state-space exploration
• Solution
  – Specialized Java Virtual Machine
    • Supports backtracking and state comparison
    • Many optimizations to make it scale
  – Publicly available tool (Java PathFinder)
• Evaluation/applications
  – Remote Agent Spacecraft Controller
  – DEOS Avionics Operating System
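As a rough illustration of what systematic exploration of real Java code buys, consider this small, hypothetical two-thread example (not from the JPF paper): the final assertion fails only under particular interleavings of the unsynchronized updates, which random runs rarely hit but which a backtracking exploration of all thread schedules, as JPF performs, will reach.

public class InterleavingExample {
    static int counter = 0; // shared state, updated without synchronization

    public static void main(String[] args) throws InterruptedException {
        Runnable increment = () -> {
            int tmp = counter; // read
            counter = tmp + 1; // write: another thread may have updated counter in between
        };
        Thread t1 = new Thread(increment);
        Thread t2 = new Thread(increment);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Fails (run with java -ea) only when the two read-modify-write sequences
        // interleave; a systematic explorer of thread schedules finds such a schedule.
        assert counter == 2 : "lost update: counter = " + counter;
    }
}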