Pex: White Box Test Generation for .NET
Nikolai Tillmann, Microsoft Research
SMT 2008

A unit test is a small program with assertions.

    void AddTest()
    {
        HashSet<int> set = new HashSet<int>();
        set.Add(7);
        set.Add(3);
        Assert.IsTrue(set.Count == 2);
    }

Many developers write such unit tests by hand. This involves
• determining a meaningful sequence of method calls,
• selecting exemplary argument values (the test inputs),
• stating assertions.

Parameterized Unit Testing

    void AddSpec(int x, int y)
    {
        HashSet<int> set = new HashSet<int>();
        set.Add(x);
        set.Add(y);
        Assert.AreEqual(x == y, set.Count == 1);
        Assert.AreEqual(x != y, set.Count == 2);
    }

Parameterized unit tests are algebraic specifications!

Parameterized Unit Testing bridges the gap between
• Unit Testing, and
• the Design-By-Contract paradigm.

Parameterized Unit Tests separate two concerns:
1) the specification of externally visible behavior (assertions),
2) the selection of internally relevant test inputs (coverage).

• Test input generator
  - Pex starts from parameterized unit tests.
  - Generated tests are emitted as traditional unit tests.
• Dynamic symbolic execution framework
  - Symbolic execution based on monitoring and re-execution.
  - Whole-program, white-box code analysis.
  - Works at the level of the .NET instructions (bytecode).
  - Supports "Java-like" programs as well as "unsafe" code.
• SMT solver
  - Z3 determines satisfying assignments for constraint systems representing execution paths.

How to test this code? (Real code from .NET base class libraries.)
Main challenge: making sure it does not crash, by writing many tests that cover the code.

Possible test case, written by hand

Test input, generated by Pex

The exploration runs in a loop:
1) Initially, choose arbitrary test inputs.
2) Run the test and monitor the execution path.
3) Record the path condition and add the path to the known paths.
4) Choose an uncovered path.
5) Solve the constraint system; the solution yields new test inputs, and the loop continues with step 2.
Result: small test suite, high code coverage.
Finds only real bugs; no false warnings.

On the example above:
• Arbitrary initial inputs: a[0] = 0; a[1] = 0; a[2] = 0; a[3] = 0; …
• Recorded path condition: … ⋀ magicNum != 0x95673948
• Constraint system for an uncovered path: … ⋀ magicNum == 0x95673948
• Test inputs found by solving it: a[0] = 206; a[1] = 202; a[2] = 239; a[3] = 190; …

Results in Visual Studio. Report: coverage, path conditions.

    class Point
    {
        int X;
        int Y;

        public static int GetX(Point p)
        {
            if (p != null) return p.X;
            else return -1;
        }
    }

Instrumented code for GetX (the real C# compiler output is actually more complicated):

        ldtoken Point::GetX                   // Prologue
        call __Monitor::EnterMethod
        brfalse L0                            // Record concrete values, to have all information
        ldarg.0                               // when this method is called with no proper context
        call __Monitor::NextArgument<Point>
    L0: .try {
          .try {
            call __Monitor::LDARG_0           // Calls will perform symbolic computation
            ldarg.0
            call __Monitor::LDNULL
            ldnull
            call __Monitor::CEQ
            ceq
            call __Monitor::BRTRUE            // Calls to build path condition
            brtrue L1
            call __Monitor::BranchFallthrough
            call __Monitor::LDARG_0
            ldarg.0
            ldtoken Point::X
            call __Monitor::LDFLD_REFERENCE
            ldfld Point::X
            call __Monitor::AtDereferenceFallthrough
            br L2
    L1:     call __Monitor::AtBranchTarget    // Calls to build path condition
            call __Monitor::LDC_I4_M1
            ldc.i4.m1
    L2:     call __Monitor::RET
            stloc.0
            leave L4
          } catch NullReferenceException {
            call __Monitor::AtNullReferenceException
            rethrow
          }
    L4:   leave L5
        } finally {
          call __Monitor::LeaveMethod         // Epilogue
          endfinally
        }
    L5: ldloc.0
        ret

Program states are represented by terms, similar to the representation of verification conditions in ESC/Java, Spec#, … There are terms for:
• primitive types (integers, floats, …): constants, unary and binary expressions;
• 'struct' types: tuples;
• instance fields of classes: mappings of references to values;
• elements of arrays, memory accesses through pointers: mappings of integers to values;
• …

Goal: efficient representation of evolving program states.
• Reduction of ground terms to constants.
• Sharing of syntactically equal sub-terms.
• BDDs over if-then-else terms to represent logical operations.
• Tries/Patricia trees to represent associative-commutative-with-unit operators.
• Normal form of polynomials.
• Update trees.
• Other simplification rules, e.g.
      ∀x. ceq(vtable(x, m1), m2) => ceq(objecttype(x), t)
  where m2 overrides m1, and t is the sealed declaring type of m2.

Problem: reachable code is not known initially.
• There are no loop invariants; loops must be unfolded.
• Without guidance, symbolic execution may get stuck unfolding the same loop forever.
Solution: search strategies outside of the SMT solver choose the "next branch to flip".
• Fair choice between different strategies.
• Individual strategies based on program structure, including:
  - fair choice of branch instructions,
  - fair choice of branch instructions + stack contexts,
  - fair choice of branch coverage.

Independent constraint optimization + constraint caching (similar to EXE)
Idea: related execution paths give rise to "similar" constraint systems.
Example: consider x>y ⋀ z>0 vs. x>y ⋀ z<=0.
If we already have a cached solution for a "similar" constraint system, we can reuse it:
• x=1, y=0, z=1 is a solution for x>y ⋀ z>0;
• we obtain a solution for x>y ⋀ z<=0 by reusing the old solution of x>y (x=1, y=0) and combining it with a solution of z<=0 (z=0).

The SMT solver Z3 offers:
• decision procedures for uninterpreted functions with equalities, linear integer arithmetic, bitvector arithmetic, arrays, tuples;
• support for universal quantifiers, used to model custom theories, e.g. the .NET type system;
• model generation: models are used as test inputs;
• incremental solving: push/pop of contexts for model minimization;
• a programmatic API: for small constraint systems, passing text through pipes would add huge overhead.

Problem: Pex can collect constraints over private fields, and the constraint solver determines an assignment for those private fields. How can the object be brought into the desired state?
Private fields cannot be initialized freely, but only through the constructor and other methods.
Approach taken by Pex:
• automatic selection of constructor and state-modifying methods based on static code analysis;
• exploration of the constructor and methods to find non-exceptional paths.

    public static class PexAssume
    {
        public static void IsTrue(bool c)
        {
            if (!c) throw new AssumptionViolationException();
        }
    }

    public static class PexAssert
    {
        public static void IsTrue(bool c)
        {
            if (!c) throw new AssertionViolationException();
        }
    }

Assumptions and assertions are explored just like all other branches. Executions that cause assumption violations are ignored: they are reported neither as errors nor as test cases.

    AppendFormat(null, "{0} {1}!", "Hello", "World");   // "Hello World!"

.NET implementation:

    public StringBuilder AppendFormat(
        IFormatProvider provider, char[] chars, params object[] args)
    {
        if (chars == null || args == null)
            throw new ArgumentNullException(…);
        int pos = 0;
        int len = chars.Length;
        char ch = '\x0';
        ICustomFormatter cf = null;
        if (provider != null)
            cf = (ICustomFormatter)provider.GetFormat(typeof(ICustomFormatter));
        …

Introduce a mock class which implements the interface. Write assertions over the expected inputs, and provide concrete outputs.

    public class MFormatProvider : IFormatProvider
    {
        public object GetFormat(Type formatType)
        {
            Assert.IsTrue(formatType != null);
            return new MCustomFormatter();
        }
    }

Problems:
• It is costly to write detailed behavior by example.
• How many mock objects, and which ones, do we need to write?

Introduce a mock class which implements the interface. Let an oracle provide the behavior of the mock methods.

    public class MFormatProvider : IFormatProvider
    {
        public object GetFormat(Type formatType)
        {
            …
            object o = call.ChooseResult<object>();
            return o;
        }
    }

Result: relevant result values can be generated by a white-box test input generation tool, just as other test inputs can be generated!
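The way assumption and assertion violations are treated can be illustrated with a small harness. This is a minimal sketch, not Pex itself. `Assume`, `Check` and `MiniRunner` are hypothetical names. The candidate inputs are supplied by hand, whereas Pex would generate them by exploration. An input that violates an assumption is silently dropped. An assertion violation, or any other exception, is recorded as a failing input.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins mirroring the PexAssume/PexAssert bodies above.
public class AssumptionViolationException : Exception { }
public class AssertionViolationException : Exception { }

public static class Assume
{
    public static void IsTrue(bool c) { if (!c) throw new AssumptionViolationException(); }
}

public static class Check
{
    public static void IsTrue(bool c) { if (!c) throw new AssertionViolationException(); }
}

public static class MiniRunner
{
    // Runs a parameterized test on each candidate input and classifies the outcome.
    public static (List<int> Passed, List<int> Failed) Run(Action<int> test, IEnumerable<int> inputs)
    {
        var passed = new List<int>();
        var failed = new List<int>();
        foreach (int x in inputs)
        {
            try
            {
                test(x);
                passed.Add(x);
            }
            catch (AssumptionViolationException)
            {
                // Assumption violated: neither a test case nor an error, so the input is dropped.
            }
            catch (Exception)
            {
                // Assertion violations and any other exception become failing test cases.
                failed.Add(x);
            }
        }
        return (passed, failed);
    }
}
```

As an example, consider the test `x => { Assume.IsTrue(x != 0); Check.IsTrue(10 / x != 5); }` run on the inputs -1, 0, 1, 2, 3. The input 0 is dropped because it violates the assumption, so the division by zero never happens. The only failure reported is 2.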
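The oracle-based mock can be approximated in plain C# by letting a queue of pre-chosen values play the role of the oracle. This sketch rests on several assumptions. `QueueOracle`, its `ChooseResult<T>` method and `BracketFormatter` are hypothetical. In Pex the choices would come from the exploration rather than from a fixed list. `IFormatProvider` and `ICustomFormatter` are the real .NET interfaces.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical oracle: replays result values chosen by a test-input generator, in order.
public sealed class QueueOracle
{
    private readonly Queue<object> _choices;

    public QueueOracle(IEnumerable<object> choices) => _choices = new Queue<object>(choices);

    // Returns the next chosen value, or default(T) once the choices run out.
    public T ChooseResult<T>() => _choices.Count > 0 ? (T)_choices.Dequeue() : default;
}

// Mock whose behavior is supplied entirely by the oracle.
public sealed class MFormatProvider : IFormatProvider
{
    private readonly QueueOracle _oracle;

    public MFormatProvider(QueueOracle oracle) => _oracle = oracle;

    public object GetFormat(Type formatType) => _oracle.ChooseResult<object>();
}

// A small ICustomFormatter that the oracle might hand back.
public sealed class BracketFormatter : ICustomFormatter
{
    public string Format(string format, object arg, IFormatProvider formatProvider) => "<" + arg + ">";
}
```

Suppose the oracle is seeded with a `BracketFormatter` followed by a plain string. The same mock then exhibits two different behaviors. The second one matters for `AppendFormat` above. Its cast `(ICustomFormatter)provider.GetFormat(...)` throws an `InvalidCastException` for a result that does not implement `ICustomFormatter`. That is exactly the kind of result value a test generator can choose.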
We applied Pex to a core .NET component:
• already extensively tested for several years;
• assertions written by developers;
• >10,000 public methods;
• >100,000 basic blocks.
Sandbox: restriction of access to external resources (files, registry, unsafe code, …).
10 machines (P4, 2 GHz, 2 GB RAM) running for 3 days.
Exploration started from simple, generated parameterized unit tests (one per public method); assertions were embedded in the code.

Coverage achieved: 43% block coverage, 36% arc coverage.
Errors found:
• a significant number of benign errors, e.g. NullReferenceException, IndexOutOfRangeException, …;
• 17 unique errors involving violations of developer-written assertions, exhaustion of memory, and other serious issues.

Automatically achieved coverage on selected classes of the core .NET component:

Class                           Blocks   Blocks hit   Arcs   Arcs hit
A (mostly stateless methods)    >300     95%          >400   90%
B (mostly stateless methods)    >100     97%          >200   94%
C (stateful)                    >200     76%          >300   65%
D (parsing code)                >500     81%          >800   73%
E (numerical algorithm)         >400     71%          >600   67%
F (numerical algorithm)         >100     82%          >200   79%
G (numerical algorithm)         >100     98%          >100   97%
H (numerical algorithm)         >200     71%          >200   61%
I (numerical algorithm)         >200     97%          >300   96%

Assumption: the environment is deterministic.
• "Environment" includes all code that is not monitored, e.g. native code and uninstrumented code.
• Pex prunes non-deterministic behavior.
Assumption: the program is single-threaded.
• Potential solution: control and explore thread scheduling like all other test inputs.
Limitations of the constraint solver:
• Z3 has no built-in theories for floating point arithmetic; it is approximated with rationals (linear arithmetic only).
• There are bounds on Z3's time and memory consumption.

Goal: test-input generation for programs with contracts (preconditions, postconditions, invariants, etc.)
In the Verisoft project, a compiler generates Boogie or MSIL programs from C code annotated with contracts.
• MSIL programs embed most contracts in executable form.
• Pex turns these contracts into constraints by performing a path-sensitive analysis.
Challenge: non-executable contracts.
• Quantifiers may range over "all integers" or "all pointers".
• Predicates for memory safety do not translate directly into machine-observable behavior.

Better scalability:
• more sophisticated search frontiers (e.g. based on a fitness function that determines the distance to a target state);
• summarizing execution paths instead of exploring them (TACAS'08);
• inference of likely invariants/contracts (DySy, ICSE'08).
Dealing with multi-threaded programs:
• controlling the scheduler;
• systematically exploring all relevant thread interleavings;
• race detection.
Tom Ball et al. are building such analyses on the Pex framework (ManagedChess).

Program model checkers: JPF, Kiasan/KUnit (Java), XRT (.NET).
Combining random testing and constraint solving: DART (C), CUTE (C), EXE (C), jCUTE (Java), SAGE (x86), …

Parameterized Unit Tests separate two concerns:
• specification of externally visible behavior;
• selection of test inputs to cover internal behavior.
Pex automates test input generation:
• it uses the SMT solver Z3;
• it is a dynamic symbolic execution platform for .NET;
• it is used internally at Microsoft to test core .NET components.
Pex is publicly available for academic use.
http://research.microsoft.com/Pex

Most interesting programs are beyond the scope of static symbolic execution:
• calls to the external world
• unmanaged x86 code
• unsafe managed .NET code (with pointers)
• safe managed .NET code
Dynamic symbolic execution systematically explores the conditions in the code that the constraint solver understands, and happily ignores everything else, e.g.
• calls to native code,
• difficult constraints (e.g. the precise semantics of floating point arithmetic).
Result: an under-approximation, which is appropriate for testing.

When generating test inputs for any method, e.g.

    DateTime ParseDateTime(string s) { … }

a regression test suite can be generated, where each test asserts the observed behavior:

    void ParseDateTimeTest132()
    {
        DateTime result = ParseDateTime("6/19/2008");
        Assert.IsTrue(result.ToString() == "06/19/2008");
    }

XRT: Exploring Runtime, an interpreter for .NET programs.
• Static symbolic execution.
• Used Simplify to determine the unsatisfiability of path constraints.
Successful for self-contained programs: used today on a large scale within Microsoft for quality assurance, as the core of the model-based testing tool "Spec Explorer 2007".
Does not work well for real-world programs:
• all environment behavior must be modeled;
• modeling the entire environment is often not feasible.
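XRT's static approach contrasts with the dynamic symbolic execution loop described earlier: choose arbitrary inputs, run and monitor, record the path condition, choose an uncovered path, solve. That loop can be sketched as a toy C# program. Everything here is hypothetical. `Tracer.If` plays the role of the `__Monitor` callbacks. Branch conditions are kept as C# predicates rather than symbolic terms. A brute-force search over a small integer box stands in for Z3. The search simply flips each recorded branch in turn, which is much simpler than Pex's fair search strategies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One observed branch decision: site id, its condition over the inputs, and the outcome.
public sealed class Branch
{
    public Branch(int site, Func<int[], bool> condition, bool taken)
    {
        Site = site; Condition = condition; Taken = taken;
    }
    public int Site { get; }
    public Func<int[], bool> Condition { get; }
    public bool Taken { get; }
}

// Hypothetical stand-in for the __Monitor callbacks: evaluates a branch condition
// concretely and appends it, with its outcome, to the recorded path condition.
public sealed class Tracer
{
    public List<Branch> Path { get; } = new List<Branch>();

    public bool If(int site, int[] input, Func<int[], bool> condition)
    {
        bool taken = condition(input);
        Path.Add(new Branch(site, condition, taken));
        return taken;
    }
}

public static class MiniDse
{
    // Hypothetical program under test with inputs x = a[0], y = a[1].
    public static string Target(int[] a, Tracer t)
    {
        if (t.If(0, a, v => v[0] > v[1]))
        {
            if (t.If(1, a, v => v[0] - v[1] == 7)) return "deep";
            return "x>y";
        }
        return "x<=y";
    }

    // Key for the first n decisions of a path, e.g. "0:True;1:False;".
    static string Key(List<Branch> path, int n) =>
        string.Concat(path.Take(n).Select(b => $"{b.Site}:{b.Taken};"));

    // Brute-force search over [-bound, bound]^2, standing in for an SMT solver such as Z3.
    static int[] Solve(List<(Func<int[], bool> Cond, bool Want)> constraints, int bound)
    {
        for (int x = -bound; x <= bound; x++)
            for (int y = -bound; y <= bound; y++)
            {
                int[] v = { x, y };
                if (constraints.All(c => c.Cond(v) == c.Want)) return v;
            }
        return null; // no solution inside the box
    }

    public static List<(int[] Input, string Result)> Explore(int bound = 10, int maxRuns = 100)
    {
        var tests = new List<(int[] Input, string Result)>();
        var known = new HashSet<string>();   // path prefixes already executed or scheduled
        var pending = new Queue<int[]>();
        pending.Enqueue(new[] { 0, 0 });     // initially, choose arbitrary test inputs

        while (pending.Count > 0 && tests.Count < maxRuns)
        {
            int[] input = pending.Dequeue();
            var tracer = new Tracer();
            string result = Target(input, tracer);   // run the test and monitor it
            tests.Add((input, result));

            List<Branch> path = tracer.Path;         // the recorded path condition
            for (int n = 1; n <= path.Count; n++) known.Add(Key(path, n));

            // Choose uncovered paths: keep decisions 0..i-1 and negate decision i.
            for (int i = 0; i < path.Count; i++)
            {
                string target = Key(path, i) + $"{path[i].Site}:{!path[i].Taken};";
                if (!known.Add(target)) continue;    // already covered or scheduled

                var constraints = new List<(Func<int[], bool> Cond, bool Want)>();
                for (int j = 0; j < i; j++) constraints.Add((path[j].Condition, path[j].Taken));
                constraints.Add((path[i].Condition, !path[i].Taken));

                int[] next = Solve(constraints, bound);  // solve the constraint system
                if (next != null) pending.Enqueue(next);
            }
        }
        return tests;
    }
}
```

Starting from the arbitrary input (0, 0), `MiniDse.Explore()` yields three tests, one per path of `Target`:
- (0, 0) gives "x<=y";
- (-9, -10) gives "x>y";
- (-3, -10) gives "deep".

The last input is found only because the solver is asked for x > y ⋀ x - y == 7. This plays the same role as the magicNum == 0x95673948 step in the walkthrough.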
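The regression-test idea described above, where each generated test asserts the observed behavior, can be sketched as a small code emitter. `RegressionTestWriter.Emit` is a hypothetical helper. It runs a call, records the result's `ToString()`, and returns an MSTest-style test as C# source text. `ToString()` on values such as `DateTime` depends on the current culture, so a real generator would need to pin the culture or compare values structurally.

```csharp
using System;
using System.Text;

public static class RegressionTestWriter
{
    // Runs `call`, captures its observed result, and returns C# source for a
    // test that asserts the same result on future runs.
    public static string Emit<T>(string testName, string callText, Func<T> call)
    {
        object value = call();

        var sb = new StringBuilder();
        sb.AppendLine($"void {testName}() {{");
        sb.AppendLine($"    var result = {callText};");
        if (value == null)
            sb.AppendLine("    Assert.IsNull(result);");
        else
            sb.AppendLine($"    Assert.AreEqual({Quote(value.ToString())}, result.ToString());");
        sb.AppendLine("}");
        return sb.ToString();
    }

    // Renders a string as a C# literal; only backslashes and double quotes are escaped.
    static string Quote(string s) =>
        "\"" + s.Replace("\\", "\\\\").Replace("\"", "\\\"") + "\"";
}
```

For example, `Emit("MaxTest1", "Math.Max(2, 3)", () => Math.Max(2, 3))` returns the source of a test asserting that the result prints as "3".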