Model Checking Java Programs using Structural Heuristics
Alex Groce, Carnegie Mellon University
Willem Visser, NASA Ames Research Center

Model Checking
• Explores graph of reachable system states
  – Checking for local assertions, invariants and general temporal (logic) properties
• Symbolic model checking
• Explicit-state model checking

Java PathFinder
[Diagram labels: Java Code, JAVAC, Bytecode, Model Checker (Special JVM), JVM]

Java Code:
    void add(Object o) {
        buffer[head] = o;
        head = (head+1)%size;
    }
    Object take() {
        …
        tail=(tail+1)%size;
        return buffer[tail];
    }

Bytecode:
    0:  iconst_0
    1:  istore_2
    2:  goto #39
    5:  getstatic
    8:  aload_0
    9:  iload_2
    10: aaload

Depth-first Search
    push initial state on Stack
    while (Stack not empty)
        s = top(Stack)
        if s has no more successors
            pop the Stack
        else
            s' = next successor of s
            if s' not already visited
                mark s' visited
                if s' is a goal state then terminate
                push s' on Stack

Problems with DFS
• Produces lengthy counterexamples
• If state-space is too large to fully explore
  – May expend all resources on a single path when shallow counterexamples exist
  – Failed runs give little information because states explored may be very "similar"

Directed Model Checking
• Model checking as a search in a state space
• Why not use heuristics to guide the search?
  – Need to know what we're looking for
• Can we find good heuristics for model checking?
• Bug-finding rather than verification

Best-first Search
    priority queue Q = {initial state}
    while (Q not empty)
        s = state in Q with lowest f
        remove s from Q
        for each successor state s' of s
            if s' not already visited
                mark s' visited
                if s' is a goal state then terminate
                f = h(s')
                store (s', f) in Q

Two Kinds of Heuristics
• Property-specific heuristics
  – Directed at a specific error
    • Number of unblocked threads as a measure of distance to deadlock
    • Static analysis for distance to an assertion check
  – Focus of most previous work in field

Two Kinds of Heuristics
• Structural heuristics
  – Designed to explore the structure of a program in a systematic fashion
  – But what do we mean by structure?

Structural Heuristics
• One obvious kind of structure in a program:
  – Control flow
    • Reachable control flow rather than just CFG
    • Motivation for branch coverage metrics used in software testing

Branch Coverage
• Instrument model checker to calculate branch coverage
• Using a simple coverage measure as a heuristic doesn't work well
  – Easily falls into local minima (once any branches are taken, every state on that path has "better" coverage)
  – Doesn't distinguish between branches explored once and branches explored many times

The Branch Counting Heuristic
• Count the number of times each branch has been taken
• Heuristic value is then:
  – Branches never before taken get lowest value
  – Non-branching transitions are next lowest
  – Otherwise, score is equal to the count (lower values are explored first)

Three Searches
[Figure panels: DFS, Branch Counting, CFG]
Each CFG state is a basic block that increments some variable x.
[Figure: animation frames of "Three Searches" comparing DFS, Branch Counting, and BFS on the example CFG, which contains an ERROR node. One intermediate frame carries the truncated note "Heuristic avoids taking". Annotations on the final frame: "Expands 15 states", "Terminates only with depth limit", "Expands 25 states".]

Experimental Results
• DEOS real-time operating system example
• This version uses an integer valued counter, without abstraction

Results for DEOS

    Search Strategy    States    Time (s)  Memory  Length  Max Depth
    Branch-count       2,701     60        91MB    136     139
    %-coverage         20,215    FAIL      FAIL    FAIL    334
    Random heuristic   8,057     162       240MB   334     360
    BFS                18,054    FAIL      FAIL    FAIL    135
    DFS                14,678    FAIL      FAIL    FAIL    14,678
    DFS depth 500      392,470   6,782     383MB   455     500
    DFS depth 1000     146,949   2,222     196MB   987     1,000
    DFS depth 4000     8,481     171       270MB   3,997   4,000

All experiments performed on a 1.4GHz Athlon, limiting Java heap size to 512MB; all times are in seconds.

The Interleaving Heuristic
• An important (and very hard to find) class of errors in Java is concurrency errors
• What kind of structure could we explore to catch these?
  – Thread-interdependency

The Interleaving Heuristic
• Not clear how to heuristically define actual thread-interdependence
• So we use an approximation:
  – Executions in which context is switched more often are given better heuristic values
  – Explores executions unlikely to appear in testing (JVM/JITs schedule quite differently)

The Interleaving Heuristic
• Keep track on each path of which threads are executed at each transition
• Give lower (better) heuristic score to paths in which the most recently executed thread has been run less frequently
• Slightly more complicated in practice, counting live threads

Limiting the Queue
• With heuristics we are more interested in finding bugs than in verification
• So, we apply a technique from heuristic search literature:
  – Limit the size of the priority queue!
• When queue has more than k states in it, remove all but k states with best heuristic values

Experimental Results
• Dining Philosophers
  – Comparison to other results:
    • Godefroid and Khurshid in TACAS ’02 paper apply genetic algorithms to dining philosophers
      – Best result reported is 17 philosophers, 177 seconds, 50% success rate (on a slower machine)
    • HSF-SPIN
      – Not clear how to compare (times not given)
      – Best result they show is 16 philosophers, and SPIN (using partial order reduction) itself fails with 14 philosophers

Experimental Results

    Search Strategy          Threads  States     Time    Memory  Length  Max Depth
    Random heuristic         8        218,500    FAIL    FAIL    FAIL    86
    BFS                      8        436,068    FAIL    FAIL    FAIL    13
    DFS                      8        398,906    FAIL    FAIL    FAIL    384,286
    DFS depth 500            8        1,354,747  FAIL    FAIL    FAIL    500
    DFS depth 1000           8        1,345,289  FAIL    FAIL    FAIL    1,000
    DFS depth 4000           8        1,348,398  FAIL    FAIL    FAIL    4,000
    Interleaving             8        487,942    FAIL    FAIL    FAIL    16
    Most-blocked             8        310,317    FAIL    FAIL    FAIL    285
    Interleaving (k = 1000)  8        354,552    60      137MB   67      67
    Most-blocked (k = 5)     8        891,177    17,259  378MB   78,353  78,353
    Most-blocked (k = 160)   8        25,023     10      12MB    172     172
    Most-blocked (k = 1000)  8        123,640    46      59MB    254     278
    Interleaving (k = 40)    16       69,987     16      45MB    131     131
    Interleaving (k = 160)   16       290,637    60      207MB   131     132
    Most-blocked (k = 40)    16       101,576    38      69MB    1,008   1,008
    Interleaving (k = 5)     64       101,196    59      206MB   514     514

One Last Heuristic
• The choose-free heuristic:
  – Works only for abstracted Java programs
  – Rewards transitions that do not involve nondeterminism introduced by the abstraction
  – Prefers counterexamples that do not result from loss of precision introduced by the abstraction
• Structure of abstraction, not program

Previous Work
• Edelkamp, Lafuente, and Leue
  – HSF-SPIN: SPIN + heuristic search framework
• Bloem, Ravi, and Somenzi
  – Symbolic Guided Search: BDDs + heuristics
  – With BDDs heuristics can aid verification
• Cobleigh, Clarke, and Osterweil
  – FLAVERS verification work

Conclusions
• Structural heuristics: a useful class of
heuristics
  – When model checking is used for debugging, we may not know what kinds of bugs we are hunting
• Property-specific heuristics are also useful; the approach is complementary, not a replacement
  – Most-blocked can perform as well as or better than interleaving in the Remote Agent example, depending on the k limit and search method

Future Work
• Experiment with other, larger examples
• Static analysis for property-specific heuristics
• Language for properties/search/heuristics
• Discover how heuristics work when symbolic execution is introduced into JPF
• Counterexample analysis for "bug causality"
• What other kinds of structure can be exploited with heuristics?
  – Counting occurrences of data values, perhaps
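
The best-first search pseudocode, the queue limit k, and the ordering used by the branch counting heuristic can be sketched together in Java. This is a minimal illustration under stated assumptions, not Java PathFinder code: the class `BoundedBestFirst`, the methods `search` and `branchCountScore`, the integer state encoding, and the toy binary-tree state space are all hypothetical. The slides fix only the ordering of branch-counting scores; the concrete values 0, 1, and count + 1 are an assumption chosen to keep that ordering strict.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.function.IntFunction;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

// Hypothetical illustration; none of these names come from Java PathFinder.
class BoundedBestFirst {

    // Queue entries are {state, f}; the entry with the lowest f is expanded first.
    private static PriorityQueue<int[]> newQueue() {
        return new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[1]));
    }

    /**
     * Best-first search following the slide pseudocode, plus the queue limit.
     * k <= 0 means "no limit". Returns the number of states expanded when a
     * goal state is generated, or -1 if the queue empties first.
     */
    static int search(int initial, IntFunction<int[]> successors,
                      IntPredicate isGoal, IntUnaryOperator h, int k) {
        PriorityQueue<int[]> queue = newQueue();
        Set<Integer> visited = new HashSet<>();
        queue.add(new int[] {initial, h.applyAsInt(initial)});
        visited.add(initial);
        int expanded = 0;
        while (!queue.isEmpty()) {
            int s = queue.poll()[0];                  // state with lowest f
            expanded++;
            for (int next : successors.apply(s)) {
                if (!visited.add(next)) continue;     // already visited
                if (isGoal.test(next)) return expanded;
                queue.add(new int[] {next, h.applyAsInt(next)});
            }
            if (k > 0 && queue.size() > k) {
                // Queue limit: keep only the k entries with the best (lowest) f.
                PriorityQueue<int[]> kept = newQueue();
                for (int i = 0; i < k; i++) kept.add(queue.poll());
                queue = kept;
            }
        }
        return -1;
    }

    /**
     * Branch-counting score, lower is explored first. A null branch marks a
     * non-branching transition. Assumed values: 0 for a branch never taken,
     * 1 for a non-branching transition, count + 1 otherwise.
     */
    static int branchCountScore(Map<String, Integer> timesTaken, String branch) {
        if (branch == null) return 1;
        int count = timesTaken.getOrDefault(branch, 0);
        return count == 0 ? 0 : count + 1;
    }

    public static void main(String[] args) {
        // Toy state space: a complete binary tree on states 0..14, goal = 14.
        IntFunction<int[]> tree =
            s -> s < 7 ? new int[] {2 * s + 1, 2 * s + 2} : new int[0];
        IntPredicate goal = s -> s == 14;
        System.out.println(search(0, tree, goal, s -> -s, 0)); // prints 3
        System.out.println(search(0, tree, goal, s -> s, 0));  // prints 7
        System.out.println(search(0, tree, goal, s -> s, 1));  // prints -1
    }
}
```

In the last call the heuristic points away from the goal and k = 1 discards the only path that reaches it, so the search gives up. That is the trade accepted on the "Limiting the Queue" slide: the limited queue is aimed at finding bugs, not at verification.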