A Performance Study of BDD-Based Model Checking

Bwolen Yang, Randal E. Bryant, David R. O'Hallaron, Armin Biere, Olivier Coudert, Geert Janssen, Rajeev K. Ranjan, Fabio Somenzi

Motivation for Studying Model Checking (MC)
- MC is an important part of the formal verification of digital circuits and other finite state systems
- BDDs are an enabling technology for MC
- Not well studied: packages are tuned using combinational circuits (CC)
- Qualitative differences between CC and MC computations
  » CC: build outputs; constant-time equivalence checking
  » MC: build the model; many fixed-points to verify the specs
  » CC: BDD algorithms are polynomial
  » MC: key BDD algorithms are exponential

Outline
- BDD overview
- Organization of this study: participants, benchmarks, evaluation process
- Experimental results: performance improvements; characterizations of MC computations
- BDD evaluation methodology: evaluation platform (various BDD packages, real workload), metrics

BDD Overview
- BDD: a DAG representation for Boolean functions, with a fixed order on the Boolean variables
- Set representation
  » represent a set as a Boolean function
  » an element's value is true <==> it is in the set
- Transition relation representation
  » a set of pairs (current-state to next-state transitions)
  » each state variable is split into two copies: current state and next state

BDD Overview (Cont'd)
- BDD algorithms: dynamic programming
  » sub-problems (operations): recursively apply Shannon decomposition
  » memoization: the computed cache (see the sketch below)
- Garbage collection: recycle unreachable (dead) nodes
- Dynamic variable reordering
  » BDD graph size depends on the variable order
  » sifting based: nodes in adjacent levels are swapped
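The computed-cache mechanism above is compact enough to sketch. The following minimal Python illustration is not code from any of the packages in this study; the names mk, bdd_and, unique, and computed_cache are invented. It builds the AND of two BDDs by Shannon decomposition, with a unique table for node sharing and a computed cache for memoizing sub-problems:

```python
# Terminals are the Python booleans False/True; internal nodes are tuples
# (var_index, low_child, high_child), with variable indices increasing
# downward to respect the fixed variable order.

unique = {}          # unique table: one shared node per (var, lo, hi)
computed_cache = {}  # the computed cache: memoizes solved sub-problems

def mk(var, lo, hi):
    """Return the canonical node for (var, lo, hi), skipping redundant tests."""
    if lo is hi:                      # both branches identical: no node needed
        return lo
    key = (var, id(lo), id(hi))
    if key not in unique:
        unique[key] = (var, lo, hi)
    return unique[key]

def bdd_and(f, g):
    """AND of two BDDs via Shannon decomposition with memoization."""
    if f is False or g is False:      # terminal cases
        return False
    if f is True:
        return g
    if g is True:
        return f
    key = (id(f), id(g))
    if key in computed_cache:         # repeated sub-problem: cache hit
        return computed_cache[key]
    var = min(f[0], g[0])             # topmost variable in the fixed order
    f0, f1 = (f[1], f[2]) if f[0] == var else (f, f)
    g0, g1 = (g[1], g[2]) if g[0] == var else (g, g)
    result = mk(var, bdd_and(f0, g0), bdd_and(f1, g1))
    computed_cache[key] = result
    return result

# Example: x0 AND x1 over the order x0 < x1.
x0 = mk(0, False, True)
x1 = mk(1, False, True)
conjunction = bdd_and(x0, x1)         # (0, False, (1, False, True))
```

Every hit in computed_cache is a repeated sub-problem; the Phase 1 experiments that follow measure how often the MC traces repeat sub-problems and how large the cache must be to catch them.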
Organization of this Study: Participants
- Armin Biere: ABCD (Carnegie Mellon / Universität Karlsruhe)
- Olivier Coudert: TiGeR (Synopsys / Monterey Design Systems)
- Geert Janssen: EHV (Eindhoven University of Technology)
- Rajeev K. Ranjan: CAL (Synopsys)
- Fabio Somenzi: CUDD (University of Colorado)
- Bwolen Yang: PBF (Carnegie Mellon)

Organization of this Study: Setup
- Metrics: 17 statistics
- Benchmark: 16 SMV execution traces
  » traces of BDD calls from verification of cache coherence, Tomasulo, phone, reactor, TCAS, ...
  » sizes: 6 million - 10 billion sub-operations
  » 1 - 600 MB of memory
- Evaluation platform: a trace driver "drives" each BDD package based on an execution trace

Organization of this Study: Evaluation Process
- Phase 1: no dynamic variable reordering
- Phase 2: with dynamic variable reordering
- Iterative process: collect stats -> identify issues -> hypothesize -> suggest improvements -> design experiments -> validate

Phase 1 Results: Initial / Final
[Scatter plot: initial time (sec) vs. final time (sec), log scales from 1e+1 to 1e+6, with 1x / 10x / 100x diagonals and n/a rows for unfinished cases]
- speedups > 100: 6 cases; 10 - 100: 16; 5 - 10: 11; 2 - 5: 28
- Conclusion: collaborative efforts have led to significant performance improvements

Phase 1: Hypotheses / Experiments
- Computed cache
  » effects of computed cache size
  » amount of repeated sub-problems across time
- Garbage collection: reachable / unreachable nodes
- Complement edge representation: work, space
- Memory locality for breadth-first algorithms

Phase 1: Hypotheses / Experiments (Cont'd)
- For comparison: ISCAS85 combinational circuits (> 5 sec, < 1 GB)
  » c2670, c3540
  » 13-bit and 14-bit multipliers based on c6288
- Metrics depend only on the trace and the BDD algorithms
  » machine-independent
  » implementation-independent

Computed Cache: Repeated Sub-problems Across Time
- Source of speedup: increasing the computed cache size
- Possible cause: many repeated sub-problems are far apart in time
- Validation: study the number of repeated sub-problems across user-issued operations (top-level operations)

Hypothesis: Top-Level Sharing
- Hypothesis: MC computations have a large number of repeated sub-problems across the top-level operations
- Experiment
  » measure the minimum number of operations, with GC disabled and a complete cache
  » compare this with the same setup, but with the cache flushed between top-level operations

Results on Top-Level Sharing
[Bar chart: # of ops (flush / min), log scale from 1 to 100, for the model checking traces vs. ISCAS85]
- flush: cache flushed between top-level operations
- Conclusion: a large cache is more important for MC

Garbage Collection: Rebirth Rate
- Source of speedup: reducing GC frequency
- Possible cause: many dead nodes become reachable again (rebirth)
  » GC is delayed until the number of dead nodes reaches a threshold
  » dead nodes are reborn when they are part of the result of new sub-problems

Hypothesis: Rebirth Rate
- Hypothesis: MC computations have a very high rebirth rate
- Experiment: measure the number of deaths and the number of rebirths

Results on Rebirth Rate
[Bar chart: rebirths / deaths, from 0 to 1, for the model checking traces vs. ISCAS85]
- Conclusions
  » delay garbage collection
  » triggering GC should not be based only on the number of dead nodes
  » delay updating reference counts (see the sketch below)
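To make the death / rebirth bookkeeping concrete, here is a toy sketch. It assumes a reference-counted node table that only marks nodes dead and reclaims them lazily; the class, the threshold, and the counters are invented for illustration and are not taken from any of the packages in the study:

```python
class NodeTable:
    """Toy illustration of deferred GC with death/rebirth accounting."""

    GC_THRESHOLD = 1000  # assumed trigger; real packages tune this value

    def __init__(self):
        self.refcount = {}   # node id -> reference count
        self.dead = set()    # ids with refcount 0, not yet reclaimed
        self.deaths = 0
        self.rebirths = 0

    def ref(self, node_id):
        if node_id in self.dead:         # rebirth: node became reachable again
            self.dead.remove(node_id)
            self.rebirths += 1
        self.refcount[node_id] = self.refcount.get(node_id, 0) + 1

    def deref(self, node_id):
        self.refcount[node_id] -= 1
        if self.refcount[node_id] == 0:
            self.dead.add(node_id)       # death: mark, do not reclaim yet
            self.deaths += 1
            if len(self.dead) > self.GC_THRESHOLD:
                self.collect()

    def collect(self):
        # A real package would also purge computed-cache entries that
        # mention the reclaimed nodes.
        for node_id in self.dead:
            del self.refcount[node_id]
        self.dead.clear()
```

With a high rebirths / deaths ratio, eager collection would keep reclaiming nodes that are about to be resurrected, which is why the conclusions above recommend delaying both GC and the reference-count updates that trigger it.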
BF BDD Construction
- On the MC traces, breadth-first (BF) based BDD construction has no demonstrated advantage over traditional depth-first based techniques
- Two packages (CAL and PBF) are BF based

BF BDD Construction Overview
- Level-by-level access
  » operations on the same level (variable) are processed together
  » one queue per level
- Locality: group nodes of the same level together in memory
- Good memory locality due to BF ==> the number of ops processed per queue visit must be high

Average BF Locality
[Bar chart: # of ops processed per queue visit, from 0 to 5000, for the model checking traces vs. ISCAS85]
- Conclusion: MC traces generally have less BF locality

Average BF Locality / Work
[Bar chart: avg. locality / # of ops (x 1e-6), from 0 to 250, for the model checking traces vs. ISCAS85]
- Conclusion: for comparable BF locality, MC computations do much more work

Phase 1: Some Issues / Open Questions
- Memory management
  » space-time tradeoff: computed cache size / GC frequency
  » resource awareness: available physical memory, memory limit, page fault rate
- Top-level sharing
  » possibly the main cause of the strong cache dependency and the high rebirth rate
  » better understanding may lead to better memory management
  » higher-level algorithms to exploit the pattern

Phase 2: Dynamic Variable Reordering
- BDD packages used: CAL, CUDD, EHV, TiGeR
- Improvements from Phase 1 incorporated

Why is Variable Reordering Hard to Study?
- Time-space tradeoff: how much time to spend on reducing graph sizes
- Chaotic behavior: e.g., small changes to the triggering / termination criteria can have a significant performance impact
- Resource intensive: reordering is expensive, and the space of possible orderings is combinatorial
- Different variable order ==> different computation: e.g., many "don't-care space" optimization algorithms

Phase 2: Experiments
- Quality of the variable order generated
- Variable grouping heuristic: keep strongly related variables adjacent
- Reordering the transition relation: BDDs for the transition relation are used repeatedly
- Effects of the initial variable order: with and without variable reordering
- Only CUDD is used

Variable Grouping Heuristic: Group Current / Next Variables
- Current / next state variables: for the transition relation, each state variable is split into two copies, current state and next state
- Hypothesis: grouping the corresponding current- and next-state variables is a good heuristic (see the sketch below)
- Experiment: with and without grouping, measure
  » work (# of operations)
  » space (max # of live BDD nodes)
  » reorder cost (# of nodes swapped with their children)

Results on Grouping Current / Next Variables
[Three bar charts over the MC traces: work (# of ops), space (max live nodes), and reorder cost (nodes swapped), all normalized against no variable grouping]
- Conclusion: grouping is generally effective
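As an illustration of the grouping heuristic (the slides imply the layout but give no code; the variable names here are invented), grouping interleaves each current-state variable with its next-state copy so the pair stays adjacent in the order:

```python
def interleaved_order(n_state_vars):
    """Variable order [s0, s0', s1, s1', ...]: each current-state
    variable sits directly next to its next-state copy."""
    order = []
    for i in range(n_state_vars):
        order.append(f"s{i}")    # current-state copy
        order.append(f"s{i}'")   # next-state copy
    return order

print(interleaved_order(3))      # ['s0', "s0'", 's1', "s1'", 's2', "s2'"]
```

A package such as CUDD can additionally pin each pair as a fixed group (via its variable-group tree), so that sifting moves each (current, next) pair as a unit rather than separating it.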
Effects of Initial Variable Order: Experimental Setup
- For each trace, find a good variable ordering O
- Perturb O to generate new variable orderings, varying
  » the fraction of variables perturbed
  » the distance moved
- Measure the effects of these new orderings, with and without dynamic variable reordering

Effects of Initial Variable Order: Perturbation Algorithm
- Perturbation parameters (p, d)
  » p: probability that a variable will be perturbed
  » d: perturbation distance
- Properties
  » on average, a fraction p of the variables is perturbed
  » the maximum distance moved is 2d
  » (p = 1, d = infinity) ==> a completely random variable order
- For each perturbation level (p, d), generate a number (the sample size) of variable orders (one possible procedure is sketched below)

Effects of Initial Variable Order: Parameters
- Parameter values
  » p: 0.1, 0.2, ..., 1.0
  » d: 10, 20, ..., 100, infinity
  » sample size: 10
- ==> for each trace, 1100 orderings and 2200 runs (with and without dynamic reordering)
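The slides state the properties of the perturbation but not the procedure itself, so the following swap-based sketch is one plausible reconstruction, not the authors' algorithm; it keeps each move local to within d positions, in line with the stated properties:

```python
import random

def perturb(order, p, d, rng=random):
    """Swap-based (p, d) perturbation: each position is selected with
    probability p and swapped with a position at most d away, so moves
    stay local; with p = 1 and d = float('inf') the result approaches a
    completely random order."""
    order = list(order)
    n = len(order)
    for i in range(n):
        if rng.random() < p:
            lo = int(max(0, i - d))          # d may be float('inf')
            hi = int(min(n - 1, i + d))
            j = rng.randint(lo, hi)
            order[i], order[j] = order[j], order[i]
    return order

# Example: perturb a 100-variable order at level (p = 0.3, d = 10).
new_order = perturb(range(100), p=0.3, d=10)
```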
Effects of Initial Variable Order: Smallest Test Case
- Base case (best ordering): time 13 sec, memory 127 MB
- Resource limits on the generated orders: time 128x the base case, memory 500 MB

Effects of Initial Variable Order: Result
[Histogram: # of unfinished cases, from 0 to 1200, vs. time limit (>1x, >2x, >4x, >8x, >16x, >32x, >64x, >128x), for "no reorder" and "reorder"]
- At the 128x / 500 MB limit, "no reorder" finished 33% of the cases; "reorder" finished 90%
- Conclusion: dynamic reordering is effective

Phase 2: Some Issues / Open Questions
- Computed cache: flushing cost
- Effects of initial variable order: determining the sample size
- Need new, better experimental designs

BDD Evaluation Methodology
- Trace-driven evaluation platform (a sketch of the idea follows)
  » real workload (BDD-call traces)
  » studies various BDD packages
  » focuses on the key (expensive) operations
- Evaluation metrics: more rigorous quantitative analysis
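As a sketch of the trace-driver idea (the trace format, the adapter interface, and all names below are invented for illustration; the real driver and traces are available at the URL on the Conclusions slide):

```python
def replay(trace, package):
    """Replay a recorded BDD-call trace against a package adapter.
    trace: iterable of (result_name, op_name, arg_names) tuples;
    package: adapter object exposing the BDD operations."""
    env = {"true": package.true(), "false": package.false()}
    for result_name, op_name, arg_names in trace:
        op = getattr(package, op_name)             # e.g. package.bdd_and
        env[result_name] = op(*(env[a] for a in arg_names))
    return env

# Hypothetical usage: the same trace drives two different packages,
# so their statistics are directly comparable.
# results_a = replay(load_trace("tcas.trace"), PackageA())
# results_b = replay(load_trace("tcas.trace"), PackageB())
```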
BDD Evaluation Methodology - Metrics: Time
[Diagram relating the time metrics: elapsed time (performance), CPU time, page fault rate, GC time, BDD algorithm time, reordering time, memory usage, # of ops (work), # of GCs, computed cache size, # of reorderings, # of node swaps (reorder cost)]

BDD Evaluation Methodology - Metrics: Space
[Diagram relating the space metrics: memory usage, computed cache size, max # of BDD nodes, # of GCs, reordering time]

Importance of Better Metrics
- Example: memory locality
[Diagram relating memory locality to elapsed time (performance), CPU time, cache miss rate, TLB miss rate, page fault rate, # of GCs, # of ops (work), computed cache size, and # of reorderings]
- Without accounting for work, such measures may lead to false conclusions

Summary
- Collaboration + evaluation methodology
  » significant performance improvements: up to 2 orders of magnitude
  » characterization of MC computations: computed cache size, garbage collection frequency, effects of complement edges, BF locality, reordering heuristic (current / next state variable grouping), effects of reordering the transition relation, effects of initial variable orderings
  » other general results (not mentioned in this talk)
  » issues and open questions for future research

Conclusions
- Rigorous quantitative analysis can lead to
  » dramatic performance improvements
  » a better understanding of computational characteristics
- Adopt the evaluation methodology by
  » building more benchmark traces (for IP issues, BDD-call traces are hard to understand)
  » using / improving the proposed metrics for future evaluations
- Data and BDD traces used in this study: http://www.cs.cmu.edu/~bwolen/fmcad98/

Benchmark "Sizes"
             # of BDD vars   Min Ops (millions)   Max Live Nodes (thousands)
  MC         86 - 600        6 - ~10000           53 - 26944
  ISCAS85    26 - 233        15 - 178             4363 - 9662
- Min Ops (work): minimum number of sub-problems/operations (no GC and a complete cache)
- Max Live Nodes (space): maximum number of live BDD nodes

Phase 1: Before/After Cumulative Speedup Histogram
[Histogram of # of cases per cumulative speedup bucket: bad 6, failed 13, new 1, >0 76, >0.95 76, >1 75, >2 61, >5 33, >10 22, >100 6]
- 6 packages * 16 traces = 96 cases

Computed Cache Size Dependency
- Hypothesis: the computed cache is more important for MC than for CC
- Experiment: vary the cache size and measure its effect on work
  » cache size as a percentage of the number of BDD nodes
  » results normalized to the minimum amount of work necessary, i.e., with no GC and a complete cache

Effects of Computed Cache Size
[Two charts, MC traces and ISCAS85 circuits: # of ops (normalized to the minimum number of operations, log scale from 1e+0 to 1e+4) vs. cache size (10%, 20%, 40%, 80% of BDD nodes)]
- Conclusion: a large cache is important for MC

Death Rate
[Bar chart: deaths / operations, from 0 to 3, for the model checking traces vs. ISCAS85]
- Conclusion: the death rate for MC can be very high

Effects of Complement Edge: Work
[Bar chart: # of ops, no C.E. / with C.E., from 0 to 2, for the model checking traces vs. ISCAS85]
- Conclusion: complement edges only affect work for CC

Effects of Complement Edge: Space
[Bar chart: sum of BDD sizes, no C.E. / with C.E., from 0 to 1.5, for the model checking traces vs. ISCAS85]
- Conclusion: complement edges do not affect space
- Note: the maximum # of live BDD nodes would be a better measure

Phase 2 Results: With / Without Reorder
[Scatter plot: "no reorder" time (sec) vs. "with reorder" time (sec), log scales from 1e+1 to 1e+6, with .01x / .1x / 1x / 10x / 100x diagonals; region counts 1, 6, 13, 44]
- 4 packages * 16 traces = 64 cases

Variable Reordering: Effects of Reordering the Transition Relation Only
[Three bar charts over the MC traces: work (# of ops), space (max live nodes), and reorder cost (nodes swapped), all normalized against variable reordering with grouping]
- nodes swapped: number of nodes swapped with their children

Quality of the New Variable Order
- Experiment
  » use the final variable ordering as the new initial order
  » compare the results using the new initial order with the results using the original variable order

Results on Quality of the New Variable Order
[Scatter plot: original order time (sec) vs. new initial order time (sec), log scales from 1e+1 to 1e+6, with .01x to 100x diagonals; 4 cases above the 1x diagonal, 60 below]
- Conclusion: the quality of the new variable orders is generally good

No Reorder (> 4x or > 500 MB)
[3D histogram: # of cases (0-10) vs. perturbation probability (0.1-1.0) and distance (10-100, inf)]

> 4x or > 500 MB
[Paired 3D histograms (no reorder / reorder): # of cases vs. perturbation probability and distance]
- Conclusions: for very low perturbation, reordering does not work well; overall, very few cases finish

> 32x or > 500 MB
[Paired 3D histograms (no reorder / reorder): # of cases vs. perturbation probability and distance]
- Conclusion: variable reordering worked rather well

Memory Out (> 512 MB)
[Paired 3D histograms (no reorder / reorder): # of cases vs. perturbation probability and distance]
- Conclusion: memory intensive in the highly perturbed region

Timed Out (> 128x)
[Paired 3D histograms (no reorder / reorder): # of cases vs. perturbation probability and distance]
- Note: a diagonal band from lower-left to upper-right

Issues and Open Questions: Cross Top-Level Sharing
- Why are there so many repeated sub-problems across top-level operations?
- Potential experiment: identify how far apart these repetitions are (see the sketch at the end)

Issues and Open Questions: Inconsistent Cross-Platform Results
- For some BDD packages, results on machine A are 2x faster than on machine B; for other packages, the difference is not as significant
- Probable cause: the memory hierarchy
- This may shed some light on the memory locality issues
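Returning to the cross top-level sharing question above, here is a sketch of the instrumentation for the suggested experiment (all names are invented): tag each sub-problem with the index of the top-level operation during which it was first solved, then histogram the distance whenever it recurs:

```python
from collections import Counter

class SharingMonitor:
    """Records, for each repeated sub-problem, how many top-level
    (user-issued) operations have passed since it was first solved."""

    def __init__(self):
        self.first_seen = {}        # sub-problem key -> top-level op index
        self.distances = Counter()  # distance -> # of repetitions
        self.top_level = 0

    def new_top_level_op(self):
        """Called by the driver once per user-issued operation."""
        self.top_level += 1

    def record(self, key):
        """Called once per sub-problem encountered during the run."""
        if key in self.first_seen:
            self.distances[self.top_level - self.first_seen[key]] += 1
        else:
            self.first_seen[key] = self.top_level
```

The shape of the resulting distance histogram would show whether the repetitions cluster between neighboring top-level operations or span the whole run, which in turn bears on how large the computed cache must be.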