A New Reachability Algorithm for Symmetric Multi-processor Architecture

D. Sahoo, Stanford
J. Jain, Fujitsu
S. Iyer, UT-Austin
D. Dill, Stanford

Formal Equivalence and Assertion-based Verification Workshop 2005

Outline
– Standard Reachability Analysis
– Multithreaded Reachability
– Multithreaded Reachability in SMP machines
– Engineering Issues
– Results
– Conclusion and Future Work

Related Work
Parallel Reachability Analysis:
– Stern and Dill [CAV, 97]
– Stornetta and Brewer [DAC, 96]
– Yang, O'Hallaron [97]
– Heyman, Geist, Grumberg, Schuster [CAV, 00]
– Garavel, Mateescu, Smarandache [SPIN, 01]
– Pixley, Havlicek [03]

Reachability using BDD [Burch et al. : 91]
– Partitioned Transition Relation: Tr1, …, Tri, …, Trn
[Figure: image computation starting from the initial states I produces R1, R2, …, Ri until the least fixed point is reached.]

Partitioned Reachability using POBDD
– POBDD: [Jain : 92]
– Reachability: [Narayan et al. : 97]
– Initial States: I
– Local Fixed Point 1, Local Fixed Point 2, Local Fixed Point 3, Local Fixed Point 4
– Communicate from 1 -> 2, 1 -> 4, 1 -> 3
– Communicate from 2 -> 1, 2 -> 3, 2 -> 4
– Similarly repeat for other partitions
– Improvements: [Iyer et al. : 03] [Sahoo et al. : 04]

Motivation for Multi-threaded Approach
– Scheduling Problem
– Increasing availability of powerful SMP machines
– Multi-threading is a way of achieving real parallelism in SMP machines
[Figure: execution timeline; axis: Time.]

Multi-threaded Reachability [DAC 05]: Naïve parallelization
Advantage:
– Parallel speedup
– Catches a bug faster than the sequential version
Problems:
– Not much parallelism
[Figure: execution timeline; axis: Time.]

Multi-threaded Reachability [DAC 05]: Early Communication
Advantage:
– Parallel speedup
– Finishes the reachability analysis faster
– Catches a bug faster than the naïve version
Problems:
– Parallelism could be better
[Figure: execution timeline; axis: Time.]

Multi-threaded Reachability [DAC 05]: Early Communication and Partial Communication
Advantage:
– Parallel speedup
– Finishes the reachability analysis faster
– Catches a bug faster than the previous versions
[Figure: execution timeline; axis: Time.]

Reachability in SMP Architecture
– We find the bugs faster!
– Improved parallelism
– Better parallel speedup

Engineering Issues
– Thread-safe BDD library
– Deterministic behavior
– Smart thread scheduling

Sources of Non-determinism
Thread 1: p = malloc(…); key = hash(p)
Thread 2: p = malloc(…); if (p > p1) …
– Extensive memory-based optimizations
– Pointer comparisons
– Hashing based on memory address
Solutions:
– Deterministic hashing
– Deterministic comparisons

Sources of Non-determinism
– Thread synchronization
[Figure: Thread 1 and Thread 2 synchronizing between Image #n and Image #n+1.]
Solutions:
– Synchronization based on a deterministic count
   – Number of ITE operations
   – Number of Sift operations

Smart Thread Scheduling
[Figure: CPU1 with Cache1 and CPU2 with Cache2; a thread's lookup of address 0x07ffd0 results in a cache miss.]
– Each processor has its own cache
– A thread is assigned to a processor
– The cache fills up with the thread's memory usage
– The same thread is assigned to a different processor after some time
– A large number of unnecessary cache misses occur when the thread uses its previously used memory locations
Solutions:
– Bind thread to a processor
– Leads to suboptimal throughput if the number of threads exceeds the number of processors

BDD Performance: CUDD vs New BDD
Statistics after Reachability Analysis (Static Order)

| Ckts | P/F | #img | #nodes | CUDD Mem (MB) | CUDD Cache hits | CUDD Cache collision | CUDD Time | New Mem (MB) | New Cache hits | New Cache collision | New Time |
|---|---|---|---|---|---|---|---|---|---|---|---|
| bpb | F | 10 | 1.8M | 50M | 41.0% | 90.4% | 18.6 | 61M | 41.0% | 88.2% | 26.3 |
| eight | P | 47 | 79K | 6.1M | 42.9% | 26.2% | 0.8 | 7.5M | 42.9% | 26.2% | 1.5 |
| fru32 | F | 2 | 8K | 9.2M | 34.0% | 28.4% | 7.9 | 10.9M | 34.0% | 28.9% | 8.9 |
| idu32 | F | 1 | 36K | 6.6M | 28.8% | 5.0% | 4.2 | 7.8M | 28.7% | 7.7% | 4.5 |
| usbphy | P | 1 | 90K | 6.4M | 37.7% | 16.6% | 0.7 | 7.8M | 37.7% | 17.1% | 0.7 |

BDD Performance: CUDD vs New BDD
[Chart: BDD performance per circuit (bpb, eight, fru32, idu32, usbphy); y-axis: Ratio, from 0 to 2; series: Cudd, New Memory, New Cache Hits, New Cache Collision, New Time.]

Performance: Non-deterministic vs Deterministic
Verification Time in Sec

| Ckts | Non-deterministic | Deterministic |
|---|---|---|
| c1 | T/O | 227 |
| c2 | 962 | 917 |
| c3 | 809 | 62 |
| c4 | 903 | 161 |
| d1 | 13 | 13 |
| d2 | 24 | 30 |
| d3 | 84 | 100 |
| d4 | 30 | 38 |
| d5 | 13 | 37 |

Performance: Cache or Parallelism
Verification Time in Sec

| Ckts | Uniprocessor | Sequential in 8-way SMP | Parallel in 8-way SMP |
|---|---|---|---|
| c1 | 1570 | 286 | 227 |
| d1 | 125 | 13 | 13 |
| d2 | 180 | 39 | 30 |
| d3 | 295 | 130 | 100 |
| d4 | 176 | 60 | 38 |

Results on Industrial Circuits
Parallel Multi-threaded Approaches: Naïve, Early Comm, Early Comm + Partial Comm

| Ckt | Vis | Seq POBDD | Parallel Naïve (8 CPUs) | Parallel Early Comm (8 CPUs) | Early Comm + Partial Comm (1 CPU) | Early Comm + Partial Comm (8 CPUs) |
|---|---|---|---|---|---|---|
| c1 | 371 | T/O | T/O | T/O | 286 | 227 |
| c2 | 3346 | 1789 | 1564 | 93 | 917 | 917 |
| c3 | 2540 | T/O | T/O | T/O | 228 | 62 |
| c4 | 2236 | 2084 | 1174 | 161 | 509 | 161 |
| d1 | 6 | T/O | T/O | 13 | 13 | 13 |
| d2 | 10 | 11 | 13 | 45 | 39 | 30 |
| d3 | 15 | 21 | 23 | 100 | 130 | 100 |
| d4 | 11 | T/O | T/O | 39 | 60 | 38 |
| d5 | 12 | 16 | 15 | 34 | 37 | 37 |

Results on Public Benchmarks
Parallel Multi-threaded Approaches: Naïve, Early Comm, Early Comm + Partial Comm

| Ckt | Vis | Seq POBDD | Parallel Naïve (8 CPUs) | Parallel Early Comm (8 CPUs) | Early Comm + Partial Comm (1 CPU) | Early Comm + Partial Comm (8 CPUs) |
|---|---|---|---|---|---|---|
| spprod | 891 | 61 | 53 | 93 | 510 | 440 |
| am2910 | T/O | 281 | 122 | 204 | 386 | 356 |
| palu | 273 | 4 | 9 | 8 | 9 | 9 |
| S1269b-1 | 3635 | T/O | T/O | 59 | 72 | 60 |
| S1269b-5 | 2287 | T/O | T/O | 55 | 67 | 55 |
| blackjck | T/O | 1213 | 470 | 340 | 98 | 70 |

Results: Gantt charts
– Real execution traces from our multi-threaded reachability program

Conclusion and Future Work
– Parallelize the Reachability
– Multi-threaded Reachability
– Better results
– Deterministic behavior
Future Work
– Improve the parallelism further
– Study cache behavior
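
Illustration: partitioned reachability with local fixed points

The "Reachability using BDD" and "Partitioned Reachability using POBDD" slides describe a least-fixed-point loop and its partitioned variant (local fixed points, then communication between partitions). The sketch below is a minimal illustration of that control flow only. It assumes explicit Python sets in place of BDDs/POBDDs, and the names `reachable`, `partitioned_reachable`, `successors`, and `partition_of` are hypothetical, not taken from the talk or from any BDD library. It is sequential and does not model the multi-threaded scheduling or the early/partial communication schemes.

```python
def reachable(initial, successors):
    """Standard reachability: repeat image computation until the least fixed point."""
    reached = set(initial)
    frontier = set(initial)
    while frontier:
        # Image of the current frontier under the transition relation.
        image = {t for s in frontier for t in successors(s)}
        frontier = image - reached
        reached |= frontier
    return reached


def partitioned_reachable(initial, successors, partition_of, num_partitions):
    """Partitioned reachability: a local fixed point per partition, then communication.

    Assumption: partition_of(state) returns an index in range(num_partitions).
    Returns a list with the reached states of each partition.
    """
    reached = [set() for _ in range(num_partitions)]
    pending = [set() for _ in range(num_partitions)]  # states received, not yet explored
    for s in initial:
        pending[partition_of(s)].add(s)

    while any(pending):
        for i in range(num_partitions):
            frontier = pending[i] - reached[i]
            pending[i] = set()
            outgoing = set()
            # Local fixed point: explore only states that stay inside partition i.
            while frontier:
                reached[i] |= frontier
                image = {t for s in frontier for t in successors(s)}
                outgoing |= {t for t in image if partition_of(t) != i}
                frontier = {t for t in image if partition_of(t) == i} - reached[i]
            # Communication: hand states that left partition i to their owners.
            for t in outgoing:
                pending[partition_of(t)].add(t)
    return reached


# Toy transition system (an assumption for illustration): 16 states,
# transitions s -> (s + 4) mod 16 and s -> (3 * s) mod 16.
def succ(s):
    return [(s + 4) % 16, (s * 3) % 16]


parts = partitioned_reachable({1}, succ, lambda s: s % 4, 4)
all_states = set().union(*parts)  # equals reachable({1}, succ): the odd states 1, 3, ..., 15
```

The outer loop ends once no partition has received a state it has not already reached, which is the global fixed point. In this sketch a partition communicates only after its local fixed point is complete, which corresponds to the basic scheme on the slides rather than to the Early Communication variant.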