A New Reachability Algorithm for Symmetric Multi-processor Architecture
D. Sahoo, Stanford
J. Jain, Fujitsu
S. Iyer, UT-Austin
D. Dill, Stanford
Formal Equivalence and Assertion-based Verification Workshop 2005
Outline
– Standard Reachability Analysis
– Multithreaded Reachability
– Multithreaded Reachability in SMP machines
– Engineering Issues
– Results
– Conclusion and Future Work
Related Work
Parallel Reachability Analysis:
– Stern and Dill [CAV, 97]
– Stornetta and Brewer [DAC, 96]
– Yang, O'Hallaron [97]
– Heyman, Geist, Grumberg, Schuster [CAV, 00]
– Garavel, Mateescu, Smarandache [SPIN, 01]
– Pixley, Havlicek [03]
Reachability using BDD
[Burch et al. : 91]
[Figure: image computation from the Initial State I through R1, R2, …, Ri up to the Least Fixed Point, using the Partitioned Transition Relation Tr1 … Tri … Trn]
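The least-fixed-point iteration on this slide can be sketched with explicit Python sets standing in for BDDs. This is an illustration only: the names `image` and `reachable`, the single `successors` function (the partitioning of the transition relation into Tr1 … Trn is not modelled), and the toy counter system are all assumptions, not part of the original tool.

```python
from typing import Callable, FrozenSet, Iterable

State = int
Successors = Callable[[State], Iterable[State]]


def image(states: FrozenSet[State], successors: Successors) -> FrozenSet[State]:
    """One image computation: all states one step away from `states`."""
    return frozenset(t for s in states for t in successors(s))


def reachable(initial: Iterable[State], successors: Successors) -> FrozenSet[State]:
    """Grow the reached set from the initial states until nothing new appears."""
    reached = frozenset(initial)
    frontier = reached
    while frontier:
        new_states = image(frontier, successors) - reached
        reached |= new_states
        frontier = new_states  # empty frontier means the fixed point is reached
    return reached


def toy_successors(s: State) -> Iterable[State]:
    """Hypothetical toy system: count upward, wrap to 0 after state 5."""
    return [0] if s == 5 else [s + 1]


print(sorted(reachable([0], toy_successors)))
```

Only the newly found states are used as the next frontier, which is a common way to avoid recomputing images of states that were already explored.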
Partitioned Reachability using POBDD
POBDD - [Jain : 92]
Reachability - [Narayan et al. : 97]
Initial States : I
[Figure: four partitions, each computing its own local fixed point (Local Fixed Point 1 to Local Fixed Point 4)]
Communicate from 1 -> 2
Communicate from 1 -> 4
Communicate from 1 -> 3
Communicate from 2 -> 1
Communicate from 2 -> 3
Communicate from 2 -> 4
Similarly repeat for other partitions
Improvements:
[Iyer et al. : 03]
[Sahoo et al. : 04]
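A minimal sketch of the scheme on these slides, again with explicit sets instead of POBDDs: each partition runs to a local fixed point, then communicates the states that belong to other partitions. The function name, the `owner` callback that assigns a state to a partition, and the toy system are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Set

State = int


def partitioned_reachable(
    initial: Iterable[State],
    successors: Callable[[State], Iterable[State]],
    owner: Callable[[State], int],
    num_partitions: int,
) -> Dict[int, Set[State]]:
    """Alternate local fixed points with communication between partitions."""
    reached: Dict[int, Set[State]] = {p: set() for p in range(num_partitions)}
    pending: Dict[int, Set[State]] = {p: set() for p in range(num_partitions)}
    for s in initial:
        pending[owner(s)].add(s)

    while any(pending.values()):
        for p in range(num_partitions):
            # Local fixed point: explore only the states owned by partition p.
            frontier = pending[p] - reached[p]
            pending[p] = set()
            outgoing: Set[State] = set()
            while frontier:
                reached[p] |= frontier
                succ = {t for s in frontier for t in successors(s)}
                outgoing |= {t for t in succ if owner(t) != p}
                frontier = {t for t in succ if owner(t) == p} - reached[p]
            # Communication: hand states owned by other partitions to them.
            for t in outgoing:
                if t not in reached[owner(t)]:
                    pending[owner(t)].add(t)
    return reached


# Hypothetical toy system: states 0..11 in a chain, four states per partition.
result = partitioned_reachable(
    initial=[0],
    successors=lambda s: [s + 1] if s < 11 else [],
    owner=lambda s: s // 4,
    num_partitions=4,
)
print({p: sorted(states) for p, states in result.items()})
```

The outer loop stops once no partition has received new states, which is the point where all local fixed points together form the global one.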
Motivation for Multi-threaded Approach
– Scheduling Problem
– Increasing availability of powerful SMP machines
– Multi-threading is a way of achieving real parallelism in SMP machines
[Figure; axis: Time]
Multi-threaded Reachability [DAC 05]
Naïve parallelization
Advantage:
– Parallel speedup
– Catches bugs faster than the sequential version
Problems:
– Not much parallelism
[Figure; axis: Time]
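One way to picture naïve parallelization is to run every partition's local fixed point in its own thread and to communicate only after all threads have finished. The sketch below assumes that reading; the function names, the `owner` callback, and the toy system are hypothetical. In standard CPython, threads share a global interpreter lock, so the code shows the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Set, Tuple

State = int


def local_fixed_point(
    p: int,
    start: Set[State],
    known: Set[State],
    successors: Callable[[State], Iterable[State]],
    owner: Callable[[State], int],
) -> Tuple[Set[State], Set[State]]:
    """Return (states reached inside partition p, states that leave p)."""
    reached = set(known)
    frontier = start - reached
    outgoing: Set[State] = set()
    while frontier:
        reached |= frontier
        succ = {t for s in frontier for t in successors(s)}
        outgoing |= {t for t in succ if owner(t) != p}
        frontier = {t for t in succ if owner(t) == p} - reached
    return reached, outgoing


def naive_parallel_reachable(initial, successors, owner, num_partitions):
    reached: Dict[int, Set[State]] = {p: set() for p in range(num_partitions)}
    pending: Dict[int, Set[State]] = {p: set() for p in range(num_partitions)}
    for s in initial:
        pending[owner(s)].add(s)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        while any(pending.values()):
            # All partitions compute their local fixed points concurrently.
            futures = {
                p: pool.submit(local_fixed_point, p, pending[p], reached[p],
                               successors, owner)
                for p in range(num_partitions)
            }
            # Waiting for every result acts as a barrier before communication.
            results = {p: fut.result() for p, fut in futures.items()}
            pending = {p: set() for p in range(num_partitions)}
            for p, (states, _) in results.items():
                reached[p] = states
            for _, outgoing in results.values():
                for t in outgoing:
                    if t not in reached[owner(t)]:
                        pending[owner(t)].add(t)
    return reached


naive_result = naive_parallel_reachable(
    [0], lambda s: [s + 1] if s < 11 else [], lambda s: s // 4, 4
)
print({p: sorted(states) for p, states in naive_result.items()})
```

In this toy chain only one partition has work in each round, which mirrors the "Not much parallelism" problem listed on the slide.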
Multi-threaded Reachability [DAC 05]
Early Communication
Advantage:
– Parallel speedup
– Finishes the reachability analysis faster
– Catches bugs faster than the naïve version
Problems:
– Parallelism could be better
[Figure; axis: Time]
Multi-threaded Reachability [DAC 05]
Early Communication and Partial Communication
Advantage:
– Parallel speedup
– Finishes the reachability analysis faster
– Catches bugs faster than the previous versions
[Figure; axis: Time]
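The slides do not spell out how early communication is implemented. One possible reading is that a thread forwards a state to its owning partition as soon as the state is discovered, instead of waiting for its local fixed point. The sketch below assumes that reading, works on individual states rather than BDDs, and does not model partial communication. All names are hypothetical.

```python
import queue
import threading
from typing import Callable, Iterable, List, Set

State = int


def early_comm_reachable(
    initial: Iterable[State],
    successors: Callable[[State], Iterable[State]],
    owner: Callable[[State], int],
    num_partitions: int,
) -> List[Set[State]]:
    inboxes = [queue.Queue() for _ in range(num_partitions)]
    reached: List[Set[State]] = [set() for _ in range(num_partitions)]
    lock = threading.Lock()
    done = threading.Event()
    outstanding = 0  # states sent but not yet fully processed

    def send(state: State) -> None:
        nonlocal outstanding
        with lock:
            outstanding += 1
        inboxes[owner(state)].put(state)

    def finish_one() -> None:
        nonlocal outstanding
        with lock:
            outstanding -= 1
            if outstanding == 0:  # nothing in flight anywhere: global fixed point
                done.set()

    def worker(p: int) -> None:
        while not done.is_set():
            try:
                state = inboxes[p].get(timeout=0.01)
            except queue.Empty:
                continue
            if state not in reached[p]:  # reached[p] is touched only by worker p
                reached[p].add(state)
                for t in successors(state):
                    send(t)  # forwarded immediately, no local fixed point first
            finish_one()

    for s in initial:
        send(s)
    if outstanding == 0:  # no initial states, nothing to explore
        return reached
    threads = [threading.Thread(target=worker, args=(p,)) for p in range(num_partitions)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return reached


early_result = early_comm_reachable(
    [0], lambda s: [s + 1] if s < 11 else [], lambda s: s // 4, 4
)
print([sorted(states) for states in early_result])
```

The order in which threads run varies between executions, but the final reached sets do not, because every state is processed by exactly one owner.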
Reachability in SMP Architecture
– We find bugs faster!
– Improved parallelism
-> Better parallel speedup
Engineering Issues
– Thread-safe BDD library
– Deterministic behavior
– Smart thread scheduling
Sources of Non-determinism
– Extensive memory based optimizations
– Pointer comparisons
– Hashing based on memory address
Solutions:
– Deterministic Hashing
– Deterministic comparisons
[Figure: Thread 1 and Thread 2 each execute p = malloc (…); the returned address is then used in key = hash(p) and if (p > p1) …]
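The idea behind deterministic hashing and comparison can be sketched by giving every node an identifier that depends only on which thread created it and in what order, never on its memory address. The classes and functions below are hypothetical and only illustrate that idea; in CPython `id(obj)` plays the role of the pointer value.

```python
import itertools
from typing import Tuple


class Node:
    """Toy node carrying an address-independent identifier."""

    def __init__(self, var: int, uid: Tuple[int, int]):
        self.var = var
        self.uid = uid  # (thread index, creation count within that thread)


class NodeFactory:
    """Assumption: each thread owns one factory, so counts never interleave."""

    def __init__(self, thread_index: int):
        self.thread_index = thread_index
        self._counter = itertools.count()

    def make(self, var: int) -> Node:
        return Node(var, (self.thread_index, next(self._counter)))


def address_key(node: Node, table_size: int) -> int:
    """Address-based hashing: the result can change from run to run."""
    return id(node) % table_size


def deterministic_key(node: Node, table_size: int) -> int:
    """Hash the identifier instead of the address."""
    thread_index, count = node.uid
    return ((thread_index * 1_000_003 + count) * 2_654_435_761) % table_size


def deterministic_less(a: Node, b: Node) -> bool:
    """Replacement for a pointer comparison such as p > p1."""
    return a.uid < b.uid


factory = NodeFactory(thread_index=1)
nodes = [factory.make(var) for var in range(4)]
print([deterministic_key(n, 64) for n in nodes])
```

With this scheme the hash-table layout and the outcome of every comparison are the same on every run, regardless of where the allocator places the nodes.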
Sources of Non-determinism
– Thread synchronization
Solutions
– Synchronization based on deterministic count
-> Number of ITE operations
-> Number of Sift operations
[Figure: Thread 1 and Thread 2 timelines showing Image #n and Image #n+1]
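Synchronizing on a deterministic count can be illustrated as follows: each thread counts its own operations and exchanges data only when the count reaches a fixed multiple, so the synchronization points do not depend on wall-clock timing. The constant `SYNC_EVERY`, the arithmetic that stands in for an ITE operation, and the exchange rule are assumptions chosen to keep the sketch small.

```python
import threading
from typing import List

SYNC_EVERY = 100  # hypothetical: synchronize after every 100 counted operations
MODULUS = 1_000_003


def run(num_threads: int = 2, total_ops: int = 1000) -> List[int]:
    barrier = threading.Barrier(num_threads)
    published = [0] * num_threads
    results = [0] * num_threads

    def worker(me: int) -> None:
        value = me + 1
        for ops in range(1, total_ops + 1):
            value = (value * 31 + 7) % MODULUS  # stand-in for one ITE operation
            if ops % SYNC_EVERY == 0:
                published[me] = value
                barrier.wait()  # every thread has published its value
                value = (value + sum(published)) % MODULUS
                barrier.wait()  # every thread has read before the next publish
        results[me] = value

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run() == run())
```

Because every thread synchronizes at the same operation counts, each run sees the same exchanged values even though the thread interleaving between synchronization points is not fixed.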
Smart Thread Scheduling
– Each processor has its own cache
– A thread is assigned to a processor
– The cache fills up with the thread's memory usage
– The same thread is assigned to a different processor after some time
– A large number of unnecessary cache misses occur when the thread uses its previously used memory locations
Solutions:
– Bind thread to a processor
– Leads to suboptimal throughput if the number of threads exceeds the number of processors
[Figure: CPU1 with Cache1, CPU2 with Cache2, and a thread; Lookup 0x07ffd0 results in a cache miss]
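Binding a thread to a processor can be sketched with `os.sched_setaffinity`, which Python exposes only on some platforms such as Linux. The helper names below are hypothetical. The sketch relies on the Linux behaviour that passing 0 applies the CPU mask to the calling thread, and it returns `None` where the call is unavailable or rejected.

```python
import os
import threading
from typing import Callable, Dict, Optional, Set


def pin_current_thread(cpu: int) -> Optional[Set[int]]:
    """Bind the calling thread to one CPU and return the resulting CPU set."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # platform without affinity support in the os module
    try:
        os.sched_setaffinity(0, {cpu})  # 0 refers to the caller
        return set(os.sched_getaffinity(0))
    except OSError:
        return None  # for example, the CPU is not available to this process


def run_pinned(cpu: int, work: Callable[[], object]) -> Dict[str, object]:
    """Run `work` in a new thread that is pinned to `cpu` first."""
    out: Dict[str, object] = {}

    def body() -> None:
        out["cpus"] = pin_current_thread(cpu)
        out["result"] = work()

    t = threading.Thread(target=body)
    t.start()
    t.join()
    return out


if hasattr(os, "sched_getaffinity"):
    first_cpu = min(os.sched_getaffinity(0))
    print(run_pinned(first_cpu, lambda: "done"))
```

Pinning happens inside the worker thread, so the thread that starts it keeps its own CPU mask.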
BDD Performance : CUDD Vs New
BDD Statistics after Reachability Analysis (Static Order)

                           CUDD                                         New
Ckts    P/F  #img  #nodes  Mem (MB)  Cache hits  Cache collision  Time  Mem (MB)  Cache hits  Cache collision  Time
bpb     F    10    1.8M    50M       41.0%       90.4%            18.6  61M       41.0%       88.2%            26.3
eight   P    47    79K     6.1M      42.9%       26.2%            0.8   7.5M      42.9%       26.2%            1.5
fru32   F    2     8K      9.2M      34.0%       28.4%            7.9   10.9M     34.0%       28.9%            8.9
idu32   F    1     36K     6.6M      28.8%       5.0%             4.2   7.8M      28.7%       7.7%             4.5
usbphy  P    1     90K     6.4M      37.7%       16.6%            0.7   7.8M      37.7%       17.1%            0.7
BDD Performance : CUDD Vs New
[Figure: chart titled "BDD performance". Y axis: Ratio (0 to 2). X axis: Ckts (bpb, eight, fru32, idu32, usbphy). Series: Cudd, New Memory, New Cache Hits, New Cache Collision, New Time]
Performance : Non-deterministic Vs Deterministic
Verification Time in Sec

Ckts  Non-deterministic  Deterministic
c1    T/O                227
c2    962                917
c3    809                62
c4    903                161
d1    13                 13
d2    24                 30
d3    84                 100
d4    30                 38
d5    13                 37
Performance: Cache or Parallelism
Verification Time in Sec

Ckts  Uniprocessor  Sequential in 8-way SMP  Parallel in 8-way SMP
c1    1570          286                      227
d1    125           13                       13
d2    180           39                       30
d3    295           130                      100
d4    176           60                       38
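The ratios behind this comparison can be computed directly from the times in the table. The split below is one reading of the slide title: the step from the uniprocessor to the sequential run on the SMP machine is attributed to the machine (for instance its caches), and the step from sequential to parallel on the same machine to multi-threading. The function and variable names are hypothetical.

```python
from typing import Dict, Tuple

# Verification times in seconds, copied from the table above:
# (uniprocessor, sequential in 8-way SMP, parallel in 8-way SMP)
TIMES: Dict[str, Tuple[int, int, int]] = {
    "c1": (1570, 286, 227),
    "d1": (125, 13, 13),
    "d2": (180, 39, 30),
    "d3": (295, 130, 100),
    "d4": (176, 60, 38),
}


def gains(times: Dict[str, Tuple[int, int, int]]) -> Dict[str, Tuple[float, float]]:
    """Return (machine gain, threading gain) per circuit as time ratios."""
    return {
        ckt: (uni / seq_smp, seq_smp / par_smp)
        for ckt, (uni, seq_smp, par_smp) in times.items()
    }


for ckt, (machine_gain, thread_gain) in gains(TIMES).items():
    print(f"{ckt}: machine {machine_gain:.2f}x, threads {thread_gain:.2f}x")
```

A threading gain of 1.0, as for d1, means the parallel run took the same time as the sequential run on the same machine.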
Results on Industrial Circuits

                           Parallel Multi-threaded Approaches
                           Parallel Naïve  Parallel Early Comm  Early Comm + Partial Comm
Ckt       Vis   Seq POBDD  8 CPUs          8 CPUs               1 CPU    8 CPUs
c1        371   T/O        T/O             T/O                  286      227
c2        3346  1789       1564            93                   917      917
c3        2540  T/O        T/O             T/O                  228      62
c4        2236  2084       1174            161                  509      161
d1        6     T/O        T/O             13                   13       13
d2        10    11         13              45                   39       30
d3        15    21         23              100                  130      100
d4        11    T/O        T/O             39                   60       38
d5        12    16         15              34                   37       37
Results on Public Benchmarks

                           Parallel Multi-threaded Approaches
                           Parallel Naïve  Parallel Early Comm  Early Comm + Partial Comm
Ckt       Vis   Seq POBDD  8 CPUs          8 CPUs               1 CPU    8 CPUs
spprod    891   61         53              93                   510      440
am2910    T/O   281        122             204                  386      356
palu      273   4          9               8                    9        9
S1269b-1  3635  T/O        T/O             59                   72       60
S1269b-5  2287  T/O        T/O             55                   67       55
blackjck  T/O   1213       470             340                  98       70
Results : Gantt charts
Real execution traces from our multi-threaded reachability program
Conclusion and Future Work
– Parallelize the Reachability
– Multi-threaded Reachability
– Better results
– Deterministic behavior
Future Work
– Improve the parallelism further
– Study cache behavior