CAV2011

Getting Rid of Store-Buffers in TSO Analysis
Mohamed Faouzi Atig
Uppsala University, Sweden
Ahmed Bouajjani
LIAFA, University of Paris 7, France
Gennaro Parlato
University of Southampton, UK
Sequential consistency memory model (SC)
[Figure: threads T1 … Tn all access one shared memory]
Write(var,val): sh_mem[var] ← val (immediately visible to all threads)
Read(var): returns sh_mem[var]
SC =
• actions of different threads are interleaved in any order
• actions of the same thread keep their program order
WMM =
for performance reasons, modern multiprocessors reorder memory operations of the same thread
Total Store Ordering (TSO)
[Figure: each thread Ti sends its writes to its own FIFO store buffer, e.g. T1: (x,4) (z,7) (y,3) and Tn: (z,4) (y,4); a memory-update process Mi moves the buffered writes of Ti to the shared memory]
• Each thread has its own store-buffer (FIFO)
• Write(var,val): the pair (var,val) is sent to the buffer of the writing thread
• Memory update = execution of a Write taken from some buffer
• Read(var) returns val if
  - (var,val) is the last write to var still in the thread's store-buffer, or
  - the buffer does not contain any Write to var, and sh_mem(var) = val
• fence requires that the store-buffer is empty
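The buffer rules above can be tried out with a small executable model. This is only a sketch under assumptions: the class name TsoMachine, the method names, and the initial value 0 for every variable are illustrative and not part of the talk, and the fence is modeled by draining the buffer, which is one way to satisfy "the store-buffer is empty".

```python
from collections import deque


class TsoMachine:
    """Toy TSO machine with one FIFO store buffer per thread (illustrative)."""

    def __init__(self, n_threads, variables):
        # Assumption: every shared variable starts at 0.
        self.sh_mem = {var: 0 for var in variables}
        self.buffers = [deque() for _ in range(n_threads)]

    def write(self, tid, var, val):
        # A write does not touch memory; the pair goes to the thread's buffer.
        self.buffers[tid].append((var, val))

    def read(self, tid, var):
        # The newest pending write to var in the thread's own buffer wins.
        for v, val in reversed(self.buffers[tid]):
            if v == var:
                return val
        # Otherwise the value comes from shared memory.
        return self.sh_mem[var]

    def memory_update(self, tid):
        # Apply the oldest buffered write of thread tid to shared memory.
        var, val = self.buffers[tid].popleft()
        self.sh_mem[var] = val

    def fence(self, tid):
        # Modeling choice: the fence drains the buffer until it is empty.
        while self.buffers[tid]:
            self.memory_update(tid)


m = TsoMachine(n_threads=2, variables=["x", "y"])
m.write(0, "x", 4)
print(m.read(0, "x"), m.read(1, "x"))  # 4 0: only thread 0 sees its own write
m.fence(0)
print(m.read(1, "x"))                  # 4: the write has reached shared memory
```

Thread 1 observes the new value of x only after a memory update, while thread 0 reads it from its own buffer immediately.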
Correct under SC -- Wrong under TSO
Dekker’s mutual exclusion protocol
Thread 1
a: y:=1
b: r1:=x
c: if (r1==0) then
d: critical section
Thread 2
1: x:=1
2: r2:=y
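3: if (r2==0) then
4: critical section
Bad schedule for TSO: a b c d 1 2 3 4
• the writes (y,1) and (x,1) are still in the store buffers of Thread 1 and Thread 2
• the shared memory still holds x = 0 and y = 0, so both reads return 0
both threads in the critical section!!!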
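The bad schedule can be replayed step by step. The sketch below is illustrative: the helper names write and read and the list in_critical are assumptions, and no memory update is ever executed, which is exactly what the schedule requires.

```python
from collections import deque

sh_mem = {"x": 0, "y": 0}
buffers = {1: deque(), 2: deque()}  # per-thread FIFO store buffers


def write(tid, var, val):
    # The write stays in the buffer of thread tid.
    buffers[tid].append((var, val))


def read(tid, var):
    # Own buffer first (newest entry), then shared memory.
    for v, val in reversed(buffers[tid]):
        if v == var:
            return val
    return sh_mem[var]


in_critical = []

# Thread 1 runs a, b, c, d without interruption.
write(1, "y", 1)            # a: y := 1, buffered
r1 = read(1, "x")           # b: r1 := x
if r1 == 0:                 # c
    in_critical.append(1)   # d: critical section

# Thread 2 runs 1, 2, 3, 4; (y,1) has not reached shared memory yet.
write(2, "x", 1)            # 1: x := 1, buffered
r2 = read(2, "y")           # 2: r2 := y
if r2 == 0:                 # 3
    in_critical.append(2)   # 4: critical section

print(in_critical)  # [1, 2]
```

Under SC the write y := 1 would be visible to Thread 2 at step 2, so r2 would be 1 and the second thread would stay out of the critical section.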
Verification for TSO?
• For finite-state programs, reachability is non-primitive recursive
  [Atig, Bouajjani, Burckhardt, Musuvathi, POPL'10]
• What shall we do?
• Symbolic representation of the store buffers?
  [Linden, Wolper, SPIN'10]: regular model checking
• Our approach: reduce the analysis from TSO to SC
• can be done only with approximations …
What is this talk about
If we restrict to executions where each thread is executed at most k times with no interruption (for a fixed k), we can translate any concurrent program P_TSO (recursion, thread creation, heap, …) into another program P_SC s.t.
• P_SC (under SC) simulates all possible executions of P_TSO (under TSO) where each thread is executed at most k times
• P_SC has no buffer at all! The store-buffers are simulated using 2k copies of the shared variables as locals
• The size of P_SC is linear in the size of P_TSO
• Advantage: use off-the-shelf SC tools for the analysis of TSO programs
Code-to-code translation from TSO to SC
k-round (for each thread) reachability
[Figure: processes P1 … Pn over a shared memory, where Pi is thread Ti together with its memory-update process Mi]
Run = (Ti1 + Mi1)+ (Ti2 + Mi2)+ …
      round of Pi1   round of Pi2
A k-round run: ∀i, #rounds of Pi ≤ k
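The k-round condition can be checked on a concrete schedule. In this sketch a schedule is assumed to be a list of process identifiers, one per action (of Ti or Mi), and a round is taken to be a maximal block of consecutive actions of the same process; the function names are illustrative.

```python
from itertools import groupby


def rounds_per_process(schedule):
    """Count the rounds of each process in a schedule.

    A round is a maximal block of consecutive actions of one process.
    """
    counts = {}
    for pid, _block in groupby(schedule):
        counts[pid] = counts.get(pid, 0) + 1
    return counts


def is_k_round(schedule, k):
    """True iff every process has at most k rounds in the schedule."""
    return all(n <= k for n in rounds_per_process(schedule).values())


schedule = [1, 1, 2, 2, 2, 1, 3, 1]
print(rounds_per_process(schedule))                      # {1: 3, 2: 1, 3: 1}
print(is_k_round(schedule, 2), is_k_round(schedule, 3))  # False True
```

Note that the bound is per process: the total number of context switches in a k-round run grows with the number of threads.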
Compositional reasoning
[(Ti + Mi)*]^k
[Figure: the rounds of one process, round0, round1, round2, each with its own pair (Mask0 Buff0), (Mask1 Buff1), (Mask2 Buff2)]
Getting rid of store-buffers
Mask_i is a copy of the shared vars as Boolean (as locals)
Buff_i is a copy of the shared vars (as locals), e.g. x = -, y = 6, z = -
[Figure: one pair per round: (Mask0 Buff0), (Mask1 Buff1), (Mask2 Buff2)]
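Invariant (example):
store-buffer, newest store on the left, grouped by the round in which the stores update the shared memory
round 2: (x,0) (y,1) (z,4) (y,7)
round 1: (x,0) (x,4) (x,7)
round 0: (x,3) (x,7) (y,5)

         x    y    z
Buff0    3    5    -
Buff1    0    -    -
Buff2    0    1    4

Mask_i[var] is set exactly where Buff_i[var] is not "-".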
At each point in the simulation:
• Mask_i[var] = 1 iff the store-buffer contains a store to var that updates the shared memory at round i
• Buff_i[var] contains the last value sent for var among the stores of round i
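The invariant can be made concrete by computing the Mask and Buff copies from an explicit store buffer. In this sketch the buffer is assumed to be a list of (var, val, round) triples, oldest first, where round is the round in which the store reaches shared memory; the function name and the use of None for "-" are illustrative. The data reproduces the example store buffer of the slide.

```python
def masks_and_buffs(store_buffer, variables, n_rounds):
    """Summarize a store buffer as per-round Mask and Buff copies.

    store_buffer: list of (var, val, round) triples, oldest store first.
    """
    mask = [{v: False for v in variables} for _ in range(n_rounds)]
    buff = [{v: None for v in variables} for _ in range(n_rounds)]
    for var, val, rnd in store_buffer:
        mask[rnd][var] = True
        buff[rnd][var] = val  # a later store to var overwrites an earlier one
    return mask, buff


# Example buffer of the slide, listed oldest first.
store_buffer = [
    ("y", 5, 0), ("x", 7, 0), ("x", 3, 0),               # round 0
    ("x", 7, 1), ("x", 4, 1), ("x", 0, 1),               # round 1
    ("y", 7, 2), ("z", 4, 2), ("y", 1, 2), ("x", 0, 2),  # round 2
]
mask, buff = masks_and_buffs(store_buffer, ["x", "y", "z"], 3)
for i in range(3):
    print(i, mask[i], buff[i])
```

Only the last value per variable and round is kept, which is why 2k variable copies can stand in for an unbounded FIFO buffer within k rounds.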
Simulation
[Figure: one pair (Mask_i Buff_i) per round i = 0, 1, 2]
Before simulation:
• Masks set to False
• r_SC ← 0; r_TSO ← 0;
Simulation:
• All statements not involving shared vars are executed unchanged
Write(var,val)
• Mask_{r_TSO}[var] ← T;
• Queue_{r_TSO}[var] ← val;
Read(var)
• Let i be the greatest index s.t. i >= r_SC & Mask_i[var] = 1
• if such an i exists, return Queue_i[var]
• else return var;
End of round (update shared vars):
• For all var: if Mask_{r_SC}[var] == 1 then var ← Buff_{r_SC}[var];
(Queue_i and Buff_i name the same per-round copy.)
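The three simulation rules can be written as small functions over the Mask and Buff copies. This is a sketch, not the authors' implementation: the function names, the dictionary standing in for the shared variables, and the choice of k + 1 copies indexed 0..k are assumptions made for illustration.

```python
def sim_write(mask, buff, r_tso, var, val):
    """Write(var, val): record the store in the copy of round r_tso."""
    mask[r_tso][var] = True
    buff[r_tso][var] = val


def sim_read(mask, buff, r_sc, shared, var):
    """Read(var): newest pending store from round r_sc on, else shared memory."""
    for i in range(len(mask) - 1, r_sc - 1, -1):
        if mask[i][var]:
            return buff[i][var]
    return shared[var]


def update_shared(mask, buff, r_sc, shared):
    """End of round r_sc: apply the stores scheduled for this round."""
    for var in shared:
        if mask[r_sc][var]:
            shared[var] = buff[r_sc][var]


k = 2
shared = {"x": 0, "y": 0}
mask = [dict.fromkeys(shared, False) for _ in range(k + 1)]
buff = [dict.fromkeys(shared, None) for _ in range(k + 1)]

sim_write(mask, buff, 1, "y", 1)      # r_TSO = 1: the store is scheduled for round 1
print(sim_read(mask, buff, 0, shared, "y"), shared["y"])  # 1 0
update_shared(mask, buff, 0, shared)  # end of round 0: nothing to apply
print(shared["y"])                    # 0
update_shared(mask, buff, 1, shared)  # end of round 1: y reaches shared memory
print(shared["y"])                    # 1
```

The thread sees its own pending store at once, while the shared copy changes only at the end of the round chosen for the store.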
Skeleton of the translation
Shared sh_vars;

Thread_i()
Begin
  locals l_vars; r_TSO, r_SC, sim, Mask0, Buff0, …, Maskk, Buffk;
  Init(); // initialize Masks to False, r_SC=0, r_TSO=0, sim=0;
  stmt_1;
  stmt_2;
  …
  stmt_n;
end

stmt_j →
  before();
  stmt_j;
  after();

before(){
  // start round
  if (!sim){
    lock;
    sim=1; r_SC++;
    if (r_TSO < r_SC)
      r_TSO=r_SC;
  }
  while(*)
    r_TSO++;
}

after()
{
  if(*) { // end round
    Update_shared(r_SC, Mask, Queue)
    sim=0; unlock;
  }
}
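To see the skeleton at work, the sketch below runs Dekker's protocol through a hand-instrumented version of before() and after(). It is a sketch under assumptions: the class TranslatedThread is illustrative, the nondeterministic choices written (*) in the skeleton are passed in explicitly as the arguments delay and end_round, and the global lock is modeled by a class-level flag.

```python
class TranslatedThread:
    """One thread of the SC program; the (*) choices are made by the caller."""

    lock_held = False  # stand-in for the global lock of the skeleton

    def __init__(self, k, shared):
        self.shared = shared
        self.mask = [dict.fromkeys(shared, False) for _ in range(k + 1)]
        self.buff = [dict.fromkeys(shared, 0) for _ in range(k + 1)]
        self.r_sc = self.r_tso = self.sim = 0

    def before(self, delay=0):
        if not self.sim:                        # start a round
            assert not TranslatedThread.lock_held
            TranslatedThread.lock_held = True   # lock
            self.sim = 1
            self.r_sc += 1
            if self.r_tso < self.r_sc:
                self.r_tso = self.r_sc
        self.r_tso += delay                     # while(*) r_TSO++

    def after(self, end_round=False):
        if end_round:                           # if(*): end of round
            for var in self.shared:             # Update_shared
                if self.mask[self.r_sc][var]:
                    self.shared[var] = self.buff[self.r_sc][var]
            self.sim = 0
            TranslatedThread.lock_held = False  # unlock

    def write(self, var, val):
        self.mask[self.r_tso][var] = True
        self.buff[self.r_tso][var] = val

    def read(self, var):
        for i in range(len(self.mask) - 1, self.r_sc - 1, -1):
            if self.mask[i][var]:
                return self.buff[i][var]
        return self.shared[var]


shared = {"x": 0, "y": 0}
t1, t2 = TranslatedThread(2, shared), TranslatedThread(2, shared)

t1.before(delay=1); t1.write("y", 1); t1.after()          # a: store delayed by one round
t1.before(); r1 = t1.read("x"); t1.after(end_round=True)  # b, then the round ends
t2.before(); t2.write("x", 1); t2.after()                 # 1
t2.before(); r2 = t2.read("y"); t2.after(end_round=True)  # 2, then the round ends
print(r1, r2, shared)  # 0 0 {'x': 1, 'y': 0}
```

Both reads return 0 although the program runs under SC with no store buffer, so an SC analysis of the translated program can expose the TSO bug.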
Characteristics of the translation
• For fixed k, the size of P_SC is linear in the size of P_TSO
• 2k copies of the shared variables as locals (no store-buffer)
• P_SC and P_TSO are in the same class of programs
• no restriction on the programs is imposed
• The reachable shared states are the same in P_SC and P_TSO:
A state S is reachable in P_TSO with at most k rounds per thread
iff
S is reachable in P_SC
Bounding Store Ages
Observation: when r_SC = 1, (Mask0, Buff0) are not used any longer
Reuse the Mask and Queue variables:
Translation: (Mask_j, Buff_j) are used circularly (modulo k+1)
k store-ages:
• Unbounded rounds!
• Constraint: each write pair remains in the store-buffer for at most k rounds
[Figure: the pairs (Mask0 Buff0), (Mask1 Buff1), (Mask2 Buff2), … arranged in a cycle back to (Mask0 Buff0)]
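The circular reuse amounts to indexing the copies modulo k + 1. The sketch below is illustrative (the function names slot and live_slots are assumptions): it shows that the rounds r_SC, …, r_SC + k, the only ones that can hold pending stores under the store-age constraint, always map to distinct copies.

```python
def slot(round_index, k):
    """Index of the (Mask, Buff) pair used for a round, modulo k + 1."""
    return round_index % (k + 1)


def live_slots(r_sc, k):
    """Slots of rounds r_sc .. r_sc + k, which may still hold pending stores."""
    return [slot(r, k) for r in range(r_sc, r_sc + k + 1)]


k = 2
print([slot(r, k) for r in range(6)])  # [0, 1, 2, 0, 1, 2]
print(live_slots(4, k))                # [1, 2, 0]
```

Because a store stays buffered for at most k rounds, a copy is free again by the time its index comes around, so the number of rounds no longer needs to be bounded.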
How can we use this code-to-code translation?
Corollaries
Decidability results for TSO reachability
Our code-to-code translation is a linear reduction TSO -> SC: inherit decidability from SC.

Schedules (k fixed)    Concurrent Boolean prog.                       Complexity    References
k store-ages           no recursion                                   Pspace
k context switches     recursion                                      Exptime       [Qadeer, Rehof, TACAS'05]
k round-robin          recursion; finite # threads | parameterized    Exptime       [Lal, Reps, CAV'08]
                                                                                    [La Torre, P., Madhusudan, CAV'09]
                                                                                    [La Torre, P., Madhusudan, CAV'10]
k rounds per thread    recursion; thread creation                     2-Expspace    [Atig, Bouajjani, Qadeer, TACAS'09]
k-delay bound          recursion; thread creation                     Exptime       [Emmi, Qadeer, Rakamaric, POPL'11]
k-compositional        recursion; thread creation                     Exptime       [Bouajjani, Emmi, P., SAS'11]
Tools for SC → Tools for TSO
(our code-to-code translation as a plug-in)
A convenient way to get new tools for TSO …
Concurrent Program
Experiments
Mutual exclusion protocols, checked with POIROT (by MSR)
Loop unrolling: 2
D stands for delay bound

             No fences           With fences
             (buggy for TSO)     (correct for TSO)
Protocol     D=1                 D=1        D=2
Dekker       7 s                 6 s        72 s
Lamport      26 s                110 s      1608 s
Peterson     5 s                 6 s        47 s
Szymanski    8 s                 6 s        978 s
POIROT: an SMT-based bounded model checker for SC programs
Errors due to TSO discovered in a few seconds!
POIROT can also be a model-checker for TSO!
Conclusions
We have proposed a code-to-code translation from TSO to SC
• makes it possible to use existing and future tools designed for SC
to analyze programs running under TSO
• under-approximation (error finding)
• the restriction imposed on the analyzed runs still leaves enough behaviors to
find errors in programs
Beyond TSO ? Generic approach ?
Thanks!