Bounded Model Checking of Concurrent Data Types on Relaxed Memory Models: A Case Study

Sebastian Burckhardt

Rajeev Alur

Milo M. K. Martin

Department of Computer and Information Science

University of Pennsylvania

CAV 2006, Seattle

The General Problem

Software running on multiprocessors → concurrent executions → bugs

Concurrency libraries can help (e.g., Java JSR-166), but how do we debug the libraries themselves?


The Specific Problem

Optimized implementations of concurrent data types → concurrent executions on a shared-memory multiprocessor with a relaxed memory model → bugs

Case study: use a SAT solver to find these bugs


Case Study: Two-Lock Queue

 Algorithm published by M. Michael and M. Scott [PODC 1996]

[Figure: linked list of nodes 1, 2, 3 with a head pointer guarded by the head lock and a tail pointer guarded by the tail lock]

Singly linked list with head and tail pointers

Dummy node at front

Independent head and tail locks

→ allows for concurrent enqueue() and dequeue()

Race condition if queue is empty
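For concreteness, here is a minimal Python sketch of the two-lock queue's structure. It assumes Python's `threading.Lock` for the two locks and uses `None` as the "empty" result (the published algorithm returns a failure flag instead). The sketch is not meant to reproduce relaxed-memory behavior; it only shows where the race on an empty queue lies: when `head is tail`, `enqueue` writes `tail.next` while `dequeue` reads the very same field as `head.next`.

```python
import threading


class Node:
    """Queue node: a value plus a pointer to the next node."""

    def __init__(self, value=None):
        self.value = value
        self.next = None


class TwoLockQueue:
    """Sketch of the Michael & Scott two-lock queue (dummy node, separate locks)."""

    def __init__(self):
        dummy = Node()                 # dummy node at the front
        self.head = dummy
        self.tail = dummy
        self.head_lock = threading.Lock()
        self.tail_lock = threading.Lock()

    def enqueue(self, value):
        node = Node(value)             # initialize the node before linking it in
        with self.tail_lock:           # only enqueuers contend for the tail lock
            self.tail.next = node      # link the node at the end of the list
            self.tail = node           # swing the tail pointer

    def dequeue(self):
        with self.head_lock:           # only dequeuers contend for the head lock
            new_head = self.head.next
            if new_head is None:       # empty: only the dummy node is left
                return None
            value = new_head.value
            self.head = new_head       # the dequeued node becomes the new dummy
            return value


q = TwoLockQueue()
q.enqueue(1)
q.enqueue(2)
print(q.dequeue(), q.dequeue(), q.dequeue())  # 1 2 None
```

Because enqueuers only take the tail lock and dequeuers only take the head lock, one `enqueue()` and one `dequeue()` can run at the same time; the dummy node keeps them on separate nodes except when the queue is empty.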



Case Study: Our Correctness Criterion

The client program observes:
the ordering of operation calls within each thread
the argument and return values of the operations

Example:
thread 1: enqueue(1); enqueue(2)
thread 2: enqueue(3); dequeue() → 1
thread 3: dequeue() → 3; dequeue() → 2

The code is correct if and only if all executions are observationally equivalent to some serial execution
(def. serial: interleaved at operation boundaries only)

We assume serial executions are correct
(they can be verified by conventional sequential methods)
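The criterion can be made concrete with a brute-force check of a single observed execution: search for a serial interleaving of whole operations that reproduces every observed return value while keeping each thread's call order. This is only an illustration of the definition, written in Python under our own assumptions (operations encoded as hypothetical `("enq", value)` / `("deq", returned_value)` tuples, `None` meaning "queue was empty"); it is exponential in the number of operations and is not the verification method of this case study.

```python
def has_serial_witness(threads):
    """Return True if some serial execution of a sequential FIFO queue
    (operations interleaved only at operation boundaries, each thread's
    call order kept) produces exactly the observed operations.

    threads: one list per thread, in call order, of ("enq", value) or
    ("deq", returned_value); None as a returned value means "empty".
    """

    def search(positions, queue):
        if all(pos == len(ops) for pos, ops in zip(positions, threads)):
            return True                        # every operation has been placed
        for t, ops in enumerate(threads):
            pos = positions[t]
            if pos == len(ops):
                continue                       # this thread has no operations left
            kind, value = ops[pos]
            if kind == "enq":
                next_queue = queue + (value,)
            else:
                front = queue[0] if queue else None
                if front != value:
                    continue                   # contradicts the observed result
                next_queue = queue[1:]
            next_positions = positions[:t] + (pos + 1,) + positions[t + 1:]
            if search(next_positions, next_queue):
                return True
        return False

    return search(tuple(0 for _ in threads), ())


example = [
    [("enq", 1), ("enq", 2)],   # thread 1
    [("enq", 3), ("deq", 1)],   # thread 2
    [("deq", 3), ("deq", 2)],   # thread 3
]
print(has_serial_witness(example))  # True
```

For the example, one witness is enqueue(1), enqueue(3), dequeue() → 1, dequeue() → 3, enqueue(2), dequeue() → 2.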



Finer Interleavings = More Executions
Reordered Instructions = More Executions

[Figure: nested sets of executions, Serial ⊂ SC ⊂ Relaxed; finer interleavings lead from serial to SC executions, reordered instructions lead from SC to relaxed executions]

Serial executions: threads interleave the operations
(operations are atomic)
(operations are in-order)

Sequentially consistent (SC) executions: threads interleave the instructions
(instructions are atomic)
(instructions are in-order)

Relaxed executions: hardware makes performance-motivated compromises
(stores may be non-atomic)
(loads/stores may be out-of-order)


Case Study: Relaxed Memory Models

Example (initially x = y = 0):
thread 1: x = 1; y = 2
thread 2: print y → 2; print x → 0

This output is not consistent with any interleaved execution!
It can be the result of out-of-order stores
It can be the result of out-of-order loads
Relaxations improve performance (more choices for the processor)

Q: Why doesn't everything break?
A: Relaxations are transparent to "normal" programs:
uniprocessor semantics are preserved
library code for lock/unlock contains memory ordering fences
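The claim that "2, 0" matches no interleaved execution can be checked by enumeration. The Python sketch below tries every interleaving of the two threads; to model out-of-order hardware it makes a deliberately crude assumption of our own, namely that a thread's two instructions may simply swap (a fence between them would forbid the swap). This is an illustration only, not the paper's "Relaxed" model.

```python
# Thread programs from the example (initially x = y = 0).
THREAD1 = [("store", "x", 1), ("store", "y", 2)]
THREAD2 = [("load", "y"), ("load", "x")]


def merges(a, b):
    """All interleavings of instruction lists a and b that keep each list's order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in merges(a[1:], b):
        yield [a[0]] + rest
    for rest in merges(a, b[1:]):
        yield [b[0]] + rest


def program_orders(instrs, may_reorder):
    """The program order, plus the swapped order if the hardware may reorder."""
    yield instrs
    if may_reorder:
        yield instrs[::-1]


def run(schedule):
    """Execute atomic instructions in schedule order; return (printed y, printed x)."""
    memory = {"x": 0, "y": 0}
    printed = {}
    for instr in schedule:
        if instr[0] == "store":
            memory[instr[1]] = instr[2]
        else:
            printed[instr[1]] = memory[instr[1]]
    return printed["y"], printed["x"]


def xy_outcomes(reorder_stores=False, reorder_loads=False):
    """Set of possible (y, x) outputs over all allowed instruction orders."""
    results = set()
    for t1 in program_orders(THREAD1, reorder_stores):
        for t2 in program_orders(THREAD2, reorder_loads):
            for schedule in merges(t1, t2):
                results.add(run(schedule))
    return results


print(sorted(xy_outcomes()))                      # [(0, 0), (0, 1), (2, 1)]
print((2, 0) in xy_outcomes(reorder_stores=True))  # True
print((2, 0) in xy_outcomes(reorder_loads=True))   # True
```

Under interleaving alone (SC), seeing y = 2 means thread 1 already executed x = 1, so print x must show 1. Allowing either the stores or the loads to swap is enough to produce the output "2, 0".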



Which Memory Model?

[Figure: memory models of common platforms (SC, TSO, 390, PSO, RMO, PPC, Alpha), all contained in the conservative model "Relaxed"]

Memory models are platform dependent
We use a conservative approximation, "Relaxed", to capture common effects
Once code is correct for "Relaxed", it is correct for all of the models above
See the paper for a formal specification of "Relaxed"


Halftime Overview

Done:
General motivation
Case study parameters: two-lock queue implementation, correctness criterion, relaxed memory models

Coming up:
Our verification method: symbolic tests, SAT encoding
Results: bugs found, evaluation & conclusion


Our Verification Method

[Figure: tool flow. Inputs: (1) symbolic test, (2) implementation code with (3) commit points → (5) encoder → SAT solver → result: pass, or (4) counterexample]


1. How To Bound Executions

Verify individual "symbolic tests":
finite number of operations
nondeterministic instruction order
nondeterministic input values

Example (this is the smallest one in our test suite):
thread 1: enqueue(X)
thread 2: dequeue() → Y

The user creates a suite of tests of increasing size


Why symbolic test programs?

1) Avoid undecidability by making everything finite:
State is unbounded (dynamic memory allocation)
... but bounded for an individual test
Checking sequential consistency is undecidable in general
... but decidable for an individual test

2) Gives us a finite instruction sequence to work with:
The state space is too large for an interleaved system model
... but we can directly encode value flow between instructions
The memory model is specified by axioms
... so we can directly encode ordering axioms on instructions


2. Implementation code

We hand-translated Michael & Scott's code into a low-level representation that uses explicit loads and stores
We added code for dynamic memory allocation and locks


3. Commit points

Designate where each operation commits logically
Given the order of commit points, we can construct a serial witness execution
This eliminates the ∃ in "∀ executions ∃ equivalent serial execution"
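With commit points, no search for a witness is needed: the serial witness simply replays the operations in commit-point order. A minimal Python sketch of that replay, assuming a hypothetical record format `(thread, index_in_thread, kind, argument, observed)` and `None` for "empty"; treating "commits out of program order" as a failure is our reading of the counterexample condition, stated here as an assumption.

```python
from collections import deque


def witness_matches(commit_order):
    """Replay operations serially in the order of their commit points.

    commit_order: operations sorted by the position of their commit point;
    each is (thread, index_in_thread, kind, argument, observed), where kind is
    "enq" or "deq" and observed is the value the concurrent execution returned.
    Returns False if the serial replay disagrees with an observed value, or if
    a thread's operations commit out of program order.
    """
    queue = deque()
    last_index = {}
    for thread, index, kind, argument, observed in commit_order:
        if index <= last_index.get(thread, -1):
            return False                  # committed out of program order
        last_index[thread] = index
        if kind == "enq":
            queue.append(argument)
        else:
            expected = queue.popleft() if queue else None
            if expected != observed:
                return False              # the serial witness returns something else
    return True


# Counterexample shape: enqueue(1) commits first, but the concurrent dequeue() returned 0.
trace = [(1, 0, "enq", 1, None), (2, 0, "deq", None, 0)]
print(witness_matches(trace))  # False
```

The check is linear in the number of operations, which is what makes it suitable for direct encoding as a formula.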



4. Counterexample Trace

thread 1: enqueue(1)
thread 2: dequeue() → 0

[Figure: instruction trace of both threads; memory accesses numbered 1 to 14 in memory order]

The commit point order (3 < 6) indicates that enqueue precedes dequeue, so we would expect dequeue() → 1
The incorrect value (0) of the queue element gets read (7) before the correct value (1) is written (11)


5. Encoding

Given:
symbolic test T(A, B): thread 1: enqueue(A); thread 2: dequeue() → B
memory model Y
implementation code & commit point specifications

Encoding:
First step: encode concurrent executions of T on Y as solutions to a CNF formula $\Phi_Y(A, B, X)$ (auxiliary variables X)
Second step: encode counterexamples as solutions to
$$\Phi_Y(A, B, X) \land \Phi_{\mathit{Atomic}}(A', B', X') \land (A = A') \land (\text{commit point orders match}) \land \big((B \neq B') \lor (\text{some operations commit out of order})\big)$$


Encoding Detail: Obtain Symbolic Instruction Stream









Finite instruction sequence for each thread

Only loads, stores, moves, and fences

Each register is assigned exactly once

Control flow represented by predicates
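A minimal Python sketch of what "each register is assigned exactly once" and "control flow represented by predicates" mean. The tuple format, operation names, addresses and node layout are all hypothetical; the fragment only mimics the empty check at the start of the two-lock queue's dequeue. Stores and fences are left out, since in the full encoding they connect to other threads through the memory-order constraints rather than through this thread-local evaluation.

```python
def run_stream(stream, memory):
    """Evaluate one thread's predicated, single-assignment instruction stream.

    stream: list of (guard, dest, op, args); guard is None or the name of a
    predicate register that must be true for the instruction to take effect.
    Operands are register names (str) or constants (int). A disabled
    instruction sets dest to None, standing in for an unconstrained value.
    memory: dict from address to value (missing addresses read as 0).
    """
    regs = {}

    def val(operand):
        return regs[operand] if isinstance(operand, str) else operand

    for guard, dest, op, args in stream:
        if dest in regs:
            raise ValueError(f"register {dest} assigned twice")
        if guard is not None and not regs[guard]:
            regs[dest] = None
            continue
        if op == "load":
            regs[dest] = memory.get(val(args[0]), 0)
        elif op == "add":
            regs[dest] = val(args[0]) + val(args[1])
        elif op == "eq":
            regs[dest] = val(args[0]) == val(args[1])
        elif op == "not":
            regs[dest] = not val(args[0])
        else:
            raise ValueError(f"unknown operation {op}")
    return regs


HEAD = 100  # hypothetical address of the queue's head pointer
# Hypothetical node layout: next pointer at the node's address, value at address + 1.
dequeue_stream = [
    (None, "r1", "load", (HEAD,)),           # r1 = head
    (None, "r2", "load", ("r1",)),           # r2 = head->next
    (None, "p_empty", "eq", ("r2", 0)),      # predicate: next is null, queue empty
    (None, "p_ok", "not", ("p_empty",)),     # predicate for the non-empty branch
    ("p_ok", "r3", "add", ("r2", 1)),        # r3 = address of next->value
    ("p_ok", "r4", "load", ("r3",)),         # r4 = next->value
]

nonempty = {100: 10, 10: 20, 20: 0, 21: 42}  # dummy node 10 -> node 20 holding 42
empty = {100: 10, 10: 0}                     # only the dummy node
print(run_stream(dequeue_stream, nonempty)["r4"])  # 42
print(run_stream(dequeue_stream, empty)["r4"])     # None
```

Both branches of the `if` become one straight-line sequence; which instructions "happen" is decided by the predicate registers, so the stream has a fixed length regardless of the inputs.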




Encoding Detail: Memory Order

Example: two threads
thread 1: s1 (store); s2 (store)
thread 2: l1 (load); l2 (load)

Encoding variables:
Use bool vars for the relative order (x < y) of memory accesses
Use bitvector variables $A_x$ and $D_x$ associated with memory access x for the address and data values

Encode constraints:
encode transitivity of memory order
encode ordering axioms of the memory model
Example (for SC): $(s_1 < s_2) \land (l_1 < l_2)$
encode value flow: "Loaded value must match last value stored to same address"
Example: value must flow from s1 to l1 under the following conditions:
$$\Big((s_1 < l_1) \land (A_{s_1} = A_{l_1}) \land \big((s_2 < s_1) \lor (l_1 < s_2) \lor (A_{s_2} \neq A_{l_1})\big)\Big) \rightarrow (D_{s_1} = D_{l_1})$$
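The value-flow constraint can be sanity-checked by brute force. The Python sketch below fixes the data written by s1 and s2 (hypothetical values 1 and 2), lets all four addresses range over {x, y}, and, for every memory order satisfying the SC axioms, compares the implication above against a direct "last store to the same address wins" semantics. A SAT encoding would instead leave order, addresses and data symbolic; this is only a check of the formula on the example.

```python
from itertools import permutations, product

ACCESSES = ["s1", "s2", "l1", "l2"]
DATA = {"s1": 1, "s2": 2}  # hypothetical values written by the two stores


def sc_orders():
    """Memory orders of the four accesses satisfying (s1 < s2) and (l1 < l2),
    each returned as a map from access to its position."""
    for order in permutations(ACCESSES):
        pos = {a: i for i, a in enumerate(order)}
        if pos["s1"] < pos["s2"] and pos["l1"] < pos["l2"]:
            yield pos


def loaded_values(pos, addr):
    """Reference semantics: a load sees the last preceding store to its address
    (initial memory is 0)."""
    memory = {}
    values = {}
    for a in sorted(ACCESSES, key=pos.get):
        if a in DATA:
            memory[addr[a]] = DATA[a]
        else:
            values[a] = memory.get(addr[a], 0)
    return values


def flow_s1_to_l1(pos, addr):
    """Left-hand side of the value-flow implication for s1 -> l1."""
    def before(a, b):
        return pos[a] < pos[b]

    return (before("s1", "l1") and addr["s1"] == addr["l1"]
            and (before("s2", "s1") or before("l1", "s2") or addr["s2"] != addr["l1"]))


def check_value_flow_constraint():
    """Assert (flow_s1_to_l1 -> D_l1 = D_s1) for every SC order and address choice;
    return the number of cases checked."""
    checked = 0
    for pos in sc_orders():
        for choice in product("xy", repeat=4):
            addr = dict(zip(ACCESSES, choice))
            if flow_s1_to_l1(pos, addr):
                assert loaded_values(pos, addr)["l1"] == DATA["s1"]
            checked += 1
    return checked


print(len(list(sc_orders())))         # 6
print(check_value_flow_constraint())  # 96
```

With two stores, "s2 does not intervene" has exactly the three disjuncts of the formula: s2 comes before s1, s2 comes after l1, or s2 writes a different address.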




Encoding Detail: The combined formula

[Figure: the combined formula, made up of thread-local formulas and a communication formula, over input values, output values, intermediate values, and memory order variables]


So what did we learn in the case study?



Results: 5 code problems found





3 were mistakes we made:
the first commit point guess was wrong
incorrect/insufficient fences in lock/unlock and alloc/free

2 were caused by missing fences in the queue implementation
(not the fault of the original authors, who were assuming an SC multiprocessor):
a missing load-load fence
a missing store-store fence


Results: Scalability

[Figure: runtime in seconds (y-axis, 0 to 1000) versus number of memory accesses (loads/stores) in the test (x-axis, 0 to 150)]

The graph shows the tests in our suite (unsatisfiable instances only)
Fast on small tests, slow on long tests
Not sensitive to # threads
All 5 problems were found on the smallest 2 tests... all under 1 sec


Conclusion

We would recommend this method to designers and implementors of concurrent data types.

PROs
quickly finds subtle bugs
catches a broad range of bugs (not limited to deadlocks or data races)
supports relaxed memory models
produces counterexample traces
is more automatic than deductive methods

CHALLENGES
not truly scalable (though scalable enough to be useful)
not fully automatic
does not solve the full problem (bounded instances, commit points)




Ordering/Atomicity Relaxations

The following 2 examples illustrate the main effects
(1. ordering relaxation / 2. atomicity relaxation)
Where necessary, a programmer can prevent these effects by inserting fence instructions

EXAMPLE 1: store, load may execute out of order
(initially A = B = 0; numbers in brackets = memory order)
processor 1: store A, 1 [3]; load B, 0 [1]
processor 2: store B, 1 [4]; load A, 0 [2]

EXAMPLE 2: stores are buffered locally before their effect is global
(initially A = B = 0; each store is split into local / remote components; numbers in brackets = memory order)
processor 1: store A, 1 [1 local / 6 remote]; load A, reg [2]; store reg, B [3]
processor 2: load B, 1 [4]; load A, 0 [5]
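Both examples can be reproduced with a small operational model, sketched below in Python. The model is our own assumption, not the paper's axiomatic "Relaxed" specification: each processor has a store buffer, a load first reads the newest buffered store to the same address (the "local component"), and buffered stores to different addresses may drain to memory (the "remote component") in any order, roughly in the spirit of SPARC PSO. With `buffered=False` every store goes straight to memory, which gives sequential consistency.

```python
def replace_at(tup, i, item):
    """Return a copy of tuple tup with position i replaced by item."""
    return tup[:i] + (item,) + tup[i + 1:]


def outcomes(programs, buffered):
    """Explore all executions; return the set of per-processor tuples of loaded values.

    Instructions: ("store", addr, value) or ("load", addr); a store value of
    "reg" stores the value returned by the processor's most recent load.
    """
    results = set()

    def explore(pcs, buffers, memory, loads):
        finished = True
        for p, program in enumerate(programs):
            buf = buffers[p]
            # Option 1: drain a buffered store that is the oldest for its address.
            for i, (addr, value) in enumerate(buf):
                if any(a == addr for a, _ in buf[:i]):
                    continue
                finished = False
                new_memory = dict(memory)
                new_memory[addr] = value
                explore(pcs, replace_at(buffers, p, buf[:i] + buf[i + 1:]), new_memory, loads)
            # Option 2: execute the processor's next instruction.
            if pcs[p] == len(program):
                continue
            finished = False
            instr = program[pcs[p]]
            next_pcs = replace_at(pcs, p, pcs[p] + 1)
            if instr[0] == "store":
                _, addr, value = instr
                if value == "reg":
                    value = loads[p][-1]
                if buffered:
                    explore(next_pcs, replace_at(buffers, p, buf + ((addr, value),)), memory, loads)
                else:
                    new_memory = dict(memory)
                    new_memory[addr] = value
                    explore(next_pcs, buffers, new_memory, loads)
            else:
                _, addr = instr
                forwarded = [v for a, v in buf if a == addr]
                value = forwarded[-1] if forwarded else memory.get(addr, 0)
                explore(next_pcs, buffers, memory, replace_at(loads, p, loads[p] + (value,)))
        if finished:
            results.add(loads)

    n = len(programs)
    explore((0,) * n, ((),) * n, {}, ((),) * n)
    return results


EXAMPLE1 = [
    [("store", "A", 1), ("load", "B")],                          # processor 1
    [("store", "B", 1), ("load", "A")],                          # processor 2
]
EXAMPLE2 = [
    [("store", "A", 1), ("load", "A"), ("store", "B", "reg")],   # processor 1
    [("load", "B"), ("load", "A")],                              # processor 2
]
print(((0,), (0,)) in outcomes(EXAMPLE1, buffered=False))    # False
print(((0,), (0,)) in outcomes(EXAMPLE1, buffered=True))     # True
print(((1,), (1, 0)) in outcomes(EXAMPLE2, buffered=False))  # False
print(((1,), (1, 0)) in outcomes(EXAMPLE2, buffered=True))   # True
```

Restricting draining to strict FIFO order (as in TSO) would still allow the outcome of Example 1 but not that of Example 2, which additionally needs the store to B to become globally visible before the store to A.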


What code?

Data type implementations optimized for concurrent execution

(Concurrency libraries)

What machines?

Common shared-memory multiprocessors

(e.g. PPC, Sparc, Alpha)

What bugs?

Bugs caused by concurrency

(We assume code runs fine if single-threaded)



Encoding Concurrent Executions

thread 1:
x1: load a[0], R1
x2: store R1, y
thread 2:
x3: load y, R2
move R2+1, R3
x4: store 1, a[R3]

Variables (n = number of memory accesses)
bitvectors R1, R2, R3 for intermediate values
O(n²) boolean variables $M_{ij}$ to represent memory order $x_i < x_j$ (for i < j)

Constraints
memory order is transitive: $\bigwedge_{i<j<k} (M_{ij} \land M_{jk}) \rightarrow M_{ik}$
loads get the latest value stored to the same address
(these two constraint families are O(n³) in size)
memory order must respect memory model axioms and fences (e.g., sequential consistency requires $M_{12} \land M_{34}$)
thread-local computations connect values (e.g., R3 = R2 + 1)
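A Python sketch of the transitivity part as CNF clauses (signed integers, as in the DIMACS convention), with a brute-force count of satisfying assignments. Because only $M_{ij}$ with i < j exists, $\lnot M_{ij}$ stands for $x_j < x_i$; under that representation, a complete encoding of a total order needs, per triple, the clause shown on the slide plus its mirrored form $(\lnot M_{ij} \land \lnot M_{jk}) \rightarrow \lnot M_{ik}$. The variable numbering is our own choice. If the encoding is right, the number of models for n accesses is n!.

```python
from itertools import combinations, product


def order_variables(n):
    """Number the boolean variables M_ij (i < j), meaning x_i precedes x_j, from 1."""
    return {pair: k for k, pair in enumerate(combinations(range(n), 2), start=1)}


def transitivity_clauses(n):
    """CNF clauses (signed integers) forcing M to describe a strict total order."""
    m = order_variables(n)
    clauses = []
    for i, j, k in combinations(range(n), 3):   # every triple i < j < k
        ij, jk, ik = m[(i, j)], m[(j, k)], m[(i, k)]
        clauses.append([-ij, -jk, ik])          # x_i < x_j and x_j < x_k imply x_i < x_k
        clauses.append([ij, jk, -ik])           # x_j < x_i and x_k < x_j imply x_k < x_i
    return m, clauses


def satisfied(clauses, assignment):
    """assignment maps variable number -> bool."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses)


def count_orders(n):
    """Brute-force count of assignments satisfying the transitivity clauses."""
    m, clauses = transitivity_clauses(n)
    count = 0
    for bits in product([False, True], repeat=len(m)):
        if satisfied(clauses, dict(enumerate(bits, start=1))):
            count += 1
    return count


for n in range(2, 6):
    print(n, len(transitivity_clauses(n)[1]), count_orders(n))
# 2 0 2
# 3 2 6
# 4 8 24
# 5 20 120
```

The clause count is 2·C(n, 3), which is the O(n³) growth noted above; a SAT solver receives these clauses together with the memory-model, value-flow and thread-local constraints.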


