Bounded Model Checking of Concurrent Data Types on Relaxed Memory Models: A Case Study

Sebastian Burckhardt

Rajeev Alur

Milo M. K. Martin

Department of Computer and Information Science

University of Pennsylvania

CAV 2006, Seattle

The General Problem

Software running on a multiprocessor produces concurrent executions, and concurrent executions produce bugs.

Concurrency libraries can help (e.g. Java JSR-166), but how do we debug the libraries?


The Specific Problem

Optimized implementations of concurrent data types, running on a shared-memory multiprocessor with a relaxed memory model, produce concurrent executions that expose bugs.

Case study: use a SAT solver to find bugs.



Case Study:

Two-Lock Queue

 Algorithm published by M. Michael and M. Scott [PODC 1996]

[Figure: queue nodes 1, 2, 3 in a linked list, with a head pointer guarded by the head lock and a tail pointer guarded by the tail lock]

Singly linked list with head and tail pointers

Dummy node at front

Independent head and tail locks

→ allows for concurrent enqueue() and dequeue()

Race condition if queue is empty
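As a rough illustration of the structure listed above, the following Python sketch keeps a dummy node at the front and guards the head and tail pointers with separate locks. It is not Michael and Scott's published code: the names `Node`, `TwoLockQueue`, `enqueue`, and `dequeue`, and the use of `None` to signal an empty queue, are assumptions made for this example. Python's `threading.Lock` also hides the memory-ordering fences that the case study is concerned with.

```python
import threading


class Node:
    """Singly linked list node (hypothetical helper for this sketch)."""

    def __init__(self, value=None):
        self.value = value
        self.next = None


class TwoLockQueue:
    """Sketch of a two-lock queue with a dummy node at the front."""

    def __init__(self):
        dummy = Node()
        self.head = dummy  # always points at the current dummy node
        self.tail = dummy  # always points at the last node
        self.head_lock = threading.Lock()
        self.tail_lock = threading.Lock()

    def enqueue(self, value):
        node = Node(value)
        with self.tail_lock:
            # If the queue is empty, self.tail is the dummy node that a
            # concurrent dequeue() reads under the other lock.
            self.tail.next = node
            self.tail = node

    def dequeue(self):
        with self.head_lock:
            first = self.head.next
            if first is None:
                return None  # assumption: None signals "queue empty"
            value = first.value
            self.head = first  # the dequeued node becomes the new dummy
            return value
```

Because enqueue() only takes the tail lock and dequeue() only takes the head lock, the two operations can run concurrently; the access to the dummy node's `next` field in an empty queue is the one place where they touch the same location.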



Case Study:

Our Correctness Criterion

The client program observes:

the ordering of operation calls within each thread

the argument and return values of the operations

Example:

thread 1: enqueue(1); enqueue(2)
thread 2: enqueue(3); dequeue() → 1
thread 3: dequeue() → 3; dequeue() → 2

 code is correct if and only if all executions are observationally equivalent to some serial execution

(def. serial: interleaved at operation boundaries only)

We assume serial executions are correct

(can be verified by conventional sequential methods)
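To make the criterion concrete, here is a small brute-force sketch that searches for a serial execution matching a given observation. It is not the SAT-based method of the talk, only an illustration of what "observationally equivalent to some serial execution" means for a FIFO queue. The function name `find_serial_witness`, the tuple encoding of operations, and the convention that dequeue on an empty queue returns `None` are all assumptions of this example.

```python
def find_serial_witness(threads):
    """Search for a serial execution that explains an observation.

    threads: one list per thread of ("enq", argument) or ("deq", result),
    in the order the thread called them.
    Returns a list of (thread number, kind, value) or None.
    """

    def search(positions, queue, order):
        if all(p == len(ops) for p, ops in zip(positions, threads)):
            return order
        for i, ops in enumerate(threads):
            p = positions[i]
            if p == len(ops):
                continue
            kind, value = ops[p]
            if kind == "enq":
                next_queue = queue + (value,)
            else:
                # A serial FIFO queue must return its front element.
                front = queue[0] if queue else None
                if front != value:
                    continue
                next_queue = queue[1:]
            next_positions = positions[:i] + (p + 1,) + positions[i + 1:]
            witness = search(next_positions, next_queue,
                             order + [(i + 1, kind, value)])
            if witness is not None:
                return witness
        return None

    return search((0,) * len(threads), (), [])


# The three-thread observation from the slide.
observation = [
    [("enq", 1), ("enq", 2)],
    [("enq", 3), ("deq", 1)],
    [("deq", 3), ("deq", 2)],
]
print(find_serial_witness(observation))
```

The search interleaves whole operations only, which is exactly the definition of a serial execution given above.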



Finer Interleavings = More Executions

Reordered Instructions = More Executions

serial executions: threads interleave the operations

(operations are atomic)

(operations are in-order)

sequentially consistent executions: threads interleave the instructions

(instructions are atomic)

(instructions are in-order)

relaxed executions: hardware makes performance-motivated compromises

(stores may be non-atomic)

(loads/stores may be out-of-order)



Case Study:

Relaxed Memory Models

Example:

thread 1: x = 1; y = 2
thread 2: print y → 2; print x → 0

Output is not consistent with any interleaved execution!

 can be the result of out-of-order stores

 can be the result of out-of-order loads

 improves performance (more choices for processor)

 Q: Why doesn’t everything break?

A: Relaxations are transparent to “normal” programs

uniprocessor semantics are preserved

library code for lock/unlock contains memory ordering fences
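The claim that the output "2, 0" matches no interleaved execution can be checked by enumeration. The sketch below runs every interleaving of the two threads from the example with instructions kept in program order and collects the printed pairs. The function name `sc_outputs` and the assumption that x and y start at 0 are choices made for this illustration.

```python
from itertools import combinations


def sc_outputs():
    """Collect (printed y, printed x) over all in-order interleavings."""
    thread1 = [("x", 1), ("y", 2)]  # stores: x = 1; y = 2
    thread2 = ["y", "x"]            # loads: print y; print x
    total = len(thread1) + len(thread2)
    results = set()
    # Choose which global steps belong to thread 1; the rest are thread 2.
    for slots in combinations(range(total), len(thread1)):
        memory = {"x": 0, "y": 0}   # assumption: both start at 0
        printed = []
        i1 = i2 = 0
        for step in range(total):
            if step in slots:
                address, value = thread1[i1]
                memory[address] = value
                i1 += 1
            else:
                printed.append(memory[thread2[i2]])
                i2 += 1
        results.add(tuple(printed))
    return results


print(sorted(sc_outputs()))  # (2, 0) does not appear
```

Under these assumptions only three outcomes are reachable, and the pair (2, 0) is not among them, so observing it requires reordered stores or reordered loads.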



Which Memory Model?









Memory models are platform dependent

We use a conservative approximation

“Relaxed” to capture common effects

Once code is correct for “Relaxed”, it is correct for all models

See paper for formal spec of “Relaxed”



Halftime Overview


 General motivation

Case study parameters

Two-lock queue implementation

Correctness criterion

 Relaxed memory models

Our verification method

Symbolic tests

SAT encoding

 Results

Bugs found

Evaluation & Conclusion



Our Verification Method

[Figure: a symbolic test and the implementation code with commit points are fed to a SAT solver, which returns either pass or a counterexample]





How To Bound Executions

Verify individual “symbolic tests”

finite number of operations

nondeterministic instruction order

nondeterministic input values

Example (this is the smallest one in our test suite):

thread 1: enqueue(X)
thread 2: dequeue() → Y

 User creates suite of tests of increasing size



Why symbolic test programs?

1) Avoid undecidability by making everything finite:

 State is unbounded (dynamic memory allocation)

... is bounded for individual test

 Sequential consistency is undecidable

... is decidable for individual test

2) Gives us finite instruction sequence to work with

 State space too large for interleaved system model

.... can directly encode value flow between instructions

 Memory model specified by axioms

.... can directly encode ordering axioms on instructions




Implementation code

We hand-translated Michael & Scott’s code (above) into a low-level representation that uses explicit loads and stores.

We added code for dynamic memory allocation and locks.




Commit points

 designate where the operation commits logically

Given the order of commit points, we can construct a serial witness execution.

This eliminates the $\exists$ in “$\forall$ executions $\exists$ equivalent serial execution”.
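The role of commit points can be sketched as follows: once each operation carries the position of its commit point, the serial witness is fixed, so checking an execution reduces to replaying the operations in commit order against a sequential queue. The function name `replay_in_commit_order`, the tuple layout, and the `None` result for an empty queue are assumptions of this example, and the commit indices in the usage lines are purely illustrative.

```python
def replay_in_commit_order(ops):
    """Replay operations serially in commit-point order.

    ops: list of (commit_index, kind, value), kind is "enq" or "deq";
    for "deq", value is the result the concurrent execution returned.
    Returns None if every result matches the serial replay, otherwise
    (commit_index, expected_result, observed_result) for the first mismatch.
    """
    queue = []
    for commit_index, kind, value in sorted(ops, key=lambda op: op[0]):
        if kind == "enq":
            queue.append(value)
        else:
            expected = queue.pop(0) if queue else None
            if expected != value:
                return (commit_index, expected, value)
    return None


# An enqueue that commits before a dequeue returning a stale value.
print(replay_in_commit_order([(3, "enq", 1), (6, "deq", 0)]))
```

No search over serial executions is needed here, which is the simplification the slide describes.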











Counterexample Trace

thread 1: enqueue(1)
thread 2: dequeue() → 0

The commit point order (3 < 6) indicates that enqueue precedes dequeue, so we would expect dequeue() → 1.

The incorrect value (0) of the queue element is read (7) before the correct value (1) is written (11).





Given

a symbolic test T(A, B):

thread 1: enqueue(A)
thread 2: dequeue() → B

a memory model Y

implementation code & commit point specifications

Encoding

First step: encode concurrent executions of T on Y as solutions to a CNF formula $\Phi_{Y}(A, B, X)$ (aux vars $X$)

Second step: encode counterexamples as solutions to

$$\Phi_{Y}(A, B, X) \wedge \Phi_{\mathrm{Serial}}(A', B', X') \wedge (A = A') \wedge (\text{commit point orders match}) \wedge \big( (B \neq B') \vee (\text{some operations commit out of order}) \big)$$



Encoding Detail:

Obtain Symbolic Instruction Stream

Finite instruction sequence for each thread

Only loads, stores, moves, and fences

Each register is assigned exactly once

Control flow represented by predicates



Encoding Detail:

Memory Order

Example: two threads

thread 1: s1: store; s2: store
thread 2: l1: load; l2: load

Encoding variables

Use bool vars for relative order ( x<y ) of memory accesses

Use bitvector variables $A_x$ and $D_x$ associated with memory access x for address and data values

Encode constraints

encode transitivity of memory order

encode ordering axioms of the memory model

Example (for SC): $(s1 < s2) \wedge (l1 < l2)$

encode value flow

“Loaded value must match last value stored to same address”

Example: value must flow from s1 to l1 under following conditions:

$$\big( (s1 < l1) \wedge (A_{s1} = A_{l1}) \wedge \big( (s2 < s1) \vee (l1 < s2) \vee (A_{s2} \neq A_{l1}) \big) \big) \rightarrow (D_{s1} = D_{l1})$$
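The value-flow condition for s1 and l1 can be sanity-checked by brute force. The sketch below enumerates every total order of the four accesses and every assignment of two addresses, computes the value l1 would load under the "last store to the same address" rule, and checks that the condition implies the load returns the data of s1. The function names, the address set {0, 1}, the data values 1 and 2, and the initial memory value 0 are assumptions of this example; a real encoding works on symbolic bitvectors instead of concrete values.

```python
from itertools import permutations, product

ACCESSES = ("s1", "s2", "l1", "l2")
DATA = {"s1": 1, "s2": 2}  # assumed concrete data for the two stores


def flows_from_s1_to_l1(order, addr):
    """Left-hand side of the value-flow implication for s1 and l1."""
    pos = {name: i for i, name in enumerate(order)}
    return (pos["s1"] < pos["l1"]
            and addr["s1"] == addr["l1"]
            and (pos["s2"] < pos["s1"]
                 or pos["l1"] < pos["s2"]
                 or addr["s2"] != addr["l1"]))


def loaded_value(order, addr, load):
    """Value seen by `load`: last earlier store to the same address."""
    value = 0  # assumption: memory is initially 0
    for name in order:
        if name == load:
            return value
        if name in DATA and addr[name] == addr[load]:
            value = DATA[name]
    raise ValueError("load not found in order")


def condition_is_sound():
    for order in permutations(ACCESSES):
        for addresses in product((0, 1), repeat=len(ACCESSES)):
            addr = dict(zip(ACCESSES, addresses))
            if flows_from_s1_to_l1(order, addr):
                if loaded_value(order, addr, "l1") != DATA["s1"]:
                    return False
    return True


print(condition_is_sound())
```

The check covers all total orders, not only those allowed by SC, so it also holds when the ordering axioms of a weaker model are used.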




Encoding Detail:

The combined formula

[Figure: the combined formula joins the thread-local formulas and the communication formula over input values, output values, intermediate values, and memory order variables]


So what did we learn in the case study?


Results: 5 code problems found

3 were mistakes we made

 first commit point guess was wrong

 incorrect/insufficient fences in lock/unlock and alloc/free

2 were caused by missing fences in queue implementation

(not the authors' fault: they were assuming an SC multiprocessor)





Results: Scalability












Graph shows tests in our suite (unsatisfiable instances only)

y-axis: runtime in seconds

x-axis: # accesses (loads/stores) in test

Fast on small tests, slow on long tests

Not sensitive to # threads



All 5 problems were found on the 2 smallest tests, each in under 1 sec.



We would recommend this method to designers and implementors of concurrent data types.



Strengths:

quickly finds subtle bugs

supports relaxed memory models

counterexample traces

catches broad range of bugs (not limited to deadlocks or data races)

is more automatic than deductive methods

Limitations:

not truly scalable (though scalable enough to be useful)

not fully automatic

does not solve full problem (bounded instances, commit points)



Ordering/Atomicity Relaxations

The following 2 examples illustrate the main effects

(1. ordering relaxation / 2. atomicity relaxation)

Where necessary, a programmer can prevent these effects by inserting fence instructions

EXAMPLE 1: store, load may execute out of order (initially A=B=0)

processor 1: store A, 1; load B, 0
processor 2: store B, 1; load A, 0

EXAMPLE 2: stores are buffered locally before effect is global (initially A=B=0; split store into local / remote components)

processor 1: store A, 1; load A, reg; store reg, B
processor 2: load B, 1; load A, 0

(In the figure, pink numbers give the memory order.)
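Example 1 can be explored with a small enumeration. The sketch below executes the four accesses in every global order, once with each processor's store forced before its load and once with that constraint dropped, and collects the pairs of loaded values. This models only the ordering relaxation, not buffered stores, and it is far cruder than the axiomatic "Relaxed" model of the talk; the function name `outcomes` and its flag are assumptions of this example.

```python
from itertools import permutations

# Each processor: a store followed (in program order) by a load.
PROGRAMS = [
    [("store", "A", 1), ("load", "B")],  # processor 1
    [("store", "B", 1), ("load", "A")],  # processor 2
]


def outcomes(allow_store_load_reorder):
    """Return the set of (value loaded by P1, value loaded by P2)."""
    events = [(p, i) for p in range(2) for i in range(2)]
    results = set()
    for order in permutations(events):
        pos = {event: n for n, event in enumerate(order)}
        if not allow_store_load_reorder:
            # Keep each store ahead of the load that follows it.
            if any(pos[(p, 0)] > pos[(p, 1)] for p in range(2)):
                continue
        memory = {"A": 0, "B": 0}  # initially A=B=0
        loaded = {}
        for p, i in order:
            instruction = PROGRAMS[p][i]
            if instruction[0] == "store":
                memory[instruction[1]] = instruction[2]
            else:
                loaded[p] = memory[instruction[1]]
        results.add((loaded[0], loaded[1]))
    return results


print(sorted(outcomes(allow_store_load_reorder=False)))
print(sorted(outcomes(allow_store_load_reorder=True)))
```

With program order enforced, both loads cannot return 0; once a load may pass the preceding store, that outcome becomes reachable, which is the effect a fence between the store and the load would prevent.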


What code?

Data type implementations optimized for concurrent execution

(Concurrency libraries)

What machines?

Common shared-memory multiprocessors

(e.g. PPC, Sparc, Alpha)

What bugs?

Bugs caused by concurrency

(We assume code runs fine if single-threaded)



Encoding Concurrent Executions

Instruction streams with labeled memory accesses:

label  instruction
x1     load a[0], R1
x2     store R1, y

label  instruction
x3     load y, R2
       move R2+1, R3
x4     store 1, a[R3]

Variables

bitvectors R1, R2, R3 for intermediate values

$O(n^2)$ boolean variables $M_{ij}$ to represent memory order $x_i < x_j$ (for $i < j$)

Constraints

memory order is transitive, $O(n^3)$:

$$\bigwedge_{i<j<k} (M_{ij} \wedge M_{jk}) \rightarrow M_{ik}$$

loads get latest value stored to same address

memory order must respect memory model axioms and fences (e.g. sequential consistency requires $M_{12} \wedge M_{34}$)

thread-local computations connect values (e.g. R3 = R2 + 1)
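The size of the memory-order part of the encoding can be illustrated by generating the clauses directly. The sketch below allocates one boolean variable per pair i < j and emits the transitivity constraint as CNF clauses over signed integers. The slide lists only the implication for i < j < k; the second clause per triple, which covers the reversed direction, is an addition made here so that satisfying assignments correspond exactly to total orders. The function names and the integer literal convention are assumptions of this example.

```python
from itertools import combinations, product


def order_variables(n):
    """Map each pair (i, j) with i < j to a variable number for M_ij."""
    pairs = combinations(range(1, n + 1), 2)
    return {pair: index + 1 for index, pair in enumerate(pairs)}


def transitivity_clauses(n):
    """CNF clauses (lists of signed ints) forcing M to be a total order."""
    var = order_variables(n)
    clauses = []
    for i, j, k in combinations(range(1, n + 1), 3):
        # (M_ij and M_jk) -> M_ik
        clauses.append([-var[(i, j)], -var[(j, k)], var[(i, k)]])
        # Added here: (not M_ij and not M_jk) -> not M_ik
        clauses.append([var[(i, j)], var[(j, k)], -var[(i, k)]])
    return var, clauses


def count_models(n):
    """Brute-force count of assignments satisfying every clause."""
    var, clauses = transitivity_clauses(n)
    count = 0
    for bits in product((False, True), repeat=len(var)):
        def holds(literal):
            value = bits[abs(literal) - 1]
            return value if literal > 0 else not value
        if all(any(holds(lit) for lit in clause) for clause in clauses):
            count += 1
    return count


variables, clauses = transitivity_clauses(4)
print(len(variables), len(clauses), count_models(4))
```

For n accesses this gives n(n-1)/2 variables and two clauses per triple, which matches the quadratic and cubic growth noted above.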


