Verification for Concurrency


Part 2: Incomplete techniques and bug finding

Contents

 Race detection

 Context bounding and Sequentialization

 Odds and ends

Race detection

Data Races

 “Two threads simultaneously access the same memory location, with at least one access being a write ”

 Most concurrent software is written to avoid data-races

 Extremely tricky to write racy code that is correct

 For racy code, correctness on one architecture and compiler does not imply correctness on all

 Race-detection is a very efficient light-weight bug detection technique

 Dynamic techniques: Work on actual executions

The Lockset algorithm

 Lockset assumption: Whenever two different threads access a location (with at least one of them being a write), they will both hold a common lock.

 Possible to write non-racy code violating this assumption

 False positives from code that violates this assumption

Vanilla Lockset algorithm

for each tid: LocksHeld[tid] = ∅
for each v: Cands[v] = set of all locks
for each instruction in trace:
    // Maintain lockset
    if (instruction = lock(l))
        LocksHeld[tid] = LocksHeld[tid] ∪ { l }
    if (instruction = unlock(l))
        LocksHeld[tid] = LocksHeld[tid] \ { l }
    // Update candidate locks
    if (access to v by thread tid)
        Cands[v] := Cands[v] ∩ LocksHeld[tid]
        if (Cands[v] = ∅)
            Output “Potential race on v”

 Based on a simple locking discipline

 Each variable is protected by some lock

Vanilla Lockset algorithm: Example

Thread 1:                Thread 2:
lock(l1)
x := x + 4
unlock(l1)
                         lock(l1)
                         lock(l2)
                         x := x - y
                         y := y + 4
                         unlock(l2)
                         unlock(l1)
lock(l2)
y := y + 5
x := x - 3
unlock(l2)
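The vanilla algorithm can be sketched in Python as a single pass over a recorded trace. This is only an illustration: the `(tid, op, arg)` trace encoding, the function name `lockset_races`, and the sample trace are assumptions made for this sketch, not part of any particular tool.

```python
from collections import defaultdict

def lockset_races(trace, all_locks):
    """Run the vanilla lockset algorithm over a trace.

    Each entry is (tid, op, arg) with op in {"lock", "unlock", "access"}.
    Returns the variables whose candidate set became empty, in report order.
    """
    locks_held = defaultdict(set)                # LocksHeld[tid]
    cands = defaultdict(lambda: set(all_locks))  # Cands[v], initially all locks
    reported = []
    for tid, op, arg in trace:
        if op == "lock":
            locks_held[tid].add(arg)
        elif op == "unlock":
            locks_held[tid].discard(arg)
        elif op == "access":
            cands[arg] &= locks_held[tid]        # intersect with locks held now
            if not cands[arg] and arg not in reported:
                reported.append(arg)
    return reported

# Hypothetical trace: x is accessed under l1, then under {l1, l2}, then under l2 only.
trace = [
    (1, "lock", "l1"), (1, "access", "x"), (1, "unlock", "l1"),
    (2, "lock", "l1"), (2, "lock", "l2"),
    (2, "access", "x"), (2, "access", "y"),
    (2, "unlock", "l2"), (2, "unlock", "l1"),
    (1, "lock", "l2"), (1, "access", "y"), (1, "access", "x"), (1, "unlock", "l2"),
]
print(lockset_races(trace, {"l1", "l2"}))  # ['x']
```

In this trace no single lock protects every access to x, so its candidate set empties on the last access; y keeps l2 as a candidate and is not reported.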

Vanilla LockSet: False positive

Thread1:
    // Initialize
    g := 1729
    // Share
    lock(l)
    published := true
    unlock(l)

Thread2:
    lock(l)
    assume(published)
    unlock(l)
    y := g
    y := f(l, y); output(y)

Thread3:
    lock(l)
    assume(published)
    unlock(l)
    z := g
    z := g(l, z); output(z)

More advanced locking disciplines

 Simple locking discipline leads to false positives in many common cases

 Lazy initialization

 Initialize followed by read-only access

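[State diagram for each variable]
Virgin → Exclusive: write
Exclusive → Exclusive: r/w (first thread)
Exclusive → Shared: read (new thread)
Exclusive → Shared-Modified: write (new thread)
Shared → Shared-Modified: write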

More advanced locking discipline

for each access to v by thread tid:
    update State[v]
    if State[v] = Exclusive
        // Do nothing
    else if State[v] = Shared
        // Update lockset, don’t report any races
        Cands[v] = Cands[v] ∩ LocksHeld[tid]
    else if State[v] = Shared-Modified
        // Update lockset, report races
        Cands[v] := Cands[v] ∩ LocksHeld[tid]
        if (Cands[v] = ∅)
            Output “Potential race on v”

 Works well for initialize-and-publish, lazy initialization, etc

 Doesn’t work with ownership-transfer, etc
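A rough Python sketch of the state-based discipline follows. It is an approximation for illustration only: the trace encoding, the name `eraser_races`, and the choice to move a variable from Virgin to Exclusive on its first access of any kind are assumptions of this sketch.

```python
from collections import defaultdict

VIRGIN, EXCLUSIVE, SHARED, SHARED_MODIFIED = (
    "Virgin", "Exclusive", "Shared", "Shared-Modified")

def eraser_races(trace, all_locks):
    """Lockset with per-variable states; ops are lock/unlock/read/write."""
    locks_held = defaultdict(set)
    cands = defaultdict(lambda: set(all_locks))
    state = defaultdict(lambda: VIRGIN)
    owner = {}                                   # first thread to touch each variable
    reported = []
    for tid, op, arg in trace:
        if op == "lock":
            locks_held[tid].add(arg)
            continue
        if op == "unlock":
            locks_held[tid].discard(arg)
            continue
        v = arg
        # Update State[v]
        if state[v] == VIRGIN:
            state[v], owner[v] = EXCLUSIVE, tid
        elif state[v] == EXCLUSIVE and tid != owner[v]:
            state[v] = SHARED_MODIFIED if op == "write" else SHARED
        elif state[v] == SHARED and op == "write":
            state[v] = SHARED_MODIFIED
        # Update the lockset once the variable is shared
        if state[v] in (SHARED, SHARED_MODIFIED):
            cands[v] &= locks_held[tid]
        # Report only in Shared-Modified
        if state[v] == SHARED_MODIFIED and not cands[v] and v not in reported:
            reported.append(v)
    return reported

# Initialize-and-publish: g is written once, then only read by other threads.
publish = [
    (1, "write", "g"),
    (1, "lock", "l"), (1, "write", "published"), (1, "unlock", "l"),
    (2, "lock", "l"), (2, "read", "published"), (2, "unlock", "l"), (2, "read", "g"),
    (3, "lock", "l"), (3, "read", "published"), (3, "unlock", "l"), (3, "read", "g"),
]
# Ownership transfer: the hand-off goes through a flag, not a lock.
transfer = [
    (1, "write", "Obj.bar"), (1, "write", "Obj.owner"),
    (2, "read", "Obj.owner"), (2, "write", "Obj.bar"),
]
print(eraser_races(publish, {"l"}))   # []
print(eraser_races(transfer, {"l"}))  # ['Obj.bar']
```

The first trace is accepted because g never leaves the Shared state; the second is flagged even though the owner flag orders the two writes, which is the false positive that ownership transfer produces.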

Advanced LockSet: Examples

Thread1:
    // Initialize
    g := 1729
    // Share
    lock(l)
    published := true
    unlock(l)

Thread2:
    lock(l)
    assume(published)
    unlock(l)
    y := g
    y := f(l, y); output(y)

Thread3:
    lock(l)
    assume(published)
    unlock(l)
    z := g
    z := g(l, z); output(z)

Advanced LockSet: False positive

Thread1:
    …
    await (Obj.owner == 1);
    Obj.foo()
    Obj.bar := baz();
    …
    Obj.owner := 2
    …

Thread2:
    …
    await (Obj.owner == 2);
    Obj.bar()
    Obj.foo = baz();
    …

Happens-Before relation

 Proposed by Lamport in 1978 for distributed systems

 Many race detection algorithms try and approximate the happens-before relation

 Basic idea: two events are related if and only if communication allows information-flow between them

 We write $e_i \rightarrow e_j$ for event-i happens-before event-j

 Informally, if $e_i \rightarrow e_j$, then $e_i$ happens-before $e_j$ in all variations of the trace

 A race if two events $e_i$ and $e_j$ access the same location, and neither $e_i \rightarrow e_j$ holds, nor $e_j \rightarrow e_i$ holds

Happens-Before Relation: Definition

 If two events are from the same thread, the earlier one happens-before the later

 $\mathrm{thread}(e_i) = \mathrm{thread}(e_j) \wedge i < j \Rightarrow e_i \rightarrow e_j$

 Happens-before is transitive

 $(e_i \rightarrow e_j) \wedge (e_j \rightarrow e_k) \Rightarrow (e_i \rightarrow e_k)$

 Every synchronization gives some happens-befores

 LOCK: if $e_i$ is an unlock and $e_j$ is a later lock (on the same lock), $e_i \rightarrow e_j$

 WAIT/NOTIFY: if $e_i$ is a notify and $e_j$ is a corresponding wait, $e_i \rightarrow e_j$

 …

Happens-Before: Examples

Thread1:
    obj := new Foo()
    Notify(obj)
    Wait(obj)
    lock(l)
    obj.data = obj.data + 4
    unlock(l)

Thread2:
    data = readFile()
    Wait(obj)
    obj.data = data
    Notify(obj)
    lock(l)
    obj.data = obj.data - 4
    unlock(l)

…

Computing the Happens-Before: Vector Clocks

 The happens-before relation is usually very expensive to compute

 Few dynamic techniques actually compute the full relation

 Classical method: vector clocks (Fidge and Mattern, 1988), which extend the logical clocks proposed by Lamport himself

 Vector clocks : with each event, associate a “vector clock” storing the last event from the other threads that affects it

 $e_i \rightarrow e_j$ if and only if $VC[e_j][\mathrm{thread}(e_i)] \geq e_i$

Vector clocks

$VC[e] = [e_1, e_2, e_3, \ldots, e_n]$

$e_1$: last relevant event from thread 1

$e_2$: last relevant event from thread 2

VC[e][thread(e)] := e

// regular events

VC[e][tid] := VC[prev(e)][tid]

Can be extended to other synchronization primitives!

// Acquire locks

VC[e][tid] := max(VC[prev(e)][tid], LVC[tid])

// Release locks

VC[e][tid] := VC[prev(e)][tid]

LVC[tid] := max(VC[e][tid], LVC[tid])

…

Vector clocks: Examples

Thread1:

T1_0: obj := new Foo()

T1_1: Notify(obj)

Thread2:

T2_0: data = readFile()

T2_1: Wait(obj)

T2_2: obj.data = data

T2_3: Notify(obj)

T1_2: Wait(obj)

T1_3: lock(l)

T1_4: obj.data = obj.data + 4

T1_5: unlock(l)

T2_4: lock(l)

T2_5: obj.data = obj.data - 4

T2_6: unlock(l)

…
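A compact Python sketch of a vector-clock race check on this example is given below. It uses a common variant in which each thread keeps a vector of counters rather than event identifiers; the trace encoding, the function name `hb_races`, and the treatment of Notify/Wait as a release/acquire pair on the object are assumptions of the sketch.

```python
def hb_races(trace):
    """Report variables with two conflicting accesses unordered by happens-before."""
    clocks = {}     # tid -> {tid: counter}, the thread's current vector clock
    released = {}   # sync object -> clock snapshot at its last unlock/notify
    history = {}    # variable -> list of (tid, counter, is_write)
    races = []
    for tid, op, arg in trace:
        clock = clocks.setdefault(tid, {tid: 1})
        if op in ("unlock", "notify"):
            released[arg] = dict(clock)          # publish what this thread has seen
            clock[tid] += 1                      # later events are not covered by it
        elif op in ("lock", "wait"):
            for t, c in released.get(arg, {}).items():
                clock[t] = max(clock.get(t, 0), c)   # join with the releaser's clock
        else:                                    # "read" or "write"
            is_write = op == "write"
            for ptid, pcount, pwrite in history.get(arg, []):
                ordered = clock.get(ptid, 0) >= pcount
                conflict = ptid != tid and (is_write or pwrite)
                if conflict and not ordered and arg not in races:
                    races.append(arg)
            history.setdefault(arg, []).append((tid, clock[tid], is_write))
    return races

# The example above in trace order (the local readFile() step is omitted).
trace = [
    (1, "write", "obj"), (1, "notify", "obj"),
    (2, "wait", "obj"), (2, "write", "obj.data"), (2, "notify", "obj"),
    (1, "wait", "obj"),
    (1, "lock", "l"), (1, "write", "obj.data"), (1, "unlock", "l"),
    (2, "lock", "l"), (2, "write", "obj.data"), (2, "unlock", "l"),
]
# Hypothetical variant in which Thread2 forgets to take the lock.
unlocked = [e for e in trace if not (e[0] == 2 and e[1] in ("lock", "unlock"))]

print(hb_races(trace))     # []
print(hb_races(unlocked))  # ['obj.data']
```

With the lock in place every pair of writes to obj.data is ordered by a notify/wait or unlock/lock edge; removing Thread2's lock leaves its last write unordered with Thread1's write.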

What does HB miss?

 Every race (or false race) reported by happens-before based methods is also reported by LockSet based methods

 Fewer false positives, potential false negatives

 Why? Not every happens-before edge reflects true synchronization

Thread1:
    y = y + 1
    lock(l)
    x = x + 1
    unlock(l)

Thread2:
    lock(l)
    x = x + 1
    unlock(l)
    y = y + 1
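The effect can be reproduced with two small detectors run on the same trace. The sketch below is illustrative only: both functions are simplified (every access is treated as a write), and the trace encoding and the names `lockset_report`, `hb_report`, `schedule_a` and `schedule_b` are assumptions.

```python
def lockset_report(trace, all_locks):
    """Vanilla lockset: report variables with an empty candidate set."""
    held, cands, out = {}, {}, set()
    for tid, op, arg in trace:
        h = held.setdefault(tid, set())
        if op == "lock":
            h.add(arg)
        elif op == "unlock":
            h.discard(arg)
        else:
            c = cands.setdefault(arg, set(all_locks))
            c &= h                               # in-place intersection
            if not c:
                out.add(arg)
    return out

def hb_report(trace):
    """Happens-before via vector clocks; only unlock -> lock edges are tracked."""
    clocks, released, last, out = {}, {}, {}, set()
    for tid, op, arg in trace:
        clock = clocks.setdefault(tid, {tid: 1})
        if op == "unlock":
            released[arg] = dict(clock)
            clock[tid] += 1
        elif op == "lock":
            for t, c in released.get(arg, {}).items():
                clock[t] = max(clock.get(t, 0), c)
        else:
            for ptid, pcount in last.get(arg, []):
                if ptid != tid and clock.get(ptid, 0) < pcount:
                    out.add(arg)
            last.setdefault(arg, []).append((tid, clock[tid]))
    return out

t1 = [(1, "access", "y"), (1, "lock", "l"), (1, "access", "x"), (1, "unlock", "l")]
t2 = [(2, "lock", "l"), (2, "access", "x"), (2, "unlock", "l"), (2, "access", "y")]
schedule_a = t1 + t2   # Thread1 runs first
schedule_b = t2 + t1   # Thread2 runs first

print(lockset_report(schedule_a, {"l"}))  # {'y'}
print(hb_report(schedule_a))              # set()
print(hb_report(schedule_b))              # {'y'}
```

When Thread1 happens to run first, the unlock/lock edge on l also orders the two unprotected accesses to y, so the happens-before check stays silent; the same program under the other schedule exposes the race, and the lockset check flags y in either case.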

Race detection: summary

 First line of defence for most concurrent programs

 Many bugs just show up as race conditions

 Lockset is fast

 Lots of false positives

 Happens-before is slow

 Reports only true data races

 Potential false negatives

 There are hybrid techniques

 Compute approximations of LockSet and HB

Context bounding and Sequentialization

Context bounding

 Folk knowledge : Most concurrency bugs are shallow in terms of required context-switches

 Most bugs require very few bug fixes

 Most concurrency bugs are atomicity violations or order violations

 For an empirical study, see Shan Lu et al. 2006…2008

 Why not check concurrent programs only up to a few context switches?

 Much more efficient

CHESS: Systematic exploration

 Culmination of techniques proposed by Qadeer et al. in 2004

 Correctness primarily given by assertions in the code

 Can also use monitors

 Can detect data-races, deadlocks, etc

 Main idea: Use a scheduler that explores traces of the program deterministically , prioritizing traces having few context-switches

CHESS: Controlling scheduler

 Non-determinism source:

 Input

 Scheduling

 Timing and library

 Input non-determinism controlled by specifying fixed inputs

 Scheduling non-determinism controlled by writing deterministic scheduler

 Library non-determinism: model library code

State-space explosion

Thread1:
    x = 1
    …
    y = k

…

Threadn:
    x = 1
    …
    y = k

 Exploring k steps in each of the n threads

 Number of executions is $O(n^{nk})$

 Exploring k steps in each thread, but only c context-switches

 Number of executions is $O((n^2 k)^c \cdot n!)$

 Not exponential in k

 Additionally, scheduler can use polynomial amount of space

• Remember c spots for context switches

• Permutations of the n+c atomic blocks
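To get a feel for the two bounds, they can be evaluated for small parameters. The helper names below are made up for this illustration, and the values are the bounds themselves, not exact execution counts.

```python
from math import factorial

def unbounded_bound(n, k):
    """Bound n^(nk): n threads, k steps each, no limit on context switches."""
    return n ** (n * k)

def context_bounded_bound(n, k, c):
    """Bound (n^2 k)^c * n!: at most c context switches."""
    return (n * n * k) ** c * factorial(n)

for k in (10, 100, 1000):
    full = unbounded_bound(2, k)
    bounded = context_bounded_bound(2, k, 2)
    print(f"k={k}: bounded={bounded}, unbounded has {len(str(full))} digits")
```

For n = 2 and c = 2 the bounded figure grows quadratically in k, while the unbounded one gains digits linearly in k, that is, it grows exponentially.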

Scheduling: Picking pre-emption points

void Deposit100() {
    ChessSchedule();
    EnterCriticalSection(&cs);
    balance += 100;
    ChessSchedule();
    LeaveCriticalSection(&cs);
}

void Withdraw100() {
    int t;
    ChessSchedule();
    EnterCriticalSection(&cs);
    t = balance;
    ChessSchedule();
    LeaveCriticalSection(&cs);
    ChessSchedule();
    EnterCriticalSection(&cs);
    balance = t - 100;
    ChessSchedule();
    LeaveCriticalSection(&cs);
}

 Heuristics: More pre-emption points in critical code, etc

 Coverage guarantee: When 2 context-switches are explored, every remaining bug requires at least 3 context-switches
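The idea of exploring schedules in order of increasing pre-emption count can be sketched with a toy model in Python. This is not how CHESS is implemented: here each critical section of the deposit/withdraw example is modelled as one atomic step, and the names `explore` and `buggy_schedules` are assumptions of the sketch.

```python
def deposit_steps():
    return [lambda s: s.update(balance=s["balance"] + 100)]

def withdraw_steps():
    return [lambda s: s.update(t=s["balance"]),           # first critical section
            lambda s: s.update(balance=s["t"] - 100)]     # second critical section

def explore(threads, init, bound):
    """Yield (schedule, final_state) for every interleaving with <= bound pre-emptions."""
    def dfs(state, pcs, current, preemptions, schedule):
        enabled = [i for i, t in enumerate(threads) if pcs[i] < len(t)]
        if not enabled:
            yield schedule, state
            return
        for i in enabled:
            # Switching away from a thread that could still run is a pre-emption.
            cost = 1 if current in enabled and i != current else 0
            if preemptions + cost > bound:
                continue
            nxt = dict(state)
            threads[i][pcs[i]](nxt)
            new_pcs = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
            yield from dfs(nxt, new_pcs, i, preemptions + cost, schedule + (i,))
    yield from dfs(dict(init), (0,) * len(threads), None, 0, ())

def buggy_schedules(bound):
    threads = [deposit_steps(), withdraw_steps()]   # thread 0 deposits, thread 1 withdraws
    runs = explore(threads, {"balance": 100, "t": 0}, bound)
    # A deposit of 100 and a withdrawal of 100 should leave the balance unchanged.
    return [sched for sched, final in runs if final["balance"] != 100]

print(buggy_schedules(0))  # []
print(buggy_schedules(1))  # [(1, 0, 1)]
```

With no pre-emptions both serial orders are correct; allowing a single pre-emption between the two halves of the withdrawal exposes the lost update.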

CHESS: Summary

 Build a deterministic scheduler

 Complications: Fairness and livelocks, weak memory models

 Advantages:

 Runs real code on real systems

 Only scheduler has been replaced

 Disadvantages:

 Is mostly program agnostic

 Exhaustive testing

Sequentialization

 CHESS approach: Concurrent program + bound on context switches → explore all interleavings

 General sequentialization approach: Concurrent program + bound on context switches → Sequential program

 Then, verify sequential program using your favourite verification technique

 Many flavours of context-bounded analysis:

 PDS based (Qadeer et al.)

 Transformation based sequentialization: Eager, Lazy (Lal et al.)

 BMC based (Parlato et al.)

Sequentialization: Basic idea

 What is hard about sequentialization?

 Have to remember local variables across phases (though they don’t change)

 If exploring T1 → T2 → T1, have to remember locals of T1 across phase of T2

 Lal-Reps 2008: Instead, do a source to source transformation

 Copy each statement and global variable c times

 Now, we can explore T1 → T1 → T2 instead of T1 → T2 → T1

 Only one thread’s local variables relevant at each stage

 Replace each global variable X by X[tid][0..K]

 X[tid][i] represents the value of the global variable X the i-th time thread tid is scheduled

X := X + 1   is replaced by:

if (phase = 0)
    X[tid][0] := X[tid][0] + 1
else if (phase = 1)
    X[tid][1] := X[tid][1] + 1
…
else if (phase = K)
    X[tid][K] := X[tid][K] + 1
if (phase < K && *)
    phase++;
if phase == K + 1
    phase = 1
    Thread[tid+1]()


 A program (T1||T2) is rewritten into Seq(T1); Seq(T2); check()

check():
    for phase = 0 to K
        if (phase > 0)
            assume (X[0][phase] == X[N][phase – 1])
        for tid = 1 to N
            assume (X[tid][phase] == X[tid-1][phase])

 Roughly,

 Execute each thread sequentially

 But, at random points, guess new values for global variables

 In the end, check the guessed new values are consistent
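The guess-and-check idea can be tried out on a tiny example in Python. The sketch below is a brute-force illustration in the spirit of the scheme, not the Lal-Reps transformation itself: the non-deterministic guesses are enumerated over a small value domain, there is a single global X, and all names (`run_phases`, `sequentialized`, `interleaved`) are assumptions.

```python
from itertools import combinations_with_replacement, product

def run_phases(stmts, cuts, starts):
    """Run one thread alone; phase p starts from starts[p]. Returns X at the end of each phase."""
    bounds = (0,) + tuple(cuts) + (len(stmts),)
    ends = []
    for p, x in enumerate(starts):
        for stmt in stmts[bounds[p]:bounds[p + 1]]:
            x = stmt(x)
        ends.append(x)
    return ends

def all_cuts(stmts, K):
    """All ways to place K context-switch points in a thread."""
    return combinations_with_replacement(range(len(stmts) + 1), K)

def sequentialized(t0, t1, x0, K, domain):
    """Run t0 fully, then t1 fully, guessing X at the start of t0's later phases."""
    finals = set()
    for cuts0, cuts1 in product(all_cuts(t0, K), all_cuts(t1, K)):
        for guesses in product(domain, repeat=K):
            starts0 = (x0,) + guesses
            ends0 = run_phases(t0, cuts0, starts0)
            ends1 = run_phases(t1, cuts1, ends0)   # t1 phase p starts where t0 phase p ended
            # Final check: each guess must equal what t1 left behind in the previous phase.
            if all(starts0[p] == ends1[p - 1] for p in range(1, K + 1)):
                finals.add(ends1[K])
    return finals

def interleaved(t0, t1, x0, K):
    """Reference semantics: really interleave the phases in round-robin order."""
    finals = set()
    for cuts0, cuts1 in product(all_cuts(t0, K), all_cuts(t1, K)):
        b0 = (0,) + cuts0 + (len(t0),)
        b1 = (0,) + cuts1 + (len(t1),)
        x = x0
        for p in range(K + 1):
            for stmt in t0[b0[p]:b0[p + 1]] + t1[b1[p]:b1[p + 1]]:
                x = stmt(x)
        finals.add(x)
    return finals

T0 = [lambda x: x + 1, lambda x: x + 1]   # X := X + 1, twice
T1 = [lambda x: x * 2]                    # X := X * 2
print(sorted(sequentialized(T0, T1, 1, 1, range(10))))  # [4, 5, 6]
print(sorted(interleaved(T0, T1, 1, 1)))                # [4, 5, 6]
```

Both functions reach the same final values of X, although `sequentialized` never interleaves the two threads: the interleaving is encoded in the guessed values and the final consistency check.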

Sequentialization

Each green arrow is one part of the check!

Thread 0:

…

…

X[0][0] := X[0][0] + 1

…

…

…

X[0][1] := X[0][1] + 1

…

…

X[0][2] := X[0][2] + 1

…

Thread 1:

…

…

X[1][0] := X[1][0] + 1

…

…

…

X[1][1] := X[1][1] + 1

…

…

X[1][2] := X[1][2] + 1

…

Sequentialization

 The original Lal/Reps technique uses summarization for verification of the sequential program

 Compute summaries for the relation of initial and final values of global variables

 Extremely powerful idea

 Advantage: Reduces the need to reason about locals of different threads

 No need to reason explicitly about interleavings

 Interleavings encoded into data (variables)

 Scales linearly with number of threads

Sequentialization and BMC

 Currently, the best tools in the concurrency verification competitions use “sequentialization + BMC”

 The previous sequentialization technique is better suited for analysis techniques, not model checking

 No additional advantage using additional globals and then checking for consistency

 Instead, just explicitly use non-determinism

BMC for concurrency

 First, rewrite threads by unrolling loops and inlining function calls

 No loops

 No function calls

 Forward only control flow

 Write a driver “main” function to schedule the threads one by one

Naïve sequentialization for BMC

Main driver:
pc0 = 0, … , pcn = 0
main() {
    for (r = 0; r < K; r++)
        for (i = 0; i < n; i++)
            threadi();
}

threadi():
    switch(pci) {
        case 0: goto 0;
        case 1: goto 1;
        …
    }
    0: CS(0); stmt0;
    1: CS(1); stmt1;
    …
    M: CS(M); stmtM;

CS(j) := if(*) { pci = j; return }

• The resume mechanism jumps into “right” spot in the thread

• There is a potential CS before each statement

• What’s the problem? Lots of jumps in the control flow

• Bad for SMT encoding

Better sequentialization for BMC

Main driver:
pc0 = 0, … , pcn = 0
main() {
    for (r = 0; r < K; r++)
        for (i = 0; i < n; i++) {
            nextCS = *
            assume (nextCS >= pci)
            threadi();
            pci = nextCS
        }
}

threadi():
    0: CS(0); stmt0;
    1: CS(1); stmt1;
    …
    M: CS(M); stmtM;

CS(j) := if(j < pci || j >= nextCS) { goto j+1; }

• Avoid the multiple control flow breaking jumps

• Restricted non-determinism to one spot
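The round-robin driver can be simulated in Python by enumerating every choice of nextCS. This is a toy model for intuition rather than a BMC encoding: statements are Python callables on a shared dictionary, and the name `run_rounds` and the bank-account threads are assumptions of the sketch.

```python
def run_rounds(threads, init, K):
    """Enumerate executions of the round-robin driver with K rounds.

    In each slot, thread i picks nextCS >= pc[i] and runs statements pc[i] .. nextCS-1.
    Only executions in which every thread ran to completion are returned.
    """
    n = len(threads)
    results = []

    def step(state, pcs, slot):
        if slot == K * n:
            if all(pcs[i] == len(threads[i]) for i in range(n)):
                results.append(state)
            return
        i = slot % n
        # nextCS = *; assume(nextCS >= pc_i)
        for next_cs in range(pcs[i], len(threads[i]) + 1):
            s = dict(state)
            for j in range(pcs[i], next_cs):   # statements outside [pc_i, nextCS) are skipped
                threads[i][j](s)
            step(s, pcs[:i] + (next_cs,) + pcs[i + 1:], slot + 1)

    step(dict(init), (0,) * n, 0)
    return results

withdraw = [lambda s: s.update(t=s["balance"]),
            lambda s: s.update(balance=s["t"] - 100)]
deposit = [lambda s: s.update(balance=s["balance"] + 100)]
init = {"balance": 100, "t": 0}

one_round = {s["balance"] for s in run_rounds([withdraw, deposit], init, 1)}
two_rounds = {s["balance"] for s in run_rounds([withdraw, deposit], init, 2)}
print(one_round)           # {100}
print(sorted(two_rounds))  # [0, 100]
```

With one round each thread runs to completion in turn and the balance is preserved; the second round lets the deposit slip in between the two halves of the withdrawal, so the lost update (balance 0) becomes reachable.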

Context bounding and Sequentialization: Summary

 Host of related techniques

 Can be adapted for analysis, model checking, testing, etc

 Different techniques need different kinds of tuning

 Basic idea: Most bugs require few context switches to turn up

 Can leverage standard sequential program analysis techniques

Odds and Ends

Things we didn’t cover

Specification-free correctness

 In many cases we don’t want to write assertions

 Just want concurrent program to do the same thing as a sequential program is doing

 Standard correctness conditions

 Linearizability [Herlihy/Wing 91]

 Serializability [Papadimitriou and others 70s]

[Figure: Conc. Exec and Seq. Exec timelines of Method 0, Method 1, Method 2, Method 3]

Testing for concurrency

 Root cause of bugs

 Ordering violations

 Atomicity violations

 Data races

 Coverage metrics and coverage guided search

 Define use pairs [Tasiran et al.]

 Find ordering violations based on define use orderings

 HaPSet [Wang et al.]

 Find interesting interleavings by trying to cover all “immediate histories” of events

 CUTE/jCUTE [Sen et al.]

 Concolic testing: Accumulate constraints along test run to guide future test runs

(Symbolic) Predictive Analysis

 Analyze variations of the given concurrent trace

 Run a test and record information

 Build a predictive model by relaxing scheduling constraints

 Analyze predictive model for alternate interleavings

 Can flag false bugs

 Symbolic predictive analysis

 From a trace, build precise predictive model (as SMT formula)

 No false bugs

This is the End

 Brief overview of concurrent verification techniques

 Lecture 1: Full proof techniques

 Lecture 2: Incomplete techniques / Bug finding

 What did we learn?

 Full verification is hard, not many techniques for weak-memory architectures

 Use light-weight and incomplete techniques to detect shallow bugs

 Code using a strict concurrency discipline is more likely to be correct, easier to verify
