Anton-Types for Atomicity

Types for Atomicity
Presentation by Anton Wolkov for Seminar in
Distributed Algorithms Spring 2013
Authors:
Cormac Flanagan, UC Santa Cruz
Stephen Freund, Marina Lifshin, Williams College
Shaz Qadeer, Microsoft Research
Subject of this paper
A way to simplify verification of programs with
arbitrarily interleaved threads accessing
shared resources: an extension to the Java
type system with atomicity annotations.
The annotation task itself is simplified by
a type inference system.
Finally, running a prototype checker on
benchmark code shows the usefulness of
this type system.
What problems are we addressing?
Multi-threaded Java programs are hard to verify
manually, because they can misbehave when
non-atomic operations that write to shared
resources are interleaved in unexpected ways.
Even existing Java libraries exhibit subtle flaws
after years of community and internal QA.
We would much rather manually verify a serial
execution of a multi-threaded program.
Race Conditions
What is a race condition?
A race condition occurs when two threads
simultaneously access the same data
variable, and at least one of the accesses is a
write.
A lot of work has been done in detecting these
types of flaws with static and dynamic
analysis and there are decent tools for
detecting these errors in Java.
Current solutions to race conditions
There are methods and tools to detect race
conditions in multi-threaded Java applications
Is a race condition free program good enough?
Is there a serial execution equivalent with the
same result as any other interleaved
execution?
Race Condition (2)
Is a race condition free program good enough?
No. For example:
acquire(l); t := x; release(l);
t := t + 1;
acquire(l); x := t; release(l);
This code by definition is race condition free.
Race Condition (3)
acquire(l); t := x; release(l);
t := t + 1;
acquire(l); x := t; release(l);
If five threads each execute this code once under
arbitrary interleaving, the value of x can end up
incremented by any amount between 1 and 5.
What we would like to use is a stronger term
we call atomicity.
Introducing Atomicity
Obviously we do not require the entire program
to be serially executed in a multi-threaded
environment, so instead we would like to
have some individual methods to be safe and
atomic.
What is atomicity?
A method is atomic if, for every execution,
there is an equivalent serial execution in
which the actions of the method are not
interleaved with actions of other threads.
Introducing Atomicity Cont'd.
The thread scheduler is still free to interleave
threads, but any interaction between them is
benign, i.e. has no negative effect (access to
common resources is "safe").
In layman's terms
If a method is atomic we can assume that even if
it is interleaved with other code it will still act
the same way it would in a serial execution.
This covers race conditions but provides a
stronger guarantee.
[Figure: diagram with regions labeled "Atomic" and "Race Condition Free"]
Motivation for atomicity
What is it good for?
• Multi-threaded applications are prevalent.
• Thread interleavings are really hard for
  programmers. Flaws are commonly missed
  during development and code reviews.
• Previous efforts were invested in avoiding
  race conditions, but race condition free
  programs still may fail spectacularly in edge
  cases.
An example
class Account {
int balance = 0;
synchronized int read() {
return balance;
}
synchronized void set(int b) {
balance = b;
}
void deposit (int amount) {
int b = read();
set(b + amount);
}
}
An example
class Account {
    int balance = 0;
    synchronized int read() {
        return balance;
    }
    synchronized void set(int b) {
        balance = b;
    }
    void deposit (int amount) {
        int b = read();
        // If a new thread comes along and modifies balance here,
        // we will set the new balance to a wrong value.
        set(b + amount);
    }
}
An example
class Account {
    // This is not a race condition since we do not access
    // balance concurrently.
    int balance = 0;
    synchronized int read() {
        return balance;
    }
    synchronized void set(int b) {
        balance = b;
    }
    void deposit (int amount) {
        int b = read();
        set(b + amount);
    }
}
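One possible repair, sketched here under the assumption that it is acceptable to hold the Account's own lock across the whole read-modify-write, is to make deposit synchronized as well. The names SafeAccount and SafeAccountDemo, the thread count, and the deposit amounts are illustrative and do not come from the original slides.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical variant of the Account example: deposit holds the lock
// on this across both the read and the write.
class SafeAccount {
    private int balance = 0;

    synchronized int read() {
        return balance;
    }

    synchronized void set(int b) {
        balance = b;
    }

    // Java monitors are reentrant, so calling read() and set() while
    // already holding the lock on this does not block.
    synchronized void deposit(int amount) {
        int b = read();
        set(b + amount);
    }
}

class SafeAccountDemo {
    // Starts `threads` threads that each deposit 1 `perThread` times,
    // waits for them, and returns the final balance.
    static int run(int threads, int perThread) {
        SafeAccount account = new SafeAccount();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    account.deposit(1);
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return account.read();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000)); // prints 40000
    }
}
```

With the unsynchronized deposit from the slide, the same driver could lose updates and print a smaller number, even though no data race exists.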
Synchronized vs. Atomic
• The synchronized keyword in Java is too
  restrictive.
• It is an object-wide statement and does not
  handle escapes.
• It is too harsh: all other synchronized methods
  in the object are stopped from executing
  (serial execution).
We would like the same bottom line without
sacrificing thread interleaving completely.
Why a type system?
Programs are large, and we don't want to
check them whole.
A type system provides modularity for the
checker.
We also want to guarantee our libraries are
safe for any use.
Verification using keywords
We introduce a formal multi-threaded subset of
Java language we call AtomicJava.
Each atomic method should be annotated with the
keyword atomic or another appropriate level of
atomicity, including conditions on which locks
are held.
Quick Reminder:
Theory of right and left movers
An action a is a right mover if for any execution
where the action a performed by one thread
is immediately followed by an action b of a
different thread, the actions a and b can be
swapped without changing the resulting state
S3.
Theory of right and left movers
Similarly, an action b is a left mover if
whenever b immediately follows an action a
of a different thread, the actions a and b can
be swapped, again without changing the
resulting state.
Example of an atomic method
synchronized void inc() {
int t = x;
x = t + 1;
}
The keyword synchronized instructs Java to
hold a lock on this for the duration of the
method.
Example of an
atomic method
synchronized void inc()
{ int t = x;
x = t + 1; }
We denote the acquisition of the lock by acq and
its release by rel.
Suppose that the actions of this method are
interleaved with arbitrary actions X1, X2, and
X3 of other threads
Example of an
atomic method
synchronized void inc()
{ int t = x;
x = t + 1; }
Because the acq operation is a right mover and
the write and rel operations are left movers,
there exists an equivalent serial execution
where the operations of the method are not
interleaved with operations of other threads
(see illustration above). Thus the method is
atomic.
Theory of right and left movers
cont.
• A lock-acquire operation is a right mover.
• A lock-release operation is a left mover.
If a variable may only be accessed while a lock
is held, every access to it is both a right mover
and a left mover.
In Java this holds for fields accessed only inside
synchronized methods, since such a method performs
both the lock-acquire and the lock-release.
For short, we call an action that is both a right
and a left mover a mover.
Theory of right and left movers
cont.
More generally, suppose a method contains a
sequence of right movers followed by a single
atomic action followed by a sequence of left
movers. Then an execution where this method
has been fully executed can be reduced to
another execution with the same resulting state
where the method is executed serially without any
interleaved actions by other threads. Therefore,
an atomic annotation on such a method is valid.
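The shape described above (right movers, then at most one non-commuting action, then left movers) can be checked mechanically. The following is a minimal sketch, assuming each action of a method has already been classified by hand. The names Action, Reduction and ReductionDemo are hypothetical and not part of the original work.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical classification of the individual actions of a method.
enum Action { RIGHT, LEFT, BOTH, NON_MOVER }

class Reduction {
    // Returns true if the sequence has the shape
    // (RIGHT | BOTH)*  [NON_MOVER]  (LEFT | BOTH)*,
    // i.e. right movers, then at most one atomic action, then left movers.
    static boolean isReducible(List<Action> actions) {
        boolean committed = false; // true once the right-mover prefix has ended
        for (Action a : actions) {
            switch (a) {
                case BOTH:
                    break; // commutes in both directions, allowed anywhere
                case RIGHT:
                    if (committed) return false; // right mover after the prefix
                    break;
                case NON_MOVER:
                    if (committed) return false; // a second non-commuting action
                    committed = true;
                    break;
                case LEFT:
                    committed = true; // only LEFT or BOTH may follow
                    break;
            }
        }
        return true;
    }
}

class ReductionDemo {
    public static void main(String[] args) {
        // inc(): acq, read x, write x, rel (accesses made with the lock held)
        List<Action> inc = Arrays.asList(
                Action.RIGHT, Action.BOTH, Action.BOTH, Action.LEFT);
        // deposit(): acq, read, rel, then acq, write, rel
        List<Action> deposit = Arrays.asList(
                Action.RIGHT, Action.BOTH, Action.LEFT,
                Action.RIGHT, Action.BOTH, Action.LEFT);
        System.out.println(Reduction.isReducible(inc));     // true
        System.out.println(Reduction.isReducible(deposit)); // false
    }
}
```

The deposit sequence fails because a lock acquire (a right mover) appears after a lock release (a left mover), which is exactly the gap where another thread can intervene.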
AtomicJava
We base our formal development on the
language AtomicJava, a multi-threaded subset
of Java with a type system for atomicity.
AtomicJava keywords
Each field declaration includes a field guard g
that specifies the synchronization discipline
for that field. The possible guards are:
• final - the field cannot be written to after initialization
• guarded_by l - the lock denoted by the lock expression l
  must be held on all accesses (reads or writes) of that
  field
• write_guarded_by l - the lock denoted by the lock
  expression l must be held on all writes of that field, but
  not for reads.
• no_guard - the field can be read or written at any time.
  Useful to denote fields with intentional race conditions.
AtomicJava parameterized classes
class cn<ghost x1,...,xn> { ... }
Classes now have a binding for the ghost
variables x1...xn, which can be referred to
from type annotations within the class body.
The type cn<l1,...,ln> refers to an instantiated
version of cn where each xi in the body is
replaced by the lock expression li.
AtomicJava parameterized
methods
atomicity type method cn<ghost x1,...,xn>
(t1 y1 t2 y2 ... tm ym) { ... }
Defines a method named method with return type type
that is parameterized by ghost locks x1...xn
and takes arguments of types t1...tm with
corresponding names y1...ym.
atomicity is a keyword (like atomic) which we
will define next.
AtomicJava parameterized
methods
atomicity type method cn<ghost x1,...,xn>
(t1 y1 t2 y2 ... tm ym) { ... }
Here, we note that the atomicity a may refer to
program variables in scope, including this,
the ghost parameters of the containing class,
and normal parameters of the method itself.
AtomicJava synchronized methods
sync l { ... }
Acquires lock l, executes the expression inside,
and finally releases lock l.
This is similar to synchronized in "regular" Java,
only here we always specify the lock explicitly
(a synchronized method always locks this).
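For comparison, plain Java can also name the lock explicitly by using a synchronized block rather than a synchronized method. A minimal sketch follows; the class LockedCounter and its field names are illustrative only.

```java
// Hypothetical example: a counter guarded by an explicit lock object,
// the plain-Java counterpart of writing sync lock { ... }.
class LockedCounter {
    private final Object lock = new Object(); // explicit lock object
    private int value = 0;                    // intended to be guarded by lock

    void increment() {
        synchronized (lock) { // acquire lock, run the block, release lock
            value = value + 1;
        }
    }

    int get() {
        synchronized (lock) {
            return value;
        }
    }
}
```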
A forked thread does not inherit locks held by
its parent thread.
AtomicJava fork
e.fork
Starts a new thread that runs (object) e's run
method, which takes a single ghost parameter.
The fork operation spawns a new thread that,
conceptually, acquires a new thread-local lock
tll for instantiating the ghost parameter to
the method run. This lock is always held by
the new thread and may therefore be used by
run to guard thread-local data, and it may be
passed as a ghost parameter to other
methods that access thread-local data.
AtomicJava - everything else
Other expressions in the language include field
read and update, method calls, variable
binding and reference, conditionals, loops,
and synchronized blocks. We include basic
types for both single-word ints and double-word longs.
Only reads and writes of the former are atomic.
Reads and writes of object references are
also atomic.
Types of Atomicity - Levels
• const
  o The atomicity const describes any expression whose
    evaluation does not depend on or change any
    mutable state. Hence the repeated evaluation of a
    const expression with a given environment always
    yields the same result.
  o e.g. a method that returns a constant.
• mover
• atomic
• cmpd
• error
Types of Atomicity - Levels
• const
• mover
  o The atomicity mover describes any expression that
    both left and right commutes with operations of other
    threads.
  o e.g. an access to a field f declared as guarded_by l,
    performed with the lock l held.
• atomic
• cmpd
• error
Types of Atomicity - Levels
• const
• mover
• atomic
  o The atomicity atomic describes any expression that
    is a single atomic action, or that can be considered
    to execute without interleaved actions of other
    threads.
• cmpd
• error
Types of Atomicity - Levels
• const
• mover
• atomic
• cmpd
  o The atomicity cmpd describes compound expressions
    for which none of the preceding atomicities apply.
  o e.g. sequential atomic ops not guarded by a lock
• error
Types of Atomicity - Levels
• const
• mover
• atomic
• cmpd
• error
  o The atomicity error describes any expression
    violating the locking discipline specified by the type
    annotations.
Hierarchy of basic atomicity
const ⊏ mover ⊏ atomic ⊏ cmpd ⊏ error
Sequential composition operations
of atomicity levels
; (as in b;c) denotes sequential composition
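The composition table itself did not survive in these slides. The sketch below encodes the basic atomicities under two stated assumptions: two atomic expressions in sequence yield cmpd (which agrees with the atomic;atomic = cmpd step in the corrected List example), and every other composition yields the larger of the two operands. The closure rule atomic* = cmpd is likewise an assumption. All identifiers are illustrative.

```java
// Hypothetical encoding of the basic atomicities, declared in increasing
// order so that the enum's natural ordering is the subatomicity ordering.
enum Atomicity {
    CONST, MOVER, ATOMIC, CMPD, ERROR;

    // this ⊑ other
    boolean isSubatomicityOf(Atomicity other) {
        return compareTo(other) <= 0;
    }

    // Join: the larger of the two atomicities.
    Atomicity join(Atomicity other) {
        return compareTo(other) >= 0 ? this : other;
    }

    // Sequential composition this;next (assumed rule, see lead-in).
    Atomicity then(Atomicity next) {
        if (this == ATOMIC && next == ATOMIC) {
            return CMPD; // two atomic steps may be separated by other threads
        }
        return join(next);
    }

    // Iterative closure (assumed rule: repeating an atomic step is cmpd).
    Atomicity closure() {
        return this == ATOMIC ? CMPD : this;
    }
}

class AtomicityDemo {
    public static void main(String[] args) {
        System.out.println(Atomicity.MOVER.then(Atomicity.ATOMIC));  // ATOMIC
        System.out.println(Atomicity.ATOMIC.then(Atomicity.ATOMIC)); // CMPD
        System.out.println(Atomicity.CONST.join(Atomicity.MOVER));   // MOVER
        System.out.println(Atomicity.ATOMIC.closure());              // CMPD
    }
}
```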
Conditional atomicity
In some cases, the atomicity of an expression
depends on the locks held by the thread
evaluating that expression.
For instance, if we hold lock1 we have a
mover, otherwise we have a problem.
Formal notation: (lock1 ? mover : error)
A conditional atomicity (l ? a1 : a2) is
equivalent to atomicity a1 if the lock l is
currently held, and it is equivalent to a2 if the
lock is not held.
Atomicity - Formal definition
Each atomicity is either basic or conditional:
b ::= const | mover | atomic | cmpd | error
a ::= b | (l ? a1 : a2)
Atomicity closure
For example: (l1 ? mover : (l2 ? atomic : error))
Meaning: if lock l1 is held we have a mover, if
lock l2 is held but not l1 we have an atomic
op, otherwise we have a violation of thread
safety.
Atomicity levels formal notation
Let b;c denote the sequential composition of b
followed by c.
Let b* denote the iterative closure of b.
Let ⨆ denote the join operator based on this
subatomicity ordering. If basic atomicities b1
and b2 reflect the behavior of e1 and e2
respectively, then the nondeterministic choice
between executing either e1 or e2 has
atomicity b1⨆b2.
Conditional atomicity with lock sets
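For an atomicity a and a set of held locks ls, we
write (|a|)(ls) for the basic atomicity obtained by
resolving every condition in a against ls:
(|b|)(ls) = b for a basic atomicity b
(|l ? a1 : a2|)(ls) = (|a1|)(ls) if l ∈ ls, and (|a2|)(ls) otherwise
Meaning: a condition on a lock in the set selects
a1, and a condition on a lock outside the set selects a2.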
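Evaluating a conditional atomicity against a lock set can be sketched as a small recursive structure. In this sketch locks are represented by plain strings, and the names Basic, Atom and CondDemo are hypothetical. The example value encodes "mover if l1 is held, atomic if only l2 is held, error otherwise".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// The five basic atomicities, in increasing order.
enum Basic { CONST, MOVER, ATOMIC, CMPD, ERROR }

// An atomicity is either basic or a conditional (lock ? a1 : a2).
abstract class Atom {
    // Resolves every condition against the set of held locks.
    abstract Basic eval(Set<String> heldLocks);

    static Atom basic(Basic b) {
        return new Atom() {
            Basic eval(Set<String> heldLocks) {
                return b; // a basic atomicity ignores the lock set
            }
        };
    }

    static Atom cond(String lock, Atom ifHeld, Atom ifNotHeld) {
        return new Atom() {
            Basic eval(Set<String> heldLocks) {
                return heldLocks.contains(lock)
                        ? ifHeld.eval(heldLocks)
                        : ifNotHeld.eval(heldLocks);
            }
        };
    }
}

class CondDemo {
    // (l1 ? mover : (l2 ? atomic : error))
    static final Atom EXAMPLE = Atom.cond("l1", Atom.basic(Basic.MOVER),
            Atom.cond("l2", Atom.basic(Basic.ATOMIC), Atom.basic(Basic.ERROR)));

    static Set<String> locks(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        System.out.println(EXAMPLE.eval(locks("l1")));       // MOVER
        System.out.println(EXAMPLE.eval(locks("l2")));       // ATOMIC
        System.out.println(EXAMPLE.eval(locks()));           // ERROR
        System.out.println(EXAMPLE.eval(locks("l1", "l2"))); // MOVER
    }
}
```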
Atomicity levels calculus
Extension of iterative closure, sequential
composition, and join operations to
conditional atomicities
Theorem 1
Extension of iterative closure, sequential
composition, and join operations to
conditional atomicities with lock sets
(|a1*|)(ls)=((|a1|)(ls))*
(|a1;a2|)(ls)=(|a1|)(ls);(|a2|)(ls)
(|a1⨆a2|)(ls)=(|a1|)(ls)⨆(|a2|)(ls)
Conditional subatomicity ordering
We now extend the subatomicity ordering to
conditional atomicities.
Assume h is a set of locks held by the current
thread and n is a set of locks not held by the
current thread.
Intuitively, the condition a1 ⊑^h_n a2 holds if and
only if (|a1|)(ls) ⊑ (|a2|)(ls) holds for every
lockset ls that contains h and is disjoint from n.
Conditional subatomicity ordering
The condition a1 ⊑^h_n a2 holds if and
only if (|a1|)(ls) ⊑ (|a2|)(ls) holds for
every lockset ls that contains h and is
disjoint from n.
Formally, we define a1 ⊑ a2 to be a1 ⊑^∅_∅ a2 and
check recursively:
Theorem 2
For all atomicities a1 and a2:
a1⊑ a2 ⇔ ∀ ls: (|a1|)(ls)⊑(|a2|)(ls)
If a1 and a2 are not conditional this is trivial,
because (|a|)(ls) = a for every ls when a is
not conditional.
Atomicity Equivalence
Atomicities a1 and a2 are equivalent (a1≡a2) if
a1⊑ a2 and a2⊑ a1 and thus:
a1≡a2 ⇔ ∀ ls: (|a1|)(ls)≡(|a2|)(ls)
Figure III
Figure IV
List Example
class ListElem <ghost x> {
int num guarded_by x;
ListElem<x> next guarded_by x;
(x ? mover : error) int get() { return this.num; }
}
class List {
ListElem<this> elems guarded_by this;
(this ? mover : atomic) void add (int v) {
sync (this) {
this.elems = new ListElem<this>(v, this.elems);
}
}
atomic void addPair(int i, int j) {
this.add(i);
this.add(j);
}
(this ? mover : atomic) int get() {
sync (this) {
return this.elems.get();
}
}
}
List Example
class ListElem <ghost x> {
int num guarded_by x;
ListElem<x> next guarded_by x;
(x ? mover : error) int get() { return this.num; }
}
num and next are guarded by lock x (external
to the class).
If x is not held get() is an error; otherwise it is a
mover, as its return value may vary
depending on its position among concurrent
threads.
List Example
class ListElem <ghost x> {
int num guarded_by x;
ListElem<x> next guarded_by x;
(x ? mover : error) int get() { return this.num; }
}
class List {
ListElem<this> elems guarded_by this;
(this ? mover : atomic) void add (int v) {
sync (this) {
this.elems = new ListElem<this>(v, this.elems);
}
}
atomic void addPair(int i, int j) {
this.add(i);
this.add(j);
}
(this ? mover : atomic) int get() {
sync (this) {
return this.elems.get();
}
}
}
get() is synchronized (holds this for the duration
of the method), which makes it atomic; if this is
already held we can upgrade it to a mover.
List Example - What's wrong?
class ListElem <ghost x> {
int num guarded_by x;
ListElem<x> next guarded_by x;
(x ? mover : error) int get() { return this.num; }
}
class List {
ListElem<this> elems guarded_by this;
(this ? mover : atomic) void add (int v) {
sync (this) {
this.elems = new ListElem<this>(v, this.elems);
}
}
atomic void addPair(int i, int j) {
this.add(i);
this.add(j);
}
(this ? mover : atomic) int get() {
sync (this) {
return this.elems.get();
}
}
}
List Example Corrected
class ListElem <ghost x> {
int num guarded_by x;
ListElem<x> next guarded_by x;
(x ? mover : error) int get() { return this.num; }
}
class List {
ListElem<this> elems guarded_by this;
(this ? mover : atomic) void add (int v) {
sync (this) {
this.elems = new ListElem<this>(v, this.elems);
}
}
(this ? mover : cmpd) void addPair(int i, int j) {
this.add(i); // (this ? mover : atomic) ;
this.add(j); // (this ? mover : atomic)
// = (this ? mover : atomic;atomic)
// = (this ? mover : cmpd)
}
(this ? mover : atomic) int get() {
sync (this) {
return this.elems.get();
}
}
}
Manual Annotation
The authors manually annotated parts of the
JDK and then ran a prototype verifier.
Actual Findings
In JDK 1.4.0 StringBuffer's methods are
documented to be atomic.
StringBuffer uses lock based synchronization to
achieve atomicity.
Using a prototype checker that supports the full
functionalities of Java including inheritance
we found a failed atomicity type check in the
method append(StringBuffer sb).
The method calls sb.length() and sb.getChars(...), both specified to have
atomicity atomic, making the overall atomicity cmpd.
More findings - java.lang.String
This defect was fixed in the 1.4.2 release by
having contentEquals(StringBuffer sb)
acquire the lock on sb for the duration of the
method call.
False positives
The method String.hashCode() is also not
atomic because it caches the hashcode for
the String object in an unprotected field.
However, this can only result in redundant hash
recomputations and not erroneous behavior.
Analysis techniques that abstract away benign
atomicity violations can eliminate some
spurious warnings like this one.
More findings - java.util.Vector
interface Collection {
    /*# this ? mover : atomic */ int size();
    /*# this ? mover : atomic */ Object[] toArray(Object a[]);
}
class Vector ... {
    Object elementData[] /*# guarded_by this */;
    int elementCount /*# guarded_by this */;
    // does not type check:
    /*# atomic */ Vector(Collection c) {
        elementCount = c.size();
        elementData = new Object[(int) Math.min((elementCount*110L)/100,
                                                Integer.MAX_VALUE)];
        c.toArray(elementData);
    }
    ...
More findings - java.util.Vector
The Vector constructor should set
elementCount to the size of its argument c
and copy the contents of c into the newly-created
array elementData. However, since
the lock on c is not held between the calls to
c.size() and c.toArray(...), another
thread could concurrently modify c, resulting
either in an improperly initialized Vector or
an ArrayIndexOutOfBoundsException.
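One way to close this window, shown only as a sketch and not as the fix adopted in the JDK, is to hold the lock on c across both calls. This helps only under the assumption that c guards its own state with its intrinsic lock, as java.util.Vector does. The class SnapshotVector is a hypothetical stand-in for the constructor discussed above.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Vector;

// Hypothetical stand-in for the Vector constructor discussed above.
class SnapshotVector {
    Object[] elementData;
    int elementCount;

    SnapshotVector(Collection<?> c) {
        // Assumption: c synchronizes on itself, so holding its lock keeps
        // size() and toArray() consistent with each other.
        synchronized (c) {
            elementCount = c.size();
            int capacity = (int) Math.min((elementCount * 110L) / 100,
                                          Integer.MAX_VALUE);
            // capacity >= elementCount, so toArray fills and returns this array
            elementData = c.toArray(new Object[capacity]);
        }
    }
}

class SnapshotVectorDemo {
    public static void main(String[] args) {
        Vector<Integer> source = new Vector<>(Arrays.asList(1, 2, 3));
        SnapshotVector copy = new SnapshotVector(source);
        System.out.println(copy.elementCount);                 // 3
        System.out.println(Arrays.toString(copy.elementData)); // [1, 2, 3]
    }
}
```

For a collection that uses a different lock, or no lock at all, synchronizing on c gives no such guarantee.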
More findings - java.util.Vector #2
public class Vector ... {
    /*# this ? mover : atomic */
    public synchronized void removeElementAt(int index) { ... }
    /*# this ? mover : atomic */
    public int indexOf(Object elem) { ... }
    ...
    /*# this ? mover : atomic */
    public synchronized boolean removeElement(Object obj) {
        ...
        int i = indexOf(obj);
        if (i >= 0) {
            removeElementAt(i);
            return true;
        }
        return false;
    }
public abstract class Writer {
    // The object used to synchronize
    // operations on this stream.
    protected Object lock;
    Writer() {
        this.lock = this;
    }
    /*# lock ? mover : atomic */
    public void write(int ch) { ... }
    /*# lock ? mover : atomic */
    public void write(String str) { ... }
    ...
}
// Does not type check!
public class PrintWriter extends Writer {
    protected Writer out;
    public PrintWriter(Writer out) {
        super(); this.out = out; }
    public void print(int x) {
        synchronized (lock) {
            out.write(Integer.toString(x));
        }
    }
    // out.lock ? (lock ? mover : atomic) : atomic
    public void println() {
        synchronized (lock) {
            out.write(lineSeparator);
        }
    }
    // out.lock ? (lock ? mover : atomic) : cmpd
    public void println(int x) {
        synchronized (lock) {
            print(x);
            println();
        }
    }
}
Type Inference for Atomicity
Our type checker provides fairly promising
results, but it does require the programmer to
fully annotate the code.
We've seen that the number of annotations required
is fairly high and that the annotations are complex.
To address this shortcoming, we now develop a
type inference algorithm for atomicity.
Type Inference for Atomicity
Algorithm Overview
Two stage run:
• Infer which locks (if any) protect each field, using
  the Rcc/Sat reduction to propositional
  satisfiability.
• Infer the most precise atomicity for each
  method, using a constraint-based analysis.
Atomicity Extensions for Type
Inference
We now introduce an atomicity variable α. An
open atomicity s is either an (explicit)
atomicity a or an atomicity variable α.
Type Inference
Demo
1. infer guarded_by
2. infer class parameters
3. insert atomicity vars
Implementation
We have extended the type checker with
inference.
We've managed to avoid exponential explosion
by reducing duplicate subterms.
Consider the sequential composition of two
conditional atomicities:
(l ? a1 : a2) ; (l ? a3 : a4)
becomes
l ? (l ? (a1;a3) : (a1;a4)) : (l ? (a2;a3) : (a2;a4))
Avoiding exponential explosion
Using the following simplification rules
l ? (l ? (a1;a3) : (a1;a4)) : (l ? (a2;a3) : (a2;a4))
becomes l ? (a1;a3) : (a2;a4)
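The simplification rules themselves are missing from these slides. The sketch below assumes one plausible rule: a nested test on a lock that an enclosing test has already established as held (or not held) is redundant and can be replaced by the matching branch. Leaves are kept as opaque labels such as "a1;a3", and the names Term, Leaf and Cond are hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// A term is either an opaque leaf or a conditional (lock ? ifHeld : ifNotHeld).
abstract class Term {
    // Simplifies under the knowledge that the locks in `held` are held
    // and the locks in `notHeld` are not.
    abstract Term simplify(Set<String> held, Set<String> notHeld);
}

final class Leaf extends Term {
    final String label;
    Leaf(String label) { this.label = label; }
    Term simplify(Set<String> held, Set<String> notHeld) { return this; }
    @Override public String toString() { return label; }
}

final class Cond extends Term {
    final String lock;
    final Term ifHeld;
    final Term ifNotHeld;

    Cond(String lock, Term ifHeld, Term ifNotHeld) {
        this.lock = lock;
        this.ifHeld = ifHeld;
        this.ifNotHeld = ifNotHeld;
    }

    Term simplify(Set<String> held, Set<String> notHeld) {
        if (held.contains(lock)) return ifHeld.simplify(held, notHeld);
        if (notHeld.contains(lock)) return ifNotHeld.simplify(held, notHeld);
        // The lock's status is unknown here: keep the test, and record
        // its outcome while simplifying each branch.
        Set<String> h = new HashSet<>(held);
        h.add(lock);
        Set<String> n = new HashSet<>(notHeld);
        n.add(lock);
        return new Cond(lock, ifHeld.simplify(h, notHeld),
                              ifNotHeld.simplify(held, n));
    }

    @Override public String toString() {
        return "(" + lock + " ? " + ifHeld + " : " + ifNotHeld + ")";
    }
}

class SimplifyDemo {
    static Term nested() {
        return new Cond("l",
                new Cond("l", new Leaf("a1;a3"), new Leaf("a1;a4")),
                new Cond("l", new Leaf("a2;a3"), new Leaf("a2;a4")));
    }

    public static void main(String[] args) {
        Term simplified = nested().simplify(
                Collections.<String>emptySet(), Collections.<String>emptySet());
        System.out.println(simplified); // (l ? a1;a3 : a2;a4)
    }
}
```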
Benchmarks
Limitations
A programmer must annotate correctly, otherwise
the checker will not be very useful.
Example: The programmer wrongly assumed only
synchronized methods should be atomic:
Limitations Cont'd.
Clearly deposit should be atomic as well but
due to the lack of annotations the checker will
not warn about this.
Conclusion
Atomicity can be expressed using a type
system.
Atomicity is not the same as freedom from race conditions.
We can build an effective automatic checker for
this type of system.
We can reduce a lot of manual labour by
automatically inferring atomicity.
Questions?