revisions-FOOL-2010

Concurrent Revisions:
A deterministic concurrency model.
Daan Leijen & Sebastian Burckhardt
Microsoft Research
(invited talk at FOOL 2010)
[Figure: application domains (Algorithmic/Scientific, Special-Purpose, Interactive/Reactive; e.g. Data Mining, Biology, Chemistry, Physics, Operating System, Desktop Applications, Database, Games, Multimedia, Signal Processing) set against Parallel Programming (TPL, Fortran, MPI, Cilk, StreamIt, X10, Cuda, ...) and Concurrent Programming (Threads & Locks, Futures, Promises, Transactions, ...), with a "?" marking our focus.]
Application = Shared Data and Tasks
Example: Office application
• Save the document
• React to keyboard input by the user
• Perform a spellcheck in the background
• Exchange updates with collaborating remote users
[Figure: reader and mutator tasks concurrently accessing the shared data.]
Spacewars!
Examples from SpaceWars Game
Example 1: read-write conflict
• Render task reads the position of all game objects
• Physics task updates the position of all game objects
=> Render task needs to see a consistent snapshot
Example 2: write-write conflict
• Physics task updates the position of all game objects
• Network task updates the position of some objects
=> Network updates have priority over physics updates
Conventional Concurrency Control
Conflicting tasks cannot efficiently execute in parallel.
• Pessimistic concurrency control
  • use locks to avoid parallelism where there are (real or potential) conflicts
• Optimistic concurrency control
  • speculate on the absence of conflicts
  • roll back if there are real conflicts
Either way: true conflicts kill parallel performance.
Our Proposed Programming Model:
Revisions and Isolation Types
Revision: a logical unit of work that is forked and joined.
Isolation Type: a type which implements automatic copying/merging of versions on write-write conflict.
• Deterministic conflict resolution, never roll back
• Full concurrent reading and writing of shared data
• No restrictions on tasks (can be long-running, do I/O)
• Clean semantics
• Fast and space-efficient runtime implementation
What’s new

Traditional Task

int x = 0;
task t = fork {
  x = 1;
}
assert(x==0 || x==1);
join t;
assert(x==1);

Concurrent Revisions

versioned<int> x = 0;    // isolation type: declares shared data
revision r = rfork {     // fork revision: forks off a private copy of the shared state
  x = 1;
}
assert(x==0);            // isolation: concurrent modifications are not seen by others
join r;                  // join revision: waits for the revision to terminate and
                         // writes back changes into the main revision
assert(x==1);

• Isolation: side effects are only visible when the revision is joined.
• Deterministic execution!
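To make the isolation rule concrete, here is a minimal C# sketch of the fork/join behaviour described above. It is written for illustration and is not taken from the authors' library: it assumes shared state can be modeled as a dictionary from location names to `int` values, and `RevisionSketch`, `Fork`, `Join`, `Read` and `Write` are hypothetical names. Each revision reads its own writes first and falls back to the snapshot it was forked from; a join waits for the child and copies its writes over the joiner's (the child wins on a write-write conflict).

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical sketch, not the authors' library: a revision is the state it
// was forked from (its snapshot) plus its own local writes.
public sealed class RevisionSketch
{
    private readonly Dictionary<string, int> snapshot;
    private readonly Dictionary<string, int> writes = new Dictionary<string, int>();
    private Task task; // null for the main revision

    public RevisionSketch(Dictionary<string, int> initial)
    {
        snapshot = new Dictionary<string, int>(initial);
    }

    // Reads see this revision's own writes first, then its snapshot.
    public int Read(string loc)
    {
        if (writes.TryGetValue(loc, out int v)) return v;
        return snapshot.TryGetValue(loc, out int s) ? s : 0;
    }

    public void Write(string loc, int value) => writes[loc] = value;

    // Fork: the child gets a private copy of the current state
    // (snapshot overridden by local writes) and runs on the thread pool.
    public RevisionSketch Fork(Action<RevisionSketch> body)
    {
        var state = new Dictionary<string, int>(snapshot);
        foreach (var kv in writes) state[kv.Key] = kv.Value;
        var child = new RevisionSketch(state);
        child.task = Task.Run(() => body(child));
        return child;
    }

    // Join: wait for the child to terminate, then write its changes back.
    // On a write-write conflict the child's value wins (the default rule).
    public void Join(RevisionSketch child)
    {
        child.task?.Wait();
        foreach (var kv in child.writes) writes[kv.Key] = kv.Value;
    }
}
```

Mirroring the `versioned<int>` example: with `main` created from `{ x = 0 }`, `main.Fork(rev => rev.Write("x", 1))` leaves `main.Read("x") == 0` until `main.Join(r)`, after which it reads 1, regardless of thread timing.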
Sequential Consistency

int x = 0;
int y = 0;
task t = fork {
  if (x==0) y++;
}
if (y==0) x++;
join t;
assert(   (x==0 && y==1)
       || (x==1 && y==0)
       || (x==1 && y==1));

Transactional Memory

int x = 0;
int y = 0;
task t = fork {
  atomic { if (x==0) y++; }
}
atomic { if (y==0) x++; }
join t;
assert(   (x==0 && y==1)
       || (x==1 && y==0));

Concurrent Revisions

versioned<int> x = 0;
versioned<int> y = 0;
revision r = rfork {
  if (x==0) y++;
}
if (y==0) x++;
join r;
assert(x==1 && y==1);
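Because each revision works on its own copy, the revision version has a single outcome. The hypothetical `XYExample` below is a sequential simulation, not the library: it runs the forked body and the main body in either order against private copies of the state and then joins, and both orders yield `x == 1 && y == 1`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: the x/y example under revision semantics, simulated
// sequentially. The forked revision reads and writes a private copy of the
// state, so the result is the same whichever body runs first.
public static class XYExample
{
    public static (int x, int y) Run(bool childRunsFirst)
    {
        var main = new Dictionary<string, int> { ["x"] = 0, ["y"] = 0 };
        var child = new Dictionary<string, int>(main);   // rfork: private copy
        var childWrites = new HashSet<string>();

        Action childBody = () =>
        {
            if (child["x"] == 0) { child["y"]++; childWrites.Add("y"); }
        };
        Action mainBody = () =>
        {
            if (main["y"] == 0) main["x"]++;
        };

        if (childRunsFirst) { childBody(); mainBody(); }
        else { mainBody(); childBody(); }

        // join r: write the child's modified locations back into main
        foreach (var loc in childWrites) main[loc] = child[loc];
        return (main["x"], main["y"]);
    }
}
```

Under sequential consistency or transactional memory the order of the two bodies matters; here it does not, which is the determinism the slide claims.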
Conflict resolution
Versioned<int> x;
By default, on a write-write conflict (only), the modification in the child revision wins.
[Revision diagrams: three fork/join scenarios writing x, with outcomes assert( x==2 ), assert( x==0 ) and assert( x==1 ).]
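Custom conflict resolution
Cumulative<int, (main,join,orig) => main + join - orig> x;
[Revision diagram 1: x = 0 is forked; the two revisions perform x += 1 and x += 2 (values 1 and 2); the join computes merge(1,2,0) = 3; assert( x==3 ).]
[Revision diagram 2: as before, merge(1,2,0) = 3; a further revision, forked when x was 2, performs x += 3 (value 5); its join computes merge(3,5,2) = 6; assert( x==6 ).]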
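The cumulative merge simply adds the joinee's net change (`join - orig`) to the main revision's current value. The hypothetical `CumulativeInt` class below replays the arithmetic of both diagrams; it models only the merge function, not the runtime.

```csharp
using System;

// Hypothetical sketch of the merge function of Cumulative<int>:
// merge(main, join, orig) = main + join - orig, i.e. the joinee's net
// change (join - orig) is added to the main revision's current value.
public static class CumulativeInt
{
    public static int Merge(int main, int join, int orig) => main + join - orig;

    // First diagram: from 0, one revision adds 1 and the other adds 2.
    public static int FirstDiagram()
    {
        int orig = 0;
        int main = orig + 1;                 // x += 1
        int joinee = orig + 2;               // x += 2
        return Merge(main, joinee, orig);    // merge(1,2,0) = 3
    }

    // Second diagram: after merge(1,2,0) = 3, a revision whose snapshot
    // was 2 adds 3 (reaching 5) and is joined: merge(3,5,2) = 6.
    public static int SecondDiagram()
    {
        int afterFirstJoin = Merge(1, 2, 0);                     // 3
        int nestedOrig = 2;
        int nestedValue = nestedOrig + 3;                        // 5
        return Merge(afterFirstJoin, nestedValue, nestedOrig);   // 6
    }
}
```

Because the merge only uses the difference `join - orig`, increments performed in different revisions are never lost, whichever revision is joined last.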
A software engineering perspective
• Transactional memory:
  • Code-centric: put "atomic" in the code
  • Granularity:
    • too broad: too many conflicts and no parallel speedup
    • too small: potential races and incorrect code
• Concurrent revisions:
  • Data-centric: put annotations on the data
  • Granularity: group data that have mutual constraints together; e.g. if the invariant x + y > 0 should hold, then x and y should be versioned together.
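A small illustration of why grouping matters, using hypothetical values and names (`Granularity`, `SeparateLocations`, `PairAsOneLocation`) and the default rule that the joinee wins a write-write conflict. Starting from x = 2, y = 2 with the invariant x + y > 0, the child sets x = -1 while the main revision sets y = -1; each side alone keeps the invariant. Versioned separately, the two writes do not conflict and both survive; versioned as one pair, the child's pair replaces the main one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of data-centric granularity. Invariant: x + y > 0.
// From x = 2, y = 2 the child sets x = -1 while the main revision sets y = -1.
public static class Granularity
{
    // x and y versioned separately: no write-write conflict, both writes survive.
    public static (int x, int y) SeparateLocations()
    {
        var main = new Dictionary<string, int> { ["x"] = 2, ["y"] = 2 };
        var childWrites = new Dictionary<string, int> { ["x"] = -1 }; // child: x = -1 (sum 1)
        main["y"] = -1;                                               // main: y = -1 (sum 1)
        foreach (var kv in childWrites) main[kv.Key] = kv.Value;      // join
        return (main["x"], main["y"]);                                // (-1, -1): sum -2
    }

    // x and y versioned together as one location holding a pair.
    public static (int x, int y) PairAsOneLocation()
    {
        (int x, int y) main = (2, 2);
        (int x, int y) child = main;   // fork: the child copies the pair
        child = (-1, child.y);         // child: x = -1, i.e. writes the pair (-1, 2)
        main = (main.x, -1);           // main: y = -1, i.e. writes the pair (2, -1)
        main = child;                  // join: conflict on the pair, the child wins
        return main;                   // (-1, 2): sum 1, invariant holds
    }
}
```

With the pair as the unit of versioning, every state after a join is one that some revision actually produced, so an invariant maintained by each revision is preserved.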
Current Implementation: C# library
• For each versioned object, maintain multiple copies
  • Map revision ids to versions
  • 'mostly' lock-free array

    Revision | Value
    ---------+------
           1 |     0
          40 |     2
          45 |     7

• New copies are allocated lazily
  • Don't copy on fork… copy on first write after fork
• Old copies are released on join
  • No space leak
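The following C# sketch illustrates the bookkeeping idea in simplified form; it is not the library's implementation. As an assumption, ids are attached to hypothetical `Segment` objects (stretches of a revision's history) rather than to whole revisions: on a fork, parent and child each continue in a fresh segment whose parent is the fork-time segment, so later parent writes stay invisible to the child. The hypothetical `Versioned<T>` maps segment ids to values, reads walk up the segment chain, a copy is created only on the first write, and a join moves the joinee's version into the joiner's segment and releases the joinee's copy. A plain lock stands in for the "mostly" lock-free array, and only one segment per joinee is handled.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical: a segment is a stretch of one revision's history. On a fork,
// parent and child each continue in a fresh segment whose Parent is the
// segment that was current at the fork.
public sealed class Segment
{
    private static int counter;
    public int Id { get; } = Interlocked.Increment(ref counter);
    public Segment Parent { get; }
    public Segment(Segment parent) { Parent = parent; }
}

// Hypothetical versioned cell: maps segment ids to versions.
public sealed class Versioned<T>
{
    private readonly Dictionary<int, T> versions = new Dictionary<int, T>();
    private readonly object gate = new object(); // a lock instead of a "mostly" lock-free array

    public Versioned(Segment root, T initial) { versions[root.Id] = initial; }

    public int VersionCount { get { lock (gate) return versions.Count; } }

    // Read: the nearest version on the path from this segment to the root.
    public T Get(Segment s)
    {
        lock (gate)
        {
            for (var cur = s; cur != null; cur = cur.Parent)
                if (versions.TryGetValue(cur.Id, out T v)) return v;
            throw new InvalidOperationException("no version visible from this segment");
        }
    }

    // Write: a copy for this segment is created lazily, on its first write.
    public void Set(Segment s, T value)
    {
        lock (gate) versions[s.Id] = value;
    }

    // Join (default rule, joinee wins): move the joinee's version, if any,
    // into the joiner's segment and release the joinee's copy.
    public void Join(Segment joiner, Segment joinee)
    {
        lock (gate)
        {
            if (versions.TryGetValue(joinee.Id, out T v))
            {
                versions[joiner.Id] = v;
                versions.Remove(joinee.Id);
            }
        }
    }
}
```

Forking costs nothing per object here: creating the two segments touches no `Versioned<T>` at all, which is the "don't copy on fork" property.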
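Demo
using System;

class Sample
{
    [Versioned]          // i is versioned shared data
    int i = 0;

    public void Run()
    {
        // Fork two revisions; each works on its own copy of i
        var t1 = CurrentRevision.Fork(() => {
            i += 1;
        });
        var t2 = CurrentRevision.Fork(() => {
            i += 2;
        });
        i += 3;          // the main revision's own modification
        // Join in order: each join writes changes back into the main revision
        CurrentRevision.Join(t1);
        CurrentRevision.Join(t2);
        Console.WriteLine("i = " + i);
    }
}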
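Under the semantics described earlier, the value printed by this demo is determined by the merge function and the join order, not by thread timing. The hypothetical `SampleReplay` below replays the demo with plain integers: with the default rule (the joined revision wins a write-write conflict) `i` ends at 2, and with the cumulative merge `main + join - orig` it ends at 6. This is a model of the semantics, not output captured from the library.

```csharp
using System;

// Hypothetical replay of the Sample demo with explicit snapshots.
// Both forked revisions start from i = 0, while the main revision adds 3.
public static class SampleReplay
{
    public static int Run(Func<int, int, int, int> merge)   // merge(main, join, orig)
    {
        int orig = 0;                    // value of i when t1 and t2 are forked
        int t1 = orig + 1;               // t1: i += 1
        int t2 = orig + 2;               // t2: i += 2
        int main = orig + 3;             // main: i += 3
        main = merge(main, t1, orig);    // Join(t1): both sides wrote i, so merge is called
        main = merge(main, t2, orig);    // Join(t2): main has changed since t2's fork
        return main;
    }

    // Default behaviour on a write-write conflict: the joinee wins.
    public static int JoineeWins(int main, int join, int orig) => join;

    // Cumulative merge: add the joinee's net change.
    public static int Cumulative(int main, int join, int orig) => main + join - orig;
}
```

Swapping `[Versioned]` for a cumulative isolation type would, under this model, change the result from 2 to 6 without touching the task code.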
Demo: Sandbox
using System;

class Sandbox
{
    [Versioned]
    int i = 0;

    public void Run()
    {
        // Fork a revision without forking an associated task/thread
        var r = CurrentRevision.Branch("FlakyCode");
        try {
            // Run code in a certain revision
            r.Run(() =>
            {
                i = 1;
                throw new Exception("Oops");
            });
            // Merge changes in a revision into the main one
            CurrentRevision.Merge(r);
        }
        catch {
            // Abandon a revision and don't merge its changes
            CurrentRevision.Abandon(r);
        }
        Console.WriteLine("\n i = " + i);
    }
}
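The sandbox pattern can be modeled without threads. In the hypothetical sketch below (the `Branch` and `SandboxDemo` names are illustrative, not the library API), a branch takes a snapshot of the main state, keeps its writes private while code runs "in" it on the calling thread, and either merges them back or abandons them. When the flaky code throws, `i` stays 0.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical branch: a revision without a task of its own. Code run
// "in" the branch sees a snapshot and writes privately; Merge writes the
// changes back, Abandon discards them.
public sealed class Branch
{
    private readonly Dictionary<string, int> main;
    private readonly Dictionary<string, int> snapshot;
    private readonly Dictionary<string, int> writes = new Dictionary<string, int>();

    public Branch(Dictionary<string, int> main)
    {
        this.main = main;
        snapshot = new Dictionary<string, int>(main);
    }

    public int Read(string loc) => writes.TryGetValue(loc, out int v) ? v : snapshot[loc];
    public void Write(string loc, int value) => writes[loc] = value;

    // Runs the body on the calling thread, against this branch's state.
    public void Run(Action<Branch> body) => body(this);

    public void Merge()
    {
        foreach (var kv in writes) main[kv.Key] = kv.Value;  // default rule: branch wins
        writes.Clear();
    }

    public void Abandon() => writes.Clear();
}

public static class SandboxDemo
{
    // Returns the final value of i; i stays 0 when the flaky code throws.
    public static int Run(bool flaky)
    {
        var state = new Dictionary<string, int> { ["i"] = 0 };
        var r = new Branch(state);
        try
        {
            r.Run(b =>
            {
                b.Write("i", 1);
                if (flaky) throw new Exception("Oops");
            });
            r.Merge();
        }
        catch (Exception)
        {
            r.Abandon();   // the write i = 1 never reaches the main state
        }
        return state["i"];
    }
}
```

Isolation is what makes the rollback free: the failed code never wrote to the main state, so abandoning it only drops private writes.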
Operational Semantics
• By construction, there is no 'global' state: just local state for each revision.
• State is simply a (partial) function $\sigma : \mathit{Loc} \rightharpoonup \mathit{Val}$ from a location to a value.
• For some revision $r$, with snapshot $\sigma$, local modifications $\tau$, and an expression context $\mathcal{E}$ whose hole holds a redex such as $(\lambda x.e)\,v$: the state is the composition $\sigma :: \tau$ of the root snapshot and the local modifications.
• On a fork, the snapshot of the new revision $r'$ is the current state: $\sigma :: \tau$.
• On a join, the writes of the joinee $r'$ take priority over the writes of the current revision: $\tau :: \tau'$.

Custom merge: per location (type)
• On a join, use a merge function.
• No conflict if a location was not written in the joinee.
• Standard merges: no conflict if a location was unmodified in the current revision; use the value of the joinee.
• Conflict otherwise: use a location/type-specific merge function.
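The per-location rule can be sketched as a small C# function. This is a hedged reading of the slide, not the formal definition: `JoinRule.JoinLocation` is a hypothetical name, the `merge(main, join, orig)` argument order follows the `Cumulative` example, and "modified in the current revision" is taken to mean modified since the joinee was forked, which matches the conflict examples where a change made before the fork does not trigger the merge.

```csharp
using System;

// Hypothetical sketch of the per-location join rule. For one location:
//  - not written in the joinee            -> keep the current value (no conflict)
//  - written in the joinee only           -> take the joinee's value (no conflict)
//  - written on both sides since the fork -> call the location's merge function
public static class JoinRule
{
    public static int JoinLocation(
        int current, bool modifiedByCurrentSinceFork,
        int joinee, bool writtenByJoinee,
        int orig,                                // value in the joinee's snapshot
        Func<int, int, int, int> merge)          // merge(main, join, orig)
    {
        if (!writtenByJoinee) return current;
        if (!modifiedByCurrentSinceFork) return joinee;
        return merge(current, joinee, orig);
    }
}
```

For a `Cumulative<int>` location that was 2 at the fork and reached 5 in the joinee, with no change in the main revision since the fork, the result is 5 and the merge function is never called.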
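What is a conflict?
Cumulative<int> x = 0
• Merge is only called if: (1) there is a write in the child, and (2) there is a modification in the main revision.
[Revision diagrams: a child revision performs x += 2 (0 → 2); another child revision, forked when x is 2, performs x += 3 (2 → 5). The main revision does not modify x after either fork, so neither join is a conflict and the merge function is not called; assert( x == 5 ).]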
Merging with failure
If a merge fails, we simply ignore all writes of the joinee.
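A sketch of a join whose merge function can fail, assuming (as one reading of the rule above) that a single failed merge discards every write of the joinee. `FailingJoin`, its `Merge` delegate and `FailOnConflict` are hypothetical names. `FailOnConflict` fails on every write-write conflict, which is the behaviour snapshot isolation requires of a committing transaction.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a merge function may fail (returns false). A failed
// merge means none of the joinee's writes are applied.
public static class FailingJoin
{
    public delegate bool Merge(int main, int join, int orig, out int result);

    public static void Join(
        Dictionary<string, int> mainState,
        ISet<string> mainWrittenSinceFork,
        Dictionary<string, int> joineeWrites,
        Dictionary<string, int> joineeSnapshot,
        Merge merge)
    {
        // Compute all updates first so that a failure leaves mainState untouched.
        var updates = new Dictionary<string, int>();
        foreach (var kv in joineeWrites)
        {
            int value = kv.Value;
            bool conflict = mainWrittenSinceFork.Contains(kv.Key);
            if (conflict && !merge(mainState[kv.Key], kv.Value, joineeSnapshot[kv.Key], out value))
                return; // the merge failed: ignore every write of the joinee
            updates[kv.Key] = value;
        }
        foreach (var kv in updates) mainState[kv.Key] = kv.Value;
    }

    // Snapshot-isolation style merge: any write-write conflict fails.
    public static bool FailOnConflict(int main, int join, int orig, out int result)
    {
        result = main;
        return false;
    }
}
```

Replacing `FailOnConflict` with a merge that always succeeds (such as the cumulative one) turns the same join into a deterministic resolution instead of an abort.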
Snapshot isolation
• Widely used in databases, for example Oracle and Microsoft SQL Server
• In essence, under snapshot isolation a concurrent transaction can only complete in the absence of write-write conflicts.
• Our calculus generalizes snapshot isolation:
  • We support arbitrary nesting
  • We allow custom merge functions to resolve write-write conflicts deterministically
Snapshot isolation
We can succinctly model snapshot isolation as:
• Disallow nesting
• Use the default merge:
Some versions of snapshot isolation do not treat
silent writes in a transaction as a conflict:
Sequential merges
• We can view each location as an abstract data type (i.e. an object) with certain operations (i.e. methods).
• If a merge function always behaves as if concurrent operations on those objects were sequential, we call it a sequential merge.
• Such objects always behave as if the operations in the joinee are all done sequentially at the join point.
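Sequential merges
[Revision diagram: x = o; the operations u run before the fork; afterwards the main revision performs w1 while the joinee performs w2.]
• A merge is sequential if:
  $\mathrm{merge}(u w_1(o),\; u w_2(o),\; u(o)) = u w_1 w_2(o)$
• And $u w_1 w_2(o)$ … $\mathrm{merge}(u w_1(o),\; u w_2(o),\; u(o))$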
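The first condition can be checked directly for an additive counter. In the hypothetical `SequentialMergeCheck` below, each of u, w1 and w2 is a list of increments; as one reading of the diagram, w1 runs in the main revision after the fork and w2 in the joinee. The cumulative merge satisfies merge(uw1(o), uw2(o), u(o)) = uw1w2(o), while "joinee wins" does not (it drops w1).

```csharp
using System;
using System.Linq;

// Hypothetical check of the sequential-merge property for an additive counter.
// o: initial value; u: increments before the fork; w1: increments in the main
// revision after the fork; w2: increments in the joinee.
public static class SequentialMergeCheck
{
    // Applies the increment sequences to o, in order.
    static int Apply(int o, params int[][] opSeqs) =>
        opSeqs.SelectMany(s => s).Aggregate(o, (acc, d) => acc + d);

    static int Merge(int main, int join, int orig) => main + join - orig;

    public static bool IsSequential(int o, int[] u, int[] w1, int[] w2) =>
        Merge(Apply(o, u, w1), Apply(o, u, w2), Apply(o, u)) == Apply(o, u, w1, w2);

    // The default "joinee wins" rule, checked against the same property.
    public static bool JoineeWinsIsSequential(int o, int[] u, int[] w1, int[] w2) =>
        Apply(o, u, w2) == Apply(o, u, w1, w2);
}
```

With o = 0, u = [+2], w1 = [+1], w2 = [+3], the left-hand side is merge(3, 5, 2) = 6 and the right-hand side is 0 + 2 + 1 + 3 = 6.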
Abelian merges
• For any abstract data type that forms an abelian group (associative, commutative, with inverses) with neutral element 0 and an operation $\oplus$, the following merge is sequential:
  $\mathrm{merge}(v, v', v_0) = v \oplus v' \ominus v_0$
  where $\ominus v_0$ denotes adding the inverse of $v_0$.
• This holds, for example, for additive integers and additive sets.
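A generic version of this merge, written against a hypothetical `IAbelianGroup<T>` interface (not part of any library). Integers under addition are one instance; as an assumption, "additive sets" are modeled here as multisets with signed counts (element to count), which form an abelian group under pointwise addition.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical interface for an abelian group (T, Add, Zero, Negate).
public interface IAbelianGroup<T>
{
    T Zero { get; }
    T Add(T a, T b);
    T Negate(T a);
}

public static class AbelianMerge
{
    // merge(v, v', v0) = v (+) v' (+) (-v0)
    public static T Merge<T>(IAbelianGroup<T> g, T v, T vPrime, T v0) =>
        g.Add(g.Add(v, vPrime), g.Negate(v0));
}

// Integers under addition.
public sealed class IntAdd : IAbelianGroup<int>
{
    public int Zero => 0;
    public int Add(int a, int b) => a + b;
    public int Negate(int a) => -a;
}

// Multisets with signed counts (element -> count) under pointwise addition;
// entries whose count becomes 0 are removed.
public sealed class SignedBag : IAbelianGroup<Dictionary<string, int>>
{
    public Dictionary<string, int> Zero => new Dictionary<string, int>();

    public Dictionary<string, int> Add(Dictionary<string, int> a, Dictionary<string, int> b)
    {
        var r = new Dictionary<string, int>(a);
        foreach (var kv in b)
        {
            r.TryGetValue(kv.Key, out int c);
            if (c + kv.Value == 0) r.Remove(kv.Key); else r[kv.Key] = c + kv.Value;
        }
        return r;
    }

    public Dictionary<string, int> Negate(Dictionary<string, int> a) =>
        a.ToDictionary(kv => kv.Key, kv => -kv.Value);
}
```

For integers this is exactly the cumulative merge (`main + join - orig`); for the bag, if the original is {a}, the main revision adds b and the joinee removes a and adds c, the merge yields {b, c}.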
SpaceWars Game
[Figure: the sequential game loop (Process Inputs, Simulate Physics, Parallel Collision Detection, Render Screen, Play Sounds, Send/Receive, Autosave) operating on the shared state, connected to the Keyboard, Graphics Card, Network Connection and Disk.]
Revision Diagram for Parallelized Game Loop
[Figure: revisions for Physics, Render, Network and Collision Detection 1 to 4, alongside a long-running Autosave revision.]
"Problem Example 1" is solved
• Render task reads the position of all game objects
• Physics task updates the position of all game objects
• No interference!
"Problem Example 2" is solved.
• Physics task updates the position of all game objects
• Network task updates the position of some objects
• Network updates have priority over physics updates
• Order of joins establishes precedence!
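The precedence rule can be illustrated with one object's position. In the hypothetical `GameFrame.Step` below, render, physics and network revisions are all forked from the same state; physics moves the object by an assumed step of +1 and network optionally writes a received position. Since each joinee's write overwrites the current value under the default rule, the revision joined last wins, and render only ever sees the fork-time snapshot.

```csharp
using System;

// Hypothetical sketch: one frame of the parallelized loop, reduced to the
// position of a single object. All revisions are forked from the same state.
public static class GameFrame
{
    public static (int rendered, int final) Step(int position, int? networkPosition, bool networkJoinedLast)
    {
        int snapshot = position;              // state at fork time
        int rendered = snapshot;              // render: reads the snapshot only
        int? physicsWrite = snapshot + 1;     // physics: moves the object (hypothetical +1 step)
        int? networkWrite = networkPosition;  // network: writes a position if an update arrived

        // Join the revisions in the given order. With the default rule a
        // joinee's write overwrites the current value, so the last join wins.
        int?[] joinOrder = networkJoinedLast
            ? new[] { physicsWrite, networkWrite }
            : new[] { networkWrite, physicsWrite };

        int current = position;
        foreach (int? write in joinOrder)
            if (write.HasValue) current = write.Value;
        return (rendered, current);
    }
}
```

Joining network after physics gives the network update priority; when no network update arrives, the physics result is kept.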
Results
• Autosave now perfectly unnoticeable in the background
• Overall speed-up: 3.03x on a four-core machine (almost completely limited by the graphics card)
Overhead:
How much does all the copying and the indirection cost?
Conclusion
Revisions and Isolation Types simplify the parallelization of applications with tasks that
• exhibit conflicting accesses to shared data
• have unpredictable latency
• have unpredictable data access patterns
• may perform I/O that cannot be rolled back
Revisions and Isolation Types
• are easy to reason about (determinism, isolation)
• have low enough overhead for many applications
Questions?
• daan@microsoft.com
• sburckha@microsoft.com
• Download available soon on CodePlex