det2 - Vijay Saraswat

Determinate Imperative Programming: The CF Model
Vijay Saraswat
IBM TJ Watson Research Center
Joint work with Radha Jagadeesan, Armando Solar-Lezama, Christoph von Praun
http://www.saraswat.org/cf.html
Outline

Problem:
  Many concurrent imperative programs are determinate.
  Determinacy is not apparent from the syntax.
Basic idea
  A variable is the stream of values written to it by a thread.
Many examples
Semantics
Implementation
Future work
Background: X10

Five basic themes:
  Partitioned address space
  Pervasive explicit asynchrony (Cilk-style recursive parallelism)
  Java base
  Guaranteed VM invariants
  Explicit, distributed VM
Few language extensions
  <s> = async <s>
  <s> = finish <s>
  <s> = foreach ( <v>, …, <v> in <e>) <s>
  Multidimensional arrays over distributions
Subsumes MPI, OpenMP, SPMD languages, Cilk …
X10: clocks, clocked final data structures

Clocks can be created dynamically.
Activities are registered with clocks.
An activity may register a newly created activity with one of its clocks.
Clock advances when all activities registered on it resume the clock.
Operations
  c.resume(); next;
  c.drop();
“next;” resumes each clock; blocks until each clock advances.
  This is sufficient for deadlock-freedom.
  Adequate for parallel operations on arrays
  But not dataflow
Clocked final datum
  In each phase of the clock the datum is immutable.
  Read gets current value; write updates in next phase.
Clocks do not introduce deadlock; clocked finals are determinate.
Clocked final example: Array relaxation
G elements are assigned to at most once in each phase of clock c.
int clocked (c) final [0:M-1,0:N-1] G = …;
// Each activity is registered on c.
finish foreach (int i,j in [1:M-1,1:N-1]) clocked (c) {
  for (int p in [0:TimeStep-1]) {
    // Read current value of cell. Write visible (only) when clock advances.
    G[i,j] = omega/4*(G[i-1,j]+G[i+1,j]+G[i,j-1]+G[i,j+1])+(1-omega)*G[i,j];
    next; // Wait for clock to advance.
  }
}
Takeaway: Each cell is assigned a clocked stream of immutable values.
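The phase behaviour of a clocked final can be imitated sequentially. The sketch below is an illustration only, not X10: the names relax_phase and relax are hypothetical, the grid is a list of lists, and the loop covers interior cells so that every neighbour exists. Each phase reads the current grid and writes into a separate next-phase grid, which stands in for "next;".

```python
import random

def relax_phase(G, omega, order):
    """One clock phase: every read sees the current G, every write lands in the next phase."""
    nxt = [row[:] for row in G]              # next-phase values; boundary cells are carried over
    for (i, j) in order:                     # visiting order stands in for the scheduler
        nxt[i][j] = (omega / 4 * (G[i-1][j] + G[i+1][j] + G[i][j-1] + G[i][j+1])
                     + (1 - omega) * G[i][j])
    return nxt                               # "next;": writes become visible together

def relax(G, omega, steps, seed=None):
    """Run several phases, shuffling the cell order in each one."""
    M, N = len(G), len(G[0])
    cells = [(i, j) for i in range(1, M - 1) for j in range(1, N - 1)]
    rng = random.Random(seed)
    for _ in range(steps):
        rng.shuffle(cells)
        G = relax_phase(G, omega, cells)
    return G

grid = [[float(i * 4 + j) for j in range(4)] for i in range(4)]
# Two different "schedules" produce the same grid.
print(relax(grid, 0.5, 3, seed=1) == relax(grid, 0.5, 3, seed=2))
```

Because no cell ever observes a write made in the same phase, the order in which cells are visited cannot affect the result.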
Imperative Programming Revisited

Variables
  Value in a Box
  Read: fetch current value
  Write: change value
  Stability condition: Value does not change unless a write is performed
Very powerful
  Permit repeated many-writer, many-reader communication through arbitrary reference graphs
Asynchrony introduces indeterminacy
  int x = 0;
  async x=1;
  print(x);
  May write out either 0 or 1.
Reader-reader, reader-writer, writer-writer conflicts.
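The indeterminacy of the three-line program above can be checked by enumerating schedules. This is a hedged sketch in Python rather than X10: interleavings and run are hypothetical helpers, and each activity is reduced to a list of abstract steps.

```python
def interleavings(a, b):
    """Yield every merge of step lists a and b that preserves the order within each list."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def run(schedule):
    """Execute one schedule against a value-in-a-box variable x; return what was printed."""
    store, out = {"x": 0}, []
    for step in schedule:
        if step == "write":
            store["x"] = 1          # async x=1;
        else:
            out.append(store["x"])  # print(x);
    return out

parent = ["print"]
child = ["write"]
outcomes = {tuple(run(s)) for s in interleavings(parent, child)}
print(sorted(outcomes))  # [(0,), (1,)]
```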
Determinate Concurrent Imperative Frameworks

Asynchronous Kahn networks
  Nodes can be thought of as (continuous) functions over streams.
  Pop/peek
  Push
  Node-local state may mutate arbitrarily
Concurrent Constraint Programming
  Tell constraints
  Ask if a constraint is true
  Subsumes Kahn networks (dataflow).
  Subsumes (det) concurrent logic programming, lazy functional programming
Do not support arbitrary mutable variables.
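A Kahn network can be approximated with Python generators, where iterating over the upstream node plays the role of pop and yield plays the role of push. This is a loose illustration; the node names source, scale and pairwise_sum are hypothetical and do not come from the talk.

```python
from itertools import islice

def source(start=0):
    """Node with local mutable state that pushes start, start+1, ..."""
    n = start
    while True:
        yield n      # push
        n += 1       # node-local state may mutate freely

def scale(stream, k):
    """Node that pops one value and pushes k times that value."""
    for v in stream:             # pop: waits for the upstream node
        yield k * v

def pairwise_sum(stream):
    """Node that pops two values and pushes their sum."""
    it = iter(stream)
    for a, b in zip(it, it):
        yield a + b

network = pairwise_sum(scale(source(), 10))
print(list(islice(network, 3)))  # [10, 50, 90]
```

Each node is a function from input stream to output stream, so the output depends only on the inputs, not on when the nodes run.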
Determinate Concurrent Imperative Frameworks

Safe Asynchrony (Steele 1991)
  Parent may communicate with children.
  Children may communicate with parent.
  Siblings may communicate with each other only through commutative, associative writes (“commuting writes”).
Good:
  int x=0;
  finish foreach (int i in 1:N) {
    x += i;
  }
  print(x); // N*(N+1)/2
Bad:
  int x=0;
  finish foreach (int i in 1:N) {
    x += i;
    async print(x);
  }
Useful but limited. Does not permit dataflow synchronization.
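The difference between the two programs can be seen by trying every order in which the siblings might run. The sketch below is a simplification: run_good and run_bad are hypothetical names, each sibling runs atomically, and the "async print(x)" is sampled immediately after its sibling's write (a real schedule could delay it further, which only adds outcomes).

```python
from itertools import permutations

def run_good(order):
    """Siblings only perform commuting writes; the parent reads after finish."""
    x = 0
    for i in order:
        x += i
    return x

def run_bad(order):
    """Each sibling also reads x, so intermediate sums leak out."""
    x, printed = 0, []
    for i in order:
        x += i
        printed.append(x)    # stand-in for "async print(x)"
    return printed

N = 4
orders = list(permutations(range(1, N + 1)))
print({run_good(o) for o in orders})             # {10}
print(len({tuple(run_bad(o)) for o in orders}))  # more than one outcome
```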
The CF Basic model

A shared variable is a stream of immutable values.
Each activity maintains an index i + clean/dirty bit for every shared variable.
  Initially i=1, v[0] contains initial value.
  Read: If clean, block until v[i] is written and return v[i++] else return v[i-1]. Mark as clean.
  Write: Write into v[i++]. Mark as dirty.
A read stutters (returns value in last phase) if no activity can write in this phase.
  E.g. for local variables.
World Map = Collection of indices for an activity.
Index transmission rules.
  Activity initialized with current world map of parent activity.
  On finish, world map of activity is lubbed with world map of finished activities. (clean lub dirty = clean)
All programs are determinate and scheduler independent.
  May deadlock … nexts are not conjunctive.
The clock of clocked final is made implicit.
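The read and write rules above can be sketched with threads and a condition variable. This is an illustration under stated assumptions, not the authors' implementation: the classes SharedVar and Activity are hypothetical, a declaration with an initial value is assumed to count as the write of v[0] (so the declaring activity holds index 1 and is dirty), and stuttering and the lub on finish are left out.

```python
import threading

class SharedVar:
    """A shared variable as a stream v[0], v[1], ... of immutable values."""
    def __init__(self, initial):
        self.v = {0: initial}
        self.cond = threading.Condition()

class Activity:
    """Holds a world map: one [index, dirty] entry per shared variable."""
    def __init__(self, world_map=None):
        # A child starts with a copy of its parent's current world map.
        self.map = {var: list(e) for var, e in (world_map or {}).items()}

    def declare(self, initial):
        # Assumption: the declaration is the write of v[0], leaving index 1, dirty.
        var = SharedVar(initial)
        self.map[var] = [1, True]
        return var

    def read(self, var):
        entry = self.map[var]
        i, dirty = entry
        if dirty:                          # dirty: return v[i-1], mark clean
            entry[1] = False
            return var.v[i - 1]
        with var.cond:                     # clean: block until v[i] is written
            var.cond.wait_for(lambda: i in var.v)
        entry[0] = i + 1
        return var.v[i]

    def write(self, var, value):
        entry = self.map[var]
        with var.cond:
            var.v[entry[0]] = value        # write into v[i++]
            var.cond.notify_all()
        entry[0] += 1
        entry[1] = True                    # mark dirty

def example(start_writer_first):
    main = Activity()
    x = main.declare(0)                    # shared int x=0;
    out = []
    def a1(act):                           # reads x twice
        out.append(act.read(x))
        out.append(act.read(x))
    def a2(act):                           # x=1; x=2;
        act.write(x, 1)
        act.write(x, 2)
    bodies = [a2, a1] if start_writer_first else [a1, a2]
    threads = [threading.Thread(target=b, args=(Activity(main.map),)) for b in bodies]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                           # finish
    return out

print(example(True), example(False))
```

Under the stated assumption the reader obtains the same pair of values whichever thread starts first, because each read is tied to a stream index rather than to the moment it executes.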
CF example: Array relaxation
shared int [0:M-1,0:N-1] G = …;
finish foreach (int i,j in [1:M-1,1:N-1]) {
for (int p in [0:TimeStep-1]) {
G[i,j] = omega/4*(G[i-1,j]+G[i+1,j]+G[i,j-1]+G[i,j+1])+(1-omega)*G[i,j];
}
}
All clock manipulations are implicit.
Some simple examples
shared int x=0;
finish {
  async {int r1 = x; int r2 = x; println(r1); println(r2);}
  async {x=1;x=2;}
}
[Figure: stream x holds 0, 1, 2 at indices i = 0, 1, 2; the timeline places the events of A1 (read r1, read r2) and A2 (write 1, write 2) against these indices.]
Only one result, independent of the scheduler!
Some simple examples
shared int x=0;
finish {
  async {int r1 = x; int r2 = x; println(r1); println(r2);}
  async {x=1;}
  async {x=1; int r3 = x; async {x=2;}}
}
println(x);
[Figure: stream x holds 0, 1, 2 at indices i = 0, 1, 2; the timeline shows A1 (0): read r1, read r2; A2 (0): write 1; A3 (0): write 1, read r3; A4 (2): write 2.]
All programs are determinate.
Some StreamIt examples

StreamIt
void -> void pipeline Minimal {
  add IntSource;
  add IntPrinter;
}
void ->int filter IntSource {
  int x;
  init {x=0;}
  work push 1 { push(x++);}
}
int->void filter IntPrinter {
  work pop 1 { print(pop());}
}

X10/CF
shared int x=0;
async while (true) x++;
async while (true) println(x);

The communication is through assignment to x, so the same result is obtained with:
shared int x=0;
async while (true) ++x;
async while (true) println(x);

[Figure: stream x drawn next to each CF program, holding 0, 1, …]
Each shared variable is a multi-reader, multi-writer stream.
Some StreamIt examples: Fibonacci
shared int x=1, y=1;
async while (true) y=x;   // Activity 1
async while (true) x+=y;  // Activity 2

i   0  1  2  3  …
y   1  1  2  3  …
x   1  2  3  5  …

Can express any recursive, asynchronous Kahn network.
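The table can be reproduced by applying the CF read and write rules with a cooperative scheduler. This is a sketch under assumptions: read, write, activity1, activity2 and run are hypothetical names, each stream is a Python list, each world-map entry is an [index, dirty] pair, both activities are assumed to start at index 1 and dirty (inherited from the declarations), and the infinite loops are cut off after n iterations.

```python
def read(stream, entry):
    """CF read as a generator: yields while blocked, then returns the value."""
    if entry[1]:                         # dirty: return v[i-1], mark clean
        entry[1] = False
        return stream[entry[0] - 1]
    while entry[0] >= len(stream):       # clean: wait until v[i] is written
        yield
    entry[0] += 1
    return stream[entry[0] - 1]

def write(stream, entry, value):
    """CF write: store into v[i++], mark dirty."""
    assert entry[0] == len(stream)       # holds here: each stream has a single writer
    stream.append(value)
    entry[0] += 1
    entry[1] = True

def activity1(x, y, n):                  # async while (true) y=x;
    mx, my = [1, True], [1, True]
    for _ in range(n):
        val = yield from read(x, mx)
        write(y, my, val)

def activity2(x, y, n):                  # async while (true) x+=y;
    mx, my = [1, True], [1, True]
    for _ in range(n):
        a = yield from read(x, mx)
        b = yield from read(y, my)
        write(x, mx, a + b)

def run(n, reverse=False):
    """Round-robin over the activities until both finish; return the streams x and y."""
    x, y = [1], [1]
    acts = [activity1(x, y, n), activity2(x, y, n)]
    if reverse:
        acts.reverse()
    while acts:
        for act in list(acts):
            try:
                next(act)                # run until the activity blocks or ends
            except StopIteration:
                acts.remove(act)
    return x, y

print(run(3))                 # ([1, 2, 3, 5], [1, 1, 2, 3])
print(run(3, reverse=True))   # same streams with the other scheduling order
```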
StreamIt examples: Moving Average

StreamIt
void->void pipeline MovingAverage {
  add IntSource();
  add Averager(10);
  add IntPrinter();
}
int->int filter Averager(int n) {
  work pop 1 push 1 peek n {
    int sum=0;
    for (int i=0; i < n; i++)
      sum += peek(i);
    push(sum/n);
    pop();
  }
}

X10/CF
shared int y=0;
shared int x=0; async while (true) x++;
async while (true) {
  int sum=x;
  for (int i in 1:N-1) sum += peek(x, i);
  y = sum/N;
}
• peek(x, i) reads the i’th future value, without popping it. Blocks if necessary.
StreamIt examples: Bandpass filter

StreamIt
float->float pipeline BandPassFilter(float rate, float low, float high, int taps) {
  add BPFCore(rate, low, high, taps);
  add Subtracter();}
float->float splitjoin BPFCore (float rate, float low, float high, int taps) {
  split duplicate;
  add LowPass(rate, low, taps, 0);
  add LowPass(rate, high, taps, 0);
  join roundrobin;}
float->float filter Subtracter {
  work pop 2 push 1 {
    push(peek(1)-peek(0));
    pop(); pop();}}

X10/CF
float bandPassFilter(float rate, float low, float high, int taps, int in) {
  int tmp=in;
  shared int in1=tmp, in2=tmp;
  async while (true) in1=in;
  async while (true) in2=in;
  shared int o1 = lowPass(rate, low, taps, 0, in1),
             o2 = lowPass(rate, high, taps, 0, in2);
  shared int o = o1-o2;
  async while(true) o = o1-o2;
  return o;
}
Functions return streams.
Cannon matrix multiplication
// <final int N>: parameters whose values are finalized.
<final int N>void canon (double[N,N] c, double[N,N] a, double[N,N] b) {
  // Initial alignment: row i of a shifts left by i, column j of b shifts up by j.
  finish foreach (int i,j in [0:N-1,0:N-1]) {
    a[i,j] = a[i,(j+i)%N];
    b[i,j] = b[(i+j)%N, j];
  }
  for (int k in [0:N-1])
    // i, j are local variables in each activity.
    finish foreach (int i,j in [0:N-1,0:N-1]) {
      c[i,j] = c[i,j] + a[i,j]*b[i,j];
      a[i,j] = a[i,(j+1)%N];
      b[i,j] = b[(i+1)%N, j];
    }
}
The natural sequential program works (for finish foreach).
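The shifting scheme can be checked in plain Python. In this sketch (the function names cannon and naive are hypothetical) each list comprehension builds a new matrix from the old one, which stands in for the CF behaviour where every activity reads the values of the previous phase. The initial alignment is the standard one for Cannon's algorithm: row i of a is rotated left by i and column j of b is rotated up by j.

```python
def cannon(a, b):
    """Multiply square matrices a and b using Cannon's shift-and-accumulate scheme."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    # Initial alignment; every right-hand side reads the old a and b.
    a = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    for _ in range(n):
        c = [[c[i][j] + a[i][j] * b[i][j] for j in range(n)] for i in range(n)]
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]   # shift a left by one
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]   # shift b up by one
    return c

def naive(a, b):
    """Reference triple-loop product for comparison."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
print(cannon(A, B) == naive(A, B))  # True
```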
Histogram

Permit “commuting” writes to be performed simultaneously in the same phase.
Phase is completed when all activities that can write have written.
<int N> [1:N][] histogram([1:N][] A) {
  final int[] B = new int [1:N];
  finish foreach(int i in A) B[A[i]]++;
  return B;  // B’s phase is not yet complete. A subsequent read will complete it.
}
Cilk programs with races
int x;
cilk void foo() {
  x = x +1;
}
cilk int main() {
  x=0;
  spawn foo();
  spawn foo();
  sync;
  printf("x is %d\n", x);
  return 0;
}
Determinate: Will always print 1 in CF.
CF smoothly combines Cilk and StreamIt.
Implementation

Each activity’s world map increases monotonically with time.
  Use garbage collection to erase past unreachable values.
Programs with no sibling communication may be executed in buffers with unit windows.
Considering permitting user to specify bounds on variables (cf. push/pop specifications in StreamIt).
  This will force writes to become blocking as well.
Scheduling strategy affects size of buffers, not result.
Formalization

MJ/CF
  Very straightforward additions to field read/write.
  Paper contains details.
Surprisingly localized.
Future work

Paper contains ideas on detecting deadlock (stabilities) at runtime and recovering from them.
Implementation.
  Programmability being investigated.
  Leverage connection with StreamIt, and static scheduling.
Coarser granularity for indices.
  Use same clock for many variables.
  Permits “coordinated” changes to multiple variables.