Determinate imperative programming

IBM Research: Software Technology
Programming Technologies
Determinate Imperative Programming
Vijay Saraswat, Radha Jagadeesan, Armando Solar-Lezama, Christoph von Praun
November, 2006
IBM Research
This work has been supported in part by the Defense
Advanced Research Projects Agency (DARPA)
under contract No. NBCH30390004
© 2006 IBM Corporation
Outline
• Problem:
– Many concurrent imperative programs are determinate.
– Determinacy is not apparent from the syntax.
– Design a language in which all programs are guaranteed determinate.
• Many examples
• Semantics
• Implementation
• Future work
• Basic idea
– A variable is the stream of values written to it by a thread.
Acknowledgments

X10 Core Team
– Rajkishore Barik
– Vincent Cave
– Chris Donawa
– Allan Kielstra
– Igor Peshansky
– Christoph von Praun
– Vijay Saraswat
– Vivek Sarkar
– Tong Wen
X10 Tools
– Philippe Charles
– Julian Dolby
– Robert Fuhrer
– Frank Tip
– Mandana Vaziri
Emeritus
– Kemal Ebcioglu
– Christian Grothoff
Research colleagues
– R. Bodik, G. Gao, R. Jagadeesan,
J. Palsberg, R. Rabbah, J. Vitek
– Several others at IBM
Recent Publications
1. "Concurrent Clustered Programming", V. Saraswat, R. Jagadeesan. CONCUR conference, August 2005.
2. "X10: An Object-Oriented Approach to Non-Uniform Cluster Computing", P. Charles, C. Donawa, K. Ebcioglu, C. Grothoff, A. Kielstra, C. von Praun, V. Saraswat, V. Sarkar. OOPSLA Onwards! conference, October 2005.
3. "A Theory of Memory Models", V. Saraswat, R. Jagadeesan, M. Michael, C. von Praun. To appear, PPoPP 2007.
4. "Experiences with an SMP Implementation for X10 based on the Java Concurrency Utilities", R. Barik, V. Cave, C. Donawa, A. Kielstra, I. Peshansky, V. Sarkar. Workshop on Programming Models for Ubiquitous Parallelism (PMUP), September 2006.
5. "X10: an Experimental Language for High Productivity Programming of Scalable Systems", K. Ebcioglu, V. Sarkar, V. Saraswat. P-PHEC workshop, February 2005.
Tutorials

TiC 2006, PACT 2006, OOPSLA06
A new era of mainstream parallel processing
The Challenge
Parallelism scaling replaces frequency scaling as foundation for increased performance → Profound impact on future software
[Figure: three forms of parallel hardware. Multi-core chips: PEs with L1 caches sharing an L2 cache. Heterogeneous parallelism: a PPE (64-bit Power Architecture with VMX; PPU, PXU, L1, L2) and several SPEs (each with SPU, SXU, LS, SMF) joined by the EIB (up to 96B/cycle), with the MIC to dual XDR memory and the BIC to FlexIO. Cluster parallelism: SMP nodes, each with PEs and memory, joined by an interconnect.]
Our response:
Use X10 as a new language for parallel hardware that builds on
existing tools, compilers, runtimes, virtual machines and libraries
Server Trends: Concurrency, Distribution, Heterogeneity at all levels
[Figure: workload, apps, servers and network within a shared administrative domain, spanning multi-core chip, blade, appliance, commercial scale-out and HPC scale-out. HPC scale-out hierarchy: chip (2 processors); compute card (2 chips, 1x2x1, 4 processors; 11.2 GF/s, 1.0 GB); node card (16 compute cards, 0-2 I/O cards, 64 processors; 180 GF/s, 16 GB); rack (32 node cards, 2048 processors; 5.6 TF/s, 512 GB, 20 kW, 1 m² footprint); system (64 racks, 64x32x32, 131,072 processors; 360 TF/s, 32 TB, 1.3 MW).]
The X10 Programming Model
Place = collection of resident activities and objects
Storage classes
• Immutable data
• PGAS
– Local heap
– Remote heap
• Activity-local
Locality rule
Any access to a mutable datum must be performed by a local activity → remote data accesses can be performed by creating remote activities.
Ordering constraints (memory model)
Locally synchronous: guaranteed coherence for local heap → sequential consistency
Globally asynchronous: no ordering of inter-place activities → use explicit synchronization for coherence
Few concepts, done right.
X10 v0.41 Cheat sheet
Stm:
    async [ ( Place ) ] [ clocked ClockList ] Stm
    when ( SimpleExpr ) Stm
    finish Stm
    next;
    c.resume()
    c.drop()
    for ( i : Region ) Stm
    foreach ( i : Region ) Stm
    ateach ( I : Distribution ) Stm
Expr:
    ArrayExpr
DataType:
    ClassName | InterfaceName | ArrayType
    nullable DataType
    future DataType
Kind:
    value | reference
ClassModifier: Kind
MethodModifier: atomic
x10.lang has the following classes (among
others)
point, range, region, distribution, clock, array
Some of these are supported by special syntax.
Forthcoming support: closures, generics, dependent types, place types,
implicit syntax, array literals.
X10 v0.41 Cheat sheet: Array support
Region:
    Expr : Expr                              -- 1-D region
    [ Range, …, Range ]                      -- Multidimensional region
    Region && Region                         -- Intersection
    Region || Region                         -- Union
    Region – Region                          -- Set difference
    BuiltinRegion
Dist:
    Region -> Place                          -- Constant distribution
    Distribution | Place                     -- Restriction
    Distribution | Region                    -- Restriction
    Distribution || Distribution             -- Union
    Distribution – Distribution              -- Set difference
    Distribution.overlay ( Distribution )
    BuiltinDistribution
ArrayExpr:
    new ArrayType ( Formal ) { Stm }
    Distribution Expr                        -- Lifting
    ArrayExpr [ Region ]                     -- Section
    ArrayExpr | Distribution                 -- Restriction
    ArrayExpr || ArrayExpr                   -- Union
    ArrayExpr.overlay(ArrayExpr)             -- Update
    ArrayExpr.scan( [fun [, ArgList]] )
    ArrayExpr.reduce( [fun [, ArgList]] )
    ArrayExpr.lift( [fun [, ArgList]] )
ArrayType:
    Type [Kind] [ ]
    Type [Kind] [ region(N) ]
    Type [Kind] [ Region ]
    Type [Kind] [ Distribution ]
Language supports type safety, memory safety, place safety, clock safety.
Memory Model
Please see: http://www.saraswat.org/rao.html
• X10 v0.41 specifies sequential consistency per place.
– Not workable.
• We are considering a weaker memory model.
• Built on the notion of atomic: identify a step as the basic building block.
– A step is a partial write function.
• Use links for non-hb reads.
• A process is a pomset of steps closed under certain transformations:
– Composition
– Decomposition
– Augmentation
– Linking
– Propagation
• There may be opportunity for a weak notion of atomic: decouple atomicity from ordering.
Correctly synchronized programs behave as SC.
Correctly synchronized programs = programs whose SC executions have no races.
async
Stmt ::= async PlaceExpSingleList_opt Stmt
async (P) S (cf. Cilk's spawn)
• Creates a new child activity at place P that executes statement S.
• Returns immediately.
• S may reference final variables in enclosing blocks.
• Activities cannot be named.
• An activity cannot be aborted or cancelled.
• Memory model: hb edge between stm before async and start of async.

    // global dist. array
    final double a[D] = …;
    final int k = …;
    async ( a.distribution[99] ) {
        // executed at a[99]'s place
        atomic a[99] = k;
    }
finish
Stmt ::= finish Stmt
finish S (cf. Cilk's sync)
• Execute S, but wait until all (transitively) spawned asyncs have terminated.
Rooted exception model
• Trap all exceptions thrown by spawned activities.
• Throw an (aggregate) exception if any spawned async terminates abruptly.
• Implicit finish at main activity.
finish is useful for expressing "synchronous" operations on (local or) remote data.

    finish ateach (point [i] : A)
        A[i] = i;
    finish async (A.distribution[j])
        A[j] = 2;
    // all A[i]=i will complete
    // before A[j]=2;

• Memory model: hb edge between last stm of each async and stm after finish S.
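The finish rule can be sketched outside X10. The Python block below is only an illustration under stated assumptions: `Finish`, `spawn` and `ActivityErrors` are hypothetical names, threads stand in for activities, and places are ignored. It shows the two properties the slide lists: the scope waits for transitively spawned activities, and failures surface as one aggregate exception at the finish.

```python
import threading

class ActivityErrors(Exception):
    """Aggregate of exceptions raised by spawned activities (hypothetical name)."""
    def __init__(self, errors):
        super().__init__(f"{len(errors)} spawned activity(ies) failed")
        self.errors = errors

class Finish:
    """Hypothetical stand-in for `finish S`: waits for all transitive spawns."""
    def __init__(self):
        self._lock = threading.Lock()
        self._threads = []
        self._errors = []

    def spawn(self, fn, *args):
        """Plays the role of `async`: start fn(*args) and return immediately."""
        def run():
            try:
                fn(*args)
            except Exception as exc:          # trap, report at the finish
                with self._lock:
                    self._errors.append(exc)
        thread = threading.Thread(target=run)
        with self._lock:                      # register and start atomically
            self._threads.append(thread)
            thread.start()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        joined = 0
        while True:                           # activities may spawn more activities
            with self._lock:
                if joined == len(self._threads):
                    break
                thread = self._threads[joined]
            thread.join()
            joined += 1
        if exc_type is None and self._errors:
            raise ActivityErrors(self._errors)
        return False

A = [0] * 8
j = 3
with Finish() as f:
    for i in range(len(A)):
        # each activity spawns a nested activity that performs A[i] = i
        f.spawn(lambda i=i: f.spawn(A.__setitem__, i, i))
A[j] = 2   # runs only after every A[i] = i has completed
```

The join loop re-checks the thread list after each join because a running activity may still register children; once every registered thread has been joined, nothing is left that could spawn.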
foreach
foreach ( FormalParam: Expr ) Stmt
foreach (point p: R) S
• Creates |R| async statements in parallel at current place.
• foreach (point p: R) S is equivalent to:
    for (point p: R)
        async { S }
• Termination of all (recursively created) activities can be ensured with finish.
• finish foreach is a convenient way to achieve master-slave fork/join parallelism (OpenMP programming model).
atomic
Stmt ::= atomic Statement
MethodModifier ::= atomic
• Atomic blocks are conceptually executed in a single step while other activities are suspended: isolation and atomicity.
• An atomic block ...
– must be nonblocking
– must not create concurrent activities (sequential)
– must not access remote data (local)
• Memory model: the end of one transaction happens-before (hb) the start of the next transaction in the same place.

    // target defined in lexically
    // enclosing scope.
    atomic boolean CAS(Object old, Object newValue) {
        if (target.equals(old)) {
            target = newValue;
            return true;
        }
        return false;
    }

    // push data onto concurrent
    // list-stack
    Node node = new Node(data);
    atomic {
        node.next = head;
        head = node;
    }
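As a rough analogue of the stack push above, the following Python sketch uses a single lock per place to obtain the isolation that `atomic` promises. This is an assumption made for illustration (`place_lock`, `Stack` and `Node` are hypothetical names), not how an X10 runtime must implement atomic blocks.

```python
import threading

# Assumption: one lock per place stands in for `atomic`; a real runtime may
# use a different mechanism entirely.
place_lock = threading.Lock()

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class Stack:
    def __init__(self):
        self.head = None

    def push(self, data):
        node = Node(data)            # allocation happens outside the atomic block
        with place_lock:             # atomic { node.next = head; head = node; }
            node.next = self.head
            self.head = node

    def __len__(self):
        count, node = 0, self.head
        while node is not None:
            count, node = count + 1, node.next
        return count

stack = Stack()
workers = [threading.Thread(target=lambda: [stack.push(k) for k in range(1000)])
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

The block body is short, sequential and touches only local data, which matches the three restrictions listed for atomic blocks.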
Clocks: Motivation

• Activity coordination using finish and force() is accomplished by checking for activity termination.
• However, there are many cases in which a producer-consumer relationship exists among the activities, and a "barrier"-like coordination is needed without waiting for activity termination.
– The activities involved may be in the same place or in different places.
[Figure: Activity 0, Activity 1, Activity 2, ... advancing together through Phase 0, Phase 1, ...]
Clocks (1/2)
clock c = clock.factory.clock();
• Allocate a clock, register current activity with it. Phase 0 of c starts.
async(…) clocked (c1,c2,…) S
ateach(…) clocked (c1,c2,…) S
foreach(…) clocked (c1,c2,…) S
• Create async activities registered on clocks c1, c2, …
c.resume();
• Nonblocking operation that signals completion of work by current activity for this phase of clock c.
next;
• Barrier: suspend until all clocks that the current activity is registered with can advance. c.resume() is first performed for each such clock, if needed.
• next can be viewed like a "finish" of all computations under way in the current phase of the clock.
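The barrier reading of `next` can be tried with Python's `threading.Barrier`. This is a simplified sketch: a `Barrier` has a fixed number of parties, so it models neither dynamic registration, nor `c.drop()`, nor the split-phase `c.resume()`. The arrays `A` and `B` and the function `activity` are illustrative names.

```python
import threading

N = 4
clock = threading.Barrier(N)     # stands in for a clock with N registered activities
A = [0] * N
B = [0] * N

def activity(i):
    A[i] = i * i                 # phase 0: each activity writes its own cell
    clock.wait()                 # `next`: block until all N activities reach here
    B[i] = A[(i + 1) % N]        # phase 1: a neighbour's phase-0 write is visible

threads = [threading.Thread(target=activity, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

No activity terminates between the phases, which is the point of clocks: coordination happens without waiting for termination.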
Clocks (2/2)
c.drop();
• Unregister with c. A terminating activity will implicitly drop all clocks that it is registered on.
c.registered()
• Return true iff current activity is registered on clock c.
• c.dropped() returns the opposite of c.registered().
• Activity is deregistered from a clock when it terminates.
• Memory model: hb edge between next stm of all registered activities on c, and the subsequent stm in each activity.
Static semantics
– An activity may operate only on those clocks it is registered with.
– In finish S, S may not contain any (top-level) clocked asyncs.
Dynamic semantics
– A clock c can advance only when all its registered activities have executed c.resume().
– An activity may not pass on clocks on which it is not live to subactivities (ClockUseException).
– An activity may not transmit a clock into the scope of a finish (ClockUseException).
No explicit operation to register a clock.
Example (TutClock1.x10)
finish async {
    final clock c = clock.factory.clock();
    foreach (point[i]: [1:N]) clocked (c) {   // parent transmits clock to child
        while ( true ) {
            int old_A_i = A[i];
            int new_A_i = Math.min(A[i],B[i]);
            if ( i > 1 )
                new_A_i = Math.min(new_A_i,B[i-1]);
            if ( i < N )
                new_A_i = Math.min(new_A_i,B[i+1]);
            A[i] = new_A_i;
            next;
            int old_B_i = B[i];
            int new_B_i = Math.min(B[i],A[i]);
            if ( i > 1 )
                new_B_i = Math.min(new_B_i,A[i-1]);
            if ( i < N )
                new_B_i = Math.min(new_B_i,A[i+1]);
            B[i] = new_B_i;
            next;
            // exiting from while loop terminates activity for iteration i,
            // and automatically deregisters activity from clock
            if ( old_A_i == new_A_i && old_B_i == new_B_i )
                break;
        } // while
    } // foreach
} // finish async
Clocked final variables
• Permit variables to be marked as clocked final, e.g.
    clocked(c) final double[.] a = …
• In each phase of the clock, the variable is immutable.
• Writes for such a variable are performed on a shadow copy.
• Main copy of variable updated with value in shadow copy when clock moves to the next phase.
• Clocked final variables cannot introduce non-determinism.
– Assuming multiple writers write the same value in each phase.
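The shadow-copy rule can be illustrated with a small single-threaded model. `ClockedCell` and `swap_phase` are hypothetical names; rejecting conflicting writes is an assumption standing in for the slide's proviso that multiple writers write the same value, and `advance()` plays the role of the clock moving to the next phase.

```python
class ClockedCell:
    """Hypothetical model of one clocked final variable."""
    def __init__(self, value):
        self._current = value        # main copy: immutable during a phase
        self._shadow = None          # shadow copy: receives this phase's write
        self._written = False

    def read(self):
        return self._current         # reads never see writes of the current phase

    def write(self, value):
        if self._written and self._shadow != value:
            raise ValueError("conflicting writes in one phase")
        self._shadow, self._written = value, True

    def advance(self):
        """Clock moves to the next phase: publish the shadow copy."""
        if self._written:
            self._current = self._shadow
            self._written = False

def swap_phase(order):
    """Two activities exchange values in one phase; `order` is the schedule."""
    a, b = ClockedCell(1), ClockedCell(2)
    steps = {"a": lambda: a.write(b.read()), "b": lambda: b.write(a.read())}
    for name in order:
        steps[name]()
    a.advance()
    b.advance()
    return a.read(), b.read()
```

Both schedules of the exchange produce the same result, because each read sees the value from the previous phase regardless of which write ran first.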
Clocked final example: Array relaxation
// G elements are assigned to at most once in each phase of clock c.
clocked (c) final int [0:M-1,0:N-1] G = …;
// Each activity is registered on c. Only interior cells are updated,
// so every neighbour index stays inside G.
finish foreach (int i,j in [1:M-2,1:N-2]) clocked (c) {
    for (int p in [0:TimeStep-1]) {
        // Read current value of cell; value written into ghost copy of G[i,j].
        G[i,j] = omega/4*(G[i-1,j]+G[i+1,j]+G[i,j-1]+G[i,j+1])+(1-omega)*G[i,j];
        next;   // Wait for clock to advance. Write visible (only) when clock advances.
    }
}
Imperative Programming Revisited
• Variables
– Variable = value in a box
– Read: fetch current value
– Write: change value
– Stability condition: value does not change unless a write is performed
• Very powerful
– Permit repeated many-writer, many-reader communication through arbitrary reference graphs
– Mutability in the presence of sharing
– Permits different variables to change at different rates.
• Asynchrony introduces indeterminacy
    int x = 0;
    async x=1;
    print(x);
• May write out either 0 or 1.
• Bugs due to races are very difficult to debug.
Reader-reader, reader-writer, writer-writer conflicts.
Determinate programming design patterns
• NAS parallel benchmarks
– Conjugate gradient
– Multigrid
– LU factorization
• Single producer, multiple copying consumers
– Kahn networks, StreamIt
– Pipelining
• Stencil computations
– Jacobi, SOR
• Molecular dynamics
• Graph algorithms
– Connected components
• Detecting stable properties
– Clocks!
– Short circuit technique
Parallelism for performance/scaling, not control.
Determinate programming anti-patterns
• Reactive computing: arrival-order indeterminism
– "Races in the world"
– E.g. bank accounts
• Resource contention: any of several possible outcomes is acceptable
– Mutual exclusion
– Load balancing
• Shared work list
• Algorithm may permit any one of many possible solutions
– One solution for N-queens
– Some minimal spanning tree
– Some Delaunay triangulation
But the program may still contain determinate concurrent components.
Determinate Concurrent Imperative frameworks
• Asynchronous Kahn networks
– Nodes can be thought of as (continuous) functions over streams.
– Pop/peek
– Push
– Node-local state may mutate arbitrarily
• Concurrent Constraint Programming
– Tell constraints
– Ask if a constraint is true
– Subsumes Kahn networks (dataflow).
– Subsumes (det) concurrent logic programming, lazy functional programming
Do not support arbitrary mutable variables.
Determinate Concurrent Imperative Frameworks
• Safe Asynchrony (Steele 1991)
– Parent may communicate with children.
– Children may communicate with parent.
– Siblings may communicate with each other only through commutative, associative writes ("commuting writes").
Good:
    int x=0;
    finish foreach (int i in 1:N) {
        x += i;
    }
    print(x); // N*(N+1)/2
Bad:
    int x=0;
    finish foreach (int i in 1:N) {
        x += i;
        async print(x);
    }
Useful but limited. Does not permit dataflow synchronization.
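The difference between the two programs can be checked by replaying the sibling activities in different orders. The sketch below is a sequential simulation, assuming each `x += i` executes as one indivisible step; `run_good` and `run_bad` are hypothetical names, and `order` is the schedule in which siblings run.

```python
from itertools import permutations

def run_good(order):
    """Siblings only perform commuting writes; the parent reads after finish."""
    x = 0
    for i in order:
        x += i                   # commutative, associative update
    return x

def run_bad(order):
    """Each sibling also observes x, so the schedule leaks into the output."""
    x = 0
    seen = []
    for i in order:
        x += i
        seen.append(x)           # stands in for `async print(x)`
    return seen

N = 4
results = {run_good(p) for p in permutations(range(1, N + 1))}
```

Every schedule of the good program yields N*(N+1)/2, whereas the values observed by the bad program depend on the schedule.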
Determinate X10
Stm:
    async [ ( Place ) ] [ clocked ClockList ] Stm
    when ( SimpleExpr ) Stm
    finish Stm
    next;
    c.resume()
    c.drop()
    for ( i : Region ) Stm
    foreach ( i : Region ) Stm
    ateach ( I : Distribution ) Stm
Expr:
    ArrayExpr
DataType:
    ClassName | InterfaceName | ArrayType
    nullable DataType
    future DataType
    local DataType
    det DataType
    indet DataType
Kind:
    value | reference
ClassModifier: Kind
MethodModifier: atomic
[Legend: the slide's coloring distinguishes constructs not available from constructs added; local, det and indet DataType are the additions relative to the X10 v0.41 cheat sheet.]
local variables
• Instances of value classes always considered local.
• A mutable object is local only if it is marked as local when created:
– new local T(…)
• A value of a local type can be assigned only into a variable of local type, or a field of a local object.
• Variables of a local type are not visible to contained asyncs.
• An async spawned in a finish may assign a value of a local type to a local variable of the parent activity.
• A value may be cast to local T; the cast may fail.
– E.g. local T x = (local T) this;
• Invariant: Each activity owns the local objects and local variables it creates. Only local objects can reference local objects.
• Local objects of terminated activity become local objects of parent.
Ownership type system used to maintain locality.
det locations
• A det location is represented in memory as a stream (indexed sequence) of immutable values.
• Each activity maintains an index i + clean/dirty bit for every det location.
– Initially i=1, v[0] contains initial value.
– Read: If clean, block until v[i] is written and return v[i++]; else return v[i-1]. Mark as clean.
– Write: Write into v[i++]. Mark as dirty.
Note: index updated only as a result of activity's operations.
• World map = collection of indices for an activity.
• Index transmission rules.
– async: Activity initialized with current world map of parent activity.
– finish: world map of activity is joined (least upper bound, lub) with world map of finished activities.
– (clean lub dirty = dirty)
The clock of clocked final is made implicit.
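The read, write and transmission rules above can be exercised with a small thread-based model. It is a sketch under assumptions that the slide leaves open: `DetLocation` and `View` are hypothetical names; a declaration is treated as a write, so an entry starts at index 1 and dirty (this is what the worked examples in the deck appear to imply); the lub is taken as the maximum index with the dirty bits OR-ed; and concurrent writers to the same slot must agree on the value.

```python
import threading

class DetLocation:
    """Stream of immutable values; slot 0 holds the initial value."""
    def __init__(self, initial):
        self._values = {0: initial}
        self._cond = threading.Condition()

    def get(self, k):
        with self._cond:
            self._cond.wait_for(lambda: k in self._values)   # block until written
            return self._values[k]

    def put(self, k, value):
        with self._cond:
            if k in self._values and self._values[k] != value:
                raise ValueError(f"writers disagree on slot {k}")
            self._values[k] = value
            self._cond.notify_all()

class View:
    """One activity's world-map entry for one location: index + dirty bit."""
    def __init__(self, loc, index=1, dirty=True):
        self.loc, self.index, self.dirty = loc, index, dirty

    def read(self):
        if self.dirty:
            value = self.loc.get(self.index - 1)   # latest value this activity knows
        else:
            value = self.loc.get(self.index)       # wait for the next value
            self.index += 1
        self.dirty = False
        return value

    def write(self, value):
        self.loc.put(self.index, value)
        self.index += 1
        self.dirty = True

    def fork(self):
        """async: the child starts from the parent's current entry."""
        return View(self.loc, self.index, self.dirty)

    def join(self, child):
        """finish: lub with a finished child's entry (assumed: max index, OR bits)."""
        self.index = max(self.index, child.index)
        self.dirty = self.dirty or child.dirty

def example(reader_first):
    """det int x=0; finish { async {r1=x; r2=x;} async {x=1; x=2;} }"""
    x = DetLocation(0)
    parent = View(x)
    a1, a2 = parent.fork(), parent.fork()
    out = []
    reader = threading.Thread(target=lambda: out.extend([a1.read(), a1.read()]))
    writer = threading.Thread(target=lambda: (a2.write(1), a2.write(2)))
    first, second = (reader, writer) if reader_first else (writer, reader)
    first.start()
    second.start()
    first.join()
    second.join()
    parent.join(a1)
    parent.join(a2)
    return out, parent.read()
```

Under these assumptions the reader always obtains 0 and then 1, and the parent reads 2 after the finish, whichever thread is started first.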
Indet locations
• Can be recovered as det locations + a mutable shared index ("current").
• An activity's world map does not need to contain index for indet locations.
• All activities read and update location through current.
• Therefore stream representation is not necessary; only the "current" value need be kept.
det example: Array relaxation
det int [0:M-1,0:N-1] G = …;
// Only interior cells are updated, so every neighbour index stays inside G.
finish foreach (local int i,j in [1:M-2,1:N-2]) {
    for (local int p in [0:TimeStep-1]) {
        G[i,j] = omega/4*(G[i-1,j]+G[i+1,j]+G[i,j-1]+G[i,j+1])+(1-omega)*G[i,j];
    }
}
All clock manipulations are implicit.
Some simple examples
det int x=0;
finish {
    async {
        int r1 = x; int r2 = x;
        println(r1); println(r2);   // prints 0, 1
    }
    async {x=1;x=2;}
}

i   x   A1        A2
0   0   read r1
1   1   read r2   write 1
2   2             write 2

Convention: A type not marked det is assumed marked local.
Only one result – independent of the scheduler!
Some simple examples
det int x=0;
finish {
    async {int r1 = x; int r2 = x; println(r1); println(r2);}   // prints 0, 1
    async {x=1;}
    async {x=1; int r3 = x; async {x=2;}}
}
println(x);   // prints 2

i   x   A1 (0)    A2 (0)    A3 (0)             A4 (2)
0   0   read r1
1   1   read r2   write 1   write 1; read r3
2   2                                          write 2

All programs are determinate.
Some StreamIt examples
StreamIt:
    void->void pipeline Minimal {
        add IntSource;
        add IntPrinter;
    }
    void->int filter IntSource {
        int x;
        init {x=0;}
        work push 1 { push(x++);}
    }
    int->void filter IntPrinter {
        work pop 1 { print(pop());}
    }

Det X10:
    det int x=0;
    async while (true) x++;
    async while (true) println(x);   // prints 0, 1, …

The communication is through assignment to x, so the same result is obtained with:
    det int x=0;
    async while (true) ++x;
    async while (true) println(x);   // prints 0, 1, …
Each shared variable is a multi-reader, multi-writer stream.
Some StreamIt examples: fibonacci
det int x=1, y=1;
async while (true) y=x;    // Activity 1
async while (true) x+=y;   // Activity 2

i   y   x
0   1   1
1   1   2
2   2   3
3   3   5
…   …   …
Can express any recursive, asynchronous Kahn network.
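The table can be reproduced by replaying the two loops as stream recurrences. This sketch assumes, consistently with the table, that iteration k of Activity 1 reads x[k] and writes y[k+1], while iteration k of Activity 2 reads x[k] (its own latest write) and y[k] and writes x[k+1]; `fib_streams` is a hypothetical name and no real concurrency is involved.

```python
def fib_streams(steps):
    """Return the first steps+1 entries of the streams x and y."""
    x, y = [1], [1]                  # det int x=1, y=1;
    for k in range(steps):
        y.append(x[k])               # Activity 1: y = x
        x.append(x[k] + y[k])        # Activity 2: x += y
    return x, y

x_stream, y_stream = fib_streams(3)
```

Because each read consumes a fixed position of a stream, the result does not depend on how the two loops are interleaved, which is why a sequential replay suffices here.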
StreamIt examples: Moving Average
StreamIt:
    void->void pipeline MovingAverage {
        add IntSource();
        add Averager(10);
        add IntPrinter();
    }
    int->int filter Averager(int n) {
        work pop 1 push 1 peek n {
            int sum=0;
            for (int i=0; i < n; i++)
                sum += peek(i);
            push(sum/n);
            pop();
        }
    }

Det X10:
    det int y=0;
    det int x=0; async while (true) x++;
    async while (true) {
        int sum=x;
        for (int i in 1:N-1) sum += peek(x, i);
        y = sum/N;
    }

• peek(x, i) reads the i'th future value, without popping it. Blocks if necessary.
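The interplay of a consuming read and `peek` can be modelled on a finite list. In this sketch `moving_average` is a hypothetical name, `peek(x, i)` is assumed to count from the value just read, and the loop stops where the real program would block waiting for more input.

```python
def moving_average(xs, n):
    """ys[k] is the integer mean of xs[k], ..., xs[k+n-1]."""
    ys = []
    pos = 0
    while pos + n <= len(xs):        # a blocking read would wait here instead
        total = xs[pos]              # `int sum = x;` consumes one value
        for i in range(1, n):
            total += xs[pos + i]     # peek(x, i): look ahead without consuming
        ys.append(total // n)        # y = sum/N (integer division)
        pos += 1                     # only the consumed value advances the index
    return ys
```

For example, a window of 3 over the values 0 to 5 yields the averages 1, 2, 3, 4.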
Cannon matrix multiplication
void canon (det double[N,N] c, det double[N,N] a,
            det double[N,N] b) {
    // initial alignment: shift row i of a left by i, column j of b up by j
    finish foreach (int i,j in [0:N-1,0:N-1]) {
        a[i,j] = a[i,(j+i) % N];
        b[i,j] = b[(i+j)%N, j];
    }
    for (int k in [0:N-1])
        finish foreach (int i,j in [0:N-1,0:N-1]) {
            c[i,j] = c[i,j] + a[i,j]*b[i,j];
            a[i,j] = a[i,(j+1)%N];
            b[i,j] = b[(i+1)%N, j];
        }
}
The natural sequential program works (for → finish foreach).
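Cannon's algorithm can be checked against a plain triple loop. In the sketch below, each step builds new matrices from the old ones, which mirrors every activity reading the values of the previous phase; the initial alignment shifts row i of a by i and column j of b by j, as the algorithm requires. `cannon` and `naive` are hypothetical helper names, and c is assumed to start at zero.

```python
def cannon(a, b):
    """Multiply two n x n matrices (lists of lists) by Cannon's shifts."""
    n = len(a)
    # initial alignment
    a = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        # all cells are computed from the previous phase's a, b and c
        c = [[c[i][j] + a[i][j] * b[i][j] for j in range(n)] for i in range(n)]
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]   # shift left
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]   # shift up
    return c

def naive(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
```

After s shifts, cell (i, j) holds a[i][(i+j+s) mod n] and b[(i+j+s) mod n][j], so the n steps together cover every term of the dot product exactly once.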
Implementation

• Each activity's world map increases monotonically with time.
• Use garbage collection to erase past unreachable values.
• Programs with no sibling communication may be executed in buffers with unit windows.
• Considering permitting user to specify bounds on variables (cf. push/pop specifications in StreamIt).
– This will force writes to become blocking as well.
Scheduling strategy affects size of buffers, not result.
Future work
• Formalization
– MJ/CF
– Very straightforward additions to field read/write.
– Paper contains details.
• Implementation.
– Leverage connection with StreamIt, and static scheduling.
• Paper contains ideas on detecting deadlock (stabilities) at runtime and recovering from them.
– Programmability being investigated.
– Devise static type system to establish deadlock-freedom.
• Coarser granularity for indices.
– Use same clock for many variables.
– Permits "coordinated" changes to multiple variables.
• Introduce fusion operation (x -> y) to support CCP.
Backup
StreamIt examples: Bandpass filter
StreamIt:
    float->float pipeline BandPassFilter(float rate,
            float low, float high, int taps) {
        add BPFCore(rate, low, high, taps);
        add Subtracter();}
    float->float splitjoin BPFCore
            (float rate, float low,
             float high, int taps) {
        split duplicate;
        add LowPass(rate, low, taps, 0);
        add LowPass(rate, high, taps, 0);
        join roundrobin;}
    float->float filter Subtracter {
        work pop 2 push 1 {
            push(peek(1)-peek(0));
            pop(); pop();}}

Det X10:
    float bandPassFilter(float rate, float low,
            float high, int taps, int in) {
        int tmp=in;
        det int in1=tmp, in2=tmp;
        async while (true) in1=in;
        async while (true) in2=in;
        det int o1 = lowPass(rate, low, taps, 0, in1),
                o2 = lowPass(rate, high, taps, 0, in2);
        det int o = o1-o2;
        async while(true) o = o1-o2;
        return o;
    }
Functions return streams.
Histogram
• Permit "commuting" writes to be performed simultaneously in the same phase.
• Phase is completed when all activities that can write have written.

    <int N> [1:N][] histogram([1:N][] A) {
        final int[] B = new int [1:N];
        finish foreach(int i in A) B[A[i]]++;
        return B;   // B's phase is not yet complete. A subsequent read will complete it.
    }
Cilk programs with races
int x;
cilk void foo() {
    x = x + 1;
}
cilk int main() {
    x = 0;
    spawn foo();
    spawn foo();
    sync;
    printf("x is %d\n", x);
    return 0;
}
Determinate: Will always print 1 in CF.
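Why the program prints 1 under stream semantics, while a conventional shared variable may yield 1 or 2, can be seen by replaying schedules. Both functions below are simplified sequential models with hypothetical names: `shared_variable_run` treats x as one mutable cell, and `cf_run` assumes each spawned `foo` starts from main's view of x, so it reads slot 0 and writes slot 1, and both writers agree on the value.

```python
def shared_variable_run(schedule):
    """schedule: steps such as 'r1' (foo 1 reads x) or 'w2' (foo 2 writes x)."""
    x = 0
    regs = {}
    for step in schedule:
        kind, who = step[0], step[1]
        if kind == "r":
            regs[who] = x            # load x into a private register
        else:
            x = regs[who] + 1        # store the incremented value
    return x

def cf_run(order):
    """Stream model: each foo sees main's write (slot 0), not its sibling's."""
    stream = [0]                     # x = 0 written by main
    for _ in order:
        value = stream[0] + 1        # read slot 0, compute the new value
        if len(stream) > 1:
            assert stream[1] == value    # second writer agrees with the first
        else:
            stream.append(value)     # write slot 1
    return stream[-1]                # main reads the latest slot after sync
```

In the stream model the order of the two activities cannot change what either of them reads, so the printed value is fixed.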
CF smoothly combines Cilk and StreamIt.