APGAS: Programming for concurrency and distribution Vijay Saraswat May 12, 2008

IBM Research: Software Technology
Programming Technologies
IBM TJ Watson Research Center
5/31/2016
© 2005 IBM Corporation
X10 Programming Language 6/2008
The current architectural landscape
[Figure: architecture diagram. Labels: SMP nodes (PEs, memory) connected by an interconnect; Power5 clusters; multi-core processors with accelerators (e.g. Sun Niagara; Intel multicore, IXP; IBM Cell; GPGPUs); P7 supernode; Blue Gene; I/O gateway nodes; "Scalable Unit" cluster interconnect switch/fabric (100s of such cluster nodes); Road Runner: Cell-accelerated Opteron.]
The current architectural landscape
• Substantial architectural innovation is anticipated over the next ten years.
  – The hardware situation remains murky, but programmers need stable interfaces to develop applications.
• Heterogeneous accelerator-based systems will exist, raising serious programmability challenges.
  – Programmers must choreograph interactions between heterogeneous processors and memory subsystems.
• Multicore systems will dramatically raise the number of cores available to applications.
  – Programmers must understand the concurrent structure of their applications.
• Applications seeking to leverage these architectures will need to go beyond the data-parallel, globally synchronizing MPI model.
• These changes, while most profound for HPC now, will change the face of commercial computing over time.
What is Partitioned Global Address Space (PGAS)?
[Figure: processes/threads and their address spaces under three programming models]
Message passing: MPI
Shared memory: OpenMP
PGAS: UPC, CAF, X10
• Computation is performed in multiple places.
• A place contains data that can be operated on remotely.
• Data lives in the place it was created, for its lifetime.
• A datum in one place may reference a datum in another place.
• Data structures (e.g. arrays) may be distributed across many places.
• Places may have different computational properties (e.g. PPE, SPE, …).
A place expresses locality.
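As a rough illustration of an array distributed across places, the sketch below maps a global index to an owning place and a local offset. It assumes a contiguous block distribution, which is only one of several possible layouts; the class and method names are hypothetical and are not taken from X10.

```java
class BlockDistributionSketch {
    // Size of each contiguous block when n elements are spread over `places` places
    // (assumption: ceiling division, so the last place may hold fewer elements).
    static int blockSize(int n, int places) {
        return (n + places - 1) / places;
    }

    // Place that owns global index i.
    static int placeOf(int i, int n, int places) {
        return i / blockSize(n, places);
    }

    // Offset of global index i inside its owning place.
    static int localIndex(int i, int n, int places) {
        return i % blockSize(n, places);
    }
}
```

With 10 elements over 4 places the block size is 3, so index 7 lives at place 2 with local offset 1.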
What is Asynchronous PGAS?
• Asynchrony
  – Simple, explicitly concurrent model for the user: async (p) S runs statement S "in parallel" at place p
  – Controlled through finish and local (conditional) atomic
• Used for active messaging (remote asyncs), DMAs, fine-grained concurrency, fork/join concurrency, do-all/do-across parallelism
  – SPMD is a special case
Concurrency is made explicit and programmable.
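The async/finish/atomic pattern can be approximated in plain Java. The sketch below is an analogue, not X10: it assumes a single JVM (places are not modeled), uses `CompletableFuture.runAsync` in the role of `async`, waits on all spawned tasks in the role of `finish`, and uses an `AtomicLong` update in the role of `atomic`. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

class FinishAsyncSketch {
    // Sums 1..n by spawning one task per value and waiting for all of them.
    static long parallelSum(int n) {
        AtomicLong total = new AtomicLong();                 // updated atomically by every task
        List<CompletableFuture<Void>> spawned = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            final int value = i;
            // Role of "async": start the statement without waiting for it.
            spawned.add(CompletableFuture.runAsync(() -> total.addAndGet(value)));
        }
        // Role of "finish": block until every spawned task has terminated.
        CompletableFuture.allOf(spawned.toArray(new CompletableFuture[0])).join();
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(100));
    }
}
```

Unlike X10's finish, this only waits for the tasks it collected explicitly; tasks spawned transitively by those tasks would need to be tracked separately.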
What is X10?
• Based on sequential Java/Scala
• Extensions for concurrency, distribution and arrays
• Clean foundations
  – Advanced type system
  – Determinate and deadlock-free subsets
• Designed for multicore, clusters, for commercial applications and HPC.
• Supports simple constructs for atomicity and ordering
• Supports rich multidimensional distributed arrays
A realization of APGAS as a modern OO language.
X10 v 1.5= Sequential Java + Concurrency
• <stmt> ::= async (p) <stmt>
• <stmt> ::= finish <stmt>
• <stmt> ::= atomic <stmt>
• <stmt> ::= join <stmt>
  – New for X10 2.0
• <expr> ::= future (p) <expr>
• <expr> ::= f.force()
• Clocks
  – <stmt> ::= clock c = new clock();
  – c.next;
  – c.resume();
• Streams planned as "data carrying" clocks for X10 2.0
• Array language: multidimensional, distributed
  – Points
  – Regions
  – Distributions
  – Arrays
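For readers more familiar with Java, the future/force and clock constructs listed above have rough counterparts in `java.util.concurrent`. The sketch below is an analogy under stated assumptions, not X10 semantics: `CompletableFuture.supplyAsync` plus `join()` plays the role of `future` and `force()`, and a `Phaser` with `arriveAndAwaitAdvance()` plays the role of a clock and `next`. Places are not modeled, and all names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Phaser;

class FutureClockSketch {
    // Start evaluating an expression asynchronously, then block for its value.
    static int futureThenForce() {
        CompletableFuture<Integer> f = CompletableFuture.supplyAsync(() -> 6 * 7);
        return f.join();                                   // role of f.force()
    }

    // Two workers advance through `phases` steps in lockstep; returns the phase
    // number each worker observed at each step.
    static int[][] lockstep(int phases) {
        final int workers = 2;
        Phaser clock = new Phaser(workers);                // both workers are registered on the clock
        int[][] seen = new int[workers][phases];
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int id = w;
            threads[w] = new Thread(() -> {
                for (int step = 0; step < phases; step++) {
                    seen[id][step] = clock.getPhase();     // phase observed before the barrier
                    clock.arriveAndAwaitAdvance();         // role of "next": wait for all registered workers
                }
            });
            threads[w].start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

Because neither worker can pass the barrier until the other arrives, each worker observes phase `s` at step `s`.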
How does X10 help productivity?
• Expressive and efficient core language
  – Rich, flexible, widely-used OO language base
  – With multidimensional arrays
  – Dependent types permit compiler to check data-dependent assertions (catch bugs early) and generate better code
  – Supports reusable libraries
• Expressive concurrent language
  – Language supports global address space, one-sided remote operations, atomic operations, fine-grained asynchrony, termination detection, multi-level parallelism
• Faster time to correct code
  – Large classes of errors ruled out by design (e.g. language guarantees memory safety, pointer safety, deadlock-freedom for programs in a rich sublanguage, determinacy, advanced type-checking).
    • Cf. May 2005 PSC Study
  – While maintaining performance
    • Cf. SC 07 HPCC Award
• Tooling
  – Eclipse-based programming environment enables programmer to quickly develop, browse, understand and refactor code.
"Looks good, runs fast"
X10 1.7 language design
• In essence, support generics, cf. Java 1.5
    val d: Array[double] = …;
    val ranks: List[int]= …;
    class Array[T]{ …}
• Why generics?
  – Same class can be reused at different types.
  – Fosters reusable code.
  – Standard technique for commercial, type-safe languages; modern programmers expect generics.
  – Permits (large portions of) X10 runtime to be written in X10
• Actual language design influenced more by Scala than Java 1.5
  – Java has made many "wrong" decisions, driven by backward compatibility.
  – Scala gives a much cleaner OO + functional base than Java, while being compatible with JVMs.
  – X10 1.7 is not Scala
    • Diverges on many dimensions including types, pattern matching, multiple inheritance.
    • Retains run-time types (rtti)
X10 1.7: Design of generics
• X10 v 1.1 permits class parametrization with values
  – Constrained types, cf. OOPSLA 08
  – List(:length==3)
• X10 v 1.7 permits classes/methods/constructors to be parametrized with types as well.
  – Basic idea is to permit type-valued properties and parameters.
  – Same dependent type syntax is used for value- and type-parametrization.
    value Array[T]{
        val dist: dist;
        val pieces: ValRail[Rail[T]{current}];
        …
    }
• Generic classes are implemented by type expansion
• Type parameters are not erased at runtime
  – Provides significant run-time flexibility.
Design of type definitions
• Types with many dependent clauses can become cumbersome to read
  – => Need abstractions over types.
• Type definitions permit new (type-, value-) parametrized names for types.
• Type definitions expand out into real types.
  – Not generative; do not introduce new subtypes.
  – Scoped in classes, just as other members
    type nat = int{self >=0};
    type Rail[T](n:nat)=Array[T] {val p:place;dist==0..n-1->p};
    type booleanVar = int;
    …
    val a : Rail[double](N)=…;
Introduce functions/closures
• Permit succinct specifications of pieces of code (as lambda expressions)
• A function may be applied as many times as necessary, returns once each time.
• Functions may capture lexical variables.
• Functions implemented as special objects.
• Many library methods parametrized to accept such functions.
    val d = dist.blockCyclic(r);
    val a = new Array[double](d, (p:point)=>rand(p));
    val max = (x:double,y:double)=> (x <= y ? y :x);
    val m = a.reduce(0,max);
X10 remains an OO language (but with functions)
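The same shape (initialize an array from a function, then reduce it with a binary function) can be written with Java lambdas. This is a local, non-distributed sketch: the distribution `d` from the slide is not modeled, and `init` and `reduce` are hypothetical helpers standing in for the X10 array constructor and `a.reduce`.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.IntToDoubleFunction;
import java.util.stream.IntStream;

class ClosureReduceSketch {
    // Builds an array of n elements whose i-th value is f(i).
    static double[] init(int n, IntToDoubleFunction f) {
        return IntStream.range(0, n).mapToDouble(f).toArray();
    }

    // Folds the array from the left with op, starting from identity.
    static double reduce(double[] a, double identity, DoubleBinaryOperator op) {
        double acc = identity;
        for (double x : a) {
            acc = op.applyAsDouble(acc, x);
        }
        return acc;
    }

    public static void main(String[] args) {
        final double scale = 0.5;                                  // lexical variable captured by the closure
        double[] a = init(5, i -> scale * i);
        DoubleBinaryOperator max = (x, y) -> (x <= y ? y : x);     // same max as on the slide
        System.out.println(reduce(a, 0, max));
    }
}
```

As in the slide, the functions are ordinary objects passed to library-style methods; `max` can be applied any number of times.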
Example: RandomAccess
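// One activity per place; each generates its own stream of updates.
finish ateach(val (p) in UNIQUE.region) {
    var ran=HPCC_starts (p*NumUpdates);
    for (val i in 0..NumUpdates-1) {
        // Bits above the low LogTableSize bits select the target place.
        val placeID =((ran>>LogTableSize)& PLACEIDMASK) as int;
        val arg=ran;
        async {
            val local = Table(placeID);
            // XOR the value into the selected table slot atomically.
            atomic local.array((arg & local.mask) as int) ^= arg;
        }
        // Advance the pseudo-random sequence.
        ran = (ran << 1) ^ (ran < 0 ? POLY : 0L);
    }
}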
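The update loop can be tried out in a single JVM with the sketch below. It is a simplified analogue, not the benchmark: there is one shared table instead of one table fragment per place, each stream is seeded with a small nonzero constant rather than `HPCC_starts` (which is not shown in the slide), and the value of `POLY` is an assumption based on the HPCC reference code. `AtomicLongArray.accumulateAndGet` stands in for the `atomic` XOR.

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.stream.IntStream;

class RandomAccessSketch {
    static final long POLY = 0x7L;   // assumption: feedback polynomial from the HPCC reference code

    // Creates a table of 2^logSize entries with table[i] = i.
    static AtomicLongArray freshTable(int logSize) {
        int n = 1 << logSize;
        AtomicLongArray table = new AtomicLongArray(n);
        for (int i = 0; i < n; i++) {
            table.set(i, i);
        }
        return table;
    }

    // Runs `streams` independent update streams in parallel against one table.
    static void update(AtomicLongArray table, int streams, int updatesPerStream) {
        final int mask = table.length() - 1;              // table length is a power of two
        IntStream.range(0, streams).parallel().forEach(p -> {
            long ran = p + 1;                             // assumption: simple seed instead of HPCC_starts
            for (int i = 0; i < updatesPerStream; i++) {
                final long arg = ran;
                // Atomic read-modify-write: table[arg & mask] ^= arg
                table.accumulateAndGet((int) (arg & mask), arg, (old, x) -> old ^ x);
                ran = (ran << 1) ^ (ran < 0 ? POLY : 0L); // advance the pseudo-random sequence
            }
        });
    }
}
```

Because XOR is its own inverse and every update is atomic, applying the same update streams a second time restores the initial table, which gives a simple correctness check regardless of how the updates interleave.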
Pseudo Depth-first search
Breadth-first search
class V[T] {
    var parent: V[T];
    val neighbors: Array[V[T]];
    val data: T;
    def this(d: T){data=d; …}
    def compute(): void {
        for (val v in neighbors) {
            atomic
                v.parent=(v.parent==null?this:v.parent);
            if (v.parent==this)
                async clocked (c) {next; v.compute(); }
        }
    }
    def computeTree(): void {
        parent=this;
        finish compute();
    }
    …
}
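A Java analogue of the parent-claiming step is sketched below. It assumes a fork/join pool in place of async/finish and `compareAndSet` in place of the atomic block; it does not model the clock, so it corresponds to the unsynchronized (pseudo depth-first) traversal rather than the level-by-level breadth-first one. All class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicReference;

class SpanningTreeSketch {
    static final class Vertex {
        final int id;
        final List<Vertex> neighbors = new ArrayList<>();
        final AtomicReference<Vertex> parent = new AtomicReference<>();
        Vertex(int id) { this.id = id; }
    }

    // Claims unvisited neighbors of one vertex, then explores each claimed neighbor in parallel.
    static final class Compute extends RecursiveAction {
        private final Vertex self;
        Compute(Vertex self) { this.self = self; }

        @Override
        protected void compute() {
            List<Compute> children = new ArrayList<>();
            for (Vertex v : self.neighbors) {
                // Atomic claim: only the first vertex to reach v becomes its parent.
                if (v.parent.compareAndSet(null, self)) {
                    children.add(new Compute(v));
                }
            }
            invokeAll(children);   // run all child tasks and wait for them
        }
    }

    // The root is its own parent; returns once the whole reachable graph has been claimed.
    static void computeTree(Vertex root) {
        root.parent.set(root);
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new Compute(root));
        } finally {
            pool.shutdown();
        }
    }
}
```

Which spanning tree results depends on scheduling, but every reachable vertex ends up with exactly one parent and following parents always leads back to the root.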
December 2008
• Public availability of X10 Flash compiler and X10lib runtime for clusters of SMPs.
  – On track.
• Demonstration of initial functionality of X10 source-level debugger for Eclipse (operating on Java code produced by X10 compiler).
  – On track.
  – Note that X10DT currently has source-level breakpoint and stepping functionality.
X10 papers
1. "Constrained types for OO Languages", to appear in OOPSLA 08.
2. "Solving irregular graph algorithms using adaptive work-stealing", to appear in ICPP 08.
3. "Optimizing array accesses in high productivity languages", to appear in HPCC 2007.
4. "Deadlock-free scheduling of X10 Computations with bounded resources", SPAA 2007, June 2007.
5. "A Theory of Memory Models", PPoPP, March 2007.
6. "May-Happen-in-Parallel Analysis of X10 Programs", PPoPP, March 2007.
7. "An annotation and compiler plug-in system for X10", IBM Technical Report, Feb 2007.
8. "Experiences with an SMP Implementation for X10 based on the Java Concurrency Utilities", Workshop on Programming Models for Ubiquitous Parallelism (PMUP), September 2006.
9. "An Experiment in Measuring the Productivity of Three Parallel Programming Languages", P-PHEC workshop, February 2006.
10. "X10: An Object-Oriented Approach to Non-Uniform Cluster Computing", OOPSLA conference, October 2005.
11. "Concurrent Clustered Programming", CONCUR conference, August 2005.
12. "X10: an Experimental Language for High Productivity Programming of Scalable Systems", P-PHEC workshop, February 2005.