A Framework for Reasoning About Inherent Parallelism in Modern Object-Oriented Languages
Presented by A. Craik (5-Jan-12)
Research supported by funding from Microsoft Research and the Queensland State Government
1
Introduction
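[Figure: two routes to parallel code. One goes from a procedural algorithm, via semantic analysis, to a parallel algorithm and an explicitly parallel implementation; the other applies dependency analysis to the sequential implementation of a procedural algorithm, giving a sequential implementation with injected parallelism.]
2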
Introduction
• Inherent Parallelism:
a = 1;
for (int i=0; i<max; ++i)
    a[i] = a[i] + 1;
b = 2;
c = a + b;
• Three steps for finding & exploiting it:
 1. Find the inherent parallelism in the program
 2. Decide which inherent parallelism is worth exploiting
 3. Choose an implementation technology to expose the selected parallelism
3
Introduction
• Dependencies impose ordering constraints
• Sequential consistency is required
• Two forms:
– Control: which statements will run
– Data: reads & writes of shared state
• Control dependencies are well studied and easier to handle inter-procedurally
– Example: Java checked exceptions
4
Data Dependencies
• Anti-Dependence (Write-After-Read)
int a = 1;
int b = a + 1;
a = 2;
• Output Dependence (Write-After-Write)
int a = 1;
a = 4;
a = 5;
• Flow Dependence (Read-After-Write)
int a = 1;
a = 2;
int b = a + 1;
5
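The three kinds differ only in the order of the read and the write to the shared location. As an illustration (Java stands in for the C#-family code of these slides; `Access`, `Dependence` and `DependenceKinds` are hypothetical names), a minimal classifier for an earlier and a later access to the same location might look like this:

```java
// Classifies the data dependence between two accesses to the same location,
// where `first` executes before `second` in sequential program order.
enum Access { READ, WRITE }

enum Dependence { NONE, FLOW, ANTI, OUTPUT }

final class DependenceKinds {
    static Dependence classify(Access first, Access second) {
        if (first == Access.WRITE && second == Access.READ) {
            return Dependence.FLOW;   // read-after-write (true dependence)
        }
        if (first == Access.READ && second == Access.WRITE) {
            return Dependence.ANTI;   // write-after-read
        }
        if (first == Access.WRITE && second == Access.WRITE) {
            return Dependence.OUTPUT; // write-after-write
        }
        return Dependence.NONE;       // read-after-read imposes no ordering
    }
}
```

For example, `int b = a + 1;` followed by `a = 2;` reads `a` and then writes it, so `classify(Access.READ, Access.WRITE)` returns `ANTI`.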
Traditional Approach
for (int i=0; i < 3; ++i) {
    for (int j=0; j < i+1; ++j) {
        a[i,j] = b[i,j] + c[i,j];
        b[i,j] = a[i,j+1];
    }
}
• Pair-wise analysis of statements and expressions
• Can a, b or c refer to the same array (aliasing)?
6
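Pair-wise analysis can be made concrete for this small loop by enumerating its iteration space. The Java sketch below assumes that `a`, `b` and `c` are three distinct arrays, which is exactly the question the slide raises; the class and record names are hypothetical. It records every access with the iteration that performs it and reports pairs from different iterations that touch the same element with at least one write:

```java
import java.util.ArrayList;
import java.util.List;

// Brute-force loop-carried dependence check for the slide's loop,
// ASSUMING a, b and c are distinct arrays (no aliasing).
final class LoopDependences {
    record Access(char array, int i, int j, boolean write, int iteration) {}

    static List<String> loopCarried() {
        List<Access> accesses = new ArrayList<>(); // kept in sequential order
        int iteration = 0;
        for (int i = 0; i < 3; ++i) {
            for (int j = 0; j < i + 1; ++j) {
                // a[i,j] = b[i,j] + c[i,j];
                accesses.add(new Access('b', i, j, false, iteration));
                accesses.add(new Access('c', i, j, false, iteration));
                accesses.add(new Access('a', i, j, true, iteration));
                // b[i,j] = a[i,j+1];
                accesses.add(new Access('a', i, j + 1, false, iteration));
                accesses.add(new Access('b', i, j, true, iteration));
                iteration++;
            }
        }
        List<String> deps = new ArrayList<>();
        for (int x = 0; x < accesses.size(); x++) {
            for (int y = x + 1; y < accesses.size(); y++) {
                Access p = accesses.get(x), q = accesses.get(y); // p runs before q
                boolean sameElement = p.array() == q.array() && p.i() == q.i() && p.j() == q.j();
                if (sameElement && p.iteration() != q.iteration() && (p.write() || q.write())) {
                    String kind = p.write() ? (q.write() ? "output" : "flow") : "anti";
                    deps.add(kind + " on " + p.array() + "[" + p.i() + "," + p.j() + "]");
                }
            }
        }
        return deps;
    }
}
```

Under that assumption the loop carries three anti-dependences on `a` (for example, iteration (1,0) reads `a[1,1]` before iteration (1,1) overwrites it). If `b` could alias `a`, the `b` accesses would also have to be compared with the `a` accesses, and the result would change.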
Traditional Approach
for (int i=0; i < 3; ++i) {
    for (int j=0; j < i+1; ++j) {
        a[i,j] = b[i,j] + c[i,j];
        b[i,j] = a.readIandJInc(i,j);
    }
}
• What does a.readIandJInc(i,j) do?
• With dynamic dispatch any override may run, so we must examine ALL possible implementations!
7
Side-Effects
class Holder {
    public static int value;
}
class Array {
    public int readIandJInc(int i, int j) {
        return this[i,j+1];
    }
}
8
Side-Effects
class Holder {
    public static int value;
}
class Array {
    public int readIandJInc(int i, int j) {
        this[0,0] = i + j;
        return this[i,j];
    }
}
9
Side-Effects
class Holder {
    public static int value;
}
class Array {
    public int readIandJInc(int i, int j) {
        Holder.value++;
        return this[i,j];
    }
}
10
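Because the caller cannot know which override will run, a safe analysis has to assume the union of the effects of every implementation. A minimal Java sketch of that conservative summary (names hypothetical; `this` stands for the array's own state and `Holder.value` for the static field), using effects read off the three variants above:

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Conservative summary of a dynamically dispatched call: any override may
// run, so the caller assumes the union of all implementations' effects.
final class CallSummary {
    record Effects(Set<String> reads, Set<String> writes) {}

    static Effects union(List<Effects> implementations) {
        Set<String> reads = new TreeSet<>();
        Set<String> writes = new TreeSet<>();
        for (Effects e : implementations) {
            reads.addAll(e.reads());
            writes.addAll(e.writes());
        }
        return new Effects(reads, writes);
    }

    // Effects of the three variants on the previous slides.
    static final List<Effects> VARIANTS = List.of(
        new Effects(Set.of("this"), Set.of()),                               // returns an element
        new Effects(Set.of("this"), Set.of("this")),                         // also writes this[0,0]
        new Effects(Set.of("this", "Holder.value"), Set.of("Holder.value"))  // Holder.value++
    );
}
```

The union reports writes to both `this` and `Holder.value`, so a single override that touches a static field is enough to make every call through that signature look like a write to global state.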
Limitations of Current Techniques
[Table: Traditional Approach vs. My Approach, compared on kernels, inter-procedural analysis, and precision ("less precise")]
• Traditional approach:
– Focused on analyzing complex, tight loops
– Poor abstraction and composition
– Too complex for programmers to use without tool support
11
The Idea
• Goal:
– Simplify inter-procedural dependency analysis
• Idea:
– Ensure safety
– Make reasoning modular and composable
12
The Idea
• Specify effects on the method signature:
public int getReads() reads<> writes<>
• What goes in the angle brackets?
– Abstract effect descriptions
– Composable descriptions
– Verifiable
13
The Idea
14
Object-Orientation
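• Encapsulation → representation hierarchy
[Figure: a Person's representation contains its name (String) and dateOfBirth (Date); its employer field refers to a Company outside that representation.]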
15
The Idea
16
Safe Parallelism
Block 1 {
...
}   reads<a,b> writes<c,d>
Block 2 {
...
}   reads<w,x> writes<y,z>
• Can two arbitrary pieces of code execute in parallel safely?
• Type rules specify how effect sets are computed
• Look for overlaps in the read & write effect sets to find possible data dependencies
17
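The overlap test itself is a small set computation: two blocks conflict if either one writes something the other reads or writes, while read/read sharing is harmless. A minimal Java sketch, assuming for now that effect names are flat and unrelated (no nesting of contexts); `EffectSets` and `Effects` are hypothetical names:

```java
import java.util.Collections;
import java.util.Set;

// Two blocks may run in parallel unless one writes something the other
// reads or writes. Effect names are treated as flat, unrelated labels.
final class EffectSets {
    record Effects(Set<String> reads, Set<String> writes) {}

    static boolean conflict(Effects x, Effects y) {
        return !Collections.disjoint(x.writes(), y.writes())   // write-after-write
            || !Collections.disjoint(x.writes(), y.reads())    // x writes what y reads
            || !Collections.disjoint(x.reads(), y.writes());   // x reads what y writes
    }
}
```

With the effect sets shown on this slide, `conflict` returns false, so Block 1 and Block 2 have no data dependencies between them.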
Dependencies using Effect Sets
• A dependency exists where two triangles of representation overlap
• Triangles can only be nested or disjoint; they never partially overlap
• Becomes a check for a parent-child relationship; disjointness → no dependency
18
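Once contexts are nested, the flat test is no longer enough: writing a context also touches everything inside it. One way to sketch this in Java, assuming each context is encoded as its path from the root (`world`) of the ownership tree (an illustrative encoding, not necessarily Zal's; `ContextPaths` is a hypothetical name), is to treat two contexts as overlapping exactly when one path is a prefix of the other:

```java
import java.util.List;

// Contexts as root paths, e.g. [world, person, name]. Representations can
// only nest, so two contexts overlap exactly when one path prefixes the other.
final class ContextPaths {
    static boolean overlaps(List<String> p, List<String> q) {
        int n = Math.min(p.size(), q.size());
        return p.subList(0, n).equals(q.subList(0, n));
    }

    // Same read/write rules as the flat test, lifted to nested contexts.
    static boolean conflict(List<List<String>> reads1, List<List<String>> writes1,
                            List<List<String>> reads2, List<List<String>> writes2) {
        return anyOverlap(writes1, writes2) || anyOverlap(writes1, reads2)
            || anyOverlap(reads1, writes2);
    }

    private static boolean anyOverlap(List<List<String>> xs, List<List<String>> ys) {
        for (List<String> x : xs) {
            for (List<String> y : ys) {
                if (overlaps(x, y)) return true;
            }
        }
        return false;
    }
}
```

Because `world` is a prefix of every path, an effect on `world` conflicts with everything, matching its role as the parent of all contexts.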
Types of Parallelism
• Task Parallelism
– Run two or more separate operations at the same time
• Loop Parallelism
– Execute loop iterations in parallel
• Pipeline Parallelism
– Stage loop body execution so that the execution of iterations overlaps safely
19
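Pipeline parallelism is the least self-explanatory of the three. The Java sketch below shows the idea for a loop whose body is split into two stages: while stage 2 works on iteration k, stage 1 can already run iteration k+1, and a FIFO queue keeps results in iteration order. The stage bodies (squaring, then adding one) and the class name are placeholders, and the sketch assumes stage 2 of one iteration never depends on state written by stage 1 of a later iteration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Two-stage pipeline: stage 1 of iteration k+1 overlaps stage 2 of iteration k.
final class TwoStagePipeline {
    static List<Integer> run(List<Integer> inputs) {
        BlockingQueue<Integer> between = new LinkedBlockingQueue<>();
        List<Integer> results = new ArrayList<>();   // written only by stage 2
        int count = inputs.size();

        // Stage 2 runs on its own thread and consumes stage-1 outputs in order.
        Thread stage2 = new Thread(() -> {
            try {
                for (int k = 0; k < count; k++) {
                    results.add(between.take() + 1);  // stage 2 of iteration k
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        stage2.start();

        try {
            for (int x : inputs) {
                between.put(x * x);                   // stage 1 of iteration k
            }
            stage2.join();   // join also makes `results` safely visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return results;
    }
}
```

`run(List.of(1, 2, 3))` returns `[2, 5, 10]`, the same list the sequential loop would produce.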
Task Parallelism
class Demo {
    void op1() reads<a,b> writes<c,d> {…}
    void op2() reads<w,x> writes<y,z> {…}
}
• Can we execute calls to op1 and op2 in parallel?
• Determine the overlap in the effect sets; no overlap → no data dependencies
• Realization using one-way calls or futures
20
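Once the effect sets are known to be disjoint, realizing task parallelism with futures is mechanical. In this Java sketch (hypothetical class; Java has no effect annotations, so disjointness is established by hand through the fields each method touches), `op1` and `op2` run as `CompletableFuture`s and the caller waits for both:

```java
import java.util.concurrent.CompletableFuture;

// op1 and op2 touch disjoint fields, mirroring reads<a,b> writes<c,d>
// and reads<w,x> writes<y,z>, so running them concurrently is safe.
final class TaskDemo {
    int a = 1, b = 2, c, d;
    int w = 3, x = 4, y, z;

    void op1() { c = a + b; d = a * b; }   // reads a,b  writes c,d
    void op2() { y = w + x; z = w * x; }   // reads w,x  writes y,z

    void runBoth() {
        CompletableFuture<Void> f1 = CompletableFuture.runAsync(this::op1);
        CompletableFuture<Void> f2 = CompletableFuture.runAsync(this::op2);
        CompletableFuture.allOf(f1, f2).join(); // wait for both futures
    }
}
```

If `op2` also wrote `c`, the two futures would race on it; detecting that case before parallelizing is what the effect sets are for.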
Loop Parallelism Conditions
• Data-parallel loops are a major source of parallelism in imperative programs
• Start with a simple data-parallel loop in the form of a foreach loop:
foreach (T element in collection)
    element.operation();
21
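In Java, the closest analogue of running this foreach loop's iterations in parallel is a parallel stream. The sketch below (hypothetical `Counter` and `ParallelForeach` classes) is only correct because each `operation()` call mutates nothing but its own element and the elements are distinct objects, which is the kind of guarantee an effect system has to establish:

```java
import java.util.List;

// Each element's operation() touches only that element's own state.
final class Counter {
    private int value;
    void operation() { value++; }
    int value() { return value; }
}

// Parallel rendering of: foreach (T element in collection) element.operation();
final class ParallelForeach {
    static void run(List<Counter> collection) {
        collection.parallelStream().forEach(Counter::operation);
    }
}
```

If the same `Counter` appeared twice in the list, two iterations could increment it concurrently and lose an update.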
Foreach Loop Conditions
• Condition 1:
The areas holding the representations of the objects returned by the enumerator are all disjoint from one another
22
Foreach Loop Conditions
• Condition 2:
The operation only mutates the representation of its “own” element and does not read the state owned by any of the other elements
23
Foreach Loop Conditions
• Condition 3:
There are no control dependencies that would prevent loop parallelization
24
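Condition 1 can sometimes be checked at run time. A minimal Java sketch under a simplifying assumption (each element's representation is rooted at the element itself and no element owns another), in which the condition reduces to no object being enumerated twice; `Condition1` is a hypothetical name, and object identity, not `equals`, is what matters:

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Runtime check of Condition 1 under the flat-ownership ASSUMPTION above.
final class Condition1 {
    static boolean distinctElements(List<?> elements) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Object e : elements) {
            if (!seen.add(e)) {
                return false; // same object twice: two iterations share state
            }
        }
        return true;
    }
}
```

With nested ownership, distinct objects are not enough: the check would also have to rule out one element owning another.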
Arbitrary Loop Bodies
• So far we have looked at
foreach(T element in collection)
    element.operation();
• Question: how do we generalize this to an arbitrary loop body?
foreach(T element in collection) {
    //sequence of statements
    //including local var defs
    //and a read of a context r
}
25
Loop Body Rewriting
• The loop becomes:
foreach (T elem in collection)
    elem.loopBody(this);
• Where loopBody is:
class T {
    void loopBody(Foo me) { // Foo: the enclosing class, i.e. the type of the original this
        //same sequence of statements
        //replace all elem by this
        //and all this by me
    }
}
26
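The rewriting can be checked on a tiny example. In this Java sketch, `Foo` plays the enclosing class and `T` the element class (both hypothetical, with placeholder statements); `original` contains the loop body inline, and `rewritten` calls the extracted `loopBody`, in which `elem` has become `this` and the enclosing `this` has become `me`:

```java
import java.util.List;

final class T {
    int total;

    // Extracted loop body: elem -> this, enclosing this -> me.
    void loopBody(Foo me) {
        int scaled = total * me.factor;   // local var def; reads the outer context
        total = scaled + 1;               // mutates only this element
    }
}

final class Foo {
    int factor = 2;

    void original(List<T> collection) {
        for (T elem : collection) {
            int scaled = elem.total * this.factor;
            elem.total = scaled + 1;
        }
    }

    void rewritten(List<T> collection) {
        for (T elem : collection) {
            elem.loopBody(this);
        }
    }
}
```

Both versions turn an element with `total == 3` into `7` when `factor == 2`. The read of `me.factor` is the read of an outside context r that must appear in `loopBody`'s effects.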
Ownership Types
• Designed to enforce encapsulation
• Adapted to validate encapsulation
• Type parameters capture memory-referencing permissions
class Person[o,c] {
    private String|this| Name;
    private Date|this| DateOfBirth;
    private Company|c| Employer;
    …
}
28
Ownerships & Effects
class Company[o] {
    public string name;
    …
}
class Person[o,c] {
    private Company|c| Employer;
    public string employerName()
        reads<this,c> writes<>
    {
        return Employer.name;
    }
    …
}
29
Contexts and Dependencies
• Analyze & apply sufficient conditions
• The relationships between all pairs of contexts need to be known
• Need some basis for believing that the relationships between contexts hold
30
Reasons for a Runtime System
• Some relationships are known statically:
– The owner of an object is a parent of the object’s this context
– The world context is a parent of all contexts
• Other relationships may only be known dynamically
• Optionally track ownership at runtime to allow runtime conditions
31
Conditional Parallelism
Original loop:
for(T<c> e in collection){
    e.operation(arguments);
}
• disjoint(r,c) always true:
parallel for(T<c> e in collection){
    e.operation(arguments);
}
• disjoint(r,c) unknown:
if (disjoint(r,c)) {
    parallel version
} else {
    sequential version
}
• disjoint(r,c) always false:
serial for(T<c> e in collection){
    e.operation(arguments);
}
32
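The "unknown" case amounts to a single runtime test hoisted in front of the loop. A Java sketch, with the test supplied as a hypothetical `disjointRC` predicate (presumably backed by runtime ownership tracking) and the parallel version realized as a parallel stream; `ConditionalLoop` is likewise a hypothetical name:

```java
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

// Chooses the parallel or sequential version of the loop at run time.
final class ConditionalLoop {
    // Returns true if the parallel version ran.
    static <E> boolean run(List<E> collection, Consumer<E> operation, BooleanSupplier disjointRC) {
        if (disjointRC.getAsBoolean()) {
            collection.parallelStream().forEach(operation); // parallel version
            return true;
        }
        collection.forEach(operation);                        // sequential version
        return false;
    }
}
```

Returning which version ran makes it easy to confirm that both paths apply the operation to every element exactly once.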
Reasons for a Runtime System
• We do not know the relationships between all contexts at compile time
• They may vary from one object or method invocation to another
• Reasons:
– Separate compilation
– Dynamic linking
– Complex data flows
33
Reasons for a Runtime System
• The type system supports specifying context relationships that the programmer asserts must be true
void oper1[r]() reads<r,c…> writes<…>
    where r # c {
    …
    foreach(T|c| elem in collection){…}
    …
}
– r # c asserts that contexts r and c are disjoint
34
Runtime System Implementation
• Naïve implementation: each object keeps a pointer to its owner
35
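A Java sketch of this naive scheme (the `Owned` class is hypothetical; a `null` owner marks the world context at the root): each object stores its owner, and a disjointness test walks both owner chains looking for an ancestor relationship, so its cost grows with the depth of the ownership tree:

```java
// Naive runtime ownership tracking via owner pointers.
final class Owned {
    final Owned owner;                         // null only for the world context

    Owned(Owned owner) { this.owner = owner; }

    static final Owned WORLD = new Owned(null);

    // True if `ancestor` is this context or one of its owners.
    boolean isInside(Owned ancestor) {
        for (Owned o = this; o != null; o = o.owner) {
            if (o == ancestor) return true;
        }
        return false;
    }

    // Contexts are disjoint when neither contains the other.
    static boolean disjoint(Owned r, Owned c) {
        return !r.isInside(c) && !c.isInside(r);
    }
}
```

For a `person` owned by `WORLD` with a `name` owned by `person`, `disjoint(person, name)` is false, while `name` and a separate `company` are disjoint.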
[Figure: structure of the formal results. Shown: Subject Reduction; Progress; Well Formed Heap; Owner Invariance; AFJO Soundness; Effect Soundness; Contexts form a Tree; Cast Safety; Effect Completeness; Static Context Relations; Disjointness Test Correct; Context Parameters do not survive; Context Disjointness Implies Effect Disjointness; Disjoint effects imply no data dependencies; Update Dependency Preservation Sufficient for Parallelization Sequential Consistency; Task Parallelism Sufficient Conditions; Data Parallelism Sufficient Conditions; Pipeline Parallelism Sufficient Conditions.]
36
Implementation – Zal
• Added my system to C# 3.5
• Extended the GPC# compiler

Metric                    Total    GPC#     Extensions   Extensions (% of Total)
SLOC-P (physical lines)   39,444   27,288   12,156       30.8%
SLOC-L (logical lines)    22,201   14,957    7,244       32.6%

• Added infrastructure to support arbitrary type parameters
• Implemented a runtime ownership tracking system (~1,000 lines)
37
Implementation – Zal
[Figure: Zal source → Zal Compiler → C# source → Microsoft C# Compiler → CIL program w/ ownership tracking; together with the runtime ownership libraries, this yields an executing program with automatic parallelization.]
38
Implementation – Zal
[Figure: Zal compiler architecture; the legend distinguishes C# compilation steps from Zal compilation steps.]
• Scanner (generated by GPLex) – Scanner.scan(): reads a stream of characters and processes them into tokens
• Parser (generated by Coco/R) – Parser.parse(): converts the stream of tokens into an Abstract Syntax Tree (AST)
• Type Checker – TypeCheck(): resolves all TypeRefs to TypeDefs & checks type correctness
• Effect Computation – computeEffects(), LocalEffects(): computes heap & stack effects for AST nodes
• Parallelization – Parallelize(): checks sufficient conditions for parallelism and implements them
• Ownership Implementation – BuildOwnershipImplementation(): implements Zal features in C# by modifying the AST
• Code Generation – Output(), Emit: generates a C# or CIL implementation of the AST
• I/O: source code files and dynamically linked libraries; output is a bytecode file or a C# source file
39
Validation
• Have applied my system to a number of realistic applications
• Overall, annotation requires modifying 20% of the source
• Ownership tracking overhead:
– Execution time: 10% to 20%
– Memory usage: 15% to 30%
• Implementation not fully optimized
40
Validation – Speedup
41
Validation – Speedup
42
Related Work – Prog. Langs.
• Focus on providing tools to express parallelism
• No support for validating the correctness of parallelization
• Assume programmer knowledge of parallel programming constructs
• Examples: Fortress, Chapel, X10
43
Related Work – Ownership
• Have proposed effect systems, but only suggested their application to parallelism
• Data race and deadlock detection for locking – very different reasoning
• Deterministic Parallel Java (late 2009)
– Modified ownerships
– Focused on kernels
– Lost composition & abstraction to do so
44
Contributions
• Abstract and composable system for reasoning about effects based on Ownership Types
• Effect and reasoning systems applied to a real language and real program examples
• Real parallelism detected and exploited automatically
45
Contributions
• Developed and proved sufficient conditions for a number of different forms of parallelism
• Runtime system to support static reasoning
46
Publications
A. Craik and W. Kelly. Using Ownership to Reason About Inherent Parallelism in Imperative Object-Oriented Programs. International Conference on Compiler Construction, ed. R. Gupta, LNCS 6011, pp. 145-164. Springer-Verlag Berlin Heidelberg, 2010.
W. Reid, W. Kelly, and A. Craik. Reasoning about Parallelism in Modern Object-Oriented Languages. Australasian Computer Science Conference, 2008.
+3 technical reports on various versions of the reasoning system in e-prints
47
Conclusion
• System for reasoning about data dependencies and parallelism
• Abstract & composable
• Usable by both programmers & automated tools
• The question of when & how to exploit the parallelism is still open
• Prototype demonstrates that this automated reasoning is possible
48
Q&A
49
Ownership & The Stack
• Ownership types are traditionally used for encapsulation
• The stack is not considered by these works
• Stack & stack-referencing models vary from language to language
• I consider a restricted stack model:
– Stack and heap are disjoint
– Stack locations can be differentiated by name
50
Ownership & The Stack
• Stack model fits Java, C#, and VB.NET
• Dereferencing to read the heap causes an ownership effect
• Stack location names are unique and cannot be aliased without dereferencing
51