A Framework for Reasoning About Inherent Parallelism in Modern Object-Oriented Languages
Presented by A. Craik (5-Jan-12)
Research supported by funding from Microsoft Research and the Queensland State Government

Introduction
[Figure: two routes from a Procedural Algorithm: Semantic Analysis to a Parallel Algorithm and an Explicitly Parallel Implementation; or a Sequential Implementation plus Dependency Analysis to a Sequential Implementation w/ Injected Parallelism]

Introduction
• Inherent parallelism:
    a = 1;
    for (int i = 0; i < max; ++i)
        a[i] = a[i] + 1;
    b = 2;
    c = a + b;
• Three steps for finding & exploiting it:
    1. Find the inherent parallelism in the program
    2. Decide which inherent parallelism is worth exploiting
    3. Choose an implementation technology to expose the selected parallelism

Introduction
• Dependencies impose ordering constraints
• Sequential consistency is required
• Two forms:
    – Control – which statements will run
    – Data – reads & writes of shared state
• Control dependencies are well studied and easier to handle inter-procedurally
    – Example: Java checked exceptions

Data Dependencies
• Flow Dependence (Read-After-Write)
    int a = 1; a = 2; int b = a + 1;
• Output Dependence (Write-After-Write)
    int a = 1; a = 4; a = 5;
• Anti-Dependence (Write-After-Read)
    int a = 1; int b = a + 1; a = 2;

Traditional Approach
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < i + 1; ++j) {
            a[i,j] = b[i,j] + c[i,j];
            b[i,j] = a[i,j+1];
        }
    }
• Pair-wise analysis of statements and expressions
• Can a, b or c refer to the same array (aliasing)?

Traditional Approach
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < i + 1; ++j) {
            a[i,j] = b[i,j] + c[i,j];
            b[i,j] = a.readIandJInc(i,j);
        }
    }
• What does a.readIandJInc(i,j) do?
• Examine ALL possible implementations!
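The pair-wise test used by the traditional approach can be illustrated with a minimal Python sketch, assuming each statement is reduced to the set of variable names it reads and writes. `Stmt`, `dependences` and `independent` are hypothetical names chosen for illustration; a real analysis must also resolve aliasing and array subscripts (whether a, b and c name the same array, or what a called method touches), which this sketch deliberately ignores.

```python
from dataclasses import dataclass

# Hypothetical model for illustration: statements are reduced to read/write
# sets of variable names; aliasing and array subscripts are ignored.
@dataclass(frozen=True)
class Stmt:
    text: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()

def dependences(earlier: Stmt, later: Stmt) -> set:
    """Kinds of data dependence from `earlier` to a statement that follows it."""
    kinds = set()
    if earlier.writes & later.reads:
        kinds.add("flow")    # read-after-write
    if earlier.reads & later.writes:
        kinds.add("anti")    # write-after-read
    if earlier.writes & later.writes:
        kinds.add("output")  # write-after-write
    return kinds

def independent(earlier: Stmt, later: Stmt) -> bool:
    """The pair may be reordered or run in parallel only if no data dependence exists."""
    return not dependences(earlier, later)

s1 = Stmt("int a = 1;", writes=frozenset({"a"}))
s2 = Stmt("int b = a + 1;", reads=frozenset({"a"}), writes=frozenset({"b"}))
s3 = Stmt("a = 2;", writes=frozenset({"a"}))

for x, y in [(s1, s2), (s2, s3), (s1, s3)]:
    print(f"{x.text!r} -> {y.text!r}: {sorted(dependences(x, y))}")
```

For the three statements int a = 1; int b = a + 1; a = 2; the sketch reports a flow dependence from the first to the second, an anti-dependence from the second to the third, and an output dependence from the first to the third. Once a statement contains a method call, its read and write sets are exactly what the traditional approach cannot compute without examining every possible implementation.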
Side-Effects
• Three possible implementations of readIandJInc, given:
    class Holder { public static int value; }
• Reads a neighbouring element:
    class Array {
        public int readIandJInc(int i, int j) { return this[i,j+1]; }
    }
• Writes another element of the same array:
    class Array {
        public int readIandJInc(int i, int j) { this[0,0] = i + j; return this[i,j]; }
    }
• Writes unrelated static state:
    class Array {
        public int readIandJInc(int i, int j) { Holder.value++; return this[i,j]; }
    }

Limitations of Current Techniques
[Figure: Traditional Approach (kernels) vs My Approach (inter-procedural, less precise)]
• Traditional:
    – Focused on analyzing complex tight loops
    – Poor abstraction and composition
    – Too complex for programmers to use without tool support

The Idea
• Goal:
    – Simplify inter-procedural dependency analysis
• Idea:
    – Ensure safety
    – Make reasoning modular and composable

The Idea
• Specify effects on the method signature:
    public int getReads() reads<> writes<>
• What goes in the angle brackets?
    – Abstract effect descriptions
    – Composable descriptions
    – Verifiable descriptions

Object-Orientation
• Encapsulation gives a representation hierarchy
[Figure: representation hierarchy of a Person with fields name (String), dateOfBirth (Date) and employer (Company)]

Safe Parallelism
    Block 1 { ... }   reads<a,b> writes<c,d>
    Block 2 { ... }   reads<w,x> writes<y,z>
• Can two arbitrary pieces of code execute in parallel safely?
• Type rules specify how effect sets are computed
• Look for overlaps between one block's write set and the other block's read or write set to find possible data dependencies

Dependencies using Effect Sets
• A dependency exists where two triangles of representation overlap
• Triangles can only be nested or disjoint (contexts form a tree)
• The test becomes a check for a parent-child relationship; disjointness ⇒ no dependency

Types of Parallelism
• Task Parallelism – run two or more separate operations at the same time
• Loop Parallelism – execute loop iterations in parallel
• Pipeline Parallelism – stage loop body execution so that the execution of successive iterations overlaps safely

Task Parallelism
    class Demo {
        void op1() reads<a,b> writes<c,d> {…}
        void op2() reads<w,x> writes<y,z> {…}
    }
• Can we execute calls to op1 and op2 in parallel?
• Determine the overlap in the effect sets; no overlap ⇒ no data dependencies
• Realization using one-way calls or futures

Loop Parallelism Conditions
• Data-parallel loops are a major source of parallelism in imperative programs
• Start with a simple data-parallel loop in the form of a foreach loop:
    foreach (T element in collection) element.operation();

Foreach Loop Conditions
• Condition 1: The areas holding the representations of the objects returned by the enumerator are all disjoint from one another
• Condition 2: The operation only mutates the representation of its "own" element and does not read the state owned by any of the other elements
• Condition 3: There are no control dependencies that would prevent loop parallelization

Arbitrary Loop Bodies
• So far we have looked at:
    foreach (T element in collection) element.operation();
• Question: How do we generalize this to an arbitrary loop body?
    foreach (T element in collection) {
        // sequence of statements
        // including local variable definitions
        // and a read of a context r
    }

Loop Body Rewriting
• The loop becomes:
    foreach (T elem in collection) elem.loopBody(this);
• where loopBody is:
    class T {
        void loopBody(Foo me) {
            // same sequence of statements,
            // with every elem replaced by this
            // and every this replaced by me
        }
    }
• Foo is the class containing the original loop

Ownership Types
• Designed to enforce encapsulation
• Adapted to validate encapsulation
• Type parameters capture memory referencing permissions:
    class Person[o,c] {
        private String|this| Name;
        private Date|this| DateOfBirth;
        private Company|c| Employer;
        …
    }

Ownership & Effects
    class Company[o] { public string name; … }
    class Person[o,c] {
        private Company|c| Employer;
        public string employerName() reads<this,c> writes<> {
            return Employer.name;
        }
        …
    }

Contexts and Dependencies
• Analyze & apply the sufficient conditions
• The relationships between all relevant pairs of contexts need to be known
• Need some basis for believing that the relationships between contexts hold

Reasons for a Runtime System
• Some relationships are known statically:
    – The owner of an object is a parent of the object's this context
    – The world context is a parent of all contexts
• Other relationships may only be known dynamically
• Optionally track ownership at runtime to allow runtime conditions

Conditional Parallelism
• disjoint(r,c) always true: parallel
    parallel for (T<c> e in collection) { e.operation(arguments); }
• disjoint(r,c) unknown: test at runtime
    if (disjoint(r,c)) { parallel version } else { sequential version }
• disjoint(r,c) always false: serial
    for (T<c> e in collection) { e.operation(arguments); }

Reasons for a Runtime System
• We do not know the relationships between all contexts at compile time
• Relationships may vary from one object or method invocation to another
• Reasons:
    – Separate compilation
    – Dynamic linking
    – Complex data flows

Reasons for a Runtime System
• The type system supports specifying context relationships that the programmer asserts must hold (r # c asserts that r and c are disjoint):
    void oper1[r]() reads<r,c…> writes<…> where r # c {
        …
        foreach (T|c| elem in collection) {…}
        …
    }

Runtime System Implementation
• Naïve implementation – each object keeps a pointer to its owner

[Figure: structure of the proved results: Subject Reduction; Progress; Well Formed Heap; Owner Invariance; AFJO Soundness; Effect Soundness; Contexts form a Tree; Cast Safety; Effect Completeness; Static Context Relations; Disjointness Test Correct; Context Parameters do not survive; Context Disjointness Implies Effect Disjointness; Disjoint effects imply no data dependencies; Update Dependency Preservation; Sufficient for Parallelization; Sequential Consistency; Task Parallelism Sufficient Conditions; Data Parallelism Sufficient Conditions; Pipeline Parallelism Sufficient Conditions]

Implementation – Zal
• Added my system to C# 3.5
• Extended the GPC# compiler (SLOC-P / SLOC-L: physical / logical source lines of code):
    Metric   Total    GPC#     Extensions   Extensions (% of Total)
    SLOC-P   39,444   27,888   12,156       30.8%
    SLOC-L   22,201   14,957    7,244       32.7%
• Added infrastructure to support arbitrary type parameters
• Implemented a runtime ownership tracking system (~1,000 lines)

Implementation – Zal
[Figure: Zal source → Zal Compiler → C# source → Microsoft C# Compiler → CIL Program w/ Ownership Tracking; together with the Runtime Ownership Libraries → Executing Program with Automatic Parallelization]

Implementation – Zal
[Figure: compiler architecture; the legend distinguishes C# compilation steps, Zal compilation steps and I/O]
• Inputs: Source Code Files and Dynamic Linked Libraries; outputs: a C# Source File or a Bytecode File
• Scanner (generated by GPLex): Scanner.scan() reads a stream of characters and processes them into tokens
• Parser (generated by Coco/R): Parser.parse() converts the stream of tokens into an Abstract Syntax Tree (AST)
• Type Checker: TypeCheck() resolves all TypeRefs to TypeDefs & checks type correctness
• Effect Computation: computeEffects() / LocalEffects() compute heap & stack effects for AST nodes
• Parallelization: Parallelize() checks the sufficient conditions for parallelism and implements them
• Ownership Implementation: BuildOwnershipImplementation() implements Zal features in C# by modifying the AST
• Code Generation: Output() / Emit generates a C# or CIL implementation of the AST

Validation
• Applied the system to a number of realistic applications
• Overall, annotation requires modifying 20% of the source
• Ownership tracking overhead:
    – Execution time: 10% to 20%
    – Memory usage: 15% to 30%
• Implementation not fully optimized

Validation – Speedup
[Figures: speedup results, two slides]

Related Work – Prog. Langs.
• Focus on providing tools to express parallelism
• No support for validating the correctness of parallelization
• Assume programmer knowledge of parallel programming constructs
• Examples: Fortress, Chapel, X10

Related Work – Ownership
• Effect systems have been proposed, but their application to parallelism was only suggested
• Data race and deadlock detection for locking – very different reasoning
• Deterministic Parallel Java (late 2009) – modified ownership
    – Focused on kernels
    – Lost composition & abstraction to do so

Contributions
• An abstract and composable system for reasoning about effects, based on Ownership Types
• Effect and reasoning systems applied to a real language and real program examples
• Real parallelism detected and exploited automatically
• Developed and proved sufficient conditions for a number of different forms of parallelism
• A runtime system to support static reasoning

Publications
• A. Craik and W. Kelly. Using Ownership to Reason About Inherent Parallelism in Imperative Object-Oriented Programs. International Conference on Compiler Construction, ed. R. Gupta, LNCS 6011, pp. 145-164, Springer-Verlag Berlin Heidelberg, 2010.
• W. Reid, W. Kelly, and A. Craik. Reasoning about Parallelism in Modern Object-Oriented Languages. Australasian Computer Science Conference, 2008.
• Plus three technical reports on various versions of the reasoning system, available in e-prints

Conclusion
• A system for reasoning about data dependencies and parallelism
• Abstract & composable
• Usable by both programmers & automated tools
• The question of when & how to exploit the parallelism is still open
• A prototype demonstrates that this automated reasoning is possible

Q&A

Ownership & The Stack
• Ownership has traditionally been used for encapsulation
• The stack was not considered by these works
• Stack & stack-referencing models vary from language to language
• I consider a restricted stack model:
    – Stack and heap are disjoint
    – Stack locations can be differentiated by name

Ownership & The Stack
• The stack model fits Java, C#, and VB.NET
• Dereferencing to read the heap causes an ownership effect
• Stack location names are unique and cannot be aliased without dereferencing
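The restricted stack model can be combined with the effect-set overlap test from the Safe Parallelism and Task Parallelism slides. The following is a minimal Python sketch under stated assumptions: heap contexts are modelled as a tree of parent pointers (mirroring the naïve runtime scheme in which each object keeps a pointer to its owner, with world at the root), stack locations are distinct named locations disjoint from every heap context, and effect sets are plain Python sets. `Context`, `StackLoc`, `overlaps` and `may_run_in_parallel` are hypothetical names for illustration, not part of Zal or its runtime ownership libraries.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Hypothetical model for illustration only; not the Zal runtime libraries.

@dataclass(frozen=True, eq=False)  # identity equality: each context is distinct
class Context:
    """A heap ownership context; `parent` plays the role of the owner pointer."""
    name: str
    parent: Optional["Context"] = None

    def is_ancestor_or_self(self, other: "Context") -> bool:
        node = other
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False

@dataclass(frozen=True)  # value equality: the same name means the same location
class StackLoc:
    """A stack location, differentiated from other stack locations by name."""
    name: str

Location = Union[Context, StackLoc]

def overlaps(x: Location, y: Location) -> bool:
    if isinstance(x, StackLoc) or isinstance(y, StackLoc):
        # Stack and heap are disjoint; stack locations overlap only by name.
        return x == y
    # Contexts form a tree, so two contexts are either nested or disjoint.
    return x.is_ancestor_or_self(y) or y.is_ancestor_or_self(x)

def may_run_in_parallel(reads1: set, writes1: set, reads2: set, writes2: set) -> bool:
    """True if no write of either block overlaps a read or write of the other."""
    def clash(writes: set, others: set) -> bool:
        return any(overlaps(w, o) for w in writes for o in others)
    return not (clash(writes1, reads2 | writes2) or clash(writes2, reads1))

world = Context("world")                 # the world context is a parent of all contexts
a, b = Context("a", world), Context("b", world)
a_rep = Context("a.rep", a)              # a representation nested inside a
print(may_run_in_parallel({a}, {a_rep}, {b}, {b}))      # True: disjoint subtrees
print(may_run_in_parallel(set(), {a}, {a_rep}, set()))  # False: a encloses a.rep
```

Read-only sharing is permitted: two blocks that only read the same context do not conflict, since only write/read and write/write overlaps can create data dependencies. The ancestor walk is the naïve disjointness test; an optimized runtime would need something cheaper than following owner pointers on every check.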