The DryadLINQ Approach to Distributed Data-Parallel Computing
Yuan Yu
Microsoft Research Silicon Valley

Distributed Data-Parallel Computing
• Dryad talk: the execution layer
  – How to reliably and efficiently execute distributed data-parallel programs on a compute cluster?
• This talk: the programming model
  – How to write distributed data-parallel programs for a compute cluster?

The Programming Model
• Sequential, single machine programming abstraction
• Same program runs on single-core, multi-core, or cluster
• Preserve the existing programming environments
  – Modern programming languages (C# and Java) are very good
    • Expressive language and data model
    • Strong static typing, GC, generics, …
  – Modern IDEs (Visual Studio and Eclipse) are very good
    • Great debugging and library support
    • Legacy code could be easily reused

Dryad and DryadLINQ
DryadLINQ provides automatic query plan generation
Dryad provides automatic distributed execution

Outline
• Programming model
• DryadLINQ
• Applications
• Discussions and conclusion

LINQ
• Microsoft’s Language INtegrated Query
  – Available in .NET 3.5 and Visual Studio 2008
• A set of operators to manipulate datasets in .NET
  – Support traditional relational operators
    • Select, Join, GroupBy, Aggregate, etc.
  – Integrated into .NET programming languages
    • Programs can invoke operators
    • Operators can invoke arbitrary .NET functions
• Data model
  – Data elements are strongly typed .NET objects
  – Much more expressive than relational tables
    • For example, nested data structures

LINQ Framework
[Diagram: .NET program (C#, VB, F#, etc.) → queries/objects → LINQ provider interface → execution engines. Scalability axis: single-core on the local machine (LINQ-to-Objects, LINQ-to-SQL), multi-core (PLINQ), cluster (DryadLINQ).]

A Simple LINQ Query
    IEnumerable<BabyInfo> babies = ...;
    var results = from baby in babies
                  where baby.Name == queryName &&
                        baby.State == queryState &&
                        baby.Year >= yearStart &&
                        baby.Year <= yearEnd
                  orderby baby.Year ascending
                  select baby;

A Simple PLINQ Query
    IEnumerable<BabyInfo> babies = ...;
    var results = from baby in babies.AsParallel()
                  where baby.Name == queryName &&
                        baby.State == queryState &&
                        baby.Year >= yearStart &&
                        baby.Year <= yearEnd
                  orderby baby.Year ascending
                  select baby;

A Simple DryadLINQ Query
    PartitionedTable<BabyInfo> babies = PartitionedTable.Get<BabyInfo>("BabyInfo.pt");
    var results = from baby in babies
                  where baby.Name == queryName &&
                        baby.State == queryState &&
                        baby.Year >= yearStart &&
                        baby.Year <= yearEnd
                  orderby baby.Year ascending
                  select baby;

DryadLINQ Data Model
[Diagram: a partitioned table made up of partitions, each holding .NET objects]
Partitioned table exposes metadata information
  – type, partition, compression scheme, serialization, etc.

Demo
• It is just programming
  – The same familiar programming languages, development tools, libraries, etc.

K-means Execution Graph
[Diagram: K-means execution graph over input partitions P1, P2, P3; successive center sets C0, C1, C2, C3 are connected through vertices labeled "ac" and "cc".]

K-means in DryadLINQ
    public class Vector {
        public double[] entries;

        [Associative]
        public static Vector operator +(Vector v1, Vector v2) { … }
        public static Vector operator -(Vector v1, Vector v2) { … }
        public double Norm2() { … }
    }

    // Pick the center closest to v.
    public static Vector NearestCenter(Vector v, IEnumerable<Vector> centers) {
        return centers.Aggregate((r, c) => (r - v).Norm2() < (c - v).Norm2() ? r : c);
    }

    // One iteration: group vectors by nearest center, then average each group.
    public static IQueryable<Vector> Step(IQueryable<Vector> vectors, IQueryable<Vector> centers) {
        return vectors.GroupBy(v => NearestCenter(v, centers))
                      .Select(group => group.Aggregate((x, y) => x + y) / group.Count());
    }

    var vectors = PartitionedTable.Get<Vector>("dfs://vectors.pt");
    var centers = vectors.Take(100);
    for (int i = 0; i < 10; i++) {
        centers = Step(vectors, centers);
    }
    centers.ToPartitionedTable<Vector>("dfs://centers.pt");

PageRank Execution Graph
[Diagram: PageRank execution graph. Rank partitions N01, N02, N03 lead to N11, N12, N13, then N21, N22, N23, then N31, N32, N33, using edge partitions E1, E2, E3 and vertices labeled "ae", "D", and "cc".]

PageRank in DryadLINQ
    public struct Page {
        public UInt64 name;
        public Int64 degree;
        public UInt64[] links;

        public Page(UInt64 n, Int64 d, UInt64[] l) {
            name = n; degree = d; links = l;
        }

        public Rank[] Disperse(Rank rank) {
            Rank[] ranks = new Rank[links.Length];
            double score = rank.rank / this.degree;
            for (int i = 0; i < ranks.Length; i++) {
                ranks[i] = new Rank(this.links[i], score);
            }
            return ranks;
        }
    }

    public struct Rank {
        public UInt64 name;
        public double rank;

        public Rank(UInt64 n, double r) {
            name = n; rank = r;
        }
    }

    public static IQueryable<Rank> Step(IQueryable<Page> pages, IQueryable<Rank> ranks) {
        // join pages with ranks, and disperse updates
        var updates = from page in pages
                      join rank in ranks on page.name equals rank.name
                      select page.Disperse(rank);

        // re-accumulate.
        return from list in updates
               from rank in list
               group rank.rank by rank.name into g
               select new Rank(g.Key, g.Sum());
    }

    var pages = PartitionedTable.Get<Page>("dfs://pages.pt");
    var ranks = pages.Select(page => new Rank(page.name, 1.0));

    // repeat the iterative computation several times
    for (int iter = 0; iter < n; iter++) {
        ranks = Step(pages, ranks);
    }

    ranks.ToPartitionedTable<Rank>("dfs://ranks.pt");

MapReduce in DryadLINQ
    MapReduce(source,       // sequence of Ts
              mapper,       // T -> Ms
              keySelector,  // M -> K
              reducer)      // (K, Ms) -> Rs
    {
        var map = source.SelectMany(mapper);
        var group = map.GroupBy(keySelector);
        var result = group.SelectMany(reducer);
        return result;      // sequence of Rs
    }

    // Ex: Count the frequencies of words in a book
    MapReduce(book, line => line.Split(' '), w => w.ToLower(), g => g.Count())

DryadLINQ System Architecture
[Diagram: on the client machine, a .NET program issues a LINQ query; DryadLINQ turns the query expression into a distributed query plan and vertex code and invokes Dryad. On the cluster, Dryad executes the plan over the input tables and writes the output tables. The results return to the .NET program as .NET objects, which it reads with foreach.]

DryadLINQ
• Distributed execution plan generation
  – Static optimizations: pipelining, eager aggregation, etc.
  – Dynamic optimizations: data-dependent partitioning, dynamic aggregation, etc.
• Vertex runtime
  – Single machine (multi-core) implementation of LINQ
  – Vertex code that runs on vertices
  – Data serialization code
  – Callback code for runtime dynamic optimizations
  – Automatically distributed to cluster machines

A Simple Example
• Count the frequencies of words in a book
    var map = book.SelectMany(line => line.Split(' '));
    var group = map.GroupBy(w => w.ToLower());
    var result = group.Select(g => g.Count());

Naïve Execution Plan
[Diagram: map stage: M (Map, line.Split(' ')) → D (Distribute). Reduce stage: MG (Merge) → G (GroupBy) → R (Reduce, g.Count()) → X (Consumer).]

Execution Plan Using Partial Aggregation
[Diagram: map stage: M (Map) → G1 (GroupBy) → IR (InitialReduce, g.Count()) → D (Distribute). Aggregation tree: MG (Merge) → G2 (GroupBy) → C (Combine, g.Sum()). Reduce stage: MG (Merge) → G3 (GroupBy) → F (FinalReduce, g.Sum()) → X (Consumer).]

Challenge: Program Analysis Support
• The main sources of difficulty
  – Complicated data model
  – User-defined functions all over the place
• Requires sophisticated static program analysis at byte-code level
  – Mainly some form of flow analysis
• Possible with modern programming languages and runtimes, such as C#/CLR

Inferring Dataset Property
• Useful for query optimizations
    Hash(y => y.a+y.b)
    Select(x => new { A=x.a+x.b, B=f(x.c) })
    Hash(z => z.A)
    GroupBy(x => x.A)
No data repartitioning is needed at GroupBy: the output of Select is already hash-partitioned on A

Inferring Dataset Property
• Useful for query optimizations
    Hash(y => y.a+y.b)
    Select(x => Foo(x))
    Hash(x => x.A)?
    GroupBy(x => x.A)
Need IL-level data-flow analysis of Foo

Caching Query Results
• A cluster-wide caching service to support
  – Reuse of common subqueries
  – Incremental computations
• Cache: [Key Value]
  – Key is <q, d>, Value is the result of q(d)
  – Key requires a pretty good reachability analysis
    • For correctness, Key must include “everything” reachable from q
    • For performance, Key should only contain things q depends on

More Static Analysis
• Purity checking
  – All the functions called in DryadLINQ queries must be side-effect free
  – DryadLINQ (and PLINQ) doesn’t enforce it
• Metadata validity checking
  – Partitioned table’s metadata contains the record type, partition scheme, serialization functions, …
  – Need to determine if the metadata is valid
  – DryadLINQ doesn’t fully enforce it
• Static enforcement of program properties for security and privacy mechanisms

Examples of DryadLINQ Applications
• Data mining
  – Analysis of service logs for network security
  – Analysis of Windows Watson/SQM data
  – Cluster monitoring and performance analysis
• Graph analysis
  – Accelerated PageRank computation
  – Road network shortest-path preprocessing
• Image processing
  – Image indexing
  – Decision tree training
  – Epitome computation
• Simulation
  – Light flow simulations for next-generation display research
  – Monte-Carlo simulations for mobile data
• eScience
  – Machine learning platform for health solutions
  – Astrophysics simulation

Decision Tree Training
Mihai Budiu, Jamie Shotton et al
Learn a decision tree to classify pixels in a large set of images
[Diagram: image and label inputs → machine learning → decision tree]
1M images x 10,000 pixels x 2,000 features x 2^21 tree nodes
Complexity >10^20 objects

Sample Execution Plan
[Diagram: initial empty tree and partitioned image inputs → read, preprocess → redistribute → histogram → regroup histograms on node → compute new tree layer → compute new tree → broadcast new tree → final tree]

Application Details
• Workflow = 37 DryadLINQ jobs
• 12 hours running time on 235 machines
• More than 100,000 processes
• More than 100 days of CPU time
• Recovers from several failures daily
• 34,000 lines of .NET code

Windows SQM Data Analysis
Michal Strehovsky, Sivarudrappa Mahesh et al
[Diagram: SQM service data flow among SQM clients, the front-end cluster, compressed storage, the distributed execution engine, DataMarts, the reporting portal, and custom schema/storage]
• Datapoints are collected on the client and uploaded
• IIS servers check validity and concatenate
• File movers move incoming data into inexpensive compressed storage
• The distributed execution engine queries data based on user-initiated ad hoc queries or well-defined queries for reporting
• DataMarts store data for reporting purposes
• Rich reports are created with customization capability
• Data validation and analysis toolsets query raw data directly
• Data is extracted for storage in a custom schema with a different retention policy

The Language Integration Approach
• Single unified programming environment
  – Unified data model and programming language
  – Direct access to IDE and libraries
  – Different from SQL, Hive, Pig Latin
    • Multiple layers of languages and data models
• Works out very well, but requires good programming language support
  – LINQ extensibility: custom operators/providers
  – .NET reflection, dynamic code generation, …

Combining with PLINQ
[Diagram: Query → DryadLINQ → subquery → PLINQ]
The combination of PLINQ and DryadLINQ delivers computation to every core in the cluster

Acyclic Dataflow Graph
• Acyclic dataflow graph provides a very powerful computation model
  – Easy target for higher-level programming abstractions such as DryadLINQ
  – Easy expression of many data-parallel optimizations
• We designed Dryad to be general and flexible
  – Programmability is less of a concern
  – Used primarily to support higher-level programming abstractions
  – No major changes made to Dryad in order to support DryadLINQ

Expectation Maximization (Gaussians)
[Diagram: execution graph]
• Generated by DryadLINQ
• 3 iterations shown

Decoupling of Dryad and DryadLINQ
• Separation of concerns
  – Dryad layer concerns scheduling and fault-tolerance
  – DryadLINQ layer concerns the programming model and the parallelization of programs
  – Result: powerful and expressive execution engine and programming model
• Different from the MapReduce/Hadoop approach
  – A single abstraction for both programming model and execution engine
  – Result: very simple, but very restricted execution engine and language

Software Stack
[Diagram, top to bottom:
  Applications: Machine Learning, Image Processing, Graph Analysis, Data Mining, eScience, …
  DryadLINQ
  Dryad
  CIFS/NTFS, SQL Servers, Azure DFS, Cosmos DFS
  Cluster Services (Azure, HPC, or Cosmos)
  Windows Server, Windows Server, Windows Server, Windows Server]

Conclusion
• Single unified programming environment
  – Unified data model and programming language
  – Direct access to IDE and libraries
• An open and extensible system
  – Many LINQ providers out there
    • Existing ones: LINQ-to-XML, LINQ-to-SQL, PLINQ, …
    • Very easy to write one for your app domain
  – Dryad/DryadLINQ scales out all of them!

Availability
• Freely available for academic use
  – http://connect.microsoft.com/DryadLINQ
  – DryadLINQ source, Dryad binaries, documentation, samples, blog, discussion group, etc.
• Will be available soon for commercial use
  – Free, but no product support
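
Backup: Word Count on a Single Machine
The word-count query from the "A Simple Example" slide can be tried locally with plain LINQ-to-Objects, with no cluster involved. The sketch below is an illustration only, not DryadLINQ code: the class name WordCountSketch, the typed MapReduce signature, and the in-memory "partitions" are assumptions made for this example. It compares the direct plan (GroupBy, then Count) with a hand-written partial aggregation (Count inside each partition, then Sum of the partial counts), which is the idea behind the InitialReduce, Combine, and FinalReduce stages.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCountSketch
{
    // Generic MapReduce built from SelectMany, GroupBy, SelectMany.
    // The type parameters and IEnumerable-based signature are assumptions
    // of this sketch; the slide leaves the signature untyped.
    public static IEnumerable<R> MapReduce<T, M, K, R>(
        IEnumerable<T> source,
        Func<T, IEnumerable<M>> mapper,
        Func<M, K> keySelector,
        Func<IGrouping<K, M>, IEnumerable<R>> reducer)
    {
        var map = source.SelectMany(mapper);
        var group = map.GroupBy(keySelector);
        return group.SelectMany(reducer);
    }

    // Direct plan: group every word, then Count() each group.
    public static Dictionary<string, int> CountWords(IEnumerable<string> book)
    {
        return MapReduce(
                book,
                line => line.Split(' '),
                w => w.ToLower(),
                g => new[] { new KeyValuePair<string, int>(g.Key, g.Count()) })
            .ToDictionary(kv => kv.Key, kv => kv.Value);
    }

    // Partial aggregation: Count() inside each partition first,
    // then Sum() the partial counts for each word.
    public static Dictionary<string, int> CountWordsPartial(
        IEnumerable<IEnumerable<string>> partitions)
    {
        var partialCounts = partitions.SelectMany(lines => lines
            .SelectMany(line => line.Split(' '))
            .GroupBy(w => w.ToLower())
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count())));

        return partialCounts
            .GroupBy(kv => kv.Key)
            .ToDictionary(g => g.Key, g => g.Sum(kv => kv.Value));
    }

    public static void Main()
    {
        // Two hypothetical "partitions" of a tiny book.
        var part1 = new[] { "the quick brown fox", "The lazy dog" };
        var part2 = new[] { "the end" };

        var direct = CountWords(part1.Concat(part2));
        var partial = CountWordsPartial(new[] { part1, part2 });

        foreach (var kv in direct)
            Console.WriteLine($"{kv.Key}: direct={kv.Value}, partial={partial[kv.Key]}");
    }
}
```

Both methods return the same counts for this input (for example, 3 for "the"). Counting is a special case here: the partial results are combined with Sum rather than Count, so the decomposition has to be chosen per aggregate function.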