Distributed Data-Parallel Computing Using a High-Level Programming Language Yuan Yu Michael Isard Joint work with: Andrew Birrell, Mihai Budiu, Jon Currey, Úlfar Erlingsson, Dennis Fetterly, Pradeep Kumar Gunda Microsoft Research Silicon Valley The Goal of The Talk From the invitation: “to expose SIGMOD members to work going on in other "parallel" fields of Computer Science that potentially have deep implications for the information management research community.” The 2008 Claremont Report • “Designing systems that embrace non-relational data model, rather than shoehorning them into tables; • “… the techniques behind parallel and distributed databases---partitioned dataflow and cost-based query optimization---should extend to new environments. • “… need to pay attention to the softer issues that capture the hearts and minds of programmers (such as attractive syntax, typing and modularity, development tools, …) • “… database research must look beyond its traditional boundaries and find allies throughout computing.” Distributed Data-Parallel Computing • Research problem: How to write distributed data-parallel programs for a compute cluster? • The DryadLINQ programming model – – – – Sequential, single machine programming abstraction Same program runs on single-core, multi-core, or cluster Familiar programming languages Familiar development environment Dryad and DryadLINQ DryadLINQ provides automatic query plan generation Dryad provides automatic distributed execution Outline • • • • Programming model Dryad and DryadLINQ overview Lessons Conclusions LINQ • Microsoft’s Language INtegrated Query – Available in .NET3.5 and Visual Studio 2008 • A set of operators to manipulate datasets in .NET – Support traditional relational operators • Select, Join, GroupBy, Aggregate, etc. – Integrated into .NET programming languages • Programs can invoke operators • Operators can invoke arbitrary .NET functions • Data model – Data elements are strongly typed .NET objects – Much more expressive than relational tables • For example, nested data structures DryadLINQ Data Model Partition .Net objects Partitioned Table Partitioned table exposes metadata information – type, partition, compression scheme, serialization, etc. Demo • Preserve an existing programming model – The same familiar programming languages, development tools, libraries, etc. An Example: PageRank • Ranks web pages using the hyperlink structure, propagating scores along the links • Each iteration can be expressed as a SQL query 1. 2. 3. 4. 5. Join pages with ranks Distribute ranks on outgoing edges GroupBy edge destination Aggregate into ranks Repeat One PageRank Step in DryadLINQ // one step of pagerank: dispersing and re-accumulating rank public static IQueryable<Rank> PRStep(IQueryable<Page> pages, IQueryable<Rank> ranks) { // join pages with ranks, and disperse updates var updates = from page in pages join rank in ranks on page.name equals rank.name select page.Disperse(rank); // re-accumulate. return from list in updates from rank in list group rank.rank by rank.name into g select new Rank(g.Key, g.Sum()); } The Complete PageRank Program public static IQueryable<Rank> PRStep(IQueryable<Page> pages, IQueryable<Rank> ranks) { // join pages with ranks, and disperse updates var updates = from page in pages join rank in ranks on page.name equals rank.name select page.Disperse(rank); public struct Page { public UInt64 name; public Int64 degree; public UInt64[] links; public Page(UInt64 n, Int64 d, UInt64[] l) { name = n; degree = d; links = l; } // re-accumulate. return from list in updates from rank in list group rank.rank by rank.name into g select new Rank(g.Key, g.Sum()); public Rank[] Disperse(Rank rank) { Rank[] ranks = new Rank[links.Length]; double score = rank.rank / this.degree; for (int i = 0; i < ranks.Length; i++) { ranks[i] = new Rank(this.links[i], score); } return ranks; } } var pages = PartitionedTable.Get<Page>(“dfs://pages.txt”); var ranks = pages.Select(page => new Rank(page.name, 1.0)); // repeat the iterative computation several times for (int iter = 0; iter < n; iter++) { ranks = PRStep(pages, ranks); } } public struct Rank { public UInt64 name; public double rank; public Rank(UInt64 n, double r) { name = n; rank = r; } } ranks.ToPartitionedTable<Rank>(“dfs://outputranks.txt”); Multi-Iteration PageRank pages ranks Iteration 1 Iteration 2 Memory FIFO Iteration 3 Dryad System Architecture job manager Job1 data plane Files, TCP, FIFO V V V PD PD PD control plane New jobs Job1: v11, v12, … Job2: v21, v22, … Job3: … scheduler cluster Dryad • Provides a general, flexible execution layer – Dataflow graph as the computation model • Can be modified by runtime optimizations – Higher language layer supplies graph, vertex code, serialization code, hints for data locality, … • Automatically handles distributed execution – Distributes code, routes data – Schedules processes on machines near data – Masks failures in cluster and network – Fair scheduling of concurrent jobs Immutable Input • Assumes that inputs are immutable – High performance: Scales out to shared-nothing clusters made up of thousands of machines – Significantly simplifies the design/implementation • Simple fault-tolerant story • No need to handle complex transaction and synchronization – Good for processing largely static datasets – Not suitable for fine-grain, frequent updates DryadLINQ System Architecture Client machine Dryad DryadLINQ .NET program ToTable Cluster Query Expr Distributed Invoke query plan Query plan Vertex code Input Tables Dryad Execution foreach .Net Objects Output (11) Table Results Output Tables DryadLINQ • Distributed execution plan generation – Static optimizations: pipelining, eager aggregation, etc. – Dynamic optimizations: data-dependent partitioning, dynamic aggregation, etc. • Vertex runtime – – – – – Single machine (multi-core) implementation of LINQ Vertex code that runs on vertices Channel serialization code Callback code for runtime dynamic optimizations Automatically distributed to cluster machines Lessons • Acyclic dataflow graph is a powerful computation model • Language integration is amazingly successful • Leverage decades of database research • Decoupling of Dryad and DryadLINQ worked out well Acyclic Dataflow Graph • Acyclic dataflow graph provides a very powerful computation model – Easy target for higher-level programming abstractions such as DryadLINQ – Easy expression of many data-parallel optimizations • We designed Dryad to be general and flexible – Programmability is less of a concern – Used primarily to support higher-level programming abstractions – We haven’t modified Dryad in order to support DryadLINQ Expectation Maximization (Gaussians) • Generated by DryadLINQ • 3 iterations shown 21 The Language Integration Approach • Single unified programming environment – Unified data model and programming language – Direct access to IDE and libraries • Simpler than SQL programming – As easy for simple queries – Easier to use for even moderately complex queries • No embedded languages • Requires good programming language supports – LINQ extensibility: custom operators/providers – .NET reflection, dynamic code generation, … LINQ Framework • Extremely open and extensible .Net program (C#, VB, F#, etc) Query Objects LINQ provider interface Local machine Execution engines DryadLINQ PLINQ Scalability Cluster Multi-core LINQ-to-SQL LINQ-to-XML Single-core Combining with PLINQ Query DryadLINQ subquery PLINQ The combination of PLINQ and DryadLINQ delivers computation to every core in the cluster 24 Leverage Database Research • Example: MapReduce written in DryadLINQ MapReduce(source, // sequence of Ts mapper, // T -> Ms keySelector, // M -> K reducer) // (K, Ms) -> Rs { var map = source.SelectMany(mapper); var group = map.GroupBy(keySelector); var result = group.SelectMany(reducer); return result; // sequence of Rs } But, Not So Easy • The main sources of difficulty – Much more complicated data model – User-defined functions all over the places • Requires sophisticated program analysis techniques – Possible with modern programming languages and runtimes, such as C#/CLR Decoupling of Dryad and DryadLINQ • Separation of concerns – Dryad layer concerns scheduling and fault-tolerance – DryadLINQ layer concerns the programming model and the parallelization of programs – Result: powerful and expressive execution engine and programming model • Different from the MapReduce/Hadoop approach – A single abstraction for both programming model and execution engine – Result: very simple, but very restricted execution engine and language Software Stack Machine Learning Image Processing Graph Analysis … Data Mining Applications Other Applications DryadLINQ Other Languages Dryad CIFS/NTFS SQL Servers Azure DFS Cosmos DFS Cluster Services (Azure, HPC, or Cosmos) Windows Server Windows Server Windows Server Windows Server 28 Availability • Freely available for academic use – Dryad in binary, DryadLINQ in source – Will release Dryad source in the future • Coming soon to Microsoft commercial partners – Free, but no product support Conclusions • Goal: Use a compute cluster as if it is a single computer – Dryad/DryadLINQ represent a significant step • Requires close collaborations across many fields of computing, including – Distributed systems – Distributed and parallel databases – Programming language design and analysis Dryad/DryadLINQ Papers 1. Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks (EuroSys’07) 2. DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language (OSDI’08) 3. Distributed Data-Parallel Computing Using a High-Level Programming Language (SIGMOD’09) 4. Quincy: Fair scheduling for distributed computing clusters (SOSP’09) 5. Distributed Aggregation for Data-Parallel Computing: Interfaces and Implementations (SOSP’09)