SkewReduce: Skew-Resistant Parallel Processing of Feature-Extracting Scientific User-Defined Functions
YongChul Kwon, Magdalena Balazinska, Bill Howe, Jerome Rolia*
University of Washington, *HP Labs
Published in SoCC 2010

Motivation
• Science is becoming a data management problem
• MapReduce is an attractive solution
 – Simple API, declarative layer, seamless scalability, …
• But it is hard to
 – Express complex algorithms, and
 – Get good performance (14 hours vs. 70 minutes)
• SkewReduce
 – Goal: scalable analysis with minimal effort
 – Toward scalable feature-extraction analysis

Application 1: Extracting Celestial Objects
• Input: { (x, y, ir, r, g, b, uv, …) }
 – Coordinates
 – Light intensities
 – …
• Output: list of celestial objects
 – Star
 – Galaxy
 – Planet
 – Asteroid
 – …
[Image: M34 from the Sloan Digital Sky Survey]

Application 2: Friends of Friends
• Simple clustering algorithm used in astronomy
• Input
 – Points in multi-dimensional space, plus a distance threshold ε
• Output
 – List of clusters (e.g., galaxies, …)
 – Original data annotated with cluster IDs
• Friends
 – Two points within distance ε of each other
• Friends of Friends
 – Transitive closure of the Friends relation

Parallel Friends of Friends
• Partition
• Local clustering
• Merge
 – P1-P2
 – P3-P4
• Merge
 – P1-P2-P3-P4
• Finalize
 – Annotate original data
[Diagram: clusters C1–C6 found locally in partitions P1–P4 are relabeled as partitions merge: C5 → C3, C4 → C3, C6 → C5, C6 → C3]

Parallel Feature Extraction
• Partition multi-dimensional input data
• Extract features from each partition (Map)
• Merge (or reconcile) features ("hierarchical Reduce")
• Finalize output
[Diagram: INPUT DATA flows through partitioning, feature extraction, and merging to produce Features]

Problem: Skew
• Local clustering (Map) tasks finish in about 5 minutes; Merge (Reduce) tasks in about 35 minutes
• But the top red line (a single straggler task) runs for 1.5 hours
[Figure: per-task timeline, Task ID vs. Time; 8-node cluster, 32 map / 32 reduce slots]

Unbalanced Computation: Skew
• Computation skew
 – Caused by characteristics of the algorithm
 – Same amount of input data != same runtime
• With 0 friends per particle, local clustering runs in O(N log N); with O(N) friends per particle, it approaches O(N²)
• Can we scale out an off-the-shelf implementation with no (or minimal) modifications?

Solution 1? Micro-Partitioning
• Assign a tiny amount of work to each task to reduce skew

How about having micro-partitions?
• It works! But framework overhead grows with the number of partitions
• To find the sweet spot, we need to try different granularities!
[Figure: completion time (hours) vs. number of partitions, 256 to 8192; 8-node cluster, 32 map / 32 reduce slots]
• Can we find a good partitioning plan without trial and error?

Outline
• Motivation
• SkewReduce
 – API (in the paper)
 – Partition optimization
• Evaluation
• Summary

Partition Optimization
• Inputs: a serial feature-extraction algorithm and a merge algorithm
• Varying granularities of partitions
• Can we automatically find a good partition plan and schedule? (A concrete serial feature-extraction algorithm is sketched below.)
[Diagram: a sample space recursively divided into numbered partitions]
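To make "serial feature-extraction algorithm" concrete, here is a minimal sketch of Friends of Friends in Python. It is an illustration, not the authors' implementation: the point representation, the parameter names (`points`, `eps`), and the naive all-pairs neighbor scan are all assumptions made for brevity.

```python
import math

def friends_of_friends(points, eps):
    """Cluster points by the transitive closure of the Friends relation
    (two points are Friends if they are within distance eps).

    points: list of (point_id, coords) pairs; returns {point_id: cluster_id}.
    A sketch only: real implementations use a spatial index instead of
    the O(N^2) all-pairs scan below.
    """
    cluster = {pid: None for pid, _ in points}
    next_id = 0
    for pid, coords in points:
        if cluster[pid] is not None:
            continue                      # already absorbed into a cluster
        cluster[pid] = next_id
        frontier = [coords]
        while frontier:                   # grow the cluster over Friends links
            c = frontier.pop()
            for qid, qcoords in points:
                if cluster[qid] is None and math.dist(c, qcoords) <= eps:
                    cluster[qid] = next_id
                    frontier.append(qcoords)
        next_id += 1
    return cluster
```

The sketch also shows where computation skew comes from: in a sparse region the frontier stays small, while in a dense region with O(N) friends per particle nearly every expansion touches every point, giving the ~O(N²) behavior from the skew slide even though the input size is the same.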
Approach
[Diagram: a data sample, cost functions, and the cluster configuration feed the SkewReduce Optimizer, which produces a runtime plan]
• Goal: minimize the expected total runtime
• A SkewReduce runtime plan consists of
 – Bounding boxes for the data partitions
 – A schedule

Partition Plan Guided by Cost Functions
• Cost functions answer: "Given a sample, how long will it take to process?"
• Two cost functions:
 – Feature cost: (bounding box, sample, sample rate) → cost
 – Merge cost: (bounding boxes, sample, sample rate) → cost
• The basis of two optimization decisions:
 – How (axis, point) to split a partition
 – When to stop partitioning

Searching for a Partition Plan
• Greedy top-down search
 – Split a partition if the total expected runtime improves
• Evaluate the costs of the sub-partitions and the merge, then estimate the new runtime
• Example: a partition with expected cost 100 can be split into two sub-partitions costing 50 each plus a merge costing 10. If the sub-partitions run in parallel, the expected runtime is max(50, 50) + 10 = 60 (Schedule 1); if they must run serially, it is 50 + 50 + 10 = 110 (Schedule 2). The split pays off only under Schedule 1, so the optimizer must consider the schedule, not just the costs. (A toy sketch of this search appears after the conclusion.)

Summary of Contributions
• Given a feature-extraction application, possibly with computation skew
• SkewReduce
 – Automatically partitions the input data
 – Improves runtime in spite of computation skew
• Key technique: user-defined cost functions

Evaluation
• 8-node cluster
 – Dual quad-core CPUs, 16 GB RAM
 – Hadoop 0.20.1 with a custom patch to the MapReduce API
• Distributed Friends of Friends
 – Astro: gravitational simulation snapshot, 900 M particles
 – Seaflow: flow cytometry survey, 59 M observations

Does SkewReduce work?
Completion times for MapReduce with fixed partition granularities vs. a manually tuned plan vs. SkewReduce:

 Partitioning   Astro (18 GB, 3D), hours   Seaflow (1.9 GB, 3D), minutes
 128 MB         14.1                       87.2
 16 MB           8.8                       63.1
 4 MB            4.1                       77.7
 2 MB            5.7                       98.7
 Manual          2.0                       -
 SkewReduce      1.6                       14.1

• The manually tuned plan required about 1 hour of preparation
• The SkewReduce plan yields 2–8× faster running times

Impact of Cost Function
• Three cost functions of increasing fidelity compared on Astro: Data Size, 1D Histogram, 3D Histogram
• Higher fidelity = better performance
[Figure: completion time (hours) on Astro for each cost function]

Highlights of Evaluation
• Sample size
 – Representativeness of the sample is important
• Runtime of the SkewReduce optimization
 – Less than 15% of the actual runtime of the SkewReduce plan
• Data volume in the Merge phase
 – Total volume during Merge ≈ 1% of the input data
• Details in the paper

Conclusion
• Scientific analysis should be easy to write, scalable, and have predictable performance
• SkewReduce
 – An API for feature-extracting functions
 – Scalable execution with good performance in spite of skew
 – Cost-based partition optimization using a data sample
• Published in SoCC 2010
 – A more general version is coming out soon!
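To ground the optimizer slides, here is a toy, self-contained sketch of the greedy top-down search in Python. It is not the paper's actual API: the names (`greedy_plan`, `feature_cost`, `merge_cost`), the median-only split candidates, and the sample-only cost signatures (the paper's cost functions also take bounding boxes and the sample rate) are all simplifications for illustration.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, ...]
CostFn = Callable[[List[Point]], float]  # sample of a partition -> expected cost

def greedy_plan(sample: List[Point],
                feature_cost: CostFn,
                merge_cost: CostFn) -> List[List[Point]]:
    """Greedy top-down search: split a partition only if the expected
    runtime improves, assuming sub-partitions run in parallel
    (cost of a split = max of the children + the merge)."""
    base = feature_cost(sample)
    best = None
    for axis in range(len(sample[0])):
        # Toy simplification: only try the median along each axis.
        pivot = sorted(p[axis] for p in sample)[len(sample) // 2]
        left = [p for p in sample if p[axis] < pivot]
        right = [p for p in sample if p[axis] >= pivot]
        if not left or not right:
            continue
        cost = max(feature_cost(left), feature_cost(right)) + merge_cost(sample)
        if cost < base and (best is None or cost < best[0]):
            best = (cost, left, right)
    if best is None:
        return [sample]  # stop: no split improves the expected runtime
    _, left, right = best
    return (greedy_plan(left, feature_cost, merge_cost)
            + greedy_plan(right, feature_cost, merge_cost))

# Toy cost model: quadratic feature extraction (the dense Friends of
# Friends case) with a fixed per-task overhead, and a linear merge.
partitions = greedy_plan(
    [(x / 64.0, (x * 37 % 64) / 64.0) for x in range(64)],
    feature_cost=lambda s: 100.0 + len(s) ** 2,
    merge_cost=lambda s: 10.0 + len(s))
```

With these toy costs the recursion stops once partitions shrink to a few points each: the fixed per-task overhead in `feature_cost` eventually outweighs the savings from splitting, which is the same framework-overhead trade-off that made blind micro-partitioning require trial and error.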