Optimizing MapReduce Provisioning in the Cloud

Michael Cardosa, Aameek Singh†, Himabindu Pucha†, Abhishek Chandra
http://www.cs.umn.edu/~cardosa
Department of Computer Science, University of Minnesota
†IBM Almaden Research Center

MapReduce Provisioning Problem
- Platform: a virtualized cloud environment, which enables virtualized MapReduce clusters
- Several MapReduce jobs from different users
- Goal: optimize system-wide metrics such as throughput, energy, load distribution, and user costs
- Problem: at the cloud service provider level, how can we harvest opportunities to increase performance, save energy, or reduce user costs?

MapReduce Platform: Hadoop
- Open-source implementation of the MapReduce distributed computing framework
- Widely used: Yahoo, Facebook, NYT, (Google)
[Figure: input data]

Hadoop Clusters
- Distributed data: replicated chunks
- Distributed computation: map/reduce tasks
- Traditional: dedicated physical nodes

Virtual Hadoop Clusters
[Figure: Hadoop processes running on a VM pool, hosted on a server pool]
- Run Hadoop on top of VMs
- E.g., Amazon Elastic MapReduce = Hadoop + Amazon EC2

Roadmap
- Intro & Problem
- Platform Overview
- Spatio-Temporal Insights for Provisioning
- Building Blocks for MapReduce Provisioning
- Case Study: Performance optimization
- Case Study: Energy optimization

Spatio-Temporal Insights for Provisioning
- Initial focus: energy savings
- Goal: minimize energy usage
  - Energy + cooling ~42% of total cost [Hamilton08]
- Problem: how should the VMs be placed on the available physical servers to minimize energy usage?
  - Minimize Cumulative Machine Uptime (CMU)

VM Placement: Spatial Fit
[Figure: Jobs 1 to 4]
- Co-place complementary workloads

Which placement is better?
[Figure: placements A and B, showing VM runtimes (100, 20, and 10 min) and servers marked SHUTDOWN]

Time Balancing
[Figure: time balancing; VM runtimes of 20, 25, and 90 min before and 20, 25, and 30 min after]

Building Blocks for Provisioning MapReduce Jobs
[Figure: objective-driven resource provisioning (initial provisioning, continuous optimization) built from job profiling, cluster scaling, and migration, on top of the cloud execution environment]

Building Blocks for Provisioning
- Job profiling: MapReduce job runtime estimation
  - Based on the number of VMs allocated to the job
  - Based on the input data size
  - Offline and online profiling
- Cluster scaling: changing the number of VMs allocated to a particular MapReduce job
  - Affects the job's runtime; relies on the job-profiling model
- Migration: useful for continuous optimization
  - Load balancing, VM consolidation

Job Profiling: Runtime Estimation Based on Number of VMs

Job Profiling: Runtime Estimation Based on Input Data Size

Job Profiling: Runtime Estimation
- Online profiling: additional refinement

Cluster Scaling
- Increasing allocated resources (typical): add VMs to join the virtualized Hadoop cluster
  - Job performance increases; runtime decreases
- E.g., for time balancing: energy reasons
- E.g., for load balancing and deadlines: performance

Cluster Scaling: Time Balancing
[Figure: time balancing; VM runtimes of 20, 25, and 90 min before and 20, 25, and 30 min after]

Case Study: Performance & Deadlines
- Goal: meet deadlines for MapReduce jobs
  - Determine the initial allocation accurately
  - Dynamically adjust the allocation to meet the deadline if necessary
- Monitoring: use offline profiling to estimate the number of VMs needed based on past performance
- Actuation:
  - Online profiling: trigger points to invoke cluster scaling

Case Study: Energy Savings
- Goal: minimize the energy consumed in executing a large batch of MapReduce jobs
  - Energy + cooling ~42% of total cost [Hamilton08]
  - Pass energy savings on to users
- Problem: how should the VMs be placed on the available physical servers to minimize energy usage?
  - Minimize Cumulative Machine Uptime (CMU)

Case Study: Energy Savings
- Initial provisioning: use job profiling to place VMs with similar runtimes together
- Use job profiling to adjust the number of VMs in each cluster, changing runtimes if needed
- Monitoring: online profiling to determine when energy could be saved through migration or cluster scaling
- Actuation: use cluster scaling or migration to dynamically correct for inaccuracies and unknowns in the initial provisioning

Conclusion
- Framework: building blocks (STEAMEngine) for optimizing MapReduce provisioning from a cloud service provider's perspective
- Preliminary evaluations to validate the usefulness of each building block
- Approaches for applying the building blocks to meet specific goals, e.g., performance and energy

Thank you! Questions?
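The job-profiling, cluster-scaling, and CMU ideas in these slides can be made concrete with a small Python sketch. It is illustrative only and is not STEAMEngine's implementation: the runtime model runtime(n) = a + b/n, the assumption that a server stays up until its longest-running VM finishes, and every function name and number below are hypothetical.

```python
# Hedged sketch: hypothetical helpers illustrating job profiling, cluster
# scaling (deadlines / time balancing), and the CMU objective.
# Assumptions (not from the slides): runtime(n) = a + b / n; a server stays
# up until its longest-running VM finishes; all times are in minutes.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RuntimeModel:
    """Hypothetical job-profiling model: runtime(n) = a + b / n."""
    a: float  # non-parallelizable time (minutes)
    b: float  # parallelizable work (VM-minutes)

    def predict(self, n_vms: int) -> float:
        return self.a + self.b / n_vms


def fit_runtime_model(samples: list[tuple[int, float]]) -> RuntimeModel:
    """Ordinary least squares on x = 1/n from (n_vms, runtime) profiling runs."""
    if len({n for n, _ in samples}) < 2:
        raise ValueError("need profiling runs with at least two distinct VM counts")
    xs = [1.0 / n for n, _ in samples]
    ys = [t for _, t in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return RuntimeModel(a=mean_y - b * mean_x, b=b)


def min_vms_for_target(model: RuntimeModel, target: float, max_vms: int) -> int | None:
    """Cluster scaling: smallest VM count whose predicted runtime is <= target.

    The target can be a deadline or, for time balancing, the runtime of the
    jobs sharing a server. Returns None if max_vms VMs are not enough.
    """
    for n in range(1, max_vms + 1):
        if model.predict(n) <= target:
            return n
    return None


def cumulative_machine_uptime(placement: list[list[float]]) -> float:
    """CMU: sum of server uptimes; each server's uptime is its longest VM runtime."""
    return sum(max(vms) for vms in placement if vms)


if __name__ == "__main__":
    # Hypothetical profiling runs: (number of VMs, runtime in minutes).
    model = fit_runtime_model([(2, 70.0), (4, 40.0), (6, 30.0), (8, 25.0)])
    print(f"a={model.a:.1f}, b={model.b:.1f}")
    print("VMs for a 35-minute deadline:", min_vms_for_target(model, 35.0, 16))
    # Two servers with two VM slots each; hypothetical VM runtimes.
    print("mixed CMU:", cumulative_machine_uptime([[100, 20], [100, 20]]))
    print("grouped CMU:", cumulative_machine_uptime([[100, 100], [20, 20]]))
```

Under these assumptions, time balancing is the same query as meeting a deadline: passing a co-located job's runtime as the target returns the VM count that lets this job finish no later than its neighbours. With the hypothetical runtimes in the example, grouping the two 100-minute VMs on one server gives a CMU of 120 minutes versus 200 for mixing them, which is the intuition behind placing similar-runtime VMs together.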