Hadoop in Flight: Migrating Live MapReduce Jobs
Chili Johnson | David Chiu (Advisor)
Department of Mathematics & Computer Science, University of Puget Sound, Tacoma WA, 98416

motivation
• Ability to pause a job, free up resources, and restart at any time
• Ability to migrate a job to another cluster, assuming the relevant data exists on both clusters
• Exploit volatile energy prices between multiple clusters to save money by shifting load
• Allow higher-priority Hadoop jobs to take precedence over running jobs

design
Pausing:
• Pausing creates a state (.hdst) file in HDFS
• Task-level state is calculated independently for each task
• Each task stores its state to HDFS
• Each task reports local state metadata to the application master
• The application master saves job-level state metadata to HDFS

Restarting:
• Restarting a job reads this state file
• FileInputFormat generates new splits based on block locations in the secondary cluster
• The application master parses the global state metadata and creates new tasks with task-level state metadata
• Restarting tasks retrieve task-level state files and rebuild their data structures

Task-level state metadata (from the design diagrams):
• Map task, in progress: input split byte offset, HDFS spill paths (spill_0 … spill_3), HDFS index paths (index_0 … index_3), task ID
• Map task, completed: completed input split, HDFS output path, HDFS index path, task ID
• Reduce task: HDFS segment paths, segment byte offsets and lengths, segment type (on-disk or in-memory segment), task ID, covering the shuffle, sort, and merged-output stages

A sketch of how this per-task metadata might be persisted and restored is shown below.
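The design above has each paused task write its state to an .hdst file in HDFS and rebuild its data structures from that file on restart. The following sketch shows one way such per-task metadata could be written and read back using the standard Hadoop FileSystem API; the MapTaskState class, its field layout, and the key=value file format are illustrative assumptions, not the poster's actual implementation.

```java
// Hypothetical sketch: persisting and restoring per-task pause state in HDFS.
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MapTaskState {
    String taskId;                                // e.g. the map task's attempt ID
    long inputSplitByteOffset;                    // how far into its split the task read before pausing
    List<String> spillPaths = new ArrayList<>();  // HDFS paths of spill_0 ... spill_n
    List<String> indexPaths = new ArrayList<>();  // HDFS paths of index_0 ... index_n

    /** Write this task's state metadata to an .hdst file in HDFS. */
    void save(Configuration conf, Path stateFile) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(fs.create(stateFile, true)))) {
            out.write("taskId=" + taskId + "\n");
            out.write("inputSplitByteOffset=" + inputSplitByteOffset + "\n");
            out.write("spillPaths=" + String.join(",", spillPaths) + "\n");
            out.write("indexPaths=" + String.join(",", indexPaths) + "\n");
        }
    }

    /** Rebuild a task's state from its .hdst file when the job is restarted. */
    static MapTaskState load(Configuration conf, Path stateFile) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        MapTaskState state = new MapTaskState();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(stateFile)))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] kv = line.split("=", 2);
                switch (kv[0]) {
                    case "taskId":
                        state.taskId = kv[1]; break;
                    case "inputSplitByteOffset":
                        state.inputSplitByteOffset = Long.parseLong(kv[1]); break;
                    case "spillPaths":
                        if (!kv[1].isEmpty())
                            for (String p : kv[1].split(",")) state.spillPaths.add(p);
                        break;
                    case "indexPaths":
                        if (!kv[1].isEmpty())
                            for (String p : kv[1].split(",")) state.indexPaths.add(p);
                        break;
                }
            }
        }
        return state;
    }
}
```

On restart, the application master could hand each newly created task the path of its .hdst file so the task re-registers its existing spill and index files instead of recomputing that work.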
preliminary results
[Figure: power consumption (W) and network activity (bytes/s, in and out) at Site 1 and Site 2 versus seconds from job start, for a Wordcount job on 11 GB migrated at 25% completion and a Grep job on 11 GB migrated at 50% completion; each trace shows run, pause/migrate, and run phases.]

acknowledgments
This research was funded in part by the McCormick Research Grant, University of Puget Sound.

references
A. Berl, E. Gelenbe, M. Di Girolamo, G. Giuliani, H. de Meer, M. Q. Dang, and K. Pentikousis, "Energy-efficient cloud computing," The Computer Journal, vol. 53, no. 7, pp. 1045–1051, 2009.
A. Beloglazov, J. Abawajy, and R. Buyya, "Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing," Future Generation Computer Systems, May 2011.
S. Akoush, R. Sohan, A. Rice, A. W. Moore, and A. Hopper, "Free lunch: exploiting renewable energy for computing," in Proceedings of the 13th USENIX Workshop on Hot Topics in Operating Systems (HotOS 2011). USENIX, 2011.
Y. Li, D. Chiu, C. Liu, L. T. X. Phan, T. Gill, S. Aggarwal, Z. Zhang, B. T. Loo, D. Maier, and B. McManus, "Towards dynamic pricing-based collaborative optimizations for green data centers," in Workshops Proceedings of the 29th IEEE International Conference on Data Engineering (ICDE 2013), Brisbane, Australia, April 2013, pp. 272–278.