Energy Prediction for I/O Intensive Workflow Applications
MASc Exam – Hao Yang
NetSysLab, Department of Electrical and Computer Engineering, The University of British Columbia

Background – Workflow Applications
Computation characteristics:
• File-based communication
• Large number of tasks
• Large amount of I/O
• Common data access patterns
[Figure: file-dependency graph of the Montage workflow]

Background – Application Execution
• File-based communication
• Large I/O volume
• I/O bottleneck at the central storage system
[Diagram: workflow runtime engine dispatching application tasks, each with local storage, onto a central storage system (e.g., GPFS, NFS)]

Background – Intermediate Storage System
[Diagram: application tasks on compute nodes aggregate their local storage into an intermediate storage system; data is staged in from, and results staged out to, the central storage system (e.g., GPFS, NFS)]

Background – Context of this Thesis
This work focuses on workflow application execution on intermediate storage systems.

Research Problem – Energy Consumption
• The pursuit of performance used to dominate conventional computing.
• Energy efficiency is the new concern: the energy bill of computing equipment keeps growing.

Research Problem – Configuration Decisions
Configuring the runtime system is complex (example: the resource allocation decision).
[Figure: Energy-Delay Product (EDP) of the Montage workload under different allocations]

Research Problem – Questions
• Q1: What performance optimizations in storage systems lead to energy savings?
• Q2: What is the performance and energy impact of power-centric tuning techniques?
• Q3: How can users balance time-to-solution and energy consumption for a target application?

Outline
• Background
• Research Problem
• Methodology
• Evaluation
• Conclusion

Methodology – Building an Energy Consumption Predictor
The goal of this work is to build an energy consumption predictor to aid system configuration and provisioning decisions.
• Answer what-if questions (e.g., is configuration A better than configuration B from the energy perspective?)
• Customize the optimization metric (e.g., energy consumption, performance-energy product)

Methodology – Energy Model
Execution states:
• Idle
• Network transfer
• Storage I/O
• Task processing
[Diagram: compute nodes running application tasks over intermediate storage; labels A–D mark where each execution state and its power profile apply]

Methodology – Energy Model
Energy = power profile × predicted time, summed over the execution states: idle, network transfer, storage I/O operations (read, write), and task processing.

Methodology – Energy Model
How to seed the energy model?
• Power states: use synthetic benchmarks to measure the power consumption in each state.
• Time estimates: augment a performance predictor to track the time spent in each state.

Methodology – Building an Energy Consumption Predictor
Sources of inaccuracy:
• Model simplification (metadata, scheduling, …)
• Time prediction (assumes homogeneity)
• Power-meter accuracy

L. B. Costa, S. Al-Kiswany, H. Yang, and M. Ripeanu, “Supporting Storage Configuration for I/O Intensive Workflows”, In Proceedings of the 28th ACM International Conference on Supercomputing (ICS'14), June 2014. (Acceptance rate: 20%)
L. B. Costa, S. Al-Kiswany, A. Barros, H. Yang, and M. Ripeanu, “Predicting Intermediate Storage Performance for Workflow Applications”, In Proceedings of PDSW'13, 2013.
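The per-state energy model above can be sketched in a few lines of code. The state names follow the slides; the idle, storage, and network power values loosely echo the Taurus measurements quoted in the backup slides, while the processing power and all predicted times are purely illustrative assumptions, not measurements from the thesis.

```python
# Sketch of the per-state energy model: E = sum over execution states of
# (power in that state) * (predicted time spent in that state).
# Power values (W): idle/storage/network loosely from the backup slides;
# "processing" and every time value (s) are illustrative placeholders.
POWER_PROFILE = {"idle": 91.6, "network": 127.7, "storage": 129.0, "processing": 140.0}

def predict_energy(predicted_times, power=POWER_PROFILE):
    """Return the predicted energy (joules) for one node, given the
    predicted time (seconds) spent in each execution state."""
    return sum(power[state] * t for state, t in predicted_times.items())

# Hypothetical per-state times produced by the performance predictor.
times = {"idle": 30.0, "network": 10.0, "storage": 15.0, "processing": 45.0}
energy_j = predict_energy(times)  # predicted energy for this made-up run
```

In the actual framework the time estimates come from the performance predictor described on the previous slide; here they are hard-coded only to make the sketch self-contained.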
Evaluation Outline
• Synthetic benchmarks: workflow patterns
• Real workflow applications
• Predicting the energy impact of power-tuning techniques
• Predicting energy-performance tradeoffs

Evaluation – Platform
Grid'5000, Lyon site:
• Taurus cluster (11 nodes): two 2.3 GHz Intel Xeon E5-2630 CPUs (each with 6 cores), 32 GB memory, 10 Gbps NIC per node
• Sagittaire cluster (16 nodes): two 2.4 GHz AMD Opteron CPUs (each with one core), 2 GB RAM and 1 Gbps NIC per node
• SME Omegawatt power meter per node: 0.01 W power resolution at a 1 Hz sampling rate
[Figure: per-state power breakdown (idle, app, storage, I/O, network transfer)]

Evaluation – Synthetic Benchmarks: Workflow Patterns
[Figure: the Montage workflow decomposed into its pipeline and reduce patterns]

Evaluation – Synthetic Benchmarks: Workflow Patterns
• Average 88% accuracy
• 20-30x faster than running the actual benchmark
• 200x-300x fewer resources (machines × runtime)
(Using the default storage system configuration, DSS)

Evaluation – Synthetic Benchmarks: Workflow Patterns
Q1: What energy savings can performance optimizations in storage bring?
• DSS – default storage system configuration; WOSS – workflow-optimized storage system configuration
• The predictor is accurate in both configurations and suggests the better configuration from the energy perspective.
[Figure: pipeline energy consumption under DSS and WOSS]

S. Al-Kiswany, L. B. Costa, H. Yang, E. Vairavanathan, M. Ripeanu, “The Case for Cross-Layer Optimizations in Storage: A Workflow-Optimized Storage System”, IEEE Transactions on Parallel and Distributed Systems (TPDS), under review, submitted June 2014.
L. B. Costa, H. Yang, E. Vairavanathan, A. Barros, K. Maheshwari, G. Fedak, D. S. Katz, M. Wilde, M. Ripeanu and S. Al-Kiswany, “The Case for Workflow-Aware Storage: An Opportunity Study using MosaStore”, Journal of Grid Computing, 2014.
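A natural reading of accuracy figures like the 88% above is one minus the relative prediction error. The helper below is a sketch under that assumption (the thesis may define accuracy differently), and the energy numbers in the example are made up for illustration.

```python
def prediction_accuracy(predicted, actual):
    """Accuracy as (1 - relative error), in percent.
    This definition is an assumption, not taken from the thesis."""
    return 100.0 * (1.0 - abs(predicted - actual) / actual)

# Hypothetical example: predicting 10.6 kJ for a run that measured 12.0 kJ
# yields roughly 88% accuracy, in the range of the averages reported above.
acc = prediction_accuracy(10.6, 12.0)
```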
Evaluation – Real Workflow Applications
[Figure: task graphs of the BLAST and Montage workflows]

Evaluation – Real Workflow Applications
• BLAST result: 89% energy accuracy, 95% time accuracy
• Montage result: 84% energy accuracy, 86% time accuracy

Evaluation – CPU Throttling
Q2: What is the energy and performance impact of CPU throttling? Is it application-specific?
• CPU throttling is an important technique in which processors run at less-than-maximum frequency to conserve power.
• The technique can prolong execution time while reducing instantaneous power.
• CPU-bound application: BLAST; I/O-bound application: the pipeline benchmark.

Evaluation – CPU Throttling
Frequency levels: 1200 MHz, 1800 MHz, 2300 MHz
• BLAST result: 96% energy cost when using maximum CPU throttling
• Pipeline result: 17% energy savings when using maximum throttling
Conclusion:
• The computational and I/O characteristics determine whether throttling brings energy savings or energy costs.
• The predictor can be used to make these decisions.

Evaluation – Predicting the Energy-Delay Product
Q3: How can users balance time-to-solution and energy consumption for a target application?
User's optimization metric:
• Performance (use more machines)
• Energy
• Energy-Delay Product (EDP = energy × time)
• Consider the allocation decision.
• Use the Montage workload on two clusters to demonstrate the prediction.

Evaluation – Predicting the Energy-Delay Product
[Figures: Montage EDP at Sagittaire and at Taurus]

Conclusion
• This thesis presents an energy consumption predictor for the workflow application domain.
• The proposed energy model and prediction framework achieve adequate accuracy to be useful for the energy-oriented configuration decisions this work targets.

Resulting Publications
Energy Prediction
• H. Yang, L. B. Costa and M. Ripeanu, “Energy Prediction for I/O Intensive Workflow Applications”, submitted to the 7th Workshop on Many-Task Computing on Clouds, Grids, and Supercomputers (MTAGS) 2014 (co-located with Supercomputing/SC 2014), under review.
Performance Prediction and Provisioning
• L. B.
Costa, S. Al-Kiswany, H. Yang, and M. Ripeanu, “Supporting Storage Configuration and Provisioning for I/O Intensive Workflows”, in preparation.
• L. B. Costa, S. Al-Kiswany, H. Yang, and M. Ripeanu, “Supporting Storage Configuration for I/O Intensive Workflows”, In Proceedings of ICS'14, June 2014. (Acceptance rate: 20%)
• L. B. Costa, S. Al-Kiswany, A. Barros, H. Yang, and M. Ripeanu, “Predicting Intermediate Storage Performance for Workflow Applications”, In Proceedings of PDSW'13, 2013.
A Workflow-Optimized Storage System
• S. Al-Kiswany, L. B. Costa, H. Yang, E. Vairavanathan, M. Ripeanu, “A Software Defined Storage for Scientific Workflow Applications”, in preparation.
• S. Al-Kiswany, L. B. Costa, H. Yang, E. Vairavanathan, M. Ripeanu, “The Case for Cross-Layer Optimizations in Storage: A Workflow-Optimized Storage System”, IEEE Transactions on Parallel and Distributed Systems (TPDS), under review, submitted June 2014.
• L. B. Costa, H. Yang, E. Vairavanathan, A. Barros, K. Maheshwari, G. Fedak, D. S. Katz, M. Wilde, M. Ripeanu and S. Al-Kiswany, “The Case for Workflow-Aware Storage: An Opportunity Study using MosaStore”, accepted by the Journal of Grid Computing, 2014.
Evaluating Storage Systems for Scientific Data in the Cloud
• K. Maheshwari, J. Wozniak, H. Yang, D. S. Katz, M. Ripeanu, V. Zavala, M. Wilde, “Evaluating Storage Systems for Scientific Data in the Cloud”, In Proceedings of the 5th Workshop on Scientific Cloud Computing (ScienceCloud), co-located with ACM HPDC 2014. (Best Paper Award)

Backup Slides – The System Model
Outline:
• The system model
• Model seeding
• Workload description (I/O traces, task dependency graph)

System deployment configuration:
• Number of storage nodes: N_st
• Number of client nodes: N_cli
• Chunk size: S_chunk
• Replication level: R
• …

Platform performance parameters:
• Manager service time: μ_ma
• Storage service time: μ_sm
• Client service time: μ_cli
• Remote network service time: μ_re-net
• Local network service time: μ_lo-net

L. B. Costa, S. Al-Kiswany, H. Yang, and M.
Ripeanu, “Supporting Storage Configuration for I/O Intensive Workflows”, In Proceedings of the 28th ACM International Conference on Supercomputing (ICS'14), June 2014.

Backup Slides – Limitations
• Simplification of the model
• Short tasks / small workloads
• Not validated on new devices (e.g., SSDs)

Backup Slides – Alternative Approaches
• Utilization
• Detailed simulation
• Machine learning

Backup Slides – Combined States
Apply benchmarks in parallel to obtain the power of a combined state, e.g., perform the storage and network benchmarks in parallel:
P_combine ≈ P_idle + (P_storage − P_idle) + (P_net − P_idle)
Measured: P_combine = 160.5 W, P_idle = 91.6 W, P_storage = 129.0 W, P_net = 127.7 W

Backup Slides – Energy Composition (pipeline benchmark)
• Idle energy: 64%
• App processing: 9.2%
• Storage operations: 15.8%
• Network transfer: 10.6%

Backup Slides – Sagittaire Power Profiles
[Figure: per-state power levels of roughly 175 W, 25 W, 8 W and 7 W]
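The combined-state approximation above can be checked directly against the reported measurements: adding each state's dynamic (above-idle) power to the idle baseline gives about 165.1 W, close to the measured 160.5 W. A minimal sketch, using only the numbers from the backup slides:

```python
# Measured single-state power draws (W), taken from the backup slides.
P_IDLE, P_STORAGE, P_NET = 91.6, 129.0, 127.7
P_COMBINED_MEASURED = 160.5  # storage + network benchmarks run in parallel

def combined_power(p_idle, *p_states):
    """Estimate a combined state's power draw as the idle power plus the
    dynamic (above-idle) contribution of each concurrently active state."""
    return p_idle + sum(p - p_idle for p in p_states)

p_est = combined_power(P_IDLE, P_STORAGE, P_NET)  # ~165.1 W estimate
error_w = p_est - P_COMBINED_MEASURED             # small overestimate vs. measured
```

The small gap between the estimate and the measurement illustrates why the slides write the relation with "≈" rather than "=": concurrently active components share some fixed overheads, so the additive model slightly overestimates.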