SAN FRANCISCO, CA, USA

Adaptive Energy-efficient Resource Sharing for Multi-threaded Workloads in Virtualized Systems
Can Hankendi, Ayse K. Coskun
Boston University, Electrical and Computer Engineering Department
This project has been partially funded by: [sponsor logos]

Energy Efficiency in Computing Clusters
Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments
• Energy-related costs are among the biggest contributors to the total cost of ownership.
• Consolidating multiple workloads on the same physical node improves energy efficiency.
(Source: International Data Corporation (IDC), 2009)

Multi-threaded Applications in the Cloud
• HPC applications are expected to shift towards cloud resources.
• Resource allocation decisions significantly affect the energy efficiency of server nodes.
• Energy efficiency is a function of application characteristics.
[Figure: Energy Savings on Virtualized Server; y-axis in %, series: Max Energy Saving, Min Energy Saving]

Outline
• Background
• Methodology
• Adaptive Resource Sharing
• Results
• Conclusions

Background
Cluster-level VM Management
- Consolidation policies across server nodes
- VM migration techniques
[Srikantaiah, HotPower’08] [Bonvin, CCGrid’11]
Node-level Management
- Co-scheduling based on thread communication
- Identifying best thread mixes to co-schedule
[Frachtenberg, TPDS’05] [McGregor, IPDPS’05]
Recent Co-scheduling policies
- Co-scheduling contrasting workloads
- Balancing performance events across nodes
- Cache misses [Dhiman, ISLPED’09]
- IPC [Bhadauria, ICS’10]
- Bus accesses

Virtualized System Setup
• 12-core AMD Magny Cours Server
  - 2x 6-core dies attached side by side in the same package
  - Private L1 and L2-caches for each core
  - 6 MB shared L3-cache for each 6-core die
• Virtualized through VMware vSphere 5 ESXi hypervisor
  - 2 Virtual Machines (VM) with Ubuntu Server Guest OS

Methodology: Measurement Setup
• System-level power measurements at 1s sampling rate
• Performance counter collection through vmkperf at 1s sampling rate
  - Counters: CPU cycles, retired instructions, L3-cache misses
• VM-level CPU and memory utilization data collection through esxtop with 2s sampling rate
[Diagram labels: esxtop, vmkperf, System-level power measurement, Logger]

Parallel Workloads
• PARSEC 2.1 benchmark suite [Bienia et al., 2008]

Benchmark       Application          IPC      Memory Accesses
blackscholes    Financial Analysis   Low      Low
bodytrack       Computer Vision      High     Medium
canneal         VLSI Design          Low      High
dedup           Enterprise Storage   Medium   Low
ferret          Similarity Search    Medium   Low
freqmine        Data Mining          High     Low
swaptions       Financial Analysis   High     Low
streamcluster   Data Mining          Low      High
vips            Media Processing     High     Low
x264            Media Processing     Medium   Medium

Tracking Parallel Phases
• consolmgmt
  - Consolidation management interface
  - Synchronizes ROI (region-of-interest) of multiple workloads
[Diagram labels: parsecmgmt, hooks.c, consolmgmt; Benchmark A and Benchmark B, each with Input (Serial), roi-Trigger(), Output (Serial); sleep(), start-Logging(), end-Logging()]

Performance Impact of Consolidation
• Consolidating multiple workloads can degrade performance due to resource contention.
• Virtualization provides performance isolation by managing memory and NUMA node affinities.
• With native OS, performance variation is 2.5x higher.
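As a rough illustration of the measurement setup described earlier (power and vmkperf counters sampled every 1 s, esxtop every 2 s), the sketch below derives IPC and L3 misses per kilo-instruction from per-interval counter values, and merges pairs of 1 s samples into 2 s windows so they can be lined up with esxtop data. The record layout, the function names, and the pairwise merging are assumptions made for illustration; they are not the output format of vmkperf or esxtop, and the slides do not say how the logger aligns the streams.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CounterSample:
    """One sampling interval (hypothetical layout, not vmkperf's real output)."""
    cycles: int          # CPU cycles counted in the interval
    instructions: int    # retired instructions counted in the interval
    l3_misses: int       # L3-cache misses counted in the interval
    power_w: float       # average system power over the interval, in watts


def ipc(sample: CounterSample) -> float:
    """Instructions per cycle; 0.0 if no cycles were counted."""
    return sample.instructions / sample.cycles if sample.cycles else 0.0


def l3_mpki(sample: CounterSample) -> float:
    """L3 misses per 1000 retired instructions; 0.0 if nothing retired."""
    if not sample.instructions:
        return 0.0
    return 1000.0 * sample.l3_misses / sample.instructions


def to_2s_windows(samples: List[CounterSample]) -> List[CounterSample]:
    """Merge consecutive pairs of 1 s samples into 2 s windows.

    Counters are summed and power is averaged. A trailing unpaired
    sample is dropped (an assumption, chosen for simplicity).
    """
    windows = []
    for i in range(0, len(samples) - 1, 2):
        a, b = samples[i], samples[i + 1]
        windows.append(CounterSample(
            cycles=a.cycles + b.cycles,
            instructions=a.instructions + b.instructions,
            l3_misses=a.l3_misses + b.l3_misses,
            power_w=(a.power_w + b.power_w) / 2.0,
        ))
    return windows
```

Summing counters before dividing, rather than averaging two per-second IPC values, keeps the 2 s IPC weighted by the cycles actually spent in each second.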
[Figure: Average throughput of streamcluster when co-scheduled with another PARSEC benchmark]

Impact of Application Selection
• Previous co-scheduling policies focus on application selection to improve energy efficiency.
• Application selection is based on balancing memory operations and CPU usage.
[Figure: panels A, B, C, D]

Predicting Power Efficiency
• To improve energy efficiency, we need to allocate more CPU resources to power-efficient workloads.
• The IPC*CPU Utilization metric shows a strong correlation with power efficiency.
[Figure; axis label: IPC*CPU Utilization]

Application Classification
• The IPC*CPU Utilization metric is used to classify applications according to their power efficiency levels.
• We use a density-based clustering algorithm (DBSCAN) to group applications by power efficiency class.
[Diagram labels: Benchmarks, VM Configuration; Case 1: VM0, VM1 on ESXi; Case 2: VM0, VM0, VM1, VM1 on ESXi]

Reconfiguring Resource Allocations
• CPU hot-plugging: Adding/removing vCPUs during runtime
  - Cons: Removing vCPUs is not supported in some OSes
• Resource Allocation Adjustment: Allocating/limiting CPU resources for VMs
  - Pros: Fine granularity (resource allocation unit is MHz)
• Both techniques have low overhead (less than 1%).
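The classification step described above can be sketched as follows: compute the IPC*CPU Utilization value for each application, then group the values with DBSCAN. This is a minimal one-dimensional DBSCAN written out by hand so the example stays self-contained. The `eps` and `min_pts` parameters and the treatment of CPU utilization as a fraction between 0 and 1 are assumptions; the slides do not give the clustering parameters or the utilization units.

```python
from typing import List


def power_efficiency_proxy(ipc: float, cpu_util: float) -> float:
    """IPC*CPU Utilization metric; cpu_util is assumed to be a 0..1 fraction."""
    return ipc * cpu_util


def dbscan_1d(values: List[float], eps: float, min_pts: int) -> List[int]:
    """Label each value with a cluster id (0, 1, ...) or -1 for noise.

    A point is a core point if at least min_pts values (itself included)
    lie within eps of it. Clusters grow outward from core points.
    """
    n = len(values)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i: int) -> List[int]:
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise joins as a border point
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point, keep expanding
    return labels
```

A usage sketch with made-up metric values: `dbscan_1d([0.2, 0.25, 0.3, 1.5, 1.55, 1.6, 5.0], eps=0.08, min_pts=2)` groups the first three values, then the next three, and marks the isolated last value as noise. A density-based method fits this use because the number of efficiency classes does not have to be fixed in advance.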
[Figure: Resource Configuration Comparison]

Reconfiguration Runtime Behavior
• Resource allocation limits can be dynamically adjusted according to application classes.
• CPU allocation limits can be effectively reconfigured within a second.

Results
• The proposed approach improves throughput-per-watt by up to 25% and by 9% on average.

Results
• We generate 50 workload sets, each consisting of 10 randomly selected PARSEC applications.
  - Set 1: 4x blackscholes, 2x vips, 1x bodytrack, 1x freqmine, 1x streamcluster, 1x swaptions
  - Set 2: 3x canneal, 3x ferret, 2x bodytrack, 1x dedup, 1x vips
• The proposed resource sharing technique improves throughput-per-watt by 12% on average compared to application-selection-based co-scheduling techniques.

Conclusions & Future Work
• Consolidation is a powerful technique to improve the energy efficiency of data centers.
• Energy efficiency of parallel workloads varies significantly depending on application characteristics.
• Adaptive VM configuration for parallel workloads improves energy efficiency by 12% on average over existing co-scheduling algorithms.
• Future research directions include:
  - Investigating the effect of memory allocation decisions on energy efficiency
  - Utilizing application-level instrumentation to explore power/energy optimization opportunities
  - Expanding the application space
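The evaluation procedure in the Results slides can be sketched as below: draw 50 workload sets of 10 PARSEC applications each, and compare policies by throughput-per-watt. Sampling with replacement is an assumption, chosen because the example sets above contain repeated applications (for example 4x blackscholes). The function names, the fixed seed, and the throughput unit are hypothetical; the slides do not define how throughput is measured.

```python
import random
from collections import Counter
from typing import List

# Benchmark names as listed in the Parallel Workloads table.
PARSEC_BENCHMARKS = [
    "blackscholes", "bodytrack", "canneal", "dedup", "ferret",
    "freqmine", "swaptions", "streamcluster", "vips", "x264",
]


def generate_workload_sets(num_sets: int = 50, set_size: int = 10,
                           seed: int = 0) -> List[Counter]:
    """Return num_sets multisets, each with set_size randomly drawn benchmarks.

    Draws are made with replacement, so a benchmark may appear several
    times within one set. The seed only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    return [Counter(rng.choices(PARSEC_BENCHMARKS, k=set_size))
            for _ in range(num_sets)]


def throughput_per_watt(throughput: float, avg_power_w: float) -> float:
    """Throughput (in whatever unit is measured) divided by average power."""
    return throughput / avg_power_w


def improvement_pct(baseline: float, proposed: float) -> float:
    """Relative improvement of proposed over baseline, in percent."""
    return 100.0 * (proposed - baseline) / baseline


if __name__ == "__main__":
    for idx, workload in enumerate(generate_workload_sets()[:2], start=1):
        parts = ", ".join(f"{n}x {name}" for name, n in workload.most_common())
        print(f"Set {idx}: {parts}")
```

Averaging `improvement_pct` over all generated sets would give a figure comparable in form to the averages reported above, provided the same throughput unit is used for both policies.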