Green Computing
Omer Rana, o.f.rana@cs.cardiff.ac.uk

The Need
[Image slides: Bill St. Arnaud (CANARIE, Inc)]

Impact of ICT Industry
[Image slides: Bill St. Arnaud (CANARIE, Inc)]

Virtualization Techniques
[Image slides: Bill St. Arnaud (CANARIE, Inc)]

Virtual Machine Monitors (IBM, 1960s)
[Figure: applications on CMS and MVS guest systems running over IBM VM/370 on an IBM mainframe]
• A virtual machine monitor (VMM) is a thin software layer that sits between the hardware and the operating system, virtualizing and managing all hardware resources
(Ed Bugnion, VMware)

An old idea from the 1960s
• IBM VM/370
  – A VMM for the IBM mainframe
  – Multiple OS environments on expensive hardware
  – Desirable when few machines were around
• Popular research idea in the 1960s and 1970s
  – Entire conferences on virtual machine monitors
  – Hardware, VMM and OS designed together
• Interest died out in the 1980s and 1990s
  – Hardware got cheap
  – Operating systems got more powerful (e.g. multi-user)
(Ed Bugnion, VMware)

A return to virtual machines
• Disco: Stanford research project (1996-)
  – Run commodity OSes on scalable multiprocessors
  – Focus on the high end: NUMA, MIPS, IRIX
• Hardware has changed
  – Cheap, diverse, graphical user interfaces
  – Designed without virtualization in mind
• System software has changed
  – Extremely complex
  – Advanced networking protocols
  – But even today: not always multi-user, and with limitations, incompatibilities, …
(Ed Bugnion, VMware)

The problem today
[Figure: one operating system bound directly to the Intel Architecture]

The VMware solution
[Figure: two operating systems, each on its own virtualized Intel Architecture]
(Ed Bugnion, VMware)

VMware MultipleWorlds™ Technology
[Figure: Win 2000, Win NT, Linux and Win 2000 guests, each with its own applications, running over VMware MultipleWorlds on the Intel Architecture]
• A thin software layer that sits between Intel hardware and the operating system, virtualizing and managing all hardware resources
• A "world" is an application execution environment with its own operating system
(Ed Bugnion, VMware)

Virtual hardware
[Figure: the VMM presents virtual devices: parallel ports, serial/COM ports, Ethernet, floppy disks, IDE controller, SCSI controller, monitor, keyboard, mouse, sound card]
(Ed Bugnion, VMware)

Attributes of MultipleWorlds Technology
• Software compatibility – runs pretty much all software
• Low overheads / high performance – near "raw" machine performance
• Complete isolation – total data isolation between virtual machines
• Encapsulation – virtual machines are not tied to physical machines
• Resource management
(Ed Bugnion, VMware)

Hosted VMware architecture
• Host mode: VMware, acting as an application, uses the host OS to access devices such as the hard disk, floppy or network card
• VMM mode: the VMware virtual machine monitor lets each guest OS directly access the processor (direct execution)
• VMware achieves both near-native execution speed and broad device support by transparently switching between host mode and VMM mode*
[Figure: guest applications and guest OS in a virtual machine over the VMM; host OS applications, the VMware app and the VMware driver over the host OS; both sides share the PC hardware (CPU, memory, NIC, disks)]
*VMware typically switches modes 1000 times per second
(Ed Bugnion, VMware)

Hosted VMM architecture
• Advantages
  – Installs and runs like an application
  – Portable – the host OS does the I/O access
  – Coexists with applications running on the host
• Limits – subject to the host OS:
  – Resource management decisions
  – OS failures
• USENIX 2001 paper: J. Sugerman, G. Venkitachalam and B.-H. Lim, "Virtualizing I/O on VMware Workstation's Hosted Architecture"
(Ed Bugnion, VMware)
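As a concrete picture of the host-mode/VMM-mode split above, here is a toy Python sketch of the switching loop. Everything in it (ToyVM, GuestExit, the trace format) is invented for illustration; a real hosted monitor is native code that saves and restores processor state, not a Python loop.

```python
# Toy model of the hosted-VMM switching loop. All names are invented
# for illustration; they are not VMware APIs.
from enum import Enum, auto

class GuestExit(Enum):
    IO_REQUEST = auto()   # guest touched a virtual device: host OS needed
    HALT = auto()         # guest shut down

class ToyVM:
    """Stands in for a guest that mostly computes, occasionally does I/O."""
    def __init__(self, trace):
        self.trace = list(trace)      # e.g. ["compute", "io", "compute"]

    def execute_until_trap(self):
        # "Direct execution": the guest runs on the bare CPU until an
        # operation the monitor must intercept (here, any "io" step).
        while self.trace:
            if self.trace.pop(0) == "io":
                return GuestExit.IO_REQUEST, "read disk block 7"
        return GuestExit.HALT, None

def world_switch_loop(vm):
    # Host mode <-> VMM mode; the slides note this switch happens on
    # the order of 1000 times per second.
    while True:
        reason, detail = vm.execute_until_trap()           # VMM mode
        if reason is GuestExit.IO_REQUEST:
            print("host mode: host OS services:", detail)  # host mode
        else:
            break

world_switch_loop(ToyVM(["compute", "io", "compute"]))
```

The point of the structure: the guest gets the raw CPU almost all of the time, and only device access pays the cost of a switch into the host.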
Virtualizing a network interface
[Figure: the guest's NIC driver talks to a virtual NIC; the VMApp, VMDriver and VMM pass packets through a virtual network hub and virtual bridge to the host OS NIC driver and the physical Ethernet]
(Ed Bugnion, VMware)

The rise of data centers
• A single place for hosting servers and data
• ISPs now have their machines hosted at data centers
• Run by large companies – like BT
• They manage:
  – Power
  – Computation + data
  – Cooling systems
  – Systems admin + network admin

Data Centre in Tokyo
[Image slide; from Satoshi Matsuoka, http://www.attokyo.co.jp/eng/facility.html]

[Image slides: Martin J. Levy (Tier1 Research) and Josh Snowhorn (Terremark)]

Requirements
• Power is an important design constraint:
  – Electricity costs
  – Heat dissipation
• Two key options in clusters – enable scaling of:
  – Operating frequency (square relation)
  – Supply voltage (cubic relation)
• Balance QoS requirements – e.g. the fraction of workload to process locally – with power management
(From: Salim Hariri, Mazin Yousif)

[Image slides: Justin Moore, Ratnesh Sharma, Rocky Shih, Jeff Chase, Chandrakant Patel, Partha Ranganathan (HP Labs); Martin J. Levy (Tier1 Research) and Josh Snowhorn (Terremark)]

The case for power management in HPC
• Power/energy consumption is a critical issue
  – Energy = heat; heat dissipation is costly
  – Limited power supply
  – A non-trivial amount of money
• Consequence
  – Performance is limited by the available power
  – Fewer nodes can operate concurrently
• Opportunity: bottlenecks
  – A bottleneck component limits the performance of the other components
  – Reduce the power of some components without reducing overall performance
• Today, the CPU is:
  – A major power consumer (~100 W)
  – Rarely the bottleneck
  – Scalable in power/performance via frequency and voltage "gears"
(From: Vincent W. Freeh, NCSU)

Is CPU scaling a win?
• It is a win when the performance reduction is smaller than the power reduction, and the throughput reduction is smaller than the performance reduction
• Two reasons:
  1. Frequency and voltage scaling: dynamic CPU power is P = ½·C·V²·f, so power falls faster than performance (frequency)
  2. Application throughput: throughput shows diminishing gains as CPU performance rises
• Assumptions
  – The CPU is a large power consumer
  – The CPU drives power consumption
  – Diminishing throughput gains
[Figures: (1) power vs. performance (frequency); (2) application throughput vs. performance (frequency)]
(From: Vincent W. Freeh, NCSU)

AMD Athlon-64
• x86 ISA
• 64-bit technology
• HyperTransport technology – fast memory bus
• Performance
  – Slower clock frequency
  – Shorter pipeline (12 vs. 20 stages)
  – SPEC2K results:
    • A 2 GHz AMD-64 is comparable to a 2.8 GHz P4
    • P4 better on average by 10% & 30% (INT & FP)
• Frequency and voltage scaling
  – 2000–800 MHz
  – 1.5–1.1 V
(From: Vincent W. Freeh, NCSU)

LMBench results
• LMBench – a benchmarking suite of low-level micro-benchmarks
• Test each "gear" (the sketch after "The problem" below applies the power model to this table):

  Gear   Frequency (MHz)   Voltage (V)
  0      2000              1.5
  1      1800              1.4
  2      1600              1.3
  3      1400              1.2
  4      1200              1.1
  6       800              0.9

(From: Vincent W. Freeh, NCSU)

Operating system functions
[Figure omitted] (From: Vincent W. Freeh, NCSU)

Communication
[Figure omitted] (From: Vincent W. Freeh, NCSU)

The problem
• Peak power limit, P
  – Rack power
  – Room/utility power
  – Heat dissipation
• Static solution: the number of servers is
  – N = P/Pmax
  – where Pmax is the maximum power of an individual node
• Problems
  – Peak power > average power (Pmax > Paverage)
  – Does not use all the power – N·(Pmax − Paverage) goes unused
  – Under-performs – performance is proportional to N
  – Power consumption is not predictable
(From: Vincent W. Freeh, NCSU)
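To make reason (1) from "Is CPU scaling a win?" concrete, the sketch below (promised at the LMBench table) applies the dynamic-power model P = ½·C·V²·f to the measured gears. The constant ½·C cancels when comparing gears, so no hardware constants are needed; the percentages are modeled, not measured.

```python
# Relative dynamic power P = 1/2 * C * V^2 * f across the AMD gear
# table; the constant 1/2 * C cancels when normalizing to gear 0.
GEARS = {0: (2000, 1.5), 1: (1800, 1.4), 2: (1600, 1.3),
         3: (1400, 1.2), 4: (1200, 1.1), 6: (800, 0.9)}

def rel_power(freq_mhz, volts, base=(2000, 1.5)):
    """Modeled power relative to the fastest gear."""
    f0, v0 = base
    return (volts ** 2 * freq_mhz) / (v0 ** 2 * f0)

print("gear   freq   power")
for g, (f, v) in sorted(GEARS.items()):
    print(f"{g:>4}  {f / 2000:5.0%}  {rel_power(f, v):6.0%}")
```

At gear 6 the frequency falls to 40% of peak but the modeled power falls to about 14%: the power reduction far exceeds the performance (frequency) reduction, which is exactly the condition the slides call a win.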
Safe over-provisioning in a cluster
• Allocate and manage power among M > N nodes
  – Pick M > N, e.g. M = P/Paverage
  – Then M·Pmax > P
  – Per-node limit: Plimit = P/M
• Goal
  – Use more power, safely under the limit
  – Reduce the power (and peak CPU performance) of individual nodes
  – Increase overall application performance
[Figures: power P(t) over time against Pmax and Paverage, without and with the per-node limit Plimit]
(From: Vincent W. Freeh, NCSU)

Safe over-provisioning in a cluster: benefits
• Less "unused" power/energy – more efficient power use
[Figure: the gap between Pmax and P(t) is the unused energy]
• More performance under the same power limitation
  – Let P be per-node performance at full speed and P* at the reduced gear
  – More overall performance then means M·P* > N·P
  – i.e. P*/P > N/M, or equivalently P*/P > Plimit/Pmax
(From: Vincent W. Freeh, NCSU)

When is this a win?
• When P*/P > N/M, i.e. P*/P > Plimit/Pmax – in words: the power reduction is larger than the performance reduction (a numeric sketch of N, M and this condition appears at the end of these notes)
• Two reasons, as before:
  1. Frequency and voltage scaling
  2. Application throughput
[Figures: (1) power vs. performance (frequency); (2) application throughput vs. performance (frequency)]
(From: Vincent W. Freeh, NCSU)

Feedback-directed, adaptive power control
• Uses feedback to control power/energy consumption
  – Given a power goal
  – Monitor energy consumption
  – Adjust the power/performance of the CPU (a controller sketch also appears at the end of these notes)
• Several policies
  – Average power
  – Maximum power
  – Energy efficiency: select the slowest gear (g) such that …
(From: Vincent W. Freeh, NCSU)

A more holistic approach: managing a data center
• CRAC: Computer Room Air Conditioning units
(From: Justin Moore, Ratnesh Sharma, Rocky Shih, Jeff Chase, Chandrakant Patel, Partha Ranganathan, HP Labs)

Location of cooling units
• Six CRAC units serve 1000 servers, consuming 270 kW of power out of a total capacity of 600 kW
(http://blogs.zdnet.com/BTL/?p=4022; from: Justin Moore, Ratnesh Sharma, Rocky Shih, Jeff Chase, Chandrakant Patel, Partha Ranganathan, HP Labs)

[Further image slides: Satoshi Matsuoka]
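The numeric sketch promised in the over-provisioning slides. Only the formulas (N = P/Pmax, M = P/Paverage, Plimit = P/M and the win condition P*/P > Plimit/Pmax) come from the slides; the wattage values are assumptions chosen for illustration.

```python
# Static vs. safely over-provisioned cluster sizing. The budget and
# per-node figures below are assumed, not taken from the slides.
P_budget = 100_000.0   # total power limit P, in watts
P_max    = 250.0       # per-node peak power Pmax, in watts
P_avg    = 150.0       # per-node average power Paverage, in watts

# Static solution: provision for the worst case.
N = int(P_budget / P_max)            # 400 nodes
unused = N * (P_max - P_avg)         # capacity bought but rarely used

# Safe over-provisioning: size by average power, cap each node.
M = int(P_budget / P_avg)            # 666 nodes, and M * Pmax > P
P_limit = P_budget / M               # per-node cap enforced via DVFS

print(f"static: N={N} nodes, ~{unused / 1000:.0f} kW typically unused")
print(f"over-provisioned: M={M} nodes, per-node limit {P_limit:.0f} W")

# Win condition from the slides: with per-node performance P* at the
# reduced gear and P at full speed, M nodes beat N nodes whenever
# P*/P > N/M = Plimit/Pmax.
print(f"win if P*/P > {N / M:.2f} (= Plimit/Pmax = {P_limit / P_max:.2f})")
```

With these assumed numbers each node only needs to retain about 60% of its full-speed performance, while the gear-table model earlier suggests that at roughly 60% of peak power a node still runs at about 80% of peak frequency, so the condition holds comfortably.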
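And the controller sketch promised in the feedback slide, for the "average power" policy. The slides name the policies but not a mechanism, so the one-gear-per-step rule and the modeled power readings below are stand-ins; a real controller would sample hardware energy counters rather than a model.

```python
# Minimal feedback power controller ("average power" policy): compare
# the latest power sample with the goal and shift one gear at a time.
GEARS = {0: (2000, 1.5), 1: (1800, 1.4), 2: (1600, 1.3),
         3: (1400, 1.2), 4: (1200, 1.1), 6: (800, 0.9)}
ORDER = sorted(GEARS)                 # fastest (0) ... slowest (6)

def power_sample(gear, peak_watts=100.0):
    """Stand-in for a measurement: scale an assumed peak by V^2 * f."""
    f, v = GEARS[gear]
    f0, v0 = GEARS[ORDER[0]]
    return peak_watts * (v * v * f) / (v0 * v0 * f0)

def control_step(idx, goal_watts):
    """One feedback iteration over the gear index into ORDER."""
    if power_sample(ORDER[idx]) > goal_watts and idx < len(ORDER) - 1:
        return idx + 1                # over the goal: slow down
    if power_sample(ORDER[idx]) < goal_watts and idx > 0:
        return idx - 1                # headroom left: speed up
    return idx

idx = 0                               # start at the fastest gear
for _ in range(6):
    idx = control_step(idx, goal_watts=50.0)
    print(f"gear {ORDER[idx]}: ~{power_sample(ORDER[idx]):.0f} W modeled")
```

With a 50 W goal the loop steps down to gear 3 and then oscillates between gears 2 and 3, the two gears that bracket the goal; a maximum-power policy would instead stay at gear 3 so as never to exceed the goal.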