Applied CyberInfrastructure Concepts
ISTA 420/520 Fall 2014

Will Computers Crash Genomics? (Science, Vol 331, Feb 2011)

Nirav Merchant (nirav@email.arizona.edu)
Bio Computing & iPlant Collaborative
Eric Lyons (ericlyons@email.arizona.edu)
Plant Sciences & iPlant Collaborative
University of Arizona

http://goo.gl/p4j3m or https://sites.google.com/site/appliedciconcepts/

Topic Coverage
• Building your tool chest for:
  – Development Environment
  – Test Environment
  – Deployment Environment
• Why do you need different environments?
• Virtualization, Containers and Virtual Env
• Preparing VM for Thu. hands-on

Simple Formula
[Figure: pictorial equation; images not recoverable]

The Reality
• PERL, Python, Java, Ruby, Fortran, C, C#, C++, R, Matlab, etc., and lots of glue
• Amazon, Azure, Rackspace, Campus HPC, XSEDE, etc.
[Figure: the two lists above joined by "+" signs; images not recoverable]

Too many environments?
• What is “dependency hell”?
• Why create these environments?
• The laptop syndrome
• Reproducibility challenges
• Making things “cloudy”

Arrival of “As a Service” models
• Cyberinfrastructure is “Research as a Service”
• SaaS: Software as a Service (e.g. clustering/assembly is a service)
• PaaS: Platform as a Service. IaaS plus core software capabilities on which you build SaaS (e.g. Hadoop/MapReduce is a platform)
• IaaS: Infrastructure as a Service (get computer time with a credit card and a Web interface, like EC2)
[Figure labels: Pain, Flexibility, Productivity]
Source: http://salsahpc.indiana.edu

Your new best friend (for this course)
• Virtualization (in all layers of our cake!)

Introduction to Virtual Machines
• These slides are a subset from Carl Waldspurger (VMware R&D)
• To learn more about virtualization visit: http://labs.vmware.com/academic/introduction-to-virtualization

Overview
• Virtualization and VMs
• Processor Virtualization
• Memory Virtualization
• I/O Virtualization
• Network Virtualization
• How does virtualization relate to cloud?

Types of Virtualization
• Process Virtualization
  – Language-level: Java, .NET, Smalltalk
  – OS-level: processes, Solaris Zones, BSD Jails, Virtuozzo
  – Cross-ISA emulation: Apple 68K-PPC-x86, Digital FX!32
• Device Virtualization
  – Logical vs. physical: VLAN, VPN, NPIV, LUN, RAID
• System Virtualization
  – “Hosted”: VirtualBox, VMware Workstation, Microsoft VPC, Parallels
  – “Bare metal”: VMware ESX, Xen, KVM, Microsoft Hyper-V

Starting Point: A Physical Machine
• Physical Hardware
  – Processors, memory, chipset, I/O devices, etc.
  – Resources often grossly underutilized
• Software
  – Tightly coupled to physical hardware
  – Single active OS instance
  – OS controls hardware

What is a Virtual Machine?
• Software Abstraction
  – Behaves like hardware
  – Encapsulates all OS and application state
• Virtualization Layer
  – Extra level of indirection
  – Decouples hardware, OS
  – Enforces isolation
  – Multiplexes physical hardware across VMs
• Host OS and Guest OS

Virtualization Properties
• Isolation
  – Fault isolation
  – Performance isolation
• Encapsulation
  – Cleanly capture all VM state
  – Enables VM snapshots, clones
• Portability
  – Independent of physical hardware
  – Enables migration of live, running VMs
• Interposition
  – Transformations on instructions, memory, I/O
  – Enables transparent resource overcommitment, encryption, compression, replication …

What is a Virtual Machine Monitor (VMM)?
• Classic Definition (Popek and Goldberg ’74)
• VMM Properties
  – Fidelity
  – Performance
  – Safety and Isolation
• Note: VMM = Hypervisor

Classic Virtualization and Applications
• Classical VMM
  – IBM mainframes: IBM S/360, IBM VM/370
  – Co-designed proprietary hardware, OS, VMM
  – “Trap and emulate” model
• Applications
  – Timeshare several single-user OS instances on expensive hardware
  – Compatibility
[Figure: from IBM VM/370 product announcement, ca. 1972]

Modern Virtualization Renaissance
• Recent Proliferation of VMs
  – Considered exotic mainframe technology in 90s
  – Now pervasive in datacenters and clouds
  – Huge commercial success
• Why?
  – Introduction on commodity x86 hardware
  – Ability to “do more with less” saves $$$
  – Innovative new capabilities
  – Extremely versatile technology

Modern Virtualization Applications
• Server Consolidation
  – Convert underutilized servers to VMs
  – Significant cost savings (equipment, space, power)
  – Increasingly used for virtual desktops
• Simplified Management
  – Datacenter provisioning and monitoring
  – Dynamic load balancing
• Improved Availability
  – Automatic restart
  – Fault tolerance
  – Disaster recovery
• Test and Development

Improve the software lifecycle
• Develop, debug, deploy and maintain applications in virtual machines
• Power tool for software developers
  – record/replay application execution deterministically
  – trace application behavior online and offline
  – model distributed hardware for multi-tier applications
• Application and OS flexibility
  – run any application or operating system
• Virtual appliances
  – a complete, portable application execution environment

Increase application availability
• Fast, automated recovery
  – automated failover/restart within a cluster
  – disaster recovery across sites
  – VM portability enables this to work reliably across potentially different hardware configurations
• Fault tolerance
  – hypervisor-based fault tolerance against hardware failures [Bressoud and Schneider, SOSP 1995]
  – run two identical VMs on two different machines; the backup VM takes over if the primary VM’s hardware crashes
  – commercial prototypes beginning to emerge (2008)

Clouds and VM
• The first step to learning about cloud is virtualization
• Taking a VM from your desktop to the cloud is our goal (which will not be easy, but we will make it happen)
• Scaling, and why does it matter to have many VMs?
• Connecting VMs, and what is an appliance?
• Discussion on VM <-> Cloud
• Containers (Docker and others in next class)!
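
Sketch: primary/backup takeover
The fault-tolerance bullet above (two identical VMs on two different machines, with the backup taking over when the primary's hardware crashes) can be pictured with a small toy model. This is only a sketch under loud assumptions: the `VM` and `ReplicatedPair` classes and the `host_alive` flag are hypothetical names made up for illustration, not part of any hypervisor API. Real systems must also keep the backup's state synchronized with the primary, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class VM:
    """Hypothetical stand-in for one VM replica (illustration only)."""
    name: str
    host: str
    host_alive: bool = True  # assumed flag: is the underlying hardware up?


class ReplicatedPair:
    """Toy primary/backup pair: two identical VMs on two different hosts."""

    def __init__(self, primary: VM, backup: VM) -> None:
        # Placing both replicas on one host would defeat the purpose.
        if primary.host == backup.host:
            raise ValueError("primary and backup must run on different hosts")
        self.primary = primary
        self.backup = backup

    def active(self) -> VM:
        """Return the VM that should be serving right now."""
        if self.primary.host_alive:
            return self.primary
        if self.backup.host_alive:
            # Primary hardware crashed: promote the backup.
            self.primary, self.backup = self.backup, self.primary
            return self.primary
        raise RuntimeError("both hosts have failed")
```

To try it, build a pair such as `ReplicatedPair(VM("web-a", "host1"), VM("web-b", "host2"))`, set `host_alive = False` on the primary, and call `active()` again: the backup is promoted.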

VirtualBox
• We will use https://virtualbox.org (Sun, now Oracle), ver. 4.2.16 (current as of today)
• Guest OS will be CentOS 6.3: http://virtualboxes.org/images/centos/ (number 13 on the list on this page)
• Use the manual to install this on your laptop and learn about VirtualBox (end-user docs): https://www.virtualbox.org/manual/UserManual.html

VirtualBox (next class)
• You should have it running with the CentOS 6.3 image for next class (and have logged in)
• We will learn about basic Linux system admin tasks/duties, machine performance, etc. for getting started
  – http://library.linode.com/using-linux/administration-basic
  – http://www.linuxtraining.co.uk/download/new_linux_course_modules.pdf (main text for next few sessions)
• Use snapshots to save states
• Working with VM appliances and exchanging images
• Starting to build a VM from scratch

Before coming to class on Thu.
• Download the current Ubuntu desktop ISO (ver. 14.04.1 LTS, 64-bit version): http://www.ubuntu.com/download/desktop
• This is a large file, so make sure you do it BEFORE you come to class
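
Sketch: taking and restoring snapshots from a script
For the "use snapshots to save states" item above, snapshots can also be driven from the command line through VirtualBox's `VBoxManage snapshot` subcommand. The following is a minimal sketch, not the course's prescribed workflow. The VM name `centos63` and snapshot name `clean-install` are placeholders (check `VBoxManage list vms` for your real VM name), and the helper function names are made up for illustration. By default the helper only returns the command it would run.

```python
import shlex
import subprocess
from typing import List


def snapshot_take_cmd(vm_name: str, snapshot_name: str) -> List[str]:
    """Build the VBoxManage command that takes a named snapshot of a VM."""
    return ["VBoxManage", "snapshot", vm_name, "take", snapshot_name]


def snapshot_restore_cmd(vm_name: str, snapshot_name: str) -> List[str]:
    """Build the command that restores a named snapshot.

    The VM generally needs to be powered off before a restore.
    """
    return ["VBoxManage", "snapshot", vm_name, "restore", snapshot_name]


def run(cmd: List[str], dry_run: bool = True) -> str:
    """Return the shell-quoted command (dry run) or execute it for real."""
    if dry_run:
        return shlex.join(cmd)
    # Executing requires VirtualBox to be installed and VBoxManage on PATH.
    done = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return done.stdout


if __name__ == "__main__":
    # Placeholder names; replace them with your own VM and snapshot names.
    print(run(snapshot_take_cmd("centos63", "clean-install")))
    print(run(snapshot_restore_cmd("centos63", "clean-install")))
```

Passing `dry_run=False` runs the command. Keeping the dry run as the default makes it easy to inspect what would be executed before touching a VM.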