Cloud computing
Clouds for science, Virtualization
Dan C. Marinescu
Computer Science Division, EECS Department
University of Central Florida
Email: dcm@cs.ucf.edu
Contents

Last time
- Service Level Agreements
- Software licensing
- Basic architecture of cloud platforms
- Open-source platforms for cloud computing: Eucalyptus, Nebula, Nimbus
- Cloud applications
  - Existing and new applications; coordination and ZooKeeper
  - The Map-Reduce programming model; the GrepTheWeb application

Today
- Clouds for science and engineering
- Layering and virtualization
- Virtual machines
- Virtual machine monitors
- Performance isolation; security isolation
- Full virtualization and paravirtualization
Clouds for science and engineering

In virtually all areas of science one has to:
- manage very large volumes of data,
- build and execute models,
- integrate data and the literature,
- document the experiments,
- share the data with others, and preserve it for a long time.

A typical problem is data discovery in large scientific data sets; examples:
- biomedical and genomic data at NCBI (National Center for Biotechnology Information),
- astrophysics data from NASA,
- atmospheric data from NOAA and NCAR.

Online data discovery can be viewed as an ensemble of several phases:
(i) recognition of the information problem;
(ii) generation of search queries using one or more search engines;
(iii) evaluation of the search results;
(iv) evaluation of the Web documents; and
(v) comparison of information from different sources.
Comparative benchmark: EC2 versus supercomputers

NERSC (the National Energy Research Scientific Computing Center), located at Lawrence Berkeley National Laboratory, has some 3,000 users working on 400 projects and using some 600 codes. Some of the codes used:
1. CAM (Community Atmosphere Model), the atmospheric component of CCSM (Community Climate System Model), used for weather and climate modeling. The code uses two two-dimensional domain decompositions, one for the dynamics and the other for re-mapping. It is communication-intensive.
2. GAMESS, used for ab initio quantum chemistry calculations. On the Cray XT4 it uses MPI, and only one half of the processors compute while the other half are data movers. The program is memory- and communication-intensive.
3. GTC, used for fusion research; a non-spectral Poisson solver; memory-intensive.
4. IMPACT-T, used for the prediction and performance enhancement of particle accelerators; sensitive to memory bandwidth and MPI collective performance.
5. MAESTRO, a low Mach number hydrodynamics code for simulating astrophysical flows; parallelization via a three-dimensional domain decomposition using a coarse-grained distribution strategy to balance the load and minimize communication costs.
Codes (cont’d)
6. MILC (MImd Lattice Computation), a QCD (Quantum Chromodynamics) code used to study the "strong" interactions binding quarks into protons and neutrons and holding them together in the nucleus. The CG (Conjugate Gradient) method is used to solve a sparse, nearly singular matrix problem. Many CG iteration steps are required for convergence; the inversion translates into three-dimensional complex matrix-vector multiplications. Each multiplication requires a dot product of three pairs of three-dimensional complex vectors; a dot product consists of five multiply-add operations and one multiply. The code is highly memory- and computation-intensive and is heavily dependent on pre-fetching.
7. PARATEC (PARAllel Total Energy Code), a quantum mechanics code; it performs ab initio total energy calculations using pseudo-potentials, a plane wave basis set, and an all-band (unconstrained) conjugate gradient (CG) approach. Parallel three-dimensional FFTs transform the wave functions between real and Fourier space. The FFT dominates the runtime; the code uses MPI and is communication-intensive.
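
To make the MILC inner step concrete, here is a minimal sketch (illustrative only; the values are made up and real MILC code is far more elaborate): a dot product of two 3-component complex vectors, the kind of operation repeated inside each complex matrix-vector multiplication.

/* sketch: dot product of two 3-component complex vectors, the inner
   step of MILC's complex matrix-vector multiplications (toy values) */
#include <complex.h>
#include <stdio.h>

int main(void) {
    double complex row[3] = { 1 + 2*I, 0.5 - I, -1 };
    double complex vec[3] = { 2 - I,   1 + I,   0.5*I };

    double complex dot = 0;
    for (int i = 0; i < 3; i++)
        dot += row[i] * vec[i];   /* complex multiply-accumulate */

    printf("dot = %.2f %+.2fi\n", creal(dot), cimag(dot));
    return 0;
}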
The systems used for comparison with EC2
1. Carver – a 400-node IBM iDataPlex cluster with quad-core Intel Nehalem processors at 2.67 GHz and with 24 GB of RAM (3 GB/core). Each node has two sockets; a single Quad Data Rate (QDR) InfiniBand link connects each node to a network that is locally a fat tree with a global two-dimensional mesh. The codes were compiled with the Portland Group suite version 10.0 and Open MPI version 1.4.1.
2. Franklin – a 9,660-node Cray XT4; each node has a single quad-core 2.3 GHz AMD Opteron "Budapest" processor with 8 GB of RAM (2 GB/core). Each processor is connected through a 6.4 GB/s bidirectional HyperTransport interface to the interconnect via a Cray SeaStar-2 ASIC. The SeaStar routing chips are interconnected in a three-dimensional torus topology, where each node has a direct link to its six nearest neighbors. Codes were compiled with the Pathscale or the Portland Group suite version 9.0.4.
3. Lawrencium – a 198-node (1,584-core) Linux cluster; a compute node is a Dell PowerEdge 1950 server with two quad-core 64-bit 2.66 GHz Intel Xeon Harpertown processors and 16 GB of RAM (2 GB/core). A compute node is connected to a Dual Data Rate InfiniBand network configured as a fat tree with a 3:1 blocking factor. Codes were compiled using Intel 10.0.018 and Open MPI 1.3.3.
The benchmarks used for comparison
1. DGEMM – measures the floating-point performance of a processor/core; the memory bandwidth does little to affect the results, as the code is cache-friendly. The results of the benchmark are close to the theoretical peak performance of the processor.
2. STREAM – measures the memory bandwidth (see the sketch after this list).
3. The network latency benchmark.
4. HPL – a software package that solves a (random) dense linear system in double precision arithmetic on distributed-memory computers; it is a portable and freely available implementation of the High Performance Computing Linpack Benchmark.
5. FFTE – measures the floating-point rate of execution of a double precision complex one-dimensional DFT (Discrete Fourier Transform).
6. PTRANS – parallel matrix transpose; it exercises the communications where pairs of processors communicate with each other simultaneously. It is a useful test of the total communication capacity of the network.
7. RandomAccess – measures the rate of integer random updates of memory (GUPS).
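
As a rough illustration of what STREAM exercises, here is a minimal triad-style kernel (a simplified sketch, not the official STREAM benchmark; array sizes and timing are arbitrary):

/* sketch: STREAM-style "triad" kernel, a[i] = b[i] + s*c[i];
   bandwidth estimate = bytes moved / elapsed time; compile with -O2 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)   /* 16M doubles per array, large enough to defeat caches */

int main(void) {
    double *a = malloc(N * sizeof *a), *b = malloc(N * sizeof *b),
           *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];             /* 2 loads + 1 store per element */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("a[0] = %.1f, triad: %.2f GB/s\n", a[0],
           3.0 * N * sizeof(double) / secs / 1e9);
    free(a); free(b); free(c);
    return 0;
}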
The benchmark results
[The benchmark results were presented as a figure on the original slide.]
Why is virtualization important?

- Resource management for a community of users with a wide range of applications running under different operating systems is very difficult.
- Resource management becomes even more complex when resources are oversubscribed and users are uncooperative.
- The obvious solution for an organization providing utility computing is to install standard operating systems on individual systems and rely on conventional OS techniques to ensure resource sharing, application protection, and performance isolation. In this setup, system administration, accounting, and security are very challenging for the service providers, while application development and performance optimization are equally challenging for the users.
- The alternative is resource virtualization.
Resource virtualization

Virtualization simulates the interface to a physical object by:
1. Multiplexing: create multiple virtual objects from one instance of a physical object; example: a processor is multiplexed among a number of threads.
2. Aggregation: create one virtual object from multiple physical objects; example: a number of physical disks are aggregated into a RAID disk.
3. Emulation: construct a virtual object from a different type of physical object; example: a physical disk emulates Random Access Memory (see the sketch after this list).
4. Multiplexing and emulation. Examples: virtual memory with paging multiplexes real memory and disk, and a virtual address emulates a real address; the TCP protocol emulates a reliable bit pipe and multiplexes a physical communication channel and a processor.
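
A minimal sketch of the emulation idea in case 3 (illustrative only; the file name backing.img is arbitrary): mmap lets a disk file be accessed through an ordinary memory interface, so the disk effectively emulates Random Access Memory.

/* sketch: a disk file accessed through a memory interface via mmap;
   loads and stores are paged to and from the file by the OS */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    const size_t size = 1 << 20;   /* 1 MiB backing store on disk */
    int fd = open("backing.img", O_RDWR | O_CREAT, 0600);
    if (fd < 0 || ftruncate(fd, size) < 0) { perror("setup"); return 1; }

    char *mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) { perror("mmap"); return 1; }

    strcpy(mem, "written through a memory interface, stored on disk");
    printf("%s\n", mem);

    munmap(mem, size);
    close(fd);
    return 0;
}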

Virtualization:
- abstracts the underlying resources and simplifies their use,
- isolates users from one another, and
- supports replication which, in turn, increases the elasticity of the system.

Virtualization is a critical aspect of cloud computing, equally important for the providers and the consumers of cloud services.
Why is virtualization important for cloud computing?

- System security – allows isolation of services running on the same hardware.
- Performance and reliability – allows applications to migrate from one platform to another.
- Lower costs – simplifies the development and management of services offered by a provider.
- Performance isolation – allows isolation of applications and predictable performance.
- User convenience – a major advantage of a Virtual Machine architecture over a traditional operating system. For example, a user of Amazon Web Services (AWS) could submit an Amazon Machine Image (AMI) containing the applications, libraries, data, and associated configuration settings; the user could choose the operating system for the application, then start, terminate, and monitor as many instances of the AMI as needed, using the Web service APIs and the performance monitoring and management tools provided by AWS.
Side effects of virtualization

- Performance penalty – all privileged operations of a virtual machine must be trapped and validated by the Virtual Machine Monitor which, ultimately, controls the system behavior; the increased overhead has a negative impact on performance.
- Increased hardware costs – the cost of the hardware for a virtual machine is higher than the cost for a system running a traditional operating system because the physical hardware is shared among a set of guest operating systems and is typically configured with faster and/or multi-core processors, more memory, larger disks, and additional network interfaces compared with a system running a traditional operating system.
- Architectural support for multi-level control – resource sharing in a virtual machine environment requires not only ample hardware support and, in particular, powerful processors, but also architectural support for multi-level control. Indeed, resources such as CPU cycles, memory, secondary storage, and I/O and communication bandwidth are shared among several virtual machines; within each virtual machine, resources must be shared among multiple instances of an application.


Layering and interfaces between layers in a computer system

The software components include applications, libraries, and the operating system; they interact with the hardware via several interfaces:
- Application Programming Interface (API) – defines the set of instructions the hardware was designed to execute and gives the application access to the ISA; it includes HLL library calls, which often invoke system calls.
- Application Binary Interface (ABI) – allows the ensemble consisting of the application and the library modules to access the hardware; the ABI does not include privileged system instructions, instead it invokes system calls.
- Instruction Set Architecture (ISA).

The hardware consists of one or more multi-core processors, a system interconnect (e.g., one or more busses), a memory translation unit, the main memory, and I/O devices, including one or more networking interfaces.
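
A small illustration of the distinction (a sketch; assumes a POSIX system): the same bytes can reach standard output through an HLL library call that is part of the C API, or through the thin system-call wrapper that sits just above the ABI.

/* sketch: a C library call (printf, part of the API) versus a direct
   system-call wrapper (write), the path the ABI ultimately exposes */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("via the C library API\n");                 /* buffered library call;
                                                          eventually invokes write() */
    write(STDOUT_FILENO, "via a system call\n", 18);   /* direct kernel entry */
    return 0;
}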
Portability


The binaries created by a compiler for a specific ISA and operating system are not portable; they cannot run on a computer with a different ISA, or on a computer with the same ISA but a different operating system.

It is possible, though, to compile an HLL program for a virtual machine (VM) environment; portable code is produced and distributed, and then converted by binary translators to the ISA of the host system. A dynamic binary translator converts blocks of guest instructions from the portable code to the host instruction set and leads to a significant performance improvement, as such blocks are cached and reused.
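
A toy sketch of the caching idea (purely illustrative; the opcodes and accumulator machine are made up): each "guest" instruction is decoded once into a host routine, the decoded block is cached, and later executions reuse the cache instead of re-translating.

/* toy dynamic-translation cache: portable guest opcodes are decoded
   once into host function pointers and reused on every later run */
#include <stdio.h>

typedef void (*host_op)(long *acc);
static void op_inc(long *acc) { (*acc)++; }       /* host code for INC */
static void op_dbl(long *acc) { (*acc) *= 2; }    /* host code for DBL */

enum { INC, DBL, NOPS };

int main(void) {
    int guest_code[4] = { INC, DBL, DBL, INC };   /* the portable code */
    host_op table[NOPS] = { op_inc, op_dbl };
    host_op cache[4];                             /* the translated block */

    /* "translate" the block once */
    for (int i = 0; i < 4; i++) cache[i] = table[guest_code[i]];

    /* reuse the cached translation on every subsequent execution */
    long acc = 0;
    for (int run = 0; run < 3; run++)
        for (int i = 0; i < 4; i++) cache[i](&acc);

    printf("acc = %ld\n", acc);   /* prints 105 */
    return 0;
}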

Two strategies to create memory images from High Level Language (HLL) code:
1. HLL code can be translated for a specific architecture and operating system.
2. HLL code can also be compiled into portable code, and then the portable code translated for systems with different ISAs.
The code shared is the object code in the first case and the portable code in the second case.
Virtual machines

- A Virtual Machine (VM) – an isolated environment that appears to be a whole computer but actually only has access to a portion of the computer's resources.
- Each virtual machine appears to be running on the bare hardware, giving the appearance of multiple instances of the same computer, though all are supported by a single physical system.
- Modern operating systems such as Linux VServer, OpenVZ (Open VirtualiZation), FreeBSD Jails, and Solaris Zones, based on Linux, Unix, FreeBSD, and Solaris, respectively, implement operating system-level virtualization technologies. They allow a physical server to run multiple isolated operating system instances, known as containers, Virtual Private Servers (VPSs), or Virtual Environments (VEs).
- These systems claim performance advantages over systems based on a Virtual Machine Monitor such as Xen or VMware; it is reported that there is only a 1% - 3% performance penalty for OpenVZ as compared to a standalone Linux server. OpenVZ is licensed under GPL version 2.
The performance of applications under virtual machines

- Generally, virtualization adds some level of overhead and negatively affects performance.
- In some cases an application running under a virtual machine performs better than one running under a classical OS; this is the case with a policy called cache isolation. The cache is generally not partitioned equally among processes running under a classical OS, as one process may use the cache space better than another. For example, in the case of two processes, one write-intensive and the other read-intensive, the cache may be aggressively filled by the first. Under the cache isolation policy the cache is divided between the VMs, and it is beneficial to run workloads competing for cache in two different VMs.
- The I/O performance of an application running under a virtual machine depends on factors such as the disk partition used by the VM, the CPU utilization, the I/O performance of the competing VMs, and the I/O block size. On the Xen platform, discrepancies between the optimal choice and the default are as high as 8% to 35%.
Process, system, and application VMs

- Process VM – a virtual platform created for an individual process and destroyed once the process terminates. Virtually all operating systems provide a process VM for each one of the applications running, but the more interesting process VMs are those which support binaries compiled on a different instruction set.
- System VM – provides a complete system; each VM can run its own OS, which in turn can run multiple applications.
- Application virtual machine – a VM that runs under the control of a normal OS and provides a platform-independent host for a single application, e.g., the Java Virtual Machine (JVM).
Virtual machine monitors (VMMs)

A Virtual Machine Monitor is control software allowing several virtual machines to share a system. Several approaches are possible:

- Traditional VM – a thin software layer that runs directly on the host machine hardware; its main advantage is performance. Examples: VMware ESX and ESXi Servers, Xen, OS370, and Denali.
- Hybrid – the VMM shares the hardware with an existing OS. Example: VMware Workstation.
- Hosted – the VM runs on top of an existing OS. Example: User-mode Linux.
  - The main advantage is that it is easier to build and install; other advantages: the VMM can use several components of the host OS, such as the scheduler, the pager, and the I/O drivers, rather than providing its own.
  - The price to pay for this simplicity is increased overhead and the associated performance penalty; indeed, the I/O operations, page faults, and scheduling requests from a guest OS are not handled directly by the VMM, instead they are passed to the host OS. The performance penalty, as well as the challenges of supporting complete isolation of VMs, make this solution less attractive for servers in a cloud computing environment.




(a) A taxonomy of process and system VMs for the same and for different Instruction Set Architectures (ISAs). Traditional, Hybrid, and Hosted are three classes of VMs for systems with the same ISA.
(b) Traditional VM: the VMM supports multiple virtual machines and runs directly on the hardware.
(c) Hybrid VM: the VMM shares the hardware with a host operating system and supports multiple virtual machines.
(d) Hosted VM: the VMM runs under a host operating system.
The interaction of VMM with the guest OS

VMM (hypervisor) – software that securely partitions the resources of a computer system into one or more VMs; it runs in kernel mode and allows:
- multiple services to share the same platform;
- live migration, the movement of a server from one platform to another;
- system modification while maintaining backward compatibility with the original system.

Guest operating system – an OS that runs under the control of a VMM rather than directly on the hardware; it runs in user mode.

The functions of a VMM:
- monitors the system performance and takes corrective actions to avoid performance degradation;
- when a guest OS attempts to execute a privileged instruction, it traps the operation and enforces the correctness and safety of the operation (see the sketch after this list);
- guarantees the isolation of the individual virtual machines and thus ensures security and encapsulation, a major concern in cloud computing.
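
A user-mode sketch of the trap idea (assumes Linux on x86-64; purely illustrative, a real VMM does this in kernel mode with hardware support): executing the privileged hlt instruction in user mode raises a fault, which a handler can intercept and "emulate" by skipping the one-byte instruction.

/* sketch: trap a privileged instruction executed in user mode and
   "emulate" it by skipping it; Linux/x86-64 only, illustrative */
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <ucontext.h>
#include <unistd.h>

static void handler(int sig, siginfo_t *info, void *ctx) {
    static const char msg[] = "trapped privileged instruction, emulating as a no-op\n";
    ucontext_t *uc = ctx;
    (void)sig; (void)info;
    write(STDOUT_FILENO, msg, sizeof msg - 1);
    uc->uc_mcontext.gregs[REG_RIP] += 1;   /* hlt is one byte (0xF4) */
}

int main(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);   /* user-mode hlt raises #GP -> SIGSEGV */

    asm volatile("hlt");             /* privileged instruction, faults in ring 3 */
    printf("resumed after the trap\n");
    return 0;
}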
A VMM virtualizes the CPU and the memory

- The VMM traps interrupts and dispatches them to the individual guest operating systems; if a guest OS disables interrupts, the VMM buffers such interrupts until the guest OS enables them.
- The VMM maintains a shadow page table for each guest OS and replicates any modification made by the guest OS in its own shadow page table; this shadow page table points to the actual page frame and is used by the hardware component called the Memory Management Unit (MMU) for dynamic address translation.
- VMMs control the virtual memory management and decide what pages to swap out.
- Memory virtualization has important implications on performance. VMMs use a range of optimization techniques; for example, VMware systems avoid page duplication among virtual machines, maintaining only one copy of a shared page and using copy-on-write policies, while Xen imposes total isolation of the VMs and does not allow page sharing.
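
A highly simplified sketch of the shadow-page-table idea (all structures and names here are hypothetical simplifications): when the guest writes a page-table entry, the VMM intercepts the write and mirrors it, after translation, into the shadow table the MMU actually walks.

/* toy shadow page table: guest PTE writes are trapped and mirrored,
   after guest-frame -> machine-frame translation, into the shadow
   table used by the MMU; hypothetical, heavily simplified */
#include <stdint.h>
#include <stdio.h>

#define NPAGES 16

static uint64_t guest_pt[NPAGES];    /* the table the guest OS thinks it owns */
static uint64_t shadow_pt[NPAGES];   /* the table the hardware would walk */

/* the VMM's private map from guest frame numbers to machine frames */
static uint64_t machine_frame(uint64_t guest_frame) {
    return guest_frame + 100;        /* stand-in for the real allocation map */
}

/* invoked when the VMM traps a guest write to its page table */
static void vmm_trap_pte_write(unsigned vpage, uint64_t guest_frame) {
    guest_pt[vpage]  = guest_frame;
    shadow_pt[vpage] = machine_frame(guest_frame);  /* actual page frame */
}

int main(void) {
    vmm_trap_pte_write(3, 7);        /* guest maps virtual page 3 to frame 7 */
    printf("guest sees frame %llu; the MMU uses machine frame %llu\n",
           (unsigned long long)guest_pt[3],
           (unsigned long long)shadow_pt[3]);
    return 0;
}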
[Figure: dynamic address translation and page-fault handling. Thread 1 issues an instruction whose page is not in primary storage; the page-fault handler saves the PC, identifies the missing page, and issues an AWAIT on behalf of thread 1, which becomes WAITING while the scheduler runs thread 2. The multi-level memory manager finds a block in primary storage, writes it to secondary storage if its "dirty" bit is on, and fetches the block corresponding to the missing page; when the I/O operation completes, ADVANCE makes thread 1 RUNNING again.]
Performance and security isolation

- Performance isolation is a critical aspect for the Quality of Service guarantees in shared environments; if the run-time behavior of an application is affected by other applications running concurrently, competing for CPU cycles, cache, main memory, disk, and network access, it is rather difficult to predict the completion time. It is equally difficult to optimize the application.
- Operating systems use the process abstraction not only for resource sharing but also to support isolation; unfortunately, once a process is compromised it is rather easy for an attacker to penetrate the entire system.
- The software running on a virtual machine can only access virtual devices emulated by the software; this has the potential to provide a level of security isolation nearly equivalent to the isolation presented by two different physical systems. Virtualization can thus be used to improve security.
- The security vulnerability of VMMs is considerably reduced; a VMM is a much simpler and better-specified system than a traditional operating system. For example, the Xen VMM has approximately 60,000 lines of code, while the Denali VMM has only about half of that, 30,000.
Architectural support for virtualization

In 1974 Gerald J. Popek and Robert P. Goldberg gave a set of sufficient conditions for a computer architecture to support virtualization and allow a Virtual Machine Monitor to operate efficiently:
- A program running under the VMM should exhibit a behavior essentially identical to that demonstrated when running directly on an equivalent machine.
- The VMM should be in complete control of the virtualized resources.
- A statistically significant fraction of machine instructions must be executed without the intervention of the VMM.

Another criterion allows us to identify an architecture suitable for a virtual machine and to distinguish two classes of machine instructions:
- sensitive instructions, of two types:
  - control sensitive – instructions that attempt to change the memory allocation or the privileged mode, and
  - mode sensitive – instructions whose behavior is different in the privileged mode;
- innocuous instructions – those that are not sensitive.
Full virtualization versus paravirtualization

- Full virtualization – each virtual machine runs on an exact copy of the actual hardware; example: the VMware VMMs. This execution mode is efficient, as the hardware is fully exposed to the guest OS, which runs unchanged.
- Paravirtualization – each virtual machine runs on a slightly modified copy of the actual hardware; examples: Xen and Denali. The guest OS must be modified to run under the VMM, and the guest OS code must be ported for individual hardware platforms. The reasons why paravirtualization is often adopted (see the sketch after this list) are:
  - some aspects of the hardware cannot be virtualized;
  - to improve performance;
  - to present a simpler interface.
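
A toy sketch of the paravirtual idea (all names here are hypothetical, not a real hypercall interface; in Xen this role is played by hypercalls): the guest kernel is modified to request a privileged operation through an explicit VMM entry point instead of executing the privileged instruction and being trapped.

/* toy contrast: a paravirtualized guest calls the VMM explicitly
   instead of executing a privileged instruction that must be trapped;
   function names are hypothetical, not a real hypercall interface */
#include <stdio.h>

/* VMM-provided entry point; validates the request before applying it */
static void vmm_set_page_table_root(unsigned long new_root) {
    printf("VMM: validated and installed page-table root %#lx\n", new_root);
}

/* the paravirtualized guest's (modified) address-space switch */
static void guest_switch_address_space(unsigned long new_root) {
    vmm_set_page_table_root(new_root);   /* explicit call, no trap needed */
}

int main(void) {
    guest_switch_address_space(0x1000);
    return 0;
}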


(a) Full virtualization requires the hardware abstraction layer of the guest OS to have some knowledge about the hardware.
(b) Paravirtualization avoids this requirement and allows full compatibility at the Application Binary Interface (ABI).
Xen - a VMM based on paravirtualization

- Xen is a Virtual Machine Monitor (VMM) developed at the University of Cambridge, UK, in the early 2000s.
- The goal was to design a VMM capable of scaling to about 100 virtual machines running standard applications and services without any modifications to the Application Binary Interface (ABI).
- Fully aware that the x86 architecture does not efficiently support full virtualization, the designers of Xen opted for paravirtualization.
- Several operating systems, including Linux, Minix, NetBSD, FreeBSD, NetWare, and OZONE, can operate as paravirtualized Xen guest operating systems running on IA-32, x86-64, Itanium, and ARM architectures.
- Since 2010 Xen has been free software, developed by the community of users and licensed under the GNU General Public License (GPLv2).

Xen for the x86 architecture; in the original Xen implementation a guest OS could be XenoLinux, XenoBSD, or XenoXP.
Paravirtualization strategies for virtual memory management, CPU multiplexing,
and I/O devices for the original x86 Xen implementation