Introduction to Cloud Computing

Cloud Computing
Part #1
1
http://www.digitaltrends.com/wp-content/uploads/2011/09/Cloud-Computing.jpg
2
Computing history (1)

Abacus
 2700–2300 BC
http://upload.wikimedia.org/wikipedia/commons/e/ea/Boulier1.JPG
http://retrocalculators.com/abacus_files/Wooden_Abacus_Russian_Wood_Schoty.jpg
3
Computing history (2)

Babbage computer
 1834 - Charles Babbage
http://members.peak.org/~jeremy/superlative/pix/babbageMachine.jpg
4
Computing history (3)

Z1 computer
 Konrad Zuse, 1936
 22-bit floating point
 Z2, Z3, … Z5
 Plankalkül (ALGOL)
http://www.yorku.ca/lbianchi/sts3700b/z1-vb2.jpg
5
Computing history (4)

Bell 1
 1940
 9,000 relays, 90 m², 10 t

Mark 1
 1944
 Equations

ENIAC
 1946
 18,000 vacuum tubes, 90 × 15 m², 30 t, 150 kW
 100 kHz, + in 0.2 ms, * in 2.8 ms
http://mathsci.ucd.ie/~plynch/eniac/ENIAC.jpg
6
Computing history (5)

Philco-2000
 1955
 56,000 transistors, 1,200 diodes (450 vacuum tubes)
 + in 1.7 microseconds, * in 40.3 microseconds

CDC 6600
 1964
 169,000 transistors
 10 MHz
http://upload.wikimedia.org/wikipedia/commons/thumb/c/c4/CDC_6600.jc.jpg/800px-CDC_6600.jc.jpg
7
Computing history (6)

System-360
 1964, first integrated circuits
 DOS, OS/360

Intel 8008
 1972
 8 bit
Intel 8088
 PC XT -> PC AT (80286)

http://www.wired.com/images/article/full/2008/04/ibm_360_500px.jpg
8
Performance progress (1)




Year   Peak performance   Relative to 1995
2010   2.57 petaflops     15,100 times faster
2005   280.6 teraflops    1,650 times faster
2000   4.94 teraflops     29 times faster
1995   170 gigaflops      The baseline
http://royal.pingdom.com/2010/12/02/incredible-growth-supercomputing-performance-1995-2010/
9
Performance progress (2)

In 2010, we measure the performance of
the fastest supercomputers in petaflops
(quadrillions of floating-point operations
per second). In 1995, we used gigaflops
(billions of operations per second). We are
now using a scale a million times larger
than we did 15 years ago.
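The "times faster" figures can be checked by dividing each peak value by the 1995 baseline. The sketch below only redoes that arithmetic with the numbers quoted on the previous slide; the names `peak_flops` and `speedups` are illustrative and not from the source.

```python
# Peak performance of the fastest supercomputer per year, as quoted on the slide.
GIGA, TERA, PETA = 10**9, 10**12, 10**15

peak_flops = {
    1995: 170 * GIGA,
    2000: 4.94 * TERA,
    2005: 280.6 * TERA,
    2010: 2.57 * PETA,
}


def speedups(peaks, baseline_year=1995):
    """Return each year's peak divided by the baseline year's peak."""
    base = peaks[baseline_year]
    return {year: flops / base for year, flops in peaks.items()}


ratios = speedups(peak_flops)
for year in sorted(ratios):
    print(f"{year}: {ratios[year]:,.0f} times the 1995 baseline")

# The unit itself grew from giga (10^9) to peta (10^15): a factor of one million.
unit_scale = PETA // GIGA
```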
10
Tasks and computers

Need for performance
 Amount of the data
 Resolution / quality / complexity

Growing demand
 More online users
 More applications running
11
Scaling thing (1)

Personal computer
 Simple, personal computing tasks
http://a57.foxnews.com/global.fncstatic.com/static/managed/img/Health/2009/July/660/371/COMPUTER-GIRL_640.jpg?ve=1
12
Scaling thing (2)

Network
 Common tasks, resources
http://www.lucartech.com/images/Services_network.jpg
13
Scaling thing (3)

Cluster
 Processing power, large IO
http://www.biomedcentral.com/content/figures/1471-2105-11-217-1-l.jpg
http://upload.wikimedia.org/wikipedia/commons/thumb/c/c5/MEGWARE.CLIC.jpg/300px-MEGWARE.CLIC.jpg
14
Scaling thing (4)

Cloud
 The topic we will speak about…
http://www.bluesci.org/wordpress/wp-content/uploads/2011/09/Sevensheaven_illustration-Cloud_Computing.jpg
15
Cloud computing (1)
http://en.wikipedia.org/wiki/File:Cloud_computing.svg
16
Cloud computing (2)
Grid computing
SOA
Client-server
 distributed application that distinguishes between service providers (servers) and service requesters (clients)
Peer-to-peer
 distributed architecture without the need for central coordination
17
5 essential characteristics
On-demand self-service
 Broad network access
 Resource pooling
 Rapid elasticity
 Measured service

18
Service models
Infrastructure (IaaS)
 Platform (PaaS)
 Software (SaaS)
 Network (NaaS)
 Database (DBaaS)

http://upload.wikimedia.org/wikipedia/commons/3/3c/Cloud_computing_layers.png
19
Deployment models
Public cloud
 Community cloud
 Hybrid cloud
 Private cloud

http://upload.wikimedia.org/wikipedia/commons/8/87/Cloud_computing_types.svg
20
Comparison for SaaS
Criteria         Public cloud                        Private cloud
Initial cost     Typically zero                      Typically high
Running cost     Predictable                         Unpredictable
Customization    Impossible                          Possible
Privacy          No (host has access to the data)    Yes
Single sign-on   Impossible                          Possible
Scaling up       Easy while within defined limits    Laborious but no limits
21
Virtualization (1)

VM technology allows multiple virtual
machines to run on a single physical
machine
22
Virtualization (2)
Advantages of virtual machines:
 Run operating systems where the physical hardware is unavailable;
 Easier to create new machines, back up machines, etc.;
 Software testing using “clean” installs of operating systems and software;
 Emulate more machines than are physically available;
 Timeshare lightly loaded systems on one host;
 Debug problems (suspend and resume the problem machine);
 Easy migration of virtual machines (shutdown needed or not);
 Run legacy systems!
23
Advantages of Cloud
Computing (1)
Lower computer costs
 Improved performance
 Reduced software costs
 Instant software updates
 Improved document format compatibility

24
Advantages of Cloud
Computing (2)
Unlimited storage capacity
 Increased data reliability
 Universal document access
 Latest version availability
 Easier group collaboration


Device independence
25
Disadvantages of Cloud
Computing (1)
 Requires a constant Internet connection
 Does not work well with low-speed connections
 Features might be limited
26
Disadvantages of Cloud
Computing (2)
Can be slow
 Stored data might not be secure
 Stored data can be lost
 Compatibility problems between clouds, databases, etc.

27
http://www.treloarphysio.com/blog/wp-content/uploads/2012/02/relax-relaxing-8925208-1024-768.jpg
28
What is Cloud Computing?
1. Web-scale problems
2. Large data centers
3. Different models of computing
4. Highly-interactive Web applications
29
1. Web-Scale Problems

Characteristics:
 Definitely data-intensive
 May also be processing intensive

Examples:
 Crawling, indexing, searching, mining the Web
 “Post-genomics” life sciences research
 Other scientific data (physics, astronomy, etc.)
 Sensor networks
 Web 2.0 applications
 …
30
How much data?
Wayback Machine has 2 PB + 20 TB/month (2006)
 Google processes 20 PB a day (2008)
 “all words ever spoken by human beings” ~ 5 EB
 NOAA has ~1 PB climate data (2007)
 CERN’s LHC will generate 15 PB a year (2008)

640K ought to be
enough for anybody.
31
32
Maximilien Brice, © CERN
33
Maximilien Brice, © CERN
What to do with more data?

Answering factoid questions
 Pattern matching on the Web
 Works amazingly well
 Example: Who shot Abraham Lincoln? -> X shot Abraham Lincoln
 (Brill et al., TREC 2001; Lin, ACM TOIS 2007)

Learning relations
 Start with seed instances
 Search for patterns on the Web
 Using patterns to find more instances
 Seed instances: Birthday-of(Mozart, 1756), Birthday-of(Einstein, 1879)
 Text found: "Wolfgang Amadeus Mozart (1756 - 1791)", "Einstein was born in 1879"
 Patterns learned: "PERSON (DATE –", "PERSON was born in DATE"
 (Agichtein and Gravano, DL 2000; Ravichandran and Hovy, ACL 2002; … )
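The seed, pattern, new-instance loop can be sketched in a few lines. This is a toy illustration, not the method of the cited papers: the "pattern" is simply the text found between a known person and a known year, and the two extra sentences (Beethoven, Darwin) are added here as sample input. The function names are hypothetical.

```python
import re


def induce_patterns(seeds, sentences):
    """For each (person, year) seed, keep the text that links them in a sentence."""
    links = set()
    for person, year in seeds:
        for sentence in sentences:
            p = sentence.find(person)
            y = sentence.find(str(year))
            if p != -1 and y > p:
                links.add(sentence[p + len(person):y])
    return links


def apply_patterns(links, sentences):
    """Use each linking text as 'PERSON<link>DATE' to find new instances."""
    found = set()
    for link in links:
        regex = re.compile(r"(\w+)" + re.escape(link) + r"(\d{4})")
        for sentence in sentences:
            for person, year in regex.findall(sentence):
                found.add((person, int(year)))
    return found


seeds = {("Mozart", 1756), ("Einstein", 1879)}
seed_text = ["Wolfgang Amadeus Mozart (1756 - 1791)", "Einstein was born in 1879"]
new_text = ["Ludwig van Beethoven (1770 - 1827)", "Darwin was born in 1809"]

links = induce_patterns(seeds, seed_text)      # " (" and " was born in "
instances = apply_patterns(links, new_text)    # new Birthday-of candidates
print(sorted(instances))
```

With so little context, a pattern like " (" would over-match on real text; the approach relies on Web-scale redundancy to filter out noise.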
34
2. Large Data Centers
Web-scale problems? Throw more
machines at it!
 Clear trend: centralization of computing
resources in large data centers

 Necessary ingredients: fiber, juice, and space

Important Issues:
 Redundancy
 Efficiency
 Utilization
 Management
35
36
Source: Harper’s (Feb, 2008)
37
Maximilien Brice, © CERN
Key Technology: Virtualization
Traditional Stack      Virtualized Stack
App  App  App          App  App  App
                       OS   OS   OS
Operating System       Hypervisor
Hardware               Hardware
38
3. Different Computing Models
“Why do it yourself if you can pay someone to do it for you?”

Utility computing
 Why buy machines when you can rent cycles?
 Examples: Amazon’s EC2, GoGrid, AppNexus

Platform as a Service (PaaS)
 Give me a nice API and take care of the implementation
 Examples: Google App Engine, Heroku

Software as a Service (SaaS)
 Just run it for me!
 Example: Gmail
39
4. Web Applications

What is the nature of software
applications?
 From the desktop to the browser
 SaaS = Web-based applications
 Examples: Google Maps, Facebook

How do we deliver highly-interactive
Web-based applications?
 AJAX (asynchronous JavaScript and XML)
 For better, or for worse…
40
What is the course about?

MapReduce: the “back-end” of cloud
computing
 Batch-oriented processing of large datasets

Ajax: the “front-end” of cloud computing
 Highly-interactive Web-based applications

Computing “in the clouds”
 Amazon’s EC2/S3 as an example of utility
computing
41
Amazon Web Services

Elastic Compute Cloud (EC2)
 Rent computing resources by the hour
 Basic unit of accounting = instance-hour
 Additional costs for bandwidth

Simple Storage Service (S3)
 Persistent storage
 Charge by the GB/month
 Additional costs for bandwidth
42
Simple Storage Service

Pay for what you use:
 $0.20 per GByte of data transferred
 $0.15 per GByte-month of storage used
 Second Life update:
  ○ 1 TByte, 40,000 downloads in 24 hours: $200
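The Second Life figure can be reproduced from the per-GByte prices. The sketch assumes that the 1 TByte is the total data transferred, that 1 TByte = 1000 GByte, and that storage cost is negligible over 24 hours; `s3_cost` is an illustrative helper, not an AWS API.

```python
TRANSFER_USD_PER_GB = 0.20        # price quoted on the slide
STORAGE_USD_PER_GB_MONTH = 0.15   # price quoted on the slide


def s3_cost(transferred_gb, stored_gb=0.0, months=1.0):
    """Estimated bill in USD: transfer charge plus storage charge."""
    transfer = transferred_gb * TRANSFER_USD_PER_GB
    storage = stored_gb * months * STORAGE_USD_PER_GB_MONTH
    return transfer + storage


# Assumption: the whole 1 TByte is transfer volume (1 TByte = 1000 GByte).
second_life = s3_cost(transferred_gb=1000)
print(f"${second_life:.2f}")
```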
43
Some cloud providers
44
Cloud Computing Zen

Don’t get frustrated (take a deep breath)…
 This is bleeding edge technology
 Those W$*#T@F! moments

Be patient…
 This is the second first time I’ve taught this
course

Be flexible…
 There will be unanticipated issues along the way

Be constructive…
 Tell me how I can make everyone’s experience
better
45
46
Source: Wikipedia
Web-Scale Problems?

Don’t hold your breath:
 Biocomputing
 Nanocomputing
 Quantum computing
…

It all boils down to…
 Divide-and-conquer
 Throwing more hardware at the problem
Simple to understand… a lifetime to master…
47
Divide and Conquer
“Work”
    |  Partition
w1          w2          w3
“worker”    “worker”    “worker”
r1          r2          r3
    |  Combine
“Result”
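A minimal sketch of the partition / worker / combine flow, using a thread pool from the Python standard library. The task (sum of squares) and the function names are chosen only for illustration. In CPython, threads do not speed up CPU-bound work; swapping in `ProcessPoolExecutor` would be the usual alternative.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(work, n_parts):
    """Split the work list into at most n_parts contiguous chunks."""
    size = max(1, -(-len(work) // n_parts))  # ceiling division, at least 1
    return [work[i:i + size] for i in range(0, len(work), size)]


def worker(chunk):
    """Each worker turns its chunk w_i into a partial result r_i."""
    return sum(x * x for x in chunk)


def combine(partials):
    """Merge the partial results into the final result."""
    return sum(partials)


def divide_and_conquer(work, n_workers=3):
    chunks = partition(work, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker, chunks))
    return combine(partials)


result = divide_and_conquer(list(range(10)))
print(result)
```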
48
Different Workers
Different threads in the same core
 Different cores in the same CPU
 Different CPUs in a multi-processor
system
 Different machines in a distributed
system

49
Choices, Choices, Choices
Commodity vs. “exotic” hardware
 Number of machines vs. processor vs.
cores
 Bandwidth of memory vs. disk vs.
network
 Different programming models

50
Flynn’s Taxonomy
                     Single instruction (SI)          Multiple instruction (MI)
Single data (SD)     SISD: single-threaded process    MISD: pipeline architecture
Multiple data (MD)   SIMD: vector processing          MIMD: multi-threaded programming
51
SISD
[Figure: one Processor fed by a single stream of Instructions and a single stream of data items D]
52
SIMD
[Figure: one Processor fed by a single stream of Instructions and parallel data streams D0, D1, D2, D3, D4, …, Dn]
53
MIMD
[Figure: two Processors, each fed by its own stream of Instructions and its own stream of data items D]
54
Memory Typology: Shared
[Figure: four Processors attached to a single shared Memory]
55
Memory Typology: Distributed
[Figure: four Processors, each with its own Memory, connected by a Network]
56
Memory Typology: Hybrid
[Figure: groups of Processors, each group sharing a Memory, with the groups connected by a Network]
57
Parallelization Problems
 How do we assign work units to workers?
 What if we have more work units than workers?
 What if workers need to share partial results?
 How do we aggregate partial results?
 How do we know all the workers have finished?
 What if workers die?
What is the common theme of all of these problems?
58
General Theme?

Parallelization problems arise from:
 Communication between workers
 Access to shared resources (e.g., data)
Thus, we need a synchronization
system!
 This is tricky:

 Finding bugs is hard
 Solving bugs is even harder
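To make the shared-resource problem concrete: several workers incrementing one counter must synchronize, because `counter += 1` is a read-modify-write sequence. A minimal sketch with a lock from Python's `threading` module; the worker count and iteration count are arbitrary.

```python
import threading


def count_with_lock(n_workers=4, increments=10_000):
    """Let several threads increment one shared counter safely."""
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(increments):
            with lock:          # only one worker may update at a time
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                # wait until every worker has finished
    return counter


total = count_with_lock()
print(total)
```

Without the lock, updates from different threads may interleave and some increments can be lost, and the bug may show up only occasionally, which is what makes it hard to find.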
59
Managing Multiple Workers

Difficult because
 (Often) don’t know the order in which workers run
 (Often) don’t know where the workers are running
 (Often) don’t know when workers interrupt each other

Thus, we need:
 Semaphores (lock, unlock)
 Condition variables (wait, notify, broadcast)
 Barriers

Still, lots of problems:
 Deadlock, livelock, race conditions, ...

Moral of the story: be careful!
 Even trickier if the workers are on different machines
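A small sketch of two of these primitives on a single machine, using Python's `threading` module: a condition variable lets an aggregator sleep until all partial results exist, and a barrier keeps every worker from continuing until all have arrived. The structure and names (`run_phases`, `aggregator`) are illustrative only.

```python
import threading

N_WORKERS = 3


def run_phases():
    barrier = threading.Barrier(N_WORKERS)
    cond = threading.Condition()      # also acts as the lock for shared lists
    partials, seen_after_barrier, totals = [], [], []

    def worker(worker_id):
        with cond:
            partials.append(worker_id * 10)
            cond.notify_all()         # "broadcast": wake any waiting thread
        barrier.wait()                # block until all workers reach this point
        with cond:
            seen_after_barrier.append(len(partials))

    def aggregator():
        with cond:
            # wait_for re-checks the predicate after every wake-up
            cond.wait_for(lambda: len(partials) == N_WORKERS)
            totals.append(sum(partials))

    threads = [threading.Thread(target=aggregator)]
    threads += [threading.Thread(target=worker, args=(i,)) for i in range(N_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals, seen_after_barrier


totals, seen = run_phases()
print(totals, seen)
```

After the barrier, every worker sees all three partial results, regardless of the order in which the threads were scheduled.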
60
Patterns for Parallelism
Parallel computing has been around for
decades
 Here are some “design patterns” …

61
Master/Slaves
master
slaves
62
Producer/Consumer Flow
[Figure: flow of producers (P) feeding consumers (C)]
63
Work Queues
P   P   P
    |
shared queue: W W W W W
    |
C   C   C
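A minimal work-queue sketch with Python's thread-safe `queue.Queue`: one producer puts work items W on the shared queue and several consumers take them off. The "work" (doubling a number) and the sentinel-based shutdown are illustrative choices, not part of the original slides.

```python
import queue
import threading


def run_work_queue(items, n_consumers=3):
    """Push items through a shared queue and collect the processed results."""
    shared = queue.Queue()
    results = []
    results_lock = threading.Lock()
    stop = object()                     # sentinel telling a consumer to quit

    def consumer():
        while True:
            item = shared.get()         # blocks until work is available
            if item is stop:
                break
            with results_lock:
                results.append(item * 2)

    consumers = [threading.Thread(target=consumer) for _ in range(n_consumers)]
    for c in consumers:
        c.start()
    for item in items:                  # the producer side
        shared.put(item)
    for _ in consumers:                 # one sentinel per consumer
        shared.put(stop)
    for c in consumers:
        c.join()
    return sorted(results)              # completion order is not deterministic


processed = run_work_queue(range(5))
print(processed)
```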
64
Rubber Meets Road

From patterns to implementation:
 pthreads, OpenMP for multi-threaded programming
 MPI for cluster computing
 …

The reality:
 Lots of one-off solutions, custom code
 Write your own dedicated library, then program with it
 Burden on the programmer to explicitly manage everything

MapReduce to the rescue!
65
Map/Reduce (1)

Document

Query
http://ayende.com/blog/4435/map-reduce-a-visual-explanation
66
Map/Reduce (2)

Query result
http://ayende.com/blog/4435/map-reduce-a-visual-explanation
67
Map/Reduce (3)

Reduce
http://ayende.com/blog/4435/map-reduce-a-visual-explanation
68
Map/Reduce (4)

Reduce…
http://ayende.com/blog/4435/map-reduce-a-visual-explanation
69
Map/Reduce (5)

Reduce…
http://ayende.com/blog/4435/map-reduce-a-visual-explanation
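The map, group and reduce steps can be sketched in a single process. The example below counts words, which is the customary MapReduce illustration rather than the query shown in the pictures; a real framework would run the map and reduce calls on many machines, and the function names here are illustrative.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (key, value) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


counts = map_reduce(["the cloud is big", "the cloud is far"])
print(counts)
```

Because each map call sees one document and each reduce call sees one key, both phases can be spread over workers without the explicit locking shown earlier.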
70
Map/Reduce performance

Sorting 10^10 100-byte records (~1 TB)
http://static.usenix.org/event/osdi04/tech/full_papers/dean/dean.pdf
71
Security in a cloud
Traditional threats to software
 Functional threats of cloud components
 Attacks on a client
 Virtualization threats
 Threat of cloud complexity
 Attacks on hypervisor
 Threats of VM migration
 Attacks on management systems
 Privacy, personal data

72
Traditional threats to
software
Traditional threats stem from
vulnerabilities in network protocols,
operating systems, modular components
and similar weaknesses. These are classic
security threats; to counter them, it is
sufficient to use anti-virus software, firewalls
and the other components discussed later. It is
important that these tools are adapted to
the cloud environment so that they run
effectively under virtualization.
73
Functional threats of cloud
components

This type of attack targets the
multiple layers of the "cloud". The main
principle of security applies: the overall level
of security is the security of the weakest
element.

Cloud element          Means of security
Proxy server           Protection against DoS attacks
Web server             Monitoring the integrity of the web pages
Application server     Shielding of the applications
Data storage layer     Protection against SQL injections
Data storage systems   Access control and backups
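As one concrete case from the table, protection against SQL injection at the data storage layer usually starts with parameterized queries. The sketch uses Python's built-in `sqlite3` module with a made-up `users` table; the table, the sample rows and the function names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a1"), ("bob", "b2")])


def find_unsafe(name):
    """Vulnerable: user input is pasted directly into the SQL text."""
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_safe(name):
    """Safer: the driver passes the input as data via a ? placeholder."""
    return conn.execute("SELECT secret FROM users WHERE name = ?",
                        (name,)).fetchall()


attack = "nobody' OR '1'='1"
leaked = find_unsafe(attack)    # the injected OR clause matches every row
blocked = find_safe(attack)     # treated as a literal (non-existent) name
print(len(leaked), len(blocked))
```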
74
Attacks on a client
These types of attacks are well established in
web environments, and they are just as
relevant in cloud environments, because users
connect to the cloud through a web
browser. They include
Cross-Site Scripting (XSS), DoS attacks,
interception of web sessions, password
theft, man-in-the-middle attacks and
others.
75
Virtualization threats
Since the platform for cloud elements
is usually a virtual environment, an attack
on the virtualization layer threatens the
cloud as a whole. This type of attack is unique to
cloud computing.
76
Threat of cloud complexity
Monitoring events in the "cloud" and
managing them is also a security
issue. How do we ensure that all
resources are accounted for, that no
rogue virtual machines are running third-party
processes, and that nothing interferes with the
mutual configuration of the layers and
elements of the "cloud"?
77
Attacks on hypervisor
A key element of a virtual system
is the hypervisor, which divides
physical computer resources among
virtual machines. Interfering with
the hypervisor, or breaching it, may allow one
virtual machine to access the resources of
another, such as network traffic or stored data. It
can also lead to a virtual machine being
displaced from the server.
78
Threats of VM migration
Note that a virtual machine is itself a file that can
be executed on different nodes of the "cloud". The
virtual machine management system includes
mechanisms for the transfer (migration) of virtual
machines.
This makes it possible to steal a virtual machine
file and run it outside the cloud. It is impossible to
steal a physical server from the data centre remotely, but
virtual machine files can be stolen across the
network without physical access to the servers.
79
Attacks on management
systems
The large number of virtual machines
used in "clouds", especially in
public clouds, requires a management
system that can reliably control the
creation, transfer and utilization of virtual
machines. Interference with the
management system can lead to ghost
virtual machines, blocked
machines, and the replacement of elements
or layers in the cloud with rogue ones.
80
Privacy, personal data
When it comes to data privacy, legislation
raises many problems, such as
the processing of personal data and its
protection.
When choosing cloud computing as a solution for
business systems, it is important to take into
account the confidentiality of the data that will
be stored in the "cloud". Storing secret and top
secret data in "cloud" environments is not
completely safe, which is why government
agencies have still not switched to "clouds".
81
Thank you!