Distributed Systems Architecture

Advanced Operating Systems
Lecture 9: Distributed Systems Architecture
University of Tehran, Dept. of EE and Computer Engineering
By: Dr. Nasser Yazdani
Covered Topics

Distributed Systems Architectures

References

Chapter 2 of the textbook
The Chord paper
The Anatomy of the Grid
Outline

Distributed Systems Architecture
Client-server
Peer-to-peer computing
Cloud computing
Grid computing
Architectural Models

Concerned with
  The placement of the components across a network of computers
  The interrelationships between the components
Common Architectures
  Client-server, Web
  Peer-to-peer
  Cloud
  Grid
Clients and Servers

General interaction between a client and a server (Figure 1-25).
Processing Level

The general organization of an Internet search engine into three different layers (Figure 1-28).
Multitiered Architectures (1)

Alternative client-server organizations (a)-(e) (Figure 1-29).
Multitiered Architectures (2)

An example of a server acting as a client (Figure 1-30).
Client-Server
Creating, for example, a Hotmail-like service: what are the options?
One server?
Several servers?
(Figure: two clients send invocations to two servers and receive results; key: process, computer)
Multiple Servers
(Figure: a service provided by several servers, accessed by multiple clients)
HTTP Basics (Review)

HTTP layered over a bidirectional byte stream
  Almost always TCP
Interaction
  Client sends request to server, followed by response from server to client
  Requests/responses are encoded in text
Stateless
  Server maintains no information about past client requests
How to Mark End of Message? (Review)

Size of message → Content-Length
  Must know size of transfer in advance
Delimiter → MIME-style Content-Type
  Server must "escape" delimiter in content
Close connection
  Only the server can do this
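As a concrete illustration of the Content-Length option, here is a minimal Python sketch (not from the slides; host and path are placeholders) that sends a GET over a raw TCP socket and keeps reading until Content-Length bytes of body have arrived:

# Minimal sketch: read an HTTP response and use Content-Length
# to know where the body ends.
import socket

def fetch(host, path="/", port=80):
    s = socket.create_connection((host, port))
    s.sendall(f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode())

    # Read until the blank line that terminates the headers.
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = s.recv(4096)
        if not chunk:
            break
        data += chunk
    header_blob, _, body = data.partition(b"\r\n\r\n")

    # Parse Content-Length so we know how many body bytes to expect.
    length = None
    for line in header_blob.decode("iso-8859-1").split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())

    # Keep reading until Content-Length bytes arrive (or the server closes).
    while length is not None and len(body) < length:
        chunk = s.recv(4096)
        if not chunk:
            break
        body += chunk
    s.close()
    return header_blob.decode("iso-8859-1"), body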
HTTP Request (review)

Request line
  Method
    GET – return URI
    HEAD – return headers only of GET response
    POST – send data to the server (forms, etc.)
  URL (relative)
    E.g., /index.html
  HTTP version
HTTP Request (cont.) (review)

Request headers
  Authorization – authentication info
  Accept – acceptable document types/encodings
  From – user email
  If-Modified-Since
  Referer – what caused this page to be requested
  User-Agent – client software
Blank line
Body
HTTP Request (review)

(Figure: the general format of an HTTP request)
HTTP Request Example (review)
GET / HTTP/1.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5;
Windows NT 5.0)
Host: www.intel-iris.net
Connection: Keep-Alive
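For illustration, a small Python sketch (my own, not part of the deck) that parses a request like the one above into its request line and a dictionary of headers:

def parse_request(raw: str):
    lines = raw.split("\r\n")
    method, url, version = lines[0].split(" ", 2)   # request line
    headers = {}
    for line in lines[1:]:
        if line == "":          # blank line ends the headers
            break
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, url, version, headers

example = (
    "GET / HTTP/1.1\r\n"
    "Host: www.intel-iris.net\r\n"
    "Connection: Keep-Alive\r\n"
    "\r\n"
)
print(parse_request(example))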
HTTP Response (review)

Status line
  HTTP version
  3-digit response code
    1XX – informational
    2XX – success
      200 OK
    3XX – redirection
      301 Moved Permanently
      303 Moved Temporarily
      304 Not Modified
    4XX – client error
      404 Not Found
    5XX – server error
      505 HTTP Version Not Supported
  Reason phrase
HTTP Response (cont.) (review)

Headers
  Location – for redirection
  Server – server software
  WWW-Authenticate – request for authentication
  Allow – list of methods supported (GET, HEAD, etc.)
  Content-Encoding – e.g., x-gzip
  Content-Length
  Content-Type
  Expires
  Last-Modified
Blank line
Body
HTTP Response Example (review)
HTTP/1.1 200 OK
Date: Tue, 27 Mar 2001 03:49:38 GMT
Server: Apache/1.3.14 (Unix) (Red-Hat/Linux)
mod_ssl/2.7.1 OpenSSL/0.9.5a DAV/1.0.2 PHP/4.0.1pl2
mod_perl/1.24
Last-Modified: Mon, 29 Jan 2001 17:54:18 GMT
ETag: "7a11f-10ed-3a75ae4a"
Accept-Ranges: bytes
Content-Length: 4333
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html
…..
Typical Workload (Web Pages)

Multiple (typically small) objects per page
File sizes
  Heavy-tailed
    Pareto distribution for tail
    Lognormal for body of distribution
  -- For reference/interest only --
Embedded references
  Number of embedded objects follows a Pareto distribution: p(x) = a k^a x^-(a+1)
HTTP 0.9/1.0 (mostly review)

One request/response per TCP connection
  Simple to implement
Disadvantages
  Multiple connection setups → three-way handshake each time
  Several extra round trips added to transfer
  Multiple slow starts
Single Transfer Example

(Timeline, client vs. server:)
0 RTT: Client opens TCP connection (SYN / SYN-ACK)
1 RTT: Client sends HTTP request for HTML; server reads from disk and replies (DAT, ACK, FIN)
2 RTT: Client parses HTML; client opens a new TCP connection for the embedded image
3 RTT: Client sends HTTP request for image; server reads from disk
4 RTT: Image begins to arrive
More Problems

Short transfers are hard on TCP
  Stuck in slow start
  Loss recovery is poor when windows are small
Lots of extra connections
  Increases server state/processing
  Server also forced to keep TIME_WAIT connection state
-- Things to think about --
  Why must the server keep these?
  TIME_WAIT state tends to be an order of magnitude greater than the number of active connections; why?
Persistent Connection Solution (review)

Multiplex multiple transfers onto one TCP connection
How to identify requests/responses?
  Delimiter → server must examine response for delimiter string
  Content-Length and delimiter → must know size of transfer in advance
  Block-based transmission → send in multiple length-delimited blocks
  Store-and-forward → wait for entire response and then use Content-Length
  Solution → use existing methods and close connection otherwise
Persistent Connection Example (review)

(Timeline, client vs. server, over a single persistent connection:)
0 RTT: Client sends HTTP request for HTML; server reads from disk and replies
1 RTT: Client parses HTML and sends HTTP request for image; server reads from disk
2 RTT: Image begins to arrive
Persistent HTTP (review)

Nonpersistent HTTP issues:
  Requires 2 RTTs per object
  OS must work and allocate host resources for each TCP connection
  But browsers often open parallel TCP connections to fetch referenced objects
Persistent HTTP:
  Server leaves connection open after sending response
  Subsequent HTTP messages between same client/server sent over the open connection
Persistent without pipelining:
  Client issues new request only when previous response has been received
  One RTT for each referenced object
Persistent with pipelining:
  Default in HTTP/1.1
  Client sends requests as soon as it encounters a referenced object
  As little as one RTT for all the referenced objects
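A hedged sketch of what a persistent connection looks like from the client side: Python's http.client reuses one TCP connection for two requests (example.com and the paths are placeholders, not from the slides):

import http.client

conn = http.client.HTTPConnection("example.com")   # one TCP connection

conn.request("GET", "/index.html")                  # first request
resp1 = conn.getresponse()
html = resp1.read()                                 # must drain before reusing

conn.request("GET", "/image.gif")                   # second request, same connection
resp2 = conn.getresponse()
image = resp2.read()

conn.close()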
HTTP Caching

Clients often cache documents
  Challenge: update of documents
  If-Modified-Since requests to check
    HTTP 0.9/1.0 used just the date
    HTTP 1.1 has an opaque "entity tag" (could be a file signature, etc.) as well
When/how often should the original be checked for changes?
  Check every time?
  Check each session? Day? Etc.?
  Use Expires header
    If no Expires, often use Last-Modified as an estimate
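To make the If-Modified-Since mechanism concrete, a minimal Python sketch (URL and date are placeholders) that issues a conditional GET and treats a 304 as "keep using the cached copy":

import urllib.request
import urllib.error

req = urllib.request.Request(
    "http://example.com/index.html",
    headers={"If-Modified-Since": "Mon, 29 Jan 2001 17:54:18 GMT"},
)
try:
    with urllib.request.urlopen(req) as resp:
        body = resp.read()          # 200: document changed, replace cached copy
except urllib.error.HTTPError as e:
    if e.code == 304:
        body = None                 # Not Modified: keep using the cached copy
    else:
        raise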
Ways to Cache

Client-directed caching
Web proxies
Server-directed caching
Content Delivery Networks (CDNs)
Caching Example (1)

Assumptions
  Average object size = 100,000 bits
  Avg. request rate from the institution's browsers to origin servers = 15/sec
  Delay from institutional router to any origin server and back to the router = 2 sec
Setup: institutional network with a 10 Mbps LAN, connected to the public Internet (origin servers) via a 1.5 Mbps access link
Consequences
  Utilization on LAN = 15%
  Utilization on access link = 100%
  Total delay = Internet delay + access delay + LAN delay
Caching Example (2)

Possible solution
  Increase bandwidth of access link to, say, 10 Mbps
  Often a costly upgrade
Consequences
  Utilization on LAN = 15%
  Utilization on access link = 15%
  Total delay = Internet delay + access delay + LAN delay = 2 sec + msecs + msecs
Caching Example (3)

Install an institutional cache
  Suppose the hit rate is 0.4
Consequences
  40% of requests are satisfied almost immediately (say, 10 msec)
  60% of requests are satisfied by the origin server
  Utilization of the access link is reduced to 60%, resulting in negligible delays
  Weighted average of delays = 0.6 * 2 sec + 0.4 * 10 msec ≈ 1.2 sec (< 1.3 sec)
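A quick back-of-the-envelope check of the numbers above, using only the slide's assumptions:

# All values are the slide's assumptions, not measurements.
object_bits = 100_000        # average object size
req_rate    = 15             # requests/sec from the institution
access_bps  = 1_500_000      # 1.5 Mbps access link
lan_bps     = 10_000_000     # 10 Mbps LAN
hit_rate    = 0.4

traffic_bps = object_bits * req_rate
print("LAN utilization:        ", traffic_bps / lan_bps)                       # 0.15
print("Access-link utilization:", traffic_bps / access_bps)                    # 1.0 without a cache
print("With cache:             ", (1 - hit_rate) * traffic_bps / access_bps)   # 0.6

# Weighted delay with the cache: misses see ~2 s, hits ~10 ms.
print("Avg delay (s):", (1 - hit_rate) * 2 + hit_rate * 0.010)                 # ~1.2 s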
Problems

First: over 50% of all HTTP objects are uncacheable – why?
  Dynamic data → stock prices, scores, web cams
  CGI scripts → results based on passed parameters
  Obvious fixes
  SSL → encrypted data is not cacheable
    Most web clients don't handle mixed pages well → many generic objects transferred with SSL
  Cookies → results may be based on passed data
  Hit metering → owner wants to measure # of hits for revenue, etc.
Second: how about other clients using the same data?
Web Proxy Caches

User configures browser: Web accesses go via the cache
Browser sends all HTTP requests to the cache (proxy server)
  Object in cache: cache returns the object
  Else the cache requests the object from the origin server, then returns it to the client
(Figure: clients → proxy server → origin servers)
Content Distribution Networks (CDNs)

Problem: so many clients!
Content replication
  CDN company installs hundreds of CDN servers throughout the Internet
    Close to users
  CDN replicates its customers' content in CDN servers. When a provider updates content, the CDN updates its servers.
(Figure: origin server in North America; CDN distribution node; CDN servers in S. America, Europe, and Asia)
Networks & Server Selection

Replicate content on many servers
Challenges
  How to replicate content
  Where to replicate content
  How to find replicated content
  How to choose among known replicas
  How to direct clients towards a replica
  Consistency of data?
Server Selection

Which server?
  Lowest load → to balance load on servers
  Best performance → to improve client performance
    Based on geography? RTT? Throughput? Load?
  Any alive node → to provide fault tolerance
How to direct clients to a particular server?
  As part of routing → anycast, cluster load balancing
  As part of the application → HTTP redirect
  As part of naming → DNS
Application Based

HTTP supports a simple way to indicate that a Web page has moved (30X responses)
Server receives a GET request from a client
  Decides which server is best suited for that particular client and object
  Returns an HTTP redirect to that server
Can make an informed, application-specific decision
May introduce additional overhead → multiple connection setups, name lookups, etc.
OK solution in general, but...
  HTTP redirect has some flaws – especially with current browsers
  Incurs many delays, which operators may really care about
Naming Based

Client does a DNS name lookup for the service
Name server chooses an appropriate server address
  The A-record returned is the "best" one for the client
What information can the name server base the decision on?
  Server load/location → must be collected
  Information in the name lookup request
    Name service client → typically the local name server for the client
How Akamai Works

Clients fetch the HTML document from the primary server
  E.g., fetch index.html from cnn.com
URLs for replicated content are replaced in the HTML
  E.g., <img src="http://cnn.com/af/x.gif"> replaced with <img src="http://a73.g.akamaitech.net/7/23/cnn.com/af/x.gif">
Client is forced to resolve the aXYZ.g.akamaitech.net hostname
How Akamai Works

How is content replicated?
Akamai only replicates static content (*)
The modified name contains the original file name
An Akamai server is asked for the content
  First checks its local cache
  If not in cache, requests the file from the primary server and caches it
How Akamai Works

Root server gives the DNS record for akamai.net
akamai.net name server returns the DNS record for g.akamaitech.net
  Name server chosen to be in the region of the client's name server
  TTL is large
g.akamaitech.net name server chooses a server in the region
  Should try to choose a server that has the file in cache. How to choose?
  Uses the aXYZ name and a hash
  TTL is small → why?
Simple Hashing

Given document XYZ, we need to choose a server to use
Suppose we use modulo n
  Number servers from 1…n
  Place document XYZ on server (XYZ mod n)
What happens when a server fails? n → n-1
  Same if different people have different measures of n
  Why might this be bad?
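A tiny demonstration (my own, not from the slides) of why this is bad: when the server count changes from 10 to 9, almost every document maps to a different server.

n_before, n_after = 10, 9
docs = list(range(10_000))                 # pretend document IDs
moved = sum(1 for d in docs if d % n_before != d % n_after)
print(moved / len(docs))                   # ~0.9: about 90% of objects are remapped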
Consistent Hash

"View" = subset of all hash buckets that are visible
Desired features
  Balance – in any one view, load is equal across buckets
  Smoothness – little impact on hash bucket contents when buckets are added/removed
  Spread – small set of hash buckets that may hold an object, regardless of views
  Load – across all views, the # of objects assigned to a hash bucket is small
Consistent Hash – Example

Construction
  Assign each of C hash buckets to random points on a mod 2^n circle, where hash key size = n
  Map each object to a random position on the circle
  Hash of object = closest clockwise bucket
(Figure: circle with bucket points at 0, 4, 8, 12, 14)
Smoothness → addition of a bucket does not cause movement between existing buckets
Spread & Load → small set of buckets that lie near the object
Balance → no bucket is responsible for a large number of objects
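A minimal consistent-hashing sketch in Python along the lines of the construction above; the bucket and object names are invented for illustration:

import hashlib
from bisect import bisect_right

M = 2 ** 32                                   # size of the circle

def h(name: str) -> int:
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % M

class ConsistentHash:
    def __init__(self, buckets):
        # Each bucket is placed at a (pseudo-)random point on the circle.
        self.ring = sorted((h(b), b) for b in buckets)

    def lookup(self, obj: str) -> str:
        # Hash of object = closest clockwise bucket (wrap around at the end).
        points = [p for p, _ in self.ring]
        i = bisect_right(points, h(obj)) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHash(["serverA", "serverB", "serverC"])
print(ring.lookup("better_off_dead.mov"))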
How Akamai Works

(Figure, steps 1-12: the end-user fetches index.html from cnn.com (the content provider); to resolve the Akamaized object name it contacts the DNS root server, the Akamai high-level DNS server, and the Akamai low-level DNS server in turn; it then issues "GET /cnn.com/foo.jpg" to a nearby matching Akamai server)
Akamai – Subsequent Requests

(Figure, steps 1-10: the end-user again fetches index.html from cnn.com, but now the higher-level DNS results are cached, so only the Akamai low-level DNS server is consulted before issuing "GET /cnn.com/foo.jpg" to the nearby matching Akamai server)
Impact on DNS Usage

DNS is used for server selection more and more
  What are reasonable DNS TTLs for this type of use?
  Typically want to adapt to load changes
  Low TTL for A-records → what about NS records?
  How does this affect caching?
  What do the first and subsequent lookups do?
HTTP (Summary)

Simple text-based file exchange protocol
  Support for status/error responses, authentication, client-side state maintenance, cache maintenance
Workloads
  Typical document structure, popularity
  Server workload
Interactions with TCP
  Connection setup, reliability, state maintenance
  Persistent connections
How to improve performance
  Persistent connections
  Caching
  Replication
Why Study Peer-to-Peer?

To understand how they work
To build your own peer-to-peer system
To understand the techniques and principles within them
To modify, adapt, and reuse these techniques and principles in other related areas
  Cloud computing
  Sensor networks
Why Peer-to-Peer?

To share and exchange resources people have: books, class notes, experiences, videos, music CDs
Why not the Web? It is heavyweight for sharing specific resources.
General framework
  Somebody wants to share his/her movie, book, or music
  Others want to watch that great movie, etc.
  They can download and watch, but this needs:
    1. Search: "better off dead" -> better_off_dead.mov, or -> 0x539fba83ajdeadbeef
    2. Locate sources of better_off_dead.mov
    3. Download the file from them
Searching

Need search.
(Figure: a publisher at node N4 stores key="title", value=MP3 data; a client at N6 issues Lookup("title"); nodes N1-N6 are connected via the Internet)
Search Approaches

Centralized
Flooding
A hybrid: flooding between "supernodes"
Structured
Primitives & Structure

Common primitives:
  Join: how do I begin participating?
  Publish: how do I advertise my file?
  Search: how do I find a file?
  Fetch: how do I retrieve a file?
Centralized database:
  Join: on startup, client contacts the central server
  Publish: reports its list of files to the central server
  Search: query the server => returns the node(s) that store the requested file
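A toy sketch of this centralized (Napster-style) approach; the node address is illustrative, matching the example on the next slide:

from collections import defaultdict

class CentralIndex:
    def __init__(self):
        self.index = defaultdict(set)        # filename -> set of node addresses

    def publish(self, node, files):          # "I have X, Y, and Z!"
        for f in files:
            self.index[f].add(node)

    def search(self, filename):              # query the server => node(s) with the file
        return self.index.get(filename, set())

server = CentralIndex()
server.publish("123.2.21.23", ["X", "Y", "Z"])
print(server.search("X"))                    # {'123.2.21.23'}; the fetch then happens peer-to-peer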
Napster Example: Publish

(Figure: a peer at 123.2.21.23 publishes "I have X, Y, and Z!" to the central server, which records insert(X, 123.2.21.23), ...)
Napster: Search

(Figure: a client asks the central server "Where is file A?"; the server replies search(A) --> 123.2.0.18; the client then fetches the file directly from 123.2.0.18)
Napster: Discussion

Pros:
  Simple
  Search scope is O(1), even for complex searches (one index, etc.)
  Controllable (pro or con?)
Cons:
  Server maintains O(N) state
  Server does all processing
  Single point of failure
    Technical failures + legal (Napster shut down in 2001)
Query Flooding

Join: must join a flooding network
  Usually, establish peering with a few existing nodes
Publish: no need, just reply
Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to the sender
  TTL limits propagation
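A small Python sketch of TTL-limited flooding over an in-memory overlay; the topology and file placement are made up for illustration:

def flood_search(graph, files, start, wanted, ttl=3, seen=None):
    """Return the set of nodes known to hold `wanted`, asking neighbors recursively."""
    seen = seen if seen is not None else set()
    if start in seen or ttl < 0:
        return set()
    seen.add(start)
    hits = {start} if wanted in files.get(start, ()) else set()
    for neighbor in graph.get(start, ()):            # ask neighbors, who ask theirs...
        hits |= flood_search(graph, files, neighbor, wanted, ttl - 1, seen)
    return hits

graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
files = {"D": {"fileA"}}
print(flood_search(graph, files, "A", "fileA", ttl=2))   # {'D'}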
Example: Gnutella

(Figure: the query "Where is file A?" floods through the overlay; nodes that have file A send a Reply back along the query path)
Flooding: Discussion

Pros:
  Fully decentralized
  Search cost distributed
  Processing at each node permits powerful search semantics
Cons:
  Search scope is O(N)
  Search time is O(???)
  Nodes leave often, network unstable
TTL-limited search works well for haystacks
  For scalability, it does NOT search every node; may have to re-issue the query later
Supernode Flooding

Why should everybody participate in search?
  Use a subset of nodes for search: supernodes, like multicast
  KaZaA technology
Mechanism:
  Join: on startup, client contacts a "supernode" ... may at some point become one itself
  Publish: send list of files to the supernode
  Search: send query to the supernode; supernodes flood the query amongst themselves
    The supernode network is just like the prior flooding net
Supernode Network Design

(Figure: ordinary peers attach to "super nodes", which form a flooding overlay among themselves)
Supernode: File Insert

(Figure: a peer at 123.2.21.23 publishes "I have X!" to its supernode, which records insert(X, 123.2.21.23), ...)
Supernode: File Search

(Figure: "Where is file A?" is sent to the local supernode; supernodes flood the query among themselves and return replies such as search(A) --> 123.2.0.18 and search(A) --> 123.2.22.50)
Supernode: Which nodes?

Often, bias towards nodes with good:
  Bandwidth
  Computational resources
  Availability!
Stability and Superpeers

Why superpeers?
  Query consolidation
    Many connected nodes may have only a few files
    Propagating a query to a sub-node would take more b/w than answering it yourself
  Caching effect
    Requires network stability
Superpeer selection is time-based
  How long you've been on is a good predictor of how long you'll be around
Superpeer Results

Basically, "just better" than flooding to all
Gets an order of magnitude or two better scaling
But still fundamentally: O(search) * O(per-node storage) = O(N)
  Central: O(1) search, O(N) storage
  Flood: O(N) search, O(1) storage
  Superpeer: can trade between the two
Structured Search: Distributed Hash Tables

Academic answer to p2p
Goals
  Guaranteed lookup success
  Provable bounds on search time
  Provable scalability
Makes some things harder
  Fuzzy queries / full-text search / etc.
Read-write, not read-only
Hot topic in networking since its introduction in ~2000/2001
Searching Wrap-Up

Type        O(search)   Storage    Fuzzy?
Central     O(1)        O(N)       Yes
Flood       ~O(N)       O(1)       Yes
Super       < O(N)      > O(1)     Yes
Structured  O(log N)    O(log N)   Not really
DHT: Overview

Abstraction: a distributed "hash table" (DHT) data structure:
  put(id, item);
  item = get(id);
Implementation: the nodes in the system form a distributed data structure
  Can be a Ring, Tree, Hypercube, Skip List, Butterfly Network, ...
DHT: Overview (2)

Structured overlay routing:
  Join: on startup, contact a "bootstrap" node and integrate yourself into the distributed data structure; get a node id
  Publish: route the publication for a file id toward a close node id along the data structure
  Search: route a query for a file id toward a close node id. The data structure guarantees that the query will meet the publication.
Important difference: get(key) is an exact match on the key!
  search("spars") will not find file("britney spears")
  We can exploit this to be more efficient
DHT: Example – Chord

Associate to each node and file a unique id in a one-dimensional space (a ring)
  E.g., pick from the range [0...2^m]
  Usually the hash of the file or IP address
Properties:
  Routing table size is O(log N), where N is the total number of nodes
  Guarantees that a file is found in O(log N) hops
From MIT, in 2001
DHT: Consistent Hashing

(Figure: circular ID space with nodes N32, N90, N105 and keys K5, K20, K80; legend: "K5" = Key 5, "N105" = Node 105; K5 and K20 are stored at N32, K80 at N90)
A key is stored at its successor: the node with the next-higher ID
DHT: Chord Basic Lookup

(Figure: ring with N10, N32, N60, N90, N105, N120; N10 asks "Where is key 80?"; the query is forwarded along successors until the answer "N90 has K80" is returned)
DHT: Chord "Finger Table"

(Figure: node N80 with fingers spanning 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128 of the ring)
Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
In other words, the ith finger points 1/2^(n-i) of the way around the ring
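A simplified Python sketch of Chord-style fingers and greedy lookup on a small example ring (m = 6 bits; the node IDs are arbitrary examples, not the ones used in the following slides):

import math  # not strictly needed; kept minimal

M = 6
RING = 2 ** M
nodes = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])

def successor(ident):
    """First node whose ID is >= ident (mod the ring)."""
    for n in nodes:
        if n >= ident % RING:
            return n
    return nodes[0]                       # wrap around

def finger_table(n):
    """Entry i points to the first node that succeeds or equals n + 2^i."""
    return [successor(n + 2 ** i) for i in range(M)]

def lookup(start, key):
    """Greedy lookup: jump to the farthest finger that does not overshoot the key."""
    hops, n = 0, start
    while successor(key) != n:
        candidates = [f for f in finger_table(n)
                      if f != n and (f - n) % RING <= (key - n) % RING]
        n = max(candidates, key=lambda f: (f - n) % RING) if candidates else successor(key)
        hops += 1
    return n, hops

print(finger_table(8))        # e.g. [14, 14, 14, 21, 32, 42]
print(lookup(8, 54))          # key 54 is stored at node 56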
Node Join

Compute an ID
Use an existing node to route to that ID in the ring
  Finds s = successor(id)
  Ask s for its predecessor, p
Splice self into the ring just like a linked list
  p->successor = me
  me->successor = s
  me->predecessor = p
  s->predecessor = me
DHT: Chord Join

Assume an identifier space [0..8]
Node n1 joins
  Succ. table of n1:  i=0: id+2^i=2, succ=1;  i=1: 3, succ=1;  i=2: 5, succ=1
(Figure: ring positions 0-7 with only node 1 present)
DHT: Chord Join

Node n2 joins
  Succ. table of n1:  i=0: id+2^i=2, succ=2;  i=1: 3, succ=1;  i=2: 5, succ=1
  Succ. table of n2:  i=0: 3, succ=1;  i=1: 4, succ=1;  i=2: 6, succ=1
(Figure: ring positions 0-7 with nodes 1 and 2 present)
DHT: Chord Join

Nodes n0, n6 join
(Figure: ring positions 0-7 with nodes 0, 1, 2, 6; each node's successor table:)
  n0: i=0: id+2^i=1, succ=1;  i=1: 2, succ=2;  i=2: 4, succ=0
  n1: i=0: 2, succ=2;  i=1: 3, succ=6;  i=2: 5, succ=6
  n2: i=0: 3, succ=6;  i=1: 4, succ=6;  i=2: 6, succ=6
  n6: i=0: 7, succ=0;  i=1: 0, succ=0;  i=2: 2, succ=2
DHT: Chord Join

Nodes: n1, n2, n0, n6
Items: f7, f2
(Figure: the ring of nodes 0, 1, 2, 6 with each node's successor table and the items f7 and f2 placed at their successor nodes)
DHT: Chord Routing

Upon receiving a query for item id, a node:
  Checks whether it stores the item locally
  If not, forwards the query to the largest node in its successor table that does not exceed id
(Figure: query(7) is routed around the ring of nodes n0, n1, n2, n6, using each node's successor table, until it reaches the node storing item f7)
DHT: Chord Summary

Routing table size?
  Log N fingers
Routing time?
  Each hop is expected to halve the distance to the desired id => expect O(log N) hops
DHT: Discussion

Pros:
  Guaranteed lookup
  O(log N) per-node state and search scope
Cons:
  This line used to say "not used." But: now being used in a few apps, including BitTorrent.
  Supporting non-exact-match search is (quite!) hard
The Limits of Search: A Peer-to-peer Google?

Complex intersection queries ("the" + "who")
  Billions of hits for each term alone
Sophisticated ranking
  Must compare many results before returning a subset to the user
Very, very hard for a DHT / p2p system
  Need high inter-node bandwidth
  (This is exactly what Google does – massive clusters)
But maybe many file-sharing queries are okay...
Fetching Data

Once we know which node(s) have the data we want...
Option 1: fetch from a single peer
  Problem: have to fetch from a peer who has the whole file
    Peers are not useful sources until they have downloaded the whole file
    At which point they probably log off. :)
  How can we fix this?
Chunk Fetching

More than one node may have the file. How to tell?
  Must be able to distinguish identical files
    Not necessarily the same filename
    Same filename, not necessarily the same file...
  Use a hash of the file
    Common: MD5, SHA-1, etc.
How to fetch?
  Get bytes [0..8000] from A, [8001..16000] from B
  Alternative: erasure codes
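A short sketch of identifying a file by content hash and verifying chunks fetched from different peers (SHA-1, 8000-byte ranges as in the example above; this is my own illustration, not a protocol):

import hashlib

CHUNK = 8000

def file_id(path):
    """Identify the file by its content, not its name (SHA-1 of the whole file)."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()

def split_into_chunks(data: bytes):
    """Byte ranges [0..8000), [8000..16000), ... plus a hash per chunk."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    return [(i, hashlib.sha1(c).hexdigest()) for i, c in enumerate(chunks)], chunks

def verify(chunk_bytes: bytes, expected_digest: str) -> bool:
    return hashlib.sha1(chunk_bytes).hexdigest() == expected_digest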
BitTorrent: Overview

Swarming:
  Join: contact a centralized "tracker" server, get a list of peers
  Publish: run a tracker server
  Search: out-of-band. E.g., use Google to find a tracker for the file you want
  Fetch: download chunks of the file from your peers; upload chunks you have to them
Big differences from Napster:
  Chunk-based downloading (sound familiar? :)
  "Few large files" focus
  Anti-freeloading mechanisms
BitTorrent

Periodically get a list of peers from the tracker
More often:
  Ask each peer what chunks it has
    (Or have them update you)
  Request chunks from several peers at a time
  Peers will start downloading from you
BT has some machinery to try to bias towards helping those who help you
BitTorrent: Publish/Join

(Figure: a new peer contacts the tracker and receives a list of peers in the swarm)

BitTorrent: Fetch

(Figure: the peer exchanges chunks directly with several peers in the swarm)
BitTorrent: Summary

Pros:
  Works reasonably well in practice
  Gives peers an incentive to share resources; avoids freeloaders
Cons:
  Central tracker server needed to bootstrap the swarm
  (The tracker is a design choice, not a requirement, as you know from your projects. Modern BitTorrent can also use a DHT to locate peers. But the approach still needs a "search" mechanism.)
Writable, Persistent p2p

Do you trust your data to 100,000 monkeys?
Node availability hurts
  Ex: store 5 copies of data on different nodes
  When someone goes away, you must replicate the data they held
  Hard drives are *huge*, but cable-modem upload bandwidth is tiny – perhaps 10 GB/day
  It takes many days to upload the contents of a 200 GB hard drive. Very expensive leave/replication situation!
What's out there?

              Central      Flood       Supernode flood             Route
Whole file    Napster      Gnutella                                Freenet
Chunk based   BitTorrent               KaZaA (bytes, not chunks)   DHTs, eDonkey2000
P2P: Summary

Many different styles; remember the pros and cons of each
  Centralized, flooding, swarming, unstructured and structured routing
Lessons learned:
  Single points of failure are bad
  Flooding messages to everyone is bad
  Underlying network topology is important
  Not all nodes are equal
  Need incentives to discourage freeloading
  Privacy and security are important
  Structure can provide theoretical bounds and guarantees
Cloud Computing Infrastructure

Take a seat & prepare to fly
Anh M. Nguyen
CS525, UIUC, Spring 2009
What is cloud computing?

"I don't understand what we would do differently in the light of Cloud Computing other than change the wording of some of our ads."
  Larry Ellison, Oracle's CEO
"I have not heard two people say the same thing about it [cloud]. There are multiple definitions out there of 'the cloud'."
  Andy Isherwood, HP's Vice President of European Software Sales
"It's stupidity. It's worse than stupidity: it's a marketing hype campaign."
  Richard Stallman, Free Software Foundation founder
What is a Cloud?

It's a cluster! It's a supercomputer! It's a datastore!
It's superman!
None of the above
All of the above
Cloud = lots of storage + compute cycles nearby
What is a Cloud?

A single-site cloud (aka "datacenter") consists of
  Compute nodes (split into racks)
  Switches, connecting the racks
  A network topology, e.g., hierarchical
  Storage (backend) nodes connected to the network
  A front-end for submitting jobs
  Services: physical resource set, software services
A geographically distributed cloud consists of
  Multiple such sites
  Each site perhaps with a different structure and services
A Sample Cloud Topology

(Figure: a core switch connects top-of-the-rack switches; each rack switch connects the servers in its rack)
Scale of Industry Datacenters

Microsoft [NYTimes, 2008]
  150,000 machines
  Growth rate of 10,000 per month
  Largest datacenter: 48,000 machines
  80,000 total running Bing
Yahoo! [Hadoop Summit, 2009]
  25,000 machines
  Split into clusters of 4000
AWS EC2 (Oct 2009)
  40,000 machines
  8 cores/machine
Google
  (Rumored) several hundreds of thousands of machines
"A Cloudy History of Time"

(Timeline: 1940s-50s – the first datacenters; 1960s-70s – timesharing companies & the data processing industry; 1980s-90s – clusters, grids, and PCs (not distributed!); 1990s-2000s – peer-to-peer systems; 2010 – clouds and datacenters)
"A Cloudy History of Time"

(Figure notes:)
First large datacenters: ENIAC, ORDVAC, ILLIAC; many used vacuum tubes and mechanical relays
Data processing industry: 1968: $70 M; 1978: $3.15 Billion
Timesharing industry (1975): market share Honeywell 34%, IBM 15%, Xerox 10%, CDC 10%, DEC 10%, UNIVAC 10%; machines: Honeywell 6000 & 635, IBM 370/168, Xerox 940 & Sigma 9, DEC PDP-10, UNIVAC 1108
Clusters: Berkeley NOW project, supercomputers, server farms (e.g., Ocean…)
Grids (1980s-2000s): GriPhyN (1970s-80s), Open Science Grid and Lambda Rail (2000s)
P2P systems (90s-00s): many millions of users, many GB per day
Clouds
Trends: Technology

Doubling periods – storage: 12 months, bandwidth: 9 months, and (what law is this?) CPU speed: 18 months
Then and now
  Bandwidth
    1985: mostly 56 Kbps links nationwide
    2004: 155 Mbps links widespread
  Disk capacity
    Today's PCs have 100s of GBs, the same as a 1990 supercomputer
Trends: Users

Then and now
  Biologists:
    1990: were running small single-molecule simulations
    2004: want to calculate structures of complex macromolecules, screen thousands of drug candidates, sequence very complex genomes
  Physicists:
    2008 onwards: CERN's Large Hadron Collider will produce 700 MB/s, or 15 PB/year
Trends in technology and user requirements: independent or symbiotic?
Prophecies

In 1965, MIT's Fernando Corbató and the other designers of the Multics operating system envisioned a computer facility operating "like a power company or water company".
Plug your thin client into the computing utility and play your favorite intensive compute & communicate application.
[Have today's clouds brought us closer to this reality?]
So, clouds have been around for decades! But aside from massive scale, what's new about today's cloud computing?
What('s new) in Today's Clouds?

Three major features:
I. On-demand access: pay-as-you-go, no upfront commitment
  Anyone can access it (e.g., Washington Post – Hillary Clinton example)
II. Data-intensive nature: what was MBs has now become TBs
  Daily logs, forensics, Web data, etc.
  Do you know the size of the Wikipedia dump?
III. New cloud programming paradigms: MapReduce/Hadoop, Pig Latin, DryadLINQ, Swift, and many others
  High in accessibility and ease of programmability
Combinations of one or more of these give rise to novel and unsolved distributed computing problems in cloud computing.
I. On-demand access: *aaS Classification

On-demand: renting a cab vs. (previously) renting a car, or buying one. E.g.:

HaaS: Hardware as a Service
  You get access to barebones hardware machines and do whatever you want with them
  Ex: your own cluster, Emulab
IaaS: Infrastructure as a Service
  You get access to flexible computing and storage infrastructure. Virtualization is one way of achieving this. Often said to subsume HaaS.
  Ex: Amazon Web Services (AWS: EC2 and S3), Eucalyptus, Rightscale
    AWS Elastic Compute Cloud (EC2): $0.086-$1.16 per CPU hour
    AWS Simple Storage Service (S3): $0.055-$0.15 per GB-month
PaaS: Platform as a Service
  You get access to flexible computing and storage infrastructure, coupled (often tightly) with a software platform
  Ex: Google's AppEngine
SaaS: Software as a Service
  You get access to software services when you need them. Often said to subsume SOA (Service-Oriented Architectures).
  Ex: Microsoft's LiveMesh, MS Office on demand
II. Data-intensive Computing

Computation-intensive computing
  Example areas: MPI-based, high-performance computing, grids
  Typically run on supercomputers (e.g., NCSA Blue Waters)
Data-intensive computing
  Typically store data at datacenters
  Use compute nodes nearby
  Compute nodes run computation services
In data-intensive computing, the focus shifts from computation to the data: CPU utilization is no longer the most important resource metric
Problem areas include
  Distributed systems
  Middleware
  OS
  Storage
  Networking
  Security
  Others
III. New Cloud Programming Paradigms

Dataflow programming frameworks
  Google: MapReduce and Sawzall
  Yahoo: Hadoop and Pig Latin
  Microsoft: DryadLINQ
  Facebook: Hive
  Amazon: Elastic MapReduce service (pay-as-you-go)
Google (MapReduce)
  Indexing: a chain of 24 MapReduce jobs
  ~200K jobs processing 50 PB/month (in 2006)
Yahoo! (Hadoop + Pig)
  WebMap: a chain of 100 MapReduce jobs
  280 TB of data, 2500 nodes, 73 hours
Facebook (Hadoop + Hive)
  ~300 TB total, adding 2 TB/day (in 2008)
  3K jobs processing 55 TB/day
Similar numbers from other companies, e.g., Yieldex, ...
Two Categories of Clouds

Industrial clouds
  Can be either a (i) public cloud or (ii) private cloud
  Private clouds are accessible only to company employees
  Public clouds provide service to any paying customer:
    Amazon S3 (Simple Storage Service): store arbitrary datasets, pay per GB-month stored
    Amazon EC2 (Elastic Compute Cloud): upload and run arbitrary images, pay per CPU hour used
    Google AppEngine: develop applications within their AppEngine framework, upload data that will be imported into their format, and run
Academic clouds
  Allow researchers to innovate, deploy, and experiment
  Google-IBM Cloud (U. Washington): run apps programmed atop Hadoop
  Cloud Computing Testbed (CCT @ UIUC): first cloud testbed to support systems research. Runs: (i) apps programmed atop Hadoop and Pig, (ii) systems-level research on this first generation of cloud computing models (~HaaS), and (iii) Eucalyptus services (~AWS EC2). http://cloud.cs.illinois.edu
  OpenCirrus: first federated cloud testbed. http://opencirrus.org
Academic Clouds

CCT = Cloud Computing Testbed
  NSF infrastructure
  Used by 10+ NSF projects, including several non-UIUC projects
  Housed within Siebel Center (4th floor!)
Accessible to students of CS525!
  Almost half of the SP09 course used CCT for their projects
OpenCirrus = federated cloud testbed
  Contains CCT and other sites
If you need a CCT account for your CS525 experiment, let me know asap! There are a limited number of these available for CS525.
Cloud Computing Testbed (CCT)

(Figure: photo of the testbed)

CCT Hardware in More Detail

128 compute nodes (64 + …), 500 TB of storage & 1000+ shared cores
Goal of CCT

Support both systems research and applications research in data-intensive distributed computing
Open Cirrus Federation

First open federated cloud testbed
Shared: research, applications, infrastructure (9 * 1,000 cores), data sets
Global services: sign-on, monitoring, storage, etc. Federated clouds, meaning each site is different.
Sites include: Intel, HP, KIT (DE), ETRI, Yahoo, UIUC, CMU, IDA (SG), MIMOS, RAS
Grown to 9 sites, with more to come
10 Challenges [Above the Clouds]

(Index: Performance, Data-related, Scalability, Logistical)

Availability of Service: use multiple cloud providers; use elasticity; prevent DDoS
Data Lock-In: enable surge computing; standardize APIs
Data Confidentiality and Auditability: deploy encryption, VLANs, firewalls; geographical data storage
Data Transfer Bottlenecks: data backup/archival; higher-bandwidth switches; new cloud topologies; FedExing disks
Performance Unpredictability: QoS; improved VM support; flash memory; scheduling VMs
Scalable Storage: invent scalable stores
Bugs in Large Distributed Systems: invent debuggers; real-time debugging; predictable pre-run-time debugging
Scaling Quickly: invent good auto-scalers; snapshots for conservation
Reputation Fate Sharing
Software Licensing: pay-for-use licenses; bulk-use sales
A More Bottom-Up View of Open Research Directions

Myriad interesting problems that acknowledge the characteristics that make today's cloud computing unique: massive scale + on-demand + data-intensive + new programmability + infrastructure- and application-specific details.

Monitoring: of systems & applications; single-site and multi-site
Storage: massive scale; global storage; for specific apps or classes
Failures: what is their effect, what is their frequency, how do we achieve fault-tolerance?
Scheduling: moving tasks to data, dealing with federation
Communication bottleneck: within applications, within a site
Locality: within clouds, or across them
Cloud topologies: non-hierarchical, other hierarchical
Security: of data, of users, of applications; confidentiality, integrity
Availability of data
Seamless scalability: of applications, of clouds, of data, of everything
Inter-cloud/multi-cloud computations
Second generation of other programming models? Beyond MapReduce!
Pricing models

Explore the limits of today's cloud computing
New Parallel Programming Paradigms: MapReduce

Highly parallel data processing
Originally designed by Google (OSDI 2004 paper)
Open-source version called Hadoop, by Yahoo!
  Hadoop is written in Java. Your implementation could be in Java, or any executable.
Google (MapReduce)
  Indexing: a chain of 24 MapReduce jobs
  ~200K jobs processing 50 PB/month (in 2006)
Yahoo! (Hadoop + Pig)
  WebMap: a chain of 100 MapReduce jobs
  280 TB of data, 2500 nodes, 73 hours
Annual Hadoop Summit: 2008 had 300 attendees, 2009 had 700 attendees
What is MapReduce?

Terms are borrowed from functional languages (e.g., Lisp)
Sum of squares:
  (map square '(1 2 3 4))
    Output: (1 4 9 16)
    [processes each record sequentially and independently]
  (reduce + '(1 4 9 16))
    (+ 16 (+ 9 (+ 4 1)))
    Output: 30
    [processes the set of all records in a batch]
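The same sum-of-squares example, written with Python's built-in map and reduce:

from functools import reduce

squares = list(map(lambda x: x * x, [1, 2, 3, 4]))   # -> [1, 4, 9, 16]
total   = reduce(lambda a, b: a + b, squares)        # -> 30
print(squares, total)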
Map

Process an individual key/value pair to generate intermediate key/value pairs.
(Example: input <filename, file text> with the lines "Welcome Everyone" and "Hello Everyone" produces the intermediate pairs <Welcome, 1>, <Everyone, 1>, <Hello, 1>, <Everyone, 1>)
Reduce

Processes and merges all intermediate values associated with each given key assigned to it.
(Example: the intermediate pairs above reduce to <Everyone, 2>, <Hello, 1>, <Welcome, 1>)
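A word-count sketch in this style: the user writes only map_fn and reduce_fn, and the grouping-by-key step stands in for what the framework does between the two phases (a toy, single-process illustration, not Hadoop's API):

from collections import defaultdict

def map_fn(filename, text):
    for word in text.split():
        yield word, 1                     # emit <word, 1>

def reduce_fn(word, counts):
    return word, sum(counts)              # emit <word, total>

def run_mapreduce(inputs):
    groups = defaultdict(list)
    for filename, text in inputs:                       # "map phase"
        for key, value in map_fn(filename, text):
            groups[key].append(value)
    return [reduce_fn(k, vs) for k, vs in groups.items()]   # "reduce phase"

print(run_mapreduce([("doc1", "Welcome Everyone"), ("doc2", "Hello Everyone")]))
# e.g. [('Welcome', 1), ('Everyone', 2), ('Hello', 1)]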
Some Applications

Distributed grep:
  Map – emits a line if it matches the supplied pattern
  Reduce – copies the intermediate data to the output
Count of URL access frequency:
  Map – processes the web log and outputs <URL, 1>
  Reduce – emits <URL, total count>
Reverse web-link graph:
  Map – processes the web log and outputs <target, source>
  Reduce – emits <target, list(source)>
Programming MapReduce

Externally: for the user
  1. Write a Map program (short), write a Reduce program (short)
  2. Submit the job; wait for the result
  3. Need to know nothing about parallel/distributed programming!
Internally: for the cloud (and for us distributed systems researchers)
  1. Parallelize Map
  2. Transfer data from Map to Reduce
  3. Parallelize Reduce
  4. Implement storage for Map input, Map output, Reduce input, and Reduce output
Inside MapReduce

For the cloud (and for us distributed systems researchers)
  1. Parallelize Map: easy! Each map job is independent of the others.
  2. Transfer data from Map to Reduce:
     All Map output records with the same key are assigned to the same Reduce task
     Use a partitioning function (more soon)
  3. Parallelize Reduce: easy! Each reduce job is independent of the others.
  4. Implement storage for Map input, Map output, Reduce input, and Reduce output
     Map input: from the distributed file system
     Map output: to local disk (at the Map node); uses the local file system
     Reduce input: from (multiple) remote disks; uses the local file systems
     Reduce output: to the distributed file system
     (local file system = Linux FS, etc.; distributed file system = GFS (Google File System), HDFS (Hadoop Distributed File System))
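A sketch of the partitioning step mentioned above: records with the same key must land at the same reduce task, e.g. by hashing the key modulo the number of reduce tasks (R = 4 is an arbitrary example; Hadoop's default partitioner is similarly hash-based):

R = 4                                     # number of reduce tasks (example value)

def partition(key, num_reduces=R):
    return hash(key) % num_reduces        # same key -> same reduce task

buckets = {r: [] for r in range(R)}
for key, value in [("Everyone", 1), ("Hello", 1), ("Everyone", 1)]:
    buckets[partition(key)].append((key, value))
# Both ("Everyone", 1) records end up in the same bucket / reduce task.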
Internal Workings of MapReduce

(Figure: MapReduce execution overview – master, map workers, intermediate files, reduce workers)
Fault Tolerance

Worker failure
  Master keeps 3 states for each worker task: (idle, in-progress, completed)
  Master sends periodic pings to each worker to keep track of it (a central failure detector)
    If a worker fails while in-progress, mark the task as idle
    If map workers fail after completing, mark the worker as idle
  A Reduce task does not start until all Map tasks are done and all its (Reduce's) data has been fetched
Master failure
  Checkpoint
Locality and Backup Tasks

Locality
  Since the cloud has a hierarchical topology
  GFS stores 3 replicas of each 64 MB chunk
    Maybe on different racks
  Attempt to schedule a map task on a machine that contains a replica of the corresponding input data: why?
Stragglers (slow nodes)
  Due to a bad disk, network bandwidth, CPU, or memory
  Perform backup (replicated) execution of the straggler task: the task is done when the first replica completes
Testbed: 1800 servers, each with 4 GB RAM, dual 2 GHz Xeons, dual 160 GB IDE disks, and gigabit Ethernet per machine

Grep

Locality optimization helps:
  1800 machines read 1 TB at a peak of ~31 GB/s
  Without it, rack switches would limit throughput to 10 GB/s
Startup overhead is significant for short jobs
Workload: 10^10 100-byte records, extracting records matching a rare pattern (92K matching records)
Discussion Points

Hadoop always either outputs complete results, or none
  Partial results?
  Can you characterize the partial results of a partial MapReduce run?
Storage: is the local-write / remote-read model good for Map output / Reduce input?
  What happens on node failure?
  Can you treat intermediate data separately, as a first-class citizen?
The entire Reduce phase needs to wait for all Map tasks to finish: in other words, a barrier
  Why? What is the advantage? What is the disadvantage?
  Can you get around this?
Grid

1. What is Grid?
2. Grid Projects & Applications
3. Grid Technologies
4. Globus
5. CompGrid
Definition

A type of parallel and distributed system that enables the sharing, selection, & aggregation of geographically distributed resources:
  Computers – PCs, workstations, clusters, supercomputers, laptops, notebooks, mobile devices, PDAs, etc.
  Software – e.g., ASPs renting expensive special-purpose applications on demand
  Catalogued data and databases – e.g., transparent access to the human genome database
  Special devices/instruments – e.g., radio telescopes, SETI@Home searching for life in the galaxy
  People/collaborators
depending on their availability, capability, cost, and user QoS requirements, for solving large-scale problems/applications, thus enabling the creation of "virtual organizations" (VOs)
Resources = assets, capabilities, and knowledge

Capabilities (e.g., application codes, analysis tools)
Compute grids (PC cycles, commodity clusters, HPC)
Data grids
Experimental instruments
Knowledge services
Virtual organisations
Utility services
Why go Grid?

Hot subject
Try it, experience it to learn the potential
Will enable true ubiquitous computing in the future
Today, proven in some areas: intra-grids, but still a long way to a World Wide Grid
State-of-the-art techniques and tools are difficult
Short-term goals? Use another technology
Does your system have Grid characteristics?
  Distributed users, large-scale and heterogeneous resources, across domains
Grid's main idea

To treat CPU cycles and software like commodities
Enable the coordinated use of geographically distributed resources – in the absence of central control and existing trust relationships
Computing power is produced much like utilities such as power and water are produced for consumers
Users will have access to "power" on demand
"When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special-purpose appliances" – Gilder Technology Report, June 2000
Computational grids and electric power grids
What do users want?

Grid consumers
  Execute jobs for solving problems of varying size and complexity
  Benefit by selecting and aggregating resources wisely
  Trade off timeframe and cost
Grid providers
  Contribute ("idle") resources for executing consumer jobs
  Benefit by maximizing resource utilisation
  Trade off local requirements & market opportunity
Grid Applications

Distributed HPC (supercomputing):
  Computational science
High-capacity/throughput computing:
  Large-scale simulation/chip design & parameter studies
Content sharing (free or paid):
  Sharing digital content among peers (e.g., Napster)
Remote software access/renting services:
  Application service providers (ASPs) & Web services
Data-intensive computing:
  Drug design, particle physics, stock prediction...
On-demand, real-time computing:
  Medical instrumentation & mission-critical applications
Collaborative computing:
  Collaborative design, data exploration, education
Service-oriented computing (SOC):
  Towards economics-based utility computing: new paradigm, new applications, new industries, and new business
Grid Projects

Australia: Nimrod-G, Gridbus, GridSim, Virtual Lab, DISCWorld, GrangeNet, ... new ones coming up
Europe: UNICORE, Cactus, UK eScience, EU Data Grid, EuroGrid, MetaMPI, XtremeWeb, and many more
India: I-Grid
Japan: Ninf, DataFarm
Korea: N*Grid
USA: Globus, Legion, OGSA, Sun Grid Engine, AppLeS, NASA IPG, Condor-G, Jxta, NetSolve, AccessGrid, and many more
Cycle stealing & .com initiatives: Distributed.net, SETI@Home, ..., Entropia, UD, Parabon, ...
Public forums: Global Grid Forum, Australian Grid Forum, IEEE TFCC, CCGrid conference, P2P conference
P2P conference



Grid Requirements

Identity & authentication
Authorization & policy
Resource discovery
Resource characterization
Resource allocation
(Co-)reservation, workflow
Distributed algorithms
Remote data access
High-speed data transfer
Performance guarantees
Monitoring, adaptation
Intrusion detection
Resource management
Accounting & payment
Fault management
System evolution
Etc.
Problem

Enabling secure, controlled remote access to computational resources and management of remote computation
  – Authentication and authorization
  – Resource discovery & characterization
  – Reservation and allocation
  – Computation monitoring and control
Challenges

Locate "suitable" computers
Authenticate with appropriate sites
Allocate resources on those computers
Initiate computation on those computers
Configure those computations
Select "appropriate" communication methods
Compute with "suitable" algorithms
Access data files, return output
Respond "appropriately" to resource changes
Leading Grid Middleware Developments

Globus Toolkit (mainly developed at ANL and USC)
  Service-oriented toolkit from the Globus project, to be used in Grid applications; not targeted at end users
  Services for resource selection and allocation, authentication, file system access and file transfer, ...
  Largest user base in projects worldwide
  Open-source software, commercial support by IBM and Platform Computing
The Globus Alliance

Globus Project™, since 1996
  Ian Foster (Argonne National Lab)
  Carl Kesselman (University of Southern California's Information Sciences Institute)
  Develop protocols, middleware, and tools for Grid computing
Globus Alliance, since Sept 2003
  International scope
    University of Edinburgh's EPCC
    Swedish Center for Parallel Computers (PDC)
    Advisory council of academic affiliates from Asia-Pacific, Europe, US
Globus Toolkit

GT2 (2.4 released in 2002): reference implementation of Grid fabric protocols
  GRAM for job submissions
  MDS for resource discovery
  GridFTP for data transfer
  GSI security
GT3 (3.0 released July 2003): redesign
  OGSI-based
  Grid services, built on SOAP and XML
  GT3.2 released March 31, 2004
Globus Toolkit Services

Job submission and management (GRAM)
  Uniform job submission
Security (GSI)
  PKI-based security (authentication) service
Information services (MDS)
  LDAP-based information service
  Resource selection and allocation (GIIS, GRIS)
Remote file management (GASS) and transfer (GridFTP)
  Remote storage access service
  Remote data catalogue and management tools
Supported by Globus 2.0, released in 2002
Resource Specification Language

Common notation for exchange of information between components
  Syntax similar to MDS/LDAP filters
RSL provides two types of information:
  Resource requirements: machine type, number of nodes, memory, etc.
  Job configuration: directory, executable, args, environment
API provided for manipulating RSL
Protocols Make the Grid

Protocols and APIs
  Protocols enable interoperability
  APIs enable portability
Sharing is about interoperability, so...
  Grid architecture should be about protocols
Grid Services Architecture: Previous Perspective

(Layered figure:)
Applications: ... a rich variety of applications ...
Application toolkits: remote data toolkit, remote computation toolkit, remote visualization toolkit, asynchronous collaboration toolkit, remote sensors toolkit, ...
Grid services: protocols, authentication, policy, resource management, instrumentation, discovery, etc.
Grid fabric: Grid-enabled archives, networks, computers, display devices, etc.; associated local services
Characteristics of Grid Services Architecture

Identifies separation of concerns
  Isolates Grids from languages and specific programming environments
  Makes provisions for generic and application-specific functionality
Protocols not explicit in the architecture
  Fails to make a clear distinction between language, service, and networking issues
Layered Grid Protocol Architecture

(Layers, top to bottom: Application, User, Grid, Resource, Connectivity, Fabric)
Important Points

Being Grid-enabled requires speaking the appropriate protocols
  Protocol is the only requirement, not reachability
  Protocols can be used to bridge local resources or "local Grids"
    Intergrid as an analog to the Internet
Built on Internet protocols
Independent of language and implementation
  Focus on interaction over the network
Services exist at each level
Protocols, Services, and Interfaces

(Layered figure, top to bottom:)
Applications
Languages/Frameworks
User Service APIs and SDKs – User Services – User Service Protocols
Grid Service APIs and SDKs – Grid Services – Grid Service Protocols
Resource APIs and SDKs – Resource Services – Resource Service Protocols
Connectivity APIs – Connectivity Protocols
Local Access APIs and Protocols – Fabric Layer
How does Globus fit in?

Defines connectivity and resource protocols
Enables definition of Grid and user protocols
  Globus provides some of these; others are defined by other groups
Defines a range of APIs and SDKs that leverage Resource, Grid, and User protocols
Fabric

Local access to a logical resource
  May be a real component, e.g., CPU, software module, filesystem
  May be a logical component, e.g., a Condor pool
  Protocol- or API-mediated
Fabric elements include:
  SSP, ASP, peer-to-peer, Entropia-like, and enterprise-level solutions
Connectivity Protocols

Two classes of connectivity protocols underlie all other components
Internet communication
  Application, transport, and internet layer protocols
  I.e., transport, routing, DNS, etc.
Security
  Authentication and delegation
  Discussed below
Security

Protocols
  TLS with delegation
Services
  K5ssl, Globus Authorization Service
APIs
  GSS-API, GAA, SASL, gss_assist
SDKs
  GlobusIO
Resource Protocols

Resource management
Storage system access
Network quality of service
Data movement
Resource information
Resource Management

Protocols
  GRAM + GARA (on HTTP)
Resource services
  Gatekeeper, JobManager, SlotManager
APIs and SDKs
  GRAM API, JavaCog client, DUROC
Data Transport

Protocols
  GridFTP, LDAP for the replica catalog
Services
  FTP, LDAP replica catalog
APIs and SDKs
  GridFTP client library, copy-URL API, replica catalog access, replica selection
Resource Information

Protocol
  LDAP v3, registration/discovery protocol
Service
  GRIS
APIs & SDKs
  C API, JNDI, PerlLDAP, ...
Grid Protocols

Grid Information Index Services
  LDAP and service registration protocol, ...
  GIIS service
  LDAP APIs and specialized information API
Co-allocation and brokering
  GRAM (HTTP + RSL)
  DUROC service
  DUROC client API, end-to-end reservation API
Grid Protocols (cont.)

Online authentication, authorization services
  HTTP
  MyProxy, group policy servers
  MyProxy API, GAA API, ...
Many others (e.g.):
  Resource discovery (Matchmaker)
  Fault recovery
User Protocols

In general, there are many of these; they tend to be one-off and not well defined
Examples:
  Portal toolkits (e.g., HotPage)
  NetSolve
  Cactus framework
Next Lecture

Communication among distributed systems
Remote Procedure Call (RPC)

References

Chapter 4 of the book