Distributed Web-based Systems - Department of Computer and

advertisement
Distributed Systems
Principles and Paradigms
Chapter 12
(version October 15, 2007)
Maarten van Steen
Vrije Universiteit Amsterdam, Faculty of Science
Dept. Mathematics and Computer Science
Room R4.20. Tel: (020) 598 7784
E-mail:steen@cs.vu.nl, URL: www.cs.vu.nl/∼steen/
01
02
03
04
05
06
07
08
09
10
11
12
13
Introduction
Architectures
Processes
Communication
Naming
Synchronization
Consistency and Replication
Fault Tolerance
Security
Distributed Object-Based Systems
Distributed File Systems
Distributed Web-Based Systems
Distributed Coordination-Based Systems
00 – 1
/
Distributed Web-Based Systems
Essence: The WWW is a huge client-server system
with millions of servers; each server hosting thousands
of hyperlinked documents:
Client machine
Server machine
Browser
Web server
2. Server fetches
document from
local file
OS
3. Response
1. Get document request (HTTP)
• Documents are generally represented in text (plain
text, HTML, XML)
• Alternative types: images, audio, video, but also
applications (PDF, PS)
• Documents may contain scripts that are executed
by the client-side software
12 – 1
Distributed Web-Based Systems/12.1 Architecture
Multi-tiered Architectures
Observation: Already very soon, Web sites were organized into three tiers:
3. Start process to fetch document
1. Get request
6. Return result
HTTP
request
handler
4. Database interaction
CGI
program
5. HTML document
created
Web server
12 – 2
Database server
CGI process
Distributed Web-Based Systems/12.1 Architecture
Web Services
Observation: At a certain point, people started recognizing that it is was more than just user ↔ site interaction: sites could offer services to other sites ⇒
standardization is then badly needed.
Look up
a service
Client machine
Server machine
Client
application
Server
application
Stub
Stub
Communication
subsystem
SOAP
Publish service
Communication
subsystem
Generate stub
from WSDL
description
Generate stub
from WSDL
description
Servicedescription
description(WSDL)
(WSDL)
Service
Service
description (WSDL)
Directory service (UDDI)
12 – 3
Distributed Web-Based Systems/12.1 Architecture
Clients: Web browsers
Observation: browsers form the Web’s most important client-side sofware. They used to be simple, but
that is long ago.
Display back end
User interface
Browser engine
Rendering engine
Client-side
script
interpreter
Network
comm.
12 – 4
HTML/XML
parser
Distributed Web-Based Systems/12.2 Processes
Apache Web Server
Observation: More than 70% of all Web sites are
based on Apache. The server is internally organized
more or less according to the steps needed to process
an HTTP request:
Module
Module
...
Function
Module
...
...
Link between
function and hook
Hook
Hook
Hook
Hook
Apache core
Functions called per hook
Request
12 – 5
Response
Distributed Web-Based Systems/12.2 Processes
Server Clusters (1/2)
Essence: To improve performance and availability,
WWW servers are often clustered in a way that is
transparent to clients:
Web
server
Web
server
Web
server
Web
server
LAN
Front
end
Request
Front end handles
all incoming requests
and outgoing responses
Response
Problem: The front end may easily get overloaded,
so that special measures need to be taken.
Transport-layer switching: Front end simply passes
the TCP request to one of the servers, taking some
performance metric into account.
Content-aware distribution: Front end reads the content of the HTTP request and then selects the
best server.
12 – 6
Distributed Web-Based Systems/12.2 Processes
Server Clusters (2/2)
Question: Why can content-aware distribution be so
much better?
6. Server responses
Web
server
5. Forward
other
messages
Other messages
Client
Switch
Setup request
4. Inform
switch
1. Pass setup request
to a distributor
Distributor
Web
server
12 – 7
3. Hand of
f
TCP connection
Distributor
Dispatcher
2. Dispatcher selects
server
Distributed Web-Based Systems/12.2 Processes
Communication (1/2)
Essence: Communication in the Web is generally based
on HTTP; a relatively simple client-server transfer protocol having the following request messages:
Operation
Description
Request to return the header of a document
Request to return a document to the client
Request to store a document
Provide data that are to be added to a document (collection)
Request to delete a document
Head
Get
Put
Post
Delete
12 – 8
Distributed Web-Based Systems/12.3 Communication
Communication (2/2)
C/S
Header
Accept
Accept-Charset
AcceptEncoding
AcceptLanguage
Authorization
WWWAuthenticate
Date
ETag
Expires
From
Host
If-Match
If-None-Match
If-ModifiedSince
If-UnmodifiedSince
Last-Modified
Location
Referer
Upgrade
Warning
12 – 9
C
C
C
Contents
The type of documents the client can handle
The character sets are acceptable for the client
The document encodings the client can handle
C
The natural language the client can handle
C
S
A list of the client’s credentials
Security challenge the client should respond to
C+S
S
S
C
C
C
C
C
C
S
S
C
C+S
C+S
Date and time the message was sent
The tags associated with the returned document
The time for how long the response remains valid
The client’s e-mail address
The TCP address of the document’s server
The tags the document should have
The tags the document should not have
Tells the server to return a document only if it has
been modified since the specified time
Tells the server to return a document only if it has
not been modified since the specified time
The time the returned document was last modified
A document reference to which the client should
redirect its request
Refers to client’s most recently requested document
The application protocol sender wants to switch to
Information about status of the data in the message
Distributed Web-Based Systems/12.3 Communication
SOAP
Simple Object Access Protocol: Based on XML,
this is the standard protocol for communication between Web services.
• SOAP is bound to an underlying protocol (i.e., it
is not independent from its carrier)
• Conversational exchange style: Send a document one way, get a filled-in response back.
• RPC-style exchange: Used to invoke a Web service.
12 – 10
Distributed Web-Based Systems/12.3 Communication
A Note on XML
Observation: XML has the advantage of allowing selfdescribing documents. Full stop (i.e., it introduces
performance problems and is not meant to be read
by human beings)
env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Header>
<n:alertcontrol xmlns:n="http://example.org/alertcontrol">
<n:priority>1</n:priority>
<n:expires>2001-06-22T14:00:00-05:00</n:expires>
</n:alertcontrol>
</env:Header>
<env:Body>
<m:alert xmlns:m="http://example.org/alert">
<m:msg>Pick up Mary at school at 2pm</m:msg>
</m:alert>
</env:Body>
</env:Envelope>
12 – 11
Distributed Web-Based Systems/12.3 Communication
Naming: URL
URL: Uniform Resource Locator tells how and where
to access a resource.
http
Pathname
Host name
Scheme
:// www.cs.vu.nl
/home/steen/mbox
(a)
Host name
Scheme
http
Port
:// www.cs.vu.nl : 80
Pathname
/home/steen/mbox
(b)
Scheme
http
Host name
Port
:// 130.37.24.11 : 80
Pathname
/home/steen/mbox
(c)
Examples:
http
mailto
ftp
file
data
HTTP
Mail
FTP
Local file
Inline data
telnet
tel
modem
Remote login
Telephone
Modem
12 – 12
http://www.cs.vu.nl:80/globe
mailto:steen@cs.vu.nl
ftp://ftp.cs.vu.nl/pub/minix/README
file:/edu/book/work/chp/11/11
data:text/plain;charset=iso-8859-7,
%e1%e2%e3
telnet://flits.cs.vu.nl
tel:+31201234567
modem:+31201234567;type=v32
Distributed Web-Based Systems/12.4
Synchronization: WebDAV
Problem: There is a growing need for collaborative
auditing of Web documents, but bare-bones HTTP can’t
help here. Solution: Web Distributed Authoring and
Versioning.
• Supports exclusive and shared write locks, which
operate on entire documents
• A lock is passed by means of a lock token; the
server registers the client(s) holding the lock
• Clients modify the document locally and post it
back to the server along with the lock token
Note: There is no specific support for crashed clients
holding a lock.
12 – 13
Distributed Web-Based Systems/12.5 Synchronization
Web Proxy Caching
Basic idea: Sites install a separate proxy server that
handles all outgoing requests. Proxies subsequently
cache incoming documents. Cache-consistency protocols:
• Always verify validity by contacting server
• Age-based consistency:
Texpire = α · ( Tcached − Tlast modi f ied ) + Tcached
• Cooperative caching, by which you first check your
neighbors on a cache miss:
Web
server
3. Forward request
to Web server
1. Look in
local cache
Web
proxy
Cache
2. Ask neighboring proxy caches
Client Client Client
Web
proxy
Cache
Client Client Client
Web
proxy
HTTP Get request
Cache
Client Client Client
12 – 14
Distributed Web-Based Systems/12.6 Consistency and Replication
Replication in Web Hosting
Systems
Observation: By-and-large, Web hosting systems are
adopting replication to increase performance. Much
research is done to improve their organization. Follows the lines of self-managing systems:
Uncontrollable parameters (disturbance / noise)
Corrections
Initial configuration
+/Replica
placement
+/Consistency
enforcement
Web hosting system
Observed output
+/Request
routing
Reference input
Metric
estimation
Analysis
Adjustment triggers
12 – 15
Measured output
Distributed Web-Based Systems/12.6 Consistency and Replication
Handling Flash Crowds
Observation: We need dynamic adjustment to balance resource usage. Flash crowds introduce a serious problem:
2 days
12 – 16
(a)
2 days
(b)
6 days
(c)
2.5 days
(d)
Distributed Web-Based Systems/12.6 Consistency and Replication
Server Replication
Content Delivery Network: CDNs act as Web hosting services to replicate documents across the Internet providing their customers guarantees on high availability and performance (example: Akamai).
CDN
server
Cache
6. Get embedded documents
(if not already cached)
5. Get embedded
documents
Return IP address
client-best server
CDN DNS
server
7. Embedded documents
1. Get base document
4
DNS lookups
Client
3
Origin
server
2. Document with refs
to embedded documents
Regular
DNS system
Question: How would consistency be maintained in
this system?
12 – 17
Distributed Web-Based Systems/12.6 Consistency and Replication
Replication of Web Apps. (1/3)
Observation: Replication becomes more difficult when
dealing with databses and such. No single best solution.
Edge-server side
Client
Origin-server side
query
Server
Server
response
Content-blind
cache
Database
copy
full/partial data replication
Content-aware
cache
Schema
full schema replication/
query templates
Schema
Authoritative
database
Assumption: Updates are carried out at origin server,
and propagated to edge servers.
12 – 18
Distributed Web-Based Systems/12.6 Consistency and Replication
Replication of Web Apps. (2/3)
Edge-server side
Client
Origin-server side
query
Server
Server
response
Content-blind
cache
Database
copy
full/partial data replication
Content-aware
cache
Schema
full schema replication/
query templates
Schema
Authoritative
database
• Full replication: high read/write ratio, often in
combination with complex queries. Note: replication may possibly speed-down performance when
R/W ratio goes down.
• Partial replication: high read/write ratio, but in
combination with simple queries
12 – 19
Distributed Web-Based Systems/12.6 Consistency and Replication
Replication of Web Apps. (3/3)
Edge-server side
Client
Origin-server side
query
Server
Server
response
Content-blind
cache
Database
copy
full/partial data replication
Content-aware
cache
full schema replication/
query templates
Schema
Schema
Authoritative
database
• Content-aware caching: Check for queries at local database, and subscribe for invalidations at
the server. Works good with range queries and
complex queries.
• Content-blind caching: Simply cache the result
of previous queries. Works great with simple queries
that address unique results (e.g., no range queries).
12 – 20
Distributed Web-Based Systems/12.6 Consistency and Replication
Security: TLS (SSL)
Transport Layer Security: Modern version of the
the Secure Socket Layer (SSL), which “sits” between
transport layer and application protocols. Relatively
simple protocol that can support mutual authentication using certificates:
1
Possibilities
3
4
5
12 – 21
Choices
[ K+S ] CA
Server
Client
2
[ K+C ] CA
K +S ([ R ] C)
Distributed Web-Based Systems/12.6 Consistency and Replication
Download