Distributed Web-Based Systems

Distributed Systems
Principles and Paradigms
Maarten van Steen
VU Amsterdam, Dept. Computer Science
steen@cs.vu.nl
Chapter 12: Distributed Web-Based Systems
Version: December 10, 2012
12.1 Architecture
Distributed Web-based systems
Essence
The WWW is a huge client-server system with millions of servers, each
hosting thousands of hyperlinked documents.
Documents are often represented in text (plain text, HTML, XML)
Alternative types: images, audio, video, applications (PDF, PS)
Documents may contain scripts, executed by client-side software
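[Figure: Overall organization of a traditional Web site. 1. The browser on the client machine sends a Get document request (HTTP) to the Web server; 2. the server fetches the document from a local file via the OS; 3. the server returns the response.]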
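The three steps in the figure can be sketched in Python. This is a minimal, hypothetical handler (the function name `handle_get` and the `docroot` directory are assumptions, not part of any real server): it parses only the request line, fetches the document from a local file, and builds the HTTP response. Real servers also handle persistent connections, content types, and caching headers.

```python
import os
import tempfile


def handle_get(request: str, docroot: str) -> bytes:
    """Answer a raw HTTP/1.1 GET request with a document read from local disk."""
    request_line = request.split("\r\n", 1)[0]
    method, path, _version = request_line.split(" ", 2)
    if method != "GET":
        return b"HTTP/1.1 405 Method Not Allowed\r\nContent-Length: 0\r\n\r\n"
    # Step 2: fetch the document from a local file. Resolve the path inside
    # docroot and refuse anything that escapes it (e.g., "/../secret").
    root = os.path.realpath(docroot)
    full = os.path.realpath(os.path.join(root, path.lstrip("/")))
    if not full.startswith(root + os.sep) or not os.path.isfile(full):
        return b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"
    with open(full, "rb") as f:
        body = f.read()
    # Step 3: send the response (status line, headers, blank line, body).
    head = f"HTTP/1.1 200 OK\r\nContent-Length: {len(body)}\r\n\r\n".encode()
    return head + body


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as docroot:
        with open(os.path.join(docroot, "index.html"), "w") as f:
            f.write("<p>hello</p>")
        print(handle_get("GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n", docroot))
```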
Multi-tiered architectures
Observation
Very early on, Web sites were organized into three tiers.
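[Figure: Three-tiered Web site with a Web server, a CGI process, and a database server. The HTTP request handler in the Web server receives a Get request and starts a process to fetch the document; the CGI program interacts with the database server, an HTML document is created, and the result is returned to the client.]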
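A minimal sketch of the middle tier, assuming a CGI-style program that receives its parameters through the `QUERY_STRING` environment variable and emits a header block followed by an HTML document. For testability the output is returned as a string instead of written to standard output; the `books` table and `author` parameter are hypothetical, and an SQLite connection stands in for the database server.

```python
import html
import sqlite3
from urllib.parse import parse_qs


def cgi_program(environ: dict, db: sqlite3.Connection) -> str:
    """Read the query, interact with the database tier, and create an HTML document."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    author = params.get("author", [""])[0]
    # Database interaction: a parameterized query on a hypothetical 'books' table.
    rows = db.execute(
        "SELECT title FROM books WHERE author = ? ORDER BY title", (author,)
    ).fetchall()
    items = "".join(f"<li>{html.escape(title)}</li>" for (title,) in rows)
    # HTML document created on the fly; the Web server relays it to the client.
    body = f"<html><body><ul>{items}</ul></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body
```

Keeping the database behind the CGI program is what makes the site three-tiered: the Web server only relays requests and results.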
Web services
Observation
At a certain point, people started recognizing that it was about more than just
user ↔ site interaction: sites could offer services to other sites ⇒
standardization is then badly needed.
[Figure: Web services. The server application publishes its service description (WSDL) in a directory service (UDDI), where the client can look up the service. On both the client and the server machine, a stub is generated from the WSDL description; client and server applications communicate through these stubs using SOAP over their communication subsystems.]
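What a generated stub puts on the wire can be sketched with the standard library. This assumes SOAP 1.1 (envelope namespace `http://schemas.xmlsoap.org/soap/envelope/`); the operation `GetPrice`, its `symbol` parameter, and the namespace `urn:example:stock` are hypothetical stand-ins for what a WSDL description would define.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:stock"  # hypothetical service namespace (would come from WSDL)
ET.register_namespace("soap", SOAP_ENV)


def build_request(operation: str, **params: str) -> bytes:
    """Client-side stub: marshal an operation call into a SOAP envelope."""
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{SVC_NS}}}{name}").text = value
    return ET.tostring(env, encoding="utf-8")


def parse_request(data: bytes) -> tuple[str, dict]:
    """Server-side stub: unmarshal the operation name and its parameters."""
    env = ET.fromstring(data)
    op = env.find(f"{{{SOAP_ENV}}}Body")[0]  # first child of Body is the call
    name = op.tag.split("}", 1)[1]
    params = {child.tag.split("}", 1)[1]: child.text for child in op}
    return name, params
```

A round trip such as `parse_request(build_request("GetPrice", symbol="ACME"))` yields `("GetPrice", {"symbol": "ACME"})`, which is the symmetry the two stubs in the figure rely on.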
12.2 Processes
Apache Web server
Observation: More than 52% of all 185 million Web sites run Apache.
The server is internally organized more or less according to the steps needed
to process an HTTP request.
[Figure: General organization of the Apache Web server. The Apache core passes each request through a sequence of hooks; modules link their functions to hooks, and the core calls the functions registered per hook before returning the response.]
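The hook mechanism can be modeled in a few lines. This is a toy sketch, not Apache's C API: the hook names in `HOOK_ORDER` and the three example modules are illustrative assumptions. It shows the core walking the request-processing steps in a fixed order and calling every function that modules linked to each hook.

```python
from collections import defaultdict
from typing import Callable

# Illustrative hook names, in the order the core runs them (not Apache's real API).
HOOK_ORDER = ["translate_name", "check_access", "handler", "log"]


class Core:
    """Toy core: modules link functions to hooks; the core calls them per hook."""

    def __init__(self) -> None:
        self.hooks = defaultdict(list)  # hook name -> list of registered functions

    def register(self, hook: str, func: Callable[[dict], None]) -> None:
        if hook not in HOOK_ORDER:
            raise ValueError(f"unknown hook {hook!r}")
        self.hooks[hook].append(func)

    def process(self, request: dict) -> dict:
        # One pass per request: each step (hook) may be served by several modules.
        for hook in HOOK_ORDER:
            for func in self.hooks[hook]:
                func(request)
        return request


# Hypothetical modules, each contributing one function.
def rewrite_module(req: dict) -> None:
    req["file"] = "/var/www" + req["uri"]


def content_module(req: dict) -> None:
    req["response"] = f"contents of {req['file']}"


def log_module(req: dict) -> None:
    req.setdefault("log", []).append(req["uri"])
```

Because ordering is owned by the core, modules can register in any order and still run at the right step.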
Server clusters
Essence
To improve performance and availability, WWW servers are often clustered in
a way that is transparent to clients.
[Figure: A Web server cluster. Several Web servers are connected by a LAN behind a front end, which handles all incoming requests and outgoing responses.]
Server clusters
Problem
The front end may easily get overloaded, so that special measures
need to be taken.
Transport-layer switching: Front end simply passes the TCP connection
request to one of the servers, taking some performance metric
into account.
Content-aware distribution: Front end reads the content of the
HTTP request and then selects the best server.
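The two strategies can be contrasted in a short sketch. Assumptions: the class `FrontEnd` and server names are hypothetical; "fewest open connections" is just one possible performance metric; and URL hashing is one simple content-aware policy, chosen here because sending every request for the same document to the same server lets that server's cache serve repeats.

```python
import zlib


class FrontEnd:
    """Two ways a cluster front end might pick a back-end Web server."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = servers
        self.active = {s: 0 for s in servers}  # open connections per server

    def transport_layer(self) -> str:
        # Transport-layer switching: the request content is never read; pick the
        # server with the fewest open connections (ties go to the first server).
        server = min(self.servers, key=lambda s: self.active[s])
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        # Called when a connection handled by `server` closes.
        self.active[server] -= 1

    def content_aware(self, http_request: str) -> str:
        # Content-aware distribution: read the request line and map the URL to a
        # fixed server, so repeated requests for a document hit the same cache.
        path = http_request.split(" ", 2)[1]
        return self.servers[zlib.crc32(path.encode()) % len(self.servers)]
```

The content-aware variant must first accept the TCP connection and read the HTTP request before it can decide, which is exactly the extra work that can overload the front end.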
Server clusters
Question
Why can content-aware distribution be so much better?
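[Figure: Request distribution with TCP handoff, using a switch, distributors, a dispatcher, and Web servers. 1. The switch passes the client's setup request to a distributor; 2. the dispatcher selects a server; 3. the distributor hands off the TCP connection to that Web server; 4. the switch is informed; 5. the switch forwards other messages to the selected server; 6. the server responds to the client.]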
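The bookkeeping behind the handoff can be modeled as follows. This is a simplified, hypothetical model (class names mirror the figure, but the real handoff happens inside the TCP stack): only the connection setup goes through a distributor and the dispatcher; once the switch is informed, later packets of the same connection, identified here by (client IP, client port), are forwarded straight to the chosen server. The least-loaded policy is an assumption; a content-aware dispatcher would inspect the request instead.

```python
class Dispatcher:
    """Selects a back-end server for a new connection (least-loaded here)."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = servers
        self.load = dict.fromkeys(servers, 0)

    def select(self, request: str) -> str:
        # A content-aware dispatcher could inspect `request` (e.g., the URL);
        # this sketch only balances the number of assigned connections.
        server = min(self.servers, key=self.load.__getitem__)
        self.load[server] += 1
        return server


class Distributor:
    """Receives the setup request, asks the dispatcher, and hands off the connection."""

    def __init__(self, dispatcher: Dispatcher) -> None:
        self.dispatcher = dispatcher

    def handle_setup(self, conn: tuple, request: str) -> str:
        server = self.dispatcher.select(request)  # 2. dispatcher selects server
        # 3. The TCP handoff of `conn` to `server` would happen here.
        return server


class Switch:
    """Forwards packets; only connection setup goes through a distributor."""

    def __init__(self, distributors: list[Distributor]) -> None:
        self.distributors = distributors
        self.table = {}  # (client_ip, client_port) -> selected server
        self.next = 0

    def on_packet(self, conn: tuple, payload: str) -> str:
        if conn in self.table:
            return self.table[conn]  # 5. forward other messages directly
        # 1. Pass the setup request to a distributor (round-robin).
        distributor = self.distributors[self.next % len(self.distributors)]
        self.next += 1
        server = distributor.handle_setup(conn, payload)
        self.table[conn] = server  # 4. switch is informed of the choice
        return server
```

Spreading setup requests over several distributors is what keeps the front end from becoming the bottleneck.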
12.6 Consistency and Replication
Web proxy caching
Basic idea
Sites install a separate proxy server that handles all outgoing requests.
Proxies subsequently cache incoming documents. Cache-consistency
protocols:
Always verify validity by contacting server
Age-based consistency:
$T_{\text{expire}} = \alpha \cdot (T_{\text{cached}} - T_{\text{last modified}}) + T_{\text{cached}}$
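The age-based rule can be written as a small function. The idea is that a document that had not changed for a long time before it was cached is assumed to stay valid longer. The function names are hypothetical and α is left as a parameter, since the slide does not fix its value. Worked example with α = 0.25: a document last modified at hour 0 and cached at hour 10 expires at 0.25 · (10 − 0) + 10 = 12.5 hours.

```python
def expiry_time(t_cached: float, t_last_modified: float, alpha: float) -> float:
    """Age-based expiry: T_expire = alpha * (T_cached - T_last_modified) + T_cached."""
    if t_last_modified > t_cached:
        raise ValueError("document cannot be modified after it was cached")
    return alpha * (t_cached - t_last_modified) + t_cached


def is_fresh(now: float, t_cached: float, t_last_modified: float, alpha: float) -> bool:
    """The proxy may serve its copy without contacting the server until T_expire."""
    return now < expiry_time(t_cached, t_last_modified, alpha)
```

After `T_expire`, the proxy falls back to the first protocol and verifies validity with the server.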
Web proxy caching
Basic idea (cont'd)
Cooperative caching, by which you first check your neighbors on a
cache miss
[Figure: Cooperative caching. Clients send HTTP Get requests to their Web proxy. 1. The proxy looks in its local cache; 2. on a miss, it asks neighboring proxy caches; 3. if none has the document, it forwards the request to the Web server.]
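The three-step lookup can be sketched as follows. The class `Proxy`, the `fetch_from_server` callback, and the return tags are hypothetical. Neighbors are asked only about their local caches so that a miss cannot bounce between proxies, and a document obtained from a neighbor or the server is cached locally.

```python
from typing import Callable


class Proxy:
    """Web proxy with cooperative caching: local cache, then neighbors, then server."""

    def __init__(self, name: str, fetch_from_server: Callable[[str], str]) -> None:
        self.name = name
        self.cache = {}
        self.neighbors = []
        self.fetch_from_server = fetch_from_server

    def lookup_local(self, url: str):
        return self.cache.get(url)

    def get(self, url: str) -> tuple:
        # 1. Look in the local cache.
        doc = self.cache.get(url)
        if doc is not None:
            return doc, "local"
        # 2. Ask neighboring proxy caches (local lookups only, to avoid loops).
        for n in self.neighbors:
            doc = n.lookup_local(url)
            if doc is not None:
                self.cache[url] = doc
                return doc, f"neighbor:{n.name}"
        # 3. Forward the request to the Web server.
        doc = self.fetch_from_server(url)
        self.cache[url] = doc
        return doc, "server"
```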
Replication in Web hosting systems
Observation
By and large, Web hosting systems are adopting replication to increase
performance. Much research is being done to improve their organization,
following the lines of self-managing systems.
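[Figure: A Web hosting system as a feedback control loop. Starting from an initial configuration, the system receives corrections (+/- replica placement, +/- consistency enforcement, +/- request routing) and is subject to uncontrollable parameters (disturbance/noise). Its observed output passes through metric estimation to give the measured output; analysis compares this with the reference input and produces adjustment triggers.]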
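One turn of such a loop can be sketched with a deliberately simple, hypothetical policy: latency is the observed output, an exponentially weighted moving average serves as metric estimation (smoothing out noise), and analysis adds or removes a replica when the estimate leaves a ±20% band around the target (the reference input). The weights, tolerance, and choice of latency as metric are all assumptions, not values from the text.

```python
def ewma(previous: float, sample: float, weight: float = 0.3) -> float:
    """Metric estimation: smooth noisy measurements."""
    return (1 - weight) * previous + weight * sample


def adjust_replicas(replicas: int, measured_ms: float, target_ms: float,
                    tolerance: float = 0.2, min_replicas: int = 1) -> int:
    """Analysis: compare measured output with the reference input and
    trigger a +/- replica correction when it leaves the tolerance band."""
    if measured_ms > target_ms * (1 + tolerance):
        return replicas + 1
    if measured_ms < target_ms * (1 - tolerance) and replicas > min_replicas:
        return replicas - 1
    return replicas


def control_loop(latency_samples: list[float], target_ms: float, replicas: int = 1) -> list[int]:
    """Run the loop over observed latencies; return the replica count after each step."""
    estimate = latency_samples[0]
    history = []
    for sample in latency_samples:
        estimate = ewma(estimate, sample)
        replicas = adjust_replicas(replicas, estimate, target_ms)
        history.append(replicas)
    return history
```

The tolerance band prevents the loop from oscillating on every small fluctuation; a real system would also feed the effect of each correction back into the measurements.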
Handling flash crowds
Observation
We need dynamic adjustment to balance resource usage. Flash
crowds introduce a serious problem.
[Figure: Four flash-crowd examples, panels (a) to (d), covering periods of 2 days, 2 days, 6 days, and 2.5 days.]
Server replication
Content Delivery Network
CDNs act as Web hosting services that replicate documents across the
Internet, providing their customers guarantees on high availability and
performance (example: Akamai).
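[Figure: How a CDN serves a Web page. 1. The client gets the base document from the origin server; 2. it receives the document with references to embedded documents; 3. DNS lookups for these references go through the regular DNS system; 4. they end up at the CDN DNS server, which returns the IP address of the client-best server; 5. the client gets the embedded documents from that CDN server; 6. the CDN server gets the embedded documents from the origin server if not already cached; 7. the embedded documents are returned to the client.]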
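Step 4, returning the IP address of the "client-best" server, can be sketched as a lookup table. All data here is hypothetical: the round-trip-time estimates, server names, and fallback server are invented, and the addresses come from the RFC 5737 documentation ranges. Note that in practice the CDN DNS server usually sees the address of the client's DNS resolver rather than of the client itself.

```python
import ipaddress

# Hypothetical round-trip-time estimates (ms) from client networks to CDN servers.
RTT_MS = {
    ipaddress.ip_network("198.51.100.0/24"): {"cdn-ams": 12, "cdn-nyc": 85, "cdn-sgp": 160},
    ipaddress.ip_network("203.0.113.0/24"): {"cdn-ams": 170, "cdn-nyc": 210, "cdn-sgp": 25},
}
SERVER_IP = {"cdn-ams": "192.0.2.10", "cdn-nyc": "192.0.2.20", "cdn-sgp": "192.0.2.30"}
DEFAULT_SERVER = "cdn-nyc"  # hypothetical fallback for unknown client networks


def resolve(client_ip: str) -> str:
    """Return the IP address of the 'client-best' CDN server for a client."""
    addr = ipaddress.ip_address(client_ip)
    for network, rtts in RTT_MS.items():
        if addr in network:
            best = min(rtts, key=rtts.get)  # lowest estimated RTT wins
            return SERVER_IP[best]
    return SERVER_IP[DEFAULT_SERVER]
```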
Replication of Web applications
Observation
Replication becomes more difficult when dealing with databases and
such. There is no single best solution.
Assumption
Updates are carried out at origin server, and propagated to edge
servers.
Replication of Web applications: normal
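[Figure: Alternatives for replicating Web applications. The client sends a query to the Web server and application logic on the edge-server side and receives a response. The edge server may hold a content-blind cache, a content-aware cache with a copy of the schema (full schema replication/query templates), or a database copy (full/partial data replication). The origin-server side runs the Web server, application logic, and the authoritative database with its schema.]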
Replication of Web applications
Alternative solutions
Full replication: high read/write ratio, often in combination with complex
queries.
Partial replication: high read/write ratio, but in combination with simple
queries.
Content-aware caching: Check for queries at the local database, and
subscribe for invalidations at the server. Works well with range queries
and complex queries.
Content-blind caching: Simply cache the result of previous queries.
Works great with simple queries that address unique results (e.g., no
range queries).
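The difference between the two caching alternatives can be sketched in Python. Both classes are hypothetical simplifications: the content-blind cache can only answer a query it has seen verbatim, while the content-aware cache keeps the rows of one contiguous key range, evaluates any range query contained in it locally, and drops that range when the origin server sends an invalidation. Using a single range (and refetching a covering range on a miss) is an assumption made to keep the sketch short.

```python
from typing import Callable, Optional


class ContentBlindCache:
    """Edge cache that stores query results keyed by the exact query and parameters."""

    def __init__(self, origin: Callable[[str, tuple], list]) -> None:
        self.origin = origin
        self.results = {}

    def query(self, sql: str, params: tuple = ()) -> list:
        key = (sql, params)
        if key not in self.results:  # only an identical query can be a hit
            self.results[key] = self.origin(sql, params)
        return self.results[key]


class ContentAwareRangeCache:
    """Edge cache holding the rows of one key range; answers contained range
    queries locally and relies on invalidations from the origin server."""

    def __init__(self, fetch_range: Callable[[int, int], dict]) -> None:
        self.fetch_range = fetch_range
        self.lo: Optional[int] = None
        self.hi: Optional[int] = None
        self.rows = {}

    def range_query(self, lo: int, hi: int) -> dict:
        if self.lo is None or lo < self.lo or hi > self.hi:
            # Miss: fetch a range covering both the old and the new request.
            new_lo = lo if self.lo is None else min(lo, self.lo)
            new_hi = hi if self.hi is None else max(hi, self.hi)
            self.rows = self.fetch_range(new_lo, new_hi)
            self.lo, self.hi = new_lo, new_hi
        # Hit: evaluate the query against the locally stored rows.
        return {k: v for k, v in self.rows.items() if lo <= k <= hi}

    def on_invalidate(self, key: int) -> None:
        """Invalidation pushed by the origin server for a subscribed key."""
        if self.lo is not None and self.lo <= key <= self.hi:
            self.lo = self.hi = None
            self.rows = {}
```

After caching the range 0 to 10, the content-aware cache answers a query for 2 to 6 without contacting the origin; the content-blind cache would treat it as a new query.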
Question
What can be said about replication vs. performance?
Replication Web apps.: full/partial replication
[Figure: Edge-server and origin-server sides for full/partial data replication: the edge server keeps a database copy of (part of) the authoritative database.]
Replication Web apps.: content-aware caching
[Figure: Edge-server and origin-server sides for content-aware caching: the edge server keeps a content-aware cache and a copy of the schema (full schema replication/query templates).]
Replication Web apps.: content-blind caching
[Figure: Edge-server and origin-server sides for content-blind caching: the edge server keeps the results of previous queries in a content-blind cache.]