8. Distributed Web-based Systems

advertisement
8. Distributed Web-based Systems
Sistemes Distribuïts en Xarxa (SDX)
Facultat d’Informàtica de Barcelona (FIB)
Universitat Politècnica de Catalunya (UPC)
2013/2014 Q2
Contents
• Introduction
• Architecture
• Communication
• Naming
• Synchronization
• Consistency & replication
• Fault tolerance
SDX (2) Distributed Web-based systems
Introduction
• The World Wide Web is a huge distributed
system with millions of clients and servers for
accessing hyperlinked documents
• Key issues
–
–
–
–
–
–
–
Architecture
Communication
Naming
Synchronization
Consistency & replication
Fault tolerance
Security (not covered in this course)
SDX (3) Distributed Web-based systems
Contents
• Introduction
• Architecture
• Communication
• Naming
• Synchronization
• Consistency & replication
• Fault tolerance
SDX (4) Distributed Web-based systems
Traditional Web-based systems
• Client-server architecture
– Server: store documents, run remote commands
– Browser: retrieve and display documents
SDX (5) Distributed Web-based systems
Traditional Web-based systems
• Documents can be:
– Plain text
– HTML: HyperText Markup Language
• Most common
• Allows embedding of links to other documents
– XML: Extensible Markup Language
• More flexible (i.e. it contains the names, types and
structure of the data elements within it)
– Images, Audio, Video
– Application (e.g. PDF, PS)
• Can include scripts that execute in the client
SDX (6) Distributed Web-based systems
Multitiered architectures
• Common Gateway Interface (CGI)
– Server runs a program taking user data as input
– This has now evolved to servlets and JSPs
• This has lead to three-tiered architectures
SDX (7) Distributed Web-based systems
Three-tiered architecture
• Distribute logical layers across the tiers
– User-interface, processing, and data layer
SDX (8) Distributed Web-based systems
Web server clusters
• Web servers can be clustered transparently to
clients to improve performance & availability
SDX (9) Distributed Web-based systems
Web server clusters
• Problem: The front end may easily get
overloaded, thus must be carefully designed
1. Transport-layer switch
•
Front end simply passes the data sent along the
TCP connection to one of the servers, taking
some measurement on servers load into account
2. Content-aware request distribution
– Front end inspects the content of the HTTP
request and then selects the best server
↑ Flexible, better profit of caching/replication
↓ High demand on front end
SDX (10) Distributed Web-based systems
READING REPORT
• Barroso09. The Datacenter as a Computer:
Workloads and Software Infrastructure
SDX (11) Distributed Web-based systems
Web Services
• Go beyond simple user-site interaction and
offer services to remote applications
– Communication using Internet standards
• Web Services components
– A standard way for communication (SOAP)
– A uniform data representation and exchange
mechanism (XML)
– A standard meta language to describe the services
offered (WSDL)
– A mechanism to register and locate WS-based
applications (UDDI)
SDX (12) Distributed Web-based systems
Web Services
SDX (13) Distributed Web-based systems
Contents
• Introduction
• Architecture
• Communication
• Naming
• Synchronization
• Consistency & replication
• Fault tolerance
SDX (14) Distributed Web-based systems
HyperText Transfer Protocol (HTTP)
• Simple client-server transfer protocol for
communication in traditional web systems
• One resource per HTTP request
• Based on TCP
• HTTP v1.0 uses
non-persistent
connections
– Each request to a
server requires
setting up a separate
TCP connection
SDX (15) Distributed Web-based systems
HyperText Transfer Protocol (HTTP)
• HTTP v1.1 uses
persistent
connections
– Same TCP
connection can be
used to issue
several requests
• Pipelining: Client can also issue several
requests in a row without waiting the
response for the first
SDX (16) Distributed Web-based systems
HyperText Transfer Protocol (HTTP)
• Operations supported by HTTP
Operation
Description
Head
Request to return the header of a resource
Get
Request to return a resource to the client. Can be a
document or the output of a program execution
Put
Request to store data as a resource
Post
Provide data that are to be handled by a resource
Delete
Request to delete a resource
SDX (17) Distributed Web-based systems
HyperText Transfer Protocol (HTTP)
• HTTP request message
– Client message headers:
e.g. acceptable content types
e.g. acceptable document encoding
e.g. list of client’s credentials
e.g. conditions on the latest date of
modification of the resource
SDX (18) Distributed Web-based systems
HyperText Transfer Protocol (HTTP)
• HTTP response message
– Server message header:
e.g. time of last modification
e.g. response’s expiration time
e.g. URL to redirect request
SDX (19) Distributed Web-based systems
Simple Object Access Protocol
• SOAP is the standard protocol for
communication between Web services
• Based on XML (extends XML-RPC)
↑ XML allows self-describing data (portable)
↓ Introduces performance problems
↓ Not meant to be read by human beings
• SOAP is platform independent, but bound to
an underlying protocol (a.k.a. carrier)
– Specifies bindings to underlying transfer protocols
• Currently HTTP, SMTP
• Recipient’s address is specified by the transfer protocol
SDX (20) Distributed Web-based systems
Simple Object Access Protocol
• SOAP message consists of two elements
which are jointly put inside a Envelope
1. Header (optional)
– Contains application specific information relevant
for nodes along the path from sender to receiver
•
e.g. routing, security, transactions
2. Body (mandatory)
– Contains the actual message
SDX (21) Distributed Web-based systems
SOAP example: Request
<soap:Envelope
URI of XML schema for SOAP envelopes
xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
<soap:Header>
<m:Trans xmlns:m="http://www.w3schools.com/transaction/"
soap:mustUnderstand="1">234</m:Trans>
</soap:Header>
<soap:Body>
URI of XML schema for the service description
<m:GetPrice xmlns:m="http://www.w3schools.com/prices">
<m:Item>Apples</m:Item>
</m:GetPrice>
</soap:Body>
</soap:Envelope>
SDX (22) Distributed Web-based systems
SOAP example: Response
<soap:Envelope
URI of XML schema for SOAP envelopes
xmlns:soap="http://www.w3.org/2001/12/soap-envelope"
soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
<soap:Header>
<m:Trans xmlns:m="http://www.w3schools.com/transaction/"
soap:mustUnderstand="1">234</m:Trans>
</soap:Header>
URI of XML schema for the service description
<soap:Body>
<m:GetPriceResponse xmlns:m="http://www.w3schools.com/prices">
<m:Price>1.90</m:Price>
</m:GetPriceResponse>
</soap:Body>
</soap:Envelope>
SDX (23) Distributed Web-based systems
Web Services Description Language
• WSDL is a formal language for describing
precisely the interfaces provided by a WS
– Procedure specification, data types, logical
location of services
• Based on XML
• Can be automatically translated to client and
server stubs
• Analogous to IDL in RPCs
SDX (24) Distributed Web-based systems
Contents
• Introduction
• Architecture
• Communication
• Naming
• Synchronization
• Consistency & replication
• Fault tolerance
SDX (25) Distributed Web-based systems
Naming
• Documents referred by URIs
• Two forms of Uniform Resource Identifiers
– URN: Uniform Resource Name
• Globally unique, location independent and persistent
reference to a document
– A resolution operation is needed to find it
• e.g. urn:ietf:rfc:3187
– URL: Uniform Resource Locator
• Includes information on how and where to access the
document
• e.g. http://tools.ietf.org/html/rfc3187.html
SDX (26) Distributed Web-based systems
Naming
• URL contents
– Scheme: Application-level protocol for transferring
the document (e.g. http, ftp)
– DNS name/IP address of server
– Pathname of the resource in server’s file system
– Input query to the resource
– Fragment to identify a component of the resource
http:// servername [:port] [/pathname] [?query] [#fragment]
SDX (27) Distributed Web-based systems
Naming
http://www.cs.vu.nl/home/steen/mbox
http://www.cs.vu.nl:80/home/steen/mbox
http://130.37.24.11:80/home/steen/mbox
http://www.cdk5.net
http://www.w3.org/standards/faq.html#intro
http://www.google.com/search?q=obama
scheme
server name
port
pathname
http
www.cs.vu.nl
(default)
/home/steen/mbox (none)
(none)
http
www.cs.vu.nl
80
/home/steen/mbox (none)
(none)
http
130.37.24.11
80
/home/steen/mbox (none)
(none)
http
www.cdk5.net
(default)
(default)
(none)
(none)
http
www.w3.org
(default)
standards/faq.html
(none)
intro
http
www.google.com
(default)
search
q=obama (none)
SDX (28) Distributed Web-based systems
query
fragment
Naming
• Examples of URIs
Name
Used for
Example
http
HTTP
http://www.cs.vu.nl:80/globe
mailto
E-mail
mailto:steen@cs.vu.nl
ftp
FTP
ftp://ftp.cs.vu.nl/pub/minix/README
file
Local file
file:/edu/book/work/chp/11/11
data
Inline data
data:text/plain;charset=iso-88597,%e1%e2%e3
telnet
Remote login
telnet://flits.cs.vu.nl
tel
Telephone
tel:+31201234567
Modem
Modem
modem:+31201234567;type=v32
SDX (29) Distributed Web-based systems
Web Services naming
• UDDI directory service
– Universal Description Discovery Interface
• Contains service descriptions for allowing
users browsing for relevant services
• Provides pointers to WSDL documents
• WSDL service descriptions may be looked up
by name or by attribute
• UDDI is also based on XML
SDX (30) Distributed Web-based systems
Contents
• Introduction
• Architecture
• Communication
• Naming
• Synchronization
• Consistency & replication
• Fault tolerance
SDX (31) Distributed Web-based systems
Synchronization
• Not an issue in traditional web-based systems
– Nothing to synchronize since servers do not
exchange information with other servers
– WWW is a read-mostly system. Updates are done
by a single entity
• But today, distributed authoring of
documents is emerging
– Synchronization is needed
– Let’s see how this is accomplished in collaborative
editing of documents: Google Docs
• Based on Operational Transformation (OT)
SDX (32) Distributed Web-based systems
Google Docs collaboration
• A document is stored in the server as a list of
chronological changes: revision log
– 3 basic types of changes: inserting text, deleting
text, and applying styles to a range of text
• e.g. {InsertText ‘SDX' @10}, {DeleteText @9-11},
{ApplyStyle bold @10-20}
– Append each change to the end of the revision log
A. Collaborative protocol to sync changes
1. Each editor sends changes to the server and
waits for acknowledgement
• Changes during this period are keep in a pending list
(never send more than one change at a time)
SDX (33) Distributed Web-based systems
Google Docs collaboration
2. For each change, server updates revision log,
acknowledges a new revision to the sender and
sends change to the other editors
3. Each editor transforms incoming changes against
its pending changes so that they make sense
relative to the local version of the document
• Operational Transformation to define the different ways
that InsertText, DeleteText, and ApplyStyle changes can
be paired and transformed against each other
– http://googledocs.blogspot.com.es/2010/09/whatsdifferent-about-new-google-docs_22.html
– http://googledocs.blogspot.com.es/2010/09/whats-differentabout-new-google-docs_23.html
SDX (34) Distributed Web-based systems
Contents
• Introduction
• Architecture
• Communication
• Naming
• Synchronization
• Consistency & replication
• Fault tolerance
SDX (35) Distributed Web-based systems
Client-side caching
1. Browser’s cache
2. Web proxy caching
– Site installs a separate proxy server that passes
all requests from local clients to the Web servers
– Proxy subsequently caches incoming documents
a) Hierarchical caches
•
•
Place caches covering a region (even a country)
On cache miss, the request is moved upward
b) Cooperative caching
•
On cache miss, the web proxy checks the neighboring
proxies
SDX (36) Distributed Web-based systems
Cooperative caching
SDX (37) Distributed Web-based systems
Cache-consistency protocols
1. Verify validity by contacting web server
↑ Strong consistency
↓ Server must be contacted for each request
2. Age-based consistency
– Used by Squid Web proxy
– Assign an expiration time to each document
•
Depends on how long ago the document was last
modified when it is cached
– Document is considered valid until this time
↑ Better performance
↓ Weaker consistency
SDX (38) Distributed Web-based systems
Contents
• Introduction
• Architecture
• Communication
• Naming
• Synchronization
• Consistency & replication
– Content Distribution Networks (CDN)
• Fault tolerance
SDX (39) Distributed Web-based systems
Content Distribution Networks
• Goal: Improve content delivery performance
& reliability without tremendous infrastructure
investments by the content providers
• Idea: distributed web server
– CDN replicates contents across the Internet and
delivers them to users on behalf of origin sites
• Typically high demand content e.g. graphics, media, etc.
 Reduces the load on origin servers
• Can handle better flash crowds
 Reduces client perceived response time
• Bring content closer to end-users
SDX (40) Distributed Web-based systems
Content Distribution Networks
• Using replication introduces some technical
challenges:
– What content to replicate?
– Where to place replicated content?
– How to enforce consistency between replicas?
 How to route a client request to a replica?
• How to find the requested content?
• How to choose between different replicas of the
content?
 Redirection schemes
SDX (41) Distributed Web-based systems
Redirection schemes
• URL rewriting
– Origin server rewrites object URL links in returned
pages to redirect clients to different CDN servers
• DNS redirection
– Origin server modifies its DNS zone file; DNS
resolution is now controlled by CDN
– Redirect requests to CDN servers according to a
given policy (least loaded server, closest server to
client, etc.)
– Full-site or partial-site content delivery
• Hybrid schemes can be also used
SDX (42) Distributed Web-based systems
URL rewriting example
index.html
<HTML>
<BODY>
<IMG SRC=“www.akamai1.net/www.yahoo.com/img1.gif”>
<IMG SRC=“www.akamai2.net/www.yahoo.com/img2.gif”>
<IMG SRC=“www.akamai3.net/www.yahoo.com/img3.gif”>
</BODY>
</HTML>
SDX (43) Distributed Web-based systems
DNS redirection example
Origin server
111.222.100.1
www.yahoo.com/index.html
GET index.html
10.20.30.1
<HTML>
10.20.30.1 (no 111.222.100.1)
yahoo.com IP?
10.20.30.4
10.20.30.2
10.20.30.3
SDX (44) Distributed Web-based systems
DNS server controlled
by CDN
Example: Akamai
• World leader in CDN
– 100,000 servers in 75 countries directly connected
within nearly 1,000 networks
• Uses a hybrid redirection scheme
– DNS redirection
– URL rewriting
• Akamai Resource Locator (ARL): ‘Akamized’ URL
http://www.foo.com/images/logo.gif
http://a9.g.akamai.net/7/9/21/aaa7a80f016a2c/www.foo.com/images/logo.gif
Serial number
Akamai domain
Type code
Content provider code
Object data
Original URL
SDX (45) Distributed Web-based systems
Example: Akamai
1. DNS lookup for www.xyz.com
2. IP address for optimal Akamai
server returned
3. Browser requests HTML from
optimal Akamai server
4. Akamai server contacts central
xyz.com server over the
Internet if necessary
5. Akamai server assembles HTML
and returns to browser
6. Browser resolves embedded
links, obtaining IP addresses for
optimal Akamai servers hosting
those objects
7. Browser retrieves objects
– e.g. dynamic content
SDX (46) Distributed Web-based systems
Contents
• Introduction
• Architecture
• Communication
• Naming
• Synchronization
• Consistency & replication
• Fault tolerance
SDX (47) Distributed Web-based systems
Fault tolerance
• Mainly achieved through client-side caching
and server replication
SDX (48) Distributed Web-based systems
Summary
• WWW as a distributed system
– From traditional client-server architectures for
fetching hyperlinked documents to Web Services
– Use of standardized protocols
• Communication: HTTP, SOAP
• Naming: URI, UDDI
– Caching and replication for improving performance
and fault tolerance
• e.g. web proxy caching, CDN
SDX (49) Distributed Web-based systems
Download