Distributed Systems: Principles and Paradigms
Chapter 12 (version October 15, 2007)

Maarten van Steen
Vrije Universiteit Amsterdam, Faculty of Science
Dept. Mathematics and Computer Science
Room R4.20. Tel: (020) 598 7784
E-mail: steen@cs.vu.nl, URL: www.cs.vu.nl/~steen/

01 Introduction
02 Architectures
03 Processes
04 Communication
05 Naming
06 Synchronization
07 Consistency and Replication
08 Fault Tolerance
09 Security
10 Distributed Object-Based Systems
11 Distributed File Systems
12 Distributed Web-Based Systems
13 Distributed Coordination-Based Systems

Distributed Web-Based Systems / 12.1 Architecture

Distributed Web-Based Systems

Essence: The WWW is a huge client-server system with millions of servers, each server hosting thousands of hyperlinked documents:

[Figure: a browser on the client machine and a Web server on the server machine. 1. Get document request (HTTP). 2. Server fetches document from local file. 3. Response.]

• Documents are generally represented in text (plain text, HTML, XML)
• Alternative types: images, audio, video, but also applications (PDF, PS)
• Documents may contain scripts that are executed by the client-side software

Multi-tiered Architectures

Observation: Very early on, Web sites were organized into three tiers:

[Figure: a Web server (HTTP request handler), a CGI process (CGI program), and a database server. 1. Get request. 3. Start process to fetch document. 4. Database interaction. 5. HTML document created. 6. Return result.]

Web Services

Observation: At a certain point, people started recognizing that the Web was about more than just user ↔ site interaction: sites could offer services to other sites ⇒ standardization is then badly needed.

[Figure: Web services. A client application and a server application each use a stub generated from the WSDL description and talk through their communication subsystems using SOAP. The service description (WSDL) is published at a directory service (UDDI), where a service can be looked up.]

Clients: Web browsers

Observation: Browsers form the Web's most important client-side software. They used to be simple, but that is long ago.

[Figure: browser components: user interface, browser engine, rendering engine, network communication, client-side script interpreter, HTML/XML parser, display back end.]

Apache Web Server

Observation: More than 70% of all Web sites are based on Apache. The server is internally organized more or less according to the steps needed to process an HTTP request:

[Figure: the Apache core offers a series of hooks between request and response. Modules provide functions, each function is linked to a hook, and the core calls the functions registered per hook.]

Server Clusters (1/2)

Essence: To improve performance and availability, WWW servers are often clustered in a way that is transparent to clients:

[Figure: several Web servers on a LAN behind a front end. The front end handles all incoming requests and outgoing responses.]

Problem: The front end may easily get overloaded, so that special measures need to be taken.

Transport-layer switching: Front end simply passes the TCP request to one of the servers, taking some performance metric into account.

Content-aware distribution: Front end reads the content of the HTTP request and then selects the best server.

Server Clusters (2/2)

Question: Why can content-aware distribution be so much better?

[Figure: a client, a switch, distributors, a dispatcher, and Web servers. 1. Pass setup request to a distributor. 2. Dispatcher selects server. 3. Hand off TCP connection. 4. Inform switch. 5. Forward other messages. 6. Server responses.]

Communication (1/2)

Essence: Communication in the Web is generally based on HTTP, a relatively simple client-server transfer protocol with the following request messages:

Operation   Description
Head        Request to return the header of a document
Get         Request to return a document to the client
Put         Request to store a document
Post        Provide data that are to be added to a document (collection)
Delete      Request to delete a document

Communication (2/2)

Message headers (C = sent by client, S = sent by server):

Header                C/S   Contents
Accept                C     The type of documents the client can handle
Accept-Charset        C     The character sets that are acceptable for the client
Accept-Encoding       C     The document encodings the client can handle
Accept-Language       C     The natural language the client can handle
Authorization         C     A list of the client's credentials
WWW-Authenticate      S     Security challenge the client should respond to
Date                  C+S   Date and time the message was sent
ETag                  S     The tags associated with the returned document
Expires               S     The time for how long the response remains valid
From                  C     The client's e-mail address
Host                  C     The host name (and optional port) of the document's server
If-Match              C     The tags the document should have
If-None-Match         C     The tags the document should not have
If-Modified-Since     C     Tells the server to return a document only if it has been modified since the specified time
If-Unmodified-Since   C     Tells the server to return a document only if it has not been modified since the specified time
Last-Modified         S     The time the returned document was last modified
Location              S     A document reference to which the client should redirect its request
Referer               C     Refers to client's most recently requested document
Upgrade               C+S   The application protocol sender wants to switch to
Warning               C+S   Information about status of the data in the message
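
As a rough illustration of how the conditional headers (If-None-Match, If-Modified-Since) relate to ETag and Last-Modified, the following Python sketch decides whether a server could answer with 304 (Not Modified) instead of resending the document. The function name conditional_get_status and the sample tag values are hypothetical, and real servers implement more of the precedence rules than this sketch does.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get_status(headers, etag, last_modified):
    """Return 304 if the client's cached copy is still valid, else 200.

    headers:       dict of request headers sent by the client
    etag:          current tag of the document at the server
    last_modified: timezone-aware time the document was last changed
    """
    # If-None-Match takes precedence: the document should NOT have these tags.
    if_none_match = headers.get("If-None-Match")
    if if_none_match is not None:
        tags = [t.strip() for t in if_none_match.split(",")]
        return 304 if "*" in tags or etag in tags else 200

    # Otherwise fall back to the time-based check.
    if_modified_since = headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        return 304 if last_modified <= since else 200

    # Unconditional request: always return the document.
    return 200


# Assumed sample document state (hypothetical values).
doc_etag = '"v2"'
doc_modified = datetime(2007, 10, 15, 12, 0, 0, tzinfo=timezone.utc)

fresh = {"If-Modified-Since": format_datetime(doc_modified, usegmt=True)}
stale = {"If-None-Match": '"v1"'}

print(conditional_get_status(fresh, doc_etag, doc_modified))  # 304
print(conditional_get_status(stale, doc_etag, doc_modified))  # 200
```

The first request carries a timestamp equal to the document's last modification, so the cached copy is still valid. The second carries an outdated tag, so the document is returned in full.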

SOAP

Simple Object Access Protocol: Based on XML, this is the standard protocol for communication between Web services.

• SOAP is bound to an underlying protocol (i.e., it is not independent from its carrier)
• Conversational exchange style: Send a document one way, get a filled-in response back.
• RPC-style exchange: Used to invoke a Web service.

A Note on XML

Observation: XML has the advantage of allowing self-describing documents. Full stop (i.e., it introduces performance problems and is not meant to be read by human beings).

    <env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
      <env:Header>
        <n:alertcontrol xmlns:n="http://example.org/alertcontrol">
          <n:priority>1</n:priority>
          <n:expires>2001-06-22T14:00:00-05:00</n:expires>
        </n:alertcontrol>
      </env:Header>
      <env:Body>
        <m:alert xmlns:m="http://example.org/alert">
          <m:msg>Pick up Mary at school at 2pm</m:msg>
        </m:alert>
      </env:Body>
    </env:Envelope>

Naming: URL

URL: Uniform Resource Locator tells how and where to access a resource.

      Scheme   Host name      Port   Pathname
(a)   http     www.cs.vu.nl          /home/steen/mbox
(b)   http     www.cs.vu.nl   80     /home/steen/mbox
(c)   http     130.37.24.11   80     /home/steen/mbox

Written out, these are http://www.cs.vu.nl/home/steen/mbox, http://www.cs.vu.nl:80/home/steen/mbox, and http://130.37.24.11:80/home/steen/mbox.

Examples:

Scheme   Used for       Example
http     HTTP           http://www.cs.vu.nl:80/globe
mailto   Mail           mailto:steen@cs.vu.nl
ftp      FTP            ftp://ftp.cs.vu.nl/pub/minix/README
file     Local file     file:/edu/book/work/chp/11/11
data     Inline data    data:text/plain;charset=iso-8859-7,%e1%e2%e3
telnet   Remote login   telnet://flits.cs.vu.nl
tel      Telephone      tel:+31201234567
modem    Modem          modem:+31201234567;type=v32

Synchronization: WebDAV

Problem: There is a growing need for collaborative authoring of Web documents, but bare-bones HTTP can't help here.

Solution: Web Distributed Authoring and Versioning.

• Supports exclusive and shared write locks, which operate on entire documents
• A lock is passed by means of a lock token; the server registers the client(s) holding the lock
• Clients modify the document locally and post it back to the server along with the lock token

Note: There is no specific support for crashed clients holding a lock.

Web Proxy Caching

Basic idea: Sites install a separate proxy server that handles all outgoing requests. Proxies subsequently cache incoming documents.

Cache-consistency protocols:

• Always verify validity by contacting server
• Age-based consistency: $T_{expire} = \alpha \cdot (T_{cached} - T_{last\_modified}) + T_{cached}$
• Cooperative caching, by which you first check your neighbors on a cache miss:

[Figure: three Web proxies, each with its own cache and clients, and a Web server. On an HTTP Get request from a client: 1. Look in local cache. 2. Ask neighboring proxy caches. 3. Forward request to Web server.]

Replication in Web Hosting Systems

Observation: By and large, Web hosting systems are adopting replication to increase performance. Much research is done to improve their organization. This follows the lines of self-managing systems:

[Figure: feedback control loop around a Web hosting system. Starting from an initial configuration, corrections (+/-) are applied to replica placement, consistency enforcement, and request routing. The system is subject to uncontrollable parameters (disturbance / noise). Its observed output goes through metric estimation; the measured output is compared with a reference input in an analysis step that produces adjustment triggers.]

Handling Flash Crowds

Observation: We need dynamic adjustment to balance resource usage.
Flash crowds introduce a serious problem:

[Figure: four request-load plots: (a) 2 days, (b) 2 days, (c) 6 days, (d) 2.5 days.]

Server Replication

Content Delivery Network: CDNs act as Web hosting services to replicate documents across the Internet, providing their customers guarantees on high availability and performance (example: Akamai).

[Figure: a client, an origin server, the regular DNS system, a CDN DNS server, and a CDN server with a cache. 1. Get base document. 2. Document with refs to embedded documents. 3 and 4. DNS lookups; the CDN DNS server returns the IP address of the client-best server. 5. Get embedded documents. 6. Get embedded documents (if not already cached). 7. Embedded documents.]

Question: How would consistency be maintained in this system?

Replication of Web Apps. (1/3)

Observation: Replication becomes more difficult when dealing with databases and such. No single best solution.

[Figure: a client sends a query to a server on the edge-server side, which exchanges queries and responses with a server on the origin-server side. Edge-server side: content-blind cache, content-aware cache, database copy, schema. Origin-server side: authoritative database, schema. Labels: full/partial data replication; full schema replication/query templates.]

Assumption: Updates are carried out at origin server, and propagated to edge servers.

Replication of Web Apps. (2/3)

• Full replication: high read/write ratio, often in combination with complex queries. Note: replication may actually degrade performance when the R/W ratio goes down.
• Partial replication: high read/write ratio, but in combination with simple queries

Replication of Web Apps. (3/3)

• Content-aware caching: Check for queries at local database, and subscribe for invalidations at the server. Works well with range queries and complex queries.
• Content-blind caching: Simply cache the result of previous queries. Works great with simple queries that address unique results (e.g., no range queries).

Security: TLS (SSL)

Transport Layer Security: Modern version of the Secure Socket Layer (SSL), which "sits" between transport layer and application protocols. Relatively simple protocol that can support mutual authentication using certificates:

[Figure: message exchange between client and server. 1. Possibilities. 2. Choices. 3. $[K_S^+]_{CA}$. 4. $[K_C^+]_{CA}$. 5. $K_S^+([R]_C)$.]
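
To make "mutual authentication using certificates" slightly more concrete, here is a minimal sketch using Python's standard ssl module. It only configures the two endpoints and does not open a connection. The helper names make_server_context and make_client_context are hypothetical, and the certificate file names in the comments are placeholders, so those loading calls are left commented out.

```python
import ssl


def make_server_context(require_client_cert=True):
    """Server side: present a certificate and, optionally, demand one."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if require_client_cert:
        # Mutual authentication: the handshake fails unless the client
        # presents a certificate signed by a CA the server trusts.
        ctx.verify_mode = ssl.CERT_REQUIRED
    # Placeholder file names, to be replaced by real key material:
    # ctx.load_cert_chain("server-cert.pem", "server-key.pem")
    # ctx.load_verify_locations("trusted-ca.pem")
    return ctx


def make_client_context():
    """Client side: always verify the server's certificate and host name."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Placeholder file names, needed only when the server asks for a
    # client certificate:
    # ctx.load_cert_chain("client-cert.pem", "client-key.pem")
    # ctx.load_verify_locations("trusted-ca.pem")
    return ctx


server_ctx = make_server_context()
client_ctx = make_client_context()
print(server_ctx.verify_mode == ssl.CERT_REQUIRED)  # True
print(client_ctx.check_hostname)                    # True
```

Requiring a client certificate is a server-side choice. With require_client_cert=False the same sketch gives the more common setup in which only the server is authenticated.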