Distributed Systems
Principles and Paradigms

Maarten van Steen
VU Amsterdam, Dept. Computer Science
steen@cs.vu.nl

Chapter 12: Distributed Web-Based Systems
Version: December 10, 2012


12.1 Architecture

Distributed Web-based systems

Essence
The WWW is a huge client-server system with millions of servers, each hosting thousands of hyperlinked documents.

- Documents are often represented as text (plain text, HTML, XML).
- Alternative types: images, audio, video, applications (PDF, PS).
- Documents may contain scripts, executed by client-side software.

[Figure: a browser on the client machine and a Web server (running on the OS) on the server machine. Steps: 1. Get document request (HTTP); 2. Server fetches document from local file; 3. Response.]

Multi-tiered architectures

Observation
Very early on, Web sites were organized into three tiers.

[Figure: three tiers: Web server (HTTP request handler), CGI process (CGI program), and database server. Steps shown: 1. Get request; 3. Start process to fetch document; 4. Database interaction; 5. HTML document created; 6. Return result.]

Web services

Observation
At a certain point, people started recognizing that there was more to the Web than just user ↔ site interaction: sites could offer services to other sites ⇒ standardization is then badly needed.
[Figure: a client application on the client machine and a server application on the server machine, each with a stub on top of a communication subsystem, communicating via SOAP. Labels: "Publish service" and "Look up a service" at the directory service (UDDI), which holds service descriptions (WSDL); "Generate stub from WSDL description" on both the client and the server side.]


12.2 Processes

Apache Web server

Observation
More than 52% of all 185 million Web sites run Apache. The server is internally organized more or less according to the steps needed to process an HTTP request.

[Figure: modules contain functions; each function is linked to a hook in the Apache core; the core calls the functions registered per hook while turning a request into a response.]

Server clusters

Essence
To improve performance and availability, WWW servers are often clustered in a way that is transparent to clients.

[Figure: several Web servers on a LAN behind a front end. The front end handles all incoming requests and outgoing responses.]

Server clusters

Problem
The front end may easily get overloaded, so special measures need to be taken.

- Transport-layer switching: the front end simply passes the TCP request to one of the servers, taking some performance metric into account.
- Content-aware distribution: the front end reads the content of the HTTP request and then selects the best server.

Server clusters

Question
Why can content-aware distribution be so much better?

[Figure: client, switch, distributors, dispatcher, and Web servers. Steps: 1. Pass setup request to a distributor; 2. Dispatcher selects server; 3. Hand off TCP connection; 4. Inform switch; 5. Forward other messages; 6. Server responses.]


12.6 Consistency and Replication

Web proxy caching

Basic idea
Sites install a separate proxy server that handles all outgoing requests. Proxies subsequently cache incoming documents.

Cache-consistency protocols:
- Always verify validity by contacting the server.
- Age-based consistency: $T_{\text{expire}} = \alpha \cdot (T_{\text{cached}} - T_{\text{last modified}}) + T_{\text{cached}}$

Web proxy caching

Basic idea (cont'd)
Cooperative caching, in which a proxy first checks its neighbors on a cache miss.

[Figure: three Web proxies, each with a cache and a group of clients, and one Web server. On an HTTP Get request: 1. Look in local cache; 2. Ask neighboring proxy caches; 3. Forward request to Web server.]

Replication in Web hosting systems

Observation
By and large, Web hosting systems are adopting replication to increase performance. Much research is being done to improve their organization. This work follows the lines of self-managing systems.

[Figure: feedback control loop around a Web hosting system. Inputs: initial configuration, reference input, uncontrollable parameters (disturbance / noise). Loop: observed output, metric estimation, measured output, analysis, adjustment triggers. Corrections (+/-): replica placement, consistency enforcement, request routing.]

Handling flash crowds

Observation
We need dynamic adjustment to balance resource usage. Flash crowds introduce a serious problem.
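The age-based consistency rule from the Web proxy caching slide can be sketched as follows. This is a minimal illustration, not a proxy implementation: the function names `expiry_time` and `is_fresh` are hypothetical, timestamps are assumed to be seconds on a common clock, and the value of α in the usage lines is chosen only to keep the arithmetic simple.

```python
def expiry_time(t_cached: float, t_last_modified: float, alpha: float) -> float:
    """T_expire = alpha * (T_cached - T_last_modified) + T_cached.

    A document that had been unchanged for a long time when it was cached
    is assumed to stay valid longer than one that changed recently.
    """
    if t_cached < t_last_modified:
        raise ValueError("document cannot be cached before it was last modified")
    return alpha * (t_cached - t_last_modified) + t_cached


def is_fresh(now: float, t_cached: float, t_last_modified: float, alpha: float) -> bool:
    """Return True while the cached copy may be served without contacting the server."""
    return now < expiry_time(t_cached, t_last_modified, alpha)


# Usage (illustrative numbers): last modified at t=0, cached at t=1000, alpha=0.5.
# The copy is considered valid until t=1500.
deadline = expiry_time(t_cached=1000.0, t_last_modified=0.0, alpha=0.5)
serve_from_cache = is_fresh(now=1400.0, t_cached=1000.0, t_last_modified=0.0, alpha=0.5)
```

A larger α trusts old documents for longer and saves more validation traffic, at the cost of a higher chance of serving a stale copy.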
[Figure: four panels: (a) 2 days, (b) 2 days, (c) 6 days, (d) 2.5 days.]

Server replication

Content Delivery Network
CDNs act as Web hosting services that replicate documents across the Internet, providing their customers with guarantees on high availability and performance (example: Akamai).

[Figure: client, origin server, regular DNS system, CDN DNS server, and CDN server with cache. Steps: 1. Get base document; 2. Document with refs to embedded documents; 3, 4. DNS lookups through the regular DNS system and the CDN DNS server (return IP address of client-best server); 5. Get embedded documents; 6. Get embedded documents (if not already cached); 7. Embedded documents.]

Replication of Web applications

Observation
Replication becomes more difficult when dealing with databases and the like. There is no single best solution.

Assumption
Updates are carried out at the origin server and propagated to edge servers.

Replication of Web applications: normal

[Figure: a client sends a query to, and receives a response from, the edge-server side. Edge-server side: Web server, application logic, content-blind cache, database copy, content-aware cache, schema. Origin-server side: Web server, application logic, authoritative database, schema. Links between the two sides: full/partial data replication; full schema replication / query templates.]

Replication of Web applications

Alternative solutions
- Full replication: high read/write ratio, often in combination with complex queries.
- Partial replication: high read/write ratio, but in combination with simple queries.
- Content-aware caching: check for queries at the local database, and subscribe for invalidations at the server. Works well with range queries and complex queries.
- Content-blind caching: simply cache the results of previous queries. Works well with simple queries that address unique results (e.g., no range queries).

Question
What can be said about replication vs. performance?

Replication Web apps.: full/partial replication

[Figure: same diagram as on the slide "Replication of Web applications: normal".]

Replication Web apps.: content-aware caching

[Figure: same diagram as on the slide "Replication of Web applications: normal".]

Replication Web apps.: content-blind caching

[Figure: same diagram as on the slide "Replication of Web applications: normal".]
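To make the content-blind caching alternative concrete, here is a minimal sketch of an edge-side cache that stores query results keyed only by the query text and its parameters. Everything named here (`ContentBlindCache`, `run_at_origin`, `fake_origin`, the table contents) is hypothetical and exists only for illustration; invalidation after updates at the origin server is left out.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, str]
Params = Tuple[int, ...]


class ContentBlindCache:
    """Caches results by exact (query, params) match, without inspecting content."""

    def __init__(self, run_at_origin: Callable[[str, Params], List[Row]]) -> None:
        self._run_at_origin = run_at_origin  # assumed callback to the origin server
        self._results: Dict[Tuple[str, Params], List[Row]] = {}
        self.hits = 0
        self.misses = 0

    def query(self, sql: str, params: Params = ()) -> List[Row]:
        key = (sql, tuple(params))
        if key in self._results:
            self.hits += 1
            return self._results[key]
        # Miss: forward the query to the origin and remember the answer.
        self.misses += 1
        rows = self._run_at_origin(sql, key[1])
        self._results[key] = rows
        return rows


def fake_origin(sql: str, params: Params) -> List[Row]:
    """Stand-in for the authoritative database (illustrative data only)."""
    table = {1: "alice", 2: "bob", 3: "carol"}
    if "BETWEEN" in sql:
        low, high = params
        return [(k, v) for k, v in table.items() if low <= k <= high]
    return [(params[0], table[params[0]])]


cache = ContentBlindCache(fake_origin)
cache.query("SELECT * FROM users WHERE id = ?", (1,))               # miss
cache.query("SELECT * FROM users WHERE id = ?", (1,))               # hit: identical query
cache.query("SELECT * FROM users WHERE id BETWEEN ? AND ?", (1, 2)) # miss
cache.query("SELECT * FROM users WHERE id BETWEEN ? AND ?", (1, 3)) # miss despite overlap
```

The last two calls show why range queries benefit little: their results overlap, but the cache cannot tell, because it only compares queries for equality. A content-aware cache, which checks queries against a local database, is the alternative the slides suggest for that case.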