Advanced Internet and Web Systems
C. Edward Chow

Outline of the Talk
  Syllabus
  Introduction to WWW Systems
  Survey of Web Cluster Systems
  Survey of Caching Techniques
  Server Selection and Load Balancing

Advanced Internet & Web Systems chow 2

Introduction to WWW Systems
  Web Server: hosts web pages; web pages are retrieved over the Internet using the HTTP protocol
  Web Client: browser
  Web Authoring System: create web pages, publish web pages (inputs: scanner, video capture, sound card)
  Web page: document written in HTML

What is Unique in WWW?
  Hyperlink: uses Hypertext Markup Language (HTML) to describe the document in ASCII text (extended to ISO-8859-1)
  Naming scheme: names objects in the web with a Uniform Resource Locator (URL), with syntax
      protocol://domain_name/<uri or path name>
  HTTP: HyperText Transfer Protocol
    a simple request-response protocol for transferring HTML documents
    ASCII text based (not binary, therefore easy to debug)

Web Authoring System
  Text editor: type in HTML <tag> and content
  HTML editor: works like a normal word processor; the user does not have to know much HTML syntax, e.g., Netscape Page Composer, MS FrontPage
  FrontPage goes a step further by providing templates and hyperlink management functions
  Dreamweaver allows site management (upload/download); its editor understands PHP, XSLT, XML, CSS, and JavaScript syntax
  Most desktop publishing software and word processors have built-in converters from their internal format to HTML, for example FrameMaker and Office 2007

Web Delivery Systems
  Deliver web documents efficiently and reliably to the web clients.
  Content Distribution and Content Delivery
  Performance is decided by
    Web server performance
    Network path performance
    Client browser performance
  Use multiple physical servers (server farm), and multiple server farms across the wide area.
  A new generation of proxy servers/content switches is emerging.

Content Delivery Network (CDN)
  [Figure: clients on many ISP networks (@Home, PSINet, Sprint, QWest, UUnet, Mind Spring, Gloobix) send a huge number of requests to one host server, causing slow response and server crash.]

Content Delivery Problems
  http://www.akamai.com

Use Client Cache / Client Side Cache Server
  [Figure: same networks, with client caches and a client side cache server; clients get fast response and the host server sees fewer requests.]

Use Mirror Sites
  Needs improvement by guiding the selection of mirror servers with server load/network bandwidth measurement
  [Figure: same networks, with mirror sites; clients get fast response and the host server sees fewer requests.]

Edge Network Cache Servers
  [Figure: same networks, with a cache server in each network in addition to mirror sites, client caches, and a client side cache server; clients get fast response and the host server sees fewer requests.]

Architecture solutions for scalable Web-server systems (Fig. 1)
Fig. 2. Model architecture for a locally distributed Web system
Fig. 3. Architecture of a cluster-based Web system
Fig. 4. Architecture of a virtual Web cluster
Fig. 5. Architecture of a distributed Web system

Content Distribution
  Secure, automated content/application distribution to single-server, multiple-server, or wide-area Internet sites.
  Provide replication, synchronization, staged rollout and rollback.
  With revision control, transmit only updates.
  User-defined file distribution profiles/rules

Content Delivery Problem
  Cache location problem: Where to put cache servers? How many are needed?
  When, where, and how to push/deliver the content?
  How about dynamic content?

Akamai Edge Delivery Service
  Date      # of Edge Servers   # of Networks   # of Countries
  11/2000   6000                335             54
  1/2009    15000               1000            69
  Peering bottleneck problem: access traffic is evenly spread over 7400+ networks (no one over 5%; most << 1%), so edge servers need to be placed in many networks.
  Akamai delivers 10-20% of Internet traffic, 10B interactions/day.
  1 hop to 85% of the world's Internet users.
  http://www.akamai.com/html/technology/nocc.html
  http://www.akamai.com/html/technology/medium_res.asx

F5 Web System Product
  [Figure: a user and its local DNS reach, over the Internet, Site I (newyork.domain.com), Site II (losangeles.domain.com), Site III (tokyo.domain.com), and london.domain.com. Components shown: router, 3-DNS, BIG-IP, server array, GLOBAL-SITE, webmaster.]

BIG/ip - Delivers High Availability
  E-commerce - ensures sites are not only up and running, but taking orders
  Fault tolerance - eliminates single points of failure
  Content availability - verifies servers are responding with the correct content
  Directory & authentication - load balances multiple directory and/or authentication services (LDAP, Radius, and NDS)
  Portals/search engines - using EAV, administrators perform key-word searches
  Legacy systems - load balances services to multiple interactive services
  Gateways - load balances gateways (SAA, SNA, etc.)
  E-mail (POP, IMAP, SendMail) - balances traffic across a large number of mail servers

3DNS Intelligent Load Balancing
  QoS load balancing: Quality of Service load balancing is the ability to select and apply different load balancing methods for different users or request types
  Modes of load balancing:
    Round Robin, Ratio, Random, Least Connections, Round Trip Time, Packet Rate, HOPS, Completion Rate (Packet Loss), Global Availability, Topology Distribution, LDNS Round Robin, Access Control, Dynamic Ratio, User-defined Quality-of-Service
  (Other labels on the slide: E-Commerce, BIG/ip)

GLOBAL-SITE Replicate Multiple Servers and Sites
  File archiving engine and scheduler for automated site and server replication
  BIG-IP controls server availability during replication and synchronization
    Graceful shutdown for update
    Update in a group/scheduled manner
  FTP transfers files from GLOBAL-SITE to target servers (agent free, scalable)
  RCE for source control
  No client side software
  Complete, turnkey system (appliance)
  (adapted from F5 presentation)

Intel NetStructure
  Routing based on XML tag (e.g., giving preferred treatment to buyers, large volume)
  http://www.intel.com/network/solutions/xml.htm
  1. Compared to SUN E450 server

Simple Web Access Example: Step 1
  Someone requests a document using a browser (web client) on a computer connected to the Internet.
  In a browser window, type in a URL: http://news.netcraft.com/archives/web_server_survey.html
  Equivalent of
      % telnet www.netcraft.co.uk 80 > out
      GET /survey/ HTTP/1.0<cr>
      <cr>
  Here <cr> is a "carriage return" entered by pressing the "Enter" key.
  The browser
    parses the URL and obtains the domain name, www.netcraft.co.uk
    asks the Domain Name Server (DNS) to translate the domain name to an IP address
    with the IP address, the client computer sets up an HTTP connection to the server

Computer Network
  Local Area Network (LAN): a privately owned network within a single building or campus of up to a few kilometers in size (Tanenbaum).
  Wide Area Network (WAN): a network that spans a large geographical area, often a country or continent, and connects LANs or MANs. It consists of transmission lines (called circuits, channels, or trunks) and switching elements (called switching nodes, data switching exchanges, or routers).
  [Figure: a client host and a web server host sit on separate LANs, each with a DNS server; routers connect the LANs across the communication subnet.]

Protocol and Protocol Layer
  Protocol: a set of rules for achieving a global objective exercised by geographically distributed nodes. (Robert Gallager, Prof. EE MIT)
  Layer          Function
  Application    user-oriented functions
  Presentation   syntactic, format compatibility
  Session        value-added features to transport connection
  Transport      reliable transfer between processes at different hosts
  Network        establish connection between different hosts in a net
  Data Link      reliable transfer across a link between neighboring hosts
  Physical       attachment to network or comm. line
  (The slide also groups the layers under the labels Processing, Communication, End-to-End, and Networking.)

Protocol Data Encapsulation
  A message between entities consists of two parts: header and payload.
  Data from the upper layer is put in the payload.
  The header contains info that allows the receiving end to deliver the data to the right upper-layer entity.
  TH: Transport Layer Header
  [Figure: on the way down the stack (Application, Presentation, Session, Transport, Network, Data Link, Physical), each layer adds its header (AH, PH, SH, TH, NH, LH) in front of the data from the layer above; the data link layer also adds a trailer (LT). The frame crosses the physical transmission medium and is unwrapped in reverse on the receiving side.]

Internet Protocol Layer Interface
  [Figure: application-layer entities (FTP on port 21, SMTP email on port 25, DNS on port 53, W3 server on port 80; also labeled: MINS Server, Web Client1, ports 2001 and 2002) attach through ports, the transport-layer SAPs, to TCP and UDP, which run over the IP network layer.]
  SAP: Service Access Point

Simple Web Access Example: Step 2
  The browser sends the following character string to the server:
      GET /survey/ HTTP/1.0
      User-agent: Mosaic for X windows/2.4
      Accept: text/plain
      Accept: text/html
      Accept: image/*
  The httpd server
    parses the request according to HTTP protocol 1.0
    interprets the rest of the metainfo for browser capabilities
    maps /survey/ to c:/InetPub/wwwroot/survey/default.htm, a file path in its file system, according to the server configuration
    retrieves c:/InetPub/wwwroot/survey/default.htm or index.html
    sends the information back using HTTP/1.0 format

Simple Web Access Example: Step 3
  The server replies using HTTP/1.0 format:
      HTTP/1.0 200 Document follows
      Date: Tue, 19 Jan 1999 18:10:20 GMT
      Server: NCSA/1.5
      Content-type: text/html

      <html>
      <head><title>Netcraft Web Server Survey</title></head>
  The server closes the file, sets a timeout, and waits for subsequent requests, such as images/MIDI files referenced in the web page (this is called a keep-alive connection). When the timeout expires, it closes the connection.
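
The request in Step 2 and the reply in Step 3 are plain ASCII text, so they are easy to build and take apart by hand. The sketch below is only an illustration under stated assumptions: the helper names build_request and parse_response are hypothetical, no network connection is made, and the sample reply is the one shown on the slide.

```python
def build_request(path, headers=None):
    """Build an HTTP/1.0 GET request as bytes (hypothetical helper)."""
    lines = [f"GET {path} HTTP/1.0"]
    for name, value in (headers or []):
        lines.append(f"{name}: {value}")
    # A blank line (CRLF CRLF) ends the request header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response(raw):
    """Split a raw HTTP response into status line parts, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


# Sample reply copied from the Step 3 slide.
SAMPLE_REPLY = (
    b"HTTP/1.0 200 Document follows\r\n"
    b"Date: Tue, 19 Jan 1999 18:10:20 GMT\r\n"
    b"Server: NCSA/1.5\r\n"
    b"Content-type: text/html\r\n"
    b"\r\n"
    b"<html>\n<head><title>Netcraft Web Server Survey</title></head>\n"
)

if __name__ == "__main__":
    print(build_request("/survey/", [("Accept", "text/html")]))
    print(parse_response(SAMPLE_REPLY)[:3])
```

Header names are lower-cased on parsing because HTTP header names are case-insensitive; the slide itself writes "Content-type" while other servers send "Content-Type".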

Simple Web Access Example: Step 3a
  The browser sends
      GET /sample.htm HTTP/1.0
  The server replies
      HTTP/1.0 404 Object Not Found
      Content-Type: text/html

      <body><h1>HTTP/1.0 404 Object Not Found </h1></body>
  The server closes the file and the network connection, and waits for the next request.

Simple Web Access Example: Step 4
  The browser receives the HTTP response, a web document with HTML tags, from the server.
  The browser parses/processes the HTML document and displays the document content according to the tags.
  When other image/audio/video data are referenced by <img> <object> <applet> tags, the browser initiates the retrieval of those data.
  Some of them will be HTTP requests to the same web server. That is why a keep-alive connection improves web server throughput.
  A URL request may trigger many HTTP requests to several web servers.

HTTP
  HTTP 1.0/1.1: http://www.w3.org/Protocols/rfc2068/rfc2068
  An HTTP request consists of
    method: GET, HEAD, POST, PUT, DELETE
    Uniform Resource Identifier (URI)
    protocol version
    other info to modify or supplement the request
      If-Modified-Since: (only return the object if it is newer than the date)
      Authorization: (user password or other authentication as required)
      Accept: application/postscript

HTTP Response consists of
  status line (success or failure), e.g., HTTP/1.1 400 Bad Request
    200 (Document Follows), 301 (Moved Permanently), 302 (Moved Temporarily), 304 (Not Modified), 401 (Unauthorized), 402 (Payment Required), 403 (Forbidden), 404 (Not Found), 500 (Server Error)
  description of the information (metaheader)
    Server, Date, Content-Length, Content-Type, Content-Encoding, Last-Modified
  actual info requested

Content-Type: MIME Type
  MIME Type                   File Extension
  text/plain                  txt, default (most servers)
  text/html                   htm, html
  application/postscript      ps
  application/ms-powerpoint   ppt
  application/x-javascript    js
  image/gif                   gif
  image/jpeg                  jpg
  audio/midi                  mid
  video/mpeg                  mpg
  x-world/x-vrml              wrl

Configure MIME Types
  To support new MIME types, both the web server and the web client may need to be reconfigured.
  For the web server:
    Include the new MIME type definition in the mime.types file in the configuration directory of the web server
    By default, most servers deliver an unknown type as text/plain; the browser may then display it as "gibberish"
    Restart the web server
  For the web client:
    Specify the external viewer associated with the MIME type
    Or install the plug-in associated with the MIME type

Brief Survey of Web Servers
  http://www.w3c.org/hypertext/WWW/Servers.html
  Jigsaw, http://www.w3c.org/Jigsaw/
  http://java.sun.com/products/java-servers/
  http://www.yahoo.com/computers_and_Internet/Internet/World_Wide_Web/HTTP/Servers
  http://www.netcraft.co.uk/Survey/
  "Web Server Technologies" by Nancy J. Yeager and Robert E. McGrath, Morgan Kaufmann 1996.

CGI Script Example
  The client types http://owl.uccs.edu/cgi-bin/chow/uptime.pl or clicks on
      <A HREF="http://owl.uccs.edu/cgi-bin/chow/uptime.pl">Show the load on owl</A>
  in a web page.
  uptime.pl
      #!/usr/bin/perl
      $UPTIME = '/usr/ucb/uptime';
      select(STDOUT); $| = 1;   # make output unbuffered
      print "Content-type: text/html\n\n";
      if (-x $UPTIME) {
          exec($UPTIME);
      } else {
          print "cannot find uptime command on this system.\n";
          exit(1);
      }

CGI Script Example (Step 2)
  The web browser sends "GET /cgi-bin/chow/uptime.pl HTTP/1.0" to owl.uccs.edu
  The httpd server at owl parses the request and discovers that a Perl script needs to be executed.
  It locates the script in the file system.
  It creates the execution environment: it starts a process with the appropriate shell environment variables set, with STDIN coming from the httpd program and STDOUT going to httpd.

CGI Script Example (Step 3)
  uptime.pl generates
      Content-type: text/plain

      15:55 up 18 days, 7:15, 5 users, load average: 0.89, 0.81, 0.79
  It is sent over STDOUT back to httpd
  httpd adds
      HTTP/1.0 200 OK
      Server: Netscape-Communications/1.1
      Date: Tuesday, 27-Jan-98 23:12:45 GMT
  httpd relays the text string back to the web browser

What problems can occur?
  How to detect a script running in an infinite loop?
  How to detect a hung script?

Handle Multiple Requests
  Can't afford sequential processing, since some requested documents are big.
  Three basic approaches:
  1. Fork a new child process: cloning a copy of httpd
  2. Use multithreading (if the OS or language supports it), e.g., IIS, Java Web Server, Jigsaw
  3. Spread the load among several helper programs, e.g., Apache
  Apache allows the starting, minimum, and maximum number of child web server processes to be specified in a configuration file. It can dynamically adjust to the load.

More than One Web Service on the Same Server Platform
  Run different/same httpd programs on different ports
      http://www.server.org/intro.html (port 80 by default)
      http://www.server.org:8080/intro.html (port 8080)
      http://www.server.org:8081/intro.html (port 8081)
  They may have different document trees, content, and access control, and serve different user groups (customer, sales, authorized)
  Note that running a program on any port < 1024 requires root privilege.

Virtual Hosting
  Allows one server to serve requests for multiple IP addresses.
  It is a low-cost option for clients that want their own identity and cannot afford a separate machine/connection.
  Hosting other domain names on the same machine.
      http://www.a.com/home.html
      http://www.b.com/home.html
  Requires an OS with virtual host support.
  Assign multiple IP numbers to the same interface using the ifconfig command in UNIX or ipconfig in NT.

Assign Multiple IP Addresses to the Same Interface
  On FreeBSD, execute
      ifconfig ep0 192.168.123.2
      ifconfig ep0 192.168.123.3 alias netmask 0xFFFFFFFF
      ifconfig ep0 192.168.124.1 alias
  (the netmask option is used to suppress an error message)
  On Linux, execute
      ifconfig eth0:0 192.168.123.3 192.168.124.1
  you may add
      # route add -host 192.168.123.3 dev eth0:0
      # route add -host 192.168.124.1 dev eth0:0

New Hosting Technique
  Set up virtual machines for each customer
  Related software packages:
    User-mode Linux
    VMware ESX and VirtualCenter/Infrastructure
    MS VS 2005
  Utility Computing (On-Demand Computing)

Improving WWW Delivery Systems
  Currently the network is the bottleneck. The retrieval of web pages can be improved by
    increasing network bandwidth, e.g., an ADSL link
    reducing round trips, e.g., using client-side programming to check data with Java/JavaScript
    caching (both at the client and at a proxy cache server)
    increasing the number and processing power of web servers
    load balancing by partitioning client-server requests

Large Web Sites
  Map requests, e.g., for ftp.netscape.com, evenly across a set of servers, e.g., ftp[128].netscape.com
  [Figure: traffic from the Internet passes RRDNS and a router/firewall to Web Server1 through Web Server9 in a DMZ, backed by HA NFS servers holding the web pages; a firewall, an internal proxy server, and a second router/firewall lead to the intranet.]

CISCO Distributed Director
  Distributed Director uses the Director Response Protocol (DRP), a UDP-based application, to query DRP server agents for BGP and IGP routing table metrics between distributed servers and clients, and uses those metrics to perform load distribution.
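
The last step, turning collected routing metrics into a server choice, can be sketched as follows. This is a minimal illustration and not Cisco's DRP implementation: the function name pick_server, the site names, and the metric values are all hypothetical, and it assumes a lower metric means the server is closer to the client.

```python
def pick_server(metric_by_server):
    """Return the server with the smallest routing metric to the client.

    metric_by_server maps a server name to a routing metric
    (assumption: lower is better). Ties are broken by server name so the
    result is deterministic.
    """
    if not metric_by_server:
        raise ValueError("no candidate servers")
    return min(sorted(metric_by_server), key=metric_by_server.get)


if __name__ == "__main__":
    # Hypothetical metrics reported for one client.
    print(pick_server({"newyork": 12, "tokyo": 3, "london": 7}))
```

A real director would also need to handle agents that do not answer and would likely combine the routing metric with server load.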

Internet Caching
  Harvest/SQUID Cache: hierarchical, 42% FTP bandwidth reduction
  Client/Proxy Cache: local, small, 65% bandwidth reduction
  Server Push Cache: Gwertzman and Seltzer (cornell)
  Distributed Internet Cache: Povey and Harrison (uq)
    hierarchical index at the top of the tree, content on the leaves
  Cachemesh: Wong and Crowcroft (ucl)
    cache routing table for reducing search overhead
  WebWave: Heddaya and Mirdad (bu)
    cache on route, tree load balancing & load diffusion
  Adaptive Web Caching: Zhang, Floyd, Jacobson
    self-configuring cache groups, multicast

Harvest/SQUID Object Cache
  Hierarchical cache: Danzig, Hall & Schwartz show it reduces FTP traffic by 42%. Place big caches between regional networks and the backbone. Byte*hop as metric.
  Harvest Object Cache: manually configured hierarchical cache system.
    A client uses the Internet Cache Protocol (ICP) to (recursively) query sibling and parent caches
  NLANR SQUID Object Cache: Internet hierarchical cache system.
  Problems:
    14 separate Australian branches from the US
    CA content sources distribute content through the East Coast root cache, back to CA clients

Server Push Cache
  Assume a network with a lot of push cache servers.
  Shows that server-initiated caching (push cache) can be combined with client caching to be very effective.
  Uses network topology info and access history to decide on which push cache server to place a replica.

Distributed Internet Cache
  Povey and Harrison (Univ. of Queensland, Brisbane)
  Addresses the hierarchical cache problem
  The hierarchical structure is used for data searching only, with mapping info on non-leaf nodes and content on the leaves.
  After retrieving a new page, a leaf sends an advertisement up the tree hierarchy.
  Each non-leaf node on the path stores the advertisement (URL, cache location) in its table.
  Disadvantage: increases the load on leaf caches.
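
The advertisement scheme above can be sketched with two small classes. This is an illustrative sketch, not Povey and Harrison's implementation: the class names IndexNode and LeafCache are hypothetical, and the lookup rule (walk up the tree until some index node knows the URL) is an assumption made to show how the stored advertisements could be used.

```python
class IndexNode:
    """Non-leaf node: stores only (URL -> leaf cache name) mappings."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.table = {}

    def advertise(self, url, cache_name):
        # Record the advertisement, then pass it further up the tree.
        self.table[url] = cache_name
        if self.parent is not None:
            self.parent.advertise(url, cache_name)


class LeafCache:
    """Leaf node: stores the actual page content."""

    def __init__(self, name, parent):
        self.name = name
        self.parent = parent
        self.store = {}

    def store_page(self, url, content):
        # After retrieving a new page, advertise it up the hierarchy.
        self.store[url] = content
        self.parent.advertise(url, self.name)

    def locate(self, url):
        """Return the name of a leaf cache holding url, or None."""
        if url in self.store:
            return self.name
        node = self.parent
        while node is not None:  # assumed lookup: climb until a mapping is found
            if url in node.table:
                return node.table[url]
            node = node.parent
        return None


if __name__ == "__main__":
    root = IndexNode("root")
    west, east = IndexNode("west", root), IndexNode("east", root)
    leaf1, leaf2 = LeafCache("leaf1", west), LeafCache("leaf2", east)
    leaf1.store_page("http://www.a.com/home.html", b"<html>...</html>")
    print(leaf2.locate("http://www.a.com/home.html"))
```

The sketch also makes the stated disadvantage visible: every hit found through the index is served by a leaf, so popular pages concentrate load on the leaf caches.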

Cachemesh
  Wong and Crowcroft (University College London).
  The client searches a cache routing table for the cache location.
  A collection of co-operating caches uses the Cache Information Exchange Protocol (CIEP) to add/delete entries in the cache routing table.
  The web site is the unit for cache table entries.
  Collision resolution when multiple cache servers claim responsibility (based on frequency) for a web site: use a random CIEP_ADD/DELETE sending delay.
  Realistic metrics are to be used for selecting the cache server.

WebWave
  Heddaya and Mirdad (Boston Univ.)
  No directory lookup or cache search. The cache lies along the route to the source.
  Assumes a cache server can change filter rules in a router to intercept and serve the web requests.
  Defines Optimal Tree Load Balancing (TLB).
  Provides load diffusion algorithms that achieve TLB.
  Only addresses a single tree for now.

Adaptive Web Cache
  Zhang (UCLA), Floyd, Jacobson (NRG, LBNL)
  New DARPA-funded research project.
  Focus on scalability and self-configuration.
  Self-configuring cache groups use the Cache Group Management Protocol (CGMP).
  IP multicast delivery.
  A cache server may join multiple cache groups (select multi-homed hosts as cache servers).
  Ideally one cache server forwards requests to the source.

Dynamic Server Selection
  One candidate architecture
  [Figure: Web Server1, Web Server2, Web Server8, and Web Server9 push load status to LB agents; the client probes response time through the LB agents.]

Novel Server Selection Technique
  Fei et al. (Ammar) [GIT-CC-97-24]
  Use application-layer anycast to select the best of a set of geographically separated web servers.
  Server push (server load status) to the resolver.
    Only push when the load change exceeds a threshold.
  Client (resolver) probe (response time of the server)
    Retrieve a fixed-size document from each server.
  Avoid oscillation by returning one server from a set of equivalent servers.
  Investigate the impact of push/probe frequency on response time.

Application-layer Anycast Architecture
  [Figure: on the server side, a content server and a push daemon send performance updates by server push (multicast); on the resolver side, a probe client probes the servers and updates the name resolver; an anycast-aware client sends anycast queries to the resolver, receives responses, and then communicates with the chosen server.]

Experimental Topology
  [Figure]

Performance of Server Location Scheme
  [Figure]

Response Time Varying with Push and Probe Frequency
  Server push twice/min, client probe once/6 min
  vs.
  Server push 12 times/min, client probe once/10 min

Dynamic Server Selection vs. Load Balancing in Servers
  In Fei et al.'s work, after every client chooses the most lightly loaded server, it becomes the heavily loaded server. In the next round, every client swings to the next lightest server, which results in oscillation in server selection.
  How to damp the oscillation:
    Anycast resolvers return a set of good servers
    A threshold is used to add/delete servers from the good server set
  User response time vs. system throughput: dynamic server selection vs. load balancing in servers

WAN Load Balancing Architecture
  [Figure: load-balanced content servers with push daemons send performance updates by server push (multicast); LB agents each contain a probe client that probes the servers and updates an LB coordinator; LB coordinators exchange state via the LB coordinating protocol; LB clients send LB queries, receive responses, and then communicate with the servers.]

WAN Load Balancing Architecture-2
  [Figure: same as above without server push; LB agents rely on probe clients, probe updates, LB coordinators, and the LB coordinating protocol.]

WAN Load Balancing Architecture-3
  [Figure: each LB agent contains a probing module (probe control, update), a traffic control module (traffic update), and an LB coordinator; coordinators exchange state via the LB coordinating protocol and together form the load balancing system; LB clients send LB queries and receive responses.]

Functions of LB coordinator
  Collect server load, network status, and traffic status from the probing and traffic control modules
  Share the server and traffic status with other LB coordinators via the LB coordinating protocol
  Run a load balancing algorithm that
    directs the client requests (macro control)
    dynamically regulates the client-server traffic (micro control)
  Control the probing frequency of the probing module
  Regulate the traffic of client-server communication

Status Collection in WLB system
  Passive traffic monitoring on client-server communication
  Server load reports from other LB coordinators
  Active probing of server and network loads when there are no traffic status reports
  Research issues:
    traffic monitoring system design
    efficiency, accuracy, and coordination of the probing system
    deriving server and network load from traffic data

Traffic Control in WLB System
  Admission control (macro-level control)
    Estimate the load of the requests
    Direct the requests
  Traffic grooming/shaping (micro-level control)
    At what protocol level (TCP, IP?)
    At which module/interface (router? Layer 4/content/web switch?)

Important Web Sites
  http://www.w3c.org/
  http://developer.netscape.com/
  http://java.sun.com/
  http://www.microsoft.com/workshop/default.asp
  http://www.apache.org/
  http://www.netcraft.co.uk/Survey/
  http://web.mit.edu/afs/athena/user/w/s/wsmart/WEB/HTMLtutor.html
  ...

Useful References
  O'Reilly's Web series: HTML, CGI, Dynamic HTML, Programming Perl
  "Web Server Technologies" by Nancy J. Yeager and Robert E. McGrath, Morgan Kaufmann 1996.
  HTML+CGI
  World Wide Web Beyond the Basics, edited by Marc Abrams, Prentice Hall, 1998
  MS Technical Support for IIS, self-learning manual.
  How to Set Up and Maintain a Web Site, L. Stein.
  Web Server Tuning …