Building Scalable Web Services Using Apache JServ
Sunny Gleason
COM S 717
Tuesday, December 4, 2001

In This Lecture
• What is JServ?
• The Alternatives
• Java Servlet API
• Apache JServ / Tomcat
  – Scalability
  – Load Balancing
  – Fault-Tolerance
• JServ Security

Introduction
• Running a web service has changed a lot since the early 1990s
• Originally static HTML, text, and images
• Still a great deal of HTML content
• Shift from static pages to dynamically generated content
• Database-driven content, WAP, XML, XSLT

What is JServ?
• JServ is a Java Servlet Engine (compliant with the Java Servlet API v2.0)
• Free software produced by the Apache Software Foundation
• mod_jserv is a module for connecting JServ to Apache HTTP Server
• The JServ engine has been replaced by Tomcat
• mod_jserv has been replaced by mod_jk

HTTP Basics
• HyperText Transfer Protocol
• Built on top of TCP
• 2 Well-Known Methods:
  – GET
  – POST
• Other Methods
  – HEAD, PUT, DELETE, ...
• Stateless

HTTP GET
• Format:
  – GET url HTTP/1.1 crlf headers crlf crlf
• The url string contains the resource identifier, e.g. “/top.htm”
• The headers contain optional information provided by the client to the server
• Query Data may follow a question mark in the URL
  – e.g.
    “/search.pl?query=linux”

HTTP POST
• Format:
  – POST url HTTP/1.1 crlf headers crlf crlf form_data
• Form data is not passed through the URL
• Allows submission of data values which are larger than the maximum URL length
  – [URL ~ 2k on MS IE4.0 and above]

HTTP Server Response
• HTTP/1.1 200 OK crlf headers crlf crlf content
• Headers include MIME-type, content length, content encoding
• Other responses: 301 Moved Permanently (redirect), 401 Unauthorized (authorization required), 403 Forbidden, 500 Internal Server Error

Cookies
• Persistent Client-Side Information
• <Server, Key, Value, Expiration Date> tuples
• Server sets cookie using the Set-Cookie header
• All future requests to the server (before the expiration date) are accompanied by the cookie in the Cookie header

Serving Dynamic Content
• We discuss 3 early models for dynamic content:
  – CGI
  – mod_perl
  – mod_php

The Alternatives: CGI
• Common Gateway Interface
• Advantages
  – Flexibility: run any program
    • bash, perl, python, php
  – Low process overhead when idle
• Disadvantages
  – Reload interpreter upon every request
  – Re-establish (costly) database connections
  – Security concerns: passing parameters

The Alternatives: mod_perl
• Apache module for Perl
• Memory-resident interpreter
• Precompiled scripts / Script cache
• Speed / Memory Tradeoff
  – HTTP processes maintain individual Perl interpreters
  – Allows persistent database connections, other persistent server state
  – Consistency between HTTP processes was not always assured

The Alternatives: mod_php
• Apache module for PHP (PHP: Hypertext Preprocessor)
• Template-based language
  – Code tags are “embedded” within HTML template files
  – Similar to MS ASP
• Suitable where HTML to script code ratio is high
• Huge library of add-on modules
• Similar tradeoffs to mod_perl

The Alternatives: Summary
• Should application logic be running on the web server?
  – scalability
  – fault-tolerance
  – security
• Clearly, we need something better for enterprise-scale applications

Apache JServ
• Separate Application Server from Web Server
  – Clean up the architecture
  – Improve Scalability
  – Provide fault-tolerance
• Embrace Java Philosophy
  – “Write once, run anywhere”
• Provide additional Servlet functionality
  – Like user sessions

JServ: Openness
• JServ is 100% Java Code
  – Platform-Independent
  – Runs on any compliant JVM (IBM, Sun, ...)
• JServ is built on top of TCP
• Part of the Apache Software Foundation
  – Integrates nicely with Apache HTTP Server
  – Ports available for Windows, BSD, Linux ...

JServ: Security
• JServ/Apache can run on different hosts (also: different users)
• JServ itself is composed of many “Zones”
  – A zone is a JVM which executes some number of Java Servlets
• JServ may be placed behind a firewall
• JServ offers ACL security by IP address
• Optional shared-key authentication
• Apache HTTP Server may integrate SSL for secure HTTP client-server interaction

JServ: Load Balancing
• Level 0: 1 - 1 Apache/JServ
  – No load balancing, no redundancy
• Level 1: 1 - n Apache/JServ
  – Each JServ hosts different zones (load partitioning)
• Level 2: 1 - m*n Apache/JServ
  – Each zone may be balanced among several JServs
• Level 3: p - m*n Apache/JServ
  – Multiple Apache Servers, multiple JServs

JServ: Levels 0-1
• Level 0: allows smaller hosts to run entire application on a single machine
• Level 1: allows different hosts to serve different applications
• Typically difficult to plan/partition applications in this manner

JServ: Level 2
• 1 - m*n Apache/JServ
  – Allows Apache to balance requests among several JServ servers hosting the same zones
  – Apache configuration file specifies ratio of hits for each JServ
  – Each HTTP process chooses server for each JServ zone, sends new requests to this target

JServ: Level 3
• p - m*n Apache/JServ
  – Allows HTTP traffic to be load-balanced among several Apache servers
  – Allows Servlet workload
    to be distributed among several JServ servers
  – In order for the system to work, each Apache HTTP server must have identical JServ configuration
    • (To preserve sessions, as we’ll see later)

JServ: Session Handling
• Once established, a session is bound to a particular JServ
• But, HTTP client accesses might be “sprayed” among many HTTP servers
  – Allows HTTP Server fault-tolerance
• Identical mod_jserv configuration allows different Apache servers to “route” requests to the right JServ
• Mechanism requires client to maintain a cookie which contains JServ server ID

JServ: Session Handling
• How does it work?
  – Every time a request arrives for a balanced ServletMountPoint, mod_jserv chooses a JServ to handle the request
  – mod_jserv adds a cookie trailer to the environment variables of the JServ request (e.g. JS3)
  – JServ appends the cookie trailer to the end of the session cookie
  – Upon subsequent requests, Apache examines the cookie, and sends the request to the correct JServ

JServ: Fault-Tolerance
• (Assume Level 3)
• No Single Point of Failure
  – Apache can become overloaded and fail, but JServ servers continue to provide services (although SSL sessions are lost)
  – JServ redundancy allows applications to continue running even if multiple hosts fail (although application sessions will be lost)
  – Since any Apache can route to any JServ, as long as one of each stays up, the system can work

JServ: Fault-Tolerance
• How is the JServ fault tolerance implemented?
  – Each Apache contains a memory-mapped file where it keeps JServ information
  – Each Apache process has access to the file
  – If a process does not receive a response from a JServ process, it marks it as DOWN in the file
    • (Load is re-distributed [fairly] among the survivor JServs)
  – A “watchdog” process pings the JServs intermittently, updates the JServ status in memory if the server is back online

JServ: Fault-Tolerance
• Apache Fault-Tolerance: Step 1
  – 1. Client requests www.jserv.com:80
  – 2.
       HTTP Load-balancing system routes request to 111.222.244.10:3000
  – 3. Apache server chooses a random JServ machine, say 192.168.0.51:8885
  – 4. JServ machine responds to request with content of page, along with cookie with name “JServSessionID” and value “xxxx-JS1”

JServ: Fault-Tolerance
• Apache Fault-Tolerance: Step 2
  – 1. Client requests another page from www.jserv.com:80
  – 2. HTTP Load-balancing system routes request to 111.222.244.20:3000
  – 3. Apache server recognizes session cookie, finds “JS1” at end of the cookie
  – 4. Apache looks up “JS1” in JServ configuration, routes request to 192.168.0.51:8885

JServ: Load-Balancing
• Step 1: JServ Load-Balancing
  – 1. Client A requests a servlet (A1)
  – 2. HTTP chooses target JServ (A’1)
  – 3. Client A cookie is set for JS1
  – 4. Client B requests a servlet (B2)
  – 5. HTTP chooses target JServ (B’2)
  – 6. Client B cookie is set for JS2

JServ: Load-Balancing
• Step 2: Session Handling
  – 1. Client B requests a servlet (sends previously-set cookie)
  – 2. HTTP server recognizes cookie
  – 3. Request is routed to JServ2 (B’2)

JServ: Fault-Tolerance
• [assume JServ1 goes down]
• Step 3: JServ Fault-Tolerance
  – 1. Client A requests a servlet
  – 2. HTTP Server recognizes the JS1 cookie
  – 3. Request is passed to JServ1, resulting in timeout
  – 4. HTTP marks JServ1 “dead” in shared memory
  – 5. HTTP looks up another server for the servlet mount point, sends request to JServ3
  – 6. If a new session is needed, a new one is created and the new cookie is set to “JS3” (JS1 erased)

JServ: Fault-Tolerance
• Implementation Issues
  – Denial of Service
    • Failed requests must be re-distributed evenly!
    • Otherwise, a single server will bear the brunt of the load, and probably crash
  – Network Partitioning and Application-level Data Synchronization Issues
    • Must still be anticipated by the application
      designer
  – Watchdog process
    • For single-threaded watchdog, if timeout is t, time between crash and restoration could be f*t, where f is the number of failed processes

JServ: Manageability
• Shared JServ State allows HTTP process coordination
• Admins can mark JServs as “shutdown” in shared memory
• JServ processes can be brought down for maintenance
• Apache HTTP processes redirect requests among “live” servers
• Detailed availability information can be produced by logging contents of shared memory file

Tomcat: New Features
• Enhanced security model
• Property files which specify access rights (open socket, write file, etc.)
• Allows different protection levels within the same JVM (i.e. Java 2 protection model)

Conclusion
• JServ provides:
  – Limited support for
    • Load balancing
    • Fault-tolerance
    • External Security
  – Good support for
    • Internal Security
• N-tier application abstraction provides flexibility when needed, “loopback” option otherwise

The End
• Any questions/comments?
  – Apache Web Server
  – JServ / Tomcat Servlet Containers
  – Scalability / Load-balancing
  – Fault-tolerance
  – Security

For Further Info
• Apache Jakarta Project
• http://jakarta.apache.org/
• http://jakarta.apache.org/tomcat/
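
The session-handling and fault-tolerance slides describe how a routing ID at the end of the session cookie (such as “JS1” in “xxxx-JS1”) lets any Apache server send a request back to the JServ that owns the session, and how a JServ marked DOWN is skipped. The following is a minimal Python sketch of that logic, offered as an illustration only: mod_jserv itself is an Apache module and this is not its code. The class name JServRouter, its method names, the in-memory status dictionary (standing in for the shared memory-mapped file), and the addresses given for JS2 and JS3 are all assumptions made for the example.

```python
import random


class JServRouter:
    """Toy model of sticky-session routing with failover (hypothetical names)."""

    def __init__(self, servers, rng=None):
        # servers maps a routing ID such as "JS1" to an (address, weight) pair.
        self.servers = dict(servers)
        # Stand-in for the shared status file: True means the JServ is up.
        self.up = {sid: True for sid in self.servers}
        self.rng = rng or random.Random()

    def pick_new(self):
        """Choose a live JServ at random, in proportion to its weight."""
        candidates = [sid for sid in self.servers if self.up[sid]]
        if not candidates:
            raise RuntimeError("no live JServ for this mount point")
        weights = [self.servers[sid][1] for sid in candidates]
        return self.rng.choices(candidates, weights=weights, k=1)[0]

    def route(self, session_cookie=None):
        """Return (server_id, session_kept) for one request."""
        if session_cookie:
            # "xxxx-JS1" -> "JS1": the routing ID is the cookie's last field.
            sid = session_cookie.rsplit("-", 1)[-1]
            if self.up.get(sid):
                return sid, True
        # No cookie, unknown ID, or owner is down: start over on a live JServ.
        return self.pick_new(), False

    def mark_down(self, sid):
        """Called when a request to sid times out."""
        self.up[sid] = False

    def mark_up(self, sid):
        """Called by the watchdog when sid answers a ping again."""
        self.up[sid] = True


router = JServRouter({
    "JS1": ("192.168.0.51:8885", 1),  # address taken from the slides
    "JS2": ("192.168.0.52:8885", 1),  # hypothetical address
    "JS3": ("192.168.0.53:8885", 1),  # hypothetical address
})
print(router.route("xxxx-JS1"))  # sticky: stays on JS1 while it is up
router.mark_down("JS1")
print(router.route("xxxx-JS1"))  # rerouted to a live JServ, session not kept
```

Choosing among the live servers at random on every failover, rather than always taking the next one in a fixed order, is one simple way to spread the failed server's load over the survivors, which is the concern raised on the Implementation Issues slide.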