Functionality of a web server

What does the web server do? It lets a user request a resource, finds the resource, and returns something to the user.

The resource can be different things, such as an HTML page, a picture, a PDF document, or data (XML, JSON, plain text).

If the requested resource is not there, you will get a "404 Not Found" error in the browser.

In the context of this book, when we say "server", we mean either the physical machine (hardware) or the web server application (software).

Functionality of a web client

When we talk about clients, we usually mean both (or either) the human user and the browser application. The browser is the piece of software that knows how to render HTML pages.

How do the clients and servers talk to each other?

The clients and servers speak HTTP. HTTP stands for HyperText Transfer Protocol; it is the protocol clients and servers use on the web to communicate. The client sends an HTTP request, and the server answers with an HTTP response.

The browsers must know HTML

HTML stands for HyperText Markup Language. HTML tells the browser how to display the content to the user.

HTML

When you develop a web page, you use HTML to describe what the page should look like and how it should behave. The goal of HTML is to take a text document and add tags that tell the browser how to format the text.

What is HTTP

HTTP runs on top of TCP/IP.
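Because HTTP is plain text carried over a TCP connection, the request/response exchange described above can be sketched with nothing but Java's standard socket classes. This is a minimal illustration under stated assumptions, not how a production web server is built: the class name HttpOverTcpDemo, the one-page site (only "/" exists), and the exact response headers are hypothetical choices for the example. Any other path gets the "404 Not Found" error mentioned earlier.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: a tiny one-page "web server" and a client, both in one JVM,
// exchanging HTTP/1.1 text over a loopback TCP connection.
public class HttpOverTcpDemo {

    // Server side: handle exactly one request on an already-bound server socket.
    static void serveOnce(ServerSocket server) throws IOException {
        try (Socket conn = server.accept()) {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.US_ASCII));
            String requestLine = in.readLine();              // e.g. "GET / HTTP/1.1"
            String header;
            while ((header = in.readLine()) != null && !header.isEmpty()) {
                // request headers end at the first blank line; this demo ignores them
            }
            String path = requestLine.split(" ")[1];         // the path of the resource
            boolean found = path.equals("/");                // the only resource on this site
            String body = found ? "<html><body>Hello</body></html>" : "";
            String response = "HTTP/1.1 " + (found ? "200 OK" : "404 Not Found") + "\r\n"
                    + "Content-Type: text/html\r\n"
                    + "Content-Length: " + body.length() + "\r\n"
                    + "Connection: close\r\n"
                    + "\r\n"                                 // blank line: headers end, body starts
                    + body;
            OutputStream out = conn.getOutputStream();
            out.write(response.getBytes(StandardCharsets.US_ASCII));
            out.flush();
        }
    }

    // Client side: open a TCP connection, send a GET request, return the response's status line.
    static String fetchStatusLine(String path) throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {   // port 0 = any free port
            Thread serverThread = new Thread(() -> {
                try {
                    serveOnce(server);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            serverThread.start();
            try (Socket socket = new Socket(loopback, server.getLocalPort())) {
                String request = "GET " + path + " HTTP/1.1\r\n"
                        + "Host: localhost\r\n"
                        + "Connection: close\r\n"
                        + "\r\n";
                OutputStream out = socket.getOutputStream();
                out.write(request.getBytes(StandardCharsets.US_ASCII));
                out.flush();
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
                String statusLine = in.readLine();           // first line of the HTTP response
                serverThread.join();
                return statusLine;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchStatusLine("/"));            // HTTP/1.1 200 OK
        System.out.println(fetchStatusLine("/missing"));     // HTTP/1.1 404 Not Found
    }
}
```

Running main prints HTTP/1.1 200 OK and then HTTP/1.1 404 Not Found. Both sides only read and write lines of text; TCP takes care of delivering those bytes intact.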
TCP stands for Transmission Control Protocol. It is a connection-oriented, end-to-end reliable protocol for message transmission on the network. TCP is responsible for making sure that a file sent from one network node to another ends up as a complete file at the destination.

IP stands for Internet Protocol, the protocol used for communicating data across a packet-switched internetwork.

HTML can be part of the HTTP response

An HTTP response can contain HTML. HTTP adds header information to the top of whatever content is in the response, and an HTML browser uses that header information to help process the HTML page:

    HTTP header info
    <html>
      <head> … </head>
      <body> <img src=…> </body>
    </html>

What is in the HTTP request

The first thing you'll find is an HTTP method name. HTTP has several methods; the ones you'll use most often are GET and POST.

HTTP GET: the user clicks a link to a new page, and the browser sends out an HTTP GET to the server, asking the server to get the page.

HTTP POST: the user types in a form and hits the Submit button, and the browser sends out an HTTP POST to the server, giving the server what the user typed into the form.

The main job of HTTP GET is to ask the server to get a resource such as an HTML page, a JPEG, a PDF, etc.
The main job of HTTP POST is to request something and, at the same time, send form data to the server.

That does not mean HTTP GET cannot be used to send data. The data you send with HTTP GET is appended to the URL in the browser's address bar, so whatever you send is exposed. You can even use HTTP GET to send form data to the web server, but doing so exposes all of the form data in the address bar, which is why people usually avoid it.

Anatomy of an HTTP GET request

A GET request starts with the request line, which holds the HTTP method, the path of the resource, and the protocol version. The request headers follow the request line.

In a GET request, parameters (if there are any) are appended to the first part of the request URL, starting with a "?". Parameters are separated with an ampersand "&". For example:

    GET /select/selectBeerTaste.jsp?color=dark&taste=malty HTTP/1.1

Anatomy of an HTTP POST request

A POST request has a request line (HTTP method, path of the resource, protocol version), request headers, and a message body.

HTTP POST requests are designed to be used by the browser to make complex requests on the server. For example, a POST can send all of the form data the user completed to the web server, which then adds the data to a database. The data sent to the server is known as the "message body" or "payload", and it can be quite large.

Anatomy of an HTTP response

An HTTP response begins with a status line holding the protocol version, the HTTP status code, and a text version of the status code (for example, HTTP/1.1 200 OK). The response headers and the response body follow.

An HTTP response has both a header and a body. The header info tells the browser the protocol being used, whether the request was successful, and what kind of content is included in the body. The body contains the contents (e.g., HTML) for the browser to display.

The Content-Type response header's value is known as a MIME type.
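A static web server usually picks the Content-Type value from the kind of file it is returning. The sketch below assumes a small, hypothetical extension-to-MIME-type table (the class name MimeTypes and method contentTypeFor are illustrative) and Java 9 or later for Map.of; real servers use much larger, configurable mappings.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: choose a Content-Type (MIME type) from a file name,
// as a simple static web server might before writing the response header.
public class MimeTypes {

    // A small, illustrative table; unknown extensions fall back to a generic type.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "html", "text/html",
            "txt", "text/plain",
            "xml", "application/xml",
            "json", "application/json",
            "jpg", "image/jpeg",
            "jpeg", "image/jpeg",
            "pdf", "application/pdf");

    static String contentTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // Extension without the dot, lower-cased so "PDF" and "pdf" match.
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, "application/octet-stream");
    }

    public static void main(String[] args) {
        System.out.println("Content-Type: " + contentTypeFor("beer1.html")); // Content-Type: text/html
        System.out.println("Content-Type: " + contentTypeFor("menu.PDF"));   // Content-Type: application/pdf
    }
}
```

Falling back to application/octet-stream tells the browser only that the body is arbitrary binary data, which is a common choice when the server cannot identify the file type.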
The MIME type tells the browser what kind of data it is about to receive, so that the browser knows how to render it. Notice that the MIME type value relates to the values listed in the HTTP request's "Accept" header. MIME stands for Multipurpose Internet Mail Extensions.

URL

URL stands for Uniform Resource Locator. Every resource on the web has its own unique address, in the URL format:

    http://www.wickedlysmart.com:80/beeradvice/select/beer1.html

    Protocol:     http
    Server name:  www.wickedlysmart.com
    Port:         80 (if not specified, port 80 is the default)
    Path:         /beeradvice/select/
    Resource:     beer1.html (if not specified, defaults to index.html)

TCP port

A TCP port is just a number, from 0 to 65535. A port represents a logical connection to a particular piece of software running on the server hardware. It does not represent a place to plug in some physical device; it is just a number representing a server application. The TCP port numbers from 0 to 1023 are reserved for well-known services.

Well-known TCP port numbers:

    FTP:     21
    Telnet:  23
    SMTP:    25
    Time:    37
    HTTP:    80
    POP3:    110
    HTTPS:   443

Directory structure for a simple Apache web site

Apache is a popular open source web server. Suppose we have a web site, www.wickedlysmart.com, running on Apache. It hosts two applications: one giving skiing advice and one giving beer-related advice. What would the directory structure look like for this web site?

    Apache home
      htdocs/                 root for all of the web applications
        index.html            (A) default page returned to a user who keys in www.wickedlysmart.com
        skiingAdvice/         root folder for the skiingAdvice application
          index.html          (B) default page for the skiingAdvice application
          select/
          checkout/
        beerAdvice/           root folder for the beerAdvice application
          index.html          (C) default page for the beerAdvice application
          select/
            selectBeer.html   (D) an HTML page that gives the user some advice

Mapping URLs to content

http://www.wickedlysmart.com will cause the server to return index.html at location A.

What URL will cause the server to return index.html at location B?

What URL will cause the server to return index.html at location C?

Web servers love serving static web pages

A web server sends back the page the client asked for, with added HTTP header info, without making any change or doing any computation on the page. If the client wants a dynamic web page, such as one showing the current time on the web server, the web server alone cannot do that:

    <html>
    <body>
    The current time is [insertTimeOnServer]
    </body>
    </html>

The web server cannot insert the time into the HTML page directly.

Helper application

So a helper application is needed to generate the dynamic content. The web server sends the request to the helper application, which generates the dynamic content; the web server then takes the application's response and sends it back to the client.
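The division of labor between the web server and a helper application can be sketched in plain Java. Everything here is hypothetical and for illustration only: the class HelperAppSketch, the "/time" path, and the single static page are assumptions, and the "helper" is just a method rather than a separate program. The sketch assumes Java 9 or later (Map.of). The server returns stored pages unchanged and hands the dynamic path to the helper, which fills in the [insertTimeOnServer] placeholder from the notes.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch: a "web server" that can only return stored pages,
// plus a helper it delegates to when a page must be built per request.
public class HelperAppSketch {

    // Static pages are returned exactly as stored.
    private static final Map<String, String> STATIC_PAGES =
            Map.of("/index.html", "<html><body>Welcome</body></html>");

    // The template from the notes; only the helper knows how to fill it in.
    private static final String TIME_TEMPLATE =
            "<html><body>The current time is [insertTimeOnServer]</body></html>";

    // The helper application: generates the page at request time.
    static String timeHelper(Supplier<LocalDateTime> clock) {
        String now = clock.get().format(DateTimeFormatter.ISO_LOCAL_DATE_TIME);
        return TIME_TEMPLATE.replace("[insertTimeOnServer]", now);
    }

    // The web server: serve static content itself, delegate dynamic paths to the helper,
    // and pass the helper's output back as the response body. Empty means 404 Not Found.
    static Optional<String> handle(String path, Supplier<LocalDateTime> clock) {
        if (STATIC_PAGES.containsKey(path)) {
            return Optional.of(STATIC_PAGES.get(path));
        }
        if (path.equals("/time")) {                      // hypothetical dynamic path
            return Optional.of(timeHelper(clock));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(handle("/index.html", LocalDateTime::now).orElse("404 Not Found"));
        System.out.println(handle("/time", LocalDateTime::now).orElse("404 Not Found"));
    }
}
```

Passing the clock in as a Supplier keeps the helper testable: a fixed time produces a predictable page, while LocalDateTime::now gives the live server time.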
In fact, the client never needs to know that someone else did some of the work.

Two things the web server alone won't do

Dynamic content: a separate "helper" application that the web server can communicate with can build non-static, just-in-time pages.

Saving data on the server: when the user submits data in a form, you need a helper application to process the form data. The helper application can either save the data to a database or use it to generate the response page.

CGI

The non-Java term for a web server helper application is a "CGI" (Common Gateway Interface) program. Most CGI programs are written as Perl scripts, but many other languages can be used, including C, Python, and PHP.

Differences between servlets and CGI

Servlets have better performance in serving client requests. Client requests for a servlet resource are handled as separate threads of a single running servlet. With CGI, the server has to launch a heavyweight process for each and every request for that resource.

Servlet Demystified

Let us use a simple example to show how to write, deploy, and run a servlet that generates an HTML page displaying the current date and time on the server.

Build this directory tree (project1, with a src subdirectory for the source file and a classes subdirectory for the compiled class).

Write a servlet named Ch1Servlet.java and put it in the src directory.
Alternatively, you may download the servlet code from Blackboard (lab assignment section for this class).

    import javax.servlet.*;
    import javax.servlet.http.*;
    import java.io.*;

    public class Ch1Servlet extends HttpServlet {
        // Called by the container for each HTTP GET request mapped to this servlet
        public void doGet(HttpServletRequest request, HttpServletResponse response)
                throws IOException {
            // Tell the browser the MIME type of the response (set before getting the writer)
            response.setContentType("text/html");
            PrintWriter out = response.getWriter();
            java.util.Date today = new java.util.Date();
            out.println("<html> " + "<body>" +
                    "<h1 align=center>HF's Chapter1 Servlet</h1>" +
                    "<br>" + today + "</body>" + "</html>");
        }
    }

Create a deployment descriptor (DD)

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <web-app xmlns="http://java.sun.com/xml/ns/javaee"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
             version="2.5">
        <servlet>
            <servlet-name>Chapter_1_Servlet</servlet-name>
            <servlet-class>Ch1Servlet</servlet-class>
        </servlet>
        <servlet-mapping>
            <servlet-name>Chapter_1_Servlet</servlet-name>
            <url-pattern>/Serv1</url-pattern>
        </servlet-mapping>
    </web-app>

Build this directory under the existing Tomcat directory…

From the project1 directory, compile the servlet:

    javac -classpath \tomcat_dir\lib\servlet-api.jar -d classes src\Ch1Servlet.java

Note that the book uses a different version of Tomcat and therefore a different classpath (do not use this command line):

    javac -classpath /your path/tomcat/common/lib/servlet-api.jar -d classes src/Ch1Servlet.java

In our installation of Tomcat 6.0.18, there is no common subdirectory under the Tomcat directory, so if you use the book's command as is, you will get a compilation error.

Copy the Ch1Servlet.class file to WEB-INF/classes, and copy the web.xml file to WEB-INF.
From Tomcat's bin directory, start Tomcat:

    C:\apache-tomcat-6.0.18\bin>startup      (Windows)
    # ./startup.sh                            (Linux)

Launch your browser and type in http://localhost:8080/ch1/Serv1

From now on, every time you update either a servlet class or the deployment descriptor, shut down Tomcat and then restart it:

    C:\apache-tomcat-6.0.18\bin>shutdown     (Windows)
    # ./shutdown.sh                           (Linux)

Disadvantage of using servlets

Question: Why can't I just copy a whole page of HTML from my web page editor, such as Microsoft FrontPage or Dreamweaver, and paste it into the println()?

Answer: You cannot have a carriage return (a real one) inside an ordinary String literal, so simply copying a whole page of HTML into println() will cause compilation errors. Quotes in the HTML page can be a problem too, because every double quote inside a String literal must be escaped.

Overview of JSP

A JSP page looks like an HTML page, except you can put Java and Java-related things inside the page. So it is really like inserting a variable into your HTML.

Putting Java into HTML is a solution for two issues:
  1. Not all HTML page designers know Java.
  2. It is difficult to format HTML into a string literal in a servlet.

With JSP, Java developers can do Java, and HTML developers can do web pages.
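To see concretely why pasting HTML into println() is painful, here is a small sketch of the conversion a servlet developer would otherwise do by hand: escape every double quote and backslash, and turn each real line break into \n plus string concatenation. The class and method names (HtmlToJavaLiteral, toJavaLiteral) are hypothetical, and the sketch assumes lines end in \n or \r\n.

```java
// Hypothetical helper: turn a chunk of HTML into Java source for a String expression
// that could be pasted into println(), escaping the characters that break a literal.
public class HtmlToJavaLiteral {

    static String toJavaLiteral(String html) {
        StringBuilder sb = new StringBuilder("\"");              // open the first literal
        for (char c : html.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;             // " becomes \"
                case '\\': sb.append("\\\\"); break;             // \ becomes \\
                case '\r': break;                                // drop CR of CRLF line endings
                case '\n': sb.append("\\n\" +\n\""); break;      // close literal, concatenate, reopen
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();                        // close the last literal
    }

    public static void main(String[] args) {
        String page = "<html>\n<body>\n<h1 align=\"center\">Hi</h1>\n</body>\n</html>";
        System.out.println(toJavaLiteral(page));
    }
}
```

Running main prints the Java source a developer would have to type instead of the plain HTML:

    "<html>\n" +
    "<body>\n" +
    "<h1 align=\"center\">Hi</h1>\n" +
    "</body>\n" +
    "</html>"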