CIS 5930-04 – Spring 2001
Part 6: Introduction to CGI and Servlets
http://aspen.csit.fsu.edu/it1spring01
Instructors: Geoffrey Fox, Bryan Carpenter
Computational Science and Information Technology, Florida State University
Acknowledgements: Nancy McCracken, Syracuse University
dbc@csit.fsu.edu

Introduction

RMI gave us one approach to client/server programming.
The approach was based on the Java language and some far-reaching ideas about remote objects, object serialization, and dynamic class loading.
We could achieve direct integration into the traditional World Wide Web through applets, but the technology is not specifically tied to the Web.
RMI is powerful and general (and interesting), but it can be a slightly heavy-handed approach if actually we only need to interact with users through Web pages.
For the future, it may be more natural to view RMI as a technology for the “middle tier” (or for connectivity in the LAN) rather than for the Web client.

HTML Forms and CGI

There are long-established techniques for getting information from users through Web browsers (predating the appearance of Java on the Web).
The FORM element of HTML can contain a variety of input fields. The inputted data is harvested by the browser, suitably encoded, and forwarded to the Web server.
On the server side, the Web server is configured to execute an arbitrary program that processes the user’s form inputs. This program typically outputs a dynamically generated HTML document containing an appropriate response to the user’s input.
The server-side mechanism is called CGI: Common Gateway Interface.

CGI and Servlets

In conventional CGI, a Web site developer writes the executable programs that process form inputs in a language such as Perl or C. The program (or script) is executed once each time a form is submitted.
Servlets provide a more modern, Java-centric approach. The server incorporates a Java Virtual Machine, which is running continuously.
Invocation of a CGI script is replaced by invocation of a method on a servlet object.

Advantages of Servlets

Invocation of a single Java method is typically much cheaper than starting a whole new program. So servlets are typically more efficient than CGI scripts.
  – This is important if we are planning to centralize processing in the server (rather than, say, delegate processing to an applet or browser script).
Besides this we have the usual advantages of Java:
  – Portability.
  – A fully object-oriented environment for large-scale program development.
  – Library infrastructure for decoding form data, handling cookies, etc. (although many of these things are also available in Perl).
Servlets are the foundation for Java Server Pages.

Plan of this Lecture Set

Review HTML forms and associated HTTP requests.
Briefly describe traditional CGI programming.
Detailed discussion of Java servlets:
  – Deploying Tomcat as a standalone Web server.
  – Simple servlets.
  – The servlet life cycle.
  – Servlet requests and responses.
  – More on the HTTP protocol.
  – Approaches to session tracking.
  – Handling cookies.
  – The servlet session-tracking API.

References

Core Servlets and JavaServer Pages, Marty Hall, Prentice Hall, 2000.
  – Good coverage and current, with some discussion of the Tomcat server.
Java Servlet Programming, Jason Hunter and William Crawford, O’Reilly, 1998.
  – Also good, with some good examples. Slightly out of date.
Java Servlet Specification, v2.2, and other documents, at: http://java.sun.com/products/servlet/

HTML Forms

The HTTP GET request

Before discussing forms, let’s look again at how the GET request normally works.
The following server program listens for HTTP requests, and simply prints the received request to the console.
A Dummy Web Server

    import java.io.* ;
    import java.net.* ;

    public class DummyServer {
      public static void main(String [] args) throws Exception {
        ServerSocket server = new ServerSocket(8080) ;
        while(true) {
          Socket sock = server.accept() ;
          BufferedReader in = new BufferedReader(
              new InputStreamReader(sock.getInputStream())) ;
          String method = in.readLine() ;     // first line: the method field
          System.out.println(method) ;
          while(true) {
            String field = in.readLine() ;
            if(field == null) break ;         // client closed the connection early
            System.out.println(field) ;
            if(field.length() == 0) break ;   // an empty line ends the headers
          }
          . . . Send a dummy response to client socket . . .
        }
      }
    }

A GET Request

On the host sirah I run the dummy server:

    sirah$ java DummyServer

Now I point a browser at http://sirah.csit.fsu.edu:8080/index.html
The dummy server program might print:

    GET /index.html HTTP/1.0
    Connection: Keep-Alive
    User-Agent: Mozilla/4.51 [en] (X11; I; ...)
    Host: sirah.csit.fsu.edu:8080
    Accept: image/gif, ..., */*
    Accept-Encoding: gzip
    Accept-Language: en
    Accept-Charset: iso-8859-1,*,utf-8
    <blank line>

Fields of the GET request

The HTTP GET request consists of a series of text fields on separate lines, ended by an empty line.
The first line is the most important: it is called the method field.
In simple GET requests, the second token in the method line is the requested file name, expressed as a path relative to the document root of the server.

A Simple HTML Form

The form element includes one or more input elements, along with any normal HTML elements:

    <html>
    <body>
    <form method=get action="http://sirah.csit.fsu.edu:8080/dummy">
    Name: <input type=text name=who size=32> <p>
    <input type=submit>
    </form>
    </body>
    </html>

Remarks

The form tag includes important attributes method and action.
The method attribute defines the kind of HTTP request sent when the form is submitted: its value can be get or post (see later).
The action attribute is a URL. In normal use it will locate an executable program on the server.
In this case it is a reference to my “dummy server”.
An input tag with type attribute text represents a text input field.
An input tag with type attribute submit represents a “submit” button.

Displaying the Form

If I place this HTML document on a Web Server at a suitable location, and visit its URL with a browser, I see something like:

    [Figure: the rendered form, with a text field labeled "Name:" and a "Submit Query" button]

Submitting the Form

If I type my name, and click on the “Submit Query” button, the dummy server running on sirah prints:

    GET /dummy?who=Bryan HTTP/1.0
    Connection: Keep-Alive
    User-Agent: Mozilla/4.51 [en] (X11; I; ...)
    Host: sirah.csit.fsu.edu:8080
    Accept: image/gif, ..., */*
    Accept-Encoding: gzip
    Accept-Language: en
    Accept-Charset: iso-8859-1,*,utf-8
    <blank line>

Remarks

When the form specifying the get method is submitted, the values inputted by the user are effectively appended to the end of the URL specified in the action attribute.
In the HTTP GET request, sent when the submit button is pressed, they appear attached to the second token of the first line of the request.
In simple cases the appended string begins with a ?
This is followed by pairs of the form name=value, where name is the name appearing in the name attribute of the input tag, and value is the value entered by the user.
If the form has multiple input fields, the pairs are separated by &

POST requests

This method of attaching input data to the URL is handy if the user has a relatively simple query (e.g. for a search engine).
For more complex forms it is usually recommended to specify the post method in the form tag, e.g.:

    <form method=post action="http://sirah.csit.fsu.edu:8080/dummy">

In the HTTP protocol, a POST request differs from a GET request by having some data appended after the headers.
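The way form inputs ride along in the first line of a GET request can be made concrete with a small Java sketch. It is only an illustration and not part of the dummy server: the class name RequestLine and its fields are hypothetical. It splits the method line into its tokens, separates the path from the text after the first ?, and then splits that text at & and = into name/value pairs. It deliberately does not undo URL encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: picks apart the first line of an HTTP request.
public class RequestLine {
    public final String method;   // e.g. "GET"
    public final String path;     // e.g. "/dummy"
    public final Map<String, String> params = new LinkedHashMap<>();

    public RequestLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 2) {
            throw new IllegalArgumentException("Not a request line: " + line);
        }
        method = tokens[0];
        String target = tokens[1];            // second token: path plus optional query
        int q = target.indexOf('?');
        if (q < 0) {
            path = target;                    // plain GET, no form data attached
        } else {
            path = target.substring(0, q);
            for (String pair : target.substring(q + 1).split("&")) {
                if (pair.isEmpty()) continue;
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                params.put(name, value);      // a later duplicate name overwrites an earlier one
            }
        }
    }

    public static void main(String[] args) {
        RequestLine r = new RequestLine("GET /dummy?who=Bryan HTTP/1.0");
        System.out.println(r.method + " " + r.path + " " + r.params);
    }
}
```

Storing one value per name is a simplification chosen to keep the sketch short; a form may legitimately send the same name several times.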
A Form Using the POST Method

    <form method=post action="http://sirah.csit.fsu.edu:8080/dummy">
    Surname: <input type=text name=surname size=32> <p>
    Forenames: <input type=text name=forenames size=40> <p>
    <input type=submit>
    </form>

Extending the Dummy Server

We can modify the dummy server to display POST requests, by declaring a variable contentLength, adding the lines

    if(field.regionMatches(true, 0, "Content-Length: ", 0, 16))
      contentLength = Integer.parseInt(field.substring(16).trim()) ;

inside the loop that reads the headers, and adding

    for(int i = 0 ; i < contentLength ; i++) {
      int b = in.read() ;
      System.out.print((char) b) ;
    }
    System.out.println() ;

after that loop.

Submitting the Form

When I click on the “Submit Query” button, the dummy server prints:

    POST /dummy HTTP/1.0
    Connection: Keep-Alive
    User-Agent: Mozilla/4.51 [en] (X11; I; ...)
    Host: sirah.csit.fsu.edu:8080
    Accept: image/gif, ..., */*
    Accept-Encoding: gzip
    Accept-Language: en
    Accept-Charset: iso-8859-1,*,utf-8
    Content-type: application/x-www-form-urlencoded
    Content-Length: 39

    surname=Carpenter&forenames=David+Bryan

Remarks

The method field (the first line) now starts with the word POST instead of GET; the data is not appended to the URL.
There are a couple more fields in the header, describing the format of the data.
Most importantly, the form data is now on a separate line at the end of the request, after the blank line that ends the headers.
However, the form data is still URL-encoded.

URL Encoding

URL encoding is a method of wrapping up form data in a way that will make a legal URL for a GET request.
We have seen that the encoded data consists of a sequence of name=value pairs, separated by &.
In the last example we saw that spaces are replaced by +.
Other non-alphanumeric characters are converted to the form %XX, where XX is a two-digit hexadecimal code.
In particular, line breaks in multi-line form data (e.g. addresses) become %0D%0A, the hex ASCII codes for a carriage-return, new-line sequence.
URL encoding is somewhat redundant for the POST method, but it is the default anyway.

More Options for the input Tag

We can make a group of radio buttons in an HTML form by using a set of input tags with the type attribute set to radio.
Tags belonging to the same button group should have the same name attribute, and distinct value attributes, e.g.:

    <form method=post action="http://sirah.csit.fsu.edu:8080/dummy">
    Favorite primary color: <p>
    Red: <input type=radio name=color value=red>
    Blue: <input type=radio name=color value=blue>
    Green: <input type=radio name=color value=green> <p>
    <input type=submit>
    </form>

Radio Buttons

The message sent to the server is:

    ...
    Content-type: application/x-www-form-urlencoded
    Content-Length: 10

    color=blue

Checkboxes

    <form method=post action="http://sirah.csit.fsu.edu:8080/dummy">
    What pets do you own? <p>
    <input type=checkbox name=pets value=dog checked> Dog <br>
    <input type=checkbox name=pets value=cat> Cat <br>
    <input type=checkbox name=pets value=bird> Bird <br>
    <input type=checkbox name=pets value=fish> Fish <p>
    <input type=submit>
    </form>

Example from “HTML and XHTML: The Definitive Guide”, O’Reilly.

Checkboxes

The message posted to the server is:

    ...
    pets=dog&pets=bird

Note there is no requirement that a form map a name to a unique value.

File-Selection

You can name a local file in an input element, and have the entire contents of the file posted by browser to server.
This is not allowed using the default URL-encoding for form data.
Instead you must specify multi-part MIME encoding in the form element, e.g.:

    <form method=post enctype="multipart/form-data"
          action="http://sirah.csit.fsu.edu:8080/dummy">
    Course: <input name=course size=20> <p>
    Students file: <input type=file name=students size=32> <p>
    <input type=submit>
    </form>

File-Selection Entry

With multi-part encoding, the data is no longer sent on a single line.
On submission the DummyServer prints the output shown next.

Output of DummyServer on submit

    POST /dummy HTTP/1.0
    Referer: http://sirah.csit.fsu.edu/users/dbc/forms/form5.html
    ...
    Content-type: multipart/form-data; boundary=--------------------------269912718414714
    Content-Length: 455

    -----------------------------269912718414714
    Content-Disposition: form-data; name="course"

    CIS6930
    -----------------------------269912718414714
    Content-Disposition: form-data; name="students"; filename="students"

    wcao
    flora
    Fulay
    gao
    ...
    zhao6930
    zheng
    -----------------------------269912718414714--

Remarks

Each form field has its own section in the posted file, separated by a delimiter specified in the Content-type field of the header.
Within each section there are one or more header lines, followed by a blank line, followed by the form data.
The values can contain binary data. There is no URL encoding.

Masked and Hidden fields

The input to a text field can be masked by setting the type attribute to password. The entered text will not be echoed to the screen.
If the type attribute is set to hidden, the input field is not displayed at all. This kind of field is often used in HTML forms dynamically generated by CGI scripts.
Hidden fields allow the CGI scripts to keep track of “session” information over an interaction that involves multiple forms; hidden fields may contain values characterizing the session.
  – Use of hidden fields will be one of the topics in the lectures on servlets.
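Returning to the multi-part format described above (sections separated by a delimiter, each with header lines, a blank line, then the data), the following Java sketch shows one way the sections could be separated. It is an illustration under simplifying assumptions, not a production parser: the class MultipartSketch and its Part type are hypothetical, the body is treated as text (a real parser must work on bytes, since values may be binary), and lines are assumed to end in carriage-return/new-line. Each delimiter line is the boundary value from the Content-type header with two extra hyphens in front; the closing delimiter has two more hyphens at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: splits a multipart/form-data body into its sections.
public class MultipartSketch {
    /** One section of the body: its header lines and its content. */
    public static class Part {
        public final List<String> headers = new ArrayList<>();
        public String content = "";
    }

    public static List<Part> parse(String body, String boundary) {
        String delimiter = "--" + boundary;
        List<Part> parts = new ArrayList<>();
        String[] sections = body.split(Pattern.quote(delimiter));
        // sections[0] is whatever precedes the first delimiter; it is ignored.
        for (int i = 1; i < sections.length; i++) {
            String section = sections[i];
            if (section.startsWith("--")) break;          // closing delimiter reached
            if (section.startsWith("\r\n")) section = section.substring(2);
            int blank = section.indexOf("\r\n\r\n");      // blank line ends the section headers
            if (blank < 0) continue;                      // malformed section: skip it
            Part part = new Part();
            for (String header : section.substring(0, blank).split("\r\n")) {
                part.headers.add(header);
            }
            String content = section.substring(blank + 4);
            if (content.endsWith("\r\n")) {               // line break before the next delimiter
                content = content.substring(0, content.length() - 2);
            }
            part.content = content;
            parts.add(part);
        }
        return parts;
    }

    public static void main(String[] args) {
        String body = "--XYZ\r\n"
            + "Content-Disposition: form-data; name=\"course\"\r\n\r\n"
            + "CIS6930\r\n"
            + "--XYZ--\r\n";
        for (Part p : parse(body, "XYZ")) {
            System.out.println(p.headers + " -> " + p.content);
        }
    }
}
```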
Text Areas

Similar to text input fields, but allow multi-line input.
Included in a form by using the textarea tag, e.g.:

    <textarea name=address cols=40 rows=3>
    . . . optional default text goes here . . .
    </textarea>

With default (URL) encoding, lines of input are separated by carriage return/newline, coded as %0D%0A.

Text Area Input

Data posted to server:

    address=Bryan+Carpenter%0D%0ACSIT%2C+FSU%0D%0ATallahassee%2C+FL+32306-4120

Scrollable Menus (Lists)

For long lists of options, when checkboxes become too tedious:

    <select name=pets size=3 multiple>
    <option value=dog> Dog
    <option value=cat> Cat
    <option value=bird> Bird
    <option value=fish> Fish
    </select>

The value attribute in the option tag is optional: default value returned is the displayed string, immediately following the tag.
Without the multiple attribute, only a single option can be selected.

List Input

The message posted to the server is:

    ...
    pets=dog&pets=bird

Conventional CGI

Handling Form Data on the Server

In conventional CGI programming, the URL in the action attribute of a form will identify an executable file somewhere in the Web Server’s document hierarchy.
A common server convention is that these executables live in a subdirectory of cgi-bin/
The executable file may be written in any language. For definiteness we will assume it is written in Perl, and refer to it as a CGI script.
The Web Server program will invoke the CGI script, and pass it the form data, either through environment variables or by piping data to standard input of the script.
The CGI script generates a response to the form, which is piped to the Web server through its standard output, then returned to the browser.

Operation of a CGI Script

At the most basic level, a CGI script must
  – Parse the input (the form data) from the server, and
  – Generate a response.
Most often the response is the text of a dynamically generated HTML document, preceded by some HTTP headers.
  – In practice the only required HTTP header is the Content-type header. The Web Server will fill in other necessary headers automatically.
Even if there is no meaningful response to the input data, the CGI script must output an empty message, or some error message.
  – Otherwise the server will not close the connection to the client, and a browser error will occur.

“Hello World” CGI Script

In the directory /home/httpd/cgi-bin/users/dbc on sirah, I create the file hello.pl, with contents:

    #!/usr/bin/perl
    print "Content-type: text/html\n\n" ;
    print "<html><body><h1>Hello World!</h1></body></html>" ;

I mark this file world readable, and mark it executable:

    sirah$ chmod o+r hello.pl
    sirah$ chmod +x hello.pl

Now I point my browser at the URL: http://sirah/cgi-bin/users/dbc/hello.pl

Output from CGI Script

The novel feature here is that the HTML was dynamically generated: it was printed out on the fly by the Perl script.

Retrieving Form Data

Several environment variables are set up by the server to pass information about the request to the Perl script.
If the form data was sent using a GET request, the most important is QUERY_STRING, which contains all the text in the URL following the first ? character.
If the form data was sent using a POST request, the environment variable CONTENT_LENGTH contains the length in bytes of the posted data. To retrieve this data, these bytes are read from the standard input of the script.
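Since the executable may be written in any language, the retrieval steps just described can also be sketched in Java. This is a minimal illustration rather than a recommended way to write CGI programs: the class CgiFormData and its method names are hypothetical, and REQUEST_METHOD is a standard CGI environment variable that is not mentioned above. The sketch takes the raw data from QUERY_STRING for GET or reads CONTENT_LENGTH bytes from standard input for POST, then undoes the URL encoding. Values are collected in lists because a name may occur more than once (as in pets=dog&pets=bird); decoding with UTF-8 is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of what a CGI program does with its inputs.
public class CgiFormData {
    /** Returns the raw URL-encoded form data for a GET or POST request. */
    public static String rawData(Map<String, String> env, InputStream stdin)
            throws IOException {
        if ("POST".equalsIgnoreCase(env.get("REQUEST_METHOD"))) {
            int length = Integer.parseInt(env.get("CONTENT_LENGTH"));
            byte[] buf = new byte[length];
            int off = 0;
            while (off < length) {                 // read exactly CONTENT_LENGTH bytes
                int n = stdin.read(buf, off, length - off);
                if (n < 0) break;                  // input ended early
                off += n;
            }
            return new String(buf, 0, off, "ISO-8859-1");
        }
        String query = env.get("QUERY_STRING");    // text after the first '?'
        return query == null ? "" : query;
    }

    /** Splits name=value pairs at '&' and undoes '+' and %XX encoding. */
    public static Map<String, List<String>> decode(String data)
            throws UnsupportedEncodingException {
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (String pair : data.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String name = URLDecoder.decode(eq < 0 ? pair : pair.substring(0, eq), "UTF-8");
            String value = eq < 0 ? "" : URLDecoder.decode(pair.substring(eq + 1), "UTF-8");
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(decode(rawData(System.getenv(), System.in)));
    }
}
```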
GET example

I change our first form to submit data to a CGI script:

    <form method=get action="http://sirah.csit.fsu.edu/cgi-bin/users/dbc/getEg.pl">
    Name: <input type=text name=who size=32> <p>
    <input type=submit>
    </form>

and define getEg.pl by:

    #!/usr/bin/perl
    print "Content-type: text/html\n\n" ;
    print "<html><body><h1>Hello $ENV{QUERY_STRING}!</h1></body></html>\n" ;

When I point the browser at the form, enter my name, and submit the form, the page returned to the browser contains the message:

    Hello who=Bryan!

POST example

Change the form as follows:

    <form method=post action="http://sirah.csit.fsu.edu/cgi-bin/users/dbc/postEg.pl">
    Name: <input type=text name=who size=32> <p>
    <input type=submit>
    </form>

and define postEg.pl by:

    #!/usr/bin/perl
    print "Content-type: text/html\n\n" ;
    for($i = 0 ; $i < $ENV{CONTENT_LENGTH} ; $i++) {
      $in .= getc ;
    }
    print "<html><body><h1>Hello $in!</h1></body></html>\n" ;

Using the CGI module

The previous examples illustrate the underlying mechanisms used to communicate between server and CGI program.
One could go on to use the text processing features of Perl to parse the form data and generate meaningful responses.
In modern Perl you can (and presumably should) use the CGI module to hide many of these details, especially extracting form parameters.

CGI module example

Change the form as follows:

    <form method=post action="http://sirah.csit.fsu.edu/cgi-bin/users/dbc/CGIEg.pl">
    Name: <input type=text name=who size=32> <p>
    <input type=submit>
    </form>

and define CGIEg.pl by:

    #!/usr/bin/perl
    use CGI qw( :standard) ;
    $name = param("who") ;
    print "Content-type: text/html\n\n" ;
    print "<html><body><h1>Hello $name!</h1></body></html>\n" ;

Now the browser gets a more friendly message like:

    Hello Bryan!
Getting Started with Servlets

Server Software

Standard Web servers typically need some additional software to allow them to run servlets. Options include:
  – Apache Tomcat
    The official reference implementation for the servlet 2.2 and JSP 1.1 specifications. It can stand alone or be integrated into the Apache Web server.
  – JavaServer Web Development Kit (JSWDK)
    A small standalone Web server mainly intended for servlet development.
  – Sun’s Java Web server
    An early server supporting servlets. Now apparently obsolete.
  – Allaire JRun, New Atlanta’s ServletExec, . . .

Tomcat

In these lectures we will use Apache Tomcat for examples.
For debugging of servlets it seems to be necessary to use a stand-alone server, dedicated to the application you are developing.
  – The current architecture of servlets makes revision of servlet classes already loaded in a Web server either disruptive or expensive. In general you need to establish your classes are working smoothly before they are deployed in a production server.
Hence you will be encouraged to install your own private server for developing Web applications.
Tomcat is the flagship product of the Jakarta project, which produces server software based on Java.

Typical Modes of Operation of Tomcat

    1. Stand-alone: browser client -> (port 8080) Tomcat server
    2. In-process servlet container: browser client -> (port 80) Apache, with the Tomcat server running inside it
    3. Out-of-process servlet container: browser client -> (port 80) Apache -> (port 8007) Tomcat server

Downloading Tomcat

Go to the Jakarta home-page: http://jakarta.apache.org
Follow the link for downloading binaries.
Under the heading Release Builds, follow the Tomcat X.X link.
Get the file jakarta-tomcat-X.X.tar.gz.
Unpacking and Setting the Environment

Unpack the compressed file, e.g.:

    gunzip -c jakarta-tomcat-X.X.tar.gz | tar xvf -

Set the environment variables TOMCAT_HOME and JAVA_HOME, e.g.:

    export TOMCAT_HOME=$HOME/jakarta-tomcat-X.X
    export JAVA_HOME=/usr/java/jdk1.Y.Y

Most likely you will also want to add these commands to your .bashrc file.

Servers on Course Hosts: Ground Rules

The system manager would like to be able to keep track of who is running what Web server.
  – Also we want to avoid overloading the course hosts.
You will each be allocated a port number on one of the three course hosts. Please stick with this port number and host for your main server.
  – You can run additional servers on random port numbers for brief experiments, but please not for extended periods.
  – Of course avoid port numbers allocated to other students!
Your Tomcat home directory should be directly nested in your top-level home directory.
  – The management reserves the right to read and modify your server configuration if it seems to be causing problems.

Choosing a Port

Edit the file jakarta-tomcat-X.X/conf/server.xml.
Find the Connector element that defines the parameters of the HTTP connection handler. It looks like:

    <Connector className=". . .">
      <Parameter name="handler" value=". . . .HttpConnectionHandler"/>
      <Parameter name="port" value="8080"/>
    </Connector>

If you are using a course host, change the value of the port parameter from its default 8080 to a port number you have been allocated.

The AJP Connector

In the file jakarta-tomcat-X.X/conf/server.xml you will also find a Connector element defining the parameters of an “AJP connection handler” (used for interactions with an Apache server). It looks like:

    <Connector className=". . .">
      <Parameter name="handler" value=". . . .Ajp12ConnectionHandler"/>
      <Parameter name="port" value="8007"/>
    </Connector>

If you are using a course host, change the value of the port parameter from its default 8007 to a value unique to you, e.g. a port number one greater than your HttpConnectionHandler port.
Even if you are not going to use the Apache connection, the shutdown.sh script also uses this port, so the connection handler is still required.

Starting and Stopping your Server

If you are using a course host, these operations should be done on the host on which you have been allocated a port to run your main server.
To start your server run the script:

    jakarta-tomcat-X.X/bin/startup.sh

To stop your server run the script:

    jakarta-tomcat-X.X/bin/shutdown.sh

If for any reason this fails, simply find the java process and kill it.

Check Your Server is Running

If you are running your server on a course host, and your allocated host/port pair is host/XXXX, point your browser at the URL: http://host.csit.fsu.edu:XXXX
You should see the default Tomcat home page.
In the Tomcat 3.1 release, the file for this home page is at:

    jakarta-tomcat-X.X/webapps/ROOT/index.html

First Servlets

Creating a Context

Before writing a servlet, you need a place to put it.
Shut down your server, if it is running.
In the file jakarta-tomcat-X.X/conf/server.xml, find the example Context elements. Add a new context element such as:

    <Context path="/dbc" docBase="webapps/dbc/" debug="0" reloadable="true">
    </Context>

  – The path attribute defines a logical path that appears in the URL.
  – The docBase attribute defines the physical directory where HTML and servlets live.
  – Be careful to put /s in all the right places!
  – The reloadable flag is supposed to allow servlet classes to be reloaded into a running server if they have been modified. We set it true because that is the recommended default during development. Note, however, it does not work very reliably!
Creating a Document Directory

This can be created as a subdirectory of jakarta-tomcat-X.X/webapps/
With the server configuration defined above, I create a subdirectory:

    jakarta-tomcat-X.X/webapps/dbc/

This will be the root directory for my HTML documents.
To check my configuration is working properly, I can put a file index.html in dbc/, restart my server, and point my browser at: http://host.csit.fsu.edu:XXXX/dbc where host/XXXX is my host/port pair.
I should see the contents of the HTML file.

A Directory for Servlet Classes

Now I create the subdirectories:

    jakarta-tomcat-X.X/webapps/dbc/WEB-INF/

and

    jakarta-tomcat-X.X/webapps/dbc/WEB-INF/classes/

The latter directory is where I put class files and package subdirectories for servlets.
The WEB-INF subdirectory will not be directly visible to browsers as a document directory.

A “Hello World” Servlet

    import java.io.* ;
    import javax.servlet.* ;
    import javax.servlet.http.* ;

    public class HelloWorld extends HttpServlet {
      public void doGet(HttpServletRequest request,
                        HttpServletResponse response)
          throws IOException, ServletException {
        response.setContentType("text/html") ;
        PrintWriter out = response.getWriter() ;
        out.println("<html><body>") ;
        out.println("<h1>Hello World!</h1>") ;
        out.println("</body></html>") ;
      }
    }

Remarks

This program should be contained in a file HelloWorld.java, which may be placed in the classes/ subdirectory.
HttpServlet is the base class for servlets running in HTTP servers. Although servlets can be written for other kinds of server, in reality servlets are nearly always HttpServlets.
The doGet() method is called in response to an HTTP GET request directed at the servlet.
As the names suggest, the arguments describe the browser’s request and the servlet’s response.
Before writing to the output stream associated with the response, the content type header (at least) must be set.
Setting the Class Path

Before compiling servlet code you will have to set the class path to include some related libraries.
  – The server apparently also needs the class path to be set at the time it is started, before it can run servlets.
Shut down the server again.
Set your class path to include some necessary jar files, e.g.:

    export CLASSPATH=$TOMCAT_HOME/lib/servlet.jar:\
    $TOMCAT_HOME/lib/jasper.jar:\
    $TOMCAT_HOME/lib/jaxp.jar

Again, probably add this command to your .bashrc file.
  – A back-slash, \, at the end of a line is a line-continuation character (it “escapes” the EOL). Do not include it if you type the whole command on one line!
  – To avoid grief in the future, also make sure now that the working directory is on your class path, e.g.:

    CLASSPATH=$CLASSPATH:.

Restart the server.

Compiling and Deploying the Servlet

This is straightforward:

    javac HelloWorld.java

You should now be able to view the servlet. In my case I point my browser at the URL: http://host.csit.fsu.edu:XXXX/dbc/servlet/HelloWorld
Note that by default the Tomcat server will run with the same privileges as the user who started it. This means you don’t actually need to make files world readable (because you have privileges to read them).
It also means you have to be careful. If you stick with this default you must never deploy servlets that have the power to damage or compromise your account, e.g. by reading or writing arbitrary files, or executing random commands!

A Servlet that Reads a Parameter

Define a new servlet class called HelloUser.
This is identical to the class HelloWorld, except that the line:

    out.println("<h1>Hello World!</h1>") ;

is replaced with

    out.println("<h1>Hello " + request.getParameter("who") + "!</h1>") ;

First Form using a Servlet

In the directory jakarta-tomcat-X.X/webapps/dbc/ I place an HTML file hello.html containing the form element:

    <form method=get action="http://sirah.csit.fsu.edu:8081/dbc/servlet/HelloUser">
    Name: <input type=text name=who size=32> <p>
    <input type=submit>
    </form>

This assumes my host/port pair is sirah/8081.
To view this form, I point my browser at the URL: http://sirah.csit.fsu.edu:8081/dbc/hello.html
If I enter my name and submit the form, I get back a page containing the message:

    Hello Bryan!

The Servlet Life Cycle

Servlet Classes

Any servlet class implements the interface javax.servlet.Servlet.
This interface defines a few low-level methods, including the low-level request-handling method, service().
  – Perhaps the only method from Servlet you will use explicitly is getServletConfig().
All servlets we will be concerned with are extended from the base class javax.servlet.http.HttpServlet (which implements Servlet).

Servlet Instances

By default, (at most) one instance of a given servlet class will ever be created by a Web server process.
By default, the servlet class is loaded into the Web server’s JVM, and the unique servlet instance is created, the first time any client sends a request to a URL identifying the servlet class.
Subsequent requests to the same URL are all handled by the same servlet class instance.
  – By default, however, each request is handled in a different Java thread.
This means that a later request can access results of processing an earlier request through values of instance variables (or class variables).

The init() Method

The init() method:

    public void init() throws ServletException {. . .}

is quite analogous to the init() method on applets. It is called once when the servlet is created.
You override it to define initialization code for your servlet instance.
As with applets, this is used in preference to defining a non-default constructor, because you are allowed to access initialization parameters inside init() (but not in a constructor).
  – There is another lower-level init() method:

    public void init(ServletConfig config) throws ServletException {. . .}

    Don’t override it. Instead, if you need a ServletConfig during initialization, call getServletConfig() in the body of the no-argument init().

The Request Handling Methods

These are where you put the code that handles HTTP requests to the URL of the servlet.
The available request-handling methods are

    doGet()       Handle HTTP GET request.
    doPost()      Handle HTTP POST request.
    doPut()       Handle HTTP PUT request.
    doDelete()    Handle HTTP DELETE request.
    doOptions()   Handle HTTP OPTIONS request.
    doTrace()     Handle HTTP TRACE request.

  – Note there is no doHead().
These have generic signature:

    protected void doXxx(HttpServletRequest req, HttpServletResponse resp)
        throws ServletException, IOException {. . .}

Last Modification Date

When a browser reloads a page, it can include an If-Modified-Since header. If the document has not been modified since the specified date, the server response will be a simple “Not Modified” status code (no data).
For dynamically generated content, OS date-stamps on document files are not enough to determine whether the effective content will be different.
Instead a servlet can override:

    protected long getLastModified(HttpServletRequest req) {. . .}

and thus take advantage of browser caching.
  – The returned date is in standard Java representation: milliseconds since midnight GMT, January 1, 1970.

The destroy() method

Finally a servlet can also override

    public void destroy() {. . .}

If the Web server terminates gracefully, it will invoke destroy() on all servlet instances it holds before shutting down.
In principle, this is a place where you can put code to back-up the current state of the servlet to persistent storage. The servlet can restart from restored state when the Web server is restarted.
In practice, servers (especially Tomcat!) often terminate “ungracefully”, when the system crashes or the server process is killed. Relying on destroy() methods being called is probably not advisable.

A Counter Servlet

    import java.io.* ;
    import javax.servlet.* ;
    import javax.servlet.http.* ;

    public class Counter extends HttpServlet {
      int count = 0 ;

      public void doGet(HttpServletRequest req, HttpServletResponse resp)
          throws IOException, ServletException {
        resp.setContentType("text/html") ;
        PrintWriter out = resp.getWriter() ;
        out.println("<html><head></head><body>") ;
        out.println("This servlet instance has been accessed " +
                    (count++) + " times") ;
        out.println(" </body></html>") ;
      }
    }

Example taken from “Java Servlet Programming”, O’Reilly.

Remarks

The first time I point my browser at this servlet (e.g. at http://sirah.csit.fsu.edu:8081/dbc/servlet/Counter ), I get a response page containing the message:

    This servlet instance has been accessed 0 times

Each time I reload the URL, the count increases.
Since count is an instance variable of the class, this illustrates that indeed only a single instance of Counter is created.
This servlet is not completely reliable, because it is possible to have concurrent requests in different threads. The instance variable count is shared by threads. This could lead to problems of interference.

Mutual Exclusion

In general, any access to
  – servlet instance variables,
  – servlet class variables,
  – external files, etc.
that may be modified by any HTTP request on the servlet, should be guarded by synchronized methods or a synchronized statement.
This is very important!
For example, the increment of count could be done in a synchronized statement:

    int myCount ;
    synchronized(this) {
        myCount = count++ ;
    }

Subsequently the local variable myCount, which is private to the thread, is printed in the response.

Registering Servlet Instances

In simple cases we don’t need to explicitly register servlets with the Web server. Instances will simply be created on demand.

However, registering servlets has various advantages:

– we can give the servlets meaningful names, or map them to simpler URL addresses,
– we can create multiple instances of the same servlet class, with different names,
– we can set initialization parameters for the instance, etc.

With Tomcat, servlets can be registered by creating entries in an XML file called web.xml, which is placed in the WEB-INF/ subdirectory for your context.

Example Registering a Servlet

I copy the example file:

    jakarta-tomcat-X.X/webapps/examples/WEB-INF/web.xml

to my personal context directory:

    jakarta-tomcat-X.X/webapps/dbc/WEB-INF/

I delete the existing <servlet>. . .</servlet> and <servlet-mapping>. . .</servlet-mapping> elements from my copy, and replace them with:

    <servlet>
      <servlet-name>counter1</servlet-name>
      <servlet-class>Counter</servlet-class>
    </servlet>

I restart the server.

Multiple Instances

I can view the registered servlet at the URL:

    http://sirah.csit.fsu.edu:8081/dbc/servlet/counter1

Now add a second servlet element to the web.xml file:

    <servlet>
      <servlet-name>counter2</servlet-name>
      <servlet-class>Counter</servlet-class>
    </servlet>

After restarting the server again, I find that the access counts for the original servlet and for the second servlet, at:

    http://sirah.csit.fsu.edu:8081/dbc/servlet/counter2

are updated independently.
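The two points above (guarding the shared count with a synchronized statement, and each registered instance keeping its own count) can be tried outside a servlet container. The following is only a sketch under stated assumptions: CounterModel, handleRequest(), current() and hammer() are hypothetical names, not part of the servlet API, and plain threads stand in for concurrent HTTP requests.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one registered servlet instance; not part of the servlet API.
class CounterModel {

    private int count = 0;   // shared by every thread that calls handleRequest()

    // Mirrors the guarded increment: the read-and-increment happens under the lock,
    // and the result is kept in a local variable that is private to the calling thread.
    int handleRequest() {
        int myCount;
        synchronized (this) {
            myCount = count++;
        }
        return myCount;
    }

    synchronized int current() {
        return count;
    }

    // Start `threads` threads, each making `requestsPerThread` calls, and wait for them all.
    static void hammer(final CounterModel counter, int threads, final int requestsPerThread) {
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < requestsPerThread; i++) {
                    counter.handleRequest();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
    }

    public static void main(String[] args) {
        CounterModel counter1 = new CounterModel();   // plays the role of "counter1"
        CounterModel counter2 = new CounterModel();   // plays the role of "counter2"
        hammer(counter1, 8, 1000);
        hammer(counter2, 2, 10);
        System.out.println("counter1: " + counter1.current());
        System.out.println("counter2: " + counter2.current());
    }
}
```

With the synchronized statement in place, no increment is lost, so the two totals are exactly 8000 and 20. Removing the synchronized block from handleRequest() makes the first total unpredictable, which is the interference the lecture warns about.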
Initialization Parameters

A new counter servlet, defining an init() method:

    public class InitCounter extends HttpServlet {

        int count ;

        public void init() throws ServletException {
            ServletConfig config = getServletConfig() ;
            try {
                count = Integer.parseInt(config.getInitParameter("initial")) ;
            }
            catch (NumberFormatException e) {    // parameter missing or not a number
                count = 0 ;
            }
        }

        public void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws . . . {
            ...
        }
    }

Defining Initialization Parameters

In web.xml, I add the element:

    <servlet>
      <servlet-name>counter1</servlet-name>
      <servlet-class>InitCounter</servlet-class>
      <init-param>
        <param-name>initial</param-name>
        <param-value>50</param-value>
      </init-param>
    </servlet>

Now when I restart the server and point my browser at, say:

    http://sirah.csit.fsu.edu:8081/dbc/servlet/counter1

I get a response page containing the message:

    This servlet has been accessed 50 times

Handling Requests

Reading Form Data

Servlets make reading form data easy (at least in common cases).

If a particular parameter name is known to have only a single value, one can just apply the method:

    public String getParameter(String name) {. . .}

to the HttpServletRequest parameter of the doGet() or doPost() method.

Use of this method was illustrated earlier in the HelloUser example.

Note that parameter names are case sensitive.

Uniform support for GET and POST

The parameter-reading methods behave the same for GET and POST requests. It is natural to support both kinds of request with the same code.

To do this, simply have doGet() dispatch to doPost(), or vice versa. For example:

    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException, ServletException {
        doPost(request, response) ;
    }

Determining the HTTP Method

The getMethod() method on the HttpServletRequest returns the HTTP method given in the first line of the request.
For example, when a client sends the HTTP HEAD request, the server is supposed to treat it like a GET request, but return the headers only, not the data.

The server will automatically discard any data doGet() returns, but (if you had the urge) you could make things a bit more efficient as follows:

    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException, ServletException {
        . . . set headers . . .
        if(request.getMethod().equals("HEAD")) return ;
        . . . return data . . .
    }

Information from Request Headers

getMethod() is one of a series of convenience methods that read information from the request headers. Others include:

    getRequestURI(), getProtocol()     Method header (the request line)
    getContentLength()                 Content-Length header
    getContentType()                   Content-Type header
    getAuthType(), getRemoteUser()     Authorization header
    getCookies()                       See later

Reading Request Headers Directly

The preceding methods are not exhaustive. If you know the name of the header you want, use

    String getHeader(String name)

For headers (e.g. Accept-Language) that can appear multiple times in a given request, use:

    java.util.Enumeration getHeaders(String name)

To simply enumerate all headers of a given request, use

    java.util.Enumeration getHeaderNames()

in conjunction with getHeader().
Displaying All Headers

    import java.io.* ;
    import java.util.* ;
    import javax.servlet.* ;
    import javax.servlet.http.* ;

    public class Headers extends HttpServlet {

        public void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws IOException, ServletException {

            resp.setContentType("text/html") ;
            PrintWriter out = resp.getWriter() ;
            out.println("<html><head></head><body>") ;

            Enumeration headers = req.getHeaderNames() ;
            while(headers.hasMoreElements()) {
                String name = (String) headers.nextElement() ;
                out.println(name + "<br>" + req.getHeader(name) + "<br><br>") ;
            }
            out.println("</body></html>") ;
        }
    }

HTTP 1.1 Request Headers

    Accept: MIME types the browser can handle.
    Accept-Charset: Character sets the browser can handle.
    Accept-Encoding: Encodings (e.g. gzip) the browser can handle.
    Accept-Language: Languages the browser prefers: English (en), etc.
    Authorization: User ID/password.
    Cache-Control: For proxy servers.
    Connection: Can the browser keep connections alive?
    Content-Length: Length of POSTed data.
    Content-Type: MIME encoding.
    Cookie: Cookies previously received from this site.
    Expect: Browser wishes to attach a document.
    From: Email address of requester.
    Host: Host/port information from the original URL.

HTTP 1.1 Request Headers (cont.)

    If-Match:
    If-Modified-Since: Only send recently changed data.
    If-None-Match:
    If-Range:
    If-Unmodified-Since: Used with PUT.
    Pragma: Only standard value is no-cache.
    Proxy-Authorization:
    Range: Get part of a document.
    Referer: Set if the request came from a link on a Web page.
    Upgrade: Change protocol.
    User-Agent: Identifies the browser.
    Via: Set by gateways and proxies.
    Warning:

Multiple-valued Parameters

If a form parameter can have more than one value (e.g. a value from a menu allowing multiple selections), you should apply the method:

    public String [] getParameterValues(String name) {. . .}

to the HttpServletRequest object.
Recall this example from the section on forms:

    <select name=pets size=3 multiple>
      <option value=dog> Dog
      <option value=cat> Cat
      <option value=bird> Bird
      <option value=fish> Fish
    </select>

The form may send the data in a GET request to the following servlet.

Handling Multi-valued Parameters

    public class MultiValue extends HttpServlet {

        public void doGet(HttpServletRequest request, HttpServletResponse response)
                throws IOException, ServletException {

            response.setContentType("text/html") ;
            PrintWriter out = response.getWriter() ;

            String [] pets = request.getParameterValues("pets") ;

            out.println("<html><head></head><body>") ;
            out.println("Your pets:<p>") ;
            out.println("<table border cellspacing=0 cellpadding=5>") ;
            if(pets != null)    // null if nothing was selected
                for (int i = 0 ; i < pets.length ; i++)
                    out.println("<tr><td>" + pets [i] + "</td></tr>") ;
            out.println("</table>") ;
            out.println("</body></html>") ;
        }
    }

Multi-part Data

Recall this (slightly modified) example from the section on forms:

    <form method=post enctype="multipart/form-data"
          action="http://sirah.csit.fsu.edu:8081/dbc/servlet/MultiPart">
      Course: <input name=course size=20> <p>
      Students file: <input type=file name=students size=32> <p>
      <input type=submit>
    </form>

The simple getParameter() approach does not appear to work for multi-part data (required for uploading files).

However, we can resort to a lower-level, CGI-like approach: reading the posted data from an input stream, and decoding it “by hand”.
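To make “decoding by hand” concrete, here is a sketch of a line-oriented parser for a multipart/form-data body. It is an illustration under simplifying assumptions, not a general MIME parser: it handles text content only and treats every line between two delimiters as content. The class MultipartSketch, its method names and the sample boundary XyZ are all hypothetical. In a servlet, the content type would come from req.getContentType() and the reader from req.getReader().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; a sketch of hand-decoding multipart/form-data, text parts only.
class MultipartSketch {

    // From "multipart/form-data; boundary=XyZ" build the delimiter line "--XyZ".
    static String delimiterOf(String contentType) {
        int at = contentType.indexOf("boundary=");
        if (at < 0) {
            throw new IllegalArgumentException("no boundary in: " + contentType);
        }
        String token = contentType.substring(at + "boundary=".length()).trim();
        if (token.length() >= 2 && token.startsWith("\"") && token.endsWith("\"")) {
            token = token.substring(1, token.length() - 1);   // boundary may be quoted
        }
        return "--" + token;
    }

    // From: Content-Disposition: form-data; name="course"   extract: course
    // The leading space in the key avoids matching the filename="..." attribute.
    static String fieldNameOf(String dispositionLine) {
        String key = " name=\"";
        int at = dispositionLine.indexOf(key);
        if (at < 0) return null;
        int start = at + key.length();
        int end = dispositionLine.indexOf('"', start);
        return end < 0 ? null : dispositionLine.substring(start, end);
    }

    // Returns field name -> content lines of that part, in order of appearance.
    static Map<String, List<String>> parse(String contentType, BufferedReader in) {
        String delimiter = delimiterOf(contentType);
        Map<String, List<String>> parts = new LinkedHashMap<>();
        try {
            String line = in.readLine();
            while (line != null && !line.startsWith(delimiter)) {
                line = in.readLine();                          // skip any preamble
            }
            while (line != null && !line.equals(delimiter + "--")) {
                String name = null;
                // Part headers run up to the first empty line.
                while ((line = in.readLine()) != null && !line.isEmpty()) {
                    if (line.toLowerCase().startsWith("content-disposition:")) {
                        name = fieldNameOf(line);
                    }
                }
                // Content runs up to the next delimiter line.
                List<String> content = new ArrayList<>();
                while ((line = in.readLine()) != null && !line.startsWith(delimiter)) {
                    content.add(line);
                }
                if (name != null) {
                    parts.put(name, content);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parts;
    }

    public static void main(String[] args) {
        String contentType = "multipart/form-data; boundary=XyZ";
        String body =
              "--XyZ\r\n"
            + "Content-Disposition: form-data; name=\"course\"\r\n"
            + "\r\n"
            + "CIS 5930\r\n"
            + "--XyZ\r\n"
            + "Content-Disposition: form-data; name=\"students\"; filename=\"students.txt\"\r\n"
            + "Content-Type: text/plain\r\n"
            + "\r\n"
            + "Alice\r\n"
            + "Bob\r\n"
            + "--XyZ--\r\n";
        Map<String, List<String>> parts =
            parse(contentType, new BufferedReader(new StringReader(body)));
        System.out.println("course:   " + parts.get("course"));
        System.out.println("students: " + parts.get("students"));
    }
}
```

The sample body in main() follows the shape of what a browser posts for the form above: each part starts with the delimiter line, carries its own headers, and is separated from its content by an empty line; the final delimiter has an extra trailing "--".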
Displaying Raw Multi-part Data

    public class MultiPart extends HttpServlet {

        public void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws IOException, ServletException {

            resp.setContentType("text/html") ;
            PrintWriter out = resp.getWriter() ;
            out.println("<html><head></head><body>") ;

            String contentType = req.getContentType() ;
            out.println("content type:<br>" + contentType + "<br>") ;

            BufferedReader in = req.getReader() ;
            while(true) {
                String line = in.readLine() ;
                if(line == null) break ;
                out.println(line + "<br>") ;
            }
            out.println("</body></html>") ;
        }
    }

Remarks

This servlet will simply print the value of the Content-Type header, and the raw version of the posted data.

In general it is not safe to combine this style of reading data, using getReader(), with the higher-level approach, using getParameter(); choose one or the other.

Multi-part Data Example

    public void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException, ServletException {

        resp.setContentType("text/html") ;
        PrintWriter out = resp.getWriter() ;
        out.println("<html><head></head><body>") ;

        Vector students = new Vector() ;
        String course = parseFormData(students, req) ;

        out.println("course: " + course + "<br>") ;
        out.println("students: <br>") ;
        out.println("<table border cellspacing=0 cellpadding=5>") ;
        for (int i = 0 ; i < students.size() ; i++)
            out.println("<tr><td>" + (String) students.get(i) + "</td></tr>") ;
        out.println("</table>") ;
        out.println("</body></html>") ;
    }

Multi-part Data Example (cont.)

    public String parseFormData(Vector students, HttpServletRequest req)
            throws IOException {

        String course = null ;

        String contentType = req.getContentType() ;
        String boundary = "--" + contentType.substring( . . . ) ;
                // Extract part boundary from content type header

        BufferedReader in = req.getReader() ;

        String line = in.readLine() ;
        while(! line.equals(boundary + "--")) {
            String header = in.readLine() ;
            String name = header.substring( . . . ) ;
                    // Extract parameter name from content disposition header
            if(name.equals("course")) {
                course = in.readLine() ;
                line = in.readLine() ;
            }
            else if(name.equals("students"))
                while(true) {
                    line = in.readLine() ;
                    if(line.startsWith(boundary)) break ;
                    students.addElement(line) ;
                }
        }
        return course ;
    }

Remarks

The parseFormData() implementation outlined here is schematic only.

Parsing the multi-part MIME encoded data is straightforward, but clearly fairly tedious.

– Servlets don’t give much help here.

Generating Responses

The HTTP Status Line

A minimal server response to a client request might be:

    HTTP/1.1 200 OK
    Content-Type: text/plain

    Hello World!

We already saw how to set the content type explicitly using setContentType(). Here we are more interested in the first line of the response: the status line.

As the example suggests, a status value of 200 means the request was successfully serviced. For a servlet response, the Web server sets this status value by default.

A servlet can explicitly set other values by using the setStatus() method of HttpServletResponse.

HTTP Status Codes

    100 Continue: Response to Expect request.
    101 Switching Protocols: Response to Upgrade request.
    200 OK: OK!
    201 Created: Server created a document. URL follows.
    202 Accepted: Processing is in progress.
    203 Non-Authoritative Information:
    204 No Content: No new document is available.
    205 Reset Content: Clear form fields.
    206 Partial Content: Response to Range request.
    300 Multiple Choices: Trick question?
    301 Moved Permanently: Document is elsewhere.
    302 Found: Redirects the browser to a different URL.
    303 See Other: Please use GET instead of POST.
    304 Not Modified: Response to request with If-Modified-Since.
    305 Use Proxy: Go to proxy at returned URL.
    307 Temporary Redirect: Like 302.

HTTP Status Codes (cont.)

    400 Bad Request: Syntax error.
    401 Unauthorized: No appropriate Authorization header.
    403 Forbidden: Not allowed with any authorization.
    404 Not Found: Not at this address.
    405 Method Not Allowed: Self explanatory.
    406 Not Acceptable: Resource doesn’t match Accept header.
    407 Proxy Authentication Required:
    408 Request Timeout: Client took too long sending request.
    409 Conflict: Used with PUT.
    410 Gone: Document has gone.
    411 Length Required: Content-Length missing (in POST).
    412 Precondition Failed:
    413 Request Entity Too Large: Document too big to handle.
    414 Request URI Too Long: URI is too long.
    415 Unsupported Media Type:
    416 Requested Range Not Satisfiable:
    417 Expectation Failed: Disillusioned?

HTTP Status Codes (cont.)

    500 Internal Server Error: Server is confused.
    501 Not Implemented: Requested functionality not supported.
    502 Bad Gateway: Used by proxy servers.
    503 Service Unavailable: Server overloaded or service down.
    504 Gateway Timeout: Used by proxy servers.
    505 HTTP Version Not Supported: Self explanatory.

Explicitly Returning Status Codes

These status values are available as predefined constants in the HttpServletResponse class:

    final int SC_OK = 200 ;
    final int SC_FOUND = 302 ;
    final int SC_NOT_FOUND = 404 ;

etc.

The default status is equivalent to explicitly doing:

    resp.setStatus(HttpServletResponse.SC_OK) ;

There are a couple of convenience methods on HttpServletResponse for dealing with common cases:

    void sendError(int sc, String message)
        – send the specified status, with a generated page containing message.

    void sendRedirect(String location)
        – send a temporary-redirect status (302), and include a Location header.

Redirecting the Browser

By sending the SC_FOUND or SC_TEMPORARY_REDIRECT status, together with a dynamically generated URL, a servlet can cause the browser to go directly to a different page or site (without the user manually clicking another link).
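When the redirect URL is generated from user input, that input generally has to be URL-encoded before it is appended to the query string. The following is a minimal sketch using java.net.URLEncoder; the class RedirectUrlSketch, the method queryUrl() and the example.com prefix are hypothetical. In a servlet, the resulting string is what one would pass to sendRedirect().

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; shows only the URL-encoding step of building a redirect target.
class RedirectUrlSketch {

    // Append a user-supplied search string to a query prefix, escaping characters
    // (spaces, '&', '=', '+', ...) that have a special meaning inside a URL.
    static String queryUrl(String prefix, String searchString) {
        try {
            return prefix + URLEncoder.encode(searchString, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The prefix below is a placeholder, not a real search engine URL.
        System.out.println(queryUrl("http://www.example.com/search?q=", "servlets & cookies"));
    }
}
```

Run as-is, this prints http://www.example.com/search?q=servlets+%26+cookies: the spaces become "+" and the "&" becomes "%26", so the ampersand is no longer mistaken for a parameter separator.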
Following is a simplified version of an example from “Core Servlets and JavaServer Pages”. It allows the user to specify a search string and a preferred search engine, dynamically generates a query URL for the chosen search engine, and redirects the browser to that URL.

Search-Engine Selection Servlet

    public class Search extends HttpServlet {

        public void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException, ServletException {

            String searchEngine = req.getParameter("searchEngine") ;
            String searchString = req.getParameter("searchString") ;

            String url = null ;
            if("google".equals(searchEngine))
                url = "http://www.google.com/search?q=" + searchString ;
            else if("lycos".equals(searchEngine))
                url = "http://lycospro.lycos.com/cgi-bin/pursuit?query=" + searchString ;
            else if("hotbot".equals(searchEngine))
                url = "http://www.hotbot.com/?MT=" + searchString ;

            if(url == null)    // missing or unrecognized searchEngine parameter
                resp.sendError(HttpServletResponse.SC_NOT_FOUND,
                               "No recognized search engine specified") ;
            else
                resp.sendRedirect(url) ;
        }
    }

Remarks

The sendRedirect() call does everything necessary to create the response.

Search strings containing spaces or other special characters will not survive as-is: you should URL-encode searchString (for example with java.net.URLEncoder) before appending it to url.

A possible form:

    <form method=get
          action="http://sirah.csit.fsu.edu:8081/dbc/servlet/Search">
      Search engine: <p>
      Google: <input type=radio name=searchEngine value=google checked>
      Lycos: <input type=radio name=searchEngine value=lycos>
      Hotbot: <input type=radio name=searchEngine value=hotbot> <p>
      Search string: <input type=text name=searchString>
      . . .

Introduction to Session Tracking

The Problem

HTTP is a stateless protocol: it provides no intrinsic way to associate one request/response transaction with any subsequent transactions.

But very often a Web application requires that the server engage in a non-trivial dialog with a single user, involving multiple client requests and server responses.

So the problem is to find ways to define and keep track of a particular “session” between browser and Web server.

This is called session tracking.
Solutions

There are three solutions in common use:

– Hidden Form Fields
  Assumes all client requests associated with the session are form submissions. The forms must be dynamically generated by the server, and include hidden input fields that preserve session information.

– URL-Rewriting
  Again assumes all pages associated with the session are dynamically generated by the server. Session information is directly appended to any URLs referring back to the server in the generated pages.

– Cookies
  An extension to HTTP allows a server to ask a browser to store a small amount of persistent information. The browser returns this information in HTTP request headers, typically whenever the client revisits a Web server on the same host.

Example Using Hidden Form Fields

The classic example of “session information” is the contents of a customer’s shopping cart at an online store.

In the interests of fitting code in slides, we scale this down and deal with selections from a virtual snack-vending machine.

– “. . . Think of clocks and counters and telephones and board games and vending machines.”
  C.A.R. Hoare, Communicating Sequential Processes, 1985.

Snack-Vending Machine

    public class VendingMachine extends HttpServlet {

        String[] snacks = {"Chips", "Popcorn", "Peanuts", . . . } ;

        public void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws . . . {

            resp.setContentType("text/html") ;
            PrintWriter out = resp.getWriter() ;

            String [] selections = req.getParameterValues("selection") ;

            out.println("<html><head></head><body>") ;
            for(int i = 0 ; i < snacks.length ; i++) {
                out.println("<form action=" + selectURL + ">") ;
                out.println("<input type=submit name=selection " +
                            "value=\"" + snacks [i] + "\">") ;
                printHidden(out, selections) ;    // print hidden fields
                out.println("</form>") ;
            }
            . . . generate form element for viewing current selections . . .
            out.println("</body></html>") ;
        }
        . . .
    }

Remarks

The servlet generates an HTML page with one form element for every snack.

– selectURL is a reference back to this servlet.

The submit button for each form sets a value for the parameter called selection: the value set is the name of the snack.

Crucially, every form element in the generated page also sets again any pre-existing values for selection, using hidden input elements:

    void printHidden(PrintWriter out, String [] selections) {
        if(selections != null)
            for(int j = 0 ; j < selections.length ; j++)
                out.println("<input type=hidden name=selection " +
                            "value=\"" + selections [j] + "\">") ;
    }

The value of selections was returned by the call to getParameterValues(), earlier in the servlet method.

The Initial Page

If I go to the URL of the servlet, perhaps

    http://sirah.csit.fsu.edu:8081/dbc/servlet/VendingMachine

I see something like:

    [Screenshot: the generated page, one submit button per snack.]

Generated Source of Initial Page

If we view the HTML source of the initial page, it includes a series of forms:

    <html><head></head><body>
    <form action=http://sirah...:8081/dbc/servlet/VendingMachine>
    <input type=submit name=selection value="Chips">
    </form>
    <form action=http://sirah...:8081/dbc/servlet/VendingMachine>
    <input type=submit name=selection value="Popcorn">
    </form>
    <form action=http://sirah...:8081/dbc/servlet/VendingMachine>
    <input type=submit name=selection value="Peanuts">
    </form>
    ...
    </body></html>

selections was null, so initially there are no hidden fields.

Making Selections

If I click on a couple of the selections on the initial page, apparently nothing changes: each selection returns a generated page that looks identical in the browser.

But if I view the generated HTML source. . .
Generated Source of Later Pages

    <html><head></head><body>
    <form action=http://sirah...:8081/dbc/servlet/VendingMachine>
    <input type=submit name=selection value="Chips">
    <input type=hidden name=selection value="Peanuts">
    <input type=hidden name=selection value="Chips">
    </form>
    <form action=http://sirah...:8081/dbc/servlet/VendingMachine>
    <input type=submit name=selection value="Popcorn">
    <input type=hidden name=selection value="Peanuts">
    <input type=hidden name=selection value="Chips">
    </form>
    ...
    </body></html>

Every form now contains hidden fields holding the values that were in selections.

Handling the Accumulated “State”

The page returned by the VendingMachine servlet contains a final form generated by:

    out.println("<form action=" + viewURL + ">") ;
    out.println("View current selections: <input type=submit>") ;
    printHidden(out, selections) ;
    out.println("</form>") ;

Here viewURL is a reference to a second servlet, which generates a page containing the contents of the hidden fields.

Critique of Hidden Fields

The approach is quite elegant, but it has some problems:

– All interactions between client and server must go through forms.
– Every form on every generated page must include the hidden fields defining the session state.
– In our example, the number of hidden fields grew quickly.

All approaches to session tracking run into problems analogous to the last: one wishes to keep down the amount of hidden information that must be exchanged in every single transaction of a session.

For example, this will be important for the URL-rewriting approach, because we don’t want to end up with huge URLs.

Session IDs

The direct “hidden fields” approach does not store session state in any fixed place. The “state” is somehow encoded by the current point in an ongoing dialog.

– Perhaps reminiscent of simulation of state by lazy lists in functional programming languages?
This is interesting, but, as noted, it means that the associated information is constantly swapped between client and server.

An obvious solution is for the server to store the bulk of the data associated with each active session. The only session information bounced back and forth between client and server is an immutable identifier for the session.

Improved Vending Machine Servlet

In an improved version of our vending machine servlet, the main servlet has a static variable, sessionTable.

– We make it static so it can be accessed by a separate servlet class, used for viewing or processing the current selections.

This sessionTable is a HashMap. It is keyed by a session ID string. The associated values are “records” describing the current state of the session.

In our simple example, each session-state “record” is a Vector containing the items selected thus far.

In our example, the session ID is a random number generated when the servlet is initially called (without a sessionID parameter). This number is embedded as a hidden field in the generated pages, and thus returned in subsequent transactions.

A Second Vending Machine

    static HashMap sessionTable = new HashMap() ;

    Random rand = new Random() ;    // Seeded by current date/time

    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws . . . {
        ...
        String sessionID = req.getParameter("sessionID") ;
        if(sessionID == null) {    // First invocation in this session
            sessionID = "" + rand.nextInt() ;
            sessionTable.put(sessionID, new Vector()) ;
        }
        else {    // Subsequent invocation
            Vector selections = (Vector) sessionTable.get(sessionID) ;
            String selection = req.getParameter("selection") ;
            if(selection != null)
                selections.addElement(selection) ;
        }
        . . .
        // Print single hidden field in all forms:
        out.println("<input type=hidden name=sessionID " +
                    "value=" + sessionID + ">") ;
        ...
    }

Remarks

Our naive implementation does not worry about issues of thread safety.
More strictly, accesses to sessionTable should be synchronized, e.g.:

    synchronized(sessionTable) {
        sessionTable.put(sessionID, selections) ;
    }

This is sufficiently safe if we make the often-reasonable assumption that there are no concurrently active transactions involving the same session.

– Without this assumption, access to the individual session records should be synchronized as well.

The selection-viewing servlet can access the session table in the first servlet class as VendingMachine2.sessionTable.

Server Restarts

Our simplified implementation will fail ungracefully if the server is restarted while a browser is in the middle of a session. The session record disappears, while the session ID may still be stored in the browser.

Unless session data is stored persistently there is no completely satisfactory solution, but a servlet writer should be aware of this possibility, and code defensively (perhaps sending an explanatory message to the browser).

URL-Rewriting

URL-rewriting can be regarded as an optimization of the hidden fields approach.

Assuming a form with a hidden field is submitted using the GET method, what the server really sees is just a request whose URI has been extended with an encoding of the value in the hidden field.

In URL-rewriting we cut out the role of the browser (encoding session data from hidden fields) and directly extend the URL in the action attribute of the form with an encoding of the session data.

As a byproduct, this also works for URLs in anchor elements (simple hypertext links).

A Third Vending Machine

    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws . . . {
        ...
        String sessionID ;
        String pathInfo = req.getPathInfo() ;
        if(pathInfo == null) {    // First invocation in this session
            sessionID = "" + rand.nextInt() ;
            sessionTable.put(sessionID, new Vector()) ;
        }
        else {    // Subsequent invocation
            sessionID = pathInfo.substring(1) ;    // Strip leading "/"
            Vector selections = (Vector) sessionTable.get(sessionID) ;
            String selection = req.getParameter("selection") ;
            if(selection != null)
                selections.addElement(selection) ;
        }
        ...
        out.println("<form action=" + selectURL + "/" + sessionID + ">") ;
        out.println("<input type=submit name=selection . . . >") ;
        out.println("</form>") ;
        ...

Remarks

The session ID information is appended to the servlet URL in the action attribute of the forms (recall selectURL is the URL of this servlet).

On any invocation after the first in the session, this information can be retrieved using getPathInfo().

The getPathInfo() method on HttpServletRequest returns any text in the request URL following the servlet name (up to and excluding the ? that delimits the query string, if there is one).

We now have the option to replace the form that connects to a selection-viewing servlet with a simple anchor element.

– The URL in the anchor element is extended with the session ID information, just like the action attribute in a form.

Cookies

A cookie is a small piece of contextual information embedded in an HTTP response from a Web server.

If a browser receives an HTTP response including a Set-Cookie header (and it is willing to accept cookies) it stores this information.

The information can either be stored in the memory of the running browser program (“session cookies”) or saved to disk (“persistent cookies”).

Subsequently, whenever the browser constructs an HTTP request for a server, it checks if it is storing any cookies for the server involved. If so, it returns the cookie information to the server, in a Cookie header in the new request.
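On the server side, the returned Cookie header is just a string of name=value pairs separated by semicolons. The sketch below splits such a header value into a map. It is a simplified illustration, not the parser a servlet container actually uses: the class CookieHeaderSketch and the method parseCookieHeader() are hypothetical, the sample names and values are made up, and quoted values are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; a simplified view of what a server does with a Cookie request header.
class CookieHeaderSketch {

    // Split "name1=value1; name2=value2" into an ordered name -> value map.
    static Map<String, String> parseCookieHeader(String headerValue) {
        Map<String, String> cookies = new LinkedHashMap<>();
        if (headerValue == null) {
            return cookies;                 // the browser sent no Cookie header
        }
        for (String pair : headerValue.split(";")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue;                   // skip fragments that are not name=value
            }
            String name = pair.substring(0, eq).trim();
            String value = pair.substring(eq + 1).trim();
            if (!name.isEmpty()) {
                cookies.put(name, value);
            }
        }
        return cookies;
    }

    public static void main(String[] args) {
        // Sample header value; the names and numbers are illustrative only.
        String header = "sessionCookie=1367792973; persistentCookie=1264283064";
        Map<String, String> cookies = parseCookieHeader(header);
        for (Map.Entry<String, String> entry : cookies.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Note that only names and values come back to the server; attributes such as the expiry time stay with the browser, which is why the map has nothing else to record.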
Uses of Cookies

Recognizing a regular customer

– A persistent cookie can save some identification information for the particular customer. The stored information may be the actual name and details, or (preferably) some key into a database on the server.
– When the customer returns to the site, associated information (mailing address, etc.) is already known; it doesn’t have to be entered anew by the customer.
– There are many variations on this theme, e.g. it allows portal sites to do focused advertising.

Session Tracking

– Within the context of a single “visit” to a site, cookies can be used as an alternative to hidden fields or URL-rewriting, as the underlying mechanism for session tracking.

Abuses of Cookies

A poorly constructed commercial site might use cookies to store sensitive information (e.g. credit card numbers) on the hard disk of your PC. This might be a privacy problem if the PC is shared by several users.

A Web site can persuade a browser to send a cookie to a third-party site, by embedding an image that comes from the Web server of the third party.

– The third-party site might offer the original site collated information on its visitors.
– It may be a particular nuisance if the third party has previously harvested the email address of the user, e.g. by sending them an HTML email containing a cookie-setting icon.
– Moral: configure your browser to only send cookies to the actual page you are visiting?

Limits to Cookies

Typically a browser will restrict the number and size of cookies it will accept, e.g.:

– Maximum of 20 cookies per site,
– Maximum of 300 cookies total (from all sites),
– Maximum size of an individual cookie is 4 kilobytes.

Users may of course configure their browsers to refuse all cookies, or only accept selected cookies.

Hence a Web application should not rely on cookies for basic functionality, only for “added value”.
The Servlet Cookie API

The servlet creates a cookie by using a constructor for the class Cookie.

Various attributes can be set for the cookie before sending it to the client. They include:

– The name and value of the cookie. These are usually set in the Cookie constructor.
– The domain to which the cookie should be returned. By default the cookie will only be returned to the server that sent it, but this default can be overridden.
– The URI path to which the cookie should be returned. By default, the cookie is only returned to pages in the same directory as the page that sent the cookie.
– The time when a persistent cookie expires, e.g. the cookie should be deleted by the browser after one hour, after one year, etc.

A Servlet that Sets Two Cookies

    public class SetCookies extends HttpServlet {

        Random rand = new Random() ;    // Seeded by current date/time

        public void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws . . . {

            resp.setContentType("text/html") ;

            Cookie session = new Cookie("mySessionCookie", "" + rand.nextInt()) ;
            resp.addCookie(session) ;

            Cookie persistent = new Cookie("myPersistentCookie", "" + rand.nextInt()) ;
            persistent.setMaxAge(3600) ;    // One hour
            resp.addCookie(persistent) ;

            PrintWriter out = resp.getWriter() ;
            out.println("<html><head></head><body>") ;
            out.println("<h1>Enjoy the cookies!</h1>") ;
            out.println("</body></html>") ;
        }
    }

Remarks

The arguments of the Cookie constructor are the cookie name and value. Here the value is a random number.

Cookie names or values should not include white space or any of:

    [  ]  (  )  =  ,  "  /  ?  @  :  ;

To make a cookie persistent, set the expiration time in seconds using setMaxAge(). By default the expiration time is negative, indicating a session cookie.
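The character restriction in the remarks above is easy to check before a string is handed to the Cookie constructor. The following is a minimal sketch of such a check; the class CookieTokenCheck and the method isSafeToken() are hypothetical and not part of the servlet API, and the check covers only the rule quoted above (no white space, none of the listed characters).

```java
// Hypothetical helper; checks a proposed cookie name or value against the
// character rule quoted in the remarks, and nothing more.
class CookieTokenCheck {

    // The characters the remarks say to avoid, in addition to white space.
    private static final String FORBIDDEN = "[]()=,\"/?@:;";

    static boolean isSafeToken(String s) {
        if (s == null) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c) || FORBIDDEN.indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = { "mySessionCookie", "-1367792973", "two words", "a=b", "x;y" };
        for (String sample : samples) {
            System.out.println("\"" + sample + "\" safe? " + isSafeToken(sample));
        }
    }
}
```

A value built as "" + rand.nextInt() may start with a minus sign when the random number is negative; the minus sign is not in the forbidden set, so such values pass the check.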
Viewing the Set-Cookie Headers

After deploying this servlet, we can view the headers it returns by modifying the TrivialBrowser class (introduced in the network programming lecture) to take host, port, and path arguments. . .

HTTP Response Including Set-Cookie

    java TrivialBrowser sirah.csit.fsu.edu 8081 /dbc/servlet/SetCookies

    HTTP/1.0 200 OK
    Date: Mon, 13 Nov 2000 15:49:25 GMT
    Servlet-Engine: Tomcat Web Server/3.1 (JSP 1.1; Servlet 2.2; Java 1.2.2; Linux 2.2.14-5.0 i386; java.vendor=Sun Microsystems Inc.)
    Set-Cookie: mySessionCookie=1367792973
    Set-Cookie: myPersistentCookie=1264283064;Expires=Mon, 13-Nov-2000 16:49:25 GMT
    Content-Language: en
    Content-Type: text/html
    Status: 200

    <html><head></head><body>
    <h1>Enjoy the cookies!</h1>
    </body></html>

Browser Behavior

If we visit the SetCookies servlet with a real browser, we just see a message:

    Enjoy the cookies!

Now if we point the browser at our earlier Headers servlet (extended to accept GET requests), we may see something like:

    User-Agent: Mozilla/4.51 [en] (X11; I; SunOS 5.7 sun4u)
    ...
    Cookie: mySessionCookie=1367792973; myPersistentCookie=1264283064
    ...

The browser is returning the cookies in a Cookie header.

Retrieving Cookies with the Cookie API

The previous example just used the generic getHeader() method to view the HTTP Cookie header returned by the browser. Of course the cookie API provides higher-level methods to do this.

Displaying Cookies

public class ShowCookies extends HttpServlet {

    public void doGet(HttpServletRequest req, HttpServletResponse resp) throws . . . {

        resp.setContentType("text/html") ;

        Cookie [] cookies = req.getCookies() ;  // May be null if no cookies were sent

        PrintWriter out = resp.getWriter() ;
        out.println("<html><head></head><body>") ;
        out.println("You returned cookies:<p>") ;
        out.println("<table border cellspacing=0 cellpadding=5>") ;

        . . . write a header row . . .

        if(cookies != null)
            for (int i = 0 ; i < cookies.length ; i++) {
                Cookie cookie = cookies [i] ;
                out.println("<tr><td>" + cookie.getName() + "</td>") ;
                out.println("<td>" + cookie.getValue() + "</td></tr>") ;
            }

        out.println("</table>") ;
        out.println("</body></html>") ;
    }
}

Remarks

The method getCookies() returns an array of Cookie objects. The servlet specification allows it to return null when the request carries no cookies, so the result should be checked before use.

The methods getName() and getValue() on a Cookie object return the name and value of the cookie.

The API has no method for extracting just a cookie with a specified name from the request (i.e. nothing directly analogous to getParameter()).

Visiting ShowCookies

[Screenshots: the ShowCookies page initially, after visiting SetCookies, and after restarting the browser.]

Session Tracking Using Cookies

Our penultimate version of the vending machine servlet uses cookies instead of URL-rewriting. We need to define a method to retrieve a cookie with a given name, e.g.

    String getCookieValue(HttpServletRequest req, String name) {
        Cookie [] cookies = req.getCookies() ;
        if(cookies == null) return null ;  // No cookies in this request
        for(int i = 0 ; i < cookies.length ; i++) {
            Cookie cookie = cookies [i] ;
            if(cookie.getName().equals(name))
                return cookie.getValue() ;
        }
        return null ;
    }

A Fourth Vending Machine

public void doGet(HttpServletRequest req, HttpServletResponse resp) throws . . . {
    ...
    String sessionID = getCookieValue(req, "vending_session") ;
    if(sessionID == null) {  // First invocation in this session
        sessionID = "" + rand.nextInt() ;
        resp.addCookie(new Cookie("vending_session", sessionID)) ;
        sessionTable.put(sessionID, new Vector()) ;
    }
    else {  // Subsequent invocation
        Vector selections = (Vector) sessionTable.get(sessionID) ;
        String selection = req.getParameter("selection") ;
        if(selection != null)
            selections.addElement(selection) ;
    }
    ...
    out.println("<form action=" + selectURL + ">") ;
    out.println("<input type=submit name=selection . . . >") ;
    out.println("</form>") ;
    ...
}

Remarks

We no longer have to worry about rewriting forms and anchor elements. A “session” can now be tracked across links through intervening static HTML pages. However:
– This version provides no way to terminate a session, short of restarting the browser.
– There are subtle questions about how the “scope” of a session is delimited.
– A functional programmer might argue that we lost “referential transparency”??

The Servlet Session Tracking API

The Session Tracking API

We have already illustrated several underlying approaches to session tracking:
– Hidden fields, URL-rewriting, cookies.

In general an application has to make a choice between these mechanisms, taking into account support in server and browser. Session cookies may be favored on grounds of generality and flexibility, but not all clients will accept them.

In practice the servlet programmer does not have to worry too much about these issues. A high-level API is provided that will transparently choose and deploy a suitable low-level tracking mechanism.

The HttpSession class

A particular session is represented by an object from the HttpSession class. A session is defined as an association, lasting for some period, between a particular browser and a particular group of servlets on a server.

The current session is obtained by applying the method getSession() to the HttpServletRequest. If no session object currently exists for this browser/servlet association, one will be created on the first call to getSession().

Simple Example

public class GetSession extends HttpServlet {

    static final String myURL = . . . URL of this servlet . . . ;

    public void doGet(HttpServletRequest req, HttpServletResponse resp) throws . . .
    {
        HttpSession session = req.getSession(true) ;

        resp.setContentType("text/html") ;
        PrintWriter out = resp.getWriter() ;
        out.println("<html><head></head><body>") ;
        out.println("<a href=" + resp.encodeURL(myURL) + ">" +
                    "View servlet again</a>") ;
        out.println("</body></html>") ;
    }
}

Remarks

The true argument of getSession() means that a new session object will be created if one does not already exist.
– For robustness you should probably always use this argument. (The documentation says that the form of getSession() without an argument is equivalent. Experience suggests maybe not?)

This servlet simply outputs a link back to itself (it doesn’t explicitly use the session object).

One important thing to note is the call to encodeURL(). This method should be applied to any URLs in the generated page that refer back to the same servlet context. This supports URL-rewriting (if this is the session-tracking strategy adopted for the session).

Viewing the Generated Page

Pointing the browser at this servlet, we see a page containing a link:

    View servlet again

If we view the HTML source of this generated page, we may see something like:

    <html><head></head><body>
    <a href=http://sirah.csit.fsu.edu:8081/dbc/servlet/GetSession;jsessionid=To1019mC0 . . . 365At>
    View servlet again
    </a>
    </body></html>

The URL in this first generated page has been rewritten to include an attribute jsessionid. The associated value is a long, random-looking string.

Checking the Cookies

Before doing anything else, we visit the ShowCookies servlet. We may see something like:

[Screenshot of the ShowCookies page.]

In the first HTTP response after the session is created, the servlet both rewrites URLs and sends a cookie to the browser. The same session ID appears in both places.

Revisiting the Servlet

Going back to the GetSession servlet, if I follow the link to view the servlet again, I see a page that looks the same.
But if I view the HTML source again I may see:

    <html><head></head><body>
    <a href=http://sirah.csit.fsu.edu:8081/dbc/servlet/GetSession>
    View servlet again
    </a>
    </body></html>

This time the URL is not rewritten.

Selecting a Session Tracking Mechanism

As noted, in the first response after a session is created, the servlet both sends a cookie and rewrites URLs.

If the browser returns the session ID cookie in a subsequent request, URL-rewriting can be disabled. If the browser is not returning cookies, URL-rewriting will continue.

All this happens “behind the scenes”: the servlet programmer may not even be aware of the mechanism.

Binding Information to a Session

Of course this is not particularly useful unless we have a way to associate application information with the session.

In previous examples we used a HashMap, keyed by session ID, to store session data. We may assume that analogous mechanisms are used behind the scenes in the session-tracking API, but the session ID is not usually directly accessed by the programmer.

Instead, the application programmer just sees the HttpSession object. Methods are available to directly “cache” information in this object.
– The session object itself behaves like a simple collection class.

Some Methods on HttpSession

    public void setAttribute(String name, Object value)
Add a reference to the object value to the session object, keyed by the string name.

    public void removeAttribute(String name)
Remove the value associated with the key name from the session.

    public Object getAttribute(String name)
Extract the value associated with the key name.

Note the value object may implement HttpSessionBindingListener, in which case it will be notified when it is added to or removed from a session.

Session Attributes vs. Instance Variables

In well-written Java programs, local variables are normally declared inside methods to hold values that are computed and used by only a single method invocation. Typically, instance variables are used to hold values that need to be shared across multiple invocations.

In servlet programming, where several sessions may be concurrently operating on the single servlet instance, this role for instance variables is naturally taken over by attributes of the session object.

Think hard before declaring an instance variable in a servlet. In many cases you should probably be using a session attribute instead.

A Final (?) Vending Machine

The first operation in the doGet() method is to retrieve or create a session object using getSession().

We then attempt to extract a Vector object called selections from the session. If we fail, we can assume this is the first transaction in this session. A new Vector object is created, and added to the new session.

Form parameters are added to the Vector as usual.

Whenever URLs referring back to this servlet context appear in the generated HTML, they are passed through encodeURL().

A Fifth Vending Machine

public void doGet(HttpServletRequest req, HttpServletResponse resp) throws . . . {

    HttpSession session = req.getSession(true) ;

    Vector selections = (Vector) session.getAttribute("selections") ;
    if(selections == null) {  // First invocation in this session
        selections = new Vector() ;
        session.setAttribute("selections", selections) ;
    }

    String selection = req.getParameter("selection") ;
    if(selection != null)
        selections.addElement(selection) ;
    ...
    out.println("<form action=" + resp.encodeURL(selectURL) + ">") ;
    out.println("<input type=submit name=selection . . . >") ;
    out.println("</form>") ;
    ...
}

Remarks

It is still recommended to use synchronized blocks to ensure thread safety.
You can use the session object for synchronization, e.g.:

    synchronized (session) {
        Vector selections = (Vector) session.getAttribute("selections") ;
        if(selections == null) {
            selections = new Vector() ;
            session.setAttribute("selections", selections) ;
        }
    }

As usual, the vending machine servlet will lead to a selection-viewing servlet when the user follows a suitable link. These two servlets automatically share the same session object, and thus session information, because they are in the same servlet context.

The Scope of a Session

A servlet context is a group of servlets (and possibly other Web entities), collected together in some directory. Under Tomcat, servlet contexts are defined in the server.xml file.
– In the examples so far, the servlet context was /dbc.

Several servlets may be involved in the same session, hence share the same HttpSession object. This sharing is automatic if the servlets are in the same context, and are interacting with the same browser.

Servlets from different contexts in the same server, or interacting with different browsers, always have distinct HttpSession objects.

Life-Time of a Session

In general a session expires after some interval.

The method:

    public void setMaxInactiveInterval(int seconds)

on HttpSession can be used to request that the session be invalidated if there has been no transaction during a period of the specified length.

The method:

    public void invalidate()

on HttpSession can be used to immediately invalidate a session.
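The two ways a session can end, inactivity timeout and explicit invalidation, can be pictured with a small stand-alone model. This is only a sketch under stated assumptions: the class SketchSession and its methods access() and isValid() are hypothetical stand-ins invented here, not the HttpSession implementation used by Tomcat. The clock is passed in explicitly so the behaviour is easy to trace, and treating a non-positive interval as "never time out" is a choice made for this sketch.

```java
// Hypothetical model for illustration only; a real servlet container
// creates and expires HttpSession objects itself.
final class SketchSession {

    private int maxInactiveSeconds;   // non-positive: never time out (sketch assumption)
    private long lastAccessMillis;    // time of the most recent transaction
    private boolean invalidated = false;

    SketchSession(long nowMillis, int maxInactiveSeconds) {
        this.lastAccessMillis = nowMillis;
        this.maxInactiveSeconds = maxInactiveSeconds;
    }

    // Mirrors the role of HttpSession.setMaxInactiveInterval(int seconds).
    void setMaxInactiveInterval(int seconds) {
        this.maxInactiveSeconds = seconds;
    }

    // Mirrors the role of HttpSession.invalidate(): ends the session at once.
    void invalidate() {
        invalidated = true;
    }

    // True while the session has neither been invalidated nor timed out.
    boolean isValid(long nowMillis) {
        if (invalidated) {
            return false;
        }
        if (maxInactiveSeconds <= 0) {
            return true;
        }
        return nowMillis - lastAccessMillis <= maxInactiveSeconds * 1000L;
    }

    // Records a transaction; returns false if the session had already ended.
    boolean access(long nowMillis) {
        if (!isValid(nowMillis)) {
            return false;
        }
        lastAccessMillis = nowMillis;
        return true;
    }
}
```

In a real servlet the corresponding calls are made on the object returned by getSession(): setMaxInactiveInterval() to request a timeout, and invalidate() to end the session immediately, for example when the user follows a "finish" link.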