CIS6930: CGI and Servlets

CIS 5930-04 – Spring 2001
Part 6: Introduction to CGI and Servlets
http://aspen.csit.fsu.edu/it1spring01
Instructors: Geoffrey Fox , Bryan Carpenter
Computational Science and Information Technology
Florida State University
Acknowledgements: Nancy McCracken
Syracuse University
dbc@csit.fsu.edu
Introduction





RMI gave us one approach to client/server programming.
The approach was based on the Java language and some
far-reaching ideas about remote objects, object
serialization, and dynamic class loading.
We could achieve direct integration into the traditional
World Wide Web through applets, but the technology is not
specifically tied to the Web.
RMI is powerful and general (and interesting), but it can be
a slightly heavy-handed approach if actually we only need
to interact with users through Web pages.
For the future, it may be more natural to view RMI as a
technology for the “middle tier” (or for connectivity in the
LAN) rather than for the Web client.
HTML Forms and CGI






There are long-established techniques for getting
information from users through Web browsers (predating
the appearance of Java on the Web).
The FORM element of HTML can contain a variety of input
fields.
The inputted data is harvested by the browser, suitably
encoded, and forwarded to the Web server.
On the server side, the Web server is configured to execute
an arbitrary program that processes the user’s form inputs.
This program typically outputs a dynamically generated
HTML document containing an appropriate response to the
user’s input.
The server-side mechanism is called CGI: Common
Gateway Interface.
CGI and Servlets

In conventional CGI, a Web site developer writes the
executable programs that process form inputs in a
language such as Perl or C.

The program (or script) is executed once each time a
form is submitted.

Servlets provide a more modern, Java-centric approach.

The server incorporates a Java Virtual Machine, which
is running continuously.

Invocation of a CGI script is replaced by invocation of a
method on a servlet object.
Advantages of Servlets

Invocation of a single Java method is typically much
cheaper than starting a whole new program. So
servlets are typically more efficient than CGI scripts.
– This is important if we are planning to centralize processing in the
server (rather than, say, delegate processing to an applet or
browser script).

Besides this we have the usual advantages of Java:
– Portability,
– A fully object-oriented environment for large-scale program
development.
– Library infrastructure for decoding form data, handling cookies,
etc (although many of these things are also available in Perl).
– Servlets are the foundation for Java Server Pages.
Plan of this Lecture Set



Review HTML forms and associated HTTP requests.
Briefly describe traditional CGI programming.
Detailed discussion of Java servlets:
– Deploying Tomcat as a standalone Web server.
– Simple servlets.
– The servlet life cycle.
– Servlet requests and responses. More on the HTTP protocol.
– Approaches to session tracking. Handling cookies.
– The servlet session-tracking API.
References

Core Servlets and JavaServer Pages, Marty Hall,
Prentice Hall, 2000.
– Good coverage and current, with some discussion of
the Tomcat server.

Java Servlet Programming, Jason Hunter and
William Crawford, O’Reilly, 1998.
– Also good, with some good examples. Slightly out of
date.

Java Servlet Specification, v2.2, and other documents, at:
http://java.sun.com/products/servlet/
HTML Forms
The HTTP GET request


Before discussing forms, let’s look again at how the
GET request normally works.
The following server program listens for HTTP requests,
and simply prints the received request to the console.
A Dummy Web Server
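import java.io.BufferedReader ;
import java.io.InputStreamReader ;
import java.net.ServerSocket ;
import java.net.Socket ;

public class DummyServer {
    public static void main(String [] args) throws Exception {
        // Listen for browser connections on port 8080.
        ServerSocket server = new ServerSocket(8080) ;
        while(true) {
            Socket sock = server.accept() ;
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(sock.getInputStream())) ;
            // The first line of the request is the method field.
            String method = in.readLine() ;
            System.out.println(method) ;
            // Echo the remaining header fields, up to the empty line.
            while(true) {
                String field = in.readLine() ;
                if(field == null) break ;   // client closed the connection
                System.out.println(field) ;
                if(field.length() == 0) break ;
            }
            . . . Send a dummy response to client socket . . .
        }
    }
}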
A GET Request

On the host sirah I run the dummy server:
sirah$ java DummyServer

Now I point a browser at
http://sirah.csit.fsu.edu:8080/index.html

The dummy server program might print:
GET /index.html HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.51 [en] (X11; I; ...)
Host: sirah.csit.fsu.edu:8080
Accept: image/gif, ..., */*
Accept-Encoding: gzip
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8
<blank line>
Fields of the GET request



The HTTP GET request consists of a series of text fields
on separate lines, ended by an empty line.
The first line is the most important: it is called the
method field.
In simple GET requests, the second token in the method
line is the requested file name, expressed as a path
relative to the document root of the server.
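As a rough illustration, the method line can be split into its three tokens with a few lines of Java. The class name RequestLine and its field names are hypothetical, introduced only for this sketch; it assumes the tokens are separated by whitespace, as in the request printed by the dummy server.

```java
// Hypothetical helper for illustration; not part of DummyServer.
class RequestLine {
    final String method;    // e.g. "GET"
    final String target;    // requested path, relative to the document root
    final String version;   // e.g. "HTTP/1.0"

    RequestLine(String method, String target, String version) {
        this.method = method;
        this.target = target;
        this.version = version;
    }

    // Split the first line of a request into its three tokens.
    static RequestLine parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length != 3) {
            throw new IllegalArgumentException("Malformed method line: " + line);
        }
        return new RequestLine(tokens[0], tokens[1], tokens[2]);
    }

    public static void main(String[] args) {
        RequestLine r = parse("GET /index.html HTTP/1.0");
        System.out.println(r.method + " " + r.target + " " + r.version);
    }
}
```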
A Simple HTML Form

The form element includes one or more input elements,
along with any normal HTML terms:
<html>
<body>
<form method=get
action="http://sirah.csit.fsu.edu:8080/dummy">
Name: <input type=text name=who size=32>
<p>
<input type=submit>
</form>
</body>
</html>
Remarks





The form tag includes important attributes method and
action.
The method attribute defines the kind of HTTP request
sent when the form is submitted: its value can be get or
post (see later).
The action attribute is a URL. In normal use it will
locate an executable program on the server. In this
case it is a reference to my “dummy server”.
An input tag with type attribute text represents a text
input field.
An input tag with type attribute submit represents a
“submit” button.
Displaying the Form

If I place this HTML document on a Web Server at a
suitable location, and visit its URL with a browser, I see
something like:
Submitting the Form

If I type my name, and click on the “Submit Query”
button, the dummy server running on sirah prints:
GET /dummy?who=Bryan HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.51 [en] (X11; I; ...)
Host: sirah.csit.fsu.edu:8080
Accept: image/gif, ..., */*
Accept-Encoding: gzip
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8
<blank>
Remarks

When the form specifying the get method is submitted,
the values inputted by the user are effectively appended
to the end of the URL specified in the action attribute.

In the HTTP GET request—sent when the submit button
is pressed—they appear attached to the second token
of the first line of the request.

In simple cases the appended string begins with a ?

This is followed by pairs of the form name=value, where
name is the name appearing in the name attribute of
the input tag, and value is the value entered by the user.

If the form has multiple input fields, the pairs are
separated by &
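The structure described above can be sketched in Java as follows. The class QueryString and its parse method are hypothetical names used only for this illustration. The sketch splits on ? and & and keeps a list of values per name, because nothing forces a name to appear only once; it leaves the names and values in their encoded form.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only.
class QueryString {
    // Extract name=value pairs from the second token of the method line.
    static Map<String, List<String>> parse(String target) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        int q = target.indexOf('?');
        if (q < 0) {
            return fields;                    // no form data attached
        }
        for (String pair : target.substring(q + 1).split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = (eq < 0) ? pair : pair.substring(0, eq);
            String value = (eq < 0) ? "" : pair.substring(eq + 1);
            // A name may occur several times, so collect values in a list.
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("/dummy?who=Bryan"));   // prints {who=[Bryan]}
    }
}
```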
POST requests


This method of attaching input data to the URL is handy
if the user has a relatively simple query (e.g. for a
search engine).
For more complex forms it is usually recommended to
specify the post method in the form tag, e.g.:
<form method=post
action="http://sirah.csit.fsu.edu:8080/dummy">

In the HTTP protocol, a POST request differs from a
GET request by having some data appended after the
headers.
A Form Using the POST Method
<form method=post
action="http://sirah.csit.fsu.edu:8080/dummy">
Surname: <input type=text name=surname size=32>
<p>
Forenames: <input type=text name=forenames size=40>
<p>
<input type=submit>
</form>
Extending the Dummy Server

We can modify the dummy server to display POST
requests, by declaring a variable contentLength, adding
the lines
if(field.regionMatches(true, 0, "Content-Length: ", 0, 16))
    contentLength = Integer.parseInt(field.substring(16).trim()) ;
inside the loop that reads the headers, and adding
for(int i = 0 ; i < contentLength ; i++) {
    int b = in.read() ;
    System.out.print((char) b) ;
}
System.out.println() ;
after that loop.
Submitting the Form

When I click on the “Submit Query” button, the dummy
server prints:
POST /dummy HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.51 [en] (X11; I; ...)
Host: sirah.csit.fsu.edu:8080
Accept: image/gif, ..., */*
Accept-Encoding: gzip
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8
Content-type: application/x-www-form-urlencoded
Content-Length: 39
surname=Carpenter&forenames=David+Bryan
Remarks




The method field (the first line) now starts with the word
POST instead of GET; the data is not appended to the
URL.
There are a couple more fields in the header, describing
the format of the data.
Most importantly, the form data now follows the headers,
on a separate line at the end of the request.
However, the form data is still URL-encoded.
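A self-contained sketch of reading such a request is given below. The class PostBodyReader and its readBody method are hypothetical names. The sketch assumes the body is plain ASCII (true of URL-encoded data), so that the byte count in Content-Length equals the number of characters to read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only.
class PostBodyReader {
    // Skip the method line and headers, then read Content-Length characters.
    static String readBody(BufferedReader in) {
        try {
            in.readLine();                        // method line, e.g. "POST /dummy HTTP/1.0"
            int contentLength = 0;
            String field;
            while ((field = in.readLine()) != null && field.length() > 0) {
                int colon = field.indexOf(':');
                if (colon > 0 && field.substring(0, colon).trim()
                        .equalsIgnoreCase("Content-Length")) {
                    contentLength = Integer.parseInt(field.substring(colon + 1).trim());
                }
            }
            // The empty line has been consumed; the posted data follows.
            char[] buf = new char[contentLength];
            int read = 0;
            while (read < contentLength) {
                int n = in.read(buf, read, contentLength - read);
                if (n < 0) {
                    break;                        // stream ended early
                }
                read += n;
            }
            return new String(buf, 0, read);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String request = "POST /dummy HTTP/1.0\r\n"
                + "Content-type: application/x-www-form-urlencoded\r\n"
                + "Content-Length: 39\r\n"
                + "\r\n"
                + "surname=Carpenter&forenames=David+Bryan";
        System.out.println(readBody(new BufferedReader(new StringReader(request))));
    }
}
```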
URL Encoding

URL encoding is a method of wrapping up form-data in a
way that will make a legal URL for a GET request.

We have seen that the encoded data consists of a
sequence of name=value pairs, separated by &.

In the last example we saw that spaces are replaced by
+.

Non-alphanumeric characters are converted to the form
%XX, where XX is a two digit hexadecimal code.

In particular, line breaks in multi-line form data (e.g.
addresses) become %0D%0A—the hex ASCII codes for
a carriage-return, new-line sequence.

URL encoding is somewhat redundant for the POST
method, but it is the default anyway.
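The standard Java library can perform this encoding and decoding, as in the sketch below. The class name UrlEncodingDemo and the choice of UTF-8 as character encoding are assumptions made for the illustration; java.net.URLEncoder and java.net.URLDecoder are the standard classes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Illustrative wrapper around the standard URL encoder and decoder.
class UrlEncodingDemo {
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);   // UTF-8 is always available
        }
    }

    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Spaces become +, the comma becomes %2C, the line break becomes %0D%0A.
        String address = "Bryan Carpenter\r\nCSIT, FSU";
        String encoded = "address=" + encode(address);
        System.out.println(encoded);   // prints address=Bryan+Carpenter%0D%0ACSIT%2C+FSU
        System.out.println(decode(encode(address)).equals(address));   // prints true
    }
}
```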
More Options for the input Tag


We can make a group of radio buttons in an HTML form by
using a set of input tags with the type attribute set to
radio.
Tags belonging to the same button group should have the
same name attribute, and distinct value attributes, e.g.:
<form method=post
action="http://sirah.csit.fsu.edu:8080/dummy">
Favorite primary color:
<p>
Red: <input type=radio name=color value=red>
Blue: <input type=radio name=color value=blue>
Green: <input type=radio name=color value=green>
<p>
<input type=submit>
</form>
Radio Buttons

The message sent to the server is:
...
Content-type: application/x-www-form-urlencoded
Content-Length: 10
color=blue
Checkboxes
<form method=post
action="http://sirah.csit.fsu.edu:8080/dummy">
What pets do you own?
<p>
<input type=checkbox name=pets value=dog checked>
Dog <br>
<input type=checkbox name=pets value=cat> Cat <br>
<input type=checkbox name=pets value=bird> Bird
<br>
<input type=checkbox name=pets value=fish> Fish
<p>
<input type=submit>
</form>

Example from “HTML and XHTML: The Definitive
Guide”, O’Reilly.
Checkboxes

The message posted to the server is:
...
pets=dog&pets=bird

Note there is no requirement that a form map a name to
a unique value.
File-Selection


You can name a local file in an input element, and have
the entire contents of the file posted by browser to server.
This is not allowed using the default URL-encoding for
form data. Instead you must specify multi-part MIME
encoding in the form element, e.g.:
<form method=post
enctype="multipart/form-data"
action="http://sirah.csit.fsu.edu:8080/dummy">
Course: <input name=course size=20> <p>
Students file: <input type=file name=students size=32>
<p>
<input type=submit>
</form>
File-Selection Entry


With multi-part encoding, the data is no longer sent on a
single line.
On submission the DummyServer prints. . .
Output of DummyServer on submit
POST /dummy HTTP/1.0
Referer: http://sirah.csit.fsu.edu/users/dbc/forms/form5.html
...
Content-type: multipart/form-data; boundary=--------------------------269912718414714
Content-Length: 455
-----------------------------269912718414714
Content-Disposition: form-data; name="course"
CIS6930
-----------------------------269912718414714
Content-Disposition: form-data; name="students"; filename="students"
wcao
flora
Fulay
gao
...
zhao6930
zheng
-----------------------------269912718414714--
Remarks



Each form field has its own section in the posted file,
separated by a delimiter specified in the Content-type
field of the header.
Within each section there are one or more header lines,
followed by a blank line, followed by the form data.
The values can contain binary data. There is no “URL-encoding”.
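The layout just described can be sketched in Java as below. This is only an illustration under simplifying assumptions: the class MultipartSketch is hypothetical, the body is treated as text (so it is not suitable for real binary uploads), lines are assumed to end in carriage-return/newline, and the delimiter is assumed never to occur inside a field value. Each delimiter line is the boundary from the Content-type header prefixed by two hyphens, and the final delimiter has two more hyphens appended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, text-only sketch of splitting a multipart/form-data body.
class MultipartSketch {
    // Returns one {headers, content} pair per form field.
    static List<String[]> parts(String body, String boundary) {
        String delimiter = "--" + boundary;
        List<String[]> result = new ArrayList<>();
        String[] sections = body.split(Pattern.quote(delimiter));
        for (int i = 1; i < sections.length; i++) {   // sections[0] precedes the first delimiter
            String section = sections[i];
            if (section.startsWith("--")) {
                break;                                // closing delimiter reached
            }
            int blank = section.indexOf("\r\n\r\n");  // blank line ends the section headers
            if (blank < 0) {
                continue;
            }
            String headers = section.substring(0, blank).trim();
            String content = section.substring(blank + 4);
            if (content.endsWith("\r\n")) {           // line break before the next delimiter
                content = content.substring(0, content.length() - 2);
            }
            result.add(new String[] { headers, content });
        }
        return result;
    }

    public static void main(String[] args) {
        String b = "XYZ";                             // made-up boundary for the demo
        String body = "--" + b + "\r\n"
                + "Content-Disposition: form-data; name=\"course\"\r\n\r\n"
                + "CIS6930\r\n"
                + "--" + b + "\r\n"
                + "Content-Disposition: form-data; name=\"students\"; filename=\"students\"\r\n\r\n"
                + "wcao\nflora\r\n"
                + "--" + b + "--\r\n";
        for (String[] part : parts(body, b)) {
            System.out.println(part[0] + " -> " + part[1].length() + " characters");
        }
    }
}
```

In practice a tested multipart parsing library would be preferable to hand-written code of this kind.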
Masked and Hidden fields



The input to a text field can be masked by setting the
type attribute to password. The entered text will not be
echoed to the screen.
If the type attribute is set to hidden, the input field is not
displayed at all. This kind of field is often used in HTML
forms dynamically generated by CGI scripts.
Hidden fields allow the CGI scripts to keep track of
“session” information over an interaction that involves
multiple forms—hidden fields may contain values
characterizing the session.
– Use of hidden fields will be one of the topics in the lectures on
servlets.
Text Areas


Similar to text input fields, but allow multi-line input.
Included in a form by using the textarea tag, e.g.:
<textarea name=address cols=40 rows=3>
. . . optional default text goes here . . .
</textarea>

With default (URL) encoding, lines of input are
separated by carriage return/newline, coded as
%0D%0A.
Text Area Input

Data posted to server:
address=Bryan+Carpenter%0D%0ACSIT%2C+FSU%0D%0ATallahassee%2C+FL+32306-4120
Scrollable Menus (Lists)

For long lists of options, when checkboxes become too
tedious:
<select name=pets size=3 multiple>
<option value=dog> Dog
<option value=cat> Cat
<option value=bird> Bird
<option value=fish> Fish
</select>

The value attribute in the option tag is optional: default
value returned is the displayed string, immediately
following the tag.

Without the multiple attribute, only a single option can
be selected.
List Input

The message posted to the server is:
...
pets=dog&pets=bird
Conventional CGI
Handling Form Data on the Server





In conventional CGI programming, the URL in the action
attribute of a form will identify an executable file
somewhere in the Web Server’s document hierarchy.
A common server convention is that these executables live
in a subdirectory of cgi-bin/
The executable file may be written in any language. For
definiteness we will assume it is written in Perl, and refer to
it as a CGI script.
The Web Server program will invoke the CGI script, and
pass it the form data, either through environment variables
or by piping data to standard input of the script.
The CGI script generates a response to the form, which is
piped to the Web server through its standard output, then
returned to the browser.
Operation of a CGI Script

At the most basic level, a CGI script must
– Parse the input (the form data) from the server, and
– Generate a response.

Most often the response is the text of a dynamically
generated HTML document, preceded by some HTTP
headers.
– In practice the only required HTTP header is the Content-type
header. The Web Server will fill in other necessary headers
automatically.

Even if there is no meaningful response to the input
data, the CGI script must output an empty message, or
some error message.
– Otherwise the server will not close the connection to the client,
and a browser error will occur.
“Hello World” CGI Script

In the directory /home/httpd/cgi-bin/users/dbc on sirah, I
create the file hello.pl, with contents:
#!/usr/bin/perl
print "Content-type: text/html\n\n" ;
print "<html><body><h1>Hello World!</h1></body></html>" ;

I mark this file world readable, and mark it executable:
sirah$ chmod o+r hello.pl
sirah$ chmod +x hello.pl

Now I point my browser at the URL:
http://sirah/cgi-bin/users/dbc/hello.pl
Output from CGI Script

The novel feature here is that the HTML was dynamically
generated: it was printed out on the fly by the Perl script.
Retrieving Form Data



Several environment variables are set up by the server to
pass information about the request to the Perl script.
If the form data was sent using a GET request, the most
important is QUERY_STRING, which contains all the text
in the URL following the first ? character.
If the form data was sent using a POST request, the
environment variable CONTENT_LENGTH contains the
length in bytes of the posted data. To retrieve this data,
these bytes are read from the standard input of the script.
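The same mechanism is available to a CGI program written in any language. The Java sketch below is an illustration only: the class CgiInput is hypothetical, and it relies on REQUEST_METHOD, a standard CGI environment variable not discussed above, to decide where the form data lives. Like the Perl examples, it echoes the raw data without escaping it, which is acceptable for a demonstration but not for a public server.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical CGI-style program for illustration only.
class CgiInput {
    // Return the raw (still URL-encoded) form data for a GET or POST request.
    static String formData(Map<String, String> env, InputStream stdin) {
        String method = env.getOrDefault("REQUEST_METHOD", "GET");
        if (!method.equalsIgnoreCase("POST")) {
            // GET: everything after the first ? in the URL.
            return env.getOrDefault("QUERY_STRING", "");
        }
        // POST: read exactly CONTENT_LENGTH bytes from standard input.
        int length = Integer.parseInt(env.getOrDefault("CONTENT_LENGTH", "0").trim());
        byte[] buf = new byte[length];
        int read = 0;
        try {
            while (read < length) {
                int n = stdin.read(buf, read, length - read);
                if (n < 0) {
                    break;                  // input ended early
                }
                read += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf, 0, read, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String data = formData(System.getenv(), System.in);
        // The Content-type header, a blank line, then the generated HTML.
        System.out.print("Content-type: text/html\n\n");
        System.out.println("<html><body><h1>Hello " + data + "!</h1></body></html>");
    }
}
```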
GET example

I change our first form to submit data to a CGI script:
<form method=get
action="http://sirah.csit.fsu.edu/cgi-bin/users/dbc/getEg.pl">
Name: <input type=text name=who size=32> <p>
<input type=submit>
</form>
and define getEg.pl by:
#!/usr/bin/perl
print "Content-type: text/html\n\n" ;
print "<html><body><h1>Hello
$ENV{QUERY_STRING}!</h1></body></html>\n" ;

When I point the browser at the form, enter my name, and
submit the form, the page returned to the browser contains
the message:
Hello who=Bryan!
POST example

Change the form as follows:
<form method=post
action="http://sirah.csit.fsu.edu/cgi-bin/users/dbc/postEg.pl">
Name: <input type=text name=who size=32> <p>
<input type=submit>
</form>
and define postEg.pl by:
#!/usr/bin/perl
print "Content-type: text/html\n\n" ;
for($i = 0 ; $i < $ENV{CONTENT_LENGTH} ; $i++) {
    $in .= getc ;
}
print "<html><body><h1>Hello $in!</h1></body></html>\n" ;
Using the CGI module



The previous examples illustrate the underlying
mechanisms used to communicate between server and
CGI program.
One could go on to use the text processing features of
Perl to parse the form data and generate meaningful
responses.
In modern Perl you can (and presumably should) use
the CGI module to hide many of these details—
especially extracting form parameters.
CGI module example

Change the form as follows:
<form method=post
action="http://sirah.csit.fsu.edu/cgi-bin/users/dbc/CGIEg.pl">
Name: <input type=text name=who size=32> <p>
<input type=submit>
</form>
and define CGIEg.pl by:
#!/usr/bin/perl
use CGI qw( :standard) ;
$name = param("who") ;
print "Content-type: text/html\n\n" ;
print "<html><body><h1>Hello $name!</h1></body></html>\n" ;

Now the browser gets a more friendly message like:
Hello Bryan!
Getting Started with Servlets
Server Software

Standard Web servers typically need some additional
software to allow them to run servlets. Options include:
– Apache Tomcat
The official reference implementation for the servlet 2.2 and
JSP 1.1 specifications. It can stand alone or be integrated into
the Apache Web server.
– JavaServer Web Development Kit (JSWDK)
A small standalone Web server mainly intended for servlet
development.
– Sun’s Java Web server
An early server supporting servlets. Now apparently obsolete.
– Allaire JRun, New Atlanta’s ServletExec, . . .
Tomcat


In these lectures we will use Apache Tomcat for
examples.
For debugging of servlets it seems to be necessary to
use a stand-alone server, dedicated to the application
you are developing.
– The current architecture of servlets makes revision of servlet
classes already loaded in a Web server either disruptive or
expensive. In general you need to establish your classes are
working smoothly before they are deployed in a production
server.


Hence you will be encouraged to install your own private
server for developing Web applications.
Tomcat is the flagship product of the Jakarta project,
which produces server software based on Java.
Typical Modes of Operation of Tomcat
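[Figure: three configurations, each showing a browser client sending a servlet request]
1. Stand-alone: the browser client sends the request directly to the
Tomcat server (port 8080).
2. In-process servlet container: the browser client sends the request to
Apache (port 80), with the Tomcat server running inside Apache.
3. Out-of-process servlet container: the browser client sends the request
to Apache (port 80), which passes it to a separate Tomcat server
(port 8007).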
Downloading Tomcat

Go to the Jakarta home-page:
http://jakarta.apache.org



Follow the link for downloading binaries.
Under the heading Release Builds, follow the Tomcat
X.X link.
Get the file jakarta-tomcat-X.X.tar.gz.
Unpacking and Setting the Environment

Unpack the compressed file, e.g.:
gunzip -c jakarta-tomcat-X.X.tar.gz | tar xvf -
Set the environment variables TOMCAT_HOME and
JAVA_HOME, e.g.:
export TOMCAT_HOME=$HOME/jakarta-tomcat-X.X
export JAVA_HOME=/usr/java/jdk1.Y.Y
Most likely you will also want to add these commands
to your .bashrc file.
Servers on Course Hosts: Ground Rules

The system manager would like to be able to keep track
of who is running what Web server.
– Also we want to avoid overloading the course hosts.

You will each be allocated a port number on one of the
three course hosts. Please stick with this port number
and host for your main server.
– You can run additional servers on random port numbers for brief
experiments, but please not for extended periods.
– Of course avoid port numbers allocated to other students!

Your Tomcat home directory should be directly nested in
your top-level home directory.
– The management reserves the right to read and modify your
server configuration if it seems to be causing problems.
Choosing a Port

Edit the file jakarta-tomcat-X.X/conf/server.xml. Find the
Connector element that defines the parameters of the
HTTP connection handler. It looks like:
<Connector className=". . .">
<Parameter name="handler" value=". . . .HttpConnectionHandler"/>
<Parameter name="port" value="8080"/>
</Connector>

If you are using a course host, change the value of the port
parameter from its default 8080 to a port number you have
been allocated.
Removing the AJP Connector

In the file jakarta-tomcat-X.X/conf/server.xml you will also
find a Connector element defining the parameters of an
“AJP connection handler” (used for interactions with an
Apache server). It looks like:
<Connector className=". . .">
<Parameter name="handler" value=". . . .Ajp12ConnectionHandler"/>
<Parameter name="port" value="8007"/>
</Connector>

If you are using a course host, change the value of the port
parameter from its default 8007 to a value unique to you—
e.g. a port number one greater than your
HttpConnectionHandler port.

Even if you are not going to use the Apache connection, the
shutdown.sh script also uses this port, so the connection
handler is still required.
Starting and Stopping your Server


If you are using a course host, these operations should
be done on the host on which you have been allocated
a port to run your main server.
To start your server run the script:
jakarta-tomcat-X.X/bin/startup.sh

To stop your server run the script:
jakarta-tomcat-X.X/bin/shutdown.sh

If for any reason this fails, simply find the java process
and kill it.
Check Your Server is Running

If you are running your server on a course host, and
your allocated host/port pair is host/XXXX, point your
browser at the URL:
http://host.csit.fsu.edu:XXXX


You should see the default Tomcat home page.
In the Tomcat 3.1 release, the file for this home page is
at:
jakarta-tomcat-X.X/webapps/ROOT/index.html
First Servlets
Creating a Context



Before writing a servlet, you need a place to put it. Shut
down your server, if it is running.
In the file jakarta-tomcat-X.X/conf/server.xml, find the
example Context elements.
Add a new context element such as:
<Context path="/dbc" docBase="webapps/dbc/"
debug="0" reloadable="true">
</Context>
– The path attribute defines a logical path that appears in the URL.
– The docBase attribute defines the physical directory where HTML
and servlets live.
– Be careful to put /s in all the right places!
– The reloadable flag is supposed to allow servlet classes to be
reloaded into a running server if they have been modified. We set it
true because that is the recommended default during development.
Note, however, it does not work very reliably!
Creating a Document Directory

This can be created as a subdirectory of
jakarta-tomcat-X.X/webapps/

With the server configuration defined above, I create a
subdirectory:
jakarta-tomcat-X.X/webapps/dbc/

This will be the root directory for my HTML documents.
To check my configuration is working properly, I can put
a file index.html in dbc/, restart my server, and point
my browser at:
http://host.csit.fsu.edu:XXXX/dbc
where host/XXXX is my host/port pair. I should see the
contents of the HTML file.
A Directory for Servlet Classes

Now I create the subdirectories:
jakarta-tomcat-X.X/webapps/dbc/WEB-INF/
and
jakarta-tomcat-X.X/webapps/dbc/WEB-INF/classes/


The latter directory is where I put class files and
package subdirectories for servlets.
The WEB-INF subdirectory will not be directly visible to
browsers as a document directory.
A “Hello World” Servlet
import java.io.* ;
import javax.servlet.* ;
import javax.servlet.http.* ;

public class HelloWorld extends HttpServlet {
    public void doGet(HttpServletRequest request,
                      HttpServletResponse response)
            throws IOException, ServletException {
        // Set the content type before writing any output.
        response.setContentType("text/html") ;
        PrintWriter out = response.getWriter() ;
        out.println("<html><body>") ;
        out.println("<h1>Hello World!</h1>") ;
        out.println("</body></html>") ;
    }
}
Remarks





This program should be contained in a file
HelloWorld.java, which may be placed in the classes/
subdirectory.
HttpServlet is the base class for servlets running in
HTTP servers. Although servlets can be written for
other kinds of server, in reality servlets are nearly
always HttpServlets.
The doGet() method is called in response to an HTTP
GET request directed at the servlet.
As the names suggest, the arguments describe the
browser’s request and the servlet’s response.
Before writing to the output stream associated with the
response, the content type header (at least) must be
set.
Setting the Class Path

Before compiling servlet code you will have to set the class
path to include some related libraries.
– The server apparently also needs the class path to be set at the
time it is started, before it can run servlets.

Shut down the server again. Set your class path to include
some necessary jar files, e.g.:
export CLASSPATH=$TOMCAT_HOME/lib/servlet.jar:\
$TOMCAT_HOME/lib/jasper.jar:\
$TOMCAT_HOME/lib/jaxp.jar
Again, probably add this command to your .bashrc file.
– A back-slash, \, at the end of a line is a line-continuation character
(it “escapes” the EOL). Do not include it if you type the whole
command on one line!
– To avoid grief in the future, also make sure now that the working
directory is on your class path, e.g.: CLASSPATH=$CLASSPATH:.

Restart the server.
Compiling and Deploying the Servlet

This is straightforward:
javac HelloWorld.java

You should now be able to view the servlet. In my case
I point my browser at the URL:
http://host.csit.fsu.edu:XXXX/dbc/servlet/HelloWorld



Note that by default the Tomcat server will run with the
same privileges as the user who started it.
This means you don’t actually need to make files world
readable (because you have privileges to read them).
It also means you have to be careful. If you stick with
this default you must never deploy servlets that have the
power to damage or compromise your account
– e.g. by reading or writing arbitrary files, or executing random
commands!
A Servlet that Reads a Parameter


Define a new servlet class called HelloUser.
This is identical to the class HelloWorld, except that the
line:
out.println("<h1>Hello World!</h1>") ;
is replaced with
out.println("<h1>Hello " +
request.getParameter("who") + "!</h1>") ;
First Form using a Servlet

In the directory jakarta-tomcat-X.X/webapps/dbc/ I place
an HTML file hello.html containing the form element:
<form method=get
action="http://sirah.csit.fsu.edu:8081/dbc/servlet/HelloUser">
Name: <input type=text name=who size=32> <p>
<input type=submit>
</form>

This assumes my host/port pair is sirah/8081.
To view this form, I point my browser at the URL:
http://sirah.csit.fsu.edu:8081/dbc/hello.html

If I enter my name and submit the form, I get back a page
containing the message:
Hello Bryan!
The Servlet Life Cycle
Servlet Classes


Any servlet class implements the interface
javax.servlet.Servlet.
This interface defines a few low-level methods, including
the low-level request-handling method, service().
– Perhaps the only method from Servlet you will use explicitly is
getServletConfig().

All servlets we will be concerned with are extended from
the base class javax.servlet.http.HttpServlet (which
implements Servlet).
Servlet Instances



By default, (at most) one instance of a given servlet
class will ever be created by a Web server process.
By default, the servlet class is loaded into the Web
server’s JVM, and the unique servlet instance is
created, the first time any client sends a request to a
URL identifying the servlet class.
Subsequent requests to the same URL are all handled
by the same servlet class instance.
– By default, however, each request is handled in a different Java
thread.

This means that a later request can access results of
processing an earlier request through values of instance
variables (or class variables).
The init() Method

The init() method:
public void init() throws ServletException {. . .}


is quite analogous to the init() method on applets.
It is called once when the servlet is created. You override it
to define initialization code for your servlet instance.
As with applets, this is used in preference to defining a
non-default constructor, because you are allowed to
access initialization parameters inside init() (but not in a
constructor).
– There is another lower-level init() method:
public void init(ServletConfig config) throws ServletException {. . .}
Don’t override it. Instead, if you need a ServletConfig during
initialization, call getServletConfig() in the body of the no-argument init().
The Request Handling Methods


These are where you put the code that handles HTTP
requests to the URL of the servlet.
The available request-handling methods are
doGet()       Handle HTTP GET request.
doPost()      Handle HTTP POST request.
doPut()       Handle HTTP PUT request.
doDelete()    Handle HTTP DELETE request.
doOptions()   Handle HTTP OPTIONS request.
doTrace()     Handle HTTP TRACE request.
– Note there is no doHead().

These have generic signature:
protected void doXxx(HttpServletRequest req,
HttpServletResponse resp)
throws ServletException, IOException {. . .}
Last Modification Date




When a browser reloads a page, it can include an
If-Modified-Since header.
If the document has not been modified since the
specified date, the server response will be a simple
“Not Modified” status code (no data).
For dynamically generated content, OS date-stamps on
document files are not enough to determine whether the
effective content will be different.
Instead a servlet can override:
protected long getLastModified(HttpServletRequest req) {. . .}
and thus take advantage of browser caching.
– The returned date is in standard Java representation:
milliseconds since midnight, January 1, 1970 GMT.
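The comparison a server has to make can be sketched in plain Java as below. The class LastModifiedCheck and its method are hypothetical, and this is not presented as the actual code of any servlet container. It assumes that dates in HTTP headers carry whole seconds only, so the servlet's millisecond value is rounded down before comparing, and that a negative value means the modification time is unknown.

```java
// Hypothetical sketch of the "Not Modified" decision; not container code.
class LastModifiedCheck {
    // Returns true if a "Not Modified" response would be appropriate.
    static boolean notModified(long lastModifiedMillis, long ifModifiedSinceMillis) {
        if (lastModifiedMillis < 0) {
            return false;                    // modification time unknown: always regenerate
        }
        // HTTP dates have one-second resolution, so drop the milliseconds.
        long lastModifiedSeconds = lastModifiedMillis / 1000 * 1000;
        return ifModifiedSinceMillis >= lastModifiedSeconds;
    }

    public static void main(String[] args) {
        long cachedAt = 1_000_000L;          // date sent by the browser, in milliseconds
        System.out.println(notModified(1_000_500L, cachedAt));   // same second: true
        System.out.println(notModified(1_002_000L, cachedAt));   // changed later: false
    }
}
```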
The destroy() method.

Finally a servlet can also override
public void destroy() {. . .}



If the Web server terminates gracefully, it will invoke
destroy() on all servlet instances it holds before
shutting down.
In principle, this is a place where you can put code to
back up the current state of the servlet to persistent
storage. The servlet can restart from the restored state
when the Web server is restarted.
In practice, servers (especially Tomcat!) often terminate
“ungracefully”, when the system crashes or the server
process is killed. Relying on destroy() methods being
called is probably not advisable.

A Counter Servlet
import java.io.* ;
import javax.servlet.* ;
import javax.servlet.http.* ;

public class Counter extends HttpServlet {
    int count = 0 ;   // shared by all requests to this instance

    public void doGet(HttpServletRequest req,
                      HttpServletResponse resp)
            throws IOException, ServletException {
        resp.setContentType("text/html") ;
        PrintWriter out = resp.getWriter() ;
        out.println("<html><head></head><body>") ;
        out.println("This servlet instance has been accessed " +
                    (count++) + " times") ;
        out.println("</body></html>") ;
    }
}

Example taken from “Java Servlet Programming”,
O’Reilly.

Remarks

The first time I point my browser at this servlet (e.g. at
http://sirah.csit.fsu.edu:8081/dbc/servlet/Counter
), I get a response page containing the message:
This servlet instance has been accessed 0 times

Each time I reload the URL, the count increases.
Since count is an instance variable of the class, this
illustrates that indeed only a single instance of Counter
is created.
This servlet is not completely reliable, because
concurrent requests may be handled in different
threads. The instance variable count is shared by all
of those threads, and count++ is a read followed by a
write, so two requests can interfere and an update can
be lost or reported twice.

Mutual Exclusion

In general, any access to
– servlet instance variables,
– servlet class variables,
– external files, etc.

that may be modified by any HTTP request on the
servlet should be guarded by synchronized methods or
a synchronized statement. This is very important!
For example, the increment of count could be done in a
synchronized statement:
int myCount ;
synchronized(this) {
    myCount = count++ ;
}

Subsequently the local variable myCount, which is
private to the thread, is printed in the response.
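The effect of the synchronized statement can be checked without a Web server. The following is a minimal plain-Java sketch; SafeCounter and hammer() are hypothetical names, and the threads stand in for concurrent HTTP requests to a single servlet instance.

```java
// Plain-Java sketch; SafeCounter and hammer() are hypothetical names.
final class SafeCounter {
    private int count = 0;

    // Same idea as the synchronized statement on the slide:
    // the read and the increment happen as one indivisible step.
    int next() {
        synchronized (this) {
            return count++;
        }
    }

    int current() {
        synchronized (this) {
            return count;
        }
    }

    // Starts `threads` threads that each call next() `perThread` times,
    // waits for all of them, and returns the final count.
    static int hammer(int threads, int perThread) {
        SafeCounter counter = new SafeCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.next();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.current();
    }
}
```

With the synchronized blocks in place, hammer(8, 10000) returns exactly 80000. Removing them typically makes the total come up short, which is the interference described above.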
Registering Servlet Instances


In simple cases we don’t need to explicitly register
servlets with the Web server. Instances will simply be
created on demand.
However, registering servlets has various advantages:
– we can give the servlets meaningful names, or map them to
simpler URL addresses,
– we can create multiple instances of the same servlet class, with
different names,
– we can set initialization parameters for the instance, etc.

With Tomcat, servlets can be registered by creating
entries in an XML file called web.xml, which is placed in
the WEB-INF/ subdirectory for your context.
Example Registering a Servlet

I copy the example file:
jakarta-tomcat-X.X/webapps/examples/WEB-INF/web.xml
to my personal context directory:
jakarta-tomcat-X.X/webapps/dbc/WEB-INF/

I delete the existing <servlet>. . .</servlet> and
<servlet-mapping>. . .</servlet-mapping> elements
from my copy, and replace them with:
<servlet>
    <servlet-name>counter1</servlet-name>
    <servlet-class>Counter</servlet-class>
</servlet>

I restart the server.

Multiple Instances

I can view the registered servlet at the URL:
http://sirah.csit.fsu.edu:8081/dbc/servlet/counter1

Now add a second servlet element to the web.xml file:
<servlet>
    <servlet-name>counter2</servlet-name>
    <servlet-class>Counter</servlet-class>
</servlet>

After restarting the server again, I find that the access
counts for the original servlet and for the second servlet at:
http://sirah.csit.fsu.edu:8081/dbc/servlet/counter2
are updated independently.

Initialization Parameters

A new counter servlet, defining an init() method:
public class InitCounter extends HttpServlet {
    int count ;

    public void init() throws ServletException {
        ServletConfig config = getServletConfig() ;
        try {
            count = Integer.parseInt(config.getInitParameter("initial")) ;
        } catch (NumberFormatException e) {
            count = 0 ;   // parameter missing or not a number
        }
    }

    public void doGet(HttpServletRequest req,
                      HttpServletResponse resp) throws . . . {
        ...
    }
}

Defining Initialization Parameters

In web.xml, I add the element:
<servlet>
    <servlet-name>counter1</servlet-name>
    <servlet-class>InitCounter</servlet-class>
    <init-param>
        <param-name>initial</param-name>
        <param-value>50</param-value>
    </init-param>
</servlet>

Now when I restart the server and point my browser at, say:
http://sirah.csit.fsu.edu:8081/dbc/servlet/counter1
I get a response page containing the message:
This servlet has been accessed 50 times
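The fallback logic in init() can be tried outside a servlet container. The sketch below is a hypothetical helper (InitParams and parseInitial() are illustrative names); the string argument stands in for the result of getInitParameter("initial"), which is null when the parameter is not defined in web.xml.

```java
// Hypothetical helper; the String argument stands in for the value
// returned by ServletConfig.getInitParameter("initial").
final class InitParams {

    // Returns the parsed value, or `fallback` when the parameter is
    // missing (null) or is not a valid integer.
    static int parseInitial(String raw, int fallback) {
        if (raw == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```

The explicit null check is a readability choice: Integer.parseInt(null) also throws NumberFormatException, which is why the single catch clause in InitCounter covers a missing parameter as well.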
Handling Requests
Reading Form Data


Servlets make reading form data easy (at least in
common cases).
If a particular parameter name is known to have only a
single value, one can just apply the method:
public String getParameter(String name) {. . .}

to the HttpServletRequest parameter of the doGet() or
doPost() method.
Use of this method was illustrated earlier in the
HelloUser example. Note that parameter names are
case-sensitive.

Uniform support for GET and POST



The parameter-reading methods behave the same for
GET and POST requests.
It is natural to support both kinds of request with the
same code.
To do this, simply have doGet() dispatch to doPost(), or
vice versa. For example:
public void doGet(HttpServletRequest request,
                  HttpServletResponse response)
        throws IOException, ServletException {
    doPost(request, response) ;
}

Determining the HTTP Method



The getMethod() method on the HttpServletRequest
returns the HTTP method appearing in the header.
For example, when a client sends the HTTP HEAD
request, the server is supposed to treat it like a GET
request, but return the headers only—not the data.
The server will automatically discard any data doGet()
returns, but (if you had the urge) you could make things
a bit more efficient as follows:
public void doGet(HttpServletRequest request,
                  HttpServletResponse response)
        throws IOException, ServletException {
    // . . . set headers . . .
    if(request.getMethod().equals("HEAD")) return ;
    // . . . return data . . .
}

Information from Request Headers


getMethod() is one of a series of convenience methods
that read information from the request headers.
Others include:
getRequestURI(), getProtocol()    Method header
getContentLength()                Content-Length header
getContentType()                  Content-Type header
getAuthType(), getRemoteUser()    Authorization header
getCookies()                      See later

Reading Request Headers Directly


The preceding methods are not exhaustive.
If you know the name of the header you want, use
String getHeader(String name)

For headers (e.g. Accept-Language) that can appear
multiple times in a given request, use:
java.util.Enumeration getHeaders(String name)

To simply enumerate all headers of a given request, use
java.util.Enumeration getHeaderNames()
in conjunction with getHeader().

Displaying All Headers
import java.io.* ;
import java.util.Enumeration ;
import javax.servlet.* ;
import javax.servlet.http.* ;

public class Headers extends HttpServlet {
    public void doPost(HttpServletRequest req,
                       HttpServletResponse resp)
            throws IOException, ServletException {
        resp.setContentType("text/html") ;
        PrintWriter out = resp.getWriter() ;
        out.println("<html><head></head><body>") ;
        // Print every header name, followed by its value
        Enumeration headers = req.getHeaderNames() ;
        while(headers.hasMoreElements()) {
            String name = (String) headers.nextElement() ;
            out.println(name + "<br>" +
                        req.getHeader(name) + "<br><br>") ;
        }
        out.println("</body></html>") ;
    }
}

HTTP 1.1 Request Headers
Accept: MIME types the browser can handle
Accept-Charset: Character sets the browser can handle
Accept-Encoding: Encodings the browser can handle (e.g. gzip)
Accept-Language: English (en), etc.
Authorization: User ID/password
Cache-Control: For proxy servers.
Connection: Can the browser keep connections alive?
Content-Length: of POSTed data
Content-Type: MIME encoding
Cookie: Cookies previously received from this site.
Expect: Browser wishes to attach a document
From: email address of requester.
Host: host/port information on original URL
HTTP 1.1 Request Headers (cont.)
If-Match:
If-Modified-Since: Only send data changed since the given date.
If-None-Match:
If-Range:
If-Unmodified-Since: Used with PUT.
Pragma: The only standard value is no-cache.
Proxy-Authorization:
Range: Get part of a document.
Referer: Set if the request came from a link on a Web page.
Upgrade: Change protocol.
User-Agent: Identifies the browser.
Via: Set by gateways and proxies.
Warning:

Multiple-valued Parameters

If a form parameter can have more than one value (e.g.
a value from a menu allowing multiple selections), you
should apply the method:
public String [] getParameterValues(String name) {. . .}

to the HttpServletRequest object.
Recall this example from the section on forms:
<select name=pets size=3 multiple>
<option value=dog> Dog
<option value=cat> Cat
<option value=bird> Bird
<option value=fish> Fish
</select>

The form may send the data in a GET request to the
following servlet.
Handling Multi-valued Parameters
public class MultiValue extends HttpServlet {
    public void doGet(HttpServletRequest request,
                      HttpServletResponse response)
            throws IOException, ServletException {
        response.setContentType("text/html") ;
        PrintWriter out = response.getWriter() ;
        // null if no option was selected
        String [] pets = request.getParameterValues("pets") ;
        out.println("<html><head></head><body>") ;
        out.println("Your pets:<p>") ;
        out.println("<table border cellspacing=0 cellpadding=5>") ;
        if (pets != null)
            for (int i = 0 ; i < pets.length ; i++)
                out.println("<tr><td>" + pets [i] + "</td></tr>") ;
        out.println("</table>") ;
        out.println("</body></html>") ;
    }
}
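For a GET request, the values handed back by getParameterValues() come from the query string that the browser appends to the URL, for example pets=dog&pets=fish. The following plain-Java sketch shows roughly how such a string breaks down into multi-valued parameters. QueryParams is a hypothetical class, not the container's actual implementation, and UTF-8 decoding is an assumption.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; a real servlet container does this for you.
final class QueryParams {

    // Splits "a=1&a=2&b=x+y" into {a=[1, 2], b=[x y]}.
    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = decode(eq < 0 ? pair : pair.substring(0, eq));
            String value = decode(eq < 0 ? "" : pair.substring(eq + 1));
            // Repeated names accumulate values in order of appearance.
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8"); // assumed encoding
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }
}
```

In this sketch a name that never appears is simply absent from the map, which mirrors the null that getParameterValues() returns when nothing was selected.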
Multi-part Data

Recall this (slightly modified) example from the section on
forms:
<form method=post
      enctype="multipart/form-data"
      action="http://sirah.csit.fsu.edu:8081/dbc/servlet/MultiPart">
Course: <input name=course size=20> <p>
Students file: <input type=file name=students size=32> <p>
<input type=submit>
</form>

The simple getParameter() approach does not appear to work
for multi-part data (required for uploading files).
However, we can resort to a lower-level CGI-like
approach: reading the posted data from an input stream,
and decoding it “by hand”.

Displaying Raw Multi-part Data
public class MultiPart extends HttpServlet {
    public void doPost(HttpServletRequest req,
                       HttpServletResponse resp)
            throws IOException, ServletException {
        resp.setContentType("text/html") ;
        PrintWriter out = resp.getWriter() ;
        out.println("<html><head></head><body>") ;
        String contentType = req.getContentType() ;
        out.println("content type:<br>" + contentType + "<br>") ;
        // getReader() already returns a BufferedReader
        BufferedReader in = req.getReader() ;
        while(true) {
            String line = in.readLine() ;
            if(line == null) break ;
            out.println(line + "<br>") ;
        }
        out.println("</body></html>") ;
    }
}
Remarks


This servlet will simply print the value of the Content-Type
header, and the raw version of the posted data.
In general it is not safe to combine this style of reading
data, using getReader(), with the higher-level approach,
using getParameter(): choose one or the other.

Multi-part Data Example
public void doPost(HttpServletRequest req,
                   HttpServletResponse resp)
        throws IOException, ServletException {
    resp.setContentType("text/html") ;
    PrintWriter out = resp.getWriter() ;
    out.println("<html><head></head><body>") ;
    Vector students = new Vector() ;
    String course = parseFormData(students, req) ;
    out.println("course: " + course + "<br>") ;
    out.println("students: <br>") ;
    out.println("<table border cellspacing=0 cellpadding=5>") ;
    for (int i = 0 ; i < students.size() ; i++)
        out.println("<tr><td>" + (String) students.get(i) + "</td></tr>") ;
    out.println("</table>") ;
    out.println("</body></html>") ;
}

Multi-part Data Example (cont.)
public String parseFormData(Vector students, HttpServletRequest req)
        throws IOException {
    String course = null ;
    String contentType = req.getContentType() ;
    String boundary = "--" + contentType.substring( . . . ) ;
        // Extract part boundary from content type header
    BufferedReader in = req.getReader() ;
    String line = in.readLine() ;
    while(! line.equals(boundary + "--")) {
        String header = in.readLine() ;
        String name = header.substring( . . . ) ;
            // Extract parameter name from content disposition header
        if(name.equals("course")) {
            course = in.readLine() ;
            line = in.readLine() ;
        }
        else if(name.equals("students"))
            while(true) {
                line = in.readLine() ;
                if(line.startsWith(boundary)) break ;
                students.addElement(line) ;
            }
    }
    return course ;
}
}
Remarks


The parseFormData() implementation outlined here is
schematic only.
Parsing the multi-part MIME encoded data is
straightforward, but clearly fairly tedious.
– Servlets don’t give much help here.
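To make the schematic outline more concrete, here is a minimal, text-only sketch of the same idea in plain Java. MultipartSketch is a hypothetical class, and the posted body is passed in as a String so that it can be tried without a server. It assumes each part has a Content-Disposition header with a quoted name, an empty line before the data, and line-oriented text content. It is not suitable for binary uploads, and it is not the parser used by any particular library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, text-only sketch; not a complete multipart parser.
final class MultipartSketch {
    // Matches ; name="..." but not ; filename="..."
    private static final Pattern NAME = Pattern.compile(";\\s*name=\"([^\"]*)\"");

    // Extracts the boundary parameter from a Content-Type header value.
    static String boundaryOf(String contentType) {
        int start = contentType.indexOf("boundary=");
        if (start < 0) {
            throw new IllegalArgumentException("no boundary parameter");
        }
        String boundary = contentType.substring(start + "boundary=".length());
        int semi = boundary.indexOf(';');
        if (semi >= 0) {
            boundary = boundary.substring(0, semi);
        }
        boundary = boundary.trim();
        if (boundary.length() >= 2 && boundary.startsWith("\"") && boundary.endsWith("\"")) {
            boundary = boundary.substring(1, boundary.length() - 1);
        }
        return boundary;
    }

    // Maps each part name to the lines of its body.
    static Map<String, List<String>> parse(String contentType, String body) {
        String delimiter = "--" + boundaryOf(contentType);
        Map<String, List<String>> parts = new LinkedHashMap<>();
        try (BufferedReader in = new BufferedReader(new StringReader(body))) {
            String line = in.readLine();
            while (line != null && !line.equals(delimiter)) {
                line = in.readLine(); // skip anything before the first part
            }
            while (line != null && line.equals(delimiter)) {
                String name = null;
                // Part headers end at the first empty line.
                while ((line = in.readLine()) != null && !line.isEmpty()) {
                    if (line.toLowerCase(Locale.ROOT).startsWith("content-disposition:")) {
                        Matcher m = NAME.matcher(line);
                        if (m.find()) {
                            name = m.group(1);
                        }
                    }
                }
                // Body lines run up to the next delimiter (or the closing one).
                List<String> lines = new ArrayList<>();
                while ((line = in.readLine()) != null && !line.startsWith(delimiter)) {
                    lines.add(line);
                }
                if (name != null) {
                    parts.put(name, lines);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return parts;
    }
}
```

For the course/students form, the resulting map would hold one line under "course" and one line per student under "students".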
Generating Responses
The HTTP Status Line

A minimal server response to a client request might be:
HTTP/1.1 200 OK
Content-Type: text/plain

Hello World!

We already saw how to set the content type explicitly
using setContentType().
Here we are more interested in the first header line: the
status line.
As the example suggests, a status value of 200 means
the request was successfully serviced. For a servlet
response, the Web server sets this status value by
default.
A servlet can explicitly set other values by using the
setStatus() method of HttpServletResponse.

HTTP Status Codes
100 Continue: Response to Expect request.
101 Switching Protocols: Response to Upgrade request.
200 OK: OK!
201 Created: Server created a document. URL follows.
202 Accepted: Processing is in progress.
203 Non-Authoritative Information:
204 No Content: No new document is available.
205 Reset Content: Clear form fields.
206 Partial Content: Response to Range request.
300 Multiple Choices: Trick question?
301 Moved Permanently: Document is elsewhere
302 Found: Redirects the browser to a different URL.
303 See Other: Please use GET instead of POST.
304 Not Modified: Response to request with If-Modified-Since.
305 Use Proxy: Go to proxy at returned URL
307 Temporary Redirect: like 302.
HTTP Status Codes (cont.)
400 Bad Request: Syntax error.
401 Unauthorized: No appropriate Authorization header
403 Forbidden: Not allowed with any authorization
404 Not Found: Not at this address.
405 Method Not Allowed: Self explanatory.
406 Not Acceptable: Resource doesn’t match Accept header.
407 Proxy Authentication Required:
408 Request Timeout: Client took too long sending request.
409 Conflict: Used with PUT.
410 Gone: Document has gone.
411 Length Required: Content-Length missing (in POST).
412 Precondition Failed:
413 Request Entity Too Large: Document too big to handle.
414 Request URI Too Long: URI is too long
415 Unsupported Media Type:
416 Requested Range Not Satisfiable:
417 Expectation Failed: Disillusioned?
HTTP Status Codes (cont.)
500 Internal Server Error: Server is confused.
501 Not Implemented: Requested functionality not supported.
502 Bad Gateway: Used by proxy servers.
503 Service Unavailable: Server overloaded or service down.
504 Gateway Timeout: Used by proxy servers.
505 HTTP Version Not Supported: Self explanatory.
Explicitly Returning Status Codes

These status values are available as predefined
constants in the HttpServletResponse class:
final int SC_OK = 200 ;
final int SC_FOUND = 302 ;
final int SC_NOT_FOUND = 404 ;
etc. The default status is equivalent to explicitly doing:
resp.setStatus(HttpServletResponse.SC_OK) ;

There are a couple of convenience methods on
HttpServletResponse for dealing with common cases:
void sendError(int sc, String message)
– send specified status, with generated page containing message.
void sendRedirect(String location)
– send a temporary-redirect status (302), and include a
Location header.

Redirecting the Browser



By sending the SC_FOUND or
SC_TEMPORARY_REDIRECT status, together with a
dynamically generated URL, a servlet can cause the
browser to go directly to a different page or site (without
the user manually clicking another link).
Following is a simplified version of an example from
“Core Servlets and Java Server Pages”.
It allows the user to specify a search string and a
preferred search engine, dynamically generates a query
URL for the chosen search engine, and redirects the
browser to that URL.

Search-Engine Selection Servlet
public class Search extends HttpServlet {
    public void doGet(HttpServletRequest req,
                      HttpServletResponse resp)
            throws IOException, ServletException {
        String searchEngine = req.getParameter("searchEngine") ;
        String searchString = req.getParameter("searchString") ;
        String url = null ;
        if("google".equals(searchEngine))
            url = "http://www.google.com/search?q=" +
                  searchString ;
        else if("lycos".equals(searchEngine))
            url = "http://lycospro.lycos.com/cgi-bin/pursuit?query=" +
                  searchString ;
        else if("hotbot".equals(searchEngine))
            url = "http://www.hotbot.com/?MT=" + searchString ;
        if(url == null)   // missing or unknown search engine
            resp.sendError(HttpServletResponse.SC_NOT_FOUND,
                           "Unknown search engine") ;
        else
            resp.sendRedirect(url) ;
    }
}
Remarks



The sendRedirect() call does everything necessary to
create the response.
To deal with complex search strings, you should probably
URL-encode searchString before appending it to url.
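URL-encoding the search string can be done with the standard java.net.URLEncoder class. The sketch below is plain Java. SearchUrls and queryUrl() are hypothetical names, and UTF-8 is an assumed encoding that a given search engine may or may not expect.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; only URLEncoder.encode() is a library call.
final class SearchUrls {

    // Appends the form-encoded search string to a query URL prefix
    // such as "http://www.google.com/search?q=".
    static String queryUrl(String prefix, String searchString) {
        try {
            return prefix + URLEncoder.encode(searchString, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }
}
```

Spaces become "+" and reserved characters such as "&" become percent escapes, so a search string containing "&" can no longer be mistaken for the start of a second query parameter.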
A possible form:
<form method=get
      action="http://sirah.csit.fsu.edu:8081/dbc/servlet/Search">
Search engine: <p>
Google: <input type=radio name=searchEngine value=google checked>
Lycos: <input type=radio name=searchEngine value=lycos>
Hotbot: <input type=radio name=searchEngine value=hotbot>
<p>
Search string: <input type=text name=searchString>
. . .
</form>

Introduction to Session Tracking
The Problem




HTTP is a stateless protocol—it provides no intrinsic
way to associate one request/response transaction with
any subsequent transactions.
But very often a Web application requires that the server
engage in a non-trivial dialog with a single user,
involving multiple client requests and server responses.
So the problem is to find ways to define and keep track
of a particular “session” between browser and Web
server.
This is called session tracking.
Solutions

There are three solutions in common use:
– Hidden Form Fields
Assumes all client requests associated with the session are form
submissions. The forms must be dynamically generated by the
server, and include hidden input fields that preserve session
information.
– URL-Rewriting
Again assumes all pages associated with the session are
dynamically generated by the server. Session information is
directly appended to any URLs referring back to the server in the
generated pages.
– Cookies
An extension to HTTP allows a server to ask a browser to store
a small amount of persistent information. The browser returns this
information in HTTP request headers, typically whenever the client
revisits a Web server on the same host.

Example Using Hidden Form Fields

The classic example of “session information” is the
contents of a customer’s shopping cart at an online
store.

In the interests of fitting code in slides, we scale this
down and deal with selections from a virtual snack-vending machine.
– “. . . Think of clocks and counters and telephones and board
games and vending machines.” C.A.R. Hoare, Communicating
Sequential Processes, 1985.

Snack-Vending Machine
public class VendingMachine extends HttpServlet {
    String[] snacks = {"Chips", "Popcorn", "Peanuts", . . . } ;

    public void doGet(HttpServletRequest req,
                      HttpServletResponse resp) throws . . . {
        resp.setContentType("text/html") ;
        PrintWriter out = resp.getWriter() ;
        String [] selections = req.getParameterValues("selection") ;
        out.println("<html><head></head><body>") ;
        for(int i = 0 ; i < snacks.length ; i++) {
            out.println("<form action=" + selectURL + ">") ;
            out.println("<input type=submit name=selection " +
                        "value=\"" + snacks [i] + "\">") ;
            printHidden(out, selections) ;   // print hidden fields
            out.println("</form>") ;
        }
        . . . generate form element for viewing current selections . . .
        out.println("</body></html>") ;
    }
Remarks

The servlet generates an HTML page with one form
element for every snack.
– selectURL is a reference back to this servlet.

The submit button for each form sets a value for the
parameter called selection: the value set is the name of the
snack.
Crucially, every form element in the generated page also
sets again any pre-existing values for selection, using
hidden input elements:
void printHidden(PrintWriter out, String [] selections) {
    if(selections != null)
        for(int j = 0 ; j < selections.length ; j++)
            out.println("<input type=hidden name=selection " +
                        "value=\"" + selections [j] + "\">") ;
}
The value of selections was returned by the call to
getParameterValues(), earlier in the servlet method.

The Initial Page

If I go to the URL of the servlet, perhaps
http://sirah.csit.fsu.edu:8081/dbc/servlet/VendingMachine
I see something like:
[Figure: browser screenshot of the initial page; image not available.]

Generated Source of Initial Page

If we view the HTML source of the initial page, it includes a series
of forms:
<html><head></head><body>
<form action=http://sirah...:8081/dbc/servlet/VendingMachine>
<input type=submit name=selection value="Chips">
</form>
<form action=http://sirah...:8081/dbc/servlet/VendingMachine>
<input type=submit name=selection value="Popcorn">
</form>
<form action=http://sirah...:8081/dbc/servlet/VendingMachine>
<input type=submit name=selection value="Peanuts">
</form>
...
</body></html>

selections was null, and initially there are no hidden fields.
Making Selections

If I click on a couple of the selections on the initial page,
apparently nothing changes—each selection returns a
generated page that looks identical in the browser.

But if I view the generated HTML source. . .
Generated Source of Later Pages
<html><head></head><body>
<form action=http://sirah...:8081/dbc/servlet/VendingMachine>
<input type=submit name=selection value="Chips">
<input type=hidden name=selection value="Peanuts">
<input type=hidden name=selection value="Chips">
</form>
<form action=http://sirah...:8081/dbc/servlet/VendingMachine>
<input type=submit name=selection value="Popcorn">
<input type=hidden name=selection value="Peanuts">
<input type=hidden name=selection value="Chips">
</form>
...
</body></html>

Every form now contains hidden fields holding values that
were in selections.
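How the hidden fields accumulate can be reproduced without a servlet container by building the form markup as a string. The sketch below is plain Java. HiddenFieldForms, snackForm() and escape() are hypothetical names. The HTML escaping and the quoted action attribute are additions that the slides do not include; they are there so that a value containing a quote or angle bracket cannot break the generated markup.

```java
// Hypothetical plain-Java rendering of one vending-machine form.
final class HiddenFieldForms {

    // Minimal HTML attribute escaping (an addition, not in the slides).
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Builds the form for one snack, re-sending all earlier selections
    // as hidden fields. `selections` may be null on the first visit.
    static String snackForm(String actionUrl, String snack, String[] selections) {
        StringBuilder html = new StringBuilder();
        html.append("<form action=\"").append(escape(actionUrl)).append("\">\n");
        html.append("<input type=submit name=selection value=\"")
            .append(escape(snack)).append("\">\n");
        if (selections != null) {
            for (String previous : selections) {
                html.append("<input type=hidden name=selection value=\"")
                    .append(escape(previous)).append("\">\n");
            }
        }
        html.append("</form>\n");
        return html.toString();
    }
}
```

Calling snackForm() with null selections yields a form with no hidden fields, as on the initial page. Each later call carries one hidden field per earlier selection, which is why the pages grow as the session proceeds.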
Handling the Accumulated “State”

The page returned by the VendingMachine servlet
contains a final form generated by:
out.println("<form action=" + viewURL + ">") ;
out.println("View current selections: <input type=submit>") ;
printHidden(out, selections) ;
out.println("</form>") ;
Here viewURL is a reference to a second servlet, which
generates a page containing the contents of the hidden
fields.

Critique of Hidden Fields

The approach is quite elegant, but it has some problems:
– All interactions between client and server must go through forms.
– Every form on every generated page must include the hidden
fields defining the session state.
– In our example, the number of hidden fields grew quickly.

All approaches to session tracking run into problems
analogous to the last: one wishes to keep down the
amount of hidden information that must be exchanged in
every single transaction of a session.

For example, this will be important for the URL-rewriting
approach, because we don’t want to end up with huge
URLs.
Session IDs

The direct “hidden fields” approach does not store
session state in any fixed place. The “state” is
somehow encoded by the current point in an ongoing
dialog.
– Perhaps reminiscent of simulation of state by lazy lists in
functional programming languages??



This is interesting, but, as noted, it means that the
associated information is constantly swapped between
client and server.
An obvious solution is for the server to store the bulk of
the data associated with each active session.
The only session information bounced back and forth
between client and server is an immutable identifier for
the session.
Improved Vending Machine Servlet

In an improved version of our vending machine servlet, the
main servlet has a static variable, sessionTable.
– We make it static so it can be accessed by a separate servlet
class, used for viewing or processing the current selections.




This sessionTable is a HashMap. It is keyed by a
session ID string. The associated values are “records”
describing the current state of the session.
In our simple example, each session-state “record” is a
Vector containing the items selected thus far.
In our example, the session ID is a random number
generated when the servlet is initially called (without a
sessionID parameter).
This number is embedded as a hidden field in the
generated pages, and thus returned in subsequent
transactions.
A Second Vending Machine
static HashMap sessionTable = new HashMap() ;
Random rand = new Random() ;   // Seeded by current date/time

public void doGet(HttpServletRequest req,
                  HttpServletResponse resp) throws . . . {
    ...
    String sessionID = req.getParameter("sessionID") ;
    if(sessionID == null) {
        // First invocation in this session
        sessionID = "" + rand.nextInt() ;
        sessionTable.put(sessionID, new Vector()) ;
    } else {
        // Subsequent invocation
        Vector selections = (Vector) sessionTable.get(sessionID) ;
        String selection = req.getParameter("selection") ;
        if(selection != null) selections.addElement(selection) ;
    }
    // . . . Print single hidden field in all forms:
    out.println("<input type=hidden name=sessionID " +
                "value=" + sessionID + ">") ;
    ...
}

Remarks


Our naive implementation does not worry about issues of
thread safety.
More strictly, accesses to sessionTable should be
synchronized, e.g.:
synchronized(sessionTable) {
    sessionTable.put(sessionID, selections) ;
}

This is sufficiently safe if we make the often-reasonable
assumption that there are no concurrently active
transactions involving the same session.
– Without this assumption, access to the individual session records
should be synchronized as well.

The selection-viewing servlet can access the session table
in the first servlet class by
VendingMachine2.sessionTable.
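A synchronized session table along these lines can be sketched in plain Java. SessionTable and its methods are hypothetical names. The sketch guards both the table and the per-session records with the same lock, so it does not rely on the assumption that a session is used by one request at a time. It also swaps java.util.Random for java.security.SecureRandom, a choice made here (not in the slides) so that session IDs are harder to guess.

```java
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a thread-safe session table.
final class SessionTable {
    private final Map<String, List<String>> table = new HashMap<>();
    private final SecureRandom random = new SecureRandom();

    // Creates an empty session record and returns its new ID.
    String create() {
        String id = Long.toHexString(random.nextLong());
        synchronized (table) {
            table.put(id, new ArrayList<>());
        }
        return id;
    }

    // Adds a selection; returns false if the session ID is unknown.
    boolean addSelection(String id, String selection) {
        synchronized (table) {
            List<String> selections = table.get(id);
            if (selections == null) {
                return false;
            }
            selections.add(selection);
            return true;
        }
    }

    // Returns a copy of the selections, or null for an unknown ID.
    List<String> selections(String id) {
        synchronized (table) {
            List<String> selections = table.get(id);
            return selections == null ? null : new ArrayList<>(selections);
        }
    }
}
```

Returning a copy from selections() keeps callers from touching the shared list outside the lock.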
Server Restarts



Our simplified implementation will fail ungraciously if the
server is restarted while a browser is in the middle of a
session.
The session record disappears, while the session ID
may still be stored in the browser.
Unless session data is stored persistently there is no
completely satisfactory solution, but a servlet writer
should be aware of this possibility, and code defensively
(perhaps sending an explanatory message to the
browser).
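One way to code defensively is to check the session ID against the table before using the record, and to prepare an explanatory message when it is missing. The sketch below is plain Java. StaleSessions and check() are hypothetical names, and the wording of the message is only an example.

```java
import java.util.Map;

// Hypothetical sketch of a defensive session lookup.
final class StaleSessions {

    // Returns null when the request can proceed normally, or a message
    // for the browser when the session ID is no longer known (for
    // example because the server was restarted mid-session).
    static String check(Map<String, ?> sessionTable, String sessionID) {
        if (sessionID == null) {
            return null; // first request of a new session: nothing to check
        }
        synchronized (sessionTable) {
            if (sessionTable.containsKey(sessionID)) {
                return null;
            }
        }
        return "Your session is no longer available. Please start again.";
    }
}
```

A servlet could print the returned message and start a fresh session, rather than failing on a null session record.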
URL-Rewriting




URL-rewriting can be regarded as an optimization of the
hidden fields approach.
Assuming a form with a hidden field is submitted using
the GET method, what the server really sees is just a
request whose URI has been extended with an
encoding of the value in the hidden field.
In URL-rewriting we cut out the role of the browser
(encoding session data from hidden fields) and directly
extend the URL in the action attribute of the form with
an encoding of the session data.
As a byproduct, this also works for URLs in anchor
elements (simple hypertext links).

A Third Vending Machine
public void doGet(HttpServletRequest req,
                  HttpServletResponse resp) throws . . . {
    ...
    String sessionID ;
    String pathInfo = req.getPathInfo() ;
    if(pathInfo == null) {
        // First invocation in this session
        sessionID = "" + rand.nextInt() ;
        sessionTable.put(sessionID, new Vector()) ;
    } else {
        // Subsequent invocation
        sessionID = pathInfo.substring(1) ;   // Strip leading "/"
        Vector selections = (Vector) sessionTable.get(sessionID) ;
        String selection = req.getParameter("selection") ;
        if(selection != null) selections.addElement(selection) ;
    }
    ...
    out.println("<form action=" + selectURL + "/" + sessionID + ">") ;
    out.println("<input type=submit name=selection . . . >") ;
    out.println("</form>") ;
    ...
Remarks




The session ID information is appended to the servlet
URL in the action attribute of the forms (recall
selectURL is the URL of this servlet).
On any invocation, after the first in the session, this
information can be retrieved using getPathInfo().
The getPathInfo() method on HttpServletRequest
returns any text in the request URL following the servlet
name (up to and excluding the ? that delimits the query
string, if there is one).
We now have the option to replace the form that
connects to a selection-viewing servlet with a simple
anchor element.
– The URL in the anchor element is extended with the session ID
information, just like the action attribute in a form.
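The round trip of a session ID through a rewritten URL can be traced in plain Java. In the sketch below, UrlRewriting and its methods are hypothetical names, and pathInfoOf() is only a simplified stand-in for what getPathInfo() reports (the real method is implemented by the container and also URL-decodes the path). The servlet name VendingMachine3 in the usage note is likewise an assumption.

```java
// Hypothetical sketch; pathInfoOf() only approximates getPathInfo().
final class UrlRewriting {

    // Extends the servlet URL with the session ID, as done for the
    // action attribute of the forms.
    static String rewrite(String servletUrl, String sessionID) {
        return servletUrl + "/" + sessionID;
    }

    // The text after the servlet path, up to and excluding any "?",
    // or null if nothing follows the servlet path.
    static String pathInfoOf(String requestUri, String servletPath) {
        int q = requestUri.indexOf('?');
        String path = q < 0 ? requestUri : requestUri.substring(0, q);
        if (!path.startsWith(servletPath)) {
            return null;
        }
        String rest = path.substring(servletPath.length());
        return rest.isEmpty() ? null : rest;
    }

    // Strips the leading "/" from the path info to recover the ID.
    static String sessionIdFrom(String pathInfo) {
        if (pathInfo == null || pathInfo.length() <= 1) {
            return null;
        }
        return pathInfo.substring(1);
    }
}
```

For a request URI such as /dbc/servlet/VendingMachine3/12345?selection=Chips, pathInfoOf() yields "/12345" and sessionIdFrom() recovers "12345", while the selection itself still arrives as an ordinary query parameter.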
Cookies
Cookies





A cookie is a small piece of contextual information
embedded in an HTTP response from a Web server.
If a browser receives an HTTP response including a
Set-Cookie header (and it is willing to accept cookies) it
stores this information.
The information can either be stored in the memory of
the running browser program (“session cookies”) or
saved to disk (“persistent cookies”).
Subsequently, whenever the browser constructs an
HTTP request for a server, it checks if it is storing any
cookies for the server involved.
If so, it returns the cookie information to the server, in a
Cookie header in the new request.
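The Cookie request header is a list of name=value pairs separated by semicolons. A minimal plain-Java sketch of pulling it apart follows. CookieHeader is a hypothetical class, and the sketch ignores quoting and cookie attributes. In a servlet, getCookies() does this work for you.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; in a servlet, use request.getCookies() instead.
final class CookieHeader {

    // Parses "a=1; b=2" into {a=1, b=2}. Pairs without a name or
    // without "=" are skipped.
    static Map<String, String> parse(String header) {
        Map<String, String> cookies = new LinkedHashMap<>();
        if (header == null) {
            return cookies;
        }
        for (String pair : header.split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue;
            }
            String name = pair.substring(0, eq).trim();
            String value = pair.substring(eq + 1).trim();
            if (!name.isEmpty()) {
                cookies.put(name, value);
            }
        }
        return cookies;
    }
}
```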
Uses of Cookies

Recognizing a regular customer
– A persistent cookie can save some identification information for
the particular customer. The stored information may be actual
name and details, or (preferably) some key into a database on
the server.
– When the customer returns to the site, associated information
(mailing address, etc) is already known; it doesn’t have to be
entered anew by the customer.
– There are many variations on this theme, e.g. it allows portal
sites to do focussed advertising.

Session Tracking
– Within the context of a single “visit” to a site, cookies can be
used as an alternative to hidden fields or URL-rewriting, as the
underlying mechanism for session tracking.
Abuses of Cookies


A poorly constructed commercial site might use cookies
to store sensitive information (e.g. credit card numbers)
on the hard disk of your PC. This might be a privacy
problem if the PC is shared by several users.
A Web site can persuade a browser to send a cookie to
a third party site, by embedding an image that comes
from the Web server of the third party.
– The third party site might offer the original site collated
information on its visitors.
– It may be a particular nuisance if the third party has previously
harvested the email address of the user, e.g. by sending them
an HTML email containing a cookie-setting icon.
– Moral: configure your browser to only send cookies to the actual
page you are visiting?
Limits to Cookies

Typically a browser will restrict the number and size of
cookies it will accept, e.g.:
– Maximum of 20 cookies per site,
– Maximum of 300 cookies total (from all sites),
– Maximum size of individual cookie is 4 kilobytes.


Users may of course configure their browsers to refuse
all cookies, or only accept selected cookies.
Hence a Web application should not rely on cookies for
basic functionality—only for “added value”.
The Servlet Cookie API


The servlet creates a cookie by using a constructor for
the class Cookie.
Various attributes can be set for the cookie before
sending it to the client. They include:
– The name and value of the cookie.
These are usually set in the Cookie constructor.
– The domain to which the cookie should be returned.
By default the cookie will only be returned to the server that sent
it, but this default can be overridden.
– The URI path to which the cookie should be returned.
By default, the cookie is only returned to pages in the same
directory as the page that sent the cookie.
– The time when a persistent cookie expires
e.g., the cookie should be deleted by the browser after one hour,
after one year, etc.
A Servlet that Sets Two Cookies
import java.io.IOException ;
import java.io.PrintWriter ;
import java.util.Random ;
import javax.servlet.ServletException ;
import javax.servlet.http.* ;

public class SetCookies extends HttpServlet {
    Random rand = new Random() ;   // Seeded by current date/time

    public void doGet(HttpServletRequest req,
                      HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("text/html") ;

        // Cookies travel in response headers, so add them before writing the body.
        Cookie session = new Cookie("mySessionCookie", "" + rand.nextInt()) ;
        resp.addCookie(session) ;

        Cookie persistent = new Cookie("myPersistentCookie", "" + rand.nextInt()) ;
        persistent.setMaxAge(3600) ;   // One hour
        resp.addCookie(persistent) ;

        PrintWriter out = resp.getWriter() ;
        out.println("<html><head></head><body>") ;
        out.println("<h1>Enjoy the cookies!</h1>") ;
        out.println("</body></html>") ;
    }
}
Remarks





The arguments of the Cookie constructor are the cookie
name and value.
Here the value is a random number.
Cookie names or values should not include white space
or any of the characters: [ ] ( ) = , " / ? @ : ;
To make a cookie persistent, set the expiration time in
seconds using setMaxAge().
By default the expiration time is negative, indicating a
session cookie.
Viewing the Set-Cookie Headers

After deploying this servlet, we can view the headers it
returns by modifying the TrivialBrowser class
(introduced in the network programming lecture) to take
host, port, and path arguments. . .
HTTP Response Including Set-Cookie
java TrivialBrowser sirah.csit.fsu.edu 8081 /dbc/servlet/SetCookies

HTTP/1.0 200 OK
Date: Mon, 13 Nov 2000 15:49:25 GMT
Servlet-Engine: Tomcat Web Server/3.1 (JSP 1.1; Servlet 2.2; Java 1.2.2; Linux 2.2.14-5.0 i386; java.vendor=Sun Microsystems Inc.)
Set-Cookie: mySessionCookie=1367792973
Set-Cookie: myPersistentCookie=1264283064;Expires=Mon, 13-Nov-2000 16:49:25 GMT
Content-Language: en
Content-Type: text/html
Status: 200

<html><head></head><body>
<h1>Enjoy the cookies!</h1>
</body></html>
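
The Expires attribute in this response is simply the Date header plus the 3600 seconds passed to setMaxAge(). A minimal sketch of that arithmetic in plain Java is given here. The class name CookieExpiry and the date pattern are assumptions chosen to match the header shown; this is not the servlet engine's actual code.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class CookieExpiry {
    // Old-style cookie date layout, as seen in the Expires attribute of the response.
    private static final DateTimeFormatter COOKIE_DATE =
        DateTimeFormatter.ofPattern("EEE, dd-MMM-yyyy HH:mm:ss 'GMT'", Locale.ENGLISH);

    /** Returns the Expires value for a cookie issued at httpDate with the given max age. */
    static String expires(String httpDate, int maxAgeSeconds) {
        // The Date header uses the RFC 1123 layout, which java.time can parse directly.
        ZonedDateTime issued =
            ZonedDateTime.parse(httpDate, DateTimeFormatter.RFC_1123_DATE_TIME);
        return COOKIE_DATE.format(issued.plusSeconds(maxAgeSeconds));
    }

    public static void main(String[] args) {
        // Date header from the response, max age from the SetCookies servlet.
        System.out.println(expires("Mon, 13 Nov 2000 15:49:25 GMT", 3600));
    }
}
```

A session cookie (no call to setMaxAge()) carries no Expires attribute at all, which is why the first Set-Cookie line has only a name and a value.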
Browser Behavior

If we visit the SetCookies servlet with a real browser, we just
see a message:
Enjoy the cookies!

Now if we point the browser at our earlier Headers servlet
(extended to accept GET requests), we may see something
like:
User-Agent:
Mozilla/4.51 [en] (X11; I; SunOS 5.7 sun4u)
...
Cookie:
mySessionCookie=1367792973; myPersistentCookie=1264283064
...

The browser is returning the cookies in a Cookie header.
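
On the server side, a Cookie header like this one has to be split back into name/value pairs before it is of much use. The following is a simplified sketch of that step in plain Java, with no servlet classes involved. CookieHeader and parse() are hypothetical names, and quoted cookie values are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CookieHeader {
    /** Splits a Cookie header value into name/value pairs, preserving their order. */
    static Map<String, String> parse(String headerValue) {
        Map<String, String> cookies = new LinkedHashMap<>();
        if (headerValue == null) {
            return cookies;                    // no Cookie header in the request
        }
        for (String pair : headerValue.split(";")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue;                      // skip malformed entries
            }
            String name = pair.substring(0, eq).trim();
            String value = pair.substring(eq + 1).trim();
            if (!name.isEmpty()) {
                cookies.put(name, value);
            }
        }
        return cookies;
    }

    public static void main(String[] args) {
        Map<String, String> cookies = parse(
            "mySessionCookie=1367792973; myPersistentCookie=1264283064");
        System.out.println(cookies.get("mySessionCookie"));
        System.out.println(cookies.get("myPersistentCookie"));
    }
}
```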
Retrieving Cookies with the Cookie API


The previous example just used the generic
getHeader() method to view the HTTP Cookie header
returned by the browser.
Of course the cookie API provides higher level methods
to do this.
Displaying Cookies
public class ShowCookies extends HttpServlet {
    public void doGet(HttpServletRequest req,
                      HttpServletResponse resp)
            throws ServletException, IOException {
        resp.setContentType("text/html") ;
        Cookie [] cookies = req.getCookies() ;   // null if the request carried no cookies
        PrintWriter out = resp.getWriter() ;
        out.println("<html><head></head><body>") ;
        out.println("You returned cookies:<p>") ;
        out.println("<table border cellspacing=0 cellpadding=5>") ;
        . . . write a header row . . .
        for (int i = 0 ; cookies != null && i < cookies.length ; i++) {
            Cookie cookie = cookies [i] ;
            out.println("<tr><td>" + cookie.getName() + "</td>") ;
            out.println("<td>" + cookie.getValue() + "</td></tr>") ;
        }
        out.println("</table>") ;
        out.println("</body></html>") ;
    }
}
Remarks



The method getCookies() returns an array of Cookie
objects.
The methods getName() and getValue() on a Cookie
object naturally return name and value of the cookie.
The API has no method for extracting just a cookie with
a specified name from the request (i.e. nothing directly
analogous to getParameter()).
Visiting ShowCookies

[Screenshots: the ShowCookies page initially, after visiting SetCookies, and after restarting the browser.]
Session Tracking Using Cookies


Our penultimate version of the vending machine servlet
uses cookies instead of URL-rewriting.
We need to define a method to retrieve a cookie with a
given name, e.g.
String getCookieValue(HttpServletRequest req, String name) {
    Cookie [] cookies = req.getCookies() ;
    if(cookies == null) return null ;   // request carried no cookies
    for(int i = 0 ; i < cookies.length ; i++) {
        Cookie cookie = cookies [i] ;
        if(cookie.getName().equals(name))
            return cookie.getValue() ;
    }
    return null ;
}
A Fourth Vending Machine
public void doGet(HttpServletRequest req,
                  HttpServletResponse resp)
        throws ServletException, IOException {
    ...
    String sessionID = getCookieValue(req, "vending_session") ;
    if(sessionID == null) {
        // First invocation in this session
        sessionID = "" + rand.nextInt() ;
        resp.addCookie(new Cookie("vending_session", sessionID)) ;
        sessionTable.put(sessionID, new Vector()) ;
    } else {
        // Subsequent invocation
        Vector selections = (Vector) sessionTable.get(sessionID) ;
        String selection = req.getParameter("selection") ;
        if(selection != null) selections.addElement(selection) ;
    }
    ...
    out.println("<form action=" + selectURL + ">") ;
    out.println("<input type=submit name=selection . . . >") ;
    out.println("</form>") ;
    ...
}
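
The table-driven logic of this servlet can be tried out without a servlet engine. The sketch below models only the session table: cookieValue stands for the result of getCookieValue(), and the class SessionTable and its methods are hypothetical. Unlike the slide code, it also starts a fresh session when the cookie carries an ID the table does not know (for example after a server restart), and it uses ArrayList rather than Vector.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class SessionTable {
    private final Map<String, List<String>> table = new HashMap<>();
    private final Random rand = new Random();

    /**
     * Handles one request. cookieValue is the session ID taken from the cookie
     * (null if the browser sent none); selection is the form parameter (may be null).
     * Returns the session ID that the response should carry.
     */
    synchronized String handle(String cookieValue, String selection) {
        if (cookieValue == null || !table.containsKey(cookieValue)) {
            // First invocation in this session: issue a new ID with an empty list.
            String id = "" + rand.nextInt();
            table.put(id, new ArrayList<>());
            return id;
        }
        // Subsequent invocation: record the selection, if any.
        if (selection != null) {
            table.get(cookieValue).add(selection);
        }
        return cookieValue;
    }

    /** Returns a copy of the selections recorded for the given session ID. */
    synchronized List<String> selections(String id) {
        List<String> stored = table.get(id);
        return stored == null ? new ArrayList<>() : new ArrayList<>(stored);
    }
}
```

As in the servlet, a selection submitted on the very first request is not recorded, because at that point no session exists yet. A random int is also a weak session ID: it is guessable and may collide, so this is an illustration of the mechanism only.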
Remarks



We no longer have to worry about rewriting forms and
anchor elements.
A “session” can now be tracked across links through
intervening static HTML pages.
However:
– This version provides no way to terminate a session, short of
restarting the browser.
– There are subtle questions about how the “scope” of a session
is delimited.
– A functional programmer might argue that we lost “referential
transparency”??
The Servlet Session Tracking API
The Session Tracking API

We have already illustrated several underlying
approaches to session tracking:
– Hidden fields, URL-rewriting, cookies.



In general an application has to make a choice between
these mechanisms, taking into account support in server
and browser.
Session cookies may be favored on grounds of
generality and flexibility, but not all clients will accept
them.
In practice the servlet programmer does not have to
worry too much about these issues. A high-level API is
provided that will transparently choose and deploy a
suitable low-level tracking mechanism.
The HttpSession class




A particular session is represented by an object from the
HttpSession class.
A session is defined as an association, lasting for some
period, between a particular browser and a particular
group of servlets on a server.
The current session is obtained by applying the method
getSession() to the HttpServletRequest.
If no session object currently exists for this
browser/servlet association, one will be created on the
first call to getSession().
Simple Example
public class GetSession extends HttpServlet {
    static final String myURL = . . . URL of this servlet . . . ;

    public void doGet(HttpServletRequest req,
                      HttpServletResponse resp)
            throws ServletException, IOException {
        HttpSession session = req.getSession(true) ;
        resp.setContentType("text/html") ;
        PrintWriter out = resp.getWriter() ;
        out.println("<html><head></head><body>") ;
        out.println("<a href=" + resp.encodeURL(myURL) + ">" +
                    "View servlet again</a>") ;
        out.println("</body></html>") ;
    }
}
Remarks

The true argument of getSession() means that a new
session object will be created if one does not already
exist.
– For robustness you should probably always use this argument.
(The documentation says that the form of getSession() without
an argument is equivalent. Experience suggests maybe not?)




This servlet simply outputs a link back to itself (it doesn’t
explicitly use the session object).
One important thing to note is the call to encodeURL().
This method should be applied to any URLs in the
generated page that refer back to the same servlet
context.
This supports URL-rewriting (if this is the session-tracking
strategy adopted for the session).
Viewing the Generated Page

Pointing the browser at this page, we see a page
containing a link:
View servlet again

If we view the HTML source of this generated page, we
may see something like:
<html><head></head><body>
<a href=http://sirah.csit.fsu.edu:8081/dbc/servlet/
GetSession;jsessionid=To1019mC0 . . . 365At
>
View servlet again </a>
</body></html>

The URL in this first generated page has been rewritten
to include an attribute jsessionid. The associated value
is a long, random-looking string.
Checking the Cookies




Before doing anything else, we visit the ShowCookies
servlet.
We may see something like:
In the first HTTP response after the session is created,
the servlet both rewrites URLs, and sends a cookie to
the browser.
The same session ID appears in both places.
Revisiting the Servlet

Going back to the GetSession servlet, if I follow the link
to view the servlet again, I see a page that looks the
same.

But if I view the HTML source again I may see:
<html><head></head><body>
<a href=http://sirah.csit.fsu.edu:8081/dbc/servlet/GetSession>
View servlet again </a>
</body></html>

This time the URL is not rewritten.
Selecting a Session Tracking
Mechanism




As noted, in the first response after a session is created,
the servlet both sends a cookie and rewrites URLs.
If the browser returns the session ID cookie in a
subsequent request, URL-rewriting can be disabled.
If the browser is not returning cookies, URL-rewriting will
continue.
All this happens “behind the scenes”: the servlet
programmer may not even be aware of the mechanism.
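
The selection rule just described can be summarized in a few lines. This is only an illustration of the observable behavior, not the servlet engine's implementation; UrlRewriter, encodeUrl() and the session ID abc123 are made up for the example.

```java
class UrlRewriter {
    /**
     * Returns url unchanged if the browser returned the session cookie.
     * Otherwise inserts ";jsessionid=<id>" after the path, before any query string.
     */
    static String encodeUrl(String url, String sessionId, boolean cookieReturned) {
        if (cookieReturned || sessionId == null) {
            return url;                        // the cookie carries the session ID
        }
        int q = url.indexOf('?');
        String path = (q < 0) ? url : url.substring(0, q);
        String query = (q < 0) ? "" : url.substring(q);
        return path + ";jsessionid=" + sessionId + query;
    }

    public static void main(String[] args) {
        String url = "http://sirah.csit.fsu.edu:8081/dbc/servlet/GetSession";
        // First response in a session: the browser has not returned a cookie yet.
        System.out.println(encodeUrl(url, "abc123", false));
        // Later response: the cookie came back, so the URL is left alone.
        System.out.println(encodeUrl(url, "abc123", true));
    }
}
```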
Binding Information to a Session




Of course this is not particularly useful unless we have a
way to associate application information with the
session.
In previous examples we used a HashMap, keyed by
session ID, to store session data.
We may assume that analogous mechanisms are used
behind the scenes in the session-tracking API, but the
session ID is not usually directly accessed by the
programmer.
Instead, the application programmer just sees the
HttpSession object. Methods are available to directly
“cache” information in this object.
– The session object itself behaves like a simple collection class.
Some Methods on HttpSession
public void setAttribute(String name, Object value)
Add a reference to the object value to the session object, keyed
by the string name.
public void removeAttribute(String name)
Remove the value associated with the key name from the
session.
public Object getAttribute(String name)
Extract the value associated with the key name.

Note the value object may implement
HttpSessionBindingListener, in which case it will be
notified when it is added to or removed from a session.
Session Attributes vs. Instance
Variables




In well-written Java programs, local variables are
normally declared inside methods to hold values that
are computed and used by only a single method
invocation.
Typically, instance variables are used to hold values that
need to be shared across multiple invocations.
In servlet programming—where several sessions may
be concurrently operating on the single servlet
instance—this role for instance variables is naturally
taken over by attributes of the session object.
Think hard before declaring an instance variable in a
servlet. In many cases you should probably be using a
session attribute instead.
A Final (?) Vending Machine





The first operation in the doGet() method is to retrieve
or create a session object using getSession().
We then attempt to extract a Vector object called
selections from the session.
If we fail, we can assume this is the first transaction in
this session. A new Vector object is created, and added
to the new session.
Form parameters are added to the Vector as usual.
Whenever URLs referring back to this servlet context
appear in the generated HTML, they are passed through
encodeURL().
A Fifth Vending Machine
public void doGet(HttpServletRequest req,
                  HttpServletResponse resp)
        throws ServletException, IOException {
    HttpSession session = req.getSession(true) ;
    Vector selections = (Vector) session.getAttribute("selections") ;
    if(selections == null) {
        // First invocation in this session
        selections = new Vector() ;
        session.setAttribute("selections", selections) ;
    }
    String selection = req.getParameter("selection") ;
    if(selection != null)
        selections.addElement(selection) ;
    ...
    out.println("<form action=" + resp.encodeURL(selectURL) + ">") ;
    out.println("<input type=submit name=selection . . . >") ;
    out.println("</form>") ;
    ...
}
Remarks

It is still recommended to use synchronized blocks to
ensure thread safety. You can use the session object for
synchronization, e.g.:
synchronized (session) {
    Vector selections = (Vector) session.getAttribute("selections") ;
    if(selections == null) {
        selections = new Vector() ;
        session.setAttribute("selections", selections) ;
    }
}


As usual, the vending machine servlet will lead to a
selection-viewing servlet when the user follows a suitable
link.
These two servlets automatically share the same session
object, and thus session information, because they are in
the same servlet context.
The Scope of a Session


A servlet context is a group of servlets (and possibly
other Web entities), collected together in some directory.
Under Tomcat, servlet contexts are defined in the
server.xml file.
– In the examples so far, the servlet context was /dbc.



Several servlets may be involved in the same session,
hence share the same HttpSession object.
This sharing is automatic if the servlets are in the same
context, and are interacting with the same browser.
Servlets from different contexts in the same server, or
interacting with different browsers, always have distinct
HttpSession objects.
Life-Time of a Session


In general a session expires after some interval.
The method:
public void setMaxInactiveInterval(int seconds)

on HttpSession can be used to request that the
session be invalidated if there has been no
transaction during a period of the specified length.
The method:
public void invalidate()
on HttpSession can be used to immediately invalidate a
session.
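
The two expiry rules can be modelled as follows. This is a sketch of the semantics described above, assuming the timeout is measured from the most recent request. SessionClock and touch() are hypothetical names, and the times are passed in explicitly so that the behavior is easy to check.

```java
class SessionClock {
    private int maxInactiveSeconds;
    private long lastAccessedMillis;
    private boolean invalidated = false;

    SessionClock(int maxInactiveSeconds, long nowMillis) {
        this.maxInactiveSeconds = maxInactiveSeconds;
        this.lastAccessedMillis = nowMillis;
    }

    /** Mirrors setMaxInactiveInterval(): the allowed gap between requests, in seconds. */
    void setMaxInactiveInterval(int seconds) {
        this.maxInactiveSeconds = seconds;
    }

    /** Records that a request belonging to this session arrived at nowMillis. */
    void touch(long nowMillis) {
        this.lastAccessedMillis = nowMillis;
    }

    /** Mirrors invalidate(): the session ends immediately. */
    void invalidate() {
        this.invalidated = true;
    }

    /** True if the session has neither been invalidated nor been idle for too long. */
    boolean isValid(long nowMillis) {
        if (invalidated) {
            return false;
        }
        return nowMillis - lastAccessedMillis <= maxInactiveSeconds * 1000L;
    }

    public static void main(String[] args) {
        SessionClock session = new SessionClock(60, 0L);   // 60-second timeout
        System.out.println(session.isValid(30_000L));      // idle for 30 s
        System.out.println(session.isValid(61_000L));      // idle for 61 s
    }
}
```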