Cutting Edge Research in Engineering of Web Applications
Part 1
What is a Web Application?

Jeff Offutt
Professor of Software Engineering
George Mason University
http://www.cs.gmu.edu/~offutt/
offutt@gmu.edu

Outline
A. Who am I?
B. Who are you?
Part 1 (13:00-15:00)
1. Web Apps Overview
2. How the Interweb Works
3. Web Software (Servlets)
Part 2 (19:00-21:00)
4. Control Flow & State Handling is Different
5. State Handling in JSP
Part 3 (Friday 13:00-15:00)
6. Web Software Security
7. Modeling Web Apps
8. Testing Web Apps
9. Engineering Process

July 2013 © J Offutt

Who Am I?
• Professor of Software Engineering
  – > 150 refereed publications, H-index = 51
  – Editor-in-Chief: Journal of Software Testing, Verif., and Reliability
  – Co-Founder: IEEE Intl Conf. on Software Testing
  – Author: Introduction to Software Testing
  – Several teaching awards at Mason
• George Mason University
  – Suburban Washington, DC
  – “Most diverse” campus in the USA
  – 34,000 students
• MS Software Engineering
  – Established 1987
  – 60 to 80 graduates per year
  – 24 different graduate courses

Who Are You?
• Where are you from?
• What was your undergraduate major?
• What do you think of software engineering?
Motivation – Overview
• Modern web applications are:
  – Distributed (world-wide)
  – Heterogeneous (hardware and software)
  – Highly user interactive
  – Built on new technologies
• The software is:
  – Very loosely coupled
  – Written in multiple languages
  – Often generated dynamically
Diverse: In terms of software, communication, and people

Motivation – Overview (2)
• Web application software has to be better than most shrink-wrap or contract software
• The combination of higher quality requirements and unique technologies makes for a very interesting situation
(Academics think “interesting” means fun, managers think “interesting” is scary …)
This talk discusses why and in what ways web software must be better

Software Deployment
• Bundled: On your computer when you buy it
• Shrink-wrapped: Bought at a store on a CD
  – Downloaded from company’s website or OSS site
• Contract: Single customer
• Embedded: Installed on an electronic device
• Web application: On the web through a URL
  – Component-based
  – Concurrent / distributed
  – One copy on the server
  – Can be updated at any time (fast update cycle)
  – User interactive

Important Quality Attributes for Traditional Software
1. Efficiency of process (time-to-market)
2. Efficiency of execution (performance)
50. Reliability
51. Safety
52. Maintainability
53. Security

Important Quality Attributes for Web Software
1. Reliability
2. Usability
3. Security
4. Availability
5. Scalability
6. Maintainability
7. Performance & Time to Market
Customers have little “site loyalty” and will switch quickly, thus time to market is much less important than in other application areas (but still important!)
Based on an informal survey of around a dozen software development managers, 2000

Common N-Tier Architecture
[Figure: a Client (browser, JavaScript) connects over the network to a Web Server, which connects through middleware to an Application Server, which connects through middleware to a DB Server. Technology labels in the figure: HTML, CGI, JSP, etc., Java]
Client-server … 3-tier … N-tier …

Problems Can Occur Anywhere
• 1995: Web sites were 100% interface
• 1998: Web sites were about 90% interface
• 2001: Web sites are less than 50% interface
• 2005: Web applications about 25% interface
• 2013: Web application development dominates the software industry
There is a huge shortage of knowledgeable, skilled web programmers and software engineers

Summary: Concerns of Software
Traditional
1. Efficiency of process (time to market)
2. Efficiency of execution
50. Reliability
51. Safety
52. Maintainability
53. Security
Web Software
1. Reliability
2. Usability
3. Scalability
4. Security
5. Availability
6. Maintainability
7. Performance & Time to Market
Hypertext Transfer Protocol (HTTP)
• HTTP is based on the request-response communication model:
  – Client sends a request
  – Server sends a response
• HTTP is a stateless protocol:
  – The protocol does not require the server to remember anything about the client between requests
• The HTTP/1.1 specification (RFC 2616):
  – ftp://ftp.rfc-editor.org/in-notes/rfc2616.txt

HTTP
• Normally implemented over a TCP connection
  – 80 is the standard port number for HTTP
• Typical browser-server interaction:
  – User enters Web address in browser
  – Browser uses DNS to locate IP address
  – Browser opens TCP connection to server
  – Browser sends HTTP request over connection
  – Server sends HTTP response to browser over connection
  – Browser displays body of response in the client area of the browser window

HTTP Request
Clients send requests to servers to ask for a resource (usually a file or to run a program)
Example:
GET http://cs.gmu.edu/~offutt HTTP/1.1

HTTP Response
Servers send responses to clients with the result of the request (an error code, a file, or the output of a program)
Example:
HTTP/1.1 200 OK
First digit is the class of the status code:
  – 1 = Informational
  – 2 = Success
  – 3 = Redirection (with alternate URL)
  – 4 = Client Error
  – 5 = Server Error

Client Caching
[Figure: browser (client), cache, and web server]
1. Browser sends an HTTP request for an image
2. Web server sends an HTTP response containing the image
3. Browser stores the image in its cache
I need that image again…
  – The slow way: another HTTP request for the image, and another HTTP response containing the image
  – The fast way: get the image from the cache
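The status-code classes listed on the HTTP Response slide can be sketched in a few lines of plain Java. This is only an illustration: `StatusLineSketch` and `statusClass` are hypothetical names invented here, not part of any HTTP or servlet library, and the parsing assumes a well-formed status line such as `HTTP/1.1 200 OK`.

```java
// Hypothetical helper for illustration; not part of any HTTP or servlet library.
class StatusLineSketch {

    // Maps the first digit of the status code to the class names from the slide.
    static String statusClass(String statusLine) {
        // Assumed shape of a status line: version, space, 3-digit code, space, reason
        String[] parts = statusLine.trim().split(" ", 3);
        if (parts.length < 2 || parts[1].length() != 3) {
            throw new IllegalArgumentException("Not a status line: " + statusLine);
        }
        switch (parts[1].charAt(0)) {
            case '1': return "Informational";
            case '2': return "Success";
            case '3': return "Redirection";
            case '4': return "Client Error";
            case '5': return "Server Error";
            default:
                throw new IllegalArgumentException("Unknown status class: " + parts[1]);
        }
    }

    public static void main(String[] args) {
        System.out.println(statusClass("HTTP/1.1 200 OK"));
        System.out.println(statusClass("HTTP/1.1 404 Not Found"));
    }
}
```

Only the first digit is inspected, which mirrors how the slide groups the codes; a real client would also look at the full code and the response headers.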
Server Side Processing
[Figure: on the client, the UI is implemented in a browser, which sends data in an HTTP request; on the server, the web server, container engine, and program components handle it; HTML comes back in the HTTP response]

Execution Overview
[Figure: numbered path of a request through the server]
1. Incoming request on port 8080
2. HTTP request reaches the web server
3. Request / response objects are passed to the container engine
4. Container engine creates a thread / calls a method on the program component
5. Program component returns
6. Modified response objects are passed back
7. HTTP response
8. Response goes back to the requestor

Web Container Engine
[Figure: one container engine hosting Web App 1 and Web App 2; each web app contains components (C1, C2) with instances labeled a, b, c, d, and shared memory regions]

Compiled Modules
• Compiled modules are executable program components that the server uses
• Common compiled module application plug-ins:
  – Microsoft’s ASP.NET
  – J2EE Java servlets
• Compiled modules are efficient and very effective
• They allow programmers to clearly separate the front-end from the back-end
  – Aids design
  – Complicates implementation

Scripted Pages
• Scripted pages look like HTML pages that happen to process business logic
• Execution is on the server, not on the client (unlike JavaScript)
• They have HTML with program statements that get and process data
• JSPs are compiled and run as servlets: very clean and efficient

Scripted Pages (2)
• Common scripted pages:
  – Adobe’s ColdFusion
  – Microsoft’s Active Server Pages (ASP)
  – Java Server Pages (JSP)
• Scripted pages are generally easy to develop and deploy
• They mix logic with HTML, so can be difficult to read and maintain
• Not as effective for heavy-duty engineering

Summary: Web Programming
• The major difference is deployment
  – Software is deployed across the Web using HTTP
  – Other deployment methods include bundling, shrink-wrapping, embedding, and contracting
• New software technologies
• New conceptual language constructs for programming
  – Integration
  – Data management
  – Control connections
These differences affect every aspect of how to engineer high quality software

What are Servlets?
• Servlets are small Java classes that
  – Process an HTTP request
  – Return an HTTP response
• Servlet container or engine
  – Connects to network
  – Catches requests
  – Produces responses
  – Creates object instances of servlet classes
  – Hands requests to the appropriate object
• Programmers use an API to write servlet classes

Servlets vs. Java Applications
• Servlets do not have a main()
  – The main() is in the server
  – Entry point to servlet is via call to a method ( doGet() or doPost() )
• Servlet interaction with end user is indirect via request / response object APIs
  – Actual HTTP request / response processing is handled by the server
• Servlet output is usually HTML

Servlet Container (or Engine)
• Servlet container is a plug-in for handling Java servlets
• A servlet container has five jobs:
1. Creates servlet instance
2. Calls init()
3. Calls service() whenever a request is made
  – service() calls a method written by a programmer to handle the request
  – doGet() to handle GET requests, doPost() to handle POST requests
  – More on this later …
4. Calls destroy() before deleting the servlet object
5. Destroys instance

Servlet Container (2)
When a request comes to a servlet, the servlet container does one of two things:
1. If there is an active object for the servlet, the container creates a Java thread to handle the request
2. If there is no active object for the servlet, the container instantiates a new object of that class, then creates a Java thread on the object to handle the request

Servlet Container (3)
• A servlet instance runs until the container decides to destroy it
• When it gets destroyed is not specified by the servlet rules
• Most servlet containers destroy the object N minutes after the last request
  – N is usually 15 or 30, and can be set by the system administrator
• Container can also be configured to never destroy a servlet object

Servlet Container (4)
• What if the same servlet gets multiple requests?
• More than one thread for the same servlet may be running at the same time, using the same memory
[Figure: Client 1 and Client 2 send requests to the server container, which runs servlet thread 1 and servlet thread 2 over one shared memory space]
Risky …

Servlet Object Thread Lifecycle
[Figure: state diagram. States: Does not exist, Instantiated, Initialized and/or ready for requests, Service, Unavailable, Destroyed. Transition labels: instantiation based on a request or at container startup; initialization; initialization failed; release reference; back to service if temporarily unavailable (optional); temporary or permanent failure; HTTP requests from clients; end of service thread; timeout or a container shutdown]

Simple Servlet Example
    import javax.servlet.*;
    import javax.servlet.http.*;
    import java.io.*;

    // Responds to every GET request with a small, fixed HTML page
    public class hello extends HttpServlet
    {
       // Called by the container (through service()) for each GET request
       public void doGet (HttpServletRequest req, HttpServletResponse res)
              throws ServletException, IOException
       {
          // Tell the browser that the body is HTML encoded as UTF-8
          res.setContentType ("text/html; charset=\"UTF-8\"");
          PrintWriter out = res.getWriter ();
          out.println ("<HTML>");
          out.println ("<HEAD>");
          out.println ("<TITLE>Servlet example</TITLE>");
          out.println ("</HEAD>");
          out.println ("<BODY>");
          out.println ("<P>My first servlet.</P>");
          out.println ("</BODY>");
          out.println ("</HTML>");
          out.close ();
       } // end doGet()
    } // end hello

Servlet Parameters – requests
Parameters are conveniently stored in objects
• String req.getParameter (String KEY)
  – Returns value of field with the name = KEY
  – Names are defined in HTML, and values supplied by the users
• String[ ] req.getParameterValues (String KEY)
  – Returns all values of KEY
  – For example checkboxes
• Enumeration req.getParameterNames ()
  – Returns an Enumeration object with a list of all parameter names
• String req.getQueryString ()
  – Returns the entire query string

Transmitting Servlet Parameters
• Parameter data is the Web analog of arguments in a method call:
  – System.out.println ("aString");
  – http://www.example.com/servlet/PrintThis?arg=aString
• Query string syntax and semantics
  – Multiple parameters are separated by ‘&’
    http://www.example.com/servlet/PrintThis?color=red&arg=aString
  – Order of parameters does not matter
    http://www.example.com/servlet/PrintThis?arg=aString&color=red
  – All parameter values are strings
    http://www.example.com/servlet/PrintThis?arg=&age=39
    (the value of arg is the empty string)

Summary: Examples
http://www.cs.gmu.edu/~offutt/classes/642/examples/servlets/
1. hello : Prints lots of hellos
2. name : Accepts and prints a name from a form
3. goldGetPost : Differences between GET and POST
4. formHandler : Displays arbitrary data from a form
5. twoButtons : Processing two submit buttons
6. abstracts : Processes form data and sends through email
7. loan : Compute time to pay off a loan
8. convert : Convert values
9. convert2 : Better value conversion
10. fileLoad : Uploads a file to a server
11. studInfo : Our student info system, a small web app
12. showRequestHeaders : Shows information about the requests
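The query string rules from the Transmitting Servlet Parameters slide (parameters separated by '&', order does not matter, every value is a string and may be empty) can be tried out without a servlet container. The sketch below is an assumption-laden illustration: `QueryStringSketch` and `parse` are hypothetical names, and a real container does this work for you behind `req.getParameter()` and `req.getParameterValues()`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the parsing a servlet container performs.
class QueryStringSketch {

    // Splits "a=1&b=2" into a map from name to all values given for that name.
    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;                       // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);  // "arg=" gives ""
            params.computeIfAbsent(decode(key), k -> new ArrayList<>())
                  .add(decode(value));
        }
        return params;
    }

    // Undoes URL encoding such as %20 and '+'.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("color=red&arg=aString"));
        System.out.println(parse("arg=&age=39"));
    }
}
```

A list per name is used because one name can carry several values (the checkbox case that `getParameterValues()` covers); note that `age=39` arrives as the string "39", so any numeric conversion is the servlet's job.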
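The "Risky …" warning on the Servlet Container (4) slide, where several request threads run on one servlet object and share its memory, can be simulated in plain Java. Everything named here is hypothetical: `SharedCounterSketch` stands in for a single servlet instance, `handleRequest()` plays the role of doGet(), and a thread pool plays the role of the container. The sketch uses `AtomicInteger` as one possible way to make the shared field safe; a plain `int` field incremented with `++` could lose updates under the same load.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for one servlet object shared by many request threads.
class SharedCounterSketch {

    // Instance state is shared by every thread the "container" starts.
    private final AtomicInteger hits = new AtomicInteger();

    // Plays the role of doGet(): each request bumps the shared counter atomically.
    void handleRequest() {
        hits.incrementAndGet();
    }

    int hits() {
        return hits.get();
    }

    // Runs `threads` workers against ONE instance and returns the final count.
    static int simulate(int threads, int requestsPerThread) {
        SharedCounterSketch servlet = new SharedCounterSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < requestsPerThread; i++) {
                    servlet.handleRequest();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return servlet.hits();
    }

    public static void main(String[] args) {
        System.out.println(simulate(8, 10000));
    }
}
```

Keeping per-request data in local variables instead of instance fields avoids the problem altogether, since locals are never shared between threads.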