CS 290C: Formal Models for Web Software
Lecture 5: Automated Extraction and Verification
of Navigation Behavior
Instructor: Tevfik Bultan
Model checking navigation in existing applications
• The following papers use model checking techniques to
analyze existing web applications without requiring manual
specification of navigation models
– “Automatic Extraction and Verification of Page
Transitions in a Web Application,” Atsuto Kubo, Hironori
Washizaki, Yoshiaki Fukazawa, APSEC 2007
– “Verifying Interactive Web Programs,” Daniel R. Licata
and Shriram Krishnamurthi, ASE 2004
– “VeriWeb: Automatically Testing Dynamic Web Sites,” Michael
Benedikt, Juliana Freire, Patrice Godefroid, WWW’02
Navigation Bugs: The Orbitz Bug
• [Step 1] A user enters the desired dates and destination of his flight; he
is then presented with a page listing possible flights, including Flight A
and Flight B.
• [Step 2] He clicks a link to open the description of Flight A in a new
browser window.
• [Step 3] Not being particularly enthused about that flight, he returns to
the list of flights …
• [Step 4] and clicks a link to load the description of Flight B, again in a
new browser window.
• [Step 5] Deciding that Flight A was better after all, he switches back to
the window still on the screen showing Flight A …
• [Step 6] and submits the form, causing a page confirming his
reservation to be displayed.
• [Result] Orbitz incorrectly makes a reservation on Flight B.
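The slides do not say how Orbitz was implemented, so the following is only a minimal sketch of one plausible cause: a server that keeps a single "selected flight" slot per session, shared by every browser window. The class `NaiveBookingServer` and its methods are hypothetical names made up for this illustration.

```python
# Hypothetical server (an assumption, not Orbitz's actual design):
# it keeps ONE selected flight per session, regardless of window.
class NaiveBookingServer:
    def __init__(self):
        self.selected_flight = None  # single slot shared by all windows

    def view_flight(self, flight):
        # Opening a description page overwrites the session-wide selection.
        self.selected_flight = flight
        return f"description of {flight}"

    def submit_reservation(self):
        # The submitted form carries no flight id; the server trusts its slot.
        return f"reserved {self.selected_flight}"


server = NaiveBookingServer()
window_a = server.view_flight("Flight A")   # Step 2: new window shows Flight A
window_b = server.view_flight("Flight B")   # Step 4: new window shows Flight B
# Steps 5-6: the user submits the form in the window that still shows Flight A.
result = server.submit_reservation()
print(window_a, "->", result)
```

Under this assumption the user looks at Flight A but the server books whatever was viewed last, which reproduces the reported result.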
Navigation Properties
• Property that user expects to hold: The data used for
computation should always correspond to what the user
saw on the last page he submitted
• However, sometimes it may be better to have another
property:
– Amazon property: Once the user selects an item for
purchase, it should be contained in his shopping cart
• There are other properties that relate to navigation:
– Password-page property: An authentication page should
always be visited before accessing a certain controlled
page
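As a rough illustration, the password-page property can be read as a predicate over a single navigation trace (a list of visited pages). The function name and the page names below are hypothetical; the papers discussed later express such properties in LTL or as automata instead.

```python
def satisfies_password_page(trace, auth_page, controlled_pages):
    """Return True if no controlled page is visited before auth_page."""
    authenticated = False
    for page in trace:
        if page == auth_page:
            authenticated = True
        elif page in controlled_pages and not authenticated:
            return False  # controlled page reached without authentication
    return True


good = satisfies_password_page(["home", "login", "inbox"], "login", {"inbox"})
bad = satisfies_password_page(["home", "inbox", "login"], "login", {"inbox"})
print(good, bad)
```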
Model checking navigation properties
• The goal of model checking navigation properties of web
applications is to find violations of such navigation
properties
• Model checking exhaustively explores the state space of
the application and looks for violations of the state
properties
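A minimal sketch of exhaustive exploration, assuming the navigation model is given as a dictionary from a page to the pages reachable from it. It searches all reachable (page, authenticated) states breadth-first and returns a counterexample path for the password-page property. The site graphs are made-up examples, and real model checkers such as Spin use far more elaborate algorithms.

```python
from collections import deque


def find_violation(transitions, start, auth_page, controlled_pages):
    """Breadth-first search over (page, authenticated) states.

    Returns a shortest path that reaches a controlled page without
    having visited auth_page, or None if no such path exists.
    """
    initial = (start, start == auth_page)
    queue = deque([(initial, [start])])
    visited = {initial}
    while queue:
        (page, authed), path = queue.popleft()
        if page in controlled_pages and not authed:
            return path  # counterexample trace
        for nxt in transitions.get(page, []):
            state = (nxt, authed or nxt == auth_page)
            if state not in visited:
                visited.add(state)
                queue.append((state, path + [nxt]))
    return None


# Hypothetical site: the help page links straight to the inbox.
buggy_site = {"home": ["login", "help"], "login": ["inbox"],
              "help": ["inbox"], "inbox": ["home"]}
fixed_site = {"home": ["login", "help"], "login": ["inbox"],
              "help": ["home"], "inbox": ["home"]}

trace = find_violation(buggy_site, "home", "login", {"inbox"})
print(trace)
print(find_violation(fixed_site, "home", "login", {"inbox"}))
```

The returned path plays the same role as the error trace a model checker reports for a violated property.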
Web application model in Struts
• The application model in the Struts framework uses a set of
pages and a set of transitions between pages
• The page generation is separated from the processing
– Page generation is handled with JSP
– Processing is handled by action servlets
• JSP and servlets can be developed independently and the
associations between them are made using a configuration
file
Web application model in Struts
• The processing of the user requests is as follows:
– The user sends form data as a request to the server
– The server handles the request with an action servlet
that makes calls to the business logic
– The action servlet returns the processing results using a
JSP
Navigation behavior in web applications
• HTTP is a stateless protocol
• The state information for HTTP sessions is held using
– session cookies
– or as part of the URI
• However, clients can modify this content
– so the server cannot control what will be the next
request that will be sent by the client
Navigation behavior in web applications
• In extracting a navigation model, we must decide what type
of page transitions we are trying to model
– In the most general case, we can assume that the user
can transition from any page to any other page
– Or we can allow transitions that only correspond to the
links on the pages plus the backward or forward button
of the browser
– Or we can allow transitions that only correspond to the
links on the pages without using any navigation
capability of the browser
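The three modeling choices above can be compared as three transition relations over the same pages. This is a simplified sketch: the back button is approximated by reversing each link edge (the forward button then re-traverses an existing link edge), which ignores the actual browser history. The page names are hypothetical.

```python
def any_to_any(pages):
    """Most general model: any page can follow any page."""
    return {(p, q) for p in pages for q in pages}


def links_plus_history(links):
    """Link edges plus reversed link edges as a coarse stand-in for 'back'."""
    return set(links) | {(q, p) for (p, q) in links}


def links_only(links):
    """Most restrictive model: only the links the application provides."""
    return set(links)


pages = ["list", "flightA", "confirm"]
links = {("list", "flightA"), ("flightA", "confirm")}
sizes = (len(any_to_any(pages)),
         len(links_plus_history(links)),
         len(links_only(links)))
print(sizes)  # (9, 4, 2)
```

A larger relation covers more user behavior but also admits more spurious paths during verification.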
Extracting navigation model for Struts
• Kubo et al. extract a navigation model from Struts
applications by focusing only on links provided by the
application
• They analyze
– the Struts config file, and
– the JSP template files
to extract this information
• After extracting a finite state machine from the application
they generate a PROMELA model that corresponds to the
page transitions in the application
Extracting navigation model for Struts
• Page transitions are inferred by investigating the Struts
configuration files and JSP template files
• They extract the following elements
– file names of JSP template files
– action attributes from html:form elements in the JSP
template files
– path attributes from action, forward and global-forward
elements in the Struts configuration files
• In the extracted finite state model the pages and actions are
both mapped to states
– One page can trigger multiple actions
– The same action can be triggered by multiple pages
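The extraction step can be sketched as follows. This is not Kubo et al.'s tool: it is a minimal illustration that reads `action` attributes of `html:form` elements with a regular expression and `path` attributes of `action`, `forward` and `global-forwards` elements with Python's standard XML parser. The sample configuration, the JSP text and the assumption that a global forward can be taken from every action are all illustrative.

```python
import re
import xml.etree.ElementTree as ET

FORM_ACTION = re.compile(r'<html:form[^>]*\baction="([^"]+)"')


def extract_transitions(struts_config_xml, jsp_files):
    """Return (source, target) edges over pages and actions.

    jsp_files maps a JSP path to the text of that template.
    """
    edges = set()
    # page -> action: action attributes of html:form elements
    for name, text in jsp_files.items():
        for action in FORM_ACTION.findall(text):
            edges.add((name, action))
    root = ET.fromstring(struts_config_xml)
    # Assumption: global forwards are available to every action.
    global_targets = [f.get("path")
                      for g in root.iter("global-forwards")
                      for f in g.iter("forward")]
    # action -> page or action: forward elements inside each action
    for action in root.iter("action"):
        local_targets = [f.get("path") for f in action.iter("forward")]
        for target in local_targets + global_targets:
            edges.add((action.get("path"), target))
    return edges


CONFIG = """
<struts-config>
  <global-forwards>
    <forward name="logoff" path="/logoff.jsp"/>
  </global-forwards>
  <action-mappings>
    <action path="/login" type="example.LoginAction">
      <forward name="success" path="/menu.jsp"/>
      <forward name="failure" path="/login.jsp"/>
    </action>
  </action-mappings>
</struts-config>
"""
JSPS = {"/login.jsp": '<html:form action="/login"><html:text property="user"/></html:form>'}

edges = extract_transitions(CONFIG, JSPS)
for edge in sorted(edges):
    print(edge)
```

Both pages (`/login.jsp`) and actions (`/login`) appear as nodes, matching the slide's remark that pages and actions are both mapped to states.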
Extracting navigation model for Struts
• Their analysis has limitations
• They do not perform any analysis on the Java code and
may ignore transitions among pages that are allowed by the
application
• After extracting the state machine model they also simplify
it and eliminate or merge transitions which they find
uninteresting from the verification perspective
Modeling the user
• After extracting the navigation state machine, they also
generate a state machine that represents the user
• The user can submit arbitrary requests to the web
application
– so the state machine modeling the user randomly
generates requests in a loop and sends them to the web
application
Generating the Promela model
• Then, they generate a Promela model from the navigation
state machine
• They use an enumerated variable to represent the states of
the navigation state machine
• They generate a communication channel to represent the
communication between the user process and the
navigation state machine
• They create one user process and one web application
process and run them concurrently
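The ingredients listed above (an enumerated state variable, a channel, one user process and one web application process) can be sketched with a small generator that prints Promela text. The output shape is an assumption made for illustration and is not the model produced by Kubo et al.; the function `to_promela` and the page and request names are hypothetical.

```python
def to_promela(transitions, initial):
    """Emit Promela text for a navigation state machine.

    transitions maps (state, request) -> next state. Names are assumed
    to be valid Promela identifiers, state and request names are assumed
    to be disjoint, and at least one request must exist.
    """
    states = sorted({s for s, _ in transitions}
                    | set(transitions.values()) | {initial})
    requests = sorted({r for _, r in transitions})
    lines = [
        "mtype = { %s };" % ", ".join(states + requests),
        "chan requests = [0] of { mtype };",   # rendezvous channel
        "mtype current = %s;" % initial,       # enumerated page state
        "",
        "active proctype User() {",
        "  do",
    ]
    # The user nondeterministically sends any request, forever.
    lines += ["  :: requests!%s" % r for r in requests]
    lines += ["  od", "}", "",
              "active proctype WebApp() {",
              "  mtype r;", "  do", "  :: requests?r ->", "     if"]
    for (s, r), nxt in sorted(transitions.items()):
        lines.append("     :: current == %s && r == %s -> current = %s"
                     % (s, r, nxt))
    # Requests that match no transition leave the state unchanged.
    lines += ["     :: else -> skip", "     fi", "  od", "}"]
    return "\n".join(lines)


fsm = {("login_page", "submit_login"): "menu_page",
       ("menu_page", "logout"): "login_page"}
model = to_promela(fsm, "login_page")
print(model)
```

The nondeterministic `do` loop in `User` corresponds to the user model that generates arbitrary requests in a loop.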
Verifying the navigation model
• They write navigation properties in LTL
• They use the Spin model checker to check the properties
on the Promela specification
• The Spin model checker outputs error traces for the properties
that are violated
• Experiments on a mail-reader find a violation of a property
but it turns out that the extracted model excluded a
transition
– It is necessary to analyze the Java code to extract that
transition which is not done in this paper
Model checking web applications written in Scheme
• Licata and Krishnamurthi extract a Web control-flow graph (WebCFG)
from web applications written in PLT Scheme
• The WebCFG represents the navigation behavior of the
applications
• They then use model checking techniques to verify
properties on the WebCFG
• WebCFG is constructed from the input program using
standard CFG construction techniques
Model checking web applications written in Scheme
• Each node in the WebCFG corresponds to an operation
– Each operation is represented as a node in the CFG
Model checking web applications written in Scheme
• Properties are specified by first tagging the page elements
(using Cascading Style Sheets) that will be used as atomic
propositions
• Then properties are specified as property automata
– Recall that LTL properties can be written as automata
• They expect the developer to provide an explicit dictionary-style
mapping from field names to values (similar to the
SmartProfiles used by VeriWeb).
Model checking web applications written in Scheme
• As a verification tool they use the FLAVERS toolkit.
• In addition to verifying properties written as property
automata, FLAVERS also supports constraint automata
– The constraint automata specify the behaviors that
should be ignored during verification
• They use the constraint automata to restrict the navigation
behavior so that spurious behaviors can be eliminated
– Such as a user jumping to a page that is not reachable
from the current page and that has never been visited
before.
Navigation Verification with VeriWeb
• VeriWeb is an exhaustive navigation testing tool proposed
by Benedikt et al.
• Rather than extracting a navigation model from a web
application and then analyzing it using a separate
verification tool, VeriWeb explores different navigation
scenarios on the application directly looking for errors
• By automating the navigation testing, VeriWeb avoids the
manual effort required when testing with “capture-replay” tools
– In the “capture-replay” approaches different scenarios
are manually explored and recorded and then later on
automatically re-executed for testing
Challenges in Testing Web Applications
• Web applications are complex distributed systems
• They are frequently updated
• It is hard to isolate the behavior of a web application since it
involves many components (browser, server, back-end
database, etc.)
– So, it is not possible to test the web application as a
stand-alone application
• Web applications are accessible by a large set of users,
who could be inexperienced or malicious
– So, any user behavior is possible
VeriWeb
• VeriWeb is a tool that automatically explores multiple
navigation scenarios looking for errors
– Like a crawler it exhaustively searches different
navigation scenarios
• However, it can also deal with forms which crawlers
are unable to handle
– Like a capture-replay tool, it can deal with dynamically
generated pages
• However, it does not require manual recording like
capture-replay tools
• It looks for standard errors like broken links, malformed
URLs
VeriSoft
• VeriWeb uses a software model checking tool called
VeriSoft for exploration of the navigation behavior
• VeriSoft is a verification tool that explores the state space of
programs
• It is different from other model checking tools (such as Spin)
in the sense that VeriSoft performs a stateless search
– It does not keep track of all the states it has visited
– It can keep track of the states in the current search path
to detect cycles
VeriSoft
• The key to state-space exploration with VeriSoft is a choice
function that determines what action to take next
– such as what statement to execute, or which link to
follow in case of web navigation
• VeriSoft systematically explores all possible actions by
using different choices when it backtracks
• It can guarantee complete coverage up to a certain depth
VeriSoft
• Since VeriSoft does not record all the visited states, if two
different scenarios bring the system to the same state,
VeriSoft may repeat exploration of the same scenarios after
that state multiple times
– This can lead to exponential blow up in the worst case
• VeriSoft uses partial-order reduction techniques to prevent
this exponential blow-up
– It keeps track of dependencies among different actions
and does not explore all possible interleavings of
independent actions
• It only explores a representative interleaving
• This is sufficient if the actions are independent
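A small worked example of why one representative interleaving is enough for independent actions. The three actions are hypothetical and each writes a different variable, so they are pairwise independent. The sketch does not show how VeriSoft detects dependencies; it only shows that every ordering ends in the same state.

```python
from itertools import permutations

# Three made-up actions that each touch a different variable.
actions = {
    "a": lambda s: {**s, "x": 1},
    "b": lambda s: {**s, "y": 2},
    "c": lambda s: {**s, "z": 3},
}


def run(order):
    """Apply the named actions in the given order to a fresh state."""
    state = {"x": 0, "y": 0, "z": 0}
    for name in order:
        state = actions[name](state)
    return state


orders = list(permutations(actions))
final_states = {tuple(sorted(run(o).items())) for o in orders}
print(len(orders), "interleavings,", len(final_states), "distinct final state")
```

With n independent actions there are n! orderings, so exploring a single one avoids a factorial amount of redundant work.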
Back to VeriWeb
• VeriWeb uses the following components
– ChoiceFinder:
• Find actions in a page (links, forms, JavaScript)
– VeriSoft
• Controls the systematic exploration of the actions
– WebNavigator
• Executes the browsing actions selected by VeriSoft
– Error Checker
• Checks for errors; the tester can plug in their own
checks
VeriWeb Navigation Testing Algorithm
ExploreSite(startingURL, constraints)
    currentPage = Navigator.load(startingURL);
    while (true) {
        // check the current page against the tester's constraints
        error = ErrorHandler(currentPage, constraints);
        if (error.status == true)
            VeriSoft.assert(currentPage, error);
        if (this page has been seen before)
            VeriSoft.abort(currentPage, "cycle");
        else {
            // VeriSoft picks one of the actions found on the page
            choices = ChoiceFinder(currentPage);
            selectedChoice = VeriSoft.toss(choices);
            currentPage = Navigator.execute(selectedChoice, choices);
            if (currentPage.error != null)
                VeriSoft.assert(currentPage, currentPage.error);
        }
    }
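The algorithm above can be approximated by a runnable toy, assuming the site is a dictionary from a page to the pages its choices lead to. Recursion stands in for VeriSoft's `toss` and backtracking, only the pages on the current path are remembered, and `check_page` plays the role of the error handler. All names and the sample site are hypothetical.

```python
def explore_site(site, start, check_page, max_depth=10):
    """Depth-bounded, stateless exploration of a toy site.

    Returns a list of (path, message) pairs, one per error found.
    """
    errors = []

    def visit(page, path):
        message = check_page(page)           # role of the error handler
        if message is not None:
            errors.append((path, message))   # role of VeriSoft.assert
            return
        if len(path) > max_depth:
            return                           # depth bound reached
        for nxt in site.get(page, []):       # each branch: one outcome of toss
            if nxt in path:
                continue                     # cycle on the current path
            visit(nxt, path + [nxt])

    visit(start, [start])
    return errors


site = {"/": ["/a", "/b"], "/a": ["/", "/broken"], "/b": ["/a"]}
errors = explore_site(site, "/",
                      lambda p: "broken link" if p == "/broken" else None)
for path, message in errors:
    print(" -> ".join(path), ":", message)
```

The same broken page is reported twice, once per path that reaches it, because no global set of visited pages is kept. This is the repeated exploration that a stateless search can incur.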
How to deal with Forms?
• Web applications ask users for input and their behavior
changes based on that
– They may require user-name, password pairs
– They may require search queries
• Automatically generating different user-name, password
pairs is unlikely to find a valid pair
• Automatically generated search queries may result in a huge
state-space
How to deal with Forms
• In VeriWeb they require the tester to provide a “Smart
Profile”
• The tester specifies the set of data that can be entered into the
forms
– Valid user-name/password pairs
– A subset of possible search queries that may lead to
different/interesting behaviors
• The test engine tries different combinations of the provided
values
• VeriWeb provides a format for specification of these profiles
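Trying different combinations of the provided values can be sketched as a Cartesian product over the fields. The dictionary below is a made-up stand-in and does not follow VeriWeb's actual profile format, which the slides do not show.

```python
from itertools import product

# Hypothetical profile: each form field maps to the values the tester allows.
profile = {
    "username": ["alice"],
    "password": ["secret", "wrong"],
    "query": ["flights", ""],
}


def form_inputs(profile):
    """Yield one dict per combination of the provided field values."""
    fields = list(profile)
    for values in product(*(profile[f] for f in fields)):
        yield dict(zip(fields, values))


inputs = list(form_inputs(profile))
print(len(inputs), "combinations")
```

Keeping each value list short is what keeps the number of combinations, and therefore the explored state space, manageable.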