Eliminating Navigation Errors in Web Applications via Model Checking and Runtime Enforcement of Navigation State Machines
Sylvain Halle, Taylor Ettema, Chris Bunch, and Tevfik Bultan
University of California, Santa Barbara

Web software
• Web software is becoming increasingly dominant
• Web applications are used extensively in many areas:
– Commerce: online banking, online shopping, …
– Entertainment: online music, videos, …
– Interaction: social networks
• We will rely on web applications more in the future:
– Health records (Microsoft HealthVault, Google Health)
– Controlling and monitoring of national infrastructures (Google Powermeter)
• Web software is also rapidly replacing desktop applications
– cloud computing + software-as-service
– Google Docs, Google …

One Major Road Block
• Web applications are not dependable!
• Web applications are error prone
– Many web applications have navigation errors: They mishandle unexpected user requests
• Web applications are notorious for security vulnerabilities
– Their global accessibility makes them a target for many malicious users
• As web applications are becoming increasingly dominant and as their use in safety critical areas is increasing, their dependability is becoming a critical issue

Web applications are error prone
• Most web applications have navigation errors where an unexpected user request can cause a web application to
– display cryptic error messages
– display sensitive information that might be exploited by malicious users
– execute an unintended action

A Web Application: Bamboo Invoice
• At the top of the Bamboo Invoice project homepage, it states: “BambooInvoice is free Open Source invoicing software intended for small businesses and independent contractors.
Our number one priorities are ease of use, user-interface, and beautiful code.”

Navigation errors: Bamboo Invoice

Another Web Application: Digitalus
• At the top of the Digitalus project homepage, it states: “Digitalus CMS is a new kind of CMS. The focus of this open source project is usable software as opposed to endless lists of features.”

Navigation errors: Digitalus

How Did We Generate These Screens?
• Not very difficult: just try to do something unexpected
• For example:
– delete yourself
– try to access a page that you should not have access to
• See the step-by-step scenarios in the paper
• The point is:
– A normal user can accidentally do these operations
– A malicious user can intentionally do these operations

Why are web applications error prone?
• Script-oriented programming:
– A web application consists of a collection of scripts
– These scripts call each other indirectly through interaction by the user and the browser
  • The form that one script generates has the address of the next script that will consume the user input
– There are no systematic checks that guarantee that the caller and the callee agree on an interface
  • For example, in a procedure call, the caller and the callee must agree on the number of arguments and their types
– There is no explicit control flow identifying the execution order
  • The control flow is buried in the links of the generated HTML pages

Why are web applications error prone?
• Extensive string manipulation:
– Web applications use extensive string manipulation
  • To construct HTML pages, to construct database queries in SQL, etc.
– The user input comes in string form and must be validated and sanitized before it can be used
  • This requires the use of complex string manipulation functions such as string-replace
– String manipulation is error prone

Why are web applications error prone?
• Interactivity
– User interaction is not under the control of the developer
  • The user can use the back button of the browser
  • The user can open multiple windows
  • The user can cut and paste the URL
– Imagine you develop a desktop application where all dialog boxes and all menu items could be displayed by the user at any moment, regardless of whether this makes sense in the current state of the application

Why are web applications error prone?
• Interactivity
– Stateful interaction over stateless protocols (HTTP)
– Interactions between different software components
  • browser, server, back-end database
  • the need to maintain session state across these components
– One web application can be composed of many applications
  • Mash-ups, web services

Automated Verification to the Rescue
• What can automated verification do for you?
– Exhaustive state-space exploration
  • Using state space reduction techniques to enable exhaustive exploration of the state space of a program
– Symbolic analysis
  • Using compact symbolic representations (such as BDDs) to explore large state spaces
– Runtime verification
  • Check or enforce properties at runtime
– Combining static and dynamic checks
  • Check as much as possible statically; for the rest, use runtime enforcement
• What can you do for automated verification?
– Specify the intended behavior!

Request processing in a Web application
• Request processing in Web applications that use MVC (Model View Controller) frameworks

Navigation modeling and analysis
• We developed a simple language to specify navigation state machines (NSMs)
– An NSM is a state machine that shows the allowable sequences of controller action executions in each session of a web application
• MVC frameworks typically use a hierarchical structure where actions are grouped into controllers and controllers are grouped into modules
– We exploit this hierarchy to specify the navigation state machines as hierarchical state machines

Navigation state machines
• The states of a navigation state machine are defined by
– the values of the session variables,
– the last action executed by the application,
– and the request parameters of the last action
• We assume that this information is enough to determine which actions the application can execute next
• NSM specification and verification is session modular

What can we do with NSMs?
• If we can check that the web application conforms to the NSM,
– then we can verify navigation properties on the NSM and conclude that they hold for the application
– We can also use automated verification techniques to check properties of NSMs
– This way we can eliminate the navigation errors
• Problem: How do we ensure that the application conforms to the NSM?
– Two approaches
  • Automatically extract the NSM from the application
  • Manually specify the NSM and use runtime enforcement to make sure that the application follows the NSM
  • Or use a combination of these two

Runtime Enforcement with NSMs
• Statically verifying that a web application conforms to a navigation state machine is a very difficult problem (in general undecidable)
• So, instead, we use runtime enforcement
– We have a plugin that can be easily added to an MVC web application; it takes an NSM as input and makes sure that every incoming request conforms to the NSM
– If the incoming request does not obey the NSM, then the plugin either ignores the request and refreshes the previous page or generates an appropriate error message
– This way non-compliant user requests can be handled uniformly without generating strange error messages

Model Checking NSMs
• Runtime enforcement ensures that violations of NSM behavior will be handled uniformly at runtime
• However, we may also want to check properties of NSMs
– For example: is the logout page always followed by the homepage?
• Our approach:
– Ask the developer to write properties of NSMs as temporal logic formulas
– We translate NSMs to SMV specifications
– We check whether the properties hold on the NSMs using the NuSMV model checker

Overview of Our Approach
[Diagram. Static verification: Navigation State Machine (NSM) specification → NSM to SMV translator → SMV, checked against ACTL properties, yielding either "Verified" or a counterexample. Runtime enforcement: NSM specification → NSM plugin (NSM interpreter).]

Some examples
• We studied three real-world, freely available web applications:
– BambooInvoice: invoice management application; 159,000 lines of code, 60 actions
– Capstone: student project management system; 41,000 lines of code, 33 actions
– Digitalus: content management system; 401,000 lines of code, 26 actions

Extracting the NSM
• We extracted the NSMs from the applications by hand, by exploring potential error sequences
– Amount of effort: half a day per application (including taking screenshots, drawing the graph, etc.)

Application      States   Transitions   Variables
Digitalus        32       48            7
BambooInvoice    63       80            8
Capstone         8        16            1

Extracting the NSM
• Most NSMs are a collection of groups of logically interrelated pages, with few entry and exit points between groups
• Extraction by hand is easier than the numbers suggest
• The NSM only needs to be a (reasonably) conservative approximation of all the paths that the application tolerates
• Future work:
1) semi-automated extraction of the NSM from source code
2) promote the NSM as part of code documentation

Fragment of BambooInvoice's NSM
• Yellow transitions have guards which, if violated, cause PHP warnings
• Target states of red transitions cause PHP error messages
• Black transitions either have no guard or have guards that the application handles gracefully, without PHP messages

Model Checking NSMs
• We used the NuSMV model checker to statically check navigation properties of NSMs expressed in ACTL
• Some examples:
– Once you log in, the only way to go back to the login page is by traversing the logout page
– Each controller
has the 'index' action as its only entry point from other controllers
• The original BambooInvoice and Digitalus assume, but do not enforce, either of these properties
– This is the cause of many cryptic error messages we found
• We could statically verify with NuSMV that the NSMs for both applications fulfill these properties
– 4.4 MB of memory
– 0.4 sec running time

Runtime enforcement of NSMs
• PHP plugin for enforcement of NSMs at runtime
– Intercepts page requests in an MVC application and validates them against the NSM (supplied in an external XML file)
– One line of code to insert in MVC frameworks (Zend, CodeIgniter)
– Simple: 1,100 lines of PHP code (3% of the smallest application)
• Average processing time when an action conforms to the NSM:

Application      Time without plugin (ms)   Time with plugin (ms)
Digitalus        11                         12
BambooInvoice    183                        199
Capstone         90                         122

Runtime enforcement of NSMs
• Processing time when an action provokes a PHP warning (which the NSM-enabled application blocks):

Application      Action          Time without plugin (ms)   Time with plugin (ms)
Digitalus        Create folder   28                         26
Digitalus        Upload media    18                         32
BambooInvoice    New invoice     938                        574

• ...and when an action provokes a PHP error:

Application      Action          Time without plugin (ms)   Time with plugin (ms)
Digitalus        Edit page       416                        32
Digitalus        Delete folder   424                        36
BambooInvoice    View invoice    564                        594

Runtime enforcement of NSMs
• Take-home point: runtime enforcement pays
– Reasonable overhead when everything is OK
– Can actually save CPU time by sparing the application from processing an error
• Other advantage: prevents an error from occurring, instead of recovering from it after the fact
– E.g.: rolling back database operations
– Catching an exception at the earliest moment vs. propagating it deeper in the stack trace
• Reminder: also guarantees that the results obtained by static verification hold

How many errors do we prevent?
• We can provide an estimate based on the "colored" transitions we found while extracting the NSM
• Count the number of valid traces of length k-1 (i.e., traces that follow the NSM from its start state)
• Then count the ways these traces can be extended to an invalid trace of length k by executing an unexpected action
• Calculate the percentage of these unexpected traces that are not caught by the application
• This ratio represents the proportion of traces of length k for which a navigation constraint is assumed, but not checked, by the application
• BambooInvoice: 64% of all unexpected navigation traces longer than 4 can generate a cryptic error message
• Digitalus: 52% of all unexpected navigation traces longer than 4 can generate a cryptic error message

Related Work
• Navigation problems in Web applications were identified some time ago [Licata and Krishnamurthi, ASE 2004]
• There are programming-language-based solutions for this problem that use continuations [Krishnamurthi et al. 2006]
• Modeling web applications as state machines has been proposed and investigated before [Miao, Zeng ICECCS 2008], [Han, Hofmeister MODELS 2007]
• Runtime enforcement of navigation state machines is related to earlier work on runtime monitoring and verification [see Runtime Verification Conference]

Conclusions
• Web applications suffer from weak enforcement mechanisms for valid navigation sequences; this is the source of cryptic and confusing errors
• Navigation State Machines (NSMs) are finite state machines that can formally represent valid navigation paths, along with constraints on request parameters
• By combining enforcement of NSMs at runtime with static verification of NSMs, we can...
– Prevent navigation errors from occurring
– Verify navigation properties by model checking the NSMs rather than the applications themselves
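
Illustration: runtime enforcement of an NSM
The runtime check summarized above can be sketched in a few lines. This is only a minimal illustration in Python, not the authors' PHP plugin: the class, the function names, and the action names (login/index, invoices/view, ...) are all hypothetical, and the NSM is built in code rather than read from an XML file. It keeps the two ideas from the slides: a request is valid only if the NSM has a transition from the last executed action whose guard holds on the session variables and request parameters, and a non-conforming request falls back to the previous page.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# A guard inspects the session variables and the request parameters.
Guard = Callable[[dict, dict], bool]


def always(session: dict, params: dict) -> bool:
    """Guard for transitions that carry no constraint."""
    return True


@dataclass
class NavigationStateMachine:
    """Hypothetical, flat (non-hierarchical) NSM: actions plus guarded transitions."""
    start: str
    transitions: Dict[Tuple[str, str], Guard] = field(default_factory=dict)

    def allow(self, src: str, dst: str, guard: Guard = always) -> None:
        self.transitions[(src, dst)] = guard

    def permits(self, last_action: str, requested: str,
                session: dict, params: dict) -> bool:
        guard = self.transitions.get((last_action, requested))
        return guard is not None and guard(session, params)


def enforce(nsm: NavigationStateMachine, session: dict,
            requested: str, params: dict) -> str:
    """Return the action to execute: the requested one if it conforms to the
    NSM, otherwise the previous one (i.e. refresh the previous page)."""
    last = session.get("last_action", nsm.start)
    if nsm.permits(last, requested, session, params):
        session["last_action"] = requested
        return requested
    return last


def build_example_nsm() -> NavigationStateMachine:
    """Toy NSM with made-up action names, loosely modeled on an invoicing app."""
    nsm = NavigationStateMachine(start="login/index")
    nsm.allow("login/index", "invoices/index", lambda s, p: "user" in s)
    nsm.allow("invoices/index", "invoices/view", lambda s, p: "id" in p)
    nsm.allow("invoices/view", "invoices/index")
    nsm.allow("invoices/index", "login/logout")
    nsm.allow("login/logout", "login/index")
    return nsm
```

In this toy NSM the login page can only be reached again through login/logout, which mirrors the first example property from the Model Checking NSMs slide; a request that jumps straight from invoices/view to login/index is rejected and the previous page is served again.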