CS 290C: Formal Models for Web Software
Lecture 6: Language and Model-Based Solutions to Navigation Errors
Instructor: Tevfik Bultan

Eliminating Navigation Errors
• Several approaches have been proposed for eliminating navigation errors
  – Model driven development approaches, where the application is specified or enhanced using a formal model
    • For example: statecharts for modeling navigation
  – Reverse engineering approaches, where a formal model is extracted from the application
    • For example: extracting a state machine model for navigation by analyzing the links that are inserted in web pages

Model Driven Development Approach
• Model driven development approach enables
  – Specification of the behavior of the application at a high level of abstraction, making it easier to develop applications
  – Automatic or semi-automatic generation of the actual implementation from the high level models
  – Separation of concerns, achieved by specifying different concerns about the application (such as the data model or the navigation constraints) using different specification mechanisms
• However, model driven development requires the developers to learn and use the modeling languages
• There is a concern about the mapping between the actual implementation and the model (they have to be maintained together)

Reverse Engineering Approach
• Reverse engineering approaches do not require developers to learn a new specification language
• Since reverse engineering approaches extract a model directly from the code, there are no maintenance issues (when the application changes, we can extract a new model)
• However, reverse engineering is hard:
  – Extracting sound models using static analysis can lead to very approximate models that do not contain much information, and extracting more precise models can be undecidable
  – Extracting models by observing runtime behavior is not sound and cannot be used to guarantee correctness

Language Based Approaches
• Both model driven
development and reverse engineering approaches can be considered software engineering approaches
• Another approach would be to use a programming language based approach
• Can we model the problems that appear in Web applications in programming language terms and possibly suggest solutions using programming language mechanisms (such as type checking)?

Today I will discuss two approaches
• A language based approach for modeling and analyzing navigation problems in Web applications, where the navigation problems are resolved using language-based constructs (such as types): “Modeling Web Interactions and Errors,” S. Krishnamurthi, R. B. Findler, P. Graunke, and M. Felleisen.
• A model-driven approach where the navigation problems are addressed by specifying the navigation behavior in a formal model and analyzing it: “Eliminating Navigation Errors in Web Applications via Model Checking and Runtime Enforcement of Navigation State Machines,” Sylvain Halle, Taylor Ettema, Chris Bunch and Tevfik Bultan.

Web Applications
• A Web program’s execution consists of a series of interactions between a Web browser and a Web server
• When a browser submits an HTTP request whose URL points to a Web program, the server invokes the corresponding program
• It then waits for the program to terminate and turns the program’s output into a response that the browser can display, i.e., it returns a Web page
• Each such program is a “script” that reads some inputs and writes some output

Challenges of script oriented programming
• This simple request-response style programming using scripts makes design of multi-stage Web interactions difficult
• A multi-stage interactive Web program consists of many scripts, each handling one request
  – These scripts communicate with each other via external media since they must remember the earlier part of the interaction
  – Forcing scripts to communicate this way causes problems since it leads to unstated and easily violated invariants

Web Applications
• Use of the Web browser creates further complications
  – A browser is designed to let a user navigate a web of hyperlinked nodes
  – When a user uses this power to navigate an interaction with an application, many unexpected scenarios can happen
    • User can backtrack to an earlier stage of the interaction
    • User can duplicate a page and generate parallel interactions

A Language Based Approach
• We will first describe a formal model that captures the essence of Web application behavior
• Then we will investigate the use of language based techniques to address the navigation problems

A Formal Model
• A Web application (W) consists of
  – a server (S) and
  – a client (C)
• Server consists of
  – a storage, and
  – a dispatcher
• Dispatcher contains
  – a table (P) of programs that associates URLs with programs and
  – an evaluator that applies programs from the table to the submitted form

A Formal Model
• Every page is simply a form (F) that contains
  – the URL to which the form is submitted, and
  – a set of form fields
• Each field has a name and a value that can be edited by the client
• The client stores
  – the current form and
  – the sequence of all the forms that have been visited by the client so far (cached pages)

Web Program Behavior
• The behavior of the Web program is described using three types of actions:
  – Fill-form: This corresponds to client editing values of fields in the current form.
The modified form becomes the current form and is added to the cache
  – Switch: Makes a form from the cache the current form
  – Submit: Dispatches on the current form’s URL to find a program in the table P. This program accesses the server state and the current form, updates the server state, and generates a new form which becomes the current form

A Simple Web Programming Language
• A simple functional programming language can be specified to characterize the basic operations that are required to write a web application:
  – Extract a field from a form
  – Construct a new form
  – Modify fields of a form
• To allow stateful programming we can introduce read and write operations that allow read and write access to the server storage

Navigation Problems
• Two navigation problems can be characterized formally in this model:
  – Script communication problem: A script accepts a different type of form than what is delivered to it. For example, the script tries to access a field that does not exist in the form.
  – HTTP observer problem: Since HTTP does not allow a proper implementation of the observer pattern (which enables independent observers to be notified of state changes), a page received by the client can become outdated when the data model changes in the server.
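The formal model and its three actions can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the calculus from the Krishnamurthi et al. paper: the class names (`Form`, `Server`, `Client`), the two scripts `ask` and `greet`, and the choice to also cache the form returned by a submit are all hypothetical. The sketch shows how a switch followed by a resubmit repeats a server-side effect, and how a script gets stuck on a form that lacks a field it expects.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Form:
    url: str          # URL the form is submitted to
    fields: tuple     # tuple of (name, value) pairs

    def get(self, name):
        # A missing field raises KeyError: the script "gets stuck".
        return dict(self.fields)[name]

    def with_field(self, name, value):
        updated = dict(self.fields)
        updated[name] = value
        return Form(self.url, tuple(updated.items()))


@dataclass
class Server:
    programs: dict                                # table P: URL -> program
    storage: dict = field(default_factory=dict)   # server storage


@dataclass
class Client:
    current: Form
    cache: list = field(default_factory=list)     # visited forms


def fill_form(client, name, value):
    client.current = client.current.with_field(name, value)
    client.cache.append(client.current)


def switch(client, index):
    client.current = client.cache[index]


def submit(server, client):
    program = server.programs[client.current.url]    # dispatch on URL
    client.current = program(server.storage, client.current)
    client.cache.append(client.current)               # assumption: cache it


# Two hypothetical scripts forming a two-stage interaction.
def ask(storage, form):
    return Form("/greet", (("name", ""),))


def greet(storage, form):
    name = form.get("name")
    storage["visits"] = storage.get("visits", 0) + 1
    return Form("/done", (("message", "hello " + name),))


server = Server({"/ask": ask, "/greet": greet})
start = Form("/ask", ())
client = Client(start, [start])

submit(server, client)             # server returns the /greet form
fill_form(client, "name", "Ada")   # cached at index 2
submit(server, client)             # first visit recorded
switch(client, 2)                  # user goes back to the filled form
submit(server, client)             # the same effect happens again

# Script communication problem: /greet receives a form without "name".
bad_client = Client(Form("/greet", ()))
try:
    submit(server, bad_client)
    stuck = False
except KeyError:
    stuck = True
```

After this run `server.storage["visits"]` is 2 even though the user filled in the form once, which is the kind of behavior the navigation problems above describe.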
Script Communication Problem and Types
• The main issue in the script communication problem is type mismatch between the forms generated and consumed by different scripts
• Since these scripts are loosely coupled programs, there is no standard type checking mechanism that can be used to make sure that these type mismatches do not happen
• Checking all scripts together is not feasible since they are developed incrementally, may reside on different Web servers, and may be written using different programming languages

An Incremental Type System for Web Applications
• The proposed solution is the following:
  – When the Web server receives a request for a URL that is not already in its table, it installs the relevant program
  – Before installing the relevant program, it checks that there is no type mismatch between the input form and the installed program (internal consistency check)
  – Furthermore, it generates type constraints that this newly installed program imposes on other programs in the server that it interacts with (these become external consistency checks)
• If either the internal or external check fails, the program is rejected, resulting in an error

A Simple Typed Web Programming Language
• The simple functional Web programming language can be extended with types by requiring type declarations for function arguments
• The type system for this language shows how external type checking can be done
  – While traversing the program, the type system generates a set of type constraints on external programs
  – Each constraint states a condition such as: a program associated with a particular URL should consume Web forms of a particular type

Solving Script Communication Problem with Type Checking
• Using type checking with this incremental system it can be guaranteed that
  – Scripts do not get stuck when they are processing appropriately typed forms
  – Server does not apply the scripts to forms with wrong types

Solving the HTTP observer problem with timestamps
• Server keeps track of the
number of processed submissions (this represents time)
• The external storage is changed so that it maps locations to values + timestamp for the last write
• The server also maintains the set of all storage locations read or written during the execution of a script (called a carrier set CS)
  – When the server sends a page to the consumer, it adds the current timestamp and this set of locations as an extra hidden field

Solving the HTTP observer problem with timestamps
• A form with carrier set CS and timestamp T submitted to a server is out of date if and only if any of the locations in CS has a timestamp at the server that is greater than T
• A runtime error can be generated when out of date forms are submitted, preventing execution of scripts with out of date data
  – This approach solves the Orbitz problem of booking an unintended flight
• However, this approach can also generate false positives (for example, a page counter value may make the form out of date)
  – So the programmers must specify which reads or writes are relevant, and an error is generated only when a relevant field is out of date

Modeling web application behavior with continuations
• Another language-based approach that has been investigated in web application development is the use of continuations for modeling web application behavior
• A “continuation” is an abstract representation of the control state of a program
• In the continuation-passing style of programming, the control is passed explicitly using continuations
  – When invoking a function written in continuation-passing style, the caller passes a continuation that will be invoked with the return value of the callee after the callee terminates

Modeling web application behavior with continuations
• Using continuations we do not have to think of a web application as a collection of scripts
• Using continuations we can capture the behavior of a web application as a single program that suspends its behavior while interacting with the
user
• When a web application is invoked by submitting a form, after it performs its task, it outputs the result and a continuation
  – This continuation is then used to process the next form submission

Modeling web application behavior with continuations
• In the continuation-based model, when a page is sent to the user, the current “continuation” is captured and stored in a table
  – The form sent to the user contains a URL that contains a reference to that table entry
  – When the user submits the form, the server invokes the corresponding continuation, which then continues the execution from the corresponding control location
• If you are interested in this topic here is a paper that discusses this view: “The Influence of Browsers on Evaluators or, Continuations to Program Web Servers,” Christian Queinnec.

Modeling web application behavior with continuations
• Using this continuation-based approach, one can investigate the effects of using the Back button, multiple window creation, direct URL entry, etc. in a web application
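The continuation table described above can be sketched with plain Python closures standing in for continuations. This is a minimal illustration, not the mechanism from Queinnec's paper: the class `ContinuationServer`, the `/resume/<key>` URL scheme, and the `adder` application are hypothetical names introduced here. Each page sent to the user stores a closure in the table, and submitting the page's URL invokes that closure with the submitted value.

```python
import itertools


class ContinuationServer:
    def __init__(self):
        self.table = {}                    # key -> stored continuation
        self._ids = itertools.count(1)

    def send_page(self, prompt, continuation):
        # Capture the continuation and embed a reference to it in the URL.
        key = f"k{next(self._ids)}"
        self.table[key] = continuation
        return {"url": f"/resume/{key}", "prompt": prompt}

    def submit(self, url, value):
        # Look up the continuation named by the URL and resume it.
        key = url.rsplit("/", 1)[-1]
        return self.table[key](value)


def adder(server):
    """A two-stage interaction written as one program in CPS."""
    def after_first(first):
        def after_second(second):
            total = int(first) + int(second)
            return {"url": None, "prompt": f"sum = {total}"}
        return server.send_page("second number?", after_second)
    return server.send_page("first number?", after_first)


server = ContinuationServer()
page1 = adder(server)
page2 = server.submit(page1["url"], "1")
result_a = server.submit(page2["url"], "2")

# Back button: the user returns to page2 and submits a different value.
# The stored continuation still remembers that the first number was 1.
result_b = server.submit(page2["url"], "10")
```

Because the continuation for `page2` stays in the table, resubmitting it after going back resumes from the same control point, which makes the effect of the Back button easy to reason about in this model.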
A model-based approach to navigation problems
• We have discussed some language-based ideas for dealing with navigation problems in web applications; now we will discuss a model-based approach
• One successful approach to web application development has been adoption of design patterns that bring some structure to the scripts that implement the web application
• Web application development frameworks that adopt these design patterns have become very successful

Model-View-Controller (MVC) Architecture
• MVC is a design structure for separating representation from presentation using a subscribe/notify protocol
• The basic idea is to separate
  – where and how data (or more generally some state) is stored, i.e., the model
  – from how it is presented, i.e., the views
• Follows basic software engineering principles:
  – Separation of concerns
  – Abstraction

Model-View-Controller (MVC) Architecture
• MVC consists of three kinds of objects
  – Model is the application object
  – View is its screen presentation
  – Controller defines the way the user interface reacts to user input
[Figure: a model holding the values a=50%, b=30%, c=20%, presented by several views]

Model-View-Controller (MVC) Architecture
• MVC decouples views and models by establishing a subscribe/notify protocol between them
  – whenever the model changes, it notifies the views that depend on it
  – in response, each view gets an opportunity to update itself
• This architecture allows you to attach multiple views to a model
  – it is possible to create new views for a model without rewriting it

Model-View-Controller (MVC) Architecture
• Taken at face value this may be seen as an architecture for user interface design
  – It actually addresses a more general problem:
    • decoupling objects so that changes to one can affect any number of others without requiring the changed object to know the details of the others
  – This is called the Observer pattern in the design patterns catalog
• Observer pattern is a design pattern that is used as part of the Model-View-Controller (MVC) architecture
to handle notification of multiple views that depend on a single model

A Brief Overview of Design Patterns
• Think about the common data structures you learned
  – Trees, Stacks, Queues, etc.
• These data structures provide a set of tools on how to organize data
• Probably you implement them slightly differently in different projects

A Brief Overview of Design Patterns
• Main concepts about these data structures, such as
  – how to store them
  – manipulation algorithms
  are well understood
• You can easily communicate these data structures to another software developer by just stating their name
• Knowing them helps you when you are dealing with data organization in your software projects
  – Better than re-inventing the wheel

A Brief Overview of Design Patterns
• This is the question:
  – Are there common ideas in architectural design of software that we can learn (and give a name to) so that
    • We can communicate them to other software developers
    • We can use them in architectural design in a lot of different contexts (rather than re-inventing the wheel)
• The answer is yes according to E. Gamma, R. Helm, R. Johnson, J. Vlissides
  – They developed a catalog of design patterns that are common in object oriented software design

A Brief Overview of Design Patterns
• Design patterns provide a mechanism for expressing common design structures
• Design patterns identify, name and abstract common themes in software design
• Design patterns can be considered micro architectures that contribute to overall system architecture
• Design patterns are helpful
  – In developing a design
  – In communicating the design
  – In understanding a design

A Brief Overview of Design Patterns
• The origins of design patterns are in architecture (not in software architecture)
• Christopher Alexander, a professor of architecture at UC Berkeley, developed a pattern language for expressing common architectural patterns
• Work of Christopher Alexander inspired the work of Gamma et al.
• In explaining the patterns for architecture, Christopher Alexander says: “Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice”
• These comments also apply to software design patterns

Resources for Design Patterns
• Original paper:
  – “Design Patterns: Abstraction and Reuse of Object-Oriented Design” by E. Gamma, R. Helm, R. Johnson, J. Vlissides
• Later, the same authors published a book which contains an extensive catalog of design patterns:
  – “Design Patterns: Elements of Reusable Object-Oriented Software”, by E. Gamma, R. Helm, R. Johnson, J. Vlissides, Addison-Wesley, ISBN 0-201-63361-2

Cataloging Design Patterns
• Gamma et al. present:
  – A way to describe design patterns
  – A way to organize design patterns by giving a classification system
• More importantly, in their book on design patterns, the authors give a catalog of design patterns
  – As a typical developer you can use patterns from this catalog
  – If you are a good developer you can contribute to the catalog by discovering and reporting new patterns
• The template for describing design patterns used by Gamma et al. is given in the next slide

Design Pattern Template
DESIGN PATTERN NAME: The name should convey the pattern’s essence succinctly.
Jurisdiction, Characterization: Used for categorization.
Intent: What particular design issue or problem does the design pattern address?
Motivation: A scenario in which the pattern is applicable. This will make it easier to understand the more abstract description that follows.
Applicability: What are the situations in which the design pattern can be applied?
Participants: Describe the classes and/or objects participating in the design pattern and their responsibilities.
Collaborations: Describe how the participants collaborate to carry out their responsibilities.
Diagram: A class diagram representation of the pattern (extended with pseudo-code).
Consequences: What are the trade-offs and results of using the pattern?
Implementation: What pitfalls, hints, or techniques should one be aware of when implementing the pattern?
Examples: Examples of applications of the pattern in real systems.
See Also: What are the related patterns and what are their differences?

Observer Pattern
• Observer pattern is a design pattern based on the Model-View-Controller (MVC) architecture
• In the next slides, I will give the design pattern catalog entry for the Observer pattern

Observer (Behavioral)

Intent
The Observer pattern defines a one-to-many dependency between a subject object and any number of observer objects, so that when the subject object changes state, all its observer objects are notified and updated automatically.

Motivation
The Observer design pattern has two parts: subject and observer. The relationship between subject and observer is one-to-many. In order to reuse subject and observer independently, their relationship has to be decoupled. An example of using the Observer pattern is a graphical interface toolkit which separates the presentation aspect from the application data. The presentation aspect is the observer part and the application data aspect is the subject part. For example, in a spreadsheet program, the Observer pattern can be applied to separate the spreadsheet data from its different views. In one view spreadsheet data can be presented as a bar graph and in another view it can be presented as a pie chart. The spreadsheet data object notifies the observers whenever there is a data change that can make its state inconsistent with its observers.
Class Diagram for the Observer Pattern
[Class diagram]
• Subject: holds the list observers; operations Attach(Observer), Detach(Observer), Notify()
  – Notify(): for all o in observers { o->Update(); }
• Observer: operation Update()
• ConcreteSubject: holds subjectState; operations GetState(), SetState()
  – GetState(): return subjectState;
• ConcreteObserver: holds observerState; operation Update()
  – Update(): observerState = subject->GetState();
[Sequence diagram] Participants :ConcreteSubject, a:ConcreteObserver, b:ConcreteObserver; messages in order: SetState(), Notify(), Update(), GetState(), Update(), GetState()

Applicability
Use the Observer pattern in any of the following situations:
• When the abstraction has two aspects with one dependent on the other. Encapsulating these aspects in separate objects will increase the chance to reuse them independently.
• When the subject object doesn't know exactly how many observer objects it has.
• When the subject object should be able to notify its observer objects without knowing who these objects are.

Participants
• Subject
  • Knows its observers
  • Has any number of observers
  • Provides an interface for attaching and detaching observer objects at run time
• ConcreteSubject
  • Stores the subject state that is of interest to the observers
  • Sends notifications to its observers
• Observer
  • Provides an update interface to receive signals from the subject
• ConcreteObserver
  • Maintains a reference to a ConcreteSubject object
  • Maintains the observer state
  • Implements the update operation

Consequences
Further benefits and drawbacks of the Observer pattern include:
• Abstract coupling between subject and observer: each can be extended and reused individually.
• Dynamic relationship between subject and observer: such a relationship can be established at run time. This gives a lot more programming flexibility.
• Support for broadcast communication. The notification is broadcast automatically to all interested objects that subscribed to it.
• Unexpected updates. Observers have no knowledge of each other and are blind to the cost of changing the subject. With the dynamic relationship between subject and observers, the update dependency can be hard to track down.
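The participants and collaborations of this catalog entry can be sketched in Python as follows. The operation names mirror the class diagram (Attach, Detach, Notify, Update, GetState, SetState) in Python naming style; the classes `SpreadsheetData` and `ChartView` are hypothetical stand-ins for ConcreteSubject and ConcreteObserver, chosen to match the spreadsheet example in the Motivation.

```python
class Subject:
    """Knows its observers and notifies them of state changes."""

    def __init__(self):
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def detach(self, observer):
        self._observers.remove(observer)

    def notify(self):
        # for all o in observers { o->Update(); }
        for observer in list(self._observers):
            observer.update()


class SpreadsheetData(Subject):
    """ConcreteSubject: stores the state the observers care about."""

    def __init__(self):
        super().__init__()
        self._state = {}

    def get_state(self):
        return dict(self._state)

    def set_state(self, state):
        self._state = dict(state)
        self.notify()


class ChartView:
    """ConcreteObserver: keeps its own copy of the subject state."""

    def __init__(self, kind, subject):
        self.kind = kind
        self.subject = subject
        self.observer_state = {}
        subject.attach(self)

    def update(self):
        # observerState = subject->GetState();
        self.observer_state = self.subject.get_state()


data = SpreadsheetData()
bar = ChartView("bar graph", data)
pie = ChartView("pie chart", data)

data.set_state({"a": 50, "b": 30, "c": 20})   # both views are updated

data.detach(pie)                               # pie no longer subscribed
data.set_state({"a": 60, "b": 20, "c": 20})   # only bar is updated
```

After the second `set_state` call the detached pie chart still holds the old values, which is what a view looks like when it is not notified of a change in its subject.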
Known Uses
• Smalltalk Model/View/Controller (MVC): a user interface framework where Model is the subject and View is the observer

Back to MVC
How do the MVC architecture and the Observer pattern relate to Web applications?
• The reason we are discussing the MVC architecture is that many Web applications nowadays are built based on the MVC architecture
• The reason we are discussing the Observer pattern is that MVC-based Web applications do not properly use the Observer pattern, causing problems

MVC Architecture in Web Applications
• Many web frameworks support web application development based on the MVC architecture
  – Ruby on Rails, Zend Framework for PHP, CakePHP, Spring Framework for Java, Struts Framework for Java, Django for Python, …
• MVC architecture has become the standard way to structure web applications

MVC Framework for Web Applications
• Use of MVC architecture in Web applications
  – Model: This is the data model, which is an abstract representation of the data stored in the back-end database. Typically uses an object-relational mapping to map the class structure for the data model to the tables in the back-end database
  – Views: These are responsible for rendering of the web pages, i.e., how the data is presented in the user’s browser
  – Controllers: Controllers are basically event handlers that process incoming user requests.
Based on a user request, they can update the data model and create a new view to be presented to the user

MVC Framework for Web Applications
• Note that use of MVC in web applications does not fit the Observer pattern
  – typically it is not possible to refresh a browser window directly when the data model changes (i.e., it is not possible to actively notify the observers when the state of the subject has changed)
• This can create navigation problems
  – when there are multiple windows open, they may represent stale views
  – this was the problem in the Orbitz example we discussed earlier

Abstraction in MVC Frameworks
• MVC framework provides separation of concerns and abstraction, which can be exploited for analysis
  – For example, for analyzing properties of the data model we can focus on the data model and ignore the views
  – We can focus on the behaviors of the controllers to eliminate navigation errors

Achieving Navigation Correctness in MVC
• I will discuss some work we have done recently on analyzing navigation behavior in web applications developed using MVC frameworks
• The idea is
  – to exploit the abstraction provided by the MVC architecture by enforcing navigation constraints at the controller
  – to use model driven development to provide a navigation model and analyze it statically
  – to enforce the navigation model dynamically at runtime to prevent navigation errors

Request processing in a Web application

A formal model
We can formally model an MVC application using
• M: a set of data model states, where the data model can include any stateful representation of application data, such as a database,
• V: a set of session variables, i.e., data stored on the server on a per-client basis,
• I: a set of sessions used by the server to associate clients with session variables,
• A: a set of controller actions, i.e., the program segments that are invoked based on the HTTP requests sent by the user,
• P: a set of request parameters, i.e., input data from the user
received as part of the HTTP requests via GET or POST.

A formal model
• A web application is a tuple A = (Q, B, T) where:
  – Q is the set of states, which is the Cartesian product of the model states and the domains of the session variables for each session
  – B is the set of initial states
  – T is the transition relation mapping a state, an action, a set of request parameters and a session to a next state
• The transition relation must guarantee that each session can only modify its own session variables
• The model can change even when there is no action executed (i.e., the backend database contents can change without a request from a user)

Session traces
• Given the formal model, we can define global execution traces of a web application
  – Each trace starts from an initial state
  – Each element in the trace is a tuple: (state, action, request parameters, session index)
  – Any two consecutive tuples in the trace must be consistent with the transition relation of the application
• We can project each global trace to a session i (by deleting the tuples which do not contain i) and obtain a session trace

Navigation state machines (NSMs)
• A navigation state machine (NSM) is a state machine
  – that specifies acceptable sequences of actions and request parameters that can appear in a session trace
• Given a navigation state machine and a session trace,
  – the session trace conforms to the navigation state machine
    • if the sequence of actions and request parameters for that session trace is accepted by the navigation state machine

Navigation state machines
• The states of a navigation state machine are defined by
  – the values of the session variables,
  – the last action executed by the application, and
  – the request parameters that were sent with the last action
• We assume that this information is enough to figure out which actions can be executed next by the application

Navigation state machines
• We developed a simple language to specify navigation state machines
• It is a state machine that shows the allowable sequences of controller action executions in a web application
• MVC frameworks typically use a hierarchical structure where actions are combined in the controllers and controllers are grouped into modules
  – We exploit this hierarchy to specify the navigation state machines as hierarchical state machines (statecharts)

Navigation state machines
• In addition to identifying which action can be executed after which other action,
  – navigation state machines also identify constraints among the request parameters between two consecutive requests
  – This can be used, for example, to make sure that the values stored in cookies are not changed

Navigation state machine example
• A portion of the navigation state machine for the Digitalus system (an open source content management system)

What can we do with NSMs
• If we can check that the web application conforms to the NSM,
  – then we can verify navigation properties on the NSM and conclude that the navigation properties hold for the application
  – We can use model checking to check properties of NSMs
  – This way we can eliminate the navigation errors
• Big problem: How do we ensure that the application conforms to the NSM?
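To make session traces and conformance concrete, here is a small Python sketch. It is not the NSM specification language from the paper: the `NSM` class, the flat (non-hierarchical) transition table, the `same_user` parameter guard, and the login/view/logout actions are assumptions made for illustration, and trace elements omit the state component, keeping only (action, request parameters, session index).

```python
def project(global_trace, session):
    """Keep only the steps of one session, dropping the session index."""
    return [(action, params)
            for (action, params, s) in global_trace if s == session]


class NSM:
    def __init__(self, initial, transitions):
        self.initial = initial
        # (state, action) -> (guard(prev_params, params), next_state)
        self.transitions = transitions

    def accepts(self, session_trace):
        state, prev_params = self.initial, {}
        for action, params in session_trace:
            entry = self.transitions.get((state, action))
            if entry is None:
                return False          # action not allowed in this state
            guard, next_state = entry
            if not guard(prev_params, params):
                return False          # parameter constraint violated
            state, prev_params = next_state, params
        return True


def same_user(prev_params, params):
    # Constraint between consecutive requests: "user" must not change.
    return prev_params.get("user") in (None, params.get("user"))


nsm = NSM("anonymous", {
    ("anonymous", "login"): (same_user, "logged_in"),
    ("logged_in", "view"): (same_user, "logged_in"),
    ("logged_in", "logout"): (same_user, "anonymous"),
})

global_trace = [
    ("login", {"user": "ada"}, 1),
    ("login", {"user": "bob"}, 2),
    ("view", {"user": "ada"}, 1),
    ("logout", {"user": "bob"}, 2),
    ("logout", {"user": "ada"}, 1),
]

conforms = nsm.accepts(project(global_trace, 1))
view_before_login = nsm.accepts([("view", {"user": "ada"})])
user_changed = nsm.accepts([("login", {"user": "ada"}),
                            ("view", {"user": "eve"})])
```

The interleaved global trace conforms for session 1, while a view before login and a request whose user parameter changes between consecutive requests are both rejected.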
Runtime Enforcement
• Statically verifying that a web application conforms to a navigation state machine is a very difficult problem (in general undecidable)
• So, instead, we use runtime enforcement
  – We have a plugin that can be easily added to an MVC web application; it takes an NSM as input and makes sure that every incoming request conforms to the NSM
  – If the incoming request does not obey the NSM, then the plugin either ignores the request and refreshes the previous page or generates an appropriate error message
  – This way non-compliant user requests can be handled uniformly without generating strange error messages

Model checking NSMs
• We translate NSM models to SMV
• We write navigation constraints in temporal logic: G (login => (!login U logout))
• We check the properties using a model checker

Model checking and Runtime Enforcement
• Our approach combines the following ideas:
  – Using model driven development for specification of navigation constraints
  – Using model checking to verify properties of formal navigation models
  – Using runtime enforcement to ensure that the navigation behavior at runtime obeys the navigation model
• We show that when all these are combined, navigation errors can be eliminated
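The runtime enforcement idea can be sketched as a thin layer in front of the controllers. This is a hedged illustration, not the actual plugin: the class `NavigationEnforcer`, its `handle` method, the flat transition table, and the three example actions are hypothetical. The sketch tracks one NSM state per session, runs the controller only when the requested action is allowed, and otherwise re-serves the previous page (or a fixed error message if there is none).

```python
class NavigationEnforcer:
    def __init__(self, initial, transitions, controllers):
        self.initial = initial
        self.transitions = transitions    # (state, action) -> next state
        self.controllers = controllers    # action -> function returning a page
        self.state = {}                   # session -> current NSM state
        self.last_page = {}               # session -> last page served

    def handle(self, session, action):
        current = self.state.get(session, self.initial)
        next_state = self.transitions.get((current, action))
        if next_state is None:
            # Non-compliant request: do not run the controller.
            return self.last_page.get(session,
                                      "error: request not allowed here")
        page = self.controllers[action]()
        self.state[session] = next_state
        self.last_page[session] = page
        return page


executed = []   # records which controller actions actually ran


def controller(name, page):
    def run():
        executed.append(name)
        return page
    return run


enforcer = NavigationEnforcer(
    initial="anonymous",
    transitions={
        ("anonymous", "login"): "logged_in",
        ("logged_in", "view"): "logged_in",
        ("logged_in", "logout"): "anonymous",
    },
    controllers={
        "login": controller("login", "welcome page"),
        "view": controller("view", "item page"),
        "logout": controller("logout", "goodbye page"),
    },
)

r1 = enforcer.handle("s1", "view")     # rejected: not logged in yet
r2 = enforcer.handle("s1", "login")    # allowed
r3 = enforcer.handle("s1", "login")    # rejected: previous page re-served
r4 = enforcer.handle("s1", "logout")   # allowed
```

In this run only the login and logout controllers execute; the early view and the second login are absorbed by the enforcer, so the application code never sees a request that violates the navigation model.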