Eliminating Bugs In MVC-Style Web Applications Tevfik Bultan Verification Lab (Vlab), CS@UCSB

bultan@cs.ucsb.edu
http://www.cs.ucsb.edu/~vlab
Joint work with my students & postdocs
• Fang Yu
• Muath Alkhalaf
• Jaideep Nijjar
• Sylvain Halle
Web software everywhere
• Commerce, entertainment, social interaction
• We will rely on web apps more in the future
• Web apps + cloud will make desktop apps obsolete
One major road block
• Web applications are not trustworthy
– They are notorious for security vulnerabilities and
unreliable behavior
• Web applications are hard to program
– Distributed behavior, interaction among many
components and many languages
• Web applications are hard to test
– Highly dynamic behavior and concurrency
• Testing accounts for 50% of development costs, and
software failures cost the USA $60 billion annually
Why are web applications error prone?
• Script-oriented programming
– A collection of scripts with no explicit control flow
• Extensive string manipulation
– User input comes in string form, output (html, SQL) is
generated via string manipulation
• Complex interactions
– User interaction via browsers, client-side/server-side
interaction, interaction with the backend database
• Distributed system
– Many components have to coordinate
VLab Research Agenda
• Improve dependability of web applications
– How?
• By eliminating bugs
– How?
» Use automated verification
• By eliminating bugs I do not mean debugging
– Debugging is done after the bug is found
• Finding the bug is the hard part
• Our goal is to find the bugs automatically
– and eliminate them before the software is deployed
• we do this using automated verification
We have a hammer
Automated Verification
Actually, we have several hammers
Automated verification techniques:
• Bounded verification via SAT solvers
• Model checking
• Symbolic execution via SMT solvers
• String analysis
Unfortunately, life is complicated
Web application
problems
Automated verification
techniques
Automated verification is hard
• If it wasn’t, everybody would be using it
• It is hard because software systems are too complex
• In order to make automated verification feasible
– we need to focus our attention
Making automated verification feasible
• We focus our attention by
– Abstraction
• Hide details that do not involve the things we are
checking
– Modularity
• We focus on one part of the system at a time
– Separation of concerns
• We focus on one property at a time
• It turns out these are also the main principles of software
design
– We should be able to exploit the design structure in
automated verification!
Separation of concerns
• First, we need to identify our concerns
– What should we be concerned with if we want to
eliminate the bugs in web applications
• Here are some concerns:
– Input validation
• Errors in input validation are a major cause of
security vulnerabilities in web applications
– Navigation
• Many web applications do not handle unexpected
user requests appropriately
– Data model
• Integrity of the data model is essential for
correctness of web applications
Why care about input validation?
[Chart: string-related web application vulnerabilities (File Inclusion, XSS, SQL Injection) as a percentage of all vulnerabilities reported by CVE, 2001 to 2009; vertical axis 0% to 50%]
• OWASP Top 10 in 2007: 1. Cross Site Scripting, 2. Injection Flaws
• OWASP Top 10 in 2010: 1. Injection Flaws, 2. Cross Site Scripting
Why care about navigation?
Why care about data model?
• The data in the back-end database is the most important
resource for most web applications
• Integrity and consistency of this data is crucial for
dependability of the web application
Separation of concerns
• By focusing on one problem at a time:
– Input validation errors
– Navigation errors
– Data modeling errors
We are more likely to develop feasible verification techniques
Still, we need more help!
Modularity: Model-View-Controller
• Model-view-controller (MVC) architecture has become the
standard way to structure web applications
– Ruby on Rails, Zend Framework for PHP, CakePHP,
Spring Framework for Java, Struts Framework for Java,
Django for Python, …
• MVC architecture separates
– where and how the data is stored (the model)
– from how it is presented (the views)
Model-View-Controller (MVC) architecture
• MVC consists of three modules
– Model represents the data
– View is its presentation to the user
– Controller defines the way the application reacts to
user input
[Figure: one model (a=50%, b=30%, c=20%) presented through multiple views]
MVC framework for web applications
• Model:
– An abstract representation of the data stored in the
backend database
– Uses an object-relational mapping to map the
application class structure to the back-end database
tables
• Views:
– Responsible for rendering of the web pages, i.e., how is
the data presented in user’s browser
• Controllers:
– Event handlers that process incoming user requests
– They can update the data model, and create a new view
to be presented to the user
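The division of labor described above can be sketched in a few lines. This is only an illustrative toy, not code from any of the frameworks named in the talk: the class and function names (`PollModel`, `table_view`, `bar_view`, `PollController`) are hypothetical, and the in-memory dictionary stands in for an ORM-backed database table.

```python
from dataclasses import dataclass, field


@dataclass
class PollModel:
    """Model: holds the data (a stand-in for ORM-backed records)."""
    shares: dict = field(default_factory=lambda: {"a": 50, "b": 30, "c": 20})


def table_view(model: PollModel) -> str:
    """View 1: render the model as name=value lines."""
    return "\n".join(f"{name}={share}%" for name, share in sorted(model.shares.items()))


def bar_view(model: PollModel) -> str:
    """View 2: render the same model as a crude bar chart."""
    return "\n".join(f"{name} {'#' * (share // 10)}" for name, share in sorted(model.shares.items()))


class PollController:
    """Controller: event handler that processes an incoming request."""

    def __init__(self, model: PollModel) -> None:
        self.model = model

    def handle(self, action: str, params: dict) -> str:
        # A controller may update the model ...
        if action == "update":
            self.model.shares[params["name"]] = int(params["value"])
        # ... and then picks the view to present to the user.
        view = bar_view if params.get("format") == "bars" else table_view
        return view(self.model)


controller = PollController(PollModel())
print(controller.handle("show", {}))
```

Because the two views read the same model and only the controller writes to it, each of the three parts can be inspected separately, which is the property the verification techniques in this talk rely on.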
Exploiting MVC for verification
• Modularity provided by MVC can be exploited in verification
• For checking navigation properties
– Focus on controllers
• For checking input validation
– Focus on controllers
• For checking data model properties
– Focus on the model
Still, need more help
• Even with
– the separation of concerns
• check one type of property at a time (navigation,
input validation, data model)
– and modularity
• focus on only one component at a time (controller or
the model)
we still need more help!
To completely automate the verification process
– We need to find a way to map the code written in a
programming language to something that an automated
verification tool can check
We have a verification model problem!
How to get verification models?
• The verification model is an abstraction of the behavior of
the application
• We obtain this abstraction using two basic approaches:
1. When possible we automatically extract a verification
model from the code using automated abstraction +
static analysis
• Automated model extraction
2. If we cannot automatically extract the model, we ask
the developer to write a model. In this case we have to
enforce the specified model at runtime
• Automated model enforcement
Three verification techniques
1. Concern: Input validation checking
– Module: Controller
– Verification model extraction via abstraction:
Dependency analysis to extract dependency graph
– Verification technique: String analysis
2. Concern: Navigation correctness
– Module: Controller
– Verification model specification by developer:
Automated enforcement of navigation model at runtime
– Verification technique: Model checking
3. Concern: Data model integrity
– Module: Data model
– Verification model extraction via abstraction:
Translation of active record declarations to an Alloy specification
– Verification technique: Bounded SAT-based verification
Making it work
Separation of concerns + modularity + abstraction
• Input validation: String analysis
• Navigation correctness: Model checking
• Data integrity: Bounded SAT-based verification
Input validation verification
• Application/Scripts → Parser/Taint Analysis → (Tainted) Dependency Graphs
• (Tainted) Dependency Graphs + Attack Patterns → Vulnerability Analysis → Reachable Attack Strings
• Reachable Attack Strings → Signature Generation → Vulnerability Signature
• Vulnerability Signature → Patch Synthesis → Sanitization Statements
An example XSS vulnerability
A PHP Example:
1: <?php
2: $www = $_GET["www"];
3: $l_otherinfo = "URL";
4: $www = preg_replace("[^A-Za-z0-9 .-@://]", "", $www);
5: echo "<td>" . $l_otherinfo . ": " . $www . "</td>";
6: ?>
• Line 4 is meant to remove all characters that are not in {A-Za-z0-9 .@:/}
• However, “.-@” denotes all characters between “.” and “@” (including “<” and “>”)
• For example, the input <!sc+ri++pt! ... > is sanitized to <script ... >
• Our string analysis automatically detects that there is an XSS vulnerability in line 5
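The range problem in line 4 of the PHP example is easy to reproduce. The sketch below uses Python's `re` module rather than PHP, on the assumption that both treat `.-@` inside a character class as a range; the `FIXED` pattern, which escapes the hyphen, is one possible repair chosen for illustration and is not taken from the talk.

```python
import re

# Same character class as line 4 of the PHP example: ".-@" is read as a range.
BUGGY = re.compile(r"[^A-Za-z0-9 .-@://]")
# Hypothetical repair: escape the hyphen so it is a literal character.
FIXED = re.compile(r"[^A-Za-z0-9 .\-@:/]")


def sanitize(pattern: re.Pattern, text: str) -> str:
    """Delete every character matched by the negated character class."""
    return pattern.sub("", text)


def chars_in_range(lo: str, hi: str) -> str:
    """List the characters a regex range lo-hi stands for."""
    return "".join(chr(code) for code in range(ord(lo), ord(hi) + 1))


attack = "<!sc+ri++pt!>"
print(chars_in_range(".", "@"))   # the range includes < and >
print(sanitize(BUGGY, attack))    # the angle brackets survive
print(sanitize(FIXED, attack))    # the angle brackets are removed
```

The "!" and "+" characters fall outside the range and are deleted, which is exactly what turns the disguised input into a working script tag.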
Automatically generated patch
• Our string analysis generates a vulnerability signature that
characterizes all possible input strings that can exploit the
vulnerability
• Our patch generation algorithm automatically generates a
patch that removes the vulnerability
1: <?php
P: if (preg_match('/[^ <]*<.*/', $_GET["www"]))
       $_GET["www"] = preg_replace('/</', "", $_GET["www"]);
2: $www = $_GET["www"];
3: $l_otherinfo = "URL";
4: $www = preg_replace("[^A-Za-z0-9 .-@://]", "", $www);
5: echo "<td>" . $l_otherinfo . ": " . $www . "</td>";
6: ?>
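The effect of the patch can be checked on a concrete input. The following is a rough Python transcription of the patched script, not the generated PHP itself: `render` is a hypothetical helper that mirrors lines P, 4 and 5, and `re.search` is used on the assumption that it behaves like the unanchored `preg_match`.

```python
import re

SIGNATURE = re.compile(r"[^ <]*<.*")             # vulnerability signature (line P)
SANITIZER = re.compile(r"[^A-Za-z0-9 .-@://]")   # unchanged sanitizer (line 4)


def render(www: str, patched: bool) -> str:
    """Mirror the PHP script: optional patch, sanitization, then output."""
    if patched and SIGNATURE.search(www):
        # Line P: inputs matching the signature lose every "<".
        www = www.replace("<", "")
    www = SANITIZER.sub("", www)
    return "<td>URL: " + www + "</td>"


attack = "<!sc+ri++pt!>"
print(render(attack, patched=False))  # still contains a script tag
print(render(attack, patched=True))   # the opening "<" is gone
print(render("http://a.b/c", patched=True))  # benign input is unchanged
```

The original sanitizer is left in place; the patch only adds a guard in front of it, so inputs that never matched the signature are processed exactly as before.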
Experiments
We applied our analysis to three open source PHP applications
• Webchess 0.9.0 (a server for playing chess over the internet)
• EVE 1.0 (a tracker for player activity for an online game)
• Faqforge 1.3.2 (a document management tool)
#  Application     # Files  LOC   # XSS Sinks  # SQLI Sinks
1  Webchess 0.9.0  23       3375  421          140
2  EVE 1.0         8        906   114          17
3  Faqforge 1.3.2  10       534   375          133
•Attack patterns:
– Σ∗<scriptΣ∗ (for XSS)
– Σ∗ or 1=1 Σ∗ (for SQLI)
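An attack pattern of the form Σ∗<scriptΣ∗ simply describes every string that contains the given substring. As a minimal sketch, the two patterns can be written as ordinary regular expressions and tested against one concrete sink value at a time. Note the assumptions: the actual analysis reasons about sets of strings rather than single strings, the helper name `matches_attack` is hypothetical, and the spaces around "or 1=1" are one reading of the pattern as printed.

```python
import re

# Sigma* is rendered as ".*" with DOTALL so that it also covers newlines.
ATTACK_PATTERNS = {
    "XSS": re.compile(r".*<script.*", re.DOTALL),
    "SQLI": re.compile(r".* or 1=1 .*", re.DOTALL),
}


def matches_attack(kind: str, sink_value: str) -> bool:
    """Return True if the whole sink value belongs to the attack language."""
    return ATTACK_PATTERNS[kind].fullmatch(sink_value) is not None


print(matches_attack("XSS", "<td>URL: <script>x</td>"))
print(matches_attack("XSS", "<td>URL: script></td>"))
print(matches_attack("SQLI", "SELECT * FROM users WHERE id=1 or 1=1 --"))
```

A sink is reported as vulnerable when some value that can reach it matches the pattern; checking individual strings as above only illustrates what membership in the pattern means.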
Verification Results

(single, 2, 3, 4) gives the number of detected vulnerabilities
that have one, two, three, and four inputs

App  Type  # Vulnerabilities   Time (s)                           Memory (KB)
           (single, 2, 3, 4)   total   fwd   bwd    relational    average
1    XSS   (24, 3, 0, 0)       46.08   1.73  0.92   6.30          16850
1    SQL   (43, 3, 1, 2)       110.7   4.87  12.04  38.03         136790
2    XSS   (0, 0, 8, 0)        288.50  6.80  -*     127.80        125382
2    SQL   (8, 3, 0, 0)        23.9    1.5   8.47*  5.2           17280
3    XSS   (20, 0, 0, 0)       7.87    0.22  0.22   -             9948
3    SQL   (0, 0, 0, 0)        6.7     -     -      -             <1
* Only one bwd time is available for application 2; its row assignment is uncertain
Navigation verification
• We ask the developer to specify the high-level navigation
behavior as a navigation state machine (NSM)
• MVC frameworks typically use a hierarchical structure
where actions are combined in the controllers and
controllers are grouped into modules
– We exploit this hierarchy to specify the navigation state
machines as hierarchical state machines
Navigation verification
• Static verification: Navigation State Machine + Navigation Properties → Translator → SMV → Verified or counter-example behavior
• Runtime enforcement: Navigation State Machine → NSM Plugin (NSM Interpreter)
Navigation verification experiments
• We studied three real-world, freely available web
applications:
– BambooInvoice: invoice management application
159,000 lines of code, 60 actions
– Capstone: student project management system
41,000 lines of code, 33 actions
– Digitalus: content management system
401,000 lines of code, 26 actions
Specifying the NSM
• We specified the NSMs manually, by exploring potential
error sequences in applications
– Amount of effort: Half a day per application (including
taking screenshots, drawing the graph, etc.)
Application    States  Transitions  Variables
Digitalus      32      48           7
BambooInvoice  63      80           8
Capstone       8       16           1
Model checking NSMs
• We used an existing model checker (SMV) to statically
check navigation properties of NSMs
• Some example navigation properties:
– Once you login, the only way to go back to the login
page is by traversing the logout page
– Each controller has the 'index' action as its only entry
point from other controllers
• We can statically verify the NSMs with the SMV model checker
– Verification uses less than 4.4 MB of memory and takes
less than 1 second
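The first example property can be illustrated with a plain graph search. To be clear about what is swapped in: SMV checks temporal-logic properties symbolically, whereas this sketch only runs a breadth-first search over a tiny, made-up NSM. The page names and the helpers `reachable` and `login_only_via_logout` are hypothetical, and every successor of "login" is treated as a logged-in page.

```python
from collections import deque

# Hypothetical navigation state machine: page -> pages reachable in one step.
NSM = {
    "login": {"home"},
    "home": {"invoices", "logout"},
    "invoices": {"home", "logout"},
    "logout": {"login"},
}


def reachable(nsm: dict, start: str, blocked: frozenset = frozenset()) -> set:
    """Pages reachable from start without ever entering a blocked page."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in nsm.get(page, ()):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def login_only_via_logout(nsm: dict) -> bool:
    """True if no logged-in page can get back to 'login' while avoiding 'logout'."""
    return all(
        "login" not in reachable(nsm, page, blocked=frozenset({"logout"}))
        for page in nsm["login"]
    )


print(login_only_via_logout(NSM))
```

Adding a direct edge from "invoices" to "login" makes the check fail, and the path found by the search plays the role of the counter-example behavior a model checker would report.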
Runtime enforcement of NSMs
• PHP plugin for enforcement of NSMs at runtime
– Intercepts page requests in an MVC application and
validates them against the NSM
– One line of code to insert in the application
– Simple: 1,100 lines of PHP code (3% of the smallest
application)
• Average processing time when an action conforms to the
NSM:
Application    Time without plugin (ms)  Time with plugin (ms)
Digitalus      11                        12
BambooInvoice  183                       199
Capstone       90                        122
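The idea of intercepting each page request and validating it against the NSM can be sketched as follows. This is not the PHP plugin from the talk: the `NSMEnforcer` class, the `NavigationError` exception and the example state machine are all hypothetical, and a real plugin would also have to keep one current state per user session.

```python
class NavigationError(Exception):
    """Raised when a request does not follow a transition of the NSM."""


class NSMEnforcer:
    """Tracks the current page and rejects requests the NSM does not allow."""

    def __init__(self, nsm: dict, initial: str) -> None:
        self.nsm = nsm
        self.current = initial

    def check(self, requested: str) -> str:
        allowed = self.nsm.get(self.current, set())
        if requested not in allowed:
            # Block the request and leave the current state unchanged.
            raise NavigationError(f"{self.current} -> {requested} is not allowed")
        self.current = requested
        return requested


# Hypothetical navigation state machine: page -> pages reachable in one step.
NSM = {
    "login": {"home"},
    "home": {"invoices", "logout"},
    "invoices": {"home", "logout"},
    "logout": {"login"},
}

enforcer = NSMEnforcer(NSM, initial="login")
enforcer.check("home")
enforcer.check("invoices")
try:
    enforcer.check("login")  # skips the logout page, so it is rejected
except NavigationError as err:
    print("blocked:", err)
```

In an application, the single inserted line of code would correspond to calling something like `check` before the requested action is dispatched.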
Runtime enforcement of NSMs
• Processing time when an action provokes a PHP warning
(which the NSM-enabled application blocks):
Application    Action         Time without plugin (ms)  Time with plugin (ms)
Digitalus      Create folder  28                        26
Digitalus      Upload media   18                        32
BambooInvoice  New invoice    938                       574
• ...and when an action provokes a PHP error:
Application    Action         Time without plugin (ms)  Time with plugin (ms)
Digitalus      Edit page      416                       32
Digitalus      Delete folder  424                       36
BambooInvoice  View invoice   564                       594
Data model verification
• We worked on verification of static data model properties in
Ruby-on-Rails applications
• Rails uses active records for object-relational mapping
• There are many options in active records declarations that
can be used to specify constraints about the data model
• Our goal:
– Automatically analyze the active records files in Rails
applications to check if the data model satisfies some
properties that we expect it to satisfy
• Approach:
– Bounded SAT based verification
– We automatically search for data model errors within a
given scope
Data Model Verification
• Active Records → Translator → Alloy Specification
• Alloy Specification + Data Model Properties → Alloy Analyzer → Verified or counter-example data model instance
Data Model Verification Experiments
• We used two open source Rails applications in our
experiments:
– TRACKS: An application to manage things-to-do lists
• 6062 lines of code, 44 classes, 13 data model
classes
• Alloy translation: 301 lines
– Fat Free CRM: Customer Relations Management
software
• 12069 lines of code, 54 classes, 20 data model
classes
• Alloy translation: 1082 lines
Types of properties we checked
• Relationship cardinality
– Is it possible to have two accounts for one user?
• Transitive relations
– A book’s author is the same as the book’s edition’s
author
• Deletion properties
– Deleting a user does not create orphan orders
• We wrote 10 properties for TRACKS and 20 properties for
Fat Free CRM
Data Model Verification Experiments
• Of the 30 properties we checked, 7 failed
• For example, in TRACKS a Note’s User can be
different from the User of the Note’s Project
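Searching for a counter-example within a given scope can be illustrated by brute force. The sketch below is not how the Alloy Analyzer works (it translates the problem to SAT); plain enumeration is swapped in to show what "bounded" means. The data model is an assumption read off the failing property: a Note belongs to a User and to a Project, and a Project belongs to a User. The function name and the scope parameters are hypothetical.

```python
from itertools import product


def find_counterexample(scope_users: int = 2, scope_projects: int = 1, scope_notes: int = 1):
    """Enumerate every data model instance within the scope and return the
    first one where a note's user differs from its project's user."""
    users = range(scope_users)
    projects = range(scope_projects)
    # project_owner[p] is the user of project p; likewise for the note maps.
    for project_owner in product(users, repeat=scope_projects):
        for note_user in product(users, repeat=scope_notes):
            for note_project in product(projects, repeat=scope_notes):
                for n in range(scope_notes):
                    if note_user[n] != project_owner[note_project[n]]:
                        return {
                            "project_owner": project_owner,
                            "note_user": note_user,
                            "note_project": note_project,
                        }
    return None  # the property holds for every instance in this scope


print(find_counterexample(scope_users=1))  # a single user cannot disagree with itself
print(find_counterexample(scope_users=2))  # a violating instance exists
```

A result of `None` only says that no violation exists within the chosen scope; a larger scope could still contain one, which is the usual trade-off of bounded verification.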
Conclusions
• Web application dependability is a very important problem
• Using automated verification we can find and remove
errors in web applications before they are deployed
• In order to develop feasible automated verification
techniques we exploit the MVC architecture and the
principles of modularity, abstraction and separation of
concerns
• At minimum, the developer has to specify the correctness
properties (e.g. attack patterns)
– In some cases, the developer may also need to specify
desired behavior (e.g. navigation)
Future work
• Handle more concerns
• Maximize automation
• Mobile applications