
A Systematic Analysis of XSS Sanitization in
Web Application Frameworks
Executive summary
• Web page processing analyzed in detail
• Sanitization is quite complex
• Context sensitive
• 14 web frameworks analyzed
• None handle sanitization properly
• In some cases they give a false sense of security because the
algorithm is wrong
HTTP background
Basic HTTP operation
• Client sends request to server
• Server locates and sends back file:
<h1>Sample file</h1>
<p>This is a sample</p>
• Client displays:
Sample file
This is a sample
HTTP background
Server side scripting
• Client sends request to server
• Server runs the script:
echo '<h1>Sample file</h1>';
echo '<p>This is a sample</p>';
• Server sends the generated output:
<h1>Sample file</h1>
<p>This is a sample</p>
• Client displays:
Sample file
This is a sample
HTTP background
Form management
• User fills in fields:
Please send me your important financial
Name: Mr. Dummy__
Soc: 234-23-5555
Credit card number:
• Client sends data to server
• Server processes the data:
# save data somewhere
echo '<p>Now I own you.</p>';
• Server sends page to client:
Now I own you.
HTTP background
Client side scripting
<h1>My First Web Page</h1>
<script type="text/javascript">
document.write("<p>" + Date() + "</p>");
</script>
HTTP background
Client side scripting
<h1>My First Web Page</h1>
<p>Tue Feb 28 2012 14:28:07 GMT-0500 (EST)</p>
HTTP background
Client side scripting
My First Web Page
Tue Feb 28 2012 14:28:07 GMT-0500 (EST)
XSS attack
Server side code prints text entered by a user from an
earlier session. Consider this code:
echo '<p>Note from '.$user.'</p>';
echo '<p>'.$note.'</p>';
Suppose $note contains
<script>document.write("<img src=" + document.cookie + ">")</script>
The sky is falling.
XSS attack
The result is that the following is sent to your browser:
<p>Note from Mr. Apocalypse</p>
<p><script>document.write("<img src=" +
document.cookie + ">")</script>The sky is falling.</p>
XSS attack
Your browser displays the following:
Note from Mr. Apocalypse
[img] The sky is falling.
And the attacker has gotten your cookie.
XSS attack
The attacker simply needed to enter this script on the
screen used to post the note.
Logged in as: Mr. Apocalypse
Text of message to post:
<script>document.write("<img src=" +
document.cookie + ">")</script>The sky is falling._______
Any website that echoes back a user input can be used for
an XSS attack.
XSS attack
• The following can be used to obtain the cookie for your
bank account:
me=<script>document.write("<img src="
+ document.cookie + ">")</script>'</script>
One solution is to escape sensitive characters:
&lt;script&gt;document.write(&quot;&lt;img src=&quot; +
document.cookie + &quot;&gt;&quot;)&lt;/script&gt;
Problem: sanitization needs to be done in a context
sensitive manner and the rules are very complex
Web page parsing
Challenge 1: context sensitivity
Consider this code:
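echo '<p>'.$note.'</p>';
Here one can replace '<' with &lt; and '>' with &gt; to block
attacks. However, consider:
echo '<img src='.$url.'>';
Consider the following URL:
picture.jpg' onLoad='document.location=…"
It contains no '<' or '>', so the replacement above leaves it
unchanged, yet it adds an onLoad handler to the img tag.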
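A minimal sketch of why one escaper cannot serve both contexts, written in JavaScript rather than the PHP of the slide. The helpers escapeHtmlText and escapeAttribute are hypothetical names, not part of any framework, the src attribute is assumed to be single-quoted, and the payload is a simplified stand-in for the URL above.

```javascript
// Escaper suited to text between tags, e.g. <p>HERE</p>.
function escapeHtmlText(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Escaper suited to a quoted attribute value: it also neutralizes quotes.
function escapeAttribute(s) {
  return escapeHtmlText(s)
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical attacker-controlled value (assumption for illustration).
const url = "picture.jpg' onload='alert(1)";

// Wrong escaper for this context: the quotes survive and add an onload attribute.
const unsafe = "<img src='" + escapeHtmlText(url) + "'>";
// Matching escaper: the whole value stays inside the src attribute.
const safe = "<img src='" + escapeAttribute(url) + "'>";

console.log(unsafe);
console.log(safe);
```

The text escaper is correct for the paragraph case and still fails here, which is the sense in which sanitization is context sensitive.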
Challenge 2: Sanitizing nested contexts
Consider this piece of php code:
echo '<script> var x = '.$UNTRUSTED_DATA.'...</script>';
One needs to block both the possibility of a </script> and
that of a ' to prevent attacks.
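The nested case can be sketched as follows, assuming the untrusted value lands inside a single-quoted JavaScript string literal within a script block. The function name escapeJsStringInScript and the payload are hypothetical, and the escaper is illustrative rather than a complete one.

```javascript
// Hypothetical escaper for untrusted text placed inside a quoted JavaScript
// string literal that itself sits inside a <script> block.
function escapeJsStringInScript(s) {
  return String(s)
    .replace(/\\/g, "\\\\") // backslash first, so later escapes are not doubled
    .replace(/'/g, "\\'")   // inner context: cannot end the string literal
    .replace(/"/g, '\\"')
    .replace(/</g, "\\x3c") // outer context: no literal </script> can appear
    .replace(/>/g, "\\x3e")
    .replace(/\r/g, "\\r")
    .replace(/\n/g, "\\n");
}

// Payload that tries both escapes: out of the string and out of the script block.
const untrusted = "'; alert(1); //</script><script>alert(2)</script>";
const page = "<script> var x = '" + escapeJsStringInScript(untrusted) + "'; </script>";

console.log(page);
```

Both rules are needed: the HTML parser looks for </script> before the JavaScript parser ever sees the quotes.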
Challenge 3: Browser transductions
<div class='comment-box' onclick='displayComment("UNTRUSTED",this)'> ... hidden comment ... </div>
Even if all the " characters are replaced with &quot;, the HTML5
parser decodes the entity before passing the text to the
JavaScript parser.
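The effect can be simulated outside a browser. In this sketch decodeEntities is a hypothetical, deliberately tiny stand-in for the entity decoding the HTML parser applies to attribute values, and the payload is an assumption made for illustration.

```javascript
// Simplified stand-in for the browser's entity decoding of attribute values.
// Only the entities needed for the demonstration are handled.
function decodeEntities(s) {
  return s
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
}

// Hypothetical payload that closes the string argument of displayComment.
const untrusted = '"); alert(1); ("';

// HTML-escaping alone: the browser undoes it before JavaScript runs.
const htmlEscaped = untrusted.replace(/"/g, "&quot;");
const seenByJs = decodeEntities('displayComment("' + htmlEscaped + '",this)');

// JavaScript-level escaping survives entity decoding. A real sanitizer would
// additionally HTML-escape the result for the enclosing attribute.
const jsEscaped = untrusted.replace(/\\/g, "\\\\").replace(/"/g, "\\x22");
const seenByJsSafe = decodeEntities('displayComment("' + jsEscaped + '",this)');

console.log(seenByJs);
console.log(seenByJsSafe);
```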
Challenge 4: Dynamic code
Consider this program:
function foo(untrusted) {
document.write("<input onclick='foo(" + untrusted + ")' >");
}
Evaluation generates HTML code that will repeat the call to
the function, so the untrusted value passes through the HTML
and JavaScript parsers again.
Challenge 5: Character set issues
+ADw- maps to < in UTF-7
The sanitizer needs to recognize the character set
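A small sketch of the mapping, assuming Node.js (it relies on Buffer). The function decodeUtf7 is a hypothetical helper that handles only the "+...-" escape blocks of UTF-7; it is not a complete decoder.

```javascript
// Decodes the "+<base64>-" escape blocks of UTF-7 and leaves all other
// characters alone. Illustrative only.
function decodeUtf7(input) {
  return input.replace(/\+([A-Za-z0-9+\/]*)-?/g, (match, b64) => {
    if (b64 === "") return "+";               // "+-" stands for a literal plus sign
    const bytes = Buffer.from(b64, "base64"); // block payload is UTF-16BE
    let out = "";
    for (let i = 0; i + 1 < bytes.length; i += 2) {
      out += String.fromCharCode((bytes[i] << 8) | bytes[i + 1]);
    }
    return out;
  });
}

// Contains no '<' or '>' characters, so a naive filter sees nothing to escape.
const payload = "+ADw-script+AD4-alert(1)+ADw-/script+AD4-";

console.log(payload.includes("<")); // false
console.log(decodeUtf7(payload));   // <script>alert(1)</script>
```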
Challenge 6: everything else
• MIME based XSS
• Browser bugs
• Capability leaks
• Parsing inconsistencies
• Browser extensions
• Adobe Flash is fairly buggy
Evaluation of web frameworks and applications
• Subjects
• 14 popular web application frameworks
• 8 popular PHP applications
• Evaluation
• Auto-sanitization and/or sanitization libraries
• Dynamic sanitization handling
Auto sanitization
• 7 of 14 support auto sanitization
• 4 of these 7 perform context-insensitive sanitization, which is
inherently unsafe
• Auto sanitization fails to protect 14.8%-33.6% of output sinks
in 10 popular Django applications
Context sensitive sanitization
• Performed by 3 of 7 frameworks
• GWT, Google Clearsilver, and Google Ctemplate
• Involved a runtime parser that checked the context and
applied the appropriate sanitization function
• User needs to mark untrusted variables
• No detailed analysis of reliability
• I assume they worked reasonably well
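The idea of a runtime parser that picks the sanitizer can be sketched with a JavaScript tagged template. This is not how GWT, Clearsilver, or Ctemplate are implemented; safeHtml and both escapers are hypothetical names, and the "parser" only distinguishes text between tags from positions inside a tag.

```javascript
const escapeText = (s) =>
  String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
const escapeAttr = (s) =>
  escapeText(s).replace(/"/g, "&quot;").replace(/'/g, "&#39;");

// Tagged template: the literal chunks are trusted, the interpolated values are
// untrusted. A tiny state machine tracks whether output is currently inside a tag.
function safeHtml(strings, ...values) {
  let out = "";
  let insideTag = false;
  strings.forEach((chunk, i) => {
    for (const ch of chunk) {
      if (ch === "<") insideTag = true;
      else if (ch === ">") insideTag = false;
    }
    out += chunk;
    if (i < values.length) {
      // Choose the sanitizer from the context reached so far.
      out += insideTag ? escapeAttr(values[i]) : escapeText(values[i]);
    }
  });
  return out;
}

const note = "<b>hi</b>";
const url = 'x" onload="evil()';
console.log(safeHtml`<p>${note}</p>`);
console.log(safeHtml`<img src="${url}">`);
```

Here the interpolation syntax plays the role of marking untrusted variables; a production system would need to recognize many more contexts (URLs, script blocks, styles).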
Manual sanitization
• Prone to error
• Variables missed
• Wrong sanitization function used
Dynamic code evaluation
• Perform appropriate runtime checks before printing
untrusted strings
• Generally not supported by frameworks
• Four frameworks provided static sanitization of untrusted
strings within the context of JavaScript constants
DOM based errors
• JavaScript can actually reference the content of a web page
<h1>This page changes itself</h1>
<a name="xxx">Original content</a>
<script>
document.anchors[0].innerHTML="New content";
</script>
DOM based errors
• JavaScript can actually reference the content of a web page
<h1>This page changes itself</h1>
<a name="xxx">New content</a>
<script>
document.anchors[0].innerHTML="New content";
</script>
DOM based errors
• Consider this code:
text = element.getAttribute('title');
// ... elided ...
desc = create_element('span', 'bottom');
desc.innerHTML = text;
This code reads an attribute from the HTML, which undoes its
escaping, and reinserts it elsewhere as markup.
To avoid the bug, use innerText to write or innerHTML to read.
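A sketch of the safer write, using textContent (the standard property that, like innerText, stores a string as text rather than parsing it). The function showTitle and the stand-in objects are hypothetical; plain objects replace real DOM nodes so the sketch also runs outside a browser.

```javascript
// Copies an element's title into a target node as plain text.
// Works with any objects exposing getAttribute() and textContent.
function showTitle(element, target) {
  const text = element.getAttribute("title"); // already entity-decoded by the browser
  target.textContent = text;                  // stored as text, never parsed as HTML
  return target;
}

// Stand-ins for DOM nodes (assumption for illustration).
const fakeElement = { getAttribute: () => "<img src=x onerror=alert(1)>" };
const fakeTarget = {};

showTitle(fakeElement, fakeTarget);
console.log(fakeTarget.textContent);
```

In a browser the same call would display the markup characters literally instead of creating an img element.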
DOM based errors
• Ignored by frameworks
• Cause many XSS vulnerabilities
Expressiveness of contexts in web applications
• 8 PHP applications analyzed
• 19-532 KLOC
• All applications emit untrusted data into all contexts
• Applications sometimes employ different sanitizers for the
same context
• General conclusion: frameworks do not provide sufficient
sanitization support
Manual sanitization expressiveness
• 9 of 14 frameworks do not support contexts other than the
generic HTML context
• 4 provided sanitizers for the JavaScript string context
• 1 framework provided a sanitizer for JavaScript number
and boolean contexts
• None allow for sanitization of JavaScript code
• Only one framework allowed customization of the
sanitizer within a context; the others had a pre-packaged
sanitizer for all contexts
Correctness of sanitizers
• Sanitizers prone to error
• In frameworks they usually work on a “whitelist” model in
which only structures following specific patterns are allowed
• One framework uses a “blacklist” model in which specific
strings are forbidden
• Frameworks rely on a canonical form into which all output is
formatted to simplify sanitizers
• The authors conclude that the “whitelist” approach should
be researched. The “blacklist” approach is too error prone.
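The difference between the two models can be illustrated on URL values. Both functions below are hypothetical sketches, not taken from any of the studied frameworks, and the allowed URL pattern is an assumption chosen for brevity.

```javascript
// Blacklist: reject input containing a known-bad string.
function blacklistUrl(url) {
  return url.includes("javascript:") ? null : url;
}

// Whitelist: accept only input matching a known-good pattern
// (here: http or https, a host name, and an optional simple path).
function whitelistUrl(url) {
  const allowed = /^https?:\/\/[A-Za-z0-9.-]+(\/[A-Za-z0-9._~\/-]*)?$/;
  return allowed.test(url) ? url : null;
}

// Browsers treat URL schemes case-insensitively, so this variant still runs script.
const attack = "JaVaScRiPt:alert(1)";
const benign = "https://example.com/a.png";

console.log(blacklistUrl(attack)); // slips through the blacklist
console.log(whitelistUrl(attack)); // null
console.log(whitelistUrl(benign)); // accepted
```

The blacklist has to anticipate every dangerous spelling, while the whitelist only has to describe what is acceptable.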
Related work
• XSS analysis and defense
• Server side code errors
• JavaScript code errors
• Research identifies vulnerabilities
• Untrusted data showing up in output
• Improper sanitization
• Server side solutions
• Formalize web model to design sanitizers
• Client side
• XSS-Auditor
• Analyze browser reference patterns to try and identify attacks
• Does not separate trusted and untrusted data
• Studies in sanitizer correctness
• Manual process of adding sanitization is error prone
• None provide a good underlying model for sanitizers
• Taint tracking and security typed languages
Paper’s conclusions
• Current frameworks do not properly manage sanitization
• The paper suggests a future direction of producing a
formal model of the browser’s behavior
Some later work
• Saxena developed PHP analysis tools
• Model checker: symbolic execution of PHP to try and find
dangerous code
• Static analysis: tries to identify and incorporate sanitizers
based on the context of a print
• Probably the better approach
• Needs to be integrated with some sort of dynamic analysis
Discussion questions
• What is the best approach for solving XSS?
• In addition to technical issues, what practical issues need to be
addressed to get a solution deployed? For example, asking
everyone to rewrite their PHP code is going to be difficult.
• Should the government get involved in regulating web
sites to make sure basic protection standards are upheld?
XSS attack game
• 2 teams
• Source code available from
• Look for $_GET and $_POST variables for user input
• Use MAMP to run