A Symbolic Execution Framework for JavaScript
Prateek Saxena, Devdatta Akhawe, Steve Hanna, Feng Mao, Stephen McCamant, Dawn Song
UC Berkeley

Motivation: Rich Web Applications
• Client-side JS complexity in Rich Web Applications
• High cross-domain client-side data exchange
• Need tools to analyze complex applications

An Important Application: Finding Client-side Code Injection Bugs
• An Example
• Several Client-side Data Exchange Channels for mashups
Many attack vectors …
    <IMG SRC="javascript:alert('XSS')">
    <IMG SRC=JaVaScRiPt:alert('XSS')>
[Diagram labels: facebook.com; Data: “friendName: Joe,msg: Yo!”; Data: “..msg: <img src=s onerror=javascript:alert(..”; FRAGMENT ID; http://cnn.com ?friendName=Joe #msg=Yo!; postMessage]
Parse the Input
    var DataStr = 'var new_msg =(' + event.data + ');';
    ParseData(DataStr);
Validation checks
    var regex = /<script.*>.*?<\/script>/g;
    if (regex.test(DataStr.msg)) { return false; }
Dynamic HTML update
    n.innerHTML = DataStr.msg;

Problem Definition
Automatically Find Code-Injection Vulnerabilities in JS Applications
• Two challenges
• #1: Automatic exploration of the execution space
• #2: Automatically check if data is sanitized sufficiently
– Can't distinguish parsing ops. from custom validation checks
– Can't assume validation: false negatives vs. false positives

Our Contributions
• Existing Approaches
– Static Analysis [Gatekeeper'09, StagedInfoFlow'09]
– Taint-enhanced blackbox fuzzing [Flax'10]
• Drawbacks
– Either assumes an external test suite to explore paths [Flax'10]
– Or does not generate an exploit instance, can have FPs [Gatekeeper'09, StagedInfoFlow'09]
• Our Contributions
– A Symbolic Analysis approach
– Kudzu: An end-to-end symbolic execution tool for JavaScript
– Identify a sufficiently expressive “theory of strings”
– Kaluza: A new expressive, efficient decision procedure
» Supports strings, integers and booleans as first-class input variables

Outline
• Problem Definition
• Previous Approaches vs. Our Approach
• Kudzu System Design
• Kaluza Decision Procedure
• Evaluation

Kudzu: Approach and Design
• Input space has 2 components
– Event Space: GUI explorer
– Value Space: Dynamic Symbolic Execution
• Checking sufficiency of validation checks
– Symbolic analysis of validation operations on code-evaluated data
[Diagram labels: New Input Feedback; GUI Explorer; Dynamic Symbolic Interpreter; Kaluza Decision Procedure; Application-Agnostic; Application-Specific; Checking Sufficiency of Validation]

Dynamic Symbolic Interpreter for JavaScript
• Employed for Value Space Exploration
[Diagram labels: Initial Input; Program; Concrete Execution; Symbolic Execution; Symbolic Formula f; f'; Kaluza Decision Procedure; New Input]

Checking Sufficiency of Validation Checks
• To eliminate false positives
[Diagram labels: Attack Grammar Specification; Initial Input; Code Evaluation Construct; Kaluza Decision Procedure; I; Attack; Intersection Empty; If]

GUI Exploration
• Events: State of GUI elements, mouse and link clicks
• Event Sequence: A sequence of state-altering GUI actions
• Event Space Exploration using a GUI explorer
• Practically enhances coverage benefits
– Example: 1 gadget vulnerability is reachable with a sequence of events executed: dropdown box value is changed, delete is hit

Empirical Motivation for A Theory of Strings
[Chart: frequency of string operations]
    indexOf / lastIndexOf / strlen (78%)
    concat (8%)
    replace / decodeURI / encodeURI (8%)
    substring / charAt / charCodeAt (5%)
    split / match / test (1%)
– Combined string and integer solver
– Regular Expression based operations are 1/3rd of the match, split, test, replace operations (9%)
– Multiple string variables
    /\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/
33% of regexes have Capture Groups

A Sufficiently Expressive Theory for JS
• Practical Requirements to support
– Concatenation (Word Equations)
– Regular Language Membership
– String Length
– Equality
– Multiple String Variables
– Boolean and Integer Logic
[Table columns: [DPRLE'09]; [HAMPI'09]; [PEX'09]]
Existing solvers not sufficiently expressive

Kaluza: A New Solver Decision Procedure
• Input: A boolean combination of constraints over multiple integer and variable-length string variables
• Decidability vs Expressiveness
– Equality between reg language variables undecidable [STOC'81]
– Full generality of replace in word constraints undecidable [TACAS'09]
[Diagram labels: String Solving Approaches; Language Equations; Word Constraints]
Insight: JS to Kaluza Reduction uses Dynamic Information
[Diagram labels: JavaScript Language Operations; Kaluza Core Constraints]

Kudzu System Evaluation
• 18 Live Applications
– 13 iGoogle gadgets
– 5 AJAX applications
» Social networking: Academia, Plaxo
» Chat applications: AjaxIM, Facebook Chat
» Utilities: parseURI
• Setup
– Untrusted sources
» All cross-domain channels
» Text boxes
– Critical sinks
» Code evaluation constructs

Results: Summary
• Summary
– Kudzu found 11 code injection vulnerabilities automatically
– 2 previously unknown vulnerabilities
– 6 hours of testing period
• Examples
– XSS in Facebook Connect used by 2 social networking sites
– Gadget Overwriting Attacks on Google/IG
– Self-XSS on AjaxIM
• No false positives
• Finds all known vulnerabilities in our benchmarks [Flax'10]

Results: Code Coverage
29% code coverage increase in 6 hours
[Chart series: Initial Discovered; Initial Executed; Total Discovered; Total Executed]
[Chart: Code Coverage (in %); series: Initial Coverage, Coverage Increase]

Conclusion
• Kudzu: An End-to-end Symbolic Execution Tool for JS
– Separates the input space analysis into 2 components
• Identified a theory of strings expressive enough for JS
• Kaluza: A new decision procedure for the theory
• Demonstrated capabilities on 18 live web applications
• Found 11 vulnerabilities with no given initial test harness
• 2 new vulnerabilities

Contact
• Contact:
– Prateek Saxena (prateeks@cs.berkeley.edu)
• Kaluza, our core constraint solver, is online:
– http://webblaze.cs.berkeley.edu/2010/kaluza
• Please visit Webblaze, our web security research page
– http://webblaze.cs.berkeley.edu
THANKS FOR COMING TO THE TALK

Reduction of JS Operations: Mixed Concrete and Symbolic Power
• Example: replace in full generality is undecidable
• Concretize number of occurrences of matched string
    rep1 = INPUT.replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, "@");
[Diagram: Symbolic operations. INPUT is split into S0, T1, S1, T2, S2, T3, S3; Regex Membership (R) over T1..T3, S1..S3; Concat builds OUTPUT as S0 @ S1 @ S2 @ S3]

Results: Solver Performance
SAT cases: < 1 sec, UNSAT cases: 1-50 secs

Comparison of Symbolic Execution Alone with GUI Exploration
• Symbolic Execution Alone vs. Full-featured Kudzu
[Chart series: Symbolic Execution Alone; Full-featured Kudzu]

Example Attacks: Gadget Overwriting
[Screenshot labels: Legitimate URL bar; <Attack Link to IGoogle page>; Compromised Gadget with Overwritten Contents]
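
To make the code-injection example from the early slides concrete, here is a minimal sketch of why a script-tag blacklist can be insufficient. It reuses the regular expression shown under "Validation checks"; the helper name passesValidation and both payload strings are assumptions made for this illustration and are not taken from Kudzu or from any of the tested applications.

```javascript
// Hypothetical helper, not from the talk: mirrors the slide's validation check.
// Returns true when the message would be allowed through to innerHTML.
function passesValidation(msg) {
  // Build the regex on every call: a shared /g regex keeps lastIndex
  // between test() calls, which makes repeated checks unreliable.
  const scriptTag = /<script.*>.*?<\/script>/g;
  return !scriptTag.test(msg);
}

// Illustrative payloads (assumptions for this sketch).
const scriptPayload = "<script>alert('XSS')</script>";
const imgPayload = "<img src=s onerror=alert('XSS')>";

console.log(passesValidation(scriptPayload)); // rejected by the check
console.log(passesValidation(imgPayload));    // slips past the check
```

The second payload contains no script tag at all, so the check never fires. This is the kind of gap that intersecting the validated input language with an attack grammar is meant to expose.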
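
The backup slide on reducing replace can be read as follows: once a concrete run fixes how many times the regex matched, the input splits into unmatched pieces S0..Sn and matched pieces T1..Tn, and the output is the unmatched pieces concatenated with "@". The sketch below checks that reading on one concrete string. The helper decomposeReplace and the sample input are hypothetical, introduced only for illustration; the slides do not show how Kudzu implements the reduction.

```javascript
// Hypothetical helper: split a concrete input around the matches of a
// global regex, returning the unmatched pieces (S0..Sn) and matches (T1..Tn).
function decomposeReplace(input, regex) {
  const unmatched = [];
  const matched = [];
  let cursor = 0;
  // matchAll requires a /g regex and works on a copy, so lastIndex is untouched.
  for (const m of input.matchAll(regex)) {
    unmatched.push(input.slice(cursor, m.index)); // piece before this match
    matched.push(m[0]);                           // the match itself
    cursor = m.index + m[0].length;
  }
  unmatched.push(input.slice(cursor)); // trailing piece after the last match
  return { unmatched, matched };
}

// Regex from the slide; the input string is an assumption for this sketch.
const R = /\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g;
const INPUT = 'a\\nb\\u0041c';

const { unmatched, matched } = decomposeReplace(INPUT, R);
const OUTPUT = unmatched.join("@"); // S0 @ S1 @ ... @ Sn

console.log(matched.length);                   // concretized occurrence count
console.log(OUTPUT === INPUT.replace(R, "@")); // same result as replace
```

With the occurrence count fixed, what remains is regex membership for each piece plus concatenation, both of which appear in the list of practical requirements for the string theory.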