Practical Static Analysis of JavaScript Applications in the Presence of Libraries and Frameworks
Magnus Madsen, Benjamin Livshits, Michael Fanning

Outline
Motivation
• auto-complete
• call graph discovery
• capability + API usage
Challenges
• large & complex libraries
• native libraries
• insufficiency of stubs
Technique
• pointer analysis
• use analysis
Evaluation
• improved auto-complete
• improved call graph resolution
• soundness & completeness

Motivation
Windows 8:
– JavaScript is an officially supported language
– .NET library bindings are exposed to JavaScript
Q: How can we use static analysis to improve the development experience?

The Challenge
Modern JavaScript applications are often built on large and complex libraries:
– Browser API, Win8 API, NodeJS, PhoneGap, ...
– Problems: reflection? native code? sheer size?
– But: we really only care about the application!
A pragmatic choice: we ignore the libraries (thus sacrificing soundness) to focus on the applications themselves.

Practical Applications (which do not require soundness)
• auto-complete
• call graph discovery
• capability usage
• API usage

Capability usage, e.g.:
<Capabilities>
  <Capability name="internetClient"/>
  <Capability name="picturesLibrary"/>
  <DeviceCapability name="location"/>
  <DeviceCapability name="microphone"/>
  <DeviceCapability name="webcam"/>
</Capabilities>

API usage, e.g.:
• Windows.Devices.Sensors
• Windows.Devices.Sms
• Windows.Graphics.Display
• Windows.Graphics.Printing
• Windows.Media.Capture
• Windows.Networking.Sockets
• Windows.Storage.Search

Win8 & Web Applications
[Figure: libraries used by a Web app vs. a Windows 8 app; labels: Builtin, DOM, jQuery, WinJS, Win8, …; annotation: 3000 functions]

Introducing Use Analysis
• elm flows into reset
• elm flows into playVideo
• elm must have: muted and play
• elm must have: pause
Conclusion: elm is an HTMLVideoElement.

Use Analysis: determines what an object is based on how it is used.

Heap Partitioning
[Figure: the heap is split into the Application Heap, the "Symbolic Heap", and the Library Heap]

Symbolic Objects and Unification
1. Introduce symbolic objects where flow is dead (i.e., missing) because library code is not analyzed.
2. Collect information about where the symbolic objects flow and how they are used.
3. Unify symbolic objects with "compatible" application or library objects.

Example: Iteration 1
We discover that c is a dead return.
Example: Iteration 2
We introduce a symbolic return object.
Example: Iteration 3
We unify the symbolic object with the HTMLCanvasElement.

Missing Flow
Where can dataflow be missing when library code is ignored?
– Dead Returns
– Dead Arguments
– Dead Loads
– Dead Prototypes
– Dead Array Accesses

Unification Strategies
Unification strategies based on property names:
– ∃: at least one property name is shared
– ∀: all property names are shared
– ∀ with prototype priority: all property names are shared, but prototype objects are preferred
[Figure: a symbolic object compared with application objects by their property names x, y, z]

Benchmarks
• 25 Windows 8 apps, averaging ~1,500 lines of code
• approx. 30,000 lines of stubs

Call Graph Resolution
[Figure: call-site resolution, Pointer Analysis vs. Pointer Analysis + Use Analysis]
A call site is resolved if it has a non-empty set of call targets.

Auto-complete
• We compared our technique to the auto-complete in four popular IDEs:
– Eclipse for JavaScript developers
– IntelliJ IDEA 11
– Visual Studio 2010
– Visual Studio 2012
• In all cases where libraries were involved, our technique was an improvement.

Auto-complete: Case study
[Figure: auto-complete case study]

Soundness & Completeness
Use analysis is inherently unsound:
– library code is not analyzed
– library code could have arbitrary side effects
An example of unsoundness ...
An example of incompleteness: results of manual (human) inspection of 200 call sites

Findings
• Auto-completion is improved compared to four popular IDEs
• Use analysis improves call graph resolution
• In practice, unsoundness is limited
• Reasonable analysis time: a median of 10 s for apps averaging 1,500 lines of code

Summary
Pointer analysis + use analysis:
– a technique to statically reason about JavaScript applications that rely on large and complex libraries, without analyzing the libraries themselves
Practical applications:
– auto-complete
– API usage
– capability discovery
– call graph construction

Thank You

Architecture
[Figure: components JavaScript Application, App Facts, Analysis Rules, Pointer Analysis, Use Analysis, and an "Introduce New Facts" step]

Datalog Formulation
We define the following domains:
𝑽 – variables
𝑯 – heap-allocated objects
𝑷 – property names
𝑪 – call sites
𝒁 – integers (e.g., argument offsets)
Based on Gatekeeper (Guarnieri and Livshits, 2009)

Pointer Analysis
PointsTo(v, h) :- NewObj(v, h, _).
PointsTo(v1, h) :- Assign(v1, v2), PointsTo(v2, h).
PointsTo(v2, h2) :- Load(v2, v1, p), PointsTo(v1, h1), HeapPtsTo(h1, p, h2).
HeapPtsTo(h1, p, h2) :- Store(v1, p, v2), PointsTo(v1, h1), PointsTo(v2, h2).
HeapPtsTo(h1, p, h3) :- Prototype(h1, h2), HeapPtsTo(h2, p, h3).
CallGraph(c, f) :- ActualArg(c, 0, v), PointsTo(v, f).
Assign(v1, v2) :- CallGraph(c, f), FormalArg(f, i, v1), ActualArg(c, i, v2), i > 0.
Assign(v2, v1) :- CallGraph(c, f), FormalRet(f, v1), ActualRet(c, v2).

Example: Dead Returns
DeadRet(c, v) :- CallGraph(c, f), ActualRet(c, v), !ResolvedVar(v), !AppAlloc(f).
DeadArg(f, i) :- FormalArg(f, i, v), !ResolvedVar(v), AppAlloc(f).
...
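The pointer-analysis rules can be evaluated bottom-up by reapplying every rule until no new facts appear. The following is a minimal Python sketch of that naive fixpoint, not the authors' implementation. The tuple encoding of the relations, the function name solve_points_to, and the example facts are hypothetical. The sketch also assumes the following: ActualArg index 0 names the callee, real arguments use indices i > 0, the store rule binds the stored value through PointsTo(v2, h2), and a return value flows from the callee's FormalRet variable into the caller's ActualRet variable.

```python
from collections import defaultdict


def solve_points_to(facts):
    """Naively evaluate the pointer-analysis rules to a fixpoint.

    `facts` maps input relation names to sets of tuples (hypothetical encoding):
      NewObj(v, h, _), Assign(v1, v2), Load(v2, v1, p), Store(v1, p, v2),
      Prototype(h1, h2), ActualArg(c, i, v), FormalArg(f, i, v),
      ActualRet(c, v), FormalRet(f, v).
    Functions are heap objects, so a CallGraph target f is a heap name.
    """
    def get(name):
        return facts.get(name, set())

    points_to, heap_pts, call_graph = set(), set(), set()
    assign = set(get("Assign"))  # grows as call sites are resolved

    # Static indexes over the input relations used by the call rules.
    formals = defaultdict(set)   # f -> {(i, v1)}
    for f, i, v1 in get("FormalArg"):
        formals[f].add((i, v1))
    actuals = defaultdict(set)   # (c, i) -> {v2}
    for c, i, v2 in get("ActualArg"):
        actuals[(c, i)].add(v2)
    fret = defaultdict(set)      # f -> {v1}
    for f, v1 in get("FormalRet"):
        fret[f].add(v1)
    aret = defaultdict(set)      # c -> {v2}
    for c, v2 in get("ActualRet"):
        aret[c].add(v2)

    while True:
        size = (len(points_to), len(heap_pts), len(call_graph), len(assign))

        # Index the derived relations as they stood at the start of this round.
        pts = defaultdict(set)
        for v, h in points_to:
            pts[v].add(h)
        fields = defaultdict(set)  # h1 -> {(p, h2)}
        for h1, p, h2 in heap_pts:
            fields[h1].add((p, h2))

        # PointsTo(v, h) :- NewObj(v, h, _).
        points_to |= {(v, h) for v, h, _ in get("NewObj")}
        # PointsTo(v1, h) :- Assign(v1, v2), PointsTo(v2, h).
        points_to |= {(v1, h) for v1, v2 in assign for h in pts[v2]}
        # PointsTo(v2, h2) :- Load(v2, v1, p), PointsTo(v1, h1), HeapPtsTo(h1, p, h2).
        points_to |= {(v2, h2) for v2, v1, p in get("Load")
                      for h1 in pts[v1] for q, h2 in fields[h1] if q == p}
        # HeapPtsTo(h1, p, h2) :- Store(v1, p, v2), PointsTo(v1, h1), PointsTo(v2, h2).
        heap_pts |= {(h1, p, h2) for v1, p, v2 in get("Store")
                     for h1 in pts[v1] for h2 in pts[v2]}
        # HeapPtsTo(h1, p, h3) :- Prototype(h1, h2), HeapPtsTo(h2, p, h3).
        heap_pts |= {(h1, p, h3) for h1, h2 in get("Prototype")
                     for p, h3 in fields[h2]}
        # CallGraph(c, f) :- ActualArg(c, 0, v), PointsTo(v, f).
        call_graph |= {(c, f) for (c, i), vs in actuals.items() if i == 0
                       for v in vs for f in pts[v]}
        # Assign(v1, v2) :- CallGraph(c, f), FormalArg(f, i, v1), ActualArg(c, i, v2), i > 0.
        assign |= {(v1, v2) for c, f in call_graph for i, v1 in formals[f]
                   if i > 0 for v2 in actuals[(c, i)]}
        # Assign(v2, v1) :- CallGraph(c, f), FormalRet(f, v1), ActualRet(c, v2).
        assign |= {(v2, v1) for c, f in call_graph
                   for v1 in fret[f] for v2 in aret[c]}

        # Stop once a full round derives nothing new.
        if (len(points_to), len(heap_pts), len(call_graph), len(assign)) == size:
            return {"PointsTo": points_to, "HeapPtsTo": heap_pts,
                    "CallGraph": call_graph, "Assign": assign}


# Hypothetical facts for: function id(x) { return x; }  var o = {};  var r = id(o);
# plus p.f = v; and w = o2.f, where O2's prototype is P.
EXAMPLE = {
    "NewObj": {("id", "F_id", "_"), ("o", "H1", "_"), ("p", "P", "_"),
               ("v", "V", "_"), ("o2", "O2", "_")},
    "FormalArg": {("F_id", 1, "x")},
    "FormalRet": {("F_id", "x")},
    "ActualArg": {("C1", 0, "id"), ("C1", 1, "o")},
    "ActualRet": {("C1", "r")},
    "Store": {("p", "f", "v")},
    "Load": {("w", "o2", "f")},
    "Prototype": {("O2", "P")},
}

result = solve_points_to(EXAMPLE)
print(sorted(result["PointsTo"]))
```

Datalog engines typically use semi-naive evaluation, which joins only the facts derived in the previous round. The naive loop here recomputes every rule in each round, which is simpler but slower.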
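The three unification strategies can be read as filters over candidate objects, keyed on property names. The sketch below is one hypothetical reading of them:
• ∃ keeps candidates that share at least one property name with the symbolic object.
• ∀ keeps candidates that have every property name used on the symbolic object.
• The prototype variant returns only prototype objects whenever any ∀ match is a prototype.
The HeapObject class, the strategy function names, and all property sets are illustrative assumptions. Only the used names muted, play, and pause come from the use-analysis example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeapObject:
    """A candidate application or library object (hypothetical model)."""
    name: str
    props: frozenset
    is_prototype: bool = False


def unify_exists(uses, candidates):
    """∃: keep candidates sharing at least one property name with the uses."""
    return [c for c in candidates if c.props & uses]


def unify_forall(uses, candidates):
    """∀: keep candidates that have every property name used on the symbolic object."""
    if not uses:
        # Assumption: with no observed uses there is no evidence, so do not unify.
        return []
    return [c for c in candidates if uses <= c.props]


def unify_forall_prototypes_first(uses, candidates):
    """∀, but return only the prototype objects among the matches if there are any."""
    matches = unify_forall(uses, candidates)
    protos = [c for c in matches if c.is_prototype]
    return protos or matches


# Property names observed on the symbolic object (from the elm example).
USES = frozenset({"muted", "play", "pause"})
# Simplified, hypothetical property sets; real DOM objects have many more.
CANDIDATES = [
    HeapObject("HTMLVideoElement.prototype",
               frozenset({"muted", "play", "pause", "poster"}), is_prototype=True),
    HeapObject("HTMLCanvasElement.prototype",
               frozenset({"getContext", "width", "height"}), is_prototype=True),
    HeapObject("player", frozenset({"muted", "play", "pause", "volume"})),
    HeapObject("appObj", frozenset({"play", "score"})),
]

for strategy in (unify_exists, unify_forall, unify_forall_prototypes_first):
    print(strategy.__name__, [c.name for c in strategy(USES, CANDIDATES)])
```

∃ is the most permissive of the three: here it also unifies with appObj, which shares only play. ∀ drops appObj. Prototype priority then also drops the application object player, leaving only the video prototype.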
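The dead-flow rules translate directly into set comprehensions once ResolvedVar and AppAlloc are available. This sketch makes two assumptions: ResolvedVar(v) means that v points to at least one object, and AppAlloc is a set of application-allocated function names. Both definitions, the helper names, and every fact in the example are hypothetical. Argument indices start at 1 because index 0 is assumed to be the callee, as in the CallGraph rule.

```python
def resolved_vars(points_to):
    """Assumption: ResolvedVar(v) holds when v points to at least one object."""
    return {v for v, _ in points_to}


def dead_returns(call_graph, actual_ret, points_to, app_alloc):
    """DeadRet(c, v) :- CallGraph(c, f), ActualRet(c, v), !ResolvedVar(v), !AppAlloc(f)."""
    resolved = resolved_vars(points_to)
    return {(c, v)
            for c, f in call_graph if f not in app_alloc
            for c2, v in actual_ret if c2 == c and v not in resolved}


def dead_args(formal_arg, points_to, app_alloc):
    """DeadArg(f, i) :- FormalArg(f, i, v), !ResolvedVar(v), AppAlloc(f)."""
    resolved = resolved_vars(points_to)
    return {(f, i) for f, i, v in formal_arg
            if f in app_alloc and v not in resolved}


# Hypothetical facts: C1 calls a library function and stores the result in c.
# C2 calls the application function helper, whose result r2 is also unresolved
# but is excluded by !AppAlloc(f). draw is an application function whose
# argument ctx is never bound by analyzed code (e.g., only a library calls it).
CALL_GRAPH = {("C1", "getElementById"), ("C2", "helper")}
ACTUAL_RET = {("C1", "c"), ("C2", "r2")}
FORMAL_ARG = {("draw", 1, "ctx"), ("helper", 1, "y")}
POINTS_TO = {("y", "H9")}
APP_ALLOC = {"draw", "helper"}

print(dead_returns(CALL_GRAPH, ACTUAL_RET, POINTS_TO, APP_ALLOC))
print(dead_args(FORMAL_ARG, POINTS_TO, APP_ALLOC))
```

Each DeadRet fact marks a place to introduce a symbolic return object, as in the canvas example where c is a dead return. Each DeadArg fact marks a place to introduce a symbolic argument object.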