Efficient Privilege De-Escalation for Ad Libraries in Mobile Apps
Bin Liu (SRA), Bin Liu (CMU), Hongxia Jin (SRA), Ramesh Govindan (USC)

The Mobile Ad Ecosystem
[Diagram: App Developer, Ad Network, Ad Plugin, App, User, Phone/Tablet; edge labels: Paid by Impressions, Paid by User Clicks, See/Click Ads]

Introduction Challenges PEDAL Evaluation Conclusion

Ecosystem Incentives are Skewed Against Users
- Ad libraries take unwarranted liberties with personal data on devices in order to target ads more efficiently
- Users are especially concerned about the privacy risks posed by ad libraries
  - “Mobile advertising services were a consistent privacy concern for the most participants”
  - “Users felt the least comfortable when private resources were used for advertising”

Therefore, our position is that…
- Considering these privacy concerns about ad libraries:
  - Ad libraries fundamentally need less privilege than the app logic
  - The user should be able to specify which resources are granted to ad libraries
- This cannot be achieved in Android today
  - The Android permissions model governs app access to resources, but it acts on the whole app, at install time
  - Once the app is installed, the app and all its included libraries are granted access to these resources

Our Approach – Privilege De-Escalation
- An ad library can have fewer resource access privileges than the app logic itself
- Users can selectively deny resource access privileges to the ad libraries without affecting the main app logic

Our Approach – Examples
[Figures: two example slides]

Challenges
- To implement such a system, we need to answer two questions:
  - How to identify ad library code in an app?
  - How to effect selective privilege de-escalation?
Both challenges are non-trivial.

Challenges in Identifying Ad Libraries
- We can at best access the bytecode, an intermediate code obtained by compiling the source code
- No annotation preserves the separation between bytecode from the app logic and bytecode from an ad library
- Some researchers suggest matching bytecode paths, e.g. /com/google/ads, to identify ad libraries in the bytecode
- However, advanced ad libraries use package-level or code-level obfuscation to foil this method

Challenges in Privilege De-Escalation
- Ideally, the solution must not require changes to the OS or the VM, and must not require rooting the phone
- The solution must be highly efficient; significant slowdowns in app execution time can affect usability
- Most important, in a substantial fraction of apps, ad libraries inherit privileges from the app logic
- Any solution for privilege de-escalation must prevent this kind of privilege inheritance

PEDAL Overview
- PEDAL contains a Separator and a Rewriter
- Input: a packaged app
- Output: a repackaged app with de-escalated privileges for any (obfuscated) ad libraries in the app
- This design addresses the challenges reviewed above
  - Obfuscation-resistant classification and binary rewriting achieve selective de-escalation for ad libraries
  - Because it uses binary rewriting, our approach requires no OS-level changes and is efficient
  - The Rewriter, by analyzing information flow across bytecode sets, can prevent privilege inheritance

Separator Implementation
- Most important: choose a set of features that ensures high classification accuracy
- We choose six groups of features that are informative for ad library classification
  - Usage of Android basic components
  - Usage of selected Android permissions
  - Usage of visual elements
  - Usage of information sources and sinks
  - Usage of APIs for runtime permission checks
  - Keyword matching on class/method/field names
- We do not use bytecode path information, and the chosen features are resistant to code obfuscation

Rewriter Implementation
- The Rewriter effects privilege de-escalation by binary rewriting, based on user-specified privacy policies
- The Rewriter interposes on resource accesses by the ad library or the app logic
- The Rewriter interposes only on what we call core resource access functions
- Preventing privilege inheritance
  - Focus on flows from core resource access functions in the app logic to Internet access calls in the ad library
  - Once these potential leakage paths have been identified, the Rewriter performs the same kind of interposition as above
- Native libraries marginally affect our control

Evaluation: the Separator
- Crawled 63,105 free apps from the Google Play Store
- Trained an SVM on 335 ad modules and 335 non-ad modules: recall 98.4%, precision 98.5%
- Randomly chose 200 apps and manually checked the classification results
- Even with obfuscation in most of these apps (120/200), our classifier achieves an accuracy of 93%
- Our Separator is more efficient than the traditional package name matching approach
  - Among all apps, our Separator discovered 2,598 unique ad library modules, belonging to 546 unique ad library sources
  - This is at least 5X more than the numbers reported in papers that maintain a pre-defined blacklist of ad package names

Evaluation: the Rewriter
- How much runtime overhead does the rewriting code add?
  - We selected 100 apps and used a UI automation tool to run both the original and the rewritten apps
  - Both versions of an app were fed identical click streams
  - Executing these 100 apps showed a total increase in runtime of 0.89% on average
- How effective can the control be?
  - Setup: 100 apps, with a pre-defined clickstream for each app
  - No control: 843 ads, of which 304 are location-targeted
  - Control Internet (block ads): 9 ads (due to missing core functions)
  - Control Location (feed fake location): 806 ads, of which 249/23 target the fake/real location (due to limitations of static flow analysis)

Conclusion
- PEDAL: a system to achieve selective privilege de-escalation for ad libraries
- PEDAL performs automated classification to identify ad library code, and rewrites core resource access functions to achieve de-escalation
- PEDAL is robust, by design, to both package name obfuscation and source code obfuscation
- PEDAL shows remarkable classification accuracy and efficacy, while requiring only reasonable computing power to process apps
- PEDAL is effective and imposes negligible runtime overhead on apps
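
The Rewriter's interposition on core resource access functions, driven by a user-specified privacy policy, can be pictured with a minimal sketch. This is not PEDAL's code: the talk does not show the rewritten bytecode, so every name below (PrivacyPolicy, get_location, interpose, the module names, the coordinates) is hypothetical, and Python stands in for the Android bytecode that the real system rewrites. The sketch assumes the rewriter knows statically which module each call site belongs to, which is modeled by passing the caller's module as an argument.

```python
from dataclasses import dataclass, field
from typing import Callable, Set, Tuple

Location = Tuple[float, float]

# All names and values below are hypothetical, chosen for illustration only.
REAL_LOCATION: Location = (34.0224, -118.2851)
FAKE_LOCATION: Location = (0.0, 0.0)


@dataclass
class PrivacyPolicy:
    """User-specified policy: resources that ad library code may not access."""
    denied_to_ads: Set[str] = field(default_factory=set)


def get_location() -> Location:
    """Stand-in for a core resource access function of the platform."""
    return REAL_LOCATION


def interpose(resource: str,
              original: Callable[[], Location],
              substitute: Location,
              policy: PrivacyPolicy,
              ad_modules: Set[str]) -> Callable[[str], Location]:
    """Build the wrapper that a rewritten call site would invoke.

    The caller's module is passed explicitly; this models a rewriter that
    knows at rewrite time which module each call site belongs to.
    """
    def guarded(caller_module: str) -> Location:
        if caller_module in ad_modules and resource in policy.denied_to_ads:
            return substitute   # ad library code receives the fake value
        return original()       # app logic keeps its normal access
    return guarded


policy = PrivacyPolicy(denied_to_ads={"location"})
ad_modules = {"com.a.b"}        # e.g. a module the classifier labeled as an ad library
guarded_get_location = interpose(
    "location", get_location, FAKE_LOCATION, policy, ad_modules)

print(guarded_get_location("com.example.app"))  # app logic: real location
print(guarded_get_location("com.a.b"))          # ad module: fake location
```

The sketch covers only direct accesses by ad library code. It does not model privilege inheritance, where the app logic reads a resource and the value later reaches an Internet access call in the ad library; the talk handles that case by identifying potential leakage paths and interposing on them in the same way.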