Slides - Data Transparency Lab Conference

ReCon: Revealing and Controlling PII Leaks in Mobile Network Systems

David Choffnes, Northeastern University
Jingjing Ren, Northeastern University
Ashwin Rao, University of Helsinki
Martina Lindorfer, Vienna Univ. of Technology
Arnaud Legout, INRIA Sophia-Antipolis

Sponsored by:
DTL Workshop, Nov. 2015

Motivation
  • Mobile devices
    • Rich sensors
    • Ubiquitous connectivity
  • Key questions
    • What personal information is transmitted?
    • To whom does it go?
    • What can average users do about it?

How Frequently Is PII Leaked?
  [Figure: bar chart of the fraction of the top 100 apps leaking PII (y-axis, 0 to 0.6), per PII category, for App Store, Google Play and WP Store. Tested in September 2015.]
  PII categories:
    • Device Identifier (IMEI, Advertiser ID, MAC etc.)
    • User Identifier (email, name, gender etc.)
    • Contact Info
    • Location
    • Credential (username, password)
  Callouts:
    • Basic tracking is common
    • Significant fraction of very personal information leaked across all platforms
    • PII leakage is pervasive!

How to Detect PII Leaks in Mobile?
  • At the OS
    • Information flow analysis (static/dynamic/hybrid)
    • OK solutions, but not perfect or easily deployable
  • In the network
    • Independent of OS, app store
    • Easy to detect if you know what PII to search for
  What if you don't know the PII a priori?

ReCon: Automatically Identifying PII Leaks
  • Hypothesis: PII leaks have distinguishing characteristics
    • Is it just simple key/value pairs (e.g., "user=R3C0N")?
      • Nope, this leads to high FP/FN rates
      • Need to learn the structure of PII leaks
  • Approach: Build ML classifiers to reliably detect leaks
    • Does not require knowing PII in advance
    • Resilient to changes in PII leak formats over time
  • We built ReCon
    • Machine learning to reveal PII leaks from mobile devices
    • Software middleboxes to intercept and control leaks
    • Works on all major platforms (iOS, Android, Windows Phone)

ReCon: Viewing detected leaks
  • PII Category
    • Device Identifiers
    • Contact Information
    • User Identifiers
    • Credentials
  • User Feedback
    • Correct
    • Incorrect
    • Not sure
    • Not about me

Where They Know You've Been
  • Location information is hard to digest using text alone
  • WTKYB shows just how pervasive location tracking is
  • Creepiness factor to help users care more about privacy(?)

Mitigating PII Leaks
  • ReCon gives users control over leaks
  • Example simple strategies
    • Block PII
    • Modify PII
      • Randomize identifiers
      • Coarsen locations
  • Advanced mitigation (under dev)
    • Mock user profiles
    • Provide k-anonymity

How does ReCon work?
  • Key challenges for ML-based PII detection
  • Which classifier do we use?
    • C4.5 Decision Tree is the best trade-off between speed and accuracy
  • How do we train the classifier?
    • Use traces from real users and controlled experiments
    • Break flows into separate words that may indicate a leak
    • Feature selection for scalability
  • How well are we doing?
    • Controlled experiments
    • In the wild: Only the users themselves know for sure!
      • Crowdsourced reinforcement

Key Results: ReCon accuracy
  • How accurate is ReCon?
    • 99% overall accuracy from controlled experiments
    • FPR: 2.2%, FNR: 3.5%
  • Why?
    • Per-domain classifiers
    • Decision tree captures non-trivial cases

Key Results: ReCon Has Good Coverage
  • How does it compare to other solutions?
  [Figure: bar chart of the fraction of total leaks found (y-axis, 0% to 100%), per PII category (Device Identifier, User Identifier, Contact Info, Location), comparing FlowDroid (Static IFA), Andrubis (Dynamic IFA), AppAudit (Hybrid IFA) and ReCon.]
  • ReCon finds significantly more PII than IFA solutions
  • ReCon successfully identifies missing leaks after retraining

Key Results: User study
  • IRB-approved user study
    • 24 iOS, 13 Android devices
    • 20/26 responses: system useful & behavior change
    • 165 cases of credential leaks, 94 verified
    • Average leaks: iOS > Android
  • Unexpected, suspicious leaks
    • Recipe/cooking app tracks location
    • Video/Game/News app leaks gender
    • And more…
  • Check out http://recon.meddle.mobi

Summary
  • ReCon: Provides transparency/control over PII leaks
    • Relies only on access to network traffic (OS independent)
    • Machine learning to automatically identify PII leaks
    • Crowdsourced reinforcement with user feedback
  • Works today! Check out http://recon.meddle.mobi

Questions?
Sponsor:
David Choffnes
choffnes@ccs.neu.edu

Backups

Encryption and ReCon
  • What is your answer for increasing use of encryption?
  • ReCon needs access only to plaintext flows
    • mcTLS, BlindBox
    • Route to trusted middlebox that can do MITM
      • Works for most apps, but usually not logins
    • Haystack (on Android device)

Encryption: What is leaked?
  • Leaks over SSL (not much)
    • Send PII to trackers over SSL (100 apps/device)
      • 6 iOS
      • 2 Android
      • 1 Windows
  • Problem with SSL
    • Certificate pinning
    • Not working with VPN enabled
  • Obfuscation
    • Little evidence in controlled experiment using IFA

Other applications of ReCon
  • K-anonymity
  • Explicit sharing
    • Allow users to control how much is shared with third parties
  • Obfuscation
    • Retrain classifiers to identify obfuscated leaks
    • Use static/dynamic analysis tools that are resilient to evasion techniques

Deployment models
  • ReCon only needs access to network flows
  • VPN proxy (current deployment): tunnel to proxy server
    • Currently supported by all mobile OSes
    • Can run VMs anywhere in the world
  • Raspberry Pi
    • In home network
    • Enables HTTPS decryption with minimal additional risks
  • On device
    • Haystack on Android
  • In network
    • Awazza and other APN/middlebox deployment models

Methodology Details
  • Controlled experiments as ground truth
  • Text classification approaches
    • Problem: Given a network flow, does it contain PII or not?
    • Feature extraction: bag-of-words model
      • Example: Example.com /someevent?x=1&y=2 {"z":"xx@y"}
      • Words: someevent, x, 1, y, 2, z, xx@y
  • Per-domain classifiers (e.g., Google-Analytics)
    • Faster (compared to one-for-all)
    • More accurate
  • Library: Weka

Why Run ReCon?
  • User incentives
    • Control over data leaks!
    • Blocking unwanted content
    • k-anonymity for increased privacy
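
Sketch: bag-of-words feature extraction

The feature-extraction step on the Methodology Details slide can be sketched in Python as follows. This is a minimal illustration, not ReCon's tokenizer: the delimiter set and the names flow_to_words and to_features are assumptions, chosen so that the slide's example flow produces the slide's word list. The slides name Weka as the library actually used.

```python
import re

# Assumption: an illustrative delimiter set (URL, query-string and JSON
# punctuation plus whitespace). It is not taken from ReCon itself.
DELIMITERS = re.compile(r'[/?&=:,;{}\[\]"\s]+')


def flow_to_words(flow_text):
    """Split the path, query string and body of one flow into words."""
    return [word for word in DELIMITERS.split(flow_text) if word]


def to_features(words, vocabulary):
    """Binary bag-of-words vector: 1 if the vocabulary word occurs in the flow."""
    present = set(words)
    return [1 if term in present else 0 for term in vocabulary]


words = flow_to_words('/someevent?x=1&y=2 {"z":"xx@y"}')
print(words)  # ['someevent', 'x', '1', 'y', '2', 'z', 'xx@y']

# Hypothetical vocabulary, e.g. the words kept after feature selection.
print(to_features(words, ["someevent", "email", "z"]))  # [1, 0, 1]
```

A vector like this, built per destination domain, is the kind of input a per-domain decision tree would be trained on.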
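
Sketch: simple mitigation strategies

The two "Modify PII" strategies on the Mitigating PII Leaks slide, randomizing identifiers and coarsening locations, might look like the following Python sketch. The function names, the one-decimal rounding and the sample values are all assumptions for illustration; the slides do not describe how ReCon implements these strategies.

```python
import secrets


def coarsen_location(lat, lon, decimals=1):
    """Round coordinates to fewer decimal places.

    One decimal place corresponds to roughly 11 km of latitude.
    """
    return round(lat, decimals), round(lon, decimals)


def randomize_identifier(identifier):
    """Return a random hex string with the same length as the identifier."""
    length = len(identifier)
    # token_hex(n) yields 2*n hex characters; trim to the exact length.
    return secrets.token_hex((length + 1) // 2)[:length]


# Hypothetical sample values, not taken from the slides.
print(coarsen_location(42.3398, -71.0892))  # (42.3, -71.1)
print(len(randomize_identifier("a1b2c3d4e5f60718")))  # 16
```

Keeping the replacement identifier the same length and alphabet as the original is a design choice meant to make it less likely that the receiving server rejects the modified request.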
