Bunker: A Tamper Resistant Platform for Network Tracing
Stefan Saroiu
University of Toronto

Motivation
- Today’s tracing helps build tomorrow’s systems
- ISPs view raw network traces as a liability
  - Traces can compromise user privacy
  - Protecting users’ privacy is increasingly important
- Trace anonymization mitigates these issues

Offline Anonymization
- Trace anonymized after raw data is collected
  - Privacy risk until raw data is deleted
- Today’s traces require deep packet inspection
  - Headers insufficient to understand phishing or P2P
  - Payload traces pose a serious privacy risk
- Risk to user privacy is too high
  - Two universities rejected offline anonymization

Offline’s Privacy Vulnerabilities
- Two types of attacks:
  1. Traditional: network intrusion attacks
  2. New: raw data can be subpoenaed
- Both universities required that subpoenas would not affect privacy

Online Anonymization
- Trace anonymized while tracing
  - Raw data resides in RAM only
- Difficult to meet performance demands
  - Extraction and anonymization must be done at line speeds
- Code is frequently buggy and difficult to maintain
  - Low-level languages (e.g. C) + “home-made” parsers
- Small bugs cause large amounts of data loss
  - Introduces consistent bias against long-lived flows

Simple Tasks can be Very Slow
- Regular expression for phishing:
  " ((password)|(<form)|(<input)|(PIN)|(username)|(<script)|
  (user id)|(sign in)|(log in)|(login)|(signin)|(log on)|
  (sign on)|(signon)|(passcode)|(logon)|(account)|(activate)|(verify)|
  (payment)|(personal)|(address)|(card)|(credit)|(error)|(terminated)|
  (suspend))[^A-Za-z]"
- libpcre: 5.5 s for 30 M = 44 Mbps max

Our solution: Bunker
- Combines best of both worlds
  - Same privacy benefits as online anonymization
  - Same engineering benefits as offline anonymization
- Pre-load analysis and anonymization code
- Lock it and throw away the key (tamper-resistance)

Threat Model
- Accidental disclosure:
  - Risk is substantial whenever humans are handling data
- Subpoenas:
  - Attacker has physical access to tracing system
  - Subpoenas force researchers and ISPs to cooperate
    - As long as cooperation is not “unduly burdensome”
- Implication: Nobody can have access to raw data

Is Developing Bunker Legal? It Depends on Intent of Use
- Developing Bunker is like developing encryption
- Must consider purpose and uses of Bunker
  - Developing Bunker for user privacy is legal
  - Misuse of Bunker to bypass law is illegal

Outline
- Motivation
- Design of our platform
- System evaluation
- Case study: Phishing
- Conclusions

Logical Design
[Diagram: capture hardware feeds an online capture stage; offline stages assemble, parse, and anonymize the data using the Anon. Key; anonymized data leaves through a one-way interface.]

VM-based Implementation
[Diagram, shown in two builds: capture hardware and a hypervisor hosting a closed-box VM and an open-box VM. Closed-box VM, online: capture, encrypt (Enc. Key). Closed-box VM, offline: decrypt, assemble, parse, anonymize (Anon. Key). Open-box VM: save trace, logging, maintenance, open-box NIC. Encrypted raw data is stored on disk; anonymized data crosses a one-way socket between the two VMs.]

Benefits
- Strong privacy properties
  - Raw trace and other sensitive data cannot be leaked
- Trace processing done offline
  - Can use your favorite language!
  - Parsing can be done with off-the-shelf components

Key Technologies
- “Closed-box” VM protects sensitive data
  - Contains all raw trace data & processing code
  - No interactive access to closed-box (e.g. no console)
- Encryption protects on-disk data
  - Randomly generated key held in volatile memory
  - Data cannot be decrypted upon reboot
- “Safe-on-reboot” VM mitigates hardware attacks

Software Engineering Benefits
[Bar chart: lines of code, split into Python and C, for the UW, Toronto, and Bunker tracers. Data labels in order of appearance: 63,382; 53,995; 1,350; 5,512.]
- One order of magnitude between online and offline
- Development time: Bunker, 2 months; UW/Toronto, years

Work Deferral
[Line chart: queue size (GB, 0 to 200) against time, from 12:00 PM one day to 12:00 PM the next.]
- Don’t do now what you can do later

Error Recovery
  % of Flows          Online Tracer   Tamper Resistant Tracer
  Parsing OK          68.20%          99.92%
  Collateral damage   31.72%          -
  Parsing errors      0.08%           0.08%
- Small bugs lead to small errors in the trace, not huge gaps

Phishing is Bad
- Costs U.S. economy hundreds of millions
- Affects 1+ million U.S. Internet users
- 2004 - mid 2006: # of phishing sites grew 10x
- Banks claim phishing is #1 source of fraud
- Phishing messages now personalized
  - Harder to filter

Two Day Hotmail Trace
Tues Jan 29/08 11:15am - Thurs Jan 31 11:23am, University of Toronto at Mississauga
  Hotmail Users                        3,062
  # of E-mails Received               13,438
  # of From Addresses                  7,422
  # of To Addresses                   25,456
  Median # of Words in E-mail Body       130

Questions
- How often are URLs present in e-mails?
- How often do people click on links in e-mails?
- Do people verify an e-mail for legitimacy before clicking on a link?

Links in Email
[Bar chart: % with URLs, % with Clicks, and % with Clicks <= 2 s, shown for Users and for Emails. Data labels in order of appearance: 90.80%, 78.80%, 18.70%, 1.53%, 0.54%, 5.86%.]

Conclusions
- Today’s tracing experiments need to look “deep” into network activity
- Serious privacy concerns
  - IP-level trace vs.
    email and browse history
- Physical security isn’t enough: subpoenas
- Bunker provides
  - the safety of online anonymization
  - the simplicity of offline anonymization

Acknowledgments
- Andrew Miklas (U. of Toronto)
- Alec Wolman (Microsoft Research)
- Angela Demke Brown (U. of Toronto)

Questions?
http://www.cs.toronto.edu/~stefan

Design
[Diagram labels: Open-box VM (DomainU), Closed-box VM (Domain0), One-Way Interface, Offline Software, Online Software, Untrusted Software, Enc. Key, Anon. Key, XEN Hypervisor, Encrypted Raw Trace, Capture NIC, Open NIC.]

Phishy Mail Leaks through Filters
[Bar chart: % of Emails (0% to 20%) per filter rule. Rules in order of appearance: MURTY_PHISHING1, NORMAL_HTTP_TO_IP, SCREENTIP, MURTY_PHISHING3, HTML_OBFUSCATE_05_10, HTML_OBFUSCATE_10_20, SARE_BANK_URI_IP, SARE_SPOOF_BADURL, SARE_EBAY_SPOOF_NAME. Data labels in order of appearance: 4.33%, 2.93%, 0.85%, 17.10%, 0.42%, 0.10%, 0.03%, 0.01%, 0.22%.]

[Diagram, shown in three builds: the capture, assemble, parse, and anonymize pipeline, first on bare capture hardware, then split by a hypervisor into an Inaccessible VM (capture, assemble, parse, anonymize, Anon. Key) and a Commodity VM (save trace, logging, maintenance) joined by a one-way socket, and finally with encrypt and decrypt stages (Enc. Key) around an Encrypted Raw Trace. The output is the Anonymized Trace.]

Overall Privacy Goal
[Timeline: Tracing Starts, then Tamper Attack, along a Time axis, with regions marked Data Protected and Data Exposed.]
- Goal: Ensure that user’s privacy is “no worse off” when a trace is in progress
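
The "44 Mbps max" figure on the Simple Tasks can be Very Slow slide can be checked with a short calculation. The sketch below assumes that "30 M" means 30 x 10^6 bytes, which is an interpretation rather than something the slide states. It uses Python's built-in re module in place of libpcre, so the measured rate on any given machine will differ from the slide's number. The function names are illustrative only, and the space after the opening quote on the slide is treated as formatting rather than part of the pattern.

```python
import re
import time

# Keywords taken from the slide's regular expression. The space after the
# opening quote on the slide is treated as formatting (an assumption).
TERMS = [
    "password", "<form", "<input", "PIN", "username", "<script",
    "user id", "sign in", "log in", "login", "signin", "log on",
    "sign on", "signon", "passcode", "logon", "account", "activate",
    "verify", "payment", "personal", "address", "card", "credit",
    "error", "terminated", "suspend",
]

# Same shape as the slide: any keyword followed by one non-letter character.
PHISHING_RE = re.compile(
    "(" + "|".join("(" + re.escape(t) + ")" for t in TERMS) + ")[^A-Za-z]"
)


def count_hits(text):
    """Count non-overlapping keyword matches in text."""
    return sum(1 for _ in PHISHING_RE.finditer(text))


def throughput_mbps(num_bytes, seconds):
    """Convert bytes processed in a given time to megabits per second."""
    return num_bytes * 8 / seconds / 1e6


def measure_mbps(text, repeat=3):
    """Time the regex over text and report the scan rate in Mbps."""
    num_bytes = len(text.encode("utf-8")) * repeat
    start = time.perf_counter()
    for _ in range(repeat):
        count_hits(text)
    # Guard against a zero reading on very short inputs.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return throughput_mbps(num_bytes, elapsed)


if __name__ == "__main__":
    # 30 * 10^6 bytes in 5.5 s, as read from the slide (assumed units).
    print(round(throughput_mbps(30_000_000, 5.5)))  # 44
    sample = "Please verify your account. " * 10_000
    print(f"{measure_mbps(sample):.1f} Mbps with Python's re on this machine")
```

Whatever rate the second line prints, the point of the slide stands: pattern matching of this kind caps throughput at whatever rate the regex engine sustains, which is why deferring it to an offline stage is attractive.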