D2Taint: Differentiated and Dynamic Information Flow Tracking on Smartphones for Numerous Data Sources
Boxuan Gu, Xinfeng Li, Gang Li, Adam C. Champion, Zhezhe Chen, Feng Qin, and Dong Xuan
The Ohio State University
Infocom 2013

Outline
  - Introduction
  - Background
  - Differentiated and Dynamic Tagging
  - IFT with Dynamic and Differentiated Tagging
  - Evaluation Method & Experimental Results
  - Conclusions

Introduction
  - Trend Micro reports that over 25,000 Android malware samples were found in June 2012 alone
  - 46%-55% of smartphone apps transmit users’ private information over networks without users’ awareness or consent
  - TaintDroid
      - extends the Dalvik VM to tag smartphone data using 32 possible types based on their origins
      - Just 32 origins ...
  - D2Taint
      - tracks sensitive data from a large number of possible internal and external sources
      - partitions sources into disjoint classes (corresponding to different sensitivities)
      - updates its tag structure on the fly

Background
  - Information Flow Tracking (IFT) basics
      - Compiler analysis of programs written in special type-safe programming languages
      - Software instrumentation at the source code, bytecode, or binary level
      - Architecture support for IFT
  - Pre-defined structure: source IDs or sensitivity levels
  - Tag propagation policy: a = b + c --> a.tag = max(b.tag, c.tag)
  - Tag checking: “Passwords” may not be sent over the network

TaintDroid
  - Dalvik opcodes: http://pallergabor.uw.hu/androidblog/dalvik_opcodes.html

Differentiated and Dynamic Tagging
  - Source level: different information sources may have different sensitivities in terms of security
  - Application level: different amounts of storage space are needed to capture heterogeneous sources and correlations
  - User level: adapt to changing information source access patterns
  - Examining the source level and application level -> differentiated classes
  - Examining the user level -> tag dynamics
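
The tag propagation and tag checking policies from the Background slide can be illustrated with a minimal sketch. It assumes tags are integer sensitivity levels, with 0 meaning untainted. The names `Tainted`, `PASSWORD_LEVEL`, and `may_send` are hypothetical and do not come from TaintDroid or D2Taint.

```python
from dataclasses import dataclass

# Hypothetical sensitivity level at or above which data must stay on the device.
PASSWORD_LEVEL = 3


@dataclass
class Tainted:
    """A value paired with an integer tag (assumption: 0 means untainted)."""
    value: int
    tag: int = 0

    def __add__(self, other: "Tainted") -> "Tainted":
        # Propagation policy from the slide: a = b + c --> a.tag = max(b.tag, c.tag)
        return Tainted(self.value + other.value, max(self.tag, other.tag))


def may_send(data: Tainted, limit: int = PASSWORD_LEVEL) -> bool:
    """Tag checking: refuse to send data whose tag reaches the limit."""
    return data.tag < limit


b = Tainted(1, tag=1)
c = Tainted(2, tag=PASSWORD_LEVEL)
a = b + c  # a inherits the higher of the two tags
```

Here `may_send(a)` is false because `a` was derived from password-level data, while `may_send(b)` is true.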

Tag Structure
  Tag layout: Tag scheme ID | Class 1 | Class 2 | Class 3

  Tag scheme ID   Tag scheme
  00              2/3/1
  01              2/2/2
  10              4/1/1
  11              3/2/1

  Class 1 Table
  0001   google.com
  0010   yahoo.com

Tag Structures Examples
  - 32 bits, 1 class for each bit: TaintDroid
  - 32 bits, 2-bit tag scheme ID, 3 classes, 16/8/6 bits per class, 4/4/2 bits per source
  - 32 bits, 2-bit tag scheme ID, 2 classes, 24/6 bits per class, 3/2 bits per source

Tag Dynamics
  - Each class can have a different length at different times
  - D2Taint performs “on-demand” machine learning based on statistical properties of tag space usage and the recent hash values in the location information tables
  - It then adjusts its tag structure

Tag Dynamics Issues
  - Tag scheme switching
      - tag scheme configuration -> preconfigured
      - when to switch the tag scheme -> decided on the fly
  - Tag merging

IFT with Dynamic and Differentiated Tagging
  - When an app starts, the dynamic tagging component first loads two configuration files
      - tag structure definitions
      - user-defined classes and known data sources
  - The dynamic tagging component
      - checks the data source list for each incoming data source from the tag assigner
      - tracks incoming sources’ statistics
      - determines whether it should switch D2Taint to a different tag scheme

Dynamic Tagging Component
  - Dynamic Tagging Core
      - tag scheme settings: scheme number, bits per tag, number of classes, and a pointer to the class list
      - class structure: number of classes in the tag system, bits per hashcode, number of reserved slots for the class, and a text description of the class
      - uses a global location information table list to record all source information
      - after a certain number of new sources are added to a location information table, D2Taint decides whether to switch the tag scheme based on these new sources
  - Tag Merger
      - a.tag = b.tag ⊕ c.tag
      - if using the same tag scheme?
      - if using different tag schemes? truncate certain significant bits

Information Flow Tracking Component
  - Taint Map
      - we do not store the tags of method local variables, method arguments, and class instance fields adjacent to them in memory
      - method local variables and method arguments: when the Dalvik VM allocates a stack frame for a method, our system allocates a stack taint map for it
      - class instance fields: stored in objects’ taint maps; an object’s taint map is stored immediately after the memory allocated for the object
  - Tag Assigner
      - we insert our tag assigner logic into file I/O, network I/O, sensor, and other library functions that read private information
  - Tag Propagator
      - interpreted code and native code: same as TaintDroid
      - also propagates tags via Binder IPC
  - Tag Checker
      - trustable sites

Evaluation Method & Experimental Results
  - Setup
      - Android 2.2 on Nexus One
      - 84 “top free” apps selected from Google Play
      - CaffeineMark as the benchmark
  - Real-world apps
      - 71 out of 84 apps leak information
      - D2Taint reveals the paths by which the information is leaked
      - 33 apps transmit data among many different external sources
      - 12 apps leak devices’ IMEIs/EIDs
  - Performance (figure): 9%, 7.3%, 16%, 3%, 13%, 21%
  - Java Macrobenchmark (figure)
  - CaffeineMark’s memory footprint
      - Android: 21664 KB
      - D2Taint: 22528 KB
      - this test ignored the memory used by the location information table, which grows dynamically as more information sources arrive
  - Sequential websites (figure): dynamic vs. static
  - Random websites (figure): dynamic vs. static

Conclusions
  - A novel IFT tagging strategy using differentiated and dynamic tagging
  - dynamic tag structure
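
To make the tag structure and tag merger slides more concrete, here is a minimal sketch of the example layout "32 bits, 2-bit tag scheme ID, 3 classes, 16/8/6 bits per class, 4/4/2 bits per source". Several points are assumptions rather than anything the slides state: source code 0 marks an empty slot, ⊕ is read as a per-class union of source codes, and overflow beyond a class's slots is simply dropped. The function names `pack`, `unpack`, and `merge` are hypothetical, and merging across different tag schemes is deliberately left out.

```python
# (bits per class, bits per source) for the 16/8/6 and 4/4/2 example layout.
LAYOUT = [(16, 4), (8, 4), (6, 2)]


def pack(scheme_id, classes):
    """Pack a 2-bit scheme ID and per-class source codes into one 32-bit tag."""
    if not 0 <= scheme_id < 4:
        raise ValueError("scheme ID must fit in 2 bits")
    tag = scheme_id
    for (class_bits, source_bits), codes in zip(LAYOUT, classes):
        slots = class_bits // source_bits
        # Assumption: code 0 means "empty slot"; extra codes are dropped.
        padded = (list(codes) + [0] * slots)[:slots]
        for code in padded:
            if not 0 <= code < (1 << source_bits):
                raise ValueError("source code does not fit in its slot")
            tag = (tag << source_bits) | code
    return tag


def unpack(tag):
    """Return (scheme_id, classes), reading fields from the low bits upward."""
    classes = []
    for class_bits, source_bits in reversed(LAYOUT):
        codes = []
        for _ in range(class_bits // source_bits):
            codes.append(tag & ((1 << source_bits) - 1))
            tag >>= source_bits
        codes.reverse()
        classes.append([code for code in codes if code != 0])
    classes.reverse()
    return tag, classes


def merge(tag_b, tag_c):
    """Merge two tags that use the same scheme (assumed per-class union)."""
    scheme_b, classes_b = unpack(tag_b)
    scheme_c, classes_c = unpack(tag_c)
    if scheme_b != scheme_c:
        raise ValueError("different tag schemes are not handled in this sketch")
    merged = [
        codes_b + [code for code in codes_c if code not in codes_b]
        for codes_b, codes_c in zip(classes_b, classes_c)
    ]
    return pack(scheme_b, merged)


b_tag = pack(1, [[1], [3], []])
c_tag = pack(1, [[2], [3], [1]])
a_tag = merge(b_tag, c_tag)
```

In this sketch, `unpack(a_tag)` returns scheme 1 with class lists `[[1, 2], [3], [1]]`, so a value derived from both inputs carries the sources of each.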