DroidScope: Seamlessly Reconstructing the OS and Dalvik Semantic Views for Dynamic Android Malware Analysis
Lok Kwong Yan and Heng Yin
Syracuse University, Air Force Research Laboratory
USENIX Security 2012
Presentation: 2012-09-11, 曾毓傑

Outline
• Introduction
• Background
• Architecture
• Interface & Plugins
• Evaluation
• Discussion & Conclusion

INTRODUCTION

Introduction
• Malicious applications appear in official and unofficial marketplaces at rates of 0.02% and 0.2%, respectively
• Virtualization-based analysis approach
  • The analysis runs underneath the entire virtual machine
  • It is difficult for an attack inside the VM to disrupt the analysis
  • Semantic context is lost when the analysis component is moved out of the box
• We need to intercept certain kernel events and parse kernel data structures to reconstruct the semantic knowledge

DroidScope
• Reconstructs two levels of semantic knowledge
  • OS-level: to understand the activities of the malware process and its native components
  • Java-level: to comprehend the behaviors of the Java components
• Built on top of the QEMU emulator
• Provides tools for analysis
  • Native instruction tracer
  • Dalvik instruction tracer
  • API tracer
  • Taint tracker

BACKGROUND

Android System Overview
[Figure: Android system. Annotations: parent process for all Android processes; libdvm.so provides the Java-level abstraction; kernel data structures]

DroidScope Overview

ARCHITECTURE

Architecture
• Integrates the changes into the QEMU emulator
  • The emulator comes from the Android SDK
  • The Android system is left unchanged
  • Different virtual devices can therefore be loaded
• Reconstructs the OS-level and Java-level views
  • Monitors how the malware's Java components communicate with the Android Java Framework
  • Monitors how the malware's native components interact with the Linux kernel
  • Monitors how the malware's Java components and native components communicate through the JNI interface

Reconstructing OS-level View
• Basic instrumentation
  • Insert extra instructions during the code translation phase to observe system status
[Figure: target instructions → Tiny Code Generator (TCG), where additional code is added for detection → native instructions]

Reconstructing OS-level View (Cont.)
• For example, a context switch on the ARM architecture changes the c2_base0 and c2_base1 registers, which store the page table addresses
• Extract semantic knowledge
  • System calls
  • Running processes and threads
  • Memory maps

Reconstructing OS-level View (Cont.)
• System calls
  • On ARM, system calls are made with the supervisor call instruction svc #0, and the system call number is passed in register R7
• Processes and threads
  • Read the task_struct structure for process information
  • pid, tgid, pgd, uid, gid, euid, egid, comm, cmdline, thread_info
  • The sys_fork, sys_execve, sys_clone, and sys_prctl system calls trigger an information update
• Memory maps
  • mm_struct
  • sys_mmap2 triggers an information update

Reconstructing Java-level View
• Dalvik instructions
  • Goal: know which Dalvik instruction is executing right now
  • When R15 (the native program counter) points into an mterp handler, that handler's offset from the mterp base address identifies the Dalvik opcode being executed

Reconstructing Java-level View (Cont.)
• Just-In-Time compiler
  • Hot, heavily used code is compiled into native machine code
  • Execution of compiled code bypasses the mterp component
[Figure annotations: dvmGetCodeAddr() is called to obtain the address of compiled code; to disable JIT, flush the JIT cache, return NULL, and reset the counter]

Reconstructing Java-level View (Cont.)
• Dalvik Virtual Machine states
  • Registers R4 to R8 hold the DVM state and are recorded
    R4: Dalvik program counter
    R5: Stack frame pointer
    R6: InterpState structure
    R7: Current Dalvik instruction (first 16-bit code unit)
    R8: mterp base address

Reconstructing Java-level View (Cont.)
• Java objects
  • Obtain data stored inside Java objects, such as string data

Symbol Information
• Native library symbols
  • Use objdump to retrieve symbol information
  • Malware is often stripped of all symbol information
• Dalvik or Java symbols
  • Use dexdump to retrieve symbol information
  • DVM data structures also contain some symbol information
    • The InterpState structure (register R6) has a method field that points to the Method structure of the currently executing method
    • The Method structure has a name field that points to the method name

INTERFACE & PLUGINS

Interface & Plugins
• APIs for analysis customization
  • The instrumentation logic in DroidScope is complex and dynamic
  • An event-based interface facilitates custom analysis tool development

Sample Plugin
• Specifies which program to analyze and prints all Dalvik opcode information

API Implementation
• API tracer
  • Instruments the invoke* and execute* Dalvik bytecodes to identify and log method invocations
• Native instruction tracer
  • Gathers information about each instruction, including the raw instruction, its operands, and their values
• Dalvik instruction tracer
  • Decodes instructions into dexdump format, including values and all available symbol information
• Taint tracker
  • Monitors sensitive information and tracks how the data propagates

EVALUATION

Evaluation
• Benchmarks check efficiency and capability
• 7 benchmark apps
  • AnTuTu Benchmark
  • AnTuTu CaffeineMark
  • CaffeineMark
  • CF-Bench
  • Mobile Processor Benchmark
  • Benchmark by Softweg
  • Linpack

Evaluation
• Performance
• Capability
  • Analysis of DroidKungFu
  • Analysis of DroidDream

DISCUSSION & CONCLUSION

Discussion
• Limited code coverage
  • A general drawback of dynamic analysis
  • Manipulating the return values of function calls may increase code coverage
• Other Dalvik analysis tools
  • Dalvik/Java static analysis: Woodpecker, DroidMOSS
  • Native static analysis: IDA, binutils, BAP
  • Android dynamic analysis: TaintDroid, DroidRanger
  • Linux kernel dynamic analysis: logcat, adb

Conclusion
• We presented DroidScope, a fine-grained dynamic binary instrumentation tool for Android that rebuilds two levels of semantic information
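
Appendix sketch: system call interception
The system call step under "Reconstructing OS-level View" (detect svc #0, read the call number from R7, and refresh the process or memory-map view on selected calls) can be illustrated with a small Python sketch. This is not DroidScope's code: the function names, the register-file representation (a list indexed by register number), and the callback shape are assumptions made for illustration. The call numbers are the ARM EABI values for the calls named on the slides.

```python
# Illustrative sketch only; names and data layout are hypothetical.

# ARM EABI system call numbers for the calls that, per the slides,
# trigger an update of the reconstructed OS-level view.
VIEW_UPDATE_SYSCALLS = {
    2: "sys_fork",
    11: "sys_execve",
    120: "sys_clone",
    172: "sys_prctl",
    192: "sys_mmap2",
}


def is_svc(insn: int) -> bool:
    """Return True if a 32-bit ARM (A32) instruction word is an SVC.

    SVC is encoded as cond[31:28] 1111[27:24] imm24[23:0]; the condition
    value 0b1111 belongs to a different encoding space and is excluded.
    """
    cond = (insn >> 28) & 0xF
    return cond != 0xF and ((insn >> 24) & 0xF) == 0xF


def on_instruction(insn: int, regs: list):
    """Hypothetical per-instruction callback.

    Returns (syscall_number, name_or_None) for `svc #0`, where the name is
    set only for calls that should refresh the view, or None when the
    instruction is not `svc #0`.
    """
    if not is_svc(insn) or (insn & 0x00FFFFFF) != 0:
        return None
    number = regs[7]  # EABI convention: system call number in R7
    return number, VIEW_UPDATE_SYSCALLS.get(number)


if __name__ == "__main__":
    regs = [0] * 16
    regs[7] = 11  # pretend the guest is about to call execve
    print(on_instruction(0xEF000000, regs))  # 0xEF000000 encodes `svc #0`
```

In a real emulator-based tool the check would be inserted at translation time rather than evaluated per executed instruction in Python; the sketch only shows the decoding and dispatch logic.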