Annual Computer Security Applications Conference (ACSAC) 2012
DOWN TO THE BARE METAL: USING PROCESSOR FEATURES FOR BINARY ANALYSIS
Carsten Willems¹, Ralf Hund¹, Andreas Fobian¹, Thorsten Holz¹, Amit Vasudevan²
¹Ruhr-University Bochum, Germany
²Carnegie Mellon University
左昌國
2013/02/25 Seminar @ ADLab, NCU-CSIE

Outline
• Introduction
• Software Emulators
• Delusion Attacks
• Binary Analysis with Branch Tracing
• Experiments
• Limitations
• Conclusion

Introduction
• Binary (malware or vulnerable software) analysis
  • Static
  • Dynamic
    • Number of execution paths
    • (For behavior analysis) monitoring every instruction or only critical points
    • Native machine or emulation/virtualization
• Native machine
  • The analysis result must be unaffected by malicious code
  • Reverting to clean states
  • Lack of monitoring abilities
• Emulator
  • Artificial environment detection
  • Delusion attacks
    • No explicit test
• Contributions:
  • Introducing several delusion attacks
  • An approach to perform behavior analysis
    • Branch tracing feature of x86 CPUs
  • Implementing a prototype that shows the usefulness of this approach

Software Emulators
• BOCHS
• QEMU
  • Dynamic translation
    • Guest code block (up to a branch) → intermediate code → optimization → translation to a host instruction code block (Translation Block, TB) → TBs saved in the code cache
  • Isolated memory
• BitBlaze and Anubis
  • Taint propagation tracking

Delusion Attacks - Motivation
• Current emulator detection techniques consist of 2 steps:
  (1) Probing for the existence of a non-native system environment
  (2) Depending on the outcome of (1), different actions are performed
• These techniques are easy to spot and mitigate
  • Powerful analysis methods like multi-path execution
• This paper proposes detection methods that have no explicit check and no conditional branch

Delusion Attacks - Basic Principle
• Self-Modifying Code (SMC)
• On a native system, handling SMC correctly is sophisticated
  • Instruction prefetch
  • Multi-processor environment
  • Modern CPUs can handle these problems correctly
• In an emulator, the CPU facilities for SMC detection cannot be utilized
  • SMC detection is implemented in software
  • Preparing a list of instruction addresses → huge overhead
• Most emulators (like QEMU) use page fault handling for SMC detection
  • All executable memory pages are set read-only
  • A memory write to executable memory triggers the page fault handler
  • In the handler, if the target memory should be writable (writable in the guest OS):
    1. Memory protection is modified to writable
    2. The memory write instruction is executed again
    3. Memory protection is changed back to read-only

Delusion Attacks - REP MOVS
• rep movs instruction
  • Copies a number of bytes, words, or double words within an implicit loop
  • esi: source memory location
  • edi: destination memory location
  • ecx: loop counter, decremented by 1 per iteration; the loop stops at 0
• On a real machine, the copy loop executes atomically
• In an emulator, if the destination is a code address,
  • The first loop iteration triggers the page fault handler
  • The handler makes the page writable, re-executes the write operation, and makes the page read-only again
  • The instruction is re-read from memory (second loop iteration)
  • …
• Example (each rep movsd iteration copies one double word from NEW to OLD):

    lea eax, BENIGNCODE
    lea ebx, MALICIOUSCODE
    lea esi, NEW
    lea edi, OLD
    mov ecx, 2
    OLD+0x0: rep movsd
    OLD+0x2: nop
    OLD+0x3: nop
    OLD+0x4: call eax        // BENIGNCODE
    OLD+0x6: nop
    OLD+0x7: nop
             ret
    NEW+0x0: nop
    NEW+0x1: nop
    NEW+0x2: nop
    NEW+0x3: nop
    NEW+0x4: call ebx        // MALICIOUSCODE
    NEW+0x6: nop
    NEW+0x7: nop

• On a real machine
  • ecx: 2 → 1 → 0; eip = OLD+0x2 once the copy finishes
  • Both double words are copied, so OLD+0x4 now holds call ebx (MALICIOUSCODE)
• In QEMU
  • The first write hits a read-only page → page fault → the page is made writable, the write is repeated, and the page is set read-only again
  • The instruction is re-read from memory: OLD+0x0 now holds a nop, so the copy stops with ecx = 1 and eip = OLD+0x1
  • OLD+0x4 still holds call eax (BENIGNCODE)

Delusion Attacks - INVD
• Many kinds of caches are available on a contemporary system
• In an emulator, there is no explicit cache support, and all cache-related instructions have no effect
• On a real machine
  • A modification in the cache is not written back to memory immediately
• On an emulated machine
  • The modification is written directly to RAM
• Example:

    lea eax, BENIGNCODE
    lea ebx, MALICIOUSCODE
    lea esi, A
    inc esi                        // esi = A+0x1
    wbinvd
    mov byte ptr [esi], 0xD0
    invd
    A: call ebx                    // FF D3 = call ebx (MALICIOUSCODE)
                                   // FF D0 = call eax (BENIGNCODE)

• On a real machine
  • The modification is done in the cache and is not yet written back to memory
  • invd then invalidates the cache, so A still holds call ebx (MALICIOUSCODE)
• In QEMU
  • The modification is written directly to memory, so A becomes call eax (BENIGNCODE)

Delusion Attacks - LEAVE
• leave is equivalent to:

    mov esp, ebp
    pop ebp

Binary Analysis with Branch Tracing
• On x86/x64 architectures from Intel and AMD, the branch tracing (BT) facilities can record the source and destination address of every branch operation
• This information can be used to reconstruct the execution/decision path taken during execution

Experiment 1: Binning of Malicious PDF Documents
• Fuzzing is a kind of automated vulnerability analysis that produces a large number of crash reports
• Binning: a technique to group crash reports that share a similar root cause
• This technique can also be used to group a set of exploits by the category of the exploited vulnerability
• Comparing the control-flow paths reconstructed from BT logs makes binning easy to implement
• CWXDetector
  • A tool that is capable of detecting exploitation attempts and extracting shellcode
  • It does not become active before the execution of the first shellcode instruction → no information can be gained about the underlying vulnerability
  • Combining BT with CWXDetector makes it possible to trace back from the first shellcode instruction to the root cause of the vulnerability
• The experiment
  • 4,869 malicious PDF documents
  • Each file exploits some kind of vulnerability in Acrobat Reader 9.00
• Normalization
  • Because of ASLR, the branch addresses are recorded as relative addresses
  • Collapsing loops
  • Removing internal exception handling of the Windows system
  • Ignoring the shellcode part
• Clustering algorithm
  • DBSCAN
    • k: minimum cluster size
    • ε: maximum distance of two objects to belong to the same cluster
  • Jaro-Winkler distance
    • Measures the difference between two strings
    • Similar strings → higher score
    • Similar prefix → higher score
• Comparing with Wepawet
  • 5 different vulnerability signatures (only addressing exploits of Acrobat Reader 9.00)
  • A small number of samples were not detected as exploiting Acrobat Reader 9.00 → manually verified: Wepawet is wrong
  • Some samples are labeled incorrectly → manually verified: Wepawet is wrong
• Performance
  • Time from opening the document to the execution of shellcode
    • Min: 11s (2s w/o BT)
    • Max: 406s (117s w/o BT)
    • Avg: 129s (11s w/o BT)

Experiment 2: Enriching BT Logs

Experiment 3: Practical Delusion Attack with a PDF File
• See the technical report (T.R.), Appendix B
• This sample behaved normally in Anubis

Limitations
• The data from BT logs is coarse
• The prototype could be detected by timing measurements
• An attacker in ring 0 is capable of disabling BT
  • BT could be combined with a hardware-assisted hypervisor

Conclusion
• Many analysis techniques utilize software emulators
• Attackers still have methods to evade analysis inside an emulated environment
• A new approach for dynamic code analysis that uses CPU-assisted branch tracing offers a granularity between instruction-level and function-level monitoring with reasonable overhead
• Practical results show that the BT traces contain enough information to assist some tasks in malware and vulnerability analysis
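
Sketch: binning BT traces (Experiment 1)
The normalization and comparison steps listed for Experiment 1 can be illustrated with a small Python sketch. It is not the authors' implementation. All function names, the sample addresses, and the threshold `eps` are assumptions made for illustration. Loop collapsing is reduced to dropping immediately repeated branch pairs, and a simple greedy threshold grouping stands in for DBSCAN. Jaro-Winkler similarity is computed over sequences of (source, destination) branch pairs instead of character strings.

```python
from typing import List, Sequence, Tuple

Branch = Tuple[int, int]  # (source address, destination address)

def normalize(trace: List[Branch], module_base: int) -> List[Branch]:
    """Rebase absolute addresses to module-relative offsets (cancels ASLR)."""
    return [(src - module_base, dst - module_base) for src, dst in trace]

def collapse_loops(trace: List[Branch]) -> List[Branch]:
    """Simplified loop collapsing: drop immediately repeated branch pairs."""
    out: List[Branch] = []
    for branch in trace:
        if not out or out[-1] != branch:
            out.append(branch)
    return out

def jaro(a: Sequence, b: Sequence) -> float:
    """Jaro similarity in [0, 1] for two sequences."""
    if list(a) == list(b):
        return 1.0
    la, lb = len(a), len(b)
    if la == 0 or lb == 0:
        return 0.0
    window = max(max(la, lb) // 2 - 1, 0)
    a_hit, b_hit = [False] * la, [False] * lb
    matches = 0
    for i in range(la):
        for j in range(max(0, i - window), min(lb, i + window + 1)):
            if not b_hit[j] and a[i] == b[j]:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched elements that appear in a different order.
    k = out_of_order = 0
    for i in range(la):
        if a_hit[i]:
            while not b_hit[k]:
                k += 1
            if a[i] != b[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / la + matches / lb + (matches - t) / matches) / 3

def jaro_winkler(a: Sequence, b: Sequence, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (up to max_prefix items)."""
    sim = jaro(a, b)
    prefix = 0
    for x, y in zip(a, b):
        if x != y or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * p * (1.0 - sim)

def bin_traces(traces: List[List[Branch]], eps: float = 0.1) -> List[List[int]]:
    """Greedy grouping (a stand-in for DBSCAN): a trace joins the first bin
    whose representative is within distance eps, else it opens a new bin."""
    bins: List[List[int]] = []
    for i, trace in enumerate(traces):
        for group in bins:
            if 1.0 - jaro_winkler(traces[group[0]], trace) <= eps:
                group.append(i)
                break
        else:
            bins.append([i])
    return bins

if __name__ == "__main__":
    # Hypothetical traces: the same path recorded under two load addresses,
    # plus one unrelated path.
    base_a, base_b = 0x400000, 0x7F0000
    path = [(0x10, 0x40), (0x48, 0x20), (0x48, 0x20), (0x2C, 0x90)]
    run_a = [(s + base_a, d + base_a) for s, d in path]
    run_b = [(s + base_b, d + base_b) for s, d in path]
    other = [(0x500 + base_a, 0x700 + base_a), (0x710 + base_a, 0x600 + base_a)]
    prepared = [collapse_loops(normalize(r, b))
                for r, b in [(run_a, base_a), (run_b, base_b), (other, base_a)]]
    print(bin_traces(prepared))
```

In this sketch, two runs of the same path land in one bin even though ASLR placed the module at different bases, because only the relative offsets are compared. A density-based algorithm such as DBSCAN, as named on the slides, would additionally need the minimum cluster size k.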