1 Securing Untrusted Code via Compiler-Agnostic Binary Rewriting
Richard Wartell, Vishwath Mohan, Dr. Kevin Hamlen, Dr. Zhiqiang Lin
The University of Texas at Dallas
Supported in part by NSF, AFOSR, and DARPA

2 Software Fault Isolation (SFI)
• Automatically rewrite binaries to make them safer
• [Wahbe, Lucco, Anderson, Graham, SOSP 1993]
[Figure: Untrusted code → Rewriter → Safe code]

3 Software Fault Isolation (SFI)
• trusted & untrusted modules in common address space
  • Example #1: web browser plug-ins
  • Example #2: trusted system libraries inside untrusted application
• Goal: protect trusted modules from untrusted ones
  • confine untrusted module behaviors
  • Example: Untrusted modules must obey trusted module interfaces
  • Blocks ROP attacks [Shacham, CCS 2007]
[Figure: trusted modules (kernel32.dll, user.dll) and an untrusted module (eMule.exe) sharing one address space]

4 Inlined Reference Monitors (IRMs)
• SFI foundation supports higher-level policies [Abadi, Budiu, Erlingsson, and Ligatti. CCS 2005]
  • Example: IRMs [Schneider, ISS 2000]
• Enforces powerful policies:
  • program-specific (no other programs affected)
  • light-weight enforcement (minimize context switches)
  • Statefulness
  • Example: Adobe Reader may access the network (to check for updates) and may read my confidential files, but may not access the network after reading my confidential files.
[Figure: trusted modules (kernel32.dll, user.dll) alongside untrusted reader.exe with an in-lined IRM]

5 A Brief History of SFI
[Figure: timeline from 1995 to 2010 placing the five works below]
1: [Wahbe, Lucco, Anderson, and Graham. SOSP 1993]
2: [Abadi, Budiu, Erlingsson, and Ligatti. CCS 2005]
3: [McCamant and Morrisett. USENIX 2006]
4: [Erlingsson, Abadi, Vrable, Budiu, and Necula. OSDI 2006]
5: [Yee, Sehr, Dardyk, Chen, Muth, Ormandy, Okasaka, Narula, and Fullagar. S&P 2009]

6 A Brief History of SFI
• All prior works require explicit code-producer cooperation

7 Reins: REwriting and IN-lining System
• Main Discovery: means of enforcing SFI for nearly arbitrary COTS binaries
  • no source code or debug info (assumed unavailable)
  • no disassembly listing
  • compiler-agnostic
• real COTS binary features
  • interleaved code and data
  • computed control-flows
  • dynamic linking
  • event-driven callbacks
  • multithreading
• Low overhead (~2%)
• Formal machine-verification of policy enforcement

8 Binary Rewriting w/o metadata
• Relocation information, debug tables and symbol stores not always available
  • Reverse engineering concerns
• Perfect static disassembly without metadata is provably undecidable
  • Best disassemblers (IDA Pro) make many mistakes

Program        Instruction Count   IDA Pro Errors
mfc42.dll      355906              1216
mplayerc.exe   830407              474
vmware.exe     364421              183

9 Infeasibility of Perfect Disassembly
• Disassemble this hex sequence: FF E0 5B 5D C3 0F 88 52 0F 84 EC 8B
• Undecidable problem

Valid Disassembly (first)
FF E0               jmp eax
5B                  pop ebx
5D                  pop ebp
C3                  retn
0F 88 52 0F 84 EC   jcc
8B …                mov

Valid Disassembly (second)
FF E0               jmp eax
5B                  pop ebx
5D                  pop ebp
C3                  retn
0F                  db (1)
88 52 0F 84 EC      mov
8B …                mov

Valid Disassembly (third)
FF E0               jmp eax
5B                  pop ebx
5D                  pop ebp
C3                  retn
0F 88               db (2)
52                  push edx
0F 84 EC 8B …       jcc

10 Separating Code from Data
[Figure: Original Memory Layout vs. Rewritten Memory Layout, drawn on an axis from Low Memory to High Memory. kernel32.dll and user32.dll appear in both layouts. A legend denotes sections that are modified during static rewriting.]
Original Binary: Header, IAT, .data, .text
Reins Binary: Rewritten Header, IAT, .data, .told (NX bit set), .tnew (NW bit set)

11 De-Shingling Disassembly
Byte Sequence: FF E0 5B 5D C3 0F 88 B0 50 FF FF 8B
[Figure: table with columns Disassembled Hex, Path 1, Path 2, Path 3, Path 4, and Included Disassembly; one path is marked Invalid. The included disassembly lists jmp eax, loopne, pop, retn, jcc, and mov, with labels L1 and L2 and the jumps jmp L1 and jmp L2.]

12 Aligning Instructions
• Chunk instructions to 16 byte boundaries with targets at the beginning, and calls at the end [McCamant and Morrisett. USENIX 2006]

Original Binary
0x68900F   mov eax, 0x6891D8
0x689015   add eax, 1
0x68901B   call eax
…          …
0x6891D9   push ebx
0x6891DA   mov ebx, [esp+4]

Rewritten Binary
0x78900F   nop
0x789010   mov eax, 0x6891D8
0x789016   add eax, 1
0x78901C   nop (x4)
0x789020   nop (x8)
0x789028   and eax, 0x0FFFFFF0
0x78902E   call eax
0x789030   …
0x7892E0   push ebx
0x7892E1   mov ebx, [esp+4]
0x7892E5   …

Legend: Injected Instructions; Alignment nops

13 Preserving Good Flows
• Turn original code section into a dynamic lookup table

Original Binary
0x68900F   mov eax, 0x6891D8
0x689015   add eax, 1
0x68901B   call eax
…          …
0x6891D9   push ebx
0x6891DA   mov ebx, [esp+4]

Rewritten Binary, .told
0x6891D9   0xF4 loc_7892F0

Rewritten Binary, .tnew
0x78900F   nop
0x789010   mov eax, 0x6891D8
0x789016   add eax, 1
0x78901C   nop (x4)
0x789020   cmp 0xF4, [eax]
0x789023   cmovz eax, [eax+1]
0x789027   nop
0x789028   and eax, 0x0FFFFFF0
0x78902E   call eax
0x789030   …
0x7892F0   push ebx
0x7892F1   mov ebx, [esp+4]
0x7892F5   …

Legend: Injected Instructions; Alignment nops

14 Preserving Good Inter-module Flows
Original Code:    jmp [IAT:CreateWindow]  →  CreateWindow
Rewritten Code:   jmp [IAT:CreateWindow]  →  CreateWindow
• IAT data section locked non-writable

15 Computed Inter-module Flows
• computed jumps to trusted modules
  • dynamic linking (DLLs)
  • callbacks (event-driven programming)
[Figure: control flow among a trusted library, an intermediary library (trusted), and rewritten code; labeled elements are caller, callback stub, callback, callback_ret, and return trampoline]

16 Results
[Figure: chart with a vertical axis from -8% to 16%]

17 IRM Synthesis
[Figure: Binary and Policy → Rewriter → Policy-adherent binary]
• Enforced policies on Eureka email client (>1.6MB code):
  • Disallow creation of .exe, .msi, or .bat files
  • Disallow execution of Windows explorer as an external process
  • Disallow opening more than 100 SMTP connections
• Malware policies:
  • Disallow creation of .exe, .msi, or .bat files
  • Successfully stopped virus propagation for real world malware samples

18 Formal Verification
[Figure: Binary and Policy → Rewriter → Policy-adherent binary → Verifier, with the TCB marked]
• Formal verification of rewritten binaries
  • 1500 SLOC of 80-column OCaml code
  • no shared code between verifier and rewriter
  • median verification time: 0.4 ms/KB code
• Allows rewriter to remain completely untrusted!
  • rewriting deployable as an untrusted service

19 Compatibility Limitations
• COM objects
• Runtime code generation (JIT)
• Undocumented OS callbacks

20 Conclusion
• Reins finally opens the door to full-scale COTS native SFI for massively complex, real-world applications without source.
  • no source code, debug info, or disassembly (assumed unavailable)
  • compiler-agnostic
  • real COTS binary features
    • interleaved code and data, computed control-flows, dynamic linking, event-driven callbacks, multithreading
  • automated synthesis of monitor from policy specification
  • automated machine-verification
  • low runtime overhead (~2.4%)
  • successfully tested on real commercial applications (>3MB code)
• Practical Applications:
  • safe reuse of untrusted commercial software in security-critical environments
  • rewriting on demand: rewriter deployable as an untrusted third-party service due to separate verifier

21 References
• R. Wahbe, S. Lucco, T. E. Anderson, and S. L. Graham. Efficient software-based fault isolation. In Proc. ACM Sym. Operating Systems Principles, pages 203–216, 1993.
• F. B. Schneider. Enforceable security policies. ACM Trans. Information and Systems Security, 3(1):30–50, 2000.
• M. Abadi, M. Budiu, Ú. Erlingsson, and J. Ligatti. Control-flow integrity. In Proc. ACM Conf. Computer and Communications Security, pages 340–353, 2005.
• S. McCamant and G. Morrisett. Evaluating SFI for a CISC architecture. In Proc. USENIX Security Sym., 2006.
• Ú. Erlingsson, M. Abadi, M. Vrable, M. Budiu, and G. C. Necula. XFI: Software guards for system address spaces. In Proc. Sym. Operating Systems Design and Implementation, pages 75–88, 2006.
• H. Shacham. The geometry of innocent flesh on the bone: Return-into-libc without function calls (on the x86). In Proc. ACM Conf. Computer and Communications Security, pages 552–561, 2007.
• B. Yee, D. Sehr, G. Dardyk, J. B. Chen, R. Muth, T. Ormandy, S. Okasaka, N. Narula, and N. Fullagar. Native Client: A sandbox for portable, untrusted x86 native code. In Proc. IEEE Sym. Security and Privacy, pages 79–93, 2009.

22 Advantage over VMs
• no air gap
  • IRM has controlled but direct access to system resources and other processes
• no semantic gap
  • no dynamic instruction interpretation or translation
• better performance
  • fewer context switches
  • light-weight VM logic essentially in-lined into code
• formal verification
  • few VMs have been formally verified
  • each change to VM (e.g., to enforce new policy) requires reverification of VM
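
Supplementary sketch: computed-branch guard
The guard shown on the Preserving Good Flows slide (cmp 0xF4, [eax]; cmovz eax, [eax+1]; and eax, 0x0FFFFFF0; call eax) can be modeled in a few lines of Python. This is only an illustrative sketch, not the Reins implementation: the names guard_target and make_told_entry, the dictionary used as memory, and the choice to read unmapped bytes as zero are assumptions made here. Only the tag byte 0xF4, the mask 0x0FFFFFF0, and the example addresses are taken from the slides.

```python
import struct

TAG = 0xF4          # tag byte marking a relocated entry point in .told (from the slides)
MASK = 0x0FFFFFF0   # mask applied before the call: clears the top nibble and the low 4 bits


def make_told_entry(old_addr, new_addr):
    """Hypothetical helper: one lookup-table entry, TAG followed by a
    32-bit little-endian pointer into .tnew, keyed by byte address."""
    payload = bytes([TAG]) + struct.pack("<I", new_addr)
    return {old_addr + i: b for i, b in enumerate(payload)}


def guard_target(memory, eax):
    """Model of: cmp 0xF4, [eax]; cmovz eax, [eax+1]; and eax, 0x0FFFFFF0.
    Assumption: bytes missing from `memory` read as 0 instead of faulting."""
    if memory.get(eax, 0) == TAG:
        # Tagged: the target is a stale pointer into the old code section,
        # so follow the table entry to the relocated code.
        raw = bytes(memory.get(eax + 1 + i, 0) for i in range(4))
        eax = struct.unpack("<I", raw)[0]
    # Tagged or not, the final target is forced onto a 16-byte chunk boundary.
    return eax & MASK


# Table entry from the slide: old entry point 0x6891D9 now lives at 0x7892F0.
memory = make_told_entry(0x6891D9, 0x7892F0)

redirected = guard_target(memory, 0x6891D9)  # stale pointer into .told
aligned = guard_target(memory, 0x789123)     # untagged value: only masked
```

Under these assumptions the stale pointer 0x6891D9 resolves to 0x7892F0, while any other value is simply forced onto a 16-byte boundary below 0x10000000. That is why the chunk alignment from the Aligning Instructions slide matters: a masked target can only land at the start of a chunk.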