Static Analysis of Executables with Applications to Infosec
Somesh Jha
University of Wisconsin
Wisconsin Safety Analyzer
http://www.cs.wisc.edu/wisa

Various Tasks
• Developing infrastructure for analyzing and rewriting executables
  – Initial focus on x86 and infosec applications
• Applications to host-based intrusion detection
• Malicious code detection
  – Attacking virus scanners
  – Better malicious code detection techniques
• Foundations
  – Framework for interprocedural analysis
  – Applications to “identifying structure” in activation records

28 July 2016   Somesh Jha, UW-Madison

Goal: Discover attempts to maliciously gain access to a system

Misuse Detection
• Specify patterns of attack or misuse
• Ensure misuse patterns do not arise at runtime
• Example: Snort
• Drawback: rigid; cannot adapt to novel attacks

Specification-Based Monitoring
• Specify constraints upon program behavior
• Ensure execution does not violate specification
• Examples: our work; Ko et al.
• Drawback: specifications can be cumbersome to create

Anomaly Detection
• Learn typical behavior of application
• Variations indicate potential intrusions
• Example: IDES
• Drawback: high false alarm rate

Worldview
[Diagram: User Program, Event Interface, Operating System]
• User desires to run program
• Running program makes operating system requests
• Attacker uses running program to generate malicious requests

Worldview
• Attack goal: be creative…
  – Destruction
  – Information leaks
  – Service disruption
• Attack technique: run arbitrary code in the user program
  – Buffer overrun
  – Virus or worm
  – Manipulate remote execution

Example: SQL Slammer
• Worm activated January 2003
  – Caused worldwide service disruption
• Propagation: exploited buffer overrun in Microsoft SQL Server to execute arbitrary code
• Detection: SQL Server makes unexpected system calls
  – Arbitrary code differs from SQL code

Our Objective
• Detect malicious activity before harm is caused to the local machine
• … before the operating system executes a malicious system call

Our Objective
• Our work: detection at the system call interface makes our work independent of intrusion technique

Our Objective
• Snort: detection at the service interface; limited to network-based attacks

Specification-Based Monitoring
• Specify constraints upon program behavior
  – Construct automaton accepting all system call sequences the program can generate
  – First suggested by Wagner-Dean, Oakland, 2000 (static analysis of source code)
  – Our analysis is on binaries
• Ensure execution does not violate specification
  – Operate the automaton
  – If no valid states, then intrusion attempt occurred
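The "operate the automaton" step above can be sketched as a set-of-states simulation of a nondeterministic automaton. This is a minimal illustration, not the authors' monitor: the state names (`s0` to `s3`), the `MODEL` table, and the `monitor` function are all hypothetical, loosely modeled on a program that calls read and then either write or close.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical model: (state, system_call) -> set of possible next states.
Transitions = Dict[Tuple[str, str], Set[str]]

MODEL: Transitions = {
    ("s0", "read"): {"s1", "s2"},   # two call sites both issue read
    ("s1", "write"): {"s3"},
    ("s2", "close"): {"s3"},
}


def monitor(transitions: Transitions, start: str,
            calls: Iterable[str]) -> Optional[int]:
    """Operate the automaton over a call stream.

    Returns the index of the first call that leaves no valid state
    (a suspected intrusion), or None if every call was allowed.
    """
    current = {start}
    for index, call in enumerate(calls):
        following: Set[str] = set()
        for state in current:
            following |= transitions.get((state, call), set())
        if not following:          # no valid states remain
            return index
        current = following
    return None


print(monitor(MODEL, "s0", ["read", "close"]))    # None: allowed by the model
print(monitor(MODEL, "s0", ["read", "execve"]))   # 1: execve is not in the model
```

Tracking a set of states rather than a single state is what lets the monitor cope with nondeterminism, at the cost of more work per observed call.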
An Application of Binary Analysis/Rewriting Infrastructure
• Binary analysis
  – Construct model for host-based intrusion detection
• Binary rewriting
  – Rewrite binary to expose more information about the program
  – Makes the model more precise
• Current prototype uses EEL
  – J.R. Larus and E. Schnarr, PLDI, 1995.

Specification-Based Monitoring
[Diagram: User Program, Analyzer, Rewritten Binary, Runtime Monitor]

Specification-Based Monitoring
[Diagram: Rewritten Binary, Runtime Monitor, Event Interface, Operating System]

Specification-Based Monitoring
[Diagram: Rewritten Binary, Event Interface, Runtime Monitor, Event Interface, Operating System; the Runtime Monitor lies inside the trusted computing base]
• Our runtime monitor observes program execution at the event interface layer
• Ensures program events match specification
• Runtime monitor must be part of trusted computing base

Specification-Based Monitoring
• Event interface defines observable events
• Observed events may be superset of system calls
• Expand interface between program and monitor
  – Call-site renaming
  – Null calls
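The slides name call-site renaming without spelling it out. One plausible reading, assumed here, is that each call site gets its own event name so the monitor can tell which site produced a call. The sketch below compares how many model states remain possible after each event, with and without renaming. The names `read_1` and `read_2`, the state names, and `frontier_sizes` are illustrative assumptions, not taken from the source.

```python
from typing import Dict, List, Set, Tuple

Transitions = Dict[Tuple[str, str], Set[str]]

# Hypothetical model without renaming: both call sites emit plain "read".
PLAIN: Transitions = {
    ("s0", "read"): {"s1", "s2"},
    ("s1", "write"): {"s3"},
    ("s2", "close"): {"s3"},
}

# Same model after (assumed) call-site renaming: each site has its own name.
RENAMED: Transitions = {
    ("s0", "read_1"): {"s1"},
    ("s0", "read_2"): {"s2"},
    ("s1", "write"): {"s3"},
    ("s2", "close"): {"s3"},
}


def frontier_sizes(transitions: Transitions, start: str,
                   events: List[str]) -> List[int]:
    """Number of states the monitor must track after each observed event."""
    current = {start}
    sizes = []
    for event in events:
        following: Set[str] = set()
        for state in current:
            following |= transitions.get((state, event), set())
        current = following
        sizes.append(len(current))
    return sizes


print(frontier_sizes(PLAIN, "s0", ["read", "close"]))      # [2, 1]
print(frontier_sizes(RENAMED, "s0", ["read_2", "close"]))  # [1, 1]
print(frontier_sizes(RENAMED, "s0", ["read_1", "close"]))  # [1, 0]
```

In this toy model the renamed events remove the ambiguity after the first call, and a close following the wrong read site leaves no valid state.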
Specification-Based Monitoring
[Diagram: Rewritten Binary, Expanded Interface, Runtime Monitor, Event Interface, Operating System]
• Expanded set of observable events
  – More precise program modeling
  – More efficient model operation
• User program rewritten to use expanded interface

Model Construction
[Diagram: User Program, Analyzer, Rewritten Binary, Runtime Monitor; analysis stages: Binary Program, Control Flow Graphs, Local Automata, Global Automaton]

The Binary View (SPARC)
function:
    save %sp, 0x96, %sp
    cmp %i0, 0
    bge L1
    mov 15, %o1
    call read
    mov 0, %o0
    call line
    nop
    b L2
    nop
L1:
    call read
    mov %i0, %o0
    call close
    mov %i0, %o0
L2:
    ret
    restore

function (int a) {
    if (a < 0) {
        read(0, 15);
        line();
    } else {
        read(a, 15);
        close(a);
    }
}

Control Flow Graph Generation
[Diagram: the SPARC listing above next to its control flow graph; nodes: CFG ENTRY, bge, call read, call line, call read, call close, ret, CFG EXIT]

Control Flow Graph Translation
[Diagram: the control flow graph next to the resulting automaton; automaton edge labels: read, line, read, close]

Interprocedural Model Generation
[Diagram sequence: automata A and B and the automaton for line; edge labels include read, line, write, close]

Possible Paths
[Diagram: paths through the combined model of A and B; edge labels: read, line, write, close]
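The interprocedural model from the preceding slides can be sketched as a single automaton in which each call edge is replaced by ε-edges into and out of the callee's automaton. The layout below is an assumption for illustration: A follows the `function` example (read then line(), or read then close), line issues write, and B is a hypothetical second caller that runs line() and then close. All state names are invented.

```python
from typing import Dict, Iterable, Set, Tuple

# Labeled edges: (state, system_call) -> next states.
DELTA: Dict[Tuple[str, str], Set[str]] = {
    ("A0", "read"): {"A1", "A3"},   # A: read on either branch
    ("A3", "close"): {"A2"},        # A: else branch closes, then exits at A2
    ("L0", "write"): {"L1"},        # line: a single write
    ("B1", "close"): {"B2"},        # B (hypothetical): close after line() returns
}

# ε-edges: call sites enter line; line's exit returns to *every* return site.
EPS: Dict[str, Set[str]] = {
    "A1": {"L0"},
    "B0": {"L0"},
    "L1": {"A2", "B1"},
}


def eps_closure(states: Set[str]) -> Set[str]:
    """All states reachable from `states` using only ε-edges."""
    seen = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in EPS.get(state, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def viable(calls: Iterable[str], start: str) -> bool:
    """True if the model has at least one valid state after every call."""
    current = eps_closure({start})
    for call in calls:
        moved: Set[str] = set()
        for state in current:
            moved |= DELTA.get((state, call), set())
        current = eps_closure(moved)
        if not current:
            return False
    return True


print(viable(["read", "write"], "A0"))           # True: a real path through A
print(viable(["read", "write", "close"], "A0"))  # True: enters line from A, returns into B
```

The second trace is accepted even though A never calls close after line() returns: the ε-edge out of line does not remember which call site was used, so the model is context-insensitive.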
Impossible Paths
[Diagram: the same combined model of A and B with an impossible path highlighted; edge labels: read, line, write, close]

Adding Context Sensitivity
[Diagram: the model of A and B with edges labeled X and Y; other edge labels: read, line, write, close]

PDA State Explosion
• ε-edge identifiers maintained on a stack
  – Stack may grow to be unbounded
• Solution:
  – Dyck language model
  – Stack operations visible in call stream
  – Requires binary rewriting

Dyck Language Model
[Diagram: the model of A and B with edges labeled X, Y and the matching edges X’, Y’; other edge labels: read, line, write, close]

Rewriting User Job
[Diagram: User Job, Analyzer, Checking Shadow, Binary Program, Modified User Job, Rewritten Binary]

Null Call Insertion
[Diagram: Rewritten Binary, Expanded Interface, Runtime Monitor, Event Interface, Operating System]
• Null calls are dummy system calls
  – Part of the expanded interface
  – Used by the monitor to update the model
  – Do not cross the interface to the operating system

Rewriting User Job
function (int a) {
    if (a < 0) {
        read(0, 15);
        line();
    } else {
        read(a, 15);
        close(a);
    }
}
• Insert dummy remote system calls around function call sites
• Notify monitor of stack activity

Rewriting User Job
function (int a) {
    if (a < 0) {
        read(0, 15);
        X();
        line();
        X’();
    } else {
        read(a, 15);
        close(a);
    }
}
• Insert dummy remote system calls around
  function call sites
• Notify monitor of stack activity
• Null calls are cheap

Dyck Language Model Theory
• Language accepted is a bracketed context-free language [Ginsburg, Harrison]
• Subsequences of null calls form a Dyck language [Chomsky, Schützenberger]
• Dyck languages are as powerful as CFLs [Chomsky]:
  $L_{CFL} = h(L_{Dyck} \cap L_{Reg})$

Test Programs
Program     Number of Instructions
procmail    107,246
gzip         56,710
cat          54,028
ps           59,814
fdformat     67,874
eject        70,177

Smart Null Call Insertion
• Precision metric: average branching factor
[Figure: edges labeled chown, getpid, open]
• Lower values indicate greater precision

NFA and Dyck Model Accuracy
[Chart: average branching factor (axis 0 to 12) of the NFA and Dyck models for procmail, gzip, cat, ps, fdformat, eject]

Number of Calls Generated
[Chart: number of calls (axis 0 to 3500) for the NFA and Dyck models for procmail, gzip, cat, ps, fdformat, eject]

Important Ideas
• Attackers exploit code vulnerabilities to execute arbitrary, malicious code.
• Pre-execution static analysis that constructs a model of the program's system call sequences addresses this threat.
• The Dyck model effectively balances model accuracy and runtime cost.
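The Dyck model summarized above can be sketched as a monitor that treats the inserted null calls as visible push and pop events. This is a minimal illustration under assumptions: the automaton layout, the state names, and the hypothetical second caller B (bracketed by Y and Y') are invented, and the ASCII strings `X'` and `Y'` stand in for the slides' X’ and Y’.

```python
from typing import Dict, Iterable, Set, Tuple

PUSH = {"X", "Y"}               # null calls inserted before a call site
POP = {"X'": "X", "Y'": "Y"}    # null calls inserted after it, with their partner

# Hypothetical model: (state, event) -> next states. Null calls are ordinary
# observable events, so no ε-edges are needed.
DELTA: Dict[Tuple[str, str], Set[str]] = {
    ("A0", "read"): {"A1", "A3"},
    ("A3", "close"): {"A2"},
    ("A1", "X"): {"L0"},        # A calls line
    ("B0", "Y"): {"L0"},        # B (hypothetical) calls line
    ("L0", "write"): {"L1"},
    ("L1", "X'"): {"A2"},       # return to A
    ("L1", "Y'"): {"B1"},       # return to B
    ("B1", "close"): {"B2"},
}

Config = Tuple[str, Tuple[str, ...]]   # (state, stack of open null calls)


def dyck_viable(calls: Iterable[str], start: str,
                delta: Dict[Tuple[str, str], Set[str]]) -> bool:
    """True if some configuration stays valid after every observed event."""
    configs: Set[Config] = {(start, ())}
    for call in calls:
        following: Set[Config] = set()
        for state, stack in configs:
            for target in delta.get((state, call), ()):
                if call in PUSH:
                    following.add((target, stack + (call,)))
                elif call in POP:
                    # A closing null call must match the innermost open one.
                    if stack and stack[-1] == POP[call]:
                        following.add((target, stack[:-1]))
                else:
                    following.add((target, stack))
        if not following:
            return False
        configs = following
    return True


print(dyck_viable(["read", "X", "write", "X'"], "A0", DELTA))           # True
print(dyck_viable(["read", "X", "write", "Y'", "close"], "A0", DELTA))  # False
```

The second trace tries to return into B after entering line from A. The mismatched Y' finds X on top of the stack, so the monitor rejects it.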