Static Analysis of Executables with Applications to Infosec
Somesh Jha
University of Wisconsin
Wisconsin Safety Analyzer
http://www.cs.wisc.edu/wisa
Various Tasks
• Developing infrastructure for analyzing and rewriting executables
– Initial focus on x86 and infosec applications
• Applications to host-based intrusion detection
• Malicious code detection
– Attacking virus scanners
– Better malicious code detection techniques
• Foundations
– Framework for interprocedural analysis
– Applications to “identifying structure” in activation records
28 July 2016
Somesh Jha, UW-Madison
Goal: Discover attempts to maliciously gain access to a system

Misuse Detection
• Specify patterns of attack or misuse
• Ensure misuse patterns do not arise at runtime
• Example: Snort
• Drawback: rigid; cannot adapt to novel attacks

Specification-Based Monitoring
• Specify constraints upon program behavior
• Ensure execution does not violate the specification
• Examples: our work; Ko et al.
• Drawback: specifications can be cumbersome to create

Anomaly Detection
• Learn the typical behavior of the application
• Variations indicate potential intrusions
• Example: IDES
• Drawback: high false alarm rate
Worldview
[Diagram: User Program → Event Interface → Operating System]
• User desires to run a program
• Running program makes operating system requests
• Attacker uses the running program to generate malicious requests
Worldview
[Diagram: User Program → Event Interface → Operating System]
• Attack goal: be creative…
– Destruction
– Information leaks
– Service disruption
• Attack technique: run arbitrary code in the user program
– Buffer overrun
– Virus or worm
– Manipulate remote execution
Example: SQL Slammer
• Worm activated January 2003
– Caused worldwide service disruption
• Propagation: exploited a buffer overrun in Microsoft SQL Server to execute arbitrary code
• Detection: SQL Server makes unexpected system calls
– The injected code behaves differently from SQL Server's own code
Our Objective
• Detect malicious activity before harm is caused to the local machine
• … before the operating system executes a malicious system call
[Diagram: User Program → Event Interface → Operating System]
Our Objective
[Diagram: User Program → Event Interface → Operating System; our work operates at the event interface]
• Detection at the system call interface makes our work independent of the intrusion technique
Our Objective
[Diagram: User Program → Event Interface → Operating System; Snort operates at the service interface]
• Snort: detection at the service interface is limited to network-based attacks
Specification-Based Monitoring
• Specify constraints upon program behavior
– Construct an automaton accepting all system call sequences the program can generate
– First suggested by Wagner and Dean, IEEE Symposium on Security and Privacy (Oakland), 2001 (static analysis of source code)
– Our analysis is on binaries
• Ensure execution does not violate the specification
– Operate the automaton on the observed system calls
– If no valid states remain, an intrusion attempt has occurred
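Operating the automaton can be sketched as an NFA simulation that keeps the set of currently possible states and reports an intrusion as soon as that set becomes empty. This is a minimal sketch, not the WiSA implementation; the class name NfaMonitor and the small model (one read that may be followed by either write or close) are hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

# Transition relation of a hypothetical model: (state, system call) -> next states.
Transitions = Dict[Tuple[int, str], Set[int]]


class NfaMonitor:
    """Tracks every model state consistent with the system calls seen so far."""

    def __init__(self, transitions: Transitions, start: int) -> None:
        self.transitions = transitions
        self.current: Set[int] = {start}

    def observe(self, syscall: str) -> bool:
        """Advance on one system call; return False once no valid state remains."""
        nxt: Set[int] = set()
        for state in self.current:
            nxt |= self.transitions.get((state, syscall), set())
        self.current = nxt
        return bool(nxt)


def trace_ok(monitor: NfaMonitor, trace: Iterable[str]) -> bool:
    """True if every call in the trace is allowed; stops at the first violation."""
    return all(monitor.observe(call) for call in trace)


# Hypothetical model: two read call sites (hence two targets), then write or close.
MODEL: Transitions = {
    (0, "read"): {1, 2},
    (1, "write"): {3},
    (2, "close"): {3},
}

print(trace_ok(NfaMonitor(MODEL, 0), ["read", "close"]))  # True
print(trace_ok(NfaMonitor(MODEL, 0), ["read", "open"]))   # False: intrusion attempt
```

Tracking a set of states rather than a single state is what lets the monitor follow a nondeterministic model without knowing which call site issued the read.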
An Application of Binary Analysis/Rewriting Infrastructure
• Binary analysis
– Construct model for host-based intrusion detection
• Binary rewriting
– Rewrite binary to expose more information about the program
– Makes the model more precise
• Current prototype uses EEL
– J.R. Larus and E. Schnarr, PLDI, 1995.
Specification-Based Monitoring
[Diagram: User Program → Analyzer → Rewritten Binary and Runtime Monitor]
Specification-Based Monitoring
[Diagram: Rewritten Binary → Event Interface → Runtime Monitor → Event Interface → Operating System; the Runtime Monitor lies within the trusted computing base]
• Our runtime monitor observes program execution at the event interface layer
• It ensures that program events match the specification
• The runtime monitor must be part of the trusted computing base
Specification-Based Monitoring
[Diagram: Rewritten Binary → Event Interface → Runtime Monitor → Event Interface → Operating System]
• The event interface defines the observable events
• Observed events may be a superset of system calls
• Expand the interface between program and monitor
– Call-site renaming
– Null calls
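Call-site renaming can be illustrated on a hypothetical automaton: when two call sites both emit read, the monitor cannot tell them apart and must follow both successors, whereas giving each site its own event name removes the ambiguity. The `name#k` naming scheme and both helper functions are assumptions made for this sketch, not the scheme used by the prototype.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, str, int]  # (source state, system call, target state)


def rename_call_sites(edges: List[Edge]) -> List[Edge]:
    """Give every call-site edge a unique (hypothetical) event name such as read#1, read#2."""
    seen: Dict[str, int] = {}
    renamed: List[Edge] = []
    for src, call, dst in edges:
        seen[call] = seen.get(call, 0) + 1
        renamed.append((src, f"{call}#{seen[call]}", dst))
    return renamed


def max_targets(edges: List[Edge]) -> int:
    """Largest number of successors for one (state, event) pair; 1 means deterministic."""
    targets: Dict[Tuple[int, str], Set[int]] = {}
    for src, call, dst in edges:
        targets.setdefault((src, call), set()).add(dst)
    return max(len(t) for t in targets.values())


# Hypothetical model with two read call sites leaving the same state.
EDGES: List[Edge] = [(0, "read", 1), (0, "read", 2), (1, "write", 3), (2, "close", 3)]

print(max_targets(EDGES))                     # 2: nondeterministic on read
print(rename_call_sites(EDGES)[:2])           # [(0, 'read#1', 1), (0, 'read#2', 2)]
print(max_targets(rename_call_sites(EDGES)))  # 1: deterministic after renaming
```

In this sketch the renamed events only make sense if the rewritten binary actually issues them, which is why renaming is listed as part of the expanded interface rather than a monitor-side change.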
Specification-Based Monitoring
[Diagram: Rewritten Binary → Expanded Interface → Runtime Monitor → Event Interface → Operating System]
• Expanded set of observable events
– More precise program modeling
– More efficient model operation
• User program rewritten to use the expanded interface
Model Construction
[Diagram: the Analyzer takes the User Program and emits the Rewritten Binary and the Runtime Monitor; inside the Analyzer, Binary Program → Control Flow Graphs → Local Automata → Global Automaton]
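The Binary View (SPARC)
function:
        save    %sp, -96, %sp   ! allocate the minimum 96-byte stack frame
        cmp     %i0, 0
        bge     L1              ! a >= 0: take the else branch
        mov     15, %o1         ! delay slot (runs on both paths): second argument = 15
        call    read
        mov     0, %o0          ! delay slot: first argument = 0
        call    line
        nop
        b       L2
        nop
L1:
        call    read
        mov     %i0, %o0        ! delay slot: first argument = a
        call    close
        mov     %i0, %o0        ! delay slot: first argument = a
L2:
        ret
        restore                 ! delay slot of ret: restore the caller's register window

void function(int a) {
    if (a < 0) {
        read(0, 15);
        line();
    } else {
        read(a, 15);
        close(a);
    }
}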
Control Flow Graph Generation
[Figure: the SPARC listing beside its control flow graph: CFG ENTRY → bge; bge branches to "call read → call line" and to "call read → call close"; both paths reach ret → CFG EXIT]
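The control flow graph on this slide can be written down as an adjacency map and walked to list the call sequence along each entry-to-exit path. This is a minimal sketch that assumes an acyclic CFG (a CFG with loops would need a fixpoint computation instead of path enumeration); the node names are hypothetical labels for the nodes in the figure.

```python
from typing import Dict, List, Tuple

# CFG of the example function; "call <name> ..." nodes stand for call instructions.
CFG: Dict[str, List[str]] = {
    "entry": ["bge"],
    "bge": ["call read (a<0)", "call read (a>=0)"],
    "call read (a<0)": ["call line"],
    "call line": ["ret"],
    "call read (a>=0)": ["call close"],
    "call close": ["ret"],
    "ret": ["exit"],
    "exit": [],
}


def call_sequences(cfg: Dict[str, List[str]], node: str = "entry",
                   prefix: Tuple[str, ...] = ()) -> List[List[str]]:
    """Return the callee names along every path from node to a node with no successors."""
    if node.startswith("call "):
        prefix = prefix + (node.split()[1],)
    if not cfg[node]:
        return [list(prefix)]
    paths: List[List[str]] = []
    for succ in cfg[node]:
        paths.extend(call_sequences(cfg, succ, prefix))
    return paths


print(call_sequences(CFG))  # [['read', 'line'], ['read', 'close']]
```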
Control Flow Graph Translation
[Figure: the control flow graph translated into an automaton from CFG ENTRY to CFG EXIT whose edges are labeled read, line on one path and read, close on the other]
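One way to sketch the translation: keep the CFG nodes as automaton states, label each edge leaving a call node with the callee's name, and turn every other edge into an ε-edge. The encoding below (None standing for ε) and the node names are assumptions for illustration; the acceptance check runs the resulting ε-NFA over a call sequence.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, Optional[str], str]  # (source, label, target); label None means epsilon

# Hypothetical node names for the CFG of the example function.
CFG: Dict[str, List[str]] = {
    "entry": ["bge"],
    "bge": ["call read (a<0)", "call read (a>=0)"],
    "call read (a<0)": ["call line"],
    "call line": ["ret"],
    "call read (a>=0)": ["call close"],
    "call close": ["ret"],
    "ret": ["exit"],
    "exit": [],
}


def translate(cfg: Dict[str, List[str]]) -> List[Edge]:
    """Edges leaving a call node are labeled with the callee; all other edges are epsilon."""
    edges: List[Edge] = []
    for node, succs in cfg.items():
        label = node.split()[1] if node.startswith("call ") else None
        edges.extend((node, label, succ) for succ in succs)
    return edges


def eps_closure(edges: List[Edge], states: Iterable[str]) -> Set[str]:
    """All states reachable from the given states using only epsilon edges."""
    seen = set(states)
    work = list(seen)
    while work:
        state = work.pop()
        for src, label, dst in edges:
            if src == state and label is None and dst not in seen:
                seen.add(dst)
                work.append(dst)
    return seen


def accepts(edges: List[Edge], word: List[str],
            start: str = "entry", final: str = "exit") -> bool:
    current = eps_closure(edges, [start])
    for symbol in word:
        step = {dst for src, label, dst in edges if src in current and label == symbol}
        current = eps_closure(edges, step)
    return final in current


AUTOMATON = translate(CFG)
print(accepts(AUTOMATON, ["read", "line"]))   # True
print(accepts(AUTOMATON, ["read", "close"]))  # True
print(accepts(AUTOMATON, ["close"]))          # False
```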
Interprocedural Model Generation
[Figure, built up over several slides: the local automaton for function A (edges read, read, close, line), the automaton for line (edge write), and the automaton for function B (edges line, close); each line edge is replaced by ε-edges into and out of line's automaton]
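Splicing the local automata together can be sketched as follows: every edge that calls another modeled function is replaced by an ε-edge into the callee's entry state and an ε-edge from the callee's exit state back to the return site. The local automaton for B and the `<function>.in` / `<function>.out` state names are hypothetical, reconstructed from the labels in the figure.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, Optional[str], str]  # label None means epsilon

# Hypothetical local automata reconstructed from the figure.
LOCAL: Dict[str, List[Edge]] = {
    "A": [("A.in", "read", "A.1"), ("A.1", "line", "A.out"),
          ("A.in", "read", "A.2"), ("A.2", "close", "A.out")],
    "B": [("B.in", "line", "B.1"), ("B.1", "close", "B.out")],
    "line": [("line.in", "write", "line.out")],
}


def splice(local: Dict[str, List[Edge]]) -> List[Edge]:
    """Build one interprocedural automaton from the per-function automata."""
    combined: List[Edge] = []
    for edges in local.values():
        for src, label, dst in edges:
            if label in local:  # call edge to a modeled function
                combined.append((src, None, f"{label}.in"))
                combined.append((f"{label}.out", None, dst))
            else:  # system call edge, kept as is
                combined.append((src, label, dst))
    return combined


GLOBAL = splice(LOCAL)
returns: Set[str] = {dst for src, label, dst in GLOBAL if src == "line.out"}
print(sorted(returns))  # ['A.out', 'B.1']: line's exit leads back to both call sites
```

Because line.out gets an ε-edge to both return sites, the combined automaton no longer records which site made the call; that lost information is what produces the impossible paths in the following slides.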
Possible Paths
[Figure: the combined automaton for A, B, and line, highlighting paths the program can actually take, in which a call to line returns to the call site it came from]
Impossible Paths
[Figure: the same automaton, highlighting a path it accepts but the program cannot take: entering line from one call site and leaving through the ε-edge to the other call site]
33
A
read
Adding Context
Sensitivity
B

read
line
Y

X
write
close

Y
close

X
28 July 2016
Somesh Jha, UW-Madison
34
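Context sensitivity can be sketched as a pushdown automaton: the ε-edge into line pushes its call-site identifier, and the ε-edge out of line may be taken only when the same identifier is on top of the stack. The monitor then tracks (state, stack) pairs instead of bare states. The edge encoding below is an assumption for illustration, using the hypothetical A/B/line example; it rejects the impossible path read, write, close.

```python
from typing import Iterable, List, Set, Tuple, Union

Label = Union[str, Tuple[str, str]]   # system call, or ("push" | "pop", call site)
Edge = Tuple[str, Label, str]
Config = Tuple[str, Tuple[str, ...]]  # (state, stack of call-site identifiers)

EDGES: List[Edge] = [
    ("A.in", "read", "A.1"), ("A.1", ("push", "X"), "line.in"), ("line.out", ("pop", "X"), "A.out"),
    ("A.in", "read", "A.2"), ("A.2", "close", "A.out"),
    ("B.in", ("push", "Y"), "line.in"), ("line.out", ("pop", "Y"), "B.1"), ("B.1", "close", "B.out"),
    ("line.in", "write", "line.out"),
]


def closure(configs: Iterable[Config]) -> Set[Config]:
    """Follow push/pop edges; a pop needs the matching call site on top of the stack."""
    seen = set(configs)
    work = list(seen)
    while work:
        state, stack = work.pop()
        for src, label, dst in EDGES:
            if src != state or not isinstance(label, tuple):
                continue
            op, site = label
            if op == "push":
                nxt = (dst, stack + (site,))
            elif stack and stack[-1] == site:
                nxt = (dst, stack[:-1])
            else:
                continue
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen


def prefix_ok(trace: List[str], start: str = "A.in") -> bool:
    """True while at least one (state, stack) configuration matches the trace."""
    current = closure([(start, ())])
    for call in trace:
        current = closure((dst, stack) for state, stack in current
                          for src, label, dst in EDGES if src == state and label == call)
        if not current:
            return False
    return True


print(prefix_ok(["read", "write"]))           # True
print(prefix_ok(["read", "write", "close"]))  # False: line must return to A
```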
PDA State Explosion
• ε-edge identifiers are maintained on a stack
– Because ε-edges do not appear in the system call stream, the monitor must track every stack the program could have built
– The stack may grow without bound
• Solution:
– Dyck language model
– Stack operations become visible in the call stream
– Requires binary rewriting
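The unbounded-stack problem appears as soon as a function can call itself before making any system call: the monitor cannot see the ε-edges, so every recursion depth is a possible configuration. The sketch below uses a hypothetical recursive function f and caps the stack at max_depth purely so that the closure terminates; the number of configurations keeps growing with the cap.

```python
from typing import Iterable, List, Set, Tuple, Union

Label = Union[str, Tuple[str, str]]   # system call, or ("push" | "pop", call site)
Edge = Tuple[str, Label, str]
Config = Tuple[str, Tuple[str, ...]]

# Hypothetical recursive function f: it may call itself (call site R) before its read.
EDGES: List[Edge] = [
    ("f.in", ("push", "R"), "f.in"),
    ("f.in", "read", "f.out"),
    ("f.out", ("pop", "R"), "f.ret"),
    ("f.ret", "close", "f.out"),
]


def bounded_closure(configs: Iterable[Config], max_depth: int) -> Set[Config]:
    """Epsilon closure restricted to stacks of at most max_depth entries."""
    seen = set(configs)
    work = list(seen)
    while work:
        state, stack = work.pop()
        for src, label, dst in EDGES:
            if src != state or not isinstance(label, tuple):
                continue
            op, site = label
            if op == "push" and len(stack) < max_depth:
                nxt = (dst, stack + (site,))
            elif op == "pop" and stack and stack[-1] == site:
                nxt = (dst, stack[:-1])
            else:
                continue
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen


for depth in (1, 10, 100):
    print(depth, len(bounded_closure([("f.in", ())], depth)))  # depth + 1 configurations
```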
Dyck Language Model
[Figure: the context-sensitive automaton for A, B, and line, in which the call-site identifiers become observable null calls: X and Y on entry to line, X’ and Y’ on return]
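With null calls in the stream, the monitor no longer has to guess stack contents: an opening null call pushes its identifier and a closing null call must pop the matching one, so a single stack suffices. This sketch uses a hypothetical A/B/line example reconstructed from the figure, writing the null calls X’ and Y’ as the ASCII names X' and Y'; it is an illustration, not the prototype's monitor.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # every edge is an observable event: system call or null call

EDGES: List[Edge] = [
    ("A.in", "read", "A.1"), ("A.1", "X", "line.in"), ("line.out", "X'", "A.out"),
    ("A.in", "read", "A.2"), ("A.2", "close", "A.out"),
    ("B.in", "Y", "line.in"), ("line.out", "Y'", "B.1"), ("B.1", "close", "B.out"),
    ("line.in", "write", "line.out"),
]
OPEN_TO_CLOSE: Dict[str, str] = {"X": "X'", "Y": "Y'"}
CLOSE_TO_OPEN: Dict[str, str] = {c: o for o, c in OPEN_TO_CLOSE.items()}


def dyck_monitor(trace: List[str], start: str = "A.in") -> bool:
    """Return False at the first event rejected by the automaton or the null-call stack."""
    states: Set[str] = {start}
    stack: List[str] = []
    for event in trace:
        if event in OPEN_TO_CLOSE:
            stack.append(event)
        elif event in CLOSE_TO_OPEN:
            if not stack or stack.pop() != CLOSE_TO_OPEN[event]:
                return False
        states = {dst for src, label, dst in EDGES if src in states and label == event}
        if not states:
            return False
    return True


print(dyck_monitor(["read", "X", "write", "X'"]))           # True
print(dyck_monitor(["read", "X", "write", "Y'"]))           # False: wrong return site
print(dyck_monitor(["read", "X", "write", "X'", "close"]))  # False
```

The automaton alone would still allow Y' after write; it is the explicit stack, fed by observable null calls, that rules out the return to B.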
Rewriting User Job
[Diagram: the Analyzer takes the User Job (Binary Program) and produces the Modified User Job (Rewritten Binary) and the Checking Shadow]
Null Call Insertion
[Diagram: Rewritten Binary → Expanded Interface → Runtime Monitor → Event Interface → Operating System]
• Null calls are dummy system calls
– Part of the expanded interface
– Used by the monitor to update the model
– Do not cross the interface to the operating system
Rewriting User Job
void function(int a) {
    if (a < 0) {
        read(0, 15);
        line();
    } else {
        read(a, 15);
        close(a);
    }
}
• Insert dummy remote system calls around function call sites
• Notify the monitor of stack activity
Rewriting User Job
void function(int a) {
    if (a < 0) {
        read(0, 15);
        X();    /* null call: entering line from this call site */
        line();
        X’();   /* null call: returning to this call site */
    } else {
        read(a, 15);
        close(a);
    }
}
• Insert dummy remote system calls around function call sites
• Notify the monitor of stack activity
• Null calls are cheap
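The rewriting step can be sketched on a simplified representation in which one code path is just the list of callees in program order; each call to a user-level function is wrapped in an opening and a closing null call with a fresh call-site identifier, while system calls are left alone. The list representation, the helper name insert_null_calls, and the identifier supply are assumptions; the actual rewriter works on call instructions in the binary.

```python
from typing import Iterator, List, Set


def insert_null_calls(path: List[str], user_functions: Set[str],
                      site_ids: Iterator[str]) -> List[str]:
    """Surround each user-function call with <site> and <site>' null calls."""
    rewritten: List[str] = []
    for callee in path:
        if callee in user_functions:
            site = next(site_ids)  # fresh identifier for this call site
            rewritten += [site, callee, site + "'"]
        else:  # system calls are left untouched
            rewritten.append(callee)
    return rewritten


# The a < 0 branch of the example function: read, then a call to line.
print(insert_null_calls(["read", "line"], {"line"}, iter(["X", "Y"])))
# ['read', 'X', 'line', "X'"]
```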
Dyck Language Model Theory
• The language accepted is a bracketed context-free language [Ginsburg, Harrison]
• Subsequences of null calls form a Dyck language [Chomsky, Schützenberger]
• Dyck languages are as powerful as CFLs: every context-free language is a homomorphic image of a Dyck language intersected with a regular language
\( L_{\mathrm{CFL}} = h(L_{\mathrm{Dyck}} \cap L_{\mathrm{Reg}}) \) [Chomsky]
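Two ingredients of the equation can be illustrated on an event stream: a membership check for the Dyck language over the null calls, and a homomorphism h that erases the null calls and keeps the system calls. The bracket pairs X/X' and Y/Y' are hypothetical, following the null calls in the Dyck model figure; the intersection with the regular (automaton) language is left out of this sketch.

```python
from typing import Dict, List

BRACKETS: Dict[str, str] = {"X": "X'", "Y": "Y'"}  # opening null call -> closing null call
CLOSING: Dict[str, str] = {c: o for o, c in BRACKETS.items()}


def is_dyck(events: List[str]) -> bool:
    """True if the null calls in events form a balanced, properly nested word."""
    stack: List[str] = []
    for e in events:
        if e in BRACKETS:
            stack.append(e)
        elif e in CLOSING and (not stack or stack.pop() != CLOSING[e]):
            return False
    return not stack


def h(events: List[str]) -> List[str]:
    """Homomorphism that maps every null call to the empty word."""
    return [e for e in events if e not in BRACKETS and e not in CLOSING]


stream = ["read", "X", "write", "Y", "write", "Y'", "X'", "close"]
print(is_dyck(stream), h(stream))  # True ['read', 'write', 'write', 'close']
```

A runtime monitor sees prefixes, so it would accept unclosed brackets mid-run; is_dyck checks a complete word.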
Test Programs
Program     Number of Instructions
procmail    107,246
gzip         56,710
cat          54,028
ps           59,814
fdformat     67,874
eject        70,177
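Smart Null Call Insertion
• Precision metric: average branching factor, the average number of system calls the model would accept as the next event
[Figure: example model state from which the next call may be chown, getpid, or open]
• Lower values indicate greater precision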
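A sketch of the metric, under the assumption that the branching factor is measured just before each observed system call as the number of distinct calls the model would accept there, then averaged over the trace. The model and trace are hypothetical; the first state offers the three calls shown on the slide.

```python
from typing import Dict, List, Set, Tuple

Transitions = Dict[Tuple[int, str], Set[int]]


def average_branching_factor(transitions: Transitions, start: int, trace: List[str]) -> float:
    """Average number of distinct calls accepted before each call of a non-empty trace."""
    current: Set[int] = {start}
    factors: List[int] = []
    for call in trace:
        # Distinct system calls the model would accept from any current state.
        allowed = {sym for (state, sym) in transitions if state in current}
        factors.append(len(allowed))
        current = {t for s in current for t in transitions.get((s, call), set())}
        if not current:
            raise ValueError(f"trace rejected at {call!r}")
    return sum(factors) / len(factors)


# Hypothetical model: chown, getpid, or open first, then close.
MODEL: Transitions = {
    (0, "chown"): {1}, (0, "getpid"): {1}, (0, "open"): {1},
    (1, "close"): {2},
}
print(average_branching_factor(MODEL, 0, ["open", "close"]))  # (3 + 1) / 2 = 2.0
```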
NFA and Dyck Model Accuracy
[Figure: bar chart of average branching factor (y-axis, 0 to 12) for the NFA and Dyck models on procmail, gzip, cat, ps, fdformat, and eject]
Number of Calls Generated
[Figure: bar chart of the number of calls generated (y-axis, 0 to 3500) under the NFA and Dyck models on procmail, gzip, cat, ps, fdformat, and eject]
Important Ideas
• Attackers exploit code vulnerabilities to execute arbitrary, malicious code.
• Static analysis performed before execution constructs a model of the program's system call sequences, which addresses this threat.
• The Dyck model effectively balances model accuracy and runtime cost.