Part I - Princeton University

Logic-based, data-driven enterprise network security analysis
Xinming (Simon) Ou
Assistant Professor
CIS Department
Kansas State University
COS 598D: Formal Methods in Networking
Princeton University
March 08, 2010
Self Introduction
• Brief Bio
– PhD, Princeton University, 2005
– Post-doc, Purdue CERIAS, Idaho National Laboratory, 2006
– Assistant Professor, Kansas State University, 2006-now
• Research Interests
– Computer and network security, especially formal and quantitative analysis
– Programming languages, formal methods
• Research Group
– Argus: http://people.cis.ksu.edu/~xou/argus/
Overview of the two lectures
• Lecture One
– Datalog model for network attacks
– SLG resolution for Datalog evaluation
– Exhaustive proof generation for Datalog
• Lecture Two
– Formulating the security hardening problem as a SAT-solving problem
– Applying MinCostSAT to achieve an optimal security configuration
– Open research problems
Cyber Defender’s Life
[Figure: IDS alerts, network configuration, users and data assets, vulnerability reports, and security advisories (e.g., "Apache 1.3.4 bug!") all feed an automated situation-awareness reasoning system.]
Multi-step Attacks
[Figure: Internet, Firewall 1, demilitarized zone (DMZ) containing webServer, Firewall 2, and the corporation network containing fileServer (webPages, sharedBinary) and workStation; a Trojan horse is planted in sharedBinary.]
Two Questions
• Are there potential attack paths in the system?
– How can they happen?
– How can they be addressed in an optimal way?
• Are there attacks that are going on, or that have succeeded, in the system?
– How do you know?
– How can the attack be countered?
What we are going to focus on: the first question.
MulVAL
Ou, Govindavajhala, and Appel. USENIX Security 2005
[Figure: MulVAL architecture. The analyzer takes Datalog rules from security experts; vulnerability information (e.g., NIST NVD); the output of vulnerability scanners driven by vulnerability definitions (e.g., OVAL, Nessus Scripting Language); network reachability information from a network analyzer; and user information. It answers queries such as "Could root be compromised on any of the machines?"]
Network config (firewall analyzer)
Host access-control lists
reachable(internet, webServer, tcp, 80).
reachable(webServer, fileServer, nfs, -).
...
File permissions (host config scanner)
fileOwner(webServer, /bin/apache, root).
fileAttr(webServer, /bin/apache, r,w,x,r,0,0,r,0,0).
Installed software (host-based vulnerability scanner)
…
vulExists(dbServer, 'CVE-2009-2446', mySQL).
vulExists(webServer, 'CVE-2006-3747', httpd).
Security advisories (US-CERT, NVD)
…
vulProperty('CVE-2009-2446', remote, privEscalation).
vulProperty('CVE-2006-3747', remote, privEscalation).
Datalog Rules
Written by a security expert, the rules capture Linux security behavior, Windows security behavior, and common attack techniques.
execCode(Host, PrivilegeLevel) :-
    vulExists(Host, Program, remote, privilegeEscalation),
    serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel),
    networkAccess(Host, Protocol, Port).
The rules are completely independent of any site-specific settings.
Rule for NFS
accessFile(Server, Access, Path) :-
    nfsExport(Server, Path, Access, Client),
    reachable(Client, Server, nfs, -),
    execCode(Client, _Perm).
[Figure: webServer in the dmz; fileServer in corp, holding webPages and sharedBinary.]
Rule for Trojan Horse
execCode(H, User) :-
    accessFile(H, write, Path),
    fileOwner(H, Path, User).
[Figure: corp network containing fileServer and workStation; labels: sharedBinary, projectPlan, webPages, Trojan horse.]
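The two rules chain together: once a client is compromised, the NFS rule grants file access on the server, and the Trojan-horse rule turns write access into code execution as the file's owner. A minimal Python sketch of naive forward chaining over just these two rules, assuming hypothetical facts (webServer already compromised, a writable /export share exported to it and owned by root on fileServer); atoms are plain tuples, and the constant '-' stands in for the unspecified NFS port.

```python
# Naive forward chaining over the NFS and Trojan-horse rules (illustrative sketch).
# All facts are hypothetical; each atom is a tuple whose first element is the predicate.
FACTS = {
    ("execCode", "webServer", "apache"),                       # assumed initial compromise
    ("nfsExport", "fileServer", "/export", "write", "webServer"),
    ("reachable", "webServer", "fileServer", "nfs", "-"),
    ("fileOwner", "fileServer", "/export", "root"),
}

def derive(facts):
    facts = set(facts)
    while True:
        new = set()
        for f in facts:
            # accessFile(Server, Access, Path) :- nfsExport(Server, Path, Access, Client),
            #     reachable(Client, Server, nfs, -), execCode(Client, _Perm).
            if f[0] == "nfsExport":
                _, server, path, access, client = f
                client_owned = any(g[0] == "execCode" and g[1] == client for g in facts)
                if client_owned and ("reachable", client, server, "nfs", "-") in facts:
                    new.add(("accessFile", server, access, path))
            # execCode(H, User) :- accessFile(H, write, Path), fileOwner(H, Path, User).
            elif f[0] == "accessFile" and f[2] == "write":
                _, host, _, path = f
                for g in facts:
                    if g[0] == "fileOwner" and g[1] == host and g[2] == path:
                        new.add(("execCode", host, g[3]))
        if new <= facts:      # nothing new was derived: fixpoint reached
            return facts
        facts |= new

result = derive(FACTS)
print(("accessFile", "fileServer", "write", "/export") in result)  # True
print(("execCode", "fileServer", "root") in result)                # True
```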
Deducing new facts
Rule:
execCode(Host, PrivilegeLevel) :-
    vulExists(Host, Program, remote, privilegeEscalation),
    serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel),
    networkAccess(Host, Protocol, Port).
Facts:
networkAccess(webServer, tcp, 80).   (derived: webServer in the dmz is reachable from the internet through Firewall 1)
serviceRunning(webServer, httpd, tcp, 80, apache).   (from vulnerability scanner)
vulExists(webServer, httpd, remote, privilegeEscalation).   (from vulnerability scanner & NVD)
New fact: execCode(webServer, apache).   Oops!
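Deriving the new fact amounts to a relational join of the three body predicates on their shared variables. A minimal Python sketch, assuming each predicate is stored as a set of tuples holding the facts from this slide:

```python
# Facts from the slide, one set of tuples per predicate.
vul_exists = {("webServer", "httpd", "remote", "privilegeEscalation")}
service_running = {("webServer", "httpd", "tcp", 80, "apache")}
network_access = {("webServer", "tcp", 80)}

def exec_code():
    """execCode(Host, PrivilegeLevel) :-
         vulExists(Host, Program, remote, privilegeEscalation),
         serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel),
         networkAccess(Host, Protocol, Port)."""
    return {
        (host, priv)
        for (host, prog, rng, effect) in vul_exists
        if (rng, effect) == ("remote", "privilegeEscalation")
        for (h, p, proto, port, priv) in service_running
        if (h, p) == (host, prog)                    # join on Host and Program
        if (host, proto, port) in network_access     # join on Host, Protocol, Port
    }

print(exec_code())  # {('webServer', 'apache')}
```

Nothing site-specific appears inside exec_code; only the fact sets change from one network to another.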
Advantages of using Prolog
• Prolog’s goal-oriented evaluation is
potentially more efficient.
• Prolog provides more programming
flexibility.
Can we evaluate Datalog programs in Prolog?
However…
• Prolog as a programming language cannot be directly used to evaluate Datalog:
ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
parent(bill,mary).
parent(mary,john).
?- ancestor(X,Y).
The same program with the subgoals of the recursive rule swapped:
ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
parent(bill,mary).
parent(mary,john).
?- ancestor(X,Y).
The same program with the two rules also swapped, so the recursive rule comes first:
ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
ancestor(X,Y) :- parent(X,Y).
parent(bill,mary).
parent(mary,john).
?- ancestor(X,Y).
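Prolog's behaviour on these programs can be mimicked in Python with generators that try clauses top to bottom and subgoals left to right. This is a hand translation of the first and third programs, not a general SLD engine. The right-recursive version enumerates all three answers and stops; in the left-recursive version the first subgoal is a call to ancestor itself with no new bindings, so Python eventually raises RecursionError, its analogue of Prolog's infinite loop.

```python
PARENT = [("bill", "mary"), ("mary", "john")]

def ancestor_right(x=None):
    """ancestor(X,Y) :- parent(X,Y).
       ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).   (x=None means X is unbound)"""
    for a, b in PARENT:                     # clause 1
        if x is None or a == x:
            yield (a, b)
    for a, z in PARENT:                     # clause 2: parent(X,Z) binds Z first...
        if x is None or a == x:
            for _, y in ancestor_right(z):  # ...then ancestor(Z,Y) is called with Z bound
                yield (a, y)

def ancestor_left(y=None):
    """ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
       ancestor(X,Y) :- parent(X,Y).   (y=None means Y is unbound)"""
    for z, b in ancestor_left(y):           # clause 1 re-calls the same goal immediately
        for a, z2 in PARENT:
            if z2 == z:
                yield (a, b)
    for a, b in PARENT:                     # clause 2 is never reached
        if y is None or b == y:
            yield (a, b)

print(list(ancestor_right()))  # [('bill', 'mary'), ('mary', 'john'), ('bill', 'john')]
try:
    next(ancestor_left())
except RecursionError:
    print("left recursion does not terminate")
```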
Problem of SLD resolution
ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
parent(bill,mary).
parent(mary,john).
SLD tree for the query ← ancestor(X, Y):
← ancestor(X, Y).
   ← parent(X, Y).
      X=bill, Y=mary: Success
      X=mary, Y=john: Success
   ← parent(X, Z), ancestor(Z, Y).
      X=bill, Z=mary: ← ancestor(mary, Y).
         ← parent(mary, Y).
            Y=john: Success
         ← parent(mary, Z2), ancestor(Z2, Y).
            Z2=john: ← ancestor(john, Y). … Failure
      X=mary, Z=john: ← ancestor(john, Y). … Failure
Problem of SLD resolution
ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
ancestor(X,Y) :- parent(X,Y).
parent(bill,mary).
parent(mary,john).
← ancestor(X, Y).
← ancestor(Z, Y), parent(X, Z).
← ancestor(Z1, Y), parent(Z, Z1), parent(X, Z).
← ancestor(Z2, Y), parent(Z1, Z2), parent(Z, Z1), parent(X, Z).
… (the leftmost subgoal is always a fresh ancestor call, so the derivation never ends)
Problem of SLD resolution
• Termination of cyclic Datalog programs depends not only on the logical semantics but also on the order of the clauses and subgoals.
– This creates problems because cyclic rules are commonplace in network security analysis.
• e.g., after compromising one machine, the attacker can use it as a stepping stone to compromise another.
– Datalog is a declarative language, so order should not matter.
– A pure Datalog program should always terminate, because the number of possible tuples is bounded (finitely many constants and no function symbols).
Bottom-up Evaluation
ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
ancestor(X,Y) :- parent(X,Y).
parent(bill,mary).
parent(mary,john).
Semi-naïve evaluation:
Step 1 (base case): ancestor(bill,mary), ancestor(mary,john)
Step 2, iteration 1: ancestor(bill,john)
Step 2, iteration 2: no new tuples ("fixpoint")
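Semi-naïve evaluation avoids repeating old derivations by joining only the tuples that were new in the previous iteration (the delta). A minimal Python sketch for this program, assuming relations are stored as sets of pairs; because the recursive rule has a single recursive subgoal, joining the delta with parent is exact.

```python
PARENT = {("bill", "mary"), ("mary", "john")}

def semi_naive(parent):
    # Step 1 (base case): ancestor(X,Y) :- parent(X,Y).
    ancestor = set(parent)
    delta = set(parent)          # tuples that were new in the previous iteration
    iterations = []
    while delta:
        # Step 2: ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z), using only the delta.
        new = {(x, y) for (z, y) in delta for (x, z2) in parent if z2 == z} - ancestor
        iterations.append(new)
        ancestor |= new
        delta = new              # an empty delta means the fixpoint is reached
    return ancestor, iterations

ancestor, iterations = semi_naive(PARENT)
print(iterations)  # [{('bill', 'john')}, set()]
```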
SLG Resolution
• Goal-oriented evaluation
• Predicates can be “tabled”
– A table stores the evaluation results of a goal.
– The results can be reused later, i.e., dynamic programming.
– Entering an active table indicates a cycle.
– A fixpoint computation is performed at such tables.
• The XSB system implements SLG resolution
– Developed at Stony Brook University (http://xsb.sourceforge.net/).
– Provides full ISO Prolog compatibility.
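The tabling idea can be sketched in Python for the left-recursive ancestor program. This is a deliberate simplification, not XSB's algorithm: each call pattern (here, the binding of Y) gets one table; a call that finds its table already active just reads the answers collected so far (the cycle), and the generating call re-runs its clauses until no new answers appear. Real SLG resolution instead suspends and resumes consumer nodes and completes mutually dependent tables together, which this sketch does not handle.

```python
PARENT = {("bill", "mary"), ("mary", "john")}
TABLES = {}  # call pattern (binding of Y, None = free) -> set of answers

def ancestor(y=None):
    """Tabled ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
              ancestor(X,Y) :- parent(X,Y)."""
    if y in TABLES:                      # table already exists: consume its answers
        return TABLES[y]
    table = TABLES[y] = set()            # generator: create a new table for this call
    while True:
        before = len(table)
        # Clause 1: resolve ancestor(Z,Y) against the answers in the (active) table.
        for z, b in list(ancestor(y)):
            table |= {(a, b) for (a, z2) in PARENT if z2 == z}
        # Clause 2: ancestor(X,Y) :- parent(X,Y).
        table |= {(a, b) for (a, b) in PARENT if y is None or b == y}
        if len(table) == before:         # no new answers: fixpoint, table complete
            return table

print(sorted(ancestor()))  # [('bill', 'john'), ('bill', 'mary'), ('mary', 'john')]
```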
SLG resolution example
ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
ancestor(X,Y) :- parent(X,Y).
parent(bill,mary).
parent(mary,john).
← ancestor(X, Y).   [generator node: new table created for ancestor(X,Y)]
   ← ancestor(Z, Y), parent(X, Z).   [active node: resolve ancestor(Z,Y) against the results in the table for ancestor(X,Y)]
      Z=bill, Y=mary: ← parent(X, bill). Failure
      Z=mary, Y=john: ← parent(X, mary). X=bill: Success (adds ancestor(bill,john) to the table)
      Z=bill, Y=john: ← parent(X, bill). Failure
   ← parent(X, Y).
      X=bill, Y=mary: Success
      X=mary, Y=john: Success
SLG in MulVAL
netAccess(H2, Protocol, Port) :-
    execCode(H1, User),
    reachable(H1, H2, Protocol, Port).
[Figure: the table for the goal netAccess(…) holds its possible instantiations; the table for the first subgoal execCode(…) holds its possible instantiations; reachable(…) is matched against the input tuples.]
SLG complexity for Datalog
• Total time is dominated by the rule that has the maximum number of instantiations
– Time for computing one table = computation of the subgoals + retrieving information from input tuples + matching results in the rule bodies
– Time for computing all tables = retrieving information from input tuples + matching results in the rule bodies (each subgoal's table is computed only once, so its cost is already counted with that table)
• See "On the Complexity of Tabled Datalog Programs"
http://www.cs.sunysb.edu/~warren/xsbbook/node21.html
MulVAL complexity in SLG
execCode(Attacker, Host, User) :-
    vulExists(Host, _, Program, remote, privilegeEscalation),
    networkService(Host, Program, Protocol, Port, User),
    netAccess(Attacker, Host, Protocol, Port).
Scales with network size: O(N) different instantiations
MulVAL complexity in SLG
netAccess(Attacker, H2, Protocol, Port) :-
    execCode(Attacker, H1, _),
    reachable(H1, H2, Protocol, Port).
Scales with network size: O(N²) different instantiations, which determines the complexity of MulVAL
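The quadratic term shows up when counting rule-body instantiations. A Python sketch, assuming a hypothetical fully connected network of N hosts h0…h(N-1), every one running a remotely exploitable service, with the attacker starting on internet, which reaches only h0. Protocols, ports, and user names are dropped, so the two rules reduce to execCode(H) and netAccess(H2) over host names.

```python
def analyze(n):
    hosts = [f"h{i}" for i in range(n)]
    # Hypothetical topology: internet -> h0, plus every ordered pair of distinct hosts.
    reachable = {("internet", "h0")} | {(a, b) for a in hosts for b in hosts if a != b}
    vulnerable = set(hosts)              # hypothetical: every host is exploitable
    exec_code = {"internet"}             # the attacker's starting point
    while True:
        # netAccess(H2) :- execCode(H1), reachable(H1, H2).
        net_access = {h2 for (h1, h2) in reachable if h1 in exec_code}
        # execCode(H) :- vulnerable(H), netAccess(H).
        new_exec = exec_code | (net_access & vulnerable)
        if new_exec == exec_code:        # fixpoint
            break
        exec_code = new_exec
    # Distinct (H1, H2) pairs satisfying the netAccess rule body at the fixpoint.
    instantiations = sum(1 for (h1, h2) in reachable if h1 in exec_code)
    return exec_code, instantiations

for n in (10, 100):
    compromised, count = analyze(n)
    print(n, len(compromised) - 1, count)   # prints "10 10 91", then "100 100 9901"
```

For N = 10 the netAccess rule has 1 + 10·9 = 91 instantiations and for N = 100 it has 9,901, consistent with O(N²), while execCode is derived once per host.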
Datalog proof generation
• In security analysis, we want to know not only what attacks could happen but also how they can happen
– Thus, we need more than a yes/no answer to queries.
– We need the proofs of the true queries, which in security analysis are attack paths.
– We also want to know all possible attack paths, so we need exhaustive proof generation.
An obvious approach
execCode(Host, PrivilegeLevel) :-
    vulExists(Host, Program, remote, privilegeEscalation),
    serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel),
    networkAccess(Host, Protocol, Port).
Adding a proof-term argument to every predicate:
execCode(Host, PrivilegeLevel, Pf) :-
    vulExists(Host, Program, remote, privilegeEscalation, Pf1),
    serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel, Pf2),
    networkAccess(Host, Protocol, Port, Pf3),
    Pf = (execCode(Host, PrivilegeLevel), [Pf1, Pf2, Pf3]).
This breaks the bounded-term property and results in non-termination for cyclic Datalog programs: each trip around a cycle produces a new, larger proof term.
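The non-termination can be reproduced on a hypothetical two-host cycle in which compromising host a gives network access to b and vice versa. The Python sketch below simplifies each rule to a single body atom and carries full proof terms, in the spirit of the augmented rule above. The set of distinct atoms stops growing after three rounds, but every round yields a new, longer proof, so naive iteration never reaches a fixpoint.

```python
# Hypothetical cyclic program, each rule as (head, single body atom).
RULES = [
    ("netAccess(b)", "execCode(a)"),
    ("execCode(b)", "netAccess(b)"),
    ("netAccess(a)", "execCode(b)"),
    ("execCode(a)", "netAccess(a)"),
]

def naive_with_proofs(rounds):
    known = {("execCode(a)", "fact")}    # (atom, proof term); execCode(a) is the start
    history = []
    for _ in range(rounds):
        # Each derivation wraps the body's proof term: proofs grow without bound.
        new = {(head, (head, pf)) for (atom, pf) in known for (head, body) in RULES if body == atom}
        known |= new
        history.append((len({atom for atom, _ in known}), len(known)))
    return history

for atoms, proofs in naive_with_proofs(6):
    print(atoms, proofs)   # atoms saturate at 4; proofs keep increasing by one per round
```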
MulVAL Attack-Graph Toolkit
Ou, Boyer, and McQueen. ACM CCS 2006
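[Figure: machine configuration, network configuration, and security advisories are converted into a Datalog representation; the Datalog rules are translated, and together with that representation are evaluated by the XSB reasoning engine, which outputs Datalog proof steps; the graph builder turns the proof steps into a Datalog proof graph.]
Joint work with Idaho National Laboratory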
Stage 1: Record Proof Steps
netAccess(H2, Protocol, Port, ProofStep) :-
    execCode(H1, User),
    reachable(H1, H2, Protocol, Port),
    ProofStep = because(             % proof step
        'multi-hop network access',
        netAccess(H2, Protocol, Port),
        [execCode(H1, User),
         reachable(H1, H2, Protocol, Port)]
    ).
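Unlike a full proof term, a proof step mentions only the head and its immediate body atoms, so its size is bounded and the fixpoint still exists. A Python sketch, assuming a hypothetical cycle (webServer can reach fileServer and vice versa) and a simplified remote-exploit rule over a made-up vulnService(Host, Protocol, Port, User) predicate standing in for MulVAL's vulExists and networkService:

```python
REACHABLE = {
    ("webServer", "fileServer", "rpc", 100003),
    ("fileServer", "webServer", "tcp", 80),      # hypothetical link closing the cycle
}
VULN_SERVICE = {                                 # hypothetical vulnerable services
    ("fileServer", "rpc", 100003, "root"),
    ("webServer", "tcp", 80, "apache"),
}

def evaluate(initial):
    atoms, steps = set(initial), set()
    while True:
        new = set()
        for a in atoms:
            if a[0] == "execCode":
                # netAccess(H2,Pr,Po) :- execCode(H1,U), reachable(H1,H2,Pr,Po).
                _, h1, _user = a
                for (src, h2, proto, port) in REACHABLE:
                    if src == h1:
                        body = (a, ("reachable", src, h2, proto, port))
                        new.add(("multi-hop network access", ("netAccess", h2, proto, port), body))
            elif a[0] == "netAccess":
                # execCode(H,U) :- netAccess(H,Pr,Po), vulnService(H,Pr,Po,U).
                _, host, proto, port = a
                for (h, pr, po, user) in VULN_SERVICE:
                    if (h, pr, po) == (host, proto, port):
                        body = (a, ("vulnService", h, pr, po, user))
                        new.add(("remote exploit", ("execCode", host, user), body))
        if new <= steps:                         # no new proof step: fixpoint
            return atoms, steps
        steps |= new
        atoms |= {head for (_, head, _) in new}

atoms, steps = evaluate({("execCode", "webServer", "apache")})
print(len(atoms), len(steps))  # 4 4
```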
Stage 2: Build the Exhaustive Proof
because('multi-hop network access',
        netAccess(fileServer, rpc, 100003),
        [execCode(webServer, apache),
         reachable(webServer, fileServer, rpc, 100003)])
[Figure: the proof step as a graph fragment with nodes numbered 0 to 3: execCode(webServer, apache) and reachable(webServer, fileServer, rpc, 100003) feed the rule node "multi-hop network access", which derives netAccess(fileServer, rpc, 100003).]
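Graph construction can be sketched in Python: every fact and every rule instance becomes one node, found through a dictionary so that repeated occurrences share a node, which is the constant-time table lookup the complexity argument relies on. Derived facts become OR nodes, rule instances AND nodes, and facts that are never derived LEAF (ground-fact) nodes. The representation (tuples for atoms, integer node ids, edges pointing from body to rule to head) is an assumption of this sketch, not MulVAL's graph builder.

```python
def build_graph(steps):
    node_id, kind, edges = {}, {}, set()
    derived = {head for (_, head, _) in steps}

    def node(key, k):
        # Dictionary lookup: each fact or rule instance gets exactly one node.
        if key not in node_id:
            node_id[key] = len(node_id)
            kind[node_id[key]] = k
        return node_id[key]

    for rule, head, body in steps:
        h = node(head, "OR")                    # derived fact: any one rule node suffices
        r = node((rule, head, body), "AND")     # rule instance: needs all its body facts
        edges.add((r, h))
        for b in body:
            edges.add((node(b, "OR" if b in derived else "LEAF"), r))
    return node_id, kind, edges

STEP = ("multi-hop network access",
        ("netAccess", "fileServer", "rpc", 100003),
        (("execCode", "webServer", "apache"),
         ("reachable", "webServer", "fileServer", "rpc", 100003)))

nodes, kinds, edges = build_graph([STEP])
print(len(nodes), len(edges))  # 4 3
```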
Complexity of Proof Building
• O(N²) to complete Datalog evaluation
– With proof steps generated
• O(N²) to build a proof graph from proof steps
– Need to build O(N²) graph components
– Building one component:
• Find the predecessor: table lookup
• Find the successors: table lookup
Total time: O(N²), if table lookup is constant time
Logical Attack Graphs
[Figure: logical attack graph with OR nodes, AND nodes, and ground-fact nodes. netAccess(attacker,webServer,tcp,80), vulExists(webServer,CAN-2002-0392,httpd,remoteExploit,privEscalation), and networkService(webServer,httpd,tcp,80,apache) lead via "Remote exploit" to execCode(attacker,webServer,apache); via "NFS shell" to accessFile(attacker,fileServer,write,/export); via "NFS semantics" to accessFile(attacker,workStation,write,/usr/local/share); and via "Trojan horse installation" to execCode(attacker,workStation,root).]
Performance and Scalability
[Figure: CPU time (sec, log scale from 0.01 to 10000) versus number of hosts (log scale from 1 to 1000) for fully connected, partitioned, ring, and star network topologies.]
Related Work
• Sheyner’s attack graph tool (CMU)
– Based on model-checking
• Cauldron attack graph tool (GMU)
– Based on graph-search algorithms
• NetSPA attack graph tool (MIT LL)
– Graph-search based on a simple attack model
Advantages of the Logic-Programming Approach
• Knowledge and information can be published and incorporated through well-understood logical semantics
• Efficient and sound analysis by leveraging the reasoning power of well-developed logic-deduction systems
Next Lecture
• How to make use of the proof graph
– Optimizing mitigation measures through SAT solving
• Open problems
– Uncertainty in reasoning