Secure Network Provenance
Wenchao Zhou*, Qiong Fei*, Arjun Narayan*,
Andreas Haeberlen*, Boon Thau Loo*, Micah Sherr+
*University of Pennsylvania
+Georgetown University
http://snp.cis.upenn.edu/
Motivation
[Figure: Alice reaches foo.com through a network of routers A, B, C, D, and E. The route changes from r1 to r2, and Alice asks: "Why did my route to foo.com change?!" Innocent reason? Malicious attack?]
An example scenario: network routing
System administrator observes strange behavior
Example: the route to foo.com has suddenly changed
What exactly happened (innocent reason or malicious attack)?
We Need Secure Forensics
For network routing …
Example: incident in March 2010
Traffic from Capitol Hill got redirected
… but also for other application scenarios
Distributed hash table: Eclipse attack
Cloud computing: misbehaving machines
Online multi-player gaming: cheating
Goal: secure forensics in adversarial scenarios
Ideal Solution
[Figure: Alice, whose route to foo.com changed to r2, queries the network (routers A, B, C, D, E).]
Q: Explain why the route to foo.com changed to r2.
A: Because someone accessed Router D and changed the configuration from X to Y.
Not realistic: adversary can tell lies
Challenge: Adversaries Can Lie
[Figure: Alice queries the network (routers A, B, C, D, E) about route r2. A compromised node thinks, "I should cover up the intrusion."]
Q: Explain why the route to foo.com changed to r2.
A: Everything is fine. Router E advertised a new route.
Problem: adversary can …
… fabricate a plausible (yet incorrect) response
… point accusations at innocent nodes
Existing Solutions
Existing systems assume trusted components
Trusted OS kernel, monitor, or hardware
E.g. Backtracker [OSDI 06], PASS [USENIX ATC 06], ReVirt [OSDI 02], A2M [SOSP 07]
These components may have bugs or be compromised
Are there alternatives that do not require such trust?
Our solution:
We assume no trusted components;
Adversary has full control over an arbitrary subset of the network (Byzantine faults).
Ideal Guarantees
Ideally: explanation is always complete and accurate (fundamentally impossible)
Fundamental limitations
E.g. Faulty nodes secretly exchange messages
E.g. Faulty nodes communicate outside the system
What guarantees can we provide?
Realistic Guarantees
[Figure: Alice queries the network (routers A, B, C, D, E) about route r2 and concludes, "Aha, at least I know which node is compromised."]
Q: Why did my route to foo.com change to r2?
A: Because someone accessed Router D and changed its router configuration from X to Y.
No faults: Explanation is complete and accurate
Byzantine fault: Explanation identifies at least one faulty node
Formal definitions and proofs in the paper
Outline
Goal: A secure forensics system that works in an
adversarial environment
Explains unexpected behavior
No faults: explanation is complete and accurate
Byzantine fault: exposes at least one faulty node with evidence
Model: Secure Network Provenance
Tamper-evident Maintenance and Processing
Evaluation
Conclusion
Provenance as Explanations
[Figure: provenance graph laid over the network of Alice, routers A, B, C, D, E, and foo.com. Vertices shown: route(A, foo.com), route(B, foo.com), route(C, foo.com), route(D, foo.com), route(E, foo.com), route(A, B), route(A, D), link(A, B), link(B, C), link(C, foo.com), link(D, E), link(E, B), ……]
Origin: data provenance in databases
Explains the derivation of tuples (ExSPAN [SIGMOD 10])
Captures the dependencies between tuples as a graph
“Explanation” of a tuple is a tree rooted at the tuple
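
The idea that an explanation is a tree rooted at the queried tuple can be sketched in a few lines of Python. This is an illustration only: the `DERIVED_FROM` table and the `explain` and `render` helpers are hypothetical names, and the dependencies are read off the example figure rather than taken from ExSPAN or SNooPy.

```python
# Hypothetical dependency table, read off the slide's example:
# each derived tuple maps to the tuples it was derived from.
DERIVED_FROM = {
    "route(A, foo.com)": ["link(A, B)", "route(B, foo.com)"],
    "route(B, foo.com)": ["link(B, C)", "route(C, foo.com)"],
    "route(C, foo.com)": ["link(C, foo.com)"],
}


def explain(tuple_name):
    """Return the explanation of a tuple as a nested (tuple, children) tree."""
    # Base tuples (such as links) have no entry and therefore no children.
    children = DERIVED_FROM.get(tuple_name, [])
    return (tuple_name, [explain(child) for child in children])


def render(tree, depth=0):
    """Flatten the tree into indented lines, one vertex per line."""
    name, children = tree
    lines = ["  " * depth + name]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


tree = explain("route(A, foo.com)")
print("\n".join(render(tree)))
```

The recursion assumes the dependency graph is acyclic, which holds for this example but would need a visited set in general.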
Secure Network Provenance
[Figure: provenance of route(A, foo.com), with vertices link(A, B), route(B, foo.com), link(B, C), route(C, foo.com), and link(C, foo.com).]
Challenge #1. Handle past and transient behavior
Traditional data provenance targets current, stable state
What if the system never converges?
What if the state no longer exists?
Secure Network Provenance
[Figure: timeline with provenance snapshots at Time = t1, t2, and t3 over the tuples route(A, foo.com), route(B, foo.com), route(C, foo.com), link(A, B), link(B, C), and link(C, foo.com).]
Challenge #1. Handle past and transient behavior
Traditional data provenance targets current, stable state
What if the system never converges?
What if the state no longer exists?
Solution: Add a temporal dimension
Secure Network Provenance
[Figure: the same timeline at Time = t1, t2, and t3, now annotated with insertion deltas +route(A, foo.com), +route(B, foo.com), +route(C, foo.com), and +link(C, foo.com) next to the state tuples.]
Challenge #2. Explain changes, not just state
Traditional data provenance targets system state
Often more useful to ask why a tuple (dis)appeared
Solution: Include “deltas” in provenance
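
A minimal sketch of how a temporal dimension and deltas might be combined, assuming a flat history of timestamped insertions ("+") and deletions ("-"). The `Delta` record, the integer timestamps, the deletion at time 4, and both helper functions are hypothetical and chosen only for illustration; they are not the SNP data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    time: int          # hypothetical logical timestamp
    sign: str          # "+" for appearance, "-" for disappearance
    tuple_name: str
    causes: tuple      # deltas/tuples that triggered this change


# Hypothetical history loosely following the slide's t1..t3 timeline,
# plus an invented deletion at time 4 to show a disappearance.
HISTORY = [
    Delta(1, "+", "link(C, foo.com)", ()),
    Delta(1, "+", "route(C, foo.com)", ("link(C, foo.com)",)),
    Delta(2, "+", "link(B, C)", ()),
    Delta(2, "+", "route(B, foo.com)", ("link(B, C)", "route(C, foo.com)")),
    Delta(3, "+", "link(A, B)", ()),
    Delta(3, "+", "route(A, foo.com)", ("link(A, B)", "route(B, foo.com)")),
    Delta(4, "-", "link(A, B)", ()),
    Delta(4, "-", "route(A, foo.com)", ("link(A, B)",)),
]


def state_at(history, t):
    """Reconstruct the set of tuples that exist at time t (past state)."""
    live = set()
    for delta in sorted(history, key=lambda d: d.time):
        if delta.time > t:
            break
        if delta.sign == "+":
            live.add(delta.tuple_name)
        else:
            live.discard(delta.tuple_name)
    return live


def why_changed(history, tuple_name, sign, t):
    """Return the latest delta with the given sign for a tuple at or before t."""
    matches = [d for d in history
               if d.tuple_name == tuple_name and d.sign == sign and d.time <= t]
    return max(matches, key=lambda d: d.time) if matches else None


print(sorted(state_at(HISTORY, 2)))
print(why_changed(HISTORY, "route(A, foo.com)", "-", 4))
```

Keeping deltas rather than snapshots lets the same history answer both "what existed at t2?" and "why did route(A, foo.com) disappear?".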
Secure Network Provenance
[Figure: provenance graph with vertices route(D, foo.com), link(D, E), route(E, foo.com), link(E, B), route(A, foo.com), link(A, B), route(B, foo.com), link(B, C), route(C, foo.com), and link(C, foo.com).]
Challenge #3. Partition and secure provenance
A trusted node would be ideal, but we don’t have one
Need to partition the graph among the nodes themselves
Prevent nodes from altering the graph
Partitioning the Provenance Graph
[Figure: the same provenance graph partitioned by node, with each cross-node edge split into a SEND vertex and a RECV vertex.]
Step 1: Each node keeps vertices about local actions
Split cross-node communications
Step 2: Make the graph tamper-evident
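
Step 1 can be illustrated with a small sketch that splits every cross-node edge into a SEND vertex on the sender and a RECV vertex on the receiver, so that each node stores only vertices about its own actions. The `host_of` convention (the first argument of a tuple names its node), the vertex label format, and the `partition` function are assumptions made for this example.

```python
from collections import defaultdict


def host_of(tuple_name):
    """Assumed convention: the first argument of a tuple names its hosting node."""
    return tuple_name[tuple_name.index("(") + 1:tuple_name.index(",")].strip()


def partition(edges):
    """Split a global provenance graph into per-node edge lists.

    Each edge is (cause, effect). A cross-node edge is replaced by a SEND
    vertex stored at the cause's node and a RECV vertex at the effect's node.
    """
    local = defaultdict(list)
    for cause, effect in edges:
        src, dst = host_of(cause), host_of(effect)
        if src == dst:
            local[src].append((cause, effect))
        else:
            local[src].append((cause, f"SEND({cause}) to {dst}"))
            local[dst].append((f"RECV({cause}) from {src}", effect))
    return dict(local)


# Edges read off the slide's example chain from C to A.
EDGES = [
    ("link(C, foo.com)", "route(C, foo.com)"),
    ("route(C, foo.com)", "route(B, foo.com)"),
    ("link(B, C)", "route(B, foo.com)"),
    ("route(B, foo.com)", "route(A, foo.com)"),
    ("link(A, B)", "route(A, foo.com)"),
]

parts = partition(EDGES)
for node in sorted(parts):
    print(node, parts[node])
```

This sketch covers only the partitioning; nothing here prevents a node from altering its part, which is what Step 2 addresses.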
Securing Cross-Node Edges
[Figure: a cross-node edge between Router A and Router B, split into SEND and RECV vertices and secured by a signed commitment from B and a signed ACK from A.]
Step 1: Each node keeps vertices about local actions
Split cross-node communications
Step 2: Make the graph tamper-evident
Secure cross-node edges (evidence of omissions)
Outline
Goal: A secure forensics system that works in an
adversarial environment
Explains unexpected behavior
No faults: explanation is complete and accurate
Byzantine fault: exposes at least one faulty node with evidence
Model: Secure Network Provenance
Tamper-evident Maintenance and Processing
Evaluation
Conclusion
System Overview
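[Figure: system architecture. Users interact with the primary system (the application, running over the network); the operator interacts with the provenance system (maintenance engine and query engine). Three steps are labeled: extract provenance, maintain provenance, query provenance.]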
Stand-alone provenance system
On-demand provenance reconstruction
Provenance graph can be huge (with temporal dimension)
Rebuild only the parts needed to answer a query
Extracting Dependencies
Option 1: Inferred provenance
Declarative specifications explicitly capture provenance
E.g. Declarative networking, SQL queries, etc.
Option 2: Reported provenance
Modified source code reports provenance
Option 3: External specification
Defined on observed I/Os of a black-box system
Secure Provenance Maintenance
Maintain sufficient information for reconstruction
I/O and non-deterministic events are sufficient
Logs are maintained using tamper-evident logging
Based on ideas from PeerReview [SOSP 07]
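[Figure: the example network (Alice, routers A, B, C, D, E, foo.com) with per-node logs; the logs of A and B record SEND, RECV, ACK, and RCV-ACK entries for the message exchanged between them.]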
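
The flavor of tamper-evident logging can be sketched with a hash chain, where each entry's hash covers the previous entry's hash. This is a simplified stand-in, not PeerReview or SNooPy: in a real system the top hash would be covered by a signature held by other nodes, which is omitted here. All names (`append`, `verify`, `GENESIS`, the entry fields) are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the chain


def entry_hash(prev_hash, seq, kind, payload):
    """Hash an entry together with the hash of its predecessor."""
    body = json.dumps([prev_hash, seq, kind, payload]).encode("utf-8")
    return hashlib.sha256(body).hexdigest()


def append(log, kind, payload):
    """Append an entry and return the new top hash of the chain."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    seq = len(log)
    log.append({"seq": seq, "kind": kind, "payload": payload,
                "hash": entry_hash(prev_hash, seq, kind, payload)})
    return log[-1]["hash"]


def verify(log, expected_top_hash):
    """Recompute the chain and compare it with a previously recorded top hash."""
    prev_hash = GENESIS
    for seq, entry in enumerate(log):
        if entry["seq"] != seq:
            return False
        expected = entry_hash(prev_hash, seq, entry["kind"], entry["payload"])
        if entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return prev_hash == expected_top_hash


log = []
append(log, "RECV", "+route(B, foo.com) from B")
top = append(log, "SEND", "+route(A, foo.com) to Alice")
print(verify(log, top))          # the untouched log matches the recorded top hash

log[0]["payload"] = "+route(E, foo.com) from E"
print(verify(log, top))          # a modified entry no longer matches
```

Because every hash depends on all earlier entries, a node that has handed out its top hash cannot later rewrite or drop earlier entries without the mismatch being detectable.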
Secure Provenance Querying
Recursively construct the provenance graph
Retrieve secure logs from remote nodes
Check for tampering, omission, and equivocation
Replay the log to regenerate the provenance graph
[Figure: Alice asks, "Explain the route from A to foo.com." Node A's log yields route(A, foo.com), derived from link(A, B) and a RECV (from B).]
Secure Provenance Querying
Recursively construct the provenance graph
Retrieve secure logs from remote nodes
Check for tampering, omission, and equivocation
Replay the log to regenerate the provenance graph
[Figure: the query continues at B. The graph now contains route(A, foo.com), link(A, B), and route(B, foo.com), which is derived from link(B, C) and a RECV (from C).]
Secure Provenance Querying
Recursively construct the provenance graph
Retrieve secure logs from remote nodes
Check for tampering, omission, and equivocation
Replay the log to regenerate the provenance graph
[Figure: the query completes at C. The graph contains route(A, foo.com), link(A, B), route(B, foo.com), link(B, C), route(C, foo.com), and link(C, foo.com). Alice: "OK. Now I know how the route was derived."]
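
The recursive query walked through on the last three slides can be sketched as follows. The per-node logs, their entry format, and the `replay` and `query` functions are hypothetical; the tampering, omission, and equivocation checks are deliberately left out and marked by a comment, so this shows only the retrieve, replay, and recurse structure.

```python
# Hypothetical per-node logs. RECV entries name the sender of a tuple;
# DERIVE entries record which tuples a local tuple was derived from.
LOGS = {
    "A": [("RECV", "B", "route(B, foo.com)"),
          ("DERIVE", "route(A, foo.com)", ["link(A, B)", "route(B, foo.com)"])],
    "B": [("RECV", "C", "route(C, foo.com)"),
          ("DERIVE", "route(B, foo.com)", ["link(B, C)", "route(C, foo.com)"])],
    "C": [("DERIVE", "route(C, foo.com)", ["link(C, foo.com)"])],
}


def replay(log):
    """Replay a log to regenerate the node's local part of the graph."""
    derivations, received_from = {}, {}
    for entry in log:
        if entry[0] == "RECV":
            _, sender, tuple_name = entry
            received_from[tuple_name] = sender
        elif entry[0] == "DERIVE":
            _, head, body = entry
            derivations[head] = body
    return derivations, received_from


def query(node, tuple_name, fetch_log=LOGS.get):
    """Recursively build the explanation of tuple_name, starting at node."""
    log = fetch_log(node)
    if log is None:
        # A node that cannot produce its log is flagged rather than trusted.
        return {"tuple": tuple_name, "node": node,
                "status": "log unavailable", "causes": []}
    # A real system would check the log for tampering, omission, and
    # equivocation here, before replaying it. That step is omitted.
    derivations, received_from = replay(log)
    if tuple_name not in derivations:
        return {"tuple": tuple_name, "node": node, "status": "base", "causes": []}
    causes = []
    for cause in derivations[tuple_name]:
        owner = received_from.get(cause, node)  # follow RECV to the sender
        causes.append(query(owner, cause, fetch_log))
    return {"tuple": tuple_name, "node": node, "status": "derived", "causes": causes}


result = query("A", "route(A, foo.com)")
print(result["causes"][1]["node"])  # the node the query moved to next
```

Only the logs of nodes that actually appear in the explanation are fetched, which matches the on-demand reconstruction described in the system overview.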
Outline
Goal: A secure forensics system that works in an
adversarial environment
Explains unexpected behavior
No faults: explanation is complete and accurate
Byzantine fault: exposes at least one faulty node with evidence
Model: Secure Network Provenance
Tamper-evident Maintenance and Processing
Evaluation
Conclusion
Evaluation Results
Prototype implementation (SNooPy)
How useful is SNP? Is it applicable to different systems?
How expensive is SNP at runtime?
Traffic overhead, storage cost, additional CPU overhead?
Does SNP affect scalability?
What is the querying performance?
Per-query traffic overhead?
Turnaround time for each query?
Usability: Applications
We evaluated SNooPy with
Quagga BGP: RouteView (external specification)
Explains oscillation caused by router misconfiguration
Hadoop MapReduce: (reported provenance)
Detects a tampered Mapper that returns inaccurate results
Declarative Chord DHT: (inferred provenance)
Detects an Eclipse attacker that always returns its own ID
SNooPy solves problems reported in existing work
Runtime Overhead: Storage
[Figure: storage overhead per application. Callout: over 50% of the overhead was due to signatures and acks. Batching messages would help.]
Manageable storage overhead
One week of data: E.g. Quagga – 7.3GB; Chord – 665MB
Query Latency
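[Figure: query latency per application, broken down into Download, Verification, and Replay. Callouts: latency is largely due to replaying logs in one case and dominated by verifying logs and snapshots in another.]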
Query latency varies from application to application
Reasonable overhead
Summary
Secure network provenance in untrusted environments
Requires no trusted components
Strong guarantees even in the presence of Byzantine faults
Formal proof in a technical report
Significantly extends traditional provenance model
Past and transient state, provenance of change, …
Efficient storage: reconstructs provenance graph on demand
Application-independent (Quagga, Hadoop, and Chord)
Questions?
Project website: http://snp.cis.upenn.edu/