PRIVACY-PRESERVING COLLABORATIVE NETWORK ANOMALY DETECTION
Haakon Ringberg
Unwanted network traffic

Problem
• Attacks on resources (e.g., DDoS, malware)
• Lost productivity (e.g., instant messaging)
• Costs USD billions every year

Goal: detect & diagnose unwanted traffic
• Scale to large networks by analyzing summarized data
• Greater accuracy via collaboration
• Protect privacy using cryptography
Challenges with detection

• Data volume
  • Some commonly used algorithms analyze IP packet payload info
  • Infeasible at the edge of large networks
• Attacks deliberately mimic normal traffic
  • e.g., SQL injection, application-level DoS¹
• Is it a DDoS attack or a flash crowd?²
  • A single network in isolation may not be able to distinguish

¹[Srivatsa TWEB ’08], ²[Jung WWW ’02]
Collaborative anomaly detection

• “Bad guys tend to be around when bad stuff happens”
• “Fool us once, shame on you. Fool us, we can’t get fooled again!”²
• Targets (victims) could correlate attacks/attackers¹

¹[Katti IMC ’05], [Allman HotNets ’06], [Kannan SRUTI ’06], [Moore INFOCOM ’03]
²George W. Bush
Corporations demand privacy

• Corporations are reluctant to share sensitive data
  • Legal constraints
  • Competitive reasons
• “I don’t want FOX to know my customers”
Common practice

• Every network for itself: e.g., AT&T and Sprint each detect attacks in isolation
System architecture

• Snort-like detection system at each network
  • Greater scalability
  • Provided as a service
• Collaboration infrastructure
  • For greater accuracy
  • Protects privacy
• N.B. collaboration could also be performed between stub networks
Dissertation Overview

• Detection at a single network: a scalable Snort-like IDS system
  (machine learning; presented at IEEE Infocom ’09)
• Collaboration effectiveness: quantifying the benefits of collaboration
  (analysis of measurements; to be submitted)
• Collaboration infrastructure: privacy of participants and suspects
  (cryptography; submitted to ACM CCS ’09)
Chapter I: scalable signature-based detection at individual networks

Work with AT&T Labs:
• Nick Duffield
• Patrick Haffner
• Balachander Krishnamurthy
Background: packet & rule IDSes

[Figure: enterprise packet layout: IP header | TCP header | app header | payload]

• Intrusion Detection Systems (IDSes)
  • Protect the edge of a network
  • Leverage known signatures of traffic
  • e.g., Slammer worm packets contain “MS-SQL” (say) in the payload,
    or AOL IM packets use specific TCP ports and application headers
Background: packet and rule IDSes

• A predicate is a boolean function on a packet feature
  • e.g., TCP port = 80
• A signature (or rule) is a set of predicates
  • e.g., packet has port number X, Y, or Z; contains pattern “foo” within
    the first 20 bytes; contains pattern “bar” within the last 40 bytes

Benefits
• Leverage existing community
  • Many rules already exist
  • CERT, SANS Institute, etc.
• Classification “for free”
• Accurate (?)

Drawbacks
• Too many packets per second
• Packet inspection at the edge requires deployment at many interfaces
• DPI (deep-packet inspection) predicates can be computationally expensive
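As a sketch of the model above, a predicate can be represented as a boolean function on a packet feature and a rule as a conjunction of predicates. A minimal Python illustration (the field names and example port are ours, not Snort syntax):

```python
# Minimal sketch of the predicate/rule model described above.
# Field names ("dst_port", "payload") are illustrative, not Snort syntax.

def port_is(port):
    return lambda pkt: pkt["dst_port"] == port

def payload_contains(pattern):
    return lambda pkt: pattern in pkt["payload"]

def rule(*predicates):
    """A signature (rule) is a conjunction of predicates."""
    return lambda pkt: all(p(pkt) for p in predicates)

# e.g., a Slammer-like rule: dst port 1434 and "Slammer" in the payload
slammer = rule(port_is(1434), payload_contains(b"Slammer"))

pkt = {"dst_port": 1434, "payload": b"...Slammer..."}
print(slammer(pkt))  # True
```

Payload predicates like `payload_contains` are exactly the expensive DPI checks the flow-based approach tries to avoid.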
Our idea: IDS on IP flows

How well can signature-based IDSes be mimicked on IP flows?
• Efficient
  • Only fixed-offset predicates
  • Flows are more compact
• Flow collection infrastructure is ubiquitous
• IP flows capture the concept of a connection

Example flow record:
src IP | dst IP | src port | dst port | duration | # packets
A      | B      | …        | …        | 5 min    | 36
Idea

1. IDSes associate a “label” with every packet
2. An IP flow is associated with a set of packets
3. Our system associates the labels with flows
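A toy sketch of this packet-to-flow label propagation, under the assumption that a flow is keyed by its 5-tuple and labeled +1 if any of its packets raised an alarm (field names are illustrative):

```python
# Sketch of step 3 above: propagate per-packet IDS labels to IP flows.
# A flow is keyed by the usual 5-tuple; a flow is labeled +1 (alarm) if
# any of its packets triggered an alarm, -1 (benign) otherwise.

def flow_key(pkt):
    return (pkt["src_ip"], pkt["dst_ip"],
            pkt["src_port"], pkt["dst_port"], pkt["proto"])

def label_flows(packets, alarmed):
    """alarmed: set of packet ids on which the IDS raised an alarm."""
    labels = {}
    for pkt in packets:
        k = flow_key(pkt)
        raised = pkt["id"] in alarmed
        labels[k] = +1 if (raised or labels.get(k) == +1) else -1
    return labels

pkts = [
    {"id": 1, "src_ip": "A", "dst_ip": "B", "src_port": 1234,
     "dst_port": 1434, "proto": 17},
    {"id": 2, "src_ip": "A", "dst_ip": "B", "src_port": 1234,
     "dst_port": 1434, "proto": 17},
]
print(label_flows(pkts, alarmed={2}))  # the single flow is labeled +1
```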
Snort rule taxonomy

• Header-only: inspect only the IP flow header
  • e.g., port numbers
• Meta-information: inexact correspondence
  • e.g., TCP flags
• Payload-dependent: inspect packet payload
  • e.g., “contains abc”
  • Relies on features that cannot be exactly reproduced in the IP flow realm
Simple translation

• Simple rule translation would capture only the flow predicates
  • Low accuracy or low applicability
• Snort rule (Slammer worm): dst port = MS SQL, contains “Slammer”
• Only flow predicates: dst port = MS SQL
Machine Learning (ML)

• Leverage ML to learn a mapping from “IP flow space” to label
  • e.g., IP flow space = src port × # packets × flow duration
  • label = +1 if a Snort alarm is raised on the flow’s packets, −1 otherwise
Boosting

• Boosting combines a set of weak learners h1, h2, h3, … to create a strong
  learner: H_final(x) = sign(Σ_t α_t h_t(x))
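A minimal AdaBoost-style sketch of this idea on toy 1-D data (not the thesis implementation; threshold “stumps” stand in for the weak learners):

```python
# AdaBoost sketch: weak threshold "stumps" h_t are combined into
# H_final(x) = sign(sum_t alpha_t * h_t(x)). Toy 1-D data only.
import math

def stump(thresh, sign_):
    return lambda x: sign_ if x > thresh else -sign_

def adaboost(xs, ys, candidates, rounds=5):
    n = len(xs)
    w = [1.0 / n] * n                     # example weights
    ensemble = []                         # list of (alpha, weak learner)
    for _ in range(rounds):
        # pick the weak learner with the lowest weighted error
        h, err = min(((h, sum(wi for wi, x, y in zip(w, xs, ys)
                              if h(x) != y)) for h in candidates),
                     key=lambda he: he[1])
        err = max(err, 1e-10)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # re-weight: emphasize misclassified examples
        w = [wi * math.exp(-alpha * y * h(x))
             for wi, x, y in zip(w, xs, ys)]
        z = sum(w)
        w = [wi / z for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) > 0 else -1

xs = [1, 2, 3, 4, 5, 6]
ys = [-1, -1, -1, 1, 1, 1]
candidates = [stump(t + 0.5, s) for t in range(6) for s in (1, -1)]
H = adaboost(xs, ys, candidates)
print([H(x) for x in xs])  # reproduces ys
```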
Benefit of Machine Learning (ML)

• Snort rule (Slammer worm): dst port = MS SQL, contains “Slammer”
• Only flow predicates: dst port = MS SQL
• ML-generated rule: dst port = MS SQL, packet size = 404, flow duration
• ML algorithms discover new predicates to capture the rule
  • Latent correlations between predicates
  • Capturing the same subspace using different dimensions
Evaluation

• Border router on an OC-3 link
• Used the Snort rules in place
• Unsampled NetFlow v5 and packet traces
• Statistics
  • One month, 2 MB/s average, 1 billion flows
  • 400k Snort alarms
Accuracy metrics

• Receiver Operating Characteristic (ROC)
  • Full FP vs TP tradeoff
• But we need a single number
  • Area Under Curve (AUC)
  • Average Precision (AP)
  • An AP of p corresponds to (1−p)/p FP per TP
Classifier accuracy

Rule class       | Week 1→2 | Week 1→3 | Week 1→4
Header rules     | 1.00     | 0.99     | 0.99
Meta-information | 1.00     | 1.00     | 0.95
Payload          | 0.70     | 0.71     | 0.70

• Training on week 1, testing on week n
• Minimal drift within a month
• High degree of accuracy for header and meta rules (≈5 FP per 100 TP);
  payload rules incur ≈43 FP per 100 TP
Variance within payload group

Rule                    | Average Precision
MS-SQL version overflow | 1.00
ICMP PING speedera      | 0.82
NON-RFC HTTP DELIM      | 0.48

• Accuracy is a function of the correlation between flow and packet-level features
Computational efficiency

Our prototype can support OC-48 (2.5 Gbps) speeds:
1. Machine learning (boosting)
   • 33 hours per rule for one week of OC-48 traffic
2. Classification of flows
   • 57k flows/sec on a 1.5 GHz Itanium 2
   • Line-rate classification for OC-48
29
Chapter II: Evaluating the effectiveness
of collaborative anomaly detection
Work with:
• Matthew Caesar
• Jennifer Rexford
• Augustin Soule
Methodology

1. Identify attacks in IP flow traces
2. Extract attackers
3. Correlate attackers across victims
Identifying anomalous events

• Use existing anomaly detectors¹
  • IP scans, port scans, DoS
  • e.g., an IP scan is more than n IP addresses contacted
• Minimize false positives
  • Correlate with a DNS blacklist (DNSBL)
  • IP addresses exhibiting open proxy or spambot behavior

¹[Allman IMC ’07], [Kompella IMC ’04]
Cooperative blocking

• A set S of victims agree to participate
• Beasty is blocked following the initial attack
• Subsequent attacks by Beasty on members of S are deemed ineffective
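A toy in-memory sketch of cooperative blocking with a Δ-hour expiry (the class name and time representation are ours):

```python
# Sketch of cooperative blocking: once one member of the victim set S
# reports an attacker, all members block that source for delta hours.
# Times are plain floats (hours) for illustration.

class SharedBlocklist:
    def __init__(self, delta_hours):
        self.delta = delta_hours
        self.blocked_until = {}            # attacker ip -> expiry time

    def report(self, attacker_ip, now):
        # block for a period shorter than most DHCP leases (see below)
        self.blocked_until[attacker_ip] = now + self.delta

    def is_blocked(self, ip, now):
        return self.blocked_until.get(ip, float("-inf")) > now

bl = SharedBlocklist(delta_hours=1)
bl.report("10.0.0.1", now=0.0)              # CNN sees the first attack
print(bl.is_blocked("10.0.0.1", now=0.5))   # True: FOX is protected
print(bl.is_blocked("10.0.0.1", now=2.0))   # False: entry has expired
```

The short expiry is what keeps the scheme safe against DHCP churn, as the next slide discusses.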
DHCP lease issues

• Dynamic address allocation
  • IP address first owned by Beasty
  • Then owned by innocent Tweety
  • Should not block Tweety’s innocuous queries
• Mitigation
  • Update the DNSBL hourly
  • Block IP addresses for a period shorter than most DHCP leases¹

¹[Xie SIGCOMM ’07]
Methodology

• IP flow traces from Géant
• DNSBL to limit false positives
• Cooperative blocking of attackers for Δ hours
• Metric is the fraction of potentially mitigated flows
Blacklist duration parameter Δ

• Collaboration between all hosts
• The majority of the benefit can be had with a small Δ
Number of participating victims

• Randomly selecting n victims to collaborate in the scheme
  • Reported numbers are the average of 10 random selections
• Collaboration between the most victimized hosts
  • Attackers are more likely to continue to engage in bad action “x” than a
    random other action
Chapter conclusion

• Repeat attacks often occur within one hour
  • Substantially less than the average DHCP lease
• Collaboration can be effective
  • Attackers contact a large number of victims
  • 10k random hosts could mitigate 50% of attack flows
• Some hosts are much more likely victims
  • Subsets of victims can see great improvement
Chapter III: Privacy-preserving collaborative anomaly detection

Work with:
• Benny Applebaum
• Matthew Caesar
• Michael J. Freedman
• Jennifer Rexford
Privacy-Preserving Collaboration

• Participants (e.g., CNN, FOX, Google) submit encrypted suspects E(·) for
  secure correlation
• Protect privacy of
  • Participants: do not reveal who suspected whom
  • Suspects: only reveal suspects upon correlation
System sketch

• A trusted third party is a single point of failure
  • Single rogue employee
  • Inadvertent data leakage
  • Risk of subpoena
• A fully distributed design is impractical
  • Poor scalability
  • Liveness issues
Split trust

• Proxy and DB managed by separate organizational entities
• Honest-but-curious proxy, DB, and participants (clients)
• Secure as long as the proxy and DB do not collude
Protocol outline

1. Clients send suspect IP addrs (x) to the proxy (e.g., x = 127.0.0.1)
2. IP addrs are blinded as EDB(Fs(x)), an encrypted keyed hash of the IP
   • Fs is a keyed hash function (PRF); key s is held only by the proxy
   • EDB(Fs(x)) is learned through secure function evaluation
3. The DB keeps a count per blinded suspect (e.g., Fs(1) → 23, Fs(3) → 2)
   and releases the IPs above a threshold
4. Possible to reveal the IP addresses at the end

Why the blinding is needed:
• Sending x in the clear violates suspect privacy
• Sending H(x), a plain hash of the IP address, still violates suspect privacy
• Sending Fs(x), the keyed hash alone, still violates suspect privacy
• Hence clients submit EDB(Fs(x)); the remaining question is how clients
  learn EDB(Fs(x)) in the first place
Protocol summary

• Clients send suspect IPs
  • A client learns EDB(Fs(x)) using secure function evaluation
• Proxy forwards EDB(Fs(3)) to the DB
  • Randomly shuffles suspects
  • Re-randomizes encryptions
• DB correlates using Fs(x) and forwards bad IPs to the proxy
• Proxy decrypts: Ds(Fs(3)) = 3
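A toy end-to-end sketch of the correlation step, with HMAC standing in for the PRF Fs (in the real protocol the client learns EDB(Fs(x)) obliviously and the proxy shuffles; here the blinding is computed directly for illustration):

```python
# Toy of the correlation step: clients submit keyed hashes Fs(x) of
# suspect IPs; the DB counts blinded values and releases those above a
# threshold; the proxy maps them back. HMAC stands in for the PRF.
import hashlib
import hmac
from collections import Counter

s = b"proxy-only-prf-key"                    # held only by the proxy
def F(x):                                    # PRF stand-in
    return hmac.new(s, x.encode(), hashlib.sha256).hexdigest()

reports = {                                  # who suspected whom
    "CNN": ["10.0.0.1", "10.0.0.2"],
    "FOX": ["10.0.0.1"],
    "MSNBC": ["10.0.0.1", "10.0.0.3"],
}

# The DB sees only blinded values (and, after the proxy's shuffle,
# not who submitted them)
counts = Counter(F(ip) for ips in reports.values() for ip in ips)
released = [v for v, n in counts.items() if n >= 2]

# The proxy can invert its own PRF table to reveal the agreed-on IPs
table = {F(ip): ip for ips in reports.values() for ip in ips}
print([table[v] for v in released])  # ['10.0.0.1']
```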
Architecture

• Components: clients, client-facing proxies, proxy decryption oracles,
  front-end DB tier, back-end DB storage
• The proxy is split into client-facing proxies and decryption oracles
• Proxies and DB are fully parallelizable
Evaluation

• All components implemented
  • ~5000 lines of C++
  • Utilizing GnuPG, BSD TCP sockets, and Pthreads
• Evaluated on a custom test bed
  • ~2 GHz (single, dual, quad-core) Linux machines

Algorithm          | Parameter | Value
RSA / ElGamal      | key size  | 1024 bits
Oblivious Transfer | k         | 80
AES                | key size  | 256
Scalability w.r.t. # IPs
• Single CPU core for DB and proxy each

Scalability w.r.t. # clients
• Four CPU cores for DB and proxy each

Scalability w.r.t. # CPU cores
• n CPU cores for DB and proxy each
Summary

• Collaboration protocol protects the privacy of
  • Participants: do not reveal who suspected whom
  • Suspects: only reveal suspects upon agreement
• Novel composition of crypto primitives
  • A one-way function hides IPs from the DB; public-key encryption allows
    subsequent revelation; secure function evaluation lets clients compute
    the blinded values
• Efficient implementation of the architecture
  • Millions of IPs in hours
  • Scales linearly with computing resources
Conclusion

1. Speed
   • ML-based architecture supports accurate and scalable Snort-like
     classification on IP flows
2. Accuracy
   • Collaborating against mutual adversaries
3. Privacy
   • Novel cryptographic protocol supports efficient collaboration in a
     privacy-preserving manner
Future Work Highlights

1. ML-based Snort-like architecture
   • Cross-site: train on site A and test on site B
   • Performance on sampled flow records
2. Measurement study
   • Biased correlation results due to a biased DNSBL (ongoing)
   • Rate at which information must be exchanged
   • Who should cooperate: end-points or ISPs?
3. Privacy-preserving collaboration
   • Other applications, e.g., Viacom-vs-YouTube concerns
THANK YOU!

Collaborators: Jennifer Rexford, Benny Applebaum, Matthew Caesar,
Nick Duffield, Michael J. Freedman, Patrick Haffner,
Balachander Krishnamurthy, and Augustin Soule
Difference in rule accuracy

Rule                    | Overall accuracy | w/o dst port | w/o mean packet size
MS-SQL version overflow | 1.00             | 0.99         | 0.83
ICMP PING speedera      | 0.82             | 0.79         | 0.06
NON-RFC HTTP DELIM      | 0.48             | 0.02         | 0.22

• Accuracy is a function of the correlation between flow and packet-level features
Choosing an operating point

• X = alarms we want raised
• Z = alarms that are raised
• Y = X ∩ Z
• Precision = Y/Z (exactness)
• Recall = Y/X (completeness)
• AP is a single number, but not the most intuitive
  • Precision & recall are useful for operators
  • “I need to detect 99% of these alarms!”
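These definitions as code (X, Y, Z as above):

```python
# The operating-point metrics above: X = alarms we want raised,
# Z = alarms actually raised, Y = X intersect Z.
def precision_recall(wanted, raised):
    X, Z = set(wanted), set(raised)
    Y = X & Z
    precision = len(Y) / len(Z)   # exactness: how many raised are right
    recall = len(Y) / len(X)      # completeness: how many wanted we got
    return precision, recall

p, r = precision_recall(wanted={1, 2, 3, 4}, raised={3, 4, 5})
print(p, r)  # 2/3 and 0.5
```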
Choosing an operating point

Rule                     | Precision w/recall=1.00 | Precision w/recall=0.99
MS-SQL version overflow  | 1.00                    | 1.00
ICMP PING speedera       | 0.02                    | 0.83
CHAT AIM receive message | 0.02                    | 0.11

• AP is a single number, but not the most intuitive
  • Precision & recall are useful for operators
  • “I need to detect 99% of these alarms!”
Quantifying the benefit of collaboration

Effectiveness of collaboration is a function of
1. Whether different victims see the same attackers
2. Whether all victims are equally likely to be targeted
IP address blinding

• The DB requires an injective and one-way function on IPs
  • Cannot use a simple hash
• Fs(x) is a keyed hash function (PRF) on IPs
  • Key s is held only by the proxy
Secure Function Evaluation

• IP address blinding can be split into a per-IP-bit xi problem
  • Client must learn EDB(Fs(xi))
  • Client must not learn s
  • Proxy must not learn xi
• Oblivious Transfer (OT) accomplishes this¹,²
• Amortized OT makes asymptotic performance equal to matrix multiplication³

¹[Naor et al. SODA ’01], ²[Ishai et al. CRYPTO ’03], ³[Freedman et al. TCC ’05]
Public key encryption

• Clients encrypt suspect IPs (x): EDB(EPX(x))
  • First with the proxy’s pubkey
  • Then with the DB’s pubkey
• Forwarded by the proxy
  • Does not learn IPs
• Decrypted by the DB
  • Does not learn IPs
  • Does not allow DB correlation, due to padding (e.g., OAEP)
How client learns Fs(x)

• Client must learn Fs(x)
  • Client must not learn s
  • Proxy must not learn x
• Naor-Reingold PRF
  • s = { si | 1 ≤ i ≤ 32 }
  • Fs(x) = g^(∏_{xi=1} si)
• The proxy adds randomness ui to obscure si from the client
  • Per-bit message: ui * si

For each bit xi of the IP, the client learns
• ui * si, if xi is 1
• ui, if xi is 0
The client also learns ∏ ui

• The client multiplies all received values together:
  ∏_{xi=1}(ui * si) * ∏_{xi=0} ui = ∏ ui * ∏_{xi=1} si
• Divides out ∏ ui, leaving the exponent ∏_{xi=1} si
• Acquires Fs(x) without having learned s
• Remaining question: how does the client learn ui * si (if xi is 1) or
  ui (if xi is 0) without the proxy learning the IP x?
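A toy of the blinding arithmetic above with tiny, insecure parameters (real deployments work in a large group and apply the recovered product in the exponent of g):

```python
# Toy of the blinding arithmetic: for each IP bit x_i the client obtains
# u_i*s_i (bit 1) or u_i (bit 0), multiplies everything, divides out
# prod(u_i), and recovers prod_{x_i=1} s_i, the exponent of the
# Naor-Reingold PRF, without learning any individual s_i. NOT secure.
import random

p = 2**61 - 1                    # a prime; toy-sized modulus
bits = [0, 1, 1, 0]              # toy 4-bit "IP address" x
s = [random.randrange(2, p) for _ in bits]   # proxy's secret key
u = [random.randrange(2, p) for _ in bits]   # proxy's blinding values

# what the client learns per bit (delivered via oblivious transfer)
msgs = [(u_i * s_i) % p if b else u_i for b, s_i, u_i in zip(bits, s, u)]

prod_msgs = 1
for m in msgs:
    prod_msgs = (prod_msgs * m) % p
prod_u = 1
for u_i in u:
    prod_u = (prod_u * u_i) % p

exponent = (prod_msgs * pow(prod_u, -1, p)) % p   # divide out prod(u_i)

expected = 1
for b, s_i in zip(bits, s):
    if b:
        expected = (expected * s_i) % p
print(exponent == expected)  # True: client recovered prod_{x_i=1} s_i
```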
Oblivious Transfer (details)

Public: f(·), g(·). Client holds choice bit x; proxy holds secret s.

1. Client sends f(0) and f(1)
   • Proxy doesn’t learn x
   • Client can calculate g(f(x)) but cannot calculate g(f(1−x))
2. Proxy sends
   • v(0) = E_{g(f(0))}(1 + r)
   • v(1) = E_{g(f(1))}(s + r)
3. Client decrypts v(x) with g(f(x))
Oblivious Transfer (more details)

Preprocessing:
• Proxy chooses random c and r (at startup)
• Proxy publishes c and g^r
• Client chooses random k (for each bit)

Client:
1. Key_x = g^k and Key_{1−x} = c * g^{−k}; sends Key_0
2. Key_x^r = (g^r)^k, used to decrypt y_x

Proxy:
1. Computes Key_0^r, and Key_1^r = c^r / Key_0^r
2. Sends y_0 = AES_{Key_0^r}(u) and y_1 = AES_{Key_1^r}(s * u)

• The proxy never learns x
• The client can calculate Key_x^r = (g^r)^k easily, but cannot calculate
  c^r (due to lack of r), which is needed for Key_{1−x}^r = c^r * (g^r)^{−k}
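The OT steps above can be checked with a toy group (small parameters, an XOR pad derived from the key in place of AES; not secure):

```python
# Toy of the OT steps above: the proxy holds m0, m1; the client learns
# m_x without revealing x, and cannot form Key_{1-x}^r because it lacks
# r (and hence c^r). Small multiplicative group mod a prime; NOT secure.
import hashlib

p = 0xFFFFFFFFFFFFFFC5          # 2**64 - 59, prime (toy group)
g = 5

def pad(key_int, msg):           # stand-in for AES: XOR with H(key)
    h = hashlib.sha256(str(key_int).encode()).digest()
    return bytes(a ^ b for a, b in zip(msg, h))

# Preprocessing: proxy picks random c and r, publishes c and g^r
c, r = 1234567, 7654321
gr = pow(g, r, p)

# Client, choice bit x, random k: Key_x = g^k, Key_{1-x} = c * g^{-k}
x, k = 1, 99991
key_x = pow(g, k, p)
key_other = (c * pow(g, -k, p)) % p
key0 = key_other if x == 1 else key_x     # client sends only Key_0

# Proxy: Key_0^r and Key_1^r = c^r / Key_0^r; encrypts m_b under Key_b^r
m0, m1 = b"u-value ", b"s*u-val "
key0r = pow(key0, r, p)
key1r = (pow(c, r, p) * pow(key0r, -1, p)) % p
y0, y1 = pad(key0r, m0), pad(key1r, m1)

# Client decrypts y_x with (g^r)^k
recovered = pad(pow(gr, k, p), y1 if x == 1 else y0)
print(recovered)  # b's*u-val ' since x = 1
```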
Other usage scenarios

1. Cross-checking certificates
   • e.g., Perspectives¹
   • Clients = end users
   • Keys = hash of certificates received
2. Distributed ranking
   • e.g., Alexa Toolbar²
   • Clients = Web users
   • Keys = hash of web pages

¹[Wendlandt USENIX ’08], ²[www.alexa.com]