PRIVACY-PRESERVING COLLABORATIVE NETWORK ANOMALY DETECTION
Haakon Ringberg

Problem: unwanted network traffic
• Attacks on resources (e.g., DDoS, malware)
• Lost productivity (e.g., instant messaging)
• Costs USD billions every year
Goal: detect & diagnose unwanted traffic
• Scale to large networks by analyzing summarized data
• Greater accuracy via collaboration
• Protect privacy using cryptography

Challenges with detection
• Data volume: some commonly used algorithms analyze IP packet payloads, which is infeasible at the edge of large networks
• Attacks deliberately mimic normal traffic, e.g., SQL injection and application-level DoS1
• Is it a DDoS attack or a flash crowd?2 A single network in isolation may not be able to distinguish the two
1[Srivatsa TWEB ’08], 2[Jung WWW ’02]

Collaborative anomaly detection
• “Bad guys tend to be around when bad stuff happens”
• “Fool us once, shame on you. Fool us … we can’t get fooled again!”2
• Targets (victims) could correlate attacks and attackers1
1[Katti IMC ’05], [Allman HotNets ’06], [Kannan SRUTI ’06], [Moore INFOCOM ’03]
2George W. Bush

Corporations demand privacy
• Corporations are reluctant to share sensitive data (“I don’t want FOX to know my customers”)
• Legal constraints
• Competitive reasons

Common practice
• Every network for themselves! (e.g., AT&T and Sprint each detect in isolation)
System architecture
• Snort-like detection system: greater scalability, provided as a service
• Collaboration infrastructure: for greater accuracy; protects privacy
• N.B. collaboration could also be performed between stub networks

Dissertation overview

Chapter                       | Providing                             | Technologies             | Venue
Detection at a single network | Scalable Snort-like IDS system        | Machine learning         | To be submitted
Collaboration effectiveness   | Quantifying benefits of collaboration | Analysis of measurements | Submitted to ACM CCS ’09
Collaboration infrastructure  | Privacy of participants and suspects  | Cryptography             | Presented at IEEE INFOCOM ’09

Chapter I: scalable signature-based detection at individual networks
Work with AT&T Labs: Nick Duffield, Patrick Haffner, Balachander Krishnamurthy

Background: packet & rule IDSes
• A packet consists of an IP header, TCP header, application header, and payload
• Intrusion Detection Systems (IDSes) protect the edge of a network
• They leverage known signatures of traffic: e.g., Slammer worm packets contain “MS-SQL” (say) in the payload, or AOL IM packets use specific TCP ports and application headers

Background: packet and rule IDSes
• A predicate is a boolean function on a packet feature, e.g., TCP port = 80
• A signature (or rule) is a set of predicates
• Benefits: leverages the existing community (many rules already exist: CERT, SANS Institute, etc.); classification “for free”; accurate (?)
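The predicate/signature model above can be sketched in a few lines of code. This is an illustrative sketch only: the field names and the example rule are hypothetical, not actual Snort syntax (though Slammer did target the MS-SQL resolution service on port 1434).

```python
# Sketch of the predicate/signature model: a predicate is a boolean
# function on one packet feature; a rule is a conjunction of predicates.
# Field names and the example rule are illustrative, not Snort syntax.

def port_is(p):
    # Header predicate: match on the destination port.
    return lambda pkt: pkt["dst_port"] == p

def payload_contains(pattern):
    # DPI-style predicate: substring match on the payload bytes.
    return lambda pkt: pattern in pkt["payload"]

def rule(*predicates):
    # A signature (rule) fires only if every predicate holds.
    return lambda pkt: all(pred(pkt) for pred in predicates)

# Hypothetical Slammer-like rule: MS-SQL port AND a payload pattern.
slammer = rule(port_is(1434), payload_contains(b"Slammer"))

pkt = {"dst_port": 1434, "payload": b"...Slammer..."}
print(slammer(pkt))  # True
```

The conjunction structure is what makes the later flow-level translation lossy: header predicates (`port_is`) survive on flow records, while payload predicates (`payload_contains`) do not.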
Background: packet and rule IDSes
• A predicate is a boolean function on a packet feature, e.g., TCP port = 80
• A signature (or rule) is a set of predicates
• Example rule: the packet has port number X, Y, or Z; contains pattern “foo” within the first 20 bytes; contains pattern “bar” within the last 40 bytes
• Drawbacks: too many packets per second; packet inspection at the edge requires deployment at many interfaces; DPI (deep-packet inspection) predicates can be computationally expensive

Our idea: IDS on IP flows
• How well can signature-based IDSes be mimicked on IP flows?
• Efficient: only fixed-offset predicates; flows are more compact
• Flow collection infrastructure is ubiquitous
• IP flows capture the concept of a connection (a flow record lists src IP, dst IP, src port, dst port, duration, and # packets)

Idea
1. IDSes associate a “label” with every packet
2. An IP flow is associated with a set of packets
3. Our system associates the labels with flows

Snort rule taxonomy
• Header-only: inspects only what the IP flow header carries, e.g., port numbers
• Meta-information: inexact correspondence, e.g., TCP flags
• Payload-dependent: inspects the packet payload, e.g., “contains abc”; relies on features that cannot be exactly reproduced in the IP flow realm

Simple translation
• Simple rule translation would capture only the flow predicates, yielding low accuracy or low applicability
• Snort rule for the Slammer worm: dst port = MS SQL AND contains “Slammer”; only flow predicates: dst port = MS SQL

Machine Learning (ML)
Our system associates the labels with flows
• Leverage ML to learn a mapping from “IP flow space” to the label
• e.g., IP flow space = src port × # packets × flow duration; label = +1 if the alarm is raised, -1 otherwise

Boosting
• Boosting combines a set of weak learners (h1, h2, h3, …) to create a strong learner
• Hfinal is the sign of a weighted sum of the weak learners’ outputs

Benefit of Machine Learning (ML)
• Snort rule for the Slammer worm: dst port = MS SQL AND contains “Slammer”
• Only flow predicates: dst port = MS SQL
• ML-generated rule: dst port = MS SQL, packet size = 404, flow duration
• ML algorithms discover new predicates to capture the rule: latent correlations between predicates; capturing the same subspace using different dimensions

Evaluation
• Border router on an OC-3 link
• Used the Snort rules in place
• Unsampled NetFlow v5 and packet traces
• Statistics: one month, 2 MB/s average, 1 billion flows, 400k Snort alarms

Accuracy metrics
• Receiver Operating Characteristic (ROC): the full FP vs. TP tradeoff
• But we need a single number: Area Under Curve (AUC), Average Precision (AP)
• An AP of p corresponds to (1-p)/p FP per TP (e.g., AP 0.95 ≈ 5 FP per 100 TP; AP 0.70 ≈ 43 FP per 100 TP)

Classifier accuracy

Rule class       | Week 1→2 | Week 1→3 | Week 1→4
Header rules     | 1.00     | 0.99     | 0.99
Meta-information | 1.00     | 1.00     | 0.95
Payload          | 0.70     | 0.71     | 0.70

• Training on week 1, testing on week n
• Minimal drift within a month
• High degree of accuracy for header and meta-information rules

Variance within the payload group

Rule                    | Average Precision
MS-SQL version overflow | 1.00
ICMP PING speedera      | 0.82
NON-RFC HTTP DELIM      | 0.48

• Accuracy is a function of the correlation between flow-level and packet-level features

Computational efficiency
Our prototype can support OC-48 (2.5 Gbps) speeds:
1. Machine learning (boosting): 33 hours per rule for one week of OC-48
2. Classification of flows: 57k flows/sec on a 1.5 GHz Itanium 2, i.e., line-rate classification for OC-48

Chapter II: evaluating the effectiveness of collaborative anomaly detection
Work with: Matthew Caesar, Jennifer Rexford, Augustin Soule

Methodology
1. Identify attacks in IP flow traces
2. Extract attackers
3. Correlate attackers across victims

Identifying anomalous events
• Use existing anomaly detectors1: IP scans, port scans, DoS; e.g., an IP scan is more than n IP addresses contacted
• Minimize false positives: correlate with a DNS blacklist (DNSBL); IP addresses exhibiting open-proxy or spambot behavior
1[Allman IMC ’07], [Kompella IMC ’04]

Cooperative blocking
• A set S of victims agree to participate
• An attacker (“Beasty”) is blocked following the initial attack
• Subsequent attacks by Beasty on members of S are deemed ineffective

DHCP lease issues
• Dynamic address allocation: an IP address is first owned by Beasty, then by innocent Tweety
• We should not block Tweety’s innocuous queries
• Mitigation: update the DNSBL hourly, and block IP addresses for a period shorter than most DHCP leases1
1[Xie SIGCOMM ’07]

Methodology
• IP flow traces from Géant
• DNSBL to limit false positives
• Cooperative blocking of attackers for Δ hours
• Metric: fraction of potentially mitigated flows

Blacklist duration parameter Δ
• Collaboration between all hosts
• The majority of the benefit can be had with a small Δ

Number of participating victims
• Randomly selecting n victims to collaborate in the scheme; the reported number is the average of 10 random selections
• Collaboration between the most victimized hosts
• Attackers are more likely to continue to engage in a bad action “x” than to switch to a random other action

Chapter conclusion
• Repeat attacks often occur within one hour, substantially less than the average DHCP lease, so collaboration can be effective
• Attackers contact a large number of victims: 10k random hosts could mitigate 50% of attack flows
• Some hosts are much more likely victims: subsets of victims can see great improvement

Chapter III: privacy-preserving collaborative anomaly detection
Work with: Benny Applebaum, Matthew Caesar, Michael J. Freedman, Jennifer Rexford
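Before adding cryptography, it helps to pin down the plaintext computation the collaborators want: each victim reports suspect IPs, and an IP is flagged once enough distinct victims have reported it. A minimal sketch, where the victim names and the threshold are illustrative parameters, not values from the talk:

```python
# Plaintext sketch of the cross-victim correlation that the
# privacy-preserving protocol later computes obliviously.
from collections import defaultdict

def correlate(reports, threshold=2):
    """Return suspect IPs reported by at least `threshold` distinct victims."""
    victims_per_ip = defaultdict(set)
    for victim, ip in reports:
        victims_per_ip[ip].add(victim)
    return {ip for ip, vs in victims_per_ip.items() if len(vs) >= threshold}

# Hypothetical reports: two victims both see 10.0.0.1, only one sees 10.0.0.2.
reports = [("CNN", "10.0.0.1"), ("FOX", "10.0.0.1"), ("CNN", "10.0.0.2")]
print(correlate(reports))  # {'10.0.0.1'}
```

Counting distinct victims (rather than raw reports) is what makes the threshold meaningful: a single victim repeatedly reporting the same IP should not push it over the correlation threshold.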
Privacy-Preserving Collaboration
• Participants submit encrypted suspects, E(·), for secure correlation
• Protect the privacy of participants: do not reveal who suspected whom
• Protect the privacy of suspects: only reveal suspects upon correlation

System sketch
• A trusted third party is a single point of failure: a rogue employee, inadvertent data leakage, the risk of subpoena
• A fully distributed design is impractical: poor scalability, liveness issues

Split trust
• The proxy and DB are managed by separate organizational entities
• Honest-but-curious proxy, DB, and participants (clients)
• Secure as long as the proxy and DB do not collude

Protocol outline (goal: participant privacy and suspect privacy)
1. Clients send suspect IP addresses x (e.g., x = 127.0.0.1) via the proxy; the DB tallies how many clients reported each one
2. The DB releases the IPs above a threshold
• But this violates suspect privacy!
• Sending a hash H(x) of the IP address instead still violates suspect privacy: the IP space is small enough to invert H by brute force
• Blinding IPs with a keyed hash Fs(x), a PRF whose key s is held only by the proxy, still violates suspect privacy, since the proxy knows s
• Blinding IPs with an encrypted keyed hash EDB(Fs(x)) protects both parties, but how do clients learn EDB(Fs(x))?
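The step from H(x) to a keyed Fs(x) matters because the IPv4 address space is small enough to invert any unkeyed hash by dictionary attack. A sketch of that attack, using HMAC as a stand-in for the protocol’s actual PRF (the key value and the toy /24 candidate space are illustrative):

```python
# Why an unkeyed hash "still violates suspect privacy": the DB can
# brute-force H(x) over the IP space. A keyed hash (HMAC here, standing in
# for the protocol's PRF) resists this as long as the key s stays secret.
import hashlib
import hmac

def H(ip: str) -> str:
    # Unkeyed hash of an IP address.
    return hashlib.sha256(ip.encode()).hexdigest()

def F(s: bytes, ip: str) -> str:
    # Keyed hash (PRF stand-in); only the key holder can evaluate it.
    return hmac.new(s, ip.encode(), hashlib.sha256).hexdigest()

target = H("10.0.0.7")
# DB-side dictionary attack over a (tiny, illustrative) candidate space:
recovered = next(ip for ip in (f"10.0.0.{i}" for i in range(256))
                 if H(ip) == target)
print(recovered)  # 10.0.0.7 -- the unkeyed hash is inverted

# With F, the DB cannot mount this attack without the proxy's key s.
```

In the actual protocol the PRF is Naor-Reingold rather than HMAC, because the construction must also support oblivious evaluation by the client; the dictionary-attack argument is the same either way.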
Protocol outline (continued)
3. EDB(Fs(x)) is learned through secure function evaluation
4. The DB releases the IPs above the threshold; it is possible to reveal the IP addresses at the end

Protocol summary
• Clients send suspect IPs, learning the blinded value Fs(x) using secure function evaluation
• The proxy forwards EDB(Fs(x)) to the DB, randomly shuffling the suspects and re-randomizing the encryptions
• The DB correlates using Fs(x) and forwards the bad IPs (e.g., Fs(3)) to the proxy
• The proxy recovers the IPs: Ds(Fs(3)) = 3

Architecture
• Clients talk to client-facing proxies, which feed a front-end DB tier backed by back-end DB storage
• The proxy is split into client-facing proxies and decryption oracles
• The proxies and DB are fully parallelizable

Evaluation
• All components implemented: ~5000 lines of C++, using GnuPG, BSD TCP sockets, and Pthreads
• Evaluated on a custom test bed of ~2 GHz (single-, dual-, and quad-core) Linux machines

Algorithm          | Parameter | Value
RSA / ElGamal      | key size  | 1024 bits
Oblivious Transfer | k         | 80
AES                | key size  | 256 bits

Scalability
• w.r.t. # IPs: a single CPU core each for the DB and proxy
• w.r.t. # clients: four CPU cores each for the DB and proxy
• w.r.t. # CPU cores: n CPU cores each for the DB and proxy

Summary
• The collaboration protocol protects the privacy of participants (does not reveal who suspected whom) and suspects (reveals suspects only upon agreement)
• Novel composition of crypto primitives: a one-way function hides IPs from the DB; public-key encryption allows subsequent revelation; secure function evaluation lets clients compute the blinding
• Efficient implementation of the architecture: millions of IPs in hours; scales linearly with computing resources

Conclusion
1. Speed: an ML-based architecture supports accurate and scalable Snort-like classification on IP flows
2. Accuracy: collaborating against mutual adversaries
3. Privacy: a novel cryptographic protocol supports efficient collaboration in a privacy-preserving manner

Future work highlights
1. ML-based Snort-like architecture: cross-site evaluation (train on site A, test on site B); performance on sampled flow records
2. Measurement study: biased correlation results due to a biased DNSBL (ongoing); the rate at which information must be exchanged; who should cooperate, end-points or ISPs?
3. Privacy-preserving collaboration: other applications, e.g., Viacom-vs-YouTube concerns

THANK YOU!
Collaborators: Jennifer Rexford, Benny Applebaum, Matthew Caesar, Nick Duffield, Michael J. Freedman, Patrick Haffner, Balachander Krishnamurthy, and Augustin Soule

Difference in rule accuracy

Rule                    | Overall accuracy | w/o dst port | w/o mean packet size
MS-SQL version overflow | 1.00             | 0.99         | 0.83
ICMP PING speedera      | 0.82             | 0.79         | 0.06
NON-RFC HTTP DELIM      | 0.48             | 0.02         | 0.22

• Accuracy is a function of the correlation between flow-level and packet-level features

Choosing an operating point
• X = alarms we want raised; Z = alarms that are raised; Y = their intersection
• Precision = Y/Z (exactness); recall = Y/X (completeness)
• AP is a single number, but not the most intuitive one
• Precision & recall are useful for operators: “I need to detect 99% of these alarms!”

Rule                     | Precision w/recall=1.00 | Precision w/recall=0.99
MS-SQL version overflow  | 1.00                    | 1.00
ICMP PING speedera       | 0.02                    | 0.83
CHAT AIM receive message | 0.02                    | 0.11

Quantifying the benefit of collaboration
The effectiveness of collaboration is a function of:
1. Whether different victims see the same attackers
2. Whether all victims are equally likely to be targeted

IP address blinding
• The DB requires an injective, one-way function on IPs; it cannot use a simple hash
• Fs(x) is a keyed hash function (PRF) on IPs; the key s is held only by the proxy

Secure Function Evaluation
• IP address blinding can be split into a per-IP-bit (xi) problem
• The client must learn EDB(Fs(xi)); the client must not learn s; the proxy must not learn xi
• Oblivious Transfer (OT) accomplishes this1,2; amortized OT makes the asymptotic performance equal to matrix multiplication3
1[Naor et al. SODA ’01], 2[Ishai et al. CRYPTO ’03], 3[Freedman et al. TCC ’05]

Public key encryption
• Clients encrypt suspect IPs (x), first with the proxy’s public key and then with the DB’s: EDB(EPX(x))
• Forwarded by the proxy (which does not learn the IPs); decrypted by the DB (which does not learn the IPs either)
• But this does not allow for DB correlation, due to padding (e.g., OAEP)

How the client learns Fs(x)
• The client must learn Fs(x); the client must not learn s; the proxy must not learn x
• Naor-Reingold PRF: s = { si | 1 ≤ i ≤ 32 }, Fs(x) = g^(∏_{xi=1} si)
• Randomness ui is added to obscure si from the client: the message for bit i is ui · si
• For each bit xi of the IP, the client learns ui · si if xi = 1, and ui if xi = 0; the client also learns ∏ ui
• The client multiplies all received values together: ∏_{xi=1} (ui · si) · ∏_{xi=0} ui = ∏ ui · ∏_{xi=1} si
• Dividing out ∏ ui leaves ∏_{xi=1} si, i.e., the exponent of Fs(x), without the client having learned s
• But how does the client learn ui · si (if xi = 1) or ui (if xi = 0) without the proxy learning the IP x?

Oblivious Transfer (details)
Public: f(x) and g(x)
1. The client sends f(x=0) and f(x=1)
2. The proxy sends v(0) = E_{g(f(0))}(1 + r) and v(1) = E_{g(f(1))}(s + r)
3. The client decrypts v(x) with g(f(x))
• The client can calculate g(f(x)) but cannot calculate g(f(1-x)), so it learns only one value; the proxy never learns x

Oblivious Transfer (more details)
Preprocessing: the proxy chooses random c and r (at startup) and publishes c and g^r
1. The client chooses a random k (for each bit) and sets Key_x = g^k, Key_{1-x} = c · g^{-k}, sending the pair to the proxy
2. The proxy computes Key0^r and Key1^r = c^r / Key0^r, and replies with y0 = AES_{Key0^r}(u) and y1 = AES_{Key1^r}(s · u)
3. The client computes Key_x^r = (g^r)^k and uses it to decrypt y_x
• The proxy never learns x
• The client can calculate Key_x^r = (g^r)^k easily, but cannot calculate c^r (it lacks r), which is needed for Key_{1-x}^r = c^r · (g^r)^{-k}

Other usage scenarios
1. Cross-checking certificates, e.g., Perspectives1: clients = end users; keys = hash of the certificates received
2. Distributed ranking, e.g., the Alexa Toolbar2: clients = Web users; keys = hash of web pages
1[Wendlandt USENIX ’08], 2[www.alexa.com]
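The blinding arithmetic from the “How the client learns Fs(x)” slides can be checked numerically. This sketch models only the multiply-and-divide step (not the OT or the group exponentiation), works modulo a small illustrative prime, and uses an 8-bit toy IP; real deployments use 32-bit addresses and far larger groups.

```python
# Numeric sketch of the Naor-Reingold blinding: per bit x_i the client
# receives u_i*s_i (if x_i = 1) or u_i (if x_i = 0); multiplying everything
# and dividing out prod(u_i) leaves prod_{x_i=1} s_i, without revealing
# any individual s_i. All values are illustrative.
import random

P = 2_147_483_647  # illustrative prime modulus (2^31 - 1)

def blinded_messages(s_bits, u_bits, x_bits):
    # What the proxy would send for IP bits x: u_i*s_i if x_i == 1, else u_i.
    return [(u * s) % P if x else u for s, u, x in zip(s_bits, u_bits, x_bits)]

def recover_exponent(msgs, u_bits):
    prod = 1
    for m in msgs:
        prod = (prod * m) % P
    u_prod = 1
    for u in u_bits:
        u_prod = (u_prod * u) % P
    # Divide out prod(u_i) via the modular inverse (P is prime).
    return (prod * pow(u_prod, P - 2, P)) % P

random.seed(1)
s = [random.randrange(2, P) for _ in range(8)]  # proxy's secret values s_i
u = [random.randrange(2, P) for _ in range(8)]  # per-run blinding values u_i
x = [0, 1, 1, 0, 1, 0, 0, 1]                    # bits of a toy 8-bit "IP"

expected = 1
for s_i, x_i in zip(s, x):
    if x_i:
        expected = (expected * s_i) % P

assert recover_exponent(blinded_messages(s, u, x), u) == expected
print("client recovers the product of s_i over set bits without seeing any s_i")
```

The key observation is that the u_i blinding cancels exactly once the client divides by ∏ u_i, so the client ends up holding the PRF exponent while each individual s_i stays hidden inside a product.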