Shield & Friends Troubleshooting Networks

Helen J. Wang
Researcher, System and Networking Research Group
Microsoft Research

Shield: Vulnerability-Driven End-Host Firewall for Preventing Known Vulnerability Attacks

Search keywords: msr shield project

People involved
- MSR Redmond: John Dunagan, Dan Simon, Helen Wang
- MSR Asia: Chuanxiong Guo
- MSR Cambridge: Alf Zugenmaier
- Summer interns: Nikita Borisov (Berkeley, 2004), David Brumley (CMU, 2004), Pallavi Joshi (IIT, 2005), Charlie Reis (UW, 2005), Justin Ma (UCSD, 2005)

Software patching not an effective first-line defense
- Sasser, MSBlast, CodeRed, Slammer, Nimda, and Slapper all exploited known vulnerabilities whose patches had been released weeks or months earlier
- 90+% of attacks exploit known vulnerabilities [Arbaugh2002]
- People don't patch immediately

Why don't people patch?
- Disruption: service or machine reboot
- Unreliability: software patches are inherently hard to test
- Irreversibility: most patches are not designed to be easily reversible
- Unawareness

Firewall also not an effective first-line defense
- Traditional firewalls
  - Typically in the network
  - One-size-fits-all solution; lack application awareness; miss end-to-end encrypted traffic
  - Coarse-grained
  - High false positive rate
- Exploit-driven firewalls
  - Filter according to exploit (attack) signatures
  - Attack code obfuscation (e.g., polymorphism, metamorphism) can evade the firewall
  - Worms spread fast (in minutes or seconds!)
  - Real-time signature generation and distribution are difficult

Shields: End-Host Vulnerability-Driven Network Filters
- Goal: protect the time window between vulnerability disclosure and patch application
- Approach: characterize the vulnerability instead of its exploits, and use the vulnerability signature for end-host firewalling
- Shields combine the best features of
  - Patches: vulnerability-specific, code level, executable
  - Firewalls: exploit-specific, network level, data-driven
- Advantages of Shield:
  - Protection as good as patches (resilient to attack variations), unlike exploit-driven firewalls
  - Easier to test and deploy, and more reliable, than patches

Vulnerability vs. Exploit (1:M)
- Many exploits against a single vulnerability
  - E.g., many different strings can overrun a vulnerable buffer
- Vulnerability signatures are generated at vulnerability discovery time
  - E.g., sizeof (msg.buffer) > legalLimit
- Exploit signatures are generated at attack time
  - E.g., Snort signature for Slammer:
      alert udp $EXTERNAL_NET any -> $HOME_NET 1434 (msg:"MS-SQL Worm propagation attempt"; content:"|04|"; depth:1; content:"|81 F1 03 01 04 9B 81 F1 01|"; content:"sock"; content:"send";

Overview of Shield Usage
[Figure: a New Shield Policy is added to the Shield Policies of the End-Host Shield; Incoming or Outgoing Network Traffic passes through the shield and leaves as Shielded Traffic to Processes or Remote Hosts]
- Shield intercepts vulnerable application traffic above the transport layer
- Policy distribution is very much like the anti-virus signature model: automatic, non-disruptive, reversible

Vulnerability Modeling
[Figure: Protocol State Machine and Vulnerability State Machine over states S0 to S5 and V4; labels: Message, Exploit Event, Application Functionality in S2]
- Shield policy (vulnerability signature): vulnerability state machine + how to recognize and react to exploits in the vulnerable state
- Protocol analysis is the key for vulnerability-driven filtering

Shield Architecture: Goals
- Flexibility: support protocol analysis for any application-level protocol
- Safety: avoid repeating Ethereal's parser vulnerabilities
- Fidelity: protocol analysis consistent with that of the application
- DoS resilience: hold less state than the application does

Flexibility: separate mechanism from policy
- Mechanism: protocol analysis, i.e., reconstructing message and session semantics (e.g., parsing, state machine operations)
  - GAPA: generic application-level protocol analyzer
- Policy: a language that describes the protocol state machine, message formats, and specific vulnerabilities
  - GAPAL: GAPA language
  - Shield policy: a GAPAL script that blocks according to a vulnerability

Shield Architecture
[Figure: the Policy Loader takes New Policies and maintains the Per-App Spec (Exe->Spec ID, how to parse messages, how to identify a session). The Application Dispatcher maps raw bytes and port # to a Spec ID; the Session Dispatcher delivers the event for session i; the State Machine Engine looks up HandlerAt(State, Event) from CurState; the Shield Interpreter interprets the handler (ParsePayload, Drop, TearDownSession, SetNextState) against per-session state]

Achieving Shield Fidelity
- Infidelity results in evasion or false positives
- Sources of inconsistencies:
  - Misunderstanding of the protocol or message format
    - Test suites or trace-driven debugging
  - Event dispatching logic
    - Session as an abstraction independent of socket or host pair
  - Scattered message arrivals
    - Message as an abstraction independent of the packet
- [Ptacek et al] [Paxson] [Handley et al] [Malan et al]

Achieving DoS Resilience
- Session state:
  - Current protocol state
  - Parsing state
  - Handler continuation
- Parsing:
  - Exploit-checking only: much streamlined parsing
  - Aggressive byte skipping
  - Save the partial field only (instead of the partial message)

Achieving Safety: GAPAL

    Protocol <protoName> {
        uses <lowerLayerName>
        transport = { TCP|UDP/<port> }
        // session-local vars
        <baseType> <varName>;
        grammar {
            // msg-local vars
            <baseType> <varName>;
            NonTerminal <name>:<type> { <code> }
            ….
        };
        State-machine <name> {
            (<state>, IN|OUT|Timeout) handler;
            initial-state = <stateName>;
            final-state = <stateName>;
        };
        Session-identifier (<startNonTerminal>) {
            <code>
            return <session ID>;
        };
        Handler <name> (<startNonTerminal>) {
            // handler-local vars
            <baseType> <varName>;
            <grammar visitor>
            <post-parsing code>
            return <nextState>;
        };
    }; // protocol

An Example: CodeRed

    protocol HTTPProtocol {
        transport = (80/TCP);
        int32 content_length = 0;
        bool chunked = false;
        bool keep_alive = false;
        int32 foo;
        bool maybeCodered = false;

        grammar {
            …
            HTTP_message -> Request | Response;
            Request -> RequestLine HeadersBody;
            Response -> ResponseLine HeadersBody;
            HeadersBody ->
                { chunked = false; keep_alive = false; content_length = 0; }
                Headers CRLF
                { if (chunked) message_body := ChunkedBody;
                  else message_body := NormalBody; }
                message_body:?;
            RequestLine -> Method WS uri:RequestURI WS version:HTTPVersion CRLF;
            ResponseLine -> HTTPVersion WS statusCode:"[0-9][0-9][0-9]" WS message:"[^\r\n]*" CRLF;
            Method -> "GET" | "POST";
            RequestURI -> IDAURI | OtherURI;
            IDAURI -> idaname:"[^\n]*\.ida?" buffer:"[^\n=]*" "=" rest:"[^\n]*";
            …
        };

        state-machine serverOrclient {
            (S_Init,IN)->H_RequestIN;
            (S_Init,OUT)->H_RequestOUT;
            (S_ResponseOUT,OUT)->H_ResponseOUT;
            (S_ResponseIN,IN)->H_ResponseIN;
            (S_RequestIN,IN)->H_RequestIN;
            (S_RequestOUT,OUT)->H_RequestOUT;
            initial_state=S_Init;
            final_state=S_Final;
        };

        handler H_RequestIN (HTTP_message) {
            // Visit the IDAURI non-terminal; an over-long buffer field marks a CodeRed exploit
            @IDAURI -> {
                if (strlen(buffer) > 239) {
                    print ("CodeRed!! \n");
                    return S_Final;
                }
                else return S_ResponseOUT;
            }
            return S_ResponseOUT;
        };
    };

Key Properties of GAPAL
- Completeness
  - Binary as well as text-based protocols
  - Layering
- Ease of authoring protocol descriptions
  - Payload parsing grammar similar to BNF
  - E.g., HTTP RFC spec minus text ~= GAPA policy for HTTP
- Safety
  - Strong typing
  - No dynamic memory allocation
  - No general-purpose loops or iterators
  - Semantic checking and optimization at compile time

GAPA as a General Facility
- Rapid protocol analysis enabler for IDSes, firewalls, and network monitors; allows flexible customization
- Easy authoring of Shield vulnerability signatures
  - Vulnerability signature authoring as refinement of a previously specified protocol
  - Merging vulnerability signatures of the same application becomes trivial

Shield Implementation and Evaluation
- First prototype implemented as a Windows Layered Service Provider (LSP)
  - Working shields for the vulnerabilities behind Blaster, Slammer, and CodeRed
  - Near-zero false positives
  - Performance and scalability results promising:
    - Negligible overhead for end-user machines
    - 14-30% throughput overhead for an artificial scenario stressing Shield
- Second prototype based on GAPAL
  - 48 Mbps for CodeRed, 72 Mbps for host header, 8-18 Mbps for Blaster
- MSRC 2003 bulletin study (49 bulletins)
  - All 12 worm-able vulnerabilities are easily shield-able
  - Some of the other 37 may also be shield-able

Ongoing work
- Performance enhancement
- Shield as a detector
- Fast Shield vulnerability signature generation when combined with zero-day attack detection tools
  - GAPA provides protocol context
- BrowserShield for protecting browsers and their extensions

Conclusions
- Shield: vulnerability-driven end-host firewall, "patching" network traffic rather than the software binary; easier to test and deploy
  - SIGCOMM 2004
- GAPA: generic protocol analysis is a valuable tool for IDSes, firewalling, network monitoring, etc.
  - Ongoing

Privacy-Preserving Friends Troubleshooting Network

Search keywords: friends troubleshooting network

People Involved
- Qiang Huang (Intern 2004, Princeton)
- Helen J. Wang
- David Jao
- Nikita Borisov (Intern 2004, UC Berkeley)
- Yih-Chun Hu

Background
- Manual diagnosis of misconfigurations on desktop PCs is time-consuming and expensive
- Partial automation for diagnosis: Strider [Y. Wang et al, LISA 2003]
  - Diagnosing misconfigurations by diff-ing against a known good registry snapshot, etc.
- Full automation for diagnosis: PeerPressure [H. Wang et al, OSDI 2004]
  - Intuition: an application functions well on most machines
  - Approach: use statistics in the mass to diagnose the anomaly

How to gather the sample set?
- Given:
  - A list of suspect entries
  - N samples to collect
- Search for helpers who own the app, and gather the parameters needed for PeerPressure:
  - Cardinality (number of possible values) for each suspect entry
  - Number of samples that match the value of the suspect entry
  - Most popular value for each suspect entry, for the purpose of correction
- Database vs. P2P: trust, scaling, freshness, …
  - We focus on P2P

Challenges
- Privacy: privacy of both troubleshooting users and peer helpers
- Integrity:
  - Malicious users
  - Compromised machines

Basic Approach: Friends Troubleshooting Network
- Intuition: mitigate the malicious user problem
  - People troubleshoot their computers with their friends, co-workers, or neighbors
  - Friends can be trusted not to contribute false and potentially harmful configuration information
- Friends Troubleshooting Network (FTN): peer-to-peer overlay links between you and your "trustworthy" friends
- Does not address compromised machines

Recursive Trust vs. Transitive Trust
- Transitive trust is undesirable
  - "I'll help my friends, but nobody else"
- We use recursive trust
  - Alice's friend Bob proxies her request to Bob's friend Carol, and Carol has an incentive to help
  - If Alice asked Carol directly, Carol might not answer
  - The proxied request hides the source requester's identity

Privacy Model
- Who to protect: the troubleshooting user and the contributors
- Private information
  - Identity-revealing configuration state is filtered out (e.g., by canonicalization)
  - Privacy-sensitive info to be protected: URLs visited, apps installed
- Attacker model
  - Curious but honest friends (who never lie about configuration)
- Attacks
  - Eavesdropping
  - Message inspection
  - Polling attack
  - Gossip attack (collusion)

Preserving Privacy in FTN: Basic Approaches
- Integration of search and parameter gathering in one transaction
  - Privacy of application ownership is lost if the search is separate
- Historyless and futureless random walk of an ownerless troubleshooting request
  - Individual state is disguised in the aggregate state
- Similar to Crowds, but:
  1. The random walk must follow the friends links
  2. Parameter aggregation is carried out during the random walk

FTN Routing
[Figure: FTN routing example. The request carries, for each suspect entry (e1 with values True/False, e2 with values on/off), the value counts Vi and Me(i), together with a remaining-sample counter R (R=10, then R=9); nodes labeled F and H appear along the path]

Issues
- Random initialization not really possible
- Gossip attack
  - A helper's previous-hop and next-hop friends may collude
- Last-hop polling attack
  - With R=1, the last hop's state can be easily determined
  - Proposed solution: use probabilistic helping (Ph) and random wait to disguise the last hop
  - Statistical inference and timing analysis still feasible

Enhancements
- Countering polling attack
  - Intuition: do not indicate R
  - Approach: each helper proxies Req with Pf = 1-1/N
- Countering gossip attack
  - Intuition: the helper does not contribute its configuration info directly
  - Approach: form a cluster around the helper and carry out a multi-party secure sum
- Enabling random initialization

Evaluation
- Use real-world MSN IM operational data (a snapshot from Aug 2003) to evaluate performance
  - 150 million users
  - Median 9 friends, average 19 friends
- Inherent privacy and efficiency tradeoff
- The performance of our current prototype allows enterprise users to diagnose misconfigurations in a minute with a high privacy guarantee

Related Work
- Anonymization techniques: Crowds, FreeNet, Mix, Onion Routing, Tarzan
  - No control over who can contribute the configuration data and who can manipulate the data
- Homomorphic encryption
  - Only supports a known, fixed set of items, while the cardinality of configuration states is most likely unknown ahead of time
- Random perturbation [Agrawal'00]
  - Severely affects PeerPressure ranking when N is small

Ongoing Work
- New techniques based on homomorphic encryption to do data aggregation for FTN
- Compromised machines

Conclusions
- FTN enables privacy-preserving P2P troubleshooting on a friends overlay network
- We faced many interesting challenges in preserving privacy and integrity
- We used the following techniques:
  - Historyless and futureless random walk
  - Integrated search and parameter gathering
  - Probabilistic helping and forwarding
  - Clustering for mitigating collusion attacks
  - Cardinality estimation for enabling random initialization
- There is a tradeoff between privacy and efficiency
- Some of our techniques can be applied to other scenarios requiring privacy-preserving data aggregation

Acknowledgement
- John Dunagan for critiquing my slides

Questions and comments?

© 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.

The Case for Shield: Shield-ability

    # of vul.   Nature                    Worm-able   Shield-able
    6           Local                     No          No
    24          User-involved             No          Impractical
    12          Server input validation   Yes         Easy
    3           Cross-site scripting      No          Impractical
    3           Server DoS                No          Varies

- Study of 49 vulnerabilities from the MS security bulletin board in 2003: all 12 worm-able vulnerabilities are shield-able

Encrypted Traffic
- Layer 5+: no need to deal with IPSec
- Build Shield above SSL
- Application-specific encryption is hard

Outline
- The case for Shield
- Overview of Shield design, implementation and evaluation
- Generic Application-Level Protocol Analyzer (GAPA)
- Ongoing work

Countering Gossip Attack via Cluster-based Secure Multi-party Sum
[Figure: a cluster of Helper and NonHelper nodes with a Forwarder, a cluster entrance, and a cluster exit; arrows are numbered with the steps below]
  0. Receive a request
  1. Elect the cluster exit
  2. Distribute shares
  3. Unicast subtotal
  4. Aggregate cluster sum

Entrance and Exit Collusion
- Entrance and exit can collude to find out the cluster aggregate
- Impact limited: they will not know individual contributions
- Some privacy compromise if more than half of the cluster members contribute
  - Adjust Ph
  - Deterministic approach: iterative helper selection
    - Every cluster member first flips a biased coin (probability Pp < 0.5)
    - Use secure sum to compute the total number of heads
    - If the sum is more than half the cluster, repeat
    - Second step: use multi-party sum on contributions, but only those members with heads contribute

Background: PeerPressure Design Overview
- Narrow down the list of suspect entries on the sick machine
  - The suspects are the entries that are touched during the faulty execution of the application
- Search for helper PCs from a DB or P2P
- Draw samples from the helper machines' configuration state for the suspects
- Rank the suspects using Bayesian statistical analysis
  - The top-ranking entry is the most anomalous entry compared with the others
  - The most popular value among the samples can be used for correction

PeerPressure is effective
- 10 samples are needed for PeerPressure to be effective
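
Sketch: cluster-based secure sum

The "distribute shares, unicast subtotal, aggregate cluster sum" steps of the cluster-based secure multi-party sum can be illustrated with additive secret sharing. This is a minimal sketch under stated assumptions, not the FTN implementation: the modulus, the function names (split_into_shares, cluster_secure_sum), and the use of a single integer counter per member are all hypothetical choices made for illustration. Non-helpers simply contribute 0, so an observer of the aggregate cannot tell helpers from non-helpers.

```python
import random

# Assumption: all arithmetic is done modulo a fixed public value that is
# larger than any possible cluster sum. The slides do not specify one.
MODULUS = 2**32


def split_into_shares(value, n_members, rng):
    """Split `value` into n_members random shares that sum to value mod MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_members - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def cluster_secure_sum(contributions, rng=None):
    """Return the sum of all contributions without any member revealing its own.

    contributions[i] is member i's private counter (0 for a non-helper).
    """
    rng = rng or random.SystemRandom()
    n = len(contributions)

    # Step 2 (distribute shares): member i sends share_matrix[i][j] to member j.
    share_matrix = [split_into_shares(v, n, rng) for v in contributions]

    # Step 3 (unicast subtotal): member j adds up the shares it received and
    # sends only that subtotal to the cluster exit.
    subtotals = [
        sum(share_matrix[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]

    # Step 4 (aggregate cluster sum): the exit adds the subtotals.
    return sum(subtotals) % MODULUS


if __name__ == "__main__":
    # Three helpers (1) and two non-helpers (0) for one suspect-entry counter.
    print(cluster_secure_sum([1, 0, 1, 0, 1]))  # prints 3
```

Each subtotal is a sum of uniformly random shares, so the exit learns only the cluster aggregate. The same routine could be reused for the first round of iterative helper selection, where each member contributes 1 for heads and 0 for tails and the cluster repeats the coin flips if the result exceeds half the cluster size.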