Network Worms: Research and Engineering Challenges

Stefan Savage
Department of Computer Science and Engineering
University of California, San Diego

Joint work (in part or full) with David Moore (UCSD/CAIDA), Colleen Shannon (CAIDA), Geoff Voelker (UCSD), Vern Paxson (ICIR/LBL), Stuart Staniford (Silicon Defense), Nick Weaver (UC Berkeley), Sumeet Singh (UCSD), Cristian Estan (UCSD), George Varghese (UCSD)

What is a Network Worm?
• Self-propagating, self-replicating network program
  – Exploits some vulnerability to infect remote machines
• No human intervention necessary
  – Infected machines continue propagating infection

University California, San Diego – Department of Computer Science UCSD CSE

A Brief History…
• Brunner describes “tapeworm” program in novel “The Shockwave Rider” (1975) [I’ve been told there is an earlier sci-fi reference]
• Shoch & Hupp co-opt idea; coin term “worm” (1982)
  – Key idea: programs that self-propagate through network to accomplish some task
  – Benign; didn’t replicate
• Fred Cohen demonstrates power and threat of self-replicating viruses (1984)
• Morris worm exploits buffer overflow vulnerabilities & infects a few thousand hosts (1988)
Hiatus for 13 years…

Recent Events
• CodeRed worm released in Summer 2001
  – Exploited buffer overflow in IIS
  – Uniform random target selection (after fixed bug in CRv1)
  – Infects 360,000 hosts in 10 hours (CRv2)
  – Still going…
• Starts renaissance in worm development
  – CodeRed II
  – Nimda
  – Scalper, etc.
• Culminating in Sapphire/Slammer worm (Winter 2003)

Inside the Sapphire/Slammer Worm
• Worm fit in a single UDP packet (404 bytes total)
• Code structure
  – Cleanup from buffer overflow
  – Get API pointers
    • Code borrowed from published exploit
  – Create socket & packet
  – Seed PRNG with GetTickCount()
  – While (TRUE)
    • Increment PRNG
      – Mildly buggy
    • Send packet to PRNG address
• Key insight: decouple scanning from target behavior (easy to adapt to TCP-based worms)
[Figure: packet layout: Header | Oflow | API | Socket | Seed | PRNG | Sendto]

Sapphire growth
• First ~1 min behaves like classic random scanning worm
  – Doubling time of ~8.5 seconds
  – Code Red doubled every 40 mins
• After ~1 min, worm starts to saturate access bandwidth
  – Some hosts issue >20,000 scans/sec
  – Self-interfering (no congestion control)
• Peaks at ~3 min
  – 55 million IP scans/sec
• 90% of Internet scanned in <10 mins
• Infected ~100k hosts (conservative due to PRNG errors)

Eye Candy
[Figure]

Motivation (Gloom and Doom)
• Possibly controversial statement: worms are the most potent network security threat today
  – Many millions of susceptible hosts
  – Easy to write worms
    • Worm payload separate from vulnerability exploit
    • Significant code reuse in practice
  – Possible to cause major damage
    • Lucky so far; existing worms have benign payload
    • Wipe disk; flash BIOS; modify data; reveal data; Internet DoS
• We have no operational defense
  – Good evidence that humans don’t react fast enough
  – Defensive technology is nascent at best

Agenda for today
• How to think about the worm problem
• Reactive defense
  – Containment: what we’re doing
  – Treatment: the next talk
• Proactive defense
  – Prevention: an appeal to the software research community

Modeling network worms
• Network worms are well modeled as infectious epidemics
  – Simplest version: homogeneous random contacts
• Classic SI model
  – N: population size
  – S(t): susceptible hosts at time t
  – I(t): infected hosts at time t
  – β: contact rate
  – i(t): I(t)/N, s(t): S(t)/N

$$\frac{dI}{dt} = \beta \frac{I S}{N} \qquad \frac{dS}{dt} = -\beta \frac{I S}{N}$$

$$\frac{di}{dt} = \beta\, i\, (1 - i) \qquad i(t) = \frac{e^{\beta (t - T)}}{1 + e^{\beta (t - T)}}$$

courtesy Paxson, Staniford, Weaver

What’s important?
• How likely is it that an infection attempt is successful?
  – Target selection (random, biased, hitlist, etc.)
  – Vulnerability distribution (e.g. density: S(0)/N)
• How frequently are infections attempted?
  – β: contact rate
• That’s it… with current technology death/recovery is irrelevant on timescales of interest

What can be done?
• Reduce the number of infected hosts (reactive)
  – Treatment: reduce I(t) while I(t) is still small
• Reduce the contact rate (reactive)
  – Containment: reduce β while I(t) is still small
• Reduce the number of susceptible hosts (proactive)
  – Prevention: reduce S(0)

Treatment
• Reduce # of infected hosts
• Disinfect infected hosts
  – Detect infection in real-time
  – Develop specialized “vaccine” in real-time (next talk)
  – Distribute “patch” more quickly than worm can spread
• Anti-worm? (CRClean written)
• Bandwidth interference…

Containment
• Reduce contact rate
• Oblivious defense
  – Consume limited worm resources [Liston01]
  – Throttle traffic to slow spread [Williamson02]
  – Possibly important capability, but worm still spreads…
• Targeted defense
  – Detect and block worm [Moore et al 03]

Design Issues for Reactive Defense [Moore et al 03]
• Any reactive defense is defined by:
  – Reaction time: how long to detect, propagate information, and activate response
  – Containment strategy: how malicious behavior is identified and stopped
  – Deployment scenario: who participates in the system
• We evaluate the requirements these parameters must meet to build any effective system.

Methodology
• Simulate spread of worm across Internet topology:
  – infected hosts attempt to spread at a fixed rate (probes/sec)
  – target selection is uniformly random over IPv4 space
• Simulation of defense:
  – system detects infection within reaction time
  – subset of network nodes employ a containment strategy
• Evaluation metric:
  – % of vulnerable hosts infected in 24 hours
  – 100 runs of each set of parameters (95th percentile taken)
    • Systems must plan for reasonable situations, not the average case
• Source data:
  – vulnerable hosts: 359,000 IP addresses of CodeRed v2 victims
  – Internet topology: AS routing topology derived from RouteViews

Initial Approach: Universal Deployment
• Assume every host employs the containment strategy
• Two containment strategies we tested:
  – Address blacklisting:
    • block traffic from malicious source IP addresses
    • reaction time is relative to each infected host
  – Content filtering:
    • block traffic based on signature of content
    • reaction time is from first infection
• How quickly does each strategy need to react?
• How sensitive is reaction time to worm probe rate?

How quickly does each strategy need to react?
[Figure: two panels plotting % infected (95th perc.) against reaction time: Address Blacklisting (reaction time in minutes) and Content Filtering (reaction time in hours)]
• To contain worms to 10% of vulnerable hosts after 24 hours of spreading at 10 probes/sec (CodeRed):
  – Address blacklisting: reaction time must be < 25 minutes
  – Content filtering: reaction time must be < 3 hours

How sensitive is reaction time to worm probe rate?
[Figure: Content Filtering: reaction time vs. probes/second]
• Reaction times must be fast when probe rates get high:
  – 10 probes/sec: reaction time must be < 3 hours
  – 1000 probes/sec: reaction time must be < 2 minutes

Limited Network Deployment
• Depending on every host to implement containment is not feasible:
  – installation and administration costs
  – system communication overhead
• A more realistic scenario is limited deployment in the network:
  – Customer Network: firewall-like inbound filtering of traffic
  – ISP Network: filtering of traffic through border routers of large transit ISPs
• How effective are the deployment scenarios?
• How sensitive is reaction time to worm probe rate under limited network deployment?

How effective are the deployment scenarios?
[Figure: CodeRed-like worm: % infected at 24 hours (95th perc.)]

How sensitive is reaction time to worm probe rate?
[Figure: Top 100 ISPs: reaction time vs. probes/second]
• Above 60 probes/sec, containment to 10% of hosts within 24 hours is impossible even with instantaneous reaction.
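A rough sanity check on the universal-deployment content-filtering figures quoted earlier (under 3 hours at 10 probes/sec, under 2 minutes at 1000 probes/sec) can be made with the SI model, di/dt = β i (1 − i). The sketch below is not the simulator behind those results, which is stochastic, uses an AS-level topology, and reports the 95th percentile. It assumes homogeneous mixing, a single initial infected host, and a perfect filter that stops all spread once the reaction time has elapsed, so the fraction infected at 24 hours equals the fraction infected at the reaction time. The function names and the choice β = probe rate × vulnerable hosts / 2^32 are illustrative assumptions.

```python
import math

ADDRESS_SPACE = 2 ** 32   # uniformly random scanning over the IPv4 space
VULNERABLE = 359_000      # vulnerable population (CodeRed v2 victim count)


def contact_rate(probes_per_sec, vulnerable=VULNERABLE, address_space=ADDRESS_SPACE):
    """Assumed beta: rate (1/s) at which one host's probes hit vulnerable addresses."""
    return probes_per_sec * vulnerable / address_space


def infected_fraction(t, beta, i0):
    """Logistic solution of di/dt = beta * i * (1 - i) with i(0) = i0."""
    growth = i0 * math.exp(beta * t)
    return growth / (1.0 - i0 + growth)


def max_reaction_time(probes_per_sec, target=0.10, vulnerable=VULNERABLE):
    """Seconds for an unchecked worm to reach `target`, starting from one host."""
    beta = contact_rate(probes_per_sec, vulnerable)
    i0 = 1.0 / vulnerable
    # Invert the logistic curve: solve infected_fraction(t) == target for t.
    return math.log(target * (1.0 - i0) / ((1.0 - target) * i0)) / beta


if __name__ == "__main__":
    for rate in (10, 1000):
        minutes = max_reaction_time(rate) / 60
        print(f"{rate:>5} probes/s: 10% infected after about {minutes:.1f} minutes")
```

Under these assumptions the deterministic model gives about 3.5 hours at 10 probes/sec and about 2 minutes at 1000 probes/sec. That is the same order as the simulated 95th-percentile requirements, and slightly more optimistic, as expected for an average-case curve. The required reaction time scales inversely with probe rate because β is proportional to it.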

Summary for reactive defense
• Reaction time:
  – required reaction times are a couple of minutes or less (far less for bandwidth-limited scanners)
• Containment strategy:
  – content filtering is more effective than address blacklisting
• Deployment scenarios:
  – need nearly all customer networks to provide containment
  – need at least the top 40 ISPs to provide containment
• We’re currently trying to build a system that could surpass these requirements (another talk)

Proactive Defense: Prevention
• Reduce # of susceptible hosts
• Software quality: eliminate vulnerability
  – Static/dynamic testing [e.g. work of Cowan, Wagner, Engler, etc.]
  – Software process, code review, etc…
  – Active research community
  – Traditional problems: soundness, completeness, usability
• Software updating: reduce window of vulnerability
  – Most worms exploit known vulnerability (10 days -> 3 months)
  – Relatively little activity; yet critical problem
• Software heterogeneity: reduce impact of vulnerability
  – Exploit existing heterogeneity [e.g. Junqueira’s Phoenix, HotOS 03]
  – Artificial heterogeneity [e.g. Forrest97]

Artificial Heterogeneity: A Call to Arms for the Software Research Community
• Key idea: automatically give each instance of a program a unique implementation
• Low-level
  – Environment/run-time heterogeneity
    • Variable stack sizes, dynamic import tables
  – Representation heterogeneity
    • Activation record format; randomize function prolog
    • Register assignment, spilling order; heap vs. stack assignment
  – Control flow heterogeneity
    • Re-order basic blocks
    • Isomorphic CFGs
• High-level
  – Source-translation -> functional equivalents
  – Translation into design-level equivalents (it’s OK to have different semantics for property p if property p isn’t defined in the interface spec)

Why this is a good/dumb idea
• Yes but,
  – Simple code randomization was tried for buffer overflows and it didn’t help much
  – Software maintenance becomes more expensive
  – This is just code obfuscation and we know that doesn’t work
• Yes and,
  – This might also help debug programs and eliminate the use of undefined “quasi-invariants”
  – This is something the software community could do besides repeating the “write correct code” mantra
  – We’re desperate enough that it might be worth giving this some serious thought

Summary
• Worms are a humongous potential problem
• There are a limited # of things you can do
• Reactive defense
  – Very challenging engineering requirements
  – But a number of us are having a shot at it
• Proactive defense
  – Some obvious things (better software, better patch distribution)
  – Large potential impact from attacking homogeneity
  – Open research question: can we programmatically create sufficient software diversity to provide protection?
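
As a toy illustration of the control-flow heterogeneity idea from the Artificial Heterogeneity slide (re-order basic blocks), the sketch below gives each "instance" of a three-block program its own block layout while preserving behavior. Everything here is hypothetical: the block format, the `diversify` and `run` names, and the use of a per-instance seed are assumptions made for illustration, not a description of any existing tool, and a real system would operate on compiled code.

```python
import random

# A toy program: each basic block is (label, operations, successor label or None).
PROGRAM = [
    ("entry", [("add", 5)], "double"),
    ("double", [("mul", 2)], "finish"),
    ("finish", [("add", 1)], None),
]


def diversify(program, seed):
    """Return a per-instance layout: the same blocks in a seed-dependent order."""
    rng = random.Random(seed)
    layout = list(program)
    rng.shuffle(layout)
    return layout


def run(layout, value=0):
    """Execute by following successor labels, so block positions do not matter."""
    position = {label: i for i, (label, _, _) in enumerate(layout)}
    label = "entry"
    while label is not None:
        _, ops, successor = layout[position[label]]
        for op, arg in ops:
            value = value + arg if op == "add" else value * arg
        label = successor
    return value


if __name__ == "__main__":
    for seed in range(3):
        layout = diversify(PROGRAM, seed)
        order = [label for label, _, _ in layout]
        print(f"instance {seed}: layout={order} result={run(layout)}")
```

Every instance computes the same result, (0 + 5) × 2 + 1 = 11, but the position of a given block varies between instances. An attack that depends on a fixed code location would therefore not carry over unchanged from one instance to the next, which is the property the open research question asks about at scale.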