Propagation and Containment
Presented by Jing Yang, Leonid Bolotnyy, and Anthony Wood

Analogy between Biological and Computational Mechanisms
• The spread of self-replicating programs within computer systems resembles the transmission of smallpox several centuries ago [1]
• Mathematical models that are popular in biological epidemiology can also be applied to computer viruses [1]
• Kephart & White’s work was the first to explicitly develop and analyze quantitative models that capture the spreading characteristics of computer viruses

Kephart & White’s Work
• Based on the assumption that viruses are spread by program sharing
• Benefits of using mathematical models, as mentioned in their paper:
  – Evaluating and developing general policies and heuristics for inhibiting the spread of viruses
  – Applying them to a particular epidemic, such as predicting its course

Modeling Viral Epidemics on Directed Graphs
• Directed graph [1]
  – Each individual system is represented as a node in a graph
  – Directed edges from a given node j to other nodes represent the set of individuals that can be infected by j
  – A rate of infection is associated with each edge
  – A rate at which infection can be detected and “cured” is associated with each node

SIS Model on a Random Graph
• Random graph – a directed graph constructed by making random, independent decisions about whether to include each of the N(N-1) possible directional edges [1]
• Techniques used by Kephart & White
  – Deterministic approximation
  – Approximate probabilistic analysis
  – Simulation

Deterministic Approximation
• β – infection rate along each edge
• δ – cure rate for each node
• β’ = β p (N - 1) – average total rate at which a node attempts to infect its neighbors, where p is the probability that a given edge is included in the random graph
• ρ’ = δ / β’ – if ρ’ > 1, the fraction of infected individuals decays exponentially from the initial value to 0, i.e.
there is no epidemic; if ρ’ < 1, the fraction of infected individuals grows from the initial value at a rate that is initially exponential and eventually saturates at the value 1 - ρ’

Probabilistic Analysis
• Adds new information such as:
  – Size of fluctuations in the number of infected individuals
  – Possibility that fluctuations will result in extinction of the infection
• Conclusions:
  – There is a lot of variance in the number of infected individuals from one simulation to another
  – In equilibrium, the size of the infected population is completely insensitive to the moment at which the exponential rise occurred
• Extinction probability and metastable distribution can be calculated

Simulations
• Results, compared with the analytical predictions
  – Higher extinction probability
  – Lower average number of infected individuals
• Suspected reason: the analysis keeps
  – No details of which nodes are infected
  – No variation in the number of nodes that a given node could infect

Simulations (cont.)
• Scenario
  – A random graph in which most nodes are isolated and a few are joined in small clusters
• Does anything here contribute to containment?
  – Build isolated cells – worms can spread unimpeded within a cell, but the containment system limits further infection between cells

Improvements of SIS Model on a Random Graph
• Kephart & White’s work
  – Weak links – give a node a small but finite chance of infecting any node that is not explicitly connected to it
  – Hierarchical model – extend the two-type model of strong and weak links to a hierarchy
• Wang’s work
  – Effects of infection delay
  – Effects of user vigilance

Do the SIS & SIR Models Take Containment into Consideration?
• No for SIS and maybe yes for SIR
• In SIS, the cure is more appropriately called treatment
• In SIR, deployment of containment is limited to the individual node, which means that every cell contains only one node – again more appropriately called treatment
• Not automatic
• No cooperation is applied

Model without Containment
• Remember the assumption in Kephart & White’s work?
  – Viruses spread by program sharing
• Modern worms spread so quickly that manual containment is impossible
• A model without containment (the SI model) should be built first; different containment methods can then be added to test the results

IPv6 vs. Worms
• It will be very challenging to build Internet containment systems that prevent widespread infection from worm epidemics in the IPv4 environment [2]
• It seems that the only effective defense is to increase the worm’s scanning space
  – Upgrading IPv4 to IPv6

How Large Is the IPv6 Address Space?
• IPv6 has 2^128 IP addresses [3]
• The smallest subnet has 2^64 addresses [3]
  – Equivalent to roughly 4.3 billion (2^32) IPv4 Internets
• Consider a sub-network [3]
  – 1,000,000 vulnerable hosts
  – 100,000 scans per second (Slammer: 4,000)
  – 1,000 initially infected hosts
  – It would take 40 years to infect 50% of the vulnerable population with random scanning

Worm Classification
• Spreading media [3]
  – Scan-based & self-propagation
  – Email
  – Windows File Sharing
  – Hybrid
• Target acquisition [3]
  – Random Scanning
  – Subnet Scanning
  – Routing Worm
  – Pre-generated Hit List
  – Topological
  – Stealth / Passive

Can IPv6 Really Defeat Worms?
• Traditional scan-based worms seem to be ineffective, but there may be ways to improve the scan methods [4]
• More sophisticated hybrid worms may appear, which use a variety of ways to collect addresses for quick propagation
• Polymorphic worms may significantly increase the time needed to extract a worm’s signature

Improvement in Scan Methods
• Subnet Scanning
  – The first goal may be a /64 enterprise network instead of the whole Internet
• Routing Worm
  – Some IP addresses are not allocated
• Pre-generated Hit List Scanning
  – Speeds up propagation, and the whole address space can be divided equally among the zombies

Improvement in Scan Methods (cont.)
• Permutation Scanning
  – Avoids wasting scans on the same host many times
• Topological Scanning
  – Uses the information stored on compromised hosts to find new targets
• Stealth / Passive Worm
  – Waiting for a vulnerable host to contact you may be more efficient than scanning such a large address space

What Can IPv6 Itself Contribute?
• Public services need to be reachable by DNS
  – At least some addresses are known in advance [4]
• DNS name for every host because of the long IPv6 address
  – A DNS server under attack can yield large caches of hosts [4]
• Standard method of deriving the EUI field
  – The lower 64 bits of an IPv6 address are derived from the 48-bit MAC address [5]
• IPv6 neighbor-discovery cache data
  – One compromised host can reveal the addresses of other hosts [4]
• Easy-to-remember host addresses in the transition from IPv4 to IPv6
  – Scanning the IPv6 address space is then no different from scanning the IPv4 address space [4]

More Sophisticated Hybrid Worms
• Humans are involved
• Different methods are used to compromise a host, so the vulnerability density is relatively higher
• Nimda’s success…

Polymorphic Worms
• A very effective containment system extracts a worm’s features to do content filtering
• Even though some methods exist to detect polymorphic worms, the success rate may not be 100%

What Should We Do Then?
• Find out whether the following methods, which merely speed up a worm’s propagation in IPv4, can make quick propagation possible in IPv6
  – Improvement in scan methods + IPv6 inherent features
  – More sophisticated hybrid worms
  – Polymorphic worms

Future Work
• Use traditional models to see whether each method, or a combination of them, can make quick worm propagation in IPv6 possible
• Add new features of worm spread in IPv6 to build new models that represent reality more precisely
• If quick propagation proves possible in IPv6, corresponding containment methods should be worked out – containment should be much more feasible than in IPv4

References
[1] Jeffrey O. Kephart, Steve R.
White. Directed-Graph Epidemiological Models of Computer Viruses.
[2] David Moore et al. Internet Quarantine: Requirements for Containing Self-Propagating Code.
[3] Mark Shaneck. Worms: Taxonomy and Detection.
[4] Sean Convery et al. IPv6 and IPv4 Threat Comparison and Best Practice Evaluation.
[5] Michael H. Warfield et al. Security Implications of IPv6.

Propagation and Containment of Worms
General modeling of worm propagation and containment strategies

Ways to mitigate the threat of worms
• Prevention – Prevent the worm from spreading by reducing the number of vulnerable hosts
• Treatment – Neutralize the worm by removing the vulnerability it is trying to exploit
• Containment – Prevent the worm from spreading from infected systems to unaffected but vulnerable hosts

Containment Approaches
• LaBrea – Intercepts probes to unallocated addresses
• Connection-history based anomaly detection – Analyzes connection traffic to detect anomalies
• Per-host throttling – Restricts the connection rate to “new” hosts
• Blocking access to affected ports – Prevents affected hosts from accessing vulnerable ports on other machines
• NBAR – Filters packets based on content using worm signatures
  – It was very effective in preventing the spread of Code-Red

General model for worm infection rate

Modeling Containment Systems
• Reaction Time – Time required to detect the infection, spread the information to all the hosts participating in the system, and activate containment mechanisms
• Containment Strategy – Strategy that isolates the worm from uninfected susceptible systems (e.g.
address blacklisting and content filtering)
• Deployment Scenario – The “who”, “where” and “how” of the containment strategy implementation

Simulation parameters
• Population = 2^32 (assuming IPv4)
• Number of vulnerable hosts = 360,000 (same as for Code-Red v2)
• Any probe to a susceptible host results in an infection
• A probe to an infected or non-vulnerable host has no effect
• The first host is infected at time 0
• If a host is infected at time t, then all susceptible hosts are notified at time t + R, where R is the reaction time of the system
• The simulation is run 100 times

Simulation Goals
• Determine the reaction time needed to limit worm propagation for address blacklisting and content filtering
• Compare the two containment strategies
• Understand the relationship between reaction time and worm probe rate

Idealized Deployment Simulation for Code-Red

Idealized Deployment Simulation for Code-Red Conclusions
• A strategy is considered effective if under 1% of susceptible hosts are infected within the 24-hour period with 95% certainty
• Address blacklisting is effective if the reaction time is less than 20 minutes
  – Note: if the reaction time is greater than 20 minutes, all susceptible hosts will eventually become infected
• Content filtering is effective if the reaction time is less than two hours
  – How many susceptible hosts will become infected after time R (reaction time)?
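
The idealized simulation described above can be approximated with a small deterministic sketch. It is not the authors' simulator: it replaces the 100 randomized runs with a mean-field SI model, and the Code-Red probe rate of 10 probes/second, the function name `simulate`, and the one-second time step are assumptions made for illustration. Address blacklisting is modeled as silencing each infected host R seconds after its own infection; content filtering is modeled as stopping all worm traffic R seconds after the first infection.

```python
def simulate(strategy, reaction_time_s, probe_rate=10.0, vulnerable=360_000,
             address_space=2**32, horizon_s=24 * 3600, dt=1.0):
    """Return the fraction of vulnerable hosts infected after horizon_s seconds.

    strategy: "none", "blacklist" (per-host blocking R seconds after that host
    is infected) or "content" (all worm packets dropped R seconds after the
    first infection). probe_rate is an assumed value, not taken from the slides.
    """
    steps = int(horizon_s / dt)
    lag = int(reaction_time_s / dt)
    infected = [1.0]  # cumulative infections; the first host is infected at time 0
    for step in range(steps):
        total = infected[-1]
        if strategy == "none":
            active = total
        elif strategy == "content":
            # Once the signature is deployed, no probe gets through.
            active = total if step < lag else 0.0
        elif strategy == "blacklist":
            # Hosts infected more than R seconds ago are already blocked.
            blocked = infected[step - lag] if step >= lag else 0.0
            active = total - blocked
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        susceptible = vulnerable - total
        # Each probe hits a susceptible host with probability susceptible / address_space.
        new = probe_rate * active * susceptible / address_space * dt
        infected.append(min(float(vulnerable), total + new))
    return infected[-1] / vulnerable


if __name__ == "__main__":
    for name, reaction in [("none", 0), ("blacklist", 10 * 60),
                           ("blacklist", 60 * 60), ("content", 2 * 3600)]:
        print(name, reaction, simulate(name, reaction))
```

Under these assumptions, each blacklisted host has only R seconds in which to infect others, so blacklisting works only while a host infects fewer than one new host on average in that window; this is one way to read the sharp threshold on the slide.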
Idealized Deployment Simulation for General Worm (1)
• The authors generalize the definition of effectiveness as the reaction time required to contain the worm to a given degree of global infection
• Worm aggressiveness – the rate at which an infected host probes others to propagate
  – Note: the probe rate does not take into account the possibility that some addresses may be targeted preferentially

Idealized Deployment Simulation for General Worm (2)

Idealized Deployment Simulation for General Worm conclusions
• Worms more aggressive than Code-Red, with a higher probe rate of 100 probes/second, require a reaction time of under three minutes using address blacklisting and under 18 minutes using content filtering to contain the worm to 10% of the total susceptible population.

Practical Deployment
• In analyzing practical deployment, the authors concentrate on content filtering because of its seemingly much lower requirements on reaction time compared to address blacklisting. Dropping address blacklisting may be premature because the technique is still useful. It would be very beneficial to see a hybrid containment strategy that uses both content filtering and address blacklisting.

Practical Deployment simulation parameters
• The topology of the Internet is taken at the time of the spread of Code-Red v2.
• The number of vulnerable hosts is 338,652 (hosts that map to multiple autonomous systems have been removed; only hosts infected in the first 24 hours are included)
• The number of autonomous systems is 6,378
• Packets are assumed to travel along the shortest path through autonomous systems

Practical Deployment for Code-Red
• Reaction time is two hours (less than 1% infected in the idealized simulation)

Practical Deployment for Code-Red conclusions
• ISP deployment is more effective by itself than customer deployment
• The top 40 ISPs can limit the infection to under 5%, whereas the top 75% of customer autonomous systems can only limit infection to 25%
• The results could have been anticipated based on the role of ISPs (their topology)

Practical Deployment for Generalized Worm
• We investigate reaction time requirements

Practical Deployment for Generalized Worm conclusions
• For probe rates of 100 probes/second or larger, neither deployment can effectively contain the worm
• In the best case, effective containment is possible for 30 or fewer probes/second with the top 100 ISPs and only 2 or fewer probes/second with 50% of customers
• Note: the top 100 ISPs cannot hold the infection below 18% of the hosts if the probe rate is 100 probes/second (not on the graph)

Conclusions of the modeling scheme (1)
• Automated means are needed to detect the worm and contain it.
• Content filtering is more effective than address blacklisting, but a combination of several strategies may need to be employed.
• The reaction time has to be very small, on the order of minutes, to combat aggressive worms.
• It is important to deploy the containment filtering strategy at most top ISPs.

Conclusions of the modeling scheme (2)
• The parameters of the model have changed
• Other containment strategies need to be considered
• What will happen to the population parameter?
• What may happen to beta soon?
• A combination of prevention, treatment and containment strategies is needed to combat aggressive worms

LaBrea (1)
• LaBrea is a Linux-based application that works at the network application layer, creating virtual machines for nonexistent IP addresses when a packet to such an address reaches the network.
• Once a connection is established, LaBrea tries to hold it as long as possible (by moving connections from the established state to the persistent state, it can hold them almost indefinitely).

LaBrea (2)
• Any connection to LaBrea is suspect because the IP address to which the packet is sent does not exist, not even in DNS.
• It can also analyze the range of IP addresses requested, giving it a broader view of a potential attack (all ports on the virtual machines appear open).
• It requires 8 bps to hold 3 threads of Code-Red. If there were 300,000 infected machines, each with 100 threads, 1,000 sites would each require 5.2% of a full T1 line’s bandwidth to hold them.

Connection-history based anomaly detection
• The idea is to take the GrIDS-based intrusion detection approach and modify it to allow for worm containment
• Goals:
  – Automatic worm propagation determination
  – Worm detection with a low false positive rate
  – Effective countermeasures
    • Automatic responses in real time
    • Prevent infected hosts from infecting other hosts
    • Prevent non-infected hosts from being infected

Connection-history based anomaly detection model
• A monitoring station collects data on all recent connection attempts and tries to find anomalies in it.
• Patterns of a worm
  – Similarity of connection patterns
    • The worm tries to exploit the same vulnerability
  – Causality of connection patterns
    • When one event follows after another
  – Obsolete connections
    • Compromised hosts try to access services at random IPs

Very Fast Containment of Scanning Worms
Weaver, Staniford, Paxson

Outline
• Scanning
• Suppression Algorithm
• Cooperation
• Attacks
• Conclusion

What is Scanning?
• Probes from adjacent remote addresses?
• Distributed probes that cover local addresses?
• Horizontal vs. vertical
• Factor in connection rates?
• Temporal and spatial interdependence
• How to infer intent?

Scanning Worms
• Blaster, Code Red, CR II, Nimda, Slammer
• Does not apply to:
  – Hit lists (flash worms)
  – Meta-servers (online list)
  – Topology detectors
  – Contagion worms

Scanning Detection
• Key properties of scans:
  – Most scanning fails
  – Infected machines attempt many connections
• Containment is based on worm behavior, not signatures (content)
• Containment by address blocking (blacklisting)
• Blocking can lead to DoS if the false positive rate is high

Scan Suppression
• Goal 1: protect the enterprise; forget the Internet
• Goal 2: keep the worm below the epidemic threshold, or slow it down so humans notice
• Divide the enterprise network into cells
• Each cell is guarded by a filter employing the scan detection algorithm

Inside, Outside, Upside Down
• Preventing scans from the Internet is too hard
• If an inside node is infected, the filter sees all traffic
• Cell (LAN) is “outside”, enterprise network is “inside”
• Can also treat the entire enterprise as a cell and the Internet as outside
[Figure: the Internet, the enterprise network and its cells, with scan detectors marking the inside/outside boundaries]

Scan Suppression
• Assumption: benign traffic has a higher probability of success than attack traffic
• Strategy:
  – Count connection establishment messages in each direction
  – Block when misses – hits > threshold
  – Allow messages for existing connections, to reduce the impact of false positives

Constraints
• For line-speed hardware operation, must be
efficient:
  – Memory access speed
    • On duplex gigabit Ethernet, can only access DRAM 4 times per packet
  – Memory size
    • Attempt to keep footprint under 16MB
  – Algorithm complexity
    • Want to implement entirely in hardware

Mechanisms
• Approximate caches
  – Fixed memory available
  – Allow collisions to cause aliasing
  – Err on the side of false negatives
• Cryptographic hashes
  – Prevent attackers from controlling collisions
  – Encrypt hash input to give tag
  – For an associative cache, split and save only part as the tag in the table

Connection Cache
• Remember if we’ve seen a packet in each direction
• Aliasing turns a failed attempt into a success (biases toward false negatives)
• Age is reset on each forwarded packet
• Every minute, a background process purges entries older than Dconn

Address Cache
• Track “outside” addresses
• Counter keeps the difference between successes and failures
• Counts are decremented every Dmiss seconds

Algorithm Pseudo-code

[Figure: worked example with outside host A, inside host B, the connection cache (entries such as A,X and A,B with OutIn / InOut flags) and the address cache (A’s count rising toward the threshold T)]
• UDP Probe:
  – A → X [fwd]
  – A → Y [fwd]
• Normal Traffic:
  – A → B [fwd]
  – B → A [fwd, bidir]
• Scanning again:
  – A → … [fwd until T]
  – A → Z [blocked]
  – A → B ?
    [block SYN/UDP, fwd TCP]

Performance
• For a 6000-host enterprise trace:
  – 1MB connection cache, 4MB 4-way address cache = 5MB total
  – At most 4 memory accesses per packet
  – Operated at gigabit line-speed
  – Detects scanning at rates over 1 per minute
  – Low false positive rate
  – About 20% false negative rate
  – Detects scanning after 10-30 attempts

Scan Suppression – Tuning
• Parameters:
  – T: miss-hit difference that causes block
  – Cmin: minimum allowed count
  – Cmax: maximum allowed count
  – Dmiss: decay rate for misses
  – Dconn: decay rate for idle connections
  – Cache size and associativity

Cooperation
• Divide the enterprise into small cells
• Connect all cells via a low-latency channel
• A cell’s detector notifies others when it blocks an address (“kill message”)
• Blocking threshold dynamically adapts to the number of blocks in the enterprise:
  – T’ = T – θX, for very small θ, where X is the number of blocks in the enterprise
  – Changing θ does not change the epidemic threshold, but reduces infection density

Cooperation – Effect of θ

Cooperation Issues
• A poor choice of θ could cause collapse
• Lower thresholds increase false positives
• Should a complete shutdown be possible?
• How to connect cells (practically)?
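
The counting rule and the cooperative threshold T’ = T – θX from the slides above can be sketched in a few lines. This is a simplified illustration, not the hardware algorithm from the paper: exact Python sets and dicts stand in for the hashed approximate caches, the Dmiss and Dconn decay timers are omitted, and the accounting (add 1 for an unanswered attempt, subtract 2 when the inside host answers so the miss becomes a hit) is an assumption. The class name `ScanDetector` and its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ScanDetector:
    threshold: float = 5.0   # T: miss-hit difference that causes a block
    theta: float = 0.0       # θ: sensitivity to blocks reported by other cells
    c_min: int = -20         # Cmin: minimum allowed count
    c_max: int = 20          # Cmax: maximum allowed count
    remote_blocks: int = 0   # X: number of blocks reported in the enterprise
    counts: dict = field(default_factory=dict)      # address cache: misses - hits
    half_open: set = field(default_factory=set)     # seen outside -> inside only
    established: set = field(default_factory=set)   # seen in both directions

    def effective_threshold(self):
        """Cooperative threshold T' = T - θX."""
        return self.threshold - self.theta * self.remote_blocks

    def is_blocked(self, outside):
        return self.counts.get(outside, 0) > self.effective_threshold()

    def _bump(self, outside, delta):
        value = self.counts.get(outside, 0) + delta
        self.counts[outside] = max(self.c_min, min(self.c_max, value))

    def from_outside(self, outside, inside):
        """Handle a packet outside -> inside; return True if it is forwarded."""
        key = (outside, inside)
        if key in self.established:
            return True               # existing connections are always allowed
        if self.is_blocked(outside):
            return False              # new connection attempts are dropped
        if key not in self.half_open:
            self.half_open.add(key)
            self._bump(outside, +1)   # counted as a miss until answered
        return True

    def from_inside(self, inside, outside):
        """Handle a packet inside -> outside; always forwarded in this sketch."""
        key = (outside, inside)
        if key in self.half_open:
            self.half_open.discard(key)
            self._bump(outside, -2)   # the earlier miss turns into a hit
        self.established.add(key)
        return True


if __name__ == "__main__":
    detector = ScanDetector(threshold=3)
    print([detector.from_outside("A", f"x{i}") for i in range(6)])
```

With a nonzero θ and a few reported blocks, the same scanner is cut off after fewer probes, which is the effect the Cooperation slide describes; it also shows why a poorly chosen θ can push T’ low enough to block benign hosts.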
Attacking Containment
• False positives
  – Unidirectional control flows
  – Spoofing outside addresses (though this does not prevent inside systems from initiating connections)
• False negatives
  – Use a non-scanning technique
  – Scan under the detection threshold
  – Use a whitelisted port to test for liveness before scanning

Attacking Containment
• Detecting containment
  – Try to contact already infected hosts
  – Go stealthy if containment is detected
• Circumventing containment
  – Embed the scan in a storm of spoofed packets
  – Two-sided evasion:
    • Inside and outside hosts initiate normal connections to counter the penalty of scanning
    • Can modify the algorithm to prevent this, but lose vertical scan detection

Attacking Cooperation
• Attempt to outrace containment if the threshold is permissive
• Flood cooperation channels
• Cooperative collapse:
  – False positives cause lowered thresholds
  – Lowered thresholds cause more false positives
  – Feedback causes collapse of the network

Conclusion

Additional References
• Weaver, Paxson, Staniford, Cunningham, A Taxonomy of Computer Worms, ACM Workshop on Rapid Malcode, 2003.
• Williamson, Throttling Viruses: Restricting Propagation to Defeat Mobile Malicious Code, ACSAC, 2002.
• Jung, Paxson, Berger, Balakrishnan, Fast Portscan Detection Using Sequential Hypothesis Testing, IEEE Symposium on Security and Privacy, 2004.