Propagation and Containment Presented by Jing Yang, Leonid Bolotnyy, and Anthony Wood

Analogy between Biological and
Computational Mechanisms
• The spread of self-replicating programs within
computer systems resembles the transmission of
smallpox several centuries ago [1]
• Mathematical models which are popular in the
research of biological epidemiology can also be
used in the research of computer viruses [1]
• Kephart & White’s work was the first to explicitly
develop and analyze quantitative models that
capture the spreading characteristics of computer
viruses
Kephart & White’s Work
• Based on the assumption that viruses are
spread by program sharing
• Benefits of using mathematical models
mentioned in their paper
– Evaluating and developing general policies and
heuristics for inhibiting the spread of viruses
– Applying the models to a particular epidemic, such as
predicting its course
Modeling Viral Epidemics on
Directed Graphs
• Directed Graph [1]
– Each individual system is represented as a node
in a graph
– Directed edges from a given node j to other
nodes represent the set of individuals that can
be infected by j
– A rate of infection is associated with each
edge
– A rate at which infection can be detected and
“cured” is associated with each node
SIS Model on A Random Graph
• Random Graph – a directed graph constructed
by making random, independent decisions about
whether to include each of the N(N-1) possible
directional edges [1]
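A minimal Python sketch of the random-graph construction just described. The function name `random_digraph` and the parameters `n`, `p`, and `seed` are illustrative choices, not taken from Kephart & White.

```python
import random

def random_digraph(n, p, seed=None):
    """Return adjacency lists for a random directed graph on n nodes.

    Each of the n*(n-1) possible directed edges (no self-loops) is
    included independently with probability p.
    """
    rng = random.Random(seed)
    return {
        j: [k for k in range(n) if k != j and rng.random() < p]
        for j in range(n)
    }

graph = random_digraph(n=100, p=0.05, seed=1)
mean_out_degree = sum(len(targets) for targets in graph.values()) / len(graph)
print(f"mean out-degree: {mean_out_degree:.2f} (expected about {0.05 * 99:.2f})")
```

The mean out-degree should land near p(N - 1), the number of neighbors each node can try to infect.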
• Techniques used by Kephart & White
– Deterministic approximation
– Approximate probabilistic analysis
– Simulation
Deterministic Approximation
• β – infection rate along each edge
• δ – cure rate for each node
• p – probability that each possible directed edge is
included in the random graph
• β’ = β p (N - 1) – average total rate at which a node
attempts to infect its neighbors
• ρ’ = δ / β’ – if ρ’ > 1, the fraction of infected individuals
decays exponentially from its initial value to 0, i.e. there
is no epidemic; if ρ’ < 1, the fraction of infected
individuals grows from its initial value at a rate that is
initially exponential and eventually saturates at the value
1 - ρ’
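The two regimes can be checked numerically. The sketch below assumes the deterministic approximation takes the usual SIS form di/dt = β’ i (1 - i) - δ i, where i is the infected fraction; the function name, step size, and sample rates are illustrative.

```python
def sis_fraction(beta_prime, delta, i0, t_end, dt=0.01):
    """Integrate di/dt = beta' * i * (1 - i) - delta * i with Euler steps."""
    i = i0
    for _ in range(int(t_end / dt)):
        i += dt * (beta_prime * i * (1.0 - i) - delta * i)
    return i

# rho' = 0.2 < 1: the infection grows and saturates near 1 - rho' = 0.8
epidemic = sis_fraction(beta_prime=1.0, delta=0.2, i0=0.001, t_end=100.0)
# rho' = 2 > 1: the infection decays toward 0
extinct = sis_fraction(beta_prime=1.0, delta=2.0, i0=0.1, t_end=100.0)
print(f"rho' = 0.2: {epidemic:.3f}   rho' = 2: {extinct:.3g}")
```

Setting di/dt = 0 with i > 0 gives i = 1 - δ/β’ = 1 - ρ’, which is the saturation value quoted on the slide.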
Probabilistic Analysis
• Adds new information such as:
– Size of fluctuations in the number of infected individuals
– Possibility that fluctuations will result in extinction of the
infection
• Conclusions:
– Large variance in the number of infected individuals from
one simulation to another
– In equilibrium, the size of the infected population is
completely insensitive to the moment at which the
exponential rise occurred
• Extinction probability and metastable distribution can be
calculated
Simulations
• Results
– Higher extinction probability
– Lower average number of infected individuals
• Suspected reason
– No details of which nodes are infected
– No variation in the number of nodes that a
given node could infect
Simulations (cont.)
• Scenario
– A random graph in which most nodes are
isolated and a few are joined in small clusters
• Does anything here contribute to containment?
– Build isolated cells: worms can spread
unimpeded within a cell, but the containment system
limits further infection between cells
Improvements of SIS Model on A
Random Graph
• Kephart & White’s work
– Weak links – give a node a small but finite chance of
infecting any node which is not explicitly connected to
it
– Hierarchical model – extend a two-type model of
strong and weak links to a hierarchy
• Wang’s work
– Effects of infection delay
– Effects of user vigilance
Do the SIS & SIR Models Take
Containment into Consideration?
• No for SIS and maybe yes for SIR
• In SIS, the cure is more appropriately called treatment
• In SIR, containment is deployed only at the
individual node, so every cell
contains just one node, which is again more
appropriately called treatment
• Neither is automatic
• No cooperation is applied
Model without Containment
• Remember the assumption in Kephart & White’s
work?
– Viruses spread by program sharing
• Modern worms spread so quickly that manual
containment is impossible
• A model without containment (the SI model) should be
built first, and different containment
methods then added to test the results
IPv6 vs. Worms
• It will be very challenging to build Internet
containment systems that prevent
widespread infection from worm epidemics
in the IPv4 environment [2]
• It seems that the only effective defense is
to increase the worm scanning space:
upgrading IPv4 to IPv6
How Large Is the IPv6 Address Space?
• IPv6 has 2^128 IP addresses [3]
• The smallest subnet has 2^64 addresses [3]
– Equivalent to about 4.3 billion (2^32) IPv4 Internets
• Consider a sub-network [3]
– 1,000,000 vulnerable hosts
– 100,000 scans per second (Slammer: 4,000)
– 1,000 initially infected hosts
– It would take 40 years to infect 50% of the vulnerable
population with random scanning
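The 40-year figure can be reproduced with a short calculation. This sketch assumes the simple logistic SI model di/dt = K i (1 - i) with K = (scan rate × vulnerable hosts) / (address space size), which may not be exactly how [3] derived it; the function and parameter names are illustrative.

```python
import math

def time_to_fraction(address_space, vulnerable, scan_rate, initial, target):
    """Seconds for a random-scanning worm to reach `target` infected fraction.

    Uses the closed-form solution of di/dt = K * i * (1 - i), where
    K = scan_rate * vulnerable / address_space.
    """
    k = scan_rate * vulnerable / address_space
    i0 = initial / vulnerable
    return math.log((target / (1 - target)) / (i0 / (1 - i0))) / k

seconds = time_to_fraction(
    address_space=2**64,    # one /64 subnet
    vulnerable=1_000_000,
    scan_rate=100_000,      # scans per second per infected host
    initial=1_000,
    target=0.5,
)
years = seconds / (365.25 * 24 * 3600)
print(f"about {years:.0f} years to infect 50% of vulnerable hosts")
```

Under these assumptions the result comes out at roughly 40 years, in line with the slide.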
Worm Classification
• Spreading Media [3]
– Scan-based & self-propagation
– Email
– Windows File Sharing
– Hybrid
• Target Acquisition [3]
– Random Scanning
– Subnet Scanning
– Routing Worm
– Pre-generated Hit List
– Topological
– Stealth / Passive
Can IPv6 Really Defeat Worms?
• Traditional scan-based worms seem to be
ineffective, but there may be ways to
improve the scan methods [4]
• More sophisticated hybrid worms may appear,
which use a variety of ways to collect addresses
for quick propagation
• Polymorphic worms may significantly increase
the time needed to extract the signature of a worm
Improvement in Scan Methods
• Subnet Scanning
– The first target may be a /64 enterprise network instead
of the whole Internet
• Routing Worm
– Some IP addresses are not allocated
• Pre-generated Hit List Scanning
– Speeds up propagation, and the whole address
space can be divided equally among the zombies
Improvement in Scan Methods
(cont.)
• Permutation Scanning
– Avoids wasting scans on the same host many times
• Topological Scanning
– Uses the information stored on compromised hosts to
find new targets
• Stealth / Passive Worm
– Waiting for vulnerable hosts to make contact may be
more efficient than scanning such a large address space
What Can IPv6 Itself Contribute?
• Public services need to be reachable by DNS
– At least some addresses are known in advance [4]
• Every host gets a DNS name because IPv6 addresses are long
– A compromised DNS server can yield large caches of hosts [4]
• Standard method of deriving the EUI field
– The lower 64 bits of an IPv6 address are derived from the 48-bit MAC address [5]
• IPv6 neighbor-discovery cache data
– One compromised host can reveal the addresses of other hosts [4]
• Easy-to-remember host addresses during the transition from IPv4 to IPv6
– Scanning such an IPv6 address space is no different from scanning the IPv4
address space [4]
More Sophisticated Hybrid Worms
• Humans are involved
• Different methods are used to compromise
a host, so the vulnerability density
increases relatively
• Nimda’s success…
Polymorphic Worms
• A very effective containment system extracts
a worm’s features to do content filtering
• Even though some methods exist to detect
polymorphic worms, the success rate
may not be 100%
What Should We Do Then?
• Find out whether the following methods,
which merely speed up a worm’s
propagation in IPv4, can make quick
propagation possible in IPv6
– Improvement in scan methods + IPv6 inherent
features
– More sophisticated hybrid worms
– Polymorphic worms
Future Work
• Use traditional models to see whether each
method, or a combination of them, can make
quick worm propagation in IPv6 possible
• Add new features of worm spread in IPv6 to
build new models that represent
reality more precisely
• If quick propagation proves possible in IPv6, corresponding
containment methods should be worked out; containment
should be much more feasible than in IPv4
References
[1] Jeffrey O. Kephart, Steve R. White. Directed-Graph
Epidemiological Models of Computer Viruses
[2] David Moore et al. Internet Quarantine: Requirements
for Containing Self-Propagating Code
[3] Mark Shaneck. Worms: Taxonomy and Detection.
[4] Sean Convery et al. IPv6 and IPv4 Threat Comparison
and Best Practice Evaluation.
[5] Michael H. Warfield et al. Security Implications of IPv6.
Propagation and Containment
of Worms
General modeling of worm
propagation and containment
strategies
Ways to mitigate the threat of
worms
• Prevention
– Prevent the worm from spreading by reducing the
number of vulnerable hosts
• Treatment
– Neutralize the worm by removing the vulnerability it is
trying to exploit
• Containment
– Prevent the worm from spreading from infected
systems to uninfected but vulnerable hosts
Containment Approaches
• LaBrea
– Intercepts probes to unallocated addresses
• Connection-history based anomaly detection
– Analyzes connection traffic trying to detect an anomaly
• Per host throttling
– Restricts connection rate to “new” hosts
• Blocking access to affected ports
– Prevents affected hosts from accessing vulnerable ports on other
machines
• NBAR
– Filters packets based on the content using worm signatures
– It was very effective in preventing the spread of Code-Red
General model for worm infection
rate
Modeling Containment Systems
• Reaction Time
– Time required to detect the infection, spread the
information to all the hosts participating in the system,
and to activate containment mechanisms
• Containment Strategy
– Strategy that isolates the worm from uninfected
susceptible systems (e.g. address blacklisting and
content filtering)
• Deployment Scenario
– “Who”, “Where” and “How” of the containment
strategy implementation
Simulation parameters
• Population = 2^32 (assuming IPv4)
• Number of vulnerable hosts = 360,000 (same as for
Code-Red v2)
• Any probe to a susceptible host results in an infection
• A probe to an infected or non-vulnerable host has no
effect
• The first host is infected at time 0
• If a host is infected at time t, then all susceptible hosts
are notified at time t + R, where R is the reaction time of
the system
• Simulation is run 100 times
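The effect of reaction time under address-blacklisting can be explored with a small model. This is a deterministic mean-field sketch, not the authors' stochastic simulator; the probe rate of 10 probes/second is an assumed value that does not appear on the slides, and the function and parameter names are illustrative.

```python
from collections import deque

def infected_with_blacklisting(reaction_s, probe_rate=10.0,
                               vulnerable=360_000, population=2**32,
                               horizon_s=24 * 3600):
    """Mean-field count of infected hosts after horizon_s seconds.

    Each infected host probes random addresses at probe_rate per second
    and is blacklisted everywhere reaction_s seconds after its infection.
    probe_rate is an assumption, not a value from the slides.
    """
    cohorts = deque([(0, 1.0)])   # (infection time, hosts) not yet blacklisted
    active = 1.0                  # hosts still able to spread
    infected = 1.0                # the first host is infected at time 0
    for t in range(horizon_s):
        # Retire cohorts whose reaction time has elapsed.
        while cohorts and cohorts[0][0] + reaction_s <= t:
            active -= cohorts.popleft()[1]
        active = max(active, 0.0)
        susceptible = vulnerable - infected
        # Every probe that lands on a susceptible host infects it.
        new = min(susceptible, active * probe_rate * susceptible / population)
        if new > 0.0:
            infected += new
            active += new
            cohorts.append((t + 1, new))
    return infected

for minutes in (10, 60):
    total = infected_with_blacklisting(reaction_s=minutes * 60)
    print(f"R = {minutes} min: about {total:,.0f} hosts infected in 24 h")
```

In this model each infected host produces on average probe_rate × R × vulnerable / population new infections before it is blocked. With the assumed values that number crosses 1 at R of about 1,190 seconds, or roughly 20 minutes, so shorter reaction times keep the outbreak small and longer ones let it grow.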
Simulation Goals
• Determine the reaction time needed to
limit worm propagation for address-blacklisting
and content filtering
• Compare the two containment strategies
• Understand the relationship between reaction
time and worm probe rate
Idealized Deployment Simulation
for Code-Red
Idealized Deployment Simulation
for Code-Red Conclusions
• A strategy is considered effective if fewer than 1% of
susceptible hosts are infected within the 24-hour
period with 95% certainty
• Address-blacklisting is effective if reaction time
is less than 20 minutes
– Note: if reaction time > 20 minutes, all susceptible
hosts will eventually become infected
• Content filtering is effective if reaction time is
less than two hours
– How many susceptible hosts will become infected
after time R (reaction time)?
Idealized Deployment Simulation
for General Worm (1)
• The authors generalize the definition of the
effectiveness as the reaction time required
to contain the worm to a given degree of
global infection
• Worm aggressiveness – rate at which
infected host probes others to propagate
– Note: The rate of host probes does not take
into account the possibility of preferential
status that some addresses may have
Idealized Deployment Simulation
for General Worm (2)
Idealized Deployment Simulation
for General Worm conclusions
• Worms that are more aggressive than
Code-Red, with a higher probe rate of
100 probes/second, require a reaction time
of under three minutes using address-blacklisting
and under 18 minutes using
content filtering to contain the worm to
10% of the total susceptible population.
Practical Deployment
• In analyzing practical deployment, the authors
concentrate on content filtering because its
reaction-time requirements appear much lower
than those of address-blacklisting. Dropping
address-blacklisting may be premature,
because the technique is still useful. It
would be very beneficial to see a hybrid
containment strategy that uses both
content filtering and address-blacklisting.
Practical Deployment simulation
parameters
• The topology of the Internet is taken at the time
of the spread of Code-Red v2.
• The number of vulnerable hosts is 338,652
(hosts that map to multiple autonomous
systems have been removed; only hosts infected
in the first 24 hours are included)
• The number of autonomous systems is 6,378
• The packet is assumed to travel along the
shortest path through autonomous systems
Practical Deployment for Code-Red
• Reaction time is two hours (less than 1%
infected in idealized simulation)
Practical Deployment for Code-Red
conclusions
• ISP deployment by itself is more effective
than customer deployment
• The top 40 ISPs can limit the infection to under
5%, whereas the top 75% of customer
autonomous systems can only limit
infection to 25%
• The results could have been anticipated
based on the role of ISPs (their topology)
Practical Deployment for
Generalized Worm
• We investigate reaction time requirements
Practical Deployment for
Generalized Worm conclusions
• For probe rates of 100 probes/second or larger,
neither deployment can effectively contain the
worm
• In the best case, effective containment is
possible for 30 or fewer probes/second by the top 100
ISPs and only 2 or fewer probes/second by the 50%
customers
• Note: the top 100 ISPs cannot hold a worm
to fewer than 18% of hosts infected if the
probe rate is 100 probes/second (not on the
graph)
Conclusions of the modeling
scheme (1)
• Automated means are needed to detect the
worm and contain it.
• Content filtering is more effective than address-blacklisting,
but a combination of several
strategies may need to be employed.
• The reaction time has to be very small, on the
order of minutes to be able to combat
aggressive worms.
• It is important to deploy the containment filtering
strategy at most top ISPs.
Conclusions of the modeling
scheme (2)
• The parameters of the model have changed
• Other containment strategies need to be
considered
• What will happen to the population parameter?
• What may happen to beta soon?
• A combination of prevention, treatment and
containment strategies is needed to combat
aggressive worms
LaBrea (1)
• LaBrea is a Linux-based application which works
at the network application layer creating virtual
machines for nonexistent IP addresses when a
packet to such an address reaches the network.
• Once the connection is established, LaBrea will
try to hold the connection as long as possible (by
moving connections from established state to
persistent state, it can hold connections almost
indefinitely).
LaBrea (2)
• Any connection to LaBrea is suspect because
the IP address to which the packet is sent does
not exist, not even in DNS.
• It can also analyze the range of IP addresses
that are requested giving it a broader view of a
potential attack (all ports on virtual machines
appear open).
• It requires 8 bps to hold 3 threads of Code-Red.
If there were 300,000 infected machines, each
with 100 threads, 1,000 sites would each require 5.2%
of a full T1 line's bandwidth to hold them.
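The 5.2% figure follows from simple arithmetic, sketched below. The only value not on the slide is the nominal T1 line rate of 1.544 Mbps; the variable names are illustrative.

```python
BPS_PER_3_THREADS = 8       # from the slide: 8 bps holds 3 Code-Red threads
T1_BPS = 1_544_000          # nominal T1 line rate, not stated on the slide

infected_machines = 300_000
threads_per_machine = 100
sites = 1_000

total_bps = infected_machines * threads_per_machine / 3 * BPS_PER_3_THREADS
per_site_fraction = total_bps / sites / T1_BPS
print(f"{per_site_fraction:.1%} of a T1 line per site")
```

Holding all 30 million threads costs 80 Mbps in total, or 80 kbps per site when spread over 1,000 sites.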
Connection-history based anomaly
detection
• The idea is to take a GrIDS-based intrusion
detection approach and modify it
to allow for worm containment
• Goals:
– Automatic worm propagation determination
– Worm detection with a low false positive rate
– Effective countermeasures
• Automatic responses in real-time
• Prevent infected hosts from infecting other hosts
• Prevent non-infected hosts from being infected
Connection-history based anomaly
detection model
• Monitoring station collects all recent connection
attempts data and tries to find anomaly in it.
• Patterns of a worm
– Similarity of connection patterns
• The worm tries to exploit the same vulnerability
– Causality of connection patterns
• When one event follows after another
– Obsolete connections
• Compromised hosts try to access services at random IPs
Very Fast Containment of
Scanning Worms
Weaver, Staniford, Paxson
Outline
• Scanning
• Suppression Algorithm
• Cooperation
• Attacks
• Conclusion
What is Scanning?
• Probes from adjacent remote addresses?
• Distributed probes that cover local addresses?
• Horizontal vs. Vertical
• Factor in connection rates?
• Temporal and spatial interdependence
• How to infer intent?
Scanning Worms
• Blaster, Code Red, CR II, Nimda, Slammer
• Does not apply to:
– Hit lists (flash worms)
– Meta-servers (online list)
– Topology detectors
– Contagion worms
Scanning Detection
• Key properties of scans:
– Most scanning fails
– Infected machines attempt many connections
• Containment is based on worm behavior,
not signatures (content)
• Containment by address blocking
(blacklisting)
• Blocking can lead to DoS if false positive
rate is high
Scan Suppression
• Goal 1: protect the enterprise; forget the
Internet
• Goal 2: keep worm below epidemic
threshold, or slow it down so humans
notice
• Divide enterprise network into cells
• Each is guarded by a filter employing the
scan detection algorithm
Inside, Outside, Upside Down
• Preventing scans from the
Internet is too hard
• If an inside node is infected,
the filter sees all traffic
• Cell (LAN) is “outside”,
enterprise network is “inside”
• Can also treat the entire
enterprise as a cell, with the Internet as
outside
[Figure labels: Internet, Outside, Inside, Outside, Scan detectors]
Scan Suppression
• Assumption: benign traffic has a higher
probability of success than attack traffic
• Strategy:
– Count connection establishment messages in
each direction
– Block when misses – hits > threshold
– Allow messages for existing connections, to
reduce impact of false positives
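The strategy above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' hardware algorithm: it uses unbounded dictionaries instead of fixed-size caches, omits decay, and counts an attempt as a miss (+1) until a reply converts it to a hit (-2 net). The class and method names are hypothetical.

```python
class ScanSuppressor:
    """Block an outside address once its misses minus hits exceed a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.count = {}   # outside address -> misses minus hits
        self.seen = {}    # (outside, inside) -> directions seen so far

    def from_outside(self, outside, inside):
        """Return True if the packet is forwarded, False if blocked."""
        dirs = self.seen.get((outside, inside), set())
        if "in_to_out" in dirs:
            return True   # reply or existing connection: always allowed
        if self.count.get(outside, 0) > self.threshold:
            return False  # too many unanswered attempts: block
        if "out_to_in" not in dirs:
            # New attempt: presume a miss until the inside host answers.
            self.seen[(outside, inside)] = {"out_to_in"}
            self.count[outside] = self.count.get(outside, 0) + 1
        return True

    def from_inside(self, inside, outside):
        dirs = self.seen.setdefault((outside, inside), set())
        if "out_to_in" in dirs and "in_to_out" not in dirs:
            # The earlier attempt succeeded: convert the miss into a hit.
            self.count[outside] = self.count.get(outside, 0) - 2
        dirs.add("in_to_out")
        return True

filt = ScanSuppressor(threshold=3)
results = [filt.from_outside("A", f"host{n}") for n in range(6)]
print(results)
```

With a threshold of 3, the first four unanswered probes from A are forwarded and the rest are blocked. A pair that has already exchanged traffic in both directions stays reachable even after A is blocked, which is what limits the damage from a false positive.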
Constraints
• For line-speed hardware operation, must
be efficient:
– Memory access speed
• On duplex gigabit ethernet, can only access DRAM
4 times
– Memory size
• Attempt to keep footprint under 16MB
– Algorithm complexity
• Want to implement entirely in hardware
Mechanisms
• Approximate caches
– Fixed memory available
– Allow collisions to cause aliasing
– Err on the side of false negative
• Cryptographic hashes
– Prevent attackers from controlling collisions
– Encrypt hash input to give tag
– For associative cache, split and save only part
as tag in table
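A rough Python sketch of an approximate cache with attacker-resistant indexing. It swaps in keyed BLAKE2b from the standard library where the slide describes encrypting the hash input, and it shows only a tagless table (the tag is computed but not stored). The class name, key, and sizes are illustrative assumptions.

```python
import hashlib
import os

class AliasingTable:
    """Fixed-size, tagless presence table indexed by a keyed hash."""

    def __init__(self, index_bits, key=None):
        self.index_bits = index_bits
        # A secret key keeps attackers from choosing colliding inputs.
        self.key = key if key is not None else os.urandom(16)
        self.slots = [False] * (1 << index_bits)

    def index_and_tag(self, item):
        digest = hashlib.blake2b(item, digest_size=8, key=self.key).digest()
        value = int.from_bytes(digest, "big")
        index = value & ((1 << self.index_bits) - 1)  # low bits pick the slot
        tag = value >> self.index_bits  # an associative cache would store this
        return index, tag

    def mark(self, item):
        self.slots[self.index_and_tag(item)[0]] = True

    def seen(self, item):
        # Colliding items alias: one marked item makes the other look seen.
        return self.slots[self.index_and_tag(item)[0]]

table = AliasingTable(index_bits=16, key=b"demo-key-16bytes")
table.mark(b"10.0.0.1->192.0.2.7")
print(table.seen(b"10.0.0.1->192.0.2.7"))
```

Because a collision can only make an unseen item look seen, errors in this table lean toward false negatives for scan detection, matching the design goal on the slide.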
Connection Cache
• Remember if we’ve seen a packet in each direction
• Aliasing turns failed attempt into success (biases to false
negative)
• Age is reset on each forwarded packet
• Every minute, a background process purges entries older than Dconn
Address Cache
• Track “outside”
addresses
• Counter keeps
difference between
successes and
failures
• Counts are
decremented every
Dmiss seconds
Algorithm Pseudo-code
[Figure: outside host A on the Internet sending traffic through the filter
to inside hosts. Connection cache entries: (A,X), (A,*), (A,Y), (A,Z) marked
OutIn; (A,B) marked OutIn and InOut. Address cache: counter for A stepping
1, 2, 3, … up to T (max).]
• UDP Probe:
– A → X [fwd]
– A → Y [fwd]
• Normal Traffic:
– A → B [fwd]
– B → A [fwd, bidir]
• Scanning again:
– A → … [fwd until T]
– A → Z [blocked]
• A → B ? [block SYN/UDP, fwd TCP]
Performance
• For 6000-host enterprise trace:
– 1MB connection cache, 4MB 4-way address
cache = 5MB total
– At most 4 memory accesses per packet
– Operated at gigabit line-speed
– Detects scanning at rates over 1 per minute
– Low false positive rate
– About 20% false negative rate
– Detects scanning after 10-30 attempts
Scan Suppression – Tuning
• Parameters:
– T: miss-hit difference that causes block
– Cmin: minimum allowed count
– Cmax: maximum allowed count
– Dmiss: decay rate for misses
– Dconn: decay rate for idle connections
– Cache size and associativity
Cooperation
• Divide enterprise into small cells
• Connect all cells via low-latency channel
• A cell’s detector notifies others when it
blocks an address (“kill message”)
• Blocking threshold dynamically adapts to the
number of blocks in the enterprise:
– T’ = T – θX for very small θ, where X is the
number of blocks in the enterprise
– Changing θ does not change the epidemic
threshold, but reduces infection density
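A small Python sketch of the threshold adjustment, following the formula exactly as written on the slide. The function name and the `floor` parameter, which stops the threshold from dropping below a minimum, are assumptions added for illustration.

```python
def cooperative_threshold(base_threshold, theta, blocks_in_enterprise, floor=1):
    """Return T' = T - theta * X, never lower than `floor`.

    `floor` is a hypothetical safeguard, not part of the slide's formula.
    """
    return max(floor, base_threshold - theta * blocks_in_enterprise)

print(cooperative_threshold(10, 0.1, 20))    # a few blocks: threshold eases down
print(cooperative_threshold(10, 0.1, 200))   # many blocks: threshold hits the floor
```

The second call shows why the choice of θ matters: once many cells report blocks, every detector becomes very strict, which raises the false-positive rate.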
Cooperation – Effect of θ
Cooperation Issues
• Poor choice of θ could cause collapse
• Lower thresholds increase false positives
• Should a complete shutdown be possible?
• How to connect cells (practically)?
Attacking Containment
• False positives
– Unidirectional control flows
– Spoofing outside addresses (though this does not
prevent inside systems from initiating connections)
• False negatives
– Use a non-scanning technique
– Scan under detection threshold
– Use a whitelisted port to test for liveness
before scanning
Attacking Containment
• Detecting containment
– Try to contact already infected hosts
– Go stealthy if containment is detected
• Circumventing containment
– Embed scan in storm of spoofed packets
– Two-sided evasion:
• Inside and outside host initiate normal connections
to counter penalty of scanning
• Can modify algorithm to prevent, but lose vertical
scan detection
Attacking Cooperation
• Attempt to outrace containment if
threshold is permissive
• Flood cooperation channels
• Cooperative collapse:
– False positives cause lowered thresholds
– Lowered thresholds cause more false
positives
– Feedback causes collapse of network
Conclusion
Additional References
• Weaver, Paxson, Staniford, Cunningham, A
Taxonomy of Computer Worms, ACM Workshop
on Rapid Malcode, 2003.
• Williamson, Throttling Viruses: Restricting
Propagation to Defeat Mobile Malicious Code,
ACSAC, 2002.
• Jung, Paxson, Berger, Balakrishnan, Fast
Portscan Detection Using Sequential Hypothesis
Testing, IEEE Symposium on Security and
Privacy, 2004.