Correcting Congestion-Based Error in Network Telescope’s Observations of Worm Dynamics
Songjie Wei, Computer & Information Sciences Dept., University of Delaware
Jelena Mirkovic, Information Sciences Institute, University of Southern California
Proceedings of the ACM Internet Measurement Conference (IMC) 2008
Introduction

Worms compromise hosts and interfere with network operation
  Worms enlist bots, crash routers, congest links
Network telescopes monitor Internet-wide security incidents
  An allocated portion of IPv4 space, otherwise unused
  Collecting data about worms, spoofed DDoS, etc.
Network telescopes collect information about worm dynamics
  CAIDA telescope, WAIL telescope
  Code Red (2001), Slammer (2003), Witty worm (2004)
Motivation
How to Correct
Validation
Conclusions
Sample Telescope Observation of a Worm
Slammer’s early progress [Moore03] according to the WAIL telescope (/8) at the University of Wisconsin
Figure annotations: 80M x 404 B x 8 / 256 ≈ 1 Gbps, where 404 B is the worm scan size and 256 reflects the telescope size; "Too slow"
[Moore03] David Moore, Vern Paxson, Stefan Savage, Colleen Shannon, Stuart Staniford, and Nicholas Weaver. "Inside the Slammer Worm." IEEE Security and Privacy, Jul/Aug 2003.
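
The arithmetic annotated on this slide can be checked with a few lines of Python. This is only a sketch: it assumes that 80M is a global rate in scans per second (the slide does not state the unit), and the function name is illustrative.

```python
def telescope_load_bps(global_scans_per_sec, scan_bytes, telescope_fraction):
    """Bits per second that would reach a telescope covering
    `telescope_fraction` of the IPv4 space, if no scans were lost."""
    return global_scans_per_sec * scan_bytes * 8 * telescope_fraction


# Assumed inputs: 80M scans/s globally, 404-byte scans, a /8 telescope (1/256 of IPv4).
load = telescope_load_bps(80e6, 404, 1 / 256)
print(f"{load / 1e9:.2f} Gbps")  # prints "1.01 Gbps"
```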
Why Do We Need Accurate Observations?

Curiosity
  We should know what is REALLY going on
Sometimes the only way to test worm defenses is with a simulator or a model
  Internet-scale defenses
  Also new worm propagation approaches
Simulators and models must be validated against some ground truth, usually a telescope’s observations
  Inaccurate observations lead to inaccurate simulators and models
Network Telescope’s Limitations

Aggressive worms saturate Internet links and cause packet drops
  Slammer (404-byte worm packet) infected 75,000+ hosts, peak rate of 26,000 scans per worm per second
  Witty infected 12,000 computers, generating 90 Gbps
Telescopes may only observe some worm infectees and receive only a portion of the worm scans sent to them
  Slow or short-life infectees are less likely to be observed
  Worm scans get dropped on crashed routers or congested links (intermediate network link, telescope’s access link)
Telescope’s Inference of Worm Dynamics

Scaling up a telescope’s local observation by the telescope’s size to infer the global worm dynamics
  For a /8 telescope, local_scans x 256 = global_scans, local_infected = global_infected?
The traditional inference may be wrong, due to congestion and packet loss
  Underestimated number of infected hosts
  Underestimated worm scanning rates (both the global rate and individual infectee rates)
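
The traditional scale-up, and how packet loss biases it, can be sketched as follows. The function names and the sample numbers are hypothetical and only illustrate the reasoning on this slide.

```python
def scale_factor(prefix_len):
    """A /prefix_len telescope covers 1 / 2**prefix_len of the IPv4 space."""
    return 2 ** prefix_len


def naive_global_scans(local_scans, prefix_len=8):
    """Traditional inference: multiply the local observation by the scale factor."""
    return local_scans * scale_factor(prefix_len)


def naive_with_loss(sent_to_telescope, arrival_ratio, prefix_len=8):
    """What the traditional inference reports when only `arrival_ratio`
    of the scans sent to the telescope actually arrive."""
    observed = sent_to_telescope * arrival_ratio
    return naive_global_scans(observed, prefix_len)


# Hypothetical case: 1,000 scans/s are sent to a /8 telescope, so the true
# global rate is 256,000 scans/s, but half of the scans are dropped on the way.
print(naive_global_scans(1000))    # 256000 (no loss)
print(naive_with_loss(1000, 0.5))  # 128000.0 (underestimate)
```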
Our Research Objectives

Measure the congestion severity based on the telescope’s observation
  Estimate scan loss and arrival ratio for each infectee
Infer the worm’s global dynamics
  Correct the telescope’s local observation, taking network congestion and packet loss into account
  Scale up the local worm observation correctly to reflect the global dynamics of worm propagation in the Internet
Related Work

Worm simulation scale-down [Weaver04]
  Not clear how you would use it to scale up
Worm forensics [Kumar05, Hama06]
  More accurate than our method but need worm source and human involvement
  Complementary
Correcting telescope bias [Zou03]
  Bias occurs due to the delay between a host’s infection and scans seen by the telescope
  Minor for /8 telescopes
Assumptions about Worms and Internet

Constant infectee scanning rate
  Scanning rate is bounded by infectee’s capability (CPU, memory, access bandwidth, OS, etc.)
Packet loss mainly due to congestion
  Many incidents and countermeasures cause packet loss
    Congestion, routing failure, security countermeasures, etc.
  Hosts sharing routing paths have the same loss ratio
No significant loss in the early stage of worm spread
  Congestion gradually builds up as more hosts are infected
  Early stage is the time when no or only a few packets are dropped
Scan Arrival Ratio

Inferring the constant scanning rate

$$ S = \frac{1}{T} \sum_{t=1}^{T} R(t) $$

where S is the infectee’s constant sending rate to the telescope, R(t) is the number of scans received by the telescope during time t, and T is the duration of the early stage.

Scan arrival ratio: the percentage of worm scans sent by an infectee to the telescope that are successfully received

$$ P(t) = \frac{\text{scans received by the telescope during time } t}{\text{the infectee's constant sending rate to the telescope}} = \frac{R(t)}{S} $$
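
A minimal sketch of the two definitions on this slide, assuming the telescope’s per-interval scan counts for one infectee are held in a Python list whose first element corresponds to t = 1. The function names and the sample counts are made up for illustration.

```python
def constant_sending_rate(received, T):
    """S: average of R(t) over the early stage t = 1..T."""
    return sum(received[:T]) / T


def arrival_ratios(received, T):
    """P(t) = R(t) / S for every interval in `received`."""
    S = constant_sending_rate(received, T)
    return [r / S for r in received]


# Hypothetical counts: four loss-free intervals, then congestion sets in.
received = [100, 98, 102, 100, 60, 40]
S = constant_sending_rate(received, T=4)
ratios = arrival_ratios(received, T=4)
print(S)       # 100.0
print(ratios)  # ratios drop to 0.6 and 0.4 in the last two intervals
```

Because S is an average, individual early-stage ratios can come out slightly above 1; a real analysis might clip them to 1.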
Inferring the End of the Early Stage

Epidemic model of worm propagation [Cliff C. Zou, Lixin Gao, Weibo Gong, and Don Towsley, "Monitoring and Early Warning for Internet Worms," ACM CCS, 2003]
T is when the number of infected hosts departs from the exponential model
  Due to congestion or end of slow start phase
End of Early Stage: Time T

We measure the match between the number of infected hosts and the exponential model
  R-squared: a statistical measure, between 0 and 1, that shows the fit between a model’s prediction and real values
  0 means totally different, and 1 means perfect match
  We define T as the time when the R-squared value starts to decrease continuously from 1
  Hosts infected before T are called early infectees
Underestimating T is OK, as long as we identify enough early infectees
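
One possible way to locate T is sketched below. It is not the authors’ implementation: it assumes the exponential fit is done as a least-squares line through log(count), and that "decreases continuously" means R-squared drops over `run` consecutive intervals by more than a small tolerance `eps`. Both parameters and all function names are hypothetical.

```python
import math


def r_squared_exponential(counts):
    """R-squared of a least-squares line fitted to log(counts) against the
    time index. `counts` must be positive and contain at least two values."""
    ys = [math.log(c) for c in counts]
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if syy == 0:
        return 1.0
    # For a straight-line fit, R-squared equals the squared correlation.
    return (sxy * sxy) / (sxx * syy)


def end_of_early_stage(counts, min_points=3, run=3, eps=1e-9):
    """Return the number of leading samples after which R-squared falls
    for `run` consecutive prefix extensions (or len(counts) if it never does)."""
    r2 = {k: r_squared_exponential(counts[:k])
          for k in range(min_points, len(counts) + 1)}
    for k in range(min_points, len(counts) - run + 1):
        if all(r2[j + 1] < r2[j] - eps for j in range(k, k + run)):
            return k
    return len(counts)


# Synthetic series: ten intervals of doubling, then slow linear growth.
counts = [2 ** t for t in range(10)] + [512 + 50 * i for i in range(1, 8)]
print(end_of_early_stage(counts))
```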
Scan Arrival Ratio for Late Infectees

Infectees observed after time T are identified as late infectees
  We cannot infer their constant scanning rates directly because their scans are always observed with loss
We differentiate between two cases
  If we know the ratio of another infectee sharing the same inter-AS routing path, we use this known ratio
    Infectees sharing a routing path suffer the same congestion
  Otherwise we use the average of the ratios of all known early infectees
    Better than a random guess
    More accurate when congestion is closer to the telescope
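
The two cases above can be sketched as a simple lookup with a fallback. Representing an inter-AS routing path as a tuple of AS numbers is an assumption made for illustration, as are the function name and the sample values.

```python
from statistics import mean


def late_infectee_ratio(path, known_ratio_by_path, early_ratios):
    """Pick a scan arrival ratio for a late infectee.

    path: hypothetical key for the infectee's inter-AS routing path.
    known_ratio_by_path: ratios already inferred for infectees on known paths.
    early_ratios: ratios of all known early infectees (fallback case).
    """
    if path in known_ratio_by_path:
        # Case 1: another infectee shares this path, so reuse its ratio.
        return known_ratio_by_path[path]
    # Case 2: no path match, so use the average over early infectees.
    return mean(early_ratios)


known = {(64500, 64501): 0.4}  # made-up AS path and ratio
early = [0.5, 0.7]             # made-up early-infectee ratios
print(late_infectee_ratio((64500, 64501), known, early))  # reuses 0.4
print(late_infectee_ratio((64502,), known, early))        # average of early ratios
```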
Inferring Worm Dynamics

Global number of infected hosts
  Number of infected hosts in the Internet
  Also hosts that are infected but not yet seen by the telescope due to congestion
Worm’s Internet scanning rate
  Number of worm scans sent into the Internet per second
Infectees’ scanning rate distribution
  Reflects the infectees’ aggressiveness
  Limited by infectee’s features and configurations
Global Number of Infected Hosts
Ir(t): number of infectees by time t, with ∆Ir(t) = Ir(t) – Ir(t-1)
Io(t): number of infectees observed by time t, with ∆Io(t) = Io(t) – Io(t-1)
Uo(t): number of infectees not observed by time t
Pagg(t): aggregated scan arrival ratio of infectees
Smed(t-1): median scans sent to telescope per second per infectee

$$ \Delta I_r(t) = \frac{\Delta I_o(t)}{1 - \left(1 - P_{agg}(t-1)\right)^{S_{med}(t-1)}} - U_o(t-1) $$

The denominator is the probability that an infectee is seen by the telescope during time t.

$$ U_o(t) = \left(\Delta I_r(t) + U_o(t-1)\right) \times \left(1 - P_{agg}(t-1)\right)^{S_{med}(t-1)} $$

The last factor is the probability that an infectee is NOT seen by the telescope during time t.
Inference Example
t   ∆Io  Pagg  Smed  Io   ∆Ir      Uo       Ir
0   0    1     0     0    0        0        0
1   1    0.8   5     1    1        0        1
2   3    0.5   5     4    3.00096  0.00096  4.00096
3   2    0.5   5     6    2.06     0.065    6.06
4   3    0.2   5     9    3.03     0.097    9.10
5   6    0.2   5     15   8.83     2.92     17.92
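
The example can be reproduced with a short script that applies ∆Ir(t) = ∆Io(t) / (1 – (1 – Pagg(t-1))^Smed(t-1)) – Uo(t-1) and Uo(t) = (∆Ir(t) + Uo(t-1)) x (1 – Pagg(t-1))^Smed(t-1) row by row. This is a sketch rather than the authors’ code. It assumes that a perfect arrival ratio (Pagg = 1) means no infectee is missed, which is needed for the step from t = 0 to t = 1, and it does not guard against Pagg = 0, where the denominator would be zero.

```python
def infer_infected(delta_io, p_agg, s_med):
    """Return lists (delta_ir, uo, ir), indexed by t.

    delta_io[t]: newly observed infectees during interval t.
    p_agg[t], s_med[t]: aggregated arrival ratio and median scans per
    infectee; the values at t-1 are used when processing interval t.
    """
    n = len(delta_io)
    delta_ir, uo, ir = [0.0] * n, [0.0] * n, [0.0] * n
    for t in range(1, n):
        # Probability that an infectee is NOT seen during interval t.
        # Assumption: with p_agg == 1 nothing is missed, even when s_med
        # is 0 (Python would otherwise evaluate 0.0 ** 0 as 1.0).
        if p_agg[t - 1] >= 1.0:
            miss = 0.0
        else:
            miss = (1.0 - p_agg[t - 1]) ** s_med[t - 1]
        delta_ir[t] = delta_io[t] / (1.0 - miss) - uo[t - 1]
        uo[t] = (delta_ir[t] + uo[t - 1]) * miss
        ir[t] = ir[t - 1] + delta_ir[t]
    return delta_ir, uo, ir


delta_ir, uo, ir = infer_infected(
    delta_io=[0, 1, 3, 2, 3, 6],
    p_agg=[1, 0.8, 0.5, 0.5, 0.2, 0.2],
    s_med=[0, 5, 5, 5, 5, 5],
)
for t in range(6):
    print(t, round(delta_ir[t], 5), round(uo[t], 5), round(ir[t], 5))
```

In this example the inferred total Ir(t) equals Io(t) + Uo(t) at every step, for instance 15 + 2.92 ≈ 17.92 at t = 5.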
Variation of Scan Arrival to Telescope
[Figure: scans per second (y-axis, 0 to 140) versus time in seconds (x-axis, 1 to 40), annotated with the early stage (no congestion), the congestion stage, time T, and the constant sending rate to the telescope]
A sample observation of scans from an infectee to a /8 telescope (from our simulation of the Witty worm)
Worm’s Internet Scanning Rate
Ri(t): number of scans received by the telescope from infectee i during time t
Pi(t): scan arrival ratio of infectee i during time t

$$ \text{Scanning Rate} = \frac{I_r(t)}{I_o(t)} \times \sum_{i \in I_o(t)} \frac{R_i(t)}{P_i(t)} \times \frac{\text{IPv4}}{\text{telescope\_size}} $$

Ir(t) / Io(t): counting those infectees that are not yet observed
Ri(t) / Pi(t): each infectee’s sending rate
IPv4 / telescope_size: scaling up local observation to global dynamics
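
A minimal sketch of the scanning-rate formula on this slide for a single interval t. The dictionaries keyed by infectee, the function name, and the sample values are illustrative assumptions. The telescope is described by its prefix length, so IPv4 / telescope_size is 2 ** prefix_len.

```python
def internet_scanning_rate(received, arrival, ir_t, prefix_len=8):
    """Estimate the worm's Internet-wide scanning rate for one interval.

    received[i]: scans the telescope received from observed infectee i.
    arrival[i]:  scan arrival ratio of infectee i in this interval.
    ir_t:        inferred real number of infectees (observed or not).
    """
    io_t = len(received)  # number of observed infectees
    # Each infectee's sending rate to the telescope, corrected for loss.
    sent_to_telescope = sum(received[i] / arrival[i] for i in received)
    # Account for unobserved infectees, then scale up to the whole IPv4 space.
    return (ir_t / io_t) * sent_to_telescope * 2 ** prefix_len


# Hypothetical interval: two observed infectees, three inferred in total.
rate = internet_scanning_rate(
    received={"a": 50, "b": 20},
    arrival={"a": 0.5, "b": 0.25},
    ir_t=3,
)
print(rate)  # 69120.0 = (3 / 2) * (100 + 80) * 256
```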
Infectees’ Scanning Rate

We use the maximum of an infectee’s inferred scanning rate as the infectee’s original scanning rate

$$ B_i = \max_{t \ge 0} \left( \frac{R_i(t)}{P_i(t)} \times \frac{\text{IPv4}}{\text{telescope\_size}} \right) $$

Bi: scanning rate of infectee i
Ri(t) / Pi(t): sending rate of infectee i at time t
IPv4 / telescope_size: scaling up from the telescope’s observation to the whole Internet
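
The per-infectee rate Bi can be sketched the same way. The input layout (two parallel lists over time for one infectee), the function name, and the sample values are assumptions; intervals with a zero arrival ratio are skipped here to avoid division by zero, which the slide does not address.

```python
def infectee_scanning_rate(received_series, arrival_series, prefix_len=8):
    """B_i: maximum over time of the loss-corrected, scaled-up sending rate."""
    rates = [
        r / p * 2 ** prefix_len
        for r, p in zip(received_series, arrival_series)
        if p > 0  # skip intervals with no usable arrival ratio
    ]
    return max(rates)


# Hypothetical infectee observed over three intervals by a /8 telescope.
b_i = infectee_scanning_rate([100, 60, 30], [1.0, 0.5, 0.4])
print(b_i)  # 30720.0, from the second interval: 60 / 0.5 * 256
```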
Inferring Global Dynamics of Witty Worm

Witty worm trace
  From CAIDA telescope, a /8 block of IPv4
  Scans begin on March 19, 2004 at 4:45am UTC
  Original IP addresses
  Ground truth [Kumar05]
Our experiment
  Avoid infectees within NATs
  Filter out infectees with fewer than 20 scans
  First 75 minutes, 11,326 infectees, 45.5 M scans
[Kumar05] A. Kumar, V. Paxson, and N. Weaver. "Exploiting Underlying Structure for Detailed Reconstruction of an Internet-Scale Event." IMC 2005.
Inferring T - End of the Early Stage
[Figure: R-squared value and # of observed infectees]
Witty Worm’s Internet Scanning Rate
Larger T tends to underestimate the packet loss and thus
the Internet-wide worm propagation strength.
Infectees’ Bandwidth
Two error sources:
  1. We use average scan arrival ratio for some late infectees
  2. We lack information about slow/short-life infectees
Conclusions and Future Work

We investigate the congestion effect on a network telescope’s observation of worm propagation
We propose ways to estimate congestion packet loss, and to correct the telescope’s observation
  We correct CAIDA telescope’s observation of Witty
We plan to extend our work to other error sources
  Influence of various telescope sizes on our analysis
  NATs, filters, non-uniform scanning worms
Questions? Comments?

Jelena Mirkovic (sunshine@isi.edu)

Songjie Wei (weis@cis.udel.edu)
[PAWS] S. Wei and J. Mirkovic, “A Realistic Simulation of Internet-Scale Events,”
Proceedings of the 2006 VALUETOOLS Conference, October 2006