The Practicality of End-User Network Monitoring Vivek Pai Princeton University

What Is This Talk?
Gedankenexperiment

A brief history of work – ours & related
– Not necessarily precise
– Not even close to exhaustive

Some prediction, direction
– From discussions with Ming Zhang, Larry Peterson
– Much derived from Ming’s PlanetSeer work
June 1, 2005
Vivek Pai, Princeton University
In The Beginning
There was RON
And RON was good

But

RON was smaller than the Internet
And Then There Was PlanetLab

PlanetLab was bigger
– But still smaller than the Internet
– But it was growing

What about RON on PlanetLab?
Other Problems

All-pairs probing not indefinitely scalable
– Possible to modify this

Path diversity was a problem
– No quadratic increase in diversity with additional nodes
– Every reviewer would jump on this

Still not growing fast enough
Idea: Use “External” Nodes

Two groups had similar ideas
– SOSR (Gummadi et al.)
– PlanetSeer (Zhang et al.)
– Both published in OSDI 2004

Approach specifics differed
– Probe type, probe frequency
– Number of participating nodes, etc.
Quick Highlights

SOSR
– Target popular web servers
– Actively probe at periodic intervals
– TCP probes

PlanetSeer
– Target clients & servers
– Passively monitor, then actively probe
– UDP (traceroute)
– Host: CoDeeN CDN, http://codeen.cs.princeton.edu
High-Level Picture of PlanetSeer
When To Probe?
[Figure: TTL values observed along the path between source and destination]

Difficulties
– Do not continuously probe
– No cooperation from both ends

Indicators of routing problem
– Time-to-live (TTL) change
– n consecutive timeouts (currently n = 4)
• Idling period of 3 to 16 seconds
• Congestion usually doesn’t last this long?
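
The two indicators above can be sketched as a small per-flow state machine. This is only an illustration under stated assumptions, not PlanetSeer's actual code: the class `FlowMonitor`, its method names, and the idea of feeding it one event per packet or timeout are all hypothetical. Only the two triggers (a TTL change, or n = 4 consecutive timeouts) come from the slide.

```python
from dataclasses import dataclass
from typing import Optional

TIMEOUT_THRESHOLD = 4  # "n" from the slide (currently n = 4)


@dataclass
class FlowMonitor:
    """Passive per-flow state (hypothetical structure, for illustration only)."""

    last_ttl: Optional[int] = None
    consecutive_timeouts: int = 0

    def on_packet(self, ttl: int) -> bool:
        """Record an arriving packet; True if its TTL differs from the previous one."""
        changed = self.last_ttl is not None and ttl != self.last_ttl
        self.last_ttl = ttl
        self.consecutive_timeouts = 0  # traffic is flowing, so reset the timeout run
        return changed

    def on_timeout(self) -> bool:
        """Record a timeout; True exactly when the run of timeouts reaches n."""
        self.consecutive_timeouts += 1
        return self.consecutive_timeouts == TIMEOUT_THRESHOLD
```

In this sketch, a `True` return from either method is the point at which active probing (for example, traceroutes) would be launched; until then the monitor only watches existing traffic.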
Probing Groups

353 nodes, 145 sites, 30 groups world-wide
– Reduce overhead without losing accuracy
– One traceroute from each group
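
A minimal sketch of "one traceroute from each group", assuming each node is already assigned to a group. The slide does not say how groups are formed or how the probing node within a group is chosen; the random choice below, the function name `pick_probers`, and the node and group names in the usage example are assumptions made for illustration.

```python
import random
from collections import defaultdict


def pick_probers(node_groups: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Choose one probing node per group.

    node_groups maps node name -> group name (both hypothetical).
    Returns group name -> the single node that will run the traceroute.
    """
    by_group: dict[str, list[str]] = defaultdict(list)
    for node, group in node_groups.items():
        by_group[group].append(node)
    # Random selection is an assumption; it spreads probing load within a group.
    return {group: rng.choice(sorted(nodes)) for group, nodes in sorted(by_group.items())}


nodes = {
    "node-a": "group-1",
    "node-b": "group-1",
    "node-c": "group-2",
}
probers = pick_probers(nodes, random.Random(0))
```

With 353 nodes in 30 groups, this kind of selection would issue 30 traceroutes per suspected anomaly instead of 353.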
Confirmed Anomaly Breakdown

Confirmed anomalies
– 271,898
– 3 months
– 2 per minute
– 100 x higher

[Pie chart: breakdown of confirmed anomalies]
– Path Change: 44%
– Other Outage: 23%
– Temp Anomaly: 16%
– Fwd Outage: 9%
– Persist Loop: 7%
– Temp Loop: 1%

Temp anomaly
– Inconsistent probe
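
The "2 per minute" rate can be checked with a few lines of arithmetic. The only assumption is that "3 months" is treated as 90 days; the anomaly count is the one on the slide.

```python
CONFIRMED_ANOMALIES = 271_898  # from the slide
ASSUMED_DAYS = 90              # assumption: "3 months" taken as 90 days

minutes = ASSUMED_DAYS * 24 * 60
per_minute = CONFIRMED_ANOMALIES / minutes
seconds_between = 60 / per_minute

print(f"{per_minute:.2f} confirmed anomalies per minute")
print(f"one confirmed anomaly every {seconds_between:.0f} seconds")
```

Under that assumption the rate comes out slightly above 2 per minute, or roughly one confirmed anomaly every half minute.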
PlanetSeer Tradeoffs

Passive/active big win
– One active probe on average every 4 seconds
• Understanding NATs drops this to every 8 seconds
– One confirmed anomaly every 30 seconds
– About 100x the anomalies for 3x probe traffic

Using external nodes loses some info
– But passive traffic provides some
Path Diversity
[Bar chart: Tier Coverage, 0% to 100%, Core vs. Edge, per AS tier]
– Tier 1: 22 ASes
– Tier 2: 215 ASes
– Tier 3: 1392 ASes
– Tier 4: 1420 ASes
– Tier 5: 13872 ASes

Monitoring period: 02/2004 – 05/2004
Unique IPs: 887,521
Traversed ASes: 10,090
PlanetSeer Going Forward

CoDeeN traffic increasing
– Was doing ~5M requests/day from ~25K clients
– Now at 12M+ requests/day from 50K+ clients

Coverage might be improving
– PlanetSeer saw ~1M unique IP addresses in 3 months
– Not clear how many are dial-up
– New users will come from new services, like CoBlitz (scalable large-file transfer)
Observations

Getting 2 orders of magnitude larger than RON required a new approach

PlanetSeer has several avenues for growth
– Missing half of Tier 5 ASes
– More traffic on lower tiers desirable
– Total users still small

Projection: the next 2 orders of magnitude will need a new approach
Involving the End User

SETI@home approach
– About 5M downloads
– In comparison: CNN 22M, AOL 23M

Web bugs
– Possible, but who’s going to do it?

P2P probing
– Public relations problem? Maybe
– BitTorrent/Skype likely candidates – how?
– Locality optimizations undesirable
MeasureMe!
Use browser to launch active probes
Like web bugs, but obvious

Delivery options
– Built into browser
– Clickable via error pages
– Toolbar
– Local application (screen saver, etc.)
Each image URL is for a CGI, and has an identifier
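
As a rough sketch of that idea, the page below embeds one image per measurement host, and every image URL points at a CGI script and carries a shared identifier. The host names, the `/cgi-bin/probe.cgi` path, and the `id` and `seq` parameter names are all hypothetical; the slide only says that each image URL is for a CGI and has an identifier.

```python
import uuid
from html import escape
from urllib.parse import urlencode

# Hypothetical measurement hosts; a real deployment would list its own.
PROBE_HOSTS = ["monitor1.example.org", "monitor2.example.org"]


def probe_image_urls(session_id: str, hosts=PROBE_HOSTS) -> list[str]:
    """Build one CGI URL per host, each tagged with the same identifier."""
    return [
        f"http://{host}/cgi-bin/probe.cgi?" + urlencode({"id": session_id, "seq": seq})
        for seq, host in enumerate(hosts)
    ]


def probe_page(session_id: str) -> str:
    """Render a visible page whose image fetches act as the active probes."""
    imgs = "\n".join(
        f'<img src="{escape(url, quote=True)}" alt="probe">'
        for url in probe_image_urls(session_id)
    )
    return f"<html><body>\n<p>MeasureMe! probes in progress</p>\n{imgs}\n</body></html>"


session = uuid.uuid4().hex  # fresh identifier for this page load
page = probe_page(session)
```

Because every request carries the same identifier, the servers receiving the image fetches could in principle correlate them as one measurement from one browser, and a missing fetch would hint at a path problem toward that host.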
Do We Need End Users?

Most people not multi-homed
– Last mile does not matter
– Matters to them, but not otherwise

Focus on ISPs
– Fewer privacy, security issues
– Can ship data with other routing data
– End users useful when ISP not joining
Do We Need To Coalesce?

Measurement traffic still small
– Good experience for students
– New ideas needed
– Different approaches may yield new insight

Shared measurement infrastructure vulnerable
– Blacklisting affects more people
– Any experiment can cause ripples
What’s Next For Us

We’ll let PlanetSeer track CoDeeN
– User growth will give us more data
– Long (1GB+) downloads in CoBlitz will provide more stickiness
– Might implement MeasureMe! splash screen

Longer term: allow direct participation