NetProfiler: Profiling Networks From the Edge Venkat Padmanabhan Microsoft Research

June 2005
With Sharad Agarwal (MSR), Jitu Padhye (MSR),
Dilip Joseph (UCB), Sriram Ramabhadran (UCSD)
Motivation: End Users
Users have little info or recourse when they experience network problems

Why the failure?
 website, ISP, client site?
 is it just me?

How am I faring over the long term?
 switch ISPs?
Motivation: Network Operators
Operators have little visibility into end-user network experience

Enterprise networks:
 adequately provisioned?
 health of wireless LAN?

Consumer ISPs:
 how are users in Boston faring?

[Figure: network diagram captioned "Network health?" with nodes labeled Microsoft, AT&T, MS SVC, UUNet, Sprint, MS India, MS UK]
NetProfiler
Goal: remedy the situation by leveraging passive
observation of normal end-to-end network
communication at the “edge” to “profile” the
network.
Edge = client hosts distributed around the network
Profile = monitor + deconstruct (+ diagnose)
Turn the Internet into a sensor network
NetProfiler Overview

Key idea: leverage peer cooperation
 share network experience info across end hosts
 draw inferences based on correlation

Observations
 automate what expert users do manually
 unlike traditional P2P applications

Complements previous work
 network infrastructure monitoring
 active probing
 server-based monitoring
 network tomography
Architecture

Sensing: glean info from existing communication
 TCP, web, email, streaming, etc.
 quantify the user’s network experience
  − web download failure, e2e email delay

Aggregation:
 based on attributes (website, proxy, domain pair)
 tradeoff between privacy and data integrity

Inference: distributed blame attribution
 assign credit/blame equally to all entities involved
 use mass of info from diverse vantage points to make
inference
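
The aggregation and inference steps above can be illustrated with a minimal sketch. It assumes each shared observation is the set of entities involved in one transaction (for example a client and a website) plus a success flag. The record layout, entity labels, and the `blame_scores` helper are hypothetical and are not NetProfiler's actual design. Each entity receives one unit of credit on success or blame on failure, and the per-entity failure fraction is then compared across entities.

```python
from collections import defaultdict

def blame_scores(observations):
    """Return {entity: fraction of its transactions that failed}.

    observations: iterable of (entities, succeeded) pairs, where entities
    is an iterable of hashable labels such as ("client", "c1").
    """
    credit = defaultdict(int)  # successes seen per entity
    blame = defaultdict(int)   # failures seen per entity
    for entities, succeeded in observations:
        for entity in entities:
            # Equal attribution: every entity involved shares the outcome.
            if succeeded:
                credit[entity] += 1
            else:
                blame[entity] += 1
    scores = {}
    for entity in set(credit) | set(blame):
        total = credit[entity] + blame[entity]
        scores[entity] = blame[entity] / total
    return scores

# Hypothetical observations pooled from two clients (illustrative only).
obs = [
    ((("client", "c1"), ("site", "a.example")), False),
    ((("client", "c1"), ("site", "b.example")), True),
    ((("client", "c2"), ("site", "a.example")), False),
    ((("client", "c2"), ("site", "b.example")), True),
]
scores = blame_scores(obs)
worst = max(scores, key=scores.get)  # entity with the highest failure fraction
```

In this toy input both clients fail only against the same site, so pooling observations from diverse vantage points points at the site rather than at either client.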
Measurement Study

Goal:
 characterize end-to-end web access failures
 make inferences based on shared observations

Testbed:
 134 clients worldwide
  − academic, corporate, dialup, broadband
 80 websites worldwide

Month-long experiment (January 2005)
 synthetic workload: each client downloads the top-level “index” file from each website ~4 times an hour
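
One way a client could label the outcome of each synthetic download is to attempt the DNS lookup and the TCP connection as separate stages. The sketch below uses Python's standard `socket` module. The function name, outcome labels, and timeout are assumptions; the study's actual measurement tool is not described in these slides.

```python
import socket

def classify_download(host, port=80, timeout=10.0,
                      resolve=socket.getaddrinfo,
                      connect=socket.create_connection):
    """Attempt the DNS and TCP stages of a download and label the outcome.

    resolve and connect are injectable so the logic can be exercised
    without touching the network.
    """
    try:
        resolve(host, port)
    except OSError:
        # socket.gaierror (name resolution errors) is a subclass of OSError.
        return "dns_failure"
    try:
        # Note: create_connection repeats the lookup internally; a stricter
        # version would connect to the address returned by resolve.
        conn = connect((host, port), timeout)
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return "tcp_failure"
    conn.close()
    return "ok"
```

A driver loop would call this for each of the 80 websites roughly every 15 minutes and append the label to a local log.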
Basic Findings

Findings based on local observations
 Transaction failure rate: 0.7-2.8%
 Share of failures: TCP connection failures 57-64%, DNS failures 34-42%
  − DNS: dominated by local DNS server (LDNS) reachability problems (76-83%)
  − TCP: dominated by connection establishment failures (41-79%)

Correlation analyses to shed more light on the
nature of failures
 Server-side or client-side
 Proxy-related
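
The local statistics above (overall failure rate, then the split of failures by type) can be computed from a per-client outcome log. A minimal sketch, assuming each log entry is one of the hypothetical labels "ok", "tcp_failure", or "dns_failure"; the sample log is invented for illustration and is not data from the study.

```python
from collections import Counter

def summarize(outcomes):
    """Return (failure_rate, {failure_type: share_of_failures})."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    failures = total - counts["ok"]
    rate = failures / total if total else 0.0
    shares = {
        kind: n / failures
        for kind, n in counts.items()
        if kind != "ok" and failures
    }
    return rate, shares

# Hypothetical log: 1000 transactions, 20 of them failures.
log = ["ok"] * 980 + ["tcp_failure"] * 12 + ["dns_failure"] * 8
rate, shares = summarize(log)
```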
Classification of Connection Failures
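Category        Share of connection failures
Server          29.0%
Likely Server   15.5%
Client           6.9%
Likely Client    0.9%
Both            29.5%
Neither         18.2%

Connection failures are dominated by server-side problems (Server + Likely Server = 44.5%, vs. Client + Likely Client = 7.8%)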
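
The slides do not state how a failure is assigned to one of these categories. One plausible approach, sketched here purely as an assumption, compares two aggregate rates for the period in which the failure occurred: the server's failure rate across all clients and the client's failure rate across all servers. The 5% threshold and the function name are hypothetical, and the "Likely" categories are not modeled.

```python
def classify_failure(server_fail_rate, client_fail_rate, threshold=0.05):
    """Label a connection failure by which side looks abnormal.

    server_fail_rate: fraction of all clients' attempts to this server that failed.
    client_fail_rate: fraction of this client's attempts to all servers that failed.
    threshold: hypothetical cutoff above which a side is considered unhealthy.
    """
    server_bad = server_fail_rate >= threshold
    client_bad = client_fail_rate >= threshold
    if server_bad and client_bad:
        return "Both"
    if server_bad:
        return "Server"
    if client_bad:
        return "Client"
    return "Neither"
```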
End-to-End Failures vs. BGP Instability
Severe BGP instability is rare but has end-to-end (E2E) impact when it happens.
Proxy-related Problem
[Figure: failure rate (%), on a 0-6 scale, for server www.iitb.ac.in by client group: SEA1, SEA2, SF, CHN, UK, EXT, Other]
Clients behind proxy see significantly higher failure rate
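
A comparison like this amounts to grouping one server's transactions by client group and computing a failure percentage per group. A minimal sketch follows; the "proxied" and "direct" group labels and the sample records are hypothetical, since the slide text does not say which of the plotted groups sit behind a proxy.

```python
from collections import defaultdict

def failure_rate_by_group(records):
    """records: iterable of (group, succeeded). Returns {group: percent failed}."""
    totals = defaultdict(int)
    fails = defaultdict(int)
    for group, succeeded in records:
        totals[group] += 1
        if not succeeded:
            fails[group] += 1
    return {g: 100.0 * fails[g] / totals[g] for g in totals}

# Hypothetical records for a single server (illustrative only).
records = (
    [("proxied", False)] * 5 + [("proxied", True)] * 95
    + [("direct", False)] * 1 + [("direct", True)] * 99
)
rates = failure_rate_by_group(records)
```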
Conclusion

NetProfiler leverages edge perspective to monitor network health & infer cause of problems
Targeted at both end users and operators
More info: www.research.microsoft.com/~padmanab/projects/NetProfiler