NetProfiler: Profiling Networks From the Edge
Venkat Padmanabhan
Microsoft Research
June 2005
With Sharad Agarwal (MSR), Jitu Padhye (MSR), Dilip Joseph (UCB), Sriram Ramabhadran (UCSD)

Motivation: End Users
  - Users have little info or recourse when they experience network problems
  - Why the failure?
      - website, ISP, client site?
      - is it just me?
  - How am I faring over the long term?
      - switch ISPs?

Motivation: Network Operators
  - Operators have little visibility into end-user network experience
  - Network health?
  - Enterprise networks:
      - adequately provisioned?
      - health of wireless LAN?
  - Consumer ISPs:
      - how are users in Boston faring?
  [Figure: network diagram with nodes labeled Microsoft, AT&T, UUNet, Sprint, MS SVC, MS India, MS UK]

NetProfiler
  - Goal: remedy the situation by leveraging passive observation of normal end-to-end network communication at the "edge" to "profile" the network
  - Edge = client hosts distributed around the network
  - Profile = monitor + deconstruct (+ diagnose)
  - Turn the Internet into a sensor network

NetProfiler Overview
  - Key idea: leverage peer cooperation
      - share network experience info across end hosts
      - draw inferences based on correlation
  - Observations
      - automate what expert users do manually
      - unlike traditional P2P applications
  - Complements previous work
      - network infrastructure monitoring
      - active probing
      - server-based monitoring
      - network tomography

Architecture
  - Sensing: glean info from existing communication
      - TCP, web, email, streaming, etc.
      - quantify the user's network experience
          - web download failure, e2e email delay
  - Aggregation: based on attributes (website, proxy, domain pair)
      - tradeoff between privacy and data integrity
  - Inference: distributed blame attribution
      - assign credit/blame equally to all entities involved
      - use mass of info from diverse vantage points to make inference

Measurement Study
  - Goal:
      - characterize end-to-end web access failures
      - make inferences based on shared observations
  - Testbed:
      - 134 clients worldwide
          - academic, corporate, dialup, broadband
      - 80 websites worldwide
  - Month-long experiment (Jan 2005)
      - synthetic workload: each client downloads the top-level "index" file from each website ~4 times an hour

Basic Findings
  - Findings based on local observations
      - Transaction failure rate: 0.7-2.8%
      - TCP connection failures: 57-64%, DNS failures: 34-42%
          - DNS: dominated by LDNS reachability problems (76-83%)
          - TCP: dominated by connection establishment failures (41-79%)
  - Correlation analyses to shed more light on the nature of failures
      - Server-side or client-side
      - Proxy-related

Classification of Connection Failures
  [Figure: chart of connection failure categories]
      Category        Share
      Server          29.0%
      Likely Server   15.5%
      Client           6.9%
      Likely Client    0.9%
      Both            29.5%
      Neither         18.2%
  - Connection failures are dominated by server-side problems

End-to-End Failures vs. BGP Instability
  - Severe BGP instability is rare but has E2E impact when it happens

Proxy-related Problem
  - Server: www.iitb.ac.in
  [Figure: failure rate (%), axis range 0-6, by client group: SEA1, SEA2, SF, CHN, UK, EXT, Other]
  - Clients behind proxy see significantly higher failure rate

Conclusion
  - NetProfiler leverages edge perspective to monitor network health & infer cause of problems
  - Targeted at both end users and operators
  - More info: www.research.microsoft.com/~padmanab/projects/NetProfiler
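
Sketch: blame attribution
  The Inference step on the Architecture slide (assign credit/blame equally to all entities involved, then combine observations from many vantage points) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the NetProfiler implementation: the function name attribute_blame, the entity labels such as "client:A", and the blame / (blame + credit) score are all hypothetical choices made for this example.

```python
from collections import defaultdict


def attribute_blame(transactions):
    """Return a blame score in [0, 1] for every entity seen.

    transactions: iterable of (entities, succeeded) pairs, where entities
    is a non-empty list of labels (hypothetical format, e.g. "client:A",
    "proxy:P", "server:X") and succeeded is a bool.
    """
    credit = defaultdict(float)
    blame = defaultdict(float)
    for entities, succeeded in transactions:
        # Split one unit of credit or blame equally among all entities
        # involved in the transaction.
        share = 1.0 / len(entities)
        target = credit if succeeded else blame
        for entity in entities:
            target[entity] += share
    # Assumed scoring rule: fraction of an entity's total weight that is blame.
    scores = {}
    for entity in set(credit) | set(blame):
        total = credit[entity] + blame[entity]
        scores[entity] = blame[entity] / total
    return scores


if __name__ == "__main__":
    # Made-up observations pooled from three clients.
    observations = [
        (["client:A", "server:X"], False),
        (["client:B", "server:X"], False),
        (["client:C", "server:X"], False),
        (["client:A", "server:Y"], True),
        (["client:B", "server:Y"], True),
    ]
    for entity, score in sorted(attribute_blame(observations).items()):
        print(f"{entity}: {score:.2f}")
```

  In the made-up data, server:X fails for every client while clients A and B succeed elsewhere, so the pooled view points at the server rather than at either client. Client C has only one observation, which shows why a large number of diverse vantage points matters for this kind of inference.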