Internet Measurement Jennifer Rexford

Outline
• Measurement overview
  – Why measure? Why model measurements?
  – What to measure? Where to measure?
• Internet challenges
• Measurement tools
  – Active: ping, traceroute, and pathchar
  – Passive: logs, SNMP, packet, and flow monitoring
• Operational applications of measurement
• Discussion

Why Measure?
• The Internet is a man-made system, so why do we need to measure it?
  – Because we still don’t really understand it
  – Because sometimes things go wrong
• Measurement for network operations
  – Detecting and diagnosing problems
  – What-if analysis of future changes
• Measurement for scientific discovery
  – Characterizing a complex system as an organism
  – Creating accurate models that represent reality
  – Identifying new features and phenomena

Why Build Models of Measurements?
• Compact summary of measurements
  – Efficient way to represent a large data set
  – E.g., exponential distribution with mean 100 sec
• Expose important properties of measurements
  – Reveals underlying cause or engineering question
  – E.g., mean RTT to help explain TCP throughput
• Generate random but realistic data as input
  – Generate new data that agree in key properties
  – E.g., topology models to feed into simulators
“All models are wrong, but some models are useful.” – George Box

What Can be Measured?
• Traffic
  – Load statistics
  – Packet or flow traces
• Performance of paths
  – Application performance, e.g., Web download time
  – Transport performance, e.g., TCP bulk throughput
  – Network performance, e.g., packet delay and loss
• Network structure
  – Topology, and paths on the topology
  – Dynamics of the routing protocol

Where to Measure?
• Short answer
  – Anywhere you can!
• End hosts
  – Application logs, e.g., Web server logs
  – Sending active probes to measure performance
• Individual links/routers
  – Load statistics, packet traces, flow traces
  – Configuration state
  – Routing-protocol messages or table dumps
  – Alarms

Internet Challenges Make Measurement an Art
• Stateless routers
  – Routers do not routinely store packet/flow state
  – Measurement is an afterthought, adds overhead
• IP narrow waist
  – IP measurements cannot see below network layer
  – E.g., link-layer retransmission, tunnels, etc.
• Violations of end-to-end argument
  – E.g., firewalls, address translators, and proxies
  – Not directly visible, and may block measurements
• Decentralized control
  – Autonomous Systems may block measurements
  – No global notion of time

Active Measurement: Ping
• Adding traffic for purposes of measurement
  – Trade-offs between accuracy and overhead
  – Need careful methods to avoid introducing bias
• Ping
  – Host sends an ICMP ECHO packet to a target
  – … and captures the ICMP ECHO REPLY
  – Useful for checking connectivity, and RTT
  – Only requires control of one of the two end-points
• Problems with ping
  – Round-trip rather than one-way delays
  – Some hosts might not respond

Active Measurement: Traceroute
• Time-To-Live field in IP packet header
  – Source sends a packet with a TTL of n
  – Each router along the path decrements the TTL
  – “TTL exceeded” sent when TTL reaches 0
• Traceroute tool exploits this TTL behavior
[Figure: the source sends probes with TTL=1, TTL=2, … toward the destination; the router where the TTL expires returns a “time exceeded” message]
Send packets with TTL=1, 2, 3, … and record source of “time exceeded” message

Active Measurement: Challenges of Traceroute
• Measuring multiple paths
  – Successive probes may traverse different paths
• Non-participating network elements
  – Some routers and firewalls don’t reply
• Inaccurate delay information
  – Includes processing delays on the router CPU
• Round-trip vs. one-way measurements
  – Paths may have asymmetric properties
• Interfaces, not routers
  – Returns IP address of interfaces, not routers

Active Measurement: Applications of Traceroute
• Network troubleshooting
  – Identify forwarding loops and black holes
  – Identify long and convoluted paths
  – See how far the probe packets get
• Network topology inference
  – Launch traceroute probes from many places
  – … toward many destinations
  – Join together to fill in parts of the topology
  – … though traceroute undersamples the edges

Active Measurement: Pathchar for Links
$rtt(i+1) - rtt(i) = d + L/c + \varepsilon$
  – i: initial TTL value
  – c: link capacity
  – L: packet size
• Three delay components
  – d: propagation delay
  – L/c: transmission delay
  – $\varepsilon$: queueing delay + noise
• How to infer d, c?
[Figure: min. RTT(L), i.e. the minimum of rtt(i+1) - rtt(i), plotted against packet size L; a straight line with slope = 1/c and intercept d]

Passive Measurement: Logs at Hosts
• Web server logs
  – Host, time, URL, response code, content length, …
  – E.g.,
    122.345.131.2 - - [15/Oct/1998:00:00:25 0400] "GET /images/wwwtlogo.gif HTTP/1.0" 304 - "http://www.aflcio.org/home.htm" "Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; AOL 4.0; Windows 95)" "-"
• DNS logs
  – Request, response, time
• Useful for workload characterization, troubleshooting, etc.

Passive Measurement: SNMP
• Simple Network Management Protocol
  – Coarse-grained counters on the router
  – E.g., byte and packet counts
• Polling
  – Management system can poll the counters
  – E.g., once every five minutes
• Limitations
  – Extremely coarse-grained statistics
  – Delivered over UDP!
• Advantages: ubiquitous

Passive Measurement: Packet Monitoring
• Tapping a link
  – Shared media (Ethernet, wireless)
  – Multicast switch
  – Splitting a point-to-point link
  – Line card that does packet sampling
[Figure: four monitor placements, one per option above, showing Hosts A, B, and C on a shared medium or switch and Routers A and B on a point-to-point link]

Packet Monitoring: Selecting the Traffic
• Filter to focus on a subset of the packets
  – IP addresses/prefixes (e.g., to/from specific Web sites, client machines, DNS servers, mail servers)
  – Protocol (e.g., TCP, UDP, or ICMP)
  – Port numbers (e.g., HTTP, DNS, BGP, Napster)
• Collect first n bytes of packet (snap length)
  – Medium access control header (if present)
  – IP header (typically 20 bytes)
  – IP+UDP header (typically 28 bytes)
  – IP+TCP header (typically 40 bytes)
  – Application-layer message (entire packet)

Tcpdump Output (three-way TCP handshake and HTTP request message)
    23:40:21.008043 eth0 > 135.207.38.125.1043 > lovelace.acm.org.www: S 617756405:617756405(0) win 32120 <mss 1460,sackOK,timestamp 46339 0,nop,wscale 0> (DF)
    23:40:21.036758 eth0 < lovelace.acm.org.www > 135.207.38.125.1043: S 2598794605:2598794605(0) ack 617756406 win 16384 <mss 512>
    23:40:21.036789 eth0 > 135.207.38.125.1043 > lovelace.acm.org.www: . 1:1(0) ack 1 win 32120 (DF)
    23:40:21.037372 eth0 > 135.207.38.125.1043 > lovelace.acm.org.www: P 1:513(512) ack 1 win 32256 (DF)
    23:40:21.085106 eth0 < lovelace.acm.org.www > 135.207.38.125.1043: . 1:1(0) ack 513 win 16384
    23:40:21.085140 eth0 > 135.207.38.125.1043 > lovelace.acm.org.www: P 513:676(163) ack 1 win 32256 (DF)
    23:40:21.124835 eth0 < lovelace.acm.org.www > 135.207.38.125.1043: P 1:179(178) ack 676 win 16384
Callouts on the first line: timestamp, client address and port #, Web server (port 80), SYN flag, sequence number, TCP options

Analysis of Packet Traces
• IP header
  – Traffic volume by IP addresses or protocol
  – Burstiness of the stream of packets
  – Packet properties (e.g., sizes, out-of-order, etc.)
• TCP header
  – Traffic breakdown by application (e.g., Web)
  – TCP congestion and flow control
  – Number of bytes and packets per session
• Application header
  – URLs, HTTP headers (e.g., cacheable response?)
  – DNS queries and responses, user key strokes, …

Aggregating Packets into IP Flows
[Figure: a timeline of packets grouped into flow 1, flow 2, flow 3, and flow 4]
• Set of packets that “belong together”
  – Source/destination IP addresses and port numbers
  – Same protocol, ToS bits, …
  – Same input/output interfaces at a router (if known)
• Packets that are “close” together in time
  – Maximum spacing between packets (e.g., 15 sec, 30 sec)
  – Example: flows 2 and 4 are different flows due to time

Packet vs. Flow Measurement
• Basic statistics (available from both techniques)
  – Traffic mix by IP addresses, port numbers, and protocol
  – Average packet size
• Traffic over time
  – Both: traffic volumes on a medium-to-large time scale
  – Packet: burstiness of the traffic on a small time scale
• Statistics per TCP connection
  – Both: number of packets & bytes transferred over the link
  – Packet: frequency of lost or out-of-order packets, and the number of application-level bytes delivered
• Per-packet info (available only from packet traces)
  – TCP seq/ack #s, receiver window, per-packet flags, …
  – Probability distribution of packet sizes
  – Application-level header and body (full packet contents)

Measurement Challenges for Operators
• Network-wide view
  – Crucial for evaluating control actions
  – Multiple kinds of data from multiple locations
• Large scale
  – Large number of high-speed links and routers
  – Large volume of measurement data
• Poor state-of-the-art
  – Working within existing protocols and products
  – Technology not designed with measurement in mind
• The “do no harm” principle
  – Don’t degrade router performance
  – Don’t require disabling key router features
  – Don’t overload the network with measurement data

Network Operations Tasks
• Reporting of network-wide statistics
  – Generating basic information about usage and reliability
• Performance/reliability troubleshooting
  – Detecting and diagnosing anomalous events
• Security
  – Detecting, diagnosing, and blocking security problems
• Traffic engineering
  – Adjusting network configuration to the prevailing traffic
• Capacity planning
  – Deciding where and when to install new equipment

Basic Reporting
• Producing basic statistics about the network
  – For business purposes, network planning, ad hoc studies
• Examples
  – Proportion of transit vs. customer-customer traffic
  – Total volume of traffic sent to/from each private peer
  – Mixture of traffic by application (Web, Napster, etc.)
  – Mixture of traffic to/from individual customers
  – Usage, loss, and reliability trends for each link
• Requirements
  – Network-wide view of basic traffic and reliability statistics
  – Ability to “slice and dice” measurements in different ways (e.g., by application, by customer, by peer, by link type)

Troubleshooting
• Detecting and diagnosing problems
  – Recognizing and explaining anomalous events
• Examples
  – Why a backbone link is suddenly overloaded
  – Why the route to a destination prefix is flapping
  – Why DNS queries are failing with high probability
  – Why a route processor has high CPU utilization
  – Why a customer cannot reach certain Web sites
• Requirements
  – Network-wide view of many protocols and systems
  – Diverse measurements at different protocol levels
  – Thresholds for isolating significant phenomena

Security
• Detecting and diagnosing problems
  – Recognizing suspicious traffic or disruptions
• Examples
  – Denial-of-service attack on a customer or service
  – Spread of a worm or virus through the network
  – Route hijack of an address block by adversary
• Requirements
  – Detailed measurements from multiple places
  – Including deep-packet inspection, in some cases
  – Online analysis of the data
  – Installing filters to block the offending traffic

Traffic Engineering
• Adjusting resource allocation policies
  – Path selection, buffer management, and link scheduling
• Examples
  – OSPF weights to divert traffic from congested links
  – BGP policies to balance load on peering links
  – Link-scheduling weights to reduce delay for “gold” traffic
• Requirements
  – Network-wide view of the traffic carried in the backbone
  – Timely view of the network topology and configuration
  – Accurate models to predict impact of control operations (e.g., the impact of RED parameters on TCP throughput)

Capacity Planning
• Deciding whether to buy/install new equipment
  – What? Where? When?
• Examples
  – Where to put the next backbone router
  – When to upgrade a link to higher capacity
  – Whether to add/remove a particular peer
  – Whether the network can accommodate a new customer
  – Whether to install a caching proxy for cable modems
• Requirements
  – Projections of future traffic patterns from measurements
  – Cost estimates for buying/deploying the new equipment
  – Model of the potential impact of the change (e.g., latency reduction and bandwidth savings from a caching proxy)

Examples of Public Data Sets
• Network-wide data
  – Abilene and GEANT backbones
  – Netflow, IGP, and BGP traces
• CAIDA DatCat
  – Data catalogue maintained by CAIDA
  – http://imdc.datcat.org/
• Interdomain routing
  – RouteViews and RIPE-NCC
  – BGP routing tables and update messages
• Traceroute and looking glass servers
  – http://www.traceroute.org/
  – http://www.nanog.org/lookingglass.html

Discussion
• How important is accuracy of the data?
• How can we validate measurement studies? (If we know the answer already, why are we measuring?)
• How to do controlled experiments with measurement techniques?
• Can we move measurement to a science rather than an art?
• Can we identify incentives for making measurement possible and data available?
• Distributed analysis of measurement data?
• An architecture for router or line-card support for traffic and performance measurement?
• Trade-offs between security and privacy?
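
Worked sketch: inferring d and c as on the pathchar slide
The pathchar slide models the RTT difference between consecutive TTL values as propagation delay plus transmission delay plus queueing noise. One possible way to apply it, sketched here under simplifying assumptions, is to keep the minimum RTT per packet size (which filters out queueing delay) and fit a straight line to the differences: the slope estimates 1/c and the intercept estimates d. All function names and the synthetic probe data are hypothetical and are not taken from the pathchar tool itself. Packet sizes are in bytes, so the capacity comes out in bytes per second.

```python
import random


def min_rtt_by_size(samples):
    """Map packet size -> smallest RTT seen; the minimum filters out queueing delay."""
    best = {}
    for size, rtt in samples:
        if size not in best or rtt < best[size]:
            best[size] = rtt
    return best


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_link(samples_ttl_i, samples_ttl_next):
    """Estimate (capacity c, propagation delay d) of the link between hop i and hop i+1."""
    near = min_rtt_by_size(samples_ttl_i)
    far = min_rtt_by_size(samples_ttl_next)
    sizes = sorted(set(near) & set(far))
    diffs = [far[s] - near[s] for s in sizes]  # min rtt(i+1) - min rtt(i)
    slope, intercept = fit_line(sizes, diffs)  # slope = 1/c, intercept = d
    return 1.0 / slope, intercept


def synth_probes(sizes, base, per_byte, rng, probes=20):
    """Hypothetical probe generator: one queue-free probe per size, the rest delayed."""
    out = []
    for size in sizes:
        for k in range(probes):
            queueing = 0.0 if k == 0 else rng.expovariate(1000.0)
            out.append((size, base + per_byte * size + queueing))
    return out


# Assumed ground truth for the synthetic link: c = 1.25e6 bytes/s, d = 5 ms.
rng = random.Random(1)
sizes = list(range(100, 1501, 100))
near = synth_probes(sizes, base=0.010, per_byte=1e-6, rng=rng)
far = synth_probes(sizes, base=0.015, per_byte=1e-6 + 1 / 1.25e6, rng=rng)
capacity, delay = estimate_link(near, far)
print(capacity, delay)
```

The synthetic data always includes one probe with no queueing per packet size, so the minimum recovers the noise-free line. On a real path many probes per size would be needed before the minimum is trustworthy, and the traceroute caveats (asymmetric return paths, router CPU delays) still apply.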
