Internet Measurement Jennifer Rexford

Outline
• Measurement overview
– Why measure? Why model measurements?
– What to measure? Where to measure?
• Internet challenges
• Measurement tools
– Active: ping, traceroute, and pathchar
– Passive: logs, SNMP, packet, and flow monitoring
• Operational applications of measurement
• Discussion
Why Measure?
• The Internet is a man-made system, so why
do we need to measure it?
– Because we still don’t really understand it
– Because sometimes things go wrong
• Measurement for network operations
– Detecting and diagnosing problems
– What-if analysis of future changes
• Measurement for scientific discovery
– Characterizing a complex system as an organism
– Creating accurate models that represent reality
– Identifying new features and phenomena
Why Build Models of Measurements?
• Compact summary of measurements
– Efficient way to represent a large data set
– E.g., exponential distribution with mean 100 sec
• Expose important properties of measurements
– Reveals underlying cause or engineering question
– E.g., mean RTT to help explain TCP throughput
• Generate random but realistic data as input
– Generate new data that agree in key properties
– E.g., topology models to feed into simulators
“All models are wrong, but some models are useful.” – George Box
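A minimal sketch of the "compact summary" and "generate realistic data" ideas, assuming the measurements are well described by an exponential distribution. The "measured" samples are themselves synthetic stand-ins, and the function names are hypothetical.

```python
import random
import statistics

def fit_exponential_mean(samples):
    """The maximum-likelihood mean of an exponential model is the sample mean."""
    return statistics.fmean(samples)

def generate_synthetic(mean, n, seed=0):
    """Draw n values from an exponential distribution with the given mean."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / mean) for _ in range(n)]

# Stand-in for real measurements (e.g., durations in seconds); hypothetical data.
measured = generate_synthetic(100.0, 10000, seed=1)

# Compact summary: one number replaces 10,000 samples.
model_mean = fit_exponential_mean(measured)

# New data that agree with the measurements in the modeled property.
synthetic = generate_synthetic(model_mean, 10000, seed=2)
```

The model only preserves what it encodes: here the mean and the exponential shape, not correlations between successive samples.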
What Can be Measured?
• Traffic
– Load statistics
– Packet or flow traces
• Performance of paths
– Application performance, e.g., Web download time
– Transport performance, e.g., TCP bulk throughput
– Network performance, e.g., packet delay and loss
• Network structure
– Topology, and paths on the topology
– Dynamics of the routing protocol
Where to Measure?
• Short answer
– Anywhere you can! 
• End hosts
– Application logs, e.g., Web server logs
– Sending active probes to measure performance
• Individual links/routers
– Load statistics, packet traces, flow traces
– Configuration state
– Routing-protocol messages or table dumps
– Alarms
Internet Challenges Make Measurement an Art
• Stateless routers
– Routers do not routinely store packet/flow state
– Measurement is an afterthought, adds overhead
• IP narrow waist
– IP measurements cannot see below network layer
– E.g., link-layer retransmission, tunnels, etc.
• Violations of end-to-end argument
– E.g., firewalls, address translators, and proxies
– Not directly visible, and may block measurements
• Decentralized control
– Autonomous Systems may block measurements
– No global notion of time
Active Measurement: Ping
• Adding traffic for purposes of measurement
– Trade-offs between accuracy and overhead
– Need careful methods to avoid introducing bias
• Ping
– Host sends an ICMP ECHO packet to a target
– … and captures the ICMP ECHO REPLY
– Useful for checking connectivity and measuring RTT
– Only requires control of one of the two end-points
• Problems with ping
– Round-trip rather than one-way delays
– Some hosts might not respond
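As an illustration of what ping puts on the wire, the sketch below builds an ICMP echo request (type 8, code 0) and fills in the Internet checksum. It only constructs the bytes; actually sending them would need a raw socket and suitable privileges, which is left out. The identifier, sequence number, and payload are arbitrary example values.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of the data, complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """ICMP header: type, code, checksum, identifier, sequence number."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)  # checksum zero for now
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload

packet = build_echo_request(ident=0x1234, seq=1, payload=b"ping")
```

A receiver can validate the packet by checksumming it again: a correct packet sums to zero. The RTT is then the time between sending this request and receiving the matching ECHO REPLY (type 0) with the same identifier and sequence number.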
Active Measurement: Traceroute
• Time-To-Live field in IP packet header
– Source sends a packet with a TTL of n
– Each router along the path decrements the TTL
– ICMP “time exceeded” message sent back to the source when the TTL reaches 0
• Traceroute tool exploits this TTL behavior
[Figure: source sends probes with TTL=1, TTL=2, … toward the destination; each router where the TTL expires returns a “time exceeded” message]
Send packets with TTL=1, 2, 3, … and record source of “time exceeded” message
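The TTL mechanism can be sketched as a small simulation. This is not a real traceroute (no packets are sent); the path, the addresses, and the function names are hypothetical, and every router is assumed to reply.

```python
def forward_probe(path, destination, ttl):
    """Simulate one probe: each router decrements the TTL and the router
    where it reaches 0 answers with "time exceeded"."""
    for router in path:
        ttl -= 1
        if ttl == 0:
            return router, "time exceeded"
    return destination, "reached"

def traceroute(path, destination, max_ttl=30):
    """Send probes with TTL=1, 2, 3, ... and record who answers each one."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, message = forward_probe(path, destination, ttl)
        hops.append(responder)
        if message == "reached":
            break
    return hops

# Hypothetical forward path of router interfaces between source and destination.
example_path = ["10.0.0.1", "10.0.1.1", "10.0.2.1"]
hops = traceroute(example_path, "192.0.2.9")
```

Here `hops` lists the three router interfaces followed by the destination, one entry per TTL value.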
Active Measurement: Challenges of Traceroute
• Measuring multiple paths
– Successive probes may traverse different paths
• Non-participating network elements
– Some routers and firewalls don’t reply
• Inaccurate delay information
– Includes processing delays on the router CPU
• Round-trip vs. one-way measurements
– Paths may have asymmetric properties
• Interfaces, not routers
– Returns IP address of interfaces, not routers
Active Measurement: Applications of Traceroute
• Network troubleshooting
– Identify forwarding loops and black holes
– Identify long and convoluted paths
– See how far the probe packets get
• Network topology inference
– Launch traceroute probes from many places
– … toward many destinations
– Join together to fill in parts of the topology
– … though traceroute undersamples the edges
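Joining traceroute results into a topology can be sketched as collecting the links between consecutive responding hops. The paths and node names below are hypothetical, `None` marks a hop that did not reply, and the nodes are interface addresses rather than routers, so a real study would also need alias resolution.

```python
def join_paths(paths):
    """Return the set of undirected links seen between consecutive hops."""
    links = set()
    for path in paths:
        for a, b in zip(path, path[1:]):
            if a is None or b is None:
                continue  # a silent hop hides the links on either side of it
            links.add(tuple(sorted((a, b))))
    return links

# Hypothetical traceroutes from two vantage points toward two destinations.
paths = [
    ["s1", "r1", "r2", "d1"],
    ["s2", "r1", "r2", "d2"],
    ["s1", "r1", None, "d2"],  # one router did not reply
]
links = join_paths(paths)
```

The third path contributes no new links: the silent hop leaves a gap that only another vantage point or destination can fill.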
Active Measurement: Pathchar for Links
$rtt(i+1) - rtt(i) = d + L/c + \epsilon$
i : initial TTL value
c : link capacity
L : packet size
Three delay components:
d : propagation delay
L / c : transmission delay
$\epsilon$ : queueing delay + noise
How to infer d, c?
[Figure: minimum of rtt(i+1) - rtt(i) plotted against packet size L: a straight line with slope 1/c and intercept d]
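One way to read the plot, sketched under the slide's model rtt(i+1) - rtt(i) = d + L/c + ε: taking the minimum over many probes of the same size filters out queueing delay (ε ≥ 0), and a least-squares line through the minima gives slope 1/c and intercept d. The data are synthetic and the function name is hypothetical; this is not the pathchar implementation.

```python
def fit_link(samples):
    """samples maps packet size L (bytes) to a list of measured
    rtt(i+1) - rtt(i) values (seconds). Returns (d, c)."""
    sizes = sorted(samples)
    minima = [min(samples[size]) for size in sizes]  # min filters out queueing
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(minima) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, minima))
    slope = sxy / sxx                 # slope = 1 / c
    d = mean_y - slope * mean_x       # intercept = propagation delay
    return d, 1.0 / slope

# Synthetic link: d = 5 ms, c = 1.25e6 bytes/s (10 Mb/s), plus queueing noise.
true_d, true_c = 0.005, 1.25e6
noise = [0.002, 0.0, 0.0007]  # hypothetical queueing delays, one probe each
samples = {
    size: [true_d + size / true_c + eps for eps in noise]
    for size in (64, 512, 1024, 1500)
}
d_hat, c_hat = fit_link(samples)
```

With real probes the minimum is only an approximation of the queue-free delay, so more probes per size give a better fit.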
Passive Measurement: Logs at Hosts
• Web server logs
– Host, time, URL, response code, content length, …
– E.g., 122.345.131.2 - - [15/Oct/1998:00:00:25 0400] "GET /images/wwwtlogo.gif HTTP/1.0" 304
- "http://www.aflcio.org/home.htm" "Mozilla/2.0
(compatible; MSIE 3.02; Update a; AK; AOL 4.0;
Windows 95)" "-"
• DNS logs
– Request, response, time
• Useful for workload characterization,
troubleshooting, etc.
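A minimal sketch of extracting fields from a log entry like the one above, assuming the common layout host, two identity fields, bracketed time, quoted request, response code, and content length. The regular expression and field names are illustrative; real servers can be configured with other formats.

```python
import re

LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<length>\S+)'
)

def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # "-" means no content length was logged (e.g., a 304 response).
    fields["length"] = None if fields["length"] == "-" else int(fields["length"])
    return fields

# The example entry from the slide, joined into a single line.
line = (
    '122.345.131.2 - - [15/Oct/1998:00:00:25 0400] '
    '"GET /images/wwwtlogo.gif HTTP/1.0" 304 - '
    '"http://www.aflcio.org/home.htm" '
    '"Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; AOL 4.0; Windows 95)" "-"'
)
entry = parse_log_line(line)
```

Counting entries per URL or per response code over a whole log is then a small step toward workload characterization.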
Passive Measurement: SNMP
• Simple Network Management Protocol
– Coarse-grained counters on the router
– E.g., byte and packet counts
• Polling
– Management system can poll the counters
– E.g., once every five minutes
• Limitations
– Extremely coarse-grained statistics
– Delivered over UDP!
• Advantages: ubiquitous
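Turning two polls of a byte counter into link utilization can be sketched as follows. The sketch assumes a 32-bit counter that wraps at most once between polls; the counter values, link speed, and function names are hypothetical.

```python
def counter_delta(previous, current, bits=32):
    """Difference between two readings of a wrapping counter."""
    return (current - previous) % (1 << bits)

def utilization(prev_octets, curr_octets, interval_s, link_bps, bits=32):
    """Average fraction of link capacity used between two polls."""
    bits_sent = 8 * counter_delta(prev_octets, curr_octets, bits)
    return bits_sent / (interval_s * link_bps)

# Two polls five minutes apart on a 100 Mb/s link; the counter wrapped in between.
load = utilization(4_000_000_000, 455_032_704, interval_s=300, link_bps=100_000_000)
```

Here `load` is 0.2, an average over the whole five minutes. A short burst within the interval is invisible, which is the coarse granularity the slide warns about, and a lost UDP response simply stretches the interval to the next successful poll.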
Passive Measurement: Packet Monitoring
• Tapping a link
– Shared media (Ethernet, wireless): monitor attached alongside Host A and Host B
– Multicast switch: switch connecting Hosts A, B, and C also delivers the traffic to the monitor
– Splitting a point-to-point link: monitor taps the link between Router A and Router B
– Line card that does packet sampling (inside Router A)
Packet Monitoring: Selecting the Traffic
• Filter to focus on a subset of the packets
– IP addresses/prefixes (e.g., to/from specific Web
sites, client machines, DNS servers, mail servers)
– Protocol (e.g., TCP, UDP, or ICMP)
– Port numbers (e.g., HTTP, DNS, BGP, Napster)
• Collect first n bytes of packet (snap length)
– Medium access control header (if present)
– IP header (typically 20 bytes)
– IP+UDP header (typically 28 bytes)
– IP+TCP header (typically 40 bytes)
– Application-layer message (entire packet)
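Filtering and snap length can be sketched on a single packet: parse the 20-byte IPv4 header, keep the packet only if the protocol matches, and store just the first n bytes. The packet is built by hand for illustration, IP options are not handled, and the function names are hypothetical.

```python
import struct

PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}

def parse_ipv4_header(snapshot: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header."""
    if len(snapshot) < 20:
        raise ValueError("snapshot too short for an IPv4 header")
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     ttl, protocol, _checksum, src, dst) = struct.unpack(
        "!BBHHHBBH4s4s", snapshot[:20])
    return {
        "version": version_ihl >> 4,
        "header_length": (version_ihl & 0x0F) * 4,  # IHL is in 32-bit words
        "total_length": total_length,
        "ttl": ttl,
        "protocol": PROTOCOL_NAMES.get(protocol, str(protocol)),
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }

def capture(packet: bytes, snap_length: int, wanted_protocol: str):
    """Apply a protocol filter, then keep only the first snap_length bytes."""
    if parse_ipv4_header(packet)["protocol"] != wanted_protocol:
        return None
    return packet[:snap_length]

# Hand-built example: IPv4 header + 20-byte TCP header (zeros) + 100-byte payload.
ip_header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 140, 0, 0x4000, 64, 6, 0,
                        bytes([135, 207, 38, 125]), bytes([192, 0, 2, 80]))
packet = ip_header + bytes(20) + bytes(100)
snapshot = capture(packet, snap_length=40, wanted_protocol="TCP")
```

The 40-byte snapshot still reports the original `total_length` of 140, so packet sizes can be analyzed without storing payloads.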
Tcpdump Output
(three-way TCP handshake and HTTP request message)
Fields in each line: timestamp, client address and port #, Web server (port 80)
23:40:21.008043 eth0 > 135.207.38.125.1043 > lovelace.acm.org.www: S
617756405:617756405(0) win 32120 <mss 1460,sackOK,timestamp 46339
0,nop,wscale 0> (DF)
Annotations on the first packet: SYN flag, sequence number, TCP options
23:40:21.036758 eth0 < lovelace.acm.org.www > 135.207.38.125.1043: S
2598794605:2598794605(0) ack 617756406 win 16384 <mss 512>
23:40:21.036789 eth0 > 135.207.38.125.1043 > lovelace.acm.org.www: .
1:1(0) ack 1 win 32120 (DF)
23:40:21.037372 eth0 > 135.207.38.125.1043 > lovelace.acm.org.www: P
1:513(512) ack 1 win 32256 (DF)
23:40:21.085106 eth0 < lovelace.acm.org.www > 135.207.38.125.1043: .
1:1(0) ack 513 win 16384
23:40:21.085140 eth0 > 135.207.38.125.1043 > lovelace.acm.org.www: P
513:676(163) ack 1 win 32256 (DF)
23:40:21.124835 eth0 < lovelace.acm.org.www > 135.207.38.125.1043: P
1:179(178) ack 676 win 16384
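As one example of what such a trace yields, the gap between the client's SYN and the server's SYN-ACK approximates the round-trip time as seen at the monitor. The sketch uses the two timestamps from the output above; the helper name is hypothetical.

```python
from datetime import datetime

def parse_timestamp(text):
    """Parse a tcpdump-style time of day with microseconds."""
    return datetime.strptime(text, "%H:%M:%S.%f")

syn_sent = parse_timestamp("23:40:21.008043")        # client SYN leaves
syn_ack_seen = parse_timestamp("23:40:21.036758")    # server SYN-ACK arrives
handshake_rtt = (syn_ack_seen - syn_sent).total_seconds()
```

The result is about 28.7 ms. It includes the server's time to process the SYN, so it is an upper bound on the network round-trip time.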
Analysis of Packet Traces
• IP header
– Traffic volume by IP addresses or protocol
– Burstiness of the stream of packets
– Packet properties (e.g., sizes, out-of-order, etc.)
• TCP header
– Traffic breakdown by application (e.g., Web)
– TCP congestion and flow control
– Number of bytes and packets per session
• Application header
– URLs, HTTP headers (e.g., cacheable response?)
– DNS queries and responses, user key strokes, …
Aggregating Packets into IP Flows
[Figure: packets on a timeline, grouped into flow 1, flow 2, flow 3, and flow 4]
• Set of packets that “belong together”
– Source/destination IP addresses and port numbers
– Same protocol, ToS bits, …
– Same input/output interfaces at a router (if known)
• Packets that are “close” together in time
– Maximum spacing between packets (e.g., 15 sec, 30 sec)
– Example: flows 2 and 4 are different flows due to time
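The two grouping rules can be sketched together: packets share a flow if they have the same key and arrive within a maximum spacing of the previous packet with that key. The key, the 30-second timeout, and the record fields are illustrative choices, not a particular router's flow definition.

```python
def aggregate_flows(packets, timeout=30.0):
    """packets: iterable of (timestamp, key, size), sorted by timestamp.
    Returns a list of flow records."""
    active = {}     # key -> flow currently being built
    finished = []
    for timestamp, key, size in packets:
        flow = active.get(key)
        if flow is not None and timestamp - flow["last"] > timeout:
            finished.append(flow)   # gap too long: close the old flow
            flow = None
        if flow is None:
            flow = {"key": key, "first": timestamp, "last": timestamp,
                    "packets": 0, "bytes": 0}
            active[key] = flow
        flow["last"] = timestamp
        flow["packets"] += 1
        flow["bytes"] += size
    finished.extend(active.values())
    return finished

# Hypothetical keys: (source, destination, protocol, source port, destination port).
key_a = ("10.0.0.1", "192.0.2.9", "TCP", 1043, 80)
key_b = ("10.0.0.2", "192.0.2.9", "UDP", 5353, 53)
packets = [
    (0.0, key_a, 40), (1.0, key_a, 552), (1.5, key_b, 80),
    (2.0, key_a, 203), (50.0, key_a, 40),   # 48 s gap starts a new flow
]
flows = aggregate_flows(packets)
```

The last packet of `key_a` arrives 48 seconds after the previous one, so it is reported as a separate flow even though the addresses and ports are identical.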
Packet vs. Flow Measurement
• Basic statistics (available from both techniques)
– Traffic mix by IP addresses, port numbers, and protocol
– Average packet size
• Traffic over time
– Both: traffic volumes on a medium-to-large time scale
– Packet: burstiness of the traffic on a small time scale
• Statistics per TCP connection
– Both: number of packets & bytes transferred over the link
– Packet: frequency of lost or out-of-order packets, and the
number of application-level bytes delivered
• Per-packet info (available only from packet traces)
– TCP seq/ack #s, receiver window, per-packet flags, …
– Probability distribution of packet sizes
– Application-level header and body (full packet contents)
Measurement Challenges for Operators
• Network-wide view
– Crucial for evaluating control actions
– Multiple kinds of data from multiple locations
• Large scale
– Large number of high-speed links and routers
– Large volume of measurement data
• Poor state-of-the-art
– Working within existing protocols and products
– Technology not designed with measurement in mind
• The “do no harm” principle
– Don’t degrade router performance
– Don’t require disabling key router features
– Don’t overload the network with measurement data
Network Operations Tasks
• Reporting of network-wide statistics
– Generating basic information about usage and reliability
• Performance/reliability troubleshooting
– Detecting and diagnosing anomalous events
• Security
– Detecting, diagnosing, and blocking security problems
• Traffic engineering
– Adjusting network configuration to the prevailing traffic
• Capacity planning
– Deciding where and when to install new equipment
Basic Reporting
• Producing basic statistics about the network
– For business purposes, network planning, ad hoc studies
• Examples
– Proportion of transit vs. customer-customer traffic
– Total volume of traffic sent to/from each private peer
– Mixture of traffic by application (Web, Napster, etc.)
– Mixture of traffic to/from individual customers
– Usage, loss, and reliability trends for each link
• Requirements
– Network-wide view of basic traffic and reliability statistics
– Ability to “slice and dice” measurements in different ways
(e.g., by application, by customer, by peer, by link type)
Troubleshooting
• Detecting and diagnosing problems
– Recognizing and explaining anomalous events
• Examples
– Why a backbone link is suddenly overloaded
– Why the route to a destination prefix is flapping
– Why DNS queries are failing with high probability
– Why a route processor has high CPU utilization
– Why a customer cannot reach certain Web sites
• Requirements
– Network-wide view of many protocols and systems
– Diverse measurements at different protocol levels
– Thresholds for isolating significant phenomena
Security
• Detecting and diagnosing problems
– Recognizing suspicious traffic or disruptions
• Examples
– Denial-of-service attack on a customer or service
– Spread of a worm or virus through the network
– Route hijack of an address block by adversary
• Requirements
– Detailed measurements from multiple places
– Including deep-packet inspection, in some cases
– Online analysis of the data
– Installing filters to block the offending traffic
Traffic Engineering
• Adjusting resource allocation policies
– Path selection, buffer management, and link scheduling
• Examples
– OSPF weights to divert traffic from congested links
– BGP policies to balance load on peering links
– Link-scheduling weights to reduce delay for “gold” traffic
• Requirements
– Network-wide view of the traffic carried in the backbone
– Timely view of the network topology and configuration
– Accurate models to predict impact of control operations
(e.g., the impact of RED parameters on TCP throughput)
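A toy what-if model for the OSPF example: compute the shortest path under the current link weights, raise the weight of a congested link, and see where the traffic would move. The four-node topology and its weights are hypothetical, and the sketch ignores equal-cost multipath splitting.

```python
import heapq

def shortest_path(weights, src, dst):
    """weights maps a directed link (u, v) to its OSPF weight.
    Returns the lowest-weight path as a list of nodes, or None."""
    graph = {}
    for (u, v), w in weights.items():
        graph.setdefault(u, []).append((v, w))
    best = {src: 0}
    previous = {}
    heap = [(0, src)]
    while heap:
        dist, u = heapq.heappop(heap)
        if dist > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            candidate = dist + w
            if candidate < best.get(v, float("inf")):
                best[v] = candidate
                previous[v] = u
                heapq.heappush(heap, (candidate, v))
    if dst not in best:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(previous[path[-1]])
    return path[::-1]

# Hypothetical backbone: two routes from A to D.
weights = {("A", "B"): 1, ("B", "D"): 1, ("A", "C"): 2, ("C", "D"): 2}
before = shortest_path(weights, "A", "D")

# What if link B->D is congested and its weight is raised to 5?
changed = dict(weights)
changed[("B", "D")] = 5
after = shortest_path(changed, "A", "D")
```

Combined with a measured traffic matrix, the same computation would predict the new load on every link before the change is made in the live network.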
Capacity Planning
• Deciding whether to buy/install new equipment
– What? Where? When?
• Examples
– Where to put the next backbone router
– When to upgrade a link to higher capacity
– Whether to add/remove a particular peer
– Whether the network can accommodate a new customer
– Whether to install a caching proxy for cable modems
• Requirements
– Projections of future traffic patterns from measurements
– Cost estimates for buying/deploying the new equipment
– Model of the potential impact of the change (e.g., latency
reduction and bandwidth savings from a caching proxy)
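A very simple projection, assuming link utilization grows linearly: fit a line to monthly peak utilization and estimate how many months remain until it crosses an upgrade threshold. The measurements, the 80% threshold, and the function name are hypothetical, and real traffic growth is often not linear.

```python
def months_until_threshold(utilization, threshold=0.8):
    """utilization: monthly peak utilization values, oldest first.
    Returns months from the latest sample until the fitted trend reaches
    the threshold, or None if the trend is flat or decreasing."""
    n = len(utilization)
    mean_x = (n - 1) / 2
    mean_y = sum(utilization) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(utilization))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing = (threshold - intercept) / slope   # month index of the crossing
    return max(0.0, crossing - (n - 1))

# Hypothetical measurements for one backbone link over five months.
history = [0.40, 0.45, 0.50, 0.55, 0.60]
months_left = months_until_threshold(history)
```

For this history the trend gains five percentage points per month, so the 80% threshold is about four months away.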
Examples of Public Data Sets
• Network-wide data
– Abilene and GEANT backbones
– Netflow, IGP, and BGP traces
• CAIDA DatCat
– Data catalogue maintained by CAIDA
– http://imdc.datcat.org/
• Interdomain routing
– RouteViews and RIPE-NCC
– BGP routing tables and update messages
• Traceroute and looking glass servers
– http://www.traceroute.org/
– http://www.nanog.org/lookingglass.html
Discussion
• How important is accuracy of the data?
• How can we validate measurement studies? (If we
know the answer already, why are we measuring?)
• How to do controlled experiments with measurement
techniques?
• Can we move measurement to a science rather than
an art?
• Can we identify incentives for making measurement
possible and data available?
• Distributed analysis of measurement data?
• An architecture for router or line-card support for
traffic and performance measurement?
• Trade-offs between security and privacy?