Network Measurement

Jennifer Rexford
Advanced Computer Networks
http://www.cs.princeton.edu/courses/archive/fall08/cos561/
Tuesdays/Thursdays 1:30pm-2:50pm
Outline
• Traffic
– SNMP link statistics
– Packet and flow monitoring
• Network topology
– IP routers and links
– Intradomain route monitoring
• Interdomain routes
– BGP route monitoring and table dumps
– Inferring AS-level topology and biz relationships
• Conclusions
Traffic Measurement
Why is Traffic Measurement Important?
• Billing the customer
– Measure usage on links to/from customers
– Apply billing model to generate a bill
• Traffic engineering and capacity planning
– Measure the traffic matrix (i.e., offered load)
– Tune routing protocol or add new capacity
• Denial-of-service attack detection
– Identify anomalies in the traffic
– Configure routers to block the offending traffic
• Analyze application-level issues
– Evaluate benefits of deploying a Web caching proxy
– Quantify fraction of traffic that is P2P file sharing
Collecting Traffic Data: SNMP
• Simple Network Management Protocol
– Standard Management Information Base (MIB)
– Protocol for querying the MIBs
• Advantage: ubiquitous
– Supported on all networking equipment
– Multiple products for polling and analyzing data
• Disadvantages: dumb
– Coarse granularity of the measurement data
• E.g., number of bytes/packets per interface per 5 minutes
– Cannot express complex queries on the data
– Unreliable delivery of the data using UDP
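A minimal sketch of how polled SNMP byte counters turn into a link utilization figure. The function name, the 32-bit counter width, and the sample numbers are assumptions made for illustration; they are not taken from the slides.

```python
def link_utilization(prev_octets, curr_octets, interval_s, link_bps, counter_bits=32):
    """Average utilization (0..1) between two polls of an interface byte counter."""
    modulus = 1 << counter_bits
    # If the counter wrapped once between polls, curr < prev;
    # modular subtraction still recovers the true byte delta.
    delta_bytes = (curr_octets - prev_octets) % modulus
    return (delta_bytes * 8) / (interval_s * link_bps)


# Two polls taken 300 s apart on a hypothetical 100 Mbit/s link.
print(link_utilization(1_000_000, 1_876_000_000, 300, 100_000_000))  # 0.5
```

The result is only an average over the polling interval, which is the coarse granularity the slide refers to: bursts shorter than five minutes are invisible.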
Collecting Traffic Data: Packet Monitoring
• Packet monitoring
– Passively collecting IP packets on a link
– Recording IP, TCP/UDP, or application-layer traces
• Advantages: details
– Fine-grain timing information
• E.g., can analyze the burstiness of the traffic
– Fine-grain packet contents
• Addresses, port numbers, TCP flags, URLs, etc.
• Disadvantages: overhead
– Hard to keep up with high-speed links
– Often requires a separate monitoring device
Collecting Traffic Data: Flow Statistics
• Flow monitoring (e.g., Cisco Netflow)
– Statistics about groups of related packets (e.g.,
same IP/TCP headers and close in time)
– Recording header information, counts, and time
• Advantages: detail with less overhead
– Almost as good as packet monitoring, except no
fine-grain timing information or packet contents
– Often implemented directly on the interface card
• Disadvantages: trade-off detail and overhead
– Less detail than packet monitoring
– Less ubiquitous than SNMP statistics
Using the Traffic Data in Network Operations
• SNMP byte/packet counts: everywhere
– Tracking link utilizations and detecting anomalies
– Generating bills for traffic on customer links
– Inference of the offered load (i.e., traffic matrix)
• Packet monitoring: selected locations
– Analyzing the small time-scale behavior of traffic
– Troubleshooting specific problems on demand
• Flow monitoring: selective, e.g., network edge
– Tracking the application mix
– Direct computation of the traffic matrix
– Input to denial-of-service attack detection
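A rough sketch of "direct computation of the traffic matrix" from flow records. It assumes each record has already been labeled with an ingress and egress point; the tuple layout and router names are hypothetical.

```python
from collections import defaultdict


def traffic_matrix(flow_records):
    """Sum bytes per (ingress, egress) pair from (ingress, egress, bytes) tuples."""
    matrix = defaultdict(int)
    for ingress, egress, num_bytes in flow_records:
        matrix[(ingress, egress)] += num_bytes
    return dict(matrix)


records = [("nyc", "sfo", 1200), ("nyc", "sfo", 300), ("chi", "nyc", 500)]
print(traffic_matrix(records))
```

With SNMP link counts alone, the same matrix has to be inferred rather than summed, because link totals do not say where the traffic entered or left the network.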
Flow Measurement
Flow Measurement: Outline
• Definition
– Passively collecting statistics about groups of packets
– Group packets based on headers and spacing in time
– Essentially a way to aggregate packet measurement data
• Scope
– Medium-grain information about user behavior
– Passively monitoring the link or the interface/router
– Helpful in characterizing, detecting, diagnosing, and fixing
• Outline
– Definition of an IP “flow” (sequence of related packets)
– Flow measurement data and its applications
– Mechanics of collecting flow-level measurements
– Reducing the overheads of flow-level measurement
IP Flows
[Figure: packets on a timeline grouped into flow 1, flow 2, flow 3, and flow 4]
• Set of packets that “belong together”
– Source/destination IP addresses and port numbers
– Same protocol, ToS bits, …
– Same input/output interfaces at a router (if known)
• Packets that are “close” together in time
– Maximum spacing between packets (e.g., 15 sec, 30 sec)
– Example: flows 2 and 4 are different flows due to time
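A small sketch of the "close together in time" rule. It assumes the packets already share the same header fields and that their arrival times are sorted; the function name and the 30-second default are illustrative.

```python
def split_into_flows(timestamps, max_gap=30.0):
    """Split sorted arrival times of same-header packets into flows."""
    flows = []
    for t in timestamps:
        if flows and t - flows[-1][-1] <= max_gap:
            flows[-1].append(t)   # close enough to the previous packet
        else:
            flows.append([t])     # gap too large: start a new flow
    return flows


print(split_into_flows([0.0, 1.0, 2.5, 40.0, 41.0]))
# [[0.0, 1.0, 2.5], [40.0, 41.0]]
```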
Flow Abstraction
• A flow is not exactly the same as a “session”
– Sequence of related packets may be multiple flows
(due to the “close together in time” requirement)
– Sequence of related packets may not follow the same links
(due to changes in IP routing)
– A “session” is difficult to measure from inside the network
• Motivation for this abstraction
– As close to a “session” as possible from inside the network
– Flow switching paradigm from IP-over-ATM technology
– Router optimization for forwarding/access-control decisions
(cache the result after the first packet in a flow)
– … might as well throw in a few counters
Recording Traffic Statistics (e.g., Netflow)
• Packet header information (same for every packet)
– Source and destination IP addresses
– Source and destination TCP/UDP port numbers
– Other IP & TCP/UDP header fields (protocol, ToS bits, etc.)
• Aggregate traffic information (summary of the traffic)
– Start and finish time of the flow (time of first & last packet)
– Total number of bytes and number of packets in the flow
– TCP flags (e.g., logical OR over the sequence of packets)
[Figure: a flow of four packets (SYN, ACK, ACK, FIN) from start to finish, summarized as 4 packets, 1436 bytes, flags SYN, ACK, & FIN]
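A sketch of how one flow record might accumulate the aggregate fields listed above. The class and field names are hypothetical, and the individual packet sizes are made up so that they total the 1436 bytes of the slide's example; the flag bit values are the standard TCP ones.

```python
from dataclasses import dataclass

SYN, ACK, FIN = 0x02, 0x10, 0x01  # TCP flag bit values


@dataclass
class FlowRecord:
    start: float
    finish: float
    packets: int = 0
    num_bytes: int = 0
    tcp_flags: int = 0

    def update(self, timestamp, size, flags):
        self.finish = timestamp    # time of the last packet seen so far
        self.packets += 1
        self.num_bytes += size
        self.tcp_flags |= flags    # logical OR over the sequence of packets


trace = [(0.0, 60, SYN), (0.1, 52, ACK), (0.2, 1272, ACK), (0.9, 52, FIN)]
record = FlowRecord(start=trace[0][0], finish=trace[0][0])
for ts, size, flags in trace:
    record.update(ts, size, flags)
print(record)
```

The OR keeps only which flags appeared, not how often or in what order, which is part of the detail lost relative to a packet trace.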
Recording Routing Info (e.g., Netflow)
• Input and output interfaces
– Input interface is where the packets entered the router
– Output interface is the “next hop” in the forwarding table
• Source and destination IP prefix (mask length)
– Longest prefix match on the src and dest IP addresses
• Source and destination autonomous system numbers
– Origin AS for src/dest prefix in the BGP routing table
[Figure: router architecture showing the processor, forwarding table, BGP table, switching fabric, and line cards]
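A sketch of the longest-prefix match used to attach a prefix and origin AS to a flow's source or destination address. The table contents are hypothetical, and a linear scan stands in for the trie or hardware lookup a real router would use.

```python
import ipaddress


def longest_prefix_match(address, table):
    """Return (prefix, value) for the most specific prefix containing address."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, value in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, value)
    return None if best is None else (str(best[0]), best[1])


# Hypothetical table mapping prefixes to origin AS numbers.
table = {"12.0.0.0/8": 7018, "12.34.158.0/24": 1}
print(longest_prefix_match("12.34.158.5", table))  # ('12.34.158.0/24', 1)
```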
Measuring Traffic as it Flows By
[Figure: traffic passing from source to dest through a router's input and output interfaces, annotated with source prefix, dest prefix, source AS, intermediate AS, and dest AS]
Source and destination: IP header
Source and dest prefix: forwarding table or BGP table
Source and destination AS: BGP table
Packet vs. Flow Measurement
• Basic statistics (available from both techniques)
– Traffic mix by IP addresses, port numbers, and protocol
– Average packet size
• Traffic over time
– Both: traffic volumes on a medium-to-large time scale
– Packet: burstiness of the traffic on a small time scale
• Statistics per TCP connection
– Both: number of packets & bytes transferred over the link
– Packet: frequency of lost or out-of-order packets, and the
number of application-level bytes delivered
• Per-packet info (available only from packet traces)
– TCP seq/ack #s, receiver window, per-packet flags, …
– Probability distribution of packet sizes
– Application-level header and body (full packet contents)
Collecting Flow Measurements
• Route CPU that generates flow records
– … may degrade forwarding performance
• Line card that generates flow records
– … more efficient to support measurement in each line card
• Packet monitor that generates flow records
– … third party
Router Collecting Flow Measurement
• Advantage
– No need for separate measurement device(s)
– Monitor traffic over all links in/out of router (parallelism)
– Ease of providing routing information for each flow
• Disadvantage
– Requirement for support in the router product(s)
– Danger of competing with other 1st-order router features
– Possible degradation of the throughput of the router
– Difficulty of online analysis/aggregation of data on router
• Practical application
– View from multiple vantage points (e.g., all edge links)
Packet Monitor Collecting Flow Records
• Advantages
– No performance impact on packet forwarding
– No dependence on support by router vendor
– Possibility of customizing the thinning of the data
• Disadvantages
– Overhead/cost of tapping a link & reconstructing packets
– Cost of buying, deploying, and managing extra equipment
– No access to routing info (input/output link, IP prefix, etc.)
• Practical application
– Selective monitoring of a small number of links
– Deployment in front of particular services or sites
• Packet monitor vendors support flow-level output
Mechanics: Flow Cache
• Maintain a cache of active flows
– Storage of byte/packet counts, timestamps, etc.
• Compute a key per incoming packet
– Concatenation of source, destination, port #s, etc.
• Index into the flow cache based on the key
– Creation or updating of an entry in the flow cache
[Figure: a key computed from the packet header indexes the flow cache; each entry holds key, #bytes, #packets, start, finish]
Mechanics: Evicting Cache Entries
• Flow timeout
– Remove flows that have not received a packet recently
– Periodic sequencing through the cache to time out flows
– New packet triggers the creation of a new flow
• Cache replacement
– Remove flow(s) when the flow cache is full
– Evict existing flow(s) upon creating a new cache entry
– Apply eviction policy (LRU, random flow, etc.)
• Long-lived flows
– Remove flow(s) that persist for a long time (e.g., 30 min)
– … otherwise flow statistics don’t become available
– … and the byte and packet counters might overflow
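A compact sketch combining the flow cache with the three eviction rules above: idle timeout, cache replacement (LRU here), and a cap on long-lived flows. The class name, default values, and the choice to sweep the cache on every packet are simplifications for illustration; a router would sweep periodically instead.

```python
from collections import OrderedDict


class FlowCache:
    def __init__(self, capacity=2, idle_timeout=30.0, active_timeout=1800.0):
        self.capacity = capacity
        self.idle_timeout = idle_timeout
        self.active_timeout = active_timeout
        self.entries = OrderedDict()  # key -> [bytes, packets, start, finish]
        self.exported = []            # evicted records: (key, bytes, packets, start, finish)

    def _evict(self, key):
        self.exported.append((key, *self.entries.pop(key)))

    def expire(self, now):
        """Flush flows that are idle too long or have lived too long."""
        for key, (_, _, start, finish) in list(self.entries.items()):
            if now - finish > self.idle_timeout or now - start > self.active_timeout:
                self._evict(key)

    def add_packet(self, key, size, now):
        self.expire(now)
        if key not in self.entries:
            if len(self.entries) >= self.capacity:
                self._evict(next(iter(self.entries)))  # least recently used entry
            self.entries[key] = [0, 0, now, now]
        entry = self.entries[key]
        entry[0] += size
        entry[1] += 1
        entry[3] = now
        self.entries.move_to_end(key)  # mark as most recently used


cache = FlowCache(capacity=2, idle_timeout=30.0)
for key, size, now in [("a", 100, 0.0), ("b", 50, 1.0), ("a", 100, 2.0),
                       ("c", 10, 3.0), ("a", 100, 40.0)]:
    cache.add_packet(key, size, now)
print(cache.exported)
```

In this run, flow "b" is evicted by cache replacement when "c" arrives, and the first "a" flow and "c" are flushed by the idle timeout, so the final packet for "a" starts a new flow.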
Sampling: Packet Sampling
• Packet sampling before flow creation (Sampled Netflow)
– 1-out-of-m sampling of individual packets (e.g., m=100)
– Creation of flow records over the sampled packets
• Reducing overhead
– Avoid per-packet overhead on (m-1)/m packets
– Avoid creating records for a large number of small flows
• Increasing overhead (in some cases)
– May split some long transfers into multiple flow records
– … due to larger time gaps between successive packets
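[Figure: timeline in which packets that are not sampled leave a gap longer than the timeout, so one transfer is recorded as two flows]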
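A sketch of the splitting effect described above. It assumes deterministic 1-out-of-m sampling (every m-th packet) and a made-up transfer; sampling in a real router may be random rather than periodic.

```python
def sample_one_in_m(packets, m):
    """Keep every m-th packet (deterministic 1-out-of-m sampling)."""
    return packets[m - 1::m]


def count_flows(timestamps, timeout):
    """Number of flows after splitting wherever the gap exceeds the timeout."""
    if not timestamps:
        return 0
    gaps = zip(timestamps, timestamps[1:])
    return 1 + sum(1 for prev, cur in gaps if cur - prev > timeout)


# A hypothetical transfer: one packet every 0.5 s for 200 s.
arrivals = [i * 0.5 for i in range(400)]
sampled = sample_one_in_m(arrivals, 100)

print(count_flows(arrivals, 30.0))  # 1 flow without sampling
print(count_flows(sampled, 30.0))   # 4 flows after sampling
print(len(sampled) * 100)           # 400, packet count estimated by scaling up by m
```

The sampled packets are 50 seconds apart, which exceeds the 30-second timeout, so a single long transfer shows up as four one-packet flow records.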
Conclusions
• Flow measurement
– Medium-grain view of traffic on one or more links
• Advantages
– Lower measurement volume than full packet traces
– Available on high-end line cards (Cisco Netflow)
– Control over overhead via aggregation and sampling
• Disadvantages
– Computation and memory requirements for the flow cache
– Loss of fine-grain timing and per-packet information
– Not uniformly supported by router vendors
Intradomain Network Topology
IP Topology
• Topology information
– Routers
– Links, and their capacities
• Internal links inside the AS
• Edge links connecting to neighboring domains
• Ways to learn the topology
– Inventory database
– SNMP polling/traps
– Traceroute
– Route monitoring
– Router configuration data
Below IP
• Layer-2 paths
– ATM virtual circuits
– Frame Relay virtual circuits
• Mapping to lower layers
– Specific fibers
– Shared optical amplifiers
– Shared conduits
– Physical length (propagation delay)
• Information not visible to IP
– Stored in an inventory database
– Not necessarily generated/updated automatically
Intradomain Monitoring: OSPF Protocol
• Link-state protocol
– Routers flood Link State Advertisements (LSAs)
– Routers compute shortest paths based on weights
– Routers identify next-hop to reach other routers
[Figure: example topology with OSPF link weights]
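A sketch of the computation each router performs once it has the full link-state database: Dijkstra's algorithm over the link weights, recording the next hop toward every other router. The four-router graph and its weights are hypothetical, not the topology from the slide's figure.

```python
import heapq


def shortest_paths(graph, source):
    """Return {node: (distance, next_hop)} using Dijkstra's algorithm."""
    result = {}
    queue = [(0, source, None)]  # (distance, node, first hop from source)
    while queue:
        dist, node, first_hop = heapq.heappop(queue)
        if node in result:
            continue  # already settled with a shorter or equal distance
        result[node] = (dist, first_hop)
        for neighbor, weight in graph[node].items():
            if neighbor not in result:
                hop = neighbor if node == source else first_hop
                heapq.heappush(queue, (dist + weight, neighbor, hop))
    return result


graph = {
    "A": {"B": 2, "C": 1},
    "B": {"A": 2, "D": 3},
    "C": {"A": 1, "D": 5},
    "D": {"B": 3, "C": 5},
}
print(shortest_paths(graph, "A"))
```

A route monitor that collects the same LSAs can run the same computation offline, which is what makes the data usable as input to traffic-engineering and planning tools.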
Intradomain Route Monitoring
• Construct continuous view of topology
– Detect when equipment goes up or down
– Input to traffic-engineering and planning tools
• Detect routing anomalies
– Identify failures, LSA storms, and route flaps
– Verify that LSA load matches expectations
– Flag strange weight settings as misconfigurations
• Analyze convergence delay
– Monitor LSAs in multiple locations with go
– Compare the times when LSAs arrive
• Detect router implementation mistakes
Passive Collection of LSAs
• OSPF is a flooding protocol
– Every LSA sent on every participating link
– Very helpful for simplifying the monitor
• Can participate in the protocol
– Shared media (e.g., Ethernet)
• Join multicast group and listen to LSAs
– Point-to-point links
• Establish an adjacency with a router
• … or passively monitor packets on a link
– Tap a link and capture the OSPF packets
Interdomain Route Monitoring
Motivation for BGP Monitoring
• Visibility into external destinations
– What neighboring ASes are telling you
– How you are reaching external destinations
• Detecting anomalies
– Increases in number of destination prefixes
– Lost reachability to some destinations
– Route hijacking
– Instability of the routes
• Input to traffic-engineering tools
– Knowing the current routes in the network
• Workload for testing routers
– Realistic message traces to play back to routers
BGP Monitoring: A Wish List
• Ideally: knowing what the router knows
– All externally-learned routes
– Before policy has modified the attributes
– Before a single best route is picked
• How to achieve this
– Special monitoring session on routers that tells
everything they have learned
– Packet monitoring on all links with BGP sessions
• If you can’t do that, you could always do…
– Periodic dumps of routing tables
– BGP session to learn best route from router
Using Routers to Monitor BGP
• Talk to operational routers using SNMP or telnet at command line
– (-) BGP table dumps are expensive
– (+) Table dumps show all alternate routes
– (-) Update dynamics lost
– (-) Restricted to interfaces provided by vendors
• Establish a “passive” BGP session (eBGP or iBGP) from a workstation running BGP software
– (+) BGP table dumps do not burden operational routers
– (-) Receives only best routes from BGP neighbor
– (+) Update dynamics captured
– (+) Not restricted to interfaces provided by vendors
Collect BGP Data From Many Routers
[Figure: map of backbone routers in U.S. cities (Seattle, San Francisco, Los Angeles, Denver, Chicago, New York, Washington, D.C., Atlanta, Dallas, and others) with a route monitor]
BGP is not a flooding protocol
Example: BGP Table (“show ip bgp” at RouteViews)
   Network          Next Hop          Metric LocPrf Weight Path
*  3.0.0.0          205.215.45.50                        0 4006 701 80 i
*                   167.142.3.6                          0 5056 701 80 i
*                   157.22.9.7                           0 715 1 701 80 i
*                   195.219.96.239                       0 8297 6453 701 80 i
*                   195.211.29.254                       0 5409 6667 6427 3356 701 80 i
*>                  12.127.0.249                         0 7018 701 80 i
*                   213.200.87.254       929             0 3257 701 80 i
*  9.184.112.0/20   205.215.45.50                        0 4006 6461 3786 i
*                   195.66.225.254                       0 5459 6461 3786 i
*>                  203.62.248.4                         0 1221 3786 i
*                   167.142.3.6                          0 5056 6461 6461 3786 i
*                   195.219.96.239                       0 8297 6461 3786 i
*                   195.211.29.254                       0 5409 6461 3786 i
AS 80 is General Electric, AS 701 is UUNET, AS 7018 is AT&T
AS 3786 is DACOM (Korea), AS 1221 is Telstra
Inferring the AS Topology
What is the AS Graph?
• Node: Autonomous System
• Edge: Two ASes that speak BGP to each other
[Figure: example AS graph with nodes 1 through 7]
How Do You Know a Node or Edge Exists?
• Consult the Whois database?
– Tells which ASes have been allocated
– But, might be out-of-date on who owns it
– … and often doesn’t say who the neighbors are
• See a path that uses the node/edge
– Collect measurements of AS paths
– Extract all of the nodes and edges
– E.g., AS path “7018 1 88” implies
• Nodes: 7018, 1, and 88
• Edges: (7018, 1) and (1, 88)
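A sketch of extracting nodes and edges from observed AS paths. The function name is illustrative; edges are stored as sorted pairs because an AS path alone does not give an adjacency a direction, and repeated AS numbers (prepending) are collapsed so they do not become self-loops.

```python
def as_graph(paths):
    """Build (nodes, edges) from AS paths given as space-separated strings."""
    nodes, edges = set(), set()
    for path in paths:
        hops = [int(asn) for asn in path.split()]
        # Collapse AS prepending such as "6461 6461" into a single hop.
        hops = [a for i, a in enumerate(hops) if i == 0 or a != hops[i - 1]]
        nodes.update(hops)
        edges.update(tuple(sorted(pair)) for pair in zip(hops, hops[1:]))
    return nodes, edges


nodes, edges = as_graph(["7018 1 88", "5056 6461 6461 3786"])
print(sorted(nodes))
print(sorted(edges))
```

Only edges that appear on some observed path are found this way, so the resulting graph depends on where the paths were collected.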
Interdomain Routing Policies
• Two main decisions
– Path selection: which of the paths to use?
– Path export: which neighbors to tell?
• Both driven by business relationships, e.g.,
– Customer pays provider for Internet access
– Peers find it mutually advantageous to cooperate
[Figure: AS 1 announces “12.34.158.0/24: path (1)” to AS 2, which announces “12.34.158.0/24: path (2,1)” to AS 3; data traffic to 12.34.158.5 flows in the opposite direction]
Customer-Provider Relationship
• Customer needs to be reachable from everyone
– Provider exports routes learned from customer to everyone
• Customer does not want to provide transit service
– Customer does not export from one provider to another
[Figure: two panels, “Traffic to the customer” and “Traffic from the customer”, each showing a provider, a customer, and a destination d, with the directions of advertisements and traffic]
Peer-Peer Relationship
• Peers exchange traffic between customers
– AS exports only customer routes to a peer
– AS exports a peer’s routes only to its customers
[Figure: “Traffic to/from the peer and its customers”, showing two peers and a destination d, with the directions of advertisements and traffic]
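The export rules on the customer-provider and peer-peer slides can be condensed into one small function. This is a sketch of the conventional policy only; the function name and string labels are illustrative, and it ignores the sibling, backup, and geography-specific arrangements that real networks also use.

```python
def should_export(learned_from, send_to):
    """Decide whether a route learned from one kind of neighbor is exported to another.

    Both arguments are "customer", "peer", or "provider",
    relative to the AS making the export decision.
    """
    if learned_from == "customer":
        return True               # customer routes are exported to everyone
    return send_to == "customer"  # peer and provider routes go only to customers


for src in ("customer", "peer", "provider"):
    for dst in ("customer", "peer", "provider"):
        print(f"{src:>8} -> {dst:<8} {should_export(src, dst)}")
```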
Paths You Should Never See (“Invalid”)
• Two peer edges
• Transit through a customer
[Figure: example paths, with a legend distinguishing customer-provider and peer-peer edges]
Other Kinds of Relationships
• Siblings
– Same company
– Mutual transit service
– Like one bigger AS
– Mergers, acquisitions, …
• Backup
– Used only upon failure
– Second provider
– Backup peering
• Geography-specific
– Customer in U.S.
– Peer in Europe
[Figure: example AS graph with ASes A through H, showing primary and backup links]
AS Relationships Matter
• Scientific understanding
– Understanding Internet structure and evolution
– Understanding why certain paths are used for traffic
• Placement of Web servers
– Want to be close to most customer networks
• Business decisions
– Selecting new peer or provider, or renegotiating relations
• Security policies
– Knowing which BGP routes look suspicious
• Analyzing BGP convergence
– Relationships have a big impact here (more later!)
Inferring AS Relationships
• Top down: how routes are selected
– AS relationships define routing policy
– Routing policy determines the routes you see
• Bottom up: how policies can be inferred
– Routing data are available from public sources
– The chosen routes tell you about the policy
• Example: seeing path “A B C” tells you…
– B permits A to transit through B to reach C
– (A,B) and (B,C) cannot both be peering links
– A and C are not both upstream providers of B
Type-of-Relationship Problem
• Given the inputs
– AS graph G(V,E) with vertices V and edges E
– Set of paths P on the graph G
• Find a solution that
– Labels each edge with an AS relationship
– Minimizes the number of “invalid” paths in P
• Rich area of research work
– http://www-unix.ecs.umass.edu/~lgao/ton.ps
– http://www.cs.princeton.edu/~jrex/papers/infocom02.pdf
– http://www.caida.org/publications/papers/2006/as_relationships_inference/
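A sketch of the objective in the type-of-relationship problem: given a candidate labeling of the edges, count how many observed paths are “invalid”. It checks the usual pattern of customer-to-provider edges, then at most one peer edge, then provider-to-customer edges. The label names ("c2p", "p2c", "p2p") and helper functions are hypothetical, and siblings are not modeled; this scores a labeling rather than searching for the best one.

```python
def directed(labels):
    """Expand {(customer, provider): "c2p", (x, y): "p2p"} to cover both directions."""
    rel = {}
    for (a, b), kind in labels.items():
        rel[(a, b)] = kind
        rel[(b, a)] = {"c2p": "p2c", "p2p": "p2p"}[kind]
    return rel


def is_valid(path, rel):
    """Uphill edges, then at most one peer edge, then downhill edges."""
    descending = False  # True once a peer or provider-to-customer edge is seen
    for a, b in zip(path, path[1:]):
        kind = rel[(a, b)]
        if descending and kind != "p2c":
            return False  # a second peer edge, or climbing again after descending
        if kind in ("p2p", "p2c"):
            descending = True
    return True


def count_invalid(paths, rel):
    return sum(1 for p in paths if not is_valid(p, rel))


rel = directed({("A", "B"): "p2p", ("B", "C"): "p2p", ("D", "B"): "c2p"})
paths = [["A", "B", "C"], ["A", "B", "D"], ["D", "B", "A"]]
print(count_invalid(paths, rel))  # 1: path A B C crosses two peer edges
```

An inference algorithm would search over labelings to drive this count down; the sketch only evaluates one labeling.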
Conclusions
• Passive measurements
– Traffic: SNMP, packets, and flows
– Routing: intradomain and interdomain
• Publicly-available measurements
– Netflow from Abilene Internet2
– BGP updates and table dumps from RouteViews
• Constructing AS-level topology
– AS graph based on edges in AS paths
– Inferring business relationships between ASes