MonNet A project for network and traffic monitoring

Assessing the Nature of Internet traffic:
Methods and Pitfalls
Wolfgang John
Chalmers University of Technology, Sweden
together with
Min Zhang
Beijing Jiaotong University, China
Maurizio Dusi
Università degli Studi di Brescia, Italy
kc claffy, Nevil Brownlee
CAIDA, SDSC, UCSD, USA
Introduction
• Traffic classification (TC)
[Figure: traffic flows of unknown type ("?") to be labeled by application, e.g. BitTorrent, HTTP, SMTP]
TrefPunkt 20
2009-05-13
Introduction (cont.)
• Why traffic classification?
– Network design and provisioning
– QoS assignment and traffic shaping
– Accounting
– Security monitoring: IDS/IPS
– Network forensics
– Trends and changes in network applications
Outline
• Classification Methods
– Research review and taxonomy
• Survey analysis: P2P
• Pitfalls
– Systematic shortcomings
– Re-validate assumptions
• UDP rising
• Routing (a)symmetry on backbone links
Research Review and Taxonomy
• Research review
– create a structured taxonomy of traffic classification
papers and their datasets
– help to answer popular questions
– reveal open issues and challenges
http://www.caida.org/research/traffic-analysis/classification-overview
Research review and taxonomy: Overview
• 64 papers published between 1994 and 2008
• Definition: traffic classification
“Methods to classify traffic data sets
based on features passively observed in the traffic,
according to specific classification goals.”
Research review and taxonomy: Datasets and Goals
• Data sets: >80 data sets used for 64 papers!
– Time of collection, link type, capture environments,
geographic location, (payload, anonymization), etc.
• Classification goals:
– Coarse or fine-grained classification
– Applications or protocols
Research review and taxonomy: Features
• Features
– Reacting to application development
Research review and taxonomy: Methods
• Methods
– exact matching
• port number, payload, etc
– heuristic methods
• e.g. on connection patterns
– machine learning methods
• supervised and unsupervised
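
As an illustration of the simplest category above, exact matching on port numbers can be sketched as a table lookup. This is a minimal sketch only: the port table and the function name `classify_by_port` are assumptions made for this example and are not taken from any of the surveyed papers.

```python
from typing import Dict, Tuple

# Illustrative (protocol, port) -> application table.
# Assumption for this sketch; not a list taken from the survey.
WELL_KNOWN_PORTS: Dict[Tuple[str, int], str] = {
    ("tcp", 25): "SMTP",
    ("tcp", 80): "HTTP",
    ("tcp", 443): "HTTPS",
    ("udp", 53): "DNS",
    ("tcp", 6881): "BitTorrent",  # traditional default port, often changed
}


def classify_by_port(proto: str, src_port: int, dst_port: int) -> str:
    """Label a flow by exact port matching; return "unknown" on no match."""
    # Check the destination port first, then the source port, because the
    # observed packet may belong to either direction of the flow.
    for port in (dst_port, src_port):
        label = WELL_KNOWN_PORTS.get((proto.lower(), port))
        if label is not None:
            return label
    return "unknown"
```

A flow between two high, random ports falls through to "unknown", which is where payload matching, heuristics, or machine learning methods would have to take over.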
Survey analysis: P2P
• How much P2P?
– Estimates range from 1.3% to 93% across 18 of the 64 papers
Survey analysis: P2P (cont.)
• So how much of modern Internet traffic is P2P?
"there is a wide range of P2P traffic on Internet links;
see your specific link of interest and classification
technique you trust for more details."
Survey analysis: P2P (cont.)
• SUNET: April to November 2006
Outline
• Methods
– Research review and taxonomy
• Survey analysis: P2P
• Pitfalls
– Systematic shortcomings
– Re-validate assumptions
• UDP rising
• Routing (a)symmetry on backbone links
Systematic Shortcomings
• Poor comparability of results!!!
– >80 data sets used in 64 papers
→ lack of shared, modern data sets as reference data
– no clear definitions (P2P or file-sharing …)
→ lack of standardized measures
→ lack of defined classification goals
Assumption: TCP dominates traffic
• Current TC approaches consider mainly TCP
– Assumptions
• TCP dominates traffic
• Bulk (data) transfer is done via TCP
– Advantage
• TCP has a clear notion of “sessions”
Assumption: TCP dominates traffic (cont.)
• There might be a shift (soon):
– IPTV applications
• PPLive, PPStream: switched to UDP in Oct. 2008
• VA (Video Accelerator): UDP for data transfer
– P2P applications
• uTP: Micro Transport Protocol, based on UDP
– Part of uTorrent 1.9 beta, expected during 2010
All on high, random ports (of course …)
Assumption: TCP dominates traffic (cont.)
• CDF of UDP flows per port number
Indeed, high ephemeral ports are common today!
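
A CDF of this kind can be computed from a list of per-flow port numbers. The sketch below assumes one port value per UDP flow (for example the destination port); which port the original plot used is not stated here, and the function name `flows_per_port_cdf` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def flows_per_port_cdf(ports: Iterable[int]) -> List[Tuple[int, float]]:
    """Return (port, fraction of flows on ports <= port), sorted by port."""
    counts = Counter(ports)          # number of flows seen per port
    total = sum(counts.values())
    cdf: List[Tuple[int, float]] = []
    cumulative = 0
    for port in sorted(counts):
        cumulative += counts[port]
        cdf.append((port, cumulative / total))
    return cdf


# Example with made-up input: half of the flows sit on high ports.
example = flows_per_port_cdf([53, 53, 40000, 50000])
```

If most of the probability mass only accumulates well above port 1024, the flows are concentrated on high ports, which is the pattern the slide points out.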
Assumption: TCP dominates traffic (cont.)
• Avg. Packets/Flow for top 10 UDP ports
No substantial data portions carried (on these links - yet)
Assumption: TCP dominates traffic (cont.)
• Current situation (on the links measured)
– TCP dominates packets and bytes; UDP dominates flows
• UDP for P2P overlay signaling
• This might change soon:
– UDP based IPTV already common in China, uTP …
• UDP for bulk and streaming data transfer
→ TC methods can no longer ignore UDP?
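
The difference between "dominating packets and bytes" and "dominating flows" can be made concrete with a small sketch that computes each protocol's share under all three measures. The flow record layout `(protocol, packets, bytes)` and the name `protocol_shares` are assumptions for this example; the input numbers are made up and are not measurement results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed flow record layout: (protocol, packet count, byte count).
FlowRecord = Tuple[str, int, int]
METRICS = ("flows", "packets", "bytes")


def protocol_shares(flows: Iterable[FlowRecord]) -> Dict[str, Dict[str, float]]:
    """Return each protocol's share of flows, packets, and bytes (0.0 to 1.0)."""
    totals = defaultdict(lambda: {m: 0 for m in METRICS})
    for proto, packets, nbytes in flows:
        entry = totals[proto]
        entry["flows"] += 1
        entry["packets"] += packets
        entry["bytes"] += nbytes
    # Grand totals over all protocols, per metric.
    grand = {m: sum(t[m] for t in totals.values()) for m in METRICS}
    return {
        proto: {m: (t[m] / grand[m] if grand[m] else 0.0) for m in METRICS}
        for proto, t in totals.items()
    }


# Made-up input: one large TCP flow and three tiny UDP flows.
shares = protocol_shares([
    ("tcp", 100, 100_000),
    ("udp", 2, 200),
    ("udp", 2, 200),
    ("udp", 1, 100),
])
```

In this toy input UDP accounts for three quarters of the flows while TCP carries almost all packets and bytes, the same qualitative picture as described above for the measured links.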
Assumption: routing symmetry
• Current approaches consider bidirectional traffic
– Assumption
• Traffic is routed symmetrically
– Same path for forward and backward direction
– Advantage
• Bi-directional information offers more features for
classification
• For TCP, bi-directional information allows easier
inference of sessions (connections)
Assumption: routing symmetry (cont.)
• Degree of symmetry
– 4 link locations
(Sweden and USA)
– 2 samples each
Assumption: routing symmetry (cont.)
• Beyond intranets and access links (edge
networks), there is little symmetry
• Degree of symmetry decreases with level of
“coreness” of the link
→ TC methods for backbone links need to master
unidirectional data flows
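
One plausible way to quantify the degree of symmetry on a link is the fraction of observed unidirectional flows whose reverse direction is also seen on the same link. The sketch below uses that definition as an assumption; it is not necessarily the metric used in the asymmetry study, and the names `FiveTuple`, `reverse`, and `symmetric_fraction` are hypothetical.

```python
from typing import Iterable, Tuple

# Assumed layout: (protocol, src_ip, src_port, dst_ip, dst_port).
FiveTuple = Tuple[str, str, int, str, int]


def reverse(flow: FiveTuple) -> FiveTuple:
    """Return the 5-tuple of the opposite direction of the same conversation."""
    proto, src_ip, src_port, dst_ip, dst_port = flow
    return (proto, dst_ip, dst_port, src_ip, src_port)


def symmetric_fraction(observed: Iterable[FiveTuple]) -> float:
    """Fraction of distinct unidirectional flows whose reverse is also observed."""
    seen = set(observed)
    if not seen:
        return 0.0
    symmetric = sum(1 for flow in seen if reverse(flow) in seen)
    return symmetric / len(seen)


# Made-up example: one conversation seen in both directions, one in only one.
a = ("tcp", "10.0.0.1", 40000, "192.0.2.1", 80)
b = ("udp", "10.0.0.2", 50000, "192.0.2.9", 53)
fraction = symmetric_fraction([a, reverse(a), b])
```

A value near 1.0 means both directions usually cross the measured link; a value near 0.0 means a classifier on that link mostly sees one direction only.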
Summary
• Research review
– structured taxonomy of traffic classification papers
• Current systematic shortcomings
→ lack of shared, modern data sets as reference data
→ lack of standardized measures
→ lack of defined classification goals
• Upcoming technical challenges
→ TC methods can no longer ignore UDP
→ TC methods should handle unidirectional flows
Traffic classification overview:
http://www.caida.org/research/traffic-analysis/classification-overview/
Observations on UDP traffic on Internet backbone links:
soon to be published on www.caida.org (“News” section)
Estimation of routing asymmetry on Internet links:
http://www.caida.org/research/traffic-analysis/asymmetry/
or Email: johnwolf@chalmers.se