Automatically Inferring Patterns of Resource Consumption in Network Traffic
Cristian Estan, Stefan Savage, George Varghese
University of California, San Diego
Traffic Clusters - 2003 (July 26, 2016)

Who is using my link?

Looking at the traffic
- Too much data for a human
- Do something smarter!

Looking at traffic aggregates
- Aggregating on individual packet header fields (source/destination IP, source/destination port, protocol, source/destination network) gives useful results, but
  - traffic reports are not always at the right granularity (e.g. individual IP address, subnet, etc.)
  - they cannot show aggregates defined over multiple fields (e.g. which network uses which application)
- The traffic analysis tool should automatically find aggregates over the right fields at the right granularity

  Rank  Source IP             Traffic
  1     jeff.dorm.bigU.edu    11.9%
  2     tracy.dorm.bigU.edu   3.12%
  3     risc.cs.bigU.edu      2.83%

  Rank  Destination network   Traffic
  1     library.bigU.edu      27.5%
  2     cs.bigU.edu           18.1%
  3     dorm.bigU.edu         17.8%

  Rank  Dest. port            Traffic
  1     Web                   42.1%
  2     Kazaa                 6.7%
  3     Ssh                   6.3%

- Where does the traffic come from? What apps are used? Which network uses web and which one Kazaa?
- Most traffic goes to the dorms …

Ideal traffic report
  Traffic aggregate                                            Traffic
  Web traffic                                                  42.1%   (Web is the dominant application)
  Web traffic to library.bigU.edu                              26.7%   (The library is a heavy user of web)
  Web traffic from www.schwarzenegger.com                      13.4%   (That's a big flash crowd!)
  ICMP traffic from sloppynet.badU.edu to jeff.dorm.bigU.edu   11.9%   (This is a Denial of Service attack!!)
- This paper is about giving the network administrator insightful traffic reports

Contributions of this paper
- Approach
- Definitions
- Algorithms
- System
- Experience

Approach
- Characterize the traffic mix by describing all important traffic aggregates
  - Multidimensional aggregates (e.g.
    a flash crowd described by protocol, port number, and IP address)
  - Aggregates at the right level of granularity (e.g. computer, subnet, ISP)
- Traffic analysis is automated: it finds insightful data without human guidance

Definition: traffic clusters
- Traffic clusters are the multidimensional traffic aggregates identified by our reports
- A cluster is defined by a range for each field
- The ranges come from natural hierarchies (e.g. the IP prefix hierarchy), so clusters are meaningful aggregates
- Example
  - Traffic aggregate: incoming web traffic for the CS Dept.
  - Traffic cluster: (SrcIP=*, DestIP in 132.239.64.0/21, Proto=TCP, SrcPort=80, DestPort in [1024,65535])

Definition: traffic report
- Traffic reports give the volume of chosen traffic clusters
- To keep report size manageable, describe only clusters above a threshold H (e.g. H = total traffic/20)
- To avoid redundant data, compress by omitting clusters whose traffic can be inferred (up to error H) from non-overlapping, more specific clusters in the report
- To highlight non-obvious aggregates, prioritize them using an unexpectedness label
- Example
  - 50% of all traffic is web
  - Prefix B receives 20% of all traffic
  - The web traffic received by prefix B is 15% instead of the expected 50% × 20% = 10%, so its unexpectedness label is 15%/10% = 150%

Algorithms and theory
- Algorithms and theoretical bounds are in the paper
  - Unidimensional reports are easy to compute
  - Multidimensional reports get exponentially harder as we add more fields
- Next few slides
  - Example of unidimensional compression
  - Example of the structure of the multidimensional cluster space

Unidimensional report example
Threshold = 100. Source IP prefix hierarchy with the traffic of each node (prefixes with no traffic are not shown):
  10.0.0.0/28                 500
    10.0.0.0/29               120
      10.0.0.0/30              50
        10.0.0.2/31            50
          10.0.0.2             15
          10.0.0.3             35
      10.0.0.4/30              70
        10.0.0.4/31            70
          10.0.0.4             30
          10.0.0.5             40
    10.0.0.8/29               380
      10.0.0.8/30             305
        10.0.0.8/31           270
          10.0.0.8            160
          10.0.0.9            110
        10.0.0.10/31           35
          10.0.0.10            35
      10.0.0.12/30             75
        10.0.0.14/31           75
          10.0.0.14            75

Unidimensional report example: compression
- 10.0.0.8/31: 270 - (160 + 110) = 0 < 100, omitted
- 10.0.0.8/30: 305 - 270 < 100, omitted
- 10.0.0.8/29: 380 - 270 ≥ 100, kept
- 10.0.0.0/28: 500 - (120 + 380) = 0 < 100, omitted
Compressed report:
  Source IP      Traffic
  10.0.0.0/29    120
  10.0.0.8/29    380
  10.0.0.8       160
  10.0.0.9       110

Multidimensional structure example
- Nodes (clusters) have multiple parents
- Nodes (clusters) overlap
- Example hierarchies
  - Source net: All traffic → US, EU; US → CA, NY; EU → GB, DE
  - Application: All traffic → Web, Mail
- In the combined cluster space, "US Web" is a child of both "US" and "Web", so "US" and "Web" overlap

System: AutoFocus
- Components: traffic parser, cluster miner, grapher, web-based GUI
- Inputs: packet header trace, names, categories

Structure of regular traffic mix (SD-NAP)
- Backups from CAIDA to tape server
- Semi-regular time pattern
- FTP from SLAC Stanford
- Scripps web traffic
- Web & Squid servers
- Large ssh traffic
- Steady ICMP probing from CAIDA

Analysis of unusual events (Site 2)
- UCSD to UCLA route change
- Sapphire/SQL Slammer worm

Conclusions
- Multidimensional traffic clusters using natural hierarchies describe traffic aggregates
- Traffic reports using thresholding automatically identify conspicuous resource consumption at the right granularity
- Compression produces compact traffic reports, and unexpectedness labels highlight non-obvious aggregates
- Our prototype system, AutoFocus, provides insights into the structure of regular traffic and unexpected events

Thank you!
- Alpha version of AutoFocus downloadable from http://ial.ucsd.edu/AutoFocus/
- Any questions?
- Acknowledgements: NIST, NSF, Vern Paxson, David Moore, Liliana Estan, Jennifer Rexford, Alex Snoeren, Geoff Voelker

Bounds and running times
(T = total traffic, H = threshold, T1 and T2 = total traffic of the two intervals compared, d = number of levels in the hierarchy, d_i = number of levels in the hierarchy of field i)
  Report                    Report size                 Running time     Memory usage
  Uncompressed 1-dim.       ≤ 1 + (d-1)T/H              O(n + m(d-1))    O(m(d-1))
  1-dim.                    ≤ T/H                       linear
  1-dim. delta              ≤ T1/H + T2/H
  Uncompressed multi-dim.   ≤ (T/H) ∏ d_i
  Multi-dim.                ≤ (T/H) ∏ d_i / max(d_i)
  Multi-dim. delta
Other running-time and memory entries on the slide: linear, linear, ≈ result × n, O(m + result), ≈ e^result

Open questions
- Are there tighter bounds for the size of the reports?
- Are there algorithms that produce smaller results?
- Are there algorithms that compute traffic reports more efficiently? In streaming fashion?

Delta reports
- Why repeat the same traffic report if the traffic doesn't change from one day to the next?
- Delta reports describe the clusters that increased or decreased by more than the threshold from one interval to the next
- On related traffic mixes, delta reports are much smaller than traffic reports
- Multidimensional compression is very hard for delta reports
  - We only have an exponential algorithm for the cluster delta

Greedy compression algorithm
[figure]

Multidimensional report example
[figure: panels "Thresholding" and "Compression"]

System details
  Part     Language           LoC    Status
  Backend  C++                5400   stable
  GUI      HTML, Javascript   1000   functional
  Glue     perl               350    evolving
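The "Definition: traffic clusters" slide describes a cluster as one range per header field, each range drawn from a natural hierarchy. The sketch below models that idea in Python, assuming each packet arrives as a dict with the hypothetical keys `src_ip`, `dst_ip`, `proto`, `src_port` and `dst_port`. The `TrafficCluster` class and its `matches` method are illustrative names, not AutoFocus code, and `None` plays the role of the wildcard `*`.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TrafficCluster:
    """Hypothetical cluster type: one range per field, None means '*'."""
    src_net: Optional[ipaddress.IPv4Network] = None
    dst_net: Optional[ipaddress.IPv4Network] = None
    proto: Optional[str] = None
    src_ports: Optional[Tuple[int, int]] = None  # inclusive (low, high)
    dst_ports: Optional[Tuple[int, int]] = None  # inclusive (low, high)

    def matches(self, pkt: dict) -> bool:
        """True if the packet falls inside the range of every field."""
        def in_net(net, ip):
            return net is None or ipaddress.ip_address(ip) in net

        def in_ports(rng, port):
            return rng is None or rng[0] <= port <= rng[1]

        return (in_net(self.src_net, pkt["src_ip"])
                and in_net(self.dst_net, pkt["dst_ip"])
                and (self.proto is None or self.proto == pkt["proto"])
                and in_ports(self.src_ports, pkt["src_port"])
                and in_ports(self.dst_ports, pkt["dst_port"]))


# The slide's example: incoming web traffic for the CS Dept.
cs_web = TrafficCluster(
    dst_net=ipaddress.ip_network("132.239.64.0/21"),
    proto="TCP",
    src_ports=(80, 80),
    dst_ports=(1024, 65535),
)

# Hypothetical packet (192.0.2.7 is a documentation address).
pkt = {"src_ip": "192.0.2.7", "dst_ip": "132.239.66.12", "proto": "TCP",
       "src_port": 80, "dst_port": 51000}
print(cs_web.matches(pkt))  # True
```

Because every range comes from a hierarchy (IP prefixes, port ranges, protocol or `*`), a cluster is always a meaningful aggregate rather than an arbitrary set of packets.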
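The thresholding step from the "Unidimensional report example" slide can be sketched by walking the source IP prefix hierarchy and keeping every prefix whose traffic reaches the threshold. This is a minimal illustration using the slide's numbers, not the paper's algorithm. `LEAF_TRAFFIC`, `prefix_traffic`, `all_prefixes` and `threshold_report` are hypothetical names, and the comparison uses ≥ to match the "≥ 100" annotation on the compression slide.

```python
import ipaddress

# Per-address traffic taken from the "Unidimensional report example" slide.
LEAF_TRAFFIC = {
    "10.0.0.2": 15, "10.0.0.3": 35, "10.0.0.4": 30, "10.0.0.5": 40,
    "10.0.0.8": 160, "10.0.0.9": 110, "10.0.0.10": 35, "10.0.0.14": 75,
}


def prefix_traffic(net, leaves):
    """Sum the traffic of every leaf address that falls inside prefix `net`."""
    return sum(t for ip, t in leaves.items() if ipaddress.ip_address(ip) in net)


def all_prefixes(net):
    """Walk the binary prefix hierarchy below `net`, parents before children."""
    yield net
    if net.prefixlen < net.max_prefixlen:
        for child in net.subnets(prefixlen_diff=1):
            yield from all_prefixes(child)


def threshold_report(root, leaves, threshold):
    """Uncompressed report: every prefix whose traffic reaches the threshold."""
    report = []
    for net in all_prefixes(root):
        traffic = prefix_traffic(net, leaves)
        if traffic >= threshold:
            report.append((net, traffic))
    return report


report = threshold_report(ipaddress.ip_network("10.0.0.0/28"), LEAF_TRAFFIC, 100)
for net, traffic in report:
    print(net, traffic)
```

The uncompressed report lists seven clusters, several of which (10.0.0.8/30, 10.0.0.8/31) say almost nothing beyond what their children already say. That redundancy is what compression removes.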
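The compression rule from the "Definition: traffic report" slide (omit a cluster whose traffic can be inferred, up to error H, from non-overlapping more specific clusters in the report) can be applied bottom-up on a prefix tree. The Python sketch below is one straightforward way to do that on the slide's example; it is an assumption-laden illustration, not necessarily the algorithm from the paper, and `compress` is a hypothetical name.

```python
import ipaddress

# Per-address traffic taken from the "Unidimensional report example" slide.
LEAF_TRAFFIC = {
    "10.0.0.2": 15, "10.0.0.3": 35, "10.0.0.4": 30, "10.0.0.5": 40,
    "10.0.0.8": 160, "10.0.0.9": 110, "10.0.0.10": 35, "10.0.0.14": 75,
}


def prefix_traffic(net, leaves):
    """Sum the traffic of every leaf address that falls inside prefix `net`."""
    return sum(t for ip, t in leaves.items() if ipaddress.ip_address(ip) in net)


def compress(net, leaves, threshold):
    """Return (report, covered) for the subtree rooted at `net`.

    report  -- (prefix, traffic) pairs kept in the compressed report
    covered -- traffic explained by the most general reported clusters
               in this subtree (or by `net` itself if it is reported)
    """
    traffic = prefix_traffic(net, leaves)
    if traffic < threshold:
        # Below threshold: no descendant can be above it either.
        return [], 0
    report, covered = [], 0
    if net.prefixlen < net.max_prefixlen:
        for child in net.subnets(prefixlen_diff=1):
            child_report, child_covered = compress(child, leaves, threshold)
            report += child_report
            covered += child_covered
    # Keep `net` only if the reported descendants leave at least
    # `threshold` of its traffic unexplained.
    if traffic - covered >= threshold:
        report.append((net, traffic))
        covered = traffic
    return report, covered


report, _ = compress(ipaddress.ip_network("10.0.0.0/28"), LEAF_TRAFFIC, 100)
for net, traffic in sorted(report, key=lambda item: (item[0].network_address,
                                                     item[0].prefixlen)):
    print(net, traffic)
```

Tracing it reproduces the slide: 10.0.0.8/31 and 10.0.0.8/30 are fully or almost fully explained by 10.0.0.8 and 10.0.0.9 (270), 10.0.0.8/29 keeps 110 unexplained and stays, and 10.0.0.0/28 is exactly the sum of its two /29 children, leaving four clusters.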
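The unexpectedness label in the "Definition: traffic report" slide compares a two-field cluster's actual share of traffic with the share expected if the two fields were independent. A minimal sketch of the slide's two-field arithmetic follows; `unexpectedness` and its parameter names are hypothetical, and the sketch does not claim how the paper generalizes the label to more fields.

```python
def unexpectedness(joint_share, share_a, share_b):
    """Observed share of a two-field cluster divided by the share expected
    if the fields were independent (share_a * share_b)."""
    expected = share_a * share_b
    return joint_share / expected


# Slide example: web = 50% of traffic, prefix B = 20%,
# web traffic to prefix B = 15% (expected 10%).
label = unexpectedness(joint_share=0.15, share_a=0.50, share_b=0.20)
print(f"{label:.0%}")  # 150%
```

A label above 100% flags a cluster that carries more traffic than its unidimensional components predict, which is why the slide proposes using it to prioritize entries in the report.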
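The "Multidimensional structure example" slide points out that, once clusters combine several hierarchies, a node can have multiple parents and nodes can overlap. The sketch below encodes the slide's toy hierarchies (source net and application) as child-to-parent maps and checks both properties. `parents`, `overlap`, `chain` and `all_clusters` are hypothetical helpers for illustration only.

```python
from itertools import product

# Toy hierarchies from the "Multidimensional structure" slide (child -> parent).
SRC_NET_PARENT = {"US": "All", "EU": "All", "CA": "US", "NY": "US",
                  "GB": "EU", "DE": "EU"}
APP_PARENT = {"Web": "All", "Mail": "All"}
HIERARCHIES = (SRC_NET_PARENT, APP_PARENT)


def chain(hierarchy, value):
    """`value` followed by all of its ancestors, ending at 'All'."""
    out = [value]
    while out[-1] in hierarchy:
        out.append(hierarchy[out[-1]])
    return out


def parents(cluster):
    """Generalize one field by one level; each non-root field gives a parent."""
    result = []
    for i, value in enumerate(cluster):
        if value in HIERARCHIES[i]:
            result.append(cluster[:i] + (HIERARCHIES[i][value],) + cluster[i + 1:])
    return result


def overlap(a, b):
    """Hierarchical ranges either nest or are disjoint, so two clusters
    overlap exactly when, in every field, one range contains the other."""
    return all(x in chain(h, y) or y in chain(h, x)
               for h, x, y in zip(HIERARCHIES, a, b))


def all_clusters():
    """Every combination of one node from each hierarchy."""
    nodes = [set(h) | set(h.values()) for h in HIERARCHIES]
    return list(product(*nodes))


print(parents(("US", "Web")))                  # [('All', 'Web'), ('US', 'All')]
print(overlap(("US", "All"), ("All", "Web")))  # True
print(overlap(("CA", "All"), ("NY", "All")))   # False
print(len(all_clusters()))                     # 21
```

Even with 7 source nets and 3 applications the space already has 7 × 3 = 21 clusters; the size grows as the product over all fields, which is one way to see why multidimensional reports are harder than unidimensional ones.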
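The "Delta reports" slide describes reporting only the clusters whose traffic rose or fell by more than the threshold between two intervals. The sketch below computes an uncompressed unidimensional delta report over the same prefix hierarchy as the example slides. The first interval uses the slide's numbers; the second interval is hypothetical data invented for illustration, and `prefix_totals` and `delta_report` are hypothetical names. Compressing delta reports, which the slide calls very hard in the multidimensional case, is not attempted here.

```python
import ipaddress
from collections import Counter


def prefix_totals(leaves, min_len=28):
    """Traffic of every prefix of length min_len..32 that contains a leaf."""
    totals = Counter()
    for ip, traffic in leaves.items():
        for plen in range(min_len, 33):
            totals[ipaddress.ip_network(f"{ip}/{plen}", strict=False)] += traffic
    return totals


def delta_report(before, after, threshold):
    """Uncompressed delta report: clusters whose traffic changed by more
    than `threshold` (in either direction) between the two intervals."""
    t1, t2 = prefix_totals(before), prefix_totals(after)
    changes = {net: t2[net] - t1[net] for net in t1.keys() | t2.keys()}
    return {net: d for net, d in changes.items() if abs(d) > threshold}


# Interval 1: the slide's example. Interval 2: hypothetical data in which
# 10.0.0.9 goes silent and 10.0.0.3 jumps from 35 to 200.
day1 = {"10.0.0.2": 15, "10.0.0.3": 35, "10.0.0.4": 30, "10.0.0.5": 40,
        "10.0.0.8": 160, "10.0.0.9": 110, "10.0.0.10": 35, "10.0.0.14": 75}
day2 = {**day1, "10.0.0.9": 0, "10.0.0.3": 200}

delta = delta_report(day1, day2, threshold=100)
for net in sorted(delta, key=lambda n: (n.network_address, n.prefixlen)):
    print(net, f"{delta[net]:+d}")
```

In this made-up scenario the two changes partly cancel at the top: 10.0.0.0/28 only moves by +55 and drops out, while each change shows up on every prefix along its path. Those repeated entries are what a compressed delta report would need to collapse.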