On the efficient detection of elephant flows in aggregated network traffic

Javier Rivillo Lopez, Jose Alberto Hernandez, Iain W. Phillips
Networks and Control Group
Research School of Informatics
Loughborough University
J.Rivillo-Lopez@lboro.ac.uk, J.A.Hernandez@lboro.ac.uk, I.W.Phillips@lboro.ac.uk
LCS, 2005

Outline
• Motivation
• Flow analysis
• Detection method
• Experiments and results
• Conclusions

Motivation (I): Definitions
• Flow: unidirectional set of packets of the same transport protocol sharing the same source and destination IP addresses and ports.
• Elephant flow: stream of packets which contribute to network load substantially more than the rest of the flows.
  – A threshold must be defined by the network managers/administrators.
  – The threshold value depends on the network size.

Motivation (II): Elephant and mice phenomenon
• Usually a few flows carry most of the data.
• Trace example from NLANR router. In future, MASTS.
• 0.1% of the flows carry nearly 83% of the total traffic.
Figure 1: A flow aggregation view of network traffic

Motivation (III): Elephant and mice phenomenon
• Elephant flows
  – Low-priority applications
  – Large data transfer transactions and peer-to-peer file sharing
• Mice flows
  – Sensitive to delay, jitter and high loss rates.
  – Voice over IP, online gaming, small http requests
• Under this phenomenon, Internet's best effort delivery is not suitable.
• The performance of the network can be improved by detecting the elephant flows and applying traffic engineering solutions.

Flow analysis (II): Flow duration
Figure 2: Flow duration histogram
• Most of the mice flows (92%) have very short duration (< 2 sec).
• Most elephants are long duration flows: heavy tail behaviour.

Flow analysis (III): Mean interarrival time
Figure 3: Flow mean interarrival histogram
• Elephants have very low average packet interarrival time.
• So, a flow with high average packet rate and long duration is very likely to be an elephant.
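
The flow definition and the two per-flow statistics discussed above (duration and mean packet interarrival time) can be sketched as follows. This is only an illustration under stated assumptions, not the analysis code behind the figures: the `Packet` record, its field names, and the `flow_key` and `flow_stats` helpers are hypothetical names introduced here.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical packet record; field names are assumptions for this sketch."""
    ts: float       # capture timestamp in seconds
    proto: str      # transport protocol, e.g. "tcp" or "udp"
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    size: int       # packet size in bytes


def flow_key(pkt):
    # A flow is unidirectional: same transport protocol, same source and
    # destination IP addresses and ports.
    return (pkt.proto, pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port)


def flow_stats(packets):
    """Group packets into flows and return per-flow statistics."""
    groups = defaultdict(list)
    for pkt in packets:
        groups[flow_key(pkt)].append(pkt)

    stats = {}
    for key, pkts in groups.items():
        times = sorted(p.ts for p in pkts)
        duration = times[-1] - times[0]
        # Mean interarrival time is undefined for a single-packet flow.
        mean_iat = duration / (len(times) - 1) if len(times) > 1 else None
        stats[key] = {
            "packets": len(pkts),
            "bytes": sum(p.size for p in pkts),
            "duration": duration,
            "mean_iat": mean_iat,
        }
    return stats
```

With statistics of this kind, a flow with a long `duration` and a small `mean_iat` would be a candidate elephant; the byte threshold that actually defines an elephant is left to the network administrators, as noted in the definitions.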

Detection method (I): Sampling
• Requirement: Low computational cost
• Continuous monitoring not suitable:
  – Requires huge amount of resources.
  – Not scalable.
• Sampling is required.
• Random sampling not suitable because we lose information about the packet interarrival and timing.
• Solution: Windowing.
  – Example: monitoring the network 20ms every 2 seconds.
• Monitoring 1% of the time, Sampling factor = 2sec/20ms = 100

Detection method (II): Elephant detection algorithm
• Objective: identify flows with low packet interarrival time (high packet rate) and long duration: Elephants.
• Step 1. A flow has a high packet rate when it has at least Np packets in a sampling window.
• Step 2. A flow is considered an elephant when it has been identified as a high packet rate flow in Nw different sampling windows.
• Parameters:
  – Np, Nw, w (window length) and T (sampling period)
• In future, the algorithm will be adaptive: the parameters will be calculated automatically by the system.

Experiments and Results
• Results with w=20ms and T=2sec:
Figure 4: Flows identified as elephant traffic
• Np=2, Nw=2: 80% of the elephant flows are correctly identified; they carry 89% of the total traffic, and 0.12% of the mice flows are misidentified as elephants.
• Increasing Np and Nw, we get more Precision but less Recall.

Conclusions
• Identifying elephant flows for traffic engineering solutions can improve the network performance.
• The properties of elephant and mice flows have been obtained by studying real traffic data.
• The long tail behaviour and high packet transmission rate shown by the elephants have been used in the elephant detection method explained.
• This scalable and low computational cost method uses high sampling rate for early detection of elephant flows.
• We have shown in the results that it is a valid method and that its parameters may be adjusted for a tradeoff between Precision and Recall in identifying the elephant flows.

THANK YOU!
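
The windowed sampling and the two-step rule from "Detection method (II)" can be sketched in a few lines. This is a minimal sketch under assumptions, not the authors' implementation: it assumes each monitored window is the first w seconds of every period T, that packets arrive as (timestamp, flow id) pairs, and the function names `in_window` and `detect_elephants` are hypothetical.

```python
from collections import defaultdict


def in_window(ts, w=0.020, T=2.0):
    """True if timestamp ts (seconds) falls inside a monitored window.

    Assumption: the monitored window is the first w seconds of each period T.
    """
    return (ts % T) < w


def detect_elephants(packets, Np=2, Nw=2, w=0.020, T=2.0):
    """Return the set of flow ids classified as elephants.

    packets: iterable of (timestamp_seconds, flow_id) pairs.
    """
    # Count packets per flow inside each sampling window; packets that
    # fall outside the monitored windows are never inspected.
    per_window = defaultdict(lambda: defaultdict(int))
    for ts, flow in packets:
        if in_window(ts, w, T):
            per_window[int(ts // T)][flow] += 1

    # Step 1: a flow has a high packet rate in a window if it has at
    # least Np packets there.
    hits = defaultdict(int)
    for counts in per_window.values():
        for flow, n in counts.items():
            if n >= Np:
                hits[flow] += 1

    # Step 2: a flow is an elephant once it was high-rate in at least
    # Nw different windows.
    return {flow for flow, k in hits.items() if k >= Nw}
```

Raising `Np` or `Nw` makes the rule stricter, which matches the reported tradeoff: fewer mice are misidentified (higher Precision) at the cost of missing more elephants (lower Recall).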