Anomaly detection in VoIP and Ethernet traffic under presence of daily patterns
Piotr Żuraniewski (UvA/TNO/AGH), Felipe Mata (UAM), Michel Mandjes (UvA), Marco Mellia (POLITO)

Changepoint detection
• Changepoint detection: detecting that the current statistical description of a data sample is no longer valid
• The problem can be formulated in the language of statistical hypothesis testing

Benefits of changepoint detection
• Deviation from the normal system state can be detected (anomaly detection)
  – attack on ICT infrastructure (excessive number of TCP SYN packets)
  – failure (excessive or too low traffic volume)
  – Service Level Agreement not met (delay out of acceptable range)
• Human experts are empowered with an additional tool

Benefits of statistics-based approach
• Manual, on-line analysis of large data volumes may be infeasible
• Visual inspection may be insufficient due to hidden structures in the data
• An objective and unbiased human opinion is not always available
• Possibility to control the false alarm ratio and the detection ratio

Problems
• Changepoint detection procedures often assume independent observations
• Real life: dependency is present
  – stochastic (cf. 'fractal' models)
  – deterministic (e.g., diurnal trends)
• Strong dependency may ruin a changepoint detection test

Possible solution
• Estimate and remove the trend from the traffic
  – for VoIP traffic: try to exploit possible local Poissonian behavior
  – exploit periodicity
• Only then apply changepoint detection procedure(s) to the residuals
  – residuals should be (approx.) standard normal
  – anomaly: change from N(0,1) to N(m,s)

[Figure: Traffic, trend, residuals (no nights)]

Contribution
• We have developed a changepoint detection test able to detect a simultaneous change in mean and variance for Gaussian input
• We have numerically assessed its sensitivity to deviations from the independence assumption
  – our simple trend removal method may still leave some dependency in the residuals

Synthetic Gaussian trace
• A window of 50 observations is presented to the detector in a sequential manner; delta is the relative position of the changepoint
• True change from N(0,1) to N(3.07,1.082) from window 152 on (Erlang: it would give 0.1% blocking prob.)
• 500 experiments, good performance
[Figure, two panels, both vs. window number (0 to 400). Left: true delta and the 2.5%, 25%, 50%, 75% and 97.5% quantiles of detected deltas. Right: detection ratio.]

Dependent input
• What if the input to the detection procedure is correlated?
• Verification with generated AR(1) traces
• Recall: $\{X_i\}$ is an AR(1) process if it follows
  $X_i - \mu = \phi\,(X_{i-1} - \mu) + \varepsilon_i$; $\mu$ - mean; $\varepsilon_i$ - white noise
• The AR(1) autocorrelation function (a measure of linear dependency) is
  $\rho_k = \phi^k$, $k = 0, 1, \ldots$

Correlated input – results
  phi    (unlabeled)    detection ratio for window no. 152    false alarm ratio (regen.)
  0      5.7%           76.6%                                 5.7%
  0.2    10.1%          77.9%                                 5.3%
  0.4    17.7%          80.8%                                 10.4%
  0.6    27.2%          85.9%                                 17.9%
  0.8    36.8%          90.3%                                 24.0%
[Figure: detection ratio and mean false alarm ratio vs. window number (0 to 400)]
• Correlation results in performance degradation
• Due to dependency, the false alarm (FA) ratio in window k influences the FA prob. in window k+1
• To assess this effect, FA is calculated for a fully regenerated sample

Real data example
[Figure: data, pattern, detected anomalies (week 5); calls vs. time (0 to 400)]

Ethernet traffic
• Poissonian assumption may be problematic
• Mean and variance to be estimated
• Less regularity
• Periodic moving average and simple moving average?
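The two candidate pattern estimators named on the slide above (periodic moving average and simple moving average) can be compared with a small sketch. It is an illustration under assumptions, not the authors' implementation: the trace is assumed to be regularly sampled with a known period (144 ten-minute bins per day is used here), the synthetic trace is made up for the demo, and the function names `periodic_pattern`, `moving_average_pattern` and `standardized_residuals` are hypothetical.

```python
import numpy as np

def periodic_pattern(trace, period):
    """Average all samples that share the same phase within the period."""
    trace = np.asarray(trace, dtype=float)
    phase = np.arange(trace.size) % period
    # Per-phase mean (assumes every phase occurs at least once).
    sums = np.bincount(phase, weights=trace, minlength=period)
    counts = np.bincount(phase, minlength=period)
    return (sums / counts)[phase]

def moving_average_pattern(trace, width):
    """Centered simple moving average; the window shrinks at the edges."""
    trace = np.asarray(trace, dtype=float)
    kernel = np.ones(width)
    sums = np.convolve(trace, kernel, mode="same")
    counts = np.convolve(np.ones_like(trace), kernel, mode="same")
    return sums / counts

def standardized_residuals(trace, pattern, period):
    """Subtract the pattern, then scale by a per-phase root mean square."""
    resid = np.asarray(trace, dtype=float) - pattern
    phase = np.arange(resid.size) % period
    mean_sq = (np.bincount(phase, weights=resid**2, minlength=period)
               / np.bincount(phase, minlength=period))
    return resid / np.sqrt(mean_sq[phase])

# Synthetic demo trace (hypothetical numbers): daily sinusoid plus Gaussian noise.
rng = np.random.default_rng(0)
period, days = 144, 14
t = np.arange(period * days)
true_pattern = 5e6 + 3e6 * np.sin(2 * np.pi * t / period)
trace = true_pattern + rng.normal(0.0, 5e5, size=t.size)

per = periodic_pattern(trace, period)
ma = moving_average_pattern(trace, width=7)
z = standardized_residuals(trace, per, period)
print("residual mean:", z.mean(), "residual mean square:", np.mean(z**2))
```

The periodic estimator pools the same time of day across days, so it needs several periods of data and a stable daily shape. The simple moving average needs neither, but it follows anomalies as well as the trend, so a level shift is partly absorbed into the pattern.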
Ethernet traffic (NREN)
• Some traces show regular patterns
[Figure: Bps (10 min. avg) vs. time (UNIX stamp)]

Trends
[Figure: original trace, estimated pattern, estimated periodic pattern, estimated MA pattern; Bps (10 min. avg) vs. time (5660 to 5740)]

Trends
[Figure: original trace, estimated pattern, estimated periodic pattern, estimated MA pattern; Bps (10 min. avg) vs. time (6680 to 6740)]

Residuals
[Figure: residuals = trace - pattern; Bps (10 min. avg) vs. time (0 to 10000)]

Busy hour
• The same model for day and night, working day and weekend may not be optimal in all cases
• Now we focus on the busy hour (8-15), no weekends
[Figure: original trace and busy hour 8-15, no weekends; Bps (10 min. avg) vs. time]

Residuals
[Figure: residuals; Bps (10 min. avg) vs. time (800 to 2000)]

Residuals – 1st part
[Figure: QQ Plot of Sample Data versus Standard Normal; Quantiles of Input Sample vs. Standard Normal Quantiles]

Residuals – 2nd part
[Figure: QQ Plot of Sample Data versus Standard Normal; Quantiles of Input Sample vs. Standard Normal Quantiles]

Summary
• We have extended the anomaly-detection method developed for stationary VoIP traffic
• Diurnal trends are taken into consideration
• Statistical framework as a basis but…
• …practitioner’s perspective – simplifications – also considered
• Other types of traffic – more challenges
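The dependence experiment from the "Dependent input" and "Correlated input – results" slides can be imitated in a few lines. The sketch below does not reproduce the test used in the talk: it swaps in a generic generalized likelihood ratio (GLR) statistic for a change from N(0,1) to a Gaussian with unknown mean and variance somewhere inside a window of 50 observations. The threshold is calibrated by Monte Carlo on independent input and then reused on AR(1) input with unit marginal variance. The function names `glr_statistic` and `ar1`, the 5% level, the minimum segment length and the N(2, 1.5²) post-change distribution are all assumptions made for illustration.

```python
import numpy as np

def glr_statistic(windows, min_seg=5):
    """Max over candidate changepoints k of the log-likelihood ratio
    N(mu_hat, s2_hat) vs. N(0,1), evaluated on the tail windows[:, k:]."""
    windows = np.atleast_2d(np.asarray(windows, dtype=float))
    n = windows.shape[1]
    best = np.full(windows.shape[0], -np.inf)
    for k in range(0, n - min_seg + 1):
        tail = windows[:, k:]
        m = tail.shape[1]
        s2 = tail.var(axis=1)  # maximum-likelihood variance (ddof=0)
        llr = 0.5 * (np.sum(tail**2, axis=1) - m - m * np.log(s2))
        best = np.maximum(best, llr)
    return best

def ar1(rng, shape, phi):
    """Rows of a zero-mean AR(1) process with unit marginal variance."""
    eps = rng.standard_normal(shape)
    x = np.empty(shape)
    x[:, 0] = eps[:, 0]
    for i in range(1, shape[1]):
        x[:, i] = phi * x[:, i - 1] + np.sqrt(1.0 - phi**2) * eps[:, i]
    return x

rng = np.random.default_rng(1)
n, reps, alpha = 50, 2000, 0.05

# Calibrate the threshold on independent N(0,1) windows (phi = 0).
threshold = np.quantile(glr_statistic(ar1(rng, (reps, n), 0.0)), 1 - alpha)

# False alarm ratio on fresh, independently generated windows.
fa_iid = np.mean(glr_statistic(ar1(rng, (reps, n), 0.0)) > threshold)
fa_ar = np.mean(glr_statistic(ar1(rng, (reps, n), 0.6)) > threshold)

# Detection ratio for a hypothetical change to N(2, 1.5^2) at mid-window.
changed = ar1(rng, (reps, n), 0.0)
changed[:, 25:] = 2.0 + 1.5 * changed[:, 25:]
power = np.mean(glr_statistic(changed) > threshold)

print("FA (independent):", fa_iid, "FA (AR(1), phi=0.6):", fa_ar, "detection:", power)
```

Every window here is generated from scratch, which corresponds to the "fully regenerated sample" setting on the slide. With overlapping sequential windows, consecutive decisions would be dependent as well.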