A Nonparametric CUSUM Algorithm for Timeslot Sequences with Applications to Network Surveillance

Qi Zhang (1), Carlos J. Rendon (2), Daniel R. Jeske (3), Veronica Montes de Oca (1), and Mazda Marvasti (4)
(1) Graduate Statistics Student, (2) Undergraduate Computer Science Student, (3) Professor and Director of Statistical Consulting Collaboratory, (4) Chief Technology Officer, Integrien Corporation

Introduction

Alive™ is the major software product of Integrien Corporation. It monitors, visually presents, and reports the health of a business information technology system. The Statistical Consulting Collaboratory at the University of California, Riverside was contacted to develop a nonparametric statistical change-point detection procedure that could be applied to most types of univariate data. Our work extended the conventional CUSUM procedure to a nonparametric, timeslot-stationary context and is being implemented in the next release of Alive™.

Implementation

Flow Chart

The flow chart has two stages: off-line processing of historical data and real-time processing of new data. The off-line stage consists of the following steps:
1. Construct timeslot distributions.
2. Determine H for screening.
3. Run CUSUM on historical data.
4. Screen alerts from historical data.
5. Construct screened timeslot distributions.
6. Determine H for new data.

Monte Carlo Simulations for H

For each simulated sample path, compute $\max\{\max_n S_n^+,\ \max_n S_n^-\}$. H is the $100(1-\gamma)$th percentile of the empirical distribution function (EDF) of these values, where $\gamma$ is the nominal false alarm level.

[Figure: two simulated sample paths, $S_n^+$ and $S_n^-$ plotted against $n$, with $\max(S_n^+)$ and $\max(S_n^-)$ marked.]

[Figure: Generalized CUSUM for Live Sessions.]

Data

[Figure: Variability in the Mean and Std. Dev. of the # of Live Sessions on a Network Server.]

[Figure: Illustrative Timeslot Distributions for # of Live Sessions on a Network Server.]

Data from a real client were available. Data within each hourly timeslot were assumed to be i.i.d. Empirical distributions for each timeslot are estimated from a rolling window of 12 weeks of historical data.
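The construction of per-timeslot empirical distributions from a rolling 12-week window can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in Alive™: the hourly series is assumed to start at timeslot 1 and to cover whole weeks, and the function names `timeslot_quantiles` and `roll_window` are hypothetical.

```python
import numpy as np

HOURS_PER_WEEK = 168  # one timeslot per hour of the week


def timeslot_quantiles(history, probs):
    """Empirical quantiles for each hour-of-week timeslot.

    history: 1-D sequence of hourly values covering whole weeks
             (assumed to start at timeslot 1).
    probs:   list of probabilities in [0, 1].
    Returns an array of shape (168, len(probs)).
    """
    history = np.asarray(history, dtype=float)
    if history.size == 0 or history.size % HOURS_PER_WEEK != 0:
        raise ValueError("history must cover a whole number of weeks")
    # Rows are weeks, columns are timeslots, so each column holds the
    # observations that are treated as i.i.d. within one timeslot.
    weeks = history.reshape(-1, HOURS_PER_WEEK)
    return np.quantile(weeks, probs, axis=0).T


def roll_window(history, new_week, n_weeks=12):
    """Append one new week and keep only the most recent n_weeks."""
    combined = np.concatenate([
        np.asarray(history, dtype=float),
        np.asarray(new_week, dtype=float),
    ])
    return combined[-n_weeks * HOURS_PER_WEEK:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    history = rng.poisson(50, size=12 * HOURS_PER_WEEK)  # synthetic stand-in data
    q = timeslot_quantiles(history, [0.1, 0.5, 0.9])
    print(q.shape)
```

With only 12 observations per timeslot, extreme percentiles are interpolated between a few order statistics, so the choice of percentile level interacts with the window length.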
Flow Chart, Real-time Processing: run CUSUM on new data, then monitor for alerts.

[Figure annotation: axis label "Attribute Value"; a marker denotes the median of each distribution.]

Conventional CUSUM Procedure

Let $X_n$ denote the measurement of a univariate process at the $n$th time point, and assume that $X_n \sim N(\mu, \sigma^2)$ with $\mu$ and $\sigma^2$ known. If $X_n$ shifts upward or downward by more than $K$ units from the mean, we say that there is a serious change. The CUSUM statistics are

$$S_n^+ = \max\{0,\ S_{n-1}^+ + X_n - (\mu + K)\}, \qquad S_n^- = \max\{0,\ S_{n-1}^- + (\mu - K) - X_n\},$$

with $S_0^+ = S_0^- = 0$.

[Figure: $X_n$ plotted against $n$ with the target value and the level $K$ marked, and the CUSUM statistic plotted against $n$ with the threshold $H$.]

CUSUM Screening of Historical Data

Data windows that cause the CUSUM procedure to alarm are assumed to be anomalous. A slope test is used to find the start and end points of each such window.

End Point: At the time the CUSUM statistic alerts, begin a forward sequence of fitted lines using windows containing the previous $v$ points. The predicted end point is the time at which the CUSUM takes its largest value within the first window for which the hypothesis $H_0$ (slope not positive) vs. $H_1$ (slope positive) is not rejected on the basis of a t-test.

[Figure: CUSUM path around an alarm. The predicted start time is given by the first backward window before the alarm where the slope is not positive; the end is given by the first forward window after the alarm where the slope is no longer positive.]

Performance Evaluation

Study Based on Real Client Data

[Timeline figure: weeks 1 to 12 are historical data; weeks 13 to 21 are prediction cycles 1 to 9.]

| Metric | Number of Alarms | Average Detection Time (min) | False Positives per Cycle | False Negatives per Cycle | Computation Time per Cycle (min) |
|---|---|---|---|---|---|
| Live | 12 | 62.17 | 1.22 | 0 | 17.11 |
| Active | 1 | 8.00 | 0.00 | 0 | 17.56 |
| Resp. Time | 10 | 352.00 | 1.11 | 0 | 7.770 |
| Oracle | 4 | 87.50 | 0.22 | 0 | 12.38 |

Conclusion: The procedure performs well, with 0 false negatives per cycle, indicating that alarms will be adequately detected.

Study Based on Simulated Data

Inject an event that shifts the timeslot distributions by 100X% during the second half of the week. Report the average number of samples between the starting point of the injected event and the point at which the CUSUM signals. The average is based on 1,000 simulated sample paths for each cycle.
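The detection-delay measurement described above can be illustrated with the conventional Gaussian CUSUM recursions. The sketch below is not the poster's procedure: it uses a known mean and reference value instead of timeslot percentiles, and it restarts the CUSUM at the injection point so that earlier false alarms do not count. The function names and all parameter values (μ = 100, σ = 10, K = 5, H = 50, a 168-sample path with the event starting at sample 84) are hypothetical choices made for illustration.

```python
import numpy as np


def cusum_first_alarm(x, mu, k, h):
    """Two-sided conventional CUSUM.

    Returns the 0-based index of the first sample at which S+ or S-
    exceeds h, or None if no alarm occurs.
    """
    s_plus = 0.0
    s_minus = 0.0
    for n, xn in enumerate(x):
        s_plus = max(0.0, s_plus + xn - (mu + k))
        s_minus = max(0.0, s_minus + (mu - k) - xn)
        if s_plus > h or s_minus > h:
            return n
    return None


def mean_detection_delay(frac, mu=100.0, sigma=10.0, k=5.0, h=50.0,
                         n=168, start=84, n_paths=1000, seed=0):
    """Average number of samples from event start to the CUSUM signal.

    frac is the relative shift X: values from `start` onward are
    multiplied by (1 + frac), i.e. shifted by 100*frac percent.
    Paths in which no alarm occurs are left out of the average.
    """
    rng = np.random.default_rng(seed)
    delays = []
    for _ in range(n_paths):
        x = rng.normal(mu, sigma, size=n)
        x[start:] *= 1.0 + frac  # inject the event in the second half
        # Assumption: the CUSUM is started from zero at the event start.
        alarm = cusum_first_alarm(x[start:], mu, k, h)
        if alarm is not None:
            delays.append(alarm + 1)
    return float(np.mean(delays)) if delays else float("nan")


if __name__ == "__main__":
    for frac in (0.1, 0.25, 0.5):
        print(frac, mean_detection_delay(frac, n_paths=200))
```

Under these assumptions, larger relative shifts add more to the CUSUM statistic per sample, so the threshold is crossed after fewer samples.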
[Figure: Detection Sensitivity for Response Time Metric. Average number of samples to detect, plotted against the shift X, with one curve for each of Cycles 1 to 9.]

Conclusion: If the shift is small, the average number of samples until detection will be large. If the shift is large, the average number of samples until detection will be small, so an alarm will be signaled almost immediately.

For the study based on real client data, there were 12 weeks of historical data and 9 new monitoring weeks (cycles). True alarms were determined by a subject matter expert. For each metric, the results report the number of alarms, the average detection time (min), and the false positives and false negatives per cycle.

In the conventional CUSUM procedure, both statistics start at zero, $S_0^+ = S_0^- = 0$, and $K$ is generally called the reference value: it is the level of change that is considered "serious." If $S_n^+$ or $S_n^-$ exceeds a predetermined threshold $H$, we conclude that there is a change in the mean and signal an alarm. The threshold $H$ is chosen to control the average run length (ARL) between false alarms and is usually obtained from Monte Carlo simulations.

CUSUM with Resetting

Reset the CUSUM statistics after each alarm to eliminate the effect of the previous alarm. The end of the alarm is determined via the slope test.

[Figure: $X_n$ and the CUSUM statistic plotted against time $t$, with the reset point and the predicted end time marked.]

Start Point

When the CUSUM statistic alerts, begin a backward sequence of fitted lines using windows of $v$ points. The predicted start point is the rightmost point of the first window for which the hypothesis $H_0$ (slope not positive) vs. $H_1$ (slope positive) is not rejected on the basis of a t-test.

Nonparametric CUSUM Procedure

For non-Gaussian measurements, use the $100\alpha$th and $100(1-\alpha)$th percentiles of each timeslot distribution, $Q_{t_n}(\alpha)$ and $Q_{t_n}(1-\alpha)$, in place of $\mu + K$ and $\mu - K$. The generalized CUSUM becomes

$$S_n^+ = \max\{0,\ S_{n-1}^+ + X_n - Q_{t_n}(\alpha)\}, \qquad S_n^- = \max\{0,\ S_{n-1}^- + Q_{t_n}(1-\alpha) - X_n\},$$

where $t_n \in \{1, 2, \ldots, 168\}$ is the timeslot associated with the current hour.
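The generalized CUSUM recursions with timeslot percentiles, together with a Monte Carlo choice of H, can be sketched as follows. This is a minimal sketch under stated assumptions rather than the implementation in Alive™: each simulated sample path is assumed to be one week long and is generated by drawing every hour independently from that timeslot's historical values (in line with the i.i.d.-within-timeslot assumption). The values α = 0.9 and γ = 0.05 and the function names are hypothetical.

```python
import numpy as np


def generalized_cusum(x, q_hi, q_lo):
    """Return the S+ and S- paths of the generalized CUSUM.

    x[n]    : observation at time n
    q_hi[n] : Q_{t_n}(alpha), the reference value replacing mu + K
    q_lo[n] : Q_{t_n}(1 - alpha), the reference value replacing mu - K
    """
    s_plus = np.zeros(len(x))
    s_minus = np.zeros(len(x))
    sp = sm = 0.0  # S_0^+ = S_0^- = 0
    for n in range(len(x)):
        sp = max(0.0, sp + x[n] - q_hi[n])
        sm = max(0.0, sm + q_lo[n] - x[n])
        s_plus[n], s_minus[n] = sp, sm
    return s_plus, s_minus


def monte_carlo_threshold(weeks, alpha=0.9, gamma=0.05, n_paths=1000, seed=0):
    """Estimate H as the 100(1 - gamma)th percentile of max{max S+, max S-}.

    weeks: array of shape (n_weeks, 168), one row per historical week.
    """
    rng = np.random.default_rng(seed)
    weeks = np.asarray(weeks, dtype=float)
    n_weeks, n_slots = weeks.shape
    q_hi = np.quantile(weeks, alpha, axis=0)
    q_lo = np.quantile(weeks, 1.0 - alpha, axis=0)
    cols = np.arange(n_slots)
    maxima = np.empty(n_paths)
    for i in range(n_paths):
        # Assumption: resample each timeslot independently from its own EDF.
        rows = rng.integers(0, n_weeks, size=n_slots)
        path = weeks[rows, cols]
        s_plus, s_minus = generalized_cusum(path, q_hi, q_lo)
        maxima[i] = max(s_plus.max(), s_minus.max())
    return float(np.quantile(maxima, 1.0 - gamma))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    weeks = rng.poisson(50, size=(12, 168)).astype(float)  # synthetic stand-in data
    print(monte_carlo_threshold(weeks, n_paths=200))
```

Because the upper reference value replaces μ + K, the sketch takes α above 0.5 so that $Q_{t_n}(\alpha)$ lies in the upper tail of each timeslot distribution.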
[Figure: Real example from Integrien data, showing the S+ and S- paths.]

Special Thanks To: the staff of Integrien Corporation; Pengyue James Lin (CTO, College of Humanities, Arts and Social Sciences at UCR); Dr. Huaying Karen Xu (Associate Director of the Statistical Consulting Collaboratory at UCR); Prof. Keh-Shin Lii (Dept. of Statistics at UCR); and the graduate students of the Spring 2006 offering of STAT 293.