A Nonparametric CUSUM Algorithm for Timeslot
Sequences with Applications to Network Surveillance
Qi Zhang¹, Carlos J. Rendon², Daniel R. Jeske³, Veronica Montes de Oca¹, and Mazda Marvasti⁴
¹Graduate Statistics Student
²Undergraduate Computer Science Student
³Professor and Director of Statistical Consulting Collaboratory
⁴Chief Technology Officer, Integrien Corporation
Introduction
Alive™ is the major software product of Integrien Corporation. It monitors, visually presents, and reports the health of a business information technology system.
The Statistical Consulting Collaboratory at the University of California, Riverside was contacted to develop a nonparametric statistical change-point detection procedure that could be applied to most types of univariate data. Our work extends the conventional CUSUM procedure to a nonparametric, timeslot-stationary setting, and it is being implemented in the next release of Alive™.
[Figure: Variability in the mean and standard deviation of the number of live sessions on a network server]
[Figure: Illustrative timeslot distributions for the number of live sessions on a network server]
Implementation
Flow Chart
[Flow chart, off-line processing: Historical Data → Construct Timeslot Distributions → Determine H for Screening → Run CUSUM on Historical Data → Screen Alerts from Historical Data → Construct Screened Timeslot Distributions → Determine H for New Data]
Monte Carlo Simulations for H
[Figure: simulated sample paths of $S_n^+$ and $S_n^-$ against n, with $\max(S_n^+)$ and $\max(S_n^-)$ marked]
For each simulated sample path, compute $\max\{\max_n S_n^+, \max_n S_n^-\}$. H is the 100(1−g)th percentile of the empirical distribution function (EDF) of these values, where g is the nominal false-alarm level.
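A minimal Python sketch of this Monte Carlo step follows. It assumes, as a simplification, that in-control sample paths can be generated by resampling each timeslot's historical values independently (in line with the i.i.d.-within-timeslot assumption used in the study), and that upper and lower reference levels for each timeslot, such as the timeslot percentiles of the nonparametric CUSUM, are already available. The function name `monte_carlo_threshold`, the path layout, the number of paths, and the seed are hypothetical choices for illustration.

```python
import math
import random


def monte_carlo_threshold(history, refs, slots, g=0.05, n_paths=1000, seed=0):
    """Estimate the CUSUM threshold H by Monte Carlo simulation.

    history: dict timeslot -> list of in-control historical values.
    refs:    dict timeslot -> (upper_ref, lower_ref), e.g. (Q_t(a), Q_t(1 - a)).
    slots:   timeslot sequence covered by one simulated sample path.
    g:       nominal false-alarm level.
    Returns the 100(1 - g)th percentile of the EDF of max{max S+, max S-}.
    """
    rng = random.Random(seed)
    path_maxima = []
    for _ in range(n_paths):
        s_plus = s_minus = 0.0
        path_max = 0.0
        for t in slots:
            x = rng.choice(history[t])  # resample from the timeslot's EDF
            upper, lower = refs[t]
            s_plus = max(0.0, s_plus + x - upper)
            s_minus = max(0.0, s_minus + lower - x)
            path_max = max(path_max, s_plus, s_minus)
        path_maxima.append(path_max)
    path_maxima.sort()
    # Smallest simulated value at which the EDF reaches at least 1 - g.
    k = max(math.ceil((1 - g) * n_paths) - 1, 0)
    return path_maxima[k]


if __name__ == "__main__":
    # Toy in-control data: the same five values in every hourly timeslot.
    hist = {t: [10.0, 11.0, 12.0, 9.0, 10.5] for t in range(1, 169)}
    refs = {t: (11.8, 9.2) for t in hist}  # e.g. Q_t(0.95) and Q_t(0.05)
    print(monte_carlo_threshold(hist, refs, slots=list(range(1, 169)), g=0.05))
```

Because the paths are reused for every g when the seed is fixed, lowering g can only move H up the sorted list of path maxima, which is the trade-off between false alarms and detection delay that H is meant to control.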
[Flow chart (continued), real-time processing: CUSUM on New Data → Monitor for Alerts]
[Figure: Generalized CUSUM for live sessions, plotting the attribute value; markers denote the median of each timeslot distribution]
CUSUM Screening of Historical Data
Data windows that cause the CUSUM procedure to raise alarms are assumed to be anomalous. A slope test is used to find the start and end points of each such window.
Performance Evaluation
Study Based on Real Client Data
Data from a real client were available. Data within each hourly timeslot were assumed to be i.i.d. The empirical distribution for each timeslot is estimated from a rolling window of 12 weeks of historical data.
[Figure: rolling-window design over weeks 1 to 21, each week containing timeslots 1 to 168; a 12-week window of history is used to predict each of cycles 1 to 9]
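The rolling-window design can be sketched in Python as follows. The schedule (cycle c estimated from weeks c to c+11 and monitoring week c+12) is read from the figure and from the stated totals of 12 historical and 9 monitoring weeks, so treat it as an assumption; the record layout `(week, timeslot, value)` and the function names are hypothetical.

```python
from collections import defaultdict

HISTORY_WEEKS = 12  # length of the rolling window in the study
N_CYCLES = 9        # number of new monitoring weeks (cycles)


def rolling_schedule(history_weeks=HISTORY_WEEKS, n_cycles=N_CYCLES):
    """Return (cycle, training_weeks, monitored_week) for each cycle.

    Assumption: cycle c uses weeks c .. c + history_weeks - 1 as history
    and monitors the following week, c + history_weeks.
    """
    return [
        (c, list(range(c, c + history_weeks)), c + history_weeks)
        for c in range(1, n_cycles + 1)
    ]


def timeslot_history(records, weeks):
    """Group historical values by timeslot, keeping only the given weeks.

    records: iterable of (week, timeslot, value) triples.
    Returns dict timeslot -> list of values (the sample behind that
    timeslot's empirical distribution).
    """
    wanted = set(weeks)
    by_slot = defaultdict(list)
    for week, slot, value in records:
        if week in wanted:
            by_slot[slot].append(value)
    return dict(by_slot)


if __name__ == "__main__":
    for cycle, weeks, target in rolling_schedule()[:2]:
        print(cycle, weeks[0], weeks[-1], target)
```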
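Conventional CUSUM Procedure
Let $X_n$ denote the measurement of a univariate process at the nth time point, and assume that $X_n \sim N(\mu, \sigma^2)$ with $\mu$ and $\sigma^2$ known. If $X_n$ shifts upward or downward by more than K units from the mean, we say that a serious change has occurred. The CUSUM statistics are

$$S_n^+ = \max\{0,\; S_{n-1}^+ + X_n - (\mu + K)\}$$
$$S_n^- = \max\{0,\; S_{n-1}^- + (\mu - K) - X_n\}$$

with $S_0^+ = S_0^- = 0$.
[Figure: conventional CUSUM illustration, showing $X_n$ around the target mean with K marking the level of change that is "serious," and the CUSUM statistic crossing H, where an alarm is signaled]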
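The two recursions translate directly into code. This Python sketch assumes the Gaussian setting above, with μ, K, and a decision threshold H supplied by the caller (named `mu`, `k`, and `h` here); it returns both statistic paths and the index of the first observation at which either one exceeds H. The function name is illustrative.

```python
def conventional_cusum(xs, mu, k, h):
    """Two one-sided CUSUM statistics for a Gaussian process with known mean.

    xs: observations X_1, X_2, ...
    mu: in-control mean (assumed known).
    k:  reference value K.
    h:  decision threshold H.
    Returns (plus_path, minus_path, first_alarm), where first_alarm is the
    0-based index of the first observation at which S+ or S- exceeds h,
    or None if neither does.
    """
    s_plus = s_minus = 0.0  # S_0^+ = S_0^- = 0
    plus_path, minus_path = [], []
    first_alarm = None
    for n, x in enumerate(xs):
        s_plus = max(0.0, s_plus + x - (mu + k))
        s_minus = max(0.0, s_minus + (mu - k) - x)
        plus_path.append(s_plus)
        minus_path.append(s_minus)
        if first_alarm is None and (s_plus > h or s_minus > h):
            first_alarm = n
    return plus_path, minus_path, first_alarm


if __name__ == "__main__":
    data = [0.1, -0.2, 0.3, 1.8, 2.1, 1.9, 2.2]
    _, _, alarm = conventional_cusum(data, mu=0.0, k=0.5, h=3.0)
    print(alarm)  # 5: S+ grows 1.3, 2.9, then 4.3 > 3
```

Each statistic only accumulates deviations beyond μ ± K, and the max with 0 discards evidence pointing in the other direction, which is why small in-control fluctuations keep both statistics near zero.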
In the recursions above, K is generally called the reference value. If $S_n^+$ or $S_n^-$ rises above a predetermined threshold H, we conclude that the mean has changed. The threshold H is chosen to control the average run length (ARL) between false alarms and is usually obtained from Monte Carlo simulations.
Study Based on Real Client Data: Results
The study used 12 weeks of historical data and 9 new monitoring weeks (cycles). True alarms were determined by a subject-matter expert.

Metric       Number of   Average Detection   False Positives   False Negatives   Computation Time
             Alarms      Time (min)          per Cycle         per Cycle         per Cycle (min)
Live         12          62.17               1.22              0                 17.11
Active       1           8.00                0.00              0                 17.56
Resp. Time   10          352.00              1.11              0                 7.770
Oracle       4           87.50               0.22              0                 12.38

Conclusion: The procedure produced 0 false negatives per cycle for every metric, indicating that true alarms are adequately detected.
Study Based on Simulated Data
Inject an event that shifts the timeslot distributions by 100x% during the second half of the week, and report the average number of samples between the start of the injected event and the point at which the CUSUM signals. Each average is based on 1,000 sample-path simulations per cycle.
End Point
At the time the CUSUM statistic alerts, begin a forward sequence of fitted lines, each using a window of the previous v points. The predicted end point is the time at which the CUSUM attains its largest value within the first window for which the hypothesis
$H_0: \text{slope} \le 0$ vs. $H_1: \text{slope} > 0$
is not rejected on the basis of a t-test.
[Figure: slope-test windows on the CUSUM path. The first backward window before the alarm where the slope is not positive gives the predicted start time; the first forward window after the alarm where the slope is no longer positive gives the predicted end time.]
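The slope test behind the End Point rule, and its mirror image for the Start Point rule, can be sketched in Python as below. Several details are assumptions not fixed by the poster: the line is fitted to the CUSUM statistic itself by ordinary least squares, the test is one-sided at the 5% level, and the window length defaults to a hypothetical v = 12, so the default critical value 1.812 is the 95th percentile of the t distribution with v − 2 = 10 degrees of freedom. The function names are illustrative.

```python
import math


def slope_t_stat(ys):
    """Least-squares slope of ys against 0, 1, ..., len(ys) - 1 and its t statistic."""
    v = len(ys)
    if v < 3:
        raise ValueError("need at least 3 points to estimate the slope's standard error")
    x_bar = (v - 1) / 2
    y_bar = sum(ys) / v
    sxx = sum((i - x_bar) ** 2 for i in range(v))
    slope = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(ys)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - intercept - slope * i) ** 2 for i, y in enumerate(ys))
    se = math.sqrt(sse / (v - 2) / sxx)
    if se == 0.0:
        # Perfect fit: treat a nonzero slope as infinitely significant.
        t = math.inf if slope > 0 else (-math.inf if slope < 0 else 0.0)
    else:
        t = slope / se
    return slope, t


def slope_not_positive(window, t_crit):
    """True when H0: slope <= 0 is not rejected by the one-sided t-test."""
    return slope_t_stat(window)[1] <= t_crit


def predicted_end_point(s, alarm, v=12, t_crit=1.812):
    """Forward search from the alarm over windows of the previous v points.

    In the first window whose slope is no longer significantly positive,
    return the index of the largest CUSUM value (the earliest one on ties).
    """
    for end in range(max(alarm, v - 1), len(s)):
        start = end - v + 1
        window = s[start:end + 1]
        if slope_not_positive(window, t_crit):
            return start + max(range(v), key=window.__getitem__)
    return None  # the CUSUM is still rising at the end of the data


def predicted_start_point(s, alarm, v=12, t_crit=1.812):
    """Backward search from the alarm over windows of v points.

    Return the rightmost point of the first window whose slope is not
    significantly positive.
    """
    for end in range(alarm, v - 2, -1):
        window = s[end - v + 1:end + 1]
        if slope_not_positive(window, t_crit):
            return end
    return None


if __name__ == "__main__":
    cusum = [0, 0, 0, 1, 2, 3, 4, 5, 6, 6, 6, 6, 6]  # toy CUSUM path, alarm at index 6
    # With v = 4 the test has 2 degrees of freedom, so t_crit = 2.920.
    print(predicted_start_point(cusum, 6, v=4, t_crit=2.920))  # 3
    print(predicted_end_point(cusum, 6, v=4, t_crit=2.920))    # 8
```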
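Start Point
When the CUSUM statistic alerts, begin a backward sequence of fitted lines using windows of v points. The predicted start point is the rightmost point of the first window for which the hypothesis
$H_0: \text{slope} \le 0$ vs. $H_1: \text{slope} > 0$
is not rejected on the basis of a t-test.
[Figure: Detection Sensitivity for Response Time Metric. Average number of samples to detect the injected shift, plotted against the shift size x, with one curve per cycle (cycles 1 to 9)]
CUSUM with Resetting
Reset the CUSUM statistics after each alarm to eliminate the effect of the previous alarm. The end of each alarm is determined via the slope test.
[Figure: CUSUM with resetting, showing $X_n$ and the reset point]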
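A sketch of the resetting rule in Python follows. It accepts per-observation reference levels so that the same loop serves the conventional case (μ + K and μ − K) and the nonparametric case ($Q_{t_n}(a)$ and $Q_{t_n}(1-a)$). The function name and the convention that a simultaneous upper and lower crossing is reported as '+' are illustrative assumptions; locating each alarm's end with the slope test is left out to keep the example short.

```python
def cusum_with_reset(xs, upper_ref, lower_ref, h):
    """Two-sided CUSUM that resets both statistics to 0 after every alarm.

    upper_ref[n] plays the role of mu + K (or Q_{t_n}(a)) and lower_ref[n]
    the role of mu - K (or Q_{t_n}(1 - a)) for observation n.
    Returns a list of (index, side) pairs, side being '+' or '-'.
    """
    s_plus = s_minus = 0.0
    alarms = []
    for n, x in enumerate(xs):
        s_plus = max(0.0, s_plus + x - upper_ref[n])
        s_minus = max(0.0, s_minus + lower_ref[n] - x)
        if s_plus > h or s_minus > h:
            # If both cross at once, report the upper side (arbitrary choice).
            alarms.append((n, '+' if s_plus > h else '-'))
            s_plus = s_minus = 0.0  # reset: forget the previous alarm
    return alarms


if __name__ == "__main__":
    data = [0, 0, 3, 3, 0, 0, 3, 3, 0, -3, -3]
    print(cusum_with_reset(data, [1] * len(data), [-1] * len(data), h=3))
    # [(3, '+'), (7, '+'), (10, '-')]
```

Without the reset, $S_n^+$ would still be elevated after the first alarm and the second burst would be reported as a continuation of the first rather than as a separate event.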
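Nonparametric CUSUM Procedure
For non-Gaussian measurements, use the 100a-th and 100(1−a)-th percentiles of each timeslot distribution, $Q_{t_n}(a)$ and $Q_{t_n}(1-a)$, in place of $\mu + K$ and $\mu - K$ (so that $Q_{t_n}(a)$ lies above $Q_{t_n}(1-a)$, a is taken greater than 1/2). The generalized CUSUM becomes

$$S_n^+ = \max\{0,\; S_{n-1}^+ + X_n - Q_{t_n}(a)\}$$
$$S_n^- = \max\{0,\; S_{n-1}^- + Q_{t_n}(1-a) - X_n\}$$

where $t_n \in \{1, 2, \ldots, 168\}$ is the timeslot associated with the current hour, one timeslot for each of the 7 × 24 = 168 hours in a week.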
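The generalized CUSUM can be sketched in Python as follows. Two choices here are assumptions rather than details from the poster: percentiles are taken from each timeslot's historical values with linear interpolation between order statistics, and the hour-of-week timeslot is numbered from Monday 00:00 (timeslot 1) to Sunday 23:00 (timeslot 168). All names are illustrative.

```python
from datetime import datetime


def empirical_quantile(values, p):
    """p-quantile of the empirical distribution, interpolating linearly
    between order statistics (one of several common conventions)."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def timeslot(ts):
    """Hour-of-week timeslot in {1, ..., 168}; assumes weeks start Monday 00:00."""
    return ts.weekday() * 24 + ts.hour + 1


def nonparametric_cusum(xs, slots, history, a=0.95):
    """Generalized CUSUM with timeslot-specific percentiles.

    xs:      observations X_n.
    slots:   timeslot t_n of each observation.
    history: dict timeslot -> historical values for that timeslot.
    a:       percentile level above 1/2, so Q_t(a) is the upper reference.
    Returns the paths of S_n^+ and S_n^-.
    """
    refs = {t: (empirical_quantile(v, a), empirical_quantile(v, 1 - a))
            for t, v in history.items()}
    s_plus = s_minus = 0.0
    plus_path, minus_path = [], []
    for x, t in zip(xs, slots):
        q_a, q_1ma = refs[t]
        s_plus = max(0.0, s_plus + x - q_a)
        s_minus = max(0.0, s_minus + q_1ma - x)
        plus_path.append(s_plus)
        minus_path.append(s_minus)
    return plus_path, minus_path


if __name__ == "__main__":
    print(timeslot(datetime(2006, 5, 1, 0)))  # Monday midnight -> 1
    hist = {1: [10, 11, 12, 13, 14], 2: [20, 21, 22, 23, 24]}
    print(nonparametric_cusum([15, 26, 15], [1, 2, 1], hist, a=0.75))
```

Compared with the conventional sketch, the only change is that the fixed levels μ ± K become lookups keyed by $t_n$, which is what lets the procedure track the weekly cycle in the data.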
Conclusion (simulated data): If the shift is small, the average number of samples until detection is large. If the shift is large, the average number of samples until detection is small, so an alarm is signaled almost immediately.
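The simulated-data experiment can be sketched in Python to show how an average detection delay is computed. Several details are assumptions rather than the poster's specification: one observation per hourly timeslot, a multiplicative reading of a shift of 100x% (values scaled by 1 + x), percentiles computed with linear interpolation, signals raised before the event ignored, and paths that never signal counted at their full event length. Function names are illustrative.

```python
import random


def empirical_quantile(values, p):
    """p-quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def average_detection_delay(history, a, h, x, n_paths=1000, seed=0):
    """Average number of samples from the start of an injected shift to the
    first CUSUM signal, counting the signalling sample itself.

    history: dict timeslot -> historical values; one simulated observation
             per timeslot, in timeslot order, makes up one week.
    x:       shift size; values in the second half of the week are scaled
             by (1 + x), a hypothetical reading of "shift by 100x%".
    """
    rng = random.Random(seed)
    slots = sorted(history)
    refs = {t: (empirical_quantile(history[t], a),
                empirical_quantile(history[t], 1 - a)) for t in slots}
    event_start = len(slots) // 2  # event occupies the second half of the week
    total = 0
    for _ in range(n_paths):
        s_plus = s_minus = 0.0
        delay = len(slots) - event_start  # censored value if no signal
        for n, t in enumerate(slots):
            value = rng.choice(history[t])
            if n >= event_start:
                value *= 1 + x
            upper, lower = refs[t]
            s_plus = max(0.0, s_plus + value - upper)
            s_minus = max(0.0, s_minus + lower - value)
            if n >= event_start and (s_plus > h or s_minus > h):
                delay = n - event_start + 1
                break
        total += delay
    return total / n_paths


if __name__ == "__main__":
    hist = {t: [10.0, 11.0, 12.0, 9.0, 10.5] for t in range(1, 169)}
    for shift in (0.05, 0.2, 1.0):
        print(shift, average_detection_delay(hist, a=0.95, h=5.0, x=shift, n_paths=200))
```

Sweeping `x` over a grid and plotting the returned averages gives a curve of the kind summarized in the conclusion above: the larger the shift, the fewer samples are needed before the CUSUM signals.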
[Figure: Real example from Integrien data, showing the $S^+$ and $S^-$ statistics]
Special Thanks To: The Staff of Integrien Corporation, Pengyue James
Lin (CTO, College of Humanities, Arts and Social Sciences at UCR), Dr.
Huaying Karen Xu (Associate Director of Statistical Consulting
Collaboratory at UCR), Prof. Keh-Shin Lii (Dept of Statistics at UCR),
Graduate Students of the Spring 2006 offering of STAT 293.