Dataset Shift Detection in Non-Stationary Environments using EWMA Charts
Prof. Girijesh Prasad
Co-authors: Haider Raza, Yuhua Li
School of Computing & Intelligent Systems @ Magee, Faculty of Computing & Engineering, Derry~Londonderry
g.prasad@ulster.ac.uk
http://isrc.ulster.ac.uk

Outline
• Motivation
• Background
• Proposed contribution
• Future work and conclusion

Motivation
• Classical learning systems are built on the assumption that the input data distributions for training and testing are the same.
• Real-world environments are often non-stationary (e.g., EEG-based BCI).
• Learning in real-time environments is therefore difficult: non-stationarity degrades system performance over time, so predictors need to adapt online.
• However, online adaptation, particularly of classifiers, is difficult to perform and should be avoided as far as possible. This calls for a real-time non-stationary shift-detection test.
[Figure: block diagram of an adaptive predictor — inputs x1, x2; prediction y; target t; error e.]

Background
• Supervised learning (Mitchell, 1997), (Sugiyama et al., 2009)
• Non-stationary environments (Krauledat, 2008), (Sugiyama, 2012)
• Dataset shift (Torres et al., 2012)
• Dataset shift-detection (Shewhart, 1939), (Page, 1954), (Roberts, 1959), (Alippi & Roveri, 2008a; 2008b), (Alippi et al., 2011b)

Supervised Learning
• Training samples: input (x) and output (y).
• Learn the input-output rule y = f(x).
• Assumption: "Training and test samples are drawn from the same probability distribution", i.e., P_train(y, x) = P_test(y, x).
• Is this assumption really true? No, not always. The reason: non-stationary environments.
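To make the failing assumption concrete, here is a minimal Python sketch (the distributions, threshold, seed, and the helper name `label` are illustrative assumptions, not from the slides): the labelling rule p(y|x) is held fixed, but the covariate distribution p(x) differs between training and test, so the joint distributions no longer match.

```python
import numpy as np

def label(x):
    # Fixed input-output rule p(y|x): identical for training and test
    # (illustrative threshold rule).
    return (x > 1.0).astype(int)

rng = np.random.default_rng(42)
x_train = rng.normal(0.0, 1.0, 5000)   # training covariates: N(0, 1)
x_test  = rng.normal(2.0, 1.0, 5000)   # test covariates have drifted: N(2, 1)
y_train, y_test = label(x_train), label(x_test)

# p(y|x) is unchanged, but p(x) differs, hence P_train(y, x) != P_test(y, x):
# a model tuned on the training region of x is evaluated mostly outside it.
```

A classifier fitted to `x_train` would see most of `x_test` in a region it rarely observed during training, which is exactly the mismatch the motivation slide describes.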
Non-Stationarity
Definition: Let (X_t), t ∈ I, be a multivariate time-series, where I ⊂ ℝ is an index set. Then (X_t) is called a stationary time-series if its probability distribution does not change over time, i.e., P(X_t(i)) = P(X_t(j)) for all t(i), t(j) ∈ I. A time-series is called non-stationary if it is not stationary.
For example: a Gaussian source 𝒩(μ, σ²) whose parameters drift over time. Learning from the past only is then of limited use.
Application areas:
• Brain-computer interfaces
• Robot control
• Remote sensing
• Network intrusion detection
What is the challenge?

Dataset Shift
Dataset shift appears when the training and test joint distributions are different, i.e., P_train(y, x) ≠ P_test(y, x) (Torres et al., 2012).
Note: the relationship between the covariates (x) and the class label (y) defines two problem families:
• X→Y: predictive model (e.g., spam filtering)
• Y→X: generative model (e.g., fault detection)
Types of dataset shift:
• Covariate shift (appears only in X→Y problems): p_train(y|x) = p_test(y|x) and p_train(x) ≠ p_test(x).
• Prior probability shift (appears only in Y→X problems): p_train(x|y) = p_test(x|y) and p_train(y) ≠ p_test(y).
• Concept shift: the relationship itself changes, i.e., p_train(y|x) ≠ p_test(y|x) in X→Y problems.

Dataset Shift-Detection
Detecting abrupt and gradual shifts in time-series data is called dataset shift-detection.
Types of shift-detection:
• Retrospective/offline detection (i.e., shift-point analysis)
• Real-time/online detection (i.e., control charts)
Types of control charts:
• Shewhart chart (Shewhart, 1939)
• Cumulative sum, CUSUM (Page, 1954)
• Exponentially weighted moving average, EWMA (Roberts, 1959)
• Computational intelligence CUSUM, CI-CUSUM (Alippi et al., 2008)
• Intersection of confidence intervals, ICI (Alippi et al., 2011)

Proposed Contribution
We have proposed a dataset shift-detection test:
• Shift-Detection based on Exponentially Weighted Moving Average (SD-EWMA)

Shift-Detection based on Exponentially Weighted Moving Average (SD-EWMA)
The EWMA statistic is
z(i) = λ·x(i) + (1−λ)·z(i−1)    (1)
where λ is the smoothing constant (0 < λ ≤ 1). The EWMA is the optimal predictor for a first-order integrated moving average, ARIMA(0,1,1), process
x(i) = x(i−1) + ε(i) − θ·ε(i−1)    (2)
where ε(i) is a sequence of i.i.d. random variables with zero mean and constant variance. Equation (1) with λ = 1 − θ gives the optimal 1-step-ahead prediction for this process: x̂(i+1)(i) = z(i).
The 1-step-ahead error is calculated as
err(i) = x(i) − x̂(i)(i−1)    (3)
If the 1-step-ahead error is normally distributed, then
P(−L·σ_err ≤ err(i) ≤ L·σ_err) = 1 − α
i.e., P(x̂(i)(i−1) − L·σ_err ≤ x(i) ≤ x̂(i)(i−1) + L·σ_err) = 1 − α
which gives the upper and lower control limits (UCL and LCL).

Proposed Algorithm: SD-EWMA
Training input: for each observation in the training dataset, go to the training phase and compute the parameters for testing.
Testing input: receive new data sample-by-sample and perform the check as follows. IF (shift detected) THEN (report the point of shift and initiate an appropriate corrective action) ELSE (continue and integrate the upcoming information).
Output: shift-detection points.

Training Phase
1. Initialize training data: x(i) for i = 1 to n, where n is the size of the training data.
2. Calculate the mean of the input data x and set it as the initial value z(0).
3. Compute the z-statistic for each observation x(i) in the training data: z(i) = λ·x(i) + (1−λ)·z(i−1).
4. Compute the 1-step-ahead prediction errors: err(i) = x(i) − x̂(i)(i−1).
5. Compute the sum of squared 1-step-ahead prediction errors divided by the number of observations and use it as the initial value of the variance σ²_err(0) for the testing phase.
6. Set λ by minimizing the sum of squared prediction errors on the training data.

Testing Phase
1. For each data point x(i+1) in the operation/testing phase:
2. Compute z(i) = λ·x(i) + (1−λ)·z(i−1)
3. Compute err(i) = x(i) − x̂(i)(i−1)
4.
Compute the estimated variance: σ²_err(i) = ϑ·err(i)² + (1−ϑ)·σ²_err(i−1)
5. Compute UCL(i+1) and LCL(i+1):
6. UCL(i+1) = z(i) + L·σ_err
7. LCL(i+1) = z(i) − L·σ_err
8. IF (LCL(i+1) < x(i+1) < UCL(i+1))
9. THEN
10. Continue processing with the new sample
11. ELSE (covariate shift detected)
12. Initiate an appropriate corrective action.

Datasets
Synthetic data:
Dataset 1, Jumping Mean (D1): y(t) = 0.6·y(t−1) − 0.5·y(t−2) + ε(t), where ε(t) is noise with mean μ and standard deviation 1.5. The initial values are set as y(1) = y(2) = 0. A change point is inserted every 100 time steps by setting the noise mean μ at time t as
μ_N = 0 for N = 1, and μ_N = μ_{N−1} + N/16 for N = 2, ..., 49,
where N is a natural number such that 100(N−1) + 1 ≤ t ≤ 100N.

Dataset 2, Scaling Variance (D2): the change point is inserted every 100 time steps by setting the noise standard deviation σ at time t as
σ = 1 for N = 1, 3, ..., 49, and σ = ln(e + N/4) for N = 2, 4, ..., 48,
where N is a natural number such that 100(N−1) + 1 ≤ t ≤ 100N.

Dataset 3, Positive Auto-correlated (D3): the dataset consists of 2000 data points; the non-stationarity occurs in the middle of the data stream, shifting from 𝒩(x; 1, 1) to 𝒩(x; 3, 1), where 𝒩(x; μ, σ) denotes the normal distribution with mean μ and standard deviation σ.

Dataset 4, Auto-correlated (D4): the dataset is a time-series of 2000 data points produced with a 1-D digital filter in MATLAB. The filter function creates a direct form II transposed implementation of a standard difference equation. In the filter, the denominator coefficient is changed from 2 to 0.5 after producing 1000 points.

Real-world dataset: EEG-based brain signals
The real-world data used here are from BCI competition III, dataset IVb. This dataset contains 2 classes, 118 EEG channels (0.05-200 Hz), and a 1000 Hz sampling rate down-sampled to 100 Hz, with 210 training trials and 420 test trials.
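Dataset 1 above can be generated along these lines; this is a minimal sketch (the function name, seed, and return convention are illustrative assumptions):

```python
import numpy as np

def jumping_mean_series(n_segments=49, seg_len=100, seed=0):
    """Dataset 1 (jumping mean): y(t) = 0.6*y(t-1) - 0.5*y(t-2) + eps(t),
    eps(t) ~ N(mu_N, 1.5^2), where the noise mean jumps every seg_len steps:
    mu_1 = 0 and mu_N = mu_{N-1} + N/16 for N = 2..49."""
    rng = np.random.default_rng(seed)
    mu = 0.0
    y = [0.0, 0.0]                       # initial values y(1) = y(2) = 0
    for N in range(1, n_segments + 1):
        if N > 1:
            mu += N / 16.0               # change point at each segment boundary
        for _ in range(seg_len):
            eps = rng.normal(mu, 1.5)
            y.append(0.6 * y[-1] - 0.5 * y[-2] + eps)
    return np.array(y[2:])               # drop the two seed values
```

Each 100-sample segment is driven by a progressively larger noise mean, so the series level jumps at every segment boundary, which is the behaviour the shift detector is asked to catch.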
[Figure: pdf plot of three different sessions' data taken from the training dataset (x-axis: log(bandpower)). The plot shows that in each session the distribution changes, with the mean shifting from session to session.]

[Figure: Shift detection based on SD-EWMA on Dataset 1 (jumping mean), plotting the SD-EWMA statistic with the UCL and LCL against sample number: (a) the shift point is detected at every 100th point; (b) zoomed view of (a): a shift is detected at the 401st sample by crossing the upper control limit.]

[Figure: Shift detection based on SD-EWMA: (a) Dataset 2 (scaling variance): the shift is detected at 3 points; (b) Dataset 3 (positive auto-correlated): the shift is detected after 1000 observations; (c) Dataset 4 (auto-correlated): the shift is detected after 1000 observations.]
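The detections shown in these figures come from the SD-EWMA procedure listed earlier. The following is a minimal Python sketch of it (the default ϑ and L values, all variable names, and the omission of the λ-tuning step are illustrative assumptions):

```python
import numpy as np

def sd_ewma_train(x_train, lam):
    """Training phase: EWMA statistic z, 1-step-ahead errors, and the
    initial error variance sigma^2_err(0)."""
    z = float(np.mean(x_train))          # step 2: z(0) = mean of training data
    errs = []
    for x in x_train:
        errs.append(x - z)               # err(i) = x(i) - x_hat(i)(i-1) = x(i) - z(i-1)
        z = lam * x + (1.0 - lam) * z    # z(i) = lam*x(i) + (1-lam)*z(i-1)
    var0 = float(np.mean(np.square(errs)))  # mean squared 1-step-ahead error
    return z, var0

def sd_ewma_test(x_stream, z, var, lam, vartheta=0.1, L=3.0):
    """Testing phase: flag any sample falling outside [LCL, UCL]."""
    shift_points = []
    for i, x in enumerate(x_stream):
        sigma = np.sqrt(var)
        ucl = z + L * sigma              # UCL(i+1) = z(i) + L*sigma_err
        lcl = z - L * sigma              # LCL(i+1) = z(i) - L*sigma_err
        if not (lcl < x < ucl):
            shift_points.append(i)       # covariate shift detected at sample i
        err = x - z                      # 1-step-ahead prediction error
        var = vartheta * err**2 + (1.0 - vartheta) * var
        z = lam * x + (1.0 - lam) * z
    return shift_points
```

On a stream whose mean jumps, the first out-of-limit sample is flagged as the shift point; in a full system this is where the "appropriate corrective action" of the algorithm would be triggered.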
TABLE: SD-EWMA shift detection in time-series data

| Dataset | Total shifts | λ    | #TP | #FP | #FN | % shifts detected |
|---------|--------------|------|-----|-----|-----|-------------------|
| D1      | 9            | 0.20 | 8   | 1   | 1   | 88.8              |
|         |              | 0.30 | 8   | 3   | 1   | 88.8              |
|         |              | 0.40 | All | 4   | 0   | 100               |
| D2      | 5            | 0.60 | All | 11  | 0   | 100               |
|         |              | 0.70 | All | 6   | 0   | 100               |
|         |              | 0.80 | 4   | 8   | 1   | 80                |
| D3      | 1            | 0.40 | All | 13  | 0   | 100               |
|         |              | 0.50 | All | 9   | 0   | 100               |
|         |              | 0.60 | All | 15  | 0   | 100               |
| D4      | 1            | 0.40 | All | 7   | 0   | 100               |
|         |              | 0.50 | All | 6   | 0   | 100               |
|         |              | 0.60 | All | 7   | 0   | 100               |

Table: Simulation results on different tests

| Dataset | ICI-CDT #TP | #FP | #FN | AD | SD-EWMA #TP | #FP | #FN | AD |
|---------|-------------|-----|-----|----|-------------|-----|-----|----|
| D1      | 6           | 0   | 3   | 35 | All         | 1   | 0   | 0  |
| D2      | 1           | 0   | 4   | 60 | All         | 6   | 0   | 0  |
| D3      | 1           | 0   | 0   | 80 | All         | 9   | 0   | 0  |
| D4      | 1           | 0   | 0   | 60 | All         | 6   | 0   | 0  |

[Figure 4: A window of 2000 samples obtained from the real-world dataset, showing log(bandpower) with the SD-EWMA statistic, UCL, and LCL over samples 5500-6500.]

TABLE 4: SD-EWMA shift detection in BCI data

| λ    | Number of trials | # Shift points (Session 2) | # Shift points (Session 3) |
|------|------------------|----------------------------|----------------------------|
| 0.01 | 1-20             | 0                          | 1                          |
|      | 21-45            | 3                          | 1                          |
|      | 46-70            | 4                          | 2                          |
| 0.05 | 1-20             | 10                         | 4                          |
|      | 21-45            | 9                          | 6                          |
|      | 46-70            | 7                          | 5                          |
| 0.10 | 1-20             | 16                         | 9                          |
|      | 21-45            | 22                         | 21                         |
|      | 46-70            | 25                         | 17                         |

Conclusion and Future Work
• The drawbacks of classical supervised learning techniques in non-stationary environments and the motivation behind dataset shift-detection were discussed.
• The background of non-stationary environments and dataset shift-detection was presented.
• The proposed SD-EWMA method was presented and its results were discussed.
• In future work, SD-EWMA will be combined into an adaptive learning framework for non-stationary learning.

Questions
Thank You!