Change Detection in Data Streams by Testing Exchangeability
Shen-Shyang Ho
JPL/Caltech
The research is part of the author’s PhD dissertation (in computer science) at George Mason University.
Conference travel is partially sponsored by a NASA Postdoctoral Program (NPP) Travel Grant.
3/12/2016

Outline
1. Introduction
2. Previous Work (Statistics and Machine Learning/Data Mining/Computer Vision)
3. Intuition
4. Background (Exchangeability/Martingale)
5. Methodology
6. Comparison and Experimental Results
7. Application I: Adaptive Support Vector Machine (Classification Model)
8. Application II: Video Shot Change Detection (Cluster Model)

Introduction
Let $X_1, X_2, \ldots, X_n$ be a sequence of independent $p$-dimensional random vectors with parameters $\theta_1, \theta_2, \ldots, \theta_n$. Test the following hypotheses:
$H_0: \theta_1 = \theta_2 = \cdots = \theta_n = \theta_0$
$H_1: \exists\, m$ with $\theta_1 = \cdots = \theta_m \neq \theta_{m+1} = \cdots = \theta_n$
Assumption: Data vectors are observed sequentially.

Previous Work
Statistics: Sequential analysis is statistical inference under the assumption that the number of observations/samples required is not predetermined.
• Sequential Probability Ratio Test – A. Wald (1945)
• Application: Quality Control (Military/Manufacturing)
• CUSUM (Cumulative Sum) – E. S. Page (1954)
• Refer to the journal “Sequential Analysis: Design Methods and Applications” for recent research.
• Most recent issue (vol. 27, no. 2, 2008): 3 out of 6 papers are on structural change, minimax methods for change-point detection problems, and multidecision quickest change-point detection.
Machine Learning/Data Mining:
• Applications: Concept Drift Problem, Adaptive Classifier, Anomaly in Internet Traffic, Video-Shot Change Detection
• Proposed methodology is usually problem-specific
• Monitoring error, sliding window, weighted data, ensemble classifier …
• Statistical methods: likelihood ratio method, Bayesian methods, hypothesis testing …

Related Data Mining/Machine Learning/Computer Vision Research
1. Xiuyao Song, Mingxi Wu, Christopher M. Jermaine, Sanjay Ranka: Statistical change detection for multi-dimensional data. KDD 2007: 667-676
2. Kolter, J.Z. and Maloof, M.A.: Dynamic Weighted Majority: An ensemble method for drifting concepts. Journal of Machine Learning Research 8: 2755-2790, 2007
3. Klinkenberg, Ralf and Joachims, Thorsten: Detecting Concept Drift with Support Vector Machines. Proceedings of the Seventeenth International Conference on Machine Learning (ICML): 487-494, 2000
4. Bi Song, Namrata Vaswani, Amit K. Roy Chowdhury: Closed-Loop Tracking and Change Detection in Multi-Activity Sequences. CVPR 2007
5. Paul L. Rosin: Thresholding for Change Detection. ICCV 1998: 274-279
6. Balachander Krishnamurthy, Subhabrata Sen, Yin Zhang, Yan Chen: Sketch-based change detection: methods, evaluation, and applications. Internet Measurement Conference 2003: 234-247
7. Tsuyoshi Idé, Keisuke Inoue: Knowledge Discovery from Heterogeneous Dynamic Systems using Change-Point Correlations. SDM 2005
8. Tsuyoshi Idé, Koji Tsuda: Change-Point Detection using Krylov Subspace Learning. SDM 2007
9. Daniel Kifer, Shai Ben-David, Johannes Gehrke: Detecting Changes in Data Streams. Proc. 30th VLDB Conference, 2004
10. …

Motivation
“Lack of Exchangeability” implies “Change in Data Distribution/Model”

Intuition
Identically distributed but may be dependent
1 2 3 4 5 6 7 8 9 10
1 9 3 5 2 6 7 4 8 10
1 2 3 4 5 6 7 8 9 10
1 9 3 5 2 6 7 2 8 10

Background
Vovk et al.’s work on “Testing Exchangeability Online” (ICML 2003) and “Algorithmic Learning in a Random World” (Springer):
1. Testing the exchangeability assumption in an online mode.
2. Explicit martingale for testing the hypothesis of exchangeability (refer to (conformal prediction))

Background
Let $\{Z_i : 1 \le i < \infty\}$ be a sequence of random variables.
A finite sequence of random variables $Z_1, \ldots, Z_n$ is exchangeable if the joint distribution $p(Z_1, \ldots, Z_n)$ is invariant under any permutation of the indices of the random variables.
A martingale is a sequence of random variables $\{M_i : 0 \le i < \infty\}$ such that $M_n$ is a measurable function of $Z_1, \ldots, Z_n$ for all $n = 0, 1, \ldots$ (in particular, $M_0$ is a constant value) and the conditional expectation of $M_{n+1}$ given $Z_1, \ldots, Z_n$ is equal to $M_n$, i.e.,
$E(M_{n+1} \mid Z_1, \ldots, Z_n) = M_n$.

Background (Doob's Maximal Inequality)
Suppose that $\{M_i : 0 \le i < \infty\}$ is a nonnegative martingale. Then for any $\lambda > 0$ and $n$,
$\lambda \, P\left(\max_{0 \le k \le n} M_k \ge \lambda\right) \le E(M_n)$.

Methodology - Strangeness
Strangeness measures how well one data point is represented by a data model compared to the other points; it is computed for each data point seen so far.
• Applicable to classification, regression, or cluster models
• Measures diversity/disagreement, i.e., the higher the strangeness of a point, the less likely it comes from the model
Condition for a valid strangeness measure: the strangeness value of a data point at a particular time instance should be independent of the order in which it is observed with respect to the other data points.

Classification Model
Strangeness (K-NN):
$\alpha_i = \frac{\sum_{j=1}^{k} d_{ij}^{y}}{\sum_{j=1}^{k} d_{ij}^{-y}}$
where $d_{ij}^{y}$ is the $j$th shortest distance from point $i$ to the points with the same label $y$, and $d_{ij}^{-y}$ is the $j$th shortest distance from point $i$ to the points with a different label.
Example data stream: t = 1 to 1000 from A, 1001 to 2000 from B, 2001 to 3000 from C (aaaaa…aaaaabbbbbb…bbbbbccccc…cccccc).
Strangeness (SVM): Lagrange Multiplier

Cluster Model
Strangeness of a data vector $z_i$ in a cluster:
$\alpha_i = \|z_i - C\|$
where $C$ is the center of the cluster.
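
The K-NN and cluster strangeness measures above can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's implementation: Euclidean distance is assumed, the cluster center is assumed to be the mean of the cluster's points, and the function names (`euclidean`, `knn_strangeness`, `cluster_strangeness`) are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_strangeness(points, labels, i, k=1):
    """K-NN strangeness of point i: sum of its k shortest same-label
    distances divided by the sum of its k shortest other-label distances.
    Assumes at least one other-label point at nonzero distance."""
    same = sorted(euclidean(points[i], points[j])
                  for j in range(len(points))
                  if j != i and labels[j] == labels[i])
    other = sorted(euclidean(points[i], points[j])
                   for j in range(len(points))
                   if labels[j] != labels[i])
    return sum(same[:k]) / sum(other[:k])

def cluster_strangeness(points, i):
    """Cluster strangeness of point i: distance to the cluster center,
    with the center assumed to be the mean of all points."""
    dim = len(points[0])
    center = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return euclidean(points[i], center)

# Small illustrative data set: two labelled groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
labels = ["a", "a", "b", "b"]
alpha_knn = knn_strangeness(points, labels, 0, k=1)
alpha_cluster = cluster_strangeness(points, 0)
```

Both functions depend only on the set of points, not on the order in which they arrived, which is the validity condition stated for a strangeness measure.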

Regression Model
$\alpha_i(x_i, y_i) = \frac{|y_i - f(x_i)|}{\exp(g(x_i))}$
where $f$ is the regression function and $g$ is the error estimation function for $f$ at $x_i$ (Papadopoulos et al., Inductive Confidence Machines for Regression, ECML, LNAI 2430, pp. 345-356, 2002).

Methodology
p-value of a new point $x_{n+1}$ given the previously seen data points:
$PV(\{x_1, x_2, \ldots, x_{n+1}\}, \theta_{n+1}) = \frac{\#\{i : \alpha_i > \alpha_{n+1}\} + \theta_{n+1}\, \#\{i : \alpha_i = \alpha_{n+1}\}}{n+1}$   (B)
where $\alpha_i$ is the strangeness measure for $x_i$, $i = 1, 2, \ldots, n+1$, and $\theta_{n+1}$ is randomly chosen from $[0,1]$ for each new point $x_{n+1}$.
$\theta_{n+1}$: necessary so that the sequence of p-values is uniformly distributed in $[0,1]$ for any strangeness measure (Vovk, 2003).

Methodology
Consider the null hypothesis
$H_0$: no change in the data stream
against the alternative hypothesis
$H_1$: a change occurs in the data stream.
The test for change continues as long as $0 < M_i < \lambda$.
One rejects the null hypothesis when $M_i \ge \lambda$.

Methodology
$M_n^{(\varepsilon)} = \prod_{i=1}^{n} \left(\varepsilon\, p_i^{\varepsilon - 1}\right)$
where $\varepsilon$ is a fixed positive number in $(0,1)$ and $p_i$, $i = 1, \ldots, n$, are the p-values at times $1, \ldots, n$.

Experimental Result – Performance Measure

Experimental Result – Varying

Experimental Result – Varying Strangeness

Experimental Result – Varying
Linearly Separable Classification Model
Linearly Non-separable Classification Model

Experimental Result
Ringnorm/Twonorm (change in dataset every 1000 points)
Nursery Categorical Dataset (change in class compositions every 1000 points)

Experimental Result – Different Methods
Changes at time 200 from Y1 3x1 x22 n1 to Y2 10 x1 222 n2 where n1 and n2 are Gaussian noise.
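
The randomized p-value (B), the martingale $M_n^{(\varepsilon)}$, and the stopping rule from the Methodology slides can be put together as a short sketch. It is an illustration under assumptions rather than the author's code: the function names, the 1-D distance-to-mean strangeness, the small floor on the p-value, and the values $\varepsilon = 0.92$ and $\lambda = 20$ in the usage lines are all illustrative choices.

```python
import random

def p_value(alphas, theta):
    """Randomized p-value (B) of the newest point; alphas[-1] is its strangeness."""
    newest = alphas[-1]
    greater = sum(1 for a in alphas if a > newest)
    equal = sum(1 for a in alphas if a == newest)  # counts the newest point too
    return (greater + theta * equal) / len(alphas)

def first_alarm(p_values, epsilon, threshold):
    """Index of the first p-value at which M >= threshold, or None if no alarm."""
    m = 1.0  # M_0 is a constant
    for i, p in enumerate(p_values):
        p = max(p, 1e-12)  # assumption: floor p, since theta = 0 can give p = 0
        m *= epsilon * p ** (epsilon - 1.0)
        if m >= threshold:
            return i
    return None

def stream_p_values(stream, strangeness_all, seed=0):
    """Yield one p-value per arriving point. `strangeness_all` (hypothetical)
    maps the points seen so far to the list of their strangeness values."""
    rng = random.Random(seed)
    seen = []
    for x in stream:
        seen.append(x)
        yield p_value(strangeness_all(seen), rng.random())

def distance_to_mean(seen):
    """1-D cluster-style strangeness: |x - mean of the points seen so far|."""
    center = sum(seen) / len(seen)
    return [abs(x - center) for x in seen]

# Usage: a synthetic 1-D stream whose mean shifts after 200 points.
data_rng = random.Random(1)
stream = ([data_rng.gauss(0.0, 1.0) for _ in range(200)]
          + [data_rng.gauss(10.0, 1.0) for _ in range(200)])
alarm = first_alarm(stream_p_values(stream, distance_to_mean),
                    epsilon=0.92, threshold=20.0)
print("alarm index:", alarm)
```

Since $\int_0^1 \varepsilon p^{\varepsilon - 1}\, dp = 1$, the martingale has expectation 1 when the p-values are uniform, so Doob's maximal inequality bounds the probability of a false alarm by $1/\lambda$ (0.05 for the threshold used here).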

Application: Adaptive SVM
Simulated USPS 3-Digit Image Data Stream
01120120…0340033404…156556115…77789987… (t)

Application: Adaptive SVM
A (blue): True change point known to the SVM
B (red): Adaptive SVM using martingale method
C (magenta): SVM using sliding window of size 250
D (black): SVM using sliding window of size 500
E (green): SVM using sliding window of size 1000

Application: Video-Shot Change Detection
Martingale change detection using multiple features (MVMT: Multiple-view martingale test)

Application: Video-Shot Change Detection
• HI: Histogram Intersection
• Chi-Square Measure
• Euclidean Distance (ED)

Reference
1. S.-S. Ho and H. Wechsler, Detecting Change-Points in Unlabeled Data Streams using Martingale, Proc. 20th Int. Joint Conf. Artificial Intelligence (IJCAI 2007), Hyderabad, India, Jan. 6-12, 2007.
2. S.-S. Ho, A Martingale Framework for Concept Change Detection in Time-Varying Data Streams, Proc. Int. Conf. on Machine Learning (ICML 2005), Bonn, Germany, Aug. 7-11, 2005.
3. S.-S. Ho and H. Wechsler, Adaptive Support Vector Machine for Time-Varying Data Streams Using the Martingale, Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI 2005), Edinburgh, Scotland, July 30 - Aug. 5, 2005.
4. S.-S. Ho and H. Wechsler, On the Detection of Concept Change in Time-Varying Data Streams by Testing Exchangeability, Proc. Conference on Uncertainty in Artificial Intelligence (UAI 2005), Edinburgh, Scotland, July 26-29, 2005.
5. (matlab codes + datasets)

Acknowledgement
• Harry Wechsler, PhD Advisor (George Mason University)
• Volodya Vovk (Royal Holloway, University of London)
• Alexander Gammerman (Royal Holloway, University of London)
• Oak Ridge Associated University (ORAU)