Towards a Minimum Description Length Based Stopping Criterion for Semi-Supervised Time Series Classification
Nurjahan Begum, Bing Hu, Thanawin Rakthanmanon, and Eamonn Keogh

Outline
- Introduction
- Proposed Stopping Criterion
- Motivation of Stopping Criterion for Semi-Supervised Classification
- Minimum Description Length (MDL) technique
- Our Approach
- Experimental Results
- Conclusion

Introduction
We have developed a Minimum Description Length based stopping criterion for semi-supervised time series classification.
- Why Semi-Supervised Learning?
- Why do we need a Stopping Criterion?

Why Semi-Supervised Learning?
- Labeled data: scarce and extremely expensive*; requires human intervention.
- Unlabeled data: abundant. The PhysioBank archive** has more than 700 GB of digitized signals and time series freely available.
- Semi-supervised classification: needs less labeled data and less human effort, and usually obtains higher accuracy***.

* F. Florea et al., Medical image categorization with MedIC and MedGIFT (2006)
** A. L. Goldberger et al., PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals (2000)
*** L. Wei et al., Semi-Supervised Time Series Classification (2006)

Why do we need a Stopping Criterion?
[Figure: Cardiac Tamponade Patient vs. Normal Patient; use semi-supervised classification.]
Oops… We are adding false positives!
Our Contribution
- A novel, parameter-free stopping criterion using Minimum Description Length (MDL) for semi-supervised time series classification.
- Allows easy adaptation by experts in the medical community.

Minimum Description Length (MDL)
MDL is a formalization of Occam's Razor: the best hypothesis for a given set of data is the one that leads to the best compression of the data.

Why MDL?
- Intrinsically parameter-free
- Leverages the true underlying structure of the data
- Avoids needing to explain all of the data
- Has recently shown great potential for real-valued time series data

Our Approach
Given: the original time series and a positive instance.
1. Discretize the time series.
2. Repeat:
   - Find the nearest neighbor of the positive instance set.
   - Calculate the BitCount.
   Until the BitCount increases.

Discrete Normalization (Why?)
- MDL is defined in discrete space.
- Time series are real-valued.
- We need to normalize real-valued data into a space of reduced cardinality.
- Won't drastic information reduction lose meaningful information?

Will Discrete Normalization lose meaningful information?
The answer is NO! Justification: a time series clustering experiment* [1][2].
[Figure: clustering of the real-valued time series vs. the discretized time series (cardinality = 16).]
* Incartdb dataset (Record I70, Signal II) [www.physionet.org]
[1] B. Hu et al., Discovering the Intrinsic Cardinality and Dimensionality of Time Series using MDL (2011)
[2] T. Rakthanmanon et al., Time Series Epenthesis: Clustering Time Series Streams Requires Ignoring Some Data (2011)

Our Approach
[Figure, built up one iteration at a time: the hypothesis H and a plot of BitCount (axis 2500 to 3500) against the number of instances encoded (axis 0 to 6), with the stopping point marked.]
Iteration 0: BitCount = 100 * log2(16) + 6 * 100 * log2(16) = 2800
Iteration 1: BitCount = 100 * log2(16) + 6 * (ceil(log2(100)) + log2(16)) + 5 * 100 * log2(16) = 2466
Iteration 2: BitCount = 100 * log2(16) + 22 * (ceil(log2(100)) + log2(16)) + 4 * 100 * log2(16) = 2242
Iteration 3: BitCount = 100 * log2(16) + 37 * (ceil(log2(100)) + log2(16)) + 3 * 100 * log2(16) = 2007
Iteration 4: BitCount = 100 * log2(16) + 115 * (ceil(log2(100)) + log2(16)) + 2 * 100 * log2(16) = 2465
Iteration 5: BitCount = 100 * log2(16) + 192 * (ceil(log2(100)) + log2(16)) + 1 * 100 * log2(16) = 2912

Experimental Results
Dataset | Recordings | Record | Signal | # of target class instances
MIT-BIH Supraventricular Arrhythmia Database (svdb) | 78 (each ½ hour long) | 801 | ECG1 | 268 (Premature Ventricular Contractions)
St. Petersburg Institute of Cardiological Technics 12-lead Arrhythmia Database (incartdb) | 75 (each ½ hour long) | I70 | Signal II | 126 (Atrial Premature Beat)
Sudden Cardiac Death Holter Database (sddb) | 23 | 52 (7 ½ hour long)* | ECG1 | 216 (R-on-T Premature Ventricular Contraction)
* We worked with ~1 hour long data.

Interpreting the plots
[Figure: four panels of BitCount vs. number of instances encoded, each with a marked stopping point: "Ideal"; "Bad, adds false positives"; "Bad, misses true positives"; "Really bad".]

Experimental Results
[Figure: BitCount (scale ×10^5) vs. number of instances encoded for svdb, incartdb, and sddb, each with its stopping point marked.]

Experimental Results (Contd.)
[Figure: BitCount (scale ×10^5) vs. number of instances encoded for Fish_test, Swedish_leaf, and FaceAll_test, each with its stopping point marked.]

Comparison with the state-of-the-art algorithm
[Figure: Fish_test. Top panel: minimal distance vs. number of instances classified, showing too-early stopping (Wei et al.'s method*). Bottom panel: BitCount (scale ×10^5) vs. number of instances encoded, with the stopping point marked.]
* L. Wei et al., Semi-Supervised Time Series Classification (2006)

Conclusions
- A novel way of doing semi-supervised classification with only one labeled instance.
- Previous approaches to stopping semi-supervised classification required extensive parameter tuning and remained something of a black art.
- A stopping criterion for semi-supervised classification based on MDL.
- To our knowledge, our stopping criterion is the first parameter-free criterion that both mitigates the early stopping problem and leverages the inherent structure of the data.

Thank you!
If you have any questions, please contact me:
Name: Nurjahan Begum
Email: nbegu001@ucr.edu
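
Appendix: reproducing the BitCount example
The BitCount arithmetic shown on the "Our Approach" slides can be checked with a short sketch. It is an illustration under stated assumptions, not the authors' implementation. The function names `bit_count` and `stopping_point` are hypothetical. The code reads the coefficients in the slide formulas as (a) the cumulative number of points that differ from the hypothesis H (0, 6, 22, 37, 115, 192) and (b) the number of instances not yet encoded (6 down to 1), and it takes the stopping point to be the last iteration before the BitCount first rises, which is one reading of "Until BitCount increases".

```python
import math


def bit_count(mismatches, remaining, length=100, cardinality=16):
    """Bits for the hypothesis, the mismatch corrections, and the raw instances.

    Assumption: mirrors the slide formula
    length*log2(card) + mismatches*(ceil(log2(length)) + log2(card))
    + remaining*length*log2(card).
    """
    value_bits = math.log2(cardinality)           # bits per discretized value
    position_bits = math.ceil(math.log2(length))  # bits to index one position
    hypothesis = length * value_bits
    corrections = mismatches * (position_bits + value_bits)
    unencoded = remaining * length * value_bits
    return int(hypothesis + corrections + unencoded)


def stopping_point(bit_counts):
    """Index of the last iteration before the bit count first increases."""
    for i in range(1, len(bit_counts)):
        if bit_counts[i] > bit_counts[i - 1]:
            return i - 1
    return len(bit_counts) - 1  # never increased: keep everything


# (mismatched points, instances not yet encoded) for iterations 0..5,
# copied from the coefficients in the slide formulas.
steps = [(0, 6), (6, 5), (22, 4), (37, 3), (115, 2), (192, 1)]
counts = [bit_count(m, r) for m, r in steps]

print(counts)                  # [2800, 2466, 2242, 2007, 2465, 2912]
print(stopping_point(counts))  # 3
```

Under these assumptions the BitCount falls until iteration 3 and rises at iteration 4, so encoding further instances no longer compresses the data.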