Fair Use Agreement

This agreement covers the use of this presentation; please read it carefully.
• You may freely use these slides for teaching, if:
  • You send me an email telling me the class number / university in advance.
  • My name and email address appear on the first slide (if you are using all or most of the slides), or on each slide (if you are just taking a few slides).
• You may freely use these slides for a conference presentation, if:
  • You send me an email telling me the conference name in advance.
  • My name appears on each slide you use.
• You may not use these slides for tutorials, or in a published work (tech report / conference paper / thesis / journal etc.). If you wish to do this, email me first; it is highly likely I will grant you permission.
Please get in contact with Prof. Eamonn Keogh, eamonn@cs.ucr.edu

(C) {Ken Ueno, Eamonn Keogh, Xiaopeng Xi}, University of California, Riverside. Draft ver. 12/12/2006

Anytime Classification Using the Nearest Neighbor Algorithm with Applications to Stream Mining

Ken Ueno, Toshiba Corporation, Japan (Visiting PostDoc Researcher at UC Riverside)
Xiaopeng Xi, University of California, Riverside, U.S.A.
Eamonn Keogh, University of California, Riverside, U.S.A.
Dah-Jye Lee, Brigham Young University, U.S.A.

Outline of the Talk

1. Motivation & Background: usefulness of the anytime nearest neighbor classifier for real-world applications, including fish shape recognition.
2. Anytime Nearest Neighbor Classifier (ANNC).
3. SimpleRank, the critical ordering method for ANNC: how can we convert a conventional nearest neighbor classifier into an anytime version, and what is the critical intuition?
4. Empirical Evaluations.
5. Conclusion.

Case Study: Fish Recognition, an Application for a Video Monitoring System

• Preliminary experiments with Rotation-Robust DTW [Keogh 05].
[Figure: classification accuracy (%) vs. number of instances seen before interruption, S, for Random and SimpleRank orderings; example fish contours captured 2.0 sec and 27.0 sec apart.]
• Time intervals tend to vary among fish appearances.
• Anytime classifiers are plausible for streaming shape recognition.

Real World Problems for Data Mining

"When will it be finished?" Challenges for data mining in real-world applications:
• Accuracy / speed trade-off.
• Limited memory space.
• Real-time processing.
• Is a best-so-far answer available anytime?
Example domains: multimedia intelligence, medical diagnosis, motion search, fish migration, biological shape recognition.

Anytime Algorithms

• Trade execution time for quality of results.
• Always have a best-so-far answer available.
• The quality of the answer improves with execution time.
• Allow users to suspend the process during execution and keep going if needed: 1. suspend, 2. peek at the results, 3. continue if you want.
[Figure: quality of solution vs. time; after a short setup time, the current solution is available at any interruption time S.]

Anytime Characteristics [Zilberstein and Russell 95]

• Interruptability: after some small amount of setup time, the algorithm can be stopped at any time and provide an answer.
• Monotonicity: the quality of the result is a non-decreasing function of computation time.
• Diminishing returns: the improvement in solution quality is largest at the early stages of computation, and diminishes over time.
• Measurable quality: the quality of an approximate result can be determined.
• Preemptability: the algorithm can be suspended and resumed with minimal overhead.

Bumble Bee's Anytime Strategy

"Bumblebees can choose wisely or rapidly, but not both at once." Lars Chittka, Adrian G. Dyer, Fiola Bock, Anna Dornhaus, Nature Vol. 424, 24 Jul 2003, p. 388.
"To survive, I can perform the best judgment for finding real nectar, like 'anytime learning'!"
Big question: how can we make classifiers wiser / more rapid, like bees?

Nearest Neighbor Classifiers = Anytime Algorithm + Lazy Learning

Reasons:
• To the best of our knowledge there is no "Anytime Nearest Neighbor Classifier" so far.
• Nearest neighbor classifiers are inherently built on similarity measures.
• They easily handle time series data by using DTW.
• They are robust and accurate.

Nearest Neighbor Classifiers

• An instance-based, lazy classification algorithm built on training exemplars.
• It assigns an unknown instance the class label of the closest training exemplar(s) under a chosen distance measure.
• For k-Nearest Neighbor (k-NN), the answer is given by voting:

  \hat{c}(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i)), \qquad
  \delta(a, b) = \begin{cases} 1 & \text{if } a = b \\ 0 & \text{otherwise} \end{cases}

where x_q is a query instance, x_1, ..., x_k are its k nearest training instances, \hat{c}(x_q) is the estimated class of x_q, V is the set of class labels, and k is the number of nearest neighbors.
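To make the voting rule above concrete, here is a minimal k-NN sketch in Python. It is not from the original slides; the Euclidean distance, the function name, and the toy data are illustrative assumptions.

import numpy as np

def knn_classify(train_X, train_y, query, k=1):
    # Minimal k-NN voting rule: return the majority class among
    # the k training instances closest to the query.
    dists = np.linalg.norm(train_X - query, axis=1)   # Euclidean distances to the query
    nearest = np.argsort(dists)[:k]                   # indices of the k nearest neighbors
    # The delta(v, f(x_i)) vote: count class labels among the k neighbors
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy usage (illustrative data, not from the paper)
train_X = np.array([[0.0, 0.0], [0.1, 0.2], [2.0, 2.1], [1.9, 2.0]])
train_y = np.array(["A", "A", "B", "B"])
print(knn_classify(train_X, train_y, np.array([0.05, 0.1]), k=3))  # -> "A"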
How can we convert it into an anytime algorithm?

Designing the Anytime Nearest Neighbor

The classifier processes the training instances in the order given by Index; after a constant-time initial step (one instance from each class), it can be interrupted at any time. The design is plug-in: any ordering method can supply Index.

Function [best_match_class] = Anytime_Classifier(Database, Index, O)
  % Initial step (constant time): examine one instance from each class
  best_match_val = inf;
  best_match_class = undefined;
  For p = 1 to number_of_classes(Database)
    D = distance(Database.object(Index_p), O);
    If D < best_match_val
      best_match_val = D;
      best_match_class = Database.class_label(Index_p);
    End
  End
  Disp('The algorithm can now be interrupted');
  % Interruptible step: examine the remaining instances in Index order
  p = number_of_classes(Database) + 1;
  user_has_not_interrupted = true;
  While (user_has_not_interrupted AND p < max(Index))
    D = distance(Database.object(Index_p), O);
    If D < best_match_val
      best_match_val = D;
      best_match_class = Database.class_label(Index_p);
    End
    p = p + 1;
    user_has_not_interrupted = test_for_user_interrupt;
  End

Tentative Solutions for a Good Ordering

• Ordering the training data is critical: instances that are critical for the classification result should come first (best first), or equivalently, non-critical instances should come last (worst last).
• Numerosity reduction can partially provide good orderings; its instance-selection problem is very similar to the ordering problem for anytime algorithms. We score instances by leave-one-out (k = 1) within the training data.
• Key point: numerosity reduction requires S to be decided before classification, whereas anytime preprocessing does not; the interruption time S moves from static to dynamic.

JF: a Two-Class Classification Problem (2-D Gaussian Ball)

The data are sampled from a 2-D Gaussian ball with density

  f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - m)^2}{2\sigma^2}},

and a point is labeled Class A if (x - mean(x))^2 + (y - mean(y))^2 \le r^2, and Class B otherwise.

[Figure: the JF dataset; Class A falls inside the circle of radius r, Class B outside.]

• JF is hard to classify correctly because of the round class boundary: we need a non-linear and fast-enough classifier.

We Cannot Use Dynamic Programming (DP) for the JF Problem

• DP builds ans(n) from ans(n-1) and is therefore only locally optimal.
• The Voronoi tessellations depend heavily on the entire feature space.
• An ideal ordering captures the entire classification boundary in the early stages.
[Figure: tessellations at stages I, II, III compared with the ideal tessellation.]

Scoring Strategies (similar to Numerosity Reduction)

• Random Ranking (baseline).
• DROP algorithms [Wilson and Martinez 00]: weighting based on enemies / associates for the nearest neighbor; DROP1, DROP2, DROP3, with acc(BestDrop) = max(acc(DROP1), acc(DROP2), acc(DROP3)).
• NaïveRank algorithm [Xi and Keogh 06]: sorting based on leave-one-out with 1-Nearest Neighbor.

SimpleRank

Anytime framework + SimpleRank, an ordering based on the NaïveRank algorithm [Xi and Keogh 06]: sorting by leave-one-out with the 1-Nearest Neighbor classifier (a minimal code sketch follows below).

  rank(x) = \sum_{j} \begin{cases} 1 & \text{if } class(x) = class(x_j) \\ -2 / (num\_of\_classes - 1) & \text{otherwise} \end{cases}

where the sum runs over the training instances x_j whose leave-one-out nearest neighbor is x.

1. Rank the training instances by this measure.
2. Sort them in reverse order, so that the most important instances are seen first.

• Observation 1: a close instance with a different class label is penalized.
• Observation 2: the penalty weight is adjusted with regard to the number of classes.
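A minimal Python sketch of the SimpleRank-style ordering described above. It is not the authors' implementation; the brute-force leave-one-out nearest-neighbor search, the Euclidean distance, and the toy data are illustrative assumptions.

import numpy as np

def simple_rank_order(X, y):
    # Each instance earns +1 from every instance it would classify correctly
    # as a leave-one-out nearest neighbor, and a penalty of
    # -2 / (num_classes - 1) from every instance it would misclassify.
    # Returns indices sorted best-first (highest score first).
    n = len(X)
    num_classes = len(np.unique(y))
    penalty = 2.0 / max(num_classes - 1, 1)
    score = np.zeros(n)
    for j in range(n):
        # Leave-one-out: nearest neighbor of x_j among all other instances
        dists = np.linalg.norm(X - X[j], axis=1)
        dists[j] = np.inf
        nn = int(np.argmin(dists))
        if y[nn] == y[j]:
            score[nn] += 1.0
        else:
            score[nn] -= penalty
    return np.argsort(-score)

# Toy usage on a JF-like 2-D set (illustrative, not the actual JF data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 <= 1.0).astype(int)  # circular class boundary
order = simple_rank_order(X, y)
print(order[:10])  # the ten instances an anytime 1-NN would examine first

The resulting order would play the role of the Index argument in the Anytime_Classifier pseudocode above.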
How SimpleRank Works

[Movie (T = 1 ... 50), snapshot at T = 10: the ranking process on the JF dataset by SimpleRank vs. Random Rank (baseline), shown as Voronoi tessellations; the wrong-class-estimation areas are highlighted.]

Empirical Evaluations

Fair evaluations based on diverse kinds of datasets:

Name              | # classes | # features | # instances | Evaluation     | Data type
JF                |     2     |      2     |     20,000  | 2,000/18,000   | Real (synthetic)
Australian Credit |     2     |     14     |        690  | 10-fold CV     | Mixed
Letter            |    26     |     16     |     20,000  | 5,000/15,000   | Real
Pen Digits        |    10     |     16     |     10,992  | 7,494/3,498    | Real
Forest Cover Type |     7     |     54     |    581,012  | 11,340/569,672 | Mixed
Ionosphere        |     2     |     34     |        351  | 10-fold CV     | Real
Voting Records    |     2     |     16     |        435  | 10-fold CV     | Boolean
Two Patterns      |     4     |    128     |      5,000  | 1,000/4,000    | Time series
Leaf              |     6     |    150     |        442  | 10-fold CV     | Time series
Face              |    16     |    131     |      2,231  | 1,113/1,118    | Time series

(The Evaluation column gives the train/test split sizes, or 10-fold cross validation.)
All of the datasets are public and available to everyone: the UCI ICS Machine Learning Data Archive, the UCI KDD Data Archive, and the UCR Time Series Data Mining Archive.

K=1: Voting Records (10-fold Cross Validation, Euclidean)

[Figure: accuracy (%) vs. number of instances seen before interruption, S, for the SimpleRank, BestDrop, and RandomRank orderings.]

K=1: Forest Cover Type

[Figure: accuracy (%) vs. number of instances seen before interruption for SimpleRank (k=1) and Random Rank (k=1).]

K=1, 3, 5: Australian Credit (10-fold CV, Euclidean)

[Figure: accuracy (%) vs. number of instances seen before interruption for K = 1, 3, 5 on the Australian Credit dataset.]

Preliminary Results in Our Experiments

K=1: Two Patterns (time series data).

Future Research Directions

• Make the ordering + sorting much faster: O(n log n) for the sorting + α.
• Handling concept drift.
• Showing confidence.

Conclusion and Summary

Our contributions:
• A new framework for the Anytime Nearest Neighbor classifier.
• SimpleRank: a quite simple but critically good ordering.
• So far our method has achieved the highest accuracy on diverse datasets.
• We demonstrated its usefulness for shape recognition in stream video mining.
"Good job! This is the best-so-far ordering method for the anytime Nearest Neighbor!"

Acknowledgments

Dr. Agenor Mafra-Neto, ISCA Technologies, Inc.
Dr. Geoffrey Webb, Monash University
Dr. Ying Yang, Monash University
Dr. Dennis Shiozawa, BYU
Dr. Xiaoqian Xu, BYU
Dr. Pengcheng Zhan, BYU
Dr. Robert Schoenberger, Agris-Schoen Vision Systems, Inc.
Jill Brady, UCR
NSF grant IIS-0237918
Many thanks!!

Fin

Thank you for your attention. Any questions?