Powerpoint - University of California, Riverside

Fair Use Agreement
This agreement covers the use of this presentation; please read it carefully.
• You may freely use these slides for teaching, if:
  • You send me an email telling me the class number/university in advance.
  • My name and email address appear on the first slide (if you are using all or most of the slides), or on each slide (if you are just taking a few slides).
• You may freely use these slides for a conference presentation, if:
  • You send me an email telling me the conference name in advance.
  • My name appears on each slide you use.
• You may not use these slides for tutorials, or in a published work (tech report/conference paper/thesis/journal etc.). If you wish to do this, email me first; it is highly likely I will grant you permission.
Please get in contact with Prof. Eamonn Keogh, eamonn@cs.ucr.edu
(C) {Ken Ueno, Eamonn Keogh, Xiaopeng Xi}, University of California, Riverside
Draft ver. 12/12/2006
Anytime Classification
Using the Nearest Neighbor Algorithm
with Applications to Stream Mining
Ken Ueno
Toshiba Corporation, Japan
( Visiting PostDoc Researcher at UC Riverside )
Xiaopeng Xi
Eamonn Keogh
University of California, Riverside, U.S.A.
Dah-Jye Lee
Brigham Young University, U.S.A.
Outline of the Talk
1. Motivation & Background
Usefulness of the anytime nearest neighbor classifier
for real world applications including fish shape recognition.
2. Anytime Nearest Neighbor Classifier (ANNC)
3. SimpleRank, the critical ordering method for ANNC
How can we convert conventional nearest neighbor classifier
into the anytime version? What’s the critical intuition?
4. Empirical Evaluations
5. Conclusion
Case Study: Fish Recognition
- Application for Video Monitoring System -
Preliminary experiments with Rotation-Robust DTW [Keogh 05]
[Figure: accuracy (%) versus number of instances seen before interruption, S (0 to 3000); curves: Random Test, SimpleRank Test]
[Figure: fish appearances in the video stream; labeled intervals: 2.0 sec, 27.0 sec]
Time intervals tend to vary among fish appearances.
Anytime Classifiers: Plausible for Streaming Shape Recognition
Real World Problems for Data Mining
Challenges for Data Mining in Real World Applications:
• Accuracy / Speed Trade-Off
• Limited memory space
• Real time processing
When will it be finished? Is a best-so-far answer available anytime?
Applications: Multimedia Intelligence, Medical Diagnosis, Motion Search, Fish Migration, Biological Shape Recognition
Anytime Algorithms
• Trading execution time for quality of results.
• Always has a best-so-far answer available.
• Quality of the answer improves with execution time.
• Allowing users to suspend the process during execution, and keep going if needed.
[Figure: quality of solution versus time; after the setup time the user can (1) suspend at time S, (2) peek at the current solution, and (3) continue if desired]
Anytime Characteristics
• Interruptability
  After some small amount of setup time, the algorithm can be stopped at any time and provide an answer.
• Monotonicity
  The quality of the result is a non-decreasing function of computation time.
• Diminishing returns
  The improvement in solution quality is largest at the early stages of computation, and diminishes over time.
• Measurable Quality
  The quality of an approximate result can be determined.
• Preemptability
  The algorithm can be suspended and resumed with minimal overhead.
[Zilberstein and Russell 95]
[Figure: quality of solution versus time, showing the setup time and the current solution at interruption time S]
Bumble Bee’s Anytime Strategy
“Bumblebees can choose wisely or rapidly, but not both at once.”
Lars Chittka, Adrian G. Dyer, Fiola Bock, Anna Dornhaus, Nature Vol.424, 24 Jul 2003, p.388
To survive, I can perform the best judgment for finding real nectar, like “anytime learning”!
Big Question:
How can we make classifiers
wiser / more rapid
like bees?
Nearest Neighbor Classifiers
Anytime Algorithm + Lazy Learning
[Reasons]
• To the best of our knowledge there is no “Anytime Nearest Neighbor Classifier” so far.
• Works naturally with similarity measures.
• Easily handles time series data by using DTW.
• Robust & accurate
Nearest Neighbor Classifiers
• An instance-based, lazy classification algorithm based on training exemplars.
• Gives an unknown instance the class label of the closest training exemplar, based on a certain distance measure.
• For k-Nearest Neighbor (k-NN), the answer is given by voting:

$$\hat{c}(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i)), \qquad \delta(a, b) = \begin{cases} 1 & \text{if } a = b \\ 0 & \text{otherwise} \end{cases}$$

$x_q$ : a query instance
$x_1 \ldots x_i \ldots x_k$ : the k instances nearest to $x_q$
$f(x_i)$ : the class label of training instance $x_i$
$\hat{c}(x_q)$ : estimated class of $x_q$
$V$ : a set of class labels
$k$ : # of nearest neighbors

How can we convert it into an anytime algorithm?
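The voting rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `knn_classify`, the choice of Euclidean distance, and the toy data in the usage note are assumptions made for the example.

```python
import math
from collections import Counter


def knn_classify(train, labels, query, k=1):
    """Return the majority class label among the k training points nearest to query."""
    # Distance from the query to every training exemplar (Euclidean is an assumption).
    dists = [math.dist(x, query) for x in train]
    # Indices of the k closest exemplars.
    nearest = sorted(range(len(train)), key=dists.__getitem__)[:k]
    # For each label v, count sum_i delta(v, f(x_i)); the argmax is the answer.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

For example, with `train = [(0, 0), (0, 1), (5, 5), (6, 5), (5, 6)]` and `labels = ["A", "A", "B", "B", "B"]`, the query `(0.2, 0.2)` is labeled `"A"` for k=1, but `"B"` for k=5, because all five exemplars then vote.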
Designing the Anytime Nearest Neighbor

Function [best_match_class] = Anytime_Classifier(Database, Index, O)
  % Initial step (constant time): compare O with the first
  % number_of_classes(Database) objects, taken in Index order.
  best_match_val   = inf;
  best_match_class = undefined;
  For p = 1 to number_of_classes(Database)
    D = distance(Database.object(Index(p)), O);
    If D < best_match_val
      best_match_val   = D;
      best_match_class = Database.class_label(Index(p));
    End
  End
  Disp('The algorithm can now be interrupted');
  % Interruptible step: keep scanning in Index order until interrupted
  % or until every object has been seen.
  p = number_of_classes(Database) + 1;
  user_has_not_interrupted = true;
  While (user_has_not_interrupted AND p <= length(Index))
    D = distance(Database.object(Index(p)), O);
    If D < best_match_val
      best_match_val   = D;
      best_match_class = Database.class_label(Index(p));
    End
    p = p + 1;
    user_has_not_interrupted = test_for_user_interrupt;
  End

Plug-in design for any ordering method
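A minimal Python sketch of the two-step procedure above, under stated assumptions: `should_stop` is a hypothetical callback standing in for `test_for_user_interrupt`, Euclidean distance stands in for the generic `distance` function, and `index` is any precomputed ordering of the training data (this is where an ordering method plugs in). The function also returns how many instances were seen, which is not part of the slide's pseudocode.

```python
import math


def anytime_classify(objects, labels, index, query, num_classes, should_stop):
    """Anytime 1-NN: scan training objects in `index` order until interrupted.

    Returns (best_match_class, number_of_instances_seen).
    """
    best_val, best_class = math.inf, None

    def visit(p):
        # Compare the query with the p-th object in the given ordering.
        nonlocal best_val, best_class
        i = index[p]
        d = math.dist(objects[i], query)
        if d < best_val:
            best_val, best_class = d, labels[i]

    # Initial step: not interruptible, looks at the first num_classes objects.
    setup = min(num_classes, len(index))
    for p in range(setup):
        visit(p)

    # Interruptible step: a best-so-far answer exists from here on.
    p = setup
    while p < len(index) and not should_stop(p):
        visit(p)
        p += 1
    return best_class, p
```

Passing `should_stop=lambda seen: seen >= S` simulates an interruption after S instances, which is convenient for plotting accuracy against S.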
Tentative Solution for good ordering
• Ordering the training data is critical.
• Which points are critical for the classification results: best first, or worst last?
  → Put non-critical points last.
• Numerosity reduction can partially serve as a good ordering solution. Its problem is very similar to the ordering problem for anytime algorithms.
• Leave-one-out (k=1) within the training data.
Numerosity Reduction: S must be decidable before classification
Anytime Preprocessing: S does not need to be decidable before classification
Key point: Static → Dynamic in terms of the interruption time S
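JF: two-class classification problem
• 2-D Gaussian ball

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - m)^2}{2\sigma^2}}$$

$$\text{class}(x, y) = \begin{cases} \text{Class A} & \text{if } (x - \mathrm{mean}(x))^2 + (y - \mathrm{mean}(y))^2 \le r^2 \\ \text{Class B} & \text{otherwise} \end{cases}$$

[Figure: scatter plot of the JF dataset on [-2, 2] × [-2, 2], with regions labeled Class A and Class B]
• Hard to classify correctly because of the round shape.
• We need a non-linear and fast-enough classifier.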
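For experimentation, a JF-like dataset can be generated as sketched below. This is only one reading of the slide, with several assumptions: both coordinates are drawn independently from a Gaussian with mean `m` and standard deviation `sigma`, points within radius `r` of the sample mean are labeled Class A, and the function name, default parameters, and seed are illustrative rather than taken from the original dataset.

```python
import random


def make_jf_like(n, r=1.0, m=0.0, sigma=1.0, seed=0):
    """Generate n 2-D Gaussian points labeled by a circular decision boundary."""
    rng = random.Random(seed)
    # Assumption: x and y are independent Gaussian draws.
    pts = [(rng.gauss(m, sigma), rng.gauss(m, sigma)) for _ in range(n)]
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    # Assumption: inside (or on) the circle of radius r is Class A, outside is Class B.
    labels = [
        "A" if (x - mean_x) ** 2 + (y - mean_y) ** 2 <= r ** 2 else "B"
        for x, y in pts
    ]
    return pts, labels
```

Because the boundary is a circle, no single straight line separates the two classes, which is why a non-linear classifier such as nearest neighbor is a natural fit.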
We cannot use DP for the JF problem
Dynamic Programming (DP): ans(n-1) → ans(n)
DP is locally optimal.
[Figure: panels I, II, III]
Ideal tessellations heavily depend on the entire feature space.
Captures the entire classification boundaries in the early stage.
Numerosity reduction
Scoring strategy: similar to Numerosity Reduction
• Random Ranking (baseline)
• DROP Algorithms [Wilson and Martinez 00]
  Weighting based on enemies / associates for Nearest Neighbor
  DROP1, DROP2, DROP3
  acc(BestDrop) = max(acc(DROP1), acc(DROP2), acc(DROP3))
• NaïveRank Algorithms
  Sorting based on leave-one-out with 1-Nearest Neighbor
SimpleRank Ordering
based on NaïveRank Algorithm [Xi and Keogh 06]
NaïveRank: sorting by leave-one-out with 1-Nearest Neighbor
Anytime Framework + SimpleRank:

$$\mathrm{rank}(x) = \sum_{j} \begin{cases} 1 & \text{if } \mathrm{class}(x) = \mathrm{class}(x_j) \\ -2 / (\mathrm{num\_of\_class} - 1) & \text{otherwise} \end{cases}$$

1. Order training instances by the unimportance measure.
2. Sort them in reverse order.
Observation 1
Penalize a close instance with a different class label.
Observation 2
Adjust the penalty weights with regard to the number of classes.
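The SimpleRank score can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes that the sum runs over the training instances x_j whose leave-one-out nearest neighbor is x, it uses Euclidean distance, it ignores tie-breaking between equal ranks, and the function name `simple_rank_order` is made up for the example. It needs at least two instances and two classes.

```python
import math


def simple_rank_order(points, labels):
    """Return (order, rank): indices sorted by decreasing SimpleRank-style score."""
    n = len(points)
    num_classes = len(set(labels))
    # Penalty for being the nearest neighbor of an instance from another class.
    penalty = -2.0 / (num_classes - 1)
    rank = [0.0] * n
    for j in range(n):
        # Leave-one-out 1-NN: nearest neighbor of x_j among all other instances.
        nn = min(
            (i for i in range(n) if i != j),
            key=lambda i: math.dist(points[i], points[j]),
        )
        rank[nn] += 1.0 if labels[nn] == labels[j] else penalty
    # Highest score first, so the most useful instances are seen before interruption.
    order = sorted(range(n), key=lambda i: rank[i], reverse=True)
    return order, rank
```

The returned `order` can be used directly as the index of an anytime nearest neighbor classifier. The leave-one-out search here is quadratic in the number of training instances, so it is meant for small examples only.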
How SimpleRank works.
Ranking process on JF Dataset by SimpleRank
Voronoi Tessellation on JF Dataset
[Movie: T = 1 … 50, frame shown at T=10; panels compare SimpleRank with Random Rank (baseline) and mark the wrong class estimation area]
Empirical Evaluations
Fair evaluations based on diverse kinds of datasets

Name               # classes  # features  # instances  Evaluation (train/test or CV)  Data Type
JF                 2          2           20,000       2,000/18,000                   Real (synthetic)
Australian Credit  2          14          690          10-fold CV                     Mixed
Letter             26         16          20,000       5,000/15,000                   Real
Pen Digits         10         16          10,992       7,494/3,498                    Real
Forest Cover Type  7          54          581,012      11,340/569,672                 Mixed
Ionosphere         2          34          351          10-fold CV                     Real
Voting Records     2          16          435          10-fold CV                     Boolean
Two Patterns       4          128         5,000        1,000/4,000                    time series
Leaf               6          150         442          10-fold CV                     time series
Face               16         131         2,231        1,113/1,118                    time series

All of the datasets are public and available for everyone!
• UCI ICS Machine Learning Data Archive
• UCI KDD Data Archive
• UCR Time Series Data Mining Archive
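The evaluation plots report accuracy as a function of S, the number of instances seen before interruption. A minimal sketch of how such a curve could be computed for 1-NN is given below; the function name `accuracy_curve`, the `checkpoints` parameter, and the use of Euclidean distance are assumptions for illustration, and `order` is any ranking of the training data (random, SimpleRank, and so on).

```python
import math


def accuracy_curve(train, train_labels, order, test, test_labels, checkpoints):
    """Map each S in checkpoints to 1-NN accuracy (%) using only the first S ranked instances."""
    curve = {}
    for s in checkpoints:
        seen = order[:s]  # instances examined before an interruption at S
        correct = 0
        for q, truth in zip(test, test_labels):
            nn = min(seen, key=lambda i: math.dist(train[i], q))
            correct += train_labels[nn] == truth
        curve[s] = 100.0 * correct / len(test)
    return curve
```

Comparing the curves of two orderings on the same train/test split shows which ordering reaches high accuracy with fewer instances seen.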
K=1: Voting Records
10-fold Cross Validation, Euclidean
[Figure: accuracy (%) versus number of instances seen before interruption, S (0 to 350); curves: Random Test (RandomRank), SimpleRank Test, BestDrop Test]
K=1: Forest Cover Type
[Figure: accuracy (%) versus # of instances seen before interruption (0 to 12000); curves: SimpleRank, k=1 and Random Rank, k=1]
K=1,3,5: Australian Credit
10-CV, Euclidean
[Figure: accuracy (%) versus # of instances seen before interruption (0 to 600) on the Australian Credit dataset; curves: K=1, K=3, K=5]
Preliminary Results in our experiments
K=1 Two Patterns
- Time Series Data -
Future Research Directions
• Make ordering + sorting much faster: O(n log n) for sorting + α
• Handling Concept Drift
• Showing Confidence
Conclusion and Summary
• Our Contributions:
  - New framework for Anytime Nearest Neighbor.
  - SimpleRank: Quite simple but critically good ordering.
• So far our method has achieved the highest accuracy among the ordering methods we compared, across diverse datasets.
• Demonstrates the usefulness for shape recognition in Stream Video Mining.
Good Job!
This is the best-so-far ordering method for the anytime Nearest Neighbor!
Acknowledgments
Dr. Agenor Mafra-Neto, ISCA Technologies, Inc
Dr. Geoffrey Webb, Monash University
Dr. Ying Yang, Monash University
Dr. Dennis Shiozawa, BYU
Dr. Xiaoqian Xua, BYU
Dr. Pengcheng Zhana, BYU
Dr. Robert Schoenberger, Agris-Schoen Vision Systems, Inc
Jill Brady, UCR
NSF grant IIS-0237918
Many Thanks!!
Fin
Thank you for your attention.
Any questions?