Machine Learning Performance Evaluation

Performance Evaluation: Estimation of Recognition Rates
J.-S. Roger Jang ( 張智星 )
CSIE Dept., National Taiwan Univ.
http://mirlab.org/jang
jang@mirlab.org
Outline
Performance indices of a given classifier/model
• Accuracy (recognition rate)
• Computation load
Methods to estimate the recognition rate
• Inside test
• One-sided holdout test
• Two-sided holdout test
• M-fold cross validation
• Leave-one-out cross validation
Synonyms
The following sets of synonyms will be used interchangeably:
• Classifier, model
• Recognition rate, accuracy
Performance Indices
Performance indices of a classifier
• Recognition rate
- Requires an objective procedure to derive it
• Computation load
- Design-time computation
- Run-time computation
Our focus
• Recognition rate and the procedures to derive it
• The estimated accuracy depends on
- Dataset
- Model (types and complexity)
Methods for Deriving Recognition Rates
Methods to derive the recognition rates
• Inside test (resubstitution recognition rate)
• One-sided holdout test
• Two-sided holdout test
• M-fold cross validation
• Leave-one-out cross validation
Data partitioning
• Training set
• Training and test sets
• Training, validating, and test sets
Inside Test
Dataset partitioning
• Use the whole dataset for training & evaluation
Recognition rate
• Inside-test recognition rate
• Resubstitution accuracy
Training dataset = whole set $D = \{(x_i^D, y_i^D)\}$
$F_D(\cdot)$: model identified by the training data

$\mathrm{RR} = 1 - \frac{1}{|D|} \sum_i \operatorname{sgn}\left| y_i^D - F_D(x_i^D) \right|$
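The resubstitution formula above can be sketched in Python; this is a minimal illustration, assuming a toy 1-NN classifier and made-up data (none of the names come from the slides).

```python
# Hypothetical sketch: inside-test (resubstitution) recognition rate,
# using a minimal 1-nearest-neighbor classifier on the whole set D.

def nn1_predict(train_x, train_y, x):
    """Classify x by the label of its nearest training point."""
    dists = [sum((a - b) ** 2 for a, b in zip(tx, x)) for tx in train_x]
    return train_y[dists.index(min(dists))]

def inside_test_rr(xs, ys):
    """RR = 1 - (1/|D|) * sum_i sgn|y_i - F_D(x_i)|, with D = the whole set."""
    errors = sum(1 for x, y in zip(xs, ys) if nn1_predict(xs, ys, x) != y)
    return 1 - errors / len(xs)

xs = [(0.0, 0.0), (1.0, 1.0), (0.1, 0.2), (0.9, 0.8)]  # illustrative data
ys = [0, 1, 0, 1]
print(inside_test_rr(xs, ys))  # 1.0: each point is its own nearest neighbor
```

Since every training point is its own nearest neighbor, this reproduces the slide's remark that 1-NNC always attains a resubstitution RR of 100%.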
Inside Test (2)
Characteristics
• Too optimistic, since the inside-test RR tends to be higher than the true RR
• For instance, 1-NNC always has an inside-test RR of 100%!
• Can be used as an upper bound of the true RR
Potential reasons for low inside-test RR:
• Bad features of the dataset
• Bad method for model construction, such as
- Bad results from neural network training
- Bad results from k-means clustering
One-sided Holdout Test
Dataset partitioning
• Training set for model construction
• Test set for performance evaluation
Recognition rate
• Inside-test RR
• Outside-test RR
Training set $A = \{(x_i^A, y_i^A)\}$, test set $B = \{(x_i^B, y_i^B)\}$
$F_A(\cdot)$: model identified using set A

$\text{Outside-test RR} = 1 - \frac{1}{|B|} \sum_i \operatorname{sgn}\left| y_i^B - F_A(x_i^B) \right|$
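The one-sided holdout procedure can be sketched as follows; the 1-NN classifier and the toy sets A and B are illustrative assumptions, not from the slides.

```python
# Hypothetical sketch of the one-sided holdout test: construct the model
# on training set A only, then report the outside-test RR on disjoint set B.

def nn1_predict(train_x, train_y, x):
    """Classify x by the label of its nearest training point."""
    dists = [sum((a - b) ** 2 for a, b in zip(tx, x)) for tx in train_x]
    return train_y[dists.index(min(dists))]

def outside_test_rr(train_x, train_y, test_x, test_y):
    """Outside-test RR = 1 - (1/|B|) * sum_i sgn|y_i^B - F_A(x_i^B)|."""
    errors = sum(1 for x, y in zip(test_x, test_y)
                 if nn1_predict(train_x, train_y, x) != y)
    return 1 - errors / len(test_x)

A_x, A_y = [(0.0, 0.0), (1.0, 1.0)], [0, 1]  # illustrative training set A
B_x, B_y = [(0.1, 0.1), (0.9, 0.9)], [0, 1]  # illustrative test set B
print(outside_test_rr(A_x, A_y, B_x, B_y))  # 1.0 on this separable toy data
```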
One-sided Holdout Test (2)
Characteristics
• Highly affected by data partitioning
• Usually adopted when the design-time computation load is high
Two-sided Holdout Test
Dataset partitioning
• Training set for model construction
• Test set for performance evaluation
• Role reversal
Data set $A = \{(x_i^A, y_i^A)\}$, data set $B = \{(x_i^B, y_i^B)\}$
$F_A(\cdot)$: model identified using data set A
$F_B(\cdot)$: model identified using data set B

$\text{Outside-test RR} = 1 - \frac{1}{|A| + |B|} \left[ \sum_i \operatorname{sgn}\left| y_i^B - F_A(x_i^B) \right| + \sum_i \operatorname{sgn}\left| y_i^A - F_B(x_i^A) \right| \right]$

$\text{Inside-test RR} = 1 - \frac{1}{|A| + |B|} \left[ \sum_i \operatorname{sgn}\left| y_i^A - F_A(x_i^A) \right| + \sum_i \operatorname{sgn}\left| y_i^B - F_B(x_i^B) \right| \right]$
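The role-reversal idea can be sketched as below; the 1-NN classifier and toy data are illustrative assumptions, not from the slides.

```python
# Hypothetical sketch of the two-sided holdout test: each of the disjoint
# sets A and B is used once for construction and once for evaluation,
# and the errors are pooled over |A| + |B| test cases.

def nn1_predict(train_x, train_y, x):
    """Classify x by the label of its nearest training point."""
    dists = [sum((a - b) ** 2 for a, b in zip(tx, x)) for tx in train_x]
    return train_y[dists.index(min(dists))]

def two_sided_outside_rr(A_x, A_y, B_x, B_y):
    """Outside-test RR pooled over both directions: F_A on B, F_B on A."""
    err = sum(nn1_predict(A_x, A_y, x) != y for x, y in zip(B_x, B_y))
    err += sum(nn1_predict(B_x, B_y, x) != y for x, y in zip(A_x, A_y))
    return 1 - err / (len(A_x) + len(B_x))

A_x, A_y = [(0.0, 0.0), (1.0, 1.0)], [0, 1]  # illustrative set A
B_x, B_y = [(0.1, 0.1), (0.9, 0.9)], [0, 1]  # illustrative set B
print(two_sided_outside_rr(A_x, A_y, B_x, B_y))  # 1.0 on this separable toy data
```

When |A| = |B|, this pooled RR coincides with the average (RR_A + RR_B)/2 shown in the following diagram slide.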
Two-sided Holdout Test (2)
Two-sided holdout test (used in GMDH)
[Diagram: Model A is constructed from data set A and evaluated on data set B to give $RR_A$; Model B is constructed from data set B and evaluated on data set A to give $RR_B$.]

Outside-test RR = $(RR_A + RR_B)/2$
Two-sided Holdout Test (3)
Characteristics
• Better usage of the dataset
• Still highly affected by the partitioning
• Suitable for models/classifiers with a high design-time computation load
M-fold Cross Validation
Data partitioning
• Partition the dataset into m folds
• One fold for test, the other folds for training
• Repeat m times
Data set $S = S_1 \cup S_2 \cup \cdots \cup S_m$, with $S_i \cap S_j = \varnothing$ for $i \neq j$.
The class distribution within each $S_i$ should be as close as possible to that of $S$.

$\text{Cross-validation RR} = 1 - \left[ \sum_{j=1}^{m} \sum_{(x_i, y_i) \in S_j} \operatorname{sgn}\left| y_i - F_{S - S_j}(x_i) \right| \right] \Big/ \sum_{j=1}^{m} |S_j|$
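The fold loop can be sketched as follows; the 1-NN classifier, the round-robin fold assignment, and the toy data are illustrative assumptions, not from the slides.

```python
# Hypothetical sketch of m-fold cross validation: each fold S_j is tested
# against a model constructed on S - S_j, and errors are pooled over |S|.

def nn1_predict(train_x, train_y, x):
    """Classify x by the label of its nearest training point."""
    dists = [sum((a - b) ** 2 for a, b in zip(tx, x)) for tx in train_x]
    return train_y[dists.index(min(dists))]

def m_fold_cv_rr(xs, ys, m):
    n = len(xs)
    errors = 0
    for j in range(m):
        # Round-robin fold assignment: sample i belongs to fold i % m.
        train_x = [xs[i] for i in range(n) if i % m != j]
        train_y = [ys[i] for i in range(n) if i % m != j]
        for i in range(n):
            if i % m == j:
                errors += nn1_predict(train_x, train_y, xs[i]) != ys[i]
    return 1 - errors / n

xs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),   # illustrative class 0
      (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1)]   # illustrative class 1
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print(m_fold_cv_rr(xs, ys, 2))  # 1.0: both clusters survive the 2-fold split
```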
M-fold Cross Validation (2)
[Diagram: S is split into m disjoint sets $S_1, \ldots, S_m$; each Model k is constructed from all folds except $S_k$ and evaluated on $S_k$ (outside test) to give $RR_k$.]

$RR_{CV} = \frac{1}{m} \sum_{k=1}^{m} RR_k$
M-fold Cross Validation (3)
Characteristics
• When m = 2 → two-sided holdout test
• When m = n → leave-one-out cross validation
• The value of m depends on the computation load imposed by the selected model/classifier.
Leave-one-out Cross Validation
Data partitioning
• When m = n and $S_i = \{(x_i, y_i)\}$
Data set $S = S_1 \cup S_2 \cup \cdots \cup S_n$, with $S_i \cap S_j = \varnothing$ for $i \neq j$ and each $S_i = \{(x_i, y_i)\}$.

$\text{Cross-validation RR} = 1 - \left[ \sum_{j=1}^{n} \sum_{(x_i, y_i) \in S_j} \operatorname{sgn}\left| y_i - F_{S - S_j}(x_i) \right| \right] \Big/ \sum_{j=1}^{n} |S_j| = 1 - \frac{1}{n} \sum_{i=1}^{n} \operatorname{sgn}\left| y_i - F_{S - S_i}(x_i) \right|$
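The m = n special case can be sketched directly; the 1-NN classifier and toy data are illustrative assumptions, not from the slides.

```python
# Hypothetical sketch of leave-one-out CV: each (x_k, y_k) serves once as
# the single left-out test pair, so each per-fold RR is 0% or 100%.

def nn1_predict(train_x, train_y, x):
    """Classify x by the label of its nearest training point."""
    dists = [sum((a - b) ** 2 for a, b in zip(tx, x)) for tx in train_x]
    return train_y[dists.index(min(dists))]

def loocv_rr(xs, ys):
    n = len(xs)
    errors = 0
    for k in range(n):
        train_x = xs[:k] + xs[k + 1:]  # all pairs except the k-th
        train_y = ys[:k] + ys[k + 1:]
        errors += nn1_predict(train_x, train_y, xs[k]) != ys[k]  # per-fold: 0 or 1
    return 1 - errors / n

xs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),   # illustrative class 0
      (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1)]   # illustrative class 1
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print(loocv_rr(xs, ys))  # 1.0: every left-out point has a same-class neighbor
```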
Leave-one-out Cross Validation (2)
Leave-one-out CV
[Diagram: the n input/output pairs $(x_1, y_1), \ldots, (x_n, y_n)$; each Model k is constructed from all pairs except $(x_k, y_k)$ and evaluated on that left-out pair (outside test), so each $RR_k$ is either 0% or 100%.]

$RR_{LOOCV} = \frac{1}{n} \sum_{k=1}^{n} RR_k$
Leave-one-out Cross Validation (3)
General method for LOOCV
• Perform model construction (as a black box) n times → slow!
To speed up LOOCV computation
• Construct a common part that will be used repeatedly, such as
- Global mean and covariance for QC
• More info on cross-validation is available on Wikipedia
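The "common part" idea can be illustrated with 1-NN rather than QC: the pairwise distance matrix is computed once and reused by every leave-one-out round. This is a sketch under that assumption; the names and data are illustrative, not from the slides.

```python
# Hypothetical sketch of speeding up LOOCV by precomputing a shared
# structure: for 1-NN, compute the pairwise distance matrix once, then
# each leave-one-out prediction is just the nearest *other* point.

def loocv_rr_fast(xs, ys):
    n = len(xs)
    # Common part: pairwise squared distances, computed once (not n times).
    d = [[sum((a - b) ** 2 for a, b in zip(xs[i], xs[j])) for j in range(n)]
         for i in range(n)]
    errors = 0
    for i in range(n):
        # Nearest neighbor among the other n - 1 points.
        j = min((k for k in range(n) if k != i), key=lambda k: d[i][k])
        errors += ys[j] != ys[i]
    return 1 - errors / n

xs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),   # illustrative class 0
      (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1)]   # illustrative class 1
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print(loocv_rr_fast(xs, ys))  # 1.0, identical to the black-box LOOCV result
```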
Applications and Misuse of CV
Applications of CV
• Input (feature) selection
• Model complexity determination
• Performance comparison among different models
Misuse of CV
• Do not try to boost the validation RR too much, or you run the risk of indirectly training on the left-out data!