Machine Learning: Performance Evaluation
Estimation of Recognition Rates
J.-S. Roger Jang (張智星)
CSIE Dept., National Taiwan Univ.
http://mirlab.org/jang
jang@mirlab.org
Outline
Performance indices of a given classifier/model
• Accuracy (recognition rate)
• Computation load
Methods to estimate the recognition rate
• Inside test
• One-sided holdout test
• Two-sided holdout test
• M-fold cross validation
• Leave-one-out cross validation
Synonyms
The following sets of synonyms will be used interchangeably:
• Classifier, model
• Recognition rate, accuracy
Performance Indices
Performance indices of a classifier
• Recognition rate
- Requires an objective procedure to derive it
• Computation load
- Design-time computation
- Run-time computation
Our focus
• Recognition rate and the procedures to derive it
• The estimated accuracy depends on
- Dataset
- Model (types and complexity)
Methods for Deriving Recognition Rates
Methods to derive the recognition rates
• Inside test (resubstitution recognition rate)
• One-sided holdout test
• Two-sided holdout test
• M-fold cross validation
• Leave-one-out cross validation
Data partitioning
• Training set
• Training and test sets
• Training, validation, and test sets
Inside Test
Dataset partitioning
• Use the whole dataset for training & evaluation
Recognition rate
• Inside-test recognition rate
• Resubstitution accuracy
Training dataset = whole dataset D = {(x_i^D, y_i^D)}
F_D(·): model identified by the training data
Inside-test RR = (1/|D|) Σ_i sgn(y_i^D == F_D(x_i^D)), where sgn(true) = 1 and sgn(false) = 0
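The inside-test RR above can be sketched in a few lines. This is a hypothetical illustration (not from the original slides) using a 1-NN classifier as the model F_D; it also demonstrates the next slide's point that 1-NNC scores 100% on its own training set, since each training point's nearest neighbor is itself.

```python
def nn1_predict(train, x):
    """1-NNC: return the label of the training point closest to x."""
    return min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

def inside_test_rr(train):
    """Inside-test RR = (1/|D|) * sum of sgn(y_i == F_D(x_i)) over D itself."""
    hits = sum(1 for x, y in train if nn1_predict(train, x) == y)
    return hits / len(train)

# Toy dataset D = {(x_i, y_i)} with two classes
D = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]
print(inside_test_rr(D))  # 1.0 -- resubstitution RR of 1-NNC is always 100%
```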
Inside Test (2)
Characteristics
• Too optimistic, since the RR tends to be higher than the true rate
• For instance, 1-NNC always has an inside-test RR of 100%!
• Can be used as an upper bound on the true RR
Potential reasons for low inside-test RR:
• Bad features of the dataset
• Bad method for model construction, such as
- Bad results from neural network training
- Bad results from k-means clustering
One-sided Holdout Test
Dataset partitioning
• Training set for model construction
• Test set for performance evaluation
Recognition rate
• Inside-test RR
• Outside-test RR
Training set A = {(x_i^A, y_i^A)}, test set B = {(x_i^B, y_i^B)}
F_A(·): model identified using set A
Outside-test RR = (1/|B|) Σ_i sgn(y_i^B == F_A(x_i^B))
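A minimal sketch of the one-sided holdout test, again with 1-NN as a hypothetical stand-in model. Set A is used only for construction of F_A; the held-out set B is used only for the outside-test RR. The split ratio and seed are illustrative choices, not from the slides.

```python
import random

def nn1_predict(train, x):
    """1-NNC: return the label of the training point closest to x."""
    return min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

def holdout_rr(data, test_ratio=0.5, seed=0):
    """Outside-test RR = (1/|B|) * sum of sgn(y_i == F_A(x_i)) over test set B."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    B, A = shuffled[:n_test], shuffled[n_test:]   # test set B, training set A
    hits = sum(1 for x, y in B if nn1_predict(A, x) == y)
    return hits / len(B)

data = [((i * 0.1, i * 0.1), "a" if i < 5 else "b") for i in range(10)]
print(holdout_rr(data))  # value depends on the random partition
```

Rerunning with different seeds shows the slide's point directly: the estimate is highly affected by how the data happen to be partitioned.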
One-sided Holdout Test (2)
Characteristics
• Highly affected by the data partitioning
• Usually adopted when the design-time computation load is high
Two-sided Holdout Test
Dataset partitioning
• Training set for model construction
• Test set for performance evaluation
• Role reversal
Data set A = {(x_i^A, y_i^A)}, data set B = {(x_i^B, y_i^B)}
F_A(·): model identified using data set A
F_B(·): model identified using data set B
Outside-test RR = [Σ_i sgn(y_i^B == F_A(x_i^B)) + Σ_i sgn(y_i^A == F_B(x_i^A))] / (|A| + |B|)
Inside-test RR = [Σ_i sgn(y_i^A == F_A(x_i^A)) + Σ_i sgn(y_i^B == F_B(x_i^B))] / (|A| + |B|)
Two-sided Holdout Test (2)
Two-sided holdout test (used in GMDH):
• Data set A → construction → Model A; evaluate Model A on data set B → RR_A
• Data set B → construction → Model B; evaluate Model B on data set A → RR_B
Outside-test RR = (RR_A + RR_B) / 2
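The role-reversal procedure can be sketched as follows, pooling the two outside-test counts over |A| + |B| as in the formula; when |A| = |B| this coincides with averaging RR_A and RR_B. The 1-NN model is again a hypothetical stand-in.

```python
def nn1_predict(train, x):
    """1-NNC: return the label of the training point closest to x."""
    return min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

def two_sided_holdout_rr(A, B):
    """Outside-test RR = (hits of F_A on B + hits of F_B on A) / (|A| + |B|)."""
    hits_B = sum(1 for x, y in B if nn1_predict(A, x) == y)  # Model A tested on B
    hits_A = sum(1 for x, y in A if nn1_predict(B, x) == y)  # Model B tested on A
    return (hits_A + hits_B) / (len(A) + len(B))

A = [((0.0, 0.0), "a"), ((1.0, 1.0), "b")]
B = [((0.1, 0.1), "a"), ((0.9, 0.9), "b")]
print(two_sided_holdout_rr(A, B))  # 1.0 on this separable toy data
```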
Two-sided Holdout Test (3)
Characteristics
• Better usage of the dataset
• Still highly affected by the partitioning
• Suitable for models/classifiers with a high design-time computation load
Machine Learning
Performance Evaluation
M-fold Cross Validation
Data partitioning
• Partition the dataset into m folds
• Use one fold for test and the other folds for training
• Repeat m times
Data set S = S1 ∪ S2 ∪ ... ∪ Sm, with Si ∩ Sj = ∅ for i ≠ j
The class distribution within each Si should be as close as possible to that of S.
Cross-validation RR = [Σ_{j=1}^{m} Σ_{(x_i, y_i) ∈ S_j} sgn(y_i == F_{S−S_j}(x_i))] / Σ_{j=1}^{m} |S_j|
M-fold Cross Validation (2)
• Partition S into m disjoint sets S1, S2, ..., Sm
• For each k: construct Model k from S − S_k, then evaluate it on S_k (outside test) to obtain RR_k
RR_CV = (1/m) Σ_{k=1}^{m} RR_k
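The m-fold procedure can be sketched as below, with 1-NN as a hypothetical stand-in model. Folds are formed by striding through the data (a simple illustrative choice; it does not guarantee the matched class distribution the slides recommend), and the per-fold correct counts are pooled over |S|.

```python
def nn1_predict(train, x):
    """1-NNC: return the label of the training point closest to x."""
    return min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

def m_fold_cv_rr(data, m):
    """CV RR = (sum over folds j of correct predictions on S_j) / |S|."""
    folds = [data[j::m] for j in range(m)]        # m disjoint subsets S_j
    hits = 0
    for j in range(m):
        train = [p for k in range(m) if k != j for p in folds[k]]  # S - S_j
        hits += sum(1 for x, y in folds[j] if nn1_predict(train, x) == y)
    return hits / len(data)

data = [((i * 0.1, (i % 2) * 0.05), "a" if i < 6 else "b") for i in range(12)]
print(m_fold_cv_rr(data, m=3))
```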
M-fold Cross Validation (3)
Characteristics
• When m = 2: reduces to the two-sided holdout test
• When m = n: reduces to leave-one-out cross validation
• The value of m depends on the computation load imposed by the selected model/classifier.
Leave-one-out Cross Validation
Data partitioning
• When m=n and Si=(xi, yi)
Data set S = S1 ∪ S2 ∪ ... ∪ Sn, with Si ∩ Sj = ∅ for i ≠ j and each |Si| = 1
Cross-validation RR = (1/n) Σ_{i=1}^{n} sgn(y_i == F_{S−S_i}(x_i))
Leave-one-out Cross Validation (2)
Leave-one-out CV:
• n input/output pairs (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)
• For each k: construct Model k from all pairs except (x_k, y_k), then evaluate it on (x_k, y_k) (outside test) to obtain RR_k, which is either 0% or 100%!
RR_LOOCV = (1/n) Σ_{k=1}^{n} RR_k
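LOOCV as a special case of m-fold CV with m = n can be sketched directly: each pair (x_k, y_k) is held out in turn, and each per-fold result is 0 or 1. The 1-NN model is, as before, a hypothetical stand-in.

```python
def nn1_predict(train, x):
    """1-NNC: return the label of the training point closest to x."""
    return min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

def loocv_rr(data):
    """RR_LOOCV = (1/n) * sum over k of sgn(y_k == F_{S - S_k}(x_k))."""
    hits = 0
    for k, (x, y) in enumerate(data):
        train = data[:k] + data[k + 1:]           # leave pair (x_k, y_k) out
        hits += 1 if nn1_predict(train, x) == y else 0
    return hits / len(data)

data = [((0.0, 0.0), "a"), ((0.1, 0.1), "a"), ((1.0, 1.0), "b"), ((0.9, 0.9), "b")]
print(loocv_rr(data))  # 1.0: each left-out point's nearest neighbor shares its class
```

Note that, unlike the inside test, LOOCV of 1-NN is no longer trivially 100%: the left-out point itself is excluded from the training set, which is exactly what makes this an outside test.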
Leave-one-out Cross Validation (3)
General method for LOOCV
• Perform model construction (as a black box) n times: slow!
To speed up LOOCV computation
• Construct a common part that can be reused across all n folds, such as
- The global mean and covariance for a quadratic classifier (QC)
• More info on cross-validation is available on Wikipedia
Applications and Misuse of CV
Applications of CV
• Input (feature) selection
• Model complexity determination
• Performance comparison among different models
Misuse of CV
• Do not try to boost the validation RR too much, or you run the risk of indirectly training on the left-out data!