Machine Learning
Performance Evaluation: Estimation of Recognition Rates
J.-S. Roger Jang (張智星)
CSIE Dept., National Taiwan Univ.
http://mirlab.org/jang
jang@mirlab.org

Outline
Performance indices of a given classifier/model
• Accuracy (recognition rate)
• Computation load
Methods to estimate the recognition rate
• Inside test
• One-sided holdout test
• Two-sided holdout test
• M-fold cross validation
• Leave-one-out cross validation

Synonyms
The following sets of synonyms will be used interchangeably:
• Classifier, model
• Recognition rate, accuracy

Performance Indices
Performance indices of a classifier
• Recognition rate
  - Requires an objective procedure to derive it
• Computation load
  - Design-time computation
  - Run-time computation
Our focus
• Recognition rate and the procedures to derive it
• The estimated accuracy depends on
  - Dataset
  - Model (type and complexity)

Methods for Deriving Recognition Rates
Methods to derive the recognition rate
• Inside test (resubstitution recognition rate)
• One-sided holdout test
• Two-sided holdout test
• M-fold cross validation
• Leave-one-out cross validation
Data partitioning
• Training set
• Training and test sets
• Training, validating, and test sets

Inside Test
Dataset partitioning
• Use the whole dataset for both training and evaluation
Recognition rate
• Inside-test recognition rate
• Resubstitution accuracy
Training dataset = the whole dataset $D = \{(x_i^D, y_i^D)\}$
$F_D(\cdot)$: model identified from the training data
$\mathrm{RR} = \frac{1}{|D|} \sum_i \operatorname{sgn}\!\left(y_i^D = F_D(x_i^D)\right)$, where $\operatorname{sgn}(\cdot)$ equals 1 if its argument holds and 0 otherwise

Inside Test (2)
Characteristics
• Too optimistic, since the inside-test RR tends to be higher than the true RR
• For instance, 1-NNC always has an inside-test RR of 100%!
• Can be used as an upper bound of the true RR
Potential reasons for a low inside-test RR:
• Bad features in the dataset
• Bad method for model construction, such as
  - Bad results from neural network training
  - Bad results from k-means clustering
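To make the inside test concrete, the following is a minimal Python sketch (NumPy only, on a hypothetical random dataset; the helper name nn1_predict is ours, not from the slides). It shows why the inside test is too optimistic: a 1-nearest-neighbor classifier (1-NNC) evaluated on its own training set always reaches 100%, since every point is its own nearest neighbor.

import numpy as np

# Hypothetical toy dataset: 20 random 2-D points with random binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = rng.integers(0, 2, size=20)

def nn1_predict(X_train, y_train, X_query):
    # 1-NNC: predict the label of the closest training point.
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[np.argmin(d, axis=1)]

# Inside test: construct and evaluate the model on the same set D.
inside_rr = np.mean(nn1_predict(X, y, X) == y)
print(inside_rr)  # 1.0, i.e., 100%: each point's nearest neighbor is itself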
One-sided Holdout Test
Dataset partitioning
• Training set for model construction
• Test set for performance evaluation
Recognition rate
• Inside-test RR
• Outside-test RR
Training set $A = \{(x_i^A, y_i^A)\}$, test set $B = \{(x_i^B, y_i^B)\}$
$F_A(\cdot)$: model identified using set A
Outside-test $\mathrm{RR} = \frac{1}{|B|} \sum_i \operatorname{sgn}\!\left(y_i^B = F_A(x_i^B)\right)$

One-sided Holdout Test (2)
Characteristics
• Highly affected by the data partitioning
• Usually adopted when the design-time computation load is high

Two-sided Holdout Test
Dataset partitioning
• Training set for model construction
• Test set for performance evaluation
• Role reversal
Data set $A = \{(x_i^A, y_i^A)\}$, data set $B = \{(x_i^B, y_i^B)\}$
$F_A(\cdot)$: model identified using data set A
$F_B(\cdot)$: model identified using data set B
Outside-test $\mathrm{RR} = \frac{1}{|A| + |B|} \left[ \sum_i \operatorname{sgn}\!\left(y_i^B = F_A(x_i^B)\right) + \sum_i \operatorname{sgn}\!\left(y_i^A = F_B(x_i^A)\right) \right]$
Inside-test $\mathrm{RR} = \frac{1}{|A| + |B|} \left[ \sum_i \operatorname{sgn}\!\left(y_i^A = F_A(x_i^A)\right) + \sum_i \operatorname{sgn}\!\left(y_i^B = F_B(x_i^B)\right) \right]$

Two-sided Holdout Test (2)
Two-sided holdout test (used in GMDH)
[Diagram: model A is constructed from data set A and evaluated on data set B to give RR_A; model B is constructed from data set B and evaluated on data set A to give RR_B. Outside-test RR = (RR_A + RR_B)/2.]

Two-sided Holdout Test (3)
Characteristics
• Better usage of the dataset
• Still highly affected by the partitioning
• Suitable for models/classifiers with a high design-time computation load
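Continuing the sketch above, both holdout tests take only a few lines (again illustrative only, reusing the hypothetical nn1_predict and dataset (X, y); the 50/50 split is an arbitrary choice):

# Split the data into two halves A and B.
half = len(X) // 2
A, yA = X[:half], y[:half]
B, yB = X[half:], y[half:]

# One-sided holdout: construct the model on A, evaluate on B.
rr_A_on_B = np.mean(nn1_predict(A, yA, B) == yB)

# Two-sided holdout: reverse the roles and pool both outside tests.
rr_B_on_A = np.mean(nn1_predict(B, yB, A) == yA)
outside_rr = (len(B) * rr_A_on_B + len(A) * rr_B_on_A) / (len(A) + len(B))
print(rr_A_on_B, outside_rr)

With |A| = |B|, the pooled outside-test RR reduces to (RR_A + RR_B)/2, matching the diagram above.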
M-fold Cross Validation
Data partitioning
• Partition the dataset into m folds
• One fold for test, the other folds for training
• Repeat m times
Data set $S = S_1 \cup S_2 \cup \cdots \cup S_m$, with $S_i \cap S_j = \varnothing$ for $i \neq j$
The class distribution within each $S_i$ should be as close as possible to that of $S$.
Cross-validation $\mathrm{RR} = \left[ \sum_{j=1}^{m} \sum_{(x_i, y_i) \in S_j} \operatorname{sgn}\!\left(y_i = F_{S - S_j}(x_i)\right) \right] \Big/ \sum_{j=1}^{m} |S_j|$

M-fold Cross Validation (2)
[Diagram: the dataset is split into m disjoint sets S_1, ..., S_m; for each k, model k is constructed from S − S_k and evaluated on S_k (the outside test) to give RR_k. RR_CV = (1/m) Σ_{k=1}^{m} RR_k.]

M-fold Cross Validation (3)
Characteristics
• When m = 2 → two-sided holdout test
• When m = n → leave-one-out cross validation
• The value of m depends on the computation load imposed by the selected model/classifier

Leave-one-out Cross Validation
Data partitioning
• When m = n and each fold is a single pair, $S_i = \{(x_i, y_i)\}$
Data set $S = S_1 \cup S_2 \cup \cdots \cup S_n$, with $S_i \cap S_j = \varnothing$ for $i \neq j$
The general formula then reduces to
Cross-validation $\mathrm{RR} = \frac{1}{n} \sum_{j=1}^{n} \operatorname{sgn}\!\left(y_j = F_{S - S_j}(x_j)\right)$

Leave-one-out Cross Validation (2)
[Diagram: the n input/output pairs (x_1, y_1), ..., (x_n, y_n); for each k, model k is constructed from all pairs except (x_k, y_k) and evaluated on (x_k, y_k) (the outside test) to give RR_k, which is either 0% or 100%. RR_LOOCV = (1/n) Σ_{k=1}^{n} RR_k.]

Leave-one-out Cross Validation (3)
General method for LOOCV
• Perform model construction (as a black box) n times → slow!
To speed up the LOOCV computation
• Construct a common part that can be reused across folds, such as
  - The global mean and covariance for the quadratic classifier (QC)
More info on cross validation can be found on Wikipedia.

Applications and Misuse of CV
Applications of CV
• Input (feature) selection
• Model complexity determination
• Performance comparison among different models
Misuse of CV
• Do not try to boost the validation RR too much, or you run the risk of indirectly training on the left-out data!
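Finally, here is an illustrative sketch of M-fold cross validation for the same hypothetical 1-NNC, with LOOCV as the m = n special case. Note that, unlike the recipe above, this simple index split does not stratify the folds by class; a careful implementation would.

def cross_validation_rr(X, y, m):
    # Partition the indices into m disjoint folds S_1, ..., S_m.
    idx = np.arange(len(X))
    folds = np.array_split(idx, m)
    correct = 0
    for test_idx in folds:
        # Construct the model on S - S_j, evaluate it on S_j.
        train_idx = np.setdiff1d(idx, test_idx)
        pred = nn1_predict(X[train_idx], y[train_idx], X[test_idx])
        correct += np.sum(pred == y[test_idx])
    return correct / len(X)  # pooled over all folds

print(cross_validation_rr(X, y, m=2))       # two-sided holdout as a special case
print(cross_validation_rr(X, y, m=len(X)))  # LOOCV: each fold is a single pair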