Machine Learning
Performance Evaluation: Estimation of Recognition Rates
J.-S. Roger Jang (張智星)
CSIE Dept., National Taiwan Univ.
http://mirlab.org/jang
jang@mirlab.org

Outline
Performance indices of a given classifier/model
• Accuracy (recognition rate)
• Computation load
Methods to estimate the recognition rate
• Inside test
• One-sided holdout test
• Two-sided holdout test
• M-fold cross validation
• Leave-one-out cross validation

Synonyms
The following sets of synonyms will be used interchangeably:
• Classifier, model
• Recognition rate, accuracy

Performance Indices
Performance indices of a classifier
• Recognition rate
  - Requires an objective procedure to derive it
• Computation load
  - Design-time computation
  - Run-time computation
Our focus
• Recognition rate and the procedures to derive it
• The estimated accuracy depends on
  - Dataset
  - Model (type and complexity)

Methods for Deriving Recognition Rates
Methods to derive the recognition rate
• Inside test (resubstitution recognition rate)
• One-sided holdout test
• Two-sided holdout test
• M-fold cross validation
• Leave-one-out cross validation
Data partitioning
• Training set
• Training and test sets
• Training, validating, and test sets

Inside Test
Dataset partitioning
• Use the whole dataset for both training and evaluation
Recognition rate
• Inside-test recognition rate
• Resubstitution accuracy
Training dataset = the whole dataset $D = \{(x_i^D, y_i^D)\}$
$F_D(\cdot)$: model identified from the training data
$\mathrm{RR} = \frac{1}{|D|} \sum_i \operatorname{sgn}\!\left(y_i^D = F_D(x_i^D)\right)$, where $\operatorname{sgn}(\cdot)$ equals 1 if its argument holds and 0 otherwise

Inside Test (2)
Characteristics
• Too optimistic, since the inside-test RR tends to be higher than the true RR
• For instance, 1-NNC always has an inside-test RR of 100%!
• Can be used as an upper bound of the true RR
Potential reasons for a low inside-test RR:
• Bad features in the dataset
• Bad method for model construction, such as
  - Bad results from neural network training
  - Bad results from k-means clustering
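To make the inside test concrete, the following is a minimal Python sketch (NumPy only, on a hypothetical random dataset; the helper name nn1_predict is ours, not from the slides). It shows why the inside test is too optimistic: a 1-nearest-neighbor classifier (1-NNC) evaluated on its own training set always reaches 100%, since every point is its own nearest neighbor.

import numpy as np

# Hypothetical toy dataset: 20 random 2-D points with random binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = rng.integers(0, 2, size=20)

def nn1_predict(X_train, y_train, X_query):
    # 1-NNC: predict the label of the closest training point.
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[np.argmin(d, axis=1)]

# Inside test: construct and evaluate the model on the same set D.
inside_rr = np.mean(nn1_predict(X, y, X) == y)
print(inside_rr)  # 1.0, i.e., 100%: each point's nearest neighbor is itself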
One-sided Holdout Test
Dataset partitioning
• Training set for model construction
• Test set for performance evaluation
Recognition rate
• Inside-test RR
• Outside-test RR
Training set $A = \{(x_i^A, y_i^A)\}$, test set $B = \{(x_i^B, y_i^B)\}$
$F_A(\cdot)$: model identified using set A
Outside-test $\mathrm{RR} = \frac{1}{|B|} \sum_i \operatorname{sgn}\!\left(y_i^B = F_A(x_i^B)\right)$

One-sided Holdout Test (2)
Characteristics
• Highly affected by the data partitioning
• Usually adopted when the design-time computation load is high

Two-sided Holdout Test
Dataset partitioning
• Training set for model construction
• Test set for performance evaluation
• Role reversal
Data set $A = \{(x_i^A, y_i^A)\}$, data set $B = \{(x_i^B, y_i^B)\}$
$F_A(\cdot)$: model identified using data set A
$F_B(\cdot)$: model identified using data set B
Outside-test $\mathrm{RR} = \frac{1}{|A| + |B|} \left[ \sum_i \operatorname{sgn}\!\left(y_i^B = F_A(x_i^B)\right) + \sum_i \operatorname{sgn}\!\left(y_i^A = F_B(x_i^A)\right) \right]$
Inside-test $\mathrm{RR} = \frac{1}{|A| + |B|} \left[ \sum_i \operatorname{sgn}\!\left(y_i^A = F_A(x_i^A)\right) + \sum_i \operatorname{sgn}\!\left(y_i^B = F_B(x_i^B)\right) \right]$

Two-sided Holdout Test (2)
Two-sided holdout test (used in GMDH)
[Diagram: model A is constructed from data set A and evaluated on data set B to give RR_A; model B is constructed from data set B and evaluated on data set A to give RR_B. Outside-test RR = (RR_A + RR_B)/2.]

Two-sided Holdout Test (3)
Characteristics
• Better usage of the dataset
• Still highly affected by the partitioning
• Suitable for models/classifiers with a high design-time computation load
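Continuing the sketch above, both holdout tests take only a few lines (again illustrative only, reusing the hypothetical nn1_predict and dataset (X, y); the 50/50 split is an arbitrary choice):

# Split the data into two halves A and B.
half = len(X) // 2
A, yA = X[:half], y[:half]
B, yB = X[half:], y[half:]

# One-sided holdout: construct the model on A, evaluate on B.
rr_A_on_B = np.mean(nn1_predict(A, yA, B) == yB)

# Two-sided holdout: reverse the roles and pool both outside tests.
rr_B_on_A = np.mean(nn1_predict(B, yB, A) == yA)
outside_rr = (len(B) * rr_A_on_B + len(A) * rr_B_on_A) / (len(A) + len(B))
print(rr_A_on_B, outside_rr)

With |A| = |B|, the pooled outside-test RR reduces to (RR_A + RR_B)/2, matching the diagram above.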
M-fold Cross Validation
Data partitioning
• Partition the dataset into m folds
• One fold for test, the other folds for training
• Repeat m times
Data set $S = S_1 \cup S_2 \cup \cdots \cup S_m$, with $S_i \cap S_j = \varnothing$ for $i \neq j$
The class distribution within each $S_i$ should be as close as possible to that of $S$.
Cross-validation $\mathrm{RR} = \left[ \sum_{j=1}^{m} \sum_{(x_i, y_i) \in S_j} \operatorname{sgn}\!\left(y_i = F_{S - S_j}(x_i)\right) \right] \Big/ \sum_{j=1}^{m} |S_j|$

M-fold Cross Validation (2)
[Diagram: the dataset is split into m disjoint sets S_1, ..., S_m; for each k, model k is constructed from S − S_k and evaluated on S_k (the outside test) to give RR_k. RR_CV = (1/m) Σ_{k=1}^{m} RR_k.]

M-fold Cross Validation (3)
Characteristics
• When m = 2 → two-sided holdout test
• When m = n → leave-one-out cross validation
• The value of m depends on the computation load imposed by the selected model/classifier

Leave-one-out Cross Validation
Data partitioning
• When m = n and each fold is a single pair, $S_i = \{(x_i, y_i)\}$
Data set $S = S_1 \cup S_2 \cup \cdots \cup S_n$, with $S_i \cap S_j = \varnothing$ for $i \neq j$
The general formula then reduces to
Cross-validation $\mathrm{RR} = \frac{1}{n} \sum_{j=1}^{n} \operatorname{sgn}\!\left(y_j = F_{S - S_j}(x_j)\right)$

Leave-one-out Cross Validation (2)
[Diagram: the n input/output pairs (x_1, y_1), ..., (x_n, y_n); for each k, model k is constructed from all pairs except (x_k, y_k) and evaluated on (x_k, y_k) (the outside test) to give RR_k, which is either 0% or 100%. RR_LOOCV = (1/n) Σ_{k=1}^{n} RR_k.]

Leave-one-out Cross Validation (3)
General method for LOOCV
• Perform model construction (as a black box) n times → slow!
To speed up the LOOCV computation
• Construct a common part that can be reused across folds, such as
  - The global mean and covariance for the quadratic classifier (QC)
More info on cross validation can be found on Wikipedia.

Applications and Misuse of CV
Applications of CV
• Input (feature) selection
• Model complexity determination
• Performance comparison among different models
Misuse of CV
• Do not try to boost the validation RR too much, or you run the risk of indirectly training on the left-out data!
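Finally, here is an illustrative sketch of M-fold cross validation for the same hypothetical 1-NNC, with LOOCV as the m = n special case. Note that, unlike the recipe above, this simple index split does not stratify the folds by class; a careful implementation would.

def cross_validation_rr(X, y, m):
    # Partition the indices into m disjoint folds S_1, ..., S_m.
    idx = np.arange(len(X))
    folds = np.array_split(idx, m)
    correct = 0
    for test_idx in folds:
        # Construct the model on S - S_j, evaluate it on S_j.
        train_idx = np.setdiff1d(idx, test_idx)
        pred = nn1_predict(X[train_idx], y[train_idx], X[test_idx])
        correct += np.sum(pred == y[test_idx])
    return correct / len(X)  # pooled over all folds

print(cross_validation_rr(X, y, m=2))       # two-sided holdout as a special case
print(cross_validation_rr(X, y, m=len(X)))  # LOOCV: each fold is a single pair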