The Elements of Statistical Learning
Hastie, Tibshirani, Friedman
December 20, 2002

What is statistical learning?
• supervised learning - the response can be continuous, so this is broader than supervised classification
• unsupervised learning - cluster analysis

Examples
1. Email spam: How to design an automatic spam detector based on commonly occurring words and punctuation marks (57 variables)? (HTF use generalized additive models and trees)
2. Prostate cancer: How to predict the size of a cancer based on prostate size, age, ... (8 variables)? (HTF use regression)
3. Handwritten digit recognition: Build an automatic zip-code reader for the US postal system based on 16×16 normalized images with a 0-255 grey-scale value for each pixel. (HTF use neural network and ICA classification)
4. Gene expression: Which genes activate together to generate actions in organisms (thousands of genes, not many experiments)? (HTF use cluster analysis)

Outline
• terminology
• parametric regression
• parametric classification
• nonparametric approaches

Terminology
• observation ≡ instance
• explanatory variable ≡ feature (attribute)
• response variable ≡ outcome (concept)
• training set: the outcome and feature measurements for a sample of instances

Supervised Learning
Model: y = f(X) + ε; training set L = (y_i, X_i), i = 1, . . .
, N

Fit using:
• Least squares: RSS(f) = Σ_{i=1}^{N} (y_i − f(X_i))^2
• Nearest-neighbor: f̂(X) = (1/h) Σ_{X_i ∈ N_h(X)} y_i
• Penalized RSS: PRSS(f) = RSS(f) + λJ(f)

Parametric Regression
When p is small, use least squares; otherwise:
• Best subset
• Ridge
• Lasso
• Principal Component Regression (PCR)
• Partial Least Squares (PLS)

Parametric Classification
• Linear Discriminant Analysis (LDA)
• Logistic Regression
• Quadratic Discriminant Analysis (QDA)

Nonparametric Approaches
When p is small:
• Basis expansion: expand the dimension of the input space via non-linear transformations, e.g. splines, wavelets, GAM, SVM
• Local averaging: e.g. kernels, local likelihood, nearest neighbors

When p is large:
• Partitioning the input space, e.g. trees, PRIM, HME
• Additive spline methods: MARS
• Additive models with dimension reduction: projection pursuit, neural networks, SVM
• Subsampling and ensemble methods: bagging, boosting, random forests
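The three fitting criteria under "Supervised Learning" can be sketched in a few lines of Python for the one-predictor case. This is a toy illustration, not code from HTF: the data, the penalty J(f) = b², the λ value, and all function names are invented for the example.

```python
# Toy sketch of the three fitting criteria (invented data and names).

def fit_least_squares(xs, ys):
    """Simple linear regression: the closed-form minimizer of
    RSS(f) = sum_i (y_i - f(x_i))^2 for f(x) = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b           # intercept a, slope b

def fit_ridge(xs, ys, lam):
    """Penalized RSS with J(f) = b^2 (slope only, intercept unpenalized):
    minimizing RSS(f) + lam*b^2 shrinks the least-squares slope toward 0."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / (sxx + lam)           # shrunken slope
    return my - b * mx, b

def knn_fit(xs, ys, x, k=3):
    """Nearest-neighbor fit: average y_i over the k training points
    closest to x, i.e. the neighborhood N_k(x)."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
    return sum(ys[i] for i in nearest) / k

def rss(a, b, xs, ys):
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.1, 1.9, 3.2, 3.9]      # roughly y = x plus noise

a_ls, b_ls = fit_least_squares(xs, ys)
a_r, b_r = fit_ridge(xs, ys, lam=2.0)
print(b_ls, b_r)                    # the ridge slope is smaller in magnitude
print(knn_fit(xs, ys, x=2.0, k=3))  # local average of the 3 nearest responses
```

Note the trade-off the penalized criterion encodes: the least-squares fit always achieves the smallest RSS on the training set, while the ridge fit accepts a slightly larger RSS in exchange for a smaller (less variable) slope.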