2005-04-15  Supplemental notes, BIOINF 2054/BIOSTAT 2018
Statistical Foundations for Bioinformatics Data Mining
Target readings: Hastie, Tibshirani, Friedman, Chapter 6

Kernel Smoothing

Recall the kernel-weighted average:

    \hat f(x_0) = \frac{\sum_{i=1}^N K(x_0, x_i)\, y_i}{\sum_{i=1}^N K(x_0, x_i)}

Example: the kNN algorithm, with

    K(x_0, x) = D\!\left(\frac{|x - x_0|}{h(x_0)}\right), \qquad D(u) = I(|u| \le 1),

and a data-driven width

    h(x_0) = h_k(x_0) = |x_0 - x_{(k)}|,

where x_{(k)} is the kth nearest neighbor of x_0 among the x_i.

Q: how do the variance and bias vary with the local data density?

Local linear regression

Replace the one-df (constant) fit at x_0 by a two-df (linear) fit:

    RSS(x_0, \alpha, \beta) = \sum_{i=1}^N K(x_0, x_i)\,[y_i - \alpha - \beta x_i]^2,

    \hat f(x_0) = \hat\alpha(x_0) + \hat\beta(x_0)\, x_0,
    \qquad (\hat\alpha(x_0), \hat\beta(x_0)) = \arg\min_{\alpha, \beta} RSS(x_0, \alpha, \beta).

So there is a different weighted regression fit for each evaluation point x_0. Local linear regression does "automatic kernel carpentry."

What does this mean? In matrix form,

    \hat f(x_0) = (1 \;\; x_0)\,(B^T K B)^{-1} B^T K\, y = \sum_{i=1}^N l_i(x_0)\, y_i,

where K = K_{x_0} = \mathrm{diag}(K(x_0, x_i)) and B = (\mathbf{1} \;\; X) is the N x 2 regression matrix. Think of this as

    \hat f(x_0) = \sum_{i=1}^N l_i(x_0)\, y_i = \sum_{i=1}^N K^*(x_0, x_i)\, y_i,

i.e. still a kernel-weighted average, but the shape of the equivalent kernel K^*(x_0, \cdot) now adapts to x_0 and to the local configuration of the x_i, in a "good" way. See Fig 6.4.

Note that

    \sum_{i=1}^N l_i(x_0) = 1 \quad \text{and} \quad \sum_{i=1}^N (x_i - x_0)\, l_i(x_0) = 0,

because

    l(x_0)^T B = (1 \;\; x_0)\,(B^T K B)^{-1} B^T K B = (1 \;\; x_0).

So, Taylor-expanding f about x_0,

    E\hat f(x_0) = \sum_{i=1}^N l_i(x_0)\, E y_i = \sum_{i=1}^N l_i(x_0)\, f(x_i)
                 = \sum_{i=1}^N l_i(x_0)\,\big[f(x_0) + (x_i - x_0) f'(x_0) + O((x_i - x_0)^2)\big]
                 = f(x_0) + \sum_{i=1}^N l_i(x_0)\, O((x_i - x_0)^2):

the constant and linear terms are reproduced exactly, so the bias is only second order.

Extensions & variations of kernel methods:
- Local polynomial regression.
- Local multiple regression in \mathbb{R}^p.
- Structured local regression in \mathbb{R}^p.
- Local likelihood.

Relationship to kernel density estimation

Forget about y for a moment, and estimate the density of X itself:

    \hat f_X(x_0) = \frac{\#\{x_i \in N(x_0)\}}{N \cdot \{\text{width of } N(x_0)\}},

or better, the smooth (Parzen) kernel estimate

    \hat f_X(x_0) = \frac{1}{N \lambda} \sum_{i=1}^N K_\lambda(x_0, x_i).

Now do this conditional on group membership y (one density \hat f_{X;j} per class, plus class priors \hat\pi_j), and estimate the regression of G on X as

    \widehat{\Pr}(G = j \mid X = x_0) = \frac{\hat\pi_j\, \hat f_{X;j}(x_0)}{\sum_{k=1}^J \hat\pi_k\, \hat f_{X;k}(x_0)}.

                        Conditioning on X         Conditioning on Y
    Linear methods      Logistic regression       Discriminant analysis
    Kernel methods      Kernel smoothing          Kernel density estimation

With kernel methods, "the model is the entire training data set, and the fitting is done at evaluation or prediction time." (p. 190)
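
For concreteness, here is a minimal NumPy sketch of the kernel-weighted average recalled at the top of these notes, using the indicator kernel D(u) = I(|u| <= 1) with the kNN-driven width h_k(x_0). The function name, toy data, and choice of k are illustrative, not taken from the notes or from HTF.

```python
import numpy as np

def knn_kernel_average(x0, x, y, k=15):
    """Kernel-weighted average at x0 with the data-driven width h_k(x0)."""
    d = np.abs(x - x0)                    # |x_i - x0|
    h = np.sort(d)[k - 1]                 # h_k(x0): distance to the kth nearest x_i
    w = (d <= h).astype(float)            # K(x0, x_i) = D(|x_i - x0| / h), D(u) = I(|u| <= 1)
    return np.sum(w * y) / np.sum(w)      # sum_i K(x0, x_i) y_i / sum_i K(x0, x_i)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(4 * x) + rng.normal(scale=0.3, size=100)
print(knn_kernel_average(0.5, x, y))      # local average of the ~15 nearest y_i
```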
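
Similarly, a sketch of the local linear regression operator: it computes the equivalent-kernel weights l_i(x_0) = (1 x_0)(B^T K B)^{-1} B^T K and checks the two moment conditions above. The Epanechnikov kernel and the bandwidth `lam` are assumptions made for this example, not prescribed by the notes.

```python
import numpy as np

def local_linear_weights(x0, x, lam=0.2):
    """Equivalent-kernel weights l_i(x0); the local linear fit at x0 is l @ y."""
    u = np.abs(x - x0) / lam
    k = np.where(u <= 1, 0.75 * (1 - u**2), 0.0)      # K(x0, x_i), Epanechnikov
    B = np.column_stack([np.ones_like(x), x])         # B = (1  X)
    K = np.diag(k)                                    # K_{x0} = diag(K(x0, x_i))
    return np.array([1.0, x0]) @ np.linalg.solve(B.T @ K @ B, B.T @ K)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(4 * x) + rng.normal(scale=0.3, size=100)
l = local_linear_weights(0.05, x)                     # evaluation point near the boundary
print(l @ y)                                          # \hat f(x0)
print(l.sum(), ((x - 0.05) * l).sum())                # ~1 and ~0: the two moment conditions
```

Evaluating near the boundary is where the difference from a plain kernel average shows up: the weights reshape themselves so the first-order bias term still cancels.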
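
Finally, a sketch of kernel density classification as in the last display: class-conditional Parzen estimates \hat f_{X;j} and priors \hat\pi_j combined via Bayes' rule. The Gaussian kernel, the bandwidth, and the two-class toy data are illustrative assumptions.

```python
import numpy as np

def kde(x0, x, lam=0.3):
    """Parzen estimate (1/(N*lam)) * sum_i K_lam(x0, x_i) with a Gaussian kernel."""
    u = (x0 - x) / lam
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)) / lam

def kde_posterior(x0, x, g, lam=0.3):
    """Pr(G = j | X = x0) from class priors pi_hat_j and class-conditional KDEs."""
    classes = np.unique(g)
    pri = np.array([np.mean(g == j) for j in classes])            # pi_hat_j
    dens = np.array([kde(x0, x[g == j], lam) for j in classes])   # f_hat_{X;j}(x0)
    post = pri * dens
    return classes, post / post.sum()

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-1, 1, 60), rng.normal(2, 1, 40)])
g = np.array([0] * 60 + [1] * 40)
print(kde_posterior(0.5, x, g))
```

This is the "conditioning on Y" column of the table: the training data are kept whole and all of the work happens at prediction time, as the quoted remark says.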