2005-04-15
Supplemental notes, BIOINF 2054/BIOSTAT 2018
Statistical Foundations for Bioinformatics Data Mining
Target readings: Hastie, Tibshirani & Friedman, Chapter 6 (Kernel Smoothing).
Recall kernel-weighted averaging:
$$\hat{f}(x_0) = \frac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_\lambda(x_0, x_i)}$$
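A minimal sketch of this estimator in Python (the Epanechnikov kernel and all function names here are illustrative choices, not from the notes):

```python
import numpy as np

def epanechnikov(u):
    """D(u) = 0.75 (1 - u^2) for |u| <= 1, else 0."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def kernel_average(x0, x, y, lam):
    """Kernel-weighted average f_hat(x0) with fixed bandwidth lam."""
    w = epanechnikov((x - x0) / lam)      # K_lambda(x0, x_i)
    return np.sum(w * y) / np.sum(w)      # normalized weighted average of y_i

# Toy usage: smooth noisy samples of sin(x).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 100))
y = np.sin(x) + rng.normal(scale=0.3, size=100)
print(kernel_average(np.pi / 2, x, y, lam=0.5))   # should be near sin(pi/2) = 1
```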
Example: the kNN algorithm, with
$$K_\lambda(x_0, x) = D\!\left(\frac{|x - x_0|}{h_\lambda(x_0)}\right), \qquad D(u) = I(|u| \le 1)$$
and the width is data-driven:
h ( x0)  hk ( x0) | x0  x0( k ) |
where x0( k ) is the kth nearest neighbor.
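A sketch of the data-driven width, reusing the boxcar $D$ above (function names again illustrative):

```python
import numpy as np

def knn_width(x0, x, k):
    """h_k(x0) = |x0 - x_(k)|: distance to the k-th nearest neighbor of x0."""
    return np.sort(np.abs(x - x0))[k - 1]

def knn_average(x0, x, y, k):
    """kNN smoother: boxcar kernel D(u) = I(|u| <= 1) with adaptive width."""
    h = knn_width(x0, x, k)
    w = (np.abs(x - x0) <= h).astype(float)
    return np.sum(w * y) / np.sum(w)
```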
Q: How do the variance and bias vary with the data density?
Local linear regression
Replace the one-df (locally constant) fit at $x_0$ by a two-df (locally linear) fit:
$$\mathrm{RSS}(x_0, \alpha, \beta) = \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,\big[\,y_i - \alpha - \beta x_i\,\big]^2$$
$$\hat{f}(x_0) = (1\;\; x_0)\,\operatorname*{arg\,min}_{\alpha,\beta}\,\mathrm{RSS}(x_0, \alpha, \beta)$$
i.e. $\hat{f}(x_0) = \hat{\alpha}(x_0) + \hat{\beta}(x_0)\,x_0$, where $(\hat{\alpha}(x_0), \hat{\beta}(x_0))^T$ is the minimizer.
So there’s a different regression fit for each point x0 .
Local linear regression does “automatic kernel carpentry”.
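A minimal sketch of the local fit, under the same illustrative assumptions as above (Epanechnikov weights):

```python
import numpy as np

def local_linear(x0, x, y, lam):
    """f_hat(x0) = (1, x0) @ argmin_{a,b} sum_i K_lam(x0, x_i) (y_i - a - b x_i)^2."""
    u = (x - x0) / lam
    w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)   # K_lambda(x0, x_i)
    B = np.column_stack([np.ones_like(x), x])               # B = (1  X)
    K = np.diag(w)                                          # K_lambda(x0) = diag(...)
    coef = np.linalg.solve(B.T @ K @ B, B.T @ K @ y)        # (alpha_hat, beta_hat)
    return np.array([1.0, x0]) @ coef
```

Note the solve step is exactly the closed form $(1\;\;x_0)(\mathbf{B}^T\mathbf{K}\mathbf{B})^{-1}\mathbf{B}^T\mathbf{K}\mathbf{y}$ derived next.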
What does this mean?
$$\hat{f}(x_0) = (1\;\; x_0)\,(\mathbf{B}^T \mathbf{K} \mathbf{B})^{-1} \mathbf{B}^T \mathbf{K}\, \mathbf{y} = \sum_{i=1}^{N} l_i(x_0)\, y_i$$
where $\mathbf{K} = \mathbf{K}_\lambda(x_0) = \mathrm{diag}(K_\lambda(x_0, x_i))$ and $\mathbf{B} = (\mathbf{1}\;\; \mathbf{X})$ is the $N \times 2$ regression matrix.
Think of this as:
$$\hat{f}(x_0) = \sum_{i=1}^{N} l_i(x_0)\, y_i = \sum_{i=1}^{N} K^*_\lambda(x_0, x_i)\, y_i$$
I.e. a kernel-weighted average, but the shape of the kernel
now depends on x0 , in a “good” way. See Fig 6.4.
Note that
$$\sum_{i=1}^{N} l_i(x_0) = 1 \qquad\text{and}\qquad \sum_{i=1}^{N} (x_i - x_0)\, l_i(x_0) = 0,$$
because
$$l(x_0)^T (\mathbf{1}\;\;\mathbf{X}) = (1\;\; x_0)\,(\mathbf{B}^T\mathbf{K}\mathbf{B})^{-1}\mathbf{B}^T\mathbf{K}\mathbf{B} = (1\;\; x_0):$$
the first column gives $\sum_i l_i(x_0) = 1$ and the second gives $\sum_i l_i(x_0)\, x_i = x_0$.
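These identities are easy to check numerically; a sketch, under the same illustrative assumptions as before:

```python
import numpy as np

def equivalent_kernel(x0, x, lam):
    """The weights l_i(x0) = (1, x0) (B^T K B)^{-1} B^T K, as a length-N vector."""
    u = (x - x0) / lam
    w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)
    B = np.column_stack([np.ones_like(x), x])
    K = np.diag(w)
    return np.array([1.0, x0]) @ np.linalg.solve(B.T @ K @ B, B.T @ K)

x = np.linspace(0, 1, 50)
l = equivalent_kernel(0.3, x, lam=0.2)
print(np.sum(l))               # 1.0 (up to rounding)
print(np.sum((x - 0.3) * l))   # 0.0 (up to rounding)
```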
So
$$\mathrm{E}\,\hat{f}(x_0) = \sum_{i=1}^{N} l_i(x_0)\,\mathrm{E}\,y_i = \sum_{i=1}^{N} l_i(x_0)\, f(x_i)$$
$$= \sum_{i=1}^{N} l_i(x_0)\left[\, f(x_0) + (x_i - x_0)\, f'(x_0) + O\!\big((x_i - x_0)^2\big) \right]$$
$$= f(x_0) + \sum_{i=1}^{N} l_i(x_0)\, O\!\big((x_i - x_0)^2\big),$$
so the constant and linear terms are reproduced exactly and the bias is of second order only.
Extensions & variations of kernel methods:
Local polynomial regression.
Local multiple regression in $\mathbb{R}^p$.
Structured local regression in $\mathbb{R}^p$.
Local likelihood.
Relationship to kernel density estimation
Forget about y for a moment. Estimate
$$\hat{f}_X(x_0) = \frac{\#\{x_i \in \mathcal{N}(x_0)\}}{N \cdot \{\text{width of } \mathcal{N}(x_0)\}}$$
or better:
$$\hat{f}_X(x_0) = \frac{1}{N\lambda}\sum_{i=1}^{N} K_\lambda(x_0, x_i)$$
Now, do this conditional on group membership $y$, then estimate the regression as
$$\widehat{\Pr}(G = j \mid X = x_0) = \frac{\hat{\pi}_j\, \hat{f}_{X;j}(x_0)}{\sum_{k=1}^{J} \hat{\pi}_k\, \hat{f}_{X;k}(x_0)}$$

Conditioning      Linear methods            Kernel method
on X              Logistic regression       Kernel smoothing
on Y              Discriminant analysis     Kernel density estimation
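A sketch of this density-based classifier, reusing the hypothetical parzen_density() above; the priors $\hat{\pi}_j$ are estimated by class proportions:

```python
import numpy as np

def classify(x0, x, g, lam):
    """Posterior Pr(G = j | X = x0) via class-conditional density estimates."""
    classes = np.unique(g)
    prior = np.array([np.mean(g == j) for j in classes])                    # pi_hat_j
    dens = np.array([parzen_density(x0, x[g == j], lam) for j in classes])  # f_hat_{X;j}(x0)
    post = prior * dens
    return classes, post / post.sum()                                       # Bayes' rule
```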
With kernel methods, "the model is the entire training data set, and the fitting is done at evaluation or prediction time" (p. 190).