Kernel Methods
Dept. Computer Science & Engineering, Shanghai Jiao Tong University

Outline
• One-Dimensional Kernel Smoothers
• Local Regression
• Local Likelihood
• Kernel Density Estimation
• Naive Bayes
• Radial Basis Functions
• Mixture Models and EM

One-Dimensional Kernel Smoothers
• k-NN: $\hat f(x) = \mathrm{Ave}(y_i \mid x_i \in N_k(x))$
• The 30-NN curve is bumpy, since $\hat f(x)$ is discontinuous in $x$: the average changes in a discrete way as points enter and leave the neighborhood.

One-Dimensional Kernel Smoothers
• Nadaraya-Watson kernel-weighted average:
  $\hat f(x_0) = \dfrac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_\lambda(x_0, x_i)}$
• Epanechnikov quadratic kernel:
  $K_\lambda(x_0, x) = D\!\left(\dfrac{|x - x_0|}{\lambda}\right)$, where $D(t) = \tfrac{3}{4}(1 - t^2)$ if $|t| \le 1$ and $0$ otherwise.

One-Dimensional Kernel Smoothers
• More general kernel:
  $K_\lambda(x_0, x) = D\!\left(\dfrac{|x - x_0|}{h_\lambda(x_0)}\right)$
  – $h_\lambda(x_0)$: width function that determines the width of the neighborhood at $x_0$.
  – For the quadratic kernel, $h_\lambda(x_0) = \lambda$ is constant: the bias is roughly constant, while the variance varies inversely with the local density.
  – For the k-NN kernel, $h_k(x_0) = |x_0 - x_{[k]}|$, with $x_{[k]}$ the $k$-th closest point to $x_0$: the variance is roughly constant, while the bias varies inversely with the local density.
  – The Epanechnikov kernel has compact support.

One-Dimensional Kernel Smoothers
• Three popular kernels for local smoothing, each of the form
  $K_\lambda(x_0, x) = D\!\left(\dfrac{|x - x_0|}{\lambda}\right)$:
• The Epanechnikov and tri-cube kernels have compact support, but the tri-cube has two continuous derivatives at the boundary of its support, while the Epanechnikov has none.
• The Gaussian kernel has infinite support.

Local Linear Regression
• Boundary issue:
  – Kernel smoothers can be badly biased on the boundaries of the domain, because of the asymmetry of the kernel in that region.
  – Fitting straight lines locally, rather than constants, removes this bias exactly to first order.

Local Linear Regression
• Locally weighted linear regression makes a first-order correction.
• Solve a separate weighted least squares problem at each target point $x_0$:
  $\min_{\alpha(x_0),\, \beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\, [y_i - \alpha(x_0) - \beta(x_0)\, x_i]^2$
• The estimate: $\hat f(x_0) = \hat\alpha(x_0) + \hat\beta(x_0)\, x_0$
• With $b(x)^T = (1, x)$, $B$ the $N \times 2$ regression matrix with $i$-th row $b(x_i)^T$, and $W(x_0) = \mathrm{diag}(K_\lambda(x_0, x_i))$ the $N \times N$ diagonal weight matrix:
  $\hat f(x_0) = b(x_0)^T \left(B^T W(x_0) B\right)^{-1} B^T W(x_0)\, y = \sum_{i=1}^{N} l_i(x_0)\, y_i$

Local Linear Regression
• The weights $l_i(x_0)$ combine the weighting kernel $K_\lambda(x_0, \cdot)$ and the least squares operations: the equivalent kernel.

Local Linear Regression
• The expansion for $E\hat f(x_0)$, using the linearity of local regression and a series expansion of the true function $f$ around $x_0$:
  $E\hat f(x_0) = \sum_{i=1}^{N} l_i(x_0) f(x_i) = f(x_0) \sum_{i=1}^{N} l_i(x_0) + f'(x_0) \sum_{i=1}^{N} (x_i - x_0)\, l_i(x_0) + \dfrac{f''(x_0)}{2} \sum_{i=1}^{N} (x_i - x_0)^2\, l_i(x_0) + R$
• For local linear regression, $\sum_{i=1}^{N} l_i(x_0) = 1$ and $\sum_{i=1}^{N} (x_i - x_0)\, l_i(x_0) = 0$.
• Hence the bias $E\hat f(x_0) - f(x_0)$ depends only on quadratic and higher-order terms in the expansion of $f$.

Local Polynomial Regression
• Fit local polynomials of any degree $d$:
  $\min_{\alpha(x_0),\, \beta_j(x_0),\, j = 1, \dots, d} \sum_{i=1}^{N} K_\lambda(x_0, x_i) \left[ y_i - \alpha(x_0) - \sum_{j=1}^{d} \beta_j(x_0)\, x_i^j \right]^2$
  $\hat f(x_0) = \hat\alpha(x_0) + \sum_{j=1}^{d} \hat\beta_j(x_0)\, x_0^j$

Local Polynomial Regression
• The bias only has components of degree $d+1$ and higher.
• The reduction in bias comes at the cost of increased variance:
  $\mathrm{Var}(\hat f(x_0)) = \sigma^2 \|l(x_0)\|^2$, and $\|l(x_0)\|$ increases with $d$.
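A minimal sketch of the two smoothers above, assuming Python/NumPy; the function names (epanechnikov, nadaraya_watson, local_linear) and the simulated data are illustrative, not from the slides:

```python
import numpy as np

def epanechnikov(t):
    """D(t) = 3/4 (1 - t^2) for |t| <= 1, else 0."""
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def nadaraya_watson(x0, x, y, lam):
    """Kernel-weighted average: sum K*y / sum K."""
    w = epanechnikov((x - x0) / lam)
    return np.sum(w * y) / np.sum(w)

def local_linear(x0, x, y, lam):
    """Weighted least squares at x0, i.e. b(x0)^T (B^T W B)^{-1} B^T W y."""
    w = epanechnikov((x - x0) / lam)
    B = np.column_stack([np.ones_like(x), x])   # N x 2 regression matrix
    W = np.diag(w)                              # N x N kernel weight matrix
    theta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)
    return theta[0] + theta[1] * x0             # alpha(x0) + beta(x0) * x0

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(4 * x) + rng.normal(0, 0.3, 100)
print(nadaraya_watson(0.05, x, y, 0.2))  # biased near the left boundary
print(local_linear(0.05, x, y, 0.2))     # first-order boundary correction
```

Near the boundary the Nadaraya-Watson average is pulled toward interior points, while the local linear fit corrects this to first order, as in the boundary discussion above.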
Selecting the Width of the Kernel
• In the kernel $K_\lambda$, $\lambda$ is the parameter that controls the width:
  – For a kernel with compact support, $\lambda$ is the radius of the support region.
  – For the Gaussian kernel, $\lambda$ is the standard deviation.
  – For k-NN, $\lambda$ is the fraction $k/N$ of the sample.
• The window width drives a bias-variance tradeoff:
  – A narrow window gives high variance and low bias.
  – A wide window gives low variance and high bias.

Structured Local Regression
• Structured kernels:
  $K_{\lambda, A}(x_0, x) = D\!\left(\dfrac{(x - x_0)^T A\, (x - x_0)}{\lambda}\right)$
  – Introduce structure by imposing appropriate restrictions on $A$.
• Structured regression functions:
  $f(X_1, X_2, \dots, X_p) = \alpha + \sum_{j} g_j(X_j) + \sum_{k < l} g_{kl}(X_k, X_l) + \cdots$
  – Introduce structure by eliminating some of the higher-order terms.

Local Likelihood & Other Models
• Any parametric model can be made local:
  – Parameter associated with $y_i$: $\theta_i = \theta(x_i) = x_i^T \beta$
  – Log-likelihood: $l(\beta) = \sum_{i=1}^{N} l(y_i, x_i^T \beta)$
  – Inference for $\theta(X)$ local to $x_0$ uses the local log-likelihood:
    $l(\beta(x_0)) = \sum_{i=1}^{N} K_\lambda(x_0, x_i)\, l(y_i, x_i^T \beta(x_0))$
  – A varying coefficient model $\theta(z)$:
    $l(\theta(z_0)) = \sum_{i=1}^{N} K_\lambda(z_0, z_i)\, l(y_i, \eta(x_i, \theta(z_0)))$, e.g. $\eta(x, \theta) = x^T \theta$

Local Likelihood & Other Models
• Logistic regression:
  $\Pr(G = j \mid X = x) = \dfrac{\exp(\beta_{j0} + \beta_j^T x)}{1 + \sum_{k=1}^{J-1} \exp(\beta_{k0} + \beta_k^T x)}$
• Local log-likelihood for the $J$-class model:
  $\sum_{i=1}^{N} K_\lambda(x_0, x_i) \left\{ \beta_{g_i 0}(x_0) + \beta_{g_i}(x_0)^T (x_i - x_0) - \log\!\left[ 1 + \sum_{k=1}^{J-1} \exp\!\left( \beta_{k0}(x_0) + \beta_k(x_0)^T (x_i - x_0) \right) \right] \right\}$
• Because the local regressions are centered at $x_0$, the posterior estimate there involves only the fitted intercepts (a single-point Newton sketch appears after the Naïve Bayes section below):
  $\hat{\Pr}(G = j \mid X = x_0) = \dfrac{\exp(\hat\beta_{j0}(x_0))}{1 + \sum_{k=1}^{J-1} \exp(\hat\beta_{k0}(x_0))}$

Kernel Density Estimation
• A natural local estimate:
  $\hat f_X(x_0) = \dfrac{\#\{x_i \in N(x_0)\}}{N\lambda}$
• The smooth Parzen estimate:
  $\hat f_X(x_0) = \dfrac{1}{N\lambda} \sum_{i=1}^{N} K_\lambda(x_0, x_i)$
  – For the Gaussian kernel, $K_\lambda(x_0, x_i) = \phi(|x_i - x_0| / \lambda)$.
  – The estimate becomes an average of Gaussian bumps; in $\mathbb{R}^p$,
    $\hat f_X(x) = \dfrac{1}{N} \sum_{i=1}^{N} \phi_\lambda(x - x_i) = \dfrac{1}{N (2\lambda^2 \pi)^{p/2}} \sum_{i=1}^{N} \exp\!\left( -\tfrac{1}{2} \left( \|x_i - x\| / \lambda \right)^2 \right)$

Kernel Density Estimation
• Figure: a kernel density estimate for systolic blood pressure. The density estimate at each point is the average contribution from each of the kernels at that point.

Kernel Density Classification
• Bayes' theorem combines per-class density estimates $\hat f_j$ with class prior estimates $\hat\pi_j$:
  $\hat{\Pr}(G = j \mid X = x_0) = \dfrac{\hat\pi_j \hat f_j(x_0)}{\sum_{k=1}^{J} \hat\pi_k \hat f_k(x_0)}$
• The estimate for CHD uses the tri-cube kernel with a k-NN bandwidth.

Kernel Density Classification
• Figure: the population class densities and the posterior probabilities.

Naïve Bayes
• The naïve Bayes model assumes that, given a class $G = j$, the features $X_k$ are independent:
  $f_j(X) = \prod_{k=1}^{p} f_{jk}(X_k)$
  – Each $\hat f_{jk}(X_k)$ is a one-dimensional kernel density estimate, or a Gaussian estimate, for coordinate $X_k$ in class $j$.
  – If $X_k$ is categorical, use a histogram.
• The logit then has the form of a generalized additive model:
  $\log \dfrac{\Pr(G = \ell \mid X)}{\Pr(G = J \mid X)} = \log \dfrac{\pi_\ell f_\ell(X)}{\pi_J f_J(X)} = \log \dfrac{\pi_\ell}{\pi_J} + \sum_{k=1}^{p} \log \dfrac{f_{\ell k}(X_k)}{f_{Jk}(X_k)} = \alpha_\ell + \sum_{k=1}^{p} g_{\ell k}(X_k)$
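A minimal sketch combining the last two sections, assuming Python/NumPy: per-coordinate Parzen estimates $\hat f_{jk}$ plugged into Bayes' theorem. It swaps in a Gaussian kernel for the slides' tri-cube/k-NN choice; the helper names, bandwidth, and simulated two-class data are illustrative assumptions:

```python
import numpy as np

def kde_1d(x, data, lam):
    """Parzen estimate (1/(N*lam)) * sum K((x - x_i)/lam), Gaussian kernel."""
    u = (x - data[:, None]) / lam                      # shape (N, len(x))
    return np.mean(np.exp(-0.5 * u**2), axis=0) / (lam * np.sqrt(2 * np.pi))

def naive_bayes_posterior(x, X_by_class, priors, lam):
    """P(G=j | x) proportional to pi_j * prod_k f_jk(x_k), via 1-D KDEs."""
    scores = []
    for X_j, pi_j in zip(X_by_class, priors):
        dens = [kde_1d(np.array([x[k]]), X_j[:, k], lam)[0]
                for k in range(X_j.shape[1])]
        scores.append(pi_j * np.prod(dens))            # independence assumption
    scores = np.array(scores)
    return scores / scores.sum()                       # Bayes' theorem

rng = np.random.default_rng(1)
X0 = rng.normal([0, 0], 1.0, size=(200, 2))            # class 0 sample
X1 = rng.normal([2, 1], 1.0, size=(200, 2))            # class 1 sample
print(naive_bayes_posterior([1.8, 0.9], [X0, X1], [0.5, 0.5], lam=0.5))
```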
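Back to the local likelihood section: the Newton-step sketch referenced there, for two-class local logistic regression at a single target point, assuming Python/NumPy. The Gaussian kernel weights and simulated labels are illustrative assumptions, not the slides' prescription:

```python
import numpy as np

def local_logistic(x0, x, y, lam, n_iter=20):
    """Maximize the kernel-weighted log-likelihood at x0 by Newton steps;
    with a centered design, the fitted intercept gives Pr(G=1 | X=x0)."""
    w = np.exp(-0.5 * ((x - x0) / lam) ** 2)        # Gaussian kernel weights
    B = np.column_stack([np.ones_like(x), x - x0])  # centered design (x_i - x0)
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-B @ beta))
        g = B.T @ (w * (y - p))                     # weighted score
        H = B.T @ (B * (w * p * (1 - p))[:, None])  # weighted Hessian
        beta += np.linalg.solve(H, g)
    return 1.0 / (1.0 + np.exp(-beta[0]))           # posterior at x0

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 300)
y = (rng.uniform(size=300) < 1 / (1 + np.exp(-(4 * x - 2)))).astype(float)
print(local_logistic(0.5, x, y, lam=0.15))          # approximately 0.5
```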
Radial Basis Functions & Kernels
• Radial basis functions combine the local behavior of kernel methods with the flexibility of basis expansions:
  $f(x) = \sum_{j=1}^{M} K_{\lambda_j}(\xi_j, x)\, \beta_j = \sum_{j=1}^{M} D\!\left( \dfrac{\|x - \xi_j\|}{\lambda_j} \right) \beta_j$
  – Each basis element is indexed by a location or prototype parameter $\xi_j$ and a scale parameter $\lambda_j$.
  – A popular choice for $D$ is the standard Gaussian density function.

Radial Basis Functions & Kernels
• For simplicity, focus on least squares methods for regression, and use the Gaussian kernel.
• The RBF network model:
  $\min_{\{\lambda_j, \xi_j, \beta_j\}_{1}^{M}} \sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{M} \beta_j \exp\!\left\{ -\dfrac{(x_i - \xi_j)^T (x_i - \xi_j)}{2\lambda_j^2} \right\} \right)^2$
• A common simplification is to estimate the $\{\lambda_j, \xi_j\}$ separately from the $\{\beta_j\}$ (a least-squares sketch appears at the end of this deck).
• An undesirable side effect of fixed widths is the creation of holes: regions of $\mathbb{R}^p$ where none of the kernels has appreciable support.

Radial Basis Functions & Kernels
• Renormalized radial basis functions:
  $h_j(x) = \dfrac{D(\|x - \xi_j\| / \lambda)}{\sum_{k=1}^{M} D(\|x - \xi_k\| / \lambda)}$
• The Nadaraya-Watson kernel regression estimator can be viewed as an expansion in renormalized radial basis functions:
  $\hat f(x_0) = \sum_{i=1}^{N} y_i \dfrac{K_\lambda(x_0, x_i)}{\sum_{j=1}^{N} K_\lambda(x_0, x_j)} = \sum_{i=1}^{N} y_i\, h_i(x_0)$
• Gaussian radial basis functions with fixed width can leave holes; renormalized Gaussian radial basis functions produce basis functions similar in some respects to B-splines.

Mixture Models & EM
• Gaussian mixture model:
  $f(x) = \sum_{m=1}^{M} \alpha_m\, \phi(x; \mu_m, \Sigma_m)$
  – The $\alpha_m$ are mixture proportions, with $\sum_{m=1}^{M} \alpha_m = 1$.
• EM algorithm for a two-component mixture:
  – Given $x_1, x_2, \dots, x_N$, the observed-data log-likelihood
    $l(y, \theta) = \sum_{i=1}^{N} \log\left[ \pi \phi_1(x_i) + (1 - \pi) \phi_2(x_i) \right]$
    is hard to maximize directly, since the sum sits inside the logarithm ("bad").
  – Suppose instead we observe latent binary $z_i$ such that $z_i = 1 \Rightarrow x_i \sim \phi_1$ and $z_i = 0 \Rightarrow x_i \sim \phi_2$. The complete-data log-likelihood
    $L(x, z, \theta) = \sum_{i:\, z_i = 1} \log\left[ \pi \phi_1(x_i) \right] + \sum_{i:\, z_i = 0} \log\left[ (1 - \pi) \phi_2(x_i) \right]$
    separates the two components ("good").

Mixture Models & EM
• Given $\theta^0$, compute
  $Q(\tilde\theta, \theta^0) = E\left[ L(x, z, \tilde\theta) \mid \theta^0, y \right]$ (E-step), then maximize $Q(\tilde\theta, \theta^0)$ over $\tilde\theta$ (M-step).
• In the two-component example, the responsibilities are
  $w_i = E(z_i \mid x_i, \hat\theta) = \dfrac{\hat\pi \hat\phi_1(x_i)}{\hat\pi \hat\phi_1(x_i) + (1 - \hat\pi) \hat\phi_2(x_i)}$
  and the expected complete-data log-likelihood becomes
  $Q(\theta) = \sum_{i=1}^{N} \left\{ w_i \log\left[ \pi \phi_1(x_i) \right] + (1 - w_i) \log\left[ (1 - \pi) \phi_2(x_i) \right] \right\}$

Mixture Models & EM
• Figure: application of mixtures to the heart disease risk factor study.

Mixture Models & EM
• Figure: mixture model used for classification of the simulated data.
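Returning to the RBF network slide, here is the least-squares sketch referenced there, assuming Python/NumPy: prototypes $\xi_j$ fixed by a simple unsupervised choice (quantiles of the inputs, an illustrative assumption, not the slides' prescription), a common fixed width $\lambda$, and the $\beta_j$ then fit by ordinary least squares:

```python
import numpy as np

def gaussian_basis(x, centers, lam):
    """Columns exp(-(x - xi_j)^2 / (2 lam^2)) per prototype, plus intercept."""
    Phi = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / lam) ** 2)
    return np.column_stack([np.ones_like(x), Phi])

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(8 * x) + rng.normal(0, 0.2, 200)

centers = np.quantile(x, np.linspace(0.05, 0.95, 10))   # prototypes xi_j
Phi = gaussian_basis(x, centers, lam=0.1)
beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)          # least squares for beta_j
y_hat = gaussian_basis(np.array([0.5]), centers, 0.1) @ beta
print(y_hat)
```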
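Finally, a minimal sketch of the EM iteration above for a two-component univariate Gaussian mixture, assuming Python/NumPy; the initialization and the fixed iteration count are illustrative choices:

```python
import numpy as np

def phi(x, mu, sigma):
    """Univariate Gaussian density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def em_two_gaussians(x, n_iter=50):
    pi, mu1, mu2, s1, s2 = 0.5, x.min(), x.max(), x.std(), x.std()
    for _ in range(n_iter):
        # E-step: responsibilities w_i = E(z_i | x_i, theta)
        w = pi * phi(x, mu1, s1) / (
            pi * phi(x, mu1, s1) + (1 - pi) * phi(x, mu2, s2))
        # M-step: weighted means/variances and pi maximize Q(theta)
        mu1, mu2 = np.sum(w * x) / w.sum(), np.sum((1 - w) * x) / (1 - w).sum()
        s1 = np.sqrt(np.sum(w * (x - mu1) ** 2) / w.sum())
        s2 = np.sqrt(np.sum((1 - w) * (x - mu2) ** 2) / (1 - w).sum())
        pi = w.mean()
    return pi, (mu1, s1), (mu2, s2)

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 200)])
print(em_two_gaussians(x))   # recovers roughly pi=0.6, (-2, 1), (3, 0.5)
```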