RECITATION 3
APRIL 30
Spline and Kernel Methods
Gaussian Processes

Penalized Cubic Regression Splines
• gam() in library "mgcv"
• gam(y ~ s(x, bs="cr", k=n.knots), knots=list(x=c(…)), data=dataset)
• By default, the smoothing parameter is selected by GCV (generalized cross-validation)
• R Demo 1

Kernel Method
• Nadaraya-Watson locally constant model
• Locally linear polynomial model
• How to define "local"?
• By a kernel function, e.g. the Gaussian kernel
• R Demo 1
• R package: "locfit"
• Function: locfit(y~x, kern="gauss", deg= , alpha= )
• Bandwidth selected by GCV: gcvplot(y~x, kern="gauss", deg= , alpha= bandwidth range)

Gaussian Processes
• Distribution on functions
• f ~ GP(m, κ)
• m: mean function
• κ: covariance function
• (f(x1), ..., f(xn)) ~ N_n(μ, K)
• μ = [m(x1), ..., m(xn)]
• K_ij = κ(xi, xj)
• Idea: if xi and xj are similar according to the kernel, then f(xi) is similar to f(xj)

Gaussian Processes – Noise-free observations
• Example task:
• Learn a function f(x) to estimate y from data (x, y)
• A function can be viewed as an infinite-dimensional random variable
• A GP provides a distribution over functions
• Model: f ~ GP(m, κ), so f and f* are jointly Gaussian
• (x, f) are the observed locations and values (training data)
• (x*, f*) are the test or prediction locations and values
• After observing some noise-free data (x, f), the posterior for f* is obtained by conditioning the joint Gaussian on f
• Length-scale
• R Demo 2

Gaussian Processes – Noisy observations (GP for Regression)
• Model: the observations y are noisy versions of f(x)
• (x, y) are the observed locations and values (training data)
• (x*, f*) are the test or prediction locations and values
• After observing some noisy data (x, y), the predictive distribution for f* is obtained by conditioning the joint Gaussian on y
• R Demo 3

Reference
• Carl Edward Rasmussen and Christopher K. I. Williams, Gaussian Processes for Machine Learning, Chapter 2
• 527 lecture notes by Emily Fox
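
Worked equations for the two Gaussian process cases
The noise-free and noisy cases above both reduce to conditioning a joint Gaussian. As a sketch, the standard results from Chapter 2 of Rasmussen and Williams can be written in the notation of these slides as follows. The symbols K(a, b) (the matrix with entries κ(a_i, b_j)), σ_n² (noise variance), σ_f² (signal variance) and ℓ (length-scale) are assumptions introduced here for illustration, and the squared-exponential kernel is only one common choice of κ.

```latex
\begin{align}
% Joint prior over training values f and test values f_*
\begin{bmatrix} f \\ f_* \end{bmatrix}
  &\sim N\!\left(
    \begin{bmatrix} m(x) \\ m(x_*) \end{bmatrix},
    \begin{bmatrix} K(x,x) & K(x,x_*) \\ K(x_*,x) & K(x_*,x_*) \end{bmatrix}
  \right) \\[4pt]
% Noise-free observations: condition on f
f_* \mid x_*, x, f
  &\sim N\!\Big( m(x_*) + K(x_*,x)\,K(x,x)^{-1}\big(f - m(x)\big), \nonumber \\
  &\qquad\;\; K(x_*,x_*) - K(x_*,x)\,K(x,x)^{-1}K(x,x_*) \Big) \\[4pt]
% Noisy observations: y = f(x) + eps, eps ~ N(0, sigma_n^2 I)
f_* \mid x_*, x, y
  &\sim N\!\Big( m(x_*) + K(x_*,x)\,\big[K(x,x) + \sigma_n^2 I\big]^{-1}\big(y - m(x)\big), \nonumber \\
  &\qquad\;\; K(x_*,x_*) - K(x_*,x)\,\big[K(x,x) + \sigma_n^2 I\big]^{-1}K(x,x_*) \Big) \\[4pt]
% One common covariance function with length-scale ell (assumed example)
\kappa(x_i, x_j) &= \sigma_f^2 \exp\!\left( -\frac{(x_i - x_j)^2}{2\ell^2} \right)
\end{align}
```

Setting σ_n² = 0 in the noisy case recovers the noise-free posterior. Under the squared-exponential kernel, a larger ℓ makes distant inputs more strongly correlated and so gives smoother sampled functions, while a smaller ℓ lets the function vary more quickly.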