RECITATION 2, APRIL 28
Spline and Kernel Methods, Gaussian Processes, Mixture Modeling for Density Estimation

Penalized Cubic Regression Splines
• gam() in the library "mgcv"
• gam(y ~ s(x, bs="cr", k=n.knots), knots=list(x=c(…)), data=dataset)
• By default, the smoothing parameter is selected by GCV
• R Demo 1 (see the mgcv sketch at the end of this handout)

Kernel Methods
• Nadaraya-Watson: a locally constant model
• Local polynomial regression: e.g., a locally linear model
• How do we define "local"? Through a kernel function, e.g., the Gaussian kernel
• R Demo 1 (see the locfit sketch at the end)
• R package: "locfit"
• Function: locfit(y ~ x, kern="gauss", deg= , alpha= )
• Bandwidth selected by GCV: gcvplot(y ~ x, kern="gauss", deg= , alpha = vector of candidate bandwidths)

Gaussian Processes
• A distribution over functions
• f ~ GP(m, κ)
• m: mean function
• κ: covariance function
• Any finite set of function values is jointly Gaussian: (f(x_1), …, f(x_n)) ~ N_n(μ, K)
• μ_i = m(x_i)
• K_ij = κ(x_i, x_j)
• Idea: if x_i and x_j are similar according to the kernel, then f(x_i) is similar to f(x_j)

Gaussian Processes – Noise-free observations
• Example task: learn a function f(x) to estimate y from data (x, y)
• A function can be viewed as a random variable of infinite dimension; a GP provides a distribution over functions
• Model: (x, f) are the observed locations and values (training data); (x*, f*) are the test or prediction locations and values
• After observing some noise-free data (x, f), the predictive distribution is Gaussian:
  f* | x, x*, f ~ N( K(x*, x) K(x, x)^{-1} f,  K(x*, x*) − K(x*, x) K(x, x)^{-1} K(x, x*) )
• The kernel's length-scale controls how quickly correlation decays with the distance between inputs
• R Demo 2 (see the GP sketch at the end)

Gaussian Processes – Noisy observations (GP for Regression)
• Model: y = f(x) + ε, with independent noise ε ~ N(0, σ_n²)
• (x, y) are the observed locations and values (training data); (x*, f*) are the test or prediction locations and values
• After observing some noisy data (x, y), replace K(x, x) with K(x, x) + σ_n² I in the formulas above:
  f* | x, x*, y ~ N( K(x*, x) [K(x, x) + σ_n² I]^{-1} y,  K(x*, x*) − K(x*, x) [K(x, x) + σ_n² I]^{-1} K(x, x*) )
• R Demo 3

Reference
• Chapter 2 of Rasmussen, C. E. and Williams, C. K. I., Gaussian Processes for Machine Learning
• 527 lecture notes by Emily Fox

Mixture Models – Density Estimation
• EM algorithm vs. Bayesian Markov chain Monte Carlo (MCMC)
• Remember:
• The EM algorithm is an iterative algorithm that MAXIMIZES THE LIKELIHOOD
• MCMC DRAWS SAMPLES FROM THE POSTERIOR (proportional to likelihood × prior)

EM algorithm
• An iterative procedure that attempts to maximize the log-likelihood → MLE estimates of the mixture model parameters
• I.e., one final density estimate (see the EM sketch at the end)

Bayesian Mixture Modeling (MCMC)
• Uses an iterative procedure to DRAW SAMPLES from the posterior (the draws can then be averaged, etc.)
• You don't need to understand the fine details, but know that at every iteration you get a set of parameter estimates drawn from your posterior distribution.
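
Sketch for R Demo 1 (splines). A minimal example of fitting a penalized cubic regression spline with mgcv::gam, as on the slide above; the simulated data set, the value of k, and the plotting calls are illustrative assumptions, not the recitation's actual demo.

```r
library(mgcv)

# Simulated data (placeholder for the recitation's data set)
set.seed(1)
x   <- seq(0, 1, length.out = 200)
y   <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
dat <- data.frame(x = x, y = y)

# Penalized cubic regression spline ("cr" basis, k basis functions);
# by default gam() selects the smoothing parameter by GCV
fit <- gam(y ~ s(x, bs = "cr", k = 10), data = dat)

plot(fit)      # estimated smooth with a confidence band
summary(fit)   # effective degrees of freedom, GCV score, etc.
```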
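
Sketch for R Demo 1 (kernel regression). A minimal locfit example using the argument style shown on the slide (newer locfit versions prefer the lp() interface, but these arguments are still accepted); the alpha grid is an illustrative guess.

```r
library(locfit)

# Locally linear fit with a Gaussian kernel; alpha is the
# nearest-neighbor bandwidth fraction
fit <- locfit(y ~ x, data = dat, kern = "gauss", deg = 1, alpha = 0.5)
plot(fit, get.data = TRUE)

# GCV over a grid of candidate bandwidths; pick the alpha
# that minimizes the GCV score
alphas  <- seq(0.2, 0.9, by = 0.05)
gcv.out <- gcvplot(y ~ x, data = dat, kern = "gauss", deg = 1, alpha = alphas)
alphas[which.min(gcv.out$values)]
```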
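
Sketch for R Demos 2 and 3 (Gaussian processes). A from-scratch implementation of the noisy-observation conditioning formula above, with a zero mean function; setting sigma_n = 0 recovers the noise-free case. The squared-exponential kernel and all hyperparameter values are assumptions for illustration, and the example reuses the simulated dat from the spline sketch.

```r
# Squared-exponential kernel; ell is the length-scale
sqexp <- function(a, b, ell = 0.3, sigma_f = 1) {
  sigma_f^2 * exp(-outer(a, b, "-")^2 / (2 * ell^2))
}

# Posterior mean and covariance of f* given training data (x, y)
# and test inputs xstar; sigma_n = 0 gives noise-free interpolation
gp_predict <- function(x, y, xstar, ell = 0.3, sigma_f = 1, sigma_n = 0.1) {
  Kxx <- sqexp(x, x, ell, sigma_f) + sigma_n^2 * diag(length(x))
  Ksx <- sqexp(xstar, x, ell, sigma_f)
  Kss <- sqexp(xstar, xstar, ell, sigma_f)
  list(mean = as.vector(Ksx %*% solve(Kxx, y)),        # K*x (Kxx)^-1 y
       cov  = Kss - Ksx %*% solve(Kxx, t(Ksx)))        # K** - K*x (Kxx)^-1 Kx*
}

xstar <- seq(0, 1, length.out = 100)
pred  <- gp_predict(dat$x, dat$y, xstar)

plot(dat$x, dat$y, col = "grey")
lines(xstar, pred$mean, lwd = 2)                        # posterior mean
lines(xstar, pred$mean + 2 * sqrt(diag(pred$cov)), lty = 2)
lines(xstar, pred$mean - 2 * sqrt(diag(pred$cov)), lty = 2)
```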
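
Sketch for the EM vs. MCMC contrast. A bare-bones EM for a two-component univariate Gaussian mixture, written out so the E- and M-steps are visible; the starting values, iteration count, and toy data are arbitrary choices. An MCMC approach would instead draw the parameters from their posterior at each iteration (e.g., with a Gibbs sampler) rather than converging to one final estimate.

```r
# Minimal EM for a two-component Gaussian mixture density estimate
em_gmm2 <- function(y, n_iter = 200) {
  p  <- 0.5                                     # mixing weight of component 1
  mu <- as.numeric(quantile(y, c(0.25, 0.75)))  # crude starting means
  s  <- rep(sd(y), 2)
  for (it in seq_len(n_iter)) {
    # E-step: responsibility of component 1 for each observation
    d1 <- p * dnorm(y, mu[1], s[1])
    d2 <- (1 - p) * dnorm(y, mu[2], s[2])
    r  <- d1 / (d1 + d2)
    # M-step: weighted MLE updates of the mixture parameters
    p  <- mean(r)
    mu <- c(sum(r * y) / sum(r), sum((1 - r) * y) / sum(1 - r))
    s  <- c(sqrt(sum(r * (y - mu[1])^2) / sum(r)),
            sqrt(sum((1 - r) * (y - mu[2])^2) / sum(1 - r)))
  }
  list(p = c(p, 1 - p), mu = mu, sigma = s)
}

y   <- c(rnorm(150, -2, 0.7), rnorm(100, 1.5, 1))  # toy bimodal data
fit <- em_gmm2(y)
fit$mu   # one final set of parameter estimates (contrast with MCMC draws)
```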