Basis Expansion and Regularization
Presenters: Hongliang Fei, Brian Quanz
Date: July 03, 2008

Contents
- Introduction
- Piecewise Polynomials and Splines
- Filtering and Feature Extraction
- Smoothing Splines
- Automatic Selection of the Smoothing Parameters

1. Introduction

Basis: in linear algebra, a basis is a set of vectors satisfying:
- every vector in the given vector space can be represented as a linear combination of the basis vectors;
- no element of the set can be represented as a linear combination of the others.

In a function space, the notion of a basis carries over to a set of basis functions: each function in the function space can be represented as a linear combination of the basis functions. Example: the quadratic polynomial basis $\{1, t, t^2\}$.

What is basis expansion?
Given data $X$ and transformations $h_m(X): \mathbb{R}^p \to \mathbb{R}$, $m = 1, \dots, M$, we model

$f(X) = \sum_{m=1}^{M} \beta_m h_m(X)$,

a linear basis expansion in $X$, where each $h_m(X)$ is a basis function.

Why basis expansion?
- In regression problems, $f(X)$ is typically nonlinear in $X$;
- a linear model is convenient and easy to interpret;
- when the sample size is very small but the number of attributes is very large, a linear model is about all we can fit without overfitting.

2. Piecewise Polynomials and Splines

Spline: in mathematics, a spline is a special function defined piecewise by polynomials; in computer science, the term more frequently refers to a piecewise polynomial (parametric) curve. Splines are popular for their simple construction, their ease and accuracy of evaluation, and their capacity to approximate complex shapes through curve fitting and interactive curve design.

Example of a spline: http://en.wikipedia.org/wiki/Image:BezierInterpolation.gif

Assume a spline with four knots (two boundary knots and two interior knots $\xi_1 < \xi_2$), with $X$ one-dimensional. (A code sketch of the basis constructions in this section appears just before the smoothing-spline criterion in Section 4.)

Piecewise constant basis:
$h_1(X) = I(X < \xi_1)$, $h_2(X) = I(\xi_1 \le X < \xi_2)$, $h_3(X) = I(\xi_2 \le X)$.

Piecewise linear basis: augment the piecewise constant basis with $h_{m+3}(X) = h_m(X)\,X$; imposing continuity at the knots reduces this to
$h_1(X) = 1$, $h_2(X) = X$, $h_3(X) = (X - \xi_1)_+$, $h_4(X) = (X - \xi_2)_+$.

Piecewise cubic polynomial basis functions:
$h_1(X) = 1$, $h_2(X) = X$, $h_3(X) = X^2$, $h_4(X) = X^3$, $h_5(X) = (X - \xi_1)_+^3$, $h_6(X) = (X - \xi_2)_+^3$:
six functions corresponding to a six-dimensional linear space.

An order-M spline with knots $\xi_j$, $j = 1, \dots, K$, has continuous derivatives up to order M-2. The general form of the truncated-power basis set is
$h_j(X) = X^{j-1}$, $j = 1, \dots, M$, and $h_{M+l}(X) = (X - \xi_l)_+^{M-1}$, $l = 1, \dots, K$.

Natural cubic spline
A natural cubic spline adds additional constraints: the function is required to be linear beyond the boundary knots. A natural cubic spline with K knots is represented by K basis functions. One can start from a basis for cubic splines and derive the reduced basis by imposing the boundary constraints.

Example of a natural cubic spline basis: starting from the truncated power series basis, we arrive at
$N_1(X) = 1$, $N_2(X) = X$, $N_{k+2}(X) = d_k(X) - d_{K-1}(X)$,
where
$d_k(X) = \dfrac{(X - \xi_k)_+^3 - (X - \xi_K)_+^3}{\xi_K - \xi_k}$.

An example application (phoneme recognition)
- Data: 1000 samples drawn from 695 "aa"s and 1022 "ao"s, each with a feature vector of length 256.
- Goal: use these data to classify the spoken phoneme.
- The coefficients can be plotted as a function of frequency.
- Fitting via maximum likelihood alone, the coefficient curve is very rough.
- Fitting through natural cubic splines: rewrite the coefficient function as an expansion in splines, $\beta(f) = \sum_{m=1}^{M} h_m(f)\,\theta_m$, that is, $\beta = H\theta$, where $H$ is a p-by-M basis matrix of natural cubic splines. Since $x^T\beta = x^T H\theta = (H^T x)^T\theta$, we replace the input features $x$ by the filtered version $x^* = H^T x$ and fit $\theta$ via linear logistic regression on $x^*$.

[Figure: final result of the phoneme fits.]

3. Filtering and Feature Extraction
Preprocessing high-dimensional features is a powerful method for improving the performance of a learning algorithm. The previous example used a filtering approach, $x^* = H^T x$, to transform the features. Such transformations need not be linear; they can take the general form $x^* = g(x)$. Another example is the wavelet transform; see Section 5.9.

4. Smoothing Splines
Purpose: avoid the complexity of the knot selection problem by using a maximal set of knots, with complexity controlled via regularization.
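Before developing the smoothing-spline criterion, here is a minimal code sketch of the basis constructions from Section 2, written in Python/NumPy. The function names, knot values, and toy grid are illustrative choices, not from the slides: `truncated_power_basis` builds the general order-M truncated-power set, and `natural_cubic_basis` builds the K natural-spline functions $N_1, N_2, N_{k+2} = d_k - d_{K-1}$ that also serve as the smoothing-spline basis below.

```python
import numpy as np

def truncated_power_basis(x, knots, order=4):
    """Truncated-power basis for an order-M spline (cubic: M = 4).

    Columns: X^{j-1} for j = 1..M, then (X - xi_l)_+^{M-1} per knot,
    giving M + K basis functions for K knots.
    """
    x = np.asarray(x, dtype=float)
    cols = [x**j for j in range(order)]                    # global polynomial part
    cols += [np.maximum(x - xi, 0.0)**(order - 1) for xi in knots]
    return np.column_stack(cols)

def natural_cubic_basis(x, knots):
    """Natural cubic spline basis: K basis functions for K knots."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    K = len(knots)
    def d(k):  # d_k(X) = [(X - xi_k)_+^3 - (X - xi_K)_+^3] / (xi_K - xi_k)
        return (np.maximum(x - knots[k], 0.0)**3
                - np.maximum(x - knots[-1], 0.0)**3) / (knots[-1] - knots[k])
    cols = [np.ones_like(x), x]                            # N_1 = 1, N_2 = X
    cols += [d(k) - d(K - 2) for k in range(K - 2)]        # N_{k+2} = d_k - d_{K-1}
    return np.column_stack(cols)

# Toy check: a cubic spline with two interior knots spans a six-dimensional
# space, while a natural cubic spline with the same four knots uses only four.
x = np.linspace(0.0, 1.0, 50)
print(truncated_power_basis(x, [1/3, 2/3]).shape)          # (50, 6)
print(natural_cubic_basis(x, [0.0, 1/3, 2/3, 1.0]).shape)  # (50, 4)
```

The column counts match the dimension arguments above: the boundary-linearity constraints remove two basis functions per boundary, taking the cubic spline's K + 4 functions down to K.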
Consider the following problem: among all functions $f$ with two continuous derivatives, minimize the penalized residual sum of squares

$\mathrm{RSS}(f, \lambda) = \sum_{i=1}^{N} (y_i - f(x_i))^2 + \lambda \int f''(t)^2\,dt$.

Though this criterion is defined on an infinite-dimensional function space, it has an explicit, finite-dimensional, unique minimizer: a natural cubic spline with knots at the unique values of the $x_i$, $i = 1, \dots, N$. The penalty term translates to a penalty on the spline coefficients.

Rewrite the solution as $f(x) = \sum_{j=1}^{N} N_j(x)\,\theta_j$, where the $N_j(x)$ are an N-dimensional set of basis functions representing the family of natural splines. The criterion in matrix form is

$\mathrm{RSS}(\theta, \lambda) = (y - N\theta)^T (y - N\theta) + \lambda\,\theta^T \Omega_N \theta$,

where $\{N\}_{ij} = N_j(x_i)$ and $\{\Omega_N\}_{jk} = \int N_j''(t)\,N_k''(t)\,dt$. By the ridge regression result, the solution is

$\hat{\theta} = (N^T N + \lambda\,\Omega_N)^{-1} N^T y$,

and the fitted smoothing spline is given by $\hat{f}(x) = \sum_{j=1}^{N} N_j(x)\,\hat{\theta}_j$.

[Figure: example of a smoothing spline fit.]

Degrees of freedom and the smoother matrix
A smoothing spline with prechosen $\lambda$ is a linear operator. Let $\hat{f}$ be the N-vector of fitted values $\hat{f}(x_i)$ at the training predictors $x_i$:

$\hat{f} = N(N^T N + \lambda\,\Omega_N)^{-1} N^T y = S_\lambda y$.

Here $S_\lambda$ is called the smoother matrix; it depends only on $\lambda$ and the $x_i$. Now suppose $B$ is an N-by-M matrix of M cubic spline basis functions evaluated at the N training points $x_i$, with knot sequence $\xi$ (typically $M \ll N$). The fitted spline values are given by

$\hat{f} = B(B^T B)^{-1} B^T y = H_\xi y$.

Here the linear operator $H_\xi$ is a projection operator, known in statistics as the hat matrix.

Similarities and differences between $S_\lambda$ and $H_\xi$:
- Both are symmetric and positive semidefinite.
- $H_\xi$ is idempotent ($H_\xi H_\xi = H_\xi$), whereas $S_\lambda S_\lambda \preceq S_\lambda$: $S_\lambda$ shrinks.
- $\mathrm{rank}(S_\lambda) = N$, while $\mathrm{rank}(H_\xi) = M$.

The trace of $H_\xi$ gives the dimension of the projection space (the number of basis functions). By analogy, define the effective degrees of freedom as

$df_\lambda = \mathrm{trace}(S_\lambda)$.

By specifying $df_\lambda$, we can derive $\lambda$ (by solving $\mathrm{trace}(S_\lambda) = df$, typically numerically). Since $S_\lambda$ is symmetric and positive semidefinite, it can be rewritten in the Reinsch form $S_\lambda = (I + \lambda K)^{-1}$, so that $\hat{f} = S_\lambda y$ is the solution of

$\min_f\; (y - f)^T (y - f) + \lambda\,f^T K f$.

$K$ is known as the penalty matrix. The eigendecomposition of $S_\lambda$ is given by

$S_\lambda = \sum_{k=1}^{N} \rho_k(\lambda)\,u_k u_k^T$, with $\rho_k(\lambda) = \dfrac{1}{1 + \lambda d_k}$,

where $d_k$ and $u_k$ are the eigenvalues and eigenvectors of $K$.

Highlights of the eigendecomposition:
- The eigenvectors $u_k$ are not affected by changes in $\lambda$.
- Shrinking nature: $S_\lambda y$ decomposes $y$ in the basis $\{u_k\}$ and differentially shrinks each contribution by the factor $\rho_k(\lambda)$.
- The eigenvector sequence, ordered by decreasing $\rho_k(\lambda)$, appears to increase in complexity.
- The first two eigenvalues are always 1, since $d_1 = d_2 = 0$: linear functions are not penalized.

[Figure: cubic smoothing spline fits to some data.]

5. Automatic Selection of the Smoothing Parameters
Selecting the placement and number of knots for regression splines can be a combinatorially complex task; for smoothing splines, only the penalty $\lambda$ must be selected. One method is to fix the degrees of freedom and solve for $\lambda$ from $df_\lambda = \mathrm{trace}(S_\lambda)$. The criterion for choosing among fits is the bias-variance tradeoff.

The bias-variance tradeoff
The integrated squared prediction error (EPE) combines both bias and variance:

$\mathrm{EPE}(\hat{f}_\lambda) = \mathrm{E}\,(Y - \hat{f}_\lambda(X))^2 = \sigma^2 + \mathrm{E}\big[\mathrm{Bias}^2(\hat{f}_\lambda(X)) + \mathrm{Var}(\hat{f}_\lambda(X))\big]$.

Cross-validation estimates EPE from the training data alone:

$\mathrm{CV}(\hat{f}_\lambda) = \frac{1}{N}\sum_{i=1}^{N}\big(y_i - \hat{f}_\lambda^{(-i)}(x_i)\big)^2 = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{y_i - \hat{f}_\lambda(x_i)}{1 - S_\lambda(i,i)}\right)^2$.

[Figure: EPE and CV curves, and fitting effects, for different degrees of freedom.]

A runnable numerical sketch of $S_\lambda$, $df_\lambda$, and the CV shortcut follows as a backup slide.

Any questions?
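Backup: the smoother-matrix quantities above lend themselves to a short numerical illustration. The sketch below is a stand-in under stated assumptions, not the slides' own code: it is written in Python/NumPy, and it replaces the exact natural-spline penalty $\Omega_N$ with a discrete second-difference penalty $K = D^T D$ on an even grid, which keeps the properties discussed above ($d_1 = d_2 = 0$, so linear trends pass through unpenalized). It forms the Reinsch-form smoother $S_\lambda = (I + \lambda K)^{-1}$, reports $df_\lambda = \mathrm{trace}(S_\lambda)$, and computes leave-one-out CV via the $1 - S_\lambda(i,i)$ shortcut.

```python
import numpy as np

def reinsch_smoother(n, lam):
    """S_lam = (I + lam * K)^{-1} with a discrete second-difference penalty K.

    K = D'D, where each row of D applies (1, -2, 1); a stand-in for the
    natural-spline penalty matrix. Constant and linear sequences lie in the
    nullspace of D, so d_1 = d_2 = 0 and linear fits are not penalized.
    """
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    K = D.T @ D
    return np.linalg.inv(np.eye(n) + lam * K)

def fit_and_score(y, lam):
    """Fitted values, effective df, and the LOO-CV shortcut for one lambda."""
    S = reinsch_smoother(len(y), lam)
    f_hat = S @ y
    df = np.trace(S)                          # effective degrees of freedom
    loo = (y - f_hat) / (1.0 - np.diag(S))    # (y_i - f(x_i)) / (1 - S_ii)
    return f_hat, df, np.mean(loo**2)

# Toy data on an even grid; sweep lambda and watch df shrink as CV changes.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, x.size)
for lam in (0.1, 1.0, 10.0, 100.0):
    _, df, cv = fit_and_score(y, lam)
    print(f"lambda={lam:6.1f}  df={df:6.2f}  CV={cv:.4f}")
```

As $\lambda$ grows, $df_\lambda$ falls toward 2 (the unpenalized linear fit), mirroring the two unit eigenvalues of $S_\lambda$, and the CV values across $\lambda$ trace the bias-variance tradeoff used for automatic smoothing-parameter selection.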