Basis Expansion and
Regularization
Presenter: Hongliang Fei
Brian Quanz
Date:
July 03, 2008
Contents

1. Introduction
2. Piecewise Polynomials and Splines
3. Filtering and Feature Extraction
4. Smoothing Splines
5. Automatic Selection of the Smoothing Parameters
1. Introduction



Basis: in linear algebra, a basis is a set of vectors satisfying:
- A linear combination of the basis vectors can represent every vector in the given vector space;
- No element of the set can be represented as a linear combination of the others.



- In a function space, the basis becomes a set of basis functions;
- Each function in the function space can be represented as a linear combination of the basis functions.
- Example: the quadratic polynomial basis {1, t, t^2}.
What is Basis Expansion?

Given data X and transformations $h_m(X): \mathbb{R}^p \to \mathbb{R}$, $m = 1, \dots, M$, we model

  $f(X) = \sum_{m=1}^{M} \beta_m h_m(X)$,

a linear basis expansion in X, where each $h_m(X)$ is a basis function.
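A minimal sketch of such a basis expansion in Python (NumPy), using the quadratic polynomial basis {1, t, t^2} from the previous slide and fitting the coefficients by least squares; the data here are synthetic and purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0.0, 1.0, size=50))
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

    # h_1(x) = 1, h_2(x) = x, h_3(x) = x^2  ->  columns of the basis matrix
    H = np.column_stack([np.ones_like(x), x, x**2])

    # f(x) = sum_m beta_m h_m(x); beta estimated by ordinary least squares
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    f_hat = H @ beta
    print("estimated coefficients:", beta)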
Why Basis Expansion?



- In regression problems, f(X) will typically be nonlinear in X;
- A linear model is convenient and easy to interpret;
- When the sample size is small but the number of attributes is large, a linear model may be all we can fit without overfitting.
2. Piecewise Polynomials and Splines




Spline:
- In mathematics, a spline is a special function defined piecewise by polynomials;
- In computer science, the term spline more frequently refers to a piecewise polynomial (parametric) curve.
- Splines are popular for their simple construction, ease and accuracy of evaluation, and capacity to approximate complex shapes through curve fitting and interactive curve design.
Example of a Spline
http://en.wikipedia.org/wiki/Image:BezierInterpolation.gif

Assume a spline with four knots (two boundary knots and two interior knots $\xi_1, \xi_2$), and X one-dimensional.

Piecewise constant basis:
  $h_1(X) = I(X < \xi_1)$, $h_2(X) = I(\xi_1 \le X < \xi_2)$, $h_3(X) = I(\xi_2 \le X)$.

Piecewise linear basis: add the functions
  $h_{m+3}(X) = h_m(X)\,X$, $m = 1, 2, 3$.

Piecewise cubic polynomial (cubic spline) basis functions:
  $h_1(X) = 1$, $h_2(X) = X$, $h_3(X) = X^2$, $h_4(X) = X^3$, $h_5(X) = (X - \xi_1)_+^3$, $h_6(X) = (X - \xi_2)_+^3$;
six functions corresponding to a six-dimensional linear space.

An order-M spline with knots $\xi_j$, $j = 1, \dots, K$, has continuous derivatives up to order M-2. The general form of the truncated-power basis set is:
  $h_j(X) = X^{j-1}$, $j = 1, \dots, M$,
  $h_{M+l}(X) = (X - \xi_l)_+^{M-1}$, $l = 1, \dots, K$.
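A minimal sketch in Python (NumPy) of the truncated-power basis for a cubic spline (M = 4) with two interior knots, reproducing the six basis functions above; the helper name and knot locations are illustrative choices, not from the slides.

    import numpy as np

    def truncated_power_basis(x, knots, order=4):
        """Columns: x^0, ..., x^(order-1), then (x - xi)_+^(order-1) per knot."""
        x = np.asarray(x, dtype=float)
        cols = [x**j for j in range(order)]
        cols += [np.maximum(x - xi, 0.0) ** (order - 1) for xi in knots]
        return np.column_stack(cols)

    x = np.linspace(0.0, 1.0, 200)
    B = truncated_power_basis(x, knots=[0.33, 0.66])
    print(B.shape)   # (200, 6): a six-dimensional linear space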
Natural Cubic Spline



- A natural cubic spline adds additional constraints: the function is linear beyond the boundary knots.
- A natural cubic spline with K knots is represented by K basis functions.
- One can start from a basis for cubic splines and derive the reduced basis by imposing the boundary constraints.
Example of Natural cubic spline

Starting from the truncated power series basis, we arrive at:
  $N_1(X) = 1$, $N_2(X) = X$, $N_{k+2}(X) = d_k(X) - d_{K-1}(X)$, $k = 1, \dots, K-2$,
where
  $d_k(X) = \dfrac{(X - \xi_k)_+^3 - (X - \xi_K)_+^3}{\xi_K - \xi_k}$.
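A sketch in Python (NumPy) of this reduced basis; the helper name and knot placement are illustrative, and the knots are assumed distinct and sorted.

    import numpy as np

    def natural_cubic_basis(x, knots):
        """N_1 = 1, N_2 = X, N_{k+2} = d_k - d_{K-1}, with d_k as defined above."""
        x = np.asarray(x, dtype=float)
        knots = np.sort(np.asarray(knots, dtype=float))
        K = len(knots)

        def d(k):  # 0-based index; knots[K - 1] is xi_K
            return (np.maximum(x - knots[k], 0.0) ** 3
                    - np.maximum(x - knots[K - 1], 0.0) ** 3) / (knots[K - 1] - knots[k])

        cols = [np.ones_like(x), x]
        cols += [d(k) - d(K - 2) for k in range(K - 2)]   # K basis functions in total
        return np.column_stack(cols)

    x = np.linspace(0.0, 1.0, 200)
    N = natural_cubic_basis(x, knots=np.linspace(0.1, 0.9, 5))
    print(N.shape)   # (200, 5): K knots -> K basis functions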
An example of application (Phoneme Recognition)



- Data: 1000 samples drawn from 695 "aa"s and 1022 "ao"s, each with a feature vector of length 256.
- Goal: use these data to classify the spoken phoneme.
- The coefficients can be plotted as a function of frequency.






- Fitting via maximum likelihood alone, the coefficient curve $\hat\beta(f)$ is very rough.
- Fitting through natural cubic splines: rewrite the coefficient function as an expansion in splines,
  $\beta(f) = \sum_{m=1}^{M} h_m(f)\,\theta_m$, that is, $\beta = H\theta$,
  where H is a p-by-M basis matrix of natural cubic splines.
- Since $x^T \beta = x^T H \theta = (H^T x)^T \theta$, we replace the input features x by the filtered versions $x^* = H^T x$.
- Fit $\theta$ via linear logistic regression on $x^*$ (sketched in code below).
Final result
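A hedged sketch of the filtering step in Python (NumPy and scikit-learn). The data are random stand-ins for the 256-dimensional phoneme features, and a cubic truncated-power basis on the frequency grid stands in for the natural cubic spline basis H; only the mechanics $x^* = H^T x$ and the logistic fit for $\theta$ are the point here.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    p, M, n = 256, 12, 1000                      # feature length, basis size, samples
    X = rng.normal(size=(n, p))                  # placeholder for the real features
    y = rng.integers(0, 2, size=n)               # placeholder "aa"/"ao" labels

    # H: p x M smooth basis matrix on the frequency grid (stand-in for the
    # natural cubic spline basis used in the text)
    freq = np.linspace(0.0, 1.0, p)
    interior = np.linspace(0.1, 0.9, M - 4)
    H = np.column_stack([freq**j for j in range(4)]
                        + [np.maximum(freq - k, 0.0) ** 3 for k in interior])

    X_star = X @ H                               # filtered features x* = H^T x, row-wise
    fit = LogisticRegression(max_iter=1000).fit(X_star, y)
    beta_hat = H @ fit.coef_.ravel()             # smooth coefficient curve beta = H theta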
3. Filtering and Feature Extraction



- Preprocessing high-dimensional features is a powerful method to improve the performance of a learning algorithm.
- The previous example, $x^* = H^T x$, is a filtering approach to transforming the features.
- The transformations need not be linear, but can take the general form $x^* = g(x)$.
- Another example is the wavelet transform; see Section 5.9.
4. Smoothing Splines



- Purpose: avoid the complexity of the knot-selection problem by using a maximal set of knots.
- Complexity is controlled via regularization.
- Consider this problem: among all functions f(x) with two continuous derivatives, minimize the penalized residual sum of squares
  $\mathrm{RSS}(f, \lambda) = \sum_{i=1}^{N} (y_i - f(x_i))^2 + \lambda \int (f''(t))^2\, dt$.


- Although RSS is defined on an infinite-dimensional function space, it has an explicit, finite-dimensional, unique minimizer: a natural cubic spline with knots at the unique values of the $x_i$, $i = 1, \dots, N$.
- The penalty term translates to a penalty on the spline coefficients.

Rewrite the solution as
  $f(x) = \sum_{j=1}^{N} N_j(x)\,\theta_j$,
where the $N_j(x)$ are an N-dimensional set of basis functions representing the family of natural splines.

The criterion in matrix form:
  $\mathrm{RSS}(\theta, \lambda) = (y - N\theta)^T (y - N\theta) + \lambda\,\theta^T \Omega_N \theta$,
where $\{N\}_{ij} = N_j(x_i)$ and $\{\Omega_N\}_{jk} = \int N_j''(t) N_k''(t)\, dt$.

Using the ridge regression result, the solution is
  $\hat\theta = (N^T N + \lambda \Omega_N)^{-1} N^T y$.

The fitted smoothing spline is given by
  $\hat f(x) = \sum_{j=1}^{N} N_j(x)\,\hat\theta_j$.
Example of a smoothing spline
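A minimal sketch in Python (NumPy) of this generalized ridge solution and the implied smoother matrix; the basis matrix N and penalty matrix Omega below are toy stand-ins, not the true natural-spline quantities.

    import numpy as np

    def smoothing_spline_fit(N, Omega, y, lam):
        A = N.T @ N + lam * Omega
        theta_hat = np.linalg.solve(A, N.T @ y)      # (N^T N + lam*Omega)^{-1} N^T y
        S_lam = N @ np.linalg.solve(A, N.T)          # smoother matrix: f_hat = S_lam @ y
        return theta_hat, S_lam

    # toy stand-ins just to show the shapes involved (n basis functions, n points)
    n = 20
    N = np.vander(np.linspace(0.0, 1.0, n), n, increasing=True)
    Omega = np.eye(n)                                # placeholder penalty matrix
    y = np.random.default_rng(0).normal(size=n)
    theta_hat, S_lam = smoothing_spline_fit(N, Omega, y, lam=1.0)
    f_hat = N @ theta_hat                            # equals S_lam @ y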
Degrees of freedom and the smoother matrix


- A smoothing spline with prechosen $\lambda$ is a linear operator (a linear smoother).
- Let $\hat f$ be the N-vector of fitted values $\hat f(x_i)$ at the training predictors $x_i$:
  $\hat f = N (N^T N + \lambda \Omega_N)^{-1} N^T y = S_\lambda\, y$.
- Here $S_\lambda$ is called the smoother matrix. It depends only on $\lambda$ and the $x_i$.

Suppose B is a N by M matrix of M
cubic spline basis functions evaluated
at the N training points xi , with knot
sequence  . The fitted spline value is
given by:
Here linear operator H  is a projection
operator, known as hat matrix in
statistics.
Similarity and difference between S and H




- Both are symmetric, positive semidefinite matrices.
- $H_\xi$ is idempotent ($H_\xi H_\xi = H_\xi$), while $S_\lambda S_\lambda \preceq S_\lambda$ (shrinking).
- Rank($S_\lambda$) = N, Rank($H_\xi$) = M.
- The trace of $H_\xi$ gives the dimension of the projection space, i.e., the number of basis functions.


Define the effective degrees of freedom as
  $df_\lambda = \mathrm{trace}(S_\lambda)$.
By specifying $df_\lambda$, we can derive $\lambda$.

Since $S_\lambda$ is symmetric (and positive semidefinite), it can be rewritten in the Reinsch form
  $S_\lambda = (I + \lambda K)^{-1}$,
and $\hat f = S_\lambda y$ is the solution of
  $\min_f\; (y - f)^T (y - f) + \lambda\, f^T K f$.
K is known as the penalty matrix.
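A sketch in Python (NumPy) of deriving $\lambda$ from a prespecified $df_\lambda = \mathrm{trace}(S_\lambda)$ by bisection on the log scale, using the Reinsch form directly; a discrete second-difference penalty stands in for the true penalty matrix K.

    import numpy as np

    def smoother(lam, K):
        return np.linalg.inv(np.eye(K.shape[0]) + lam * K)   # S_lambda = (I + lam*K)^{-1}

    def lambda_for_df(target_df, K, lo=1e-8, hi=1e8, iters=100):
        for _ in range(iters):                   # df_lambda decreases as lambda grows
            mid = np.sqrt(lo * hi)
            if np.trace(smoother(mid, K)) > target_df:
                lo = mid                         # too many df: need more smoothing
            else:
                hi = mid
        return np.sqrt(lo * hi)

    n = 30
    D = np.diff(np.eye(n), n=2, axis=0)          # second-difference operator
    K = D.T @ D                                  # stand-in penalty matrix
    lam = lambda_for_df(5.0, K)
    print(lam, np.trace(smoother(lam, K)))       # df_lambda should be about 5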

Eigen-decomposition of S is given by:
where
 k ( ) 
1
1   dk
d k , uk are eigen value and eigen vector of K.
Highlights of the eigen-decomposition




- The eigenvectors $u_k$ are not affected by changes in $\lambda$.
- Shrinking nature: $S_\lambda y = \sum_{k=1}^{N} u_k\, \rho_k(\lambda)\, \langle u_k, y \rangle$, so the component of y along each $u_k$ is shrunk by the factor $\rho_k(\lambda) \le 1$.
- The eigenvector sequence, ordered by decreasing $\rho_k(\lambda)$, appears to increase in complexity.
- The first two eigenvalues are always 1, since $d_1 = d_2 = 0$, showing that linear functions are not penalized.
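A sketch in Python (NumPy) illustrating these points. A discrete second-difference penalty $D^T D$ stands in for the true penalty matrix K; like K, it leaves constant and linear sequences unpenalized, so its two smallest eigenvalues are (numerically) zero.

    import numpy as np

    n, lam = 30, 10.0
    D = np.diff(np.eye(n), n=2, axis=0)          # (n-2) x n second-difference operator
    K = D.T @ D                                  # stand-in penalty matrix (symmetric PSD)

    d, U = np.linalg.eigh(K)                     # eigenvalues d_k, eigenvectors u_k of K
    rho = 1.0 / (1.0 + lam * d)                  # shrinking factors rho_k(lambda)

    S_lam = np.linalg.inv(np.eye(n) + lam * K)   # Reinsch form of the smoother
    y = np.random.default_rng(0).normal(size=n)
    # S_lambda shrinks the component of y along each u_k by rho_k(lambda):
    assert np.allclose(S_lam @ y, U @ (rho * (U.T @ y)))
    print(rho[:2])                               # both ~1: d_1 = d_2 = 0, linear part unpenalized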
Figure: cubic smoothing spline fit to some data
5. Automatic Selection of the Smoothing Parameters




- Selecting the placement and number of knots for regression splines can be a combinatorially complex task.
- For smoothing splines, only the penalty $\lambda$ must be selected.
- Method: fix the degrees of freedom and solve for $\lambda$ from $df_\lambda = \mathrm{trace}(S_\lambda)$.
- Criterion: the bias-variance tradeoff.
The Bias-Variance Tradeoff

Integrated squared prediction error (EPE):
  $\mathrm{EPE}(\hat f_\lambda) = \mathrm{E}\,(Y - \hat f_\lambda(X))^2$.

Leave-one-out cross-validation:
  $\mathrm{CV}(\hat f_\lambda) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat f_\lambda^{(-i)}(x_i) \right)^2 = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{y_i - \hat f_\lambda(x_i)}{1 - S_\lambda(i,i)} \right)^2$.
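A sketch in Python (NumPy) of the leave-one-out CV shortcut for a linear smoother, evaluated over a small grid of $\lambda$ values; the smoother again uses the discrete second-difference penalty as a stand-in, and the data are synthetic.

    import numpy as np

    def loocv_score(S, y):
        resid = y - S @ y
        return np.mean((resid / (1.0 - np.diag(S))) ** 2)   # averaged LOO residuals

    n = 50
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, n)
    y = np.sin(4 * np.pi * x) + rng.normal(scale=0.3, size=n)

    D = np.diff(np.eye(n), n=2, axis=0)
    K = D.T @ D                                  # stand-in penalty matrix

    for lam in [1e-3, 1e-2, 1e-1, 1.0, 10.0]:
        S = np.linalg.inv(np.eye(n) + lam * K)   # smoother matrix S_lambda
        print(lam, loocv_score(S, y), np.trace(S))   # CV score and df_lambda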

An example:
Figure: EPE, CV, and the fitted curves for different degrees of freedom
Any questions?