Basis Expansion and Regularization Prof. Liqing Zhang

Basis Expansion and
Regularization
Prof. Liqing Zhang
Dept. Computer Science & Engineering,
Shanghai Jiaotong University
Outline
• Piece-wise Polynomials and Splines
• Wavelet Smoothing
• Smoothing Splines
• Automatic Selection of the Smoothing Parameters
• Nonparametric Logistic Regression
• Multidimensional Splines
• Regularization and Reproducing Kernel Hilbert Spaces
2015/4/13
Basis Expansion and Regularization
2
Piece-wise Polynomials and Splines
• Linear basis expansion
$$f(x) = \sum_{m=1}^{N} \beta_m h_m(x)$$
• Some basis functions that are widely used
$h_m(x) = x$
$h_m(x) = x^p$
$h_m(x) = \log(x)$
$h_m(x) = \sin(mx),\ \cos(mx)$
Regularization
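• Three approaches for controlling the complexity of the model $f(x) = \sum_{k=1}^{m} \beta_k h_k(x)$:
– Restriction
– Selection
– Regularization
• With the model $y = \sum_{k=1}^{m} \beta_k h_k(x) + \varepsilon$, least squares solves
$$\min_{\beta} \sum_{i=1}^{M} \varepsilon(i)^2 = \sum_{i=1}^{M} \Big( y_i - \sum_{k=1}^{m} \beta_k h_k(x_i) \Big)^2$$
• Regularization adds a penalty on the coefficients:
$$\min_{\beta} \sum_{i=1}^{M} \Big( y_i - \sum_{k=1}^{m} \beta_k h_k(x_i) \Big)^2 + \lambda J(\beta), \qquad J(\beta) = \ ?$$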
Piecewise Polynomials and Splines
• Piecewise constant:
$h_1(X) = I(X < \xi_1)$, $h_2(X) = I(\xi_1 \le X < \xi_2)$, $h_3(X) = I(\xi_2 \le X)$
• Piecewise linear: the three functions above, plus
$h_{m+3}(X) = h_m(X)\,X$, $m = 1, 2, 3$
• Continuous piecewise linear:
$h_1(X) = 1$, $h_2(X) = X$, $h_3(X) = (X - \xi_1)_+$, $h_4(X) = (X - \xi_2)_+$
Piecewise Cubic Polynomials
• Increasing orders of continuity at the knots.
• A cubic spline with knots at $\xi_1$ and $\xi_2$ has the truncated power basis
$1,\ X,\ X^2,\ X^3,\ (X - \xi_1)_+^3,\ (X - \xi_2)_+^3$
Piecewise Cubic Polynomials
• An order-M spline with knots $\xi_j$, j = 1,…,K, is a piecewise polynomial of order M with continuous derivatives up to order M-2.
• A cubic spline has M = 4.
• Truncated power basis set:
$h_j(X) = X^{j-1}, \quad j = 1, \dots, M$
$h_{M+l}(X) = (X - \xi_l)_+^{M-1}, \quad l = 1, \dots, K$
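The truncated power basis above can be sketched in NumPy as follows. This is a minimal illustration, not the lecture's own code: the function name `truncated_power_basis`, the simulated sine data, and the two knot locations are all assumptions made for the example.

```python
import numpy as np

def truncated_power_basis(x, knots, order=4):
    """Return the N x (order + K) truncated power basis matrix.

    Columns 0..order-1 hold the monomials x**0 .. x**(order-1);
    the remaining K columns hold (x - knot)_+ ** (order - 1).
    """
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    monomials = np.vander(x, N=order, increasing=True)
    truncated = np.maximum(x[:, None] - knots[None, :], 0.0) ** (order - 1)
    return np.hstack([monomials, truncated])

# Hypothetical data: a noisy sine curve on [0, 1].
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 100))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

# Cubic spline (M = 4) with K = 2 knots gives M + K = 6 basis functions.
knots = np.array([1 / 3, 2 / 3])
H = truncated_power_basis(x, knots, order=4)

# Ordinary least squares on the expanded basis.
beta, *_ = np.linalg.lstsq(H, y, rcond=None)
fitted = H @ beta
```

The truncated power basis is easy to write down but can be numerically ill-conditioned for many knots, which is one motivation for the B-spline basis.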
Natural cubic spline
• Natural cubic spline adds additional constraints,
namely that the function is linear beyond the
boundary knots.
[Figure: Natural boundary constraints. With three knots, the fit is linear outside the two boundary knots and cubic on the two intervals in between.]
B-spline
• The augmented knot sequence τ:
$\tau_1 \le \tau_2 \le \dots \le \tau_M \le \xi_0$;
$\tau_{j+M} = \xi_j, \quad j = 1, \dots, K$;
$\xi_{K+1} \le \tau_{K+M+1} \le \tau_{K+M+2} \le \dots \le \tau_{K+2M}$
• $B_{i,m}(x)$ is the i-th B-spline basis function of order m for the knot sequence τ, m ≤ M, with $i = 1, \dots, K+2M-m$:
$$B_{i,1}(x) = \begin{cases} 1 & \tau_i \le x < \tau_{i+1} \\ 0 & \text{otherwise} \end{cases}$$
$$B_{i,m}(x) = \frac{x - \tau_i}{\tau_{i+m-1} - \tau_i}\, B_{i,m-1}(x) + \frac{\tau_{i+m} - x}{\tau_{i+m} - \tau_{i+1}}\, B_{i+1,m-1}(x)$$
B-spline
• The sequence of B-splines up to order 4 with ten knots evenly spaced from 0 to 1.
• B-splines have local support; they are nonzero on an interval spanned by M+1 knots.
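The B-spline recursion and the local-support property can be checked numerically. The sketch below is an illustration under stated assumptions: the function name `bspline_basis` is hypothetical, indices are 0-based, terms with a zero denominator (repeated knots) are taken to be zero, and the boundary knots 0 and 1 are each repeated M times.

```python
import numpy as np

def bspline_basis(x, tau, m):
    """Evaluate all order-m B-splines on knot sequence tau at points x.

    Returns an array of shape (len(x), len(tau) - m). Terms whose
    denominator is zero (repeated knots) are treated as zero.
    """
    x = np.asarray(x, dtype=float)
    tau = np.asarray(tau, dtype=float)
    # Order 1: indicator of the half-open interval [tau_i, tau_{i+1}).
    B = ((tau[:-1] <= x[:, None]) & (x[:, None] < tau[1:])).astype(float)
    for order in range(2, m + 1):
        n_basis = len(tau) - order
        B_next = np.zeros((x.size, n_basis))
        for i in range(n_basis):
            left_den = tau[i + order - 1] - tau[i]
            right_den = tau[i + order] - tau[i + 1]
            if left_den > 0:
                B_next[:, i] += (x - tau[i]) / left_den * B[:, i]
            if right_den > 0:
                B_next[:, i] += (tau[i + order] - x) / right_den * B[:, i + 1]
        B = B_next
    return B

M = 4                                     # cubic B-splines
interior = np.linspace(0.1, 0.9, 9)       # assumed interior knots
tau = np.concatenate([np.zeros(M), interior, np.ones(M)])
x = np.linspace(0.0, 0.99, 200)           # stay inside [0, 1)
B = bspline_basis(x, tau, M)

# Local support: at any x, at most M of the order-M B-splines are nonzero.
max_active = int((B > 0).sum(axis=1).max())
```

With this knot sequence the basis functions are nonnegative and sum to one at every point of [0, 1), which is a convenient sanity check for any implementation of the recursion.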
Smoothing Splines
• Based on the spline basis method: $f(x) = \sum_{k=1}^{m} \beta_k h_k(x)$
• So $y = \sum_{k=1}^{m} \beta_k h_k(x) + \varepsilon$, where $\varepsilon$ is the noise.
• Minimize the penalized residual sum of squares
$$\mathrm{RSS}(f, \lambda) = \sum_{i=1}^{N} \{y_i - f(x_i)\}^2 + \lambda \int \{f''(t)\}^2\, dt$$
$\lambda$ is a fixed smoothing parameter
$\lambda = 0$: f can be any function that interpolates the data
$\lambda = \infty$: the simple least squares line fit
Smoothing Splines
• The solution is a natural spline: $f(x) = \sum_{j=1}^{N} N_j(x)\,\theta_j$
• Then the criterion reduces to:
$$\mathrm{RSS}(\theta, \lambda) = (y - N\theta)^T (y - N\theta) + \lambda\, \theta^T \Omega_N \theta$$
– where $\{N\}_{ij} = N_j(x_i)$ and $\{\Omega_N\}_{jk} = \int N_j''(t)\, N_k''(t)\, dt$
• So the solution:
$$\hat\theta = (N^T N + \lambda \Omega_N)^{-1} N^T y$$
• The fitted smoothing spline:
$$\hat f(x) = \sum_{j=1}^{N} N_j(x)\,\hat\theta_j$$
Smoothing Splines
Spinal BMD (bone mineral density)
• The response is the relative change in bone mineral density measured at the spine in adolescents, as a function of age.
• Separate smoothing splines are fit to the males and females, with $\lambda \approx 0.00022$.
• This choice corresponds to about 12 degrees of freedom.
Smoothing Matrix
• $\hat f$ is the N-vector of fitted values:
$$\hat f = N (N^T N + \lambda \Omega_N)^{-1} N^T y = S_\lambda y$$
• The finite linear operator $S_\lambda$ is the smoother matrix.
• Compare with the linear operator in least squares fitting with M cubic-spline basis functions and knot sequence ξ:
$$\hat f = B_\xi (B_\xi^T B_\xi)^{-1} B_\xi^T y = H_\xi y, \qquad B_\xi \text{ is an } N \times M \text{ matrix}$$
• Similarities and differences:
– Both are symmetric, positive semidefinite matrices
– $H_\xi H_\xi = H_\xi$ (idempotent); $S_\lambda S_\lambda \preceq S_\lambda$ (shrinking)
– rank: $r(S_\lambda) = N$, $r(H_\xi) = M$
Smoothing Matrix
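• Effective degrees of freedom of a smoothing spline: $\mathrm{df}_\lambda = \mathrm{trace}(S_\lambda)$
• $S_\lambda$ in the Reinsch form: $S_\lambda = (I + \lambda K)^{-1}$
• Since $S_\lambda = (I + \lambda K)^{-1}$, $\hat f = S_\lambda y$ is the solution of
$$\min_f\ (y - f)^T (y - f) + \lambda f^T K f$$
• $S_\lambda$ is symmetric and has a real eigen-decomposition
$$S_\lambda = \sum_{k=1}^{N} \rho_k(\lambda)\, u_k u_k^T, \qquad \rho_k(\lambda) = \frac{1}{1 + \lambda d_k}$$
– $d_k$ is the corresponding eigenvalue of K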
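The Reinsch form and its eigen-decomposition can be explored with a small numerical sketch. As a loudly labeled assumption, the penalty matrix K below is a discrete second-difference penalty $D^T D$ on an evenly spaced grid, which stands in for the true smoothing-spline penalty matrix; the function name `reinsch_smoother` is hypothetical.

```python
import numpy as np

def reinsch_smoother(n, lam):
    """Return S = (I + lam * K)^{-1} and K, with K = D^T D.

    D is the (n-2) x n second-difference operator. This K is an assumed
    discrete stand-in for the smoothing-spline penalty matrix.
    """
    D = np.diff(np.eye(n), n=2, axis=0)
    K = D.T @ D
    S = np.linalg.inv(np.eye(n) + lam * K)
    return S, K

n, lam = 50, 10.0
S, K = reinsch_smoother(n, lam)

d, U = np.linalg.eigh(K)        # eigenvalues d_k of K, in ascending order
rho = 1.0 / (1.0 + lam * d)     # eigenvalues rho_k(lam) of S
df = np.trace(S)                # effective degrees of freedom
```

In this stand-in, constant and linear sequences are not penalized, so two eigenvalues of K are zero and the corresponding eigenvalues of S equal one; all other components are shrunk, more strongly as λ grows.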
• Smoothing spline fit of ozone concentration versus Daggot pressure gradient.
• Two fits are shown, with the smoothing parameter chosen to give df = 5 and df = 10.
• The 3rd to 6th eigenvectors of the spline smoother matrices.
• The smoother matrix for a
smoothing spline is nearly
banded, indicating an
equivalent kernel with local
support.
Bias-Variance Tradeoff
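• Example: $Y = f(X) + \varepsilon$, with $f(X) = \dfrac{\sin(12(X + 0.2))}{X + 0.2}$
• For $\hat f = S_\lambda y$, taking $\mathrm{cov}(y) = I$: $\mathrm{cov}(\hat f) = S_\lambda\, \mathrm{cov}(y)\, S_\lambda^T = S_\lambda S_\lambda^T$
• The diagonal contains the pointwise variances at the training $x_i$
• The bias is given by $\mathrm{Bias}(\hat f) = f - E(\hat f) = f - S_\lambda f$
• f is the (unknown) vector of evaluations of the true f at the training points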
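The variance and bias formulas above can be evaluated directly once a smoother matrix is available. The sketch below uses the example function from the slide; the evenly spaced design, unit noise variance, the value of λ, and the second-difference smoother (a stand-in for the true smoothing-spline matrix) are all assumptions made for illustration.

```python
import numpy as np

n, lam = 100, 50.0
x = np.linspace(0.0, 1.0, n)                  # assumed evenly spaced design
f_true = np.sin(12 * (x + 0.2)) / (x + 0.2)   # example function from the slide

# Assumed stand-in smoother: S = (I + lam * D^T D)^{-1}.
D = np.diff(np.eye(n), n=2, axis=0)
S = np.linalg.inv(np.eye(n) + lam * D.T @ D)

cov_fhat = S @ S.T                  # cov(f_hat) when cov(y) = I
pointwise_var = np.diag(cov_fhat)   # variances at the training points
bias = f_true - S @ f_true          # Bias(f_hat) = f - S f

# Standard-error band around the expected fit E(f_hat) = S f.
upper = S @ f_true + 2.0 * np.sqrt(pointwise_var)
lower = S @ f_true - 2.0 * np.sqrt(pointwise_var)
```

Increasing `lam` in this sketch makes the band narrower and the bias larger, which is the tradeoff the slide describes.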
Bias-Variance Tradeoff
• df = 5: the bias is high and the standard error band is narrow
• df = 9: the bias is slight and the variance has not increased appreciably
• df = 15: overfitting; the standard error band widens
Bias-Variance Tradeoff
• The integrated squared prediction error (EPE) combines both bias and variance in a single summary:
$$\mathrm{EPE}(\hat f_\lambda) = E\big(Y - \hat f_\lambda(X)\big)^2 = \mathrm{Var}(Y) + E\big[\mathrm{Bias}^2(\hat f_\lambda(X)) + \mathrm{Var}(\hat f_\lambda(X))\big] = \sigma^2 + \mathrm{MSE}(\hat f_\lambda)$$
• N-fold (leave-one-out) cross-validation:
$$\mathrm{CV}(\hat f_\lambda) = \sum_{i=1}^{N} \big(y_i - \hat f_\lambda^{(-i)}(x_i)\big)^2 = \sum_{i=1}^{N} \left( \frac{y_i - \hat f_\lambda(x_i)}{1 - S_\lambda(i,i)} \right)^2$$
• The EPE and CV curves have a similar shape, and overall the CV curve is approximately unbiased as an estimate of the EPE curve.
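The leave-one-out shortcut can be verified against brute-force refitting for any penalized least squares smoother with a fixed penalty. In the sketch below the basis is a cubic truncated power basis and the penalty is a plain ridge penalty; both are assumptions standing in for the natural-spline basis and $\lambda \Omega_N$, and the simulated data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam = 40, 0.5
x = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(12 * (x + 0.2)) / (x + 0.2) + rng.standard_normal(n)

# Assumed basis: cubic truncated power basis with five knots.
knots = np.linspace(0.1, 0.9, 5)
H = np.hstack([np.vander(x, 4, increasing=True),
               np.maximum(x[:, None] - knots, 0.0) ** 3])
P = lam * np.eye(H.shape[1])     # assumed ridge penalty (stand-in for lam * Omega)

# Shortcut: one fit on all the data, then rescale the residuals.
S = H @ np.linalg.solve(H.T @ H + P, H.T)
fhat = S @ y
cv_shortcut = np.sum(((y - fhat) / (1.0 - np.diag(S))) ** 2)

# Brute force: refit N times, each time leaving one observation out.
cv_brute = 0.0
for i in range(n):
    keep = np.arange(n) != i
    theta = np.linalg.solve(H[keep].T @ H[keep] + P, H[keep].T @ y[keep])
    cv_brute += (y[i] - H[i] @ theta) ** 2
```

The two quantities agree up to floating-point error, so in practice only the single fit and the diagonal of the smoother matrix are needed for each candidate λ.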
Logistic Regression
• Logistic regression with a single quantitative input X:
$$\log \frac{\Pr(Y = 1 \mid X = x)}{\Pr(Y = 0 \mid X = x)} = f(x), \qquad \Pr(Y = 1 \mid X = x) = \frac{e^{f(x)}}{1 + e^{f(x)}}$$
• The penalized log-likelihood criterion, with $p(x) = \Pr(Y = 1 \mid X = x)$:
$$l(f; \lambda) = \sum_{i=1}^{N} \big[ y_i \log p(x_i) + (1 - y_i) \log(1 - p(x_i)) \big] - \frac{1}{2} \lambda \int \{f''(t)\}^2\, dt$$
$$\phantom{l(f; \lambda)} = \sum_{i=1}^{N} \big[ y_i f(x_i) - \log(1 + e^{f(x_i)}) \big] - \frac{1}{2} \lambda \int \{f''(t)\}^2\, dt$$
Multidimensional Splines
• Tensor product basis
– The $M_1 \times M_2$ dimensional tensor product basis:
$g_{jk}(X) = h_{1j}(X_1)\, h_{2k}(X_2), \quad j = 1, \dots, M_1,\ k = 1, \dots, M_2$
– $h_{1j}(X_1)$: basis functions for coordinate $X_1$
– $h_{2k}(X_2)$: basis functions for coordinate $X_2$
$$g(X) = \sum_{j=1}^{M_1} \sum_{k=1}^{M_2} \theta_{jk}\, g_{jk}(X)$$
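A tensor product basis can be assembled from two one-dimensional basis matrices with a row-wise outer product. In this sketch the helper name `tensor_product_basis`, the random design points, and the choice of cubic polynomial bases in each coordinate are assumptions made for illustration; any pair of one-dimensional bases could be substituted.

```python
import numpy as np

def tensor_product_basis(H1, H2):
    """Row-wise tensor product of two basis matrices.

    H1 is N x M1 and H2 is N x M2. The result is N x (M1 * M2), and
    column j * M2 + k holds h1j(X1) * h2k(X2).
    """
    n = H1.shape[0]
    return (H1[:, :, None] * H2[:, None, :]).reshape(n, -1)

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(30, 2))        # hypothetical design points

H1 = np.vander(X[:, 0], 4, increasing=True)    # assumed basis in X1 (M1 = 4)
H2 = np.vander(X[:, 1], 4, increasing=True)    # assumed basis in X2 (M2 = 4)
G = tensor_product_basis(H1, H2)               # 30 x 16 design matrix
```

The number of columns grows multiplicatively with the number of coordinates, which is why tensor product bases become impractical in higher dimensions.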
[Figure: Tensor product basis of B-splines, showing some selected pairs.]
Multidimensional Splines
• High-dimensional smoothing splines
$$\min_f \sum_{i=1}^{N} \{y_i - f(x_i)\}^2 + \lambda J[f], \qquad x_i \in \mathbb{R}^d$$
– J is an appropriate penalty functional. For d = 2:
$$J[f] = \iint_{\mathbb{R}^2} \left[ \left( \frac{\partial^2 f(x)}{\partial x_1^2} \right)^2 + 2 \left( \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} \right)^2 + \left( \frac{\partial^2 f(x)}{\partial x_2^2} \right)^2 \right] dx_1\, dx_2$$
– The result is a smooth two-dimensional surface, a thin-plate spline.
• The solution has the form
$$f(x) = \beta_0 + \beta^T x + \sum_{j=1}^{N} \alpha_j h_j(x)$$
Multidimensional Splines
• The decision boundary of an additive logistic regression model, using natural splines in each of two coordinates.
• df = 1 + (4-1) + (4-1) = 7
Multidimensional Splines
• The results of
using a tensor
product of natural
spline basis in
each coordinate.
• df = 4 x 4 = 16
Multidimensional Splines
• A thin-plate spline fit
to the heart disease
data.
• The data points are
indicated, as well as
the lattice of points
used as knots.
Reproducing Kernel Hilbert space
• A regularization problem has the form:
$$\min_{f \in H} \left[ \sum_{i=1}^{N} L(y_i, f(x_i)) + \lambda J(f) \right], \qquad J(f) = \int_{\mathbb{R}^d} \frac{|\tilde f(s)|^2}{\tilde G(s)}\, ds$$
– L(y, f(x)) is a loss function.
– J(f) is a penalty functional, and H is a space of functions on which J(f) is defined.
– $\tilde f$ denotes the Fourier transform of f.
• The solution:
$$f(x) = \sum_{k=1}^{K} \alpha_k \phi_k(x) + \sum_{i=1}^{N} \theta_i\, G(x - x_i)$$
– the $\phi_k$ span the null space of the penalty functional J
Spaces of Functions Generated by Kernel
• An important subclass is generated by a positive definite kernel K(x, y).
• The corresponding space of functions $H_K$ is called a reproducing kernel Hilbert space.
• Suppose that K has an eigen-expansion
$$K(x, y) = \sum_{i=1}^{\infty} \gamma_i\, \phi_i(x)\, \phi_i(y), \qquad \gamma_i \ge 0, \quad \sum_{i=1}^{\infty} \gamma_i^2 < \infty$$
• Elements of $H_K$ have an expansion
$$f(x) = \sum_{i=1}^{\infty} c_i\, \phi_i(x), \qquad \|f\|_{H_K}^2 \overset{\text{def}}{=} \sum_{i=1}^{\infty} c_i^2 / \gamma_i < \infty$$
Spaces of Functions Generated by Kernel
• The regularization problem becomes
$$\min_{f \in H_K} \left[ \sum_{i=1}^{N} L(y_i, f(x_i)) + \lambda \|f\|_{H_K}^2 \right]$$
$$\min_{\{c_j\}_1^\infty} \left[ \sum_{i=1}^{N} L\Big(y_i, \sum_{j=1}^{\infty} c_j \phi_j(x_i)\Big) + \lambda \sum_{j=1}^{\infty} c_j^2 / \gamma_j \right]$$
• The finite-dimensional solution (Wahba, 1990):
$$f(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i)$$
• Reproducing properties of the kernel function:
$$\langle K(\cdot, x_i), f \rangle_{H_K} = f(x_i), \qquad \langle K(\cdot, x_i), K(\cdot, x_j) \rangle_{H_K} = K(x_i, x_j)$$
Spaces of Functions Generated by Kernel
• For $f(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i)$:
• The penalty functional
$$J(f) = \sum_{i=1}^{N} \sum_{j=1}^{N} K(x_i, x_j)\, \alpha_i \alpha_j$$
• The regularization problem reduces to a finite-dimensional criterion
$$\min_{\alpha}\ L(y, K\alpha) + \lambda\, \alpha^T K \alpha$$
– K is the N×N matrix with ij-th entry $K(x_i, x_j)$
RKHS
• Penalized least squares
$$\min_{\alpha}\ (y - K\alpha)^T (y - K\alpha) + \lambda\, \alpha^T K \alpha$$
• The solution for α: $\hat\alpha = (K + \lambda I)^{-1} y$
• The fitted values: $\hat f(x) = \sum_{k=1}^{N} \hat\alpha_k K(x, x_k)$
• The vector of N fitted values is given by
$$\hat f = K \hat\alpha = K (K + \lambda I)^{-1} y = (I + \lambda K^{-1})^{-1} y$$
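The penalized least squares solution above amounts to one linear solve. The following sketch assumes a Gaussian kernel with an arbitrary bandwidth, simulated one-dimensional data, and an arbitrary λ; the helper names `gaussian_kernel` and `predict` are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.2):
    """Kernel matrix with entries exp(-||a - b||^2 / (2 sigma^2))."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, size=(25, 1))     # hypothetical inputs
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(25)
lam = 0.1                                   # assumed smoothing parameter

K = gaussian_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)   # (K + lam I)^{-1} y
fitted = K @ alpha                                     # vector of fitted values

def predict(X_new):
    """Evaluate f_hat(x) = sum_k alpha_k K(x, x_k) at new points."""
    return gaussian_kernel(X_new, X) @ alpha
```

Solving the linear system directly avoids forming $K^{-1}$, which matters because kernel matrices are often close to singular.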
Example of RKHS
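• Polynomial regression
– Suppose $h(x): \mathbb{R}^p \to \mathbb{R}^M$, with M huge
– Given $x_1, x_2, \dots, x_N$ with $M \gg N$, let $H = \{h_j(x_i)\}$
– The penalized polynomial regression:
$$\min_{\{\beta_m\}_1^M} \sum_{i=1}^{N} \Big( y_i - \sum_{m=1}^{M} \beta_m h_m(x_i) \Big)^2 + \lambda \sum_{m=1}^{M} \beta_m^2$$
– Loss function: $R(\beta) = (y - H\beta)^T (y - H\beta) + \lambda\, \beta^T \beta$
– Setting $\partial R(\beta) / \partial \beta = 0$: $-H^T (y - H\hat\beta) + \lambda \hat\beta = 0$
– Premultiplying by H: $-H H^T (y - H\hat\beta) + \lambda H \hat\beta = 0$
– The solution: $H\hat\beta = (H H^T + \lambda I)^{-1} H H^T y$
– $\{H H^T\}_{ij} = \langle h(x_i), h(x_j) \rangle = K(x_i, x_j)$
– $\hat f(x) = h(x)^T \hat\beta = \sum_{i=1}^{N} \hat\alpha_i K(x, x_i), \qquad \hat\alpha = (K + \lambda I)^{-1} y$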
Penalized Polynomial Regression
• Kernel: $K(x, x') = (1 + \langle x, x' \rangle)^d$ has $M = \binom{p + d}{d}$ eigen-functions
• E.g. d = 2, p = 2:
$$K(x, x') = (1 + x_1 x_1' + x_2 x_2')^2 = 1 + 2 x_1 x_1' + 2 x_2 x_2' + (x_1 x_1')^2 + (x_2 x_2')^2 + 2 x_1 x_1' x_2 x_2' = h(x)^T h(x')$$
$$h(x) = (1,\ \sqrt{2}\, x_1,\ \sqrt{2}\, x_2,\ x_1^2,\ x_2^2,\ \sqrt{2}\, x_1 x_2)$$
• The penalized polynomial regression:
$$\min_{\{\beta_m\}_1^M} \sum_{i=1}^{N} \Big( y_i - \sum_{m=1}^{M} \beta_m h_m(x_i) \Big)^2 + \lambda \sum_{m=1}^{M} \beta_m^2$$
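For d = 2 and p = 2 the feature map is small enough to write out, so the kernel identity and the equivalence between the penalized polynomial regression and its kernel form can be checked numerically. The random data and the value of λ below are assumptions for illustration.

```python
import numpy as np

def h(x):
    """Explicit feature map for K(x, x') = (1 + <x, x'>)^2 with p = 2."""
    x1, x2 = x
    r2 = np.sqrt(2.0)
    return np.array([1.0, r2 * x1, r2 * x2, x1 ** 2, x2 ** 2, r2 * x1 * x2])

rng = np.random.default_rng(4)
X = rng.standard_normal((8, 2))      # hypothetical inputs, N = 8
y = rng.standard_normal(8)           # hypothetical responses
lam = 0.3                            # assumed penalty parameter

H = np.array([h(x) for x in X])      # N x M feature matrix, M = 6
K = (1.0 + X @ X.T) ** 2             # N x N kernel matrix

# Primal form: ridge regression on the M explicit features.
beta = np.linalg.solve(H.T @ H + lam * np.eye(6), H.T @ y)
fitted_primal = H @ beta

# Dual (kernel) form: alpha = (K + lam I)^{-1} y, fitted values K alpha.
alpha = np.linalg.solve(K + lam * np.eye(8), y)
fitted_dual = K @ alpha
```

The dual form only needs the N×N kernel matrix, so it remains usable when M is far too large to build the feature matrix explicitly.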
RBF kernel & SVM kernel
• Gaussian radial basis functions
$$K(x, y) = e^{-\|x - y\|^2 / 2\sigma^2}; \qquad h_j(x) = K(x, x_j), \quad j = 1, \dots, M$$
• Support vector machines
$$f(x) = \alpha_0 + \sum_{j=1}^{N} \alpha_j K(x, x_j)$$
$$\min_{\alpha_0, \alpha} \sum_{i=1}^{N} [1 - y_i f(x_i)]_+ + \lambda\, \alpha^T K \alpha$$
Wavelet smoothing
• Another type of basis: wavelet bases
• Wavelet bases are generated by translations and dilations of a single scaling function $\phi(x)$.
• If $\phi(x) = I(x \in [0, 1])$, then $\phi_{0,k}(x) = \phi(x - k)$ generates an orthonormal basis for functions with jumps at the integers.
• The $\phi_{0,k}(x)$ span a space called the reference space $V_0$.
• The dilations $\phi_{1,k}(x) = \sqrt{2}\, \phi(2x - k)$ form an orthonormal basis for a space $V_1 \supset V_0$.
• Generally, we have $\cdots \supset V_1 \supset V_0 \supset V_{-1} \supset \cdots$
$\phi_{j,k}(x) = 2^{j/2}\, \phi(2^j x - k)$
$V_0 \subset V_1 \subset V_2 \subset \cdots$
$V_{j+1} = V_j \oplus W_j$
$W_j$: the detail of the signal, orthogonal to $V_j$
$\psi(x) = \phi(2x) - \phi(2x - 1)$: the wavelet basis on $W_0$
$\psi_{j,k}(x) = 2^{j/2}\, \psi(2^j x - k)$: the wavelet basis on $W_j$
Wavelet smoothing
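• Decomposition of the $L^2$ space:
$$V_{j+1} = V_j \oplus W_j = V_{j-1} \oplus W_{j-1} \oplus W_j = V_0 \oplus W_0 \oplus W_1 \oplus \cdots \oplus W_j$$
[Figure: nested spaces $V_0 \subset V_1 \subset V_2 \subset \cdots$ with the detail spaces $W_0, W_1, W_2, \dots$ filling the gaps between successive levels.]
• The mother wavelet $\psi(x) = \phi(2x) - \phi(2x - 1)$ generates functions $\psi_{0,k} = \psi(x - k)$ that form an orthonormal basis for $W_0$. Likewise, $\psi_{j,k} = 2^{j/2}\, \psi(2^j x - k)$ form a basis for $W_j$.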
Wavelet smoothing
• Wavelet basis on $W_0$: $\psi(x) = \phi(2x) - \phi(2x - 1)$
• Wavelet basis on $W_j$: $\psi_{j,k}(x) = 2^{j/2}\, \psi(2^j x - k)$
• The symmlet-p wavelet:
– A support of 2p-1 consecutive intervals.
– p vanishing moments:
$$\int \psi(x)\, x^j\, dx = 0, \qquad j = 0, \dots, p - 1$$
Adaptive Wavelet Filtering
• Wavelet transform: $y^* = W^T y$
– y: response vector; W: N×N orthonormal wavelet basis matrix
• Stein Unbiased Risk Estimation (SURE) shrinkage:
$$\min_{\theta}\ \|y - W\theta\|_2^2 + 2\lambda \|\theta\|_1$$
– The solution: $\hat\theta_j = \mathrm{sign}(y_j^*)\, (|y_j^*| - \lambda)_+$
– The fitted function is given by the inverse wavelet transform $\hat f = W \hat\theta$; the least squares coefficients are translated toward zero and truncated at zero.
– Simple choice for λ: $\lambda = \sigma \sqrt{2 \log N}$
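The filtering steps above can be sketched with the Haar basis, which is the simplest orthonormal wavelet basis. The recursive construction `haar_matrix`, the piecewise-constant test signal, and the assumption that σ is known are all illustrative choices; symmlets would need a wavelet library instead.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar basis matrix W (columns are basis vectors).

    n must be a power of 2.
    """
    if n == 1:
        return np.array([[1.0]])
    W_half = haar_matrix(n // 2)
    scaling = np.kron(W_half, [[1.0], [1.0]])           # coarser levels, upsampled
    detail = np.kron(np.eye(n // 2), [[1.0], [-1.0]])   # finest-level wavelets
    return np.hstack([scaling, detail]) / np.sqrt(2.0)

def soft_threshold(z, lam):
    """sign(z) * (|z| - lam)_+, applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

rng = np.random.default_rng(5)
N, sigma = 64, 0.3                            # sigma assumed known here
t = np.arange(N) / N
signal = np.where(t < 0.5, 1.0, -1.0)         # hypothetical piecewise-constant signal
y = signal + sigma * rng.standard_normal(N)

W = haar_matrix(N)
y_star = W.T @ y                              # wavelet transform
lam = sigma * np.sqrt(2.0 * np.log(N))        # simple threshold choice
theta_hat = soft_threshold(y_star, lam)       # shrink and truncate coefficients
f_hat = W @ theta_hat                         # inverse wavelet transform
```

Because W is orthonormal, the penalized criterion separates across coefficients, which is why the solution is a coordinate-wise soft threshold of $y^*$.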