UNIVERSITETET
I OSLO
Constrained Least Squares
Authors: G.H. Golub and C.F. Van Loan
Chapter 12 in Matrix Computations, 3rd Edition, 1996, pp.580-587
INSTITUTT FOR INFORMATIKK
CICN may05/1
Background
The least squares problem:

    min_x ‖Ax − b‖₂

Sometimes we want x to be chosen from a proper subset S ⊂ R^n.
Example: S = {x ∈ R^n : ‖x‖₂ = 1}
Such problems can be solved using the QR factorization and the singular
value decomposition (SVD).
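As a concrete illustration (the data here is invented for this sketch), NumPy's SVD-based `np.linalg.lstsq` returns the unconstrained minimizer directly:

```python
import numpy as np

# Fit a line to three points by minimizing ||Ax - b||_2
A = np.array([[1., 1.],
              [1., 2.],
              [1., 3.]])
b = np.array([1., 2., 2.])

x, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)
# Solving the normal equations by hand gives x = [2/3, 1/2]
```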
Least Squares with a Quadratic Inequality Constraint (LSQI)
General problem:

    min_x ‖Ax − b‖₂   s.t.   ‖Bx − d‖₂ ≤ α

where:

    A ∈ R^(m×n) (m ≥ n), b ∈ R^m, B ∈ R^(p×n), d ∈ R^p, α ≥ 0
Assume the generalized SVD of the matrices A and B is given as:

    U^T A X = D_A = diag(α₁, …, α_n),   U^T U = I_m
    V^T B X = D_B = diag(β₁, …, β_q),   V^T V = I_p,   q = min{p, n}

Assume also the following definitions:

    b̃ := U^T b,   d̃ := V^T d,   y := X⁻¹x

Then the problem becomes:

    min_y ‖D_A y − b̃‖₂   s.t.   ‖D_B y − d̃‖₂ ≤ α
    min_y ‖D_A y − b̃‖₂   s.t.   ‖D_B y − d̃‖₂ ≤ α

Correctness: By inserting the definitions we get:

    ‖D_A y − b̃‖₂ = ‖U^T A X X⁻¹ x − U^T b‖₂ = ‖U^T (Ax − b)‖₂ = ‖Ax − b‖₂

since multiplication by an orthogonal matrix does not affect the 2-norm. (The same argument applies to the inequality constraint.)
The objective function becomes:

    ‖D_A y − b̃‖₂² = Σ_{i=1}^{n} (α_i y_i − b̃_i)² + Σ_{i=n+1}^{m} b̃_i²      (12.1.4)

The constraint becomes:

    ‖D_B y − d̃‖₂² = Σ_{i=1}^{r} (β_i y_i − d̃_i)² + Σ_{i=r+1}^{p} d̃_i² ≤ α²   (12.1.5)

We have:

    r = rank(B),   β_{r+1} = β_{r+2} = … = β_q = 0
We have a solution if and only if:

    Σ_{i=r+1}^{p} d̃_i² ≤ α²

Otherwise there is no way to satisfy the constraint, since these terms do not depend on y.
Special Case: Σ_{i=r+1}^{p} d̃_i² = α²

The first sum in (12.1.5) must then equal zero, which means:

    y_i = d̃_i / β_i,   i = 1, …, r

The remaining variables can be chosen to minimize the first sum in (12.1.4):

    y_i = b̃_i / α_i,   i = r+1, …, n

(Of course, if α_i = 0 for some i ∈ [r+1, n], this does not make any sense. We then choose y_i = 0.)
The General Case: Σ_{i=r+1}^{p} d̃_i² < α²

The minimizer (without regard to the constraint) is given by:

    y_i = b̃_i / α_i   if α_i ≠ 0
    y_i = d̃_i / β_i   if α_i = 0

This may or may not be a feasible solution, depending on whether it lies in S.
The Method of Lagrange Multipliers
Define

    h(λ, y) = ‖D_A y − b̃‖₂² + λ (‖D_B y − d̃‖₂² − α²)

Solving ∂h/∂y_i = 0 for i = 1, …, n yields:

    (D_A^T D_A + λ D_B^T D_B) y = D_A^T b̃ + λ D_B^T d̃
Solution using Lagrange multipliers:

    y_i(λ) = (α_i b̃_i + λ β_i d̃_i) / (α_i² + λ β_i²),   i = 1, 2, …, q
    y_i(λ) = b̃_i / α_i,                                  i = q+1, …, n
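A quick numerical check of this closed form, using invented diagonal data with m = p = q = n for simplicity (all names here are mine, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam = 5, 0.3
a = rng.uniform(0.5, 2.0, n)            # alpha_i, diagonal of D_A
be = rng.uniform(0.5, 2.0, n)           # beta_i, diagonal of D_B
bt = rng.standard_normal(n)             # btilde
dt = rng.standard_normal(n)             # dtilde

# Closed-form y_i(lambda) from the slide
y = (a * bt + lam * be * dt) / (a**2 + lam * be**2)

# It should solve (D_A^T D_A + lam D_B^T D_B) y = D_A^T btilde + lam D_B^T dtilde
DA, DB = np.diag(a), np.diag(be)
y_solve = np.linalg.solve(DA.T @ DA + lam * DB.T @ DB, DA.T @ bt + lam * DB.T @ dt)
print(np.allclose(y, y_solve))          # → True
```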
Determining the Lagrange parameter, λ
Define:

    φ(λ) := ‖D_B y(λ) − d̃‖₂² = Σ_{i=1}^{r} ( α_i (β_i b̃_i − α_i d̃_i) / (α_i² + λ β_i²) )² + Σ_{i=r+1}^{p} d̃_i²

Solve for φ(λ) = α². Because φ(0) > α² and the function is monotonically decreasing for λ > 0, we know that there must be a unique positive solution λ* with φ(λ*) = α².
Algorithm: Spherical Constraint
The special case B = I_n, d = 0, α > 0 can be interpreted as selecting x from the closed n-dimensional ball ‖x‖₂ ≤ α. It can be solved using the following algorithm:
• [U, Σ, V] ← SVD(A)
• b̃ ← U^T b
• r ← rank(A)
Algorithm: Spherical Constraint
• if Σ_{i=1}^{r} (b̃_i / σ_i)² > α²:
    • λ* ← solve Σ_{i=1}^{r} ( σ_i b̃_i / (σ_i² + λ*) )² = α²
    • x ← Σ_{i=1}^{r} ( σ_i b̃_i / (σ_i² + λ*) ) v_i
• else:
    • x ← Σ_{i=1}^{r} (b̃_i / σ_i) v_i
• end if

Computing the SVD is the most computationally intensive operation in the above algorithm.
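The steps above can be sketched in NumPy; the function name, the rank tolerance, and the choice of bisection for the secular equation are my own, not from the text:

```python
import numpy as np

def lsqi_sphere(A, b, alpha):
    """Minimize ||A x - b||_2 subject to ||x||_2 <= alpha (sketch)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    bt = U.T @ b
    r = int(np.sum(s > 1e-12 * s[0]))      # numerical rank of A
    s, bt, V = s[:r], bt[:r], Vt[:r].T
    if np.sum((bt / s) ** 2) > alpha**2:
        # Secular equation: phi(lam) = sum_i (s_i bt_i/(s_i^2+lam))^2 - alpha^2 = 0.
        # phi decreases monotonically for lam > 0, so bisection is safe.
        phi = lambda lam: np.sum((s * bt / (s**2 + lam)) ** 2) - alpha**2
        lo, hi = 0.0, 1.0
        while phi(hi) > 0:                 # expand until the root is bracketed
            hi *= 10.0
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if phi(mid) > 0 else (lo, mid)
        lam = 0.5 * (lo + hi)
        return V @ (s * bt / (s**2 + lam))
    return V @ (bt / s)                    # unconstrained minimizer is feasible
```

When the constraint is inactive the routine falls back to the plain (minimum-norm) least squares solution.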
Spherical Constraint as Ridge Regression Problem
Using Lagrange multipliers to solve the spherical constraint problem results in:

    (A^T A + λI) x = A^T b

where:

    λ > 0,   ‖x‖₂ = α

This is the solution to the ridge regression problem:

    min_x ‖Ax − b‖₂² + λ‖x‖₂²

We need some procedure for selecting a suitable λ.
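For any fixed λ, the ridge solution can be computed with ordinary LS machinery by stacking √λ·I beneath A; a minimal sketch (the helper name is mine):

```python
import numpy as np

def ridge(A, b, lam):
    """Solve min ||A x - b||_2^2 + lam ||x||_2^2 via an augmented LS problem."""
    n = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])   # stack sqrt(lam) I under A
    b_aug = np.concatenate([b, np.zeros(n)])
    return np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]
```

The augmented formulation avoids forming A^T A explicitly, which is numerically better conditioned than solving the normal equations directly.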
Define the problem:

    x_k(λ) = argmin_x ‖D_k(Ax − b)‖₂² + λ‖x‖₂²

where D_k = I − e_k e_k^T is the operator that zeroes out the kth row. Select λ to minimize the cross-validation weighted square error:

    C(λ) = (1/m) Σ_{k=1}^{m} w_k (a_k^T x_k(λ) − b_k)²

This means choosing a λ that does not make the final model rely too much on any one observation.
Through some calculation, we find that:

    C(λ) = (1/m) Σ_{k=1}^{m} w_k ( r_k / (∂r_k/∂b_k) )²

where r_k is an element of the residual vector r = b − A x(λ). The expression inside the parentheses can be interpreted as an inverse measure of the impact of the kth observation on the model.
Using the SVD, the minimization problem reduces to:

    C(λ) = (1/m) Σ_{k=1}^{m} w_k [ (b_k − Σ_{j=1}^{r} u_kj b̃_j σ_j²/(σ_j² + λ)) / (1 − Σ_{j=1}^{r} u_kj² σ_j²/(σ_j² + λ)) ]²

where b̃ = U^T b as before.
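A sketch of evaluating C(λ) this way, assuming uniform weights w_k = 1 unless given (the function name is mine). The leverages h_k = Σ_j u_kj² σ_j²/(σ_j² + λ) let us obtain every leave-one-out residual from a single fit:

```python
import numpy as np

def cv_score(A, b, lam, w=None):
    """C(lambda) via the SVD shortcut: r_k / (1 - h_k) are the LOO residuals."""
    m = A.shape[0]
    w = np.ones(m) if w is None else w
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    f = s**2 / (s**2 + lam)             # shrinkage factors sigma_j^2/(sigma_j^2+lam)
    bt = U.T @ b
    r = b - U @ (f * bt)                # residual b - A x(lambda)
    h = (U**2) @ f                      # h_k = sum_j u_kj^2 sigma_j^2/(sigma_j^2+lam)
    return np.mean(w * (r / (1 - h))**2)
```

For ridge regression this shortcut is exact: it agrees with explicitly refitting the model m times with one row removed each time.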
Equality Constrained Least Squares (LSE)
We consider a problem similar to LSQI, but with an equality constraint, i.e. a normal least squares problem

    min_x ‖Ax − b‖₂

with the constraint that:

    Bx = d

We assume the following dimensions:

    A ∈ R^(m×n), B ∈ R^(p×n), b ∈ R^m, d ∈ R^p, rank(B) = p
We start by computing the QR factorization of B^T:

    B^T = Q [ R ; 0 ]   (R stacked over a zero block)

with

    Q ∈ R^(n×n), R ∈ R^(p×p), 0 ∈ R^((n−p)×p)

and then add the following definitions:

    AQ = [A₁ A₂],   Q^T x = [ y ; z ]   (y ∈ R^p stacked over z ∈ R^(n−p))

This gives us:

    Bx = ( Q [ R ; 0 ] )^T x = [ R^T 0 ] Q^T x = [ R^T 0 ] [ y ; z ] = R^T y
We also get (because Q Q^T = I):

    Ax = (AQ)(Q^T x) = [A₁ A₂] [ y ; z ] = A₁ y + A₂ z

So the problem becomes:

    min ‖A₁ y + A₂ z − b‖₂   subject to   R^T y = d

where y is determined directly from the constraint, and then inserted into the LS problem:

    min_z ‖A₂ z − (b − A₁ y)‖₂

giving us a vector z which can be used to form the final answer:

    x = Q [ y ; z ]
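The whole null-space procedure, sketched in NumPy (the function name is mine):

```python
import numpy as np

def lse_qr(A, b, B, d):
    """Solve min ||A x - b||_2 s.t. B x = d via QR of B^T (null-space method)."""
    p = B.shape[0]
    Q, R = np.linalg.qr(B.T, mode='complete')   # B^T = Q [R; 0]
    R = R[:p, :p]
    AQ = A @ Q
    A1, A2 = AQ[:, :p], AQ[:, p:]
    y = np.linalg.solve(R.T, d)                  # constraint: R^T y = d
    z = np.linalg.lstsq(A2, b - A1 @ y, rcond=None)[0]
    return Q @ np.concatenate([y, z])
```

The result can be checked against the KKT system [[A^T A, B^T], [B, 0]] [x; μ] = [A^T b; d], which characterizes the same minimizer.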
The Method of Weighting
A method for approximating the solution of the LSE problem (minimize ‖Ax − b‖₂ s.t. Bx = d) through a normal, unconstrained LS problem:

    min_x ‖ [ A ; λB ] x − [ b ; λd ] ‖₂

for large values of λ.
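A minimal sketch of the weighting method (the function name is mine); the constraint violation shrinks as λ grows:

```python
import numpy as np

def lse_weighted(A, b, B, d, lam):
    """Approximate min ||A x - b||_2 s.t. B x = d by weighting the constraint."""
    A_aug = np.vstack([A, lam * B])              # stack lam*B under A
    b_aug = np.concatenate([b, lam * d])
    return np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]
```

Note that very large λ eventually hurts the conditioning of the augmented matrix, so λ should be large but not extreme.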
The exact solution to the LSE problem (in terms of the GSVD, with x_i the columns of X):

    x = Σ_{i=1}^{p} (v_i^T d / β_i) x_i + Σ_{i=p+1}^{n} (u_i^T b / α_i) x_i

The approximation:

    x(λ) = Σ_{i=1}^{p} ( (α_i u_i^T b + λ² β_i v_i^T d) / (α_i² + λ² β_i²) ) x_i + Σ_{i=p+1}^{n} (u_i^T b / α_i) x_i

The difference:

    x(λ) − x = Σ_{i=1}^{p} ( α_i (β_i u_i^T b − α_i v_i^T d) / (β_i (α_i² + λ² β_i²)) ) x_i

It is apparent that as λ grows larger, the approximation error is reduced. This method is attractive because it only requires ordinary LS solving.