Lagrange Multipliers and the Rayleigh Quotient

advertisement
Lagrange Multipliers and the Rayleigh Quotient
Our goal is to find the extremum of the function f (x) : Rn → R subject
to the constraints hi (x) = 0 for i = 1 . . . m (hi : Rn → R).
Theorem (Lagrange multipliers): If x∗ is an extremum of f subject to the
constraints hi (x) = 0 there exist scalars λ1 , . . . , λm such that
∗
∇f (x ) +
m
X
λi ∇hi (x∗ ) = 0
(1)
i=1
where ∇f is the gradient of f . In other words, if x∗ is an extremum subject
to the constraints then
∂f (x∗ ) X ∂hi (x∗ )
+
λi
= 0 for j = 1 . . . n
∂xj
xj
i=1
m
(2)
λ1 , . . . , λm are called Lagrange multipliers.
Example: The Rayleigh Quotient
max
x
xT Ax
xT x
(3)
where A is symmetric.
T
′T
′
= xxTAx
for x′ = cx and c 6= 0 ∈ R, therefore we will
Notice that xx′TAx
x′
x
solve for x with a unit norm kxk22 = 1.
max xT Ax
s.t. xT x = 1
(4)
L(x) = xT Ax + λ(xT x − 1)
(5)
The Lagrangian is
1
Taking the derivative with respect to x:
∂L(x)
= xT (A + AT ) + 2λxT
∂x
∂L(x)
= xT x − 1 (the original constraint)
∂λ
∂L(x)
=0⇒
∂x
xT (A + AT ) = −2λxT ⇒
(A + AT )x = −2λx ⇒ (A is symmetric)
Ax = λ̃x where (λ̃ = −2λ)
(6)
(7)
(8)
(9)
(10)
(11)
Hence the maximum and the minimum are obtained for x an eigenvector of
A (the Lagrange multipliers provide a necessary condition. The extremum is
indeed obtained because xT Ax is a continuous function and the unit sphere
is a compact set). For x an eigenvector of A with unit norm, xT Ax =
xT λx = λxT x = λ. Therefore the maximum is obtained at the eigenvector
corresponding to the largest eigenvalue of A.
The Generalized Rayleigh Quotient is:
max
x
xT Ax
xT Bx
(12)
For A, B symmetric and positive definite. Again, to choose a certain solution
we will constrain x:
maxx xT x
s.t. xT Bx = 1
(13)
We will solve the Generalized Rayleigh Quotient by reduction to the Rayleigh
Quotient.
Define B = D T D, C = D −T AD −1 and y = Dx. Notice that C ∈ P SDN.
xT Ax
xT D T D −T AD −1 Dx
y T Cy
=
=
xT Bx
xT D T Dx
yT y
(14)
This is the Rayleigh Quotient with the symmetric matrix C and the unit
vector y (y T y = xT D T Dx = xT BX = 1). The solution is the first eigenvector
of C. Notice that the first eigenvalue of C and B −1 A is the same (substitute
y = Dx in D −T AD −1 y = λy), but their first eigenvector is different.
2
Download