QR Decomposition

When solving an overdetermined system by projection (that is, computing a least
squares solution), the following method is often used:
• Factorize A = Q · R with R upper triangular and Q orthogonal, i.e.
Q^T Q = I.
• Compute y = Q^T · b.
• Solve Rx = y by back substitution, ignoring the rows that do not
belong to columns of the original A.
Q can be obtained by applying Gram-Schmidt orthogonalization to
the columns of A and extending to an orthonormal basis of R^n. R holds
the coefficients of the Gram-Schmidt process.
(In practice not Gram-Schmidt but another, numerically more stable process
is used: "Householder transformations".)
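The three steps above can be sketched in numpy (a hypothetical 4×2 example; numpy's reduced QR returns only the columns of Q corresponding to columns of A, so no rows have to be discarded explicitly):

```python
import numpy as np

# Hypothetical overdetermined system: 4 equations, 2 unknowns.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 2.0, 2.0, 3.0])

# Step 1: factorize A = Q R (reduced form: Q is 4x2 with orthonormal columns).
Q, R = np.linalg.qr(A)

# Step 2: compute y = Q^T b.
y = Q.T @ b

# Step 3: solve the 2x2 upper triangular system R x = y.
x = np.linalg.solve(R, y)

print(x)  # the least squares solution
```

This yields the same result as a dedicated least squares solver such as `np.linalg.lstsq`.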
Eigenvalues
Computing the characteristic polynomial as a determinant and then finding
its roots is a very unstable process. Instead, eigenvalues are computed by
transforming the matrix:
• The matrix is converted by orthogonal transformations to "almost
upper triangular" form (upper Hessenberg form).
• The matrix is transformed (by QR iteration) to upper triangular form.
• The eigenvalues are the diagonal entries.
This process can be performed by the LAPACK routine sgeev/dgeev.
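The core idea can be illustrated in numpy with the unshifted (and therefore slow) QR iteration; this is a sketch of the principle, not what dgeev actually does internally:

```python
import numpy as np

# A small symmetric test matrix (so all eigenvalues are real).
A = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 1.0]])

# Unshifted QR iteration: T_{k+1} = R_k Q_k = Q_k^T T_k Q_k is an
# orthogonal similarity, so it preserves the eigenvalues, and it
# drives T towards upper triangular form.
T = A.copy()
for _ in range(200):
    Q, R = np.linalg.qr(T)
    T = R @ Q

# The eigenvalues are now (approximately) the diagonal entries.
eigs = np.sort(np.diag(T))
print(eigs)
```

LAPACK's dgeev adds the Hessenberg reduction and shifting strategies on top of this idea to make it fast and robust.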
Nonlinear equations
We are given a function f : R → R and want to find (one or all) z with
f (z) = 0.
Typically, methods work by iteration: starting at a point x0, they
iteratively approximate a zero z.
If there are several zeroes, it might be necessary to work with several
start values.
The three main methods are:
• Bisection
• Newton’s method (using tangents)
• Secant method
In general, problems are:
• How to select good start values.
• How to enforce convergence for ‘bad’ start values.
• How long to iterate.
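The first of these methods, bisection, can be sketched as follows (the sketch also shows two of the stopping criteria just mentioned, a tolerance and an iteration limit):

```python
def bisect(f, a, b, tol=1e-12, max_iter=200):
    """Find a zero of f in [a, b]; f(a) and f(b) must differ in sign."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "need a sign change on [a, b]"
    for _ in range(max_iter):             # upper limit on iterations
        m = (a + b) / 2
        fm = f(m)
        if fm == 0 or (b - a) / 2 < tol:  # interval small enough
            return m
        if fa * fm < 0:
            b, fb = m, fm                 # zero lies in [a, m]
        else:
            a, fa = m, fm                 # zero lies in [m, b]
    return (a + b) / 2

# Example: sqrt(2) as the zero of x^2 - 2 on the start interval [1, 2].
print(bisect(lambda x: x * x - 2, 1.0, 2.0))
```

Bisection halves the interval in every step, so it always converges once a sign change is bracketed, but only linearly.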
Quadratic, Cubic, Quartic
We’ve seen the formula for the solutions of a quadratic equation.
Similar formulas exist for equations of degree 3 and 4, but they are
numerically unstable.
Furthermore one can show (this is done in an abstract algebra course)
that there cannot be a general formula in radicals for polynomials of
degree 5 or higher (Abel–Ruffini theorem).
Newton’s method
We have that
0 = f(z) ≈ f(x) + f'(x)(z − x)
Solving for z gives the iteration (replace x with the zero of the tangent
line):
x → x − f(x)/f'(x)
This method converges if x0 is chosen close enough to z (and f' has
no zeroes in the interval; in particular z is not a double zero of f).
If we let e_k = x_k − z denote the error, we obtain:

e_{k+1} = x_{k+1} − z = x_k − z − f(x_k)/f'(x_k)
        = −( f(x_k) − f'(x_k) e_k ) / f'(x_k)
        = (1/2) · f''(ξ_k)/f'(x_k) · e_k^2

for ξ_k in the interval (Taylor approximation of 0 = f(z) by a degree 1
polynomial around x_k).
As x_k → z we get approximately

e_{k+1} ≈ (1/2) · f''(z)/f'(z) · e_k^2,

i.e. the error is squared in each step, so the number of correct digits
roughly doubles.
Problem: Bad (or no) convergence if f'(z) = 0.
As a stopping criterion check:
• Change per step smaller than some tolerance.
• A given upper limit on the number of iterations.
Generalizations of Newton’s method exist for multidimensional systems.
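The iteration and both stopping criteria can be sketched as:

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton iteration x <- x - f(x)/f'(x) with the two stop criteria above."""
    x = x0
    for _ in range(max_iter):      # upper limit on iterations
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:        # change per step small enough
            return x
    return x

# Zero of f(x) = x^2 - 2, starting at x0 = 1: converges to sqrt(2);
# the number of correct digits roughly doubles in each step.
print(newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0))
```

With this quadratic convergence, only a handful of iterations are needed from a good start value.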
Systems of polynomial equations
Consider a system of polynomial equations in several variables:
f1(x1, . . . , xn) = 0
f2(x1, . . . , xn) = 0
⋮
fm(x1, . . . , xn) = 0
To solve this system we want to eliminate variables in a similar way
as with solving a system of linear equations.
Problem: How do we "eliminate" a variable between terms such as xy and yz?
Convention: For x1^α1 · x2^α2 ··· xn^αn we write x^α.
Gröbner basis approach
We define an ordering (lex ordering) on monomials: x^α ≺ x^β if α < β
lexicographically.
(One can define "admissible" orderings more generally. One main
variant is to compare the total degrees first.)
This way, we identify in every polynomial p a leading term lt(p).
If S = {p1, . . . , pm} is a set of polynomials, we say that a polynomial f
reduces at S if q = lt(pi) · r for a monomial q in f, some monomial r,
and some i.
The reduction of f at S is the polynomial obtained by subtracting
multiples of the pi until no leading term divides any monomial any longer.
Note: In this process the monomials in f become smaller, so the process
can take only finitely many steps.
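This reduction can be tried out with sympy's `reduced` (a sketch with a hypothetical f and divisors p1, p2, using lex order with x > y):

```python
from sympy import expand, reduced, symbols

x, y = symbols('x y')

# Hypothetical input: reduce f at S = {p1, p2} under lex order with x > y.
f = x**2*y + x*y**2 + y**2
p1 = x*y - 1
p2 = y**2 - 1

# reduced() performs exactly the subtraction process described above:
# it returns quotients q1, q2 and a remainder none of whose monomials
# is divisible by a leading term of p1 or p2.
quotients, remainder = reduced(f, [p1, p2], x, y, order='lex')
print(quotients, remainder)

# Division identity: f = q1*p1 + q2*p2 + remainder.
check = expand(quotients[0]*p1 + quotients[1]*p2 + remainder - f)
```

Note that the result can depend on the order in which the divisors are tried; this is exactly the defect that Gröbner bases repair.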
S-polynomial
To define some measure of "reduction", we define for two polynomials
p, q their S-polynomial as

S(p, q) = (l / lt(p)) · p − (l / lt(q)) · q

where l = lcm(lt(p), lt(q)).
Observation 1: Common zeroes of p and q are zeroes of S(p, q).
Observation 2: We can also reduce the S-polynomial at p and q
and get a “smaller” polynomial without losing common zeroes.
Example:
p = x^2 y^3 + 3x y^4,  q = 3x y^4 + 2x^3 y,
lt(p) = x^2 y^3,  lt(q) = 2x^3 y
Then lcm(lt(p), lt(q)) = 2x^3 y^3 and
S(p, q) = 2x · p − y^2 · q = −3x y^6 + 6x^2 y^4
We now can reduce S(p, q) at p and get:
S(p, q) − 6y · p = −3x y^6 − 18x y^5
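This computation can be checked with sympy (assuming, as in the example, an ordering with x > y; the reduction subtracts 6y·p to cancel the term 6x^2 y^4):

```python
from sympy import expand, symbols

x, y = symbols('x y')

p = x**2*y**3 + 3*x*y**4
q = 3*x*y**4 + 2*x**3*y

# Leading terms under the ordering used in the example (x > y):
lt_p = x**2*y**3
lt_q = 2*x**3*y
l = 2*x**3*y**3                 # lcm(lt(p), lt(q))

# S(p, q) = (l/lt(p))*p - (l/lt(q))*q = 2x*p - y^2*q
S = expand(l/lt_p * p - l/lt_q * q)

# Reduce S at p: subtract 6y*p to cancel the monomial 6*x^2*y^4.
S_red = expand(S - 6*y*p)
print(S, S_red)
```

Both leading terms cancel in S(p, q), which is the whole point of the construction.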
Buchberger’s Algorithm
Given a set F of polynomials, we now iterate this process.
Require: F = (f1, . . . , fs).
Ensure: A set G = (g1, . . . , gt).
begin
  G := F;
  repeat
    G′ := G;
    for every pair {p, q}, p ≠ q in G′ do
      S := S(p, q);                     (S-polynomial)
      S := reduction of S modulo G′;
      if S ≠ 0 then G := G ∪ {S}; fi;
    end for
  until G = G′;
end
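A naive transcription into Python with sympy (a sketch: `LT`, `lcm` and `reduced` stand in for lt, lcm and the reduction modulo G′; real implementations use pair-selection strategies and Buchberger's criteria to skip useless pairs):

```python
from itertools import combinations
from sympy import LT, expand, lcm, reduced, symbols

def s_poly(p, q, gens, order='lex'):
    """S(p, q) = (l/lt(p))*p - (l/lt(q))*q with l = lcm of the leading terms."""
    lp, lq = LT(p, *gens, order=order), LT(q, *gens, order=order)
    l = lcm(lp, lq)
    return expand(l / lp * p - l / lq * q)

def buchberger(F, gens, order='lex'):
    """Naive version of the iteration on the slide (no optimizations)."""
    G = list(F)
    while True:
        G0 = list(G)
        for p, q in combinations(G0, 2):
            S = s_poly(p, q, gens, order)
            if S == 0:
                continue
            _, S = reduced(S, G0, *gens, order=order)  # reduce modulo G0
            if S != 0:
                G.append(S)
        if len(G) == len(G0):   # G = G0: nothing new was added
            return G

x, y, z = symbols('x y z')
G = buchberger([x**2 + y**2 + z**2 - 1, x**2 + y**2 - z, x - y], (x, y, z))
print(G)
```

The result is a (generally non-reduced) Gröbner basis: every polynomial of the ideal now reduces to 0 modulo G.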
Gröbner bases
The resulting set G is called a Gröbner basis of F. (One can reduce the
basis elements against each other and in this way obtain a reduced Gröbner basis.)
Observation: Common zeroes of polynomials in F are common
zeroes of polynomials in G.
Note: One might get different performance/results for a different
ordering of variables.
Theorem: If one can obtain polynomials from F that only involve
the last variable, this process will find them.
One can thus use a back-substitution approach to solve for common
zeroes.
Example
Consider the equations
x^2 + y^2 + z^2 = 1,  x^2 + y^2 = z,  x = y;
respectively the set of polynomials
{x^2 + y^2 + z^2 − 1, x^2 + y^2 − z, x − y}
The (reduced) Gröbner basis calculation in Maple proceeds as follows:
> with(Groebner);
> f:=[x^2+y^2+z^2-1,x^2+y^2-z,x-y];
> g:=gbasis(f,plex(x,y,z));
g := [2y^2 − z, z^2 + z − 1, x − y]
We now solve first for z, then for x and y.
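For comparison, the same basis can be computed in Python with sympy (sympy scales the reduced basis to be monic, so 2y^2 − z shows up as y^2 − z/2):

```python
from sympy import groebner, symbols

x, y, z = symbols('x y z')

f = [x**2 + y**2 + z**2 - 1, x**2 + y**2 - z, x - y]

# Lex order with x > y > z corresponds to Maple's plex(x, y, z).
g = groebner(f, x, y, z, order='lex')
print(g.exprs)
```

The element z^2 + z − 1 involves only the last variable, so back-substitution can start there.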
Application
Suppose we want to find the maximum value of the function
f(x, y, z) = x^3 + 2xyz − z^2
subject to the constraint (points on the sphere)
g(x, y, z) = x^2 + y^2 + z^2 = 1
By the method of Lagrange multipliers, we know that ∇f = λ∇g at
a local maximum or minimum.
The three partial derivatives and the constraint give the equations:
3x^2 + 2yz = 2xλ
2xz = 2yλ
2xy − 2z = 2zλ
x^2 + y^2 + z^2 = 1
We now compute a Gröbner basis for z ≺ y ≺ x ≺ λ and get:

λ − (3/2)x − (3/2)yz − (167616/3835)z^6 + (36717/590)z^4 − (134419/7670)z^2,
x^2 + y^2 + z^2 − 1,
xy − (19584/3835)z^5 + (1999/295)z^3 − (6403/3835)z,
xz + yz^2 − (1152/3835)z^5 − (108/295)z^3 + (2556/3835)z,
y^3 + yz^2 − y − (9216/3835)z^5 + (906/295)z^3 − (2562/3835)z,
y^2 z − (6912/3835)z^5 + (827/295)z^3 − (3839/3835)z,
yz^3 − yz − (576/59)z^6 + (1605/118)z^4 − (453/118)z^2,
z^7 − (1763/1152)z^5 + (655/1152)z^3 − (11/288)z
Solving for z yields:
z = 0, ±1, ±2/3, ±√11/(8√2)
and from this one can solve for each z-value the corresponding x and
y values and finally test for maxima/minima.
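Assuming the univariate basis element z^7 − (1763/1152)z^5 + (655/1152)z^3 − (11/288)z from the computation above, sympy confirms these roots exactly (note √11/(8√2) = √22/16):

```python
from sympy import Rational, roots, symbols

z = symbols('z')

# The basis element that involves only the last variable z.
p = (z**7 - Rational(1763, 1152)*z**5
     + Rational(655, 1152)*z**3 - Rational(11, 288)*z)

# Exact roots with multiplicities; p factors over Q into z and three
# rational quadratics, so all roots come out in closed form.
r = roots(p, z)
print(r)
```

This is the "exact way" referred to below: all arithmetic stays in rational numbers and radicals, with no floating point rounding.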
Observation: This process can be carried out exactly (in rational
arithmetic), or even with variables as coefficients.
There are many issues involved in making this process efficient, for
example the choice of ordering.