QR Decomposition

When solving an overdetermined system by projection (i.e. computing a least squares solution), the following method is often used:

• Factorize A = Q·R with R upper triangular and Q orthogonal, i.e. Q^T·Q = I.
• Compute y = Q^T·b.
• Solve Rx = y by back substitution, ignoring the rows of R that do not correspond to columns of the original A (only the top square block of R is used).

Q can be obtained by applying Gram-Schmidt orthogonalization to the columns of A and extending the result to an orthonormal basis of the full space; R holds the coefficients of the Gram-Schmidt process. (In practice not Gram-Schmidt but another, numerically more stable process – "Householder transformations" – is used.)

Eigenvalues

Computing the characteristic polynomial as a determinant and finding its roots is a very unstable process. Instead, eigenvalues are computed by transforming the matrix:

• The matrix is converted by orthogonal transformations to "almost upper triangular" form (upper Hessenberg form).
• The matrix is then iteratively transformed to upper triangular form.
• The eigenvalues are the diagonal entries.

This process can be performed by the LAPACK routines sgeev/dgeev.

Nonlinear equations

We are given a function f : R → R and want to find (one or all) z with f(z) = 0. Typically, methods work by iteration: starting at a point x₀, they iteratively approximate a zero z. If there are several zeroes, it might be necessary to work with several start values. The three main methods are:

• Bisection
• Newton's method (using tangents)
• Secant method

In general, the problems are:

• How to select good start values.
• How to enforce convergence for "bad" start values.
• How long to iterate.

Quadratic, Cubic, Quartic

We have seen the formula for the solutions of a quadratic equation. Similar formulas exist for equations of degree 3 and 4, but they are numerically unstable. Furthermore, one can show (this is done in an abstract algebra course) that there cannot be such a formula (in radicals) for polynomials of higher degree.

Newton's method

We have that

0 = f(z) ≈ f(x) + f'(x)·(z − x).

Solving for z gives the iteration (replace x by the zero of the tangent line):

x → x − f(x)/f'(x)

This method converges if x₀ is chosen close enough to z (and f' has no zeroes in the interval; in particular, z must not be a double zero of f). If we let e_k = x_k − z be the error, we obtain

e_{k+1} = x_{k+1} − z = x_k − z − f(x_k)/f'(x_k)
        = −(f(x_k) − f'(x_k)·e_k)/f'(x_k)
        = (1/2)·(f''(ξ_k)/f'(x_k))·e_k²

for some ξ_k in the interval (Taylor approximation of 0 = f(z) by a degree 1 polynomial with remainder term around x_k). As x_k → z we get approximately

e_{k+1} ≈ (f''(z)/(2·f'(z)))·e_k²,

i.e. each step roughly doubles the number of correct digits.

Problem: Bad (or no) convergence if f'(z) = 0.

As a stop criterion, check:

• The step width (change between consecutive iterates) is smaller than some tolerance.
• A given upper limit for the number of iterations is reached.

Generalizations of Newton's method exist for multidimensional systems.
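To make the iteration and the two stop criteria concrete, here is a minimal sketch in Python (not part of the original notes; the function name, tolerance, and iteration limit are assumptions chosen for illustration):

def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    # Newton iteration x -> x - f(x)/f'(x).
    x = x0
    for _ in range(max_iter):          # stop criterion: iteration limit
        step = f(x) / fprime(x)        # breaks down if f'(x) is (near) zero
        x = x - step
        if abs(step) < tol:            # stop criterion: small step width
            return x
    raise ArithmeticError("no convergence within max_iter iterations")

# Example: sqrt(2) as the positive zero of f(x) = x^2 - 2
print(newton(lambda x: x*x - 2, lambda x: 2*x, x0=1.0))

Starting from x0 = 1.0 the step widths shrink quadratically; a start value at a zero of f' (here x0 = 0) makes the division fail, matching the problem noted above.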
Systems of polynomial equations

Consider a system of polynomial equations in several variables:

f_1(x_1, ..., x_n) = 0
f_2(x_1, ..., x_n) = 0
...
f_m(x_1, ..., x_n) = 0

To solve this system we want to eliminate variables in a similar way as when solving a system of linear equations.

Problem: Which term should be eliminated first – for example xy versus yz?

Convention: For x_1^{α_1}·x_2^{α_2}···x_n^{α_n} write x^α.

Gröbner basis approach

We define an ordering (lex ordering) on monomials: x^α ≺ x^β if α < β lexicographically. (One can define "admissible" orderings in more generality; one main variant compares the total degrees first.)

This way, we identify in every polynomial p a leading term lt(p).

If S = {p_1, ..., p_m} is a set of polynomials, we say that a polynomial f reduces modulo S if q = lt(p_i)·r for a monomial q occurring in f, some monomial r, and some i. The reduction of f modulo S is the polynomial obtained by subtracting such multiples of the p_i until no leading term divides any monomial any longer.

Note: In this process the monomials in f become smaller (with respect to ≺), so the process can take only finitely many steps.

S-polynomial

To define some measure of "reduction", we define for two polynomials p, q their S-polynomial as

S(p, q) = (l/lt(p))·p − (l/lt(q))·q, where l = lcm(lt(p), lt(q)).

Observation 1: Common zeroes of p and q are zeroes of S(p, q).

Observation 2: We can also reduce the S-polynomial modulo p and q and get a "smaller" polynomial without losing common zeroes.

Example: Let p = x²y³ + 3xy⁴ and q = 3xy⁴ + 2x³y, so lt(p) = x²y³ and lt(q) = 2x³y. Then lcm(lt(p), lt(q)) = 2x³y³ and

S(p, q) = 2x·p − y²·q = 6x²y⁴ − 3xy⁶.

We now can reduce S(p, q) modulo p and get:

S(p, q) − 6y·p = −3xy⁶ − 18xy⁵.

Buchberger's Algorithm

Given a set F of polynomials, we now iterate this process.

Require: F = (f_1, ..., f_s).
Ensure: A set G = (g_1, ..., g_t).
begin
  G := F;
  repeat
    G' := G;
    for every pair {p, q} with p ≠ q in G' do
      S := S(p, q); (S-polynomial)
      S := the reduction of S modulo G';
      if S ≠ 0 then G := G ∪ {S}; fi;
    end for;
  until G = G';
end

Gröbner bases

The resulting set G is called a Gröbner basis of F. (One can in addition reduce the basis polynomials against each other and this way obtain a reduced Gröbner basis.)

Observation: Common zeroes of the polynomials in F are common zeroes of the polynomials in G.

Note: One might get different performance/results for a different ordering of the variables.

Theorem: If one can obtain from F polynomials that involve only the last variable, this process will find them. One can thus use a back-substitution approach to solve for common zeroes.

Example

Consider the equations

x² + y² + z² = 1,  x² + y² = z,  x = y;

respectively the set of polynomials

{x² + y² + z² − 1, x² + y² − z, x − y}.

The (reduced) Gröbner basis calculation in Maple proceeds like this:

> with(Groebner);
> f:=[x^2+y^2+z^2-1,x^2+y^2-z,x-y];
> g:=gbasis(f,plex(x,y,z));

g := [2y² − z, z² + z − 1, x − y]

We now solve first for z, then for x and y.

Application

Suppose we want to find the maximum value of the function f(x, y, z) = x³ + 2xyz − z² subject to the constraint (points on a sphere)

x² + y² + z² = 1.

By the method of Lagrange multipliers, we know that ∇f = λ∇g, with g(x, y, z) = x² + y² + z² − 1, at a local maximum or minimum. The three partial derivatives and the constraint give the equations:

3x² + 2yz = 2xλ
2xz = 2yλ
2xy − 2z = 2zλ
x² + y² + z² = 1

We now compute a Gröbner basis for z ≺ y ≺ x ≺ λ and get:

λ − (3/2)x − (3/2)yz − (167616/3835)z⁶ + (36717/590)z⁴ − (134419/7670)z²,
x² + y² + z² − 1,
xy − (19584/3835)z⁵ + (1999/295)z³ − (6403/3835)z,
xz + yz² − (1152/3835)z⁵ − (108/295)z³ + (2556/3835)z,
y³ + yz² − y − (9216/3835)z⁵ + (906/295)z³ − (2562/3835)z,
y²z − (6912/3835)z⁵ + (827/295)z³ − (3839/3835)z,
yz³ − yz − (576/59)z⁶ + (1605/118)z⁴ − (453/118)z²,
z⁷ − (1763/1152)z⁵ + (655/1152)z³ − (11/288)z

Solving the last polynomial for z yields

z = 0, ±1, ±2/3, ±√11/(8√2),

and from this one can solve, for each z-value, for the corresponding x and y values and finally test for maxima/minima.

Observation: This process can be done in an exact way, or even using variables as coefficients. There are many issues with making this process effective, for example using different orderings.
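As a cross-check of the Maple example above, the same Gröbner basis computation can be sketched in Python with SymPy (an assumption of this writeup; the original notes use Maple):

from sympy import symbols, groebner

x, y, z = symbols('x y z')
F = [x**2 + y**2 + z**2 - 1, x**2 + y**2 - z, x - y]
# lex order with x > y > z, corresponding to Maple's plex(x, y, z)
G = groebner(F, x, y, z, order='lex')
print(list(G))  # expected: [x - y, 2*y**2 - z, z**2 + z - 1]

Back substitution then proceeds as described: z² + z − 1 = 0 gives z = (−1 ± √5)/2, of which only the positive root leads to real solutions of 2y² = z, and finally x = y.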