Polynomial roots and approximate greatest common divisors

by Joab Winkler

Lecture notes for a Summer School at The Computer Laboratory, The University of Oxford, Oxford, England, 17-21 September 2007

Supported by the Engineering and Physical Sciences Research Council

Copyright © Joab Winkler, 2007

Introduction

The determination of the roots of a polynomial is a classical problem in mathematics that has played an important part in its development through the centuries. It has motivated the introduction of many important concepts of mathematics, including irrational, negative and complex numbers. Although the study of polynomial equations does not presently play a leading role in general computational mathematics, it forms an important part of computer algebra, which is applied widely in algebraic geometry computations.

There exist many algorithms for the computation of the roots of a polynomial, but the results deteriorate as the degree of the polynomial increases, or the multiplicity of one or more of its roots increases, or the roots become more closely spaced. Furthermore, even roundoff errors are sufficient to cause totally incorrect results, and thus the problems are compounded when the coefficients are subject to uncertainty.

These notes describe a new method for solving polynomial equations that has been shown to give significantly better results, particularly for ‘difficult polynomials’ in the presence of noise, than the standard methods, such as Newton’s method. The new method uses resultants and subresultants, structured matrices, constrained optimisation, information theory and the method of non-linear least squares. A very important part of this algorithm is the determination of an approximate greatest common divisor of two inexact polynomials, and this topic is covered in detail.
The theory is presented for the power (monomial) and Bernstein polynomial bases because the power basis is the most frequently used polynomial basis, and the Bernstein basis is used extensively in geometric modelling for the representation of curves and surfaces.

These notes form the course material for a series of lectures on the computation of the roots of a polynomial and approximate greatest common divisors that were given at the Computer Laboratory, The University of Oxford, Oxford, England in September 2007. These lectures were given in response to the program call Maths for Engineers by the Engineering and Physical Sciences Research Council.

Joab Winkler
The University of Sheffield
Sheffield, United Kingdom
September 2007

Acknowledgements

The author wishes to thank the Engineering and Physical Sciences Research Council for its financial support for the lecture course. He also wishes to thank John Allan, his PhD student, who developed the MATLAB computer code for all the examples in these notes. The material in this document forms a major part of his PhD thesis.

Abbreviations and notation

GCD ... greatest common divisor
LS ... least squares
LSE ... least squares with equality
MDL ... minimum description length
ML ... maximum likelihood
STLN ... structured total least norm
$H(p(x))$ ... Shannon entropy of the discrete probability distribution $p(x)$
$L(x)$ ... code length of the symbols stored in $x$
$S(f, g)$ ... Sylvester resultant matrix for the power basis polynomials $f(x)$ and $g(x)$
$S_k(f, g)$ ... Sylvester subresultant matrix of order $k$ for the power basis polynomials $f(x)$ and $g(x)$
$S(p, q)$ ... Sylvester resultant matrix for the Bernstein basis polynomials $p(x)$ and $q(x)$
$S_k(p, q)$ ... Sylvester subresultant matrix of order $k$ for the Bernstein basis polynomials $p(x)$ and $q(x)$
$T(p, q)$ ... Sylvester resultant matrix for the scaled Bernstein basis polynomials $p(x)$ and $q(x)$
$T_k(p, q)$ ... Sylvester subresultant matrix of order $k$ for the scaled Bernstein basis polynomials $p(x)$ and $q(x)$
$\eta_c(\tilde{x}_0)$ ... componentwise backward error of the approximate root $\tilde{x}_0$
$\eta_n(\tilde{x}_0)$ ... normwise backward error of the approximate root $\tilde{x}_0$
$\kappa_c(x_0)$ ... componentwise condition number of the root $x_0$
$\kappa_n(x_0)$ ... normwise condition number of the root $x_0$
$\rho(x_0)$ ... condition number of the root $x_0$ that preserves its multiplicity
$h^{(k)}(x)$ ... $k$th derivative of the polynomial $h(x)$
$\mathbf{h}$ ... vector of coefficients of the polynomial $h(x)$
$\log x$ ... $\log_2 x$
$\|\cdot\|$ ... $\|\cdot\|_2$

Contents

Introduction
Acknowledgements
Abbreviations and notation
Contents
List of Figures
1 Introduction
1.1 Historical background
1.2 A review of some methods for computing the roots of a polynomial
1.3 Examples of errors
1.4 Summary
2 Condition numbers and errors
2.1 Backward errors and the forward error
2.2 Condition numbers
2.3 Condition numbers, backward errors and the forward error
2.4 The geometry of ill-conditioned polynomials
2.5 A simple polynomial root finder
2.6 Summary
3 The Sylvester resultant matrix
3.1 The Sylvester resultant matrix for power basis polynomials
3.1.1 Subresultants of the Sylvester matrix for power basis polynomials
3.2 The Sylvester resultant and subresultant matrices for Bernstein basis polynomials
3.3 Summary
4 Approximate greatest common divisors
4.1 Previous work
4.2 The non-uniqueness of the Sylvester resultant matrix
4.3 Structured matrices and constrained minimisation
4.3.1 Algorithms for the solution of the LSE problem
4.3.2 Computational details
4.4 Approximate GCDs of Bernstein basis polynomials
4.5 An approximate GCD of a polynomial and its derivative
4.6 GCD computations by partial singular value decomposition
4.7 Summary
5 A robust polynomial root finder
5.1 GCD computations and the multiplicities of the roots
5.2 Non-linear least squares for Bernstein polynomials
5.3 Summary
6 Minimum description length
6.1 Minimum description length
6.2 Shannon entropy and the length of a code
6.3 An expression for the code length of a model
6.3.1 The precision that minimises the total code length
6.4 Examples
6.5 Summary
Bibliography

List of Figures

1.1 The computed roots of $(x-1)^{100}$.
1.2 Perturbation regions of $f(x) = (x-1)^m$, $m = 1, 6, 11, \ldots$, when the constant term is perturbed by $2^{-10}$.
2.1 Backward and forward errors for $y = g(x)$. The solid lines represent exact computations, and the dashed line represents the computed approximation.
2.2 The root distribution of four polynomials after the coefficients have been perturbed: (a) $f(x) = (x-0.6)^3(x-1)^5$, (b) $f(x) = (x-0.7)^3(x-1)^5$, (c) $f(x) = (x-0.8)^3(x-1)^5$, (d) $f(x) = (x-0.9)^3(x-1)^5$.
4.1 (i)(a) The maximum allowable value of $\|z_{f_1}\|$, which is equal to $\|f_1(x)\|/\mu$, (b) the computed value of $\|z_{f_1}\|$; (ii)(a) the maximum allowable value of $\|z_{g_1}\|/\alpha$, which is equal to $\|g_1(x)\|/\mu$, (b) the computed value of $\|z_{g_1}\|/\alpha$; (iii) the normalised residual $\|r_{\text{norm}}\|$; (iv) the singular value ratio $\sigma_{54}/\sigma_{55}$.
4.2 The normalised singular values of the Sylvester matrix, on a logarithmic scale, for (i) the theoretically exact data $S(\hat{f}_1, \hat{g}_1)$, ♦; (ii) the given inexact data $S(f_1, g_1)$; (iii) the computed data $S(\tilde{f}_{1,0}, \tilde{g}_{1,0})$, ×, for $\alpha = 10^{-0.6}$. All the polynomials are normalised by the geometric mean of their coefficients.
4.3 The normalised singular values of the Sylvester matrix, on a logarithmic scale, for (i) the theoretically exact data $S(\hat{f}_1, \hat{g}_1)$, ♦; (ii) the given inexact data $S(f_1, g_1)$; (iii) the computed data $S(\tilde{f}_{1,0}, \tilde{g}_{1,0})$, ×, for $\alpha = 10^{1.4}$. All the polynomials are normalised by the geometric mean of their coefficients.
4.4 (i)(a) The maximum allowable value of $\|z_{f_1}\|$, which is equal to $\|f_1(x)\|/\mu$, (b) the computed value of $\|z_{f_1}\|$; (ii)(a) the maximum allowable value of $\|z_{g_1}\|/\alpha$, which is equal to $\|g_1(x)\|/\mu$, (b) the computed value of $\|z_{g_1}\|/\alpha$; (iii) the normalised residual $\|r_{\text{norm}}\|$; (iv) the singular value ratio $\sigma_{51}/\sigma_{52}$.
4.5 The normalised singular values of the Sylvester matrix, on a logarithmic scale, for (i) the theoretically exact data $S(\hat{f}_2, \hat{g}_2)$, ♦; (ii) the given inexact data $S(f_2, g_2)$; (iii) the computed data $S(\tilde{f}_{2,0}, \tilde{g}_{2,0})$, ×, for $\alpha = 10^{0.1}$.
All the polynomials are normalised by the geometric mean of their coefficients.
4.6 The variation with $\alpha$ of (i)(a) the maximum allowable value of $\|z_p\|$, which is equal to $\|p(x)\|/\mu$, (b) the computed value of $\|z_p\|$; (ii)(a) the maximum allowable value of $\|z_q\|/\alpha$, which is equal to $\|q(x)\|/\mu$, (b) the computed value of $\|z_q\|/\alpha$; (iii) the normalised residual $\|r_{\text{norm}}\|$; (iv) the singular value ratio $\sigma_{40}/\sigma_{41}$. The horizontal and vertical axes are logarithmic in the four plots.
4.7 The normalised singular values, on a logarithmic scale, of the Sylvester resultant matrix for (i) the theoretically exact polynomials $\hat{p}(x)$ and $\hat{q}(x)$, ♦; (ii) the given inexact polynomials $p(x)$ and $q(x)$; (iii) the corrected polynomials $\tilde{p}(x)$ and $\tilde{q}(x)$ for $\alpha = 10^{2.8}$, ×. All the polynomials are scaled by the geometric mean of their coefficients.
6.1 (a) A third order approximation polynomial, (b) a fifth order interpolation curve, and (c) a linear approximation, of a set of data points.

Chapter 1

Introduction

This chapter contains a brief historical review of the determination of the roots of a polynomial, which is one of the classical problems in applied mathematics. Some commonly used numerical methods for computing the roots of a polynomial are considered, and their advantages and disadvantages are stated. Several simple examples are included in order to show that roundoff errors due to floating point arithmetic, and errors in the polynomial coefficients, can cause a significant deterioration in the computed roots. These examples provide the motivation for the subsequent chapters, in which a robust method that solves the problems highlighted in these examples is described.

1.1 Historical background

The computation of the roots of a polynomial $f(x)$,

$$f(x) = \sum_{i=0}^{m} a_i \phi_i(x), \tag{1.1}$$

where $\phi_i(x)$, $i = 0, \ldots, m$, is a set of linearly independent basis functions, and $a_i$ is the coefficient, assumed to be real, of $\phi_i(x)$, has a rich and long history. An excellent historical review of this problem, and of the branches of mathematics that have been motivated by solving (1.1), is given by Pan [33]. In particular, irrational and complex numbers, algebraic groups, fields and ideals are closely related to this problem, and in more recent times, the development of accurate and stable numerical methods for the solution of (1.1) has been a major area of research. The solution of (1.1) continues to be a major area of research, motivated mainly by problems in algebraic geometry and geometric modelling, because it arises in the calculation of the intersection points of curves and surfaces that are defined by polynomials.

Closed form solutions that only involve arithmetic operations and radicals have been obtained for polynomials of degree up to and including four, and Galois (1811-1832) proved that a closed form solution of (1.1) cannot exist for polynomials of degree greater than four. In spite of this, the fundamental theorem of algebra states that (1.1) always has a complex solution for all positive integers $m$. This absence of a closed form solution motivated the introduction of iterative and numerical algorithms, and some of them are reviewed in the next section.

1.2 A review of some methods for computing the roots of a polynomial

Many numerical methods, using linear algebra, linear programming and Fourier analysis, have been developed for the solution of (1.1). These methods include Bairstow’s method [11], Graeffe’s root-squaring method [11], the algorithm of Jenkins and Traub [21, 22], Laguerre’s method [10, 15], Müller’s method [11], and variants of Newton’s method [31]:

(a) Bairstow’s method is only valid for polynomials that have real coefficients.
Since complex roots for these polynomials occur in complex conjugate pairs, and Bairstow’s method computes the real quadratic factors that generate these complex conjugate roots, complex arithmetic is avoided in this method. Convergence requires a very good initial approximation of the exact root, in which case convergence is quadratic, but the method converges slowly for quadratic factors whose multiplicity is greater than one.

(b) Graeffe’s root-squaring method replaces the given polynomial by another polynomial whose roots are the squares of the roots of the original polynomial. The roots of the transformed polynomial are therefore more widely separated than those of the original polynomial, particularly for roots whose magnitude is greater than one. This process of squaring the roots is repeated until the roots can be calculated directly from the coefficients, such that the approximation to each root has been calculated to a sufficient number of decimal places. The sign of each root is then easily computed from the original polynomial. It is clear that this method fails when there are roots of equal magnitude, but this problem can be overcome [35].

(c) The Jenkins-Traub algorithm involves three stages and is only valid for polynomials with real coefficients. The roots of the polynomial are computed one at a time, and roots of multiplicity m are found m times. They are computed in approximately increasing order of magnitude in order to avoid the instability that arises when deflating with a large root [43]. It is fast and globally convergent for all distributions of the roots, and real arithmetic is used, from which it follows that complex conjugate roots are computed as quadratic factors.

(d) Laguerre’s method is almost always guaranteed to converge to a root of the polynomial, for all values of the initial estimate. Empirical evidence shows that it performs well, which makes it a good choice for a general purpose polynomial root solver.
Each iteration of Laguerre’s method requires that the first and second derivatives be evaluated at the estimated root, which makes the method computationally expensive. The method has cubic convergence for simple roots, and linear convergence for multiple roots.

(e) Müller’s method is based on the approximation of the polynomial in the neighbourhood of the root by a quadratic function, which is better than the linear approximation used by Newton’s method. An estimate of the root is calculated from this quadratic function, and the process is repeated by updating the points at each stage of the iteration. Experience shows that Müller’s method converges at a rate that is similar to Newton’s method, but unlike Newton’s method, it does not require the evaluation of derivatives. The method may converge to a complex root from a real initial approximation.

(f) Newton’s method is an iterative procedure based on a Taylor series of the polynomial about the approximate root. Convergence requires that the estimate be sufficiently near the exact root, and problems can occur at or near a multiple root if precautions are not taken. After a root is computed, it is removed from the polynomial by division, in order that repetition of the iterative scheme does not cause convergence to the same root. This removal process of a computed root is called deflation, and polynomials of reduced degrees are obtained as successive roots are deflated. These polynomials are subject not only to roundoff errors, but also to the errors that occur when an approximate root is deflated out, and this has a detrimental effect on the accuracy of the computed roots. The accuracy of the roots can be improved by polishing them against the original polynomial after all the roots have been deflated [43].
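The Newton iteration, deflation and polishing steps described in item (f) can be sketched as follows. This is a minimal illustration, not the method developed in these notes: the function names `newton_root` and `roots_by_deflation`, the tolerance, the iteration limit and the starting point are all assumptions chosen for the example, and the polynomial is taken in the power basis with coefficients in descending order (the NumPy convention).

```python
import numpy as np


def newton_root(coeffs, x0, tol=1e-12, max_iter=200):
    """Newton's method for one root of a power-basis polynomial.

    coeffs holds the coefficients in descending powers (NumPy convention).
    The name, tolerance and iteration limit are illustrative assumptions.
    """
    dcoeffs = np.polyder(coeffs)
    x = complex(x0)
    for _ in range(max_iter):
        dfx = np.polyval(dcoeffs, x)
        if dfx == 0:  # stationary point: the Newton step is undefined
            break
        step = np.polyval(coeffs, x) / dfx
        x -= step
        if abs(step) <= tol * max(1.0, abs(x)):
            break
    return x


def roots_by_deflation(coeffs, x0=0.0):
    """Compute all the roots by repeated Newton iteration and deflation."""
    original = np.asarray(coeffs, dtype=complex)
    work = original.copy()
    roots = []
    while len(work) > 2:
        r = newton_root(work, x0)
        roots.append(r)
        # Deflation: divide out the factor (x - r); the remainder is discarded.
        work, _ = np.polydiv(work, np.array([1.0, -r]))
    roots.append(-work[1] / work[0])  # root of the remaining linear factor
    # Polishing: refine every root against the original polynomial.
    return [newton_root(original, r) for r in roots]


# (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6, simple well-separated roots
demo_roots = roots_by_deflation([1.0, -6.0, 11.0, -6.0])
print(sorted(z.real for z in demo_roots))
```

The example polynomial has simple, well-separated real roots, which is the favourable case; the sketch takes none of the precautions needed near a multiple root.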
These methods yield satisfactory results on the ‘average polynomial’, that is, a polynomial of moderate degree with simple and well-separated roots, assuming that a good starting point in the iterative scheme is used. This heuristic for the ‘average polynomial’ has exceptions, the best example of which is the Wilkinson polynomial,

$$f(x) = \prod_{i=1}^{20} (x - i) = (x-1)(x-2)\cdots(x-20), \tag{1.2}$$

because its roots are very difficult to compute reliably. More generally, as the degree of the polynomial and/or the multiplicity of a root increases, the quality of the results obtained by standard numerical methods deteriorates, such that they cannot be used for these classes of polynomials. The next section contains some examples that illustrate the problems of computing the roots of polynomials.

1.3 Examples of errors

This section contains several examples that show the difficulties of accurately computing the roots of a polynomial that has multiple roots. Examples 1.1 and 1.2 show that roundoff errors can cause a significant deterioration in the computed roots, and Example 1.3 shows the effect of a perturbation in a coefficient of a polynomial of high degree.

Example 1.1. Consider the fourth order polynomial

$$x^4 - 4x^3 + 6x^2 - 4x + 1 = (x-1)^4,$$

whose root is $x = 1$ with multiplicity 4. The roots function in MATLAB returns the roots¹

1.0002, 1.0000 + 0.0002i, 1.0000 - 0.0002i, 0.9998,

which shows that roundoff errors due to floating point arithmetic, which are about $O(10^{-16})$, are sufficient to cause a relative error in the solution of $2 \times 10^{-4}$.

Example 1.2. The roots of the polynomial $(x-1)^{100}$ were computed by the roots function in MATLAB, and the results are shown in Figure 1.1.

[Figure 1.1: The computed roots of $(x-1)^{100}$, plotted in the complex plane. Axes: Real (horizontal), Imag (vertical).]
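The behaviour in Examples 1.1 and 1.2 can be explored with a short sketch. It uses NumPy rather than MATLAB, which is an assumption of this illustration: `numpy.roots` also computes the eigenvalues of the companion matrix, but the computed digits need not agree with the values quoted above. The helper name `multiple_root_spread` is hypothetical.

```python
import numpy as np


def multiple_root_spread(m):
    """Return the computed roots of (x - 1)^m and their largest distance from 1.

    Hypothetical helper for illustration only.
    """
    coeffs = np.poly(np.ones(m))  # coefficients of (x - 1)^m, descending powers
    computed = np.roots(coeffs)   # eigenvalues of the companion matrix
    return computed, np.max(np.abs(computed - 1.0))


roots4, err4 = multiple_root_spread(4)
print(roots4)
print(err4)
```

For m = 4 the reported deviation from the exact root is typically many orders of magnitude larger than the unit roundoff, in line with Example 1.1; calling the helper with larger values of m shows the computed roots spreading further from x = 1.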
It is seen that the multiple root has split up into 100 distinct roots, such that it cannot be deduced that the original polynomial contains a root of multiplicity 100 at $x = 1$.

¹ This function uses the QR algorithm, which is numerically stable [17], to compute the eigenvalues of the companion matrix.

Example 1.3. Consider the effect of perturbing the constant coefficient of the polynomial $(x-1)^{10}$ by $-\epsilon$. The roots of the perturbed polynomial are the solutions of $(x-1)^{10} - \epsilon = 0$, that is,

$$
\begin{aligned}
x &= 1 + \epsilon^{\frac{1}{10}} \\
  &= 1 + \epsilon^{\frac{1}{10}} \left( \cos\frac{2\pi k}{10} + i \sin\frac{2\pi k}{10} \right), \qquad k = 0, \ldots, 9, \\
  &= 1 + \frac{1}{2} \left( \cos\frac{2\pi k}{10} + i \sin\frac{2\pi k}{10} \right), \qquad k = 0, \ldots, 9,
\end{aligned}
$$

if $\epsilon = 2^{-10}$, and thus the perturbed roots lie on a circle in the complex plane, with centre at $(1, 0)$ and radius $1/2$. It follows that a relative error in the constant coefficient of $2^{-10} = 1/1024$ causes a relative error of $1/2$ in the solution, which corresponds to a condition number of $2^9 = 512$.

If the more general equation $(x-1)^m = 0$ is considered, and the constant coefficient is perturbed by $-2^{-10}$, then the general solution is

$$x = 1 + \frac{1}{2^{\frac{10}{m}}} \left( \cos\frac{2\pi k}{m} + i \sin\frac{2\pi k}{m} \right), \qquad k = 0, \ldots, m-1.$$

These roots are shown in Figure 1.2, and it is seen that as $m \to \infty$, they lie on a circle of unit radius and centred at $(1, 0)$ in the complex plane.

[Figure 1.2: Perturbation regions of $f(x) = (x-1)^m$, $m = 1, 6, 11, \ldots$, when the constant term is perturbed by $2^{-10}$.]

The results in these examples are in accord with the remarks of Goedecker [12], who performed comparative numerical tests on the Jenkins-Traub algorithm, a modified version of Laguerre’s algorithm, and the application of the QR decomposition to the companion matrix of the polynomial. Goedecker notes on page 1062 that ‘None of the methods gives acceptable results for polynomials of degrees higher than 50’, and he notes on page 1063 that ‘If roots of high multiplicity exist, any . . . method has to be used with caution’.

It is seen that the error in the computed roots in Examples 1.1 and 1.2, and Example 1.3, arises from roundoff errors due to floating point arithmetic and inexact data, respectively. Roundoff errors are always present in numerical computations, and their effect on the solution may be benign or significant. Uncertainties in the data (experimental errors) are present in most practical examples, and it is therefore necessary to quantify the effect of this source of error, and roundoff errors, when considering the quality of a computed solution. These two sources of error have fundamentally different properties. In particular, it is usually assumed that data errors have a Gaussian distribution, and that the errors in the coefficients of the polynomial are independent. By contrast, roundoff errors are not random, they are often correlated, and they often behave like discrete, rather than continuous, random variables [24].

1.4 Summary

This chapter considered briefly the historical background of the problem of computing the roots of a polynomial, and some of the areas of mathematics that have developed as a consequence. Several commonly used methods for the solution of polynomial equations were considered, and examples that illustrate some of the difficulties that arise were presented. These examples were included in order to illustrate the difficulty of the problem, and the next chapter will consider these problems in more detail, such that a better understanding of their origins and potential solutions is obtained.

Chapter 2

Condition numbers and errors

It was shown in Chapter 1 that errors, including roundoff errors alone, are sufficient to cause an incorrect and unacceptable solution when a multiple root of a polynomial is computed.
In this chapter, the forward error, backward error and condition number of a root of a polynomial are defined, and it is shown that they quantify the results in Chapter 1. The relationship between these quantities for a root of arbitrary multiplicity is established, and it is shown that a multiple root is ill-conditioned and that its condition number increases as its multiplicity increases when the perturbations applied to the coefficients of the polynomial are random (unstructured). This ill-conditioning must be compared with the situation that occurs when a structured perturbation that preserves the multiplicities of the roots is applied to the coefficients of the polynomial, in which case a multiple root is well-conditioned, even if it is of high multiplicity.

A very simple polynomial root finder is described in the last section of the chapter, and it will be apparent that it differs significantly from the root finders described in Chapter 1. This root finder contains, from a high level, the features of the root finder that is considered in this report. The operations that are required for this root finder are considered and it is shown that their implementation in a floating point environment is not trivial because they are either ill-conditioned or ill-posed. Moreover, the data in many practical examples is inexact, and thus a practical root finder must be robust with respect to minor perturbations in the coefficients of the polynomial.

It is assumed that the coefficients of the polynomial (1.1) are real, and that a polynomial of degree $m$ can be identified with a vector space of dimension $(m+1)$. The elements of vectors that lie in this space are the coefficients of $f(x)$ with respect to the basis functions $\phi_i(x)$, $i = 0, \ldots, m$. The 2-norm of $f(x)$ is equal to the 2-norm of the vector $a$ of its coefficients,

$$\|f(x)\| = \|a\| = \left( \sum_{i=0}^{m} a_i^2 \right)^{\frac{1}{2}}.$$
This norm is used exclusively in this document, unless stated otherwise.

2.1 Backward errors and the forward error

The error of a computed quantity must be calculated in order to assess its computational reliability. This error measure can be statistical, for example, an average case measure, or deterministic, for example, an upper bound. There exist many sources of errors in computation, and they may degrade the quality of the computed solution. Some common sources of errors include:

• Modelling errors: These errors arise when a simplified computational model of a complex physical process is used. This may occur, for example, when a low dimensional approximation of a high dimensional system is used.

• Roundoff errors: These errors necessarily occur in floating point arithmetic.

• Data errors: These errors are usually significantly larger than roundoff errors, and they are therefore important when computations are performed on experimental data.

• Discretisation errors: These errors occur when a continuum is replaced by a discrete approximation, as occurs, for example, in the finite element and finite difference methods.

These are the main sources of errors, but other sources must also be considered sometimes. For example, data may be sparse or the sample data may not be representative of the population, in which case statistical analysis is required for the correct analysis of the results.

The simplest error measure is the forward error, which is defined as the relative error $|\delta x_0| / |x_0|$ in the solution $x_0$, but this cannot always be computed because the exact solution $x_0$ is not known. Another error measure, the backward error, is therefore used, and it is based on the observation that the computed solution, which is in error, is the theoretically exact solution of a neighbouring problem, that is, a problem that is ‘near’ the problem whose solution is desired.
Thus the forward error is a measure of the distance between the exact solution and the computed solution, and the backward error is a measure of the distance between the problem whose solution is sought and the problem whose solution has been computed. The difference between these two errors is shown in Figure 2.1, which is reproduced from [17], for the function evaluation $y = g(x)$. It is seen that the forward error is measured in the output (solution) space, and the backward error is measured in the input (data) space.

[Figure 2.1: Backward and forward errors for $y = g(x)$. The solid lines represent exact computations, and the dashed line represents the computed approximation. The input space contains $x$ and $x + \delta x$, separated by the backward error; the output space contains $y = g(x)$ and $\hat{y} = g(x + \delta x)$, separated by the forward error.]

The forward and backward errors of the root of a polynomial are related by its condition number, and there are two types of backward error and condition number. Specifically, each of these quantities can be measured in the componentwise sense and the normwise sense, and they differ in the error model that is assigned to the coefficients of the polynomials. In particular, it is assumed in the componentwise error model that each coefficient $a_i$ is perturbed to $a_i + \delta a_i$ such that

$$a_i + \delta a_i \le a_i (1 + r \varepsilon_c), \qquad i = 0, \ldots, m,$$

where $r$ is a uniformly distributed random variable in the range $[-1, +1]$ and $\varepsilon_c^{-1}$ is the upper bound of the componentwise signal-to-noise ratio. It follows that the componentwise error model is defined by

$$|\delta a_i| \le \varepsilon_c |a_i|, \qquad i = 0, \ldots, m, \tag{2.1}$$

that is, $\delta a_i$ is a uniformly distributed random variable in the range $[-\varepsilon_c |a_i|, +\varepsilon_c |a_i|]$. This is a very simple model of roundoff error, which, as noted in [24], contains many assumptions that are not satisfied in practice. This simplicity of (2.1) does not, however, diminish its use because it allows a closed form expression for the condition number of the roots of a polynomial to be calculated using a simple probabilistic error model.

The normwise error model is defined by

$$\|\delta a\| \le \varepsilon_n \|a\|, \tag{2.2}$$

where $\varepsilon_n^{-1}$ is the upper bound of the normwise signal-to-noise ratio. Let $\mu \ge 0$ be a random variable that has a one-sided Gaussian distribution, and thus if $P$ denotes probability, then

$$
\begin{aligned}
P\left(0 < \mu \le \|\delta a\|\right)
&= \int_0^{\|\delta a\|} \sqrt{\frac{2}{\pi}} \, \frac{1}{\varepsilon_n} \exp\left( -\frac{\mu^2}{2\varepsilon_n^2} \right) d\mu \\
&\le \int_0^{\varepsilon_n \|a\|} \sqrt{\frac{2}{\pi}} \, \frac{1}{\varepsilon_n} \exp\left( -\frac{\mu^2}{2\varepsilon_n^2} \right) d\mu \\
&= \int_0^{\|a\|} \sqrt{\frac{2}{\pi}} \exp\left( -\frac{\nu^2}{2} \right) d\nu \\
&= P\left(0 < \nu \le \|a\|\right),
\end{aligned}
$$

where $\mu = \nu \varepsilon_n$ and $\nu$ has a one-sided Gaussian distribution. This inequality provides a probabilistic interpretation of the error model (2.2).

The definitions of the componentwise and normwise error models (2.1) and (2.2) respectively show that a componentwise quantity (backward error and condition number) is more refined than its normwise equivalent.

The componentwise and normwise backward errors are defined in Definitions 2.1 and 2.2, and formulae for them are developed in Theorems 2.1 and 2.2. Expressions for the componentwise and normwise condition numbers are developed in Theorems 2.3 and 2.4.

Definition 2.1. The componentwise backward error of the approximation $\tilde{x}_0$, which may be complex, of the root $x_0$ of $f(x)$ is defined as

$$\eta_c(\tilde{x}_0) = \min \left\{ \varepsilon_c : \sum_{i=0}^{m} \tilde{a}_i \phi_i(\tilde{x}_0) = 0 \ \text{and} \ |\delta a_i| \le \varepsilon_c |a_i| ; \ \tilde{a} = a + \delta a \right\}.$$

Theorem 2.1. The componentwise backward error of the approximation $\tilde{x}_0$, which may be complex, of the root $x_0$ of $f(x)$ is given by

$$\eta_c(\tilde{x}_0) = \frac{|f(\tilde{x}_0)|}{\sum_{i=0}^{m} |a_i \phi_i(\tilde{x}_0)|}. \tag{2.3}$$

The perturbations in the coefficients that achieve this backward error are

$$\delta a_k = - \frac{a_k f(\tilde{x}_0)}{\sum_{i=0}^{m} |a_i \phi_i(\tilde{x}_0)|} \left( \frac{\overline{a_k \phi_k(\tilde{x}_0)}}{|a_k \phi_k(\tilde{x}_0)|} \right), \qquad k = 0, \ldots, m, \tag{2.4}$$

where $\overline{(\cdot)}$ denotes the complex conjugate of $(\cdot)$.

Proof. By definition, $\tilde{a}_i = a_i + \delta a_i$, $i = 0, \ldots, m$, and thus

$$\sum_{i=0}^{m} \tilde{a}_i \phi_i(\tilde{x}_0) = \sum_{i=0}^{m} a_i \phi_i(\tilde{x}_0) + \sum_{i=0}^{m} \delta a_i \phi_i(\tilde{x}_0) = f(\tilde{x}_0) + \sum_{i=0}^{m} \delta a_i \phi_i(\tilde{x}_0). \tag{2.5}$$

By assumption, the term on the left hand side is equal to zero, and thus

$$|f(\tilde{x}_0)| = \left| \sum_{i=0}^{m} \delta a_i \phi_i(\tilde{x}_0) \right| \le \sum_{i=0}^{m} \left| \frac{\delta a_i}{a_i} \right| |a_i \phi_i(\tilde{x}_0)| \le \max_k \left| \frac{\delta a_k}{a_k} \right| \sum_{i=0}^{m} |a_i \phi_i(\tilde{x}_0)|.$$

It follows that

$$\varepsilon_c \ge \frac{|f(\tilde{x}_0)|}{\sum_{i=0}^{m} |a_i \phi_i(\tilde{x}_0)|},$$

and the result (2.3) is established.

Consider now the perturbations in the coefficients that achieve this backward error. By definition, these perturbations must satisfy

$$\eta_c(\tilde{x}_0) = \min \left\{ \varepsilon_c : \varepsilon_c \ge \left| \frac{\delta a_k}{a_k} \right|, \ k = 0, \ldots, m \right\}, \tag{2.6}$$

and (2.5) implies that they must also satisfy

$$f(\tilde{x}_0) = - \sum_{i=0}^{m} \delta a_i \phi_i(\tilde{x}_0). \tag{2.7}$$

Consider the perturbations, from (2.6), $|\delta a_k| = \eta_c(\tilde{x}_0) |a_k|$, or equivalently,

$$|\delta a_k| = \frac{|a_k| \, |f(\tilde{x}_0)|}{\sum_{i=0}^{m} |a_i \phi_i(\tilde{x}_0)|}, \qquad k = 0, \ldots, m. \tag{2.8}$$

It must be verified that these perturbations also satisfy (2.7), and this is now established. It follows from (2.8) that $\delta a_k$ is given by

$$\delta a_k = \frac{a_k f(\tilde{x}_0)}{\sum_{i=0}^{m} |a_i \phi_i(\tilde{x}_0)|} \, h_k(\tilde{x}_0), \qquad |h_k(\tilde{x}_0)| = 1, \qquad k = 0, \ldots, m, \tag{2.9}$$

where each of the functions $h_k(\tilde{x}_0)$ is of unit magnitude and to be determined. The substitution of this equation into the right hand side of (2.7) yields

$$f(\tilde{x}_0) = - \frac{\sum_{k=0}^{m} a_k f(\tilde{x}_0) \phi_k(\tilde{x}_0) h_k(\tilde{x}_0)}{\sum_{i=0}^{m} |a_i \phi_i(\tilde{x}_0)|},$$

and this equation determines the functions $h_k(\tilde{x}_0)$, $k = 0, \ldots, m$. It follows that these functions must satisfy

$$\frac{\sum_{k=0}^{m} a_k \phi_k(\tilde{x}_0) h_k(\tilde{x}_0)}{\sum_{i=0}^{m} |a_i \phi_i(\tilde{x}_0)|} = -1,$$

and thus

$$h_k(\tilde{x}_0) = - \frac{\overline{a_k \phi_k(\tilde{x}_0)}}{|a_k \phi_k(\tilde{x}_0)|}, \qquad k = 0, \ldots, m.$$

Equation (2.9) establishes the result (2.4).

Definition 2.2. The normwise backward error of the approximation $\tilde{x}_0$, which may be complex, of the root $x_0$ of $f(x)$ is defined as

$$\eta_n(\tilde{x}_0) = \min \left\{ \varepsilon_n : \sum_{i=0}^{m} \tilde{a}_i \phi_i(\tilde{x}_0) = 0 \ \text{and} \ \|\delta a\| \le \varepsilon_n \|a\| ; \ \tilde{a} = a + \delta a \right\}.$$

Theorem 2.2.
The normwise backward error, measured in the 2-norm, of the approximation $\tilde{x}_0$, which may be complex, of the root $x_0$ of $f(x)$ is given by

$$\eta_n(\tilde{x}_0) = \frac{|f(\tilde{x}_0)|}{\|\phi(\tilde{x}_0)\| \, \|a\|}. \tag{2.10}$$

The perturbations in the coefficients that achieve this backward error are

$$\delta a_i = \frac{f(\tilde{x}_0) \, v_i}{\|\phi(\tilde{x}_0)\|}, \qquad \text{where} \quad v(\tilde{x}_0)^T \phi(\tilde{x}_0) = -\|\phi(\tilde{x}_0)\| \quad \text{and} \quad \|v(\tilde{x}_0)\| = 1. \tag{2.11}$$

Proof. The proof of this theorem follows closely that of Theorem 2.1. In particular, since the term on the left hand side of (2.5) is equal to zero, it follows that

$$|f(\tilde{x}_0)| = \left| \sum_{i=0}^{m} \delta a_i \phi_i(\tilde{x}_0) \right| \le \frac{\|\delta a\|}{\|a\|} \, \|a\| \, \|\phi(\tilde{x}_0)\|.$$

Thus

$$\frac{|f(\tilde{x}_0)|}{\|\phi(\tilde{x}_0)\| \, \|a\|} \le \frac{\|\delta a\|}{\|a\|} \le \varepsilon_n, \tag{2.12}$$

and (2.10) follows.

The perturbations in the coefficients that achieve this backward error must satisfy (2.7) and (2.12). Specifically, consider the perturbation vector whose norm is given by

$$\|\delta a\| = \eta_n(\tilde{x}_0) \|a\| = \frac{|f(\tilde{x}_0)|}{\|\phi(\tilde{x}_0)\|},$$

from which it follows that

$$\delta a_k = \frac{f(\tilde{x}_0) \, v_k(\tilde{x}_0)}{\|\phi(\tilde{x}_0)\|}, \qquad \text{where} \quad \|v(\tilde{x}_0)\| = \left( \sum_{k=0}^{m} |v_k(\tilde{x}_0)|^2 \right)^{\frac{1}{2}} = 1,$$

and the functions $v_k(\tilde{x}_0)$ are to be determined. The substitution of these expressions for the perturbations $\delta a_k$ into (2.7) yields

$$\sum_{k=0}^{m} v_k(\tilde{x}_0) \phi_k(\tilde{x}_0) = -\|\phi(\tilde{x}_0)\|,$$

and thus the functions $v_k(\tilde{x}_0)$ must satisfy

$$v(\tilde{x}_0)^T \phi(\tilde{x}_0) = -\|\phi(\tilde{x}_0)\| \qquad \text{and} \qquad \|v(\tilde{x}_0)\| = 1.$$

This establishes the result (2.11).

The next section considers the componentwise and normwise condition numbers of a root of a polynomial.

2.2 Condition numbers

Expressions for the componentwise and normwise condition numbers of a root of a polynomial are derived in Theorems 2.3 and 2.4 respectively.

Theorem 2.3. Let the coefficients $a_i$ of $f(x)$ in (1.1) be perturbed to $a_i + \delta a_i$ where $|\delta a_i| \le \varepsilon_c |a_i|$, $i = 0, \ldots, m$. Let the real root $x_0$ of $f(x)$ have multiplicity $r$, and let one of these $r$ roots be perturbed to $x_0 + \delta x_0$ due to the perturbations in the coefficients.
Then the componentwise condition number of $x_0$ is

$$\kappa_c(x_0) = \max_{|\delta a_i| \le \varepsilon_c |a_i|} \frac{1}{\varepsilon_c} \frac{|\delta x_0|}{|x_0|} = \frac{1}{\varepsilon_c^{1 - \frac{1}{r}}} \frac{1}{|x_0|} \left( \frac{r!}{|f^{(r)}(x_0)|} \sum_{i=0}^{m} |a_i \phi_i(x_0)| \right)^{\frac{1}{r}}. \tag{2.13}$$

Proof. Consider the perturbed polynomial $f(x + \delta x)$,

$$f(x + \delta x) = \sum_{i=0}^{m} a_i \phi_i(x + \delta x) = \sum_{i=0}^{m} (a_i + \delta a_i) \phi_i(x + \delta x) - \sum_{i=0}^{m} \delta a_i \phi_i(x + \delta x).$$

Since $x_0 + \delta x_0$ is a root of the perturbed polynomial, that is,

$$\sum_{i=0}^{m} (a_i + \delta a_i) \phi_i(x_0 + \delta x_0) = 0,$$

it follows that

$$f(x_0 + \delta x_0) = -\sum_{i=0}^{m} \delta a_i \phi_i(x_0 + \delta x_0).$$

By Taylor's theorem,

$$f(x_0 + \delta x_0) = \sum_{k=0}^{m} \frac{\delta x_0^k}{k!} f^{(k)}(x_0) \qquad \text{and} \qquad \phi_i(x_0 + \delta x_0) = \sum_{k=0}^{m} \frac{\delta x_0^k}{k!} \phi_i^{(k)}(x_0),$$

and hence

$$\sum_{k=0}^{m} \frac{\delta x_0^k}{k!} f^{(k)}(x_0) = -\sum_{i=0}^{m} \delta a_i \sum_{k=0}^{m} \frac{\delta x_0^k}{k!} \phi_i^{(k)}(x_0).$$

Since $x_0$ is an $r$-tuple root, $f^{(k)}(x_0) = 0$ for $0 \le k \le r - 1$, and the perturbation $\delta x_0$ is assumed to be small, it follows that

$$\delta x_0^r = -\frac{r!}{f^{(r)}(x_0)} \sum_{i=0}^{m} \delta a_i \sum_{k=0}^{m} \frac{\delta x_0^k}{k!} \phi_i^{(k)}(x_0),$$

and hence

$$\begin{aligned}
|\delta x_0| &= \left| \frac{r!}{f^{(r)}(x_0)} \right|^{\frac{1}{r}} \left| \sum_{i=0}^{m} \delta a_i \sum_{k=0}^{m} \frac{\delta x_0^k}{k!} \phi_i^{(k)}(x_0) \right|^{\frac{1}{r}} \\
&\le \left( \frac{r!}{|f^{(r)}(x_0)|} \sum_{i=0}^{m} |\delta a_i| \left| \sum_{k=0}^{m} \frac{\delta x_0^k}{k!} \phi_i^{(k)}(x_0) \right| \right)^{\frac{1}{r}} \\
&\le \varepsilon_c^{\frac{1}{r}} \left( \frac{r!}{|f^{(r)}(x_0)|} \sum_{i=0}^{m} |a_i \phi_i(x_0)| \right)^{\frac{1}{r}},
\end{aligned} \tag{2.14}$$

since $\delta x_0$ is small and thus only the term corresponding to $k = 0$ is retained. It follows that

$$\frac{1}{\varepsilon_c} \frac{|\delta x_0|}{|x_0|} \le \frac{1}{\varepsilon_c^{1 - \frac{1}{r}}} \frac{1}{|x_0|} \left( \frac{r!}{|f^{(r)}(x_0)|} \sum_{i=0}^{m} |a_i \phi_i(x_0)| \right)^{\frac{1}{r}},$$

and the result (2.13) follows.

Theorem 2.4. Let the coefficients $a_i$ of $f(x)$ in (1.1) be perturbed to $a_i + \delta a_i$ where $\|\delta a\| \le \varepsilon_n \|a\|$. Let the real root $x_0$ of $f(x)$ have multiplicity $r$, and let one of these $r$ roots be perturbed to $x_0 + \delta x_0$ due to the perturbations in the coefficients. Then the normwise condition number of $x_0$ is

$$\kappa_n(x_0) = \max_{\|\delta a\| \le \varepsilon_n \|a\|} \frac{1}{\varepsilon_n} \frac{|\delta x_0|}{|x_0|} = \frac{1}{\varepsilon_n^{1 - \frac{1}{r}}} \frac{1}{|x_0|} \left( \frac{r!}{|f^{(r)}(x_0)|} \|a\| \, \|\phi(x_0)\| \right)^{\frac{1}{r}}. \tag{2.15}$$

Proof. It follows from (2.14) that

$$\begin{aligned}
|\delta x_0| &= \left| \frac{r!}{f^{(r)}(x_0)} \right|^{\frac{1}{r}} \left| \sum_{i=0}^{m} \delta a_i \sum_{k=0}^{m} \frac{\delta x_0^k}{k!} \phi_i^{(k)}(x_0) \right|^{\frac{1}{r}} \\
&\le \left| \frac{r!}{f^{(r)}(x_0)} \right|^{\frac{1}{r}} \left( \sum_{i=0}^{m} |\delta a_i| \left| \sum_{k=0}^{m} \frac{\delta x_0^k}{k!} \phi_i^{(k)}(x_0) \right| \right)^{\frac{1}{r}} \\
&= \left| \frac{r!}{f^{(r)}(x_0)} \right|^{\frac{1}{r}} \left( \sum_{i=0}^{m} |\delta a_i \phi_i(x_0)| \right)^{\frac{1}{r}},
\end{aligned}$$

since $\delta x_0$ is small, and thus

$$|\delta x_0| \le \left( \frac{r!}{|f^{(r)}(x_0)|} \|\delta a\| \, \|\phi(x_0)\| \right)^{\frac{1}{r}} \le \varepsilon_n^{\frac{1}{r}} \left( \frac{r!}{|f^{(r)}(x_0)|} \|a\| \, \|\phi(x_0)\| \right)^{\frac{1}{r}}.$$

It follows that

$$\frac{1}{\varepsilon_n} \frac{|\delta x_0|}{|x_0|} \le \frac{1}{\varepsilon_n^{1 - \frac{1}{r}}} \frac{1}{|x_0|} \left( \frac{r!}{|f^{(r)}(x_0)|} \|a\| \, \|\phi(x_0)\| \right)^{\frac{1}{r}},$$

and the result (2.15) follows.

The condition numbers (2.13) and (2.15) assume that the perturbations $\delta a_i$ are sufficiently small, such that only lowest order terms need be considered. There exist circumstances when this assumption is not satisfied, for example, two close but distinct roots, and this situation is considered in [46]. Furthermore, condition numbers are, by definition, worst case error measures, but it is also possible to derive expressions for average case componentwise and normwise error measures [44].

The basis in which a polynomial is expressed may have a significant effect on the condition numbers of its roots. For example, the componentwise condition numbers of the roots of the power and Bernstein basis forms of the Wilkinson polynomial (1.2), scaled so that all the roots lie in the interval $[0, 1]$,

$$f(x) = \prod_{i=1}^{20} \left( x - \frac{i}{20} \right),$$

are shown in Tables 1a and 1b in [9], and it is seen that they are consistently lower, by several orders of magnitude, for the Bernstein basis form of $f(x)$ than for the power basis form of $f(x)$.

The next section establishes the connection between the forward errors, backward errors and condition numbers of a root of a polynomial.

2.3 Condition numbers, backward errors and the forward error

Formulae for the componentwise and normwise backward errors of a root $x_0$ of multiplicity $r$ of $f(x)$ are stated in (2.3) and (2.10) respectively, and formulae for the componentwise and normwise condition numbers of $x_0$ are stated in (2.13) and (2.15) respectively. It is shown in this section that these quantities are related in a simple formula to the forward error of $x_0$.
It is readily verified that

$$\kappa_c(x_0) \left( \frac{\eta_c(\tilde{x}_0)}{\varepsilon_c} \right)^{\frac{1}{r}} \varepsilon_c = \frac{1}{|x_0|} \left( \frac{r!}{|f^{(r)}(x_0)|} \, \frac{|f(\tilde{x}_0)| \sum_{i=0}^{m} |a_i \phi_i(x_0)|}{\sum_{i=0}^{m} |a_i \phi_i(\tilde{x}_0)|} \right)^{\frac{1}{r}},$$

and since $x_0$ is a root of multiplicity $r$, it follows that

$$f(\tilde{x}_0) = f(x_0 + \delta x_0) \approx f(x_0) + \frac{(\delta x_0)^r}{r!} f^{(r)}(x_0) = \frac{(\delta x_0)^r}{r!} f^{(r)}(x_0),$$

and thus

$$\kappa_c(x_0) \left( \frac{\eta_c(\tilde{x}_0)}{\varepsilon_c} \right)^{\frac{1}{r}} \varepsilon_c = \frac{|\delta x_0|}{|x_0|} \left( \frac{\sum_{i=0}^{m} |a_i \phi_i(x_0)|}{\sum_{i=0}^{m} |a_i \phi_i(x_0 + \delta x_0)|} \right)^{\frac{1}{r}} \approx \frac{|\delta x_0|}{|x_0|},$$

to lowest order. It is readily verified that an identical expression is valid for the normwise condition number and backward error, and thus to lowest order,

$$\frac{|\delta x_0|}{|x_0|} = \kappa_c(x_0) \left( \frac{\eta_c(\tilde{x}_0)}{\varepsilon_c} \right)^{\frac{1}{r}} \varepsilon_c, \tag{2.16}$$

and

$$\frac{|\delta x_0|}{|x_0|} = \kappa_n(x_0) \left( \frac{\eta_n(\tilde{x}_0)}{\varepsilon_n} \right)^{\frac{1}{r}} \varepsilon_n. \tag{2.17}$$

It follows that if $r = 1$, that is, $x_0$ is a simple root, its forward error is equal to the product of its condition number and the backward error of its approximation $\tilde{x}_0$. Furthermore, if $x_0$ is ill-conditioned and simple, it has a large forward error even if its backward error is small, that is, with reference to Figure 2.1, a small error in the input space leads to a large error in the output space. If $r$ is sufficiently large, then (2.16) and (2.17) reduce to

$$\frac{|\delta x_0|}{|x_0|} \approx \kappa_c(x_0) \varepsilon_c \qquad \text{and} \qquad \frac{|\delta x_0|}{|x_0|} \approx \kappa_n(x_0) \varepsilon_n,$$

which are the conditions for which (2.13) and (2.15) hold with equality, that is, $\kappa_c(x_0)$ and $\kappa_n(x_0)$ attain their maximum values as $r$ increases.

The multiplicity of $x_0$ enters the expressions for $\eta_c(\tilde{x}_0)$ and $\eta_n(\tilde{x}_0)$ indirectly by the constraints that a multiple root places on the coefficients $a_i$. This must be compared with the expressions for $\kappa_c(x_0)$ and $\kappa_n(x_0)$, for which $r$ is both an implicit and explicit argument, which reveals the advantage of the backward errors.
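The relation (2.16) can be checked numerically. The following is a minimal sketch, assuming the power basis $\phi_i(x) = x^i$ and coefficients stored with the constant term first; the function names and the two test polynomials are illustrative choices and do not come from the notes. It evaluates the componentwise backward error (2.3) and condition number (2.13), and compares the right hand side of (2.16) with the known relative forward error.

```python
from math import factorial

def polyval(a, x):
    """Evaluate f(x) = sum(a[i] * x**i) by Horner's rule (a[0] is the constant term)."""
    result = 0.0
    for coeff in reversed(a):
        result = result * x + coeff
    return result

def polyder(a, r):
    """Power basis coefficients of the r-th derivative of f."""
    for _ in range(r):
        a = [i * a[i] for i in range(1, len(a))]
    return a

def abs_sum(a, x):
    """sum_i |a_i phi_i(x)| for the power basis phi_i(x) = x**i (an assumption)."""
    return sum(abs(c * x ** i) for i, c in enumerate(a))

def backward_error_c(a, x_approx):
    """Componentwise backward error (2.3) of the approximate root x_approx."""
    return abs(polyval(a, x_approx)) / abs_sum(a, x_approx)

def condition_c(a, x0, r, eps_c):
    """Componentwise condition number (2.13) of the root x0 of multiplicity r."""
    f_r = abs(polyval(polyder(a, r), x0))
    bracket = factorial(r) * abs_sum(a, x0) / f_r
    return bracket ** (1.0 / r) / (eps_c ** (1.0 - 1.0 / r) * abs(x0))

def forward_error_estimate(a, x0, x_approx, r, eps_c):
    """Right hand side of (2.16): kappa_c * (eta_c / eps_c)**(1/r) * eps_c."""
    eta = backward_error_c(a, x_approx)
    return condition_c(a, x0, r, eps_c) * (eta / eps_c) ** (1.0 / r) * eps_c

# Simple root x0 = 2 of x^2 - 3x + 2, approximated by 2 + 1e-6.
simple = forward_error_estimate([2.0, -3.0, 1.0], 2.0, 2.0 + 1e-6, 1, 1e-7)
# Double root x0 = 1 of (x - 1)^2 (x - 2), approximated by 1 + 1e-4.
double = forward_error_estimate([-2.0, 5.0, -4.0, 1.0], 1.0, 1.0 + 1e-4, 2, 1e-7)
print(simple, 1e-6 / 2.0)  # estimate and true relative forward error
print(double, 1e-4 / 1.0)
```

In both cases the estimate should agree with the true relative forward error to lowest order in $\delta x_0$, and the value chosen for $\varepsilon_c$ cancels in the product, as (2.16) suggests.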
It is well known that finite precision arithmetic causes a multiple root to be computed as a cluster of closely spaced roots, from which it follows that a multiple root is not a differentiable function of the coefficients of the polynomial. If the radius of the cluster is small, and it is sufficiently far from the nearest neighbouring cluster or isolated root, then a simple interpretation of the computed solution is the approximation of the cluster of roots by a multiple root at the arithmetic mean of the cluster. Although this appears a simple solution with an obvious justification - one is merely 'undoing' the effects of finite precision arithmetic - and procedures for the detection and validation of the clusters have been developed [20], it can lead to totally incorrect answers [32].

Figure 2.2 shows an example in which clustering fails to provide the correct multiplicities of the roots. The figure shows the computed roots, evaluated 1000 times, of four polynomials, each of whose coefficients is perturbed by noise, using the value $\varepsilon_c = 10^{-7}$. It is seen that when the roots are well spaced, their values can be estimated by simple clustering. As the roots merge, however, the clusters also merge, such that they cannot be distinguished for the polynomials in Figures 2.2(c) and (d), and moreover, it is impossible to deduce that the polynomials contain only two distinct roots.

Figure 2.2: The root distribution of four polynomials after the coefficients have been perturbed: (a) $f(x) = (x - 0.6)^3 (x - 1)^5$, (b) $f(x) = (x - 0.7)^3 (x - 1)^5$, (c) $f(x) = (x - 0.8)^3 (x - 1)^5$, (d) $f(x) = (x - 0.9)^3 (x - 1)^5$. [Plots not reproduced: each panel shows the computed roots in the complex plane, with horizontal axis Real and vertical axis Imag, for $\varepsilon_c = 10^{-7}$.]

The perturbations considered in the derivation of (2.13) and (2.15) are random, and as noted above, they are associated with the break up of a multiple root. There exist, however, structured (non-random) perturbations that preserve the multiplicities of the roots, that is, the multiple roots do not break up, and furthermore, the multiple root is well-conditioned with respect to these perturbations. These structured perturbations are considered in the next section.

2.4 The geometry of ill-conditioned polynomials

Polynomials with one or more multiple zeros form a subset of the space of all polynomials. In particular, a root $x_0$ of multiplicity $r$ introduces $(r - 1)$ constraints on the coefficients, and since a monic polynomial of degree $m$ has $m$ degrees of freedom, it follows that the root $x_0$ lies on a manifold of dimension $(m - r + 1)$ in a space of dimension $m$. This manifold is called a pejorative manifold [19] because polynomials that are near this manifold are ill-conditioned [23].

This section introduces the pejorative manifold of a polynomial and it is shown that it plays an important role in determining when a polynomial is, and is not, ill-conditioned.
Specifically, a polynomial that lies on a pejorative manifold is well-conditioned with respect to (the structured) perturbations that keep it on the manifold, which corresponds to the situation in which the multiplicity of the roots is preserved, but it is ill-conditioned with respect to perturbations that move it off the manifold, which corresponds to the situation in which a multiple root breaks up into a cluster of simple roots.

Example 2.1 shows that the condition number of a multiple root of a polynomial is unbounded as the magnitude of the perturbation tends to zero, assuming that the perturbation does not preserve the multiplicity of the root, that is, the polynomial moves off its pejorative manifold.

Example 2.1. Consider the problem of determining the smaller real root of the polynomial

$$f(x) = (x + 1 + \sqrt{\epsilon})(x + 1 - \sqrt{\epsilon}), \tag{2.18}$$

where $|\epsilon| \le 0.01$. Since a real root does not exist for $\epsilon < 0$, the problem is ill-posed. If it is assumed that $0 \le \epsilon \le 0.1$, then the problem is well-posed because the solution is unique and changes in a continuous manner as $\epsilon$ changes. It is, however, ill-conditioned because there exists a solution $x_0, x_1$ for every value of $\epsilon$, and an arbitrarily small change in $\epsilon$ leads to an arbitrarily large change in the solution as $\epsilon \to 0$. In particular,

$$x_0 = -1 - \sqrt{\epsilon} \qquad \text{and} \qquad x_1 = -1 + \sqrt{\epsilon},$$

and thus

$$\frac{dx_0}{d\epsilon} = -\frac{1}{2\sqrt{\epsilon}} \qquad \text{and} \qquad \frac{dx_1}{d\epsilon} = \frac{1}{2\sqrt{\epsilon}}. \tag{2.19}$$

A monic quadratic polynomial is of the form $x^2 + bx + c$, and a double root exists if

$$b^2 = 4c. \tag{2.20}$$

All real quadratic monic polynomials whose coefficients lie on the curve (2.20) in $(b, c) \in \mathbb{R}^2$ have a double root, and this curve is therefore the pejorative manifold for this class of polynomial. The condition (2.20) is satisfied by $\epsilon = 0$, and the polynomial (2.18) lies near this manifold. This proximity to the manifold is the cause of its ill-conditioning.
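The growth of the sensitivity (2.19) as the polynomial (2.18) approaches the curve (2.20) can be tabulated directly. This is an illustrative sketch only: the function names and the sample values of $\epsilon$ are assumptions made here, and the expansion $(x + 1)^2 - \epsilon = x^2 + 2x + (1 - \epsilon)$ is used to read off $b = 2$ and $c = 1 - \epsilon$.

```python
from math import sqrt

def smaller_root(eps):
    """Smaller real root x0 = -1 - sqrt(eps) of (2.18), for eps >= 0."""
    return -1.0 - sqrt(eps)

def sensitivity(eps):
    """|dx0/d(eps)| = 1 / (2 sqrt(eps)) from (2.19)."""
    return 1.0 / (2.0 * sqrt(eps))

def discriminant_distance(eps):
    """b^2 - 4c for x^2 + bx + c = (x + 1)^2 - eps; it is zero on the curve (2.20)."""
    b, c = 2.0, 1.0 - eps
    return b * b - 4.0 * c

for eps in (1e-2, 1e-4, 1e-6, 1e-8):
    h = eps * 1e-3  # small step for a one-sided finite difference
    finite_difference = abs(smaller_root(eps + h) - smaller_root(eps)) / h
    print(eps, discriminant_distance(eps), finite_difference, sensitivity(eps))
```

The finite difference column should track $1/(2\sqrt{\epsilon})$, which grows without bound as $b^2 - 4c = 4\epsilon$ tends to zero.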
The numerical condition (2.19) of the roots $x_0, x_1$ of the quadratic polynomial (2.18) is inversely proportional to the square root of its distance $\epsilon$ from the quadratic polynomial that has a double root. This is a particular example of the more general fact that a polynomial is ill-conditioned if it is near a polynomial that has a multiple root.

A polynomial that is near a polynomial with a multiple root is ill-conditioned, and it is recalled from above that a polynomial with a multiple root forms a subset of the space of all polynomials. This leads to the definition of a pejorative manifold of a polynomial.

Definition 2.3. A pejorative manifold of a monic polynomial $f(x)$, of degree $m$, whose root multiplicities are $r_1, r_2, \ldots, r_k$, is a surface of dimension $m - \sum_{i=1}^{k} (r_i - 1)$ in the space $\mathbb{R}^m$ of real monic polynomials of degree $m$ in which $f(x)$ lies.

Example 2.2. Consider a cubic polynomial $f(x)$ with real roots $x_0$, $x_1$ and $x_2$,

$$(x - x_0)(x - x_1)(x - x_2) = x^3 - (x_0 + x_1 + x_2) x^2 + (x_0 x_1 + x_1 x_2 + x_2 x_0) x - x_0 x_1 x_2.$$

• If $f(x)$ has three distinct roots, then $x_0 \ne x_1 \ne x_2$, and the pejorative manifold is $\mathbb{R}^3$, apart from the points for which $x_i = x_j$, $i = 0, 1$; $j = i + 1, 2$.

• If $f(x)$ has one double root and one simple root, then $x_0 = x_1 \ne x_2$, and thus $f(x)$ can be written as

$$x^3 - (2x_1 + x_2) x^2 + (x_1^2 + 2 x_1 x_2) x - x_1^2 x_2.$$

The pejorative manifold of a cubic polynomial that has a double root is, therefore, the surface defined by

$$\left( -(2x_1 + x_2), \ \ x_1^2 + 2 x_1 x_2, \ \ -x_1^2 x_2 \right), \qquad x_1 \ne x_2, \quad x_1, x_2 \in \mathbb{R}.$$

• If $f(x)$ has a triple root, then $x_0 = x_1 = x_2$, and thus $f(x)$ can be written as

$$x^3 - 3 x_0 x^2 + 3 x_0^2 x - x_0^3.$$

The pejorative manifold of a cubic polynomial that has a triple root is, therefore, the curve defined by

$$\left( -3x_0, \ \ 3x_0^2, \ \ -x_0^3 \right), \qquad x_0 \in \mathbb{R}.$$

It was stated above that a multiple root is well-conditioned when the multiplicity of the root is preserved, in which case the polynomial stays on its pejorative manifold. This result is established in the next theorem.

Theorem 2.5. The condition number of the real root $x_0$ of multiplicity $r$ of the polynomial $f(x) = (x - x_0)^r$, such that the perturbed polynomial also has a root of multiplicity $r$ is

$$\rho(x_0) := \frac{\Delta x_0}{\Delta f} = \frac{1}{r |x_0|} \frac{\|(x - x_0)^r\|}{\|(x - x_0)^{r-1}\|} = \frac{1}{r |x_0|} \left( \frac{\sum_{i=0}^{r} \binom{r}{i}^2 (x_0)^{2i}}{\sum_{i=0}^{r-1} \binom{r-1}{i}^2 (x_0)^{2i}} \right)^{\frac{1}{2}}, \tag{2.21}$$

where

$$\Delta f = \frac{\|\delta f\|}{\|f\|} \qquad \text{and} \qquad \Delta x_0 = \frac{|\delta x_0|}{|x_0|}.$$

Proof. If $f(x, x_0) := f(x)$, then

$$f(x, x_0) = (x - x_0)^r = \sum_{i=0}^{r} \binom{r}{i} x^{r-i} (-x_0)^i = x^r + \sum_{i=1}^{r} \binom{r}{i} (-1)^i (x_0)^i x^{r-i}.$$

A neighbouring polynomial that also has a root of multiplicity $r$ is

$$f(x, x_0 + \delta x_0) = (x - (x_0 + \delta x_0))^r,$$

and hence

$$\begin{aligned}
f(x, x_0 + \delta x_0) - f(x, x_0) &= \sum_{i=1}^{r} \binom{r}{i} (-1)^i \left( (x_0 + \delta x_0)^i - x_0^i \right) x^{r-i} \\
&= \delta x_0 \sum_{i=1}^{r} \binom{r}{i} (-1)^i \, i \, x_0^{i-1} x^{r-i} + O(\delta x_0^2).
\end{aligned}$$

Since

$$(x - x_0)^{r-1} = \sum_{i=0}^{r-1} \binom{r-1}{i} x^{r-1-i} (-x_0)^i = -\frac{1}{r} \sum_{i=1}^{r} \binom{r}{i} (-1)^i \, i \, x_0^{i-1} x^{r-i},$$

it follows that to first order,

$$\delta f := f(x, x_0 + \delta x_0) - f(x, x_0) = -r \, \delta x_0 \, (x - x_0)^{r-1}, \tag{2.22}$$

and thus the condition number of $x_0$ that preserves its multiplicity is

$$\frac{\Delta x_0}{\Delta f} = \frac{1}{r |x_0|} \frac{\|(x - x_0)^r\|}{\|(x - x_0)^{r-1}\|}.$$

Since

$$(x - x_0)^r = \sum_{i=0}^{r} \binom{r}{i} (-x_0)^i x^{r-i},$$

the result (2.21) follows.

Example 2.3. The condition number $\rho(1)$ of the root $x_0 = 1$ of the polynomial $(x - 1)^r$ is, from (2.21),

$$\rho(1) = \frac{1}{r} \left( \frac{\sum_{i=0}^{r} \binom{r}{i}^2}{\sum_{i=0}^{r-1} \binom{r-1}{i}^2} \right)^{\frac{1}{2}},$$

and since

$$\sum_{i=0}^{r} \binom{r}{i}^2 = \binom{2r}{r},$$

it follows that

$$\rho(1) = \frac{1}{r} \sqrt{ \frac{\binom{2r}{r}}{\binom{2(r-1)}{r-1}} } = \frac{1}{r} \sqrt{ \frac{2(2r - 1)}{r} } \approx \frac{2}{r},$$

if $r$ is large. This condition number must be compared with the componentwise and normwise condition numbers,

$$\kappa_c(1) \approx \frac{|\delta x_0|}{\varepsilon_c} \qquad \text{and} \qquad \kappa_n(1) \approx \frac{|\delta x_0|}{\varepsilon_n},$$

which are proportional to the signal-to-noise ratio.
By contrast, $\rho(1)$ is independent of the perturbation of the polynomial and it decreases as the multiplicity $r$ of the root $x_0 = 1$ increases.

Example 2.4. Consider the Bernstein basis form of the polynomial

$$\begin{aligned}
f(x) = (x - x_0)^r &= \left( -x_0 (1 - x) + x (1 - x_0) \right)^r \\
&= \sum_{i=0}^{r} \binom{r}{i} (1 - x_0)^i (-x_0)^{r-i} x^i (1 - x)^{r-i} \\
&= \sum_{i=0}^{r} a_i^{(r)} \binom{r}{i} (1 - x)^{r-i} x^i,
\end{aligned}$$

where the superscript $(r)$ denotes that $a_i^{(r)}$ is the $i$th coefficient of a polynomial of degree $r$, and

$$a_i^{(r)} = (-1)^{r-i} x_0^{r-i} (1 - x_0)^i.$$

A perturbation analysis similar to that in Theorem 2.5 shows that (2.22) is valid for the Bernstein basis form of $(x - x_0)^r$, and thus

$$\frac{\Delta x_0}{\Delta f} = \frac{1}{r |x_0|} \frac{\|(x - x_0)^r\|}{\|(x - x_0)^{r-1}\|} = \frac{1}{r |x_0|} \sqrt{ \frac{\sum_{i=0}^{r} \left( a_i^{(r)} \right)^2}{\sum_{i=0}^{r-1} \left( a_i^{(r-1)} \right)^2} },$$

where

$$a_i^{(r-1)} = -\frac{a_i^{(r)}}{x_0}, \qquad i = 0, \ldots, r - 1.$$

It therefore follows that

$$\frac{\Delta x_0}{\Delta f} = \frac{1}{r |x_0|} \frac{\|(x - x_0)^r\|}{\|(x - x_0)^{r-1}\|} = \frac{1}{r} \sqrt{ \frac{\sum_{i=0}^{r} \left( a_i^{(r)} \right)^2}{\sum_{i=0}^{r-1} \left( a_i^{(r)} \right)^2} },$$

is the condition number of the Bernstein basis form of $(x - x_0)^r$ that preserves the multiplicity $r$ of $x_0$.

The next theorem extends Theorem 2.5 to the more general polynomial

$$f(x) = (x - x_0)^r g(x), \qquad g(x_0) \ne 0, \tag{2.23}$$

where $g(x)$ is a polynomial of degree $n$ [19].

Theorem 2.6. The condition number of $x_0$ of (2.23) such that its multiplicity $r$ is preserved is

$$\rho(x_0) = \frac{|\Delta x_0|}{\|\Delta f\|} = \frac{1}{r |g(x_0)| \, |x_0|} \sup_{\text{degree } \delta h(x) \le n} \frac{\|(x - x_0)^r g(x)\| \, |\delta h(x_0)|}{\|(x - x_0)^{r-1} \delta h(x)\|}, \tag{2.24}$$

where

$$\delta h(x) = (x - x_0) \, \delta g(x) - r g(x) \, \delta x_0. \tag{2.25}$$

Proof. Let $\delta f(x)$ be a perturbation in $f(x)$ such that the multiplicity $r$ of $x_0$ is preserved,

$$f(x) + \delta f(x) = (x - (x_0 + \delta x_0))^r \left( g(x) + \delta g(x) \right),$$

where $\delta g(x)$ is a polynomial to be determined.
It follows that

$$\begin{aligned}
\delta f(x) &= (x - (x_0 + \delta x_0))^r \left( g(x) + \delta g(x) \right) - (x - x_0)^r g(x) \\
&= \left( (x - x_0) - \delta x_0 \right)^r \left( g(x) + \delta g(x) \right) - (x - x_0)^r g(x) \\
&= \left( (x - x_0)^r - r (x - x_0)^{r-1} \delta x_0 \right) \left( g(x) + \delta g(x) \right) - (x - x_0)^r g(x) \\
&= (x - x_0)^{r-1} \left( (x - x_0) \, \delta g(x) - r g(x) \, \delta x_0 \right) \\
&= (x - x_0)^{r-1} \delta h(x),
\end{aligned}$$

to lowest order, where the polynomial $\delta h(x)$, whose maximum degree is $n$, is defined in (2.25). It follows that

$$\delta f(x) = (x - x_0)^{r-1} \delta h(x), \tag{2.26}$$

and from (2.25) that

$$\delta g(x) = \frac{\delta h(x) + r g(x) \, \delta x_0}{x - x_0}. \tag{2.27}$$

Since $\delta g(x)$ is a polynomial and not a rational function, the denominator must be an exact divisor of the numerator, and this condition is satisfied if

$$\delta x_0 = -\frac{\delta h(x_0)}{r g(x_0)}, \tag{2.28}$$

which defines $\delta x_0$. It therefore follows from (2.27) and (2.26), respectively, that $\delta g(x)$ and $\delta f(x)$ are uniquely specified for every polynomial $\delta h(x)$.

It follows from (2.26) and (2.28) that the ratio of the change in the root $x_0$ to the change $\delta f = \delta f(x)$ in the polynomial $f(x)$ is

$$\frac{|\delta x_0|}{\|\delta f\|} = \frac{1}{r} \left| \frac{\delta h(x_0)}{g(x_0)} \right| \frac{1}{\|\delta f(x)\|} = \frac{1}{|r g(x_0)|} \frac{|\delta h(x_0)|}{\|(x - x_0)^{r-1} \delta h(x)\|}. \tag{2.29}$$

The condition number (2.24) follows by dividing both sides of (2.29) by $|x_0| / \|f\|$, and taking the supremum over all polynomials $\delta h(x)$ of degree less than or equal to $n$.

Theorems 2.5 and 2.6, and Examples 2.3 and 2.4, consider the simplest type of polynomials that have a multiple root, but the general polynomial

$$f(x) = \left[ \prod_{i=1}^{K} (x - x_i)^{r_i} \right] g(x), \qquad g(x_i) \ne 0,$$

where the roots of $g(x)$ are simple, must be considered in order to calculate the condition number of $x_i$ such that its multiplicity $r_i$ is preserved. The derivation of these condition numbers, one for each multiple root, follows closely the method in Theorems 2.5 and 2.6, and the resulting condition numbers are very similar to (2.24).
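The structured condition number (2.21) of Theorem 2.5 is easy to evaluate, and it can be compared with a direct finite difference computation. The sketch below is a check under stated assumptions, not part of the notes: it uses the power basis with the 2-norm of the coefficient vector, the helper names are hypothetical, and the shift `dx` is an arbitrary small value.

```python
from math import comb, sqrt

def power_coeffs(x0, r):
    """Coefficients of (x - x0)**r, highest power first, from the binomial theorem."""
    return [comb(r, i) * (-x0) ** i for i in range(r + 1)]

def norm2(v):
    """2-norm of a coefficient vector."""
    return sqrt(sum(c * c for c in v))

def rho(x0, r):
    """Structured condition number (2.21) of the root x0 of (x - x0)**r."""
    return norm2(power_coeffs(x0, r)) / (r * abs(x0) * norm2(power_coeffs(x0, r - 1)))

def rho_finite_difference(x0, r, dx=1e-7):
    """(|dx| / |x0|) / (||delta f|| / ||f||) when the r-fold root is shifted by dx."""
    f = power_coeffs(x0, r)
    f_shifted = power_coeffs(x0 + dx, r)
    delta_f = norm2([p - q for p, q in zip(f_shifted, f)]) / norm2(f)
    return (abs(dx) / abs(x0)) / delta_f

r = 5
closed_form = sqrt(2 * (2 * r - 1) / r) / r  # Example 2.3 with x0 = 1
print(rho(1.0, r), closed_form, rho_finite_difference(1.0, r))
```

For $x_0 = 1$ the three values should agree closely, and all are of order $2/r$, in contrast with the unstructured condition numbers (2.13) and (2.15), which grow as the noise level decreases.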
Consider the polynomial (2.23), whose condition number is given in (2.24),

$$\rho(x_0) = \frac{\sigma}{r |g(x_0)| \, |x_0|},$$

where $\sigma$ is the function on the right hand side of (2.24), and the neighbouring polynomial,

$$\tilde{f}(x) = (x - x_0)^r \left( g(x) - g(x_0) \right),$$

which has a root $x_0$ of multiplicity $(r + 1)$. The magnitude of the difference between the polynomials is

$$\|\delta f\| = \left\| f(x) - \tilde{f}(x) \right\| = |g(x_0)| \, \|(x - x_0)^r\|,$$

and thus

$$\|\delta f\| = \frac{\sigma \, \|(x - x_0)^r\|}{r |x_0| \, \rho(x_0)},$$

from which it follows that if the condition number $\rho(x_0)$ of $x_0$ of $f(x)$ is large, then there is a neighbouring polynomial that has a root of multiplicity $(r + 1)$. This explanation of ill-conditioning requires that the nearest polynomial on the manifold of polynomials that have an $(r + 1)$-tuple root be computed. If this polynomial is also ill-conditioned, then it is necessary to compute the nearest polynomial on the manifold of polynomials with an $(r + 2)$-tuple root.

The structured condition number $\rho(x_0)$ shows that a polynomial is ill-conditioned when it is near a pejorative manifold, but it is well-conditioned when it is on the manifold, except when it approaches manifolds of lower dimension. It follows, therefore, that if a polynomial with a root of multiplicity $r$ is ill-conditioned, then it is near a submanifold, and if this polynomial is also ill-conditioned, then it is near a submanifold of this manifold. This procedure of locating manifolds that are defined by higher order multiplicities is continued until the roots of the computed polynomial are sufficiently well-conditioned, and close enough to the original polynomial. In this circumstance, the original polynomial may be considered to be a small perturbation of the computed polynomial, all of whose roots are well-conditioned. The computed polynomial is acceptable if it is sufficiently near the original polynomial, and it is reasonable to hypothesize that the original polynomial has a constraint that favours multiple roots.
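The neighbouring polynomial $\tilde{f}(x) = (x - x_0)^r (g(x) - g(x_0))$ described above can be constructed explicitly for a small case. The following sketch uses integer coefficients stored highest power first; the choice $x_0 = 1$, $r = 2$, $g(x) = x + 3$ and all function names are hypothetical and serve only to illustrate that $\tilde{f}$ gains one unit of multiplicity at $x_0$ and that $\|f - \tilde{f}\| = |g(x_0)| \, \|(x - x_0)^r\|$.

```python
def poly_mul(p, q):
    """Product of two polynomials (coefficients highest power first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_eval(p, x):
    """Horner evaluation of p at x."""
    result = 0
    for c in p:
        result = result * x + c
    return result

def power_of_linear(x0, r):
    """Coefficients of (x - x0)**r."""
    out = [1]
    for _ in range(r):
        out = poly_mul(out, [1, -x0])
    return out

def multiplicity(p, x0):
    """Number of times (x - x0) divides p, by repeated synthetic division."""
    count = 0
    while len(p) > 1:
        quotient, carry = [], 0
        for c in p:
            carry = carry * x0 + c
            quotient.append(carry)
        if quotient.pop() != 0:  # the last entry is the remainder p(x0)
            break
        p, count = quotient, count + 1
    return count

def neighbour(x0, r, g):
    """Return f = (x - x0)^r g, f_tilde = (x - x0)^r (g - g(x0)) and (x - x0)^r."""
    base = power_of_linear(x0, r)
    g_shifted = g[:-1] + [g[-1] - poly_eval(g, x0)]
    return poly_mul(base, g), poly_mul(base, g_shifted), base

x0, r, g = 1, 2, [1, 3]  # hypothetical data: f(x) = (x - 1)^2 (x + 3)
f, f_tilde, base = neighbour(x0, r, g)
diff_sq = sum((a - b) ** 2 for a, b in zip(f, f_tilde))
print(multiplicity(f, x0), multiplicity(f_tilde, x0))
print(diff_sq, poly_eval(g, x0) ** 2 * sum(c * c for c in base))
```

The squared norms are compared so that the arithmetic stays exact in integers.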
2.5 A simple polynomial root finder

It has been shown in the previous sections that a multiple root is ill-conditioned with respect to random perturbations because they cause it to break up into a cluster of simple roots, but that it is stable with respect to perturbations that maintain its multiplicity. A simple root is, in general, better conditioned than a multiple root, and it is therefore instructive to consider a polynomial root finder that reduces to the determination of the roots of several polynomials, each of which only contains simple roots. This method, which is described in [42], pages 65-68, differs from the methods that are described in Chapter 1 because the multiplicities of the roots are calculated initially, after which the values of the roots are computed.

The multiplicities of the roots are calculated by a sequence of greatest common divisor (GCD) computations. Consider the polynomial

$$f(x) = (x - x_1)^{r_1} (x - x_2)^{r_2} \cdots (x - x_l)^{r_l} g_0(x),$$

where $r_i \ge 2$, $i = 1, \ldots, l$, $g_0(x)$ contains only simple roots, and the multiple roots are arranged such that $r_1 \ge r_2 \ge \cdots \ge r_l$. Since a root of multiplicity $r_i$ of $f(x)$ is a root of multiplicity $r_i - 1$ of its derivative $f^{(1)}(x)$, it follows that

$$f^{(1)}(x) = (x - x_1)^{r_1 - 1} (x - x_2)^{r_2 - 1} \cdots (x - x_l)^{r_l - 1} g_1(x),$$

where $g_0(x)$ and $g_1(x)$ are coprime polynomials, and the roots of $g_1(x)$ are simple. It follows that

$$q_1(x) := \mathrm{GCD}\left( f(x), f^{(1)}(x) \right) = (x - x_1)^{r_1 - 1} (x - x_2)^{r_2 - 1} \cdots (x - x_l)^{r_l - 1},$$

and thus the polynomial $f(x) / q_1(x)$ is equal to the product of the linear factors of all the distinct roots of $f(x)$,

$$\frac{f(x)}{q_1(x)} = (x - x_1)(x - x_2) \cdots (x - x_l) \, g_0(x).$$

The GCD of $q_1(x)$ and $q_1^{(1)}(x)$ is

$$q_2(x) := \mathrm{GCD}\left( q_1(x), q_1^{(1)}(x) \right) = (x - x_1)^{r_1 - 2} (x - x_2)^{r_2 - 2} \cdots (x - x_k)^{r_k - 2},$$

where $r_i \ge 2$, $i = 1, \ldots, k$, and thus

$$\frac{q_1(x)}{q_2(x)} = (x - x_1)(x - x_2) \cdots (x - x_k),$$

which is the product of the linear factors of all the roots of $f(x)$ whose multiplicity is greater than or equal to 2.
This process of GCD computations and polynomial divisions is repeated, and it terminates when the division yields a polynomial of degree one, corresponding to the divisor of $f(x)$ of maximum degree.

In order to generalise this procedure, let $w_1(x)$ be the product of the linear factors $(x - x_j)$ of the simple roots of $f(x)$, let $w_2(x)$ be the product of the linear factors of the double roots of $f(x)$, and in general, let $w_i(x)$ be the product of the linear factors of the roots of multiplicity $i$ of $f(x)$. If $f(x)$ does not contain a root of multiplicity $k$, then $w_k(x)$ is set equal to a constant, which can be assumed to be unity. It follows that to within a constant multiplier,

$$f(x) = w_1(x) \, w_2^2(x) \, w_3^3(x) \cdots w_{r_{\max}}^{r_{\max}}(x),$$

and thus

$$q_1(x) = \mathrm{GCD}\left( f(x), f^{(1)}(x) \right) = w_2(x) \, w_3^2(x) \, w_4^3(x) \cdots w_{r_{\max}}^{r_{\max} - 1}(x).$$

Similarly,

$$\begin{aligned}
q_2(x) &= \mathrm{GCD}\left( q_1(x), q_1^{(1)}(x) \right) = w_3(x) \, w_4^2(x) \, w_5^3(x) \cdots w_{r_{\max}}^{r_{\max} - 2}(x) \\
q_3(x) &= \mathrm{GCD}\left( q_2(x), q_2^{(1)}(x) \right) = w_4(x) \, w_5^2(x) \, w_6^3(x) \cdots w_{r_{\max}}^{r_{\max} - 3}(x) \\
q_4(x) &= \mathrm{GCD}\left( q_3(x), q_3^{(1)}(x) \right) = w_5(x) \, w_6^2(x) \cdots w_{r_{\max}}^{r_{\max} - 4}(x) \\
&\ \ \vdots
\end{aligned}$$

and the sequence terminates at $q_{r_{\max}}(x)$, which is a constant.

A sequence of polynomials $h_i(x)$, $i = 1, \ldots, r_{\max}$, is defined such that

$$\begin{aligned}
h_1(x) &= \frac{f(x)}{q_1(x)} = w_1(x) \, w_2(x) \, w_3(x) \cdots \\
h_2(x) &= \frac{q_1(x)}{q_2(x)} = w_2(x) \, w_3(x) \cdots \\
h_3(x) &= \frac{q_2(x)}{q_3(x)} = w_3(x) \cdots \\
&\ \ \vdots \\
h_{r_{\max}}(x) &= \frac{q_{r_{\max} - 1}(x)}{q_{r_{\max}}(x)} = w_{r_{\max}}(x),
\end{aligned}$$

and thus all the functions $w_1(x), w_2(x), \ldots, w_{r_{\max}}(x)$ are determined from

$$w_1(x) = \frac{h_1(x)}{h_2(x)}, \qquad w_2(x) = \frac{h_2(x)}{h_3(x)}, \qquad \cdots, \qquad w_{r_{\max} - 1}(x) = \frac{h_{r_{\max} - 1}(x)}{h_{r_{\max}}(x)},$$

until

$$w_{r_{\max}}(x) = h_{r_{\max}}(x).$$

The equations

$$w_1(x) = 0, \qquad w_2(x) = 0, \qquad \cdots, \qquad w_{r_{\max}}(x) = 0,$$

contain only simple roots, and they yield the simple, double, triple, etc., roots of $f(x)$. In particular, if $x_0$ is a root of $w_i(x)$, then it is a root of multiplicity $i$ of $f(x)$. Algorithm 2.1 contains pseudo-code for the implementation of Uspensky's method for the calculation of the roots of a polynomial.
Algorithm 2.1: Uspensky's algorithm for the roots of a polynomial

Input: A polynomial $f(x)$.
Output: The roots of $f(x)$.

Begin
1. Set $q_0 = f$.
2. Calculate the GCD of $f$ and $f^{(1)}$: $q_1 = \mathrm{GCD}\left( f, f^{(1)} \right)$.
3. Calculate $h_1 = q_0 / q_1$.
4. Set $j = 2$.
5. While degree $q_{j-1} > 0$ do
   (a) Calculate the GCD of $q_{j-1}$ and its derivative: $q_j = \mathrm{GCD}\left( q_{j-1}, q_{j-1}^{(1)} \right)$.
   (b) Calculate $h_j = q_{j-1} / q_j$.
   (c) Calculate $w_{j-1} = h_{j-1} / h_j$.
   (d) Calculate the roots of $w_{j-1}$. % They are of multiplicity $j - 1$.
   (e) Set $j = j + 1$.
   End while
6. Set $w_{j-1} = h_{j-1}$ and solve $w_{j-1} = 0$. % They are of multiplicity $j - 1$.
End

Example 2.5. Consider the polynomial

$$f(x) = x^6 - 3x^5 + 6x^3 - 3x^2 - 3x + 2,$$

whose derivative is

$$f^{(1)}(x) = 6x^5 - 15x^4 + 18x^2 - 6x - 3.$$

It follows that

$$q_1(x) = \mathrm{GCD}\left( f(x), f^{(1)}(x) \right) = x^3 - x^2 - x + 1, \qquad q_1^{(1)}(x) = 3x^2 - 2x - 1,$$

and hence

$$q_2(x) = \mathrm{GCD}\left( q_1(x), q_1^{(1)}(x) \right) = x - 1 \qquad \text{and} \qquad q_3(x) = \mathrm{GCD}\left( q_2(x), q_2^{(1)}(x) \right) = 1.$$

The polynomials $h_1(x)$, $h_2(x)$ and $h_3(x)$ are

$$\begin{aligned}
h_1(x) &= \frac{f(x)}{q_1(x)} = x^3 - 2x^2 - x + 2 \\
h_2(x) &= \frac{q_1(x)}{q_2(x)} = x^2 - 1 \\
h_3(x) &= \frac{q_2(x)}{q_3(x)} = x - 1,
\end{aligned}$$

and thus the polynomials $w_1(x)$, $w_2(x)$ and $w_3(x)$ are

$$\begin{aligned}
w_1(x) &= \frac{h_1(x)}{h_2(x)} = x - 2 \\
w_2(x) &= \frac{h_2(x)}{h_3(x)} = x + 1 \\
w_3(x) &= h_3(x) = x - 1.
\end{aligned}$$

It follows that the factors of $f(x)$ are

$$f(x) = (x - 1)^3 (x + 1)^2 (x - 2),$$

and thus $f(x)$ has a triple root at $x = 1$, a double root at $x = -1$ and a simple root at $x = 2$.

Example 2.5 contains the essential features of the algorithm for the computation of the roots of a polynomial that will be described in subsequent chapters of this document. Although it is easy to follow, it contains steps whose implementation in a floating point environment raises some difficult issues:

• The computation of the GCD of two polynomials is an ill-posed problem because it is not a continuous function of their coefficients.
In particular, the polynomials $f(x)$ and $g(x)$ may have a non-constant GCD, but the perturbed polynomials $f(x) + \delta f(x)$ and $g(x) + \delta g(x)$ may be coprime. Even if $f(x)$ and $g(x)$ are specified exactly and have a non-constant GCD, roundoff errors may be sufficient to imply that they are coprime when the GCD is computed in a floating point environment.

• The determination of the degree of the GCD of two polynomials reduces to the determination of the rank of a resultant matrix, but the rank of a matrix is not defined in a floating point environment. In particular, the rank loss of the resultant matrix of two polynomials is equal to the degree of their GCD, and a minor perturbation in one or both of the polynomials is sufficient to cause their resultant matrix to have full rank, which suggests that the polynomials are coprime. The determination of the rank of a noisy matrix is a challenging problem that arises in many applications.

• Polynomial division, which reduces to the deconvolution of the coefficients of the polynomials, is an ill-conditioned problem that must be implemented with care in order to obtain a computationally reliable solution.

• The data in many practical examples is inexact, and thus the polynomials are only specified within a tolerance. The given inexact polynomials may be coprime, and it is therefore necessary to perturb each polynomial slightly, such that they have a non-constant GCD. This GCD is called an approximate greatest common divisor of the given inexact polynomials, and it is necessary to compute the smallest perturbations such that the perturbed polynomials have a non-constant GCD.

• The amplitude of the noise may or may not be known in practical examples, and it may only be known approximately and not exactly. It is desirable that a polynomial root finder not require an estimate of the noise level, and that all parameters and thresholds be calculated from the data, that is, the polynomial coefficients.
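The steps of Algorithm 2.1 can be sketched in exact rational arithmetic, which deliberately sidesteps the floating point difficulties listed above. The code below is an illustration only and is not the method developed in the later chapters: the GCD is computed by Euclid's algorithm with Python's `fractions.Fraction`, coefficients are stored highest power first, and all function names are hypothetical. It returns the polynomials $w_i(x)$ rather than their roots.

```python
from fractions import Fraction

def trim(p):
    """Drop leading zero coefficients (coefficients are stored highest power first)."""
    i = 0
    while i < len(p) - 1 and p[i] == 0:
        i += 1
    return p[i:]

def poly_divmod(num, den):
    """Quotient and remainder of num / den by exact long division."""
    num = trim([Fraction(c) for c in num])
    den = trim([Fraction(c) for c in den])
    if len(num) < len(den):
        return [Fraction(0)], num
    rem, quot = num[:], []
    for i in range(len(num) - len(den) + 1):
        factor = rem[i] / den[0]
        quot.append(factor)
        for j, d in enumerate(den):
            rem[i + j] -= factor * d
    return quot, trim(rem[len(quot):] or [Fraction(0)])

def poly_gcd(a, b):
    """Monic GCD by Euclid's algorithm (exact arithmetic only)."""
    a, b = trim([Fraction(c) for c in a]), trim([Fraction(c) for c in b])
    while any(b):
        a, b = b, poly_divmod(a, b)[1]
    return [c / a[0] for c in a]

def derivative(p):
    """Coefficients of the first derivative of p."""
    n = len(p) - 1
    return [(n - i) * c for i, c in enumerate(p[:-1])] or [Fraction(0)]

def multiplicity_factors(f):
    """Return {i: w_i}: the simple roots of w_i are the roots of multiplicity i of f."""
    q_prev = trim([Fraction(c) for c in f])
    h = []
    while len(q_prev) > 1:                        # degree of q_{j-1} > 0
        q = poly_gcd(q_prev, derivative(q_prev))  # q_j
        h.append(poly_divmod(q_prev, q)[0])       # h_j = q_{j-1} / q_j
        q_prev = q
    w = {}
    for i in range(len(h)):
        w_i = poly_divmod(h[i], h[i + 1])[0] if i + 1 < len(h) else h[i]
        if len(w_i) > 1:                          # constant w_i: no roots of this multiplicity
            w[i + 1] = [c / w_i[0] for c in w_i]
    return w

# Example 2.5: f(x) = x^6 - 3x^5 + 6x^3 - 3x^2 - 3x + 2.
factors = multiplicity_factors([1, -3, 0, 6, -3, -3, 2])
for mult, w_i in sorted(factors.items()):
    print(mult, [str(c) for c in w_i])
```

For the polynomial of Example 2.5 this yields $w_1 = x - 2$, $w_2 = x + 1$ and $w_3 = x - 1$. With inexact coefficients the exact Euclidean GCD used here would generally return a constant, which is the reason an approximate GCD is required.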
The subsequent chapters in this document address these issues in order to develop a robust polynomial root finder.

2.6 Summary

Expressions for the componentwise and normwise backward errors, and componentwise and normwise condition numbers, were developed in this chapter. It was shown that the backward errors and condition numbers are related to the forward error and that the expressions for the condition numbers attain their maximum values for a multiple root of high multiplicity. A multiple root is well-conditioned when the perturbed polynomial has a root of the same multiplicity as the original (unperturbed) polynomial.

The pejorative manifold of a polynomial was defined, and this allowed a geometric interpretation of ill-conditioning to be developed. A simple polynomial root finder was introduced and it was shown that it differs from the root finders that were discussed in Chapter 1 because the multiplicity of each root is calculated initially, after which the values of the roots are determined. It was shown, however, that the computational implementation of this algorithm in a floating point environment and with inexact data requires that some difficult problems be addressed.

Chapter 3

The Sylvester resultant matrix

The simple polynomial root finder in Section 2.5 is, from a high level, the polynomial root finder whose computational implementation is considered in this document. It requires that the GCD of pairs of polynomials be computed several times, and this chapter describes the application of the Sylvester resultant matrix and its subresultant matrices for this calculation. Two polynomials are coprime if and only if the determinant of their resultant matrix is not equal to zero, and if they are not coprime, the degree and coefficients of their GCD can be calculated from their resultant matrix.
In particular, the rank loss of this matrix is equal to the degree of the GCD of the polynomials, and the coefficients of the GCD are obtained by reducing the matrix to upper triangular form.

The Sylvester resultant matrix for the power and Bernstein polynomial bases is considered in this chapter, and this leads on to a discussion of their subresultant matrices and their use in calculating the degree of a common divisor of two polynomials.

There are several resultant matrices, including the Sylvester, Bézout and companion resultant matrices, and they may be considered equivalent because they all yield the same information on the GCD of two polynomials. Euclid's algorithm is a classical method of calculating the GCD of two polynomials, and the connection between it and the Sylvester resultant matrix is established in [50].

3.1 The Sylvester resultant matrix for power basis polynomials

Let $f = f(x)$ and $g = g(x)$ be real polynomials of degrees $m$ and $n$ respectively,

$$f(x) = \sum_{i=0}^{m} a_i x^i \qquad \text{and} \qquad g(x) = \sum_{i=0}^{n} b_i x^i, \tag{3.1}$$

where $a_m, b_n \ne 0$. The Sylvester resultant matrix $S(f, g) \in \mathbb{R}^{(m+n) \times (m+n)}$ is equal to

$$S(f, g) = \begin{bmatrix}
a_m     &         &        &         & b_n     &         &        &         \\
a_{m-1} & a_m     &        &         & b_{n-1} & b_n     &        &         \\
\vdots  & a_{m-1} & \ddots &         & \vdots  & b_{n-1} & \ddots &         \\
a_1     & \vdots  & \ddots & a_m     & b_1     & \vdots  & \ddots & b_n     \\
a_0     & a_1     &        & a_{m-1} & b_0     & b_1     &        & b_{n-1} \\
        & a_0     & \ddots & \vdots  &         & b_0     & \ddots & \vdots  \\
        &         & \ddots & a_1     &         &         & \ddots & b_1     \\
        &         &        & a_0     &         &         &        & b_0
\end{bmatrix}, \tag{3.2}$$

where the coefficients $a_i$ of $f(x)$ occupy the first $n$ columns of $S(f, g)$, and the coefficients $b_i$ of $g(x)$ occupy the last $m$ columns of $S(f, g)$. The most important property of the Sylvester matrix is that $\det S(f, g) = 0$ is a necessary and sufficient condition for $f(x)$ and $g(x)$ to have a non-constant common divisor [45, 49]. Furthermore, if $\det S(f, g) = 0$, then the degree and coefficients of the GCD of $f(x)$ and $g(x)$ can be computed from $S(f, g)$.

Theorem 3.1. Let $S(f, g)$ be the Sylvester matrix of the polynomials $f(x)$ and $g(x)$ that are defined in (3.1).
If the degree of their GCD is $d$, then

1. The rank of $S(f,g)$ is equal to $m + n - d$.
2. The coefficients of their GCD are given in the last non-zero row of $S(f,g)^T$ after it has been reduced to upper triangular form by an LU or QR decomposition.

Proof. See [4], page 36, or [6].

Example 3.1. Consider the polynomials $f(x)$ and $g(x)$,
\[
f(x) = -3x^3 + \frac{25}{2}x^2 - \frac{23}{2}x + 3 \quad \text{and} \quad g(x) = 6x^2 - 7x + 2,
\]
whose GCD is $g(x)$. The transpose $S(f,g)^T$ of $S(f,g)$ is
\[
S(f,g)^T =
\begin{bmatrix}
-3 & \frac{25}{2} & -\frac{23}{2} & 3 & 0 \\
0 & -3 & \frac{25}{2} & -\frac{23}{2} & 3 \\
6 & -7 & 2 & 0 & 0 \\
0 & 6 & -7 & 2 & 0 \\
0 & 0 & 6 & -7 & 2
\end{bmatrix},
\]
and its reduction to row echelon (upper triangular) form yields the matrix
\[
\begin{bmatrix}
-3 & \frac{25}{2} & -\frac{23}{2} & 3 & 0 \\
0 & -3 & \frac{25}{2} & -\frac{23}{2} & 3 \\
0 & 0 & 6 & -7 & 2 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}.
\]
The rank loss of this matrix is two, which is equal to the degree of the GCD of $f(x)$ and $g(x)$. Furthermore, the non-zero coefficients in its last non-zero row are $6, -7, 2$, and thus the GCD is $6x^2 - 7x + 2$, which is equal to $g(x)$, as required.

The Sylvester matrix allows the degree and coefficients of the GCD of two polynomials to be calculated. The next section considers subresultant matrices that are derived from the Sylvester matrix, and it is shown that the order of a subresultant matrix is related to the degree of a common divisor of two polynomials. Subresultant matrices will be used extensively in Chapter 4, where the GCD of two inexact polynomials is considered.

3.1.1 Subresultants of the Sylvester matrix for power basis polynomials

The subresultant matrices of the Sylvester matrix are obtained by the deletion of some of its rows and columns. In particular, the $k$'th Sylvester matrix, or subresultant matrix, $S_k(f,g) \in \mathbb{R}^{(m+n-k+1) \times (m+n-2k+2)}$ is a submatrix of $S(f,g)$ that is formed by deleting the last $k-1$ rows of $S(f,g)$, the last $k-1$ columns of the coefficients of $f(x)$, and the last $k-1$ columns of the coefficients of $g(x)$.

Example 3.2.
If $m = 4$ and $n = 3$, then
\[
S_1 = S(f,g) =
\begin{bmatrix}
a_4 &     &     & b_3 &     &     &     \\
a_3 & a_4 &     & b_2 & b_3 &     &     \\
a_2 & a_3 & a_4 & b_1 & b_2 & b_3 &     \\
a_1 & a_2 & a_3 & b_0 & b_1 & b_2 & b_3 \\
a_0 & a_1 & a_2 &     & b_0 & b_1 & b_2 \\
    & a_0 & a_1 &     &     & b_0 & b_1 \\
    &     & a_0 &     &     &     & b_0
\end{bmatrix},
\]
\[
S_2 = S_2(f,g) =
\begin{bmatrix}
a_4 &     & b_3 &     &     \\
a_3 & a_4 & b_2 & b_3 &     \\
a_2 & a_3 & b_1 & b_2 & b_3 \\
a_1 & a_2 & b_0 & b_1 & b_2 \\
a_0 & a_1 &     & b_0 & b_1 \\
    & a_0 &     &     & b_0
\end{bmatrix},
\qquad
S_3 = S_3(f,g) =
\begin{bmatrix}
a_4 & b_3 &     \\
a_3 & b_2 & b_3 \\
a_2 & b_1 & b_2 \\
a_1 & b_0 & b_1 \\
a_0 &     & b_0
\end{bmatrix}.
\]

Theorems 3.2 and 3.3 establish the connection between the degree of a common divisor of two polynomials and the order of a subresultant matrix.

Theorem 3.2. Let $f(x)$ and $g(x)$ be defined in (3.1), and let $q(x)$ and $p(x)$ be polynomials of degrees $m - k$ and $n - k$ respectively. If $d(x)$ is a polynomial of degree $k$, then
\[
f(x)p(x) = g(x)q(x), \tag{3.3}
\]
if and only if $d(x)$ is a common divisor of $f(x)$ and $g(x)$.

Proof. If $d(x)$ is a common divisor of $f(x)$ and $g(x)$, then there exist polynomials $p(x)$ and $q(x)$ such that
\[
\frac{f(x)}{d(x)} = q(x) \quad \text{and} \quad \frac{g(x)}{d(x)} = p(x),
\]
and (3.3) follows.

Conversely, assume that (3.3) holds, such that, without loss of generality, $p(x)$ and $q(x)$ are coprime. (If these polynomials are not coprime, then any common divisors can be removed.) Equation (3.3) shows that $p(x)$ divides $g(x)q(x)$, and since $p(x)$ and $q(x)$ are coprime, every divisor of $p(x)$ is also a divisor of $g(x)$. Since $p(x)$ is of degree $n - k$ and $g(x)$ is of degree $n$, there therefore exists a polynomial $d_1(x)$ of degree $k$ such that $p(x)d_1(x) = g(x)$, and substitution into (3.3) yields
\[
f(x) = d_1(x)q(x). \tag{3.4}
\]
Similarly, consideration of the polynomials $q(x)$ and $f(x)$ leads to the equation
\[
g(x) = d_2(x)p(x), \tag{3.5}
\]
where $d_2(x)$ is of degree $k$. The substitution of (3.4) and (3.5) into (3.3) shows that $d_1(x) = d_2(x)$, and thus the result is established.

The main theorem can now be established [7].

Theorem 3.3. A necessary and sufficient condition for the polynomials $f(x)$ and $g(x)$, which are defined in (3.1), to have a common divisor of degree $k \geq 1$ is that the rank of $S_k(f,g)$ be less than $(m + n - 2k + 2)$, or equivalently, that the dimension of its null space be greater than or equal to one.

Proof.
Let $f(x)$ and $g(x)$ have a common divisor of degree $k$, where $1 \leq k \leq t$, and $t$ is the degree of the GCD of $f(x)$ and $g(x)$. There therefore exists a polynomial $w(x)$ of degree $k$ such that
\[
f(x) = w(x)f_1(x) \quad \text{and} \quad g(x) = w(x)g_1(x),
\]
where
\[
f_1(x) = \sum_{i=0}^{m-k} c_{m-k-i}\, x^i \quad \text{and} \quad g_1(x) = \sum_{i=0}^{n-k} d_{n-k-i}\, x^i.
\]
It follows from $f(x)g_1(x) = f_1(x)g(x)$ that
\[
\sum_{i=0}^{m} a_i x^i \sum_{j=0}^{n-k} d_{n-k-j}\, x^j = \sum_{i=0}^{m-k} c_{m-k-i}\, x^i \sum_{j=0}^{n} b_j x^j,
\]
and since this equation is satisfied for all values of $x$, the coefficients on both sides of this equation can be equated. This yields the homogeneous equation
\[
\begin{bmatrix}
a_m     &        &         & b_n     &        &         \\
a_{m-1} & \ddots &         & b_{n-1} & \ddots &         \\
\vdots  & \ddots & a_m     & \vdots  & \ddots & b_n     \\
a_1     & \ddots & a_{m-1} & b_1     & \ddots & b_{n-1} \\
a_0     & \ddots & \vdots  & b_0     & \ddots & \vdots  \\
        & \ddots & a_1     &         & \ddots & b_1     \\
        &        & a_0     &         &        & b_0
\end{bmatrix}
\begin{bmatrix}
d_0 \\ \vdots \\ d_{n-k} \\ -c_0 \\ \vdots \\ -c_{m-k}
\end{bmatrix}
=
\begin{bmatrix}
0 \\ \vdots \\ 0
\end{bmatrix}, \tag{3.6}
\]
where the coefficient matrix is $S_k \in \mathbb{R}^{(m+n-k+1) \times (m+n-2k+2)}$. The coefficients $a_i$ of $f(x)$ occupy the first $(n - k + 1)$ columns, and the coefficients $b_j$ of $g(x)$ occupy the last $(m - k + 1)$ columns. The coefficient matrix is square and reduces to the Sylvester resultant matrix if $k = 1$. If $k > 1$, the number of rows is greater than the number of columns, and since it is assumed that (3.6) possesses a solution, the coefficient matrix $S_k(f,g)$ must be rank deficient. It therefore follows that if $f(x)$ and $g(x)$ have a common divisor of degree $k$, then the rank of $S_k(f,g)$ is less than $(m + n - 2k + 2)$.

Assume now that the rank of $S_k(f,g)$ is less than $(m + n - 2k + 2)$, from which it follows that one or more of its columns is linearly dependent on the other columns. There therefore exist constants $p_0, \ldots, p_{n-k}, q_0, \ldots, q_{m-k}$, not all zero, such that
\[
\sum_{i=0}^{n-k} p_i u_i - \sum_{j=0}^{m-k} q_j v_j = 0, \tag{3.7}
\]
where $u_i$, $i = 0, \ldots, n-k$, and $v_j$, $j = 0, \ldots, m-k$, are the vectors of the first $(n - k + 1)$ and last $(m - k + 1)$ columns of $S_k(f,g)$, respectively.
If the polynomials $p(x)$ and $q(x)$ are defined as
\[
p(x) = \sum_{i=0}^{n-k} p_i x^i \quad \text{and} \quad q(x) = \sum_{i=0}^{m-k} q_i x^i,
\]
respectively, then (3.7) states that $p(x)f(x) = q(x)g(x)$, and Theorem 3.2 shows that $f(x)$ and $g(x)$ have a common divisor of degree $k$. It therefore follows that if the rank of $S_k(f,g)$ is less than $(m + n - 2k + 2)$, then $f(x)$ and $g(x)$ have a common divisor of degree $k$.

Theorem 3.3 allows the degree $d$ of the GCD of $f(x)$ and $g(x)$ to be calculated because these polynomials possess common factors of degrees $1, 2, \ldots, d$, but they do not possess a common factor of degree $d + 1$. Thus
\[
\operatorname{rank} S_k(f,g) < m + n - 2k + 2, \qquad k = 1, \ldots, d, \tag{3.8}
\]
and
\[
\operatorname{rank} S_k(f,g) = m + n - 2k + 2, \qquad k = d + 1, \ldots, \min(m,n), \tag{3.9}
\]
and hence $d$ is equal to the index $k$ of the last rank deficient subresultant matrix in the sequence $S_1(f,g), S_2(f,g), \ldots, S_{\min(m,n)}(f,g)$. Alternatively, one can consider the sequence $S_{\min(m,n)}, S_{\min(m,n)-1}, \ldots$, thereby increasing the size of the subresultant matrix, and thus $d$ is equal to the order of the first subresultant matrix that is rank deficient.

Each matrix $S_k(f,g)$ is partitioned into a vector $c_k \in \mathbb{R}^{m+n-k+1}$ and a matrix $A_k \in \mathbb{R}^{(m+n-k+1) \times (m+n-2k+1)}$, where $c_k$ is the first column of $S_k(f,g)$, and $A_k$ is the matrix formed from the remaining columns of $S_k(f,g)$,
\[
S_k(f,g) = \begin{bmatrix} c_k & A_k \end{bmatrix}
= \begin{bmatrix} c_k & \text{coeffs. of } f(x) & \text{coeffs. of } g(x) \end{bmatrix}, \tag{3.10}
\]
where the coefficients of $f(x)$ occupy $n - k$ columns, and the coefficients of $g(x)$ occupy $m - k + 1$ columns. The computation of the GCD of two univariate polynomials requires that the equation
\[
A_k y = c_k, \qquad y \in \mathbb{R}^{m+n-2k+1}, \tag{3.11}
\]
be considered.

Theorem 3.4. Let $f(x)$ and $g(x)$ be polynomials that are defined in (3.1), and let $k \leq \min(m,n)$ be a positive integer. Then the dimension of the null space of $S_k(f,g)$ is greater than or equal to one if and only if (3.11) possesses a solution.

Proof. The following proof is taken from [55], and another proof is in [50].
Assume initially that (3.11) has a solution, from which it follows that $c_k$ lies in the column space of $A_k$. The definition of $S_k(f,g)$ then shows that the dimension of the null space of $S_k(f,g)$ is greater than or equal to one.

Assume now that the dimension of the null space of $S_k(f,g)$ is greater than or equal to one, and consider the left and right hand sides of (3.11). In particular, if $w^T = w^T(x)$ is defined as
\[
w^T = \begin{bmatrix} x^{m+n-k} & x^{m+n-k-1} & \cdots & x & 1 \end{bmatrix},
\]
then
\[
w^T A_k = \begin{bmatrix} x^{n-k-1}f & x^{n-k-2}f & \cdots & f & x^{m-k}g & x^{m-k-1}g & \cdots & g \end{bmatrix},
\]
and
\[
w^T c_k = x^{n-k} f,
\]
where $f = f(x)$ and $g = g(x)$. Let $u(x)$ and $v(x)$ be the polynomials
\[
u(x) = \sum_{i=0}^{n-k-1} u_i x^{n-k-1-i} \quad \text{and} \quad v(x) = \sum_{i=0}^{m-k} v_i x^{m-k-i},
\]
respectively, and let the vector $t \in \mathbb{R}^{m+n-2k+1}$ be formed from the coefficients of $u(x)$ and $v(x)$,
\[
t := \begin{bmatrix} u_0 & \cdots & u_{n-k-2} & u_{n-k-1} & v_0 & \cdots & v_{m-k-2} & v_{m-k-1} & v_{m-k} \end{bmatrix}^T. \tag{3.12}
\]
It therefore follows that $w^T A_k t = uf + vg$, where $u = u(x)$ and $v = v(x)$, and thus the equation
\[
w^T A_k t = w^T c_k, \tag{3.13}
\]
reduces to
\[
uf + vg = x^{n-k} f, \tag{3.14}
\]
for the polynomials $u(x)$ and $v(x)$.

Let $d = d(x)$ be the GCD of $f(x)$ and $g(x)$, and thus there exist polynomials $f_1 = f_1(x)$ and $g_1 = g_1(x)$ such that
\[
f_1 = \frac{f}{d} \quad \text{and} \quad g_1 = \frac{g}{d}.
\]
The form of $S_k(f,g)$ for several values of $k$ is shown in Example 3.2, and it is clear that if $(k - 1)$ columns are added to the coefficients of $f(x)$ and $(k - 1)$ columns are added to the columns of $g(x)$, with suitable vertical shifts, such that the matrix $S(f,g)$ is obtained, then the dimension of the null space of $S(f,g)$ is greater than or equal to $1 + (k - 1) = k$, and thus the rank loss of $S(f,g)$ is greater than or equal to $k$. This implies that the degree of the GCD of $f(x)$ and $g(x)$ is greater than or equal to $k$, and hence
\[
\operatorname{degree} f_1 \leq m - k \quad \text{and} \quad \operatorname{degree} g_1 \leq n - k.
\]
Consider the polynomial division
\[
\frac{x^{n-k}}{g_1} = q + \frac{r}{g_1}, \tag{3.15}
\]
where $q = q(x)$ is the quotient, $r = r(x)$ is the remainder, the degree of $r(x)$ is less than or equal to $(n - k - 1)$, and
\[
\operatorname{degree} q = (n - k) - \operatorname{degree} g_1 = (n - \operatorname{degree} g_1) - k = (\operatorname{degree} d) - k.
\]
The dividend in (3.15) is $x^{n-k}$, in agreement with the right hand side of (3.14). It is now shown that $u = r$ and $v = qf_1$ are solutions of (3.13), where $t$ is defined in (3.12). In particular, these forms of $u(x)$ and $v(x)$ satisfy (3.14) because
\[
uf + vg = rf + qf_1 g = rf + qf_1 d g_1 = rf + qfg_1 = f(r + qg_1) = x^{n-k} f,
\]
from (3.15), and thus it follows that
\[
w^T (A_k t - c_k) = 0, \qquad w = w(x),
\]
possesses a solution for all values of $x$. Since $w \neq 0$, the solution of (3.11) is given by $y = t$, where $t$ is specified in (3.12).

The next theorem follows from Theorems 3.3 and 3.4.

Theorem 3.5. A necessary and sufficient condition for the polynomials $f(x)$ and $g(x)$ to have a common divisor of degree $k$ is that (3.11) possesses a solution.

This result is important when calculating approximate GCDs, and it will be used extensively in the sequel.

3.2 The Sylvester resultant and subresultant matrices for Bernstein basis polynomials

This section considers the Sylvester resultant matrix, and its subresultant matrices, for polynomials expressed in the Bernstein basis, which is, for a polynomial of degree $m$,
\[
\phi_i(x) = \binom{m}{i} (1 - x)^{m-i} x^i, \qquad i = 0, \ldots, m. \tag{3.16}
\]
It follows that, for example, second and third order polynomials expressed in the Bernstein basis are
\[
c_0 \binom{2}{0} (1 - x)^2 + c_1 \binom{2}{1} (1 - x)x + c_2 \binom{2}{2} x^2,
\]
and
\[
c_0 \binom{3}{0} (1 - x)^3 + c_1 \binom{3}{1} (1 - x)^2 x + c_2 \binom{3}{2} (1 - x)x^2 + c_3 \binom{3}{3} x^3,
\]
respectively.
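The Bernstein form (3.16) can be evaluated directly from its definition. The following is a minimal sketch in Python, using exact rational arithmetic; the function name `bernstein_eval` is hypothetical and is not taken from the MATLAB code that accompanies these notes. It checks two standard properties of the basis: the basis functions sum to one, and the coefficients $c_i = i/m$ reproduce the polynomial $x$.

```python
from fractions import Fraction
from math import comb


def bernstein_eval(coeffs, x):
    """Evaluate sum_i coeffs[i] * C(m, i) * (1 - x)**(m - i) * x**i, as in (3.16)."""
    m = len(coeffs) - 1
    return sum(c * comb(m, i) * (1 - x) ** (m - i) * x ** i
               for i, c in enumerate(coeffs))


x = Fraction(1, 3)

# All coefficients equal to one: the basis functions form a partition of unity.
unity = bernstein_eval([1, 1, 1, 1], x)

# Coefficients i/m (here m = 2) reproduce the polynomial x.
linear = bernstein_eval([0, Fraction(1, 2), 1], x)

# A second order polynomial, compared with the expanded form in the text.
c = [2, -1, 5]
second_order = bernstein_eval(c, x)
expanded = c[0] * (1 - x) ** 2 + 2 * c[1] * (1 - x) * x + c[2] * x ** 2
```

Exact fractions are used only to make the identities hold without roundoff; floating point coefficients can be passed in the same way.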
It is shown in [45, 49] that the Sylvester resultant matrix $S(p,q) \in \mathbb{R}^{(m+n) \times (m+n)}$ of the polynomials $p = p(x)$ and $q = q(x)$,
\[
p(x) = \sum_{i=0}^{m} c_i \binom{m}{i} (1 - x)^{m-i} x^i \quad \text{and} \quad q(x) = \sum_{i=0}^{n} d_i \binom{n}{i} (1 - x)^{n-i} x^i, \tag{3.17}
\]
is
\[
S(p,q) = D^{-1} T(p,q), \tag{3.18}
\]
where $D, T(p,q) \in \mathbb{R}^{(m+n) \times (m+n)}$,
\[
D^{-1} = \operatorname{diag} \begin{bmatrix} \dfrac{1}{\binom{m+n-1}{0}} & \dfrac{1}{\binom{m+n-1}{1}} & \cdots & \dfrac{1}{\binom{m+n-1}{m+n-2}} & \dfrac{1}{\binom{m+n-1}{m+n-1}} \end{bmatrix},
\]
and
\[
T(p,q) =
\begin{bmatrix}
c_0\binom{m}{0}         &        &                         & d_0\binom{n}{0}         &        &                         \\
c_1\binom{m}{1}         & \ddots &                         & d_1\binom{n}{1}         & \ddots &                         \\
\vdots                  & \ddots & c_0\binom{m}{0}         & \vdots                  & \ddots & d_0\binom{n}{0}         \\
c_{m-1}\binom{m}{m-1}   & \ddots & c_1\binom{m}{1}         & d_{n-1}\binom{n}{n-1}   & \ddots & d_1\binom{n}{1}         \\
c_m\binom{m}{m}         & \ddots & \vdots                  & d_n\binom{n}{n}         & \ddots & \vdots                  \\
                        & \ddots & c_{m-1}\binom{m}{m-1}   &                         & \ddots & d_{n-1}\binom{n}{n-1}   \\
                        &        & c_m\binom{m}{m}         &                         &        & d_n\binom{n}{n}
\end{bmatrix}, \tag{3.19}
\]
is the Sylvester resultant matrix for polynomials expressed in the scaled Bernstein basis, whose basis functions for a polynomial of degree $m$ are
\[
\phi_i(x) = (1 - x)^{m-i} x^i, \qquad i = 0, \ldots, m. \tag{3.20}
\]
Comparison of $S(f,g)$ and $S(p,q)$ shows that the Bernstein basis Sylvester resultant matrix does not exhibit the diagonal property of its power basis equivalent because of the diagonal matrix $D^{-1}$ that premultiplies $T(p,q)$. Despite this difference, all of the properties of the Sylvester matrix for power basis polynomials apply to the Sylvester matrix for Bernstein basis polynomials, and thus Theorems 3.1, 3.2, 3.3 and 3.4 are valid for Bernstein basis polynomials.

Example 3.3. Consider the Bernstein basis polynomials
\[
p(x) = 3\binom{3}{0}(1 - x)^3 - \frac{5}{6}\binom{3}{1}(1 - x)^2 x - \frac{1}{2}\binom{3}{2}(1 - x)x^2 + \binom{3}{3}x^3,
\]
and
\[
q(x) = 2\binom{2}{0}(1 - x)^2 - \frac{3}{2}\binom{2}{1}(1 - x)x + \binom{2}{2}x^2,
\]
whose GCD is $q(x)$ because
\[
p(x) = \frac{1}{2}\, q(x) \bigl[ 3(1 - x) + 2x \bigr]. \tag{3.21}
\]
The transpose of the Sylvester resultant matrix is
\[
S(p,q)^T =
\begin{bmatrix}
3 & -\frac{5}{2} & -\frac{3}{2} & 1 & 0 \\
0 & 3 & -\frac{5}{2} & -\frac{3}{2} & 1 \\
2 & -3 & 1 & 0 & 0 \\
0 & 2 & -3 & 1 & 0 \\
0 & 0 & 2 & -3 & 1
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & \frac{1}{4} & 0 & 0 & 0 \\
0 & 0 & \frac{1}{6} & 0 & 0 \\
0 & 0 & 0 & \frac{1}{4} & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
=
\begin{bmatrix}
3 & -\frac{5}{8} & -\frac{1}{4} & \frac{1}{4} & 0 \\
0 & \frac{3}{4} & -\frac{5}{12} & -\frac{3}{8} & 1 \\
2 & -\frac{3}{4} & \frac{1}{6} & 0 & 0 \\
0 & \frac{1}{2} & -\frac{1}{2} & \frac{1}{4} & 0 \\
0 & 0 & \frac{1}{3} & -\frac{3}{4} & 1
\end{bmatrix}.
\]
The reduction of $S(p,q)^T$ to row echelon form yields
\[
\begin{bmatrix}
3 & -\frac{5}{8} & -\frac{1}{4} & \frac{1}{4} & 0 \\
0 & \frac{3}{4} & -\frac{5}{12} & -\frac{3}{8} & 1 \\
0 & 0 & \frac{1}{3} & -\frac{3}{4} & 1 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}, \tag{3.22}
\]
and thus the degree of the GCD is 2. The coefficients in the last non-zero row of this matrix yield the GCD,
\[
\frac{1}{3}\binom{4}{2}(1 - x)^2 x^2 - \frac{3}{4}\binom{4}{3}(1 - x)x^3 + \binom{4}{4}x^4
= x^2 \left[ 2\binom{2}{0}(1 - x)^2 - \frac{3}{2}\binom{2}{1}(1 - x)x + \binom{2}{2}x^2 \right].
\]
Deletion of the extraneous factor $x^2$ yields the polynomial $q(x)$, as required.

Consider now the polynomial formed from the first row of the matrix in (3.22). In particular, this polynomial is equal to
\[
3\binom{4}{0}(1 - x)^4 - \frac{5}{8}\binom{4}{1}(1 - x)^3 x - \frac{1}{4}\binom{4}{2}(1 - x)^2 x^2 + \frac{1}{4}\binom{4}{3}(1 - x)x^3,
\]
which simplifies to
\[
(1 - x)\left[ 3\binom{3}{0}(1 - x)^3 - \frac{5}{6}\binom{3}{1}(1 - x)^2 x - \frac{1}{2}\binom{3}{2}(1 - x)x^2 + \binom{3}{3}x^3 \right],
\]
and since the term in the square brackets is equal to $p(x)$, it follows from (3.21) that this polynomial can be simplified further,
\[
(1 - x)p(x) = \frac{1}{2}\bigl[ 3(1 - x)^2 + 2(1 - x)x \bigr] q(x)
= \left[ \frac{3}{2}\binom{2}{0}(1 - x)^2 + \frac{1}{2}\binom{2}{1}(1 - x)x \right] q(x).
\]
It follows that the coefficients in the first row of the matrix in (3.22) define a polynomial, one of whose factors is $q(x)$, that is, the GCD of $p(x)$ and $q(x)$.

Consider now the polynomial formed from the coefficients in the second row of the matrix in (3.22). This polynomial is equal to
\[
\frac{3}{4}\binom{4}{1}(1 - x)^3 x - \frac{5}{12}\binom{4}{2}(1 - x)^2 x^2 - \frac{3}{8}\binom{4}{3}(1 - x)x^3 + \binom{4}{4}x^4,
\]
which simplifies to
\[
\left[ \frac{3}{4}\binom{2}{1}(1 - x)x + \binom{2}{2}x^2 \right] q(x),
\]
which also has $q(x)$, that is, the GCD of $p(x)$ and $q(x)$, as a factor.

Example 3.4.
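Consider the polynomials $p(x)$ and $q(x)$ for $m = 4$ and $n = 3$,
\[
S(p,q) = S_1(p,q) =
\begin{bmatrix}
\frac{c_0\binom{4}{0}}{\binom{6}{0}} & & & \frac{d_0\binom{3}{0}}{\binom{6}{0}} & & & \\[4pt]
\frac{c_1\binom{4}{1}}{\binom{6}{1}} & \frac{c_0\binom{4}{0}}{\binom{6}{1}} & & \frac{d_1\binom{3}{1}}{\binom{6}{1}} & \frac{d_0\binom{3}{0}}{\binom{6}{1}} & & \\[4pt]
\frac{c_2\binom{4}{2}}{\binom{6}{2}} & \frac{c_1\binom{4}{1}}{\binom{6}{2}} & \frac{c_0\binom{4}{0}}{\binom{6}{2}} & \frac{d_2\binom{3}{2}}{\binom{6}{2}} & \frac{d_1\binom{3}{1}}{\binom{6}{2}} & \frac{d_0\binom{3}{0}}{\binom{6}{2}} & \\[4pt]
\frac{c_3\binom{4}{3}}{\binom{6}{3}} & \frac{c_2\binom{4}{2}}{\binom{6}{3}} & \frac{c_1\binom{4}{1}}{\binom{6}{3}} & \frac{d_3\binom{3}{3}}{\binom{6}{3}} & \frac{d_2\binom{3}{2}}{\binom{6}{3}} & \frac{d_1\binom{3}{1}}{\binom{6}{3}} & \frac{d_0\binom{3}{0}}{\binom{6}{3}} \\[4pt]
\frac{c_4\binom{4}{4}}{\binom{6}{4}} & \frac{c_3\binom{4}{3}}{\binom{6}{4}} & \frac{c_2\binom{4}{2}}{\binom{6}{4}} & & \frac{d_3\binom{3}{3}}{\binom{6}{4}} & \frac{d_2\binom{3}{2}}{\binom{6}{4}} & \frac{d_1\binom{3}{1}}{\binom{6}{4}} \\[4pt]
& \frac{c_4\binom{4}{4}}{\binom{6}{5}} & \frac{c_3\binom{4}{3}}{\binom{6}{5}} & & & \frac{d_3\binom{3}{3}}{\binom{6}{5}} & \frac{d_2\binom{3}{2}}{\binom{6}{5}} \\[4pt]
& & \frac{c_4\binom{4}{4}}{\binom{6}{6}} & & & & \frac{d_3\binom{3}{3}}{\binom{6}{6}}
\end{bmatrix}.
\]
The second and third subresultant matrices are
\[
S_2(p,q) =
\begin{bmatrix}
\frac{c_0\binom{4}{0}}{\binom{6}{0}} & & \frac{d_0\binom{3}{0}}{\binom{6}{0}} & & \\[4pt]
\frac{c_1\binom{4}{1}}{\binom{6}{1}} & \frac{c_0\binom{4}{0}}{\binom{6}{1}} & \frac{d_1\binom{3}{1}}{\binom{6}{1}} & \frac{d_0\binom{3}{0}}{\binom{6}{1}} & \\[4pt]
\frac{c_2\binom{4}{2}}{\binom{6}{2}} & \frac{c_1\binom{4}{1}}{\binom{6}{2}} & \frac{d_2\binom{3}{2}}{\binom{6}{2}} & \frac{d_1\binom{3}{1}}{\binom{6}{2}} & \frac{d_0\binom{3}{0}}{\binom{6}{2}} \\[4pt]
\frac{c_3\binom{4}{3}}{\binom{6}{3}} & \frac{c_2\binom{4}{2}}{\binom{6}{3}} & \frac{d_3\binom{3}{3}}{\binom{6}{3}} & \frac{d_2\binom{3}{2}}{\binom{6}{3}} & \frac{d_1\binom{3}{1}}{\binom{6}{3}} \\[4pt]
\frac{c_4\binom{4}{4}}{\binom{6}{4}} & \frac{c_3\binom{4}{3}}{\binom{6}{4}} & & \frac{d_3\binom{3}{3}}{\binom{6}{4}} & \frac{d_2\binom{3}{2}}{\binom{6}{4}} \\[4pt]
& \frac{c_4\binom{4}{4}}{\binom{6}{5}} & & & \frac{d_3\binom{3}{3}}{\binom{6}{5}}
\end{bmatrix},
\]
and
\[
S_3(p,q) =
\begin{bmatrix}
\frac{c_0\binom{4}{0}}{\binom{6}{0}} & \frac{d_0\binom{3}{0}}{\binom{6}{0}} & \\[4pt]
\frac{c_1\binom{4}{1}}{\binom{6}{1}} & \frac{d_1\binom{3}{1}}{\binom{6}{1}} & \frac{d_0\binom{3}{0}}{\binom{6}{1}} \\[4pt]
\frac{c_2\binom{4}{2}}{\binom{6}{2}} & \frac{d_2\binom{3}{2}}{\binom{6}{2}} & \frac{d_1\binom{3}{1}}{\binom{6}{2}} \\[4pt]
\frac{c_3\binom{4}{3}}{\binom{6}{3}} & \frac{d_3\binom{3}{3}}{\binom{6}{3}} & \frac{d_2\binom{3}{2}}{\binom{6}{3}} \\[4pt]
\frac{c_4\binom{4}{4}}{\binom{6}{4}} & & \frac{d_3\binom{3}{3}}{\binom{6}{4}}
\end{bmatrix},
\]
respectively.

The next example illustrates Theorem 3.5 for polynomials expressed in the Bernstein basis.

Example 3.5. It follows from Example 3.3 that the Sylvester resultant matrix $S(p,q)$ is
\[
S(p,q) =
\begin{bmatrix}
3 & 0 & 2 & 0 & 0 \\
-\frac{5}{8} & \frac{3}{4} & -\frac{3}{4} & \frac{1}{2} & 0 \\
-\frac{1}{4} & -\frac{5}{12} & \frac{1}{6} & -\frac{1}{2} & \frac{1}{3} \\
\frac{1}{4} & -\frac{3}{8} & 0 & \frac{1}{4} & -\frac{3}{4} \\
0 & 1 & 0 & 0 & 1
\end{bmatrix}.
\]
Consider the subresultant matrix formed for $k = 1$, in which case it is necessary that the equation
\[
\begin{bmatrix}
0 & 2 & 0 & 0 \\
\frac{3}{4} & -\frac{3}{4} & \frac{1}{2} & 0 \\
-\frac{5}{12} & \frac{1}{6} & -\frac{1}{2} & \frac{1}{3} \\
-\frac{3}{8} & 0 & \frac{1}{4} & -\frac{3}{4} \\
1 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}
=
\begin{bmatrix} 3 \\ -\frac{5}{8} \\ -\frac{1}{4} \\ \frac{1}{4} \\ 0 \end{bmatrix},
\]
have a solution in order that the degree of the GCD of $p(x)$ and $q(x)$ be greater than or equal to one. It is readily checked that a solution of this equation is
\[
y = \begin{bmatrix} 1 & \frac{3}{2} & -\frac{1}{2} & -1 \end{bmatrix}^T,
\]
and thus the degree of the GCD of $p(x)$ and $q(x)$ is at least one.

Consider now the subresultant matrix for $k = 2$.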
A necessary and sufficient condition for the degree of the GCD of $p(x)$ and $q(x)$ to be greater than or equal to two is that the equation
\[
\begin{bmatrix}
2 & 0 \\
-\frac{3}{4} & \frac{1}{2} \\
\frac{1}{6} & -\frac{1}{2} \\
0 & \frac{1}{4}
\end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
=
\begin{bmatrix} 3 \\ -\frac{5}{8} \\ -\frac{1}{4} \\ \frac{1}{4} \end{bmatrix},
\]
have a solution. It is readily checked that the solution of this equation is
\[
y = \begin{bmatrix} \frac{3}{2} & 1 \end{bmatrix}^T,
\]
and thus the degree of the GCD of $p(x)$ and $q(x)$ is at least two. Since the degree of $q(x)$ is two, it follows that the GCD of $p(x)$ and $q(x)$ is proportional to $q(x)$.

3.3 Summary

This chapter has reviewed some properties of the Sylvester resultant matrix, and its subresultant matrices, for polynomials expressed in the power and Bernstein bases. It was shown that the Sylvester matrix for power basis polynomials has a strong diagonal pattern, which is not present in its Bernstein basis equivalent because it is premultiplied by a diagonal matrix. Despite this difference, all the properties of common divisors that are valid for the Sylvester matrix for power basis polynomials are valid for its Bernstein basis equivalent. For example, the rank loss of a resultant matrix is equal to the degree of the GCD of the polynomials, and the coefficients of the GCD can be obtained by reducing the resultant matrix to upper triangular form, for example, by an LU or QR decomposition, and considering the coefficients in the last non-zero row.

The subresultant matrices of a Sylvester matrix are obtained by deleting some rows and columns of the Sylvester matrix. This yields a rectangular matrix, and some theorems about the degree of a common divisor of two polynomials and the dimensions of the subresultant matrices were established. The subresultant matrices of a Sylvester resultant matrix are important when it is required to compute an approximate GCD of two inexact polynomials.
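The rank conditions of Theorems 3.1 and 3.3 can be checked for exact power basis polynomials with a few lines of code. The following is a minimal sketch in Python that uses exact rational arithmetic, so that rank decisions are not affected by roundoff; the function names `subresultant`, `rank` and `gcd_degree` are hypothetical and are not taken from the MATLAB code that accompanies these notes. Coefficients are listed from the constant term upwards, as in (3.1), and the data are those of Example 3.1.

```python
from fractions import Fraction


def subresultant(a, b, k=1):
    """Build S_k(f, g) for f = sum a[i] x^i and g = sum b[i] x^i (k = 1 gives S(f, g))."""
    m, n = len(a) - 1, len(b) - 1
    rows = m + n - k + 1
    cols = []
    # n - k + 1 shifted copies of the coefficients of f, then m - k + 1 copies for g.
    for coeffs, count in ((a, n - k + 1), (b, m - k + 1)):
        top_down = [Fraction(c) for c in reversed(coeffs)]  # leading coefficient first
        for shift in range(count):
            col = [Fraction(0)] * rows
            col[shift:shift + len(top_down)] = top_down
            cols.append(col)
    return [[col[i] for col in cols] for i in range(rows)]


def rank(matrix):
    """Rank by Gaussian elimination in exact arithmetic."""
    rows = [row[:] for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def gcd_degree(a, b):
    """Largest k for which S_k(f, g) is rank deficient, as in (3.8) and (3.9)."""
    m, n = len(a) - 1, len(b) - 1
    d = 0
    for k in range(1, min(m, n) + 1):
        if rank(subresultant(a, b, k)) < m + n - 2 * k + 2:
            d = k
    return d


# Example 3.1: f = -3x^3 + (25/2)x^2 - (23/2)x + 3 and g = 6x^2 - 7x + 2.
f = [3, Fraction(-23, 2), Fraction(25, 2), -3]
g = [2, -7, 6]
rank_of_S = rank(subresultant(f, g))   # Theorem 3.1 predicts m + n - d
degree = gcd_degree(f, g)
```

This sketch is only appropriate for exact data. For inexact coefficients the rank must be estimated numerically, which is the subject of Chapter 4.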
Chapter 4 Approximate greatest common divisors

The application of the Sylvester matrix and its subresultant matrices to the calculation of the GCD of two polynomials was considered in Chapter 3. The methods required for these calculations are adequate if neither roundoff nor data errors are present, and all computations are performed in a symbolic environment. These conditions rarely prevail in practice because data is often inexact, and computations are performed in a floating point environment in which roundoff errors cannot be ignored. It is therefore assumed that the given inexact polynomials $f(x)$ and $g(x)$, which are defined in (3.1), are coprime, and that a minor structured perturbation of the coefficients, to $a_i + \delta a_i$ and $b_i + \delta b_i$, may cause the perturbed polynomials
\[
\tilde{f}(x) = \sum_{i=0}^{m} (a_i + \delta a_i) x^i \quad \text{and} \quad \tilde{g}(x) = \sum_{i=0}^{n} (b_i + \delta b_i) x^i, \tag{4.1}
\]
to have a non-constant GCD. The computed GCD is an approximate GCD of the given inexact polynomials $f(x)$ and $g(x)$, and moreover, it is not unique because different perturbations of the given inexact polynomials yield different approximate GCDs.

The Sylvester matrix and its subresultant matrices can still be used if it is required to compute an approximate GCD of two polynomials, but modifications to the simple operations described in Chapter 3 are required, and they are considered in this chapter.

Two methods for the computation of an approximate GCD are considered in this chapter. In particular, Section 4.3 describes a method that makes extensive use of the structured nature of the Sylvester resultant matrix for power basis polynomials [48], and it is extended in Section 4.4 to Bernstein basis polynomials [1, 47]. Section 4.6 describes a method that uses a partial singular value decomposition of the Sylvester subresultant matrices to obtain an initial estimate of the GCD, followed by a non-linear refinement procedure in order to improve its accuracy [53].
The method of GCD computation that uses structured matrices takes advantage of the non-uniqueness of the Sylvester resultant matrix to introduce a parameter $\alpha$ that improves the computed results with respect to its default value $\alpha = 1$. This non-uniqueness is described in Section 4.2, and the examples in Section 4.3.2 show that an incorrect value of $\alpha$ leads to poor results. Also, this method yields a low rank approximation of a Sylvester resultant matrix, which has applications for the calculation of the points of intersection of curves and surfaces.

4.1 Previous work

The computation of an approximate GCD of the inexact polynomials (3.1) has been considered by several authors. For example, Corless et al. [6], and Zarowski et al. [52], use the QR decomposition of the Sylvester matrix $S(f,g)$. Similarly, the singular value decomposition of $S(f,g)$ is used in [5], [8] and [40] in order to compute an approximate GCD, but neither of these decompositions preserves its structure. In particular, the smallest non-zero singular value of $S(f,g)$ is a measure of its distance to singularity, but this is the distance to an arbitrary rank deficient matrix, and not the distance to the nearest rank deficient Sylvester matrix. Karmarkar and Lakshman [26] use optimisation techniques in order to compute the smallest perturbations that must be applied to the coefficients of two polynomials such that they have a non-constant GCD, and Pan [34] uses Padé approximations to compute an approximate GCD.

4.2 The non-uniqueness of the Sylvester resultant matrix

The Sylvester matrix $S(f,g)$ has a very simple structure, and this makes it convenient for computations. This simple structure exhibits one property that has a significant effect on the quality of the computed approximate GCD.
In particular, an approximate GCD of $f(x)$ and $g(x)$ is equal to, up to a scalar multiplier, an approximate GCD of $f(x)$ and $\alpha g(x)$, where $\alpha$ is an arbitrary non-zero constant, and thus the resultant matrix $S(f, \alpha g)$ should be used when it is desired to compute an approximate GCD of $f(x)$ and $g(x)$. Since $S(f, \alpha g) \neq \alpha S(f,g)$, the inclusion of $\alpha$ permits a family of approximate GCDs, rather than only one approximate GCD, to be computed. The restriction $\alpha = 1$ yields unsatisfactory solutions, but it is shown that the inclusion of $\alpha$ allows significantly improved solutions to be obtained.

The Sylvester matrix $S(f,g)$ is defined in (3.2), but the discussion in the previous paragraph shows that it is more appropriate to consider the resultant matrix $S(f, \alpha g)$, where $\alpha$ is a real non-zero constant,
\[
S(f, \alpha g) =
\begin{bmatrix}
a_m     &        &         & \alpha b_n     &        &                \\
a_{m-1} & \ddots &         & \alpha b_{n-1} & \ddots &                \\
\vdots  & \ddots & a_m     & \vdots         & \ddots & \alpha b_n     \\
a_1     & \ddots & a_{m-1} & \alpha b_1     & \ddots & \alpha b_{n-1} \\
a_0     & \ddots & \vdots  & \alpha b_0     & \ddots & \vdots         \\
        & \ddots & a_1     &                & \ddots & \alpha b_1     \\
        &        & a_0     &                &        & \alpha b_0
\end{bmatrix},
\]
for approximate GCD computations. The scalar $\alpha$ can be interpreted as the magnitude of $g(x)$ relative to the magnitude of $f(x)$, but this interpretation is only valid provided that $f(x)$ and $g(x)$ are normalised in the same way. Since the coefficients of $f(x)$ and $g(x)$ may vary by several orders of magnitude, normalisation by the geometric means of their coefficients is convenient, and thus the polynomials $f(x)$ and $g(x)$ are redefined as
\[
f(x) := \frac{1}{\left( \prod_{i=0}^{m} |a_i| \right)^{\frac{1}{m+1}}} \sum_{i=0}^{m} a_i x^i, \tag{4.2}
\]
and
\[
g(x) := \frac{1}{\left( \prod_{i=0}^{n} |b_i| \right)^{\frac{1}{n+1}}} \sum_{i=0}^{n} b_i x^i, \tag{4.3}
\]
respectively, where $a_i$, $i = 0, \ldots, m$, and $b_i$, $i = 0, \ldots, n$, are the perturbed coefficients, and thus the Sylvester matrix $S(f, \alpha g)$ is constructed from these polynomials. If one or more of the coefficients of a polynomial is zero, then the normalisation by the geometric mean of its coefficients, as shown in (4.2) and (4.3), requires modification.
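The normalisation (4.2) and (4.3), and the observation that $S(f, \alpha g) \neq \alpha S(f,g)$, can be illustrated with a short sketch in Python. The function names `normalise` and `sylvester` are hypothetical, the coefficients are listed from the constant term upwards, and the geometric mean is computed through logarithms as one way of avoiding overflow in the product. The sketch simply rejects zero coefficients, because the modification required in that case is not specified here.

```python
import math


def normalise(coeffs):
    """Divide the coefficients by their geometric mean, as in (4.2) and (4.3)."""
    if any(c == 0 for c in coeffs):
        raise ValueError("this sketch assumes that all coefficients are non-zero")
    log_mean = sum(math.log(abs(c)) for c in coeffs) / len(coeffs)
    scale = math.exp(log_mean)  # geometric mean of the absolute values
    return [c / scale for c in coeffs]


def sylvester(a, b, alpha=1.0):
    """Build S(f, alpha * g): n columns from f followed by m columns from alpha * g."""
    m, n = len(a) - 1, len(b) - 1
    size = m + n
    cols = []
    for coeffs, count, factor in ((a, n, 1.0), (b, m, alpha)):
        top_down = [factor * c for c in reversed(coeffs)]  # leading coefficient first
        for shift in range(count):
            col = [0.0] * size
            col[shift:shift + len(top_down)] = top_down
            cols.append(col)
    return [[col[i] for col in cols] for i in range(size)]


f = normalise([3.0, -11.5, 12.5, -3.0])
g = normalise([2.0, -7.0, 6.0])

S_one = sylvester(f, g)             # alpha = 1
S_two = sylvester(f, g, alpha=2.0)  # only the columns of g are scaled
```

Only the last $m$ columns of the matrix change with $\alpha$, so varying $\alpha$ changes the relative weight of the two polynomials rather than rescaling the whole matrix.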
4.3 Structured matrices and constrained minimisation

This section describes the use of structured matrices and constrained optimisation for the computation of an approximate GCD of two inexact polynomials. This requires the calculation of the perturbations $\delta a_i$, $i = 0, \ldots, m$, and $\delta b_i$, $i = 0, \ldots, n$, such that the perturbed polynomials $\tilde{f}(x)$ and $\tilde{g}(x)$, which are defined in (4.1), have a non-constant GCD. This problem reduces, therefore, to the calculation of a low rank structured approximation $S(\tilde{f}, \tilde{g})$ of a full rank Sylvester matrix $S(f, \alpha g)$, where $f(x)$ and $g(x)$ are the normalised polynomials (4.2) and (4.3) respectively. It is usual to require the smallest (structured) perturbations that perform this transformation, and thus the computation of an approximate GCD reduces to a constrained minimisation, where the function to be minimised (the objective function) is $\|\delta a\|^2 + \alpha^2 \|\delta b\|^2$, and the constraint is the requirement that $\tilde{f}(x)$ and $\tilde{g}(x)$ have a non-constant GCD. This condition is imposed by employing Theorem 3.5 for the perturbed polynomials $\tilde{f}(x)$ and $\tilde{g}(x)$ and determining the integers $k$ for which (3.11) has a solution. The largest integer defines the degree of the approximate GCD of $\tilde{f}(x)$ and $\tilde{g}(x)$.

The Sylvester matrix and subresultant matrices have very strong structures, and thus if $S_k(f, \alpha g)$ is a subresultant matrix, it is necessary to determine the smallest perturbations of $f(x)$ and $\alpha g(x)$ such that $S_k(\tilde{f}, \tilde{g})$ has the same structure as $S_k(f, \alpha g)$. Since $f(x)$ and $g(x)$ are inexact and coprime, and their theoretically exact forms have a non-constant GCD, there exist perturbations $\delta f(x)$ and $\alpha \delta g(x)$ such that $f(x) + \delta f(x)$ and $\alpha (g(x) + \delta g(x))$ have a non-constant common divisor, that is, if $h_k \in \mathbb{R}^{m+n-k+1}$ and $E_k \in \mathbb{R}^{(m+n-k+1) \times (m+n-2k+1)}$ are structured perturbations of $c_k$ and $A_k$ respectively, it follows from Theorem 3.5 that the equation
\[
(A_k + E_k) y = c_k + h_k, \tag{4.4}
\]
which is the perturbed form of (3.11), has an exact solution. It follows from Theorem 3.5 that, for a given value of $k$, (4.4) has a solution if and only if $\tilde{f}(x)$ and $\tilde{g}(x)$, where
\[
\tilde{f}(x) = f(x) + \delta f(x) \quad \text{and} \quad \tilde{g}(x) = \alpha \left( g(x) + \delta g(x) \right),
\]
have a common divisor of degree $k$.

The computation of a structured low rank approximation of $S(f, \alpha g)$ therefore requires the determination of $E_k$ and $h_k$ such that (4.4) possesses a solution for which $A_k$ and $E_k$ have the same structure, and $c_k$ and $h_k$ have the same structure. This is an overdetermined equation, and $k$ is initially set equal to its maximum value, $k = k_0 = \min(m,n)$. If a solution exists, then the degree of the GCD of $\tilde{f}(x)$ and $\tilde{g}(x)$ is equal to $k_0$. If this equation does not possess a solution, then $k$ is reduced to $k_0 - 1$, and if a solution exists for this value of $k$, then the degree of the GCD of $\tilde{f}(x)$ and $\tilde{g}(x)$ is equal to $k_0 - 1$. If a solution does not exist, then $k$ is reduced to $k_0 - 2$, and this process is repeated until (4.4) possesses a solution. This result is used in the next section in order to compute a structured low rank approximation of $S(f, \alpha g)$.

The perturbation matrix $E_k$ and perturbation vector $h_k$ are structured, and thus ordinary least squares (LS) methods cannot be used for their computation because they do not preserve the structure of a matrix or vector. It is therefore necessary to use structure preserving matrix methods in order to guarantee that (4.4) has the same form as its unperturbed equivalent (3.11), and the method of structured total least norm (STLN) is therefore used [39].

If $z_i$ is the perturbation of the coefficient $a_i$, $i = 0, \ldots, m$, of $f(x)$, and $z_{m+1+i}$ is the perturbation of the coefficient $\alpha b_i$, $i = 0, \ldots, n$, of $g(x)$, then the Sylvester subresultant matrix $B_k = B_k(z) := S_k(\delta f, \alpha \delta g)$ of the perturbations is
\[
B_k := \begin{bmatrix} h_k & E_k \end{bmatrix} =
\begin{bmatrix}
z_m     &        &         & z_{m+n+1} &        &           \\
z_{m-1} & \ddots &         & z_{m+n}   & \ddots &           \\
\vdots  & \ddots & z_m     & \vdots    & \ddots & z_{m+n+1} \\
z_1     & \ddots & z_{m-1} & z_{m+2}   & \ddots & z_{m+n}   \\
z_0     & \ddots & \vdots  & z_{m+1}   & \ddots & \vdots    \\
        & \ddots & z_1     &           & \ddots & z_{m+2}   \\
        &        & z_0     &           &        & z_{m+1}
\end{bmatrix}, \tag{4.5}
\]
where $h_k = h_k(z)$ is equal to the first column of $B_k(z)$, and $E_k = E_k(z)$ is equal to the last $m + n - 2k + 1$ columns of $B_k(z)$. The matrix $B_k(z)$ is a structured error matrix because $S_k(f, \alpha g) + B_k(z)$ is a subresultant matrix.

It follows from the definitions of $h_k$ and $z$ that there exists a matrix $P_k \in \mathbb{R}^{(m+n-k+1) \times (m+n+2)}$ such that
\[
h_k = P_k z =
\begin{bmatrix}
I_{m+1} & 0_{m+1,n+1} \\
0_{n-k,m+1} & 0_{n-k,n+1}
\end{bmatrix} z,
\]
where $I_{m+1}$ is the identity matrix of order $m + 1$ and the subscripts on the zero matrices indicate their order.

Example 4.1. If $m = n = 3$ and $k = 2$, then
\[
A_2 =
\begin{bmatrix}
0 & b_3 & 0 \\
a_3 & b_2 & b_3 \\
a_2 & b_1 & b_2 \\
a_1 & b_0 & b_1 \\
a_0 & 0 & b_0
\end{bmatrix},
\qquad
c_2 =
\begin{bmatrix} a_3 \\ a_2 \\ a_1 \\ a_0 \\ 0 \end{bmatrix},
\qquad
P_2 =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix},
\]
\[
z = \begin{bmatrix} z_3 & z_2 & z_1 & z_0 & z_7 & z_6 & z_5 & z_4 \end{bmatrix}^T,
\]
and
\[
h_2 =
\begin{bmatrix} z_3 \\ z_2 \\ z_1 \\ z_0 \\ 0 \end{bmatrix},
\qquad
E_2 =
\begin{bmatrix}
0 & z_7 & 0 \\
z_3 & z_6 & z_7 \\
z_2 & z_5 & z_6 \\
z_1 & z_4 & z_5 \\
z_0 & 0 & z_4
\end{bmatrix}.
\]

The residual $r(z,y)$ that is associated with an approximate solution of (4.4) due to the perturbations $h_k$ and $E_k$ is given by
\[
r(z,y) = c_k + h_k - (A_k + E_k)y, \qquad h_k = P_k z, \quad E_k = E_k(z), \tag{4.6}
\]
where the elements of $z$ are $z_i$, $i = 0, \ldots, m + n + 1$, and it is required to minimise $\|z\|$ subject to the constraint $r(z,y) = 0$, which is an equality constrained least squares (LSE) problem.

It is necessary to replace the vector $E_k y$ with a vector $Y_k z$, that is,
\[
Y_k z = E_k y, \qquad Y_k = Y_k(y), \quad E_k = E_k(z), \tag{4.7}
\]
and hence
\[
Y_k \delta z = \delta E_k\, y, \tag{4.8}
\]
where $Y_k \in \mathbb{R}^{(m+n-k+1) \times (m+n+2)}$, and thus the residual $r(z,y)$ can be written as
\[
r(z,y) = c_k + h_k - Y_k z - A_k y.
\]

Example 4.2. Consider the vectors and matrices in Example 4.1.
It is readily checked that
\[
Y_2 =
\begin{bmatrix}
0 & 0 & 0 & 0 & y_2 & 0 & 0 & 0 \\
y_1 & 0 & 0 & 0 & y_3 & y_2 & 0 & 0 \\
0 & y_1 & 0 & 0 & 0 & y_3 & y_2 & 0 \\
0 & 0 & y_1 & 0 & 0 & 0 & y_3 & y_2 \\
0 & 0 & 0 & y_1 & 0 & 0 & 0 & y_3
\end{bmatrix},
\]
and that $Y_2 z = E_2 y$.

The calculation of the perturbations to $f(x)$ and $\alpha g(x)$ such that $\tilde{f}(x)$ and $\tilde{g}(x)$ have a non-constant common divisor requires the solution of $r(z,y) = 0$, which is a set of $(m + n - k + 1)$ non-linear equations in $z \in \mathbb{R}^{m+n+2}$ and $y \in \mathbb{R}^{m+n-2k+1}$. It is also necessary to impose the constraint that $f(x)$ and $g(x)$ be perturbed by the minimum amount.

Iterative algorithms for the solution of $r(z,y) = 0$ require that it be linearised, and thus if it is assumed that second order terms are sufficiently small such that they can be neglected, then since $E_k = E_k(z)$,
\begin{align*}
r(z + \delta z, y + \delta y) &= c_k + (h_k + \delta h_k) - A_k (y + \delta y) - (E_k + \delta E_k)(y + \delta y) \\
&\approx c_k + (h_k + \delta h_k) - A_k y - A_k \delta y - E_k y - E_k \delta y - \delta E_k\, y \\
&= r(z,y) + P_k \delta z - A_k \delta y - E_k \delta y - Y_k \delta z \\
&= r(z,y) - (Y_k - P_k)\delta z - (A_k + E_k)\delta y,
\end{align*}
using (4.8). The requirement that the perturbations be minimised is imposed by posing the problem in the form of a constrained minimisation,
\[
\min_{z + \delta z} \| D(z + \delta z) \| \quad \text{such that} \quad r(z + \delta z, y + \delta y) = 0, \tag{4.9}
\]
where $D \in \mathbb{R}^{(m+n+2) \times (m+n+2)}$ is a diagonal matrix that accounts for the repetition of the elements of $z$ in $B_k(z)$. In particular, since each of the perturbations $z_i$, $i = 0, \ldots, m$, occurs $(n - k + 1)$ times, and each of the perturbations $z_i$, $i = m + 1, \ldots, m + n + 1$, occurs $(m - k + 1)$ times, it follows that
\[
D = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}
= \begin{bmatrix} (n - k + 1) I_{m+1} & 0 \\ 0 & (m - k + 1) I_{n+1} \end{bmatrix}.
\]
The problem statement (4.9) is the LSE problem, and algorithms for its solution are considered in the next section.
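The identity $Y_k z = E_k y$ in (4.7), which allows the residual to be linearised with respect to $z$, can be checked numerically. The following is a minimal sketch in Python; the function names `error_matrix`, `y_matrix` and `matvec` are hypothetical, and the vector $z$ is assumed to be ordered as in Example 4.1, that is, the perturbations of the coefficients of $f(x)$ from the leading coefficient downwards, followed by the perturbations of the coefficients of $\alpha g(x)$ in the same order.

```python
def error_matrix(z, m, n, k):
    """E_k(z): the matrix B_k(z) of (4.5) without its first column h_k."""
    rows = m + n - k + 1
    zf, zg = z[:m + 1], z[m + 1:]
    cols = []
    # Columns of f with shifts 1, ..., n - k, then columns of g with shifts 0, ..., m - k.
    for block, first, last in ((zf, 1, n - k), (zg, 0, m - k)):
        for shift in range(first, last + 1):
            col = [0] * rows
            col[shift:shift + len(block)] = block
            cols.append(col)
    return [[col[i] for col in cols] for i in range(rows)]


def y_matrix(y, m, n, k):
    """Y_k(y) such that Y_k z = E_k(z) y for every z."""
    rows = m + n - k + 1
    Y = [[0] * (m + n + 2) for _ in range(rows)]
    # One (shift, column offset in z, block length) triple per column of E_k.
    layout = [(s, 0, m + 1) for s in range(1, n - k + 1)]
    layout += [(s, m + 1, n + 1) for s in range(0, m - k + 1)]
    for y_j, (shift, offset, length) in zip(y, layout):
        for p in range(length):
            Y[shift + p][offset + p] += y_j
    return Y


def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]


# Example 4.1 and Example 4.2: m = n = 3, k = 2, with arbitrary integer data.
m, n, k = 3, 3, 2
z = [1, 2, 3, 4, 5, 6, 7, 8]   # z_3, z_2, z_1, z_0, z_7, z_6, z_5, z_4
y = [2, -1, 3]                 # y_1, y_2, y_3
E2 = error_matrix(z, m, n, k)
Y2 = y_matrix(y, m, n, k)
lhs, rhs = matvec(Y2, z), matvec(E2, y)
```

The second row of `Y2` is `[2, 0, 0, 0, 3, -1, 0, 0]`, which corresponds to the row $[y_1 \; 0 \; 0 \; 0 \; y_3 \; y_2 \; 0 \; 0]$ of $Y_2$ in Example 4.2.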
4.3.1 Algorithms for the solution of the LSE problem

The LSE problem (4.9) can be written in matrix form as
\[
\min_{Cv = t} \| Ev - s \|,
\]
where
\begin{align*}
C &= \begin{bmatrix} (Y_k - P_k) & (A_k + E_k) \end{bmatrix} \in \mathbb{R}^{(m+n-k+1) \times (2m+2n-2k+3)}, \\
E &= \begin{bmatrix} D & 0 \end{bmatrix} \in \mathbb{R}^{(m+n+2) \times (2m+2n-2k+3)}, \\
t &= r(z,y) \in \mathbb{R}^{m+n-k+1}, \tag{4.10} \\
s &= -Dz \in \mathbb{R}^{m+n+2}, \\
v &= \begin{bmatrix} \delta z \\ \delta y \end{bmatrix} \in \mathbb{R}^{2m+2n-2k+3},
\end{align*}
$\delta z \in \mathbb{R}^{m+n+2}$ and $\delta y \in \mathbb{R}^{m+n-2k+1}$. For ease of notation, it is convenient to define
\[
m_1 = m + n - k + 1, \quad m_2 = m + n + 2, \quad m_3 = 2m + 2n - 2k + 3, \quad m_4 = m_3 - m_1 = m + n - k + 2,
\]
where $m_1 < m_2, m_3$, and thus $C \in \mathbb{R}^{m_1 \times m_3}$, $E \in \mathbb{R}^{m_2 \times m_3}$, $t \in \mathbb{R}^{m_1}$ and $s \in \mathbb{R}^{m_2}$. It is assumed that (i) $\operatorname{rank} C = m_1$, which guarantees that the constraint is consistent, and (ii) that
\[
\mathcal{N}(E) \cap \mathcal{N}(C) = \{0\} \quad \Leftrightarrow \quad \operatorname{rank} \begin{bmatrix} E \\ C \end{bmatrix} = m_3,
\]
where $\mathcal{N}(X)$ denotes the null space of $X$, which guarantees that the LSE problem has a unique solution [16], page 396.

There exist three principal methods for the solution of the LSE problem: the method of weights, the method of Lagrange multipliers and the QR decomposition.

The method of weights

The method of weights requires that the LSE problem (4.9) be written as an unconstrained LS problem,
\[
\min_{v} \left\| \begin{bmatrix} \tau C \\ E \end{bmatrix} v - \begin{bmatrix} \tau t \\ s \end{bmatrix} \right\|, \qquad \tau \gg 1,
\]
where $v = v(\tau) \in \mathbb{R}^{m_3}$ and $\tau$ is a weight whose large value guarantees that, in the limit, the equality constraint is satisfied. The normal equations associated with this minimisation are
\[
\left( \tau^2 C^T C + E^T E \right) v = \tau^2 C^T t + E^T s.
\]
If $\lambda(\tau) \in \mathbb{R}^{m_1}$ and $r(\tau) \in \mathbb{R}^{m_2}$ are defined as
\[
\lambda(\tau) := \tau^2 \left( t - Cv(\tau) \right) \quad \text{and} \quad r(\tau) := s - Ev(\tau),
\]
respectively, then
\[
C^T \lambda(\tau) + E^T r(\tau) = 0,
\]
and these three equations can be combined,
\[
\begin{bmatrix}
\tau^{-2} I_{m_1} & 0 & C \\
0 & I_{m_2} & E \\
C^T & E^T & 0
\end{bmatrix}
\begin{bmatrix} \lambda(\tau) \\ r(\tau) \\ v(\tau) \end{bmatrix}
=
\begin{bmatrix} t \\ s \\ 0 \end{bmatrix}. \tag{4.11}
\]
The method of weights is attractive because of its simplicity (a standard LS solver can be used), but a large value of $\tau$ may be necessary in order to obtain an acceptable solution.
This may, however, cause numerical problems, as shown in Section 3 in [29], and care must therefore be exercised when the method is implemented computationally. The value of $\tau$ must be determined, and this issue is now considered.

Van Loan [29] recommends the value $\tau = \mu^{-1/2}$, where $\mu$ is the machine precision, because it implies that
\[
\| Ev - s \|^2 + \tau^2 \| Cv - t \|^2 = \| Ev - s \|^2 + \frac{1}{\mu} \| Cv - t \|^2 \approx \| Cv - t \|^2,
\]
from which it follows that the equality constraint is enforced exactly, to within the limits of machine precision. Barlow [2], and Barlow and Vemulapati [3], recommend that $\tau = \mu^{-1/3}$, which is a heuristic that is derived from experimental results. They note that the choice of $\tau$ is critical for the convergence of the algorithm because if $\tau$ is too small or too large, the algorithm may converge very slowly, or it may converge to inaccurate values, or it may not converge at all.

The values $\tau = \mu^{-1/2}$ and $\tau = \mu^{-1/3}$ are independent of the data $E$, $C$, $s$ and $t$, and it is therefore appropriate to multiply the constraint by a constant $\nu$ such that
\[
\left\| \begin{bmatrix} E & s \end{bmatrix} \right\| = \nu \left\| \begin{bmatrix} C & t \end{bmatrix} \right\|.
\]
It is recommended that this scaled form of the equality constraint, that is,
\[
C_1 v = t_1, \quad C_1 = \nu C \quad \text{and} \quad t_1 = \nu t,
\]
be used when the LSE problem is solved by the method of weights.

The method of weights is used for the solution of the LSE problem in [25], [27] and [55], but the disadvantages discussed above suggest that alternative methods for its solution be sought.

The method of Lagrange multipliers

The LSE problem can also be solved by the method of Lagrange multipliers. In particular, if $\lambda \in \mathbb{R}^{m_1}$ is a vector of Lagrange multipliers, then the LSE problem requires the minimisation of the function $h(v, \lambda)$,
\[
h(v, \lambda) = \frac{1}{2} (Ev - s)^T (Ev - s) - \lambda^T (Cv - t),
\]
and this leads to the equations
\[
E^T E v - E^T s - C^T \lambda = 0, \qquad Cv = t.
\]
The residual $r$ of the objective function is equal to $r = s - Ev$, and thus
\[
E^T r + C^T \lambda = 0,
\]
from which it follows that these equations can be written in matrix form,
\[
\begin{bmatrix} 0 & 0 & C \\ 0 & I_{m_2} & E \\ C^T & E^T & 0 \end{bmatrix}
\begin{bmatrix} \lambda \\ r \\ v \end{bmatrix}
= \begin{bmatrix} t \\ s \\ 0 \end{bmatrix}. \tag{4.12}
\]
It is seen that the solution of (4.11) approaches the solution of (4.12) as $\tau \to \infty$, which provides the equivalence between the method of weights and the method of Lagrange multipliers. The coefficient matrix in (4.12) is square and of order $m_1 + m_2 + m_3$, which is large for many problems of practical interest. Smaller matrices are required when the QR decomposition is used to solve the LSE problem.

The QR decomposition

The LSE problem can be solved directly by the QR decomposition [13], pages 585-586, and [16], pages 397-398. Let
\[
C^T = QR = Q \begin{bmatrix} R_1 \\ 0 \end{bmatrix}
\]
be the QR decomposition of $C^T$, where $Q \in \mathbb{R}^{m_3 \times m_3}$ is an orthogonal matrix, $R \in \mathbb{R}^{m_3 \times m_1}$ and $R_1 \in \mathbb{R}^{m_1 \times m_1}$ is a non-singular upper triangular matrix. If
\[
EQ = \begin{bmatrix} E_1 & E_2 \end{bmatrix},
\]
where $E_1 \in \mathbb{R}^{m_2 \times m_1}$ and $E_2 \in \mathbb{R}^{m_2 \times m_4}$, and
\[
Q^T v = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix},
\]
where $w_1 \in \mathbb{R}^{m_1}$ and $w_2 \in \mathbb{R}^{m_4}$, the constraint $Cv = t$ becomes
\[
R_1^T w_1 = t.
\]
Similarly, the objective function $\| Ev - s \|$ becomes
\[
\begin{aligned}
\| Ev - s \| &= \left\| E Q Q^T v - s \right\| \\
&= \left\| EQ \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} - s \right\| \\
&= \| E_1 w_1 + E_2 w_2 - s \| \\
&= \| E_2 w_2 - (s - E_1 w_1) \|,
\end{aligned}
\]
and thus it is minimised when
\[
w_2 = E_2^{\dagger} (s - E_1 w_1),
\]
from which it follows that the solution of the LSE problem is
\[
v = Q \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = Q(:, 1:m_1) w_1 + Q(:, m_1 + 1:m_3) w_2.
\]
This method for solving the LSE problem will be used because it does not possess the disadvantages of the method of weights, and the matrices are smaller than those required for the method of Lagrange multipliers.

4.3.2 Computational details

The given inexact polynomials $f(x)$ and $g(x)$, which are defined in (4.2) and (4.3) respectively, are constructed by perturbing their theoretically exact forms $\hat{f}(x)$ and $\hat{g}(x)$, whose coefficients are $\hat{a}_i$, $i = 0, \ldots, m$, and $\hat{b}_i$, $i = 0, \ldots, n$, respectively. It therefore follows that if $\mu = 1/\varepsilon$ is the signal-to-noise ratio, then
\[
\| \delta f(x) \| = \varepsilon \| \hat{f}(x) \| \quad \text{and} \quad \| \delta g(x) \| = \varepsilon \| \hat{g}(x) \|.
\]
If $c_f \in \mathbb{R}^{m+1}$ and $c_g \in \mathbb{R}^{n+1}$ are vectors of random variables, all of which are uniformly distributed in the interval $[-1, +1]$, then the perturbations $\delta f(x)$ and $\delta g(x)$ are given by
\[
\delta f(x) = \varepsilon \frac{\| \hat{f}(x) \| c_f}{\| c_f \|} \quad \text{and} \quad \delta g(x) = \varepsilon \frac{\| \hat{g}(x) \| c_g}{\| c_g \|},
\]
and thus the inexact polynomials $f(x)$ and $g(x)$ are
\[
f(x) = \hat{f}(x) + \varepsilon \frac{\| \hat{f}(x) \| c_f}{\| c_f \|} = \sum_{i=0}^{m} (\hat{a}_i + \delta a_i) x^i,
\]
and
\[
g(x) = \hat{g}(x) + \varepsilon \frac{\| \hat{g}(x) \| c_g}{\| c_g \|} = \sum_{i=0}^{n} (\hat{b}_i + \delta b_i) x^i,
\]
respectively. It therefore follows that the forms of the inexact polynomials $f(x)$ and $g(x)$ that form the Sylvester matrix $S(f, \alpha g)$ are
\[
f(x) = \frac{1}{\left( \prod_{i=0}^{m} | \hat{a}_i + \delta a_i | \right)^{\frac{1}{m+1}}} \sum_{i=0}^{m} (\hat{a}_i + \delta a_i) x^i, \tag{4.13}
\]
and
\[
g(x) = \frac{1}{\left( \prod_{i=0}^{n} | \hat{b}_i + \delta b_i | \right)^{\frac{1}{n+1}}} \sum_{i=0}^{n} (\hat{b}_i + \delta b_i) x^i. \tag{4.14}
\]
It was shown in Section 4.3 that the computation of an approximate GCD can be posed as an LSE problem, and algorithms for its solution were considered in Section 4.3.1. Algorithm 4.1 is a simple implementation of the QR decomposition for the solution of the LSE problem (4.9). Since the equation $r(z + \delta z, y + \delta y) = 0$ is imposed in its linearised form, the QR decomposition is applied to the linearised form, and a simple iterative procedure is used to obtain a solution of the non-linear equation.

An initial estimate of the solution is required, and this is obtained by setting $z = 0$, that is, a simple LS problem is solved. The corresponding value of $y$ is obtained by setting $r(z, y) = h_k = 0$ and $E_k = 0$ in (4.6), and thus its initial value is given by the solution of the minimisation
\[
y = \arg \min_t \| A_k t - c_k \|. \tag{4.15}
\]
Termination of the algorithm occurs when the relative error between successive iterates is less than a specified tolerance. The perturbed polynomials that have a non-constant GCD are given by
\[
\tilde{f}(x) = \sum_{i=0}^{m} (a_i + z_i) x^i \quad \text{and} \quad \tilde{g}(x) = \sum_{i=0}^{n} \left( b_i + \frac{z_{m+1+i}}{\alpha} \right) x^i,
\]
because $z_{m+1+i}$, $i = 0, \ldots, n$, are the structured perturbations of $\alpha g(x)$.
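The solution of the LSE problem by the QR decomposition, as derived in Section 4.3.1, can be sketched in a few lines. The following Python code is a minimal illustration and not the MATLAB implementation used for the examples in these notes: the function name `solve_lse_qr` is hypothetical, NumPy is assumed to be available, and it is assumed that $C$ has full row rank and that the stacked matrix of $E$ and $C$ has full column rank.

```python
import numpy as np


def solve_lse_qr(E, s, C, t):
    """Minimise ||E v - s||_2 subject to C v = t, using the QR decomposition of C^T."""
    m1 = C.shape[0]
    # C^T = Q R with Q (m3 x m3) orthogonal and R = [R1; 0]
    Q, R = np.linalg.qr(C.T, mode="complete")
    R1 = R[:m1, :]                      # m1 x m1, upper triangular
    # The constraint C v = t becomes R1^T w1 = t
    w1 = np.linalg.solve(R1.T, t)
    # Partition E Q = [E1 E2]
    EQ = E @ Q
    E1, E2 = EQ[:, :m1], EQ[:, m1:]
    # w2 = pinv(E2) (s - E1 w1), computed as a least squares solution
    w2, *_ = np.linalg.lstsq(E2, s - E1 @ w1, rcond=None)
    # v = Q(:, 1:m1) w1 + Q(:, m1+1:m3) w2
    return Q[:, :m1] @ w1 + Q[:, m1:] @ w2
```

In an iterative scheme of the kind described above, the returned vector would be split into $\delta z$ (its first $m + n + 2$ entries) and $\delta y$ (the remaining entries) before the current estimates of $z$ and $y$ are updated.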
Algorithm 4.1: STLN for the computation of an approximate GCD

Input  The polynomials $f(x)$ and $g(x)$, the scalar $\alpha$, a value for $k$, where $1 \le k \le \min(m, n)$, and the tolerances $\epsilon_y$ and $\epsilon_z$.

Output  Polynomials $\tilde{f}(x) = f(x) + \delta f(x)$ and $\tilde{g}(x) = g(x) + \delta g(x)$ such that the degree of the GCD of $\tilde{f}(x)$ and $\tilde{g}(x)$ is greater than or equal to $k$.

Begin
1. Form the $k$'th Sylvester matrix $S_k(f, \alpha g)$ from $f(x)$, $g(x)$ and $\alpha$.
2. Set $E_k = 0$ and $h_k = 0$, and compute the initial value of $y$ from (4.15). Construct the residual $r(z, y) = c_k - A_k y$, the matrix $Y_k$ from $y$, and the matrix $P_k$.
3. Form the matrices $E$ and $C$, and the vectors $t$ and $s$, as shown in (4.10).
4. Repeat
   (a) Compute the QR decomposition of $C^T$, $C^T = QR = Q \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$.
   (b) Set $w_1 = R_1^{-T} t$.
   (c) Partition $EQ$ as $EQ = \begin{bmatrix} E_1 & E_2 \end{bmatrix}$, where $E_1 \in \mathbb{R}^{(m+n+2) \times (m+n-k+1)}$ and $E_2 \in \mathbb{R}^{(m+n+2) \times (m+n-k+2)}$.
   (d) Compute $w_2 = E_2^{\dagger} (s - E_1 w_1)$.
   (e) Compute the solution $\begin{bmatrix} \delta z \\ \delta y \end{bmatrix} = Q \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$.
   (f) Set $y := y + \delta y$ and $z := z + \delta z$.
   (g) Update $E_k$ and $h_k$ from $z$, and $Y_k$ from $y$. Compute the residual $r(z, y) = (c_k + h_k) - (A_k + E_k) y$.
   Until $\frac{\| \delta y \|}{\| y \|} \le \epsilon_y$ AND $\frac{\| \delta z \|}{\| z \|} \le \epsilon_z$.
End

Algorithm 4.1 can be improved by distinguishing between valid and invalid approximate GCDs. The method of STLN allows the vector $z$ of perturbations of the coefficients of $f(x)$ and $\alpha g(x)$ that solves the LSE problem to be calculated, but the maximum permissible value of $\| z \|$ is related to the signal-to-noise ratio $\mu$ of the coefficients of $f(x)$ and $g(x)$. In particular, the smaller the value of $\mu$, the larger the maximum permissible value of $\| z \|$. This consideration leads to the definition of the legitimate solution space.

Definition 4.1 (Legitimate solution space). The legitimate solution space of $\hat{f}(x)$ is the region that contains all perturbations of its coefficients that are allowed by the signal-to-noise ratio $\mu$.
The maximum allowable magnitude of these perturbations is $\rho$, where
\[
\rho = \frac{\| \hat{f}(x) \|}{\mu},
\]
and all perturbations that are smaller than this bound lie in the legitimate solution space.

The errors consist of the data errors $f(x) - \hat{f}(x)$ and the structured perturbations from the method of STLN, and thus the perturbations must satisfy
\[
\| f(x) - \hat{f}(x) \| + \| z_f \| \le \frac{\| \hat{f}(x) \|}{\mu}, \tag{4.16}
\]
where $z_f \in \mathbb{R}^{m+1}$ denotes the structured perturbations of $f(x)$. This equation requires modification because $\hat{f}(x)$ is not known, and thus if it is assumed that $\| \hat{f}(x) \| \approx \| f(x) \|$, then (4.16) can be approximated by
\[
\| z_f \| \le \frac{\| f(x) \|}{\mu}. \tag{4.17}
\]
This definition of the legitimate solution space is expressed in terms of $f(x)$, and it is clear that (4.17) is also satisfied by $g(x)$, but with a slight modification. Specifically, since $z_{m+i+1}$, $i = 0, \ldots, n$, are the perturbations of the coefficients $\alpha b_i$, it follows that
\[
\frac{\| z_g \|}{\alpha} \le \frac{\| g(x) \|}{\mu}, \tag{4.18}
\]
where $z_g \in \mathbb{R}^{n+1}$ stores the structured perturbations of the polynomial $\alpha g(x)$. Acceptable structured perturbations require that the conditions (4.17) and (4.18) be satisfied.

Algorithm 4.2 is an extension of Algorithm 4.1 that performs a sequence of tests in order to eliminate values of $\alpha$, and therefore polynomials $\tilde{f}(x)$ and $\tilde{g}(x)$, from Algorithm 4.1 that do not satisfy error criteria with regard to the legitimate solution space, the magnitude of the normalised residual, and the rank of the structured low rank approximation. In particular, Algorithm 4.2 is executed for a range of values of $\alpha$, and the results are stored. Each value of $\alpha$ yields a different pair of polynomials $\tilde{f}(x)$ and $\tilde{g}(x)$, and Step 2 of Algorithm 4.2 is used to eliminate the values of $\alpha$ for which the magnitude of the structured perturbations is greater than the error in the polynomials, that is, polynomials that lie outside the legitimate solution space are discarded.
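The test on the legitimate solution space, that is, the conditions (4.17) and (4.18), can be sketched as follows. This Python fragment is illustrative only: the function names are hypothetical, the 2-norm is assumed for all the norms, and the vector `z` is assumed to store the $m + 1$ structured perturbations of $f(x)$ followed by the $n + 1$ structured perturbations of $\alpha g(x)$.

```python
import math


def norm2(v):
    """Euclidean norm of a sequence of numbers."""
    return math.sqrt(sum(x * x for x in v))


def in_legitimate_space(f, g, z, alpha, mu):
    """Return True if z satisfies both (4.17) and (4.18).

    f, g  : coefficient lists of the inexact polynomials f(x) and g(x)
    z     : structured perturbations, len(f) entries for f then len(g) for alpha*g
    alpha : scaling of g(x) in S(f, alpha*g)
    mu    : signal-to-noise ratio
    """
    zf, zg = z[:len(f)], z[len(f):]
    ok_f = norm2(zf) <= norm2(f) / mu             # condition (4.17)
    ok_g = norm2(zg) / alpha <= norm2(g) / mu     # condition (4.18)
    return ok_f and ok_g
```

A value of $\alpha$ would be retained only if this function returns `True` for the perturbation vector computed for that value of $\alpha$.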
Values of $\alpha$ for which the normalised residual $\| r_{\mathrm{norm}} \|$ is too large are eliminated in Step 3 of Algorithm 4.2, which is therefore performed on a reduced set of solutions. Step 4 of Algorithm 4.2 calculates, for each of the remaining values of $\alpha$, the singular values of the Sylvester matrix $S(\tilde{f}, \tilde{g})$ in order to determine its numerical rank. The value of $\alpha$ for which this quantity is most clearly defined is the optimal value $\alpha_0$ of $\alpha$, and a low rank approximation of $S(f, \alpha g)$ is constructed from the polynomials $\tilde{f}_0(x)$ and $\tilde{g}_0(x)$, which are the polynomials that are associated with $\alpha_0$. An approximate GCD of $f(x)$ and $g(x)$ can be calculated by performing an LU or QR decomposition on $S(\tilde{f}_0, \tilde{g}_0)$.

Algorithm 4.2: Extended STLN for the computation of an approximate GCD

Input  The polynomials $f(x)$ and $g(x)$, the scalar $\alpha$, a value for $k$ where $1 \le k \le \min(m, n)$, the tolerances $\epsilon_y$ and $\epsilon_z$, the signal-to-noise ratio $\mu$, and a range of values of $\alpha$, $\alpha_1 \le \alpha \le \alpha_2$.

Output  Polynomials $\tilde{f}_0(x)$ and $\tilde{g}_0(x)$ such that the degree of the GCD of $\tilde{f}_0(x)$ and $\tilde{g}_0(x)$ is greater than or equal to $k$.

Begin
1. Apply Algorithm 4.1 with the given values of $\epsilon_y$, $\epsilon_z$ and all values of $\alpha$ in the specified range. For each value of $\alpha$, store the values of $\| z_f \|$, $\| z_g \|$ and $r_{\mathrm{norm}}$,
   \[
   r_{\mathrm{norm}} = \frac{r(z, y)}{\| c_k + h_k \|},
   \]
   where $r(z, y)$ is calculated in Step 4(g) of Algorithm 4.1 and $r_{\mathrm{norm}}$ is the normalised form of $r$.
2. Retain the values of $\alpha$ for the values of $\| z_f \|$ and $\| z_g \|$ that satisfy (4.17) and (4.18), respectively.
3. Retain the values of $\alpha$ for which the normalised residual $\| r_{\mathrm{norm}} \|$ satisfies the error criterion $\| r_{\mathrm{norm}} \| \le 10^{-13}$.
4. For each acceptable value of $\alpha$, compute the singular values $\sigma_i$ of $S(\tilde{f}, \tilde{g})$, where $\tilde{f}(x)$ and $\tilde{g}(x)$ are the polynomials that are computed by Algorithm 4.1 and are normalised by the geometric mean of their coefficients, as shown in (4.2) and (4.3) for $f(x)$ and $g(x)$, respectively.
   Arrange the singular values $\sigma_i$ in nonincreasing order, and choose the value $\alpha_0$ of $\alpha$ for which the numerical rank of $S(\tilde{f}, \tilde{g})$ is equal to $(m + n - k)$, that is, the ratio
   \[
   \frac{\sigma_{m+n-k}}{\sigma_{m+n-(k-1)}} \tag{4.19}
   \]
   is a maximum. The polynomials that correspond to the value $\alpha_0$ are $\tilde{f}_0(x)$ and $\tilde{g}_0(x)$.
End

Examples 4.3 and 4.4 implement Algorithm 4.2 [48]. The polynomials in these examples have several multiple roots of high multiplicity, and they therefore provide a good test for the algorithm. An approximate GCD that lies within the legitimate solution space is obtained in both examples, and a simple test is included to show that the computed polynomials $\tilde{f}(x)$ and $\tilde{g}(x)$ are not coprime.

Example 4.3. Consider the exact polynomials
\[
\hat{f}_1(x) = (x - 0.25)^8 (x - 0.5)^9 (x - 0.75)^{10} (x - 1)^{11} (x - 1.25)^{12}, \tag{4.20}
\]
and
\[
\hat{g}_1(x) = (x + 0.25)^4 (x - 0.25)^5 (x - 0.5)^6, \tag{4.21}
\]
which have 11 common roots, from which it follows that the rank of $S(\hat{f}_1, \hat{g}_1)$ is equal to 54. The termination constants $\epsilon_y$ and $\epsilon_z$, which are defined in Algorithm 4.1, were set equal to $10^{-6}$ and $10^{-8}$, respectively.

Figure 4.1: (i)(a) The maximum allowable value of $\| z_{f_1} \|$, which is equal to $\| f_1(x) \| / \mu$, (b) the computed value of $\| z_{f_1} \|$; (ii)(a) the maximum allowable value of $\| z_{g_1} \| / \alpha$, which is equal to $\| g_1(x) \| / \mu$, (b) the computed value of $\| z_{g_1} \| / \alpha$; (iii) the normalised residual $\| r_{\mathrm{norm}} \|$; (iv) the singular value ratio $\sigma_{54} / \sigma_{55}$. All four panels are plotted against $\log_{10} \alpha$ on logarithmic vertical axes.

Case 1: The computation of a family of approximate GCDs from a given structured low rank approximation of $S(f_1, \alpha g_1)$.
The exact polynomials (4.20) and (4.21) were perturbed by noise such that $\mu = 10^8$ and then normalised by the geometric mean of their coefficients, as shown in (4.13) and (4.14), thereby yielding the polynomials $f_1(x)$ and $g_1(x)$. Figure 4.1 shows the results of applying the criteria in Steps 2, 3 and 4 in Algorithm 4.2. In particular, Figure 4.1(i) shows the ratio $\| f_1(x) \| / \mu$, which is the maximum allowable perturbation of $f_1(x)$, and the variation with $\alpha$ of the computed value of $\| z_{f_1} \|$, which is calculated by the method of STLN. Figure 4.1(ii) is the same as Figure 4.1(i), but for $g_1(x)$, and it is seen from (4.17) and (4.18) that valid solutions are obtained for $\log_{10} \alpha > -0.9$.

Figure 4.2: The normalised singular values of the Sylvester matrix, on a logarithmic scale, plotted against the index $i$, for (i) the theoretically exact data $S(\hat{f}_1, \hat{g}_1)$, ♦; (ii) the given inexact data $S(f_1, g_1)$; (iii) the computed data $S(\tilde{f}_{1,0}, \tilde{g}_{1,0})$, ×, for $\alpha = 10^{-0.6}$. All the polynomials are normalised by the geometric mean of their coefficients.

Figure 4.1(iii) shows the variation of the normalised residual $\| r_{\mathrm{norm}} \|$ with $\alpha$, and it is seen that it ranges from $O(10^{-16})$ to $O(10^{-8})$ in the specified range of $\alpha$. Figure 4.1(iv) shows the variation with $\alpha$ of the ratio $\sigma_{54} / \sigma_{55}$ that is defined in (4.19), and it is seen that the profile of this curve could be produced (approximately) by calculating the reciprocal (to within a scale factor) of the normalised residual shown in Figure 4.1(iii). This result, which has been observed frequently, suggests that small values of the normalised residual are associated with large values of the ratio (4.19).
This example clearly shows the importance of including $\alpha$ in the analysis because there exist, in general, many values of $\alpha$ for which the normalised residual is sufficiently small and the ratio $\sigma_{54} / \sigma_{55}$ is sufficiently large. The small value of the normalised residual implies that the perturbed equation (4.6) is satisfied to high accuracy, and the large value of $\sigma_{54} / \sigma_{55}$ implies that the numerical rank of the structured low rank approximation $S(\tilde{f}_1, \tilde{g}_1)$ is well defined. Each of these values of $\alpha$ yields a different structured low rank approximation of $S(f_1, \alpha g_1)$, and therefore a different approximate GCD of $f_1(x)$ and $g_1(x)$.

It is shown in Figure 4.1(iv) that in the absence of scaling, that is, $\log_{10} \alpha = 0$, a poor solution is obtained because the ratio of the singular values (4.19) is approximately equal to $10^{1.5}$, which is about 7 orders of magnitude smaller than the ratio obtained for $\log_{10} \alpha = -0.6$, which is the optimal value of $\alpha$. Figure 4.1(iii) shows that if $\log_{10} \alpha = 0$, the normalised residual is about 6 orders of magnitude larger than the value obtained for $\log_{10} \alpha = -0.6$. These observations show that an arbitrary choice of $\alpha$ can yield severely suboptimal results when it is required to compute an approximate GCD of $f(x)$ and $g(x)$ from $S(f, \alpha g)$.

Figure 4.2 shows the normalised singular values of the Sylvester resultant matrices $S(\hat{f}_1, \hat{g}_1)$, $S(f_1, g_1)$, and $S(\tilde{f}_{1,0}, \tilde{g}_{1,0})$ for the optimal value of $\alpha$, where all the polynomials are normalised by the geometric mean of their coefficients. The polynomials $\tilde{f}_{1,0}(x)$ and $\tilde{g}_{1,0}(x)$ are the polynomials computed in Algorithm 4.2 that form the structured low rank approximation of $S(f_1, \alpha g_1)$, $\alpha = 10^{-0.6}$. It is seen that the computed singular values of $S(\hat{f}_1, \hat{g}_1)$ do not show a sharp cut off, which would suggest that the polynomials (4.20) and (4.21) are coprime. The profile of the singular
values of $S(f_1, g_1)$ shows that the noise affects the small singular values severely, but significantly improved results are obtained when the Sylvester matrix $S(\tilde{f}_{1,0}, \tilde{g}_{1,0})$ is considered. In particular, it is clear that the numerical rank of this matrix is equal to 54 because $\sigma_{54}$ is about 7 orders of magnitude larger than $\sigma_{55}$. Since the Sylvester matrix is of order $65 \times 65$ and $k = 11$, the method of STLN has yielded an excellent result. Convergence of the algorithm was achieved in 45 iterations. It is clear that $S(\tilde{f}_{1,0}, \tilde{g}_{1,0})$ can be used to compute an approximate GCD of $f_1(x)$ and $g_1(x)$.

This example has considered the situation in which the correct subresultant has been selected because the degree of the GCD of $\hat{f}_1(x)$ and $\hat{g}_1(x)$ is 11, which is the chosen value of $k$, but this information is not, in general, known a priori. It is therefore necessary to consider how the solution changes as a function of $k$, and this is investigated in Case 2.

Case 2: The effects of different subresultants.

Computational experiments showed that the method of STLN is able to compute structured low rank approximations for $k = 10, \ldots, 1$. Figure 4.3 shows the results for $k = 8$, and it is seen that the numerical rank of $S(\hat{f}_1, \hat{g}_1)$ is not defined, but the numerical rank of its structured low rank approximation $S(\tilde{f}_{1,0}, \tilde{g}_{1,0})$ is equal to 57, corresponding to a loss in rank of 8. Convergence was achieved in 26 iterations.

Consider now the situation that occurs for $k = 12$, 13 and 14. In particular, successful results were obtained for $k = 12$ and $k = 13$, but the computed solution for $k \ge 14$ was not acceptable. This can be seen for $k = 14$ in Figures 4.4(i) and (ii), which show that although valid solutions exist for either $f_1(x)$ or $g_1(x)$, they do not exist for both $f_1(x)$ and $g_1(x)$. It is noted that if it is not required that the solution lie in the legitimate solution space, it is possible to construct structured low rank
approximation matrices that can be used for the computation of approximate GCDs of $f_1(x)$ and $g_1(x)$, such that the ratio (4.19) is large and the normalised residual is small.

Figure 4.3: The normalised singular values of the Sylvester matrix, on a logarithmic scale, plotted against the index $i$, for (i) the theoretically exact data $S(\hat{f}_1, \hat{g}_1)$, ♦; (ii) the given inexact data $S(f_1, g_1)$; (iii) the computed data $S(\tilde{f}_{1,0}, \tilde{g}_{1,0})$, ×, for $\alpha = 10^{1.4}$. All the polynomials are normalised by the geometric mean of their coefficients.

Figure 4.4: (i)(a) The maximum allowable value of $\| z_{f_1} \|$, which is equal to $\| f_1(x) \| / \mu$, (b) the computed value of $\| z_{f_1} \|$; (ii)(a) the maximum allowable value of $\| z_{g_1} \| / \alpha$, which is equal to $\| g_1(x) \| / \mu$, (b) the computed value of $\| z_{g_1} \| / \alpha$; (iii) the normalised residual $\| r_{\mathrm{norm}} \|$; (iv) the singular value ratio $\sigma_{51} / \sigma_{52}$. All four panels are plotted against $\log_{10} \alpha$ on logarithmic vertical axes.

Example 4.4. Consider the polynomials
\[
\hat{f}_2(x) = (x - 1)^8 (x - 2)^{16} (x - 3)^{24},
\]
and
\[
\hat{g}_2(x) = (x - 1)^{12} (x + 2)^4 (x - 3)^8 (x + 4)^2,
\]
which have 16 common roots, and thus the rank of $S(\hat{f}_2, \hat{g}_2)$ is 58. The polynomials were perturbed by noise such that $\mu = 10^8$, and the result for $k = 16$ is shown in Figure 4.5. It is seen that although the numerical rank of $S(\hat{f}_2, \hat{g}_2)$ is not well defined, the rank of the structured low rank approximation $S(\tilde{f}_{2,0}, \tilde{g}_{2,0})$ is 58, which is the correct value. Convergence was achieved in 22 iterations.
Figure 4.5: The normalised singular values of the Sylvester matrix, on a logarithmic scale, plotted against the index $i$, for (i) the theoretically exact data $S(\hat{f}_2, \hat{g}_2)$, ♦; (ii) the given inexact data $S(f_2, g_2)$; (iii) the computed data $S(\tilde{f}_{2,0}, \tilde{g}_{2,0})$, ×, for $\alpha = 10^{0.1}$. All the polynomials are normalised by the geometric mean of their coefficients.

4.4 Approximate GCDs of Bernstein basis polynomials

It was shown in Section 4.3 that the computation of an approximate GCD of two polynomials in the power (monomial) basis can be expressed as an LSE problem, and algorithms for its solution were discussed in Section 4.3.1. In this section, the computation of an approximate GCD of two Bernstein basis polynomials is considered, and it is shown that only minor changes are required to the theory and computational implementation for the power basis.

The calculation of a structured low rank approximation of the Sylvester matrix for Bernstein basis polynomials requires that the polynomials be transformed to the scaled Bernstein basis, the basis functions of which are stated in (3.20). Comparison of these basis functions with the basis functions (3.16) that define the Bernstein basis shows that a polynomial expressed in the Bernstein basis can be transformed to the scaled Bernstein basis form by moving the combinatorial factor $\binom{m}{i}$ from the basis function to the coefficient,
\[
\sum_{i=0}^{m} c_i \left[ \binom{m}{i} (1 - x)^{m-i} x^i \right] = \sum_{i=0}^{m} \left[ c_i \binom{m}{i} \right] (1 - x)^{m-i} x^i.
\]
The expression on the left is a Bernstein basis polynomial, and the expression on the right is the scaled Bernstein basis form of this polynomial.

The matrix $T(p, q)$ in (3.19) is the Sylvester resultant matrix of the polynomials $p(x)$ and $q(x)$, which are defined in (3.17), when they are expressed in the scaled Bernstein basis.
This must be compared with $S(p, q)$, which is the Sylvester resultant of $p(x)$ and $q(x)$ when they are expressed in the Bernstein basis, and these two resultant matrices are related by a diagonal matrix, as shown in (3.18). The resultant matrices $T(p, q)$ and $S(f, g)$ have exactly the same structure, and thus the formulation of the LSE problem for the calculation of an approximate GCD of scaled Bernstein basis polynomials is very similar to its formulation for power basis polynomials.

The subresultant matrix $S_k(p, q)$ of the Bernstein basis resultant matrix $S(p, q)$ can be decomposed as
\[
S_k(p, q) = D_k^{-1} T_k(p, q),
\]
where $T_k(p, q) \in \mathbb{R}^{(m+n-k+1) \times (m+n-2k+2)}$ is the $k$'th subresultant matrix of $T(p, q)$, that is, $T_k(p, q)$ is formed from $T(p, q)$ by deleting the last $(k - 1)$ columns of $p(x)$, the last $(k - 1)$ columns of $q(x)$, and the last $(k - 1)$ rows. Similarly, the diagonal matrix $D_k \in \mathbb{R}^{(m+n-k+1) \times (m+n-k+1)}$ is obtained by deleting the last $(k - 1)$ rows and the last $(k - 1)$ columns of $D$. The matrix $T_k(p, q)$ is written as
\[
T_k(p, q) = \begin{bmatrix} e_k & F_k \end{bmatrix} = \begin{bmatrix} e_k & \text{coeffs. of } p(x) & \text{coeffs. of } q(x) \end{bmatrix},
\]
where $e_k \in \mathbb{R}^{m+n-k+1}$ and $F_k \in \mathbb{R}^{(m+n-k+1) \times (m+n-2k+1)}$, and thus
\[
S_k(p, q) = D_k^{-1} \begin{bmatrix} e_k & F_k \end{bmatrix} = \begin{bmatrix} D_k^{-1} e_k & D_k^{-1} F_k \end{bmatrix},
\]
which is the same as (3.10), but for a Bernstein basis polynomial.

Example 4.5. The Sylvester resultant matrix of the polynomials
\[
p(x) = 3 \binom{4}{0} (1 - x)^4 - 2 \binom{4}{1} (1 - x)^3 x - 5 \binom{4}{2} (1 - x)^2 x^2 + 2 \binom{4}{3} (1 - x) x^3 + 6 \binom{4}{4} x^4,
\]
\[
q(x) = 3 \binom{3}{0} (1 - x)^3 - \frac{5}{6} \binom{3}{1} (1 - x)^2 x - \frac{1}{2} \binom{3}{2} (1 - x) x^2 + \binom{3}{3} x^3,
\]
is
\[
S(p, q) = S_1(p, q) = \begin{bmatrix}
3 & 0 & 0 & 3 & 0 & 0 & 0 \\
-\frac{4}{3} & \frac{1}{2} & 0 & -\frac{5}{12} & \frac{1}{2} & 0 & 0 \\
-2 & -\frac{8}{15} & \frac{1}{5} & -\frac{1}{10} & -\frac{1}{6} & \frac{1}{5} & 0 \\
\frac{2}{5} & -\frac{3}{2} & -\frac{2}{5} & \frac{1}{20} & -\frac{3}{40} & -\frac{1}{8} & \frac{3}{20} \\
\frac{2}{5} & \frac{8}{15} & -2 & 0 & \frac{1}{15} & -\frac{1}{10} & -\frac{1}{6} \\
0 & 1 & \frac{4}{3} & 0 & 0 & \frac{1}{6} & -\frac{1}{4} \\
0 & 0 & 6 & 0 & 0 & 0 & 1
\end{bmatrix}, \tag{4.22}
\]
and this matrix can be decomposed as the product $D_1^{-1} T_1(p, q)$,
\[
\operatorname{diag}\left( 1, \tfrac{1}{6}, \tfrac{1}{15}, \tfrac{1}{20}, \tfrac{1}{15}, \tfrac{1}{6}, 1 \right)
\begin{bmatrix}
3 & 0 & 0 & 3 & 0 & 0 & 0 \\
-8 & 3 & 0 & -\frac{5}{2} & 3 & 0 & 0 \\
-30 & -8 & 3 & -\frac{3}{2} & -\frac{5}{2} & 3 & 0 \\
8 & -30 & -8 & 1 & -\frac{3}{2} & -\frac{5}{2} & 3 \\
6 & 8 & -30 & 0 & 1 & -\frac{3}{2} & -\frac{5}{2} \\
0 & 6 & 8 & 0 & 0 & 1 & -\frac{3}{2} \\
0 & 0 & 6 & 0 & 0 & 0 & 1
\end{bmatrix}.
\]
The vector $e_1$ is equal to the first column of $T_1(p, q)$, and the matrix $F_1$ is formed from columns $2, \ldots, 7$, of $T_1(p, q)$.

Consider now the situation $k = 2$, for which the first column of $S_2(p, q)$ is
\[
\begin{bmatrix} 3 \\ -\frac{4}{3} \\ -2 \\ \frac{2}{5} \\ \frac{2}{5} \\ 0 \end{bmatrix}
= \operatorname{diag}\left( 1, \tfrac{1}{6}, \tfrac{1}{15}, \tfrac{1}{20}, \tfrac{1}{15}, \tfrac{1}{6} \right)
\begin{bmatrix} 3 \\ -8 \\ -30 \\ 8 \\ 6 \\ 0 \end{bmatrix}
= D_2^{-1} e_2,
\]
and the matrix formed by columns $2, \ldots, 5$, of $S_2(p, q)$ can be decomposed as $D_2^{-1} F_2$,
\[
\begin{bmatrix}
0 & 3 & 0 & 0 \\
\frac{1}{2} & -\frac{5}{12} & \frac{1}{2} & 0 \\
-\frac{8}{15} & -\frac{1}{10} & -\frac{1}{6} & \frac{1}{5} \\
-\frac{3}{2} & \frac{1}{20} & -\frac{3}{40} & -\frac{1}{8} \\
\frac{8}{15} & 0 & \frac{1}{15} & -\frac{1}{10} \\
1 & 0 & 0 & \frac{1}{6}
\end{bmatrix}
= \operatorname{diag}\left( 1, \tfrac{1}{6}, \tfrac{1}{15}, \tfrac{1}{20}, \tfrac{1}{15}, \tfrac{1}{6} \right)
\begin{bmatrix}
0 & 3 & 0 & 0 \\
3 & -\frac{5}{2} & 3 & 0 \\
-8 & -\frac{3}{2} & -\frac{5}{2} & 3 \\
-30 & 1 & -\frac{3}{2} & -\frac{5}{2} \\
8 & 0 & 1 & -\frac{3}{2} \\
6 & 0 & 0 & 1
\end{bmatrix}.
\]
The first column of $S_3(p, q)$ is
\[
\begin{bmatrix} 3 \\ -\frac{4}{3} \\ -2 \\ \frac{2}{5} \\ \frac{2}{5} \end{bmatrix}
= \operatorname{diag}\left( 1, \tfrac{1}{6}, \tfrac{1}{15}, \tfrac{1}{20}, \tfrac{1}{15} \right)
\begin{bmatrix} 3 \\ -8 \\ -30 \\ 8 \\ 6 \end{bmatrix}
= D_3^{-1} e_3,
\]
and the matrix formed by the second and third columns of $S_3(p, q)$ is
\[
\begin{bmatrix}
3 & 0 \\
-\frac{5}{12} & \frac{1}{2} \\
-\frac{1}{10} & -\frac{1}{6} \\
\frac{1}{20} & -\frac{3}{40} \\
0 & \frac{1}{15}
\end{bmatrix}
= \operatorname{diag}\left( 1, \tfrac{1}{6}, \tfrac{1}{15}, \tfrac{1}{20}, \tfrac{1}{15} \right)
\begin{bmatrix}
3 & 0 \\
-\frac{5}{2} & 3 \\
-\frac{3}{2} & -\frac{5}{2} \\
1 & -\frac{3}{2} \\
0 & 1
\end{bmatrix}
= D_3^{-1} F_3.
\]
Since (4.22) is the scaled Bernstein basis equivalent of (3.10), it follows from Theorem 3.5 that the scaled Bernstein basis polynomials $p(x)$ and $q(x)$ have a common divisor of degree $k$ if and only if $F_k y = e_k$ possesses a solution. Since the vector $e_k$ and matrix $F_k$ form the matrix $T_k(p, q)$,
\[
T_k(p, q) = \begin{bmatrix} e_k & F_k \end{bmatrix} =
\begin{bmatrix}
c_0 \binom{m}{0} & & & d_0 \binom{n}{0} & & \\
c_1 \binom{m}{1} & \ddots & & d_1 \binom{n}{1} & \ddots & \\
\vdots & \ddots & c_0 \binom{m}{0} & \vdots & \ddots & d_0 \binom{n}{0} \\
c_{m-1} \binom{m}{m-1} & \ddots & c_1 \binom{m}{1} & d_{n-1} \binom{n}{n-1} & \ddots & d_1 \binom{n}{1} \\
c_m \binom{m}{m} & \ddots & \vdots & d_n \binom{n}{n} & \ddots & \vdots \\
& \ddots & c_{m-1} \binom{m}{m-1} & & \ddots & d_{n-1} \binom{n}{n-1} \\
& & c_m \binom{m}{m} & & & d_n \binom{n}{n}
\end{bmatrix},
\]
and this matrix has exactly the same pattern as the power basis Sylvester matrix $S(f, g)$, it follows that the theory in Sections 4.3 and 4.3.1 can be reproduced for scaled Bernstein basis polynomials. In particular, $T(p, q)$ and its subresultant matrices contain the indeterminacy that is described in Section 4.2, and thus it is more appropriate to denote the Sylvester matrix of the scaled Bernstein basis polynomials $p(x)$ and $q(x)$ as $T(p, \alpha q)$. Furthermore, the error matrix $B_k(z)$, which is defined in (4.5), is also the structured error matrix for $T_k(p, \alpha q)$, and thus the computation of an approximate GCD of two scaled Bernstein basis polynomials reduces to an LSE problem.

The entries of the vector $z$ are the perturbations of $p(x)$ and $\alpha q(x)$ when they are expressed in the scaled Bernstein basis. Since $z$ is partitioned as
\[
z = \begin{bmatrix} z_p \\ \alpha z_q \end{bmatrix},
\]
where $z_p \in \mathbb{R}^{m+1}$ and $z_q \in \mathbb{R}^{n+1}$ are the perturbation vectors of $p(x)$ and $q(x)$ respectively, it follows that the corrected scaled Bernstein polynomials are
\[
\tilde{p}(x) = \sum_{i=0}^{m} \left( c_i \binom{m}{i} + z_i \right) (1 - x)^{m-i} x^i,
\]
and
\[
\tilde{q}(x) = \sum_{i=0}^{n} \left( d_i \binom{n}{i} + \frac{z_{m+1+i}}{\alpha} \right) (1 - x)^{n-i} x^i.
\]
The Bernstein form of these polynomials is
\[
\tilde{p}(x) = \sum_{i=0}^{m} \left( c_i + \frac{z_i}{\binom{m}{i}} \right) \binom{m}{i} (1 - x)^{m-i} x^i,
\]
and
\[
\tilde{q}(x) = \sum_{i=0}^{n} \left( d_i + \frac{z_{m+1+i}}{\alpha \binom{n}{i}} \right) \binom{n}{i} (1 - x)^{n-i} x^i,
\]
respectively, and thus the perturbations of the coefficients of the Bernstein form of $p(x)$ and $q(x)$ are
\[
\frac{z_i}{\binom{m}{i}}, \quad i = 0, \ldots, m, \qquad \text{and} \qquad \frac{z_{m+1+i}}{\alpha \binom{n}{i}}, \quad i = 0, \ldots, n,
\]
respectively.

Example 4.6. Consider the Bernstein form of the exact polynomials [47],
\[
\hat{p}(x) = (x - 0.6)^8 (x - 0.8)^9 (x - 0.9)^{10} (x - 0.95)^5,
\]
and
\[
\hat{q}(x) = (x - 0.6)^{12} (x - 0.7)^4 (x - 0.9)^5,
\]
whose GCD is of degree 13, and thus the rank of $S(\hat{p}, \hat{q})$ is equal to 40. The 13'th subresultant matrix, corresponding to the value $k = 13$, was selected, and $\mu$ was set equal to $10^8$.
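The conversion between the Bernstein and scaled Bernstein forms, and the corresponding conversion of the structured perturbations stated before Example 4.6, can be sketched as follows. This Python fragment is illustrative only: the function names are hypothetical, and `z` is assumed to store the $m + 1$ perturbations of $p(x)$ followed by the $n + 1$ perturbations of $\alpha q(x)$, both in the scaled Bernstein basis.

```python
from math import comb


def to_scaled_bernstein(c):
    """Move the factor C(m, i) from the basis function to the coefficient c_i."""
    m = len(c) - 1
    return [ci * comb(m, i) for i, ci in enumerate(c)]


def bernstein_perturbations(z, m, n, alpha):
    """Perturbations of the Bernstein coefficients of p(x) and q(x).

    Returns z_i / C(m, i), i = 0..m, and z_{m+1+i} / (alpha * C(n, i)), i = 0..n.
    """
    dp = [z[i] / comb(m, i) for i in range(m + 1)]
    dq = [z[m + 1 + i] / (alpha * comb(n, i)) for i in range(n + 1)]
    return dp, dq
```

For the polynomial $p(x)$ of Example 4.5, whose Bernstein coefficients are $3, -2, -5, 2, 6$, the first function returns $3, -8, -30, 8, 6$, which is the first column of $T_1(p, q)$.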
Figure 4.6(i) shows the ratio $\| p(x) \| / \mu$, which is the maximum allowable magnitude of the perturbations of the coefficients of $p(x)$, and the variation with $\alpha$ of the computed value of $\| z_p \|$. Figure 4.6(ii) is the same as Figure 4.6(i), but for $q(x)$ instead of $p(x)$. It is seen that the four plots in this figure are identical in form to their equivalents in Figure 4.1 for the power basis polynomials considered in Example 4.3. The Bernstein basis equivalent of the inequality (4.17) for $p(x)$ is satisfied for all values of $\alpha$ in the specified range, but the corresponding inequality (4.18) for $q(x)$ is only satisfied for $\log_{10} \alpha > 1.72$. This is the minimum value of $\alpha$ for which the bounds on the structured perturbations of the coefficients of $p(x)$ and $q(x)$ are satisfied.

Figure 4.6: The variation with $\alpha$ of (i)(a) the maximum allowable value of $\| z_p \|$, which is equal to $\| p(x) \| / \mu$, (b) the computed value of $\| z_p \|$; (ii)(a) the maximum allowable value of $\| z_q \| / \alpha$, which is equal to $\| q(x) \| / \mu$, (b) the computed value of $\| z_q \| / \alpha$; (iii) the normalised residual $\| r_{\mathrm{norm}} \|$; (iv) the singular value ratio $\sigma_{40} / \sigma_{41}$. The horizontal and vertical axes are logarithmic in the four plots.

Figure 4.6(iii) shows the variation with $\alpha$ of the normalised residual $\| r_{\mathrm{norm}} \|$, where
\[
r_{\mathrm{norm}} = \frac{r(z, y)}{\| e_k + h_k \|},
\]
and $h_k$ is defined in (4.5) and $r(z, y)$ is defined in (4.6). It is seen that this variation is significant, and in particular, the graph shows that there exist values of $\alpha$ for which the normalised residual is large. It therefore follows that there does not exist a structured matrix $E_k$ and a structured vector $h_k$ such that (4.6) is satisfied for these values of $\alpha$.
By contrast, it is also seen that there exist values of $\alpha$ for which the normalised residual is equal to $O(10^{-16})$, which implies that (4.6) is satisfied (to within machine precision). The normalised residual is a minimum when $\alpha = 10^{2.8}$, and this is therefore the optimal value of $\alpha$.

The theoretically exact rank of $S(\hat{p}, \hat{q})$ is equal to 40, and thus a measure of the effectiveness of the method of STLN is the ratio $\gamma = \sigma_{40} / \sigma_{41}$ of the Sylvester resultant matrix $S(\tilde{p}, \tilde{q})$, where $\tilde{p} = \tilde{p}(x)$ and $\tilde{q} = \tilde{q}(x)$ are the polynomials that are computed by the method of STLN, $\sigma_i$ is the $i$th singular value of $S(\tilde{p}, \tilde{q})$ and the singular values are arranged in non-increasing order. Figure 4.6(iv) shows the variation of this ratio with $\alpha$, and it is seen that it is identical in form to Figure 4.1(iv), and thus the comments made in Example 4.3 are valid for this example. In particular, a poor choice of $\alpha$ can lead to unsatisfactory results (the ratio $\gamma$ is small and the normalised residual $\| r_{\mathrm{norm}} \|$ is large), and the default value ($\alpha = 1$) may lead to poor results.

Figure 4.7 shows the normalised singular values of the Sylvester matrix of the theoretically exact polynomials $\hat{p}(x)$ and $\hat{q}(x)$, the given inexact polynomials $p(x)$ and $q(x)$, and the corrected polynomials $\tilde{p}(x)$ and $\tilde{q}(x)$ for $\alpha = 10^{2.8}$, which is the optimal value of $\alpha$. All the polynomials are scaled by the geometric mean of their coefficients.

Figure 4.7: The normalised singular values, on a logarithmic scale, of the Sylvester resultant matrix for (i) the theoretically exact polynomials $\hat{p}(x)$ and $\hat{q}(x)$, ♦; (ii) the given inexact polynomials $p(x)$ and $q(x)$; (iii) the corrected polynomials $\tilde{p}(x)$ and $\tilde{q}(x)$ for $\alpha = 10^{2.8}$, ×. All the polynomials are scaled by the geometric mean of their coefficients.
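The ratio $\gamma$ of two adjacent singular values, which is used above as a measure of how clearly the numerical rank is defined, can be computed directly from a singular value decomposition. The following Python sketch is illustrative only: the function name `rank_gap` is hypothetical, NumPy is assumed to be available, and the matrix passed to it would be the Sylvester matrix of the corrected polynomials.

```python
import numpy as np


def rank_gap(S, r):
    """Return sigma_r / sigma_{r+1} for the matrix S, with 1-based indices.

    numpy.linalg.svd returns the singular values in non-increasing order,
    so a large value of this ratio indicates a numerical rank of r.
    """
    sigma = np.linalg.svd(S, compute_uv=False)
    return sigma[r - 1] / sigma[r]
```

In the notation of this example, `rank_gap(S, 40)` would return $\gamma = \sigma_{40} / \sigma_{41}$, and its base 10 logarithm is the quantity plotted in Figure 4.6(iv).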
Figure 4.7(i) shows that $S(\hat{p}, \hat{q})$ is of full rank, which is incorrect because $\hat{p}$ and $\hat{q}$ are not coprime, and Figure 4.7(iii) shows the results for $S(\tilde{p}, \tilde{q})$, which are significantly better because its (numerical) rank is 40, which is the correct value. Since the perturbations that are used for the formation of this matrix are, by construction, structured, and its rank is equal to 40, this matrix is a structured low rank approximation of $S(p, \alpha q)$, $\alpha = 10^{2.8}$, that can be used to compute an approximate GCD of $p(x)$ and $q(x)$.

It is seen from Figures 4.6(iii) and (iv) that there are many values of $\alpha > 10^{1.72}$ such that the ratio $\sigma_{40}/\sigma_{41}$ is large and the normalised residual $\|r_{\mathrm{norm}}\|$ is sufficiently small. There therefore exist many structured low rank approximations of $S(p, \alpha q)$ that satisfy tight bounds on $\|r_{\mathrm{norm}}\|$, which is an error bound for the satisfaction of (4.6), and also satisfy tight bounds on the ratio $\gamma = \sigma_{40}/\sigma_{41}$, which is a measure of the numerical rank of $S(\tilde{p}, \tilde{q})$. Each of these approximations yields a different approximate GCD, and additional constraints can be imposed on the optimisation algorithm in order to select a particular structured low rank approximation of $S(p, \alpha q)$, and thus a particular approximate GCD.

Another example of the computation of an approximate GCD of two Bernstein polynomials is in [1].

4.5 An approximate GCD of a polynomial and its derivative

Uspensky's method for the computation of the roots of a polynomial, which is described in Section 2.5, requires the determination of the GCD of a polynomial and its derivative. In this circumstance, the theory presented in earlier sections of this chapter for the independent polynomials $f(x)$ and $g(x)$ can be extended because the condition $g(x) = f^{(1)}(x)$ imposes additional structure on the matrices that arise in the LSE problem, such that their dimensions can be reduced.
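The Sylvester matrix $S(f, \alpha f^{(1)}) \in \mathbb{R}^{(2m-1)\times(2m-1)}$ is

\[
S(f, \alpha f^{(1)}) =
\begin{bmatrix}
a_m & & & m\alpha a_m & & \\
a_{m-1} & \ddots & & (m-1)\alpha a_{m-1} & \ddots & \\
\vdots & \ddots & a_m & \vdots & \ddots & m\alpha a_m \\
a_1 & & a_{m-1} & 2\alpha a_2 & & (m-1)\alpha a_{m-1} \\
a_0 & \ddots & \vdots & \alpha a_1 & \ddots & \vdots \\
 & \ddots & a_1 & & \ddots & 2\alpha a_2 \\
 & & a_0 & & & \alpha a_1
\end{bmatrix},
\]

and it is seen that columns $m, \ldots, 2m-1$, that is, the columns occupied by the coefficients of $f^{(1)}(x)$, can be calculated from columns $1, \ldots, m-1$, which are occupied by the coefficients of $f(x)$. This constraint can be imposed on the problem, with the consequence that the dimensions of the matrices in the LSE problem are reduced. The subresultant matrices $S_k(f, f^{(1)})$ are defined in exactly the same way as the subresultant matrices $S_k(f, g)$.

Example 4.7. If $m = 4$, then

\[
S_1(f, \alpha f^{(1)}) = S(f, \alpha f^{(1)}) =
\begin{bmatrix}
a_4 & & & 4\alpha a_4 & & & \\
a_3 & a_4 & & 3\alpha a_3 & 4\alpha a_4 & & \\
a_2 & a_3 & a_4 & 2\alpha a_2 & 3\alpha a_3 & 4\alpha a_4 & \\
a_1 & a_2 & a_3 & \alpha a_1 & 2\alpha a_2 & 3\alpha a_3 & 4\alpha a_4 \\
a_0 & a_1 & a_2 & & \alpha a_1 & 2\alpha a_2 & 3\alpha a_3 \\
 & a_0 & a_1 & & & \alpha a_1 & 2\alpha a_2 \\
 & & a_0 & & & & \alpha a_1
\end{bmatrix},
\]

and $S_2 = S_2(f, \alpha f^{(1)})$ and $S_3 = S_3(f, \alpha f^{(1)})$ are

\[
S_2 =
\begin{bmatrix}
a_4 & & 4\alpha a_4 & & \\
a_3 & a_4 & 3\alpha a_3 & 4\alpha a_4 & \\
a_2 & a_3 & 2\alpha a_2 & 3\alpha a_3 & 4\alpha a_4 \\
a_1 & a_2 & \alpha a_1 & 2\alpha a_2 & 3\alpha a_3 \\
a_0 & a_1 & & \alpha a_1 & 2\alpha a_2 \\
 & a_0 & & & \alpha a_1
\end{bmatrix}
\qquad \text{and} \qquad
S_3 =
\begin{bmatrix}
a_4 & 4\alpha a_4 & \\
a_3 & 3\alpha a_3 & 4\alpha a_4 \\
a_2 & 2\alpha a_2 & 3\alpha a_3 \\
a_1 & \alpha a_1 & 2\alpha a_2 \\
a_0 & & \alpha a_1
\end{bmatrix},
\]

respectively.

Each matrix $S_k(f, \alpha f^{(1)})$ is partitioned into a vector $c_k \in \mathbb{R}^{2m-k}$ and a matrix $A_k \in \mathbb{R}^{(2m-k)\times(2m-2k)}$, where $c_k$ is the first column of $S_k(f, \alpha f^{(1)})$ and $A_k$ is the matrix formed from the remaining columns of $S_k$, where $1 \le k \le m-1$,

\[
S_k(f, \alpha f^{(1)}) = \begin{bmatrix} c_k & A_k \end{bmatrix}
= \begin{bmatrix} c_k & \text{coeffs. of } f(x) & \text{coeffs. of } \alpha f^{(1)}(x) \end{bmatrix},
\]

where the coefficients of $f(x)$ occupy $m-k-1$ columns of $A_k$ and the coefficients of $\alpha f^{(1)}(x)$ occupy $m-k+1$ columns. Theorem 3.5 shows that a necessary and sufficient condition for $f(x)$ and $f^{(1)}(x)$ to have a non-constant GCD is that the equation

\[
A_k y = c_k, \qquad y \in \mathbb{R}^{2(m-k)}, \tag{4.23}
\]

possess a solution. If $h_k \in \mathbb{R}^{2m-k}$ and $E_k \in \mathbb{R}^{(2m-k)\times(2m-2k)}$ are the perturbations of $c_k$ and $A_k$ respectively, then $c_k$ and $h_k$ have the same structure, and $A_k$ and $E_k$ have the same structure. It therefore follows that if $z_i$ is the perturbation of the coefficient $a_i$, $i = 0, \ldots$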
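, $m$, of $f(x)$, the error vector $z$ is equal to

\[
z = \begin{bmatrix} z_m & z_{m-1} & \cdots & z_1 & z_0 \end{bmatrix}^T \in \mathbb{R}^{m+1},
\]

and the error matrix $B_k = B_k(z)$ is equal to

\[
B_k := \begin{bmatrix} h_k & E_k \end{bmatrix} =
\begin{bmatrix}
z_m & & & m\alpha z_m & & \\
z_{m-1} & \ddots & & (m-1)\alpha z_{m-1} & \ddots & \\
\vdots & \ddots & z_m & \vdots & \ddots & m\alpha z_m \\
z_1 & & z_{m-1} & 2\alpha z_2 & & (m-1)\alpha z_{m-1} \\
z_0 & \ddots & \vdots & \alpha z_1 & \ddots & \vdots \\
 & \ddots & z_1 & & \ddots & 2\alpha z_2 \\
 & & z_0 & & & \alpha z_1
\end{bmatrix},
\]

where $h_k$ is equal to the first column of $B_k(z)$, and $E_k$ is equal to the last $2m-2k$ columns of $B_k(z)$. The first $(m-1-k)$ columns of $E_k$ store the perturbations of $f(x)$, and the last $(m+1-k)$ columns of $E_k$ store the perturbations of $\alpha f^{(1)}(x)$. It follows from the definitions of $h_k$ and $z$ that there exists a matrix $P_k \in \mathbb{R}^{(2m-k)\times(m+1)}$ such that

\[
h_k = P_k z = \begin{bmatrix} I_{m+1} \\ 0_{m-k-1,\,m+1} \end{bmatrix} z.
\]

Example 4.8. If $m = 4$ and $k = 2$, then

\[
S_2(f, \alpha f^{(1)}) =
\begin{bmatrix}
a_4 & & 4\alpha a_4 & & \\
a_3 & a_4 & 3\alpha a_3 & 4\alpha a_4 & \\
a_2 & a_3 & 2\alpha a_2 & 3\alpha a_3 & 4\alpha a_4 \\
a_1 & a_2 & \alpha a_1 & 2\alpha a_2 & 3\alpha a_3 \\
a_0 & a_1 & & \alpha a_1 & 2\alpha a_2 \\
 & a_0 & & & \alpha a_1
\end{bmatrix},
\qquad
c_2 = \begin{bmatrix} a_4 \\ a_3 \\ a_2 \\ a_1 \\ a_0 \\ 0 \end{bmatrix},
\qquad
z = \begin{bmatrix} z_4 \\ z_3 \\ z_2 \\ z_1 \\ z_0 \end{bmatrix},
\]

\[
h_2^T = \begin{bmatrix} z_4 & z_3 & z_2 & z_1 & z_0 & 0 \end{bmatrix},
\qquad
P_2 =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix},
\]

and

\[
E_2 =
\begin{bmatrix}
 & 4\alpha z_4 & & \\
z_4 & 3\alpha z_3 & 4\alpha z_4 & \\
z_3 & 2\alpha z_2 & 3\alpha z_3 & 4\alpha z_4 \\
z_2 & \alpha z_1 & 2\alpha z_2 & 3\alpha z_3 \\
z_1 & & \alpha z_1 & 2\alpha z_2 \\
z_0 & & & \alpha z_1
\end{bmatrix}.
\]

The residual $r(z, y)$ that is associated with an approximate solution of (4.23) due to the perturbations $h_k$ and $E_k$ is given by

\[
r(z, y) = c_k + h_k - (A_k + E_k) y, \qquad h_k = P_k z, \qquad E_k = E_k(z),
\]

and it was shown in Section 4.3 that the residual $r(z, y)$ can be written as

\[
r(z, y) = c_k + h_k - A_k y - Y_k z.
\]

Equation (4.7) allows the rule for the construction of the elements of $Y_k$ to be established because closed form expressions for the elements of $E_k$ have been derived. In particular, it follows from (4.7) that

\[
\sum_{r=1}^{m+1} (Y_k)_{i,r}\, z_{r-1} = \sum_{s=1}^{2(m-k)} (E_k)_{i,s}\, y_s, \qquad i = 1, \ldots, 2m-k,
\]

and thus for $i = 1, \ldots$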
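, $2m-k$, where $1 \le k \le m-1$,

\[
\begin{aligned}
\sum_{r=1}^{m+1} (Y_k)_{i,r}\, z_{r-1}
&= \sum_{s=1}^{m-1-k} (E_k)_{i,s}\, y_s + \sum_{s=m-k}^{2(m-k)} (E_k)_{i,s}\, y_s \\
&= \sum_{j=1}^{m-1-k} (E_k)_{i,j}\, y_j + \sum_{j=1}^{m+1-k} (E_k)_{i,m-1-k+j}\, y_{m-1-k+j} \\
&= \sum_{j=1}^{m-1-k} z_{m+1+j-i}\, y_j + \sum_{j=1}^{m+1-k} \alpha (m+j-i)\, z_{m+j-i}\, y_{m-1-k+j} \\
&= \sum_{j=2}^{m-k} z_{m+j-i}\, y_{j-1} + \sum_{j=1}^{m+1-k} \alpha (m+j-i)\, z_{m+j-i}\, y_{m-1-k+j} \\
&= \sum_{j=2}^{m-k} \bigl[ y_{j-1} + \alpha (m+j-i)\, y_{m-1-k+j} \bigr] z_{m+j-i}
 + \alpha (m+1-i)\, y_{m-k}\, z_{m+1-i} \\
&\qquad + \alpha (2m+1-k-i)\, y_{2(m-k)}\, z_{2m+1-k-i},
\end{aligned}
\]

where the third line follows from the formulae for $(E_k)_{i,j}$, and a term is omitted if the subscript of $z$ lies outside the range $0, \ldots, m$. This enables closed form expressions for the elements of $Y_k$ to be obtained.

Example 4.9. If $m = 4$ and $k = 1$, then

\[
E_1 =
\begin{bmatrix}
 & & 4\alpha z_4 & & & \\
z_4 & & 3\alpha z_3 & 4\alpha z_4 & & \\
z_3 & z_4 & 2\alpha z_2 & 3\alpha z_3 & 4\alpha z_4 & \\
z_2 & z_3 & \alpha z_1 & 2\alpha z_2 & 3\alpha z_3 & 4\alpha z_4 \\
z_1 & z_2 & & \alpha z_1 & 2\alpha z_2 & 3\alpha z_3 \\
z_0 & z_1 & & & \alpha z_1 & 2\alpha z_2 \\
 & z_0 & & & & \alpha z_1
\end{bmatrix},
\]

and it follows from (4.7) that

\[
Y_1 =
\begin{bmatrix}
 & & & & 4\alpha y_3 \\
 & & & 3\alpha y_3 & (y_1 + 4\alpha y_4) \\
 & & 2\alpha y_3 & (y_1 + 3\alpha y_4) & (y_2 + 4\alpha y_5) \\
 & \alpha y_3 & (y_1 + 2\alpha y_4) & (y_2 + 3\alpha y_5) & 4\alpha y_6 \\
 & (y_1 + \alpha y_4) & (y_2 + 2\alpha y_5) & 3\alpha y_6 & \\
y_1 & (y_2 + \alpha y_5) & 2\alpha y_6 & & \\
y_2 & \alpha y_6 & & &
\end{bmatrix},
\]

where the columns of $Y_1$ are ordered such that column $r$ multiplies $z_{r-1}$, as in the sum above. If $m = 4$ and $k = 2$, then

\[
E_2 =
\begin{bmatrix}
 & 4\alpha z_4 & & \\
z_4 & 3\alpha z_3 & 4\alpha z_4 & \\
z_3 & 2\alpha z_2 & 3\alpha z_3 & 4\alpha z_4 \\
z_2 & \alpha z_1 & 2\alpha z_2 & 3\alpha z_3 \\
z_1 & & \alpha z_1 & 2\alpha z_2 \\
z_0 & & & \alpha z_1
\end{bmatrix},
\]

and it is readily verified from (4.7) that, with the same ordering of the columns,

\[
Y_2 =
\begin{bmatrix}
 & & & & 4\alpha y_2 \\
 & & & 3\alpha y_2 & (y_1 + 4\alpha y_3) \\
 & & 2\alpha y_2 & (y_1 + 3\alpha y_3) & 4\alpha y_4 \\
 & \alpha y_2 & (y_1 + 2\alpha y_3) & 3\alpha y_4 & \\
 & (y_1 + \alpha y_3) & 2\alpha y_4 & & \\
y_1 & \alpha y_4 & & &
\end{bmatrix}.
\]

Equation (4.23) is a non-linear equation that is solved iteratively after it has been linearised, and it was shown in Section 4.3 that the linearised form of this equation is

\[
r(z + \delta z, y + \delta y) = r(z, y) - (Y_k - P_k)\,\delta z - (A_k + E_k)\,\delta y.
\]

It is necessary to solve $r(z + \delta z, y + \delta y) = 0$, subject to the constraint that the perturbations $z_i$ are minimised. Different perturbations $z_i$ occur a different number of times, and it is therefore necessary to minimise $\|Dz\|$ subject to

\[
r(z, y) = (Y_k - P_k)\,\delta z + (A_k + E_k)\,\delta y,
\]

where the element $d_{ii} = d_i$, $i = 1, \ldots$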
, $m + 1$, of the diagonal matrix $D$ is calculated from the number of times that $z_i$ occurs in $B_k(z)$:

$z_m$ occurs $(m-k) + m(m-k+1)$ times
$z_{m-1}$ occurs $(m-k) + (m-1)(m-k+1)$ times
$z_{m-2}$ occurs $(m-k) + (m-2)(m-k+1)$ times
$\cdots$
$z_1$ occurs $(m-k) + (m-k+1)$ times
$z_0$ occurs $(m-k)$ times

and thus

\[
D = \mathrm{diag}\{d_i\} = \mathrm{diag}\{(m-k) + (m-i+1)(m-k+1)\}.
\]

This is an LSE problem that is similar to the LSE problem in Section 4.3, and it can be written in matrix form as

\[
\min_{Cv = t} \|Ev - s\|,
\]

where

\[
\begin{aligned}
C &= \begin{bmatrix} (Y_k - P_k) & (A_k + E_k) \end{bmatrix} \in \mathbb{R}^{(2m-k)\times(3m-2k+1)}, \\
E &= \begin{bmatrix} D & 0 \end{bmatrix} \in \mathbb{R}^{(m+1)\times(3m-2k+1)}, \\
t &= r(z, y) \in \mathbb{R}^{2m-k}, \\
s &= -Dz \in \mathbb{R}^{m+1}, \\
v &= \begin{bmatrix} \delta z \\ \delta y \end{bmatrix} \in \mathbb{R}^{3m-2k+1},
\end{aligned}
\]

$\delta z \in \mathbb{R}^{m+1}$ and $\delta y \in \mathbb{R}^{2(m-k)}$. The LSE problem is solved by Algorithms 4.1 and 4.2.

4.6 GCD computations by partial singular value decomposition

This section describes the method developed by Zeng [53] for the computation of an approximate GCD of two polynomials. It uses the Sylvester resultant matrix, but in a slightly rearranged form from that used in this report. In particular, he uses the Sylvester resultant matrix of the polynomials $f(x)$ and $f^{(1)}(x)$,

\[
W(f, f^{(1)}) = \begin{bmatrix} C_m(f^{(1)}) & C_{m-1}(f) \end{bmatrix},
\]

where $C_k(f)$ is the Cauchy matrix, with $k$ columns, of $f(x)$,

\[
C_k(f) =
\begin{bmatrix}
a_0 & & \\
a_1 & \ddots & \\
\vdots & \ddots & a_0 \\
a_m & & a_1 \\
 & \ddots & \vdots \\
 & & a_m
\end{bmatrix}
\in \mathbb{R}^{(m+k)\times k}.
\]

The Cauchy matrix arises when polynomial multiplication is expressed in terms of matrices and vectors. In particular, if $g(x)$ is a polynomial of degree $n$, and $\mathbf{f}$, $\mathbf{g}$ and $\mathbf{h}$ are the vectors of coefficients of $f(x)$, $g(x)$ and $h(x) = f(x)g(x)$ respectively, then

\[
\mathbf{h} = C_{n+1}(f)\,\mathbf{g} = C_{m+1}(g)\,\mathbf{f}.
\]
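The relation between the Cauchy matrix and polynomial multiplication can be checked with a minimal sketch. The function names `cauchy_matrix` and `matvec` and the two small polynomials are illustrative assumptions, not part of the notes, and coefficients are stored with the lowest degree first so that $a_0$ occupies the first row, as in the definition of $C_k(f)$.

```python
def cauchy_matrix(f, k):
    """Return C_k(f): the (m + k) x k matrix whose j-th column holds the
    coefficients f = [a0, a1, ..., am] shifted down by j rows."""
    m = len(f) - 1
    C = [[0] * k for _ in range(m + k)]
    for j in range(k):
        for i, a in enumerate(f):
            C[i + j][j] = a
    return C


def matvec(A, x):
    """Plain matrix-vector product for a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Hypothetical data: f(x) = 1 + 2x + 3x^2 (m = 2) and g(x) = 4 + 5x (n = 1).
f = [1, 2, 3]
g = [4, 5]

h_from_f = matvec(cauchy_matrix(f, len(g)), g)  # C_{n+1}(f) g
h_from_g = matvec(cauchy_matrix(g, len(f)), f)  # C_{m+1}(g) f
print(h_from_f)  # [4, 13, 22, 15], i.e. 4 + 13x + 22x^2 + 15x^3
print(h_from_g)  # the same coefficient vector
```

Both products return the coefficients of $f(x)g(x)$, which is the property used in the remainder of this section to write polynomial products and divisions as linear algebra.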
The subresultant matrix $W_k(f, f^{(1)}) \in \mathbb{R}^{(2m-k)\times(2m-2k+1)}$ is obtained from $W(f, f^{(1)})$ in the same way as the subresultant matrices $S_k(f, f^{(1)})$ are obtained from $S(f, f^{(1)})$, that is, the last $k-1$ columns of the coefficients of $f(x)$, the last $k-1$ columns of the coefficients of $f^{(1)}(x)$, and the last $k-1$ rows, of $W(f, f^{(1)})$ are deleted.

Example 4.10. If $m = 4$,

\[
W(f, f^{(1)}) = W_1(f, f^{(1)}) =
\begin{bmatrix}
a_1 & 0 & 0 & 0 & a_0 & 0 & 0 \\
2a_2 & a_1 & 0 & 0 & a_1 & a_0 & 0 \\
3a_3 & 2a_2 & a_1 & 0 & a_2 & a_1 & a_0 \\
4a_4 & 3a_3 & 2a_2 & a_1 & a_3 & a_2 & a_1 \\
0 & 4a_4 & 3a_3 & 2a_2 & a_4 & a_3 & a_2 \\
0 & 0 & 4a_4 & 3a_3 & 0 & a_4 & a_3 \\
0 & 0 & 0 & 4a_4 & 0 & 0 & a_4
\end{bmatrix},
\]

and the subresultant matrices $W_2(f, f^{(1)})$ and $W_3(f, f^{(1)})$ are

\[
W_2(f, f^{(1)}) =
\begin{bmatrix}
a_1 & 0 & 0 & a_0 & 0 \\
2a_2 & a_1 & 0 & a_1 & a_0 \\
3a_3 & 2a_2 & a_1 & a_2 & a_1 \\
4a_4 & 3a_3 & 2a_2 & a_3 & a_2 \\
0 & 4a_4 & 3a_3 & a_4 & a_3 \\
0 & 0 & 4a_4 & 0 & a_4
\end{bmatrix}
\qquad \text{and} \qquad
W_3(f, f^{(1)}) =
\begin{bmatrix}
a_1 & 0 & a_0 \\
2a_2 & a_1 & a_1 \\
3a_3 & 2a_2 & a_2 \\
4a_4 & 3a_3 & a_3 \\
0 & 4a_4 & a_4
\end{bmatrix},
\]

respectively.

The method described by Zeng [53] uses the relationships (3.8) and (3.9) between the order of a subresultant matrix $W_k(f, f^{(1)})$ and the degree $d$ of the GCD of $f(x)$ and $f^{(1)}(x)$,

\[
\operatorname{rank} W_{m-j}(f, f^{(1)}) = 2j + 1, \qquad j = 1, \ldots, m-d-1,
\]

and

\[
\operatorname{rank} W_{m-j}(f, f^{(1)}) < 2j + 1, \qquad j = m-d, \ldots, m-1.
\]

If the first rank deficient subresultant matrix $W_{m-j}(f, f^{(1)}) \in \mathbb{R}^{(m+j)\times(2j+1)}$, as $j$ increases from 1 to $m-1$, occurs for $j = k$, then the degree $d$ of the GCD of $f(x)$ and $f^{(1)}(x)$ is given by $d = m - k$. It follows that if $\xi_j$, $j = 1, \ldots, m-1$, is the smallest singular value of $W_{m-j}(f, f^{(1)})$, that is,

\[
\xi_j = \sigma_{2j+1}\bigl(W_{m-j}(f, f^{(1)})\bigr),
\]

then

\[
\xi_1, \xi_2, \ldots, \xi_{m-d-1} > 0, \qquad \xi_{m-d} = \xi_{m-d+1} = \cdots = \xi_{m-1} = 0. \tag{4.24}
\]

The following theorem is proved in [53].

Theorem 4.1. Let $u = u(x)$ and $v = v(x)$ be polynomials of degrees $t = m - d$ and $t - 1$ respectively, such that

\[
u(x)d(x) = f(x) \qquad \text{and} \qquad v(x)d(x) = f^{(1)}(x). \tag{4.25}
\]

Then

(a) $u(x)$ and $v(x)$ are coprime.

(b) The (column) rank of $W_d(f, f^{(1)}) = W_{m-t}(f, f^{(1)})$ is deficient by one.
(c) The vector

\[
\mathbf{w} = \begin{bmatrix} \mathbf{u} \\ -\mathbf{v} \end{bmatrix},
\]

where $\mathbf{u}$ and $\mathbf{v}$ are column vectors of the coefficients of $u(x)$ and $v(x)$ respectively, is the right singular vector of $W_d(f, f^{(1)})$ that is associated with the smallest (zero) singular value $\xi_{m-d}$.

(d) If $\mathbf{u}$ is known, then the vector $\mathbf{d}$ of the coefficients of $d(x)$ is the solution of $C_{d+1}(u)\,\mathbf{d} = \mathbf{f}$.

Equation (4.24) shows that if the first rank deficient matrix $W_{m-j}(f, f^{(1)})$ for $j = 1, \ldots, m-1$, occurs for $j = k$, then the degree $d$ of the GCD of $f(x)$ and $f^{(1)}(x)$ is equal to $m - k$. The singular value decomposition is the natural method to calculate the rank of a matrix, but it follows from (4.24) that the calculation of the degree of the GCD of $f(x)$ and $f^{(1)}(x)$ only requires the determination of the index of the first zero singular value, and the associated right singular vector, and not all the singular values and singular vectors. It is shown in [53, 54] that if the columns of $W_k(f, f^{(1)})$ are rearranged, the QR decomposition can be updated and an inverse iteration used to compute these two quantities efficiently. This rearrangement requires that the columns of the coefficients of $f(x)$ and $f^{(1)}(x)$ interlace, and the rearranged form of $W_k(f, f^{(1)})$ is obtained by adding two columns to the right hand side, and a row at the bottom, to the rearranged form of $W_{k+1}(f, f^{(1)})$ for $k = m-2, m-3, \ldots, 1$.

Example 4.11. Consider $W_k(f, f^{(1)})$, $k = 1, 2, 3$, for

\[
f(x) = (x-1)^2 (x-2)(x-3) = x^4 - 7x^3 + 17x^2 - 17x + 6,
\]

and thus $f^{(1)}(x) = 4x^3 - 21x^2 + 34x - 17$. The subresultant matrix $W_3(f, f^{(1)})$ is

\[
W_3(f, f^{(1)}) =
\begin{bmatrix}
a_1 & 0 & a_0 \\
2a_2 & a_1 & a_1 \\
3a_3 & 2a_2 & a_2 \\
4a_4 & 3a_3 & a_3 \\
0 & 4a_4 & a_4
\end{bmatrix},
\]

and this matrix is rearranged so that the coefficients of $f(x)$ and $f^{(1)}(x)$ interlace. The reordered form of $W_3(f, f^{(1)})$ is therefore given by

\[
W_3^r(f, f^{(1)}) =
\begin{bmatrix}
a_1 & a_0 & 0 \\
2a_2 & a_1 & a_1 \\
3a_3 & a_2 & 2a_2 \\
4a_4 & a_3 & 3a_3 \\
0 & a_4 & 4a_4
\end{bmatrix}.
\]

The subresultant matrix $W_2(f, f^{(1)})$ is

\[
W_2(f, f^{(1)}) =
\begin{bmatrix}
a_1 & 0 & 0 & a_0 & 0 \\
2a_2 & a_1 & 0 & a_1 & a_0 \\
3a_3 & 2a_2 & a_1 & a_2 & a_1 \\
4a_4 & 3a_3 & 2a_2 & a_3 & a_2 \\
0 & 4a_4 & 3a_3 & a_4 & a_3 \\
0 & 0 & 4a_4 & 0 & a_4
\end{bmatrix},
\]

and the reordered form of this matrix is

\[
W_2^r(f, f^{(1)}) =
\begin{bmatrix}
a_1 & a_0 & 0 & 0 & 0 \\
2a_2 & a_1 & a_1 & a_0 & 0 \\
3a_3 & a_2 & 2a_2 & a_1 & a_1 \\
4a_4 & a_3 & 3a_3 & a_2 & 2a_2 \\
0 & a_4 & 4a_4 & a_3 & 3a_3 \\
0 & 0 & 0 & a_4 & 4a_4
\end{bmatrix},
\]

and the columns of the coefficients of $f(x)$ and $f^{(1)}(x)$ interlace. The reordered matrix $W_2^r(f, f^{(1)})$ is obtained by the addition of a column of the coefficients of $f(x)$ and a column of the coefficients of $f^{(1)}(x)$ on the right hand side, and an extra row, to $W_3^r(f, f^{(1)})$.

The subresultant matrix $W_1(f, f^{(1)})$ is

\[
W_1(f, f^{(1)}) =
\begin{bmatrix}
a_1 & 0 & 0 & 0 & a_0 & 0 & 0 \\
2a_2 & a_1 & 0 & 0 & a_1 & a_0 & 0 \\
3a_3 & 2a_2 & a_1 & 0 & a_2 & a_1 & a_0 \\
4a_4 & 3a_3 & 2a_2 & a_1 & a_3 & a_2 & a_1 \\
0 & 4a_4 & 3a_3 & 2a_2 & a_4 & a_3 & a_2 \\
0 & 0 & 4a_4 & 3a_3 & 0 & a_4 & a_3 \\
0 & 0 & 0 & 4a_4 & 0 & 0 & a_4
\end{bmatrix},
\]

and the rearranged form of this matrix, such that the columns of $f(x)$ and $f^{(1)}(x)$ interlace, is

\[
W_1^r(f, f^{(1)}) =
\begin{bmatrix}
a_1 & a_0 & 0 & 0 & 0 & 0 & 0 \\
2a_2 & a_1 & a_1 & a_0 & 0 & 0 & 0 \\
3a_3 & a_2 & 2a_2 & a_1 & a_1 & a_0 & 0 \\
4a_4 & a_3 & 3a_3 & a_2 & 2a_2 & a_1 & a_1 \\
0 & a_4 & 4a_4 & a_3 & 3a_3 & a_2 & 2a_2 \\
0 & 0 & 0 & a_4 & 4a_4 & a_3 & 3a_3 \\
0 & 0 & 0 & 0 & 0 & a_4 & 4a_4
\end{bmatrix},
\]

which is formed from $W_2^r(f, f^{(1)})$ by the addition of two columns and one row, as described above.

The computation of the GCD of $f(x)$ and $f^{(1)}(x)$ reduces to the computation of the polynomials $u(x)$ and $v(x)$, which are defined in (4.25), and this involves three stages:

• Obtain initial estimates of the coprime polynomials $u(x)$ and $v(x)$ (Section 4.6.1).

• Obtain an initial estimate of the GCD $d(x)$ (Section 4.6.2).

• Use the method of non-linear least squares to obtain improved estimates of $u(x)$, $v(x)$ and $d(x)$ (Section 4.6.3).

The degree of the GCD of $f(x)$ and $f^{(1)}(x)$ is not known a priori, and it is therefore necessary to interrogate the subresultant matrices $W_{m-j}^r(f, f^{(1)})$ for $j = 1, \ldots, m-1$, and determine the first matrix in this sequence that is rank deficient.
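The construction of the subresultant matrices, their interlaced form, and the null vector of Theorem 4.1(c) can be illustrated for the polynomial of Example 4.11. The following is a minimal sketch in exact integer arithmetic; the helper names are hypothetical, matrices are stored as lists of columns, and coefficients are ordered with the lowest degree first. It does not implement the QR update or the inverse iteration of [53, 54].

```python
def derivative(f):
    """Coefficients of f'(x) for f = [a0, a1, ..., am] (lowest degree first)."""
    return [i * a for i, a in enumerate(f)][1:]


def shifted_column(c, shift, rows):
    """One column of a Cauchy matrix: coefficients c moved down by `shift` rows."""
    col = [0] * rows
    for i, a in enumerate(c):
        col[i + shift] = a
    return col


def subresultant_columns(f, k):
    """Columns of W_k(f, f'): m-k+1 columns of f' and m-k columns of f."""
    m = len(f) - 1
    rows = 2 * m - k
    df = derivative(f)
    d_cols = [shifted_column(df, j, rows) for j in range(m - k + 1)]
    f_cols = [shifted_column(f, j, rows) for j in range(m - k)]
    return d_cols, f_cols


def interlaced_columns(f, k):
    """Columns of the reordered matrix W_k^r, in the order f', f, f', ..., f'."""
    d_cols, f_cols = subresultant_columns(f, k)
    cols = []
    for j, d_col in enumerate(d_cols):
        cols.append(d_col)
        if j < len(f_cols):
            cols.append(f_cols[j])
    return cols


def combine(cols, weights):
    """Linear combination of columns, i.e. the matrix-vector product W w."""
    return [sum(w * col[i] for col, w in zip(cols, weights))
            for i in range(len(cols[0]))]


# Example 4.11: f(x) = (x-1)^2 (x-2)(x-3) = 6 - 17x + 17x^2 - 7x^3 + x^4.
f = [6, -17, 17, -7, 1]
u = [-6, 11, -6, 1]   # u(x) = f(x) / (x - 1)
v = [17, -17, 4]      # v(x) = f'(x) / (x - 1)

# The GCD has degree d = 1, so W_1 [u; -v] should vanish (Theorem 4.1(c)).
d_cols, f_cols = subresultant_columns(f, 1)
residual = combine(d_cols + f_cols, u + [-c for c in v])
print(residual)       # seven zeros

W3r = interlaced_columns(f, 3)
print(W3r)            # the three columns of W_3^r
```

In exact arithmetic the rank deficiency of $W_1(f, f^{(1)})$ is seen directly from the zero residual. With inexact coefficients the residual is only small, and the first rank deficient matrix in the sequence must be identified numerically.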
The identification of the first rank deficient matrix in this sequence requires a criterion in order to determine when its smallest singular value $\xi_j$ can be assumed to be (numerically) zero, and this is established in Section 4.6.4. Since it is only required to determine the smallest singular value of $W_{m-j}^r(f, f^{(1)})$, and the associated right singular vector, a complete singular value decomposition of $W_{m-j}^r(f, f^{(1)})$ is not required. It is shown, however, in Lemma 2 in [53] and Lemma 2.4 in [54] that $\xi_j$ can be calculated from the QR decomposition of $W_{m-j}^r(f, f^{(1)})$, and furthermore, a fast update procedure for the calculation of the QR decomposition of $W_{m-j}^r(f, f^{(1)})$ from the QR decomposition of $W_{m-j+1}^r(f, f^{(1)})$ is derived.

4.6.1. The calculation of the coprime polynomials

The accurate computation of the GCD of $f(x)$ and $f^{(1)}(x)$ requires that their coprime polynomials $u(x)$ and $v(x)$ be calculated, and pseudo-code for this is shown in Algorithm 4.3. The coefficients of $u(x)$ and $v(x)$ are stored in the vectors $\mathbf{u}$ and $\mathbf{v}$ respectively, and they are calculated in Step 3(b) in Algorithm 4.3 from the right singular vector $\mathbf{y}_j$ that is associated with the singular value $\xi_j \le \theta$, as stated in Theorem 4.1. An expression for the threshold $\theta$ in terms of the noise level of the coefficients of $f(x)$ and the singular values $\xi_j$ is developed in Section 4.6.4.

Algorithm 4.3 is very simple, such that the computational reliability of $\mathbf{u}$ and $\mathbf{v}$ cannot be guaranteed, and thus their values must be improved. They are therefore initial estimates $\mathbf{u}_0$ and $\mathbf{v}_0$ in a refinement strategy, which is implemented by the method of non-linear least squares and considered in Section 4.6.3.

Algorithm 4.3: The coprime factors of a polynomial and its derivative

Input: The matrix $W_{m-1}^r(f, f^{(1)})$ and a tolerance $\theta$ for the smallest singular values of $W_{m-j}^r(f, f^{(1)})$, $j = 1, \ldots, m-1$.

Output: The coprime factors, whose coefficients are stored in $\mathbf{u}_0$ and $\mathbf{v}_0$, of $f(x)$ and $f^{(1)}(x)$.

Begin

1. Calculate $W_{m-1}^r(f, f^{(1)})$ by rearranging the columns of $W_{m-1}(f, f^{(1)})$.

2. Calculate the QR decomposition $W_{m-1}^r(f, f^{(1)}) = Q_{m-1} R_{m-1}$.

3. Set $j = 1$.
   While $j \le m - 1$
   (a) Use the inverse iteration algorithm in [53, 54] to calculate the smallest singular value $\xi_j$ and corresponding right singular vector $\mathbf{y}_j$ of $W_{m-j}^r(f, f^{(1)})$.
   (b) If $\xi_j \le \theta$ Then $d = m - j$, and calculate $\mathbf{u}_0$ and $\mathbf{v}_0$ from $\mathbf{y}_j$. Go to End
   (c) Else Set $j = j + 1$.
   (d) Calculate $W_{m-j}^r(f, f^{(1)})$ from $W_{m-j+1}^r(f, f^{(1)})$.
   End While
   % $f(x)$ and $f^{(1)}(x)$ have no common factors.

End

4.6.2. The calculation of the initial estimate of the GCD

The calculation of the initial estimates $\mathbf{u}_0$ and $\mathbf{v}_0$ allows an initial estimate $\mathbf{d}_0$ of the GCD of $f(x)$ and $f^{(1)}(x)$ to be calculated. In particular, it follows from (4.25) that

\[
C_{d+1}(u_0)\,\mathbf{d}_0 = \mathbf{f} \qquad \text{and} \qquad C_{d+1}(v_0)\,\mathbf{d}_0 = \mathbf{f}^{(1)},
\]

and these equations can be combined into one equation

\[
\begin{bmatrix} C_{d+1}(u_0) \\ C_{d+1}(v_0) \end{bmatrix} \mathbf{d}_0 = \begin{bmatrix} \mathbf{f} \\ \mathbf{f}^{(1)} \end{bmatrix}, \tag{4.26}
\]

where the coefficient matrix is of order $(2m+1) \times (d+1)$. This is a linear least squares problem that is solved by standard methods.

4.6.3. Refinement by non-linear least squares

Initial estimates $\mathbf{u}_0$ and $\mathbf{v}_0$ of the coprime polynomials of $f(x)$ and $f^{(1)}(x)$ were calculated in Algorithm 4.3, and an initial estimate $\mathbf{d}_0$ of the GCD of $f(x)$ and $f^{(1)}(x)$ was calculated in (4.26). It is shown in this section that the refinement of these estimates leads to a non-linear least squares minimisation that is solved iteratively by the Gauss-Newton method, in which $\mathbf{u}_0$, $\mathbf{v}_0$ and $\mathbf{d}_0$ are the initial estimates of the solution.

Equation (4.25), which shows how the polynomials $u(x)$, $v(x)$ and $d(x)$ are related to $f(x)$ and $f^{(1)}(x)$, can be cast in matrix form,

\[
C_{d+1}(u)\,\mathbf{d} = \mathbf{f} \qquad \text{and} \qquad C_{d+1}(v)\,\mathbf{d} = \mathbf{f}^{(1)}.
\]

This pair of equations contains an arbitrary scale factor that must be removed before the initial estimates $\mathbf{u}_0$, $\mathbf{v}_0$ and $\mathbf{d}_0$ of $\mathbf{u}$, $\mathbf{v}$ and $\mathbf{d}$, respectively, can be refined.
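The least squares problem (4.26), and the scale factor mentioned above, can be illustrated with the exact data of Example 4.11. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the normal equations with exact rational arithmetic are used only for brevity, as a stand-in for the unspecified standard methods (a QR-based solver would normally be preferred in floating point arithmetic).

```python
from fractions import Fraction


def conv(a, b):
    """Coefficients of the product of two polynomials (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def cauchy_matrix(c, k):
    """C_k(c) as a list of rows: k shifted copies of the coefficients c."""
    C = [[Fraction(0)] * k for _ in range(len(c) - 1 + k)]
    for j in range(k):
        for i, a in enumerate(c):
            C[i + j][j] = Fraction(a)
    return C


def least_squares(A, b):
    """Minimise ||A x - b||_2 by solving A^T A x = A^T b with Gauss-Jordan
    elimination in exact arithmetic (assumes A has full column rank)."""
    n = len(A[0])
    M = [[sum(row[i] * row[j] for row in A) for j in range(n)]
         + [sum(row[i] * bi for row, bi in zip(A, b))]
         for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


# Data of Example 4.11 (lowest degree first); the GCD has degree d = 1.
u0 = [-6, 11, -6, 1]           # u(x) = (x-1)(x-2)(x-3)
v0 = [17, -17, 4]              # v(x) = 4x^2 - 17x + 17
f = [6, -17, 17, -7, 1]
df = [-17, 34, -21, 4]
deg = 1

A = cauchy_matrix(u0, deg + 1) + cauchy_matrix(v0, deg + 1)  # (2m+1) x (d+1)
d0 = least_squares(A, f + df)
print(d0)                      # coefficients of d(x) = x - 1

# Scale factor: doubling u0 and halving d0 leaves the product unchanged.
same_product = conv([2 * c for c in u0], [c / 2 for c in d0]) == conv(u0, d0)
print(same_product)
```

The second check shows that the pair $(2\mathbf{u}_0, \mathbf{d}_0/2)$ reproduces $\mathbf{f}$ as well as $(\mathbf{u}_0, \mathbf{d}_0)$ does, which is the arbitrary scale factor in the matrix form of (4.25).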
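To remove this scale factor, it is necessary to add a constraint on the magnitude of $\mathbf{d}$, and thus the equations that are used for the refinement of $\mathbf{u}$, $\mathbf{v}$ and $\mathbf{d}$ are

\[
C_{d+1}(u)\,\mathbf{d} = \mathbf{f}, \qquad C_{d+1}(v)\,\mathbf{d} = \mathbf{f}^{(1)} \qquad \text{and} \qquad \mathbf{r}^T \mathbf{d} = 1,
\]

that is,

\[
\begin{bmatrix} \mathbf{r}^T \mathbf{d} - 1 \\ C_{d+1}(u)\,\mathbf{d} \\ C_{d+1}(v)\,\mathbf{d} \end{bmatrix}
=
\begin{bmatrix} 0 \\ \mathbf{f} \\ \mathbf{f}^{(1)} \end{bmatrix},
\]

and it is therefore required to minimise $\|F(\mathbf{z}) - \mathbf{b}\|^2$, where

\[
F(\mathbf{z}) = \begin{bmatrix} \mathbf{r}^T \mathbf{d} - 1 \\ C_{d+1}(u)\,\mathbf{d} \\ C_{d+1}(v)\,\mathbf{d} \end{bmatrix},
\qquad
\mathbf{z} = \begin{bmatrix} \mathbf{d} \\ \mathbf{u} \\ \mathbf{v} \end{bmatrix}
\qquad \text{and} \qquad
\mathbf{b} = \begin{bmatrix} 0 \\ \mathbf{f} \\ \mathbf{f}^{(1)} \end{bmatrix}, \tag{4.27}
\]

in order to obtain improved estimates of $\mathbf{u}$, $\mathbf{v}$ and $\mathbf{d}$. The vectors $C_{d+1}(u)\,\mathbf{d}$ and $\mathbf{f}$ are of length $m + 1$, and the vectors $C_{d+1}(v)\,\mathbf{d}$ and $\mathbf{f}^{(1)}$ are of length $m$.

The equation that is satisfied at stationarity must be determined. Let

\[
r = \|F(\mathbf{z}) - \mathbf{b}\|^2 = F(\mathbf{z})^T F(\mathbf{z}) - 2\mathbf{b}^T F(\mathbf{z}) + \mathbf{b}^T \mathbf{b},
\]

and thus

\[
\delta r = F(\mathbf{z} + \delta\mathbf{z})^T F(\mathbf{z} + \delta\mathbf{z}) - F(\mathbf{z})^T F(\mathbf{z}) - 2\mathbf{b}^T \bigl( F(\mathbf{z} + \delta\mathbf{z}) - F(\mathbf{z}) \bigr). \tag{4.28}
\]

Since, to first order,

\[
\begin{aligned}
F(\mathbf{z} + \delta\mathbf{z})
&= F(\mathbf{z}) + \sum_{i=1}^{2m-d+2} \frac{\partial F(\mathbf{z})}{\partial z_i}\,\delta z_i \\
&= F(\mathbf{z}) + \begin{bmatrix} \dfrac{\partial F(\mathbf{z})}{\partial z_1} & \dfrac{\partial F(\mathbf{z})}{\partial z_2} & \cdots & \dfrac{\partial F(\mathbf{z})}{\partial z_{2m-d+2}} \end{bmatrix} \delta\mathbf{z} \\
&= F(\mathbf{z}) + \frac{\partial F(\mathbf{z})}{\partial \mathbf{z}}\,\delta\mathbf{z},
\end{aligned}
\]

where $\partial F(\mathbf{z})/\partial z_i$ is a column vector of length $2m + 2$ and $\partial F(\mathbf{z})/\partial \mathbf{z}$ is a matrix of order $(2m+2) \times (2m-d+2)$, it follows that

\[
F(\mathbf{z} + \delta\mathbf{z})^T = F(\mathbf{z})^T + \delta\mathbf{z}^T \frac{\partial F(\mathbf{z})}{\partial \mathbf{z}}^T.
\]

The substitution of these expressions into (4.28) yields

\[
\delta r = 2\,\delta\mathbf{z}^T \frac{\partial F(\mathbf{z})}{\partial \mathbf{z}}^T \bigl( F(\mathbf{z}) - \mathbf{b} \bigr),
\]

to lowest order, and thus the vector $\mathbf{z}$ that minimises $r$ satisfies

\[
\frac{\partial F(\mathbf{z})}{\partial \mathbf{z}}^T \bigl( F(\mathbf{z}) - \mathbf{b} \bigr) = 0. \tag{4.29}
\]

The solution of this equation requires that an expression for $\partial F(\mathbf{z})/\partial \mathbf{z}$ be derived, and this is obtained from the definition of $F(\mathbf{z})$ in (4.27), from which it follows that

\[
F(\mathbf{z} + \delta\mathbf{z}) =
\begin{bmatrix}
\mathbf{r}^T (\mathbf{d} + \delta\mathbf{d}) - 1 \\
C_{d+1}(u + \delta u)\,(\mathbf{d} + \delta\mathbf{d}) \\
C_{d+1}(v + \delta v)\,(\mathbf{d} + \delta\mathbf{d})
\end{bmatrix}.
\]

Since, to lowest order,

\[
\begin{aligned}
C_{d+1}(u + \delta u)\,(\mathbf{d} + \delta\mathbf{d})
&= C_{d+1}(u + \delta u)\,\mathbf{d} + C_{d+1}(u + \delta u)\,\delta\mathbf{d} \\
&= C_{d+1}(u)\,\mathbf{d} + C_{d+1}(\delta u)\,\mathbf{d} + C_{d+1}(u)\,\delta\mathbf{d} \\
&= C_{d+1}(u)\,\mathbf{d} + C_{d+1}(u)\,\delta\mathbf{d} + C_{m-d+1}(d)\,\delta\mathbf{u},
\end{aligned}
\]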
and a similar expression is valid for $C_{d+1}(v + \delta v)\,(\mathbf{d} + \delta\mathbf{d})$, it follows that

\[
\begin{aligned}
F(\mathbf{z} + \delta\mathbf{z}) - F(\mathbf{z})
&= \begin{bmatrix}
\mathbf{r}^T \delta\mathbf{d} \\
C_{d+1}(u)\,\delta\mathbf{d} + C_{m-d+1}(d)\,\delta\mathbf{u} \\
C_{d+1}(v)\,\delta\mathbf{d} + C_{m-d}(d)\,\delta\mathbf{v}
\end{bmatrix} \\
&= \begin{bmatrix}
\mathbf{r}^T & 0 & 0 \\
C_{d+1}(u) & C_{m-d+1}(d) & 0 \\
C_{d+1}(v) & 0 & C_{m-d}(d)
\end{bmatrix}
\begin{bmatrix} \delta\mathbf{d} \\ \delta\mathbf{u} \\ \delta\mathbf{v} \end{bmatrix} \\
&= \frac{\partial F(\mathbf{z})}{\partial \mathbf{z}}\,\delta\mathbf{z}.
\end{aligned}
\]

The Jacobian matrix $J(\mathbf{z}) \in \mathbb{R}^{(2m+2)\times(2m-d+2)}$ is therefore given by

\[
J(\mathbf{z}) = \frac{\partial F(\mathbf{z})}{\partial \mathbf{z}} =
\begin{bmatrix}
\mathbf{r}^T & 0 & 0 \\
C_{d+1}(u) & C_{m-d+1}(d) & 0 \\
C_{d+1}(v) & 0 & C_{m-d}(d)
\end{bmatrix},
\]

and thus (4.29) becomes

\[
J(\mathbf{z})^T \bigl( F(\mathbf{z}) - \mathbf{b} \bigr) = 0, \tag{4.30}
\]

which is a set of $2m - d + 2$ non-linear equations in the $2m - d + 2$ unknowns stored in $\mathbf{z}$. Equation (4.30) is solved by the Gauss-Newton iteration,

\[
\mathbf{z}_{j+1} = \mathbf{z}_j - J(\mathbf{z}_j)^{\dagger} \bigl( F(\mathbf{z}_j) - \mathbf{b} \bigr),
\qquad
\mathbf{z}_0 = \begin{bmatrix} \mathbf{d}_0 \\ \mathbf{u}_0 \\ \mathbf{v}_0 \end{bmatrix},
\]

where

\[
J(\mathbf{z})^{\dagger} = \bigl( J(\mathbf{z})^T J(\mathbf{z}) \bigr)^{-1} J(\mathbf{z})^T
\]

is the left inverse of $J(\mathbf{z})$, the initial estimates $\mathbf{u}_0$ and $\mathbf{v}_0$ are computed in Algorithm 4.3, and the initial estimate $\mathbf{d}_0$ is calculated from the solution of the least squares problem (4.26). It is clear that this iteration requires that $J(\mathbf{z}_j)$ have full column rank, and the conditions for the satisfaction of this condition are stated in the following theorem, which is proved in [53].

Theorem 4.2. If the polynomials $u(x)$ and $v(x)$ are coprime, and $\mathbf{r}^T \mathbf{d} \ne 0$, then $J(\mathbf{z})$ has full (column) rank.

The Jacobian matrix $J(\mathbf{z})$ contains the vector $\mathbf{r}$, which must satisfy the condition $\mathbf{r}^T \mathbf{d} \ne 0$ for the convergence of the Gauss-Newton iteration, and Zeng [53] recommends that $\mathbf{r} = \mathbf{d}_0$.

4.6.4. The calculation of the threshold

Step 3(b) of Algorithm 4.3 requires that the smallest singular value $\xi_j$ of $W_{m-j}^r(f, f^{(1)})$ be measured against a threshold $\theta$ in order to determine its numerical rank. Since the singular values of a matrix are invariant with respect to row and column permutations, $\xi_j$ is the smallest singular value of both $W_{m-j}(f, f^{(1)})$ and $W_{m-j}^r(f, f^{(1)})$. An expression for $\theta$ in terms of $\xi_j$ is developed in this section.
The polynomial $f(x)$ is usually not known exactly, and furthermore, roundoff errors are adequate to cause incorrect results. It is therefore necessary to consider the theoretically exact polynomials $f(x)$ and $f^{(1)}(x)$, and their perturbed forms $\tilde{f} = \tilde{f}(x)$ and $\tilde{f}^{(1)} = \tilde{f}^{(1)}(x)$, respectively, where

\[
\bigl\| \mathbf{f} - \tilde{\mathbf{f}} \bigr\|^2 + \bigl\| \mathbf{f}^{(1)} - \tilde{\mathbf{f}}^{(1)} \bigr\|^2 \le \epsilon^2.
\]

In general, $\tilde{f}(x)$ and $\tilde{f}^{(1)}(x)$ are coprime, and thus if $\tilde{\xi}_j$ is the smallest singular value of $W_{m-j}^r(\tilde{f}, \tilde{f}^{(1)})$, then $\tilde{\xi}_j > 0$, $j = 1, \ldots, m-1$. It is assumed, however, that $f(x)$ and $f^{(1)}(x)$ have a GCD of degree $d$, and thus the singular values $\xi_j$ satisfy (4.24).

Since the Sylvester resultant matrix and its subresultant matrices have a linear structure, it follows that

\[
W_j^r(\tilde{f}, \tilde{f}^{(1)}) = W_j^r(f, f^{(1)}) + W_j^r(\tilde{f} - f, \tilde{f}^{(1)} - f^{(1)}), \qquad j = 1, \ldots, m-1,
\]

where each subresultant matrix is of order $(2m-j) \times (2m-2j+1)$. Since there are $m - j$ and $m - j + 1$ columns of the coefficients of $f(x)$ and $f^{(1)}(x)$ respectively, it follows that

\[
\bigl\| W_j^r(\tilde{f} - f, \tilde{f}^{(1)} - f^{(1)}) \bigr\|_F^2 \le (m - j + 1)\,\epsilon^2,
\]

where $\|\cdot\|_F$ denotes the Frobenius norm. It is shown in [13], page 428, that if $\sigma_i(X)$, $i = 1, \ldots, \min(m, n)$, are the singular values of $X \in \mathbb{R}^{m \times n}$, arranged in non-increasing order, then

\[
|\sigma_k(A + E) - \sigma_k(A)| \le \sigma_1(E) = \|E\|_2 \le \|E\|_F,
\]

and thus if

\[
A = W_j^r(f, f^{(1)}), \qquad E = W_j^r(\tilde{f} - f, \tilde{f}^{(1)} - f^{(1)}) \qquad \text{and} \qquad A + E = W_j^r(\tilde{f}, \tilde{f}^{(1)}),
\]

then for $k = 1, \ldots, 2m - 2j + 1$,

\[
\bigl| \sigma_k\bigl(W_j^r(\tilde{f}, \tilde{f}^{(1)})\bigr) - \sigma_k\bigl(W_j^r(f, f^{(1)})\bigr) \bigr| \le \epsilon \sqrt{m - j + 1}. \tag{4.31}
\]

If $j = d + 1$, then

\[
\bigl| \sigma_k\bigl(W_{d+1}^r(\tilde{f}, \tilde{f}^{(1)})\bigr) - \sigma_k\bigl(W_{d+1}^r(f, f^{(1)})\bigr) \bigr| \le \epsilon \sqrt{m - d}, \tag{4.32}
\]

where $W_{d+1}^r(f, f^{(1)}) = W_{m-(m-d-1)}^r(f, f^{(1)}) \in \mathbb{R}^{(2m-d-1)\times(2m-2d-1)}$, and thus its smallest singular value is $\xi_{m-d-1}$. The matrix $W_{d+1}^r(f, f^{(1)})$ is non-singular (it has full column rank) because $f(x)$
and $f^{(1)}(x)$ have a common divisor of degree $d$, and thus it follows from (4.24) that

\[
\xi_{m-d-1} = \sigma_{2m-2d-1}\bigl(W_{d+1}^r(f, f^{(1)})\bigr) > 0
\qquad \text{and} \qquad
\tilde{\xi}_{m-d-1} = \sigma_{2m-2d-1}\bigl(W_{d+1}^r(\tilde{f}, \tilde{f}^{(1)})\bigr).
\]

The substitution of these results into (4.32) for $k = 2m - 2d - 1$ yields

\[
\bigl| \tilde{\xi}_{m-d-1} - \xi_{m-d-1} \bigr| \le \epsilon \sqrt{m - d},
\]

or

\[
\xi_{m-d-1} - \epsilon \sqrt{m - d} \le \tilde{\xi}_{m-d-1} \le \xi_{m-d-1} + \epsilon \sqrt{m - d},
\]

which relates the smallest singular value of $W_{d+1}^r(f, f^{(1)})$, the smallest singular value of $W_{d+1}^r(\tilde{f}, \tilde{f}^{(1)})$, and the noise level $\epsilon$.

If $j = d$, then (4.31) becomes

\[
\bigl| \sigma_k\bigl(W_d^r(\tilde{f}, \tilde{f}^{(1)})\bigr) - \sigma_k\bigl(W_d^r(f, f^{(1)})\bigr) \bigr| \le \epsilon \sqrt{m - d + 1}, \tag{4.33}
\]

where $W_d^r(f, f^{(1)}) = W_{m-(m-d)}^r(f, f^{(1)}) \in \mathbb{R}^{(2m-d)\times(2m-2d+1)}$ is singular because $f(x)$ and $f^{(1)}(x)$ have a common divisor of degree $d$. It follows that

\[
\xi_{m-d} = \sigma_{2m-2d+1}\bigl(W_d^r(f, f^{(1)})\bigr) = 0,
\]

and the substitution $k = 2m - 2d + 1$ in (4.33) yields

\[
\tilde{\xi}_{m-d} \le \epsilon \sqrt{m - d + 1}, \qquad \theta = \epsilon \sqrt{m - d + 1}, \tag{4.34}
\]

where the threshold $\theta$ is specified in Algorithm 4.3. This equation is an upper bound on the smallest singular value of $W_d^r(\tilde{f}, \tilde{f}^{(1)})$.

The sequence of matrices $W_{m-j}^r(\tilde{f}, \tilde{f}^{(1)})$, $j = 1, \ldots, m-1$, is constructed and their smallest singular values $\tilde{\xi}_1, \tilde{\xi}_2, \ldots, \tilde{\xi}_{m-1}$ are computed. When (4.34) is satisfied, there is a possibility that $\tilde{f}(x)$ and $\tilde{f}^{(1)}(x)$ have an approximate GCD of degree $d$, and this is confirmed or rejected by applying the Gauss-Newton iteration. If this yields an acceptable approximate GCD, then the algorithm terminates. If, however, an acceptable GCD is not obtained, then the next matrix $W_{m-j-1}^r(\tilde{f}, \tilde{f}^{(1)})$ is considered, and the process repeated. This guarantees that an approximate common divisor of maximum degree is calculated. An algorithm for the calculation of the GCD of two polynomials is in Section 7.4 in [53].
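The use of the threshold (4.34) in the search over $j$ can be sketched as follows. The function name and the list of singular values are hypothetical, chosen only to illustrate the test $\tilde{\xi}_j \le \epsilon\sqrt{m-d+1}$ with $d = m - j$; the confirmation of each candidate by the Gauss-Newton iteration is omitted.

```python
import math


def gcd_degree_candidate(xi, m, eps):
    """Return (j, d, theta) for the first j = 1, ..., m-1 whose smallest
    singular value xi[j-1] satisfies xi_j <= theta = eps * sqrt(m - d + 1),
    with d = m - j, or None if no subresultant matrix passes the test."""
    for j in range(1, m):
        d = m - j
        theta = eps * math.sqrt(m - d + 1)
        if xi[j - 1] <= theta:
            return j, d, theta
    return None


# Hypothetical smallest singular values of W^r_{m-j}, j = 1, 2, 3, for m = 4
# and a noise level eps = 1e-8 (illustrative numbers, not computed results).
xi = [0.7, 0.2, 3.0e-9]
candidate = gcd_degree_candidate(xi, m=4, eps=1.0e-8)
print(candidate)  # j = 3, so the candidate degree is d = 1
```

Because $d = m - j$, the threshold grows slowly with $j$ as $\epsilon\sqrt{j+1}$, so the test is slightly more permissive for the larger matrices that correspond to GCDs of lower degree.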
4.7 Summary

This chapter has considered the application of structured matrix methods, and the partial singular value decomposition of the Sylvester resultant matrix, for the computation of an approximate GCD of two polynomials. It was shown that the method that uses structured matrices leads to an LSE problem, and three methods for its solution were considered. It was shown that although the method of weights is frequently used for the solution of this problem, it possesses some disadvantages, which are not shared by the method based on the QR decomposition. It was shown that the Sylvester matrix contains an indeterminacy that can be used to introduce an extra parameter $\alpha$ into the computations, and moreover, it was shown in the examples that the quality of the computed approximate GCD varies significantly as $\alpha$ varies. Examples of the computation of approximate GCDs of two inexact polynomials, expressed in the power and Bernstein bases, were given.

The partial singular value decomposition of the Sylvester resultant matrix yields initial estimates of the coprime polynomials and the GCD, which are then refined by the method of non-linear least squares. Since it is only required to calculate the smallest singular value and the associated right singular vector of the subresultant matrices, a full singular value decomposition is not required. It was shown that these two quantities can be calculated efficiently using the QR decomposition of the subresultant matrix, and that a fast update procedure can be used to make the computations efficient.

Chapter 5

A robust polynomial root finder

It was shown by example in Chapter 1, and theoretically in Chapter 2, that the accurate computation of the multiple roots of a polynomial in a floating point environment is very difficult because even if the coefficients are known exactly, roundoff errors are sufficient to cause a multiple root to break up into simple roots.
Several popular methods for the computation of the roots of a polynomial were reviewed in Section 1.2, and a method due to Uspensky [42], pages 65-68, was discussed in Section 2.5. In this method, the multiplicities of the roots are determined initially, after which the roots are calculated. This procedure differs from the methods discussed in Section 1.2, in which the roots are calculated directly, without prior knowledge of their multiplicities.

It is shown in this chapter that the GCD computations described in Chapter 4 enable Uspensky's algorithm to be implemented, such that the computed roots are numerically stable. In particular, these GCD computations enable the multiplicity of each root to be calculated, and initial estimates of the roots of a polynomial are obtained by solving several lower degree polynomials, all of whose roots are simple. These estimates are then refined by the method of non-linear least squares, using the multiplicities as prior information, in order to obtain numerically accurate and robust computed roots.

5.1 GCD computations and the multiplicities of the roots

The multiplicities of the roots of a polynomial are calculated directly from the GCD computations of a polynomial and its derivative, as shown in Algorithm 2.1 and Example 2.5. It is seen from this example that the roots of $f(x)$ are equal to the roots of $h_1(x)$, $h_2(x)$ and $h_3(x)$, where numerically identical roots are grouped together, or the roots of $w_1(x)$, $w_2(x)$ and $w_3(x)$ are the roots of $f(x)$ with multiplicities 1, 2 and 3 respectively.

Example 5.1. Consider the polynomial $f(x)$ in Example 2.5,

\[
f(x) = x^6 - 3x^5 + 6x^3 - 3x^2 - 3x + 2,
\]

whose roots are $x = 1$ with a multiplicity of 3, $x = -1$ with a multiplicity of 2, and a simple root $x = 2$.
It is shown in Example 2.5 that

\[
h_1(x) = x^3 - 2x^2 - x + 2, \qquad h_2(x) = x^2 - 1, \qquad h_3(x) = x - 1,
\]

and the roots of $h_1(x)$ are $x = -1, 1, 2$, the roots of $h_2(x)$ are $x = -1, 1$, and the root of $h_3(x)$ is $x = 1$. When these roots are grouped together, it is seen that they are equal to the roots of $f(x)$. The polynomials $w_1(x)$, $w_2(x)$ and $w_3(x)$ are calculated in Example 2.5 and it is seen that the roots of $w_i(x)$, $i = 1, 2, 3$, are the roots of multiplicity $i$ of $f(x)$.

If the roots of the polynomials $h_i(x)$ are used to calculate the roots of $f(x)$, then numerically identical roots are grouped together, thereby obtaining the multiplicity of each root and an approximation to its value. Alternatively, the roots of the polynomials $w_i(x)$ can be used to calculate the roots of multiplicity $i$ of $f(x)$. The first method requires a criterion to decide when two numerically distinct roots are theoretically the same root, and the second method requires that polynomial divisions be performed. This operation can be performed in a stable manner by the method of least squares, as shown in Section 4.6.2.

It is therefore assumed that one of these methods has been used to obtain an estimate of each root of $f(x)$, and its multiplicity, and these estimates are refined using the method of non-linear least squares. The application of this method to power basis polynomials is described in [54], and its extension to Bernstein polynomials is described in the next section.

5.2 Non-linear least squares for Bernstein polynomials

It is shown in this section that if initial estimates of the roots of the Bernstein basis polynomial $p(x)$ are given, and their multiplicities are known, their refinement leads to a non-linear equation that is solved by the method of non-linear least squares. This requires an expression for each coefficient of $p(x)$ in terms of its roots.

Example 5.2.
Consider the quadratic polynomial
\[ p_1(x) = (x - \alpha)^2 = \bigl( -\alpha(1 - x) + (1 - \alpha)x \bigr)^2, \]
where the term on the right is written as the square of a linear polynomial in the Bernstein basis. The convolution of $[\,-\alpha \;\; (1 - \alpha)\,]$ with itself yields
\[ [\,-\alpha \;\; (1 - \alpha)\,] \otimes [\,-\alpha \;\; (1 - \alpha)\,] = [\,\alpha^2 \;\; -2\alpha(1 - \alpha) \;\; (1 - \alpha)^2\,], \]
which are the coefficients of $p_1(x)$ in the scaled Bernstein basis,
\[ p_1(x) = \alpha^2 (1 - x)^2 + \bigl( -2\alpha(1 - \alpha) \bigr) x(1 - x) + (1 - \alpha)^2 x^2. \]
The $i$th coefficient of the Bernstein basis form of $p_1(x)$ is recovered by dividing the $i$th coefficient of its scaled Bernstein basis form by $\binom{2}{i}$, $i = 0, 1, 2$,
\[ [\,\alpha^2 \;\; -2\alpha(1 - \alpha) \;\; (1 - \alpha)^2\,] \;\rightarrow\; \left[\, \frac{\alpha^2}{\binom{2}{0}} \;\; \frac{-2\alpha(1 - \alpha)}{\binom{2}{1}} \;\; \frac{(1 - \alpha)^2}{\binom{2}{2}} \,\right] = [\,\alpha^2 \;\; -\alpha(1 - \alpha) \;\; (1 - \alpha)^2\,]. \]

Example 5.3. Consider the scaled Bernstein form of the cubic polynomial $p_2(x)$,
\begin{align*}
p_2(x) &= (x - \alpha_1)^2 (x - \alpha_2) \\
&= \bigl( -\alpha_1(1 - x) + (1 - \alpha_1)x \bigr)^2 \bigl( -\alpha_2(1 - x) + (1 - \alpha_2)x \bigr) \\
&= -\alpha_1^2 \alpha_2 (1 - x)^3 + \alpha_1(\alpha_1 + 2\alpha_2 - 3\alpha_1\alpha_2)(1 - x)^2 x \\
&\quad - (1 - \alpha_1)(2\alpha_1 + \alpha_2 - 3\alpha_1\alpha_2)(1 - x)x^2 + (1 - \alpha_1)^2 (1 - \alpha_2) x^3.
\end{align*}
The convolution
\[ [\,\alpha_1^2 \;\; -2\alpha_1(1 - \alpha_1) \;\; (1 - \alpha_1)^2\,] \otimes [\,-\alpha_2 \;\; (1 - \alpha_2)\,] \]
is equal to the vector of coefficients of the scaled Bernstein basis form of $p_2(x)$, and the Bernstein basis form of this polynomial is
\begin{align*}
p_2(x) &= -\alpha_1^2 \alpha_2 (1 - x)^3 + \frac{\alpha_1(\alpha_1 + 2\alpha_2 - 3\alpha_1\alpha_2)}{3}\, 3(1 - x)^2 x \\
&\quad - \frac{(1 - \alpha_1)(2\alpha_1 + \alpha_2 - 3\alpha_1\alpha_2)}{3}\, 3(1 - x)x^2 + (1 - \alpha_1)^2 (1 - \alpha_2) x^3,
\end{align*}
where the coefficients are obtained by dividing the $i$th scaled Bernstein basis coefficient by $\binom{3}{i}$, $i = 0, \ldots, 3$,
\[ \left[\, -\alpha_1^2 \alpha_2 \;\;\; \frac{\alpha_1(\alpha_1 + 2\alpha_2 - 3\alpha_1\alpha_2)}{3} \;\;\; -\frac{(1 - \alpha_1)(2\alpha_1 + \alpha_2 - 3\alpha_1\alpha_2)}{3} \;\;\; (1 - \alpha_1)^2 (1 - \alpha_2) \,\right]. \]

These two examples show that the coefficients of a Bernstein basis polynomial can be obtained by the repeated convolution of linear Bernstein basis polynomials, followed by division by the combinatorial coefficients.
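The repeated convolution in Examples 5.2 and 5.3 can be sketched as follows. This is an illustrative sketch only, not the pseudo-code of [54]: it assumes that the polynomial is monic in the power basis (a scale factor of one), and the function names are hypothetical. Exact fractions are used so that the result can be compared with the closed form expressions above.

```python
from fractions import Fraction
from math import comb

def convolve(a, b):
    """Discrete convolution of two coefficient vectors."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def bernstein_coeffs_from_roots(roots, mults):
    """Bernstein coefficients of prod_i (x - roots[i])**mults[i]."""
    scaled = [1]  # scaled Bernstein coefficients of the constant 1
    for alpha, mult in zip(roots, mults):
        for _ in range(mult):
            # x - alpha = -alpha (1 - x) + (1 - alpha) x
            scaled = convolve(scaled, [-alpha, 1 - alpha])
    m = len(scaled) - 1
    # Divide the ith scaled coefficient by the binomial coefficient C(m, i).
    return [s / comb(m, i) for i, s in enumerate(scaled)]

# Example 5.3 with alpha_1 = 1/2 (double root) and alpha_2 = 1/4 (simple root).
coeffs = bernstein_coeffs_from_roots([Fraction(1, 2), Fraction(1, 4)], [2, 1])
print([str(c) for c in coeffs])  # ['-1/16', '5/48', '-7/48', '3/16']
```

The first and last coefficients are the values of the polynomial at $x = 0$ and $x = 1$, which provides a quick check of the output.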
Pseudo-code for this repeated convolution is in [54], and it can be reproduced for Bernstein basis polynomials, with the division of each term in the final coefficient vector by a combinatorial factor, as shown in Examples 5.2 and 5.3. The Bernstein basis form of $p(x)$ that has $r$ distinct roots is
\begin{align*}
p(x) &= \sum_{i=0}^{m} c_i \binom{m}{i} (1 - x)^{m-i} x^i \tag{5.1} \\
&= k \prod_{i=1}^{r} (x - \alpha_i)^{l_i} \\
&= k \prod_{i=1}^{r} \bigl( -\alpha_i(1 - x) + (1 - \alpha_i)x \bigr)^{l_i} \\
&= k \sum_{i=0}^{m} g_i(z) \binom{m}{i} (1 - x)^{m-i} x^i, \tag{5.2}
\end{align*}
where the root $\alpha_i$ has multiplicity $l_i$,
\[ z = [\,\alpha_1 \;\cdots\; \alpha_r\,]^T, \qquad \sum_{i=1}^{r} l_i = m, \]
and
\[ k = (-1)^m \sum_{i=0}^{m} (-1)^i \binom{m}{i} c_i, \]
which is obtained by considering the coefficient of $x^m$ in the power basis form of $p(x)$. Each coefficient $g_i(z)$ in (5.2) is obtained by convolution followed by division by $\binom{m}{i}$, as shown in Examples 5.2 and 5.3.

It follows from (5.1) and (5.2) that the distinct roots $\alpha_i$ can be determined from the coefficients $c_i$ by solving the over-determined non-linear equations
\[ k g_i(\alpha_1, \ldots, \alpha_r) = c_i, \qquad i = 0, \ldots, m, \]
which is a set of $(m + 1)$ equations in $r$ unknowns. These equations can be written as $kG(z) = p$, where $G(z), p \in \mathbb{R}^{m+1}$,
\[ kG(z) = \begin{bmatrix} k g_0(\alpha_1, \ldots, \alpha_r) \\ \vdots \\ k g_m(\alpha_1, \ldots, \alpha_r) \end{bmatrix} = \begin{bmatrix} c_0 \\ \vdots \\ c_m \end{bmatrix} = p, \]
and thus the minimisation problem is
\[ \min_{z \in \mathbb{C}^r} \; \frac{1}{2} \left\| kG(z) - p \right\|^2, \]
which is identical to the minimisation problem considered in Section 4.6.3. The stationary points of $\frac{1}{2}\| kG(z) - p \|^2$ are the solutions of
\[ J(z)^T \left[ kG(z) - p \right] = 0, \tag{5.3} \]
where the Jacobian matrix $J(z)$, which is of order $(m + 1) \times r$, is
\[ J(z) = k \begin{bmatrix}
\dfrac{\partial g_0(z)}{\partial \alpha_1} & \dfrac{\partial g_0(z)}{\partial \alpha_2} & \cdots & \dfrac{\partial g_0(z)}{\partial \alpha_r} \\
\dfrac{\partial g_1(z)}{\partial \alpha_1} & \dfrac{\partial g_1(z)}{\partial \alpha_2} & \cdots & \dfrac{\partial g_1(z)}{\partial \alpha_r} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial g_m(z)}{\partial \alpha_1} & \dfrac{\partial g_m(z)}{\partial \alpha_2} & \cdots & \dfrac{\partial g_m(z)}{\partial \alpha_r}
\end{bmatrix}. \]
The Gauss-Newton iteration for the solution of (5.3) is
\[ z_{j+1} = z_j - J(z_j)^{\dagger} \left[ kG(z_j) - p \right], \qquad j = 0, 1, \ldots, \tag{5.4} \]
where the initial estimates $z_0$ of the roots are calculated by Uspensky’s method. It is clear that the iteration (5.4) is only defined if the left inverse $J(z)^{\dagger}$ of $J(z)$ exists, which requires that $J(z)$ have full column rank. It is shown in [54], for power basis polynomials, that this condition is necessarily satisfied because the roots $\alpha_i$, $i = 1, \ldots, r$, are distinct, and the proof is easily extended to Bernstein basis polynomials. This paper also contains examples of the application of this method to the computation of multiple roots of a polynomial.

5.3 Summary

This chapter has considered the implementation of Uspensky’s algorithm for the calculation of the roots of a polynomial. The method relies extensively on the GCD computations discussed in Chapter 4 because they are used to obtain initial estimates of the multiplicities of the roots. This reduces the computation of the multiple roots of a polynomial to the computation of the simple roots of several polynomials. These simple roots are then refined by the method of non-linear least squares, using the calculated multiplicities as prior information to obtain improved solutions.

Chapter 6 Minimum description length

An important part of the polynomial root solver described in Chapter 5 is the GCD calculations that are used to obtain the multiplicities of the roots, which are then used as prior knowledge in the solution of the non-linear equation that is used for the refinement of the initial estimates of the roots. The calculation of the multiplicity of each root requires that the numerical rank of a matrix be determined, which necessitates that a threshold be set, below which the singular values of the matrix can be assumed to be zero. This threshold is dependent upon the amplitude of the noise in the coefficients of the polynomial, but this may not be known, or it may only be known approximately. It is therefore desirable to estimate the rank of a noisy matrix, assuming that this noise amplitude is not known.
This chapter describes the principle of minimum description length (MDL), which is an information theoretic measure that provides an objective criterion for the selection of one hypothesis from a collection of hypotheses in order to explain or model a given set of data [14, 36, 37, 38, 41].

6.1 Minimum description length

Every set of data can be represented by a string of symbols from a finite alphabet, which is usually the binary alphabet, but ternary and higher alphabets can be used. The fundamental idea of the principle of MDL is that any regularity in a given set of data can be used to compress it, that is, the data can be described by fewer symbols than are needed to describe it literally. Grünwald [14], pages 6-7, considers three long sequences of bits that differ in their regularity:

• The first example consists of 2500 repetitions of the sequence 0001. The entire sequence, which is 10000 bits long, is therefore highly regular and it can be compressed significantly, that is, a short code is required to describe the entire sequence.

• The second example of 10000 bits consists of the outcomes of tossing a fair coin, and it is therefore totally random. The bit sequence does not contain any regularity, and it cannot therefore be compressed.

• The third example contains elements of the first and second examples because this bit stream contains about four times as many 0s as 1s, and deviations from this regularity are statistical rather than deterministic. Compression of this data is possible, but some information will be lost. The length of the code required to describe this bit stream is therefore between the lengths of the codes required for the two examples above.

The application of the principle of MDL requires that the data be coded, such that the regularity of the data can be quantified.
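The ordering of the three code lengths can be illustrated with a general purpose compressor. The sketch below is only an illustration under stated assumptions: it uses the DEFLATE compressor in Python's zlib module as a stand-in for an efficient code, it generates the second and third sequences with a pseudo-random generator and a fixed seed rather than with a real coin, and the function names are hypothetical. The compressed sizes are not MDL code lengths, but they show the same qualitative behaviour.

```python
import random
import zlib

def pack_bits(bits):
    """Pack a string of '0'/'1' characters (length divisible by 8) into bytes."""
    return int(bits, 2).to_bytes(len(bits) // 8, "big")

def compressed_sizes(n_bits=10000, seed=0):
    """Compressed size, in bytes, of three sequences of n_bits bits."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    sequences = {
        # 0001 repeated: highly regular
        "regular": "0001" * (n_bits // 4),
        # fair coin: no regularity
        "random": "".join("1" if rng.random() < 0.5 else "0" for _ in range(n_bits)),
        # about four times as many 0s as 1s: statistical regularity
        "biased": "".join("1" if rng.random() < 0.2 else "0" for _ in range(n_bits)),
    }
    return {name: len(zlib.compress(pack_bits(bits), 9))
            for name, bits in sequences.items()}

sizes = compressed_sizes()
for name in ("regular", "biased", "random"):
    print(name, sizes[name], "bytes, from", 10000 // 8, "bytes uncompressed")
```

The regular sequence compresses to a small fraction of its original size, the fair coin sequence does not compress, and the biased sequence lies between them.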
In particular, highly regular data requires a short code length (bit stream), and the code length increases as the randomness of the data increases, such that totally random data cannot be compressed. This is closely related to the complexity theory developed by Kolmogorov, which is the fundamental idea of the theory of inductive inference developed by Solomonoff. The complexity theory of Kolmogorov does not provide a practical method for performing inference because it cannot be computed, and the principle of MDL can be considered as an attempt to modify Kolmogorov’s complexity theory, such that the revised theory is amenable to practical implementation.

The principle of MDL requires that several hypotheses for a set of data be postulated, and it selects the hypothesis that compresses the data the most, that is, requires the fewest bits for its description. The compressive measure of each hypothesis is equal to the sum of the code lengths, in bits, of encoding the data using the hypothesis, and then decoding the encoded data, that is, estimating the error between the actual data and the data calculated by the model.

Example 6.1. Consider Figure 6.1, which shows a set of points through which it is required to fit a polynomial. Figure 6.1(a) is an approximation curve that is obtained with a third order polynomial. It is relatively simple, but the error between it and the data points is small. By contrast, the polynomial curve in Figure 6.1(b) is of higher order and follows the exact fluctuations in the data, rather than the general pattern that underlies it, and Figure 6.1(c) is too simple because it does not capture the regularities in the data.

Example 6.1 illustrates the general point that a very good fit is obtained if a high degree polynomial (that is, a complex model) is fitted through a set of data points, and a poor fit is achieved by the simple linear approximation of these points.
The third order polynomial achieves a compromise because its complexity is sufficient to capture the regularity in the data, and the errors are small. Furthermore, if these three curves are tested against another set of data that originates from the same source, the fit for the fifth order polynomial and linear curves will be poor. By contrast, the errors between the third order polynomial and this new data set will be small.

Figure 6.1: (a) A third order approximation polynomial, (b) a fifth order interpolation curve, and (c) a linear approximation, of a set of data points.

The complexity of a model in the principle of MDL is measured by the length, in bits, of its description. This measure of complexity and Example 6.1 suggest that the ‘best’ model among a given collection of models is the one that yields the shortest description of the model and the data:

• The length of the model: The length $L(H_i)$ of the model $H_i$ in the collection of all the models $H = \{H_1, H_2, \ldots, H_M\}$ is defined as the code length of its encoded form.

• The length of the data: The length $L(D|H_i)$ is equal to the code length of the data $D$ using the model $H_i$.

With reference to Example 6.1, $M = 3$ because there are three models:

• $H_1$: Cubic polynomial approximation

• $H_2$: Fifth order polynomial interpolation

• $H_3$: Linear polynomial approximation

and

• $L(H_1)$ is the length of encoding the number of coefficients (‘4’), and their values.

• $L(H_2)$ is the length of encoding the number of coefficients (‘6’), and their values.

• $L(H_3)$ is the length of encoding the number of coefficients (‘2’), and their values.

The length $L(D|H_i)$ of the description of the data, given the model (hypothesis) $H_i$, is a measure of the goodness-of-fit of the model $H_i$.
Thus:

• $L(D|H_1)$ is small because the error between the approximating curve and data points is small.

• $L(D|H_2)$ is zero because the curve passes through the data points.

• $L(D|H_3)$ is large because the error between the straight line and data points is large.

It is seen that $L(H_2) + L(D|H_2)$ is dominated by $L(H_2)$ (the complexity of the model), and $L(H_3) + L(D|H_3)$ is dominated by $L(D|H_3)$ (the large errors in the reconstruction of the data). The principle of MDL states that the ‘best’ hypothesis is the one for which the length $L(H_i) + L(D|H_i)$, $i = 1, 2, 3$, is a minimum, and for the data in Example 6.1,
\[ L(H_1) + L(D|H_1) < L(H_2) + L(D|H_2), \qquad L(H_1) + L(D|H_1) < L(H_3) + L(D|H_3), \]
that is, the model $H_1$ would be selected by the principle of MDL. The cubic approximation provides a compromise between the complexity of the model and the reconstruction errors of the data, and it therefore avoids the problem of overfitting, which can be a problem in regression if care is not taken.

This simple example shows that the principle of MDL is consistent with the principle of parsimony or Occam’s Razor if the parsimony of a model is interpreted as its length:

Choose the model that gives the shortest description of the data

It is clear that the principle of MDL requires that the length of description of a model be quantified, and the following examples show that this description is closely related to the Shannon entropy and the average length of a coded message.

6.2 Shannon entropy and the length of a code

This section shows that the Shannon entropy of a binary string allows the lengths (number of bits) of a model $L(H)$, and of the data given the model $L(D|H)$, to be quantified. Let
\[ x = x_1 x_2 \ldots x_N \]
be a string of symbols from a finite alphabet $X$. An efficient coding scheme for these symbols requires that symbols that occur frequently have shorter code lengths than symbols that occur rarely.
If $l_k = l(x_k)$ is the length of the code for the symbol $x_k \in X$, and $p_k = p(x_k)$ is the probability of occurrence of $x_k$, then the average length of a code is
\[ E\{L(x)\} = \sum_i l_i p_i. \]
The Shannon entropy of the code is
\[ H(p(x)) = -\sum_i p_i \log p_i, \qquad \log \equiv \log_2, \]
and a standard result in information theory states that if the code is a prefix (instantaneous) code, that is, the codeword for $x_k$ is not the prefix of the codeword for $x_l$, $k \neq l$, then the Kraft inequality
\[ \sum_{k=1}^{N} 2^{-l(x_k)} \leq 1 \tag{6.1} \]
is satisfied [30], pages 94-97. In this circumstance, the average length of an optimal prefix code satisfies
\[ H(p(x)) \leq E\{L(x)\} \leq H(p(x)) + 1, \]
from which it follows that the Shannon entropy is a lower bound for the average length of a prefix code. The minimum value of the average length of a prefix code occurs, therefore, when the length of the code for the symbol $x_k$ is
\[ l_k = -\log p_k. \tag{6.2} \]
This is an intuitively appealing result because it states that a symbol that occurs more frequently (a large value of $p_k$) has a shorter code length than a symbol that occurs less frequently (a small value of $p_k$). It is clear that $l_k$ is an integer only if the probability $p_k$ is of the form $2^{-q}$, where $q > 0$ is an integer, which cannot be guaranteed in practice. If it is required to construct a code for the symbols in $X$, then $l_k$ is the smallest integer that is equal to or larger than $-\log p_k$.

Example 6.2. The code length of a deterministic integer $j \in \{0, 1, \ldots, N - 1\}$ is approximately equal to $\log N$. This result follows from (6.2) by assuming that each of the $N$ integers has an equal probability of occurring.

Example 6.2 considers the code length of an integer that lies in a defined range, and thus an upper bound on the length can be specified. The situation is more complicated when the magnitude of the integer is not known, and this is considered in Example 6.3.

Example 6.3.
Consider the situation that occurs when it is required to transmit the binary representation of a natural number $n > 0$ of unknown magnitude. If this number is followed by other numbers whose binary forms are to be transmitted, then it is necessary to mark the junction between them. One possible way to achieve this is to precede each binary representation by its length, in bits, and transmit this number, in addition to the binary representation of the number. For example, if it is required to transmit the number $n = 101101000010110$, then the code 1111, which is the binary representation of 15, the length (number of bits) of $n$, is transmitted, and thus the actual bit stream transmitted is
\[ s = 1111101101000010110. \]
This does not solve the problem, however, because the end of the bit stream that represents $\log n$, and the beginning of the bit stream that represents $n$, must be defined. Furthermore, the string $s$ will, in practice, be preceded and followed by other bit streams, and the entire bit stream can be decoded only if (a) the codes for $n$ and $\log n$, and (b) the codes for successive integers, can be distinguished.

In order to solve this problem, Rissanen [36] proposed that the $\log n$ bits of the binary representation of $n$ be preceded by the binary representation of its length $\log n$, which requires $\log \log n$ bits. This process is repeated, so that the total code length of $n > 0$ is
\[ L^*(n) = \log^* n + \log c_0 = \log n + \log \log n + \log \log \log n + \cdots + \log c_0, \tag{6.3} \]
where the sum only involves positive terms, and $\log n$ is rounded up to the nearest integer.¹ The constant $c_0 \approx 2.865064$ is added to make sure that the Kraft inequality (6.1) is satisfied with equality,
\[ \sum_{j=1}^{\infty} 2^{-L^*(j)} = 1. \tag{6.4} \]

¹ It is necessary to round $\log n$ up to the nearest integer because, for example, 3 bits are required to represent 7 in binary and $\log 7 \approx 2.807$, and 4 bits are required to represent 8 in binary, but $\log 8 = 3$.

If $n$ can take both positive and negative values, then this result is generalised to [41],
\[ L^*(n) = \begin{cases} 1 & \text{if } n = 0, \\ \log^* |n| + \log 4c_0 & \text{otherwise}, \end{cases} \tag{6.5} \]
where $\log^* n$ is defined in (6.3), and the Kraft inequality (6.4) is replaced by
\[ \sum_{j=-\infty}^{\infty} 2^{-L^*(j)} = 1, \]
because the summation is taken over all negative and positive integers. In particular,
\begin{align*}
\sum_{j=-\infty}^{\infty} 2^{-L^*(j)} &= \sum_{j=-\infty}^{-1} 2^{-L^*(j)} + \frac{1}{2} + \sum_{j=1}^{\infty} 2^{-L^*(j)} \\
&= 2 \sum_{j=1}^{\infty} 2^{-L^*(j)} + \frac{1}{2} \\
&= 2 \sum_{j=1}^{\infty} 2^{-\log^* j - \log 4c_0} + \frac{1}{2} && \text{from (6.5)} \\
&= \frac{1}{2} \sum_{j=1}^{\infty} 2^{-\log^* j - \log c_0} + \frac{1}{2} \\
&= 1 && \text{from (6.3) and (6.4)},
\end{align*}
as required.

The coding induced by (6.3) and (6.5) is called Elias omega coding, and it can be used to code and decode the natural numbers. The following procedure is used to code a natural number using this scheme:

1. Place the bit 0 at the end of the representation.

2. IF the number to be encoded is 1, STOP. ELSE Place the binary representation of the number, as a string, at the beginning of the representation.

3. Repeat the previous step, using one digit less than the number of digits just written, as the new number to be encoded.

Table 6.1: The Elias omega code for the integers 1-17.

Integer    Elias omega code
1          0
2          10 0
3          11 0
4          10 100 0
5          10 101 0
6          10 110 0
7          10 111 0
8          11 1000 0
9          11 1001 0
10         11 1010 0
11         11 1011 0
12         11 1100 0
13         11 1101 0
14         11 1110 0
15         11 1111 0
16         10 100 10000 0
17         10 100 10001 0

Example 6.4. The binary representation of 14 is 1110, and the length of this string is 4. The output of Stage 2 is therefore 1110 0, where the space is included for ease of reading, but it is not transmitted. Since 4 digits have been added, it follows that the binary representation of $4 - 1 = 3$ must be added at the beginning of 1110 0, thus yielding the string 11 1110 0. Since 2 digits have been added, the next stage requires that the binary representation of $2 - 1 = 1$ be placed at the beginning of the string. It follows from the IF statement that the algorithm terminates, and thus the Elias omega code of 14 is 11 1110 0.

The Elias omega codes for the integers 1-17 are shown in Table 6.1, and it is seen that the code for 1 is an exception because it is the only integer whose code starts with 0. The following examples show how to decode an integer that has been coded using the Elias omega code.

Example 6.5. It is seen from Table 6.1 that the integer to be decoded is 1 if the first bit is 0, and the integer is greater than 1 if the first bit is 1.

1. Consider the integer represented by the string 1111010. Since the first bit is 1, the integer cannot be 1. The first two bits 11 are therefore considered, and since this is the binary representation of 3, the next $3 + 1 = 4$ bits, that is, 1101 are considered. This is the binary representation of 13, and since the next bit in the string is 0, the decoding procedure is terminated. It follows that the integer represented by the string 1111010 is 13.

2. Consider the integer represented by the string 10100100010. It is clear that the integer cannot be 1, and thus the first two bits 10 are considered. This is the binary representation of 2, and hence the next $2 + 1 = 3$ bits, that is, 100 are considered. This is the binary representation of 4, and thus the next $4 + 1 = 5$ bits, that is, 10001 are considered. This is the binary representation of 17, which is the desired integer because the next bit in the string is 0, which is the end of string marker. It follows that the integer represented by the string 10100100010 is 17.
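The coding procedure and the decoding steps of Examples 6.4 and 6.5 can be sketched as follows. This is a minimal sketch with hypothetical function names; it represents bit streams as strings of the characters 0 and 1 for readability, and it assumes that the stream passed to the decoder is a well-formed concatenation of codes.

```python
def elias_omega_encode(n):
    """Elias omega code of the natural number n >= 1, as a bit string."""
    if n < 1:
        raise ValueError("n must be a natural number greater than zero")
    code = "0"                   # Step 1: terminating bit
    while n > 1:                 # Step 2: stop when the number is 1
        bits = bin(n)[2:]        # binary representation without the '0b' prefix
        code = bits + code       # place it at the beginning
        n = len(bits) - 1        # Step 3: one less than the digits just written
    return code

def elias_omega_decode(stream):
    """Decode a concatenation of Elias omega codes into a list of integers."""
    values = []
    pos = 0
    while pos < len(stream):
        n = 1
        while stream[pos] == "1":      # a leading 1 starts a group of n + 1 bits
            width = n + 1
            n = int(stream[pos:pos + width], 2)
            pos += width
        pos += 1                       # skip the terminating 0
        values.append(n)
    return values

print(elias_omega_encode(14))                     # 1111100
print(elias_omega_decode("1111010"))              # [13]
print(elias_omega_decode("10100100010"))          # [17]
print(elias_omega_decode("1111010" + "0" + "10100100010"))  # [13, 1, 17]
```

The last call decodes three codes that have been concatenated without separators, which is possible because each code ends with the terminating bit.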
It is clear from this example that the termination bit 0 enables a long string that represents the concatenation of several integers to be decoded.

Equation (6.2) applies to a discrete random variable, and its extension to a continuous random variable $\theta$ is obtained by discretisation. In particular, consider the situation that occurs when $\theta$ ranges over a subset $A$ of the real line, in which case $A$ can be discretised into a finite number of intervals, which enables the result for a discrete random variable to be applied. This is considered in the next example, where the discretisation is performed in $\mathbb{R}^k$, thereby yielding a cell of dimension $k$.

Example 6.6. Consider a probability density function $p(\theta) = p(\theta_1, \ldots, \theta_k)$ of $k$ continuous random variables $\theta_1, \ldots, \theta_k$, where $\theta \in \mathbb{R}^k$. The probability that the random variable $\theta$ lies between $\theta = \Theta$ and $\theta = \Theta + \delta\Theta$ is equal to
\[ p(\Theta) \prod_{i=1}^{k} \delta\Theta_i, \]
and thus
\[ -\log \left( p(\Theta) \prod_{i=1}^{k} \delta\Theta_i \right) = -\log p(\Theta) - \sum_{i=1}^{k} \log \delta\Theta_i. \tag{6.6} \]
This equation is obtained by considering a cell of side lengths $\delta\Theta_i > 0$ and centred at $\theta = \Theta$.

The exact code for a deterministic real variable $v \in \mathbb{R}$ generally requires an infinite number of bits, which must be compared with the finite number of bits that are required for the code of an integer. Truncation is therefore necessary for the binary representation of $v$, and (6.6) allows the code length $L(v_\epsilon)$ for $v$ to precision $\epsilon > 0$ to be defined as
\[ L(v_\epsilon) = L^*([v]) - \log \epsilon, \tag{6.7} \]
where $v_\epsilon$ is the value of $v$ truncated to precision $\epsilon$, that is, $|v - v_\epsilon| < \epsilon$, $[v]$ is the integer nearest to $v$, and $L^*(j)$ is defined in (6.3) for $j > 0$, and in (6.5) for $j \in \mathbb{Z}$. The second term on the right hand side of (6.7) is the number of fractional bits of the error due to truncation.

Example 6.7. [41] Consider a discrete degradation model of exact data $f \in \mathbb{R}^N$ by an error vector $e \in \mathbb{R}^N$, such that only noisy data $d \in \mathbb{R}^N$ is available,
\[ d = f + e, \qquad e \sim N(0, s^2 I). \tag{6.8} \]
It is required to estimate the exact signal $f$, given noisy data $d$, and an algorithm that solves this problem involves the construction of a library $L = \{B_1, B_2, \ldots, B_M\}$ of bases. If it is assumed that the unknown signal $f$ can be decomposed exactly by $k < N$ elements of one basis $B_m$, then
\[ f = W_m a_m^{(k)}, \tag{6.9} \]
where $W_m \in \mathbb{R}^{N \times N}$ and $a_m^{(k)} \in \mathbb{R}^N$ is the vector of coefficients in the basis $B_m$. It is noted that the correct integer $k$ and correct basis $B_m$ are not known, and that (6.9) is a chosen model of the exact data $f$, rather than a physical model that can be used to explain it. The estimation problem has been transformed into a model selection problem, where the models are defined by the bases in the library $L$, and the number of terms in each basis, assuming additive white Gaussian noise of known variance.

Data compression requires that $k$ be as small as possible, but it is also required to minimise the distortion between the noisy and exact signals by choosing the most suitable basis, and this distortion usually decreases as $k$ increases. There is therefore a conflict between data compression and data reconstruction, and the method of MDL allows this conflict to be resolved.

6.3 An expression for the code length of a model

The examples in Section 6.2 allow the principle of MDL to be stated more clearly. Specifically, let $H = \{H_m : m = 1, 2, \ldots\}$ be a collection of models, where the integer $m$ is the index of a particular model, and let $x$ be a data vector of length $N$. It is assumed that the true model that generated the data $x$ is not known. The code length for the selection of the best model from $H$ is
\begin{align*}
L(x, \theta_m, m) &= L(m) + L(\theta_m | m) + L(x | \theta_m, m) \\
&= L(m) + \sum_{j=1}^{k_m} L(\theta_{m,j} | m) + L(x | \theta_m, m), \tag{6.10}
\end{align*}
where $\theta_m \in \mathbb{R}^{k_m}$ is the parameter vector of the model $H_m$.
It is seen that the total code length is composed of the sum of three terms:

• The code length of the index $m$

• The code length of the model $H_m$, given $m$

• The code length of the reconstructed data $x$ using the model $H_m$

The second term on the right hand side of (6.10) must be replaced by the code length of $\theta_m$, truncated to precision $\delta\theta_m$, as given in (6.7), and thus (6.10) is written as
\[ L(x, \theta_m, m) = L(m) + \sum_{i=1}^{k_m} L^*([\theta_{m,i}]) - \sum_{i=1}^{k_m} \log \delta\theta_{m,i} + L(x | \theta_m, m). \tag{6.11} \]
As each component $\delta\theta_{m,i}$ increases, corresponding to a coarser precision, the third term on the right hand side decreases, but $L(x | \theta_m, m)$ increases because the truncated parameter vector can deviate more from its optimal non-truncated value. It therefore follows that it is necessary to determine the optimal value of the precisions $\delta\theta_{m,i}$, such that the total code length is minimised. This calculation can be simplified by noting that the precision is independent of the index $m$, that is, it can be assumed that there is only one model in the set $H$, in which case (6.10) and (6.11) reduce to
\[ L(x, \theta) = L(\theta) + L(x | \theta), \tag{6.12} \]
and
\[ L(x, \theta) = \sum_{i=1}^{k} L^*([\theta_i]) - \sum_{i=1}^{k} \log \delta\theta_i + L(x | \theta), \]
respectively. The calculation of the precision $\delta\theta$ that minimises the total code length is considered in the next section.

6.3.1 The precision that minimises the total code length

Let $\theta = \alpha$ be the value of $\theta$ that minimises the code length $L(x, \theta)$, which is defined in (6.12). As shown above, this optimal value must be truncated, and thus let $\bar{\theta}$ be the truncated value, to precision $\delta\alpha$, of $\alpha$. The parameter vectors $\bar{\theta} \in \mathbb{R}^k$ and $\alpha \in \mathbb{R}^k$ are related by
\[ \bar{\theta} = \alpha + \delta\alpha, \]
and it is required to determine $\delta\alpha$ such that the code length $L(x, \bar{\theta})$ is minimised.
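Consider the Taylor expansion of $L(x, \bar{\theta})$ about $\alpha$,
\begin{align*}
L(x, \bar{\theta}) &= L(x, \alpha) + \sum_{i=1}^{k} \left. \frac{\partial L(x, \theta)}{\partial \theta_i} \right|_{\theta = \alpha} \delta\alpha_i + \frac{1}{2} \sum_{i,j=1}^{k} \left. \frac{\partial^2 L(x, \theta)}{\partial \theta_i \partial \theta_j} \right|_{\theta = \alpha} \delta\alpha_i \delta\alpha_j + O\left( \|\delta\alpha\|^3 \right) \\
&= L(x, \alpha) + \frac{1}{2} \sum_{i,j=1}^{k} \left. \frac{\partial^2 L(x, \theta)}{\partial \theta_i \partial \theta_j} \right|_{\theta = \alpha} \delta\alpha_i \delta\alpha_j + O\left( \|\delta\alpha\|^3 \right),
\end{align*}
since $\alpha$ is the optimal non-truncated value of $\theta$, and the first derivatives therefore vanish at $\theta = \alpha$. It therefore follows from (6.7) that up to and including second order terms,
\[ L(x, \bar{\theta}) = L(x, [\alpha]) + \frac{1}{2} \delta\alpha^T \Gamma \delta\alpha - \sum_{i=1}^{k} \log \delta\alpha_i, \tag{6.13} \]
where $\Gamma \in \mathbb{R}^{k \times k}$ is the matrix of the second derivatives of the code length $L(x, \theta)$ evaluated at $\theta = \alpha$, and $L(x, [\alpha])$ is the code length $L(x, \alpha)$ when $\alpha$ is evaluated to precision $\delta\alpha$. The term on the right hand side of (6.13) is a minimum when
\[ \Gamma \delta\alpha = \beta, \qquad \beta = \left[\, \frac{1}{\delta\alpha_1} \;\; \frac{1}{\delta\alpha_2} \;\; \cdots \;\; \frac{1}{\delta\alpha_{k-1}} \;\; \frac{1}{\delta\alpha_k} \,\right]^T, \tag{6.14} \]
and thus the optimal truncation parameters are $\delta\alpha = \Gamma^{-1} \beta$. Since $\delta\alpha^T \Gamma \delta\alpha = \delta\alpha^T \beta = k$, equation (6.13) therefore yields
\[ L(x, \bar{\theta})_{\min} \leq L(x, [\alpha]) + \frac{k}{2} - \sum_{i=1}^{k} \log \delta\alpha_i, \tag{6.15} \]
where an inequality has been used because the truncations $\delta\alpha_i$ that minimise the code length $L(x, \bar{\theta})$ must still be determined.

It is now shown that if the data vector $x$ is sufficiently long, that is, $N \gg 1$, then an expression for an upper bound of the minimum of $L(x, \bar{\theta})$ can be obtained. It follows from (6.12) that
\[ L(x, [\alpha]) = L([\alpha]) + L(x | [\alpha]), \]
and thus (6.15) becomes
\[ L(x, \bar{\theta})_{\min} \leq L([\alpha]) + L(x | [\alpha]) + \frac{k}{2} - \sum_{i=1}^{k} \log \delta\alpha_i. \tag{6.16} \]
This expression for the upper bound of the code length contains two parameters, $\alpha$ and $\delta\alpha$, whose values must be determined. Consider initially the value of $\alpha$, after which the form of $\delta\alpha$ for large values of $N$ will be determined.

The value of $\alpha$ must be calculated from the data $x$, and it is usual to select the maximum likelihood (ML) estimate of $\theta$, or an estimate based on a Bayesian procedure. The ML estimate is used in this work, and thus $\alpha = \hat{\theta}$. The second term on the right hand side of (6.16) can be approximated by a simpler expression.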
In particular, this term is the code length of the reconstruction of the data $x$ from $[\hat{\theta}]$, the truncated form of the ML estimate of the parameter vector. In practice, however, it is rarely required to obtain $[\hat{\theta}]$, and its non-truncated form, up to machine precision, is adequate because the likelihood surface is usually smooth. The second term on the right of (6.16) is therefore written as
\[ -\log p(x | \hat{\theta}) - N \log \delta_d, \qquad \delta_d \approx 10^{-16}, \]
where $\delta_d$ is the machine precision. The term $N \log \delta_d$ is constant for all models in $H$ and it can therefore be omitted from the expression for the upper bound of the minimum of $L(x, \bar{\theta})$, and thus (6.16) can be written as
\[ L(x, \bar{\theta})_{\min} \leq L([\hat{\theta}]) + L(x | \hat{\theta}) + \frac{k}{2} - \sum_{i=1}^{k} \log \delta\hat{\theta}_i. \tag{6.17} \]
Let $q(\theta)$ be the prior probability density function of the parameter vector $\theta$. Typically, it is obtained from training data, and thus it is independent of the data $x$. It therefore follows that (6.17) becomes
\begin{align*}
L(x, \bar{\theta})_{\min} &\leq -\log q([\hat{\theta}]) - \log p(x | \hat{\theta}) + \frac{k}{2} - \sum_{i=1}^{k} \log \delta\hat{\theta}_i \\
&= -\log \left( p(x | \hat{\theta})\, q([\hat{\theta}]) \right) + \frac{k}{2} - \sum_{i=1}^{k} \log \delta\hat{\theta}_i \\
&= -\log \left( p(x | \hat{\theta})\, q(\hat{\theta}) \right) + \frac{k}{2} - \sum_{i=1}^{k} \log \delta\hat{\theta}_i,
\end{align*}
because, as noted above, the non-truncated value of $\hat{\theta}$ is usually adequate. Since
\[ L(x, \hat{\theta}) = L(\hat{\theta}) + L(x | \hat{\theta}) = -\log \left( p(x | \hat{\theta})\, q(\hat{\theta}) \right), \]
it follows that
\[ \Gamma_{ij} = \left. \frac{\partial^2 L(x, \theta)}{\partial \theta_i \partial \theta_j} \right|_{\theta = \hat{\theta}} = - \left. \frac{\partial^2 \log \left( p(x | \theta) q(\theta) \right)}{\partial \theta_i \partial \theta_j} \right|_{\theta = \hat{\theta}}, \tag{6.18} \]
where $\Gamma$ is defined in (6.13). Since
\[ p(x, \theta) = p(x | \theta) q(\theta), \tag{6.19} \]
where $p(x, \theta)$ denotes the joint probability density function of $\theta$ and $x$, it follows that if the reconstruction errors of the data are independent, then
\[ p(x | \theta) = \prod_{i=1}^{N} p(x_i | \theta), \tag{6.20} \]
and thus
\[ -\log p(x, \theta) = -\sum_{i=1}^{N} \log p(x_i | \theta) - \log q(\theta). \]
Differentiating the logarithm of both sides of (6.19) twice with respect to $\theta$, evaluating the derivatives at $\theta = \hat{\theta}$, and dividing by $N$ yields
\[ -\frac{1}{N} \left. \frac{\partial^2 \log p(x, \theta)}{\partial \theta_i \partial \theta_j} \right|_{\theta = \hat{\theta}} = -\frac{1}{N} \left. \frac{\partial^2 \log p(x | \theta)}{\partial \theta_i \partial \theta_j} \right|_{\theta = \hat{\theta}} - \frac{1}{N} \left. \frac{\partial^2 \log q(\theta)}{\partial \theta_i \partial \theta_j} \right|_{\theta = \hat{\theta}}. \]
The second term on the right decreases to zero as $N \rightarrow \infty$ because $q(\theta)$ is independent of $N$, and thus if $N$ is sufficiently large,
\[ \left. \frac{\partial^2 \log p(x, \theta)}{\partial \theta_i \partial \theta_j} \right|_{\theta = \hat{\theta}} \approx \left. \frac{\partial^2 \log p(x | \theta)}{\partial \theta_i \partial \theta_j} \right|_{\theta = \hat{\theta}}. \]
It follows from (6.20) that
\[ -\log p(x | \theta) = -\sum_{i=1}^{N} \log p(x_i | \theta) \approx -N \left( E\{\log p(x | \theta)\} \right), \tag{6.21} \]
where the expectation is taken with respect to $x$, and thus
\[ - \left. \frac{\partial^2 \log p(x | \theta)}{\partial \theta_i \partial \theta_j} \right|_{\theta = \hat{\theta}} \approx -N \left. \frac{\partial^2 \left( E\{\log p(x | \theta)\} \right)}{\partial \theta_i \partial \theta_j} \right|_{\theta = \hat{\theta}} = \mu_{ij} N, \]
where $\mu_{ij}$ is a finite constant that is independent of $N$. Equations (6.18) and (6.21), and this approximation, therefore yield
\[ \Gamma_{ij} = \left. \frac{\partial^2 L(x, \theta)}{\partial \theta_i \partial \theta_j} \right|_{\theta = \hat{\theta}} = - \left. \frac{\partial^2 \log p(x, \theta)}{\partial \theta_i \partial \theta_j} \right|_{\theta = \hat{\theta}} \approx - \left. \frac{\partial^2 \log p(x | \theta)}{\partial \theta_i \partial \theta_j} \right|_{\theta = \hat{\theta}} \approx \mu_{ij} N, \]
for $i, j = 1, \ldots, k$, which implies that the elements of the matrix $\Gamma / N$ are independent of $N$. This result is used in (6.14) to obtain an approximate expression for the elements of the vector $\delta\hat{\theta}$. In particular, this equation can be written as
\[ \sum_{j=1}^{k} \frac{\Gamma_{ij}}{N} \delta\hat{\theta}_j = \frac{1}{N \delta\hat{\theta}_i}, \qquad i = 1, \ldots, k, \]
which is approximated by
\[ N \sum_{j=1}^{k} \mu_{ij} \delta\hat{\theta}_j = \frac{1}{\delta\hat{\theta}_i}, \qquad i = 1, \ldots, k. \]
Since the constants $\mu_{ij}$ and $k$ are independent of $N$, it follows that
\[ \delta\hat{\theta}_i \approx \frac{c_i}{\sqrt{N}}, \qquad c_i = c_i(\mu_{ij}, k), \qquad i, j = 1, \ldots, k, \]
where $c_i \ll \sqrt{N}$ because $\delta\hat{\theta}_i \ll 1$. Equation (6.17) therefore becomes
\[ L(x, \bar{\theta})_{\min} \leq L([\hat{\theta}]) + L(x | \hat{\theta}) + \frac{k}{2} + \frac{k}{2} \log N - \sum_{i=1}^{k} \log c_i, \]
and since it is assumed that $N$ is large, this inequality is simplified to
\[ L(x, \bar{\theta})_{\min} \leq L([\hat{\theta}]) + L(x | \hat{\theta}) + \frac{k}{2} \log N, \tag{6.22} \]
which is an expression for the upper bound of the shortest code length with which long data sequences can be encoded. The last term on the right states that each parameter is encoded to precision $1/\sqrt{N}$, and it is noted that this result is in agreement with the distribution of the sample mean from a population that has a normal distribution,
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}, \]
that is, the standard deviation of the sample mean $\sigma_{\bar{x}}$ is equal to the population standard deviation $\sigma$ scaled by $1/\sqrt{N}$, where $N$ is the size of the sample.

The derivation of (6.22) assumes that the collection $H$ contains only one model, and it is therefore necessary to extend this expression when there are several models. It follows from (6.11) and (6.22) that the expression for the minimum code length is
\[ L(x, \hat{\theta}_m, m) = L(m) + \sum_{j=1}^{k_m} L([\hat{\theta}_{m,j}]) + L(x | \hat{\theta}_m, m) + \frac{k_m}{2} \log N, \tag{6.23} \]
where the model $H_m$ has $k_m$ parameters. The minimum value of this expression yields the best compromise between the low complexity of the model and the high likelihood of the data. In the code length $L(m) = -\log p(m)$, $p(m)$ is the probability of selecting the $m$th model, and it should reflect prior information about the models, such that models that are more likely to describe the data should be assigned a higher value of $p(m)$ than other models. If this prior information does not exist, then it is assumed that all models are equally likely, and the uniform distribution should be used.

The following two points are noted:

• Even if the collection $H$ of models does not include the correct model, the principle of MDL achieves the best result among the available models.

• It is not claimed that the principle of MDL computes the absolute minimum description of the data. Rather, it requires an available collection of models and provides a criterion for selecting the best model from the collection. This must be compared with the Kolmogorov complexity, which provides the true minimum description of the data, but it cannot be computed.

6.4 Examples

This section contains several examples that show the practical application of the principle of MDL for the solution of some common problems in applied mathematics.

Example 6.8.
[41] Consider $N$ data points $(x_i, y_i) \in \mathbb{R}^2$ through which it is required to fit a polynomial. It is clear that the maximum degree of an approximating polynomial is $N - 1$, and thus the class of models is the set of polynomials of degrees $\{0, 1, \ldots, N - 1\}$. The parameters of the $m$th model, $m = 0, \ldots, N - 1$, are the coefficients of a polynomial of degree $m$,

$$ \theta_m = [a_0, a_1, \ldots, a_m]. $$

It is assumed that the data is corrupted by Gaussian white noise of zero mean and variance $s^2$, and it is required to compute the polynomial $f$, where

$$ y_i = f(x_i) + e_i, \qquad e_i \sim N(0, s^2). $$

The discussion above shows that it is first necessary to compute the code length of the $m$th model, that is, the code length of the $m + 1$ coefficients of a polynomial of degree $m$. The ML estimates of these coefficients are

$$ \hat a = [\hat a_0, \hat a_1, \ldots, \hat a_m], $$

and since the noise is composed of independent Gaussian random samples, the ML estimates of these coefficients are equal to the least squares estimates. It is assumed that the independent variables $x_i$, $i = 1, \ldots, N$, and the noise variance $s^2$ are known to the encoder and decoder, and they need not therefore be transmitted.

If a polynomial of degree $N - 1$ were used to interpolate the data, then the fit would be perfect (no reconstruction errors) but information would not have been gained because compression has not occurred. The other extreme occurs when the approximating polynomial is the constant polynomial. Although this is the simplest model and it permits maximum compression, a large number of bits are required to quantify the reconstruction errors, unless the underlying data is constant, in which case the reconstruction errors are zero.

Consider the situation that occurs when prior information on the degree $m$ of the approximating polynomial is not available, in which case

$$ L(m) = -\log p(m) = \log N. $$

The error in the $i$th data point, $i = 1, \ldots, N$, is

$$ e_i = y_i - \sum_{j=0}^{m} \hat a_j x_i^j, $$

and thus its probability density function is

$$ \frac{1}{\sqrt{2\pi s^2}} \exp\left( -\frac{e_i^2}{2s^2} \right) = \frac{1}{\sqrt{2\pi s^2}} \exp\left( -\frac{\left( y_i - \sum_{j=0}^{m} \hat a_j x_i^j \right)^2}{2s^2} \right). $$

Since there are $N$ data points and the errors are assumed to be independent, the joint probability density function $p(e)$ of these errors is

$$ p(e) = \frac{1}{(2\pi s^2)^{N/2}} \prod_{i=1}^{N} \exp\left( -\frac{e_i^2}{2s^2} \right) = \frac{1}{(2\pi s^2)^{N/2}} \exp\left( -\frac{\sum_{i=1}^{N} \left( y_i - \sum_{j=0}^{m} \hat a_j x_i^j \right)^2}{2s^2} \right), $$

and thus the code length of the reconstructed data is $-\log p(e)$,

$$ -\log p(e) = \frac{N}{2} \log(2\pi s^2) + \frac{\log e}{2s^2} \sum_{i=1}^{N} \left( y_i - \sum_{j=0}^{m} \hat a_j x_i^j \right)^2. $$

It follows that the expression (6.23) for the total code length is

$$ L(y, \theta_m, m) = \log N + \sum_{j=0}^{m} L^*([\hat a_j]) + \frac{(m + 1)}{2} \log N + \frac{N}{2} \log(2\pi s^2) + \frac{\log e}{2s^2} \sum_{i=1}^{N} \left( y_i - \sum_{j=0}^{m} \hat a_j x_i^j \right)^2, $$

and the principle of MDL requires that the 'best polynomial' is obtained by choosing the degree $m^*$ that minimises this expression. Since $N$ is constant and it is assumed that the variance $s^2$ is known to the encoder and decoder, the first and fourth terms are constant, and they can therefore be neglected in the minimisation.

The next example extends Example 6.7 by considering the formulae for the code lengths of the index $m$, the model $H_m$ given $m$, and the reconstruction of the data $x$ using the model $H_m$.

Example 6.9. Consider an encoder and decoder for the data in Example 6.7, in which the library L consists of $M$ orthogonal bases. Given the integers $k$ and $m$ in (6.9):

1. The encoder expands the data $d$ in the basis $B_m$.

2. The number of terms $k$, the specification of the basis indexed by $m$, the $k$ expansion coefficients, the variance $s^2$ of the Gaussian noise, and the reconstruction errors are transmitted to the decoder.

3. The decoder receives this information, in bits, and attempts to reconstruct the data $d$.

The total code length to be minimised is expressed as the sum of the following code lengths:

• The natural numbers $k$ and $m$

• The $(k + 1)$ real parameters $a_m^{(k)}$ and $s^2$, given $k$ and $m$

• The deviations of the observed data $d$ from the estimated signal $f = W_m a_m^{(k)}$, given $k$, $m$, $a_m^{(k)}$ and $s^2$.

The approximate total code length is therefore given by

$$ L(d, \hat a_m^{(k)}, \hat s^2, k, m) = L(k, m) + L\left( \hat a_m^{(k)}, \hat s^2 \,|\, k, m \right) + L(d \,|\, \hat a_m^{(k)}, \hat s^2, k, m), \tag{6.24} $$

where $\hat a_m^{(k)}$ and $\hat s^2$, the ML estimates of $a_m^{(k)}$ and $s^2$ respectively, are now derived.

Since it is assumed that the noise is white and Gaussian, it follows from (6.8) that the probability density function of the data, given all the model parameters, is

$$ p(d \,|\, a_m^{(k)}, s^2, k, m) = \frac{1}{(2\pi s^2)^{N/2}} \exp\left( -\frac{\left\| d - W_m a_m^{(k)} \right\|^2}{2s^2} \right), $$

and thus the loglikelihood of this density function is

$$ \ln p(d \,|\, a_m^{(k)}, s^2, k, m) = -\frac{N}{2} \ln 2\pi s^2 - \frac{\left\| d - W_m a_m^{(k)} \right\|^2}{2s^2}. \tag{6.25} $$

Differentiating both sides of this expression with respect to $s^2$ and setting the derivative equal to zero yields the ML estimate $\hat s^2$ of $s^2$,

$$ \hat s^2 = \frac{\left\| d - W_m a_m^{(k)} \right\|^2}{N}, \tag{6.26} $$

and thus the loglikelihood expression (6.25) becomes

$$ \ln p(d \,|\, a_m^{(k)}, \hat s^2, k, m) = -\frac{N}{2} \ln\left( \frac{2\pi \left\| d - W_m a_m^{(k)} \right\|^2}{N} \right) - \frac{N}{2}. \tag{6.27} $$

Since $W_m$ is orthogonal, the vector of expansion coefficients of $d$ in the basis $B_m$ is equal to $\tilde d_m = W_m^T d$, and thus

$$ \left\| d - W_m a_m^{(k)} \right\|^2 = \left\| W_m \left( W_m^T d - a_m^{(k)} \right) \right\|^2 = \left\| \tilde d_m - a_m^{(k)} \right\|^2. $$

It follows from this equation that the loglikelihood expression (6.27) is maximised when $\| \tilde d_m - a_m^{(k)} \|^2$ is minimised, and since $a_m^{(k)}$ contains exactly $k$ non-zero elements, this minimum occurs when these $k$ elements are equal to the $k$ largest coefficients in magnitude of $\tilde d_m$. The ML estimate $\hat a_m^{(k)}$ is therefore given by

$$ \hat a_m^{(k)} = \Theta^{(k)} \tilde d_m = \Theta^{(k)} W_m^T d, $$

where $\Theta^{(k)} \in \mathbb{R}^{N \times N}$ is a threshold matrix that retains the $k$ largest elements in magnitude of $\tilde d_m$, and sets the other elements equal to zero.
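The thresholding estimate above can be illustrated with a small numerical sketch. The basis, the data and the function names below are hypothetical and are chosen only for illustration: a normalised 4 x 4 Hadamard matrix stands in for one orthogonal basis $B_m$ of the library. Under these assumptions, the code forms $\tilde d_m = W_m^T d$, retains the $k$ largest coefficients in magnitude, and evaluates the variance estimate from (6.26).

```python
def expand(W, d):
    """Return the expansion coefficients W^T d, with W stored as a list of rows."""
    n = len(d)
    return [sum(W[i][j] * d[i] for i in range(n)) for j in range(n)]


def keep_k_largest(coeffs, k):
    """Action of the threshold matrix: keep the k largest |coefficients|, zero the rest."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:k])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


def ml_estimates(W, d, k):
    """ML estimates of the k-term coefficient vector and of the noise variance."""
    d_tilde = expand(W, d)
    a_hat = keep_k_largest(d_tilde, k)
    # W is orthogonal, so ||d - W a||^2 = ||d_tilde - a||^2.
    residual = sum((u - v) ** 2 for u, v in zip(d_tilde, a_hat))
    return a_hat, residual / len(d)


# Assumed example: normalised 4 x 4 Hadamard matrix (orthogonal and symmetric).
W = [[0.5, 0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5, -0.5],
     [0.5, 0.5, -0.5, -0.5],
     [0.5, -0.5, -0.5, 0.5]]
d = [4.1, 1.9, 4.0, 2.0]  # hypothetical data vector

a_hat, s2_hat = ml_estimates(W, d, k=2)
print(a_hat, s2_hat)
```

For this data the two retained coefficients are approximately 6.0 and 2.1, and the energy of the discarded coefficients divided by $N = 4$ gives the variance estimate. The residual is computed in the coefficient domain, which relies on the orthogonality of $W_m$.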
The substitution of this expression for the ML estimate of $a_m^{(k)}$ into (6.26) yields

$$ \hat s^2 = \frac{\left\| (I - \Theta^{(k)}) W_m^T d \right\|^2}{N}, \tag{6.28} $$

for the ML estimate of $s^2$, where $I$ is the identity matrix of order $N$.

The expressions for the ML estimates $\hat a_m^{(k)}$ and $\hat s^2$ enable (6.24) to be considered in more detail. It is assumed that prior information on the value of $m$ is not available, and thus $L(m) = \log M$, which is a constant and can therefore be neglected. The integer $k < N$, the number of non-zero coefficients, must be transmitted, and this requires a maximum of about $k \log N$ bits, and thus

$$ L(k, m) = L(k) = k \log N. $$

The second term in (6.24) represents the code lengths of transmitting the ML estimates $\hat a_m^{(k)}$ and $\hat s^2$, and these lengths are

$$ L(\hat a_m^{(k)}, \hat s^2 \,|\, k, m) = L^*([\hat a_m^{(k)}]) + L^*([\hat s^2]) + \frac{(k + 1)}{2} \log N. $$

The third term in (6.24) is calculated from (6.27), but using logarithms to base 2 instead of natural logarithms. In particular, it is easily verified that, using (6.28),

$$ \begin{aligned} -\log p(d \,|\, a_m^{(k)}, s^2, k, m) &= \frac{N}{2} \log(2\pi s^2) + \log e\, \frac{\left\| d - W_m a_m^{(k)} \right\|^2}{2s^2} \\ &= \frac{N}{2} \log \frac{2\pi e}{N} + \frac{N}{2} \log \left\| (I - \Theta^{(k)}) W_m^T d \right\|^2, \end{aligned} $$

where the first term can be ignored because it is independent of $k$ and $m$, and similarly, the noise $s^2$ is independent of $k$ and $m$. It therefore follows that the total code length to be minimised is, ignoring all constant terms and assuming that prior information on $m$ is not available,

$$ \begin{aligned} L(d, \hat a_m^{(k)}, \hat s^2, k, m) &= k \log N + L^*([\hat a_m^{(k)}], [\hat s^2] \,|\, k, m) + L(d \,|\, \hat a_m^{(k)}, \hat s^2, k, m) \\ &= \sum_{i=1}^{k} \log^* \left[ \hat a_{m,i}^{(k)} \right] + \log^* [\hat s^2] + \frac{3k}{2} \log N + \frac{N}{2} \log \left\| (I - \Theta^{(k)}) W_m^T d \right\|^2, \end{aligned} \tag{6.29} $$

where $\log^* j$, $j \in \mathbb{Z}$, is defined in (6.3) and (6.5).

The expression (6.29) is minimised over all values of $k$ and $m$ in the ranges

$$ 1 \le k < N \qquad \text{and} \qquad 1 \le m \le M, $$

respectively. If prior information on the values of $k$ and $m$ is available, then it can be included in (6.29).
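The minimisation of (6.29) over $k$ for a single basis can be sketched as follows. This is a minimal illustration under stated assumptions and not a definitive implementation: the expansion coefficients $\tilde d_m = W_m^T d$ are assumed to have been computed already, the $\log^*$ code lengths of the truncated estimates are omitted as a simplification, and the function name and the data are hypothetical. Only the terms $\frac{3k}{2} \log N + \frac{N}{2} \log \| (I - \Theta^{(k)}) W_m^T d \|^2$ are therefore evaluated.

```python
import math


def mdl_number_of_terms(d_tilde):
    """Return (best_k, costs) for 1 <= k < N, using a simplified form of (6.29).

    Assumption: the log* code lengths of the truncated estimates are omitted,
    so cost(k) = (3k/2) log2 N + (N/2) log2 (energy of the discarded coefficients).
    """
    n = len(d_tilde)
    energies = sorted((c * c for c in d_tilde), reverse=True)
    costs = {}
    for k in range(1, n):
        discarded = sum(energies[k:])  # energy of the n - k discarded coefficients
        if discarded <= 0.0:
            continue                   # exact fit: the logarithm is undefined
        costs[k] = 1.5 * k * math.log2(n) + 0.5 * n * math.log2(discarded)
    best_k = min(costs, key=costs.get)
    return best_k, costs


# Hypothetical coefficients: two dominant terms and fourteen small ones.
d_tilde = [10.0, -8.0, 0.3, -0.2, 0.1, 0.25, -0.15, 0.05,
           0.2, -0.1, 0.3, -0.25, 0.1, 0.15, -0.05, 0.2]

best_k, costs = mdl_number_of_terms(d_tilde)
print(best_k)
```

With these coefficients, each extra term costs $\frac{3}{2} \log_2 16 = 6$ bits, which is repaid only by the two dominant coefficients, so the criterion selects $k = 2$. Repeating the calculation for each basis $m = 1, \ldots, M$ and taking the smallest cost gives the minimisation over both $k$ and $m$.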
As an example of the inclusion of prior information, if it is known that the number of terms $k$ in the basis functions satisfies $k_1 \le k \le k_2$ and the uniform distribution is assumed in this range of $k$, then

$$ L(k, m) = \begin{cases} L(m) + \log(k_2 - k_1 + 1) & \text{if } k_1 \le k \le k_2, \\ +\infty & \text{otherwise.} \end{cases} $$

Example 6.10. Zarowski [51] considers the application of the principle of MDL to the estimation of the rank of a noisy matrix when the noise is not known.

Consider a matrix $A \in \mathbb{R}^{m \times n}$ of rank $r \le p = \min(m, n)$ whose singular values are

$$ \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0, \qquad \sigma_j = 0, \quad j = r + 1, \ldots, p. $$

In many practical problems, the singular values are known approximately and not exactly, in which case only estimates $\hat\sigma_i$ of the exact singular values $\sigma_i$ are available,

$$ \hat\sigma_i = \begin{cases} \sigma_i + e_i, & i = 1, \ldots, r, \\ e_i, & i = r + 1, \ldots, p. \end{cases} \tag{6.30} $$

It is assumed that the errors $e_i$ are statistically independent random variables with Gaussian and Laplacian probability density functions,

$$ p(e_i) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\, s} \exp\left( -\dfrac{e_i^2}{2s^2} \right), & i = 1, \ldots, r, \\ \alpha \exp(-\alpha e_i), & i = r + 1, \ldots, p, \end{cases} \tag{6.31} $$

where $s, \alpha > 0$. This simple model is used because it enables considerable analytical progress to be made, and in particular, it provides a trade-off between a physically accurate model and a mathematically simple model.

It is assumed that a polynomial provides a good model for the variation of the exact non-zero singular values $\sigma_j$, $j = 1, \ldots, r$, with $j$,

$$ \begin{aligned} \sigma_1 &= a_0 1^0 + a_1 1^1 + \cdots + a_k 1^k \\ \sigma_2 &= a_0 2^0 + a_1 2^1 + \cdots + a_k 2^k \\ &\;\;\vdots \\ \sigma_r &= a_0 r^0 + a_1 r^1 + \cdots + a_k r^k, \end{aligned} $$

that is,

$$ \sigma_j = \sum_{l=0}^{k} a_l j^l = b(j)^T a, \qquad j = 1, \ldots, r, \tag{6.32} $$

where the vectors $a$ and $b(j)$ are, respectively,

$$ a = \begin{bmatrix} a_0 & a_1 & \cdots & a_k \end{bmatrix}^T \in \mathbb{R}^{k+1} \qquad \text{and} \qquad b(j) = \begin{bmatrix} j^0 & j^1 & \cdots & j^k \end{bmatrix}^T \in \mathbb{R}^{k+1}. $$

The integer $k$ is the degree of the polynomial model of the singular values, which are arranged in non-increasing order. It follows that the interpolating polynomial cannot have maxima or minima, and thus a low degree polynomial, $k = 2$ or $k = 3$, is adequate.
The integer $k$ is therefore assumed to be a known constant.

The $r$ equations in (6.32) can be combined into one equation

$$ \begin{bmatrix} \sigma_1 \\ \sigma_2 \\ \vdots \\ \sigma_r \end{bmatrix} = \begin{bmatrix} b(1)^T \\ b(2)^T \\ \vdots \\ b(r)^T \end{bmatrix} a, $$

and since the matrix in this equation is of order $r \times (k + 1)$, it follows that the least squares solution of this equation is unique if $r \ge (k + 1)$. Furthermore, since $n \ge r$, it follows that $r$ satisfies the inequalities $(k + 1) \le r \le n$.

It follows from (6.30) and (6.31) that the joint probability density function of the random variables $e_i$ is

$$ \frac{\alpha^{p-r}}{(2\pi s^2)^{r/2}} \exp\left( -\frac{1}{2s^2} \sum_{i=1}^{r} e_i^2 - \alpha \sum_{i=r+1}^{p} \hat\sigma_i \right), $$

and the substitution of (6.30) and (6.32) into this expression yields the probability density function for the estimates $\hat\sigma_j$ of the exact singular values $\sigma_j$,

$$ \begin{aligned} p_{\hat\sigma} &= \frac{\alpha^{p-r}}{(2\pi s^2)^{r/2}} \exp\left( -\frac{1}{2s^2} \sum_{j=1}^{r} \left( \hat\sigma_j - \sum_{l=0}^{k} a_l j^l \right)^2 - \alpha \sum_{i=r+1}^{p} \hat\sigma_i \right) \\ &= \frac{\alpha^{p-r}}{(2\pi s^2)^{r/2}} \exp\left( -\frac{1}{2s^2} \sum_{j=1}^{r} \left( \hat\sigma_j - b(j)^T a \right)^2 - \alpha \sum_{i=r+1}^{p} \hat\sigma_i \right), \end{aligned} \tag{6.33} $$

where $p_{\hat\sigma} = p_{\hat\sigma}(\hat\sigma \,|\, a, s^2, \alpha, k, r)$. The ML estimate of $\alpha$ is obtained by setting the partial derivative of $p_{\hat\sigma}$ with respect to $\alpha$ equal to zero, which yields

$$ \hat\alpha = \frac{p - r}{\sum_{j=r+1}^{p} \hat\sigma_j}. \tag{6.34} $$

Similarly, the ML estimate of the variance $s^2$ satisfies

$$ \sum_{j=1}^{r} \left( \hat\sigma_j - b(j)^T a \right)^2 = r \hat s^2, \tag{6.35} $$

and the ML estimate of the vector $a$ satisfies

$$ H \hat a = q, \tag{6.36} $$

where the coefficient matrix $H$ is Hankel, and

$$ H = \sum_{j=1}^{r} b(j) b(j)^T \in \mathbb{R}^{(k+1) \times (k+1)} \qquad \text{and} \qquad q = \sum_{j=1}^{r} \hat\sigma_j b(j) \in \mathbb{R}^{k+1}. $$

Zarowski [51] uses (6.33), (6.34), (6.35) and (6.36) to derive an expression for the total code length, which he then minimises in order to find the best estimate of the rank $r$ of $A$. This procedure yields an ill-conditioned linear algebraic equation because of the poor numerical properties of the interpolating polynomials that are stored in the vector $b(j)$.
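The ML estimates (6.34), (6.35) and (6.36) for one trial value of the rank $r$ can be sketched as follows. This is an illustration under stated assumptions and not Zarowski's implementation: the function names and the singular values are hypothetical, the system $H \hat a = q$ is solved by a small Gaussian elimination routine written for this sketch, and it is assumed that $k + 1 \le r < p$ so that both $H$ is non-singular and the sum in (6.34) is not empty.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ml_estimates(sigma_hat, r, k):
    """Return (a_hat, s2_hat, alpha_hat) for trial rank r and polynomial degree k.

    Assumes k + 1 <= r < len(sigma_hat).
    """
    p = len(sigma_hat)
    # b[j - 1] holds the powers j^0, j^1, ..., j^k of (6.32).
    b = [[float(j) ** l for l in range(k + 1)] for j in range(1, r + 1)]
    # H = sum_j b(j) b(j)^T and q = sum_j sigma_hat_j b(j), as in (6.36).
    H = [[sum(bj[u] * bj[v] for bj in b) for v in range(k + 1)] for u in range(k + 1)]
    q = [sum(sigma_hat[j] * b[j][u] for j in range(r)) for u in range(k + 1)]
    a_hat = solve(H, q)
    fit = [sum(a_hat[l] * b[j][l] for l in range(k + 1)) for j in range(r)]
    s2_hat = sum((sigma_hat[j] - fit[j]) ** 2 for j in range(r)) / r   # (6.35)
    alpha_hat = (p - r) / sum(sigma_hat[r:])                           # (6.34)
    return a_hat, s2_hat, alpha_hat


# Hypothetical estimates: four singular values on the line 10 - 2j, then two small ones.
sigma_hat = [8.0, 6.0, 4.0, 2.0, 0.03, 0.01]
a_hat, s2_hat, alpha_hat = ml_estimates(sigma_hat, r=4, k=1)
print(a_hat, s2_hat, alpha_hat)
```

For this data the fitted line is $10 - 2j$, the residual variance is zero, and $\hat\alpha = 2/0.04 = 50$. The sketch forms $H$ directly from the powers of $j$, so it is only sensible for small $k$ and $r$; for larger values it is exactly the ill-conditioned system referred to above.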
This problem is overcome by using orthogonal polynomials, which are numerically well-behaved, and Zarowski therefore uses Gram polynomials [18], [28] and [35] in order to obtain an equation that has better numerical properties. He gives examples in order to show the effectiveness of the principle of MDL.

6.5 Summary

This chapter has considered the theoretical principles of MDL for the selection of a hypothesis, from a collection of hypotheses, that best explains a given set of data. The principle of MDL, which is closely related to Occam's razor, does not find the globally best model because it makes its selection from the given hypotheses.

It was shown that it is necessary to distinguish between the code length of a deterministic and probabilistic parameter, and between an integer variable and a real variable. It was also shown that the calculation of the code length of a random variable uses the Shannon entropy, and that the code length of an integer is given by its Elias omega code. Several examples of the application of the principle of MDL were given.

Bibliography

[1] J. D. Allan and J. R. Winkler. Structure preserving methods for the computation of approximate GCDs of Bernstein polynomials. In P. Chenin, T. Lyche, and L. L. Schumaker, editors, Curve and Surface Design: Avignon 2006, pages 11–20. Nashboro Press, Tennessee, USA, 2007.

[2] J. Barlow. Error analysis and implementation aspects of deferred correction for equality constrained least squares problems. SIAM J. Numer. Anal., 25(6):1340–1358, 1988.

[3] J. Barlow and U. Vemulapati. A note on deferred correction for equality constrained least squares problems. SIAM J. Numer. Anal., 29(1):249–256, 1992.

[4] S. Barnett. Polynomials and Linear Control Systems. Marcel Dekker, New York, USA, 1983.

[5] R. M. Corless, P. M. Gianni, B. M. Trager, and S. M. Watt. The singular value decomposition for polynomial systems. In Proc. Int. Symp. Symbolic and Algebraic Computation, pages 195–207. ACM Press, New York, 1995.

[6] R. M. Corless, S. M. Watt, and L. Zhi. QR factoring to compute the GCD of univariate approximate polynomials. IEEE Trans. Signal Processing, 52(12):3394–3402, 2004.

[7] I. Emiris, A. Galligo, and H. Lombardi. Numerical univariate polynomial GCD. In J. Renegar, M. Shub, and S. Smale, editors, The Mathematics of Numerical Analysis. Volume 32 of Lecture Notes in Applied Mathematics, pages 323–343. AMS, 1996.

[8] I. Emiris, A. Galligo, and H. Lombardi. Certified approximate univariate GCDs. J. Pure and Applied Algebra, 117,118:229–251, 1997.

[9] R. T. Farouki and V. T. Rajan. On the numerical condition of polynomials in Bernstein form. Computer Aided Geometric Design, 4:191–216, 1987.

[10] L. Foster. Generalizations of Laguerre's method. SIAM J. Numer. Anal., 18:1004–1018, 1981.

[11] C. F. Gerald and P. O. Wheatley. Applied Numerical Analysis. Addison-Wesley, USA, 1994.

[12] S. Goedecker. Remark on algorithms to find roots of polynomials. SIAM J. Sci. Stat. Comput., 15:1059–1063, 1994.

[13] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, USA, 1996.

[14] P. Grünwald. A tutorial introduction to the minimum description length principle. http://www.grunwald.nl, 2005.

[15] E. Hansen, M. Patrick, and J. Rusnack. Some modifications of Laguerre's method. BIT, 17:409–417, 1977.

[16] N. J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, USA, 1996.

[17] N. J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, USA, 2002.

[18] F. B. Hildebrand. Introduction to Numerical Analysis. Tata McGraw-Hill, New Delhi, India, 1974.

[19] D. G. Hough. Explaining and Ameliorating the Ill Condition of Zeros of Polynomials. PhD thesis, Department of Computer Science, University of California, Berkeley, USA, 1977.

[20] V. Hribernig and H. J. Stetter. Detection and validation of clusters of polynomial zeros. Journal of Symbolic Computation, 24:667–681, 1997.

[21] M. A. Jenkins and J. F. Traub. A three-stage variable-shift iteration for polynomial zeros and its relation to generalized Rayleigh iteration. Numerische Mathematik, 14:252–263, 1970.

[22] M. A. Jenkins and J. F. Traub. Algorithm 419: Zeros of a complex polynomial. Comm. ACM, 15:97–99, 1972.

[23] W. Kahan. Conserving confluence curbs ill-condition. Technical report, Department of Computer Science, University of California, Berkeley, USA, 1972.

[24] W. Kahan. The improbability of probabilistic error analyses for numerical computations. http://www.cs.berkeley.edu/~wkahan/improber.ps, 1996.

[25] E. Kaltofen, Z. Yang, and L. Zhi. Structured low rank approximation of a Sylvester matrix, 2005. Preprint.

[26] N. Karmarkar and Y. N. Lakshman. Approximate polynomial greatest common divisor and nearest singular polynomials. In Proc. Int. Symp. Symbolic and Algebraic Computation, pages 35–39. ACM Press, New York, 1996.

[27] B. Li, Z. Yang, and L. Zhi. Fast low rank approximation of a Sylvester matrix by structured total least norm. J. Japan Soc. Symbolic and Algebraic Comp., 11:165–174, 2005.

[28] J. S. Lim and A. V. Oppenheim. Advanced Topics in Signal Processing. Prentice Hall, Englewood Cliffs, New Jersey, USA, 1988.

[29] C. Van Loan. On the method of weighting for equality-constrained least squares problems. SIAM J. Numer. Anal., 22(5):851–864, 1985.

[30] D. J. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge, UK, 2003.

[31] K. Madsen. A root-finding algorithm based on Newton's method. BIT, 13:71–75, 1973.

[32] D. Manocha and J. Demmel. Algorithms for intersecting parametric and algebraic curves II: Multiple intersections. Graphical Models and Image Processing, 57(2):81–100, 1995.

[33] V. Y. Pan. Solving a polynomial equation: Some history and recent progress. SIAM Review, 39(2):187–220, 1997.

[34] V. Y. Pan. Computation of approximate polynomial GCDs and an extension. Information and Computation, 167:71–85, 2001.

[35] A. Ralston. A First Course in Numerical Analysis. McGraw Hill, USA, 1965.

[36] J. Rissanen. A universal prior for integers and estimation by minimum description length. Ann. Statist., 11(2):416–431, 1983.

[37] J. Rissanen. Universal coding, information, prediction and estimation. IEEE Trans. Information Theory, 30(4):629–636, 1984.

[38] J. Rissanen. Stochastic Complexity in Statistical Inquiry. World Scientific, Singapore, 1989.

[39] J. Ben Rosen, H. Park, and J. Glick. Total least norm formulation and solution for structured problems. SIAM J. Mat. Anal. Appl., 17(1):110–128, 1996.

[40] D. Rupprecht. An algorithm for computing certified approximate GCD of n univariate polynomials. J. Pure and Applied Algebra, 139:255–284, 1999.

[41] N. Saito. Simultaneous noise suppression and signal compression using a library of orthonormal bases and the minimum description length criterion. In E. Foufoula-Georgiou and P. Kumar, editors, Wavelets in Geophysics, pages 299–324, Boston, MA, 1994. Academic Press.

[42] J. V. Uspensky. Theory of Equations. McGraw-Hill, New York, USA, 1948.

[43] J. Wilkinson. Rounding Errors In Algebraic Processes. Prentice-Hall, Englewood Cliffs, N.J., USA, 1963.

[44] J. R. Winkler. A statistical analysis of the numerical condition of multiple roots of polynomials. Computers and Mathematics with Applications, 45:9–24, 2003.

[45] J. R. Winkler. Numerical and algebraic properties of Bernstein basis resultant matrices. In T. Dokken and B. Jüttler, editors, Computational Methods for Algebraic Spline Surfaces, pages 107–118, Germany, 2005. Springer-Verlag.

[46] J. R. Winkler. High order terms for condition estimation of univariate polynomials. SIAM J. Sci. Stat. Comput., 28(4):1420–1436, 2006.

[47] J. R. Winkler and J. D. Allan. Structured low rank approximations of the Sylvester resultant matrix for approximate GCDs of Bernstein polynomials, 2006. Submitted to Computer Aided Geometric Design.

[48] J. R. Winkler and J. D. Allan. Structured total least norm and approximate GCDs of inexact polynomials, 2006. To appear in Journal of Computational and Applied Mathematics.

[49] J. R. Winkler and R. N. Goldman. The Sylvester resultant matrix for Bernstein polynomials. In T. Lyche, M. Mazure, and L. L. Schumaker, editors, Curve and Surface Design: Saint-Malo 2002, pages 407–416, Tennessee, USA, 2003. Nashboro Press.

[50] J. R. Winkler and J. Zítko. The transformation of the Sylvester matrix and the calculation of the GCD of two inexact polynomials, 2007. In preparation.

[51] C. J. Zarowski. The MDL criterion for rank determination via effective singular values. IEEE Trans. Signal Processing, 46(6):1741–1744, 1998.

[52] C. J. Zarowski, X. Ma, and F. W. Fairman. QR-factorization method for computing the greatest common divisor of polynomials with inexact coefficients. IEEE Trans. Signal Processing, 48(11):3042–3051, 2000.

[53] Z. Zeng. The approximate GCD of inexact polynomials. Part 1: A univariate algorithm, 2004. Preprint.

[54] Z. Zeng. Computing multiple roots of inexact polynomials. Mathematics of Computation, 74:869–903, 2005.

[55] L. Zhi and Z. Yang. Computing approximate GCD of univariate polynomials by structured total least norm. Technical report, Institute of Systems Science, AMSS, Academia Sinica, Beijing, China, 2004.