The Search for the Nearest Defective Matrix
Michael L. Overton
Courant Institute of Mathematical Sciences
New York University
Joint work with R. Alam, S. Bora, R. Byers
RPB Workshop
Berlin, 2006

Brent Memories
• 1975
• 1986-1987

The Nearest Defective Matrix to A
• Defective means having an eigenvalue with algebraic multiplicity > geometric multiplicity
• Equivalently, not diagonalizable
• Equivalently, with a nontrivial Jordan block
• Equivalently, with a nonlinear elementary divisor
• Equivalently, with an infinitely ill-conditioned eigenvalue
• Same as distance to nearest matrix with a multiple eigenvalue
• Same as distance to nearest matrix with a double eigenvalue
• The search for this matrix began with Wilkinson

Wilkinson, AEP, 1965
• Defined condition number of a simple eigenvalue as 1/|y*x|, where y and x are respectively normalized left and right eigenvectors
• “Even if the eigenvalues are well separated they may still be very ill-conditioned.”
• An example of such A is given
• “It might be expected that there is a matrix close to A which has some nonlinear elementary divisors and we can readily see that this is true…”

Ruhe, Numer. Math., 1970
• “It is a well known fact that a matrix which is close to one with multiple eigenvalues has an ill-conditioned eigensystem.
We show a converse, that if a matrix has an ill-conditioned eigensystem, it is also close to a matrix having multiple eigenvalues.”

Golub and Wilkinson, SIREV, 1976
• “The solution of the eigenvalue problem for a nonnormal matrix presents severe practical difficulties when A is defective or close to a defective matrix.”
• “We now show that when an eigenvalue of A is ill-conditioned, A is necessarily relatively close to a matrix with a multiple eigenvalue.”
• Improves on Ruhe’s bound and an unpublished bound of Kahan

Demmel, PhD thesis, 1983
• Defines diss_path(A) as distance from n by n matrix A to nearest matrix with multiple eigenvalues (“path” refers to the path traveled by the eigenvalues under a smoothly varying perturbation)
• Defines diss_region(A) as largest ε such that the “area swept out by the eigenvalues under perturbation”, {z: z is an eigenvalue of A+E for some E with ||E|| ≤ ε}, consists of n disjoint regions
• Notes that for all norms diss_path ≥ diss_region, since in the 2nd definition the perturbations may be different
• Observes that for some norms diss_path > diss_region
• Says it is an interesting open question as to whether diss_path = diss_region in the case of the 2 and Frobenius norms

Pseudospectra
• Pseudospectra Gateway www.comlab.ox.ac/pseudospectra/
• Spectra and Pseudospectra, Trefethen and Embree, Princeton, 2005
• In language of pseudospectra, Demmel’s question asks whether, for the 2 and F norms, the distance to the nearest defective matrix is the same as the largest ε for which the ε-pseudospectrum consists of n disjoint regions
• That is, the smallest ε for which two ε-pseudospectral components coalesce
• EigTool (T.
Wright and Trefethen, 2003) (2 and F norm)

Fundamental Thm of Pseudospectra
• The following two definitions of the 2 and F norm ε-pseudospectrum of A are equivalent:
  – {z: det(A+E - zI) = 0 for some E with ||E|| ≤ ε}
  – {z: σ_n(A - zI) ≤ ε} (where σ_n denotes smallest singular value)
• Easy to prove via SVD:
  – If z satisfies first condition, σ_n(A - zI) ≤ ε
  – If z satisfies second condition, A - σ_n(A - zI) u_n v_n* must have an eigenvalue z
• Thus EigTool plots contours of σ_n
• Extends to other norms
• Importance emphasized by Trefethen for many years
• Who first made this observation?

Wilkinson, Utilitas Math, 1984, 71 pp
• “A problem of primary interest to us is the distance, measured in the 2 norm, of our matrix A from matrices having multiple eigenvalues.”
• “We expect that when the condition number is large A will be, at least in some sense, close to a matrix having a double eigenvalue. A major objective of this paper is to quantify this statement.”
• “The nearest defective matrix will often have an eigenvalue of multiplicity 2 and the nearest matrix with an eigenvalue of larger multiplicity may be much further away.”
• Still no mention of pseudospectra

Wilkinson, Utilitas Math 30, 1986, 43 pp.
• “The domain D(ε) in the complex plane defined by ||(A - zI)^{-1}||^{-1} ≤ ε uniquely identifies all z which can be induced as eigenvalues by perturbations E with ||E|| ≤ ε.”
• “We would like to emphasize the sheer economy of this theorem.”
• May be the first explicit observation of the fundamental theorem of pseudospectra, although Varah observed the less trivial direction in 1979 (possible exception: Landau)
• “The behaviour of the domain D(ε) as ε increases from 0 is of fundamental interest to us.”
• “When A has distinct eigenvalues and ε is sufficiently small, D(ε) consists of n isolated domains” [connected components]

Wilkinson, Util Math 30, continued
• “A problem of basic interest to us is the smallest value of ε for which one of these domains [connected components] coalesces with one of the others.”
• Observes that coalescence of two components of D(ε) at z for a particular ε shows that each of two eigenvalues can be moved to the same z by a perturbation of norm ε, but this does not imply that a single perturbation exists that can induce a double eigenvalue.
• Like Demmel, gives examples where this is indeed not possible for the 1 and ∞ norms
• Observes that “it might be felt that the 2 norm is more satisfactory…” but that “computation of the smallest singular value for a range of values of z is generally prohibitive”.

The Editors, Util Math 31, 1987
• When we sent Vol 30 to press, the last paper was a 40-page study by Wilkinson on sensitivity of eigenvalues. At that time we little thought that Vol 31 would commence with an “in memoriam” notice…

Demmel, 1987
• In proceedings of a conference in memory of Wilkinson
• “No counterexample [to a conjecture that the answer to his question is yes] is yet known.”
• “A simple guaranteed way to compute the distance to the nearest defective matrix remains elusive.”

Malyshev, Numer.
Math., 1999
• Distance to nearest defective matrix in 2-norm is [formula not preserved in source]
• Inner minimization is unimodal, but outer is potentially a hard global optimization problem.
• Inspired by algorithm to compute the “real stability radius”

Edelman and Lippert, 1998-1999
• Used ideas from both pseudospectra and differential geometry
• Used these to argue that, generically, distance to nearest defective matrix in the F-norm is the height of the lowest saddle point of f(x,y) = σ_n(A - (x+iy) I)
• The implication is that the answer to Demmel’s question is yes, at least generically, but this is not addressed
• No algorithm is given to find this lowest critical point

Alam and Bora, LAA, 2005
• First proof that the answer to Demmel’s question is yes
• Furthermore, the infimum is always achieved by a defective matrix with a double eigenvalue
• Proof is not easy
• A. Lewis has now obtained a more topologically based proof
• This confirms that the nearest defective matrix is A - σ_n(A - zI) u_n v_n*, where z = x+iy and (x,y) is the lowest saddle point of f(x,y) = σ_n(A - (x+iy) I) for the 2 and F norms, as long as σ_n(A - zI) is simple

Nongeneric Case
• What if σ_{n-1}(A - zI) = σ_n(A - zI)?
• Then A - σ_{n-1}(A - zI) u_{n-1} v_{n-1}* - σ_n(A - zI) u_n v_n* is a nearest matrix with a multiple eigenvalue, in the 2 norm
• This eigenvalue is nondefective (it has algebraic multiplicity 2 and geometric multiplicity 2)
• One can also construct a nearest defective matrix, for both 2 and F norms
• Example: normal matrices: pseudospectral components are circles
• More interesting: nonnormal block diagonal matrices, when two coalescing eigenvalues come from different blocks: pseudospectral components are not circles but coalesce tangentially

Clarke Generalized Gradient
• Clarke (PhD thesis, 1973; book, 1982)
• Assume f is locally Lipschitz
• The Clarke generalized gradient of f at x ∈ R^N is the convex hull of limits of gradients of f at points converging to x,
  ∂_C f(x) = conv { lim_{r→∞} ∇f(x_r) : x_r → x, x_r ∈ Q }
  where Q = {w: f is differentiable at w}

Alam, Bora, Byers, Overton, 2006
• Theorem: in all cases, for the 2 and F norm, the distance to the nearest defective matrix is the height of the lowest generalized saddle point of f(x,y) = σ_n(A - (x+iy) I)
• That is, a point such that 0 ∈ ∂_C f(x,y), but (x,y) is not a local extremum of f
• Covers the generic case that σ_{n-1} > σ_n (smooth saddle point, σ_n has a zero gradient)
• And the nongeneric case that σ_{n-1} = σ_n (nonsmooth saddle point, σ_n is not differentiable at the saddle point and its gradient has two different limits, which point in opposite directions when the point of coalescence is approached from the two different pseudospectral components)

Algorithm
• Still no guaranteed algorithm known
• Heuristic: apply Newton’s method in 2 real variables (1 complex variable) to search for a zero of the gradient of σ_n(A - (x+iy) I)
• This breaks down in the nongeneric case

An Iteration Covering Both Cases
• Idea: apply Newton’s method to find a zero of the following function mapping C × R to C × R [function not preserved in source]
• Use known bounds of Wilkinson and Alam to provide starting points (weighted averages of eigenvalue pairs)

Convergence Rate
• Smooth saddle point: quadratic in theory and practice
• Nonsmooth saddle point:
typically also see quadratic convergence
• But the function to which we are applying Newton’s method is not smooth on C × R!
• We think we can show quadratic convergence by looking at the behaviour along lines through the nonsmooth saddle point

Transition Between Cases
• If we break the block diagonal structure by adding a small top right entry connecting the blocks, we see an abrupt transition from the nongeneric to the generic case
• Condition number of Jacobian becomes arbitrarily large, then drops as we increase the size of the perturbation
• When we increase it enough, algorithm transitions to discovering that μ = 0, indicating the saddle point is smooth

sep_λ
• Given A ∈ C^{n×n} and B ∈ C^{m×m}, find smallest max{||E||, ||F||} such that A+E and B+F share a common eigenvalue
• Equivalently, find smallest ε such that the ε-pseudospectra of A and B are no longer disjoint
• Posed by Varah, 1979, Demmel 1986
• First algorithm: Gu-Overton, 2006
• This is a nearest defective matrix problem for Diag(A,B) with the requirement that the block diagonal structure be preserved

Thanks To
• Volker Mehrmann, our host in Berlin in 2004
• Gene Golub and Nick Trefethen, from whom I learned about numerical linear algebra and pseudospectra
• The Brent family, for many special memories
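
Numerical Illustration (sketch)
The fundamental theorem of pseudospectra stated earlier can be checked numerically in a few lines. The sketch below is not taken from the talk: it assumes NumPy, and the function names sigma_min_triplet and perturbed_matrix, as well as the test matrices, are hypothetical choices made for illustration. It computes f(x,y) = σ_n(A - (x+iy) I) from an SVD and forms A - σ_n u_n v_n*, which should have z = x+iy as an eigenvalue and differ from A by a perturbation of 2-norm σ_n.

```python
import numpy as np

def sigma_min_triplet(A, z):
    """Smallest singular value of A - zI with its left/right singular vectors."""
    n = A.shape[0]
    U, s, Vh = np.linalg.svd(A - z * np.eye(n))
    # Singular values are returned in descending order, so the last is sigma_n.
    # Rows of Vh are conjugated right singular vectors, hence the .conj().
    return s[-1], U[:, -1], Vh[-1, :].conj()

def perturbed_matrix(A, z):
    """Return (A - sigma_n u_n v_n^*, sigma_n); the matrix has eigenvalue z."""
    s, u, v = sigma_min_triplet(A, z)
    return A - s * np.outer(u, v.conj()), s

if __name__ == "__main__":
    # Illustrative nonnormal matrix and target point (arbitrary choices).
    A = np.array([[1.0, 10.0], [0.0, 2.0]])
    z = 1.5 + 0.25j
    B, s = perturbed_matrix(A, z)
    print("sigma_n(A - zI)      =", s)
    print("||A - B||_2          =", np.linalg.norm(A - B, 2))
    print("eigenvalues of B     =", np.linalg.eigvals(B))

    # Normal example: for diag(0, 2), sigma_n(A - zI) is the distance from z
    # to the nearest eigenvalue, so the two discs first touch at z = 1.
    N = np.diag([0.0, 2.0])
    print("sigma_n at midpoint  =", sigma_min_triplet(N, 1.0)[0])
```

Evaluating sigma_min_triplet on a grid of z values and contouring the first return value gives the kind of picture EigTool draws. This sketch only evaluates f and the rank-one perturbation at a given z; it does not search for the lowest saddle point.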