The search for the nearest defective matrix

The Search for the Nearest Defective Matrix
Michael L. Overton
Courant Institute of Mathematical Sciences
New York University
Joint work with R. Alam, S. Bora, R. Byers
RPB Workshop
Berlin, 2006
Brent Memories
• 1975
• 1986-1987
The Nearest Defective Matrix to A
• Defective means having an eigenvalue with algebraic
multiplicity > geometric multiplicity
• Equivalently, not diagonalizable
• Equivalently, with a nontrivial Jordan block
• Equivalently, with a nonlinear elementary divisor
• Equivalently, with an infinitely ill-conditioned eigenvalue
• Same as distance to nearest matrix with a multiple
eigenvalue
• Same as distance to nearest matrix with a double
eigenvalue
• The search for this matrix began with Wilkinson
Wilkinson, AEP, 1965
• Defined condition number of a simple eigenvalue as
1/|y*x|, where y and x are respectively normalized
left and right eigenvectors
• “Even if the eigenvalues are well separated they may
still be very ill-conditioned.”
• An example of such A is given
• “It might be expected that there is a matrix close to
A which has some nonlinear elementary divisors and
we can readily see that this is true…”
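Wilkinson's condition number is easy to compute for a small example. The sketch below assumes NumPy and a diagonalizable A, so that the rows of inv(X) (X holding right eigenvectors as columns) are left eigenvectors already scaled so that y_i^* x_i = 1. The function name `eig_condition_numbers` and the 2 × 2 test matrix are illustrative choices, not taken from Wilkinson.

```python
import numpy as np


def eig_condition_numbers(A):
    """Return the eigenvalues of A and Wilkinson's condition numbers 1/|y^* x|.

    x and y are unit-norm right and left eigenvectors. Assumes A is
    diagonalizable, so row i of inv(X) is a left eigenvector y_i^* scaled
    so that y_i^* x_i = 1.
    """
    w, X = np.linalg.eig(A)
    Ystar = np.linalg.inv(X)
    # With y_i^* x_i = 1, normalizing both vectors gives 1/|y^* x| = ||x_i|| * ||y_i||.
    cond = np.linalg.norm(X, axis=0) * np.linalg.norm(Ystar, axis=1)
    return w, cond


# Well-separated eigenvalues 1 and 2, yet both are ill-conditioned.
A = np.array([[1.0, 1e4], [0.0, 2.0]])
w, cond = eig_condition_numbers(A)
print(w, cond)  # both condition numbers are about 1e4
```

For a normal matrix every condition number is 1, which is why large values signal strong nonnormality.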
Ruhe, Numer. Math., 1970
• “It is a well known fact that a matrix which is close to
one with multiple eigenvalues has an ill-conditioned
eigensystem. We show a converse, that if a matrix
has an ill-conditioned eigensystem, it is also close to
a matrix having multiple eigenvalues.”
Golub and Wilkinson, SIREV, 1976
• “The solution of the eigenvalue problem for a
nonnormal matrix presents severe practical difficulties
when A is defective or close to a defective matrix.”
• “We now show that when an eigenvalue of A is ill-conditioned, A is necessarily relatively close to a
matrix with a multiple eigenvalue.”
• Improves on Ruhe’s bound and an unpublished
bound of Kahan
Demmel, PhD thesis, 1983
• Defines diss_path(A) as distance from n by n matrix A
to nearest matrix with multiple eigenvalues (“path”
refers to the path traveled by the eigenvalues under a
smoothly varying perturbation)
• Defines diss_region(A) as largest ε such that the “area
swept out by the eigenvalues under perturbation”,
{z: z is an eigenvalue of A+E for some E with ||E|| ≤ ε},
consists of n disjoint regions
• Notes that for all norms diss_path ≥ diss_region, since
in the second definition a different perturbation may be used for each eigenvalue
• Observes that for some norms diss_path > diss_region
• Says it is an interesting open question as to whether
diss_path = diss_region in the case of the 2 and
Frobenius norms
Pseudospectra
• Pseudospectra Gateway
www.comlab.ox.ac/pseudospectra/
• Spectra and Pseudospectra, Trefethen and Embree,
Princeton, 2005
• In the language of pseudospectra, Demmel’s question asks
whether, for the 2 and F norms, the distance to the
nearest defective matrix is the same as the largest ε for
which the ε-pseudospectrum consists of n disjoint
regions
• That is, the smallest ε for which two ε-pseudospectral
components coalesce
• EigTool (T. Wright and Trefethen, 2003) (2 and F norm)
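Demmel's diss_region can be estimated by brute force: sample s_n(A − zI) on a grid, threshold at ε, count connected components, and bisect on ε. This is a minimal NumPy sketch, not EigTool's method; the function names, grid sizes, and bisection bracket are illustrative assumptions, and the answer is only as accurate as the grid.

```python
import numpy as np


def smin_on_grid(A, xs, ys):
    """s_n(A - zI) for z = x + iy on a grid; rows follow ys, columns follow xs."""
    n = A.shape[0]
    Z = xs[None, :] + 1j * ys[:, None]
    M = A[None, None, :, :] - Z[:, :, None, None] * np.eye(n)
    # Batched SVD over the grid; singular values come back in descending order.
    return np.linalg.svd(M, compute_uv=False)[..., -1]


def count_components(mask):
    """Number of 4-connected components of True cells in a 2-D boolean array."""
    rows, cols = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    count = 0
    for i in range(rows):
        for j in range(cols):
            if not mask[i, j] or seen[i, j]:
                continue
            count += 1
            seen[i, j] = True
            stack = [(i, j)]
            while stack:  # depth-first flood fill of one component
                a, b = stack.pop()
                for p, q in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if 0 <= p < rows and 0 <= q < cols and mask[p, q] and not seen[p, q]:
                        seen[p, q] = True
                        stack.append((p, q))
    return count


def coalescence_level(S, n, lo, hi, tol=1e-3):
    """Bisect for the smallest eps at which {S <= eps} has fewer than n components.

    Assumes the grid shows n components at eps = lo and fewer at eps = hi.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_components(S <= mid) < n:
            hi = mid
        else:
            lo = mid
    return hi


A = np.diag([0.0, 1.0])
xs = np.linspace(-1.0, 2.0, 121)
ys = np.linspace(-1.0, 1.0, 81)
S = smin_on_grid(A, xs, ys)
print(count_components(S <= 0.3), count_components(S <= 0.6))  # 2 1
print(coalescence_level(S, 2, 0.3, 1.0))  # about 0.5
```

For this normal example the ε-pseudospectrum is two disks of radius ε about 0 and 1, which touch at ε = 1/2. The cost grows with the grid size, which is the expense Wilkinson later calls prohibitive.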
Fundamental Thm of Pseudospectra
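• The following two definitions of the ε-pseudospectrum of A in the 2 and F norms are equivalent:
• $\{z : \det(A+E-zI) = 0 \text{ for some } E \text{ with } \|E\| \le \varepsilon\}$
• $\{z : s_n(A - zI) \le \varepsilon\}$ (where $s_n$ denotes the smallest
singular value)
• Easy to prove via the SVD:
– If z satisfies the first condition, $s_n(A - zI) \le \|E\| \le \varepsilon$
– If z satisfies the second condition, $A - s_n(A - zI)\,u_n v_n^*$ has z as
an eigenvalue, where $u_n$ and $v_n$ are left and right singular vectors of $A - zI$ for $s_n$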
• Thus EigTool plots contours of $s_n(A - zI)$
• Extends to other norms
• Importance emphasized by Trefethen for many years
• Who first made this observation?
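The second direction of the proof is constructive. A minimal NumPy sketch, with a hypothetical helper name and an arbitrary test matrix and point z, that builds E = −s_n u_n v_n^* and checks that z is an eigenvalue of A + E, that ‖E‖ = s_n in both the 2 and F norms, and that s_n(A − zI) = 1/‖(A − zI)⁻¹‖₂, the resolvent form used by Wilkinson in 1986:

```python
import numpy as np


def smallest_perturbation_to_eigenvalue(A, z):
    """Return E = -s_n u_n v_n^* and s_n, where A - zI = U diag(s) V^*.

    E is rank one with ||E||_2 = ||E||_F = s_n, and (A + E - zI) v_n = 0.
    """
    n = A.shape[0]
    U, s, Vh = np.linalg.svd(A - z * np.eye(n))
    # Vh[-1, :] is v_n^*, so np.outer(U[:, -1], Vh[-1, :]) equals u_n v_n^*.
    E = -s[-1] * np.outer(U[:, -1], Vh[-1, :])
    return E, s[-1]


A = np.array([[1.0, 10.0], [0.0, 2.0]])
z = 1.5 + 0.5j
E, sn = smallest_perturbation_to_eigenvalue(A, z)
print(sn, np.linalg.norm(E, 2))                       # equal
print(np.min(np.abs(np.linalg.eigvals(A + E) - z)))   # ~0 up to rounding
# Wilkinson's form: s_n(A - zI) = 1 / ||(A - zI)^{-1}||_2
print(1.0 / np.linalg.norm(np.linalg.inv(A - z * np.eye(2)), 2))
```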
Wilkinson, Utilitas Math, 1984, 71 pp
• “A problem of primary interest to us is the distance,
measured in the 2 norm, of our matrix A from
matrices having multiple eigenvalues.”
• “We expect that when the condition number is large
A will be, at least in some sense, close to a matrix
having a double eigenvalue. A major objective of this
paper is to quantify this statement.”
• “The nearest defective matrix will often have an
eigenvalue of multiplicity 2 and the nearest matrix
with an eigenvalue of larger multiplicity may be much
further away.”
• Still no mention of pseudospectra
Wilkinson, Utilitas Math 30, 1986, 43 pp.
• “The domain D(ε) in the complex plane defined by
$\|(A - zI)^{-1}\|^{-1} \le \varepsilon$ uniquely identifies all z which can be
induced as eigenvalues by perturbations E with
$\|E\| \le \varepsilon$.”
• “We would like to emphasize the sheer economy of
this theorem.”
• May be the first explicit observation of the
fundamental theorem of pseudospectra, although
Varah observed the less trivial direction in
1979 (possible exception: Landau)
• “The behaviour of the domain D(ε) as ε increases
from 0 is of fundamental interest to us.”
• “When A has distinct eigenvalues and ε is sufficiently
small, D(ε) consists of n isolated domains”
[connected components]
Wilkinson, Util Math 30, continued
• “A problem of basic interest to us is the smallest
value of ε for which one of these domains [connected
components] coalesces with one of the others.”
• Observes that coalescence of two components of
D(ε) at z for a particular ε shows that each of two
eigenvalues can be moved to the same z by a
perturbation of norm ε but this does not imply that a
single perturbation exists that can induce a double
eigenvalue.
• Like Demmel, gives examples where this is indeed
not possible for the 1 and ∞ norms
• Observes that “it might be felt that the 2 norm is
more satisfactory…” but that “computation of the
smallest singular value for a range of values of z is
generally prohibitive”.
The Editors, Util Math 31, 1987
• When we sent Vol 30 to press, the last paper was a
40-page study by Wilkinson on sensitivity of
eigenvalues. At that time we little thought that Vol 31
would commence with an “in memoriam” notice…
Demmel, 1987
• In proceedings of a conference in memory of
Wilkinson
• “No counterexample [to a conjecture that the answer
to his question is yes] is yet known.”
• “A simple guaranteed way to compute the distance to
the nearest defective matrix remains elusive.”
Malyshev, Numer. Math., 1999
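• Distance to nearest defective matrix in 2-norm is
$\min_{z \in \mathbb{C}} \max_{\gamma \ge 0} s_{2n-1}\left(\begin{bmatrix} A - zI & \gamma I \\ 0 & A - zI \end{bmatrix}\right)$
• Inner maximization (over γ) is unimodal, but outer minimization (over z) is
potentially a hard global optimization problem.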
• Inspired by algorithm to compute the “real stability
radius”
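Malyshev's characterization, the minimum over z ∈ ℂ of the maximum over γ ≥ 0 of s_{2n−1}([A − zI, γI; 0, A − zI]), can be checked by brute force on tiny examples. The sketch below is not Malyshev's algorithm: it samples γ on a finite grid (which can only underestimate the inner maximum) and z on a finite set (which can only overestimate the outer minimum). The helper names, grids, and test matrices are illustrative assumptions.

```python
import numpy as np


def malyshev_inner(A, z, gammas):
    """Max over sampled gammas of s_{2n-1}([[A - zI, gamma I], [0, A - zI]])."""
    n = A.shape[0]
    B = A - z * np.eye(n)
    M = np.zeros((len(gammas), 2 * n, 2 * n), dtype=complex)
    M[:, :n, :n] = B
    M[:, n:, n:] = B
    M[:, :n, n:] = gammas[:, None, None] * np.eye(n)
    s = np.linalg.svd(M, compute_uv=False)  # each row sorted in descending order
    return s[:, 2 * n - 2].max()            # s_{2n-1} in 1-based indexing


def malyshev_distance(A, zs, gammas):
    """Outer minimization over a finite set of candidate points zs."""
    return min(malyshev_inner(A, z, gammas) for z in zs)


gammas = np.linspace(0.0, 2.0, 201)
J = np.array([[0.0, 1.0], [0.0, 0.0]])
print(malyshev_inner(J, 0.0, gammas))  # ~0: J is already defective

D = np.diag([0.0, 1.0])
zs = [x + 1j * y for x in np.linspace(-0.5, 1.5, 21) for y in np.linspace(-0.5, 0.5, 11)]
print(malyshev_distance(D, zs, gammas))  # close to 0.5
```

The expensive part is the outer search over z, matching the slide's remark that it is potentially a hard global optimization problem.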
Edelman and Lippert, 1998-1999
• Used ideas from both pseudospectra and differential
geometry
• Used these to argue that, generically, distance to
nearest defective matrix in the F-norm is the height of
the lowest saddle point of f(x,y) = sn(A – (x+iy) I)
• The implication is that the answer to Demmel’s
question is yes, at least generically, but this is not
addressed
• No algorithm is given to find this lowest critical point
Alam and Bora, LAA, 2005
• First proof that the answer to Demmel’s question is
yes
• Furthermore, the infimum is always achieved by a
defective matrix with a double eigenvalue
• Proof is not easy
• A. Lewis has now obtained a more topologically based proof
• This confirms that the nearest defective matrix is
$A - s_n(A - zI)\,u_n v_n^*$, where $z = x + iy$ and $(x,y)$ is the
lowest saddle point of $f(x,y) = s_n(A - (x+iy)I)$, for
the 2 and F norms, as long as $s_n(A - zI)$ is a simple singular value
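For the illustrative matrix A = [[0, 1], [0, 1]] (not an example from the talk), f(x, y) = s_n(A − (x+iy)I) satisfies f(x, y) = f(1 − x, y) = f(x, −y), so (1/2, 0) is a critical point, and s_n(A − zI) is simple there. The NumPy sketch checks that u_n^* v_n = 0 at z = 1/2 and that A − s_n u_n v_n^* has a double eigenvalue at 1/2. Assuming (1/2, 0) is the lowest saddle point of f, the result above would make (√2 − 1)/2 ≈ 0.207 the distance to the nearest defective matrix.

```python
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 1.0]])  # eigenvalues 0 and 1
z = 0.5                                  # critical point of s_n(A - zI), by symmetry

U, s, Vh = np.linalg.svd(A - z * np.eye(2))
u, v = U[:, -1], Vh[-1, :].conj()        # left and right singular vectors for s_n
print(abs(np.vdot(u, v)))                # u_n^* v_n vanishes at a smooth critical point

B = A - s[-1] * np.outer(u, v.conj())    # A - s_n u_n v_n^*
print(s[-1], np.linalg.norm(A - B, 2))   # both equal (sqrt(2) - 1)/2
print(np.trace(B) ** 2 - 4 * np.linalg.det(B))  # discriminant ~0: double eigenvalue
```

Because u_n is a left and v_n a right eigenvector of B for z and they are orthogonal, the double eigenvalue is defective rather than semisimple.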
Nongeneric Case
• What if $s_{n-1}(A - zI) = s_n(A - zI)$?
• Then $A - s_{n-1}(A - zI)\,u_{n-1}v_{n-1}^* - s_n(A - zI)\,u_n v_n^*$ is a
nearest matrix with a multiple eigenvalue in the 2
norm
• This eigenvalue is nondefective (it has algebraic
multiplicity 2 and geometric multiplicity 2)
• One can also construct a nearest defective matrix, for
both 2 and F norms
• Example: normal matrices, whose ε-pseudospectra are unions of
disks of radius ε about the eigenvalues
• More interesting: nonnormal block diagonal matrices,
when two coalescing eigenvalues come from different
blocks: pseudospectral components are not circles
but coalesce tangentially
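A sketch of the nongeneric construction, assuming NumPy and the illustrative normal matrix A = diag(0, 1, 3), whose ε-disks about 0 and 1 touch at z = 1/2. Subtracting the two smallest singular triplets of A − zI is a perturbation of 2-norm 1/2 (Frobenius norm 1/√2), and it leaves a nondefective double eigenvalue at 1/2:

```python
import numpy as np

A = np.diag([0.0, 1.0, 3.0])  # normal, so pseudospectra are unions of disks
z = 0.5                       # disks about 0 and 1 first touch here

U, s, Vh = np.linalg.svd(A - z * np.eye(3))
print(s)                      # [2.5 0.5 0.5]: s_{n-1} = s_n

# A - s_{n-1} u_{n-1} v_{n-1}^* - s_n u_n v_n^*
E = s[-2] * np.outer(U[:, -2], Vh[-2, :]) + s[-1] * np.outer(U[:, -1], Vh[-1, :])
B = A - E
print(np.linalg.norm(E, 2))   # about 0.5
print(B)                      # approximately diag(0.5, 0.5, 3)
```

The sum of the two triplets does not depend on which orthonormal basis the SVD picks for the repeated singular value, so B is well defined.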
Clarke Generalized Gradient
• Clarke (PhD thesis, 1973; book, 1982)
• Assume f is locally Lipschitz
• The Clarke generalized gradient of f at $x \in \mathbb{R}^N$ is the
convex hull of limits of gradients of f at points converging to x,
$\partial_C f(x) = \operatorname{conv}\{\lim_r \nabla f(x_r) : x_r \to x,\ x_r \in Q\}$
where Q = {w: f is differentiable at w}
Alam, Bora, Byers, Overton, 2006
• Theorem: in all cases, for the 2 and F norm, the
distance to the nearest defective matrix is the height
of the lowest generalized saddle point of f(x,y) =
sn(A – (x+iy) I)
• That is, such that $0 \in \partial_C f(x,y)$, but $(x,y)$ is not a local
extremum of f
• Covers the generic case $s_{n-1} > s_n$ (smooth
saddle point, where $s_n$ has zero gradient)
• And the nongeneric case $s_{n-1} = s_n$ (nonsmooth
saddle point: $s_n$ is not differentiable there, and its gradient
has two different limits, pointing in opposite
directions, as the point of coalescence is approached
from the two different pseudospectral components)
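A worked instance of the nongeneric case, using the normal matrix A = diag(0, 1) as an illustrative choice (not an example from the talk). Here s_2(A − zI) = min{|z|, |1 − z|}, and the two eigenvalue disks coalesce at z = 1/2:

$$
\begin{aligned}
f(x,y) &= s_2\big(\operatorname{diag}(0,1) - zI\big) = \min\{|z|,\ |1-z|\}, \qquad z = x+iy,\\
\lim_{x \uparrow 1/2} \nabla f(x,0) &= (1,0), \qquad \lim_{x \downarrow 1/2} \nabla f(x,0) = (-1,0),\\
\partial_C f(\tfrac12, 0) &= \operatorname{conv}\{(1,0),\,(-1,0)\} = [-1,1]\times\{0\} \ni 0,\\
f(\tfrac12, y) &= \sqrt{\tfrac14 + y^2} \ \text{(increasing in } |y|), \qquad f(x,0) = \min\{x,\,1-x\} \ \text{for } 0 \le x \le 1.
\end{aligned}
$$

So (1/2, 0) is a generalized saddle point of height 1/2 whose two limiting gradients point in opposite directions, and the theorem gives 1/2, half the eigenvalue gap, as the distance to the nearest defective matrix.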
Algorithm
• Still no guaranteed algorithm known
• Heuristic: apply Newton’s method in 2 real variables
(1 complex variable) to search for a zero of the
gradient of sn(A – (x+iy) I)
• This breaks down in the nongeneric case
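A minimal sketch of this heuristic, assuming NumPy. The gradient of f(x, y) = s_n(A − (x+iy)I) comes from the singular vectors, and the 2 × 2 Hessian is approximated by central differences of that gradient (our simplification; the talk does not say how derivatives are formed). The start near the midpoint of the eigenvalue pair stands in for the weighted averages based on Wilkinson's and Alam's bounds, whose weights are not given here; the function names are hypothetical.

```python
import numpy as np


def smin_and_grad(A, x, y):
    """Return f(x, y) = s_n(A - (x+iy)I) and its gradient.

    Assumes s_n is simple and nonzero. With (A - zI) v = s_n u, first-order
    perturbation theory gives df/dx = -Re(u^* v) and df/dy = Im(u^* v).
    """
    n = A.shape[0]
    U, s, Vh = np.linalg.svd(A - (x + 1j * y) * np.eye(n))
    u, v = U[:, -1], Vh[-1, :].conj()
    w = np.vdot(u, v)  # u^* v; unaffected by the common phase ambiguity of u, v
    return s[-1], np.array([-w.real, w.imag])


def newton_saddle(A, x0, y0, h=1e-6, tol=1e-12, maxit=50):
    """Newton's method for a zero of grad f, using a central-difference Hessian.

    Heuristic only: it converges to *a* nearby critical point, which need not
    be the lowest saddle point, and it breaks down when s_{n-1} = s_n.
    """
    p = np.array([x0, y0], dtype=float)
    for _ in range(maxit):
        g = smin_and_grad(A, p[0], p[1])[1]
        if np.linalg.norm(g) < tol:
            break
        H = np.empty((2, 2))
        for k in range(2):
            e = np.zeros(2)
            e[k] = h
            g_plus = smin_and_grad(A, *(p + e))[1]
            g_minus = smin_and_grad(A, *(p - e))[1]
            H[:, k] = (g_plus - g_minus) / (2.0 * h)
        p = p - np.linalg.solve(H, g)
    return p, smin_and_grad(A, p[0], p[1])[0]


A = np.array([[0.0, 1.0], [0.0, 1.0]])     # eigenvalues 0 and 1
p, height = newton_saddle(A, 0.47, 0.03)  # start near the eigenvalue midpoint
print(p, height)  # about [0.5, 0.0] and (sqrt(2) - 1)/2
```

At a nonsmooth saddle the gradient formula is undefined, which is exactly where this iteration fails.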
An Iteration Covering Both Cases
• Idea: apply Newton’s method to find a zero of the
following function mapping C x R to C x R
• Use known bounds of Wilkinson and Alam to provide
starting points (weighted averages of eigenvalue
pairs)
Convergence Rate
• Smooth saddle point: quadratic in theory and practice
• Nonsmooth saddle point: typically also see quadratic
convergence
• But the function to which we are applying Newton’s
method is not smooth on C x R !
• We think we can show quadratic convergence by
looking at the behaviour along lines through the
nonsmooth saddle point
Transition Between Cases
• If we break the block diagonal structure by adding a
small top right entry connecting the blocks, we see
abrupt transition from the nongeneric to the generic
case
• The condition number of the Jacobian jumps to arbitrarily large values,
then drops as we increase the size of the perturbation
• When we increase it enough, algorithm transitions to
discovering that m = 0, indicating the saddle point is
smooth
Sep$_\lambda$
• Given $A \in \mathbb{C}^{n\times n}$ and $B \in \mathbb{C}^{m\times m}$, find the smallest
$\max\{\|E\|, \|F\|\}$ such that A+E and B+F share a common
eigenvalue
• Equivalently, find the smallest ε such that the ε-pseudospectra of A and B are not disjoint
• Posed by Varah, 1979, and Demmel, 1986
• First algorithm: Gu-Overton, 2006
• This is a nearest defective matrix problem for
Diag(A,B) with the requirement that the block
diagonal structure be preserved
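By the fundamental theorem of pseudospectra, sep_λ(A, B) in the 2 and F norms equals the minimum over z of max{s_min(A − zI), s_min(B − zI)}. The brute-force grid search below only makes that formula concrete; it is not the Gu-Overton algorithm. The function names and grids are illustrative, and a finite grid can only overestimate the true value.

```python
import numpy as np


def smin(M):
    """Smallest singular value of a square matrix."""
    return np.linalg.svd(M, compute_uv=False)[-1]


def sep_lambda_grid(A, B, xs, ys):
    """Grid estimate of min over z of max(s_min(A - zI), s_min(B - zI)).

    Returns the best value found and the grid point z where it occurs.
    """
    IA, IB = np.eye(A.shape[0]), np.eye(B.shape[0])
    best, best_z = np.inf, None
    for x in xs:
        for y in ys:
            z = x + 1j * y
            val = max(smin(A - z * IA), smin(B - z * IB))
            if val < best:
                best, best_z = val, z
    return best, best_z


xs = np.linspace(-0.5, 1.5, 101)
ys = np.linspace(-0.5, 0.5, 51)
# Scalars 0 and 1: both move to z = 0.5 at cost 0.5.
print(sep_lambda_grid(np.array([[0.0]]), np.array([[1.0]]), xs, ys))
A = np.array([[0.0, 2.0], [0.0, 0.1]])  # a nonnormal block
print(sep_lambda_grid(A, np.array([[1.0]]), xs, ys))
```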
Thanks To
• Volker Mehrmann, our host in Berlin in 2004
• Gene Golub and Nick Trefethen, from whom I learned
about numerical linear algebra and pseudospectra
• The Brent family, for many special memories