CSE 713: Expanders - Theory and Applications
SUNY at Buffalo, Fall 2005
Lecturer: Hung Q. Ngo
Scribe: Hung Q. Ngo
A Linear Algebra Primer
Standard texts on Linear Algebra and Algebra are [1, 8].
1 Preliminaries

1.1 Vectors and matrices
We shall use R to denote the set of real numbers and C to denote the set of complex numbers. For any
c = a + bi ∈ C, the complex conjugate of c, denoted by c̄, is defined to be c̄ = a − bi. The modulus of c,
denoted by |c|, is \sqrt{a^2 + b^2}. It is easy to see that |c|^2 = c c̄.
If we mention the word “vector” alone, it is understood to be a column vector. An n-dimensional
vector x has n entries in some field of numbers, such as R or C:
 
    x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.
The set of all n-dimensional vectors over R (respectively C) is denoted by Rn (respectively Cn ). They
are also called real vectors and complex vectors, respectively.
Similar to vectors, matrices need an underlying field. We thus have complex matrices and real matrices just as in the case of vectors. In fact, an n-dimensional vector is nothing but an n × 1 matrix. In the
discussion that follows, the concepts of complex conjugates, transposes, and conjugate transposes also
apply to vectors in this sense.
Given an m × n matrix A = (aij ), the complex conjugate Ā of A is a matrix obtained from A by
replacing each entry aij of A by the corresponding complex conjugate āij . The transpose AT of A is the
matrix obtained from A by turning its rows into columns and vice versa. For example,


    A = \begin{pmatrix} 0 & 3 & 1 \\ -2 & 0 & 1 \end{pmatrix},   and   A^T = \begin{pmatrix} 0 & -2 \\ 3 & 0 \\ 1 & 1 \end{pmatrix}.
The conjugate transpose A∗ of A is defined to be (Ā)T . A square matrix A is symmetric iff A = AT ,
and is Hermitian iff A = A∗ .
Given a real vector x ∈ R^n, the length ‖x‖ of x is

    ‖x‖ = \sqrt{x_1^2 + x_2^2 + · · · + x_n^2}.                                  (1)

Notice that ‖x‖^2 = x^T x. When x is a complex vector, we use x^* instead of x^T. Hence, in general we
define ‖x‖ = \sqrt{x^* x}. (You should check that x^* x is a real number, so that the square root makes
sense.)
The length ‖x‖ is also referred to as the L_2-norm of the vector x, denoted by ‖x‖_2. In general, the
L_p-norm of an n-dimensional vector x, denoted by ‖x‖_p, where p = 1, 2, . . . , is defined to be

    ‖x‖_p := (|x_1|^p + · · · + |x_n|^p)^{1/p},                                  (2)

and

    ‖x‖_∞ := \max_{i=1..n} |x_i|.                                                (3)
The following identities are easy to show, yet of great importance. Given a p × q matrix A and a
q × r matrix B, we have

    (AB)^T = B^T A^T,                                                            (4)
    (AB)^* = B^* A^*.                                                            (5)

(Question: what are the dimensions of the matrices (AB)^T and (AB)^*?)
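As an illustrative aside (not part of the original notes), here is a short numpy sketch that checks identities (4) and (5) numerically on random complex matrices; the sizes and random seed are arbitrary choices for the demonstration.

```python
# Quick numerical check of (AB)^T = B^T A^T and (AB)^* = B^* A^* (illustrative only).
import numpy as np

rng = np.random.default_rng(11)
A = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))   # a 2 x 3 complex matrix
B = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))   # a 3 x 4 complex matrix

print(np.allclose((A @ B).T, B.T @ A.T))                       # True: (AB)^T = B^T A^T
print(np.allclose((A @ B).conj().T, B.conj().T @ A.conj().T))  # True: (AB)^* = B^* A^*
print((A @ B).T.shape)                                         # (4, 2): answer to the question above
```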
A square matrix A is said to be singular if the equation Ax = b does not have a unique solution; whether
this happens does not depend on b, since uniqueness of solutions to Ax = b is an intrinsic property of A
alone. If for every b there is one and only one x such that Ax = b, then A is said to be non-singular.
1.2 Determinant and trace
Given a square matrix A = (a_{ij}) of order n, the equation Ax = 0 has a unique solution if and only if
det A ≠ 0, where det A denotes the determinant of A, which is defined by

    \det A = \sum_{π ∈ S_n} (-1)^{I(π)} \prod_{i=1}^{n} a_{iπ(i)} = \sum_{π ∈ S_n} \mathrm{sign}(π) \prod_{i=1}^{n} a_{iπ(i)}.        (6)
Here, Sn denotes the set of all permutations on the set [n] = {1, . . . , n}. (Sn is more often referred to
as the symmetric group of order n.) Given a permutation π ∈ Sn , we use I(π) to denote the number of
inversions of π, which is the number of pairs (π(i), π(j)) for which i < j and π(i) > π(j). The sign of
a permutation π, denoted by sign(π), is defined to be sign(π) = (−1)I(π) .
Exercise 1.1. Find an involution for Sn to show that, for n ≥ 2, there are as many permutations with
negative sign as permutations with positive sign.
Let us take an example for n = 3. In this case Sn consists of 6 permutations:
Sn = {123, 132, 213, 231, 312, 321}.
Notationally, we write π = 132 to mean a permutation where π(1) = 1, π(2) = 3, and π(3) = 2.
Thus, when π = 132 we have sign(π) = −1 since there is only one “out-of-order” pair (3, 2). To
be more precise, sign(123) = 1, sign(132) = −1, sign(312) = 1, sign(213) = −1, sign(231) = 1,
sign(321) = −1.
Consequently, for

    A = \begin{pmatrix} 0 & 3 & 1 \\ -2 & 0 & 1 \\ -1 & 2 & 2 \end{pmatrix}

we have

    \det A = a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}
           = 0·0·2 - 0·1·2 - 3·(-2)·2 + 3·1·(-1) + 1·(-2)·2 - 1·0·(-1)
           = 5.
The trace of a square matrix A, denoted by tr A, is the sum of its diagonal entries. The matrix A
above has
tr A = 0 + 0 + 2 = 2.
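As an illustrative aside (not part of the original notes), the following numpy/itertools sketch evaluates equation (6) directly for the matrix above and compares it with the built-in determinant and trace; the helper name is of course just for demonstration.

```python
# Check the permutation-sum formula (6) and the trace on the example matrix (illustrative sketch).
import itertools
import numpy as np

def det_by_permutations(A):
    """Compute det A via (6): sum over permutations of sign(pi) * prod_i a_{i,pi(i)}."""
    n = A.shape[0]
    total = 0.0
    for perm in itertools.permutations(range(n)):
        # sign(pi) = (-1)^{number of inversions of pi}
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        total += (-1) ** inversions * np.prod([A[i, perm[i]] for i in range(n)])
    return total

A = np.array([[0, 3, 1], [-2, 0, 1], [-1, 2, 2]], dtype=float)
print(det_by_permutations(A))   # 5.0, matching the hand computation above
print(np.linalg.det(A))         # ~5.0
print(np.trace(A))              # 2.0
```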
1.3 Combinations of vectors and vector spaces
A vector w is a linear combination of m vectors v_1, . . . , v_m if w can be written as

    w = a_1 v_1 + a_2 v_2 + · · · + a_m v_m.                                     (7)
The number aj is called the coefficient of the vector vj in this linear combination. Note that, as usual, we
have to fix the underlying field such as R or C. If, additionally, we also have a1 + a2 + · · · + am = 1,
then w is called an affine combination of the vi .
A canonical combination is a linear combination in which aj ≥ 0, ∀j; and a convex combination is an
affine combination which is also canonical. The linear (affine, canonical, convex) hull of {v1 , . . . , vm } is
the set of all linear (affine, canonical, convex) combinations of the vj . Note that in the above definitions,
m could be infinite. The canonical hull of a finite set of vectors is called a cone, or more specifically a
convex polyhedral cone.
A real vector space is a set V of real vectors so that a linear combination of any subset of vectors
in V is also in V . In other words, vector spaces have to be closed under taking linear combinations.
Technically speaking, this is an incomplete definition, but it is sufficient for our purposes. One can also
replace the word “real” by “complex”. A subspace of a vector space V is a subset of V which is closed
under taking linear combinations.
Given a set V = {v1 , . . . , vm } of vectors, the set of all linear combinations of the vj forms a vector
space, denoted by span(V), or span(v_1, . . . , v_m). The column space of a matrix A is the span of
its column vectors. The row space of A is the span of A’s rows. Note that equation Ax = b (with A not
necessarily a square matrix) has a solution if and only if b lies in the column space of A. The coordinates
of x form the coefficients of the column vectors of A in a linear combination to form b.
A set V = {v1 , . . . , vm } of (real, complex) vectors is said to be linearly independent if
    a_1 v_1 + a_2 v_2 + · · · + a_m v_m = 0   only happens when   a_1 = a_2 = · · · = a_m = 0.
Otherwise, the vectors in V are said to be (linearly) dependent.
The dimension of a vector space is the maximum number of linearly independent vectors in the
space. The basis of a vector space V is a subset {v1 , . . . , vm } of V which is linearly independent and
span(v_1, . . . , v_m) = V. It is easy to show that m is actually the dimension of V. A vector space
typically has infinitely many bases. All bases of a vector space V have the same size, which is also the
dimension of V . The sets Rn and Cn are vector spaces by themselves.
In an n-dimensional vector space, a set of m > n vectors must be linearly dependent.
The dimensions of a matrix A's column space and row space are equal; this common value is referred to as the rank
of A. This fact is not very easy to show, but not too difficult either. Gaussian elimination is of great use
here.
Exercise 1.2. Show that for any basis B of a vector space V and some vector v ∈ V , there is exactly
one way to write v as a linear combination of vectors in B.
1.4 Inverses
We use diag(a_1, . . . , a_n) to denote the matrix A = (a_{ij}) where a_{ij} = 0 for i ≠ j and a_{ii} = a_i, ∀i. The
identity matrix, often denoted by I, is defined to be diag(1, . . . , 1).
Given a square matrix A, the inverse of A, denoted by A^{-1}, is a matrix B such that
AB = BA = I, or AA−1 = A−1 A = I.
Exercise 1.3. Show that, if A and B both have inverses, then the inverse of AB can be calculated easily
by
    (AB)^{-1} = B^{-1} A^{-1}.                                                   (8)

Similarly, the same rule holds for 3 or more matrices. For example,

    (ABCD)^{-1} = D^{-1} C^{-1} B^{-1} A^{-1}.
If A has an inverse, it is said to be invertible. Not all matrices are invertible. There are many
conditions to test if a matrix has an inverse, including: non-singularity, non-zero determinant, non-zero
eigenvalues (to be defined), linearly independent column vectors, linearly independent row vectors.
2 Eigenvalues and eigenvectors
In this section, we shall be concerned with square matrices only, unless stated otherwise.
The eigenvalues of a matrix A are the numbers λ such that the equation Ax = λx, or (λI − A)x = 0,
has a non-zero solution vector, in which case the solution vector x is called a λ-eigenvector.
The characteristic polynomial pA (λ) of a matrix A is defined to be
pA (λ) := det(λI − A).
Since the all-zero vector is always a solution to (λI − A)x = 0, it would be the only
solution if det(λI − A) ≠ 0. Hence, the eigenvalues are the solutions to the equation p_A(λ) = 0. For
example, if

    A = \begin{pmatrix} 2 & 1 \\ -2 & 3 \end{pmatrix},

then

    p_A(λ) = \det \begin{pmatrix} λ - 2 & -1 \\ 2 & λ - 3 \end{pmatrix} = (λ - 2)(λ - 3) + 2 = λ^2 - 5λ + 8.

Hence, the eigenvalues of A are (5 ± i\sqrt{7})/2.
If we work over the complex numbers, then the equation p_A(λ) = 0 always has n roots (counting multiplicities). However, we shall be concerned greatly with matrices which have real eigenvalues. We shall
establish sufficient conditions for a matrix to have real eigenvalues, as shall be seen in later sections.
Theorem 2.1. Let λ1 , . . . , λn be the eigenvalues of an n × n complex matrix A, then
(i) λ1 + · · · + λn = tr A.
(ii) λ1 . . . λn = det A.
Proof. In the complex domain, p_A(λ) has n complex roots since it is a polynomial of degree n. The
eigenvalues λ_1, . . . , λ_n are the roots of p_A(λ). Hence, we can write

    p_A(λ) = \prod_i (λ - λ_i) = λ^n + c_{n-1} λ^{n-1} + · · · + c_1 λ + c_0.

It is evident that

    c_{n-1} = -(λ_1 + · · · + λ_n),    c_0 = (-1)^n λ_1 · · · λ_n.
On the other hand, by definition we have

    p_A(λ) = \det \begin{pmatrix}
        λ - a_{11} & -a_{12} & \cdots & -a_{1n} \\
        -a_{21} & λ - a_{22} & \cdots & -a_{2n} \\
        \vdots & \vdots & \ddots & \vdots \\
        -a_{n1} & -a_{n2} & \cdots & λ - a_{nn}
    \end{pmatrix}.
Expanding p_A(λ) in this way, the coefficient of λ^{n-1} (which is c_{n-1}) is precisely -(a_{11} + a_{22} + · · · + a_{nn});
and the coefficient of λ^0 (which is c_0) is (-1)^n \det A (think carefully about this statement!).
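As a small illustrative sketch (not in the original notes), one can check Theorem 2.1 numerically with numpy on a random matrix; the seed and size are arbitrary.

```python
# Sanity check of Theorem 2.1: eigenvalues sum to the trace and multiply to the determinant.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
eigvals = np.linalg.eigvals(A)

print(np.allclose(eigvals.sum(), np.trace(A)))          # True: sum of eigenvalues = tr A
print(np.allclose(np.prod(eigvals), np.linalg.det(A)))  # True: product of eigenvalues = det A
```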
2.1 The diagonal form
Proposition 2.2. Suppose the n×n matrix A has n linearly independent eigenvectors x1 , . . . , xn , where
xi is a λi -eigenvector. Let S be the matrix whose columns are the vectors xi , then S −1 AS = Λ, where
Λ = diag (λ1 , . . . , λn ).
Proof. Note that since the column vectors of S are independent, S is invertible and writing S^{-1} makes
sense. We want to show S^{-1}AS = Λ, which is the same as showing AS = SΛ. Since Ax_i = λ_i x_i, it
follows that

    AS = A \begin{pmatrix} | & & | \\ x_1 & \cdots & x_n \\ | & & | \end{pmatrix}
       = \begin{pmatrix} | & & | \\ Ax_1 & \cdots & Ax_n \\ | & & | \end{pmatrix}
       = \begin{pmatrix} | & & | \\ λ_1 x_1 & \cdots & λ_n x_n \\ | & & | \end{pmatrix} = SΛ.
In general, if a matrix S satisfies the property that S −1 AS is a diagonal matrix, then S is said to
diagonalize A, and A is said to be diagonalizable. It is easy to see from the above proof that if A is
diagonalizable by S, then the columns of S are eigenvectors of A; moreover, since S is invertible by
definition, the columns of S must be linearly independent. In other words, we just proved
Theorem 2.3. A matrix is diagonalizable if and only if it has n independent eigenvectors.
Proposition 2.4. If x1 , . . . xk are eigenvectors corresponding to distinct eigenvalues λ1 , . . . λk , then
x1 , . . . xk are linearly independent.
Proof. When k = 2, suppose c1 x1 + c2 x2 = 0. Multiplying by A gives c1 λ1 x1 + c2 λ2 x2 = 0.
Subtracting λ2 times the previous equation we get
c1 (λ1 − λ2 )x1 = 0.
Hence, c_1 = 0 since λ_1 ≠ λ_2 and x_1 ≠ 0, and then c_2 = 0 as well. The general case follows by induction.
Exercise 2.5. If λ_1, . . . , λ_n are the eigenvalues of A, then λ_1^k, . . . , λ_n^k are eigenvalues of A^k. If S diagonalizes
A, i.e. S^{-1}AS = Λ, then S^{-1}A^k S = Λ^k.
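As an illustrative sketch (not part of the notes), the following numpy snippet diagonalizes a small matrix and verifies the relations of Proposition 2.2 and Exercise 2.5; the test matrix is an arbitrary choice with distinct eigenvalues.

```python
# Diagonalization with numpy: S^{-1} A S = Lambda and S^{-1} A^k S = Lambda^k.
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])      # has distinct eigenvalues 2 and 3
eigvals, S = np.linalg.eig(A)               # columns of S are eigenvectors
Lam = np.diag(eigvals)

print(np.allclose(np.linalg.inv(S) @ A @ S, Lam))                          # True
k = 5
print(np.allclose(np.linalg.inv(S) @ np.linalg.matrix_power(A, k) @ S,
                  np.linalg.matrix_power(Lam, k)))                         # True
```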
2.2 Symmetric and Hermitian matrices
For any two vectors x, y ∈ Cn , the inner product of x and y is defined to be
x∗ y = x̄T y = x̄1 y1 + · · · + x̄n yn
Two vectors are orthogonal to one another if their inner product is 0. The zero vector is orthogonal to all
vectors. Two orthogonal non-zero vectors must be linearly independent. For, if x^*y = 0 and ax + by = 0,
then 0 = x^*(ax + by) = a(x^*x) + b(x^*y) = a(x^*x). This implies a = 0, which in turn implies b = 0 also. With the
same reasoning, one easily shows that a set of pairwise orthogonal non-zero vectors must be linearly
independent.
If A is any complex matrix, recall that the Hermitian transpose A∗ of A is defined to be ĀT , and
that A is said to be Hermitian if A = A∗ . A real matrix is Hermitian if and only if it is symmetric.
Also notice that the diagonal entries of a Hermitian matrix must be real, because they are equal to their
respective complex conjugates. The next lemma lists several useful properties of a Hermitian matrix.
Lemma 2.6. Let A be a Hermitian matrix, then
(i) for all x ∈ Cn , x∗ Ax is real.
(ii) every eigenvalue of A is real.
(iii) the eigenvectors of A, if correspond to distinct eigenvalues, are orthogonal to one another.
Proof. It is straightforward that
(i) (x∗ Ax)∗ = x∗ A∗ x∗∗ = x∗ Ax.
(ii) Ax = λx with x ≠ 0 implies λ = (x^*Ax)/(x^*x), which is real by (i) since x^*x > 0.
(iii) Suppose Ax = λ_1 x, Ay = λ_2 y, and λ_1 ≠ λ_2. Then, using the fact that λ_1 is real,

    λ_1 x^*y = (λ_1 x)^*y = (Ax)^*y = x^*Ay = x^*(λ_2 y) = λ_2 x^*y.

Hence, (λ_1 − λ_2) x^*y = 0, implying x^*y = 0.
2.3 Orthonormal and unitary matrices
A real matrix Q is said to be orthogonal if QT Q = I. A complex matrix U is unitary if U ∗ U = I. In
other words, the columns of U (and Q) are orthonormal. Obviously being orthogonal is a special case of
being unitary. We state without proof a simple proposition.
Proposition 2.7. Let U be a unitary matrix, then
(i) (Ux)^*(Uy) = x^*y, and ‖Ux‖_2 = ‖x‖_2.
(ii) Every eigenvalue λ of U has modulus 1 (i.e. |λ|^2 = λ̄λ = 1).
(iii) Eigenvectors corresponding to distinct eigenvalues of U are orthogonal.
(iv) If U' is another unitary matrix, then UU' is unitary.
3 The Spectral Theorem and the Jordan canonical form
Two matrices A and B are said to be similar iff there is an invertible matrix M such that M −1 AM =
B. Thus, a matrix is diagonalizable iff it is similar to a diagonal matrix. Similarity is obviously an
equivalence relation. The following proposition shows what is common among matrices in the same
similarity equivalence class.
Proposition 3.1. If B = M −1 AM , then A and B have the same eigenvalues. Moreover, an eigenvector
x of A corresponds to an eigenvector M −1 x of B.
Proof. Ax = λx implies (M^{-1}A)x = λ M^{-1}x, i.e. B(M^{-1}x) = (M^{-1}AM)(M^{-1}x) = λ(M^{-1}x).
An eigenvector corresponding to an eigenvalue λ is called a λ-eigenvector. The vector space spanned
by all λ-eigenvectors is called the λ-eigenspace. We shall often use Vλ to denote this space.
Corollary 3.2. If A and B are similar, then the corresponding eigenspaces of A and B have the same
dimension.
Proof. Suppose B = M −1 AM , then the mapping φ : x → M −1 x is an invertible linear transformation
from one eigenspace of A to the corresponding eigenspace of B.1
If two matrices A and B are similar, then we can say a lot about A if we know B. Hence, we
would like to find B similar to A where B is as “simple” as possible. The first “simple” form is the
upper-triangular form, as shown by the following lemma, which is sometimes referred to as the Jacobi
Theorem.
Lemma 3.3 (Schur’s lemma). For any n×n matrix A, there is a unitary matrix U such that B = U −1 AU
is upper triangular. Hence, the eigenvalues of A are on the diagonal of B.
Proof. We show this by induction on n. The lemma holds when n = 1. When n > 1, over C the matrix A must
have at least one eigenvalue λ_1. Let x'_1 be a corresponding eigenvector. Use the Gram-Schmidt process
to extend x'_1 to an orthonormal basis {x_1, x_2, . . . , x_n} of C^n. Let U_1 be the matrix whose columns are
these vectors in order. From the fact that U_1^{-1} = U_1^*, it is easy to see that


    U_1^{-1} A U_1 = \begin{pmatrix}
        λ_1 & * & * & \cdots & * \\
        0 & * & * & \cdots & * \\
        \vdots & \vdots & \vdots & & \vdots \\
        0 & * & * & \cdots & *
    \end{pmatrix}.
Now, let A' be the matrix obtained from U_1^{-1}AU_1 by crossing off row 1 and column 1. Then, by induction there
exists an (n − 1) × (n − 1) unitary matrix M such that M^{-1}A'M is upper triangular. Let U_2 be the
n × n matrix obtained by adding a new first row and first column to M with all new entries equal to 0 except
(U_2)_{11} = 1. Clearly U_2 is unitary and U_2^{-1}(U_1^{-1}AU_1)U_2 is upper triangular. Letting U = U_1 U_2
completes the proof.
The following theorem is one of the most important theorems in elementary linear algebra, beside
the Jordan form.
Theorem 3.4 (Spectral theorem). Every real symmetric matrix can be diagonalized by an orthogonal
matrix, and every Hermitian matrix can be diagonalized by a unitary matrix:
(real case) Q−1 AQ = Λ, (complex case) U −1 AU = Λ
Moreover, in both cases all the eigenvalues are real.
Proof. The real case follows from the complex case. Firstly, by Schur’s lemma there is a unitary matrix
U such that U −1 AU is upper triangular. Moreover,
(U −1 AU )∗ = U ∗ A∗ (U −1 )∗ = U −1 AU,
i.e. U^{-1}AU is also Hermitian. But an upper triangular Hermitian matrix must be diagonal. The realness
of the eigenvalues follows from Lemma 2.6.
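As an illustrative aside (not in the original notes), here is a numpy sketch of the spectral theorem for a real symmetric matrix; np.linalg.eigh is used as a stand-in for the orthogonal diagonalization guaranteed by Theorem 3.4.

```python
# Spectral theorem check: an orthogonal Q with Q^T A Q = Lambda and real eigenvalues.
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                      # a real symmetric matrix

eigvals, Q = np.linalg.eigh(A)
print(np.allclose(Q.T @ Q, np.eye(4)))             # True: Q is orthogonal
print(np.allclose(Q.T @ A @ Q, np.diag(eigvals)))  # True: Q^T A Q = Lambda
print(np.isrealobj(eigvals))                       # True: the eigenvalues are real
```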
¹ I have not defined linear transformations yet. The thing to remember is that if there is an invertible linear transformation
from one vector space to another, then the two vector spaces have the same dimension. Invertible linear transformations are
like isomorphisms or bijections, in some sense. A curious student should try to prove this fact directly without using the term
linear transformation.
Theorem 3.5 (The Jordan canonical form). If a matrix A has s linearly independent eigenvectors, then
it is similar to a matrix which is in Jordan form with s square blocks on the diagonal:


    M^{-1}AM = \begin{pmatrix}
        B_1 & 0 & \cdots & 0 \\
        0 & B_2 & \cdots & 0 \\
        \vdots & & \ddots & \vdots \\
        0 & 0 & \cdots & B_s
    \end{pmatrix}.

Each block has exactly one 1-dimensional eigenspace, one eigenvalue, and 1's just above the diagonal:

    B_j = \begin{pmatrix}
        λ_j & 1 & & \\
        & λ_j & \ddots & \\
        & & \ddots & 1 \\
        & & & λ_j
    \end{pmatrix}.
Proof. A proof could be read from Appendix B of [8]. Another proof is presented in [3], which has a
nice combinatorial presentation in terms of digraphs. The fact that each Jordan block has exactly one 1-dimensional eigenspace is straightforward. The main statement is normally shown by induction in three
steps.
Corollary 3.6. Let n(λ) be the number of occurrences of λ on the diagonal of the Jordan form of A. The
following hold:

1. rank(A) = \sum_{λ_i ≠ 0} n(λ_i) + n(0) − \dim(V_0).

2. If A is Hermitian, then the λ-eigenspace has dimension equal to the multiplicity of λ as a root of
   the equation p_A(x) = 0.

3. In fact, in the Hermitian case C^n = \bigoplus_i V_{λ_i}, where V_{λ_i} denotes the λ_i-eigenspace.
Proof. This follows directly from the Jordan form and our observation in Corollary 3.2. We are mostly
concerned with the dimensions of eigenspaces, so we can think about Λ instead of A. Similar matrices
have the same rank, so A and its Jordan form have the same rank. The Jordan form of A has rank equal to
the total number of non-zero eigenvalues on the diagonal plus the number of 1's in the Jordan blocks
corresponding to the eigenvalue 0, which is exactly n(0) − dim(V0 ).
When A is Hermitian, it is diagonalizable. Every eigenvector corresponding to an occurrence of an
eigenvalue λ is linearly independent from all others (including the eigenvector corresponding to another
instance of the same λ).
4 The Minimum Polynomial
I found the following very nice theorem stated without proof in a book called “Matrix Methods” by
Richard Bronson. I’m sure we could find a proof in either [6] or [4], but I wasn’t able to get them from
the library. Here I present my little proof.
Theorem 4.1. Suppose B_k is a Jordan block of size (l + 1) × (l + 1) corresponding to the eigenvalue
λ_k of A, i.e.

    B_k = \begin{pmatrix}
        λ_k & 1 & & \\
        & λ_k & \ddots & \\
        & & \ddots & 1 \\
        & & & λ_k
    \end{pmatrix}.

Then, for any polynomial q(λ) ∈ C[λ],

    q(B_k) = \begin{pmatrix}
        q(λ_k) & \frac{q'(λ_k)}{1!} & \frac{q''(λ_k)}{2!} & \cdots & \frac{q^{(l)}(λ_k)}{l!} \\
        0 & q(λ_k) & \frac{q'(λ_k)}{1!} & \cdots & \frac{q^{(l-1)}(λ_k)}{(l-1)!} \\
        \vdots & & \ddots & \ddots & \vdots \\
        & & & q(λ_k) & \frac{q'(λ_k)}{1!} \\
        0 & 0 & 0 & \cdots & q(λ_k)
    \end{pmatrix}.                                                               (9)
Proof. We only need to consider the case q(x) = x^j, j ≥ 0, and then extend linearly to all polynomials.
The case j = 0 is clear. Suppose equation (9) holds for q(x) = x^{j-1}, j ≥ 1. Then, when q(x) = x^j we
have
    q(B_k) = B_k^{j-1} B_k
           = \begin{pmatrix}
               λ_k^{j-1} & \binom{j-1}{1} λ_k^{j-2} & \binom{j-1}{2} λ_k^{j-3} & \cdots & \binom{j-1}{l} λ_k^{j-1-l} \\
               0 & λ_k^{j-1} & \binom{j-1}{1} λ_k^{j-2} & \cdots & \binom{j-1}{l-1} λ_k^{j-l} \\
               \vdots & & \ddots & \ddots & \vdots \\
               & & & λ_k^{j-1} & \binom{j-1}{1} λ_k^{j-2} \\
               0 & 0 & 0 & \cdots & λ_k^{j-1}
             \end{pmatrix}
             \begin{pmatrix}
               λ_k & 1 & & \\
               & λ_k & \ddots & \\
               & & \ddots & 1 \\
               & & & λ_k
             \end{pmatrix}
           = \begin{pmatrix}
               λ_k^{j} & \binom{j}{1} λ_k^{j-1} & \binom{j}{2} λ_k^{j-2} & \cdots & \binom{j}{l} λ_k^{j-l} \\
               0 & λ_k^{j} & \binom{j}{1} λ_k^{j-1} & \cdots & \binom{j}{l-1} λ_k^{j-l+1} \\
               \vdots & & \ddots & \ddots & \vdots \\
               & & & λ_k^{j} & \binom{j}{1} λ_k^{j-1} \\
               0 & 0 & 0 & \cdots & λ_k^{j}
             \end{pmatrix},

which is exactly what (9) gives for q(x) = x^j, since \binom{j}{m} λ_k^{j-m} = q^{(m)}(λ_k)/m!.
The minimum polynomial mA (λ) of an n × n matrix A over the complex numbers is the monic
polynomial of lowest degree such that mA (A) = 0.
Lemma 4.2. With the terminology just stated, we have

(i) m_A(λ) divides p_A(λ).

(ii) Every root of p_A(λ) is also a root of m_A(λ). In other words, the eigenvalues of A are roots of
m_A(λ).

(iii) A is diagonalizable iff m_A(λ) has no multiple roots.

(iv) If λ_1, . . . , λ_s are the distinct eigenvalues of a Hermitian matrix A, then m_A(λ) = \prod_{i=1}^{s} (λ − λ_i).
Proof.
(i) m_A(λ) must divide every polynomial q(λ) with q(A) = 0: otherwise, writing q(λ) = h(λ)m_A(λ) + r(λ)
with r ≠ 0 of degree smaller than that of m_A(λ), we would get r(A) = 0, contradicting the minimality of
m_A. On the other hand, by the Cayley-Hamilton Theorem (Theorem 5.1), p_A(A) = 0.

(ii) Notice that Ax = λx implies A^i x = λ^i x. Thus, writing m_A(λ) = \sum_i c_i λ^i, for any λ_k-eigenvector
x of A we have 0 = m_A(A)x = \sum_i c_i A^i x = \sum_i c_i λ_k^i x = m_A(λ_k) x. Since x ≠ 0, this implies λ_k
is a root of m_A(λ).
(iii) (⇒). Suppose M^{-1}AM = Λ for some invertible matrix M, and λ_1, . . . , λ_s are the distinct
eigenvalues of A. By (i) and (ii), we only need to show that A is a root of q(λ) = \prod_{i=1}^{s} (λ − λ_i),
so that m_A(λ) = q(λ). It is easy to see that for any polynomial q(λ), q(A) = M q(Λ) M^{-1}. In particular,
q(A) = M q(Λ) M^{-1} = 0, since q(Λ) = 0.
(⇐). Now we assume m_A(λ) has no multiple root, which implies m_A(λ) = \prod_{i=1}^{s} (λ − λ_i). By
Proposition 2.2, it suffices to show that A has n linearly independent eigenvectors. Firstly, notice that
if the Jordan form of A is

    M^{-1}AM = \begin{pmatrix}
        B_1 & 0 & \cdots & 0 \\
        0 & B_2 & \cdots & 0 \\
        \vdots & & \ddots & \vdots \\
        0 & 0 & \cdots & B_s
    \end{pmatrix},
then, for any q(λ) ∈ C[λ], we have

    M^{-1} q(A) M = q\!\left( \begin{pmatrix} B_1 & & \\ & \ddots & \\ & & B_s \end{pmatrix} \right)
                  = \begin{pmatrix} q(B_1) & & \\ & \ddots & \\ & & q(B_s) \end{pmatrix}.
So, m_A(A) = \prod_{i=1}^{s} (A − λ_i I) = 0 implies \prod_{i=1}^{s} (B_k − λ_i I) = 0 for all k = 1, . . . , s. If A does not have n
linearly independent eigenvectors, one of the blocks B_k must have size > 1. Applying Theorem 4.1 with
q(λ) = \prod_{i=1}^{s} (λ − λ_i), we see that q(B_k) does not vanish, since q'(λ_i) ≠ 0 for all i ∈ [s].
Contradiction!
(iv) Follows from (iii) since a Hermitian matrix is diagonalizable.
5 Two Motivating Theorems

5.1 The statements
We examine two elegant theorems which illustrate beautifully the inter-relations between Combinatorics,
Algebra, and Graph Theory. These two theorems are presented not only for the purpose of demonstrating
the relationships, but they will also be used to develop some of our later materials on Algebraic Graph
Theory.
Theorem 5.1 (Cayley-Hamilton). Let A be an n × n matrix over any field. Let pA (x) := det(xI − A)
be the characteristic polynomial of A. Then pA (A) = 0.
I will give a combinatorial proof of this theorem, following the presentation in [7]. A typical
algebraic proof would first show that a weak version holds where A is diagonal, and then
extend to all matrices over C. To show the most general version we stated, the Fundamental Theorem of
Algebra is used. (The FTA says C is algebraically closed, i.e. any non-constant p ∈ C[x] has a root in C.)
Theorem 5.2 (Matrix-Tree). Let G be a labeled graph on [n] := {1, . . . , n}. Let A be the adjacency
matrix of G and di := deg(i) be the degree of vertex i. Then the number of spanning trees of G is any
cofactor of L, where L = D − A and D is the diagonal matrix with diagonal entries d_{ii} = d_i.
The matrix L is often referred to as the Laplacian of G. A cofactor of a square matrix L is
(−1)^{i+j} \det L_{ij}, where L_{ij} is the matrix obtained by crossing off row i and column j of L. This
theorem also has a beautiful combinatorial proof; see [7] for details. I will present the typical proof of
this theorem, which uses the Cauchy-Binet theorem on matrix expansion. This proof is also very elegant and
helps us develop a bit of linear algebra. Actually, for weighted graphs, a minimum spanning tree can be
shown to be a tree which minimizes a certain determinant.
5.2 The proofs
Combinatorial proof of the Cayley-Hamilton Theorem (Straubing 1983, [9]).

    p_A(x) := \det(xI − A) := \sum_{π ∈ S_n} \mathrm{sgn}(π) \prod_{i=1}^{n} (xI − A)_{iπ(i)}.

Let the set of fixed points of a permutation π be denoted by fp(π) := {i ∈ [n] | π(i) = i}. Each
i ∈ fp(π) contributes either x or −a_{ii} to a term. Each i ∉ fp(π) contributes −a_{iπ(i)}. Hence, thinking
of F as the set of fixed points contributing x, we get

    p_A(x) = \sum_{π ∈ S_n} \mathrm{sgn}(π) \sum_{F ⊆ fp(π)} (−1)^{n−|F|} x^{|F|} \prod_{i ∉ F} a_{iπ(i)}
           = \sum_{π ∈ S_n} \mathrm{sgn}(π) \sum_{\substack{S ⊆ [n], \\ [n]−S ⊆ fp(π)}} (−1)^{|S|} x^{n−|S|} \prod_{i ∈ S} a_{iπ(i)}.

Now we exchange the summation indices by first fixing a particular choice of S. The π will be the ones
with [n] − S ⊆ fp(π), i.e. the permutations which fix everything not in S. Let P(S) be the set of
permutations on S, then

    p_A(x) = \sum_{k=0}^{n} x^{n−k} \sum_{S ∈ \binom{[n]}{k}} \sum_{π ∈ P(S)} \mathrm{sgn}(π) (−1)^{k} \prod_{i ∈ S} a_{iπ(i)}.

Let c(π) be the number of cycles of π. It is easy to see that for π ∈ P(S) with |S| = k, we have
\mathrm{sgn}(π)(−1)^{k} = (−1)^{c(π)}. Thus,

    p_A(x) = \sum_{k=0}^{n} x^{n−k} \sum_{S ∈ \binom{[n]}{k}} \sum_{π ∈ P(S)} (−1)^{c(π)} \prod_{i ∈ S} a_{iπ(i)}.
Our objective is to show p_A(A) = 0. We'll do so by showing (p_A(A))_{ij} = 0 for all i, j ∈ [n]. Firstly,

    (p_A(A))_{ij} = \sum_{k=0}^{n} (A^{n−k})_{ij} \sum_{S ∈ \binom{[n]}{k}} \sum_{π ∈ P(S)} (−1)^{c(π)} \prod_{l ∈ S} a_{lπ(l)}.

Let P_{ij}^{k} be the set of all directed walks of length k from i to j in K_n, the complete directed graph
on n vertices. Let an edge e = (i, j) ∈ E(K_n) be weighted by w(e) = a_{ij}. For any P ∈ P_{ij}^{k}, let
w(P) = \prod_{e ∈ P} w(e). It follows that

    (A^{n−k})_{ij} = \sum_{P ∈ P_{ij}^{n−k}} w(P).

To this end, let (S, π, P) be a triple satisfying (a) S ⊆ [n]; (b) π ∈ P(S); and (c) P ∈ P_{ij}^{n−|S|}.
Define w(S, π, P) := w(P)w(π), where w(π) = \prod_{t ∈ S} a_{tπ(t)}. Let sgn(S, π, P) := (−1)^{c(π)}, then

    (p_A(A))_{ij} = \sum_{(S,π,P)} w(S, π, P) \, \mathrm{sgn}(S, π, P).

To show (p_A(A))_{ij} = 0, we seek a sign-reversing, weight-preserving involution φ on the set of triples
(S, π, P). Let v be the first vertex in P along the walk such that either (i) v ∈ S, or (ii) v completes a
cycle in P. Clearly,

• (i) and (ii) are mutually exclusive, since if v completes a cycle in P and v ∈ S, then v was in S
  before completing the cycle.

• One of (i) and (ii) must hold, since if no v satisfies (i) then P induces a graph on the n − |S| vertices
  outside S with n − |S| edges, so P must contain a cycle.

Lastly, given the observations above we can describe φ as follows. Take the first v ∈ [n] satisfying
(i) or (ii). If v ∈ S, then let C be the cycle of π containing v, let P' be P with C inserted right after v,
let S' = S − C, and let π' be π with the cycle C removed. The image φ(S, π, P) is then (S', π', P').
Case (ii), where v completes a cycle in P before touching S, is treated in the exact opposite fashion,
i.e. we add the cycle to π and remove it from P. Since φ is a sign-reversing, weight-preserving involution,
the terms cancel in pairs and (p_A(A))_{ij} = 0.
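As an illustrative numerical aside (not in the original notes), the Cayley-Hamilton theorem is easy to check with numpy; np.poly returns the coefficients of the characteristic polynomial of a square array.

```python
# Check p_A(A) = 0 numerically for a random integer matrix (illustrative sketch).
import numpy as np

rng = np.random.default_rng(2)
A = rng.integers(-3, 4, size=(4, 4)).astype(float)

coeffs = np.poly(A)            # coefficients of det(xI - A), leading coefficient first
n = A.shape[0]
pA_of_A = sum(c * np.linalg.matrix_power(A, n - k) for k, c in enumerate(coeffs))
print(np.allclose(pA_of_A, np.zeros((n, n))))   # True: p_A(A) is the zero matrix
```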
To prove the Matrix-Tree Theorem, we first need to show a sequence of lemmas. The first (the Cauchy-Binet Theorem) is commonly stated with D = I.
Lemma 5.3 (Cauchy-Binet Theorem). Let A and B be, respectively, r × m and m × r matrices. Let D
be an m × m diagonal matrix with diagonal entries ei , i ∈ [m]. For any r-subset S of [m], let AS and
B S denote, respectively, the r × r submatrices of A and B consisting of the columns of A, or the rows
of B, indexed by S. Then
    \det(ADB) = \sum_{S ∈ \binom{[m]}{r}} \det A_S \, \det B^S \prod_{i ∈ S} e_i.
Proof. We will prove this assuming that e_1, . . . , e_m are indeterminates. With this assumption in mind,
since (ADB)_{ij} = \sum_{k=1}^{m} a_{ik} b_{kj} e_k, it is easy to see that \det(ADB) is a homogeneous polynomial in
e_1, . . . , e_m of degree r.
Consider a monomial e_1^{t_1} e_2^{t_2} · · · e_m^{t_m} in which the number of distinct variables that occur is < r, i.e.
|{i | t_i > 0}| < r. Substitute 0 for all the other indeterminates; the monomial and its coefficient
are unchanged. But, after this substitution, rank(D) < r, which implies rank(ADB) < r, making
\det(ADB) = 0. So the coefficient of our monomial is 0.

Put another way, the coefficient of a monomial e_1^{t_1} · · · e_m^{t_m} is 0 unless it is a product of r distinct
indeterminates, i.e. there exists S ∈ \binom{[m]}{r} such that e_1^{t_1} · · · e_m^{t_m} = \prod_{i ∈ S} e_i.
The coefficient of \prod_{i ∈ S} e_i can be calculated by setting e_i = 1 for all i ∈ S and e_j = 0 for all j ∉ S.
It is not hard to see that this coefficient is \det A_S \det B^S.
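As an illustrative aside (not in the original notes), the Cauchy-Binet identity is easy to verify numerically; the dimensions r = 2, m = 4 below are arbitrary.

```python
# Numerical check of Cauchy-Binet with D = diag(e) (illustrative sketch).
import itertools
import numpy as np

rng = np.random.default_rng(3)
r, m = 2, 4
A = rng.standard_normal((r, m))
B = rng.standard_normal((m, r))
e = rng.standard_normal(m)

lhs = np.linalg.det(A @ np.diag(e) @ B)
rhs = 0.0
for S in itertools.combinations(range(m), r):
    S = list(S)
    rhs += np.linalg.det(A[:, S]) * np.linalg.det(B[S, :]) * np.prod(e[S])
print(np.allclose(lhs, rhs))   # True
```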
Lemma 5.4. Let H be a directed graph with incidence matrix N, and let C(H) be the set of connected
components of H. Then

    rank(N) = |V(H)| − |C(H)|.

Proof. Recall that N is defined to be the matrix whose rows are indexed by V(H), whose columns are
indexed by E(H), and

    N_{i,e} = \begin{cases}
        0  & \text{if } i \text{ is not incident to } e, \text{ or } e \text{ is a loop}, \\
        1  & \text{if } e = j → i,\ j ≠ i, \\
        -1 & \text{if } e = i → j,\ j ≠ i.
    \end{cases}

To show rank(N) = |V(H)| − |C(H)| we only need to show that \dim(\mathrm{col}(N)^⊥) = |C(H)|. For
any row vector g ∈ R^{|V(H)|}, g ∈ \mathrm{col}(N)^⊥ iff gN = 0, i.e. for any edge e = x → y ∈ E(H) we must
have g(x) = g(y). Consequently, g ∈ \mathrm{col}(N)^⊥ iff g is constant on the coordinates corresponding to each
connected component of H. It is thus clear that \dim(\mathrm{col}(N)^⊥) = |C(H)|.
Lemma 5.5 (Poincaré, 1901). Let M be a square matrix with at most two non-zero entries in each
column, at most one 1 and at most one −1. Then det M ∈ {0, 1, −1}.

Proof. This can be done easily by induction. If every column has exactly one 1 and one −1, then the sum
of all row vectors of M is the zero vector, making det M = 0. Otherwise, expand the determinant of M along a
column with at most one non-zero entry (which is ±1) and use the induction hypothesis.
Proof of the Matrix-Tree Theorem. We will first show that the theorem holds for the ii-cofactors for all
i ∈ [n]. Then, we shall show that the ij-cofactors are all equal for all j ∈ [n], which completes the
proof. Let m = |E(G)|. We can safely assume m ≥ n − 1, since otherwise there is no spanning tree and at
the same time det(N N^T) = 0.

Step 1. If G' is any orientation of G, and N is the incidence matrix of G', then L = N N^T. (Recall
that L is the Laplacian of G.) For any i ≠ j ∈ [n], if i is adjacent to j then clearly (N N^T)_{ij} = −1. On
the other hand, (N N^T)_{ii} is obviously the number of edges incident to i.

Step 2. If B is an (n − 1) × (n − 1) submatrix of N, then det B = 0 if the corresponding n − 1
edges contain a cycle, and det B = ±1 if they form a spanning tree of G. Clearly, B is obtained by
removing a row of N_S for some (n − 1)-subset S of E(G'). By Lemma 5.4, rank(N_S) = n − 1 iff
the edges corresponding to S form a spanning tree. Moreover, since the sum of all rows of N_S is the
zero vector, rank(B) = rank(N_S). Hence, det B ≠ 0 iff S forms a spanning tree. When S forms a
spanning tree, Lemma 5.5 implies det B = ±1.

Step 3. Calculating det L_{ii}, i.e. the ii-cofactor of L. Let M be the matrix obtained from N by
deleting row i of N; then L_{ii} = M M^T. Applying the Cauchy-Binet theorem with e_i = 1 for all i, we get
    \det(M M^T) = \sum_{S ∈ \binom{[m]}{n−1}} \det M_S \det (M^T)^S
                = \sum_{S ∈ \binom{[m]}{n−1}} (\det M_S)^2
                = \text{the number of spanning trees of } G.
The following lemma is my solution to Exercise 2.2.18 in [10]. The lemma completes the proof,
because L is a matrix whose columns sum to the zero vector.

Lemma 5.6. Let A = (a_{ij}) be an n × n matrix whose columns sum to the zero vector, and let
b_{ij} = (−1)^{i+j} \det A_{ij}. Then, for a fixed i, we have b_{ij} = b_{ij'} for all j, j'.
Proof. Let B = (b_{ij})^T = (b_{ji}). Then

    (AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{jk}.

Obviously, (AB)_{ij} = δ_{ij} \det A, where δ_{ij} is the Kronecker delta. To see this, imagine replacing row
j of A by row i of A and expanding the determinant along row j; we get exactly the expression above. In
other words, AB = (\det A) I.

Let \vec{a}_i denote column i of A; then by assumption \sum_i \vec{a}_i = 0. Hence, \det A = 0 and \dim(\mathrm{col}(A)) ≤
n − 1. If \dim(\mathrm{col}(A)) < n − 1, then rank(A_{ij}) < n − 1, making b_{ij} = 0 for all i, j, and we are done.
Otherwise, \dim(\mathrm{col}(A)) = n − 1. Since \vec{a}_1 = −(\vec{a}_2 + · · · + \vec{a}_n), the n − 1 columns \vec{a}_2, . . . , \vec{a}_n span
\mathrm{col}(A), and hence they are linearly independent. Moreover, AB = (\det A) I = 0 and \sum_i \vec{a}_i = 0 imply
that, for all i,

    (b_{i2} − b_{i1}) \vec{a}_2 + (b_{i3} − b_{i1}) \vec{a}_3 + · · · + (b_{in} − b_{i1}) \vec{a}_n = 0.

So, b_{ij} − b_{i1} = 0 for all j ≥ 2.
Corollary 5.7 (Cayley Formula). The number of labeled trees on [n] is n^{n−2}.

Proof. Cayley's formula is usually proved using the Prüfer correspondence. Here I use the Matrix-Tree
theorem to give a different proof. Clearly the number of labeled trees on [n] is the number of spanning
trees of K_n. Hence, by the Matrix-Tree theorem, it is det(nI − J), where J is the all-1's matrix, and I
and J are matrices of order n − 1 (we are taking the 11-cofactor).
    \det(nI − J) = \det \begin{pmatrix}
        n−1 & −1 & \cdots & −1 \\
        −1 & n−1 & \cdots & −1 \\
        \vdots & & \ddots & \vdots \\
        −1 & −1 & \cdots & n−1
    \end{pmatrix}.

Performing Gaussian elimination (eliminating below each pivot in turn) leaves an upper triangular matrix
with diagonal entries

    n−1, \ \frac{n(n−2)}{n−1}, \ \frac{n(n−3)}{n−2}, \ \ldots, \ \frac{n(n−(n−1))}{n−(n−2)},

whose product telescopes:

    \det(nI − J) = (n−1) \cdot \frac{n(n−2)}{n−1} \cdot \frac{n(n−3)}{n−2} \cdots \frac{n \cdot 1}{2} = n^{n−2}.
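As a small illustrative check (not in the original notes), the Matrix-Tree theorem and Cayley's formula can be verified numerically for K_n with numpy; n = 6 is an arbitrary choice.

```python
# Spanning trees of K_n via a cofactor of the Laplacian, compared with n^(n-2).
import numpy as np

n = 6
A = np.ones((n, n)) - np.eye(n)          # adjacency matrix of the complete graph K_n
L = np.diag(A.sum(axis=1)) - A           # Laplacian L = D - A
cofactor = np.linalg.det(L[1:, 1:])      # delete row 0 and column 0 (the 11-cofactor)
print(round(cofactor), n ** (n - 2))     # 1296 1296
```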
6 Positive definite matrices

The purpose of this section is to develop necessary and sufficient conditions for a real symmetric matrix
A (or a Hermitian matrix in general) to be positive definite. These are essentially the conditions for a
quadratic form on R^n to have a minimum at some point.
6.1 Some analysis
Let us first recall two key theorems from real analysis, stated without proofs. For f : R^n → R, if for
some a ∈ R^n we have ∂f/∂x_i(a) = 0 for all i, then a is called a stationary point of f.
Theorem 6.1 (The second derivative test). Suppose f : R^n → R and its partial derivatives up to and
including order 2 are continuous in a ball B(a, r) (centered at a ∈ R^n, with radius r). Suppose that f has a
stationary point at a. For h = (h_1, . . . , h_n), define Δf(a, h) = f(a + h) − f(a); also define

    Q(h) = \frac{1}{2!} \sum_{i,j=1}^{n} \frac{∂^2 f}{∂x_i ∂x_j}(a) \, h_i h_j.

Then,
1. If Q(h) > 0 for all h ≠ 0, then f has a strict local minimum at a.

2. If Q(h) < 0 for all h ≠ 0, then f has a strict local maximum at a.

3. If Q(h) has a positive maximum and a negative minimum, then Δf(a, h) changes sign in any ball
B(a, ρ) with ρ < r.
Note. (3) says that in any neighborhood of a, there are points b and c such that f(b) > f(a)
and f (c) < f (a).
Example 6.2. Let us look at a quadratic form F(x_1, . . . , x_n) with all real coefficients, i.e. every term of
F has degree at most 2. Let A be the matrix defined by a_{ij} = \frac{∂^2 F}{∂x_i ∂x_j} (a constant, since F is
quadratic). Clearly, A is a real symmetric matrix. For any vector h ∈ R^n,

    h^T A h = (h_1 \; h_2 \; \cdots \; h_n) \begin{pmatrix}
        a_{11} & a_{12} & \cdots & a_{1n} \\
        a_{21} & a_{22} & \cdots & a_{2n} \\
        \vdots & & & \vdots \\
        a_{n1} & a_{n2} & \cdots & a_{nn}
    \end{pmatrix} \begin{pmatrix} h_1 \\ h_2 \\ \vdots \\ h_n \end{pmatrix}
            = \sum_{i,j=1}^{n} a_{ij} h_i h_j = 2 Q(h).

So, F(x_1, . . . , x_n) has a minimum at (0, . . . , 0) (which is a stationary point of F) iff h^T A h > 0 for
all h ≠ 0.
Definition 6.3. A non-singular n × n Hermitian matrix A is said to be positive definite if x^*Ax > 0 for
all non-zero vectors x ∈ C^n. A is positive semidefinite if we only require x^*Ax ≥ 0. The terms negative
definite and negative semidefinite are defined similarly.
Note. Continuing with our example, clearly F (x1 , . . . , xn ) has a minimum at (0, . . . , 0) iff A is positive
definite. Also, since we already showed that if A is Hermitian, then x∗ Ax is real, the definitions given
above make sense.
A function f is in C^1 on some domain D ⊆ R^n if f and all its first order partial derivatives are continuous
on D. For a ∈ D, ∇f(a) := (∂f/∂x_1(a), . . . , ∂f/∂x_n(a)).
Theorem 6.4 (Lagrange’s multiplier rule). Suppose that f, ϕ1 , . . . , ϕk are C 1 functions on an open set
D in Rn containing a point a, that the vectors 5ϕ1 (a), . . . , 5ϕk (a) are linearly independent, and that
f takes on its minimum among all points of D0 at x0 , where D0 is the subset of D so that for all x ∈ D0 ,
ϕi (x1 , . . . , xn ) = 0, i = 1, . . . , k
Then, if F : Rn+k → R is defined to be
F (x, λ) = f (x) −
k
X
λi ϕi (x)
i=1
then there exists λ0 ∈ Rk such that
∂F
(a, λ0 ) = 0 i = 1, . . . , n
∂xi
∂F
(a, λ0 ) = 0 i = 1, . . . , k
∂λ0j
16
Note. This theorem essentially says that the maxima (or minima) of f subject to the side conditions
φ_1 = · · · = φ_k = 0 are among the stationary points of the unconstrained function F.
Example 6.5. To find the maximum of f(x) = x_1 + 3x_2 − 2x_3 on the sphere 14 − (x_1^2 + x_2^2 + x_3^2) = 0,
we let

    F(x, λ) = x_1 + 3x_2 − 2x_3 + λ(x_1^2 + x_2^2 + x_3^2 − 14).

Then ∂F/∂x_1 = 1 + 2λx_1, ∂F/∂x_2 = 3 + 2λx_2, ∂F/∂x_3 = −2 + 2λx_3, and ∂F/∂λ = x_1^2 + x_2^2 + x_3^2 − 14.
Solving ∂F/∂x_i = 0 and ∂F/∂λ = 0, we obtain two solutions (x, λ) = (1, 3, −2, −1/2) and (−1, −3, 2, 1/2).
Which of these solutions gives a maximum or a minimum? We apply the second derivative test. All second
derivatives of F are 0 except ∂^2F/∂x_1^2 = ∂^2F/∂x_2^2 = ∂^2F/∂x_3^2 = 2λ, so Q(h) = 2λ(h_1^2 + h_2^2 + h_3^2)
has the same sign as λ. Hence, the first solution gives the maximum value of 14, and the second solution
gives the minimum value of −14.
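As an illustrative aside (not in the original notes), a crude Monte Carlo check of Example 6.5 with numpy confirms the extreme values; the sample size is arbitrary.

```python
# Sample points on the sphere x1^2 + x2^2 + x3^2 = 14 and look at f = x1 + 3*x2 - 2*x3.
import numpy as np

rng = np.random.default_rng(4)
pts = rng.standard_normal((200000, 3))
pts *= np.sqrt(14) / np.linalg.norm(pts, axis=1, keepdims=True)   # project onto the sphere
f = pts[:, 0] + 3 * pts[:, 1] - 2 * pts[:, 2]
print(f.max(), f.min())   # approximately 14 and -14, attained near (1, 3, -2) and (-1, -3, 2)
```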
6.2 Conditions for positive-definiteness
Now we are ready to specify the necessary and sufficient conditions for a Hermitian matrix to be positive
definite, or positive semidefinite for that matter.
Theorem 6.6. Each of the following tests is a necessary and sufficient condition for the real symmetric
matrix A to be positive definite.
(a) x^T Ax > 0 for all non-zero vectors x.
(b) All the eigenvalues of A are positive.
(c) All the upper left submatrices Ak of A have positive determinants.
(d) If we apply Gaussian elimination on A without row exchanges, all the pivots satisfy pi > 0.
Note. (a) and (b) hold for Hermitian matrices also.
Proof. (a ⇒ b). Suppose x_i is a unit λ_i-eigenvector; then 0 < x_i^T A x_i = x_i^T λ_i x_i = λ_i.
(b ⇒ a). Since A is real symmetric, it has a full set of orthonormal eigenvectors {x1 , . . . , xn } by the
Spectral theorem. For each non-zero vector x ∈ Rn , suppose x = c1 x1 + · · · + cn xn , then
Ax = A(c1 x1 + · · · + cn xn ) = c1 λ1 x1 + · · · + cn λn xn
Because the x_i are orthonormal, we get

    x^T A x = (c_1 x_1^T + · · · + c_n x_n^T)(c_1 λ_1 x_1 + · · · + c_n λ_n x_n) = λ_1 c_1^2 + · · · + λ_n c_n^2.

Thus, every λ_i > 0 implies x^T A x > 0 whenever x ≠ 0.
(a ⇒ c). We know det A = λ_1 · · · λ_n > 0. To prove the same result for all A_k, we look at a non-zero
vector x whose last n − k components are 0. Then

    0 < x^T A x = \begin{pmatrix} x_k^T & 0 \end{pmatrix} \begin{pmatrix} A_k & * \\ * & * \end{pmatrix} \begin{pmatrix} x_k \\ 0 \end{pmatrix} = x_k^T A_k x_k.

Thus, A_k is positive definite; by (a ⇒ b) its eigenvalues are positive, so det A_k > 0.
Thus, det Ak > 0 follows by induction.
(c ⇒ d). Without row exchanges, the pivot pk in Gaussian elimination is det Ak / det Ak−1 . This
can also be proved easily by induction.
(d ⇒ a). Gaussian elimination gives us an LDU factorization of A in which all diagonal entries of L
and U are 1, and the diagonal entries d_i of D are exactly the pivots p_i. The fact that A is symmetric
implies U = L^T, hence A = LDL^T, which gives

    x^T A x = (L^T x)^T D (L^T x) = d_1 (L^T x)_1^2 + d_2 (L^T x)_2^2 + · · · + d_n (L^T x)_n^2.

Since L has full rank, L^T x ≠ 0 whenever x ≠ 0. So positive pivots d_i > 0 imply x^T A x > 0 for all
non-zero vectors x.
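As an illustrative sketch (not part of the notes), the equivalent conditions of Theorem 6.6 can be spot-checked with numpy on a matrix that is positive definite by construction; np.linalg.cholesky stands in for the pivot test (it succeeds exactly when a Hermitian matrix is positive definite).

```python
# Checking several of the equivalent positive-definiteness tests on A = B^T B + I.
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4))
A = B.T @ B + np.eye(4)                  # symmetric positive definite by construction

print(np.all(np.linalg.eigvalsh(A) > 0))                        # (b) all eigenvalues positive
print(all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, 5)))   # (c) leading principal minors positive
np.linalg.cholesky(A)                                           # succeeds only for positive definite A
x = rng.standard_normal(4)
print(x @ A @ x > 0)                                            # (a) for a sample non-zero x
```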
7 The Rayleigh quotient and the variational characterizations
For a Hermitian matrix A, the following is known as the Rayleigh quotient:

    R(x) = \frac{x^* A x}{x^* x}.

Theorem 7.1 (Rayleigh-Ritz). Suppose λ_1 ≥ λ_2 ≥ · · · ≥ λ_n are the eigenvalues of a real symmetric
matrix A. Then, the quotient R(x) is maximized at any λ_1-eigenvector x = x_1, with maximum value λ_1;
R(x) is minimized at any λ_n-eigenvector x = x_n, with minimum value λ_n.
Proof. Let Q be a matrix whose columns are a set of orthonormal eigenvectors {x_1, . . . , x_n} of A
corresponding to λ_1, . . . , λ_n, respectively. Write x as a linear combination of the columns of Q: x = Qy.
Since Q^T A Q = Λ, we have

    R(x) = \frac{x^T A x}{x^T x} = \frac{(Qy)^T A (Qy)}{(Qy)^T (Qy)} = \frac{y^T Λ y}{y^T y}
         = \frac{λ_1 y_1^2 + · · · + λ_n y_n^2}{y_1^2 + · · · + y_n^2}.

Hence,

    λ_1 ≥ R(x) = \frac{λ_1 y_1^2 + · · · + λ_n y_n^2}{y_1^2 + · · · + y_n^2} ≥ λ_n.

Moreover, R(x) = λ_1 when y_1 ≠ 0 and y_i = 0 for all i > 1, which means x = Qy is a λ_1-eigenvector. The
case R(x) = λ_n is proved similarly.
The theorem above is also referred to as the Rayleigh principle, which also holds when A is Hermitian. The proof is identical, except that we have to replace transposition (T) by Hermitian transposition
(*). An equivalent statement of the principle is as follows.
Corollary 7.2. Suppose λ_1 ≥ λ_2 ≥ · · · ≥ λ_n are the eigenvalues of a real symmetric matrix A. Over
all unit vectors x ∈ R^n, x^T Ax is maximized at a unit λ_1-eigenvector, with maximum value λ_1,
and minimized at a unit λ_n-eigenvector, with minimum value λ_n.
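As an illustrative aside (not in the original notes), here is a numpy sketch of the Rayleigh-Ritz bounds on a random symmetric matrix; the sampling is only a spot check, not a proof.

```python
# The Rayleigh quotient is bounded by the extreme eigenvalues and attains them at eigenvectors.
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2

def rayleigh(A, x):
    return (x @ A @ x) / (x @ x)

eigvals, Q = np.linalg.eigh(A)                    # eigenvalues in increasing order
samples = [rayleigh(A, rng.standard_normal(5)) for _ in range(1000)]
print(eigvals[0] <= min(samples), max(samples) <= eigvals[-1])   # True True
print(np.isclose(rayleigh(A, Q[:, -1]), eigvals[-1]))            # maximum attained at a top eigenvector
print(np.isclose(rayleigh(A, Q[:, 0]), eigvals[0]))              # minimum attained at a bottom eigenvector
```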
Rayleigh’s principle essentially states that
λ1 = maxn R(x) and λn = minn R(x)
x∈R
x∈R
What about the rest of the eigenvalues ? Here is a simple answer, stated without proof. The proof is
simple enough.
Theorem 7.3. Suppose λ_1 ≥ λ_2 ≥ · · · ≥ λ_n are the eigenvalues of a Hermitian matrix A, and
u_1, . . . , u_n are the corresponding orthonormal eigenvectors. Then,

    λ_k = \max_{\substack{0 ≠ x ∈ C^n \\ x ⊥ u_1, . . . , u_{k−1}}} R_A(x),
    λ_k = \min_{\substack{0 ≠ x ∈ C^n \\ x ⊥ u_{k+1}, . . . , u_n}} R_A(x).
The theorem has the drawback that sometimes we do not know the eigenvectors. The following generalization
of Rayleigh's principle, sometimes referred to as the minimax and maximin principles for eigenvalues,
fills this gap by not requiring us to know the eigenvectors.
Theorem 7.4 (Courant-Fischer). Let V_k be the set of all k-dimensional subspaces of C^n. Let λ_1 ≥ λ_2 ≥
· · · ≥ λ_n be the eigenvalues of a Hermitian matrix A. Then,

    λ_k = \max_{S ∈ V_k} \left( \min_{0 ≠ x ∈ S} R(x) \right)
        = \min_{S ∈ V_{n−k+1}} \left( \max_{0 ≠ x ∈ S} R(x) \right).
Note. The previous two theorems are often referred to as the variational characterization of the eigenvalues.
Proof. Let U = [u_1, u_2, . . . , u_n] be the unitary matrix with unit eigenvectors u_1, . . . , u_n corresponding
to the eigenvalues λ_1, . . . , λ_n. Let us first fix S ∈ V_k and let S' be the image of S under the invertible
linear transformation represented by U^*. Obviously, dim(S') = k. We already know that R(x) is
bounded, so it is safe to write the following, with x ≠ 0, y ≠ 0 being implicit:

    \inf_{x ∈ S} R(x) = \inf_{x ∈ S} \frac{x^* A x}{x^* x}
                      = \inf_{x ∈ S} \frac{(U^* x)^* Λ (U^* x)}{(U^* x)^* (U^* x)}
                      = \inf_{y ∈ S'} \frac{y^* Λ y}{y^* y}
                      ≤ \inf_{\substack{y ∈ S' \\ y_1 = · · · = y_{k−1} = 0}} \frac{y^* Λ y}{y^* y}
                      = \inf_{\substack{y ∈ S' \\ y_1 = · · · = y_{k−1} = 0}} \frac{λ_1 |y_1|^2 + · · · + λ_n |y_n|^2}{|y_1|^2 + · · · + |y_n|^2}
                      = \inf_{\substack{y ∈ S' \\ y_1 = · · · = y_{k−1} = 0}} \frac{λ_k |y_k|^2 + · · · + λ_n |y_n|^2}{|y_k|^2 + · · · + |y_n|^2}
                      ≤ λ_k.

The first inequality is justified by the fact that there is a non-zero vector y ∈ S' such that y_1 = · · · =
y_{k−1} = 0. To get this vector, put k basis vectors of S' into the rows of a k × n matrix and do Gaussian
elimination.

Now, S was chosen arbitrarily, so it is also true that

    \sup_{S ∈ V_k} \, \inf_{x ∈ S} R(x) ≤ λ_k.

Moreover, for S = span{u_1, . . . , u_k} every x ∈ S satisfies R(x) ≥ λ_k, and R(u_k) = (U^*u_k)^* Λ (U^*u_k) =
e_k^* Λ e_k = λ_k. Thus, the infimum and supremum can be changed to minimum and maximum, and the
inequality can be changed to equality. The other equality can be proven similarly.
This theorem has a very important and beautiful corollary, called the Interlacing of Eigenvalues Theorem,
to be presented in the next section. Let us first introduce a simple corollary.
Corollary 7.5. Let A be an n × n Hermitian matrix, let k be a given integer with 1 ≤ k ≤ n, let
λ1 ≥ λ2 ≥ · · · ≥ λn be the eigenvalues of A, and let Sk be a given k-dimensional subspace of Cn . The
following hold
(a) If there exists c1 such that R(x) ≤ c1 for all x ∈ Sk , then c1 ≥ λn−k+1 ≥ · · · ≥ λn .
(b) If there exists c2 such that R(x) ≥ c2 for all x ∈ Sk , then λ1 ≥ · · · ≥ λk ≥ c2 .
Proof. It is almost immediate from the Courant-Fischer theorem that

(a)  c_1 ≥ \max_{0 ≠ x ∈ S_k} R(x) ≥ \min_{\dim(S) = n−(n−k+1)+1} \; \max_{0 ≠ x ∈ S} R(x) = λ_{n−k+1};

(b)  c_2 ≤ \min_{0 ≠ x ∈ S_k} R(x) ≤ \max_{\dim(S) = k} \; \min_{0 ≠ x ∈ S} R(x) = λ_k.

8 Other proofs of the variational characterizations
The presentation below follows that in [2]. Let A be a Hermitian matrix of order n with eigenvalues
λ_1 ≥ · · · ≥ λ_n. Let U = [u_1, . . . , u_n] be the unitary matrix where u_i is a unit λ_i-eigenvector of A.
Then,

    A = \sum_j λ_j u_j u_j^*.                                                    (10)
This equation is called the spectral resolution of A. From the equation we obtain

    ⟨x, Ax⟩ = \sum_j λ_j (u_j^* x)^* (u_j^* x) = \sum_j λ_j |u_j^* x|^2.

When |x| = 1, we have

    \sum_j |u_j^* x|^2 = \sum_j (u_j^* x)^* (u_j^* x) = x^* U U^* x = 1.

As we can always normalize x, we shall only consider unit vectors x in this section from here on.
Lemma 8.1. Suppose 1 ≤ i < k ≤ n. Then

    {⟨x, Ax⟩ | |x| = 1, x ∈ span{u_i, . . . , u_k}} = [λ_k, λ_i].

Additionally, ⟨u_i, Au_i⟩ = λ_i for all i.
Proof. The fact that ⟨u_i, Au_i⟩ = λ_i is trivial. Assume x is a unit vector in span{u_i, . . . , u_k}; then

    ⟨x, Ax⟩ = \sum_{i ≤ j ≤ k} λ_j |u_j^* x|^2,

which easily yields

    {⟨x, Ax⟩ | |x| = 1, x ∈ span{u_i, . . . , u_k}} ⊆ [λ_k, λ_i].

For the converse, let λ ∈ [λ_k, λ_i]. Let y ∈ C^n be a column vector all of whose components are 0, except
that

    y_i = \sqrt{\frac{λ − λ_k}{λ_i − λ_k}},    y_k = \sqrt{\frac{λ_i − λ}{λ_i − λ_k}}.

Then x = Uy ∈ span{u_i, u_k} ⊆ span{u_i, . . . , u_k}, and x is obviously a unit vector. Moreover,

    ⟨x, Ax⟩ = (Uy)^* A (Uy) = y^* Λ y = λ_i y_i^2 + λ_k y_k^2 = λ.
Lemma 8.1 gives an interesting proof of Theorem 7.4.
Another proof of the Courant-Fischer Theorem. We shall show that

    λ_k = \max_{S ∈ V_k} \left( \min_{0 ≠ x ∈ S} R(x) \right).

The other equality is obtained similarly. Fix S ∈ V_k and let W = span{u_k, . . . , u_n}; then W and S have
total dimension n + 1. Thus, there exists a unit vector x ∈ W ∩ S. Lemma 8.1 implies R(x) = ⟨x, Ax⟩ ∈
[λ_n, λ_k]. Consequently,

    \min_{0 ≠ x ∈ S} R(x) ≤ λ_k.

Equality is obtained by taking S = span{u_1, . . . , u_k}: by Lemma 8.1, every unit vector x in this S has
R(x) ∈ [λ_k, λ_1], and R(u_k) = λ_k.
9 Applications of the variational characterizations and minimax principle

Throughout the rest of this section, we use M_n to denote the set of all n × n matrices over C (i.e.
M_n ≅ C^{n^2}). The first application is a very important and beautiful theorem named the Interlacing of
Eigenvalues Theorem.
Theorem 9.1 (Interlacing of eigenvalues). Let A be a Hermitian matrix with eigenvalues α1 ≥ α2 ≥
· · · ≥ αn . Let B be the matrix obtained from A by removing row i and column i, for any i ∈ [n]. Suppose
B has eigenvalues β1 ≥ · · · ≥ βn−1 , then
α1 ≥ β1 ≥ α2 ≥ β2 ≥ · · · ≥ βn−1 ≥ αn
Note. A proof of this theorem using Spectral Decomposition Theorem can also be given, but it is not
very instructive.
Proof. We can safely assume that i = n for ease of presentation. We would like to show that, for
1 ≤ k ≤ n − 1, α_k ≥ β_k ≥ α_{k+1}. Let x = [y^T x_n]^T ∈ C^n where y ∈ C^{n−1}. Note that if x_n = 0 then
x^* A x = y^* B y. We first use the maximin form of the Courant-Fischer theorem to write

    α_k = \max_{\substack{S ⊆ C^n \\ \dim(S) = k}} \; \min_{0 ≠ x ∈ S} \frac{x^* A x}{x^* x}
        ≥ \max_{\substack{S ⊆ \{e_n\}^⊥ \\ \dim(S) = k}} \; \min_{0 ≠ x ∈ S} \frac{x^* A x}{x^* x}
        = \max_{\substack{S ⊆ \{e_n\}^⊥ \\ \dim(S) = k}} \; \min_{\substack{0 ≠ x ∈ S \\ x_n = 0}} \frac{x^* A x}{x^* x}
        = \max_{\substack{S ⊆ C^{n−1} \\ \dim(S) = k}} \; \min_{0 ≠ y ∈ S} \frac{y^* B y}{y^* y}
        = β_k.

Now, we use the minimax form of the theorem to obtain α_{k+1} ≤ β_k:

    α_{k+1} = \min_{\substack{S ⊆ C^n \\ \dim(S) = n−(k+1)+1}} \; \max_{0 ≠ x ∈ S} \frac{x^* A x}{x^* x}
            ≤ \min_{\substack{S ⊆ \{e_n\}^⊥ \\ \dim(S) = n−k}} \; \max_{0 ≠ x ∈ S} \frac{x^* A x}{x^* x}
            = \min_{\substack{S ⊆ \{e_n\}^⊥ \\ \dim(S) = n−k}} \; \max_{\substack{0 ≠ x ∈ S \\ x_n = 0}} \frac{x^* A x}{x^* x}
            = \min_{\substack{S ⊆ C^{n−1} \\ \dim(S) = (n−1)−k+1}} \; \max_{0 ≠ y ∈ S} \frac{y^* B y}{y^* y}
            = β_k.
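As a quick illustrative check (not in the original notes), interlacing is easy to observe numerically with numpy; the size and the deleted index are arbitrary.

```python
# Deleting row/column i of a symmetric matrix gives eigenvalues interlacing the original ones.
import numpy as np

rng = np.random.default_rng(7)
C = rng.standard_normal((6, 6))
A = (C + C.T) / 2
alpha = np.sort(np.linalg.eigvalsh(A))[::-1]            # alpha_1 >= ... >= alpha_n

i = 2                                                   # remove row i and column i
B = np.delete(np.delete(A, i, axis=0), i, axis=1)
beta = np.sort(np.linalg.eigvalsh(B))[::-1]             # beta_1 >= ... >= beta_{n-1}

print(all(alpha[k] >= beta[k] >= alpha[k + 1] for k in range(len(beta))))   # True
```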
The converse of the Interlacing of Eigenvalues Theorem is also true.

Theorem 9.2. Given real numbers

    α_1 ≥ β_1 ≥ α_2 ≥ β_2 ≥ · · · ≥ β_{n−1} ≥ α_n,                              (11)

let B = diag(β_1, . . . , β_{n−1}). Then, there exist a vector y ∈ R^{n−1} and a real number a such that the
matrix

    A = \begin{pmatrix} B & y \\ y^T & a \end{pmatrix}

has eigenvalues α_1, . . . , α_n.
Proof. Firstly, a = \sum_{i=1}^{n} α_i − \sum_{i=1}^{n−1} β_i, since the trace of A must equal the sum of its
eigenvalues. To determine the vector y = [y_1, . . . , y_{n−1}]^T, we evaluate f(x) = \det(Ix − A):

    \det(Ix − A) = \det \begin{pmatrix}
        x − β_1 & & & −y_1 \\
        & \ddots & & \vdots \\
        & & x − β_{n−1} & −y_{n−1} \\
        −y_1 & \cdots & −y_{n−1} & x − a
    \end{pmatrix}
    = (x − β_1) \cdots (x − β_{n−1}) \left( x − a − \frac{y_1^2}{x − β_1} − \cdots − \frac{y_{n−1}^2}{x − β_{n−1}} \right).
Let g(x) = (x − β_1) · · · (x − β_{n−1}). Then

    f(x) = g(x)(x − a) + r(x),

where

    r(x) = −y_1^2 \frac{g(x)}{x − β_1} − \cdots − y_{n−1}^2 \frac{g(x)}{x − β_{n−1}}                (12)

is a polynomial of degree at most n − 2. We want f(x) = (x − α_1) · · · (x − α_n), which could be used to
solve for the y_i.
If the β_i are all distinct, then r(x) is determined by its values at the n − 1 points β_i: r(β_i) = f(β_i).
Lagrange interpolation gives

    r(x) = \sum_{i=1}^{n−1} f(β_i) \frac{g(x)}{g'(β_i)(x − β_i)}.                              (13)
Comparing (12) and (13), we see that y_i^2 = −f(β_i)/g'(β_i), so we can solve for the y_i provided that, for
all i = 1, . . . , n − 1,

    \frac{f(β_i)}{g'(β_i)} = \frac{(β_i − α_1) \cdots (β_i − α_{i−1})(β_i − α_i)(β_i − α_{i+1}) \cdots (β_i − α_n)}
                                  {(β_i − β_1) \cdots (β_i − β_{i−1})(β_i − β_{i+1}) \cdots (β_i − β_{n−1})} ≤ 0.

The interlacing condition (11) implies that (β_i − α_j) and (β_i − β_j) have the same sign except when j = i.
Hence, f(β_i)/g'(β_i) ≤ 0 as desired.
If, say, β_1 = · · · = β_k > β_{k+1} ≥ · · · , then the interlacing condition (11) forces β_1 = · · · = β_k =
α_2 = · · · = α_k. Hence, we can divide both sides of f(x) = g(x)(x − a) + r(x) by (x − β_1)^{k−1} to eliminate
the multiple root β_1 of g(x). After all multiple roots have been eliminated this way, we can proceed as
before.
Hermann Weyl (1912, [11]) derived a set of very interesting inequalities concerning the eigenvalues
of three Hermitian matrices A, B, and C where C = A + B. We shall follow the notations used in [2].
For any Hermitian matrix A, let λ_j^↓(A) and λ_j^↑(A) denote the jth eigenvalue of A when all eigenvalues
are weakly ordered decreasingly and increasingly, respectively. Given three Hermitian matrices A, B, and C
with C = A + B, we implicitly define α_j = λ_j^↓(A), β_j = λ_j^↓(B), and γ_j = λ_j^↓(C), unless otherwise
specified.
Theorem 9.3 (Weyl, 1912). Given Hermitian matrices A, B, and C of order n such that C = A + B.
Then,

    γ_{i+j−1} ≤ α_i + β_j    for i + j − 1 ≤ n.                                  (14)
Proof. For k = 1, . . . , n, let u_k, v_k, and w_k be unit α_k-, β_k-, and γ_k-eigenvectors of A, B, and C,
respectively. The three vector spaces spanned by {u_i, . . . , u_n}, {v_j, . . . , v_n}, and {w_1, . . . , w_{i+j−1}}
have total dimension 2n + 1. Hence, they have a non-trivial intersection. Let x be a unit vector in the
intersection. Then, by Lemma 8.1,

    ⟨x, Ax⟩ ∈ [α_n, α_i],    ⟨x, Bx⟩ ∈ [β_n, β_j],    ⟨x, Cx⟩ ∈ [γ_{i+j−1}, γ_1].

Thus,

    γ_{i+j−1} ≤ ⟨x, Cx⟩ = ⟨x, Ax⟩ + ⟨x, Bx⟩ ≤ α_i + β_j.
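As an illustrative aside (not part of the notes), Weyl's inequality (14) can be spot-checked numerically with numpy on random Hermitian matrices; the size, seed, and tolerance are arbitrary.

```python
# Check gamma_{i+j-1} <= alpha_i + beta_j for random Hermitian A, B and C = A + B.
import numpy as np

rng = np.random.default_rng(8)
def rand_herm(n):
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M + M.conj().T) / 2

n = 5
A, B = rand_herm(n), rand_herm(n)
C = A + B
alpha, beta, gamma = (np.sort(np.linalg.eigvalsh(M))[::-1] for M in (A, B, C))

ok = True
for i in range(1, n + 1):
    for j in range(1, n + 1):
        if i + j - 1 <= n:
            ok &= gamma[i + j - 2] <= alpha[i - 1] + beta[j - 1] + 1e-9
print(ok)   # True
```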
A few interesting consequences are summarized as follows.

Corollary 9.4.
(i) For all k = 1, . . . , n,

    α_k + β_1 ≥ γ_k ≥ α_k + β_n.

(ii)

    γ_1 ≤ α_1 + β_1,    γ_n ≥ α_n + β_n.

Proof.
(i) The first inequality is obtained by specializing i = k, j = 1 in Theorem 9.3. The second
inequality follows from the first by noting that −C = −A − B.

(ii) Applying Theorem 9.3 with i = j = 1 yields the first inequality. The second follows by the
−C = −A − B argument.
The following is a trivial consequence of the minimax principle.

Corollary 9.5 (Monotonicity principle). Define a partial order on Hermitian matrices as follows:
A ≤ B iff ⟨x, Ax⟩ ≤ ⟨x, Bx⟩ for all x. Then, for all j = 1, . . . , n, we have λ_j(A) ≤ λ_j(B) whenever A ≤ B.
Equivalently, if A and B are Hermitian with B positive semidefinite, then

    λ_k^↓(A) ≤ λ_k^↓(A + B).                                                     (15)

10 Sylvester's law of inertia

Material in this section follows closely that in the book Matrix Analysis by Horn and Johnson [5].

Definition 10.1. Let A, B ∈ M_n be given. If there exists a non-singular matrix S such that

• B = SAS^*, then B is said to be *-congruent to A;

• B = SAS^T, then B is said to be T-congruent to A.
Note. These two notions of congruence are closely related; they coincide when S is a real matrix.
When it is not important to distinguish between the two, we use the term congruence without a prefix.
Since S was required to be non-singular, congruent matrices have the same rank. Also note that, if A is
Hermitian then so is SAS ∗ ; if A is symmetric, then SAS T is also symmetric.
Proposition 10.2. Both *-congruence and T-congruence are equivalence relations.

Proof. It is easy to verify that the relations are reflexive, symmetric, and transitive; one only needs to
use the fact that S is non-singular.
The set Mn , therefore, is partitioned into equivalence classes by congruence. As an abstract problem, we may seek a canonical representative of each equivalence class under each type of congruence.
Sylvester’s law of inertia gives us the affirmative answer for the ∗-congruence case, and thus also gives
the answer for the set of real symmetric matrices.
Definition 10.3. Let A ∈ Mn be a Hermitian matrix. The inertia of A is the ordered triple
i(A) = (i+ (A), i− (A), i0 (A))
where i+ (A) is the number of positive eigenvalues of A, i− (A) is the number of negative eigenvalues of
A, and i0 (A) is the number of zero eigenvalues of A, all counting multiplicity. The signature of A is the
quantity i+ (A) − i− (A).
Note. Since rank(A) = i+ (A) + i− (A), the signature and the rank of A uniquely identify the inertia of
A.
Theorem 10.4 (Sylvester’s law of inertia). Let A, B ∈ Mn be Hermitian matrices. A and B are ∗congruent if and only if A and B have the same inertia.
Proof. (⇐). Firstly, any Hermitian matrix C ∈ M_n can be diagonalized by a unitary matrix
U, i.e. C = UΛU^*, with Λ diagonal containing all eigenvalues of C. By multiplying U by a
permutation matrix, we can safely assume that, down the main diagonal of Λ, all positive eigenvalues
come first: λ_1, . . . , λ_{i_+}, then the negative ones: λ_{i_+ +1}, . . . , λ_{i_+ + i_−}, and the rest are 0's. Thus, if we set
D = diag(\sqrt{|λ_1|}, . . . , \sqrt{|λ_{i_+ + i_−}|}, 1, . . . , 1) (we put 1's, rather than 0's, in the last i_0
positions so that D is non-singular), then

    Λ = D \, I_C \, D,   where   I_C = diag(\underbrace{1, . . . , 1}_{i_+}, \underbrace{−1, . . . , −1}_{i_−}, \underbrace{0, . . . , 0}_{i_0}).

I_C is called the inertia matrix of C. Consequently, letting S = UD (S is clearly non-singular) we get

    C = UΛU^* = U D I_C D U^* = S I_C S^*.                                       (16)

Hence, if A and B have the same inertia, then they can both be written in the form (16), with possibly
a different S for each, but with I_A = I_B. Since *-congruence is transitive, A is *-congruent to B, as they
are both congruent to I_A.
(⇒). Now, assume A = SBS^* for some non-singular matrix S. A and B have the same rank,
so i_0(A) = i_0(B). We are left to show that i_+(A) = i_+(B). For convenience, let a = i_+(A) and
b = i_+(B). Let u_1, . . . , u_a be orthonormal eigenvectors for the positive eigenvalues of A, so that
dim(Span{u_1, . . . , u_a}) = a. If x = c_1 u_1 + · · · + c_a u_a ≠ 0, then x^*Ax = λ_1|c_1|^2 + · · · + λ_a|c_a|^2 > 0.
But then

    x^*Ax = x^* S B S^* x = (S^*x)^* B (S^*x) > 0,

so y^*By > 0 for every non-zero vector y in Span{S^*u_1, . . . , S^*u_a}, which also has dimension a. By
Corollary 7.5, b ≥ a. A similar argument shows that a ≥ b, which completes the proof.
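As an illustrative sketch (not in the original notes), Sylvester's law can be observed numerically with numpy: a congruence transform changes the eigenvalues but not their signs. The matrices below are arbitrary, and the tolerance is only meant to separate the numerically-zero eigenvalue.

```python
# A congruence S A S^T preserves the inertia (counts of positive/negative/zero eigenvalues).
import numpy as np

def inertia(M, tol=1e-9):
    ev = np.linalg.eigvalsh(M)
    return (int((ev > tol).sum()), int((ev < -tol).sum()), int((np.abs(ev) <= tol).sum()))

A = np.diag([3.0, 1.0, -2.0, 0.0])          # inertia (2, 1, 1) by construction
S = np.array([[2.0, 1, 0, 0], [0, 1, 3, 0], [1, 0, 1, 1], [0, 2, 0, 1]])   # a fixed non-singular matrix
B = S @ A @ S.T                             # *-congruent to A (real case)
print(inertia(A), inertia(B))               # both (2, 1, 1)
```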
Corollary 10.5. Given x ∈ R^n, suppose x^T A x can be written as a sum of m products of two linear
factors, that is,

    x^T A x = \sum_{k=1}^{m} \Big( \sum_{i ∈ S_k} b_{i,k} x_i \Big) \Big( \sum_{j ∈ T_k} c_{j,k} x_j \Big).

Further assume that A has p positive eigenvalues and q negative eigenvalues (counting multiplicities);
then m ≥ max(p, q).
Proof. I have not been able to see why this corollary follows from Sylvester’s law yet. A proof of the
corollary can be given, but that’s not the point.
11 Majorizations
Several results in linear algebra are best presented through the concept of majorization. We first need a
few definitions.

Let v_1, . . . , v_m be vectors in R^n. The vector

    v = λ_1 v_1 + · · · + λ_m v_m

is called a linear combination of the v_j; when \sum_j λ_j = 1, we get an affine combination; a canonical
combination is a linear combination in which λ_j ≥ 0 for all j; and a convex combination is an affine
combination which is also canonical. The linear (affine, canonical, convex) hull of {v_1, . . . , v_m} is the
set of all linear (affine, canonical, convex) combinations of the v_j. Note that in the above definitions, m
could be infinite. The canonical hull of a finite set of vectors is called a (convex polyhedral) cone.
Let x = (x_1, . . . , x_n) be a vector in R^n. The vector x^↓ = (x_1^↓, . . . , x_n^↓) is obtained from x by
rearranging all coordinates in weakly decreasing order. The vector x↑ can be similarly defined.
Suppose x, y ∈ R^n. We say x is weakly majorized by y, and write x ≺_w y, if

    \sum_{j=1}^{k} x_j^↓ ≤ \sum_{j=1}^{k} y_j^↓,   ∀k = 1, . . . , n.            (17)

If, additionally,

    \sum_{j=1}^{n} x_j^↓ = \sum_{j=1}^{n} y_j^↓,                                 (18)

then x is said to be majorized by y, and we write x ≺ y.
The concept of majorization is very important in the theory of inequalities, as well as in linear algebra.
We develop here a few essential properties of majorization.
Theorem 11.1. Let A ∈ Mn be Hermitian. Let a be the vector of diagonal entries of A, and α = λ(A)
the vector of all eigenvalues of A. Then, a ≺ α.
Proof. When n = 1, there is nothing to show. In general, let B ∈ M_{n−1} be the Hermitian matrix obtained
from A by removing the row and the column corresponding to a smallest diagonal entry of A. Let
β_1, . . . , β_{n−1} be the eigenvalues of B. Then, by the Interlacing of Eigenvalues Theorem, α_1 ≥ β_1 ≥ · · · ≥
β_{n−1} ≥ α_n. Moreover, since for k ≤ n − 1 the k largest diagonal entries of A are also diagonal entries of
B, the induction hypothesis (applied to B) yields

    \sum_{i=1}^{k} a_i^↓ ≤ \sum_{i=1}^{k} β_i,   1 ≤ k ≤ n − 1.

Hence, since β_i ≤ α_i for all i,

    \sum_{i=1}^{k} a_i^↓ ≤ \sum_{i=1}^{k} α_i,   1 ≤ k ≤ n − 1.

Lastly, tr(A) = \sum_i a_i = \sum_i α_i finishes the proof.
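As an illustrative check (not in the original notes), Theorem 11.1 is easy to verify numerically with numpy; the helper function and the random symmetric test matrix are just for demonstration.

```python
# The diagonal of a real symmetric matrix is majorized by its vector of eigenvalues.
import numpy as np

def majorized(x, y, tol=1e-9):
    """Return True if x is majorized by y (x, y real vectors of equal length)."""
    xs, ys = np.sort(x)[::-1], np.sort(y)[::-1]
    partial = np.all(np.cumsum(xs) <= np.cumsum(ys) + tol)
    return bool(partial and abs(xs.sum() - ys.sum()) <= tol)

rng = np.random.default_rng(10)
C = rng.standard_normal((6, 6))
A = (C + C.T) / 2
print(majorized(np.diag(A), np.linalg.eigvalsh(A)))   # True
```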
It turns out that the converse also holds. Before showing the converse, we need a technical lemma.

Lemma 11.2. Let x, y ∈ R^n be such that x ≻ y. Then, there exists a vector z ∈ R^{n−1} such that

    x_1^↓ ≥ z_1 ≥ x_2^↓ ≥ z_2 ≥ · · · ≥ z_{n−1} ≥ x_n^↓,

and z ≻ [y_1^↓, . . . , y_{n−1}^↓]^T.

Proof. When n = 2, we must have x_1^↓ ≥ y_1^↓ ≥ y_2^↓ ≥ x_2^↓. Hence, picking z_1 = y_1^↓ suffices.

Suppose n ≥ 3. Let D ⊆ R^{n−1} be defined by

    D = \Big\{ v ∈ R^{n−1} \;\Big|\; x_1^↓ ≥ v_1 ≥ x_2^↓ ≥ v_2 ≥ · · · ≥ v_{n−1} ≥ x_n^↓,
        \text{ and } \sum_{i=1}^{k} v_i ≥ \sum_{i=1}^{k} y_i^↓ \text{ for } 1 ≤ k ≤ n − 2 \Big\}.

Then, the existence of a point z ∈ D for which \sum_{i=1}^{n−1} z_i = \sum_{i=1}^{n−1} y_i^↓ = c would complete
the proof. Notice that, as [x_1^↓, . . . , x_{n−1}^↓]^T ∈ D, the set D is not empty. Define a continuous function
f : D → R by f(v) = v_1 + · · · + v_{n−1}. Then, f([x_1^↓, . . . , x_{n−1}^↓]^T) ≥ c. Since D is a connected domain,
if we could find v ∈ D for which f(v) ≤ c, then there must exist a vector z ∈ D for which f(z) = c. Let
v̂ ∈ D be a vector such that f(v̂) = min{f(z) | z ∈ D}. If f(v̂) ≤ c, then we are done. Suppose f(v̂) > c;
we shall show that f(v̂) ≤ c to reach a contradiction. We have

    \sum_{i=1}^{k} v̂_i ≥ \sum_{i=1}^{k} y_i^↓,   1 ≤ k ≤ n − 1,                  (19)

    v̂_k ≥ x_{k+1}^↓,   1 ≤ k ≤ n − 1.                                           (20)

Suppose first that all inequalities (19) are strict. Then, it must be the case that v̂_k = x_{k+1}^↓ for
k = 1, . . . , n − 1; otherwise, one could reduce some v̂_k slightly to make f(v̂) smaller while staying in D.
Consequently, f(v̂) = f([x_2^↓, . . . , x_n^↓]^T) ≤ c (since \sum_{i=2}^{n} x_i^↓ = \sum_i y_i^↓ − x_1^↓ ≤
\sum_i y_i^↓ − y_1^↓ ≤ c).

If not all of the inequalities (19) are strict, then let r be the largest index for which

    \sum_{i=1}^{r} v̂_i = \sum_{i=1}^{r} y_i^↓,                                   (21)

    \sum_{i=1}^{k} v̂_i > \sum_{i=1}^{k} y_i^↓,   r < k ≤ n − 1.                  (22)

(Notice that r ≤ n − 2, since we assumed f(v̂) > c.) By the same reasoning as before, we must have
v̂_k = x_{k+1}^↓ for k = r + 1, . . . , n − 1. Thus,

    f(v̂) − c = \sum_{i=1}^{n−1} v̂_i − \sum_{i=1}^{n−1} y_i^↓
             = \sum_{i=1}^{r} v̂_i + \sum_{i=r+1}^{n−1} v̂_i − \sum_{i=1}^{n−1} y_i^↓
             = \sum_{i=1}^{r} y_i^↓ + \sum_{i=r+2}^{n} x_i^↓ − \sum_{i=1}^{n−1} y_i^↓
             ≤ \sum_{i=1}^{r} y_i^↓ + \sum_{i=r+2}^{n} y_i^↓ − \sum_{i=1}^{n−1} y_i^↓
             = y_n^↓ − y_{r+1}^↓
             ≤ 0,

where the inequality uses \sum_{i=r+2}^{n} x_i^↓ ≤ \sum_{i=r+2}^{n} y_i^↓ (a consequence of x ≻ y, since the
total sums agree). This contradicts the assumption f(v̂) > c.
We are now ready to show the converse of Theorem 11.1.
Theorem 11.3. Let a and α be two vectors in R^n. If a ≺ α, then there exists a real symmetric matrix
A ∈ M_n which has diagonal entries a_i and λ(A) = α.

Proof. The case n = 1 is trivial. In general, assume without loss of generality that a = a^↓ and α = α^↓.
Also, let b = [a_1, . . . , a_{n−1}]. Then, Lemma 11.2 (applied with x = α and y = a) implies the existence of
a vector β ∈ R^{n−1} such that

    α_1 ≥ β_1 ≥ · · · ≥ β_{n−1} ≥ α_n,

and that β ≻ b. The induction hypothesis ensures the existence of a real symmetric matrix B which has
diagonal entries b and eigenvalues β. Now, Theorem 9.2 allows us to extend Λ = diag(β_1, . . . , β_{n−1}) into
a real symmetric matrix A' ∈ M_n,

    A' = \begin{pmatrix} Λ & y \\ y^T & a_n \end{pmatrix},

which has eigenvalues α (note that the corner entry is \sum_i α_i − \sum_i β_i = \sum_i a_i − \sum_i b_i = a_n).
One more step is needed to turn A' into the matrix A we are looking for. We know that there exists an
orthogonal matrix Q ∈ M_{n−1} for which B = QΛQ^T. Hence, letting

    A = \begin{pmatrix} Q & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} Λ & y \\ y^T & a_n \end{pmatrix} \begin{pmatrix} Q^T & 0 \\ 0 & 1 \end{pmatrix}
      = \begin{pmatrix} QΛQ^T & Qy \\ (Qy)^T & a_n \end{pmatrix}
      = \begin{pmatrix} B & Qy \\ (Qy)^T & a_n \end{pmatrix}

finishes the proof.
For any π ∈ S_n and y ∈ R^n, let y_π := (y_{π(1)}, . . . , y_{π(n)}).

Theorem 11.4. Given vectors x, y ∈ R^n, the following three statements are equivalent.

(i) x ≺ y.
(ii) There exists a doubly stochastic matrix M for which x = M y.
(iii) x is in the convex hull of all n! points yπ , π ∈ Sn .
Proof. We shall show (i) ⇒ (ii) ⇒ (iii) ⇒ (i).

Firstly, assume x ≺ y. By Theorem 11.3 there is a real symmetric matrix A ∈ M_n with diagonal entries x
and λ(A) = y. There thus must exist a unitary matrix U = (u_{ij}) for which A = U diag(y_1, . . . , y_n) U^*,
which implies

    x_i = a_{ii} = \sum_{j=1}^{n} y_j |u_{ij}|^2.

Hence, taking M = (|u_{ij}|^2) completes the proof: M is doubly stochastic because U is unitary.
Secondly, suppose x = My where M is a doubly stochastic matrix. Birkhoff's Theorem implies that
there are non-negative real numbers c_π, π ∈ S_n, such that

    \sum_{π ∈ S_n} c_π = 1   and   M = \sum_{π ∈ S_n} c_π P_π,

where P_π is the permutation matrix corresponding to π. Consequently,

    x = My = \sum_{π ∈ S_n} c_π P_π y = \sum_{π ∈ S_n} c_π y_π.
Lastly, suppose there are non-negative real numbers c_π, π ∈ S_n, such that

    \sum_{π ∈ S_n} c_π = 1   and   x = \sum_{π ∈ S_n} c_π y_π.

Without loss of generality, we assume y = y^↓. We can write x_i in the following form:

    x_i = \sum_{j=1}^{n} y_j \Big( \sum_{\substack{π ∈ S_n \\ π(i) = j}} c_π \Big) = \sum_{j=1}^{n} y_j d_{ij}.

Note that \sum_i d_{ij} = \sum_j d_{ij} = 1. (This is rather like establishing (iii) ⇒ (ii) first, and then
showing (ii) ⇒ (i).) The following is straightforward:
    \sum_{i=1}^{k} x_i = \sum_{i=1}^{k} \sum_{j=1}^{n} y_j d_{ij}
                       = \sum_{j=1}^{n} y_j \sum_{i=1}^{k} d_{ij}
                       ≤ \sum_{j=1}^{k} y_j \sum_{i=1}^{k} d_{ij} + y_{k+1} \sum_{j=k+1}^{n} \sum_{i=1}^{k} d_{ij}
                       = y_1 + · · · + y_k − \sum_{j=1}^{k} y_j \Big(1 − \sum_{i=1}^{k} d_{ij}\Big) + y_{k+1} \sum_{j=k+1}^{n} \sum_{i=1}^{k} d_{ij}
                       ≤ y_1 + · · · + y_k − y_{k+1} \Big(k − \sum_{j=1}^{k} \sum_{i=1}^{k} d_{ij}\Big) + y_{k+1} \sum_{j=k+1}^{n} \sum_{i=1}^{k} d_{ij}
                       = y_1 + · · · + y_k − y_{k+1} \Big(k − \sum_{j=1}^{n} \sum_{i=1}^{k} d_{ij}\Big)
                       = y_1 + · · · + y_k.

(The first inequality uses y_j ≤ y_{k+1} for j > k; the second uses y_j ≥ y_{k+1} for j ≤ k; the last equality
uses \sum_{j=1}^{n} d_{ij} = 1 for each i, so the double sum equals k.) Finally, \sum_{i=1}^{n} x_i =
\sum_{j=1}^{n} y_j \sum_{i=1}^{n} d_{ij} = \sum_{j=1}^{n} y_j, and hence x ≺ y.
References

[1] M. Artin, Algebra, Prentice-Hall Inc., Englewood Cliffs, NJ, 1991.
[2] R. Bhatia, Linear algebra to quantum cohomology: the story of Alfred Horn's inequalities, Amer. Math. Monthly, 108 (2001), pp. 289–318.
[3] R. A. Brualdi, The Jordan canonical form: an old proof, Amer. Math. Monthly, 94 (1987), pp. 257–267.
[4] F. R. Gantmacher, The theory of matrices. Vol. 1, AMS Chelsea Publishing, Providence, RI, 1998. Translated from the Russian by K. A. Hirsch; reprint of the 1959 translation.
[5] R. A. Horn and C. R. Johnson, Matrix analysis, Cambridge University Press, Cambridge, 1985.
[6] P. Lancaster and M. Tismenetsky, The theory of matrices, Academic Press Inc., Orlando, Fla., second ed., 1985.
[7] D. Stanton and D. White, Constructive combinatorics, Springer-Verlag, New York, 1986.
[8] G. Strang, Linear algebra and its applications, Academic Press [Harcourt Brace Jovanovich Publishers], New York, second ed., 1980.
[9] H. Straubing, A combinatorial proof of the Cayley-Hamilton theorem, Discrete Math., 43 (1983), pp. 273–279.
[10] D. B. West, Introduction to graph theory, Prentice Hall Inc., Upper Saddle River, NJ, 1996.
[11] H. Weyl, Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen, Math. Ann., 71 (1912), pp. 441–479.