J. Broida UCSD Fall 2009 Phys 130B QM II Supplementary Notes on Mathematics Part I: Linear Algebra 1 Linear Transformations Let me very briefly review some basic properties of linear transformations and their matrix representations that will be useful to us in this course. Most of this should be familiar to you, but I want to make sure you know my notation and understand how I think about linear transformations. I assume that you are already familiar with the three elementary row operations utilized in the Gaussian reduction of matrices: (α) Interchange two rows. (β) Multiply one row by a nonzero scalar. (γ) Add a scalar multiple of one row to another. I also assume that you know how to find the inverse of a matrix, and you know what a vector space is. If V is an n-dimensional vector space over a field F (which you can think of as either R or C), then a linear transformation on V is a mapping T : V → V with the property that for all x, y ∈ V and a ∈ F we have T (x + y) = T (x) + T (y) and T (ax) = aT (x) . We will frequently write T x rather than T (x). That T (0) = 0 follows either by letting a = 0 or noting that T (x) = T (x + 0) = T (x) + T (0). By way of notation, we will write T ∈ L(V ) if T is a linear transformation from V to V . A more general notation very often used is to write T ∈ L(U, V ) to denote a linear transformation T : U → V from a space U to a space V . P A set of vectors {v1 , . . . , vn } is said to be linearly independent if ni=1 ai vi = 0 implies that ai = 0 for all i = 1, . . . , n. The set {vi } is also said to span V if every vector in V can be written as a linear combination of the vi . A basis for V is a set of linearly independent vectors that also spans V . The dimension of V is the (unique) number of vectors in any basis. A simple but extremely useful fact is that every vector x ∈ V has a unique expansion given basis {ei }. P Indeed, if we have two such expansions Pn in terms of any P n n ′ ′ x = x e and x = x e , then i=1 i i i=1 i i i=1 (xi − xi )ei = 0. But the ei are ′ linearly independent by definition so that xi − xi = 0 for each i, and hence we must have xi = x′i and the expansion is unique as claimed. The scalars xi are called the components of x. 1 Now suppose we are given a set {v1 , v2 , . . . , vr } of linearly independent vectors in a finite-dimensional space V with dim V = n. Since V always has a basis {e1 , e2 , . . . , en }, the set of n + r vectors {v1 , . . . , vr , e1 , . . . , en } will necessarily span V . We know that v1 , . . . , vr are linearly independent, so check to see if e1 can be written as a linear combination of the vi ’s. If it can, then delete it from the set. If it can’t, then add it to the vi ’s. Now go to e2 and check to see if it can be written as a linear combination of {v1 , . . . , vr , e1 }. If it can, delete it, and if it can’t, then add it to the set. Continuing in this manner, we will eventually arrive at a subset of {v1 , . . . , vr , e1 , . . . , en } that is linearly independent and spans V . In other words, we have extended the set {v1 , . . . , vr } to a complete basis for V . The fact that this can be done (at least in principle) is an extremely useful tool in many proofs. A very important characterization of linear transformations that we may find useful is the following. Define the set Ker T = {x ∈ V : T x = 0} . The set Ker T ⊂ V is called the kernel of T . In fact, it is not hard to show that Ker T is actually a subspace of V . 
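The notes develop all of this analytically, but a concrete numerical check is easy to set up. The following minimal NumPy sketch (the matrix T below is an arbitrary singular choice, not one taken from the notes) verifies the two linearity properties and extracts a basis for Ker T from the singular value decomposition.

```python
import numpy as np

# A linear transformation T on R^3, represented by an arbitrary matrix chosen
# so that its kernel is nontrivial: the third column is the sum of the first
# two, so T is singular.
T = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

rng = np.random.default_rng(0)
x, y = rng.standard_normal(3), rng.standard_normal(3)
a = 2.7

# Linearity: T(x + y) = T(x) + T(y) and T(a x) = a T(x)
print(np.allclose(T @ (x + y), T @ x + T @ y))   # True
print(np.allclose(T @ (a * x), a * (T @ x)))     # True

# Ker T = null space of T, read off from the SVD: the right singular vectors
# belonging to (numerically) zero singular values span the kernel.
U, s, Vt = np.linalg.svd(T)
kernel_basis = Vt[s < 1e-12]
print(kernel_basis)                       # one vector, proportional to (1, 1, -1)
print(np.allclose(T @ kernel_basis.T, 0)) # True: T v = 0 for v in Ker T
```

The tolerance 1e-12 is just a numerical cutoff; in exact arithmetic the kernel here is spanned by (1, 1, −1).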
Recall that a mapping T is said to be one-to-one if x 6= y implies T x 6= T y. The equivalent contrapositive statement of this is that T is one-to-one if T x = T y implies x = y. Let T be a linear transformation with Ker T = {0}, and suppose T x = T y. Then by linearity we have T x − T y = T (x − y) = 0. But Ker T = {0} so we conclude that x − y = 0 or x = y. In other words, the fact that Ker T = {0} means that T must be oneto-one. Conversely, if T is one-to-one, then the fact that we always have T (0) = 0 means that Ker T = {0}. Thus a linear transformation is one-to-one if and only if Ker T = {0}. A linear transformation T with Ker T = {0} is said to be a nonsingular transformation. If V has a basis {ei }, then any x ∈ V has a unique expansion which we will write as x = xi ei . Note that here I am using the Einstein summation convention where repeated indices are summed over. (The range of summation is always clear Pn from the context.) Thus xi ei is a shorthand notation for i=1 xi ei . Since we will almost exclusively work with Cartesian coordinates, there is no difference between superscripts and subscripts, and I will freely raise or lower indices as needed for notational clarity. In general, the summation convention should properly be applied to an upper and a lower index, but we will sometimes ignore this, particularly when it comes to angular momentum operators. Note also that summation indices are dummy indices. By this we mean that the particular letter used to sum over is irrelevant. In other words, xi ei is the same as xk ek , and we will frequently relabel indices in many of our calculations. In any case, since T is a linear map, we see that T (x) = T (xi ei ) = xi T (ei ) and hence a linear transformation is fully determined by its values on a basis. Since T maps V into V , it follows that T ei is just another vector in V , and hence we can write T e i = e j aj i (1) where the scalar coefficients aj i define the matrix representation of T with respect to the basis {ei }. We sometimes write [T ] = (ai j ) to denote the fact that the 2 n×n matrix A = (ai j ) is the matrix representation of T . (And if we need to be clear just what basis the matrix representation is with respect to, we will write [T ]e .) Be sure to note that it is the row index that is summed over in this equation. This is necessary so that the composition ST of two linear transformations S and T has a matrix representation [ST ] = AB that is the product of the matrix representations [S] = A of S and [T ] = B of T taken in the same order. We will denote the set of all n × n matrices over the field F by Mn (F ), and the set of all m × n matrices over F by Mm×n (F ). Furthermore, if A ∈ Mm×n (F ), we will label the rows of A by subscripts such as Ai , and the columns of A by superscripts such as Aj . It is important to realize that each row vector Ai is just a vector in F n , and each column vector Aj is just a vector in F m . Therefore the rows of A form a subspace of Rn called the row space and denoted by row(A). Similarly, the columns form a subspace of F m called the column space col(A). The dimension of row(A) is called the row rank rr(A) of A, and dim col(A) is called the column rank cr(A). What happens if we perform elementary row operations on A? Since all we do is take linear combinations of the rows, it should be clear that the row space won’t change, and hence rr(A) is also unchanged. 
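Here is a small NumPy check of this last claim (the matrix A below is an arbitrary rank-2 example): applying one elementary row operation of each type (α), (β), (γ) leaves the row rank unchanged.

```python
import numpy as np

# An arbitrary rank-2 matrix (third row = first row + second row).
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 3.0, 2.0],
              [1.0, 3.0, 3.0, 3.0]])

def row_ops(M):
    """Apply one elementary row operation of each type (alpha, beta, gamma)."""
    M = M.copy()
    M[[0, 2]] = M[[2, 0]]      # (alpha) interchange rows 0 and 2
    M[1] *= -5.0               # (beta)  multiply a row by a nonzero scalar
    M[2] += 3.0 * M[0]         # (gamma) add a multiple of one row to another
    return M

B = row_ops(A)
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(B))   # 2 2
```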
However, the components of the column vectors get mixed up, so it isn’t at all clear just what happens to either col(A) or cr(A). In fact, while col(A) will change, it turns out that cr(A) remains unchanged. Probably the easiest way to see this is to consider those columns of A that are linearly dependent ; and with no loss of generality we can call them A1 , . . . , Ar . Then their linear dependence means there are nonzero scalars x1 , . . . , xr such that P r i i=1 A xi = 0. In full form this is a11 a1r .. . . x1 + · · · + .. xr = 0. am1 amr But this is a system of m linear equations in r unknowns, and we have that the Pseen r solution set doesn’t change under row equivalence. In other words, i=1 Ãi xi = 0 for the same coefficients xi . Then the same r columns of à are linearly dependent, and hence both A and à have the same (n − r) independent columns, i.e., cr(A) = cr(Ã). (There can’t be more dependent columns of à than A because we can apply the row operations in reverse to go from à to A. If à had more dependent columns, then when we got back to A we would have more than we started with.) Furthermore, it is also true that the dimension of the row and column spaces of a matrix are the same, and this is in fact what is meant by the rank of a matrix. To see this, think about the reduced row echelon form of a matrix. This form has a 1 for the first entry of every nonzero row, and every other entry in the column containing that 1 is 0. For example, the following matrix is in reduced row echelon form: 1 0 5 0 2 0 1 2 0 4 0 0 0 1 7 . 0 0 0 0 0 3 Note that every column either consists entirely of a single 1, or is a linear combination of columns that each have only a single 1. In addition, the number of columns containing that single 1 is the same as the number of nonzero rows. Therefore the row rank and column rank are the same, and this common number is called the rank of a matrix. If A ∈ Mn (F ) is an n × n matrix that is row equivalent to the identity matrix I, then A has rank n, and we say that A is nonsingular. If rank A < n then A is said to be singular. It can be shown that a matrix is invertible (i.e., A−1 exists) if and only if it is nonsingular. Now let A ∈ Mm×n (F ) and B ∈ Mn×r (F ) be such that P the product AB is defined. Since the (i, j)th entry of AB is given by (AB)ij = k aik bkj , we see that the ith row of AB is given by a linear combination of the rows of B: X X X X (AB)i = aik (bk1 , . . . , bkr ) = aik Bk . (2a) aik bk1 , . . . , aik bkr = k k k k This shows that the row space of AB is a subspace of the row space of B. Another way to write this is to observe that X X (AB)i = aik bk1 , . . . , aik bkr k k b11 .. = (ai1 , . . . , ain ) . bn1 ··· ··· b1r .. . = Ai B. (2b) bnr Similarly, for the columns of a product we find that the jth column of AB is a linear combination of the columns of A: P a1k k a1k bkj n n X X .. .. (AB)j = = b = Ak bkj (3a) kj . P . k=1 k=1 amk k amk bkj and therefore the column space of AB is a subspace of the column space of A. We also have the result P b1j a11 · · · a1n k a1k bkj .. .. .. = AB j . .. (3b) (AB)j = = . . . P . bnj am1 · · · amn k amk bkj These formulas will be quite useful to us in a number of theorems and calculations. Returning to linear transformations, it is extremely important to realize that T takes the ith basis vector ei into the ith column of A = [T ]. This is easy to see because with respect to the basis {ei } itself, the vectors ei have components simply 4 given by 1 0 e1 = .. . 0 Then 0 1 e2 = .. . 0 0 en = .. . . 
··· 0 1 T e i = e j aj i = e 1 a1 i + e 2 a2 i + · · · e n an i 1 a i 0 0 1 2 a i 0 n 1 2 0 1 = .. a i + .. a i + · · · + .. a i = .. . . . . 0 1 0 an i which is just the ith column of (aj i ). Example 1. For example, let V have the basis {e1 , e2 , e3 }, and let T be the linear transformation defined by T (e1 ) = 3e1 + e3 T (e2 ) = e1 − 2e2 − e3 T (e3 ) = e2 + e3 Then the representation of T (relative to this basis) is 3 1 0 1 . [T ]e = 0 −2 1 −1 1 Now suppose we have another basis {ēi } for V . Since each basis vector ēi is just some vector in V , it can be expressed in terms of the original basis {ei }. We can think of this as defining another linear transformation P whose representation (pi j ) is called the transition matrix and is defined by ēi = P ei = ej pj i . (4) Here we are being somewhat sloppy in using the same symbol P to denote both the linear transformation P and its matrix representation P = (pi j ). Note that we could equally well write each ei in terms of {ēj }, and hence the matrix P must be invertible. Now realize that a vector x ∈ V exists independently of any particular basis for V . However, its components most definitely depend on the basis, and hence using 5 (4) we have x = xj ej = x̄i ēi = x̄i ej pj i = (pj i x̄i )ej . Equating coefficients of each ej (this is an application of the uniqueness of the expansion in terms of a given basis) we conclude that xj = pj i x̄i or, equivalently, x̄i = (p−1 )i j xj . (5) Equations (4) and (5) describe the relationship between vector components with respect to two distinct bases. What about the matrix representation of a linear transformation T with respect to bases {ei } and {ēi }? By definition we can write both T e i = e j aj i (6a) T ēi = ēj āj i . (6b) and Using (4) in the right side of (6b) we have T ēi = ek pk j āj i On the other hand, we can use (4) in the left side of (6b) and then use (6a) to write T ēi = T (ej pj i ) = pj i T ej = pj i ek ak j = ek ak j pj i where in the last step we wrote the matrix product in the correct order. Now equate both forms of T ēi and use the linear independence of the ek to conclude that pk j āj i = ak j pj i which in matrix notation is just P Ā = AP . Since P is invertible this can be written in the form that should be familiar to you: Ā = P −1 AP . (7) A relationship of this form is called a similarity transformation. Be sure to note that P goes from the basis {ei } to the basis {ēi }. −1 Conversely, suppose T is represented by PA in the basis {ei }, and let Ā = P AP . Defining a new basis {ēi } by ēi = P ei = j ej pji it is straightforward to show that the matrix representation of T relative to the basis {ēi } is just Ā. Example 2. As an example, consider the linear transformation T : R3 → R3 (i.e., T ∈ L(R3 )) defined by 9x + y T (x, y, z) = 9y . 7z 6 Let {ei } be the standard basis for R3 , and let {ēi } be the basis defined by 1 ē1 = 0 1 1 ē2 = 0 −1 0 ē3 = 1 . 1 Let us first find the representation Ā = [T ]ē directly from the definition T (ēi ) = P3 j j=1 ēj āji = ēj ā i . We will go through two ways of doing this to help clarify the various concepts involved. We have T (ē1 ) = T (1, 0, 1) = (9, 0, 7). Then we write (9, 0, 7) = a(1, 0, 1) + b(1, 0, −1) + c(0, 1, 1) and solve for a, b, c to obtain T (ē1 ) = 8ē1 + ē2 . Similarly, we find T (ē2 ) = T (1, 0, −1) = (9, 0, −7) = ē1 + 8ē2 and T (ē3 ) = T (0, 1, 1) = (1, 9, 7) = (−1/2)ē1 + (3/2)ē2 + 9ē3 . This shows that the representation [T ]ē is given by 8 Ā = [T ]ē = 1 0 1 −1/2 8 3/2 . 
0 9 Another way is to use the fact that everything is simple with respect to the standard basis for R3 . We see that T (e1 ) = T (1, 0, 0) = (9, 0, 0) = 9e1 , T (e2 ) = T (0, 1, 0) = (1, 9, 0) = e1 + 9e2 and T (e3 ) = T (0, 0, 1) = (0, 0, 7) = 7e3 . Note that this shows 9 1 0 A = [T ]e = 0 9 0 0 0 7 which we will need below when we use the transition matrix to find Ā. It is easy to see that ē1 = e1 + e3 , ē2 = e1 − e3 and ē3 = e2 + e3 , so inverting these equations we have e1 = (1/2)(ē1 + ē2 ), e3 = (1/2)(ē1 − ē2 ) and e2 = ē3 − e3 = −(1/2)(ē1 − ē2 ) + ē3 . Then using the linearity of T we have T (ē1 ) = T (e1 + e3 ) = T (e1 ) + T (e3 ) = 9e1 + 7e3 = (9/2)(ē1 + ē2 ) + (7/2)(ē1 − ē2 ) = 8ē1 + ē2 T (ē2 ) = T (e1 − e3 ) = T (e1 ) − T (e3 ) = 9e1 − 7e3 = (9/2)(ē1 + ē2 ) − (7/2)(ē1 − ē2 ) = ē1 + 8ē2 T (ē3 ) = T (e2 + e3 ) = T (e2 ) + T (e3 ) = e1 + 9e2 + 7e3 = (1/2)(ē1 + ē2 ) − (9/2)(ē1 − ē2 ) + 9ē3 + (7/2)(ē1 − ē2 ) = −(1/2)ē1 + (3/2)ē2 + 9ē3 and, as expected, this gives the same result as we had above for [T ]ē . 7 Now we will use the transition matrix P to find Ā = [T ]ē . The matrix P is P3 defined by ēi = P ei = j=1 ej pji = ej pj i which is just the ith column of P , so we immediately have 1 1 0 0 1. P =0 1 −1 1 There are a number of ways to find P −1 which you should already be familiar with, and I won’t bother to explain them. We will simply use the fact that the inverse matrix is defined by ei = P −1 ēi and use the expressions we found above for each ei in terms of the ēi ’s. This last approach is the easiest for us and we can just write down the result 1 −1 1 1 1 −1 . P −1 = 1 2 0 2 0 We now see that 1 −1 1 9 1 1 [T ]ē = P −1 [T ]e P = 1 1 −1 0 9 2 0 2 0 0 0 8 1 −1/2 3/2 =1 8 0 0 9 0 1 1 0 0 0 0 1 7 1 −1 1 which agrees with our previous approaches. Also realize that a vector X = (x, y, z) ∈ R3 has components x, y, z only with respect to the standard basis {ei } for R3 . In other words x 1 0 0 X = y = x 0 + y 1 + z 0 = xe1 + ye2 + ze3 . z 0 0 1 But with respect to the basis {ēi } we have 1 −1 1 x−y+z x 1 1 1 −1 y = x + y − z X = P −1 X = 1 2 2 0 2 0 2y z 1 1 (x − y + z)ē1 + (x + y − z)ē2 + yē3 2 2 = x̄ē1 + ȳē2 + z̄ē3 . = As we will see later in the course, the Clebsch-Gordan coefficients that you may have seen are nothing more than the entries in the (unitary) transition matrix that 8 takes you between the |j1 j2 m1 m2 i basis and the |j1 j2 jmi basis in the vector space of two-particle angular momentum states. If T ∈ L(V ) is a linear transformation, then the image of T is the set Im T = {T x : x ∈ V } . It is also easy to see that Im T is a subspace of V . Furthermore, we define the rank of T to be the number rank T = dim(Im T ) . By picking a basis for Ker T and extending it to a basis for all of V , it is not hard to show that the following result holds, often called the rank theorem: dim(Im T ) + dim(Ker T ) = dim V . (8) It can also be shown that the rank of a linear transformation T is equal to the rank of any matrix representation of T (which is independent of similarity transformations). This is a consequence of the fact that T ei is the ith column of the matrix representation of T , and the set of all such vectors T ei spans Im T . Then rank T is the number of linearly independent vectors T ei , which is also the dimension of the column space of [T ]. But the dimension of the row and column spaces of a matrix are the same, and this is what is meant by the rank of a matrix. Thus rank T = rank[T ]. Note that if T is one-to-one, then Ker T = {0} so that dim Ker T = 0. 
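As a quick numerical illustration of the rank theorem (8) (the matrix below is an arbitrary rank-2 choice, not one from the notes; any square matrix works the same way), one can count dim(Im T) with the matrix rank and dim(Ker T) with the number of vanishing singular values:

```python
import numpy as np

# A rank-2 matrix on R^3: the third column is the sum of the first two,
# so the kernel is one-dimensional.
A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

n = A.shape[1]                          # dim V
rank = np.linalg.matrix_rank(A)         # dim(Im T) = column rank of [T]

# dim(Ker T) = number of (numerically) zero singular values
s = np.linalg.svd(A, compute_uv=False)
nullity = int(np.sum(s < 1e-12))

print(rank, nullity, rank + nullity == n)   # 2 1 True
```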
It then follows from (8) that rank[T ] = rank T = dim(Im T ) = dim V = n so that [T ] is invertible. Another result we will need is the following. Theorem 1. If A and B are any matrices for which the product AB is defined, then the row space of AB is a subspace of the row space of B, and the column space of AB is a subspace of the column space of A. P Proof. As we saw above, using (AB)i = k aik Bk it follows that the ith row of AB is in the space spanned by the rows of B, and hence the row space of AB is a subspace of the row space of B. As to the column space, this was also shown above. Alternatively, note that the column space of AB is just the row space of (AB)T = B T AT , which is a subspace of the row space of AT by the first part of the theorem. But the row space of AT is just the column space of A. Corollary. rank(AB) ≤ min{rank(A), rank(B)}. 9 Proof. Let row(A) be the row space of A, and let col(A) be the column space of A. Then rank(AB) = dim(row(AB)) ≤ dim(row(B)) = rank(B) while rank(AB) = dim(col(AB)) ≤ dim(col(A)) = rank(A). The last topic I want to cover in this section is to briefly explain the mathematics of two-particle states. While this isn’t really necessary for this course and we won’t deal with it in detail, it should help you better understand what is going on when we add angular momenta. In addition, this material is necessary to understand direct product representations of groups, which is quite important in its own right. So, given two vector spaces V and V ′ , we may define a bilinear map V × V ′ → V ⊗ V ′ that takes ordered pairs (v, v ′ ) ∈ V × V ′ and gives a new vector denoted by v ⊗ v ′ . Since this map is bilinear by definition (meaning P that it is linear in P each′ ′ variable separately), if we have the linear combinations v = x v and v = yj vj i i P then v ⊗ v ′ = xi yj (vi ⊗ vj′ ). In particular, if V has basis {ei } and V ′ has basis {e′j }, then {ei ⊗ e′j } is a basis for V ⊗ V ′ which is then of dimension (dim V )(dim V ′ ) and called the direct (or tensor) product of V and V ′ . If we are given two operators A ∈ L(V ) and B ∈ L(V ′ ), the direct product of A and B is the operator A ⊗ B defined on V ⊗ V ′ by (A ⊗ B)(v ⊗ v ′ ) := A(v) ⊗ B(v ′ ) . We know that the matrix representation of an operator is defined by its values on a basis, and the ith basis vector goes to the ith column of the matrix representation. In the case of the direct product, we choose an ordered basis by taking all of the (dim V )(dim V ′ ) = mn elements ei ⊗ e′j in the obvious order {e1 ⊗ e′1 , . . . , e1 ⊗ e′n , e2 ⊗ e′1 , . . . , e2 ⊗ e′n , . . . , em ⊗ e′1 , . . . , em ⊗ e′n } . Now our matrix elements are labeled by double subscripts because each basis vector is labeled by two subscripts. The (ij)th column of C = A ⊗ B is given in the usual way by acting on ei ⊗ e′j with A ⊗ B: (A ⊗ B)(ei ⊗ e′j ) = Aei ⊗ Be′j = ek ak i ⊗ e′l bl j = (ek ⊗ e′l )ak i bl j = (ek ⊗ e′l )(A ⊗ B)kl ij . For example, the (1, 1)th column of C is the vector (A⊗B)(e1 ⊗e′1 ) = ak 1 bl 1 (ek ⊗e′l ) given by (a1 1 b1 1 , . . . , a1 1 bn 1 , a2 1 b1 1 , . . . , a2 1 bn 1 , . . . , am 1 b1 1 , . . . , am 1 bn 1 ) and in general, the (i, j)th column is given by (a1 i b1 j , . . . , a1 i bn j , a2 i b1 j , . . . , a2 i bn j , . . . , am i b1 j , . . . , am i bn j ) . 10 If we write this as the column vector it is, 1 1 a ib j .. . a1 i b n j .. . m 1 a ib j .. . am i b n j then it is not hard to see this shows that the matrix C has the block matrix form 1 a 1 B a12 B · · · a1m B .. .. . C = .. . . . 
am1 B am2 B · · · amm B As I said, we will see an application of this formalism when we treat the addition of angular momentum. 2 The Levi-Civita Symbol and the Vector Cross Product In order to ease into the notation we will use, we begin with an elementary treatment of the vector cross product. This will give us a very useful computational tool that is of importance in and of itself. While you are probably already familiar with the cross product, we will still go through its development from scratch just for the sake of completeness. To begin with, consider two vectors a and b in R3 (with Cartesian coordinates). There are two ways to define their vector product (or cross product) a × b. The first way is to define a × b as that vector with norm given by ka × bk = kak kbk sin θ where θ is the angle between a and b, and whose direction is such that the triple (a, b, a × b) has the same “orientation” as the standard basis vectors (x̂, ŷ, ẑ). This is commonly referred to as “the right hand rule.” In other words, if you rotate a into b thru the smallest angle between them with your right hand as if you were using a screwdriver, then the screwdriver points in the direction of a × b. Note that by definition, a × b is perpendicular to the plane spanned by a and b. The second way to define a × b is in terms of its vector components. I will start from this definition and show that it is in fact equivalent to the first definition. So, 11 we define a × b to be the vector c with components cx = (a × b)x = ay bz − az by cy = (a × b)y = az bx − ax bz cz = (a × b)z = ax by − ay bx Before proceeding, note that instead of labeling components by (x, y, z) it will be very convenient for us to use (x1 , x2 , x3 ). This is standard practice, and it will greatly facilitate many equations throughout the remainder of these notes. Using this notation, the above equations are written c1 = (a × b)1 = a2 b3 − a3 b2 c2 = (a × b)2 = a3 b1 − a1 b3 c3 = (a × b)3 = a1 b2 − a2 b1 We now see that each equation can be obtained from the previous by cyclically permuting the subscripts 1 → 2 → 3 → 1. Using these equations, it is easy to multiply out components and verify that a · c = a1 c1 + a2 c2 + a3 c3 = 0, and similarly b · c = 0. This shows that a × b is perpendicular to both a and b, in agreement with our first definition. Next, there are two ways to show that ka × bk is also the same as in the first definition. The easy way is to note that any two vectors a and b in R3 (both based at the same origin) define a plane. So we choose our coordinate axes so that a lies along the x1 -axis as shown below. x2 b h θ x1 a Then a and b have components a = (a1 , 0, 0) and b = (b1 , b2 , 0) so that (a × b)1 = a2 b3 − a3 b2 = 0 (a × b)2 = a3 b1 − a1 b3 = 0 (a × b)3 = a1 b2 − a2 b1 = a1 b2 and therefore c = a × b = (0, 0, a1 b2 ). But a1 = kak and b2 = h = kbk sin θ so that P kck2 = 3i=1 ci 2 = (a1 b2 )2 = (kak kbk sin θ)2 and therefore ka × bk = kak kbk sin θ . 12 Since both the length of a vector and the angle between two vectors is independent of the orientation of the coordinate axes, this result holds for arbitrary a and b. Therefore ka × bk is the same as in our first definition. 
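A quick numerical spot-check of these two facts, using NumPy's built-in cross product (the random vectors are of course arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = rng.standard_normal(3), rng.standard_normal(3)

c = np.cross(a, b)

# a x b is perpendicular to both a and b
print(np.isclose(a @ c, 0.0), np.isclose(b @ c, 0.0))        # True True

# |a x b| = |a| |b| sin(theta), with theta the angle between a and b
cos_theta = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
theta = np.arccos(cos_theta)
lhs = np.linalg.norm(c)
rhs = np.linalg.norm(a) * np.linalg.norm(b) * np.sin(theta)
print(np.isclose(lhs, rhs))                                   # True
```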
The second way to see this is with a very unenlightening brute force calculation: 2 2 ka × bk2 = (a × b) · (a × b) = (a × b)1 + (a × b)2 + (a × b)3 2 = (a2 b3 − a3 b2 )2 + (a3 b1 − a1 b3 )2 + (a1 b2 − a2 b1 )2 = a2 2 b 3 2 + a3 2 b 2 2 + a3 2 b 1 2 + a1 2 b 3 2 + a1 2 b 2 2 + a2 2 b 1 2 − 2(a2 b3 a3 b2 + a3 b1 a1 b3 + a1 b2 a2 b1 ) = (a2 2 + a3 2 )b1 2 + (a1 2 + a3 2 )b2 2 + (a1 2 + a2 2 )b3 2 − 2(a2 b2 a3 b3 + a1 b1 a3 b3 + a1 b1 a2 b2 ) = (add and subtract terms) = (a1 2 + a2 2 + a3 2 )b1 2 + (a1 2 + a2 2 + a3 2 )b2 2 + (a1 2 + a2 2 + a3 2 )b3 2 − (a1 2 b1 2 + a2 2 b2 2 + a3 2 b3 2 ) − 2(a2 b2 a3 b3 + a1 b1 a3 b3 + a1 b1 a2 b2 ) = (a1 2 + a2 2 + a3 2 )(b1 2 + b2 2 + b3 2 ) − (a1 b1 + a2 b2 + a3 b3 )2 = kak2 kbk2 − (a · b)2 = kak2 kbk2 − kak2 kbk2 cos2 θ = kak2 kbk2 (1 − cos2 θ) = kak2 kbk2 sin2 θ so again we have ka × bk = kak kbk sin θ. To see the geometrical meaning of the vector product, first take a look at the parallelogram with sides defined by a and b. b h θ a In the figure, the height h is equal to b sin θ (where b = kbk and similarly for a), and the area of the parallelogram is equal to the area of the two triangles plus the area of the rectangle: 1 area = 2 · (b cos θ)h + (a − b cos θ)h 2 = ah = ab sin θ = ka × bk . Now suppose we have a third vector c that is not coplanar with a and b, and consider the parallelepiped defined by the three vectors as shown below. 13 a×b θ c b a The volume of this parallelepiped is given by the area of the base times the height, and hence is equal to Vol(a, b, c) = ka × bk kck cos θ = (a × b) · c . So we see that the so-called scalar triple product (a× b)·c represents the volume spanned by the three vectors. Most of this discussion so far should be familiar to most of you. Now we turn to a formalism that is probably not so familiar. Our formulation of determinants will use a generalization of the permutation symbol that we now introduce. Just keep in mind that the long term benefits of what we are about to do far outweigh the effort required to learn it. While the concept of permutation should be fairly intuitive, let us make some rather informal definitions. If we have a set of n numbers {a1 , a2 , . . . , an }, then these n numbers can be arranged into n! ordered collections (ai1 , ai2 , . . . , ain ) where (i1 , i2 , . . . , in ) is just the set (1, 2, . . . , n) arranged in any one of the n! possible orderings. Such an arrangement is called a permutation of the set {a1 , a2 , . . . , an }. If we have a set S of n numbers, then we denote the set of all permutations of these numbers by Sn . This is called the permutation group of order n. Because there are n! rearrangements (i.e., distinct orderings) of a set of n numbers (this can really be any n objects), the permutation group of order n consists of n! elements. It is conventional to denote an element of Sn (i.e., a particular permutation) by Greek letters such as σ, τ, θ etc. Now, it is fairly obvious intuitively that any permutation can be achieved by a suitable number of interchanges of pairs of elements. Each interchange of a pair is called a transposition. (The formal proof of this assertion is, however, more difficult than you might think.) For example, let the ordered set (1, 2, 3, 4) be permuted to the ordered set (4, 2, 1, 3). This can be accomplished as a sequence of transpositions as follows: 1↔4 1↔3 (1, 2, 3, 4) −−−→ (4, 2, 3, 1) −−−→ (4, 2, 1, 3) . 
It is also easy enough to find a different sequence that yields the same final result, and hence the sequence of transpositions resulting in a given permutation is by no means unique. However, it is a fact (also not easy to prove formally) that whatever sequence you choose, the number of transpositions is either always an even number or always an odd number. In particular, if a permutation σ consists of m transpositions, then we define the sign of the permutation by sgn σ = (−1)m . 14 Because of this, it makes sense to talk about a permutation as being either even (if m is even) or odd (if m is odd). Now that we have a feeling for what it means to talk about an even or an odd permutation, let us define the Levi-Civita symbol εijk (also frequently referred to as the permutation symbol) by 1 if (i, j, k) is an even permutation of (1, 2, 3) εijk = −1 if (i, j, k) is an odd permutation of (1, 2, 3) . 0 if (i, j, k) is not a permutation of (1, 2, 3) In other words, ε123 = −ε132 = ε312 = −ε321 = ε231 = −ε213 = 1 and εijk = 0 if there are any repeated indices. We also say that εijk is antisymmetric in all three indices, meaning that it changes sign upon interchanging any two indices. For a given order (i, j, k) the resulting number εijk is also called the sign of the permutation. Before delving further into some of the properties of the Levi-Civita symbol, let’s take a brief look at how Pit is used. Given two vectors a and b, we can let i = 1 and form the double sum 3j,k=1 ε1jk aj bk . Since εijk = 0 if any two indices are repeated, the only possible values for j and k are 2 and 3. Then 3 X j,k=1 ε1jk aj bk = ε123 a2 b3 + ε132 a3 b2 = a2 b3 − a3 b2 = (a × b)1 . But the components of the cross product are cyclic permutations of each other, and εijk doesn’t change sign under cyclic permutations, so we have the important general result 3 X (a × b)i = εijk aj bk . (9) j,k=1 (A cyclic permutation is one of the form 1 → 2 → 3 → 1 or x → y → z → x.) Now, in order to handle various vector identities, we need to prove some other properties of the Levi-Civita symbol. The first identity to prove is this: 3 X εijk εijk = 3! = 6 . (10) i,j,k=1 But this is actually easy, because (i, j, k) must all be different, and there are 3! ways to order (1, 2, 3). In other words, there are 3! permutations of {1, 2, 3}. For every case where all three indices are different, whether εijk is +1 or −1, we always have (εijk )2 = +1, and therefore summing over the 3! possibilities yields the desired result. 15 Recalling the Einstein summation convention, it is important to keep the placement of any free (i.e., unsummed over) indices the same on both sides of an equation. For example, we would always write something like Aij B jk = Ci k and not Aij B jk = Cik . In particular, the ith component of the cross product is written (a × b)i = εijk aj bk . (11) As mentioned earlier, for our present purposes, raising and lowering an index is purely a notational convenience. And in order to maintain the proper index placement, we will frequently move an index up or down as necessary. While this may seem quite confusing at first, with a little practice it becomes second nature and results in vastly simplified calculations. Using this convention, equation (10) is simply written εijk εijk = 6. This also P3 applies to the Kronecker delta, so that we have expressions like ai δij = i=1 ai δij = aj (where δij is numerically the same as δij ). 
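Since everything in this section reduces to bookkeeping with εijk, a short computational sketch may help. The code below (an illustration, not part of the notes; the function name perm_sign is just a convenient choice) computes the sign of a permutation by counting transpositions, builds εijk as a 3 × 3 × 3 array, and checks equations (9) and (10).

```python
import numpy as np
from itertools import permutations

def perm_sign(p):
    """Sign of a permutation p of (0, ..., n-1): sort it with explicit
    transpositions and count them; sgn = (-1)**(number of transpositions)."""
    p = list(p)
    swaps = 0
    for i in range(len(p)):
        while p[i] != i:              # put the value i into slot i
            j = p[i]
            p[i], p[j] = p[j], p[i]   # one transposition
            swaps += 1
    return (-1) ** swaps

# The Levi-Civita symbol eps[i, j, k] as a 3x3x3 array:
# +1 / -1 on permutations of (0, 1, 2), zero whenever an index repeats.
eps = np.zeros((3, 3, 3))
for p in permutations(range(3)):
    eps[p] = perm_sign(p)

# Equation (10): eps_ijk eps_ijk = 3! = 6 (sum over all indices)
print(np.einsum('ijk,ijk->', eps, eps))                    # 6.0

# Equation (9): (a x b)_i = eps_ijk a_j b_k
rng = np.random.default_rng(2)
a, b = rng.standard_normal(3), rng.standard_normal(3)
print(np.allclose(np.einsum('ijk,j,k->i', eps, a, b), np.cross(a, b)))  # True
```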
An inhomogeneous system of linear equations would be written as simply ai j xj = y i , and the dot product as a · b = ai b i = ai b i . (12) Note also that indices that are summed over are “dummy indices” meaning, for P3 ai b i = example, that ai bi = ak bk . This is simply another way of writing i=1 P3 a1 b1 + a2 b2 + a3 b3 = k=1 ak bk . As we have said, the Levi-Civita symbol greatly simplifies many calculations dealing with vectors. Let’s look at some examples. Example 3. Let us take a look at the scalar triple product. We have a · (b × c) = ai (b × c)i = ai εijk bj ck = bj εjki ck ai = bj (c × a)j (because εijk = −εjik = +εjki ) = b · (c × a) . Note also that this formalism automatically takes into account the anti-symmetry of the cross product: (c × a)i = εijk cj ak = −εikj cj ak = −εikj ak cj = −(a × c)i . It doesn’t get any easier than this. Of course, this formalism works equally well with vector calculus equations involving the gradient ∇. This is the vector defined by ∇ = x̂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ + x̂2 + x̂3 = ei . + ŷ + ẑ = x̂1 ∂x ∂y ∂z ∂x1 ∂x2 ∂x3 ∂xi 16 In fact, it will also be convenient to simplify our notation further by defining ∇i = ∂/∂xi = ∂i , so that ∇ = ei ∂i . Example 4. Let us prove the well-known identity ∇ · (∇ × a) = 0. We have ∇ · (∇ × a) = ∇i (∇ × a)i = ∂i (εijk ∂j ak ) = εijk ∂i ∂j ak . But now notice that εijk is antisymmetric in i and j (so that εijk = −εjik ), while the product ∂i ∂j is symmetric in i and j (because we assume that the order of differentiation can be interchanged so that ∂i ∂j = ∂j ∂i ). Then εijk ∂i ∂j = −εjik ∂i ∂j = −εjik ∂j ∂i = −εijk ∂i ∂j where the last step follows because i and j are dummy indices, and we can therefore relabel them. But then εijk ∂i ∂j = 0 and we have proved our identity. The last step in the previous example is actually a special case of a general result. To see this, suppose that we have an object Aij··· that is labeled by two or more indices, and suppose that it is antisymmetric in two of those indices (say i, j). This means that Aij··· = −Aji··· . Now suppose that we have another object Sij··· that is symmetric in i and j, so that Sij··· = Sji··· . If we multiply A times S and sum over the indices i and j, then using the symmetry and antisymmetry properties of S and A we have Aij··· Sij··· = −Aji··· Sij··· by the antisymmetry of A ji··· Sji··· by the symmetry of S ij··· Sij··· by relabeling the dummy indices i and j = −A = −A and therefore we have the general result Aij··· Sij··· = 0 . It is also worth pointing out that the indices i and j need not be the first pair of indices, nor do they need to be adjacent. For example, we still have A···i···j··· S···i···j··· = 0. Now suppose that we have an arbitrary object T ij without any particular symmetry properties. Then we can turn this into an antisymmetric object T [ij] by a process called antisymmetrization as follows: T ij → T [ij] := 1 ij (T − T ji ) . 2! In other words, we add up all possible permutations of the indices, with the sign of each permutation being either +1 (for an even permutation) or −1 (for an odd 17 permutation), and then divide this sum by the total number of permutations, which in this case is 2!. If we have something of the form T ijk then we would have 1 ijk (T − T ikj + T kij − T kji + T jki − T jik ) 3! where we alternate signs with each transposition. The generalization to an arbitrary number of indices should be clear. Note also that we could antisymmetrize only over a subset of the indices if required. 
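The following sketch illustrates antisymmetrization and the vanishing contraction of an antisymmetric object with a symmetric one, for two-index objects (the particular random arrays are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
T = rng.standard_normal((3, 3))          # no particular symmetry
S = rng.standard_normal((3, 3))
S = 0.5 * (S + S.T)                      # symmetric: S_ij = S_ji

# Antisymmetrization: T^[ij] = (T^ij - T^ji) / 2!
T_anti = 0.5 * (T - T.T)
print(np.allclose(T_anti, -T_anti.T))    # True: antisymmetric by construction

# Contracting an antisymmetric object with a symmetric one gives zero:
# A_ij S_ij = 0
print(np.isclose(np.einsum('ij,ij->', T_anti, S), 0.0))   # True
```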
It is also important to note that it is impossible to have a nonzero antisymmetric object with more indices than the dimension of the space we are working in. This is simply because at least one index will necessarily be repeated. For example, if we are in R3 , then anything of the form T ijkl must have at least one index repeated because each index can only range between 1, 2 and 3. Now, why did we go through all of this? Well, first recall that we can write the Kronecker delta in any of the equivalent forms δij = δji = δij . Then we can construct quantities like T ijk → T [ijk] := [1 2] δi δj = and 1 1 2 2 δi δj − δi2 δj1 = δ[i1 δj] 2! 1 1 2 3 δ δ δ − δi1 δj3 δk2 + δi3 δj1 δk2 − δi3 δj2 δk1 + δi2 δj3 δk1 − δi2 δj1 δk3 . 3! i j k In particular, we now want to show that [1 2 3] δi δj δk = [1 2 3] εijk = 3! δi δj δk . (13) Clearly, if i = 1, j = 2 and k = 3 we have [1 2 3] 1 1 2 3 δ δ δ − δ11 δ23 δ32 + δ13 δ21 δ32 − δ13 δ22 δ31 + δ12 δ23 δ31 − δ12 δ21 δ33 3! 1 2 3 = 1 − 0 + 0 − 0 + 0 − 0 = 1 = ε123 3! δ1 δ2 δ3 = 3! so equation (13) is correct in this particular case. But now we make the crucial observation that both sides of equation (13) are antisymmetric in (i, j, k), and hence the equation must hold for all values of (i, j, k). This is because any permutation of (i, j, k) results in the same change of sign on both sides, and both sides also equal 0 if any two indices are repeated. Therefore equation (13) is true in general. To derive what is probably the most useful identity involving the Levi-Civita symbol, we begin with the fact that ε123 = 1. Multiplying the left side of equation (13) by 1 in this form yields [1 2 3] εijk ε123 = 3! δi δj δk . But now we again make the observation that both sides are antisymmetric in (1, 2, 3), and hence both sides are equal for all values of the upper indices, and we have the fundamental result [n l m] εijk εnlm = 3! δi δj δk . 18 (14) We now set n = k and sum over k. (This process of setting two indices equal to each other and summing is called contraction.) Using the fact that δkk = 3 X δkk = 3 i=1 along with terms such as δik δkm = δim we find [k l m] εijk εklm = 3! δi δj δk = δik δjl δkm − δik δjm δkl + δim δjk δkl − δim δjl δkk + δil δjm δkk − δil δjk δkm = δim δjl − δil δjm + δim δjl − 3δim δjl + 3δil δjm − δil δjm = δil δjm − δim δjl . In other words, we have the extremely useful result εijk εklm = δil δjm − δim δjl . (15) This result is so useful that it should definitely be memorized. Example 5. Let us derive the well-known triple vector product known as the “bac − cab” rule. We simply compute using equation (15): [a × (b × c)]i = εijk aj (b × c)k = εijk εklm aj bl cm = (δil δjm − δim δjl )aj bl cm = am bi cm − aj bj ci = bi (a · c) − ci (a · b) and therefore a × (b × c) = b(a · c) − c(a · b) . We also point out that some of the sums in this derivation can be done in more than one way. For example, we have either δil δjm aj bl cm = am bi cm = bi (a · c) or δil δjm aj bl cm = aj bi cj = bi (a · c), but the end result is always the same. Note also that at every step along the way, the only index that isn’t repeated (and hence summed over) is i. Example 6. Equation (15) is just as useful in vector calculus calculations. Here is an example to illustrate the technique. [∇ × (∇ × a)]i = εijk ∂ j (∇ × a)k = εijk εklm ∂ j ∂l am = (δil δjm − δim δjl )∂ j ∂l am = ∂ j ∂i aj − ∂ j ∂j ai = ∂i (∇ · a) − ∇2 ai 19 and hence we have the identity ∇ × (∇ × a) = ∇(∇ · a) − ∇2 a which is very useful in discussing the theory of electromagnetic waves. 
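Both the ε-δ identity (15) and the "bac − cab" rule are easy to confirm numerically; here is a minimal NumPy check (again just an illustration, with arbitrary random vectors):

```python
import numpy as np

# Levi-Civita symbol and Kronecker delta in three dimensions.
eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1.0    # even permutations
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1.0   # odd permutations
delta = np.eye(3)

# Equation (15): eps_ijk eps_klm = delta_il delta_jm - delta_im delta_jl
lhs = np.einsum('ijk,klm->ijlm', eps, eps)
rhs = (np.einsum('il,jm->ijlm', delta, delta)
       - np.einsum('im,jl->ijlm', delta, delta))
print(np.allclose(lhs, rhs))                      # True

# The "bac - cab" rule: a x (b x c) = b (a.c) - c (a.b)
rng = np.random.default_rng(4)
a, b, c = rng.standard_normal(3), rng.standard_normal(3), rng.standard_normal(3)
print(np.allclose(np.cross(a, np.cross(b, c)),
                  b * (a @ c) - c * (a @ b)))     # True
```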
3 Determinants In treating vectors in R3 , we used the permutation symbol εijk defined in the last section. We are now ready to apply the same techniques to the theory of determinants. The idea is that we want to define a mapping from a matrix A ∈ Mn (F ) to F in a way that has certain algebraic properties. Since a matrix in Mn (F ) has components aij with i and j ranging from 1 to n, we are going to need a higher dimensional version of the Levi-Civita symbol already introduced. The obvious extension to n dimensions is the following. We define 1 if i1 , . . . , in is an even permutation of 1, . . . , n εi1 ··· in = −1 if i1 , . . . , in is an odd permutation of 1, . . . , n . 0 if i1 , . . . , in is not a permutation of 1, . . . , n Again, there is no practical difference between εi1 ··· in and εi1 ··· in . Using this, we define the determinant of A = (aij ) ∈ Mn (F ) to be the number det A = εi1 ··· in a1i1 a2i2 · · · anin . (16) Look carefully at what this expression consists of. Since εi1 ··· in vanishes unless (i1 , . . . , in ) are all distinct, and there are n! such distinct orderings, we see that det A consists of n! terms in the sum, where each term is a product of n factors aij , and where each term consists precisely of one factor from each row and each column of A. In other words, det A is a sum of terms where each term is a product of one element from each row and each column, and the sum is over all such possibilities. The determinant is frequently written as a11 . . . a1n .. . det A = ... . an1 . . . ann The determinant of an n × n matrix is said to be of order n. Note also that the determinant is only defined for square matrices. 20 Example 7. Leaving the easier 2 × 2 case to you to verify, we will work out the 3 × 3 case and show that it gives the same result that you probably learned in a more elementary course. So, for A = (aij ) ∈ M3 (F ) we have det A = εijk a1i a2j a3k = ε123 a11 a22 a33 + ε132 a11 a23 a32 + ε312 a13 a21 a32 + ε321 a13 a22 a31 + ε231 a12 a23 a31 + ε213 a12 a21 a33 = a11 a22 a33 − a11 a23 a32 + a13 a21 a32 − a13 a22 a31 + a12 a23 a31 − a12 a21 a33 You may recognize this in either of the mnemonic forms (sometimes called Sarrus’s rule) − a11 a12 a13 a11 a12 a21 a22 a23 a21 a22 a31 a32 a33 a31 a32 + + − − + or + − + + a11 a12 a13 a21 a22 a23 a31 a32 a33 − − Here, we are to add together all products of terms connected by a (+) line, and subtract all of the products connected by a (−) line. It can be shown that this 3 × 3 determinant may be expanded as a sum of three 2 × 2 determinants. Example 8. Let A = (aij ) be a diagonal matrix, i.e., aij = 0 if i 6= j. Then det A = εi1 ··· in a1i1 · · · anin = ε1··· n a11 · · · ann n Y aii = a11 · · · ann = i=1 21 so that a11 .. . 0 ··· .. . ··· In particular, we see that det I = 1. n Y aii . = ann i=1 0 .. . We now prove a number of useful properties of determinants. These are all very straightforward applications of the definition (16) once you have become comfortable with the notation. In fact, in my opinion, this approach to determinants affords the simplest way in which to arrive at these results, and is far less confusing than the usual inductive proofs. Theorem 2. For any A ∈ Mn (F ) we have det A = det AT . Proof. This is simply an immediate consequence of our definition of determinant. We saw that det A is a sum of all possible products of one element from each row and each column, and no product can contain more than one term from a given column because the corresponding ε symbol would vanish. 
This means that an equivalent way of writing all n! such products is (note the order of subscripts is reversed) det A = εi1 ··· in ai1 1 · · · ain n . But aij = aT ji so this is just det A = εi1 ··· in ai1 1 · · · ain n = εi1 ··· in aT 1i1 · · · aT nin = det AT . In order to help us gain some additional practice manipulating these quantities, we prove this theorem again based on another result which we will find very useful in its own right. We start from the definition det A = εi1 ··· in a1i1 · · · anin . Again using ε1··· n = 1 we have ε1··· n det A = εi1 ··· in a1i1 · · · anin . (17) By definition of the permutation symbol, the left side of this equation is antisymmetric in (1, . . . , n). But so is the right side because, taking a1i1 and a2i2 as an example, we see that εi1 i2 ··· in a1i1 a2i2 · · · anin = εi1 i2 ··· in a2i2 a1i1 · · · anin = −εi2 i1 ··· in a2i2 a1i1 · · · anin = −εi1 i2 ··· in a2i1 a1i2 · · · anin 22 where the last line follows by a relabeling of the dummy indices i1 and i2 . So, by a now familiar argument, both sides of equation (17) must be true for any values of the indices (1, . . . , n) and we have the extremely useful result εj1 ··· jn det A = εi1 ··· in aj1 i1 · · · ajn in . (18) This equation will turn out to be very helpful in many proofs that would otherwise be considerably more difficult. Let us now use equation (18) to prove Theorem 2. We begin with the analogous result to equation (10). This is εi1 ··· in εi1 ··· in = n!. (19) Using this, we multiply equation (18) by εj1 ··· jn to yield n! det A = εj1 ··· jn εi1 ··· in aj1 i1 · · · ajn in . On the other hand, by definition of det AT we have det AT = εi1 ··· in aT 1i1 · · · aT nin = εi1 ··· in ai1 1 · · · ain n . Multiplying the left side of this equation by 1 = ε1··· n and again using the antisymmetry of both sides in (1, . . . , n) yields εj1 ··· jn det AT = εi1 ··· in ai1 j1 · · · ajn in . (This also follows by applying equation (18) to AT directly.) Now multiply this last equation by εj1 ··· jn to obtain n! det AT = εi1 ··· in εj1 ··· jn ai1 j1 · · · ajn in . Relabeling the dummy indices i and j we have n! det AT = εj1 ··· jn εi1 ··· in aj1 i1 · · · ain jn which is exactly the same as the above expression for n! det A, and we have again proved Theorem 2. Let us restate equation (18) as a theorem for emphasis, and also look at two of its immmediate consequences. Theorem 3. If A ∈ Mn (F ), then εj1 ··· jn det A = εi1 ··· in aj1 i1 · · · ajn in . 23 Corollary 1. If B ∈ Mn (F ) is obtained from A ∈ Mn (F ) by interchanging two rows of A, the det B = − det A. Proof. This is really just what the theorem says in words. (See the discussion between equations (17) and (18).) For example, let B result from interchanging rows 1 and 2 of A. Then det B = εi1 i2 ··· in b1i1 b2i2 · · · bnin = εi1 i2 ··· in a2i1 a1i2 · · · anin = εi1 i2 ··· in a1i2 a2i1 · · · anin = −εi2 i1 ··· in a1i2 a2i1 · · · anin = −εi1 i2 ··· in a1i1 a2i2 · · · anin = − det A = ε213···n det A . where again the next to last line follows by relabeling. Corollary 2. If A ∈ Mn (F ) has two identical rows, then det A = 0. Proof. If B is the matrix obtained by interchanging two identical rows of A, then by the previous corollary we have det A = det B = − det A and therefore det A = 0. Here is another way to view Theorem 3 and its corollaries. If we view det A as a function of the rows of A, then the corollaries state that det A = 0 if any two rows are the same, and det A changes sign if two nonzero rows are interchanged. 
In other words, we have det(Aj1 , . . . , Ajn ) = εj1 ··· jn det A . (20) If it isn’t immediately obvious to you that this is true, then note that for (j1 , . . . , jn ) = (1, . . . , n) it’s just an identity. So by the antisymmetry of both sides, it must be true for all j1 , . . . , jn . Looking at the definition det A = εi1 ··· in a1i1 · · · anin , we see that we can view the determinant as a function of the rows of A: det A = det(A1 , . . . , An ). Since each row is actually a vector in F n , we can replace A1 (for example) by any linear combination of two vectors in F n so that A1 = rB1 + sC1 where r, s ∈ F and B1 , C1 ∈ F n . Let B = (bij ) be the matrix with rows Bi = Ai for i = 2, . . . , n, and let C = (cij ) be the matrix with rows Ci = Ai for i = 2, . . . , n. Then det A = det(A1 , A2 , . . . , An ) = det(rB1 + sC1 , A2 , . . . , An ) = εi1 ··· in (rb1i1 + sc1i1 )a2i2 · · · anin = rεi1 ··· in b1i1 a2i2 · · · anin + sεi1 ··· in c1i1 a2i2 · · · anin = r det B + s det C. 24 Since this argument clearly could have been applied to any of the rows of A, we have proved the following theorem. Theorem 4. Let A ∈ Mn (F ) have row vectors A1 , . . . , An and assume that for some i = 1, . . . , n we have Ai = rBi + sCi where Bi , Ci ∈ F n and r, s rows A1 , . . . , Ai−1 , Bi , Ai+1 , . . . , An A1 , . . . , Ai−1 , Ci , Ai+1 , . . . , An . Then ∈ F. and C Let ∈ B ∈ Mn (F ) Mn (F ) have have rows det A = r det B + s det C. Besides the very easy to handle diagonal matrices, another type of matrix that is easy to deal with are the triangular matrices. To be precise, a matrix A ∈ Mn (F ) is said to be upper-triangular if aij = 0 for i > j, and A is said to be lowertriangular if aij = 0 for i < j. Thus a matrix is upper-triangular if it is of the form a11 a12 a13 · · · a1n 0 a22 a23 · · · a2n 0 0 a33 · · · a3n .. .. .. .. . . . . 0 0 0 · · · ann and lower-triangular if it is of the form a11 0 0 a21 a22 0 a31 a32 a33 .. .. .. . . . an1 an2 an3 ··· ··· ··· 0 0 0 .. . ··· ann . We will use the term triangular to mean either upper- or lower-triangular. Theorem 5. If A ∈ Mn (F ) is a triangular matrix, then det A = n Y aii . i=1 Proof. If A is lower-triangular, then A is of the form shown above. Now look carefully at the definition det A = εi1 ··· in a1i1 · · · anin . Since A is lower-triangular we have aij = 0 for i < j. But then we must have i1 = 1 or else a1i1 = 0. Now 25 consider a2i2 . Since i1 = 1 and a2i2 = 0 if 2 < i2 , we must have i2 = 2. Next, i1 = 1 and i2 = 2 means that i3 = 3 or else a3i3 = 0. Continuing in this way we see that the only nonzero term in the sum is when ij = j for each j = 1, . . . , n and hence det A = ε 12 ··· n a11 · · · ann = n Y aii . i=1 If A is an upper-triangular matrix, then the theorem follows from Theorem 2. An obvious corollary is the following (which was also shown directly in Example 8). Corollary. If A ∈ Mn (F ) is diagonal, then det A = Qn i=1 aii . It is important to realize that because det AT = det A, Theorem 3 and its corollaries apply to columns as well as to rows. Furthermore, these results now allow us easily see what happens to the determinant of a matrix A when we apply elementary row (or column) operations to A. In fact, if you think for a moment, the answer should be obvious. For a type α transformation (i.e., interchanging two rows), we have just seen that det A changes sign (Theorem 3, Corollary 1). 
For a type β transformation (i.e., multiply a single row by a nonzero scalar), we can let r = k, s = 0 and Bi = Ai in Theorem 4 to see that det A → k det A. And for a type γ transformation (i.e., add a multiple of one row to another) we have (for Ai → Ai + kAj and using Theorems 4 and 3, Corollary 2) det(A1 , . . . , Ai + kAj , . . . , An ) = det A + k det(A1 , . . . , Aj , . . . , Aj , . . . , An ) = det A + 0 = det A. Summarizing these results, we have the following theorem. Theorem 6. Suppose A ∈ Mn (F ) and let B ∈ Mn (F ) be row equivalent to A. (i) If B results from the interchange of two rows of A, then det B = − det A. (ii) If B results from multiplying any row (or column) of A by a scalar k, then det B = k det A. (iii) If B results from adding a multiple of one row of A to another row, then det B = det A. Corollary. If R is the reduced row-echelon form of a matrix A, then det R = 0 if and only if det A = 0. Proof. This follows from Theorem 6 since A and R are row-equivalent. 26 Now, A ∈ Mn (F ) is singular if rank A < n. Hence there must be at least one zero row in the reduced row echelon form R of A, and thus det A = det R = 0. Conversely, if rank A = n, then the reduced row echelon form R of A is just I, and hence det R = 1 6= 0. Therefore det A 6= 0. In other words, we have shown that Theorem 7. A ∈ Mn (F ) is singular if and only if det A = 0. Finally, let us prove a basic result that you already know, i.e., that the determinant of a product of matrices is the product of the determinants. Theorem 8. If A, B ∈ Mn (F ), then det(AB) = (det A)(det B). Proof. If either A or B is singular (i.e., their rank is less than n) then so is AB (by the corollary to Theorem 1). But then (by Theorem 7) either det A = 0 or det B = 0, and also det(AB) = 0 so the theorem is true in this case. Now assume Pthat both A and B are nonsingular, and let C = AB. Then Ci = (AB)i = k aik Bk for each i = 1, . . . , n so that from an inductive extension of Theorem 4 we see that det C = det(C1 , . . . , Cn ) X X = det anjn Bjn a1j1 Bj1 , . . . , = X j1 jn j1 ··· X jn a1j1 · · · anjn det(Bj1 , . . . , Bjn ). But det(Bj1 , . . . , Bjn ) = εj1 ··· jn det B (see equation (20)) so we have X X a1j1 · · · anjn εj1 ··· jn det B ··· det C = j1 jn = (det A)(det B). Corollary. If A ∈ Mn (F ) is nonsingular, then det A−1 = (det A)−1 . Proof. If A is nonsingular, then A−1 exists, and hence by the theorem we have 1 = det I = det(AA−1 ) = (det A)(det A−1 ) and therefore det A−1 = (det A)−1 . 27 4 Diagonalizing Matrices If T ∈ L(V ), then an element λ ∈ F is called an eigenvalue of T if there exists a nonzero vector v ∈ V such that T v = λv. In this case we call v an eigenvector of T belonging to the eigenvalue λ. Note that an eigenvalue may be zero, but an eigenvector is always nonzero by definition. It is important to realize (particularly in quantum mechanics) that eigenvectors are only specified up to an overall constant. This is because if T v = λv, then for any c ∈ F we have T (cv) = c(T v) = cλv = λ(cv) so that cv is also an eigenvector with eigenvalue λ. Because of this, we are always free to normalize our eigenvectors to any desired value. If T has an eigenvalue λ, then T v = λv or (T − λ)v = 0. But this means that v ∈ Ker(T − λ1) with v 6= 0, so that T − λ1 is singular. Conversely, if T − λ1 is singular, then there exists v 6= 0 such that (T − λ1)v = 0 or T v = λv. Thus we have proved that a linear operator T ∈ L(V ) has an eigenvalue λ ∈ F if and only if T − λ1 is singular. 
(This is exactly the same as saying λ1 − T is singular.) In an exactly analogous manner we define the eigenvalues and eigenvectors of a matrix A ∈ Mn (F ). Thus we say that an element λ ∈ F is an eigenvalue of a A if there exists a nonzero (column) vector v ∈ F n such that Av = λv, and we call v an eigenvector of A belonging to the eigenvalue λ. Given a basis {ei } for F n , we can write this matrix eigenvalue equation in terms of components as ai j v j = λv i or, written out as n X aij vj = λvi , i = 1, . . . , n . (21a) Writing λvi = Pn j=1 j=1 λδij vj , we can write (21a) in the form n X j=1 (λδij − aij )vj = 0 . (21b) If A has an eigenvalue λ, then λI − A is singular so that det(λI − A) = 0 . (22) Another way to think about this is that if the matrix (operator) λI − A is nonsingular, then (λI − A)−1 would exist. But then multiplying the equation (λI − A)v = 0 from the left by (λI − A)−1 implies that v = 0, which is impossible if v is to be an eigenvector of A. It is also worth again pointing out that there is no real difference between the statements det(λ1 − A) = 0 and det(A − λ1) = 0, and we will use whichever one is most appropriate for what we are doing at the time. Example 9. Let us find all of the eigenvectors and associated eigenvalues of the matrix 1 2 A= . 3 2 28 This means that we must find a vector v = (x, y) such that Av = λv. In matrix notation, this equation takes the form 1 2 x x =λ 3 2 y y and the equation (A − λI)v = 0 becomes 1−λ 2 x = 0. 3 2−λ y This is equivalent to the system (1 − λ)x + 2y = 0 3x + (2 − λ)y = 0 . (23) By (22) we must have 1−λ 2 = λ2 − 3λ − 4 = (λ − 4)(λ + 1) = 0 . 3 2−λ We thus see that the eigenvalues are λ = 4 and λ = −1. (The roots of this polynomial are found either by inspection, or by applying the elementary quadratic formula.) Substituting λ = 4 into equations (23) yields −3x + 2y = 0 3x − 2y = 0 or y = (3/2)x. This means that every eigenvector corresponding to the eigenvalue λ = 4 has the form v = (x, 3x/2). In other words, every multiple of the vector v = (2, 3) is also an eigenvector with eigenvalue equal to 4. If we substitute λ = −1 in equations (23), then we similarly find y = −x, and hence every multiple of the vector v = (1, −1) is an eigenvector with eigenvalue equal to −1. (Note that both of equations (23) give the same information. This is not surprising because the determinant of the coefficients vanishes so we know that the rows are linearly dependent, and hence each supplies the same information.) Let us denote the set of all polynomials over the field F by F [x]. Thus p ∈ F[x] means that p = a0 + a1 x + a2 x2 + · · · + an xn where each ai ∈ F and an 6= 0. The number n is called the degree of p and denoted by deg p. If an = 1 the polynomial is said to be monic. In high school you learned how to do long division, and an inductive application of this process yields the following result, called the division algorithm: Given f, g ∈ F[x] with g 6= 0, there exist unique polynomials q, r ∈ F[x] such that f = qg +r where either r = 0 or deg r < deg g. The polynomial 29 q is called the quotient and r is called the remainder. If f (x) ∈ F[x], then c ∈ F is said to be a zero or root of f if f (c) = 0. If f, g ∈ F[x] and g 6= 0, then we say that f is divisible by g (or g divides f ) over F if f = qg for some q ∈ F[x]. In other words, f is divisible by g if the remainder in the division of f by g is zero. In this case we also say that g is a factor of f . Suppose that we divide f by x − c. 
By the division algorithm we know that f = (x − c)q + r where either r = 0 or deg r < deg(x − c) = 1. But then either r = 0 or deg r = 0 in which case r ∈ F. Either way, substituting x = c we have f (c) = (c − c)q + r = r. Thus the remainder in the division of f by x − c is f (c). This result is called the remainder theorem. As a consequence of this, we see that x − c will be a factor of f if and only if f (c) = 0, a result called the factor theorem. If c is such that (x − c)m divides f but no higher power of x − c divides f , then we say that c is a root of multiplicity m. In counting the number of roots a polynomial has, we shall always count a root of multiplicity m as m roots. A root of multiplicity 1 is frequently called a simple root. The fields R and C are by far the most common fields used by physicists. However, there is an extremely important fundamental difference between them. A field F is said to be algebraically closed if every polynomial f ∈ F[x] with deg f > 0 has at least one zero (or root) in F . It is a fact (not at all easy to prove) that the complex number field C is algebraically closed. Let F be algebraically closed, and let f ∈ F[x] be of degree n ≥ 1. Since F is algebraically closed there exists a1 ∈ F such that f (a1 ) = 0, and hence by the factor theorem, f = (x − a1 )q1 where q1 ∈ F[x] and deg q1 = n − 1. (This is a consequence of the general fact that if deg p = m and deg q = n, then deg pq = m + n. Just look at the largest power of x in the product pq = (a0 + a1 x + a2 x2 + · · · + am xm )(b0 + b1 x + b2 x2 + · · · + bn xn ).) Now, by the algebraic closure of F there exists a2 ∈ F such that q1 (a2 ) = 0, and therefore q1 = (x − a2 )q2 where deg q2 = n − 2. It is clear that we can continue this process a total of n times, finally arriving at f = c(x − a1 )(x − a2 ) · · · (x − an ) = c n Y i=1 (x − ai ) where c ∈ F is nonzero. In particular, c = 1 if qn−1 is monic. Observe that while this shows that any polynomial of degree n over an algebraically closed field has exactly n roots, it doesn’t require that these roots be distinct, and in general they are not. Note also that while the field C is algebraically closed, it is not true that R is algebraically closed. This should be obvious because any quadratic equation of the form ax2 + bx + c = 0 has solutions given by the quadratic formula √ −b ± b2 − 4ac x= 2a and if b2 − 4ac < 0, then there is no solution for x in the real number system. 30 Given a matrix A = (aij ) ∈ Mn (F ), the trace of A is defined by tr A = An important property of the trace is that it is cyclic: tr AB = n X (AB)ii = n X n X i=1 j=1 i=1 aij bji = n X n X i=1 j=1 bji aij = n X Pn i=1 aii . (BA)jj = tr BA . j=1 As a consequence of this, we see that the trace is invariant under similarity transformations. In other words, if A′ = P −1 AP , then tr A′ = tr P −1 AP = tr AP P −1 = tr A. Let A ∈ Mn (F ) be a matrix representation of T . The matrix xI − A is called the characteristic matrix of A, and the expression det(x1 − T ) = 0 is called the characteristic (or secular) equation of T . The determinant det(x1 − T ) is frequently denoted by ∆T (x). Writing out the determinant in a particular basis, we see that det(x1 − T ) is of the form x − a11 −a12 ··· −a1n −a21 x − a22 · · · −a2n ∆T (x) = .. .. .. . . . −an1 −an2 · · · x − ann where A = (aij ) is the matrix representation of T in the chosen basis. 
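Before expanding this determinant, here is a quick NumPy check of the trace properties just stated, cyclicity and invariance under similarity transformations, for arbitrary random matrices (a random P is invertible with probability one):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
P = rng.standard_normal((4, 4))      # generically nonsingular

# Cyclic property: tr(AB) = tr(BA)
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))           # True

# Invariance of trace (and determinant) under A -> P^{-1} A P
A_prime = np.linalg.inv(P) @ A @ P
print(np.isclose(np.trace(A_prime), np.trace(A)))             # True
print(np.isclose(np.linalg.det(A_prime), np.linalg.det(A)))   # True
```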
Since the expansion of a determinant contains exactly one element from each row and each column, we see that (and this is a very good exercise for you to show) det(x1 − T ) = (x − a11 )(x − a22 ) · · · (x − ann ) + terms containing n − 1 factors of the form x − aii n + · · · + terms with no factors containing x = x − (tr A)xn−1 + terms of lower degree in x + (−1)n det A. (24) This monic polynomial is called the characteristic polynomial of T . Using Theorem 8 and its corollary, we see that if A′ = P −1 AP is similar to A, then det(xI − A′ ) = det(xI − P −1 AP ) = det[P −1 (xI − A)P ] = det(xI − A) . We thus see that similar matrices have the same characteristic polynomial (the converse of this statement is not true), and hence also the same eigenvalues. Therefore the eigenvalues (not eigenvectors) of an operator T ∈ L(V ) do not depend on the basis chosen for V . Note that since both the determinant and trace are invariant under similarity transformations, we may as well write tr T and det T (rather than tr A and det A) since these are independent of the particular basis chosen. Since the characteristic polynomial is of degree n in x, it follows from the discussion above that if we are in an algebraically closed field (such as C), then there 31 must exist n roots. In this case the characteristic polynomial may be factored into the form det(x1 − T ) = (x − λ1 )(x − λ2 ) · · · (x − λn ) (25) where the eigenvalues λi are not necessarily distinct. Expanding this expression we have ! n X n λi xn−1 + · · · + (−1)n λ1 λ2 · · · λn . det(x1 − T ) = x − i=1 Comparing this with the above general expression for the characteristic polynomial, we see that n X λi (26a) tr T = i=1 and det T = n Y λi . (26b) i=1 (You can easily verify these for the matrix in Example 9.) It should be remembered that this result only applies to an algebraically closed field (or to any other field F as long as all n roots of the characteristic polynomial lie in F ). If v1 , v2 , . . . , vr are eigenvectors belonging to the distinct eigenvalues λ1 , λ2 , . . . , λr of T ∈ L(V ), then it can be shown that the set {v1 , v2 , . . . , vr } is linearly independent. Therefore, if T has n distinct eigenvalues (and it can’t have more than n) there are n linearly independent eigenvectors which then form a basis for V . Let us now take a careful look at what happens if a space V has a basis of eigenvectors of an operator T . Suppose that T ∈ L(V ) with dim V = n. If V has a basis {v1 , . . . , vn } that consists entirely of eigenvectors of T , then the matrix representation of T in this basis is defined by T (vi ) = n X vj aji = λi vi = n X δji λj vj j=1 j=1 and therefore aji = δji λj . In other words, T is represented by a diagonal matrix in a basis of eigenvectors, and the diagonal elements of [T ]v are precisely the eigenvalues of T . Conversely, if T is represented by a diagonal matrix aji = δji λj relative to some basis {vi }, then reversing the argument shows that each vi is an eigenvector of T . This proves the following theorem. Theorem 9. A linear operator T ∈ L(V ) can be represented by a diagonal matrix if and only if V has a basis consisting of eigenvectors of T . If this is the case, then the diagonal elements of the matrix representation are precisely the eigenvalues of T . (Note however, that the eigenvalues need not necessarily be distinct.) 
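Equations (26a) and (26b) are easy to check in any particular case. Here is a short NumPy verification (an illustration of mine, not part of the notes) for the matrix of Example 9, together with a check that a similarity transformation leaves the eigenvalues alone:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 2.0]])          # the matrix of Example 9; eigenvalues 4 and -1

evals = np.linalg.eigvals(A)
print(np.sort(evals))                                  # [-1.  4.]
print(np.isclose(np.trace(A), evals.sum()))            # True:  tr A  =  3 = 4 + (-1)
print(np.isclose(np.linalg.det(A), evals.prod()))      # True:  det A = -4 = 4 * (-1)

# Similar matrices have the same characteristic polynomial, hence the same eigenvalues.
rng = np.random.default_rng(0)
P = rng.standard_normal((2, 2))                        # generically nonsingular
B = np.linalg.inv(P) @ A @ P
print(np.allclose(np.sort(np.linalg.eigvals(B)), np.sort(evals)))   # True
```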
32 If T ∈ L(V ) is represented in some basis {ei } by a matrix A, and in the basis of eigenvectors {vi } by a diagonal matrix D, then the discussion above Example 2 tells us that A and D must be similar matrices. This proves the following version of Theorem 9, which we state as a corollary. Corollary 1. A matrix A ∈ Mn (F ) is similar to a diagonal matrix D if and only if A has n linearly independent eigenvectors. Corollary 2. A linear operator T ∈ L(V ) can be represented by a diagonal matrix if T has n = dim V distinct eigenvalues. Proof. This follows from our discussion above. Note that the existence of n = dim V distinct eigenvalues of T ∈ L(V ) is a sufficient but not necessary condition for T to have a diagonal representation. For example, the identity operator has the usual diagonal representation, but its only eigenvalues are λ = 1. In general, if any eigenvalue has multiplicity greater than 1, then there will be fewer distinct eigenvalues than the dimension of V . However, in this case it may be possible to choose an appropriate linear combination of eigenvectors in each eigenspace so the matrix of T will still be diagonal. We say that a matrix A is diagonalizable if it is similar to a diagonal matrix D. If P is a nonsingular matrix such that D = P −1 AP , then we say that P diagonalizes A. It should be noted that if λ is an eigenvalue of a matrix A with eigenvector v (i.e., Av = λv), then for any nonsingular matrix P we have (P −1 AP )(P −1 v) = P −1 Av = P −1 λv = λ(P −1 v). In other words, P −1 v is an eigenvector of P −1 AP . Similarly, we say that T ∈ L(V ) is diagonalizable if there exists a basis for V that consists entirely of eigenvectors of T . How do we actually go about diagonalizing a matrix? If T ∈ L(V ) and A is the matrix representation of T in a basis {ei }, then P is defined to be the transformation that takes P the basis {ei } into the basis {vi } of eigenvectors. In other words, vi = P ei = j ej pji . This means that the ith column of (pji ) is just the ith eigenvector of A. The fact that P must be nonsingular coincides with the requirement that T (or A) have n linearly independent eigenvectors vi . Example 10. In Example 9 we found the eigenvectors v1 = (2, 3) (corresponding to the eigenvalue λ1 = 4) and v2 = (1, −1) (corresponding to λ2 = −1) of the matrix 1 2 A= . 3 2 33 Then the transition matrix P is given by 2 P = 3 1 −1 and you can use your favorite method to show that 1 1 1 −1 . P = 5 3 −2 Then P −1 AP = = 1 5 1 1 1 2 2 1 3 −2 3 2 3 −1 4 0 = D. 0 −1 It is also easy to see that det A = −4 = λ1 λ2 and tr A = 3 = λ1 + λ2 . 5 More on Diagonalization In the previous section we showed that an operator T ∈ L(V ) can be represented by a diagonal matrix if and only if it has a basis of eigenvectors. However, we haven’t addressed the conditions under which such a basis will exist, or the types of matrices that will in fact be diagonalizable. One very general characterization deals with the concepts of algebraic and geometric multiplicities. Unfortunately, in order to explain these terms and show how they are useful we must first develop some additional concepts. Since these notes aren’t meant to be a complete course in linear algebra, we will be fairly brief in our discussion. First note that one eigenvalue can belong to more than one linearly independent eigenvector. In fact, if T ∈ L(V ) and λ is an eigenvalue of T , then the set Vλ := {v ∈ V : T v = λv} of all eigenvectors of T belonging to λ is a subspace of V called the eigenspace of λ. 
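Concretely, a basis for Vλ is obtained by solving (λI − A)v = 0. As a small numerical aside (using SciPy's null_space helper, which is my choice of tool and not part of these notes), here are the eigenspaces of the matrix of Example 9:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 2.0],
              [3.0, 2.0]])

# A basis for the eigenspace V_lambda is a basis for the null space of (lambda*I - A).
for lam in (4.0, -1.0):
    basis = null_space(lam * np.eye(2) - A)     # columns span Ker(lambda*I - A)
    print(lam, basis.ravel())
# 4.0  -> a multiple of (2, 3), printed as a unit vector, about (0.55, 0.83) up to sign
# -1.0 -> a multiple of (1, -1), about (0.71, -0.71) up to sign
```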
It is also easy to see that Vλ = Ker(λ1 − T ). Suppose we are given a matrix A = (aij ) ∈ Mm×n (F ). Then, by partitioning the rows and columns of A in some manner, we obtain what is called a block matrix. To illustrate, suppose A ∈ M3×5 (R) is given by 7 5 5 4 −1 5 . A = 2 1 −3 0 0 8 2 1 −9 34 Then we may partition A into blocks to obtain (for example) the matrix A11 A12 A= A21 A22 where A11 = 7 5 A21 = 2 0 5 1 −3 8 2 A12 = 4 −1 A22 = 0 5 1 −9 . If A and B are block matrices that are partitioned into the same number of blocks such that each of the corresponding blocks is of the same size, then it is clear that (in an obvious notation) A11 + B11 · · · A1n + B1n .. .. A+B = . . . Am1 + Bm1 ··· Amn + Bmn In addition, if C and D are block matrices such that the number of columns in each Cij is equal to the number of rows in each P Djk , then the product of C and D is also a block matrix CD where (CD)ik = j Cij Djk . Thus block matrices are multiplied as if each block were just a single element of each matrix in the product. In other words, each (CD)ik is a matrix that is the sum of a product of matrices. The proof of this fact is an exercise in matrix multiplication, and is left to you. The proof of the next theorem is just a careful analysis of the definition of determinant, and is omitted. Theorem 10. If A ∈ Mn (F ) is a block triangular matrix of the form A11 0 .. . 0 A12 A22 .. . A13 A23 .. . ··· ··· A1k A2k .. . 0 0 ··· Akk where each Aii is a square matrix and the 0’s are zero matrices of appropriate size, then k Y det Aii . det A = i=1 35 Example 11. Consider the matrix 1 −1 2 3 2 2 0 2 . A= 4 1 −1 −1 1 2 3 0 Subtract multiples of row 1 from rows 2, 3 and 4 to obtain the matrix 1 −1 2 3 0 4 −4 −4 . 0 5 −9 −13 0 3 1 −3 Now subtract 5/4 times row 2 from row 3, and 3/4 times row 2 from row 4. This yields the matrix 2 3 1 −1 0 4 −4 −4 B= 0 0 −4 −8 0 0 4 0 with det B = det A (see the discussion at the beginning of Section 4). Since B is in block triangular form we have 1 −1 −4 −8 = 4(32) = 128. det A = det B = 0 4 4 0 Next, suppose T ∈ L(V ) and let W be a subspace of V . Then W is said to be invariant under T (or simply T -invariant) if T (w) ∈ W for every w ∈ W . For example, if V = R3 then the xy-plane is invariant under the linear transformation that rotates every vector in R3 about the z-axis. As another example, note that if v ∈ V is an eigenvector of T , then T (v) = λv for some λ ∈ F, and hence v generates a one-dimensional subspace of V that is invariant under T (this is not necessarily the same as the eigenspace of λ). Another way to describe the invariance of W under T is to say that T (W ) ⊂ W . Then clearly T 2 (W ) = T (T (W )) ⊂ W , and in general T n (W ) ⊂ W for every n ∈ Z+ . Since W is a subspace of V , this means f (T )(W ) ⊂ W for any f (x) ∈ F[x]. In other words, if W is invariant under T , then W is also invariant under any polynomial in T (over the same field as W ). If W ⊂ V is T -invariant, we may focus our attention on the effect of T on W alone. To do this, we define the restriction of T to W as that operator T |W : W → W defined by (T |W )(w) = T (w) for every w ∈ W . In other words, the restriction is an operator T |W that acts only on the subspace W , and gives the same result as the full operator T gives when it acts on those vectors in V that happen to be in 36 W . We will frequently write TW instead of T |W . Now suppose T ∈ L(V ) and let W ⊂ V be a T -invariant subspace. Furthermore, let {v1 , . . . 
, vn } be a basis for V , where the first m < n vectors form a basis for W . If A = (aij ) is the matrix representation of T relative to this basis for V , then a little thought should convince you that A must be of the block matrix form B C A= 0 D where aij = 0 for j ≤ m and i > m. This is because T (w) ∈ W and any w ∈ W has components (w1 , . . . , wm , 0, . . . , 0) relative to the above basis for V . It should also be reasonably clear that B is just the matrix representation of TW . The formal proof of this fact is given in our next theorem. Theorem 11. Let W be a subspace of V and suppose T ∈ L(V ). Then W is T -invariant if and only if T can be represented in the block matrix form B C A= 0 D where B is a matrix representation of TW . Proof. First suppose that W is T -invariant. Choose a basis {v1 , . . . , vm } for W , and extend this to a basis {v1 , . . . , vm , vm+1 , . . . , vn } for V . Then, since T (vi ) ∈ W for each i = 1, . . . , m there exist scalars bij such that TW (vi ) = T (vi ) = v1 b1i + · · · + vm bmi for each i = 1, . . . , m. In addition, since T (vi ) ∈ V for each i = m + 1, . . . , n there also exist scalars cij and dij such that T (vi ) = v1 c1i + · · · + vm cmi + vm+1 dm+1,i + · · · + vn dni for each i = m + 1, . . . , n. Because T takes the ith basis vector into the ith column of the matrix representation of T , we see that this representation is given by an n × n matrix A of the form b11 · · · b1m c1 m+1 · · · c1n b21 · · · b2m c2 m+1 · · · c2n .. .. .. .. . . . . A= b · · · b c · · · c mm m m+1 mn m1 0 0 d · · · d m+1 m+1 m+1 n . . . . .. .. .. .. 0 ··· 0 dn m+1 · · · dnn 37 or, in block matrix form as A= B 0 C D where B is an m × m matrix that represents TW , C is an m × (n − m) matrix, and D is an (n − m) × (n − m) matrix. Conversely, if A has the stated form and {v1 , . . . , vn } is a basis for V , then the subspace W of V defined by vectors of the form w= m X αi vi i=1 where each αi ∈ F will be invariant under T . Indeed, for each i = 1, . . . , m we have T (vi ) = n X j=1 and hence T (w) = Pm i=1 vj aji = v1 b1i + · · · + vm bmi ∈ W αi T (vi ) ∈ W . Given a linear operator T ∈ L(V ), what we have called the multiplicity of an eigenvalue λ is the largest positive integer m such that (x − λ)m divides the characteristic polynomial ∆T (x). This is properly called the algebraic multiplicity of λ, in contrast to the geometric multiplicity which is the number of linearly independent eigenvectors belonging to that eigenvalue. In other words, the geometric multiplicity of λ is the dimension of Vλ . In general, we will use the word “multiplicity” to mean the algebraic multiplicity. The set of all eigenvalues of a linear operator T ∈ L(V ) is called the spectrum of T . If some eigenvalue in the spectrum of T is of algebraic multiplicity greater than 1, then the spectrum is said to be degenerate. If T ∈ L(V ) has an eigenvalue λ of algebraic multiplicity m, then it is not hard for us to show that the dimension of the eigenspace Vλ must be less than or equal to m. Note that since every element of Vλ is an eigenvector of T with eigenvalue λ, the space Vλ must be a T -invariant subspace of V . Furthermore, every basis for Vλ will obviously consist of eigenvectors corresponding to λ. Theorem 12. Let T ∈ L(V ) have eigenvalue λ. Then the geometric multiplicity of λ is always less than or equal to its algebraic multiplicity. In other words, if λ has algebraic multiplicity m, then dim Vλ ≤ m. Proof. Suppose dim Vλ = r and let {v1 , . . . , vr } be a basis for Vλ . 
Now extend this to a basis {v1 , . . . , vn } for V . Relative to this basis, T must have the matrix representation (see Theorem 11) λIr C . 0 D 38 Applying Theorem 10 and the fact that the determinant of a diagonal matrix is just the product of its (diagonal) elements, we see that the characteristic polynomial ∆T (x) of T is given by (x − λ)Ir −C ∆T (x) = 0 xIn−r − D = det[(x − λ)Ir ] det(xIn−r − D) = (x − λ)r det(xIn−r − D) which shows that (x − λ)r divides ∆T (x). Since by definition m is the largest positive integer such that (x − λ)m | ∆T (x), it follows that r ≤ m. Note that a special case of this theorem arises when an eigenvalue is of (algebraic) multiplicity 1. In this case, it then follows that the geometric and algebraic multiplicities are necessarily equal. We now proceed to show just when this will be true in general. Recall that any polynomial over an algebraically closed field will factor into linear terms (see equation (25)). Theorem 13. Assume that T ∈ L(V ) has a characteristic polynomial that factors into (not necessarily distinct) linear terms. Let T have distinct eigenvalues λ1 , . . . , λr with (algebraic) multiplicities m1 , . . . , mr respectively, and let dim Vλi = di . Then T is diagonalizable if and only if mi = di for each i = 1, . . . , r. Proof. Let dim V = n. We note that since the characteristic polynomial of T is of degree n and factors into linear terms, it follows that m1 + · · · + mr = n. We first assume that T is diagonalizable. By definition, this means that V has a basis consisting of n linearly independent eigenvectors of T . Since each of these basis eigenvectors must belong to at least one of the eigenspaces Vλi , it follows that V = Vλ1 + · · · + Vλr and consequently n ≤ d1 + · · · + dr . From Theorem 12 we know that di ≤ mi for each i = 1, . . . , r and hence n ≤ d1 + · · · + dr ≤ m1 + · · · + mr = n which implies d1 + · · · + dr = m1 + · · · + mr or (m1 − d1 ) + · · · + (mr − dr ) = 0. But each term in this equation is nonnegative (by Theorem 12), and hence we must have mi = di for each i. Conversely, suppose di = mi for each i = 1, . . . , r. For each i, we know that any basis for Vλi consists of linearly independent eigenvectors corresponding to the eigenvalue λi , while on the other hand we know that eigenvectors corresponding to distinct eigenvalues are linearly independent. Therefore the union B of the bases of 39 {Vλi } forms a linearly independent set of d1 + · · · + dr = m1 + · · · + mr vectors. But m1 + · · · + mr = n = dim V , and hence B forms a basis for V . Since this shows that V has a basis of eigenvectors of T , it follows by definition that T must be diagonalizable. Example 12. Consider the operator T ∈ L(R3 ) defined by T (x, y, z) = (9x + y, 9y, 7z). Relative to the standard basis for R3 , the 9 A=0 0 and hence the characteristic polynomial is matrix representation of T is given by 1 0 9 0 0 7 ∆A (x) = det(A − λI) = (9 − λ)2 (7 − λ) which is a product of linear factors. However, 0 1 A − 9I = 0 0 0 0 0 0 −2 which clearly has rank equal to 2, and hence nul(A − 9I) = 3 − 2 = 1 which is not the same as the algebraic multiplicity of λ = 9 (which is 2). Thus T is not diagonalizable. Example 13. Consider the operator on R3 defined by the following matrix: 5 −6 −6 A = −1 4 2 . 
3 −6 −4 In order to avoid factoring a cubic polynomial, we compute the characteristic polynomial ∆A (x) = det(xI − A) by applying elementary row operations as follows (you should be able to see exactly what elementary row operations were performed in each step; see the discussion at the beginning of Section 4). x−5 0 −x + 2 6 6 x − 2 1 1 x−4 −2 x−4 −2 = −3 6 x+4 6 x + 4 −3 1 0 = (x − 2) 1 x − 4 −3 6 40 −1 −2 x+4 1 0 −1 −1 = (x − 2) 0 x − 4 0 6 x+1 x−4 −1 = (x − 2) 6 x+1 = (x − 2)2 (x − 1). We now see that A has eigenvalue λ1 = 1 with (algebraic) multiplicity 1, and eigenvalue λ2 = 2 with (algebraic) multiplicity 2. From Theorem 12 we know that the algebraic and geometric multiplicities of λ1 are necessarily the same and equal to 1, so we need only consider λ2 . Observing that 3 −6 −6 2 2 A − 2I = −1 3 −6 −6 it is obvious that rank(A − 2I) = 1, and hence nul(A − 2I) = 3 − 1 = 2. This shows that A is indeed diagonalizable. Let us now construct bases for the eigenspaces Wi = Vλi . This means we seek vectors v = (x, y, z) ∈ R3 such that (A − λi I)v = 0. This is easily solved by the usual row reduction techniques as follows. For λ1 = 1 we have 4 −6 −6 1 0 −1 1 0 −1 3 2 → −1 3 2→0 3 1 A − I = −1 3 −6 −5 3 −6 −5 0 −6 −2 1 →0 0 0 −1 3 1 0 0 which has the solutions x = z and y = −z/3 = −x/3. Therefore W1 is spanned by the single eigenvector v1 = (3, −1, 3). As to λ2 = 2, we proceed in a similar manner to obtain 3 −6 −6 1 −2 −2 2 2→0 0 0 A − 2I = −1 3 −6 −6 0 0 0 which implies that any vector (x, y, z) with x = 2y + 2z will work. For example, we can let x = 0 and y = 1 to obtain z = −1, and hence one basis vector for W2 is given by v2 = (0, 1, −1). If we let x = 1 and y = 0, then we have z = 1/2 so that another independent basis vector for W2 is given by v3 = (2, 0, 1). In terms of these eigenvectors, the transformation matrix P that diagonalizes A is given by 3 0 2 1 0 P = −1 3 −1 1 41 and I leave it to you to verify that AP = P D (i.e., P −1 AP = D) where D is the diagonal matrix with diagonal elements d11 = 1 and d22 = d33 = 2. 6 Diagonalizing Normal Matrices The previous section described some general conditions under which a matrix may be diagonalized. However, in physics the most useful matrices are either real symmetric (in the case of Mn (R)), Hermitian (in the case of Mn (C)) or unitary (also for Mn (C)). In this section I will show that in fact all of these can always be diagonalized. One tool that we will find useful is the Gram-Schmidt orthogonalization process which you have probably seen before. However, just in case, I will give a complete statement and proof. It will also be useful to note that any orthogonal set of vectors is necessarily linearly Pn independent. To see this, let {v1 , . . . , vn } be an orthogonal set, and suppose i=1 ai vi = 0. Taking the inner product with vj we have n n X X ai δij = aj . ai hvj , vi i = 0= i=1 i=1 Since this holds for all j, we have ai = 0 for all i so that {vi } is linearly independent as claimed. Theorem 14. Let V be a finite-dimensional inner product space. Then there exists an orthonormal set of vectors that forms a basis for V . Proof. Let dim V = n and let {u1 , . . . , un } be a basis for V . We will construct a new basis {w1 , . . . , wn } such that hwi , wj i = δij . To begin, we choose w1 = u1 ku1 k so that 2 kw1 k = hw1 , w1 i = hu1 / ku1 k , u1 / ku1 ki = hu1 , u1 i/ ku1 k 2 = ku1 k2 / ku1 k2 = 1 and hence w1 is a unit vector. We now take u2 and subtract off its “projection” along w1 . 
This will leave us with a new vector v2 that is orthogonal to w1 . Thus, we define v2 = u2 − hw1 , u2 iw1 so that hw1 , v2 i = hw1 , u2 i − hw1 , u2 ihw1 , w1 i = 0 42 If we let v2 kv2 k w2 = then {w1 , w2 } is an orthonormal set . (That v2 6= 0 will be shown below.) We now go to u3 and subtract off its projection along w1 and w2 . In other words, we define v3 = u3 − hw2 , u3 iw2 − hw1 , u3 iw1 so that hw1 , v3 i = hw2 , v3 i = 0. Choosing w3 = v3 kv3 k we now have an orthonormal set {w1 , w2 , w3 }. It is now clear that given an orthonormal set {w1 , . . . , wk }, we let vk+1 = uk+1 − k X i=1 hwi , uk+1 iwi so that vk+1 is orthogonal to w1 , . . . , wk , and hence we define wk+1 = vk+1 . kvk+1 k It should now be obvious that we can construct an orthonormal set of n vectors from our original basis of n vectors. To finish the proof, we need only show that w1 , . . . , wn are linearly independent. To see this, note first that since u1 and u2 are linearly independent, w1 and u2 must also be linearly independent, and hence v2 6= 0 by definition of linear independence. Thus w2 exists and {w1 , w2 } is linearly independent (since they are orthogonal). Next, {w1 , w2 , u3 } is linearly independent since w1 and w2 are in the linear span of u1 and u2 . Hence v3 6= 0 so that w3 exists, and again {w1 , w2 , w3 } is linearly independent. In general then, if {w1 , . . . , wk } is linearly independent, it follows that the set {w1 , . . . , wk , uk+1 } is also independent since {w1 , . . . , wk } is in the linear span of {u1 , . . . , uk }. Hence vk+1 6= 0 and wk+1 exists. Then {w1 , . . . , wk+1 } is linearly independent. Thus {w1 , . . . , wn } forms a basis for V , and hwi , wj i = δij . Corollary (Gram-Schmidt process). Let {u1 , . . . , un } be a linearly independent set of vectors in an inner product space V . Then there exists a set of orthonormal vectors w1 , . . . , wn ∈ V such that the linear span of {u1 , . . . , uk } is equal to the linear span of {w1 , . . . , wk } for each k = 1, . . . , n. Proof. This corollary follows by a careful inspection of the proof of Theorem 14. 43 We emphasize that the Gram-Schmidt algorithm (the “orthogonalization process” of Theorem 14) as such applies to any inner product space, and is not restricted to only finite-dimensional spaces. Example 14. Consider the following basis vectors for R3 : u1 = (3, 0, 4) u2 = (−1, 0, 7) u3 = (2, 9, 11). Let us apply the Gram-Schmidt process (with the standard inner product on R3 ) to obtain a new orthonormal basis for R3 . √ Since ku1 k = 9 + 16 = 5, we define w1 = u1 /5 = (3/5, 0, 4/5). Next, using hw1 , u2 i = −3/5 + 28/5 = 5 we let v2 = (−1, 0, 7) − (3, 0, 4) = (−4, 0, 3). Since kv2 k = 5, we have w2 = (−4/5, 0, 3/5). Finally, using hw1 , u3 i = 10 and hw2 , u3 i = 5 we let v3 = (2, 9, 11) − (−4, 0, 3) − (6, 0, 8) = (0, 9, 0) and hence, since kv3 k = 9, our third basis vector becomes w3 = (0, 1, 0). I leave it to you to show that {w1 , w2 , w3 } does indeed form an orthonormal basis for R3 . Recall that the transpose of A = (aij ) ∈ Mn (C) is the matrix AT = (aij )T = (aji ). Then for any A, B ∈ Mn (C) we have (AB)Tij = (AB)ji = n X ajk bki = k=1 n X bTik aTkj = (B T AT )ij k=1 and therefore (AB)T = B T AT as you should already know. Now suppose we have a matrix A ∈ Mn (C). We define the adjoint (or Hermitian adjoint) of A to be the matrix A† = A∗T . In other words, the adjoint of A is its complex conjugate transpose. From what we just showed, it is easy to see that (AB)† = B † A† . 
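Both the Gram-Schmidt construction of Example 14 and the identity (AB)† = B†A† are easy to check numerically. The short NumPy sketch below is my own illustration (not part of the notes); gram_schmidt is just a helper implementing the projections used in the proof of Theorem 14.

```python
import numpy as np

def gram_schmidt(vectors):
    # Orthonormalize linearly independent vectors as in the proof of Theorem 14:
    # subtract from each u its projections along the w's already built, then normalize.
    basis = []
    for u in vectors:
        v = u - sum(np.dot(w, u) * w for w in basis)
        basis.append(v / np.linalg.norm(v))
    return basis

u1, u2, u3 = np.array([3.0, 0, 4]), np.array([-1.0, 0, 7]), np.array([2.0, 9, 11])
w1, w2, w3 = gram_schmidt([u1, u2, u3])
print(w1, w2, w3)     # [0.6 0. 0.8]  [-0.8 0. 0.6]  [0. 1. 0.]   as in Example 14

# The adjoint reverses the order of a product: (AB)^dagger = B^dagger A^dagger.
rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
dag = lambda M: M.conj().T
print(np.allclose(dag(A @ B), dag(B) @ dag(A)))   # True
```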
If it so happens that A† = A, then A is said to be a Hermitian matrix. 44 If a matrix U ∈ Mn (C) has the property that U † = U −1 , then we say U is unitary. Thus a matrix U is unitary if U U † = U † U = I. (I state without proof that in a finite-dimensional space, it is only necessary to require either U U † = I or U † U = I. In other words, in finite dimensions (or for any finite group) the existence of a left (right) inverse implies the existence of a right (left) inverse. However, the full definition is necessary in the case of infinite-dimensional spaces.) We also see that the product of two unitary matrices U and V is unitary since (U V )† U V = V † U † U V = V † IV = V † V = I. If a matrix N ∈ Mn (C) has the property that it commutes with its adjoint, i.e., N N † = N † N , then N is said to be a normal matrix. Note that Hermitian and unitary matrices are automatically normal. Example 15. Consider the matrix A ∈ M2 (C) given by 1 1 −1 . A= √ i 2 i Then the adjoint of A is given by 1 A† = √ 2 1 −i −1 −i and I leave it to you to verify that AA† = A† A = I, and hence show that A is unitary. A convenient property of the adjoint is this. If A ∈ Mn (C) and x, y ∈ Cn , then Ax ∈ Cn also, so we may use the standard inner product on Cn to write (using A† = A∗T ) hAx, yi = n X (Ax)∗i yi = n X a∗ij x∗j yi = i=1 i=1 † = hx, A yi. n X x∗j a†ji yi i=1 (27) In the particular case of a unitary matrix, we see that hU x, U yi = hx, U † U yi = hx, yi so that unitary transformations also preserve the angle between two vectors (and hence maintains orthogonality as well). Choosing y = x we also see that 2 kU xk = hU x, U xi = hx, U † U xi = hx, Ixi = hx, xi = kxk 2 so that unitary transformations preserve lengths of vectors, i.e., they are really just rotations in Cn . 45 It is well worth pointing out that in the case of a real matrix A ∈ Mn (R), instead of the adjoint A† we have the transpose AT and equation (27) becomes hAx, yi = hx, AT yi or equivalently hAT x, yi = hx, Ayi. (28) We will use this below when we prove that a real symmetric matrix has all real eigenvalues. Note that since U ∈ Mn (C), the rows Ui and columns U i of U are just vectors in Cn . This means we can take their inner product relative to the standard inner product on Cn . Writing out the relation U U † = I in terms of components, we have (U U † )ij = n X uik u†kj = k=1 n X uik u∗jk = n X k=1 k=1 u∗jk uik = hUj , Ui i = δij and from U † U = I we see that (U † U )ij = n X k=1 u†ik ukj = n X k=1 u∗ki ukj = hU i , U j i = δij . In other words, a matrix is unitary if and only if its rows (or columns) each form an orthonormal set. Note we have shown that if the rows (columns) of U ∈ Mn (C) form an orthonormal set, then so do the columns (rows), and either of these is a sufficient condition for U to be unitary. For example, you can easily verify that the matrix A in Example 15 satisfies these conditions. It is also worth pointing out that Hermitian and unitary matrices have important analogues over the real number system. If A ∈ Mn (R) is Hermitian, then A = A† = AT , and we say A is symmetric. If U ∈ Mn (R) is unitary, then U −1 = U † = U T , and we say U is orthogonal. Repeating the above calculations over R, it is easy to see that a real matrix is orthogonal if and only if its rows (or columns) form an orthonormal set. Let us summarize what we have shown so far in this section. Theorem 15. The following conditions on a matrix U ∈ Mn (C) are equivalent: (i) U is unitary. (ii) The rows Ui of U form an orthonormal set. 
(iii) The columns U i of U form an orthonormal set. Note that the equivalence of (ii) and (iii) in this theorem means that the rows of U form an orthonormal set if and only if the columns of U form an orthonormal set. But the rows of U are just the columns of U T , and hence U is unitary if and only if U T is unitary. 46 Corollary. The following conditions on a matrix A ∈ Mn (R) are equivalent: (i) A is orthogonal. (ii) The rows Ai of A form an orthonormal set. (iii) The columns Ai of A form an orthonormal set. Our next theorem details several useful properties of orthogonal and unitary matrices. Theorem 16. (i) If A is an orthogonal matrix, then det A = ±1. (ii) If U is a unitary matrix, then |det U | = 1. Alternatively, det U = eiφ for some real number φ. Proof. (i) We have AAT = I, and hence 1 = det I = det(AAT ) = (det A)(det AT ) = (det A)2 so that det A = ±1. (ii) If U U † = I then, as above, we have 1 = det I = det(U U † ) = (det U )(det U † ) = (det U )(det U T )∗ 2 = (det U )(det U )∗ = |det U | . Since the absolute value is defined to be positive, this shows |det U | = 1 and hence det U = eiφ for some real φ. Example 16. Let us take a look at rotations in R2 as shown, for example, in the figure below. Recall that if we have two bases {ei } P and {ēi }, then they are related by a transition matrix A = (aij ) defined by ēi = j ej aji . In addition, if P i P i P X= x ei = x̄ ēi , then xi = j aij x̄j . If both {ei } and {ēi } are orthonormal bases, then E X D X X akj hei , ek i = hei , ēj i = ei , akj δik = aij . ek akj = k k k Using the usual dot product on R2 as our inner product and referring to the figure below, we see that the elements aij are given by a11 = e1 · ē1 = |e1 | |ē1 | cos θ = cos θ a12 = e1 · ē2 = |e1 | |ē2 | cos(π/2 + θ) = − sin θ a21 = e2 · ē1 = |e2 | |ē1 | cos(π/2 − θ) = sin θ a22 = e2 · ē2 = |e2 | |ē2 | cos θ = cos θ 47 x̄2 x2 X θ x̄1 θ x1 Thus the matrix A is given by (aij ) = cos θ sin θ − sin θ . cos θ I leave it to you to compute directly and show ATA = AAT = I and det A = +1. Example 17. Referring to Example 16, we can show that any (real) 2×2 orthogonal matrix with det A = +1 has the form cos θ − sin θ (aij ) = sin θ cos θ for some θ ∈ R. To see this, suppose A has the form a b c d where a, b, c, d ∈ R. Since A is orthogonal, its rows form an orthonormal set, and hence we have a2 + b2 = 1, c2 + d2 = 1, ac + bd = 0, ad − bc = 1 where the last equation follows from det A = 1. If a = 0, then the first of these equations yields b = ±1, the third then yields d = 0, and the last yields −c = 1/b = ±1 which is equivalent to c = −b. In other words, if a = 0, then A has either of the forms 0 1 0 −1 or . −1 0 1 0 The first of these is of the required form if we choose θ = −90◦ = −π/2, and the second is of the required form if we choose θ = +90◦ = +π/2. Now suppose that a 6= 0. From the third equation we have c = −bd/a, and substituting this into the second equation, we find (a2 + b2 )d2 = a2 . Using the first 48 equation, this becomes a2 = d2 or a = ±d. If a = −d, then the third equation yields b = c, and hence the last equation yields −a2 − b2 = 1 which is impossible. Therefore a = +d, the third equation then yields c = −b, and we are left with a −c c a Since det A = a2 + c2 = 1, there exists a real number θ such that a = cos θ and c = sin θ which gives us the desired form for A. One of the most important and useful properties of matrices over C is that they can always be put into triangular form by an appropriate transformation. 
To show this, it will be helpful to recall from Section 1 that if A and B are two matrices for which the product AB is defined, then the ith row of AB is given by (AB)i = Ai B and the ith column of AB is given by (AB)i = AB i . Theorem 17 (Schur Canonical Form). If A ∈ Mn (C), then there exists a unitary matrix U ∈ Mn (C) such that U † AU is upper-triangular. Furthermore, the diagonal entries of U † AU are just the eigenvalues of A. Proof. The proof is by induction. If n = 1 there is nothing to prove, so we assume the theorem holds for any square matrix of size n − 1 ≥ 1, and suppose A is of size n. Since we are dealing with the algebraically closed field C, we know that A has n (not necessarily distinct) eigenvalues. Let λ be one of these eigenvalues, and denote e 1 . We extend U e 1 to a basis for Cn , and by the the corresponding eigenvector by U Gram-Schmidt process we assume this basis is orthonormal. From our discussion e above, we see that this basis may be used as the columns of a unitary matrix U 1 e with U as its first column. We then see that e † AU e )1 = U e † (AU e )1 = U e † (AU e 1) = U e † (λU e 1 ) = λ(U e †U e 1) (U e †U e )1 = λI 1 = λ(U e † AU e has the form and hence U e † AU e = U λ 0 .. . 0 ∗···∗ B where B ∈ Mn−1 (C) and the *’s are (in general) nonzero scalars. By our induction hypothesis, we may choose a unitary matrix W ∈ Mn−1 (C) such that W † BW is upper-triangular. Let V ∈ Mn (C) be a unitary matrix of the 49 form V = 1 0 .. . 0···0 W 0 e V ∈ Mn (C). Then and define the unitary matrix U = U e V )† A(U e V ) = V † (U e † AU e )V U † AU = (U is upper-triangular since (in an obvious shorthand notation) λ ∗ 1 0 1 0 λ e † AU e )V = 1 0 † V † (U = 0 W 0 B 0 W 0 W† 0 λ ∗ = 0 W † BW ∗ BW and W † BW is upper-triangular by the induction hypothesis. It is easy to see (since the determinant of a triangular matrix is the product of the diagonal entries) that the roots of det(λI − U † AU ) are just the diagonal entries of U † AU because λI − U † AU is of the upper triangular form λ − (U † AU )11 ∗ ∗ 0 λ − (U † AU )22 ∗ .. .. .. . ∗ . . 0 λ − (U † AU )nn 0 where the *’s are just some in general nonzero entries. But det(λI − U † AU ) = det[U † (λI − A)U ] = det(λI − A) so that A and U † AU have the same eigenvalues. Corollary. If A ∈ Mn (R) has all its eigenvalues in R, then the matrix U defined in Theorem 17 may be chosen to have all real entries. Proof. If λ ∈ R is an eigenvalue of A, then A − λI is a real matrix with determinant det(A−λI) = 0, and therefore the homogeneous system of equations (A−λI)X = 0 e 1 = X, we may now proceed as in Theorem 17. The has a real solution. Defining U details are left to you. We say that two matrices A, B ∈ Mn (C) are unitarily similar if there exists a unitary matrix U such that B = U † AU = U −1 AU . Since this defines an equivalence 50 relation on the set of all matrices in Mn (C), it is also common to say that A and B are unitarily equivalent. I leave it to you to show that if A and B are unitarily similar and A is normal, then B is also normal. In particular, suppose U is unitary and N is such that U † N U = D is diagonal. Since any diagonal matrix is automatically normal, it follows that N must be normal also. In other words, any matrix unitarily similar to a diagonal matrix is normal. We now show that the converse is also true, i.e., that any normal matrix is unitarily similar to a diagonal matrix. This extremely important result is the basis for many physical applications in both classical and quantum physics. 
To see this, suppose N is normal, and let U † N U = D be the Schur canonical form of N . Then D is both upper-triangular and normal (since it is unitarily similar to a normal matrix). We claim that the only such matrices are diagonal. For, consider the (1, 1) elements of DD† and D† D. From what we showed above, we have (DD† )11 = hD1 , D1 i = |d11 |2 + |d12 |2 + · · · + |d1n |2 and (D† D)11 = hD1 , D1 i = |d11 |2 + |d21 |2 + · · · + |dn1 |2 . But D is upper-triangular so that d21 = · · · = dn1 = 0. By normality we must have (DD† )11 = (D† D)11 , and therefore d12 = · · · = d1n = 0 also. In other words, with the possible exception of the (1, 1) entry, all entries in the first row and column of D must be zero. In the same manner, we see that (DD† )22 = hD2 , D2 i = |d21 |2 + |d22 |2 + · · · + |d2n |2 and 2 2 2 (D† D)22 = hD2 , D2 i = |d12 | + |d22 | + · · · + |dn2 | . Since the fact that D is upper-triangular means d32 = · · · = dn2 = 0 and we just showed that d21 = d12 = 0, it again follows by normality that d23 = · · · = d2n = 0. Thus all entries in the second row and column with the possible exception of the (2, 2) entry must be zero. Continuing this procedure, it is clear that D must be diagonal as claimed. In other words, an upper-triangular normal matrix is necessarily diagonal. This discussion proves the following very important theorem. Theorem 18. A matrix N ∈ Mn (C) is normal if and only if there exists a unitary matrix U such that U † N U is diagonal. Corollary. If A = (aij ) ∈ Mn (R) is symmetric, then its eigenvalues are real and there exists an orthogonal matrix S such that S T AS is diagonal. 51 Proof. If the eigenvalues are real, then the rest of this corollary follows from the corollary to Theorem 17 and the real analogue of the proof of Theorem 18. Now suppose A = AT so that aij = aji . If λ is an eigenvalue of A, then there exists a (nonzero and not necessarily real) vector x ∈ Cn such that Ax = λx and hence 2 hx, Axi = λhx, xi = λ kxk . On the other hand, using equation (28) we see that 2 hx, Axi = hAT x, xi = hx, AT xi∗ = hx, Axi∗ = λ∗ hx, xi∗ = λ∗ kxk . 2 Subtracting these last two equations yields (λ − λ∗ ) kxk = 0 and hence λ = λ∗ since kxk 6= 0 by definition. Let me make some observations. Note that any basis relative to which a normal matrix N is diagonal is by definition a basis of eigenvectors. The unitary transition matrix U that diagonalizes N has columns that are precisely these eigenvectors, and since the columns of a unitary matrix are orthonormal, it follows that the eigenvector basis is in fact orthonormal. Of course, the analogous result holds for a real symmetric matrix also. 52 Part II: Vector Calculus 7 Surfaces To begin, it’s worth being a little bit careful in defining a surface. For simplicity we restrict attention to R3 , and since this isn’t an advanced calculus course, we’ll stick to a less formal approach and rely instead on intuition. So, a surface in R3 is best represented parametrically as x = x(u, v) = (x(u, v), y(u, v), z(u, v)) where (u, v) ∈ A ⊂ R2 . v z u = const u = const v = const v = const y u x For v = v0 = const, the curve u → x(u, v0 ) has tangent vector ∂x ∂y ∂z ∂x = , , ∂u ∂u ∂u ∂u which is traditionally denoted by xu . Similarly, for u = u0 = const we have the curve v → x(u0 , v) with tangent ∂x ∂x ∂y ∂z := xv . = , , ∂v ∂v ∂v ∂v At any point (u, v), the vectors xu and xv span the tangent plane to the surface. 
Their cross product is the normal to the surface (because it’s normal to both xu and xv ) and hence we have the surface normal x̂ ŷ ẑ xu × xv = ∂x/∂u ∂y/∂u ∂z/∂u . ∂x/∂v ∂y/∂v ∂z/∂v Therefore the principle normal is the unit normal n̂ = xu × xv . kxu × xv k 53 n̂ xv xu Example 18. The sphere of radius a can be expressed as r = x(θ, ϕ) = (a sin θ cos ϕ, a sin θ sin ϕ, a cos θ) where 0 ≤ θ ≤ π and 0 ≤ ϕ ≤ 2π. Then we have x̂ ŷ ẑ xθ × xϕ = a cos θ cos ϕ a cos θ sin ϕ −a sin θ −a sin θ sin ϕ a sin θ cos ϕ 0 = a2 sin2 θ cos ϕ x̂ + a2 sin2 θ sin ϕ ŷ + a2 sin θ cos θ ẑ so that 2 kxθ × xϕ k = (a2 sin2 θ cos ϕ)2 + (a2 sin2 θ sin ϕ)2 + (a2 sin θ cos θ)2 = a4 sin2 θ and therefore n̂ = xθ × xϕ = sin θ cos ϕ x̂ + sin θ sin ϕ ŷ + cos θ ẑ . kxθ × xϕ k Note that this is just n̂ = (1/a)r. But krk = kx(θ, ϕ)k = a so we have n̂ = r̂ as should have been expected. What about the area of a surface? We subdivide the region A ⊂ R2 into small rectangles and look at the image of each little rectangle. v x(u, v + dv) A x(u, v) u 54 x(u + du, v) For the infinitesimal rectangle on the surface we have the distance between points given by x(u + du, v) − x(u, v) ≈ ∂x du = xu du ∂u x(u, v + dv) − x(u, v) ≈ ∂x dv = xv dv ∂v so the element of area is dS = kxu du × xv dvk = kxu × xv k du dv and hence that total area of the surface is given by Z S= kxu × xv k du dv . A It also follows that the integral of a function f (x, y, z) over the surface is given by Z Z f (x, y, z) dS = f (x(u, v), y(u, v), z(u, v)) kxu × xv k du dv . A 8 A Gradients Now let us consider the gradient of a function f (x, y, z). We know that the differential of the function is given by df = ∂f ∂f ∂f dx + dy + dz . ∂x ∂y ∂z Since dx = dx x̂ + dy ŷ + dz ẑ, we may define the vector ∇f = ∂f ∂f ∂f x̂ + ŷ + ẑ ∂x ∂y ∂z so that df = ∇f · dx . The vector operator ∇ is called the gradient vector: ∂ ∂ ∂ ∂ ∂ ∂ = x̂ , , + ŷ + ẑ . ∇= ∂x ∂y ∂z ∂x ∂y ∂z To understand what this represents physically, suppose that the displacement dx is in the direction of constant f , i.e., dx is tangent to the surface f = const. Then clearly df = 0 in that direction so that df = ∇f · dx = 0. But this means that ∇f ⊥dx; in other words, the gradient is orthogonal to the surface. And since df = ∇f · dx = k∇f k kdxk cos θ, this shows that df will be largest when cos θ = 1, i.e., when dx points along ∇f . 55 Other coordinate systems work the same way. For example, if we have f (r, θ, ϕ), then ∂f ∂f ∂f df = dr + dθ + dϕ ∂r ∂θ ∂ϕ while dx = dr r̂ + rdθ θ̂ + r sin θdϕ ϕ̂ . Remark : The first thing to formulate is how to find an infinitesimal displacement dx in a curvilinear coordinate system. Let us consider the usual spherical coordinates as an example. z θ x y ϕ x Writing kxk = r, the position vector x has (x, y, z) coordinates x = (r sin θ cos ϕ, r sin θ sin ϕ, r cos θ) . If we let ui stand for the ith coordinate of a general curvilinear coordinate system, then a unit vector in the ui direction is by definition ûi = ∂x/∂ui . k∂x/∂ui k For our spherical coordinates we have for r: ∂x = (sin θ cos ϕ, sin θ sin ϕ, cos θ) ∂r and so that 1/2 ∂x = ∂x , ∂x =1 ∂r ∂r ∂r r̂ = (sin θ cos ϕ, sin θ sin ϕ, cos θ) For θ: and ∂x = r̂ . ∂r ∂x = (r cos θ cos ϕ, r cos θ sin ϕ, −r sin θ) ∂θ 56 and so that 1/2 ∂x = ∂x , ∂x =r ∂θ ∂θ ∂θ θ̂ = (cos θ cos ϕ, cos θ sin ϕ, − sin θ) For ϕ: and so that and ∂x = r θ̂ . ∂θ ∂x = (−r sin θ sin ϕ, r sin θ cos ϕ, 0) ∂ϕ 1/2 ∂x = ∂x , ∂x = r sin θ ∂ϕ ∂ϕ ∂ϕ ϕ̂ = (− sin ϕ, cos ϕ, 0) and ∂x = r sin θ ϕ̂ . 
∂ϕ Putting this all together we see that dx = ∂x ∂x ∂x dr + dθ + dϕ ∂r ∂θ ∂ϕ or dx = dr r̂ + rdθ θ̂ + r sin θdϕ ϕ̂ . You can also easily verify the the unit vectors r̂, θ̂, ϕ̂ constructed in this manner are orthonormal. While this was the correct way to find dx, the easy way to find it in various coordinate systems is to hold two variables constant, vary the third, and see what the resulting displacement is. In the case of spherical coordinates, holding θ, ϕ constant and varying r we have dx = dr r̂. Holding r, ϕ constant and varying θ we have dx = rdθ θ̂. Finally, holding r, θ constant and varying ϕ we have dx = r sin θ dϕ ϕ̂. Putting these together we obtain the general displacement dx = dr r̂ + rdθ θ̂ + r sin θdϕ ϕ̂. Note also that each of the different dx’s lies on the edge of an infinitesimal “cube,” and hence the volume element in spherical coordinates is the product of the sides of the cube or d3 x = r2 sin θ drdθdϕ.) In any case, we see that we can write 1 ∂f 1 ∂f ∂f r̂ + θ̂ + ϕ̂ · (dr r̂ + rdθ θ̂ + r sin θdϕ ϕ̂) df = ∂r r ∂θ r sin θ ∂ϕ and hence from df = ∇f · dx it follows that ∇ = r̂ ∂ ∂ 1 ∂ 1 + θ̂ + ϕ̂ . ∂r r ∂θ r sin θ ∂ϕ It is important to realize that this is the form of the gradient with respect to an 57 orthonormal set of basis vectors. Such a basis is called a non-coordinate basis in differential geometry. 9 Rotations First recall the definition of the matrix representation of a linear operator: If T ∈ L(U, V ) where U has basis {ui } and V has basis {vi }, then the matrix representation (ai j ) of T with respect to these bases is defined by T ui = vj aj i . We will sometimes write A = [T ]vu to denote the fact that A is the matrix representation of T with respect to the given bases of U and V . Since with respect to {vi }, the basis vectors themselves have coordinates v1 = (1, 0, . . . , 0), . . . , vn = (0, . . . , 0, 1), we see that 1 a i 1 1 .. .. n .. 1 1 n T ui = v1 a i + · · · vn a i = . a i + · · · + . a i = . 0 0 an i which is the ith column of the matrix (aj i ). In other words, a linear transformation T takes the ith basis vector into the ith column of the matrix representation of T . The result of T acting on an arbitrary vector x ∈ U is then given by T x = T (xi ui ) = xi T ui = xi vj aj i = (aj i xi )vj . If we write y = T x, then y = y j vj = (aj i xi )vj and we see that y j = aj i xi . Suppose we have a vector space V with two bases {ei } and {ēi }. Then any basis vector ēi can be written in terms of the basis {ei } as ēi = ej pj i where the transition matrix (pi j ) is necessarily nonsingular (since we can also write ei in terms of the basis {ēi }). It is useful to think of the transition matrix as defining a linear operator P by ēi = P (ei ), and then (pi j ) is the matrix representation of P relative to the bases {ei } and {ēi } as defined in the usual manner. Since any x ∈ V can be written in terms of either basis, we have x = xj ej = x̄i ēi = x̄i ej pj i = (pj i x̄i )ej which implies that xj = pj i x̄i or equivalently x̄i = (p−1 )i j xj . (29) What we will now do is focus on rotations in R2 . Rotating the basis vectors is just a particular type of change of basis, and the transition matrix is just a rotation matrix. 58 To begin with, let r be a vector in R2 and consider a counterclockwise rotation of the x1 x2 -plane about the x3 -axis as shown below. (For simplicity of notation, we will let x = x1 and y = x2 .) x̄2 x2 r ē2 θ e2 φ ē1 θ e1 x̄1 x1 The vectors ei and ēi are the usual orthonormal basis vectors with kei k = kēi k = 1. 
From the geometry of the diagram we see that ē1 = (cos θ)e1 + (sin θ)e2 ē2 = −(sin θ)e1 + (cos θ)e2 so that ēi = P (ei ) = ej pj i and the transition matrix (pj i ) is given by cos θ − sin θ j (p i ) = . sin θ cos θ (30) You can easily compute the matrix P −1 , but it is better to make the general observation that rotating the coordinate system doesn’t change the length of r. So using krk2 = xi xi = x̄j x̄j together with xi = pi j x̄j this becomes xi xi = pi j x̄j pi k x̄k = (pT )k i pi j x̄j x̄k := x̄j x̄j so that we must have (pT )k i pi j = δjk . In matrix notation this is just P T P = I which implies that P T = P −1 . This is the definition of an orthogonal transformation (or orthogonal matrix). In other words, a matrix A ∈ Mn (F ) is said to be orthogonal if and only if AT = A−1 . As an important consequence of this definition, note that if A is orthogonal, then 1 = det I = det(AA−1 ) = det(AAT ) = (det A)(det AT ) = (det A)2 and hence det A = ±1 . 59 (31) Going back to our example rotation, we therefore have cos θ sin θ P −1 = P T = − sin θ cos θ so that x̄i = (p−1 )i j xj = (pT )i j xj or x̄1 = (cos θ)x1 + (sin θ)x2 x̄2 = −(sin θ)x1 + (cos θ)x2 To check these results, we first verify that P −1 = P T : cos θ sin θ cos θ − sin θ 1 0 PTP = = =I. − sin θ cos θ sin θ cos θ 0 1 Next, from the diagram we see that x1 = r cos(θ + φ) = r cos θ cos φ − r sin θ sin φ = (cos θ)x̄1 − (sin θ)x̄2 x2 = r sin(θ + φ) = r sin θ cos φ + r cos θ sin φ = (sin θ)x̄1 + (cos θ)x̄2 In matrix form this is or, alternatively, x1 x2 = cos θ sin θ x̄1 x̄2 = cos θ − sin θ − sin θ cos θ x̄1 x̄2 (32) sin θ cos θ x1 x2 (33) which is the same as we saw above using (pT )i j . To be completely precise, the rotation that we have just described is properly called a passive transformation because it left the vector alone and rotated the coordinate system. An alternative approach is to leave the coordinate system alone and rotate the vector itself. This is called an active transformation. One must be very careful when reading the literature to be aware of just which type of rotation is under consideration. Let’s compare the two types of rotation. With an active transformation we have the following situation: x2 r̄ r θ φ x1 60 Here the vector r is rotated by θ to give the vector r̄ where, of course, krk = kr̄k. In the passive case we defined the transition matrix P by ēi = P (ei ). Now, in the active case we define a linear transformation T by r̄ = T (r). From the diagram, the components of r̄ are given by x̄1 = r cos(θ + φ) = r cos θ cos φ − r sin θ sin φ = (cos θ)x1 − (sin θ)x2 x̄2 = r sin(θ + φ) = r sin θ cos φ + r cos θ sin φ = (sin θ)x1 + (cos θ)x2 or x̄1 x̄2 = Another way to write this is cos θ sin θ − sin θ cos θ x1 x2 . (34) (x̄1 , x̄2 ) = T (x1 , x2 ) = ((cos θ)x1 − (sin θ)x2 , (sin θ)x1 + (cos θ)x2 ). Then the first column of [T ] is T (e1 ) = T (1, 0) = (cos θ, sin θ) and the second column is T (e2 ) = T (0, 1) = (− sin θ, cos θ) so that [T ] = cos θ sin θ − sin θ cos θ . as in equation (34). Carefully compare the matrix in equation (34) with that in equation (33). The matrix in equation (34) is obtained from the matrix in equation (33) by letting θ → −θ. This is the effective difference between active and passive rotations. If a passive transformation rotates the coordinate system counterclockwise by an angle θ, then the corresponding active transformation rotates the vector by the same angle but in the clockwise direction. 
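This passive/active distinction is easy to see numerically. In the following short NumPy sketch (an illustration of mine, not part of the notes), P is the transition matrix of equation (30), so that components transform passively with the matrix of equation (33), namely P^T; the active rotation matrix of equation (34) is its transpose, i.e., the same matrix with θ → −θ.

```python
import numpy as np

theta = 0.4
c, s = np.cos(theta), np.sin(theta)

# Passive rotation of the basis: transition matrix P of equation (30); the components
# of a fixed vector then transform with P^T (equivalently P^{-1}), equation (33).
P = np.array([[c, -s],
              [s,  c]])
passive = P.T

# Active rotation of the vector itself: equation (34).
active = np.array([[c, -s],
                   [s,  c]])

# The two differ by theta -> -theta, i.e. they are transposes (inverses) of each other.
print(np.allclose(active, passive.T))              # True
print(np.allclose(active @ passive, np.eye(2)))    # True

# Both are orthogonal with determinant +1, as in equation (31).
print(np.allclose(P.T @ P, np.eye(2)), np.isclose(np.linalg.det(P), 1.0))   # True True
```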
Now let us see how the scalar and vector products behave under orthogonal transformations in R3 . Let a, b ∈ R3 , and consider a · b = kak kbk cos θ. Since the magnitude of a vector doesn’t change under a rotation, and the angle between two vectors also doesn’t change, it should be clear that a · b is also invariant under rotations. That’s why it’s called a scalar product. However, we can easily prove the invariance directly as follows. Under an orthogonal transformation a → ā we have āi = (pT )i j aj (and similarly for b) where the rotation matrix P is orthogonal (P −1 = P T ). Then l ā · b̄ = āi b̄i = (pT )i k (pT )i ak bl = pl i (pT )i k ak bl = δkl ak bl = ak bk = a · b 61 so that the scalar product is indeed invariant under orthogonal transformations. However, what can we say about the cross product? Since the definition of a × b depends on the right-hand rule, it is reasonable to suspect that there will be some kind of dependence on the particular orthogonal transformation (since we know that the determinant of an orthogonal transformation can be either +1 or −1). Proceeding as in the scalar product, we have (using Theorem 3) r s ā × b̄ = (ā × b̄)i ēi = εijk āj b̄k ēi = εijk (pT )j (pT )k ar bs pt i et = εijk pt i pr j ps k ar bs et = εtrs det(pi j )ar bs et = (a × b)t et det P = (a × b) det P . Now, a proper rotation is one that has det P = +1. These are the rotations that can be continuously connected to the identity transformation (which obviously has det I = +1). So for a proper rotation we have ā × b̄ = a × b. However, under an improper rotation (i.e., parity or inversion of coordinates) we have det P = −1 so that ā × b̄ = −a × b. This is why the cross product of two vectors is called a pseudo vector or an axial vector—its definition depends on the orientation of the coordinate system. We have seen that under a rotation, the components of a vector x transform as xi → x̄i = (p−1 )i j xj (see equation (29)), and this can be taken as the definition of a vector, i.e., a quantity whose components transform under proper rotations according to this rule. It is not hard to show directly that under proper rotations, the cross product transforms as a vector. The proof is essentially what we just did r above, noting that P −1 P = I (so that δli = (p−1 )l pr i ) and det P −1 = (det P )−1 = +1: n m m n (ā × b̄)i = εijk āj b̄k = εijk (p−1 )j (p−1 )k am bn = εljk δli (p−1 )j (p−1 )k am bn n m r = εljk (p−1 )l pr i (p−1 )j (p−1 )k am bn = εrmn (det P −1 )pr i am bn = (a × b)r pr i = (pT )i r (a × b)r = (p−1 )i r (a × b)r which is the definition of how vectors transform under proper rotations. Finally, what can we say about how the gradient transforms? (For simplicity, we restrict consideration to cartesian coordinates only.) From xk = pk i x̄i we have pk i = ∂xk ∂ x̄i and hence from the chain rule we have ∂ ∂xk ∂ ∂ = = pk i k . i ∂ x̄ ∂ x̄i ∂xk ∂x 62 Then (again using P −1 = P T ) X X X ∂ ∂ k ∂ e j pj i pk i k = ej pj i (p−1 )i ēi i = ∇= ∂ x̄ ∂x ∂xk i ijk = X ej δ jk jk ijk X ∂ ∂ ej j = k ∂x ∂x j = ∇. In other words, the gradient is invariant under rotations, just like an ordinary vector. 10 The Divergence Theorem Suppose we have a vector field A = A(x, y, z) defined throughout some region of R3 . The field A could represent the velocity field of a fluid flow, or the electric field of some charge distribution, or a lot of other things. Consider a small volume element dxdydz centered at a point (x, y, z). z A(x, y, z) y (x, y, z) dz dx x dy Let dS represent an outward directed surface element. 
For example, on the near face we would have dS = dydz x̂. You can think of dS as n̂ dS, i.e., dS is normal to the surface. Then the flux dφ of A thru the surface element dS is given by dφ = A · dS = A · n̂ dS . Example 19. As an example of a flux calculation, let A(x, y, z) = zẑ, and let us find the flux of A across the sphere described in Example 18. We saw that xθ × xϕ = a2 sin2 θ cos ϕ x̂ + a2 sin2 θ sin ϕ ŷ + a2 sin θ cos θ ẑ and n̂ = xθ × xϕ = sin θ cos ϕ x̂ + sin θ sin ϕ ŷ + cos θ ẑ . kxθ × xϕ k 63 Now A = zẑ = a cos θẑ so that A · n̂ = a cos2 θ and hence the total flux of A across the surface of the sphere is given by Z Z φ = A · n̂ dS = A · n̂ kxθ × xϕ k dθdϕ = Z (a cos2 θ)(a2 sin θ) dθdϕ = 2πa3 π cos2 θ sin θ dθ 0 = 2πa 3 Z 1 −1 = Z cos2 θ d cos θ = 2πa3 · 2 3 4 3 πa . 3 You should be able to see the reason for this answer. Now, we were given A at the center (x, y, z) of the volume element. To evaluate dφ on the surface, we need A on the surface. On the right-hand face of the volume element we have the field A(x, y + dy/2, z) = A(x, y, z) + ∂A dy ∂y 2 so the flux thru the right-hand side is ∂Ay dy dxdz . dφR = A · n̂ dS = A · ŷ dxdz = Ay + ∂y 2 On the left-hand side we have A(x, y − dy/2, z) = A(x, y, z) − and dφL = A · n̂ dS = −A · ŷ dxdz = ∂A dy ∂y 2 ∂Ay dy dxdz − Ay + ∂y 2 where in this case the outward unit normal is n̂ = −ŷ. Therefore, the net outward flux thru these two faces is dφL + dφR = ∂Ay dxdydz . ∂y We can clearly do the same for the other two pairs of faces, so adding up the total for the entire volume element we obtain ∂Ax ∂Ay ∂Az dφtotal = dxdydz + + ∂x ∂y ∂z := ∇ · A d3 x := (div A)d3 x 64 where this expression defines the divergence of the vector field A. So what div A represents is the outward flux of A per unit volume. If we have a large volume, then we break it up into infinitesimal volume elements and note that the contribution to the common interior faces will cancel. Then the total flux out of the bounding surface is given by Z φtotal = d3 x ∇ · A . On the other hand, we see from dφ = A · n̂ dS that the total flux is also given by Z φtotal = A · n̂ dS . Equating these two expressions yields the Divergence Theorem: Theorem 19. Let V be a volume in R3 with outward oriented bounding surface S. If A is a vector field defined over V with continuous first partial derivatives on an open set containing V , then Z Z 3 d x∇ · A = A · n̂ dS . V S While we haven’t given a careful definition of orientation, for our purposes we take it as an intuitively clear concept. 11 Stokes’ Theorem Stokes’ Theorem is probably the most important of all vector integral theorems. In fact, in its most general form (which is formulated in terms of differential forms on a manifold, and is beyond the scope of these notes) it includes the classical Green’s theorem and the divergence theorem as special cases. We begin with a discussion of line integrals in R3 . Suppose we have a vector field A(x, y, z) = Ax (x, y, z)x̂ + Ay (x, y, z)ŷ + Az (x, y, z)ẑ and a line element dr = x̂dx + ŷdy + ẑdz. Then the line integral of A along a curve C is defined by Z Z Ax (x, y, z)x̂ + Ay (x, y, z)ŷ + Az (x, y, z)ẑ . A · dr = C C 65 Example 20. Consider the two-dimensional vector field A(x, y) = x̂ cos x + ŷ sin x and let C be the curve C : r(t) = tx̂ + t2 ŷ for −1 ≤ t ≤ 2. 
Then

∫_C A · dr = ∫_C A · r′(t) dt = ∫_{−1}^{2} (x̂ cos x + ŷ sin x) · (x̂ + 2t ŷ) dt
           = ∫_{−1}^{2} (cos t + 2t sin t) dt
           = [ 3 sin t − 2t cos t ]_{−1}^{2}
           = 3(sin 2 + sin 1) − 2(2 cos 2 + cos 1) ≈ 5.84

where we integrated by parts to write ∫ t sin t dt = −t cos t + ∫ cos t dt.

Let us first consider the integral ∮ A · dr around an infinitesimal path in the xy-plane. (The circle on ∮ simply means that the integral is taken around a closed path.)

[Figure: an infinitesimal rectangle with sides dx and dy lying in the xy-plane, centered at the point where A(x, y, z) is evaluated and traversed counterclockwise as seen from the positive z-axis.]

Note that the path is "positively oriented," meaning that if your right hand has its fingers in the direction of the path, then your thumb points along the positive z-axis. For this path we have

∮ A · dr = ∮ Ax dx + ∮ Ay dy .

The first integral on the right-hand side has contributions at y − dy/2 and at y + dy/2:

∮ Ax dx = Ax(x, y − dy/2, z) dx − Ax(x, y + dy/2, z) dx
        = [ Ax(x, y, z) − (∂Ax/∂y)(dy/2) ] dx − [ Ax(x, y, z) + (∂Ax/∂y)(dy/2) ] dx
        = − (∂Ax/∂y) dy dx

where the (−) sign between the two terms in the first line is because along the top path we have dr = −dx x̂. Similarly,

∮ Ay dy = −Ay(x − dx/2, y, z) dy + Ay(x + dx/2, y, z) dy = (∂Ay/∂x) dx dy

so that

∮ A · dr = ( ∂Ay/∂x − ∂Ax/∂y ) dx dy .

We now define the "vector" ∇ × A by

∇ × A = ( ∂Az/∂y − ∂Ay/∂z ) x̂ + ( ∂Ax/∂z − ∂Az/∂x ) ŷ + ( ∂Ay/∂x − ∂Ax/∂y ) ẑ
      = \begin{vmatrix} x̂ & ŷ & ẑ \\ ∂/∂x & ∂/∂y & ∂/∂z \\ Ax & Ay & Az \end{vmatrix}

or

(∇ × A)_i = ε_ijk ∂Ak/∂xj .

The quantity ∇ × A is called the curl of A, and is frequently written curl A. If ∇ × A = 0, then we say that A is irrotational, and because of this, curl A is sometimes called rot A. Returning to our integral above, we have dx dy = ||dS|| where dS = ẑ dS, and hence we can write our result in the form

∮ A · dr = (∇ × A) · dS .

This result was derived for an infinitesimal element of area in the xy-plane. What about other orientations of the area element? Well, we have shown that the gradient transforms like a vector under rotations, as does the cross product of two vectors. This means that the curl behaves like a vector. But we also know that the scalar product of two vectors is a scalar, and thus

∮ A · dr = (∇ × A) · dS

is true independently of the orientation of the infinitesimal area element dS. To handle a large area S, we do the obvious and break the area up into very small rectangles, each traversed in the positive direction:

[Figure: a surface S tiled by small positively oriented rectangles, with outer boundary ∂S.]

For the ith area element we have

∮ A · dri = (∇ × A) · dSi .

If we sum these over all elements i, then in the term Σ_i ∮ A · dri from the left-hand side, the dri's in adjacent paths will cancel, so that we are left with Σ_i ∮ A · dri = ∮_{∂S} A · dr where the integral is taken over the external boundary ∂S. And summing the terms on the right-hand side we clearly have Σ_i (∇ × A) · dSi = ∫_S (∇ × A) · dS. In other words, we have (essentially) proved Stokes' Theorem:

Theorem 20. Let S be a piecewise smooth oriented surface bounded by a simple, closed piecewise smooth curve ∂S that also has positive orientation. If the vector field A(x, y, z) has continuous first partial derivatives on an open set containing S, then

∫_S (∇ × A) · dS = ∮_{∂S} A · dr .

12 Curvilinear Coordinates

We now want to take a look at how to formulate the gradient, divergence, curl and laplacian in a general curvilinear coordinate system in R3. Throughout this section, we refrain from using the summation convention, so that repeated indices do not imply a summation. To begin with, we assume that we have three families of mutually orthogonal surfaces which we label by the variables q1, q2, q3. These are called the curvilinear coordinates of a point in R3.
12 Curvilinear Coordinates

We now want to take a look at how to formulate the gradient, divergence, curl and laplacian in a general curvilinear coordinate system in R³. Throughout this section, we refrain from using the summation convention, so that repeated indices do not imply a summation.

To begin with, we assume that we have three families of mutually orthogonal surfaces which we label by the variables q1, q2, q3. These are called the curvilinear coordinates of a point in R³. (By mutually orthogonal surfaces, we mean that at each point in the region under consideration, the three coordinate curves intersect in a mutually orthogonal manner.)

Let there be an infinitesimal region under consideration, and let dli be an element of length perpendicular to the surface qi = const. The length dli is really the distance between the surfaces at qi and qi + dqi in the infinitesimal region under consideration, and in general can be written in the form

dli = hi dqi

where hi is in general a function of the coordinates q1, q2, q3. We also have the three unit vectors ûi defined by (see Section 8)

ûi = (∂x/∂qi)/‖∂x/∂qi‖ .

Each ûi is orthogonal to the surface of constant qi. (For example, think of x̂ as being orthogonal to a surface of constant x.) However, be sure to realize that their orientation in space depends on their location, except in the particular case of rectangular coordinates. Thus a general displacement dx can be written

dx = dl1 û1 + dl2 û2 + dl3 û3 = h1 dq1 û1 + h2 dq2 û2 + h3 dq3 û3

and the volume element is dl1 dl2 dl3, or

d³x = h1 h2 h3 dq1 dq2 dq3 .

If we write dli = hi dqi ûi, then recalling the scalar triple product defined in Section 2, we can write

d³x = Vol(dl1, dl2, dl3) = |(h1 dq1 û1 × h2 dq2 û2) · (h3 dq3 û3)|

    = | h1 dq1    0         0      |
      |   0     h2 dq2      0      |
      |   0       0      h3 dq3    |

    = h1 h2 h3 dq1 dq2 dq3 .

Now suppose we have a scalar function f = f(q1, q2, q3). Then its differential is given by

df = (∂f/∂q1) dq1 + (∂f/∂q2) dq2 + (∂f/∂q3) dq3 .

But dqi = (1/hi) dli so we have

df = (1/h1)(∂f/∂q1) dl1 + (1/h2)(∂f/∂q2) dl2 + (1/h3)(∂f/∂q3) dl3

and in terms of dx = dl1 û1 + dl2 û2 + dl3 û3 this may be written as df = ∇f · dx, where ∇f is defined by

∇f = (1/h1)(∂f/∂q1) û1 + (1/h2)(∂f/∂q2) û2 + (1/h3)(∂f/∂q3) û3          (35)

or

∇f = (∂f/∂l1) û1 + (∂f/∂l2) û2 + (∂f/∂l3) û3 .

For example, recall from the beginning of Section 8 that in spherical coordinates we had dx = dr r̂ + r dθ θ̂ + r sin θ dϕ ϕ̂. Then a straightforward application of equation (35) again yields

∇ = r̂ (∂/∂r) + θ̂ (1/r)(∂/∂θ) + ϕ̂ (1/(r sin θ))(∂/∂ϕ) .

Now for the divergence of a vector field A. We do exactly what we did earlier and compute the flux dφ = A · dS = A · n̂ dS thru the faces of a small volume element centered at the point (q1, q2, q3), with h-values h1, h2, h3 at the center.

[Figure: a small curvilinear volume element centered at (q1, q2, q3), with edge lengths dl1, dl2, dl3 along the q1, q2, q3 directions; the field A(q1, q2, q3) is given at the center.]

On the right-hand side of the element we have n̂ = û2, so (recall dφ = A · n̂ dS, where dS = dli dlj = hi hj dqi dqj is evaluated on the surface of the element)

dφR = A(q1, q2 + dq2/2, q3) · û2 dl1 dl3

    = [A2 + (∂A2/∂q2)(dq2/2)] [h1 + (∂h1/∂q2)(dq2/2)] [h3 + (∂h3/∂q2)(dq2/2)] dq1 dq3

    = A2 h1 h3 dq1 dq3 + (∂(A2 h1 h3)/∂q2)(dq2/2) dq1 dq3

where we have kept terms thru 3rd order in the dqi. On the left-hand side we have n̂ = −û2, so

dφL = −A(q1, q2 − dq2/2, q3) · û2 dl1 dl3

    = −[A2 − (∂A2/∂q2)(dq2/2)] [h1 − (∂h1/∂q2)(dq2/2)] [h3 − (∂h3/∂q2)(dq2/2)] dq1 dq3

    = −A2 h1 h3 dq1 dq3 + (∂(A2 h1 h3)/∂q2)(dq2/2) dq1 dq3 .

The net outward flux thru these faces is then given by

dφ2 = dφR + dφL = (∂(A2 h1 h3)/∂q2) dq1 dq2 dq3 .

We repeat this for the other two pairs of faces (you can just cyclically permute the indices 1 → 2 → 3 → 1) to get the total net outward flux. By definition, the net outward flux per unit volume is the divergence, so using d³x = h1 h2 h3 dq1 dq2 dq3 we obtain

∇ · A = (1/(h1 h2 h3)) [∂(A1 h2 h3)/∂q1 + ∂(A2 h3 h1)/∂q2 + ∂(A3 h1 h2)/∂q3] .          (36)
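As a check on equation (36) (my own addition, not part of the original notes), take spherical coordinates (q1, q2, q3) = (r, θ, ϕ) with h1 = 1, h2 = r, h3 = r sin θ, and the field A = r r̂, i.e., A1 = r, A2 = A3 = 0. In Cartesian coordinates this is A = x x̂ + y ŷ + z ẑ, whose divergence is 3, and equation (36) reproduces this; a minimal sympy sketch:

# Hypothetical check (not in the original notes): apply equation (36) in spherical
# coordinates with h1 = 1, h2 = r, h3 = r sin(theta) to the field A = r rhat,
# and compare with the Cartesian result div(x xhat + y yhat + z zhat) = 3.
import sympy as sp

r, theta, phi = sp.symbols('r theta phi', positive=True)
h1, h2, h3 = 1, r, r*sp.sin(theta)
A1, A2, A3 = r, 0, 0                      # A = r rhat

divA = (sp.diff(A1*h2*h3, r) + sp.diff(A2*h3*h1, theta)
        + sp.diff(A3*h1*h2, phi)) / (h1*h2*h3)
print(sp.simplify(divA))                  # prints 3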
Next we look at curl A. We have seen that ∮ A · dr = (∇ × A) · dS is true independently of the orientation of dS. Because of this, we can define the curl by

(∇ × A)n = lim_{S→0} (1/S) ∮ A · dr

where (∇ × A)n = (∇ × A) · n̂ is the component of the curl normal to dS = n̂ dS. Let n̂ = û1 and consider this path of integration:

[Figure: a small closed rectangular path in the q2–q3 coordinate surface, made up of segments (a), (b), (c) and (d), with û1 normal to the enclosed area.]

For this path we have dS = dl2 dl3 = h2 h3 dq2 dq3. The segments (a)–(d) have the following contributions to the integral ∮ A · dr = ∮ A · dl:

(a) On path (a) we have dl = −dl3 û3 = −h3 dq3 û3 and hence

A · dl = −A3 dl3 = −A3 h3 dq3 .

(b) On path (b) we have dl = dl3 û3 = h3 dq3 û3 and we must evaluate both A and h3 at q2 + dq2. Then

A · dl = [A3 + (∂A3/∂q2) dq2] [h3 + (∂h3/∂q2) dq2] dq3 = A3 h3 dq3 + (∂(A3 h3)/∂q2) dq2 dq3

where we only keep terms thru second order in the dqi.

(c) Path (c) is again easy because dl = dl2 û2 = h2 dq2 û2, so

A · dl = A2 dl2 = A2 h2 dq2 .

(d) Here we have dl = −dl2 û2 = −h2 dq2 û2, where A2 and h2 must be evaluated at q3 + dq3. Then as for path (b) we have

A · dl = −[A2 + (∂A2/∂q3) dq3] [h2 + (∂h2/∂q3) dq3] dq2 = −A2 h2 dq2 − (∂(A2 h2)/∂q3) dq2 dq3 .

Adding up these four contributions and using the above definition of curl we obtain

(∇ × A)1 = (1/(h2 h3 dq2 dq3)) [(∂(A3 h3)/∂q2) dq2 dq3 − (∂(A2 h2)/∂q3) dq2 dq3]

         = (1/(h2 h3)) [∂(A3 h3)/∂q2 − ∂(A2 h2)/∂q3] .

The corresponding results for the other two coordinate surfaces come by letting 1 → 2 → 3 → 1 again, and we can write these results simply in the form of a determinant:

∇ × A = (1/(h1 h2 h3)) | h1 û1    h2 û2    h3 û3  |
                        | ∂/∂q1    ∂/∂q2    ∂/∂q3 |          (37)
                        | h1 A1    h2 A2    h3 A3 |

This is the general expression for curl A in curvilinear coordinates.

Finally, the laplacian is defined by ∇²f = ∇ · ∇f, so we can use our previous results, equations (35) and (36), to write

∇²f = (1/(h1 h2 h3)) [∂/∂q1 ((h2 h3/h1)(∂f/∂q1)) + ∂/∂q2 ((h3 h1/h2)(∂f/∂q2)) + ∂/∂q3 ((h1 h2/h3)(∂f/∂q3))] .          (38)

Note that the three terms are cyclic permutations of each other.
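As a final check (again my own addition, not part of the original notes), equation (38) in spherical coordinates applied to the test function f = z² = r² cos²θ should reproduce the Cartesian result ∇²(z²) = 2; a short sympy sketch:

# Hypothetical check (not in the original notes): apply equation (38) in spherical
# coordinates to f = z^2 = r^2 cos^2(theta) and compare with laplacian(z^2) = 2.
import sympy as sp

r, theta, phi = sp.symbols('r theta phi', positive=True)
h1, h2, h3 = 1, r, r*sp.sin(theta)
f = (r*sp.cos(theta))**2                  # f = z^2 written in spherical coordinates

lap = (sp.diff((h2*h3/h1)*sp.diff(f, r), r)
       + sp.diff((h3*h1/h2)*sp.diff(f, theta), theta)
       + sp.diff((h1*h2/h3)*sp.diff(f, phi), phi)) / (h1*h2*h3)

print(sp.simplify(lap))                   # prints 2, matching the Cartesian laplacian of z^2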