J. Broida, UCSD, Fall 2009, Phys 130B (QM II)

Supplementary Notes on Special Relativity and Maxwell's Equations

1 The Lorentz Transformation

This is a derivation of the Lorentz transformation of special relativity. The basic idea is to derive a relationship between two sets of spacetime coordinates:

- the coordinates $x, y, z, t$ seen by an observer $O$;
- the coordinates $x', y', z', t'$ seen by an observer $O'$, who moves at velocity $V$ with respect to $O$ along the positive $x$-axis.

[Figure: frame $O$ with axes $x, y$ and frame $O'$ with axes $x', y'$, with $O'$ moving at velocity $V$ along the common $x$-axis.]

These observers are assumed to be inertial. In other words, they move at a constant velocity with respect to each other, in the absence of any external forces or accelerations (which is somewhat redundant). In particular, there is no rotational motion and no gravitational field present.

Our derivation is based on two assumptions:

1. The Principle of Relativity: physics is the same for all observers in all inertial coordinate systems.
2. The speed of light $c$ in a vacuum is the same for all observers. This holds regardless of their relative motion or the motion of the light source.

We first show that this transformation must be of the form

$$t' = at + bx \tag{1.1a}$$
$$x' = dt + ex \tag{1.1b}$$
$$y' = y, \qquad z' = z$$

where we assume that the origins coincide at $t = t' = 0$. The figure above shows the coordinate systems displaced simply for ease of visualization.

The first thing to note is that the $y$ and $z$ coordinates are the same for both observers. This is true here only because the relative motion is along the $x$-axis alone. If the motion were in an arbitrary direction, then each spatial coordinate of $O'$ would depend on all of the spatial coordinates of $O$. However, motion along a common axis is the case used in almost all situations, at least at an elementary level.

To see that this is necessary, suppose there is a yardstick at the origin of each coordinate system, aligned along the $y$- and $y'$-axes respectively. Suppose also that there is a paintbrush at the end of each yardstick, pointed towards the other.
Suppose $O'$'s yardstick along the $y'$-axis gets shorter as seen by $O$. Then when the origins pass each other, $O$'s yardstick will get paint on it. But by the Principle of Relativity, $O'$ should also see $O$'s yardstick get shorter, and hence $O'$ would get paint on his yardstick. Both observers must agree on which yardstick actually ends up with paint on it, so this is a contradiction. Therefore there can be no change in lengths perpendicular to the direction of motion.

The next thing to notice is that the transformation equations are linear. This is a result of space being homogeneous; to put it very loosely, "things here are the same as things there." For example, suppose a yardstick lies along the $x$-axis between $x = 1$ and $x = 2$. Its length as seen by $O'$ should be the same as that of another yardstick lying between $x = 2$ and $x = 3$. But suppose there were a nonlinear dependence, say $x'$ goes like $x^2$. Then the first yardstick would have a length that goes like $2^2 - 1^2 = 3$, while the second would have a length that goes like $3^2 - 2^2 = 5$. Since this is also not the way the world works, equations (1.1) must be linear as shown.

We now want to figure out what the coefficients $a$, $b$, $d$ and $e$ must be.

First, let $O$ look at the origin of $O'$ (i.e., $x' = 0$). Since $O'$ is moving at speed $V$ along the $x$-axis, the $x$ coordinate corresponding to $x' = 0$ is $x = Vt$. Using this in (1.1b) yields $x' = 0 = dt + eVt$, or $d = -eV$.

Similarly, let $O'$ look at $O$ (i.e., $x = 0$). With respect to $O'$, the point $O$ has the coordinate $x' = -Vt'$, since $O$ moves in the negative $x'$ direction as seen by $O'$. Then:

- from (1.1b) we have $-Vt' = dt$;
- from (1.1a) we have $t' = at$.

Hence $t' = -dt/V = at$, so that $-aV = d = -eV$. Thus $a = e$ and $d = -aV$. Using these results in equations (1.1) now gives us

$$t' = a\left(t + \frac{b}{a}x\right) \tag{1.2a}$$
$$x' = a(x - Vt) \tag{1.2b}$$
$$y' = y, \qquad z' = z.$$

Now let a photon move along the $x$-axis (and hence also along the $x'$-axis), passing both origins when they coincide at $t = t' = 0$.
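The homogeneity argument above can be checked with a few lines of Python. This is only an illustrative sketch. The helper `image_length` and the two maps are hypothetical stand-ins: one for "some linear dependence" (scale factor 2, chosen arbitrarily) and one for the nonlinear dependence $x' \sim x^2$ considered in the text.

```python
def image_length(f, x_start, x_end):
    """Length assigned to the interval [x_start, x_end] after mapping x -> f(x)."""
    return f(x_end) - f(x_start)


def linear_map(x):
    # An arbitrary linear dependence x' = 2x (the factor 2 is just an example).
    return 2.0 * x


def quadratic_map(x):
    # The nonlinear dependence x' ~ x^2 considered in the text.
    return x ** 2


# Two yardsticks of equal length 1, lying at different places on the x-axis.
sticks = [(1, 2), (2, 3)]

for name, f in [("linear", linear_map), ("quadratic", quadratic_map)]:
    lengths = [image_length(f, a, b) for a, b in sticks]
    print(name, lengths)
```

This prints `linear [2.0, 2.0]` and `quadratic [3, 5]`. Only the linear map assigns equal lengths to the two equal yardsticks.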
Then the $x$ coordinate of the photon as seen by $O$ is $x = ct$, and the $x'$ coordinate as seen by $O'$ is $x' = ct'$. Note that the value of $c$ is the same for both observers; this is assumption (2). Using these in equations (1.2) yields

$$t' = a\left(t + \frac{b}{a}ct\right) = at\left(1 + \frac{bc}{a}\right)$$
$$x' = a(ct - Vt) = cat\left(1 - \frac{V}{c}\right)$$

so that

$$cat\left(1 + \frac{bc}{a}\right) = ct' = x' = cat\left(1 - \frac{V}{c}\right)$$

and therefore $-V/c = bc/a$, or

$$\frac{b}{a} = -\frac{V}{c^2}.$$

So now equations (1.2) become

$$t' = a\left(t - \frac{V}{c^2}x\right) \tag{1.3a}$$
$$x' = a(x - Vt) \tag{1.3b}$$
$$y' = y, \qquad z' = z.$$

We still need to determine $a$. To do this, we will again use the Principle of Relativity.

First, let $O$ look at a clock situated at $O'$. Then $\Delta x = V\Delta t$, and from (1.3a), $O$ and $O'$ will measure time intervals related by

$$\Delta t' = a\left(\Delta t - \frac{V}{c^2}\Delta x\right) = a\left(1 - \frac{V^2}{c^2}\right)\Delta t.$$

Now let $O'$ look at a clock at $O$ (so $\Delta x = 0$). Then $\Delta x' = -V\Delta t'$, so (1.3b) yields $-V\Delta t' = \Delta x' = a(0 - V\Delta t) = -aV\Delta t$, and hence

$$\Delta t = \frac{1}{a}\Delta t'.$$

By the Principle of Relativity, the relative factors in the time measurements must be the same in both cases:

- $O$ sees $O'$'s time related to his own by the factor $a(1 - V^2/c^2)$;
- $O'$ sees $O$'s time related to his own by the factor $1/a$.

This means that

$$\frac{1}{a} = a\left(1 - \frac{V^2}{c^2}\right)$$

or, taking the positive root (so that the transformation reduces to the identity when $V = 0$),

$$a = \frac{1}{\sqrt{1 - V^2/c^2}}.$$

Therefore the final Lorentz transformation equations are

$$t' = \frac{t - (V/c^2)\,x}{\sqrt{1 - V^2/c^2}}, \qquad x' = \frac{x - Vt}{\sqrt{1 - V^2/c^2}}, \qquad y' = y, \qquad z' = z. \tag{1.4}$$

It is very common to define the dimensionless variables

$$\beta = \frac{V}{c} \qquad \text{and} \qquad \gamma = \frac{1}{\sqrt{1 - V^2/c^2}} = \frac{1}{\sqrt{1 - \beta^2}}. \tag{1.5}$$

In terms of these variables, equations (1.4) become

$$t' = \gamma\left(t - \frac{\beta}{c}x\right), \qquad x' = \gamma(x - \beta ct), \qquad y' = y, \qquad z' = z. \tag{1.6}$$

Since $c$ is a universal constant, it is essentially a conversion factor between units of time and units of length. Because of this, we may further change to units where $c = 1$, so that time is measured in units of length. In this case the Lorentz transformation equations become

$$t' = \gamma(t - \beta x), \qquad x' = \gamma(x - \beta t), \qquad y' = y, \qquad z' = z. \tag{1.7a}$$

These equations give the coordinates as seen by $O'$ in terms of those of $O$.
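As a sketch, equations (1.7a) translate directly into Python. This assumes $c = 1$, and `gamma` and `boost_x` are hypothetical helper names. The two checks mirror the derivation:

- a photon with $x = t$ still satisfies $x' = t'$;
- a clock riding at the origin of $O'$ ($x = \beta t$) reads $t' = t/\gamma$, i.e. $\Delta t' = a(1 - V^2/c^2)\Delta t$ with $a = \gamma$.

```python
import math


def gamma(beta):
    """Lorentz factor 1/sqrt(1 - beta^2), equation (1.5); requires |beta| < 1."""
    if not -1.0 < beta < 1.0:
        raise ValueError("need |beta| < 1")
    return 1.0 / math.sqrt(1.0 - beta * beta)


def boost_x(t, x, y, z, beta):
    """Coordinates (t', x', y', z') seen by O' from those seen by O, eq. (1.7a), c = 1."""
    g = gamma(beta)
    return g * (t - beta * x), g * (x - beta * t), y, z


beta = 0.6

# A photon leaving the common origin along +x (x = t) also has x' = t' for O'.
t_p, x_p, _, _ = boost_x(5.0, 5.0, 0.0, 0.0, beta)
print("photon:", t_p, x_p)

# A clock at rest at the origin of O' (x = beta * t) shows t' = t / gamma.
t = 10.0
t_p, x_p, _, _ = boost_x(t, beta * t, 0.0, 0.0, beta)
print("clock:", t_p, t / gamma(beta), x_p)
```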
If we want the coordinates as seen by $O$ in terms of those of $O'$, then we let $\beta \to -\beta$, and we have

$$t = \gamma(t' + \beta x'), \qquad x = \gamma(x' + \beta t'), \qquad y = y', \qquad z = z'. \tag{1.7b}$$

Note that $0 \le \beta < 1$, so that $1 \le \gamma < \infty$. We also see that

$$\gamma^2 = \frac{1}{1 - \beta^2}$$

so that $\gamma^2 - \gamma^2\beta^2 = 1$. Now recall the hyperbolic identities $\cosh^2\theta - \sinh^2\theta = 1$ and $1 - \tanh^2\theta = \operatorname{sech}^2\theta$. These let us define a parameter $\theta$ (sometimes called the rapidity) by $\beta = \tanh\theta$, so that $\gamma = \cosh\theta$ and $\gamma\beta = \sinh\theta$. In terms of $\theta$, equations (1.7a) become

$$\begin{aligned} t' &= (\cosh\theta)\,t - (\sinh\theta)\,x \\ x' &= -(\sinh\theta)\,t + (\cosh\theta)\,x \\ y' &= y, \qquad z' = z. \end{aligned} \tag{1.8}$$

This looks very similar to a rotation in the $xt$-plane, except that we now have hyperbolic functions instead of the usual trigonometric ones. However, note that both $\sinh$ terms have the same sign, whereas in an ordinary rotation the two $\sin$ terms have opposite signs.

Next, let us consider motion as seen by both observers. In this case we write displacements in both space and time as

$$dt' = \gamma(dt - \beta\,dx), \qquad dx' = \gamma(dx - \beta\,dt), \qquad dy' = dy, \qquad dz' = dz.$$

Then the velocity $v'_x$ of a particle along the $x'$-axis as seen by $O'$ is

$$v'_x = \frac{dx'}{dt'} = \frac{dx - \beta\,dt}{dt - \beta\,dx} = \frac{dx/dt - \beta}{1 - \beta(dx/dt)} = \frac{v_x - \beta}{1 - \beta v_x}. \tag{1.9a}$$

Alternatively, we may write

$$v_x = \frac{dx}{dt} = \frac{dx' + \beta\,dt'}{dt' + \beta\,dx'} = \frac{dx'/dt' + \beta}{1 + \beta(dx'/dt')} = \frac{v'_x + \beta}{1 + \beta v'_x}. \tag{1.9b}$$

These last two equations are called the relativistic velocity addition law.

For motion purely along the $x$-axis we trivially have $v'_y = v_y = 0$ and $v'_z = v_z = 0$. In general, however, the transverse components do not simply agree. Since $dy' = dy$ but $dt' \ne dt$, we have $v'_y = v_y/[\gamma(1 - \beta v_x)]$, and similarly for $v'_z$.

Be sure to understand what these equations mean. They relate the velocity of an object as seen by two different observers whose relative velocity along their common $x$-axis is $\beta$. Note that even if $v'_x = 1$ (corresponding to $v'_x = c$), the velocity as seen by $O$ is still just $v_x = (1 + \beta)/(1 + \beta) = 1$. This is quite different from the classical Galilean addition of velocities, which would give $v_x = 1 + \beta$. Relativistically, combining speeds no greater than that of light ($c = 1$) never produces a speed greater than light.

One of the most important aspects of Lorentz transformations is that they leave the quantity $t^2 - x^2 - y^2 - z^2$ invariant.
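The velocity addition law (1.9b) and the rapidity parametrization can be cross-checked numerically. This sketch again takes $c = 1$, and `add_velocities` is a hypothetical helper name. It shows three things:

- composing two collinear boosts adds their rapidities, so $\tanh(\theta' + \theta)$ reproduces (1.9b);
- $\gamma = \cosh\theta$ and $\gamma\beta = \sinh\theta$;
- $v'_x = 1$ is mapped to $v_x = 1$.

```python
import math


def add_velocities(v_prime, beta):
    """v_x seen by O for an object moving at v_prime along x' in O', eq. (1.9b), c = 1."""
    return (v_prime + beta) / (1.0 + beta * v_prime)


v_prime, beta = 0.8, 0.6

v = add_velocities(v_prime, beta)
print("relativistic:", v, "Galilean:", v_prime + beta)   # about 0.946 versus 1.4

# With beta = tanh(theta), rapidities simply add: tanh(theta' + theta) gives the same v.
print("via rapidity:", math.tanh(math.atanh(v_prime) + math.atanh(beta)))

# gamma = cosh(theta) and gamma * beta = sinh(theta).
theta = math.atanh(beta)
print(math.cosh(theta), 1.0 / math.sqrt(1.0 - beta ** 2), math.sinh(theta))

# Light stays light: v' = 1 gives v = 1 for any beta.
print(add_velocities(1.0, beta))
```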
In other words, using equations (1.7a) you can easily show that

$$t'^2 - x'^2 - y'^2 - z'^2 = t^2 - x^2 - y^2 - z^2. \tag{1.10}$$

Setting this equal to zero gives the equation of an outgoing sphere of light as seen by either observer. (Don't forget that if $c \ne 1$, then $t$ becomes $ct$.) We refer to this as the invariance of the interval, because it can be written as

$$(\Delta t')^2 - (\Delta x')^2 - (\Delta y')^2 - (\Delta z')^2 = (\Delta t)^2 - (\Delta x)^2 - (\Delta y)^2 - (\Delta z)^2.$$

If the primed frame is the rest frame of a particle, then we have $dx' = dy' = dz' = 0$. In that case $dt'$ measures the time interval as seen by the particle, called the proper time. Because of this, we sometimes write

$$d\tau^2 := dt^2 - dx^2 - dy^2 - dz^2$$

which by (1.10) equals $dt'^2$. This is frequently also called the proper distance (or proper length) and written $ds^2$.

The difference between $d\tau$ and $ds$ shows up when $c \ne 1$. For motion along $x$ with velocity $v$ we then have

$$ds^2 = c^2dt^2 - dx^2 = c^2dt^2(1 - v^2/c^2) = c^2dt^2/\gamma^2 := c^2d\tau^2$$

so that $d\tau^2 = ds^2/c^2$. Since we are working with $c = 1$, we will write the proper distance as

$$ds^2 = dt^2 - dx^2 - dy^2 - dz^2.$$

Now let's go to some modern notation. In units with $c = 1$, we first define our four spacetime components as the vector

$$x^\mu = \begin{pmatrix} x^0 \\ x^1 \\ x^2 \\ x^3 \end{pmatrix} = \begin{pmatrix} t \\ x \\ y \\ z \end{pmatrix}.$$

(If $c \ne 1$ then $x^0 = ct$.) This vector is an element of a 4-dimensional vector space called Minkowski space. Then we have

$$ds^2 = (dx^0)^2 - (dx^1)^2 - (dx^2)^2 - (dx^3)^2.$$

Defining the Lorentz (or Minkowski) metric

$$g_{\mu\nu} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}, \tag{1.11}$$

we write (using the summation convention)

$$ds^2 = g_{\mu\nu}\,dx^\mu dx^\nu. \tag{1.12}$$

We can also write this metric simply as $g_{\mu\nu} = \operatorname{diag}(1, -1, -1, -1)$. Most particle physicists use it, but most relativists use the metric $g_{\mu\nu} = \operatorname{diag}(-1, 1, 1, 1)$, so you need to be careful when reading equations. Many physicists also use the symbol $\eta_{\mu\nu}$ rather than $g_{\mu\nu}$ when dealing with the Lorentz metric (as opposed to the more general metrics used in general relativity).

Vectors $x^\mu$ in Minkowski space are classified by the sign of $x^\mu x_\mu$:

- timelike if $x^\mu x_\mu > 0$;
- spacelike if $x^\mu x_\mu < 0$;
- null (or lightlike) if $x^\mu x_\mu = 0$.
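Equations (1.10) to (1.12) can be exercised numerically. The sketch below uses plain Python with $c = 1$, and the helper names are hypothetical. It does three things:

- evaluates $g_{\mu\nu}a^\mu b^\nu$ with the particle-physics metric;
- checks that a boost along $x$ leaves $x^\mu x_\mu$ unchanged;
- applies the timelike/spacelike/null classification.

The small tolerance in `classify` is an assumption, used to absorb floating-point rounding.

```python
import math

# Minkowski metric g_{mu nu} = diag(1, -1, -1, -1) of (1.11), units with c = 1.
METRIC = [[1.0, 0.0, 0.0, 0.0],
          [0.0, -1.0, 0.0, 0.0],
          [0.0, 0.0, -1.0, 0.0],
          [0.0, 0.0, 0.0, -1.0]]


def interval(a, b=None):
    """Scalar product g_{mu nu} a^mu b^nu; with one argument, the squared length a^mu a_mu."""
    b = a if b is None else b
    return sum(METRIC[m][n] * a[m] * b[n] for m in range(4) for n in range(4))


def classify(x, tol=1e-12):
    """Timelike / spacelike / null classification by the sign of x^mu x_mu."""
    s = interval(x)
    if s > tol:
        return "timelike"
    if s < -tol:
        return "spacelike"
    return "null"


def boost_x(x, beta):
    """Apply the boost (1.7a) to the 4-vector x = (t, x, y, z)."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    t, xx, y, z = x
    return [g * (t - beta * xx), g * (xx - beta * t), y, z]


event = [3.0, 1.0, 2.0, 0.5]
print(interval(event), interval(boost_x(event, 0.7)))   # equal up to rounding, eq. (1.10)
for vec in ([1.0, 1.0, 0.0, 0.0], [2.0, 1.0, 0.0, 0.0], [1.0, 2.0, 0.0, 0.0]):
    print(vec, classify(vec))
```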
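Light rays are null, so there are nonzero vectors with zero norm. Because of this, the Minkowski metric is not positive definite, and we say that Minkowski space is semi-Riemannian. From linear algebra we know that the metric defines an inner product, and we can use it to raise or lower indices; for example, $x_\mu = g_{\mu\nu}x^\nu$.

In the case of the Lorentz metric, the inverse metric has components $g^{\mu\nu} = g_{\mu\nu}$. Furthermore, there is no difference between $x^0$ and $x_0$, but $x_i = -x^i$ for $i = 1, 2, 3$. (Again, be careful, because this is the opposite of what you get if you use the other metric.)

Using this notation, equation (1.10) is written

$$g_{\mu\nu}x'^\mu x'^\nu = g_{\mu\nu}x^\mu x^\nu$$

where the metric is the same in both frames. Lowering indices, we write this in its most compact form as

$$x'^\mu x'_\mu = x^\mu x_\mu$$

and we say that the squared length $x^2 = x^\mu x_\mu$ is an invariant. It is also important to understand that the scalar product of two vectors $a^\mu$ and $b^\mu$ is written in the equivalent forms

$$a \cdot b = g_{\mu\nu}a^\mu b^\nu = a_\nu b^\nu = a_0b^0 + a_ib^i = a^0b^0 - \sum_{i=1}^{3}a^ib^i = a^0b^0 - \mathbf{a}\cdot\mathbf{b}.$$

Here the summation convention means that repeated Greek indices are summed from 0 to 3, and repeated Latin indices are summed from 1 to 3.

We now write our Lorentz transformation equations as

$$x'^\mu = \Lambda^\mu{}_\nu x^\nu \tag{1.13}$$

where we have defined the Lorentz transformation matrix

$$\Lambda^\mu{}_\nu = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0 \\ -\beta\gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{1.14}$$

(You should be aware that some authors put the prime on the indices and write this in the form $x^{\mu'} = \Lambda^{\mu'}{}_\nu x^\nu$.) Using this, we write the invariant $x^2$ as

$$x'_\mu x'^\mu = g_{\mu\nu}\Lambda^\mu{}_\alpha\Lambda^\nu{}_\beta\,x^\alpha x^\beta = \Lambda_{\nu\alpha}\Lambda^\nu{}_\beta\,x^\alpha x^\beta.$$

But this must equal $x_\alpha x^\alpha = g_{\alpha\beta}x^\alpha x^\beta$ for every $x^\alpha$. Since both coefficient arrays are symmetric in $\alpha$ and $\beta$, we have

$$\Lambda_{\mu\alpha}\Lambda^\mu{}_\beta = (\Lambda^T)_\alpha{}^\nu g_{\nu\mu}\Lambda^\mu{}_\beta = g_{\alpha\beta} \tag{1.15}$$

which can be written in matrix form as $\Lambda^Tg\Lambda = g$. In fact, this can be taken as the definition of a Lorentz transformation $\Lambda$. Raising the index $\alpha$ with $g^{\sigma\alpha}$ and using $g^\sigma{}_\beta = \delta^\sigma_\beta$, we can write equation (1.15) as

$$\Lambda_\mu{}^\sigma\,\Lambda^\mu{}_\beta = \delta^\sigma_\beta$$

which shows that $(\Lambda^{-1})^\sigma{}_\mu = \Lambda_\mu{}^\sigma$. Define $(\Lambda^T)^\sigma{}_\mu := \Lambda_\mu{}^\sigma$. Then this says that $\Lambda$ is an orthogonal transformation with respect to the Minkowski metric, $\Lambda^T = \Lambda^{-1}$.

Here the index positions are essential. As ordinary matrices the statement is $\Lambda^{-1} = g\Lambda^Tg$, not $\Lambda^{-1} = \Lambda^T$.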
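The defining property $\Lambda^Tg\Lambda = g$ and the index-placement caveat can be checked directly with the matrix (1.14). This is a plain-Python sketch with $c = 1$ and hypothetical helper names. It confirms two things:

- $g\Lambda^Tg$ is the inverse of $\Lambda$, and equals the boost with $\beta \to -\beta$;
- the ordinary transpose $\Lambda^T$ is not the inverse.

```python
import math


def matmul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def boost_matrix(beta):
    """Lambda^mu_nu of (1.14) for a boost along x, c = 1."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[g, -beta * g, 0.0, 0.0],
            [-beta * g, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def close(a, b, tol=1e-12):
    """Entrywise comparison of two 4x4 matrices up to rounding."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(4) for j in range(4))


ETA = [[1.0, 0.0, 0.0, 0.0],
       [0.0, -1.0, 0.0, 0.0],
       [0.0, 0.0, -1.0, 0.0],
       [0.0, 0.0, 0.0, -1.0]]
IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

L = boost_matrix(0.6)
L_inv = matmul(matmul(ETA, transpose(L)), ETA)                  # g Lambda^T g

print(close(matmul(matmul(transpose(L), ETA), L), ETA))         # (1.15): True
print(close(matmul(L_inv, L), IDENTITY))                        # g Lambda^T g is the inverse: True
print(close(L_inv, boost_matrix(-0.6)))                         # it equals the boost by -beta: True
print(close(matmul(transpose(L), L), IDENTITY))                 # plain Lambda^T is not: False
```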
This is actually just what equations (1.8) say: a Lorentz transformation is a (hyperbolic) rotation in Minkowski space. Since $(\Lambda^{-1})^\mu{}_\nu = (\Lambda^T)^\mu{}_\nu = \Lambda_\nu{}^\mu$, we see from equation (1.13) that

$$(\Lambda^{-1})^\alpha{}_\mu x'^\mu = (\Lambda^{-1})^\alpha{}_\mu\Lambda^\mu{}_\nu x^\nu = x^\alpha$$

or

$$x^\alpha = \Lambda_\mu{}^\alpha x'^\mu. \tag{1.16}$$

Equations (1.13) and (1.16) then give us the very useful results

$$\Lambda^\mu{}_\nu = \frac{\partial x'^\mu}{\partial x^\nu} \qquad \text{and} \qquad \Lambda_\mu{}^\nu = (\Lambda^{-1})^\nu{}_\mu = \frac{\partial x^\nu}{\partial x'^\mu}. \tag{1.17}$$

In order to define velocity in an invariant manner, we define the 4-velocity in terms of the proper time by

$$u^\mu := \frac{dx^\mu}{d\tau}. \tag{1.18}$$

Note that we can write $d\tau^2 = dt^2 - d\mathbf{x}^2 = dt^2(1 - \mathbf{v}^2)$, where $\mathbf{v}$ is the velocity of a particle as seen by $O$. If we let $O'$ be the particle rest frame, then $v$ is just $\beta$, and we have

$$d\tau^2 = dt^2(1 - \mathbf{v}^2) = dt^2/\gamma^2 \qquad \text{so that} \qquad \frac{dt}{d\tau} = \gamma.$$

Then

$$\frac{dx^\mu}{d\tau} = \frac{dt}{d\tau}\frac{dx^\mu}{dt} = \gamma\frac{dx^\mu}{dt}$$

and we can write the 4-velocity as the vector

$$u^\mu = \begin{pmatrix} \gamma \\ \gamma\mathbf{v} \end{pmatrix} \tag{1.19}$$

which has the magnitude

$$u^\mu u_\mu = \gamma^2 - \gamma^2\mathbf{v}^2 = \gamma^2(1 - \mathbf{v}^2) = 1.$$

(Again, with $c \ne 1$ we have $x^0 = ct$, so that $u^\mu = \gamma(c, \mathbf{v})$ and $u^\mu u_\mu = c^2$.)

Since $\Lambda^\mu{}_\nu$ is a constant matrix (and $d\tau$ is invariant), we have

$$u'^\mu = \frac{dx'^\mu}{d\tau} = \Lambda^\mu{}_\nu\frac{dx^\nu}{d\tau} = \Lambda^\mu{}_\nu u^\nu$$

so that $u^\mu$ transforms in exactly the same manner as $x^\mu$. We call any vector that transforms in this way a 4-vector, which justifies the term "4-velocity" used above.

Similarly, we define the 4-momentum by

$$p^\mu := mu^\mu = m\gamma\begin{pmatrix} 1 \\ \mathbf{v} \end{pmatrix} \tag{1.20}$$

so that $p^2 = p^\mu p_\mu = m^2$. (If $c \ne 1$ then $p^2 = m^2c^2$.) Let me also emphasize that the mass $m$ in all of our equations is the constant rest mass. We never talk about the "relativistic mass" $\gamma m$ used in many older books, which write our mass as $m_0$ and then define $m = \gamma m_0$.

Expanding the square root, we have

$$p^0 = m\gamma = \frac{m}{\sqrt{1 - v^2}} = m\left(1 + \frac{1}{2}v^2 + \cdots\right).$$

This is the sum of three pieces:

- a rest energy term $m$ ($= mc$);
- a kinetic energy $mv^2/2$ ($= (1/2)mv^2/c$);
- higher-order terms in $v$ ($= v/c$).

Because of this, we see that $p^0$ is the total energy $p^0 = m\gamma = E$ ($= E/c$) of the particle. Using $\mathbf{p} = m\gamma\mathbf{v}$ as the (relativistic) 3-momentum, we have

$$p^\mu = \begin{pmatrix} E \\ \mathbf{p} \end{pmatrix}. \tag{1.21}$$

Therefore $m^2 = p^2 = E^2 - \mathbf{p}^2$, or

$$E^2 = \mathbf{p}^2 + m^2. \tag{1.22}$$

(If $c \ne 1$, then $p^\mu = m\gamma(c, \mathbf{v}) = (E/c, \mathbf{p})$ and this becomes $E^2 = \mathbf{p}^2c^2 + m^2c^4$.)

Now, the gradient operator is defined as $\nabla = \partial/\partial\mathbf{x}$, so that $\nabla_i = \partial/\partial x^i := \partial_i$. Let us define $\partial_\mu = \partial/\partial x^\mu$. Then

$$\partial_\mu = \begin{pmatrix} \partial_0 \\ \nabla \end{pmatrix} \qquad \text{and} \qquad \partial^\mu = \begin{pmatrix} \partial_0 \\ -\nabla \end{pmatrix}. \tag{1.23}$$

Using equation (1.17) we have

$$\partial'_\mu = \frac{\partial}{\partial x'^\mu} = \frac{\partial x^\nu}{\partial x'^\mu}\frac{\partial}{\partial x^\nu} = \Lambda_\mu{}^\nu\partial_\nu$$

or, equivalently, $\partial'^\mu = \Lambda^\mu{}_\nu\partial^\nu$. This shows that $\partial^\mu$ indeed transforms as a 4-vector, as the notation implies. The operator

$$\partial_\mu\partial^\mu = (\partial_0)^2 + \partial_i\partial^i = (\partial_t)^2 - \nabla^2 = \frac{\partial^2}{\partial t^2} - \nabla^2$$

is called the d'Alembertian, and is frequently written as $\Box$.

In quantum mechanics the momentum operator is defined by $\mathbf{p} = -i\hbar\nabla$ and the energy operator by $E = i\hbar(\partial/\partial t)$. Then $p^i = -i\hbar\nabla_i = -i\hbar\partial_i = +i\hbar\partial^i$, and we can define the relativistic momentum operator $p^\mu = i\hbar\partial^\mu$. Using units with $\hbar = 1$, the relation $E^2 - \mathbf{p}^2 - m^2 = 0$ becomes the operator equation $(-\partial_t^2 + \nabla^2 - m^2)\phi(x) = 0$, or

$$(\partial_t^2 - \nabla^2 + m^2)\phi(x) = (\Box + m^2)\phi(x) = 0$$

which is known as the Klein-Gordon equation.

The two reference frames relative to which we describe motion must be inertial. Even so, there is no reason we can't describe the motion of an accelerated object. As you might guess, we define the 4-acceleration of an object by

$$a^\mu = \frac{du^\mu}{d\tau}.$$

Since $u^\mu u_\mu = 1$, it follows that the 4-acceleration is always orthogonal to the 4-velocity, because

$$0 = \frac{d}{d\tau}(u^\mu u_\mu) = 2u_\mu\frac{du^\mu}{d\tau} = 2u_\mu a^\mu.$$

We also define the 4-force

$$f^\mu = \frac{dp^\mu}{d\tau} = ma^\mu$$

so that

$$f^\mu = \frac{dp^\mu}{d\tau} = \gamma\frac{dp^\mu}{dt} = \gamma\begin{pmatrix} d(\gamma m)/dt \\ d\mathbf{p}/dt \end{pmatrix} = \begin{pmatrix} f^0 \\ \gamma\mathbf{f}_c \end{pmatrix} \tag{1.24}$$

where $\mathbf{f}_c = d\mathbf{p}/dt$ is the classical force on the particle. Since the 4-force is proportional to $a^\mu$, it obeys $f^\mu u_\mu = 0$. We then have

$$0 = f^\mu u_\mu = \gamma f^0 - \gamma^2\mathbf{f}_c\cdot\mathbf{v}$$

and therefore

$$f^0 = \gamma\,\mathbf{f}_c\cdot\mathbf{v} \tag{1.25}$$

which, to within the factor of $\gamma$, is just the classical power (i.e., the rate at which work is done). (If $c \ne 1$ we have $0 = f^\mu u_\mu = f^0\gamma c - \gamma^2\mathbf{f}_c\cdot\mathbf{v}$, so that $f^0 = (\gamma/c)\,\mathbf{f}_c\cdot\mathbf{v}$.)

2 Maxwell's Equations

Experimentally, it is found that the charge-to-mass ratio $e/m\gamma$ of a particle moving at velocity $\beta$ obeys the law

$$\frac{e}{m\gamma} = \frac{e}{m}(1 - \beta^2)^{1/2}.$$

The two sides of this equation refer to different measurements, so it is not as trivial a statement as it looks at first. Therefore $e$ is a constant, and charge is an invariant quantity.

What we would now like to know is how charge density and electric current behave. Since charge density is charge per volume, we must find out how volumes transform.

Let frame 2 (the primed frame) be in motion with respect to frame 1 (the unprimed frame) along their mutual $x$-axis. Consider a small cube of side $l_0$ at rest in frame 2. In its rest frame, this cube has volume $d\tau_0$ (not to be confused with proper time). From (1.7a) we have

$$d\tau_1 = dx\,dy\,dz = \frac{1}{\gamma}dx'\,dy'\,dz' = \frac{1}{\gamma}d\tau_0 \tag{2.1}$$

where we used $dx' = \gamma(dx - \beta\,dt)$ together with $dt = 0$ for a measurement made in frame 1. Thus we have $d\tau_1 = d\tau_0(1 - \beta^2)^{1/2}$.

Now suppose the volume is also moving with respect to frame 2, and let this motion be along the $x_2$ axis. Let $v_{2x}$ be the velocity of the box with respect to frame 2, and similarly for $v_{1x}$. Then

$$d\tau_2 = d\tau_0(1 - v_{2x}^2)^{1/2} \qquad \text{and} \qquad d\tau_1 = d\tau_0(1 - v_{1x}^2)^{1/2}.$$

But from (1.9b) we have

$$v_{1x} = \frac{v_{2x} + \beta}{1 + \beta v_{2x}}$$

and therefore

$$\begin{aligned} d\tau_1 &= d\tau_0\left[1 - \left(\frac{v_{2x} + \beta}{1 + \beta v_{2x}}\right)^2\right]^{1/2} = d\tau_0\left[\frac{1 + 2\beta v_{2x} + \beta^2v_{2x}^2 - v_{2x}^2 - 2\beta v_{2x} - \beta^2}{(1 + \beta v_{2x})^2}\right]^{1/2} \\ &= d\tau_0\left[\frac{(1 - \beta^2)(1 - v_{2x}^2)}{(1 + \beta v_{2x})^2}\right]^{1/2} = d\tau_0\,\frac{[(1 - \beta^2)(1 - v_{2x}^2)]^{1/2}}{1 + \beta v_{2x}} \\ &= \frac{(1 - \beta^2)^{1/2}}{1 + \beta v_{2x}}\,d\tau_2 \end{aligned}$$

where in going to the last line we used $d\tau_2 = d\tau_0(1 - v_{2x}^2)^{1/2}$. Rearranging, this is

$$d\tau_1 = \frac{d\tau_2}{\gamma(1 + \beta v_{2x})}. \tag{2.2a}$$

Reversing the frame point of view, we clearly also have

$$d\tau_2 = \frac{d\tau_1}{\gamma(1 - \beta v_{1x})}. \tag{2.2b}$$

Be sure to understand what these equations say. Consider a box with rest volume $d\tau_0$ moving along the common $x$-axis. The velocities $v_{1x}$ and $v_{2x}$ are its observed velocities in frames 1 and 2, which are moving with velocity $\beta$ with respect to each other.

Now suppose that we have $dn$ charges of $Q$ coulombs each. As we stated above, $Q$ is an invariant.
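Equations (2.2a) and (2.2b) can be sanity-checked numerically. The sketch computes $d\tau_1$ and $d\tau_2$ directly from the contraction formula $d\tau = d\tau_0(1 - v^2)^{1/2}$ and the velocity addition law (1.9b). The values of $\beta$ and $v_{2x}$ below are arbitrary test inputs, $c = 1$, and the helper names are hypothetical.

```python
import math


def moving_volume(rest_volume, v):
    """Volume of a box moving at speed v along the x-axis: d tau_0 * sqrt(1 - v^2)."""
    return rest_volume * math.sqrt(1.0 - v * v)


def add_velocities(v_prime, beta):
    """Relativistic velocity addition, equation (1.9b), c = 1."""
    return (v_prime + beta) / (1.0 + beta * v_prime)


beta = 0.5       # velocity of frame 2 relative to frame 1 (arbitrary test value)
v2x = 0.3        # velocity of the box in frame 2 (arbitrary test value)
dtau0 = 1.0      # rest volume of the box

gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
v1x = add_velocities(v2x, beta)          # velocity of the box in frame 1
dtau1 = moving_volume(dtau0, v1x)
dtau2 = moving_volume(dtau0, v2x)

print(dtau1, dtau2 / (gamma * (1.0 + beta * v2x)))   # (2.2a): the two numbers agree
print(dtau2, dtau1 / (gamma * (1.0 - beta * v1x)))   # (2.2b): the two numbers agree
```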
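Obviously, $dn$ is also an invariant, since it is just the number of charges. Then the charge densities as observed in frames 1 and 2 are

$$\rho_1 = \frac{Q\,dn}{d\tau_1} \qquad \text{and} \qquad \rho_2 = \frac{Q\,dn}{d\tau_2}$$

so that $\rho_1d\tau_1 = \rho_2d\tau_2$. Here $d\tau_1$ and $d\tau_2$ are the same volume, containing the charge $Q\,dn$, as observed in the two different reference frames. Then, using equations (2.2), we have

$$\rho_1 = \rho_2\frac{d\tau_2}{d\tau_1} = \rho_2\gamma(1 + \beta v_{2x}), \qquad \rho_2 = \rho_1\frac{d\tau_1}{d\tau_2} = \rho_1\gamma(1 - \beta v_{1x}). \tag{2.3}$$

The current density $\mathbf{J} := \rho\mathbf{v}$ is the charge per unit area per unit time, so we can write $J_{1x} = \rho_1v_{1x}$ and $J_{2x} = \rho_2v_{2x}$. Using these definitions in equations (2.3) yields the transformation of charge density

$$\rho_1 = \gamma(\rho_2 + \beta J_{2x}), \qquad \rho_2 = \gamma(\rho_1 - \beta J_{1x}). \tag{2.4}$$

Now using equations (1.9) and (2.3) we also have

$$J_{1x} = \rho_1v_{1x} = \rho_2\gamma(1 + \beta v_{2x})\frac{v_{2x} + \beta}{1 + \beta v_{2x}} = \gamma(\rho_2v_{2x} + \beta\rho_2)$$

or

$$J_{1x} = \gamma(J_{2x} + \beta\rho_2), \qquad J_{2x} = \gamma(J_{1x} - \beta\rho_1). \tag{2.5a}$$

Similarly,

$$J_{1y} = \rho_1v_{1y} = \rho_2\gamma(1 + \beta v_{2x})\,v_{1y}.$$

But

$$v_{1y} = \frac{dy_1}{dt_1} = \frac{dy_2}{\gamma(dt_2 + \beta\,dx_2)} = \frac{v_{2y}}{\gamma(1 + \beta v_{2x})}.$$

Hence

$$J_{1y} = \rho_2v_{2y} = J_{2y} \qquad \text{and similarly} \qquad J_{1z} = J_{2z}. \tag{2.5b}$$

Comparing equations (2.4) and (2.5) with equations (1.7), we see that we have a 4-current density

$$J^\mu = \begin{pmatrix} \rho \\ \mathbf{J} \end{pmatrix}. \tag{2.6}$$

(And again, if $c \ne 1$ this becomes $J^\mu = (\rho c, \mathbf{J})$.)

Now that we have shown that $J^\mu$ does indeed define a 4-vector, let me show another way to arrive at this conclusion, analogous to the definition of the 4-momentum. From (2.1) we can write $d\tau = d\tau_0/\gamma$. Here $d\tau_0$ is at rest in frame 2, and $\gamma = (1 - \beta^2)^{-1/2}$ with $\beta$ the velocity of frame 2 with respect to frame 1.

Then the invariance of charge implies $\rho\,d\tau = \rho_0\,d\tau_0$, or $\rho\,d\tau_0/\gamma = \rho_0\,d\tau_0$, and hence $\rho = \gamma\rho_0$. This has the same form as the old "relativistic mass" expression $m = \gamma m_0$. We also have $\mathbf{J} = \rho\mathbf{v} = \rho_0\gamma\mathbf{v}$. Recall that the 4-velocity is given by

$$u^\mu = \begin{pmatrix} \gamma \\ \gamma\mathbf{v} \end{pmatrix}.$$

Letting $\mathbf{v}$ be the velocity of the charge (i.e., $\mathbf{v} = \boldsymbol\beta$), we see that (2.6) is the same as $J^\mu = \rho_0u^\mu$. This is analogous to $p^\mu = mu^\mu$. In other words, we have shown

$$J^\mu = \begin{pmatrix} \rho \\ \mathbf{J} \end{pmatrix} = \rho_0\begin{pmatrix} \gamma \\ \gamma\mathbf{v} \end{pmatrix} = \rho_0u^\mu. \tag{2.7}$$

We saw earlier that the derivative operator $\partial_\mu$ transforms as a 4-vector (technically, a covector), and we just showed that $J^\mu$ is a 4-vector. It follows that $\partial_\mu J^\mu$ is a Lorentz scalar. But

$$\partial_\mu J^\mu = \partial_0\rho + \partial_iJ^i = \frac{\partial\rho}{\partial t} + \nabla\cdot\mathbf{J}$$

and therefore the continuity equation may be written in the covariant form

$$\partial_\mu J^\mu = 0. \tag{2.8}$$

Now recall Maxwell's equations (in Gaussian units with $c = 1$):

$$\nabla\cdot\mathbf{E} = 4\pi\rho \tag{2.9a}$$
$$\nabla\cdot\mathbf{B} = 0 \tag{2.9b}$$
$$\nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t} \tag{2.9c}$$
$$\nabla\times\mathbf{B} = 4\pi\mathbf{J} + \frac{\partial\mathbf{E}}{\partial t} \tag{2.9d}$$

Using $\mathbf{B} = \nabla\times\mathbf{A}$, equation (2.9c) implies $\nabla\times(\mathbf{E} + \partial\mathbf{A}/\partial t) = 0$, so that

$$\mathbf{E} = -\nabla\phi - \frac{\partial\mathbf{A}}{\partial t}.$$

Equations (2.9a) and (2.9d) then imply (using the identity $\nabla\times\nabla\times\mathbf{A} = \nabla(\nabla\cdot\mathbf{A}) - \nabla^2\mathbf{A}$)

$$\nabla^2\phi + \frac{\partial}{\partial t}(\nabla\cdot\mathbf{A}) = -4\pi\rho \tag{2.10a}$$
$$\nabla^2\mathbf{A} - \frac{\partial^2\mathbf{A}}{\partial t^2} - \nabla\left(\nabla\cdot\mathbf{A} + \frac{\partial\phi}{\partial t}\right) = -4\pi\mathbf{J}. \tag{2.10b}$$

Consider the gauge transformation $\mathbf{A} \to \mathbf{A}' = \mathbf{A} + \nabla\Lambda$, where $\Lambda$ is a scalar function, not the Lorentz matrix. It leaves the physical field $\mathbf{B} = \nabla\times\mathbf{A}$ unchanged. If $\mathbf{E} = -\nabla\phi - \partial\mathbf{A}/\partial t$ is also to remain unchanged, we must have $\phi \to \phi' = \phi - \partial\Lambda/\partial t$. This gives us the freedom to choose $(\phi, \mathbf{A})$ so as to satisfy the Lorentz gauge (or Lorentz condition)

$$\nabla\cdot\mathbf{A} + \frac{\partial\phi}{\partial t} = 0.$$

In other words, we demand that the new potentials $(\phi', \mathbf{A}')$ satisfy

$$0 = \nabla\cdot\mathbf{A}' + \frac{\partial\phi'}{\partial t} = \nabla\cdot\mathbf{A} + \nabla^2\Lambda + \frac{\partial\phi}{\partial t} - \frac{\partial^2\Lambda}{\partial t^2}.$$

Thus, if we can find a $\Lambda$ that satisfies

$$\nabla^2\Lambda - \frac{\partial^2\Lambda}{\partial t^2} = -\left(\nabla\cdot\mathbf{A} + \frac{\partial\phi}{\partial t}\right)$$

the gauge-transformed fields will satisfy the Lorentz condition. Fortunately, this is a straightforward problem to solve. All we need to do is find the Green's function for the wave equation. Then $\Lambda$ will be the integral of the Green's function times the quantity on the right.

Very briefly: suppose a linear operator $L(x)$ acts on a function $f(x)$ such that $L(x)f(x) = g(x)$. If you find a Green's function $G(x, x')$ defined by $L(x)G(x, x') = \delta(x - x')$, then the solution to the problem is essentially $f(x) = \int G(x, x')g(x')\,dx'$. You can easily verify that acting on this with $L(x)$ yields $L(x)f(x) = g(x)$.

In any case, choosing the Lorentz gauge, equations (2.10) become the well-known wave equations

$$\nabla^2\phi - \frac{\partial^2\phi}{\partial t^2} = -4\pi\rho \tag{2.11a}$$
$$\nabla^2\mathbf{A} - \frac{\partial^2\mathbf{A}}{\partial t^2} = -4\pi\mathbf{J}. \tag{2.11b}$$

With the d'Alembertian $\Box = \partial^2/\partial t^2 - \nabla^2$ defined earlier, an equivalent way of writing these is

$$\Box\phi = 4\pi J^0, \qquad \Box\mathbf{A} = 4\pi\mathbf{J}.$$

So if we define the 4-potential $A^\mu$ by

$$A^\mu = \begin{pmatrix} \phi \\ \mathbf{A} \end{pmatrix}$$

then the wave equations may be written in the concise form

$$\Box A^\mu = 4\pi J^\mu \tag{2.12}$$

where the Lorentz condition becomes $\partial_\mu A^\mu = 0$.

That $A^\mu$ is indeed a 4-vector follows because $\Box$ is a Lorentz-invariant operator and $J^\mu$ is a 4-vector. Thus $A^\mu$ must transform as a 4-vector so that (2.12) is covariant (i.e., so that both sides transform the same way). Also, $A^\mu = (\phi, \mathbf{A})$ is unchanged even if $c \ne 1$. With $c \ne 1$ the right side of (2.11b) becomes $-(4\pi/c)\mathbf{J}$, while $J^0 = \rho c$. Both components of (2.12) therefore acquire the same factor: $\Box A^\mu = (4\pi/c)J^\mu$.

Now we need a bit of terminology. Recall that a 4-vector was defined as a quantity $v^\mu$ that under a Lorentz transformation transforms as $v^\mu \to v'^\mu = \Lambda^\mu{}_\nu v^\nu$. As we shall see below, it is also possible to have quantities with more than one index such that under a Lorentz transformation each index transforms like a 4-vector. For example, consider a quantity $F^{\mu\nu}$ with two indices (not necessarily the electromagnetic field tensor) that transforms like

$$F^{\mu\nu} \to F'^{\mu\nu} = \Lambda^\mu{}_\alpha\Lambda^\nu{}_\beta F^{\alpha\beta}.$$

Such a quantity is called a (second-rank) tensor. Higher-rank tensors are defined in the obvious manner. Note also that all of the indices need not be superscripts (such indices are called contravariant). We can equally have subscripts (called covariant) that transform like

$$F'_{\mu\nu} = \Lambda_\mu{}^\alpha\Lambda_\nu{}^\beta F_{\alpha\beta}.$$

And we can have a mixed tensor like $F^\mu{}_\nu$. Indices are raised and lowered using the metric $g_{\mu\nu}$ and its inverse $g^{\mu\nu}$.

At last we are ready to write Maxwell's equations in covariant form. Because $\partial_\mu$ and $A_\nu$ both transform as (co)vectors and $\Lambda^\mu{}_\nu$ is constant, the quantity $\partial_\mu A_\nu$ transforms as a second-rank tensor under Lorentz transformations. It is not gauge invariant, however. In 4-vector form the gauge transformation above reads $A_\mu \to A_\mu - \partial_\mu\Lambda$, which changes $\partial_\mu A_\nu$ by $-\partial_\mu\partial_\nu\Lambda$. Since this change is symmetric in $\mu$ and $\nu$, the antisymmetric quantity $F_{\mu\nu}$ defined by

$$F_{\mu\nu} := \partial_\mu A_\nu - \partial_\nu A_\mu \tag{2.13a}$$

is both a tensor and gauge invariant. This is called the electromagnetic field tensor. Equivalently, we may consider the contravariant version

$$F^{\mu\nu} = \partial^\mu A^\nu - \partial^\nu A^\mu. \tag{2.13b}$$

I claim that equations (2.9a) and (2.9d) can be written in the form

$$\partial_\mu F^{\mu\nu} = 4\pi J^\nu. \tag{2.14}$$

To see this, first recall that we are using the metric $g = \operatorname{diag}(1, -1, -1, -1)$. Hence $\partial/\partial t = \partial/\partial x^0 = \partial_0 = \partial^0$ and $\nabla_i := \partial/\partial x^i = \partial_i = -\partial^i$. Using $\mathbf{E} = -\nabla\phi - \partial\mathbf{A}/\partial t$ we have

$$E^i = \partial^iA^0 - \partial^0A^i = F^{i0}.$$

Also $\mathbf{B} = \nabla\times\mathbf{A}$, so that

$$B^1 = \nabla_2A^3 - \nabla_3A^2 = -\partial^2A^3 + \partial^3A^2 = -F^{23}$$

plus cyclic permutations $1 \to 2 \to 3 \to 1$. Then the electromagnetic field tensor is given by

$$F^{\mu\nu} = \begin{pmatrix} 0 & -E^1 & -E^2 & -E^3 \\ E^1 & 0 & -B^3 & B^2 \\ E^2 & B^3 & 0 & -B^1 \\ E^3 & -B^2 & B^1 & 0 \end{pmatrix}. \tag{2.15}$$

Be sure to note that this is the form of $F^{\mu\nu}$ for the metric $\operatorname{diag}(1, -1, -1, -1)$. If you use the metric $\operatorname{diag}(-1, 1, 1, 1)$, then all entries of $F^{\mu\nu}$ change sign. In addition, you frequently see the matrix $F^\mu{}_\nu$, which also has different signs.

Now, for the $\nu = 0$ component of (2.14) we have

$$4\pi J^0 = \partial_\mu F^{\mu0} = \partial_iF^{i0} = \partial_iE^i$$

which is Coulomb's law $\nabla\cdot\mathbf{E} = 4\pi\rho$. Next consider the $\nu = 1$ component of (2.14). This is

$$4\pi J^1 = \partial_\mu F^{\mu1} = \partial_0F^{01} + \partial_2F^{21} + \partial_3F^{31} = -\partial_0E^1 + \partial_2B^3 - \partial_3B^2 = -\partial_tE^1 + (\nabla\times\mathbf{B})^1.$$

Together with the $\nu = 2, 3$ components, we therefore have

$$\nabla\times\mathbf{B} - \frac{\partial\mathbf{E}}{\partial t} = 4\pi\mathbf{J}.$$

Finally, I leave it as an exercise for you to show that equations (2.9b) and (2.9c) can be written as follows (note the superscripts are cyclic permutations):

$$\partial^\mu F^{\nu\sigma} + \partial^\nu F^{\sigma\mu} + \partial^\sigma F^{\mu\nu} = 0 \qquad \text{or simply} \qquad \partial^{[\mu}F^{\nu\sigma]} = 0. \tag{2.16}$$

Remark: There is another interesting way to arrive at the electromagnetic field tensor that we now describe, but you are free to skip over it. First we need to give a more careful definition of a tensor.

To begin, given a vector space $V$, we can define the dual space $V^*$ as the vector space of linear functionals on $V$. In other words, $\alpha \in V^*$ means that $\alpha: V \to \mathbb{R}$ is a linear map from $V$ to $\mathbb{R}$. (We restrict consideration to real vector spaces.) Members of the dual space are frequently called covectors. If $V$ has a basis $\{e_1, \ldots, e_n\}$, we define the $n$ linear functionals $\{\omega^1, \ldots, \omega^n\}$ by $\omega^i(e_j) = \delta^i_j$.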
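For a basis other than the standard one, the defining condition $\omega^i(e_j) = \delta^i_j$ can be solved mechanically. If the basis vectors $e_j$ are the columns of a matrix $E$, then the dual covectors $\omega^i$ (written as row vectors) are the rows of $E^{-1}$, since $(E^{-1}E)^i{}_j = \delta^i_j$. The sketch below uses a hypothetical basis $e_1 = (2, 1)^T$, $e_2 = (1, 1)^T$ of $\mathbb{R}^2$ and exact rational arithmetic from Python's `fractions` module.

```python
from fractions import Fraction


def inverse_2x2(m):
    """Exact inverse of a 2x2 matrix [[a, b], [c, d]] of Fractions."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("the columns are linearly dependent, so not a basis")
    return [[d / det, -b / det], [-c / det, a / det]]


def act(covector, vector):
    """omega(v) = omega_k v^k for a row vector acting on a column vector."""
    return sum(w * x for w, x in zip(covector, vector))


# Hypothetical basis of R^2, e1 = (2, 1) and e2 = (1, 1).
e1 = [Fraction(2), Fraction(1)]
e2 = [Fraction(1), Fraction(1)]
E = [[e1[0], e2[0]],
     [e1[1], e2[1]]]          # basis vectors as columns

omega = inverse_2x2(E)        # row i is the dual covector omega^i

for i, w in enumerate(omega, start=1):
    values = [act(w, e) for e in (e1, e2)]
    print(f"omega^{i} =", [str(x) for x in w], "-> values on e1, e2:", [str(x) for x in values])

# omega^i picks out the i-th component of v = 3 e1 + 5 e2.
v = [3 * e1[k] + 5 * e2[k] for k in range(2)]
print([str(act(w, v)) for w in omega])
```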
I will show that these $n$ linear functionals form a basis for $V^*$, i.e., that they are linearly independent and span $V^*$. To show this, let $\alpha \in V^*$ be arbitrary but fixed. For any $v \in V$, the linearity of $\alpha$ gives

$$\alpha(v) = \alpha(v^ie_i) = v^i\alpha(e_i) = a_iv^i$$

where we have defined the scalars $a_i$ by $a_i = \alpha(e_i)$. On the other hand,

$$\omega^i(v) = \omega^i(v^je_j) = v^j\omega^i(e_j) = v^j\delta^i_j = v^i$$

and hence we see that $\alpha(v) = a_iv^i = a_i\omega^i(v)$, so that $\alpha = a_i\omega^i$. This shows that the $\omega^i$ span $V^*$.

To show they are linearly independent, suppose we have $c_i\omega^i = 0$ for some set of scalars $c_i$. Then for any $j = 1, \ldots, n$ we have

$$0 = c_i\omega^i(e_j) = c_i\delta^i_j = c_j$$

which proves linear independence. Thus any $\alpha \in V^*$ can be written in the form $\alpha = a_i\omega^i = \alpha(e_i)\omega^i$.

As an example, consider the space $V = \mathbb{R}^2$ consisting of all column vectors of the form

$$v = \begin{pmatrix} v^1 \\ v^2 \end{pmatrix}.$$

Relative to the standard basis we have

$$v = v^1\begin{pmatrix} 1 \\ 0 \end{pmatrix} + v^2\begin{pmatrix} 0 \\ 1 \end{pmatrix} = v^1e_1 + v^2e_2.$$

If $\phi \in V^*$, then $\phi(v) = \sum\phi_iv^i$, and we may represent $\phi$ by the row vector $\phi = (\phi_1, \phi_2)$. In particular, if we write the dual basis as $\omega^i = (a_i, b_i)$, then we have

$$\begin{aligned} 1 &= \omega^1(e_1) = (a_1, b_1)\begin{pmatrix} 1 \\ 0 \end{pmatrix} = a_1, & 0 &= \omega^1(e_2) = (a_1, b_1)\begin{pmatrix} 0 \\ 1 \end{pmatrix} = b_1, \\ 0 &= \omega^2(e_1) = (a_2, b_2)\begin{pmatrix} 1 \\ 0 \end{pmatrix} = a_2, & 1 &= \omega^2(e_2) = (a_2, b_2)\begin{pmatrix} 0 \\ 1 \end{pmatrix} = b_2 \end{aligned}$$

so that $\omega^1 = (1, 0)$ and $\omega^2 = (0, 1)$. Note, for example,

$$\omega^1(v) = (1, 0)\begin{pmatrix} v^1 \\ v^2 \end{pmatrix} = v^1$$

as it should.

As another very important example, let $V$ be an inner product space. If $a, b \in V$, then the inner product of $a$ and $b$ is the number $\langle a, b\rangle$. Given a fixed vector $a$, the quantity $\langle a, \cdot\rangle$ is a linear functional on $V$, because it takes a vector $b \in V$ and gives back a number:

$$\langle a, \cdot\rangle: b \mapsto \langle a, b\rangle \in \mathbb{R}.$$

Since $\langle a, \cdot\rangle$ is in $V^*$, let us denote it by $\alpha$, so that $\alpha(b) = \langle a, b\rangle$. Given a basis $\{e_i\}$ for $V$, let us define the numbers $g_{ij}$ by

$$g_{ij} := \langle e_i, e_j\rangle = \langle e_j, e_i\rangle = g_{ji}.$$

This is the proper definition of the metric. Then

$$\langle a, b\rangle = \langle a^ie_i, b^je_j\rangle = a^ib^j\langle e_i, e_j\rangle = a^ib^jg_{ij} = b^jg_{ji}a^i.$$
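The definition $g_{ij} := \langle e_i, e_j\rangle$ and the formula $\langle a, b\rangle = a^ib^jg_{ij}$ can be checked for a non-orthonormal basis. In this sketch the inner product is assumed to be the ordinary Euclidean dot product on $\mathbb{R}^2$. The basis $e_1 = (2, 1)^T$, $e_2 = (1, 1)^T$ and the components of $a$ and $b$ are arbitrary hypothetical choices.

```python
from fractions import Fraction


def dot(u, v):
    """Euclidean inner product on R^2 (the assumed <.,.> for this sketch)."""
    return sum(x * y for x, y in zip(u, v))


# Hypothetical non-orthonormal basis e1 = (2, 1), e2 = (1, 1).
basis = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(1)]]

# Metric components g_ij = <e_i, e_j>; symmetric by construction.
g = [[dot(ei, ej) for ej in basis] for ei in basis]


def from_components(comp):
    """Assemble the actual vector a = a^i e_i from its components a^i."""
    return [sum(comp[i] * basis[i][k] for i in range(2)) for k in range(2)]


a_comp = [Fraction(3), Fraction(-1)]   # a = 3 e1 - e2
b_comp = [Fraction(1), Fraction(2)]    # b = e1 + 2 e2

direct = dot(from_components(a_comp), from_components(b_comp))
via_metric = sum(a_comp[i] * b_comp[j] * g[i][j] for i in range(2) for j in range(2))
print([[str(x) for x in row] for row in g], direct, via_metric)
```

Both routes give the same number. This is exactly the statement $\langle a, b\rangle = a^ib^jg_{ij}$, even though the basis is not orthonormal.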
But we also have

$$\alpha(b) = \alpha(b^je_j) = b^j\alpha(e_j) = b^ja_j$$

and therefore

$$b^ja_j = \alpha(b) = \langle a, b\rangle = b^jg_{ji}a^i.$$

Since $b$ is arbitrary, we define

$$a_j = g_{ji}a^i.$$

This is called lowering an index. Since the inner product is nondegenerate by definition, the metric $g_{ij}$ must be nonsingular. Hence we can define its inverse, which we denote by $g^{ij}$. Multiplying this last equation by $g^{kj}$ we then have

$$g^{kj}a_j = g^{kj}g_{ji}a^i = \delta^k_ia^i = a^k$$

and thus we define raising an index by $a^k = g^{kj}a_j$.

Now that we have an understanding of dual spaces, we are in a position to define tensors carefully. A tensor $T$ is just a multilinear map

$$T: V^{*s} \times V^r = \underbrace{V^* \times \cdots \times V^*}_{s} \times \underbrace{V \times \cdots \times V}_{r} \to \mathbb{R}.$$

By multilinear, we mean that it is linear in each variable separately. This tensor is said to have covariant order $r$ and contravariant order $s$, or simply to be a tensor of type $(s, r)$. In other words, $T$ takes as its arguments $s$ covectors and $r$ vectors. Since it is multilinear, we see that

$$\begin{aligned} T(\alpha^{(1)}, \ldots, \alpha^{(s)}, v_{(1)}, \ldots, v_{(r)}) &= T\big(a^{(1)}_{i_1}\omega^{i_1}, \ldots, a^{(s)}_{i_s}\omega^{i_s}, v_{(1)}^{j_1}e_{j_1}, \ldots, v_{(r)}^{j_r}e_{j_r}\big) \\ &= a^{(1)}_{i_1}\cdots a^{(s)}_{i_s}\,v_{(1)}^{j_1}\cdots v_{(r)}^{j_r}\,T(\omega^{i_1}, \ldots, \omega^{i_s}, e_{j_1}, \ldots, e_{j_r}) \\ &= a^{(1)}_{i_1}\cdots a^{(s)}_{i_s}\,v_{(1)}^{j_1}\cdots v_{(r)}^{j_r}\,T^{i_1\cdots i_s}{}_{j_1\cdots j_r} \end{aligned}$$

where the last line defines the components of $T$. Thus if we know the components of $T$, then we know the result of $T$ acting on an arbitrary collection of vectors and covectors.

What happens to the components of $T$ under a change of coordinates? From a practical standpoint, this is what really defines a tensor. A change of basis in $V$ is of the form

$$e_i \to \bar e_i = e_jp^j{}_i$$

where $(p^j{}_i)$ is called the transition matrix. Any $x \in V$ can be written in terms of its components with respect to either $\{e_i\}$ or $\{\bar e_i\}$, and we have

$$x = x^je_j = \bar x^i\bar e_i = \bar x^ie_jp^j{}_i.$$

Therefore we must have

$$x^j = p^j{}_i\bar x^i \qquad \text{or} \qquad \bar x^i = (p^{-1})^i{}_jx^j.$$

From these we easily see that

$$p^i{}_j = \frac{\partial x^i}{\partial\bar x^j} \qquad \text{and} \qquad (p^{-1})^i{}_j = \frac{\partial\bar x^i}{\partial x^j}.$$

When $V$ undergoes a change of basis, what about $V^*$?
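Lowering with $g_{ij}$, raising with $g^{ij}$, and the change-of-basis rule $\bar x^i = (p^{-1})^i{}_jx^j$ can be checked in a few lines. In this sketch the metric $g_{ij}$, the transition matrix $p^j{}_i$ and all components are hypothetical values in $\mathbb{R}^2$; exact rational arithmetic avoids rounding.

The last check uses one extra fact. The components $\alpha_i = \alpha(e_i)$ of a covector become $\bar\alpha_i = \alpha(\bar e_i) = \alpha_jp^j{}_i$. As a result, the number $\alpha(x) = \alpha_jx^j$ does not depend on the basis used to compute it.

```python
from fractions import Fraction as Fr


def matvec(m, v):
    """(M v)^i = M^i_k v^k."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


def inverse_2x2(m):
    """Exact inverse of a 2x2 matrix of Fractions."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Lowering and raising an index with a hypothetical symmetric, nonsingular metric.
g = [[Fr(5), Fr(3)], [Fr(3), Fr(2)]]
g_inv = inverse_2x2(g)                 # g^{ij}
a_up = [Fr(3), Fr(-1)]                 # a^i
a_down = matvec(g, a_up)               # a_j = g_{ji} a^i
print([str(x) for x in a_down], [str(x) for x in matvec(g_inv, a_down)])

# Change of basis ebar_i = e_j p^j_i with a hypothetical transition matrix p.
p = [[Fr(1), Fr(2)], [Fr(0), Fr(1)]]
p_inv = inverse_2x2(p)
x = [Fr(4), Fr(7)]                     # x^j in the old basis
x_bar = matvec(p_inv, x)               # xbar^i = (p^{-1})^i_j x^j
alpha = [Fr(2), Fr(-3)]                # alpha_j = alpha(e_j)
alpha_bar = [sum(alpha[j] * p[j][i] for j in range(2)) for i in range(2)]
print(sum(u * v for u, v in zip(alpha, x)), sum(u * v for u, v in zip(alpha_bar, x_bar)))
```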
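Let us write in general $\omega^i \to \bar\omega^i = b^i{}_j\omega^j$. Since we must also have $\bar\omega^i(\bar e_j) = \delta^i_j$, we see that

$$\delta^i_j = \bar\omega^i(\bar e_j) = \bar\omega^i(e_kp^k{}_j) = p^k{}_j\,\bar\omega^i(e_k) = p^k{}_j\,b^i{}_l\,\omega^l(e_k) = p^k{}_j\,b^i{}_l\,\delta^l_k = b^i{}_k\,p^k{}_j$$

so that $b^i{}_k = (p^{-1})^i{}_k$. In other words,

$$\omega^i \to \bar\omega^i = (p^{-1})^i{}_j\,\omega^j.$$

Finally we are in a position to derive the general transformation law of a tensor. Denoting the components with respect to the new bases by $\bar T$, we have

$$\begin{aligned} \bar T^{i_1\cdots i_s}{}_{j_1\cdots j_r} &= T(\bar\omega^{i_1}, \ldots, \bar\omega^{i_s}, \bar e_{j_1}, \ldots, \bar e_{j_r}) \\ &= T\big((p^{-1})^{i_1}{}_{k_1}\omega^{k_1}, \ldots, (p^{-1})^{i_s}{}_{k_s}\omega^{k_s}, e_{l_1}p^{l_1}{}_{j_1}, \ldots, e_{l_r}p^{l_r}{}_{j_r}\big) \\ &= (p^{-1})^{i_1}{}_{k_1}\cdots(p^{-1})^{i_s}{}_{k_s}\,p^{l_1}{}_{j_1}\cdots p^{l_r}{}_{j_r}\,T(\omega^{k_1}, \ldots, \omega^{k_s}, e_{l_1}, \ldots, e_{l_r}) \\ &= (p^{-1})^{i_1}{}_{k_1}\cdots(p^{-1})^{i_s}{}_{k_s}\,p^{l_1}{}_{j_1}\cdots p^{l_r}{}_{j_r}\,T^{k_1\cdots k_s}{}_{l_1\cdots l_r} \end{aligned}$$

or

$$\bar T^{i_1\cdots i_s}{}_{j_1\cdots j_r} = \frac{\partial\bar x^{i_1}}{\partial x^{k_1}}\cdots\frac{\partial\bar x^{i_s}}{\partial x^{k_s}}\,\frac{\partial x^{l_1}}{\partial\bar x^{j_1}}\cdots\frac{\partial x^{l_r}}{\partial\bar x^{j_r}}\,T^{k_1\cdots k_s}{}_{l_1\cdots l_r}.$$

This is the classical transformation law of a type $(s, r)$ tensor. In the particular case of a second-rank tensor $F^{\mu\nu}$ under a Lorentz transformation, we have $\bar x^\mu = \Lambda^\mu{}_\nu x^\nu$. Then $\partial\bar x^\mu/\partial x^\nu = \Lambda^\mu{}_\nu$, and therefore

$$\bar F^{\mu\nu} = \frac{\partial\bar x^\mu}{\partial x^\alpha}\frac{\partial\bar x^\nu}{\partial x^\beta}F^{\alpha\beta} = \Lambda^\mu{}_\alpha\Lambda^\nu{}_\beta F^{\alpha\beta}.$$

Let us now return to the physics. We know that the Lorentz force law is

$$\mathbf{F} = q(\mathbf{E} + \mathbf{v}\times\mathbf{B}) = \frac{d\mathbf{p}}{dt}$$

so consider (now $\tau$ is the proper time again)

$$\frac{d\mathbf{p}}{d\tau} = \frac{dt}{d\tau}\frac{d\mathbf{p}}{dt} = \gamma\frac{d\mathbf{p}}{dt}.$$

In terms of the 4-velocity

$$u^\mu = \begin{pmatrix} \gamma \\ \gamma\mathbf{v} \end{pmatrix} = \begin{pmatrix} u^0 \\ \mathbf{u} \end{pmatrix}$$

we can write

$$\frac{d\mathbf{p}}{d\tau} = \gamma q(\mathbf{E} + \mathbf{v}\times\mathbf{B}) = q(\gamma\mathbf{E} + \gamma\mathbf{v}\times\mathbf{B}) = q(u^0\mathbf{E} + \mathbf{u}\times\mathbf{B}). \tag{2.17}$$

Also note that if $W = p^0$ is the energy of the particle, then the change in energy is the rate at which work is done, so that

$$\frac{dW}{dt} = \mathbf{F}\cdot\frac{d\mathbf{r}}{dt} = \mathbf{F}\cdot\mathbf{v} = q(\mathbf{E} + \mathbf{v}\times\mathbf{B})\cdot\mathbf{v} = q\mathbf{E}\cdot\mathbf{v}.$$

The magnetic force does no work, since $(\mathbf{v}\times\mathbf{B})\cdot\mathbf{v} = 0$. Therefore

$$\frac{dp^0}{d\tau} = \frac{dW}{d\tau} = \gamma\frac{dW}{dt} = q\mathbf{E}\cdot\gamma\mathbf{v} = q\mathbf{E}\cdot\mathbf{u}. \tag{2.18}$$

Combining equations (2.17) and (2.18), we see that we can define a linear map $u^\mu \to dp^\mu/d\tau$ of a 4-vector to another 4-vector. Hence there exists a second-rank mixed tensor $F^\mu{}_\nu$ such that

$$\frac{dp^\mu}{d\tau} = qF^\mu{}_\nu u^\nu. \tag{2.19}$$

Comparing (2.19) with (2.17) and (2.18) allows us to pick out the components of $F^\mu{}_\nu$:

$$F^\mu{}_\nu = \begin{pmatrix} 0 & E_x & E_y & E_z \\ E_x & 0 & B_z & -B_y \\ E_y & -B_z & 0 & B_x \\ E_z & B_y & -B_x & 0 \end{pmatrix}.$$

Finally, we note that we can also write $F$ in the alternate forms

$$F_{\mu\nu} = g_{\mu\alpha}F^\alpha{}_\nu \qquad \text{and} \qquad F^{\mu\nu} = g^{\nu\alpha}F^\mu{}_\alpha.$$

Now that we have the electromagnetic field tensor, it is straightforward to derive the transformation laws for the $\mathbf{E}$ and $\mathbf{B}$ fields. Starting from $F^{\mu\nu}$, which defines the fields $\mathbf{E}$ and $\mathbf{B}$, we have

$$F'^{\mu\nu} = \Lambda^\mu{}_\alpha\Lambda^\nu{}_\beta F^{\alpha\beta}$$

which then gives the fields $\mathbf{E}'$ and $\mathbf{B}'$ in terms of $\mathbf{E}$ and $\mathbf{B}$. In matrix notation, we can write this as $F' = \Lambda F\Lambda^T$. Using equations (1.14) and (2.15), it is easy to multiply out the matrices and show that

$$F'^{\mu\nu} = \begin{pmatrix} 0 & -E'^1 & -E'^2 & -E'^3 \\ E'^1 & 0 & -B'^3 & B'^2 \\ E'^2 & B'^3 & 0 & -B'^1 \\ E'^3 & -B'^2 & B'^1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -E^1 & -\gamma(E^2 - \beta B^3) & -\gamma(E^3 + \beta B^2) \\ E^1 & 0 & -\gamma(B^3 - \beta E^2) & \gamma(B^2 + \beta E^3) \\ \gamma(E^2 - \beta B^3) & \gamma(B^3 - \beta E^2) & 0 & -B^1 \\ \gamma(E^3 + \beta B^2) & -\gamma(B^2 + \beta E^3) & B^1 & 0 \end{pmatrix}.$$

This was for the special case of a boost along the $x$-axis, i.e., $\boldsymbol\beta = \beta\hat{\mathbf{x}}$. The result extends to a boost $\boldsymbol\beta$ in an arbitrary direction (but with the coordinate axes still parallel). It is not hard to see that we can write down the field transformation laws for that case if we express them in terms of components parallel and perpendicular to the boost. This yields

$$\mathbf{E}'_\perp = \gamma(\mathbf{E}_\perp + \boldsymbol\beta\times\mathbf{B}), \qquad \mathbf{E}'_\parallel = \mathbf{E}_\parallel \tag{2.20a}$$
$$\mathbf{B}'_\perp = \gamma(\mathbf{B}_\perp - \boldsymbol\beta\times\mathbf{E}), \qquad \mathbf{B}'_\parallel = \mathbf{B}_\parallel. \tag{2.20b}$$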
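The matrix product $F' = \Lambda F\Lambda^T$ and the component form (2.20) can be compared numerically. This sketch uses plain Python with $c = 1$, hypothetical helper names and arbitrary test fields. It proceeds in four steps:

1. build $F^{\mu\nu}$ from (2.15);
2. boost it with (1.14);
3. read $\mathbf{E}'$ and $\mathbf{B}'$ back out using $E^i = F^{i0}$ and $B^1 = -F^{23}$ (plus cyclic permutations);
4. compare the result with (2.20) for $\boldsymbol\beta = \beta\hat{\mathbf{x}}$.

```python
import math


def matmul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def boost_matrix(beta):
    """Lambda^mu_nu of (1.14): boost along x with c = 1."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[g, -beta * g, 0.0, 0.0],
            [-beta * g, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def field_tensor(E, B):
    """F^{mu nu} of (2.15) for the metric diag(1, -1, -1, -1)."""
    E1, E2, E3 = E
    B1, B2, B3 = B
    return [[0.0, -E1, -E2, -E3],
            [E1, 0.0, -B3, B2],
            [E2, B3, 0.0, -B1],
            [E3, -B2, B1, 0.0]]


def fields_from_tensor(F):
    """Read off E^i = F^{i0} and B^1 = -F^{23}, B^2 = -F^{31}, B^3 = -F^{12}."""
    E = [F[1][0], F[2][0], F[3][0]]
    B = [-F[2][3], -F[3][1], -F[1][2]]
    return E, B


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def transform_fields(E, B, beta):
    """Equations (2.20) for the boost beta * x_hat; index 0 is the parallel component."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    b_vec = [beta, 0.0, 0.0]
    b_cross_B, b_cross_E = cross(b_vec, B), cross(b_vec, E)
    E_new = [E[0]] + [g * (E[i] + b_cross_B[i]) for i in (1, 2)]
    B_new = [B[0]] + [g * (B[i] - b_cross_E[i]) for i in (1, 2)]
    return E_new, B_new


E, B, beta = [1.0, 2.0, -0.5], [0.3, -1.0, 0.7], 0.6    # arbitrary test values
L = boost_matrix(beta)
F_prime = matmul(matmul(L, field_tensor(E, B)), transpose(L))   # F' = Lambda F Lambda^T

E_t, B_t = fields_from_tensor(F_prime)
E_f, B_f = transform_fields(E, B, beta)
print(E_t, E_f)   # agree up to rounding
print(B_t, B_f)   # agree up to rounding
```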