J. Broida
UCSD Fall 2009
Phys 130B
QM II
Supplementary Notes on
Special Relativity and Maxwell’s Equations
1 The Lorentz Transformation
This is a derivation of the Lorentz transformation of Special Relativity. The
basic idea is to derive a relationship between the spacetime coordinates x, y, z, t
as seen by observer O and the coordinates x′ , y ′ , z ′ , t′ seen by observer O′ moving
at a velocity V with respect to O along the positive x axis.
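[Figure: the coordinate frames of O (axes x, y) and O′ (axes x′, y′), with O′ moving at velocity V along the common x axis.]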
These observers are assumed to be inertial. In other words, they are moving at
a constant velocity with respect to each other and in the absence of any external
forces or accelerations (which is somewhat redundant). In particular, there is
no rotational motion or gravitational field present.
Our derivation is based on two assumptions:
1. The Principle of Relativity: Physics is the same for all observers in all
inertial coordinate systems.
2. The speed of light c in a vacuum is the same for all observers independently
of their relative motion or the motion of the light source.
We first show that this transformation must be of the form
$$
\begin{aligned}
t' &= at + bx \qquad (1.1a)\\
x' &= dt + ex \qquad (1.1b)\\
y' &= y\\
z' &= z
\end{aligned}
$$
where we assume that the origins coincide at t = t′ = 0. The above figure shows the coordinate systems displaced simply for ease of visualization.
The first thing to note is that the y and z coordinates are the same for both
observers. (This is only true in this case because the relative motion is along
the x axis only. If the motion were in an arbitrary direction, then each spatial
coordinate of O′ would depend on all of the spatial coordinates of O. However,
this is the case that is used in almost all situations, at least at an elementary
level.) To see that this is necessary, suppose each observer has a yardstick at his origin, aligned along the y-axis and the y′-axis respectively, and suppose there is a paintbrush at the far end of each yardstick pointed towards the other yardstick. If O′’s yardstick along the y′-axis gets shorter as seen by O, then when the origins pass each other O’s yardstick will get paint on it. But by the Principle of Relativity, O′ should also see O’s yardstick get shorter, and hence O′ would get paint on his yardstick. Since both observers must agree on which yardstick ends up painted, this clearly can’t happen, so there can be no change in length in a direction perpendicular to the direction of motion.
The next thing to notice is that the transformation equations are linear.
This is a result of space being homogeneous. To put this very loosely, “things
here are the same as things there.” For example, if there is a yardstick lying
along the x axis between x = 1 and x = 2, then the length of this yardstick
as seen by O′ should be the same as another yardstick lying between x = 2
and x = 3. But if there were a nonlinear dependence, say x′ goes like x², then the length ∆x′ of the first yardstick would go like 2² − 1² = 3 while that of the second would go like 3² − 2² = 5. Since this is also not the way the world works, equations (1.1) must be linear as shown. We now want to figure out what the coefficients a, b, d and e must be.
First, let O look at the origin of O′ (i.e., x′ = 0). Since O′ is moving at a
speed V along the x-axis, the x coordinate corresponding to x′ = 0 is x = V t.
Using this in (1.1b) yields x′ = 0 = dt + eV t or d = −eV . Similarly, O′ looks
at O (i.e., x = 0) and it has the coordinate x′ = −V t′ with respect to O′ (since
O moves in the negative x′ direction as seen by O′ ). Then from (1.1b) we have
−V t′ = dt and from (1.1a) we have t′ = at, and hence t′ = −dt/V = at so that
−aV = d = −eV and thus also a = e and d = −aV . Using these results in
equations (1.1) now gives us
$$
\begin{aligned}
t' &= a\left(t + \frac{b}{a}\,x\right) \qquad (1.2a)\\
x' &= a(x - Vt) \qquad (1.2b)\\
y' &= y\\
z' &= z.
\end{aligned}
$$
Now let a photon move along the x-axis (and hence also along the x′ -axis)
and pass both origins when they coincide at t = t′ = 0. Then the x coordinate
of the photon as seen by O is x = ct, and the x′ coordinate as seen by O′
is x′ = ct′ . Note that the value of c is the same for both observers. This is
assumption (2). Using these in equations (1.2) yields
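$$t' = a\left(t + \frac{b}{a}\,ct\right) = at\left(1 + \frac{bc}{a}\right)$$
$$x' = a(ct - Vt) = cat\left(1 - \frac{V}{c}\right)$$
so that
$$cat\left(1 - \frac{V}{c}\right) = x' = ct' = cat\left(1 + \frac{bc}{a}\right)$$
and therefore −V/c = bc/a or
$$\frac{b}{a} = -\frac{V}{c^2}.$$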
So now equations (1.2) become
$$
\begin{aligned}
t' &= a\left(t - \frac{V}{c^2}\,x\right) \qquad (1.3a)\\
x' &= a(x - Vt) \qquad (1.3b)\\
y' &= y\\
z' &= z.
\end{aligned}
$$
We still need to determine a. To do this, we will again use the Principle of Relativity. Let O look at a clock situated at O′. Then ∆x = V∆t and from (1.3a), O and O′ will measure time intervals related by
$$\Delta t' = a\left(\Delta t - \frac{V}{c^2}\,\Delta x\right) = a\left(1 - \frac{V^2}{c^2}\right)\Delta t.$$
Now let O′ look at a clock at O (so ∆x = 0). Then ∆x′ = −V∆t′ so (1.3b) yields
$$-V\Delta t' = \Delta x' = a(0 - V\Delta t) = -aV\Delta t$$
and hence
$$\Delta t = \frac{1}{a}\,\Delta t'.$$
By the Principle of Relativity, the relative factors in the time measurements must be the same in both cases. In other words, O sees the interval ∆t′ on O′’s clock related to his own ∆t by the factor a(1 − V²/c²), and O′ sees the interval ∆t on O’s clock related to his own ∆t′ by the factor 1/a. This means that
$$\frac{1}{a} = a\left(1 - \frac{V^2}{c^2}\right)$$
or, taking the positive root so that a → 1 as V → 0,
$$a = \frac{1}{\sqrt{1 - V^2/c^2}}$$
and therefore the final Lorentz transformation equations are
$$
\begin{aligned}
t' &= \frac{t - (V/c^2)\,x}{\sqrt{1 - V^2/c^2}}\\
x' &= \frac{x - Vt}{\sqrt{1 - V^2/c^2}}\\
y' &= y\\
z' &= z.
\end{aligned}
\qquad (1.4)
$$
It is very common to define the dimensionless variables
$$\beta = \frac{V}{c} \qquad\text{and}\qquad \gamma = \frac{1}{\sqrt{1 - V^2/c^2}} = \frac{1}{\sqrt{1 - \beta^2}}. \qquad (1.5)$$
In terms of these variables, equations (1.4) become
$$
\begin{aligned}
t' &= \gamma\left(t - \frac{\beta}{c}\,x\right)\\
x' &= \gamma(x - \beta ct)\\
y' &= y\\
z' &= z.
\end{aligned}
\qquad (1.6)
$$
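As a quick numerical check of (1.6), here is a minimal Python sketch; the function name `boost_x` and the numerical values are illustrative assumptions, not part of the notes. It boosts an event on a light ray x = ct and confirms that O′ also sees the ray move at c, and that letting V → −V undoes the boost.

```python
import math

def boost_x(t, x, V, c=1.0):
    """Apply equations (1.6): a boost with speed V along the common x axis.

    Returns (t', x'); y and z are unchanged and omitted here.
    """
    beta = V / c
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    t_prime = gamma * (t - beta * x / c)
    x_prime = gamma * (x - beta * c * t)
    return t_prime, x_prime

c = 3.0e8          # speed of light in m/s (rounded, for illustration)
V = 0.6 * c        # O' moves at 0.6c relative to O
t = 2.0
x = c * t          # event on a light ray leaving the origin at t = 0

tp, xp = boost_x(t, x, V, c)
print(xp / tp)     # the ray still moves at c as seen by O'

# Letting V -> -V gives the inverse transformation and recovers (t, x).
t_back, x_back = boost_x(tp, xp, -V, c)
print(t_back, x_back)
```

Passing c explicitly keeps the sketch usable both in SI-like units and, with the default c = 1, in the units adopted below.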
Since c is a universal constant, it is essentially a conversion factor between units of time and units of length. Because of this, we may further change to units where c = 1 (so time is measured in units of length) and in this case the Lorentz transformation equations become
$$
\begin{aligned}
t' &= \gamma(t - \beta x)\\
x' &= \gamma(x - \beta t)\\
y' &= y\\
z' &= z.
\end{aligned}
\qquad (1.7a)
$$
These equations give the coordinates as seen by O′ in terms of those of O. If we want the coordinates as seen by O in terms of those of O′, then we let β → −β and we have
$$
\begin{aligned}
t &= \gamma(t' + \beta x')\\
x &= \gamma(x' + \beta t')\\
y &= y'\\
z &= z'.
\end{aligned}
\qquad (1.7b)
$$
Note that 0 ≤ β < 1 so that 1 ≤ γ < ∞. We also see that
$$\gamma^2 = \frac{1}{1 - \beta^2}$$
so that
$$\gamma^2 - \gamma^2\beta^2 = 1.$$
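Then recalling the hyperbolic trigonometric identities
$$\cosh^2\theta - \sinh^2\theta = 1 \qquad\text{and}\qquad 1 - \tanh^2\theta = \operatorname{sech}^2\theta$$
we may define a parameter θ (sometimes called the rapidity) by
$$\beta = \tanh\theta$$
so that $\gamma = 1/\sqrt{1 - \tanh^2\theta} = 1/\operatorname{sech}\theta$, i.e.,
$$\gamma = \cosh\theta \qquad\text{and}\qquad \gamma\beta = \sinh\theta.$$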
In terms of θ, equations (1.7a) become
$$
\begin{aligned}
t' &= (\cosh\theta)\,t - (\sinh\theta)\,x\\
x' &= -(\sinh\theta)\,t + (\cosh\theta)\,x\\
y' &= y\\
z' &= z
\end{aligned}
\qquad (1.8)
$$
which looks very similar to a rotation in the xt-plane, except that now we have
hyperbolic functions instead of the usual trigonometric ones. However, note that both sinh terms have the same sign, whereas in an ordinary rotation the two sin terms have opposite signs.
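The claim that (1.8) is just (1.7a) rewritten with β = tanh θ can be checked numerically. In this Python sketch (the function names are my own, chosen for illustration), both forms are applied to the same event, and γ and γβ are compared with cosh θ and sinh θ.

```python
import math

def boost_beta(t, x, beta):
    """Equations (1.7a) with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (t - beta * x), gamma * (x - beta * t)

def boost_rapidity(t, x, theta):
    """Equations (1.8): the same boost written with the rapidity theta."""
    ch, sh = math.cosh(theta), math.sinh(theta)
    return ch * t - sh * x, -sh * t + ch * x

beta = 0.8
theta = math.atanh(beta)                 # beta = tanh(theta)
gamma = 1.0 / math.sqrt(1.0 - beta**2)

print(gamma, math.cosh(theta))           # gamma = cosh(theta)
print(gamma * beta, math.sinh(theta))    # gamma * beta = sinh(theta)
print(boost_beta(1.0, 0.5, beta))
print(boost_rapidity(1.0, 0.5, theta))   # same coordinates as the line above, up to rounding
```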
Next, let us consider motion as seen by both observers. In this case we write displacements in both space and time as
$$
\begin{aligned}
dt' &= \gamma(dt - \beta\,dx)\\
dx' &= \gamma(dx - \beta\,dt)\\
dy' &= dy\\
dz' &= dz.
\end{aligned}
$$
Then the velocity v′ₓ of a particle along the x′-axis as seen by O′ is
$$v'_x = \frac{dx'}{dt'} = \frac{dx - \beta\,dt}{dt - \beta\,dx} = \frac{dx/dt - \beta}{1 - \beta\,(dx/dt)} = \frac{v_x - \beta}{1 - \beta v_x}. \qquad (1.9a)$$
Alternatively, we may write
$$v_x = \frac{dx}{dt} = \frac{dx' + \beta\,dt'}{dt' + \beta\,dx'} = \frac{dx'/dt' + \beta}{1 + \beta\,(dx'/dt')} = \frac{v'_x + \beta}{1 + \beta v'_x}. \qquad (1.9b)$$
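These last two equations are called the relativistic velocity addition law. For a particle moving purely along the x-axis we trivially have v′y = vy = 0 and v′z = vz = 0. If the particle also has transverse velocity components, however, they do change, because dy′ = dy but dt′ ≠ dt:
$$v'_y = \frac{dy'}{dt'} = \frac{v_y}{\gamma(1 - \beta v_x)}$$
and similarly for v′z.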
Be sure to understand what these equations mean. They relate the velocity of an object as seen by two different observers whose relative velocity along their common x-axis is β. Note that even if v′ₓ = 1 (i.e., the object moves at the speed of light c in O′), equation (1.9b) gives vₓ = (1 + β)/(1 + β) = 1 as seen by O. This is quite different from the classical Galilean addition of velocities, which would give vₓ = 1 + β > 1; relativistically, combining velocities never produces a speed greater than that of light (c = 1).
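A minimal Python sketch of (1.9a) and (1.9b) in units with c = 1 makes the contrast with Galilean addition concrete; the function names are illustrative.

```python
def add_velocity(v_prime, beta):
    """(1.9b): x-velocity seen by O of an object moving at v'_x in O' (c = 1)."""
    return (v_prime + beta) / (1.0 + beta * v_prime)

def subtract_velocity(v, beta):
    """(1.9a): x-velocity seen by O' of an object moving at v_x in O (c = 1)."""
    return (v - beta) / (1.0 - beta * v)

print(add_velocity(0.5, 0.5))    # 0.8, not the Galilean 1.0
print(add_velocity(1.0, 0.9))    # light still moves at speed 1
print(subtract_velocity(add_velocity(0.3, 0.6), 0.6))   # recovers 0.3 up to rounding
```

Because (1.9a) is (1.9b) with β → −β, the two functions undo each other, which is the velocity-level version of the inverse transformation.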
One of the most important aspects of Lorentz transformations is that they leave the quantity t² − x² − y² − z² invariant. In other words, using equations (1.7a) you can easily show that
$$t'^2 - x'^2 - y'^2 - z'^2 = t^2 - x^2 - y^2 - z^2. \qquad (1.10)$$
(Indeed, γ²(t − βx)² − γ²(x − βt)² = γ²(1 − β²)(t² − x²) = t² − x², while y and z are unchanged.) Note that setting this equal to zero, we get the equation of an outgoing sphere of light as seen by either observer. (Don’t forget that if c ≠ 1, then t becomes ct.)
We refer to this as the invariance of the interval because it can be written as
$$(\Delta t')^2 - (\Delta x')^2 - (\Delta y')^2 - (\Delta z')^2 = (\Delta t)^2 - (\Delta x)^2 - (\Delta y)^2 - (\Delta z)^2.$$
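The invariance (1.10) is easy to spot-check numerically. This Python sketch (the helper names `boost` and `interval` are hypothetical) applies (1.7a) to a few random events and compares t² − x² − y² − z² before and after; a light-like event stays light-like.

```python
import math
import random

def boost(event, beta):
    """Apply equations (1.7a) (c = 1) to an event (t, x, y, z)."""
    t, x, y, z = event
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return (gamma * (t - beta * x), gamma * (x - beta * t), y, z)

def interval(event):
    """The invariant t^2 - x^2 - y^2 - z^2 of equation (1.10)."""
    t, x, y, z = event
    return t**2 - x**2 - y**2 - z**2

random.seed(0)
for _ in range(3):
    e = tuple(random.uniform(-10.0, 10.0) for _ in range(4))
    b = random.uniform(-0.99, 0.99)
    print(interval(e), interval(boost(e, b)))   # equal up to rounding

light = (5.0, 3.0, 4.0, 0.0)                    # t^2 = x^2 + y^2 + z^2
print(interval(light), interval(boost(light, 0.7)))
```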
If the primed frame is the rest frame of a particle, then we have dx′ = dy′ = dz′ = 0 and dt′ measures the time interval as seen by the particle, called the proper time. Because of this, we sometimes write
$$d\tau^2 := dt'^2 = dt^2 - dx^2 - dy^2 - dz^2.$$
This is frequently also called the proper distance (or proper length) and written ds². The difference between dτ and ds is that if c ≠ 1, we have
$$ds^2 = c^2dt^2 - d\mathbf{x}^2 = c^2dt^2\left(1 - \mathbf{v}^2/c^2\right) = c^2dt^2/\gamma^2 := c^2d\tau^2$$
so that dτ² = ds²/c². Since we are working with c = 1, we will write proper distance as
$$ds^2 = dt^2 - dx^2 - dy^2 - dz^2.$$
Now let’s go to some modern notation. In units with c = 1, we first define our four spacetime components as the vector
$$x^\mu = \begin{pmatrix} x^0\\ x^1\\ x^2\\ x^3 \end{pmatrix} = \begin{pmatrix} t\\ x\\ y\\ z \end{pmatrix}.$$
(If c ≠ 1 then $x^0 = ct$.) This vector is an element of a 4-dimensional vector space called Minkowski space. Then we have
$$ds^2 = (dx^0)^2 - (dx^1)^2 - (dx^2)^2 - (dx^3)^2$$
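or, defining the Lorentz (or Minkowski) metric
$$g_{\mu\nu} = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{pmatrix} \qquad (1.11)$$
we write (using the summation convention)
$$ds^2 = g_{\mu\nu}\,dx^\mu dx^\nu. \qquad (1.12)$$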
Let me note that most particle physicists use this metric, which we can also write as simply $g_{\mu\nu} = \operatorname{diag}(1, -1, -1, -1)$, but most relativists use the metric $g_{\mu\nu} = \operatorname{diag}(-1, 1, 1, 1)$, and you need to be careful when reading equations. Many physicists also use the symbol $\eta_{\mu\nu}$ rather than $g_{\mu\nu}$ when dealing with the Lorentz metric (as opposed to more general metrics used in general relativity).
Vectors $x^\mu$ in Minkowski space are classified as timelike if $x^\mu x_\mu > 0$, spacelike if $x^\mu x_\mu < 0$ and null (or lightlike) if $x^\mu x_\mu = 0$. Light rays are null, and hence we see that there are nonzero vectors with zero norm. Because of this, the Minkowski metric is not positive definite, and we say that Minkowski space is semi-Riemannian.
From linear algebra we know that the metric defines an inner product, and we can use this to raise or lower indices, for example, $x_\mu = g_{\mu\nu}x^\nu$. In the case of the Lorentz metric, we have the inverse metric with components $g^{\mu\nu} = g_{\mu\nu}$. Furthermore, there is no difference between $x^0$ and $x_0$, but $x_i = -x^i$ for $i = 1, 2, 3$. (Again, be careful because this is the opposite of what you get if you use the other metric.)
Using this notation, equation (1.10) is written
$$g_{\mu\nu}\,x'^\mu x'^\nu = g_{\mu\nu}\,x^\mu x^\nu$$
where the metric is the same in both frames. Lowering indices, we write this in its most compact form as
$$x'^\mu x'_\mu = x^\mu x_\mu$$
and we say that the length $x^2 = x^\mu x_\mu$ is an invariant. It is also important to understand that the scalar product of two vectors $a^\mu$ and $b^\mu$ is written in the equivalent forms
$$a\cdot b = g_{\mu\nu}a^\mu b^\nu = a_\nu b^\nu = a_0b^0 + a_ib^i = a^0b^0 - \sum_{i=1}^{3} a^ib^i = a^0b^0 - \mathbf{a}\cdot\mathbf{b}.$$
Note that the summation convention means that repeated Greek indices are to be summed from 0 to 3, and repeated Latin indices are to be summed from 1 to 3.
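To make the index gymnastics concrete, here is a small Python sketch using plain lists; the names `G`, `lower`, `dot` and `classify` are illustrative assumptions. It lowers an index with the metric (1.11), forms the scalar product a·b = a⁰b⁰ − **a**·**b**, and classifies vectors as timelike, spacelike or null.

```python
# Minkowski metric g_{mu nu} = diag(1, -1, -1, -1), equation (1.11).
G = [[1, 0, 0, 0],
     [0, -1, 0, 0],
     [0, 0, -1, 0],
     [0, 0, 0, -1]]

def lower(a):
    """Lower an index: a_mu = g_{mu nu} a^nu."""
    return [sum(G[mu][nu] * a[nu] for nu in range(4)) for mu in range(4)]

def dot(a, b):
    """Scalar product a . b = a_nu b^nu = a^0 b^0 - (spatial dot product)."""
    return sum(a_low * b_up for a_low, b_up in zip(lower(a), b))

def classify(x):
    """Timelike, spacelike or null according to the sign of x^mu x_mu."""
    s = dot(x, x)
    if s > 0:
        return "timelike"
    if s < 0:
        return "spacelike"
    return "null"

print(lower([2, 1, 0, 0]))                 # [2, -1, 0, 0]
print(dot([2, 1, 0, 0], [3, 1, 1, 0]))     # 2*3 - 1*1 = 5
print(classify([1, 0, 0, 0]), classify([0, 1, 0, 0]), classify([1, 1, 0, 0]))
```

The exact comparison with zero in `classify` is only safe here because the sample components are integers; with floating-point data a tolerance would be needed to recognize null vectors.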
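We now write our Lorentz transformation equations as
$$x'^\mu = \Lambda^\mu{}_\nu\,x^\nu \qquad (1.13)$$
where we have defined the Lorentz transformation matrix
$$\Lambda^\mu{}_\nu = \begin{pmatrix} \gamma & -\beta\gamma & 0 & 0\\ -\beta\gamma & \gamma & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}. \qquad (1.14)$$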
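(You should be aware that some authors put the prime on the indices and write this in the form $x^{\mu'} = \Lambda^{\mu'}{}_\nu x^\nu$.) Using this, we write the invariant $x^2$ as $x'_\mu x'^\mu = \Lambda_{\mu\alpha}\Lambda^\mu{}_\beta\,x^\alpha x^\beta$. But this must equal $x_\alpha x^\alpha = g_{\alpha\beta}x^\alpha x^\beta$ for every $x$, and hence we have
$$\Lambda_{\mu\alpha}\Lambda^\mu{}_\beta = (\Lambda^T)_{\alpha\mu}\Lambda^\mu{}_\beta = (\Lambda^T)_\alpha{}^\nu g_{\nu\mu}\Lambda^\mu{}_\beta = g_{\alpha\beta} \qquad (1.15)$$
which can be written in matrix form as
$$\Lambda^T g\,\Lambda = g.$$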
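In fact, this can be taken as the definition of a Lorentz transformation Λ. Raising the index α in (1.15) with $g^{\gamma\alpha}$ and using $g^{\gamma\alpha}g_{\alpha\beta} = \delta^\gamma{}_\beta$, we can write it as $\Lambda_\mu{}^\gamma\Lambda^\mu{}_\beta = \delta^\gamma{}_\beta$. This shows that the inverse of Λ is its transpose with the indices moved by the metric, $(\Lambda^{-1})^\gamma{}_\mu = \Lambda_\mu{}^\gamma$, which is the Minkowski-space analog of an orthogonal transformation. (As ordinary matrices, $\Lambda^{-1} = g\Lambda^Tg$ rather than $\Lambda^T$; for the boost (1.14), $\Lambda^{-1}$ is simply Λ with β → −β.) This is actually just what equations (1.8) say: a Lorentz transformation is a (hyperbolic) rotation in Minkowski space. Since $(\Lambda^{-1})^\mu{}_\nu = \Lambda_\nu{}^\mu$, we see from equation (1.13) that
$$(\Lambda^{-1})^\alpha{}_\mu\,x'^\mu = (\Lambda^{-1})^\alpha{}_\mu\Lambda^\mu{}_\nu\,x^\nu = x^\alpha$$
or
$$x^\alpha = \Lambda_\mu{}^\alpha\,x'^\mu. \qquad (1.16)$$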
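The defining property ΛᵀgΛ = g of (1.15) can be verified numerically for the boost matrix (1.14). This is a pure-Python sketch with helper names of my own choosing; it also shows that ΛᵀΛ is not the identity, so as a plain matrix it is the metric-weighted transpose gΛᵀg that inverts Λ, and that this equals the boost with β → −β.

```python
import math

def boost_matrix(beta):
    """Lambda^mu_nu of equation (1.14): boost along x with c = 1."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return [[g, -beta * g, 0.0, 0.0],
            [-beta * g, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

ETA = [[1.0, 0.0, 0.0, 0.0],
       [0.0, -1.0, 0.0, 0.0],
       [0.0, 0.0, -1.0, 0.0],
       [0.0, 0.0, 0.0, -1.0]]
IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(A):
    return [list(row) for row in zip(*A)]

def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(4) for j in range(4))

L = boost_matrix(0.6)
print(close(matmul(transpose(L), matmul(ETA, L)), ETA))   # True: Lambda^T g Lambda = g
print(close(matmul(transpose(L), L), IDENTITY))           # False: not Euclidean-orthogonal
L_inv = matmul(ETA, matmul(transpose(L), ETA))            # g Lambda^T g
print(close(matmul(L_inv, L), IDENTITY))                  # True
print(close(L_inv, boost_matrix(-0.6)))                   # True: inverse = boost with -beta
```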
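Equations (1.13) and (1.16) then give us the very useful results
$$\Lambda^\mu{}_\nu = \frac{\partial x'^\mu}{\partial x^\nu} \qquad\text{and}\qquad \Lambda_\mu{}^\nu = (\Lambda^{-1})^\nu{}_\mu = \frac{\partial x^\nu}{\partial x'^\mu}. \qquad (1.17)$$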
In order to define velocity in an invariant manner, we define the 4-velocity in terms of the proper time by
$$u^\mu := \frac{dx^\mu}{d\tau}. \qquad (1.18)$$
Note we can write $d\tau^2 = dt^2 - d\mathbf{x}^2 = dt^2(1 - \mathbf{v}^2)$. Here **v** is the velocity of a particle as seen by O. If we let O′ be the particle rest frame, then v is just β and we have $d\tau^2 = dt^2(1 - \mathbf{v}^2) = dt^2/\gamma^2$ so that
$$\frac{dt}{d\tau} = \gamma.$$
Then
$$\frac{dx^\mu}{d\tau} = \frac{dt}{d\tau}\,\frac{dx^\mu}{dt} = \gamma\,\frac{dx^\mu}{dt}$$
and we can write the 4-velocity as the vector
$$u^\mu = \begin{pmatrix} \gamma\\ \gamma\mathbf{v} \end{pmatrix} \qquad (1.19)$$
which has the magnitude
$$u^\mu u_\mu = \gamma^2 - \gamma^2\mathbf{v}^2 = \gamma^2(1 - \mathbf{v}^2) = 1.$$
(Again, with c ≠ 1 we have $x^0 = ct$ so that $u^\mu = \gamma[c, \mathbf{v}]$ and $u^\mu u_\mu = c^2$.)
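A short Python sketch (function names are illustrative; c = 1) builds u^μ = γ(1, **v**) from (1.19), confirms u^μu_μ = 1 for an arbitrary 3-velocity, and checks that boosting u^μ with (1.14) gives the 4-velocity of the transformed velocity from (1.9a).

```python
import math

def four_velocity(v):
    """u^mu = gamma * (1, v) for a 3-velocity v, equation (1.19) with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - sum(vi * vi for vi in v))
    return [gamma] + [gamma * vi for vi in v]

def minkowski_dot(a, b):
    """a^mu b_mu with the metric diag(1, -1, -1, -1)."""
    return a[0] * b[0] - sum(x * y for x, y in zip(a[1:], b[1:]))

def boost_x(a, beta):
    """Apply Lambda^mu_nu of (1.14) to a 4-vector a."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return [gamma * (a[0] - beta * a[1]), gamma * (a[1] - beta * a[0]), a[2], a[3]]

u = four_velocity([0.3, -0.4, 0.5])
print(minkowski_dot(u, u))                     # 1 up to rounding

# Boosting u^mu agrees with (1.9a) for motion along x.
beta, vx = 0.3, 0.5
u_boosted = boost_x(four_velocity([vx, 0.0, 0.0]), beta)
u_expected = four_velocity([(vx - beta) / (1.0 - beta * vx), 0.0, 0.0])
print(u_boosted, u_expected)
```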
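Since $\Lambda^\mu{}_\nu$ is a constant matrix (and dτ, being invariant, is the same in both frames), we have
$$u'^\mu = \frac{dx'^\mu}{d\tau} = \Lambda^\mu{}_\nu\,\frac{dx^\nu}{d\tau} = \Lambda^\mu{}_\nu\,u^\nu$$
so that $u^\mu$ transforms in exactly the same manner as $x^\mu$. We call any vector that transforms in this way a 4-vector, which justifies the term ‘4-velocity’ used above. Similarly, we define the 4-momentum by
$$p^\mu := mu^\mu = m\gamma\begin{pmatrix} 1\\ \mathbf{v} \end{pmatrix} \qquad (1.20)$$
so that
$$p^2 = p^\mu p_\mu = m^2.$$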
(If c ≠ 1 then p² = m²c². Let me also emphasize that the mass m in all of our equations is the constant rest mass. We never talk about a “relativistic mass” γm that many older books use where they write our mass as m₀ and then define m = γm₀.)
Expanding the square root we have
$$p^0 = m\gamma = \frac{m}{\sqrt{1 - v^2}} = m\left(1 + \frac{1}{2}v^2 + \cdots\right)$$
which is the sum of a rest energy term m (= mc) and a kinetic energy mv²/2 (= (1/2)mv²/c) plus higher order terms in v (= v/c). Because of this, we see that p⁰ is the total energy p⁰ = mγ = E (= E/c) of the particle, so using **p** = mγ**v** as the (relativistic) 3-momentum, we have
$$p^\mu = \begin{pmatrix} E\\ \mathbf{p} \end{pmatrix}. \qquad (1.21)$$
Therefore $m^2 = p^2 = E^2 - \mathbf{p}^2$ or
$$E^2 = \mathbf{p}^2 + m^2. \qquad (1.22)$$
(If c ≠ 1, then $p^\mu = m\gamma[c, \mathbf{v}] = [E/c, \mathbf{p}]$ and this becomes $E^2 = \mathbf{p}^2c^2 + m^2c^4$.)
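Equations (1.20) to (1.22) can be checked with a few lines of Python (units with c = 1; the helper name is an assumption of this sketch): the 4-momentum of a particle satisfies E² − **p**² = m², and at low speed E is close to m + mv²/2.

```python
import math

def four_momentum(m, v):
    """p^mu = m u^mu = m * gamma * (1, v), equations (1.20)-(1.21) with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - sum(vi * vi for vi in v))
    return [m * gamma] + [m * gamma * vi for vi in v]

m = 2.0
E, *p = four_momentum(m, [0.6, 0.0, 0.0])
print(E, p)                                 # gamma = 1.25, so about 2.5 and [1.5, 0.0, 0.0]
print(E**2 - sum(pi**2 for pi in p), m**2)  # equation (1.22): both about 4

# Low-speed limit: E - (m + m v^2 / 2) is of order m v^4, i.e. tiny.
v = 1e-3
E_slow = four_momentum(m, [v, 0.0, 0.0])[0]
print(E_slow - (m + 0.5 * m * v**2))
```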
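Now, the gradient operator is defined as ∇ = ∂/∂x so that
$$\nabla_i = \partial/\partial x^i := \partial_i.$$
Let us define $\partial_\mu = \partial/\partial x^\mu$. Then
$$\partial_\mu = \begin{pmatrix} \partial_0\\ \nabla \end{pmatrix} \qquad\text{and}\qquad \partial^\mu = \begin{pmatrix} \partial_0\\ -\nabla \end{pmatrix}. \qquad (1.23)$$
Using equation (1.17) we have
$$\partial'_\mu = \frac{\partial}{\partial x'^\mu} = \frac{\partial x^\nu}{\partial x'^\mu}\,\frac{\partial}{\partial x^\nu} = \Lambda_\mu{}^\nu\,\partial_\nu$$
or, equivalently,
$$\partial'^\mu = \Lambda^\mu{}_\nu\,\partial^\nu$$
which shows that indeed $\partial_\mu$ transforms as a 4-vector (which is implied by the notation). The operator
$$\partial_\mu\partial^\mu = (\partial_0)^2 + \partial_i\partial^i = (\partial_t)^2 - \nabla^2 = \frac{\partial^2}{\partial t^2} - \nabla^2$$
is called the d’Alembertian, and is frequently written as □.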
In quantum mechanics we have the momentum operator defined by **p** = −iħ∇ and the energy operator defined by E = iħ(∂/∂t). Then $p^i = -i\hbar\nabla_i = -i\hbar\partial_i = +i\hbar\partial^i$ and we can define the relativistic momentum operator
$$p^\mu = i\hbar\,\partial^\mu.$$
Using units with ħ = 1, the expression E² − **p**² − m² = 0 becomes −∂²ₜ + ∇² − m² = 0 or
$$(\partial_t^2 - \nabla^2 + m^2)\,\phi(x) = (\Box + m^2)\,\phi(x) = 0$$
which is known as the Klein-Gordon equation.
Even though the two reference frames relative to which we are describing motion must be inertial, there is no reason we can’t describe the motion of an accelerated object. As you might guess, we define the 4-acceleration of an object by
$$a^\mu = \frac{du^\mu}{d\tau}.$$
Since $u^\mu u_\mu = 1$, it follows that the 4-acceleration is always orthogonal to the 4-velocity because
$$0 = \frac{d}{d\tau}\left(u^\mu u_\mu\right) = 2u_\mu\,\frac{du^\mu}{d\tau} = 2u_\mu a^\mu.$$
We also define the 4-force
$$f^\mu = \frac{dp^\mu}{d\tau} = ma^\mu$$
so that
$$f^\mu = \frac{dp^\mu}{d\tau} = \gamma\,\frac{dp^\mu}{dt} = \gamma\begin{pmatrix} d(\gamma m)/dt\\ d\mathbf{p}/dt \end{pmatrix} = \begin{pmatrix} f^0\\ \gamma\mathbf{f}_c \end{pmatrix} \qquad (1.24)$$
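where $\mathbf{f}_c = d\mathbf{p}/dt$ is the classical force on the particle. Since the 4-force is proportional to the 4-acceleration, it obeys $f^\mu u_\mu = 0$, and we have
$$0 = f^\mu u_\mu = \gamma f^0 - \gamma^2\,\mathbf{f}_c\cdot\mathbf{v}$$
and therefore
$$f^0 = \gamma\,\mathbf{f}_c\cdot\mathbf{v} \qquad (1.25)$$
which, to within the factor of γ, is just the classical power (i.e., the rate at which work is done). (And if c ≠ 1 we have $0 = f^\mu u_\mu = f^0\gamma c - \gamma^2\,\mathbf{f}_c\cdot\mathbf{v}$ so that $f^0 = (\gamma/c)\,\mathbf{f}_c\cdot\mathbf{v}$.)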
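As a consistency check of (1.24) and (1.25), the following Python sketch (c = 1; the force and velocity values are arbitrary illustrative choices) builds f^μ from a classical force and verifies that it is orthogonal to u^μ.

```python
import math

def gamma_of(v):
    return 1.0 / math.sqrt(1.0 - sum(vi * vi for vi in v))

def four_velocity(v):
    g = gamma_of(v)
    return [g] + [g * vi for vi in v]

def four_force(fc, v):
    """f^mu = (gamma fc.v, gamma fc) from (1.24) and (1.25), c = 1."""
    g = gamma_of(v)
    power = sum(f * vi for f, vi in zip(fc, v))   # classical power fc . v
    return [g * power] + [g * f for f in fc]

def minkowski_dot(a, b):
    """a^mu b_mu with the metric diag(1, -1, -1, -1)."""
    return a[0] * b[0] - sum(x * y for x, y in zip(a[1:], b[1:]))

v = [0.2, 0.5, -0.1]
fc = [1.0, -3.0, 2.0]
print(minkowski_dot(four_force(fc, v), four_velocity(v)))   # 0 up to rounding
```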
2 Maxwell’s Equations
Experimentally, it is found that the charge-to-mass ratio e/mγ of a particle moving at velocity β obeys the law
$$\frac{e}{m\gamma} = \frac{e}{m}\,(1 - \beta^2)^{1/2}.$$
(The two sides of this equation refer to different measurements, so it’s not as trivial a statement as it looks at first.) Therefore, e is a constant, and we have that charge is an invariant quantity. What we would now like to know is how charge density and electric current behave. Since charge density is charge/volume, we must find out how volumes transform.
Let frame 2 (the primed frame) be in motion with respect to frame 1 (the unprimed frame) along their mutual x-axis, and consider a small cube of side l₀ at rest in frame 2. In its rest frame, this cube has volume dτ₀ (not to be confused with proper time). From (1.7a) we have
$$d\tau_1 = dx\,dy\,dz = \frac{1}{\gamma}\,dx'\,dy'\,dz' = \frac{1}{\gamma}\,d\tau_0 \qquad (2.1)$$
where we used dx′ = γ(dx − βdt) together with dt = 0 for a measurement made in frame 1. Thus we have
$$d\tau_1 = d\tau_0\,(1 - \beta^2)^{1/2}.$$
Now suppose the volume is also moving with respect to frame 2, and let this motion be along the $x_2$ axis. Letting $v_{2x}$ be the velocity of the box with respect to frame 2, and similarly for $v_{1x}$, we have
$$d\tau_2 = d\tau_0\,(1 - v_{2x}^2)^{1/2} \qquad\text{and}\qquad d\tau_1 = d\tau_0\,(1 - v_{1x}^2)^{1/2}.$$
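But from (1.9b) we have
$$v_{1x} = \frac{v_{2x} + \beta}{1 + \beta v_{2x}}$$
and therefore
$$
\begin{aligned}
d\tau_1 &= d\tau_0\left[1 - \left(\frac{v_{2x} + \beta}{1 + \beta v_{2x}}\right)^2\right]^{1/2}\\
&= d\tau_0\left[\frac{1 + 2\beta v_{2x} + \beta^2v_{2x}^2 - v_{2x}^2 - 2\beta v_{2x} - \beta^2}{(1 + \beta v_{2x})^2}\right]^{1/2}\\
&= d\tau_0\left[\frac{(1 - \beta^2)(1 - v_{2x}^2)}{(1 + \beta v_{2x})^2}\right]^{1/2} = d\tau_0\,\frac{\left[(1 - \beta^2)(1 - v_{2x}^2)\right]^{1/2}}{1 + \beta v_{2x}}\\
&= \frac{(1 - \beta^2)^{1/2}}{1 + \beta v_{2x}}\,d\tau_2
\end{aligned}
$$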
where in going to the last line we used $d\tau_2 = d\tau_0(1 - v_{2x}^2)^{1/2}$. Rearranging, this is
$$d\tau_1 = \frac{d\tau_2}{\gamma(1 + \beta v_{2x})}. \qquad (2.2a)$$
Reversing the frame point of view, we clearly also have
$$d\tau_2 = \frac{d\tau_1}{\gamma(1 - \beta v_{1x})}. \qquad (2.2b)$$
Be sure to understand what these equations say. The velocities v1x and v2x are
the observed velocities of a box with rest volume dτ0 moving along the common
x-axis as seen in frames 1 and 2, which are moving with velocity β with respect
to each other.
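Equations (2.2a) and (2.2b) can be spot-checked numerically. In this Python sketch (c = 1; the speeds are arbitrary illustrative values), each volume is computed directly from its own Lorentz contraction and then compared with the right-hand sides of (2.2).

```python
import math

def contracted(rest_volume, speed):
    """Volume of a box moving at `speed` along x: d tau_0 * sqrt(1 - v^2), c = 1."""
    return rest_volume * math.sqrt(1.0 - speed**2)

beta = 0.5                                  # velocity of frame 2 relative to frame 1
v2x = 0.3                                   # velocity of the box in frame 2
v1x = (v2x + beta) / (1.0 + beta * v2x)     # velocity of the box in frame 1, (1.9b)
gamma = 1.0 / math.sqrt(1.0 - beta**2)

dtau0 = 1.0                                 # rest volume of the box
dtau1 = contracted(dtau0, v1x)
dtau2 = contracted(dtau0, v2x)
print(dtau1, dtau2 / (gamma * (1.0 + beta * v2x)))   # (2.2a)
print(dtau2, dtau1 / (gamma * (1.0 - beta * v1x)))   # (2.2b)
```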
Now suppose that we have dn charges of Q coulombs each. As we stated above, Q is an invariant. Obviously, dn is also an invariant since it is just the number of charges. Then the charge densities as observed in frames 1 and 2 are
$$\rho_1 = \frac{Q\,dn}{d\tau_1} \qquad\text{and}\qquad \rho_2 = \frac{Q\,dn}{d\tau_2}$$
so that
$$\rho_1\,d\tau_1 = \rho_2\,d\tau_2$$
where dτ₁ and dτ₂ are the same volume containing the charge Q dn as observed in the two different reference frames. Then, using equations (2.2), we have
$$
\begin{aligned}
\rho_1 &= \rho_2\,\frac{d\tau_2}{d\tau_1} = \rho_2\,\gamma(1 + \beta v_{2x})\\
\rho_2 &= \rho_1\,\frac{d\tau_1}{d\tau_2} = \rho_1\,\gamma(1 - \beta v_{1x}).
\end{aligned}
\qquad (2.3)
$$
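The current density **J** := ρ**v** is defined as the charge per unit area per unit time, so we can write $J_{1x} = \rho_1 v_{1x}$ and $J_{2x} = \rho_2 v_{2x}$. Using these definitions in equations (2.3) yields the transformation of charge density
$$
\begin{aligned}
\rho_1 &= \gamma(\rho_2 + \beta J_{2x})\\
\rho_2 &= \gamma(\rho_1 - \beta J_{1x}).
\end{aligned}
\qquad (2.4)
$$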
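Now using equations (1.9) and (2.3) we also have
$$J_{1x} = \rho_1 v_{1x} = \rho_2\,\gamma(1 + \beta v_{2x})\,\frac{v_{2x} + \beta}{1 + \beta v_{2x}}$$
or
$$
\begin{aligned}
J_{1x} &= \gamma(J_{2x} + \beta\rho_2)\\
J_{2x} &= \gamma(J_{1x} - \beta\rho_1).
\end{aligned}
\qquad (2.5a)
$$
Similarly,
$$J_{1y} = \rho_1 v_{1y} = \rho_2\,\gamma(1 + \beta v_{2x})\,v_{1y}.$$
But
$$v_{1y} = \frac{dy_1}{dt_1} = \frac{dy_2}{\gamma(dt_2 + \beta\,dx_2)} = \frac{v_{2y}}{\gamma(1 + \beta v_{2x})}.$$
Hence
$$J_{1y} = \rho_2 v_{2y} = J_{2y}$$
and similarly
$$J_{1z} = J_{2z}. \qquad (2.5b)$$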
Comparing equations (2.4) and (2.5) with equations (1.7), we see that we have a 4-current density
$$J^\mu = \begin{pmatrix} \rho\\ \mathbf{J} \end{pmatrix}. \qquad (2.6)$$
(And again, if c ≠ 1 this becomes $J^\mu = [\rho c, \mathbf{J}]$.)
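The claim that (ρ, **J**) transforms like (t, **x**) can be checked numerically. The Python sketch below (c = 1; all numbers are illustrative) starts from a charge density and velocity in frame 2, computes the frame-1 quantities directly from (2.3), (1.9b) and the transverse-velocity relation used for (2.5b), and compares them with the 4-vector rules (2.4) and (2.5).

```python
import math

beta = 0.6                                   # frame 2 moves at +beta along x relative to frame 1
gamma = 1.0 / math.sqrt(1.0 - beta**2)

# Charges with density rho2 moving at (v2x, v2y, 0) in frame 2.
rho2, v2x, v2y = 3.0, 0.4, 0.2
J2x, J2y = rho2 * v2x, rho2 * v2y

# Frame-1 quantities computed directly.
rho1 = rho2 * gamma * (1.0 + beta * v2x)     # (2.3)
v1x = (v2x + beta) / (1.0 + beta * v2x)      # (1.9b)
v1y = v2y / (gamma * (1.0 + beta * v2x))     # transverse velocity
J1x, J1y = rho1 * v1x, rho1 * v1y

# The same quantities from the 4-vector transformation rules.
print(rho1, gamma * (rho2 + beta * J2x))     # (2.4)
print(J1x, gamma * (J2x + beta * rho2))      # (2.5a)
print(J1y, J2y)                              # (2.5b)
```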
Now that we have shown that $J^\mu$ does indeed define a 4-vector, let me show another way to arrive at this conclusion that is analogous to the definition of 4-momentum. From (2.1) we can write dτ = dτ₀/γ where dτ₀ is at rest in frame 2, and γ = (1 − β²)^{−1/2} where β is the velocity of frame 2 with respect to frame 1. Then the invariance of charge implies ρ dτ = ρ₀ dτ₀ or ρ dτ₀/γ = ρ₀ dτ₀, and hence
$$\rho = \gamma\rho_0.$$
This is analogous to the expression $m_{\text{relativistic}} = \gamma m_{\text{rest}}$ or simply m = γm₀. Now we also have **J** = ρ**v** = ρ₀γ**v**. Recalling that the 4-velocity is given by
$$u^\mu = \begin{pmatrix} \gamma\\ \gamma\mathbf{v} \end{pmatrix}$$
we see that letting **v** be the velocity of the charge, i.e., v = β, then (2.6) is the same as
$$J^\mu = \rho_0 u^\mu$$
which is analogous to $p^\mu = mu^\mu$. In other words, we have shown
$$J^\mu = \begin{pmatrix} \rho\\ \mathbf{J} \end{pmatrix} = \rho_0\begin{pmatrix} \gamma\\ \gamma\mathbf{v} \end{pmatrix} = \rho_0 u^\mu. \qquad (2.7)$$
Since we saw earlier that the derivative operator $\partial_\mu$ transforms as a 4-vector (technically, a covector), and we just showed that $J^\mu$ is a 4-vector, it follows that $\partial_\mu J^\mu$ is a Lorentz scalar. But
$$\partial_\mu J^\mu = \partial_0\rho + \partial_iJ^i = \frac{\partial\rho}{\partial t} + \nabla\cdot\mathbf{J}$$
and therefore the continuity equation may be written in the covariant form
$$\partial_\mu J^\mu = 0. \qquad (2.8)$$
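Now recall Maxwell’s equations (in Gaussian units with c = 1):
$$
\begin{aligned}
\nabla\cdot\mathbf{E} &= 4\pi\rho \qquad (2.9a)\\
\nabla\cdot\mathbf{B} &= 0 \qquad (2.9b)\\
\nabla\times\mathbf{E} &= -\frac{\partial\mathbf{B}}{\partial t} \qquad (2.9c)\\
\nabla\times\mathbf{B} &= 4\pi\mathbf{J} + \frac{\partial\mathbf{E}}{\partial t} \qquad (2.9d)
\end{aligned}
$$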
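Using **B** = ∇ × **A** (which is allowed by (2.9b)), equation (2.9c) becomes ∇ × (**E** + ∂**A**/∂t) = 0, which implies
$$\mathbf{E} = -\nabla\phi - \frac{\partial\mathbf{A}}{\partial t}$$
so that equations (2.9a) and (2.9d) then imply (using the identity ∇ × ∇ × **A** = ∇(∇ · **A**) − ∇²**A**)
$$\nabla^2\phi + \frac{\partial}{\partial t}(\nabla\cdot\mathbf{A}) = -4\pi\rho \qquad (2.10a)$$
$$\nabla^2\mathbf{A} - \frac{\partial^2\mathbf{A}}{\partial t^2} - \nabla\left(\nabla\cdot\mathbf{A} + \frac{\partial\phi}{\partial t}\right) = -4\pi\mathbf{J}. \qquad (2.10b)$$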
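The gauge transformation **A** → **A**′ = **A** + ∇Λ (here Λ is a scalar function, not the Lorentz transformation matrix) leaves the physical field **B** = ∇ × **A** unchanged, so if **E** = −∇φ − ∂**A**/∂t is also to remain unchanged, we must have φ → φ′ = φ − ∂Λ/∂t. This gives us the freedom to choose (φ, **A**) so as to satisfy the Lorentz gauge (or Lorentz condition)
$$\nabla\cdot\mathbf{A} + \frac{\partial\phi}{\partial t} = 0.$$
In other words, we demand that the new potentials (φ′, **A**′) satisfy
$$0 = \nabla\cdot\mathbf{A}' + \frac{\partial\phi'}{\partial t} = \nabla\cdot\mathbf{A} + \nabla^2\Lambda + \frac{\partial\phi}{\partial t} - \frac{\partial^2\Lambda}{\partial t^2}.$$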
Thus, if we can find a Λ that satisfies
$$\nabla^2\Lambda - \frac{\partial^2\Lambda}{\partial t^2} = -\left(\nabla\cdot\mathbf{A} + \frac{\partial\phi}{\partial t}\right)$$
the gauge-transformed potentials will satisfy the Lorentz condition. Fortunately, this is a straightforward problem to solve. All we need to do is find the Green’s function for the wave equation, and then Λ will be the integral of the Green’s function times the quantity on the right. (Very briefly, if you have a linear operator L(x) acting on a function f(x) such that L(x)f(x) = g(x), and if you find a Green’s function G(x, x′) defined by L(x)G(x, x′) = δ(x − x′), then the solution to the problem is essentially f(x) = ∫G(x, x′)g(x′) dx′. You can easily verify that acting on this with L(x) will yield L(x)f(x) = g(x).)
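The Green’s-function recipe in the parenthetical remark can be illustrated on a much simpler operator than the wave operator. In this Python sketch the toy problem is an assumption chosen for illustration: L = −d²/dx² on [0, 1] with f(0) = f(1) = 0, whose Green’s function G(x, x′) = x_<(1 − x_>) satisfies −∂²G/∂x² = δ(x − x′), where x_< and x_> are the smaller and larger of x and x′. The integral f(x) = ∫G(x, x′)g(x′) dx′ is evaluated with the midpoint rule and compared with exact solutions.

```python
import math

def green(x, xp):
    """Green's function of L = -d^2/dx^2 on [0, 1] with f(0) = f(1) = 0."""
    lo, hi = min(x, xp), max(x, xp)
    return lo * (1.0 - hi)

def solve(g, x, n=2000):
    """f(x) = integral_0^1 G(x, x') g(x') dx', evaluated with the midpoint rule."""
    h = 1.0 / n
    return h * sum(green(x, (k + 0.5) * h) * g((k + 0.5) * h) for k in range(n))

# Source g = 1: the exact solution of -f'' = 1 is f(x) = x(1 - x)/2.
for x in (0.25, 0.5, 0.9):
    print(solve(lambda s: 1.0, x), x * (1.0 - x) / 2.0)

# Source g = sin(pi x): the exact solution is sin(pi x) / pi^2.
x = 0.3
print(solve(lambda s: math.sin(math.pi * s), x), math.sin(math.pi * x) / math.pi**2)
```

For the wave equation the Green’s function is different (it is retarded in time), but the structure of the solution, an integral of G against the source, is the same.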
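In any case, choosing the Lorentz gauge, equations (2.10) become the well-known wave equations
$$\nabla^2\phi - \frac{\partial^2\phi}{\partial t^2} = -4\pi\rho \qquad (2.11a)$$
$$\nabla^2\mathbf{A} - \frac{\partial^2\mathbf{A}}{\partial t^2} = -4\pi\mathbf{J}. \qquad (2.11b)$$
Since □ = ∂²/∂t² − ∇², an equivalent way of writing these is
$$\Box\phi = 4\pi J^0 \qquad\text{and}\qquad \Box\mathbf{A} = 4\pi\mathbf{J}.$$
So if we define the 4-potential $A^\mu$ by
$$A^\mu = \begin{pmatrix} \phi\\ \mathbf{A} \end{pmatrix}$$
then the wave equations may be written in the concise form
$$\Box A^\mu = 4\pi J^\mu \qquad (2.12)$$
where the Lorentz condition becomes
$$\partial_\mu A^\mu = 0.$$
That $A^\mu$ is indeed a 4-vector follows because □ is a Lorentz invariant operator and $J^\mu$ is a 4-vector. Thus $A^\mu$ must transform as a 4-vector so that (2.12) is covariant (i.e., so that both sides transform the same way). Also, the form $A^\mu = [\phi, \mathbf{A}]$ is unchanged even if c ≠ 1. This is because the right side of (2.11b) becomes (−4π/c)**J**, while the right side of (2.11a), −4πρ, can be written as (−4π/c)J⁰ since J⁰ = ρc, so both components carry the same factor.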
Now we need a bit of terminology. Recall that a 4-vector was defined as a quantity $v^\mu$ that under a Lorentz transformation transformed as
$$v^\mu \to v'^\mu = \Lambda^\mu{}_\nu\,v^\nu.$$
As we shall see below, it is also possible to have quantities with more than one index such that under a Lorentz transformation, each index transforms like a 4-vector. For example, a quantity $F^{\mu\nu}$ (not necessarily the electromagnetic field tensor) with two indices that transforms like
$$F^{\mu\nu} \to F'^{\mu\nu} = \Lambda^\mu{}_\alpha\Lambda^\nu{}_\beta\,F^{\alpha\beta}$$
is called a (second rank) tensor. Higher rank tensors are defined in the obvious manner. Note also that the indices need not all be superscripts (such indices are called contravariant). We can equally have subscripts (called covariant) that transform like
$$F'_{\mu\nu} = \Lambda_\mu{}^\alpha\Lambda_\nu{}^\beta\,F_{\alpha\beta}.$$
And we can have a mixed tensor like $F^\mu{}_\nu$. Indices are raised and lowered by using the metric $g_{\mu\nu}$ and its inverse $g^{\mu\nu}$.
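At last we are ready to write Maxwell’s equations in covariant form. Because $\partial_\mu$ and $A_\nu$ both transform as 4-vectors and $\Lambda^\mu{}_\nu$ is constant, the quantity $\partial_\mu A_\nu$ does transform as a second-rank tensor under a Lorentz transformation. It is not, however, gauge invariant: the gauge transformation above reads $A^\mu \to A^\mu - \partial^\mu\Lambda$, under which $\partial_\mu A_\nu$ changes by $-\partial_\mu\partial_\nu\Lambda$. The antisymmetric quantity $F_{\mu\nu}$ defined by
$$F_{\mu\nu} := \partial_\mu A_\nu - \partial_\nu A_\mu \qquad (2.13a)$$
is both a tensor and gauge invariant, since $\partial_\mu\partial_\nu\Lambda$ is symmetric in μ and ν. This is called the electromagnetic field tensor. Equivalently, we may consider the contravariant version
$$F^{\mu\nu} = \partial^\mu A^\nu - \partial^\nu A^\mu. \qquad (2.13b)$$
I claim that equations (2.9a) and (2.9d) can be written in the form
$$\partial_\mu F^{\mu\nu} = 4\pi J^\nu. \qquad (2.14)$$
To see this, first recall that we are using the metric g = diag(1, −1, −1, −1) so that $\partial/\partial t = \partial/\partial x^0 = \partial_0 = \partial^0$ and $\nabla_i := \partial/\partial x^i = \partial_i = -\partial^i$. Using **E** = −∇φ − ∂**A**/∂t we have
$$E^i = \partial^iA^0 - \partial^0A^i = F^{i0}$$
and also **B** = ∇ × **A** so that
$$B^1 = \nabla_2A^3 - \nabla_3A^2 = -\partial^2A^3 + \partial^3A^2 = -F^{23}$$
plus cyclic permutations 1 → 2 → 3 → 1. Then the electromagnetic field tensor is given by
$$F^{\mu\nu} = \begin{pmatrix} 0 & -E^1 & -E^2 & -E^3\\ E^1 & 0 & -B^3 & B^2\\ E^2 & B^3 & 0 & -B^1\\ E^3 & -B^2 & B^1 & 0 \end{pmatrix}. \qquad (2.15)$$
(Be sure to note that this is the form of $F^{\mu\nu}$ for the metric diag(1, −1, −1, −1). If you use the metric diag(−1, 1, 1, 1) then all entries of $F^{\mu\nu}$ change sign. In addition, you frequently see the matrix $F^\mu{}_\nu$ which also has different signs.)
Now, for the ν = 0 component of (2.14) we have
$$4\pi J^0 = \partial_\mu F^{\mu 0} = \partial_iF^{i0} = \partial_iE^i$$
which is Coulomb’s law ∇ · **E** = 4πρ, equation (2.9a).
Next consider the ν = 1 component of (2.14). This is
$$
\begin{aligned}
4\pi J^1 = \partial_\mu F^{\mu 1} &= \partial_0F^{01} + \partial_2F^{21} + \partial_3F^{31}\\
&= -\partial_0E^1 + \partial_2B^3 - \partial_3B^2\\
&= -\partial_tE^1 + (\nabla\times\mathbf{B})^1
\end{aligned}
$$
and therefore, together with the ν = 2, 3 components, we have
$$\nabla\times\mathbf{B} - \frac{\partial\mathbf{E}}{\partial t} = 4\pi\mathbf{J}$$
which is equation (2.9d).
Finally, I leave it as an exercise for you to show that equations (2.9b) and (2.9c) can be written as (note the superscripts are cyclic permutations)
$$\partial^\mu F^{\nu\sigma} + \partial^\nu F^{\sigma\mu} + \partial^\sigma F^{\mu\nu} = 0$$
or simply
$$\partial^{[\mu}F^{\nu\sigma]} = 0. \qquad (2.16)$$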
Remark: There is another interesting way to arrive at the electromagnetic field
tensor that we now describe, but you are free to skip over it. First we need to
give a more careful definition of a tensor. To begin, given a vector space V , we
can define the dual space V ∗ as the vector space of linear functionals on V .
In other words, α ∈ V ∗ means that α : V → R is a linear map from V to R.
(We restrict consideration to real vector spaces.) Members of the dual space
are frequently called covectors. If V has a basis $\{e_1, \ldots, e_n\}$, we define the n linear functionals $\{\omega^1, \ldots, \omega^n\}$ by
$$\omega^i(e_j) = \delta^i{}_j.$$
I will show that these n linear functionals form a basis for V*, i.e., that they are linearly independent and span V*.
To show this, let α ∈ V* be arbitrary but fixed. Note that for any v ∈ V, using the linearity of α we have
$$\alpha(v) = \alpha(v^ie_i) = v^i\alpha(e_i) = a_iv^i$$
where we have defined the scalars $a_i$ by $a_i = \alpha(e_i)$. On the other hand,
$$\omega^i(v) = \omega^i(v^je_j) = v^j\omega^i(e_j) = v^j\delta^i{}_j = v^i$$
and hence we see that $\alpha(v) = a_iv^i = a_i\omega^i(v)$ so that $\alpha = a_i\omega^i$. This shows that the $\omega^i$ span V*.
To show they are linearly independent, suppose we have $c_i\omega^i = 0$ for some set of scalars $c_i$. Then for any j = 1, . . . , n we have $0 = c_i\omega^i(e_j) = c_i\delta^i{}_j = c_j$ which proves linear independence. Thus we have shown that any α ∈ V* can be written in the form
$$\alpha = a_i\omega^i = \alpha(e_i)\,\omega^i.$$
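As an example, consider the space V = ℝ² consisting of all column vectors of the form
$$v = \begin{pmatrix} v^1\\ v^2 \end{pmatrix}.$$
Relative to the standard basis we have
$$v = v^1\begin{pmatrix} 1\\ 0 \end{pmatrix} + v^2\begin{pmatrix} 0\\ 1 \end{pmatrix} = v^1e_1 + v^2e_2.$$
If φ ∈ V*, then $\varphi(v) = \sum_i \varphi_iv^i$, and we may represent φ by the row vector φ = (φ₁, φ₂). In particular, if we write the dual basis as $\omega^i = (a_i, b_i)$, then we have
$$
\begin{aligned}
1 &= \omega^1(e_1) = (a_1, b_1)\begin{pmatrix} 1\\ 0 \end{pmatrix} = a_1\\
0 &= \omega^1(e_2) = (a_1, b_1)\begin{pmatrix} 0\\ 1 \end{pmatrix} = b_1\\
0 &= \omega^2(e_1) = (a_2, b_2)\begin{pmatrix} 1\\ 0 \end{pmatrix} = a_2\\
1 &= \omega^2(e_2) = (a_2, b_2)\begin{pmatrix} 0\\ 1 \end{pmatrix} = b_2
\end{aligned}
$$
so that ω¹ = (1, 0) and ω² = (0, 1). Note, for example,
$$\omega^1(v) = (1, 0)\begin{pmatrix} v^1\\ v^2 \end{pmatrix} = v^1$$
as it should.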
As another very important example, let V be an inner product space. If a, b ∈ V, then the inner product of a and b is the number ⟨a, b⟩. Then given a fixed vector a, the quantity ⟨a, ·⟩ is a linear functional on V because it takes a vector b ∈ V and gives back a number:
$$\langle a, \cdot\rangle : b \mapsto \langle a, b\rangle \in \mathbb{R}.$$
Since ⟨a, ·⟩ is in V*, let us denote it by α, so that α(b) = ⟨a, b⟩.
Given a basis $\{e_i\}$ for V, let us define the numbers $g_{ij}$ by
$$g_{ij} := \langle e_i, e_j\rangle = \langle e_j, e_i\rangle = g_{ji}.$$
This is the proper definition of the metric. Then
$$\langle a, b\rangle = \langle a^ie_i, b^je_j\rangle = a^ib^j\langle e_i, e_j\rangle = a^ib^jg_{ij} = b^jg_{ji}a^i.$$
But we also have
$$\alpha(b) = \alpha(b^je_j) = b^j\alpha(e_j) = b^ja_j$$
and therefore
$$b^ja_j = \alpha(b) = \langle a, b\rangle = b^jg_{ji}a^i.$$
Since b is arbitrary, we define
$$a_j = g_{ji}a^i.$$
This is called lowering an index.
Since the inner product is nondegenerate by definition, the metric $g_{ij}$ must be nonsingular, and hence we can define its inverse which we denote by $g^{ij}$. Multiplying this last equation by $g^{kj}$ we then have
$$g^{kj}a_j = g^{kj}g_{ji}a^i = \delta^k{}_ia^i = a^k$$
and thus we define raising an index by
$$a^k = g^{kj}a_j.$$
Now that we have an understanding of dual spaces, we are in a position to
define tensors carefully. So, a tensor T is just a multilinear map
$$T : V^{*s}\times V^r = \underbrace{V^*\times\cdots\times V^*}_{s}\times\underbrace{V\times\cdots\times V}_{r} \to \mathbb{R}.$$
By multilinear, we mean that it is linear in each variable separately. This tensor is said to have covariant order r, and contravariant order s, or simply to be a tensor of type (s, r). In other words, T takes as its argument s covectors and r vectors. Since it is multilinear, we see that
$$
\begin{aligned}
T(\alpha^{(1)}, \ldots, \alpha^{(s)}, v_{(1)}, \ldots, v_{(r)}) &= T\big(a^{(1)}_{i_1}\omega^{i_1}, \ldots, a^{(s)}_{i_s}\omega^{i_s}, v_{(1)}^{j_1}e_{j_1}, \ldots, v_{(r)}^{j_r}e_{j_r}\big)\\
&= a^{(1)}_{i_1}\cdots a^{(s)}_{i_s}\,v_{(1)}^{j_1}\cdots v_{(r)}^{j_r}\,T(\omega^{i_1}, \ldots, \omega^{i_s}, e_{j_1}, \ldots, e_{j_r})\\
&= a^{(1)}_{i_1}\cdots a^{(s)}_{i_s}\,v_{(1)}^{j_1}\cdots v_{(r)}^{j_r}\,T^{i_1\cdots i_s}{}_{j_1\cdots j_r}
\end{aligned}
$$
where the last line defines the components of T . Thus we see that if we know the
components of T , then we know the result of T acting on an arbitrary collection
of vectors and covectors.
What happens to the components of T under a change of coordinates? From
a practical standpoint, this is what really defines a tensor. A change of basis in
V is of the form
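$$e_i \to \bar e_i = e_j\,p^j{}_i$$
where $(p^j{}_i)$ is called the transition matrix. Then any x ∈ V can be written in terms of its components with respect to either $e_i$ or $\bar e_i$, and we have
$$x = x^je_j = \bar x^i\bar e_i = \bar x^ie_jp^j{}_i$$
and therefore we must have
$$x^j = p^j{}_i\,\bar x^i \qquad\text{or}\qquad \bar x^i = (p^{-1})^i{}_j\,x^j.$$
From these we easily see that
$$p^i{}_j = \frac{\partial x^i}{\partial\bar x^j} \qquad\text{and}\qquad (p^{-1})^i{}_j = \frac{\partial\bar x^i}{\partial x^j}.$$
When V undergoes a change of basis, what about V*? Let us write in general $\omega^i \to \bar\omega^i = b^i{}_j\,\omega^j$. Since we must also have $\bar\omega^i(\bar e_j) = \delta^i{}_j$, we see that
$$\delta^i{}_j = \bar\omega^i(\bar e_j) = \bar\omega^i(e_kp^k{}_j) = p^k{}_j\,\bar\omega^i(e_k) = p^k{}_j\,b^i{}_l\,\omega^l(e_k) = p^k{}_j\,b^i{}_l\,\delta^l{}_k = b^i{}_k\,p^k{}_j$$
so that $b^i{}_k = (p^{-1})^i{}_k$. In other words,
$$\omega^i \to \bar\omega^i = (p^{-1})^i{}_j\,\omega^j.$$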
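Finally we are in a position to derive the general transformation law of a tensor. We have
$$
\begin{aligned}
\bar T^{i_1\cdots i_s}{}_{j_1\cdots j_r} &= T(\bar\omega^{i_1}, \ldots, \bar\omega^{i_s}, \bar e_{j_1}, \ldots, \bar e_{j_r})\\
&= T\big((p^{-1})^{i_1}{}_{k_1}\omega^{k_1}, \ldots, (p^{-1})^{i_s}{}_{k_s}\omega^{k_s}, e_{l_1}p^{l_1}{}_{j_1}, \ldots, e_{l_r}p^{l_r}{}_{j_r}\big)\\
&= (p^{-1})^{i_1}{}_{k_1}\cdots(p^{-1})^{i_s}{}_{k_s}\,p^{l_1}{}_{j_1}\cdots p^{l_r}{}_{j_r}\,T(\omega^{k_1}, \ldots, \omega^{k_s}, e_{l_1}, \ldots, e_{l_r})\\
&= (p^{-1})^{i_1}{}_{k_1}\cdots(p^{-1})^{i_s}{}_{k_s}\,p^{l_1}{}_{j_1}\cdots p^{l_r}{}_{j_r}\,T^{k_1\cdots k_s}{}_{l_1\cdots l_r}
\end{aligned}
$$
or
$$\bar T^{i_1\cdots i_s}{}_{j_1\cdots j_r} = \frac{\partial\bar x^{i_1}}{\partial x^{k_1}}\cdots\frac{\partial\bar x^{i_s}}{\partial x^{k_s}}\,\frac{\partial x^{l_1}}{\partial\bar x^{j_1}}\cdots\frac{\partial x^{l_r}}{\partial\bar x^{j_r}}\,T^{k_1\cdots k_s}{}_{l_1\cdots l_r}.$$
This is the classical transformation law of a type (s, r) tensor.
In the particular case of a second rank tensor $F^{\mu\nu}$ under a Lorentz transformation, we have
$$\bar x^\mu = \Lambda^\mu{}_\nu\,x^\nu$$
so that
$$\frac{\partial\bar x^\mu}{\partial x^\nu} = \Lambda^\mu{}_\nu$$
and therefore
$$\bar F^{\mu\nu} = \frac{\partial\bar x^\mu}{\partial x^\alpha}\,\frac{\partial\bar x^\nu}{\partial x^\beta}\,F^{\alpha\beta} = \Lambda^\mu{}_\alpha\Lambda^\nu{}_\beta\,F^{\alpha\beta}.$$
Let us now return to the physics. We know that the Lorentz force law is
$$\mathbf{F} = q(\mathbf{E} + \mathbf{v}\times\mathbf{B}) = \frac{d\mathbf{p}}{dt}$$
so consider (now τ is the proper time again)
$$\frac{d\mathbf{p}}{d\tau} = \frac{dt}{d\tau}\,\frac{d\mathbf{p}}{dt} = \gamma\,\frac{d\mathbf{p}}{dt}.$$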
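In terms of the 4-velocity
$$u^\mu = \begin{pmatrix} \gamma\\ \gamma\mathbf{v} \end{pmatrix} = \begin{pmatrix} u^0\\ \mathbf{u} \end{pmatrix}$$
we can write
$$\frac{d\mathbf{p}}{d\tau} = \gamma q(\mathbf{E} + \mathbf{v}\times\mathbf{B}) = q(\gamma\mathbf{E} + \gamma\mathbf{v}\times\mathbf{B}) = q(u^0\mathbf{E} + \mathbf{u}\times\mathbf{B}). \qquad (2.17)$$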
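Also note that if W = p⁰ is the energy of the particle, then its rate of change is the rate at which work is done, so that
$$\frac{dW}{dt} = \mathbf{F}\cdot\frac{d\mathbf{r}}{dt} = \mathbf{F}\cdot\mathbf{v} = q(\mathbf{E} + \mathbf{v}\times\mathbf{B})\cdot\mathbf{v} = q\mathbf{E}\cdot\mathbf{v}$$
and therefore
$$\frac{dp^0}{d\tau} = \frac{dW}{d\tau} = \gamma\,\frac{dW}{dt} = q\mathbf{E}\cdot\gamma\mathbf{v} = q\mathbf{E}\cdot\mathbf{u}. \qquad (2.18)$$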
Combining equations (2.17) and (2.18), we see that we can define a linear map $u^\mu \to dp^\mu/d\tau$ of a 4-vector to another 4-vector, and hence there exists a second rank mixed tensor $F^\mu{}_\nu$ such that
$$\frac{dp^\mu}{d\tau} = qF^\mu{}_\nu\,u^\nu. \qquad (2.19)$$
Comparing (2.19) with (2.17) and (2.18) allows us to pick out the components of $F^\mu{}_\nu$:
$$F^\mu{}_\nu = \begin{pmatrix} 0 & E_x & E_y & E_z\\ E_x & 0 & B_z & -B_y\\ E_y & -B_z & 0 & B_x\\ E_z & B_y & -B_x & 0 \end{pmatrix}.$$
Finally, we note that we can also write $F^\mu{}_\nu$ in the alternate forms $F_{\mu\nu} = g_{\mu\alpha}F^\alpha{}_\nu$ and $F^{\mu\nu} = g^{\nu\alpha}F^\mu{}_\alpha$.
Now that we have the electromagnetic field tensor, it is straightforward to derive the transformation laws for the **E** and **B** fields. Starting from $F^{\mu\nu}$ which defines the fields **E** and **B**, we have $F'^{\mu\nu} = \Lambda^\mu{}_\alpha\Lambda^\nu{}_\beta\,F^{\alpha\beta}$ which then gives the fields **E**′ and **B**′ in terms of **E** and **B**. In matrix notation, we can write this as
$$F' = \Lambda F\Lambda^T.$$
Using equations (1.14) and (2.15), it is easy to multiply out the matrices and show that
$$F'^{\mu\nu} = \begin{pmatrix} 0 & -E'^1 & -E'^2 & -E'^3\\ E'^1 & 0 & -B'^3 & B'^2\\ E'^2 & B'^3 & 0 & -B'^1\\ E'^3 & -B'^2 & B'^1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -E^1 & -\gamma(E^2 - \beta B^3) & -\gamma(E^3 + \beta B^2)\\ E^1 & 0 & -\gamma(B^3 - \beta E^2) & \gamma(B^2 + \beta E^3)\\ \gamma(E^2 - \beta B^3) & \gamma(B^3 - \beta E^2) & 0 & -B^1\\ \gamma(E^3 + \beta B^2) & -\gamma(B^2 + \beta E^3) & B^1 & 0 \end{pmatrix}.$$
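The matrix product F′ = ΛFΛᵀ is tedious by hand, so here is a pure-Python sketch that carries it out for arbitrary sample fields (c = 1; the helper names and numerical values are illustrative assumptions). It builds F^{μν} from (2.15), boosts it with (1.14), reads **E**′ and **B**′ back off using E^i = F^{i0} and B¹ = −F^{23} (plus cyclic permutations), and compares with the entries of the boosted matrix.

```python
import math

def field_tensor(E, B):
    """F^{mu nu} of equation (2.15), metric diag(1, -1, -1, -1)."""
    E1, E2, E3 = E
    B1, B2, B3 = B
    return [[0.0, -E1, -E2, -E3],
            [E1, 0.0, -B3, B2],
            [E2, B3, 0.0, -B1],
            [E3, -B2, B1, 0.0]]

def fields(F):
    """Read off E^i = F^{i0} and B^1 = -F^{23}, B^2 = -F^{31}, B^3 = -F^{12}."""
    return ((F[1][0], F[2][0], F[3][0]),
            (-F[2][3], -F[3][1], -F[1][2]))

def boost_matrix(beta):
    """Lambda^mu_nu of equation (1.14)."""
    g = 1.0 / math.sqrt(1.0 - beta**2)
    return [[g, -beta * g, 0.0, 0.0],
            [-beta * g, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(A):
    return [list(row) for row in zip(*A)]

beta = 0.6
gamma = 1.0 / math.sqrt(1.0 - beta**2)
E, B = (1.0, 2.0, 3.0), (-0.5, 0.25, 4.0)
Lam = boost_matrix(beta)
E_new, B_new = fields(matmul(Lam, matmul(field_tensor(E, B), transpose(Lam))))

E_expected = (E[0], gamma * (E[1] - beta * B[2]), gamma * (E[2] + beta * B[1]))
B_expected = (B[0], gamma * (B[1] + beta * E[2]), gamma * (B[2] - beta * E[1]))
print(E_new, E_expected)
print(B_new, B_expected)
```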
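This was for the special case of a boost along the x-axis, i.e., **β** = βx̂. It is not hard to see that we can write down the field transformation laws for a boost **β** in an arbitrary direction (but with the coordinate axes still parallel) if we write this in terms of components parallel and perpendicular to the boost. This yields
$$\mathbf{E}'_\perp = \gamma(\mathbf{E}_\perp + \boldsymbol{\beta}\times\mathbf{B}) \qquad \mathbf{E}'_\parallel = \mathbf{E}_\parallel \qquad (2.20a)$$
$$\mathbf{B}'_\perp = \gamma(\mathbf{B}_\perp - \boldsymbol{\beta}\times\mathbf{E}) \qquad \mathbf{B}'_\parallel = \mathbf{B}_\parallel \qquad (2.20b)$$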
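Equations (2.20) translate directly into vector code. The following Python sketch (c = 1; function names are illustrative, and **β** = 0 is handled separately because the parallel/perpendicular split is then undefined) implements them for an arbitrary boost velocity and checks that a boost along x reproduces the component formulas of the x-boost above.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def split(V, beta):
    """Return the parts of V parallel and perpendicular to beta."""
    scale = dot(V, beta) / dot(beta, beta)
    par = tuple(scale * b for b in beta)
    perp = tuple(v - p for v, p in zip(V, par))
    return par, perp

def transform_fields(E, B, beta):
    """Equations (2.20): fields seen in a frame moving with velocity beta (|beta| < 1)."""
    if dot(beta, beta) == 0.0:
        return tuple(E), tuple(B)
    gamma = 1.0 / math.sqrt(1.0 - dot(beta, beta))
    E_par, E_perp = split(E, beta)
    B_par, B_perp = split(B, beta)
    bxB, bxE = cross(beta, B), cross(beta, E)
    E_new = tuple(p + gamma * (q + r) for p, q, r in zip(E_par, E_perp, bxB))
    B_new = tuple(p + gamma * (q - r) for p, q, r in zip(B_par, B_perp, bxE))
    return E_new, B_new

E, B = (1.0, 2.0, 3.0), (-0.5, 0.25, 4.0)
b = 0.6
g = 1.0 / math.sqrt(1.0 - b**2)
E_new, B_new = transform_fields(E, B, (b, 0.0, 0.0))
print(E_new, (E[0], g * (E[1] - b * B[2]), g * (E[2] + b * B[1])))
print(B_new, (B[0], g * (B[1] + b * E[2]), g * (B[2] - b * E[1])))
```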