The Origin of the Special and General Theories
Newtonian Mechanics and Inertial Frames of Reference
Newtonian particle mechanics is based on Newton's laws of motion. Newton's first law may
be written: "Reference frames exist in which all free particles have zero acceleration". Here a
free particle is defined to be one on which no net force acts. It is assumed that the question of
whether or not a particle is free is absolute and does not depend on the choice of frame in
which the motion is expressed.
Originally Newton presumed the existence of a unique reference frame called "absolute
space" with respect to which free particles would have zero acceleration. However the
additional assumption of "absolute time" meant that any frame moving with uniform velocity
with respect to absolute space would be dynamically equivalent to the latter. Accordingly in
specifying Newton's laws we normally refer to a set of reference frames, each of which has
uniform velocity with respect to any other. These are the so-called inertial frames - in
practice, those moving with uniform velocity with respect to the "fixed" stars. (Why the latter
should be involved is the subject of the so-called "Mach principle" - that distant matter in the
universe determines by some means the inertia effects which we observe.)
Newton's second law may be written: "The acceleration of a particle with respect to any
inertial frame is proportional to the force acting on it", or
$$\mathbf{F} = m\ddot{\mathbf{r}} \qquad (1)$$
where the constant of proportionality m is called the inertial mass of the particle. We assume
here that the force F is independent of the reference frame in which the motion of the particle
is expressed. Experimentally equation (1) and its immediate consequences are found to be
correct for particles moving with everyday speeds, i.e. small compared with the velocity of light.
Newton's third law is that the forces exerted on each other by two interacting particles are
equal and opposite. This proposition is comparatively easy to verify for simple mechanical
systems, but it is untrue in electrodynamics, e.g. for the forces between two charged particles
in relative motion.
The Origins of the Special Theory of Relativity
As mentioned above, inertial frames all have a constant relative velocity with respect to one
another. Mathematically this is in accordance with the so-called Galilean transformation
equations, which state in pre-relativistic physics the presumed relation between the spacetime
coordinates of an event (x, y, z, t ) in one inertial frame and the corresponding coordinates
(x', y', z', t') in another. For this purpose we usually imagine the two frames of reference to be
in "standard configuration", with the cartesian axes coinciding at t = t' = 0 and the relative
motion being in the common x, x' direction, as shown in the figure.
Figure 1. Inertial frames S and S' in "standard configuration", with S' the "moving" frame.
Since the distance between the origins is just vt, it is apparently "obvious" that the coordinate
transformation equations are of the form
$$x' = x - vt, \quad y' = y, \quad z' = z, \quad t' = t \qquad (2)$$
the fourth of these expressing the Newtonian belief that the time of an event is the same in all
frames of reference, i.e. "time" is absolute. These are the Galilean transformation equations,
and it is an elementary exercise now to verify that the components of acceleration of any
object with respect to the two coordinate systems of reference are the same. So the
acceleration in equation (1) will have the same value in any inertial frame.
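The "elementary exercise" can also be checked numerically. The sketch below is only an illustration: the trajectory, the relative speed `v` and the helper names are hypothetical choices, not taken from the notes. It applies the Galilean equations (2) to a uniformly accelerated particle and estimates the acceleration in both frames by finite differences.

```python
def galilean(x, t, v):
    """Galilean transformation (2) in standard configuration: returns (x', t')."""
    return x - v * t, t

def second_difference(f, t, h=1e-3):
    """Central finite-difference estimate of the second derivative of f at t."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / h**2

# Hypothetical trajectory in S: uniform acceleration of 3 m/s^2.
def x_in_S(t):
    return 2.0 + 5.0 * t + 0.5 * 3.0 * t**2

v = 10.0  # speed of S' relative to S in m/s (arbitrary illustrative value)

def x_in_S_prime(t):
    # Same particle, described by the x' coordinate of frame S'.
    return galilean(x_in_S(t), t, v)[0]

a_S = second_difference(x_in_S, 1.0)
a_S_prime = second_difference(x_in_S_prime, 1.0)
print(a_S, a_S_prime)  # the two estimates agree to within rounding error
```

The term -vt is linear in t, so it drops out on differentiating twice; this is why the two accelerations coincide whatever value of v is chosen.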
Since Newton's laws of motion are the basis for all of particle mechanics, later extended to
continuum mechanics (rigid bodies etc.) it follows that the whole of classical mechanics
"works" in all inertial frames of reference. For example, if the momentum of a physical
system is conserved in one inertial frame of reference (which will be the case if there are no
external forces acting on it) then it follows that the same result will hold in any other inertial frame.
It is of interest to enquire if this principle also holds in other branches of physics, e.g. does
electromagnetism apply equally in all inertial frames of reference? Electromagnetism is based
on Maxwell's equations, which are sophisticated ways of expressing, in the form of
differential equations, more familiar equations such as Coulomb's law, the Biot-Savart law,
etc. It can be shown that, in the absence of matter, Maxwell's equations combine to give a
wave equation, describing electromagnetic waves, the velocity of such waves being given by
the formula $c = (\varepsilon_0 \mu_0)^{-1/2}$. When numerical values are inserted, the result is
$c = 2.998 \times 10^{8}$ m s$^{-1}$, the same as the experimental value of the velocity of light. This was a
great triumph for electromagnetic theory when it was first discovered, since it identified light
as a form of electromagnetic wave, with wavelengths in a particular (visible) range. However
one drawback, which proved very troublesome for late 19th century physics, was this: any
wave travelling at a speed of exactly c in one inertial frame must surely have a different value
in other inertial frames; we have after all from (2) that the velocity components dx/dt and
dx'/dt' in the frames S and S' respectively must differ by v. So electromagnetism, unlike
mechanics, appears to single out one inertial frame in particular - that in which the velocity of
light is exactly c for light travelling in any direction. For understandable reasons, pre-relativity physicists assumed that this preferred frame was that in which the "medium" for
light propagation was at rest, by analogy with the propagation of sound. This all-pervading
medium was referred to as the "ether" and assumed to be endowed with specific physical
properties, even although in empty space it seemed to consist of nothing at all.
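The numerical claim and the difficulty it raises can be illustrated with a few lines of arithmetic. This is a sketch under stated assumptions: the values of the vacuum permittivity and permeability are standard reference values, not quoted in the notes, and the frame speed `v` is an illustrative figure of roughly the earth's orbital speed.

```python
import math

# Assumed reference values (not given in the text):
mu_0 = 4.0e-7 * math.pi        # permeability of free space, H/m
epsilon_0 = 8.8541878128e-12   # permittivity of free space, F/m

# Wave speed predicted by Maxwell's equations in the absence of matter.
c = 1.0 / math.sqrt(epsilon_0 * mu_0)

# Galilean expectation (2) for the same wave, travelling in the +x direction,
# as measured in a frame S' moving at speed v along x.
v = 3.0e4  # m/s, illustrative
c_galilean = c - v

print(f"c = {c:.4e} m/s, Galilean value in S' = {c_galilean:.4e} m/s")
```

On the Galilean view the difference between the two printed values is what an ether-drift experiment would have to detect.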
The experimental search for the ether took place in the late 19th century, but ultimately
proved to be a blind alley. Repeated attempts (see textbook accounts of the Michelson-Morley and other experiments) to measure the velocity of the earth with respect to the ether
frame ended in failure, and likewise various ingenious attempts to account in an ad hoc way
for those failures.
Einstein's Special Theory of Relativity
Einstein started from the belief that physics demonstrates an essential unity - there are no
rigid boundaries between its various disciplines. So electromagnetism should be on the same
footing as mechanics in the sense of being valid in all inertial frames of reference. He realised
also that the Galilean transformation equations (2), the origin of the contradiction just
discussed, are not self-evidently true, but are fallible assertions about the results of
hypothetical physical experiments. It is possible therefore that they could be wrong, in spite
of the fact that they seem so "obvious" and work so well in everyday situations.
In 1905, Einstein put forward the following two postulates:
1. The laws of physics should have the same form in all inertial frames of reference.
2. Electromagnetic signals in vacuo travel at speed c with respect to all inertial reference frames.
Note that the first postulate confirms the special role of inertial frames of reference in
physics. It implies that any proposed law of physics should satisfy the theoretical test of
"covariance"; i.e. when we have expressed it in mathematical form in one inertial frame,
transforming the coordinates to a different inertial frame should leave the form of the
equation unaltered.
The second postulate reveals the velocity of light as one of the few fundamental constants, on
the same footing as Planck's constant and the electronic charge. But it also compels us to
question the validity of the Galilean transformation equations, since they are clearly
incompatible with the postulate that the velocity of light never changes. Recognising however
that the Galilean equations are experimentally correct for small velocities, we search for a
more general set of transformation equations which will (we guess) approximate to the
Galilean equations in some appropriate limit.
The Lorentz transformation equations
These equations describe how the coordinates (x, y, z, t) of an event in one inertial frame S
are connected to the corresponding coordinates (x', y', z', t') in another frame S'. In the
simplest case we set up the two frames of reference as shown in figure 1, the "standard
configuration" of S and S', with relative velocity v between the frames in the common x, x' direction.
(The actual assignment of coordinates to events in a given inertial frame is straightforward,
assuming standard measuring rods and clocks distributed throughout the frame. In particular,
each clock at a distance r from the origin is synchronised with all others by setting it to read
t = r/c when a light signal, sent out from the origin at t = 0, arrives at the clock. Then the time
of any event is the time currently shown on the clock beside which the event occurs.)
The arguments which lead to the new "Lorentz transformation equations" include, as well as
Einstein's postulates, some basic assumptions about the nature of space and time. We assume
for example that there are no preferred directions in space, and that the choice of the origins
of the coordinate systems is of no fundamental importance. Taken together with the first
postulate - that all inertial reference frames should be on the same footing - it can be shown
that the transformation equations must be linear, and are restricted to the mathematical form
$$x' = \frac{x - vt}{\sqrt{1 - v^2/k^2}}, \quad y' = y, \quad z' = z, \quad t' = \frac{t - vx/k^2}{\sqrt{1 - v^2/k^2}}$$
where k is a constant with the dimensions of velocity. At this point we note that the Galilean
transformation equations (2) are obtained by making the erroneous assumption that k = ∞.
It is Einstein's second postulate which determines the correct value of k. We imagine a light
pulse emitted at t = 0 from the origin of the inertial frame S, and which at any subsequent
time will be of spherical shape described by the equation
$$x^2 + y^2 + z^2 - c^2 t^2 = 0$$
The second postulate requires that the pulse should be described in frame S' by an equation of
exactly the same form, i.e.
$$x'^2 + y'^2 + z'^2 - c^2 t'^2 = 0$$
It is necessary therefore that either equation should imply the other; and this is achieved by
taking k = c, for then we have (as is easy to verify)
$$x'^2 + y'^2 + z'^2 - c^2 t'^2 = x^2 + y^2 + z^2 - c^2 t^2 .$$
Hence the final result is
$$x' = \frac{x - vt}{\sqrt{1 - v^2/c^2}}, \quad y' = y, \quad z' = z, \quad t' = \frac{t - vx/c^2}{\sqrt{1 - v^2/c^2}} \qquad (7)$$
which are the Lorentz transformation equations. The Galilean transformation equations are an
approximation to these, valid in the "non-relativistic limit" v/c « 1.
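As a numerical check of the two properties just stated, the following sketch implements equations (7) and evaluates the quadratic form that vanishes on the light pulse. The function names and the sample events are hypothetical, and units with c = 1 are assumed to keep the arithmetic well conditioned.

```python
import math

def lorentz(x, y, z, t, v, c=1.0):
    """Lorentz transformation S -> S' in standard configuration, equations (7)."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return gamma * (x - v * t), y, z, gamma * (t - v * x / c**2)

def interval(x, y, z, t, c=1.0):
    """The quadratic form x^2 + y^2 + z^2 - c^2 t^2."""
    return x**2 + y**2 + z**2 - (c * t) ** 2

# An event on the light pulse (3^2 + 4^2 = 5^2), viewed from S' moving at v = 0.6c.
event = (3.0, 4.0, 0.0, 5.0)
event_prime = lorentz(*event, v=0.6)
print(interval(*event), interval(*event_prime))  # both vanish (to rounding)

# Non-relativistic limit: for v/c << 1 the result approaches x' = x - vt, t' = t.
slow = lorentz(*event, v=1e-4)
print(slow[0] - (event[0] - 1e-4 * event[3]), slow[3] - event[3])
```

The same functions show that the quadratic form is preserved for events off the light pulse as well, which is the content of the identity obtained with k = c.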
Some consequences of the Lorentz transformation equations
1. Reversal of the roles of the two frames. We took S to be the frame at rest and S' to be
moving in the positive x direction with respect to S with speed v. But this initial choice was
arbitrary, and we could just as well have taken S' to be at rest, with S moving in the negative
x' direction with respect to S' with speed v. The equivalence of these two descriptions is
shown mathematically by solving equations (7) for the coordinates x, y, z, t in terms of the
primed coordinates. We find
$$x = \frac{x' + vt'}{\sqrt{1 - v^2/c^2}}, \quad y = y', \quad z = z', \quad t = \frac{t' + vx'/c^2}{\sqrt{1 - v^2/c^2}} \qquad (8)$$
which are of the same mathematical form but with the primed and non-primed coordinates
interchanged and v replaced by -v.
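The inversion can be confirmed numerically with a round trip. In this sketch (hypothetical helper `boost`, units with c = 1, arbitrary sample values) an event is transformed from S to S' with velocity v and then back again using the same formula with v replaced by -v.

```python
import math

def boost(x, t, v, c=1.0):
    """Lorentz transformation of (x, t) to a frame moving with velocity v along x."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return gamma * (x - v * t), gamma * (t - v * x / c**2)

x, t, v = 2.0, 7.0, 0.8                  # arbitrary event and relative speed
x_p, t_p = boost(x, t, v)                # equations (7): S -> S'
x_back, t_back = boost(x_p, t_p, -v)     # equations (8): same form with v -> -v

print((x_p, t_p), (x_back, t_back))      # the round trip recovers (x, t)
```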
2. Time dilation. Consider two events which occur at the same place in one of the two frames
- say the frame S'. Denoting the coordinates of the events (x'1, y'1, z'1, t'1) and (x'2, y'2, z'2,
t'2) in frame S' (so that x'1 = x'2, y'1 = y'2, z'1 = z'2) and (x1, y1, z1, t1) and (x2, y2, z2, t2) in
frame S, application of the fourth of the transformation equations (8) to both events followed
by subtraction yields the result
$$t_2 - t_1 = \frac{(t'_2 - t'_1) + (v/c^2)(x'_2 - x'_1)}{\sqrt{1 - v^2/c^2}} = \frac{t'_2 - t'_1}{\sqrt{1 - v^2/c^2}} \qquad (9)$$
This equation shows that the time interval between the two events does not take the same
value in all frames of reference; the interval in S is extended by a factor $(1 - v^2/c^2)^{-1/2}$
compared with the interval in S'. We call this time dilation. The quantity $t'_2 - t'_1$ is called the
proper time interval between the events, and is the minimum time interval which would be
ascribed to the two events in any inertial reference frame.
This result is often expressed by the statement: "moving clocks go slow". This refers to a
hypothetical attempt by observers in one frame of reference - the frame S in this case - to
ascertain the rate of a standard clock which is at rest in the moving frame S'. To do this the
observers in S note two particular ticks of the moving clock and use these as the two events
whose space and time coordinates have just been described. They are bound to find, if the
theory is correct, that the moving clock has registered a smaller time interval than is recorded
by clocks in their own frame, so their conclusion is that, compared with their own clocks
(which naturally are assumed to be "correct"), the moving clock is going slow.
It is only apparently anomalous that the same conclusion, but in reverse, would be reached by
observers in the frame S' who make observations on a clock at rest in S. Their conclusion, in
other words, is that
$$t'_2 - t'_1 = \frac{t_2 - t_1}{\sqrt{1 - v^2/c^2}}$$
the same equation as (9) but with the primed and unprimed coordinates interchanged. There
is no contradiction between these equations, since the two physical situations are different. In
the first case, the two events which are being used for purposes of comparing time intervals -
the two ticks of the clock in S' - occur at the same place in frame S', and in the second, they
are at the same place in frame S.
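The absence of contradiction can be made concrete by treating the two situations as two different pairs of events. The sketch below assumes units with c = 1, a relative speed of 0.6c and a hypothetical helper `boost`; none of these values come from the notes.

```python
import math

def boost(x, t, v, c=1.0):
    """Lorentz transformation of (x, t) to a frame moving with velocity v along x."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return gamma * (x - v * t), gamma * (t - v * x / c**2)

v = 0.6      # relative speed in units of c (illustrative)
tick = 1.0   # proper time between two ticks of a standard clock

# Case 1: clock at rest at the origin of S'. Its ticks have primed coordinates
# (0, 0) and (0, tick); the S coordinates follow from (8), i.e. a boost by -v.
x1, t1 = boost(0.0, 0.0, -v)
x2, t2 = boost(0.0, tick, -v)
dt_S, dx_S = t2 - t1, x2 - x1

# Case 2: clock at rest at the origin of S, observed from S' using (7).
_, tp1 = boost(0.0, 0.0, v)
_, tp2 = boost(0.0, tick, v)
dt_S_prime = tp2 - tp1

print(dt_S, dx_S, dt_S_prime)
```

In both cases the observers who see the clock moving assign an interval of about 1.25 to ticks separated by a proper time of 1. In case 1 the ticks occur at different places in S (dx_S is non-zero), which is why the two statements describe different physical situations.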
3. The velocity of light as a limiting velocity. The factor $(1 - v^2/c^2)^{-1/2}$ becomes imaginary for
values of v exceeding c, so we see that no frame of reference can have a value of v in this
range. Furthermore, since frames of reference may be constructed from material objects, it
follows that no particle may have a velocity exceeding c either.
4. The Doppler effect. Like the Doppler effect in acoustics, this refers to the shift in frequency
when a wave impinges on two observers who are in relative motion with respect to one another.
In its simplest version, we will consider an electromagnetic wave approaching from x = +∞
two observers located at the origins of the two reference frames S and S'. Given that the two
observers record frequencies $\nu$ and $\nu'$ respectively, it can be shown that these are related by
the equation
$$\nu' = \nu \sqrt{\frac{1 + v/c}{1 - v/c}} \qquad (11)$$
If the source of the light is stationary in frame S, then the fact that the square root factor is
greater than 1 indicates that the observer in S', for whom the light source is approaching,
would note a frequency shift towards the blue end of the spectrum. Similarly when the source
of light is receding from the observer S' (i.e. light is approaching from -∞) a similar Doppler
effect is produced, but the factor on the RHS is the inverse of that in (11), indicating a shift of
spectral lines towards the red end of the spectrum.
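A short numerical illustration of the Doppler factor follows. It is a sketch only: the function name is hypothetical, the speed is expressed as beta = v/c, and the receding case is obtained by reversing the sign of beta, which gives the inverse factor described above.

```python
import math

def doppler_factor(beta):
    """Ratio of the frequency recorded in S' to that recorded in S, for light
    arriving from x = +infinity; beta = v/c, positive when S' moves toward the source."""
    return math.sqrt((1.0 + beta) / (1.0 - beta))

beta = 0.6  # illustrative relative speed in units of c
approaching = doppler_factor(beta)    # greater than 1: blue shift
receding = doppler_factor(-beta)      # less than 1: red shift

print(approaching, receding, approaching * receding)
```

For beta = 0.6 the frequency is doubled on approach and halved on recession, so the product of the two factors is 1.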
Back to Newtonian Mechanics - Inertial Forces
In a frame moving with acceleration a with respect to an inertial frame, a free particle will
have an acceleration -a which, if the first frame had been inertial, would have been caused by
a force of magnitude -ma. Although there is no real force acting on the particle, we can, if
we like, attribute the acceleration -a to the action of a hypothetical "force" of just this
magnitude. If we are prepared to introduce in this way forces which have no physical origin,
we may extend the validity of (1) to all frames of reference, on the understanding that F now
includes fictitious or inertial forces as well as real forces if the frame is non-inertial. The
characteristic feature of any inertial force is that the acceleration it "produces" is independent
of the mass of the particle on which it acts.
Newtonian Gravitational Theory
Suppose two particles, of gravitational mass M and M', are situated at r and r' respectively.
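According to Newtonian gravitational theory, the force on M is
$$\mathbf{F} = -G M M' \frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^3} \qquad (12)$$
where G is the gravitational constant ($G = 6.67 \times 10^{-11}$ m$^3$ kg$^{-1}$ s$^{-2}$). Alternatively, the force may be expressed in terms of the gravitational potential function V evaluated at r, i.e.
$$\mathbf{F} = -M \nabla V(\mathbf{r})$$
with
$$V(\mathbf{r}) = -\frac{G M'}{|\mathbf{r} - \mathbf{r}'|} \qquad (14)$$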
If, instead of a point source M', we have a distribution of matter represented by a density $\rho$,
then the generalisation of (14) is
$$V(\mathbf{r}) = -G \int \frac{\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \, d^3 r'$$
By analogy with electrostatics where an inverse square law also applies, or otherwise, it is
easily shown that
$$\nabla^2 V = 4\pi G \rho .$$
In terms of V, the motion of the particle is governed by Newton's second law:
$$m\ddot{\mathbf{r}} = \mathbf{F} = -M \nabla V \qquad (17)$$
Now there is no apparent reason why any connection should exist between the gravitational
mass and inertial mass for a particle; the first quantity measures its capacity for gravitational
interaction with other particles (just as its charge measures its capacity for electromagnetic
interaction), while the second measures its resistance to change of velocity when acted on by
a force. However experimentally (e.g. Eötvös' experiment) one finds that M = k m where k is
numerically the same, to very high accuracy, for all particles. The statement that the
proportionality between M and m is exact, and not just correct to a very good approximation,
is one form of the Principle of Equivalence. We may of course adjust the unit of
gravitational mass so that k = 1; gravitational and inertial masses are then numerically,
though not conceptually, the same. It then follows from (17) that
$$\ddot{\mathbf{r}} = -\nabla V$$
i.e. the acceleration of a particle in a gravitational field is independent of its mass.
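The mass-independence of the acceleration can be illustrated with the force law (12) and the second law (17). The sketch below uses the value of G quoted in the text; the source mass and the separation are assumed figures, roughly those of the earth and its radius, and the function names are hypothetical.

```python
G = 6.67e-11          # gravitational constant, m^3 kg^-1 s^-2 (value quoted in the text)
M_SOURCE = 5.972e24   # mass of the attracting body, kg (assumed: roughly the earth)
DISTANCE = 6.371e6    # separation |r - r'|, m (assumed: roughly the earth's radius)

def gravitational_force(M):
    """Magnitude of the force (12) on a particle of gravitational mass M."""
    return G * M * M_SOURCE / DISTANCE**2

def acceleration(m, k=1.0):
    """Acceleration from (17) for inertial mass m and gravitational mass M = k m."""
    return gravitational_force(k * m) / m

# Test particles spanning six orders of magnitude in mass.
accelerations = [acceleration(m) for m in (1e-3, 1.0, 1e3)]
print(accelerations)
```

With k = 1 each particle has the same acceleration, close to the familiar 9.8 m s^-2. A value of k that differed from particle to particle would break this equality, which is what the Eötvös experiment tests.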
The general success of Newton's laws of motion together with the gravitational equation (12)
in explaining the motion of the heavenly bodies is well known, and their successful
application to other branches of physics led to the belief that they must be universally
applicable. However, the weaknesses of the Newtonian approach to gravity were eventually
realised to be as follows:
1. The Newtonian field equations are time independent, implying action at a distance with
information being propagated at infinite speed, contrary to the conclusions of special relativity.
2. The apparently fortuitous proportionality between inertial and gravitational mass already
referred to. Newtonian theory provides no explanation for this.
3. There was discovered an unpredicted residual advance in the perihelion angle of the orbit of
the planet Mercury, after allowing for the perturbations to its elliptic orbit due to the
influence of the other planets.
These weaknesses provided the motivation for the development of the general theory of
relativity, introduced by A Einstein in 1916. Founded on only a few postulates, it links the
motion of free particles to the presence of large gravitating masses via the intrinsic geometry
of spacetime, and serves as a base for the construction of theories of the universe.
The General Theory of Relativity
Einstein started from the supposition that the observed proportionality between inertial mass
and gravitational mass is no accident. This proportionality implies that the acceleration of a
particle in a gravitational field is independent of its mass, which is the same result as when
the "force" supplying the acceleration is inertial. If this is to be no coincidence, it must be
because the gravitational force is itself of the inertial variety, i.e. it manifests itself as
"causing" the acceleration of test particles simply because of the peculiar choice of reference
frame for observation of the motion. In a frame falling freely under gravity however, the
acceleration of test particles is zero and the gravitational "force" disappears. This simple
reasoning causes us to adopt the following change of outlook. We retain Newton's laws and
the inertial frames of reference, but with this change: that real forces do not now include that
previously described as the force of gravitation. This means in practice that inertial frames
are those which are freely falling but not rotating with respect to the fixed stars. Note that
since the acceleration due to gravity varies from place to place, a given reference frame,
defined by rigid axes, can be inertial only locally, strictly at a point but approximately so over
a small region. As pointed out above, Special Relativity (SR) ascribes particular importance
to the inertial frames of reference. Since what we mean by "inertial frames" is not altered by
the change in outlook described above (rather it is the particular reference frames which
qualify for that description), it is natural to suppose that this special relationship should
continue, i.e. that SR holds only in freely falling frames, those in which all gravitational
effects disappear. This supposition is in fact supported by experimental evidence. According
to SR, light travels in straight lines with respect to any inertial frame of reference. In a
non-inertial frame, e.g. one accelerating with respect to the first, it is easy to see (especially if
the light is assumed to consist of a stream of photons) that light rays must be curved. Hence if
a gravitational field is in some sense equivalent to an "inertial field", as Einstein's theory says
it is*, light rays should be bent by the gravitational field of a large body. It is this which
experiments confirm.
The fact that no reference frame is inertial everywhere means that there is now no reason why
one set of reference frames should be favoured over all others, as was the assumption in SR,
for the expression of the laws of physics. (Locally, there is of course a preferred set - those in
which no gravitational effects appear and in which SR holds.) This is the motivation for the
Principle of Covariance: "The laws of physics should be expressible in the same form (i.e.
generally covariant) in all reference frames". The reshaping of the laws of physics into a
generally covariant form therefore becomes one of the principal tasks of the general theory of
relativity. Since tensor equations are true in all coordinate systems, it is natural that the
mathematical expression of the theory should be in tensor form.
*The qualification "in some sense" is vital to the truth of this statement, and its frequent
omission is a cause of great confusion. Specifically, effects due to a gravitational field are
equal to those caused by a mere acceleration only to first order, i.e. when second and higher
derivatives of the metric tensor are ignored.
N C McGill

Need for the General Theory