Blau. Lectures Notes On GR PDF
Matthias Blau
Albert Einstein Center for Fundamental Physics
Institut für Theoretische Physik
Universität Bern
CH-3012 Bern, Switzerland
http://www.blau.itp.unibe.ch/Lecturenotes.html
Contents
0 Introduction
3 Tensor Algebra
3.1 From the Einstein Equivalence Principle to the Principle of General Covariance
3.2 Tensors
3.3 Tensor Algebra
3.4 Tensor Densities and Volume Elements
4 Tensor Analysis
4.1 The Covariant Derivative for Vector Fields
4.2 Invariant Interpretation of the Covariant Derivative
5.6 Klein-Gordon Scalar Field in (1+1) Minkowski and Rindler Space
5.7 Minimal Coupling and (quasi-)Topological Couplings
5.8 Conserved Quantities from Covariantly Conserved Currents
7.4 Conserved Charges from Killing Tensors and Killing-Yano Tensors
8 Curvature I: The Riemann Curvature Tensor
8.1 Curvature: Preliminary Remarks
8.2 The Riemann Curvature Tensor from the Commutator of Covariant Derivatives
11.2 The Matter Action and the Covariant Energy-Momentum Tensor
11.3 Consequences of the Variational Principle
11.4 Canonical vs Covariant Energy-Momentum Tensor
11.6 Comments on Gravitational Energy
11.7 The Palatini Variational Principle
12.3 Solving the Einstein Equations for a Static Spherically Symmetric Metric
12.4 Schwarzschild Coordinates and Schwarzschild Radius
14.5 The Geometry Near r_s and Minkowski Space in Rindler Coordinates
14.6 Tortoise Coordinates
14.7 Klein-Gordon Scalar Field in the Schwarzschild Geometry
15 The Schwarzschild Black Hole
15.1 Crossing r_s with Painlevé-Gullstrand Coordinates
15.2 Lemaître and Novikov Coordinates
18.4 Fundamental Observations II: The Hubble(-Lemaître) Expansion
18.5 Mathematical Model: the Robertson-Walker Metric
18.6 Area Measurements and Number Counts
21.5 The Cosmological Constant Dominated Era: (Anti-) de Sitter Space
21.6 The Λ-CDM Solution
22 The Reissner-Nordstrøm Solution
22.1 The Metric
22.2 Basic Properties of the Naked Singularity Solution with m^2 − q^2 < 0
26 Interlude: Null Congruences and Horizons
26.1 Expansions and Inaffinities of Radial Null Congruences
26.2 The Raychaudhuri Equation for Null Geodesic Congruences
30 Kaluza-Klein Theory
30.1 Motivation: Gravity and Gauge Theory
30.2 The Kaluza-Klein Miracle: History and Overview
30.6 Masses and Charges from Scalar Fields in Five Dimensions
30.7 Kinematics of Dimensional Reduction
0 Introduction
1905 was Einstein’s magical year. In that year, he published three articles, on light
quanta, on the foundations of the theory of Special Relativity, and on Brownian motion,
each one separately worthy of a Nobel prize. Immediately after his work on Special
Relativity, Einstein started thinking about gravity and how to give it a relativistically
invariant formulation. He kept on working on this problem during the next ten years,
doing little else. This work, after much trial and error, culminated in his masterpiece,
the General Theory of Relativity, presented in 1915/1916. It is clearly one of the greatest
scientific achievements of all time, a beautiful theory derived from pure thought and
physical intuition, capable of explaining, still today, almost 100 years later, virtually
every aspect of gravitational physics ever observed.
Einstein’s key insight was that gravity is not a physical external force like the other
forces of nature but rather a manifestation of the curvature of space-time itself. This
realisation, in its simplicity and beauty, has had a profound impact on theoretical physics
as a whole, and Einstein’s vision of a geometrisation of all of physics is still with us today.
These lecture notes for an introductory course on General Relativity are based on a
course that I originally gave in the years 1998-2003 in the framework of the Diploma
Course of the ICTP (Trieste, Italy). Currently these notes form the basis of a course
that I teach as part of the Master in Theoretical Physics curriculum at the University
of Bern.
In the intervening years, I have made (and keep making) various additions to the lecture
notes, and they now include more material than is needed for (or can realistically be
covered in) an introductory 1-semester course, say, but I hope to have nevertheless
conserved the introductory character and accessible style of the original notes.
Part I of these lecture notes will be dedicated to explaining and exploring the conse-
quences of Einstein’s insights into the relation between gravity and space-time geometry,
and to developing the machinery (of tensor calculus and Riemannian geometry) required
to describe physics in a curved space-time, i.e. in a gravitational field, and the dynamics
of the gravitational field itself.
From about section 3 onwards, Part I can be read in parallel with Part II, which
deals with various applications of General Relativity that are part of most introductory
courses in General Relativity:
• Black Holes
• Gravitational Waves
• Cosmology
Part III contains miscellaneous topics of a semi-advanced nature, and the selection here
is less systematic and necessarily biased: it contains topics that I either find particularly
suitable as follow-up subjects to those discussed in Part II, or consider to be particularly
interesting, or simply happen to have worked out at some point.
Remarks:
• Special Relativity,
• Lagrangian mechanics,
• vector calculus and coordinate transformations.
Readers familiar with this book will still recognise the similarities in various places
(and I have made no attempt to disguise this). Even though my own way of
thinking about general relativity is somewhat more geometric, I have found that
the approach adopted by Weinberg is ideally suited to introduce general relativity
to students with little mathematical background.
I have also used a number of other sources, including, in particular, Sean Carroll’s
on-line lecture notes (http://pancake.uchicago.edu/~carroll/notes/) which
have, in the meantime, been expanded into the lovely textbook
3. A relatively new basic and introductory book that I like and highly recommend is
If you are missing exercises in or to these lecture notes, then take a look at
4. Invariably, any set of (introductory) lecture notes has its shortcomings, due to
lack of space and time, the requirements of the audience and the expertise (or
lack thereof) and interests of the lecturer. These lecture notes are, of course, no
exception.
In my opinion, among the weaknesses of this course or these lecture notes are the
following:
Moreover, practically no mention is made of the manifestly coordinate inde-
pendent calculus of differential forms. Given a little bit more time, it would
be possible to cover the (extremely useful) vielbein and differential form for-
mulations of general relativity, and a supplement to these lecture notes on
this subject is in preparation.
• The discussion of the causal structure of the Schwarzschild metric and its
Kruskal-Szekeres extension currently stops short of introducing Penrose di-
agrams. These are useful and important and, once again, given a bit more
time, this is a subject that could and ought to be covered as well.
There are numerous other important topics not treated in these notes, foremost
among them perhaps a discussion of the canonical ADM formalism, more generally
a discussion of notions of energy in general relativity (which is only briefly
touched upon in these notes), the post-Newtonian approximation, results of
mathematical general relativity regarding singularities or properties of general black hole
solutions etc.
Including all these topics would require at least one more one-semester course and
would turn these lecture notes into a (rather voluminous) book. The former was
not possible, given initially the constraints of the ICTP Diploma Course and now
those of the Bologna system, and the latter is not my intention, since a number
of excellent textbooks on General Relativity have appeared in recent years. I can
only hope that these lecture notes provide the necessary background for studying
these more advanced topics.
Further additions and updates to these notes are in preparation. I am grateful for
feedback of any kind: complaints, constructive criticism, corrections, and suggestions
for what else to include. If you have any comments on these notes, or also if you just
happen to find them useful, please let me know (blau@itp.unibe.ch).
Part I: Towards the Einstein Equations
First of all, let us ask the question why we should not be happy with the classical
Newtonian description of gravity. Well, for one, this theory is not Lorentz invariant,
postulating an action at a distance and an instantaneous propagation of the gravitational
field to every point in space. This is something that Einstein had just successfully
exorcised from other aspects of physics, and clearly Newtonian gravity had to be revised
as well.
It is then immediately clear that what would have to replace Newton’s theory is some-
thing rather more complicated. The reason for this is that, according to Special Rela-
tivity, mass is just another form of energy. But then, since gravity couples to masses,
in a relativistically invariant theory gravity will also have to couple to energy. In par-
ticular, therefore, gravity would have to couple to gravitational energy, i.e. to itself. As
a consequence, the new gravitational field equations will, unlike Newton’s, have to be
non-linear: the field of the sum of two masses cannot equal the sum of the gravitational
fields of the two masses because it should also take into account the gravitational energy
of the two-body system.
But now, having realised that Newton’s theory cannot be the final word on the issue,
how does one go about finding a better theory?
I will first very briefly discuss (and then dismiss) what at first sight may appear to be
the most natural and simple approach to formulating a relativistic theory of gravity,
namely the simple replacement of Newton’s field equation
$$ \Delta\phi = 4\pi G_N\, \rho \qquad (1.1) $$
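for the gravitational potential φ given a mass density ρ (with $G_N$ denoting, here and
throughout, Newton’s constant, i.e. the gravitational coupling constant) by its
relativistically covariant version
$$ \Box\phi = 4\pi G_N\, \rho , \qquad (1.2) $$
where $\Box = \eta^{ab}\partial_a\partial_b = -\partial_t^2 + \Delta$ (in units with c = 1) is the d’Alembertian.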
While this looks promising, something can’t be quite right about this equation. We
already know (from Special Relativity) that ρ is not a scalar but rather the 00-component
of a tensor, the energy-momentum tensor, so if actually ρ appears on the right-hand
side, φ cannot be a scalar, while if φ is a scalar something needs to be done to fix the
right-hand side.
Turning first to the latter possibility, one option that suggests itself is to replace ρ by
the trace $T = T^\alpha{}_\alpha$ of the energy-momentum tensor. This is by definition / construction
a scalar, and it will agree with ρ in the non-relativistic limit. Thus a first attempt at
fixing the above equation might look like
$$ \Box\phi = 4\pi G_N\, T . \qquad (1.3) $$
This is certainly an attractive equation, but it definitely has the drawback that it is
too linear. Recall from the discussion above that the universality of gravity (coupling
to all forms of matter) and the equivalence of mass and energy lead to the conclusion
that gravity should couple to gravitational energy, invariably predicting non-linear
(self-interacting) equations for the gravitational field. However, the left-hand side could be
such that it only reduces to $\Box$ or $\Delta$ of the Newtonian potential in the Newtonian limit
of weak stationary fields. Thus a second attempt at fixing the above equation might
look like
$$ \Box\,\Phi(\phi) = 4\pi G_N\, T , \qquad (1.4) $$
where Φ(φ) is some non-linear function of φ that reduces to φ for weak fields.
Such a scalar relativistic theory of gravity and variants thereof were proposed and
studied among others by Abraham, Mie, and Nordstrøm. As it stands, this field equation
appears to be perfectly consistent (and it may be interesting to discuss if/how the
Einstein equivalence principle, which will put us on our route towards metrics and
space-time curvature, is realised in such a theory). However, regardless of this, this
theory is incorrect simply because it is ruled out experimentally. The easiest way to see
this (with hindsight) is to note that the energy-momentum tensor of Maxwell theory
(5.32) is traceless, and thus the above equation would predict no coupling of gravity to
the electromagnetic field, in particular to light; hence in such a theory there would be
no deflection of light by the sun etc.¹
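The tracelessness of the Maxwell energy-momentum tensor used in this argument can be checked numerically. What follows is a minimal pure-Python sketch. It assumes Heaviside-Lorentz units with c = 1, the signature of (1.12), and the common conventions $F_{0i} = -E_i$, $F_{ij} = \epsilon_{ijk}B_k$ and $T^{ab} = F^{ac}F^b{}_c - \tfrac14 \eta^{ab}F_{cd}F^{cd}$. These may differ by signs from the form (5.32) used later in the notes. The trace vanishes for any antisymmetric F regardless, because $\eta_{ab}\eta^{ab} = 4$ in four dimensions. All function names are hypothetical.

```python
# Minimal sketch: trace of the Maxwell energy-momentum tensor (Heaviside-Lorentz units, c = 1).
# Assumed conventions (one common choice, not taken from the notes):
#   eta = diag(-1, +1, +1, +1),  F_{0i} = -E_i,  F_{ij} = eps_{ijk} B_k,
#   T^{ab} = F^{ac} F^{b}_{c} - (1/4) eta^{ab} F_{cd} F^{cd}.

ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal entries of eta_ab (identical to those of eta^ab)


def field_tensor(E, B):
    """Lower-index field strength F_ab built from electric and magnetic 3-vectors."""
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[0][i + 1], F[i + 1][0] = -E[i], E[i]
    F[1][2], F[2][1] = B[2], -B[2]
    F[2][3], F[3][2] = B[0], -B[0]
    F[3][1], F[1][3] = B[1], -B[1]
    return F


def raise_both(F):
    """F^ab = eta^ac eta^bd F_cd; eta is diagonal, so this only flips signs."""
    return [[ETA[a] * ETA[b] * F[a][b] for b in range(4)] for a in range(4)]


def maxwell_T(E, B):
    """Upper-index energy-momentum tensor T^ab of the electromagnetic field."""
    F_low = field_tensor(E, B)
    F_up = raise_both(F_low)
    invariant = sum(F_low[c][d] * F_up[c][d] for c in range(4) for d in range(4))
    T = [[0.0] * 4 for _ in range(4)]
    for a in range(4):
        for b in range(4):
            # F^{ac} F^{b}_{c} = sum_c F^{ac} F^{bc} eta_cc
            first = sum(F_up[a][c] * F_up[b][c] * ETA[c] for c in range(4))
            eta_up_ab = ETA[a] if a == b else 0.0
            T[a][b] = first - 0.25 * eta_up_ab * invariant
    return T


def trace(T):
    """T^a_a = eta_ab T^ab."""
    return sum(ETA[a] * T[a][a] for a in range(4))


if __name__ == "__main__":
    E, B = (1.0, -2.0, 0.5), (0.3, 0.7, -1.1)
    T = maxwell_T(E, B)
    print(trace(T))   # zero up to rounding, for any E and B
    print(T[0][0])    # energy density (|E|^2 + |B|^2) / 2, here about 3.52
```

The check of $T^{00}$ against $\tfrac12(|E|^2 + |B|^2)$ is included only to confirm that the assumed conventions produce a sensible tensor. The vanishing trace is the point relevant to scalar gravity.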
The other possibility to render (1.2) consistent is the, a priori perhaps much less
compelling, option to think of φ and $\Delta\phi$ or $\Box\phi$ not as scalars but as (00)-components of
some tensor, in which case one could try to salvage (1.2) by promoting it to a tensorial
equation
$$ \{\text{Some tensor generalising } \Delta\phi\}_{\alpha\beta} \sim 4\pi G_N\, T_{\alpha\beta} . \qquad (1.5) $$
This is indeed the form of the field equations for gravity (the Einstein equations) we
will ultimately be led to (see section 10.4), but Einstein arrived at this in a completely
different, and much more insightful, way.
Let us now, very briefly and in a streamlined way, try to retrace Einstein’s thoughts
which, as we will see, will lead us rather quickly to the geometric picture of gravity
sketched in the Introduction. He approached this by thinking about three related issues,
¹ For more on the history and properties of scalar theories of gravity see the review by D. Giulini,
What is (not) wrong with scalar gravity?, arXiv:gr-qc/0611100.
1. the equivalence principle of Special Relativity;
As regards the first issue, let me just recall that Special Relativity postulates a preferred
class of inertial frames, namely those travelling at constant velocity relative to each other. But
this raises the questions (which I will just raise and not attempt to answer) what is special
about constant velocities and, more fundamentally, velocities constant with respect to
what? Some absolute space? The background of the stars? . . . ?
Regarding the second issue, recall that in Newtonian theory, classical mechanics, there
are two a priori independent concepts of mass: inertial mass mi , which accounts for
the resistance against acceleration, and gravitational mass mg which is the mass gravity
couples to. Now it is an important empirical fact that the inertial mass of a body is equal
to its gravitational mass. This is usually paraphrased as ‘all bodies fall at the same rate
in a gravitational field’. This realisation, at least with this clarity, is usually attributed
to Galileo (it is not true, though, that Galileo dropped objects from the leaning tower
of Pisa to test this - he used an inclined plane, a water clock and a pendulum).
Figure 1: An experimenter and his two stones freely floating somewhere in outer space,
i.e. in the absence of forces.
Figure 2: Constant acceleration upwards mimics the effect of a gravitational field: ex-
perimenter and stones drop to the bottom of the box.
2. Now assume (Figure 2) that somebody on the outside suddenly pulls the box up
with a constant acceleration. Then of course, our friend will be pressed to the
bottom of the elevator with a constant force and he will also see his stones drop
to the floor.
3. Now consider (Figure 3) this same box brought into a constant gravitational field.
Then again, he will be pressed to the bottom of the elevator with a constant force
and he will see his stones drop to the floor. With no experiment inside the elevator
can he decide if this is actually due to a gravitational field or due to the fact that
somebody is pulling the elevator upwards.
Figure 3: The effect of a constant gravitational field: indistinguishable for our experi-
menter from that of a constant acceleration in Figure 2.
Thus our first lesson is that, indeed, locally the effects of acceleration and gravity
are indistinguishable.
4. Now consider somebody cutting the cable of the elevator (Figure 4). Then the
elevator will fall freely downwards but, as in Figure 1, our experimenter and his
stones will float as in the absence of gravity.
Thus lesson number two is that, locally, the effect of gravity can be eliminated
by going to a freely falling reference frame (or coordinate system). This should
not come as a surprise. In the Newtonian theory, if the free fall in a constant
gravitational field is described by the equation
$$ \ddot{\vec x} = \vec g , $$
then in terms of the coordinates $\vec\xi = \vec x - \tfrac12 \vec g\, t^2$ one has
$$ \ddot{\vec\xi} = 0 , $$
and the effect of gravity has been eliminated by going to the freely falling coordinate
system ξ. The crucial point here is that in such a reference frame not only our
observer will float freely, but because of the equality of inertial and gravitational
mass he will also observe all other objects obeying the usual laws of motion in the
absence of gravity.
Figure 4: Free fall in a gravitational field has the same effect as no gravitational field
(Figure 1): experimenter and stones float.
Figure 5: The experimenter and his stones in a non-uniform gravitational field: the
stones will approach each other slightly as they fall to the bottom of the elevator.
Figure 6: Experimenter and stones freely falling in a non-uniform gravitational field.
The experimenter floats, so do the stones, but they move closer together, indicating the
presence of some external force.
5. In the above discussion, I have put the emphasis on constant accelerations and
on ‘locally’. To see the significance of this, consider our experimenter with his
elevator in the gravitational field of the earth (Figure 5). This gravitational field
is not constant but spherically symmetric, pointing towards the center of the
earth. Therefore the stones will slightly approach each other as they fall towards
the bottom of the elevator, in the direction of the center of the gravitational field.
Thus, if somebody cuts the cable now and the elevator is again in free fall (Figure
6), our experimenter will float again, so will the stones, but our experimenter will
also notice that the stones move closer together for some reason. He will have to
conclude that there is some force responsible for this.
This is lesson number three: in a non-uniform gravitational field the effects of
gravity cannot be eliminated by going to a freely falling coordinate system. This
is only possible locally, on such scales on which the gravitational field is essentially
constant.
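Lessons two and three can be illustrated with a short Newtonian toy computation. This is a sketch under stated assumptions, not part of the original argument. Earth's $GM \approx 3.986\times 10^{14}\,\mathrm{m^3 s^{-2}}$ and radius $6.371\times 10^{6}$ m are used only as illustrative inputs, and all function names are hypothetical.

- In a uniform field, the coordinate $\xi = x - \tfrac12 g t^2$ has vanishing second differences, i.e. no acceleration.
- In the radial field of a point mass, two stones released side by side at separation d drift together by about $(GM t^2/2R^3)\, d$. This is the effect that cannot be transformed away.

```python
import math

# Illustrative values only (assumed, SI units): Earth's GM and radius.
GM = 3.986e14   # m^3 s^-2
R = 6.371e6     # m


# Lesson two: in a uniform field the freely falling coordinate moves uniformly.
def uniform_fall(x0, v0, g, t):
    """Newtonian free fall x(t) solving x'' = g (g < 0 points 'down')."""
    return x0 + v0 * t + 0.5 * g * t * t


def falling_frame(x, g, t):
    """Coordinate in the freely falling frame, xi = x - g t^2 / 2."""
    return x - 0.5 * g * t * t


# Lesson three: in a radial field, two freely falling stones approach each other.
def radial_acc(x, y):
    """Newtonian acceleration -GM r_vec / r^3 towards a point mass at the origin."""
    r3 = math.hypot(x, y) ** 3
    return -GM * x / r3, -GM * y / r3


def separation_after(d, t_end, dt=1e-3):
    """Distance between two stones released at rest at height R, a distance d apart
    horizontally, after time t_end (velocity Verlet integration)."""
    stones = [[-d / 2, R, 0.0, 0.0], [d / 2, R, 0.0, 0.0]]  # x, y, vx, vy
    for _ in range(int(round(t_end / dt))):
        for s in stones:
            ax, ay = radial_acc(s[0], s[1])
            s[2] += 0.5 * dt * ax          # half kick
            s[3] += 0.5 * dt * ay
            s[0] += dt * s[2]              # drift
            s[1] += dt * s[3]
            ax, ay = radial_acc(s[0], s[1])
            s[2] += 0.5 * dt * ax          # half kick with the new acceleration
            s[3] += 0.5 * dt * ay
    (x1, y1, _, _), (x2, y2, _, _) = stones
    return math.hypot(x2 - x1, y2 - y1)


if __name__ == "__main__":
    g = -9.81
    xi = [falling_frame(uniform_fall(100.0, 2.0, g, t), g, t) for t in (0.0, 1.0, 2.0)]
    print("second difference of xi:", xi[2] - 2 * xi[1] + xi[0])  # ~0: no acceleration
    print("shrinkage after 3 s:", 1.0 - separation_after(1.0, 3.0), "m")
    print("tidal estimate GM t^2 / (2 R^3):", 0.5 * GM * 9.0 / R**3, "m")
```

The shrinkage is tiny (a few micrometres per metre over three seconds), which is why the effect is only noticeable over sufficiently large regions.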
Einstein formalised the outcome of these thought experiments in what is now known as
the Einstein Equivalence Principle, which roughly states that physics in a freely falling
frame in a gravitational field is the same as physics in an inertial frame in Minkowski space
in the absence of gravitation. Two formulations are
to choose a locally inertial (or freely falling) coordinate system such that,
within a sufficiently small region of this point, the laws of nature take the
same form as in unaccelerated Cartesian coordinate systems in the absence
of gravitation. (Weinberg, Gravitation and Cosmology)
and
There are different versions of this principle depending on what precisely one means by
‘the laws of nature’. If one just means the laws of Newtonian (or relativistic) mechanics,
then this principle essentially reduces to the statement that inertial and gravitational
mass are equal. Usually, however, this statement is taken to include also Maxwell’s
theory, quantum mechanics etc. What it asserts in its strong form is that
The power of the above principle lies in the fact that we can combine it with our
understanding of physics in accelerated reference systems to gain insight into the physics
in a gravitational field. Two immediate consequences of this (which are hard to derive
on the basis of Newtonian physics or Special Relativity alone) are
To see the inevitability of the first assertion, imagine a light ray entering the rocket /
elevator in Figure 1 horizontally through a window on the left hand side and exiting
again at the same height through a window on the right. Now imagine, as in Figure
2, accelerating the elevator upwards. Then clearly the light ray that enters on the left
will exit at a lower point of the elevator on the right because the elevator is accelerating
upwards. By the equivalence principle one should observe exactly the same thing in a
constant gravitational field (Figure 3). It follows that in a gravitational field the light
ray is bent downwards, i.e. it experiences a downward acceleration with the (locally
constant) gravitational acceleration g.
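A quick Newtonian estimate shows why this bending is hard to notice in a laboratory-sized elevator. The crossing time is width/c, and during it the floor gains $\tfrac12 g\,(\text{width}/c)^2$ on the ray. This estimate is valid as long as $g\cdot\text{width}/c^2 \ll 1$. The width of 3 m and g = 9.81 m/s² are illustrative assumptions, and the function name is hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s (exact SI value)


def light_drop(width, g):
    """Downward displacement (m) of a horizontal light ray after crossing an elevator of
    the given width (m), as seen inside an elevator accelerating upwards at g (m/s^2).
    Newtonian estimate: the crossing takes t = width / c, the floor gains g t^2 / 2."""
    t = width / C
    return 0.5 * g * t * t


if __name__ == "__main__":
    print(light_drop(3.0, 9.81))                           # about 4.9e-16 m
    print(light_drop(6.0, 9.81) / light_drop(3.0, 9.81))   # about 4: the ray traces a parabola
```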
To understand the second assertion, one can e.g. simply appeal to the so-called “twin-
paradox” of Special Relativity: the accelerated twin is younger than his unaccelerated
inertial sibling. Hence accelerated clocks run slower than inertial clocks. Hence, by
the equivalence principle, clocks in a gravitational field run slower than clocks in the
absence of gravity.
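The twin-paradox statement can be made quantitative with the proper-time formula $d\tau = \sqrt{1 - v^2}\, dt$ (units c = 1, motion along one line). This is the one-dimensional case of (1.26) below. The sketch integrates it with the midpoint rule for hypothetical itineraries. The speed, duration and function names are illustrative assumptions.

```python
import math


def proper_time(velocity, t_total, n=10_000):
    """Proper time tau = int_0^T sqrt(1 - v(t)^2) dt (units c = 1) of a clock moving
    along a line with coordinate velocity v(t), evaluated with the midpoint rule."""
    dt = t_total / n
    return sum(math.sqrt(1.0 - velocity((k + 0.5) * dt) ** 2) * dt for k in range(n))


T = 10.0  # coordinate time between departure and reunion (illustrative value)
V = 0.6   # cruising speed as a fraction of c (illustrative value)


def round_trip(t):
    """Hypothetical itinerary: out at speed V for T/2, then back at speed V
    (idealised instantaneous turnaround)."""
    return V if t < T / 2 else -V


def smooth_trip(t):
    """Hypothetical smooth itinerary v = V sin(2 pi t / T); it also returns home at t = T."""
    return V * math.sin(2 * math.pi * t / T)


if __name__ == "__main__":
    print(proper_time(lambda t: 0.0, T))  # inertial twin: about 10
    print(proper_time(round_trip, T))     # about 10 * sqrt(1 - 0.36) = 8
    print(proper_time(smooth_trip, T))    # also less than 10
```

Both travelling clocks, which have to accelerate in order to return, record less proper time than the inertial one.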
Alternatively, one can imagine two observers at the top and bottom of the elevator,
having identical clocks and sending light signals to each other at regular intervals as
determined by their clocks. Once the elevator accelerates upwards, the observer at
the bottom will receive the signals at a higher rate than he emits them (because he
is accelerating towards the signals he receives), and he will interpret this as his clock
running more slowly than that of the observer at the top. By the equivalence principle,
the same conclusion now applies to two observers at different heights in a gravitational
field. This can also be interpreted in terms of a gravitational red-shift or blue-shift
(photons losing or gaining energy by climbing or falling in a gravitational field), and we
will return to a more quantitative discussion of this effect in section 2.8.
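The elevator argument also gives the size of the effect to first order. Assume the bottom observer is at rest when a signal is emitted at the top. The signal then arrives after $t \approx h/c$, by which time the receiver moves towards it with $v \approx gh/c$. The received frequency is therefore raised by the first-order Doppler factor $v/c = gh/c^2$. The sketch evaluates this for illustrative values (h = 22.5 m, g = 9.81 m/s²). It is a rough estimate, not the quantitative treatment referred to above, and the function name is hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s (exact SI value)


def fractional_frequency_shift(g, h):
    """First-order fractional blue-shift received at the bottom of an elevator of height h
    that accelerates upwards at g: the signal needs t = h / c to arrive, meanwhile the
    receiver gains speed v = g t, and the first-order Doppler shift is v / c = g h / c^2."""
    t = h / C
    v = g * t
    return v / C


if __name__ == "__main__":
    print(fractional_frequency_shift(9.81, 22.5))  # about 2.5e-15
```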
What the equivalence principle tells us is that we can expect to learn something about
the effects of gravitation by transforming the laws of nature (equations of motion) from
an inertial Cartesian coordinate system to other (accelerated, curvilinear) coordinates.
As a first step, we will, in section 1.3 below, discuss the above example of an observer
undergoing constant acceleration in the context of special relativity.
As a preparation for this, and the remainder of the course, this section will provide a
lightning review of the Lorentz-covariant formulation of special relativity, mainly to set
the notation and conventions that will be used throughout, and only to the extent that
it will be used in the following.
1. Minkowski space(-time)
$$ (\xi^a) = (\xi^0 = ct,\; \xi^k = x^k) , \qquad (1.9) $$
where c is the speed of light. Typically in these notes $\xi^a$ will indicate such
a (locally) inertial coordinate system, whereas generic coordinates will be
called $x^\mu$ etc. We will almost always work in units in which c = 1.
(b) Minkowski space is equipped with a prescription for measuring distances,
encoded in a line-element which, in these coordinates, takes the form
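$$ ds^2 = -(d\xi^0)^2 + \sum_k (d\xi^k)^2 . \qquad (1.10) $$
(c) This can be written as
$$ ds^2 = -(d\xi^0)^2 + \sum_k (d\xi^k)^2 \equiv \eta_{ab}\, d\xi^a d\xi^b \qquad (1.11) $$
with metric $(\eta_{ab}) = \mathrm{diag}(-1, +1, +1, +1)$ or, more explicitly,
$$ \eta_{ab} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & +1 & 0 & 0 \\ 0 & 0 & +1 & 0 \\ 0 & 0 & 0 & +1 \end{pmatrix} . \qquad (1.12) $$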
2. Lorentz Transformations
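$$ \xi^a \mapsto \bar\xi^a = L^a{}_b\, \xi^b \qquad (1.13) $$
$$ d\bar s^2 \equiv \eta_{ab}\, d\bar\xi^a d\bar\xi^b = \eta_{ab}\, d\xi^a d\xi^b = ds^2 \quad\Leftrightarrow\quad \eta_{ab}\, L^a{}_c L^b{}_d = \eta_{cd} \qquad (1.14) $$
$$ \bar\xi = L\xi , \qquad L^t \eta L = \eta \qquad (1.15) $$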
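(b) Infinitesimal Lorentz rotations, i.e. Lorentz transformations with L of the
form L = 1 + ω, ω infinitesimal, are characterised by
$$ \omega_{ab} \equiv \eta_{ac}\, \omega^c{}_b = -\omega_{ba} , $$
i.e. by the antisymmetry of ω with its index lowered by η, which follows from (1.14) at first order in ω.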
(c) Poincaré transformations are those affine transformations that leave the Min-
kowski line-element invariant. They are composed of Lorentz transformations
and arbitrary constant translations and thus have the form
$$ \xi^a \mapsto \bar\xi^a = L^a{}_b\, \xi^b + \zeta^a , \qquad (1.19) $$
infinitesimally
$$ \delta\xi^a = \omega^a{}_b\, \xi^b + \epsilon^a . \qquad (1.20) $$
Any two inertial systems in the sense of the equivalence principle of special
relativity are related by a Poincaré transformation.
(a) The Minkowski metric defines the Lorentz (and Poincaré) invariant distance
between two events P and Q,
$$ (\Delta\xi)^2 = \eta_{ab}\, (\xi_P^a - \xi_Q^a)(\xi_P^b - \xi_Q^b) \qquad (1.21) $$
(b) Depending on the sign of $(\Delta\xi)^2$, the two events P, Q are called spacelike,
lightlike (null) or timelike separated,
$$ (\Delta\xi)^2 \;\begin{cases} > 0 & \text{spacelike separated} \\ = 0 & \text{lightlike separated} \\ < 0 & \text{timelike separated} \end{cases} \qquad (1.22) $$
(c) The set of events that are lightlike separated from P defines the lightcone
at P. It consists of two components (joined at P), the future and the past
lightcone, distinguished by the sign of $\xi_Q^0 - \xi_P^0$ (positive for Q on the future
lightcone, $\xi_Q^0 > \xi_P^0$, negative for Q on the past lightcone).
It is called spacelike, lightlike (null) or timelike, depending on the sign of
$\eta_{ab}\, \xi'^a \xi'^b$,
$$ \eta_{ab}\, \xi'^a \xi'^b \;\begin{cases} > 0 & \text{spacelike} \\ = 0 & \text{lightlike} \\ < 0 & \text{timelike} \end{cases} \qquad (1.24) $$
This sign (and hence this classification) depends only on the image of the
curve, not its parametrisation.
(b) A curve whose tangent vector is everywhere timelike is called a timelike curve
(and likewise for lightlike and spacelike curves). A curve whose tangent
vector is everywhere timelike or null (i.e. non-spacelike) is called a causal
curve. Worldlines of massive particles are timelike curves, those of massless
particles (light) are null curves.
(c) A natural Lorentz-invariant parametrisation of timelike curves is provided by
the Lorentz-invariant proper time τ along the curves,
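$$ \xi^a = \xi^a(\tau) , \qquad (1.25) $$
with
$$ c\, d\tau = \sqrt{-ds^2} = \sqrt{-\eta_{ab}\, d\xi^a d\xi^b} = \sqrt{-\eta_{ab}\, \xi'^a \xi'^b}\; d\lambda \quad\Rightarrow\quad \eta_{ab}\, \frac{d\xi^a(\tau)}{d\tau}\, \frac{d\xi^b(\tau)}{d\tau} = -c^2 . \qquad (1.26) $$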
Likewise spacelike curves are naturally parametrised by proper distance ds.
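The derivative with respect to proper time will be denoted by an overdot,
$$ \dot\xi^a(\tau) = \frac{d}{d\tau}\, \xi^a(\tau) . \qquad (1.27) $$
Under the Lorentz transformations (1.13) one has
$$ \dot{\bar\xi}^a(\tau) = \frac{d}{d\tau}\, \bar\xi^a(\tau) = \frac{\partial\bar\xi^a}{\partial\xi^b}\, \frac{d}{d\tau}\, \xi^b(\tau) = L^a{}_b\, \dot\xi^b(\tau) . \qquad (1.28) $$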
These are the prototypes of what are called Lorentz vectors or, more gener-
ally, Lorentz tensors.
5. Lorentz Vectors
(a) Lorentz vectors (or 4-vectors) are objects with components $v^a$ which transform
under Lorentz transformations with the matrix $L^a{}_b$ (to be thought of as
the Jacobian of the transformation relating $\bar\xi^a$ and $\xi^a$),
$$ \bar v^a = L^a{}_b\, v^b . \qquad (1.29) $$
(b) $\eta_{ab}$ can be regarded as defining an indefinite scalar product on the space of
Lorentz vectors, and the Minkowski norm $\eta_{ab} v^a v^b$ and the Minkowski scalar
product $\eta_{ab} v^a w^b$ of Lorentz vectors are invariant under Lorentz transformations,
$$ \eta_{ab}\, \bar v^a \bar v^b = \eta_{ab}\, v^a v^b , \qquad \eta_{ab}\, \bar v^a \bar w^b = \eta_{ab}\, v^a w^b . \qquad (1.30) $$
A vector is called spacelike, lightlike (null) or timelike depending on the sign
of its Minkowski norm.
(c) One can identify Minkowski space with its “tangent space”, i.e. with the
vector space $V = \mathbb{R}^{1,3}$ of 4-vectors equipped with the quadratic form or
scalar product ηab with signature (1,3).
(a) Lorentz scalars are objects that are invariant under Lorentz transformations.
Examples are scalar products and norms of Lorentz vectors.
(b) Lorentz covectors are objects $u_a$ that transform under Lorentz transformations
with the (contragredient = transpose inverse) representation, i.e.
$$ \bar u_a = \Lambda_a{}^b\, u_b , \qquad \Lambda_a{}^b = \eta_{ac}\, L^c{}_d\, \eta^{db} , \qquad (1.32) $$
where $\eta^{ab}$ denotes the components of the inverse metric $\eta^{-1}$. In terms of
Λ, the condition that L is a Lorentz transformation (i.e. preserves $\eta_{ab}$) can
evidently be written as
$$ \Lambda_a{}^c\, L^a{}_b = \delta^c{}_b . $$
$$ u : v \in V \mapsto u(v) = u_a v^a \in \mathbb{R} . \qquad (1.34) $$
Linear combinations of (p, q)-tensors are again (p, q)-tensors. Arbitrary prod-
ucts and contractions of Lorentz tensors are again Lorentz tensors (and the
tensor type can be read off from the number and position of the “free” in-
dices).
7. Tensor Fields
(a) Lorentz tensor fields are assignments of Lorentz tensors to each point of
Minkowski space,
$$ T : \xi \mapsto T^{a_1 \ldots a_p}{}_{c_1 \ldots c_q}(\xi) . \qquad (1.37) $$
(b) Given a vector field $V^a(\xi)$, $\eta_{ab} V^a(\xi) V^b(\xi)$ is an example of a scalar field, and
given a scalar field $f(\xi)$, its partial derivatives give a covector field
$$ \Box = \eta^{ab}\, \partial_a \partial_b \qquad (1.40) $$
are Lorentz invariant in the sense that they are satisfied in one inertial system
iff they are satisfied in all inertial systems.
$$ u^a \equiv \dot\xi^a(\tau) \qquad (1.42) $$
(b) The Lorentz-covariant acceleration is the 4-vector
$$ a^c = \frac{d}{d\tau}\, u^c = \frac{d^2}{d\tau^2}\, \xi^c , \qquad (1.44) $$
and the equation of motion of a massive free particle is
$$ a^c = \frac{d^2}{d\tau^2}\, \xi^c(\tau) = 0 . \qquad (1.45) $$
We will study this equation further (in an arbitrary coordinate system) in
section 1.4. For observers with non-zero acceleration it follows from (1.43)
by differentiation that $a^c$ is orthogonal to $u^b$,
$$ a_c u^c \equiv \eta_{cb}\, a^c u^b = 0 , \qquad (1.46) $$
9. Energy-Momentum 4-Vector
(c) The $p^a$ are the components of a Lorentz vector, the energy-momentum 4-vector.
It satisfies the mass-shell relation
$$ \eta_{ab}\, p^a p^b = -m^2 c^2 . $$
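The Lorentz-covariant bookkeeping of items 1 to 9 can be exercised numerically. The following minimal pure-Python sketch uses units with c = 1 and η = diag(−1, +1, +1, +1) as in (1.12). It checks:

- the defining condition (1.14) for a boost,
- the invariance (1.30) of the Minkowski norm,
- the classification (1.22),
- the antisymmetry of a lowered infinitesimal generator,
- the mass-shell relation $\eta_{ab} p^a p^b = -m^2$, assuming $p^a = m u^a$.

All function names are hypothetical helpers, not an existing library.

```python
import math

ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]  # eta_ab = diag(-1, +1, +1, +1), units with c = 1


def boost_x(rapidity):
    """Matrix L^a_b of a boost along xi^1 (row index a, column index b)."""
    ch, sh = math.cosh(rapidity), math.sinh(rapidity)
    return [[ch, sh, 0.0, 0.0],
            [sh, ch, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def apply(L, v):
    """bar v^a = L^a_b v^b, cf. (1.29)."""
    return [sum(L[a][b] * v[b] for b in range(4)) for a in range(4)]


def dot(v, w):
    """Minkowski scalar product eta_ab v^a w^b, cf. (1.30)."""
    return sum(ETA[a][b] * v[a] * w[b] for a in range(4) for b in range(4))


def is_lorentz(L, tol=1e-12):
    """Check eta_ab L^a_c L^b_d = eta_cd componentwise, cf. (1.14)."""
    return all(
        abs(sum(ETA[a][b] * L[a][c] * L[b][d] for a in range(4) for b in range(4))
            - ETA[c][d]) <= tol
        for c in range(4) for d in range(4))


def lowered(omega):
    """omega_ab = eta_ac omega^c_b for an infinitesimal generator omega^a_b."""
    return [[sum(ETA[a][c] * omega[c][b] for c in range(4)) for b in range(4)]
            for a in range(4)]


def classify(dxi, tol=1e-12):
    """Causal character from the sign of eta_ab dxi^a dxi^b, cf. (1.22)."""
    s = dot(dxi, dxi)
    if s > tol:
        return "spacelike"
    if s < -tol:
        return "timelike"
    return "lightlike"


if __name__ == "__main__":
    L = boost_x(0.7)
    print(is_lorentz(L))                                   # True
    v = [2.0, 1.0, 0.0, 0.0]
    print(dot(v, v), dot(apply(L, v), apply(L, v)))        # both -3 up to rounding
    print([classify(x) for x in ([2, 1, 0, 0], [1, 1, 0, 0], [1, 2, 0, 0])])
    K = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # boost generator
    w = lowered(K)
    print(all(w[a][b] == -w[b][a] for a in range(4) for b in range(4)))  # antisymmetric
    m, y = 1.5, 0.4
    p = [m * math.cosh(y), m * math.sinh(y), 0.0, 0.0]    # p^a = m u^a (assumed)
    print(dot(p, p))                                       # -m^2 = -2.25 up to rounding
```

The classification line prints `['timelike', 'lightlike', 'spacelike']`. The tolerance `tol` absorbs floating-point rounding, so "lightlike" here means null to numerical precision.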
We return to the issue discussed in the context of the Einstein equivalence principle in
section 1.1, namely physics as experienced by an observer undergoing constant accel-
eration (as a precursor to studying this observer in a genuine gravitational field), now
specifically within the framework of special relativity.
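Specialising (1.46) to an observer accelerating in the $\xi^1$-direction (so that in the
momentary restframe of this observer one has $u^a = (1, 0, 0, 0)$, $a^a = (0, a, 0, 0)$), we will say
that the observer undergoes constant acceleration if a is time-independent. To
determine the worldline of such an observer, we note that the general solution to (1.43) with
$u^2 = u^3 = 0$,
$$ \eta_{ab}\, u^a u^b = -(u^0)^2 + (u^1)^2 = -1 , \qquad (1.56) $$
is
$$ u^0 = \cosh F(\tau) , \qquad u^1 = \sinh F(\tau) \qquad (1.57) $$
for some function $F(\tau)$. The corresponding acceleration $a^a = \dot u^a$, i.e. $(a^0, a^1) = \dot F\, (\sinh F, \cosh F)$, has norm
$$ a^2 \equiv \eta_{ab}\, a^a a^b = \dot F^2 , \qquad (1.59) $$
so that constant acceleration corresponds to $\dot F = a$, i.e. $F(\tau) = a\tau$ up to a constant.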
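The constant-acceleration solution can be checked numerically. Take $F(\tau) = a\tau$ and choose, as an assumption about the integration constants, the initial position $(\xi^0, \xi^1) = (0, 1/a)$. Integrating (1.57) then gives $\xi^0 = \sinh(a\tau)/a$, $\xi^1 = \cosh(a\tau)/a$. The sketch differentiates this worldline by central finite differences and verifies (1.56), the orthogonality (1.46), the norm (1.59) and the hyperbola $(\xi^1)^2 - (\xi^0)^2 = 1/a^2$. It uses units with c = 1, and the function names are hypothetical.

```python
import math


def worldline(tau, a):
    """Worldline obtained by integrating (1.57) with F(tau) = a*tau, i.e.
    u^0 = cosh(a tau), u^1 = sinh(a tau), from the assumed start (xi^0, xi^1) = (0, 1/a)."""
    return math.sinh(a * tau) / a, math.cosh(a * tau) / a


def mink2(v, w):
    """2d Minkowski product with eta = diag(-1, +1) (c = 1)."""
    return -v[0] * w[0] + v[1] * w[1]


def velocity_and_acceleration(tau, a, h=1e-3):
    """u^a and a^a from central finite differences of the worldline."""
    p, c, m = worldline(tau + h, a), worldline(tau, a), worldline(tau - h, a)
    u = [(p[i] - m[i]) / (2 * h) for i in range(2)]
    acc = [(p[i] - 2 * c[i] + m[i]) / h**2 for i in range(2)]
    return u, acc


if __name__ == "__main__":
    a = 0.5
    for tau in (0.0, 1.0, 2.0, 3.0):
        x0, x1 = worldline(tau, a)
        u, acc = velocity_and_acceleration(tau, a)
        # expected: u.u = -1, acc.u = 0, acc.acc = a^2 = 0.25, x1^2 - x0^2 = 1/a^2 = 4
        print(tau, mink2(u, u), mink2(acc, u), mink2(acc, acc), x1**2 - x0**2)
```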
We can now ask the question what the Minkowski metric or line-element looks like in
the restframe of such an observer. Note that one cannot expect this to be again the
constant Minkowski metric $\eta_{ab}$: the transformation to an accelerated reference system,
while certainly allowed in special relativity, is not a Lorentz transformation, while $\eta_{ab}$
is, by definition, invariant under Lorentz transformations.
Figure 7: The Rindler metric: Rindler coordinates (η, ρ) cover the first quadrant $\xi^1 > |\xi^0|$.
Indicated are lines of constant ρ (hyperbolas, worldlines of constantly accelerating
observers) and lines of constant η (straight lines through the origin). The quadrant is
bounded by the lightlike lines $\xi^0 = \pm\xi^1 \Leftrightarrow \eta = \pm\infty$. A stationary observer reaches and
crosses the line η = ∞ in finite proper time $\tau = \xi^0$. (Plot labels: worldline of a stationary observer; η constant; ρ constant.)
We are thus looking for coordinates that are adapted to these accelerated observers in
the same way that the inertial coordinates are adapted to stationary observers ($\xi^0$ is
proper time, and the spatial components $\xi^i$ remain constant). In other words, we seek a
coordinate transformation $(\xi^0, \xi^1) \to (\eta, \rho)$ such that the worldlines of these accelerated
observers are characterised by ρ = constant (this is what we mean by restframe, the
observer stays at a fixed value of ρ) and ideally such that then η is proportional to the
proper time of the observer.
It is now easy to see that in terms of these new coordinates the 2-dimensional Minkowski metric $ds^2 = -(d\xi^0)^2 + (d\xi^1)^2$ (we are now suppressing, here and in the remainder of this subsection, the transverse spectator dimensions 2 and 3) takes the form
$$ds^2 = -\rho^2 d\eta^2 + d\rho^2 \,. \tag{1.64}$$
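• The Rindler coordinates ρ and η are obviously in some sense hyperbolic (Lorentzian) analogues of polar coordinates ($x = r\cos\phi$, $y = r\sin\phi$, $ds^2 = dx^2 + dy^2 = dr^2 + r^2 d\phi^2$). In particular, since
$$(\xi^1)^2 - (\xi^0)^2 = \rho^2 \,, \qquad \frac{\xi^0}{\xi^1} = \tanh\eta \,, \tag{1.65}$$
the lines of constant ρ are hyperbolas and the lines of constant η are straight lines through the origin (Figure 7).
• Along the worldline of an observer with constant $\rho = \rho_0$ one has $d\tau = \rho_0\, d\eta$, so that his proper time parametrised path is $\xi^0(\tau) = \rho_0\sinh(\tau/\rho_0)$, $\xi^1(\tau) = \rho_0\cosh(\tau/\rho_0)$, i.e. a hyperbola traversed with constant acceleration $a = 1/\rho_0$.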
Even though (1.64) is just the metric of Minkowski space-time, written in accelerated coordinates, this metric exhibits a number of interesting features that are prototypical of more general metrics that one encounters in general relativity:
1. First of all, we notice that the coefficients of the line element (metric) in (1.64) are no longer constant (space-time independent). Since in the case of constant acceleration we are just describing a “fake” gravitational field, this dependence on the coordinates is such that it can be completely and globally eliminated by passing to appropriate new coordinates (namely inertial Minkowski coordinates). Since, by the equivalence principle, locally an observer cannot distinguish between a fake and a “true” gravitational field, this suggests that a “true” gravitational field can be described in terms of a space-time coordinate dependent line-element $ds^2 = g_{\alpha\beta}(x)dx^\alpha dx^\beta$ where the dependence on the coordinates $x^\alpha$ is such that it cannot be eliminated globally by a suitable choice of coordinates.
3. The above coordinates do not just fail at ρ = 0, they actually fail to cover large parts of Minkowski space. Thus the next lesson is that, given a metric in some coordinate system, one has to investigate whether the space-time described in this way needs to be extended beyond the range of the original coordinates. One way to analyse this question (which we will make extensive use of in sections 14 and 15 when trying to understand and come to terms with black holes) is to study light rays or the worldlines of freely falling (inertial) observers. In the present example, it is evident that a stationary inertial observer (at fixed value of $\xi^1$, say, with $\xi^0 = \tau$ his proper time) will “discover” that η = +∞ is not the end of the world (he crosses this line at finite proper time $\tau = \xi^1$) and that Minkowski space continues (at the very least) into the quadrant $\xi^0 > |\xi^1|$.
$$|\partial_\eta|^2 = (\xi^0)^2 - (\xi^1)^2 \,. \tag{1.70}$$
the boundary $\xi^1 = |\xi^0|$ of that region. As we will discuss in section 15.7, such a Killing horizon also happens to be one of the characteristic properties of a black hole.
6. Finally we note that there is a large region of Minkowski space that is “invisible” to the constantly accelerated observers. While a static observer will eventually receive information from any event anywhere in space-time (his past lightcone will eventually cover all of Minkowski space...), the past lightcone of one of the Rindler accelerated observers (whose worldlines asymptote to the lightcone direction $\xi^0 = \xi^1$) will asymptotically only cover one half of Minkowski space, namely the region $\xi^0 < \xi^1$. Thus any event above the line $\xi^0 = \xi^1$ will forever be invisible to this class of observers. Such an observer-dependent horizon has some similarities with the event horizon characterising a black hole (section 13).
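Two of the features listed above, the coordinate-dependent coefficients of the Rindler metric and the norm (1.70) of $\partial_\eta$, can be checked directly from the coordinate transformation. The sketch below assumes that the Rindler coordinates are related to the inertial ones by $\xi^0 = \rho\sinh\eta$, $\xi^1 = \rho\cosh\eta$, which is what (1.65) amounts to in the quadrant $\xi^1 > |\xi^0|$, and pulls back the 2d Minkowski metric, $g_{\mu\nu} = \eta_{ab}(\partial\xi^a/\partial x^\mu)(\partial\xi^b/\partial x^\nu)$, using a finite-difference Jacobian. One should find $g = \mathrm{diag}(-\rho^2, 1)$, i.e. $ds^2 = -\rho^2 d\eta^2 + d\rho^2$, and $g_{\eta\eta} = |\partial_\eta|^2$ should equal $(\xi^0)^2 - (\xi^1)^2$. Helper names and the step size are illustrative, not taken from the notes.

```python
import math

ETA = [[-1.0, 0.0], [0.0, 1.0]]  # 2d Minkowski metric eta_ab, signature (-,+)


def xi_of_rindler(x):
    """Inertial coordinates (xi^0, xi^1) as functions of Rindler coordinates x = (eta, rho)."""
    eta, rho = x
    return [rho * math.sinh(eta), rho * math.cosh(eta)]


def jacobian(f, x, h=1e-6):
    """J[a][mu] = d f^a / d x^mu, estimated by central differences."""
    n = len(x)
    J = [[0.0] * n for _ in range(len(f(x)))]
    for mu in range(n):
        xp = list(x)
        xm = list(x)
        xp[mu] += h
        xm[mu] -= h
        fp, fm = f(xp), f(xm)
        for a in range(len(fp)):
            J[a][mu] = (fp[a] - fm[a]) / (2 * h)
    return J


def pullback_metric(f, x, eta=ETA):
    """g_{mu nu}(x) = eta_ab (d xi^a/d x^mu)(d xi^b/d x^nu)."""
    J = jacobian(f, x)
    n, d = len(x), len(eta)
    return [[sum(eta[a][b] * J[a][mu] * J[b][nu] for a in range(d) for b in range(d))
             for nu in range(n)] for mu in range(n)]


point = [0.7, 2.5]                      # (eta, rho)
g = pullback_metric(xi_of_rindler, point)
xi0, xi1 = xi_of_rindler(point)
print(g)                                # approx [[-6.25, 0], [0, 1]] = diag(-rho^2, 1)
print(g[0][0], xi0 ** 2 - xi1 ** 2)     # |d_eta|^2 versus (1.70): both approx -6.25
```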
We now consider the effect of arbitrary (general) coordinate transformations on the laws
of special relativity and the geometry of Minkowski space(-time). Let us see what the
equation of motion (1.45) of a free massive particle looks like when written in some other
(non-inertial, accelerating) coordinate system. It is extremely useful for bookkeeping
purposes and for avoiding algebraic errors to use different kinds of indices for different
coordinate systems. Thus we will call the new coordinates $x^\mu(\xi^b)$ and not, say, $x^a(\xi^b)$.
First of all, proper time should not depend on which coordinates we use to describe the motion of the particle (the particle couldn’t care less what coordinates we experimenters or observers use). [By the way: this is the best way to resolve the so-called ‘twin paradox’: it doesn’t matter which reference system you use; the accelerating twin in the rocket will always be younger than her brother when they meet again.] Thus
$$d\tau^2 = -\eta_{ab}\, d\xi^a d\xi^b = -\eta_{ab}\frac{\partial\xi^a}{\partial x^\mu}\frac{\partial\xi^b}{\partial x^\nu}\, dx^\mu dx^\nu \,. \tag{1.71}$$
Here
$$J^a_\mu(x) = \frac{\partial\xi^a}{\partial x^\mu} \tag{1.72}$$
is the Jacobi matrix associated to the coordinate transformation $\xi^a = \xi^a(x^\mu)$, and we will make the assumption that (locally) this matrix is non-degenerate, thus has an inverse $J^\mu_a(x)$ or $J^\mu_a(\xi)$ which is the Jacobi matrix associated to the inverse coordinate transformation $x^\mu = x^\mu(\xi^a)$,
$$J^\mu_a = \frac{\partial x^\mu}{\partial\xi^a} \,, \qquad J^\mu_a J^a_\nu = \delta^\mu_\nu \,, \qquad J^a_\mu J^\mu_b = \delta^a_b \,. \tag{1.73}$$
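We see that in the new coordinates, proper time and distance are no longer measured by the Minkowski metric, but by
$$d\tau^2 = -g_{\mu\nu}(x)\, dx^\mu dx^\nu \,,$$
where the metric tensor (or metric for short) $g_{\mu\nu}(x)$ is
$$g_{\mu\nu}(x) = \eta_{ab}\frac{\partial\xi^a}{\partial x^\mu}\frac{\partial\xi^b}{\partial x^\nu} \,. \tag{1.75}$$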
The fact that the Minkowski metric written in the coordinates xµ in general depends
on x should not come as a surprise - after all, this also happens when one writes the
Euclidean metric in spherical coordinates etc.
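The twin-paradox remark above can be made quantitative with (1.71) written in inertial coordinates, where $d\tau = \sqrt{(d\xi^0)^2 - (d\xi^1)^2}$ along a timelike worldline. As a sketch, the code compares a stay-at-home twin at $\xi^1 = 0$ with a hypothetical travelling twin following $\xi^1(t) = A\sin(\pi t/T)$ (speed at most $A\pi/T < 1$); both start and end at the same events. The path, the numbers $A$ and $T$, and the chord-sum integrator are illustrative choices only.

```python
import math


def proper_time(path, t_end, steps=20000):
    """Proper time along xi^a(t) = (t, path(t)), 0 <= t <= t_end, summing
    d tau = sqrt((d xi^0)^2 - (d xi^1)^2) over short straight chords."""
    dt = t_end / steps
    total = 0.0
    for k in range(steps):
        t0, t1 = k * dt, (k + 1) * dt
        d_xi0 = t1 - t0
        d_xi1 = path(t1) - path(t0)
        total += math.sqrt(d_xi0 ** 2 - d_xi1 ** 2)  # requires |d xi^1| < d xi^0 (timelike)
    return total


T, A = 10.0, 2.0  # reunion time and travel amplitude; maximal speed A*pi/T is about 0.63


def home(t):
    return 0.0


def rocket(t):
    return A * math.sin(math.pi * t / T)


tau_home = proper_time(home, T)
tau_rocket = proper_time(rocket, T)
print(tau_home, tau_rocket)  # the travelling twin accumulates less proper time
```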
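It is easy to check, using (1.73), that the inverse metric, which we will denote by $g^{\mu\nu}$ (so that $g_{\mu\nu}g^{\nu\lambda} = \delta_\mu^\lambda$), is given by
$$g^{\mu\nu}(x) = \eta^{ab}\frac{\partial x^\mu}{\partial\xi^a}\frac{\partial x^\nu}{\partial\xi^b} \,. \tag{1.77}$$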
We will have much more to say about the metric below and, indeed, throughout this
course.
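As a sketch of (1.75) and (1.77) working together, the code below takes the Rindler map $\xi^0 = \rho\sinh\eta$, $\xi^1 = \rho\cosh\eta$ (with inverse $\eta = \mathrm{artanh}(\xi^0/\xi^1)$, $\rho = \sqrt{(\xi^1)^2 - (\xi^0)^2}$ in the quadrant $\xi^1 > |\xi^0|$), builds $g_{\mu\nu}$ from the Jacobian $\partial\xi^a/\partial x^\mu$ and $g^{\mu\nu}$ from the inverse Jacobian $\partial x^\mu/\partial\xi^a$, both by finite differences, and checks that $g_{\mu\nu}g^{\nu\lambda} = \delta_\mu^\lambda$. The choice of example and all helper names are illustrative.

```python
import math

ETA = [[-1.0, 0.0], [0.0, 1.0]]      # eta_ab
ETA_INV = [[-1.0, 0.0], [0.0, 1.0]]  # eta^ab (numerically the same matrix)


def xi_of_x(x):
    """Rindler (eta, rho) -> inertial (xi^0, xi^1)."""
    eta, rho = x
    return [rho * math.sinh(eta), rho * math.cosh(eta)]


def x_of_xi(xi):
    """Inverse map, valid in the quadrant xi^1 > |xi^0|."""
    return [math.atanh(xi[0] / xi[1]), math.sqrt(xi[1] ** 2 - xi[0] ** 2)]


def jacobian(f, p, h=1e-6):
    """J[i][j] = d f^i / d p^j by central differences (square case)."""
    n = len(p)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pp, pm = list(p), list(p)
        pp[j] += h
        pm[j] -= h
        fp, fm = f(pp), f(pm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


n = 2
x = [0.4, 1.7]
d_xi_d_x = jacobian(xi_of_x, x)           # indexed [a][mu]
d_x_d_xi = jacobian(x_of_xi, xi_of_x(x))  # indexed [mu][a]

# (1.75): g_{mu nu} = eta_ab (d xi^a/d x^mu)(d xi^b/d x^nu)
g = [[sum(ETA[a][b] * d_xi_d_x[a][mu] * d_xi_d_x[b][nu] for a in range(n) for b in range(n))
      for nu in range(n)] for mu in range(n)]
# (1.77): g^{mu nu} = eta^ab (d x^mu/d xi^a)(d x^nu/d xi^b)
g_inv = [[sum(ETA_INV[a][b] * d_x_d_xi[mu][a] * d_x_d_xi[nu][b] for a in range(n) for b in range(n))
          for nu in range(n)] for mu in range(n)]
product = [[sum(g[mu][nu] * g_inv[nu][lam] for nu in range(n)) for lam in range(n)]
           for mu in range(n)]
print(product)  # close to the identity matrix
```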
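Turning now to the equation of motion, the usual rules for a change of variables give
$$\frac{d}{d\tau}\xi^a = \frac{\partial\xi^a}{\partial x^\mu}\frac{dx^\mu}{d\tau} \,, \tag{1.78}$$
where $\partial\xi^a/\partial x^\mu$ is an invertible matrix at every point. Differentiating once more, one finds
$$\frac{d^2}{d\tau^2}\xi^a = \frac{\partial\xi^a}{\partial x^\mu}\frac{d^2x^\mu}{d\tau^2} + \frac{\partial^2\xi^a}{\partial x^\nu\partial x^\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau} = \frac{\partial\xi^a}{\partial x^\mu}\left[\frac{d^2x^\mu}{d\tau^2} + \frac{\partial x^\mu}{\partial\xi^b}\frac{\partial^2\xi^b}{\partial x^\nu\partial x^\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau}\right] . \tag{1.79}$$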
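Thus, since the matrix appearing outside the square bracket is invertible, in terms of the coordinates $x^\mu$ the equation of motion $d^2\xi^a/d\tau^2 = 0$, or the equation for a straight line in Minkowski space, becomes
$$\frac{d^2x^\mu}{d\tau^2} + \frac{\partial x^\mu}{\partial\xi^a}\frac{\partial^2\xi^a}{\partial x^\nu\partial x^\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau} = 0 \,. \tag{1.80}$$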
The second term in this equation, which we will write as
$$\frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau} = 0 \,, \tag{1.81}$$
where
$$\Gamma^\mu_{\nu\lambda} = \frac{\partial x^\mu}{\partial\xi^a}\frac{\partial^2\xi^a}{\partial x^\nu\partial x^\lambda} \,, \tag{1.82}$$
represents a pseudo-force or fictitious gravitational force (like a centrifugal force or the Coriolis force) that arises whenever one describes inertial motion in non-inertial coordinates. This term is absent for linear coordinate transformations $\xi^a(x^\mu) = M^a_\mu x^\mu$. In particular, this means that the equation (1.45) is invariant under Lorentz transformations, as it should be.
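To see (1.82) at work, the sketch below evaluates $\Gamma^\mu_{\nu\lambda} = (\partial x^\mu/\partial\xi^a)(\partial^2\xi^a/\partial x^\nu\partial x^\lambda)$ for the Rindler map $\xi^0 = \rho\sinh\eta$, $\xi^1 = \rho\cosh\eta$ with $x^\mu = (\eta, \rho)$, writing the elementary first and second derivatives out by hand. A short hand computation gives the non-zero symbols $\Gamma^\rho_{\eta\eta} = \rho$ and $\Gamma^\eta_{\eta\rho} = \Gamma^\eta_{\rho\eta} = 1/\rho$, which is what the code should reproduce; the function name is hypothetical.

```python
import math


def rindler_christoffels(eta, rho):
    """Gamma[mu][nu][lam] from (1.82) for xi = (rho sinh eta, rho cosh eta), x = (eta, rho).
    Index 0 stands for eta, index 1 for rho."""
    ch, sh = math.cosh(eta), math.sinh(eta)
    # inverse Jacobian d x^mu / d xi^a, indexed [mu][a]
    dx_dxi = [[ch / rho, -sh / rho],
              [-sh, ch]]
    # second derivatives d^2 xi^a / d x^nu d x^lam, indexed [a][nu][lam]
    d2xi = [[[rho * sh, ch], [ch, 0.0]],   # xi^0 = rho sinh(eta)
            [[rho * ch, sh], [sh, 0.0]]]   # xi^1 = rho cosh(eta)
    return [[[sum(dx_dxi[mu][a] * d2xi[a][nu][lam] for a in range(2))
              for lam in range(2)] for nu in range(2)] for mu in range(2)]


G = rindler_christoffels(0.3, 2.0)
print(G[1][0][0], G[0][0][1], G[0][1][0])  # approx 2.0 (= rho), 0.5 and 0.5 (= 1/rho)
```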
While (1.81) looks a bit complicated, it has one fundamental and attractive feature
which will also make it the prototype of the kind of equations that we will be looking
for in general. This feature is its covariance under general coordinate transformations,
which means that the equation takes the same form in any coordinate system. Indeed,
this covariance is in some sense tautologically true since the coordinate system $\{x^\mu\}$
that we have chosen is indeed arbitrary. However, it is instructive to see how this comes
about by explicitly transforming (1.81) from one coordinate system to another.
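Thus consider transforming (1.45) to another coordinate system $\{y^{\mu'}\}$. Following the same steps as above, one arrives at the $y$-version of (1.79), namely
$$\frac{d^2}{d\tau^2}\xi^a = \frac{\partial\xi^a}{\partial y^{\mu'}}\left[\frac{d^2y^{\mu'}}{d\tau^2} + \frac{\partial y^{\mu'}}{\partial\xi^b}\frac{\partial^2\xi^b}{\partial y^{\nu'}\partial y^{\lambda'}}\frac{dy^{\nu'}}{d\tau}\frac{dy^{\lambda'}}{d\tau}\right] . \tag{1.83}$$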
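Comparing with (1.79), and writing $\Gamma^{\mu'}_{\nu'\lambda'}$ for the $y$-version of (1.82), the two square brackets are related by
$$\frac{d^2y^{\mu'}}{d\tau^2} + \Gamma^{\mu'}_{\nu'\lambda'}\frac{dy^{\nu'}}{d\tau}\frac{dy^{\lambda'}}{d\tau} = \frac{\partial y^{\mu'}}{\partial x^\mu}\left[\frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau}\right] . \tag{1.85}$$
In particular, since the matrix $\partial y^{\mu'}/\partial x^\mu$ is assumed to be invertible, we reach the conclusion that the left hand side of (1.85) is zero if and only if the term in square brackets on the right hand side is zero,
$$\frac{d^2y^{\mu'}}{d\tau^2} + \Gamma^{\mu'}_{\nu'\lambda'}\frac{dy^{\nu'}}{d\tau}\frac{dy^{\lambda'}}{d\tau} = 0 \quad\Leftrightarrow\quad \frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau} = 0 \,. \tag{1.86}$$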
This is what is meant by the statement that the equation takes the same form in any
coordinate system. We see that in this case this is achieved by having the equation
transform in a particularly simple way under coordinate transformations, namely as a
tensor.
Remarks:
1. We will see below that, in general, that is for an arbitrary metric, not necessarily related to the Minkowski metric by a coordinate transformation, the equation for a geodesic, i.e. a path that extremises proper time or proper distance, takes the form (1.81), where the (pseudo-)force terms can be expressed in terms of the first derivatives of the metric as
$$\Gamma^\mu_{\nu\lambda} = \tfrac{1}{2}g^{\mu\rho}\left(g_{\rho\nu,\lambda} + g_{\rho\lambda,\nu} - g_{\nu\lambda,\rho}\right) . \tag{1.87}$$
[You should check for yourself that, plugging the metric (1.75) into this equation, you find the result (1.82).]
2. In this sense, the metric plays the role of a potential for the pseudo-force and,
more generally, for the gravitational force, and will thus come to play the role
of the fundamental dynamical variable of gravity. Also, in this more general
context the $\Gamma^\mu_{\nu\lambda}$ are referred to as the Christoffel symbols of the metric or (in more
fancy terminology) the components of the Levi-Civita connection, a privileged
connection on the tangent (or frame) bundle of a manifold equipped with a metric.
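The check suggested in Remark 1 can also be done numerically, as a sketch. Using the standard expression $\Gamma^\mu_{\nu\lambda} = \tfrac12 g^{\mu\rho}(g_{\rho\nu,\lambda} + g_{\rho\lambda,\nu} - g_{\nu\lambda,\rho})$ with finite-difference derivatives, applied to the Rindler metric $ds^2 = -\rho^2 d\eta^2 + d\rho^2$ (the pull-back (1.75) of the 2d Minkowski metric under $\xi^0 = \rho\sinh\eta$, $\xi^1 = \rho\cosh\eta$), one should recover the values $\Gamma^\rho_{\eta\eta} = \rho$ and $\Gamma^\eta_{\eta\rho} = 1/\rho$ that (1.82) gives for the same map. The 2×2 inverse and all names are illustrative.

```python
def rindler_metric(x):
    """g_{mu nu} for ds^2 = -rho^2 d eta^2 + d rho^2, with x = (eta, rho)."""
    rho = x[1]
    return [[-rho ** 2, 0.0], [0.0, 1.0]]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix (enough for this two-dimensional example)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def christoffels_from_metric(metric, x, h=1e-5):
    """Gamma^mu_{nu lam} = 1/2 g^{mu r} (g_{r nu, lam} + g_{r lam, nu} - g_{nu lam, r})."""
    n = len(x)
    dg = []  # dg[s][i][j] = d g_{ij} / d x^s by central differences
    for s in range(n):
        xp, xm = list(x), list(x)
        xp[s] += h
        xm[s] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)] for i in range(n)])
    g_inv = inverse_2x2(metric(x))
    return [[[0.5 * sum(g_inv[mu][r] * (dg[lam][r][nu] + dg[nu][r][lam] - dg[r][nu][lam])
                        for r in range(n))
              for lam in range(n)] for nu in range(n)] for mu in range(n)]


G = christoffels_from_metric(rindler_metric, [0.3, 2.0])
print(G[1][0][0], G[0][0][1])  # approx 2.0 (= rho) and 0.5 (= 1/rho), as from (1.82)
```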
Above we saw that the motion of free particles in Minkowski space in curvilinear coordinates is described in terms of a modified metric, $g_{\mu\nu}$, and a force term $\Gamma^\mu_{\nu\lambda}$ representing the ‘pseudo-force’ on the particle. Thus the Einstein Equivalence Principle suggests that an appropriate description of true gravitational fields is in terms of a metric tensor $g_{\mu\nu}(x)$ (and its associated Christoffel symbols) which can only locally be related to the Minkowski metric via a suitable coordinate transformation (to locally inertial coordinates). Thus our starting point will now be a space-time equipped with some metric $g_{\mu\nu}(x)$, which we will assume to be symmetric and non-degenerate, i.e.
$$g_{\mu\nu} = g_{\nu\mu} \,, \qquad \det(g_{\mu\nu}) \neq 0 \,.$$
A space-time equipped with a metric tensor $g_{\mu\nu}(x)$ is called a metric space-time or (pseudo-)Riemannian space-time. Here “Riemannian” usually refers to a space equipped with a positive-definite metric (all eigenvalues positive), while pseudo-Riemannian (or Lorentzian) refers to a space-time with a metric with one negative and 3 (or 27, or whatever) positive eigenvalues. The metric encodes the information on how to measure (spatial and temporal) distances, as well as areas, volumes etc., via the associated line element
$$ds^2 = g_{\mu\nu}(x)\, dx^\mu dx^\nu \,. \tag{1.89}$$
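Examples:
1. Examples that you may be familiar with are the Euclidean metrics on $\mathbb{R}^2$ or $\mathbb{R}^3$, but written in polar or spherical coordinates,
$$ds^2 = dr^2 + r^2 d\phi^2 \,, \qquad ds^2 = dr^2 + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right) ,$$
etc.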
We will discuss such metrics in detail later on in the context of cosmology, sections
18-21.
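To illustrate that the line element $ds^2 = g_{\mu\nu}dx^\mu dx^\nu$ is what measures lengths, the sketch below integrates $\int\sqrt{g_{\mu\nu}\,\dot x^\mu\dot x^\nu}\,d\sigma$ along a curve given in coordinates. For the polar metric $dr^2 + r^2d\phi^2$ and the circle $r = R$ it should give $2\pi R$; for the unit two-sphere metric $d\theta^2 + \sin^2\theta\,d\phi^2$ and a circle of latitude $\theta = \theta_0$ it should give $2\pi\sin\theta_0$. The midpoint-rule integrator and the function names are illustrative choices.

```python
import math


def curve_length(metric, curve, velocity, s0, s1, steps=10000):
    """Length of x^mu(s), s0 <= s <= s1: integral of sqrt(g_{mu nu} x'^mu x'^nu) ds (midpoint rule).
    metric(x) returns g_{mu nu}; velocity(s) returns dx^mu/ds."""
    ds = (s1 - s0) / steps
    total = 0.0
    for k in range(steps):
        s = s0 + (k + 0.5) * ds
        x, v = curve(s), velocity(s)
        g = metric(x)
        n = len(x)
        total += math.sqrt(sum(g[i][j] * v[i] * v[j] for i in range(n) for j in range(n))) * ds
    return total


def polar_metric(x):
    """x = (r, phi): ds^2 = dr^2 + r^2 dphi^2."""
    return [[1.0, 0.0], [0.0, x[0] ** 2]]


def sphere_metric(x):
    """x = (theta, phi): ds^2 = dtheta^2 + sin^2(theta) dphi^2."""
    return [[1.0, 0.0], [0.0, math.sin(x[0]) ** 2]]


R, THETA0 = 3.0, 0.6
circle = curve_length(polar_metric, lambda s: (R, s), lambda s: (0.0, 1.0), 0.0, 2 * math.pi)
latitude = curve_length(sphere_metric, lambda s: (THETA0, s), lambda s: (0.0, 1.0), 0.0, 2 * math.pi)
print(circle, 2 * math.pi * R)                    # both approx 18.85
print(latitude, 2 * math.pi * math.sin(THETA0))   # both approx 3.548
```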
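Remarks:
1. A metric determines a geometry, but different metrics may well determine the same geometry, namely those metrics which are just related by coordinate transformations. In particular, distances should not depend on which coordinate system is used. Hence, changing coordinates from the $\{x^\mu\}$ to new coordinates $\{y^{\mu'}(x^\mu)\}$ and demanding that
$$g_{\mu\nu}(x)\, dx^\mu dx^\nu = g_{\mu'\nu'}(y)\, dy^{\mu'}dy^{\nu'} \,, \tag{1.97}$$
one finds that the components of the metric in the two coordinate systems are related by
$$g_{\mu'\nu'}(y) = \frac{\partial x^\mu}{\partial y^{\mu'}}\frac{\partial x^\nu}{\partial y^{\nu'}}\, g_{\mu\nu}(x) \,.$$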
2. Objects which transform in such a nice and simple way under coordinate transfor-
mations are known as tensors - the metric is an example of what is known as (and
we will get to know as) a covariant symmetric rank two tensor. We will study
tensors in much more detail and generality later, starting in section 3.
3. One point to note about this transformation behaviour is that if in one coordinate
system the metric tensor has one negative and three positive eigenvalues (as in a
locally inertial coordinate system), then the same will be true in any other coordi-
nate system (even though the eigenvalues themselves will in general be different) -
this statement should be familiar from linear algebra as Sylvester’s law of inertia.
5. By drawing the coordinate grid determined by the metric tensor, one can convince oneself that in general a metric space or space-time need not be flat. Example: the coordinate grid of the metric $d\theta^2 + \sin^2\theta\, d\phi^2$ cannot be drawn in flat space but can be drawn on the surface of a two-sphere, because the infinitesimal parallelograms described by $ds^2$ degenerate to triangles not just at θ = 0 (as would also be the case for the flat metric $ds^2 = dr^2 + r^2 d\phi^2$ in polar coordinates at r = 0), but also at θ = π.
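The transformation rule for the metric implied by (1.97), $g_{\mu'\nu'}(y) = (\partial x^\mu/\partial y^{\mu'})(\partial x^\nu/\partial y^{\nu'})\,g_{\mu\nu}(x)$, and Sylvester's law of inertia from Remark 3 can be seen side by side in a small example. As a sketch, assuming light-cone coordinates $u = t - x$, $v = t + x$ on 2d Minkowski space (an example chosen here for illustration), the code finds $ds^2 = -du\,dv$: the eigenvalues change from $\pm1$ to $\pm\tfrac12$, but there is still exactly one negative and one positive eigenvalue.

```python
import math


def transform_metric(g, d_x_d_y):
    """g'_{ab} = (d x^m / d y^a)(d x^k / d y^b) g_{mk}, with d_x_d_y[m][a] = d x^m / d y^a."""
    n = len(g)
    return [[sum(d_x_d_y[m][a] * d_x_d_y[k][b] * g[m][k] for m in range(n) for k in range(n))
             for b in range(n)] for a in range(n)]


def symmetric_2x2_eigenvalues(m):
    """Eigenvalues (smaller, larger) of a real symmetric 2x2 matrix."""
    mean = 0.5 * (m[0][0] + m[1][1])
    radius = math.hypot(0.5 * (m[0][0] - m[1][1]), m[0][1])
    return (mean - radius, mean + radius)


eta = [[-1.0, 0.0], [0.0, 1.0]]   # Minkowski metric in coordinates (t, x)
# t = (u + v)/2 and x = (v - u)/2, so d x^m / d y^a with y = (u, v) is:
d_x_d_y = [[0.5, 0.5],
           [-0.5, 0.5]]
g_lc = transform_metric(eta, d_x_d_y)
print(g_lc)                              # [[0.0, -0.5], [-0.5, 0.0]], i.e. ds^2 = -du dv
print(symmetric_2x2_eigenvalues(eta))    # (-1.0, 1.0)
print(symmetric_2x2_eigenvalues(g_lc))   # (-0.5, 0.5): same signature, different eigenvalues
```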
At this point the question naturally arises how one can tell whether a given (per-
haps complicated looking) metric is just the flat metric written in other coordinates or
whether it describes a genuinely curved space-time. We will see later that there is an
object, the Riemann curvature tensor, constructed from the second derivatives of the
metric, which has the property that all of its components vanish if and only if the metric
is a coordinate transform of the flat space Minkowski metric. Thus, given a metric, by
calculating its curvature tensor one can decide if the metric is just the flat metric in
disguise or not. The curvature tensor will be introduced in section 8, and the above
statement will be established in section 9.2.
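We have seen that the equation for a straight line in Minkowski space, written in arbitrary coordinates, is
$$\frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau} = 0 \,, \tag{1.99}$$
where the pseudo-force term $\Gamma^\mu_{\nu\lambda}$ is given by (1.82). We have also seen in (1.87) (provided you checked this) that $\Gamma^\mu_{\nu\lambda}$ can be expressed in terms of the metric (1.75) as
$$\Gamma^\mu_{\nu\lambda} = \tfrac{1}{2}g^{\mu\rho}\left(g_{\rho\nu,\lambda} + g_{\rho\lambda,\nu} - g_{\nu\lambda,\rho}\right) . \tag{1.100}$$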
This gravitational force term is fictitious since it can globally be transformed away by going to the global inertial coordinates $\xi^a$. The equivalence principle suggests, however, that in general the equation for the worldline of a massive particle, i.e. a path that extremises proper time, in a true gravitational field is also of the above form.
We will now confirm this by deriving the equations for a timelike path that extremises proper time from a variational principle. These paths will be referred to as (timelike) geodesics. We will briefly return below to the (delicate) issue of to what extent these can be regarded as world lines of actual massive particles.
Recall first of all from special relativity that the Lorentz-invariant action of a free massive particle with mass m is
$$S_0 = -m\int d\tau \,, \tag{1.101}$$
with $d\tau^2 = -\eta_{ab}\,d\xi^a d\xi^b$. We can adopt the same action in the present setting, with the coordinate-invariant Lagrangian based on the proper time
$$d\tau^2 = -g_{\mu\nu}(x)\, dx^\mu dx^\nu \,.$$
Of course m drops out of the variational equations (as it should by the equivalence principle) and we will therefore ignore m in the following.
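In order to perform the variation, it is useful to introduce an arbitrary auxiliary parameter λ in the initial stages of the calculation via
$$d\tau = \left(-g_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}\right)^{1/2} d\lambda \,, \tag{1.104}$$
and to write
$$\int d\tau = \int (d\tau/d\lambda)\, d\lambda = \int\left(-g_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}\right)^{1/2} d\lambda \,. \tag{1.105}$$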
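We now vary the path, $x^\mu(\lambda) \to x^\mu(\lambda) + \delta x^\mu(\lambda)$, keeping the end-points fixed; after the variation we set λ = τ, and will denote the τ-derivatives by $\dot x^\mu(\tau)$. By the standard variational procedure one then finds
$$\begin{aligned}
\delta\int d\tau &= \tfrac{1}{2}\int d\lambda\left(-g_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}\right)^{-1/2}\left(-\delta g_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda} - 2g_{\mu\nu}\frac{d\delta x^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}\right)\\
&= \tfrac{1}{2}\int d\tau\left[-g_{\mu\nu,\lambda}\dot x^\mu\dot x^\nu\delta x^\lambda + 2g_{\mu\nu}\ddot x^\nu\delta x^\mu + 2g_{\mu\nu,\lambda}\dot x^\lambda\dot x^\nu\delta x^\mu\right]\\
&= \int d\tau\left[g_{\mu\nu}\ddot x^\nu + \tfrac{1}{2}\left(g_{\mu\nu,\lambda} + g_{\mu\lambda,\nu} - g_{\nu\lambda,\mu}\right)\dot x^\nu\dot x^\lambda\right]\delta x^\mu \,.
\end{aligned}\tag{1.107}$$
Here the factor of 2 in the first equality is a consequence of the symmetry of the metric, the second equality follows from an integration by parts, the third from relabelling the indices in one term and using the symmetry in the indices of $\dot x^\lambda\dot x^\nu$ in the other.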
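Thus, requiring (1.107) to vanish for arbitrary $\delta x^\mu$ and multiplying the term in square brackets by the inverse metric (cf. (1.100)), we see that indeed the equations for a timelike geodesic in an arbitrary gravitational field are
$$\frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau} = 0 \,. \tag{1.110}$$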
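As a numerical illustration of (1.110) in non-inertial coordinates, the sketch below integrates the geodesic equation in the Rindler coordinates $x^\mu = (\eta, \rho)$, using the Christoffel symbols $\Gamma^\rho_{\eta\eta} = \rho$ and $\Gamma^\eta_{\eta\rho} = \Gamma^\eta_{\rho\eta} = 1/\rho$ of $ds^2 = -\rho^2d\eta^2 + d\rho^2$ (a direct computation from the metric), and maps the result back to $\xi^0 = \rho\sinh\eta$, $\xi^1 = \rho\cosh\eta$. If the equation indeed describes free fall, the resulting worldline should be a straight line in the inertial coordinates. The fourth-order Runge-Kutta integrator, initial data and step size are illustrative choices.

```python
import math


def rindler_geodesic_rhs(state):
    """state = (eta, rho, eta_dot, rho_dot); (1.110) for ds^2 = -rho^2 d eta^2 + d rho^2,
    using Gamma^rho_{eta eta} = rho and Gamma^eta_{eta rho} = Gamma^eta_{rho eta} = 1/rho."""
    eta, rho, eta_dot, rho_dot = state
    return (eta_dot, rho_dot, -2.0 * eta_dot * rho_dot / rho, -rho * eta_dot ** 2)


def rk4_step(f, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(y)."""
    k1 = f(y)
    k2 = f(tuple(a + 0.5 * h * b for a, b in zip(y, k1)))
    k3 = f(tuple(a + 0.5 * h * b for a, b in zip(y, k2)))
    k4 = f(tuple(a + h * b for a, b in zip(y, k3)))
    return tuple(a + h / 6.0 * (b1 + 2 * b2 + 2 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4))


def to_inertial(eta, rho):
    return (rho * math.sinh(eta), rho * math.cosh(eta))


rho_dot0 = 0.3
eta_dot0 = math.sqrt(1.0 + rho_dot0 ** 2)  # -rho^2 eta_dot^2 + rho_dot^2 = -1 at rho = 1
state = (0.0, 1.0, eta_dot0, rho_dot0)     # start at xi = (0, 1)
h, steps = 1e-3, 1000                      # integrate up to tau = 1
max_dev = 0.0
for k in range(1, steps + 1):
    state = rk4_step(rindler_geodesic_rhs, state, h)
    tau = k * h
    xi0, xi1 = to_inertial(state[0], state[1])
    # at (eta, rho) = (0, 1) the Jacobian d xi/d x is the identity, so the initial
    # inertial velocity is (eta_dot0, rho_dot0) and free motion is the line below
    max_dev = max(max_dev, abs(xi0 - eta_dot0 * tau), abs(xi1 - (1.0 + rho_dot0 * tau)))
print(max_dev)  # tiny: the geodesic in Rindler coordinates is a straight line in (xi^0, xi^1)
```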
Remarks:
1. By definition, massive test particles are those particles that satisfy the above
geodesic equation, i.e. that follow timelike geodesics in space-time. However, it
needs to be borne in mind that this notion of a test particle is a fiction, in particular
as it neglects the backreaction, i.e. the change in the background gravitational field
due to the mass of the particle. Moreover, real particles either have a finite extent
(in which case this finite size should play a role in their equations of motion) or
are considered to be point-like. However, the notion of a point-like particle is
extremely dangerous and delicate in general relativity: as we will see later, if a
given total mass is concentrated in a sufficiently small region of space-time (and
“point-like” certainly qualifies as “sufficiently small”), then one will end up with a
black hole rather than with the description of a particle. The correct description
of point particles in general relativity is a complicated issue and an active area of
research.2
2. One can also consider spacelike paths that extremise (minimise) proper distance, by using the action
$$S_0 \sim \int ds \tag{1.111}$$
where
$$ds^2 = g_{\mu\nu}(x)\, dx^\mu dx^\nu \,. \tag{1.112}$$
One should also consider massless particles, whose worldlines will be null (or
lightlike) paths. However, in that case one can evidently not use proper time or
proper distance, since these are by definition zero along a null path, $d\tau^2 = 0$. We
will come back to this special case, and a unified description of the massive and
massless case, below (section 2.1). In all cases, we will refer to the resulting paths
as geodesics. If required, we add the qualifier “timelike”, “spacelike” or “null”,
and this is meaningful and unambiguous since, as we will see below, a geodesic
that is initially timelike will always remain timelike etc.
We will have much more to say about geodesics and variational principles in section 2.
The Christoffel symbols play the role of the gravitational force term, and thus in this
sense the components of the metric play the role of the gravitational potential. These
Christoffel symbols play an important role not just in the geodesic equation but, as we
will see later on, more generally in the definition of a covariant derivative operator and
the construction of the curvature tensor.
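Two elementary but important properties of the Christoffel symbols are that they are symmetric in the second and third indices,
$$\Gamma^\mu_{\nu\lambda} = \Gamma^\mu_{\lambda\nu} \,,$$
and that, symmetrising $\Gamma_{\mu\nu\lambda} \equiv g_{\mu\rho}\Gamma^\rho_{\nu\lambda}$ over the first pair of indices, one finds
$$\Gamma_{\mu\nu\lambda} + \Gamma_{\nu\mu\lambda} = g_{\mu\nu,\lambda} \,.$$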
Knowing how the metric transforms under coordinate transformations, we can now also
determine how the Christoffel symbols (1.100) and the geodesic equation transform. A
straightforward but not particularly inspiring calculation (which you should nevertheless
do) gives
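$$\Gamma^{\mu'}_{\nu'\lambda'} = \Gamma^\mu_{\nu\lambda}\frac{\partial y^{\mu'}}{\partial x^\mu}\frac{\partial x^\nu}{\partial y^{\nu'}}\frac{\partial x^\lambda}{\partial y^{\lambda'}} + \frac{\partial y^{\mu'}}{\partial x^\mu}\frac{\partial^2 x^\mu}{\partial y^{\nu'}\partial y^{\lambda'}} \,. \tag{1.115}$$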
Thus, $\Gamma^\mu_{\nu\lambda}$ transforms inhomogeneously under coordinate transformations. If only the first term on the right hand side were present, then $\Gamma^\mu_{\nu\lambda}$ would be a tensor. However, the second term is there precisely to compensate for the fact that $\ddot x^\mu$ is also not a tensor; the combined geodesic equation transforms in a nice way under coordinate transformations. Namely, after another not terribly inspiring calculation (which you should nevertheless also do at least once in your life), one finds
$$\frac{d^2y^{\mu'}}{d\tau^2} + \Gamma^{\mu'}_{\nu'\lambda'}\frac{dy^{\nu'}}{d\tau}\frac{dy^{\lambda'}}{d\tau} = \frac{\partial y^{\mu'}}{\partial x^\mu}\left(\frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau}\right) . \tag{1.116}$$
This is analogous to the result (1.85) that we had obtained before in Minkowski space, and the same remarks about covariance and tensors etc. apply.
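A concrete instance of the inhomogeneous term in (1.115): in Cartesian coordinates $x^\mu = (x, y)$ on the Euclidean plane all $\Gamma^\mu_{\nu\lambda}$ vanish, yet in polar coordinates $y^{\mu'} = (r, \phi)$ the second term alone produces $\Gamma^r_{\phi\phi} = -r$ and $\Gamma^\phi_{r\phi} = \Gamma^\phi_{\phi r} = 1/r$. The sketch below (derivatives written out by hand for this particular map, function name hypothetical) evaluates $(\partial y^{\mu'}/\partial x^\mu)(\partial^2x^\mu/\partial y^{\nu'}\partial y^{\lambda'})$; since a tensor that vanishes in one coordinate system vanishes in all, this is exactly the non-tensorial behaviour described above.

```python
import math


def polar_christoffels_from_cartesian(r, phi):
    """Inhomogeneous term of (1.115), (d y^i / d x^m)(d^2 x^m / d y^j d y^k), for
    x^m = (r cos phi, r sin phi) and y^i = (r, phi); in Cartesian coordinates Gamma = 0."""
    c, s = math.cos(phi), math.sin(phi)
    dy_dx = [[c, s],              # d r   / d(x, y)
             [-s / r, c / r]]     # d phi / d(x, y)
    d2x = [[[0.0, -s], [-s, -r * c]],   # second derivatives of x = r cos(phi) w.r.t. (r, phi)
           [[0.0, c], [c, -r * s]]]     # second derivatives of y = r sin(phi) w.r.t. (r, phi)
    return [[[sum(dy_dx[i][m] * d2x[m][j][k] for m in range(2))
              for k in range(2)] for j in range(2)] for i in range(2)]


G = polar_christoffels_from_cartesian(2.0, 0.8)
print(G[0][1][1], G[1][0][1], G[1][1][0])  # approx -2.0 (= -r), 0.5 and 0.5 (= 1/r)
```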
Remarks:
1. That the geodesic equation transforms in this simple way (namely as a vector)
should not come as a surprise. We obtained this equation as a variational equation.
The Lagrangian itself is a scalar (invariant under coordinate transformations), and
the variation δxµ is (i.e. transforms like) a vector. Putting these pieces together,
one finds the desired result. [This comment may become less mysterious after the
discussion of tensors and Lie derivatives . . . ]
2. There is of course a very good physical reason for why the force term in the
geodesic equation (quadratic in the 4-velocities) is not tensorial. This simply
reflects the equivalence principle that locally, at a point (or in a sufficiently small
neighbourhood of a point) you can eliminate the gravitational force by going to
a freely falling (inertial) coordinate system. This would not be possible if the
gravitational force term in the equation of motion for a particle were tensorial.
You may feel that, after a promising start in sections 1.1 and 1.3, the things that we
have done subsequently in sections 1.4 - 1.7 look terribly messy. I agree, indeed they
are! However, I can assure you that things will improve dramatically rather quickly and
that this section 1 is by far the messiest part of the entire lecture notes.
Indeed, the main purpose and benefit of developing tensor calculus in the next couple
of sections is to develop a formalism in the framework of which (among other things)
• one can avoid having to deal explicitly with objects that transform in complicated
ways under coordinate transformations
• the transformation behaviour of any object is manifest (and does not have to be
checked)
This formalism is simple, elegant and efficient and will then allow us to make rapid
progress towards describing the dynamics in and of a gravitational field in a way com-
patible with the Einstein equivalence principle.
2 The Physics and Geometry of Geodesics
Before discussing and establishing the relation between these two actions, let us first verify that $S_1$ really leads to the same equations of motion as $S_0$. Either by direct variation of the action, or by using the Euler-Lagrange equations
$$\frac{d}{d\tau}\frac{\partial L}{\partial\dot x^\mu} - \frac{\partial L}{\partial x^\mu} = 0 \,, \tag{2.4}$$
one finds that the action is indeed extremised by the solutions to the equation
$$\frac{d^2x^\mu}{d\lambda^2} + \Gamma^\mu_{\nu\rho}\frac{dx^\nu}{d\lambda}\frac{dx^\rho}{d\lambda} = 0 \,. \tag{2.5}$$
This is identical to the geodesic equation derived from $S_0$ (with λ → τ, the proper time).
Now let us turn to the relationship between, and comparison of, the two actions $S_0$ and $S_1$. The first thing to notice is that $S_0$ is manifestly parametrisation-invariant, i.e. independent of how one parametrises the path. The reason for this is that $d\tau = (d\tau/d\lambda)d\lambda$ is evidently independent of λ. This is not the case for $S_1$, which changes under reparametrisations or, put more positively, singles out a preferred parametrisation (or, more precisely, class of parametrisations). We will discuss this in somewhat more detail in subsection 2.2.
Thus, what is the relation (if any) between the two actions? In order to explain this, it will be useful to introduce an additional field e(λ) (i.e. in addition to the $x^\alpha(\lambda)$), and a “master action” (or parent action) S which we can relate to both $S_0$ and $S_1$. Consider the action
$$S[x,e] = \tfrac{1}{2}\int d\lambda\left(e(\lambda)^{-1}g_{\alpha\beta}\frac{dx^\alpha}{d\lambda}\frac{dx^\beta}{d\lambda} - m^2 e(\lambda)\right) = \int d\lambda\left(e(\lambda)^{-1}L - \tfrac{1}{2}m^2 e(\lambda)\right) . \tag{2.6}$$
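The crucial property of this action is that it is parametrisation invariant provided that one declares e(λ) to transform appropriately. It is easy to see that under a transformation $\lambda \to \bar\lambda = f(\lambda)$, with
$$\bar x^\alpha(\bar\lambda) = x^\alpha(\lambda) \,,$$
the action S is invariant provided that e(λ) transforms such that e(λ)dλ is invariant, i.e.
$$\bar e(\bar\lambda)\, d\bar\lambda \overset{!}{=} e(\lambda)\, d\lambda \quad\Rightarrow\quad \bar e(\bar\lambda) = e(\lambda)/f'(\lambda) \,. \tag{2.8}$$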
The first thing to note now is that, courtesy of this parametrisation invariance, we
can always choose a “gauge” in which e(λ) = 1. With this choice, the action S[x, e]
manifestly reduces to the action S1 [x] modulo an irrelevant field-independent constant,
$$S[x, e=1] = \int d\lambda\, L - \tfrac{1}{2}m^2\int d\lambda = S_1[x] + \text{const.} \tag{2.9}$$
Alternatively, instead of fixing the gauge, we can try to eliminate e(λ) (which appears purely algebraically, i.e. without derivatives, in the action) by its equation of motion. Varying S[x,e] with respect to e(λ), one finds the constraint
$$g_{\alpha\beta}\frac{dx^\alpha}{d\lambda}\frac{dx^\beta}{d\lambda} + m^2 e(\lambda)^2 = 0 \,. \tag{2.10}$$
This is just the usual mass-shell condition in disguise. It suggests that a better gauge fixing than e(λ) = 1 would have been $e(\lambda) = m^{-1}$. However, the sole effect of this would have been to replace L in (2.9) by mL,
$$S[x, e=m^{-1}] = m\int d\lambda\, L - \tfrac{1}{2}m\int d\lambda = m S_1[x] + \text{const.} \tag{2.11}$$
In any case, for a massive particle, $m^2 \neq 0$, one can also solve (2.10) for e(λ),
$$e(\lambda) = m^{-1}\sqrt{-g_{\alpha\beta}\frac{dx^\alpha}{d\lambda}\frac{dx^\beta}{d\lambda}} \,. \tag{2.12}$$
Using this to eliminate e(λ) from the action, one finds
$$S[x, e = m^{-1}\sqrt{\ldots}\,] = -m\int d\lambda\sqrt{-g_{\alpha\beta}\frac{dx^\alpha}{d\lambda}\frac{dx^\beta}{d\lambda}} = -m\int d\tau = S_0[x] \,. \tag{2.13}$$
Thus for $m^2 \neq 0$ we find exactly the original action (integral of the proper time) $S_0[x]$ (and since we haven’t touched or fixed the parametrisation invariance, no wonder that $S_0$ is parametrisation invariant). Thus we have elucidated the common origin of $S_0$ and $S_1$ for a massive particle.
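The elimination of e(λ) can be checked pointwise, as a small sketch: for a fixed timelike velocity (so that $g_{\alpha\beta}\dot x^\alpha\dot x^\beta < 0$; the value used below is made up for illustration), the integrand $e^{-1}L - \tfrac12 m^2 e$ of (2.6) should be stationary at $e = m^{-1}\sqrt{-g_{\alpha\beta}\dot x^\alpha\dot x^\beta}$, as in (2.12), with stationary value $-m\sqrt{-g_{\alpha\beta}\dot x^\alpha\dot x^\beta}$, the integrand of $S_0$, as in (2.13). The code locates the stationary point by bisection on the derivative; the function names are hypothetical.

```python
import math


def master_integrand(e, L, m):
    """e^{-1} L - m^2 e / 2: the integrand of (2.6), with L = (1/2) g_ab xdot^a xdot^b."""
    return L / e - 0.5 * m ** 2 * e


def stationary_e(L, m, lo=1e-8, hi=1e8, iters=200):
    """Root of d/de (L/e - m^2 e/2) = -L/e^2 - m^2/2 by bisection; needs L < 0 (timelike)."""
    def derivative(e):
        return -L / e ** 2 - 0.5 * m ** 2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if derivative(mid) > 0:   # the derivative decreases with e, so the root lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


m = 2.0
g_xdot_xdot = -0.36        # g_ab xdot^a xdot^b for some timelike velocity (illustrative value)
L = 0.5 * g_xdot_xdot
e_star = stationary_e(L, m)
print(e_star, math.sqrt(-g_xdot_xdot) / m)                           # (2.12): both approx 0.3
print(master_integrand(e_star, L, m), -m * math.sqrt(-g_xdot_xdot))  # (2.13): both approx -1.2
```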
Remarks:
1. An added benefit of the master action S[x,e] is that it also makes perfect sense for a massless particle. For $m^2 = 0$, the mass shell condition
$$g_{\alpha\beta}\frac{dx^\alpha}{d\lambda}\frac{dx^\beta}{d\lambda} = 0 \tag{2.14}$$
says that these particles move along null lines, and the action reduces to
$$S[x,e] = \tfrac{1}{2}\int d\lambda\, e(\lambda)^{-1}g_{\alpha\beta}\frac{dx^\alpha}{d\lambda}\frac{dx^\beta}{d\lambda} \,, \tag{2.15}$$
which is parametrisation invariant but can (as in the massive case) be fixed to e(λ) = 1, upon which the action reduces to $S_1[x]$. Thus we see that $S_1[x]$ indeed provides a simple and unified action for both massive and massless particles, and in both cases the resulting equation of motion is the (affinely parametrised) geodesic equation (2.5),
$$\frac{d^2x^\alpha}{d\lambda^2} + \Gamma^\alpha_{\beta\gamma}\frac{dx^\beta}{d\lambda}\frac{dx^\gamma}{d\lambda} = 0 \,. \tag{2.16}$$
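2. One important consequence of (2.16) is that the quantity L is a constant of motion, i.e. constant along the geodesic,
$$\frac{d^2x^\alpha}{d\lambda^2} + \Gamma^\alpha_{\beta\gamma}\frac{dx^\beta}{d\lambda}\frac{dx^\gamma}{d\lambda} = 0 \quad\Rightarrow\quad \frac{d}{d\lambda}\left(g_{\alpha\beta}\frac{dx^\alpha}{d\lambda}\frac{dx^\beta}{d\lambda}\right) = 0 \,. \tag{2.17}$$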
This useful result can be understood and derived in a variety of ways, the least
insightful of which is direct calculation. Nevertheless, this is straightforward and
a good exercise in Γ-ology. An alternative derivation will be given in section 4.7,
using the concept of “parallel transport”.
From the present (action-based) perspective it is most useful to think of this as
the conserved quantity associated (via Noether’s theorem) to the invariance of the
action S1 [x] under translations in λ. Note that evidently S1 [x] has this invariance
(as there is no explicit dependence on λ) and that this invariance is precisely the
residual parametrisation invariance f (λ) = λ + a, f ′ (λ) = 1, that leaves invariant
the “gauge” condition e(λ) = 1. For an infinitesimal constant λ-translation one
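has $\delta x^\alpha(\lambda) = dx^\alpha/d\lambda$ etc., so that
$$\frac{\partial}{\partial\lambda}L = 0 \quad\Rightarrow\quad \delta L = \frac{d}{d\lambda}L = \frac{d}{d\lambda}\left(\frac{\partial L}{\partial(dx^\alpha/d\lambda)}\frac{dx^\alpha}{d\lambda}\right) + \text{Euler-Lagrange} \,. \tag{2.18}$$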
Thus via Noether’s theorem the associated conserved charge for a solution to the Euler-Lagrange equations is the Legendre transform
$$H = \frac{\partial L}{\partial(dx^\alpha/d\lambda)}\frac{dx^\alpha}{d\lambda} - L \tag{2.19}$$
of the Lagrangian (aka the Hamiltonian, once expressed in terms of the momenta). In the case at hand, with the Lagrangian L consisting of a purely quadratic term in the velocities (the $dx^\alpha/d\lambda$), the first term in (2.19) equals 2L, so the Hamiltonian is equal to the Lagrangian, and hence the Lagrangian L itself is conserved,
$$H = L \,, \qquad \frac{d}{d\lambda}L\Big|_{\text{solution}} = 0 \,. \tag{2.20}$$
3. This is as it should be: something that starts off as a massless particle will remain
a massless particle etc. If one imposes the initial condition
$$g_{\mu\nu}\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}\Big|_{\lambda=0} = \epsilon \,, \tag{2.21}$$
then this condition will be satisfied for all λ. In particular, therefore, one can choose ε = ∓1 for timelike (spacelike) geodesics, and λ can then be identified with proper time (proper distance), while the choice ε = 0 sets the initial conditions
appropriate to massless particles (for which λ is then not related to proper time
or proper distance).
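As a numerical sketch of (2.17), the code below integrates the affinely parametrised geodesic equation on the unit two-sphere, $ds^2 = d\theta^2 + \sin^2\theta\,d\phi^2$, whose non-zero Christoffel symbols are $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta$ (a short computation from the metric), using a fixed-step fourth-order Runge-Kutta scheme, and monitors $g_{\alpha\beta}\dot x^\alpha\dot x^\beta = 2L$ along the way. The initial data and step size are arbitrary; the drift should only reflect the integration error.

```python
import math


def sphere_geodesic_rhs(state):
    """state = (theta, phi, theta_dot, phi_dot); geodesic equation on the unit two-sphere."""
    th, ph, th_dot, ph_dot = state
    return (th_dot,
            ph_dot,
            math.sin(th) * math.cos(th) * ph_dot ** 2,             # -Gamma^theta_{phi phi} phi_dot^2
            -2.0 * math.cos(th) / math.sin(th) * th_dot * ph_dot)  # -2 Gamma^phi_{theta phi} th_dot ph_dot


def rk4_step(f, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(y)."""
    k1 = f(y)
    k2 = f(tuple(a + 0.5 * h * b for a, b in zip(y, k1)))
    k3 = f(tuple(a + 0.5 * h * b for a, b in zip(y, k2)))
    k4 = f(tuple(a + h * b for a, b in zip(y, k3)))
    return tuple(a + h / 6.0 * (b1 + 2 * b2 + 2 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4))


def two_L(state):
    """g_{ab} xdot^a xdot^b = theta_dot^2 + sin^2(theta) phi_dot^2 (= 2L)."""
    th, _, th_dot, ph_dot = state
    return th_dot ** 2 + math.sin(th) ** 2 * ph_dot ** 2


state = (1.0, 0.0, 0.4, 1.1)
initial = two_L(state)
drift = 0.0
for _ in range(5000):                       # lambda from 0 to 5 in steps of 1e-3
    state = rk4_step(sphere_geodesic_rhs, state, 1e-3)
    drift = max(drift, abs(two_L(state) - initial))
print(initial, drift)  # the drift stays at the level of the integration error
```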
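To understand the significance of how one parametrises the geodesic, observe that the geodesic equation itself,
$$\ddot x^\mu + \Gamma^\mu_{\nu\lambda}\dot x^\nu\dot x^\lambda = 0 \,, \tag{2.22}$$
is not invariant under a general reparametrisation $\tau \to \sigma = f(\tau)$: in terms of σ it becomes
$$\frac{d^2x^\mu}{d\sigma^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\sigma}\frac{dx^\lambda}{d\sigma} = -\frac{\ddot f}{\dot f^2}\frac{dx^\mu}{d\sigma} \,. \tag{2.24}$$
Thus the geodesic equation retains its form only under affine changes of the proper time parameter τ, $f(\tau) = a\tau + b$, and parameters $\sigma = f(\tau)$ related to τ by such an affine transformation are known as affine parameters.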
From the first variational principle, based on S0 , the term on the right hand side arises
in the calculation of (1.107) from the integration by parts if one does not switch back
from λ to the affine parameter τ . The second variational principle, based on S1 and the
Lagrangian L, on the other hand, always and automatically yields the geodesic equation
in affine form.
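Conversely, if a curve $x^\mu(\sigma)$ satisfies
$$\frac{d^2x^\mu}{d\sigma^2} + \Gamma^\mu_{\nu\lambda}\frac{dx^\nu}{d\sigma}\frac{dx^\lambda}{d\sigma} = \kappa(\sigma)\frac{dx^\mu}{d\sigma} \,, \tag{2.25}$$
for some function κ(σ), we can deduce that this curve is the trajectory of a geodesic, but that it is simply not parametrised by an affine parameter (like proper time in the case of a timelike curve). Comparison of (2.24) and (2.25) shows that, given κ(σ), an affine parameter τ is determined by
$$\kappa(f(\tau)) = -\frac{\ddot f}{\dot f^2} \quad\Leftrightarrow\quad \kappa(\sigma) = \frac{d}{d\sigma}\ln\frac{d\tau}{d\sigma} \tag{2.26}$$
or
$$\frac{d\tau}{d\sigma} = e^{\int^\sigma ds\,\kappa(s)} \,. \tag{2.27}$$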
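Equations (2.26) and (2.27) can be tried out on the simplest case, chosen here purely for illustration: a straight line in flat space, $x = x_0 + v\tau$, traversed with the non-affine parameter $\sigma = e^\tau$, i.e. $x(\sigma) = x_0 + v\ln\sigma$. Then $d^2x/d\sigma^2 = -\sigma^{-1}dx/d\sigma$, so (2.25) holds with Γ = 0 and κ(σ) = −1/σ, and integrating (2.27) (normalised so that dτ/dσ = 1 and τ = 0 at σ = 1) should return τ = ln σ. The sketch reads κ off from finite differences of the curve and integrates with the trapezoidal rule; all names and numbers are illustrative.

```python
import math

X0, V = 2.0, 0.7   # illustrative straight line x = X0 + V * tau


def curve(sigma):
    """The straight line written in terms of the non-affine parameter sigma = exp(tau)."""
    return X0 + V * math.log(sigma)


def kappa(sigma, h=1e-4):
    """kappa(sigma) = x''/x' (Gamma = 0 in flat Cartesian coordinates), read off from (2.25)."""
    d1 = (curve(sigma + h) - curve(sigma - h)) / (2 * h)
    d2 = (curve(sigma + h) - 2 * curve(sigma) + curve(sigma - h)) / h ** 2
    return d2 / d1


def affine_parameter(sigma_end, steps=2000):
    """tau(sigma_end) from (2.27), d tau/d sigma = exp(int_1^sigma kappa), with tau(1) = 0
    and d tau/d sigma = 1 at sigma = 1 (a normalisation choice); trapezoidal rule throughout."""
    ds = (sigma_end - 1.0) / steps
    int_kappa, tau, prev_rate = 0.0, 0.0, 1.0
    for k in range(1, steps + 1):
        s_prev, s_next = 1.0 + (k - 1) * ds, 1.0 + k * ds
        int_kappa += 0.5 * (kappa(s_prev) + kappa(s_next)) * ds
        rate = math.exp(int_kappa)
        tau += 0.5 * (prev_rate + rate) * ds
        prev_rate = rate
    return tau


for sigma in (2.0, 5.0):
    print(sigma, affine_parameter(sigma), math.log(sigma))  # last two columns agree closely
```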
In the following, whenever we talk about geodesics we will practically always have in
mind the variational principle based on S1 leading to the geodesic equation (2.16) in
affinely parametrised form.
However, it should be kept in mind that sometimes non-affine parameters appear naturally. For instance, it is occasionally convenient to parametrise timelike geodesics in a geometry with coordinates $x^\alpha = (x^0 = t, x^k)$ not by $x^\alpha = x^\alpha(\tau)$, where τ is the proper time along the geodesic, but rather as $x^k = x^k(t)$. This is the same curve, but described with respect to coordinate time (which could for instance agree with the proper time of some other, perhaps stationary, observer). The curve $t \to (t, x^k(t))$ will not be an affinely parametrised curve unless t satisfies the geodesic equation $\ddot t = 0 \leftrightarrow t = a\tau + b$.
One occasion where this will play a role (and from where I have borrowed the symbol
κ for the “inaffinity”) is in our discussion, much later, of the horizon of a black hole,
where the failure of a certain coordinate to be an affine parameter is directly related to
the physical properties of black holes (see section 15.7). In this context κ is known as
the surface gravity of a black hole.
It is high time to consider an example. We will consider the simplest non-trivial metric,
namely the standard Euclidean metric on R2 in polar coordinates. Thus the line element
is
$$ds^2 = dr^2 + r^2 d\phi^2 \tag{2.28}$$
and the non-zero components of the metric are $g_{rr} = 1$, $g_{\phi\phi} = r^2$. Since this metric is diagonal, the components of the inverse metric $g^{\mu\nu}$ are $g^{rr} = 1$ and $g^{\phi\phi} = r^{-2}$.
A remark on notation: since µ, ν in $g_{\mu\nu}$ are coordinate indices, we should really have called $x^1 = r$, $x^2 = \phi$, and written $g_{11} = 1$, $g_{22} = r^2$, etc. However, writing $g_{rr}$ etc. is more informative and useful since one then knows that this is the (rr)-component of the metric without having to remember if one called $r = x^1$ or $r = x^2$. In the following we will frequently use this kind of notation when dealing with a specific coordinate system, while we retain the index notation $g_{\mu\nu}$ etc. for general purposes.
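The Christoffel symbols of this metric are to be calculated from
$$\Gamma^\mu_{\nu\lambda} = \tfrac{1}{2}g^{\mu\rho}\left(g_{\rho\nu,\lambda} + g_{\rho\lambda,\nu} - g_{\nu\lambda,\rho}\right) .$$
Since the only non-trivial derivative of the metric is $g_{\phi\phi,r} = 2r$, only Christoffel symbols with exactly two φ’s and one r are non-zero,
$$\Gamma^r_{\phi\phi} = -r \,, \qquad \Gamma^\phi_{r\phi} = \Gamma^\phi_{\phi r} = \frac{1}{r} \,.$$
In any case, having assembled all the Christoffel symbols, we can now write down the geodesic equations (once again in the convenient hybrid notation). For r one has
$$\ddot r - r\dot\phi^2 = 0 \,, \tag{2.33}$$
and for φ
$$\ddot\phi + \frac{2}{r}\dot r\dot\phi = 0 \,.$$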
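The statement that only the symbols with two φ's and one r survive can be confirmed mechanically. The sketch below evaluates the Christoffel formula for the polar metric $g = \mathrm{diag}(1, r^2)$ with finite-difference derivatives (exploiting that the metric is diagonal to invert it) and lists the non-zero components at an arbitrarily chosen point; one expects $\Gamma^r_{\phi\phi} = -r$ and $\Gamma^\phi_{r\phi} = \Gamma^\phi_{\phi r} = 1/r$. Function names and the step size are illustrative.

```python
def polar_metric(x):
    """g_{mu nu} for ds^2 = dr^2 + r^2 dphi^2, x = (r, phi)."""
    r = x[0]
    return [[1.0, 0.0], [0.0, r * r]]


def christoffels_diagonal(metric, x, h=1e-5):
    """Gamma^mu_{nu lam} = 1/2 g^{mu mu}(g_{mu nu,lam} + g_{mu lam,nu} - g_{nu lam,mu}),
    valid for a diagonal metric (no sum over mu)."""
    n = len(x)
    g = metric(x)
    dg = []  # dg[s][i][j] = d g_{ij} / d x^s by central differences
    for s in range(n):
        xp, xm = list(x), list(x)
        xp[s] += h
        xm[s] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)] for i in range(n)])
    return [[[0.5 / g[mu][mu] * (dg[lam][mu][nu] + dg[nu][mu][lam] - dg[mu][nu][lam])
              for lam in range(n)] for nu in range(n)] for mu in range(n)]


names = ("r", "phi")
G = christoffels_diagonal(polar_metric, [1.5, 0.2])
nonzero = {(mu, nu, lam): G[mu][nu][lam]
           for mu in range(2) for nu in range(2) for lam in range(2)
           if abs(G[mu][nu][lam]) > 1e-9}
for (mu, nu, lam), value in nonzero.items():
    print(f"Gamma^{names[mu]}_{{{names[nu]} {names[lam]}}} = {value:.6f}")
# expected: Gamma^r_{phi phi} = -1.5 and Gamma^phi_{r phi} = Gamma^phi_{phi r} = 1/1.5
```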
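The geodesic equations can of course also be derived as the Euler-Lagrange equations of the Lagrangian
$$L = \tfrac{1}{2}\left(\dot r^2 + r^2\dot\phi^2\right) . \tag{2.35}$$
Indeed, one has
$$\begin{aligned}
\frac{d}{d\tau}\frac{\partial L}{\partial\dot r} - \frac{\partial L}{\partial r} &= \ddot r - r\dot\phi^2 = 0\\
\frac{d}{d\tau}\frac{\partial L}{\partial\dot\phi} - \frac{\partial L}{\partial\phi} &= r^2\ddot\phi + 2r\dot r\dot\phi = 0 \,,
\end{aligned}\tag{2.36}$$
which (after dividing the second equation by $r^2$) are obviously identical to the equations derived above.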
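To confirm that the Euler-Lagrange equations (2.36) describe straight lines, the sketch below integrates $\ddot r = r\dot\phi^2$ and $\ddot\phi = -2\dot r\dot\phi/r$ with a fourth-order Runge-Kutta scheme and checks that the Cartesian image $(r\cos\phi, r\sin\phi)$ stays on the line through the initial point with the initial Cartesian velocity. It also monitors $r^2\dot\phi$, which the second equation of (2.36) says is constant, since $\frac{d}{d\tau}(r^2\dot\phi) = r^2\ddot\phi + 2r\dot r\dot\phi$. Initial data and step size are arbitrary choices.

```python
import math


def polar_geodesic_rhs(state):
    """state = (r, phi, r_dot, phi_dot); (2.36) solved for the second derivatives."""
    r, phi, r_dot, phi_dot = state
    return (r_dot, phi_dot, r * phi_dot ** 2, -2.0 * r_dot * phi_dot / r)


def rk4_step(f, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(y)."""
    k1 = f(y)
    k2 = f(tuple(a + 0.5 * h * b for a, b in zip(y, k1)))
    k3 = f(tuple(a + 0.5 * h * b for a, b in zip(y, k2)))
    k4 = f(tuple(a + h * b for a, b in zip(y, k3)))
    return tuple(a + h / 6.0 * (b1 + 2 * b2 + 2 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4))


state = (2.0, 0.3, -0.5, 0.4)
r0, phi0, r_dot0, phi_dot0 = state
x0, y0 = r0 * math.cos(phi0), r0 * math.sin(phi0)
# initial Cartesian velocity from the chain rule
vx = r_dot0 * math.cos(phi0) - r0 * math.sin(phi0) * phi_dot0
vy = r_dot0 * math.sin(phi0) + r0 * math.cos(phi0) * phi_dot0
p_phi0 = r0 ** 2 * phi_dot0

h = 1e-3
max_line_err = max_p_err = 0.0
for k in range(1, 4001):                    # tau from 0 to 4
    state = rk4_step(polar_geodesic_rhs, state, h)
    tau = k * h
    r, phi, r_dot, phi_dot = state
    x, y = r * math.cos(phi), r * math.sin(phi)
    max_line_err = max(max_line_err, math.hypot(x - (x0 + vx * tau), y - (y0 + vy * tau)))
    max_p_err = max(max_p_err, abs(r ** 2 * phi_dot - p_phi0))
print(max_line_err, max_p_err)  # both tiny: a straight line, with r^2 phi_dot conserved
```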
You may have the impression that getting the geodesic equation in this way, rather than
via calculation of the Christoffel symbols first, is much simpler. I agree wholeheartedly.
Not only is the Lagrangian approach the method of choice to determine the geodesic equations; it is also frequently the most efficient method to determine the Christoffel symbols. This will be described in the next section.
The next simplest example to discuss would be the two-sphere with its standard metric
$d\theta^2 + \sin^2\theta\, d\phi^2$. It will appear, in bits and pieces, in the next section to illustrate the
general remarks.
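Recall from above that the geodesic equation for a metric $g_{\mu\nu}$ can be derived from the Lagrangian $L = \tfrac{1}{2}g_{\mu\nu}\dot x^\mu\dot x^\nu$,
$$\frac{d}{d\tau}\frac{\partial L}{\partial\dot x^\mu} - \frac{\partial L}{\partial x^\mu} = 0 \,. \tag{2.39}$$
This has several immediate consequences which are useful for the determination of Christoffel symbols and geodesics in practice. In particular, if the metric does not depend on some coordinate, say $x^1$ ($\partial_1 g_{\mu\nu} = 0$), then $x^1$ is a cyclic variable and the corresponding momentum
$$p_1 = \frac{\partial L}{\partial\dot x^1} = g_{1\nu}\dot x^\nu \tag{2.40}$$
is conserved along the geodesic.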
Remarks:
(a) One might perhaps have wanted to argue that the definition (and interpretation) of conserved momenta should be based on the physical Lagrangian (2.1)
$$L_0^\lambda = -m\sqrt{-g_{\alpha\beta}\frac{dx^\alpha}{d\lambda}\frac{dx^\beta}{d\lambda}} \tag{2.41}$$
with action $S = -m\int d\tau$, but this makes no difference since the two momenta are essentially equal: one has (for λ = τ)
$$\frac{\partial L_0^\lambda}{\partial(dx^1/d\lambda)} = m p_1 \tag{2.42}$$
with $p_1$ as defined in (2.40), so that this just supplies us with the additional
information that the momenta obtained from the Lagrangian L should (for a
massive particle) be interpreted as momenta per unit mass. This discrepancy
could have been avoided by working with the Lagrangian mL (alternatively:
fixing the gauge $e(\lambda) = m^{-1}$ in section 2.1, see (2.11)), but unless or until
one starts coupling the particle to fields other than the gravitational field it
is unnecessary (and a nuisance) to carry m around all the time.
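(b) For example, on the two-sphere the Lagrangian reads
$$ L = \tfrac12(\dot\theta^2 + \sin^2\theta\,\dot\phi^2) . \qquad (2.43) $$
The angle φ is a cyclic variable and the angular momentum (actually angular momentum per unit mass for a massive particle)
$$ p_\phi = \frac{\partial L}{\partial\dot\phi} = \sin^2\theta\,\dot\phi \qquad (2.44) $$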
is a conserved quantity. This generalises to conservation of angular momen-
tum for a particle moving in an arbitrary spherically symmetric gravitational
field.
(c) Likewise, if the metric is independent of the time coordinate $x^0 = t$, the corresponding conserved quantity
$$ p_0 = \frac{\partial L}{\partial\dot x^0} = g_{0\nu}\dot x^\nu $$
has the interpretation as minus the energy (per unit mass) of the particle, “minus” because, with our sign conventions, $p_0 = -E$ in special relativity.
We will discuss the relation between this notion of energy and the notion
of energy familiar from special relativity (this requires an asymptotically
Minkowski-like metric) in more detail in section 13.1.
(d) We will discuss in more detail in section 2.5 (and then again in sections
6 and 7) how to detect and describe symmetries and conserved charges in
coordinate systems in which the symmetries are not as manifest (via cyclic
variables) as above.
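Conservation of $p_\phi$ in remark (b) can be checked numerically. The sketch below (our own illustration) integrates the geodesic equations that follow from $L = \tfrac12(\dot\theta^2 + \sin^2\theta\,\dot\phi^2)$ with a hand-written Runge-Kutta step. It then compares $p_\phi = \sin^2\theta\,\dot\phi$ and $2L$ at the start and at the end. The initial data and step size are arbitrary choices, picked so that the orbit stays away from the poles.

```python
import math


def sphere_geodesic_rhs(state):
    """Euler-Lagrange equations of L = 1/2 (theta'^2 + sin^2(theta) phi'^2):
    theta'' = sin(theta) cos(theta) phi'^2,  phi'' = -2 cot(theta) theta' phi'."""
    th, ph, thdot, phdot = state
    return (thdot, phdot,
            math.sin(th) * math.cos(th) * phdot ** 2,
            -2.0 * math.cos(th) / math.sin(th) * thdot * phdot)


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step for d(state)/d(tau) = f(state)."""
    def shifted(k, c):
        return tuple(s + c * ki for s, ki in zip(state, k))

    k1 = f(state)
    k2 = f(shifted(k1, dt / 2))
    k3 = f(shifted(k2, dt / 2))
    k4 = f(shifted(k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def p_phi(state):
    """Conjugate momentum of the cyclic angle phi, as in (2.44)."""
    th, _, _, phdot = state
    return math.sin(th) ** 2 * phdot


def two_L(state):
    """2L = theta'^2 + sin^2(theta) phi'^2, constant along an affinely parametrised geodesic."""
    th, _, thdot, phdot = state
    return thdot ** 2 + math.sin(th) ** 2 * phdot ** 2


start = (1.0, 0.0, 0.3, 0.8)   # (theta, phi, theta', phi'), arbitrary initial data
end = start
for _ in range(5000):           # tau from 0 to 5 in steps of 1e-3
    end = rk4_step(sphere_geodesic_rhs, end, 1e-3)
print(p_phi(start), p_phi(end))
print(two_L(start), two_L(end))
```

Both pairs of numbers should agree up to the small integration error, even though θ itself oscillates along the orbit.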
Remarks:
(a) In the case of the two-sphere, with its metric $ds^2 = d\theta^2 + \sin^2\theta\,d\phi^2$, this translates into the familiar statement that the meridians (the great circles through the poles), i.e. the coordinate lines of y = θ, are geodesics.
(b) The result is also valid when y is a timelike coordinate. For example, consider a space-time with coordinates $(t, x^i)$ and metric
$$ ds^2 = -dt^2 + a(t)^2 g_{ij}(x)\,dx^i dx^j $$
(this describes a space-time with spatial metric $g_{ij}(x)dx^i dx^j$ and a time-dependent radius $a(t)$; in particular, such a space-time metric can describe an expanding universe in cosmology, see section 18). In such a space-time, there is, according to the above result, a privileged class of freely falling (i.e. geodesic) observers, namely those that stay at fixed values of the spatial coordinates $x^i$. For such observers, the coordinate time t coincides with their proper time τ.
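The Lagrangian also provides an efficient way of determining the Christoffel symbols. Thus you derive the Euler-Lagrange equations, write them in the form
$$ \ddot x^\mu + (\text{terms proportional to } \dot x\dot x) = 0 , \qquad (2.49) $$
and read off the Christoffel symbols by comparison with the geodesic equation $\ddot x^\mu + \Gamma^\mu_{\nu\lambda}\dot x^\nu\dot x^\lambda = 0$.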
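Remarks:
(b) For example, once again in the case of the two-sphere, for the θ-equation one has
$$ \frac{d}{d\tau}\frac{\partial L}{\partial\dot\theta} = 2\ddot\theta , \qquad \frac{\partial L}{\partial\theta} = 2\sin\theta\cos\theta\,\dot\phi^2 . \qquad (2.52) $$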
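Comparing the variational equation $2\ddot\theta - 2\sin\theta\cos\theta\,\dot\phi^2 = 0$ with the geodesic equation $\ddot\theta + \Gamma^\theta_{\phi\phi}\dot\phi^2 = 0$, one reads off $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$.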
In the previous section we have seen that cyclic coordinates, i.e. coordinates the metric
does not depend on, lead to conserved charges, as in (2.40). As nice and useful as this
may be (and it is nice and useful), it is obviously somewhat unsatisfactory because it
is an explicitly coordinate-dependent statement: the metric may well be independent
of one coordinate in some coordinate system, but if one now performs a coordinate
transformation which depends on that coordinate, then in the new coordinate system
the metric will typically depend on all the new coordinates. Nevertheless,
• the statement that a metric has a certain symmetry (a translational symmetry in
the first coordinate system) should be coordinate-independent, and
• thus there should be a corresponding first integral of the geodesic equation in any
coordinate system.
To see how this works, let us reconsider the situation discussed in the previous section,
namely a metric which in some coordinate system, we will now call it $\{y^\mu\}$, has components $g_{\mu\nu}$ which are independent of $y^1$, say. Translation invariance of the geodesic Lagrangian is the statement that the Lagrangian is invariant under the infinitesimal variation $\delta y^1 = \epsilon$, $\delta y^\mu = 0$ otherwise, and via Noether’s theorem this leads to a conserved charge $g_{1\mu}\dot y^\mu$, as in (2.40).
Now we ask ourselves what this statement corresponds to in another coordinate system.
Note that in the y-coordinates, invariance is the statement that the metric is invariant under the (infinitesimal) coordinate transformation $y^1 \to y^1 + \epsilon$ or $\delta y^1 = \epsilon$, $\delta y^\mu = 0$ otherwise,
$$ \delta g_{\mu\nu} \equiv \partial_{y^1} g_{\mu\nu} = 0 . \qquad (2.56) $$
It is then clear that in another coordinate system, infinitesimal y 1 -translations must also
correspond to some infinitesimal coordinate transformation (but not necessarily just a
translation),
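$$ \delta x^\alpha = \epsilon V^\alpha(x) . \qquad (2.57) $$
Since $\delta x^\alpha = (\partial x^\alpha/\partial y^\mu)\,\delta y^\mu = \epsilon\,\partial x^\alpha/\partial y^1$, the components of V are given by the Jacobian $J^\alpha_\mu = \partial x^\alpha/\partial y^\mu$, so that
$$ V^\alpha = J^\alpha_1 . \qquad (2.59) $$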
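Under the transformation (2.57) the geodesic Lagrangian changes by $\delta L = \tfrac12\epsilon\,(\delta_V g_{\alpha\beta})\,\dot x^\alpha\dot x^\beta$, where
$$ \delta_V g_{\alpha\beta} = V^\gamma\partial_\gamma g_{\alpha\beta} + (\partial_\alpha V^\gamma)g_{\gamma\beta} + (\partial_\beta V^\gamma)g_{\alpha\gamma} . \qquad (2.62) $$
Thus the condition for the infinitesimal transformation (2.57) to leave the Lagrangian invariant is
$$ \delta_V g_{\alpha\beta} = 0 . \qquad (2.63) $$
Note that for constant components $V^\alpha$, (2.63) is simply the statement that the metric is constant in the direction V, $V^\gamma\partial_\gamma g_{\alpha\beta} = 0$.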
2. Alternatively, we can determine the variation $\delta_V g_{\alpha\beta}$ of the components $g_{\alpha\beta}$ of the metric in x-coordinates from the variation (2.56) of the components $g_{\mu\nu}$ of the metric in y-coordinates by demanding that under a coordinate transformation the variation (2.56) of the metric transforms like the metric. Since we know how the metric transforms (1.98), and we also know how $\partial_{y^1}$ transforms, namely $\partial_{y^1} = (\partial x^\alpha/\partial y^1)\,\partial_\alpha = V^\alpha\partial_\alpha$, one can use these to show that the resulting expression for $\delta_V g_{\alpha\beta}$ is identical to that given in (2.62).
All of this may seem a bit ham-handed at this point, and indeed it is. However, we
will see later how these results can be written and understood in a much more pleasing
and covariant way. In particular, we will see in section 4.5 how to write (2.62) in a way
that makes it completely manifest that it transforms like the metric under coordinate
transformations. Moreover, we will discover in section 6 that (2.62) is a special case of
the Lie derivative of a tensor field along a vector field V, denoted by $L_V$. Continuous symmetries of a metric correspond to vector fields along which the Lie derivative of the metric vanishes. Such vector fields are known as Killing vectors, and are thus vectors $V^\alpha$ satisfying the Killing equation (2.63),
$$ L_V g_{\alpha\beta} = \delta_V g_{\alpha\beta} = 0 . $$
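The Killing condition (2.63), with $\delta_V g_{\alpha\beta}$ given by (2.62), is easy to test numerically. The sketch below (our own illustration) evaluates (2.62) on the unit two-sphere, using central finite differences. It does this for two vector fields: $V = \cos\phi\,\partial_\theta - \cot\theta\sin\phi\,\partial_\phi$, one of the rotation generators of the sphere, and $V = \partial_\theta$, which is not a symmetry. The sample point and step size are arbitrary choices.

```python
import math


def sphere_metric(x):
    """g_ab of the unit two-sphere, ds^2 = dtheta^2 + sin^2(theta) dphi^2, at x = (theta, phi)."""
    theta = x[0]
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]


def rotation_generator(x):
    """V = cos(phi) d_theta - cot(theta) sin(phi) d_phi, a rotation of the sphere."""
    theta, phi = x
    return [math.cos(phi), -math.sin(phi) / math.tan(theta)]


def d_theta(x):
    """V = d_theta, which is not a symmetry of the sphere metric."""
    return [1.0, 0.0]


def delta_V_metric(metric, V, x, h=1e-5):
    """delta_V g_ab = V^c d_c g_ab + (d_a V^c) g_cb + (d_b V^c) g_ac, eq. (2.62),
    with partial derivatives approximated by central differences of step h."""
    n = len(x)
    dg, dV = [], []  # dg[s][a][b] ~ d_s g_ab,  dV[s][c] ~ d_s V^c
    for s in range(n):
        xp, xm = list(x), list(x)
        xp[s] += h
        xm[s] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(n)] for a in range(n)])
        Vp, Vm = V(xp), V(xm)
        dV.append([(Vp[c] - Vm[c]) / (2 * h) for c in range(n)])
    g, v = metric(x), V(x)
    return [[sum(v[c] * dg[c][a][b] + dV[a][c] * g[c][b] + dV[b][c] * g[a][c] for c in range(n))
             for b in range(n)] for a in range(n)]


point = (0.7, 1.1)
print(delta_V_metric(sphere_metric, rotation_generator, point))  # all entries close to 0
print(delta_V_metric(sphere_metric, d_theta, point))             # phi-phi entry close to sin(1.4)
```

For $V = \partial_\theta$ only the φφ-component survives, $\delta_V g_{\phi\phi} = \partial_\theta\sin^2\theta = \sin 2\theta$. This reflects the fact that θ is not a cyclic coordinate.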
2.6 The Newtonian Limit
We saw that the 10 components of the metric gµν appear to play the role of potentials for
the gravitational force. In order to substantiate this, and to show that in an appropriate
limit this setting is able to reproduce the Newtonian results, we now want to find the
relation of these potentials to the Newtonian potential, and the relation between the
geodesic equation and the Newtonian equation of motion for a particle moving in a
gravitational field.
First let us determine the conditions under which we might expect the general relativistic equation of motion (namely the non-linear coupled set of ordinary differential geodesic equations) to reduce to the linear equation of motion
$$ \frac{d^2\vec x}{dt^2} = -\vec\nabla\phi \qquad (2.69) $$
of Newtonian mechanics, with φ the gravitational potential, e.g.
$$ \phi = -\frac{G_N M}{r} . \qquad (2.70) $$
Thus we are trying to characterise the circumstances in which we know and can trust the validity of Newton’s equations, for example the gravitational fields of the earth or the sun, i.e. the gravitational fields in which Newton’s laws were discovered and tested. Two of these conditions are fairly obvious:
1. Weak Fields: our first plausible assumption is that the gravitational field is in a
suitable sense sufficiently weak. We will need to make more precise what we mean by this, and we will come back to this below.
2. Slow Motion: our second, equally reasonable and plausible, assumption is that the
test particle moves at speeds at which we can neglect special relativistic effects, so
“slow” should be taken to mean that its velocity is small compared to the velocity
of light.
Interestingly, it turns out that one more condition is required. Note that the gravita-
tional fields we have access to are not only quite weak but also only very slowly varying
in time, and we will add this condition,
3. Stationary Fields: we will assume that the gravitational field does not vary sig-
nificantly in time (over the time scale probed by our test particle).
The very fact that we have to add this condition in order to find Newton’s equations
(as will be borne out by the calculations below) is interesting in its own right, because
it also shows that general relativity predicts phenomena deviating from the Newtonian
picture even for weak fields, provided that they vary sufficiently rapidly (e.g. quickly
oscillating fields), and one such phenomenon is that of gravitational waves (see section
16).
Now, having formulated in words the conditions that we wish to impose, we need to
translate these conditions into equations that we can then use in conjunction with the
geodesic equation.
1. In order to define a notion of weak fields, we need to keep in mind that this is
not a coordinate-independent statement since we can simulate arbitrarily strong
gravitational fields even in Minkowski space by going to suitably accelerated co-
ordinates, and therefore a “weak field” condition will be a condition not only on
the metric but also on the choice of coordinates. Thus we assume that we can
choose coordinates $\{x^\mu\} = \{t, x^i\}$ in such a way that in these coordinates the metric differs from the standard constant Minkowski metric $\eta_{\mu\nu}$ only by a small amount,
$$ g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu} \qquad (2.71) $$
where we will implement “by a small amount” in the calculations below by drop-
ping all terms that are at least quadratic in hµν .
2. The second condition is obviously (with the coordinates chosen above) $dx^i/dt \ll 1$ or, expressed in terms of proper time,
$$ \frac{dx^i}{d\tau} \ll \frac{dt}{d\tau} . \qquad (2.72) $$
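Now we look at the geodesic equation. The condition of slow motion (condition 2) implies that the geodesic equation can be approximated by
$$ \ddot x^\mu + \Gamma^\mu_{00}\,\dot t^2 = 0 . $$
From the weak field condition (condition 1), which allows us to write
$$ g^{\mu\nu} = \eta^{\mu\nu} - h^{\mu\nu} + O(h^2) , $$
where
$$ h^{\mu\nu} = \eta^{\mu\lambda}\eta^{\nu\rho}h_{\lambda\rho} , \qquad (2.77) $$
together with the stationarity condition (condition 3), $\partial_0 h_{\mu\nu} = 0$, we learn that
$$ \Gamma^\mu_{00} = -\tfrac12\eta^{\mu i}\partial_i h_{00} , \qquad (2.78) $$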
and hence the geodesic equations reduce to
$$ \ddot t = 0 , \qquad \ddot x^i = \tfrac12\partial_i h_{00}\,\dot t^2 . \qquad (2.80) $$
As the first of these just says that ṫ is constant, we can use this in the second equation to
convert the τ -derivatives into derivatives with respect to the coordinate time t. Hence
we obtain
$$ \frac{d^2x^i}{dt^2} = \tfrac12 h_{00,i} \qquad (2.81) $$
(the spatial index i in this expression is raised or lowered with the Kronecker symbol, $\eta_{ik} = \delta_{ik}$). Comparing this with the Newtonian equation (2.69),
$$ \frac{d^2x^i}{dt^2} = -\phi_{,i} \qquad (2.82) $$
leads us to the identification $h_{00} = -2\phi$, with the constant of integration absorbed into an arbitrary constant term in the gravitational potential. By relating this back to $g_{\alpha\beta}$, we find the sought-for relation between the Newtonian potential and the space-time metric,
$$ g_{00} = -(1 + 2\phi) . $$
Remarks:
1. For the gravitational field of isolated systems, it makes sense to choose the in-
tegration constant in such a way that the potential goes to zero at infinity, and
this choice also ensures that the metric approaches the flat Minkowski metric at
infinity.
2. Restoring the appropriate units, in particular a factor of $c^2$, one finds that $|\phi|/c^2 \sim 10^{-9}$ on the surface of the earth and $10^{-6}$ on the surface of the sun (see section 12.4 for some more details), so that the distortion in the space-time geometry produced by gravitation is in general quite small (justifying our approximations).
3. Within this Newtonian approximation, we cannot distinguish the above result $g_{00} = -(1 + 2\phi)$ from $g_{00} = -(1 + \phi)^2$, say (or a host of other possibilities).
4. Likewise, in this approximation it does not make sense to inquire about the other subleading components of the metric. As we have seen, a slowly moving particle in a weak stationary gravitational field is not sensitive to them, and hence cannot be used to probe or determine these components.
Einstein realised fairly early on (1911) in his search for a relativistic theory of
gravity that this would have to be part of the story. However, this interpretation
is neither useful nor tenable when considering gravitational fields beyond the static Newtonian approximation.
6. Later on, we will determine the exact solution of the Einstein equations (the field
equations for the gravitational field, i.e. for the metric) for the gravitational field
outside a spherically symmetric mass distribution with mass M (the Schwarzschild
metric). The metric turns out to have the simple form (12.25)
$$ ds^2 = -\left(1 - \frac{2G_NM}{c^2r}\right)c^2dt^2 + \left(1 - \frac{2G_NM}{c^2r}\right)^{-1}dr^2 + r^2d\Omega^2 . \qquad (2.85) $$
From this expression one can read off that the leading correction to the flat metric indeed arises from the 00-component of the metric,
$$ \begin{aligned} ds^2 &\approx -c^2dt^2 + dr^2 + r^2d\Omega^2 + \frac{2G_NM}{r}\,dt^2 + \ldots \\ &= \eta_{\alpha\beta}\,dx^\alpha dx^\beta + \frac{2G_NM}{rc^2}\,(dx^0)^2 . \end{aligned} \qquad (2.86) $$
This is indeed precisely of the above Newtonian form, with the standard Newtonian potential $\phi = -G_NM/r$. One can then also determine the subleading “post-Newtonian” corrections to the gravitational field, which are evidently suppressed by additional inverse powers of $c^2$.
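The orders of magnitude in remark 2 are easy to reproduce. The sketch below uses standard rounded SI values for $G_N$, c and the masses and mean radii of the earth and the sun; these inputs are ours, not taken from the notes. It also prints the size $\phi^2$ of the ambiguity mentioned in remark 3.

```python
# Rounded SI constants (standard reference values, not taken from the notes).
G_N = 6.674e-11   # m^3 kg^-1 s^-2
C = 2.998e8       # m s^-1

BODIES = {
    "earth": (5.972e24, 6.371e6),   # (mass in kg, mean radius in m)
    "sun": (1.989e30, 6.957e8),
}


def surface_phi_over_c2(mass, radius):
    """phi/c^2 = -G_N M / (R c^2) at the surface of a spherical body."""
    return -G_N * mass / (radius * C ** 2)


ratios = {name: surface_phi_over_c2(m, r) for name, (m, r) in BODIES.items()}
for name, phi in ratios.items():
    # -g00 = 1 + 2 phi and -g00 = (1 + phi)^2 differ exactly by phi^2 (remark 3);
    # phi^2 is computed directly, since 1 + 2 phi would lose it to rounding.
    print(f"{name}: phi/c^2 = {phi:.2e}, ambiguity phi^2 = {phi ** 2:.1e}")
```

The earth value comes out just below $10^{-9}$ and the solar one around $2\times 10^{-6}$, consistent with the rough figures quoted above.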
In section 1.3 we had discussed the Minkowski metric in Rindler coordinates, i.e. in
coordinates adapted to a constantly accelerating observer. In these coordinates the
metric took the form (1.64),
$$ ds^2 = -\rho^2 d\eta^2 + d\rho^2 . $$
What is the relation, if any, between this metric and the metric describing a weak gravitational field, as derived above (after all, small accelerations should mimic weak gravitational fields)? At first sight, the only thing they appear to have in common
is that the departure from what would be the Minkowski-metric in these coordinates
is encoded in the time-time component of the metric, $\rho^2$ in one case, $1 + 2\phi$ in the other, but apart from that $\rho^2$ and $1 + 2\phi$ look quite different. This difference is,
however, again a coordinate artefact and the Rindler metric can be made to look like
the weak-field metric with the help of a suitable further redefinition of the coordinates.
For starters, it will be convenient, for this purpose and for a generalisation which we
will discuss below, to introduce the acceleration a explicitly into the coordinates by
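redefining the coordinate transformation (1.63) to (I will now also call the Minkowski coordinates $\xi^0 = t$ and $\xi^1 = x$)
$$ t = a^{-1}\rho\sinh(a\eta) , \qquad x = a^{-1}\rho\cosh(a\eta) $$
(so this differs by ρ → ρ/a, η → aη from the transformation given in (1.63)). Thus, now it is the observer at ρ = 1 who has acceleration a and whose proper time is τ = η. The Rindler metric now has the form
$$ ds^2 = -\rho^2 d\eta^2 + a^{-2}d\rho^2 . $$
Setting $\rho = 1 + a\hat r$, this becomes
$$ ds^2 = -(1 + a\hat r)^2 d\eta^2 + d\hat r^2 , $$
and it is the observer at $\hat r = 0$ who has proper time τ = η and constant acceleration a. If one now assumes that the acceleration a is sufficiently small, one can approximate
$$ ds^2 \approx -(1 + 2a\hat r)\,d\eta^2 + d\hat r^2 , $$
and in $\phi = a\hat r$ we recognise precisely the Newtonian potential for a force a in the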
r̂-direction. Thus the Rindler and weak field form of the metric agree in this case.
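A numerical sketch can check that the rescaled Rindler coordinates give $-(1 + a\hat r)^2 d\eta^2 + d\hat r^2$. It assumes the transformation $t = a^{-1}(1 + a\hat r)\sinh(a\eta)$, $x = a^{-1}(1 + a\hat r)\cosh(a\eta)$, i.e. the rescaled version of (1.63) with $\rho = 1 + a\hat r$. It computes the Jacobian by central differences and pulls back the (1+1)-dimensional Minkowski metric. The parameter values are arbitrary.

```python
import math


def minkowski_coords(eta, rhat, a):
    """(t, x) for the assumed map t = (1 + a rhat) sinh(a eta)/a, x = (1 + a rhat) cosh(a eta)/a."""
    rho = 1.0 + a * rhat
    return rho * math.sinh(a * eta) / a, rho * math.cosh(a * eta) / a


def pulled_back_metric(eta, rhat, a, h=1e-6):
    """g_ij = J^T diag(-1, 1) J with J = d(t, x)/d(eta, rhat), J by central differences."""
    columns = []
    for d_eta, d_rhat in ((h, 0.0), (0.0, h)):
        tp, xp = minkowski_coords(eta + d_eta, rhat + d_rhat, a)
        tm, xm = minkowski_coords(eta - d_eta, rhat - d_rhat, a)
        columns.append(((tp - tm) / (2 * h), (xp - xm) / (2 * h)))
    return [[-ci[0] * cj[0] + ci[1] * cj[1] for cj in columns] for ci in columns]


a, eta, rhat = 0.3, 0.7, 2.0
g = pulled_back_metric(eta, rhat, a)
print(g[0][0], -(1 + a * rhat) ** 2)   # g_eta_eta versus -(1 + a rhat)^2
print(g[1][1], g[0][1])                 # close to 1 and 0
```

The difference between $(1 + a\hat r)^2$ and $1 + 2a\hat r$ is exactly $(a\hat r)^2$. It is therefore negligible only in the region $a\hat r \ll 1$, which is where the weak-field identification $\phi = a\hat r$ applies.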
Remarks:
1. Remarkably, this same form of the metric remains valid for an arbitrary time-
dependent acceleration a = a(τ ), and thus is capable of reproducing the weak
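field form of the metric for general potentials. To see this, consider the worldline $(t(\tau), x(\tau))$ with general 4-velocity (actually 2-velocity in this case)
$$ (u^0, u^1) = (\dot t(\tau), \dot x(\tau)) = (\cosh v(\tau), \sinh v(\tau)) , $$
which satisfies
$$ (u^0)^2 - (u^1)^2 = 1 , \qquad (2.93) $$
as it should, and has the time-dependent acceleration $a(\tau) = \dot v(\tau)$, since $(\dot u^0, \dot u^1) = \dot v(\tau)\,(\sinh v(\tau), \cosh v(\tau))$.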
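We can pass to adapted coordinates $(\eta, \hat r)$, as above, by setting
$$ (t(\eta, \hat r), x(\eta, \hat r)) = (t(\eta) + \hat r\sinh v(\eta),\; x(\eta) + \hat r\cosh v(\eta)) , \qquad (2.95) $$
in terms of which the metric takes the form $ds^2 = -(1 + a(\eta)\hat r)^2 d\eta^2 + d\hat r^2$. For constant a, one can further introduce ξ via $1 + a\hat r = e^{a\xi}$, in terms of which
$$ ds^2 = e^{2a\xi}(-d\eta^2 + d\xi^2) . $$
Note that in these coordinates the metric is conformally flat, i.e. it differs from the flat Minkowski metric $-d\eta^2 + d\xi^2$ in these coordinates only by an overall factor.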
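For the record, and for later use, we note that the complete coordinate transformation between the Minkowski coordinates (t, x) and the conformally flat Rindler coordinates (η, ξ) is
$$ t = a^{-1}e^{a\xi}\sinh(a\eta) , \qquad x = a^{-1}e^{a\xi}\cosh(a\eta) . $$
In terms of the Minkowski null (or advanced and retarded time) coordinates $u_M = t - x$, $v_M = t + x$ and the corresponding Rindler null coordinates $u_R = \eta - \xi$, $v_R = \eta + \xi$, this takes the compact form
$$ u_M = -a^{-1}e^{-a u_R} , \qquad v_M = a^{-1}e^{a v_R} . $$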
Note that the range of the coordinates is $-\infty < \eta, \xi < +\infty$ or $-\infty < u_R, v_R < +\infty$, and that the coordinates (η, ξ) or $(u_R, v_R)$ cover (and can be used in) the right-hand quadrant $x > |t|$ of Minkowski space-time, corresponding to $-\infty < u_M = t - x < 0$ and $0 < v_M = t + x < +\infty$, the so-called (right) Rindler wedge.
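The compact relation between Minkowski and Rindler null coordinates can be spot-checked numerically, and so can the fact that the Rindler coordinates only cover the wedge $x > |t|$. This sketch assumes $t = a^{-1}e^{a\xi}\sinh(a\eta)$, $x = a^{-1}e^{a\xi}\cosh(a\eta)$ together with $u_M = t - x$, $v_M = t + x$, $u_R = \eta - \xi$, $v_R = \eta + \xi$. The sample points and the value of a are arbitrary.

```python
import math


def minkowski_from_conformal(eta, xi, a):
    """(t, x) from the conformally flat Rindler coordinates:
    t = e^{a xi} sinh(a eta) / a,  x = e^{a xi} cosh(a eta) / a."""
    return (math.exp(a * xi) * math.sinh(a * eta) / a,
            math.exp(a * xi) * math.cosh(a * eta) / a)


def null_coordinates(eta, xi, a):
    """Return (u_M, v_M, u_R, v_R) = (t - x, t + x, eta - xi, eta + xi)."""
    t, x = minkowski_from_conformal(eta, xi, a)
    return t - x, t + x, eta - xi, eta + xi


a = 0.7
for eta, xi in [(-1.0, 0.5), (0.3, -2.0), (2.0, 1.0)]:
    u_m, v_m, u_r, v_r = null_coordinates(eta, xi, a)
    t, x = minkowski_from_conformal(eta, xi, a)
    # residuals of u_M = -e^{-a u_R}/a and v_M = e^{a v_R}/a, and the wedge condition x > |t|
    print(u_m + math.exp(-a * u_r) / a, v_m - math.exp(a * v_r) / a, x > abs(t))
```

The residuals should vanish up to rounding, and $u_M < 0 < v_M$ for every choice of (η, ξ), which is the wedge condition in null coordinates.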
2.8 The Gravitational Red-Shift
The gravitational red-shift (i.e. the fact that photons lose or gain energy when rising or
falling in a gravitational field) is a consequence of the Einstein Equivalence Principle
(and therefore also provides an experimental test of the Einstein Equivalence Principle).
It is clear from the expression dτ 2 = −gµν (x)dxµ dxν that e.g. the rate of clocks is
affected by the gravitational field. However, as everything is affected in the same way
by gravity it is impossible to measure this effect locally. In order to find an observable
effect, one needs to compare data from two different points in a gravitational potential.
The situation we could consider is that of two observers A and B moving on worldlines (paths) $\gamma_A$ and $\gamma_B$, A sending light signals to B. In general the frequency measured in the rest frame of the observer at A (or in a locally inertial coordinate system there) will differ from the frequency measured by B upon receiving the signal.
In order to separate out Doppler-like effects due to relative velocities, we consider two
observers A and B at rest radially to each other, at radii rA and rB , in a stationary
spherically symmetric gravitational field. This means that the metric depends only on
a radial coordinate r and we can choose it to be of the form
$$ ds^2 = g_{00}(r)\,dt^2 + g_{rr}(r)\,dr^2 + r^2d\Omega^2 , \qquad (2.103) $$
where $d\Omega^2$ is the standard line element on the unit two-sphere (see section 12 for a more detailed justification of this ansatz for the metric).
Observer A sends out light of a given frequency $\nu_A$, say n pulses per proper time unit $\Delta\tau_A$. Observer B receives these n pulses in his proper time $\Delta\tau_B$ and interprets this as a frequency $\nu_B$. Thus the relation between the frequency $\nu_A$ emitted at A and the frequency $\nu_B$ observed at B is
$$ \frac{\nu_A}{\nu_B} = \frac{\Delta\tau_B}{\Delta\tau_A} . \qquad (2.104) $$
I will now give two arguments to show that this ratio depends on the metric (i.e. the gravitational field) at $r_A$ and $r_B$ through
$$ \frac{\nu_A}{\nu_B} = \left(\frac{g_{00}(r_B)}{g_{00}(r_A)}\right)^{1/2} . \qquad (2.105) $$
1. The first argument is essentially one based on geometric optics (and is best ac-
companied by drawing a (1+1)-dimensional space-time diagram of the light rays
and worldlines of the observers).
The geometry of the situation dictates that the coordinate time intervals recorded at A and B are equal, $\Delta t_A = \Delta t_B$, as nothing in the metric actually depends on t. In equations, this can be seen as follows. First of all, the equation for a radial light ray is
$$ -g_{00}(r)\,dt^2 = g_{rr}(r)\,dr^2 , \qquad (2.106) $$
or
$$ \frac{dt}{dr} = \pm\left(\frac{g_{rr}(r)}{-g_{00}(r)}\right)^{1/2} . \qquad (2.107) $$
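From this we can calculate the coordinate time for the light ray to go from A to B. Say that the first light pulse is emitted at point A at coordinate time $t_{(A)1}$ and received at B at coordinate time $t_{(B)1}$. Then
$$ t_{(B)1} - t_{(A)1} = \int_{r_A}^{r_B} dr\,\left(-g_{rr}(r)/g_{00}(r)\right)^{1/2} . \qquad (2.108) $$
But the right-hand side obviously does not depend on t, so we also have
$$ t_{(B)2} - t_{(A)2} = \int_{r_A}^{r_B} dr\,\left(-g_{rr}(r)/g_{00}(r)\right)^{1/2} , \qquad (2.109) $$
where $t_{(A)2}$ and $t_{(B)2}$ denote the coordinate times at which the n-th pulse is emitted at A and received at B. Therefore,
$$ t_{(B)2} - t_{(A)2} = t_{(B)1} - t_{(A)1} , \qquad (2.110) $$
or
$$ t_{(A)2} - t_{(A)1} = t_{(B)2} - t_{(B)1} , \qquad (2.111) $$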
as claimed. Thus the coordinate time intervals recorded at A and B between the
first and last pulse are equal. However, to convert this to proper time, we have to
multiply the coordinate time intervals by an r-dependent function,
$$ \Delta\tau_{A,B} = \left(-g_{\mu\nu}(r_{A,B})\frac{dx^\mu}{dt}\frac{dx^\nu}{dt}\right)^{1/2}\Delta t_{A,B} , \qquad (2.112) $$
and therefore the proper time intervals will not be equal. For observers at rest, $dx^i/dt = 0$, one has
$$ \Delta\tau_{A,B} = \left(-g_{00}(r_{A,B})\right)^{1/2}\Delta t_{A,B} . \qquad (2.113) $$
2. The second argument uses the null geodesic equation, in particular the conserved
quantity associated to time-translations (recall that we have assumed that the
metric (2.103) is time-independent), as well as a somewhat more covariant looking,
but equivalent, notion of frequency.
First of all, let the light ray be described by the wave vector $k^\mu$. In special relativity, we would parametrise this as $k^\mu = (\omega, \vec k)$ with $\omega = 2\pi\nu$ the frequency. This is the frequency observed by an inertial observer at rest, with 4-velocity $u^\mu = (1, 0, 0, 0)$. A Lorentz-invariant, and thus in our context now coordinate-independent, notion of the frequency as measured by an observer with velocity $u^\mu$ is thus
$$ \omega = -u^\mu k_\mu . \qquad (2.114) $$
This includes as special cases the relativistic Doppler effect (where one compares
ω with $\bar\omega = -\bar u^\mu k_\mu$, $\bar u^\mu$ the tangent to the world line of a boosted observer), as
well as the gravitational red-shift we want to discuss here.
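A static observer at $r = r_A$ in the spherically-symmetric and static gravitational field (2.103) is described by the 4-velocity
$$ u^\mu_A = \left((-g_{00}(r_A))^{-1/2}, 0, 0, 0\right) $$
(and likewise for the observer at $r = r_B$). The wave vector $k^\mu = \dot x^\mu$ is a null tangent vector, $k^\mu k_\mu = 0$, to a null geodesic corresponding to the Lagrangian
$$ L = \tfrac12 g_{\mu\nu}\dot x^\mu\dot x^\nu . $$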
Since the metric is time-independent, there is (cf. the discussion in section 2.4) the corresponding conserved quantity
$$ E = -\frac{\partial L}{\partial\dot t} = -g_{00}(r)\,\dot t \qquad (2.118) $$
(the minus sign serving only to make this quantity positive for $\dot t > 0$). Then one finds that the frequency measured by the stationary observer at $r = r_A$ is
$$ \begin{aligned} \omega_A = -u^\mu_A k_\mu &= -(-g_{00}(r_A))^{-1/2}\,k_0 = -(-g_{00}(r_A))^{-1/2}\,g_{0\mu}(r_A)\dot x^\mu \\ &= -(-g_{00}(r_A))^{-1/2}\,g_{00}(r_A)\dot t = E\,(-g_{00}(r_A))^{-1/2} . \end{aligned} \qquad (2.119) $$
Since E is a conserved quantity, i.e. the same for the light ray at $r = r_A$ or $r = r_B$, one sees that $\omega_A/\omega_B = \nu_A/\nu_B$ is given by (2.105), as claimed.
Note also that this derivation shows that the relation between ω and E is exactly like the relation (2.113) between $(\Delta\tau)^{-1}$ and $(\Delta t)^{-1}$, which provides us with an interpretation of the conserved quantity E for a massless particle / photon: it is the frequency measured with respect to coordinate time (as the momentum conjugate to the time-coordinate t this should not be too surprising).
This result can also be deduced from energy conservation. A local inertial observer at the emitter A will see a change in the internal mass of the emitter $\Delta m_A = -h\nu_A$ when a photon of frequency $\nu_A$ is emitted. Likewise, the absorber at point B will experience an increase in inertial mass by $\Delta m_B = h\nu_B$. But the total internal plus gravitational potential energy must be conserved. Thus
$$ \Delta m_A\,(1 + \phi(r_A)) + \Delta m_B\,(1 + \phi(r_B)) = 0 , \quad\text{i.e.}\quad h\nu_A\,(1 + \phi(r_A)) = h\nu_B\,(1 + \phi(r_B)) , $$
leading to
$$ \frac{\nu_A}{\nu_B} = \frac{1 + \phi(r_B)}{1 + \phi(r_A)} \sim 1 + \phi(r_B) - \phi(r_A) , \qquad (2.123) $$
as before. This “derivation” (in quotes, because we are wildly mixing Newtonian gravity,
special relativity and quantum mechanics - do take this “derivation” with an appropri-
ately sized grain of salt, please) shows that gravitational red-shift experiments test the
Einstein Equivalence Principle in its strong form, in which the term ‘laws of nature’ is
not restricted to mechanics (inertial = gravitational mass), but also includes quantum
mechanics in the sense that it tests if in an inertial frame the relation between photon
energy and frequency is unaffected by the presence of a gravitational field.
While difficult to observe directly (by looking at light from the sun), this prediction
has been verified in the laboratory, first by Pound and Rebka (1960), and subsequently,
with one percent accuracy, by Pound and Snider in 1964 (using the Mössbauer effect).
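Let us make some rough estimates of the expected effect. We first consider light reaching us (B) from the sun (A). In this case, we have $r_B \gg r_A$, where $r_A$ is the radius of the sun, and (also inserting a so far suppressed factor of $c^2$) we obtain
$$ \frac{\nu_A - \nu_B}{\nu_B} = \frac{G_NM\,(r_B - r_A)}{c^2 r_A r_B} \simeq \frac{G_NM}{c^2 r_A} . \qquad (2.124) $$
Using the approximate values
$$ \begin{aligned} r_A &\simeq 0.7\times 10^{6}\ \text{km} \\ M_{\text{sun}} &\simeq 2\times 10^{33}\ \text{g} \\ G_N &\simeq 7\times 10^{-8}\ \text{g}^{-1}\text{cm}^3\text{s}^{-2} \\ G_N c^{-2} &\simeq 7\times 10^{-29}\ \text{g}^{-1}\text{cm} = 7\times 10^{-34}\ \text{g}^{-1}\text{km} , \end{aligned} \qquad (2.125) $$
one finds
$$ \frac{\Delta\nu}{\nu} \simeq 2\times 10^{-6} . \qquad (2.126) $$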
In principle, such a frequency shift should be observable. In practice, however, the spectral lines of light emitted by the sun are strongly affected, e.g. by convection in the atmosphere of the sun (Doppler effect), and this makes it difficult to measure this effect with the required precision.
In the Pound-Snider experiment, the actual value of Δν/ν is much smaller. In the original set-up one has $r_B - r_A \simeq 20$ m (the distance from floor to ceiling of the laboratory), and $r_A = r_{\text{earth}} \simeq 6.4\times 10^6$ m, leading to
$$ \frac{\Delta\nu}{\nu} \simeq 2.5\times 10^{-15} . \qquad (2.127) $$
However, here the experiment is much better controlled, and the gravitational red-shift
was verified with 1% accuracy.
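The estimates from (2.124) through (2.127) are simple arithmetic, reproduced below in cgs units. The solar values are those of (2.125). The Earth's mass (about $6\times 10^{27}$ g) and the Earth-Sun distance (about $1.5\times 10^{13}$ cm) are rounded inputs that we add, since they are not given in the text.

```python
# Rough cgs inputs: the solar values follow (2.125); the Earth's mass and the
# Earth-Sun distance are added, rounded inputs (not given in the text).
G_OVER_C2 = 7e-29                 # G_N / c^2 in cm / g
M_SUN, R_SUN = 2e33, 0.7e11       # g, cm
M_EARTH, R_EARTH = 6e27, 6.4e8    # g, cm
EARTH_SUN = 1.5e13                # cm


def fractional_shift(mass, r_a, r_b):
    """(nu_A - nu_B) / nu_B ~ G_N M (r_B - r_A) / (c^2 r_A r_B), eq. (2.124)."""
    return G_OVER_C2 * mass * (r_b - r_a) / (r_a * r_b)


solar = fractional_shift(M_SUN, R_SUN, EARTH_SUN)
tower = fractional_shift(M_EARTH, R_EARTH, R_EARTH + 2.0e3)   # 20 m = 2000 cm
print(f"sun -> earth: {solar:.1e}")
print(f"20 m tower:   {tower:.1e}")
```

With these rounded inputs the laboratory estimate comes out near $2\times 10^{-15}$, the same order as (2.127); the precise figure depends on the height and the rounding used.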
Central to our initial discussion of gravity was the Einstein Equivalence Principle which
postulates the existence of locally inertial (or freely falling) coordinate systems in which
locally at (or around) a point the effects of gravity are absent. Now that we have decided
that the arena of gravity is a general metric space-time, we should establish that such
coordinate systems indeed exist. Looking at the geodesic equation, it is clear that ‘absence of gravitational effects’ is tantamount to the existence of a coordinate system $\{\xi^a\}$ in which at a given point p the metric is the Minkowski metric, $g_{ab}(p) = \eta_{ab}$, and the Christoffel symbols are zero, $\Gamma^a_{bc}(p) = 0$. Owing to the identity
$$ g_{ab,c} = g_{ad}\Gamma^d_{bc} + g_{bd}\Gamma^d_{ac} , $$
the latter condition is equivalent to $g_{ab,c}(p) = 0$. I will sketch three arguments establishing the existence of such coordinate systems, each one having its own virtues and providing its own insights into the issue.
Actually it is physically plausible (and fortunately also true) that one can always
find coordinates which embody the equivalence principle in the stronger sense that the
metric is the flat metric ηab and the Christoffel symbols are zero not just at a point but
along the entire worldline of an inertial (freely falling) observer, i.e. along a geodesic.
Such coordinates, based on a geodesic rather than on a point, are known as Fermi
normal coordinates. The construction is similar to that of Riemann normal coordinates
(based at a point) to be discussed below.
1. Direct Construction
We know that given a coordinate system $\{\xi^a\}$ that is inertial at a point p, the metric and Christoffel symbols at p in a new coordinate system $\{x^\mu\}$ are determined by (1.75) and (1.82). Conversely, we will now see that knowledge of the metric
and Christoffel symbols at a point p is sufficient to construct a locally inertial
coordinate system at p.
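Equation (1.82) provides a second order differential equation in some coordinate system $\{x^\mu\}$ for the inertial coordinate system $\{\xi^a\}$, namely
$$ \frac{\partial^2\xi^a}{\partial x^\nu\partial x^\lambda} = \Gamma^\mu_{\nu\lambda}\frac{\partial\xi^a}{\partial x^\mu} . \qquad (2.129) $$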
By a general theorem, a local solution around p with given initial conditions $\xi^a(p)$ and $(\partial\xi^a/\partial x^\mu)(p)$ is guaranteed to exist. In terms of a Taylor series expansion around p one has
$$ \xi^a(x) = \xi^a(p) + \frac{\partial\xi^a}{\partial x^\mu}\Big|_p(x^\mu - p^\mu) + \frac12\frac{\partial\xi^a}{\partial x^\mu}\Big|_p\Gamma^\mu_{\nu\lambda}(p)(x^\nu - p^\nu)(x^\lambda - p^\lambda) + \ldots \qquad (2.130) $$
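It follows from (1.75) that the metric at p in the new coordinate system is
$$ g_{ab}(p) = g_{\mu\nu}(p)\,\frac{\partial x^\mu}{\partial\xi^a}\Big|_p\frac{\partial x^\nu}{\partial\xi^b}\Big|_p . \qquad (2.131) $$
Since a real symmetric matrix (here the metric at the point p) can always be brought to the diagonal form $\mathrm{diag}(\pm1, \ldots, \pm1)$ by a transformation of this type, $g \mapsto J^T g J$ with J invertible (Sylvester’s law of inertia), and since the metric has Lorentzian signature, for an appropriate choice of initial condition $(\partial\xi^a/\partial x^\mu)(p)$ one can arrange that $g_{ab}(p)$ is the standard Minkowski metric, $g_{ab}(p) = \eta_{ab}$.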
With a little bit more work it can also be shown that in these coordinates (2.130) one also has $g_{ab,c}(p) = 0$. Thus this is indeed an inertial coordinate system at p.
As the matrix $(\partial\xi^a/\partial x^\mu)(p)$ which transforms the metric at p into the standard Minkowski form is only unique up to Lorentz transformations, overall (counting also the initial condition $\xi^a(p)$) a locally inertial coordinate system is unique only up to Poincaré transformations, an unsurprising result.
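The step “choose $(\partial\xi^a/\partial x^\mu)(p)$ such that $g_{ab}(p) = \eta_{ab}$” can be made concrete. One explicit way, which is our choice and not the notes’, is a Gram-Schmidt orthonormalisation with respect to the indefinite inner product $g_{\mu\nu}(p)$. The sketch assumes that no intermediate vector is null (it raises an error if one is), and the sample metric components are made up.

```python
import math


def lorentz_frame(g):
    """Gram-Schmidt with respect to the (indefinite) metric g[mu][nu] at a point p.

    Returns (frame, signs): frame[a] holds components e_a^mu, to be read as
    dx^mu/dxi^a at p, with g(e_a, e_b) = signs[a] if a == b else 0.
    """
    n = len(g)

    def dot(u, v):
        return sum(g[m][k] * u[m] * v[k] for m in range(n) for k in range(n))

    frame, signs = [], []
    for i in range(n):
        v = [1.0 if m == i else 0.0 for m in range(n)]   # start from a coordinate basis vector
        for e, s in zip(frame, signs):
            c = dot(v, e) * s                             # projection coefficient, since g(e, e) = s
            v = [v[m] - c * e[m] for m in range(n)]
        norm2 = dot(v, v)
        if abs(norm2) < 1e-12:
            raise ValueError("null vector met; reorder the coordinate basis")
        frame.append([vm / math.sqrt(abs(norm2)) for vm in v])
        signs.append(1.0 if norm2 > 0 else -1.0)
    return frame, signs


def metric_in_frame(g, frame):
    """g_ab(p) = g_{mu nu}(p) e_a^mu e_b^nu, i.e. (2.131) with dx^mu/dxi^a = e_a^mu."""
    n = len(g)
    return [[sum(g[m][k] * ea[m] * eb[k] for m in range(n) for k in range(n))
             for eb in frame] for ea in frame]


g_p = [[-1.2, 0.1, 0.0, 0.0],
       [0.1, 1.1, 0.2, 0.0],
       [0.0, 0.2, 0.9, 0.0],
       [0.0, 0.0, 0.0, 1.3]]          # made-up Lorentzian metric components at p
frame, signs = lorentz_frame(g_p)
print(signs)                          # expected [-1.0, 1.0, 1.0, 1.0]
print(metric_in_frame(g_p, frame))    # diag(-1, 1, 1, 1) up to rounding
```

Any further Lorentz transformation of the resulting frame gives another valid choice. This matches the remark that locally inertial coordinates are unique only up to Poincaré transformations.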
the north pole into the φ = 0 ($\xi^1$) and φ = π/2 ($\xi^2$) directions. The affine parameter along a great circle (geodesic) connecting the north pole to a point (θ, φ) is θ, and thus θ is also the geodesic distance, and the coordinates of the point (θ, φ) are $(\xi^1 = \theta\cos\phi, \xi^2 = \theta\sin\phi)$. In particular, the north pole is the origin $\xi^1 = \xi^2 = 0$. Calculating the metric in these new coordinates one finds (exercise! see also e.g. Hartle’s Gravity) that $g_{ab}(\xi) = \delta_{ab} + O(\xi^2)$, so that $g_{ab}(\xi = 0) = \delta_{ab}$ and $g_{ab,c}(\xi = 0) = 0$, as required. Note also that one could have guessed these coordinates from the fact that near θ = 0 the metric is $d\theta^2 + \theta^2d\phi^2$, which is the Euclidean metric in polar coordinates $(\theta\cos\phi, \theta\sin\phi)$.
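The exercise can be checked numerically. Using $d\theta = \xi_a d\xi^a/\theta$ and $d\phi = (\xi^1 d\xi^2 - \xi^2 d\xi^1)/\theta^2$, the exact components are $g_{ab} = \xi_a\xi_b/\theta^2 + (\sin^2\theta/\theta^4)\,\epsilon_a\epsilon_b$ with $\epsilon = (-\xi^2, \xi^1)$. Expanding $\sin^2\theta = \theta^2 - \theta^4/3 + \ldots$ gives $g_{ab} = \delta_{ab} - \tfrac13(|\xi|^2\delta_{ab} - \xi_a\xi_b) + O(\xi^4)$; this explicit quadratic term is our own expansion, consistent with the $O(\xi^2)$ statement above. The sketch compares the exact and expanded forms at a sample point and checks that the metric is even in ξ, so that its first derivatives vanish at the origin.

```python
import math


def sphere_metric_in_xi(xi1, xi2):
    """Exact g_ab of ds^2 = dtheta^2 + sin^2(theta) dphi^2 in the coordinates
    xi^1 = theta cos(phi), xi^2 = theta sin(phi) (valid for 0 <= theta < pi)."""
    theta = math.hypot(xi1, xi2)
    if theta == 0.0:
        return [[1.0, 0.0], [0.0, 1.0]]      # limiting value at the north pole
    xi = (xi1, xi2)
    eps = (-xi2, xi1)                       # dphi = (xi1 dxi2 - xi2 dxi1) / theta^2
    s = math.sin(theta) ** 2 / theta ** 4
    # dtheta^2 contributes xi_a xi_b / theta^2, sin^2(theta) dphi^2 contributes s eps_a eps_b
    return [[xi[a] * xi[b] / theta ** 2 + s * eps[a] * eps[b] for b in range(2)]
            for a in range(2)]


def quadratic_form(xi1, xi2):
    """delta_ab - (1/3)(|xi|^2 delta_ab - xi_a xi_b): the metric up to O(xi^4) terms."""
    r2 = xi1 ** 2 + xi2 ** 2
    xi = (xi1, xi2)
    return [[(1.0 if a == b else 0.0) - ((r2 if a == b else 0.0) - xi[a] * xi[b]) / 3.0
             for b in range(2)] for a in range(2)]


g = sphere_metric_in_xi(0.01, 0.02)
q = quadratic_form(0.01, 0.02)
print(g)   # deviates from delta_ab at order 1e-4
print(max(abs(g[a][b] - q[a][b]) for a in range(2) for b in range(2)))  # much smaller
```

The $-\tfrac13$ coefficient is the unit-sphere curvature showing up in the second derivatives of the metric, which is precisely what cannot be removed by a coordinate choice.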
3. A Numerological Argument
This is my favourite argument because it requires no calculations and at the same
time provides additional insight into the nature of curved space-times.
Assuming that the local existence of solutions to differential equations is guaran-
teed by some mathematical theorems, it is frequently sufficient to check that one
has enough degrees of freedom to satisfy the desired initial conditions (one may
also need to check integrability conditions). In the present context, this argument
is useful because it also reveals some information about the ‘true’ curvature hidden
in the second derivatives of the metric. It works as follows:
number of 2nd derivatives of the metric modulo coordinate transformations is
$$ \left(\frac{D(D+1)}{2}\right)^2 - D\,\frac{D(D+1)(D+2)}{6} = \frac{1}{12}D^2(D^2 - 1) . \qquad (2.133) $$
Again this turns out to agree with the number of independent components (8.20) of the curvature tensor in D dimensions.
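The bookkeeping behind (2.133) can be spelled out in a few lines. This is our own sketch: it only restates the counting of components at the point p and does not compute any curvature. For each order it compares the number of Taylor coefficients of $g_{\mu\nu}$ with the number of available derivatives of $\xi^a(x)$.

```python
from math import comb


def riemann_normal_count(D):
    """Numerological count at a point p in D dimensions, cf. (2.133)."""
    n_g = comb(D + 1, 2)              # independent components g_{mu nu}: D(D+1)/2
    n_dg = D * n_g                    # first derivatives d_lambda g_{mu nu}
    n_ddg = n_g * n_g                 # second derivatives d_rho d_sigma g_{mu nu}
    n_dxi = D * D                     # d xi^a / d x^mu
    n_d2xi = D * comb(D + 1, 2)       # d^2 xi^a / dx dx, symmetric in the two x's
    n_d3xi = D * comb(D + 2, 3)       # d^3 xi^a / dx dx dx, totally symmetric
    return {
        "lorentz": n_dxi - n_g,                 # freedom left after g_ab(p) = eta_ab
        "first_derivs_left": n_dg - n_d2xi,     # 0: all g_{ab,c}(p) can be set to zero
        "second_derivs_left": n_ddg - n_d3xi,   # D^2 (D^2 - 1) / 12 curvature components
    }


for D in range(2, 6):
    print(D, riemann_normal_count(D))
```

For D = 4 this gives 6 leftover parameters (the Lorentz transformations mentioned above) and 20 second derivatives that cannot be transformed away.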
Note: At this point in the course I find it useful to develop in parallel (and suggest to
read in parallel)
• and a detailed discussion of the basic properties of the Schwarzschild metric (sec-
tions 12.4 - 15),
since much of the latter (in particular geodesics, solar system tests of general relativity,
even the issues that arise in connection with the Schwarzschild radius) can be understood
just on the basis of what has been done so far (if, for the time being, one accepts on
faith that the Schwarzschild metric is the unique spherically symmetric vacuum solution
of the Einstein field equations). Not only is this an interesting and physically relevant
application of the machinery developed so far, it also provides an appropriate balance
between physics and formalism in the lectures.
3 Tensor Algebra
3.1 From the Einstein Equivalence Principle to the Principle of General Covariance
The Einstein Equivalence Principle tells us that the laws of nature (including the effects
of gravity) should be such that in an inertial frame they reduce to the laws of Special
Relativity (SR). As we have seen, this can be implemented by transforming the laws
of SR to arbitrary coordinate systems and declaring that these be valid for arbitrary
coordinates and metrics.
However, this is a tedious method in general (e.g. to obtain the correct form of the Maxwell equations in the presence of gravity) and not particularly enlightening. We will thus replace the Einstein Equivalence Principle by the closely related Principle of General Covariance (PGC): a physical equation holds in an arbitrary gravitational field if
1. the equation holds in the absence of gravity, i.e. when $g_{\mu\nu} = \eta_{\mu\nu}$, $\Gamma^\mu_{\nu\lambda} = 0$, and
2. the equation is generally covariant, i.e. preserves its form under a general coordinate transformation.
Concretely, the 2nd condition means the following: assume that you have some physical
equation that in some coordinate system takes the form T = 0, where T = 0 could be
some multi-component (thus T is adorned with various indices) differential equation.
Now perform a coordinate transformation x → y(x), and assume that the new object
T ′ has the form
T ′ = (. . .)T + junk (3.1)
where the term in brackets is some invertible matrix or operator. Then clearly the
presence of the junk-terms means that the equation T ′ = 0 is not equivalent to the
equation T = 0. An example of an object that transforms in this way is, as we have seen,
the Christoffel symbols. On the other hand, if these junk terms are absent, so that we
have
T ′ = (. . .)T (3.2)
then clearly T ′ = 0 if and only if T = 0, i.e. the equation is satisfied in one coordinate
system if and only if it is satisfied in any other (or all) coordinate systems. This is the
kind of equation that embodies general covariance, and again we have already seen an
example of such an equation, namely the geodesic equation, where the term (. . .) in
brackets is just the Jacobi matrix. Thus, to be more concrete, we can replace the 2nd
condition above by
2’ the equation is generally covariant, i.e. it is satisfied in one coordinate
system iff it is satisfied in all coordinate systems.
Let us now establish the above statement, namely that the Einstein equivalence principle implies that an equation that satisfies the conditions 1 and 2 (or 2’) is valid in an arbitrary gravitational field:
• consider some equation that satisfies these conditions, and assume that we are in an arbitrary gravitational field;
• condition 2’ implies that this equation is true (or satisfied) in all coordinate systems if it is satisfied just in one coordinate system;
• now we know that we can always (locally) construct a freely falling coordinate system in which the effects of gravity are absent;
• the Einstein Equivalence Principle now posits that in such a reference system the physics is that of Minkowski space-time;
• by condition 1 the equation therefore holds in this freely falling system, and hence, by condition 2’, it holds in every coordinate system, i.e. in the given gravitational field.
Note that general covariance alone is an empty statement since any equation (whether
correct or not) can be made generally covariant simply by writing it in an arbitrary
coordinate system. It develops its power only when used in conjunction with the Einstein
Equivalence Principle as a statement about physics in a gravitational field, namely that
by virtue of its general covariance an equation will be true in a gravitational field if it
is true in the absence of gravitation.
3.2 Tensors
If you are already familiar with Lorentz tensors from special relativity (as briefly recalled
in section 1.2, these are objects which transform in a particularly simple multi-linear way
under Lorentz transformations), then hardly anything in this or the subsequent section
3.3 should be new or unexpected (but interesting new features will arise in particular
when we move on from tensor algebra to tensor analysis in section 4).
Scalars
The simplest example of a tensor is a function (or scalar) f which under a coordinate transformation $x^\mu \to y^{\mu'}(x^\mu)$ simply transforms as
$$f'(y(x)) = f(x) \tag{3.3}$$
or $f'(y) = f(x(y))$. One frequently suppresses the argument, and thus writes simply $f' = f$, expressing the fact that, up to the obvious change of argument, functions are invariant under coordinate transformations.
Vectors
The next simplest case is that of vectors $V^\mu(x)$, transforming as
$$V'^{\mu'}(y(x)) = \frac{\partial y^{\mu'}}{\partial x^\mu}\,V^\mu(x) \tag{3.4}$$
A prime example is the tangent vector $\dot x^\mu$ to a curve, for which this transformation behaviour
$$\dot x^\mu \to \dot y^{\mu'} = \frac{\partial y^{\mu'}}{\partial x^\mu}\,\dot x^\mu \tag{3.5}$$
is just the familiar one.
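As a quick numerical sanity check of (3.5), here is a minimal sketch in plain Python (standard library only). The curve (a circle of radius 2), the parameter value and the finite-difference step are illustrative choices, not taken from the text. The sketch differentiates the curve directly in polar coordinates and compares the result with the Jacobi matrix applied to the Cartesian tangent vector.

```python
import math

# A curve in the plane, given in Cartesian coordinates x^mu = (x, y): a circle of radius 2.
def curve_cart(t):
    return (2.0 * math.cos(t), 2.0 * math.sin(t))

# The coordinate transformation y(x): Cartesian (x, y) -> polar (r, phi).
def to_polar(p):
    x, y = p
    return (math.hypot(x, y), math.atan2(y, x))

def jacobi_polar(p):
    """Jacobi matrix J[mu'][mu] = dy^{mu'}/dx^{mu} of (x, y) -> (r, phi) at the point p."""
    x, y = p
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def tangent(curve, t, h=1e-6):
    """Tangent vector d curve(t)/dt by central differences."""
    a, b = curve(t + h), curve(t - h)
    return tuple((ai - bi) / (2 * h) for ai, bi in zip(a, b))

t0 = 0.7
xdot = tangent(curve_cart, t0)                                  # components in Cartesian coordinates
ydot_direct = tangent(lambda t: to_polar(curve_cart(t)), t0)    # differentiate the polar curve itself
J = jacobi_polar(curve_cart(t0))
ydot_rule = tuple(sum(J[i][j] * xdot[j] for j in range(2)) for i in range(2))  # rule (3.5)
print(ydot_direct)
print(ydot_rule)
```

For this circle both tuples should be close to (0, 1): the radius does not change along the curve and the polar angle grows at unit rate.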
It is extremely useful to think of vectors as first order differential operators, via the
correspondence
$$V^\mu \;\Leftrightarrow\; V := V^\mu\,\partial_\mu \tag{3.6}$$
One of the advantages of this point of view is that V is completely invariant under
coordinate transformations as the components V µ of V transform inversely to the basis
vectors ∂µ . For more on this see sections 3.5 and 3.6 on the coordinate-independent
interpretation of tensors below.
Covectors
Covariant 2-Tensors
Clearly, given the above objects, we can construct more general objects which transform
in a nice way under coordinate transformations by taking products of them. Tensors in
general are objects which transform like (but need not be equal to) products of vectors
and covectors.
Contravariant 2-Tensors
Remarks:
1. Note that, in particular, a tensor is zero (at a point) in one coordinate system if and only if the tensor is zero (at the same point) in another coordinate system. Thus, any law of nature (field equation, equation of motion) expressed in terms of tensors, say in the form $T^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q} = 0$, preserves its form under coordinate transformations and is therefore automatically generally covariant,
$$T^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q} = 0 \;\Leftrightarrow\; T^{\mu'_1\ldots\mu'_p}{}_{\nu'_1\ldots\nu'_q} = 0 \tag{3.13}$$
2. An important special example of a tensor is the Kronecker tensor $\delta^\mu{}_\nu$. Together with scalars and products of scalars and Kronecker tensors it is the only tensor whose components are the same in all coordinate systems. I.e. if one demands that $\delta^\mu{}_\nu$ transforms as a tensor, then one finds that it takes the same numerical values in all coordinate systems, i.e. $\delta^{\mu'}{}_{\nu'} = \delta^\mu{}_\nu$. Conversely, if one posits that $\delta^{\mu'}{}_{\nu'} = \delta^\mu{}_\nu$, one can deduce that $\delta^\mu{}_\nu$ transforms as (i.e. is) a (1,1)-tensor.
4. A covariant 2-tensor $T_{\mu\nu}$, say, is said to be symmetric if $T_{\mu\nu} = T_{\nu\mu}$ and anti-symmetric if $T_{\mu\nu} = -T_{\nu\mu}$. This is well-defined because it is a generally covariant notion: a tensor is symmetric in all coordinate systems iff it is symmetric in one coordinate system, etc.
This definition can be extended to any or all pairs of covariant indices or pairs of contravariant indices. Thus e.g. a tensor $T^{\mu_1\ldots\mu_p}$ is called totally symmetric (or totally anti-symmetric) if it is symmetric (anti-symmetric) under the exchange of any pair of indices.
On the other hand, it is not meaningful to talk of the symmetry of a (1,1)-tensor, say, as an equation like $T^\mu{}_\nu = T^\nu{}_\mu$ is meaningless.
3.3 Tensor Algebra
Tensors can be added, multiplied and contracted in certain obvious ways. The basic
algebraic operations are the following:
1. Linear Combinations
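Given two (p, q)-tensors $A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}$ and $B^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}$, their sum
$$C^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q} = A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q} + B^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q} \tag{3.15}$$
is again a (p, q)-tensor.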
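2. Direct Products
Given a (p, q)-tensor $A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}$ and a (p′, q′)-tensor $B^{\lambda_1\ldots\lambda_{p'}}{}_{\rho_1\ldots\rho_{q'}}$, their direct product
$$A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}\,B^{\lambda_1\ldots\lambda_{p'}}{}_{\rho_1\ldots\rho_{q'}} \tag{3.16}$$
is a (p + p′, q + q′)-tensor.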
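3. Contractions
Given a (p, q)-tensor with p and q non-zero, one can associate to it a (p − 1, q − 1)-tensor via contraction of one covariant and one contravariant index,
$$A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q} \to B^{\mu_1\ldots\mu_{p-1}}{}_{\nu_1\ldots\nu_{q-1}} = A^{\mu_1\ldots\mu_{p-1}\lambda}{}_{\nu_1\ldots\nu_{q-1}\lambda} \tag{3.17}$$
This is indeed a (p − 1, q − 1)-tensor, i.e. transforms like one. Consider, for example, a (1,2)-tensor $A^\mu{}_{\nu\lambda}$ and its contraction $B_\nu = A^\mu{}_{\nu\mu}$. Under a coordinate transformation $B_\nu$ transforms as a covector:
$$\begin{aligned}
B_{\nu'} &= A^{\mu'}{}_{\nu'\mu'} = \frac{\partial y^{\mu'}}{\partial x^\mu}\frac{\partial x^\nu}{\partial y^{\nu'}}\frac{\partial x^\lambda}{\partial y^{\mu'}}\,A^\mu{}_{\nu\lambda} \\
&= \frac{\partial x^\nu}{\partial y^{\nu'}}\,\delta^\lambda_\mu\,A^\mu{}_{\nu\lambda} = \frac{\partial x^\nu}{\partial y^{\nu'}}\,A^\mu{}_{\nu\mu} = \frac{\partial x^\nu}{\partial y^{\nu'}}\,B_\nu
\end{aligned} \tag{3.18}$$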
the metric. From the above we know that given a (p, q)-tensor $A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}$, the product plus contraction with the metric tensor $g_{\mu_1\nu}A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}$ is a (p − 1, q + 1)-tensor. It will be denoted by the same symbol, but with one index lowered by the metric, i.e. we write
$$g_{\mu_1\nu}\,A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q} \equiv A_\nu{}^{\mu_2\ldots\mu_p}{}_{\nu_1\ldots\nu_q} \tag{3.19}$$
Note that there are p different ways of lowering an index, and they will in general give rise to different tensors. It is therefore important to keep track of this in the notation. Thus, in the above, had we contracted over the second index instead of the first, we should write
$$g_{\mu_2\nu}\,A^{\mu_1\mu_2\ldots\mu_p}{}_{\nu_1\ldots\nu_q} \equiv A^{\mu_1}{}_\nu{}^{\mu_3\ldots\mu_p}{}_{\nu_1\ldots\nu_q} \tag{3.20}$$
Finally note that this notation is consistent with denoting the inverse metric by raised indices because
$$g^{\mu\nu} = g^{\mu\lambda}g^{\nu\sigma}g_{\lambda\sigma} \tag{3.21}$$
and raising one index of the metric gives the Kronecker tensor,
$$g^\mu{}_\nu \equiv g^{\mu\lambda}g_{\lambda\nu} = \delta^\mu{}_\nu\,.$$
It can also be generalised to the total (anti-)symmetrisation of a higher-rank tensor; e.g.
$$T_{(\mu\nu\lambda)} \equiv \frac{1}{3!}\big(T_{\mu\nu\lambda} + T_{\nu\mu\lambda} + T_{\lambda\nu\mu} + T_{\nu\lambda\mu} + T_{\mu\lambda\nu} + T_{\lambda\mu\nu}\big) \tag{3.26}$$
is totally symmetric, i.e. symmetric under the exchange of any pair of indices, and
$$T_{[\mu\nu\lambda]} \equiv \frac{1}{3!}\big(T_{\mu\nu\lambda} - T_{\nu\mu\lambda} - T_{\lambda\nu\mu} + T_{\nu\lambda\mu} - T_{\mu\lambda\nu} + T_{\lambda\mu\nu}\big) \tag{3.27}$$
is totally anti-symmetric. The prefactor 1/6 = 1/3! is again there to ensure that the total symmetrisation of a totally symmetric tensor is the original tensor (and likewise for the total anti-symmetrisation of totally anti-symmetric tensors). This generalises in an evident way to higher rank p tensors, with the combinatorial prefactor 1/p!.
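The 1/3! prefactors in (3.26) and (3.27), and their generalisation to 1/p!, can be checked with a short Python sketch (standard library only). It stores the components of a tensor in a dictionary keyed by index tuples, an illustrative choice rather than anything prescribed by the text. The sketch checks that the results are (anti-)symmetric and that applying the operation a second time changes nothing.

```python
from itertools import permutations, product
from math import factorial
import random

def parity(perm):
    """Sign of a permutation of (0, ..., k-1): +1 if even, -1 if odd (counts inversions)."""
    inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def symmetrise(T, n, k, anti=False):
    """Total symmetrisation T_(mu1...muk), or anti-symmetrisation T_[mu1...muk] if anti=True,
    of a rank-k tensor in n dimensions stored as {index tuple: component}; prefactor 1/k!."""
    out = {}
    for idx in product(range(n), repeat=k):
        total = 0.0
        for p in permutations(range(k)):
            sign = parity(p) if anti else 1
            total += sign * T[tuple(idx[i] for i in p)]
        out[idx] = total / factorial(k)
    return out

def swap(idx, i, j):
    """Exchange the i-th and j-th entries of an index tuple."""
    l = list(idx)
    l[i], l[j] = l[j], l[i]
    return tuple(l)

random.seed(0)
n, k = 4, 3
T = {idx: random.uniform(-1, 1) for idx in product(range(n), repeat=k)}
S = symmetrise(T, n, k)              # T_(mu nu lambda), eq. (3.26)
A = symmetrise(T, n, k, anti=True)   # T_[mu nu lambda], eq. (3.27)

pairs = [(0, 1), (0, 2), (1, 2)]
print(max(abs(S[i] - S[swap(i, a, b)]) for i in S for a, b in pairs))   # symmetric
print(max(abs(A[i] + A[swap(i, a, b)]) for i in A for a, b in pairs))   # anti-symmetric
S2 = symmetrise(S, n, k)             # the 1/k! prefactor makes both operations idempotent
A2 = symmetrise(A, n, k, anti=True)
print(max(abs(S2[i] - S[i]) for i in S), max(abs(A2[i] - A[i]) for i in A))
```

For k = 3 the signed sum over permutations in `symmetrise` reproduces exactly the six terms of (3.27), with the signs given there.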
An observation we will frequently make use of to recognise when some object is a tensor
is the following (occasionally known as the quotient theorem or quotient lemma):
Assume that you are given some object $A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}$. Then if for every covector $U_\mu$ the contracted object $U_{\mu_1}A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}$ transforms like a (p − 1, q)-tensor, $A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}$ is a (p, q)-tensor. Likewise for contractions with vectors or other tensors so that if e.g. in an equation of the form
$$A_{\mu\nu} = B_{\mu\nu\lambda\rho}\,C^{\lambda\rho} \tag{3.28}$$
you know that A transforms as a tensor for every tensor C, then B itself has to be a tensor.
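The contraction argument (3.18), which is also the mechanism the quotient theorem relies on, can be illustrated numerically. The sketch below assumes a linear coordinate change, so that the Jacobi matrix is a constant matrix, and uses random hypothetical components for $A^\mu{}_{\nu\lambda}$ at a point. It checks that the contraction $B_\nu = A^\mu{}_{\nu\mu}$ obeys the covector law, while "contracting" two lower indices does not produce a vector.

```python
import random

def inverse(M):
    """Inverse of a square matrix (list of lists) by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

random.seed(2)
n = 3
R = range(n)
# Linear coordinate change y = J x, so dy/dx = J is constant. The +4 on the diagonal
# makes J strictly diagonally dominant, hence invertible.
J = [[random.uniform(-1, 1) + (4.0 if i == j else 0.0) for j in R] for i in R]  # J[mu'][mu]
K = inverse(J)                                                                # K[mu][mu'] = dx/dy

# Hypothetical components of a (1,2)-tensor A^mu_{nu lambda}, stored as A[(mu, nu, lambda)]
A = {(m, a, b): random.uniform(-1, 1) for m in R for a in R for b in R}

# Tensor transformation law: dy/dx for the upper index, dx/dy for each lower index
Ap = {(m2, a2, b2): sum(J[m2][m] * K[a][a2] * K[b][b2] * A[(m, a, b)]
                        for m in R for a in R for b in R)
      for m2 in R for a2 in R for b2 in R}

B = [sum(A[(m, nu, m)] for m in R) for nu in R]                 # B_nu = A^mu_{nu mu}
Bp = [sum(Ap[(m, nu, m)] for m in R) for nu in R]               # same contraction in y-coordinates
Bp_rule = [sum(K[nu][nu2] * B[nu] for nu in R) for nu2 in R]    # covector law, last line of (3.18)
err_ok = max(abs(u - v) for u, v in zip(Bp, Bp_rule))

# By contrast, summing over two lower indices is not a tensorial operation:
C = [sum(A[(m, a, a)] for a in R) for m in R]
Cp = [sum(Ap[(m, a, a)] for a in R) for m in R]
Cp_rule = [sum(J[m2][m] * C[m] for m in R) for m2 in R]        # what a vector would do
err_bad = max(abs(u - v) for u, v in zip(Cp, Cp_rule))
print(err_ok, err_bad)
```

`err_ok` should be at the level of rounding errors. `err_bad` should be of order one because the two factors of dx/dy no longer combine into a Kronecker delta.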
While tensors are the objects which, in a sense, transform in the nicest and simplest
possible way under coordinate transformations, they are not the only relevant objects.
An important class of non-tensors are so-called tensor densities. The prime example of a tensor density is the determinant $g := -\det g_{\mu\nu}$ of the metric tensor (the minus sign is there only to make g positive in signature (− + + +)). It follows from the standard tensorial transformation law of the metric that under a coordinate transformation $x^\mu \to y^{\mu'}(x^\mu)$ this determinant transforms as
$$g' = \left(\det\frac{\partial x}{\partial y}\right)^{2} g = \left(\det\frac{\partial y}{\partial x}\right)^{-2} g \tag{3.30}$$
An object which transforms in such a way under coordinate transformations is called a
scalar tensor density of weight w = +2.
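$$g'^{-w/2}\,T'^{\mu'_1\ldots\mu'_p}{}_{\nu'_1\ldots\nu'_q} = \frac{\partial y^{\mu'_1}}{\partial x^{\mu_1}}\cdots\frac{\partial y^{\mu'_p}}{\partial x^{\mu_p}}\,\frac{\partial x^{\nu_1}}{\partial y^{\nu'_1}}\cdots\frac{\partial x^{\nu_q}}{\partial y^{\nu'_q}}\;g^{-w/2}\,T^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q} \tag{3.32}$$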
Conversely, therefore, any tensor density of weight w can be written as a tensor times $g^{+w/2}$. In particular (and this will be important in the following), the square root of the determinant, $\sqrt{g}$, transforms as, and hence is, a tensor density of weight w = +1.
The algebraic rules for tensor densities are strictly analogous to those for tensors. Thus,
for example, the sum of two (p, q) tensor densities of weight w (let us call this a (p, q; w)
tensor) is again a (p, q; w) tensor, and the direct product of a (p1 , q1 ; w1 ) and a (p2 , q2 ; w2 )
tensor is a (p1 + p2 , q1 + q2 ; w1 + w2 ) tensor. Contractions and the raising and lowering
of indices of tensor densities can also be defined just as for ordinary tensors.
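The weight-2 law (3.30) for g, and the resulting weight +1 for $\sqrt{g}$, can be checked at a single point with a small Python sketch. The metric (Minkowski plus a small random symmetric perturbation) and the Jacobi matrix are hypothetical values chosen for illustration. The Jacobi matrix is made diagonally dominant with a positive diagonal, so it is invertible with positive determinant, i.e. the coordinate change preserves orientation.

```python
import math
import random

def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [[float(v) for v in row] for row in M]
    n = len(A)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            result = -result
        result *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return result

random.seed(3)
D = 4
R = range(D)
# Metric at a point: diag(-1, 1, 1, 1) plus a small symmetric perturbation (illustrative values)
h = [[random.uniform(-0.1, 0.1) for _ in R] for _ in R]
g = [[(-1.0 if i == j == 0 else 1.0 if i == j else 0.0) + 0.5 * (h[i][j] + h[j][i]) for j in R]
     for i in R]
# Jacobi matrix K[mu][a] = dx^mu/dy^a at that point (diagonally dominant, so det K > 0)
K = [[random.uniform(-1, 1) + (5.0 if i == j else 0.0) for j in R] for i in R]

# Tensorial transformation of the metric: g'_{ab} = (dx^m/dy^a)(dx^n/dy^b) g_{mn}
gp = [[sum(K[m][a] * K[n][b] * g[m][n] for m in R for n in R) for b in R] for a in R]

gdet, gpdet = -det(g), -det(gp)                   # g := -det g_{mu nu} > 0 in signature (-+++)
print(gpdet / gdet, det(K) ** 2)                  # (3.30): weight w = +2
print(math.sqrt(gpdet) / math.sqrt(gdet), det(K)) # sqrt(g): weight w = +1
```

For an orientation-reversing Jacobi matrix the second line would instead give $|\det(\partial x/\partial y)|$, which is the sign subtlety behind the pseudo-tensor remark about the Levi-Civita tensor.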
Remarks:
1. The relevance of tensor densities arises from the change-of-variables formula of integral calculus, which says that the integral measure $d^4x$ (more generally $d^Dx$ in dimension D) transforms as
$$d^4y = \det\left(\frac{\partial y}{\partial x}\right)d^4x \tag{3.33}$$
i.e. as a scalar density of weight w = −1. Thus $\sqrt{g}\,d^4x$ is a volume element which is invariant under coordinate transformations,
$$\sqrt{g'}\,d^4y = \sqrt{g}\,d^4x \tag{3.34}$$
2. This is also frequently the quickest way to determine the volume element in non-Cartesian coordinates in Euclidean space. Thus, to determine what is the volume element in spherical coordinates $\{y^k\} = (r, \theta, \phi)$, say, instead of laboriously determining the Jacobi matrix for the coordinate transformation, and then (equally laboriously) calculating its determinant (which would be the standard uninspiring and uninspired procedure), all one needs to know is the metric in these coordinates to deduce (for a Euclidean metric, g denotes $\det g_{ij}$ itself, without the minus sign)
$$ds^2 = dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2) \;\Rightarrow\; g = r^4\sin^2\theta \tag{3.37}$$
and therefore
$$d^3x = \sqrt{g}\,d^3y = r^2\sin\theta\,dr\,d\theta\,d\phi \tag{3.38}$$
Note that all this is as it should be. In order to measure things like distances, areas
and volumes, all that one should need to know is the metric, not the Jacobian
between two coordinate systems.
3. There is one more important tensor density which, like the Kronecker tensor, has the same components in all coordinate systems. This is the totally anti-symmetric Levi-Civita symbol $\varepsilon_{\mu\nu\rho\sigma}$ (taking the values 0, ±1) which is a tensor density of weight w = −1. Then $\sqrt{g}\,\varepsilon_{\mu\nu\rho\sigma}$ is a tensor (strictly speaking it is a pseudo-tensor because of its behaviour under reversal of orientation, but this will not concern us here).
To see this, recall first of all the definition of the Levi-Civita symbol: it is totally anti-symmetric,
$$\varepsilon_{\lambda\mu\nu\rho} = \varepsilon_{[\lambda\mu\nu\rho]} \tag{3.39}$$
and has therefore only got one independent component which we will normalise to be
$$\varepsilon_{0123} = +1 \tag{3.40}$$
Thus $\varepsilon_{\lambda\mu\nu\rho} = +1$ if the indices (λμνρ) are an even permutation of (0123), $\varepsilon_{\lambda\mu\nu\rho} = -1$ if the indices (λμνρ) are an odd permutation of (0123), and $\varepsilon_{\lambda\mu\nu\rho} = 0$ iff any two indices are equal. This definition makes no reference to any coordinate system whatsoever, and thus tautologically the purely combinatorial object $\varepsilon_{\lambda\mu\nu\rho}$ has the same components in all coordinate systems.
This evidently extends to other dimensions, and we will define the D = (d + 1)-dimensional Levi-Civita symbol in the same way,
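Now choose M to be the Jacobi matrix (∂y/∂x). Then the above equation shows that
$$\varepsilon_{\mu'_1\ldots\mu'_D} = \det\left(\frac{\partial y}{\partial x}\right)\frac{\partial x^{\nu_1}}{\partial y^{\mu'_1}}\cdots\frac{\partial x^{\nu_D}}{\partial y^{\mu'_D}}\,\varepsilon_{\nu_1\ldots\nu_D} \tag{3.43}$$
i.e. that $\varepsilon_{\nu_1\ldots\nu_D}$ transforms as a tensor density of weight w = −1.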
Going back to 4 dimensions, it follows that
$$\epsilon_{\lambda\mu\nu\rho} \equiv \sqrt{g}\,\varepsilon_{\lambda\mu\nu\rho} \tag{3.44}$$
is a tensor.
We could have chosen not to absorb the minus sign into the definition of $\varepsilon^{\lambda\mu\nu\rho}$, at the expense of an explicit minus sign on the right-hand side of (3.45). The convention we have adopted is more convenient, however, in particular since it is compatible with the standard practice in special relativity to (tacitly) identify $\epsilon^{\lambda\mu\nu\rho} = \varepsilon^{\lambda\mu\nu\rho}$, the minus sign arising from raising the indices on $\varepsilon_{\lambda\mu\nu\rho}$ with the Minkowski metric $\eta^{\mu\nu}$ with $\eta^{00} = -1$, so that $\varepsilon^{0123} = -\varepsilon_{0123}$.
proportional to the metric-dependent Levi-Civita tensor $\epsilon_{\mu_1\ldots\mu_4}$, but we will now not make use of this fact (which would return us to the setting of the previous remark). Rather, we consider its contraction with the contravariant Levi-Civita symbol $\varepsilon^{\lambda\mu\nu\rho}$ (which exists independently of any additional structure like a metric),
$$A_{\mu_1\ldots\mu_4} \to \tilde f = \varepsilon^{\mu_1\ldots\mu_4}A_{\mu_1\ldots\mu_4} \tag{3.49}$$
This is a scalar density of weight w = +1. As a consequence, its integral (3.36) is well-defined and coordinate independent, without reference to any metric. Thus totally anti-symmetric (0, 4)-tensors (4-forms) provide natural 4-dimensional volume elements (and likewise for totally anti-symmetric (0, p)-tensors and p-dimensional volume elements).
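The shortcut of Remark 2 can be compared numerically with the laborious route. In this plain-Python sketch the Jacobi matrix of $(r,\theta,\phi) \mapsto (x,y,z)$ is approximated by central finite differences (an illustrative choice), and its determinant is compared with $\sqrt{g}$ read off from the metric (3.37). Both should equal $r^2\sin\theta$. The sample points are arbitrary, with $0 < \theta < \pi$.

```python
import math

def cart(r, th, ph):
    """Cartesian coordinates x(y) of the point with spherical coordinates y = (r, theta, phi)."""
    return (r * math.sin(th) * math.cos(ph), r * math.sin(th) * math.sin(ph), r * math.cos(th))

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def jacobian_det(y, h=1e-6):
    """The laborious route: det(dx^i/dy^k), with the Jacobi matrix built by central differences."""
    M = [[0.0] * 3 for _ in range(3)]
    for k in range(3):
        yp, ym = list(y), list(y)
        yp[k] += h
        ym[k] -= h
        xp, xm = cart(*yp), cart(*ym)
        for i in range(3):
            M[i][k] = (xp[i] - xm[i]) / (2 * h)
    return det3(M)

def sqrt_g(y):
    """The shortcut (3.37)-(3.38): square root of det diag(1, r^2, r^2 sin^2 theta)."""
    r, th, _ = y
    return math.sqrt(1.0 * r ** 2 * (r * math.sin(th)) ** 2)

points = [(1.0, 0.3, 0.0), (2.5, 1.2, 2.0), (0.7, 2.8, 5.0)]
for y in points:
    print(y, jacobian_det(y), sqrt_g(y), y[0] ** 2 * math.sin(y[1]))
```

Only the diagonal metric enters `sqrt_g`, which is the point of the remark: no Jacobi matrix is needed.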
Consider first of all the derivative df of a function (scalar field) f = f (x). This is
clearly a coordinate-independent object, not only because we didn’t have to specify a
coordinate system to write df but also because
$$df = \frac{\partial f(x)}{\partial x^\mu}\,dx^\mu = \frac{\partial f(y(x))}{\partial y^{\mu'}}\,dy^{\mu'} \tag{3.50}$$
which follows from the fact that ∂µ f (a covector) and dxµ (the coordinate differentials)
transform inversely to each other under coordinate transformations. This suggests that
it is useful to regard the quantities ∂µ f as the coefficients of the coordinate independent
object df in a particular coordinate system, namely when df is expanded in the basis
{dxµ }.
We can do the same thing for any covector Aµ . If Aµ is a covector (i.e. transforms like
one under coordinate transformations), then A := Aµ (x)dxµ is coordinate-independent,
and it is useful to think of the Aµ as the coefficients of the covector A when expanded
in a coordinate basis, A = Aµ dxµ . Linear combinations of dxµ built in this way from
covectors are known as 1-forms.
From this point of view, we interpret the {Aµ } simply as the (coordinate dependent)
components of the (coordinate independent) 1-form A when expressed with respect to
the (coordinate dependent) differentials {dxµ }, considered as a basis of the space of
covectors.
We can do the same thing for a general covariant tensor $T_{\mu\nu\cdots}$. Namely, if $T_{\mu_1\cdots\mu_q}$ is a (0, q)-tensor, then
$$T := T_{\mu_1\cdots\mu_q}\,dx^{\mu_1}\ldots dx^{\mu_q} \tag{3.51}$$
is coordinate independent. In the particular case of the metric tensor we have already
known and used this. In that case, T is what we called ds2 ,
$$g_{\mu\nu} \to ds^2 = g_{\mu\nu}\,dx^\mu dx^\nu \tag{3.52}$$
which we know to be invariant under coordinate transformations (either by definition
or by construction).
If $T_{\mu_1\cdots\mu_p}$ is totally anti-symmetric, the resulting object is also referred to as a p-form. As we have already seen above, such p-forms provide natural and invariant p-dimensional volume elements. In particular, applying this to the Levi-Civita tensor discussed in section 3.4, we reproduce the statement (3.34) (see also (3.48)) that
$$\frac{1}{4!}\,\epsilon_{\lambda\mu\nu\rho}\,dx^\lambda dx^\mu dx^\nu dx^\rho = \sqrt{g}\,d^4x \tag{3.53}$$
is a space-time volume element that is invariant under coordinate transformations.
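The algebraic fact behind (3.43), and hence behind $\sqrt{g}\,\varepsilon_{\lambda\mu\nu\rho}$ being a tensor and (3.53) being invariant, is the identity $M^{\nu_1}{}_{\mu_1}\cdots M^{\nu_D}{}_{\mu_D}\,\varepsilon_{\nu_1\ldots\nu_D} = \det M\;\varepsilon_{\mu_1\ldots\mu_D}$. Below is a minimal Python sketch that checks it for every index tuple, assuming a random 4×4 matrix (an illustrative choice). The determinant is computed by the Leibniz formula, which is itself the special case $\mu = (0,1,\ldots,D-1)$ of the contraction.

```python
from itertools import permutations, product
import random

def levi_civita(idx):
    """Levi-Civita symbol: +1 (-1) if idx is an even (odd) permutation of (0, ..., D-1), 0 otherwise."""
    if len(set(idx)) < len(idx):
        return 0
    inversions = sum(1 for i in range(len(idx)) for j in range(i + 1, len(idx)) if idx[i] > idx[j])
    return -1 if inversions % 2 else 1

def contract(M, mu):
    """M^{nu_1}_{mu_1} ... M^{nu_D}_{mu_D} eps_{nu_1...nu_D}; only permutations nu contribute."""
    D = len(M)
    total = 0.0
    for nu in permutations(range(D)):
        term = float(levi_civita(nu))
        for k in range(D):
            term *= M[nu[k]][mu[k]]     # M[upper index][lower index]
        total += term
    return total

def det(M):
    """Leibniz formula: det M is the contraction with mu = (0, 1, ..., D-1)."""
    return contract(M, tuple(range(len(M))))

random.seed(4)
D = 4
M = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(D)]
dM = det(M)
# The contraction reproduces det(M) times the Levi-Civita symbol for all D^D index tuples
err = max(abs(contract(M, mu) - dM * levi_civita(mu)) for mu in product(range(D), repeat=D))
print(dM, err)
```

Choosing M to be the Jacobi matrix ∂x/∂y in this identity and solving for $\varepsilon_{\mu'_1\ldots\mu'_D}$ gives the transformation law (3.43).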
Can we do something similar for vectors and other contravariant (or mixed) tensors?
The answer is yes. Just as covectors transform inversely to coordinate differentials,
vectors V µ transform inversely to partial derivatives ∂µ . Thus
$$V := V^\mu(x)\,\frac{\partial}{\partial x^\mu} \tag{3.54}$$
is coordinate-independent - a coordinate-independent linear first-order differential op-
erator. One can thus always think of a vector field as a 1st order differential operator
and this is a very fruitful point of view.
Any choice of coordinate system {xµ } gives rise to such a basis {dxµ }, and such bases
are known as coordinate bases or natural bases. In the classical (and somewhat old-
fashioned) tensor calculus one always works in such a basis, and directly with the com-
ponents of tensors with respect to such a basis. This is very convenient and natural,
but this is now clearly not the only choice.
3.6 Vielbeins and Orthonormal Frames
The above point of view suggests a reformulation and generalisation that is extremely natural and useful (but that I will nevertheless hardly ever make use of in these notes). Namely, let $\{e^m{}_\mu(x)\}$ be such that it is an invertible matrix for every point x. Then another possible choice of basis for the space of covectors are the linear combinations
$$e^m = e^m{}_\mu(x)\,dx^\mu$$
A general such basis is called a vielbein, which is German for "many legs", quite appropriate actually, as one should visualise this as a bunch of linearly independent (co-)vectors at every point of space-time.
In two, three, and four dimensions these are also known more specifically as zweibeins, dreibeins and vierbeins respectively. In four dimensions, the Greek-derived term tetrad is also commonly used. The $e^m$ are sometimes also referred to as frame fields, mostly in the context of orthonormal frames (see below).
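In general, this new basis is not a coordinate basis, i.e. there does not exist a coordinate system $\{y^m\}$ such that $e^m = dy^m$. If such a coordinate system does exist, then one has
$$e^m = dy^m \;\Rightarrow\; e^m{}_\mu = \frac{\partial y^m}{\partial x^\mu} \tag{3.57}$$
$$\Rightarrow\; \partial_\nu e^m{}_\mu = \partial_\mu e^m{}_\nu \tag{3.58}$$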
For many purposes, bases other than coordinate bases can also be extremely useful and
natural, in particular the orthonormal bases we will introduce below.
$$e^m{}_\mu\,e^\mu{}_n = \delta^m{}_n\,,\qquad e^\mu{}_m\,e^m{}_\nu = \delta^\mu{}_\nu \tag{3.60}$$
so that the components of A with respect to the new basis $\{e^m\}$ are
$$A_m = A_\mu\,e^\mu{}_m \tag{3.62}$$
Likewise, the vielbeins allow us to pass from a natural (or coordinate) basis for vector fields, the $\{\partial_\mu\}$, to another basis
$$E_m = e^\mu{}_m\,\partial_\mu \tag{3.63}$$
$$V = V^\mu(x)\,\partial_\mu = V^m E_m \tag{3.64}$$
with
$$V^m = e^m{}_\mu V^\mu \tag{3.65}$$
$$[E_m, E_n] \neq 0 \tag{3.66}$$
We can apply the same reasoning to any other tensor field, e.g. to the metric tensor itself. We can write the invariant line element as
$$ds^2 = g_{\mu\nu}\,dx^\mu dx^\nu = g_{mn}\,e^m e^n$$
so that the components of the metric with respect to the new basis are
$$g_{mn} = e^\mu{}_m\,e^\nu{}_n\,g_{\mu\nu}$$
Given a metric, there is a preferred class of bases $\{e^a\}$ which are such that the corresponding matrices $e^a{}_\mu(x)$ diagonalise (and normalise) the metric at every point x, i.e. which are such that $g_{ab} = \eta_{ab}$ or
$$g_{\mu\nu}(x)\,e^\mu{}_a(x)\,e^\nu{}_b(x) = \eta_{ab} \tag{3.69}$$
Such a basis $e^a$, with respect to which the components of the metric are the Minkowski metric $\eta_{ab}$, is known as an orthonormal basis or orthonormal frame.
In the more mathematical literature, the ea are also referred to as soldering forms
because they identify (solder, glue) an abstract space of (co-)vectors at each point x,
labelled by a, b, . . . with the concrete space of (co-)vectors tangent to the space-time at
the point x, labelled e.g. by the indices µ, ν, . . ..
For a general metric, a basis which achieves this cannot be a coordinate basis (because this would mean that the metric is equivalent to the Minkowski metric by a coordinate transformation). But clearly there is no obstacle to finding a more general basis which will do this: for every point x we can find a matrix $e^a{}_\mu(x)$ which achieves (3.69). As the
metric varies smoothly with x, we can also choose the matrices eaµ (x) to vary smoothly
with x, and hence we can put them together to define the smooth matrix-valued function
eaµ (x) for all x. [I am ignoring some global (topological) issues here. We will not need
to worry about them here.]
The reason why I referred to a “class of bases” above is that, clearly, such an orthonormal basis is not unique. At every point x it is determined up to a Lorentz transformation,
$$e^a(x) \to \Lambda^a{}_b(x)\,e^b(x)\,,\qquad \Lambda^a{}_c(x)\,\Lambda^b{}_d(x)\,\eta_{ab} = \eta_{cd}\,.$$
Thus a given metric does not determine a unique orthonormal basis, but only an orthonormal basis up to Lorentz transformations.
If one wants the components of the metric in a given coordinate system $\{x^\mu\}$, one expands the orthonormal basis $e^a$ in terms of the natural basis $dx^\mu$ as before,
$$e^a = e^a{}_\mu(x)\,dx^\mu$$
to find, as above,
$$g_{\mu\nu}(x) = e^a{}_\mu(x)\,e^b{}_\nu(x)\,\eta_{ab} \tag{3.74}$$
Thus instead of the metric one can choose orthonormal vielbeins as the basic variables
of General Relativity. In that case one has to demand not only general covariance but
also invariance under local Lorentz transformations (acting on the orthonormal indices
a, b, . . .). [One could also allow for general vielbeins, in which case one would have to
replace Lorentz transformations by the larger group of general linear transformations.]
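Examples:
Here are a few examples to illustrate that orthonormal frames are not something mysterious but can usually be read off very easily from the metric in a coordinate basis.
Consider first the 2-sphere of radius R, with metric $ds^2 = R^2(d\theta^2 + \sin^2\theta\,d\phi^2)$. Now define
$$e^1 = R\,d\theta\,,\qquad e^2 = R\sin\theta\,d\phi \tag{3.76}$$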
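i.e.
$$e^1{}_\theta = R\,,\quad e^1{}_\phi = 0\,,\quad e^2{}_\theta = 0\,,\quad e^2{}_\phi = R\sin\theta \tag{3.77}$$
Then the metric can be written as
$$ds^2 = (e^1)^2 + (e^2)^2 = \delta_{ab}\,e^a e^b$$
so the $e^a$ are an orthonormal basis. They are obviously not a coordinate basis because the condition (3.58) is violated:
$$\partial_\theta e^2{}_\phi = R\cos\theta \;\neq\; \partial_\phi e^2{}_\theta = 0 \tag{3.79}$$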
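Similarly, for the Schwarzschild metric, with
$$(e^0, e^1, e^2, e^3) = \Big(\big(1 - \tfrac{2m}{r}\big)^{1/2}dt,\; \big(1 - \tfrac{2m}{r}\big)^{-1/2}dr,\; r\,d\theta,\; r\sin\theta\,d\phi\Big) \tag{3.81}$$
the metric can be written as
$$ds^2 = -(e^0)^2 + (e^1)^2 + (e^2)^2 + (e^3)^2 = \eta_{ab}\,e^a e^b$$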
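For the 2-sphere example, the failure of the coordinate-basis condition (3.58) can also be seen numerically. This Python sketch evaluates $\partial_\theta e^a{}_\phi - \partial_\phi e^a{}_\theta$ by central differences for both frame fields and rebuilds the metric from the frame. The radius R = 2 and the sample point are arbitrary illustrative values.

```python
import math

R = 2.0  # sphere radius (illustrative value)

def e(a, mu, th, ph):
    """Components e^a_mu of the frame (3.76)-(3.77); a in {1, 2}, mu = 0 (theta) or 1 (phi)."""
    return {(1, 0): R, (1, 1): 0.0, (2, 0): 0.0, (2, 1): R * math.sin(th)}[(a, mu)]

def d(f, k, th, ph, h=1e-6):
    """Central difference of f(theta, phi) with respect to coordinate k (0 = theta, 1 = phi)."""
    if k == 0:
        return (f(th + h, ph) - f(th - h, ph)) / (2 * h)
    return (f(th, ph + h) - f(th, ph - h)) / (2 * h)

def obstruction(a, th, ph):
    """d_theta e^a_phi - d_phi e^a_theta; it must vanish if e^a = dy^a for some coordinate y^a."""
    return (d(lambda t, p: e(a, 1, t, p), 0, th, ph)
            - d(lambda t, p: e(a, 0, t, p), 1, th, ph))

def metric(th, ph):
    """g_{mu nu} = sum_a e^a_mu e^a_nu, i.e. ds^2 = (e^1)^2 + (e^2)^2."""
    return [[sum(e(a, m, th, ph) * e(a, n, th, ph) for a in (1, 2)) for n in range(2)]
            for m in range(2)]

th, ph = 0.9, 0.3
print(obstruction(1, th, ph), obstruction(2, th, ph), R * math.cos(th))
print(metric(th, ph), [[R ** 2, 0.0], [0.0, (R * math.sin(th)) ** 2]])
```

The first frame field, $e^1 = R\,d\theta = d(R\theta)$, is exact on its own; the obstruction comes entirely from $e^2$, in agreement with (3.79).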
Remarks:
1. The $e^a{}_\mu$ can in some sense be regarded as the square root of the metric. In particular, denoting the determinant of the matrix $e^a{}_\mu$ by
$$e(x) := \det\big(e^a{}_\mu(x)\big)$$
(3.69) implies
$$g(x) := -\det\big(g_{\alpha\beta}(x)\big) = e(x)^2 \;\Leftrightarrow\; |e(x)| = \sqrt{g(x)} \tag{3.86}$$
2. Coordinate indices can, as usual, be raised and lowered with the space-time metric
gµν and its inverse, and Minkowski (tangent space) indices with the Minkowski
metric ηab and its inverse.
Note that this is consistent with the notation for $e^a{}_\mu$ and its inverse $e^\mu{}_a$ because
$$\eta_{ab}\,g^{\mu\nu}\,e^b{}_\nu = e^\mu{}_a\,,$$
etc. The reason why I have called the basis of vector fields in a general frame $E_m$ rather than $e_m$ is that $e^m$ and $E_m$ are of course not related just by raising or lowering the indices with the metric, $E_m \neq g_{mn}e^n$. The former are linear combinations of the $dx^\mu$, the latter linear combinations of the $\partial_\mu$, so they are very different objects.
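Remark 1, together with the Lorentz freedom of orthonormal frames noted above, can be checked for the Schwarzschild frame (3.81) with a short Python sketch. It uses the illustrative values m = 1, r = 5, θ = 1.1 (any r > 2m would do). The sketch rebuilds $g_{\mu\nu}$ via (3.74), checks that a local boost of the frame (rapidity 0.8, an arbitrary choice) leaves the metric unchanged, and checks $g = e^2$ from (3.86).

```python
import math

def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [[float(v) for v in row] for row in M]
    n = len(A)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            result = -result
        result *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return result

m = 1.0                        # mass parameter in geometric units (illustrative); needs r > 2m
ETA = [-1.0, 1.0, 1.0, 1.0]    # diagonal entries of eta_ab

def frame(r, th):
    """Matrix e[a][mu] of the orthonormal frame (3.81), coordinates (t, r, theta, phi)."""
    f = 1.0 - 2.0 * m / r
    return [[math.sqrt(f), 0.0, 0.0, 0.0],
            [0.0, 1.0 / math.sqrt(f), 0.0, 0.0],
            [0.0, 0.0, r, 0.0],
            [0.0, 0.0, 0.0, r * math.sin(th)]]

def metric_from_frame(e):
    """(3.74): g_{mu nu} = e^a_mu e^b_nu eta_ab; eta is diagonal, so a single sum over a."""
    return [[sum(ETA[a] * e[a][mu] * e[a][nu] for a in range(4)) for nu in range(4)]
            for mu in range(4)]

def boost(e, beta):
    """Local Lorentz transformation e'^a_mu = Lambda^a_b e^b_mu (boost with rapidity beta in the 0-1 plane)."""
    ch, sh = math.cosh(beta), math.sinh(beta)
    lam = [[ch, sh, 0.0, 0.0], [sh, ch, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    return [[sum(lam[a][b] * e[b][mu] for b in range(4)) for mu in range(4)] for a in range(4)]

r, th = 5.0, 1.1
e = frame(r, th)
g = metric_from_frame(e)
g_boosted = metric_from_frame(boost(e, 0.8))
print(g[0][0], g[1][1])                                                          # -(1-2m/r), 1/(1-2m/r)
print(max(abs(g[i][j] - g_boosted[i][j]) for i in range(4) for j in range(4)))   # frame freedom
print(-det(g), det(e) ** 2, r ** 4 * math.sin(th) ** 2)                          # (3.86): g = e^2
```

The boosted frame is no longer diagonal in the coordinate basis, yet it describes exactly the same metric, which is why the metric alone fixes an orthonormal frame only up to local Lorentz transformations.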
One could now go ahead and develop the entire machinery of tensor calculus (covariant derivatives, curvature, …) that we are about to develop in the following sections in terms of vielbeins as the basic variables instead of the metric. This is rather straightforward. For example, given the expression for the Christoffel symbols in terms of the metric, and for the metric in terms of the vielbeins, one can express the Christoffel symbols (and hence covariant derivatives and curvatures) in terms of vielbeins. But the resulting expressions are rather unenlightening and not of much use in practice.
The real power of the vielbein formalism emerges when one combines it with the for-
malism of differential forms. And in practice the most useful and efficient alternative
to working in components in a coordinate basis is working with differential forms in an
orthonormal basis.³
³ I do most of my calculations in the latter framework, but this is (for the time being) not something I will develop further here in these notes. See e.g. W. Thirring, Classical Mathematical Physics for a presentation of general relativity entirely in the coordinate-independent formalism of differential forms.
4 Tensor Analysis
Tensors transform in a nice and simple way under general coordinate transformations.
Thus these appear to be the right objects to construct equations from that satisfy the
Principle of General Covariance.
However, the laws of physics are differential equations, so we need to know how to differentiate tensors. The problem is that the ordinary partial derivative does not map tensors to tensors: the partial derivative of a (p, q)-tensor is not a tensor unless p = q = 0.
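This is easy to see: take for example a vector $V^\mu$. Under a coordinate transformation, its partial derivative transforms as
$$\begin{aligned}
\partial_{\nu'}V^{\mu'} &= \frac{\partial x^\nu}{\partial y^{\nu'}}\frac{\partial}{\partial x^\nu}\left(\frac{\partial y^{\mu'}}{\partial x^\mu}V^\mu\right) \\
&= \frac{\partial x^\nu}{\partial y^{\nu'}}\frac{\partial y^{\mu'}}{\partial x^\mu}\,\partial_\nu V^\mu + \frac{\partial x^\nu}{\partial y^{\nu'}}\frac{\partial^2 y^{\mu'}}{\partial x^\mu\,\partial x^\nu}\,V^\mu
\end{aligned} \tag{4.1}$$
The appearance of the second term shows that the partial derivative of a vector is not a tensor.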
As the second term is zero for linear transformations, you see that partial derivatives
transform in a tensorial way e.g. under Lorentz transformations, so that partial deriva-
tives are all one usually needs in special relativity.
We also see that the lack of covariance of the partial derivative is very similar to the lack of covariance of the equation $\ddot x^\mu = 0$, and this suggests that the problem can be cured in the same way, by introducing Christoffel symbols. This is indeed the case. We define the covariant derivative of a vector field by
$$\nabla_\nu V^\mu = \partial_\nu V^\mu + \Gamma^\mu_{\nu\lambda}V^\lambda \tag{4.2}$$
It follows from the non-tensorial behaviour (1.115) of the Christoffel symbols under coor-
dinate transformations that ∇ν V µ , as defined above, is indeed a (1, 1) tensor. Moreover,
in a locally inertial coordinate system this reduces to the ordinary partial derivative,
and we have thus, as desired, arrived at an appropriate tensorial generalisation of the
partial derivative operator.
We could have also arrived at the above definition in a somewhat more systematic way.
Namely, let {ξ a } be an inertial coordinate system. In an inertial coordinate system we
can just use the ordinary partial derivative ∂b V a . We now define the new (improved,
covariant) derivative ∇ν V µ in any other coordinate system {xµ } by demanding that it
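transforms as a (1,1)-tensor, i.e. we define
$$\nabla_\nu V^\mu := \frac{\partial x^\mu}{\partial\xi^a}\frac{\partial\xi^b}{\partial x^\nu}\,\partial_b V^a \tag{4.3}$$
By a straightforward calculation one finds that
$$\nabla_\nu V^\mu = \partial_\nu V^\mu + \Gamma^\mu_{\nu\lambda}V^\lambda \tag{4.4}$$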
That $\nabla_\nu V^\mu$, defined in this way, is indeed a (1,1)-tensor now follows directly from the way we arrived at the definition of the covariant derivative. Indeed, imagine transforming from inertial coordinates to another coordinate system $\{y^{\mu'}\}$. Then (4.3) is replaced by
$$\nabla_{\nu'}V^{\mu'} := \frac{\partial y^{\mu'}}{\partial\xi^a}\frac{\partial\xi^b}{\partial y^{\nu'}}\,\partial_b V^a \tag{4.6}$$
Comparing this with (4.3), we see that the two are related by
$$\nabla_{\nu'}V^{\mu'} = \frac{\partial y^{\mu'}}{\partial x^\mu}\frac{\partial x^\nu}{\partial y^{\nu'}}\,\nabla_\nu V^\mu \tag{4.7}$$
as required.
$$\nabla_X V^\mu \equiv X^\nu\,\nabla_\nu V^\mu \tag{4.8}$$
The appearance of the Christoffel term in the definition of the covariant derivative may at first sight appear a bit unusual (even though it also appears when one just transforms Cartesian partial derivatives to polar coordinates etc.). There is a more invariant way of explaining the appearance of this term, related to the more coordinate-independent way of looking at tensors explained above. Namely, since the $V^\mu(x)$ are really just the coefficients of the vector field $V(x) = V^\mu(x)\partial_\mu$ when expanded in the basis $\partial_\mu$, a meaningful definition of the derivative of a vector field must take into account not only
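the change in the coefficients but also the fact that the basis changes from point to point, and this is precisely what the Christoffel symbols do. Writing
$$\nabla_\nu V = \nabla_\nu(V^\mu\partial_\mu) = (\partial_\nu V^\mu)\,\partial_\mu + V^\lambda\,(\nabla_\nu\partial_\lambda) \tag{4.9}$$
we recover (4.2) in the form $\nabla_\nu V = (\nabla_\nu V^\mu)\,\partial_\mu$ provided that
$$\nabla_\nu\partial_\lambda = \Gamma^\mu_{\nu\lambda}\,\partial_\mu \tag{4.10}$$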
It is instructive to check in some examples that the Christoffel symbols indeed describe
the change of the tangent vectors ∂µ . For instance on the plane, in polar coordinates,
one has
$$\nabla_r\partial_r = \Gamma^\mu_{rr}\,\partial_\mu = 0 \tag{4.12}$$
which is correct because $\partial_r$ indeed does not change when one moves in the radial direction. $\partial_r$ does change, however, when one moves in the angular direction given by $\partial_\phi$. In fact, it changes its direction proportional to $\partial_\phi$, but this change is stronger for small values of r than for larger ones (draw a picture!). This is precisely captured by the non-zero Christoffel symbol $\Gamma^\phi_{r\phi}$,
$$\nabla_\phi\partial_r = \Gamma^\phi_{r\phi}\,\partial_\phi = \frac{1}{r}\,\partial_\phi \tag{4.13}$$
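The Christoffel symbols quoted in (4.12) and (4.13) can be recovered directly from the metric $ds^2 = dr^2 + r^2 d\phi^2$. The Python sketch below uses the standard expression $\Gamma^\lambda_{\mu\nu} = \tfrac12 g^{\lambda\sigma}(\partial_\mu g_{\sigma\nu} + \partial_\nu g_{\sigma\mu} - \partial_\sigma g_{\mu\nu})$, approximates the metric derivatives by central differences, and evaluates everything at an arbitrary sample point.

```python
def metric(x):
    """Flat plane in polar coordinates x = (r, phi): ds^2 = dr^2 + r^2 dphi^2."""
    r = x[0]
    return [[1.0, 0.0], [0.0, r * r]]

def inverse2(g):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = g
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def dmetric(x, k, h=1e-6):
    """d_k g_{mu nu} by central differences."""
    xp, xm = list(x), list(x)
    xp[k] += h
    xm[k] -= h
    gp, gm = metric(xp), metric(xm)
    return [[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

def christoffel(x):
    """G[l][m][n] = Gamma^l_{mn} = 1/2 g^{ls} (d_m g_{sn} + d_n g_{sm} - d_s g_{mn})."""
    ginv = inverse2(metric(x))
    dg = [dmetric(x, k) for k in range(2)]          # dg[k][i][j] = d_k g_{ij}
    return [[[0.5 * sum(ginv[l][s] * (dg[m][s][n] + dg[n][s][m] - dg[s][m][n]) for s in range(2))
              for n in range(2)] for m in range(2)] for l in range(2)]

IR, IPHI = 0, 1          # index labels for r and phi
x = (1.5, 0.4)
G = christoffel(x)
print(G[IPHI][IR][IPHI], 1 / x[0])                  # Gamma^phi_{r phi} = 1/r, cf. (4.13)
print(G[IR][IPHI][IPHI], -x[0])                     # Gamma^r_{phi phi} = -r
print([G[l][IR][IR] for l in range(2)])             # Gamma^mu_{rr} = 0, cf. (4.12)
print([G[mu][IPHI][IR] for mu in range(2)])         # components of nabla_phi d_r, via (4.10)
```

The last line lists the components of $\nabla_\phi\partial_r$ in the basis $(\partial_r, \partial_\phi)$ and should be close to $(0, 1/r)$.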
Our basic postulates for the covariant derivative are the following:
$$\nabla_\mu\phi = \partial_\mu\phi \tag{4.14}$$
3. Leibniz Rule
Acting on the direct product of tensors, ∇µ satisfies a generalised Leibniz rule,
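$$\nabla_\mu\big(A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}\,B^{\rho_1\ldots\rho_r}{}_{\lambda_1\ldots\lambda_s}\big) = \big(\nabla_\mu A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}\big)\,B^{\rho_1\ldots\rho_r}{}_{\lambda_1\ldots\lambda_s} + A^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}\,\nabla_\mu B^{\rho_1\ldots\rho_r}{}_{\lambda_1\ldots\lambda_s} \tag{4.15}$$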
We will now see that, demanding the above properties, in particular the Leibniz rule,
there is a unique extension of the covariant derivative on vector fields to a differential
operator on general tensor fields, mapping (p, q)- to (p, q + 1)-tensors.
To define the covariant derivative for covectors Uµ , we note that Uµ V µ is a scalar for
any vector V µ so that
(since the partial derivative satisfies the Leibniz rule), and we demand
$$\nabla_\mu U_\nu = \partial_\mu U_\nu - \Gamma^\lambda_{\mu\nu}U_\lambda \tag{4.18}$$
That this is indeed a (0, 2)-tensor can either be checked directly or, alternatively, is a
consequence of the quotient theorem.
The extension to other (p, q)-tensors is now immediate. If the (p, q)-tensor is the direct
product of p vectors and q covectors, then we already know its covariant derivative (using
the Leibniz rule again). We simply adopt the same resulting formula for an arbitrary
(p, q)-tensor. The result is that the covariant derivative of a general (p, q)-tensor is the
sum of the partial derivative, a Christoffel symbol with a positive sign for each of the p
upper indices, and a Christoffel with a negative sign for each of the q lower indices. In
equations
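$$\begin{aligned}
\nabla_\mu T^{\nu_1\cdots\nu_p}{}_{\rho_1\cdots\rho_q} ={}& \partial_\mu T^{\nu_1\cdots\nu_p}{}_{\rho_1\cdots\rho_q} \\
&+ \underbrace{\Gamma^{\nu_1}_{\mu\lambda}T^{\lambda\nu_2\cdots\nu_p}{}_{\rho_1\cdots\rho_q} + \ldots + \Gamma^{\nu_p}_{\mu\lambda}T^{\nu_1\cdots\nu_{p-1}\lambda}{}_{\rho_1\cdots\rho_q}}_{p\ \text{terms}} \\
&\underbrace{- \Gamma^\lambda_{\mu\rho_1}T^{\nu_1\cdots\nu_p}{}_{\lambda\rho_2\cdots\rho_q} - \ldots - \Gamma^\lambda_{\mu\rho_q}T^{\nu_1\cdots\nu_p}{}_{\rho_1\cdots\rho_{q-1}\lambda}}_{q\ \text{terms}}
\end{aligned} \tag{4.19}$$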
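Formula (4.19) translates almost line by line into code. The Python sketch below defines a hypothetical helper `nabla` (not part of any library) on the flat plane in polar coordinates. The Christoffel symbols $\Gamma^r_{\phi\phi} = -r$ and $\Gamma^\phi_{r\phi} = \Gamma^\phi_{\phi r} = 1/r$ are entered by hand, and partial derivatives are taken by central differences. As checks, it verifies that the metric has vanishing covariant derivative (a property discussed further below). It also verifies that, for hypothetical test fields U and V, the Leibniz rule together with (4.14) holds for the scalar $U_\nu V^\nu$.

```python
from itertools import product
import math

D = 2   # flat plane, coordinates x = (r, phi)

def christoffel(x):
    """G[l][m][n] = Gamma^l_{mn} for ds^2 = dr^2 + r^2 dphi^2:
    Gamma^r_{phi phi} = -r, Gamma^phi_{r phi} = Gamma^phi_{phi r} = 1/r, all others zero."""
    r = x[0]
    G = [[[0.0] * D for _ in range(D)] for _ in range(D)]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G

def partial(T, k, x, idx, h=1e-6):
    """d_k of the component T(x)[idx] by central differences."""
    xp, xm = list(x), list(x)
    xp[k] += h
    xm[k] -= h
    return (T(xp)[idx] - T(xm)[idx]) / (2 * h)

def nabla(T, p, q, x):
    """Covariant derivative (4.19) of a (p,q)-tensor field T, given as a function
    x -> {(upper indices..., lower indices...): component}; returns {(mu, indices...): value}."""
    G, Tx, out = christoffel(x), T(x), {}
    for mu in range(D):
        for idx in product(range(D), repeat=p + q):
            val = partial(T, mu, x, idx)
            for i in range(p):              # +Gamma for each of the p upper indices
                val += sum(G[idx[i]][mu][lam] * Tx[idx[:i] + (lam,) + idx[i + 1:]] for lam in range(D))
            for j in range(p, p + q):       # -Gamma for each of the q lower indices
                val -= sum(G[lam][mu][idx[j]] * Tx[idx[:j] + (lam,) + idx[j + 1:]] for lam in range(D))
            out[(mu,) + idx] = val
    return out

def metric(x):
    r = x[0]
    return {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): r * r}

def U(x):   # hypothetical covector field U_nu
    r, ph = x
    return {(0,): r * math.cos(ph), (1,): r * r}

def V(x):   # hypothetical vector field V^nu
    r, ph = x
    return {(0,): math.sin(ph), (1,): 1.0 / r}

x = (1.5, 0.4)
metric_err = max(abs(v) for v in nabla(metric, 0, 2, x).values())   # should vanish

dU, dV, Ux, Vx = nabla(U, 0, 1, x), nabla(V, 1, 0, x), U(x), V(x)
leibniz_err = []
for mu in range(D):
    # d_mu (U_nu V^nu) versus (nabla_mu U_nu) V^nu + U_nu (nabla_mu V^nu)
    lhs = partial(lambda y: {(): sum(U(y)[(n,)] * V(y)[(n,)] for n in range(D))}, mu, x, ())
    rhs = sum(dU[(mu, n)] * Vx[(n,)] + Ux[(n,)] * dV[(mu, n)] for n in range(D))
    leibniz_err.append(abs(lhs - rhs))
print(metric_err, leibniz_err)
```

In the Leibniz check the Christoffel terms from (4.2) and (4.18) cancel against each other, so both error measures should be at the level of the finite-difference error.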
Having defined the covariant derivative for arbitrary tensors, we are also ready to define it for tensor densities. For this we recall that if T is a (p, q; w) tensor density, then $g^{-w/2}T$ is a (p, q)-tensor. Thus $\nabla_\mu(g^{-w/2}T)$ is a (p, q + 1)-tensor. To map this back to a tensor density of weight w, we multiply this by $g^{w/2}$, arriving at the definition
$$\nabla_\mu T := g^{w/2}\,\nabla^{\text{tensor}}_\mu\big(g^{-w/2}\,T\big)$$
where $\nabla^{\text{tensor}}_\mu$ just means the usual covariant derivative for (p, q)-tensors defined above.
For example, for a scalar density φ one has
$$\nabla_\mu\phi = \partial_\mu\phi - \frac{w}{2g}(\partial_\mu g)\,\phi \tag{4.22}$$
In particular, since the determinant g is a scalar density of weight +2, it follows that
$$\nabla_\mu g = 0 \tag{4.23}$$
which obviously simplifies integrations by parts in integrals defined with the measure $\sqrt{g}\,d^4x$.
The main properties of the covariant derivative, in addition to those that were part of
our postulates (like linearity and the Leibniz rule) are the following:
The most transparent way of stating this property is that the Kronecker delta is covariantly constant, i.e. that
$$\nabla_\mu\delta^\nu{}_\lambda = 0 \tag{4.26}$$
Indeed, by the Leibniz rule this implies
$$\nabla_\mu A^{\nu\ldots}{}_{\nu\ldots} = \nabla_\mu\big(A^{\nu\ldots}{}_{\rho\ldots}\,\delta^\rho{}_\nu\big) = \big(\nabla_\mu A^{\nu\ldots}{}_{\rho\ldots}\big)\,\delta^\rho{}_\nu \tag{4.27}$$
which is precisely the statement that covariant differentiation and contraction commute. To establish that the Kronecker delta is covariantly constant, we follow the rules to find
(a) Since ∇µ gνλ is a tensor, we can choose any coordinate system we like to
establish if this tensor is zero or not at a given point x. Choose an inertial
coordinate system at x. Then the partial derivatives of the metric and the
Christoffel symbols are zero there. Therefore the covariant derivative of the
metric is zero. Since ∇µ gνλ is a tensor, this is then true in every coordinate
system.
(b) The other argument is by direct calculation. Recalling the identity
we calculate
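$$\begin{aligned}
\nabla_\mu\nabla_\nu\phi - \nabla_\nu\nabla_\mu\phi &= \nabla_\mu\partial_\nu\phi - \nabla_\nu\partial_\mu\phi \\
&= \partial_\mu\partial_\nu\phi - \Gamma^\lambda_{\mu\nu}\partial_\lambda\phi - \partial_\nu\partial_\mu\phi + \Gamma^\lambda_{\nu\mu}\partial_\lambda\phi = 0
\end{aligned} \tag{4.32}$$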
Note that the second covariant derivatives on higher rank tensors do not commute
- we will come back to this in our discussion of the curvature tensor later on.
4.5 Tensor Analysis: Some Special Cases
In this section I will, without proof, give some useful special cases of covariant derivatives
- covariant curl and divergence etc. - you should make sure that you can derive all of
these yourself without any problems.
because the symmetric Christoffel symbols drop out in this antisymmetric linear combination. Thus the Maxwell field strength $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ is a tensor under general coordinate transformations; no metric or covariant derivative is needed to make it a tensor in a general space-time.
$$\nabla_\mu V^\mu = \partial_\mu V^\mu + \Gamma^\mu_{\mu\lambda}V^\lambda \tag{4.35}$$
and one only needs to calculate g and its derivative, not the Christoffel symbols themselves, to calculate the covariant divergence of a vector field.
96
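The identity $\Gamma^\mu_{\ \mu\lambda} = g^{-1/2}\partial_\lambda g^{1/2}$ and the resulting divergence formula can be spot-checked numerically. The sketch below (an illustration, not from the notes) uses a hypothetical positive-definite 2d metric with components $g_{xx} = 1 + x^2$, $g_{xy} = xy$, $g_{yy} = 1 + y^2$ and an arbitrary vector field `V`, and compares $\partial_\mu V^\mu + \Gamma^\mu_{\ \mu\lambda}V^\lambda$ with $g^{-1/2}\partial_\mu(g^{1/2}V^\mu)$, all derivatives being central finite differences.

```python
import math

# Sketch: compare the two forms of the covariant divergence,
#   d_mu V^mu + Gamma^mu_{mu lam} V^lam   and   g^{-1/2} d_mu (g^{1/2} V^mu).
# Metric and vector field are hypothetical choices for illustration.
H = 1e-4


def metric(p):
    x, y = p
    return [[1 + x * x, x * y], [x * y, 1 + y * y]]  # det = 1 + x^2 + y^2


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def partial(f, p, k):
    """Central difference of the scalar function f along coordinate k."""
    pp, pm = list(p), list(p)
    pp[k] += H
    pm[k] -= H
    return (f(pp) - f(pm)) / (2 * H)


def christoffel(p):
    """gam[l][m][n] = Gamma^l_{mn} from the Levi-Civita formula."""
    ginv = inv2(metric(p))
    dg = [[[partial(lambda q: metric(q)[a][b], p, k) for b in range(2)]
           for a in range(2)] for k in range(2)]  # dg[k][a][b] = d_k g_ab
    return [[[0.5 * sum(ginv[l][r] * (dg[m][r][n] + dg[n][r][m] - dg[r][m][n])
                        for r in range(2))
              for n in range(2)] for m in range(2)] for l in range(2)]


def V(p):
    """Arbitrary smooth vector field V^mu (hypothetical)."""
    x, y = p
    return [x * y * y, math.sin(x) + y]


def div_with_christoffel(p):
    gam, v = christoffel(p), V(p)
    return (sum(partial(lambda q: V(q)[m], p, m) for m in range(2))
            + sum(gam[m][m][l] * v[l] for m in range(2) for l in range(2)))


def div_with_density(p):
    def sqrt_g(q):
        return math.sqrt(det2(metric(q)))
    return sum(partial(lambda q: sqrt_g(q) * V(q)[m], p, m)
               for m in range(2)) / sqrt_g(p)


p = (0.4, -0.9)
print(div_with_christoffel(p), div_with_density(p))  # should agree closely
```

The second form needs only the determinant, which is the practical point made in the text.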
How should the Laplacian be defined? Well, the obvious guess (something that is covariant and reduces to the ordinary Laplacian for the Minkowski metric) is $\Box = g^{\mu\nu}\nabla_\mu\nabla_\nu$, which, since the metric is covariantly constant, can alternatively be written as $\Box = \nabla_\mu g^{\mu\nu}\nabla_\nu = \nabla^\mu\nabla_\mu = \nabla_\mu\nabla^\mu$, etc. Note that, even though the covariant derivative on scalars reduces to the ordinary partial derivative, so that one can write
$\Box\phi = \nabla_\mu\big(g^{\mu\nu}\partial_\nu\phi\big)$ , (4.39)
it makes no sense to write this as $\nabla_\mu\partial^\mu\phi$: since $\partial_\mu$ does not commute with the metric in general, the notation $\partial^\mu$ is at best ambiguous, as it is not clear whether this should represent $g^{\mu\nu}\partial_\nu$ or $\partial_\nu g^{\mu\nu}$ or something altogether different. This ambiguity does not arise for the Minkowski metric, but of course it is present in general.
A compact yet explicit expression for the Laplacian follows from the expression
for the covariant divergence of a vector:
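$\Box\phi := g^{\mu\nu}\nabla_\mu\nabla_\nu\phi = \nabla_\mu\big(g^{\mu\nu}\partial_\nu\phi\big) = g^{-1/2}\partial_\mu\big(g^{1/2}g^{\mu\nu}\partial_\nu\phi\big)$ . (4.40)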
This formula is also useful (and provides the quickest way of arriving at the result)
if one just wants to write the ordinary flat space Laplacian on R3 in, say, polar or
cylindrical coordinates.
To illustrate this, let us calculate the Laplacian for the standard metric on $\mathbb{R}^{n+1}$ in polar coordinates. The standard procedure would be to first determine the coordinate transformation $x^i = x^i(r, \text{angles})$, then calculate $\partial/\partial x^i$, and finally assemble all the bits and pieces to calculate $\Delta = \sum_i (\partial/\partial x^i)^2$. This is a pain.
To calculate the Laplacian, we do not need to know the coordinate transformation; all we need is the metric. In polar coordinates, this metric takes the form
$ds^2 = dr^2 + r^2 d\Omega_n^2$ ,
where $d\Omega_n^2$ is the standard line element on the unit n-sphere $S^n$. The determinant of this metric is $g \sim r^{2n}$ (times a function of the coordinates (angles) on the sphere). Thus, for n = 1 one has $ds^2 = dr^2 + r^2 d\phi^2$ and therefore
$\Delta = r^{-1}\partial_r\big(r\,\partial_r\big) + r^{-2}\partial_\phi^2 = \partial_r^2 + r^{-1}\partial_r + r^{-2}\partial_\phi^2$ .
In general, denoting the angular part of the Laplacian, i.e. the Laplacian of $S^n$, by $\Delta_{S^n}$, one finds analogously
$\Delta = \partial_r^2 + n r^{-1}\partial_r + r^{-2}\Delta_{S^n}$ . (4.43)
I hope you agree that this method is superior to the standard procedure.
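The radial part of (4.43) can be sanity-checked against a brute-force Cartesian Laplacian. The sketch uses the radial test function $f = e^{-r^2}$ (our choice), for which $\Delta_{S^n}f = 0$, so that (4.43) predicts $\Delta f = f'' + n r^{-1}f'$; both sides are evaluated at a sample point of $\mathbb{R}^{n+1}$ for n = 1, 2, 3, the Cartesian side by second-order central differences.

```python
import math

# Sketch: compare a brute-force Cartesian Laplacian on R^{n+1} with the
# radial part of (4.43), Delta f = f'' + (n / r) f', for the radial test
# function f = exp(-r^2) (chosen for the example; Delta_{S^n} f = 0 for it).
H = 1e-3  # finite-difference step


def f_cartesian(x):
    return math.exp(-sum(xi * xi for xi in x))


def laplacian_cartesian(f, x):
    """Sum of second-order central differences along each Cartesian axis."""
    total = 0.0
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += H
        xm[i] -= H
        total += (f(xp) - 2 * f(x) + f(xm)) / H ** 2
    return total


def laplacian_radial(r, n):
    """f'' + (n / r) f' for f(r) = exp(-r^2), with exact derivatives."""
    fp = -2 * r * math.exp(-r * r)
    fpp = (4 * r * r - 2) * math.exp(-r * r)
    return fpp + n * fp / r


for n in (1, 2, 3):
    x = [0.3 + 0.1 * k for k in range(n + 1)]  # a sample point in R^{n+1}
    r = math.sqrt(sum(xi * xi for xi in x))
    print(n, laplacian_cartesian(f_cartesian, x), laplacian_radial(r, n))
```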
5. The Covariant Form of Gauss’ Theorem
Let $V^\mu$ be a vector field, $\nabla_\mu V^\mu$ its divergence, and recall that integrals in curved space are defined with respect to the integration measure $g^{1/2}d^4x$. Thus one has
$\int g^{1/2}d^4x\,\nabla_\mu V^\mu = \int d^4x\,\partial_\mu\big(g^{1/2}V^\mu\big)$ . (4.44)
Now the integrand on the right-hand side is an ordinary total derivative and thus, if $V^\mu$ vanishes sufficiently rapidly at infinity, one has
$\int g^{1/2}d^4x\,\nabla_\mu V^\mu = 0$ . (4.45)
While we saw that this expression could be understood and deduced from the requirement that the variation of the metric is itself a tensorial object that transforms like the metric, the tensorial nature of the above expression is far from manifest. However, it has a very nice and simple expression in terms of covariant derivatives of V, namely
$\delta_V g_{\alpha\beta} = \nabla_\alpha V_\beta + \nabla_\beta V_\alpha$ (4.49)
(as is easily verified). This expression will be rederived (and placed into the general context of Lie derivatives and Killing vectors) in section 6.
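The claim that (4.49) is easily verified can be backed up numerically. The sketch assumes the standard coordinate form of the variation (Lie derivative) of the metric, $\delta_V g_{\alpha\beta} = V^\gamma\partial_\gamma g_{\alpha\beta} + g_{\gamma\beta}\partial_\alpha V^\gamma + g_{\alpha\gamma}\partial_\beta V^\gamma$, and compares it with $\nabla_\alpha V_\beta + \nabla_\beta V_\alpha$ on the unit 2-sphere for a hypothetical vector field `V_generic`; for the rotation $V = \partial_\phi$ both should vanish.

```python
import math

# Sketch on the unit 2-sphere, x = (theta, phi), with central differences.
H = 1e-5


def metric(x):
    return [[1.0, 0.0], [0.0, math.sin(x[0]) ** 2]]


def partial(f, x, k):
    xp, xm = list(x), list(x)
    xp[k] += H
    xm[k] -= H
    return (f(xp) - f(xm)) / (2 * H)


def metric_derivatives(x):
    """dg[k][a][b] = d_k g_ab."""
    return [[[partial(lambda q: metric(q)[a][b], x, k) for b in range(2)]
             for a in range(2)] for k in range(2)]


def christoffel(x):
    g, dg = metric(x), metric_derivatives(x)
    ginv = [[1.0 / g[0][0], 0.0], [0.0, 1.0 / g[1][1]]]  # this metric is diagonal
    return [[[0.5 * sum(ginv[l][r] * (dg[m][r][n] + dg[n][r][m] - dg[r][m][n])
                        for r in range(2))
              for n in range(2)] for m in range(2)] for l in range(2)]


def lie_variation(V, x):
    """V^c d_c g_ab + g_cb d_a V^c + g_ac d_b V^c."""
    g, dg, v = metric(x), metric_derivatives(x), V(x)
    dV = [[partial(lambda q: V(q)[c], x, a) for c in range(2)]
          for a in range(2)]  # dV[a][c] = d_a V^c
    return [[sum(v[c] * dg[c][a][b] + g[c][b] * dV[a][c] + g[a][c] * dV[b][c]
                 for c in range(2))
             for b in range(2)] for a in range(2)]


def covariant_variation(V, x):
    """nabla_a V_b + nabla_b V_a, with V_b = g_bc V^c and
    nabla_a V_b = d_a V_b - Gamma^r_{ab} V_r."""
    gam = christoffel(x)

    def lower(q):
        g, v = metric(q), V(q)
        return [sum(g[b][c] * v[c] for c in range(2)) for b in range(2)]

    v_low = lower(x)
    d_low = [[partial(lambda q: lower(q)[b], x, a) for b in range(2)]
             for a in range(2)]  # d_low[a][b] = d_a V_b
    nabla = [[d_low[a][b] - sum(gam[r][a][b] * v_low[r] for r in range(2))
              for b in range(2)] for a in range(2)]
    return [[nabla[a][b] + nabla[b][a] for b in range(2)] for a in range(2)]


def V_generic(x):
    """Hypothetical vector field V^mu, chosen only for illustration."""
    theta, phi = x
    return [theta * math.sin(phi), math.cos(theta) + 0.3 * phi]


def V_rotation(x):
    """V = d/dphi, the generator of rotations about the polar axis."""
    return [0.0, 1.0]


x0 = (0.8, 0.4)
print(lie_variation(V_generic, x0))
print(covariant_variation(V_generic, x0))   # should agree with the line above
print(covariant_variation(V_rotation, x0))  # should vanish up to rounding
```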
You will have noticed that many equations simplify considerably for completely antisymmetric tensors. In particular, their curl can be defined in a tensorial way without reference to any metric. This observation is at the heart of the coordinate-independent calculus of differential forms. In this context, the curl is known as the exterior derivative.
Likewise, Lie derivatives of tensors (section 6 and item 7 above) are automatically
tensorial objects (and one can, but need not, make their tensorial nature manifest by
writing these derivatives in terms of covariant derivatives).
————————————————–
Here is an elementary proof of the identity (4.36), and a useful more general formula for the variation of the determinant of the metric, namely
$\delta g = g\,g^{\mu\nu}\,\delta g_{\mu\nu}$ . (4.50)
This proof is based on the standard cofactor or minor expansion of the determinant of
a matrix (an alternative standard proof can, as also outlined below, be based on the
identity det G = exp tr log G and its derivative or variation). The cofactor expansion
formula for the determinant is
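$g = \sum_\nu (-1)^{\mu+\nu}\,g_{\mu\nu}\,|m_{\mu\nu}|$ (for any fixed µ) , (4.51)
where $|m_{\mu\nu}|$ is the determinant of the minor of $g_{\mu\nu}$, i.e. of the matrix one obtains by removing the µ'th row and ν'th column from $g_{\mu\nu}$. It follows that
$\frac{\partial g}{\partial g_{\mu\nu}} = (-1)^{\mu+\nu}\,|m_{\mu\nu}|$ . (4.52)
Moreover, $\sum_\nu g_{\lambda\nu}(-1)^{\mu+\nu}|m_{\mu\nu}| = \delta^\mu_{\ \lambda}\,g$: for λ = µ this is just (4.51), while for λ ≠ µ the sum vanishes, since it is, in particular, the determinant of a matrix with $g_{\mu\nu} = g_{\lambda\nu}$, i.e. of a matrix with two equal rows. Hence $(-1)^{\mu+\nu}|m_{\mu\nu}| = g\,g^{\nu\mu}$, and (4.52) becomes $\partial g/\partial g_{\mu\nu} = g\,g^{\mu\nu}$, which is (4.50).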
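The cofactor argument is easy to test numerically. Because the determinant is linear in each entry, a central difference with respect to the single entry $g_{\mu\nu}$ (treated as independent of $g_{\nu\mu}$, as in the cofactor expansion) gives $(-1)^{\mu+\nu}|m_{\mu\nu}|$ up to rounding, and this should equal $g\,g^{\nu\mu}$. The Python sketch below does this for a hypothetical symmetric 3×3 matrix with one negative diagonal entry, computing determinants and the inverse purely from cofactors.

```python
# Sketch: check d(det g)/d g_{mu nu} = (-1)^{mu+nu} |m_{mu nu}| = g g^{nu mu}
# for a hypothetical symmetric 3x3 matrix, using only the cofactor expansion.


def minor(m, i, j):
    """Matrix obtained by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Cofactor expansion along the first row, cf. (4.51) with mu = 0."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def inverse(m):
    """Inverse via cofactors: (m^{-1})_{ji} = (-1)^{i+j} |m_{ij}| / det m."""
    d, n = det(m), len(m)
    return [[(-1) ** (i + j) * det(minor(m, i, j)) / d for i in range(n)]
            for j in range(n)]


def ddet_dentry(m, i, j, h=1e-6):
    """Central difference of det m with respect to the single entry m[i][j]."""
    mp = [row[:] for row in m]
    mm = [row[:] for row in m]
    mp[i][j] += h
    mm[i][j] -= h
    return (det(mp) - det(mm)) / (2 * h)


g = [[-1.0, 0.2, 0.1],
     [0.2, 2.0, 0.3],
     [0.1, 0.3, 1.5]]
g_inv, g_det = inverse(g), det(g)
for mu in range(3):
    for nu in range(3):
        print(mu, nu, ddet_dentry(g, mu, nu), g_det * g_inv[nu][mu])
```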
This can also be written in matrix form, with G denoting the matrix with components $(G)_{\mu\nu} = g_{\mu\nu}$, as
$\delta\log\det G = \operatorname{tr}\big(G^{-1}\delta G\big)$ . (4.58)
In this form, the result can also be derived from variation of the remarkably useful
identity
$\det G = e^{\operatorname{tr}\log G}$ . (4.59)
This identity, in turn, can be derived in an elementary way for diagonalisable G by
noting that it holds trivially for diagonal matrices, and therefore, by the conjugation
invariance of det and tr, also for diagonalisable matrices (like the metric).
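A small numerical sketch of (4.58) and (4.59) for a real symmetric 2×2 matrix G (values chosen for illustration): $\operatorname{tr}\log G$ is computed as the sum of the logarithms of the eigenvalues, using `cmath.log` so that a negative eigenvalue (Lorentzian signature) is also allowed, and the derivative of $\log|\det G|$ along a variation δG is compared with $\operatorname{tr}(G^{-1}\delta G)$ by a central difference. Using $\log|\det G|$ rather than $\log\det G$ is our choice, to cover $\det G < 0$.

```python
import cmath
import math

# Sketch for a real symmetric 2x2 matrix G = [[a, b], [b, c]] (illustrative values).


def eigenvalues_sym2(a, b, c):
    """Eigenvalues of [[a, b], [b, c]] in closed form."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    return mean + radius, mean - radius


def det_via_trace_log(a, b, c):
    """exp(tr log G), with tr log G = sum of the logs of the eigenvalues.

    cmath.log also accepts a negative eigenvalue (Lorentzian signature),
    returning log|lambda| + i*pi, so the result may carry a tiny imaginary
    part from rounding."""
    return cmath.exp(sum(cmath.log(lam) for lam in eigenvalues_sym2(a, b, c)))


def dlogdet_numeric(a, b, c, da, db, dc, eps=1e-6):
    """d/dt log|det(G + t dG)| at t = 0, by central differences."""
    def logdet(t):
        return math.log(abs((a + t * da) * (c + t * dc) - (b + t * db) ** 2))
    return (logdet(eps) - logdet(-eps)) / (2 * eps)


def trace_inv_times_variation(a, b, c, da, db, dc):
    """tr(G^{-1} dG) with G^{-1} = [[c, -b], [-b, a]] / det G."""
    return (c * da - 2 * b * db + a * dc) / (a * c - b * b)


for a, b, c in [(2.0, 0.5, 1.0), (-1.0, 0.3, 2.0)]:
    print(a * c - b * b, det_via_trace_log(a, b, c))
print(dlogdet_numeric(2.0, 0.5, 1.0, 0.1, -0.2, 0.3),
      trace_inv_times_variation(2.0, 0.5, 1.0, 0.1, -0.2, 0.3))
```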
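In particular, it follows from (4.50) that if the variation is the partial derivative one has
$\partial_\lambda g = \frac{\partial g}{\partial g_{\mu\nu}}\,\partial_\lambda g_{\mu\nu} = g\,g^{\mu\nu}\,\partial_\lambda g_{\mu\nu}$ (4.60)
or
$g^{-1}\partial_\lambda g = g^{\mu\nu}\,\partial_\lambda g_{\mu\nu}$ . (4.61)
On the other hand, the contracted Christoffel symbol is
$\Gamma^\mu_{\ \mu\lambda} = \tfrac12 g^{\mu\nu}\,\partial_\lambda g_{\mu\nu}$ , (4.62)
so that $\Gamma^\mu_{\ \mu\lambda} = \tfrac12 g^{-1}\partial_\lambda g = g^{-1/2}\partial_\lambda g^{1/2}$, which establishes the identity (4.36).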
So far, we have defined covariant differentiation for tensors defined everywhere in space-time. Frequently, however, one encounters tensors that are only defined on curves, like the momentum of a particle, which is only defined along its world line. In this section we will see how to define covariant differentiation along a curve. Thus consider a curve $x^\mu(\tau)$ (where τ could be, but need not be, proper time) and the tangent vector field $X^\mu(x(\tau)) = \dot x^\mu(\tau)$. Now define the covariant derivative $\nabla_\tau$ along the curve, covariantising d/dτ, by
$\frac{d}{d\tau} = \dot x^\mu\partial_\mu \;\to\; \nabla_\tau = X^\mu\nabla_\mu = \dot x^\mu\nabla_\mu$ . (4.63)
Frequently one also uses the (suggestive, but ugly) notation
This notion of covariant derivative along a curve permits us, in particular, to define the (covariant) acceleration $a^\mu$ of a curve $x^\mu(\tau)$ as the covariant derivative of the velocity $u^\mu = \dot x^\mu$,
$a^\mu = \nabla_\tau\dot x^\mu = \ddot x^\mu + \Gamma^\mu_{\ \nu\lambda}\dot x^\nu\dot x^\lambda = u^\nu\nabla_\nu u^\mu$ . (4.66)
Thus we can characterise (affinely parametrised) geodesics as those curves whose covariant acceleration is zero,
Geodesics: $a^\mu = u^\nu\nabla_\nu u^\mu = 0$ , (4.67)
a reasonable and natural statement regarding the movement of freely falling particles. If they are not affinely parametrised, as in (2.25), then instead of $u^\nu\nabla_\nu u^\mu = 0$ one has
$u^\nu\nabla_\nu u^\mu = \kappa u^\mu$ . (4.68)
We now come to the important notion of parallel transport of a tensor along a curve.
Note that, in a general (curved) metric space time, it does not make sense to ask if two
vectors defined at points x and y are parallel to each other or not. However, given a
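metric and a curve connecting these two points, one can compare the two by dragging one along the curve to the other using the covariant derivative. Accordingly, a tensor $T^{\cdots}{}_{\cdots}$ is said to be parallel transported along the curve if
$\nabla_\tau T^{\cdots}{}_{\cdots} = 0$ . (4.69)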
1. In a locally inertial coordinate system along the curve, this condition reduces to
dT /dτ = 0, i.e. to the statement that the tensor does not change along the curve.
Thus the above is indeed an appropriate tensorial generalisation of the intuitive
notion of parallel transport to a general metric space-time.
2. The parallel transport condition is a first-order differential equation along the curve and thus defines $T^{\cdots}{}_{\cdots}(\tau)$ given an initial value $T^{\cdots}{}_{\cdots}(\tau_0)$.
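3. Taking T to be the tangent vector $X^\mu = \dot x^\mu$ to the curve itself, the condition for parallel transport becomes
$\nabla_\tau\dot x^\mu = \ddot x^\mu + \Gamma^\mu_{\ \nu\lambda}\dot x^\nu\dot x^\lambda = 0$ , (4.70)
i.e. precisely the geodesic equation. We have already seen that geodesics are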
precisely the curves with zero acceleration. We can now equivalently characterise
them by the property that their tangent vectors are parallel transported (do not
change) along the curve. For this reason geodesics are also known as autoparallels.
4. Since the metric is covariantly constant, it is in particular parallel along any curve. Thus, if $V^\mu$ is parallel transported, its length also remains constant along the curve,
$\nabla_\tau V^\mu = 0 \;\Rightarrow\; \nabla_\tau\big(g_{\mu\nu}V^\mu V^\nu\big) = 0$ . (4.71)
In particular, we rediscover the fact claimed in (2.17) that the quantity $g_{\mu\nu}\dot x^\mu\dot x^\nu = g_{\mu\nu}X^\mu X^\nu$ is constant along a geodesic.
5. Now let $x^\mu(\tau)$ be a geodesic and $V^\mu$ parallel along this geodesic. Then, as one might intuitively expect, the angle between $V^\mu$ and the tangent vector $X^\mu$ to the curve also remains constant. This is a consequence of the fact that both the norm of V and the norm of X are constant along the curve and that
$\frac{d}{d\tau}\big(g_{\mu\nu}X^\mu V^\nu\big) = \nabla_\tau\big(g_{\mu\nu}X^\mu V^\nu\big) = g_{\mu\nu}\big(\nabla_\tau X^\mu\big)V^\nu + g_{\mu\nu}X^\mu\,\nabla_\tau V^\nu = 0$ . (4.72)
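Items 4 and 5 can be illustrated on the unit 2-sphere, where $\Gamma^\theta_{\ \phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\ \theta\phi} = \Gamma^\phi_{\ \phi\theta} = \cot\theta$. The Python sketch below (our illustration, with hypothetical initial vectors) integrates $\nabla_\tau V^\mu = 0$ with a classical Runge-Kutta scheme along the latitude circle θ = 1, φ = τ, which is not a geodesic, and along the equator, which is. Along both curves $g_{\mu\nu}V^\mu V^\nu$ stays constant, as item 4 asserts for any curve, whereas the overlap $g_{\mu\nu}X^\mu V^\nu$ with the tangent $X = \partial_\phi$ is constant only on the equator.

```python
import math

# Sketch: parallel transport on the unit 2-sphere along the latitude circle
# theta = theta0, phi = tau (tangent X = (0, 1)), integrated with classical RK4.


def transport_rhs(theta0, v):
    """dV^mu/dtau = -Gamma^mu_{nu lam} X^nu V^lam for V = (V^theta, V^phi)."""
    vt, vp = v
    return (math.sin(theta0) * math.cos(theta0) * vp,
            -math.cos(theta0) / math.sin(theta0) * vt)


def transport(theta0, v0, tau_end=2 * math.pi, steps=2000):
    """Return the list of V(tau) at the RK4 grid points."""
    h = tau_end / steps
    v = v0
    path = [v]
    for _ in range(steps):
        k1 = transport_rhs(theta0, v)
        k2 = transport_rhs(theta0, (v[0] + 0.5 * h * k1[0], v[1] + 0.5 * h * k1[1]))
        k3 = transport_rhs(theta0, (v[0] + 0.5 * h * k2[0], v[1] + 0.5 * h * k2[1]))
        k4 = transport_rhs(theta0, (v[0] + h * k3[0], v[1] + h * k3[1]))
        v = (v[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             v[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
        path.append(v)
    return path


def norm_sq(theta0, v):
    """g_{mu nu} V^mu V^nu on the circle theta = theta0."""
    return v[0] ** 2 + math.sin(theta0) ** 2 * v[1] ** 2


def overlap_with_tangent(theta0, v):
    """g_{mu nu} X^mu V^nu with X = (0, 1)."""
    return math.sin(theta0) ** 2 * v[1]


lat = transport(1.0, (1.0, 0.0))               # latitude circle: not a geodesic
equator = transport(math.pi / 2, (0.6, 0.8))   # equator: a geodesic
print(max(abs(norm_sq(1.0, v) - 1.0) for v in lat))
print(min(overlap_with_tangent(1.0, v) for v in lat),
      max(overlap_with_tangent(1.0, v) for v in lat))
print(max(abs(overlap_with_tangent(math.pi / 2, v) - 0.8) for v in equator))
```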
Recall that the transformation behaviour of the Christoffel symbols, equation (1.115), was the key ingredient in the proof that the geodesic equation transforms like a vector under general coordinate transformations. Likewise, to show that the covariant derivative of a tensor is again a tensor, all one needs to know is that the Christoffel symbols transform in this way. Thus any other object $\tilde\Gamma^\mu_{\ \nu\lambda}$ could also be used to define a covariant derivative (generalising the partial derivative and mapping tensors to tensors) provided that it transforms in the same way as the Christoffel symbols, i.e. provided that one has
$\tilde\Gamma^{\mu'}_{\ \nu'\lambda'} = \tilde\Gamma^\mu_{\ \nu\lambda}\,\frac{\partial y^{\mu'}}{\partial x^\mu}\frac{\partial x^\nu}{\partial y^{\nu'}}\frac{\partial x^\lambda}{\partial y^{\lambda'}} + \frac{\partial y^{\mu'}}{\partial x^\mu}\frac{\partial^2 x^\mu}{\partial y^{\nu'}\partial y^{\lambda'}}$ . (4.73)
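But this implies that the difference
$\tilde\Gamma^\mu_{\ \nu\lambda} - \Gamma^\mu_{\ \nu\lambda} = C^\mu_{\ \nu\lambda}$ ,
where $C^\mu_{\ \nu\lambda}$ is a (1,2)-tensor, since the inhomogeneous terms cancel. Therefore the question arises whether the Christoffel symbols we have been using are somehow singled out or preferred.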
In some sense, the answer is an immediate yes because it is this particular connection
that enters in determining the paths of freely falling particles (the geodesics which
extremise proper time).
Moreover, the covariant derivative as we have defined it has two important properties, namely that
1. the metric is covariantly constant, $\nabla_\mu g_{\nu\lambda} = 0$ (metric compatibility);
2. the torsion is zero, i.e. second covariant derivatives of a scalar commute.
In fact, it turns out that these two conditions uniquely determine the Γ̃ to be the
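Christoffel symbols. The second condition implies that the $\tilde\Gamma^\mu_{\ \nu\lambda}$ are symmetric in the two lower indices,
$[\tilde\nabla_\mu, \tilde\nabla_\nu]\phi = 0 \;\Leftrightarrow\; \tilde\Gamma^\lambda_{\ \mu\nu} = \tilde\Gamma^\lambda_{\ \nu\mu}$ . (4.76)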
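The first condition now allows one to express the $\tilde\Gamma^\lambda_{\ \mu\nu}$ in terms of the derivatives of the metric, leading uniquely to the familiar expression for the Christoffel symbols $\Gamma^\mu_{\ \nu\lambda}$. First of all, by definition / construction one has (e.g. from demanding the Leibniz rule for $\tilde\nabla_\mu$)
$\tilde\nabla_\mu g_{\nu\lambda} = \partial_\mu g_{\nu\lambda} - \tilde\Gamma^\rho_{\ \nu\mu}\,g_{\rho\lambda} - \tilde\Gamma^\rho_{\ \lambda\mu}\,g_{\nu\rho} \equiv \partial_\mu g_{\nu\lambda} - \tilde\Gamma_{\lambda\nu\mu} - \tilde\Gamma_{\nu\lambda\mu}$ . (4.77)
Imposing metric compatibility, $\tilde\nabla_\mu g_{\nu\lambda} = 0$, and using the symmetry (4.76), one then finds
$0 = \tilde\nabla_\mu g_{\nu\lambda} + \tilde\nabla_\nu g_{\mu\lambda} - \tilde\nabla_\lambda g_{\mu\nu}$
$\quad = \partial_\mu g_{\nu\lambda} + \partial_\nu g_{\mu\lambda} - \partial_\lambda g_{\mu\nu} - \tilde\Gamma_{\lambda\nu\mu} - \tilde\Gamma_{\nu\lambda\mu} - \tilde\Gamma_{\lambda\mu\nu} - \tilde\Gamma_{\mu\lambda\nu} + \tilde\Gamma_{\nu\mu\lambda} + \tilde\Gamma_{\mu\nu\lambda}$
$\quad = 2\big(\Gamma_{\lambda\mu\nu} - \tilde\Gamma_{\lambda\mu\nu}\big)$ (4.78)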
(where the cancellations are entirely due to the assumed symmetry of the coefficients
in the last two indices). Thus Γ̃ = Γ. This unique metric-compatible and torsion-free
connection is also known as the Levi-Civita connection. It is the connection canonically
associated to a space-time (manifold) equipped with a metric tensor.
It is of course possible to relax either of the conditions (1) or (2), or both of them and,
in particular, connections with torsion (relaxation of condition 2) are popular in certain
circles and/or arise naturally in certain generalised (gauge) theories of gravity and in
string theory.
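We now write a general connection as
$\tilde\Gamma^\mu_{\ \nu\lambda} = \Gamma^\mu_{\ \nu\lambda} + C^\mu_{\ \nu\lambda}$ ,
with $\Gamma^\mu_{\ \nu\lambda}$ the canonical Levi-Civita connection, and $C^\mu_{\ \nu\lambda}$ a (1,2)-tensor. We will also use the corresponding (0,3)-tensor $C_{\mu\nu\lambda} = g_{\mu\rho}C^\rho_{\ \nu\lambda}$.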
Associated with $\tilde\Gamma^\mu_{\ \nu\lambda}$ we have the covariant derivative $\tilde\nabla_\mu$. Since $\tilde\Gamma^\mu_{\ \nu\lambda}$ will in general not be symmetric in its lower indices, in this section we need to be particularly careful with (and choose a convention for) the ordering of the lower indices in the covariant derivative. We will choose the convention that the last index always refers to the direction along which one is differentiating, i.e.
$\tilde\nabla_\mu V^\nu = \partial_\mu V^\nu + \tilde\Gamma^\nu_{\ \lambda\mu}V^\lambda$ (4.81)
etc. The reason for this choice is that one should think of the collection of objects $\Gamma^\nu_{\ \lambda\mu}$ (and $\tilde\Gamma^\nu_{\ \lambda\mu}$) as the coefficients of a matrix-valued 1-form (cf. section 3.5) $\Gamma^\nu_{\ \lambda} = \Gamma^\nu_{\ \lambda\mu}\,dx^\mu$, the matrices acting by rotation on vectors (and more general tensors), as in section 4.2.
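The torsion tensor T and the non-metricity tensor Q of $\tilde\nabla$ are defined by
$[\tilde\nabla_\mu, \tilde\nabla_\nu]\phi = T^\lambda_{\ \mu\nu}\,\partial_\lambda\phi$ , $T_{\lambda\mu\nu} = g_{\lambda\rho}T^\rho_{\ \mu\nu}$ , (4.82)
$\tilde\nabla_\mu g_{\nu\lambda} = -Q_{\nu\lambda\mu}$ . (4.83)
In terms of C one finds $T_{\lambda\mu\nu} = C_{\lambda\mu\nu} - C_{\lambda\nu\mu}$ and $Q_{\nu\lambda\mu} = C_{\nu\lambda\mu} + C_{\lambda\nu\mu}$.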
Thus the torsion is zero iff $C^\lambda_{\ \mu\nu}$ (and hence $\tilde\Gamma^\lambda_{\ \mu\nu}$) is symmetric in its lower indices, and the connection is compatible with the metric iff $C_{\nu\lambda\mu}$ is antisymmetric in its first two indices. In particular, if the torsion is zero and the connection is metric-compatible, one has
$C_{\lambda\mu\nu} = C_{\lambda\nu\mu}$ and $C_{\lambda\mu\nu} = -C_{\mu\lambda\nu} \;\Rightarrow\; C_{\lambda\mu\nu} = 0$ , (4.85)
Conversely, since the absence of torsion and non-metricity characterises the Levi-Civita connection, it should be possible to express the deviation $C_{\lambda\mu\nu}$ from the Levi-Civita connection entirely in terms of torsion and non-metricity. This is indeed the case. By repeating the calculation (4.78) in this more general context, one finds
$C_{\lambda\mu\nu} = \tilde T_{\lambda\mu\nu} + \tilde Q_{\lambda\mu\nu}$ ,
with
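$\tilde T_{\lambda\mu\nu} = \tfrac12\big(T_{\lambda\mu\nu} - T_{\mu\lambda\nu} - T_{\nu\lambda\mu}\big) = -\tilde T_{\mu\lambda\nu}$ (4.89)
and
$\tilde Q_{\lambda\mu\nu} = \tfrac12\big(Q_{\mu\lambda\nu} + Q_{\nu\lambda\mu} - Q_{\mu\nu\lambda}\big) = \tilde Q_{\lambda\nu\mu}$ . (4.90)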
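The algebra behind (4.89) and (4.90) can be checked on random numbers. The sketch assumes, as follows from (4.81)-(4.83) with $\tilde\Gamma = \Gamma + C$ and the convention that the last lower index is the direction of differentiation, that $T_{\lambda\mu\nu} = C_{\lambda\mu\nu} - C_{\lambda\nu\mu}$ and $Q_{\nu\lambda\mu} = C_{\nu\lambda\mu} + C_{\lambda\nu\mu}$. It then builds $\tilde T$ and $\tilde Q$ from (4.89) and (4.90) and confirms $C = \tilde T + \tilde Q$ together with the stated (anti)symmetries. All indices are taken to be lowered already, so no metric appears; the dimension `N` and the random seed are arbitrary.

```python
import random

# Sketch: random (0,3)-tensor C_{lam mu nu} with indices already lowered.
N = 4
R = range(N)
rng = random.Random(0)
C = [[[rng.uniform(-1, 1) for _ in R] for _ in R] for _ in R]

# Torsion T_{lam mu nu} = C_{lam mu nu} - C_{lam nu mu}
T = [[[C[l][m][n] - C[l][n][m] for n in R] for m in R] for l in R]
# Non-metricity Q_{nu lam mu} = C_{nu lam mu} + C_{lam nu mu}, stored as Q[a][b][c]
Q = [[[C[a][b][c] + C[b][a][c] for c in R] for b in R] for a in R]

# Contorsion (4.89) and its non-metricity counterpart (4.90)
Tt = [[[0.5 * (T[l][m][n] - T[m][l][n] - T[n][l][m]) for n in R] for m in R] for l in R]
Qt = [[[0.5 * (Q[m][l][n] + Q[n][l][m] - Q[m][n][l]) for n in R] for m in R] for l in R]

residual = max(abs(C[l][m][n] - Tt[l][m][n] - Qt[l][m][n])
               for l in R for m in R for n in R)
print(residual)  # should be at rounding-error level
```

The identity holds because any C splits uniquely into a part antisymmetric in its first two indices (no non-metricity) and a part symmetric in its last two (no torsion), which is (4.85) read in reverse.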
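Thus we can now split a general connection more informatively into the 3 pieces
$\tilde\Gamma_{\lambda\mu\nu} = \Gamma_{\lambda\mu\nu} + \tilde T_{\lambda\mu\nu} + \tilde Q_{\lambda\mu\nu}$ .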
Remarks:
1. The tensor $\tilde T_{\lambda\mu\nu}$ is known as the contorsion tensor (frequently misspelled as "contortion" tensor). I am not aware of a commonly used name for $\tilde Q_{\lambda\mu\nu}$, and will not try to invent one. The contorsion tensor is the linear combination of components of the torsion tensor that appear as the connection coefficients of a general metric-compatible connection with torsion,
$\tilde\Gamma_{\lambda\mu\nu} = \Gamma_{\lambda\mu\nu} + \tilde T_{\lambda\mu\nu}$ ,
but it cannot be symmetric (if the contorsion were symmetric, the torsion, and hence the contorsion, would be zero). If its symmetric part vanishes, then $\tilde T_{\lambda\mu\nu}$ is completely antisymmetric,
$\tilde\nabla_\tau X^\mu = 0 \;\Leftrightarrow\; \ddot x^\mu + \tilde\Gamma^\mu_{\ \nu\lambda}\dot x^\nu\dot x^\lambda = 0$ , (4.95)
(i.e. curves characterised by the fact that their tangent vectors are parallel transported along the curve; this depends on a choice of connection) no longer coincides with the notion of geodesics (which are obtained by extremising proper time or distance, and which always lead to the Levi-Civita connection). However, this difference disappears if $C^\mu_{\ \nu\lambda}$ happens to be antisymmetric in its lower indices (e.g. for a metric-compatible connection with totally antisymmetric contorsion tensor), as one then has
$\ddot x^\mu + \tilde\Gamma^\mu_{\ \nu\lambda}\dot x^\nu\dot x^\lambda = \ddot x^\mu + \Gamma^\mu_{\ \nu\lambda}\dot x^\nu\dot x^\lambda$ . (4.96)
5 Physics in a Gravitational Field
The fact that the covariant derivative ∇ maps tensors to tensors and reduces to the
ordinary partial derivative in a locally inertial coordinate system suggests the following
algorithm for obtaining equations that satisfy the Principle of General Covariance:
1. Write down the Lorentz invariant equations of Special Relativity (e.g. those of
relativistic mechanics, Maxwell theory, relativistic hydrodynamics, . . . ).
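2. Wherever the Minkowski metric $\eta_{\mu\nu}$ (or $\eta_{ab}$) appears, replace it by $g_{\mu\nu}$.
3. Wherever a partial derivative $\partial_\mu$ appears, replace it by the covariant derivative $\nabla_\mu$.
4. In particular, for the proper-time derivative along a curve this entails replacing d/dτ by $\nabla_\tau$.
5. Wherever an integral $\int d^4x$ appears, replace it by $\int\sqrt{g}\,d^4x$.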
By construction, these equations are tensorial (generally covariant) and true in the absence of gravity, and hence satisfy the Principle of General Covariance. They will therefore be true in the presence of gravitational fields (at least on scales small compared to that of the gravitational fields: if one considers higher derivatives of the metric tensor, then there are other equations that one can write down, involving e.g. the curvature tensor, that are tensorial but reduce to the same equations in the absence of gravity).
We can see the power of the formalism we have developed so far by rederiving the laws
of particle mechanics in a general gravitational field. In Special Relativity, the motion
of a particle with mass m moving under the influence of some external force is governed
by the equation
SR: $\frac{dX^\mu}{d\tau} = \frac{f^\mu}{m}$ , (5.1)
where $f^\mu$ is the force four-vector and $X^\mu = \dot x^\mu$. Thus, using the principle of minimal coupling, the equation in a general gravitational field is
GR: $\nabla_\tau X^\mu = \frac{f^\mu}{m}$ . (5.2)
Of course, the left-hand side is just the covariant acceleration, leading to the familiar geodesic equation for $f^\mu = 0$, but we see that it follows much faster from demanding general covariance (the principle of minimal coupling) than from our previous, somewhat more convoluted, considerations based e.g. on the equivalence principle.
5.3 Klein-Gordon Scalar Field in a Gravitational Field
Here is where the formalism we have developed really pays off. We will see once again
that, using the minimal coupling rule, we can immediately rewrite the equations for a
scalar field (here) and the Maxwell equations (below) in a form in which they are valid
in an arbitrary gravitational field.
The action for a (real) free massive scalar field φ in Special Relativity is
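SR: $S[\phi] = \int d^4x\,\big[-\tfrac12\eta^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi - \tfrac12 m^2\phi^2\big]$ . (5.3)
To covariantise this, we replace $d^4x \to \sqrt{g}\,d^4x$, $\eta^{\alpha\beta} \to g^{\alpha\beta}$, and we could replace $\partial_\alpha \to \nabla_\alpha$ (but this makes no difference on scalars). Therefore, the covariant action in a general gravitational field is
GR: $S[\phi, g_{\alpha\beta}] = \int\sqrt{g}\,d^4x\,\big[-\tfrac12 g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi - \tfrac12 m^2\phi^2\big]$ . (5.4)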
Here I have also indicated the dependence of the action on the metric gαβ . This is not
(yet) a dynamical field, though, just the gravitational background field.
A comment on how to derive this: if one thinks of the $\partial_\alpha$ in the action as covariant derivatives, $\partial_\alpha \to \nabla_\alpha$, then the calculation is identical to that in Minkowski space provided that one remembers that $\nabla_\alpha\sqrt{g} = 0$. If one sticks with the ordinary partial derivatives, then upon the usual integration by parts one picks up a term $\sim\partial_\alpha\big(\sqrt{g}\,g^{\alpha\beta}\partial_\beta\phi\big)$, which then evidently leads to the Laplacian in the form (4.40).
The corresponding energy-momentum tensor in a gravitational field is then evidently
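GR: $T_{\alpha\beta} = \partial_\alpha\phi\,\partial_\beta\phi + g_{\alpha\beta}L = \partial_\alpha\phi\,\partial_\beta\phi - \tfrac12 g_{\alpha\beta}\big(g^{\mu\nu}\partial_\mu\phi\,\partial_\nu\phi + m^2\phi^2\big)$ (5.9)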
Note that we cannot expect to have an ordinary conservation law because there is no
translation invariance in a general gravitational background. We will come back to
this issue below and in sections 6 and 7. In particular, we cannot even derive the
energy-momentum tensor (5.9) via Noether’s theorem (applied to translations).
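We can, however, obtain it in a quite different (and perhaps at this point rather mysterious) way, namely by varying the action $S[\phi, g_{\alpha\beta}]$ not with respect to φ but with respect to the metric! Indeed, recalling the formula (4.50) for the derivative / variation of the determinant of the metric from section 4.5,
$\delta g^{1/2} = \tfrac12 g^{1/2}g^{\alpha\beta}\,\delta g_{\alpha\beta} = -\tfrac12 g^{1/2}g_{\alpha\beta}\,\delta g^{\alpha\beta}$
(the second equality following from $\delta(g^{\alpha\beta}g_{\alpha\beta}) = 0$), one finds that under $g^{\alpha\beta} \to g^{\alpha\beta} + \delta g^{\alpha\beta}$ the action varies by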
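$\delta S[\phi, g_{\alpha\beta}] = -\tfrac12\int\sqrt{g}\,d^4x\,T_{\alpha\beta}\,\delta g^{\alpha\beta}$ (5.12)
or
$T_{\alpha\beta} = -\frac{2}{\sqrt{g}}\frac{\delta S[\phi, g_{\alpha\beta}]}{\delta g^{\alpha\beta}}$ . (5.13)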
More should (and will) be said about this in due course . . .
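and the energy-momentum tensor in that case has the form (5.9) with $m^2\phi^2/2$ unsurprisingly also replaced by V(φ),
Given the vector potential $A_\mu$, the Maxwell field strength tensor in Special Relativity is
$F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ .
Therefore in a general metric space-time (gravitational field) we have
$F_{\mu\nu} = \nabla_\mu A_\nu - \nabla_\nu A_\mu = \partial_\mu A_\nu - \partial_\nu A_\mu$ .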
Actually, this is a bit misleading. The field strength tensor (two-form) in any, Abelian or
non-Abelian, gauge theory is always given in terms of the gauge-covariant exterior derivative of
the vector potential (connection), and as such has nothing whatsoever to do with the metric
on space-time. So you should not really regard the first equality in the above equation as the
definition of Fµν , but you should regard the second equality as a proof that Fµν , always defined
by Fµν = ∂µ Aν − ∂ν Aµ , is a tensor. The mistake of adopting ∇µ Aν − ∇ν Aµ as the definition
of Fµν in a curved space time has led some poor souls to believe, and even claim in published
papers, that in a space time with torsion, for which the second equality does not hold, the
Maxwell field strength tensor is modified by the torsion. This is nonsense.
SR: $\partial_\mu F^{\mu\nu} = -J^\nu$ , $\partial_{[\mu}F_{\nu\lambda]} = 0$ . (5.18)
Thus in a general gravitational field (curved space-time) these equations become
GR: $\nabla_\mu F^{\mu\nu} = -J^\nu$ , $\nabla_{[\mu}F_{\nu\lambda]} = 0$ , (5.19)
where now of course all indices are raised and lowered with the metric gµν ,
Regarding the use of the covariant derivative in the second equation, the same caveat
as above applies.
Using the results derived above, we can rewrite these two equations as
GR: $\partial_\mu\big(\sqrt{g}\,F^{\mu\nu}\big) = -\sqrt{g}\,J^\nu$ , $\partial_{[\mu}F_{\nu\lambda]} = 0$ . (5.21)
It is clear from the first of these equations that the Maxwell equations imply that the current is covariantly conserved: since
$\partial_\nu\partial_\mu\big(\sqrt{g}\,F^{\mu\nu}\big) = 0$ (5.22)
by the antisymmetry of $F^{\mu\nu}$, one has $\partial_\nu\big(\sqrt{g}\,J^\nu\big) = 0$, i.e. $\nabla_\nu J^\nu = 0$. From the covariant version $\nabla_\mu F^{\mu\nu} = -J^\nu$ this follows in the seemingly more roundabout way from the identity (8.26) for the commutator of covariant derivatives that we will establish later, in the context of our discussion of the Riemann curvature tensor in section 8.
$\frac{\delta}{\delta A_\nu}S[A_\alpha, g_{\alpha\beta}] = 0 \;\Rightarrow\; \nabla_\mu F^{\mu\nu} = 0$ . (5.28)
Instead of just adding a source-term by hand, a more coherent approach (which also
provides the sources with their own dynamics) is to consider a matter action (minimally)
coupled to the Maxwell field,
The combined Maxwell + matter action will then give rise to the Maxwell equations
with a source provided that one defines the current J α as the variation of the matter
action with respect to the gauge field,
$J^\alpha \sim \frac{\delta S_M[\phi, A_\alpha]}{\delta A_\alpha}$ . (5.30)
By the same rationale, we should define the source term for the gravitational field by
the variation of the gravitationally minimally coupled matter action with respect to the
metric. But the source term for the gravitational field equation should be the energy-
momentum tensor. We have already seen that this works out correctly in the case of a
scalar field. Let us now see what we get here.
Recall that in the case of Maxwell theory in Minkowski space-time the canonical Noether energy-momentum tensor (associated to the translation invariance of the Maxwell action)
$\Theta_{\mu\nu} = -\frac{\partial L}{\partial(\partial^\mu A^\lambda)}\,\partial_\nu A^\lambda + \eta_{\mu\nu}L = F_{\mu\lambda}\,\partial_\nu A^\lambda - \tfrac14\eta_{\mu\nu}F_{\lambda\sigma}F^{\lambda\sigma}$ (5.31)
is (of course) conserved but neither symmetric nor gauge-invariant. This can be rectified by adding an identically conserved correction term and/or by considering Lorentz transformations as well, not just translations, and taking into account the tensorial (non-scalar) nature of the vector potential under Lorentz transformations (see the discussion in section 11.4 and the references given there for details). Either way one then finally arrives at the covariant (improved, symmetric and gauge-invariant) energy-momentum tensor of Maxwell theory, namely
SR: $T_{\mu\nu} = F_{\mu\lambda}F_\nu^{\ \lambda} + \eta_{\mu\nu}L = F_{\mu\lambda}F_\nu^{\ \lambda} - \tfrac14\eta_{\mu\nu}F_{\lambda\sigma}F^{\lambda\sigma}$ , (5.32)
in particular with energy density
$T_{00} = \tfrac12\big(\vec E^2 + \vec B^2\big)$ . (5.33)
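In a general gravitational field this becomes
GR: $T_{\mu\nu} = F_{\mu\lambda}F_\nu^{\ \lambda} - \tfrac14 g_{\mu\nu}F_{\lambda\sigma}F^{\lambda\sigma}$ , (5.34)
where all indices are raised with the inverse metric $g^{\mu\nu}$. The (non-)conservation equation
SR: $\partial_\mu T^{\mu\nu} = J_\mu F^{\mu\nu}$ , (5.35)
(deriving this requires using both sets of Maxwell equations) becomes the covariant (non-)conservation law
GR: $\nabla_\mu T^{\mu\nu} = J_\mu F^{\mu\nu}$ . (5.36)
This becomes a covariant conservation law provided that $T_{\mu\nu}$ is constructed from, and thus includes the contributions from, the complete matter action (i.e. Maxwell + sources + interaction terms).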
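The energy density (5.33) follows from (5.32) with $\eta_{\mu\nu} = \mathrm{diag}(-1,1,1,1)$. The sketch below evaluates $T_{00} = F_{0\lambda}F_0^{\ \lambda} - \tfrac14\eta_{00}F_{\lambda\sigma}F^{\lambda\sigma}$ component by component for sample field values, assuming the convention $F_{0i} = -E_i$, $F_{ij} = \epsilon_{ijk}B_k$; the result does not depend on the sign chosen for $F_{0i}$.

```python
# Sketch: check T_00 = (E^2 + B^2)/2 from (5.32) in Minkowski space with
# eta = diag(-1, 1, 1, 1). Field values and the convention F_{0i} = -E_i,
# F_{ij} = eps_{ijk} B_k are assumptions made for the example.

ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal metric, equal to its own inverse


def eps3(i, j, k):
    """Levi-Civita symbol in three dimensions, indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) / 2


def field_strength(E, B):
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[0][i + 1] = -E[i]
        F[i + 1][0] = E[i]
        for j in range(3):
            F[i + 1][j + 1] = sum(eps3(i, j, k) * B[k] for k in range(3))
    return F


def t00(F):
    """T_00 = F_{0 lam} F_0^lam - 1/4 eta_00 F_{ls} F^{ls}."""
    F_sq = sum(ETA[l] * ETA[s] * F[l][s] ** 2 for l in range(4) for s in range(4))
    return sum(ETA[l] * F[0][l] ** 2 for l in range(4)) - 0.25 * ETA[0] * F_sq


E, B = (0.3, -1.2, 0.5), (0.7, 0.1, -0.4)
print(t00(field_strength(E, B)), 0.5 * sum(x * x for x in E + B))
```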
Once again we observe the remarkable fact that this energy-momentum tensor can be obtained by varying the action with respect to the metric,
$T_{\alpha\beta} = -\frac{2}{\sqrt{g}}\frac{\delta S[A_\alpha, g_{\alpha\beta}]}{\delta g^{\alpha\beta}}$ . (5.37)
This procedure automatically, and in general, gives rise to a symmetric and gauge-invariant tensor by construction, without the need for "correction terms", clearly an appealing feature. We will see later (section 11.3) that it is also automatically covariantly conserved "on-shell" by virtue of the general covariance of the action.
Minkowski space, one uses the fact that partial derivatives commute. Thus, analogously,
to establish that ∇µ Θµν = 0, say, one needs to be able to commute the covariant
derivatives ∇µ and ∇ν . But, as we will discuss at length in section 8, the characteristic
and defining feature of a non-trivial curved space-time is that these covariant derivatives
do not commute when acting on tensors other than scalars (their commutator defining
the curvature tensor of the space-time).
Thus, when it comes to defining the energy-momentum tensor for Maxwell theory, the
covariant definition based on the variation of the matter action with respect to the
metric wins hands down over the canonical definition based on Noether’s theorem for
translations. We will return to these matters in somewhat more generality in section
11, in particular section 11.4, when discussing the Lagrangian/variational formulation
of general relativity.
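Examples of such actions are the action of a massless scalar field (5.4) in D = 2 (space-time) dimensions,
$S[\phi, g_{\alpha\beta}] = -\tfrac12\int d^2x\,\sqrt{g}\,g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi$ (5.42)
Indeed, in that case the metric dependence of the action is precisely such that the combination of the determinant $\sqrt{g}$ and the inverse metric that appears is invariant under Weyl rescalings,
$g_{\alpha\beta} \to e^{2\omega}g_{\alpha\beta} \;\Rightarrow\; \begin{cases} D = 2: & \sqrt{g}\,g^{\alpha\beta} \to \sqrt{g}\,g^{\alpha\beta} \\ D = 4: & \sqrt{g}\,g^{\alpha\beta}g^{\gamma\delta} \to \sqrt{g}\,g^{\alpha\beta}g^{\gamma\delta} \end{cases}$ (5.44)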
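This is reflected in the fact that the corresponding energy-momentum tensor is traceless precisely in these dimensions: from (5.9) and (5.34) one finds
$g^{\alpha\beta}T_{\alpha\beta} = \big(1 - \tfrac{D}{2}\big)\,g^{\mu\nu}\partial_\mu\phi\,\partial_\nu\phi$ (massless scalar), $\quad g^{\alpha\beta}T_{\alpha\beta} = \big(1 - \tfrac{D}{4}\big)\,F_{\mu\nu}F^{\mu\nu}$ (Maxwell).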
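Both statements can be checked numerically in flat space. The sketch below (illustrative only; the metric entries, the random seed and ω are arbitrary) confirms that under $g_{\alpha\beta} \to e^{2\omega}g_{\alpha\beta}$ the product of $\sqrt{|g|}$ with k inverse metrics scales by $e^{(D-2k)\omega}$, using a diagonal metric for simplicity, and it evaluates the traces of $T_{\alpha\beta} = \partial_\alpha\phi\,\partial_\beta\phi - \tfrac12 g_{\alpha\beta}(\partial\phi)^2$ and $T_{\alpha\beta} = F_{\alpha\gamma}F_\beta^{\ \gamma} - \tfrac14 g_{\alpha\beta}F^2$ with $g = \eta = \mathrm{diag}(-1, 1, \ldots, 1)$ for random field values, comparing them with $(1 - D/2)(\partial\phi)^2$ and $(1 - D/4)F^2$.

```python
import math
import random


def sqrt_abs_det_diag(g_diag):
    """sqrt|det g| for a diagonal metric given by its diagonal entries."""
    return math.sqrt(abs(math.prod(g_diag)))


def weyl_ratio(g_diag, k, omega):
    """Ratio after/before g -> e^{2 omega} g of sqrt|g| * (g^{00})^k."""
    scaled = [math.exp(2 * omega) * x for x in g_diag]
    before = sqrt_abs_det_diag(g_diag) * (1.0 / g_diag[0]) ** k
    after = sqrt_abs_det_diag(scaled) * (1.0 / scaled[0]) ** k
    return after / before


def traces(D, rng):
    """(trace, predicted trace) for the massless scalar and Maxwell
    energy-momentum tensors with random field values and metric eta."""
    eta = [-1.0] + [1.0] * (D - 1)
    dphi = [rng.uniform(-1, 1) for _ in range(D)]       # d_a phi
    F = [[0.0] * D for _ in range(D)]                   # F_ab, antisymmetric
    for a in range(D):
        for b in range(a + 1, D):
            F[a][b] = rng.uniform(-1, 1)
            F[b][a] = -F[a][b]
    dphi_sq = sum(eta[a] * dphi[a] ** 2 for a in range(D))
    F_sq = sum(eta[a] * eta[b] * F[a][b] ** 2 for a in range(D) for b in range(D))
    # For diagonal eta only the diagonal components T_aa enter g^{ab} T_ab.
    tr_scalar = sum(eta[a] * (dphi[a] ** 2 - 0.5 * eta[a] * dphi_sq) for a in range(D))
    tr_maxwell = sum(eta[a] * (sum(F[a][c] * eta[c] * F[a][c] for c in range(D))
                               - 0.25 * eta[a] * F_sq) for a in range(D))
    return tr_scalar, (1 - D / 2) * dphi_sq, tr_maxwell, (1 - D / 4) * F_sq


rng = random.Random(3)
for D in (2, 3, 4, 5):
    print(D, traces(D, rng))
print(weyl_ratio([-1.3, 0.8], 1, 0.7), weyl_ratio([-1.3, 0.8, 2.0, 1.1], 2, 0.7))
```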
As a concrete, and the simplest non-trivial, example, and an illustration of the above
remarks regarding Weyl invariance, let us consider a massless scalar field in (1+1)-
dimensions, in either the usual Minkowski coordinates, or in the Rindler coordinates
discussed in sections 1.3 and 2.7 (we will in particular make use of the results in section
2.7).
Thus a natural basis of solutions to this equation is provided by the plane waves $f_k \sim \exp(-i\omega t + ikx)$, with $k^2 = \omega^2$, i.e. $k = \pm\omega$, ω > 0, and their complex conjugates. For a given ω there are thus two linearly independent positive frequency solutions,
$f_\omega(t,x) = \frac{1}{(4\pi\omega)^{1/2}}\,e^{-i\omega(t-x)}$ ,
$g_\omega(t,x) = \frac{1}{(4\pi\omega)^{1/2}}\,e^{-i\omega(t+x)}$ (5.49)
(the normalisation factors are inserted for QFT-pedantry reasons only and are irrelevant for the following). Thus the basis of solutions splits into right-movers or right-moving modes $f_\omega$ and left-movers $g_\omega$. It is thus convenient to introduce the corresponding null coordinates $u_M = t - x$, $v_M = t + x$ as in (2.99), in terms of which the solutions can be written as
$f_\omega = f_\omega(u_M) = \frac{1}{(4\pi\omega)^{1/2}}\,e^{-i\omega u_M}$ ,
$g_\omega = g_\omega(v_M) = \frac{1}{(4\pi\omega)^{1/2}}\,e^{-i\omega v_M}$ . (5.50)
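A quick finite-difference sketch of these statements, with the normalisation dropped and a sample frequency ω = 2.5 (our choice): the plane waves $e^{-i\omega(t\mp x)}$ are annihilated by $-\partial_t^2 + \partial_x^2$, and for an arbitrary smooth test function (also our choice) the same operator agrees with $-4\partial_u\partial_v$ in the null coordinates $u = t - x$, $v = t + x$.

```python
import cmath
import math

H = 1e-3       # finite-difference step
OMEGA = 2.5    # sample frequency (illustrative)


def wave_operator(phi, t, x):
    """(-d_t^2 + d_x^2) phi by second-order central differences."""
    d_tt = (phi(t + H, x) - 2 * phi(t, x) + phi(t - H, x)) / H ** 2
    d_xx = (phi(t, x + H) - 2 * phi(t, x) + phi(t, x - H)) / H ** 2
    return -d_tt + d_xx


def null_operator(psi, u, v):
    """-4 d_u d_v psi(u, v) by a central mixed difference."""
    mixed = (psi(u + H, v + H) - psi(u + H, v - H)
             - psi(u - H, v + H) + psi(u - H, v - H)) / (4 * H * H)
    return -4 * mixed


def f_right(t, x):
    """e^{-i omega (t - x)}: right-mover, normalisation dropped."""
    return cmath.exp(-1j * OMEGA * (t - x))


def g_left(t, x):
    """e^{-i omega (t + x)}: left-mover, normalisation dropped."""
    return cmath.exp(-1j * OMEGA * (t + x))


def phi_test(t, x):
    """An arbitrary smooth test function phi(t, x)."""
    return math.sin(t) * math.exp(0.5 * x) + t * x * x


def psi_test(u, v):
    """The same function in null coordinates: t = (u + v)/2, x = (v - u)/2."""
    return phi_test((u + v) / 2, (v - u) / 2)


print(abs(wave_operator(f_right, 0.3, -0.2)), abs(wave_operator(g_left, 0.3, -0.2)))
t0, x0 = 0.4, 0.9
print(wave_operator(phi_test, t0, x0), null_operator(psi_test, t0 - x0, t0 + x0))
```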
That the solutions split in this way could also have been deduced from the form of the wave operator in these lightcone (null) coordinates, namely $\Box = -4\partial_{u_M}\partial_{v_M}$, and the ensuing equation of motion,
Now let us consider the same issue in Rindler coordinates. In terms of the coordinates
(η, ξ) (2.98), the metric takes the form (2.97)
Note that, as mentioned in section 2.7, the metric in these coordinates is conformally
flat. Thus, by the reasoning above, in section 5.5, in particular the discussion around
equation (5.44), we know that the action and equation of motion for a scalar field
in Rindler coordinates will look just like those in Minkowski coordinates, with the
replacement (t, x) → (η, ξ),
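$S_R[\phi] = -\tfrac12\int d\eta\,d\xi\,\sqrt{g}\,g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi = -\tfrac12\int d\eta\,d\xi\,\eta^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi$ , (5.53)
and
$\Box_g\phi = 0 \;\Leftrightarrow\; \big(-\partial_\eta^2 + \partial_\xi^2\big)\phi = 0$ . (5.54)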
Thus, by the same reasoning as above, the solutions can be split into left- and right-movers and are conveniently written in terms of the Rindler lightcone coordinates $(u_R, v_R) = (\eta - \xi, \eta + \xi)$ (2.100),
The interest in these (fairly simple) considerations lies in the fact that the exponential
relation (2.102) between the Minkowski and Rindler null coordinates
Of course, the fω (uR ) and their complex conjugates fω∗ (uR ) provide a basis of solutions
for the right-moving modes (in the right Rindler-wedge), so that one can certainly
expand the Minkowski plane waves as
$f_\omega(u_M) = \int_0^\infty d\omega'\,\big[\alpha(\omega,\omega')\,f_{\omega'}(u_R) + \beta(\omega,\omega')\,f^*_{\omega'}(u_R)\big]$ , (5.58)
but necessarily with some of the $\beta(\omega,\omega') \neq 0$.
If you know a little bit of quantum field theory, you will be able to anticipate that
this means that the notions of creation and annihilation operators are inequivalent, and
that therefore what is the vacuum, say, for an inertial observer, will not be seen as the
vacuum by the accelerating observer (and vice-versa). In the spirit of the equivalence
principle (“before studying gravity, let us study accelerations in flat space”), this Unruh
Effect is a fascinating and rewarding first step towards understanding (or appreciating
the difficulties encountered by) quantum field theory in curved space-times, i.e. in non-
trivial gravitational fields. For more on this see the references given in section 15.5.
In all the cases considered so far, the minimal coupling prescription resulted in a minimally coupled matter action that depends explicitly on the metric; this is as it should be and is not a surprise. What would be more of a surprise would be to find minimally coupled and hence generally covariant contributions to an action that do not depend on the metric, but such examples do indeed exist (and play an important role in many branches of physics and even mathematics, ranging from the strong-CP problem in QCD to high-Tc superconductors to topology). Such terms in the action are usually referred to as "topological terms" in the physics literature but, as they need not be (and usually are not) purely topological in the mathematical sense, for lack of a better name I refer to them as "quasi-topological".
with
$S_s[\phi] = \int d^4x\,L_s(\phi, \partial_\alpha\phi)$ (5.60)
some arbitrary standard scalar field action (of the type already discussed), $S_m[A]$ the usual Maxwell action,
$S_m[A] = \int d^4x\,L_m(\partial_\alpha A_\beta) = -\tfrac14\int d^4x\,F^{\alpha\beta}F_{\alpha\beta}$ (5.61)
where
$\tilde F^{\alpha\beta} = \tfrac12\in^{\alpha\beta\gamma\delta}F_{\gamma\delta}$ (5.63)
with $\in^{\alpha\beta\gamma\delta} = 0, \pm1$ the Levi-Civita symbol, a tensor density (cf. remark 3 in section 3.4), and f(φ) some function of the scalar field φ. Note that for f(φ) = 1 (or in the absence of a scalar field) the axionic term would be (locally) a total derivative and would hence not contribute to the equations of motion. For non-trivial f(φ), on the other hand, the axionic term is itself non-trivial.
Minimal coupling for the first two (standard) terms proceeds as already discussed above. For the third, axionic term, we make the usual replacement $d^4x \to \sqrt{g}\,d^4x$ and recall from (3.45) that
$\epsilon^{\alpha\beta\gamma\delta} \equiv \frac{1}{\sqrt{g}}\in^{\alpha\beta\gamma\delta}$ (5.64)
is a (4,0) tensor, so that the generally covariant generalisation of the axionic action is
$S_a[\phi, A, g_{\alpha\beta}] = -\tfrac18\int\sqrt{g}\,d^4x\,f(\phi)\,\epsilon^{\alpha\beta\gamma\delta}F_{\gamma\delta}F_{\alpha\beta} = -\tfrac18\int d^4x\,f(\phi)\in^{\alpha\beta\gamma\delta}F_{\gamma\delta}F_{\alpha\beta} = S_a[\phi, A]$ (5.65)
We see that, as announced, the metric dependence drops out of the minimally coupled generally covariant action. The reason for this is that the axionic Lagrangian is already all by itself a scalar density of weight w = +1, and that therefore its integral (3.36) is well defined and generally covariant without having to take recourse to a metric to construct an auxiliary weight-one object like $\sqrt{g}$.
Minimal coupling for the first term is standard, and for the second term one finds, as above, that the generally covariant minimally coupled Chern-Simons action is actually metric independent (since the Chern-Simons Lagrangian is a density of weight w = 1),
$S_{cs}[A, g_{\alpha\beta}] = \tfrac12 k\int\sqrt{g}\,d^3x\,\epsilon^{\alpha\beta\gamma}A_\alpha F_{\beta\gamma} = \tfrac12 k\int d^3x\in^{\alpha\beta\gamma}A_\alpha F_{\beta\gamma} = S_{cs}[A]$ (5.67)
As an aside, note that the above theory is also known as topologically massive Maxwell theory, since the CS term provides a gauge-invariant mass term for the photon. One quick way to see this is to note that the equations of motion are
$G^\beta = \tfrac12\in^{\beta\gamma\delta}F_{\gamma\delta}$ (5.69)
the equations of motion and the Bianchi identity take the form
$\partial_\alpha G_\beta - \partial_\beta G_\alpha = 2k\in_{\alpha\beta\gamma}G^\gamma$ , $\quad\partial_\beta G^\beta = 0$ (5.70)
respectively. Acting with ∂ α on the equation of motion and using the Bianchi
identity and again the equation of motion one finds
where V is the four-volume $\mathbb{R}^3 \times [t_0, t_1]$. This holds provided that J vanishes at spatial
infinity.
Now in General Relativity, the conservation law will be replaced by the covariant conservation law $\nabla_\mu J^\mu = 0$, and one may wonder if this also leads to some conserved charges in the ordinary sense. The answer is yes because, recalling the formula for the covariant divergence of a vector,
$\nabla_\mu J^\mu = g^{-1/2}\partial_\mu\big(g^{1/2}J^\mu\big)$ , (5.74)
we see that
$\nabla_\mu J^\mu = 0 \;\Leftrightarrow\; \partial_\mu\big(g^{1/2}J^\mu\big) = 0$ , (5.75)
so that $g^{1/2}J^\mu$ is a conserved current in the ordinary sense. We then obtain conserved quantities in the ordinary sense by integrating $J^\mu$ over a spacelike hypersurface Σ. Using
the generalised Gauss’ theorem appropriate for metric space-times, one can see that Q
is invariant under deformations of Σ.
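As a quick numerical sanity check of (5.74), the following Python sketch compares ∇_μJ^μ, computed with the Christoffel symbols, to g^{-1/2}∂_μ(g^{1/2}J^μ). The flat plane in polar coordinates and the particular field J^μ are hypothetical choices made only for illustration, and all derivatives are central finite differences, so the two numbers agree only up to discretisation error.

```python
import math

# Hypothetical test case: the flat plane in polar coordinates x = (r, phi),
# with metric g = diag(1, r^2), so that g^{1/2} = r.
def metric(x):
    r = x[0]
    return [[1.0, 0.0], [0.0, r * r]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    dt = det2(m)
    return [[m[1][1] / dt, -m[0][1] / dt], [-m[1][0] / dt, m[0][0] / dt]]

def partial(f, x, mu, h=1e-6):
    """Central finite difference of a scalar function f along coordinate mu."""
    xp, xm = list(x), list(x)
    xp[mu] += h
    xm[mu] -= h
    return (f(xp) - f(xm)) / (2 * h)

def christoffel(x):
    """gam[l][m][n] = Gamma^l_{mn} = 1/2 g^{lk} (d_m g_{kn} + d_n g_{km} - d_k g_{mn})."""
    ginv = inv2(metric(x))
    # dg[a][b][c] = d_c g_{ab}
    dg = [[[partial(lambda y: metric(y)[a][b], x, c) for c in range(2)]
           for b in range(2)] for a in range(2)]
    return [[[0.5 * sum(ginv[l][k] * (dg[k][n][m] + dg[k][m][n] - dg[m][n][k]) for k in range(2))
              for n in range(2)] for m in range(2)] for l in range(2)]

def current(x):
    # Hypothetical vector field J^mu(r, phi), chosen only for this check.
    r, phi = x
    return [r * math.cos(phi), math.sin(phi) / r]

def div_covariant(x):
    """nabla_mu J^mu = d_mu J^mu + Gamma^mu_{mu nu} J^nu."""
    gam, J = christoffel(x), current(x)
    return (sum(partial(lambda y: current(y)[m], x, m) for m in range(2))
            + sum(gam[m][m][n] * J[n] for m in range(2) for n in range(2)))

def div_density(x):
    """g^{-1/2} d_mu (g^{1/2} J^mu), the right-hand side of (5.74)."""
    sqrtg = lambda y: math.sqrt(det2(metric(y)))
    return sum(partial(lambda y: sqrtg(y) * current(y)[m], x, m) for m in range(2)) / sqrtg(x)

point = [1.3, 0.7]
print(div_covariant(point), div_density(point))  # agree up to finite-difference error
```

For this particular field both sides equal 2cosφ + cosφ/r, which can also be checked by hand.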
In order to write down more precise equations for the charges in this case, we would
have to understand how a metric on space-time induces a metric (and hence volume
element) on a spacelike hypersurface. This would require developing a certain amount
of formalism, useful for certain purposes in cosmology and for developing a canonical
formalism for General Relativity.
Some aspects of this issue will be developed much later, in a different context and for
a different reason, in section 23.4. Suffice it to say here that the first step would be the
introduction of a normalised normal vector n^μ to the hypersurface Σ, n^μn_μ = −1, and to consider the object h_{μν} = g_{μν} + n_μn_ν. As h_{μν}n^ν = 0 while h_{μν}X^ν = g_{μν}X^ν for any vector X^μ normal to n^μ, h_{μν} induces a metric and volume element on Σ.
The factor g^{1/2} appearing in the current conservation law can be understood physically. To see what it means, split J^μ into its space-time direction u^μ, with u^μu_μ = −1, and its magnitude ρ as
$$ J^\mu = \rho\, u^\mu . \qquad (5.76) $$
This defines the average four-velocity of the conserved quantity represented by J^μ and its density ρ measured by an observer moving at that average velocity (rest mass density, charge density, number density, ...). Since u^μ is a vector, in order for J^μ to be a vector, ρ has to be a scalar. Therefore this density is defined per unit proper volume. The factor g^{1/2} transforms this into a density per coordinate volume, and this quantity is conserved (in a comoving coordinate system where J^0 = ρ, J^i = 0).
We will come back to this in the context of cosmology later on in this course, but for
now just think of the following picture (Figure 20): take a balloon, draw lots of dots on
it at random, representing particles or galaxies. Next choose some coordinate system on
the balloon and draw the coordinate grid on it. Now inflate or deflate the balloon. This
represents a time dependent metric, roughly of the form ds² = r²(t)(dθ² + sin²θ dφ²).
You see that the number of dots per coordinate volume element does not change, whereas
the number of dots per unit proper volume will.
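The balloon picture can be made quantitative with a few lines of Python. The sketch below assumes comoving dots (their coordinates θ, φ do not change while r(t) varies) and uses the proper area element g^{1/2} dθ dφ = r² sinθ dθ dφ; the patch size and the number of dots are arbitrary illustrative values.

```python
import math

def dot_densities(n_dots, radius, theta1, theta2, phi_width):
    """Number of comoving dots per coordinate area and per proper area for the patch
    theta1 < theta < theta2, 0 < phi < phi_width of ds^2 = r^2 (dtheta^2 + sin^2(theta) dphi^2)."""
    coordinate_area = (theta2 - theta1) * phi_width
    # Proper area = integral of g^{1/2} dtheta dphi with g^{1/2} = r^2 sin(theta).
    proper_area = radius ** 2 * (math.cos(theta1) - math.cos(theta2)) * phi_width
    return n_dots / coordinate_area, n_dots / proper_area

# Comoving dots keep their coordinates while the balloon is inflated from r = 1 to r = 2.
before = dot_densities(100, 1.0, 1.0, 1.2, 0.3)
after = dot_densities(100, 2.0, 1.0, 1.2, 0.3)
print(before)
print(after)  # coordinate density unchanged, proper density reduced by a factor of 4
```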
$$ \partial_\mu T^{\mu\nu} = G^\nu , \qquad (5.77) $$
where G^ν represents the density of the external forces acting on the system. In particular, if there are no external forces, the divergence of the energy-momentum tensor is zero. For example, in the case of Maxwell theory and a current corresponding to a charged particle we have
When there are no external forces, i.e. when one has taken into account the complete matter action, the total energy-momentum tensor is conserved. In that case, T^{μν} = J^μ_{(ν)} defines four conserved currents, more or less (modulo Belinfante improvement terms, see e.g. the discussion in section 11.4 and the references given there) the currents associated to translation invariance of the action via Noether’s theorem. One is thus in the setting of conserved currents of the previous section, and one can define conserved quantities like total energy and momentum, P^μ, and angular momentum J^{μν}, by integrals of T^{0μ} or x^μT^{0ν} − x^νT^{0μ} (the latter being conserved if T^{μν} is symmetric) over spacelike hypersurfaces.
We see that, due to the second term, this does not define four conserved currents in the
ordinary or covariant sense (and we will return to the interpretation of this equation,
and the related issue of energy of the gravitational field, in section 11.6).
Nevertheless, in analogy with special relativity, one might like to attempt to define conserved quantities like total energy and momentum, P^μ, and angular momentum J^{μν}, by integrals of T^{0μ} or x^μT^{0ν} − x^νT^{0μ} over spacelike hypersurfaces. However, these quantities are rather obviously not covariant, nor are they conserved.
This should perhaps not be too surprising because, after all, for a Poincaré-invariant field
theory in Minkowski space these quantities are preserved as a consequence of Poincaré
invariance, i.e. because of the symmetries (isometries) of the Minkowski metric (as well
as of the action).
A generic metric has no isometries whatsoever (the explicit examples of metrics in these notes notwithstanding, all of which exhibit at least some symmetries). As it has no symmetries, we have no reason to expect to find associated conserved quantities in general.
However, if there are symmetries then one should indeed be able to define conserved
quantities (think of Noether’s theorem), one for each symmetry generator. In order to
implement this we need to understand how to define and detect isometries of the metric.
For this we need the concepts of Lie derivatives and Killing vectors.
Alternatively, one might try to construct a covariant current-like object (with a corresponding conservation law and the ensuing possibility to define conserved charges) by contracting the energy-momentum tensor not with the coordinates but with a vector field V^λ, along the lines of
$$ J_V^\mu = T^{\mu\lambda} V_\lambda . \qquad (5.80) $$
At least this is now clearly a vector. But is it conserved? Calculating its covariant divergence, and using the fact that T^{μν} is symmetric and conserved, one finds
$$ \nabla_\mu J_V^\mu = T^{\mu\lambda}\nabla_\mu V_\lambda = \frac{1}{2} T^{\mu\lambda}(\nabla_\mu V_\lambda + \nabla_\lambda V_\mu) . $$
Thus we would have a conserved current (and an associated conserved charge by the previous section) for any conserved energy-momentum tensor if the vector field V^λ were such that it satisfies
$$ \nabla_\mu V_\nu + \nabla_\nu V_\mu = 0 \;\Rightarrow\; \nabla_\mu (T^{\mu\lambda} V_\lambda) = 0 . \qquad (5.82) $$
The link between this observation and the one in the preceding paragraph regarding symmetries is that this is precisely the condition characterising (infinitesimal) symmetries of the metric:
• First of all, this is the condition we already found and encountered in (2.63), as
reformulated in (4.49), for the infinitesimal coordinate transformation δxµ = ǫV µ
to generate a symmetry of the metric, thus leading to a conserved charge for
geodesics.
• More generally, as we will discuss in detail in section 6 below, vector fields satisfy-
ing the equation ∇µ Vν + ∇ν Vµ = 0 are indeed in one-to-one correspondence with
infinitesimal generators of continuous symmetries of a metric (isometries), giving
a pleasing and coherent overall picture of symmetries and conservation laws in a
gravitational field.
6 The Lie Derivative, Symmetries and Killing Vectors
Before trying to figure out how to detect symmetries of a metric, or so-called isometries,
let us decide what we mean by symmetries of a metric. For example, we would say that
the Minkowski metric has the Poincaré group as a group of symmetries, because the
corresponding coordinate transformations leave the metric invariant.
Likewise, we would say that the standard metrics on the two- or three-sphere have
rotational symmetries because they are invariant under rotations of the sphere. We can
look at this in one of two ways: either as an active transformation, in which we rotate
the sphere and note that nothing changes, or as a passive transformation, in which we
do not move the sphere, all the points remain fixed, and we just rotate the coordinate
system. So this is tantamount to a relabelling of the points. From the latter (passive)
point of view, the symmetry is again understood as an invariance of the metric under a
particular family of coordinate transformations.
Thus consider a metric gµν (x) in a coordinate system {xµ } and a change of coordinates
xµ → y µ (xν ) (for the purposes of this and the following section it will be convenient
not to label the two coordinate systems by different sets of indices). Of course, under
such a coordinate transformation we get a new metric gµν ′ , with (since here we do
Thinking actively, in order to detect symmetries, we should e.g. compare the geometry,
given by the line-element ds2 = gµν dxµ dxν , at two different points x and y related by
y µ (x). Thus we are led to consider the difference
Using the invariance of the line-element under coordinate transformations, i.e. the usual
tensorial transformation behaviour of the components of the metric, we see that we can
also write this as the difference
$$ (g'_{\mu\nu}(y) - g_{\mu\nu}(y))\,dy^\mu dy^\nu . \qquad (6.3) $$
Thus we deduce that what we mean by a symmetry, i.e. invariance of the metric under
a coordinate transformation, is the statement
$$ g'_{\mu\nu}(y) = g_{\mu\nu}(y) . \qquad (6.4) $$
From the passive point of view, in which a coordinate transformation represents a rela-
belling of the points of the space, this equation compares the new metric at a point P ′
(with coordinates y µ ) with the old metric at the point P which has the same values of
the old coordinates as the point P ′ has in the new coordinate system, y µ (P ′ ) = xµ (P ).
The above equality then states that the new metric at the point P ′ has the same
functional dependence on the new coordinates as the old metric on the old coordinates
at the point P . Thus a neighbourhood of P ′ in the new coordinates looks identical to
a neighbourhood of P in the old coordinates, and they can be mapped into each other
isometrically, i.e. such that all the metric properties, like distances, are preserved. Thus
either actively or passively one is led to the above condition.
Note that to detect a continuous symmetry in this way, we only need to consider infinites-
imal coordinate transformations. In that case, the above amounts to the statement that
metrically the space time looks the same when one moves infinitesimally in the direction
given by the coordinate transformation.
We now want to translate the above discussion into a condition for an infinitesimal coordinate transformation
$$ x^\mu \to y^\mu(x) = x^\mu + \epsilon V^\mu(x) $$
to generate a symmetry of the metric. Here you can and should think of V^μ as a vector field because, even though coordinates themselves of course do not transform like vectors, their infinitesimal variations δx^μ do,
$$ z^{\mu'} = z^{\mu'}(x) \;\to\; \delta z^{\mu'} = \frac{\partial z^{\mu'}}{\partial x^\mu}\,\delta x^\mu , \qquad (6.6) $$
and we think of δx^μ as εV^μ.
In fact, we will do something slightly more general than just trying to detect symmetries
of the metric. After all, we can also speak of functions or vector fields with symmetries,
and this can be extended to arbitrary tensor fields (although that may be harder to
visualise). So, for a general tensor field T we will want to compare T ′ (y(x)) with T (y(x))
- this is of course equivalent to, and only technically a bit easier than, comparing T ′ (x)
with T (x).
As usual, we start the discussion with scalars. In that case, we want to compare φ(y(x))
with φ′ (y(x)) = φ(x). We find
Evaluating this, we find
LV φ = V µ ∂µ φ . (6.9)
Thus for a scalar, the Lie derivative is just the ordinary directional derivative, and this
is as it should be since saying that a function has a certain symmetry amounts to the
assertion that its derivative in a particular direction vanishes.
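The limit defining L_Vφ can be watched converging numerically. In this sketch φ and V are hypothetical examples on ℝ², y(x) = x + εV(x), and the difference quotient [φ(y(x)) − φ′(y(x))]/ε with φ′(y(x)) = φ(x) is compared with V^μ∂_μφ written out by hand; the error should shrink roughly linearly in ε.

```python
import math

def phi(x):
    # Hypothetical scalar field on R^2.
    return math.sin(x[0]) * x[1] ** 2

def V(x):
    # Hypothetical vector field V^mu on R^2.
    return [x[1], -x[0]]

def lie_scalar_limit(f, vec, x, eps):
    """Difference quotient [f(y(x)) - f'(y(x))]/eps with y = x + eps*V and f'(y(x)) = f(x)."""
    y = [xi + eps * vi for xi, vi in zip(x, vec(x))]
    return (f(y) - f(x)) / eps

def lie_scalar_formula(x):
    """V^mu d_mu phi, written out by hand for the phi and V above."""
    return x[1] * math.cos(x[0]) * x[1] ** 2 - x[0] * 2 * x[1] * math.sin(x[0])

x0 = [0.4, 1.1]
for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    # the difference shrinks roughly like O(eps)
    print(eps, lie_scalar_limit(phi, V, x0, eps) - lie_scalar_formula(x0))
```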
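We now follow the same procedure for a vector field W^μ. We will need the matrix (∂y^μ/∂x^ν) and its inverse for the above infinitesimal coordinate transformation. We have
$$ \frac{\partial y^\mu}{\partial x^\nu} = \delta^\mu_\nu + \epsilon\,\partial_\nu V^\mu , \qquad (6.10) $$
and
$$ \frac{\partial x^\mu}{\partial y^\nu} = \delta^\mu_\nu - \epsilon\,\partial_\nu V^\mu + O(\epsilon^2) . \qquad (6.11) $$
Thus we have
$$ W'^\mu(y(x)) = \frac{\partial y^\mu}{\partial x^\nu}\,W^\nu(x) = W^\mu(x) + \epsilon\,W^\nu(x)\,\partial_\nu V^\mu(x) , \qquad (6.12) $$
and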
1. The result looks non-covariant, i.e. non-tensorial. But as a difference of two vectors at the same point (recall the limit ε → 0) the result should again be a vector. This is indeed the case. One way to make this manifest is to rewrite (6.15) in terms of covariant derivatives, as
$$ L_V W^\mu = V^\nu\nabla_\nu W^\mu - W^\nu\nabla_\nu V^\mu = \nabla_V W^\mu - \nabla_W V^\mu . \qquad (6.16) $$
This shows that L_V W^μ is again a vector field. Note, however, that the Lie derivative, in contrast to the covariant derivative, is defined without reference to any metric.
2. There is an alternative, and perhaps more intuitive, derivation of the above ex-
pression (6.15) for the Lie derivative of a vector field along a vector field, which
makes both its tensorial character and its interpretation manifest (and which also
generalises to other tensor fields; in fact we had already applied it to the metric
in section 2.5 to deduce (2.62)).
Namely, let us assume that we are initially in a coordinate system {y^{μ'}} adapted to V in the sense that V = ∂/∂y^{a'} for some particular a', i.e. V^{μ'} = δ^{μ'}_{a'} (so that
[V, [W, X]]µ + [X, [V, W ]]µ + [W, [X, V ]]µ = 0 . (6.21)
This can also be rephrased as the statement that the Lie derivative is also a
derivation of the Lie bracket, i.e. that one has
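4. I want to reiterate at this point that it is extremely useful to think of vector fields as first order linear differential operators, via V^μ → V = V^μ∂_μ. In this picture, the Lie bracket [V, W] is simply the ordinary commutator of differential operators,
$$ \begin{aligned} [V, W] &= [V^\mu\partial_\mu, W^\nu\partial_\nu] \\ &= V^\mu(\partial_\mu W^\nu)\partial_\nu + V^\mu W^\nu\partial_\mu\partial_\nu - W^\nu(\partial_\nu V^\mu)\partial_\mu - W^\nu V^\mu\partial_\nu\partial_\mu \\ &= (V^\nu\partial_\nu W^\mu - W^\nu\partial_\nu V^\mu)\,\partial_\mu \\ &= (L_V W)^\mu\partial_\mu = [V, W]^\mu\partial_\mu . \end{aligned} \qquad (6.23) $$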
5. Having equipped the space of vector fields with a Lie algebra structure, in fact
with the structure of an infinite-dimensional Lie algebra, it is fair to ask ‘the Lie
algebra of what group?’. Well, we have seen above that we can think of vector
fields as infinitesimal generators of coordinate transformations. Hence, formally at
least, the Lie algebra of vector fields is the Lie algebra of the group of coordinate
transformations (passive point of view) or diffeomorphisms (active point of view).4
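Remarks 3 and 4 above can be checked numerically. The following sketch (hypothetical vector fields on ℝ², finite-difference derivatives with an illustrative step size) implements [V, W]^μ = V^ν∂_νW^μ − W^ν∂_νV^μ, verifies that it reproduces the commutator of the corresponding first-order differential operators as in (6.23), and evaluates the Jacobi identity (6.21) at a point.

```python
import math

H = 1e-4  # finite-difference step; an illustrative choice

# Hypothetical vector fields on R^2, given by their components A^mu(x).
def V(x):
    return [x[1], -x[0]]                # generator of rotations

def W(x):
    return [x[0], x[1]]                 # generator of dilatations

def X(x):
    return [math.sin(x[1]), x[0] ** 2]  # an arbitrary third field

def d(f, x, nu, h=H):
    """Central difference of a scalar function f along coordinate nu."""
    xp, xm = list(x), list(x)
    xp[nu] += h
    xm[nu] -= h
    return (f(xp) - f(xm)) / (2 * h)

def bracket(A, B):
    """The vector field [A, B]^mu = A^nu d_nu B^mu - B^nu d_nu A^mu."""
    def C(x):
        a, b, n = A(x), B(x), len(x)
        return [sum(a[nu] * d(lambda y: B(y)[mu], x, nu) - b[nu] * d(lambda y: A(y)[mu], x, nu)
                    for nu in range(n)) for mu in range(n)]
    return C

def as_operator(A, f):
    """The function A f = A^mu d_mu f, i.e. A viewed as a first-order differential operator."""
    return lambda x: sum(A(x)[mu] * d(f, x, mu) for mu in range(len(x)))

f = lambda x: math.exp(x[0]) * math.cos(x[1])   # hypothetical test function
p = [0.3, -0.8]

# (6.23): the operator commutator V X - X V acts on f like the vector field [V, X].
commutator = as_operator(V, as_operator(X, f))(p) - as_operator(X, as_operator(V, f))(p)
via_bracket = as_operator(bracket(V, X), f)(p)

# (6.21): the Jacobi identity, evaluated component-wise at p.
jacobi = [u + v + w for u, v, w in zip(bracket(V, bracket(W, X))(p),
                                        bracket(X, bracket(V, W))(p),
                                        bracket(W, bracket(X, V))(p))]
print(commutator, via_bracket)  # equal up to discretisation error
print(bracket(V, W)(p))          # rotations and dilatations commute: approximately [0, 0]
print(jacobi)                    # approximately [0, 0]
```

Note that the second-derivative terms V^μW^ν∂_μ∂_ν cancel in the operator commutator, which is why the result is again a first-order operator.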
To extend the definition of the Lie derivative to other tensors, we can proceed in one of two ways. We can either extend the above procedure to other tensor fields by defining
$$ L_V T^{\cdots}_{\cdots} := \lim_{\epsilon\to 0}\frac{T^{\cdots}_{\cdots}(y(x)) - T'^{\cdots}_{\cdots}(y(x))}{\epsilon} . \qquad (6.24) $$
Or we can extend it to other tensors by proceeding as in the case of the covariant
derivative, i.e. by demanding the Leibniz rule. In either case, the result can be rewritten
in manifestly tensorial form in terms of covariant derivatives.
The general result is that the Lie derivative of a (p, q)-tensor T is, like the covariant derivative, the sum of three kinds of terms: the directional covariant derivative of T along V, p terms with a minus sign, involving the covariant derivative of V contracted with each of the upper indices, and q terms with a plus sign, involving the covariant derivative of V contracted with each of the lower indices (note that the plus and minus signs are interchanged with respect to the covariant derivative). Thus, e.g., the Lie derivatives of a (0,2)- and a (1,2)-tensor are
$$ \begin{aligned} L_V T_{\nu\lambda} &= V^\rho\nabla_\rho T_{\nu\lambda} + T_{\rho\lambda}\nabla_\nu V^\rho + T_{\nu\rho}\nabla_\lambda V^\rho \\ L_V T^\mu{}_{\nu\lambda} &= V^\rho\nabla_\rho T^\mu{}_{\nu\lambda} - T^\rho{}_{\nu\lambda}\nabla_\rho V^\mu + T^\mu{}_{\rho\lambda}\nabla_\nu V^\rho + T^\mu{}_{\nu\rho}\nabla_\lambda V^\rho . \end{aligned} \qquad (6.26) $$
4
See e.g. H. Glöckner, Fundamental problems in the theory of infinite-dimensional Lie groups,
arXiv:math/0602078 [math.GR] for an introduction and a survey of the problems that arise when
dealing with or trying to define infinite-dimensional Lie groups.
Remarks:
1. While it is not obvious from the somewhat pedestrian definition of the Lie deriva-
tive that we have given here, the Lie derivative is an extremely natural operation
on tensors. In differential geometry textbooks (and mathematically more sophis-
ticated accounts of general relativity) it is defined as follows:
(c) Define the Lie derivative to be the infinitesimal generator of this action,
$$ L_V T := \frac{d}{dt}\,(\Phi_V^t)^* T\,\Big|_{t=0} . \qquad (6.28) $$
While this definition can be shown to be equivalent to the definition of the Lie
derivative given above in terms of coordinates, Taylor expansions etc., this defi-
nition is evidently more compact, more illuminating and somewhat more to the
point. In particular, it makes the tensorial nature of the Lie derivative manifest.
2. The fact that the Lie derivative provides a representation of the Lie algebra of
vector fields by first-order differential operators on the space of (p, q)-tensors is
expressed by the identity
[LV , LW ] = L[V,W ] . (6.29)
The above general formula for the Lie derivative of a tensor, as e.g. in (6.26), becomes particularly simple for the metric tensor g_{μν}. The first term is not there (because the metric is covariantly constant), so the Lie derivative is the sum of two terms (with plus signs) involving the covariant derivative of V,
$$ L_V g_{\mu\nu} = g_{\lambda\nu}\nabla_\mu V^\lambda + g_{\mu\lambda}\nabla_\nu V^\lambda . $$
Lowering the index of V with the metric, this can be written more succinctly as
$$ L_V g_{\mu\nu} = \nabla_\mu V_\nu + \nabla_\nu V_\mu . \qquad (6.31) $$
We are now ready to return to our discussion of isometries (symmetries of the metric). Evidently, an infinitesimal coordinate transformation is a symmetry of the metric if L_V g_{μν} = 0,
$$ V \text{ generates an isometry} \;\Leftrightarrow\; \nabla_\mu V_\nu + \nabla_\nu V_\mu = 0 . \qquad (6.32) $$
Vector fields V satisfying this equation are called Killing vectors, not because they kill the metric but after the 19th century mathematician W. Killing.
An alternative way of writing the Killing equation, which is not manifestly covariant but which makes it manifest that only derivatives of the metric in the V-direction (and thus only the corresponding Christoffel symbols) enter, is
$$ L_V g_{\mu\nu} = V^\lambda\partial_\lambda g_{\mu\nu} + \partial_\mu V^\lambda\, g_{\lambda\nu} + \partial_\nu V^\lambda\, g_{\mu\lambda} = 0 . \qquad (6.33) $$
This is precisely the condition (2.63) we had encountered first in our discussion of first
integrals of motion for the geodesic equation, and which we had already rewritten in
terms of covariant derivatives, as in (6.31) above, in (4.49).
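The non-covariant form (6.33) is also the most convenient one for a quick numerical check. The sketch below evaluates the left-hand side of (6.33) for the round two-sphere. Besides ∂_φ it uses V(1) = sinφ ∂_θ + cotθ cosφ ∂_φ and V(2) = cosφ ∂_θ − cotθ sinφ ∂_φ; these are one common choice for the other two rotation generators and are an assumption here, since sign and labelling conventions differ between references. With this choice the brackets close on the rotation algebra, [V(1), V(2)] = V(3), [V(2), V(3)] = V(1), [V(3), V(1)] = V(2).

```python
import math

H = 1e-5  # finite-difference step; an illustrative choice

def sphere_metric(x):
    """Unit two-sphere in coordinates x = (theta, phi): ds^2 = dtheta^2 + sin^2(theta) dphi^2."""
    return [[1.0, 0.0], [0.0, math.sin(x[0]) ** 2]]

def d(f, x, nu, h=H):
    """Central difference of a scalar function f along coordinate nu."""
    xp, xm = list(x), list(x)
    xp[nu] += h
    xm[nu] -= h
    return (f(xp) - f(xm)) / (2 * h)

def lie_metric(g, V, x):
    """(L_V g)_{mu nu} = V^l d_l g_{mu nu} + d_mu V^l g_{l nu} + d_nu V^l g_{mu l}, cf. (6.33)."""
    n, gx, v = len(x), g(x), V(x)
    dV = [[d(lambda y: V(y)[l], x, mu) for mu in range(n)] for l in range(n)]  # dV[l][mu] = d_mu V^l
    return [[sum(v[l] * d(lambda y: g(y)[mu][nu], x, l)
                 + dV[l][mu] * gx[l][nu] + dV[l][nu] * gx[mu][l] for l in range(n))
             for nu in range(n)] for mu in range(n)]

def bracket(A, B):
    """[A, B]^mu = A^nu d_nu B^mu - B^nu d_nu A^mu."""
    def C(x):
        a, b, n = A(x), B(x), len(x)
        return [sum(a[nu] * d(lambda y: B(y)[mu], x, nu) - b[nu] * d(lambda y: A(y)[mu], x, nu)
                    for nu in range(n)) for mu in range(n)]
    return C

# V3 = d_phi is the Killing vector checked in the text; V1 and V2 are an assumed standard choice.
V3 = lambda x: [0.0, 1.0]
V1 = lambda x: [math.sin(x[1]), math.cos(x[1]) / math.tan(x[0])]
V2 = lambda x: [math.cos(x[1]), -math.sin(x[1]) / math.tan(x[0])]
not_killing = lambda x: [1.0, 0.0]  # d_theta: moves points towards a pole, not an isometry

p = [0.9, 0.4]  # a point away from the coordinate singularities at the poles
for name, vec in [("V1", V1), ("V2", V2), ("V3", V3), ("d_theta", not_killing)]:
    print(name, lie_metric(sphere_metric, vec, p))
print(bracket(V1, V2)(p), V3(p))  # [V1, V2] = V3
```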
Since they are associated with symmetries of space time, and since symmetries are
always of fundamental importance in physics, Killing vectors will play an important
role in the following. Our most immediate concern (in section 7.1 and the subsequent
sections below) will be with the conserved quantities associated with Killing vectors.
Other aspects of Killing vectors and their interplay with the geometry of a space-time
will be discussed in sections 9.5 and 17. For now we just note the following simple facts
and examples:
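1. Note that by virtue of (6.29) Killing vectors form a Lie algebra, i.e. if V and W are Killing vectors, then [V, W] is also a Killing vector,
$$ L_V g_{\mu\nu} = L_W g_{\mu\nu} = 0 \;\Rightarrow\; L_{[V,W]}\, g_{\mu\nu} = [L_V, L_W]\, g_{\mu\nu} = 0 . $$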
2. The resulting algebra of Killing vectors is the Lie algebra of the isometry group of the metric. For example, the collection of all Killing vectors of the Minkowski metric generates the Lie algebra of the Poincaré group. Indeed, for Minkowski space-time in inertial (Cartesian) coordinates, i.e. with the constant standard metric η_{αβ}, the Killing condition simply becomes
$$ \partial_\alpha V_\beta + \partial_\beta V_\alpha = 0 , \qquad (6.36) $$
which is solved by
$$ V^\alpha = \omega^\alpha{}_\beta\, x^\beta + \epsilon^\alpha , \qquad (6.37) $$
where the ε^α are constant parameters and the constant matrices ω_{αβ} = η_{αγ}ω^γ_β satisfy ω_{αβ} = −ω_{βα}. These are precisely the infinitesimal Lorentz transformations and translations of the Poincaré algebra.
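3. Another simple example is provided by the two-sphere: as mentioned before, in some obvious sense the standard metric on the two-sphere is rotationally invariant. In particular, with our new terminology we would expect the vector field ∂_φ, i.e. the vector field with components V^φ = 1, V^θ = 0, to be Killing. Let us check this. With the metric dθ² + sin²θ dφ², the components of the corresponding covector V_μ, obtained by lowering the index of the vector field V^μ, are
$$ V_\theta = 0 , \qquad V_\phi = \sin^2\theta . \qquad (6.38) $$
With the only non-vanishing Christoffel symbols Γ^θ_{φφ} = −sinθ cosθ and Γ^φ_{θφ} = Γ^φ_{φθ} = cotθ, one finds
$$ \begin{aligned} \nabla_\theta V_\theta &= \partial_\theta V_\theta - \Gamma^\mu_{\theta\theta}V_\mu = -\Gamma^\phi_{\theta\theta}\sin^2\theta = 0 \\ \nabla_\theta V_\phi + \nabla_\phi V_\theta &= \partial_\theta V_\phi - \Gamma^\mu_{\theta\phi}V_\mu + \partial_\phi V_\theta - \Gamma^\mu_{\phi\theta}V_\mu = 2\sin\theta\cos\theta - 2\cot\theta\sin^2\theta = 0 \\ \nabla_\phi V_\phi &= \partial_\phi V_\phi - \Gamma^\mu_{\phi\phi}V_\mu = -\Gamma^\phi_{\phi\phi}\sin^2\theta = 0 . \end{aligned} \qquad (6.39) $$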
Alternatively, using the non-covariant form (6.33) of the Killing equation, one
finds, since V φ = 1, V θ = 0 are constant, that the Killing equation reduces to
∂φ gµν = 0 , (6.40)
which is obviously satisfied. This is clearly a simpler and more efficient argument.
By solving the Killing equations on S 2 , in addition to ∂φ ≡ V(3) one finds two
other linearly independent Killing vectors V(1) and V(2) , namely
Note that V(3) evidently relates these two other Killing vectors by
This is the Lie algebra of infinitesimal rotations, i.e. of the rotation group SO(3),
which is the isometry group of the standard metric on S 2 .
4. In general, if the components of the metric are all independent of a particular
coordinate, say y, then by the above argument V = ∂y is a Killing vector,
Such a coordinate system, in which one of the coordinate lines agrees with the
integral curves of the Killing vector, is said to be adapted to the Killing vector
(or isometry) in question. As we did in section 2.5, one can also take the above
equations as the starting point for what one means by a symmetry of the metric
(isometry) and then simply transform it to an arbitrary coordinate system by
requiring that it transforms as a (0, 2)-tensor. Then one arrives at the Killing
condition in the form (6.33).
$$ 0 = V^\mu(\nabla_\mu V_\nu + \nabla_\nu V_\mu) = V^\mu\nabla_\mu V_\nu + \frac{1}{2}\nabla_\nu(V^\mu V_\mu) . \qquad (6.46) $$
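The statement in item 2 that (6.37) solves (6.36) precisely when ω_{αβ} is antisymmetric can be tested with random parameters. The sketch below assumes the signature (−,+,+,+) and uses finite differences, which are exact for fields linear in x up to rounding. The choice ω_{αβ} = η_{αβ}, which gives the dilatation V^α = x^α + ε^α, fails the Killing condition with ∂_αV_β + ∂_βV_α = 2η_{αβ}, anticipating the homotheties of section 7.2.

```python
import random

# Minkowski metric in inertial coordinates, signature (-,+,+,+); it is its own inverse.
ETA = [[(-1.0 if a == 0 else 1.0) if a == b else 0.0 for b in range(4)] for a in range(4)]

def poincare_field(omega_low, eps):
    """Return x -> V_beta(x) (index lowered) for V^alpha = omega^alpha_beta x^beta + eps^alpha,
    where omega^alpha_beta = eta^{alpha gamma} omega_{gamma beta}."""
    def V_low(x):
        V_up = [sum(ETA[a][c] * omega_low[c][b] * x[b] for c in range(4) for b in range(4)) + eps[a]
                for a in range(4)]
        return [sum(ETA[b][a] * V_up[a] for a in range(4)) for b in range(4)]
    return V_low

def killing_defect(V_low, x, h=1e-3):
    """max over (alpha, beta) of |d_alpha V_beta + d_beta V_alpha|, cf. (6.36)."""
    def dV(alpha, beta):
        xp, xm = list(x), list(x)
        xp[alpha] += h
        xm[alpha] -= h
        return (V_low(xp)[beta] - V_low(xm)[beta]) / (2 * h)
    return max(abs(dV(a, b) + dV(b, a)) for a in range(4) for b in range(4))

random.seed(1)
A = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
omega_antisym = [[A[a][b] - A[b][a] for b in range(4)] for a in range(4)]  # omega_ab = -omega_ba
eps = [random.uniform(-1, 1) for _ in range(4)]
x = [random.uniform(-2, 2) for _ in range(4)]

print(killing_defect(poincare_field(omega_antisym, eps), x))  # ~0: Lorentz transformation + translation
print(killing_defect(poincare_field(ETA, eps), x))            # 2.0: the dilatation is not Killing
```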
7 Symmetries and Conserved Charges
We are used to the fact that symmetries lead to conserved quantities (Noether’s theo-
rem). For example, in classical mechanics, the angular momentum of a particle moving
in a rotationally symmetric gravitational field is conserved. In the present context, the
concept of ‘symmetries of a gravitational field’ is replaced by ‘symmetries of the met-
ric’, and we therefore expect conserved charges associated with the presence of Killing
vectors. Here are the two most important classes of examples of this phenomenon:
QK = Kµ ẋµ (7.1)
Note that this is precisely the conserved quantity QV (2.64) with V → K deduced
from Noether’s theorem and the variational principle for geodesics in section 2.5.
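To see the conservation of Q_K = K_μẋ^μ in action, the following sketch integrates the geodesic equations of the unit two-sphere, θ̈ = sinθ cosθ φ̇² and φ̈ = −2cotθ θ̇φ̇ (from Γ^θ_{φφ} = −sinθ cosθ and Γ^φ_{θφ} = cotθ), with a fourth-order Runge-Kutta step. It monitors Q for K = ∂_φ and for K = V(1) = sinφ ∂_θ + cotθ cosφ ∂_φ, one common form of a second rotation generator that is assumed here. The initial data and the step size are arbitrary illustrative choices; the drifts should be at the level of the integration error.

```python
import math

def geodesic_rhs(state):
    """Geodesic equations on the unit two-sphere, state = (theta, phi, theta_dot, phi_dot)."""
    th, ph, dth, dph = state
    return [dth, dph,
            math.sin(th) * math.cos(th) * dph ** 2,
            -2.0 * math.cos(th) / math.sin(th) * dth * dph]

def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = geodesic_rhs([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = geodesic_rhs([s + dt * k for s, k in zip(state, k3)])
    return [s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + e)
            for s, a, b, c, e in zip(state, k1, k2, k3, k4)]

def charges(state):
    """Q_K = K_mu xdot^mu for K = d_phi and for the assumed V1, plus g_{mu nu} xdot^mu xdot^nu."""
    th, ph, dth, dph = state
    q_phi = math.sin(th) ** 2 * dph   # K_phi = sin^2(theta), K_theta = 0
    q_v1 = math.sin(ph) * dth + math.sin(th) * math.cos(th) * math.cos(ph) * dph  # V1 lowered
    norm = dth ** 2 + math.sin(th) ** 2 * dph ** 2
    return q_phi, q_v1, norm

state = [1.0, 0.0, 0.3, 0.8]   # hypothetical initial data, away from the poles
initial = charges(state)
drift = [0.0, 0.0, 0.0]
for _ in range(2000):          # integrate up to tau = 2 with step 1e-3
    state = rk4_step(state, 1e-3)
    drift = [max(dq, abs(q - q0)) for dq, q, q0 in zip(drift, charges(state), initial)]
print(initial, drift)          # drifts are at the level of the RK4 truncation error
```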
7.2 Conformal Killing Vectors and Conserved Charges
Another situation of interest occurs when one has a theory invariant under Weyl rescal-
ings and thus a traceless energy-momentum tensor (section 5.5). In that case one can
associate conserved currents not only to Killing vector fields but also to conformal
Killing vectors C µ , satisfying
∇µ Cν + ∇ν Cµ = 2ω(x)gµν (7.5)
for some function ω(x). Such conformal Killing vectors generate coordinate transfor-
mations that leave the metric invariant up to an overall (Weyl) rescaling.
If the theory is invariant under such Weyl rescalings, then the energy-momentum tensor
is traceless and there should also be a corresponding conserved current. Indeed, we have
JCµ = T µν Cν (7.6)
There is also a counterpart of statement 1 (conserved charges for geodesics) in the case
of conformal Killing vectors, namely for null geodesics (this condition replacing the
assumption in statement 2’ that the energy-momentum tensor is traceless):
1’ Let C µ be a conformal Killing vector field, and let xµ (τ ) be a null geodesic. Then
the quantity
QC = Cµ ẋµ (7.8)
is constant along the geodesic. Indeed, repeating the calculation leading to state-
ment 1, for a null geodesic one has
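$$ \frac{d}{d\tau}Q_C = \frac{d}{d\tau}(C_\mu\dot x^\mu) = \frac{1}{2}(\nabla_\nu C_\mu + \nabla_\mu C_\nu)\,\dot x^\mu\dot x^\nu = \omega(x)\, g_{\mu\nu}\dot x^\mu\dot x^\nu = 0 . \qquad (7.9) $$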
We will make use of (7.9) in the discussion of the cosmological red-shift in section
18.7.
As an aside, note that if K µ is a true Killing vector for a metric g̃µν , say, then it is at
least a conformal Killing vector for any conformally rescaled metric
Indeed, writing the Killing equation in the non-covariant form (6.33) (in order to avoid
having to determine the covariant derivatives or Christoffel symbols of conformally
rescaled metrics)
K λ ∂λ g̃µν + ∂µ K λ g̃λν + ∂ν K λ g̃µλ = 0 (7.11)
Thus K µ will be a true Killing vector field for the rescaled metric if the conformal
factor α(x) is constant along the orbits (integral curves) of K µ , and will otherwise be
a conformal Killing vector field. Conformal Killing vector fields that do not arise from
true Killing vector fields in this way are called essential. In the Riemannian case it is
known that (under some technical assumptions) metrics admitting essential conformal
vector fields are conformal to the standard metric on the sphere or the Euclidean space.
In the pseudo-Riemannian (Lorentzian signature) case the situation turns out to be quite different (with an interesting connection with the plane wave metrics that are the subject of section 29).5
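The statement about conformally rescaled metrics can be checked with the round two-sphere. Assuming the rescaled metric has the form g_{μν} = α(x) g̃_{μν}, evaluating the left-hand side of (7.11) with g̃ replaced by g gives L_K g_{μν} = (K^λ∂_λ log α) g_{μν}, i.e. ω = ½K^λ∂_λ log α. The sketch below takes K = ∂_φ and two hypothetical conformal factors, one that varies along the orbits of K and one that does not.

```python
import math

H = 1e-5  # finite-difference step; an illustrative choice

def d(f, x, nu, h=H):
    """Central difference of a scalar function f along coordinate nu."""
    xp, xm = list(x), list(x)
    xp[nu] += h
    xm[nu] -= h
    return (f(xp) - f(xm)) / (2 * h)

def lie_metric(g, V, x):
    """(L_V g)_{mu nu} = V^l d_l g_{mu nu} + d_mu V^l g_{l nu} + d_nu V^l g_{mu l}, cf. (7.11)."""
    n, gx, v = len(x), g(x), V(x)
    dV = [[d(lambda y: V(y)[l], x, mu) for mu in range(n)] for l in range(n)]  # dV[l][mu] = d_mu V^l
    return [[sum(v[l] * d(lambda y: g(y)[mu][nu], x, l)
                 + dV[l][mu] * gx[l][nu] + dV[l][nu] * gx[mu][l] for l in range(n))
             for nu in range(n)] for mu in range(n)]

def round_sphere(x):
    """g~: the unit two-sphere, x = (theta, phi)."""
    return [[1.0, 0.0], [0.0, math.sin(x[0]) ** 2]]

def rescaled(alpha):
    """Assumed form of the rescaled metric: g_{mu nu}(x) = alpha(x) g~_{mu nu}(x)."""
    return lambda x: [[alpha(x) * c for c in row] for row in round_sphere(x)]

K = lambda x: [0.0, 1.0]   # K = d_phi, a true Killing vector of g~

# Hypothetical positive conformal factors.
alpha_phi = lambda x: 1.0 + 0.5 * math.sin(x[1]) * math.sin(x[0])   # varies along the orbits of K
alpha_theta = lambda x: 2.0 + math.cos(x[0])                        # constant along the orbits of K

p = [0.9, 0.4]
results = []
for alpha in (alpha_phi, alpha_theta):
    g = rescaled(alpha)
    two_omega = d(lambda y: math.log(alpha(y)), p, 1)   # K^l d_l log(alpha) for K = d_phi
    L = lie_metric(g, K, p)
    residual = max(abs(L[a][b] - two_omega * g(p)[a][b]) for a in range(2) for b in range(2))
    results.append((two_omega, residual))
print(results)  # residuals ~0; 2*omega is nonzero only for alpha_phi
```

In the first case K is only a conformal Killing vector of the rescaled metric; in the second it remains a true Killing vector, as stated above.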
Finally, let us consider the special case that the conformal factor ω(x) in (7.5) is constant,
ω(x) = ω0 ,
∇µ Cν + ∇ν Cµ = 2ω0 gµν (7.14)
In that case, the transformation generated by the conformal Killing vector is called a
homothety.
A trivial example of a homothety is the uniform dilatation (scaling) of the coordinates
in Minkowski space,
generated by
C = ξ a ∂a : LC ηab = ∂a Cb + ∂b Ca = 2ηab . (7.16)
5
See e.g. F. Belgun, A. Moroianu, L. Ornea, Essential points of conformal
vector fields, arXiv:1002.0482 [math.DG] and references therein, as well as W.
Kühnel, H. Rademacher, Essential conformal field in pseudo-Riemannian geometry,
http://www.math.uni-leipzig.de/~rademacher/Paper/j-math-pures.pdf, Conformal transfor-
mations of pseudo-Riemannian manifolds, http://www.math.uni-leipzig.de/~rademacher/esi.pdf.
Other examples of space-times admitting homotheties are the exact gravitational plane waves (to be discussed in detail much later, in section 29), for which the metrics take the form (29.19)
with Aab (u) an arbitrary function of u. These metrics have the homothety
for any choice of plane wave “profile” Aab (u), and this homothety is generated by
Whenever one has such a homothety, there is an explicitly τ -dependent conserved quan-
tity even for non-null geodesics:
is constant along the geodesic. Indeed, repeating the calculation leading to state-
ment 1’, and using the fact that gµν ẋµ ẋν is constant, one finds
$$ \frac{d}{d\tau}Q_C = \omega_0\, g_{\mu\nu}\dot x^\mu\dot x^\nu - \omega_0\, g_{\mu\nu}\dot x^\mu\dot x^\nu = 0 . \qquad (7.21) $$
Remarks:
1. Note that for a null geodesic (7.20) reduces to the conserved charge Cµ ẋµ (7.8) in
1’ above (which does not explicitly depend on τ ).
2. The existence of this constant of motion can also be understood from the Noether
theorem (applied now to transformations of the “fields” xα (τ ) and the “coordi-
nate” τ ). Indeed, when one has a homothety, one has
4. For example, for the homothety of Minkowski space-time given above, the con-
served charge is explicitly
For a free particle with ξ̈^a = 0 and η_{ab}ξ̇^aξ̇^b = −1 this is rather obviously conserved, since then
$$ \frac{d}{d\tau}Q_C = -1 + 1 = 0 . \qquad (7.25) $$
Parametrising ξ(τ) as
$$ \xi^a(\tau) = \xi_0^a + (p^a/m)\,\tau \qquad (7.26) $$
with the constant momenta satisfying the mass shell condition
$$ p_a p^a = -m^2 , \qquad (7.27) $$
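As a concrete check, the sketch below assumes that the charge (7.20) has the form Q_C = C_μẋ^μ − ω_0 τ g_{μν}ẋ^μẋ^ν, which is the expression whose τ-derivative is (7.21) and which reduces to C_μẋ^μ for null geodesics. For the Minkowski homothety C = ξ^a∂_a one has ω_0 = 1 by (7.16). Evaluating Q_C along the free worldline (7.26), with hypothetical values of m, p^a and ξ_0^a, it should stay equal to η_{ab}ξ_0^a p^b/m.

```python
import math

ETA = (-1.0, 1.0, 1.0, 1.0)  # diagonal of the Minkowski metric eta_ab, signature (-,+,+,+)

def eta(u, v):
    """eta_ab u^a v^b."""
    return sum(e * a * b for e, a, b in zip(ETA, u, v))

def homothety_charge(xi, xi_dot, tau, omega0=1.0):
    """Assumed form of the charge for C = xi^a d_a:
    Q_C = eta_ab xi^a xi_dot^b - omega0 * tau * eta_ab xi_dot^a xi_dot^b."""
    return eta(xi, xi_dot) - omega0 * tau * eta(xi_dot, xi_dot)

m = 2.0
p_spatial = (0.3, -1.2, 0.5)
p = (math.sqrt(m ** 2 + sum(c * c for c in p_spatial)),) + p_spatial  # mass shell (7.27)
xi0 = (0.1, 2.0, -1.0, 0.4)   # hypothetical initial position

def worldline(tau):
    """Free particle (7.26): xi^a(tau) = xi0^a + (p^a / m) tau, so xi_dot^a = p^a / m."""
    return tuple(x + (pa / m) * tau for x, pa in zip(xi0, p)), tuple(pa / m for pa in p)

charges = [homothety_charge(*worldline(tau), tau) for tau in (0.0, 0.5, 1.0, 5.0, 10.0)]
print(charges)          # all equal ...
print(eta(xi0, p) / m)  # ... to eta_ab xi0^a p^b / m
```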
When a metric possesses sufficiently many symmetries (Killing vectors), the geodesic
equations (or the associated Hamilton-Jacobi equation) or, say, the Klein-Gordon equa-
tion or some other field equation in that background are separable and can hence be
reduced to quadratures of ordinary differential equations. It is not uncommon, how-
ever, in particular in the context of black hole physics, to encounter space-times in
which these equations can be separated even though there appear not to be enough
isometries (symmetries of the metric) to explain this. In many cases, this phenomenon
can be explained via (or deduced from) the existence of additional (hidden) symmetries
of the problem, associated not to Killing vectors but to certain higher-rank generalisa-
tions thereof. Most prominent among them are (totally symmetric) Killing tensors and
(totally anti-symmetric) Killing-Yano tensors.
To set the stage, recall from above that a Killing vector satisfies
and that using the geodesic equation ẋα ∇α ẋβ = 0 this leads to a first integral QK =
Kβ ẋβ of the geodesic equations of motion via the simple chain of manipulations
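$$ \frac{d}{d\tau}(K_\beta\dot x^\beta) = \dot x^\alpha\nabla_\alpha(K_\beta\dot x^\beta) = \dot x^\alpha\dot x^\beta\nabla_\alpha K_\beta = 0 \qquad (7.30) $$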
by symmetry of ẋα ẋβ and anti-symmetry of ∇α Kβ .
This has the following two immediate (and, as it turns out, actually useful in practice)
generalisations:
1. Killing Tensors
Let K_{β1...βn} be a totally symmetric rank-n tensor satisfying the Killing tensor equation
∇(α Kβ1 ...βn ) = 0 . (7.31)
This is evidently one possible generalisation of the Killing vector equation (7.29)
to higher rank tensors (generalising the first formulation in (7.29)). Then
2. Killing-Yano Tensors
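Let K_{β1...βn} be a totally antisymmetric rank-n tensor satisfying the Killing-Yano equation
$$ \nabla_{(\alpha}K_{\beta_1)\beta_2\ldots\beta_n} = 0 \;\Leftrightarrow\; \nabla_\alpha K_{\beta_1\ldots\beta_n} = \nabla_{[\alpha}K_{\beta_1\ldots\beta_n]} \qquad (7.34) $$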
Remarks:
1. Trivial examples of Killing tensors are the metric gαβ (whose associated conserved
quantity gαβ ẋα ẋβ we already know), and products of Killing vectors Kα . . . Kβ
which do not yield any new independent constants of motion beyond those pro-
vided by the Killing vectors. New constants of motion are associated with Killing
tensors that cannot be constructed from the metric and the Killing vectors alone.
Trivial Killing-Yano tensors are Killing vectors Kα and the Levi-Civita tensor (in
four dimensions ǫαβγδ ).
2. There are interesting relations between Killing-Yano tensors and Killing tensors.
For example, it is not difficult to check that if Kαβ is a rank-2 Killing-Yano
tensor, then its square Kαγ K γβ (which is symmetric) is a rank-2 Killing tensor
(and squares of trivial Killing-Yano tensors give rise to trivial Killing tensors, as
in Kα → Kα Kβ ).
and just as the latter these turn out to be useful for massless particles or fields.
For example, a rank 2 conformal Killing tensor satisfies an equation of the form
for some (co-)vector field V. Repeating the calculation (7.33) in the case at hand for the quantity Q_C = C_{αβ}ẋ^αẋ^β, one finds
$$ \frac{d}{d\tau}Q_C = \nabla_{(\alpha}C_{\beta\gamma)}\,\dot x^\alpha\dot x^\beta\dot x^\gamma = g_{\alpha\beta}V_\gamma\,\dot x^\alpha\dot x^\beta\dot x^\gamma \qquad (7.39) $$
which evidently vanishes for null geodesics (g_{αβ}ẋ^αẋ^β = 0).
4. Historically, the discovery of (conformal) Killing and Killing-Yano tensors for the
Kerr metric, the metric describing a rotating black hole (see section 15.10) and
their relation to the separability of the geodesic and field equations in the Kerr
background played a decisive role in the development of the subject.6
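Remark 2 can be illustrated with a simple example in flat ℝ³ (Cartesian coordinates, so covariant derivatives reduce to partial derivatives). The sketch below assumes the rank-2 Killing-Yano tensor K_{bc} = ε_{bcd}x^d, checks the Killing-Yano equation and the Killing tensor equation for its square Q_{ab} = K_{ac}K^c{}_b = x_a x_b − |x|²δ_{ab}, and confirms that Q_{ab}ẋ^aẋ^b = (x·ẋ)² − |x|²|ẋ|², i.e. minus the squared angular momentum, is constant along straight lines. This Q is a trivial Killing tensor in the sense of remark 1, since it is built from the rotational Killing vectors, so it illustrates the mechanism rather than a hidden symmetry.

```python
import itertools

def levi_civita(i, j, k):
    """epsilon_{ijk} for i, j, k in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def K(x):
    """Rank-2 Killing-Yano tensor K_{bc} = epsilon_{bcd} x^d of flat R^3."""
    return [[sum(levi_civita(b, c, e) * x[e] for e in range(3)) for c in range(3)] for b in range(3)]

def Q(x):
    """Its square Q_{ab} = K_{ac} K^c_b (indices moved with delta_ab in Cartesian coordinates)."""
    k = K(x)
    return [[sum(k[a][c] * k[c][b] for c in range(3)) for b in range(3)] for a in range(3)]

def partial_tensor(T, x, a, h=1e-3):
    """d_a T_{ij} by central differences (exact up to rounding for T at most quadratic in x)."""
    xp, xm = list(x), list(x)
    xp[a] += h
    xm[a] -= h
    tp, tm = T(xp), T(xm)
    return [[(tp[i][j] - tm[i][j]) / (2 * h) for j in range(3)] for i in range(3)]

def killing_yano_defect(x):
    """max |d_a K_{bc} + d_b K_{ac}|: the Killing-Yano equation nabla_(a K_b)c = 0."""
    dK = [partial_tensor(K, x, a) for a in range(3)]   # dK[a][b][c] = d_a K_{bc}
    return max(abs(dK[a][b][c] + dK[b][a][c]) for a in range(3) for b in range(3) for c in range(3))

def killing_tensor_defect(x):
    """max |sum over permutations of d_a Q_{bc}|: the Killing tensor equation nabla_(a Q_bc) = 0."""
    dQ = [partial_tensor(Q, x, a) for a in range(3)]
    return max(abs(sum(dQ[i][j][k] for i, j, k in itertools.permutations((a, b, c))))
               for a in range(3) for b in range(3) for c in range(3))

def charge(x, v):
    """Q_ab v^a v^b, which equals (x.v)^2 - |x|^2 |v|^2."""
    q = Q(x)
    return sum(q[a][b] * v[a] * v[b] for a in range(3) for b in range(3))

x0, v = [0.5, -1.0, 2.0], [0.3, 0.7, -0.2]   # hypothetical straight-line motion x0 + v t
print(killing_yano_defect(x0), killing_tensor_defect(x0))                # both ~0
print([charge([a + t * b for a, b in zip(x0, v)], v) for t in (0.0, 1.0, 3.0)])  # constant
```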
6
For more information about and examples and applications of Killing(-Yano) tensors, see e.g. section
35.3 of H. Stephani, D. Kramer, M. MacCallum, C. Hoenslaers, E. Herlt, Exact Solutions to Einstein’s
Field Equations - Second Edition or the articles O. Santillan, Killing-Yano tensors and some applica-
tions, arXiv:1108.0149 [hep-th], F. Larsen, C. Keeler, Separability of Black Holes in String Theory,
arXiv:1207.5928 [hep-th] and the references therein. The interesting question if or when Killing-Yano
tensors form a Lie algebra, extending and generalising the Lie algebra of the isometry group generated
by the Killing vectors, is analysed in D. Kastor, S. Ray, J. Traschen, Do Killing-Yano tensors form a
Lie Algebra?, arXiv:0705.0535 [hep-th].
8 Curvature I: The Riemann Curvature Tensor
We now come to one of the most important concepts of General Relativity and Rie-
mannian Geometry, that of curvature and how to describe it in tensorial terms. Among
other things, this will finally allow us to decide unambiguously if a given metric is just
the (flat) Minkowski metric in disguise or the metric of a genuinely curved space. It
will also lead us fairly directly to the Einstein equations, i.e. to the field equations for
the gravitational field.
Recall that the equations that describe the behaviour of particles and fields in a gravi-
tational field involve the metric and the Christoffel symbols determined by the metric.
Thus the equations for the gravitational field should be generally covariant (tensorial)
differential equations for the metric.
At first sight, however, we seem to face a dilemma here. How can we write down covariant differential equations for the metric when the covariant derivative of the metric is identically
zero? Having come to this point, Einstein himself reached an impasse and required the
help of his mathematician friend Marcel Grossmann whom he had asked to investigate
if there were any tensors that could be built from the second derivatives of the metric.
Grossmann soon found that this problem had indeed been addressed and solved in the
mathematics literature, in particular by Riemann (generalising work of Gauss on curved
surfaces), Ricci and Levi-Civita. It was shown by them that there are indeed non-trivial
tensors that can be constructed from (ordinary) derivatives of the metric. These can
then be used to write down covariant differential equations for the metric.
The most important among these are the Riemann curvature tensor and its various
contractions. In fact, it can be shown that any tensor that can be constructed from the metric and its first and second derivatives is an algebraic function of the metric and the Riemann tensor, and these tensors will therefore play a central role in all that follows.
Technically the most straightforward way of introducing the Riemann curvature tensor
is via the commutator of covariant derivatives. As this is not geometrically the most
intuitive way of introducing the concept of curvature, we will then, once we have defined it and studied its most important algebraic properties, study to what extent the curvature tensor reflects the geometric properties of space-time.
consequence that e.g. the commutator of covariant derivatives acting on a vector field
V µ does not involve any derivatives of V µ . In fact, I will first show, without actually
calculating the commutator, that
$$[\nabla_\mu, \nabla_\nu](\phi V^\lambda) = \phi\,[\nabla_\mu, \nabla_\nu]V^\lambda$$
for any scalar field φ. This implies that [∇µ, ∇ν]V λ cannot depend on derivatives of V
because if it did it would also have to depend on derivatives of φ. Hence, the commutator
can be expressed purely algebraically in terms of V . As the dependence on V is clearly
linear, there must therefore be an object Rλσµν such that
$$[\nabla_\mu, \nabla_\nu]V^\lambda = R^\lambda{}_{\sigma\mu\nu}V^\sigma . \tag{8.2}$$
This can of course also be verified by a direct calculation, and we will come back to this
below. For now let us just note that, since the left-hand side of this equation is clearly
a tensor for any V, the quotient theorem implies that Rλσµν has to be a tensor. It is the
famous Riemann-Christoffel Curvature Tensor.
Thus, upon taking the commutator the second and third terms drop out and we are left
with
By explicitly calculating the commutator, one can confirm the structure displayed in
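(8.2). This explicit calculation shows that the Riemann tensor (as we will call it for short) is given by
$$R^\lambda{}_{\sigma\mu\nu} = \partial_\mu\Gamma^\lambda{}_{\sigma\nu} - \partial_\nu\Gamma^\lambda{}_{\sigma\mu} + \Gamma^\lambda{}_{\mu\rho}\Gamma^\rho{}_{\nu\sigma} - \Gamma^\lambda{}_{\nu\rho}\Gamma^\rho{}_{\mu\sigma} . \tag{8.5}$$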
Remarks:
1. Note how useful the quotient theorem is in this case. It would be quite unpleasant
to have to verify the tensorial nature of this expression by explicitly checking its
behaviour under coordinate transformations.
2. Note also that this tensor is clearly zero for the Minkowski metric written in
Cartesian coordinates. Hence it is also zero for the Minkowski metric written in
any other coordinate system. We will prove the converse, that vanishing of the
Riemann curvature tensor implies that the metric is equivalent to the Minkowski
metric, below.
3. In the above we have defined the Riemann tensor by the relation (8.2) and then
deduced the explicit expression (8.5). While this is, pragmatically speaking, a
useful way of proceeding, it may be more logical to initially define the Riemann
tensor in a different way, e.g. directly by (8.5). In that case, (8.2) is a result rather
than a definition, known as the Ricci identity.
We will see later that the Riemann tensor is antisymmetric in its first two indices. Hence
we can also write
[∇µ , ∇ν ]Vρ = −Rσρµν Vσ . (8.7)
The extension to arbitrary (p, q)-tensors now follows the usual pattern, with one Rie-
mann curvature tensor, contracted as for vectors, appearing for each of the p upper
indices, and one Riemann curvature tensor, contracted as for covectors, for each of the
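q lower indices. Thus, e.g. for a (2, 0)-tensor T αβ one has
$$[\nabla_\mu, \nabla_\nu]T^{\alpha\beta} = R^\alpha{}_{\sigma\mu\nu}T^{\sigma\beta} + R^\beta{}_{\sigma\mu\nu}T^{\alpha\sigma} . \tag{8.8}$$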
I will give two other versions of the fundamental formula (8.2) which are occasionally useful.
Note that, in this sense, the curvature measures the failure of the covariant deriva-
tive to provide a representation of the Lie algebra of vector fields.
2. Secondly, one can consider a net of curves xµ (s1 , s2 ) parametrising, say, a two-
dimensional surface, and look at the commutators of the covariant derivatives
along the s1 - and s2 -curves. The formula one obtains in this case (it can be
obtained from (8.10) by noting that X and Y commute in this case) is
$$(\nabla_{s_1}\nabla_{s_2} - \nabla_{s_2}\nabla_{s_1})V^\lambda = R^\lambda{}_{\sigma\mu\nu}\frac{dx^\mu}{ds_1}\frac{dx^\nu}{ds_2}V^\sigma . \tag{8.11}$$
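In general, it is difficult to read off all the symmetries from the formula (8.5). One way to simplify things is to look at the Riemann curvature tensor at the origin x0 of a Riemann normal coordinate system (or some other inertial coordinate system). In that case, all the first derivatives of the metric, and hence the Christoffel symbols, vanish at x0, so only the first two terms of (8.5) contribute. One finds
$$R_{\lambda\sigma\mu\nu}(x_0) = \tfrac{1}{2}\left(\partial_\mu\partial_\sigma g_{\lambda\nu} - \partial_\mu\partial_\lambda g_{\sigma\nu} - \partial_\nu\partial_\sigma g_{\lambda\mu} + \partial_\nu\partial_\lambda g_{\sigma\mu}\right)(x_0) .$$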
In principle, this expression is sufficiently simple to allow one to read off all the symme-
tries of the Riemann tensor. However, it is more insightful to derive these symmetries
in a different way, one which will also make clear why the Riemann tensor has these
symmetries.
This is a consequence of the fact that the metric is covariantly constant. In fact,
we can calculate
0 = [∇γ , ∇δ ]gαβ
= Rαλ γδ gλβ + Rβλ γδ gαλ
= (Rαβγδ + Rβαγδ ) . (8.15)
3. Cyclic permutation symmetry (or first Bianchi identity)
This Bianchi identity is a consequence of the fact that there is no torsion. In fact,
applying [∇γ , ∇δ ] to the covector ∇β φ, φ a scalar, one has
As this has to be true for all scalars φ, this implies Rα[βγδ] = 0 (to see this
you could e.g. choose the (locally defined) coordinate functions φ(x) = xµ with
∇λ φ = δµλ ).
This identity, stating that the Riemann tensor is symmetric in its two pairs of
indices, is not an independent symmetry but can be deduced from the three other
symmetries by some not particularly interesting algebraic manipulations.
We can now count how many independent components the Riemann tensor really has. (1) implies that the second pair of indices can only take N = (4 × 3)/2 = 6 independent values. (2) implies the same for the first pair of indices. (4) thus says that the Riemann curvature tensor behaves like a symmetric (6 × 6) matrix and therefore has (6 × 7)/2 = 21 independent components. We now come to the remaining condition (3): if two of the indices in (3) are equal, (3) is automatically satisfied as a consequence of (1), (2) and (4), which we have already taken into account. With all four indices distinct, (3) provides exactly one additional constraint. We conclude that the total number of independent components is 21 − 1 = 20.
Remarks:
1. Note that this agrees precisely with our previous counting in section 2.9 of how
many of the second derivatives of the metric cannot be set to zero by a coordinate
transformation: the second derivative of the metric has 100 independent compo-
nents, to be compared with the 4 × (4 × 5 × 6)/(2 × 3) = 80 components of the
third derivatives of the coordinates. This also leaves 20 components. We thus see
very explicitly that the Riemann curvature tensor contains all the coordinate in-
dependent information about the geometry up to second derivatives of the metric.
In fact, it can be shown that in a Riemann normal coordinate system one has
2. Just for the record, I note here that in general dimension D = d + 1 the Riemann
tensor has D²(D² − 1)/12 independent components. This number arises as
$$\frac{D^2(D^2-1)}{12} = \frac{N(N+1)}{2} - \binom{D}{4}, \qquad N = \frac{D(D-1)}{2} \tag{8.20}$$
and describes (as above) the number of independent components of a symmetric (N × N)-matrix, now subject to $\binom{D}{4}$ conditions which arise from all the possibilities of choosing 4 out of D possible distinct values for the indices in (3). Just as for D = 4, this number of components of the Riemann tensor coincides with the number of second derivatives of the metric minus the number of independent components of the third derivatives of the coordinates determined in (2.133),
$$\left(\frac{D(D+1)}{2}\right)^2 - D\,\frac{D(D+1)(D+2)}{6} = \frac{D^2(D^2-1)}{12} .$$
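The component counts above can be cross-checked mechanically. The short Python sketch below is an illustration added here, not part of the notes, and its function names are made up: it evaluates the closed form D²(D² − 1)/12, the symmetric-pair count of (8.20), and the number of second derivatives of the metric minus the number of third derivatives of the coordinates, for several dimensions.

```python
from math import comb


def riemann_closed_form(D):
    """Closed form D^2 (D^2 - 1) / 12 for the number of independent components."""
    return D * D * (D * D - 1) // 12


def riemann_from_pairs(D):
    """Symmetric (N x N) matrix of antisymmetric index pairs, minus C(D, 4) cyclic constraints."""
    N = D * (D - 1) // 2
    return N * (N + 1) // 2 - comb(D, 4)


def riemann_from_derivatives(D):
    """Second derivatives of g_{mu nu} minus third derivatives of the coordinate functions."""
    metric_components = D * (D + 1) // 2
    second_derivs_of_metric = metric_components ** 2
    # D coordinate functions, each with a totally symmetric set of 3 derivative indices
    third_derivs_of_coords = D * comb(D + 2, 3)
    return second_derivs_of_metric - third_derivs_of_coords


for D in range(2, 7):
    print(D, riemann_closed_form(D), riemann_from_pairs(D), riemann_from_derivatives(D))
```

For D = 4 all three functions return 20 (with 100 − 80 for the last one), and for D = 2 and D = 3 they return 1 and 6 respectively.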
Finally, a word of warning: there are a large number of sign conventions involved
in the definition of the Riemann tensor (and its contractions we will discuss below),
so whenever reading a book or article, in particular when you want to use results or
equations presented there, make sure you know which conventions are being used and either adopt
those or translate the results into some other convention. As a check: the conventions
used here are such that Rφθφθ as well as the curvature scalar (to be introduced below)
are positive for the standard metric on the two-sphere.
The Riemann tensor, as we have seen, is a four-index tensor. For many purposes this
is not the most useful object. But we can create new tensors by contractions of the
Riemann tensor. Due to the symmetries of the Riemann tensor, there is essentially only
one possibility, namely the Ricci tensor
$$R_{\mu\nu} = R^\lambda{}_{\mu\lambda\nu} .$$
It arises naturally from the definition (8.2) of the Riemann tensor in terms of commu-
tators of covariant derivatives, when one considers a contracted commutator,
It follows from the symmetries of the Riemann tensor that Rµν is symmetric. Indeed
Thus, for D = 4, the Ricci tensor has 10 independent components, for D = 3 it has 6,
while for D = 2 there is only 1 because there is only one independent component of the
Riemann curvature tensor to start off with.
Contracting (8.8), one consequence of the symmetry of the Ricci tensor is the useful
general result
[∇µ , ∇ν ]T µν = Rµν (T µν − T νµ ) = 0 (8.25)
for any tensor T µν . If T µν = Aµν is antisymmetric, Aµν = −Aνµ , it is not necessary to
take the commutator, so one also has
$$\nabla_\mu\nabla_\nu A^{\mu\nu} = \tfrac{1}{2}[\nabla_\mu, \nabla_\nu]A^{\mu\nu} = 0 .$$
Note that this can also be deduced (without knowing anything about curvature in
general or the Ricci tensor in particular) from the general expression (4.47) for the
divergence of an anti-symmetric tensor,
There is one more contraction of the Riemann tensor we can perform, namely on the
Ricci tensor itself, to obtain what is called the Ricci scalar or curvature scalar
$$R = g^{\mu\nu}R_{\mu\nu} .$$
Remarks:
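1. One might have thought that in four dimensions there is another way of constructing a scalar, by contracting the Riemann tensor with the Levi-Civita tensor, but this vanishes identically,
$$\varepsilon^{\mu\nu\rho\sigma}R_{\mu\nu\rho\sigma} = 0 , \tag{8.29}$$
since the totally antisymmetric ε picks out precisely the combination R_µ[νρσ], which is zero by the cyclic symmetry (3).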
2. Note that for D = 2 the Riemann curvature tensor has as many independent
components as the Ricci scalar, namely one, and that in three dimensions the Ricci
tensor has as many components as the Riemann tensor, whereas in four dimensions
there are strictly fewer components of the Ricci tensor than of the Riemann tensor.
This has profound implications for the dynamics of gravity in these dimensions.
In fact, we will see that it is only in dimensions D > 3 that gravity becomes truly
dynamical, where empty space can be curved, where gravitational waves can exist
etc.
3. There are other scalars that can be built from the curvature tensor, but these are necessarily of higher order in the curvature tensor, such as (trivially) R² or (somewhat less trivially) Rµν R^µν or the square of the Riemann tensor, the so-called Kretschmann scalar
$$K = R_{\mu\nu\rho\sigma}R^{\mu\nu\rho\sigma} . \tag{8.30}$$
Analogously, scalars can be built from higher powers of the Riemann tensor and/or from powers of covariant derivatives of the Riemann tensor (□R being the simplest example).
4. Such scalars are useful in analysing a given metric because, being scalars, they are invariant under coordinate transformations. Thus they directly provide coordinate-invariant information about a metric. For instance, if K is singular at some point in some coordinate system, then it will be singular at that point in all coordinate systems, and thus such a singularity is not an artefact of a bad choice of coordinate system but a property of the space(-time) itself described by that
metric. A prominent example is the singularity at the origin r = 0 of the Schwarz-
schild metric, unambiguously unveiled by the singularity of its Kretschmann scalar
(15.99).
To see how calculations of the curvature tensor can be done in practice, let us work out
the example of the two-sphere of unit radius, i.e. with line element
$$ds^2 = d\theta^2 + \sin^2\theta\, d\phi^2 .$$
The components of the Christoffel symbols and curvature tensor associated with the
metric γab will be denoted by γ abc and r abcd respectively.
We already know that the non-zero Christoffel symbols necessarily have two φ-indices
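and one θ-index (from γφφ = sin²θ), and are given by
$$\gamma^\theta{}_{\phi\phi} = -\sin\theta\cos\theta, \qquad \gamma^\phi{}_{\theta\phi} = \gamma^\phi{}_{\phi\theta} = \cot\theta .$$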
We also know that the Riemann curvature tensor has only one independent component.
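Let us therefore work out r^θ_φθφ. From the definition we find
$$r^\theta{}_{\phi\theta\phi} = \partial_\theta\gamma^\theta{}_{\phi\phi} - \partial_\phi\gamma^\theta{}_{\phi\theta} + \gamma^\theta{}_{\theta\lambda}\gamma^\lambda{}_{\phi\phi} - \gamma^\theta{}_{\phi\lambda}\gamma^\lambda{}_{\theta\phi} .$$
The second and third terms are manifestly zero, and we are left with
$$r^\theta{}_{\phi\theta\phi} = -\partial_\theta(\sin\theta\cos\theta) - \gamma^\theta{}_{\phi\phi}\gamma^\phi{}_{\theta\phi} = (\sin^2\theta - \cos^2\theta) + \cos^2\theta = \sin^2\theta .$$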
Thus we have
$$r_{\theta\theta} = 1, \qquad r_{\theta\phi} = 0, \qquad r_{\phi\phi} = \sin^2\theta . \tag{8.36}$$
showing that the standard metric on the two-sphere is what we will later call an Einstein
metric. The Ricci scalar is
$$r = g^{\theta\theta}r_{\theta\theta} + g^{\phi\phi}r_{\phi\phi} = 1 + \frac{1}{\sin^2\theta}\sin^2\theta = 2 . \tag{8.38}$$
In particular, we have here our first concrete example of a space with non-trivial, in fact
positive, curvature.
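The computation above can be checked numerically. The following sketch is an addition, not the author's method: it assumes a diagonal metric supplied as a Python function returning the diagonal entries, approximates all derivatives by central finite differences, and uses the convention of (8.89) specialised to the symmetric Christoffel symbols, together with R_bd = R^a_bad. With these choices it should reproduce r^θ_φθφ = sin²θ and the value r = 2 of (8.38). All function names are hypothetical.

```python
import math


def christoffel(metric, x, h=1e-5):
    """Gamma^a_bc at x for a DIAGONAL metric; metric(x) returns the list [g_00, g_11, ...]."""
    n = len(x)
    g = metric(x)
    dg = []  # dg[c][a] = d_c g_aa by central differences
    for c in range(n):
        xp, xm = list(x), list(x)
        xp[c] += h
        xm[c] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([(gp[a] - gm[a]) / (2 * h) for a in range(n)])
    gam = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for a in range(n):
        for b in range(n):
            for c in range(n):
                # Gamma^a_bc = (1/2) g^aa (d_b g_ac + d_c g_ab - d_a g_bc) for diagonal g
                s = 0.0
                if a == c:
                    s += dg[b][a]
                if a == b:
                    s += dg[c][a]
                if b == c:
                    s -= dg[a][b]
                gam[a][b][c] = 0.5 * s / g[a]
    return gam


def riemann(metric, x, h=1e-4):
    """R^a_bcd = d_c Gamma^a_db - d_d Gamma^a_cb + Gamma^a_ce Gamma^e_db - Gamma^a_de Gamma^e_cb."""
    n = len(x)
    rng = range(n)
    gam = christoffel(metric, x)
    dgam = []  # dgam[c][a][b][d] = d_c Gamma^a_bd by central differences
    for c in rng:
        xp, xm = list(x), list(x)
        xp[c] += h
        xm[c] -= h
        gp, gm = christoffel(metric, xp), christoffel(metric, xm)
        dgam.append([[[(gp[a][b][d] - gm[a][b][d]) / (2 * h) for d in rng]
                      for b in rng] for a in rng])
    return [[[[dgam[c][a][d][b] - dgam[d][a][c][b]
               + sum(gam[a][c][e] * gam[e][d][b] - gam[a][d][e] * gam[e][c][b] for e in rng)
               for d in rng] for c in rng] for b in rng] for a in rng]


def ricci_scalar(metric, x):
    """R = g^bd R_bd with R_bd = R^a_bad (diagonal metric, so only b = d contributes)."""
    riem, g = riemann(metric, x), metric(x)
    n = len(x)
    return sum(sum(riem[a][b][a][b] for a in range(n)) / g[b] for b in range(n))


def unit_sphere(x):
    theta = x[0]  # coordinates x = (theta, phi)
    return [1.0, math.sin(theta) ** 2]


point = [1.0, 0.0]
riem = riemann(unit_sphere, point)
print(riem[0][1][0][1], math.sin(point[0]) ** 2)  # r^theta_{phi theta phi} versus sin^2(theta)
print(ricci_scalar(unit_sphere, point))           # should be close to 2, cf. (8.38)
```

Because everything is done by finite differences, the agreement with (8.36) and (8.38) holds only up to small discretisation errors; the same functions can be applied to any other diagonal metric, such as those in the variations that follow.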
We will see later on, in section 17, that this form of the curvature tensor, or its equivalent,
We now turn to some variations of the above theme (and some other generalisations are
discussed in section 8.7 below).
1. First of all, let us address the question what is the curvature (scalar) of a sphere
of radius ρ, i.e. of the space with line element
• The first is to simply and blindly redo the above calculations in this case and
to see what one gets.
• Alternatively, and somewhat more insightfully, rather than redoing the cal-
culation in that case one can argue as follows. Let us observe first of all that
the Christoffel symbols are invariant under constant rescalings of the metric
because they are schematically of the form g −1 ∂g. Therefore the Riemann
curvature tensor, which only involves derivatives and products of Christoffel
symbols, is also invariant. Hence the Ricci tensor, which is just a contraction
of the Riemann tensor, is also invariant:
$$R_{ab}(\rho^2\gamma) = r_{ab}(\gamma) .$$
2. Now let us promote ρ to a new radial coordinate r and ask the question what is
the curvature tensor of the 3-dimensional space with coordinates (r, xa ) = (r, θ, φ)
and line element
$$ds^2 = dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\phi^2) . \tag{8.45}$$
On the one hand, because one seems to have just added a trivial r-direction to the
2-sphere, one might be tempted to suspect that also this 3-dimensional space has
non-trivial curvature. On the other hand, we recognise the above metric as the
Euclidean metric on R3 , written in spherical coordinates, and as such we expect
its curvature (in fact, all components of the Riemann tensor) to be zero.
The latter expectation is of course borne out, but it is instructive to see explicitly
how this cancellation occurs. In fact, it will be even more instructive to consider
an apparently harmless and innocuous modification of the above metric which
consists in replacing dr² by some constant multiple of dr²,
$$ds^2 = p\, dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\phi^2) . \tag{8.46}$$
Equivalently, up to a truly harmless overall constant factor, we can think of this as the Euclidean metric, but with the metric on the unit sphere replaced by that of a sphere of radius 1/√p ≠ 1,
$$ds^2 = p\left[dr^2 + \frac{r^2}{p}\left(d\theta^2 + \sin^2\theta\, d\phi^2\right)\right] . \tag{8.47}$$
γab denoting again the components of the metric on the unit sphere. From these
we can deduce the non-trivial Christoffel symbols
From this, in turn, one finds that all the components of the Riemann tensor
involving at least one r-index are zero, whereas for the purely angular components
one finds
Rabcd = r abcd + Γacr Γrbd − Γadr Γrbc . (8.50)
Using (8.39) and (8.49), one sees that
Therefore precisely for p = 1 the two contributions to the curvature tensor indeed
cancel and the curvature tensor is identically zero, as expected.
Equally interesting is the fact that for p ≠ 1 not only is the curvature non-zero,
but the space actually has a curvature singularity at r = 0. Namely, it follows
from the above result that the only non-vanishing components of the Ricci tensor
of this 3-dimensional space are
Thus for p ≠ 1 the Ricci scalar diverges as r → 0. Since the Ricci scalar is a scalar
(under coordinate transformations), this divergence cannot be an artefact of a bad
choice of coordinates, and indicates that there is a genuine geometric singularity
for r → 0.
3. As a final variation of this theme, we consider the above example in one dimension
less, i.e. we look at the metric one obtains if one replaces the Euclidean metric on
R² written in polar coordinates by
$$ds^2 = p\, dr^2 + r^2 d\phi^2 , \tag{8.54}$$
where the angle φ has period 2π.
In this case there is an interesting twist (pun intended) and the situation is some-
what different. Pulling out the factor of p, one sees that (up to this irrelevant
overall constant factor) the metric can be written as
$$ds^2 = dr^2 + r^2\, d(\phi/\sqrt{p})^2 . \tag{8.55}$$
This would be the standard Euclidean metric on R² either for p = 1 or if the angle φ
had periodicity 2π√p. But since φ has period 2π, this results in a misidentification
of the points in a plane, like when one rolls up a flat piece of paper into a cone.
Away from r = 0, this space is intrinsically flat (all the components of the Riemann
curvature tensor are zero, as one can easily calculate - see section 9.1 for an
explanation of this use of the word “intrinsic”), but there is a conical singularity
at the tip of the cone r = 0 (which can be thought of as a δ-function contribution
to the curvature localised at r = 0).
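For reference, the three variations can be summarised in formulas. This worked summary is an addition, not quoted from the notes. It assumes that the sphere of radius ρ has g_ab = ρ²γ_ab, that the line elements of examples 2 and 3 are ds² = p dr² + r²γ_ab dx^a dx^b and ds² = p dr² + r² dφ² (the forms implied by (8.47) and (8.55)), and it uses (8.50) with the unit-sphere curvature r^a_bcd = δ^a_c γ_bd − δ^a_d γ_bc, which reproduces r_θφθφ = sin²θ.

```latex
% 1. Sphere of radius rho: Christoffel symbols and R_{ab} unchanged, inverse metric scales as rho^{-2}
g_{ab} = \rho^2\gamma_{ab}: \qquad R_{ab} = r_{ab} = \gamma_{ab}, \qquad
R = g^{ab}R_{ab} = \rho^{-2}\gamma^{ab}\gamma_{ab} = \frac{2}{\rho^2}

% 2. ds^2 = p\,dr^2 + r^2\gamma_{ab}dx^a dx^b
\Gamma^r{}_{ab} = -\frac{r}{p}\gamma_{ab}, \qquad \Gamma^a{}_{rb} = \frac{1}{r}\delta^a{}_b
\;\Rightarrow\;
R^a{}_{bcd} = \Big(1 - \frac{1}{p}\Big)\big(\delta^a{}_c\gamma_{bd} - \delta^a{}_d\gamma_{bc}\big),
\qquad R_{ab} = \Big(1 - \frac{1}{p}\Big)\gamma_{ab}, \qquad R = \frac{2}{r^2}\Big(1 - \frac{1}{p}\Big)

% 3. ds^2 = p\,dr^2 + r^2 d\phi^2, away from r = 0
\Gamma^r{}_{\phi\phi} = -\frac{r}{p}, \qquad \Gamma^\phi{}_{r\phi} = \frac{1}{r}
\;\Rightarrow\;
R^r{}_{\phi r\phi} = \partial_r\Gamma^r{}_{\phi\phi} - \Gamma^r{}_{\phi\phi}\Gamma^\phi{}_{r\phi}
= -\frac{1}{p} + \frac{1}{p} = 0,
\qquad \text{angle deficit at } r = 0:\ 2\pi\Big(1 - \frac{1}{\sqrt{p}}\Big)
```

In case 2 the Ricci scalar blows up like 1/r² as r → 0 unless p = 1. In case 3 the curvature vanishes identically away from the tip, and the whole effect sits in the angle deficit at r = 0 (an excess if p < 1).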
We can generalise the previous example, the curvature of the 2-sphere, somewhat, in this
way connecting our considerations with the classical realm of the differential geometry
of surfaces, in particular with the Gauss Curvature and with the Liouville Equation.
For any 2-dimensional metric γαβ it is a simple exercise to derive the relation between the
one independent component, say r1212 , of the Riemann tensor, and the scalar curvature.
First of all, the Ricci tensor is
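Using the fact that in 2 dimensions the components of the inverse metric are explicitly given by
$$\gamma^{\alpha\beta} = \frac{1}{\gamma_{11}\gamma_{22} - \gamma_{12}\gamma_{21}}\begin{pmatrix}\gamma_{22} & -\gamma_{12}\\ -\gamma_{21} & \gamma_{11}\end{pmatrix} \tag{8.58}$$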
and the (anti-)symmetry properties (1) and (2) of the Riemann tensor, one finds
$$r = \frac{2}{\gamma_{11}\gamma_{22} - \gamma_{12}\gamma_{21}}\, r_{1212} . \tag{8.59}$$
The factor of 2 in this equation is a consequence of our (and the conventional) definition
of the Riemann curvature tensor, and is responsible for the fact that the scalar curvature
of the unit 2-sphere is r = +2. In two dimensions, it is more convenient and natural to
absorb this factor of 2 into the definition of the (scalar) curvature, and what one then
gets is the classical Gauss Curvature
$$K := \frac{1}{2}\, r \tag{8.60}$$
of a two-dimensional surface.
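As a quick consistency check of (8.59) and (8.60) against (8.36) and (8.38), take the unit two-sphere with (x¹, x²) = (θ, φ). For a sphere of radius ρ, r_1212 picks up one factor ρ² and the determinant picks up ρ⁴. This is a worked illustration added here.

```latex
r_{1212} = r_{\theta\phi\theta\phi} = \sin^2\theta, \qquad
\gamma_{11}\gamma_{22} - \gamma_{12}\gamma_{21} = \sin^2\theta
\;\Rightarrow\; r = \frac{2\sin^2\theta}{\sin^2\theta} = 2, \qquad K = \tfrac{1}{2}r = 1

g_{ab} = \rho^2\gamma_{ab}: \qquad r_{1212} \to \rho^2\sin^2\theta, \qquad \det g = \rho^4\sin^2\theta
\;\Rightarrow\; K = \frac{1}{\rho^2}
```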
In the same way one can show that the Ricci tensor is related to the Ricci scalar by
$$r_{\alpha\beta} = \tfrac{1}{2}\, r\, \gamma_{\alpha\beta} = K\gamma_{\alpha\beta} .$$
This generalises the result for the standard metric on the 2-sphere found by explicit
calculation in section 8.5. It shows that in complete generality the Ricci tensor of
a two-dimensional space or space-time has only one (double) eigenvalue, namely the
Gauss curvature K.
When we specialise the above to the class of conformally flat metrics with line element
$$ds^2 = e^{2h(x,y)}\left(dx^2 + dy^2\right),$$
we find, in particular,
$$K = -e^{-2h}\Delta h \tag{8.64}$$
where Δ = ∂x² + ∂y² is the 2-dimensional Laplacian with respect to the flat Euclidean metric dx² + dy². Thus a surface with constant curvature K = k is given by a solution to the non-linear differential equation
$$\Delta h + k e^{2h} = 0 . \tag{8.65}$$
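Formula (8.64) and the Liouville equation (8.65) are easy to check numerically for explicit conformal factors. The sketch below is an illustration added here, with hypothetical helper names and a five-point finite-difference Laplacian. It evaluates K = −e^{−2h}Δh for two examples. The first is h = −ln y, the upper half-plane discussed below. The second is h = ln 2 − ln(1 + x² + y²), the round unit sphere in stereographic coordinates, ds² = 4(dx² + dy²)/(1 + x² + y²)², which is a standard fact not stated in this part of the notes.

```python
import math


def laplacian(f, x, y, step=1e-3):
    """Five-point finite-difference approximation of (d_x^2 + d_y^2) f at (x, y)."""
    return (f(x + step, y) + f(x - step, y) + f(x, y + step) + f(x, y - step)
            - 4.0 * f(x, y)) / step ** 2


def gauss_curvature(h, x, y):
    """K = -exp(-2h) * Laplacian(h) for the metric exp(2h)(dx^2 + dy^2), cf. (8.64)."""
    return -math.exp(-2.0 * h(x, y)) * laplacian(h, x, y)


def liouville_residual(h, k, x, y):
    """Left-hand side of (8.65); it vanishes for a metric of constant curvature k."""
    return laplacian(h, x, y) + k * math.exp(2.0 * h(x, y))


def h_half_plane(x, y):
    return -math.log(y)  # e^{2h} = 1 / y^2, defined for y > 0


def h_stereographic_sphere(x, y):
    return math.log(2.0) - math.log(1.0 + x * x + y * y)  # e^{2h} = 4 / (1 + x^2 + y^2)^2


for point in [(0.0, 1.5), (0.3, 0.8), (-1.2, 2.0)]:
    print(point, gauss_curvature(h_half_plane, *point),
          gauss_curvature(h_stereographic_sphere, *point))
```

The printed curvatures should be close to −1 and +1 at every sample point, i.e. both conformal factors solve (8.65), with k = −1 and k = +1 respectively.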
This is the (in-)famous Liouville equation, which plays a fundamental role in many
branches of mathematics (and mathematical physics). In terms of the intrinsic Laplacian
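Δγ associated to the metric γαβ, the Gaussian curvature and the Liouville equation can also simply be written as
$$K = -\Delta_\gamma h, \qquad \Delta_\gamma h + k = 0 , \tag{8.66}$$
since, due to the peculiarities of 2 dimensions, √γ γ^αβ is independent of h, i.e. is conformally invariant (as we already observed in a different context in section 5.5, cf. (5.44)),
$$\sqrt{\gamma}\,\gamma^{\alpha\beta} = e^{2h}e^{-2h}\delta^{\alpha\beta} = \delta^{\alpha\beta} \quad\Rightarrow\quad \Delta_\gamma = \frac{1}{\sqrt{\gamma}}\partial_\alpha\left(\sqrt{\gamma}\,\gamma^{\alpha\beta}\partial_\beta\right) = \frac{1}{\sqrt{\gamma}}\partial_\alpha\left(\delta^{\alpha\beta}\partial_\beta\right) = e^{-2h}\Delta . \tag{8.67}$$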
I will not attempt to say anything about the general (local) solution of this equation
(which roughly speaking depends on an arbitrary meromorphic function of the com-
plex coordinate z = x + iy), but close this section with some special (and particularly
prominent) solutions of this equation.
1. It is easy to see that
$$e^{2h(x,y)} = y^{-2} \quad\Leftrightarrow\quad h(x,y) = -\ln y \tag{8.68}$$
solves the Liouville equation with k = −1. The corresponding space of constant negative curvature is the Poincaré upper-half plane model of the hyperbolic geometry,
$$ds^2 = \frac{dx^2 + dy^2}{y^2} \qquad \left((x,y) \in \mathbb{R}^2,\ y > 0\right) . \tag{8.69}$$
By the coordinate transformation y = e^z this is mapped to the equivalent metric
$$ds^2 = dz^2 + e^{-2z}dx^2 \tag{8.70}$$
on the entire (x, z)-plane.
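The step from (8.69) to (8.70), and the value k = −1, can be spelled out in two lines. This is a worked check added here, using (8.64) with h = −ln y.

```latex
y = e^{z}: \quad dy = e^{z}dz \;\Rightarrow\; \frac{dx^2 + dy^2}{y^2} = e^{-2z}dx^2 + dz^2

h = -\ln y: \quad \Delta h = \partial_y^2(-\ln y) = \frac{1}{y^2},
\qquad K = -e^{-2h}\Delta h = -y^2\cdot\frac{1}{y^2} = -1
```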
It is worth remarking that the Poincaré upper-half plane model of a space with constant
negative curvature readily generalises to arbitrary dimension and signature. Thus
$$ds^2 = \frac{d\vec{x}^2 + dy^2}{y^2}, \qquad d\vec{x}^2 = \delta_{ab}dx^a dx^b \;\text{ or }\; d\vec{x}^2 = \eta_{ab}dx^a dx^b \tag{8.75}$$
is the metric of a D = (d+1)-dimensional space(-time) with constant negative curvature.
The Lorentzian metric will reappear later (sections 18.5 and 19) as a solution to the
Einstein equations with a negative cosmological constant, and is in this context known
as anti-de Sitter metric (in Poincaré coordinates, which cover only a part of the complete
space-time).
8.8 Bianchi Identities
So far, we have discussed algebraic properties of the Riemann tensor. But the Riemann
tensor also satisfies some differential identities which, in particular in their contracted
form, will be of fundamental importance in the following.
The first identity is easy to derive. As a (differential) operator the covariant derivative
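clearly satisfies the Jacobi identity
$$[\nabla_\mu, [\nabla_\nu, \nabla_\lambda]] + [\nabla_\nu, [\nabla_\lambda, \nabla_\mu]] + [\nabla_\lambda, [\nabla_\mu, \nabla_\nu]] = 0 .$$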
If you do not believe this, just write out the twelve relevant terms explicitly to see that
this identity is true:
Hence, recalling the definition of the curvature tensor in terms of commutators of co-
variant derivatives, we obtain
Because of the antisymmetry of the Riemann tensor in the first two indices, this can
also be written more explicitly as
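To also turn the last term into a Ricci tensor, we contract once more, with gβλ, to obtain the contracted Bianchi identity
$$\nabla^\mu R_{\mu\nu} = \tfrac{1}{2}\nabla_\nu R ,$$
or
$$\nabla^\mu\left(R_{\mu\nu} - \tfrac{1}{2}g_{\mu\nu}R\right) = 0 . \tag{8.82}$$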
The tensor appearing in this equation is the so-called Einstein tensor Gµν,
$$G_{\mu\nu} = R_{\mu\nu} - \tfrac{1}{2}g_{\mu\nu}R .$$
In four dimensions it is, up to normalisation, the unique divergence-free symmetric tensor that can be built from the metric and its first and second derivatives (apart from gµν itself, of course), and this is why it will play the central role in the Einstein equations for the gravitational field.
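For reference, the chain of contractions leading to (8.82) can be written out as follows. This is a worked sketch added here. It assumes that the Bianchi identity derived above takes the standard form ∇λRαβµν + ∇µRαβνλ + ∇νRαβλµ = 0, and it uses R_µν = R^λ_µλν, the contraction that reproduces the two-sphere result (8.36).

```latex
% second Bianchi identity (assumed form), cyclic in lambda, mu, nu
\nabla_\lambda R_{\alpha\beta\mu\nu} + \nabla_\mu R_{\alpha\beta\nu\lambda} + \nabla_\nu R_{\alpha\beta\lambda\mu} = 0

% contract with g^{\alpha\mu}, using R^\mu{}_{\beta\mu\nu} = R_{\beta\nu} and R^\mu{}_{\beta\lambda\mu} = -R_{\beta\lambda}
\nabla_\lambda R_{\beta\nu} + \nabla_\mu R^\mu{}_{\beta\nu\lambda} - \nabla_\nu R_{\beta\lambda} = 0

% contract with g^{\beta\lambda}, using g^{\beta\lambda}R^\mu{}_{\beta\nu\lambda} = R^\mu{}_\nu
\nabla^\beta R_{\beta\nu} + \nabla_\mu R^\mu{}_\nu - \nabla_\nu R = 0
\quad\Longleftrightarrow\quad
\nabla^\mu R_{\mu\nu} = \tfrac{1}{2}\nabla_\nu R
\quad\Longleftrightarrow\quad
\nabla^\mu\big(R_{\mu\nu} - \tfrac{1}{2}g_{\mu\nu}R\big) = 0
```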
8.9 Another Look at the Principle of General Covariance
In the section on the principle of minimal coupling, I mentioned that this algorithm
or the principle of general covariance do not necessarily fix the equations uniquely. In
other words, there could be more than one generally covariant equation which reduces
to a given equation in Minkowski space. Having the curvature tensor at our disposal
now, we can construct examples of this kind.
As a first example, consider a massive particle with spin, characterised by a spin tensor
S µν. We could imagine the possibility that in a gravitational field there is a coupling
between the spin and the curvature, so that the particle does not follow a geodesic, but
rather obeys an equation of the (Mathisson-Papapetrou) type
This equation is clearly tensorial (generally covariant) and reduces to the equation for a
straight line in Minkowski space, but differs from the geodesic equation (which has the
same properties) for a ≠ 0. But, since the Riemann tensor is second order in derivatives,
a has to be a dimensionful quantity (of length dimension 1) for this equation to make
sense. Thus the rationale for usually not considering such additional terms is that they
are irrelevant at scales large compared to some characteristic size of the particle, say its
Compton wave length.
We will mostly be dealing with weak gravitational fields and other low-energy phenom-
ena and under those circumstances the minimal coupling rule can be trusted. However,
it is not ruled out that under extreme conditions (very strong or strongly fluctuating
gravitational fields) such terms are actually present and relevant.
For another example, consider the wave equation for a (massless, say) scalar field Φ. In
Minkowski space, this is the Klein-Gordon equation which has the obvious curved space
analogue (4.39)
$$\Box\Phi = 0 \tag{8.85}$$
obtained by the minimal coupling description. However, one could equally well postulate
the equation
$$(\Box + aR)\Phi = 0 , \tag{8.86}$$
8.10 Generalisations
We have defined the Riemann tensor via the commutator of covariant derivatives (8.2)
In order to show explicitly (rather than by appealing to (8.87)) that this transforms as
a tensor, all that one needs is the characteristic non-tensorial transformation behaviour
of the Christoffel symbols Γλµν . But, as discussed in sections 4.8 and 4.9, an arbitrary
connection Γ̃λµν that can be used to define a tensorial covariant derivative has the same
non-tensorial transformation behaviour. Therefore
R̃λσµν ≡ Rλσµν (Γ̃) = ∂µ Γ̃λσν − ∂ν Γ̃λσµ + Γ̃λµρ Γ̃ρνσ − Γ̃λνρ Γ̃ρµσ (8.89)
defines a tensor for any connection, namely the curvature tensor of the connection Γ̃λµν .
It is related to the commutator of covariant derivatives by
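$$[\tilde\nabla_\mu, \tilde\nabla_\nu]V^\lambda = \tilde R^\lambda{}_{\sigma\mu\nu}V^\sigma + (\tilde\Gamma^\rho{}_{\mu\nu} - \tilde\Gamma^\rho{}_{\nu\mu})\nabla_\rho V^\lambda = \tilde R^\lambda{}_{\sigma\mu\nu}V^\sigma + T^\rho{}_{\mu\nu}\nabla_\rho V^\lambda , \tag{8.90}$$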
where T ρµν is the torsion tensor. As before, one can also define the Ricci tensor and
Ricci scalar by
However, it is crucial to keep in mind that the symmetry properties and Bianchi iden-
tities satisfied by these generalised curvature tensors will in general differ from those of
the Riemann-Christoffel tensor. This should be clear from the way we derived the sym-
metries of the Riemann tensor in section 8.3, where we related the symmetries to the
properties (metricity, no torsion) that characterise the canonical Levi-Civita connection
(Christoffel symbols). For example, in general the Ricci tensor will not be symmetric,
the Bianchi identity Rα[βγδ] = 0 will be replaced by an identity relating R̃α[βγδ] to the
torsion (and its covariant derivative), etc.⁷
⁷ For more on this and related topics, see e.g. section 1 of T. Ortin, Gravity and Strings.
9 Curvature II: Geometry and Curvature
In this section, we will discuss three properties of the Riemann curvature tensor that
illustrate its geometric significance and thus, a posteriori, justify equating the commu-
tator of covariant derivatives with the intuitive concept of curvature. These properties
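are
• the path dependence of parallel transport, i.e. the fact that parallel transporting a vector around a closed loop in general changes it,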
• the fact that the space-time metric is equivalent to the (in an obvious sense flat)
Minkowski metric if and only if the Riemann curvature tensor vanishes, and
• the geodesic deviation equation describing the effect of curvature on the trajecto-
ries of families of freely falling particles.
The Riemann curvature tensor and its relatives, introduced above, measure the intrinsic
geometry and curvature of a space or space-time. This means that they can be calculated
by making experiments and measurements on the space itself. Such experiments might
involve things like checking if the interior angles of a triangle add up to π or not.
An even better method, the subject of this section, is to check the properties of parallel
transport. The tell-tale sign (or smoking gun) of the presence of curvature is the fact
that parallel transport is path dependent, i.e. that parallel transporting a vector V from
a point A to a point B along two different paths will in general produce two different
vectors at B. Another way of saying this is that parallel transporting a vector around a
closed loop at A will in general produce a new vector at A which differs from the initial
vector.
This is easy to see in the case of the two-sphere (see Figure 8). Since all the great
circles on a two-sphere are geodesics, in particular the segments N-C, N-E, and E-C in
the figure, we know that in order to parallel transport a vector along such a line we just
need to make sure that its length and the angle between the vector and the geodesic
line are constant. Thus imagine a vector 1 at the north pole N, pointing downwards
along the line N-C-S. First parallel transport this along N-C to the point C. There we
will obtain the vector 2, pointing downwards along C-S. Alternatively imagine parallel
transporting the vector 1 first to the point E. Since the vector has to remain at a
constant (right) angle to the line N-E, at the point E parallel transport will produce
the vector 3 pointing westwards along E-C. Now parallel transporting this vector along
E-C to C will produce the vector 4 at C. This vector clearly differs from the vector 2
that was obtained by parallel transporting along N-C instead of N-E-C.
[Figure 8: parallel transport on the two-sphere; the points N, E, C and the transported vectors 3 and 4.]
To illustrate the claim about closed loops above, imagine parallel transporting vector 1
along the closed loop N-E-C-N from N to N. In order to complete this loop, we still have
to parallel transport vector 4 back up to N. Clearly this will give a vector, not indicated
in the figure, different from (and pointing roughly at a right angle to) the vector 1 we
started off with.
This intrinsic geometry and curvature described above should be contrasted with the
extrinsic geometry which depends on how the space may be embedded in some larger
space.
For example, a cylinder can be obtained by ‘rolling up’ R2 . It clearly inherits the flat
metric from R2 and if you calculate its curvature tensor you will find that it is zero.
Thus, the intrinsic curvature of the cylinder is zero, and the fact that it looks curved to
an outside observer is not something that can be detected by somebody living on the
cylinder. For example, parallel transport is rather obviously path independent.
The precise statement regarding the relation between the path dependence of parallel
transport and the presence of curvature is the following. If one parallel transports a
covector Vµ (I use a covector instead of a vector only to save myself a few minus signs
here and there) along a closed infinitesimal loop xµ (τ ) with, say, x(τ0 ) = x(τ1 ) = x0 ,
then one has
$$V_\mu(\tau_1) - V_\mu(\tau_0) = \frac{1}{2}\left(\oint x^\rho dx^\nu\right)R^\sigma{}_{\mu\rho\nu}(x_0)\,V_\sigma(\tau_0) . \tag{9.1}$$
Thus an arbitrary vector V µ will not change under parallel transport around an arbitrary
small loop at x0 only if the curvature tensor at x0 is zero. This can of course be extended
to finite loops, but the important point is that in order to detect curvature at a given
point one only requires parallel transport along infinitesimal loops.
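A small numerical illustration of this statement, added here and not taken from the notes: on the unit two-sphere (curvature K = 1), the rotation angle produced by parallel transport around a closed curve equals the enclosed area. The sketch integrates the parallel transport equation dV^a/dφ + Γ^a_φb V^b = 0 around the latitude circle θ = θ0 with classical fourth-order Runge-Kutta steps. It then compares the rotation angle with the area 2π(1 − cos θ0) of the enclosed polar cap. The function name and the choice of loop are assumptions made for illustration.

```python
import math


def holonomy_angle(theta0, steps=4000):
    """Rotation angle of a vector parallel transported once around the latitude theta = theta0
    of the unit two-sphere, measured in the orthonormal frame (e_theta, e_phi / sin theta)."""
    s, c = math.sin(theta0), math.cos(theta0)

    def rhs(v):
        v_theta, v_phi = v
        # dV^a/dphi = -Gamma^a_{phi b} V^b, with Gamma^theta_{phi phi} = -s c and Gamma^phi_{phi theta} = c / s
        return (s * c * v_phi, -(c / s) * v_theta)

    v = (1.0, 0.0)  # start with the unit vector e_theta
    dphi = 2.0 * math.pi / steps
    for _ in range(steps):  # classical fourth-order Runge-Kutta in phi
        k1 = rhs(v)
        k2 = rhs((v[0] + 0.5 * dphi * k1[0], v[1] + 0.5 * dphi * k1[1]))
        k3 = rhs((v[0] + 0.5 * dphi * k2[0], v[1] + 0.5 * dphi * k2[1]))
        k4 = rhs((v[0] + dphi * k3[0], v[1] + dphi * k3[1]))
        v = (v[0] + dphi / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             v[1] + dphi / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    a, b = v[0], s * v[1]  # orthonormal components of the transported vector
    return math.atan2(b, a) % (2.0 * math.pi)


for theta0 in (0.1, 0.3, math.pi / 4):
    area = 2.0 * math.pi * (1.0 - math.cos(theta0))
    print(theta0, holonomy_angle(theta0), area)
```

For small θ0 the enclosed area is approximately πθ0², so the rotation is of second order in the size of the loop, in line with (9.1). If the triangle N-E-C of Figure 8 is taken to be an octant of the sphere (an assumption about the figure), its three right angles enclose an area π/2, and the same rule predicts a rotation by exactly a right angle.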
Before turning to a proof of this result, I just want to note that intuitively it can be
understood directly from the definition of the curvature tensor (8.2). Imagine that the
infinitesimal loop is actually a tiny parallelogram made up of the coordinate lines x1
and x2 . Parallel transport along x1 is governed by the equation ∇1 V µ = 0, that along
x2 by ∇2 V µ = 0. The fact that parallel transporting first along x1 and then along x2
can be different from doing it the other way around is precisely the statement that ∇1
and ∇2 do not commute, i.e. that some of the components Rµν12 of the curvature tensor
are non-zero.
For sufficiently small (infinitesimal) loops, we can expand the Christoffel symbols as
The linear term in the expansion of Vµ (τ ) arises from the zero’th order contribution
Γλµν (x0 ) in the first order (single integral) term in (9.4),
$$\left[V_\mu(\tau_1) - V_\mu(\tau_0)\right]^{(1)} = \Gamma^\lambda{}_{\mu\nu}(x_0)\,V_\lambda(\tau_0)\left(\int_{\tau_0}^{\tau_1} d\tau'\,\dot x^\nu(\tau')\right) . \tag{9.6}$$
Now the important observation is that, for a closed loop, the integral in brackets is zero,
$$\int_{\tau_0}^{\tau_1} d\tau'\,\dot x^\nu(\tau') = x^\nu(\tau_1) - x^\nu(\tau_0) = 0 . \tag{9.7}$$
Thus the change in Vµ (τ ), when transported along a small loop, is at least of second
order. Such second order terms arise in two different ways, from the first order term
in the expansion of Γλµν (x) in the first order term in (9.4), and from the zero’th order
terms Γλµν (x0 ) in the quadratic (double integral) term in (9.4),
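$$\begin{aligned}\left[V_\mu(\tau_1) - V_\mu(\tau_0)\right]^{(2)} ={}& (\partial_\rho\Gamma^\lambda{}_{\mu\nu})(x_0)\,V_\lambda(\tau_0)\left(\int_{\tau_0}^{\tau_1} d\tau'\,(x(\tau') - x_0)^\rho\,\dot x^\nu(\tau')\right)\\ &+ (\Gamma^\lambda{}_{\mu\nu}\Gamma^\sigma{}_{\lambda\rho})(x_0)\,V_\sigma(\tau_0)\int_{\tau_0}^{\tau_1} d\tau' \int_{\tau_0}^{\tau'} d\tau''\,\dot x^\nu(\tau')\,\dot x^\rho(\tau'')\end{aligned} \tag{9.8}$$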
The final observation we need is that the remaining integral is anti-symmetric in the
indices ν, ρ, which follows immediately from
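$$\int_{\tau_0}^{\tau_1} d\tau'\left(\dot x^\nu(\tau')\,x^\rho(\tau') + x^\nu(\tau')\,\dot x^\rho(\tau')\right) = \int_{\tau_0}^{\tau_1} d\tau'\,\frac{d}{d\tau'}\left(x^\nu(\tau')\,x^\rho(\tau')\right) = 0 . \tag{9.11}$$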
It now follows from (9.10) and the definition of the Riemann tensor that
$$V_\mu(\tau_1) - V_\mu(\tau_0) = \frac{1}{2}\left(\oint x^\rho dx^\nu\right)R^\sigma{}_{\mu\rho\nu}(x_0)\,V_\sigma(\tau_0) . \tag{9.12}$$
We are now finally in a position to prove the converse to the statement that the
Minkowski metric has vanishing Riemann tensor. Namely, we will see that when the
Riemann tensor of a metric vanishes, there are coordinates in which the metric is the
standard Minkowski metric. Since the opposite of curved is flat, this then allows one
to unambiguously refer to the Minkowski metric as the flat metric, and to Minkowski
space as flat space(-time).
So let us assume that we are given a metric with vanishing Riemann tensor. Then, by
the above, parallel transport is path independent and we can, in particular, extend a
vector V µ (x0 ) to a vector field everywhere in space-time: to define V µ (x1 ) we choose any
path from x0 to x1 and use parallel transport along that path. In particular, the vector
field V µ , defined in this way, will be covariantly constant or parallel, ∇µ V ν = 0. We can
also do this for four linearly independent vectors Vaµ at x0 and obtain four covariantly
constant (parallel) vector fields which are linearly independent at every point.
An alternative way of saying or seeing this is the following: the integrability condition for the equation $\nabla_\mu V^\lambda = 0$ is
$$[\nabla_\mu, \nabla_\nu]V^\lambda = R^\lambda{}_{\sigma\mu\nu}V^\sigma = 0 .$$
This means that the $(4\times 4)$ matrices $M(\mu,\nu)$ with coefficients $M(\mu,\nu)^\lambda{}_\sigma = R^\lambda{}_{\sigma\mu\nu}$ have a common zero eigenvalue, with $V^\sigma$ as a common eigenvector. If this integrability condition is satisfied, a solution to $\nabla_\mu V^\lambda = 0$ can be found. If one wants four linearly independent parallel vector fields, then the matrices $M(\mu,\nu)$ must have four linearly independent zero eigenvectors, i.e. they are zero and therefore $R^\lambda{}_{\sigma\mu\nu} = 0$. If this condition is satisfied, all the integrability conditions are satisfied and there will be four linearly independent covariantly constant vector fields, the same conclusion as above.
We will now use this result in the proof, but for covectors instead of vectors. Clearly
this makes no difference: if V µ is a parallel vector field, then gµν V ν is a parallel covector
field.
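Fix some point $x_0$. At $x_0$, there will be an invertible matrix $e^a{}_\mu$ such that
$$g_{\mu\nu}(x_0) = \eta_{ab}\,e^a{}_\mu e^b{}_\nu .$$
Now let $E^a{}_\mu$ be the solutions of
$$\nabla_\nu E^a{}_\mu = 0$$
with the initial condition $E^a{}_\mu(x_0) = e^a{}_\mu$. This gives rise to four linearly independent parallel covectors $E^a{}_\mu$. Since the connection is symmetric, $\partial_\nu E^a{}_\mu - \partial_\mu E^a{}_\nu = \nabla_\nu E^a{}_\mu - \nabla_\mu E^a{}_\nu = 0$, so that locally $E^a{}_\mu = \partial_\mu\xi^a$ for four functions $\xi^a$. These functions can serve as coordinates because $E^a{}_\mu$ is invertible. Moreover, $\eta_{ab}E^a{}_\mu E^b{}_\nu$ is covariantly constant, like $g_{\mu\nu}$, and agrees with $g_{\mu\nu}$ at $x_0$, hence everywhere.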
Summing this up, we have seen that, starting from the assumption that the Riemann curvature tensor of a metric $g_{\mu\nu}$ is zero, we have proven the existence of coordinates $\xi^a$ in which the metric takes the Minkowski form,
$$g_{\mu\nu} = \frac{\partial\xi^a}{\partial x^\mu}\frac{\partial\xi^b}{\partial x^\nu}\,\eta_{ab} . \qquad (9.20)$$
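Equation (9.20) can be checked concretely in a case where the flat coordinates $\xi^a$ are known explicitly. The following is a minimal sketch, assuming two-dimensional Minkowski space with $\eta_{ab} = \mathrm{diag}(-1, 1)$ and Rindler-type coordinates $x = (\lambda, \rho)$ defined by $\xi^0 = \rho\sinh\lambda$, $\xi^1 = \rho\cosh\lambda$. The map and the helper names are illustrative choices. Evaluating the right-hand side of (9.20) with finite-difference Jacobians should give $g = \mathrm{diag}(-\rho^2, 1)$. This metric does not look flat at first sight, yet it is related to $\eta_{ab}$ by (9.20).

```python
import math

ETA = ((-1.0, 0.0), (0.0, 1.0))  # eta_ab in 1+1 dimensions

def xi(x):
    """Illustrative map from Rindler-type coordinates x = (lam, rho) to Minkowski coordinates xi^a."""
    lam, rho = x
    return (rho * math.sinh(lam), rho * math.cosh(lam))

def jacobian(f, x, h=1e-6):
    """J[a][mu] = d xi^a / d x^mu by central differences."""
    n = len(x)
    J = [[0.0] * n for _ in range(len(f(x)))]
    for mu in range(n):
        xp, xm = list(x), list(x)
        xp[mu] += h
        xm[mu] -= h
        fp, fm = f(xp), f(xm)
        for a in range(len(fp)):
            J[a][mu] = (fp[a] - fm[a]) / (2.0 * h)
    return J

def pullback_metric(f, x, eta=ETA):
    """g_{mu nu} = (d xi^a / d x^mu)(d xi^b / d x^nu) eta_ab, the right-hand side of (9.20)."""
    J = jacobian(f, x)
    n = len(x)
    return [[sum(J[a][mu] * J[b][nu] * eta[a][b]
                 for a in range(len(eta)) for b in range(len(eta)))
             for nu in range(n)] for mu in range(n)]

g = pullback_metric(xi, (0.7, 2.0))
print(g)  # approximately diag(-rho^2, 1) = diag(-4, 1) at rho = 2
```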
In a certain sense the main effect of curvature (or gravity) is that initially parallel
trajectories of freely falling non-interacting particles (dust, pebbles,. . . ) do not remain
parallel, i.e. that gravity has the tendency to focus (or defocus) matter. This statement
finds its mathematically precise formulation in the geodesic deviation equation.
Let us, as we will need this later anyway, recall first the situation in the Newtonian theory. One particle moving under the influence of a gravitational field is governed by the equation
$$\frac{d^2}{dt^2}x^i = -\partial^i\phi(x) , \qquad (9.21)$$
where $\phi$ is the potential. Now consider a family of particles, or just two nearby particles, one at $x^i(t)$ and the other at $x^i(t) + \delta x^i(t)$. The other particle will of course obey the equation
$$\frac{d^2}{dt^2}(x^i + \delta x^i) = -\partial^i\phi(x + \delta x) . \qquad (9.22)$$
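From these two equations one can deduce an equation for $\delta x$ itself, namely, to first order in $\delta x$ (expanding $\partial^i\phi(x + \delta x) = \partial^i\phi(x) + \partial^i\partial_j\phi(x)\,\delta x^j + \ldots$),
$$\frac{d^2}{dt^2}\delta x^i = -\partial^i\partial_j\phi(x)\,\delta x^j . \qquad (9.23)$$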
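For a concrete instance of (9.23), take the potential $\phi = -G_N M/r$ of a point mass as an assumed example. The matrix $\partial^i\partial_j\phi$ then has eigenvalue $-2G_N M/r^3$ in the radial direction and $+G_N M/r^3$ in the two transverse directions. By (9.23), nearby particles are therefore pulled apart radially and squeezed together transversally. The trace $\Delta\phi$ vanishes away from the source. Below is a finite-difference sketch in units with $G_N M = 1$; the helper names are illustrative.

```python
import math

def phi(x, GM=1.0):
    """Newtonian potential of a point mass at the origin, phi = -G M / r."""
    return -GM / math.sqrt(sum(c * c for c in x))

def tidal_matrix(x, GM=1.0, h=1e-4):
    """K_ij = d_i d_j phi at x by central finite differences (Cartesian, so d^i = d_i)."""
    def shifted(i, di, j, dj):
        y = list(x)
        y[i] += di
        y[j] += dj
        return phi(y, GM)
    return [[(shifted(i, h, j, h) - shifted(i, h, j, -h)
              - shifted(i, -h, j, h) + shifted(i, -h, j, -h)) / (4.0 * h * h)
             for j in range(3)] for i in range(3)]

r = 2.0
K = tidal_matrix((r, 0.0, 0.0))
print("K_xx =", K[0][0], " expected -2GM/r^3 =", -2.0 / r**3)  # radial: d^2 dx/dt^2 = +2GM/r^3 dx
print("K_yy =", K[1][1], " expected  GM/r^3 =", 1.0 / r**3)    # transverse: squeezing
print("trace =", sum(K[i][i] for i in range(3)))               # Laplacian of phi, zero in vacuum
```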
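It is the counterpart of this equation that we will be seeking in the context of General Relativity. The starting point is of course the geodesic equation for $x^\mu$ and for its nearby partner $x^\mu + \delta x^\mu$,
$$\frac{d^2}{d\tau^2}x^\mu + \Gamma^\mu{}_{\nu\lambda}(x)\frac{d}{d\tau}x^\nu\frac{d}{d\tau}x^\lambda = 0 , \qquad (9.24)$$
and
$$\frac{d^2}{d\tau^2}(x^\mu + \delta x^\mu) + \Gamma^\mu{}_{\nu\lambda}(x + \delta x)\frac{d}{d\tau}(x^\nu + \delta x^\nu)\frac{d}{d\tau}(x^\lambda + \delta x^\lambda) = 0 . \qquad (9.25)$$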
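As above, from these one can deduce an equation for $\delta x$, namely
$$\frac{d^2}{d\tau^2}\delta x^\mu + 2\Gamma^\mu{}_{\nu\lambda}(x)\frac{d}{d\tau}x^\nu\frac{d}{d\tau}\delta x^\lambda + \partial_\rho\Gamma^\mu{}_{\nu\lambda}(x)\,\delta x^\rho\frac{d}{d\tau}x^\nu\frac{d}{d\tau}x^\lambda = 0 . \qquad (9.26)$$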
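The intermediate step can be made explicit. One subtracts (9.24) from (9.25) and keeps only the terms linear in $\delta x$. This uses $\Gamma^\mu{}_{\nu\lambda}(x + \delta x) = \Gamma^\mu{}_{\nu\lambda}(x) + \delta x^\rho\,\partial_\rho\Gamma^\mu{}_{\nu\lambda}(x) + O(\delta x^2)$ together with the symmetry of $\Gamma^\mu{}_{\nu\lambda}$ in its lower indices, which produces the factor 2. The result is:

```latex
\begin{aligned}
0 &= \frac{d^2}{d\tau^2}\delta x^\mu
   + \Gamma^\mu{}_{\nu\lambda}(x)\Big(\frac{dx^\nu}{d\tau}\frac{d\,\delta x^\lambda}{d\tau}
   + \frac{d\,\delta x^\nu}{d\tau}\frac{dx^\lambda}{d\tau}\Big)
   + \delta x^\rho\,\partial_\rho\Gamma^\mu{}_{\nu\lambda}(x)\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau}
   + O(\delta x^2) \\
  &= \frac{d^2}{d\tau^2}\delta x^\mu
   + 2\Gamma^\mu{}_{\nu\lambda}(x)\frac{dx^\nu}{d\tau}\frac{d\,\delta x^\lambda}{d\tau}
   + \partial_\rho\Gamma^\mu{}_{\nu\lambda}(x)\,\delta x^\rho\frac{dx^\nu}{d\tau}\frac{dx^\lambda}{d\tau}
   + O(\delta x^2) .
\end{aligned}
```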
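Now this does not look particularly covariant. We would therefore like to rewrite it in terms of the covariant derivative $\nabla_\tau$ along the curve instead of $d/d\tau$, with
$$\nabla_\tau\delta x^\mu = \frac{d}{d\tau}\delta x^\mu + \Gamma^\mu{}_{\nu\lambda}\frac{dx^\nu}{d\tau}\delta x^\lambda . \qquad (9.27)$$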
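Calculating $(\nabla_\tau)^2\delta x^\mu$, replacing $\ddot{x}^\mu$ appearing in that expression by $-\Gamma^\mu{}_{\nu\lambda}\dot{x}^\nu\dot{x}^\lambda$ (because $x^\mu$ satisfies the geodesic equation) and using (9.26), one finds the nice covariant geodesic deviation equation
$$(\nabla_\tau)^2\delta x^\mu = R^\mu{}_{\nu\lambda\rho}\dot{x}^\nu\dot{x}^\lambda\delta x^\rho . \qquad (9.28)$$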
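In flat space and Cartesian coordinates, where both the Christoffel symbols and the curvature tensor vanish, this reduces to
$$\frac{d^2}{d\tau^2}\delta x^\mu = 0 , \qquad (9.29)$$
which has the solution
$$\delta x^\mu = A^\mu\tau + B^\mu . \qquad (9.30)$$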
In particular, one recovers Euclid's parallel axiom that two straight lines intersect at
most once and that when they are initially parallel they never intersect. This shows
very clearly that intrinsic curvature leads to non-Euclidean geometry in which e.g. the
parallel axiom is not necessarily satisfied.
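The focusing described above can be seen in the simplest curved example. The following is a minimal sketch, assuming the standard reduction of (9.28) on a two-dimensional surface of Gaussian curvature $K$ to the Jacobi equation $d^2J/ds^2 = -KJ$. Here $J$ is the length of a deviation vector orthogonal to a unit-speed geodesic. Two meridians of the unit sphere that leave the equator parallel to each other ($J(0) = J_0$, $dJ/ds(0) = 0$) separate as $J_0\cos s$ and meet at the pole $s = \pi/2$. For $K = 0$, the separation instead stays constant, in line with (9.30) with $A^\mu = 0$. The function name is illustrative.

```python
import math

def jacobi_field(K, J0=1.0, dJ0=0.0, s_max=math.pi / 2, steps=10000):
    """Integrate J'' = -K J (geodesic deviation on a surface of constant Gaussian curvature K)
    with RK4 on the state (J, J'); return the list of (s, J) samples."""
    h = s_max / steps
    s, J, dJ = 0.0, J0, dJ0
    out = [(s, J)]
    for _ in range(steps):
        k1 = (dJ, -K * J)
        k2 = (dJ + 0.5 * h * k1[1], -K * (J + 0.5 * h * k1[0]))
        k3 = (dJ + 0.5 * h * k2[1], -K * (J + 0.5 * h * k2[0]))
        k4 = (dJ + h * k3[1], -K * (J + h * k3[0]))
        J += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        dJ += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
        s += h
        out.append((s, J))
    return out

sphere = jacobi_field(K=1.0)
flat = jacobi_field(K=0.0)
# Two meridians a small angle J0 = 1 apart are separated by J0 cos(s) at arc length s from the equator.
print("max |J - cos s| on the sphere:", max(abs(J - math.cos(s)) for s, J in sphere))
print("separation at s = pi/2: sphere", sphere[-1][1], " flat", flat[-1][1])
```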
It is also possible to give a manifestly covariant, and thus perhaps slightly more satisfactory, derivation of the above geodesic deviation equation. The starting point is a geodesic vector field $u^\mu$, $u^\nu\nabla_\nu u^\mu = 0$, and a deviation vector field $\delta x^\mu = \xi^\mu$ characterised by the condition
$$[u,\xi]^\mu = u^\nu\nabla_\nu\xi^\mu - \xi^\nu\nabla_\nu u^\mu = 0 . \qquad (9.31)$$
The rationale for this condition is that, if $x^\mu(\tau,s)$ is a family of geodesics labelled by $s$, one has the identifications
$$u^\mu = \frac{\partial}{\partial\tau}x^\mu(\tau,s) , \qquad \xi^\mu = \frac{\partial}{\partial s}x^\mu(\tau,s) . \qquad (9.32)$$
Since second partial derivatives commute, this implies the relation
$$\frac{\partial}{\partial\tau}\xi^\mu(\tau,s) = \frac{\partial}{\partial s}u^\mu(\tau,s) , \qquad (9.33)$$
(implicit in the identification $\delta\dot{x} = (d/d\tau)\delta x$ employed in the above derivation). The condition (9.31) is nothing other than the covariant way of writing (9.33).
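Using (9.31) and the Leibniz rule, one finds
$$\begin{aligned}
u^\lambda\nabla_\lambda(u^\nu\nabla_\nu\xi^\mu) &= u^\lambda\nabla_\lambda(\xi^\nu\nabla_\nu u^\mu) \\
&= (u^\lambda\nabla_\lambda\xi^\nu)\nabla_\nu u^\mu + u^\lambda\xi^\nu\nabla_\lambda\nabla_\nu u^\mu \\
&= (\xi^\lambda\nabla_\lambda u^\nu)\nabla_\nu u^\mu + u^\lambda\xi^\nu[\nabla_\lambda,\nabla_\nu]u^\mu + u^\lambda\xi^\nu\nabla_\nu\nabla_\lambda u^\mu .
\end{aligned} \qquad (9.35)$$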
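Writing the last term as $u^\lambda\xi^\nu\nabla_\nu\nabla_\lambda u^\mu = \xi^\nu\nabla_\nu(u^\lambda\nabla_\lambda u^\mu) - (\xi^\nu\nabla_\nu u^\lambda)\nabla_\lambda u^\mu$, one sees that the first term of the last line of (9.35) cancels and, using the definition of the curvature tensor, one is left with
$$u^\lambda\nabla_\lambda(u^\nu\nabla_\nu\xi^\mu) = R^\mu{}_{\sigma\lambda\nu}u^\sigma u^\lambda\xi^\nu + \xi^\nu\nabla_\nu(u^\lambda\nabla_\lambda u^\mu) .$$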
So far, we have only used the condition [u, ξ]µ = 0 (9.31). Thus, for a general family
of curves (the integral curves of uµ ), the deviation vector ξ µ feels a force which is due
to both the curvature of space-time and the acceleration aµ = uλ ∇λ uµ of the family of
curves. For geodesics, the latter is absent and one finds the geodesic deviation equation (9.28) in the form
$$u^\lambda\nabla_\lambda(u^\nu\nabla_\nu\xi^\mu) = R^\mu{}_{\sigma\lambda\nu}u^\sigma u^\lambda\xi^\nu . \qquad (9.38)$$
Manipulations similar to those leading from (9.35) to (9.38) allow one to derive an
equation for the rate of change of the divergence ∇µ uµ of a family of geodesics along
the geodesics. This simple result, known as the Raychaudhuri equation, has important
implications and ramifications in general relativity, in particular in the context of the so-
called singularity theorems of Penrose, Hawking and others, none of which will, however,
be explored here.
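To set the stage, let $V^\mu$ be an, at first, arbitrary vector field. From the definition of the curvature tensor,
$$(\nabla_\mu\nabla_\nu - \nabla_\nu\nabla_\mu)V^\lambda = R^\lambda{}_{\rho\mu\nu}V^\rho , \qquad (9.39)$$
so that, contracting $\lambda$ with $\mu$ and multiplying by $V^\nu$,
$$V^\nu\nabla_\mu\nabla_\nu V^\mu - V^\nu\nabla_\nu\nabla_\mu V^\mu = R_{\mu\nu}V^\mu V^\nu . \qquad (9.41)$$
Writing the first term as $\nabla_\mu(V^\nu\nabla_\nu V^\mu) - (\nabla_\mu V^\nu)(\nabla_\nu V^\mu)$ and specialising to a geodesic vector field $V^\mu = u^\mu$, $u^\nu\nabla_\nu u^\mu = 0$, this becomes
$$u^\nu\nabla_\nu(\nabla_\mu u^\mu) = -(\nabla_\mu u^\nu)(\nabla_\nu u^\mu) - R_{\mu\nu}u^\mu u^\nu .$$
Here the left-hand side is the rate of change of the divergence
$$\theta = \nabla_\mu u^\mu \qquad (9.45)$$
along $u^\mu$,
$$u^\nu\nabla_\nu(\nabla_\mu u^\mu) = \frac{d}{d\tau}\theta , \qquad (9.46)$$
and we can therefore write the above result as
$$\frac{d}{d\tau}\theta = -(\nabla_\mu u^\nu)(\nabla_\nu u^\mu) - R_{\mu\nu}u^\mu u^\nu . \qquad (9.47)$$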
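To gain some more insight into the geometric significance of this equation, consider the case that $u^\mu$ is timelike and normalised as $u^\mu u_\mu = -1$ (so that $\tau$ is proper time) and introduce the tensor
$$h_{\mu\nu} = g_{\mu\nu} + u_\mu u_\nu . \qquad (9.48)$$
It has the characteristic property that it is orthogonal to $u^\mu$,
$$h_{\mu\nu}u^\nu = u_\mu + u_\mu(u_\nu u^\nu) = 0 .$$
It can therefore be interpreted as the spatial projection of the metric in the directions orthogonal to the timelike vector field $u^\mu$. This can be seen more explicitly in terms of the projectors
$$p^\mu{}_\nu = \delta^\mu{}_\nu + u^\mu u_\nu ,$$
which satisfy
$$p^\mu{}_\nu p^\nu{}_\lambda = p^\mu{}_\lambda , \qquad (9.50)$$
$$p^\mu{}_\nu u^\nu = 0 , \qquad (9.51)$$
$$p^\mu{}_\nu\xi^\nu = \xi^\mu \quad \text{for any } \xi^\mu \text{ with } u_\mu\xi^\mu = 0 . \qquad (9.52)$$
Thus, acting on an arbitrary vector field $V^\mu$, $p^\mu{}_\nu V^\nu$ is the projection of this vector onto the subspace orthogonal to $u^\mu$. In the same way one can project an arbitrary tensor. For example, the projection of the metric is
$$p^\rho{}_\mu p^\sigma{}_\nu g_{\rho\sigma} = h_{\mu\nu} ,$$
as anticipated above. In particular, while for the space-time metric one obviously has $g_{\mu\nu}g^{\mu\nu} = 4$, the trace of $h_{\mu\nu}$ is
$$g^{\mu\nu}h_{\mu\nu} = 4 + u^\mu u_\mu = 3 .$$
Now consider the tensor
$$B_{\mu\nu} = \nabla_\nu u_\mu . \qquad (9.55)$$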
This tensor satisfies
$$u^\mu B_{\mu\nu} = \tfrac12\nabla_\nu(u^\mu u_\mu) = \tfrac12\nabla_\nu(-1) = 0 , \qquad (9.56)$$
$$B_{\mu\nu}u^\nu = u^\nu\nabla_\nu u_\mu = 0 , \qquad (9.57)$$
provided that $u^\mu$ is a geodesic vector field. Thus in that case $B_{\mu\nu}$ is automatically a spatial tensor in the sense above. When $u^\mu$ is not geodesic (or geodesic but not affinely parametrised), one can construct a spatial tensor by explicitly projecting $\nabla_\nu u_\mu$ onto its spatial components, i.e. by considering $p_\mu{}^\rho p_\nu{}^\sigma\nabla_\sigma u_\rho$.
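For the time being we will consider the simplest case where $u^\mu$ is both geodesic and normalised (and I will briefly come back to the more general case in the remarks below). We now decompose $B_{\mu\nu}$ into its anti-symmetric, symmetric-traceless and trace part,
$$B_{\mu\nu} = \omega_{\mu\nu} + \sigma_{\mu\nu} + \tfrac13\theta h_{\mu\nu} ,$$
with
$$\begin{aligned}
\omega_{\mu\nu} &= \tfrac12(B_{\mu\nu} - B_{\nu\mu}) \\
\sigma_{\mu\nu} &= \tfrac12(B_{\mu\nu} + B_{\nu\mu}) - \tfrac13\theta h_{\mu\nu} \\
\theta &= h^{\mu\nu}B_{\mu\nu} = g^{\mu\nu}B_{\mu\nu} = \nabla_\mu u^\mu .
\end{aligned} \qquad (9.60)$$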
The quantities $\omega_{\mu\nu}$, $\sigma_{\mu\nu}$ and $\theta$ are known as the rotation tensor, shear tensor, and expansion of the congruence (family) of geodesics defined by $u^\mu$. In particular, the geodesics will diverge (converge) if $\theta > 0$ ($\theta < 0$). In terms of these quantities we can write (9.47) as
$$\frac{d}{d\tau}\theta = -\tfrac13\theta^2 - \sigma^{\mu\nu}\sigma_{\mu\nu} + \omega^{\mu\nu}\omega_{\mu\nu} - R_{\mu\nu}u^\mu u^\nu . \qquad (9.61)$$
This is the Raychaudhuri equation for timelike geodesic congruences.
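The step from (9.47) to (9.61) rests on the algebraic identity $(\nabla_\mu u^\nu)(\nabla_\nu u^\mu) = B_{\mu\nu}B^{\nu\mu} = \tfrac13\theta^2 + \sigma^{\mu\nu}\sigma_{\mu\nu} - \omega^{\mu\nu}\omega_{\mu\nu}$. The cross terms between the antisymmetric, symmetric-traceless and trace parts drop out. For a normalised geodesic $u^\mu$, $B_{\mu\nu}$ is purely spatial, so in the local rest frame of $u^\mu$ it reduces to a $3\times 3$ matrix with $h_{ij} = \delta_{ij}$. The following is a minimal sketch, assuming such a rest frame; a random matrix stands in for $B_{ij}$ at a point, and the helper names are illustrative.

```python
import random

def decompose(B):
    """Split the spatial matrix B[i][j] (rest frame of u, h_ij = delta_ij) into
    rotation omega_ij, shear sigma_ij and expansion theta, as in (9.60)."""
    theta = sum(B[i][i] for i in range(3))
    omega = [[0.5 * (B[i][j] - B[j][i]) for j in range(3)] for i in range(3)]
    sigma = [[0.5 * (B[i][j] + B[j][i]) - (theta / 3.0 if i == j else 0.0)
              for j in range(3)] for i in range(3)]
    return omega, sigma, theta

def square(X):
    """X^{ij} X_{ij}, with indices raised by h^{ij} = delta^{ij}."""
    return sum(X[i][j] ** 2 for i in range(3) for j in range(3))

random.seed(1)
B = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]
omega, sigma, theta = decompose(B)

lhs = sum(B[i][j] * B[j][i] for i in range(3) for j in range(3))  # B_{mu nu} B^{nu mu}
rhs = theta ** 2 / 3.0 + square(sigma) - square(omega)
print(lhs, rhs)  # equal up to rounding
```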
Remarks:
1. An important special case of this equation arises when the rotation is zero, $\omega_{\mu\nu} = 0$. This happens for example when $u_\mu = \partial_\mu F$ is the gradient co-vector of some function $F$, in which case $u^\mu$ is orthogonal to the level surfaces of $F$. Then one has
$$\frac{d}{d\tau}\theta = -\tfrac13\theta^2 - \sigma^{\mu\nu}\sigma_{\mu\nu} - R_{\mu\nu}u^\mu u^\nu . \qquad (9.62)$$
The first two terms on the right hand side are manifestly non-positive (recall that $\sigma_{\mu\nu}$ is a spatial tensor and hence $\sigma_{\mu\nu}\sigma^{\mu\nu} \ge 0$). Thus, if one assumes that the geometry is such that
$$R_{\mu\nu}u^\mu u^\nu \ge 0 \qquad (9.63)$$
(by the Einstein equations to be discussed in the next section, this translates into a positivity condition on the energy-momentum tensor known as the strong energy condition), one finds
$$\frac{d}{d\tau}\theta = -\tfrac13\theta^2 - \sigma^{\mu\nu}\sigma_{\mu\nu} - R_{\mu\nu}u^\mu u^\nu \le 0 . \qquad (9.64)$$
This means that the divergence (convergence) of geodesics will decrease (increase)
in time. The interpretation of this result is that gravity is an attractive force (for
matter satisfying the strong energy condition) whose effect is to focus geodesics.
2. Before continuing, note that while the conditions that we have by now imposed on $u^\mu$ (geodesic, constant length $u^\mu u_\mu = -1$, and gradient $u_\mu = \partial_\mu F$) may seem overly restrictive, they are at least mutually consistent. A minimal variation of the proof leading to the result in (6.46) that a Killing vector field is geodesic iff it is of constant length establishes the same result for gradient vector fields instead of Killing vector fields, namely that a gradient vector field is geodesic if and only if it is of constant length. Since a gradient vector field satisfies
$$u_\mu = \nabla_\mu F \;\Rightarrow\; \nabla_\mu u_\nu - \nabla_\nu u_\mu = 0 , \qquad (9.65)$$
one has $u^\nu\nabla_\nu u_\mu = u^\nu\nabla_\mu u_\nu = \tfrac12\partial_\mu(u^\nu u_\nu)$, so that $u^\mu$ is geodesic if and only if $u^\nu u_\nu$ is constant.
3. According to (9.64), $d\theta/d\tau$ is not only non-positive but actually bounded from above by
$$\frac{d}{d\tau}\theta \le -\tfrac13\theta^2 . \qquad (9.67)$$
Rewriting this equation as
$$\frac{d}{d\tau}\frac{1}{\theta} \ge \frac13 , \qquad (9.68)$$
one deduces immediately that
$$\frac{1}{\theta(\tau)} \ge \frac{1}{\theta(0)} + \frac{\tau}{3} . \qquad (9.69)$$
This has the rather dramatic implication that, if $\theta(0) < 0$ (i.e. the geodesics are initially converging), then $\theta(\tau) \to -\infty$ within finite proper time $\tau \le 3/|\theta(0)|$, provided that the geodesics can be extended that far.
4. If one thinks of the geodesics as trajectories of physical particles, this is obviously
a rather catastrophic situation in which these particles will be infinitely squashed.
In general, however, the divergence of θ only indicates that the family of geodesics
develops what is known as a caustic where different geodesics meet.
Nevertheless, the above result plays a crucial role in establishing the occurrence of
true singularities in general relativity if supplemented e.g. by conditions which en-
sure that such “harmless” caustics cannot appear, as this means that the geodesic
cannot be extended to where one would find θ → −∞. This kind of argument
(leading to the conclusion of geodesic incompleteness of a space-time) is one of
the typical ingredients of the singularity theorems of general relativity.⁸
5. To round off this section, let us briefly come back to the more general situation where $u^\mu$ is a normalised timelike vector field but not necessarily geodesic. In that case, $\nabla_\nu u_\mu$ is not automatically purely spatial, since
$$u^\nu\nabla_\nu u^\mu = a^\mu . \qquad (9.70)$$
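Remark 3 can be illustrated by integrating the saturated case of (9.67), $d\theta/d\tau = -\theta^2/3$ (zero shear and $R_{\mu\nu}u^\mu u^\nu = 0$). Its exact solution $\theta(\tau) = 3\theta(0)/(3 + \theta(0)\tau)$ diverges at $\tau = 3/|\theta(0)|$ for $\theta(0) < 0$. The following is a minimal sketch, assuming a large negative threshold as a finite stand-in for $\theta \to -\infty$. The optional constant `extra` $\ge 0$ is a hypothetical stand-in for $\sigma^{\mu\nu}\sigma_{\mu\nu} + R_{\mu\nu}u^\mu u^\nu$ and can only make the focusing happen sooner.

```python
def time_to_caustic(theta0, extra=0.0, h=1e-4, threshold=-100.0, tau_max=100.0):
    """Integrate d(theta)/d(tau) = -theta**2 / 3 - extra with RK4 and return the first proper
    time at which theta < threshold (a finite stand-in for theta -> -infinity), or None if
    this does not happen before tau_max. `extra` >= 0 is a hypothetical constant standing
    in for sigma^{mu nu} sigma_{mu nu} + R_{mu nu} u^mu u^nu."""
    def f(th):
        return -th * th / 3.0 - extra

    tau, th = 0.0, theta0
    while tau < tau_max:
        if th < threshold:
            return tau
        k1 = f(th)
        k2 = f(th + 0.5 * h * k1)
        k3 = f(th + 0.5 * h * k2)
        k4 = f(th + h * k3)
        th += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        tau += h
    return None

theta0 = -1.0
t_free = time_to_caustic(theta0)              # saturates the bound (9.67)
t_extra = time_to_caustic(theta0, extra=1.0)  # additional focusing terms
# From 1/theta = 1/theta0 + tau/3, the exact solution reaches theta = -100 at:
tau_exact = 3.0 * (1.0 / -100.0 - 1.0 / theta0)
print("numerical:", t_free, " exact:", tau_exact, " bound 3/|theta0|:", 3.0 / abs(theta0))
print("with extra focusing:", t_extra)
```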
and its cyclic symmetry (8.16), it is possible to deduce that for a Killing vector K µ ,
∇µ Kν + ∇ν Kµ = 0, one has the following basic identity relating Killing vectors and the
curvature tensor,
$$\nabla_\lambda\nabla_\mu K_\nu = R^\rho{}_{\lambda\mu\nu}K_\rho . \qquad (9.74)$$
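Proof: proceeding as in the proof of the cyclic permutation identity (8.16), we deduce that
$$\nabla_{[\mu}\nabla_\nu K_{\lambda]} \sim R^\rho{}_{[\lambda\mu\nu]}K_\rho = 0 . \qquad (9.75)$$
Since $\nabla_\nu K_\lambda$ is antisymmetric, the total anti-symmetrisation is equivalent to cyclic permutation, and we therefore have
$$\nabla_\mu\nabla_\nu K_\lambda + \nabla_\nu\nabla_\lambda K_\mu + \nabla_\lambda\nabla_\mu K_\nu = 0 . \qquad (9.76)$$
Using the Killing property $\nabla_\lambda K_\mu = -\nabla_\mu K_\lambda$ in the second term, and $[\nabla_\mu,\nabla_\nu]K_\lambda = -R^\rho{}_{\lambda\mu\nu}K_\rho$ (which follows from (9.39)), we can write this as
$$[\nabla_\mu,\nabla_\nu]K_\lambda + \nabla_\lambda\nabla_\mu K_\nu = -R^\rho{}_{\lambda\mu\nu}K_\rho + \nabla_\lambda\nabla_\mu K_\nu = 0 ,$$
which is (9.74).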
The identity (9.74) has a rather remarkable consequence (which we will explore in
more detail in section 17.1) that a Killing vector field K µ (x) is completely determined
everywhere by the values of Kµ (x0 ) and ∇µ Kν (x0 ) at a single point x0 . In particular,
as shown there, this immediately leads to an upper bound on the number of linearly
independent Killing vectors a metric can have. Here we will briefly sketch some other
consequences of this identity.
1. As a first application, we will explicitly prove the assertion (6.34) of section 6.5 that the Lie bracket of two Killing vectors is again a Killing vector. While this follows from the general property (6.29) of the Lie derivative, which itself can (with some work) be deduced from the general definition of the Lie derivative (as the generator of the action of coordinate transformations on tensors), it is instructive and reassuring to verify this by an explicit calculation.
Thus consider two Killing vectors $A^\mu$ and $B^\mu$, say, i.e. vector fields satisfying
$$\nabla_\mu A_\nu + \nabla_\nu A_\mu = \nabla_\mu B_\nu + \nabla_\nu B_\mu = 0 \qquad (9.78)$$
or, equivalently,
$$\nabla_\mu A_\nu = \nabla_{[\mu}A_{\nu]} , \qquad \nabla_\mu B_\nu = \nabla_{[\mu}B_{\nu]} . \qquad (9.79)$$
Explicitly, from (6.16) their Lie bracket is the vector field
$$C^\mu = [A,B]^\mu = A^\nu\partial_\nu B^\mu - B^\nu\partial_\nu A^\mu = A^\nu\nabla_\nu B^\mu - B^\nu\nabla_\nu A^\mu .$$
In calculating ∇µ Cν one encounters new first derivatives of Aµ and B µ (which can
be manipulated by the Killing equations), as well as second covariant derivatives,
which can be reduced to zero-derivative terms by using the fundamental identity
(9.74),
Thus the Lie bracket of two Killing vectors is indeed again a Killing vector, as
claimed.
2. Here is another application of the identity (9.74). It should be intuitively obvious, and it is true, that the scalar curvature of a metric inherits the symmetries of the metric, i.e. that it does not change along the orbits of any Killing vector of that metric,
$$\nabla_\mu K_\nu + \nabla_\nu K_\mu = 0 \;\Rightarrow\; K^\mu\nabla_\mu R = 0 . \qquad (9.84)$$
• One can start with the contracted Bianchi identity ∇µ Gµν = 0, and contract
it with K ν to find
• Using the Killing equations, i.e. the anti-symmetry of $\nabla_\mu K_\lambda$, and the symmetry of the Ricci tensor, one can write this as
$$\nabla^\nu\nabla_\mu K_\nu = K^\nu R_{\mu\nu} . \qquad (9.87)$$
• Thus, finally
Alternatively (and more quickly but somewhat less covariantly) one could have
simply locally introduced an adapted coordinate system (6.45) in which K =
∂y and ∂y gµν = 0, to immediately deduce that then necessarily also ∂y R = 0.
However, the above identities like (9.87) are useful in their own right (as we will
see below).
3. Because the Einstein tensor Gµν (8.83) is symmetric and conserved (the contracted
Bianchi identity (8.82)), to any Killing vector one can associate (cf. the discussion
in section 7.1) the conserved current
for any value of the real parameter a. Among this 1-parameter family of conserved
currents, the choice
$$J^\mu_{a=0} \equiv J^\mu(K) = R^{\mu\nu}K_\nu \qquad (9.91)$$
is singled out by the fact that, by (9.87), it is not only conserved but can actually
be written as the divergence of an anti-symmetric tensor,
As an aside, note that while in the above we started off with Killing vectors, a similar
story is actually true for any vector field. Namely, for any vector field ξ µ define the
current
$$J^\mu(\xi) = \nabla_\nu\big(\nabla^{[\mu}\xi^{\nu]}\big) = \tfrac12\nabla_\nu\big(\nabla^\mu\xi^\nu - \nabla^\nu\xi^\mu\big) . \qquad (9.93)$$
Note that this reduces to (9.92) for ξ µ = K µ a Killing vector. Moreover, by (8.26) this
current is conserved,
$$\nabla_\mu J^\mu(\xi) = 0 \quad \forall\, \xi^\mu . \qquad (9.94)$$
where we made use of (8.23). Note that this indeed reduces to (9.91) for a Killing
vector, for which the second term on the right-hand side is absent.
The existence of these identically conserved currents and the corresponding surface
charge densities ∇[µ ξ ν] reflects the fact that in general relativity all vector fields can be
considered as the generators of symmetries (in the sense of coordinate transformations).
Indeed, the currents J µ (ξ) can be shown to be precisely the corresponding Noether
currents arising from the Lagrangian formulation of general relativity to be discussed in
section 11. We will establish this result in section 11.3. Nevertheless, the currents and
charges associated with Killing vectors turn out to play a privileged role, and we will
in particular relate the Komar charge for a timelike Killing vector to the ADM mass of
an isolated (asymptotically) stationary system in section 16.4.
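Item 1 above, that the Lie bracket of two Killing vectors is again a Killing vector, can be checked numerically in the simplest setting. The following is a minimal sketch, assuming flat Euclidean $\mathbb{R}^3$ in Cartesian coordinates. There $\nabla_\mu = \partial_\mu$, and the Killing vectors are $K^\mu(x) = \Omega^\mu{}_\nu x^\nu + a^\mu$ with $\Omega$ antisymmetric (rotations plus translations). The sketch computes $C^\mu = A^\nu\partial_\nu B^\mu - B^\nu\partial_\nu A^\mu$, the same sign convention as in (9.31). It then evaluates the Killing defect $\partial_\mu C_\nu + \partial_\nu C_\mu$ with central differences. The helper names are illustrative.

```python
def killing_field(Omega, a):
    """K^i(x) = Omega_ij x^j + a^i, a Killing vector of flat R^3 when Omega is antisymmetric."""
    return lambda x: [sum(Omega[i][j] * x[j] for j in range(3)) + a[i] for i in range(3)]

def jacobian(V, x, h=1e-3):
    """J[i][j] = d V^i / d x^j by central differences (exact up to rounding for linear fields)."""
    J = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        Vp, Vm = V(xp), V(xm)
        for i in range(3):
            J[i][j] = (Vp[i] - Vm[i]) / (2.0 * h)
    return J

def lie_bracket(A, B):
    """C^i = A^j d_j B^i - B^j d_j A^i."""
    def C(x):
        JA, JB, Ax, Bx = jacobian(A, x), jacobian(B, x), A(x), B(x)
        return [sum(Ax[j] * JB[i][j] - Bx[j] * JA[i][j] for j in range(3)) for i in range(3)]
    return C

def killing_defect(V, x):
    """max_ij |d_i V_j + d_j V_i| (indices lowered with delta_ij)."""
    J = jacobian(V, x)
    return max(abs(J[i][j] + J[j][i]) for i in range(3) for j in range(3))

Lx = killing_field([[0, 0, 0], [0, 0, -1], [0, 1, 0]], [0, 0, 0])  # rotation about the x-axis
Ly = killing_field([[0, 0, 1], [0, 0, 0], [-1, 0, 0]], [0, 0, 0])  # rotation about the y-axis
Tz = killing_field([[0, 0, 0], [0, 0, 0], [0, 0, 0]], [0, 0, 1])   # translation along z

x = [0.3, -1.2, 0.8]
for name, C in (("[Lx, Ly]", lie_bracket(Lx, Ly)), ("[Lx, Tz]", lie_bracket(Lx, Tz))):
    print(name, "at x:", [round(v, 10) for v in C(x)], " Killing defect:", killing_defect(C, x))
```

With these conventions, $[L_x, L_y]$ comes out as the rotation $(y, -x, 0)$ and $[L_x, T_z]$ as the translation $(0, 1, 0)$. Both are again Killing vectors of the flat metric.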
10 The Einstein Equations
10.1 Heuristics
We expect the gravitational field equations to be non-linear second order partial dif-
ferential equations for the metric. If we knew more about the weak field equations of
gravity (which should thus be valid near the origin of an inertial coordinate system) we
could use the Einstein equivalence principle (or the principle of general covariance) to
deduce the equations for strong fields.
However, we do not know a lot about gravity beyond the Newtonian limit of weak time-
independent fields and low velocities, simply because gravity is so ‘weak’. Hence, we
cannot find the gravitational field equations in a completely systematic way and some
guesswork will be required.
Nevertheless we will see that with some very natural assumptions (and the benefit of
hindsight) we will arrive at an essentially unique set of equations. Further theoretical
(and aesthetic) confirmation for these equations will then come from the fact that they
turn out to be the Euler-Lagrange equations of the absolutely simplest action principle
for the metric imaginable.
Recall that, way back, in section 1.1, we had briefly discussed the possibility of a scalar
relativistic theory of gravity described by an equation of the form (1.2)
We had noted there that one way to render this equation (tensorially) consistent is to
think of both the left and the right hand side as (00)-components of some tensor, which
we expressed in (1.5) as
While this appeared to be an exotic proposal back in section 1.1, we now understand
that this is exactly what is required, and we have a fairly precise idea of what this tensor
on the left-hand side should be.
Indeed, recall from our discussion of the Newtonian limit of the geodesic equation that
the weak static field produced by a non-relativistic mass density ρ is
This suggests that the weak-field equations for a general energy-momentum tensor take
the form
$$E_{\mu\nu} = 8\pi G_N T_{\mu\nu} , \qquad (10.6)$$
where Eµν is constructed from the metric and its first and second derivatives.
But by the Einstein equivalence principle, if this equation is valid for weak fields (i.e.
near the origin of an inertial coordinate system) then also the equations which gov-
ern gravitational fields of arbitrary strength must be of this form, with Eµν a tensor
constructed from the metric and its first and second derivatives.
Another way of anticipating what form the field equations for gravity may take is via
an analogy, a comparison of the geodesic deviation equations in Newton’s theory and
in General Relativity. Recall that in Newton’s theory we have
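$$\frac{d^2}{dt^2}\delta x^i = -K^i{}_j\,\delta x^j , \qquad K^i{}_j = \partial^i\partial_j\phi , \qquad (10.7)$$
$$\mathrm{Tr}\,K \equiv \Delta\phi = 4\pi G_N\rho , \qquad (10.9)$$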
This suggests that somehow in the gravitational field equations of General Relativity, ∆φ
should be replaced by the Ricci tensor Rµν . Note that, at least roughly, the tensorial
structure of this identification is compatible with the relation between φ and g00 in
the Newtonian limit, the relation between ρ and the 0-0 component T00 of the energy
momentum tensor, and the fact that for small velocities Rµν ẋµ ẋν ∼ R00 .
We will now turn to a somewhat more precise argument along these lines which will
enable us to determine Eµν .
1. Eµν is a tensor
2. Eµν has the dimensions of a second derivative. If we assume that no new dimen-
sionful constants enter in Eµν then it has to be a linear combination of terms which
are either second derivatives of the metric or quadratic in the first derivatives of
the metric. (Later on, we will see that there is the possibility of a zero derivative
term, but this requires a new dimensionful constant, the cosmological constant Λ.
Higher derivative terms could in principle appear but would only be relevant at
very high energies.)
4. Since $T_{\mu\nu}$ is covariantly conserved, the same has to be true for $E_{\mu\nu}$,
$$\nabla_\mu T^{\mu\nu} = 0 \;\Rightarrow\; \nabla_\mu E^{\mu\nu} = 0 . \qquad (10.11)$$
Now it turns out that these conditions (1)-(5) determine $E_{\mu\nu}$ uniquely! First of all, (1) and (2) tell us that $E_{\mu\nu}$ has to be a linear combination
$$E_{\mu\nu} = aR_{\mu\nu} + b\,g_{\mu\nu}R ,$$
where $R_{\mu\nu}$ is the Ricci tensor and $R$ the Ricci scalar. Then condition (3) is automatically satisfied.
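To implement (4), we rewrite the above as a linear combination of the Einstein tensor (8.83) and $g_{\mu\nu}R$,
$$E_{\mu\nu} = aG_{\mu\nu} + c\,g_{\mu\nu}R$$
for some constant $c$, and recall that the Einstein tensor is covariantly conserved,
$$\nabla_\mu G^{\mu\nu} = 0 . \qquad (10.15)$$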
It follows that (4) is satisfied iff $c\nabla_\nu R = c\,\partial_\nu R = 0$. We therefore have to require either $\nabla_\nu R = 0$ or $c = 0$. That the first possibility is ruled out (inconsistent) can be seen by taking the trace of (10.6), which in four dimensions gives
$$(4c - a)R = 8\pi G_N T^\mu{}_\mu .$$
Thus, $R$ is proportional to $T^\mu{}_\mu$, and since this quantity need certainly not be constant for a general matter configuration, we are led to the conclusion that $c = 0$. Thus we find
$$E_{\mu\nu} = aG_{\mu\nu} . \qquad (10.17)$$
We can now use condition (5) to determine the constant $a$.
10.3 The Newtonian Weak-Field Limit
By the above considerations we have determined the field equations to be of the form
$$aG_{\mu\nu} = 8\pi G_N T_{\mu\nu} , \qquad (10.18)$$
with a some, as yet undetermined, constant. We will now consider the Newtonian
weak-field limit of this equation. We need to find that G00 is proportional to ∆g00 and
we can then use the condition (5) to fix the value of a. The following manipulations
are somewhat analogous to those we performed when considering the Newtonian limit
of the geodesic equation. The main difference is that now we are dealing with second
derivatives of the metric rather than with just its first derivatives entering in the geodesic
equation.
First of all, for a non-relativistic system we have $|T_{ij}| \ll T_{00}$ and hence $|G_{ij}| \ll |G_{00}|$. Therefore we conclude
$$|T_{ij}| \ll T_{00} \;\Rightarrow\; R_{ij} \sim \tfrac12 g_{ij}R . \qquad (10.19)$$
Next, for a weak field we have $g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}$ with $h_{\mu\nu}$ small (in a suitable sense) and, in particular,
$$R \sim \eta^{\mu\nu}R_{\mu\nu} = R_{kk} - R_{00} , \qquad (10.20)$$
which, together with (10.19), translates into
$$R \sim \tfrac32 R - R_{00} \qquad (10.21)$$
or
$$R \sim 2R_{00} . \qquad (10.22)$$
In the weak field limit, $R_{00}$ in turn is given by
$$R_{00} = R^k{}_{0k0} = \eta^{ik}R_{i0k0} . \qquad (10.23)$$
Moreover, in this limit only the linear (second derivative) part of $R_{\mu\nu\lambda\sigma}$ will contribute, not the terms quadratic in first derivatives. Thus we can use the expression (8.12) for the curvature tensor. Additionally, in the static case we can ignore all time derivatives. Then only one term (the third) of (8.12) contributes and we find
$$R_{i0k0} = -\tfrac12 g_{00,ik} , \qquad (10.24)$$
and therefore
$$R_{00} = -\tfrac12\Delta g_{00} . \qquad (10.25)$$
Putting everything together, we get
$$\begin{aligned}
G_{00} &= R_{00} - \tfrac12 g_{00}R = R_{00} - \tfrac12\eta_{00}R \\
&= R_{00} + \tfrac12 R = R_{00} + R_{00} = -\Delta g_{00}
\end{aligned} \qquad (10.26)$$
(see also section 16.3 for a somewhat more streamlined and covariant derivation of
this result). Thus we obtain the correct functional form of E00 and comparison with
condition (5) determines a = +1 and therefore Eµν = Gµν .
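For orientation, here is how condition (5) fixes $a$ under the standard Newtonian-limit identifications, which are assumed here rather than derived in this section. The assumptions are $g_{00} \approx -(1 + 2\phi)$ for a weak static field (signature $(-+++)$, as in $u^\mu u_\mu = -1$ above), $T_{00} \approx \rho$ for non-relativistic matter, and Poisson's equation $\Delta\phi = 4\pi G_N\rho$:

```latex
\begin{aligned}
a\,G_{00} &= -a\,\Delta g_{00} = 2a\,\Delta\phi , \\
8\pi G_N T_{00} &= 8\pi G_N \rho = 2\,\Delta\phi
\qquad\Longrightarrow\qquad a = 1 .
\end{aligned}
```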
10.4 The Einstein Equations
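We have finally arrived at the Einstein equations for the gravitational field (metric) of a matter-energy configuration described by the energy-momentum tensor $T_{\mu\nu}$. They read
$$G_{\mu\nu} \equiv R_{\mu\nu} - \tfrac12 g_{\mu\nu}R = 8\pi G_N T_{\mu\nu} . \qquad (10.27)$$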
Remarks:
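1. With $c$ not set equal to one, and with the convention that $T_{00}$ is normalised such that it gives the energy density rather than the mass density, one finds that the factor $8\pi G_N$ on the right hand side should be replaced by
$$8\pi G_N \to \frac{8\pi G_N}{c^4} . \qquad (10.28)$$
A note on dimensions: Newton's constant has dimensions (M mass, L length, T time) $[G_N] = \mathrm{M}^{-1}\mathrm{L}^3\mathrm{T}^{-2}$, so that $[G_N/c^4] = \mathrm{M}^{-1}\mathrm{L}^{-1}\mathrm{T}^{2}$, while an energy density has dimensions $[\rho] = \mathrm{M}\,\mathrm{L}^{-1}\mathrm{T}^{-2}$. Thus
$$[\rho G_N/c^4] = \mathrm{L}^{-2} = [R_{\mu\nu}] , \qquad (10.31)$$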
as it should be. Frequently, an alternative (and equally reasonable) convention
is used in which T00 is a mass density, so that then Ttt = c2 T00 is the energy
density. In that case, the factor on the right-hand side of the Einstein equations
is 8πGN /c2 .
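2. Another common way of writing the Einstein equations is obtained by taking the trace of (10.27), which yields
$$R - 2R = 8\pi G_N T^\mu{}_\mu , \qquad (10.32)$$
i.e. $R = -8\pi G_N T^\mu{}_\mu$. Substituting this back into (10.27), the Einstein equations can equivalently be written as
$$R_{\mu\nu} = 8\pi G_N\big(T_{\mu\nu} - \tfrac12 g_{\mu\nu}T^\lambda{}_\lambda\big) . \qquad (10.33)$$
In particular, for the vacuum, $T_{\mu\nu} = 0$, the Einstein equations are equivalent to
$$R_{\mu\nu} = 0 . \qquad (10.34)$$
A space-time metric satisfying this equation is, for obvious reasons, said to be Ricci-flat.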
3. Even the vacuum Einstein equations still constitute a complicated set of non-
linear coupled partial differential equations whose general solution is not, and
probably will never be, known. Usually one makes some assumptions, in particular
regarding the symmetries of the metric, which simplify the equations to the extent
that they can be analysed explicitly, either analytically, or at least qualitatively
or numerically.
4. The streamlined “derivation” of the Einstein equations given here may give the misleading impression that also for Einstein this was a straightforward affair. Nothing could be further from the truth. Not only do we have the benefit of hindsight. We also have a much more systematic and advanced understanding of Riemannian geometry and tensor calculus than was available to Einstein at the time (e.g. of the contracted Bianchi identities and their importance for energy-momentum (non-)conservation (see above) and for general covariance (to be briefly discussed in section 10.5 below)).¹⁰
5. As we saw before, in two and three dimensions, vanishing of the Ricci tensor
implies the vanishing of the Riemann tensor. Thus in these cases, the space-
times are necessarily flat away from where there is matter, i.e. at points at which
Tµν (x) = 0. Thus there are no true gravitational fields and no gravitational waves.
In four dimensions, however, the situation is completely different. As we saw,
the Ricci tensor has 10 independent components whereas the Riemann tensor has
20. Thus there are 10 components of the Riemann tensor which can curve the
vacuum, as e.g. in the field around the sun, and a lot of interesting physics is
already contained in the vacuum Einstein equations.
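Remarks 1 and 2 lend themselves to quick numerical sanity checks. The following is a minimal sketch with two parts; the helper names and the SI inputs (CODATA 2018 value of $G_N$, exact $c$) are choices made for the illustration. Part (i) tracks dimensions as exponent triples (M, L, T). It confirms that $8\pi G_N/c^4$ times an energy density, or $8\pi G_N/c^2$ times a mass density, has the dimension $\mathrm{L}^{-2}$ of the Ricci tensor. Part (ii) works in $D = 4$ with the Minkowski metric and a random symmetric $T_{\mu\nu}$. It checks that the trace-reversed form $R_{\mu\nu} = 8\pi G_N(T_{\mu\nu} - \tfrac12 g_{\mu\nu}T^\lambda{}_\lambda)$ reproduces $G_{\mu\nu} = 8\pi G_N T_{\mu\nu}$.

```python
import math
import random

# (i) Dimensional analysis with exponent triples (M, L, T).
def dim(*factors):
    """Multiply dimensions given as (exponent_triple, power) pairs."""
    return tuple(sum(p * e[k] for e, p in factors) for k in range(3))

G_DIM = (-1, 3, -2)           # [G_N] = M^-1 L^3 T^-2
C_DIM = (0, 1, -1)            # [c] = L T^-1
ENERGY_DENSITY = (1, -1, -2)  # (M L^2 T^-2) / L^3
MASS_DENSITY = (1, -3, 0)     # M / L^3
CURVATURE = (0, -2, 0)        # [R_mu_nu] = L^-2

assert dim((ENERGY_DENSITY, 1), (G_DIM, 1), (C_DIM, -4)) == CURVATURE
assert dim((MASS_DENSITY, 1), (G_DIM, 1), (C_DIM, -2)) == CURVATURE

kappa_SI = 8 * math.pi * 6.67430e-11 / 299792458.0 ** 4
print(f"8 pi G_N / c^4 = {kappa_SI:.4e} s^2 kg^-1 m^-1")

# (ii) Trace reversal in D = 4 with eta = diag(-1, 1, 1, 1).
D = 4
eta = [[(-1.0 if m == 0 else 1.0) if m == n else 0.0 for n in range(D)] for m in range(D)]
kappa = 1.0  # stands for 8 pi G_N in units where it equals 1

def trace(S):
    """S^mu_mu = eta^{mu nu} S_{mu nu} (eta is its own inverse)."""
    return sum(eta[m][n] * S[m][n] for m in range(D) for n in range(D))

random.seed(0)
T = [[0.0] * D for _ in range(D)]
for m in range(D):
    for n in range(m, D):
        T[m][n] = T[n][m] = random.uniform(-1.0, 1.0)

T_tr = trace(T)
Ric = [[kappa * (T[m][n] - 0.5 * eta[m][n] * T_tr) for n in range(D)] for m in range(D)]
R = trace(Ric)  # expected: -kappa * T_tr, cf. (10.32)
G = [[Ric[m][n] - 0.5 * eta[m][n] * R for n in range(D)] for m in range(D)]
max_dev = max(abs(G[m][n] - kappa * T[m][n]) for m in range(D) for n in range(D))
print("R + kappa*T =", R + kappa * T_tr, " max |G - kappa*T| =", max_dev)
```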
Because the Ricci tensor is symmetric, the Einstein equations constitute a set of ten
algebraically independent second order differential equations for the metric $g_{\mu\nu}$. At
first, this looks exactly right as a set of equations for the ten components of the metric.
But at second sight, this cannot be right. After all, the Einstein equations are generally
covariant, so that they can at best determine the metric up to coordinate transforma-
tions. Therefore we should only expect six independent generally covariant equations
for the metric. Here we should recall the contracted Bianchi identities. They tell us
that
$$\nabla_\mu G^{\mu\nu} = 0 , \qquad (10.35)$$
and hence, even though the ten Einstein equations are algebraically independent, there are four differential relations among them, so this is just right.

¹⁰ For an illuminating brief account of the torturous and convoluted route and crucial final stages that led Einstein (and Grossmann) to the correct field equations, see N. Straumann, Einstein's ‘Zürich Notebook’ and his Journey to General Relativity, arXiv:1106.0900v1 [physics.hist-ph].
It is no coincidence, by the way, that the Bianchi identities come to the rescue of general
covariance. We will see later that the Bianchi identities can in fact be understood as a
consequence of the general covariance of the Einstein equations (and of the corresponding
action principle).
The general covariance of the Einstein equations is reflected in the fact that only six of
the ten equations are truly dynamical equations, namely (for the vacuum equations for
simplicity)
$$G_{ij} = 0 , \qquad (10.36)$$
where $i, j = 1, 2, 3$. The other four, namely
$$G_{\mu 0} = 0 , \qquad (10.37)$$
are constraints that have to be satisfied by the initial data gij and, say, dgij /dt on some
initial spacelike hypersurface (Cauchy surface).
These constraints are analogues of the Gauss law constraint of Maxwell theory (which
is a consequence of the U (1) gauge invariance of the theory), but significantly more
complicated. Over the years, a lot of effort has gone into developing a formalism and
framework for the initial value and canonical (phase space) description of General Rel-
ativity. The most well known and useful of these is the so-called ADM (Arnowitt,
Deser, Misner 1959-1962) formalism.¹¹ The canonical formalism has been developed
in particular with an eye towards canonical quantisation of gravity.
As mentioned before, there is one more term that can be added to the Einstein equations provided that one relaxes the condition (2) that only terms with two derivatives should appear. This term takes the form $\Lambda g_{\mu\nu}$. This is compatible with the condition (4) (the conservation law) provided that $\Lambda$ is a constant, the cosmological constant. It is a dimensionful parameter with dimension $[\Lambda] = \mathrm{L}^{-2}$, i.e. one over length squared.
To be compatible with condition (5) ((1), (3) and (4) are obviously satisfied), $\Lambda$ has to be quite small (and observationally it is very small indeed).
¹¹ This body of research is summarised in the 1962 article R. Arnowitt, S. Deser, C. Misner, The Dynamics of General Relativity, available on the arXiv as arXiv:gr-qc/0405109.

Remarks:
1. The vacuum Einstein equations with a cosmological constant read
2. In general, solutions to the equation Rµν = cgµν for some constant c (and ei-
ther Riemannian or Lorentzian signature) are known as Einstein manifolds in the
mathematics literature.
4. Comparing with the energy-momentum tensor of, say, a perfect fluid (see section
19.2),
$$T_{\mu\nu} = (\rho + p)u_\mu u_\nu + p\,g_{\mu\nu} , \qquad (10.41)$$
6. However, things are not as simple as that. Just because it is not required does not
mean that it is not there. In fact, one of the biggest puzzles in theoretical physics
today is why the cosmological constant is so small. According to standard quantum
field theory lore, the vacuum energy density should be many orders of
magnitude larger than astrophysical observations allow. Now usually in quantum
field theory one does not worry too much about the vacuum energy as one can
normal-order it away. However, as we know, gravity is unlike any other theory in
that not only energy-differences but absolute energies matter (and cannot just be
dropped).
The question why the observed cosmological constant is so small (and recent
astrophysical observations appear to favour a tiny non-zero value) is known as
the Cosmological Constant Problem. See section 20.7 for a brief discussion of this
profound issue and some references.
7. We will consider the possibility that $\Lambda \neq 0$ only in the sections on Cosmology (in
all other applications, $\Lambda$ can indeed be neglected).
However, this answer leaves something to be desired because it does not really provide
an explanation of how the information about these other components is encoded in the
Einstein equations. It is interesting to understand this because it is precisely these components of the Riemann tensor which represent the effects of gravity in vacuum, i.e. where $T_{\mu\nu} = 0$, like tidal forces and gravitational waves.
The more insightful answer is that the information is encoded in the Bianchi identities which serve as propagation equations for the trace-free parts of the Riemann tensor away from the regions where $T_{\mu\nu} \neq 0$.
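Let us see how this works. First of all, we need to decompose the Riemann tensor into its trace parts $R_{\mu\nu}$ and $R$ (determined directly by the Einstein equations) and its traceless part $C_{\mu\nu\rho\sigma}$, the Weyl tensor,
$$\begin{aligned}
C_{\mu\nu\rho\sigma} = R_{\mu\nu\rho\sigma} &- \frac{1}{D-2}\big(g_{\mu\rho}R_{\nu\sigma} + R_{\mu\rho}g_{\nu\sigma} - g_{\nu\rho}R_{\mu\sigma} - R_{\nu\rho}g_{\mu\sigma}\big) \\
&+ \frac{1}{(D-1)(D-2)}R\big(g_{\mu\rho}g_{\nu\sigma} - g_{\nu\rho}g_{\mu\sigma}\big) .
\end{aligned} \qquad (10.44)$$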
This definition is such that $C_{\mu\nu\rho\sigma}$ has all the symmetries of the Riemann tensor (this is manifest) and that all of its traces are zero, i.e.
$$C^\mu{}_{\nu\mu\sigma} = 0 . \qquad (10.45)$$
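The tracelessness (10.45) of the definition (10.44) can be checked numerically. The following is a minimal sketch under three assumptions. First, it uses a Euclidean metric $g_{\mu\nu} = \delta_{\mu\nu}$, so index positions do not matter; the algebra does not depend on the signature. Second, it builds a tensor with all the algebraic symmetries of the Riemann tensor, including the cyclic identity, from two random symmetric matrices $A$ and $B$ via $R_{\mu\nu\rho\sigma} = A_{\mu\rho}A_{\nu\sigma} - A_{\nu\rho}A_{\mu\sigma} - (A \to B)$. Third, it takes $R_{\nu\sigma} = R^\mu{}_{\nu\mu\sigma}$. The construction and the function names are illustrative. The expected outcome is that all traces of $C$ vanish in $D = 4$, while $C$ itself does not. In $D = 3$, $C$ vanishes identically, consistent with the earlier remark that the Ricci tensor determines the full curvature in three dimensions.

```python
import random
from itertools import product

def random_symmetric(D, rng):
    A = [[0.0] * D for _ in range(D)]
    for i in range(D):
        for j in range(i, D):
            A[i][j] = A[j][i] = rng.uniform(-1.0, 1.0)
    return A

def algebraic_riemann(D, rng):
    """R_{mu nu rho sigma} = A_{mu rho} A_{nu sigma} - A_{nu rho} A_{mu sigma} - (same with B):
    antisymmetric in each index pair, pair symmetric, and obeying the cyclic identity."""
    A, B = random_symmetric(D, rng), random_symmetric(D, rng)

    def term(X, m, n, r, s):
        return X[m][r] * X[n][s] - X[n][r] * X[m][s]

    return {(m, n, r, s): term(A, m, n, r, s) - term(B, m, n, r, s)
            for m, n, r, s in product(range(D), repeat=4)}

def weyl(Riem, D):
    """Weyl tensor (10.44) for the metric g = delta."""
    def g(a, b):
        return 1.0 if a == b else 0.0

    Ric = [[sum(Riem[(m, n, m, s)] for m in range(D)) for s in range(D)] for n in range(D)]
    R = sum(Ric[n][n] for n in range(D))
    C = {}
    for m, n, r, s in product(range(D), repeat=4):
        ricci_part = (g(m, r) * Ric[n][s] + Ric[m][r] * g(n, s)
                      - g(n, r) * Ric[m][s] - Ric[n][r] * g(m, s)) / (D - 2)
        scalar_part = R * (g(m, r) * g(n, s) - g(n, r) * g(m, s)) / ((D - 1) * (D - 2))
        C[(m, n, r, s)] = Riem[(m, n, r, s)] - ricci_part + scalar_part
    return C

rng = random.Random(42)
C4 = weyl(algebraic_riemann(4, rng), 4)
max_trace = max(abs(sum(C4[(m, n, m, s)] for m in range(4)))
                for n in range(4) for s in range(4))
C3 = weyl(algebraic_riemann(3, rng), 3)
print("D=4: max |C^mu_{nu mu sigma}| =", max_trace,
      " max |C| =", max(abs(v) for v in C4.values()))
print("D=3: max |C| =", max(abs(v) for v in C3.values()))
```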
In the vacuum, $R_{\mu\nu} = 0$, and therefore
$$R_{\mu\nu\rho\sigma} = C_{\mu\nu\rho\sigma} ,$$
and, as anticipated, the Weyl tensor encodes the information about the gravitational
field in vacuum. The question thus is how Cµνρσ is determined everywhere in space-
time by an energy-momentum tensor which may be localised in some finite region of
space-time.
over λ and ρ and making use of the symmetries of the Riemann tensor, one obtains
Expressing the Riemann tensor in terms of its contractions and the Weyl tensor, and
using the Einstein equations to replace the Ricci tensor and Ricci scalar by the energy-
momentum tensor, one now obtains a propagation equation for the Weyl tensor of the
form
$$\nabla^\mu C_{\mu\nu\rho\sigma} = J_{\nu\rho\sigma}\,, \qquad (10.49)$$
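where Jνρσ depends only on the energy-momentum tensor and its derivatives. Determining
Jνρσ in this way is straightforward and one finds
$$J_{\nu\rho\sigma} = 8\pi G_N\,\frac{D-3}{D-2}\left(\nabla_\rho T_{\nu\sigma} - \nabla_\sigma T_{\nu\rho} - \frac{1}{D-1}\Big[\nabla_\rho T^\lambda{}_\lambda\, g_{\nu\sigma} - \nabla_\sigma T^\lambda{}_\lambda\, g_{\nu\rho}\Big]\right)\,. \qquad (10.50)$$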
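The "straightforward" computation behind (10.50) can be sketched as follows. The sketch rests on two assumptions. First, the Ricci tensor is the contraction $R_{\nu\sigma} = g^{\mu\rho}R_{\mu\nu\rho\sigma}$, which is the one for which (10.44) is traceless. Second, the Einstein equations are taken in D dimensions without a cosmological constant. The first line below is the contracted Bianchi identity. The second line is the divergence of (10.44). The third line gives the trace-reversed Einstein equations.

```latex
\begin{align}
  \nabla^\mu R_{\mu\nu\rho\sigma} &= \nabla_\rho R_{\nu\sigma} - \nabla_\sigma R_{\nu\rho}\,,
  \qquad \nabla^\mu R_{\mu\nu} = \tfrac12\nabla_\nu R\,,\\
  \nabla^\mu C_{\mu\nu\rho\sigma} &= \frac{D-3}{D-2}\Big[\nabla_\rho R_{\nu\sigma} - \nabla_\sigma R_{\nu\rho}
     - \frac{1}{2(D-1)}\big(g_{\nu\sigma}\nabla_\rho R - g_{\nu\rho}\nabla_\sigma R\big)\Big]\,,\\
  R_{\nu\sigma} &= 8\pi G_N\Big(T_{\nu\sigma} - \frac{1}{D-2}\,g_{\nu\sigma}T^\lambda{}_\lambda\Big)\,,
  \qquad R = -\frac{16\pi G_N}{D-2}\,T^\lambda{}_\lambda\,.
\end{align}
```

Inserting the third line into the second, the terms proportional to $g\,\nabla T^\lambda{}_\lambda$ combine with the net coefficient $-\big[\frac{1}{D-2} - \frac{1}{(D-1)(D-2)}\big] = -\frac{1}{D-1}$. This reproduces the bracket in (10.50).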
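Formally, the propagation equation (10.49) is very similar to the inhomogeneous Maxwell equations,
$$\nabla^\mu F_{\mu\nu} = J_\nu\,, \qquad (10.51)$$
and this is the starting point for a very fruitful analogy between the two subjects.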
Indeed, it turns out that in many other respects as well Cµνρσ behaves very much like
an electromagnetic field: one can define electric and magnetic components E and B,
which satisfy |E| = |B| for a gravitational wave, etc.
Finally, the Weyl tensor is also useful in other contexts as it is conformally invariant,
i.e. $C^\mu{}_{\nu\rho\sigma}$ is invariant under conformal rescalings of the metric, $g_{\mu\nu}(x) \to e^{2\omega(x)}g_{\mu\nu}(x)$.
In particular, the Weyl tensor is zero if the metric is conformally flat, i.e. related by a
conformal transformation to the flat metric, and conversely (in D ≥ 4) the vanishing of the Weyl
tensor is also a sufficient condition for a metric to be conformal to the flat metric.
11 The Einstein Equations from a Variational Principle
To increase our confidence that the Einstein equations we have derived above are in fact
reasonable and almost certainly correct, we can adopt a more modern point of view.
We can ask if the Einstein equations follow from an action principle or, alternatively,
what would be a natural action principle for the metric.
After all, for example in the construction of the Standard Model, one also does not start
with the equations of motion but one writes down the simplest possible Lagrangian with
the desired field content and symmetries.
We will start with the gravitational part, i.e. the Einstein tensor Gµν of the Einstein
equations, and deal with the matter part, the energy-momentum tensor Tµν , later.
By general covariance, an action for the metric gµν will have to take the form
$$S = \int\sqrt{g}\,d^4x\;\Phi(g_{\mu\nu})\,, \qquad (11.1)$$
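where Φ is a scalar constructed from the metric. So what is Φ going to be? Clearly,
the simplest non-trivial choice is the Ricci scalar R, and (apart from an additive constant, which
gives rise to a cosmological constant) this is also the unique choice if one is looking for a scalar
constructed from the metric and its derivatives that depends at most linearly on the second
derivatives of the metric.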
Therefore we postulate the beautifully simple and elegant action
$$S_{EH} = \int\sqrt{g}\,d^4x\;R\,. \qquad (11.2)$$
Since the Ricci scalar is $R = g^{\mu\nu}R_{\mu\nu}$, it is simpler to consider variations $\delta g^{\mu\nu}$ of the
inverse metric instead of $\delta g_{\mu\nu}$. Thus, as a first step we write
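$$\delta S_{EH} = \delta\int\sqrt{g}\,d^4x\;g^{\mu\nu}R_{\mu\nu} = \int d^4x\,\Big((\delta\sqrt{g})\,g^{\mu\nu}R_{\mu\nu} + \sqrt{g}\,(\delta g^{\mu\nu})R_{\mu\nu} + \sqrt{g}\,g^{\mu\nu}\delta R_{\mu\nu}\Big)\,. \qquad (11.3)$$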
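Now we use the variation of the determinant,
$$\delta\sqrt{g} = -\tfrac12\sqrt{g}\,g_{\mu\nu}\,\delta g^{\mu\nu}\,, \qquad (11.4)$$
to deduce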
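$$\delta S_{EH} = \int\sqrt{g}\,d^4x\;\Big[\big(-\tfrac12 g_{\mu\nu}R + R_{\mu\nu}\big)\delta g^{\mu\nu} + g^{\mu\nu}\delta R_{\mu\nu}\Big] = \int\sqrt{g}\,d^4x\;\big(R_{\mu\nu} - \tfrac12 g_{\mu\nu}R\big)\delta g^{\mu\nu} + \int\sqrt{g}\,d^4x\;g^{\mu\nu}\delta R_{\mu\nu}\,. \qquad (11.5)$$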
The first term all by itself would already give the Einstein tensor. Thus we need to show
that the second term is a total derivative. I do not know of any particularly elegant
argument to establish this (in a coordinate basis - written in terms of differential forms
this would be completely obvious), so this will require a little bit of work, but it is not
difficult.
Postponing the proof of this statement to the end of this section, we have established
that the variation of the Einstein-Hilbert action gives the gravitational part (left hand
side) of the Einstein equations,
$$\delta\int\sqrt{g}\,d^4x\;R = \int\sqrt{g}\,d^4x\;\big(R_{\mu\nu} - \tfrac12 g_{\mu\nu}R\big)\,\delta g^{\mu\nu}\,. \qquad (11.6)$$
Remarks:
2. It follows from the explicit expression for the scalar curvature in terms of the
Christoffel symbols and the metric, obtained by contracting (8.5),
that the Einstein-Hilbert action contains terms that are quadratic in first derivatives
of the metric, as well as terms that depend linearly on the second derivatives.
The latter are uncommon from the usual Lagrangian point of view but required by
general covariance (there is no scalar that can be constructed solely from the metric
and its first derivatives). However, the fact that $g^{\mu\nu}\delta R_{\mu\nu}$ is a total derivative
reflects that these second derivatives are spurious in the sense that they
can be eliminated by an integration by parts or boundary term, at the expense,
however, of general covariance.
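In fact, by straightforward manipulation of (11.8), using standard identities for the derivatives of the metric and of √g in terms of the Christoffel symbols, one finds that √g R differs from an expression quadratic in the Christoffel symbols only by a total derivative.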
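Thus, instead of the generally covariant Einstein-Hilbert action one can use the
non-covariant but quadratic action
$$S_{EH} \to \int\sqrt{g}\,d^4x\;g^{\alpha\beta}\big(\Gamma^\mu_{\nu\alpha}\Gamma^\nu_{\mu\beta} - \Gamma^\mu_{\alpha\beta}\Gamma^\nu_{\nu\mu}\big) \qquad (11.12)$$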
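3. If one wants to include the cosmological constant Λ, then the action gets modified
to
$$S_{EH,\Lambda} = \int\sqrt{g}\,d^4x\;(R - 2\Lambda)\,, \qquad (11.13)$$
and correspondingly the Einstein tensor in the field equations gets replaced by
$$G_{\mu\nu} = R_{\mu\nu} - \tfrac12 g_{\mu\nu}R \;\to\; R_{\mu\nu} - \tfrac12 g_{\mu\nu}(R - 2\Lambda) = G_{\mu\nu} + \Lambda g_{\mu\nu}\,. \qquad (11.14)$$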
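For completeness, here is a short sketch of how (11.14) follows from (11.13). It uses the same determinant variation $\delta\sqrt{g} = -\tfrac12\sqrt{g}\,g_{\mu\nu}\delta g^{\mu\nu}$ that entered (11.5):

```latex
\begin{align}
  \delta\int\sqrt{g}\,d^4x\;(-2\Lambda) &= -2\Lambda\int d^4x\;\delta\sqrt{g}
     = \int\sqrt{g}\,d^4x\;\Lambda\,g_{\mu\nu}\,\delta g^{\mu\nu}\,,\\
  \delta S_{EH,\Lambda} &= \int\sqrt{g}\,d^4x\;\big(R_{\mu\nu} - \tfrac12 g_{\mu\nu}R + \Lambda g_{\mu\nu}\big)\,\delta g^{\mu\nu}\,.
\end{align}
```

The second line is obtained by adding the first line to (11.6).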
4. Of course, once one is working at the level of the action, it is easy to come up
with covariant generalisations of the Einstein-Hilbert action, such as
$$S = \int\sqrt{g}\,d^4x\;\big(R + c_1R^2 + c_2R_{\mu\nu}R^{\mu\nu} + c_3R\,\Box R + \ldots\big)\,, \qquad (11.15)$$
but these invariably involve higher-derivative terms and are therefore irrelevant for
low-energy physics and thus the world we live in. Such terms could be relevant for
the early universe, however, and are also typically predicted by quantum theories
of gravity like string theory.
————————————————–
We conclude this section with the proof that gµν δRµν is a total derivative.
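First of all, we need the explicit expression for the Ricci tensor in terms of the Christoffel
symbols, which can be obtained by contraction of (8.5),
$$R_{\mu\nu} = \partial_\lambda\Gamma^\lambda_{\mu\nu} - \partial_\nu\Gamma^\lambda_{\mu\lambda} + \Gamma^\lambda_{\lambda\rho}\Gamma^\rho_{\nu\mu} - \Gamma^\lambda_{\nu\rho}\Gamma^\rho_{\lambda\mu}\,. \qquad (11.16)$$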
Now we need to calculate the variation of Rµν. We will not require the explicit expression
in terms of the variations of the metric, but only in terms of the variations $\delta\Gamma^\mu_{\nu\lambda}$ induced
by the variations of the metric. This simplifies things considerably.
Obviously, δRµν will then be a sum of six terms,
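$$\delta R_{\mu\nu} = \partial_\lambda\delta\Gamma^\lambda_{\mu\nu} - \partial_\nu\delta\Gamma^\lambda_{\mu\lambda} + \delta\Gamma^\lambda_{\lambda\rho}\Gamma^\rho_{\nu\mu} + \Gamma^\lambda_{\lambda\rho}\delta\Gamma^\rho_{\nu\mu} - \delta\Gamma^\lambda_{\nu\rho}\Gamma^\rho_{\lambda\mu} - \Gamma^\lambda_{\nu\rho}\delta\Gamma^\rho_{\lambda\mu}\,. \qquad (11.17)$$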
Now the crucial observation is that $\delta\Gamma^\mu_{\nu\lambda}$ is a tensor. This follows from the arguments
given in section 4.8, but I will repeat the argument here in the present context. Of course, we know
that the Christoffel symbols themselves are not tensors, because of the inhomogeneous
(second derivative) term appearing in their transformation rule under coordinate
transformations. But this term is independent of the metric, and therefore it drops out of the
difference $\Gamma^\mu_{\nu\lambda}[g + \delta g] - \Gamma^\mu_{\nu\lambda}[g]$. Thus the metric variation of
the Christoffel symbols indeed transforms as a tensor.
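This can also be confirmed by explicit calculation. Just for the record, I will give an
expression for $\delta\Gamma^\mu_{\nu\lambda}$ which is easy to remember as it takes exactly the same form as
the definition of the Christoffel symbol, only with the metric replaced by the metric
variation and the partial derivatives by covariant derivatives, i.e.
$$\delta\Gamma^\mu_{\nu\lambda} = \tfrac12 g^{\mu\rho}\big(\nabla_\nu\delta g_{\rho\lambda} + \nabla_\lambda\delta g_{\rho\nu} - \nabla_\rho\delta g_{\nu\lambda}\big)\,. \qquad (11.18)$$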
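It turns out, none too surprisingly, that δRµν can be written rather compactly in terms
of covariant derivatives of $\delta\Gamma^\mu_{\nu\lambda}$, namely as
$$\delta R_{\mu\nu} = \nabla_\lambda\delta\Gamma^\lambda_{\mu\nu} - \nabla_\nu\delta\Gamma^\lambda_{\mu\lambda}\,. \qquad (11.19)$$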
As a first check on this, note that the first term on the right hand side is manifestly
symmetric and that the second term is also symmetric because of (4.36) and (5.74). To
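establish (11.19), one simply has to use the definition of the covariant derivative. The
first term is
$$\nabla_\lambda\delta\Gamma^\lambda_{\mu\nu} = \partial_\lambda\delta\Gamma^\lambda_{\mu\nu} + \Gamma^\lambda_{\lambda\rho}\delta\Gamma^\rho_{\mu\nu} - \Gamma^\rho_{\lambda\mu}\delta\Gamma^\lambda_{\rho\nu} - \Gamma^\rho_{\lambda\nu}\delta\Gamma^\lambda_{\mu\rho}\,, \qquad (11.20)$$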
which takes care of the first, fourth, fifth and sixth terms of (11.17). The remaining
terms are
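$$-\partial_\nu\delta\Gamma^\lambda_{\mu\lambda} + \delta\Gamma^\lambda_{\lambda\rho}\Gamma^\rho_{\nu\mu} = -\nabla_\nu\delta\Gamma^\lambda_{\mu\lambda}\,, \qquad (11.21)$$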
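which establishes (11.19). Now what we really need is $g^{\mu\nu}\delta R_{\mu\nu}$. Contracting (11.19) with $g^{\mu\nu}$
and using $\nabla_\lambda g^{\mu\nu} = 0$, we can write this rather neatly and compactly as a total covariant derivative,
$$g^{\mu\nu}\delta R_{\mu\nu} = \nabla_\lambda J^\lambda\,, \qquad J^\lambda = g^{\mu\nu}\delta\Gamma^\lambda_{\mu\nu} - g^{\mu\lambda}\delta\Gamma^\nu_{\nu\mu}\,, \qquad (11.24)$$
and, using the explicit expression (11.18) for $\delta\Gamma^\mu_{\nu\lambda}$ given above, the current can also be written as
$J^\lambda = (g^{\lambda\alpha}g^{\mu\beta} - g^{\lambda\mu}g^{\alpha\beta})\nabla_\mu\delta g_{\alpha\beta}$. Therefore
$$\int\sqrt{g}\,d^4x\;g^{\mu\nu}\delta R_{\mu\nu} = \int d^4x\;\partial_\lambda\big(\sqrt{g}\,J^\lambda\big) = 0$$
(since $J^\lambda$ is constructed from the variations of the metric which, by assumption, vanish
at infinity), as we wanted to show.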
In order to obtain the non-vacuum Einstein equations, we need to decide what the matter
Lagrangian should be. Now there is an obvious choice for this. If we have matter, then
in addition to the Einstein equations we also want the equations of motion for the matter
fields. Therefore we should add to the Einstein-Hilbert action the standard minimally
coupled matter action
$$S_M[\phi, g_{\alpha\beta}] = \int\sqrt{g}\,d^4x\;\mathcal{L}_M\big(\phi(x), \partial_\lambda\phi(x), \ldots; g_{\mu\nu}(x), \partial_\lambda g_{\mu\nu}(x), \ldots\big)\,, \qquad (11.26)$$
where φ represents any kind of (scalar, vector, tensor, ...) matter field, and the action is obtained by suitable
covariantisation of the corresponding matter action in Minkowski space via the principle
of minimal coupling (section 5). Thus, e.g., the matter action for a Klein-Gordon field
would be (5.4),
$$S_M[\phi, g_{\alpha\beta}] = \int\sqrt{g}\,d^4x\;\Big[-\tfrac12 g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi - \tfrac12 m^2\phi^2\Big]\,, \qquad (11.27)$$
Of course, the variation of the matter action with respect to the matter fields will give
rise to the covariant equations of motion of the matter fields. But if we want to add
the matter action to the Einstein-Hilbert action and treat the metric as an additional
dynamical variable, then we have to ask what the variation of the matter action with
respect to the metric is. The short answer is: the energy-momentum tensor. Indeed,
even though there are other definitions of the energy-momentum tensor you may know
(defined via Noether’s theorem applied to translations in flat space, for example), this
is the modern, and by far the most useful, definition of the energy-momentum tensor,
namely as the response of the matter action to a variation of the metric.
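In fact, we had already seen in sections 5.3 and 5.4 that this definition (with a judicious
factor of 2),
$$\delta_{\text{metric}}S_M[\phi, g_{\alpha\beta}] = -\tfrac12\int\sqrt{g}\,d^4x\;T_{\mu\nu}\,\delta g^{\mu\nu}\,, \qquad (11.29)$$
or
$$T_{\mu\nu} := -\frac{2}{\sqrt{g}}\,\frac{\delta}{\delta g^{\mu\nu}}\,S_M[\phi, g_{\alpha\beta}]\,, \qquad (11.30)$$
reproduces the known results in the case of a scalar field or Maxwell theory.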
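The definition (11.30) can be checked numerically for the Klein-Gordon action (11.27). The Python sketch below works at a single spacetime point, with hypothetical sample values for $g^{\mu\nu}$, φ, $\partial_\mu\phi$ and m. It differentiates $\sqrt{g}\,\mathcal{L}_M$ with respect to each component of the inverse metric by central finite differences. It then compares $-\frac{2}{\sqrt{g}}\partial(\sqrt{g}\mathcal{L}_M)/\partial g^{\mu\nu}$ with the closed form $\partial_\mu\phi\,\partial_\nu\phi + g_{\mu\nu}\mathcal{L}_M = \partial_\mu\phi\,\partial_\nu\phi - \tfrac12 g_{\mu\nu}(g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi + m^2\phi^2)$. Because this Lagrangian contains no derivatives of the metric, the functional derivative reduces to an ordinary partial derivative (this is the content of (11.34) below). The helpers `det` and `inverse` are plain Gaussian elimination, written out only to keep the example dependency-free.

```python
import random

random.seed(1)
D = 4


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return result


def inverse(m):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


# Hypothetical sample point: inverse metric (Minkowski plus a small symmetric
# perturbation), field value, field gradient d_mu phi and mass.
eta = [[-1.0 if a == b == 0 else (1.0 if a == b else 0.0) for b in range(D)] for a in range(D)]
pert = [[0.1 * random.uniform(-1, 1) for _ in range(D)] for _ in range(D)]
ginv = [[eta[a][b] + pert[a][b] + pert[b][a] for b in range(D)] for a in range(D)]
phi, mass = 0.7, 1.3
dphi = [random.uniform(-1, 1) for _ in range(D)]


def lagrangian(gi):
    """Klein-Gordon Lagrangian (11.27) for a given inverse metric gi."""
    kin = sum(gi[a][b] * dphi[a] * dphi[b] for a in range(D) for b in range(D))
    return -0.5 * kin - 0.5 * mass**2 * phi**2


def density(gi):
    """sqrt(g) * L, using |det g_{mu nu}| = 1 / |det g^{mu nu}|."""
    return abs(det(gi)) ** -0.5 * lagrangian(gi)


h = 1e-5
sqrtg = abs(det(ginv)) ** -0.5
g = inverse(ginv)  # g_{mu nu}
L0 = lagrangian(ginv)
max_err = 0.0
for mu in range(D):
    for nu in range(D):
        # Perturb the single entry g^{mu nu} (entries treated as independent).
        plus = [row[:] for row in ginv]
        minus = [row[:] for row in ginv]
        plus[mu][nu] += h
        minus[mu][nu] -= h
        t_numeric = -2.0 / sqrtg * (density(plus) - density(minus)) / (2 * h)
        t_closed = dphi[mu] * dphi[nu] + g[mu][nu] * L0
        max_err = max(max_err, abs(t_numeric - t_closed))

print(f"max |T_numeric - T_closed| = {max_err:.1e}")
```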
One of the many advantages of this definition is that it automatically gives a symmetric
and gauge invariant tensor (no improvement terms or similar gymnastics required) which
is also automatically covariantly conserved (on-shell, i.e. for matter fields satisfying their
Euler-Lagrange equations of motion). We will establish this latter fact below - it is
simply a consequence of the general covariance of SM .
2. If one were to try to deduce the gravitational field equations by starting from a
variational principle, i.e. by constructing the simplest generally covariant action
for the metric and the matter fields (and this would be the modern approach to
the problem, had Einstein not already solved it for us some 100 years ago), then one
would also invariably be led to the above action.
The relative numerical factor 16πGN between the two terms would of course then
not be fixed a priori, because this approach will not (and cannot possibly be
expected to) determine Newton’s constant. The prefactor could once again be
determined by looking at the Newtonian limit of the resulting equations of motion.
3. Typically, the above action principle will lead to a very complicated coupled system
of equations for the metric and the matter fields because the metric also appears
in the energy-momentum tensor and in the equations of motion for the matter
fields.
5. When the minimally coupled matter Lagrangian depends only on the metric and
not on the first derivatives of the metric (i.e. not on the Christoffel symbols),
as in the case of scalar or Maxwell gauge fields, then more explicitly the covariant
energy-momentum tensor can be written as (and calculated from)
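$$T_{\mu\nu}(x) = -\frac{2}{\sqrt{g}}\,\frac{\partial\big(\sqrt{g}\,\mathcal{L}_M(x)\big)}{\partial g^{\mu\nu}(x)} = -2\,\frac{\partial\mathcal{L}_M(x)}{\partial g^{\mu\nu}(x)} + g_{\mu\nu}(x)\,\mathcal{L}_M(x) \qquad (11.34)$$
or
$$T^{\mu\nu}(x) = 2\,\frac{\partial\mathcal{L}_M(x)}{\partial g_{\mu\nu}(x)} + g^{\mu\nu}(x)\,\mathcal{L}_M(x) \qquad (11.35)$$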
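Here the sign change is due to the fact that $\delta g^{\mu\nu}$ is not the same as $g^{\mu\lambda}g^{\nu\rho}\delta g_{\lambda\rho}$,
but rather minus this expression,
$$0 = \delta(g^{\mu\nu}g_{\nu\lambda}) = (\delta g^{\mu\nu})\,g_{\nu\lambda} + g^{\mu\rho}\,\delta g_{\rho\lambda} \quad\Leftrightarrow\quad \delta g^{\mu\nu} = -g^{\mu\lambda}g^{\nu\rho}\,\delta g_{\lambda\rho}\,. \qquad (11.37)$$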
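The sign in (11.37) is easy to get wrong, so here is a small numerical illustration in Python. It uses a hypothetical random metric and variation direction, and `inverse` is a plain Gauss-Jordan helper. The sketch varies $g_{\mu\nu}$ along a fixed symmetric direction $\delta g_{\mu\nu}$ and differentiates the inverse matrix by central differences. It then compares the result with $-g^{\mu\lambda}g^{\nu\rho}\delta g_{\lambda\rho}$.

```python
import random

random.seed(2)
D = 4


def inverse(m):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def random_symmetric(scale):
    """Random symmetric D x D matrix with entries of size <= scale."""
    s = [[0.0] * D for _ in range(D)]
    for a in range(D):
        for b in range(a, D):
            s[a][b] = s[b][a] = scale * random.uniform(-1, 1)
    return s


eta = [[-1.0 if a == b == 0 else (1.0 if a == b else 0.0) for b in range(D)] for a in range(D)]
pert = random_symmetric(0.1)
g = [[eta[a][b] + pert[a][b] for b in range(D)] for a in range(D)]  # g_{mu nu}
dg = random_symmetric(1.0)  # direction of the variation delta g_{mu nu}
ginv = inverse(g)

eps = 1e-6
inv_plus = inverse([[g[a][b] + eps * dg[a][b] for b in range(D)] for a in range(D)])
inv_minus = inverse([[g[a][b] - eps * dg[a][b] for b in range(D)] for a in range(D)])

max_err = 0.0
for mu in range(D):
    for nu in range(D):
        numeric = (inv_plus[mu][nu] - inv_minus[mu][nu]) / (2 * eps)
        # (11.37): delta g^{mu nu} = - g^{mu lam} g^{nu rho} delta g_{lam rho}
        predicted = -sum(ginv[mu][lam] * ginv[nu][rho] * dg[lam][rho]
                         for lam in range(D) for rho in range(D))
        max_err = max(max_err, abs(numeric - predicted))

print(f"max deviation from (11.37): {max_err:.1e}")
```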
6. When the minimally coupled matter action depends also on the first derivatives
of the metric, through the covariant derivative ∇µ ψ of some (non-scalar) field ψ,
say, by the usual rules of variational calculus there will be additional contributions
to the energy-momentum tensor, arising from an integration by parts of
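$$\int\sqrt{g}\,d^4x\;\Big(-2\,\frac{\partial\mathcal{L}_M(x)}{\partial\nabla_\mu\psi(x)}\Big)\,(\delta\nabla_\mu\psi)(x)\,,$$
where $\delta(\nabla_\mu\psi) = (\delta\nabla_\mu)\psi$ denotes the variation of the covariant derivative induced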
by the metric-variation (11.18) of the Christoffel symbols. The precise form of the
resulting contribution to the energy-momentum tensor depends on the tensorial
type of ψ, and it is unedifying to attempt to write down a general formula for this.
The variational (i.e. action or Lagrangian based) formulation of general relativity has a
number of significant conceptual and technical advantages, and we will explore some of
them in this section.
• I mentioned before, in section 10.5, that it is no accident that the Bianchi identities
come to the rescue of the general covariance of the Einstein equations in the sense
that they reduce the number of independent equations from ten to six. We will now
see that indeed the Bianchi identities are a consequence of the general covariance
of the Einstein-Hilbert action.
• Virtually the same calculation will show that the covariant (metric, Hilbert)
energy-momentum tensor, as defined above, is automatically conserved (on shell)
by virtue of the general covariance of the matter action.
• Simple variants of these arguments will also provide us with the Noether currents
associated with the general covariance of the Einstein-Hilbert action.
We know from (11.6) that the variation of the Einstein-Hilbert action is
$$\delta S_{EH} = \int\sqrt{g}\,d^4x\;G_{\mu\nu}\,\delta g^{\mu\nu}$$
for any metric variation (modulo total derivative terms). We also know that the
Einstein-Hilbert action is invariant under coordinate transformations. In particular,
therefore, the above variation should be identically zero (or a total derivative)
for variations of the metric induced by an infinitesimal coordinate transformation.
But we know from the discussion of the Lie derivative that such a variation is of
the form
$$\delta_\xi g_{\mu\nu} = L_\xi g_{\mu\nu} = \nabla_\mu\xi_\nu + \nabla_\nu\xi_\mu\,, \qquad (11.39)$$
or
$$\delta_\xi g^{\mu\nu} = L_\xi g^{\mu\nu} = -(\nabla^\mu\xi^\nu + \nabla^\nu\xi^\mu)\,, \qquad (11.40)$$
where the vector field ξ is the infinitesimal generator of the coordinate transforma-
tion. Let us consider vector fields for which any total derivative terms vanish on
the boundary (we will come back to the more general case below). Then $\delta_\xi S_{EH}$
should be identically zero. Calculating this we find
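$$\begin{aligned} 0 = \delta_\xi S_{EH} &= -\int\sqrt{g}\,d^4x\;G_{\mu\nu}\big(\nabla^\mu\xi^\nu + \nabla^\nu\xi^\mu\big)\\ &= -2\int\sqrt{g}\,d^4x\;G_{\mu\nu}\nabla^\mu\xi^\nu = 2\int\sqrt{g}\,d^4x\;(\nabla^\mu G_{\mu\nu})\,\xi^\nu\,. \end{aligned} \qquad (11.41)$$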
Since this has to hold for all ξ (vanishing on the boundary), we deduce
$$\nabla^\mu G_{\mu\nu} = 0\,,$$
and, as promised and anticipated, the contracted Bianchi identities are intimately
related to general covariance because they are a consequence of the general covariance
of the Einstein-Hilbert action.
On the other hand, by explicitly performing this variation, the bulk term is identically
zero by the contracted Bianchi identity, but we obtain one total derivative
term from (11.23), and a second total derivative term
from the integration by parts performed in the course of the calculation in (11.41).
The term involving the scalar curvature is identical to, and cancels against, the
scalar curvature term arising from (11.44). One is thus left with the statement
that for any vector field ξ µ and any integration domain one has
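$$\int\sqrt{g}\,d^4x\;\nabla_\mu\Big[-2R^\mu{}_\nu\,\xi^\nu + \big(g^{\mu\alpha}g^{\nu\beta} - g^{\mu\nu}g^{\alpha\beta}\big)\nabla_\nu\big(\nabla_\alpha\xi_\beta + \nabla_\beta\xi_\alpha\big)\Big] = 0\,. \qquad (11.47)$$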
This is the identity already mentioned in section 9.5. We thus learn that the generalised Komar
currents of that section are indeed, as anticipated there, precisely the identically
conserved Noether currents associated to the general covariance of the Einstein-
Hilbert action.
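The identity (11.47) can also be verified directly, which makes its identically conserved nature explicit. The sketch below relies on three ingredients: the contracted Bianchi identity, $L_\xi g^{\mu\nu} = -(\nabla^\mu\xi^\nu + \nabla^\nu\xi^\mu)$ from (11.40), and the form $g^{\mu\nu}\delta R_{\mu\nu} = \nabla_\mu\big[(g^{\mu\alpha}g^{\nu\beta} - g^{\mu\nu}g^{\alpha\beta})\nabla_\nu\delta g_{\alpha\beta}\big]$. The last of these follows from (11.18) and (11.19) for an arbitrary metric variation.

```latex
\begin{align}
  g^{\mu\nu}\,\delta_\xi R_{\mu\nu} = g^{\mu\nu}L_\xi R_{\mu\nu}
     &= L_\xi R - (L_\xi g^{\mu\nu})R_{\mu\nu}
      = \xi^\nu\nabla_\nu R + 2R_{\mu\nu}\nabla^\mu\xi^\nu\,,\\
  2\nabla_\mu\big(R^\mu{}_\nu\,\xi^\nu\big)
     &= 2(\nabla_\mu R^\mu{}_\nu)\,\xi^\nu + 2R_{\mu\nu}\nabla^\mu\xi^\nu
      = \xi^\nu\nabla_\nu R + 2R_{\mu\nu}\nabla^\mu\xi^\nu\,,\\
  g^{\mu\nu}\,\delta_\xi R_{\mu\nu}
     &= \nabla_\mu\big[(g^{\mu\alpha}g^{\nu\beta} - g^{\mu\nu}g^{\alpha\beta})
        \nabla_\nu(\nabla_\alpha\xi_\beta + \nabla_\beta\xi_\alpha)\big]\,.
\end{align}
```

Equating the last line with the first two gives (11.47) for any ξ, without using any field equations.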
Note that, as mentioned above, it is a general feature of Noether currents associated
to local (gauge) symmetries that they are in fact identically conserved:
the current $J^\mu(\xi)$ (or its counterpart for some local symmetry of another theory)
cannot possibly be conserved for all possible $\xi^\mu(x)$ unless it is actually identically
conserved.¹² In particular, this implies that the conserved charges associated with
these currents can always be expressed as surface integrals.
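Similarly, consider the invariance of the matter action under infinitesimal coordinate transformations
(again for vector fields ξ for which the boundary terms vanish). Proceeding as before, and using the
definition (11.29) of the energy-momentum tensor, we find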
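$$\begin{aligned} 0 = \delta_\xi S_M &= \int\sqrt{g}\,d^4x\;\Big(-\tfrac12 T_{\mu\nu}\,\delta_\xi g^{\mu\nu} + \frac{\delta\mathcal{L}_M}{\delta\phi}\,\delta_\xi\phi\Big)\\ &= -\int\sqrt{g}\,d^4x\;(\nabla^\mu T_{\mu\nu})\,\xi^\nu + \int\sqrt{g}\,d^4x\;\frac{\delta\mathcal{L}_M}{\delta\phi}\,\delta_\xi\phi\,. \end{aligned} \qquad (11.50)$$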
Now once again this has to hold for all ξ, and as the second term vanishes
‘on-shell’, i.e. for φ satisfying the matter Euler-Lagrange equations of motion
$\delta\mathcal{L}_M/\delta\phi = 0$, we deduce that
$$\nabla^\mu T_{\mu\nu} = 0 \quad\text{(on-shell)}\,. \qquad (11.51)$$
This should be contrasted with the contracted Bianchi identities, which are valid
‘off-shell’.
We now look at some of the implications of general covariance of the matter action
when taking into account boundary terms. This turns out to be surprisingly rewarding,
surprising because of the lack of obvious significance (not to be confused with “obvious
lack of significance”; their significance is explained and illustrated e.g. in the references
given in footnote 12) of the identically conserved Noether currents obtained from the
gravitational Einstein-Hilbert action in this case.
As we will see, this will lead us to a relation between the covariant energy-momentum
tensor (as defined here, via the metric-variation of the action) and the canonical energy-
momentum tensor (deduced from the translation-invariance of Poincaré-invariant field
theories in Minkowski space via Noether’s theorem).
On general grounds we know that, under the transformation $\delta_\xi = L_\xi$, the action transforms as
$$\delta_\xi\int\sqrt{g}\,d^4x\;\mathcal{L}_M = \int\sqrt{g}\,d^4x\;\nabla_\mu\big(\xi^\mu\mathcal{L}_M\big)\,. \qquad (11.53)$$
On the other hand, explicitly performing the variation, one finds
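$$\delta_\xi S_M = \int\sqrt{g}\,d^4x\;\Big[\frac{\delta_\xi\sqrt{g}}{\sqrt{g}}\,\mathcal{L}_M + \frac{\partial\mathcal{L}_M}{\partial g^{\alpha\beta}}\,\delta_\xi g^{\alpha\beta} + \frac{\partial\mathcal{L}_M}{\partial\phi}\,\delta_\xi\phi + \frac{\partial\mathcal{L}_M}{\partial(\partial_\nu\phi)}\,\delta_\xi(\partial_\nu\phi)\Big] \qquad (11.54)$$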
Using the formula (6.25) for the Lie derivative of a covector, one sees that
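$$\begin{aligned} \delta_\xi\partial_\nu\phi = L_\xi\partial_\nu\phi &= \xi^\mu\partial_\mu\partial_\nu\phi + (\partial_\nu\xi^\mu)\,\partial_\mu\phi\\ &= \partial_\nu(\xi^\mu\partial_\mu\phi) = \partial_\nu L_\xi\phi = \partial_\nu\delta_\xi\phi \end{aligned} \qquad (11.55)$$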
(this is one of the simplifications brought about by considering only scalars). Then,
performing the usual integration by parts, keeping track of the boundary term, and
combining (11.53) and (11.54), one finds
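$$\int\sqrt{g}\,d^4x\;T_{\alpha\beta}\nabla^\alpha\xi^\beta + \int\sqrt{g}\,d^4x\;\frac{\delta\mathcal{L}_M}{\delta\phi}\,\delta_\xi\phi = \int\sqrt{g}\,d^4x\;\nabla_\alpha\big(\Theta^\alpha{}_\beta\,\xi^\beta\big)\,, \qquad (11.56)$$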
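where
$$\Theta^\alpha{}_\beta = \delta^\alpha_\beta\,\mathcal{L}_M - \frac{\partial\mathcal{L}_M}{\partial(\partial_\alpha\phi)}\,\partial_\beta\phi \qquad (11.57)$$
is the usual canonical Noether energy-momentum tensor. Since this has to hold for all
$\xi^\alpha$, we deduce
$$T_{\alpha\beta}\nabla^\alpha\xi^\beta + \frac{\delta\mathcal{L}_M}{\delta\phi}\,(\partial_\beta\phi)\,\xi^\beta = \nabla_\alpha\big(\Theta^\alpha{}_\beta\,\xi^\beta\big)\,, \qquad (11.58)$$
or
$$\big(T^\alpha{}_\beta - \Theta^\alpha{}_\beta\big)\nabla_\alpha\xi^\beta = \Big(\nabla_\alpha\Theta^\alpha{}_\beta - \frac{\delta\mathcal{L}_M}{\delta\phi}\,\partial_\beta\phi\Big)\xi^\beta\,. \qquad (11.59)$$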
If this is to be valid for arbitrary $\xi^\beta$, the coefficients of $\xi^\beta$ and $\nabla_\alpha\xi^\beta$ have to vanish
separately (at the origin of an inertial coordinate system these would be the coefficients
of the obviously functionally independent functions $\xi^\beta$ and $\partial_\alpha\xi^\beta$). This has the following
implications:
1. First of all, we learn that (in the case of scalar fields) the canonical and covariant
energy-momentum tensors agree (off-shell),
$$\Theta^\alpha{}_\beta = T^\alpha{}_\beta\,. \qquad (11.60)$$
This is of course something that we already checked explicitly in section 5.3, but
it is reassuring to see this drop out of the general formalism as well.
2. We also learn that the latter (and hence the former) is conserved if the equations
of motion of the matter fields are satisfied,
$$\frac{\delta\mathcal{L}_M}{\delta\phi} = 0 \quad\Rightarrow\quad \nabla_\alpha\Theta^\alpha{}_\beta = \nabla_\alpha T^\alpha{}_\beta = 0\,. \qquad (11.61)$$
This could also have been deduced by integrating (11.58) over a
domain (with the $\xi^\alpha$ chosen to vanish on the boundary) and integrating by parts;
this is just a repetition of the argument that previously led us to (11.51).
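3. We can also see directly from (11.58) that for $\xi^\alpha$ a Killing vector, the Noether
current
$$J^\alpha_N(\xi) = \Theta^\alpha{}_\beta\,\xi^\beta \qquad (11.62)$$
is on-shell conserved,
$$\nabla_{(\alpha}\xi_{\beta)} = 0 \;\text{ and }\; \frac{\delta\mathcal{L}_M}{\delta\phi} = 0 \quad\Rightarrow\quad \nabla_\alpha J^\alpha_N(\xi) = 0\,. \qquad (11.63)$$
These are the conserved currents previously discussed in section 7.1.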
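Relation (11.60) can be illustrated numerically for the Klein-Gordon Lagrangian (11.27) at a single point. The Python sketch below uses hypothetical sample values for $g^{\mu\nu}$, φ, $\partial_\mu\phi$ and m. It obtains $\partial\mathcal{L}_M/\partial(\partial_\alpha\phi)$ by central finite differences and inserts it into (11.57). The covariant tensor is taken in the closed form $T^\alpha{}_\beta = g^{\alpha\gamma}\partial_\gamma\phi\,\partial_\beta\phi + \delta^\alpha_\beta\mathcal{L}_M$, which follows from (11.34).

```python
import random

random.seed(3)
D = 4

eta = [[-1.0 if a == b == 0 else (1.0 if a == b else 0.0) for b in range(D)] for a in range(D)]
pert = [[0.1 * random.uniform(-1, 1) for _ in range(D)] for _ in range(D)]
ginv = [[eta[a][b] + pert[a][b] + pert[b][a] for b in range(D)] for a in range(D)]
phi, mass = 0.4, 0.9
dphi = [random.uniform(-1, 1) for _ in range(D)]


def lagrangian(grad):
    """Klein-Gordon Lagrangian (11.27) as a function of the gradient d_mu phi."""
    kin = sum(ginv[a][b] * grad[a] * grad[b] for a in range(D) for b in range(D))
    return -0.5 * kin - 0.5 * mass**2 * phi**2


h = 1e-6
L0 = lagrangian(dphi)
dL = []  # dL/d(d_alpha phi), by central differences
for alpha in range(D):
    up = dphi[:]
    down = dphi[:]
    up[alpha] += h
    down[alpha] -= h
    dL.append((lagrangian(up) - lagrangian(down)) / (2 * h))

max_err = 0.0
for alpha in range(D):
    for beta in range(D):
        delta = 1.0 if alpha == beta else 0.0
        theta = delta * L0 - dL[alpha] * dphi[beta]  # canonical tensor (11.57)
        raised = sum(ginv[alpha][c] * dphi[c] for c in range(D))  # d^alpha phi
        t_cov = raised * dphi[beta] + delta * L0  # T^alpha_beta from (11.34)
        max_err = max(max_err, abs(theta - t_cov))

print(f"max |Theta - T| = {max_err:.1e}")
```

For a scalar field the agreement holds off-shell, as (11.60) states: no equation of motion is imposed on the sample values.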
In general, i.e. when one does not restrict to scalar fields, the above chain of reasoning
leads to the (on-shell) identification of the covariant energy-momentum tensor with
a suitable covariantisation of the Belinfante symmetric improvement of the canonical
Noether energy-momentum tensor.¹³
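• In Minkowski space, the canonical Noether energy-momentum tensor is conserved on-shell,
$$\partial_\mu\Theta^\mu{}_\nu = 0 \qquad (11.65)$$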
• In general (with the exception of spin zero scalar fields), $\Theta_{\mu\nu} = \eta_{\mu\lambda}\Theta^\lambda{}_\nu$ is not
symmetric,
$$\Theta_{\mu\nu} \neq \Theta_{\nu\mu} \qquad (11.68)$$
¹³ The literature on the Noether theorem and general covariance is littered with confused or at best
confusing articles. Recent, reasonably clear and to-the-point discussions of these and/or closely related
issues can be found in R. Gamboa Saravi, On the energy-momentum tensor, arXiv:math-ph/0306020,
and M. Holman, Generalized Noether Theorems for Field Theories Formulated in Minkowski Spacetime,
arXiv:1009.1803 [gr-qc].
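• As a consequence, the would-be (orbital) angular momentum current built from Θ is not conserved by
itself, and the conserved angular momentum current also contains a spin contribution,
$$J^\mu = J^\mu_{\text{orbit}} + \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}\,\delta_L\phi\,, \qquad J^{\mu[\nu\lambda]}_{\text{orbit}} = L^{\mu\nu\lambda}\,. \qquad (11.71)$$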
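From the conservation of this current one can then deduce an improved (on-shell
symmetric and conserved) energy-momentum tensor.¹⁴ This improved tensor
turns out to differ from the canonical energy-momentum tensor by an identically
conserved term, the Belinfante improvement term,
$$\hat\Theta^{\mu\nu} = \Theta^{\mu\nu} + \partial_\lambda\Psi^{\lambda\mu\nu}\,,$$
with
$$\Psi^{\lambda\mu\nu} = -\Psi^{\mu\lambda\nu} \quad\Rightarrow\quad \partial_\mu\partial_\lambda\Psi^{\lambda\mu\nu} = 0\,, \qquad (11.73)$$
and
$$\hat\Theta^{\mu\nu} = \hat\Theta^{\nu\mu} \quad\Leftrightarrow\quad \partial_\mu J^{\mu\nu\lambda} \equiv \partial_\mu\big(x^\nu\hat\Theta^{\mu\lambda} - x^\lambda\hat\Theta^{\mu\nu}\big) = 0\,. \qquad (11.74)$$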
Note that the spin-contribution to the total angular momentum has in this way
been transformed into an orbital contribution with respect to the new energy-
momentum tensor Θ̂µν .
• This improved canonical energy-momentum tensor turns out to agree on-shell with
the covariant Minkowski energy-momentum tensor $T^{(\eta)}_{\mu\nu}$,
$$\hat\Theta_{\mu\nu} = T^{(\eta)}_{\mu\nu} \quad\text{on-shell} \qquad (11.75)$$
¹⁴ This procedure is explained in many places. For a detailed explanation, geared also towards
applications to general relativity, see section 2 of T. Ortin, Gravity and Strings; for a succinct description,
and an extension of the usual procedure to Lagrangians depending also on second derivatives of the
fields, see section II of D. Bak, D. Cangemi, R. Jackiw, Energy-Momentum Conservation in General
Relativity, arXiv:hep-th/9310025.
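(the off-shell agreement between the canonical and covariant tensors in (11.60) is
again a special feature of scalar fields). Here $T^{(\eta)}_{\mu\nu}$ is defined by temporarily minimally coupling
the theory to a background metric $g_{\alpha\beta}$, performing the variation (11.30) with
respect to the metric, and then setting $g_{\alpha\beta} = \eta_{\alpha\beta}$ again,
$$T^{(\eta)}_{\mu\nu} = -\frac{2}{\sqrt{g}}\,\frac{\delta}{\delta g^{\mu\nu}}\,S_M[\phi, g_{\alpha\beta}]\bigg|_{g_{\alpha\beta}=\eta_{\alpha\beta}} = -2\,\frac{\delta S_M}{\delta g^{\mu\nu}}\bigg|_{g_{\alpha\beta}=\eta_{\alpha\beta}}\,. \qquad (11.76)$$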
• The energy-momentum tensor $T^{(\eta)}_{\mu\nu}$ defined in this way is automatically off-shell
symmetric and on-shell conserved (and, in particular, automatically takes into account
the orbital and spin contributions to the total conserved angular momentum of a
Poincaré-invariant field theory).
It is typically (some consistency issues I will not go into here may arise for fields with
spin > 1) possible to directly promote the Minkowskian improved (symmetric and on-
shell conserved) canonical energy-momentum tensor Θ̂µν to a symmetric and on-shell
covariantly conserved energy-momentum tensor in a curved background (by minimal
coupling), and this is how we proceeded in the case of Maxwell theory in section 5.4.
(because of curvature terms, i.e. because one would have to be able to commute
covariant derivatives on the fields in order to establish that the tensor is covariantly
conserved, and this is typically not possible for fields of spin higher than zero).
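• Thus there seems to be no good reason in the first place to add this term to
the non-conserved Θµν. Nevertheless, it turns out that the sum of these two
contributions,
$$\hat\Theta^{\mu\nu} = \Theta^{\mu\nu} + \nabla_\lambda\Psi^{\lambda\mu\nu}\,, \qquad (11.81)$$
is indeed on-shell conserved and symmetric,
$$\nabla_\mu\hat\Theta^{\mu\nu} = 0\,, \qquad \hat\Theta^{\mu\nu} = \hat\Theta^{\nu\mu}\,.$$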
• This can be demystified somewhat by deriving this result from Noether’s theorem
applied to coordinate transformations (generalising the calculation done for scalar
fields at the beginning of this section by taking into account the non-trivial
transformation behaviour of higher-rank tensor fields under coordinate transformations
- see e.g. the references given in footnote 13).
• Ultimately, however, what this proof shows is that these properties hold for Θ̂µν
because this tensor is equal to the covariant energy-momentum tensor on-shell,
the latter being (on-shell) covariantly conserved and (off-shell) symmetric by
construction, and requiring absolutely no Belinfante-like gymnastics and improvement
terms for its construction.
These general features of the relation between the covariant and Belinfante-improved
canonical energy-momentum tensors in Minkowski space and curved space-times are
well and representatively illustrated by the example of Maxwell theory, already touched
upon in section 5.4:
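• The term that should be added to rectify this and to turn ∂νAλ into Fνλ is
$$F_{\lambda\mu}\,\partial^\lambda A_\nu = \partial^\lambda\big(F_{\lambda\mu}A_\nu\big) - \big(\partial^\lambda F_{\lambda\mu}\big)A_\nu\,.$$
Here the first term is identically conserved and is the Belinfante correction term,
with $\Psi^{\lambda\mu\nu} = F^{\lambda\mu}A^\nu$, while the second term vanishes on-shell.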
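• As a consequence,
$$\hat\Theta_{\mu\nu} = \Theta_{\mu\nu} + \partial^\lambda\big(F_{\lambda\mu}A_\nu\big) = F_{\mu\lambda}F_\nu{}^\lambda + \eta_{\mu\nu}\mathcal{L} - \big(\partial^\lambda F_{\mu\lambda}\big)A_\nu \qquad (11.87)$$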
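agrees on-shell with the standard (covariant, symmetric and gauge-invariant)
energy-momentum tensor
$$T^{(\eta)}_{\mu\nu} = F_{\mu\lambda}F_\nu{}^\lambda - \tfrac14\eta_{\mu\nu}F_{\lambda\sigma}F^{\lambda\sigma} \qquad (11.88)$$
of Maxwell theory,
$$\hat\Theta_{\mu\nu} = T^{(\eta)}_{\mu\nu} \quad\text{on-shell} \qquad (11.89)$$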
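• As shown in section 5.4, the minimally coupled extension $T_{\mu\nu}$ of $T^{(\eta)}_{\mu\nu}$ to a curved
background agrees with the covariant energy-momentum tensor, defined by the
variation of the Maxwell action with respect to the metric,
$$T_{\mu\nu} = F_{\mu\lambda}F_\nu{}^\lambda - \tfrac14 g_{\mu\nu}F_{\lambda\sigma}F^{\lambda\sigma} = -\frac{2}{\sqrt{g}}\,\frac{\delta S[A_\alpha, g_{\alpha\beta}]}{\delta g^{\mu\nu}}\,. \qquad (11.90)$$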
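Similarly, (11.90) can be checked numerically at a single point. The Python sketch below uses a hypothetical sample inverse metric and field strength $F_{\mu\nu}$ and evaluates $\sqrt{g}\,\mathcal{L}$ with $\mathcal{L} = -\tfrac14 g^{\mu\alpha}g^{\nu\beta}F_{\mu\nu}F_{\alpha\beta}$. It holds $F_{\mu\nu}$, i.e. $A_\mu$, fixed while the metric varies. It differentiates with respect to the components of $g^{\mu\nu}$ by central differences. It then compares $-\frac{2}{\sqrt{g}}\partial(\sqrt{g}\mathcal{L})/\partial g^{\mu\nu}$ with $F_{\mu\lambda}F_\nu{}^\lambda - \tfrac14 g_{\mu\nu}F_{\lambda\sigma}F^{\lambda\sigma}$. The helpers `det` and `inverse` are plain Gaussian-elimination routines.

```python
import random

random.seed(4)
D = 4
R4 = range(D)


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    result = 1.0
    for col in R4:
        piv = max(range(col, D), key=lambda r: abs(a[r][col]))
        if piv != col:
            a[col], a[piv] = a[piv], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, D):
            f = a[r][col] / a[col][col]
            for c in range(col, D):
                a[r][c] -= f * a[col][c]
    return result


def inverse(m):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    a = [list(row) + [1.0 if i == j else 0.0 for j in R4] for i, row in enumerate(m)]
    for col in R4:
        piv = max(range(col, D), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in R4:
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[D:] for row in a]


# Hypothetical sample point: inverse metric and antisymmetric F_{mu nu}.
eta = [[-1.0 if a == b == 0 else (1.0 if a == b else 0.0) for b in R4] for a in R4]
pert = [[0.1 * random.uniform(-1, 1) for _ in R4] for _ in R4]
ginv = [[eta[a][b] + pert[a][b] + pert[b][a] for b in R4] for a in R4]
F = [[0.0] * D for _ in R4]
for m in R4:
    for n in range(m + 1, D):
        F[m][n] = random.uniform(-1, 1)
        F[n][m] = -F[m][n]


def lagrangian(gi):
    """Maxwell Lagrangian -1/4 g^{ma} g^{nb} F_{mn} F_{ab} (F held fixed)."""
    return -0.25 * sum(gi[m][a] * gi[n][b] * F[m][n] * F[a][b]
                       for m in R4 for n in R4 for a in R4 for b in R4)


def density(gi):
    """sqrt(g) * L, using |det g_{mu nu}| = 1 / |det g^{mu nu}|."""
    return abs(det(gi)) ** -0.5 * lagrangian(gi)


h = 1e-5
sqrtg = abs(det(ginv)) ** -0.5
g = inverse(ginv)  # g_{mu nu}
L0 = lagrangian(ginv)  # equals -1/4 F_{ls} F^{ls}
max_err = 0.0
for mu in R4:
    for nu in R4:
        plus = [row[:] for row in ginv]
        minus = [row[:] for row in ginv]
        plus[mu][nu] += h
        minus[mu][nu] -= h
        t_numeric = -2.0 / sqrtg * (density(plus) - density(minus)) / (2 * h)
        # F_{mu lam} F_nu^lam with F_nu^lam = g^{lam kap} F_{nu kap}
        ff = sum(F[mu][lam] * ginv[lam][kap] * F[nu][kap] for lam in R4 for kap in R4)
        t_closed = ff + g[mu][nu] * L0
        max_err = max(max_err, abs(t_numeric - t_closed))

print(f"max |T_numeric - T_closed| = {max_err:.1e}")
```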
For me, the upshot of this long discussion is (but please draw your own conclusions)
that the covariant definition of the energy-momentum tensor appears to be superior,
both conceptually and calculationally, to other definitions, because it is more general,
more concise and to the point, and easier to work with. It should therefore also be the
definition of choice even if one is just interested in Poincaré-invariant field theories in
Minkowski space, in particular as it completely side-steps the issue of having to construct
the Belinfante improvement terms to the canonical energy-momentum tensor.
Moreover, crucially for the present context and regardless of this entire discussion,
whatever the virtues of other definitions may be, from the variational principle for
general relativity it is the covariant energy-momentum tensor that plays the role of the
source term for the Einstein equations.
The key result and insight of the previous section, expressed in equations (11.75) and
(11.83), is that the improved canonical Noether energy-momentum tensor Θ̂µν is actually
equal on-shell to the (on-shell) covariantly conserved and (off-shell) symmetric covariant
energy-momentum tensor Tµν .
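The canonical energy-momentum tensor is $\Theta^\alpha{}_\beta = (\Theta_s)^\alpha{}_\beta + (\Theta_m)^\alpha{}_\beta + (\Theta_a)^\alpha{}_\beta$
(the sum of the contributions from the scalar, Maxwell and axion actions), while
the covariant energy-momentum tensor has the form
$$T^\alpha{}_\beta = (T_s)^\alpha{}_\beta + (T_m)^\alpha{}_\beta \qquad (11.98)$$
because the axionic term does not couple to the metric. The scalar energy-momentum
tensors are equal on the nose (i.e. off-shell, without improvement terms),
$$(\Theta_s)^\alpha{}_\beta = (T_s)^\alpha{}_\beta\,, \qquad (11.99)$$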
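so I will not discuss them further. The canonical Maxwell energy-momentum
tensor is
$$(\Theta_m)^\alpha{}_\beta = F^{\alpha\gamma}\partial_\beta A_\gamma - \tfrac14\delta^\alpha_\beta F^{\gamma\delta}F_{\gamma\delta}\,, \qquad (11.100)$$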
and the canonical axion energy-momentum tensor is
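As we know, for pure Maxwell theory one would proceed and argue as follows:
one writes $(\Theta_m)^\alpha{}_\beta$ as
$$(\Theta_m)^\alpha{}_\beta = (T_m)^\alpha{}_\beta + \partial_\gamma\big(F^{\alpha\gamma}A_\beta\big) - \big(\partial_\gamma F^{\alpha\gamma}\big)A_\beta\,.$$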
The 1st term on the right-hand side is the covariant (symmetric, gauge invariant,
metric) energy-momentum tensor, the 2nd is identically conserved, and the 3rd
is zero on-shell, so that on-shell the canonical and covariant energy-momentum
tensors agree modulo identically conserved terms that do not contribute to surface
integrals etc. In the present (non-vacuum) case, one needs to take into account
the non-trivial equation of motion
$$\partial_\alpha\big(F^{\alpha\beta} + f(\phi)\tilde F^{\alpha\beta}\big) = 0\,, \qquad (11.103)$$
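and the axionic energy-momentum tensor $(\Theta_a)^\alpha{}_\beta$. Doing this one finds
$$\begin{aligned}
(\Theta_m)^\alpha{}_\beta + (\Theta_a)^\alpha{}_\beta ={}& (T_m)^\alpha{}_\beta && \text{(the term one expects)}\\
&+ \partial_\gamma\big[(F^{\alpha\gamma} + f(\phi)\tilde F^{\alpha\gamma})A_\beta\big] && \text{(identically conserved)}\\
&+ \big[\partial_\gamma(F^{\gamma\alpha} + f(\phi)\tilde F^{\gamma\alpha})\big]A_\beta && \text{(on-shell zero)}\\
&+ f(\phi)\Big[\tilde F^{\alpha\gamma}F_{\beta\gamma} - \tfrac14\delta^\alpha_\beta\tilde F^{\gamma\delta}F_{\gamma\delta}\Big] && \text{(???)}
\end{aligned} \qquad (11.104)$$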
At first sight the last term appears to spoil the claimed relation between the
canonical and covariant energy-momentum tensors. However, either by explicit
calculation in components or from the more covariant argument given below, one
finds that the term in square brackets in the last line,
$$(\tilde T_m)^\alpha{}_\beta := \tilde F^{\alpha\gamma}F_{\beta\gamma} - \tfrac14\delta^\alpha_\beta\tilde F^{\gamma\delta}F_{\gamma\delta}\,, \qquad (11.105)$$
is actually identically zero (for any antisymmetric $F_{\alpha\beta}$ and its dual),
$$(\tilde T_m)^\alpha{}_\beta \equiv 0\,. \qquad (11.106)$$
Therefore one concludes that, as it should be, the canonical and covariant energy-
momentum tensors are indeed equal on-shell modulo identically conserved terms.
Here is a general proof that $(\tilde T_m)^\alpha{}_\beta = 0$ identically:
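• The first term of $(\tilde T_m)^\alpha{}_\beta$ can be written as
$$\tilde F^{\alpha\gamma}F_{\beta\gamma} = \tfrac12\epsilon^{\alpha\gamma\mu\nu}F_{\mu\nu}F_{\beta\gamma} = -\tfrac12\epsilon^{\alpha\gamma\mu\nu}F_{\mu\nu}F_{\gamma\beta} = -\tfrac12\epsilon^{\alpha\gamma\mu\nu}F_{[\mu\nu}F_{\gamma]\beta} = -\tfrac12\epsilon^{\alpha\gamma\mu\nu}F_{[\mu\nu}F_{\gamma\beta]} \qquad (11.107)$$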
because antisymmetry of $F_{\gamma\beta}$ implies that $F_{[\mu\nu}F_{\gamma]\beta}$ is already totally antisymmetric
in all its 4 indices,
with g determined by
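• It follows that the first term of $(\tilde T_m)^\alpha{}_\beta$ is proportional to $\delta^\alpha_\beta$ (the constant of
proportionality being fixed by taking the trace), which is equivalent to the statement that
$(\tilde T_m)^\alpha{}_\beta \equiv 0$. (qed)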
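The identity (11.106) is purely algebraic. With the indices placed as in (11.105) it involves no metric, so it is easy to test numerically. The Python sketch below takes a random antisymmetric $F_{\mu\nu}$ and a numerical Levi-Civita symbol with $\epsilon^{0123} = +1$; both choices are assumptions for illustration. It builds $\tilde F^{\alpha\gamma} = \tfrac12\epsilon^{\alpha\gamma\mu\nu}F_{\mu\nu}$ and checks that $(\tilde T_m)^\alpha{}_\beta$ vanishes to rounding accuracy, while $\tilde F^{\gamma\delta}F_{\gamma\delta}$ itself is generically non-zero.

```python
import random

random.seed(5)
N = 4


def levi_civita(idx):
    """Numerical symbol with eps^{0123} = +1: permutation sign, 0 on repeated indices."""
    if len(set(idx)) < len(idx):
        return 0
    inversions = sum(1 for i in range(len(idx)) for j in range(i + 1, len(idx))
                     if idx[i] > idx[j])
    return -1 if inversions % 2 else 1


# Hypothetical sample field strength: random antisymmetric F_{mu nu}.
F = [[0.0] * N for _ in range(N)]
for m in range(N):
    for n in range(m + 1, N):
        F[m][n] = random.uniform(-1.0, 1.0)
        F[n][m] = -F[m][n]

# Dual F~^{ab} = 1/2 eps^{abmn} F_{mn}
Fdual = [[0.5 * sum(levi_civita((a, b, m, n)) * F[m][n] for m in range(N) for n in range(N))
          for b in range(N)] for a in range(N)]

# F~^{cd} F_{cd} and (T~_m)^a_b from (11.105)
invariant = sum(Fdual[c][d] * F[c][d] for c in range(N) for d in range(N))
Ttilde = [[sum(Fdual[a][c] * F[b][c] for c in range(N)) - (0.25 * invariant if a == b else 0.0)
           for b in range(N)] for a in range(N)]
max_err = max(abs(x) for row in Ttilde for x in row)
print(f"F~F = {invariant:.3f}, max |T~_m| = {max_err:.1e}")
```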
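with
$$(\Theta_m)^\alpha{}_\beta = (T_m)^\alpha{}_\beta + \partial_\gamma\big(F^{\alpha\gamma}A_\beta\big) - \big(\partial_\gamma F^{\alpha\gamma}\big)A_\beta\,, \qquad (11.113)$$
as usual, and
$$(\Theta_{cs})^\alpha{}_\beta = k\,\delta^\alpha_\beta\,\mathcal{L}_{cs} + k\,\epsilon^{\alpha\delta\gamma}A_\delta\,\partial_\beta A_\gamma\,. \qquad (11.114)$$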
The covariant energy-momentum tensor, on the other hand, is just the Maxwell tensor
$(T_m)^\alpha{}_\beta$,
$$T^\alpha{}_\beta = (T_m)^\alpha{}_\beta\,, \qquad (11.115)$$
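Thus for the sum of the Maxwell and CS contributions one has the result
$$\begin{aligned}
(\Theta_m)^\alpha{}_\beta + (\Theta_{cs})^\alpha{}_\beta ={}& (T_m)^\alpha{}_\beta && \text{(the term one expects)}\\
&+ \partial_\gamma\big[(F^{\alpha\gamma} - k\,\epsilon^{\delta\alpha\gamma}A_\delta)A_\beta\big] && \text{(identically conserved)}\\
&+ \big[\partial_\gamma F^{\gamma\alpha} + k\,\epsilon^{\alpha\gamma\delta}F_{\gamma\delta}\big]A_\beta && \text{(on-shell zero)}\\
&+ \tilde T^\alpha{}_\beta\,,
\end{aligned} \qquad (11.117)$$
with
$$\tilde T^\alpha{}_\beta = k\,\delta^\alpha_\beta\,\mathcal{L}_{cs} + \tfrac12 k\,\epsilon^{\delta\alpha\gamma}\big(A_\gamma F_{\beta\delta} + A_\delta F_{\gamma\beta} + A_\beta F_{\delta\gamma}\big)\,. \qquad (11.118)$$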
By an argument analogous to that in the previous example, one can show that
$$\tilde T^\alpha{}_\beta \equiv 0 \qquad (11.119)$$
identically:
• one has
• thus
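$$\tilde T^\alpha{}_\beta = k\,\delta^\alpha_\beta\,\mathcal{L}_{cs} + \tfrac32 k\,\epsilon^{\delta\alpha\gamma}A_{[\gamma}F_{\beta\delta]} = k\,\delta^\alpha_\beta\,\mathcal{L}_{cs} - \tfrac12 k\,\epsilon^{\delta\alpha\gamma}\epsilon_{\gamma\beta\delta}\,\mathcal{L}_{cs} = 0\,. \qquad (11.122)$$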
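The three-dimensional identity (11.119) can be tested in the same way. This Python sketch assumes, consistently with (11.114), the normalisation $\mathcal{L}_{cs} = \epsilon^{\mu\nu\rho}A_\mu\partial_\nu A_\rho$. For $F = dA$ this equals $\tfrac12\epsilon^{\mu\nu\rho}A_\mu F_{\nu\rho}$, which is the form used below. The sketch uses the numerical symbol $\epsilon^{012} = +1$ and random sample values for $A_\mu$, an antisymmetric $F_{\mu\nu}$ and the coupling k, all hypothetical, and then evaluates (11.118) componentwise. Since (11.118) involves $A_\mu$ and $F_{\mu\nu}$ only algebraically, F need not come from an actual potential for this check.

```python
import random

random.seed(6)
N = 3
k = 0.8  # hypothetical value of the Chern-Simons coupling


def levi_civita(idx):
    """Numerical symbol with eps^{012} = +1: permutation sign, 0 on repeated indices."""
    if len(set(idx)) < len(idx):
        return 0
    inversions = sum(1 for i in range(len(idx)) for j in range(i + 1, len(idx))
                     if idx[i] > idx[j])
    return -1 if inversions % 2 else 1


A = [random.uniform(-1.0, 1.0) for _ in range(N)]
F = [[0.0] * N for _ in range(N)]
for m in range(N):
    for n in range(m + 1, N):
        F[m][n] = random.uniform(-1.0, 1.0)
        F[n][m] = -F[m][n]

# Assumed normalisation: L_cs = 1/2 eps^{mnr} A_m F_{nr}  (= eps^{mnr} A_m d_n A_r for F = dA)
L_cs = 0.5 * sum(levi_civita((m, n, r)) * A[m] * F[n][r]
                 for m in range(N) for n in range(N) for r in range(N))


def ttilde(a, b):
    """(T~)^a_b from (11.118), with the numerical symbol eps^{012} = +1."""
    s = sum(levi_civita((d, a, c)) * (A[c] * F[b][d] + A[d] * F[c][b] + A[b] * F[d][c])
            for d in range(N) for c in range(N))
    return (k * L_cs if a == b else 0.0) + 0.5 * k * s


max_err = max(abs(ttilde(a, b)) for a in range(N) for b in range(N))
print(f"L_cs = {L_cs:.3f}, max |T~| = {max_err:.1e}")
```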
These two examples evidently show a common and general pattern, and it is now easy to
generalise this to topological or quasi-topological interactions in higher dimensions. But
this is not necessary - after all, one can prove in complete generality that the canonical
and covariant energy-momentum tensors are equal on-shell modulo identically conserved
terms, and the present examples just serve to illustrate how this works in a non-trivial
situation.
11.6 Comments on Gravitational Energy
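As an aside, let me briefly return to the covariant conservation law $\nabla_\mu T^{\mu\nu} = 0$ for the
matter energy-momentum tensor which played such a key role above, and which we now
write more explicitly as the non-conservation law (cf. (5.79))
$$\partial_\mu\big(\sqrt{g}\,T^{\mu\nu}\big) = -\sqrt{g}\,\Gamma^\nu_{\mu\lambda}T^{\mu\lambda}\,. \qquad (11.123)$$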
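The step from $\nabla_\mu T^{\mu\nu} = 0$ to the non-conservation law (11.123) uses only the contracted Christoffel symbol $\Gamma^\mu_{\mu\lambda} = \partial_\lambda\log\sqrt{g}$. As a brief sketch:

```latex
\begin{align}
  \nabla_\mu T^{\mu\nu}
    &= \partial_\mu T^{\mu\nu} + \Gamma^\mu_{\mu\lambda}T^{\lambda\nu} + \Gamma^\nu_{\mu\lambda}T^{\mu\lambda}
     = \frac{1}{\sqrt{g}}\,\partial_\mu\big(\sqrt{g}\,T^{\mu\nu}\big) + \Gamma^\nu_{\mu\lambda}T^{\mu\lambda}\,.
\end{align}
```

Setting this to zero and multiplying by $\sqrt{g}$ gives (11.123). The second term, which plays the role of a gravitational force density, is not a tensor because the Christoffel symbols are not.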
This is treacherous territory and I cannot guarantee that even these elementary remarks
are widely considered to be uncontroversial (in fact, the number of uncontroversial
statements one can make about this subject is probably quite small - but is likely to
include this parenthetical remark . . . ).
1. One’s first thought may be that, in precise analogy with the definition of the
covariant matter energy-momentum tensor, the gravitational energy-momentum
tensor should be defined in terms of the variation of the gravitational (Einstein-
Hilbert) action with respect to the metric, or, equivalently, that the total
energy-momentum tensor should be defined as the variation of the total (Einstein-Hilbert
+ matter) action with respect to the metric. While this seems to be the logical
thing to do,
$$T^{\text{tot}}_{\mu\nu} \overset{?}{=} -\frac{2}{\sqrt{g}}\,\frac{\delta}{\delta g^{\mu\nu}}\Big(\frac{1}{16\pi G_N}S_{EH}[g_{\alpha\beta}] + S_M[\phi, g_{\alpha\beta}]\Big) = -\frac{1}{8\pi G_N}\,G_{\mu\nu} + T_{\mu\nu}\,, \qquad (11.124)$$
it is evidently not useful because, by the variational principle for general relativity,
this total energy-momentum tensor vanishes identically for any solution to the
Einstein equations.
Whatever the poetic or philosophical virtues of such a definition might thus be
(‘everything from nothing’, ‘the total energy of the universe is zero’), it is clearly
deficient in many other respects and does not provide a whole lot of insight. In
particular, with this definition one would assign zero gravitational energy to any
source-free region of space-time, i.e. to vacuum gravitational fields (like gravitational
waves, the exterior of a star etc.). Evidently, therefore, this definition fails
to capture some essential aspects of and contributions to what one would commonly
refer to as gravitational potential energy.
2. Let us therefore return to (11.123). Note that the would-be gravitational force
density term on the right-hand side of (11.123) is non-tensorial. It is often argued
that this non-tensoriality reflects the equivalence principle and is therefore also a
necessary, profound, and characteristic feature of any potential definition of the
energy density and/or the energy-momentum ‘tensor’ of the gravitational field.
The argument for this runs along the lines of
• “by the equivalence principle one can always go to an inertial, i.e. freely
falling, reference frame in which, along the worldline of an observer (Fermi
coordinates), the effects of gravity are absent”
• “in such a frame the energy-momentum ‘tensor’ (or energy density) of the
gravitational field should therefore be zero”
• “if the gravitational energy-momentum ‘tensor’ were a true tensor, it would
then have to be zero in all reference frames”
• “thus a non-trivial gravitational energy-momentum ‘tensor’ cannot be ten-
sorial”
With the metric gµν playing the role of φ in general relativity, one might there-
fore expect to be able to write the gravitational energy density as an expression
quadratic in the first derivatives of the metric or, equivalently, quadratic in the
Christoffel symbols. However, there is no scalar or tensor that one can build
only from the metric and its first derivatives and thus, the reasoning goes, what-
ever generalises the Newtonian notion of gravitational energy density cannot be
tensorial.
It is a curious and apparently under-appreciated fact that already the Newtonian
theory exhibits an analogous behaviour, i.e. that even in the Newtonian theory the
notion of gravitational energy density at a point is strictly speaking ambiguous.15
This is essentially a consequence of the equality of gravitational and inertial mass,
the precursor of the equivalence principle, which allows one to replace

$$
\vec\nabla\phi \to \vec\nabla\phi + \vec a = \vec\nabla(\phi + \vec a\cdot\vec x) \qquad (11.125)
$$

for any constant acceleration vector $\vec a$. One can thus always make the force
and the potential energy density vanish at a point $x_0$ by choosing $\vec a = -\vec\nabla\phi(x_0)$
(i.e. by going to a freely falling reference frame, as in our first discussion of the
equivalence principle in section 1.1). This local ambiguity of the gravitational
potential energy density can be (and is usually implicitly) fixed by invoking (non-
locally) the required or expected asymptotic behaviour of the gravitational field
(e.g. that it goes to zero asymptotically).
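The Newtonian ambiguity (11.125) can be made concrete numerically. The following Python sketch (the point mass, the sample points and all function names are illustrative assumptions) computes the magnitude |∇φ|²/(8πG_N) of the Newtonian field energy density by central differences, adds the harmonic term a·x with a = −∇φ(x₀), and shows that the density is removed at x₀ but not at another point at the same distance.

```python
import math

# Illustrative values (assumptions, SI units): Newton's constant and a point mass.
G_N = 6.674e-11   # m^3 kg^-1 s^-2
M = 5.972e24      # kg (roughly the Earth's mass)


def phi(x):
    """Newtonian potential -G_N M / |x| of a point mass at the origin."""
    r = math.sqrt(sum(c * c for c in x))
    return -G_N * M / r


def grad(f, x, h=1.0):
    """Central-difference gradient of a scalar function f at the point x."""
    g = []
    for i in range(3):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


def energy_density(f, x):
    """Magnitude |grad f|^2 / (8 pi G_N) of the Newtonian field energy density."""
    return sum(gi * gi for gi in grad(f, x)) / (8 * math.pi * G_N)


x0 = [7.0e6, 0.0, 0.0]                 # point at which we go to a freely falling frame
a = [-gi for gi in grad(phi, x0)]      # constant acceleration a = -grad(phi)(x0)


def phi_shifted(x):
    """phi + a.x as in (11.125); a.x is harmonic, so Poisson's equation is unchanged."""
    return phi(x) + sum(ai * xi for ai, xi in zip(a, x))


x1 = [0.0, 7.0e6, 0.0]                 # a different point at the same distance
print(energy_density(phi, x0))          # non-zero
print(energy_density(phi_shifted, x0))  # zero up to rounding: removed at x0
print(energy_density(phi_shifted, x1))  # non-zero: the shift only helps locally
```

At x₁ the shifted field is actually stronger than the original one, which illustrates why the ambiguity has to be fixed non-locally, e.g. by the asymptotic behaviour of the field.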
It is this local ambiguity that sets the gravitational potential energy density apart
from (formally similar) quantities like the electrostatic potential energy density.
This suggests that perhaps this non-localisability or non-tensoriality of the grav-
itational potential energy density is an intrinsic property of this quantity, thus
simply an inevitable feature and not a bug.
Again this argument may sound compelling but is not without loopholes. For
example, an integration by parts would put the Newtonian gravitational potential
energy density into the form φ∆φ, and following the argument in section 10.3
(extended to quadratic order in the fluctuations of the metric) it is certainly
conceivable that one could come up with some tensorial generalisations of this expression
(involving second derivatives of the metric, i.e. contractions of the Riemann tensor).
However, then one needs to investigate whether such a candidate expression has
other desired or desirable properties, beyond having the “right” Newtonian limit,
in order to qualify as a candidate for the gravitational energy density or energy-
momentum tensor.
15
For a lucid recent discussion of this issue and its ramifications in general relativity, see J.
Frauendiener, L. Szabados, A note on the post-Newtonian limit of quasi-local energy expressions,
arXiv:1102.1867 [gr-qc].
In some way, this situation reflects the tension of having to choose between working
with
• either a covariant action, but involving second derivatives of the metric (the
Einstein-Hilbert action (11.2)),
• or a non-covariant action, but nicely quadratic in the first derivatives of the
metric (the Einstein action (11.12)).
Such observations and considerations (and a dislike of pseudo-tensorial objects) have led
to the realisation that the notion of local energy density of the gravitational field at a
point may not be particularly useful or meaningful in general relativity.
2-sphere ‘at infinity’ (ADM charges - for a very brief discussion of the ADM energy see
section 16.4). Problems with the potential non-tensoriality of the boundary integrand
can in those cases be avoided e.g. by appealing to an inertial Minkowskian observer
at infinity and only requiring tensorial behaviour under Lorentz transformations at
infinity.
Perhaps the most interesting question is therefore whether one can do a bit better than that
and talk in a meaningful way not only about the total energy contained in the total
spatial volume of a space-time, but also somewhat more locally (quasi-locally) about the
total energy (including the effects of gravity) in some finite localised volume of space-
time.17 In spite of the problems with strictly local expressions for the energy-density,
numerous candidates for such quasi-local expressions for the energy have indeed been
proposed and debated in the literature, and have been argued to provide reasonable
definitions of gravitational energy in specific circumstances.18 It may turn out that
a single “best”, universally accepted definition of quasi-local gravitational energy
eventually emerges from these investigations. However, at present
it seems at least equally likely that no such universal “best” definition exists and that
the “right” answer depends on the specific context and question one is asking.
Recall from sections 4.8, 4.9 and 8.10 that a priori a metric gµν and a connection Γ̃λµν are
independent concepts, and that the notion of curvature (curvature and Ricci tensors)
can be defined for an arbitrary connection,
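$$
R^\lambda{}_{\sigma\mu\nu}(\tilde\Gamma) = \partial_\mu\tilde\Gamma^\lambda{}_{\sigma\nu} - \partial_\nu\tilde\Gamma^\lambda{}_{\sigma\mu} + \tilde\Gamma^\lambda{}_{\mu\rho}\tilde\Gamma^\rho{}_{\nu\sigma} - \tilde\Gamma^\lambda{}_{\nu\rho}\tilde\Gamma^\rho{}_{\mu\sigma} , \qquad R_{\mu\nu}(\tilde\Gamma) = R^\lambda{}_{\mu\lambda\nu}(\tilde\Gamma) . \qquad (11.127)
$$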
General relativity employs and is formulated in terms of the canonical Levi-Civita con-
nection described by the Christoffel symbols Γ̃λµν = Γλµν , characterised by the fact
that the connection is compatible with the metric and has no torsion. It is thus easy
to come up with various generalisations of general relativity in which these require-
ments are relaxed.19 We will not get into these matters here. However, it is a curious,
17
See P. McGrath, R. Epp, R. Mann, Quasilocal Conservation Laws: Why We Need Them,
arXiv:1206.6512 [gr-qc] for a particularly illuminating discussion of the shortcomings of strictly
local definitions to account for the effects of accelerating reference frames in special relativity (and
thus, by the equivalence principle, of gravity) and how these can be overcome by adopting a quasi-local
perspective.
18
For an excellent detailed review of the attempts to define a notion of local or
quasi-local energy density of the gravitational field, see L. Szabados, Quasi-Local Energy-
Momentum and Angular Momentum in General Relativity, Living Rev. Relativity 12, (2009), 4;
http://www.livingreviews.org/lrr-2009-4.
19
Many of these (including theories with non-symmetric metrics) were originally explored by Einstein
and his collaborators in their futile and ill-motivated attempts to find a unified field theory of gravity and
Maxwell theory. For details, see e.g. the review H. Goenner, On the History of Unified Field Theories,
Living Rev. Relativity 7 (2004) 2, http://www.livingreviews.org/lrr-2004-2.
and occasionally calculationally or conceptually useful, fact that it is possible to relax
somewhat the a priori identification of the connection with the Levi-Civita connection
and nevertheless reproduce general relativity by treating the connection and metric as
independent variables.
for a (yet to be specified) class of connections Γ̃λµν, with gµν and Γ̃λµν to be treated as
independent variables. Since Rµν(Γ̃) then does not depend on the metric, the characteristic
feature and key simplification of this kind of action is that the variation with
respect to the metric is elementary (and identical to the variation of the $\sqrt{g}\,g^{\mu\nu}$ terms
of the Einstein-Hilbert Lagrangian density $\sqrt{g}\,g^{\mu\nu}R_{\mu\nu}$), namely

$$
\delta_g S[g_{\mu\nu},\tilde\Gamma^\lambda{}_{\mu\nu}] = \int \sqrt{g}\,d^4x\,\left(R_{\mu\nu}(\tilde\Gamma) - \tfrac12 g_{\mu\nu}R(\tilde\Gamma)\right)\delta g^{\mu\nu} . \qquad (11.129)
$$
These are, however, not yet the vacuum Einstein equations because the independent
connection Γ̃ is not the Levi-Civita connection.
It remains to look at the equations of motion imposed by stationarity of the action with
respect to variations of Γ̃. It turns out (the Palatini principle) that
2. if one chooses the connections to be compatible with the metric and imposes the
Γ̃-equations of motion, then the connections are forced to also be torsion-free and
thus Γ̃ is uniquely determined to be the Levi-Civita connection.
In terms of the notation introduced in section 4.9, this amounts to the assertions
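$$
\begin{aligned}
T_{\lambda\mu\nu}(\tilde\Gamma) = 0 \;\text{ and }\; \delta_{\tilde\Gamma}S[g,\tilde\Gamma] = 0 &\;\Rightarrow\; Q_{\lambda\mu\nu}(\tilde\Gamma) = 0 \;\Rightarrow\; \tilde\Gamma^\lambda{}_{\mu\nu} = \Gamma^\lambda{}_{\mu\nu} \\
Q_{\lambda\mu\nu}(\tilde\Gamma) = 0 \;\text{ and }\; \delta_{\tilde\Gamma}S[g,\tilde\Gamma] = 0 &\;\Rightarrow\; T_{\lambda\mu\nu}(\tilde\Gamma) = 0 \;\Rightarrow\; \tilde\Gamma^\lambda{}_{\mu\nu} = \Gamma^\lambda{}_{\mu\nu}
\end{aligned} \qquad (11.131)
$$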
In either case, the metric equations of motion (11.130) then reduce to the vacuum
Einstein equations.
In order to establish the assertions (11.131), we need two preparatory results. The first
is that the generalisation of the formula (11.19),
for the variation of the Ricci tensor in terms of the variation of the connection is
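$$
\delta R_{\mu\nu}(\tilde\Gamma) = \tilde\nabla_\lambda\,\delta\tilde\Gamma^\lambda{}_{\mu\nu} - \tilde\nabla_\nu\,\delta\tilde\Gamma^\lambda{}_{\mu\lambda} + \left(\tilde\Gamma^\rho{}_{\nu\lambda} - \tilde\Gamma^\rho{}_{\lambda\nu}\right)\delta\tilde\Gamma^\lambda{}_{\mu\rho} . \qquad (11.133)
$$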
The second is that when the connection is not the Levi-Civita connection, an expression
like $\tilde\nabla_\lambda J^\lambda$ is not a total derivative in the integral, this being only true for the Levi-Civita
connection thanks to the identity

$$
\int \sqrt{g}\,d^4x\,\nabla_\lambda J^\lambda = \int d^4x\,\partial_\lambda\bigl(\sqrt{g}\,J^\lambda\bigr) . \qquad (11.134)
$$
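Writing

$$
\tilde\Gamma^\lambda{}_{\mu\nu} = \Gamma^\lambda{}_{\mu\nu} + C^\lambda{}_{\mu\nu} , \qquad (11.135)
$$

we have

$$
\tilde\nabla_\lambda V^\mu = \nabla_\lambda V^\mu + C^\mu{}_{\nu\lambda}V^\nu \qquad (11.136)
$$

and, in particular,

$$
\tilde\nabla_\lambda J^\lambda = \nabla_\lambda J^\lambda + C^\lambda{}_{\nu\lambda}J^\nu , \qquad (11.137)
$$

with only the first term on the right-hand side giving rise to a total derivative.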
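The two preparatory statements (11.134) and (11.137) can be spelled out in a couple of lines. The following is a sketch that assumes the index convention implicit in (11.136), i.e. ∇̃_λ V^µ = ∂_λ V^µ + Γ̃^µ_{νλ} V^ν, together with the standard Levi-Civita trace identity Γ^λ_{νλ} = ∂_ν ln √g; it shows why the C-term survives integration while the ∇-term does not.

```latex
\begin{aligned}
\tilde\nabla_\lambda J^\lambda
  &= \partial_\lambda J^\lambda + \tilde\Gamma^\lambda{}_{\nu\lambda} J^\nu
   = \underbrace{\partial_\lambda J^\lambda + \Gamma^\lambda{}_{\nu\lambda} J^\nu}_{\nabla_\lambda J^\lambda}
     + C^\lambda{}_{\nu\lambda} J^\nu ,\\
\sqrt{g}\,\nabla_\lambda J^\lambda
  &= \sqrt{g}\,\partial_\lambda J^\lambda + \bigl(\partial_\nu \sqrt{g}\bigr) J^\nu
   = \partial_\lambda\bigl(\sqrt{g}\,J^\lambda\bigr),
   \qquad \text{using } \Gamma^\lambda{}_{\nu\lambda} = \partial_\nu \ln\sqrt{g} .
\end{aligned}
```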
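What we are interested in is $g^{\mu\nu}\delta R_{\mu\nu}(\tilde\Gamma)$, and with the above results and notation we
can write this as

$$
g^{\mu\nu}\delta R_{\mu\nu}(\tilde\Gamma) = g^{\mu\nu}\left(C^\lambda{}_{\rho\lambda}\,\delta C^\rho{}_{\mu\nu} - C^\rho{}_{\mu\lambda}\,\delta C^\lambda{}_{\rho\nu} - C^\rho{}_{\lambda\nu}\,\delta C^\lambda{}_{\mu\rho} + C^\rho{}_{\mu\nu}\,\delta C^\lambda{}_{\rho\lambda}\right) + \text{total derivative terms} \qquad (11.138)
$$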
If we were to consider arbitrary Γ̃ and hence unconstrained variations δCγαβ , the con-
dition for the action to be stationary with respect to variations of Γ̃ would be
However, these equations do not determine the C αβγ uniquely, and hence in this case the
Einstein-Hilbert-like action (11.128) alone does not give rise to acceptable equations of
motion for the fields.
The situation changes if one imposes some a priori constraints on the allowed Γ̃, and
hence on their variations δCγαβ . We now consider separately the two cases mentioned
above:
1. Γ̃ are restricted to be symmetric (torsion-free)
In terms of the coefficients $C^\gamma{}_{\alpha\beta}$, this amounts to the condition
Taking traces, once by contraction with gαβ and once by contraction with gαγ (or,
equivalently, with gβγ ), one obtains two linearly independent conditions on the
traces C λγ λ and C αλ λ requiring both to vanish,
traces ⇒ C λγ λ = 0 , C αλ λ = 0 . (11.144)
But as we have seen in (4.84) of section 4.9, this is precisely the condition that
the non-metricity tensor is zero,
Thus
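$$
C_{\gamma\alpha\beta} = C_{\gamma(\alpha\beta)} + C_{\gamma[\alpha\beta]} = T_{(\alpha\beta)\gamma} + \tfrac12 T_{\gamma\alpha\beta} \qquad (11.149)
$$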
is the contorsion tensor. In this case, antisymmetrisation of (11.139) leads to
Taking traces, one finds Cλγ λ = 0 (the other trace C λλγ is identically zero because
of anti-symmetry), and thus (11.150) reduces to
Since we started off with a metric-compatible connection, this means that the equations
of motion fix the connection Γ̃ to be the Levi-Civita connection. Alternatively,
(11.147) and (11.151) imply that Cγαβ = 0. This concludes the proof of
the second assertion in (11.131).
Remarks:
1. When one couples either of these theories to matter, one will find the standard
Einstein equations with source the usual matter energy-momentum tensor, pro-
vided that the minimally coupled matter action depends only on the metric and
not on the connection. As we have seen, this is satisfied in the case of scalar or
Maxwell gauge fields (for which the minimally coupled action in the usual setting
could be written in such a way that it depends only on the metric but not on the
derivatives of the metric). However, typically the connection appears explicitly in
the action for spinors, and in this case variation of the matter action will produce
a non-zero contribution to the torsion, say. Thus in that case the Einstein-Hilbert
approach (no torsion as a kinematical constraint) and the Palatini approach (tor-
sion determined dynamically) are inequivalent.
2. Within the present framework, it is not possible to relax both the no-torsion
and the metricity constraints simultaneously, but one can attempt to achieve this
either with the aid of additional auxiliary fields, or “spontaneously” by adding
potentials that force the connection in the ground state to either zero torsion or
zero non-metricity.20
3. Once one contemplates and permits the presence of non-metricity and/or tor-
sion, there are many more terms that one can in principle use to build an action
(by using scalars built from the torsion and non-metricity tensors Tµνλ and Qµνλ
and their covariant derivatives). Thus, unless one imposes additional symmetry
20
See e.g. R. Percacci, Geometry of Nonlinear Field Theories for an exploration of some of these ideas.
requirements, say, there is no good reason to focus attention exclusively on the
Einstein-Hilbert-like action (11.128), and many generalisations of general relativ-
ity suggested and discussed in the literature can and should be either rejected or
amended simply on these grounds.21
21
In his review of gauge theories of gravity, F. Hehl, Gauge Theories of Gravity and Spacetime,
arXiv:1204.3672 [gr-qc], also emphasises this: “Numerous pages of printed pages could be saved if
our colleagues would [. . . ] just motivate their choice of the unknown constants.” In that review it is
also pointed out that what is generally referred to as (and what I have called) the Palatini formalism
should properly also be attributed to Einstein (1925).
Part II: Basic Applications of General Relativity
In Part II of the course (which can to a large extent be read in parallel to Part I -
see the comments at the very end of section 2), we will now discuss some basic and
fundamental applications of General Relativity. These will include
• a discussion of the classical predictions and tests of General Relativity (the de-
flection of light by the sun and the perihelion shift of Mercury),
The blocks made of sections 12-15, 17-21, and section 16 can be read pretty much
independently of each other.
Miscellaneous other (semi-)advanced topics will be presented in Part III of the notes.
12 The Schwarzschild Metric
12.1 Introduction
3. the anomalous precession of the perihelion of the orbits of Mercury and Venus,
and calculated the theoretical predictions for these effects. In the meantime, other tests
have also been suggested and performed, for example the time delay of radar echoes
passing the sun (the Shapiro effect).
All these tests have in common that they are carried out in empty space, with gravitational
fields that are to an excellent approximation static (time independent) and isotropic
(spherically symmetric). Thus our first aim will have to be to solve the vacuum Einstein
equations under the simplifying assumptions of isotropy and time-independence. This,
as we will see, is indeed not too difficult.
Even though we have decided that we are interested in static spherically symmetric
metrics, we still have to determine what we actually mean by this statement. After
all, a metric which looks time-independent in one coordinate system may not do so in
another coordinate system. There are two ways of approaching this issue:
1. One can try to look for a covariant characterisation of such metrics, in terms of
Killing vectors etc. In the present context, this would amount to considering met-
rics which admit four Killing vectors, one of which is timelike, with the remaining
three representing the Lie algebra of the rotation group SO(3).
2. Or one works with ‘preferred’ coordinates from the outset, in which these symme-
tries are manifest.
While the former approach may be conceptually more satisfactory, the latter is much
easier to work with and is hence the one we will adopt.
It is important to recall and realise once again that, precisely because the theory is in-
variant under coordinate transformations, one is allowed to choose whatever coordinate
system is most convenient to perform a calculation (much like Lorentz invariance in
special relativity allows one to prove Lorentz-invariant statements by proving them in
any suitably chosen inertial frame).
This ansatz, depending on the four functions A(r), B(r), C(r), D(r), can still be simpli-
fied a lot by choosing appropriate new time and radial coordinates.
Then

$$
dT^2 = dt^2 + \psi'^2\,dr^2 + 2\psi'\,dr\,dt . \qquad (12.3)
$$

Thus we can eliminate the off-diagonal term in the metric by choosing ψ to satisfy the
differential equation

$$
\frac{d\psi(r)}{dr} = -\frac{C(r)}{A(r)} . \qquad (12.4)
$$
This is tantamount to making the coordinate choice C(r) = 0.
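It may help to carry out the substitution behind (12.4) explicitly. This sketch assumes that the ansatz (12.2), which is not reproduced here, contains the (t, r) block −A dt² + 2C dt dr + B dr², which is the normalisation of the cross term for which (12.4) comes out as stated, and uses T = t + ψ(r), so that dt = dT − ψ′ dr; the angular part is unaffected.

```latex
\begin{aligned}
-A\,dt^2 + 2C\,dt\,dr + B\,dr^2
 &= -A\,dT^2 + 2\bigl(A\psi' + C\bigr)\,dT\,dr
    + \bigl(B - A\psi'^2 - 2C\psi'\bigr)\,dr^2 ,\\
A\psi' + C = 0
 \;&\Rightarrow\;
 -A\,dT^2 + \Bigl(B + \frac{C^2}{A}\Bigr)\,dr^2 .
\end{aligned}
```

Renaming T back to t and B + C²/A back to B(r) then gives a diagonal (t, r) block.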
We can also eliminate D(r) by introducing a new radial coordinate R(r) by $R^2 = D(r)\,r^2$.
Thus we can assume that the line element of a static isotropic metric is of the form
This is tantamount to making the coordinate choice D(r) = 1, and leads to what is
known as the standard form of a static isotropic metric.
Remarks:
1. We will mostly be using the metric in the standard form (12.5). Let us note some
immediate properties of this metric:
(a) By our ansatz, the components of the metric are time-independent. Because
we have been able to eliminate the dtdr-term, the metric is also invariant
under time-reversal t → −t.
(b) The surfaces of constant t and r have the metric
2. The advantage of the isotropic form of the metric is that one can, as already
indicated in (12.6), replace dr 2 + r 2 dΩ2 by e.g. the standard metric on R3 in
Cartesian coordinates, or any other metric on R3 . This is useful when one likes to
think of the solar system as being essentially described by flat space, with some
choice of coordinates.
3. We will see later on, in section 15.1, that a particularly convenient different gauge
choice is B(r) = D(r) = 1. Thus the metric takes the form
and has the characteristic property that it is non-diagonal while the metric on the
slices of constant t is the flat Euclidean metric.
4. We have assumed from the outset that the metric is static. However, it can be
shown with little effort (see section 12.6) that the vacuum Einstein equations imply
that a spherically symmetric metric is static. This result is known as Birkhoff ’s
theorem. It is the General Relativity analogue of the Newtonian result that a
spherically symmetric body behaves as if all the mass were concentrated in its
center.
5. In the present context it means that the gravitational field not only of a static
spherically symmetric body is static and spherically symmetric (as we have as-
sumed), but that the same is true for a radially oscillating/pulsating object. This
is a bit surprising because one would expect such a body to emit gravitational
radiation. What Birkhoff’s theorem shows is that this radiation cannot escape
into empty space (because otherwise it would destroy the time-independence of
the metric). Translated into the language of waves, this means that there is no
s-wave (monopole) gravitational radiation.
We will come back to other aspects of measurements of space and time in such a geom-
etry after we have solved the Einstein equations.
We will now solve the vacuum Einstein equations for the static isotropic metric in
standard form, i.e. we look for solutions of Rµν = 0 for metrics of the type (12.5). You
should have already (as an exercise) calculated all the Christoffel symbols of this metric,
using the Euler-Lagrange equations for the geodesic equation.
As a reminder, here is how this method works. To calculate all the Christoffel symbols
Γrµν , say, in one go, you look at the Euler Lagrange equation for r = r(τ ) resulting from
the Lagrangian $L = g_{\mu\nu}\dot x^\mu \dot x^\nu/2$. This is easily seen to be

$$
\ddot r + \frac{B'}{2B}\dot r^2 + \frac{A'}{2B}\dot t^2 + \cdots = 0 \qquad (12.10)
$$

(a prime denotes an r-derivative), from which one reads off that $\Gamma^r{}_{rr} = B'/2B$ etc.
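Proceeding in this way, you should find (or have found) that the non-zero Christoffel
symbols are given by

$$
\begin{aligned}
\Gamma^r{}_{rr} &= \frac{B'}{2B} , & \Gamma^r{}_{tt} &= \frac{A'}{2B} , \\
\Gamma^r{}_{\theta\theta} &= -\frac{r}{B} , & \Gamma^r{}_{\phi\phi} &= -\frac{r\sin^2\theta}{B} , \\
\Gamma^\theta{}_{\theta r} &= \Gamma^\phi{}_{\phi r} = \frac{1}{r} , & \Gamma^t{}_{tr} &= \frac{A'}{2A} , \\
\Gamma^\theta{}_{\phi\phi} &= -\sin\theta\cos\theta , & \Gamma^\phi{}_{\phi\theta} &= \cot\theta .
\end{aligned} \qquad (12.11)
$$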
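The list (12.11) can be cross-checked numerically against the textbook formula Γ^λ_{µν} = ½ g^{λσ}(∂_µ g_{σν} + ∂_ν g_{σµ} − ∂_σ g_{µν}). The Python sketch below does this for a diagonal metric with central-difference derivatives; it assumes, purely for the check, the Schwarzschild choice A = 1 − 2m/r, B = 1/A with the illustrative value m = 1 in geometric units, and the function names are hypothetical.

```python
import math

# Geometric units G_N = c = 1; m is an illustrative choice (assumption).
m = 1.0


def metric_diag(x):
    """Diagonal components (g_tt, g_rr, g_thth, g_phph) of
    ds^2 = -A dt^2 + B dr^2 + r^2 dOmega^2 with A = 1 - 2m/r, B = 1/A."""
    t, r, th, ph = x
    A = 1 - 2 * m / r
    return [-A, 1 / A, r * r, (r * math.sin(th)) ** 2]


def christoffel(x, h=1e-5):
    """Gamma^lam_{mu nu} = (1/2) g^{lam lam} (d_mu g_{lam nu} + d_nu g_{lam mu} - d_lam g_{mu nu})
    for a diagonal metric, with central-difference derivatives."""
    g = metric_diag(x)
    dg = []  # dg[k][i] = d_k g_{ii}
    for k in range(4):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        gp, gm = metric_diag(xp), metric_diag(xm)
        dg.append([(gp[i] - gm[i]) / (2 * h) for i in range(4)])
    gam = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
    for lam in range(4):
        for mu in range(4):
            for nu in range(4):
                s = 0.0
                if lam == nu:
                    s += dg[mu][lam]
                if lam == mu:
                    s += dg[nu][lam]
                if mu == nu:
                    s -= dg[lam][mu]
                gam[lam][mu][nu] = 0.5 * s / g[lam]
    return gam


def expected(x):
    """Non-zero symbols of (12.11); indices (lam, mu, nu) with t=0, r=1, theta=2, phi=3."""
    t, r, th, ph = x
    A, Ap = 1 - 2 * m / r, 2 * m / r**2   # A and A'
    B, Bp = 1 / A, -Ap / A**2             # B = 1/A and B'
    return {
        (1, 1, 1): Bp / (2 * B),
        (1, 0, 0): Ap / (2 * B),
        (1, 2, 2): -r / B,
        (1, 3, 3): -r * math.sin(th) ** 2 / B,
        (2, 2, 1): 1 / r,
        (3, 3, 1): 1 / r,
        (0, 0, 1): Ap / (2 * A),
        (2, 3, 3): -math.sin(th) * math.cos(th),
        (3, 3, 2): 1 / math.tan(th),
    }


x = (0.0, 5.0, 1.0, 0.3)
gam = christoffel(x)
for (lam, mu, nu), val in expected(x).items():
    print(lam, mu, nu, gam[lam][mu][nu], val)  # numerical vs (12.11)
```

All components not listed in (12.11), apart from those related by symmetry in the lower indices, come out zero, as claimed.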
Now we need to calculate the Ricci tensor of this metric. A silly way of doing this
would be to blindly calculate all the components of the Riemann tensor and to then
perform all the relevant contractions to obtain the Ricci tensor. A more intelligent and
less time-consuming strategy is the following:
1. Instead of using the explicit formula for the Riemann tensor in terms of Christoffel
symbols, one should use directly its contracted version
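$$
R_{\mu\nu} = R^\lambda{}_{\mu\lambda\nu} = \partial_\lambda\Gamma^\lambda{}_{\mu\nu} - \partial_\nu\Gamma^\lambda{}_{\mu\lambda} + \Gamma^\lambda{}_{\lambda\rho}\Gamma^\rho{}_{\mu\nu} - \Gamma^\lambda{}_{\nu\rho}\Gamma^\rho{}_{\mu\lambda} \qquad (12.12)
$$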
2. The high degree of symmetry of the Schwarzschild metric implies that many com-
ponents of the Ricci tensor are automatically zero. For example, invariance of the
Schwarzschild metric under t → −t implies that Rrt = 0. The argument for this
is simple. Since the metric is invariant under t → −t, the Ricci tensor should also
be invariant. But under the coordinate transformation t → −t, Rrt transforms as
Rrt → −Rrt . Hence, invariance requires Rrt = 0, and no further calculations for
this component of the Ricci tensor are required.
4. Since the Schwarzschild metric is spherically symmetric, its Ricci tensor is also
spherically symmetric. It is easy to prove, by considering the effect of a coordinate
transformation that is a rotation of the two-sphere defined by θ and φ (leaving
the metric invariant), that this implies that
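$$
R_{\theta'\theta'} = \left(\frac{\partial\theta}{\partial\theta'}\right)^2 R_{\theta\theta} + \left(\frac{\partial\phi}{\partial\theta'}\right)^2 R_{\phi\phi} . \qquad (12.17)
$$

Demanding that this be equal to Rθθ (because we are considering a coordinate
transformation which does not change the metric) and using the condition derived
above, one obtains

$$
R_{\theta\theta} = R_{\theta\theta}\left(1 - \sin^2\theta\left(\frac{\partial\phi}{\partial\theta'}\right)^2\right) + \left(\frac{\partial\phi}{\partial\theta'}\right)^2 R_{\phi\phi} , \qquad (12.18)
$$

which implies (12.14).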
5. Thus the only components of the Ricci tensor that we need to compute are Rrr ,
Rtt and Rθθ .
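Now some unenlightening calculations lead to the result that these components of the
Ricci tensor are given by

$$
\begin{aligned}
R_{tt} &= \frac{A''}{2B} - \frac{A'}{4B}\left(\frac{A'}{A} + \frac{B'}{B}\right) + \frac{A'}{rB} \\
R_{rr} &= -\frac{A''}{2A} + \frac{A'}{4A}\left(\frac{A'}{A} + \frac{B'}{B}\right) + \frac{B'}{rB} \\
R_{\theta\theta} &= 1 - \frac{1}{B} - \frac{r}{2B}\left(\frac{A'}{A} - \frac{B'}{B}\right) .
\end{aligned} \qquad (12.19)
$$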
Inspection of these formulae reveals that there is a linear combination which is particularly
simple, namely BRtt + ARrr, which can be written as

$$
B R_{tt} + A R_{rr} = \frac{1}{rB}\left(A'B + B'A\right) . \qquad (12.20)
$$
Plugging this result into the expression for Rθθ , one obtains
To fix C, we compare with the Newtonian limit, which tells us that asymptotically
A(r) = −g00 should approach (temporarily reinserting c) 1 + 2Φ/c², where Φ =
−GN M/r is the Newtonian potential for a static spherically symmetric star of mass M.
Thus C = −2MGN/c², and the final form of the metric is

$$
ds^2 = -\left(1 - \frac{2MG_N}{c^2 r}\right)c^2\,dt^2 + \left(1 - \frac{2MG_N}{c^2 r}\right)^{-1}dr^2 + r^2\,d\Omega^2 \qquad (12.25)
$$
This is the famous Schwarzschild metric, obtained by the astronomer Karl Schwarzschild
in 1916, the very same year that Einstein published his field equations, while he was
serving as a soldier in World War I (and discovered independently a few months later
by Johannes Droste, a student of Lorentz).
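As a consistency check on (12.19), (12.20) and (12.25), one can verify numerically that A = 1 − 2m/r, B = 1/A (geometric units, illustrative m = 1) makes all three Ricci components vanish, and that the combination (12.20) holds for an arbitrary profile. The Python sketch below uses hand-computed derivatives A′, A″, B′; the test profile A = 1 + 0.3r², B = exp(0.2r) is a hypothetical choice with no physical meaning.

```python
import math


def ricci_12_19(r, A, Ap, App, B, Bp):
    """R_tt, R_rr, R_thth of (12.19) for ds^2 = -A dt^2 + B dr^2 + r^2 dOmega^2.
    Ap, App, Bp denote A', A'', B' (derivatives with respect to r)."""
    s = Ap / A + Bp / B
    Rtt = App / (2 * B) - Ap / (4 * B) * s + Ap / (r * B)
    Rrr = -App / (2 * A) + Ap / (4 * A) * s + Bp / (r * B)
    Rthth = 1 - 1 / B - r / (2 * B) * (Ap / A - Bp / B)
    return Rtt, Rrr, Rthth


def schwarzschild_data(r, m):
    """A = 1 - 2m/r and B = 1/A together with the derivatives entering (12.19)."""
    A = 1 - 2 * m / r
    Ap = 2 * m / r**2
    App = -4 * m / r**3
    B = 1 / A
    Bp = -Ap / A**2
    return A, Ap, App, B, Bp


def check_12_20(r):
    """Return both sides of (12.20) for the hypothetical profile A = 1 + 0.3 r^2, B = exp(0.2 r)."""
    A, Ap, App = 1 + 0.3 * r**2, 0.6 * r, 0.6
    B = math.exp(0.2 * r)
    Bp = 0.2 * B
    Rtt, Rrr, _ = ricci_12_19(r, A, Ap, App, B, Bp)
    return B * Rtt + A * Rrr, (Ap * B + Bp * A) / (r * B)


for r in (3.0, 5.0, 10.0, 100.0):
    print(r, ricci_12_19(r, *schwarzschild_data(r, 1.0)))  # all three vanish up to rounding
print(check_12_20(2.0))                                      # the two sides agree
```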
We will usually not write the constant GN explicitly (and set c = 1), and thus we
introduce the abbreviation

$$
m = \frac{G_N M}{c^2} , \qquad (12.26)
$$

in terms of which the Schwarzschild metric takes the form

$$
ds^2 = -f(r)\,dt^2 + f(r)^{-1}dr^2 + r^2\,d\Omega^2 , \qquad f(r) = 1 - \frac{2m}{r} . \qquad (12.27)
$$
The interpretation of m is that of the gravitational mass radius associated to the mass
M. To see that all this is dimensionally correct, note that Newton’s constant has
dimensions (M mass, L length, T time) $[G_N] = \mathrm{M}^{-1}\mathrm{L}^{3}\mathrm{T}^{-2}$, so that
For examples of the value of m for various objects see section 12.4.
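For a quick feeling for the size of m = G_N M/c², here is a small Python sketch. It uses rounded SI values of G_N, c and of the proton, Earth and Sun masses, which are assumptions made here and not figures taken from the notes (section 12.4 gives the author's own values).

```python
# Rounded SI values of the constants and masses (assumptions, not taken from the notes).
G_N = 6.674e-11   # m^3 kg^-1 s^-2
c = 2.998e8       # m s^-1

MASSES_KG = {"proton": 1.673e-27, "Earth": 5.972e24, "Sun": 1.989e30}


def mass_radius(M):
    """m = G_N M / c^2 of (12.26), in metres.
    Dimensions: [G_N M / c^2] = (M^-1 L^3 T^-2)(M) / (L^2 T^-2) = L."""
    return G_N * M / c**2


for name, M in MASSES_KG.items():
    m = mass_radius(M)
    print(f"{name:>6}: m = {m:.3e} m, r_s = 2m = {2 * m:.3e} m")
```

With these inputs the Sun comes out at m of about 1.5 km and the Earth at a few millimetres, while the proton value is absurdly small, which is why the Schwarzschild radius of ordinary objects lies deep inside them.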
We have seen that, by imposing appropriate symmetry conditions on the metric, and
making judicious use of them in the course of the calculation, the complicated Einstein
equations become rather simple and manageable.
1. The polar coordinates θ and φ have their standard interpretation and range.
2. The time coordinate t can be interpreted as the proper time of a static observer
infinitely far away from the star, at r → ∞. Thus, given the asymptotic flatness
of the solution, we can think of t as measuring Minkowski time. Clearly, the range
of t is unrestricted, −∞ < t < ∞.
(a) We have already discussed above that its geometrical interpretation is that
of an area radius, i.e. it is characterised by the fact that, even though r is
not proper radial distance, the surface area of a sphere of constant radius r
is 4πr 2 .
(b) Moreover, the metric is, by construction, a vacuum metric. Thus, if the star
has radius r0 , then the solution is only valid for r > r0 , and the range of r is
restricted appropriately, r0 < r < ∞.
(c) However, (12.27) also shows that the metric appears to have a singularity at
the Schwarzschild radius rs, given by

$$
r_s = \frac{2G_N M}{c^2} = 2m . \qquad (12.29)
$$
Thus, for the time being we will also require r > rs . For most practical
purposes, this is not a further constraint on the range of r, since the radius
of a physical object is almost always much larger than its Schwarzschild
radius. For example, for a proton, for the earth and for the sun one has
approximately
However, for more compact objects, their radius can approach that of their
Schwarzschild radius. For example, for neutron stars one can have rs ∼ 0.1r0 ,
and it is an interesting question (which we will take up again later on, in sections
14 and 15) what happens to an object whose size is equal to or smaller than
its Schwarzschild radius.
(d) One thing that does not occur at rs , however, in spite of what (12.27) may
suggest, is a true physical singularity. The singularity in (12.27) turns out
to be a pure coordinate singularity, i.e. an artefact of having chosen a poor
coordinate system, and later on we will construct coordinates in which the
metric is completely regular at rs . Nevertheless, it turns out that something
interesting does happen at r = rs , even though there is no singularity and
e.g. geodesics are perfectly well behaved there: rs is an event-horizon, in a
sense a point of no return. Once one has passed the Schwarzschild radius of
an object with r0 < rs, there is no turning back, neither on geodesics nor
with any amount of acceleration.
In order to learn how to visualise the Schwarzschild metric (for r > r0 > rs ), we will
now discuss some further elementary properties of length and time in the Schwarzschild
geometry.
Let us first consider proper time for a stationary observer, i.e. an observer at rest at
fixed values of (r, θ, φ). Proper time is related to coordinate time by
Thus, first of all, for such observers proper time agrees with coordinate time up to a
constant factor, and we can simply and conveniently label events as described by
a stationary observer by the coordinate time t instead of the proper time of the observer.
Secondly, we can interpret the above relationship as the statement that stationary clocks
(measuring the proper time τ ) run slower in a gravitational field - something we already
saw in the discussion of the gravitational red-shift in section 2.8, and also in the discus-
sion of the so-called ‘twin-paradox’ and the equivalence principle in section 1.1. This
formula again suggests that something interesting is happening at the Schwarzschild
radius r = 2m - we will come back to this below.
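To put numbers to the slowing down of static clocks, the following Python sketch evaluates dτ/dt = (1 − 2G_N M/(c²r))^{1/2}, the standard relation for an observer at rest at radius r in the metric (12.25). The constants, masses and radii (solar photosphere, mean Earth radius) are rounded values assumed here for illustration.

```python
import math

G_N = 6.674e-11   # m^3 kg^-1 s^-2 (rounded, assumption)
c = 2.998e8       # m s^-1


def static_clock_rate(M, r):
    """d(tau)/dt = (1 - 2 G_N M / (c^2 r))^(1/2) for an observer at rest at radius r > r_s."""
    return math.sqrt(1 - 2 * G_N * M / (c**2 * r))


# Illustrative radii (assumed values): solar photosphere and mean Earth radius.
sun = static_clock_rate(1.989e30, 6.957e8)
earth = static_clock_rate(5.972e24, 6.371e6)
print(f"Sun:   1 - dtau/dt = {1 - sun:.3e}")
print(f"Earth: 1 - dtau/dt = {1 - earth:.3e}")
```

For r ≫ 2m the fractional slow-down is approximately m/r, a few parts in a million at the surface of the Sun and below a part in a billion at the surface of the Earth; the rate goes to zero only as r → 2m.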
As regards spatial length measurements, i.e. dt = 0, we have already seen above that
the slices r = const. have the standard two-sphere geometry. However, as r varies, these
two-spheres vary in a way different from the way concentric two-spheres vary in R3. To see
this, note that the proper radius R, obtained from the spatial line element by setting
θ = const., φ = const, is
In other words, the proper radial distance between concentric spheres of area 4πr 2 and
area 4π(r + dr)2 is dR > dr and hence larger than in flat space. Note that dR → dr
for r → ∞ so that, as expected, far away from the origin the space approximately looks
like R3 . One way to visualise this geometry is as a sort of throat or sink, as in Figure 9.
To get some more quantitative feeling for the distortion of the geometry produced by
the gravitational field of a star, consider a long stick lying radially in this gravitational
field, with its endpoints at the coordinate values r1 > r2. To compute its length L, we
have to evaluate

$$
L = \int_{r_2}^{r_1} dr\,(1 - 2m/r)^{-1/2} . \qquad (12.33)
$$

It is possible to evaluate this integral in closed form (by changing variables from r to
u = 1/r), but for the present purposes it will be enough to treat 2m/r as a small
perturbation and to only retain the term linear in m in the Taylor expansion. Then we
find

$$
L \approx \int_{r_2}^{r_1} dr\,(1 + m/r) = (r_1 - r_2) + m\log\frac{r_1}{r_2} > (r_1 - r_2) . \qquad (12.34)
$$
We see that the corrections to the Euclidean result are suppressed by powers of the
Schwarzschild radius rs = 2m so that for most astronomical purposes one can simply
work with coordinate distances.
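The quality of the approximation (12.34) can be checked against the exact integral (12.33). The Python sketch below uses the antiderivative F(r) = √(r(r − 2m)) + 2m ln(√r + √(r − 2m)), obtained here by a different route than the substitution u = 1/r mentioned in the text (differentiating it gives back (1 − 2m/r)^{−1/2}), and cross-checks it with a composite Simpson rule. The stick from r₂ = 10m to r₁ = 100m in geometric units is an illustrative choice.

```python
import math


def proper_length_exact(r1, r2, m):
    """Closed form of (12.33) for r1 > r2 > 2m, using the antiderivative
    F(r) = sqrt(r (r - 2m)) + 2m ln(sqrt(r) + sqrt(r - 2m)),
    whose derivative is (1 - 2m/r)^(-1/2)."""
    def F(r):
        return math.sqrt(r * (r - 2 * m)) + 2 * m * math.log(math.sqrt(r) + math.sqrt(r - 2 * m))
    return F(r1) - F(r2)


def proper_length_simpson(r1, r2, m, n=2000):
    """Composite Simpson rule applied to (12.33); n must be even."""
    def f(r):
        return (1 - 2 * m / r) ** -0.5
    h = (r1 - r2) / n
    s = f(r1) + f(r2)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(r2 + k * h)
    return s * h / 3


def proper_length_linear(r1, r2, m):
    """The approximation (12.34), linear in m."""
    return (r1 - r2) + m * math.log(r1 / r2)


# Geometric units with m = 1 (illustrative): a stick from r2 = 10m to r1 = 100m.
r1, r2, m = 100.0, 10.0, 1.0
print(r1 - r2, proper_length_linear(r1, r2, m),
      proper_length_simpson(r1, r2, m), proper_length_exact(r1, r2, m))
```

Even this close to the star (2m/r₂ = 0.2) the linear formula misses only the O(m²) terms, which are much smaller than the O(m) correction m log(r₁/r₂) it captures.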
[Figure 9: concentric spheres of radius r and r + dr; the proper radial separation satisfies dR > dr.]
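This is even more evident when one puts the Schwarzschild metric into the isotropic
form of the metric (12.6). This is accomplished by the coordinate transformation

$$
r = \rho\left(1 + \frac{m}{2\rho}\right)^2 , \qquad (12.35)
$$

leading to

$$
\begin{aligned}
ds^2 &= -\frac{\left(1 - \frac{m}{2\rho}\right)^2}{\left(1 + \frac{m}{2\rho}\right)^2}\,dt^2 + \left(1 + \frac{m}{2\rho}\right)^4\left(d\rho^2 + \rho^2 d\Omega^2\right) \\
&= -\frac{\left(1 - \frac{m}{2\rho}\right)^2}{\left(1 + \frac{m}{2\rho}\right)^2}\,dt^2 + \left(1 + \frac{m}{2\rho}\right)^4 d\vec x^{\,2} \qquad (\rho^2 = \vec x^{\,2}) ,
\end{aligned} \qquad (12.36)
$$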
as can easily be verified. To actually find the appropriate coordinate transformation in
the first place, one needs to solve the equation
i.e.

$$
\frac{dr}{r f^{1/2}(r)} = \frac{d\rho}{\rho} \;\Rightarrow\; \ln(\rho/\rho_0) = \cosh^{-1}(r/m - 1) . \qquad (12.38)
$$
This leads to (12.35) for the choice ρ0 = m/2 (but any other choice would have been
just as good). From this one can now read off that the relation between proper and
coordinate distance (the latter now referring to the spatial Cartesian coordinates ~x in
(12.36)) is

$$
(\Delta x)_{\rm proper} = \left(1 + \frac{m}{2\rho}\right)^2 (\Delta x) . \qquad (12.39)
$$
However, in interpreting this or using this form of the metric for other purposes, one
should pay attention to the fact that the region 2m < r < ∞ of the Schwarzschild
metric is covered twice by the isotropic coordinates. In particular, r → ∞ both for
ρ → 0 and for ρ → ∞, while r(ρ) reaches its minimal value r = 2m for ρ = m/2.
Thus the metric in isotropic coordinates appears to describe not just one but two iden-
tical (isometric) asymptotically flat regions, joined together at the 2-sphere ρ = m/2 ↔
r = 2m. This is the first indication that with the Schwarzschild metric we seem to have
obtained more than we bargained for, in particular when considering objects whose
radius is smaller than the Schwarzschild radius rs = 2m. Later on, we will encounter
numerous other coordinate systems for the Schwarzschild metric, providing us with dif-
ferent insights into its physics and geometry. In particular, we will then (re-)discover
this second asymptotically flat region in section 15.6 (as the “mirror region III”).
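The properties of the isotropic coordinates discussed above can be checked numerically: the form of g_tt in (12.36), the double covering of r > 2m under ρ → m²/4ρ, the minimum r = 2m at ρ = m/2, and the radial relation (12.39). The following Python sketch works in geometric units with the illustrative choice m = 1; the function names are hypothetical.

```python
import math

m = 1.0  # geometric units G_N = c = 1 (illustrative)


def r_of_rho(rho):
    """Areal radius r as a function of the isotropic radius rho, (12.35)."""
    return rho * (1 + m / (2 * rho)) ** 2


def f(r):
    return 1 - 2 * m / r


def g_tt_isotropic(rho):
    """Minus the coefficient of dt^2 in (12.36)."""
    return ((1 - m / (2 * rho)) / (1 + m / (2 * rho))) ** 2


def rho_of_r(r, outer=True):
    """Invert (12.38) with rho0 = m/2. Since cosh is even, both signs of
    ln(rho/rho0) = +/- arccosh(r/m - 1) give the same r: outer=True is the
    branch rho > m/2 used in the text, outer=False the mirror branch rho < m/2."""
    s = math.acosh(r / m - 1)
    return (m / 2) * math.exp(s if outer else -s)


def radial_factor(rho, h=1e-6):
    """f(r)^(-1/2) dr/drho, which (12.36) says equals (1 + m/(2 rho))^2 for rho > m/2."""
    drdrho = (r_of_rho(rho + h) - r_of_rho(rho - h)) / (2 * h)
    return drdrho / math.sqrt(f(r_of_rho(rho)))


for rho in (0.1, 0.5, 2.0, 10.0):
    r = r_of_rho(rho)
    # f(r) and g_tt agree; rho and m^2/(4 rho) give the same r (two isometric regions)
    print(rho, r, f(r), g_tt_isotropic(rho), r_of_rho(m * m / (4 * rho)))
```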
A slightly more general calculation than we have performed in section 12.3 to find the
Schwarzschild solution for the exterior of a spherically symmetric static star provides
us with
2. some more insight into the interpretation of the parameter M as the mass of the
solution;
3. as an added benefit, the basic set of equations governing the solutions of the
Einstein equations for the interior of the star (an issue we will briefly consider in
section 12.7).
Thus, let us start with a general spherically symmetric (but not necessarily time-
independent) metric. Arguments analogous to those leading to (12.5) allow one to
conclude that the metric can be chosen to be of the form
However, as we have already seen in the derivation of the Schwarzschild metric, this
parametrisation of the metric in terms of the two functions A(r) and B(r) is not ideal.
To see what might be more convenient, we first reanalyse the Einstein equations in the
time-independent case, but this time with an energy-momentum tensor. Thanks to the
relation (12.20) we have
$$R^r{}_r - R^t{}_t = (rB)^{-1}\,\frac{(AB)'}{AB} . \qquad (12.41)$$
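This suggests that it is useful to introduce a new function h(r) through
$$A(r)B(r) = e^{2h(r)} , \qquad (12.42)$$
i.e. through
$$A(r) = e^{2h(r)} f(r) , \qquad B(r) = f(r)^{-1} \qquad (12.43)$$
for some arbitrary new function f(r). Using the Einstein equations (10.33) in the form
$$R^\alpha{}_\beta = 8\pi G_N\left(T^\alpha{}_\beta - \tfrac12\,\delta^\alpha{}_\beta\, T\right) , \qquad (12.44)$$
one sees that one particular linear combination of the Einstein equations now takes the
form
$$h'(r) = 4\pi G_N\, r f(r)^{-1}\left(T^r{}_r - T^t{}_t\right) . \qquad (12.45)$$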
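The remaining independent component (in the time-independent case) can be chosen
to be
$$R^t{}_t - \tfrac12 R = 8\pi G_N T^t{}_t \qquad (12.46)$$
which, after a bit of algebra with the formulae (12.19), works out to be an equation for
f(r) alone. Writing f(r) = 1 − 2m(r)/r, it takes the simple form
$$m'(r) = -4\pi G_N\, r^2\, T^t{}_t . \qquad (12.49)$$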
Equations (12.45) and (12.49) make it as manifest as possible that the spherically sym-
metric vacuum solution is the Schwarzschild metric with f (r) = 1 − 2m/r, m constant,
and h(r) = 0 (an arbitrary constant can be absorbed into the definition of the time-
coordinate t).
Let us therefore now, in the time-dependent case, and with the benefit of the above
hindsight, parametrise the two arbitrary functions A(t, r) and B(t, r) in (12.40) in terms
of two other functions h(t, r) and either f(t, r) or m(t, r) by the substitution
$$A(t,r) = e^{2h(t,r)} f(t,r) , \qquad B(t,r) = f(t,r)^{-1} , \qquad f(t,r) = 1 - \frac{2m(t,r)}{r} .$$
In this gauge, the full (non-vacuum) Einstein equations turn out to take a particularly
simple and useful form. The previously obtained equations (12.45) and (12.49) continue
to be valid also in the time-dependent case, and there is now one more independent
equation, arising from, say, the (rt)-component of the Einstein equation. The relevant
components of the Einstein tensor are
$$\begin{aligned} G^t{}_t &= -\frac{2m'(t,r)}{r^2} \\ G^r{}_t &= +\frac{2\dot m(t,r)}{r^2} \\ G^r{}_r &= +\frac{2h'(t,r)f(t,r)}{r} - \frac{2m'(t,r)}{r^2} , \end{aligned} \qquad (12.52)$$
with a somewhat more complicated and unenlightening expression for the angular compo-
nents, depending on all of m′, m′′, ṁ, m̈, h′, h′′, ḣ, which we will fortunately not need. In
particular, among the 3 above components only $G^r{}_t$ contains a time-derivative $\dot m(t,r) =
\partial_t m(t,r)$. Moreover one can replace $G^r{}_r$ by the simpler linear combination
$$G^r{}_r - G^t{}_t = \frac{2h'(t,r)\,f(t,r)}{r} .$$
In vacuum, the vanishing of these three components implies:
• m′ = 0 and ṁ = 0, so that m is a constant;
• h′ = 0 (wherever f ≠ 0), so that h = h(t) can depend at most on t;
• Thus h(t), which only appears in the (tt)-component of the metric, can simply be
absorbed into a redefinition of t, $dt' = e^{h(t)}dt$.
Thus we uniquely recover the Schwarzschild solution, even without having to assume
from the outset that the metric is time-independent. This is Birkhoff's theorem.
Remarks:
1. A caveat should be added here: if one applies the above reasoning to a region of
space-time where f (r) < 0 (we will study this region of the Schwarzschild metric in
great detail in section 15), then the above argument still shows that an additional
Killing vector emerges from the joint requirement of spherical symmetry and
the vacuum Einstein equations. But now this Killing vector (misleadingly called
∂t) is spacelike and not timelike.[22]
2. The above set (12.54) of Einstein equations for spherical symmetry (which should
still be supplemented by, say, the conservation law for the energy-momentum
tensor) also allows one to read off some fairly simple generalisations of Birkhoff's
theorem, such as “spherical symmetry and static sources (i.e. $T^\alpha{}_\beta = T^\alpha{}_\beta(r)$) ⇒
metric is time-independent”.
3. Other generalisations of the Birkhoff theorem are reviewed and discussed in an
article by H.-J. Schmidt.[23] There it is also stressed that the validity of Birkhoff-like
theorems relies crucially on the fact, mentioned in section 8.7 in connection with
equation (8.61), that the 2-dimensional Ricci tensor (in the (t, r)-directions trans-
verse to the sphere) has only one degenerate eigenvalue, thus strongly constraining
higher-dimensional generalisations of such statements.
4. Realistic astrophysical systems are neither exactly spherically symmetric nor ex-
actly vacuum (even outside the star), and typically the sources are not static
either. It is therefore of interest to investigate more generally whether, or to what extent,
Birkhoff's theorem remains approximately true when the system under consideration
is only approximately spherically symmetric or vacuum. This question has
been analysed by Goswami and Ellis.[24]
5. Since $T^t{}_t$ is (minus) the energy density and $T^r{}_t$ represents the radial energy flux,
the above equations show that m(t, r) can indeed be interpreted as the mass or
energy of the solution.
To see this, let $\rho(r) = -T^t{}_t$ denote the energy density inside a static spherically sym-
metric star, say, and let $m(r) = G_N M(r)$. Then (12.49) implies
$$M'(r) = 4\pi r^2\rho(r) \;\Rightarrow\; M(r) = 4\pi\int_0^r dr'\,(r')^2\rho(r') , \qquad (12.58)$$
[22] For a more careful statement and proof of Birkhoff's theorem along these lines see e.g. K. Schleich,
D. Witt, A simple proof of Birkhoff's theorem for cosmological constant, arXiv:0908.4110v2 [gr-qc].
[23] H.-J. Schmidt, The tetralogy of Birkhoff theorems, arXiv:1208.5237 [gr-qc].
[24] R. Goswami, G. Ellis, Almost Birkhoff Theorem in General Relativity, arXiv:1101.4520 [gr-qc];
R. Goswami, G. Ellis, Birkhoff Theorem and Matter, arXiv:1202.0240 [gr-qc].
which looks exactly like the ordinary mass inside a sphere of radius r (in flat space).
In particular, if ρ(r) = 0 for r > r0 (with r0 the radius of the star), then one can
interpret
$$M \equiv M(r_0) = 4\pi\int_0^{r_0} dr'\,(r')^2\rho(r') \qquad (12.59)$$
as the total mass-energy of the star. One can (try to) attribute the difference
between this integral and that of ρ(r) weighted by the proper spatial volume
element,
$$M_{\rm proper} = 4\pi\int_0^{r_0} dr'\,(r')^2\sqrt{g_{rr}(r')}\,\rho(r') = 4\pi\int_0^{r_0} dr'\,(r')^2\rho(r')\left(1 - \frac{2m(r')}{r'}\right)^{-1/2} > M , \qquad (12.60)$$
to the binding energy of the star.
6. It follows from (12.45) that the property $g_{tt}g_{rr} = -1$ (i.e. h = 0) of the Schwarzschild
solution is implied not just by the vacuum Einstein equations but, more generally, by
the Einstein equations with $T^t{}_t = T^r{}_r$. This situation is not as uncommon as one may
think. For example, solutions of the Einstein-Maxwell equations for spherically symmetric
electrically charged stars (the Reissner-Nordström solution, see section 22), even with the
inclusion of a cosmological constant, turn out to also be of this form. Some other examples of
solutions of this type (in more than 4 dimensions) are presented in section 15.10.[25]
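To make remark 5 concrete, here is a minimal numerical sketch for a star of constant density ρ0, in units G_N = c = 1 (an assumption made only for the example). It evaluates (12.59) and (12.60) by the midpoint rule, using m(r) = 4πρ0r³/3 inside the star, and compares with closed forms obtained from the substitution r = R sin χ with R² = 3/(8πρ0). The values of ρ0 and r0 (compactness r_s/r0 = 1/2) and the function names are purely illustrative.

```python
import math

def mass_integrals(rho0: float, r0: float, n: int = 100_000) -> tuple[float, float]:
    """Return (M, M_proper) from (12.59) and (12.60) for constant density rho0.

    Units G_N = c = 1; midpoint rule with n radial shells.
    """
    h = r0 / n
    M = 0.0
    M_proper = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        m_r = 4.0 * math.pi * rho0 * r**3 / 3.0                 # m(r) = G_N M(r)
        shell = 4.0 * math.pi * r**2 * rho0 * h                  # coordinate-volume shell
        M += shell
        M_proper += shell / math.sqrt(1.0 - 2.0 * m_r / r)      # weighted by sqrt(g_rr)
    return M, M_proper

def mass_closed_form(rho0: float, r0: float) -> tuple[float, float]:
    """Closed forms via r = R sin(chi), with R**2 = 3/(8 pi rho0)."""
    R = math.sqrt(3.0 / (8.0 * math.pi * rho0))
    chi0 = math.asin(r0 / R)
    M = 4.0 * math.pi * rho0 * r0**3 / 3.0
    M_proper = 2.0 * math.pi * rho0 * R**3 * (chi0 - math.sin(chi0) * math.cos(chi0))
    return M, M_proper

r0 = 1.0
rho0 = 3.0 * 0.25 / (4.0 * math.pi * r0**3)   # total mass M = 0.25, so r_s / r0 = 0.5
M, M_proper = mass_integrals(rho0, r0)
print(M, M_proper, mass_closed_form(rho0, r0))
```

The positive difference M_proper − M is the quantity that remark 5 tentatively interprets as the binding energy.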
12.7 Interior Solution for a Static Star and the TOV Equation
The Schwarzschild solution is a solution of the vacuum Einstein equations for the exterior
of a spherically symmetric (static) star. The Einstein equations also govern and describe
the gravitational field, i.e. the space-time geometry, in the interior of the star. In this case one
needs to specify the energy-momentum tensor for the matter content in the interior of the
star, and in general this is a complicated astrophysics problem which we will not address
here. However, a useful idealised model of the energy-momentum tensor, compatible
with the symmetry requirements arising from the fact that the star is assumed to be
static and spherically symmetric, which we will take to mean that the metric has the
form
$$ds^2 = -e^{2h(r)} f(r)\,dt^2 + f(r)^{-1}dr^2 + r^2 d\Omega^2 , \qquad f(r) = 1 - \frac{2m(r)}{r} , \qquad (12.62)$$
is provided by the ansatz
$$T^\alpha{}_\beta = {\rm diag}\left(-\rho(r),\, p(r),\, p(r),\, p(r)\right) . \qquad (12.63)$$
Here we interpret ρ = ρ(r) as the energy density of the star, and p = p(r) as its pressure.

[25] For some further reflections on the ubiquity of such solutions, see T. Jacobson, When is $g_{tt}g_{rr} = -1$?,
arXiv:0707.3222v3 [gr-qc].
Remarks:
1. This ansatz amounts to neglecting anisotropic stresses in the interior of the star
(the spatial off-diagonal components) as well as energy-flow in the form of heat-
conduction, say (the off-diagonal time-space components). Depending on the type
of star one wishes to describe this may or may not be a justified approximation
(but it is considered to be an excellent approximation for very compact stars like
white dwarfs and neutron stars).
2. Equivalently, (12.63) is the energy-momentum tensor of a perfect fluid at rest,
$T^{\alpha\beta} = (\rho + p)u^\alpha u^\beta + p\,g^{\alpha\beta}$, where
$$u^\alpha = \left((-g_{tt}(r))^{-1/2}, 0, 0, 0\right) \;\Rightarrow\; u^\alpha u_\alpha = g_{tt}(r)\,(-g_{tt}(r))^{-1} = -1 . \qquad (12.65)$$
3. Specifying the energy-momentum content requires specifying not only the energy-
momentum tensor (12.63) but also an equation of state, which in this simplified
context amounts to postulating a relation p = p(ρ). Again see section 19.2 for
a discussion and examples of this. We will sidestep this issue in the discussion
below since the only case we will consider explicitly is that of constant energy
density ρ(r) = ρ0 , in which case the Einstein equations determine p = p(r) via
the Tolman-Oppenheimer-Volkoff equation to be derived (in general) below.
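With this set-up, the Einstein equations (12.45) and (12.49) for a metric of the form
(12.62) and an energy-momentum tensor of the form (12.63) read
$$m'(r) = 4\pi G_N\, r^2\rho(r) , \qquad h'(r) = 4\pi G_N\, r f(r)^{-1}\left(\rho(r) + p(r)\right) . \qquad (12.66)$$
Thus $m(r) = G_N M(r)$ with
$$M(r) = 4\pi\int_0^r dr'\,(r')^2\rho(r') ,$$
where the regularity condition M(0) = 0 has been imposed.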
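The equations (12.66) need to be supplemented by the conservation law
$$\nabla_\alpha T^{\alpha\beta} = 0 . \qquad (12.68)$$
For the ansatz (12.63), its only non-trivial (radial) component reads
$$p'(r) = -\left(\rho(r) + p(r)\right)\left(h'(r) + \frac{f'(r)}{2f(r)}\right) , \qquad (12.70)$$
and combining this with (12.66) yields the Tolman-Oppenheimer-Volkoff (TOV) equation
$$p'(r) = -G_N\,\frac{\left(\rho(r) + p(r)\right)\left(M(r) + 4\pi r^3 p(r)\right)}{r^2 f(r)} . \qquad (12.72)$$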
Remarks:
1. The TOV equation (12.72) can be interpreted as the condition for hydrostatic equilibrium of the star and
generalises the Newtonian hydrostatic equation
$$p'(r) = -G_N\,\frac{\rho(r)M(r)}{r^2} . \qquad (12.73)$$
The differences between (12.72) and (12.73) can be attributed to (and provide
useful insight into) the key differences between Newtonian gravity and general
relativity:
• In general relativity, not only the mass M(r) but anything that appears in the
energy-momentum tensor acts as a source of the gravitational field, in particular,
in the present situation, the pressure p(r). This accounts for the
substitution $M(r) \to M(r) + 4\pi r^3 p(r)$.
• In general relativity, gravity acts not only on ρ(r) but also on p(r). This
accounts for the substitution $\rho(r) \to \rho(r) + p(r)$.
• In general relativity, the gravitational force differs from the Newtonian grav-
itational force. In the present case this accounts for the additional factor of
$f(r) = 1 - 2m(r)/r$ in the denominator.
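We will now consider the solution of this set of equations in the case where the energy
density is constant inside the star,
$$\rho(r) = \begin{cases} \rho_0 & r \le r_0 \\ 0 & r > r_0 \end{cases} \quad\Rightarrow\quad M(r) = \begin{cases} \tfrac{4\pi}{3}\rho_0 r^3 & r \le r_0 \\ M = \tfrac{4\pi}{3}\rho_0 r_0^3 & r \ge r_0 \end{cases}$$
This already completely determines the spatial part of the interior metric, and matches
perfectly with the radial part of the exterior Schwarzschild solution, which has $g_{rr} =
f(r)^{-1}$ with $f(r) = 1 - 2G_N M/r$ for all $r > r_0$.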
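Knowing m(r) (and ρ(r) = ρ0), we can determine p(r) from the TOV equation (12.72),
which now reads
$$p'(r) = -\frac{4\pi G_N r}{3}\,\frac{\left(\rho_0 + p(r)\right)\left(\rho_0 + 3p(r)\right)}{f(r)} . \qquad (12.77)$$
Writing
$$\frac{p'(r)}{\left(\rho_0 + p(r)\right)\left(\rho_0 + 3p(r)\right)} = \frac{1}{2\rho_0}\frac{d}{dr}\ln\frac{\rho_0 + 3p}{\rho_0 + p} \qquad (12.78)$$
and
$$-\frac{4\pi G_N r}{3f(r)} = \frac{1}{4\rho_0}\frac{d}{dr}\ln f(r) , \qquad (12.79)$$
it follows immediately that the solution satisfying the boundary condition p(r0) = 0 is
$$p(r) = \rho_0\,\frac{f(r)^{1/2} - f(r_0)^{1/2}}{3f(r_0)^{1/2} - f(r)^{1/2}} . \qquad (12.80)$$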
Before turning to that, let us complete the solution of the Einstein equations by deter-
mining h(r). This can be found either directly by integration of the second equation in
(12.66),
$$h'(r) = 4\pi G_N\, r f(r)^{-1}\left(\rho_0 + p(r)\right) , \qquad (12.81)$$
or, more efficiently, from (12.70), which we rewrite as
$$\frac{d}{dr}\left(h(r) + \tfrac12\ln f(r)\right) = -\frac{p'(r)}{\rho_0 + p(r)} \quad\Leftrightarrow\quad \frac{d}{dp}\left(h + \tfrac12\ln f\right) = -(\rho_0 + p)^{-1} . \qquad (12.82)$$
The solution to this equation is evidently
$$h(r) + \tfrac12\ln f(r) = -\ln\left(\rho_0 + p(r)\right) + {\rm const.} ,$$
and fixing this integration constant by the requirement h(r0) = 0, so that also the (tt)-
component of the metric matches onto that of the exterior Schwarzschild solution at
r = r0, one finds
$$e^{h(r)} = \tfrac12\left(3f(r_0)^{1/2}/f(r)^{1/2} - 1\right) , \qquad (12.85)$$
or
$$g_{tt}(r) = -\tfrac14\left(3f(r_0)^{1/2} - f(r)^{1/2}\right)^2 . \qquad (12.86)$$
This completes the derivation of the solution of the Einstein equations for the interior
of a perfect-fluid star with constant energy density.
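As a cross-check of (12.80) and (12.85), one can integrate the TOV equation (12.77) together with (12.81) numerically, starting at the surface where p(r0) = 0 and h(r0) = 0 and moving inwards to the centre, and compare with the closed forms. The sketch below is a hand-written fourth-order Runge-Kutta integrator in units G_N = c = 1, applied to the same illustrative star (r_s/r0 = 1/2); all function names are our own.

```python
import math

def closed_form(r, rho0, r0):
    """p(r) from (12.80) and e^{h(r)} from (12.85), units G_N = 1."""
    R2 = 3.0 / (8.0 * math.pi * rho0)            # R^2 = 3/(8 pi G_N rho0)
    sf = math.sqrt(1.0 - r**2 / R2)
    sf0 = math.sqrt(1.0 - r0**2 / R2)
    p = rho0 * (sf - sf0) / (3.0 * sf0 - sf)
    e_h = 0.5 * (3.0 * sf0 / sf - 1.0)
    return p, e_h

def rhs(r, y, rho0):
    """Right-hand sides of (12.77) for p and (12.81) for h."""
    p, _h = y
    f = 1.0 - 8.0 * math.pi * rho0 * r**2 / 3.0
    dp = -(4.0 * math.pi * r / 3.0) * (rho0 + p) * (rho0 + 3.0 * p) / f
    dh = 4.0 * math.pi * r * (rho0 + p) / f
    return (dp, dh)

def rk4_inward(rho0, r0, n=2000):
    """Integrate from r = r0 (p = 0, h = 0) down to r = 0; return (p(0), h(0))."""
    step = -r0 / n
    r, y = r0, (0.0, 0.0)
    for _ in range(n):
        k1 = rhs(r, y, rho0)
        k2 = rhs(r + step / 2, tuple(a + step / 2 * b for a, b in zip(y, k1)), rho0)
        k3 = rhs(r + step / 2, tuple(a + step / 2 * b for a, b in zip(y, k2)), rho0)
        k4 = rhs(r + step, tuple(a + step * b for a, b in zip(y, k3)), rho0)
        y = tuple(a + step / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                  for a, b1, b2, b3, b4 in zip(y, k1, k2, k3, k4))
        r += step
    return y

r0 = 1.0
rho0 = 3.0 * 0.25 / (4.0 * math.pi * r0**3)   # r_s / r0 = 0.5
p_num, h_num = rk4_inward(rho0, r0)
p_exact, e_h_exact = closed_form(0.0, rho0, r0)
print(p_num, p_exact)
print(math.exp(h_num), e_h_exact)
```

Integrating inwards is the natural direction here because both boundary conditions, p(r0) = 0 and h(r0) = 0, are imposed at the surface.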
In summary we have found that the interior metric (r ≤ r0) is
$$ds^2 = -\tfrac14\left(3f(r_0)^{1/2} - f(r)^{1/2}\right)^2 dt^2 + f(r)^{-1}dr^2 + r^2 d\Omega^2$$
with
$$f(r) = 1 - \frac{r^2}{R^2} = 1 - \frac{8\pi G_N}{3}\rho_0 r^2 , \qquad (12.88)$$
supported by perfect-fluid matter with ρ(r) = ρ0 and p(r) given in (12.80). This
matches continuously (and, for $g_{tt}$, in fact once-differentiably) onto the exterior Schwarzschild
solution with $g_{tt}(r) = -f(r)$, $g_{rr} = f(r)^{-1}$, and $f(r) = 1 - 2G_N M/r$, where M is the
total mass of the star, $M = 4\pi\rho_0 r_0^3/3$.
Remarks:
1. With $r = R\sin\chi$, the spatial part $f(r)^{-1}dr^2 + r^2 d\Omega^2$ of the interior metric becomes
$R^2\left(d\chi^2 + \sin^2\chi\,d\Omega^2\right)$, the standard form
of the metric on the 3-sphere. Thus the geometry of the interior of the star is
(see section 17) that of a maximally symmetric space with constant
positive curvature, namely the standard metric on the 3-sphere with radius R,
restricted to the region r ≤ r0.
Note that in order for the metric to be well-defined, the range of r needs to be
such that the maximal value r0 satisfies r0 < R. Writing R² from (12.90) in terms
of the total mass M and the radius r0,
$$R^2 = \frac{3}{8\pi G_N\rho_0} = \frac{r_0^3}{2G_N M} = \frac{r_0^3}{r_s} , \qquad (12.92)$$
we see that
$$\left(\frac{R}{r_0}\right)^2 = \frac{r_0}{r_s} . \qquad (12.93)$$
Thus R > r0 iff the radius of the star is larger than its Schwarzschild radius, r0 > rs.
This is indeed a necessary condition for a star: an object that is smaller than its
Schwarzschild radius will turn out to be not a star but a black hole.
2. A stronger constraint on the relative size of r0 and rs arises from an analysis of the
solution (12.80) of the TOV equation. Recall that we have $f(r) = 1 - r^2/R^2$. Since
r < r0 in the interior, one has f(r0) < f(r). Thus the pressure can potentially
become infinite at values of r where the denominator of (12.80) vanishes.
The condition for the existence of a stable star, i.e. the requirement that the
pressure be non-singular everywhere in the interior of the star, in particular at the
origin r = 0 (where f(0) = 1), is
$$f(r_0)^{1/2} > 1/3 \quad\Leftrightarrow\quad r_0 > \tfrac98\, r_s \quad\Leftrightarrow\quad M < M_{\max} = \frac{4r_0}{9G_N} . \qquad (12.95)$$
Thus we learn that a star consisting of matter with constant energy density must
be larger than 9/8 times its Schwarzschild radius. In other words, the maximal
amount of mass that can be contained in a star with radius r0 is bounded from
above by $4r_0/(9G_N)$, or
$$\frac{2m}{r_0} \le \frac{8}{9} . \qquad (12.96)$$
This result is actually valid for far more general equations of state and is known
as the Buchdahl limit or Buchdahl's Theorem (1959).[26]
[26] See e.g. J. Guven, N. O'Murchadha, Bounds on 2m/R for static spherical objects,
arXiv:gr-qc/9903067; H. Andreasson, Sharp bounds on 2m/r of general spherically symmetric
static objects, arXiv:gr-qc/0702137; J. Mark Heinzle, Bounds on 2m/r for static perfect fluids,
arXiv:0708.3352 [gr-qc] for a survey of known results and more recent work along these lines.
3. Here we have considered the simplest model of a static star, with constant energy
density. The basic set-up, however, can also be used to analyse in detail the stellar
structure and evolution of other compact stars like neutron stars.[27]
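The blow-up of the central pressure behind (12.95) is easy to see numerically. Setting r = 0 in (12.80), where f(0) = 1, and writing f(r0) = 1 − r_s/r0, gives p(0)/ρ0 = (1 − f(r0)^{1/2})/(3f(r0)^{1/2} − 1). The minimal sketch below (the function name is our own) tabulates this ratio as the compactness r_s/r0 approaches 8/9, and returns infinity once no regular static solution exists.

```python
import math

def central_pressure_ratio(compactness: float) -> float:
    """p(0)/rho0 for the constant-density star, from (12.80) at r = 0 where f(0) = 1.

    compactness = r_s/r0 = 2 G_N M / r0, so that f(r0) = 1 - compactness.
    Returns math.inf when 3 f(r0)**(1/2) - 1 <= 0, i.e. when (12.95) is violated.
    """
    sf0 = math.sqrt(1.0 - compactness)
    denominator = 3.0 * sf0 - 1.0
    if denominator <= 0.0:
        return math.inf
    return (1.0 - sf0) / denominator

for c in (0.1, 0.5, 0.8, 0.88, 0.888, 0.9):
    print(f"r_s/r0 = {c:.3f}   p(0)/rho0 = {central_pressure_ratio(c):.4g}")
```

The ratio stays of order one for moderate compactness but grows without bound as r_s/r0 approaches 8/9, which is the content of the Buchdahl bound (12.96) for this model.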
[27] See e.g. N. Straumann, General Relativity: with applications to astrophysics.
13 Particle and Photon Orbits in the Schwarzschild Geometry
We now come to the heart of the matter, the study of planetary orbits and light rays
in the gravitational field of the sun, i.e. the properties of timelike and null geodesics of
the Schwarzschild geometry. We shall see that, by once again making good use of the
symmetries of the problem, we can reduce the geodesic equations to a single first order
differential equation in one variable, analogous to that for a one-dimensional particle
moving in a particular potential. Solutions to this equation can then readily be discussed
qualitatively and also quantitatively (analytically).
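A convenient starting point in general for discussing geodesics is, as I stressed before,
the Lagrangian $\mathcal{L} = g_{\mu\nu}\dot x^\mu\dot x^\nu$. For the Schwarzschild metric this is
$$\mathcal{L} = -(1 - 2m/r)\,\dot t^2 + (1 - 2m/r)^{-1}\dot r^2 + r^2\left(\dot\theta^2 + \sin^2\theta\,\dot\phi^2\right) \qquad (13.1)$$
where $2m = 2MG_N/c^2$. Rather than writing down and solving the (second order)
geodesic equations, we will make use of the conserved quantities $K_\mu\dot x^\mu$ associated with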
Killing vectors. After all, conserved quantities correspond to first integrals of the equa-
tions of motion and if there are a sufficient number of them (there are) we can directly
reduce the second order differential equations to first order equations.
So, how many Killing vectors does the Schwarzschild metric have? Well, since the
metric is static, there is one timelike Killing vector, namely ∂/∂t, and since the metric
is spherically symmetric, there are spatial Killing vectors generating the Lie algebra of
SO(3), hence there are three of those, and therefore all in all four Killing vectors.
Now, since the gravitational field is isotropic (and hence there is conservation of angular
momentum), the orbits of the particles or planets are planar. Without loss of generality,
we can choose our coordinates in such a way that this plane is the equatorial plane
θ = π/2, so in particular θ̇ = 0, and the residual Lagrangian to deal with is
$$\mathcal{L} = -(1 - 2m/r)\,\dot t^2 + (1 - 2m/r)^{-1}\dot r^2 + r^2\dot\phi^2 . \qquad (13.2)$$
This choice fixes the direction of the angular momentum (to be orthogonal to the plane)
and leaves two conserved quantities, the energy (per unit rest mass) E and the mag-
nitude L (per unit rest mass) of the angular momentum, corresponding to the cyclic
variables t and φ, (or: corresponding to the Killing vectors ∂/∂t and ∂/∂φ),
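$$\frac{\partial\mathcal{L}}{\partial t} = 0 \;\Rightarrow\; \frac{d}{d\tau}\frac{\partial\mathcal{L}}{\partial\dot t} = 0 , \qquad \frac{\partial\mathcal{L}}{\partial\phi} = 0 \;\Rightarrow\; \frac{d}{d\tau}\frac{\partial\mathcal{L}}{\partial\dot\phi} = 0 , \qquad (13.3)$$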
namely
$$E = (1 - 2m/r)\,\dot t \qquad (13.4)$$
$$L = r^2\sin^2\theta\,\dot\phi = r^2\dot\phi . \qquad (13.5)$$
Calling L the angular momentum (per unit rest mass) requires no further justification,
but let me pause to explain in what sense E is an energy (per unit rest mass). On the
one hand, it is the conserved quantity (2.45) associated to time-translation invariance.
As such, it certainly deserves to be called the energy.
But it is moreover true that for a particle at infinity (r → ∞) E is just the special
relativistic energy $E = \gamma(v_\infty)c^2$, with $\gamma(v) = (1 - v^2/c^2)^{-1/2}$ the usual relativistic γ-
factor, and $v_\infty$ the coordinate velocity dr/dt at infinity. This can be seen in two ways.
First of all, for a particle that reaches r = ∞, the constant E can be determined by
evaluating it at r = ∞. It thus follows from the definition of E that
$$E = \dot t_\infty . \qquad (13.6)$$
In Special Relativity, the relation between proper and coordinate time is given by (setting c = 1 again)
$$d\tau = \sqrt{1 - v^2}\,dt \;\Rightarrow\; \dot t = \gamma(v) , \qquad (13.7)$$
so that
$$E = \gamma(v_\infty) . \qquad (13.8)$$
Another argument for this identification will be given below, once we have introduced
the effective potential.
As we have seen in section 2.1 (and again in section 4.7), there is also always one more
integral of the geodesic equation (corresponding, roughly speaking, to parametrisation
invariance of the Lagrangian), namely $\mathcal{L}$ itself,
$$\frac{d}{d\tau}\mathcal{L} = 2g_{\mu\nu}\dot x^\mu\nabla_\tau\dot x^\nu = 0 . \qquad (13.9)$$
Thus we set
$$\mathcal{L} = \epsilon , \qquad (13.10)$$
where ε = −1 for timelike geodesics and ε = 0 for null geodesics. Thus we have
$$-(1 - 2m/r)\,\dot t^2 + (1 - 2m/r)^{-1}\dot r^2 + r^2\dot\phi^2 = \epsilon , \qquad (13.11)$$
and we can now express ṫ and φ̇ in terms of the conserved quantities E and L to obtain
a first order differential equation for r alone, namely
$$-(1 - 2m/r)^{-1}E^2 + (1 - 2m/r)^{-1}\dot r^2 + \frac{L^2}{r^2} = \epsilon . \qquad (13.12)$$
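Multiplying by (1 − 2m/r)/2 and rearranging the terms, one obtains
$$\frac{E^2 + \epsilon}{2} = \frac{\dot r^2}{2} + \epsilon\frac{m}{r} + \frac{L^2}{2r^2} - \frac{mL^2}{r^3} . \qquad (13.13)$$
Now this equation is of the familiar Newtonian form
$$E_{\rm eff} = \frac{\dot r^2}{2} + V_{\rm eff}(r) , \qquad (13.14)$$
with
$$E_{\rm eff} = \frac{E^2 + \epsilon}{2} , \qquad V_{\rm eff}(r) = \epsilon\frac{m}{r} + \frac{L^2}{2r^2} - \frac{mL^2}{r^3} , \qquad (13.15)$$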
describing the energy conservation in an effective potential. Except for t → τ, this is
exactly the same as the Newtonian equation of motion in a potential
$$V(r) = \epsilon\frac{m}{r} - \frac{mL^2}{r^3} , \qquad (13.16)$$
the effective angular momentum term $L^2/r^2 = r^2\dot\phi^2$ arising, as usual, from the change
to polar coordinates.
In particular, for ε = −1, the general relativistic motion (as a function of τ) is exactly
the same as the Newtonian motion (as a function of t) in the potential
$$\epsilon = -1 \;\Rightarrow\; V(r) = -\frac{m}{r} - \frac{mL^2}{r^3} . \qquad (13.17)$$
The first term is just the ordinary Newtonian potential, so the second term is appar-
ently a general relativistic correction. We will later on treat this as a perturbation but
note that the above is an exact result, not an approximation (so, for example, there
are no higher order corrections proportional to higher powers of m/r). We expect ob-
servable consequences of this general relativistic correction because many properties of
the Newtonian orbits (Kepler’s laws) depend sensitively on the fact that the Newtonian
potential is precisely ∼ 1/r.
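Writing (13.14) for ε = −1 as
$$\dot r^2 = E^2 - 1 - 2V_{\rm eff}(r)$$
and noting that $V_{\rm eff}(r) \to 0$ for $r \to \infty$, we can read off that for a particle that reaches
r = ∞ we have the relation
$$\dot r_\infty^2 = E^2 - 1 . \qquad (13.19)$$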
This implies, in particular, that for such (scattering) trajectories one necessarily has
E ≥ 1, with E = 1 corresponding to a particle initially or finally at rest at infinity. For
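E > 1 the coordinate velocity at infinity can be computed from
$$v_\infty^2 = \frac{\dot r_\infty^2}{\dot t_\infty^2} . \qquad (13.20)$$
Using (13.6) and (13.19), one finds
$$v_\infty^2 = \frac{E^2 - 1}{E^2} \quad\Leftrightarrow\quad E = (1 - v_\infty^2)^{-1/2} , \qquad (13.21)$$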
thus confirming the result claimed in (13.8).
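A quick numerical consistency check of (13.21), in units c = 1: starting from a value E ≥ 1, the sketch below computes v∞ and verifies that the special-relativistic γ-factor γ(v∞) returns E. The function names are our own.

```python
import math

def v_inf_from_E(E: float) -> float:
    """Coordinate speed at infinity from (13.21), units c = 1: v^2 = (E^2 - 1)/E^2."""
    if E < 1.0:
        raise ValueError("E < 1: the particle does not reach r = infinity")
    return math.sqrt(E * E - 1.0) / E

def gamma(v: float) -> float:
    """Special-relativistic gamma factor, units c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

for E in (1.0, 1.25, 2.0, 10.0):
    v = v_inf_from_E(E)
    print(f"E = {E:5.2f}  v_inf = {v:.6f}  gamma(v_inf) = {gamma(v):.6f}")
```

E = 1 gives v∞ = 0 (a particle at rest at infinity), and values E < 1 are rejected because such particles are bound.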
For null geodesics, on the other hand, the Newtonian part of the potential is zero, as one
might expect for massless particles, but in General Relativity a photon feels a non-trivial
potential
$$\epsilon = 0 \;\Rightarrow\; V(r) = -\frac{mL^2}{r^3} . \qquad (13.22)$$
Typically, one is primarily interested in the shape of an orbit, that is in the radius
r as a function of φ, r = r(φ), rather than in the dependence of, say, r on some
extraterrestrial’s proper time τ . In this case, the above mentioned difference between t
(in the Newtonian theory) and τ (here) is irrelevant: In the Newtonian theory one uses
$L = r^2 d\phi/dt$ to express t as a function of φ, t = t(φ), to obtain r(φ) from r(t). In General
Relativity, one uses the analogous equation $L = r^2 d\phi/d\tau$ to express τ as a function of φ,
τ = τ (φ). Hence the shapes of the General Relativity orbits are precisely the shapes of
the Newtonian orbits in the potential (13.16). Thus we can use the standard methods
of Classical Mechanics to discuss these general relativistic orbits and of course this
simplifies matters considerably.
In the examples to be discussed below, we will be interested in the angle ∆φ swept out
by the object in question (a planet or a photon) as it travels along its trajectory between
the farthest distance r2 from the star (sun) (r2 = ∞ for scattering trajectories) and the
position of closest approach to the star r1 (the perihelion or, more generally, if we are
not talking about our own solar system, periastron), and back again,
$$\Delta\phi = 2\int_{r_1}^{r_2}\frac{d\phi}{dr}\,dr . \qquad (13.27)$$
In the Newtonian case, these integrals can be evaluated in closed form. With the
general relativistic correction term, however, these are elliptic integrals which cannot
be expressed in terms of elementary functions. A perturbative evaluation of these integrals (treating
the exact general relativistic correction as a small perturbation) also turns out to be
somewhat delicate since e.g. the limits of integration depend on the perturbation.
It is somewhat simpler to deal with this correction term not at the level of the solution
(integral) but at the level of the corresponding differential equation. As in the Kepler
problem, it is convenient to make the change of variables
1 r′
u= u′ = − . (13.28)
r r2
Then (13.26) becomes
u′2 = L−2 (2Eef f − 2Vef f (r)) . (13.29)
Upon inserting the explicit expression for the effective potential, this becomes
E 2 + ǫ 2ǫm
u′2 + u2 = − 2 u + 2mu3 . (13.30)
L2 L
This can be used to obtain an equation for dφ(u)/du = u′−1 , leading to
Z u1
dφ
∆φ = 2 du . (13.31)
u2 du
237
1. Asymptotically, i.e. for r → ∞, the potential tends to the Newtonian potential,
$$V_{\rm eff}(r) \xrightarrow{\;r\to\infty\;} -\frac{m}{r} . \qquad (13.35)$$
2. At the Schwarzschild radius rs = 2m, nothing special happens and the potential
is completely regular there,
$$V_{\rm eff}(r = 2m) = -\frac12 . \qquad (13.36)$$
For the discussion of planetary orbits in the solar system we can safely assume that
the radius of the sun is much larger than its Schwarzschild radius, r0 ≫ rs, but the
above shows that even for highly compact objects with r0 < rs geodesics are
perfectly regular as one approaches rs. Of course the particular numerical value
of $V_{\rm eff}(r = 2m)$ has no special significance because V(r) can always be shifted by
a constant.
3. The extrema of the potential, i.e. the points at which $dV_{\rm eff}/dr = 0$, are at
$$r_\pm = \frac{L^2}{2m}\left[1 \pm \sqrt{1 - 12(m/L)^2}\,\right] , \qquad (13.37)$$
and the potential has a maximum at r− and a local minimum at r+. Thus there
are qualitative differences in the shapes of the orbits between $L/m < \sqrt{12}$ and
$L/m > \sqrt{12}$.
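The following minimal sketch (function names ours, massive case ε = −1) evaluates (13.37), cross-checks with a finite-difference derivative that $V_{\rm eff}'$ vanishes at r±, and shows the two limits discussed next: the merging of r± near 6m as L approaches √12 m, and r− → 3m for large L.

```python
import math

def V_eff(r, m, L, eps=-1.0):
    """Effective potential (13.15); eps = -1 for massive particles, 0 for photons."""
    return eps * m / r + L**2 / (2.0 * r**2) - m * L**2 / r**3

def critical_radii(m, L):
    """(r_minus, r_plus) from (13.37), or None when L/m < sqrt(12) (no extrema)."""
    disc = 1.0 - 12.0 * (m / L) ** 2
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    return (L**2 / (2.0 * m)) * (1.0 - root), (L**2 / (2.0 * m)) * (1.0 + root)

def dV_dr(r, m, L, h=1e-6):
    """Central finite difference of V_eff (massive case), to cross-check (13.37)."""
    return (V_eff(r + h, m, L) - V_eff(r - h, m, L)) / (2.0 * h)

m = 1.0
for L in (3.0, 4.0, 10.0, 100.0):
    radii = critical_radii(m, L)
    if radii is None:
        print(f"L/m = {L}: no extrema, no circular orbits")
        continue
    r_minus, r_plus = radii
    print(f"L/m = {L}: r- = {r_minus:.4f}, r+ = {r_plus:.4f}, "
          f"V'(r-) = {dV_dr(r_minus, m, L):.1e}, V'(r+) = {dV_dr(r_plus, m, L):.1e}")

# Just above L = sqrt(12) m the two radii nearly coincide, close to 6m
print(critical_radii(m, math.sqrt(12.0) * m * (1.0 + 1e-9)))
```

L is nudged slightly above √12 m in the last line because at exactly L = √12 m rounding can make the discriminant marginally negative.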
Let us discuss these two cases in turn. When $L/m < \sqrt{12}$, there are no critical
points and the potential looks approximately like that in Figure 10. Note that we
should be careful when extrapolating to values of r with r < 2m because we know that
the Schwarzschild metric has a coordinate singularity there. However, qualitatively the
picture is also correct for r < 2m.
From this picture we can read off that there are no bounded orbits for these values
of the parameters. Any inward-bound particle with $L < \sqrt{12}\,m$ will continue to fall
inwards (provided that it moves on a geodesic). This should be contrasted with the
Newtonian situation in which for any L ≠ 0 there is always the centrifugal barrier
reflecting incoming particles since the repulsive term $L^2/2r^2$ will dominate over the
attractive $-m/r$ for small values of r. In General Relativity, on the other hand, it is
the attractive term $-mL^2/r^3$ that dominates for small r.
Fortunately for the stability of the solar system, the situation is qualitatively quite
different for sufficiently large values of the angular momentum, namely $L > \sqrt{12}\,m$ (see
Figure 11).
In that case, there is a minimum and a maximum of the potential. The critical radii
correspond to exactly circular orbits, unstable at r− (on top of the potential) and stable
at r+ (the minimum of the potential). For $L \to \sqrt{12}\,m$ these two orbits approach each
other, the critical radius tending to r± → 6m. Thus the innermost stable circular orbit
(known affectionately as the ISCO in astrophysics) is located at
$$r_{\rm ISCO} = 6m . \qquad (13.38)$$

[Figure 10: Effective potential $V_{\rm eff}(r)$ for a massive particle with $L/m < \sqrt{12}$ (axis marks at r = 2m and $V_{\rm eff} = -1/2$). The extrapolation to values of r < 2m has been indicated by a dashed line.]

[Figure 11: Effective potential $V_{\rm eff}(r)$ for a massive particle with $L/m > \sqrt{12}$. Shown are the maximum of the potential at r− (an unstable circular orbit), the minimum at r+ (a stable circular orbit), and the orbit of a particle with $E_{\rm eff} < 0$ with turning points r1 and r2.]
On the other hand, for very large values of L the critical radii are (expand the square
root to first order) to be found at
$$(r_+, r_-) \xrightarrow{\;L\to\infty\;} (L^2/m,\; 3m) . \qquad (13.39)$$
For given L, for sufficiently large values of $E_{\rm eff}$ a particle will fall all the way down
the potential. For $E_{\rm eff} < 0$, there are bound orbits which are not circular and which
range between the radii r1 and r2, the turning points at which ṙ = 0 and therefore
$E_{\rm eff} = V_{\rm eff}(r)$.
Because of the general relativistic correction ∼ 1/r 3 , the bound orbits will not be closed
(elliptical). In particular, the position of the perihelion, the point of closest approach
of the planet to the sun where the planet has distance r1 , will not remain constant.
However, because r1 is constant, and the planetary orbit is planar, this point will move
on a circle of radius r1 around the sun.
As described in section 13.2, in order to calculate this perihelion shift one needs to
calculate the total angle ∆φ swept out by the planet during one revolution by integrating
this from r1 to r2 and back again to r1 , or
$$\Delta\phi = 2\int_{r_1}^{r_2}\frac{d\phi}{dr}\,dr . \qquad (13.40)$$
Rather than trying to evaluate the above integral via some sorcery, we will determine
∆φ by analysing the orbit equation (13.33) for ǫ = −1,
$$u'' + u = \frac{m}{L^2} + 3mu^2 . \qquad (13.41)$$
In the Newtonian approximation, this equation reduces to that of a displaced harmonic
oscillator,
$$u_0'' + u_0 = \frac{m}{L^2} \quad\Leftrightarrow\quad (u_0 - m/L^2)'' + (u_0 - m/L^2) = 0 , \qquad (13.42)$$
and the solution is a Kepler ellipse described parametrically by
$$u_0(\phi) = \frac{m}{L^2}\left(1 + e\cos\phi\right) \qquad (13.43)$$
where e is the eccentricity (e = 0 means constant radius and hence a circular orbit).
Plugging this back into the Newtonian non-linear first-order equation (cf. (13.30))
$$u_0'^2 + u_0^2 = \frac{E^2 - 1}{L^2} + \frac{2m}{L^2}u_0 , \qquad (13.44)$$
one finds that the integration constant e is related to the energy by
$$e^2 = 1 + \frac{L^2}{m^2}(E^2 - 1) = 1 + \frac{2L^2}{m^2}E_{\rm eff} . \qquad (13.45)$$
In particular, e² < 1 for bound states (bounded orbits) with $E_{\rm eff} < 0$, and we will
concentrate on these orbits. The perihelion (aphelion) is then at φ = 0 (φ = π), with
$$r_{1,2} = \frac{L^2}{m}\,\frac{1}{1 \pm e} . \qquad (13.46)$$
Thus the semi-major axis a of the ellipse,
$$2a = r_1 + r_2 , \qquad (13.47)$$
is
$$a = \frac{L^2}{m}\,\frac{1}{1 - e^2} . \qquad (13.48)$$
In particular, in the Newtonian theory, one has
$$(\Delta\phi)_0 = 2\pi . \qquad (13.49)$$
The anomalous perihelion shift due to the effects of General Relativity is thus
$$\delta\phi = \Delta\phi - 2\pi . \qquad (13.50)$$
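To determine δφ, we write
$$u = u_0 + u_1 \qquad (13.51)$$
and treat $u_1$ as a small correction. Inserting this into (13.41) and keeping only the
lowest-order terms, one finds that $u_1$ satisfies
$$u_1'' + u_1 = 3mu_0^2 = \frac{3m^3}{L^4}\left(1 + 2e\cos\phi + e^2\cos^2\phi\right) .$$
The general solution of this inhomogeneous differential equation is the general solution
of the homogeneous equation (which we are not interested in) plus a special solution of the
inhomogeneous equation. Writing $\cos^2\phi = \tfrac12(1 + \cos 2\phi)$, one finds the special solution
$$u_1(\phi) = \frac{3m^3}{L^4}\left(\left(1 + \tfrac12 e^2\right) - \tfrac16 e^2\cos 2\phi + e\phi\sin\phi\right) . \qquad (13.55)$$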
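The special solution (13.55) can be checked by finite differences. The minimal sketch below assumes the first-order equation $u_1'' + u_1 = 3mu_0^2$ with $u_0$ from (13.43), obtained by inserting $u = u_0 + u_1$ into (13.41) and keeping the lowest-order terms, and evaluates the residual on a grid of φ values. Since (13.55) is meant to solve this linear equation exactly, the parameter values (and the function names) are arbitrary choices of ours.

```python
import math

def u1(phi, m, L, e):
    """Special solution (13.55) of the first-order correction equation."""
    pref = 3.0 * m**3 / L**4
    return pref * ((1.0 + 0.5 * e**2) - (e**2 / 6.0) * math.cos(2.0 * phi)
                   + e * phi * math.sin(phi))

def source(phi, m, L, e):
    """Right-hand side 3 m u0^2, with the Kepler solution u0 from (13.43)."""
    u0 = (m / L**2) * (1.0 + e * math.cos(phi))
    return 3.0 * m * u0**2

def residual(phi, m, L, e, h=1e-3):
    """u1'' + u1 - 3 m u0^2, with u1'' from a central second difference."""
    d2 = (u1(phi + h, m, L, e) - 2.0 * u1(phi, m, L, e) + u1(phi - h, m, L, e)) / h**2
    return d2 + u1(phi, m, L, e) - source(phi, m, L, e)

m, L, e = 1.0, 2.0, 0.3
worst = max(abs(residual(0.1 * k, m, L, e)) for k in range(200))
print(worst)
```

Writing the source in terms of u0 directly, rather than in expanded form, means the check also covers the identity cos²φ = ½(1 + cos 2φ) used above; the remaining residual is only the O(h²) discretisation error.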
The term of interest to us is the third term which provides a cumulative non-periodic
effect over successive orbits. Focussing on this term, we can write the approximate
solution to the orbit equation as
$$u(\phi) \approx \frac{m}{L^2}\left(1 + e\cos\phi + \frac{3m^2 e}{L^2}\,\phi\sin\phi\right) . \qquad (13.56)$$
If the first perihelion is at φ = 0, the next one will be at a point Δφ = 2π + δφ close to
2π which is such that u′(Δφ) = 0, or
$$\sin\delta\phi = \frac{3m^2}{L^2}\left(\sin\delta\phi + (2\pi + \delta\phi)\cos\delta\phi\right) . \qquad (13.57)$$
Using that δφ is small, and keeping only the lowest order terms in this equation, one
finds the result
$$\delta\phi = \frac{6\pi m^2}{L^2} = 6\pi\left(\frac{G_N M}{cL}\right)^2 . \qquad (13.58)$$
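An alternative way to obtain this result is to observe that (13.56) can be approximately
written as
$$u(\phi) \approx \frac{m}{L^2}\left(1 + e\cos\left[\left(1 - \frac{3m^2}{L^2}\right)\phi\right]\right) . \qquad (13.59)$$
From this equation it is manifest that during each orbit the perihelion advances by
$$\delta\phi = 2\pi\,\frac{3m^2}{L^2} , \qquad (13.60)$$
since the next maximum of u is reached when the argument of the cosine equals 2π, i.e. at
$\phi = 2\pi(1 - 3m^2/L^2)^{-1} \approx 2\pi(1 + 3m^2/L^2)$. This is in agreement with the above result.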
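One can also test (13.60) without the perturbative approximation by integrating the orbit equation (13.41) numerically. The sketch below is a hand-written fourth-order Runge-Kutta integrator; the function names and the choice m = 1, L = 50, e = 0.2 are ours. It starts at a perihelion with u = (m/L²)(1 + e) and u′ = 0, follows the orbit until u′ next changes sign from positive to negative, and compares the overshoot beyond 2π with 6πm²/L². Agreement is expected only up to corrections of higher order in m²/L².

```python
import math

def orbit_rhs(u, up, m, L):
    """First-order form of (13.41): (u, u')' = (u', m/L^2 + 3 m u^2 - u)."""
    return up, m / L**2 + 3.0 * m * u**2 - u

def perihelion_advance(m, L, e, steps_per_rev=20_000):
    """Numerical delta phi: angle between successive perihelia minus 2 pi."""
    u, up = (m / L**2) * (1.0 + e), 0.0   # start at a perihelion (u maximal, u' = 0)
    dphi = 2.0 * math.pi / steps_per_rev
    phi = 0.0
    for _ in range(2 * steps_per_rev):    # at most two revolutions
        k1 = orbit_rhs(u, up, m, L)
        k2 = orbit_rhs(u + dphi / 2 * k1[0], up + dphi / 2 * k1[1], m, L)
        k3 = orbit_rhs(u + dphi / 2 * k2[0], up + dphi / 2 * k2[1], m, L)
        k4 = orbit_rhs(u + dphi * k3[0], up + dphi * k3[1], m, L)
        u_new = u + dphi / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        up_new = up + dphi / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        # The next perihelion is where u' changes sign from + to - (after the aphelion)
        if phi > math.pi and up > 0.0 >= up_new:
            frac = up / (up - up_new)      # linear interpolation of the zero of u'
            return phi + frac * dphi - 2.0 * math.pi
        u, up, phi = u_new, up_new, phi + dphi
    raise RuntimeError("no second perihelion found")

m, L, e = 1.0, 50.0, 0.2
numeric = perihelion_advance(m, L, e)
first_order = 6.0 * math.pi * m**2 / L**2
print(numeric, first_order)
```

With m/L = 1/50 the relative difference between the two numbers is a fraction of a percent; making L smaller pushes the orbit closer to the star and the higher-order corrections become visible.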
In terms of the eccentricity e and the semi-major axis a (13.48) of the elliptical orbit,
this can be written as
$$\delta\phi = \frac{6\pi G_N M}{c^2 a(1 - e^2)} . \qquad (13.61)$$
As these parameters are known for the planetary orbits, δφ can be evaluated. For
example, for Mercury, where this effect is largest (because it is the planet closest to the sun
and has the largest eccentricity), one finds δφ ≈ 0.1″ per revolution. This is of course a tiny
effect (one second of arc, 1″, is one degree divided by 3600) and not per se detectable. However,
1. this effect is cumulative, i.e. after N revolutions one has an anomalous perihelion
shift Nδφ;
2. Mercury has a very short solar year, with about 415 revolutions per century;
3. and accurate observations of the orbit of Mercury go back over 200 years.
Thus the cumulative effect is approximately 850δφ and this is sufficiently large to be
observable. The prediction of General Relativity for this effect is
$$\delta\phi_{\rm GR} \approx 43'' \ \text{per century} . \qquad (13.62)$$
And indeed such an effect is observed (and had for a long time presented a puzzle,
an anomaly, for astronomers). In actual fact, the perihelion of Mercury's orbit shows a
precession rate of 5601″ per century, so this does not yet look like a brilliant confirmation
of General Relativity. However, of this effect about 5025″ are due to the fact that one is
using a non-inertial geocentric coordinate system (precession of the equinoxes). 532″
are due to perturbations of Mercury's orbit caused by the (Newtonian) gravitational
attraction of the other planets of the solar system (chiefly Venus, Earth and Jupiter).
This much was known prior to General Relativity and left an unexplained anomalous
perihelion shift of
$$\delta\phi_{\rm anomalous} = 43.11'' \pm 0.45''\ \text{per century} . \qquad (13.63)$$
Now the agreement with the result of General Relativity is truly impressive and this is
one of the most important experimental verifications of General Relativity.
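The numbers quoted above follow from (13.61). The sketch below plugs in rounded standard values for the solar gravitational parameter GM_⊙ and for Mercury's semi-major axis, eccentricity and orbital period; these inputs are assumptions supplied for the example, not taken from the text. It converts the shift per revolution to arcseconds and multiplies by the number of revolutions per Julian century (36525 days).

```python
import math

# Assumed input values (standard published figures, rounded; not from the text):
GM_SUN = 1.32712440018e20      # m^3 s^-2, heliocentric gravitational parameter
C = 299_792_458.0              # m/s, speed of light
A_MERCURY = 5.7909e10          # m, semi-major axis
E_MERCURY = 0.2056             # eccentricity
PERIOD_DAYS = 87.969           # sidereal orbital period in days

ARCSEC_PER_RAD = 180.0 / math.pi * 3600.0

def perihelion_shift_per_orbit(gm: float, a: float, e: float) -> float:
    """delta phi from (13.61), in radians per revolution."""
    return 6.0 * math.pi * gm / (C**2 * a * (1.0 - e**2))

dphi = perihelion_shift_per_orbit(GM_SUN, A_MERCURY, E_MERCURY)
orbits_per_century = 36525.0 / PERIOD_DAYS
print(f"per orbit:      {dphi * ARCSEC_PER_RAD:.4f} arcsec")
print(f"orbits/century: {orbits_per_century:.1f}")
print(f"per century:    {dphi * ARCSEC_PER_RAD * orbits_per_century:.2f} arcsec")
```

With these inputs one gets roughly 0.10″ per revolution, about 415 revolutions per century, and close to 43″ per century, consistent with (13.63).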
Other observations, involving e.g. the mini-planet Icarus, discovered in 1949, with a
huge eccentricity e ∼ 0.827, binary pulsar systems, and more recently observations of
highly eccentric stars close to the galactic (black hole) center have provided further
confirmation of the agreement between General Relativity and observations.
$$V_{\rm eff}(r) = \frac{L^2}{2r^2} - \frac{mL^2}{r^3} = \frac{L^2}{2r^2}\left(1 - \frac{2m}{r}\right) . \qquad (13.64)$$
The following properties are immediate:
3. $V_{\rm eff}(r = 2m) = 0$.
5. There is one critical point of the potential, at r = 3m, with $V_{\rm eff}(r = 3m) = L^2/54m^2$.
For energies $E^2 > L^2/27m^2$, photons are captured by the star and will spiral into it.
For energies $E^2 < L^2/27m^2$, on the other hand, there will be a turning point, and light
rays will be deflected by the star. As this may sound a bit counterintuitive (shouldn't
a photon with higher energy be more likely to zoom by the star without being forced
to spiral into it?), think about this in the following way. L = 0 corresponds to a
photon falling radially towards the star, L small corresponds to a slight deviation from
radial motion, while L large (thus φ̇ large) means that the photon is travelling along a
trajectory that will not bring it very close to the star at all (see the next subsection for
the precise relation between the angular momentum L and the impact parameter b of
the photon). It is then not surprising that photons with small L are more likely to be
captured by the star (this happens for $L^2 < 27m^2E^2$) than photons with large L which
will only be deflected in their path.

[Figure 12: Effective potential for a massless particle (axis marks at r = 2m and r = 3m). Displayed is the location of the unstable circular orbit at r = 3m. A photon with an energy $E^2 < L^2/27m^2$ will be deflected (lower arrow), photons with $E^2 > L^2/27m^2$ will be captured by the star.]
We will study this in more detail below. But let us first also consider the opposite
situation, that of light from or near the star (and we are of course assuming that
$r_0 > r_s$). Then for $r_0 < 3m$ and $E^2 < L^2/27m^2$, the light cannot escape to infinity but
falls back to the star, whereas for $E^2 > L^2/27m^2$ light will escape. Thus for a path
sufficiently close to radial ($L$ small, because $\dot\phi$ is then small) light can always escape as
long as $r > 2m$.
The existence of one unstable circular orbit for photons at r = 3m (the photon sphere),
while not relevant for the applications to the solar system in this section, turns out to be
of some interest in black hole astrophysics (as a possibly observable signature of black
holes).
13.6 The Bending of Light by a Star: 3 Derivations
To study the bending of light by a star, we consider an incoming photon (or light ray)
with impact parameter $b$ (see Figure 13) and we need to calculate $\phi(r)$ for a trajectory
with turning point at $r = r_1$. At that point we have $\dot r = 0$ (the dot now indicates
differentiation with respect to the affine parameter $\sigma$ of the null geodesic; we can, but
need not, choose this to be the coordinate time $t$), and therefore $r_1$ is determined by
$$E_{\text{eff}} = V_{\text{eff}}(r_1) \;\Leftrightarrow\; r_1^2 = \frac{L^2}{E^2}\left(1 - \frac{2m}{r_1}\right)\,. \tag{13.65}$$
The first thing we need to establish is the relation between $b$ and the other parameters
$E$ and $L$. Consider the ratio
$$\frac{L}{E} = \frac{r^2\dot\phi}{(1 - 2m/r)\,\dot t}\,. \tag{13.66}$$
For large values of $r$, $r \gg 2m$, this reduces to
$$\frac{L}{E} = r^2\frac{d\phi}{dt}\,. \tag{13.67}$$
On the other hand, for large $r$ we can approximate $b/r = \sin\phi$ by $\phi$. Since we also have
$dr/dt = -1$ (for an incoming light ray), we deduce
$$\frac{L}{E} = r^2\frac{d}{dt}\frac{b}{r} = b\,. \tag{13.68}$$
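With $b = L/E$ from (13.68), the turning-point condition (13.65) becomes the cubic $r_1^3 - b^2 r_1 + 2mb^2 = 0$. The minimal Python sketch below (helper name `turning_point` is our own; plain bisection is assumed adequate) finds its largest root and makes the capture criterion $L^2 < 27m^2E^2$, i.e. $b < 3\sqrt3\,m$, explicit: for such $b$ there is no turning point outside $r = 3m$. For $b > 3\sqrt3\,m$ the cubic is negative at $r = 3m$ and positive at $r = b$, and has exactly one root in between.

```python
import math


def turning_point(b, m):
    """Largest root r1 of r1**3 = b**2 * (r1 - 2m), i.e. (13.65) with L/E = b.

    Returns None when b**2 <= 27 m**2 (no turning point: the photon is captured)."""
    if b * b <= 27 * m * m:
        return None
    f = lambda r: r**3 - b * b * (r - 2 * m)
    lo, hi = 3 * m, b            # f(3m) < 0 and f(b) = 2 m b^2 > 0 in this regime
    for _ in range(200):         # bisection keeps f(lo) < 0 <= f(hi)
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


m = 1.0
print(3 * math.sqrt(3) * m)      # critical impact parameter
print(turning_point(5.0, m))     # below 3*sqrt(3) m: captured
print(turning_point(10.0, m))    # scattered, r1 > 3m
print(turning_point(1e4, m))     # for b >> m, r1 is close to b - m
```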
In terms of the variable u = 1/r the equation for the shape of the orbit (13.33) is
In the absence of the general relativistic correction (calling this ‘Newtonian’ is perhaps
not really appropriate since we are dealing with photons/light rays) one has $b^{-1} = u_1$
or $b = r_1$ (no deflection). The orbit equation
$$u_0'' + u_0 = 0 \tag{13.72}$$
Obligingly the integral gives
$$(\Delta\phi)_0 = 2\int_0^1 dx\,(1 - x^2)^{-1/2} = 2\arcsin 1 = \pi\,. \tag{13.75}$$
$$\delta\phi = \Delta\phi - \pi\,. \tag{13.76}$$
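In order to solve the orbit equation (13.69), we proceed as in section 13.4. Thus the
equation for the (small) deviation $u_1(\phi)$ is
$$u_1'' + u_1 = 3mu_0^2 = \frac{3m}{b^2}(1 - \cos^2\phi) = \frac{3m}{2b^2}(1 - \cos 2\phi) \tag{13.77}$$
which has the particular solution (cf. (13.54))
$$u_1(\phi) = \frac{3m}{2b^2}\left(1 + \tfrac13\cos 2\phi\right)\,. \tag{13.78}$$
Therefore
$$u(\phi) = \frac1b\sin\phi + \frac{3m}{2b^2}\left(1 + \tfrac13\cos 2\phi\right)\,. \tag{13.79}$$
By considering the behaviour of this equation as $r \to \infty$ or $u \to 0$ (where $\phi \approx -\delta\phi/2$ is small, so that
$\sin\phi \approx \phi$ and $\cos 2\phi \approx 1$), one finds an equation for (minus) half the deflection angle, namely
$$\frac1b\,(-\delta\phi/2) + \frac{3m}{2b^2}\,\frac43 = 0\,, \tag{13.80}$$
leading to the result
$$\delta\phi = \frac{4m}{b} = \frac{4MG_N}{bc^2}\,. \tag{13.81}$$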
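The perturbative steps above can be checked numerically. The Python sketch below (our own; helper names are hypothetical) first verifies by a central finite difference that (13.78) solves $u_1'' + u_1 = 3mu_0^2$ with $u_0 = \sin\phi/b$, and then locates the zero of (13.79) near $\phi = 0$ by bisection. Minus twice that angle should reproduce $\delta\phi = 4m/b$ up to higher-order corrections in $m/b$.

```python
import math


def u1(phi, b, m):
    """Particular solution (13.78)."""
    return 1.5 * m / b**2 * (1 + math.cos(2 * phi) / 3)


def u_orbit(phi, b, m):
    """First-order photon orbit (13.79): u = sin(phi)/b + u1(phi)."""
    return math.sin(phi) / b + u1(phi, b, m)


def ode_residual(phi, b, m, h=1e-4):
    """u1'' + u1 - 3 m u0^2 using a central second difference; should be ~0."""
    u1pp = (u1(phi + h, b, m) - 2 * u1(phi, b, m) + u1(phi - h, b, m)) / h**2
    return u1pp + u1(phi, b, m) - 3 * m * (math.sin(phi) / b) ** 2


def deflection(b, m):
    """Bisect for u_orbit(phi) = 0 on [-0.5, 0] (valid for small m/b); return -2*phi."""
    lo, hi = -0.5, 0.0           # u_orbit(lo) < 0 < u_orbit(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if u_orbit(mid, b, m) < 0:
            lo = mid
        else:
            hi = mid
    return -(lo + hi)            # -2 * (lo + hi) / 2


b, m = 1.0, 1e-3
print(ode_residual(0.7, b, m), deflection(b, m), 4 * m / b)
```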
Figure 13: Bending of light by a star. Indicated are the definitions of the impact
parameter b, the perihelion r1 , and of the angles ∆φ and δφ.
a function of the independent variables $r_1$ and $m$, with $b$ eliminated via (13.71). Thus
(13.70) becomes
$$\Delta\phi = 2\int_0^{u_1} du\,\left[u_1^2 - u^2 - 2m(u_1^3 - u^3)\right]^{-1/2}\,. \tag{13.82}$$
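is therefore
$$(\Delta\phi)_1 = \left.\frac{\partial}{\partial m}\Delta\phi\right|_{m=0} = 2\int_0^{b^{-1}} du\,\frac{b^{-3} - u^3}{(b^{-2} - u^2)^{3/2}}\,. \tag{13.84}$$
This integral is elementary,
$$\int dx\,\frac{1 - x^3}{(1 - x^2)^{3/2}} = -(x + 2)\left(\frac{1 - x}{1 + x}\right)^{1/2}\,, \tag{13.85}$$
and thus (substituting $u = x/b$)
$$(\Delta\phi)_1 = 4b^{-1}\,, \tag{13.86}$$
leading to
$$\delta\phi = \frac{4m}{b} = \frac{4MG_N}{bc^2}\,, \tag{13.87}$$
in agreement with the result (13.81) obtained above.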
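The integral (13.82) can also be evaluated without expanding in $m$. Writing $u = u_1\sin\theta$ and using $u_1^3 - u^3 = (u_1 - u)(u_1^2 + u_1u + u^2)$ (a rewriting of ours, not taken from the notes), it becomes $\Delta\phi = 2\int_0^{\pi/2}d\theta\,[1 - 2mu_1(1 + s + s^2)/(1 + s)]^{-1/2}$ with $s = \sin\theta$, a smooth integral that depends on $m$ and $u_1$ only through $mu_1$ and requires $mu_1 < 1/3$ (turning point outside $r = 3m$). The Python sketch below evaluates it with Simpson's rule; the ratio $\delta\phi/(4mu_1)$ should tend to 1 as $mu_1 \to 0$, in line with (13.86)-(13.87).

```python
import math


def delta_phi_exact(m_u1, n=2000):
    """delta phi = Delta phi - pi from (13.82) after substituting u = u1 sin(theta):
    Delta phi = 2 int_0^{pi/2} dtheta [1 - 2 m u1 (1 + s + s^2)/(1 + s)]^(-1/2), s = sin(theta).
    Composite Simpson rule with n (even) intervals; requires m*u1 < 1/3."""
    def integrand(theta):
        s = math.sin(theta)
        return (1 - 2 * m_u1 * (1 + s + s * s) / (1 + s)) ** -0.5

    h = (math.pi / 2) / n
    total = integrand(0.0) + integrand(math.pi / 2)
    total += sum((4 if k % 2 else 2) * integrand(k * h) for k in range(1, n))
    return 2 * total * h / 3 - math.pi


for m_u1 in (1e-6, 1e-4, 1e-2):
    # ratio to the first-order result 4 m u1; deviations are of order m*u1
    print(m_u1, delta_phi_exact(m_u1) / (4 * m_u1))
```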
Schwarzschild metric by the approximation
$$A(r) = 1 - \frac{2m}{r} \;\to\; 1 - \frac{2m}{r}\,,\qquad B(r) = \left(1 - \frac{2m}{r}\right)^{-1} \;\to\; 1 + \frac{2m}{r}\,. \tag{13.88}$$
I will only sketch the main steps in this calculation, so you should think of this subsection
as an annotated exercise.
First of all, redoing the analysis of sections 13.1 and 13.2 for a general spherically
symmetric static metric (12.5),
$$B(r)\frac{r'^2}{r^4} + \frac{1}{r^2} = \frac{\epsilon}{L^2} + \frac{E^2}{L^2A(r)} \tag{13.90}$$
$$Bu'^2 + u^2 = \frac{\epsilon}{L^2} + \frac{E^2}{L^2A(u)}\,. \tag{13.91}$$
We will concentrate on the lightlike case $\epsilon = 0$,
$$Bu'^2 + u^2 = \frac{E^2}{L^2A(u)}\,, \tag{13.92}$$
and express the impact parameter $b = L/E$ in terms of the turning point $r_1 = 1/u_1$ of
the trajectory. At this turning point, $u' = 0$, and thus
$$\frac{E^2}{L^2} = A(u_1)\,u_1^2\,, \tag{13.93}$$
leading to
$$Bu'^2 = \frac{A(u_1)}{A(u)}u_1^2 - u^2\,. \tag{13.94}$$
We thus find
$$\frac{d\phi}{du} = \pm B(u)^{1/2}\left[\frac{A(u_1)}{A(u)}u_1^2 - u^2\right]^{-1/2}\,. \tag{13.95}$$
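For the (linearised) Schwarzschild metric the term in square brackets is
$$\frac{A(u_1)}{A(u)}u_1^2 - u^2 = u_1^2\bigl(1 + 2m(u - u_1)\bigr) - u^2 = (u_1^2 - u^2)\left(1 - 2m\frac{u_1^2}{u_1 + u}\right)\,. \tag{13.96}$$
Using this and the approximate (linearised) value for $B(u)$,
one finds that $d\phi/du$ is given by
$$\begin{aligned}
\frac{d\phi}{du} &= \pm B(u)^{1/2}\left[(u_1^2 - u^2)\left(1 - 2m\frac{u_1^2}{u_1 + u}\right)\right]^{-1/2} \\
&\approx (u_1^2 - u^2)^{-1/2}\left[1 + m\left(\frac{u_1^2}{u_1 + u} + u\right)\right] \\
&= (u_1^2 - u^2)^{-1/2} + m\,\frac{u_1^3 - u^3}{(u_1^2 - u^2)^{3/2}}\,.
\end{aligned} \tag{13.98}$$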
The first term now gives us the Newtonian result and, comparing with Derivation II,
we see that the second term agrees precisely with the integrand of (13.84) with $b \to r_1$
(which, in a term that is already of order $m$, makes no difference). We thus conclude
that the deflection angle is, as before,
$$\delta\phi = 2\int_0^{u_1} du\; m\,\frac{u_1^3 - u^3}{(u_1^2 - u^2)^{3/2}} = 4mu_1 \approx \frac{4m}{b}\,. \tag{13.99}$$
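The $O(m)$ term of (13.98) is the sum of two pieces: $mu\,(u_1^2 - u^2)^{-1/2}$ from expanding $B(u)^{1/2} \approx 1 + mu$, and $mu_1^2(u_1 + u)^{-1}(u_1^2 - u^2)^{-1/2}$ from the factor $A(u_1)/A(u)$. The Python sketch below (our own; Simpson's rule and the substitution $u = u_1\sin\theta$, under which $du\,(u_1^2 - u^2)^{-1/2} = d\theta$, are our choices) integrates each piece separately. Each contributes $2mu_1$, so keeping only the $g_{00}$ correction ($B = 1$) gives half of (13.99).

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    s = g(a) + g(b) + sum((4 if k % 2 else 2) * g(a + k * h) for k in range(1, n))
    return s * h / 3


u1 = 1.0   # with u1 = 1, each result below is the coefficient of m * u1
# the two O(m) pieces of (13.98), integrated over 0 <= u <= u1 and doubled:
from_B = 2 * simpson(lambda t: u1 * math.sin(t), 0.0, math.pi / 2)        # B(u)^{1/2} ~ 1 + m u
from_A = 2 * simpson(lambda t: u1 / (1 + math.sin(t)), 0.0, math.pi / 2)  # A(u1)/A(u) factor
print(from_B, from_A, from_B + from_A)
```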
This effect is physically measurable and was one of the first true tests of Einstein's new
theory of gravity. For light just passing the sun the predicted value is
$$\delta\phi \sim 1.75''\,. \tag{13.100}$$
Experimentally this is a bit tricky to observe because one needs to look at light from
distant stars passing close to the sun. Under ordinary circumstances this would not
be observable, but in 1919 a test of this was performed during a total solar eclipse, by
observing the effect of the sun on the apparent position of stars in the direction of the
sun. The observed value was rather imprecise, yielding $1.5'' < \delta\phi < 2.2''$, which is, if
not a confirmation of, at least consistent with General Relativity.
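To see where the number in (13.100) comes from, the sketch below evaluates $\delta\phi = 4G_NM/(bc^2)$ from (13.81) with $b$ equal to the solar radius, for a ray that just grazes the sun. The constants are rounded standard values that we supply; they are not taken from these notes.

```python
import math

G = 6.674e-11        # m^3 kg^-1 s^-2
C = 2.998e8          # m / s
M_SUN = 1.989e30     # kg
R_SUN = 6.957e8      # m, used as the impact parameter b of a grazing ray
RAD_TO_ARCSEC = 180 / math.pi * 3600


def deflection_arcsec(mass_kg, b_m):
    """delta phi = 4 G M / (b c^2), eq. (13.81), converted from radians to arcseconds."""
    return 4 * G * mass_kg / (b_m * C**2) * RAD_TO_ARCSEC


print(round(deflection_arcsec(M_SUN, R_SUN), 2))
```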
More recently, it has also been possible to measure the deflection of radio waves by the
gravitational field of the sun. These measurements rely on the fact that a particular
quasar, known as 3C275, is obscured annually by the sun on October 8th, and the
observed result (after correcting for refraction effects by the corona of the sun) in this
case is $\delta\phi = 1.76'' \pm 0.02''$.
The value predicted by General Relativity is, interestingly enough, exactly twice the
value that would have been predicted by the Newtonian approximation of the geodesic
equation alone (but the Newtonian approximation is not valid anyway because it applies
to slowly moving objects, and light certainly fails to satisfy this condition). A calculation
leading to this wrong value had first been performed by Soldner in 1801 (!) (by cancelling
the mass m out of the Newtonian equations of motion before setting m = 0), and
Einstein also predicted this wrong result in 1911 (in his equivalence principle days, before
he came close to discovering the field equations of General Relativity now carrying his
name).
This result can be obtained from the above calculation by setting B(u) = 1 instead of
(13.97), as in the Newtonian approximation only g00 is non-trivial. More generally, one
can calculate the deflection angle for a metric with the approximate behaviour
The perhaps slickest way to obtain the orbits of the Kepler problem is to make use of the
so-called Runge-Lenz vector. Recall that, due to conservation of angular momentum $\vec L$,
the orbits in any spherically symmetric potential are planar. The bound orbits of the
Kepler problem, however, have the additional property that they are closed, i.e. that
the position of the perihelion is constant. This suggests that there is a further hidden symmetry in the
Kepler problem, with the position of the perihelion as the corresponding conserved charge.
This is indeed the case.
$$\vec A = \dot{\vec x}\times\vec L + W(r)\,\vec x \tag{13.103}$$
or, in components,
$$A_i = \epsilon_{ijk}\dot x_j L_k + W(r)\,x_i\,. \tag{13.104}$$
A straightforward calculation, using the Newtonian equations of motion in the potential
$W(r)$, shows that
$$\frac{d}{dt}A_i = \bigl(r\partial_r W(r) + W(r)\bigr)\dot x_i\,. \tag{13.105}$$
Thus $\vec A$ is conserved if and only if $W(r)$ is homogeneous of degree $(-1)$,
$$\frac{d}{dt}\vec A = 0 \;\Leftrightarrow\; W(r) = \frac{c}{r}\,. \tag{13.106}$$
In our notation, $c = \epsilon m$, and we will henceforth refer to the vector
$$\vec A = \dot{\vec x}\times\vec L + \epsilon m\,\frac{\vec x}{r} \tag{13.107}$$
as the Runge-Lenz vector.
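The conservation statement (13.105)-(13.107) is easy to check numerically. The Python sketch below (our own illustration; RK4, the step size and the initial data are arbitrary choices, in units with $m = 1$ and $\epsilon = -1$) integrates a bound Kepler orbit for a bit more than one period and evaluates $\vec A$ at the start and at the end. It stays fixed, points from the centre of attraction to the perihelion and has norm $e\,m$, as stated in (13.111).

```python
import math


def deriv(s, m):
    """Kepler problem with eps = -1: xddot = -m x / r^3 (planar, s = (x, y, vx, vy))."""
    x, y, vx, vy = s
    r3 = (x * x + y * y) ** 1.5
    return (vx, vy, -m * x / r3, -m * y / r3)


def rk4_step(s, dt, m):
    """One classical fourth-order Runge-Kutta step."""
    k1 = deriv(s, m)
    k2 = deriv(tuple(a + 0.5 * dt * b for a, b in zip(s, k1)), m)
    k3 = deriv(tuple(a + 0.5 * dt * b for a, b in zip(s, k2)), m)
    k4 = deriv(tuple(a + dt * b for a, b in zip(s, k3)), m)
    return tuple(a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4))


def runge_lenz(s, m, eps=-1.0):
    """In-plane components of A = xdot x L + eps m x / r, eq. (13.107), with L = Lz e_z."""
    x, y, vx, vy = s
    Lz = x * vy - y * vx
    r = math.hypot(x, y)
    return (vy * Lz + eps * m * x / r, -vx * Lz + eps * m * y / r)


m = 1.0
state = (1.0, 0.0, 0.0, 1.2)      # perihelion at (1, 0): velocity perpendicular to x
A_start = runge_lenz(state, m)
for _ in range(20000):            # t = 20, longer than one orbital period (about 15)
    state = rk4_step(state, 1e-3, m)
A_end = runge_lenz(state, m)
print(A_start, A_end)             # both close to (0.44, 0), i.e. e = A/m = 0.44
```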
It is well known, and easy to see by calculating e.g. the Poisson brackets of suitable linear
combinations of $\vec L$ and $\vec A$, that $\vec A$ extends the manifest symmetry group of rotations
$SO(3)$ of the Kepler problem to $SO(4)$ for bound orbits and $SO(3,1)$ for scattering
orbits.
It is straightforward to determine the Keplerian orbits from $\vec A$. While $\vec A$ has 3 components,
the only new information is contained in the direction of $\vec A$, as the norm $A$ of $\vec A$
can be expressed in terms of the other conserved quantities and parameters (energy $E$,
angular momentum $L$, mass $m$) of the problem. In the notation of section 13.1 one has
$$\frac{1}{r(\phi)} = \frac{m}{L^2}\left(1 + \frac{A}{m}\cos\phi\right)\,. \tag{13.110}$$
Comparing with (13.43), we recognise this as the equation for an ellipse with eccentricity
$e$ and semi-major axis $a$ (13.48) given by
$$e = \frac{A}{m}\,,\qquad \frac{m}{L^2} = \frac{1}{a(1 - e^2)}\,. \tag{13.111}$$
Moreover, we see that the perihelion is at $\phi = 0$, which establishes that the Runge-Lenz
vector points from the center of attraction to the (constant) position of the perihelion.
During one revolution the angle $\phi$ changes from 0 to $2\pi$.
$$\frac{1}{r(\phi)} = \frac{A}{L^2}\cos\phi \tag{13.112}$$
$$b = \frac{L^2}{A} = \frac{L}{E}\,. \tag{13.113}$$
In this case, $\phi$ runs from $-\pi/2$ to $\pi/2$ and the point of closest approach is again at
$\phi = 0$ (distance $b$).
We see that the Runge-Lenz vector captures precisely the information that in the New-
tonian theory bound orbits are closed and light rays are not deflected. The Runge-Lenz
vector will no longer be conserved in the presence of the general relativistic correction
to the Newtonian motion, and this non-constancy is a precise measure of the deviation
of the general relativistic orbits from their Newtonian counterparts. As shown e.g. in an
article by Brill and Goel²⁸, this provides a very elegant and quick way of (re-)deriving
the results about perihelion precession and deflection of light in the solar system.
Calculating the time-derivative of $\vec A$ (13.107) for a particle moving in the general
relativistic potential (13.16)
$$V(r) = \epsilon\frac{m}{r} - \frac{mL^2}{r^3}\,, \tag{13.114}$$
one finds (of course we now switch from $t$ to $\tau$)
$$\frac{d}{d\tau}\vec A = \frac{3mL^2}{r^2}\,\frac{d}{d\tau}\vec n\,, \tag{13.115}$$
where $\vec n = \vec x/r = (\cos\phi, \sin\phi, 0)$ is the unit vector in the plane $\theta = \pi/2$ of the orbit.
Thus $\vec A$ rotates with angular velocity
$$\omega = \frac{3mL^2\cos\phi}{Ar^2}\,\dot\phi\,. \tag{13.116}$$
Here $A$ now refers to the norm of the Newtonian Runge-Lenz vector (13.107) calculated
for a trajectory $\vec x(\tau)$ in the general relativistic potential (13.114). This norm is now no
longer constant,
$$A^2 = E^2L^2 + \epsilon(L^2 + \epsilon m^2) + \frac{2mL^4}{r^3}\,. \tag{13.117}$$
However, assuming that the change in $\vec A$ is small, we obtain an approximate expression
for $\omega$ by substituting the unperturbed orbit $r_0(\phi)$ from (13.109),
$$r_0(\phi) = \frac{L^2}{A\cos\phi - \epsilon m}\,, \tag{13.118}$$
For $\epsilon = -1$, and $(\phi_1, \phi_2) = (0, 2\pi)$, this results in (only the $\cos^2\phi$-term gives a non-zero
contribution)
$$\delta\phi = 2\pi\frac{3m^2}{L^2} = \frac{6\pi m^2}{L^2}\,, \tag{13.121}$$
²⁸ D. Brill, D. Goel, Light bending and perihelion precession: A unified approach, Am. J. Phys. 67 (1999) 316, arXiv:gr-qc/9712082
in precise agreement with (13.58,13.60).
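The same Runge-Lenz bookkeeping can be done numerically, without the approximation of substituting $r_0(\phi)$. The Python sketch below (our own illustration; RK4, the orbit parameters and the helper names are assumptions) integrates the Newtonian-form equations of motion in the potential (13.114) with $\epsilon = -1$ from one perihelion to the next. At a turning point $\dot{\vec x} \perp \vec x$, so $\vec A$ is parallel to $\vec x$ there and its polar angle is the perihelion angle. The result should agree with $6\pi m^2/L^2$ up to corrections of relative order $m^2/L^2$.

```python
import math


def deriv(s, m, L2):
    """Phase-space derivative for the force of V(r) = -m/r - m*L2/r**3 ((13.114), eps = -1)."""
    x, y, vx, vy = s
    r2 = x * x + y * y
    r = math.sqrt(r2)
    k = m / (r2 * r) + 3 * m * L2 / (r2 * r2 * r)    # V'(r) / r
    return (vx, vy, -k * x, -k * y)


def rk4_step(s, dt, m, L2):
    k1 = deriv(s, m, L2)
    k2 = deriv(tuple(a + 0.5 * dt * b for a, b in zip(s, k1)), m, L2)
    k3 = deriv(tuple(a + 0.5 * dt * b for a, b in zip(s, k2)), m, L2)
    k4 = deriv(tuple(a + dt * b for a, b in zip(s, k3)), m, L2)
    return tuple(a + dt / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4))


def runge_lenz_angle(s, m):
    """Polar angle of the Newtonian Runge-Lenz vector (13.107) with eps = -1."""
    x, y, vx, vy = s
    Lz = x * vy - y * vx
    r = math.hypot(x, y)
    return math.atan2(-vx * Lz - m * y / r, vy * Lz - m * x / r)


def perihelion_shift(m, r_p, v_p, dt, max_steps=10**6):
    """Start at a perihelion on the +x axis, integrate to the next one, return A's angle."""
    s = (r_p, 0.0, 0.0, v_p)
    L2 = (r_p * v_p) ** 2                # conserved (r^2 phidot)^2 entering (13.114)
    rdot_prev = 0.0
    for _ in range(max_steps):
        s = rk4_step(s, dt, m, L2)
        x, y, vx, vy = s
        rdot = (x * vx + y * vy) / math.hypot(x, y)
        if rdot_prev < 0 <= rdot:        # radial velocity turns positive: next perihelion
            return runge_lenz_angle(s, m)
        rdot_prev = rdot
    raise RuntimeError("no perihelion found; increase max_steps")


m, r_p, v_p = 1.0, 300.0, 20.0 / 300.0  # L^2 = (r_p v_p)^2 = 400 m^2
shift = perihelion_shift(m, r_p, v_p, dt=2.0)
print(shift, 6 * math.pi * m**2 / (r_p * v_p) ** 2)
```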
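For $\epsilon = 0$, on the other hand, one has
$$\delta\phi = \frac{3mA}{L^2}\int_{-\pi/2}^{\pi/2} d\phi\,\cos^3\phi\,. \tag{13.122}$$
Using
$$\int d\phi\,\cos^3\phi = \sin\phi - \tfrac13\sin^3\phi\,, \tag{13.123}$$
one finds
$$\delta\phi = \frac{4mA}{L^2} = \frac{4m}{b}\,, \tag{13.124}$$
which agrees precisely with the results of section 13.6.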
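For completeness, here is the definite integral behind (13.124) spelled out, using (13.123) and $b = L^2/A$ from (13.113):

```latex
\int_{-\pi/2}^{\pi/2} d\phi\,\cos^3\phi
  = \Bigl[\sin\phi - \tfrac13\sin^3\phi\Bigr]_{-\pi/2}^{\pi/2}
  = 2\Bigl(1 - \tfrac13\Bigr) = \tfrac43 ,
\qquad
\delta\phi = \frac{3mA}{L^2}\cdot\frac43 = \frac{4mA}{L^2} = \frac{4m}{b} .
```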
14 Approaching the Schwarzschild Radius rs
So far, we have been considering objects of a size larger (in practice much larger) than
their Schwarzschild radius, $r_0 > r_s$. We also noted that the effective potential $V_{\text{eff}}(r)$ is
perfectly well behaved at $r_s$. We now consider objects with $r_0 < r_s$ and try to unravel
some of the bizarre physics that nevertheless occurs when one approaches or crosses
$r_s = 2m$. We will do this in several steps:
• First we will consider observers that don't quite dare to cross $r_s$ and who try to
remain stationary at a fixed value of $r$ close to $r_s$.
• Then we will consider observers that fall freely (and radially) in this geometry,
and describe their voyage both from their point of view and from that of a distant
stationary observer (which will turn out to be quite different).
• Then we will study the geometry of the Schwarzschild metric near $r = r_s$, and
show that the geometry is completely non-singular (and is in fact closely related to
the geometry of the Rindler metric for Minkowski space-time discussed in section
1.3).
• These considerations will indicate that (and explain why) the usual Schwarzschild
coordinates (specifically the time-coordinate $t$) are inadequate for describing the
physics across the radius $r_s$.
• In order to learn more about the region near and beyond $r = r_s$, we next study
the behaviour of lightcones and light rays in this geometry.
• These considerations will then lead us (in section 15) to the introduction of coordinates
in which the Schwarzschild metric is non-singular for all $0 < r < \infty$, and
then the fun can begin and we can try to understand what actually happens at (and
what characterises) $r = r_s$.
Some insight into the Schwarzschild geometry, and the difference between Newtonian
gravity and general relativistic gravity, is provided by looking at stationary observers,
i.e. observers hovering at fixed values of $(r, \theta, \phi)$. Thus their 4-velocity $u^\alpha = \dot x^\alpha$ has the
form $u^\alpha = (u^0, 0, 0, 0)$ with $u^0 > 0$. The normalisation $u^\alpha u_\alpha = -1$ then implies
$$u^\alpha = \bigl(f(r)^{-1/2}, 0, 0, 0\bigr) = \left(\frac{1}{\sqrt{1 - 2m/r}}, 0, 0, 0\right)\,. \tag{14.1}$$
The worldline of a stationary observer is clearly not a geodesic (that would be the
worldline of an observer freely falling in the gravitational field), and we can calculate
its covariant acceleration (4.66)
This looks nicely Newtonian, with a force in the radial direction designed to precisely
cancel the gravitational attraction. However, this is a bit misleading since this is a
coordinate dependent statement. A coordinate-invariant quantity is the norm of the
acceleration,
$$a(r) \equiv \bigl(g_{\alpha\beta}a^\alpha a^\beta\bigr)^{1/2} = \frac{m}{r^2}\left(1 - \frac{2m}{r}\right)^{-1/2}\,. \tag{14.5}$$
While this approaches the Newtonian value as $r \to \infty$, it diverges as $r \to 2m$, indicating
that stationary observers will find it harder and harder to remain stationary close to $r = 2m$
(relative to them, freely falling observers passing by move at nearly the speed of light there).
Remarks:
1. While $a(r)$ diverges as $r \to 2m$, we will see later (section 15.7) that the finite
quantity $\lim_{r\to 2m} f(r)^{1/2}a(r) = 1/4m$ can be regarded as a measure of the strength
of the gravitational field at $r = 2m$ (surface gravity).
2. One can also think of $a(r)$ as the radial component of the acceleration with respect
to an orthonormal frame at that point, whose radial basis vector would be
$(1 - 2m/r)^{1/2}\partial_r$, the prefactor ensuring that this vector has unit length.
so that here (and in the static spherically symmetric case in general) $\Phi(r)$ can be
regarded as the general relativistic analogue of the Newtonian potential $\phi(r)$, to
which it reduces in the Newtonian limit,
$$\Phi(r) = \tfrac12\ln f(r) \approx \tfrac12\ln\bigl(1 + 2\phi(r)\bigr) = \phi(r) + \ldots \tag{14.7}$$
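A short numerical look at (14.5) and remark 1 (Python sketch; units $G_N = c = 1$ and $m = 1$ are assumed). The function $a(r)$ approaches the Newtonian $m/r^2$ at large $r$ and blows up near $r = 2m$. The product $f(r)^{1/2}a(r)$ equals $m/r^2$ identically and therefore tends to the finite surface-gravity value $1/4m$.

```python
import math


def proper_acceleration(r, m):
    """Norm of the 4-acceleration of a static observer, eq. (14.5)."""
    return m / r**2 / math.sqrt(1 - 2 * m / r)


m = 1.0
for r in (1e6, 10.0, 3.0, 2.001, 2.000001):
    a = proper_acceleration(r, m)
    # columns: r, a(r), Newtonian m/r^2, f(r)^(1/2) a(r)
    print(r, a, m / r**2, math.sqrt(1 - 2 * m / r) * a)
```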
14.2 Vertical Free Fall
We will now consider an object with $r_0 < r_s$ and an observer who is freely falling
vertically (radially) towards such an object. “Vertical” means that $\dot\phi = 0$, and therefore
there is no angular momentum, $L = 0$. Hence the effective potential equation (13.13)
becomes
$$E^2 - 1 = \dot r^2 - \frac{2m}{r}\,, \tag{14.8}$$
where the conserved energy $E$ (per unit mass) is given by
$$E = \left(1 - \frac{2m}{r}\right)\dot t\,. \tag{14.9}$$
In particular, if $r_i$ is the point at which the particle (observer) A was initially at rest,
$$\left.\frac{dr}{d\tau}\right|_{r=r_i} = 0\,, \tag{14.10}$$
we have the relation
$$E^2 = 1 - \frac{2m}{r_i} = f(r_i) \tag{14.11}$$
between the constant of motion $E$ and the initial condition $r_i$. In particular, $E = 1$ for
an observer following a trajectory of an object that would have initially been at rest at
infinity. In that case, (14.8) is readily integrated to give
Since this is just the Newtonian integral, we know, even without calculating it, that it
is finite as $r_f \to r_s$ (and even as $r_f \to 0$). This integral can also be calculated in closed
form, e.g. via the change of variables
leading (after some convenient cancellations) to
$$\tau = \left(\frac{r_i}{2m}\right)^{1/2}\int_0^{\eta_f} d\eta\,r(\eta) = \left(\frac{r_i^3}{8m}\right)^{1/2}(\eta_f + \sin\eta_f)\,. \tag{14.17}$$
In particular, this is finite as $r_f \to 2m$ and our freely falling observer can reach and
cross the Schwarzschild radius $r_s$ in finite proper time.
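The finiteness of the proper infall time can also be checked directly. Since the change of variables itself is not reproduced above, the sketch assumes the cycloid parametrisation $r(\eta) = (r_i/2)(1 + \cos\eta)$, which is consistent with (14.17) (one checks $d\tau/d\eta = (r_i/2m)^{1/2}r(\eta)$). It compares the closed form with a direct numerical integration of $d\tau = -dr\,(2m/r - 2m/r_i)^{-1/2}$, which follows from (14.8) and (14.11). The substitution $r = r_i - s^2$, used to remove the square-root singularity at the starting point, is our choice.

```python
import math


def tau_closed_form(r_i, r_f, m):
    """(14.17), assuming the cycloid parametrisation r(eta) = (r_i/2)(1 + cos eta)."""
    eta_f = math.acos(2 * r_f / r_i - 1)
    return math.sqrt(r_i**3 / (8 * m)) * (eta_f + math.sin(eta_f))


def tau_numeric(r_i, r_f, m, n=2000):
    """Integrate d tau = -dr / sqrt(2m/r - 2m/r_i) from r_i down to r_f (Simpson rule).
    With r = r_i - s^2 this becomes d tau = 2 sqrt(r r_i / 2m) ds, smooth at s = 0."""
    g = lambda s: 2 * math.sqrt((r_i - s * s) * r_i / (2 * m))
    s_max = math.sqrt(r_i - r_f)
    h = s_max / n
    total = g(0.0) + g(s_max) + sum((4 if k % 2 else 2) * g(k * h) for k in range(1, n))
    return total * h / 3


m, r_i = 1.0, 10.0
print(tau_closed_form(r_i, 2 * m, m), tau_numeric(r_i, 2 * m, m))   # finite at r_f = 2m
```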
Coordinate time, on the other hand, becomes infinite at $r_f = 2m$. This can roughly
(and very easily) be seen by noting that
$$\Delta\tau = \left(1 - \frac{2m}{r}\right)^{1/2}\Delta t\,. \tag{14.18}$$
$\Delta t \to \infty$. One can also calculate explicitly $t = t(\eta)$ or $t = t(r)$, say, and check that
this indeed diverges as $r \to 2m$, but this is unenlightening. What we will do below is
check the divergence by concentrating on the region near $r = 2m$, and interpret this
divergence in terms of observations made by a stationary observer.
We will now investigate how the above situation presents itself to a distant observer
hovering at a fixed radial distance r∞ . He will observe the trajectory of the freely falling
observer as a function of his proper time τ∞ . Up to a constant factor (1 − 2m/r∞ )1/2 ,
this is the same as coordinate time t, and we will lose nothing by expressing r as a
function of t rather than as a function of τ∞ .
From (14.8),
$$\dot r^2 + \left(1 - \frac{2m}{r}\right) = E^2\,, \tag{14.19}$$
which expresses $r$ as a function of the freely falling observer's proper time $\tau$, and the
definition of $E$,
$$E = \dot t\left(1 - \frac{2m}{r}\right)\,, \tag{14.20}$$
which relates $\tau$ to the coordinate time $t$, one finds an equation for $r$ as a function of $t$,
$$\frac{dr}{dt} = -E^{-1}\left(1 - \frac{2m}{r}\right)\left(E^2 - \left(1 - \frac{2m}{r}\right)\right)^{1/2} \tag{14.21}$$
(the minus sign has been chosen because $r$ decreases as $t$ increases). We want to analyse
the behaviour of the solution of this equation as the freely falling observer approaches
the Schwarzschild radius, $r \to 2m$,
$$\frac{dr}{dt} = -E^{-1}\left(\frac{r - 2m}{r}\right)\left(E^2 - \frac{r - 2m}{r}\right)^{1/2} \;\to\; -E^{-1}\left(\frac{r - 2m}{2m}\right)(E^2)^{1/2} = -\left(\frac{r - 2m}{2m}\right)\,. \tag{14.22}$$
We can write this equation as
$$\frac{d}{dt}(r - 2m) = -\frac{1}{2m}(r - 2m)\,, \tag{14.23}$$
which obviously has the solution
$$r(t) - 2m \propto e^{-t/2m}\,. \tag{14.24}$$
This shows that, from the point of view of the observer at infinity, the freely falling
observer reaches r = 2m only as t → ∞. In particular, the distant observer will never
actually see the infalling observer cross the Schwarzschild radius.
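The exponential approach to $r = 2m$ in coordinate time can also be seen without the near-horizon approximation, by integrating (14.21) numerically. The Python sketch below (our own; $E = 1$, $m = 1$ and RK4 are our choices) works with $x = r - 2m$ so that tiny distances from the Schwarzschild radius keep full floating-point precision. It measures the late-time slope of $\ln(r - 2m)$ against $t$, which should approach $-1/2m$ as in (14.23).

```python
import math


def dxdt(x, m, E=1.0):
    """Eq. (14.21) rewritten for x = r - 2m, using f(r) = 1 - 2m/r = x / r."""
    r = 2 * m + x
    f = x / r
    return -f * math.sqrt(E * E - f) / E


def rk4(x, dt, m):
    k1 = dxdt(x, m)
    k2 = dxdt(x + 0.5 * dt * k1, m)
    k3 = dxdt(x + 0.5 * dt * k2, m)
    k4 = dxdt(x + dt * k3, m)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


m, dt = 1.0, 1e-2
x = 8.0                        # start at r = 10m (E = 1: at rest at infinity)
log_x = {}
for step in range(1, 8001):    # coordinate time t from 0 to 80m
    x = rk4(x, dt, m)
    if step in (6000, 8000):   # t = 60m and t = 80m
        log_x[step] = math.log(x)
slope = (log_x[8000] - log_x[6000]) / (2000 * dt)
print(slope, -1 / (2 * m))
```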
This is clearly an indication that there is something wrong with the time coordinate $t$,
which runs too fast as one approaches the Schwarzschild radius. We can also see this by
looking at the coordinate velocity $v = dr/dt$ as a function of $r$. Let us choose $r_i = \infty$
for simplicity; other choices will not change our conclusions as we are interested in the
behaviour of $v(r)$ as $r \to r_s$. Then $E^2 = 1$ and from (14.21) we find (now dropping the
minus sign)
$$v(r) = (2m)^{1/2}\,\frac{r - 2m}{r^{3/2}}\,. \tag{14.25}$$
As a function of $r$, $v(r)$ reaches a maximum at the critical radius $r_c = 6m = 3r_s$,
$$\frac{d}{dr}v(r) = 0 \;\Rightarrow\; r = r_c = 6m\,, \tag{14.26}$$
where the velocity is (restoring the speed of light $c$)
$$v(r_c) = \frac{2c}{3\sqrt3}\,. \tag{14.27}$$
The fact that this radius agrees with the ISCO (innermost stable circular orbit) (13.38)
mentioned in section 13.3 should be regarded as a coincidence.
Beyond that point, i.e. for r < rc , v(r) decreases again and clearly goes to zero as
r → 2m. The fact that the coordinate velocity goes to zero is another manifestation of
the fact that coordinate time goes to infinity. Somehow, the Schwarzschild coordinates
are not suitable for describing the physics at or beyond the Schwarzschild radius because
the time coordinate one has chosen is running too fast. This is the crucial insight that
will allow us to construct ‘better’ coordinates, which are also valid for r < rs , later on
in this section.
As an aside, if one repeats the above “calculation” for arbitrary $E$, from (14.21) one
finds that the critical radius is
$$r_c = \frac{2m}{1 - 2E^2/3}\,, \tag{14.28}$$
with the $E$-dependent maximal velocity
$$v(r_c) = \frac{2c}{3\sqrt3}\,E^2\,. \tag{14.29}$$
This reproduces the above result for $E = 1$, shows that $r_c \to 2m$ for $E \to 0$, and
moreover curiously shows that there is a maximal $E$, $E_{\max} = \sqrt{3/2}$, for which such a
critical point of the coordinate velocity will occur, at $r_c \to \infty$, with maximal velocity
$$E \to E_{\max} = \sqrt{3/2} \;\Rightarrow\; r_c \to \infty \quad\text{and}\quad v(r_c) \to v_{\max} = c/\sqrt3\,. \tag{14.30}$$
Thus for larger values of $E$, the coordinate velocity will monotonically decrease from its
initial value
$$v(r \to \infty) = E^{-1}(E^2 - 1)^{1/2} \tag{14.31}$$
to $v(r) = 0$ as $r \to 2m$.
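The statements (14.25)-(14.30) about the maximum of the coordinate velocity are easy to confirm numerically. The Python sketch below (our own; $c = m = 1$, and golden-section search is an arbitrary choice) maximises $v(r) = E^{-1}f\,(E^2 - f)^{1/2}$ from (14.21) for a few values of $E < \sqrt{3/2}$ and compares with $r_c$ from (14.28) and $v(r_c)$ from (14.29). For $E < 1$ the search interval ends at the starting radius $r_i = 2m/(1 - E^2)$ from (14.11).

```python
import math


def coord_velocity(r, m, E):
    """|dr/dt| from (14.21) with c = 1: v = E^{-1} f (E^2 - f)^{1/2}, f = 1 - 2m/r."""
    f = 1 - 2 * m / r
    return f * math.sqrt(max(E * E - f, 0.0)) / E   # max() guards against round-off at r_i


def golden_max(g, a, b, tol=1e-10):
    """Maximum of a unimodal function g on [a, b] via golden-section search."""
    k = (math.sqrt(5) - 1) / 2
    c, d = b - k * (b - a), a + k * (b - a)
    while b - a > tol * (1 + abs(a)):
        if g(c) > g(d):
            b, d = d, c
            c = b - k * (b - a)
        else:
            a, c = c, d
            d = a + k * (b - a)
    return (a + b) / 2


m = 1.0
for E in (0.8, 1.0, 1.1):
    r_top = 2 * m / (1 - E * E) if E < 1 else 1e4 * m
    rc = golden_max(lambda r: coord_velocity(r, m, E), 2 * m, r_top)
    print(E, rc, 2 * m / (1 - 2 * E * E / 3),                            # vs (14.28)
          coord_velocity(rc, m, E), 2 * E * E / (3 * math.sqrt(3)))     # vs (14.29)
```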
[And if anybody has an intuitive explanation for these facts, please let me know . . . ]
One dramatic aspect of what is happening at (or, better, near) the Schwarzschild radius
for very (very!) compact objects with $r_s > r_0$ is the following. Recall the formula
(2.105) for the gravitational red-shift, which gave us the ratio between the frequency of
light $\nu_e$ emitted at the radius $r_e$ and the frequency $\nu_\infty$ received at the radius $r_\infty > r_e$
in a static spherically symmetric gravitational field. The result, which is in particular
also valid for the Schwarzschild metric, was
$$\frac{\nu_\infty}{\nu_e} = \frac{(1 - 2m/r_e)^{1/2}}{(1 - 2m/r_\infty)^{1/2}}\,. \tag{14.33}$$
We now choose the emitter to be the freely falling observer whose position is described
by $r_e = r(\tau)$ or $r(t)$, and the receiver to be the fixed observer at $r_\infty \gg r_s$. As $r_e \to r_s$,
one clearly finds
$$\frac{\nu_\infty}{\nu_e} \to 0\,. \tag{14.34}$$
Expressed in terms of the gravitational red-shift factor $z$,
$$1 + z = \frac{\nu_e}{\nu_\infty} \tag{14.35}$$
this means that there is an infinite gravitational red-shift as $r_e \to r_s$,
$$r_e \to r_s \;\Rightarrow\; z \to \infty\,. \tag{14.36}$$
More explicitly, using (14.24) one finds the late-time behaviour of the red-shift factor
$z$, in terms of coordinate time (or the distant observer's proper time), to be
Thus for the distant observer at late times there is an exponentially growing red-shift
and the distant observer will never actually see the unfortunate emitter crossing the
Schwarzschild radius: he will see the freely falling observer's signals becoming dimmer
and dimmer and arriving at greater and greater intervals, and the freely falling observer
will completely disappear from the distant observer's sight as $r_e \to r_s$. Note that the
time-scale $t_z$ for this exponential red-shift at late times is set by $t_z \sim 4m/c$, which is of
the order of
$$t_z \sim 10^{-5}\,\mathrm{s}\;\frac{M}{M_{\text{sun}}}\,, \tag{14.38}$$
so that this is pretty much instantaneous for an object the mass of an ordinary star.
We will come back to these estimates later on when (briefly) talking about gravitational
collapse in section 15.9.
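Near $r = 2m$, (14.33) gives $1 + z \approx (1 - 2m/r_e)^{-1/2} \propto (r_e - 2m)^{-1/2}$. With $r_e - 2m$ decaying like $e^{-t/2m}$ according to (14.23), the red-shift grows like $e^{t/4m}$, which is where $t_z \sim 4m/c$ comes from. A few lines of Python reproduce the order of magnitude in (14.38); the rounded SI constants are inputs we supply.

```python
G = 6.674e-11      # m^3 kg^-1 s^-2
C = 2.998e8        # m / s
M_SUN = 1.989e30   # kg


def redshift_timescale(mass_kg):
    """e-folding time t_z = 4m/c of 1 + z, with m = G M / c^2, in seconds."""
    return 4 * G * mass_kg / C**3


for n_sun in (1, 10, 1e6):
    print(n_sun, redshift_timescale(n_sun * M_SUN))
```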
This was for a stationary observer. As we have seen, the situation presents itself rather
differently for the freely falling observer himself who will not immediately notice any-
thing particularly dramatic happening as he approaches or crosses rs .
We have now seen in two different ways why the Schwarzschild coordinates are not
suitable for exploring the physics in the region r ≤ 2m: in these coordinates the metric
becomes singular at r = 2m and the coordinate time becomes infinite. On the other
hand, we have seen no indication that the local physics, expressed in terms of covariant
quantities like proper time or the geodesic equation, becomes singular as well. So we
have good reasons to suspect that the singular behaviour we have found is really just
an artefact of a bad choice of coordinates.
In fact, the situation regarding the Schwarzschild coordinates is quite reminiscent of the
Rindler coordinates for Minkowski space we discussed (way back) in section 1.3. As we
saw there, these only covered part of Minkowski space (the right quadrant or Rindler
wedge), bounded by lines (or hypersurfaces) where the time coordinate η became infinite,
while inertial (geodesic, freely falling) observers could exit this region in finite proper
time. This is in fact more than just a loose analogy: as we will see now, the Rindler
metric (1.64) gives an accurate description of the geometry of the Schwarzschild metric
close to the Schwarzschild radius.
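so that the metric becomes
$$ds^2 = -\frac{\tilde r}{2m}\,dt^2 + \frac{2m}{\tilde r}\,d\tilde r^2\,. \tag{14.41}$$
Introducing the new radial variable $\rho$ (proper radial distance from the horizon) via
$$d\rho^2 = \frac{2m}{\tilde r}\,d\tilde r^2 \;\Rightarrow\; \rho = \sqrt{8m\tilde r}\,, \tag{14.42}$$
one finds
$$ds^2 = -\frac{\rho^2}{16m^2}\,dt^2 + d\rho^2\,. \tag{14.43}$$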
Finally a simple rescaling of $t$, $\eta = t/4m$, leads to
$$ds^2 = -\rho^2\,d\eta^2 + d\rho^2\,, \tag{14.44}$$
which, remarkably, is identical to the Rindler metric (1.64). Keeping track of the transverse
2-sphere and using $r^2 \approx (2m)^2$ in the near-horizon approximation, the complete
metric in this limit and in these coordinates reads
If we further restrict to just a small angular region on the sphere, we can approximate
In any case, we see that, up to the harmless redefinition (r, t) → (ρ, η) that we just per-
formed, Schwarzschild coordinates for the Schwarzschild geometry are just like Rindler
coordinates for Minkowski space.
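The near-horizon relation $\rho = \sqrt{8m\tilde r}$ in (14.42) can be compared with the exact proper radial distance from $r = 2m$, $\int_{2m}^r dr'\,(1 - 2m/r')^{-1/2} = \sqrt{r(r - 2m)} + 2m\ln\bigl[(\sqrt r + \sqrt{r - 2m})/\sqrt{2m}\bigr]$. This is a standard closed form that we derived ourselves rather than took from the notes. The Python sketch below shows that the two agree to relative order $\tilde r/m$ as $\tilde r \to 0$.

```python
import math


def rho_exact(r, m):
    """Proper radial distance from r = 2m: int_{2m}^r dr' (1 - 2m/r')^(-1/2), closed form."""
    return (math.sqrt(r * (r - 2 * m))
            + 2 * m * math.log((math.sqrt(r) + math.sqrt(r - 2 * m)) / math.sqrt(2 * m)))


def rho_near_horizon(r, m):
    """Near-horizon approximation (14.42): rho = sqrt(8 m r_tilde), r_tilde = r - 2m."""
    return math.sqrt(8 * m * (r - 2 * m))


m = 1.0
for r_tilde in (1e-1, 1e-3, 1e-6):
    r = 2 * m + r_tilde
    print(r_tilde, rho_exact(r, m), rho_near_horizon(r, m))
```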
Remarks:
1. The first, and most crucial, thing we learn from this is that the singularity of
the Schwarzschild metric at $r = 2m$ in the Schwarzschild coordinates $(t, r)$ is,
as anticipated, a mere coordinate singularity. Indeed, $r = 2m$ corresponds to
$\tilde r = 0 \Leftrightarrow \rho = 0$, and we already know that the singularity of the Rindler metric at
$\rho = 0$ is just a coordinate singularity (which can be eliminated e.g. by passing to
standard inertial Minkowski coordinates $\xi^a$ via (1.63)).
The problem is evidently that the required acceleration of these observers becomes
infinite as ρ → 0 ⇔ r → 2m (as we have calculated in section 14.1). That these
observers appear to see a singular metric is then not the geometry’s fault but can
be attributed to a bad choice of observers whose perception of the geometry is
distorted by their acceleration and their desperate attempt to stay at constant
values of r even when they are very close to r = 2m.
3. The situation is quite different for the freely falling observers. Their worldlines
look like the vertical line labelled “worldline of a stationary observer” in Figure 7;
they cross the horizon in finite proper time, experiencing no strong acceleration
or gravitational fields. As already noted in section 1.3, they evidently become
invisible to observers in the Rindler quadrant (now “Schwarzschild patch”) of
the geometry, stationary outside observers noting an infinite gravitational redshift
affecting the signals sent out by the freely falling observer.
4. Introducing coordinates adapted to freely falling observers (so that e.g. time is
their proper time) would be tantamount to passing from Rindler coordinates to
ordinary Minkowski coordinates, and we will consider that option in section 15.1
below (Painlevé-Gullstrand coordinates). These will provide us, as we will see,
with a coordinate system that extends in a non-singular way across the Schwarzschild
radius.
5. In order to elucidate the significance of the Schwarzschild radius, it will then turn
out to be useful to base the construction of new coordinates not on the behaviour of
timelike freely falling observers but on ingoing light rays instead (which evidently
also do not suffer from the problems of the stationary observers). These are the
Eddington-Finkelstein coordinates to be discussed in section 15.3.
Thus
$$\frac{dt}{dr} = \pm(1 - 2m/r)^{-1}\,, \tag{14.48}$$
Figure 14: The causal structure of the Schwarzschild geometry in the Schwarzschild
coordinates (r, t). As one approaches r = 2m, the lightcones become narrower and
narrower and eventually fold up completely.
In the $(r, t)$-diagram of Figure 14, $dt/dr$ represents the slope of the lightcones at a given
value of $r$. Now, as $r \to 2m$, one has
$$\frac{dt}{dr} \;\xrightarrow{\;r\to 2m\;}\; \pm\infty\,, \tag{14.49}$$
so the light cones ‘close up’ as one approaches the Schwarzschild radius. This is the
same statement as before regarding the fact that the coordinate velocity goes to zero at
$r = 2m$, but this time for null rather than timelike geodesics.
As our first step towards introducing coordinates that are more suitable for describing
the region around $r_s$, let us write the Schwarzschild metric in the form
$$r^* = r + 2m\log(r/2m - 1)\,. \tag{14.52}$$
This new radial coordinate $r^*$, known as the Regge-Wheeler radial coordinate or tortoise
coordinate, also provides us with the solution
$$t = \pm r^* + C \tag{14.53}$$
to the equation (14.48) describing the lightcones. In terms of $r^*$ the metric simply reads
We see immediately that we have made some progress. Now the lightcones, defined by
$$dt^2 = dr^{*2}\,, \tag{14.55}$$
do not seem to fold up as the lightcones have the constant slope $dt/dr^* = \pm 1$ (see
Figure 15), and there is no singularity at $r = 2m$. However, $r^*$ is still only defined for
$r > 2m$ and the surface $r = 2m$ has been pushed infinitely far away ($r = 2m$ is now at
$r^* = -\infty$). Moreover, even though non-singular, the metric components $g_{tt}$ and $g_{r^*r^*}$
(as well as $\sqrt{g}$) vanish at $r = 2m$.

Figure 15: The causal structure of the Schwarzschild geometry in the tortoise coordinates
$(r^*, t)$. The lightcones look like the lightcones in Minkowski space and no longer
fold up as $r \to 2m$ (which now sits at $r^* = -\infty$).
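The tortoise coordinate (14.52) is precisely the integral of the null-ray slope (14.48). The Python sketch below (our own; $m = 1$) integrates $|dt/dr| = (1 - 2m/r)^{-1}$ along an ingoing radial light ray from $r = 10m$ inwards and compares the result with $r^*(10m) - r^*(r)$. The substitution $s = \ln(r - 2m)$, which turns the integrand into $r$ and keeps it smooth near the horizon, is our choice. The elapsed coordinate time grows without bound, logarithmically in $r - 2m$, as the endpoint approaches $2m$.

```python
import math


def tortoise(r, m):
    """Regge-Wheeler (tortoise) coordinate (14.52), defined for r > 2m."""
    return r + 2 * m * math.log(r / (2 * m) - 1)


def ray_coordinate_time(r_start, r_end, m, n=2000):
    """Coordinate time for a radial light ray between r_end < r_start (both > 2m):
    the integral of (1 - 2m/r)^(-1) dr from (14.48). With s = log(r - 2m) the
    integrand becomes r = 2m + exp(s), smooth even near r = 2m (Simpson rule in s)."""
    a, b = math.log(r_end - 2 * m), math.log(r_start - 2 * m)
    g = lambda s: 2 * m + math.exp(s)
    h = (b - a) / n
    total = g(a) + g(b) + sum((4 if k % 2 else 2) * g(a + k * h) for k in range(1, n))
    return total * h / 3


m = 1.0
for r_end in (3.0, 2.1, 2.001, 2.000001):
    print(r_end, ray_coordinate_time(10.0, r_end, m), tortoise(10.0, m) - tortoise(r_end, m))
```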
As an aside, and to conclude this section, I just want to point out that the tortoise
coordinate $r^*$ and the corresponding retarded and advanced time-coordinates $u, v =
t \mp r^*$ (which we will re-encounter in section 15.3 as part of the Eddington-Finkelstein
coordinates) are not only useful for clarifying the causal structure of the Schwarzschild
geometry, but also for the analysis of the propagation of scalar (and other) fields. This
is principally due to the fact that in these coordinates the $(t, r)$-part of the metric
is conformally flat; see (14.54) or (15.79) below. Combined with the observation of
section 5.5 that the Klein-Gordon action is conformally invariant in $(1+1)$ dimensions,
this leads to a canonical form for the $(3+1)$ wave operator in the $(t, r)$-sector (and the
$(\theta, \phi)$-sector is standard anyway).
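Specifically, we start with the action for a (massless, say) scalar field $\phi$ in the metric
(14.54),
$$\begin{aligned}
S[\phi] &\sim \int d^4x\,\sqrt{g}\,g^{\alpha\beta}\partial_\alpha\phi\,\partial_\beta\phi \\
&= \int dt\,dr^*\,d\Omega\; r^2 f(r)\Bigl[f(r)^{-1}\bigl(-(\partial_t\phi)^2 + (\partial_{r^*}\phi)^2\bigr) - r^{-2}\phi\,\Delta_{S^2}\phi\Bigr] \\
&= \int dt\,dr^*\,d\Omega\;\Bigl[-(r\partial_t\phi)^2 + (r\partial_{r^*}\phi)^2 - f(r)\,\phi\,\Delta_{S^2}\phi\Bigr]\,,
\end{aligned} \tag{14.56}$$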
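where $f(r) = 1 - 2m/r$, $d\Omega = \sin\theta\,d\theta\,d\phi$ denotes the solid-angle element on the 2-sphere, and
$\Delta_{S^2}$ is the Laplace operator on the 2-sphere. Separating variables according to
$$\phi(x) = r^{-1}\sum_{\ell,m}\psi_{\ell m}(t, r^*)\,Y_{\ell m}(\theta, \phi)\,, \tag{14.57}$$
using
$$\Delta_{S^2}Y_{\ell m} = -\ell(\ell+1)\,Y_{\ell m}\,, \tag{14.58}$$
and
$$\partial_{r^*}r = f(r) \;\Rightarrow\; r\,\partial_{r^*}(\psi_{\ell m}/r) = \partial_{r^*}\psi_{\ell m} - r^{-1}f(r)\,\psi_{\ell m}\,, \tag{14.59}$$
where
$$V_\ell(r^*) = f(r)\left(\frac{\ell(\ell+1)}{r^2} + \frac{2m}{r^3}\right)\,. \tag{14.61}$$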
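This potential is non-negative for all $2m < r < \infty$ and goes to zero quite rapidly as
$r \to \infty$ or $r \to 2m$, i.e. as $r^* \to \pm\infty$,
$$\begin{aligned}
r^* \to +\infty \;\Leftrightarrow\; r \to \infty &: \quad V_\ell(r) \sim (r^*)^{-2} \\
r^* \to -\infty \;\Leftrightarrow\; r \to 2m &: \quad V_\ell(r) \sim e^{r^*/2m}\,.
\end{aligned} \tag{14.62}$$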
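The asymptotics (14.62) of the potential (14.61) can be checked numerically. The Python sketch below ($m = 1$ and $\ell = 2$ are arbitrary choices of ours) evaluates $V_\ell\,e^{-r^*/2m}$ close to the horizon, where it should approach a constant. By a short expansion of ours, that constant is $(\ell(\ell+1) + 1)/(4m^2e)$. The sketch also evaluates $V_\ell\,(r^*)^2$ at large $r$, where it should approach $\ell(\ell+1)$ (for $\ell \geq 1$).

```python
import math


def tortoise(r, m):
    """Tortoise coordinate (14.52)."""
    return r + 2 * m * math.log(r / (2 * m) - 1)


def v_ell(r, m, ell):
    """Scalar Regge-Wheeler potential (14.61), written as a function of r."""
    f = 1 - 2 * m / r
    return f * (ell * (ell + 1) / r**2 + 2 * m / r**3)


m, ell = 1.0, 2
print("near horizon, expected limit", (ell * (ell + 1) + 1) / (4 * m**2 * math.e))
for delta in (1e-4, 1e-6, 1e-8):
    r = 2 * m * (1 + delta)
    print(delta, v_ell(r, m, ell) * math.exp(-tortoise(r, m) / (2 * m)))
print("far away, expected limit", ell * (ell + 1))
for r in (1e3, 1e6, 1e9):
    print(r, v_ell(r, m, ell) * tortoise(r, m) ** 2)
```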
This means that at infinity (in $r$) and near the horizon, the solutions of this equation
can be chosen to have the standard right-moving (outgoing) / left-moving (ingoing)
form
$$\psi(t, r^*) \sim e^{\pm i\omega(t - r^*)} = e^{\pm i\omega u} \quad\text{or}\quad \psi(t, r^*) \sim e^{\pm i\omega(t + r^*)} = e^{\pm i\omega v}\,. \tag{14.63}$$
However, this does not mean that a mode having the above form near infinity, say, evolved from a mode that also had such a form near the horizon. Rather, the almost infinite exponential gravitational red-shift between the near-horizon and asymptotic regions discussed in section 14.4 leads to an exponential relation between the parameter u, say, labelling an outgoing wave at infinity, and the corresponding parameter near the horizon. This exponential relation is analogous to that encountered for a scalar field in Rindler versus inertial Minkowski coordinates (section 5.6). For the Schwarzschild metric it is encoded in the precisely analogous relation (15.93) between the coordinates u, v and the Kruskal coordinates u_K, v_K to be introduced below, the latter being the analogues of Minkowski inertial coordinates, and the former the analogues of Rindler coordinates. These observations are at the heart of the so-called Hawking Effect, i.e. the quantum radiation of black holes. See the references given in section 15.5 for introductions to these topics.

For a mode of fixed frequency, ψ(t, r*) = e^{−iωt}ψ(r*), the exact equation to be solved takes the form of a standard time-independent Schrödinger equation,

$$-\partial_{r^*}^2\psi + V_\ell(r^*)\,\psi = \omega^2\psi . \tag{14.65}$$

It plays an important role in numerous aspects of Black Hole physics, e.g. in the analysis of the stability of the Schwarzschild solution. In this context, the above equation and its counterparts for vectors and symmetric tensors are known as the Regge-Wheeler(-Zerilli) equations.
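As a rough numerical check of (14.61) and (14.62), the Python sketch below evaluates V_ℓ at a few radii and compares it with the two asymptotic forms. It is only an illustration resting on a few assumptions. It uses units with m = 1 by default. It fixes the tortoise coordinate as r* = r + 2m ln(r/2m − 1); this choice of additive constant only affects the normalisation of e^{r*/2m}. The function names are our own.

```python
import math


def tortoise(r, m=1.0):
    """Tortoise coordinate r* = r + 2m ln(r/2m - 1) for r > 2m (additive constant fixed by this choice)."""
    return r + 2 * m * math.log(r / (2 * m) - 1)


def regge_wheeler_potential(r, ell, m=1.0):
    """Scalar potential V_l of (14.61), expressed in terms of the Schwarzschild radius r."""
    f = 1 - 2 * m / r
    return f * (ell * (ell + 1) / r**2 + 2 * m / r**3)


if __name__ == "__main__":
    m, ell = 1.0, 1
    # Non-negativity on radii ranging from just outside the horizon to r = 2002m
    radii = [2 * m * (1 + 10.0**k) for k in range(-6, 4)]
    print(all(regge_wheeler_potential(r, ell, m) >= 0 for r in radii))

    # Near the horizon: V_l / exp(r*/2m) approaches (l(l+1) + 1) / (4 m^2 e)
    r = 2 * m * (1 + 1e-6)
    print(regge_wheeler_potential(r, ell, m) / math.exp(tortoise(r, m) / (2 * m)),
          (ell * (ell + 1) + 1) / (4 * m**2 * math.e))

    # Far away: V_l (r*)^2 approaches l(l+1) for l >= 1
    r = 1e6 * m
    print(regge_wheeler_potential(r, ell, m) * tortoise(r, m) ** 2, ell * (ell + 1))
```

With this normalisation of r*, the near-horizon ratio tends to (ℓ(ℓ+1) + 1)/(4m²e). For ℓ = 0 the potential falls off like r^{-3} at large r, so the first line of (14.62) describes the case ℓ ≥ 1.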
15 The Schwarzschild Black Hole
The primary purpose of this section is to understand the significance and physics of the Schwarzschild radius and the region r < 2m. We will accomplish this principally via a construction of appropriate, physically motivated coordinate systems and, indeed, a secondary purpose of this section is to illustrate how to go about constructing such coordinate systems in a systematic way (instead of just introducing them without further explanations).
Even though the details will differ, the general strategy used here for exploring and understanding a given space-time, namely constructing and using coordinates adapted to preferred classes of observers or geodesics, can also be applied to other metrics, e.g. some of those listed in section 15.10.
Let me also make clear from the outset what the issue is not. Namely, the issue is not one of constructing appropriate coordinates solely in the region 0 < r < 2m. Indeed, we have known such coordinates all along, simply the original Schwarzschild coordinates (t, r). The Schwarzschild metric is a vacuum solution of the Einstein equations also in that region, and the coordinates (t, r) give a valid non-singular description of the metric there. Since f(r) = 1 − 2m/r < 0, their interpretation differs, i.e. r is a timelike coordinate, and t plays the role of a radial coordinate, but this notational issue can easily be rectified by renaming r = T, t = R, and writing the Schwarzschild metric in the region 0 < r = T < 2m as

$$
\begin{aligned}
f(r) = 1 - \frac{2m}{r} &\equiv -\Big(\frac{2m}{T} - 1\Big) \\
\Rightarrow\quad ds^2 &= -\Big(\frac{2m}{T} - 1\Big)^{-1} dT^2 + \Big(\frac{2m}{T} - 1\Big)\,dR^2 + T^2\,d\Omega_2^2 .
\end{aligned}
\tag{15.1}
$$
While this provides some minimal insight (e.g. that for r < 2m surfaces of constant r are
spacelike and that the metric looks time-dependent), what we are looking for is a way of
describing the physics of the Schwarzschild solution that encompasses both the region
r > 2m and the region r < 2m (and that therefore e.g. provides a valid continuous map
for the freely falling observer as he crosses r = 2m).
We had seen that the Schwarzschild time coordinate t is adapted to stationary observers (whose proper time is proportional to t), and therefore not useful for describing the region r < 2m. We had also seen that freely falling observers cross r = 2m in finite proper time τ. This suggests choosing some family of freely falling observers (geodesics) and using their proper time τ = T(t, r, θ, φ) as the new time coordinate, i.e. performing the coordinate transformation t → T(t, r, θ, φ) (retaining, for the time being, the coordinates (r, θ, φ)).
A natural (and the simplest) choice is to consider the family of freely falling observers
which fall radially (angular momentum L = 0) and start off at rest from infinity (energy
E = 1). In this case, the geodesic equations (14.8) and (14.9) take the form
where

$$\Theta(r) = 2m\left(2(r/2m)^{1/2} + \ln\frac{(r/2m)^{1/2} - 1}{(r/2m)^{1/2} + 1}\right) \tag{15.6}$$

is the solution of

$$d\Theta(r)/dr = (2m/r)^{1/2}\,(1 - 2m/r)^{-1} . \tag{15.7}$$
In particular, we see that, up to the irrelevant constant τ0 − t0, τ(r) and t(r) are related by

$$\tau(r) = t(r) + \Theta(r) . \tag{15.8}$$
We are thus led to introduce the new time coordinate T(t, r) by

$$T(t,r) = t + \Theta(r) = t + 2(2mr)^{1/2} + 2m\ln\frac{(r/2m)^{1/2} - 1}{(r/2m)^{1/2} + 1} . \tag{15.9}$$
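Two claims here are easy to confirm numerically: that (15.6) solves (15.7), and that the two forms of (15.9) agree. The sketch below does so with a central finite difference, in units with m = 1 by default. It takes the absolute value inside the logarithm so that the same formula can be evaluated for r < 2m. That is our own choice, not something stated in the text, and it leaves the derivative unchanged.

```python
import math


def theta(r, m=1.0):
    """Theta(r) of (15.6); abs() in the log is our addition so that r < 2m is also allowed."""
    s = math.sqrt(r / (2 * m))
    return 2 * m * (2 * s + math.log(abs((s - 1) / (s + 1))))


def theta_prime(r, m=1.0):
    """Right-hand side of (15.7): (2m/r)^(1/2) (1 - 2m/r)^(-1)."""
    return math.sqrt(2 * m / r) / (1 - 2 * m / r)


def central_difference(fn, r, h=1e-6):
    """Second-order finite-difference approximation to fn'(r)."""
    return (fn(r + h) - fn(r - h)) / (2 * h)


if __name__ == "__main__":
    for r in (0.5, 1.3, 3.0, 10.0):  # m = 1, on both sides of r = 2m
        print(r, central_difference(theta, r), theta_prime(r))
    # The two forms of T - t in (15.9) agree, e.g. at r = 10m
    r, m = 10.0, 1.0
    s = math.sqrt(r / (2 * m))
    print(theta(r, m), 2 * math.sqrt(2 * m * r) + 2 * m * math.log((s - 1) / (s + 1)))
```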
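Since the metric does not depend explicitly on t, to determine the metric in the coordinates (T, r, θ, φ) we only need to substitute dt by

$$dt = dT - \Theta'(r)\,dr = dT - (2m/r)^{1/2} f(r)^{-1}\,dr .$$

This yields the Schwarzschild metric in Painlevé-Gullstrand (PG) coordinates,

$$ds^2 = -f(r)\,dT^2 + 2(2m/r)^{1/2}\,dT\,dr + dr^2 + r^2 d\Omega^2 = -dT^2 + \big(dr + (2m/r)^{1/2}\,dT\big)^2 + r^2 d\Omega^2 . \tag{15.11}$$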
Most importantly, we see that due to the non-singular off-diagonal term the metric in
these coordinates is well-defined and non-degenerate for all 0 < r < ∞, in particular at
r = 2m. This is the definitive proof that the singularity at r = 2m in Schwarzschild
coordinates is really just a coordinate singularity.
Remarks:
1. The metric has the characteristic property that the metric induced on the slices of constant T is just the flat Euclidean metric for any T,

$$ds^2\big|_{T = \text{const.}} = dr^2 + r^2\,d\Omega^2 .$$

This makes this form of the metric (and choice of time coordinate) particularly convenient e.g. for the canonical quantisation of fields in the Schwarzschild space-time, and it is also for this reason that this coordinate system has become increasingly popular in recent years.29
2. It is easy to confirm directly in the PG coordinates that the new time coordinate T really has the interpretation as measuring the proper time τ of radially freely falling observers starting off at rest at infinity. To that end note that in PG coordinates a radial timelike geodesic satisfies

$$-\dot T^2 + \big(\dot r + (2m/r)^{1/2}\,\dot T\big)^2 = -1 , \tag{15.13}$$

while the conserved energy associated with the T-translation invariance of the metric has the form

$$E = f\,\dot T - (2m/r)^{1/2}\,\dot r . \tag{15.14}$$

It immediately follows that

$$T = \tau \;\Rightarrow\; \dot T = +1 \;\Rightarrow\; \dot r = -(2m/r)^{1/2} \;\Rightarrow\; E = 1 . \tag{15.15}$$

These are precisely the geodesic paths with which we began the construction.
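Remark 2 can also be checked numerically. The sketch below evaluates the normalisation condition (15.13) and the energy (15.14) along Ṫ = 1, ṙ = −(2m/r)^{1/2}, on both sides of r = 2m. It uses m = 1 by default, and the function names are ours.

```python
import math


def pg_normalisation(T_dot, r_dot, r, m=1.0):
    """Left-hand side of (15.13) for a radial curve in PG coordinates."""
    return -T_dot**2 + (r_dot + math.sqrt(2 * m / r) * T_dot) ** 2


def pg_energy(T_dot, r_dot, r, m=1.0):
    """Conserved energy (15.14) associated with the Killing vector d/dT."""
    f = 1 - 2 * m / r
    return f * T_dot - math.sqrt(2 * m / r) * r_dot


if __name__ == "__main__":
    for r in (0.1, 1.0, 2.0, 5.0, 100.0):  # m = 1, inside, on and outside r = 2m
        r_dot = -math.sqrt(2 / r)  # forced by T_dot = 1, cf. (15.15)
        print(r, pg_normalisation(1.0, r_dot, r), pg_energy(1.0, r_dot, r))
```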
The coordinate transformation (15.9) is of the general form T (t, r) = t + ψ(r) (12.2)
discussed previously, and preserves the t-independence and manifest spherical symmetry
of the metric. It leads to the metric in the form anticipated in (12.9).
29 See e.g. P. Kraus, F. Wilczek, A Simple Stationary Line Element for the Schwarzschild Geometry, and Some Applications, arXiv:gr-qc/9406042 and citations thereto; A. Nielsen, M. Visser, Production and decay of evolving horizons, arXiv:gr-qc/0510083 for PG-like metrics for general time-dependent spherically symmetric metrics; C. Barcelo, S. Liberati, M. Visser, Analogue Gravity, arXiv:gr-qc/0505065 for a detailed discussion of PG-like metrics in the context of so-called analogue models of gravity.

Turning this around, once one has found the Schwarzschild metric in Schwarzschild coordinates, say, one can of course “discover” an infinite number of new coordinate systems for the Schwarzschild metric by performing such a (or a more general) coordinate transformation. Only in special cases, however, will one find a coordinate system that is actually useful. Let us see how to recover the PG coordinate system in this way, and how to “precover” another coordinate system that we will discuss in more detail in section 15.3.

Performing the transformation T = t + ψ(r), i.e. substituting dt = dT − ψ′(r)dr, in the Schwarzschild metric gives

$$ds^2 = -f(r)\,dT^2 + 2C(r)\,dT\,dr + f(r)^{-1}\big(1 - C(r)^2\big)\,dr^2 + r^2 d\Omega^2 , \tag{15.16}$$

where C(r) = f(r)ψ′(r) is an essentially arbitrary function of r. This metric represents the Schwarzschild metric for any choice of C(r). In particular, the vacuum Einstein equations do not impose any constraints on C(r). This can be checked explicitly, but it would be silly to do so since we have just seen explicitly that C(r) is not determined and just corresponds to the freedom of performing a particular class of coordinate transformations. One can therefore now choose C(r) at will.
The choice C(r) = 0 makes the metric diagonal, corresponds to ψ(r) constant, and evidently returns one to Schwarzschild coordinates. Any other choice of C(r) will lead to a non-diagonal metric in the coordinates (T, r).
Alternatively, one can choose C(r) such that g_rr = 1, i.e.

$$g_{rr} = 1 \;\Leftrightarrow\; C(r) = \pm(2m/r)^{1/2} . \tag{15.18}$$

We see that with the upper sign this is precisely the choice giving rise to PG coordinates, confirmed by the fact that for ψ(r) this implies the differential equation

$$C(r) = +(2m/r)^{1/2} \;\Rightarrow\; \psi'(r) = (2m/r)^{1/2} f(r)^{-1} \;\Rightarrow\; \psi(r) = \Theta(r) . \tag{15.19}$$

The same argument shows that the lower sign gives a corresponding set of PG coordinates based on outgoing rather than on ingoing geodesics.
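Finally, one can choose C(r) = ±1, so that g_rr = 0 and the metric has the particularly simple (and non-singular at r = 2m) form

$$ds^2 = -f(r)\,dT^2 \pm 2\,dT\,dr + r^2 d\Omega^2 .$$

For ψ(r) this choice implies ψ′(r) = ±f(r)^{-1}. Referring back to the discussion of section 14.6, in particular (14.51), we see that the solution to this equation is ψ(r) = ±r*, i.e. T = t ± r*. These are just the advanced and retarded time coordinates v = t + r* and u = t − r* of section 14.7, and we will encounter them in section 15.3 as Eddington-Finkelstein coordinates.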
This procedure can in principle be applied to any static, spherically symmetric metric,
and we will also make use of it later, e.g. in section 24.2 when constructing coordinates
for de Sitter space.
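The freedom in C(r) can be made concrete using the (T, r) block of the metric obtained from T = t + ψ(r) with C = fψ′, namely g_TT = −f, g_Tr = C, g_rr = f^{-1}(1 − C²). The short sketch below (plain Python, m = 1 by default, helper names ours) checks three things. The determinant of this block equals −1 for every C, wherever the block is finite (r ≠ 2m). The PG choice C = (2m/r)^{1/2} gives g_rr = 1 and g^{TT} = −1, as used below for the gradient covector u_α = −∂_αT. Finally, C = 1 gives g_rr = 0.

```python
import math


def tr_block(r, C, m=1.0):
    """(T, r) block [[g_TT, g_Tr], [g_rT, g_rr]] after T = t + psi(r), with C = f psi' (r != 2m)."""
    f = 1 - 2 * m / r
    return [[-f, C], [C, (1 - C**2) / f]]


def det2(g):
    return g[0][0] * g[1][1] - g[0][1] * g[1][0]


def inverse2(g):
    d = det2(g)
    return [[g[1][1] / d, -g[0][1] / d], [-g[1][0] / d, g[0][0] / d]]


if __name__ == "__main__":
    r = 5.0  # m = 1, outside the horizon
    for label, C in (("Schwarzschild", 0.0), ("PG", math.sqrt(2 / r)), ("C = 1", 1.0)):
        g = tr_block(r, C)
        print(label, "det =", det2(g), "g_rr =", g[1][1], "g^TT =", inverse2(g)[0][0])
```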
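$$u^\alpha\nabla_\alpha u_\beta = u^\alpha\nabla_\beta u_\alpha = \tfrac{1}{2}\nabla_\beta(u^\alpha u_\alpha) = 0 , \tag{15.23}$$

(this is a special case of the general result established in (9.66) in the discussion of the Raychaudhuri equation in section 9.4 that a gradient vector field is geodesic iff it is of constant length), and that the metric component g^{TT} with respect to this new coordinate function T is

$$g^{TT} = g^{\alpha\beta}\,\partial_\alpha T\,\partial_\beta T = u^\alpha u_\alpha = -1 .$$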
Conversely, this kind of reasoning may allow one to discover the geometric interpretation
of a coordinate system that one has selected through other criteria, e.g. via a convenient
choice of the function C(r) in (15.16). In particular, applied to PG coordinates, one
can argue as follows:
1. Given the metric in PG coordinates, one seeks the interpretation of the gradient
covector
$$u_\alpha = -\partial_\alpha T , \tag{15.25}$$

which is correctly normalised for a 4-velocity field, u^α u_α = −1, because g^{TT} = −1 in PG coordinates;
30
See K. Martel, E. Poisson, Regular coordinate systems for Schwarzschild and other spherical space-
times, arXiv:gr-qc/0001069, which also introduces generalised PG coordinates adapted to geodesic
observers with E > 1, i.e. with non-zero velocity at infinity. Corresponding coordinates for E < 1, i.e.
adapted to freely falling observers with maximal radius ri < ∞, have been constructed by Gautreau
and Hoffmann (see the references in this article).
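2. In Schwarzschild coordinates, u_α = −∂_αT = −(1, Θ′(r), 0, 0), and hence, raising the index, u^α has the components

$$u^\alpha = \big(f(r)^{-1},\; -(2m/r)^{1/2},\; 0,\; 0\big) ,$$

and comparison with (15.3) shows that this is precisely the family of tangent vectors

$$u^\alpha = \dot x^\alpha = (\dot t, \dot r, \dot\theta, \dot\phi) \tag{15.28}$$

of the radially infalling geodesics with E = 1 and L = 0 that we started with.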
Even though we now have a coordinate system that extends in a non-singular way
across r = 2m, so that r = 2m is not a true singularity, this does not mean that nothing
interesting at all happens at that locus. Indeed, the (legitimate) stationary observers at
large radii r ≫ 2m are still waiting for an explanation for their observations, described
in sections 14.3 and 14.4. Simply telling them that their coordinates are no good near
r = 2m will hardly be considered by them to be a satisfactory explanation of what they
observe.
In order to address this issue, we now look at the behaviour of (radial) lightrays in PG coordinates, characterised by

$$-f(r)\,dT^2 + 2(2m/r)^{1/2}\,dT\,dr + dr^2 = 0 . \tag{15.29}$$

Solving for dr/dT, one finds the two solutions r′_± ≡ dr/dT = −(2m/r)^{1/2} ± 1. For r > 2m one has (2m/r)^{1/2} < 1, so that r′_+ > 0 and r′_− < 0. This is the usual situation. There is one outgoing direction along which r = r(T) grows, and one ingoing direction along which r(T) decreases. In particular, while remaining within the future lightcone one can choose to go to either larger or smaller values of r.

For r < 2m, however, (2m/r)^{1/2} > 1 and hence r′_± < 0. Thus along both directions lightrays (and therefore massive particles as well) must move to smaller values of r. In particular, no observer, no lightray and no information can
escape from the region r < 2m. No wonder that the asymptotic stationary observers
never “see” the infalling observer cross r = 2m. This new region is a future extension
of the Schwarzschild “patch” r > 2m of the space-time, uncovered by ingoing radial
geodesics.
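Solving the quadratic (15.29) for dr/dT gives the two radial null directions dr/dT = −(2m/r)^{1/2} ± 1. The sketch below evaluates them and confirms the sign pattern described above. For r > 2m there is one outgoing and one ingoing direction. At r = 2m the outgoing slope vanishes. For r < 2m both directions are ingoing. It uses m = 1 by default, and the names are ours.

```python
import math


def pg_null_slopes(r, m=1.0):
    """The two radial null directions dr/dT = -sqrt(2m/r) +/- 1 solving (15.29)."""
    a = math.sqrt(2 * m / r)
    return 1 - a, -1 - a


if __name__ == "__main__":
    for r in (10.0, 2.0, 1.0):  # m = 1: outside, on and inside the horizon
        print(r, pg_null_slopes(r))
```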
With the opposite choice of sign in (15.18), corresponding to outgoing rather than ingoing radial geodesics, one would instead have discovered a region r < 2m in which r′_± > 0. This can therefore not possibly be the “same” region r < 2m, and indeed is a past extension of the Schwarzschild patch, uncovered by the back-tracking of outgoing radial geodesics.
While it is possible to further explore the consequences of all this in terms of the present PG coordinates, since we have been led to consider the structure of the lightcones, i.e. null geodesics, it turns out to be more convenient, also for the following, to discuss this in terms of coordinates that are adapted to lightrays rather than to the timelike geodesics. These are the Eddington-Finkelstein coordinates to be discussed in section 15.3 below.
This will be accomplished by constructing comoving coordinates for the geodesic ob-
servers underlying the (generalised) PG coordinate system, where comoving means that
these observers remain at fixed values of all the spatial coordinates and only evolve in
time = proper time. Since radial geodesics already remain at fixed values of the angular
coordinates (θ, φ), what this amounts to is to trade the coordinate r for another coor-
dinate that simply labels the individual geodesics (and that is therefore, tautologically,
guaranteed to also remain constant along such a geodesic). Such a label is provided by
an integration constant appearing in the solution to the radial geodesic equation since
(tauto-)logically such an integration constant is constant along the geodesic.
Let us first consider the case E = 1. Going back to the solution (15.5),

$$\dot r = -(2m/r)^{1/2} \;\Rightarrow\; r(\tau) = \big[3\sqrt{2m}\,(\tau_0 - \tau)/2\big]^{2/3} , \tag{15.34}$$

we can identify τ0 as a candidate for a comoving radial coordinate. This label τ0 can
be interpreted as the proper time at which the geodesic ends up at r = 0, r(τ0 ) = 0.
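Alternatively and preferably, if one is (understandably) reluctant to label geodesics by their behaviour at r = 0, τ0 can be thought of as being related to the value r0 of r(τ) at τ = 0,

$$r_0 = r(\tau = 0) = \big[3\sqrt{2m}\,\tau_0/2\big]^{2/3} . \tag{15.35}$$

Denoting this label by ρ ≡ τ0, we thus trade r for ρ via

$$r(\tau,\rho) = \big[3\sqrt{2m}\,(\rho - \tau)/2\big]^{2/3} . \tag{15.36}$$

This satisfies

$$dr = \dot r\,d\tau + (\partial r/\partial\rho)\,d\rho = \dot r\,(d\tau - d\rho) = -(2m/r)^{1/2}\,(d\tau - d\rho) , \tag{15.37}$$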
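so that (denoting the PG proper time coordinate T now by τ) the Schwarzschild line element in PG form (15.11) turns into

$$
\begin{aligned}
ds^2 &= -d\tau^2 + \big(dr + (2m/r)^{1/2}\,d\tau\big)^2 + r^2 d\Omega^2 \\
&= -d\tau^2 + \big((2m/r)^{1/2}\,d\rho\big)^2 + r^2 d\Omega^2 \\
&= -d\tau^2 + \frac{2m}{r(\tau,\rho)}\,d\rho^2 + r(\tau,\rho)^2\,d\Omega^2 ,
\end{aligned}
\tag{15.38}
$$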
with r(τ, ρ) given explicitly by (15.36). This is the Schwarzschild metric in Lemaître Coordinates (Lemaître (1938)), which have a clear physical interpretation and also manifestly extend in a non-singular way across r = 2m (the Schwarzschild radius is located at ρ − τ = 4m/3), to all r > 0.
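Here is a quick numerical sanity check of the Lemaître construction. It assumes r(τ, ρ) = [3√(2m)(ρ − τ)/2]^{2/3}, obtained from (15.34) by renaming the label τ0 as ρ. Three things should hold. Along a comoving worldline (fixed ρ) one should find dr/dτ = −(2m/r)^{1/2}. The step from (15.37) to (15.38) requires ∂r/∂ρ = +(2m/r)^{1/2}. The Schwarzschild radius r = 2m should sit at ρ − τ = 4m/3. The sketch uses m = 1 by default, and the function names are ours.

```python
import math


def lemaitre_r(tau, rho, m=1.0):
    """r(tau, rho) = [3 sqrt(2m) (rho - tau) / 2]^(2/3), defined for rho > tau."""
    return (1.5 * math.sqrt(2 * m) * (rho - tau)) ** (2.0 / 3.0)


def partial(fn, x, h=1e-6):
    """Central finite difference of a one-variable function."""
    return (fn(x + h) - fn(x - h)) / (2 * h)


if __name__ == "__main__":
    m, tau, rho = 1.0, 0.3, 2.0
    r = lemaitre_r(tau, rho, m)
    print(lemaitre_r(0.0, 4 * m / 3, m))  # r at rho - tau = 4m/3
    print(partial(lambda s: lemaitre_r(s, rho, m), tau), -math.sqrt(2 * m / r))
    print(partial(lambda s: lemaitre_r(tau, s, m), rho), math.sqrt(2 * m / r))
```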
Remarks:
We will discuss comoving coordinates, and metrics employing them, in much more
detail in the context of cosmology in sections 18-21, cf. in particular section 18.5
for geodesics of comoving observers in comoving coordinates,
coordinates to be described in detail later on) again makes them an attractive coordinate system to use e.g. when studying quantum field theory in a Schwarzschild background.31
Closely related to Lemaître coordinates are the so-called Novikov coordinates, comoving coordinates based on radial geodesics with a finite maximal radius, r_i < ∞ (i.e. E < 1) in the notation of section 14.2. In this case, trajectories are implicitly defined by (14.16) and (14.17), i.e.

$$
\begin{aligned}
r(\eta) &= \tfrac{1}{2}\,r_i\,(1 + \cos\eta) \\
\tau(\eta) &= \Big(\frac{r_i^3}{8m}\Big)^{1/2}(\eta + \sin\eta) ,
\end{aligned}
\tag{15.40}
$$

and one thinks of these relations, together with the usual relation E = f ṫ (14.9), as defining the change of variables (t, r) → (τ, r_i).
Note that r_i is clearly a comoving coordinate, as it can be used to label the geodesic. Note also that, if required/desired, from (15.40) one can solve for τ = τ(r, r_i),

$$
\begin{aligned}
\eta &= \arccos(2r/r_i - 1) \\
\Rightarrow\quad \tau(r, r_i) &= \Big(\frac{r_i^3}{8m}\Big)^{1/2}\Big[\arccos(2r/r_i - 1) + 2\sqrt{(r/r_i) - (r/r_i)^2}\,\Big] .
\end{aligned}
\tag{15.42}
$$
However, even without having to solve or invert these implicit equations, it is easy to see that the resulting metric will have the form
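• Writing

$$dt = \dot t\,d\tau + t'\,dr_i , \qquad dr = \dot r\,d\tau + r'\,dr_i , \tag{15.44}$$

the (t, r)-part of the metric becomes

$$ds^2 = -(f\dot t^2 - f^{-1}\dot r^2)\,d\tau^2 + 2(f^{-1}\dot r r' - f\dot t t')\,d\tau\,dr_i + \big(f^{-1}(r')^2 - f(t')^2\big)\,dr_i^2 . \tag{15.45}$$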
• The first term in brackets is equal to 1 because (t(τ), r(τ)) parametrise a timelike radial geodesic,
• Moreover, for comoving coordinates the second term in brackets, i.e. the off-diagonal term, is zero, g_{τ r_i} = 0 (15.39). Thus we can deduce
Using

$$\dot r = -(f_i - f)^{1/2} , \qquad f\dot t = E = f_i^{1/2} \qquad (f_i \equiv f(r_i)) , \tag{15.48}$$
Remarks:
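1. Usually, the metric is expressed not in terms of the variable r_i but in terms of

$$R = (r_i/2m - 1)^{1/2} \;\Leftrightarrow\; r_i = 2m(R^2 + 1) \;\Leftrightarrow\; E^2 = f_i = \frac{R^2}{R^2 + 1} , \tag{15.51}$$

so that the metric takes the form

$$ds^2 = -d\tau^2 + \frac{R^2 + 1}{R^2}\Big(\frac{\partial r}{\partial R}\Big)^2 dR^2 + r(\tau, R)^2\,d\Omega^2 . \tag{15.52}$$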
This is the metric in Novikov Coordinates (Novikov, 1963). In terms of R, the relation (15.42) can also, using arccos(2x − 1) = 2 arccos(x^{1/2}) for 0 ≤ x ≤ 1, be written as

$$\frac{\tau(r,R)}{2m} = (R^2 + 1)\left[\frac{r}{2m} - \frac{(r/2m)^2}{R^2 + 1}\right]^{1/2} + (R^2 + 1)^{3/2}\arccos\left(\frac{r/2m}{R^2 + 1}\right)^{1/2} . \tag{15.54}$$

This is the expression commonly found and used in the literature.
2. The fact that r(τ, r_i) or r(τ, R) is only determined implicitly makes Novikov coordinates somewhat more awkward to use in practice than Lemaître coordinates.32 Nevertheless, this is compensated by their clear physical interpretation. In particular, they are useful e.g. in numerical simulations and other investigations of gravitational collapse, and the Novikov metric will indeed naturally arise in this context in our brief discussion of gravitational collapse in section 15.9.
32
See e.g. the appendix of J. Makela, A. Peltola, Thermodynamical Properties of Horizons,
arXiv:gr-qc/0205128 for somewhat more explicit expressions, including a Taylor expansion of r ′ .
3. Since the timelike geodesics that Novikov coordinates are based on oscillate in
finite proper time from r = 0 in the past through the maximal radius ri to r = 0
in the future, Novikov coordinates cover both the past and future extensions of the
metric in the Schwarzschild patch discovered in terms of PG coordinates in section
15.1, and to be discussed again below in section 15.3. The reflection symmetry
R → −R of the metric also exchanges the Schwarzschild patch and its “mirror
region” (first encountered in the context of isotropic coordinates at the end of
section 12.5, and to be discussed in detail in section 15.6), and thus Novikov
coordinates actually turn out to provide a complete covering of the fully extended
Kruskal-Schwarzschild space-time.33
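The Novikov relations (15.40), (15.42) and (15.54) can be cross-checked against one another numerically. The sketch below uses m = 1 by default, a sample maximal radius r_i = 7m, and function names of our own. It evaluates the cycloid (15.40) and compares τ from the two explicit expressions on the branch 0 ≤ η ≤ π. It then checks that the radial velocity satisfies ṙ² = 2m/r − 2m/r_i, which is ṙ² = f_i − f from (15.48).

```python
import math


def novikov_cycloid(eta, ri, m=1.0):
    """r(eta) and tau(eta) of (15.40) for a radial geodesic with maximal radius ri."""
    r = 0.5 * ri * (1 + math.cos(eta))
    tau = math.sqrt(ri**3 / (8 * m)) * (eta + math.sin(eta))
    return r, tau


def tau_from_r_ri(r, ri, m=1.0):
    """tau(r, ri) of (15.42), valid on the branch 0 <= eta <= pi."""
    x = r / ri
    return math.sqrt(ri**3 / (8 * m)) * (math.acos(2 * x - 1) + 2 * math.sqrt(x - x**2))


def tau_from_r_R(r, R, m=1.0):
    """The same tau written in terms of R = (ri/2m - 1)^(1/2), as in (15.54)."""
    y, k = r / (2 * m), R**2 + 1
    return 2 * m * (k * math.sqrt(y - y**2 / k) + k**1.5 * math.acos(math.sqrt(y / k)))


if __name__ == "__main__":
    m, ri = 1.0, 7.0
    R = math.sqrt(ri / (2 * m) - 1)
    for eta in (0.2, 1.0, 2.0, 2.8):
        r, tau = novikov_cycloid(eta, ri, m)
        print(eta, tau, tau_from_r_ri(r, ri, m), tau_from_r_R(r, R, m))
```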
u = t − r∗ , v = t + r∗ . (15.55)
Then infalling radial null geodesics (dr ∗ /dt = −1) are characterised by v = const. and
outgoing radial null geodesics by u = const.
Even though the metric coefficient g_uu or g_vv vanishes at r = 2m, as for the PG coordinates of section 15.1 there is no real degeneracy, the two-dimensional metric in the (v, r)- or (u, r)-directions having the completely non-singular and non-degenerate form

$$g = \begin{pmatrix} -(1 - 2m/r) & \pm 1 \\ \pm 1 & 0 \end{pmatrix} \tag{15.57}$$
To determine the lightcones in the Eddington-Finkelstein coordinates we again look at
radial null geodesics which this time are solutions to
$$x^\alpha(\lambda) = (r = r_0 - \lambda,\; v = v_0,\; \theta = \theta_0,\; \phi = \phi_0) \tag{15.60}$$
are geodesics. The sign −λ has been chosen so that the tangent vector ẋ^α = dx^α/dλ is future-oriented, i.e. such that its scalar product with ∂_t = ∂_v is negative,

$$g_{\alpha\beta}\,\dot x^\alpha (\partial_t)^\beta = g_{\alpha v}\,\dot x^\alpha = \dot r < 0 . \tag{15.61}$$
Thus the radius indeed decreases along future-oriented ingoing null geodesics (as
the name was meant to suggest) and the radial coordinate is an affine parameter
along these geodesics.
That these curves are geodesics can be seen explicitly by calculating the acceleration of u^α = (1, 0, 0, 0),
Either way this shows that ∂_r is geodesic, as Γ^β_{rr} = 0 (since g_rr = 0, the only possible contribution to Γ^β_{rr} could have arisen from ∂_r g_vr, but g_vr = 1 is constant).
Thus the lightcones remain well-behaved (do not fold up) at r = 2m, the surface r = 2m is at a finite coordinate distance, namely (to reiterate the obvious) at r = 2m, and there is no problem with following geodesics beyond r = 2m.
But even though the lightcones do not fold up at r = 2m, something interesting is certainly happening there. Whereas, in a (v, r)-diagram (see Figure 16), one side of the lightcone always remains horizontal (at v = const.), the other side becomes vertical at r = 2m (dv/dr = ∞) and then tilts over to the other side. In particular, beyond r = 2m all future-directed paths, those within the forward lightcone, now have to move in the direction of decreasing r: clearly the ingoing null geodesics move towards smaller values of r, but so do those that for r > 2m were outgoing, since along them dr/dv = f(r)/2 < 0 for r < 2m.

Figure 16: (v, r)-diagram of the radial lightcones in ingoing Eddington-Finkelstein coordinates (axes v and r, with the lines v = const., r = 0 and r = 2m indicated).
There is thus no way to turn back to larger values of r, not on a geodesic but also not
on any other path (i.e. not even with a powerful rocket) once one has gone past r = 2m.
Thus, even though locally the physics at r = 2m is well behaved, globally the surface
r = 2m is very significant as it is a point of no return. Once one has passed the
Schwarzschild radius, there is no turning back. Such a surface is known as an event
horizon. Note that this is a null surface so, in particular, once one has reached the
event horizon one has to travel at the speed of light to stay there and not be forced
further towards r = 0.
In any case, we now encounter no difficulties when entering the region r < 2m, e.g. along lines of constant v, and this region should be included as part of the physical space-time. Note that because v = t + r* and r* → −∞ for r → 2m, we see that decreasing r along lines of constant v amounts to t → ∞. Thus the new region at r ≤ 2m we have discovered is in some sense a future extension of the original Schwarzschild space-time. Note also that nothing, absolutely nothing, no information, no light ray, no particle, can escape from the region behind the horizon. Thus we have a Black Hole, an object that is (classically) completely invisible from the outside.
Remarks:
1. One can of course check directly that the Eddington-Finkelstein metric (15.56) is a solution of the vacuum Einstein equations. One can also obtain the metric directly from integrating the Einstein vacuum equations if one starts not with the standard form of a static isotropic metric (12.5) but makes an ansatz of the form

$$ds^2 = -f(v,x)\,dv^2 + 2\,dv\,dx + r(v,x)^2\,d\Omega^2$$

in terms of two unknown functions f(v, x) and r(v, x) of two variables. The same arguments as in the discussion of ingoing null geodesics above show that this general form of the metric is characterised by the fact that the integral curves of the null vector ∂_x (i.e. the curves with constant (v, θ, φ)) are null geodesics, with affine parameter x,

$$g_{xx} = 0 , \qquad \nabla_{\partial_x}\partial_x = 0 . \tag{15.67}$$

As one can always choose to build a coordinate system in such a way, this is a valid a priori ansatz for the metric, and from this perspective one never encounters the question of whether the singularity at r = 2m is real or not, since it is not even a coordinate singularity in these coordinates.
2. In the above (v, r) coordinate system we can cross the event horizon only on future
directed paths, not on past directed ones. The situation is reversed when one uses
the coordinates (u, r) instead of (v, r). In that case, the lightcones in Figure 16
are flipped (either up-down or left-right), and one can pass through the horizon
on past directed curves. The new region of space-time covered by the coordinates
(u, r) is definitely different from the new region we uncovered using (v, r) even
though both of them lie ‘behind’ r = 2m. In fact, this one is a past extension
(beyond t = −∞) of the original Schwarzschild ‘patch’ of space-time. In this
patch, the region behind r = 2m acts like the opposite (time-reversal) of a black
hole (a white hole) which cannot be entered on any future-directed path.
3. As we will discuss later, in section 15.9, this white hole region is unphysical, i.e.
an artefact of the idealisation of an eternal black hole metric. The future black
hole region, however, is definitely of relevance as it can be created by gravitational
collapse.
Ordinary space-time diagrams are more familiar (and therefore more intuitive) than
space-null diagrams such as the above (r, v)-diagram (in which, for example, in the
asymptotically flat regime r → ∞ the lightcone has slopes 0 and 2 rather than the
usual ±45° slopes ±1). In the present case this can easily be rectified by introducing a
new time-coordinate t̃ instead of v by the relation
v = t + r ∗ = t̃ + r . (15.68)
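The metric looks perhaps somewhat less illuminating in the (t̃, r)-coordinates,

$$ds^2 = -f(r)\,d\tilde t^2 + \frac{4m}{r}\,d\tilde t\,dr + \Big(1 + \frac{2m}{r}\Big)dr^2 + r^2\,d\Omega^2 ,$$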
but the lightcones and the horizon now have the following simple and easy to visualise
description:
• one side of the lightcone has (as we already know) v = const., thus dt̃/dr = −1
everywhere, and is thus at −45° everywhere.
• the other side of the lightcone is described by dt̃/dr = +1 for r → ∞ (slope +1),
by dt̃/dr → ∞ for r → 2m, and dt̃/dr → −1 for r → 0.
• In particular, the horizon is (again) vertical in such a diagram, with the (would-
be) outgoing side of the lightcone tangent to the horizon, while the lightcone
degenerates as r → 0.
Nice diagrams that you can find in many places depicting the collapse of a spheri-
cally symmetric star to a black hole and the formation of the horizon typically (either
implicitly or explicitly) use these (t̃, r)-coordinates.
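Substituting v = t̃ + r into the radial part −f dv² + 2dv dr of the ingoing Eddington-Finkelstein metric gives −f dt̃² + (4m/r)dt̃ dr + (1 + 2m/r)dr². The sketch below (m = 1 by default; the function name is ours) solves the resulting quadratic for the two null slopes dt̃/dr and reproduces the behaviour listed above. One root is always −1. The other is (r + 2m)/(r − 2m), which tends to +1 as r → ∞, diverges at r = 2m, and tends to −1 as r → 0.

```python
import math


def null_slopes_tilde(r, m=1.0):
    """Sorted roots x = dt~/dr of -f x^2 + (4m/r) x + (1 + 2m/r) = 0, for r != 2m."""
    f = 1 - 2 * m / r
    a, b, c = -f, 4 * m / r, 1 + 2 * m / r
    # The discriminant b^2 - 4ac is identically 4 (up to rounding for very small r)
    disc = math.sqrt(b * b - 4 * a * c)
    return sorted(((-b + disc) / (2 * a), (-b - disc) / (2 * a)))


if __name__ == "__main__":
    for r in (1e4, 10.0, 2.01, 1.0, 0.01):  # m = 1
        print(r, null_slopes_tilde(r))
```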
because v = t̃ + r is a null coordinate for the Minkowski metric −dt̃2 + dr 2 + (. . .). This
is the Schwarzschild metric in what is known as Kerr-Schild form. There is also a corre-
sponding Kerr-Schild form of the metric in outgoing Eddington-Finkelstein coordinates,
namely (now with u = t − r ∗ = t̃ − r, so this is not the same t̃)
$$ds^2 = -d\tilde t^2 + dr^2 + r^2 d\Omega^2 + \frac{2m}{r}\,du(\tilde t, r)^2 . \tag{15.73}$$
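Metrics of the general form g_{αβ} = η_{αβ} + f(x)N_αN_β, with N_α a null covector with respect to the Minkowski metric η_{αβ}, are known as Kerr-Schild metrics. This ansatz for the metric played an important role in the search for other exact solutions of the Einstein equations.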
3. The contravariant components of N_α with respect to g_{αβ} are the same as those with respect to η_{αβ}:

$$g^{\alpha\beta}N_\beta = \eta^{\alpha\beta}N_\beta - f(x)\,N^\alpha N^\beta N_\beta = \eta^{\alpha\beta}N_\beta , \tag{15.78}$$
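Both (15.78) and the fact that the inverse of a Kerr-Schild metric is exactly g^{αβ} = η^{αβ} − f N^αN^β rely on N_α being null. The pure-Python sketch below checks this for η = diag(−1, 1, 1, 1) in Cartesian coordinates. The sample null covector N_α = (1, 0.6, 0.8, 0) and the value f = 0.4 (corresponding to 2m/r at r = 5m) are arbitrary illustrative choices, and the helper names are ours. A non-null N is included to show that the construction then fails.

```python
# Minkowski metric eta = diag(-1, 1, 1, 1); it is its own inverse.
ETA = [[(-1.0 if i == 0 else 1.0) if i == j else 0.0 for j in range(4)] for i in range(4)]


def raise_index(cov):
    """N^a = eta^{ab} N_b."""
    return [sum(ETA[a][b] * cov[b] for b in range(4)) for a in range(4)]


def kerr_schild_pair(f, N):
    """Return (g_ab, g^ab) with g_ab = eta_ab + f N_a N_b and g^ab = eta^ab - f N^a N^b."""
    N_up = raise_index(N)
    g_lower = [[ETA[a][b] + f * N[a] * N[b] for b in range(4)] for a in range(4)]
    g_upper = [[ETA[a][b] - f * N_up[a] * N_up[b] for b in range(4)] for a in range(4)]
    return g_lower, g_upper


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def max_deviation_from_identity(M):
    return max(abs(M[i][j] - (1.0 if i == j else 0.0)) for i in range(4) for j in range(4))


if __name__ == "__main__":
    f = 0.4
    null_N = [1.0, 0.6, 0.8, 0.0]      # eta^{ab} N_a N_b = -1 + 0.36 + 0.64 = 0
    timelike_N = [1.0, 0.0, 0.0, 0.0]  # not null: the construction should fail
    for N in (null_N, timelike_N):
        g_lower, g_upper = kerr_schild_pair(f, N)
        print(N, max_deviation_from_identity(matmul(g_lower, g_upper)))
```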
The first guess might be to use the coordinates u and v simultaneously, instead of r and t. In these coordinates, the metric takes the form

$$ds^2 = -f(r)\,du\,dv + r^2\,d\Omega^2 ,$$

with r = r(u, v). But while this is a good idea, the problem is that in these coordinates the horizon is once again infinitely far away, at u = +∞ or v = −∞ (i.e. at 2r* = v − u = −∞). We can rectify this by introducing coordinates U and V with

$$U = -e^{-u/4m} , \qquad V = e^{v/4m} ,$$

in terms of which the metric becomes
$$ds^2 = -\frac{32m^3}{r}\,e^{-r/2m}\,dU\,dV + r(U,V)^2\,d\Omega^2 , \tag{15.84}$$

with r = r(U, V) given implicitly by

$$UV = -e^{(v-u)/4m} = -e^{r^*/2m} = -(r/2m - 1)\,e^{r/2m} . \tag{15.85}$$
Finally, we pass from the null coordinates (U, V ) (meaning that ∂U and ∂V are null
vectors) to more familiar timelike and spacelike coordinates (T, X) defined, in analogy
with (u, v) = t ∓ r ∗ , by
U =T −X , V =T +X , (15.87)
Figure 17: The Schwarzschild patch in the Kruskal-Szekeres metric: the half-plane
r > 2m is mapped to the quadrant between the lines X = ±T in the Kruskal-Szekeres
metric.
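one finds that the coordinate transformation (t, r) → (T, X) is explicitly, and in its full glory, given by

$$T = (r/2m - 1)^{1/2}\,e^{r/4m}\sinh(t/4m) , \qquad X = (r/2m - 1)^{1/2}\,e^{r/4m}\cosh(t/4m) ,$$

so that X² − T² = (r/2m − 1)e^{r/2m}.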
Remarks:
1. In these coordinates, the original “Schwarzschild patch” r > 2m, the region of
validity of the Schwarzschild coordinates, corresponds to the region −∞ < U < 0,
0 < V < ∞, or, in terms of X and T , to the region X > 0 and X 2 − T 2 > 0, or
|T | < X. As Figure 17 shows, this ‘Schwarzschild patch’ is mapped to the first
quadrant of the Kruskal-Szekeres metric, bounded by the lines X = ±T .
3. In particular, the boundary of the Schwarzschild patch is the horizon r = 2m. In
terms of (U, V ) this is mapped to U V = 0 which is the union of the two lines (null
surfaces) U = 0 and V = 0, and in terms of (T, X) one has
r = 2m ⇒ X = ±T , (15.92)
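Explicitly, (15.85) can be solved for r as r/2m = 1 + W(−UV/e), where W is the (principal branch of the) Lambert W-function, defined by

$$x = W(x)\,e^{W(x)} . \tag{15.97}$$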
8. Note that the transformation (15.93) is strictly identical to the transformation
(2.102) between Minkowski and Rindler null coordinates, with the identification
a = 1/4m. This is much more profound than it sounds at first. In particular,
we will learn in section 15.7 that κ = 1/4m is the surface gravity of a black hole,
i.e. a measure of the gravitational acceleration at the horizon. It is therefore this
acceleration that plays the same role in the black hole context as the acceleration
called a in the Rindler context. Moreover, this identity is at the heart of the deep
relation between the so-called Unruh Effect (thermal nature of a Minkowski QFT
vacuum state when seen by an accelerating observer, already mentioned in section
5.6) and the Hawking Effect (quantum radiation of black holes).35
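By (15.85), UV = −(r/2m − 1)e^{r/2m}. Writing y = r/2m − 1 gives y e^y = −UV/e, so r(U, V) = 2m[1 + W(−UV/e)] with W the principal branch of the Lambert W-function (15.97). That branch is the relevant one because r > 0 means y > −1. The Python standard library has no Lambert W, so the sketch below implements it with Halley's iteration, started from a series expansion near the branch point x = −1/e. This is a generic numerical technique, not something taken from the text, and all names are ours. The sketch then checks the inversion, that the horizon UV = 0 maps to r = 2m, and that UV → 1 (i.e. T² − X² → 1) approaches r → 0, in agreement with (15.98).

```python
import math


def lambert_w0(x, tol=1e-15, max_iter=60):
    """Principal branch W_0(x), x >= -1/e, via Halley's iteration on w e^w - x = 0."""
    if x < -1 / math.e:
        raise ValueError("W_0(x) is real only for x >= -1/e")
    if x < 0:
        # Series about the branch point x = -1/e, where W = -1
        p = math.sqrt(max(0.0, 2 * (math.e * x + 1)))
        w = -1 + p - p * p / 3
    else:
        w = math.log1p(x)
    for _ in range(max_iter):
        if w == -1.0:  # exactly at the branch point Halley's denominator vanishes
            break
        ew = math.exp(w)
        residual = w * ew - x
        step = residual / (ew * (w + 1) - (w + 2) * residual / (2 * w + 2))
        w -= step
        if abs(step) <= tol * (1 + abs(w)):
            break
    return w


def kruskal_r(U, V, m=1.0):
    """Solve UV = -(r/2m - 1) exp(r/2m) for r > 0 (requires UV < 1)."""
    return 2 * m * (1 + lambert_w0(-U * V / math.e))


if __name__ == "__main__":
    for U, V in ((-0.5, 2.0), (0.0, 3.0), (0.3, 0.5), (0.99, 1.0)):  # m = 1
        print(U, V, kruskal_r(U, V))
```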
Now that we have the coordinates X and T , we can let them range over all the values
for which the metric is non-singular. The only remaining singularity is at r = 0, which
corresponds to the two sheets of the hyperboloid
$$r = 0 \;\Leftrightarrow\; T^2 - X^2 = 1 \;\Leftrightarrow\; T = \pm\sqrt{1 + X^2} . \tag{15.98}$$
By including them, we obtain the Kruskal diagram Figure 18. This extension of the
Schwarzschild metric was discovered independently by Kruskal and Szekeres in 1960 and
presents us with an amazingly rich and complex picture of what originally appeared to
be a rather simple (and perhaps even dull) solution to the Einstein equations. It can be
shown that this represents the maximal analytic extension of the Schwarzschild metric
in the sense that every affinely parametrised geodesic can either be continued to infinite
values of its parameter or runs into the singularity at r = 0 at some finite value of the
affine parameter.36
In addition to the Schwarzschild patch, quadrant I, we have three other regions, living in the quadrants II, III, and IV, each of them having its own peculiarities. Note that obviously the conversion formulae from (r, t) → (X, T) in the quadrants II, III and IV differ from those given above for quadrant I. E.g. in region II one can use Schwarzschild(-like) coordinates in which the metric reads

$$ds^2 = \Big(\frac{2m}{r} - 1\Big)dt^2 - \Big(\frac{2m}{r} - 1\Big)^{-1}dr^2 + r^2\,d\Omega^2 \tag{15.100}$$

(these are not the same coordinates as those in patch I, as we have seen we cannot continue the Schwarzschild coordinates across the horizon), and in this quadrant (where r is a time coordinate etc.) the relation between Schwarzschild and Kruskal coordinates is

$$T = (1 - r/2m)^{1/2}\,e^{r/4m}\cosh(t/4m) , \qquad X = (1 - r/2m)^{1/2}\,e^{r/4m}\sinh(t/4m) .$$
To get acquainted with the Kruskal diagram, let us note the following basic facts (some
of which we had already noted for region I in the previous section).
1. Null lines are diagonals X = ±T +const., just as in Minkowski space. This greatly
facilitates the exploration of the causal structure of the Kruskal-Szekeres metric.
3. Lines of constant r are hyperbolas. For r > 2m they fill the quadrants I and III,
for r < 2m the other regions II and IV.
5. Notice in particular also that in regions II and IV worldlines with r = const. are no longer timelike but spacelike. Including the transverse 2-sphere, this should be rephrased as either “lines of constant (r, θ, φ) are spacelike for r < 2m” or “the surfaces of constant r are spacelike for r < 2m”.

Figure 18: The complete Kruskal-Szekeres universe. Diagonal lines are null, lines of constant r are hyperbolas. Region I is the Schwarzschild patch, separated by the horizon from regions II and IV. The Eddington-Finkelstein coordinates (v, r) cover regions I and II, (u, r) cover regions I and IV. Regions I and III are filled with lines of constant r > 2m. They are causally disconnected. Observers in regions I and III can receive signals from region IV and send signals to region II. An observer in region IV can send signals into both regions I and III (and therefore also to region II) and must have emerged from the singularity at r = 0 at a finite proper time in the past. Any observer entering region II will be able to receive signals from regions I and III (and therefore also from IV) and will reach the singularity at r = 0 in finite time. Events occurring in region II cannot be observed in any of the other regions.

36 One can also show that any spherically symmetric solution of the Einstein equations is locally isometric to some domain of the Schwarzschild-Kruskal solution. For a proof of this generalised Birkhoff theorem see e.g. N. Straumann, General Relativity.
6. Lines of constant Schwarzschild time t are straight lines through the origin. E.g.
in region I one has X = (coth(t/4m))T , with the future horizon X = T corre-
sponding, as expected, to t → ∞.
7. The Eddington-Finkelstein coordinates (v, r) cover the regions I and II, the coor-
dinates (u, r) the regions I and IV.
Now let us see what all this tells us about the physics of the Kruskal-Szekeres metric.
• An observer in region I (the familiar patch) can send signals into region II and
receive signals from region IV.
• The same is true for an observer in the causally disconnected region III.
• Once an observer enters region II from, say, region I, he cannot escape from it
anymore and he will run into the catastrophic region r = 0 in finite proper time.
• As a reward for his or her foolishness, between having crossed the horizon and
being crushed to death, our observer will for the first time be able to receive
signals and meet observers emerging from the mirror world in region III.
• Finally, an observer in region IV must have emerged from the (past) singularity
at r = 0 a finite proper time in the past and can send signals and enter into either
of the regions I or III.
Indeed, it is easy to check from (15.85) and (15.86) that the time-translation symmetry (t, r) → (t + c, r) of the Schwarzschild patch corresponds to the transformation
$$(U, V) \to \big(e^{-c/4m}\,U ,\; e^{c/4m}\,V\big) . \tag{15.102}$$
This is a boost in the (U, V ) or (T, X)-plane which leaves the entire Kruskal metric
invariant since dU dV is invariant and r = r(U, V ) depends on U and V only via the
boost-invariant quantity U V (15.85). Thus this symmetry is generated by the Killing
vector
$$K = (V\partial_V - U\partial_U)/4m = (X\partial_T + T\partial_X)/4m . \tag{15.103}$$
It follows that K has norm proportional to T² − X²,
$$\|K\|^2 = \frac{2m}{r}\,e^{-r/2m}\,(T^2 - X^2) . \tag{15.104}$$
There are thus three different cases to consider, each of them interesting in its own right:
• K timelike
K is timelike in the original region I (and in the mirror region III); using (15.89) one finds
$$\text{I}: \quad \|K\|^2 = -\frac{2m}{r}\,e^{-r/2m}\Big(\frac{r}{2m} - 1\Big)e^{r/2m} = -\Big(1 - \frac{2m}{r}\Big) , \tag{15.105}$$
confirming that K = ∂t in region I. We thus recover the statement that the
Schwarzschild metric in the Schwarzschild patch is static.
• K spacelike
K is spacelike in region II (and likewise in region IV). Thus region II has no timelike Killing vector field and therefore cannot possibly be static; instead it has an additional spacelike Killing vector field (cf. Remark 1 in section 12.6).
Related to this is the fact, already mentioned above, that in regions II and IV the
slices of constant r are no longer timelike but spacelike surfaces. Thus they are
analogous to, say, constant t or T slices for r > 2m. Just as it does not make
sense to ask “where is the slice t = 1?” (say), only “when is t = 1?”, or “where
is r = 3m?”, in these regions it makes no sense to ask “where is r = m?”, only
“when is r = m?”.
• K null
K is null on the horizon. This turns out to be the most interesting case, and
therefore I will elaborate on this a bit in the following.
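The three cases can be checked numerically. The following is a minimal sketch, assuming the standard parametrisation of the Kruskal coordinates, T = (r/2m − 1)^{1/2} e^{r/4m} sinh(t/4m), X = (r/2m − 1)^{1/2} e^{r/4m} cosh(t/4m) in region I, with sinh and cosh interchanged (and 1 − r/2m under the root) in region II; this is consistent with X² − T² = (r/2m − 1)e^{r/2m} and with X = coth(t/4m) T in region I. The helper names are hypothetical. Evaluating (15.104) at sample points suggests that ||K||² = −(1 − 2m/r) on both sides of the horizon, so that K is timelike for r > 2m, null at r = 2m and spacelike for r < 2m.

```python
import math


def kruskal_TX(r, t, m=1.0):
    """Kruskal (T, X) of the Schwarzschild point (t, r), r != 2m.

    Assumed convention: X^2 - T^2 = (r/2m - 1) e^{r/2m}; region I for r > 2m
    (X > |T|), region II for r < 2m (T > |X|).
    """
    a = math.exp(r / (4 * m))
    if r > 2 * m:                                   # region I
        s = math.sqrt(r / (2 * m) - 1) * a
        return s * math.sinh(t / (4 * m)), s * math.cosh(t / (4 * m))
    s = math.sqrt(1 - r / (2 * m)) * a              # region II
    return s * math.cosh(t / (4 * m)), s * math.sinh(t / (4 * m))


def killing_norm(r, t, m=1.0):
    """||K||^2 = (2m/r) e^{-r/2m} (T^2 - X^2), eq. (15.104)."""
    T, X = kruskal_TX(r, t, m)
    return (2 * m / r) * math.exp(-r / (2 * m)) * (T**2 - X**2)


for r in (0.5, 1.5, 3.0, 10.0):                     # m = 1: two points in II, two in I
    print(r, killing_norm(r, t=0.7), -(1 - 2 / r))
```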
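On the branch U = 0 of the horizon one has K = V∂_V/4m, which is tangent to the horizon, and since V ∝ e^{v/4m} there, this is simply K = ∂_v. Therefore K generates translations along the horizon, v → v + c, and is called the null generator of (this branch of) the horizon. Thus v can naturally be used as a coordinate there. Evidently, K vanishes at the “point” U = V = 0 where the other horizon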
V = 0 branches off. Actually this is of course not a point but a perfectly respectable
2-sphere of radius r = 2m known as the bifurcation surface of the Killing horizon of the
Schwarzschild geometry, and K vanishing means that this 2-sphere is invariant under
the time-translation generated by K, something that is also evident from (15.102). On
U = 0 this branching point lies at V → 0 ⇒ v → −∞, so the Eddington-Finkelstein
coordinate v ∈ (−∞, +∞) only covers the half-line U = 0, V > 0 of the horizon.
On the other hand the line U = 0 is itself an “outgoing” (radial) null geodesic, but
in light of the above v cannot possibly be an affine parameter along that geodesic
(the affine parameter should not reach infinite values half-way along the geodesic). The
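failure of v to be an affine parameter on the horizon can be quantified by calculating the acceleration ∇_K K = ∇_{∂t}∂t (4.66) of the (integral curves of the) Killing vector and then taking the limit r → 2m. This calculation is quite painless in the ingoing Eddington-Finkelstein coordinates (v, r), with
$$ds^2 = -f(r)\,dv^2 + 2\,dv\,dr + r^2 d\Omega^2 ,$$
in which K = ∂_v everywhere (K is evidently a Killing vector because the components of the metric do not depend on v). For the same reason $\Gamma^\mu_{vv} = -\tfrac12 g^{\mu r}\partial_r g_{vv}$, i.e. Γ^v_{vv} = f'(r)/2 = m/r² and Γ^r_{vv} = f(r)f'(r)/2. Then the acceleration is
$$\nabla_K K = \Gamma^\mu_{vv}\,\partial_\mu = \frac{m}{r^2}\big(\partial_v + f(r)\,\partial_r\big) . \tag{15.108}$$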
Remarks:
1. Since we have ∇_K K ∼ K on the horizon, this shows first of all that there it generates a non-affinely parametrised geodesic (see (2.25) and (4.68)). Moreover, we see that one interpretation of the ubiquitous factor κ = 1/4m is that it measures the inaffinity, i.e. the failure of v to be an affine parameter on the horizon,
$$\lim_{r\to 2m} \nabla_K K = \kappa K . \tag{15.109}$$
2. Note, incidentally, that the above calculation also shows that v is an affine parameter on a null surface at r → ∞. Noting that v parametrises in-going light rays, this null surface is the surface of past null infinity, affectionately known as scri-minus (short for “script-I minus”) $\mathscr{I}^-$. Likewise, u is an affine parameter on future null infinity $\mathscr{I}^+$.37
37
Defining these objects properly requires significantly more care. For details, and much more, see
e.g. the classic text-book The large scale structure of space-time by S. Hawking and G. Ellis, or General
Relativity by R. Wald.
3. Another interpretation brought out by the same calculation (15.108) is that κ = 1/4m is the surface gravity, which measures the strength of the gravitational force (acceleration) a(r) (14.5) acting on a stationary observer at the horizon, but as measured at infinity (by taking into account the red-shift factor f(r)^{1/2}),
$$\kappa = \lim_{r\to 2m} f(r)^{1/2}\,a(r) = \frac{1}{4m} . \tag{15.110}$$
Anyway, to return to the beginning of this story, we have seen that v is not an affine
parameter along the horizon. It turns out, however, that the Kruskal coordinate V is an
affine parameter there (and this is one way of understanding why Kruskal coordinates
are so natural for exploring the causal structure of the metric), meaning that the null
curve
$$x^\alpha(\lambda) = \big(U(\lambda), V(\lambda), \theta(\lambda), \phi(\lambda)\big) = (0, \lambda, \theta_0, \phi_0) \tag{15.111}$$
is an affinely parametrised null geodesic. This follows on the nose from (15.108) and
the result (2.27) of section 2.2. Noting that κ is constant (actually not just along the
geodesic but on the entire horizon, but the former is all we need), (2.27) with τ → λ
and σ → v becomes (dropping integration constants)
$$\frac{d\lambda}{dv} \sim e^{\kappa v} \quad\Rightarrow\quad \lambda(v) \sim \kappa^{-1} e^{\kappa v} \sim V . \tag{15.112}$$
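The inaffinity of v, and the affinity of λ ∼ V, can also be checked numerically. The following is a minimal sketch, assuming the ingoing Eddington-Finkelstein radial metric ds² = −f(r)dv² + 2dv dr (consistent with (15.114)); because the metric does not depend on v, the Christoffel symbols needed are Γ^μ_{vv} = −½ g^{μr}∂_r g_{vv}, evaluated here with a central finite difference. Along the horizon r = 2m the geodesic equation reduces to v'' + Γ^v_{vv} v'² = 0, and the sketch compares the parametrisation by v itself with λ = κ^{-1}e^{κv}. The function names are hypothetical.

```python
import math


def accel_of_Kv(r, m=1.0, h=1e-6):
    """(a^v, a^r) of nabla_K K for K = d_v in the radial metric ds^2 = -f dv^2 + 2 dv dr."""
    def f(s):
        return 1 - 2 * m / s

    step = h * r                                          # relative finite-difference step
    dr_gvv = -(f(r + step) - f(r - step)) / (2 * step)    # d_r g_vv, with g_vv = -f
    gamma_r_vv = -0.5 * dr_gvv                            # Gamma_{r vv}; Gamma_{v vv} = 0
    # The inverse of [[-f, 1], [1, 0]] is [[0, 1], [1, f]]: raise the index with g^{vr}, g^{rr}.
    return gamma_r_vv, f(r) * gamma_r_vv


def horizon_geodesic_residual(v_of, lam, m=1.0, h=1e-4):
    """v'' + Gamma^v_vv v'^2 at r = 2m for v = v_of(lam); it vanishes iff lam is affine."""
    vdot = (v_of(lam + h) - v_of(lam - h)) / (2 * h)
    vddot = (v_of(lam + h) - 2 * v_of(lam) + v_of(lam - h)) / h**2
    gamma_v_vv, _ = accel_of_Kv(2 * m, m)
    return vddot + gamma_v_vv * vdot**2


m = 1.0
kappa = 1 / (4 * m)
print(accel_of_Kv(2 * m, m))   # about (kappa, 0): nabla_K K = kappa K on the horizon
print(accel_of_Kv(1e6, m))     # about (0, 0): no inaffinity far away
print(horizon_geodesic_residual(lambda s: s, 1.0, m))                            # v itself: about kappa
print(horizon_geodesic_residual(lambda s: math.log(kappa * s) / kappa, 1.0, m))  # lambda ~ V: about 0
```

The finite difference is only there to make the "compute Γ from the metric" step explicit; with f' = 2m/r² known analytically one recovers (15.108) directly.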
Another way to see this, which provides some more insight, is to analyse the geodesic
Lagrangian and the conserved quantity associated to K in ingoing Eddington-Finkelstein
coordinates (one could also work in Kruskal coordinates, but nothing is gained by that).
The elementary steps in the calculation are the following:
• From the metric (15.56) we deduce that for radial null geodesics one has
$$\dot v\,\big(-f(r)\dot v + 2\dot r\big) = 0 , \tag{15.113}$$
where a dot denotes a derivative with respect to the affine parameter λ. The
geodesics with v = const describe ingoing null-geodesics and we are not interested
in these, so we have
$$-f(r)\dot v + 2\dot r = 0 , \tag{15.114}$$
and since u = t − r ∗ = v − 2r ∗ and dr ∗ /dr = f (r)−1 , this is equivalent to u̇ = 0
and, as anticipated, describes outgoing geodesics.
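• The conserved quantity associated with the Killing vector K = ∂_v is
$$f(r)\dot v - \dot r = E . \tag{15.115}$$
Combining this with (15.114), which says f(r)v̇ = 2ṙ, gives ṙ = E, and hence
$$r(\lambda) = r_s + E(\lambda - \lambda_0) . \tag{15.116}$$
For the null geodesic along the horizon we are ultimately interested in, we have r(λ) = r_s, i.e. E = 0, but we need to approach this with some care, so we keep the general solution for now. Analogously, for v we find the equation
$$f(r)\dot v = 2E \;\Rightarrow\; \dot v = \frac{2E\,r(\lambda)}{r(\lambda) - r_s} = 2E + \frac{2r_s}{\lambda - \lambda_0} . \tag{15.117}$$
We can now take the limit E → 0 with impunity, and are left with
$$\dot v = \frac{2r_s}{\lambda - \lambda_0} \;\Rightarrow\; v(\lambda) = 2r_s\ln(\lambda - \lambda_0) + \text{const.} \tag{15.118}$$
• The prefactor 2r_s = 4m is now precisely such that it cancels the factor 1/4m in the definition of the Kruskal coordinate V (15.80), so that
$$V \propto e^{v/4m} \propto \lambda - \lambda_0 . \tag{15.119}$$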
We now have two natural coordinates on the future horizon U = 0, V > 0 which we can
for instance use to measure the frequency of incoming waves. ∂t → ∂v measures what
is commonly called Killing frequency (this requires no further explanation since it is
associated to the Killing vector which generates time-translations in the Schwarzschild
wedge), and ∂V measures the so-called free fall frequency (since, more or less by con-
struction, a freely falling observer in the Schwarzschild geometry near r = 2m will see
approximately the Minkowski space-time metric ds2 ∼ −dU dV ). The exponential rela-
tion (15.80) between them reflects the exponential blue- or red-shift we first encountered
in section 14.4.
As a concluding remark to this section, I cannot resist mentioning that the notion
of surface gravity κ plays a crucial role in the analysis of the classical dynamics of
general black holes, and even more so in the semi-classical context, since it is directly
proportional to the temperature of the famous Hawking radiation of an evaporating
black hole,
$$T_H = \frac{\kappa}{2\pi} = \frac{1}{8\pi G_N M} . \tag{15.120}$$
For more details on this and other related advanced topics I am not able to cover or do
justice to here, I refer you to the references given in footnote 35 of section 15.5, as well
as to the superb Cambridge lecture notes on Black Holes.38
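To get a feeling for the size of (15.120), one can restore units: with m = G_N M/c² a length, κ = 1/4m is an inverse length and T_H = ħcκ/(2πk_B), which is ħc³/(8πG_N M k_B). The sketch below assumes CODATA 2018 values for ħ, c, G_N, k_B and a nominal solar mass of 1.989 × 10^30 kg; the function name is hypothetical. For one solar mass it gives roughly 6 × 10^{-8} K, and the 1/M scaling is explicit.

```python
import math

# CODATA 2018 values (SI); the solar mass is an assumed nominal value.
HBAR = 1.054571817e-34   # J s
C = 299792458.0          # m / s
G = 6.67430e-11          # m^3 kg^-1 s^-2
K_B = 1.380649e-23       # J / K
M_SUN = 1.989e30         # kg


def hawking_temperature(M):
    """T_H = kappa/2pi of (15.120) with hbar, c, k_B restored; M in kg, result in K."""
    m = G * M / C**2             # geometric mass (a length)
    kappa = 1 / (4 * m)          # surface gravity as an inverse length
    return HBAR * C * kappa / (2 * math.pi * K_B)


print(hawking_temperature(M_SUN))        # roughly 6e-8 K
print(hawking_temperature(10 * M_SUN))   # ten times smaller, since T_H ~ 1/M
```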
There is one more coordinate system for the Schwarzschild geometry that I want to
mention because it is quite remarkable and, equally remarkably, apparently not widely
38
P. Townsend, Black Holes, arXiv:gr-qc/9707012v1.
known or used. It was discovered by W. Israel in 1966, and rediscovered several times
since, most recently by T. Klösch and T. Strobl in a different particularly insightful
way.39 I will introduce these coordinates in a way that is complementary to those in
these articles (and that is also motivated by certain generalisations, that I will however
not discuss here).
Recall that the Kruskal coordinates were based on suitably combining the outgoing
and ingoing Eddington-Finkelstein coordinates. Now, more generally one frequently
encounters the situation that one knows a solution either in coordinates adapted to
ingoing null geodesics, or in coordinates adapted to outgoing null geodesics, but usually not both (and, given one, constructing the other is usually a hard problem that may have no simple analytical solution).
Thus, let us assume that we are given (or have found) the Schwarzschild metric in
outgoing Eddington-Finkelstein coordinates,
$$ds^2 = -f(r)\,du^2 - 2\,du\,dr + r^2 d\Omega^2 . \tag{15.121}$$
We are happy and proud of this, but we quickly realise that this cannot be the end of the
story because the above coordinates provide an incomplete covering of the space-time.
The simplest way to discover this is to study radial lightrays or null geodesics, governed
by the two equations
$$f(r)\dot u^2 + 2\dot u\dot r = 0 \tag{15.122}$$
(the null condition), and
$$f(r)\dot u + \dot r = E \tag{15.123}$$
(due to the u-translation invariance). One set of null geodesics is simply given by u̇ = 0, i.e. u = u_0 constant, and for these one has
$$r(\tau) = E\tau + r_0 . \tag{15.124}$$
For future-oriented null geodesics one needs E > 0, and therefore one has ṙ > 0. These
are the outgoing null geodesics to which the outgoing Eddington-Finkelstein coordinate
system is adapted. Here r0 is an integration constant which, by an affine transformation
(actually just a shift) of τ , we could e.g. without loss of generality choose to be r0 = 2m.
Then the solution describes the outgoing null geodesics that emerge from the past event
horizon at r = 2m for τ = 0.
The other set of (thus ingoing) null geodesics has u̇ ≠ 0 and is therefore governed by the equations
$$f(r)\dot u + 2\dot r = 0 \qquad\text{and}\qquad f(r)\dot u + \dot r = E . \tag{15.125}$$
39
W. Israel, New Interpretation of the extended Schwarzschild Manifold, Phys. Rev. 143 (1966) 1016-
1021; T. Klösch, T. Strobl, Explicit Global Coordinates for Schwarzschild and Reissner-Nordstroem,
arXiv:gr-qc/9507011. See also K. Lake, An explicit global covering of the Schwarzschild-Tangherlini
black holes, arXiv:gr-qc/0306073, and Maximally extended, explicit and regular coverings of the
Schwarzschild - de Sitter vacua in arbitrary dimension, arXiv:gr-qc/0507031 for some generalisations.
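Subtracting the two one finds ṙ = −E, and therefore (again with a convenient choice of integration constant or origin of τ), using u̇ = 2E/f(r) from the first equation,
$$r(\tau) = 2m - E\tau , \qquad u(\tau) = -4m\ln(-\tau) + 2E\tau + \text{const.} \tag{15.126}$$
so that
$$\tau \to 0^- \;\Rightarrow\; r \to 2m , \quad u \to \infty . \tag{15.127}$$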
Thus we discover that we reach u = +∞ in finite (affine) time, we run out of coordinate
space as the ingoing null geodesics approach r = 2m (and the coordinate u is evidently
not suitable for describing what happens beyond r = 2m, u = ∞).
Note that this locus is most certainly not the past event horizon (r = 2m, u = u0
finite), as we know that we can only cross that horizon along future-directed paths from
smaller to larger values of r (the white hole). Thus we have discovered a new barrier
(which also turns out to be a horizon, and which, with hindsight, we know is the future
event horizon), and we realise that the outgoing Eddington-Finkelstein coordinates do
not cover the whole space-time.
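The statement that u runs off to +∞ at a finite value of the affine parameter can be made concrete numerically. The following is a minimal sketch, assuming the solution r(τ) = 2m − Eτ of ṙ = −E, so that the first equation of (15.125) gives u̇ = 2E/f(r); it integrates this from τ = −4 towards τ = 0⁻ (with E = m = 1) and compares with the closed form u = −2r*(r) + const, r* = r + 2m ln(r/2m − 1), which just says that v = u + 2r* is constant along an ingoing ray. The helper names are hypothetical. τ stays finite while u grows like −4m ln(−τ).

```python
import math


def tortoise(r, m=1.0):
    """r* = r + 2m ln(r/2m - 1) for r > 2m, so that dr*/dr = 1/f(r)."""
    return r + 2 * m * math.log(r / (2 * m) - 1)


def udot(tau, E=1.0, m=1.0):
    """u' = 2E/f(r) along r(tau) = 2m - E tau (tau < 0)."""
    r = 2 * m - E * tau
    return 2 * E * r / (-E * tau)          # r - 2m = -E tau, written without cancellation


def integrate_u(tau0, tau1, E=1.0, m=1.0, n=2000):
    """Integrate u' from tau0 to tau1 (tau0 < tau1 < 0), starting at u = 0.

    Simpson's rule (what RK4 reduces to for a tau-only right-hand side) on a geometric
    grid, so that the steps shrink as u' ~ -4m/tau blows up near tau = 0^-.
    """
    q = (tau1 / tau0) ** (1.0 / n)
    u, tau = 0.0, tau0
    for _ in range(n):
        h = tau * q - tau                  # positive, since tau < 0 and 0 < q < 1
        u += h * (udot(tau, E, m) + 4 * udot(tau + h / 2, E, m) + udot(tau + h, E, m)) / 6
        tau += h
    return u


for tau1 in (-1e-2, -1e-4, -1e-6):         # E = m = 1, starting from r = 6 at tau0 = -4
    exact = -2 * (tortoise(2 - tau1) - tortoise(6.0))
    print(tau1, integrate_u(-4.0, tau1), exact)
```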
Now let us assume that we do not have the luxury of being able to appeal to ingoing
Eddington-Finkelstein coordinates to construct a future extension (and subsequently
the maximal Kruskal-Szekeres extension) of this space-time. How could we proceed?
One way is to trade u for the coordinate
$$x = -e^{-u/4m} , \tag{15.128}$$
so that the complete range of u, u ∈ (−∞, +∞), is covered as x runs over the interval x ∈ (−∞, 0). This clearly now permits us to continue the ingoing null geodesics beyond u = +∞. However, if one just replaces u → x, the metric appears to be singular at x = 0. This can be rectified by introducing a new coordinate y through the relation
r − 2m = xy , (15.129)
its range subject to the condition r > 0. To better understand the rationale for this
change of variables, note that as u → ∞ one has r − 2m ∼ τ ∼ x so that (r − 2m)/x
remains finite in the limit - and can therefore be used as a new coordinate, the one that
we have called y.
It is straightforward to see that in these coordinates the metric takes the form
$$ds^2 = \frac{8my^2}{xy + 2m}\,dx^2 + 8m\,dx\,dy + (xy + 2m)^2 d\Omega^2 . \tag{15.130}$$
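A quick way to convince oneself of (15.130) is to pull back the outgoing Eddington-Finkelstein metric numerically. The following is a minimal sketch, assuming the radial part ds² = −f du² − 2du dr of that metric (consistent with (15.122) and (15.123)) and the relations u = −4m log(−x), r = xy + 2m of (15.135), valid for x < 0. It forms J^T g J with the Jacobian J = ∂(u, r)/∂(x, y) and compares with the (x, y) block of (15.130); the function names are hypothetical.

```python
import math


def pullback_to_israel(x, y, m=1.0):
    """J^T g J for g = [[-f, -1], [-1, 0]] in (u, r), with u = -4m log(-x), r = xy + 2m."""
    r = x * y + 2 * m
    f = 1 - 2 * m / r
    g = [[-f, -1.0], [-1.0, 0.0]]
    J = [[-4 * m / x, 0.0],   # du/dx, du/dy
         [y, x]]              # dr/dx, dr/dy
    return [[sum(J[a][i] * g[a][b] * J[b][j] for a in range(2) for b in range(2))
             for j in range(2)] for i in range(2)]


def israel_metric(x, y, m=1.0):
    """(x, y) block of (15.130); the cross term 8m dx dy gives g_xy = 4m."""
    return [[8 * m * y**2 / (x * y + 2 * m), 4 * m], [4 * m, 0.0]]


for x, y in [(-0.5, -1.0), (-0.5, 1.0), (-2.0, 0.3)]:   # region I and white-hole points
    print(pullback_to_israel(x, y), israel_metric(x, y))
```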
This is the Schwarzschild metric in Israel coordinates. Before discussing some of its most important properties, note that by the simple scaling u_I = 4mx, so that
$$u_I = -4m\,e^{-u/4m} \tag{15.131}$$
is just the “canonically normalised” Kruskal coordinate introduced in (15.93), and the relabelling y → v_I, one can put the metric into the somewhat more common “canonically normalised” form
$$ds^2 = \frac{v_I^2}{2m\big((u_I/4m)v_I + 2m\big)}\,du_I^2 + 2\,du_I\,dv_I + \big((u_I/4m)v_I + 2m\big)^2 d\Omega^2 . \tag{15.132}$$
Here are some of the key-properties of this metric (to describe these I will continue to
use the coordinates (x, y), i.e. the form (15.130) of the metric):
1. First of all, one sees that the metric is completely non-singular as long as xy >
−2m, and one can therefore let x and y run over all the values for which this
condition is satisfied.
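2. The locus r = 2m consists of the two null lines x = 0 and y = 0,
$$r = 2m \;\Leftrightarrow\; xy = 0 : \quad \{x = 0\} \cup \{y = 0\} . \tag{15.133}$$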
3. Then one sees immediately that this space-time covers four distinct “patches”:
• For x < 0 and y < 0 one has r > 2m: this is the original Schwarzschild patch.
• There are two regions for which x and y have opposite sign, subject to the
condition xy > −2m: these are the white hole region behind the past hori-
zon (x < 0, y > 0), covered by the original outgoing Eddington-Finkelstein
coordinates, and the new region beyond the future event horizon with x >
0, y < 0.
• Finally, for x > 0 and y > 0 one also has r > 2m: this is a distinct region
isometric to the Schwarzschild patch, the mirror region.
4. Thus the Israel coordinates provide not only a future extension of the Schwarz-
schild metric in Schwarzschild or outgoing Eddington-Finkelstein coordinates but
actually a complete covering of the maximal Kruskal-Szekeres extension of the
Schwarzschild space-time.
5. The usual Kruskal-Szekeres coordinates (U, V) are related to the Israel coordinates by
$$U \sim x , \qquad V \sim y\,e^{xy/2m + 1} . \tag{15.134}$$
6. The main difference to (and advantage compared with) Kruskal-Szekeres coordi-
nates is that for the Israel coordinates the coordinate transformation to Schwarz-
schild coordinates and its inverse are completely explicit (whereas the radial co-
ordinate r is only given implicitly in terms of the Kruskal-Szekeres coordinates).
E.g. for x < 0 one has
$$\begin{cases} u = -4m\log(-x) \\ r = xy + 2m \end{cases} \quad\Leftrightarrow\quad \begin{cases} x = -e^{-u/4m} \\ y = -(r - 2m)\,e^{u/4m} \end{cases} \tag{15.135}$$
7. By inspection, the metric depends only on products like xy, dx dy etc. Thus it has the isometry
$$(x, y) \to (\lambda x, \lambda^{-1} y) \tag{15.136}$$
which corresponds (and reduces) to the time-translation symmetry in the Schwarzschild patch, generated by
$$K = (y\partial_y - x\partial_x)/4m . \tag{15.137}$$
• One has the curves with x (as well as, of course, (θ, φ)) constant, so that these
are straight lines in an (x, y)-diagram. For these curves, (15.139) implies that
ẏ is constant, so that y is an affine parameter along these null geodesics.
• The other null geodesics are given by the relation
$$\frac{y^2}{xy + 2m}\,\dot x + \dot y = 0 . \tag{15.140}$$
Multiplying by x and subtracting twice that from (15.139), one finds
Using this to eliminate x and ẋ from (15.140), the equation of motion for y
reduces to
or cτ = −2m log(y(τ )/y0 ). Substituting this in (15.141), one finds that these
radial null geodesics are given by the curves
Armed with this information, it is straightforward to draw an Israel analogue of the Kruskal diagram, containing an equivalent amount of information. All in all, Israel coordinates are an attractive alternative to Kruskal-Szekeres coordinates that is easy to construct and to understand.
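The explicit transformations (15.135), the classification of the four patches in item 3 and the isometry (15.136) are easy to exercise in code. The following is a minimal sketch with hypothetical function names; the region labels follow Figure 18 (II the black hole region beyond the future horizon, IV the white hole region). It checks a round trip (u, r) → (x, y) → (u, r) and that u → u + c acts as (x, y) → (λx, λ^{-1}y) with λ = e^{−c/4m}.

```python
import math


def israel_from_outgoing_ef(u, r, m=1.0):
    """(15.135), right-hand side: (u, r) -> (x, y) on the patch x < 0."""
    return -math.exp(-u / (4 * m)), -(r - 2 * m) * math.exp(u / (4 * m))


def outgoing_ef_from_israel(x, y, m=1.0):
    """(15.135), left-hand side: (x, y) -> (u, r), valid for x < 0."""
    return -4 * m * math.log(-x), x * y + 2 * m


def region(x, y, m=1.0):
    """Label of the patch containing (x, y), following item 3 and Figure 18."""
    if x * y <= -2 * m:
        raise ValueError("not part of the space-time (r <= 0)")
    if x == 0 or y == 0:
        return "horizon r = 2m"
    if x < 0:
        return "I" if y < 0 else "IV (white hole)"
    return "II (black hole)" if y < 0 else "III (mirror region)"


x, y = israel_from_outgoing_ef(1.3, 5.0)
print(x, y, region(x, y), outgoing_ef_from_israel(x, y))   # round trip back to (1.3, 5.0)

c = 0.7                                                    # a time translation u -> u + c
x2, y2 = israel_from_outgoing_ef(1.3 + c, 5.0)
print(x2 / x, y2 / y, math.exp(-c / 4))                    # lambda and 1/lambda, with m = 1
```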
Now you may well wonder if all this talk about white holes and mirror regions is for
real or just science fiction. Clearly, if an object with r0 < 2m exists and is described by
the Schwarzschild solution, then we will have to accept the conclusions of the previous
section. However, this requires the existence of an eternal black hole (in particular,
eternal in the past) in an asymptotically flat space-time, and this is not very realistic.
While black holes are believed to exist, they are believed to form as a consequence of
the gravitational collapse of a star whose nuclear fuel has been exhausted (and which
is so massive that it cannot settle into a less singular final state like a White Dwarf or Neutron Star).
To see how we could picture the situation of gravitational collapse (without trying to
understand why this collapse occurs in the first place), let us estimate the average
density ρ of a star whose radius r0 is equal to its Schwarzschild radius. For a star with
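mass M we have
$$r_s = \frac{2MG_N}{c^2} \qquad\text{and}\qquad M \approx \frac{4\pi r_0^3}{3}\,\rho . \tag{15.144}$$
Therefore, setting r_0 = r_s, we find that
$$\rho = \frac{3c^6}{32\pi G_N^3 M^2} \approx 2\times 10^{16}\,\Big(\frac{M_{sun}}{M}\Big)^2\ \text{g/cm}^3 . \tag{15.145}$$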
For stars of a few solar masses, this density is huge, roughly that of nuclear matter. In that case, there will be strong non-gravitational forces and hydrodynamic processes, significantly complicating the description of the situation. The situation is quite simple, however, when an object of the mass and size of a galaxy (M ∼ 10^{10} M_sun) collapses. Then the critical density (15.145) is approximately that of air, ρ ∼ 10^{-3} g/cm³, non-gravitational forces can be neglected completely, and the collapse of the object can be approximated by a free fall. The Schwarzschild radius of such an object is of the order of light-days (∼ 10^5 s).
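The estimates (15.144) and (15.145) are easy to reproduce. The sketch below assumes nominal CGS values G_N ≈ 6.674 × 10^{-8} cm³ g^{-1} s^{-2}, c ≈ 2.998 × 10^{10} cm/s and M_sun ≈ 1.989 × 10^{33} g, and uses hypothetical function names. It gives about 1.8 × 10^{16} g/cm³ for one solar mass, a density of order 10^{-4} g/cm³ for 10^{10} solar masses, and a light-crossing time r_s/c of order 10^5 s in the latter case.

```python
import math

# CGS values; the solar mass is an assumed nominal value.
G_N = 6.674e-8        # cm^3 g^-1 s^-2
C = 2.998e10          # cm / s
M_SUN = 1.989e33      # g


def schwarzschild_radius(M):
    """r_s = 2 G_N M / c^2 in cm, eq. (15.144)."""
    return 2 * G_N * M / C**2


def critical_density(M):
    """Mean density (g/cm^3) of a ball of mass M with r_0 = r_s, eq. (15.145)."""
    return 3 * C**6 / (32 * math.pi * G_N**3 * M**2)


for M in (M_SUN, 3 * M_SUN, 1e10 * M_SUN):
    rs = schwarzschild_radius(M)
    print(f"M = {M / M_SUN:.0e} M_sun: rho = {critical_density(M):.1e} g/cm^3, "
          f"r_s = {rs:.2e} cm, r_s/c = {rs / C:.1e} s")
```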
[Figure 19: Kruskal diagram with axis T; labels r = 0, r = 2m (t = ∞), surface of the star, interior of the star, r = const.]
Figure 19: The Kruskal diagram of a gravitational collapse. The surface of the star
is represented by a timelike geodesic, modelling a star (or galaxy) in free fall under
its own gravitational force. The surface will reach the singularity at r = 0 in finite
proper time whereas an outside observer will never even see the star collapse beyond its
Schwarzschild radius. However, as discussed in the text, even for an outside observer
the resulting object is practically ‘black’.
the crossing of the Schwarzschild radius by the surface of the star, it will evidently be
more informative to use the parametrisation R0 = R0 (τ ).
Neglecting radiation-effects, the mass M of the star (galaxy) remains constant so that
the exterior of the star, r > R0 (τ ), is described by the corresponding subset of region
I, and subsequently (once R0 (τ ) < 2m) also region II, of the Kruskal-Szekeres metric.
Note that regions III and IV no longer exist because the region r < R_0(τ) is simply not at all described by the Schwarzschild solution, but should be described by a solution of the
Einstein equations appropriate for the interior of the star (in particular, this better be
a solution of the non-vacuum Einstein equations, and we will describe one such solution
in section 23).
To model this, we start with the Schwarzschild metric in the region outside the star.
For t ≤ 0 this is the region r > r0 , while for t > 0 this is the region r > R0 (t) where
R0 (t) is the radius of the star at time t, and R0 (t) describes the radial free fall (geodesic
motion) of the points on the surface towards the center discussed in section 14.2. By
continuity of the metric, the space-time metric induced by the exterior metric on the
(2+1)-dimensional worldvolume Σ_ext of the surface of the star is then given by
$$\begin{aligned} ds^2_{\Sigma_{ext}} &= \big[-f(r)\,dt^2 + f(r)^{-1}dr^2 + r^2 d\Omega^2\big]_{r=R_0(t)} \\ &= -\big[f(R_0(t)) - f(R_0(t))^{-1}(dR_0/dt)^2\big]\,dt^2 + R_0(t)^2 d\Omega^2 \end{aligned} \tag{15.146}$$
(the subscript “ext” on Σext is used to indicate that this is the metric induced on the
surface of the star by the exterior metric, i.e. the metric outside the star). Expressed
in terms of proper time τ , this becomes (writing now R0 = R0 (τ ), and ṫ = dt/dτ, Ṙ0 =
dR0 /dτ , as usual)
$$ds^2_{\Sigma_{ext}} = -\big[f(R_0(\tau))\,\dot t^2 - f(R_0(\tau))^{-1}\dot R_0^2\big]\,d\tau^2 + R_0(\tau)^2 d\Omega^2 . \tag{15.147}$$
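Because (t(τ), R_0(τ)) parametrise radial timelike geodesics (for each value of the angular coordinates (θ, φ)) and τ is proper time along them, one has f(R_0)ṫ² − f(R_0)^{-1}Ṙ_0² = 1, and hence
$$ds^2_{\Sigma_{ext}} = -d\tau^2 + R_0(\tau)^2 d\Omega^2 ,$$
where R_0(τ) is the radial free fall of section 14.2, which can be given in the parametric (cycloid) form
$$R_0(\eta) = \tfrac12 r_0 (1 + \cos\eta) , \qquad \tau(\eta) = \Big(\frac{r_0^3}{8m}\Big)^{1/2}(\eta + \sin\eta) . \tag{15.150}$$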
Remarks:
1. This simple form of the metric is due to the fact that the radially falling particles
remain at fixed values of the angular coordinates (so these are again comoving
coordinates), and that τ is the corresponding proper time, so that one has ds2 =
−dτ 2 .
Indeed, we can think of the induced metric ds2Σext as the restriction of the Novikov
metric (15.43)
2. From (15.150) one sees that R0 takes its initial (maximal) value R0 = r0 at η = 0
or τ = 0, will inevitably cross the Schwarzschild radius of the star at some finite
value of τ , and will reach r = 0 at η = π after the finite proper time
$$\tau_{R_0\to 0} = \Big(\frac{r_0^3}{8m}\Big)^{1/2}(\pi + \sin\pi) = \pi\Big(\frac{r_0^3}{8m}\Big)^{1/2} . \tag{15.153}$$
For an object the size of the sun (for which our free-fall approximation is, however,
not really adequate) this would be of the order of one hour, and correspondingly
somewhat larger for larger, more massive and less dense, objects.
3. For an observer remaining outside the collapsing star, say at the constant value
r = r∞ , in principle the situation (not unexpectedly by now) presents itself in
a rather different way. Up to a constant factor (1 − 2m/r∞ )1/2 , his proper time
equals the coordinate time t. As the surface of the collapsing galaxy crosses the
horizon at t = ∞, strictly speaking the outside observer will never see the black
hole form.
However, we had also seen that this period is accompanied by an infinite and exponentially growing gravitational red-shift (14.37), z ∼ e^{t/4m} for radially emitted photons. Therefore the luminosity L of the star decreases exponentially, as a consequence of this gravitational red-shift and the fact that photons emitted at equal time intervals from the surface of the star reach the observer at greater and greater time intervals. It can be shown that
$$L \sim e^{-t/3\sqrt{3}m} , \tag{15.154}$$
so that the star becomes very dark very quickly, the characteristic time being of the order of
$$3\sqrt{3}\,m \approx 2.5\times 10^{-5}\,\frac{M}{M_{sun}}\ \text{s} . \tag{15.155}$$
Thus, even though for an outside observer the collapsing star never disappears
completely, for all practical intents and purposes the star is black and the name
‘black hole’ is justified.
4. Since only regions I and II of the Kruskal diagram are relevant for gravitational
collapse, and for black holes arising from gravitational collapse, for most prac-
tical purposes Kruskal-Szekeres coordinates are not required and it is sufficient
to consider coordinates that cover these two regions, such as Painlevé-Gullstrand
(section 15.1) or Eddington-Finkelstein coordinates (section 15.3).
5. Note that, even if the free fall (geodesic) approximation is no longer justified at
some point, once the surface of the star has crossed the Schwarzschild horizon,
nothing, no amount of pressure, can stop the catastrophic collapse to r = 0 because, whatever happens, points on the surface of the star will have to move within their forward lightcone and will therefore inevitably end up at r = 0 (and
since timelike geodesics maximise proper time, any non-geodesic attempt to avoid
hitting r = 0 will only get you there even quicker . . . ).
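The time scales in Remarks 2 and 3 can be evaluated directly. The sketch below assumes nominal SI values G ≈ 6.674 × 10^{-11}, c ≈ 2.998 × 10^8 m/s, M_sun ≈ 1.989 × 10^30 kg and R_sun ≈ 6.957 × 10^8 m, uses hypothetical function names, and (as noted in Remark 2) only evaluates the free-fall formulas; it does not claim the sun actually collapses this way. It evaluates (15.153), the proper time at which the cycloid (15.150) crosses R_0 = 2m (from cos η = 4m/r_0 − 1), and the dimming time 3√3 m of (15.155).

```python
import math

G = 6.674e-11        # m^3 kg^-1 s^-2
C = 2.998e8          # m / s
M_SUN = 1.989e30     # kg   (assumed nominal values)
R_SUN = 6.957e8      # m


def free_fall(eta, r0, m):
    """Cycloid (15.150): (R_0, tau) at parameter eta, both in m (divide tau by c for s)."""
    return 0.5 * r0 * (1 + math.cos(eta)), math.sqrt(r0**3 / (8 * m)) * (eta + math.sin(eta))


def collapse_time(M, r0):
    """Proper time (15.153) from R_0 = r0 to R_0 = 0, in seconds."""
    m = G * M / C**2
    return math.pi * math.sqrt(r0**3 / (8 * m)) / C


def horizon_crossing(M, r0):
    """Proper time (in s) at which R_0 = 2m, from cos(eta) = 4m/r0 - 1."""
    m = G * M / C**2
    eta = math.acos(4 * m / r0 - 1)
    return free_fall(eta, r0, m)[1] / C


def dimming_time(M):
    """Characteristic time 3 sqrt(3) m of (15.154), in seconds."""
    return 3 * math.sqrt(3) * G * M / C**3


print(collapse_time(M_SUN, R_SUN) / 60)                               # about half an hour
print(collapse_time(M_SUN, R_SUN) - horizon_crossing(M_SUN, R_SUN))   # horizon to r = 0, in s
print(dimming_time(M_SUN))                                            # about 2.5e-5 s
```

The second number illustrates how short the final plunge from R_0 = 2m to r = 0 is in proper time compared with the whole collapse.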
It is fair to wonder at this point if the above conclusions regarding the collapse to r = 0
are only a consequence of the fact that we assumed exact spherical symmetry. Would the
singularity be avoided under more general conditions? The answer to this is, somewhat
surprisingly and shockingly, a clear ‘no’.
There are very general singularity theorems, due to Penrose, Hawking and others, which
all state in one way or another that if Einstein’s equations hold, the energy-momentum
tensor satisfies some kind of positivity condition, and there is a regular event horizon,
then some kind of singularity will appear. These theorems do not rely on any symmetry
assumptions.40
It has also been shown that the gravitational field of a static vacuum black hole, even
without further symmetry assumptions, is necessarily given by the spherically symmetric
Schwarzschild metric and is thus characterised by the single parameter M (Israel, 1967).
This was the first of a series of remarkable black hole uniqueness (or “no hair”) theorems
which I will briefly come back to in section 15.10 below.
40
See e.g. S. Hawking and G. Ellis, The large scale structure of space-time and R. Wald, Chapter
9 of General Relativity for textbook accounts, chapter 1 of S. Hawking, The Nature of Space and
Time, arXiv:hep-th/9409195 for an introduction, and J. Senovilla, Singularity Theorems in General
Relativity: Achievements and Open Questions, arXiv:physics/0605007v1 [physics.hist-ph] for an
overview.
In this sense, therefore, singularities appear to be unavoidable in classical general rela-
tivity, and the theory predicts and points to its own incompleteness (“it’s singular” can
hardly be considered to be a satisfactory answer . . . ).
While the Schwarzschild solution that we have discussed at length above is not the only
black hole solution of the Einstein equations, in 4 space-time dimensions the possibilities are remarkably restricted. In particular, it can e.g. be shown that the most general
stationary black hole solution of the 4-dimensional Einstein-Maxwell equations (asymp-
totically flat, with a regular event horizon) is characterised by just three parameters,
namely its mass M , charge Q and angular momentum J.
This is generally referred to as the fact that black holes have no hair, or as the no-
hair theorems.41 In terms of gravitational collapse, these theorems roughly amount
to the statement that the only characteristics of a black hole which are not somehow
radiated away during the phase of collapse via multipole moments of the gravitational,
electro-magnetic, . . . fields are those which are protected by some conservation laws.
The two most important examples of black hole solutions generalising the Schwarz-
schild metric, 2-parameter subfamilies of the complete 3-parameter family of black hole
metrics, are the following:
the solution was found by Kerr only in 1963, almost fifty years after the Schwarz-
schild and Reissner-Nordstrøm solutions, through monumental calculations.42
The Kerr metric describing a rotating black hole depends on two parameters m
and a related to its mass M and angular momentum J respectively. In Boyer-
Lindquist coordinates (one of the coordinate systems in which the metric looks
“simplest”, relatively speaking), the metric takes the form
$$ds^2 = -f(r)\,dt^2 + f(r)^{-1}dr^2 + r^2 d\Omega^2 , \qquad f(r) = 1 - \frac{2m}{r} - \frac{\Lambda r^2}{3} \tag{15.159}$$
is the unique spherically symmetric solution of the Einstein vacuum equations
with a cosmological constant Λ,
It is also known as the Schwarzschild - de Sitter metric for Λ > 0 and the Schwarz-
schild - anti-de Sitter metric for Λ < 0. This solution is not asymptotically flat
but asymptotically (A)dS, i.e. asymptotic to pure de Sitter or anti-de Sitter space,
which is the maximally symmetric solution of the Einstein equation with a positive
(negative) cosmological constant - see section 24 for a detailed discussion of these
space-times. In particular, we will derive the solution (15.159) in section 24.2 (see
equation (24.44)).
42
See e.g. R. Kerr, Discovering the Kerr and Kerr-Schild metrics, arXiv:0706.1109 [gr-qc]; M.
Visser, The Kerr spacetime: A brief introduction arXiv:0706.0622 [gr-qc].
Unsurprisingly, one can also add charge to this solution,
$$f(r) = 1 - \frac{2m}{r} - \frac{\Lambda r^2}{3} + \frac{q^2}{r^2} , \tag{15.161}$$
to find an exact charged black hole solution of the Einstein-Maxwell equations
with a cosmological constant.
2. Remarkably, and more surprisingly, for Λ < 0 one can replace the “1” in f (r) by
a constant k = 0, ±1,
$$f_k(r) = k - \frac{2m}{r} - \frac{\Lambda r^2}{3} \tag{15.162}$$
(formally this also works for Λ > 0, but since fk (r) is strictly negative in that
case, this requires some reinterpretation . . . ), provided that one also replaces the
2-sphere by R² or T² for k = 0, and by the 2-dimensional hyperboloid H² for k = −1,
$$ds^2 = -f_k(r)\,dt^2 + f_k(r)^{-1}dr^2 + r^2 d\Omega^2_{(k)} , \qquad d\Omega^2_{(k)} = \begin{cases} d\Omega_2^2 & \text{for } k = +1 \\ d\vec{x}^{\,2} & \text{for } k = 0 \\ d\tilde\Omega_2^2 & \text{for } k = -1 \end{cases} \tag{15.163}$$
(dΩ̃²₂ denotes the line element of the standard metric on H²). These solutions
describe black holes immersed into AdS space, with horizons with a non-spherical
topology. Therefore such solutions are also, somewhat confusingly, known as topo-
logical black holes.43 A special case of (15.163) are the metrics (24.81) one obtains
for m = 0, which describe pure AdS space (no black hole) in different coordinate
systems.
3. A curious class of solutions are so-called regular black hole solutions, solutions
with event horizons but without singularities. Due to the singularity theorems
of general relativity, mentioned at the end of section 15.9, which are typically
of the form “under some reasonable assumptions, if there is something like an
event horizon, there must be something like a singularity”, such solutions need to
walk a fine line between avoiding the singularity theorems and not being outright
unphysical. Usually this is achieved by some weak violation of the (occasionally
unreasonably strong) positive energy conditions entering the singularity theorems.
One of the earliest and simplest solutions of this kind, which is also of the standard
simple f − f −1 -form, is the Bardeen solution, a solution of the Einstein-Maxwell
equations with metric function
$$f(r) = 1 - \frac{2mr^2}{(r^2 + e^2)^{3/2}} . \tag{15.164}$$
For suitable choice of the mass and charge parameters m and e, f (r) possesses
simple zeros (the largest zero corresponding to an event horizon). It approaches
43
See R. Mann, Topological Black Holes – Outside Looking In, arXiv:gr-qc/9709039 for a general
discussion of these 4-dimensional topological black holes.
the Schwarzschild metric for large r,
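$$r \to \infty : \quad f(r) \to 1 - \frac{2m}{r} \tag{15.165}$$
but for small r it approaches the de Sitter metric (in the form of the metric (15.159) with m = 0),
$$r \to 0 : \quad f(r) \to 1 - \frac{2m}{e^3}\,r^2 ,$$
i.e. a de Sitter core with effective cosmological constant Λ = 6m/e³.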
The (standard) black hole solutions are also easily generalised to higher dimensions,
but in addition to that higher dimensions surprisingly offer many more possibilities
that have no 4-dimensional counterpart:
6. There are also analogues of the Kerr metric for D > 4, known as Myers-Perry
black holes, characterised by n rotation parameters for D = 2n + 1 or D = 2n + 2
44
If you are tempted to open this particular box in Pandora’s large collection of boxes better left
untouched, I refer you to the recent review article by S. Ansoldi, Spherical black holes with regular
center, arXiv:0802.0330 [gr-qc], and to the copious references to the original literature therein.
45
See e.g. D. Birmingham, Topological Black Holes in Anti-de Sitter Space, arXiv:hep-th/9808032,
and R. Emparan, AdS/CFT Duals of Topological Black Holes and the Entropy of Zero-Energy States,
arXiv:hep-th/9906040 for details and applications of these solutions.
(with n being the rank of the spatial rotation group SO(D − 1) = SO(2n) or
SO(D−1) = SO(2n+1) and thus the maximal number of independent commuting
generators of the corresponding Lie algebra).46
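Returning to the Bardeen metric function (15.164), the statements made about it (simple zeros for suitable m and e, Schwarzschild behaviour at large r, f ≈ 1 − (2m/e³)r² at small r) can be explored numerically. The following is a minimal sketch with hypothetical helper names and illustrative parameter values (m = 1, e = 0.5 and e = 0.9); it locates zeros of f by a grid scan followed by bisection. Minimising f, which happens at r² = 2e², suggests that horizons exist only for e ≤ 4m/(3√3) ≈ 0.77m, which is why e = 0.9 gives none.

```python
def bardeen_f(r, m=1.0, e=0.5):
    """Metric function (15.164) of the Bardeen solution."""
    return 1 - 2 * m * r**2 / (r**2 + e**2) ** 1.5


def horizons(m=1.0, e=0.5, r_max=10.0, n=10000, tol=1e-12):
    """Zeros of bardeen_f on (0, r_max]: scan a grid for sign changes, then bisect."""
    roots = []
    grid = [r_max * (k + 1) / n for k in range(n)]
    for a, b in zip(grid, grid[1:]):
        fa, fb = bardeen_f(a, m, e), bardeen_f(b, m, e)
        if fa * fb < 0:
            while b - a > tol:
                mid = 0.5 * (a + b)
                if fa * bardeen_f(mid, m, e) <= 0:
                    b = mid                          # sign change in [a, mid]
                else:
                    a, fa = mid, bardeen_f(mid, m, e)  # sign change in [mid, b]
            roots.append(0.5 * (a + b))
    return roots


print(horizons(1.0, 0.5))       # inner and outer horizon, the outer one below r = 2m
print(horizons(1.0, 0.9))       # [] : no horizon for this charge
print(bardeen_f(1e-3), 1 - 2e-6 / 0.125)   # de Sitter-like core, 1 - (2m/e^3) r^2
print(bardeen_f(1e4), 1 - 2e-4)            # Schwarzschild-like tail, 1 - 2m/r
```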
46
For more information about these solutions see R. Myers, Myers-Perry black holes, arXiv:1111.1903
[gr-qc].
47
See e.g. R. Emparan, H. Reall, Black Holes in Higher Dimensions, arXiv:0801.3471 [hep-th]
for a general review, as well as S. Hollands, A. Ishibashi, Black hole uniqueness theorems in higher
dimensional spacetimes, arXiv:1206.1164 [gr-qc], and G. Galloway, Constraints on the topology of
higher dimensional black holes, arXiv:1111.5356 [gr-qc].
307
15.11 Appendix: Summary of Schwarzschild Coordinate Systems
Here is, to wrap up this section, a list of the coordinate systems that we have used to
gain insight into the properties of the Schwarzschild metric.
Remarks:
1. Abbreviations:
$$ \begin{aligned} & f(r) = 1 - 2m/r \,, \qquad g_\pm(\rho) = (1 \pm m/2\rho)^2 \,, \\ & F(r) = 32m^3 r^{-1} e^{-r/2m} \,, \qquad r(\tau,\rho) = \left[3\sqrt{2m}\,(\rho-\tau)/2\right]^{2/3} \,, \\ & F_I(x,y) = 8my^2/(xy+2m) \,, \qquad r(r^*) \text{ from } r^* = r + 2m\ln(r/2m - 1) \,, \\ & r(\tau,R) \text{ and } r' = \partial r/\partial R \text{ from (15.40)} \,, \\ & r(T,X) \text{ from } X^2 - T^2 = (r/2m - 1)\,e^{r/2m} \end{aligned} \tag{15.169} $$
3. The perfectly valid but somewhat un-insightful option of using Schwarzschild coordinates only in region II, say, and variants thereof, have also not been indicated in the last column.
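Several entries of (15.169) define r only implicitly, for example r(r*) from r* = r + 2m ln(r/2m − 1), so in practice they have to be inverted numerically. A minimal sketch, assuming the exterior region r > 2m and using plain bisection (r* increases monotonically there, since dr*/dr = 1/f(r) > 0); the function names are hypothetical. It also evaluates X² − T² = (r/2m − 1)e^{r/2m}, whose sign tells one on which side of the horizon a Kruskal point lies.

```python
import math


def tortoise(r, m):
    """r* = r + 2m ln(r/2m - 1), defined here for r > 2m."""
    if r <= 2 * m:
        raise ValueError("this sketch only covers the exterior region r > 2m")
    return r + 2 * m * math.log(r / (2 * m) - 1)


def r_from_tortoise(rstar, m, tol=1e-12):
    """Invert r*(r) on r > 2m by bisection (r* is monotonically increasing there).

    The lower bracket 2m(1 + eps) must stay resolvable in floating point,
    which limits this sketch to rstar above roughly -70m."""
    eps = 1.0
    while tortoise(2 * m * (1 + eps), m) > rstar:
        eps /= 2.0                      # move the lower bracket towards the horizon
    lo, hi = 2 * m * (1 + eps), 4 * m
    while tortoise(hi, m) < rstar:
        hi *= 2.0                       # move the upper bracket outwards
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if tortoise(mid, m) < rstar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def kruskal_invariant(r, m):
    """X^2 - T^2 = (r/2m - 1) e^{r/2m}: > 0 outside, = 0 on, < 0 inside the horizon."""
    return (r / (2 * m) - 1) * math.exp(r / (2 * m))


m = 1.0
for rstar in (-10.0, 0.0, 10.0):
    r = r_from_tortoise(rstar, m)
    print(rstar, r, tortoise(r, m), kruskal_invariant(r, m))
```

Bisection is chosen over Newton's method only for robustness near the horizon, where dr*/dr blows up.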
16 Linearised Gravity and Gravitational Waves
In previous sections we have dealt with situations in General Relativity in which the
gravitational field is strong and the full non-linearity of the Einstein equations comes
into play (Black Holes). In most ordinary situations, however, the gravitational field is
weak, very weak, and then it is legitimate to work with a linearisation of the Einstein
equations.
When we first derived the Einstein equations we checked that we were doing the right thing by deriving the Newtonian theory in the limit where
1. the gravitational field is weak,
2. the gravitational field is static,
3. matter is non-relativistic.
The fact that the 2nd condition had to be imposed in order to recover the Newtonian
theory is interesting in its own right, as it suggests that there are novel features in general
relativity, even for weak fields, when the 2nd condition is not imposed. Indeed, as we
will see, in this more general setting one discovers that general relativity also predicts
the existence of gravitational waves, i.e. linearised perturbations of the gravitational
field propagating like ordinary waves on the (Minkowski) background geometry. Our
principal aim in this section will be to derive these linearised equations, show that they
are wave equations, and study some of their consequences.
An important next step would be to study and understand how or under which cir-
cumstances gravitational waves are created and how they can be detected. These,
unfortunately, are rather complicated questions in general and I will not enter into this.
The things I will cover in the following are much more elementary, both technically and
conceptually, than anything else in Part II of these notes.
We express the weakness of the gravitational field by the condition that the metric be 'close' to that of Minkowski space, i.e. that
$$ g_{\mu\nu} = g^{(1)}_{\mu\nu} \equiv \eta_{\mu\nu} + h_{\mu\nu} \tag{16.1} $$
with |hµν| ≪ 1. This means that we will drop terms which are quadratic or of higher power in hµν. Here and in the following the superscript (1) indicates that we keep only terms up to linear (first) order in hµν. In particular, the inverse metric is
$$ g^{(1)\,\mu\nu} = \eta^{\mu\nu} - h^{\mu\nu} \,, $$
where indices are raised with η^µν.
As one has thus essentially chosen a background metric, the Minkowski metric, one can
think of the linearised version of the Einstein equations (which are field equations for
hµν ) as a Lorentz-invariant theory of a symmetric tensor field propagating in Minkowski
space-time. I won’t dwell on this but it is good to keep this in mind. It gives rise to
the field theorist’s picture of gravity as the theory of an interacting spin-2 field (which
is useful for many purposes but which I do not subscribe to unconditionally because it
is an inherently perturbative and background dependent picture).
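It is straightforward to work out the Christoffel symbols and curvature tensors in this approximation. The terms quadratic in the Christoffel symbols do not contribute to the Riemann curvature tensor at linear order, and one finds
$$ \begin{aligned} \Gamma^{(1)\mu}{}_{\nu\lambda} &= \eta^{\mu\rho}\,\tfrac{1}{2}\left(\partial_\lambda h_{\rho\nu} + \partial_\nu h_{\rho\lambda} - \partial_\rho h_{\nu\lambda}\right) \\ R^{(1)}_{\mu\nu\rho\sigma} &= \tfrac{1}{2}\left(\partial_\rho\partial_\nu h_{\mu\sigma} + \partial_\mu\partial_\sigma h_{\rho\nu} - \partial_\rho\partial_\mu h_{\nu\sigma} - \partial_\nu\partial_\sigma h_{\rho\mu}\right) \end{aligned} \tag{16.3} $$
(this result for the Riemann tensor can also be inferred directly from the expression (8.12) of the Riemann tensor at the origin of an inertial coordinate system). Hence
$$ R^{(1)}_{\mu\nu} = \tfrac{1}{2}\left(\partial_\sigma\partial_\nu h^\sigma{}_\mu + \partial_\sigma\partial_\mu h^\sigma{}_\nu - \partial_\mu\partial_\nu h - \Box h_{\mu\nu}\right) \,, \tag{16.4} $$
where
$$ h \equiv h^\mu{}_\mu = \eta^{\mu\nu} h_{\mu\nu} = -h_{00} + \delta^{ik} h_{ik} \tag{16.5} $$
$$ R^{(1)} = \eta^{\mu\nu} R^{(1)}_{\mu\nu} = \partial_\mu\partial_\nu h^{\mu\nu} - \Box h \,, \tag{16.7} $$
$$ \begin{aligned} G^{(1)}_{\mu\nu} &= R^{(1)}_{\mu\nu} - \tfrac{1}{2}\eta_{\mu\nu} R^{(1)} \\ &= \tfrac{1}{2}\left(\partial_\sigma\partial_\nu h^\sigma{}_\mu + \partial_\sigma\partial_\mu h^\sigma{}_\nu - \partial_\mu\partial_\nu h - \Box h_{\mu\nu} - \eta_{\mu\nu}\partial_\rho\partial_\sigma h^{\rho\sigma} + \eta_{\mu\nu}\Box h\right) \,, \end{aligned} \tag{16.8} $$
or, equivalently,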
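The latter can be derived from the quadratic Minkowski space (Poincaré invariant) field theory action
$$ \begin{aligned} S[h_{\mu\nu}] &= \int d^4x\; \mathcal{L}(h_{\mu\nu}, \partial_\sigma h_{\mu\nu}) \\ \mathcal{L}(h_{\mu\nu}, \partial_\sigma h_{\mu\nu}) &= \tfrac{1}{4} h_{\mu\nu,\sigma} h^{\mu\nu,\sigma} - \tfrac{1}{2} h_{\mu\nu,\sigma} h^{\sigma\mu,\nu} - \tfrac{1}{4} h_{,\sigma} h^{,\sigma} + \tfrac{1}{2} h_{,\sigma} h^{\mu\sigma}{}_{,\mu} \end{aligned} \tag{16.11} $$
for a free massless spin-2 field described by the Lorentz tensor hµν. Indeed, variation of the action with respect to the hµν (and the usual integration by parts) leads to
$$ h_{\mu\nu} \to h_{\mu\nu} + \delta h_{\mu\nu} \quad\Rightarrow\quad \delta S[h_{\mu\nu}] = \int d^4x\; G^{(1)}_{\mu\nu}\, \delta h^{\mu\nu} \,. \tag{16.12} $$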
Remarks:
1. Note that the Lagrangian is not the linearised Einstein-Hilbert Lagrangian, i.e.
the linearised Ricci scalar (16.7) (as the latter is, by construction, linear in hµν ).
Rather,
$$ \mathcal{L} = \tfrac{1}{2} h^{\mu\nu} G^{(1)}_{\mu\nu} + \text{total derivative} \,, \tag{16.13} $$
which is quadratic in hµν, as it should be. The total derivative is there to turn the terms of the form h∂²h arising from h^µν G^(1)_µν into the standard (∂h)²-form.
2. Note that we can indeed take the integration measure to be the flat Minkowski
measure d4 x, as L is already quadratic in hµν and its derivatives, so that any
contribution of hµν to the measure would give a subleading contribution.
3. This theory, just like its spin-1 Maxwell counterpart, has a local gauge invariance
which we will return to and make use of below.
$$ G^{(1)}_{\mu\nu} = 8\pi G_N\, T^{(0)}_{\mu\nu} \,. \tag{16.14} $$
Note that only the zeroth order term in the h-expansion appears on the right hand side of this equation. This is due to the fact that Tµν must itself already be small in order for the linearised approximation to be valid, i.e. T^(0)_µν should be of order hµν. Therefore, any terms in Tµν depending on hµν would already be of order (hµν)² and can be dropped.
$$ \partial_\mu T^{(0)\mu\nu} = 0 \,, \tag{16.15} $$
$$ \partial_\mu G^{(1)\mu\nu} = 0 \,, \tag{16.16} $$
which can easily be verified, and which reflects the invariance of the theory under the linearised coordinate transformations to be discussed below.
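The formulas (16.3), (16.4), (16.7) and (16.8), and the identity (16.16), can be spot-checked for a single Fourier mode hµν = εµν e^{ik·x}, for which every ∂µ∂ν becomes −kµkν and ∂^µ becomes ik^µ. The sketch below is an illustration under these assumptions (η = diag(−1, 1, 1, 1), random real ε and k, helper names of our own choosing). It contracts the first and third indices of (16.3), R_µν = η^αρ R_αµρν, compares the result with (16.4) and (16.7), checks the two lines of (16.8) against each other, and checks that k^µ G^(1)_µν vanishes, which is the Fourier image of (16.16).

```python
import itertools
import random

ETA = [-1.0, 1.0, 1.0, 1.0]   # diagonal of eta_{mu nu}; eta^{mu nu} has the same entries


def dd(k, a, b):
    """Fourier image of d_a d_b acting on exp(i k.x); k carries lower indices."""
    return -k[a] * k[b]


def riemann(eps, k):
    """Linearised Riemann tensor (16.3), common factor exp(i k.x) stripped."""
    return {(m, n, r, s): 0.5 * (dd(k, r, n) * eps[m][s] + dd(k, m, s) * eps[r][n]
                                 - dd(k, r, m) * eps[n][s] - dd(k, n, s) * eps[r][m])
            for m, n, r, s in itertools.product(range(4), repeat=4)}


def scalars(eps, k):
    """h = eta^{mu nu} h_{mu nu}, Box -> -k^2, and d_mu d_nu h^{mu nu}."""
    tr = sum(ETA[a] * eps[a][a] for a in range(4))
    box = sum(ETA[a] * dd(k, a, a) for a in range(4))
    ddh = sum(ETA[a] * ETA[b] * dd(k, a, b) * eps[a][b] for a in range(4) for b in range(4))
    return tr, box, ddh


def ricci_16_4(eps, k):
    """Right-hand side of (16.4)."""
    tr, box, _ = scalars(eps, k)
    return [[0.5 * (sum(ETA[s] * dd(k, s, n) * eps[s][m] for s in range(4))
                    + sum(ETA[s] * dd(k, s, m) * eps[s][n] for s in range(4))
                    - dd(k, m, n) * tr - box * eps[m][n])
             for n in range(4)] for m in range(4)]


def einstein_16_8(eps, k):
    """Second line of (16.8): (16.4) minus 1/2 eta_{mu nu} (d d h - Box h)."""
    tr, box, ddh = scalars(eps, k)
    ric = ricci_16_4(eps, k)
    return [[ric[m][n] - 0.5 * (ETA[m] if m == n else 0.0) * (ddh - box * tr)
             for n in range(4)] for m in range(4)]


rng = random.Random(0)
eps = [[0.0] * 4 for _ in range(4)]
for a in range(4):
    for b in range(a, 4):
        eps[a][b] = eps[b][a] = rng.uniform(-1, 1)    # random symmetric polarisation
k = [rng.uniform(-1, 1) for _ in range(4)]            # generic k_mu (need not be null)

R = riemann(eps, k)
ricci = [[sum(ETA[a] * R[a, m, a, n] for a in range(4)) for n in range(4)] for m in range(4)]
scalar = sum(ETA[m] * ricci[m][m] for m in range(4))
tr, box, ddh = scalars(eps, k)
ric_formula, G = ricci_16_4(eps, k), einstein_16_8(eps, k)
idx = [(m, n) for m in range(4) for n in range(4)]

err_ricci = max(abs(ricci[m][n] - ric_formula[m][n]) for m, n in idx)              # (16.4)
err_scalar = abs(scalar - (ddh - box * tr))                                         # (16.7)
err_einstein = max(abs(G[m][n] - ricci[m][n] + 0.5 * (ETA[m] if m == n else 0.0) * scalar)
                   for m, n in idx)                                                 # (16.8)
err_bianchi = max(abs(sum(ETA[m] * k[m] * G[m][n] for m in range(4))) for n in range(4))  # (16.16)
print(err_ricci, err_scalar, err_einstein, err_bianchi)   # all at round-off level
```

Working with one Fourier mode turns the differential identities into polynomial identities in k, which is why random numerical values suffice as a check.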
16.3 Newtonian Limit Revisited
In section 10.3 we verified that the Newtonian (weak field, static, non-relativistic matter) limit of the Einstein equations reduces to the Newtonian (Poisson) equation ∆φ = 4πG_N ρ. This is a special case of the above general weak-field limit equations (16.14), with G^(1)_µν given in (16.8), and it will be instructive to redo the calculation of section 10.3 from this more general perspective.
We thus assume an energy-momentum tensor whose only non-negligible component is the energy density T00 = ρ, with ρ static and G_N ρ ≪ 1. Then we can assume that the deviations of the space-time geometry from the Minkowski metric are small and time-independent,
$$ g_{\mu\nu} = g^{(1)}_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu} \,, \qquad \partial_0 h_{\mu\nu} = 0 \,. \tag{16.17} $$
Then the (00)-component of the Einstein tensor reduces (after some cancellations) to
$$ G^{(1)}_{00} = -\tfrac{1}{2}\Delta(\delta^{ik}h_{ik}) + \tfrac{1}{2}\partial_i\partial_k h_{ik} \,. \tag{16.18} $$
In particular, h00 and its derivatives have dropped out of this expression, which appears to be at odds with the desired G00 = −∆g00 for Newtonian fields (section 10.3). However, we have not yet used the condition Tik = 0 ⇒ Gik = 0 at all. In particular, for static perturbations we find from (16.8) that the trace of the spatial components of the Einstein tensor is
$$ \delta^{ik} G^{(1)}_{ik} = \tfrac{1}{2}\Delta(\delta^{ik}h_{ik}) - \tfrac{1}{2}\partial_i\partial_k h_{ik} - \Delta h_{00} \,, \tag{16.19} $$
so that
$$ G^{(1)}_{ik} = 0 \quad\Rightarrow\quad G^{(1)}_{00} = -\Delta h_{00} \,. \tag{16.20} $$
This is precisely the relation required (and verified) in the analysis of section 10, in order to have the correct Newtonian limit.
As we are in the realm of a Poincaré-invariant classical field theory, we can define the total energy of the system as the integral of the energy density T00 = ρ over a constant-time slice Σ (x⁰ = const.),
$$ E = \int_\Sigma d^3x\; T_{00} \tag{16.21} $$
(another way to see that we can take the integration measure to be the flat Euclidean measure d³x is to recall that T00 is small by assumption). If the linearised Einstein equations are satisfied, we can express the total energy in terms of the Einstein tensor,
$$ E = \frac{1}{8\pi G_N}\int_\Sigma d^3x\; G^{(1)}_{00} \,. \tag{16.22} $$
Now we have two distinct expressions for G^(1)_00 in terms of the derivatives of the metric and, interestingly, both of them are spatial total derivatives, namely
$$ G^{(1)}_{00} = -\tfrac{1}{2}\Delta(\delta^{ik}h_{ik}) + \tfrac{1}{2}\partial_i\partial_k h_{ik} = \partial_i\left(-\tfrac{1}{2}\partial^i h_{kk} + \tfrac{1}{2}\partial_k h_{ik}\right) \tag{16.23} $$
and
$$ G^{(1)}_{00} = -\Delta h_{00} = -\partial_i(\partial^i h_{00}) \,. \tag{16.24} $$
This allows us to write the total energy E of the system as a boundary integral over the boundary
$$ \partial\Sigma = S^2_\infty \tag{16.25} $$
of the spatial slice "at infinity" in two different ways, namely as
$$ E = \frac{1}{16\pi G_N}\oint_{S^2_\infty} dS_i\,\left(\partial_k h_{ik} - \partial^i h_{kk}\right) \tag{16.26} $$
or as
$$ E = -\frac{1}{8\pi G_N}\oint_{S^2_\infty} dS_i\; \partial^i h_{00} \,. \tag{16.27} $$
Note that these final expressions only depend on the asymptotics of the gravitational
field at spatial infinity (the assumption that the gravitational field is weak there trans-
lating into the statement that the metric is asymptotically flat). It is thus tempting
to adopt E as the definition of the total energy of an isolated (i.e. asymptotically flat)
system in general, even when the field in the interior is not weak:
1. Thus in general the ADM energy (Arnowitt-Deser-Misner, see the reference in footnote 11, section 10.5) of such a system is defined by
$$ E_{ADM} = \frac{1}{16\pi G_N}\oint_{S^2_\infty} dS_i\,\left(\partial_k h_{ik} - \partial_i h_{kk}\right) \,, \tag{16.28} $$
2. The second expression is actually a special case of the Komar charges briefly mentioned in section 9.5, here applied to the (asymptotic) timelike Killing vector ∂t (the generator of time translations, and thus naturally associated with the energy or Hamiltonian). Indeed, recall that for any Killing vector K^µ we had the conserved current J^µ = R^µ_ν K^ν (9.91), which was itself a divergence of an anti-symmetric tensor, J^µ = ∇_ν A^µν, with A^µν = −A^νµ = ∇^µ K^ν. Thus the corresponding conserved charge Q_K(V) contained in a volume V can be written as a surface integral of ∇^µ K^ν over the boundary ∂V of the volume V,
$$ Q_K(V) \sim \oint_{\partial V} dS_{\mu\nu}\,\nabla^\mu K^\nu \,. \tag{16.29} $$
Applying this to V = Σ, ∂V = S²_∞, and K = ∂t, we find (asymptotically)
This fixes the normalisation of the Komar charge Q_K(V) (16.29) in this case,
$$ E_{Komar}(V) \equiv Q_{K=\partial_t}(V) = -\frac{1}{8\pi G_N}\oint_{\partial V} dS_{\mu\nu}\,\nabla^\mu K^\nu \tag{16.32} $$
and we can identify (16.27) with the total Komar energy (V = Σ) associated to the Killing vector K = ∂t,
$$ E_{Komar}(\Sigma) = -\frac{1}{8\pi G_N}\oint_{S^2_\infty} dS_i\,\partial_i h_{00} = -\frac{1}{8\pi G_N}\oint_{S^2_\infty} dS_{\mu\nu}\,\nabla^\mu K^\nu \,. \tag{16.33} $$
As our primary litmus test, let us apply these expressions to the Schwarzschild metric.
In standard coordinates the asymptotic behaviour of the spatial part of the metric is
Since the ADM expression for the energy appeals to an asymptotically Cartesian coordinate system, this is one instance where it is perhaps more natural (and safer) to use the Schwarzschild metric in isotropic coordinates (12.36). The asymptotic (ρ → ∞, with ρ = |x|) behaviour of the spatial part of the metric is
$$ (1 + m/2\rho)^4\, d\vec{x}^2 \approx (1 + 2m/\rho)\, d\vec{x}^2 \quad\Rightarrow\quad h_{ik} = \frac{2m}{\rho}\,\delta_{ik} \,. \tag{16.39} $$
Note that this is not the same as (16.36). Nevertheless, calculating ∂k hik − ∂i hkk in this case, one finds the same expression as in Schwarzschild coordinates (with r → ρ),
$$ \partial_k h_{ik} - \partial_i h_{kk} = \frac{4m}{\rho^3}\, x_i \,, \tag{16.40} $$
and the remainder of the calculation is then identical, leading again to E_ADM = M.
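The surface integral (16.28) can also be evaluated numerically, which is a useful check on (16.39)-(16.40) and on the normalisation 1/16πG_N. The sketch below works in units G_N = 1 with hypothetical helper names. It uses the full isotropic spatial metric, h_ik = ((1 + m/2ρ)⁴ − 1)δ_ik, rather than only its 1/ρ term, takes derivatives by central finite differences, and integrates with a midpoint rule on a large coordinate sphere. The result should approach m, with corrections of relative order m/R from the subleading terms in h_ik.

```python
import math


def h_iso(x, m):
    """Spatial perturbation of isotropic Schwarzschild: h_ik = ((1 + m/2rho)^4 - 1) delta_ik."""
    rho = math.sqrt(sum(c * c for c in x))
    phi = (1.0 + m / (2.0 * rho)) ** 4 - 1.0
    return [[phi if i == k else 0.0 for k in range(3)] for i in range(3)]


def partial(h, x, j, step):
    """Central difference d_j h_ik of a 3x3-matrix-valued field h(x)."""
    xp, xm = list(x), list(x)
    xp[j] += step
    xm[j] -= step
    hp, hm = h(xp), h(xm)
    return [[(hp[i][k] - hm[i][k]) / (2.0 * step) for k in range(3)] for i in range(3)]


def adm_energy(h, R, G=1.0, n_theta=40, n_phi=80):
    """(16.28) on a coordinate sphere of radius R: (1/16 pi G) * sum of dS_i (d_k h_ik - d_i h_kk)."""
    step = 1e-3 * R
    dth, dph = math.pi / n_theta, 2.0 * math.pi / n_phi
    total = 0.0
    for a in range(n_theta):
        th = (a + 0.5) * dth
        for b in range(n_phi):
            ph = (b + 0.5) * dph
            n = [math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th)]
            x = [R * c for c in n]
            d = [partial(h, x, j, step) for j in range(3)]          # d[j][i][k] = d_j h_ik
            integrand = sum(n[i] * (sum(d[k][i][k] for k in range(3))     # d_k h_ik
                                    - sum(d[i][k][k] for k in range(3)))  # d_i h_kk
                            for i in range(3))
            total += integrand * R**2 * math.sin(th) * dth * dph    # dS_i = n_i R^2 dOmega
    return total / (16.0 * math.pi * G)


print(adm_energy(lambda x: h_iso(x, 1.0), R=1000.0))   # close to m = 1 (G_N = 1)
```

The function accepts an arbitrary matrix-valued h, so nothing in it relies on the conformal flatness of (16.39).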
From the alternative Komar expression (16.33) for the energy, this (eminently reasonable and respectable) result arises not from the asymptotic behaviour of the spatial components of the metric but instead from that of the (00)-component of the metric (the relation between the two being provided by the Einstein equations). We have
$$ g_{00} = -1 + \frac{2m}{r} \quad\Rightarrow\quad h_{00} = \frac{2m}{r} \,, \tag{16.41} $$
leading to
$$ \oint_{S^2_r} dS_i\,\partial_i h_{00} = 4\pi r^2\,\partial_r(2m/r) = -8\pi m \,. \tag{16.42} $$
Note that for the special case of the Schwarzschild metric, for which (16.41) is exact (and not just true asymptotically), this is independent of r, and thus
$$ E_{Komar} = -\lim_{r\to\infty}\frac{1}{8\pi G_N}\oint_{S^2_r} dS_i\,\partial_i h_{00} = -\frac{1}{8\pi G_N}\oint_{S^2_r} dS_i\,\partial_i h_{00} = M \,. \tag{16.43} $$
However, more generally, for any metric with the asymptotic behaviour
$$ g_{00} = -1 + \frac{a}{r} + O(r^{-2}) \tag{16.44} $$
this calculation shows that the mass / energy of the solution is determined by the 1/r term of the (00)-component of the metric,
$$ E_{Komar} = -\lim_{r\to\infty}\frac{1}{8\pi G_N}\oint_{S^2_r} dS_i\,\partial_i h_{00} = \frac{a}{2G_N} \,. \tag{16.45} $$
In particular, this applies e.g. to the Reissner-Nordstrøm metric of section 22 and shows that the parameter m appearing in the solution
$$ ds^2 = -f(r)\,dt^2 + f(r)^{-1}dr^2 + r^2 d\Omega^2 \,, \qquad f(r) = 1 - \frac{2m}{r} + \frac{q^2}{r^2} \tag{16.46} $$
has the interpretation of the total energy of the system, E = m/G_N, even in the presence of charge and electrostatic fields.
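For a spherically symmetric h00 the Komar integral in (16.43) and (16.45) reduces to 4πr² dh00/dr, so it is easy to evaluate at finite r and watch the limit r → ∞. The sketch below does this with a central difference, in units G_N = 1 and with a hypothetical function name. For Schwarzschild, h00 = 2m/r, the result is m at every radius. For the Reissner-Nordstrøm form (16.46), h00 = 2m/r − q²/r², it gives m − q²/r, which tends to m as r → ∞, in line with E = m/G_N.

```python
import math


def komar_energy(h00, r, G=1.0):
    """-1/(8 pi G) times the flux of d_i h00 through a sphere of radius r,
    assuming h00 = h00(r), so that the flux is 4 pi r^2 dh00/dr."""
    step = 1e-4 * r
    dh_dr = (h00(r + step) - h00(r - step)) / (2.0 * step)
    return -(4.0 * math.pi * r**2 * dh_dr) / (8.0 * math.pi * G)


m, q = 1.0, 0.5
for r in (10.0, 1e3, 1e6):
    print(r,
          komar_energy(lambda s: 2 * m / s, r),                # Schwarzschild: m at every r
          komar_energy(lambda s: 2 * m / s - q**2 / s**2, r))  # Reissner-Nordstrom: m - q^2/r
```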
between µ and the mass is also dimension-dependent, as it involves the area of the
transverse (d − 1)-sphere.
Remarks:
1. The independence of E_Komar of the 2-surface over which one integrates in the Schwarzschild case (indeed, one could choose any 2-surface in the region r > 2m enclosing the black hole) can be understood as a consequence of the relation (9.87), which says that
$$ \nabla_\nu(\nabla^\mu K^\nu) = R^\mu{}_\nu K^\nu \,. \tag{16.47} $$
2. One may be a bit concerned by the fact that we obtained a non-zero result for the ADM or Komar energy for the Schwarzschild metric, allegedly a solution of the vacuum Einstein equations, even though in order to arrive at the expressions for the energy, we started off with a non-trivial energy-momentum tensor, either in the linearised theory, with T00 = ρ ≠ 0, or via the Komar charge associated to the current
$$ J^\mu = R^\mu{}_\nu K^\nu = 8\pi G_N\left(T^\mu{}_\nu - \tfrac{1}{2}\delta^\mu{}_\nu T\right) K^\nu \tag{16.48} $$
(which is identically zero for a vacuum solution, so it appears that one doesn't have a leg to stand on in that case). This seeming conflict can be resolved in a number of ways:
Schwarzschild spacetime (section 15.6), and then the fact that the total integral should be (and is) zero just says that one gets the same contribution with opposite signs from the spatial infinities in regions I and III respectively. Any local observer only has access to one of those regions. Thus the fact that the total integral is zero is irrelevant, and the integral over the surface S²_∞ of region I gives E = M, which is the physically relevant statement.
3. In the same way one can also introduce various notions of momentum or angular
momentum of an isolated system (the latter for instance being non-zero for the
Kerr metric (15.157) describing a rotating star or black hole), but we will not
pursue this here.
We will now abandon the assumption of static non-relativistic fields and return to the
general weak-field linearised Einstein equations (16.14). In order to understand how to
proceed from there, in this section we will briefly recall in a condensed way the analogous
steps in the case of Maxwell theory.
1. The Maxwell equations (in a Minkowski background) are already linear (no need to linearise) and read
$$ \partial^\alpha F_{\alpha\beta} = -J_\beta \,. \tag{16.49} $$
In terms of the potentials Aα these equations are
$$ A_\alpha \to A_\alpha + \partial_\alpha V \tag{16.51} $$
(in particular, the field strengths Fαβ are invariant) which allows one to choose for instance the Loren(t)z gauge condition[48]
$$ \partial^\alpha A_\alpha = 0 \,. \tag{16.52} $$
[48] Credit for this should really go to the Danish physicist Ludvig V. Lorenz (who used this condition in its Lorent(!)z non-invariant form already in 1867, just 3 years after the publication of Maxwell's treatise) rather than the more famous Dutch physicist Hendrik A. Lorentz. See J. Jackson's Classical Electrodynamics (note at the end of chapter 6).
Indeed, if V is chosen to satisfy the differential equation
$$ \Box V = -\partial^\alpha A_\alpha \,, \tag{16.53} $$
then
$$ \partial^\alpha(A_\alpha + \partial_\alpha V) = 0 \,. \tag{16.54} $$
With this choice of gauge the Maxwell equations become decoupled wave equations,
$$ \Box A_\beta = -J_\beta \,, \tag{16.55} $$
3. The vacuum Maxwell equations can now be solved in terms of plane waves,
$$ A_\alpha = \epsilon_\alpha\, e^{ik_\beta x^\beta} \quad\text{with}\quad k_\beta k^\beta = 0 \tag{16.56} $$
(or wave packets constructed from them), and the Lorenz gauge constrains the polarisation vector εα by
$$ k^\alpha \epsilon_\alpha = 0 \,. \tag{16.57} $$
For a wave travelling in the x³-direction, this implies for the polarisation vector
$$ A'_\alpha = A_\alpha + \partial_\alpha\xi \,, \qquad \Box\xi = 0 \tag{16.59} $$
now completely fixes this residual gauge invariance. Under this residual gauge transformation the polarisation vector transforms as
in particular
$$ \epsilon'_0 = \epsilon_0 - i\xi_0\,\omega \,. \tag{16.62} $$
Thus choosing ξ0 = ε0/iω, one has ε′0 = 0, implying ε′3 = 0, and one is left with the polarisation vector
$$ \epsilon_\alpha = (0, \epsilon_1, \epsilon_2, 0) \,. \tag{16.63} $$
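The last step can be checked with complex arithmetic. The sketch assumes, as in the text, a wave moving in the x³-direction with k^α = (ω, 0, 0, ω), hence k_α = (−ω, 0, 0, ω) for η = diag(−1, 1, 1, 1), and a residual gauge parameter ξ = ξ0 e^{ik·x}, under which ε′_α = ε_α + ik_α ξ0. The function names and the sample polarisation vector are hypothetical.

```python
def lorenz_condition(eps, omega):
    """k^alpha eps_alpha for k^alpha = (omega, 0, 0, omega)."""
    return omega * eps[0] + omega * eps[3]


def remove_timelike_polarisation(eps, omega):
    """Residual gauge transformation eps'_alpha = eps_alpha + i k_alpha xi0 with xi0 = eps_0 / (i omega)."""
    k_lower = (-omega, 0.0, 0.0, omega)    # k_alpha = eta_{alpha beta} k^beta
    xi0 = eps[0] / (1j * omega)            # makes eps'_0 = eps_0 - i omega xi0 vanish, cf. (16.62)
    return [e + 1j * k * xi0 for e, k in zip(eps, k_lower)]


omega = 2.0
eps = [0.3 + 0.1j, 1.0, -0.5j, -(0.3 + 0.1j)]   # satisfies (16.57): eps_3 = -eps_0
new = remove_timelike_polarisation(eps, omega)
print(new)   # components 0 and 3 vanish up to round-off; 1 and 2 are unchanged, cf. (16.63)
```

Because the transformation only adds a multiple of k_α, the Lorenz condition (16.57) is automatically preserved (k is null), which is why ε′3 vanishes together with ε′0.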
16.6 Linearised Gravity: Gauge Invariance and Coordinate Choices
We now proceed analogously in the case of linearised gravity. First of all we need to understand the gauge invariance (or the counterpart of gauge invariance) in the present case.
The original non-linear Einstein equations have a local invariance consisting of general coordinate transformations. What remains of general coordinate invariance in the linearised approximation are, naturally, linearised general coordinate transformations. Indeed, hµν and
$$ h'_{\mu\nu} = h_{\mu\nu} + L_V\eta_{\mu\nu} \tag{16.64} $$
represent the same physical perturbation because ηµν + L_V ηµν is just an infinitesimal coordinate transform of the Minkowski metric ηµν. Therefore linearised gravity has the gauge freedom
$$ h_{\mu\nu} \to h_{\mu\nu} + \partial_\mu V_\nu + \partial_\nu V_\mu \,. \tag{16.65} $$
For example, the linearised Riemann tensor R^(1)_µνρσ is, rather obviously, invariant under this transformation. As a consequence, in particular also the Einstein tensor, and hence the vacuum linearised Einstein equations, have this local gauge invariance. This can be understood without any calculation by noting that under the variation
$$ \delta_V h_{\mu\nu} = L_V\eta_{\mu\nu} $$
the change of the linearised Riemann tensor is the Lie derivative of the Riemann tensor of the Minkowski metric, which vanishes,
$$ \delta_V R^{(1)}_{\mu\nu\rho\sigma} = L_V R_{\mu\nu\rho\sigma}(\eta_{\alpha\beta}) = 0 \,. \tag{16.67} $$
However, in contrast to the Lagrangian of Maxwell theory, say, the Lagrangian (16.11) for linearised gravity is not strictly gauge invariant, but only invariant up to a total derivative.
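The gauge invariance of (16.3) under (16.65) is easy to confirm for a single Fourier mode. For hµν = εµν e^{ik·x} and Vµ proportional to e^{ik·x}, the gauge transformation shifts εµν by kµvν + kνvµ (a constant factor of i absorbed into v), and each ∂µ∂ν acts as −kµkν. The sketch below (helper names and random test values hypothetical) compares (16.3) before and after the shift; no metric is needed, since (16.3) involves no index raising.

```python
import itertools
import random


def riemann(eps, k):
    """Linearised Riemann tensor (16.3) for h = eps exp(i k.x), exponential stripped."""
    def dd(a, b):
        return -k[a] * k[b]      # Fourier image of d_a d_b
    return {(m, n, r, s): 0.5 * (dd(r, n) * eps[m][s] + dd(m, s) * eps[r][n]
                                 - dd(r, m) * eps[n][s] - dd(n, s) * eps[r][m])
            for m, n, r, s in itertools.product(range(4), repeat=4)}


def gauge_shift(eps, k, v):
    """Fourier image of h -> h + d_mu V_nu + d_nu V_mu, cf. (16.65)."""
    return [[eps[m][n] + k[m] * v[n] + k[n] * v[m] for n in range(4)] for m in range(4)]


rng = random.Random(42)
eps = [[0.0] * 4 for _ in range(4)]
for a in range(4):
    for b in range(a, 4):
        eps[a][b] = eps[b][a] = rng.uniform(-1, 1)
k = [rng.uniform(-1, 1) for _ in range(4)]
v = [rng.uniform(-1, 1) for _ in range(4)]

before, after = riemann(eps, k), riemann(gauge_shift(eps, k, v), k)
max_change = max(abs(before[key] - after[key]) for key in before)
print(max_change)   # round-off level
```

The cancellation is term by term: every contribution of the shift is a product k k k v that appears twice with opposite signs.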
Given this gauge invariance of the linearised theory, for explicit calculations it is useful
to make a particular gauge choice and, as in Maxwell theory, a good choice of gauge can
simplify things considerably (and a bad choice of gauge can have the opposite effect).
The gauge we will use is called the harmonic gauge condition, or Fock, or de Donder gauge condition (even though harmonic coordinates were used extensively by Einstein himself until just before the discovery of the final, truly generally covariant, formulation of general relativity). The name harmonic derives from the fact that in this gauge the coordinate functions x^µ are harmonic:
$$ \Box x^\mu \equiv g^{\nu\rho}\nabla_\rho\partial_\nu x^\mu = -g^{\nu\rho}\Gamma^\mu{}_{\nu\rho} \,, \tag{16.69} $$
and thus
$$ \Box x^\mu = 0 \quad\Leftrightarrow\quad g^{\nu\rho}\Gamma^\mu{}_{\nu\rho} = 0 \,. \tag{16.70} $$
$$ \partial_\mu h^\mu{}_\lambda - \tfrac{1}{2}\partial_\lambda h = 0 \,. \tag{16.71} $$
The gauge parameter Vµ which will achieve this is the solution to the equation
$$ \Box V_\lambda = -\left(\partial_\mu h^\mu{}_\lambda - \tfrac{1}{2}\partial_\lambda h\right) \,. $$
Note for later that, as in Maxwell theory, this gauge choice does not necessarily fix the gauge completely. Any transformation x^µ → x^µ + ξ^µ with □ξ^µ = 0 will leave the harmonic gauge condition invariant.
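Under (16.65) the left-hand side of (16.71) shifts by □V_λ, so the gauge parameter that reaches the harmonic gauge solves □V_λ = −(∂^µh_µλ − ½∂_λh). For a single Fourier mode with non-null k this can be solved algebraically. Writing H_λ ≡ k^µε_µλ − ½k_λ ε^µ_µ for the Fourier image of (16.71) (up to a factor of i) and collecting the factors of i, the transformed polarisation is ε′_µν = ε_µν − (k_µH_ν + k_νH_µ)/k². The sketch below checks that ε′ obeys (16.71); η = diag(−1, 1, 1, 1), the random ε and the fixed k are assumptions, and the names are hypothetical. For null k a plane-wave V obeys □V = 0 and cannot change H at all; such V are exactly the residual transformations mentioned above.

```python
import random

ETA = [-1.0, 1.0, 1.0, 1.0]   # eta = diag(-1, 1, 1, 1)


def harmonic_violation(eps, k):
    """H_lambda = k^mu eps_{mu lambda} - 1/2 k_lambda eps^mu_mu, Fourier image of (16.71) up to i."""
    k_up = [ETA[a] * k[a] for a in range(4)]
    tr = sum(ETA[a] * eps[a][a] for a in range(4))
    return [sum(k_up[m] * eps[m][l] for m in range(4)) - 0.5 * k[l] * tr for l in range(4)]


def to_harmonic_gauge(eps, k):
    """Solve Box V_lambda = -(d^mu h_{mu lambda} - 1/2 d_lambda h) for one mode and apply (16.65)."""
    k2 = sum(ETA[a] * k[a] * k[a] for a in range(4))
    if abs(k2) < 1e-12:
        raise ValueError("null k: a plane-wave V obeys Box V = 0 and cannot change H")
    H = harmonic_violation(eps, k)
    return [[eps[m][n] - (k[m] * H[n] + k[n] * H[m]) / k2 for n in range(4)] for m in range(4)]


rng = random.Random(7)
eps = [[0.0] * 4 for _ in range(4)]
for a in range(4):
    for b in range(a, 4):
        eps[a][b] = eps[b][a] = rng.uniform(-1, 1)
k = [0.3, 1.1, -0.7, 0.2]   # k_mu with k^2 = 1.65, i.e. not null
print(max(map(abs, harmonic_violation(eps, k))))                         # generically O(1)
print(max(map(abs, harmonic_violation(to_harmonic_gauge(eps, k), k))))   # round-off level
```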
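Now let us use this gauge condition in the linearised Einstein equations. In this gauge they simplify somewhat to
$$ \Box h_{\mu\nu} - \tfrac{1}{2}\eta_{\mu\nu}\Box h = -16\pi G_N\, T^{(0)}_{\mu\nu} \,. \tag{16.74} $$
$$ T^{(0)}_{\mu\nu} = 0 \quad\Rightarrow\quad \Box h_{\mu\nu} = 0 \,, \tag{16.75} $$
$$ \begin{aligned} &\Box h_{\mu\nu} = 0 \\ &\partial_\mu h^\mu{}_\lambda - \tfrac{1}{2}\partial_\lambda h = 0 \end{aligned} \tag{16.76} $$
with
$$ \bar{h}^\mu{}_\mu = -h^\mu{}_\mu \,. \tag{16.78} $$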
Note, as an aside, that with this notation and terminology the Einstein tensor is the
trace reversed Ricci tensor,
In terms of h̄µν, the Einstein equations and gauge conditions are just
$$ \begin{aligned} \Box\bar{h}_{\mu\nu} &= -16\pi G_N\, T^{(0)}_{\mu\nu} \\ \partial^\mu\bar{h}_{\mu\nu} &= 0 \,. \end{aligned} \tag{16.80} $$
In this equation, the h̄µν are now decoupled, just as the covariant Lorenz gauge ∂µA^µ = 0 decouples the Maxwell equations which, in this gauge, read □A^µ ∼ j^µ (the proportionality factor depending strongly on one's conventions and choice of units for Maxwell theory).
$$ \Box\bar{h}_{\mu\nu} = 0 \,, \tag{16.82} $$
is clearly solved by
$$ \bar{h}_{\mu\nu} = \epsilon_{\mu\nu}\, e^{ik_\alpha x^\alpha} \,, \tag{16.83} $$
where εµν is a constant, symmetric polarisation tensor and kα is a constant wave vector, provided that kα is null, kα k^α = 0. (In order to obtain real metrics one should of course use real solutions.)
Thus plane waves are solutions to the linearised equations of motion and the Einstein
equations predict the existence of gravitational waves travelling along null geodesics (at
the speed of light). The timelike component of the wave vector is often referred to as the frequency ω of the wave, and we can write k^µ = (ω, k^i). Plane waves are of course not the most general solutions to the wave equations but any solution can be written as a superposition of plane wave solutions (wave packets).
So far, we have ten parameters εµν and four parameters kµ to specify the wave, but many of these are spurious, i.e. can be eliminated by using the freedom to perform linearised coordinate transformations and Lorentz rotations.
one can choose the Bµ in such a way that the new polarisation tensor satisfies k^µ εµν = 0 (as before) as well as
$$ \epsilon_{\mu 0} = \epsilon^\mu{}_\mu = 0 \,. \tag{16.86} $$
All in all, we appear to have nine conditions on the polarisation tensor εµν but as both (16.84) and the first of (16.86) imply k^µ εµ0 = 0, only eight of these are independent. Therefore, there are two independent polarisations for a gravitational wave.
For example, we can choose the wave to travel in the x³-direction. Then
and k^µ εµν = 0 and ε0ν = 0 imply ε3ν = 0, so that the only independent components are εab with a, b = 1, 2. As εab is symmetric and traceless, this wave is completely characterised by ε11 = −ε22, ε12 = ε21 and the frequency ω.
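The counting "ten parameters, nine conditions, eight independent, two polarisations" can be verified mechanically. The sketch below writes the transversality conditions k^µεµν = 0, the conditions εµ0 = 0 and tracelessness as linear forms in the ten independent components of a symmetric εµν, and computes their rank by exact Gaussian elimination over the rationals. It assumes, as in the text, a null wave along x³ with k^µ = ω(1, 0, 0, 1) and η = diag(−1, 1, 1, 1); the function names are hypothetical.

```python
from fractions import Fraction

ETA = [-1, 1, 1, 1]
PAIRS = [(a, b) for a in range(4) for b in range(a, 4)]   # the 10 independent components of eps_{mu nu}


def row(coeffs):
    """Linear form sum c * eps_{mu nu} as a coefficient row over PAIRS (eps_{mu nu} = eps_{nu mu})."""
    r = [Fraction(0)] * len(PAIRS)
    for (a, b), c in coeffs.items():
        r[PAIRS.index((min(a, b), max(a, b)))] += Fraction(c)
    return r


def tt_constraints(k_up):
    rows = [row({(mu, nu): k_up[mu] for mu in range(4)}) for nu in range(4)]   # k^mu eps_{mu nu} = 0
    rows += [row({(mu, 0): 1}) for mu in range(4)]                            # eps_{mu 0} = 0
    rows.append(row({(mu, mu): ETA[mu] for mu in range(4)}))                  # eps^mu_mu = 0
    return rows


def rank(rows):
    """Rank by exact Gaussian elimination over the rationals."""
    rows = [list(r) for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        piv = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                f = rows[i][col] / rows[rk][col]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk


k_up = [1, 0, 0, 1]            # k^mu = omega (1, 0, 0, 1) with omega = 1: null wave along x^3
cons = tt_constraints(k_up)
r = rank(cons)
print(len(cons), r, len(PAIRS) - r)   # 9 8 2: nine conditions, eight independent, two polarisations
```

Exact rational arithmetic is used so that the rank is not blurred by floating-point tolerances.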
Now we should not forget that, when talking about the polarisation tensor of a gravita-
tional wave, we are actually talking about the space-time metric itself. Namely, since for
a traceless perturbation we have h̄αβ = hαβ , we have deduced that the metric describing
a gravitational wave travelling in the x3 -direction can always be put into the form
with hab = hab (t ∓ x3 ). This neatly encodes and describes the distortion of the space-
time geometry in the directions transverse to the gravitational wave.
To determine the physical effect of a gravitational wave racing by, we cannot just look at the gravitational field (16.88) at a point (by the equivalence principle), i.e. we cannot detect the presence of such a wave (ultra-)locally. However, we can consider its influence on the relative motion of nearby particles. In other words, we look at the geodesic deviation equation (9.28).
Consider a family of nearby particles described by the velocity field uµ (x) and separation
(deviation) vector S µ (x). Then the change of the deviation vector along the flow lines
of the velocity field is determined by
We consider the situation where the test particles are initially, in the absence of the
gravitational wave, at rest, uµ = (1, 0, 0, 0). Then the gravitational wave will lead, to
lowest order in the perturbation hµν , to a 4-velocity
However, because the Riemann tensor is already of order h, to lowest order the right
hand side of the geodesic deviation equation reduces to
$$ R^{(1)\mu}{}_{00\sigma} = \tfrac{1}{2}\partial_0\partial_0 h^\mu{}_\sigma \tag{16.91} $$
(because h0µ = 0). On the other hand, to lowest order the left hand side is just the ordinary time derivative. Thus the geodesic deviation equation becomes (an overdot denoting a t-derivative)
$$ \ddot{S}^\mu = \tfrac{1}{2}\ddot{h}^\mu{}_\sigma S^\sigma \,, \tag{16.92} $$
and we can consider separately the two cases (1) ε12 = 0 and (2) ε11 = −ε22 = 0.
Figure 20: Effect of a gravitational wave with polarisation ε11 moving in the x³-direction, on a ring of test particles in the x¹-x²-plane.
2. If, on the other hand, ε11 = 0 but ε12 = ε21 ≠ 0, then the lowest order solution is
$$ \begin{aligned} S^1(t) &= S^1(0) + \tfrac{1}{2}\epsilon_{12}\, e^{-i\omega(t - x^3)}\, S^2(0) \\ S^2(t) &= S^2(0) + \tfrac{1}{2}\epsilon_{12}\, e^{-i\omega(t - x^3)}\, S^1(0) \,. \end{aligned} \tag{16.98} $$
This time the displacement in the x¹-direction is governed by the original displacement in the x²-direction and vice versa, and the ring of particles will bounce in the shape of a × (ε12 = ε×), see Figure 21.
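The lowest-order solutions of (16.92) are easy to tabulate. The sketch below is a minimal illustration, assuming x³ = 0, taking the real part of e^{−iωt} (so that h_ab = ε_ab cos ωt), and using hypothetical parameter values. It returns the displaced positions S^a(t) = S^a(0) + ½h_ab(t)S^b(0) of test particles initially on a unit circle in the x¹-x²-plane, for the ε11 (plus) and ε12 (cross) polarisations.

```python
import math


def deformed_ring(eps_plus, eps_cross, omega, t, n_points=8, radius=1.0):
    """Lowest-order solution of (16.92) at x^3 = 0 with h_ab = eps_ab cos(omega t),
    eps_11 = -eps_22 = eps_plus and eps_12 = eps_21 = eps_cross."""
    c = math.cos(omega * t)
    h = [[eps_plus * c, eps_cross * c], [eps_cross * c, -eps_plus * c]]
    ring = []
    for j in range(n_points):
        phi = 2.0 * math.pi * j / n_points
        s0 = (radius * math.cos(phi), radius * math.sin(phi))
        ring.append(tuple(s0[a] + 0.5 * sum(h[a][b] * s0[b] for b in range(2)) for a in range(2)))
    return ring


for t in (0.0, math.pi / 2, math.pi):   # omega = 1: start, quarter and half period
    plus = deformed_ring(0.2, 0.0, 1.0, t, n_points=4)
    cross = deformed_ring(0.0, 0.2, 1.0, t, n_points=4)
    print(t, [tuple(round(x, 3) for x in p) for p in plus],
          [tuple(round(x, 3) for x in p) for p in cross])
```

For the plus polarisation the ring is stretched along x¹ while squeezed along x², and vice versa half a period later; for the cross polarisation the same pattern appears rotated by 45 degrees.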
These solutions display the characteristic behaviour of quadrupole radiation, and this
is something that we might have anticipated on general grounds. First of all, we know
from Birkhoff’s theorem that there can be no monopole (s-wave) radiation. Moreover,
dipole radiation is due to oscillations of the center of charge. While this is certainly
possible for electric charges, an oscillation of the center of mass would violate momentum
conservation and is therefore ruled out. Thus the lowest possible mode of gravitational
radiation is quadrupole radiation, just as we have found.
Figure 21: Effect of a gravitational wave with polarisation ε12 moving in the x³-direction, on a ring of test particles in the x¹-x²-plane.
Now that we have found the solutions to the vacuum equations, we should include sources and study the production of gravitational waves, characterise the type of radiation that is emitted, estimate the energy etc. I will not do this[49] but just make some general comments on the detection of gravitational waves.
In principle, this ought to be straightforward. For example, one might like to simply try
to track the separation of two freely suspended masses. Alternatively, the particles need
not be free but could be connected by a solid piece of material. Then gravitational tidal
forces will stress the material. If the resonant frequency of this ‘antenna’ equals the
frequency of the gravitational wave, this should lead to a detectable oscillation. This is
the principle of the so-called Weber detectors or Weber bars (1966-. . . ). While fine in
principle, in practice gravitational waves are extremely weak (so weak, in fact, that the
quantum theory of the detectors needs to be taken into account) and, to the best of my
knowledge, such detectors have not produced conclusive results so far (2011).
More sensitive modern experiments use detectors based on huge laser Michelson interfer-
ometers (arms several kilometers long), e.g. LIGO (Laser Interferometer Gravitational-
Wave Observatory) and VIRGO, and these (or their upgrades) are widely expected to
finally directly detect gravitational waves in the next couple of years.
However, there is indirect (and very compelling) evidence for gravitational waves. According to the theory (which we have not developed), a binary system of stars rotating around its common center of mass should radiate gravitational waves (much like electromagnetic synchrotron radiation). For two stars of equal mass M at distance 2r from each other, the prediction of General Relativity is that the power radiated by the binary system is
$$ P = \frac{2}{5}\,\frac{G_N^4 M^5}{r^5} \,. \tag{16.100} $$
This energy loss has actually been observed. In 1974, Hulse and Taylor discovered a binary system, affectionately known as PSR1913+16, in which both stars are very small and one of them is a pulsar, a rapidly spinning neutron star. The period of the orbit is only eight hours, and the fact that one of the stars is a pulsar provides a highly accurate clock with respect to which a change in the period as the binary loses energy can be measured. The observed value is in good agreement with the theoretical prediction for loss of energy by gravitational radiation and Hulse and Taylor were rewarded for these discoveries with the 1993 Nobel Prize.
[49] See e.g. S. Weinberg, Gravitation and Cosmology or S. Carroll, Spacetime and Geometry for detailed discussions.
Other situations in which gravitational waves might be either detected directly or inferred indirectly are extreme situations like gravitational collapse (supernovae) or matter orbiting black holes.
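To get a feeling for the numbers in (16.100), one can restore the factors of c by dimensional analysis, P = (2/5) G_N⁴M⁵/(c⁵r⁵), and evaluate it in SI units. The masses and orbital radius in the usage line below are hypothetical values chosen only for illustration. Like (16.100) itself, the formula assumes a circular orbit; an eccentric orbit such as that of the Hulse-Taylor pulsar radiates more strongly.

```python
G = 6.674e-11       # Newton's constant in m^3 kg^-1 s^-2
C = 2.998e8         # speed of light in m/s
M_SUN = 1.989e30    # solar mass in kg


def binary_power(M, r):
    """(16.100) with c restored: power (in W) radiated by two equal masses M (kg)
    on a circular orbit of radius r (m) about their common centre, separation 2r."""
    return 0.4 * G**4 * M**5 / (C**5 * r**5)


# hypothetical illustration: two 1.4 solar-mass stars, orbital radius 1e9 m
print(f"{binary_power(1.4 * M_SUN, 1e9):.3e} W")
```

The steep dependence P ∝ M⁵/r⁵ is the main point: halving the orbital radius increases the radiated power by a factor of 32.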
17 Interlude: Maximally Symmetric Spaces
As we will discuss later on, in the context of the Cosmological Principle, maximally symmetric spaces, which are simultaneously homogeneous (“the same at every point”) and isotropic (“the same in every direction”), provide an (admittedly highly idealised) description of space in a cosmological space-time.
If you already know (or are willing to believe) that in any spatial dimension n there
are essentially only 3 such spaces, namely the Euclidean space Rn , the sphere S n , and
its negative curvature counterpart, the hyperbolic space H n (all equipped with their
standard metrics), you can skip this section, and may just want to refer to section 17.4
where it is shown that these 3 standard metrics can be written in a unified way as
$$ ds^2 = \frac{dr^2}{1 - kr^2} + r^2 d\Omega^2_{n-1} \tag{17.1} $$
for k = 0, ±1 respectively.
In section 9.5, for any Killing vector K^µ we had obtained the identity (9.74)
In particular, this shows that the second derivatives of the Killing vector at a point x0
are again expressed in terms of the value of the Killing vector itself at that point. But
this means (think of Taylor expansions), that a Killing vector field K µ (x) is completely
determined everywhere by the values of Kµ (x0 ) and ∇µ Kν (x0 ) at a single point x0 .
A set of Killing vectors {K^(i)_µ(x)} is said to be linearly independent if any linear relation of the form
$$ \sum_i c_i\, K^{(i)}_\mu(x) = 0 \,, \tag{17.3} $$
with constant coefficients ci implies ci = 0. The reason for insisting on constant coeffi-
cients rather than functions ci (x) in this definition is of course that if K µ is a Killing
vector, then so is cK µ iff c is constant.
We have seen above that Killing vectors K µ (x) are determined by the values K µ (x0 )
and ∇µ Kν (x0 ) at a single point x0 . We will now see how these data are related to
translations and rotations.
We define a homogeneous space to be such that it has infinitesimal isometries that carry
any given point x0 into any other point in its immediate neighbourhood (this could be
stated in more fancy terms!). Thus the metric must admit Killing vectors that, at any
given point, can take all possible values. Thus we require the existence of Killing vectors
for arbitrary Kµ (x0 ). This means that the n-dimensional space admits n translational
Killing vectors.
We define a space to be isotropic at a point x0 if it has isometries that leave the given
point x0 fixed and such that they can rotate any vector at x0 into any other vector at
x0 . Therefore the metric must admit Killing vectors such that Kµ (x0 ) = 0 but such
that ∇µ Kν (x0 ) is an arbitrary antisymmetric matrix (for instance to be thought of as an
element of the Lie algebra of SO(n)). This means that the n-dimensional space admits
n(n − 1)/2 rotational Killing vectors.
Finally, we define a maximally symmetric space to be a space with a metric with the
maximal number n(n + 1)/2 of Killing vectors.
Some simple and fairly obvious consequences of these definitions are the following:
2. A space that is isotropic for all x is also homogeneous. (This follows because
linear combinations of Killing vectors are again Killing vectors and the difference
between two rotational Killing vectors at x and x + dx can be shown to be a
translational Killing vector.)
3. (1) and (2) now imply that a space which is isotropic around every point is max-
imally symmetric.
4. Finally one also has the converse, namely that a maximally symmetric space is
homogeneous and isotropic.
On the basis of these simple considerations we can already determine the form of the
Riemann curvature tensor of a maximally symmetric space. We will see that maximally
symmetric spaces are spaces of constant curvature in the sense that
This result could be obtained by making systematic use of the higher order integrability
conditions for the existence of a maximal number of Killing vectors. The argument
given below is less covariant but more elementary.
Assume for starters that the space is isotropic at x0 and choose a Riemann normal
coordinate system centered at x0 . Thus the metric at x0 is gij (x0 ) = ηij where we may
just as well be completely general and assume that
$$ R_{ijkl}(x_0) = a\,\eta_{ij}\eta_{kl} + b\,\eta_{ik}\eta_{jl} + c\,\eta_{il}\eta_{jk} + d\,\epsilon_{ijkl} \,, \tag{17.7} $$
where the last term is only possible for D = 4. The symmetries of the Riemann tensor imply that a = d = b + c = 0, and hence we are left with
$$ R_{ijkl}(x_0) = b\left(g_{ik}(x_0)g_{jl}(x_0) - g_{il}(x_0)g_{jk}(x_0)\right) \,, \tag{17.9} $$
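If we now assume that the space is isotropic around every point, then we can deduce
that
$$R_{ijkl}(x) = b(x)\,\left(g_{ik}(x)g_{jl}(x) - g_{il}(x)g_{jk}(x)\right) \qquad (17.10)$$
for some function $b(x)$. Therefore the Ricci tensor $R_{jl} = g^{ik}R_{ijkl}$ and the Ricci scalar are
$$R_{jl} = (n-1)\,b(x)\,g_{jl} \,, \qquad R = n(n-1)\,b(x) \,.$$
Hence the Einstein tensor is $G_{ij} = -\tfrac12(n-1)(n-2)\,b(x)\,g_{ij}$, and the contracted
Bianchi identity $\nabla^i G_{ij} = 0$ gives $(n-1)(n-2)\,\partial_j b = 0$. Thus, for $n > 2$,
$b(x)$ has to be a constant, $b = k$, and we have thus established (17.5). Note that we
also have $k = R/n(n-1)$.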
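The contraction leading to the Ricci tensor can be checked numerically. The Python sketch below (helper names are illustrative; it assumes a diagonal metric with entries ±1, so that $g^{ij} = g_{ij}$, and picks b = 0.3 arbitrarily) builds $R_{ijkl} = b(g_{ik}g_{jl} - g_{il}g_{jk})$ for n = 4 in Minkowski signature, checks the algebraic symmetries of the Riemann tensor, and recovers $R_{jl} = (n-1)b\,g_{jl}$ and $R = n(n-1)b$.

```python
import itertools


def riemann_constant_curvature(g, b):
    """R_ijkl = b (g_ik g_jl - g_il g_jk), stored in a dict keyed by (i, j, k, l)."""
    n = len(g)
    return {
        (i, j, k, l): b * (g[i][k] * g[j][l] - g[i][l] * g[j][k])
        for i, j, k, l in itertools.product(range(n), repeat=4)
    }


def ricci(R, g_inv):
    """R_jl = g^{ik} R_ijkl."""
    n = len(g_inv)
    return [
        [sum(g_inv[i][k] * R[i, j, k, l] for i in range(n) for k in range(n)) for l in range(n)]
        for j in range(n)
    ]


def check_symmetries(R, n, tol=1e-12):
    """Antisymmetry in (ij), pair symmetry and the cyclic identity."""
    for i, j, k, l in itertools.product(range(n), repeat=4):
        assert abs(R[i, j, k, l] + R[j, i, k, l]) < tol
        assert abs(R[i, j, k, l] - R[k, l, i, j]) < tol
        assert abs(R[i, j, k, l] + R[i, k, l, j] + R[i, l, j, k]) < tol


n, b = 4, 0.3
# Minkowski metric diag(-1, 1, 1, 1); with entries +-1 it is its own inverse
g = [[(-1.0 if i == 0 else 1.0) if i == j else 0.0 for j in range(n)] for i in range(n)]
g_inv = g

R = riemann_constant_curvature(g, b)
check_symmetries(R, n)
Ric = ricci(R, g_inv)
R_scalar = sum(g_inv[j][l] * Ric[j][l] for j in range(n) for l in range(n))
for j in range(n):
    for l in range(n):
        # Ricci tensor proportional to the metric with factor (n - 1) b
        assert abs(Ric[j][l] - (n - 1) * b * g[j][l]) < 1e-12
print(R_scalar, n * (n - 1) * b)
```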
We are interested not just in the curvature tensor of a maximally symmetric space but in
the metric itself. I will give you two derivations of the metric of a maximally symmetric
space, one by directly solving the differential equation $R_{ijkl} = k\,(g_{ik}g_{jl} - g_{il}g_{jk})$
for the metric $g_{ij}$, the other by a direct geometrical construction of the metric which
makes the isometries of the metric manifest.
We have already calculated all the Christoffel symbols for such a metric (set A(r) = 0,
with due care, in the calculations leading to the Schwarzschild metric in section 12),
and we also know that $R_{ij} = 0$ for $i \neq j$ and that all the diagonal angular components
of the Ricci tensor are determined by Rθθ by spherical symmetry. Hence we only need
Rrr and Rθθ , which are
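$$R_{rr} = \frac{B'}{rB} \,, \qquad R_{\theta\theta} = -\frac{1}{B} + 1 + \frac{rB'}{2B^2} \,, \qquad (17.17)$$
and we want to solve the equations $R_{rr} = 2k\,g_{rr} = 2kB$ and $R_{\theta\theta} = 2k\,g_{\theta\theta} = 2kr^2$
(i.e. $R_{ij} = (n-1)k\,g_{ij}$ with n = 3). The first of these is
$$B' = 2krB^2 \,, \qquad (17.19)$$
which integrates to $1/B = c - kr^2$. The $\theta\theta$-equation then fixes $c = 1$, so that
$$ds^2 = \frac{dr^2}{1 - kr^2} + r^2\,d\Omega^2 \,. \qquad (17.22)$$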
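A quick numerical check that $B(r) = 1/(1 - kr^2)$ satisfies both conditions. The Python sketch below assumes, as (17.17) and (17.19) indicate, that n = 3 with $g_{rr} = B(r)$ and $g_{\theta\theta} = r^2$, so that maximal symmetry requires $R_{rr} = 2kB$ and $R_{\theta\theta} = 2kr^2$. Derivatives are taken by central finite differences, and the sample values of k and r are arbitrary (with $1 - kr^2 > 0$).

```python
def B(r, k):
    """Candidate solution B(r) = 1/(1 - k r^2) of (17.19)."""
    return 1.0 / (1.0 - k * r * r)


def dB(r, k, h=1e-5):
    """Central finite-difference approximation to B'(r)."""
    return (B(r + h, k) - B(r - h, k)) / (2.0 * h)


def R_rr(r, k):
    return dB(r, k) / (r * B(r, k))                          # (17.17)


def R_thth(r, k):
    b = B(r, k)
    return -1.0 / b + 1.0 + r * dB(r, k) / (2.0 * b * b)     # (17.17)


for k in (1.0, -1.0, 0.5):
    for r in (0.1, 0.3, 0.6):
        # maximal symmetry in n = 3: R_ij = 2k g_ij with g_rr = B, g_thth = r^2
        assert abs(R_rr(r, k) - 2 * k * B(r, k)) < 1e-6
        assert abs(R_thth(r, k) - 2 * k * r * r) < 1e-6
print("B = 1/(1 - k r^2) solves both equations")
```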
17.4 Maximally Symmetric Metrics II: Embeddings
Recall that the standard metric on the n-sphere can be obtained by restricting the flat
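metric on an ambient $\mathbb{R}^{n+1}$ to the sphere. We will generalise this construction a bit to
allow for k < 0 and other signatures as well. To this end consider the vector space
$V = \mathbb{R}^{n+1}$ with coordinates $(\vec{x}, z)$ and metric
$$ds^2 = d\vec{x}^2 + \frac{1}{k}\,dz^2 \,, \qquad (17.24)$$
where $d\vec{x}^2$ is a flat metric of signature (p, q), p + q = n, and let G be the group of
linear transformations of V preserving this metric, i.e. preserving the quadratic form
$k\vec{x}^2 + z^2$. The hypersurface Σ we are interested in is
$$k\vec{x}^2 + z^2 = 1 \,. \qquad (17.25)$$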
This equation breaks all the translational isometries, but by the very definition of the
group G it leaves this equation, and therefore the hypersurface Σ, invariant. It follows
that G will act by isometries on Σ with its induced metric. But dim G = n(n + 1)/2.
Hence the n-dimensional space has n(n+1)/2 Killing vectors and is therefore maximally
symmetric.
Remarks:
2. The Killing vectors of the induced metric are simply the restriction to Σ of the
standard generators of G on the vector space V .
3. For Euclidean signature, these spaces are spheres for k > 0 and hyperboloids for
k < 0, and in other signatures they are the corresponding generalisations. In
particular, for (p, q) = (1, n − 1) we obtain de Sitter space-time for k = 1 and
anti-de Sitter space-time for k = −1. We will discuss their embeddings, and coordinate
systems for them, in much more detail in section 24.
It just remains to determine explicitly this induced metric. For this we start with the
defining relation of Σ and differentiate it to find that on Σ one has
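$$dz = -\frac{k\,\vec{x}\cdot d\vec{x}}{z} \,, \qquad (17.27)$$
so that
$$dz^2 = \frac{k^2(\vec{x}\cdot d\vec{x})^2}{1 - k\vec{x}^2} \,. \qquad (17.28)$$
Thus the metric (17.24) restricted to Σ is
$$ds^2|_\Sigma = d\vec{x}^2 + \frac{1}{k}\,dz^2|_\Sigma = d\vec{x}^2 + \frac{k(\vec{x}\cdot d\vec{x})^2}{1 - k\vec{x}^2} \,. \qquad (17.29)$$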
This is precisely the same metric as we obtained in the previous section.
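The restriction to Σ can be verified pointwise. The Python sketch below (names and sample points are illustrative; it takes Euclidean $d\vec{x}^2$ in three dimensions and the branch z > 0) computes dz from (17.27) for random tangent data and compares $d\vec{x}^2 + dz^2/k$ with the right-hand side of (17.29), for both signs of k.

```python
import math
import random


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def induced_metric_direct(x, dx, k):
    """d~x^2 + dz^2/k on Sigma, with dz obtained from (17.27)."""
    z = math.sqrt(1.0 - k * dot(x, x))       # branch z > 0 of k x^2 + z^2 = 1
    dz = -k * dot(x, dx) / z                  # (17.27)
    return dot(dx, dx) + dz * dz / k


def induced_metric_formula(x, dx, k):
    """Right-hand side of (17.29)."""
    return dot(dx, dx) + k * dot(x, dx) ** 2 / (1.0 - k * dot(x, x))


rng = random.Random(0)
for k in (1.0, -1.0, 0.25):
    for _ in range(100):
        x = [rng.uniform(-0.5, 0.5) for _ in range(3)]    # keeps k x^2 < 1
        dx = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        lhs = induced_metric_direct(x, dx, k)
        rhs = induced_metric_formula(x, dx, k)
        assert math.isclose(lhs, rhs, rel_tol=1e-12, abs_tol=1e-12)
print("(17.29) agrees with the direct restriction")
```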
It will be useful to see the maximally symmetric metrics in some other coordinate
systems (for k = 0, there is nothing new to say since this is just the flat metric):
1. First of all let us note that for k ≠ 0 essentially only the sign of k matters, as
|k| only affects the overall size of the space and nothing else (and can therefore
be absorbed in the scale factor a(t) of the metric (18.1) that will be the starting
point for our investigations of cosmology). To see this note that a metric of the
form (17.22), but with k replaced by k/R²,
$$ds^2 = \frac{dr^2}{1 - kr^2/R^2} + r^2\,d\Omega^2 \,, \qquad (17.30)$$
can, by introducing r̃ = r/R, be put into the form
$$ds^2 = \frac{dr^2}{1 - kr^2/R^2} + r^2\,d\Omega^2 = R^2\left(\frac{d\tilde{r}^2}{1 - k\tilde{r}^2} + \tilde{r}^2\,d\Omega^2\right) \,. \qquad (17.31)$$
We now see explicitly that a rescaling of k by a constant factor is equivalent
to an overall rescaling of the metric, and thus we will just need to consider the
cases k = ±1. However, occasionally it will also be convenient to think of k as a
continuous parameter, the 3 geometries then being distinguished by k < 0, k = 0
and k > 0 respectively.
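3. For k = −1, on the other hand, we have
$$ds^2 = \frac{dr^2}{1 + r^2} + r^2\,d\Omega_{n-1}^2 \,. \qquad (17.34)$$
Thus the range of r is 0 ≤ r < ∞, and we can use the change of variables r = sinh ψ
to write the metric as $d\psi^2 + \sinh^2\psi\,d\Omega_{n-1}^2$. Together with r = sin ψ for k = +1
and r = ψ for k = 0, all three cases can be written uniformly as
$$ds^2 = \frac{dr^2}{1 - kr^2} + r^2\,d\Omega_{n-1}^2 = d\psi^2 + g_k(\psi)^2\,d\Omega_{n-1}^2 \qquad (17.36)$$
where
$$g_k(\psi) = \begin{cases} \psi & k = 0 \\ \sin\psi & k = +1 \\ \sinh\psi & k = -1 \end{cases} \qquad (17.37)$$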
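Finally, in terms of the radial coordinate r̄ defined by r = r̄/(1 + kr̄²/4), the metric
takes the form
$$ds^2 = \frac{d\bar{r}^2 + \bar{r}^2\,d\Omega_{n-1}^2}{(1 + k\bar{r}^2/4)^2} \,.$$
Note that this differs by the conformal factor $(1 + k\bar{r}^2/4)^{-2} > 0$ from the flat
metric. One says that such a metric is conformally flat. Thus what we have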
shown is that every maximally symmetric space is conformally flat. Conformally
flat, on the other hand, does not by any means imply maximally symmetric (the
conformal factor could be any function of the radial and angular variables).
Note also that the metric in this form is just the 3- (or n-) dimensional generalisation
of the 2-dimensional constant curvature metric on the 2-sphere in stereographic
coordinates (8.72) (for k = +1) or of the Poincaré disc metric of $H^2$ (8.73)
(for k = −1).
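The functions $g_k$ of (17.37) and the change of variables $r = g_k(\psi)$ can be checked numerically. The Python sketch below (helper names are illustrative) implements $g_k$ together with its inverse (the function the text later calls $f_k$) and confirms, using a finite-difference derivative, that $dr^2/(1 - kr^2)$ turns into $d\psi^2$ for all three values of k.

```python
import math


def g_k(psi, k):
    """g_k(psi) of (17.37)."""
    if k == 0:
        return psi
    if k == 1:
        return math.sin(psi)
    if k == -1:
        return math.sinh(psi)
    raise ValueError("k must be 0, +1 or -1")


def f_k(r, k):
    """Inverse function of g_k: r, arcsin r, arsinh r."""
    if k == 0:
        return r
    if k == 1:
        return math.asin(r)
    if k == -1:
        return math.asinh(r)
    raise ValueError("k must be 0, +1 or -1")


def dpsi_squared(psi, dpsi, k, h=1e-6):
    """dr^2 / (1 - k r^2) evaluated with r = g_k(psi) and dr = g_k'(psi) dpsi."""
    r = g_k(psi, k)
    dr = (g_k(psi + h, k) - g_k(psi - h, k)) / (2 * h) * dpsi
    return dr * dr / (1.0 - k * r * r)


for k in (0, 1, -1):
    for psi in (0.2, 0.7, 1.3):
        assert math.isclose(f_k(g_k(psi, k), k), psi, rel_tol=1e-12)
        # radial part of (17.36): dr^2 / (1 - k r^2) = dpsi^2
        assert math.isclose(dpsi_squared(psi, 0.01, k), 0.01 ** 2, rel_tol=1e-8)
print("r = g_k(psi) maps dr^2/(1 - k r^2) to dpsi^2")
```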
18 Cosmology I: Basics
We now turn away from considering isolated systems (stars) to some (admittedly very
idealised) description of the universe as a whole. This subject is known as Cosmology.
It is certainly one of the most fascinating subjects of theoretical physics, dealing with
such issues as the origin and ultimate fate and the large-scale structure of the universe.
Due to the difficulty of performing cosmological experiments and making precise mea-
surements at large distances, many of the most basic questions about the universe are
still unanswered today:
5. Why is the cosmological constant so small and what determines its value?
While recent precision data, e.g. from supernovae surveys and detailed analysis of the
cosmic microwave background radiation, suggest answers to at least some of these ques-
tions, these answers actually just make the universe more mysterious than ever.
Fortunately, however, many of the important features any realistic cosmological model
should display are already present in some very simple models, the so-called Friedmann-
Robertson-Walker Models already studied in the 20’s and 30’s of the last century. They
are based on the simplest possible ansatz for the metric compatible with the assumption
that on large scales the universe is roughly homogeneous and isotropic (cf. the next
section for a more detailed discussion of this Cosmological Principle) and have become
the ‘standard model’ of cosmology.
We will see that they already display all the essential features such as
1. a Big Bang
2. expanding universes (Hubble expansion)
Our first aim will be to make maximal use of the symmetries that simple cosmological
models should have to find a simple ansatz for the metric. Our guiding principle will
be the Cosmological Principle.
At first, it may sound impossibly difficult to find solutions of the Einstein equations
describing the universe as a whole. But: If one looks at the universe at large (very
large) scales, in that process averaging over galaxies and even clusters of galaxies, then
the situation simplifies a lot in several respects:
2. Furthermore we assume that the Earth, and our solar system, or even our galaxy,
have no privileged position in the universe (this is occasionally referred to as the
Copernican Principle). This means that at large scales the universe should look
the same from any point in the universe. Mathematically this means that there
should be translational symmetries from any point of space to any other, in other
words, space should be homogeneous.
3. Also, we assume that, at large scales, the universe looks the same in all directions.
Thus there should be rotational symmetries and hence space should be isotropic.
Together, the second and third assumptions form the Cosmological Principle, which is
the starting point for our discussion of cosmology and on which much of the work in
cosmology is based. It is plausible (and true) that the assumption of isotropy (around
us) can be tested experimentally / observationally, while testing the assumption of
homogeneity is evidently going to be more tricky.50
Making the above assumptions, it follows from our discussion in section 17, that the
n = d-dimensional space (of course d = 3 for us) has d translational and d(d − 1)/2
rotational Killing vectors, i.e. that the spatial metric is maximally symmetric. For d = 3,
50
The extent to which these assumptions, in particular homogeneity, can be observationally tested is
discussed e.g. in C. Clarkson, R. Maartens, Inhomogeneity and the foundations of concordance cosmol-
ogy, arXiv:1005.2165 [astro-ph.CO], G. Ellis, Inhomogeneity effects in Cosmology, arXiv:1103.2335
[astro-ph.CO], R. Maartens, Is the Universe homogeneous?, arXiv:1104.1300 [astro-ph.CO].
we will thus have six Killing vectors, two more than for the Schwarzschild metric, and
the ansatz for the metric will simplify accordingly.
Note that since we know from observation that the universe expands, we do not require
a maximally symmetric space-time as this would imply that there is also a timelike
Killing vector and the resulting model for the universe would be static.
What simplifies life considerably is the fact that, as we have seen, there are only three
species of maximally symmetric spaces (for any d), namely flat space Rd , the sphere S d ,
and its negatively curved counterpart, the d-dimensional pseudosphere or hyperboloid
we will call H d .
Thus, for a space-time metric with maximally symmetric spacelike ‘slices’, the only
unknown is the time-dependence of the overall size of the metric. More concretely, the
metric can be chosen to be
$$ds^2 = -dt^2 + a^2(t)\left(\frac{dr^2}{1 - kr^2} + r^2\,d\Omega^2\right) \,, \qquad (18.1)$$
where k = 0, ±1 corresponds to the three possibilities mentioned above. Thus the
metric contains only one unknown function, the ‘radius’ or cosmic scale factor a(t).
This function will be determined by the Einstein equations via the matter content
of the universe (we will of course be dealing with a non-vanishing energy-momentum
tensor) and the equation of state for the matter.
One paradox, popularised by Olbers (1826) but noticed before by others, is the following.
He asked the seemingly innocuous question “Why is the sky dark at night?”. According
to his calculation, reproduced below, the sky should instead be infinitely bright.
The simplest assumption one could make in cosmology (prior to the discovery of the
Hubble expansion) is that the universe is static, infinite and homogeneously filled with
stars. In fact, this is probably the naive picture one has in mind when looking at
the stars at night, and certainly for a long time astronomers had no reason to believe
otherwise.
However, these simple assumptions immediately lead to a paradox, namely the conclu-
sion that the night-sky should be infinitely bright (or at least very bright) whereas, as
we know, the sky is actually quite dark at night. This is a nice example of how very
simple observations can actually tell us something deep about nature (in this case, the
nature of the universe). The argument runs as follows.
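1. The apparent luminosity A (energy flux) received from a star of absolute luminosity L
at distance r (neglecting absorption) will be
$$A(r) = \frac{L}{4\pi r^2} \,. \qquad (18.2)$$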
2. If the number density ν of stars is constant, then the number of stars at distances
between r and r + dr is
$$dN(r) = 4\pi\nu r^2\,dr \,. \qquad (18.3)$$
Hence the total energy density due to the radiation of all the stars is
$$E = \int_0^\infty A(r)\,dN(r) = L\nu\int_0^\infty dr = \infty \,. \qquad (18.4)$$
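The divergence in (18.4) is linear in the cut-off radius, because the $1/r^2$ fall-off of A(r) exactly cancels the $r^2$ growth of dN(r). The Python sketch below (a hypothetical helper, with L and ν set to 1 in arbitrary units) sums the contributions of thin shells out to a finite radius $r_{max}$ and shows that the total is $L\nu r_{max}$, which grows without bound as $r_{max} \to \infty$.

```python
import math


def received_flux(L, nu, r_max, n_shells=1000):
    """Sum A(r) dN(r) over shells of thickness dr out to r_max, without absorption."""
    dr = r_max / n_shells
    total = 0.0
    for i in range(n_shells):
        r = (i + 0.5) * dr                      # midpoint of the shell
        A = L / (4 * math.pi * r ** 2)          # apparent luminosity, (18.2)
        dN = 4 * math.pi * nu * r ** 2 * dr     # stars in the shell, (18.3)
        total += A * dN
    return total


for r_max in (10.0, 100.0, 1000.0):
    print(r_max, received_flux(1.0, 1.0, r_max))   # approximately L * nu * r_max
```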
Now what is one to make of this? Clearly some of the assumptions in the above are
much too naive. The way out suggested by Olbers is to take into account absorption
effects and to postulate some absorbing interstellar medium. But this is also too naive
because in an eternal universe we should now be in a stage of thermal equilibrium.
Hence the postulated interstellar medium should emit as much energy as it absorbs, so
this will not reduce the radiant energy density either.
Of course, the stars themselves are not transparent, so they could block out light com-
pletely from distant sources. But if this is to rescue the situation, one would need to
postulate so many stars that every line of sight ends on a star, but then the night sky
would be bright (though not infinitely bright) and not dark.
Modern cosmological models can resolve this problem in a variety of ways. For instance,
the universe could be static but finite (there are such solutions, but this is nevertheless
an unlikely scenario) or the universe is not eternal since there was a ‘Big Bang’ (and
this is a more likely scenario).
We have already discussed one of the fundamental inputs of simple cosmological models,
namely the cosmological principle. This led us to consider space-times with maximally-
symmetric spacelike slices. One of the few other things that is definitely known about
the universe, and that tells us something about the time-dependence of the universe, is
that it expands or, at least that it appears to be expanding.
In fact, in the 1920’s and 1930’s, the astronomer Edwin Hubble made a remarkable
discovery regarding the motion of galaxies. He found that light from distant galaxies is
systematically red-shifted (increased in wave-length λ), the increase being proportional
to the distance d of the galaxy,
$$z := \frac{\Delta\lambda}{\lambda} \propto d \,. \qquad (18.5)$$
Hubble interpreted this red-shift as due to a Doppler effect and therefore ascribed a
recessional velocity v = cz to the galaxy. While, as we will see, this pure Doppler shift
explanation is not tenable or at least not always the most useful way of phrasing things,
the terminology has stuck, and Hubble’s law can be written in the form
v = Hd , (18.6)
where H is Hubble’s constant. To set the historical record straight: credit for this
fundamental discovery should perhaps (also) go to G. Lemaître.51
We will see later that in most cosmological models H is actually a function of time, so
the H in the above equation should then be interpreted as the value H0 of H today.
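Its value is usually quoted as
$$H_0 = 100\,h\ \text{km/s/Mpc} \,, \qquad h = 0.71 \pm 0.06 \,. \qquad (18.7)$$
We will usually prefer to express it just in terms of inverse units of time. The above
result leads to an order of magnitude range of
$$H_0^{-1} \sim 10^{10}\ \text{years} \qquad (18.8)$$
(whereas Hubble’s original estimate was more in the $10^9$ year range).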
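The conversion from (18.7) to a time scale is simple arithmetic. The Python sketch below uses the standard values 1 Mpc ≈ 3.0857 × 10¹⁹ km and one (Julian) year ≈ 3.15576 × 10⁷ s. For h = 0.71 it gives $H_0^{-1} \approx 1.4 \times 10^{10}$ years, consistent with the order of magnitude quoted above.

```python
KM_PER_MPC = 3.0857e19          # 1 Mpc in km (1 pc is about 3.0857e13 km)
SECONDS_PER_YEAR = 3.15576e7    # Julian year


def hubble_time_years(h):
    """1/H0 in years for H0 = 100 h km/s/Mpc, cf. (18.7)."""
    H0_per_second = 100.0 * h / KM_PER_MPC     # km/s/Mpc -> 1/s
    return 1.0 / H0_per_second / SECONDS_PER_YEAR


for h in (0.65, 0.71, 0.77):
    print(f"h = {h:.2f}: 1/H0 = {hubble_time_years(h):.3e} years")
```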
51 See e.g. M. Way, H. Nussbaumer, The linear redshift-distance relationship: Lemaître beats Hubble by
two years, arXiv:1104.3031v1 [physics.hist-ph]; J.-P. Luminet, Editorial note to “The beginning of
the world from the point of view of quantum theory”, arXiv:1105.6271v1 [physics.hist-ph].
Having determined that the metric of a maximally symmetric space is of the simple
form (17.22), we can now deduce that a space-time metric satisfying the Cosmological
Principle can be chosen to be of the form (18.1),
$$ds^2 = -dt^2 + a^2(t)\left(\frac{dr^2}{1 - kr^2} + r^2\,d\Omega^2\right) \,. \qquad (18.9)$$
Here we have used the fact that (as in the ansatz for a spherically symmetric metric)
non-trivial $g_{tt}$ and $g_{tr}$ can be removed by a coordinate transformation. This metric is known
as the Friedmann-Robertson-Walker metric or just the Robertson-Walker metric, and
spatial coordinates in which the metric takes this form are called comoving coordinates,
for reasons that will become apparent below.
Here is a list of the most important elementary geometrical features of these models
that we will need and use (we will discuss their curvature tensor in section 19):
1. Spatial Geometry
The metric of the three-space at constant t is
$$ds_3^2 = g_{ij}\,dx^i dx^j = a^2(t)\,\tilde{g}_{ij}\,dx^i dx^j \,, \qquad (18.10)$$
where $\tilde{g}_{ij}$ is the maximally symmetric spatial metric. Thus for k = +1, a(t)
directly gives the size (radius) of the universe. For k = −1, space is infinite, so
no such interpretation is possible, but nevertheless a(t) still sets the scale for the
geometry of the universe, e.g. in the sense that the curvature scalar R(3) of the
metric $g_{ij}$ is related to the curvature scalar $\tilde{R}^{(3)}$ of $\tilde{g}_{ij}$ by
$$R^{(3)}(t) = \frac{1}{a^2(t)}\,\tilde{R}^{(3)} \,. \qquad (18.11)$$
Finally, for k = 0, three-space is flat and also infinite, but one could replace R3 by
a three-torus T 3 (still maximally symmetric and flat but now compact) and then
a(t) would once again be related directly to the size of the universe at constant t.
Anyway, in all cases, a(t) plays the role of a (and is known as the) cosmic scale
factor.
Note that the case k = +1 opened up for the very first time the possibility of
considering, even conceiving, an unbounded but finite universe! These and other
generalisations made possible by a general relativistic approach to cosmology are
important as more naive (Newtonian) models of the universe immediately lead
to paradoxes or contradictions (as we have seen e.g. in the discussion of Olbers’
paradox in section 18.3).
Figure 22: Illustration of a comoving coordinate system: Even though the sphere (uni-
verse) expands, the X’s (galaxies) remain at the same spatial coordinates. These tra-
jectories are geodesics and hence the X’s (galaxies) can be considered to be in free fall.
The figure also shows that it is the number density per unit coordinate volume that is
conserved, not the density per unit proper volume.
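Since $g_{tt} = -1$ is constant and $g_{ti} = 0$, all the Christoffel symbols $\Gamma^\mu_{tt}$
vanish. Therefore the vector field ∂t is geodesic, which can be expressed as the statement
that
$$\nabla_{\partial_t}\partial_t := \Gamma^\mu_{tt}\,\partial_\mu = 0 \,. \qquad (18.13)$$
Equivalently, the curves $x^i = \text{const}$, parametrised by t, are geodesics. This also
follows from the considerations around (2.48).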
Hence, in this coordinate system, observers remaining at fixed values of the spatial
coordinates are in free fall. In other words, the coordinate system is falling with
them or comoving, and the proper time along such geodesics coincides with the
coordinate time, dτ = dt. It is these observers of constant ~x or constant (r, θ, φ)
who all see the same isotropic universe at a given value of t.
This may sound a bit strange but a good way to visualise such a coordinate system
is, as in Figure 22, as a mesh of coordinate lines drawn on a balloon that is being
inflated or deflated (according to the behaviour of a(t)). Draw some dots on that
balloon (that will eventually represent galaxies or clusters of galaxies). As the
balloon is being inflated or deflated, the dots will move but the coordinate lines
will move with them and the dots remain at fixed spatial coordinate values. Thus,
as we now know, regardless of the behaviour of a(t), these dots follow a geodesic,
and we will thus think of galaxies in this description as being in free fall.
Recall, by the way, that we had already encountered such comoving coordinates
in equation (15.149) of section 15.9, when we had introduced proper time of the
freely falling particles on the surface of the star to describe the metric induced on
the surface of the star.
Another advantage of the comoving coordinate system is that the six-parameter
family of isometries just acts on the spatial part of the metric. Indeed, let K i ∂i
be a Killing vector of the maximally symmetric spatial metric. Then K i ∂i is also
a Killing vector of the Robertson-Walker metric. This would not be the case if
one had e.g. made an x-dependent coordinate transformation of t or a t-dependent
coordinate transformation of the xi . In those cases there would of course still be
six Killing vectors, but they would have a more complicated form.
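The proper distance $R_1(t)$, at time t, between us at r = 0 and a comoving object at
r = r1 is obtained by integrating
$$dR = a(t)\,\frac{dr}{(1 - kr^2)^{1/2}} \quad\Rightarrow\quad R_1(t) = a(t)\,f_k(r_1) \,, \qquad (18.15)$$
where
$$f_k(r) = \begin{cases} r & k = 0 \\ \arcsin r & k = +1 \\ \sinh^{-1} r & k = -1 \end{cases} \qquad (18.16)$$
(the inverses of the functions $g_k(\psi)$ defined in (17.37), but the precise form of
$f_k(r)$ will be irrelevant for this argument). It follows that
$$V_1(t) \equiv \frac{d}{dt}R_1(t) = \dot{a}(t)\,f_k(r_1) = H(t)\,R_1(t) \,, \qquad (18.17)$$
where we have introduced the Hubble parameter
$$H(t) = \frac{\dot{a}(t)}{a(t)} \,. \qquad (18.18)$$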
The relation (18.17) clearly expresses something like Hubble’s law v = Hd (18.6):
all objects run away from each other with velocities proportional to their distance.
We will have much more to say about H(t), and about the relation between
distance and red-shift z, below.
metrics. We will just do this for the spatially flat k = 0 metric, but a similar
construction can be carried out in general.
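The k = 0 FRW metric is
$$ds^2 = -dt^2 + a^2(t)\left(dr^2 + r^2\,d\Omega^2\right) \,. \qquad (18.19)$$
To put this into PG-like form, we keep t (which is, after all, already the proper
time of comoving observers) but introduce, instead of the comoving coordinate r,
the area radius
$$\tilde{r}(t, r) = a(t)\,r \,. \qquad (18.20)$$
Since $a\,dr = d\tilde{r} - H(t)\,\tilde{r}\,dt$, the metric becomes
$$ds^2 = -dt^2 + \left(d\tilde{r} - H(t)\,\tilde{r}\,dt\right)^2 + \tilde{r}^2\,d\Omega^2 \,, \qquad (18.21)$$
where, as above, H(t) = ȧ(t)/a(t) is the Hubble parameter. This is clearly analogous
to the PG metric (15.11)
$$ds^2 = -(1 - 2m/r)\,dT^2 + 2\sqrt{2m/r}\,dT\,dr + (dr^2 + r^2 d\Omega^2) = -dT^2 + \left(dr + \sqrt{2m/r}\,dT\right)^2 + r^2 d\Omega^2 \,. \qquad (18.22)$$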
Just as this metric is adapted to observers with $dr = -\sqrt{2m/r}\,dT$ (for which T = τ
is proper time, $\dot{r} = -\sqrt{2m/r}$ describing geodesic radial free fall with E = 1), the
metric (18.21) is adapted to observers with
$$\frac{d}{dt}\tilde{r} = H(t)\,\tilde{r} \qquad (18.23)$$
which are precisely the comoving observers r̃(t) = a(t)r with r fixed obeying the
Hubble law (18.17).
While these PG-like coordinates are not commonly used in the cosmological con-
text, we will make use of them later on, in section 23.2, in the description of the
interior geometry of a collapsing star.
5. Conformal Time η
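Writing the Robertson-Walker metric as
$$ds^2 = -dt^2 + a^2(t)\,\tilde{g}_{ij}\,dx^i dx^j \,, \qquad (18.24)$$
where again a tilde refers to the maximally symmetric spatial metric, we see that
it is natural to introduce a new time-coordinate η through
$$d\eta = dt/a(t) \,, \qquad (18.25)$$
in terms of which the Robertson-Walker metric takes the simple form
$$ds^2 = a^2(\eta)\left(-d\eta^2 + \tilde{g}_{ij}\,dx^i dx^j\right) \qquad (18.26)$$
(a(η) is short (and sloppy) for a(t(η))). In terms of polar coordinates (17.36), this
becomes
$$ds^2 = a^2(\eta)\left(-d\eta^2 + d\psi^2 + g_k(\psi)^2\,d\Omega^2\right) \,. \qquad (18.27)$$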
Thus, in particular, “radial” null lines are determined by dη = ±dψ, as in flat
space, and η is also known as conformal time. This coordinate is very convenient
for discussing the causal structure of the Friedmann-Robertson-Walker universes.
Since the η-dependence resides exclusively in the overall conformal factor of the
metric, the vector ∂η is a conformal Killing vector of the Robertson-Walker metric
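in the sense of (7.5),
$$C = \partial_\eta : \quad \nabla_\alpha C_\beta + \nabla_\beta C_\alpha = 2\,\frac{a'(\eta)}{a(\eta)}\,g_{\alpha\beta} = 2\dot{a}(t)\,g_{\alpha\beta} \,. \qquad (18.28)$$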
We will use conformal time in section 21.3 to solve the cosmological Einstein
equations in a particular case, and we will use the above conformal Killing vector
and the associated conserved charge for null geodesics (7.9) in the discussion of
the cosmological red-shift in section 18.7.
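Several of the kinematical statements in the list above are easy to check numerically for a concrete scale factor. The Python sketch below uses the hypothetical choice $a(t) = t^{2/3}$ (an assumption made only for illustration; nothing in the argument above depends on it) and verifies, with finite differences and Simpson's rule, (i) the Hubble law (18.17) for a comoving object with k = +1, (ii) that the comoving form of the k = 0 metric agrees with the PG-like form $-dt^2 + (d\tilde{r} - H\tilde{r}\,dt)^2 + \tilde{r}^2 d\Omega^2$ for radial displacements, (iii) that conformal time (18.25) is $\eta = 3t^{1/3}$ (up to a constant) for this a(t), and (iv) that $a'(\eta)/a(\eta) = \dot{a}(t)$, as used in (18.28).

```python
import math


def a(t):
    """Hypothetical scale factor used for illustration only."""
    return t ** (2.0 / 3.0)


def a_dot(t, h=1e-6):
    return (a(t + h) - a(t - h)) / (2 * h)


def hubble(t):
    return a_dot(t) / a(t)                                   # (18.18)


t = 2.0

# (i) Hubble law: R1(t) = a(t) f_k(r1) with f_{+1}(r) = arcsin r, and dR1/dt = H R1
r1 = 0.3
def R1(s):
    return a(s) * math.asin(r1)

V1 = (R1(t + 1e-6) - R1(t - 1e-6)) / 2e-6
assert math.isclose(V1, hubble(t) * R1(t), rel_tol=1e-7)

# (ii) k = 0, radial displacements: with rt = a(t) r one has drt = a_dot r dt + a dr
def ds2_comoving(t, r, dt, dr):
    return -dt ** 2 + a(t) ** 2 * dr ** 2

def ds2_pg(t, r, dt, dr):
    rt = a(t) * r
    drt = a_dot(t) * r * dt + a(t) * dr
    return -dt ** 2 + (drt - hubble(t) * rt * dt) ** 2

for dt, dr in ((0.01, 0.0), (0.0, 0.02), (0.01, -0.03)):
    assert math.isclose(ds2_comoving(t, 0.5, dt, dr), ds2_pg(t, 0.5, dt, dr), rel_tol=1e-9)

# (iii) conformal time eta = int dt / a by Simpson's rule, starting from eta(1) = 3
def eta(t_end, t_start=1.0, n=2000):
    step = (t_end - t_start) / n
    s = 1 / a(t_start) + 1 / a(t_end)
    for i in range(1, n):
        s += (4 if i % 2 else 2) / a(t_start + i * step)
    return 3.0 * t_start ** (1 / 3) + s * step / 3

assert math.isclose(eta(t), 3 * t ** (1 / 3), rel_tol=1e-10)

# (iv) with a(eta) = (eta / 3)^2, the factor a'(eta)/a(eta) equals a_dot(t)
def a_of_eta(e):
    return (e / 3.0) ** 2

e0 = 3 * t ** (1 / 3)
ratio = (a_of_eta(e0 + 1e-6) - a_of_eta(e0 - 1e-6)) / 2e-6 / a_of_eta(e0)
assert math.isclose(ratio, a_dot(t), rel_tol=1e-7)
print("all checks passed")
```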
The aim of this and the subsequent sections is to learn as much as possible about the
general properties of Robertson-Walker geometries (without using the Einstein equa-
tions) with the aim of looking for observational means of distinguishing e.g. among the
models with k = 0, ±1.
To get a feeling for the geometry of the Schwarzschild metric, we studied the properties
of areas and lengths in the Schwarzschild geometry. Length measurements are rather
obvious in the Robertson-Walker geometry, so here we focus on the properties of areas.
We write the spatial part of the Robertson-Walker metric in polar coordinates as
$$ds_3^2 = a^2\left(d\psi^2 + g_k(\psi)^2\,d\Omega^2\right) \,, \qquad (18.29)$$
where $g_k(\psi) = \psi, \sin\psi, \sinh\psi$ for k = 0, +1, −1 (see (17.37)). Now the radius of a
surface ψ = ψ0 around the point ψ = 0 (or any other point, our space is isotropic and
homogeneous) is given by
$$\rho = a\int_0^{\psi_0} d\psi = a\psi_0 \,. \qquad (18.30)$$
On the other hand, the area of this surface is determined by the induced metric
$a^2 g_k^2(\psi_0)\,d\Omega^2$ and is
$$A(\rho) = a^2 g_k^2(\psi_0)\int_0^{2\pi} d\phi\int_0^{\pi} d\theta\,\sin\theta = 4\pi a^2 g_k^2(\rho/a) \,. \qquad (18.31)$$
[Figure: circles of radius ψ = ψ1, ψ2, ψ3 around a point of the two-sphere]
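For k = 0 we recover the Euclidean result $A(\rho) = 4\pi\rho^2$, but for k = ±1 the geometry
looks quite different. For k = +1, we have
$$A(\rho) = 4\pi a^2\sin^2(\rho/a) \,.$$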
Thus the area reaches a maximum for ρ = πa/2 (or ψ = π/2), then decreases again for
larger values of ρ and goes to zero as ρ → πa. Already the maximal area, $A_{max} = 4\pi a^2$,
is much smaller than the area of a sphere of the same radius in Euclidean space, which
would be $4\pi\rho^2 = \pi^3 a^2$.
This behaviour is best visualised by replacing the three-sphere by the two-sphere and
looking at the circumference of circles as a function of their distance from the origin
(see Figure 21).
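For k = −1, on the other hand, $A(\rho) = 4\pi a^2\sinh^2(\rho/a)$, which grows exponentially for
large ρ, so in this case the area grows much more rapidly with the radius than in flat space.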
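To see the three behaviours side by side, the Python sketch below evaluates (18.31) and compares it with the Euclidean area $4\pi\rho^2$. The choice a = 1 and the sample radii are arbitrary, and the helper name is illustrative.

```python
import math


def area(rho, a, k):
    """A(rho) = 4 pi a^2 g_k(rho/a)^2, eq. (18.31)."""
    psi = rho / a
    if k == 0:
        g = psi
    elif k == 1:
        g = math.sin(psi)
    elif k == -1:
        g = math.sinh(psi)
    else:
        raise ValueError("k must be 0, +1 or -1")
    return 4 * math.pi * a * a * g * g


a = 1.0
for rho in (0.5, 1.0, math.pi / 2, 2.5, 3.0):
    ratios = [area(rho, a, k) / (4 * math.pi * rho ** 2) for k in (-1, 0, 1)]
    print(f"rho = {rho:.3f}: A / (4 pi rho^2) for k = -1, 0, +1:",
          ", ".join(f"{x:.3f}" for x in ratios))

# k = +1: maximal area 4 pi a^2 at rho = pi a / 2, versus pi^3 a^2 in Euclidean space
assert math.isclose(area(math.pi * a / 2, a, 1), 4 * math.pi * a ** 2)
assert math.isclose(4 * math.pi * (math.pi * a / 2) ** 2, math.pi ** 3 * a ** 2)
```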
In principle, this distinct behaviour of areas in the models with k = 0, ±1 might allow
for an empirical determination of k. For instance, one might make the assumption that
there is a homogeneous distribution of the number and brightness of galaxies, and one
could try to determine observationally the number of galaxies as a function of their
apparent luminosity. As in the discussion of Olbers’ paradox, the radiation flux would
be F ∝ 1/ρ². In Euclidean space (k = 0), one would expect the number
N(F) of galaxies with flux greater than F, i.e. distances less than ρ, to behave like ρ³,
so that the expected Euclidean behaviour would be
$$N(F) \propto F^{-3/2} \,. \qquad (18.35)$$
Any empirical departure from this behaviour could thus be an indication of a universe
with k ≠ 0, but clearly, to decide this, many other factors (red-shift, evolution of stars,
etc.) have to be taken into account.
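The exponent −3/2 in (18.35) follows from $\rho \propto F^{-1/2}$ and $N \propto \rho^3$. The Python sketch below (illustrative helper names; identical sources of luminosity L at uniform density ν in flat space, with L = ν = 1) computes N(F) and extracts its logarithmic slope.

```python
import math


def radius_for_flux(F, L=1.0):
    """Euclidean distance at which a source of luminosity L has flux F = L / (4 pi rho^2)."""
    return math.sqrt(L / (4 * math.pi * F))


def number_brighter_than(F, nu=1.0, L=1.0):
    """N(F): sources inside the ball of radius rho(F), uniform density nu (k = 0)."""
    return 4.0 / 3.0 * math.pi * nu * radius_for_flux(F, L) ** 3


F1, F2 = 1e-3, 1e-4
slope = math.log(number_brighter_than(F2) / number_brighter_than(F1)) / math.log(F2 / F1)
print(slope)
```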
The most important information about the cosmic scale factor a(t) comes from the
observation of shifts in the frequency of light emitted by distant sources. To calculate
the expected shift in a Robertson-Walker geometry, let us again place ourselves at the
origin r = 0. We consider a radially travelling electromagnetic wave (a light ray) and
consider the equation dτ² = 0 or
$$dt^2 = a^2(t)\,\frac{dr^2}{1 - kr^2} \,. \qquad (18.36)$$
As in our discussion of the gravitational red-shift in section 2.8, I will analyse this
situation in two ways, in a geometric optics approach, where we trace the lightrays
in the above geometry, and in a slightly more covariant language using the geodesic
equation and the symmetries and associated conserved charges.
1. Let us assume that the wave leaves a galaxy located at r = r1 at the time t1 .
Then it will reach us at a time t0 given by
$$f_k(r_1) = \int_0^{r_1}\frac{dr}{\sqrt{1 - kr^2}} = \int_{t_1}^{t_0}\frac{dt}{a(t)} \,. \qquad (18.37)$$
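Now consider a second wave crest, emitted at t1 + δt1 and received at t0 + δt0. Since the
source stays at r = r1, the left-hand side of (18.37) is the same for both crests. Say that
b(t) is the integral of 1/a(t). Then we have $b(t_0 + \delta t_0) - b(t_1 + \delta t_1) = b(t_0) - b(t_1)$,
so that to first order $\delta t_0/a(t_0) = \delta t_1/a(t_1)$. Since the wavelength is proportional
to the period, this gives $\lambda_0/\lambda_1 = \delta t_0/\delta t_1 = a(t_0)/a(t_1)$.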
2. As in derivation 2 of section 2.8, we describe a light ray by the null wave vector
kµ = (ω, ~k). The frequency measured by an observer with velocity uµ is then
ω = −uµ kµ (2.114).
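In the present case of radial null rays we write the metric as
$ds^2 = -dt^2 + a^2(t)\left(d\psi^2 + g_k(\psi)^2 d\Omega^2\right)$
and restrict to motion in the (t, ψ)-direction, so that ṫ and ψ̇ are related by
$\dot{t}^2 = a^2(t)\,\dot\psi^2$.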
In section 2.8 we used the timelike Killing vector of the static spherically symmetric
metric, and its associated conserved energy, to relate the measured frequencies for
stationary observers at different radial positions. Here we do not have a timelike
Killing vector, but we have plenty of spacelike Killing vectors V α ∂α (in particular,
with V t = 0). Among them there will be ψ-translational Killing vectors which
have the form
V = f (θ, φ)∂ψ + . . . (18.46)
(the 3-dimensional counterparts of the Killing vectors V(1) , V(2) (6.41) of the 2-
sphere, say). Associated to any such Killing vector and the null geodesic there is
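the conserved momentum
$$P = V_\alpha\dot{x}^\alpha = f(\theta, \phi)\,a^2(t)\,\dot\psi \,.$$
Since θ and φ are constant along the lightray, we can absorb f(θ, φ) into the
definition of P, and thus we have
$$a^2(t)\,\dot\psi = P = \text{const} \,.$$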
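The observers we are interested in are the comoving observers at fixed values of the
spatial coordinates, i.e. with $u^\alpha = (1, 0, 0, 0)$. Thus these measure the frequency
$\omega = -u^\alpha\dot{x}_\alpha = \dot{t} = a(t)|\dot\psi| = |P|/a(t)$, so that $\omega_0/\omega_1 = a(t_1)/a(t_0)$, i.e.
$\lambda_0/\lambda_1 = a(t_0)/a(t_1)$. Alternatively, one can use the conformal Killing vector C = ∂η (18.28)
of the Robertson-Walker metric, and the associated conserved quantity $C_\alpha\dot{x}^\alpha$ (7.9)
for null geodesics, to deduce the same result.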
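The crest-counting argument of the first derivation can also be carried out numerically. The Python sketch below assumes, purely for illustration, the scale factor $a(t) = t^{2/3}$ and a comoving source with $f_k(r_1) = 3$ (so that light emitted at t1 = 1 arrives at t0 = 8). It finds the arrival times of two crests emitted δt1 apart by Simpson integration of (18.37) and bisection, and compares δt0/δt1 with a(t0)/a(t1) = 1 + z. Helper names are hypothetical.

```python
def a(t):
    """Hypothetical scale factor, for illustration only."""
    return t ** (2.0 / 3.0)


def conformal_interval(t_start, t_end, n=1000):
    """Simpson's rule for int_{t_start}^{t_end} dt / a(t); n must be even."""
    h = (t_end - t_start) / n
    s = 1 / a(t_start) + 1 / a(t_end)
    for i in range(1, n):
        s += (4 if i % 2 else 2) / a(t_start + i * h)
    return s * h / 3


def arrival_time(t_emit, chi, t_hi=100.0, tol=1e-12):
    """Bisection for t_arr with int_{t_emit}^{t_arr} dt / a = chi, cf. (18.37)."""
    lo, hi = t_emit, t_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if conformal_interval(t_emit, mid) < chi:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


t1, dt1, chi = 1.0, 1e-4, 3.0       # chi plays the role of f_k(r1)
t0 = arrival_time(t1, chi)
dt0 = arrival_time(t1 + dt1, chi) - t0
print(f"t0 = {t0:.6f}, dt0/dt1 = {dt0 / dt1:.5f}, a(t0)/a(t1) = {a(t0) / a(t1):.5f}")
```

The small residual difference between the two ratios is of order δt1 and vanishes as δt1 → 0, in line with the first-order argument above.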
Remarks:
1. Astronomers like to express this result in terms of the red-shift parameter (see the
discussion of Hubble’s law above)
$$z = \frac{\lambda_0 - \lambda_1}{\lambda_1} \,, \qquad (18.53)$$
which in view of the above result we can write as
$$z = \frac{a(t_0)}{a(t_1)} - 1 \,. \qquad (18.54)$$
Thus if the universe expands one has z > 0 and there is a red-shift, while in
a contracting universe with a(t0) < a(t1) the light of distant galaxies would be
blue-shifted.
2. This cosmological red-shift has nothing to do with the star’s own gravitational
field - that contribution to the red-shift is completely negligible compared to the
effect of the cosmological red-shift.
4. This red-shift is a combined effect of gravitational and Doppler red-shifts and,
without additional choices and input, it is not very meaningful to separate this into
the two and/or to interpret this only in terms of, say, a Doppler shift. Nevertheless,
as mentioned before, astronomers like to do just that, calling v = zc the recessional
velocity.52
Reliable data for cosmological red-shifts as well as for distance measurements are only
available for small values of z, and thus we will consider the case where t0 − t1 and r1
are small, i.e. small at cosmic scales. First of all, this allows us to expand a(t) in a
Taylor series,
$$a(t) = a(t_0) + (t - t_0)\,\dot{a}(t_0) + \tfrac12(t - t_0)^2\,\ddot{a}(t_0) + \dots \qquad (18.55)$$
Let us introduce the Hubble parameter H(t) (which already made a brief appearance in
52
For a lucid discussion of this issue, actually arguing in favour of the pure Doppler interpretation,
see E. Bunn, D. Hogg, The kinematic origin of the cosmological redshift, arXiv:0808.1081.
53
For an introduction to the rich physics of the CMBR see e.g. the review articles D. Samtleben, S.
Staggs, B. Winstein, The Cosmic Microwave Background for Pedestrians: A Review for Particle and
Nuclear Physicists, arXiv:0803.0834 [astro-ph], or A. Jones, A. Lasenby, The Cosmic Microwave
Background, Living Rev. Relativity 1, (1998), 11; http://www.livingreviews.org/lrr-1998-11.
section 18.5) and the deceleration parameter q(t) by
$$H(t) = \frac{\dot{a}(t)}{a(t)} \,, \qquad q(t) = -\frac{a(t)\,\ddot{a}(t)}{\dot{a}(t)^2} \,, \qquad (18.56)$$
and denote their present day values by a subscript zero, i.e. H0 = H(t0) and q0 = q(t0).
H(t) measures the expansion velocity as a function of time while q(t) measures whether
the expansion velocity is increasing or decreasing. We will also denote a0 = a(t0) and
a(t1) = a1. In terms of these parameters, the Taylor expansion can be written as
$$a(t) = a_0\left[1 + (t - t_0)H_0 - \tfrac12 q_0 H_0^2 (t - t_0)^2 + \dots\right] \,. \qquad (18.57)$$
Higher order terms in this expansion are known as jerk (3rd derivative) and snap (4th
derivative).54
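The definitions (18.56) and the expansion (18.57) can be illustrated with finite differences. The Python sketch below uses two hypothetical test cases, a decelerating power law $a(t) = t^{2/3}$ (for which H = 2/(3t) and q = 1/2) and an accelerating exponential $a(t) = e^t$ (for which H = 1 and q = −1), and then compares $t^{2/3}$ near t0 = 1 with the second-order expansion (18.57).

```python
import math


def hubble_and_deceleration(a, t, h=1e-4):
    """H = a_dot/a and q = -a a_ddot / a_dot^2, eq. (18.56), by central differences."""
    a0, ap, am = a(t), a(t + h), a(t - h)
    a_dot = (ap - am) / (2 * h)
    a_ddot = (ap - 2 * a0 + am) / (h * h)
    return a_dot / a0, -a0 * a_ddot / a_dot ** 2


def power_law(t):
    """Hypothetical decelerating example, a(t) = t^(2/3)."""
    return t ** (2.0 / 3.0)


H0, q0 = hubble_and_deceleration(power_law, 1.0)
print(H0, q0)                 # close to 2/3 and 1/2: decelerating
H_exp, q_exp = hubble_and_deceleration(math.exp, 1.0)
print(H_exp, q_exp)           # close to 1 and -1: accelerating

# Second-order Taylor expansion (18.57) around t0 = 1 with a0 = 1
t = 1.01
taylor = 1 + (t - 1) * H0 - 0.5 * q0 * H0 ** 2 * (t - 1) ** 2
print(power_law(t), taylor)   # agree up to terms of order (t - t0)^3
```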
We can use the expansion (18.57) to express z as a function of (t0 − t1 ), and we can
in principle use (18.37) to express r1 as a function of (t0 − t1 ). Combining the two re-
sults and eliminating (t0 − t1 ), one therefore obtains the sought-for relation between the
red-shift z and the (coordinate-) distance r1. The result is given in (18.64). The derivation
is primarily an exercise in inverting series expansions and not per se particularly
enlightening.
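From (18.57) one finds that the red-shift parameter z, as a power series in time, is
$$\frac{1}{1+z} = \frac{a_1}{a_0} = 1 + (t_1 - t_0)H_0 - \tfrac12 q_0 H_0^2 (t_1 - t_0)^2 + \dots \qquad (18.58)$$
or
$$z = (t_0 - t_1)H_0 + \left(1 + \tfrac12 q_0\right)H_0^2 (t_0 - t_1)^2 + \dots \,. \qquad (18.59)$$
Inverting this series gives
$$t_0 - t_1 = H_0^{-1}\left[z - \left(1 + \tfrac12 q_0\right)z^2 + \dots\right] \,, \qquad (18.60)$$
while expanding a(t) in the denominator we get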
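$$\begin{aligned}
\int_{t_1}^{t_0}\frac{dt}{a(t)} &= \frac{1}{a_0}\int_{t_1}^{t_0}\frac{dt}{1 + (t - t_0)H_0 + \dots} \\
&= \frac{1}{a_0}\int_{t_1}^{t_0}dt\,\left[1 + (t_0 - t)H_0 + \dots\right] \\
&= \frac{1}{a_0}\left[(t_0 - t_1) + t_0(t_0 - t_1)H_0 - \tfrac12(t_0^2 - t_1^2)H_0 + \dots\right] \\
&= \frac{1}{a_0}\left[(t_0 - t_1) + \tfrac12(t_0 - t_1)^2 H_0 + \dots\right] \,.
\end{aligned} \qquad (18.62)$$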
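Therefore, since $f_k(r_1) = r_1 + O(r_1^3)$, (18.37) gives
$$r_1 = \frac{1}{a_0}\left[(t_0 - t_1) + \tfrac12(t_0 - t_1)^2 H_0 + \dots\right] \,. \qquad (18.63)$$
Using (18.60), we finally obtain
$$r_1 = \frac{1}{a_0 H_0}\left[z - \tfrac12(1 + q_0)z^2 + \dots\right] \,. \qquad (18.64)$$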
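The series (18.64) can be compared with an exact result in a model where the integral in (18.37) is elementary. The Python sketch below assumes, for illustration only, a flat (k = 0) universe with $a(t) = t^{2/3}$, t0 = 1 and a0 = 1, for which (18.56) gives H0 = 2/3 and q0 = 1/2. The error of the second-order series shrinks like z³.

```python
# Hypothetical flat (k = 0) model, a(t) = t^(2/3) with t0 = 1, a0 = 1,
# for which H0 = 2/3 and q0 = 1/2 follow from (18.56).
H0, q0, a0 = 2.0 / 3.0, 0.5, 1.0


def r1_exact(z):
    """Comoving distance from (18.37) with f_0(r) = r and 1 + z = a0 / a(t1)."""
    t1 = (1.0 + z) ** -1.5                   # solves 1 + z = t1^(-2/3)
    return 3.0 * (1.0 - t1 ** (1.0 / 3.0))   # int_{t1}^{1} t^(-2/3) dt


def r1_series(z):
    """Second-order expansion (18.64)."""
    return (z - 0.5 * (1.0 + q0) * z ** 2) / (a0 * H0)


for z in (0.1, 0.03, 0.01):
    err = abs(r1_exact(z) - r1_series(z))
    print(f"z = {z}: exact {r1_exact(z):.6f}, series {r1_series(z):.6f}, error {err:.1e}")
```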
Remarks:
1. This clearly indicates to first order a linear dependence of the red-shift on the
distance of the galaxy and identifies H0 , the present day value of the Hubble
parameter, as being at least proportional to the Hubble constant introduced in
(18.6).
2. Note that the linear relation (18.17) between recessional velocity and distance of
comoving objects is exact while (18.64) shows that the relation between redshift
and distance is only approximately linear for small z.
3. However, this is not yet a very useful way of expressing Hubble’s law. First of
all, the distance a0 r1 that appears in this expression is not the proper distance
(unless k = 0), but is at least equal to it in our approximation. Note that a0 r1
is the present distance to the galaxy, not the distance at the time the light was
emitted.
4. Even proper distance is not directly measurable or observable and thus, to compare
this formula with experiment, one needs to relate r1 to the measures of distance
used by astronomers.
One practical way of doing this is based on the so-called luminosity distance $d_L$. If for
some reason one knows the absolute luminosity of a distant star (for instance because it
shows a certain characteristic behaviour known from other stars nearby whose distances
can be measured by direct means - such objects are known as standard candles), then
one can compare this absolute luminosity L with the apparent luminosity A. Then one
can define the luminosity distance $d_L$ by (cf. (18.2))
$$d_L^2 = \frac{L}{4\pi A} \,. \qquad (18.65)$$
We thus need to relate dL to the coordinate distance r1 . The key relation is
$$\frac{A}{L} = \frac{1}{4\pi a_0^2 r_1^2}\,\frac{1}{1+z}\,\frac{a_1}{a_0} = \frac{1}{4\pi a_0^2 r_1^2 (1+z)^2} \,. \qquad (18.66)$$
Here the first factor arises from dividing by the area of the sphere at distance a0 r1 and
would be the only term in a flat geometry (see the discussion of Olbers’ paradox). In
a Robertson-Walker geometry, however, the photon flux will be diluted further. The second
factor is due to the fact that each individual photon is being red-shifted. And the third
factor (identical to the second) is due to the fact that as a consequence of the expansion
of the universe, photons emitted a time δt apart will be measured a time (1 + z)δt apart.
Intuitively, the fact that for z positive dL is larger than the actual (proper) distance of
the galaxy can be understood by noting that the gravitational red-shift makes an object
look darker (further away) than it actually is. This can be inserted into (18.64) to give
an expression for the red-shift in terms of dL , Hubble’s law
dL = H0−1 [z + 12 (1 − q0 )z 2 + . . .] . (18.68)
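The following sketch gives a feeling for the size of the neglected terms in (18.68). It compares the series with the exact luminosity distance of one particular model. The model is assumed purely for illustration: a spatially flat universe containing dust and a cosmological constant, with present density parameters 0.3 and 0.7 (the values quoted in section 20). For this model $q_0 = \Omega_M/2 - \Omega_\Lambda$ by (20.22). For k = 0 one has $a_0 r_1 = \int_0^z dz'/H(z')$, and the luminosity distance is $d_L = (1+z)a_0r_1$. All distances are in units of $H_0^{-1}$.

```python
import math

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

# Illustrative flat (k = 0) model with dust and Lambda; Om + OL = 1.
Om, OL = 0.3, 0.7
q0 = 0.5 * Om - OL   # deceleration parameter today for dust + Lambda

def E(z):
    """Dimensionless Hubble rate H(z)/H0 for this model."""
    return math.sqrt(Om * (1.0 + z) ** 3 + OL)

def dL_exact(z):
    """H0 * d_L, using d_L = (1 + z) a0 r1 and, for k = 0, H0 a0 r1 = int_0^z dz'/E(z')."""
    return (1.0 + z) * simpson(lambda x: 1.0 / E(x), 0.0, z)

def dL_series(z):
    """H0 * d_L from the second-order Hubble law (18.68)."""
    return z + 0.5 * (1.0 - q0) * z**2

for z in (0.01, 0.1, 0.5):
    print(f"z = {z:4.2f}  exact = {dL_exact(z):.6f}  series = {dL_series(z):.6f}")
```

For small z the two agree to third order in z. At z = 0.5 the truncated series is already noticeably off, as one would expect from an expansion in powers of z.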
19 Cosmology II: Basics of Friedmann-Robertson-Walker Cosmology
So far, we have only used the kinematical framework provided by the Robertson-Walker metrics, and we have never used the Einstein equations. The benefit of this is that it allows one to deduce relations between observed quantities and assumptions about the universe. These relations remain valid even if the Einstein equations are not entirely correct, perhaps because of higher derivative or other quantum corrections in the early universe.
Now, on the other hand, we have to be more specific: we need to specify the matter content and solve the Einstein equations for a(t). We will see that a lot about the solutions of the Einstein equations can already be deduced from a purely qualitative analysis of these equations, without having to resort to explicit solutions (section 20). Exact solutions will then be the subject of section 21.
Of course, the first thing we need in order to discuss solutions of the Einstein equations is the Ricci tensor of the Robertson-Walker (RW) metric. Since we already know the curvature tensor of the maximally symmetric spatial metric entering the RW metric (and its contractions), this is not difficult.
From now on, all objects with a tilde, ˜, will refer to three-dimensional quantities calculated with the metric $\tilde g_{ij}$.
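2. One can then calculate the Christoffel symbols in terms of a(t) and $\tilde\Gamma^i_{jk}$. The non-vanishing components are (we had already established that $\Gamma^\mu_{00} = 0$)
$$ \Gamma^i_{jk} = \tilde\Gamma^i_{jk}, \qquad \Gamma^i_{j0} = \frac{\dot a}{a}\,\delta^i_j, \qquad \Gamma^0_{ij} = a\dot a\,\tilde g_{ij} . \tag{19.2} $$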
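4. Now we can use $\tilde R_{ij} = 2k\tilde g_{ij}$ (a consequence of the maximal symmetry of $\tilde g_{ij}$) to calculate $R_{\mu\nu}$. The non-zero components are
$$ R_{00} = -3\frac{\ddot a}{a}, \qquad R_{ij} = (a\ddot a + 2\dot a^2 + 2k)\,\tilde g_{ij} = \Bigl(\frac{\ddot a}{a} + 2\frac{\dot a^2}{a^2} + 2\frac{k}{a^2}\Bigr) g_{ij} . \tag{19.4} $$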
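Contracting with the metric, the Ricci scalar is
$$ R = 6\Bigl(\frac{\ddot a}{a} + \frac{\dot a^2}{a^2} + \frac{k}{a^2}\Bigr) , \tag{19.5} $$
and the non-vanishing components of the Einstein tensor are
$$ G_{00} = 3\Bigl(\frac{\dot a^2}{a^2} + \frac{k}{a^2}\Bigr), \qquad G_{0i} = 0, \qquad G_{ij} = -\Bigl(2\frac{\ddot a}{a} + \frac{\dot a^2}{a^2} + \frac{k}{a^2}\Bigr) g_{ij} . \tag{19.6} $$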
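The step from (19.4) to (19.6) requires only two ingredients. The first is the trace $R = g^{\mu\nu}R_{\mu\nu}$, with $g^{00} = -1$ and $g^{ij}g_{ij} = 3$. The second is $G_{\mu\nu} = R_{\mu\nu} - \tfrac12 g_{\mu\nu}R$. The short sketch below does exactly this arithmetic for given values of a, ȧ, ä and k, so that the result can be compared with (19.6). The numerical values used are arbitrary.

```python
def ricci_frw(a, adot, addot, k):
    """R_00 and the coefficient X in R_ij = X g_ij, from (19.4)."""
    R00 = -3.0 * addot / a
    X = addot / a + 2.0 * adot**2 / a**2 + 2.0 * k / a**2
    return R00, X

def einstein_frw(a, adot, addot, k):
    """Return R, G_00 and the coefficient Y in G_ij = Y g_ij.
    Uses g^00 = -1 and g^ij g_ij = 3, so R = -R_00 + 3X, and G_mn = R_mn - g_mn R/2."""
    R00, X = ricci_frw(a, adot, addot, k)
    R = -R00 + 3.0 * X
    G00 = R00 + 0.5 * R      # g_00 = -1
    Y = X - 0.5 * R
    return R, G00, Y

# Arbitrary (hypothetical) values of a, adot, addot; compare the output with (19.6).
a, adot, addot = 1.3, 0.7, -0.2
for k in (-1, 0, 1):
    R, G00, Y = einstein_frw(a, adot, addot, k)
    print(k, R, G00, Y)
```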
Next we need to specify the matter content. On physical grounds one might argue that, in the approximation underlying the cosmological principle, galaxies (or clusters) should be treated as non-interacting particles or as a perfect fluid. As it turns out, we do not need to make this assumption. Either the symmetries of the metric or a comparison with the Einstein tensor determined above fix the energy-momentum tensor to be that of a perfect fluid anyway.
Below, I will give a formal argument for this using Killing vectors. But informally
we can already deduce this from the structure of the Einstein tensor obtained above.
Comparing (19.6) with the Einstein equation Gµν = 8πGN Tµν , we deduce that the
Einstein equations can only have a solution with a Robertson-Walker metric if the
energy-momentum tensor is of the form
$$ T_{00} = \rho(t), \qquad T_{0i} = 0, \qquad T_{ij} = p(t)\,g_{ij} , \tag{19.7} $$
i.e. that of a perfect fluid with energy density ρ(t) and pressure p(t).
For the formal argument, recall that the metric $\tilde g_{ij}$ has six Killing vectors $K_{(a)}$. In the comoving coordinate system these are also Killing vectors of the RW metric,
$$ L_{K_{(a)}} g_{\mu\nu} = 0 . \tag{19.8} $$
Therefore the Ricci and Einstein tensors also have these symmetries,
$$ L_{K_{(a)}} R_{\mu\nu} = 0, \qquad L_{K_{(a)}} G_{\mu\nu} = 0 . \tag{19.9} $$
Hence the Einstein equations imply that $T_{\mu\nu}$ should have these symmetries,
$$ L_{K_{(a)}} T_{\mu\nu} = 0 . \tag{19.10} $$
Moreover, the $L_{K_{(a)}}$ act like three-dimensional coordinate transformations. To see what these conditions mean, we can therefore make a (3 + 1)-decomposition of the energy-momentum tensor. From the three-dimensional point of view, $T_{00}$ transforms like a scalar under coordinate transformations (and Lie derivatives), $T_{0i}$ like a vector, and $T_{ij}$ like a symmetric tensor. Thus we need to determine which three-dimensional scalars, vectors and symmetric tensors are invariant under the full six-parameter group of three-dimensional isometries.
For scalars φ we thus require (calling K now any one of the Killing vectors of $\tilde g_{ij}$)
$$ L_K\phi = K^i\partial_i\phi = 0 . \tag{19.11} $$
But since $K^i(x)$ can take any value in a maximally symmetric space (homogeneity), φ has to be constant as a function on the three-dimensional space. Therefore $T_{00}$ can only be a function of time,
$$ T_{00} = \rho(t) . \tag{19.12} $$
For vectors, it is almost obvious that no invariant vectors can exist because any vector would
single out a particular direction and therefore spoil isotropy. The formal argument (as a warm
up for the argument for tensors) is the following. We have
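$$ L_K V^i = K^j\tilde\nabla_j V^i - V^j\tilde\nabla_j K^i . \tag{19.13} $$
We now choose the Killing vectors such that, at a given point x, $K^i(x) = 0$ but $\tilde\nabla_i K_j \equiv K_{ij}$ is an arbitrary antisymmetric matrix. Then the first term disappears and we have
$$ L_K V^i = 0 \;\Rightarrow\; K_{ij}V^j = 0 . \tag{19.14} $$
Since $K_{ij}$ is an arbitrary antisymmetric matrix, the part of $\delta^k_i V^j$ that is antisymmetric in j and k must vanish,
$$ \delta^k_i V^j = \delta^j_i V^k . \tag{19.16} $$
Contracting i with k gives $3V^j = V^j$, i.e. $V^j = 0$. Therefore $T_{0i} = 0$.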
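We now come to symmetric tensors. Once again we choose our Killing vectors to vanish at a given point x and such that $K_{ij}$ is an arbitrary antisymmetric matrix. Then the condition
$$ L_K T_{ij} = K^k\tilde\nabla_k T_{ij} + \tilde\nabla_i K^k\, T_{kj} + \tilde\nabla_j K^k\, T_{ik} = 0 \tag{19.18} $$
reduces to
$$ K_{mn}\bigl(\tilde g^{mk}\delta^n_i T_{kj} + \tilde g^{mk}\delta^n_j T_{ik}\bigr) = 0 . \tag{19.19} $$
If this is to hold for all antisymmetric matrices $K_{mn}$, the antisymmetric part of the term in brackets must vanish. In other words, the term in brackets must be symmetric in the indices m and n, i.e.
$$ \tilde g^{mk}\delta^n_i T_{kj} + \tilde g^{mk}\delta^n_j T_{ik} = \tilde g^{nk}\delta^m_i T_{kj} + \tilde g^{nk}\delta^m_j T_{ik} . \tag{19.20} $$
Contracting n with i (in three dimensions) gives $3T^m{}_j = \delta^m_j T^k{}_k$, and therefore
$$ T_{ij} = \frac{\tilde g_{ij}}{3}\, T^k{}_k . \tag{19.22} $$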
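But we already know that the scalar $T^k{}_k$ has to be constant on the spatial slices, i.e. a function of t only. Thus we conclude that the only invariant symmetric tensor is (a multiple of) the metric itself. Therefore the $T_{ij}$-components of the energy-momentum tensor can only be a function of t times $\tilde g_{ij}$. Writing this function as $p(t)a^2(t)$, we arrive at
$$ T_{ij} = p(t)a(t)^2\tilde g_{ij} = p(t)\,g_{ij} . \tag{19.23} $$
We thus see that the energy-momentum tensor is determined by two functions, ρ(t) and p(t). Let $u^\mu = (1,0,0,0)$ be the velocity field of the comoving observers. In covariant form, the energy-momentum tensor is then the perfect fluid energy-momentum tensor
$$ T_{\mu\nu} = (\rho + p)\,u_\mu u_\nu + p\,g_{\mu\nu} . \tag{19.24} $$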
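The statement that the only invariant symmetric tensor is proportional to the metric can also be checked by brute force. Assume for simplicity coordinates in which $\tilde g_{ij} = \delta_{ij}$ at the point x. Then (19.19) becomes $K_{ki}T_{kj} + K_{kj}T_{ik} = (TK - KT)_{ij} = 0$, i.e. T must commute with every antisymmetric matrix K. The sketch below sets up this linear condition on the six components of a symmetric T, using a basis of antisymmetric matrices. It then counts the independent solutions by exact Gaussian elimination. This is a generic linear-algebra approach, not taken from the text.

```python
from fractions import Fraction
from itertools import product

def sym_basis():
    """Basis of real symmetric 3x3 matrices (6 elements)."""
    basis = []
    for i in range(3):
        for j in range(i, 3):
            m = [[0] * 3 for _ in range(3)]
            m[i][j] = m[j][i] = 1
            basis.append(m)
    return basis

def antisym_basis():
    """Basis of antisymmetric 3x3 matrices K_ij (3 elements)."""
    basis = []
    for i, j in ((0, 1), (0, 2), (1, 2)):
        m = [[0] * 3 for _ in range(3)]
        m[i][j], m[j][i] = 1, -1
        basis.append(m)
    return basis

def commutator(A, B):
    """[A, B] = AB - BA for 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] - B[i][k] * A[k][j] for k in range(3))
             for j in range(3)] for i in range(3)]

def rank(rows):
    """Rank via exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

# Linear map T -> ([T, K_1], [T, K_2], [T, K_3]) on symmetric T: one column per basis element.
S, K = sym_basis(), antisym_basis()
columns = []
for T in S:
    col = []
    for Ka in K:
        C = commutator(T, Ka)
        col.extend(C[i][j] for i, j in product(range(3), repeat=2))
    columns.append(col)
matrix = [list(row) for row in zip(*columns)]   # 27 rows, 6 columns
invariant_dim = len(S) - rank(matrix)
print("number of independent invariant symmetric tensors:", invariant_dim)
```

The count should come out as one, the solution being the identity matrix, i.e. $T_{ij} \propto \tilde g_{ij}$.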
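To close the system of equations one also needs an equation of state relating p and ρ. The most common choice is a linear relation,
$$ p = w\rho , \tag{19.25} $$
where w is known as the equation of state parameter. Occasionally more exotic equations of state are also considered. Here are the most common and useful special cases of this equation of state.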
2. The trace of the energy-momentum tensor is
$$ T^\mu{}_\mu = -\rho + 3p . \tag{19.26} $$
For radiation, for example, the energy-momentum tensor is traceless (like that of Maxwell theory), and hence radiation has the equation of state
$$ p = \rho/3 , \tag{19.27} $$
i.e. w = 1/3.
The equation of state parameter need not be a constant. Consider for instance a scalar field φ in a cosmological FRW background. Such a field respects the symmetries if it depends only on t and not on the spatial coordinates, φ = φ(t). For such a scalar field, the energy-momentum tensor (5.15) reduces to
$$ T_{00} = \tfrac12\dot\phi^2 + V(\phi), \qquad T_{ij} = \bigl(\tfrac12\dot\phi^2 - V(\phi)\bigr) g_{ij} , \tag{19.29} $$
so that $\rho = \tfrac12\dot\phi^2 + V$ and $p = \tfrac12\dot\phi^2 - V$, and $w = p/\rho$ in general depends on time.
Notice that this mimics a cosmological constant, i.e. w = −1, during periods where the kinetic energy term is negligible compared with the potential energy term. This happens when the scalar field slowly rolls down a very flat potential. It can also be seen directly from the action (5.14): a constant potential leads to a constant contribution to the matter or gravity Lagrangian, a.k.a. a cosmological constant. This behaviour plays an important role in models of inflation.
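From (19.29) one reads off $\rho = \tfrac12\dot\phi^2 + V$ and $p = \tfrac12\dot\phi^2 - V$. For V ≥ 0, $w = p/\rho$ therefore moves between +1 (kinetic domination) and −1 (potential domination). The small sketch below evaluates w for a hypothetical sequence of field velocities at fixed potential energy. It illustrates how slow roll mimics a cosmological constant.

```python
def scalar_field_eos(phidot, V):
    """rho, p and w = p/rho for a homogeneous scalar field, read off from (19.29).
    Assumes rho = phidot^2/2 + V is non-zero."""
    kinetic = 0.5 * phidot**2
    rho = kinetic + V
    p = kinetic - V
    return rho, p, p / rho

# Hypothetical slow-roll sequence: fixed potential energy, decreasing field velocity.
for phidot in (1.0, 0.1, 0.01):
    rho, p, w = scalar_field_eos(phidot, V=1.0)
    print(f"phidot = {phidot:5.2f}  w = {w:+.6f}")
```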
19.3 Conservation Laws and Comoving Congruences
$$ J^\mu = n(t)\,u^\mu \tag{19.32} $$
in covariant form. Here n(t) could be a number density, like a galaxy number density; it gives the number density per unit proper volume. The conservation law $\nabla_\mu J^\mu = 0$ is equivalent to
$$ \nabla_\mu J^\mu = 0 \;\Leftrightarrow\; \partial_t\bigl(\sqrt{g}\,n(t)\bigr) = 0 . \tag{19.33} $$
Thus we see that n(t) is not constant, but the number density per unit coordinate volume, $\sqrt{g}\,n(t)$, is. We had already anticipated this in the picture of the balloon, Figure 20. For a RW metric, the time-dependent part of $\sqrt{g}$ is $a(t)^3$, and thus the conservation law says
$$ n(t)\,a(t)^3 = \text{const.} \tag{19.34} $$
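Let us now turn to the conservation laws associated with the energy-momentum tensor,
$$ \nabla_\mu T^{\mu\nu} = 0 . \tag{19.35} $$
The spatial components,
$$ \nabla_\mu T^{\mu i} = 0 , \tag{19.36} $$
turn out to be identically satisfied. This is because the integral curves of $u^\mu$ are geodesics and ρ and p are functions of time only. It could hardly be otherwise, because $\nabla_\mu T^{\mu i}$ would have to be an invariant vector, and we know that there are none (nevertheless it is instructive to check this explicitly). The only non-trivial equation is thus the time component
$$ \nabla_\mu T^{\mu 0} = \partial_t T^{00} + \Gamma^\mu_{\mu 0}T^{00} + \Gamma^0_{\mu\nu}T^{\mu\nu} = 0 , $$
which for a perfect fluid with $T_{00} = \rho(t)$ and $T_{ij} = p(t)g_{ij}$ becomes
$$ \dot\rho + \Gamma^i_{i0}\,\rho + \Gamma^0_{ij}\,p\,g^{ij} = 0 . $$
Inserting the explicit expressions (19.2) for the Christoffel symbols, $\Gamma^i_{i0} = \Gamma^0_{ij}g^{ij} = 3\dot a/a$, one finds
$$ \dot\rho = -3\frac{\dot a}{a}(\rho + p) . \tag{19.39} $$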
Before discussing some special cases of this, it is instructive to rederive the above results in a somewhat more general and covariant manner (see footnote 55). Thus we consider a general velocity field $u^\mu(x)$ with $u^\mu u_\mu = -1$, and the perfect fluid energy-momentum tensor (19.24),
$$ T^{\mu\nu} = (\rho + p)\,u^\mu u^\nu + p\,g^{\mu\nu} , \tag{19.40} $$
with, for the time being, ρ and p arbitrary functions of the space-time coordinates. Note that $u^\mu\nabla_\mu$ is the covariant derivative along the integral curves of $u^\mu$, the object we denoted D/Dτ in section 4.6. Acting on scalars we will simply denote it, as usual, by an overdot, i.e.
$$ u^\mu\nabla_\mu\rho \equiv \dot\rho \tag{19.41} $$
etc. Let us now see what the condition $\nabla_\mu T^{\mu\nu} = 0$ tells us. This condition has to hold if this energy-momentum tensor is to give us a solution to the Einstein equations.
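For dust (p = 0) one has
$$ \nabla_\mu T^{\mu\nu} = \nabla_\mu(\rho u^\mu u^\nu) = (u^\mu\nabla_\mu\rho)\,u^\nu + \rho\,(\nabla_\mu u^\mu)\,u^\nu + \rho\,u^\mu\nabla_\mu u^\nu . $$
Here $\nabla_\mu u^\mu \equiv \theta$ is (and measures) the expansion of the velocity field $u^\mu$. It was introduced previously, in the context of the Raychaudhuri equation, in (9.60). The last term, $u^\mu\nabla_\mu u^\nu \equiv a^\nu$, is its acceleration (4.66). We can therefore also write this equation as
$$ (\dot\rho + \theta\rho)\,u^\nu + \rho\,a^\nu = 0 . \tag{19.43} $$
Since
$$ u^\mu u_\mu = -1 \;\Rightarrow\; u_\mu a^\mu = 0 , \tag{19.44} $$
the components of (19.43) along and orthogonal to $u^\nu$ vanish separately, and we find
$$ \nabla_\mu T^{\mu\nu} = 0 \;\Leftrightarrow\; \dot\rho + \theta\rho = 0 \quad\text{and}\quad a^\nu = u^\mu\nabla_\mu u^\nu = 0 . \tag{19.45} $$
Its time (energy flow) component is a continuity equation, while its space (momentum flow) part tells us that the particles have to move on geodesics.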
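For a perfect fluid with pressure, the component of $\nabla_\mu T^{\mu\nu} = 0$ along $u^\mu$ gives the continuity equation
$$ \dot\rho + \theta(\rho + p) = 0 , \tag{19.48} $$
and the part orthogonal to $u^\mu$ gives
$$ (\rho + p)\,a^\nu = -h^{\nu\mu}\nabla_\mu p , \qquad h^{\nu\mu} = g^{\nu\mu} + u^\nu u^\mu . \tag{19.49} $$
In particular, the velocity field is now not composed of geodesics unless the derivative of p in the directions orthogonal to $u^\mu$ (i.e. the spatial derivative) vanishes, $h^{\nu\mu}\nabla_\mu p = 0$. But this is precisely the situation we are considering in cosmology. There ρ = ρ(t) and p = p(t) depend only on t, which is the proper time of the comoving observers described by the velocity field $u^\mu$.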
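Returning to the cosmological setting, we have chosen the matter to move along geodesics, correctly and, as we now know, uniquely. We are thus left with the continuity equation (19.48). It is now the same as (19.39), because for $u^\mu = (1,0,0,0)$ in comoving coordinates one has
$$ \theta = \nabla_\mu u^\mu = \frac{1}{\sqrt{g}}\partial_\mu(\sqrt{g}\,u^\mu) = a(t)^{-3}\partial_t\bigl(a(t)^3\bigr) = 3\dot a(t)/a(t) . \tag{19.50} $$
Differentiating once more,
$$ \frac{d}{dt}\theta = 3\frac{\ddot a}{a} - 3\frac{\dot a^2}{a^2} , \tag{19.51} $$
or, in terms of the Hubble parameter $H = \dot a/a$ and the deceleration parameter $q = -a\ddot a/\dot a^2$,
$$ \frac{d}{dt}\theta = -3H^2(q + 1) . \tag{19.52} $$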
This equation is a special case of the Raychaudhuri equation (9.61) for timelike geodesic congruences,
$$ \frac{d}{d\tau}\theta = -\tfrac13\theta^2 - \sigma^{\mu\nu}\sigma_{\mu\nu} + \omega^{\mu\nu}\omega_{\mu\nu} - R_{\mu\nu}u^\mu u^\nu . \tag{19.53} $$
Indeed, specialise (19.53) to the family of comoving observers in a Friedmann-Robertson-Walker geometry, and note that
• the shear is zero (by isotropy),
• the rotation is zero (either by explicit calculation or, more to the point, because the geodesics are orthogonal to the hypersurfaces t = const., or on symmetry grounds).
One then sees that (19.53) reduces to
$$ \frac{d}{dt}\theta = -\tfrac13\theta^2 - R_{00} = -3\dot a^2/a^2 + 3\ddot a/a , \tag{19.54} $$
which is identical to (19.51). Thus we see how the parameters H(t) and q(t), originally introduced to characterise the first terms in a Taylor expansion of the cosmic scale factor, also govern the local behaviour of freely falling observers like (clusters of) galaxies. In particular, in an expanding universe the congruences of comoving observers diverge (expand, θ > 0), but the rate of expansion decreases (θ̇ < 0) provided that q > −1.
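Equations (19.50) to (19.54) are identities for any scale factor, so they can be checked numerically with finite differences. The sketch below is a minimal illustration using two arbitrary test functions a(t), one of them the dust-like $t^{2/3}$. The step size h and the evaluation time are also arbitrary choices.

```python
def expansion_checks(a, t, h=1e-3):
    """Finite-difference check of (19.51)/(19.52)/(19.54) for a prescribed scale factor a(t).
    Returns d(theta)/dt computed numerically, -3 H^2 (q + 1), and -theta^2/3 - R_00."""
    def adot(s):
        return (a(s + h) - a(s - h)) / (2 * h)
    def theta(s):
        return 3.0 * adot(s) / a(s)                      # (19.50)
    addot = (a(t + h) - 2 * a(t) + a(t - h)) / h**2
    H = adot(t) / a(t)
    q = -a(t) * addot / adot(t) ** 2
    R00 = -3.0 * addot / a(t)                            # from (19.4)
    dtheta = (theta(t + h) - theta(t - h)) / (2 * h)
    return dtheta, -3.0 * H**2 * (q + 1.0), -theta(t) ** 2 / 3.0 - R00

# Hypothetical scale factors: dust-like a = t^(2/3), and an arbitrary smooth expanding a(t).
examples = {"dust": lambda t: t ** (2.0 / 3.0), "other": lambda t: 1.0 + t + 0.3 * t**3}
for name, a_fun in examples.items():
    print(name, expansion_checks(a_fun, t=1.5))
```

For the dust-like example all three numbers should agree with the analytic value $-2/t^2$.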
We now look at the solutions of (19.39) for some specific equations of state.
1. For instance, suppose the pressure of the cosmic matter is negligible, as in the universe today, so that we can treat the galaxies (without disrespect) as dust. Then one has
$$ \frac{\dot\rho}{\rho} = -3\frac{\dot a}{a} , \tag{19.55} $$
and this equation can trivially be integrated to
$$ \rho(t)\,a(t)^3 = \text{const.} \tag{19.56} $$
2. On the other hand, if the universe is dominated by, say, radiation, then one has the equation of state p = ρ/3, and the conservation equation reduces to
$$ \frac{\dot\rho}{\rho} = -4\frac{\dot a}{a} , \tag{19.57} $$
and therefore
$$ \rho(t)\,a(t)^4 = \text{const.} \tag{19.58} $$
The reason why the energy density of photons decreases faster with a(t) than that of dust is of course . . . the red-shift.
3. More generally, for matter with equation of state parameter w one finds
$$ \rho(t)\,a(t)^{3(1+w)} = \text{const.} \tag{19.59} $$
4. In particular, for w = −1, ρ is constant. As we will see more explicitly below, this corresponds to a cosmological constant Λ. This can also be deduced by comparing the covariant form (19.24) of the energy-momentum tensor with the contribution of a cosmological constant to the Einstein equations, which is proportional to $g_{\mu\nu}$. By (19.24) this implies p = −ρ.
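The conservation law (19.39) can be integrated in closed form, as above. It is nevertheless instructive to integrate it numerically for an arbitrary prescribed expansion history and confirm that $\rho\,a^{3(1+w)}$ stays constant. The sketch assumes a hypothetical scale factor $a(t) = 1 + t + t^2/2$. This a(t) need not solve the Friedmann equations, since (19.39) alone does not involve them. The integrator is a textbook fixed-step fourth-order Runge-Kutta scheme.

```python
def evolve_density(w, a, adot, rho0, t0, t1, n=20000):
    """Integrate the continuity equation (19.39), rho' = -3 (adot/a)(1 + w) rho,
    with a fixed-step RK4 scheme for a prescribed scale factor a(t)."""
    def f(t, rho):
        return -3.0 * (adot(t) / a(t)) * (1.0 + w) * rho
    h = (t1 - t0) / n
    t, rho = t0, rho0
    for _ in range(n):
        k1 = f(t, rho)
        k2 = f(t + h / 2, rho + h * k1 / 2)
        k3 = f(t + h / 2, rho + h * k2 / 2)
        k4 = f(t + h, rho + h * k3)
        rho += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return rho

# Hypothetical expanding scale factor with a(0) = 1, and its derivative.
a = lambda t: 1.0 + t + 0.5 * t**2
adot = lambda t: 1.0 + t

for w in (0.0, 1.0 / 3.0, -1.0):
    rho1 = evolve_density(w, a, adot, rho0=1.0, t0=0.0, t1=2.0)
    # rho * a^(3(1+w)) should keep its initial value rho0 * a(0)^(3(1+w)) = 1
    print(w, rho1 * a(2.0) ** (3 * (1 + w)))
```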
19.4 The Einstein and Friedmann Equations
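After these preliminaries, we are now prepared to tackle the Einstein equations. We allow for the presence of a cosmological constant and thus consider the equations
$$ G_{\mu\nu} + \Lambda g_{\mu\nu} = 8\pi G_N T_{\mu\nu} \quad\Leftrightarrow\quad R_{\mu\nu} = 8\pi G_N\bigl(T_{\mu\nu} - \tfrac12 g_{\mu\nu}T^\lambda{}_\lambda\bigr) + \Lambda g_{\mu\nu} . $$
Because of isotropy, there are only two independent equations, namely the 00-component and any one of the non-zero ij-components. Using (19.4), we find
$$ -3\frac{\ddot a}{a} = 4\pi G_N(\rho + 3p) - \Lambda , \qquad \frac{\ddot a}{a} + 2\frac{\dot a^2}{a^2} + 2\frac{k}{a^2} = 4\pi G_N(\rho - p) + \Lambda . \tag{19.62} $$
We supplement this by the conservation equation
$$ \dot\rho = -3\frac{\dot a}{a}(\rho + p) . \tag{19.63} $$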
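Using the first equation to eliminate $\ddot a$ from the second, one obtains the set of equations
$$ \frac{\dot a^2}{a^2} + \frac{k}{a^2} = \frac{8\pi G_N}{3}\rho + \frac{\Lambda}{3} \tag{F1} $$
$$ \frac{\ddot a}{a} = -\frac{4\pi G_N}{3}(\rho + 3p) + \frac{\Lambda}{3} \tag{F2} $$
$$ \dot\rho = -3\frac{\dot a}{a}(\rho + p) \tag{F3} $$
Together, these are known as the Friedmann equations. They govern every aspect of Friedmann-Robertson-Walker cosmology. From now on I will simply refer to them as equations (F1), (F2), (F3) respectively. In terms of the Hubble parameter H(t) and the deceleration parameter q(t), these equations can also be written as
$$ H^2 = \frac{8\pi G_N}{3}\rho - \frac{k}{a^2} + \frac{\Lambda}{3} \tag{F1'} $$
$$ qH^2 = \frac{4\pi G_N}{3}(\rho + 3p) - \frac{\Lambda}{3} \tag{F2'} $$
$$ \dot\rho = -3H(\rho + p) \tag{F3'} $$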
Note that because of the Bianchi identities, the Einstein equations and the conservation
equations should not be independent, and indeed they are not.
It is easy to see that (F1) and (F3) imply the second order equation (F2) so that, a
pleasant simplification, in practice one only has to deal with the two first order equations
(F1) and (F3). Sometimes, however, (F2) is easier to solve than (F1), because it is linear
in ä(t), and then (F1) is just used to fix one constant of integration.
It is equally easy to see that (F1) and (F2) imply (F3). That is, the gravitational equations of motion imply the matter equations of motion, a general and fundamental feature of general relativity.
Finally, formally (F2) and (F3) also imply (F1), with k (which only appears in (F1))
arising as an integration constant.
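The statement that (F2) and (F3) imply (F1), with k appearing as an integration constant, can be illustrated numerically. The idea is to evolve a(t) and ρ(t) using only (F2) and (F3), and to monitor the combination $a^2(8\pi G_N\rho/3 + \Lambda/3) - \dot a^2$, which (F1) identifies with k. The sketch below works in units with $G_N = 1$ and uses hypothetical initial data and parameter values, integrated with a fixed-step Runge-Kutta scheme.

```python
import math

G = 1.0  # units with G_N = 1 (an assumption of this sketch)

def rhs(state, w, Lam):
    """Right-hand sides of (F2) and (F3) for state = (a, adot, rho), with p = w rho."""
    a, adot, rho = state
    p = w * rho
    addot = (-(4.0 * math.pi * G / 3.0) * (rho + 3.0 * p) + Lam / 3.0) * a
    rhodot = -3.0 * (adot / a) * (rho + p)
    return (adot, addot, rhodot)

def rk4_step(state, h, w, Lam):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, c):
        return tuple(s + c * d for s, d in zip(state, k))
    k1 = rhs(state, w, Lam)
    k2 = rhs(shifted(k1, h / 2), w, Lam)
    k3 = rhs(shifted(k2, h / 2), w, Lam)
    k4 = rhs(shifted(k3, h), w, Lam)
    return tuple(s + h * (d1 + 2 * d2 + 2 * d3 + d4) / 6
                 for s, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4))

def k_from_F1(state, Lam):
    """The combination that (F1) identifies with k: a^2 (8 pi G rho/3 + Lam/3) - adot^2."""
    a, adot, rho = state
    return a**2 * (8.0 * math.pi * G * rho / 3.0 + Lam / 3.0) - adot**2

w, Lam = 0.0, 0.05              # hypothetical: dust plus a small cosmological constant
state = (1.0, 1.0, 0.1)         # hypothetical initial (a, adot, rho); k is not imposed
k_initial = k_from_F1(state, Lam)
for _ in range(5000):           # evolve with (F2) and (F3) only, up to t = 5
    state = rk4_step(state, 1e-3, w, Lam)
print(f"k at t = 0: {k_initial:.12f}   k at t = 5: {k_from_F1(state, Lam):.12f}")
```

The monitored value stays constant up to integration error. Its value is fixed by the initial data rather than imposed; for these data it is negative, corresponding to an open universe. For (F1) in the normalised form used in the text, one would rescale a so that k becomes −1, 0 or +1.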
This is a continuation and variation of the theme begun in section 5.3 (general formalism for scalar fields in a gravitational field), section 5.6 (scalar field in Rindler coordinates), and section 14.7 (scalar field in the Schwarzschild space-time).
We want to make the setting as close as possible to that of a scalar field in Minkowski space, which is useful e.g. if one intends to quantise the scalar field afterwards. For this purpose it is convenient to employ the conformal time coordinate η, already introduced in (18.25) and defined (here for k = 0) by
$$ ds^2 = -dt^2 + a(t)^2 d\vec x^2 = a(t)^2\bigl(-dt^2/a(t)^2 + d\vec x^2\bigr) \equiv A(\eta)^2\bigl(-d\eta^2 + d\vec x^2\bigr) , \tag{19.67} $$
with A(η(t)) = a(t) and d/dη = a(d/dt). Denoting an η-derivative by a prime, dΨ/dη = Ψ′, the action can then be written more explicitly as
$$ S[\Psi] = \tfrac12\int d^3x\,d\eta\; A(\eta)^2\Bigl[(\Psi')^2 - (\vec\nabla\Psi)^2 - m^2A(\eta)^2\Psi^2\Bigr] . \tag{19.68} $$
We see that the field Ψ has a non-canonical kinetic term ∼ $A(\eta)^2(\Psi')^2$. This leads to Euler-Lagrange equations of motion containing "friction" terms,
$$ \frac{d}{d\eta}\bigl(A^2\Psi'\bigr) = A^2\bigl(\Psi'' + 2(A'/A)\Psi'\bigr) , \tag{19.69} $$
which are awkward in the classical theory, and even more so for quantisation. Happily, these non-canonical terms can be eliminated by the field redefinition
$$ \phi(\eta,\vec x) = A(\eta)\,\Psi(\eta,\vec x) . \tag{19.70} $$
Indeed, up to a total derivative term one then finds
$$ S[\Psi] \to S[\phi] = \tfrac12\int d^3x\,d\eta\,\Bigl[(\phi')^2 - (\vec\nabla\phi)^2 - \bigl(m^2A^2 - A''/A\bigr)\phi^2\Bigr] . \tag{19.71} $$
Remarks:
1. Observe that S[φ] is the standard action for a Klein-Gordon field in Minkowski space. Its only exotic feature is the time-dependent mass term, with an effective mass
$$ m^2_{\text{eff}}(\eta) = m^2A(\eta)^2 - A''(\eta)/A(\eta) , \tag{19.72} $$
which thus encodes the interaction of the scalar field φ with the gravitational background. Note that this effective mass term is present even when the original field is massless, m² = 0.
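2. This purely geometric contribution to the mass term can be interpreted as an induced non-minimal coupling to the scalar curvature R of the space-time. Indeed, we have
$$ A''/A = (1/a)\Bigl(a\frac{d}{dt}\Bigr)\Bigl(a\frac{d}{dt}\Bigr)a = \frac{d}{dt}(a\dot a) = a\ddot a + \dot a^2 . \tag{19.73} $$
For k = 0, the result (19.5) reads $R = 6(\ddot a/a + \dot a^2/a^2)$. Comparison shows that $A''/A = a^2R/6$, and hence
$$ m^2_{\text{eff}} = A^2\bigl(m^2 - \tfrac16 R\bigr) . $$
3. Spatial flatness k = 0 brings with it the simplifying feature that we can expand the spatial dependence of the fields in standard Fourier modes. Upon spatial Fourier expansion,
$$ \phi(\eta,\vec x) \sim \int d^3k\;\phi_{\vec k}(\eta)\,e^{i\vec k\cdot\vec x} , \tag{19.77} $$
each mode obeys the equation of a harmonic oscillator with time-dependent frequency, $\phi_{\vec k}'' + \bigl(\vec k^2 + m^2_{\text{eff}}(\eta)\bigr)\phi_{\vec k} = 0$.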
4. The crucial feature of this action and of the mode equations is their explicit time dependence. It means that the energy of φ is not conserved, which in turn leads to the important phenomenon of particle or mode production in a cosmological background.
5. For example, one can consider the (evidently highly idealised) situation where
the cosmic scale factor is asymptotically constant in the remote past and in the
remote future. During these early and late periods the metric is essentially the
Minkowski metric (possibly up to a rescaling of the coordinates), and one thus
has a preferred notion of particles during those eras, uniquely determined by
the asymptotic Poincaré symmetry. However, these definitions of particles need not (and will almost invariably not) agree when there is an intermediate time-dependent phase. For instance, the early time vacuum Heisenberg state would not
be interpreted or seen as a vacuum by the late time observer, and this disagreement
about the particle content is then interpreted as a particle production due to the
time-dependent gravitational field. See the references in footnote 35 (section 15.5)
for a detailed discussion of these and other related fascinating issues.
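The idealised situation of remark 5 can be simulated mode by mode. Each Fourier mode of φ obeys $\phi_{\vec k}'' + (\vec k^2 + m^2_{\text{eff}}(\eta))\phi_{\vec k} = 0$, with $m^2_{\text{eff}}$ from (19.72). One starts from the positive-frequency "in" solution $e^{-i\omega_{in}\eta}/\sqrt{2\omega_{in}}$ in the remote past and evolves it to the remote future. The late-time solution is then a combination $\alpha\,e^{-i\omega_{out}\eta}/\sqrt{2\omega_{out}} + \beta\,e^{+i\omega_{out}\eta}/\sqrt{2\omega_{out}}$. The quantity $|\beta|^2 = (|\phi'|^2 + \omega_{out}^2|\phi|^2)/(2\omega_{out}) - \tfrac12$ is the standard measure of the number of "out" quanta produced in that mode. All of the following are hypothetical choices made for this sketch: the profile $A(\eta) = 1 + \tfrac14(1 + \tanh(\eta/\tau))$, the mass, the wave number and the RK4 integrator.

```python
import cmath
import math

def A_profile(eta, tau):
    """Hypothetical scale factor: A = 1 in the far past, A = 1.5 in the far future, width tau."""
    return 1.0 + 0.25 * (1.0 + math.tanh(eta / tau))

def A_second_derivative(eta, tau):
    """Analytic d^2A/deta^2 for A_profile."""
    u = eta / tau
    return -0.5 * math.tanh(u) / (math.cosh(u) ** 2 * tau**2)

def omega_squared(eta, k, m, tau):
    """k^2 + m_eff^2(eta), with m_eff^2 = m^2 A^2 - A''/A as in (19.72)."""
    A = A_profile(eta, tau)
    return k**2 + m**2 * A**2 - A_second_derivative(eta, tau) / A

def bogoliubov_beta2(k, m, tau, steps_per_unit=100):
    """Evolve phi'' + omega^2(eta) phi = 0 (RK4) from the 'in' positive-frequency mode
    e^{-i w_in eta}/sqrt(2 w_in) and return (|beta|^2, Wronskian) at late times."""
    L = 10.0 * tau + 10.0
    n = int(2 * L * steps_per_unit)
    h = 2 * L / n
    eta = -L
    w_in = math.sqrt(omega_squared(eta, k, m, tau))
    phi = cmath.exp(-1j * w_in * eta) / math.sqrt(2 * w_in)
    dphi = -1j * w_in * phi

    def deriv(e, y, dy):
        return dy, -omega_squared(e, k, m, tau) * y

    for _ in range(n):
        k1 = deriv(eta, phi, dphi)
        k2 = deriv(eta + h / 2, phi + h / 2 * k1[0], dphi + h / 2 * k1[1])
        k3 = deriv(eta + h / 2, phi + h / 2 * k2[0], dphi + h / 2 * k2[1])
        k4 = deriv(eta + h, phi + h * k3[0], dphi + h * k3[1])
        phi += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        dphi += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        eta += h
    w_out = math.sqrt(omega_squared(eta, k, m, tau))
    wronskian = phi * dphi.conjugate() - phi.conjugate() * dphi   # should stay equal to i
    beta2 = (abs(dphi) ** 2 + w_out**2 * abs(phi) ** 2) / (2 * w_out) - 0.5
    return beta2, wronskian

for tau in (0.5, 10.0):
    beta2, W = bogoliubov_beta2(k=0.5, m=1.0, tau=tau)
    print(f"tau = {tau:5.1f}  |beta|^2 = {beta2:.3e}  Wronskian = {W:.6f}")
```

A rapid transition (small τ) produces a noticeably non-zero $|\beta|^2$. A slow, nearly adiabatic transition (large τ) produces essentially nothing. The Wronskian $\phi\bar\phi' - \bar\phi\phi' = i$, equivalent to $|\alpha|^2 - |\beta|^2 = 1$, serves as a check on the numerics.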
20 Cosmology III: Qualitative Analysis
A lot can be deduced about the solutions of the Friedmann equations, i.e. the evolution
of the universe in the Friedmann-Robertson-Walker cosmologies, without solving the
equations directly and even without specifying a precise equation of state, i.e. a relation
between p and ρ. In the following we will, in turn, discuss the Big Bang, the age of
the universe, and its long term behaviour, from this qualitative point of view. I will
then introduce the notions of critical density and density parameters, and discuss the
structure of the universe, as we understand it today, in these terms.
One amazing thing about the FRW models is that all of them (provided that the matter
content is reasonably physical) predict an initial singularity, commonly known as a Big
Bang. This is very easy to see.
(F2) shows that, as long as the right-hand side is positive, one has q > 0, i.e. ä < 0
so that the universe is decelerating due to gravitational attraction. This is for instance
the case when there is no cosmological constant and ρ + 3p is positive (and this is the
case for all physical matter). It is also true for a negative cosmological constant (its
negative energy density being outweighed by 3 times its positive pressure). It need not
be true, however, in the presence of a positive cosmological constant which provides an
accelerating contribution to the expansion of the universe. We will, for the time being,
continue with the assumption that Λ is zero or, at least, non-positive, even though, as
we will discuss later, recent evidence (strongly) suggests the presence of a non-negligible
positive cosmological constant in our universe today.
Since a > 0 by definition, ȧ(t0 ) > 0 because we observe a red-shift, and ä < 0 because
ρ + 3p > 0, it follows that there cannot have been a turning point in the past and a(t)
must be concave downwards. Therefore a(t) must have reached a = 0 at some finite
time in the past. We will call this time t = 0, a(0) = 0.
As ρa⁴ is constant for radiation (an appropriate description of earlier periods of the universe), the energy density grows like 1/a⁴ as a → 0. This is quite a singular situation.
Once again, as in our discussion of black holes, it is natural to wonder at this point
if the singularities predicted by General Relativity in the case of cosmological models
are generic or only artefacts of the highly symmetric situations we were considering.
And again there are singularity theorems applicable to these situations which state
that, under reasonable assumptions about the matter content, singularities will occur
independently of assumptions about symmetries.
20.2 The Age of the Universe
With the normalisation a(0) = 0, it is fair to call t0 the age of the universe. If ä had been zero in the past for all t ≤ t0, then we would have
$$ a(t) = a_0\,t/t_0 \tag{20.1} $$
and
$$ \dot a(t) = a_0/t_0 = \dot a_0 , \tag{20.2} $$
so that
$$ t_0 = a_0/\dot a_0 = H_0^{-1} , \tag{20.3} $$
where $H_0^{-1}$ is the Hubble time. However, provided that ä < 0 for t ≤ t0, the actual age of the universe must be smaller than this. As discussed above, ä < 0 holds under suitable conditions on the matter content, which may or may not be realised in our universe. Thus
$$ \ddot a < 0 \;\Rightarrow\; t_0 < H_0^{-1} . \tag{20.4} $$
Thus the Hubble time sets an upper bound on the age of the universe. See Figure 22
for an illustration of this.
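The bound (20.4) can be made quantitative by integrating $dt = da/(aH)$ from a = 0 to $a = a_0 = 1$. Here $H^2/H_0^2 = \Omega_r a^{-4} + \Omega_m a^{-3} + \Omega_k a^{-2} + \Omega_\Lambda$, which follows from (F1) written in terms of the density parameters introduced later in this section. The sketch below computes $H_0t_0$ for some hypothetical parameter values. A flat dust universe gives 2/3 and pure radiation gives 1/2, both below 1 as (20.4) requires. The flat model with a positive cosmological constant is included only for contrast. There ä > 0 at late times, so (20.4) is not guaranteed, although for these particular values $H_0t_0$ still comes out slightly below 1.

```python
import math

def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def hubble_age(Om, Or=0.0, OL=0.0):
    """H0 * t0 = int_0^1 da / (a H/H0), with a0 = 1 and
    (H/H0)^2 = Or a^-4 + Om a^-3 + Ok a^-2 + OL, where Ok = 1 - Om - Or - OL.
    The substitution a = s^2 turns the integrand into 2 s^3 / sqrt(...), smooth at s = 0."""
    Ok = 1.0 - Om - Or - OL
    def integrand(s):
        if s == 0.0:
            return 0.0   # limit of the integrand as s -> 0
        return 2.0 * s**3 / math.sqrt(Or + Om * s**2 + Ok * s**4 + OL * s**8)
    return simpson(integrand, 0.0, 1.0)

for label, args in [("flat dust", (1.0,)), ("open dust", (0.3,)), ("closed dust", (2.0,)),
                    ("radiation", (0.0, 1.0)), ("flat dust + Lambda", (0.3, 0.0, 0.7))]:
    print(f"{label:20s} H0*t0 = {hubble_age(*args):.4f}")
```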
Let us now try to take a look into the future of the universe. Again we will see that
it is remarkably simple to extract relevant information from the Friedmann equations
without ever having to solve an equation.
We will assume that Λ = 0 and that we are dealing with physical matter with w > −1/3. The Friedmann equation (F1) can be written as
$$ \dot a^2 = \frac{8\pi G_N}{3}\rho a^2 - k . \tag{20.5} $$
The left-hand side is manifestly non-negative. Let us see what this tells us about the right-hand side. Focus on the first term, ∼ ρa². This term is strictly positive and, according to (19.59), behaves as
$$ \rho a^2 \propto a^{-(1+3w)} . \tag{20.6} $$
Thus for physical matter the exponent is negative, so that if and when the cosmic scale factor a(t) goes to infinity, one has
$$ \rho a^2 \to 0 \quad\Rightarrow\quad \dot a^2 \to -k . \tag{20.7} $$
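[Figure 22: the cosmic scale factor a(t) as a function of t for k = −1, 0, +1, starting from a(0) = 0; the present time t0 and the Hubble time $H_0^{-1}$ are marked on the time axis.]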
Now let us look at the second term on the right-hand side of (20.5), and analyse the three choices for k. For k = −1 or k = 0, the right-hand side of (20.5) is strictly positive. Therefore ȧ is never zero and, since ȧ0 > 0, we must have
$$ \dot a(t) > 0 \quad\text{for all } t > 0 . \tag{20.8} $$
Thus we can immediately conclude that open and flat universes must expand forever, i.e. they are open in space and time.
By taking into account (20.7), we can be somewhat more precise about the long term behaviour. For k = 0, we learn that
$$ k = 0 : \quad \dot a \to 0 \quad\text{as } a \to \infty . \tag{20.9} $$
Thus the universe keeps expanding, but more and more slowly as time goes on. By the same reasoning we see that for k = −1 we have
$$ k = -1 : \quad \dot a \to 1 \quad\text{as } a \to \infty . \tag{20.10} $$
For k = +1, validity of (20.7) would lead us to conclude that ȧ² → −1, which is obviously a contradiction. Therefore the k = +1 universes never reach a → ∞, and there is a maximal radius $a_{\max}$. This maximal radius occurs for ȧ = 0 and therefore, by (20.5),
$$ k = +1 : \quad a_{\max}^2 = \frac{3}{8\pi G_N\rho} , \tag{20.11} $$
with ρ evaluated at $a = a_{\max}$.
Note that intuitively this makes sense: for larger ρ or larger $G_N$ the gravitational attraction is stronger, and therefore the maximal radius of the universe is smaller. Since ä < 0 also at $a_{\max}$, $a_{\max}$ is a genuine maximum of a(t). The universe therefore recontracts back to zero size, leading to a Big Crunch. Hence spatially closed universes (k = +1) with physical matter are also closed in time. All of these findings are summarised in Figure 22.
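For k = +1 dust, the result (20.11) can be checked by integrating the second-order equation (F2) numerically. In the notation of (20.27) with $C_r = \Lambda = 0$ one has $\dot a^2 = C_m/a - 1$ and $\ddot a = -C_m/(2a^2)$. With $8\pi G_N\rho/3 = C_m/a^3$, (20.11) then reduces to $a_{\max} = C_m$. The sketch below starts on the constraint (F1) at a small, hypothetical value of a and integrates with RK4 until ȧ changes sign. The step size is also a hypothetical choice.

```python
import math

def closed_dust_max_radius(Cm, a_init=0.1, h=1e-4):
    """k = +1 dust universe in the form (20.27): adot^2 = Cm/a - 1, so addot = -Cm/(2 a^2) by (F2).
    Start on the (F1) constraint at a = a_init, integrate with RK4 while the universe
    expands, and return the largest value of a reached."""
    def acc(a):
        return -Cm / (2.0 * a * a)
    a, v = a_init, math.sqrt(Cm / a_init - 1.0)
    a_max = a
    while v > 0.0:
        k1a, k1v = v, acc(a)
        k2a, k2v = v + h / 2 * k1v, acc(a + h / 2 * k1a)
        k3a, k3v = v + h / 2 * k2v, acc(a + h / 2 * k2a)
        k4a, k4v = v + h * k3v, acc(a + h * k3a)
        a += h / 6 * (k1a + 2 * k2a + 2 * k3a + k4a)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        a_max = max(a_max, a)
    return a_max

for Cm in (1.0, 2.0):
    a_max = closed_dust_max_radius(Cm)
    # (20.11) with 8 pi G rho / 3 = Cm / a^3 reads a_max^2 = a_max^3 / Cm, i.e. a_max = Cm.
    print(f"Cm = {Cm}: a_max = {a_max:.6f}")
```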
The primary purpose of this section is to introduce some convenient and commonly used notation and terminology in cosmology associated with the Friedmann equation (F1′). We will now include the cosmological constant in our analysis. For starters, however, let us again consider the case Λ = 0. (F1′) can be written as
$$ \frac{8\pi G_N\rho}{3H^2} - 1 = \frac{k}{a^2H^2} . \tag{20.12} $$
If one defines the critical density $\rho_{cr}$ by
$$ \rho_{cr} = \frac{3H^2}{8\pi G_N} , \tag{20.13} $$
and the density parameter Ω by
$$ \Omega = \frac{\rho}{\rho_{cr}} , \tag{20.14} $$
then (F1′) becomes
$$ \Omega - 1 = \frac{k}{a^2H^2} . \tag{20.15} $$
Thus the sign of k is determined by whether the actual energy density ρ in the universe is greater than, equal to, or less than the critical density,
$$ \rho > \rho_{cr} \Leftrightarrow k = +1, \qquad \rho = \rho_{cr} \Leftrightarrow k = 0, \qquad \rho < \rho_{cr} \Leftrightarrow k = -1 . $$
Note in particular that $\rho_{cr} = 0$ at a turning point (maximal radius, H = 0), and correspondingly Ω → ∞ there.
This can be generalised to several species of (not mutually interacting) matter. These are characterised by equation of state parameters $w_b$, subject to the condition $w_b \geq 0$ or at least $w_b > -1/3$, with density parameters
$$ \Omega_b = \frac{\rho_b}{\rho_{cr}} . \tag{20.16} $$
The total matter contribution $\Omega_M$ is then
$$ \Omega_M = \sum_b \Omega_b . \tag{20.17} $$
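Along the same lines we can also include the cosmological constant Λ. Inspection of the Friedmann equations reveals that the presence of a cosmological constant is equivalent to adding matter $(\rho_\Lambda, p_\Lambda)$ with
$$ \Lambda \;\Leftrightarrow\; \rho_\Lambda = -p_\Lambda = \frac{\Lambda}{8\pi G_N} , \qquad w_\Lambda = -1 . \tag{20.18} $$
Note that this identification is consistent with the conservation law (F3), since Λ is constant. Note also that we could alternatively have read this off directly from the Einstein equations with a cosmological constant (10.38), written as
$$ G_{\mu\nu} = 8\pi G_N\Bigl(T_{\mu\nu} - \frac{\Lambda}{8\pi G_N}g_{\mu\nu}\Bigr) , \tag{20.19} $$
since by (19.24) the extra term is a perfect fluid with $\rho_\Lambda = -p_\Lambda$. The Friedmann equation (F1′) with a cosmological constant can then be written as
$$ \Omega_M + \Omega_\Lambda = 1 + \frac{k}{a^2H^2} , \tag{20.20} $$
where
$$ \Omega_\Lambda = \frac{\rho_\Lambda}{\rho_{cr}} = \frac{\Lambda}{3H^2} . \tag{20.21} $$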
The 2nd order equation (F2′) can also be written in terms of the density parameters,
$$ q = \tfrac12\sum_b (1 + 3w_b)\,\Omega_b - \Omega_\Lambda . \tag{20.22} $$
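As a small illustration of (20.22), the sketch below evaluates q for a few hypothetical combinations of density parameters. Note that a component with $w_b = -1/3$, such as the formal curvature "fluid" (20.23), drops out of q altogether.

```python
def deceleration(components, OL=0.0):
    """q from (20.22): q = (1/2) sum_b (1 + 3 w_b) Omega_b - Omega_Lambda.
    `components` is a list of (w_b, Omega_b) pairs."""
    return 0.5 * sum((1.0 + 3.0 * w) * Om for w, Om in components) - OL

print(deceleration([(0.0, 1.0)]))              # dust only, Omega = 1: q = 1/2
print(deceleration([(1.0 / 3.0, 1.0)]))        # radiation only: q = 1
print(deceleration([(0.0, 0.3)], OL=0.7))      # dust + Lambda: q is negative (accelerating)
```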
Finally, one can also formally attribute an energy density $\rho_k$ and pressure $p_k$ to the curvature contribution ∼ k in the Friedmann equations. (F1), which does not depend on p, determines $\rho_k$. Then (F2), which does not depend on k, shows that $w_k = -1/3$ (so that $\rho_k + 3p_k = 0$). Thus the curvature contribution can be described as
$$ k \;\Leftrightarrow\; \rho_k = -3p_k = \frac{-3k}{8\pi G_N a^2} , \qquad w_k = -1/3 , \tag{20.23} $$
with associated density parameter $\Omega_k$. (F3) is identically satisfied in this case (or, if you prefer, requires that k is constant).
The Friedmann equation (F1) can now succinctly (if somewhat obscurely) be written as the condition that the sum of all density parameters be equal to 1,
$$ \Omega_M + \Omega_\Lambda + \Omega_k = 1 . \tag{20.24} $$
To make the dependence of the Friedmann equation (F1) or (20.24) on the equation of state parameters $w_b$ and on a(t) more manifest, it is useful to use the conservation laws (19.59) and (20.6) to write
$$ \frac{8\pi G_N}{3}\rho_b(t)a(t)^2 = C_b\,a(t)^{-(1+3w_b)} \;\Leftrightarrow\; \Omega_b\,\dot a(t)^2 = C_b\,a(t)^{-(1+3w_b)} \tag{20.25} $$
for some constant $C_b$. The Friedmann equation then takes a more explicit form, in the sense that all the dependence on the cosmic scale factor a(t) is explicit:
$$ \dot a^2 = \sum_b C_b\,a^{-(1+3w_b)} - k + \frac{\Lambda}{3}a^2 . \tag{20.26} $$
In addition to the vacuum energy (and pressure) provided by Λ, there are typically two other kinds of matter which are relevant in our approximation: matter in the form of dust (w = 0) and radiation (w = 1/3). Denoting the corresponding constants by $C_m$ and $C_r$ respectively, the Friedmann equation that we will be dealing with takes the form
$$ (F1'')\qquad \dot a^2 = \frac{C_m}{a} + \frac{C_r}{a^2} - k + \frac{\Lambda}{3}a^2 , \tag{20.27} $$
illustrating the qualitatively different contributions to the time-evolution.
One can then characterise the different eras in the evolution of the universe by which of the above terms dominates, i.e. gives the leading contribution to the equation of motion for a. This already gives some insight into the physics of the situation. We will call a universe radiation dominated, matter dominated, etc., according to which of these terms dominates.
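1. No matter how small $C_r$ is, provided that it is non-zero, this term will dominate for sufficiently small values of a, and one is in the radiation dominated era. In that case, one finds the characteristic behaviour
$$ \dot a^2 = \frac{C_r}{a^2} \;\Rightarrow\; a(t) = (4C_r)^{1/4}\,t^{1/2} . \tag{20.28} $$
It is more informative to trade the constant $C_r$ for the condition $a(t_0) = a_0$, which leads to
$$ a(t) = a_0\,(t/t_0)^{1/2} . \tag{20.29} $$
More generally, suppose a single component with equation of state parameter w > −1 dominates, and k and Λ can be neglected. One then finds $a(t) = a_0(t/t_0)^h$ with $h = 2/(3(1+w))$, so that $\ddot a \propto h(h-1)t^{h-2}$. This describes a decelerating universe ($h(h-1) < 0 \Rightarrow 0 < h < 1$) for w > −1/3 and an accelerating universe (h > 1) for −1 < w < −1/3.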
4. For sufficiently large a, the cosmological constant Λ, if not identically zero, will always dominate, no matter how small it may be, because all the other energy content of the universe gets more and more diluted. In particular, for k = 0, the Friedmann equation for a positive cosmological constant reduces to
$$ \dot a^2 = (\Lambda/3)\,a^2 \;\Rightarrow\; a(t) = a(t_0)\,e^{\pm\sqrt{\Lambda/3}\,(t - t_0)} . \tag{20.33} $$
5. Only for Λ = 0 does k dominate for large a and one obtains, as we saw before, a
constant expansion velocity (for k = 0, −1).
6. We will find and discuss various other exact solutions in section 21.
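The power-law solutions for a single dominant component can be summarised in a few lines. Assume k = Λ = 0 and one component with $p = w\rho$, w > −1. Then $a \propto t^h$ with $h = 2/(3(1+w))$. The sketch below evaluates h and the sign of ä. It also checks that $(\dot a/a)^2 a^{3(1+w)}$ is time-independent, as (F1) with $\rho \propto a^{-3(1+w)}$ requires. As w → −1, h grows without bound, signalling the crossover to the exponential behaviour (20.33).

```python
def power_law_exponent(w):
    """h in a(t) = a0 (t/t0)^h for a single dominant component with p = w rho, w > -1."""
    return 2.0 / (3.0 * (1.0 + w))

def accelerates(w):
    """addot is proportional to h (h - 1) t^(h - 2): positive iff h > 1, i.e. -1 < w < -1/3."""
    h = power_law_exponent(w)
    return h * (h - 1.0) > 0.0

def friedmann_ratio(w, t):
    """For a = t^h and rho proportional to a^(-3(1+w)), (F1) with k = Lambda = 0 requires
    (adot/a)^2 * a^(3(1+w)) to be t-independent; here it equals h^2."""
    h = power_law_exponent(w)
    return (h / t) ** 2 * t ** (3.0 * h * (1.0 + w))

for w in (1.0 / 3.0, 0.0, -0.5, -0.9):
    print(f"w = {w:+.2f}  h = {power_law_exponent(w):.3f}  accelerating: {accelerates(w)}")
```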
For a long time it was believed that the only non-negligible contribution to ρ(t) today, let us call this $\rho_0 = \rho(t_0)$, is pressureless matter, $p_0 = 0$. If there is only such matter, then by (F2) one has
$$ -3\frac{\ddot a}{a} = 4\pi G_N\rho \;\Rightarrow\; q = -\frac{\ddot a a}{\dot a^2} = \frac12\frac{\rho}{\rho_{cr}} \tag{20.34} $$
(this also follows on the nose from (F2′) in the version (20.22)). Thus for the universe today this would imply
$$ q_0 = \frac{\rho_0}{2\rho_{cr,0}} = \tfrac12\Omega_M . \tag{20.35} $$
By (20.15), the curvature parameter k would thus be directly related to the value $q_0$ of the deceleration parameter today:
$$ q_0 > \tfrac12 \Leftrightarrow k = +1, \qquad q_0 = \tfrac12 \Leftrightarrow k = 0, \qquad q_0 < \tfrac12 \Leftrightarrow k = -1 . $$
Moreover, observations indicated a value of $\rho_0$ much smaller than $\rho_{cr}$, thus suggesting a decelerating open k = −1 universe. While perhaps not the most hospitable place in the long run, at least this scenario had the virtue of simplicity.
However, exciting recent developments and observations in astronomy and astrophysics have provided strong evidence for a very different picture of the universe today. I will just summarise the results here:
1. Estimates for the current matter contribution $\Omega_M = \sum_b \Omega_b$ are
$$ (\Omega_M)_0 \approx 0.3 . \tag{20.37} $$
2. Ordinary (visible, baryonic) matter only accounts for a small fraction of this, namely
$$ (\Omega_{M,\text{visible}})_0 \sim 0.04 . \tag{20.38} $$
Most of the matter density of the universe must therefore be due to some form of (as yet ill-understood, non-relativistic, weakly interacting) Dark Matter.
20.7 Flatness, Horizon & Cosmological Constant Problems
This currently favoured Λ-CDM scenario with (ΩM )0 = 0.3, (ΩΛ )0 = 0.7 and k = 0 has
been tested and confirmed in various independent ways. Nevertheless, it is a mystery
and raises all kinds of questions and puzzles, in particular because the emergence of this
particular universe appears to be somewhat unnatural (although it may perhaps be difficult to quantify this sentiment) and to require an incredible amount of fine-tuning.
I will not say anything about the dark matter component, since an explanation presum-
ably needs to be found in the realm of particle physics (every model of physics beyond
the standard model worth its salt has its own dark matter candidates), and since the
story is in any case sufficiently strange and interesting even without worrying about,
or having to put up with, things like WIMPS (weakly interacting massive particles), or
MACHOS (massive compact halo objects), or binos, winos and other neutralinos.
Here, in a nutshell, are some of these problems (and for more in-depth and far-reaching
discussions of these issues please consult the extensive literature on cosmology). The
first two are independent of the recent discovery of dark energy, and appear to require
some fine-tuning or other mechanism to intervene in the very early universe. The third,
on the other hand, has to do with dark energy / the cosmological constant today, and
with the relatively recent period of accelerated expansion of the universe.
I should stress that, while I generally make an attempt in these notes to present just
the (well-established) facts, this is not entirely possible in this section, since there is no
general consensus on how to resolve these issues (in particular those arising in connection
with dark energy). As a consequence, this section contains not just facts but certainly
also some opinions (whereas there was no point in sharing with you my opinion about
the Riemann curvature tensor, say - I don’t even have one).
universe expands. This has to be (and will be, by the above equation) compensated
and accompanied by a huge increase in the first factor.
Thus to have $(\Omega)_0 \approx 1$ today (whether to within a few percent, as observations
suggest, or even give or take a couple of orders of magnitude) appears to require
Ω ≈ 1 with a huge accuracy at some given time in the early universe. The fact
that Ω was very close to 1 in the past is not in itself a fine-tuning issue (after
all, this behaviour is implied by the Friedmann equations, no matter what the
size of Ω today), but it may become or look like one if one specifies a particular
(“earliest”) time in the past.
Concretely, if one e.g. assumes that the relevant early time is at the Planck scale,
$t_p \approx 10^{-43}$ s, and that the age of the universe is (only orders of magnitude will be
relevant for this argument) $t_0 \approx 10^{17}$ s, and if one assumes that matter consists of
dust and radiation, radiation dominating at early times, so that
In particular, any minuscule deviation from this tiny value at $t = t_p$, i.e. a minuscule
difference between the actual and the critical density of the universe at
that time, would have led to a universe completely incompatible with observations.
The universe would have most likely either recollapsed after a very short time
(Ω → ∞), or expanded so quickly as to prevent the formation of any structure
in the universe (Ω → 0). In fact, Ω = 0 and Ω = ∞ are the only attractors
(attractive fixed points) of the theory in the terminology of dynamical systems,
while Ω = 1 is a repeller (unstable fixed point).
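To make the size of this fine-tuning concrete, here is a rough Python sketch of how $|\Omega - 1|$ scales back to $t_p$. It assumes $|\Omega - 1| = |k|/\dot a^2$, pure radiation domination before a matter-radiation equality time $t_{eq}$ and pure matter domination afterwards. The value $t_{eq} \approx 10^{12}$ s is an assumption introduced only for this illustration and is not taken from the notes; only the orders of magnitude matter.

```python
import math

# Orders of magnitude in seconds. T_PLANCK and T_NOW are the values used in
# the text; T_EQ (matter-radiation equality) is an ASSUMED illustrative value.
T_PLANCK = 1e-43
T_EQ = 1e12
T_NOW = 1e17


def flatness_ratio(t_early: float, t_eq: float, t_now: float) -> float:
    """Return |Omega - 1|(t_early) / |Omega - 1|(t_now).

    Uses |Omega - 1| = |k| / adot**2. For a ~ t**n one has adot ~ t**(n - 1),
    hence |Omega - 1| ~ t**(2 - 2n): ~ t in the radiation era (n = 1/2) and
    ~ t**(2/3) in the matter era (n = 2/3).
    """
    matter_era = (t_eq / t_now) ** (2.0 / 3.0)   # going back from t_now to t_eq
    radiation_era = t_early / t_eq               # going back from t_eq to t_early
    return matter_era * radiation_era


ratio = flatness_ratio(T_PLANCK, T_EQ, T_NOW)
print(f"|Omega - 1| at t_p is about 10^{math.log10(ratio):.1f} times its value today")
```

With these inputs the suppression comes out at roughly 58 orders of magnitude; changing $t_{eq}$ by an order of magnitude shifts this only by a fraction of an order of magnitude.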
What is one supposed to make of this? No explanation for this can be found
in the standard model of cosmology we have been discussing here. Occasionally,
therefore, weakly anthropic arguments of the kind “if Ω had not been fine-tuned
to this value by some (unknown) mechanism/deity, we couldn’t be here in the
first place to ask the question why Ω takes the value it does today” have been
advocated.
While this cannot be ruled out (this is the whole problem with this entire anthropic
reasoning business), there is actually a mechanism which naturally leads to the
required tiny value of Ω − 1 at early times, namely inflation.57 As this mechanism
is also invoked to solve other problems of the standard model of cosmology (such as
the horizon problem, see below) which are (a) more serious and (b) not obviously
of anthropic significance, one should perhaps rather consider the anthropophile
value of $\Omega_0$ as an unintentional (but serendipitous) side-effect of inflation.
57
For introductions to inflation, see e.g. D. Baumann, TASI Lectures on Inflation, arXiv:0907.5424
[hep-th], S. Tsujikawa, Introductory review of cosmic inflation, arXiv:hep-ph/0304257, A. Liddle,
An introduction to cosmological inflation, arXiv:astro-ph/9901124, A. Linde, Inflationary Cosmology,
arXiv:0705.0164v2 [hep-th], R. Brandenberger, Inflationary Cosmology: Progress and Problems,
arXiv:hep-ph/9910410.
What inflation does is to postulate, for a number of reasons, a brief but highly
significant period of exponential expansion in the very early universe, as could be
triggered by the presence of a cosmological constant (20.33),
Converting this to proper distance at time $t = t_{ls}$, one obtains the quantity
$$d_H(t_{ls}) = a(t_{ls}) \int_0^{t_{ls}} \frac{dt}{a(t)} , \qquad (20.46)$$
The subscript H on $d_H$ stands for “horizon”; more precisely, what occurs here is
what is known as a cosmological particle horizon. Its significance is that the past
light cones of events on the surface $t = t_{ls}$ that are further than $2d_H(t_{ls})$ apart do
not intersect: such events are so far apart that they were never before in causal
contact. In particular, in the present context, this means that no causal interaction
can be responsible for the temperature being the same at the two events.
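For a power-law scale factor the integral in (20.46) can be done in closed form: if $a(t) \propto t^n$ with $0 \le n < 1$, then $d_H(t) = t/(1-n)$, i.e. $d_H = 3t$ for matter domination and $d_H = 2t$ for radiation domination (units with $c = 1$). The Python sketch below checks this numerically. Treating the whole era before last scattering as a single power law is a simplifying assumption, in the same spirit as the matter-dominated estimate (20.50).

```python
import math


def horizon_closed_form(t: float, n: float) -> float:
    """d_H(t) = a(t) * integral_0^t dt'/a(t') for a(t) ~ t**n, 0 <= n < 1 (c = 1)."""
    return t / (1.0 - n)


def horizon_numeric(t: float, n: float, steps: int = 200_000, cutoff: float = 1e-15) -> float:
    """The same integral evaluated by the midpoint rule in s = ln t'.

    With t' = exp(s) the integrand t'**(-n) dt' becomes exp((1 - n) s) ds,
    which is smooth. The lower limit is t * cutoff; the neglected piece is of
    relative size cutoff**(1 - n). The constant in a(t) = const * t**n cancels.
    """
    s_lo, s_hi = math.log(t * cutoff), math.log(t)
    h = (s_hi - s_lo) / steps
    total = sum(math.exp((1.0 - n) * (s_lo + (i + 0.5) * h)) for i in range(steps))
    return t ** n * total * h


# Matter domination (n = 2/3) gives 3t, radiation domination (n = 1/2) gives 2t.
for n in (2.0 / 3.0, 0.5):
    print(n, horizon_closed_form(1.0, n), horizon_numeric(1.0, n))
```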
Note, by the way, that the properties of this cosmological particle horizon, as well
as of the dual cosmological event horizon (delineating how much of the universe
one can see at the end of the universe) are naturally described in terms of the
conformal time $\eta = \int dt/a(t)$ introduced in (18.25). This is principally due to the
fact that the causal (lightcone) structure can easily be read off from the metric
(18.27). In particular, for k = 0 this is just conformal to Minkowski space,
These notions of horizons (and the related notion of the Hubble sphere, defined
as the sphere beyond which the recessional velocity (18.17) exceeds the speed of
light, but which is not a horizon in any useful or common sense of the word) play
an important role in cosmology, but confusions and misconceptions abound.58
To return to the CMBR story, in the (matter dominated) meantime (i.e. between
$t_{ls}$ and $t_0$, today) the size of such a causal patch of size $\sim d_H(t_{ls})$ on the last
scattering surface has expanded to proper size
On the other hand, the distance over which the CMBR photons could have travelled
since $t = t_{ls}$ is
$$a(t_0) \int_{t_{ls}}^{t_0} \frac{dt}{a(t)} = 3t_0^{2/3}\left(t_0^{1/3} - t_{ls}^{1/3}\right) \sim t_0 , \qquad (20.50)$$
where we have dropped the last term since $t_0 \sim 10^{10}$ years while $t_{ls} \sim 10^5$ years
(once again all numbers here and below are just meant to be order of magnitude
estimates). Thus the region of the last scattering surface from which we receive
the CMBR photons today is much larger than a causal patch, their ratio being
(one can of course calculate this either at $t = t_{ls}$ or at $t = t_0$; here we have chosen
the latter option)
$$\frac{t_0}{t_0^{2/3}\, t_{ls}^{1/3}} = \left(\frac{t_0}{t_{ls}}\right)^{1/3} \approx 10^{5/3} . \qquad (20.51)$$
58
See T. Davis, C. Lineweaver, Expanding Confusion: common misconceptions of cosmological hori-
zons and the superluminal expansion of the Universe, arXiv:astro-ph/0310808, P. van Oirschot,
J. Kwan, G. Lewis, Through the Looking Glass: Why the “Cosmic Horizon” is not a horizon,
arXiv:1001.4795 [astro-ph.CO], G. Lewis, P. van Oirschot, How does the Hubble Sphere limit our
view of the Universe?, arXiv:1203.0032 for illuminating discussions.
What this means is that the sky splits into roughly $4\pi \cdot 10^{10/3} \gtrsim 10^4$ disconnected
patches that were never in communication before sending light to us. In view
of this the observed isotropy of the CMBR is not only astounding but utterly
implausible. However, we should not lay the blame on the CMBR but rather
conclude that this makes the standard cosmological model implausible.
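The order-of-magnitude arithmetic behind (20.51) and the patch count can be reproduced in a few lines of Python, using the values $t_0 \sim 10^{10}$ years and $t_{ls} \sim 10^5$ years from the text. Reading $4\pi$ times the squared linear ratio as the number of patches follows the rough estimate above; it is not a precise solid-angle computation.

```python
import math

T0 = 1e10    # years: age of the universe, order of magnitude as in the text
T_LS = 1e5   # years: time of last scattering, order of magnitude as in the text

# Linear size of the region we see, in units of a causal patch, eq. (20.51)
linear_ratio = (T0 / T_LS) ** (1.0 / 3.0)

# Squaring gives a ratio of areas; the 4*pi factor follows the estimate in the text
n_patches = 4.0 * math.pi * linear_ratio ** 2

print(f"linear ratio ~ {linear_ratio:.1f}, number of patches ~ {n_patches:.2e}")
```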
Again, inflation solves this in an extremely natural way. Inflation operates at a
time $t_i \ll t_{ls}$ (perhaps between $10^{-36}$ s and $10^{-32}$ s after the big bang) and it can
easily inflate a tiny causally connected patch at that time $t = t_i$ to such a size
that at time $t = t_{ls}$ it is (more than) large enough to explain the isotropy of the
CMBR.
However, in order for inflation to be considered a natural solution to the above
problems (and others I have not mentioned - see the literature cited above), one
needs to be able to show that inflation arises fairly naturally and does not itself
require a comparably huge amount of fine-tuning in order to resolve these prob-
lems. This is a complicated and intensely debated issue, and one I don’t feel
sufficiently competent and knowledgeable about to have an informed opinion, let
alone utter an opinion in public. It will perhaps only be settled once one has a
better understanding of the physics in the very early, pre-inflationary, era that is
supposed to be responsible for setting the initial conditions for inflation.
(a) Why is Λ not huge?
This is the same problem as before, and is essentially the question “why the
vacuum does not gravitate” (Polchinski; see his article for a very illuminating
discussion of this issue). The whole thing may indicate some fundamental
lack of (or perhaps mis-) understanding about how gravity couples to quan-
tum fields.60
One (on the face of it rather radical) possibility that has been explored is
to modify the Einstein equations in such a way that the gravitational field
does not couple directly to any contribution to the energy-momentum tensor
$\sim g_{\mu\nu}$. To that end one postulates the so-called trace-free Einstein equations
(note that both sides of this equation are manifestly traceless) supplemented
by the usual covariant energy-momentum tensor conservation law $\nabla_\alpha T^{\alpha\beta} = 0$
(which is not implied by the modified Einstein equations).61 While this looks
like a major departure from the usual (and well-tested) Einstein equations,
and this might make one believe that the equations (20.52) can easily be
ruled out experimentally, what actually happens is more subtle and somewhat
surprising. Namely, writing (20.52) as
and using the contracted Bianchi identities $\nabla_\alpha G^{\alpha\beta} = 0$ as well as the postulated
$\nabla_\alpha T^{\alpha\beta} = 0$, one deduces that
for some integration constant Λ. Plugging this result back into (20.52), one
finds nothing other than the usual Einstein equations with a cosmological
constant,
$$R_{\alpha\beta} - \tfrac{1}{2} g_{\alpha\beta} R + \Lambda g_{\alpha\beta} = 8\pi G_N T_{\alpha\beta} , \qquad (20.55)$$
the crucial difference being that here Λ is not determined by the matter
content but arises solely as an integration constant. While this does not
explain the observed tiny value of the cosmological constant, it separates
this issue from that of the vacuum fluctuations (however, for this reason
it may also be experimentally ruled out - see the discussion regarding “do
vacuum fluctuations gravitate?” in the reviews by Polchinski and Martin in
footnote 59).
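For completeness, here is a sketch of the argument, written in the form in which the trace-free Einstein equations are usually stated in the literature (this may differ cosmetically from the way (20.52) is displayed in the notes). Taking the divergence and using $\nabla^\alpha R_{\alpha\beta} = \frac12 \nabla_\beta R$ together with the postulated $\nabla^\alpha T_{\alpha\beta} = 0$ shows that $R + 8\pi G_N T$ is constant. Calling this constant $4\Lambda$ and eliminating $T$ via $2\pi G_N T = \Lambda - R/4$ reproduces (20.55).

```latex
\begin{aligned}
& R_{\alpha\beta} - \tfrac{1}{4} g_{\alpha\beta} R
  = 8\pi G_N \bigl(T_{\alpha\beta} - \tfrac{1}{4} g_{\alpha\beta} T\bigr) \\
&\Rightarrow\; \tfrac{1}{2}\nabla_\beta R - \tfrac{1}{4}\nabla_\beta R
  = -2\pi G_N \nabla_\beta T
  \;\Rightarrow\; \nabla_\beta \bigl(R + 8\pi G_N T\bigr) = 0 \\
&\Rightarrow\; R + 8\pi G_N T = 4\Lambda \qquad (\Lambda = \text{const.}) \\
&\Rightarrow\; R_{\alpha\beta} - \tfrac{1}{4} g_{\alpha\beta} R
  = 8\pi G_N T_{\alpha\beta} - g_{\alpha\beta}\bigl(\Lambda - \tfrac{1}{4} R\bigr) \\
&\Rightarrow\; R_{\alpha\beta} - \tfrac{1}{2} g_{\alpha\beta} R + \Lambda g_{\alpha\beta}
  = 8\pi G_N T_{\alpha\beta}
\end{aligned}
```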
60
See S. Hollands, R. Wald, Quantum Field Theory Is Not Merely Quantum Mechanics Applied to
Low Energy Effective Degrees of Freedom, arXiv:gr-qc/0405082 for some reflections on this issue.
61
See e.g. G. Ellis, H. van Elst, J. Murugan, J.-P. Uzan, On the Trace-Free Einstein Equations as
a Viable Alternative to General Relativity, arXiv:1008.1196v2 [gr-qc] and references therein for a
more detailed discussion and the history of this and other variants of unimodular gravity, originally even
considered by Einstein himself in 1919.
(b) Why is Λ not zero, and why does it have the value $\rho_\Lambda \sim (10^{-3}\ \mathrm{eV})^4$?
It is not clear at all what kind of mechanism would produce an effect of such
a size, but again anthropic reasoning (shudder!) can be invoked to argue
that the above is a plausible value for the cosmological constant. It would
be desirable to find alternative explanations.
(c) Why is ρΛ (constant in time) of the same order of magnitude as ρM (today)?
This is weird, and one version of what is known as the Coincidence Problem
in cosmology (another way of saying this is that two a priori completely
different and unrelated time-scales, namely on the one hand the age of the
universe, set approximately by the Hubble scale $H_0^{-1}$, and on the other
hand the scale $\Lambda^{-1/2}$ set by the cosmological constant / dark energy today,
are approximately equal). To see why this is peculiar, note that $\rho_\Lambda$ is constant
while $\rho_M \sim a^{-3}$, so that the ratio of the two behaves as
$$\frac{\rho_\Lambda(t)}{\rho_M(t)} \sim a(t)^3 . \qquad (20.56)$$
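To get a feeling for how special the present epoch is, one can evaluate (20.56) as a function of redshift, using $a_0/a = 1 + z$ and the present-day values $(\Omega_M)_0 = 0.3$, $(\Omega_\Lambda)_0 = 0.7$ quoted at the beginning of this section. The Python sketch below uses only these inputs; the redshift $z \approx 1100$ of the last scattering surface is a standard figure used here purely for illustration.

```python
# Present-day density parameters quoted at the start of this section
OMEGA_M0, OMEGA_L0 = 0.3, 0.7


def lambda_to_matter(z: float) -> float:
    """rho_Lambda / rho_M at redshift z: eq. (20.56) with a/a_0 = 1/(1 + z)."""
    return (OMEGA_L0 / OMEGA_M0) / (1.0 + z) ** 3


# Redshift at which the two densities were equal: (1 + z)**3 = Omega_L0 / Omega_M0
z_equal = (OMEGA_L0 / OMEGA_M0) ** (1.0 / 3.0) - 1.0

for z in (0.0, z_equal, 1.0, 1100.0):  # z ~ 1100: roughly the last scattering surface
    print(f"z = {z:8.3f}   rho_Lambda / rho_M = {lambda_to_matter(z):.3e}")
```

The two densities cross at $z \approx 0.33$, i.e. very recently, whereas at last scattering the ratio was of order $10^{-9}$.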
62
However, after writing this I became aware of the following quotation which, in the interest of a
balanced presentation, I will not withhold: “Why obfuscate? If a poet sees something that walks like a
duck and swims like a duck and quacks like a duck, we will forgive him for entertaining more fanciful
possibilities, It could be a unicorn in a duck suit - who’s to say! But we know that more likely, it’s a
duck.” R. Bousso, TASI Lectures on the Cosmological Constant, arXiv:0708.4231 [hep-th].
63
This is not to say that there is not a strong case to be made for what appears to be an extreme
fine-tuning of other (standard model) parameters to support everything from primordial nucleosynthesis
to chemistry and life as we know it - see e.g. L. Barnes, The Fine-Tuning of the Universe for Intelligent
Life, arXiv:1112.4647 [physics.hist-ph] for a recent assessment of the situation. However, among
other things precisely because of the proviso “as we know it” this begs the question and the significance
of these findings is far from clear and difficult to assess.
21 Cosmology IV: Some Exact Solutions
We have seen that a lot can be learnt about the Friedmann-Robertson-Walker models
without ever having to solve a differential equation. On the other hand, more precise
information can be obtained by specifying an equation of state for the matter content
and solving explicitly the Friedmann equations. We have already seen some special
exact solutions (k = Λ = 0 and only one species of matter) in section 20.5. Solutions
with $k \neq 0$ but $\Lambda = 0$ are also easy to obtain. Together with the exact solutions for
just $k \neq 0$ and $\Lambda \neq 0$, but with no other matter, these are the most useful solutions in
practice, but it is also possible to solve the equations explicitly in some other cases.64
Let us start our excursion into exact solutions by looking at the totally unphysical case
of a completely empty universe, i.e. ρ = p = Λ = 0, and only k possibly not zero. As
trivial as this may be, it has its pedagogical value, which is why I am including this
case here.
It should not come as a surprise that the metric we have found is just that of Minkowski
space (the constant a0 can be absorbed into a rescaling of the spatial coordinates).
The case k = −1 is a bit more interesting. In that case the equation to solve is (I will
call the time-coordinate τ instead of t for reasons that will perhaps become apparent
below)
$$\dot a^2 = +1 \quad\Rightarrow\quad a(\tau) = \pm(\tau - \tau_0) . \qquad (21.3)$$
This appears to describe a non-trivial universe (known as the Milne universe) with either
(for $a(\tau) = +\tau$) a big bang at $\tau = 0$ and with the universe subsequently expanding
linearly with $\tau > 0$, or (for $a(\tau) = -\tau$) with a linearly contracting universe for $\tau < 0$,
ending in a “big crunch” at $\tau = 0$, and all this in spite of the fact that the universe
is empty. This is deceptive, however (not that it is empty, but that it is non-trivial).
64
For explicit solutions of the Friedmann equations for $\Lambda \neq 0$ and non-trivial matter content
(generalising the solution (21.55)) see e.g. B. Aldrovandi, R. Cuzinatto, L. Medeiros, Analytic solutions for
the Λ-FRW Model, arXiv:gr-qc/0508073.
As always, we need to be careful to disentangle coordinate artefacts from genuine ge-
ometrical statements. Indeed, this space-time is again nothing other than (a part of)
Minkowski space-time. To see this, start with the Minkowski metric
and introduce coordinates that are adapted to the family of space-like hyperboloids
For t > 0 these hyperboloids fill (foliate) the interior of the future lightcone at the origin
(and the interior of the past lightcone for t < 0), and if you draw these hyperboloids you
can perceive what appears to be a non-trivial dynamical evolution of these surfaces. To
show that this “fake” dynamics is precisely the dynamics exhibited by the Milne metric,
introduce new coordinates (τ, ρ, angles) via
Then the Minkowski metric becomes precisely the Milne metric (21.4),
Remarks:
1. These coordinates only cover a part of Minkowski space-time, namely the interior
of the future (and past) lightcone of the origin.
2. They are, as may have occurred to you by now, the future and past relatives of
the Rindler coordinates discussed (way back) in section 1.3 (and then again in the
context of the near-horizon geometry of the Schwarzschild metric in section 14.5),
which can be used to cover the left and right Rindler wedges of the Minkowski
space-time.
In the present 3+1 (rather than 1+1) dimensional setting, one can introduce
Rindler-like coordinates (τ, ρ, angles) adapted to the timelike hyperboloids
$$\vec x^2 - t^2 = \rho^2 \qquad (21.10)$$
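via
$$t = \rho \sinh\tau , \qquad \vec x = \vec n\, \rho \cosh\tau , \qquad (21.11)$$
with $\vec n$ a unit vector. In terms of these the Minkowski metric takes the familiar Rindler-like form
$$ds^2 = -\rho^2 d\tau^2 + d\rho^2 + \rho^2 \cosh^2\tau\, d\Omega_2^2 = d\rho^2 + \rho^2\left(-d\tau^2 + \cosh^2\tau\, d\Omega_2^2\right) \qquad (21.12)$$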
The metric in brackets will reappear below, in the guise of the (2+1)-dimensional
de Sitter metric.
slices of constant r are of the form $S^2 \times \mathbb{R}$ (and we will recognise these below,
in more fancy terminology, as (2+1)-dimensional versions of the Einstein
static universe);
• in Milne coordinates, slices of constant τ are hyperboloids $H^3$;
• in the Rindler-like coordinates (21.12), slices of constant ρ are de Sitter
spaces $dS_3$.
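Both coordinate changes can be checked by numerically pulling back the $(t, r)$ part $-dt^2 + dr^2$ of the Minkowski metric. The sketch below assumes that the Milne metric (21.4) has the standard $k = -1$ form $-d\tau^2 + \tau^2(d\rho^2 + \sinh^2\rho\, d\Omega^2)$, obtained from the parametrisation $t = \tau\cosh\rho$, $r = \tau\sinh\rho$ of the interior of the future lightcone (this parametrisation is an assumption of the sketch). The Rindler-like coordinates are those of (21.11), with $r = |\vec x|$. One expects the components $(-1, 0, \tau^2)$ for Milne and $(-\rho^2, 0, 1)$ for (21.12); the angular part is simply $r^2 d\Omega^2$ expressed in the new coordinates.

```python
import math


def pullback_2d(t_of, r_of, u, v, h=1e-6):
    """Pull back the (t, r) part -dt^2 + dr^2 of the Minkowski metric to (u, v).

    t_of(u, v) and r_of(u, v) give Minkowski time and radius; partial
    derivatives are approximated by central differences. The angular part
    r^2 dOmega^2 simply becomes r_of(u, v)**2 dOmega^2.
    Returns (g_uu, g_uv, g_vv).
    """
    t_u = (t_of(u + h, v) - t_of(u - h, v)) / (2 * h)
    t_v = (t_of(u, v + h) - t_of(u, v - h)) / (2 * h)
    r_u = (r_of(u + h, v) - r_of(u - h, v)) / (2 * h)
    r_v = (r_of(u, v + h) - r_of(u, v - h)) / (2 * h)
    return (-t_u * t_u + r_u * r_u, -t_u * t_v + r_u * r_v, -t_v * t_v + r_v * r_v)


# Milne coordinates (ASSUMED parametrisation): t = tau cosh(rho), r = tau sinh(rho)
def milne_t(tau, rho):
    return tau * math.cosh(rho)


def milne_r(tau, rho):
    return tau * math.sinh(rho)


# Rindler-like coordinates (21.11): t = rho sinh(tau), |x| = rho cosh(tau)
def rindler_t(tau, rho):
    return rho * math.sinh(tau)


def rindler_r(tau, rho):
    return rho * math.cosh(tau)


tau, rho = 1.3, 0.7
print(pullback_2d(milne_t, milne_r, tau, rho), tau ** 2)      # approx (-1, 0, tau^2)
print(pullback_2d(rindler_t, rindler_r, tau, rho), rho ** 2)  # approx (-rho^2, 0, 1)
```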
This particular solution is mainly of historical interest. Before the discovery and un-
derstanding of the Hubble expansion of the universe, it was natural to assume that
the universe can be described by a static solution compatible with the cosmological
principle (homogeneity and isotropy). This led Einstein to introduce the (in-)famous
cosmological constant. Let us see how this comes about.
(F3) ⇒ ρ̇ = 0 . (21.14)
In particular, this implies that also ṗ = 0, and for a standard matter content of the
universe, say $\rho = \rho_m + \rho_r$, this moreover requires that $\Lambda > 0$,
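Finally, the first Friedmann equation (F1) now becomes an algebraic equation for $a(t) = a_0$, namely
$$\text{(F1)} \;\Rightarrow\; \frac{k}{a_0^2} = \frac{8\pi G_N}{3}\rho + \frac{\Lambda}{3} = 4\pi G_N(\rho + p) \;\Rightarrow\; k = +1 \ \text{ and } \ a_0^2 = \left(4\pi G_N(\rho + p)\right)^{-1} . \qquad (21.17)$$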
This is thus a static solution of the Friedmann equations,
Remarks:
1. The topology of the solution is $\mathbb{R} \times S^3$, the radius $a_0$ of the $S^3$ being smaller for
larger energy and pressure density (bigger gravitational attraction) and vice-versa.
3. The precise balance between matter and cosmological constant required by this
solution also implies that it is unstable to perturbations of either Λ or ρ, p: a
slight increase in Λ relative to ρ + p, no matter how small, will make the universe
expand, and a slight decrease will make it collapse. This alone is enough to make
this particular solution unphysical and shows that, even with inclusion of a cosmo-
logical constant, an expanding or collapsing universe is practically inevitable, thus
undermining the original motivation for introducing the cosmological constant in
the first place.
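The instability described in remark 3 can be seen in a small numerical experiment. The sketch below takes dust ($p = 0$) and units with $G_N = 1$, $\Lambda = 1$, so that the static radius is $a_0 = 1$, and releases the universe from rest slightly away from $a_0$. Via (F1) this corresponds to slightly less matter relative to Λ (for $a > a_0$) or slightly more (for $a < a_0$). The integrator (classical fourth-order Runge-Kutta) and all numerical parameters are choices made for this illustration only.

```python
import math


def static_radius(lam: float) -> float:
    """Einstein static universe filled with dust (p = 0), in units G_N = 1.

    Static (F2) requires Lambda = 4 pi rho, and (21.17) then gives
    a0**2 = 1 / (4 pi rho) = 1 / Lambda.
    """
    return 1.0 / math.sqrt(lam)


def evolve(a_init, lam=1.0, dt=1e-3, t_max=30.0, a_min=0.1, a_max=10.0):
    """Integrate a closed (k = +1) dust + Lambda universe released at rest at a_init.

    The matter constant C = (8 pi / 3) rho a**3 is fixed by (F1) with adot = 0,
    i.e. 1 = C / a_init + lam * a_init**2 / 3. (F2) then reads
    addot = -C / (2 a**2) + lam * a / 3. Classical RK4; stops once a leaves
    (a_min, a_max) or t reaches t_max, and returns (a, t) at that moment.
    """
    c = a_init - lam * a_init ** 3 / 3.0

    def accel(a):
        return -c / (2.0 * a * a) + lam * a / 3.0

    a, v, t = a_init, 0.0, 0.0
    while a_min < a < a_max and t < t_max:
        k1a, k1v = v, accel(a)
        k2a, k2v = v + 0.5 * dt * k1v, accel(a + 0.5 * dt * k1a)
        k3a, k3v = v + 0.5 * dt * k2v, accel(a + 0.5 * dt * k2a)
        k4a, k4v = v + dt * k3v, accel(a + dt * k3a)
        a += dt * (k1a + 2 * k2a + 2 * k3a + k4a) / 6.0
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        t += dt
    return a, t


a0 = static_radius(1.0)
for factor in (1.0, 1.01, 0.99):   # exact balance, slightly larger, slightly smaller
    print(factor, evolve(a0 * factor))
```

Started exactly at $a_0$ the universe stays put (up to rounding errors, which are themselves amplified exponentially), while a one percent displacement in either direction runs away to unbounded expansion or to collapse within a few multiples of $\Lambda^{-1/2}$.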
1. For k = 0, this is the equation we already discussed above, leading to the solution
(20.31),
$$a(t) = a_0 (t/t_0)^{2/3} \qquad (21.21)$$
The equation (21.23) can be solved in closed form for t as a function of a, and the
solution to
$$\frac{dt}{da} = \left(\frac{a}{a_{max} - a}\right)^{1/2} \qquad (21.25)$$
is
$$t(a) = \frac{a_{max}}{2} \arccos\left(1 - 2a/a_{max}\right) - \sqrt{a\,a_{max} - a^2} , \qquad (21.26)$$
as can easily be verified. The universe starts at t = 0 with a(0) = 0, reaches its
maximum a = amax at
Denoting a derivative with respect to η by a prime, and noting that $\dot a = a'/a$, one
then finds that for $k \neq 0$ the Friedmann equation (21.20) can be written as
$$\dot a^2 + k = \frac{C_m}{a} \;\Leftrightarrow\; (a')^2 + k a^2 = C_m a \;\Leftrightarrow\; \left((a - C_m/2k)'\right)^2 + k\,(a - C_m/2k)^2 = k C_m^2/4 . \qquad (21.29)$$
Thus for k = +1, the solution to the Friedmann equation can be written as
Choosing $\eta_0 = \pi$ (so that $a(\eta = 0) = 0$), and integrating the relation $dt/d\eta = a(\eta)$
to find $t(\eta)$, one then finds the solution
$$a(\eta) = \frac{a_{max}}{2}(1 - \cos\eta) , \qquad t(\eta) = \frac{a_{max}}{2}(\eta - \sin\eta) , \qquad (21.31)$$
which makes it transparent that the curve is indeed a cycloid, roughly as indicated
in Figure 22.
The maximal radius is reached at
(with $a_{max} = C_m$), as before, and the total lifetime of the universe is $2t_{max}$.
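The parametric solution (21.31) and the explicit $t(a)$ of (21.26) can be cross-checked numerically. The Python sketch below, with the arbitrary choice $a_{max} = C_m = 1$, verifies that (21.31) satisfies $\dot a^2 + 1 = C_m/a$ (using $\dot a = a'/a$), that it agrees with (21.26) on the expanding branch, and that maximal expansion occurs at $t_{max} = \pi a_{max}/2$, so that the total lifetime is $\pi a_{max}$.

```python
import math

A_MAX = 1.0  # a_max = C_m, arbitrary units


def a_of_eta(eta: float) -> float:
    return 0.5 * A_MAX * (1.0 - math.cos(eta))       # eq. (21.31)


def t_of_eta(eta: float) -> float:
    return 0.5 * A_MAX * (eta - math.sin(eta))       # eq. (21.31)


def t_of_a(a: float) -> float:
    """Expanding branch of eq. (21.26), valid for 0 <= a <= a_max."""
    return 0.5 * A_MAX * math.acos(1.0 - 2.0 * a / A_MAX) - math.sqrt(max(0.0, a * A_MAX - a * a))


def friedmann_residual(eta: float) -> float:
    """adot**2 + k - C_m / a for k = +1, using adot = a'/a with ' = d/d eta."""
    a = a_of_eta(eta)
    adot = 0.5 * A_MAX * math.sin(eta) / a
    return adot ** 2 + 1.0 - A_MAX / a


t_max = t_of_eta(math.pi)   # maximal expansion at eta = pi
print(t_max, math.pi * A_MAX / 2, 2 * t_max)
```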
Remarks:
1. We see that for small times (for which matter dominates over curvature) the
solutions for $k \neq 0$ reduce to $t \sim \eta^3$, $a \sim \eta^2$ and therefore $a \sim t^{2/3}$, which is indeed
the exact solution for k = 0.
2. Analogously, for late times in the k = −1 model one finds that a(η) ∼ t(η),
reproducing the expected late-time behaviour ȧ → 1 of section 20.3.
3. In section 23 we will use the exact solutions of this matter dominated phase to
describe the interior geometry of collapsing stars.
We now consider the situation when radiation is dominant (as is expected during some
time in the very early universe). In this case we need to solve
Because a(t) appears only quadratically, it is convenient to make the change of variables
$b = a^2$. Then one obtains
$$\frac{\dot b^2}{4} + k b = C_r . \qquad (21.35)$$
1. For k = 0 we had already seen the solution in (20.29),
2. For $k = \pm 1$, one necessarily has $b(t) = b_0 + b_1 t + b_2 t^2$, since differentiating (21.35)
gives $\ddot b = -2k$. Fixing $b(0) = 0$, one easily finds the solution
$$a(t) = \left[2C_r^{1/2} t - k t^2\right]^{1/2} . \qquad (21.37)$$
Remarks:
3. For $k = -1$, on the other hand, the universe expands forever, the late-time behaviour
being given by $a(t) \simeq (-k t^2)^{1/2} = t$, again as expected.
All this is of course in agreement with the results of the qualitative discussion given
earlier.
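The radiation-dominated solutions (21.37) can be checked in the same spirit. The sketch below (with the arbitrary choice $C_r = 1$) evaluates the residual of (21.35) for $k = 0, \pm 1$. It also exhibits the recollapse of the $k = +1$ model at $t = 2C_r^{1/2}$, where $a$ returns to zero, and the late-time behaviour $a(t) \approx t$ for $k = -1$.

```python
import math


def a_radiation(t: float, k: int, c_r: float = 1.0) -> float:
    """Eq. (21.37): a(t) = [2 sqrt(C_r) t - k t**2]**(1/2), with b(0) = a(0)**2 = 0."""
    return math.sqrt(2.0 * math.sqrt(c_r) * t - k * t * t)


def residual(t: float, k: int, c_r: float = 1.0, h: float = 1e-6) -> float:
    """Residual of eq. (21.35), bdot**2 / 4 + k b - C_r, with b = a**2.

    b is quadratic in t (bddot = -2k), so the central difference for bdot is
    exact up to rounding.
    """
    def b(s: float) -> float:
        return a_radiation(s, k, c_r) ** 2

    bdot = (b(t + h) - b(t - h)) / (2.0 * h)
    return bdot ** 2 / 4.0 + k * b(t) - c_r


# k = +1 recollapses at t = 2 sqrt(C_r); k = -1 approaches a(t) ~ t at late times
print(a_radiation(2.0, +1), a_radiation(1e4, -1) / 1e4)
```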
This case is of considerable interest for two reasons. On the one hand, as we know, Λ is
the dominant driving force for very large a, and may therefore, if current observations
are to be believed, dominate the late-time behaviour of our universe.
On the other hand, the currently most popular cosmological models trying to also ad-
dress and solve the so-called horizon problem and flatness problem of the standard FRW
model of cosmology (as well as a number of other issues) use a mechanism called in-
flation based on an era of exponential expansion during some time in the very early
universe. This is typically generated by something that acts effectively like a cosmolog-
ical constant.
Thus the equations to solve are, setting ρ = 0 and p = 0 but retaining Λ and k,
$$\dot a^2 = -k + \frac{\Lambda}{3} a^2 . \qquad (21.39)$$
We see immediately that Λ has to be positive for k = +1 or k = 0, whereas for k = −1
both positive and negative Λ are possible,
$$\dot a^2 = -k + \frac{\Lambda}{3} a^2 \;\Rightarrow\; \begin{cases} \Lambda > 0 : & k = 0, \pm 1 \ \text{possible} \\ \Lambda < 0 : & k = -1 \end{cases} \qquad (21.40)$$
This is one instance where the solution to the second order equation (F2),
$$\ddot a = \frac{\Lambda}{3} a , \qquad (21.41)$$
is more immediate, namely trigonometric functions for Λ < 0 (only possible for k = −1)
and hyperbolic functions for Λ > 0. The first order equation then fixes the constants
of integration according to the value of k. It will be convenient to express Λ in terms
of a length-scale ℓ (which will turn out to be the curvature radius of the space-time)
through
$$|\Lambda|/3 = \ell^{-2} \qquad (21.42)$$
so that the two Friedmann equations read
Remarks:
2. In fact, all 3 metrics represent the same space-time in different coordinates, namely
the de Sitter space-time. One way to establish this would be to directly exhibit the
coordinate transformations that map one metric to the other, but this is messy
and does not provide any additional insight. We will proceed in a different way
below.
3. Indeed, it turns out that the de Sitter solution is the unique maximally symmetric
space-time with positive curvature (cf. the discussion in sections 17 and 24), and
this perspective will provide us with a more efficient and insightful way of con-
structing different coordinate systems and exploring the relations among them.
5. These 3 metrics exhibit different slicings of the de Sitter space-time (or, henceforth,
de Sitter space for short, or just dS space), with the $t = \text{const.}$ slices being $\mathbb{R}^3$, $S^3$
and $H^3$ respectively.
6. The coordinates appearing in the k = +1 metric turn out to cover de Sitter space
globally.
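For $\Lambda > 0$ the standard solutions of (21.39) and (21.41) are $a(t) = \ell e^{t/\ell}$ for $k = 0$, $a(t) = \ell\cosh(t/\ell)$ for $k = +1$ and $a(t) = \ell\sinh(t/\ell)$ for $k = -1$ (up to shifts of t and, for $k = 0$, a rescaling of a). These explicit forms are assumptions of this sketch rather than quotations of the displayed solutions; the Python code merely confirms that each satisfies both Friedmann equations.

```python
import math

ELL = 2.0  # curvature length, Lambda / 3 = ELL**-2 (eq. (21.42) with Lambda > 0)

# Standard Lambda > 0 solutions, one per k (ASSUMED forms, see the lead-in)
DE_SITTER = {
    0: lambda t: ELL * math.exp(t / ELL),     # t = const. slices R^3
    +1: lambda t: ELL * math.cosh(t / ELL),   # t = const. slices S^3
    -1: lambda t: ELL * math.sinh(t / ELL),   # t = const. slices H^3
}


def residuals(k: int, t: float, h: float = 1e-4):
    """Residuals of (21.39) adot^2 = -k + a^2/ELL^2 and (21.41) addot = a/ELL^2."""
    a = DE_SITTER[k]
    adot = (a(t + h) - a(t - h)) / (2.0 * h)
    addot = (a(t + h) - 2.0 * a(t) + a(t - h)) / h ** 2
    return adot ** 2 - (-k + a(t) ** 2 / ELL ** 2), addot - a(t) / ELL ** 2


for k in (0, +1, -1):
    print(k, residuals(k, 0.5), residuals(k, 1.5))
```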
In this case, the only possibility (within the Robertson-Walker ansatz for the metric) is
k = −1, and the solution is
Remarks:
1. This solution is nowadays known as the anti-de Sitter space-time (or anti-de Sitter
space or AdS space for short).
3. AdS space turns out to be the unique maximally symmetric space-time with neg-
ative curvature.
4. A detailed discussion of many different coordinate systems for anti-de Sitter space
is given in section 24.3 below.
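For $\Lambda < 0$ and $k = -1$, a short check shows that $a(t) = \ell\sin(t/\ell)$ solves both Friedmann equations (this explicit form is the standard one and is assumed here). Its zeros at $t = 0$ and $t = \pi\ell$ cannot be genuine singularities, since AdS space is maximally symmetric and hence has constant curvature; as for the Milne universe, they signal a breakdown of the coordinates.

```latex
\begin{aligned}
\dot a^2 &= \cos^2(t/\ell) = 1 - \frac{a^2}{\ell^2} = -k + \frac{\Lambda}{3}\, a^2
  && (k = -1,\ \Lambda/3 = -\ell^{-2}) \\
\ddot a &= -\frac{1}{\ell}\sin(t/\ell) = -\frac{a}{\ell^2} = \frac{\Lambda}{3}\, a
\end{aligned}
```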
The exact solution for a spatially flat universe dominated by non-interacting (cold dark)
matter was already given in (20.31),
and that for a spatially flat universe dominated by a cosmological constant was given
in (20.33),
$$a(t) = a_0\, e^{\sqrt{\Lambda/3}\,(t - t_0)} . \qquad (21.50)$$
An exact solution of the Friedmann equations can also be written down when both types
of energy/matter are present simultaneously (as in the universe today). In this case the
Friedmann equation is just
$$\Omega_M + \Omega_\Lambda = 1 . \qquad (21.51)$$
This evidently reproduces the above power-law (exponential) behaviour at early (late)
times.
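With $H_0$ the present Hubble rate, the first Friedmann equation for this case reads $H^2 = H_0^2\left(\Omega_M (a_0/a)^3 + \Omega_\Lambda\right)$, and its standard closed-form solution with $a(0) = 0$ is $a(t) = a_0 (\Omega_M/\Omega_\Lambda)^{1/3}\sinh^{2/3}\bigl(\tfrac32\sqrt{\Omega_\Lambda}\, H_0 t\bigr)$. This explicit formula is assumed here, not quoted from the notes. Since $\sinh x \approx x$ for small x and $\sinh x \approx e^x/2$ for large x, it interpolates between $a \propto t^{2/3}$ and $a \propto e^{\sqrt{\Lambda/3}\,t}$ (using $\Omega_\Lambda H_0^2 = \Lambda/3$). The Python sketch checks it numerically and computes the age $t_0$ at which $a(t_0) = a_0$ for $(\Omega_M, \Omega_\Lambda) = (0.3, 0.7)$.

```python
import math

OMEGA_M, OMEGA_L = 0.3, 0.7   # present-day values, satisfying (21.51)
H0 = 1.0                      # time measured in units of the Hubble time 1/H0
A0 = 1.0                      # scale factor today, a(t0)


def a_lcdm(t: float) -> float:
    """Flat matter + Lambda scale factor (ASSUMED standard closed form), a(0) = 0."""
    x = 1.5 * math.sqrt(OMEGA_L) * H0 * t
    return A0 * (OMEGA_M / OMEGA_L) ** (1.0 / 3.0) * math.sinh(x) ** (2.0 / 3.0)


def hubble_residual(t: float, h: float = 1e-6) -> float:
    """(adot/a)**2 - H0**2 [Omega_M (a0/a)**3 + Omega_L], adot by central difference."""
    a = a_lcdm(t)
    adot = (a_lcdm(t + h) - a_lcdm(t - h)) / (2.0 * h)
    return (adot / a) ** 2 - H0 ** 2 * (OMEGA_M * (A0 / a) ** 3 + OMEGA_L)


# Age of the universe: sinh(x0) = sqrt(Omega_L / Omega_M) makes a(t0) = a0
t0 = 2.0 / (3.0 * math.sqrt(OMEGA_L) * H0) * math.asinh(math.sqrt(OMEGA_L / OMEGA_M))
print(f"H0 * t0 = {H0 * t0:.3f}")
```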
Part III: Selected (Semi-)Advanced Topics
Until now, our treatment of the basic structures and properties of General Relativity
has been rather systematic. Part III contains a biased and varied selection of more
advanced fun topics.
22 The Reissner-Nordstrøm Solution
$$ds^2 = -f(r)dt^2 + f(r)^{-1}dr^2 + r^2 d\Omega^2 , \qquad f(r) = 1 - \frac{2m}{r} + \frac{q^2}{r^2} . \qquad (22.1)$$
In this section we will analyse various aspects of this metric. Some of these depend on
the specific form of f (r), e.g. the analysis of the motion of (charged) particles in section
22.5. Others, such as the construction of Eddington-Finkelstein and Kruskal-Szekeres
coordinates in sections 22.7 and 22.9, are valid more generally for static black holes of
the ubiquitous (see e.g. the examples in section 15.10) $-f(r)dt^2 + f(r)^{-1}dr^2$ form, and
these will initially be discussed in this more general context before specialising to the
Reissner-Nordstrøm metric.
In order to obtain the solution we again start with the standard form (12.5)
of a spherically symmetric metric, and for the gauge field make the electrostatic ansatz
that
$$A_t = A_t(r) \equiv -\phi(r) , \qquad A_r = A_\theta = A_\phi = 0 , \qquad (22.3)$$
with φ(r) the usual scalar potential. Thus the only non-vanishing component of the
field strength tensor is
Note that this is traceless, as it should be. Moreover, the combination
vanishes identically so that as in the Schwarzschild case (12.20) - (12.22) one concludes
that
$$B R_{tt} + A R_{rr} = 0 \;\Rightarrow\; A(r) = B(r)^{-1} \equiv f(r) . \qquad (22.10)$$
But this means that the determinant of the metric is the same as the determinant of
the flat metric. Recalling that
$$\nabla_\alpha F^{\alpha\beta} = \frac{1}{\sqrt{g}}\,\partial_\alpha\left(\sqrt{g}\,F^{\alpha\beta}\right) \qquad (22.11)$$
this in turn implies that the usual Minkowski space solution of the Maxwell equations
describing the electric field of an electric point charge with charge Q at r = 0, namely
is also a solution of the Maxwell equations in the gravitational background we are trying
to determine. Thus we can take the matter source of the Einstein equations to be given
by the standard electrostatic field $E_r = Q/r^2$ of a point charge. The energy-momentum
tensor then reduces to
$$(T_{tt}, T_{rr}, T_{\theta\theta}, T_{\phi\phi}) = \frac{Q^2}{8\pi r^4}\left(f(r), -f(r)^{-1}, r^2, r^2\sin^2\theta\right) , \qquad (22.14)$$
which can also be written as
$$T^\alpha{}_\beta = \frac{Q^2}{8\pi r^4}\,\mathrm{diag}(-1, -1, +1, +1) . \qquad (22.15)$$
Note the characteristic negative radial pressure! Note also that it is a fortuitous coinci-
dence (or hindsight, if you will) in this case that, by doing things in the right order, we
have been able to more or less decouple the matter and gravitational equations. Usually
one cannot just plug one’s favourite Minkowski solution to the equations of motion into
the energy-momentum tensor and then solve the Einstein equations because there is
no guarantee that the initial solution will also be a solution of the matter equations of
motion in the resulting curved space-time.
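As for Schwarzschild, the fact that $A(r) = B(r)^{-1} = f(r)$ implies (cf. (12.19) and
(12.23)) that
$$R_{\theta\theta} = 1 - f - r f' \qquad (22.16)$$
and plugging this into the $(\theta\theta)$-component of the Einstein equation (which, since the
energy-momentum tensor is traceless, reads $R_{\theta\theta} = 8\pi G_N T_{\theta\theta}$) one finds
$$1 - f - r f' = 8\pi G_N T_{\theta\theta} = G_N \frac{Q^2}{r^2} \;\Rightarrow\; f = 1 + \frac{C}{r} + G_N \frac{Q^2}{r^2} . \qquad (22.17)$$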
One can now verify that this is a solution of the complete set of Einstein(-Maxwell)
equations.
Comparison with the Schwarzschild solution, and introducing, in analogy with the gravitational
mass radius $m = G_N M$, the gravitational charge radius q via
$$q^2 = G_N Q^2 \qquad (22.18)$$
then gives
$$f(r) = 1 - \frac{2m}{r} + \frac{q^2}{r^2} \qquad (22.19)$$
and finally the Reissner-Nordstrøm solution
$$ds^2 = -\left(1 - \frac{2m}{r} + \frac{q^2}{r^2}\right)dt^2 + \left(1 - \frac{2m}{r} + \frac{q^2}{r^2}\right)^{-1} dr^2 + r^2 d\Omega^2 \qquad (22.20)$$
The main new features of the Reissner-Nordstrøm metric, compared with the Schwarzschild
metric, are all due to the fact that the function f(r) now potentially has two
roots, at
$$r_\pm = m \pm \sqrt{m^2 - q^2} . \qquad (22.21)$$
Thus one has to distinguish the three cases $m^2 - q^2 < 0$, $= 0$, $> 0$, corresponding to the
three possibilities for the relative size of the gravitational mass radius $m = G_N M$ and
the gravitational charge radius $q = \sqrt{G_N}\,Q$, and we will discuss these three possibilities
in turn below.
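The two steps above, solving (22.17) and locating the roots (22.21), are easy to check numerically. The sketch below works directly with the radii m and q (so no value of $G_N$ is needed), confirms that $f(r) = 1 - 2m/r + q^2/r^2$ satisfies $1 - f - rf' = q^2/r^2$, and sorts the three cases $m^2 - q^2 < 0$, $= 0$, $> 0$ by the number of horizons.

```python
import math


def f_rn(r: float, m: float, q: float) -> float:
    """f(r) = 1 - 2m/r + q^2/r^2, eq. (22.19), with q^2 = G_N Q^2 as in (22.18)."""
    return 1.0 - 2.0 * m / r + q * q / (r * r)


def theta_theta_residual(r: float, m: float, q: float, h: float = 1e-6) -> float:
    """Residual of (22.17), 1 - f - r f' - q^2/r^2, with f' by central difference."""
    fprime = (f_rn(r + h, m, q) - f_rn(r - h, m, q)) / (2.0 * h)
    return 1.0 - f_rn(r, m, q) - r * fprime - q * q / (r * r)


def horizons(m: float, q: float):
    """Roots r_- <= r_+ of f from (22.21), or None if m^2 < q^2 (no horizon)."""
    disc = m * m - q * q
    if disc < 0:
        return None
    root = math.sqrt(disc)
    return (m - root, m + root)


for m, q in ((1.0, 0.5), (1.0, 1.0), (1.0, 1.5)):
    print(m, q, horizons(m, q), theta_theta_residual(2.0, m, q))
```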
Remarks:
1. Instead of starting with (22.2), one can also start with the ansatz (12.51)
for which the Einstein equations take the form (12.54). In particular, one concludes
and
$$h'(t,r) = 4\pi G_N r f(t,r)^{-1}\left(-T^t{}_t + T^r{}_r\right) = 0 , \qquad (22.24)$$
so that, without loss of generality, we can choose $h(t,r) = 0$. The remaining
equation for $m(r)$ is
$$m'(r) = 4\pi G_N r^2 \left(-T^t{}_t\right) = \frac{G_N Q^2}{2r^2} , \qquad (22.25)$$
leading to
$$m(r) = m - \frac{G_N Q^2}{2r} = m - \frac{q^2}{2r} \qquad (22.26)$$
and thus to (22.19),
$$f(r) = 1 - \frac{2m(r)}{r} = 1 - \frac{2m}{r} + \frac{q^2}{r^2} . \qquad (22.27)$$
2. As a check on the dimensions note that Newton's constant has dimensions (M
mass, L length, T time) $[G_N] = \mathrm{M}^{-1}\mathrm{L}^3\mathrm{T}^{-2}$ so that (12.28)
while in Gauss units the Coulomb force has no dimensionful factors apart from
$Q_1 Q_2/r^2$, and therefore (force $F = ma$ having units $[F] = \mathrm{M}\mathrm{L}\mathrm{T}^{-2}$)
In this case of an “overcharged” star (this is not a very realistic situation to put it
mildly), f (r) has no real roots, and the coordinate system is valid all the way to r = 0
(where there is a curvature singularity). In particular, the coordinate t is always time-
like and the coordinate r is always space-like. While this may sound quite pleasing,
much less insane than what happens for the Schwarzschild metric, this is actually a
disaster.
Note that m2 − q 2 < 0 includes as a special case the solution with m = 0, supposedly
describing the gravitational field of a massless charged object. As shown in section
16.4, m measures the total energy / mass of the system (including, therefore, the pos-
itive electrostatic energy of the solution). There is thus clearly something disturbingly
unphysical about this solution.
Since, beyond exhibiting a naked timelike singularity, the causal structure of these space-times
is not particularly interesting, and they are considered to be unphysical (and,
ideally, excluded by cosmic censorship), we shall not discuss this case any further in the
following and just note the following curious fact: if one wanted to model elementary
particles such as the electron classically as point particles with a given mass and charge
(but one should, in any case, resist that temptation), they would satisfy $q^2 > m^2$ by a
wide margin (since the gravitational interaction is completely negligible compared with
the Coulomb repulsion).
65
See e.g. R. Wald, Gravitational Collapse and Cosmic Censorship, arXiv:gr-qc/9710068.
However, one should not conclude from this that elementary particles should hence be
modelled by naked singularities - electrons are essentially quantum mechanical objects
and outside the regime of validity / applicability of classical general relativity, and thus
outside the regime of validity of the present considerations.
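To quantify “by a wide margin”: with $q = \sqrt{G_N}\,Q/c^2$ and $m = G_N M/c^2$ one has $q/m = Q/(\sqrt{G_N}\,M)$, in which c drops out. The sketch below evaluates this for the electron in Gaussian (CGS) units, with approximate values of the constants entered by hand.

```python
import math

# Approximate CGS-Gaussian values, entered by hand for this illustration
E_CHARGE = 4.80320e-10    # electron charge, statcoulomb
M_ELECTRON = 9.10938e-28  # electron mass, g
G_NEWTON = 6.67430e-8     # cm^3 g^-1 s^-2

# q / m = (sqrt(G) Q / c^2) / (G M / c^2) = Q / (sqrt(G) M): c drops out
q_over_m = E_CHARGE / (math.sqrt(G_NEWTON) * M_ELECTRON)
print(f"q/m for the electron ~ {q_over_m:.2e}")
```

The result is of order $10^{21}$, so $q^2 > m^2$ by more than forty orders of magnitude.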
Remarks:
(with all $q_k > 0$, say). This is a special case of the Majumdar-Papapetrou class
of solutions (characterised in general by an equality $\sqrt{G_N}\,\rho_m = \rho_e$ between the
matter and charge densities), and describes a multi-centered extremal black hole
solution of the Einstein-Maxwell equations, with the mutual gravitational attraction
precisely balanced by the electrostatic repulsion, and with mass $m = \sum_k q_k$.
Such extremal black holes (typically characterised by the saturation of an in-
equality between mass and charges) arise naturally as supersymmetric solutions
of supergravity theories. As such they enjoy particular stability properties and
provide useful toy-models for all kinds of considerations.
As for the Schwarzschild metric (section 14.5), it is instructive to look at the geometry
of the solution in the near-horizon region r̃ → 0. In the Schwarzschild case (with
$\tilde r = r - 2m$ of course) this gave us
$$ds^2 = -\frac{\tilde r}{2m}\, dt^2 + \frac{2m}{\tilde r}\, d\tilde r^2 + (2m)^2 d\Omega^2 \qquad (22.38)$$
and then the Rindler-like metric (14.45)
Here, because of the double pole / zero of the extremal metric at $r = m$, one finds
instead that for $\tilde r \to 0$ one has
$$ds^2 = -\frac{\tilde r^2}{m^2}\, dt^2 + \frac{m^2}{\tilde r^2}\, d\tilde r^2 + m^2 d\Omega^2 . \qquad (22.40)$$
In particular, this metric factorises, i.e. has a product structure, the second factor just
being the standard metric on the 2-sphere with constant radius m. To identify the first
factor, introduce the coordinate $y = m^2/\tilde r$. Then one has
$$-\frac{\tilde r^2}{m^2}\, dt^2 + \frac{m^2}{\tilde r^2}\, d\tilde r^2 = m^2\,\frac{-dt^2 + dy^2}{y^2} , \qquad (22.41)$$
which is nothing other than the Lorentzian counterpart (8.75) of the constant negative
curvature metric (8.69) on the Poincaré upper half-plane, also known as the
two-dimensional anti-de Sitter metric AdS$_2$ (with (curvature) radius m). See section
24.3 for more information on AdS metrics and coordinate systems for them.
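In terms of the proper radial coordinate ρ, defined by
$$ d\rho = -m\,y^{-1}\,dy = m\,\tilde r^{-1}\,d\tilde r \qquad (22.42) $$
(with a suitable choice of integration constant), this becomes
$$ m^2\,\frac{-dt^2 + dy^2}{y^2} = -e^{2\rho/m}\,dt^2 + d\rho^2 . \qquad (22.43) $$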
Thus we can conclude that the near-horizon geometry of the extremal Reissner-Nordstrøm metric is the product geometry
$$ \text{Extremal Reissner-Nordstrøm} \;\xrightarrow{\ \text{near horizon}\ }\; AdS_2 \times S^2 . \qquad (22.44) $$
Note that even in this near-horizon limit the location where the horizon used to be, namely r̃ = 0 (i.e. ρ = −∞), is at an infinite proper distance from any point outside the horizon. This should then be (and is) a fortiori true in the original extremal Reissner-Nordstrøm metric (22.31).
Indeed, proper radial distance in that metric is determined by
$$ d\rho = \left(1 - \frac{m}{r}\right)^{-1} dr . \qquad (22.46) $$
Up to the replacement 2m → m this is exactly the relation (14.51) that determined the tortoise coordinate r* (14.52) for the Schwarzschild geometry, so that the solution is (up to a choice of integration constant)
$$ \rho = r + m\log(r - m) . \qquad (22.47) $$
The crucial difference is that here ρ is a measure of proper distance whereas in the
Schwarzschild case r ∗ was just a coordinate. In any case this result exhibits the loga-
rithmic divergence ∼ log(r − m) as r → m. Nevertheless, the horizon can of course be
reached in finite proper time along e.g. timelike geodesics, or in finite affine parameter
along null geodesics. In particular, it is easy to see (cf. section 22.5) that radial ingoing
lightrays are simply described by r(λ) = r0 − Eλ (22.78) for a constant (energy) E, so
that the affine “time” required to reach the horizon is proportional to the coordinate
distance to the horizon, not the proper distance.
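The contrast between infinite proper distance and finite affine parameter can be illustrated with a short Python sketch. It uses (22.47) for the proper distance at fixed t and the ingoing light ray r(λ) = r0 − Eλ for the affine parameter. The sample values m = 1, r0 = 3, E = 1 are assumptions, and the midpoint-rule function is only a cross-check of (22.46).

```python
import math

def proper_distance_extremal(r, r0, m):
    """Proper radial distance between r and r0 (m < r < r0) at fixed t in the
    extremal Reissner-Nordstrom metric, using rho = r + m*log(r - m), eq. (22.47)."""
    def rho(x):
        return x + m * math.log(x - m)
    return rho(r0) - rho(r)

def proper_distance_numeric(r, r0, m, n=20000):
    """Midpoint-rule cross-check of the integral of (1 - m/x)**(-1) dx, eq. (22.46)."""
    h = (r0 - r) / n
    total = 0.0
    for i in range(n):
        x = r + (i + 0.5) * h
        total += x / (x - m)
    return total * h

def affine_parameter_to_reach(r, r0, E):
    """Affine parameter for the ingoing radial light ray r(lam) = r0 - E*lam."""
    return (r0 - r) / E

m, r0, E = 1.0, 3.0, 1.0  # sample values (assumptions)
for eps in (1e-1, 1e-3, 1e-6, 1e-9):
    r = m + eps
    print(f"r - m = {eps:.0e}: proper distance = "
          f"{proper_distance_extremal(r, r0, m):7.3f}, "
          f"affine parameter = {affine_parameter_to_reach(r, r0, E):.6f}")
```

The proper distance grows by about m·log(1000) for every factor of 1000 by which r − m shrinks. The affine parameter stays below (r0 − m)/E.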
Remark:
This, by the way, provides a nice and drastic illustration of the fact that one should not
attempt to define something like an average velocity of a particle between two points by
dividing proper spatial distance by proper time (which would come out to be not only
“superluminal” in this case, but actually infinite). Proper time and proper distance
measure distance along completely different paths in space-time, and dividing them is
akin to dividing apples by oranges. Don’t yield to the temptation to do this (unless
you just want a way of quantifying small deviations from Minkowskian physics in a
weak gravitational field, say, without insisting on interpreting (apples)/(oranges) as a
velocity). Unfortunately, many people do not heed this advice . . .
This is in some sense the most interesting case, not because we actually expect to find
stars carrying a significant amount of electric charge, but rather because the double-
horizon structure this solution exhibits is not untypical and also appears in astrophysi-
cally more relevant cases like those of rotating black holes (the Kerr solution).
at which f (r) vanishes, and f (r) is positive for 0 < r < r− and r > r+ and negative in
the intermediate region r− < r < r+ . In this case it is more informative to write the
function f (r) and the metric as
$$ f(r) = \frac{(r - r_+)(r - r_-)}{r^2} \qquad (22.49) $$
and
$$ ds^2 = -\frac{(r - r_+)(r - r_-)}{r^2}\,dt^2 + \frac{r^2}{(r - r_+)(r - r_-)}\,dr^2 + r^2\,d\Omega^2 \qquad (22.50) $$
The coordinate system in which we have written the metric is valid within each of the
three separate regions r > r+ , r− < r < r+ and r < r− (but (r, t) in one region are of
course not the same coordinates as (r, t) in another region).
The outer radius r+ > m is, and behaves just like, the event horizon of the Schwarzschild metric, to which it tends for q → 0,
$$ q \to 0 \;\Rightarrow\; r_+ \to 2m . \qquad (22.51) $$
In the intermediate region r− < r < r+ any timelike (or null) curve will then have to
move from larger to smaller values of r (and will in fact reach r− in finite proper time).
At the inner radius r− < m, which is absent for the Schwarzschild metric since
q→0 ⇒ r− → 0 , (22.52)
there is also just a coordinate singularity and a horizon that reverses the role of radius
and time once more so that the singularity is time-like and can be avoided by returning
to larger values of r.
Remarks:
1. Again (i.e. as in the extremal case) we can anticipate the appearance of a new
white hole region beyond the original inner horizon, through which the particle
can pass back across r = r− to larger values of r, and on across r = r+ to a new
asymptotically flat region etc. For a more detailed discussion of this see sections
22.5 - 22.9 below.
and always lies inside the inner horizon, rc < r− , because f (rc ) = 1 > 0.
3. In the extremal case we saw that the horizon r = m is at an infinite proper distance
from any point r0 > m. Let us see what the situation is in the current non-extremal
case. Thus we want to calculate the distance between r+ and r0 > r+,
$$ \rho = \int_{r_+}^{r_0} dr\,\frac{r}{\sqrt{(r - r_+)(r - r_-)}} = \int_0^{\tilde r_0} d\tilde r\,\frac{\tilde r + r_+}{\sqrt{\tilde r(\tilde r + \delta)}} \qquad (22.55) $$
where r̃ = r − r+ and δ = r+ − r−. For δ = 0 we evidently recover the logarithmic divergence of the extremal case. For δ > 0, on the other hand, the integral is finite (it can be expressed in closed form in terms of some unenlightening arccosh expression, but we won't need this), the potentially dangerous piece coming from
$$ \frac{\tilde r + r_+}{\sqrt{\tilde r(\tilde r + \delta)}} \;\to\; \frac{r_+}{\sqrt{\delta\,\tilde r}} \sim \tilde r^{-1/2} \qquad (22.56) $$
which has a finite integral near zero.
4. While we are at it, let us perform another similar calculation which will have
an amusing and perhaps unexpected consequence. Namely, let us calculate the
proper (timelike) distance between the two horizons r+ and r− , i.e. the proper
time it takes a radially freely falling observer to get from r+ to r− . We can use
the standard (t, r) coordinates in this region between the horizons (remembering
that ∂r is timelike there) and thus, according to the usual rules, the proper time
is
$$ \tau = -\int_{r_+}^{r_-} dr\,\frac{r}{\sqrt{(r_+ - r)(r - r_-)}} . \qquad (22.57) $$
Introducing the coordinate η via
$$ r = m + \sqrt{m^2 - q^2}\,\cos\eta , \qquad 0 \le \eta \le \pi \qquad (22.58) $$
one finds
$$ \tau = \int_0^\pi d\eta\,\left(m + \sqrt{m^2 - q^2}\,\cos\eta\right) . \qquad (22.59) $$
So far so straightforward. The curious thing, however, is that the second term of the integrand ∼ cos η integrates to zero over the interval [0, π] and therefore does not contribute to the integral at all, leading to the universal result
$$ \tau = m\pi \qquad (22.60) $$
independent of q. Thus the proper time is mπ for any q < m, and also in the
extremal limit q → m ⇔ r− → r+ . In the extremal black hole, on the other
hand, with r+ = r− , the coordinate distance between r+ and r− = r+ is clearly
zero. What this shows is that the extremal black hole (q = m) is perhaps not for
all intents and purposes the same thing as the extremal limit (q → m) of a non-
extremal black hole, and there are also other contexts in which this distinction
appears to play a role.66
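Remarks 3 and 4 can be checked numerically with the sketch below. The sample values m = 1 and the listed q are assumptions. The substitution r̃ = u² removes the r̃^(−1/2) endpoint behaviour (22.56) from (22.55). The closed form used for comparison is √(r̃0(r̃0 + δ)) + 2m·arsinh √(r̃0/δ). It is one antiderivative, which can be verified by differentiation using r+ − δ/2 = m, and it grows like m log(1/δ) as δ → 0. The proper time (22.57) is evaluated in the variable η of (22.58).

```python
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson rule on [a, b]; n is rounded up to an even number."""
    n += n % 2
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0

def horizons(m, q):
    """Outer and inner horizon radii r+ and r- for 0 <= q < m (geometric units as in (22.49))."""
    s = math.sqrt(m * m - q * q)
    return m + s, m - s

def distance_to_outer_horizon(r0, m, q, n=2000):
    """Proper distance (22.55) from r+ to r0 at fixed t, with r~ = u**2."""
    rp, rm = horizons(m, q)
    delta = rp - rm
    def integrand(u):
        # (r~ + r+) / sqrt(r~ (r~ + delta)) * dr~/du, with dr~/du = 2u
        return 2.0 * (u * u + rp) / math.sqrt(u * u + delta)
    return simpson(integrand, 0.0, math.sqrt(r0 - rp), n)

def distance_closed_form(r0, m, q):
    """One antiderivative of (22.55), evaluated between r~ = 0 and r~ = r0 - r+."""
    rp, rm = horizons(m, q)
    delta = rp - rm
    rt = r0 - rp
    return math.sqrt(rt * (rt + delta)) + 2.0 * m * math.asinh(math.sqrt(rt / delta))

def proper_time_between_horizons(m, q, n=2000):
    """Proper time (22.57) from r+ to r-, written as the eta integral (22.59)."""
    s = math.sqrt(m * m - q * q)
    return simpson(lambda eta: m + s * math.cos(eta), 0.0, math.pi, n)

for q in (0.0, 0.5, 0.9, 0.99, 0.9999):
    print(f"q = {q}: rho(r0 = 3) = {distance_closed_form(3.0, 1.0, q):.4f}, "
          f"tau = {proper_time_between_horizons(1.0, q):.10f}")
```

The distance stays finite for every q < m but grows as q → m. The proper time stays at mπ throughout.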
In the extremal case, it turned out to be useful and instructive to write the metric
in isotropic coordinates (22.34). More to the point, it was useful to introduce the
coordinate distance r̃ = r − m to the extremal horizon, and the resulting metric turned
out to be in isotropic form.
We can (of course) also write the non-extremal metric in isotropic coordinates, but this
is not particularly useful. However, it is possible to write the metric in a form which
interpolates nicely between the standard Schwarzschild form of the Schwarzschild metric
for q → 0 and the nice isotropic form (22.33) of the extremal metric in the extremal
limit q → m. To that end, all we need to do is replace r by ρ = r − r− , which
reduces to the isotropic coordinate ρ → r̃ = r − m in the extremal limit and, moreover,
66
For further discussion, see e.g. S. Carroll, M. Johnson, L. Randall, Extremal limits and black hole
entropy, arXiv:0901.0931 [hep-th].
to the standard radial coordinate ρ → r in the Schwarzschild limit r− → 0 (unlike the
r̃ = r−r+ introduced in remark 3 above, which has the Schwarzschild limit r̃ → r−2m).
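In terms of ρ and the parameter δ = r+ − r−, measuring the deviation from extremality, the metric takes the form
$$ ds^2 = -H(\rho)^{-2} F(\rho)\,dt^2 + H(\rho)^2 \left( F(\rho)^{-1} d\rho^2 + \rho^2\,d\Omega^2 \right) \qquad (22.62) $$
with
$$ H(\rho) = 1 + \frac{r_-}{\rho} , \qquad F(\rho) = 1 - \frac{\delta}{\rho} . \qquad (22.63) $$
Observe that this manifestly reduces to (22.33) in the extremal limit,
$$ \delta \to 0 \;\Rightarrow\; r_- \to m \;\Rightarrow\; F(\rho) \to 1 , \quad H(\rho) \to 1 + m/\rho . \qquad (22.64) $$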
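A quick numerical consistency check of (22.62)-(22.63) follows. It is a sketch assuming the sample values m = 1, q = 0.6, and it reads the angular part of (22.62) as H(ρ)²ρ²dΩ², so that r = H(ρ)ρ. With ρ = r − r− one has dρ = dr, so the components should reproduce (22.50) term by term.

```python
import math

def f_rn(r, rp, rm):
    """f(r) of (22.49)."""
    return (r - rp) * (r - rm) / r**2

def interpolating_components(rho, rp, rm):
    """(g_tt, g_rho_rho, coefficient of dOmega^2) for (22.62)-(22.63),
    with the angular part read as H**2 * rho**2 dOmega^2."""
    H = 1.0 + rm / rho
    F = 1.0 - (rp - rm) / rho
    return -F / H**2, H**2 / F, (H * rho) ** 2

m, q = 1.0, 0.6  # sample values (assumptions)
s = math.sqrt(m * m - q * q)
rp, rm = m + s, m - s
for r in (2.5, 4.0, 10.0):
    gtt, grr, gang = interpolating_components(r - rm, rp, rm)
    # each difference should vanish up to rounding
    print(r, gtt + f_rn(r, rp, rm), grr - 1.0 / f_rn(r, rp, rm), gang - r**2)
```

Setting rp = rm = m in `interpolating_components` gives F = 1 and H = 1 + m/ρ, which is the extremal limit (22.64).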
The form of the metric in (22.62) is representative of the form of a large class of solutions
of higher-dimensional (super-)gravity theories describing black holes and black branes
(spatially extended black objects).67 It also nicely illustrates a general recipe:
We now consider the motion of a test particle with mass µ and charge e in the Reissner-Nordstrøm space-time with m² > q². This is evidently described by the geodesic
equation modified by the Lorentz-force term,
We will set µ = 1 in the following (the case µ ≠ 1 can obviously be recovered from this by scaling e → e/µ), so that quantities associated to the particle like charge, energy and angular momentum that appear below are, as usual (cf. the discussion in section 2.4), to be thought of as quantities per unit particle mass.
67
See e.g. T. Ortin, Gravity and Strings, for a detailed account.
Proceeding in exact analogy with the derivation of the effective potential for geodesics in
the Schwarzschild geometry in section 13.1, in order to exploit the symmetries and con-
served charges of the system it will be convenient to work at the level of the Lagrangian
which we can choose to be
$$ L = \tfrac{1}{2} g_{\alpha\beta}\,\dot x^\alpha \dot x^\beta + e A_\alpha \dot x^\alpha \qquad (22.67) $$
Plugging in the metric and gauge field (and choosing right away equatorial paths at θ = π/2 because of spherical symmetry), this Lagrangian becomes more explicitly
$$ \dot r^2 = (E - eQ/r)^2 - f , \qquad (22.72) $$
$$ V_{eff}(r) = \frac{eEQ - m}{r} + \frac{q^2 - e^2 Q^2}{2r^2} = \frac{eEQ - G_N M}{r} + \frac{Q^2 (G_N - e^2)}{2r^2} , \qquad (22.74) $$
and the term on the right-hand side defines the effective energy Eef f , so that, as in the
case of the Schwarzschild metric, we can write the (first integral of the) radial geodesic
equation in the suggestive Newtonian form
$$ \tfrac{1}{2}\dot r^2 + V_{eff}(r) = E_{eff} . \qquad (22.75) $$
From this one can then readily deduce the qualitative features of the worldlines of
charged and uncharged particles in the Reissner-Nordstrøm geometry.
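As a sketch (not from the notes), one can confirm numerically that expanding (22.72) reproduces (22.74) in the Newtonian form (22.75), with E_eff = (E² − 1)/2. The sample values of G_N, M, Q, E and e below are arbitrary assumptions. The code uses unit particle mass µ = 1, m = G_N M and q² = G_N Q².

```python
def rdot_squared(r, E, e, M, Q, G):
    """Radial equation (22.72): rdot^2 = (E - e Q / r)^2 - f(r),
    with m = G M and q^2 = G Q^2 (unit particle mass)."""
    m, q2 = G * M, G * Q**2
    f = 1.0 - 2.0 * m / r + q2 / r**2
    return (E - e * Q / r) ** 2 - f

def v_eff(r, E, e, M, Q, G):
    """Effective potential (22.74)."""
    return (e * E * Q - G * M) / r + Q**2 * (G - e**2) / (2.0 * r**2)

M, Q = 1.0, 0.7  # sample values with m^2 > q^2 for G = 1 (assumptions)
for E, e in [(1.2, 0.3), (0.9, -0.5), (1.0, 2.0)]:
    for r in (0.3, 1.5, 7.0):
        lhs = 0.5 * rdot_squared(r, E, e, M, Q, 1.0) + v_eff(r, E, e, M, Q, 1.0)
        print(E, e, r, lhs, 0.5 * (E**2 - 1.0))  # the last two columns agree
```

The identity is algebraic, so it holds even at radii where ṙ² < 0, i.e. in classically forbidden regions.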
Remarks:
which includes the usual angular momentum barrier term ∼ L²/r² as well as the familiar attractive general relativistic correction term ∼ mL²r⁻³ and a novel repulsive correction term ∼ q²L²r⁻⁴. However, we will focus on radial motion in
the following.
2. For massless particles (ε = 0), which we will of course also consider to be uncharged (e = 0), the effective potential consists of just the above angular momentum term,
In particular, for radial lightrays the effective potential is zero and ṙ = ±E for
outgoing (respectively ingoing) null geodesics, so that
r(λ) = r0 ± Eλ , (22.78)
with λ the affine parameter. In particular, massless particles can reach the horizon
(and r = 0) in finite affine time.
3. For certain purposes it is also useful to write the effective potential in terms of the horizons r± instead of the parameters m and q, using the relations
$$ r_+ + r_- = 2m , \qquad r_+ r_- = q^2 . \qquad (22.79) $$
Then one has, with $\tilde e = e/\sqrt{G_N}$,
$$ V_{eff}(r) = \frac{2\tilde e E\sqrt{r_+ r_-} - (r_+ + r_-)}{2r} + \frac{r_+ r_-}{2r^2}\left(1 - \tilde e^2\right) . \qquad (22.80) $$
4. The interpretation of the first term in (22.74) is pretty clear: it describes the competition between the leading Coulomb electrostatic and Newton gravitational 1/r interactions between the charged massive star and the charged massive test particle. The only thing that may require some explanation is the factor of E in the Coulomb interaction. But, as we know from the discussion of Schwarzschild, E is essentially the special-relativistic γ-factor, and thus the substitution Q → EQ accounts for the Lorentz contraction of the electric field lines as seen by a particle with velocity ṙ∞ at r = ∞, where $\dot r_\infty^2 = E^2 - 1$.
5. The second term is more mysterious and interesting in several respects. First of
all, for a neutral test particle freely falling (following a geodesic) in the Reissner-Nordstrøm geometry, it provides a repulsive potential at short distances,
$$ e = 0 \;\Rightarrow\; V_{eff}(r) = -\frac{m}{r} + \frac{q^2}{2r^2} , \qquad (22.81) $$
mimicking the angular momentum barrier term L²/2r². This inevitably leads to a turning point of the trajectory, and we will see below that this turning point lies inside the inner horizon, i.e. in the region r < r−. A heuristic but not entirely satisfactory explanation for the occurrence of this phenomenon is that it is due to a mass renormalisation
$$ m \to m(r) = m - \frac{q^2}{2r} \qquad (22.82) $$
required to compensate the infinite electrostatic energy density ∼ q²/r⁴ of the star. Alternatively, one may take this as an indication that the interior of the Reissner-Nordstrøm solution is not particularly physical (we will come back to this below).
6. For a charged particle, the coefficient q 2 = GN Q2 in the 1/r 2 term of the potential
is replaced by Q2 (GN −e2 ). Recalling that e is really the charge per unit mass and
replacing e → e/µ, one sees that the sign of this term is determined by the sign
of $G_N^2 \mu^2 - G_N e^2$, i.e. the relative size of the gravitational mass and charge radii
of the test particle. Reverting to µ = 1, we will call ordinarily charged particles
those for which e2 < GN , extremal those with e2 = GN and overcharged those
with e2 > GN . Thus the 1/r 2 term in the radial effective potential provides a
repulsive potential at short distances for all ordinarily charged particles, but this
term becomes attractive for overcharged particles.
Note that this is not somehow an electrostatic effect (in particular since it is independent of the sign of the charge e of the particle) but a purely gravitational
effect. I do not have a good heuristic explanation for why overcharged particles
all of a sudden experience an attractive 1/r 2 potential (and would be glad to learn
of one . . . ).
7. If one gives the particle some angular momentum, no matter how tiny, i.e. if there
is just the slightest deviation from radial motion, then the term q 2 L2 /r 4 will kick
in at short distances to yet again provide a repulsive potential as a joint effect of
the charge of the black hole and the angular momentum of the particle.
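A similar sketch checks remark 3, i.e. that the horizon form (22.80) agrees with (22.74), and encodes the classification of remark 6. It assumes Q ≥ 0, so that √(r+r−) = q = √G_N Q, and unit particle mass. The function names and sample values are illustrative only.

```python
import math

def v_eff_charges(r, E, e, M, Q, G):
    """Effective potential (22.74) in terms of M, Q and G_N."""
    return (e * E * Q - G * M) / r + Q**2 * (G - e**2) / (2.0 * r**2)

def v_eff_horizons(r, E, e, M, Q, G):
    """The same potential in the horizon form (22.80), with e~ = e / sqrt(G).
    Assumes Q >= 0 so that sqrt(r+ r-) = q = sqrt(G) Q."""
    m, q = G * M, math.sqrt(G) * Q
    s = math.sqrt(m * m - q * q)
    rp, rm = m + s, m - s
    et = e / math.sqrt(G)
    return ((2.0 * et * E * math.sqrt(rp * rm) - (rp + rm)) / (2.0 * r)
            + rp * rm * (1.0 - et**2) / (2.0 * r**2))

def charge_class(e, G):
    """Classification of remark 6 (unit particle mass)."""
    if e * e < G:
        return "ordinarily charged"
    if e * e == G:
        return "extremal"
    return "overcharged"

G, M, Q = 0.5, 2.0, 0.9  # sample values: m = 1, q ~ 0.64 (assumptions)
for e in (0.3, 1.0):
    print(e, charge_class(e, G),
          v_eff_charges(2.0, 1.1, e, M, Q, G), v_eff_horizons(2.0, 1.1, e, M, Q, G))
```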
Let us now take a closer look at some of the possible trajectories. First of all, for a
given choice of parameters the allowed values of r are constrained by the condition
and at a turning point r_m (for maximal or minimal radius) of the trajectory one has
$$ -\frac{2m}{r_m} + \frac{q^2}{r_m^2} = E^2 - 1 . \qquad (22.85) $$
For a particle initially at rest at infinity, E = 1, one immediately reads off that the minimal radius is equal to the core radius r_c (22.54),
$$ r_m(E=1) = r_c = \frac{q^2}{2m} = \frac{r_+ r_-}{r_+ + r_-} = \frac{r_-}{1 + (r_-/r_+)} < r_- . \qquad (22.86) $$
It is also plausible (and moreover true) that the particle will penetrate slightly deeper into the Reissner-Nordstrøm core if it initially has a non-zero inward directed velocity, i.e.
$$ E > 1 \;\Rightarrow\; r_m < r_c , \qquad (22.87) $$
but clearly no finite energy particle can overcome the charge barrier to reach r = 0.
These particles will turn around at rm (E) and then escape again to infinity in a new
branch of the universe (since they clearly can’t cross the same inner horizon r− in both
directions).
Particles with E < 1 have both a minimum and a maximum radius rm± , located in the
regions rm− < r− and rm+ > r+ respectively. Thus these particles appear to oscillate
in and out of the black hole region r− < r < r+ but this is clearly not possible (if
r+ is a black hole horizon, you cannot just dance around and oscillate in and out of
it to your heart’s content). What is happening is that, after having reached its inner
turning point at rm− < r− , the particle turns around to larger values of r, crosses a new
r = r− horizon into a new (time-reversed) version of the region r− < r < r+ in which it
can only move to larger values of r, crosses a white hole horizon at r = r+ into a new
asymptotically flat Reissner-Nordstrøm patch, up to the maximal radius rm+ allowed
by its energy, and then turns around again to enter another new region etc. etc.
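The turning-point discussion can be sketched numerically for a neutral particle. Multiplying (22.85) by r_m² gives the quadratic (E² − 1)r_m² + 2m r_m − q² = 0, whose positive roots are the turning points. The sample values m = 1, q = 0.8 (so r+ = 1.6, r− = 0.4, r_c = 0.32) are assumptions for illustration. The smaller root is written as q²/(m + √(m² + (E² − 1)q²)) to avoid numerical cancellation.

```python
import math

def turning_points(E, m, q):
    """Positive roots of (E^2 - 1) r^2 + 2 m r - q^2 = 0, i.e. (22.85) times r^2,
    for a neutral radially moving particle (assumes 0 <= q < m and E >= 0).
    Returns (inner, outer); outer is None for E >= 1 (unbound particle)."""
    k = E * E - 1.0
    root = math.sqrt(m * m + k * q * q)
    inner = q * q / (m + root)  # stable form of the smaller positive root
    outer = (m + root) / (-k) if k < 0 else None
    return inner, outer

m, q = 1.0, 0.8  # sample values (assumptions)
s = math.sqrt(m * m - q * q)
rp, rm = m + s, m - s
rc = q * q / (2.0 * m)  # core radius, cf. (22.86)
for E in (0.5, 1.0, 2.0):
    print(E, turning_points(E, m, q), "r- =", rm, "r+ =", rp, "rc =", rc)
```

For E = 0.5 the inner turning point lies below r− and the outer one above r+, as described in the text. For E = 2 the single turning point lies below r_c.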
We can also consider charged particles. Their behaviour depends strongly on whether
they are ordinarily charged or overcharged particles (the latter having a regretful suicidal
tendency to end up in the singularity at r = 0 regardless of the sign of the charge), but
also on the energy and on whether the Coulomb or gravitational 1/r interaction is
dominant. Thus this requires a bit of a case by case analysis which we will not pursue
here. Suffice it to say here that for all ordinarily charged particles one finds that the
turning point of a radially infalling particle is located inside the inner horizon (and not
outside the outer horizon, which would, in principle, have been the other option).
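In fact, the basic construction works for any metric of the form
$$ ds^2 = -f(r)\,dt^2 + f(r)^{-1}\,dr^2 + r^2\,d\Omega^2 , $$
with tortoise coordinate defined by dr* = f(r)⁻¹ dr and null coordinates
$$ u = t - r^* , \qquad v = t + r^* . \qquad (22.90) $$
Infalling radial null geodesics (dr*/dt = −1) are characterised by v = const. and outgoing radial null geodesics by u = const.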
(with an analogous expression for the metric written in terms of (u, r)). This
metric is now regular at any zero rh of f , f (rh ) = 0.
4. In these coordinates, the Killing vector ∂t of the original space-time, rendering the
“outside” region f (r) > 0 static, takes the form
6. If f (r) changes sign from f (r) > 0 to f (r) < 0 as one moves from r > rh to r < rh ,
then the situation is identical to that for the Schwarzschild black hole:
• the Killing vector ∂v becomes null on the horizon and spacelike for r < rh
• r = rh is an event horizon and r will decrease along any future-directed causal
(timelike or lightlike) path.
Let us now see what happens to the lightcones, and what are therefore the allowed
paths for massless or massive particles, as one enters the Reissner-Nordstrøm black hole
through the future outer horizon.
[It would be extremely helpful to have some (Carter-Penrose) diagrams to look at during
this discussion. I will try to supply these at a later stage. For the time being, depending
on where you are reading this, please consult either another textbook for the diagrams,
or, for the same effect, contemplate the tiles on the bathroom floor.]
1. The first thing that will happen is, as already discussed above, and exactly as in the Schwarzschild case, that the lightcones tilt over at r = r+. Subsequently
both ingoing and (misleadingly still called) “outgoing” light rays will converge to
smaller values of r,
In particular, once inside one must continue to smaller values of r until one reaches
either a singularity (as for Schwarzschild) or another horizon.
2. In the Reissner-Nordstrøm case there is indeed such a horizon, namely at r = r− .
The ingoing Eddington-Finkelstein coordinates are still valid at the inner horizon
r = r− because the metric is regular there, the horizon sits (tautologically) at the
finite value r = r− of the radial coordinate, while on that surface the coordinate
v (and the angular coordinates) still label the ingoing lightrays crossing the inner
horizon. Indeed, the coordinates continue to be valid all the way up to r = 0.
Note that for this we do not need to know if the tortoise coordinate r ∗ is well-
behaved at r = r− . As a matter of fact, it is not, but this does not affect the
Eddington-Finkelstein coordinates. It will, however, affect the Kruskal-Szekeres
coordinates which will break down at r = r− (but have a larger region of validity
across the past white hole horizon of the original Reissner-Nordstrøm patch).
3. At r = r− , the function f (r) again changes sign and outgoing lightrays are indeed
outgoing, i.e. moving to larger values of r. Once these outgoing light rays in the
region r < r− reach the new (white hole) inner horizon at r = r− , the original
set of Eddington-Finkelstein coordinates (v, r) finally break down since v can only
label the ingoing lightrays and the new r− sits at advanced time v = ∞.
4. But in this patch r < r− (whose metric is identical to that in the outside patch
r > r+), one can also construct and use outgoing Eddington-Finkelstein coordinates (u, r),
with u labelling the outgoing light rays. These coordinates will not only cover the
region 0 < r < r− , but they will extend across the new white hole inner and outer
horizons r∓ into a new asymptotically flat Reissner-Nordstrøm patch.
5. From that region one can in principle continue into a new black hole region across
another r+ , but now it is evidently the outgoing Eddington-Finkelstein coordinates
that break down and one returns to step 1 and again constructs ingoing Eddington-
Finkelstein coordinates to describe this.
6. It is now evident that, proceeding in this way, one can pave / tessellate the en-
tire infinitely periodic fully extended non-extremal Reissner-Nordstrøm solution
with ingoing and outgoing Eddington-Finkelstein coordinates (whose domains of
validity overlap in the regions r < r− and r > r+ where f (r) > 0).
In the extremal case, when there is a double zero of f (r), the story is similar, the only
difference being that the region between r− and r+ is absent. The metric has the form
$$ ds^2 = -\left(1 - \frac{m}{r}\right)^2 dv^2 + 2\,dv\,dr + r^2\,d\Omega^2 . \qquad (22.96) $$
In particular, r = m is a null surface and a horizon. In the patch covered by the above
ingoing Eddington-Finkelstein coordinates one can only cross it along future-directed
curves. The difference is that now f(r) is positive on both sides of the horizon, so that outgoing light rays are really outgoing on both sides of the horizon. Ingoing Eddington-Finkelstein coordinates can still also cover the patch r < m, but they cannot describe the new outgoing region beyond the new white hole inner horizon r = m. For this
one needs to introduce outgoing Eddington-Finkelstein coordinates etc. Again one can
pave the infinitely periodic extremal Reissner-Nordstrøm geometry in this way.
One can also introduce Kruskal-Szekeres coordinates via the same chain of transforma-
tions
$$ (t, r) \to (u, v) \to (U, V) \to (T, X) \qquad (22.97) $$
as in the Schwarzschild case. We again start by setting up the problem in the general context of metrics of the form (22.89),
$$ ds^2 = f(r)\left[-dt^2 + (dr^*)^2\right] + r(r^*)^2\,d\Omega^2 = -f(r)\,du\,dv + r(u,v)^2\,d\Omega^2 , \qquad (22.98) $$
and we will now be more specific and assume that f(r) has a simple zero at r = r_h. Since the issue is the elimination of the coordinate singularity at r_h, we can focus on the behaviour of f(r) near r = r_h,
$$ f(r) \approx f'(r_h)(r - r_h) $$
(here is where the treatment for a double zero, say, would of course be different). Thus the tortoise coordinate can be approximated by
$$ dr^* = f(r)^{-1}\,dr \approx \frac{dr}{(r - r_h) f'(r_h)} \;\Rightarrow\; r^* \approx \frac{1}{f'(r_h)} \log|r - r_h| . \qquad (22.100) $$
Here f ′ (rh ) has the same dual interpretation (15.110) as in section 15.7, namely on the
one hand as the inaffinity, measuring the failure of the coordinate v to provide an affine
parametrisation of the horizon generators K = ∂v , and on the other hand as the surface
gravity, providing a measure of the strength of the gravitational field at the horizon,
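$$ \kappa_h = \tfrac{1}{2} f'(r)\big|_{r=r_h} = \begin{cases} \text{inaffinity:} & \lim_{r\to r_h} \nabla_K K = \kappa_h K \\ \text{surface gravity:} & \kappa_h := \lim_{r\to r_h} f(r)^{1/2} a(r) \end{cases} \qquad (22.101) $$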
To see this, note that the derivation of the inaffinity given in (15.107) and (15.108) goes
through verbatim in general,
and that the generalisation of the calculation of the acceleration for a stationary observer
in section 14.1 gives
In terms of κh , the approximate expression for the tortoise coordinate is
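$$ r^* \approx \frac{1}{2\kappa_h} \log|r - r_h| \qquad (22.104) $$
(the suppressed subleading terms being regular as r → r_h). Then near r = r_h the function f(r) can be approximated by
$$ f(r) \approx f'(r_h)(r - r_h) \approx 2\kappa_h\, e^{2\kappa_h r^*} = 2\kappa_h\, e^{\kappa_h(r^* - t) + \kappa_h(r^* + t)} = 2\kappa_h\, e^{-\kappa_h u}\, e^{+\kappa_h v} \qquad (22.105) $$
and for the (t, r)-part of the metric one has
$$ ds^2 = -f(r)\,du\,dv \approx -2\kappa_h\, e^{-\kappa_h u} du\; e^{\kappa_h v} dv = -2\kappa_h^{-1}\, d\!\left(-e^{-\kappa_h u}\right) d\!\left(+e^{+\kappa_h v}\right) . \qquad (22.106) $$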
We thus introduce the Kruskal coordinates (uK , vK ) (with the normalisation as in
(15.93)) by
$$ u_K = -(\kappa_h)^{-1} e^{-\kappa_h u} , \qquad v_K = +(\kappa_h)^{-1} e^{+\kappa_h v} \qquad (22.107) $$
and (if one doesn't like null coordinates) one can also introduce new time- and space-coordinates (t_K, x_K) via
$$ u_K = t_K - x_K , \qquad v_K = t_K + x_K , \qquad (22.108) $$
and we have just seen that in terms of these Kruskal coordinates the metric near the horizon at r = r_h takes the manifestly non-singular form
$$ ds^2 = -f(r)\,du\,dv + r(u,v)^2\,d\Omega^2 \approx -C\,du_K\,dv_K + r_h^2\,d\Omega^2 . \qquad (22.109) $$
The precise value of the coefficient C > 0 will depend on the non-singular terms in r ∗
(which we have suppressed here), evaluated at r = rh , and on the choice of integration
constants, and is therefore arbitrary, and also irrelevant.
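To make the origin of C concrete, differentiate (22.107) and keep only the leading terms used in (22.104)-(22.106). This is a worked step added here, not part of the original derivation:

```latex
du_K = e^{-\kappa_h u}\,du , \qquad dv_K = e^{+\kappa_h v}\,dv
\quad\Rightarrow\quad
-f(r)\,du\,dv \approx -2\kappa_h\,e^{-\kappa_h u}e^{+\kappa_h v}\,du\,dv = -2\kappa_h\,du_K\,dv_K .
```

So at this order of approximation C = 2κ_h. Including the regular subleading terms of r* only multiplies this by a finite, non-zero factor.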
Thus these Kruskal coordinates provide us with a good system of coordinates not just
in the original patch where the coordinates (t, r) were valid, but also across the future
horizon at uK = 0 and the past horizon at vK = 0. In the Schwarzschild case, more than
that was true, namely the Kruskal coordinates provided us with a coordinate system
for the complete maximal extension of the Schwarzschild geometry. This is, however,
not guaranteed by the above general construction and need not, and will not, be true
in general.
Indeed, already for the Reissner-Nordstrøm metric we will be able to see this explicitly.
To that end we will need the explicit expressions for the tortoise coordinate. In the
non-extremal case we have
$$ f(r) = \frac{(r - r_+)(r - r_-)}{r^2} . \qquad (22.110) $$
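The surface gravities at the two horizons r± are
$$ \kappa_\pm = \tfrac{1}{2} f'(r_\pm) = \pm\frac{r_+ - r_-}{2 r_\pm^2} . \qquad (22.111) $$
$$ dr^* = \frac{r^2}{(r - r_+)(r - r_-)}\,dr \;\Rightarrow\; r^* = r + \frac{1}{2\kappa_+}\log|r - r_+| + \frac{1}{2\kappa_-}\log|r - r_-| . \qquad (22.113) $$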
to cover the region around r+ . This coordinate system does not just cover the future
black hole region (until r = r− , see below) but (as for Schwarzschild) also a past white
hole region (until r = r− ) and a mirror asymptotically flat region of the Reissner-
Nordstrøm patch.
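The following is a minimal numerical check of (22.111) and (22.113). It assumes the sample horizons r+ = 1.6, r− = 0.4 and sets the integration constant in r* to zero. The surface gravities are compared with a central-difference estimate of ½f′(r±), and dr*/dr is compared with 1/f(r). Note that κ− < 0, which is why r* → +∞ as r → r−.

```python
import math

def f_rn(r, rp, rm):
    """f(r) of (22.110)."""
    return (r - rp) * (r - rm) / r**2

def surface_gravities(rp, rm):
    """(22.111): kappa_pm = +/- (r+ - r-) / (2 r_pm^2)."""
    return (rp - rm) / (2.0 * rp**2), -(rp - rm) / (2.0 * rm**2)

def tortoise(r, rp, rm):
    """(22.113) with the integration constant set to zero."""
    kp, km = surface_gravities(rp, rm)
    return r + math.log(abs(r - rp)) / (2.0 * kp) + math.log(abs(r - rm)) / (2.0 * km)

rp, rm, h = 1.6, 0.4, 1e-6  # sample horizons and finite-difference step (assumptions)
kp, km = surface_gravities(rp, rm)
for rh, k in ((rp, kp), (rm, km)):
    # central-difference estimate of (1/2) f'(r_h)
    print(rh, k, 0.5 * (f_rn(rh + h, rp, rm) - f_rn(rh - h, rp, rm)) / (2 * h))
for eps in (1e-2, 1e-5, 1e-8):
    print("r* at r- + eps:", tortoise(rm + eps, rp, rm))  # grows without bound
```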
However, these coordinates break down as r → r−. This can be seen from the explicit expression of the metric in these coordinates, which displays a (coordinate) singularity at r = r− (but we will forgo this here). It can also be seen, more directly and more to the point, from the fact that, as noted above, r* → ∞ for r → r− (recall that κ− < 0 in (22.113)), so that the outer Kruskal coordinates uK+, vK+ are singular at, and can therefore not be extended beyond, the inner horizon.
In the region between the horizons one can then introduce the inner Kruskal coordinates
(uK− , vK− ) which extend beyond r− (but become singular at r+ instead). These two
types of Kruskal coordinate systems can then be used alternatingly, each an infinite
number of times, to pave the entire space-time.
Since thus the space-time cannot be covered by a single Kruskal coordinate patch,
typically not a whole lot is gained by using Kruskal coordinates and for most purposes
the simpler Eddington-Finkelstein coordinates are actually more convenient. However,
as a matter of principle it is useful to know (and a remarkable fact in its own right)
that there is a generalisation (due to Klösch and Strobl) of the Israel coordinates for
the Schwarzschild metric discussed in section 15.8 that provides a global covering of
the complete (infinitely periodically extended) Reissner-Nordstrøm space-time - see the
reference in footnote 39 of that section.
Nevertheless, all in all, as in the case of the Kruskal diagram for the eternal fully ex-
tended Schwarzschild space-time, one should perhaps be somewhat skeptical of this
intriguing and entertaining white hole - black hole structure and narrative for the ex-
tended Reissner-Nordstrøm spacetime and take it with a substantial grain of salt:
1. First of all, for a collapsing star settling down to the non-extremal Reissner-
Nordstrøm solution, the exotic regions beyond (i.e. before) the past white hole
horizon r+ are eliminated (as for Schwarzschild), as is the mirror region - but at
first the infinite chain of white and black holes in the future remains intact.
3. One can repeat the story for the extremal case. In this case, the surface gravity
is zero, but one can still construct Eddington-Finkelstein coordinates (as we have
seen above), so that one can describe the region behind the horizon in this case.
The instability of the inner horizon of a non-extremal black hole may however
limit the validity of this picture and, in fact, seems to suggest that in the extremal
case the outer = inner horizon may become singular.69
68
E. Poisson, W. Israel, Internal Structure of Black Holes, Phys. Rev. D41 (1990) 1796. See also
section 5.7.3 of E. Poisson, A Relativist’s Toolkit.
69
See D. Marolf, The dangers of extremes, arXiv:1005.2999 [gr-qc] and D. Garfinkle, How extreme
are extreme black holes?, arXiv:1105.2574 for a discussion of this.
23 Interior Solution for a Collapsing Star and Oppenheimer-Snyder Collapse
In section 12.7 we had described the general set-up (as well as a special solution) for
the interior solution of a static spherically symmetric star. In section 15.9 we had
described the exterior (Schwarzschild) geometry of a collapsing star. We will now use
cosmological solutions to give an idealised description of the interior of a star during the
time-dependent phase of gravitational collapse, and also make sure that the exterior
and interior descriptions of the surface of the star match.
If we assume that matter inside the spherically symmetric star can be modelled by
a perfect fluid with spatially uniform energy density ρ = ρ(t) and pressure p = p(t),
then the spatial geometry is locally both isotropic and homogeneous. Thus (see section
17.1) the spatial geometry of the star is that of a (bounded subspace of a) maximally
symmetric space, and solutions are then governed by the Friedmann equations.
Let us first address the geometry of this problem. The spherically symmetric star is a
3-ball B 3 , i.e. a 3-dimensional space with boundary a 2-sphere S 2 . Its 2-dimensional
counterpart would usually be called a disc (or 2-disc) D 2 , a surface with boundary a
circle S 1 . A priori, one could model the geometry of this disc e.g. as the subset of the
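Euclidean plane (with its induced maximally symmetric flat metric),
$$ ds^2(D^2) = dr^2 + r^2\,d\phi^2 \qquad (r \le r_0) \qquad (23.1) $$
or as the cap of a sphere (with its induced maximally symmetric positive curvature metric),
$$ ds^2(D^2) = d\theta^2 + \sin^2\theta\,d\phi^2 \qquad (\theta \le \theta_0) \qquad (23.2) $$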
or even as its negative curvature counterpart. Likewise, one can model the 3-ball ge-
ometry of a spherically symmetric star in terms of bounded subspaces of any of the
k = 0, ±1 3-geometries that we have been considering, e.g. the spatially flat 3-ball or
3-disc for k = 0 or the cap of a 3-sphere for k = +1,
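$$ \begin{aligned} k = 0 &: \quad ds^2(B^3) = dr^2 + r^2\,d\Omega_2^2 \qquad (r \le r_0) \\ k = +1 &: \quad ds^2(B^3) = d\psi^2 + \sin^2\psi\,d\Omega_2^2 \qquad (\psi \le \psi_0) \end{aligned} \qquad (23.3) $$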
(we had already encountered the k = +1 cap/disc/3-ball as the spatial geometry un-
derlying a static spherically symmetric star in section 12.7).
As far as the matter content is concerned, a spatially constant non-zero pressure would
in particular lead to a non-zero pressure at the surface of the star. This would need
to be compensated by a non-zero surface tension, a further contribution to the energy-
momentum tensor, δ-function localised on the surface of the star. In order not to have
to deal with this situation, we will only consider the simplest possibility, namely that
of pressureless dust, p = 0 (Oppenheimer and Snyder, 1939).
In this case, the interior solution is provided by the cosmological solutions of the matter
dominated era derived in section 21.3, with the radial coordinate of the star restricted
to run over a finite range, as in (23.3). The exterior solution would then, as in our
discussion of section 15.9, be given by the Schwarzschild metric, and one thing we need
to do is make sure that the exterior and interior descriptions of the surface of the star
agree.
In order to understand the role of the spatial curvature k and how to glue the interior and
exterior solutions together, recall that we had already seen in section 19.3, in (19.45),
that pressureless dust necessarily moves along geodesics of the space-time geometry. In
the present case these are the geodesics of comoving observers (dust particles) and the
cosmological time t is their proper time. In particular, if we think of a(t) as (proportional
to) the time-dependent radius of the collapsing star, then a(t) describes the geodesic
trajectories of particles at the surface of the star.
However, these surface particles should follow geodesics as if the total mass of the star
were concentrated at the center of the star, i.e. they should also move along geodesics
of the outside Schwarzschild geometry with that mass. Thus we are led to the, a priori
perhaps somewhat surprising, statement that the Friedmann equations for dust must
agree with the geodesic equation for radially freely falling particles in the Schwarzschild
geometry. This is indeed the case:
• On the one hand, the evolution of the cosmic scale factor is governed by the
Friedmann equation (21.20),
$$\dot a^2 + k = \frac{C_m}{a} . \qquad (23.4)$$
• On the other hand, according to the results of section 14.2, radial free fall is
governed by the equation (14.13),
$$\dot r^2 + \frac{2m}{r_i} = \frac{2m}{r} , \qquad (23.5)$$
where ri is the radius where the particle is initially at rest, ṙ(r = ri ) = 0.
We see that these really do have the same form (and we will match them more precisely
below). We also see that the choice of pressureless matter, i.e. the equation of state
parameter w = 0, for the interior of the star is essential for this (other values of w
leading to other powers of a on the right-hand side of (23.4)).
This similarity of the equations in the case w = 0 is also reflected in the explicit solu-
tions of the geodesic and Friedmann equations, as for example in the solution (15.150)
of the radial geodesic equation in the Schwarzschild geometry and the recollapsing so-
lution (21.31) of the Friedmann equations for a spatially closed dust-filled universe. In
particular, the spatially flat k = 0 3-disc geometry corresponds to ri = ∞, i.e. the case
where the surface of the star behaves as if it had been released from rest at infinity,
while a finite ri corresponds to k = +1. Thus a matching with the exterior geometry of
the collapsing star discussed in section 15.9 (where we assumed free fall from rest from
a finite radius r0 ) requires k = +1. However, the k = 0 solution is also instructive in its
own right, and we will analyse both possibilities below.
In order to match the exterior and interior geometries, we should start by matching
the coordinates used for the two solutions. In both cases, due to spherical symmetry
(and due to having chosen coordinates that make this symmetry manifest), there is a
transverse 2-sphere with line-element dΩ22 = dθ 2 + sin2 θdφ2 , and we will simply identify
the coordinates (θ, φ) of the two solutions.
This leaves us with the temporal and radial directions. The time coordinate of the
cosmological (interior) metrics is the proper time of comoving observers (in particular
those on the surface of the star), and this is a natural choice which we will maintain. It
follows that also for the exterior Schwarzschild geometry we should choose coordinates
such that the time-coordinate is the proper time of these comoving (i.e. freely falling)
observers, and we have already constructed various such coordinate systems in section
15. In particular, we could use Painlevé-Gullstrand coordinates or comoving (Novikov) coordinates.
In order to illustrate the procedure, we will pursue both options and discuss the case
k = 0 in terms of PG-like coordinates and the case k = +1 in terms of comoving
coordinates.
23.2 k = 0 Collapse and Painlevé-Gullstrand Coordinates
Here we have already identified the cosmological time t = τ as the proper time of comov-
ing observers, H(τ ) = ȧ(τ )/a(τ ) is the Hubble parameter, and the radial coordinate r̃
is related to the usual comoving radial coordinate of the Robertson-Walker metric (now
denoted rc , to avoid confusion with the radial coordinate of the Schwarzschild or PG
metric)
$$ds^2 = -d\tau^2 + a(\tau)^2\left(dr_c^2 + r_c^2 d\Omega^2\right) \qquad (23.9)$$
by
$$\tilde r = a(\tau)\, r_c . \qquad (23.10)$$
This form of the metric is adapted to comoving observers (fixed rc ), which obey the
Hubble relation dr̃/dτ = r̃H(τ ). We assume that the surface of the star has fixed
(comoving) radial coordinate rc,0, and is thus described by r̃ = R̃0(τ) = a(τ) rc,0.
Its time-dependence (i.e. the collapse of the star) is governed by the negative square-root
of the k = 0 Friedmann equation for pressureless matter,
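$$\begin{aligned} \dot a(\tau) = -\sqrt{C_m}\, a(\tau)^{-1/2} &\;\Rightarrow\; \dot{\tilde R}_0(\tau) = -\sqrt{C_m}\,(r_{c,0})^{3/2}\,\tilde R_0(\tau)^{-1/2} \\ &\;\Rightarrow\; \tilde R_0(\tau) = r_{c,0}\,(9C_m/4)^{1/3}\,(\tau_0 - \tau)^{2/3} . \end{aligned} \qquad (23.12)$$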
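As a quick consistency check on (23.12), the following Python sketch compares a central finite-difference derivative of R̃0(τ) with the right-hand side of the first-order equation. The values of Cm, rc,0 and τ0 are arbitrary illustrative choices, not values taken from the text.

```python
import math

def surface_radius(tau, Cm, rc0, tau0):
    """R~0(tau) = rc0 (9 Cm/4)^(1/3) (tau0 - tau)^(2/3), the k = 0 solution (23.12)."""
    return rc0 * (9.0 * Cm / 4.0) ** (1.0 / 3.0) * (tau0 - tau) ** (2.0 / 3.0)

def ode_rhs(R, Cm, rc0):
    """Right-hand side of dR~0/dtau = -sqrt(Cm) rc0^(3/2) R~0^(-1/2)."""
    return -math.sqrt(Cm) * rc0 ** 1.5 / math.sqrt(R)

def max_relative_residual(Cm=2.0, rc0=0.7, tau0=1.0, h=1e-6):
    """Largest relative mismatch between the numerical derivative and the ODE."""
    worst = 0.0
    for tau in (-3.0, -1.0, 0.0, 0.5, 0.9):   # all before the collapse time tau0
        dR = (surface_radius(tau + h, Cm, rc0, tau0)
              - surface_radius(tau - h, Cm, rc0, tau0)) / (2.0 * h)
        expected = ode_rhs(surface_radius(tau, Cm, rc0, tau0), Cm, rc0)
        worst = max(worst, abs(dR - expected) / abs(expected))
    return worst

print(max_relative_residual())  # small: only finite-difference error remains
```

The negative square root is what makes R̃0 decrease towards zero at τ = τ0.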
where
$$\tilde m(\tau, \tilde r) = \frac{2\tilde r^3}{9(\tau_0 - \tau)^2} . \qquad (23.14)$$
Comparison of the two metrics, the exterior Schwarzschild metric in Painlevé-Gullstrand
form (23.6) and the interior metric in Painlevé-Gullstrand-like form (23.13), makes
it manifest that we should identify not only PG time T with FRW time t = τ (as
already anticipated above), but also the PG-Schwarzschild radial coordinate r with the
cosmological radial coordinate r̃.
A seamless matching of the two metrics then requires the identification of the location
of the surface of the star from the two sides, through
$$R_0(\tau) = \tilde R_0(\tau) , \qquad (23.15)$$
or, equivalently (as we will see), that the mass function m̃(τ, r̃) (23.14), evaluated on the
surface of the star r̃ = R̃0(τ), agrees with the constant Schwarzschild mass m of (23.6),
$$\tilde m(\tau, \tilde R_0(\tau)) = m .$$
Note that this necessarily leads to the requirement (that we had already imposed) that
the interior of the star is described by pressureless dust, in order to reproduce the
characteristic τ 2/3 -behaviour of the geodesic.
Either from the explicit expression for the two solutions, or from comparing the geodesic
equation in (23.7) with the Friedmann equation in (23.12), one finds that R0 (τ ) = R̃0 (τ )
is equivalent to
$$2m = C_m\, r_{c,0}^3 . \qquad (23.17)$$
This resulting condition relating the parameters of the exterior and interior solutions
can be demystified by recalling the definition (20.25) of Cm as the constant
$$C_m = \frac{8\pi G_N}{3}\,\rho(\tau)\, a(\tau)^3 . \qquad (23.18)$$
Then (23.17) becomes
$$2m = C_m r_{c,0}^3 \;\Leftrightarrow\; M \equiv \frac{m}{G_N} = \frac{4\pi}{3}\big(a(\tau) r_{c,0}\big)^3 \rho(\tau) = \frac{4\pi}{3}\tilde R_0(\tau)^3 \rho(\tau) , \qquad (23.19)$$
which is simply the statement that at all times the Schwarzschild gravitational mass-
energy M of the star felt by the freely-falling particles on the star’s surface is precisely
the total mass-energy (density times volume) of the star. This encapsulates the essence
of the Oppenheimer-Snyder construction.
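The content of (23.19) can be checked numerically: along the k = 0 solution (23.12), the mass function (23.14) evaluated on the surface and the "density times coordinate volume" (4π/3)R̃0³ρ (times GN) both stay equal to the constant m = Cm rc,0³/2. The sketch below uses illustrative parameter values and sets GN = 1 purely for the example; ρ(τ) is obtained by solving (23.18) for ρ.

```python
import math

G_N = 1.0  # illustrative choice of units for this example only

def surface_radius(tau, Cm, rc0, tau0):
    return rc0 * (9.0 * Cm / 4.0) ** (1.0 / 3.0) * (tau0 - tau) ** (2.0 / 3.0)  # (23.12)

def mass_function(tau, r, tau0):
    return 2.0 * r**3 / (9.0 * (tau0 - tau) ** 2)                              # (23.14)

def density(tau, Cm, rc0, tau0):
    a = surface_radius(tau, Cm, rc0, tau0) / rc0        # scale factor, since R~0 = a rc0
    return 3.0 * Cm / (8.0 * math.pi * G_N * a**3)      # (23.18) solved for rho

def mass_table(Cm=2.0, rc0=0.7, tau0=1.0, taus=(-5.0, -1.0, 0.0, 0.5, 0.99)):
    """Rows (tau, m~ on the surface, G_N (4 pi/3) R~0^3 rho, Cm rc0^3 / 2)."""
    m = Cm * rc0**3 / 2.0                               # 2m = Cm rc0^3, cf. (23.19)
    rows = []
    for tau in taus:
        R = surface_radius(tau, Cm, rc0, tau0)
        M = 4.0 * math.pi / 3.0 * R**3 * density(tau, Cm, rc0, tau0)
        rows.append((tau, mass_function(tau, R, tau0), G_N * M, m))
    return rows

for row in mass_table():
    print(row)  # the last three entries agree at every tau
```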
With this identification, the induced metric on the surface Σ of the star satisfies
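$$\begin{aligned} ds^2_{\Sigma_{ext}} &\equiv \Big[-dT^2 + \big(dr + \sqrt{2m/r}\,dT\big)^2 + r^2 d\Omega^2\Big]_{T=\tau,\ r=R_0(\tau)} \\ &= -d\tau^2 + R_0(\tau)^2 d\Omega^2 \\ &= -d\tau^2 + \tilde R_0(\tau)^2 d\Omega^2 \\ &= \Big[-d\tau^2 + \big(d\tilde r - \tilde r H(\tau)\,d\tau\big)^2 + \tilde r^2 d\Omega^2\Big]_{\tilde r = \tilde R_0(\tau)} \equiv ds^2_{\Sigma_{int}} . \end{aligned} \qquad (23.20)$$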
23.3 Synopsis of the Oppenheimer-Snyder Construction
Dropping all tildes and other now (in retrospect) irrelevant decorations, and choosing
without loss of generality the instant of total collapse of the star to be at τ0 = 0, the
set-up and results can be summarised as follows:
Here C is given in terms of the total mass m (the parameter characterising the
exterior Schwarzschild geometry) by
$$C = (9m/2)^{1/3} , \qquad (23.24)$$
3. Jointly the exterior and interior metrics can be written compactly as
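$$ds^2 = -d\tau^2 + \left(dr + \sqrt{\frac{2m(\tau, r)}{r}}\, d\tau\right)^2 + r^2 d\Omega^2 \qquad (23.25)$$
with
$$m(\tau, r) = \begin{cases} m & r > R(\tau) = C(-\tau)^{2/3} \\ 2r^3/9(-\tau)^2 & r < R(\tau) = C(-\tau)^{2/3} \end{cases} \qquad (23.26)$$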
This solution describes a collapsing dust star for τ < τ0 = 0, collapsing to zero
radius at time τ = τ0 = 0.
where in the exterior (+) and interior (−) regions one has
$$v_+(\tau, r) = v_+(r) = \sqrt{\frac{2m}{r}} , \qquad v_-(\tau, r) = -rH(\tau) = \frac{2r}{3(-\tau)} . \qquad (23.29)$$
In particular, continuity of the metric across the surface of the star is now expressed
by the fact that on the surface of the star one has v+ = v−, with
$$v_+(R(\tau)) = v_-(\tau, R(\tau)) = \left(\frac{4m}{3(-\tau)}\right)^{1/3} . \qquad (23.30)$$
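A minimal numerical check of (23.24), (23.29) and (23.30): with C = (9m/2)^{1/3}, the exterior and interior velocity functions agree on the surface R(τ) = C(−τ)^{2/3}, and R(τ) itself falls radially with dR/dτ = −v+. The value of m and the sample times are arbitrary; the derivative is taken by central differences.

```python
import math

def surface_velocities(m, tau):
    """Return (v_plus, v_minus, closed form (23.30), -dR/dtau) at the stellar surface."""
    C = (9.0 * m / 2.0) ** (1.0 / 3.0)                  # (23.24)

    def R(t):
        return C * (-t) ** (2.0 / 3.0)                  # surface radius, valid for t < 0

    v_plus = math.sqrt(2.0 * m / R(tau))                # exterior, (23.29)
    v_minus = 2.0 * R(tau) / (3.0 * (-tau))             # interior, (23.29)
    closed = (4.0 * m / (3.0 * (-tau))) ** (1.0 / 3.0)  # (23.30)
    h = 1e-7 * abs(tau)
    inward_speed = -(R(tau + h) - R(tau - h)) / (2.0 * h)
    return v_plus, v_minus, closed, inward_speed

for tau in (-10.0, -2.0, -0.5, -0.01):
    print(tau, surface_velocities(1.3, tau))  # the four numbers coincide
```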
While the metric is now certainly continuous across the surface of the star, in order
to complete the story one should also check that the first derivatives of the metric
match on the two sides as well. Indeed, by the Einstein equations, in order for the
energy-momentum tensor to exhibit only a finite jump as one crosses Σ, rather than a
δ-function localised surface energy-momentum tensor on Σ, the first derivatives of the
metric should also be continuous across Σ. This is automatic for derivatives tangent
to the surface, and thus this continuity requirement boils down to the requirement that
the normal derivatives of the metric (i.e. derivatives in the direction orthogonal to Σ)
agree on Σ. We will come back to this issue below.
Second derivatives, however, will not and cannot be continuous across the surface, be-
cause the energy momentum tensor has spatially constant density inside the star and
is identically zero outside the star, so that by the Einstein equations also the Einstein
tensor necessarily has a discontinuity across the surface of the star.
23.4 Interlude: Aspects of the Geometry of (Non-Null) Hypersurfaces
In order to turn the above statements regarding normal derivatives of the metric into
equations, we need to develop a little bit of formalism, but will only do this to the (very
limited) extent required here.⁷²
Thus let Σ be a spacelike or timelike hypersurface of a space-time with metric gαβ (null
hypersurfaces require some extra thought and care), and denote by N α a unit normal
vector to Σ. For a spacelike / timelike surface this is a unit timelike / spacelike vector,
$$N^\alpha N_\alpha = \epsilon = \mp 1 , \qquad (23.31)$$
which is such that N α Vα = 0 for all V α tangent to Σ. While strictly speaking this
defines N α (uniquely up to a choice of sign) only on Σ, for certain purposes it is useful
to extend this to a unit vector field in a neighbourhood of Σ (e.g. by choosing Σ to be a
member of a family of hypersurfaces providing locally a foliation of space-time by such
hypersurfaces).
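Given gαβ and (such an extension of) Nα, one can construct the tensor
$$h_{\alpha\beta} = g_{\alpha\beta} - \epsilon N_\alpha N_\beta ,$$
which has the following two properties: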
1. It is orthogonal to N α ,
2. For vectors V α orthogonal to N α , the scalar product with respect to hαβ is iden-
tical to that with respect to gαβ .
These two properties together imply that essentially hαβ (restricted to Σ) is the metric
induced on Σ by gαβ . This also provides us with the projectors
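$$V^\alpha N_\alpha = 0 \;\Rightarrow\; h^\alpha{}_\beta V^\beta = V^\alpha , \qquad V^\alpha \sim N^\alpha \;\Rightarrow\; h^\alpha{}_\beta V^\beta = 0 . \qquad (23.36)$$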
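The projector properties (23.36) are easy to verify numerically. The sketch below assumes the standard construction h^α_β = δ^α_β − ε N^α N_β with ε = N^α N_α as in (23.31), takes flat Minkowski space in Cartesian coordinates as the ambient space-time, and uses boosted unit normals; the boost parameter and the test vectors are arbitrary choices.

```python
import math

ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]  # Minkowski metric, signature (-+++)

def lower(v):
    return [sum(ETA[a][b] * v[b] for b in range(4)) for a in range(4)]

def dot(u, v):
    return sum(x * y for x, y in zip(lower(u), v))

def projector(N):
    """Mixed-index h^a_b = delta^a_b - eps N^a N_b with eps = N.N = -1 or +1."""
    eps = dot(N, N)
    N_low = lower(N)
    return [[(1.0 if a == b else 0.0) - eps * N[a] * N_low[b] for b in range(4)]
            for a in range(4)]

def apply(h, v):
    return [sum(h[a][b] * v[b] for b in range(4)) for a in range(4)]

def close(u, v, tol=1e-12):
    return all(abs(x - y) < tol for x, y in zip(u, v))

b = 0.8
N_timelike = [math.cosh(b), math.sinh(b), 0.0, 0.0]   # normal to a spacelike surface, eps = -1
N_spacelike = [math.sinh(b), math.cosh(b), 0.0, 0.0]  # normal to a timelike surface, eps = +1
V = [math.sinh(b), math.cosh(b), 0.3, -1.2]           # orthogonal to N_timelike
W = [math.cosh(b), math.sinh(b), 0.5, 2.0]            # orthogonal to N_spacelike

h1, h2 = projector(N_timelike), projector(N_spacelike)
print(close(apply(h1, N_timelike), [0.0] * 4), close(apply(h1, V), V))
print(close(apply(h2, N_spacelike), [0.0] * 4), close(apply(h2, W), W))
```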
72
For a detailed discussion of the intrinsic and extrinsic geometry of hypersurfaces, as well as of
the Israel junction conditions see E. Poisson, A Relativist’s Toolkit: the Mathematics of Black Hole
Mechanics.
These projectors also allow one to map / project an arbitrary space-time tensor field to
a Σ tensor field. E.g. for a covariant 2-tensor one has
One can show that hαγ hβδ ∇γ Nδ is automatically symmetric for N α orthogonal to a
hypersurface (see below). Thus the explicit symmetrisation above is not needed and
one can define or use
$$K_{\alpha\beta} = h_\alpha{}^\gamma h_\beta{}^\delta \nabla_\gamma N_\delta . \qquad (23.41)$$
The trace of the extrinsic curvature is identical to the space-time divergence of the
vector field N α ,
$$K := g^{\alpha\beta} K_{\alpha\beta} = \nabla_\alpha N^\alpha . \qquad (23.42)$$
Remarks:
1. There is some arbitrariness in the definition (in particular the sign) of Kαβ , which
depends on a convention for the choice of N α (“inward pointing” versus “outward
pointing” in situations where such a distinction makes sense).
2. As the name suggests, the extrinsic curvature tensor is the basic object describing
the extrinsic geometry of a surface, i.e. how it is embedded into the ambient
space-time. In particular, the extrinsic curvature of an intrinsically flat surface can be
non-zero.
3. As an example, one can think of the standard 2-torus T 2 with its flat induced
metric and its standard embedding into R3 . Even simpler yet, one can think of a
1-dimensional closed space (a loop S 1 ) embedded into R2 as a circle of constant
radius ρ. This one-dimensional space is evidently intrinsically flat (the Riemann
tensor vanishes identically in 1 dimension), but equally evidently the circle seems
to bend / curve around in 2 dimensions (in order to be able to form a circle in
the first place). This bending in the ambient space is captured by the extrinsic
curvature:
• The unit normal vector to a circle is evidently the radial unit vector
$$N = \partial_r \quad\text{or}\quad N^\alpha = (1, 0) \qquad (23.43)$$
and
$$K_{\phi\phi} = \rho \qquad (23.46)$$
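The circle example can be reproduced in a few lines of Python. The sketch below assumes polar coordinates (r, φ) on R² with metric dr² + r²dφ², computes the Christoffel symbols by central finite differences of the metric (a numerical shortcut, not the method used in the text), and evaluates Kφφ = −Γ^λ_φφ Nλ for N = ∂r (the ∂φNφ term vanishes). It also checks that the trace K = g^φφ Kφφ agrees with the divergence ∇α N^α, as stated in (23.42).

```python
def polar_metric(x):
    r, _phi = x
    return [[1.0, 0.0], [0.0, r * r]]       # ds^2 = dr^2 + r^2 dphi^2

def inverse2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def christoffel(metric, x, h=1e-5):
    """G[l][a][b] = Gamma^l_{ab}, from central differences of the metric components."""
    n = len(x)
    ginv = inverse2(metric(x))
    dg = []                                  # dg[c][a][b] = d_c g_{ab}
    for c in range(n):
        xp, xm = list(x), list(x)
        xp[c] += h
        xm[c] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(n)] for a in range(n)])
    return [[[0.5 * sum(ginv[l][d] * (dg[a][d][b] + dg[b][d][a] - dg[d][a][b])
                        for d in range(n))
              for b in range(n)] for a in range(n)] for l in range(n)]

def circle_curvature(rho, phi=0.3):
    G = christoffel(polar_metric, (rho, phi))
    N_low = (1.0, 0.0)   # N_alpha for N = d_r (g_rr = 1); constant components
    N_up = (1.0, 0.0)    # N^alpha
    K_phiphi = -sum(G[l][1][1] * N_low[l] for l in range(2))            # cf. (23.46)
    trace_K = K_phiphi / rho**2                                           # g^{phi phi} K_{phi phi}
    div_N = sum(G[a][a][l] * N_up[l] for a in range(2) for l in range(2)) # d_a N^a = 0
    return K_phiphi, trace_K, div_N

print(circle_curvature(2.5))  # approximately (2.5, 0.4, 0.4)
```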
There are many ways to see that Kαβ as defined in (23.41) is automatically symmet-
ric. An elementary argument (which sidesteps things like the Frobenius integrability
criterion, for instance) is the following:
Let the hypersurface Σ be given as the zero-locus of some function F (xα ),
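$$\Sigma = \{x : F(x^\alpha) = 0\} . \qquad (23.48)$$
Then the normal vector will point in the direction of changing F, i.e.
$$N_\alpha \sim \partial_\alpha F . \qquad (23.49)$$
Thus we can write
$$N_\alpha = G\, \partial_\alpha F \qquad (23.50)$$
for a normalisation function G fixed by N^α Nα = ǫ,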
but we will not need this explicit expression. It follows from (23.50) and the basic fact
∇α ∂β F = ∇β ∂α F that
Thus ∇α Nβ is not symmetric, but its anti-symmetric part has no components tangent
to Σ, and therefore disappears in the projected expression (23.41), which is therefore
symmetric.
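The symmetry argument can be illustrated numerically in flat Euclidean R³ with Cartesian coordinates, where ∇α = ∂α. For the arbitrarily chosen ellipsoid F = x² + 2y² + 3z² − 1 = 0 and the unit normal Nα = G∂αF with G = 1/|∂F|, the sketch below computes ∂αNβ by finite differences: it is not symmetric, but its projection with hαβ = δαβ − NαNβ is, as claimed.

```python
import math

def grad_F(p):
    """Gradient of F = x^2 + 2y^2 + 3z^2 - 1 (an illustrative ellipsoid)."""
    x, y, z = p
    return (2.0 * x, 4.0 * y, 6.0 * z)

def unit_normal(p):
    """N_a = G d_a F with G = 1/|dF|, defined in a neighbourhood of the surface."""
    g = grad_F(p)
    norm = math.sqrt(sum(c * c for c in g))
    return tuple(c / norm for c in g)

def dN(p, h=1e-6):
    """D[a][b] = d_a N_b by central differences (flat space, so nabla = d)."""
    D = [[0.0] * 3 for _ in range(3)]
    for a in range(3):
        pp, pm = list(p), list(p)
        pp[a] += h
        pm[a] -= h
        Np, Nm = unit_normal(pp), unit_normal(pm)
        for b in range(3):
            D[a][b] = (Np[b] - Nm[b]) / (2.0 * h)
    return D

def project(p, D):
    """K_ab = h_a^c h_b^d D_cd with h_ab = delta_ab - N_a N_b (Euclidean, eps = +1)."""
    N = unit_normal(p)
    hp = [[(1.0 if a == b else 0.0) - N[a] * N[b] for b in range(3)] for a in range(3)]
    return [[sum(hp[a][c] * hp[b][d] * D[c][d] for c in range(3) for d in range(3))
             for b in range(3)] for a in range(3)]

def max_antisym(M):
    return max(abs(M[a][b] - M[b][a]) for a in range(3) for b in range(3))

x, y = 0.5, 0.4
p = (x, y, math.sqrt((1.0 - x * x - 2.0 * y * y) / 3.0))   # a point with F(p) = 0
D = dN(p)
K = project(p, D)
print(max_antisym(D))   # clearly non-zero
print(max_antisym(K))   # zero up to finite-difference error
```

The antisymmetric part of ∂αNβ is proportional to ∂[αG ∂β]F, and the projector removes every term containing ∂βF ∝ Nβ.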
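One more useful property of Kαβ is that if uα is the tangent vector field to a congruence
of geodesics that are tangent to Σ, then the diagonal component of Kαβ in the
direction of uα is zero,
$$u^\alpha \nabla_\alpha u^\beta = u^\alpha N_\alpha = 0 \;\Rightarrow\; K_{\alpha\beta} u^\alpha u^\beta = 0 . \qquad (23.53)$$
(Since hαβ u^β = uα, one has Kαβ u^α u^β = u^α ∇α(u^β Nβ) − Nβ u^α ∇α u^β, and both terms vanish.)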
One also sees that the geodesics need not be affinely parametrised for this to hold; if
one has uα ∇α uβ ∼ uβ , one still has Kαβ uα uβ = 0.
There are many more things that could and should be said about this, but (for the time
being at least) I will just add the following remarks:
Remarks:
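$$x^\alpha = x^\alpha(y^m) , \qquad (23.55)$$
$$E^\alpha_m = \frac{\partial x^\alpha}{\partial y^m} \qquad (23.56)$$
are tangent to Σ (because they describe variations along the y^m-directions). Indeed, on the surface one has
$$\frac{\partial}{\partial y^m} = E^\alpha_m \frac{\partial}{\partial x^\alpha} . \qquad (23.57)$$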
The Emα can alternatively be used to project space-time tensors to surface tensors.
For example, one can write the induced metric in terms of the coordinates y m on
the surface as
$$h_{mn} = g_{\alpha\beta} E^\alpha_m E^\beta_n = h_{\alpha\beta} E^\alpha_m E^\beta_n . \qquad (23.59)$$
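As a concrete instance of (23.56) and (23.59), the sketch below embeds a round 2-sphere of radius ρ in Euclidean R³ via the standard spherical parametrisation y^m = (θ, φ), builds E^α_m by central finite differences, and recovers hmn = diag(ρ², ρ² sin²θ). The parametrisation and the sample point are illustrative assumptions; the last line checks that the E_m are orthogonal to the radial unit normal.

```python
import math

def embedding(y, rho):
    theta, phi = y
    return (rho * math.sin(theta) * math.cos(phi),
            rho * math.sin(theta) * math.sin(phi),
            rho * math.cos(theta))

def tangent_vectors(y, rho, h=1e-6):
    """E[m][alpha] = d x^alpha / d y^m, cf. (23.56), by central differences."""
    E = []
    for m in range(2):
        yp, ym = list(y), list(y)
        yp[m] += h
        ym[m] -= h
        xp, xm = embedding(yp, rho), embedding(ym, rho)
        E.append([(xp[a] - xm[a]) / (2 * h) for a in range(3)])
    return E

def induced_metric(y, rho):
    """h_mn = g_ab E^a_m E^b_n with g_ab = delta_ab on Euclidean R^3, cf. (23.59)."""
    E = tangent_vectors(y, rho)
    return [[sum(E[m][a] * E[n][a] for a in range(3)) for n in range(2)] for m in range(2)]

theta, phi, rho = 0.7, 1.9, 2.0
h_mn = induced_metric((theta, phi), rho)
print(h_mn)   # approximately [[4, 0], [0, 4 sin^2(0.7)]]
normal = [c / rho for c in embedding((theta, phi), rho)]
print([sum(n * e for n, e in zip(normal, E_m)) for E_m in tangent_vectors((theta, phi), rho)])
```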
Since by definition the induced metric only has components in the directions (co-
)tangent to Σ, one can equivalently write it in terms of the space-time coordinates
as hαβ or in terms of the coordinates on the surface as hmn , and the Emα allow one
hmn and Kmn (or equivalently hαβ and Kαβ ) are also known as the 1st fundamental
form and 2nd fundamental form of Σ respectively.
3. In the same way one can project other tensors, in particular also covariant deriva-
tives of tensors. For example, if V α is tangent to Σ, i.e. V α Nα = 0, then one can
define an induced covariant derivative by
$$\bar\nabla_m V_n = (\nabla_\alpha V_\beta)\, E^\alpha_m E^\beta_n \qquad (23.61)$$
etc. It turns out that this induced covariant derivative is identical to
the canonical (Levi-Civita) covariant derivative associated with the induced metric
hmn (the easiest way to show this is to show that ∇̄m is symmetric (torsion-free)
and compatible with hmn).
4. If the vector V α is not tangent to Σ, then one can decompose Vβ into a tangent
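and normal part according to (indices on E_β^m have been lowered / raised with gαβ
and hmn respectively)
$$V_\beta = E_\beta^m V_m + \epsilon (N^\gamma V_\gamma) N_\beta . \qquad (23.62)$$
Then there is a 2nd contribution to (23.61), namely
$$\begin{aligned} E^\alpha_m E^\beta_n \nabla_\alpha V_\beta &= \bar\nabla_m V_n + E^\alpha_m E^\beta_n \nabla_\alpha\big(\epsilon (N^\gamma V_\gamma) N_\beta\big) \\ &= \bar\nabla_m V_n + \epsilon (N^\gamma V_\gamma)\, E^\alpha_m E^\beta_n \nabla_\alpha N_\beta \\ &= \bar\nabla_m V_n + \epsilon (N^\gamma V_\gamma)\, K_{mn} , \end{aligned} \qquad (23.63)$$
and this is one of the ways in which the extrinsic curvature enters expressions for
covariant derivatives.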
5. The extrinsic curvature tensor also enters when one inquires about the normal
component of the simply-projected quantity E^α_m ∇α Vβ (with Vβ again assumed to
be tangential, Vβ N^β = 0). This normal component is given by the scalar product
with N^β and, since the derivative of Vβ N^β = 0 along Σ vanishes, can be written as
$$(E^\alpha_m \nabla_\alpha V_\beta) N^\beta = -E^\alpha_m V^\beta \nabla_\alpha N_\beta = -K_{mn} V^n . \qquad (23.64)$$
Both (23.63) and (23.65) illustrate that the extrinsic curvature tensor is essentially
the same as (a tensorial repackaging of) the normal components of the connection.
6. With the help of these and related expressions one can then relate the curvature
tensor of gαβ to that of hmn. This gives rise to the so-called Gauss-Codazzi equations,
which express the space-time curvature tensor in terms of the intrinsic and
extrinsic curvatures of the hypersurface. These relations play an important role
in particular in the Hamiltonian and initial value formulations of the Einstein
equations, where the first step is the choice of an initial spacelike hypersurface Σ
and an accompanying 4 → 3 + 1 decomposition of the curvature tensor and the
Einstein equations.
Returning to the issue at hand, we can now formulate the condition for the continuity
of the normal derivative of the metric across the surface of the star as the condition
that the interior and exterior extrinsic curvatures be equal,
$$K^-_{\alpha\beta} = K^+_{\alpha\beta} . \qquad (23.66)$$
In order to check if this condition is satisfied, as the first step we need to determine the
normal vector N α .
To that end we note first that the tangent directions to the surface of the star are the
two spacelike directions tangent to the 2-sphere S 2 , as well as the timelike direction
uα spanned by the geodesics describing the free-fall motion of the surface. Thus the
normal vector N α is a spacelike radial vector orthogonal to uα . This is most effortlessly
determined by recalling that in Painlevé-Gullstrand coordinates one has
$$N^\alpha = (0, 1, 0, 0) \quad\text{or}\quad N = N^\alpha \partial_\alpha = \partial_r . \qquad (23.68)$$
Since N^α has this simple form (the same both outside and inside the star), whereas
its associated covector has the marginally more complicated form
$$N_\alpha = (v_\pm, 1, 0, 0) , \qquad (23.69)$$
it is convenient to write the expression (23.41) for the extrinsic curvature in terms of
the covariant derivative of N α as
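$$K_{\alpha\beta} = h_\alpha{}^\gamma h_{\beta\delta} \nabla_\gamma N^\delta = h_\alpha{}^\gamma h_{\beta\delta} \Gamma^\delta_{\gamma\mu} N^\mu = h_\alpha{}^\gamma h_{\beta\delta} \Gamma^\delta_{\gamma r} \qquad (23.70)$$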
(restricted to Σ).
1. Let xa = (θ, φ) denote the angular coordinates. Then one has
because Nα has no angular components. Thus for the angular components one
simply needs
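$$g_{ab}dx^a dx^b = r^2 d\Omega^2 \;\Rightarrow\; \Gamma_{abr} = \tfrac12 \partial_r g_{ab} = r\,(d\theta_a d\theta_b + \sin^2\theta\, d\phi_a d\phi_b) \qquad (23.72)$$
and therefore
$$K^+_{ab} = K^-_{ab} = R(\tau)\,(d\theta_a d\theta_b + \sin^2\theta\, d\phi_a d\phi_b) . \qquad (23.73)$$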
and all three terms are individually zero: (i) gaβ uβ = 0 because uβ has no com-
ponent in an angular direction, (ii) gar = 0, (iii) gβr components are independent
of xa . Therefore one has
$$K^+_{a\beta} u^\beta = K^-_{a\beta} u^\beta = 0 . \qquad (23.75)$$
3. It remains to show that the u–u components K^±_αβ u^α u^β are equal. This can be
shown by explicit calculation, but it is more enlightening to note that this follows
in general from (23.54), because uα is geodesic. Thus
$$K^+_{\alpha\beta} u^\alpha u^\beta = K^-_{\alpha\beta} u^\alpha u^\beta = 0 . \qquad (23.76)$$
Altogether we have thus shown that
$$K^+_{\alpha\beta} = K^-_{\alpha\beta} \qquad (23.77)$$
and therefore the continuity of the first derivatives of the metric across the surface of
the star. We also learn that this is essentially due to the fact that we have matched the
two space-times along geodesics (something that would not have been necessary if we
had just wanted to have a continuous metric).
ri is a comoving coordinate, and we assume that particles on the surface of the star
move along the geodesics with ri = r0, r0 labelling the maximal radius of the star. As
already described in section 15.9, the restriction of the Novikov metric (15.43) to the
surface of the star, i.e. to the comoving radial coordinate ri = r0 , is
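$$\begin{aligned} ds^2_{\Sigma_{ext}} &= \Big[-d\tau^2 + f(r_i)^{-1} r'(\tau, r_i)^2 dr_i^2 + r(\tau, r_i)^2 d\Omega^2\Big]_{r_i = r_0} \\ &= -d\tau^2 + R_0(\tau)^2 d\Omega^2 \end{aligned} \qquad (23.79)$$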
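$$R_0(\eta) = \tfrac12 r_0 (1 + \cos\eta) , \qquad \tau(\eta) = \left(\frac{r_0^3}{8m}\right)^{1/2} (\eta + \sin\eta) , \qquad (23.81)$$
$$ds^2 = -d\tau^2 + a(\tau)^2 \left(\frac{dr_c^2}{1 - r_c^2} + r_c^2 d\Omega^2\right) . \qquad (23.82)$$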
We assume that from the interior point of view the surface of the star is at the fixed
comoving radius rc = rc,0 , so that the induced metric on the surface of the star is
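$$\begin{aligned} ds^2_{\Sigma_{int}} &= \Big[-d\tau^2 + a(\tau)^2 \Big(\frac{dr_c^2}{1 - r_c^2} + r_c^2 d\Omega^2\Big)\Big]_{r_c = r_{c,0}} \\ &= -d\tau^2 + a(\tau)^2 r_{c,0}^2\, d\Omega^2 \equiv -d\tau^2 + \tilde R_0(\tau)^2 d\Omega^2 . \end{aligned} \qquad (23.83)$$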
$$\dot a^2 + 1 = \frac{C_m}{a} , \qquad (23.84)$$
whose solution in parametrised form is (21.31). Shifting η (by π) and t = τ (by amax π/2)
so that the maximal radius amax = Cm is reached at η = 0, τ = 0, the solution takes
the form
$$a(\eta) = \frac{C_m}{2}(1 + \cos\eta) , \qquad \tau(\eta) = \frac{C_m}{2}(\eta + \sin\eta) . \qquad (23.85)$$
Thus R̃0 (τ ) = rc,0 a(τ ) satisfies the equation
$$\dot{\tilde R}_0(\tau)^2 + r_{c,0}^2 = \frac{r_{c,0}^3\, C_m}{\tilde R_0} . \qquad (23.86)$$
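The parametric solution (23.85) can be checked against (23.84) and (23.86) using the chain rule ȧ = (da/dη)/(dτ/dη). The values of Cm and rc,0 below are arbitrary; the check is exact up to floating-point rounding.

```python
import math

def a_of_eta(eta, Cm):
    return 0.5 * Cm * (1.0 + math.cos(eta))        # (23.85)

def tau_of_eta(eta, Cm):
    return 0.5 * Cm * (eta + math.sin(eta))        # (23.85)

def adot(eta, Cm):
    """da/dtau = (da/deta) / (dtau/deta) = -sin(eta) / (1 + cos(eta))."""
    return (-0.5 * Cm * math.sin(eta)) / (0.5 * Cm * (1.0 + math.cos(eta)))

def friedmann_residual(eta, Cm):
    return adot(eta, Cm) ** 2 + 1.0 - Cm / a_of_eta(eta, Cm)      # (23.84)

def surface_residual(eta, Cm, rc0):
    R = rc0 * a_of_eta(eta, Cm)                                    # R~0 = rc0 a
    Rdot = rc0 * adot(eta, Cm)
    return Rdot**2 + rc0**2 - rc0**3 * Cm / R                      # (23.86)

etas = (-2.5, -1.0, 0.0, 0.7, 2.0)
print(max(abs(friedmann_residual(e, 3.0)) for e in etas))
print(a_of_eta(0.0, 3.0), tau_of_eta(0.0, 3.0))   # maximal radius a_max = Cm at tau = 0
```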
Continuity of the metric across the surface of the star requires R0 (τ ) = R̃0 (τ ), and
comparison of (23.80) and (23.86), say, gives us two conditions. The first,
$$\frac{2m}{r_0} = r_{c,0}^2 \;\Leftrightarrow\; r_{c,0} = \sqrt{2m/r_0} < 1 \qquad (23.87)$$
just provides us with the relation between the comoving Novikov coordinate r0 and the
comoving Robertson-Walker coordinate rc,0 . The second,
$$2m = C_m\, r_{c,0}^3 , \qquad (23.88)$$
is identical to the condition (23.17) found in the previous section, in the context of the
k = 0 PG collapse, with the same consequence
$$M \equiv \frac{m}{G_N} = \frac{4\pi}{3}\,\rho(\tau)\,\tilde R_0(\tau)^3 . \qquad (23.89)$$
The physical content of this equation is again that the (constant) Schwarzschild mass
M , i.e. the gravitational mass of the collapsing star as seen from the outside, is at any
time τ given by the product of the density and the (coordinate) volume of the star (cf.
also the comment on coordinate versus proper volume in this context in section 12.6).
The two conditions (23.87) and (23.88) can also equivalently be written as
and with these identifications it is now manifest that the solution for R0(τ) given in
(23.81) is identical to the solution for R̃0 (τ ) = rc,0 a(τ ) obtained from (23.85). Thus
the metric is now manifestly continuous across the surface of the star (with analogous
comments regarding its 1st and 2nd derivatives as in the case k = 0).
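The identifications (23.87) and (23.88) can be checked directly: fixing m and r0 (with r0 > 2m), and setting rc,0 = (2m/r0)^{1/2} and Cm = 2m/rc,0³, the exterior surface trajectory (23.81) and the interior one rc,0 a(η) from (23.85) coincide for every η, in radius as well as in proper time. The numbers below are illustrative.

```python
import math

def exterior_surface(eta, m, r0):
    R = 0.5 * r0 * (1.0 + math.cos(eta))                          # (23.81)
    tau = math.sqrt(r0**3 / (8.0 * m)) * (eta + math.sin(eta))    # (23.81)
    return R, tau

def interior_surface(eta, Cm, rc0):
    a = 0.5 * Cm * (1.0 + math.cos(eta))                          # (23.85)
    tau = 0.5 * Cm * (eta + math.sin(eta))                        # (23.85)
    return rc0 * a, tau                                           # R~0 = rc0 a

def matched_parameters(m, r0):
    if r0 <= 2.0 * m:
        raise ValueError("need r0 > 2m so that rc0 < 1")
    rc0 = math.sqrt(2.0 * m / r0)     # (23.87)
    Cm = 2.0 * m / rc0**3             # (23.88)
    return rc0, Cm

m, r0 = 1.0, 7.0
rc0, Cm = matched_parameters(m, r0)
for eta in (-3.0, -1.5, 0.0, 1.0, 3.0):
    print(exterior_surface(eta, m, r0), interior_surface(eta, Cm, rc0))
```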
24 de Sitter and anti-de Sitter Space
(Anti-) de Sitter spaces are the simplest solutions of the Einstein equations with Λ ≠ 0.
As such they are the curved counterparts of the Λ = 0 Minkowski space-time. In
particular, they are the unique constant curvature (or maximally symmetric) space-
times, just as Minkowski space is the unique flat (or Poincaré-symmetric) space-time.
Thus they are the Lorentzian-signature counterparts of spheres and hyperboloids, and
made a first appearance as such in section 17. Since they are thus in some sense the
simplest non-trivial space-times, it is worthwhile to study them in some detail.
The de Sitter and anti-de Sitter spaces subsequently reappeared in the context of cos-
mology in section 21.5, but it may not be immediately apparent that what we called
(anti-) de Sitter there is indeed identical to what we called (anti-) de Sitter in section
17. In order to bridge this gap, we will in particular redo the analysis of section 17.4
in the special case of interest. We will realise the (A)dS spaces via embeddings into a
higher-dimensional vector space, and we will use this embedding to express the resulting
induced metric in various coordinate systems. In this way we will, in particular, also
recover the metrics encountered in the cosmological context above. This will complete
the proof that the solution of the Friedmann equations in the cosmological constant
dominated phase is unique (for a given cosmological constant Λ) and uniquely given by
the maximally symmetric (A)dS space.
We will now discuss the embeddings of (A)dS space, beginning with the more familiar
cases of Euclidean signature spheres and hyperboloids, already discussed at some length
in section 17.
We will denote the coordinates of the 5-dimensional embedding space by z A , the range of
the indices (e.g. 1 to 5 or 0 to 4) being chosen to be whatever is convenient or suggestive
in the case at hand.
$$S^4:\ (z^1)^2 + \dots + (z^5)^2 = +1 . \qquad (24.1)$$
If we wanted to discuss the sphere of radius R, we would replace the +1 on the
right-hand side of the above equation by R² (and likewise for the radius R or
curvature radius ℓ below).
Its isometry group is the (4 + 1)-dimensional Poincaré group, which has dimension
15. The equation defining H 4 is left invariant by its SO(4, 1) Lorentz-subgroup
which has dimension 10, and thus the metric induced on H 4 by the Minkowski
metric on the embedding space will have isometry-group SO(4, 1) and is maximally
symmetric. This metric has Euclidean signature because the (-1) on the right-hand
side of (24.2) allows one to completely eliminate the time-like direction z 0 , and
the corresponding line element is denoted by dΩ̃24 .
3. If we change the sign on the right-hand side of (24.2), the equation will still
be invariant under SO(4, 1), but now the signature of the induced metric will
be Lorentzian instead of Euclidean and we obtain a realisation of a maximally
symmetric space-time, namely de Sitter space,
Since this equation is SO(3, 2)-invariant, this space-time will have isometry group
SO(3, 2), induced from the signature (2,3) metric on the embedding-space R2,3 .
The dimension of SO(3, 2) is also 10, just like that of SO(4, 1) or SO(5), and
(24.5) defines the maximally symmetric Anti-de Sitter space (actually, for AdS we
will take the universal covering space of the space described by (24.5) - we will
come back to this below).
As already indicated in section 17, the statements about the isometries of maximally
symmetric space(-time)s can be compactly summarised by writing them as homogeneous
spaces of the isometry groups.
This generalises the statement that the 2-sphere can be written as the homogeneous
space (or coset space) SO(3)/SO(2), which itself comes about as follows:
1. It is clear that the SO(3) rotations act transitively on the 2-sphere, i.e. that any
point can be mapped to any other point (this is the property of homogeneity
discussed in section 17).
2. Moreover, given any point p on the 2-sphere, there is an SO(2) subgroup of SO(3),
SO(2)p ⊂ SO(3), consisting of rotations around the axis passing through that
point, that leaves the point p invariant but acts on the vectors at that point by
2-dimensional rotations (isotropy).
3. Since this SO(2)p -transformation must also be a symmetry of the metric at that
point, this shows in particular that the metric has a Euclidean signature at each
point.
4. Putting all this together, given any point p, we can establish a 1:1 correspondence
between points on S 2 and elements of SO(3) modulo elements of SO(2)p , and we
write this as
$$S^2 \cong SO(3)/SO(2) , \qquad (24.6)$$
the set on the right-hand side considered as the set of equivalence classes [g] with
g ∈ SO(3) and [gh] = [g] for h ∈ SO(2), say (this defines right-cosets, SO(2)
acting on the right on an SO(3)-element).
These statements generalise straightforwardly to the 4-sphere, and also to the other
maximally symmetric space-times discussed above, and we summarise these facts in the
following table, adding also the notation we will occasionally use for the correspond-
ing line-element (when we do not just write it anonymously as ds2 ), and giving the
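embedding (with z⃗² = (z¹)² + (z²)² + (z³)²):
$$\begin{array}{l|l|l|l|l} M \cong G/H & G & H & d\Omega^2 & \text{embedding} \\ \hline M = S^4 & SO(5) & SO(4) & d\Omega_4^2 & +\vec z^{\,2} + (z^4)^2 + (z^5)^2 = +1 \\ M = H^4 & SO(4,1) & SO(4) & d\tilde\Omega_4^2 & -(z^0)^2 + \vec z^{\,2} + (z^4)^2 = -1 \\ M = dS_4 & SO(4,1) & SO(3,1) & d\Omega_{1,3}^2 & -(z^0)^2 + \vec z^{\,2} + (z^4)^2 = +1 \\ M = AdS_4 & SO(3,2) & SO(3,1) & d\tilde\Omega_{1,3}^2 & -(z^0)^2 + \vec z^{\,2} - (z^4)^2 = -1 \end{array} \qquad (24.7)$$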
of the metric ηAB of the embedding space, and they satisfy the Lie bracket algebra
$$[J_{AB}, J_{CD}] = \eta_{AD} J_{BC} + \eta_{BC} J_{AD} - \eta_{AC} J_{BD} - \eta_{BD} J_{AC} \qquad (24.9)$$
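The algebra (24.9) can be checked mechanically. The sketch assumes that the generators are realised as the vector fields J_AB = z_A ∂_B − z_B ∂_A on the embedding space (with z_A = η_AB z^B). Each such linear vector field is encoded by an integer matrix M via J = (Mz)^C ∂_C, and the Lie bracket of two linear vector fields corresponds to M2M1 − M1M2 (note the order). The check runs over all index values for the dS₄ and AdS₄ embedding metrics of (24.7).

```python
import itertools

def generator(A, B, eta):
    """Matrix M with J_AB = (M z)^C d_C, i.e. J_AB = z_A d_B - z_B d_A, z_A = eta_AD z^D."""
    n = len(eta)
    return [[(1 if C == B else 0) * eta[A][D] - (1 if C == A else 0) * eta[B][D]
             for D in range(n)] for C in range(n)]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def combine(terms):
    """Sum of c * M over a list of (c, M) pairs."""
    n = len(terms[0][1])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(n)] for i in range(n)]

def vector_field_bracket(M1, M2):
    # For linear vector fields X_M = (M z)^C d_C one finds [X_M1, X_M2] = X_(M2 M1 - M1 M2).
    return combine([(1, matmul(M2, M1)), (-1, matmul(M1, M2))])

def satisfies_24_9(eta):
    n = len(eta)
    J = {(A, B): generator(A, B, eta) for A in range(n) for B in range(n)}
    for A, B, C, D in itertools.product(range(n), repeat=4):
        lhs = vector_field_bracket(J[A, B], J[C, D])
        rhs = combine([(eta[A][D], J[B, C]), (eta[B][C], J[A, D]),
                       (-eta[A][C], J[B, D]), (-eta[B][D], J[A, C])])
        if lhs != rhs:
            return False
    return True

def diag(*entries):
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]

ETA_DS = diag(-1, 1, 1, 1, 1)    # embedding metric for dS_4, cf. (24.7)
ETA_ADS = diag(-1, 1, 1, 1, -1)  # embedding metric for AdS_4, cf. (24.7)
print(satisfies_24_9(ETA_DS), satisfies_24_9(ETA_ADS))
```

Integer matrices make the comparison exact, so no tolerance is needed.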
24.2 Some Coordinate Systems for de Sitter space
$$-(z^0)^2 + (z^1)^2 + \dots + (z^4)^2 = +1 . \qquad (24.10)$$
This describes a time-like hyperboloid with topology R × S 3 , the S 3 arising from the
slicing at fixed z 0 ,
1. Global Coordinates
Writing (24.10), as in (24.11), in the form
$$(z^1)^2 + \dots + (z^4)^2 = 1 + (z^0)^2 , \qquad (24.12)$$
(analogous identities will be used repeatedly in the following). Then one finds the
metric
$$ds^2 = \Big(-(dz^0)^2 + \sum_a (dz^a)^2\Big)\Big|_{(24.10)} = -d\tau^2 + \cosh^2\tau\, d\Omega_3^2 . \qquad (24.15)$$
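To see (24.15) emerge from the embedding without the bookkeeping of S³, the sketch below treats the two-dimensional analogue dS₂ ⊂ R^{1,2}. It assumes the global parametrisation z⁰ = sinh τ, (z¹, z²) = cosh τ (cos φ, sin φ), which solves the dS₂ version of (24.12) because cosh²τ − sinh²τ = 1. Pulling back the embedding metric numerically should reproduce −dτ² + cosh²τ dφ².

```python
import math

ETA = (-1.0, 1.0, 1.0)   # diagonal embedding metric of R^{1,2}

def embed(tau, phi):
    return (math.sinh(tau), math.cosh(tau) * math.cos(phi), math.cosh(tau) * math.sin(phi))

def constraint(z):
    return sum(e * c * c for e, c in zip(ETA, z))     # equals +1 on dS_2, cf. (24.10)

def pullback(f, y, eta, h=1e-6):
    """g_mn = eta_AB (dz^A/dy^m)(dz^B/dy^n), derivatives by central differences."""
    E = []
    for m in range(len(y)):
        yp, ym = list(y), list(y)
        yp[m] += h
        ym[m] -= h
        zp, zm = f(*yp), f(*ym)
        E.append([(p - q) / (2 * h) for p, q in zip(zp, zm)])
    return [[sum(e * a * b for e, a, b in zip(eta, E[m], E[n])) for n in range(len(y))]
            for m in range(len(y))]

tau, phi = 0.8, 2.1
g = pullback(embed, (tau, phi), ETA)
print(constraint(embed(tau, phi)))   # 1 up to rounding
print(g, math.cosh(tau) ** 2)        # g is approximately [[-1, 0], [0, cosh^2 tau]]
```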
Remarks:
(a) This is the k = +1, Λ > 0 solution (21.46) of the Friedmann equations, thus
confirming that the solution found there is maximally symmetric.
(b) The manifest symmetries in this coordinate system are the symmetries of S 3 ,
i.e. the subgroup SO(4) ⊂ SO(4, 1) of the total isometry group (thus 6 out
of 10 isometries are manifest).
(c) It can be read off from (24.10) that these coordinates cover the hyperboloid
globally (modulo the usual, and utterly harmless, issues with spherical coor-
dinates at the poles of a sphere).
(d) These are the Lorentzian-signature counterparts of the standard (hyper-
)spherical coordinates on S 4 , and are related to these by “Wick rotation”
(continuation to imaginary time),
For z 4 > 1, the right-hand side is negative and slices of constant z 4 are therefore
hyperboloids H 3 . In that case it is natural to introduce
The nα thus parametrise H 3 , and one finds that the de Sitter metric can be written
as
$$ds^2 = -d\tau^2 + \sinh^2\tau\, d\tilde\Omega_3^2 , \qquad (24.19)$$
with dΩ̃₃² the line-element on the unit hyperboloid in any coordinate system.
Remarks:
The nα parametrise dS3 , and one finds the metric
$$ds^2 = \frac{dr^2}{1 - r^2} + r^2 d\Omega_{1,2}^2 = d\psi^2 + \sin^2\psi\, d\Omega_{1,2}^2 , \qquad (24.22)$$
with dΩ²_{1,2} the line-element on the unit curvature radius (2 + 1)-dimensional de
Sitter space dS3.
4. Planar Coordinates
In order to reproduce the spatially flat k = 0 metric (21.45), we introduce (ad-
mittedly with a certain amount of hindsight) coordinates t and xk through
$$z^4 - z^0 = e^{-t} , \qquad z^4 + z^0 = e^{t} - \vec x^{\,2} e^{-t} , \qquad z^k = e^{-t} x^k . \qquad (24.23)$$
It is evident that this also solves (24.10). Then one finds the metric
Remarks:
$$z^4 - z^0 = e^{-t} \ge 0 . \qquad (24.27)$$
and introducing a new time coordinate τ = ∓exp(∓t), the metric takes the
form
$$ds^2 = \tau^{-2}\left(-d\tau^2 + d\vec x^{\,2}\right) \qquad (24.29)$$
with τ ∈ (−∞, 0) or τ ∈ (0, ∞) respectively. This is the positive curvature
space-time counterpart of the generalised Poincaré upper half plane metric
(8.75) and the de Sitter counterpart of the anti-de Sitter Poincaré coordinates
(24.71).
5. Static Coordinates
So far, the metric in all the coordinate systems was explicitly time-dependent
(with a different time-coordinate in each case). It is possible to locally introduce
a coordinate system that is time-independent. Namely, let us write (24.10) as
$$(z^1)^2 + (z^2)^2 + (z^3)^2 \equiv \sum_k (z^k)^2 = 1 + (z^0)^2 - (z^4)^2 \qquad (24.30)$$
and introduce a spatial radial coordinate r via Σk (z^k)² = r². Then one has
$$\sum_k (z^k)^2 = r^2 \;\Rightarrow\; (z^4)^2 - (z^0)^2 = 1 - r^2 . \qquad (24.31)$$
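The static form can be checked in the same way. The sketch below again uses the two-dimensional analogue and assumes the parametrisation z⁴ = (1 − r²)^{1/2} cosh t, z⁰ = (1 − r²)^{1/2} sinh t, z¹ = r, one natural solution of (24.31) for |r| < 1 and z⁴ > |z⁰|. Pulling back the embedding metric should give −(1 − r²)dt² + dr²/(1 − r²), i.e. the (t, r) part of a static metric with f(r) = 1 − r².

```python
import math

ETA = (-1.0, 1.0, 1.0)   # embedding coordinates ordered as (z^0, z^1, z^4)

def embed_static(t, r):
    s = math.sqrt(1.0 - r * r)
    return (s * math.sinh(t), r, s * math.cosh(t))

def pullback(f, y, eta, h=1e-6):
    """g_mn = eta_AB (dz^A/dy^m)(dz^B/dy^n), derivatives by central differences."""
    E = []
    for m in range(len(y)):
        yp, ym = list(y), list(y)
        yp[m] += h
        ym[m] -= h
        zp, zm = f(*yp), f(*ym)
        E.append([(p - q) / (2 * h) for p, q in zip(zp, zm)])
    return [[sum(e * a * b for e, a, b in zip(eta, E[m], E[n])) for n in range(len(y))]
            for m in range(len(y))]

t, r = 0.6, 0.45
z = embed_static(t, r)
print(sum(e * c * c for e, c in zip(ETA, z)))   # +1: the point lies on dS_2
g = pullback(embed_static, (t, r), ETA)
print(g)   # approximately [[-(1 - r^2), 0], [0, 1/(1 - r^2)]]
```

Note that the Killing vector ∂t is timelike only for r < 1, which is why these coordinates cover just part of the hyperboloid.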
Remarks:
can be confirmed explicitly by introducing coordinates that are appropriate
in that range, namely instead of (24.32)
The existence of this alternative form of the metric beyond the horizon of
the static metric is not surprising: we did (but then quickly dismissed as
not particularly insightful) something analogous in the Schwarzschild case,
introducing the time coordinate T = r and the radial coordinate R = t in
the region 0 < r < 2m (15.1), and we will also briefly consider something
analogous in the anti-de Sitter case below - see (24.83).
of the metric is implied not just by the vacuum Einstein equations for spherical
symmetry but, more generally, by the Einstein equations in spherical symmetry
whenever T^t_t = T^r_r. This condition is, in particular, satisfied when the matter
content is that of a cosmological constant,
$$T_{\mu\nu} = -\frac{\Lambda}{8\pi G_N}\, g_{\mu\nu} \;\Rightarrow\; T^t{}_t = T^r{}_r = -\frac{\Lambda}{8\pi G_N} . \qquad (24.40)$$
Then the Einstein equations (12.54) reduce to the simple equation
leading to
$$f(r) = 1 - \frac{2m(r)}{r} = 1 - \frac{2m_0}{r} \mp \frac{r^2}{\ell^2} . \qquad (24.43)$$
Remarks:
(a) In particular, for m0 = 0 and a positive cosmological constant we recover
the above static metric (24.33) (for ℓ = 1, but we could have done all the
constructions for any ℓ).
(b) Likewise, we learn that for Λ < 0 there will be a static coordinate system in
which the anti-de Sitter metric takes the form (24.33) with −r 2 → +r 2 . We
will recover and reconfirm this below, see (24.56), when studying the AdS
metrics more systematically.
(c) Remarkably, by including the integration constant m0 we have actually found
a more general class of solutions of the Einstein equations with a cosmological
constant, which are interesting in their own right, namely the (A)dS
Schwarzschild solutions
$$ds^2 = -\Big(1 - \frac{2m_0}{r} \mp \frac{r^2}{\ell^2}\Big) dt^2 + \Big(1 - \frac{2m_0}{r} \mp \frac{r^2}{\ell^2}\Big)^{-1} dr^2 + r^2 d\Omega_2^2 , \qquad (24.44)$$
already mentioned briefly in section 15.10. These describe a black hole of
mass m0 immersed in an (A)dS background.
7. Painlevé-Gullstrand-like Coordinates
In section 15.1 we had seen how to construct, from a static spherically symmetric
metric of the standard (and, as it subsequently turned out, ubiquitous) form
(24.39), e.g. a metric with flat constant-time slices by means of a coordinate
transformation T(t, r) = t + ψ(r). In the case at hand (24.33), with f(r) = 1 − r², the
condition (15.18) leads to
which is solved by
$$\psi(r) = \pm\tfrac12 \ln(1 - r^2) . \qquad (24.46)$$
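A quick numerical check of (24.46), assuming that the static metric (24.33) has the form −f dt² + dr²/f + r²dΩ² with f = 1 − r², and substituting t = T − ψ(r). The (T, r) components then become g_TT = −f, g_Tr = fψ′ and g_rr = 1/f − fψ′². For the upper sign ψ = +½ ln(1 − r²) the sketch confirms g_rr = 1 (flat constant-T slices) and g_Tr = −r, i.e. ds² = −dT² + (dr − r dT)² + r²dΩ² inside the static patch r < 1; the lower sign flips the sign of the cross term.

```python
import math

def f(r):
    return 1.0 - r * r                      # static de Sitter, unit curvature radius

def psi(r):
    return 0.5 * math.log(1.0 - r * r)      # upper-sign solution of (24.46)

def dpsi(r, h=1e-6):
    return (psi(r + h) - psi(r - h)) / (2.0 * h)

def pg_components(r):
    """(g_TT, g_Tr, g_rr) after t = T - psi(r) in -f dt^2 + dr^2 / f."""
    return -f(r), f(r) * dpsi(r), 1.0 / f(r) - f(r) * dpsi(r) ** 2

for r in (0.1, 0.5, 0.9):
    print(r, pg_components(r))   # approximately (-(1 - r^2), -r, 1)
```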
Performing the coordinate transformation from the static metric (24.33) with this
choice of ψ(r) leads to the metric
Remarks:
73
See M. Parikh, New Coordinates for de Sitter Space and de Sitter Radiation, arXiv:hep-th/0204107
for some applications of this PG-form of the de Sitter line-element.
(a) The Killing vector $\partial_T = \partial_t$ becomes space-like for $r > 1$, so while this is an
extension of the static metric of de Sitter space, the metric is no longer static
outside the original static patch.
(b) It is evident, by imitating the argument in (15.15), that T is nothing other
than the proper time along the family of E = 1 geodesics (24.35) of the static
metric,
(c) This metric can also be obtained from the metric (24.25) in planar coordinates
(we give these a subscript $p$ now)
$$ds^2 = -dt_p^2 + e^{\pm 2t_p}\, d\vec{x}^2 = -dt_p^2 + e^{\pm 2t_p}\left(dr_p^2 + r_p^2\, d\Omega^2\right) \qquad (24.49)$$
With these explicit parametrisations of dS space giving rise to various coordinate
systems, it is now possible to also determine directly the coordinate transformations
between these different coordinate systems, but this is unenlightening (and typically rather
messy), and it is usually more convenient to refer everything back to the embedding
space, as we have done here. Clearly, there are many more possibilities, but this shall
suffice. It should be clear from the above examples how to construct other coordinate
systems for dS adapted to one’s needs.[74]
Coordinates for anti-de Sitter space can be described in precise analogy with the de
Sitter case. Our starting point is the defining equation (24.5),
$$-(z^0)^2 + (z^1)^2 + (z^2)^2 + (z^3)^2 - (z^4)^2 = -1 . \qquad (24.51)$$
Rearranging, this reads
$$(z^0)^2 + (z^4)^2 = 1 + (z^1)^2 + (z^2)^2 + (z^3)^2 . \qquad (24.52)$$
[74] And for more information about the interesting physics of de Sitter space see e.g. M. Spradlin, A. Strominger, A. Volovich, Les Houches Lectures on de Sitter Space, arXiv:hep-th/0110007.
For fixed $z^k$, this describes a circle in the $(z^0, z^4)$-plane. As the metric is
negative-definite on that plane, this shows that the surface defined by (24.51) has closed timelike
curves through every point. To avoid such a paradoxical and pathological situation, we
will “pass to the covering space”, which amounts to replacing $S^1 \to \mathbb{R}$. It is actually
this resulting space, without closed timelike curves, that is usually referred to as anti-de
Sitter space, and we will follow that convention.
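The circle, and the fact that it is timelike, can be checked directly in the embedding space. The sketch below (function names are illustrative) parametrises the circle at fixed $z^k$ by an angle $t$, with radius $R = \sqrt{1 + (z^1)^2 + (z^2)^2 + (z^3)^2}$ read off from (24.52), and evaluates the tangent vector $\partial_t$ with the ambient signature $(-,+,+,+,-)$ of (24.51).

```python
import math

def embed(t, zs):
    """Point of (24.51) with spatial coordinates zs = (z1, z2, z3) fixed and angle t.

    By (24.52), (z0, z4) lies on a circle of radius R = sqrt(1 + |zs|^2).
    """
    R = math.sqrt(1.0 + sum(z * z for z in zs))
    return (R * math.cos(t), *zs, R * math.sin(t))

def constraint(z):
    """Left-hand side of (24.51); equals -1 on anti-de Sitter space."""
    z0, z1, z2, z3, z4 = z
    return -z0**2 + z1**2 + z2**2 + z3**2 - z4**2

def ambient_norm(v):
    """Squared norm in the embedding space with signature (-, +, +, +, -)."""
    v0, v1, v2, v3, v4 = v
    return -v0**2 + v1**2 + v2**2 + v3**2 - v4**2

def tangent_t(t, zs):
    """Derivative of embed(t, zs) with respect to t."""
    R = math.sqrt(1.0 + sum(z * z for z in zs))
    return (-R * math.sin(t), 0.0, 0.0, 0.0, R * math.cos(t))

zs = (0.3, -1.2, 0.5)
print(constraint(embed(1.0, zs)), ambient_norm(tangent_t(1.0, zs)))
```

The tangent has norm $-R^2 < 0$ everywhere, and $t$ and $t + 2\pi$ label the same point: this is precisely the closed timelike curve that passing to the covering space removes.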
1. Global Coordinates
Global coordinates are provided by writing the general solution of (24.52) as
Remarks:
(a) Alternatively, one can write r = sinh ρ, with 0 ≤ r < ∞, and now one
recognises the metric
We see that the slices of constant $z^4$ are hyperboloids $H^3$ for $|z^4| < 1$ and de Sitter
spaces $dS_3$ for $|z^4| > 1$. In the former case, coordinates adapted to this slicing are
leading to the metric
$$ds^2 = -dt^2 + \sin^2 t\, d\tilde{\Omega}_3^2 \qquad (24.59)$$
which is precisely the k = −1, Λ < 0 solution (21.48) of the Friedmann equations.
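As a check of this identification, one can verify that $a(t) = \sin t$ satisfies both Friedmann equations with $k = -1$ and only a cosmological constant. The sketch below assumes units with $\ell = 1$, so that $\Lambda = -3$, and uses the standard vacuum forms $(\dot a/a)^2 + k/a^2 = \Lambda/3$ and $\ddot a/a = \Lambda/3$; the function name is illustrative.

```python
import math

def friedmann_residuals(a, adot, addot, k, Lam):
    """Residuals of the vacuum Friedmann equations with only a cosmological constant:
    (adot/a)^2 + k/a^2 = Lam/3   and   addot/a = Lam/3."""
    first = (adot / a) ** 2 + k / a**2 - Lam / 3.0
    second = addot / a - Lam / 3.0
    return first, second

# a(t) = sin t, k = -1, Lambda = -3 (units with ell = 1)
for t in (0.3, 1.0, 2.5):
    print(t, friedmann_residuals(math.sin(t), math.cos(t), -math.sin(t), k=-1.0, Lam=-3.0))
```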
$$\mathbb{R}^3,\ S^3,\ H^3:\quad ds^2 = \frac{dr^2}{1 - kr^2} + r^2\, d\Omega_2^2 . \qquad (24.63)$$
Analogously, we now see from (24.22), (24.62) and the form of the Minkowski
metric (21.12) in Rindler-like coordinates that the metrics on $\mathbb{R}^{1,3}$, $dS_4$ and $AdS_4$
can collectively be written as
$$\mathbb{R}^{1,3},\ dS_4,\ AdS_4:\quad ds^2 = \frac{dr^2}{1 - kr^2} + r^2\, d\Omega_{1,2}^2 , \qquad (24.64)$$
$$-(z^0)^2 + (z^1)^2 + (z^2)^2 - (z^4)^2 = -1 - (z^3)^2 \qquad (24.65)$$
and note that for fixed $z^3$ the left-hand side defines an $AdS_3$. Noting also that
the right-hand side is $\le -1$, write
so that the $n^\alpha$ parametrise $AdS_3$, with
5. Poincaré Coordinates
A somewhat unobvious but particularly interesting and useful way of parametrising
the solution to (24.51) is to write (a certain amount of hindsight helps)
Even though this is obscure, in these coordinates the metric takes the particularly
simple and easy-to-use form
$$ds^2 = \frac{dr^2}{r^2} + r^2\, \eta_{\alpha\beta}\, dx^\alpha dx^\beta = z^{-2}\left(\eta_{\alpha\beta}\, dx^\alpha dx^\beta + dz^2\right) \qquad (r = z^{-1}) \qquad (24.71)$$
Remarks:
(a) These are the AdS counterpart of the planar coordinates for dS space, and
the space-time counterpart of the upper-half-plane model of hyperbolic
geometry discussed in section 8.7, see in particular (8.75).
(b) In these coordinates, a (2+1)-dimensional Poincaré symmetry is manifest, as
well as a scaling symmetry $(t, \vec{x}, z) \to (\lambda t, \lambda\vec{x}, \lambda z)$.
(c) These coordinates do not cover all of AdS, as can for instance be seen by
noting that radial null geodesics can reach $r = 0$ (or $z = \infty$, say) at finite
values of the affine parameter: the null condition and conserved energy give ($x^1$
and $x^2$ are kept fixed, and we set $x^0 = t$)
finite coordinate time t (which, as usual, is proportional to the proper time
of a stationary observer). To see this, note that the above equations imply
$t(\lambda) \sim r(\lambda)^{-1}$, so $r = \infty$ is reached at $t = 0$ (or some other finite value of $t$,
depending on one’s choice of integration constant). Thus for such an observer,
it matters what happens at $r = \infty$, and this leads to the requirement to treat
$r = \infty$ as some kind of boundary on which boundary conditions need to be
imposed. This boundary metric at $r = \text{const}$, $r \to \infty$ is, up to an infinite
overall constant factor of $r^2$, just the Minkowski metric in one dimension less.
These sorts of things, combined with the above observation that isometries of
the “bulk” AdS space-time include scalings (dilatations) of the “boundary”
coordinates (and that, moreover, the isometry group of the bulk is actually
the conformal group of the boundary) lie at the heart of the celebrated anti-de
Sitter / conformal field theory (AdS/CFT) correspondence.[75]
The region of AdS space covered by the Poincaré coordinates is frequently
referred to as the Poincaré patch, and z = ∞ or r = 0 as the Poincaré
horizon.
(e) We had already seen in section 21.5, in equations (21.45), (21.46) and (21.47),
and then again in section 24.2 that the de Sitter metric could be written in
such a way that the constant time slices are maximally symmetric spatial
slices with either k = 0 (Planar Coordinates (24.25)), or k = +1 (Global
Coordinates, (24.15)), or k = −1 (Hyperbolic Coordinates (24.19)).
Analogously, the anti-de Sitter metric can be written in such a way that the
metrics on radial slices are maximally symmetric space-times with any sign of
the curvature: $k = 0$ Minkowski space-time in Poincaré coordinates (24.71),
$k = +1$ de Sitter slices in the coordinates (24.61), or $k = -1$ anti-de Sitter
slices in the coordinates (24.69),
where
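$$f_k(\rho) = \begin{cases} e^{\pm\rho} & \text{for } k = 0 \\ \sinh\rho & \text{for } k = +1 \\ \cosh\rho & \text{for } k = -1 \end{cases} \qquad\text{and}\qquad d\Omega^2_{(k)} = \begin{cases} \eta_{\alpha\beta}\, dx^\alpha dx^\beta & \text{for } k = 0 \\ d\Omega^2_{1,2} & \text{for } k = +1 \\ d\tilde{\Omega}^2_{1,2} & \text{for } k = -1 \end{cases} \qquad (24.74)$$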
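The claims in remark (c) about radial null geodesics in the Poincaré metric (24.71) can be made explicit. With $x^1, x^2$ fixed, the conserved energy is $E = r^2\,dt/d\lambda$ and the null condition gives $dr/d\lambda = \pm E$, so that $dt/dr = \pm 1/r^2$. The sketch below (names are illustrative, $E > 0$ assumed) encodes the resulting closed-form solution and checks the null condition numerically with finite differences.

```python
def null_geodesic(lam, r0=1.0, t0=0.0, E=1.0, outgoing=True):
    """(t, r) along a radial null geodesic of (24.71) with x^1, x^2 fixed.

    E = r^2 dt/dlam is conserved; the null condition then gives dr/dlam = +E
    (outgoing) or -E (ingoing), and hence dt/dr = +-1/r^2.
    """
    s = 1.0 if outgoing else -1.0
    r = r0 + s * E * lam
    t = t0 + s * (1.0 / r0 - 1.0 / r)
    return t, r

def null_norm(lam, h=1e-6, **kw):
    """g(u, u) for the tangent u = (dt/dlam, dr/dlam), by central differences."""
    t_p, r_p = null_geodesic(lam + h, **kw)
    t_m, r_m = null_geodesic(lam - h, **kw)
    _, r = null_geodesic(lam, **kw)
    tdot = (t_p - t_m) / (2.0 * h)
    rdot = (r_p - r_m) / (2.0 * h)
    return rdot**2 / r**2 - r**2 * tdot**2

print(null_geodesic(0.999, outgoing=False))  # r close to 0 at finite lam = r0/E
print(null_geodesic(1e6))                    # t close to t0 + 1/r0 = 1
```

An ingoing ray starting at $r_0$ reaches $r = 0$ at the finite affine parameter $\lambda = r_0/E$, while an outgoing ray reaches $r = \infty$ at the finite coordinate time $t_0 + 1/r_0$, in line with $t(\lambda) \sim r(\lambda)^{-1}$.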
$$-(z^0)^2 + (z^1)^2 + (z^2)^2 = -1 + (z^4)^2 - (z^3)^2 . \qquad (24.75)$$
[75] See e.g. J. Polchinski, Introduction to Gauge/Gravity Duality, arXiv:1010.6134, for an introduction.
Codimension-2 slices of constant $(z^3, z^4)$ are thus either 2-dimensional
hyperboloids $H^2$ or 2-dimensional de Sitter space-times $dS_2$, depending on whether
$(z^4)^2 - (z^3)^2 < 1$ (for $H^2$) or $(z^4)^2 - (z^3)^2 > 1$ (for $dS_2$). We will only
consider the former case here (but it should by now be evident how to treat the
latter, or other variations of this theme).
Even with the condition $(z^4)^2 - (z^3)^2 < 1$ it turns out that there are still two
different cases to consider, namely either $(z^4)^2 - (z^3)^2 < 0$ or $0 < (z^4)^2 - (z^3)^2 < 1$.
(a) $(z^4)^2 - (z^3)^2 < 0$
In this case we solve (24.75) in terms of a radial coordinate r by
Remarks:
(b) $0 < (z^4)^2 - (z^3)^2 < 1$
This corresponds to $0 < r^2 < 1$ in the above parametrisation, and the form
(24.78) of the metric already suggests that r is really a time coordinate in
this case (and what appeared as t above is a spatial coordinate). Indeed,
with the parametrisation
7. Painlevé-Gullstrand-like Coordinates?
In item 7 of section 24.2, we had introduced PG-like coordinates (24.47)
for de Sitter space by starting with the static spherically symmetric form (24.33)
of the metric (to which we could thus apply the general t → t + ψ(r) procedure
outlined in section 15.1). We had also seen there that the PG coordinates could
be thought of as interpolating between static and planar coordinates (24.49),
of anti-de Sitter space, in which the metric takes the standard form (24.39), with
$f(r) = 1 + r^2$. However, attempting to shift $t \to T(t, r) = t + \psi(r)$ in order to find
a metric with flat constant-$T$ spatial slices, $g_{rr} = 1$, requires solving the condition
(15.18), which in the present case reads
$$1 - C(r)^2 \overset{!}{=} f(r) = 1 + r^2 . \qquad (24.88)$$
This is evidently not possible for real $C(r)$, since the left-hand side is at most 1
while the right-hand side exceeds 1 for $r > 0$, so in this strict sense there are no
PG-like coordinates for anti-de Sitter space.
However, there is something analogous that one can do. Comparing the de Sitter
planar coordinates (24.86) with the anti-de Sitter metric in Poincaré coordinates
(24.71),
$$ds^2 = d\rho^2 + e^{2\rho}\left(-dt^2 + d\vec{x}^2\right) , \qquad (24.89)$$
and introducing, in analogy with (24.86), polar Milne coordinates (section 21.1),
one sees that, roughly speaking, (24.86) and (24.90) differ from each other by
an exchange of a radial with a time coordinate. And indeed, taking this hint
seriously, one can construct analogues of PG coordinates that are adapted to a
suitable family of spacelike geodesics, and which restrict to the flat Minkowski
metric on radial slices of constant R, with R being proper distance along this
family of spacelike geodesics.
While this is a useful exercise, we can also turn the procedure around, i.e.
• start with the metric (24.90) in Poincaré / Milne coordinates and perform
the coordinate transformation $T = \tau\, e^{\rho}$ to obtain a PG-like metric (with
roles of time and radius exchanged);
• find a new radial coordinate $R(\rho, T)$ through $\rho = R + \psi(T)$ such that the
metric is again diagonal, say.
The resulting metric should then be the analogue of the static spherically symmet-
ric de Sitter metric, and turns out to be the metric (24.83) (which is indeed the
analogue of the continuation (24.37) of the static de Sitter metric (24.85) beyond
the horizon).
Implementing the first step, from (24.90) one finds
$$\rho = R + \psi(T) . \qquad (24.92)$$
(note the analogy with (24.46)) one finds precisely the anti-de Sitter metric in the
form (24.83),
Thus the PG-like AdS metric (24.91) interpolates between the Poincaré (planar)
metric (24.90) and the metric (24.94).
To see the relation between this PG-like metric and spatial geodesics, we observe
the following:
$$P = (1 - T^2)\, R' \quad\Rightarrow\quad P^2 - (T')^2 = 1 - T^2 , \qquad (24.96)$$
Clearly, there are many more possibilities, but this shall suffice. It should be clear from
the above examples how to construct other coordinate systems for AdS adapted to one’s
needs.[76]
We have seen in the previous sections that the metrics of maximally symmetric space-
times can frequently be written in a way which exhibits their slicing by lower-dimensional
maximally symmetric spaces or space-times.
Typically, in the codimension-1 case these metrics have the general form
$$ds_K^2 = \epsilon\, d\sigma^2 + f(\sigma)^2\, ds_k^2 , \qquad (24.98)$$
where
• $\sigma$ is a radial (space) or time coordinate, depending on whether $\epsilon = +1$ or $\epsilon = -1$
For $\epsilon = +1$, thus $\sigma = r$ a radial coordinate, and $f(r) = r$, one obtains what is known as
the metric on the cone over the space(-time) with line-element $ds_k^2$, in general given by
$$g_{ij}(x)\, dx^i dx^j \quad\to\quad \text{Cone metric:}\quad ds^2 = dr^2 + r^2\, g_{ij}(x)\, dx^i dx^j . \qquad (24.99)$$
A familiar example is the Euclidean metric on $\mathbb{R}^{n+1}$, which can be written in polar
coordinates as the cone metric over $S^n$,
so in this case the “cone” has actually been flattened out to $\mathbb{R}^{n+1}$. However, if one were
to replace S n by a less symmetric space, there would be a (conical) singularity at the
tip r = 0 of the cone.
I will refer to the general class of metrics in (24.98) as (generalised) spacelike or timelike
cone metrics. They can also be considered as special cases of so-called warped product
metrics, which are metrics of the form
(the metric on a product of space(-time)s with metrics $h_{ab}(y)$ and $g_{ij}(x)$, twisted or
warped by the function $f(y)$).
Returning to the issue of writing maximally symmetric metrics in the form (24.98), we
now want to address the question of what determines, in general, which choice of $\epsilon$, $k$, $f(\sigma)$
is required or possible to realise a maximally symmetric space(-time) with a given $K$,
say (or any variation of this question).
A quick way to answer this question is to make use of the fact that maximally symmetric
spaces are characterised by the property of having constant curvature, in the sense of
(17.5)
$$R_{\alpha\beta\gamma\delta} = k\left(g_{\alpha\gamma}g_{\beta\delta} - g_{\alpha\delta}g_{\beta\gamma}\right) . \qquad (24.102)$$
Using this result for the Riemann curvature tensor of $ds_k^2$, it is straightforward to
calculate the curvature tensor of the generalised cone metric $ds_K^2$. This is just a minor
generalisation of the calculation of the Riemann tensor of the FRW metric
(the ǫ = −1 version of (24.98)) performed in section 19.1. Requiring that the cone
metric with line-element ds2K be maximally symmetric, i.e. that its curvature tensor
also has this form, one finds the constraint
between the various objects appearing in (24.98), as well as the condition
A special (and especially boring) case that we will take care of (and dismiss) first is
$f' = 0$, i.e. $f$ constant, so that (24.98) describes a direct product metric. Then one finds
$K = k = 0$, and this just corresponds to the possibilities
We now concentrate on $f' \neq 0$. Then (K1) implies (K2) (by differentiation) and
conversely (K2) implies (K1), with $k$ arising as an integration constant (in fact, (K1) is the
energy/Hamiltonian corresponding to the equation of motion (K2)).
$$f'' + Kf = 0 \qquad\text{and}\qquad k = (f')^2 + Kf^2 . \qquad (24.107)$$
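For $\epsilon = +1$ the conditions (24.107) are easy to test on the warping functions of (24.74). Anti-de Sitter space has $K = -1$ (unit radius assumed), and the sketch below checks, with finite differences, that $e^{\rho}$, $\sinh\rho$ and $\cosh\rho$ all solve $f'' + Kf = 0$ and produce the slice curvatures $k = 0, +1, -1$ respectively. The helper names are illustrative, and a linear $f$ with $K = 0$, as in (24.108), can be tested the same way.

```python
import math

def derivative(f, x, h=1e-6):
    """Central first difference."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def second_derivative(f, x, h=1e-4):
    """Central second difference."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

def check_slicing(f, K, xs=(0.3, 0.7, 1.2)):
    """For eps = +1, return max |f'' + K f| and the values of k = f'^2 + K f^2 (24.107)."""
    residual = max(abs(second_derivative(f, x) + K * f(x)) for x in xs)
    ks = [derivative(f, x) ** 2 + K * f(x) ** 2 for x in xs]
    return residual, ks

# warping functions of (24.74) for AdS (K = -1), keyed by the slice curvature k
slicings = {0: math.exp, +1: math.sinh, -1: math.cosh}

for k, f in slicings.items():
    print(k, check_slicing(f, K=-1.0))
```

Since $k$ is a first integral of $f'' + Kf = 0$, the three sampled values of $k$ agree with each other as well as with the expected slice curvature.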
1. K = 0:
The solution is
$$f = ar + b \quad\Rightarrow\quad k = a^2 \ge 0 \qquad (24.108)$$
for constants a and b, and dismissing the case of constant f we have already
dealt with, we can without loss of generality choose f (r) = r, so that we are
dealing with a standard cone metric. Then k = +1 and we find either the
usual polar coordinate decomposition (24.100) of the Euclidean metric, or its
Lorentzian signature counterpart
Without loss of generality we can choose k = +1 and (by a suitable shift of
ψ) a = 1, b = 0. In Euclidean signature, this gives
1. K = 0:
The solution is
$$f = at + b \quad\Rightarrow\quad k = -a^2 \le 0 \qquad (24.118)$$
and discarding the case of constant f we are left with f (t) = t and k = −1.
The corresponding metric
We thus see that we have been able to reproduce many of the metrics found in sections
24.2 and 24.3 from this more general perspective, perhaps shedding some light on the zoo
of coordinate systems found there. In particular, we have seen how the conditions (K1)
(24.104) and (K2) (24.105) correlate the choice of curvatures k and K, the signature ǫ
of the cone direction, and the choice of warping function f .
25 Vaidya Metrics I: Bondi Gauge and Radiation Fields
The Vaidya metrics are a (deceptively) simple generalisation of the Schwarzschild metric
written in ingoing or outgoing Eddington-Finkelstein coordinates, in which the constant
mass m is replaced by a mass function m(v) or m(u) depending on an advanced or
retarded time coordinate,
$$ds^2 = -f(v,r)\, dv^2 + 2\, dv\, dr + r^2 d\Omega^2 , \qquad f(v,r) = 1 - \frac{2m(v)}{r}$$
$$ds^2 = -f(u,r)\, du^2 - 2\, du\, dr + r^2 d\Omega^2 , \qquad f(u,r) = 1 - \frac{2m(u)}{r} . \qquad (25.1)$$
These turn out, none too surprisingly, to give rise to solutions to the Einstein equations
that describe null dust (or radiation) either entering (falling into) the black hole or star
(for m = m(v) a function of the ingoing Eddington-Finkelstein coordinate) or exiting
from (or being radiated away by) the black hole or star (for m = m(u) a function of the
outgoing Eddington-Finkelstein coordinate).
What makes the Vaidya metrics particularly interesting is, among other things, that
they are thus capable of describing time-dependent generalisations of the Schwarzschild
black hole, permitting one in particular to study the formation and evolution of a black
hole, including its event horizon, and to appreciate the differences
between different possible definitions of a horizon (all of which coincide in the static
Schwarzschild case). They also provide a classical setting for modelling an evaporating
black hole.
As the Vaidya metrics are not discussed at any length (actually hardly mentioned) in
any of the standard textbooks I am aware of, I will explain some of the elementary
aspects of these metrics in some detail in this and subsequent sections, with occasional
pointers to the literature for more detailed and advanced investigations.
To set the stage, we introduce the Vaidya metrics and list some of their basic properties.
(b) It has the characteristic Kerr-Schild “flat + null” form (15.71) and (15.74)
$$g_{\alpha\beta} = \eta_{\alpha\beta} + \frac{2m(v)}{r}\, \partial_\alpha v\, \partial_\beta v \qquad (25.3)$$
where $\eta_{\alpha\beta}$ is the Minkowski metric, written here in advanced coordinates,
and
$$\eta^{\alpha\beta}\, \partial_\alpha v\, \partial_\beta v = \eta^{vv} = 0 . \qquad (25.5)$$
This can also be made more explicit in terms of the Kerr-Schild time coordinate
$\tilde{t}$ defined by $v = \tilde{t} + r$, in terms of which the metric takes the form
$$ds^2 = -d\tilde{t}^2 + dr^2 + r^2 d\Omega^2 + \frac{2m}{r}\, d(\tilde{t} + r)^2 . \qquad (25.6)$$
For the Schwarzschild metric, $\tilde{t}$ is simply related to the Schwarzschild
coordinate $t$ and the tortoise coordinate $r^*$ by $v = t + r^* = \tilde{t} + r$.
(c) The curvature tensor simplifies accordingly, and the only non-vanishing
component of the Einstein tensor $G_{\alpha\beta}$ is $G_{vv}$, with
$$G_{vv} = \frac{2m'(v)}{r^2} . \qquad (25.7)$$
Equivalently, the only non-vanishing component of $G^\alpha{}_\beta$ is
$$G^r{}_v = \frac{2m'(v)}{r^2} . \qquad (25.8)$$
Thus (25.2) solves the Einstein equations for an energy tensor of the form
$$T_{\alpha\beta} = \frac{m'(v)}{4\pi G_N r^2}\, \delta^v_\alpha\, \delta^v_\beta . \qquad (25.9)$$
2. The Outgoing Vaidya Metric:
The metric has the form
$$ds^2 = -f(u,r)\, du^2 - 2\, du\, dr + r^2 d\Omega^2 , \qquad f(u,r) = 1 - \frac{2m(u)}{r} . \qquad (25.12)$$
Its properties are, mutatis mutandis, largely analogous to those of the ingoing
Vaidya metric, for example:
$$G_{uu} = -\frac{2m'(u)}{r^2} . \qquad (25.13)$$
Thus (25.12) solves the Einstein equations for an energy tensor of the form
$$T_{\alpha\beta} = -\frac{m'(u)}{4\pi G_N r^2}\, \delta^u_\alpha\, \delta^u_\beta . \qquad (25.14)$$
$$ds^2 = -f(v,r)\, dv^2 + 2\, dv\, dr + r^2 d\Omega^2 , \qquad f(v,r) = 1 - \frac{2m(v,r)}{r} \qquad (25.16)$$
(or its outgoing counterpart). In this case, one finds by a straightforward but
uninspiring calculation that the non-vanishing components of the Einstein tensor
Gαβ are
$$G^\theta{}_\theta = G^\phi{}_\phi = -\frac{\partial_r^2 m(v,r)}{r}$$
$$G^r{}_r = G^v{}_v = -\frac{2\, \partial_r m(v,r)}{r^2} \qquad (25.17)$$
$$G^r{}_v = +\frac{2m'(v,r)}{r^2}$$
(a prime still denotes a v-derivative, partial r-derivatives are written explicitly).
Special cases of this generalised Vaidya metric include
(a) the Vaidya-Reissner-Nordström metrics with
$$f(v,r) = 1 - \frac{2m(v)}{r} + \frac{q^2}{r^2} , \qquad (25.18)$$
which solve the Einstein equations for an energy-momentum tensor that is
the sum of the Vaidya energy-momentum tensor (ingoing null dust) and
the electrostatic Maxwell energy-momentum tensor for a point charge with
constant charge q, and
(b) the Vaidya-Bonnor metrics with
$$f(v,r) = 1 - \frac{2m(v)}{r} + \frac{q(v)^2}{r^2} , \qquad (25.19)$$
which further generalise this to allow for an injection of charge in addition
to mass into the star or black hole.[77]
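The Kerr-Schild form (25.3)/(25.6) of the ingoing Vaidya metric is a purely algebraic statement at each point, so it can be verified by pulling back the $(v, r)$ block of (25.1) to $(\tilde t, r)$ with $v = \tilde t + r$. The sketch below (helper names are illustrative; angular directions are suppressed because they are untouched) compares the result with $\eta_{\alpha\beta} + (2m/r)\,\partial_\alpha v\,\partial_\beta v$, and also checks that $\partial_\alpha v$ is null with respect to $\eta$, as in (25.5). Since no derivatives of $m$ appear, the check holds pointwise for a $v$-dependent mass $m(v)$ as well.

```python
def vaidya_in(m, r):
    """(v, r) block of the ingoing Vaidya metric (25.1), with m = m(v) at the point."""
    f = 1.0 - 2.0 * m / r
    return [[-f, 1.0], [1.0, 0.0]]

def pullback(g, J):
    """g'_ab = J[c][a] J[d][b] g_cd, where J[c][a] = dx^c / dx'^a."""
    return [[sum(J[c][a] * J[d][b] * g[c][d] for c in range(2) for d in range(2))
             for b in range(2)] for a in range(2)]

def kerr_schild(m, r):
    """eta + (2m/r) dv dv in (t~, r) coordinates, cf. (25.3) and (25.6)."""
    eta = [[-1.0, 0.0], [0.0, 1.0]]
    l = [1.0, 1.0]  # l_a = d_a v, since v = t~ + r
    return [[eta[a][b] + 2.0 * m / r * l[a] * l[b] for b in range(2)] for a in range(2)]

def eta_norm_of_dv():
    """eta^{ab} d_a v d_b v in (t~, r) coordinates, cf. (25.5)."""
    eta_inv = [[-1.0, 0.0], [0.0, 1.0]]
    l = [1.0, 1.0]
    return sum(eta_inv[a][b] * l[a] * l[b] for a in range(2) for b in range(2))

# Jacobian of (v, r) with respect to (t~, r): v = t~ + r
J = [[1.0, 1.0],
     [0.0, 1.0]]

print(pullback(vaidya_in(0.5, 3.0), J))
print(kerr_schild(0.5, 3.0))
```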
All of the above are (of course) special cases of the general spherically symmetric metric,
which can be conveniently parametrised in terms of 2 arbitrary functions f (w, r) and
h(w, r) as
where ǫ = ±1, and we again parametrise f (w, r) in terms of a mass function m(w, r) as
$$f(w,r) = 1 - \frac{2m(w,r)}{r} . \qquad (25.21)$$
This is the general spherically symmetric metric written in radiative coordinates, or
in the so-called Bondi gauge, the retarded / advanced Eddington-Finkelstein-like
counterpart of the Schwarzschild-Birkhoff ansatz (12.51)
with
$$f_s(t,r) = 1 - \frac{2m_s(t,r)}{r} \qquad (25.23)$$
(the subscript s on the functions indicating that these are a priori not the same functions
as those appearing in the ansatz (25.20)).
Remarks:
1. The Bondi gauge is adapted to radial null geodesics in the sense that w = const.
are radial null rays. For f (w, r) > 0 ↔ r > 2m(w, r), r decreases along future-
directed null rays with w ≡ v constant for ǫ = +1 (ingoing coordinates), and r
increases along future-directed null rays with w ≡ u constant for ǫ = −1 (outgoing
coordinates).
2. Within the Bondi gauge, there is still the freedom of reparametrising w: any
change of variables of the form
4. Starting from the metric in the Bondi gauge (25.20), one can go to the Schwarzschild
gauge (25.22) by introducing $w = w(t, r)$ through
$$e^{h(w,r)}\, dw = e^{h_s(t,r)}\, dt + \epsilon f_s(t,r)^{-1}\, dr , \qquad (25.26)$$
where $h(w, r)$ is the required integrating factor. Then (25.20) takes the form
(25.22) with $f_s(t, r) = f(w, r)$, or more explicitly
$$f_s(t,r) = 1 - \frac{2m(w(t,r), r)}{r} = f(w(t,r), r) \quad\Leftrightarrow\quad m_s(t,r) = m(w(t,r), r) . \qquad (25.27)$$
We will look at this in somewhat more detail in section 25.4.
If $h(w, r) = 0$ (or depends only on $w$) and $f$ depends only on $r$, one can choose
$h_s(t, r) = 0$ and (25.26) reduces to the standard relation
$$dw = dt + \epsilon f(r)^{-1}\, dr = dt + \epsilon\, dr^* \qquad (25.28)$$
between advanced or retarded and tortoise coordinates of a static black hole metric.
Thus in a general spherically symmetric coordinate system the mass function can
be expressed in terms of (or defined via) the gradient-squared of the radius function
of the transverse sphere,
$$m(z) = \frac{r(z)}{2}\left(1 - g^{ab}(z)\, \partial_a r(z)\, \partial_b r(z)\right) , \qquad (25.32)$$
and is thus a scalar under these coordinate transformations. The observation
above that in the transformation from the Bondi to the Schwarzschild gauge one
has $m_s(t, r) = m(w(t, r), r)$ (25.27) is a particular manifestation of this fact.
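The scalar character of (25.32) can be illustrated by evaluating it in both gauges. The sketch below (illustrative helper names, angular directions suppressed) builds the $(w, r)$ block $-e^{2h} f\,dw^2 + 2\epsilon e^{h}\,dw\,dr$ of the Bondi-gauge metric and the $(t, r)$ block $-e^{2h_s} f\,dt^2 + f^{-1}dr^2$ of the Schwarzschild gauge, inverts them, and evaluates $m = \tfrac{r}{2}(1 - g^{ab}\partial_a r\,\partial_b r)$ with $\partial_a r = (0, 1)$. In both cases $g^{rr} = f$, so the same $m$ is recovered, independently of $h$, $h_s$ and $\epsilon$.

```python
import math

def inverse_2x2(g):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = g
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mass_function(g, r, grad_r):
    """m = (r/2)(1 - g^{ab} d_a r d_b r), cf. (25.32), for a 2x2 metric g_ab."""
    gi = inverse_2x2(g)
    norm = sum(gi[a][b] * grad_r[a] * grad_r[b] for a in range(2) for b in range(2))
    return 0.5 * r * (1.0 - norm)

def bondi_block(m, r, h, eps):
    """(w, r) block of the Bondi-gauge metric with f = 1 - 2m/r."""
    f = 1.0 - 2.0 * m / r
    return [[-math.exp(2 * h) * f, eps * math.exp(h)], [eps * math.exp(h), 0.0]]

def schwarzschild_block(m, r, hs):
    """(t, r) block of the Schwarzschild-gauge metric with f = 1 - 2m/r."""
    f = 1.0 - 2.0 * m / r
    return [[-math.exp(2 * hs) * f, 0.0], [0.0, 1.0 / f]]

print(mass_function(bondi_block(0.7, 3.0, 0.4, -1), 3.0, (0.0, 1.0)))
print(mass_function(schwarzschild_block(0.7, 3.0, 0.25), 3.0, (0.0, 1.0)))
```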
Solving the vacuum Einstein equations in the standard spherically symmetric gauge
(25.22), one recovers Birkhoff’s theorem and the Schwarzschild metric in the standard
Schwarzschild coordinates. However, at least with the benefit of hindsight, it is clear
that a better gauge may be one which is adapted to radial light rays rather than to
stationary observers. Such an ansatz is provided by the general spherically symmetric
metric written in the Bondi gauge (25.20).
For both signs, the independent $(w, r)$-components of the Einstein tensor take the simple
form (the counterpart of (12.54))
$$G^w{}_w = -\frac{2\, \partial_r m(w,r)}{r^2} , \qquad G^r{}_w = +\frac{2\, \partial_w m(w,r)}{r^2} , \qquad G_{rr} = +\frac{2\, \partial_r h(w,r)}{r} . \qquad (25.33)$$
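From these one can then also deduce the “missing” components, such as
$$G^w{}_r = g^{w\alpha}G_{\alpha r} = g^{wr}G_{rr} = \epsilon\, e^{-h} G_{rr}$$
$$G^r{}_r = g^{r\alpha}G_{\alpha r} = g^{rr}G_{rr} + G^w{}_w = f\, G_{rr} + G^w{}_w \qquad (25.34)$$
$$G_{ww} = g_{w\alpha}G^\alpha{}_w = \epsilon\, e^{h} G^r{}_w - f e^{2h} G^w{}_w .$$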
One sees that in the chosen (Bondi gauge) parametrisation, the components in (25.33)
are the simplest complete set of independent components and building blocks of the
Einstein tensor, and therefore a particularly convenient starting point for analysing and
solving the Einstein equations. The angular components Gθθ = Gφφ (that they are equal
and that Gθφ = 0 is implied by spherical symmetry) are more involved, but are often
not needed in practice, as they can be substituted by the Bianchi identities.
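The index gymnastics behind (25.34) only uses the $(w, r)$ block of the Bondi metric and its inverse, $g^{ww} = 0$, $g^{wr} = \epsilon e^{-h}$, $g^{rr} = f$. The sketch below (function name illustrative) takes arbitrary numerical values for $f$, $h$ and the lower-index components $G_{ww}, G_{wr}, G_{rr}$, raises indices, and returns the residuals of the three relations in (25.34) together with the product $g_{ab}g^{bc}$ as a check on the inverse.

```python
import math

def check_bondi_index_relations(f, h, eps, G_ww, G_wr, G_rr):
    """Raise indices with the Bondi (w, r) metric and test the relations (25.34)."""
    g = [[-math.exp(2 * h) * f, eps * math.exp(h)],
         [eps * math.exp(h), 0.0]]
    # inverse of g: g^{ww} = 0, g^{wr} = eps e^{-h}, g^{rr} = f
    gi = [[0.0, eps * math.exp(-h)],
          [eps * math.exp(-h), f]]
    G = [[G_ww, G_wr], [G_wr, G_rr]]          # lower indices, index 0 = w, 1 = r
    mixed = [[sum(gi[a][c] * G[c][b] for c in range(2)) for b in range(2)]
             for a in range(2)]                # mixed[a][b] = G^a_b
    w, r = 0, 1
    res1 = mixed[w][r] - eps * math.exp(-h) * G_rr
    res2 = mixed[r][r] - (f * G_rr + mixed[w][w])
    res3 = G_ww - (eps * math.exp(h) * mixed[r][w] - f * math.exp(2 * h) * mixed[w][w])
    identity = [[sum(g[a][c] * gi[c][b] for c in range(2)) for b in range(2)]
                for a in range(2)]
    return max(abs(res1), abs(res2), abs(res3)), identity

print(check_bondi_index_relations(0.4, 0.3, 1, 1.1, -0.7, 2.5))
```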
1. Birkhoff’s Theorem
The vacuum Einstein equations now imply that m(w, r) = m is a constant and
that h(w, r) = h(w) is independent of r. Thus the metric takes the form
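$$ds^2 = -\left(1 - \frac{2m}{r}\right) e^{2h(w)}\, dw^2 + 2\epsilon\, e^{h(w)}\, dw\, dr + r^2 d\Omega^2$$
$$= -\left(1 - \frac{2m}{r}\right) d\tilde{w}^2 + 2\epsilon\, d\tilde{w}\, dr + r^2 d\Omega^2 \qquad (25.35)$$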
where the new coordinate w̃ labelling radial in or out geodesics is defined by the
simple change of variables
$$d\tilde{w} = e^{h(w)}\, dw , \qquad (25.36)$$
reflecting the gauge invariance (25.25). Setting w̃ = v for ǫ = +1 and w̃ = u for
ǫ = −1, the result is therefore precisely the Schwarzschild metric in ingoing or
outgoing Eddington-Finkelstein coordinates, and therefore ab initio regular at the
future or past event-horizon.
25.3 Description of In- and Outgoing Pure Radiation Fields
While we have seen above that Vaidya metrics are characterised by the fact that they
have an energy-momentum tensor $T_{\alpha\beta} \sim \delta^w_\alpha\, \delta^w_\beta$, it is useful to rephrase this somewhat,
and we will do this in an elementary fashion in this section. This can also be
formulated in a geometrically somewhat more satisfactory (because less coordinate-dependent)
form, in terms of geodesic congruences, and we will study these (for these and other
reasons) in some detail later on.
one also has what is known as a pure radiation field or null dust, characterised by an
energy-momentum tensor of the form
Remarks:
1. Such an energy-momentum tensor can arise e.g. from null Maxwell fields or from
massless scalar fields (in some geometric optics / eikonal approximation). For
example, a spherical outgoing scalar wave of the form $\phi(u, r) = \psi(u)/r$ with $u = t - r$
in Minkowski space gives rise to such an energy-momentum tensor when
terms like $\psi'(u)/r$ ($u$-derivatives) dominate over terms like $\psi(u)/r^2$ ($r$-derivatives),
leading to
$$T_{\alpha\beta} \approx \frac{\psi'(u)^2}{r^2}\, \delta^u_\alpha\, \delta^u_\beta . \qquad (25.42)$$
2. In particular, such an energy-momentum tensor is traceless,
$$T^\alpha{}_\alpha = \rho\, k^\alpha k_\alpha = 0 . \qquad (25.43)$$
3. The absolute normalisation of the null energy density is not determined by this
form of the energy-momentum tensor since it scales under a space-time dependent
boost of k,
$$k \to e^{-\alpha(x)} k \quad\Rightarrow\quad \rho(x) \to e^{2\alpha(x)} \rho(x) . \qquad (25.44)$$
We now consider again a general spherically symmetric space-time and denote the
tangent vectors to an ingoing (respectively outgoing) congruence of (not necessarily affinely
parametrised) radial future oriented null geodesics by
In the Bondi gauge (25.20), a natural (albeit asymmetric) choice for ℓ and n is
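$$\epsilon = +1:\quad n = -\partial_r , \qquad \ell = e^{-h}\partial_v + \tfrac{1}{2} f\, \partial_r$$
$$\epsilon = -1:\quad \ell = +\partial_r , \qquad n = e^{-h}\partial_u - \tfrac{1}{2} f\, \partial_r . \qquad (25.47)$$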
Examples:
The general solution of the Einstein equations is given by the ingoing Vaidya
metric (25.2). For the ingoing Vaidya metric one can choose
$$n_\alpha = -\delta^v_\alpha , \qquad \ell_v = -\tfrac{1}{2} f , \qquad \ell_r = 1 . \qquad (25.50)$$
In these adapted coordinates (adapted to ingoing null geodesics) one has
(suppressing the two angular dimensions)
$$n_\alpha n_\beta = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} . \qquad (25.51)$$
Therefore the energy-momentum tensor (25.9) of the ingoing Vaidya metric indeed
has the purely ingoing form
$$T_{\alpha\beta} = \frac{m'(v)}{4\pi G_N r^2}\, n_\alpha n_\beta . \qquad (25.52)$$
2. Likewise, a purely outgoing radiation field has the energy-momentum tensor
The general solution of the Einstein equations is given by the outgoing Vaidya
metric (25.12). For the outgoing Vaidya metric one can choose
$$\ell_\alpha = -\delta^u_\alpha , \qquad n_u = -\tfrac{1}{2} f , \qquad n_r = -1 . \qquad (25.55)$$
and the energy-momentum tensor (25.14) of the outgoing Vaidya metric indeed
has the purely outgoing form
$$T_{\alpha\beta} = -\frac{m'(u)}{4\pi G_N r^2}\, \ell_\alpha \ell_\beta . \qquad (25.57)$$
This makes it even more manifest that the ingoing Vaidya metric describes purely
ingoing null matter (and the outgoing Vaidya metric purely outgoing null matter).
Likewise, the generalised ingoing Vaidya metrics (25.16) cannot describe purely outgoing
matter (and vice-versa), as can be seen from the Einstein tensor (25.17): outgoing
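matter should have an energy-momentum tensor proportional to $\ell_\alpha \ell_\beta$ which, in ingoing
coordinates, has the form
$$\ell_\alpha \ell_\beta = \begin{pmatrix} \tfrac{1}{4} f^2 & -\tfrac{1}{2} f \\ -\tfrac{1}{2} f & 1 \end{pmatrix} \qquad (25.58)$$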
(the expressions for $n$ and $\ell$ from (25.49) are still valid in this case, since the conditions
$n^2 = \ell^2 = 0$ and $n \cdot \ell = -1$ are purely algebraic and do not depend on whether or not the
mass function depends on $r$). In particular, therefore, for outgoing pure radiation fields
in ingoing coordinates one necessarily has $G_{rr} \neq 0$. The generalised ingoing Vaidya
metric, on the other hand, has $G_{rr} = 0$ (and (25.33) shows that $G_{rr} \neq 0$ requires a
non-trivial, i.e. $r$-dependent, $h(w, r)$).
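These algebraic statements about the null pair can be verified numerically. The sketch below (illustrative helper names, $\epsilon = +1$, angular directions suppressed) takes $n$ and $\ell$ from (25.47), checks $n^2 = \ell^2 = 0$ and $n\cdot\ell = -1$ for arbitrary values of $f$ and $h$, and for $h = 0$ lowers the indices to reproduce (25.50) and the matrix (25.58). No derivatives enter, which is why these expressions survive an $r$-dependent mass function.

```python
import math

def ingoing_bondi(f, h):
    """(v, r) block of the Bondi-gauge metric with eps = +1."""
    return [[-math.exp(2 * h) * f, math.exp(h)], [math.exp(h), 0.0]]

def dot(g, X, Y):
    """g_ab X^a Y^b."""
    return sum(g[a][b] * X[a] * Y[b] for a in range(2) for b in range(2))

def lower(g, X):
    """X_a = g_ab X^b."""
    return [sum(g[a][b] * X[b] for b in range(2)) for a in range(2)]

def null_pair(f, h):
    """n and l of (25.47) for eps = +1, components (v, r)."""
    n = [0.0, -1.0]
    l = [math.exp(-h), 0.5 * f]
    return n, l

g = ingoing_bondi(0.6, 0.0)
n, l = null_pair(0.6, 0.0)
print(dot(g, n, n), dot(g, l, l), dot(g, n, l), lower(g, n), lower(g, l))
```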
We have just seen that what characterises the Vaidya metric is a matter content
consisting of purely in- or outgoing radiation, and that in the Bondi gauge this corresponds
to an energy-momentum tensor of the form
$$T_{\alpha\beta} = \rho(w,r)\, k_\alpha k_\beta , \qquad \rho(w,r) = \frac{\epsilon\, m'(w)}{4\pi G_N r^2} \qquad (25.59)$$
where $k_\alpha = -\delta^w_\alpha$. It is also of interest to understand what characterises these metrics
in the usual Schwarzschild-Birkhoff gauge (25.22) and how to write the Vaidya metrics
in this gauge. In principle, the answer to this question is provided by the coordinate
transformation (25.26),
$$dw(t,r) = e^{-h(w(t,r), r)}\left(e^{h_s(t,r)}\, dt + \epsilon f_s(t,r)^{-1}\, dr\right) \qquad (25.60)$$
between the Bondi and Schwarzschild gauges. If the metric in the Bondi gauge is of the
Vaidya form, one has h(w, r) = 0 (as well as m(w, r) = m(w)), and (25.60) reduces to
$$dw(t,r) = e^{h_s(t,r)}\, dt + \epsilon f_s(t,r)^{-1}\, dr \qquad (25.61)$$
where
$$f_s(t,r) = 1 - \frac{2m_s(t,r)}{r} = 1 - \frac{2m(w(t,r))}{r} . \qquad (25.62)$$
Here we have used the fact, noted in section 25.2, that the mass function transforms
as a scalar under this coordinate transformation (25.27), and we will for notational
simplicity set ms = m in the following.
This is an arduous task in general, even for simple Vaidya metrics. A quick way to
determine the general form of the Vaidya metric in the Schwarzschild gauge, without having
to explicitly solve these equations in order to determine the coordinate transformation
w = w(t, r), is to write
so that
$$\partial_t m(t,r) = m'(w)\, e^{h_s} , \qquad \partial_r m(t,r) = \epsilon\, m'(w)\, f_s^{-1} . \qquad (25.67)$$
In particular, this implies that
$$\frac{\partial_t m(t,r)}{\partial_r m(t,r)} = \epsilon f_s e^{h_s} , \qquad (25.68)$$
which gives the desired relation between hs (t, r) and fs (t, r) or m(t, r). In particular,
the general Vaidya metric can now be written as
This is the general form of the Vaidya metric in the Schwarzschild gauge.
While we have performed this coordinate transformation at the level of the metric, we
can of course also do this at the level of the field equations and the energy-momentum
tensor. This is instructive in its own right, as it will tell us what is the form of the
energy-momentum tensor in the Schwarzschild gauge which will simply give rise to a
Vaidya metric in (Schwarzschild-) disguise upon solving the Einstein equations.
On the one hand, in the gauge (25.22) the (t, r)-components of the Einstein tensor have
the simple form (12.54)
to the Schwarzschild gauge, using (25.61), one finds that the non-vanishing components
are
$$T_{tt} = e^{2h_s}\rho , \qquad T_{tr} = \epsilon\, e^{h_s} f_s^{-1}\rho , \qquad T_{rr} = f_s^{-2}\rho \qquad (25.72)$$
or
Inserting this into the Einstein equations (25.70) one (re)discovers the relations (25.67).
which reflects the fact that the original Vaidya energy-momentum tensor was traceless.
Equivalently, using terminology that is adapted to the coordinates $(t, r)$ of the
Schwarzschild gauge, we can rephrase this as the statement that the energy density and radial
pressure are equal,
$$T^r{}_r + T^t{}_t = 0 \quad\Leftrightarrow\quad \rho = P_r . \qquad (25.75)$$
$$T^t{}_t = \epsilon\, e^{h_s} f_s\, T^t{}_r , \qquad (25.76)$$
which expresses the lightlike nature of the energy-momentum content. Indeed, in the
Schwarzschild gauge an ingoing (for ǫ = +1) respectively outgoing (for ǫ = −1) radial
null vector kα is characterised by
$$k^\alpha k_\alpha = 0 \quad\Rightarrow\quad k^r = -\epsilon f_s e^{h_s} k^t , \qquad (25.77)$$
and the relation (25.76) can be rephrased as the statement that
$$T^t{}_t = \epsilon\, e^{h_s} f_s\, T^t{}_r \quad\Leftrightarrow\quad T^t{}_\alpha k^\alpha = 0 . \qquad (25.78)$$
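The chain (25.72) to (25.78) can be checked in one go. The sketch below (function name illustrative) starts from $T_{\alpha\beta} = \rho\,k_\alpha k_\beta$ with $k_\alpha = -\partial_\alpha w$ and, assuming the Vaidya case $h = 0$, uses $\partial_t w = e^{h_s}$ and $\partial_r w = \epsilon f_s^{-1}$ from (25.61). It raises indices with the Schwarzschild-gauge metric $-e^{2h_s} f_s\,dt^2 + f_s^{-1}dr^2$ and returns the residuals of the trace condition (25.75), the relation (25.76), the null conditions (25.77) and the transversality (25.78).

```python
import math

def schwarzschild_gauge_checks(rho, f, hs, eps):
    """T = rho k k with k_a = -d_a w, d_t w = e^{hs}, d_r w = eps/f (25.61, h = 0)."""
    k_low = [-math.exp(hs), -eps / f]                   # (t, r) components
    T_low = [[rho * k_low[a] * k_low[b] for b in range(2)] for a in range(2)]  # (25.72)
    g_inv = [[-math.exp(-2 * hs) / f, 0.0], [0.0, f]]   # inverse of -e^{2hs} f dt^2 + dr^2/f
    T_mix = [[sum(g_inv[a][c] * T_low[c][b] for c in range(2)) for b in range(2)]
             for a in range(2)]                          # T_mix[a][b] = T^a_b
    k_up = [sum(g_inv[a][b] * k_low[b] for b in range(2)) for a in range(2)]
    t, r = 0, 1
    return {
        "trace": T_mix[t][t] + T_mix[r][r],                               # (25.75)
        "ratio": T_mix[t][t] - eps * math.exp(hs) * f * T_mix[t][r],      # (25.76)
        "null": sum(k_low[a] * k_up[a] for a in range(2)),                # (25.77)
        "k_r": k_up[r] + eps * f * math.exp(hs) * k_up[t],                # (25.77)
        "transverse": sum(T_mix[t][a] * k_up[a] for a in range(2)),       # (25.78)
    }

print(schwarzschild_gauge_checks(rho=1.3, f=0.6, hs=0.4, eps=1))
```

All residuals vanish up to rounding, for either sign of $\epsilon$ and also for $f_s < 0$, since only algebraic identities are involved.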
Together
• and the fact that the energy-momentum tensor is spherically symmetric and purely
longitudinal (i.e. that its transverse angular components are zero)
Using (25.9), one finds that this is a solution of the Einstein equations with an energy-
momentum tensor
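$$T_{vv} = \frac{1}{4\pi G_N}\,\frac{m_f}{r^2}\,\delta(v) \qquad\text{or}\qquad T_{\alpha\beta} = \frac{1}{4\pi G_N}\,\frac{m_f}{r^2}\,\delta(v)\, n_\alpha n_\beta \qquad (25.81)$$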
of the purely ingoing form, localised along the null world volume of the shell. In this
case, this (apparently somewhat ham-handed and cavalier) procedure gives the correct
(Barrabes-Israel[79]) surface energy density
$$\rho_\Sigma(r) = \frac{1}{4\pi G_N}\,\frac{m_f}{r^2} \qquad (25.82)$$
[79] C. Barrabes, W. Israel, Thin shells in general relativity and cosmology: The lightlike limit, Phys. Rev. D43 (1991) 1129-1143. A nice and characteristically lucid discussion of the general formalism for null and non-null shells can also be found in E. Poisson, A Relativist’s Toolkit: the Mathematics of Black Hole Mechanics.
of the collapsing null shell (null world volume $\Sigma$), with constant total mass $M = m_f/G_N$.
This example is easily generalised to the description of the collapse of a shell onto a
pre-existing Schwarzschild black hole with mass mi by choosing the mass function to be
Given the success and simplicity of this construction, one might be tempted to
similarly describe an ingoing (collapsing) shell in outgoing coordinates by mimicking and
generalising the above procedure and postulating a metric of the outgoing form
$$ds^2 \overset{?}{=} -f(u,r)\, du^2 - 2\, du\, dr + r^2 d\Omega^2 , \qquad f(u,r) = 1 - \frac{2m_f}{r}\,\Theta(r - r_s(u)) , \qquad (25.84)$$
where $r = r_s(u)$ describes the ingoing $v = 0$ thin shell (shock wave) in outgoing
coordinates. However: as suggestive as this may be, it is incorrect in several respects (in
fact, as we will see, the metric, as written, does not even make sense), and it is worth
understanding this in some detail, as it also leads to a better appreciation of the
necessity of the (somewhat subtle and involved) Barrabes-Israel thin null-shell formalism
in general (which we had been able to side-step in the above construction based on
adapted ingoing coordinates, and which I will also not develop here).
First of all, formally the metric (25.84) is of the generalised outgoing Vaidya metric
form (25.16), with a mass function m = m(u, r). As we have seen in section 25.3, as
such it cannot possibly describe an ingoing shock wave, regardless of what (25.84) may
suggest or was meant to do.
More seriously, however, as already mentioned, (25.84) does not even make sense, i.e.
it is not well defined, as it is not clear what $r_s(u)$ should refer to: the trajectory of the
shell as described by the internal Minkowski coordinates, or the trajectory of the shell
as described by the external Schwarzschild coordinates? In fact, as we will see, the
problem is that it is not possible to use the same coordinate $u$ on both sides of the shell,
and this invalidates an ansatz like (25.84).
In order to better appreciate and understand this, let us go back a step and describe
the situation first in the usual Minkowski and Schwarzschild time and radial coordinates.
Let $\Sigma$ be the null hypersurface describing the world volume of the shell. This divides the
space-time into two parts, the inside (past) $V_-$ and the outside (future) $V_+$. On these
two parts of space-time we have the metrics
$$V_-:\; ds_-^2 = -dt_-^2 + dr^2 + r^2 d\Omega^2 , \qquad V_+:\; ds_+^2 = -f(r)\,dt_+^2 + f(r)^{-1}dr^2 + r^2 d\Omega^2 , \qquad (25.85)$$
where $f(r) = 1 - 2m/r$ (I now write $m$ instead of $m_f$). Here I have already identified the
radial and angular coordinates across the shell (this is possible), but have been careful
to introduce two different time-coordinates $t_\mp$. These two time-coordinates cannot be
identified: in terms of the internal Minkowski coordinates, the trajectory of the shell,
i.e. the ingoing light ray, is described by
$$\Sigma_-:\; t_- + r = 0 , \qquad (25.86)$$
while in terms of the external Schwarzschild coordinates it is described by
$$\Sigma_+:\; t_+ + r^* = 0 , \qquad (25.87)$$
with $r^* = r + 2m\log|r - 2m|$ the usual tortoise coordinate. Thus $t_-$ and $t_+$ satisfy different
equations (and can therefore not be identified across the shell). In this prototypical
situation one then has to appeal to the general Barrabes-Israel formalism (footnote 79)
to determine junction conditions and/or the surface energy-momentum tensor.
Let us now look at the same problem in ingoing and outgoing coordinates. In ingoing
coordinates
$$v_- = t_- + r , \qquad v_+ = t_+ + r^* \qquad (25.88)$$
the metric on the two sides of the shell is
$$V_-:\; ds_-^2 = -dv_-^2 + 2\,dv_-\,dr + r^2 d\Omega^2 , \qquad V_+:\; ds_+^2 = -f(r)\,dv_+^2 + 2\,dv_+\,dr + r^2 d\Omega^2 , \qquad (25.89)$$
and the shell is located at
$$\Sigma_\mp:\; v_\mp = 0 \qquad (25.90)$$
• and one can calculate the surface energy-momentum tensor by determining the
bulk Einstein tensor in these adapted coordinates, as done e.g. in the appendix of
the Barrabes-Israel article (footnote 79).
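Let us now finally (and to come back to the question we started off with) look at the
situation in outgoing coordinates
$$u_- = t_- - r , \qquad u_+ = t_+ - r^* . \qquad (25.92)$$
Now the metric on the two sides of the shell takes the form
$$V_-:\; ds_-^2 = -du_-^2 - 2\,du_-\,dr + r^2 d\Omega^2 , \qquad V_+:\; ds_+^2 = -f(r)\,du_+^2 - 2\,du_+\,dr + r^2 d\Omega^2 , \qquad (25.93)$$
with the location of the shell described by
$$\Sigma_-:\; u_- + 2r = 0 , \qquad \Sigma_+:\; u_+ + 2r^* = 0 . \qquad (25.94)$$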
Thus we are again in a situation where $u_\mp$ satisfy different equations and can hence
not be identified across the shell. In particular, this shows that any attempt to write a
combined metric, as in (25.84), is ambiguous and ill-defined because $r_s(u)$ is ambiguous.
Thus also in this case one needs to use the general Barrabes-Israel formalism to
determine the surface energy-momentum tensor, or one does the calculation in adapted
(ingoing) coordinates and then transforms to outgoing coordinates (if such a coordinate
transformation is known and available: this is not the case for the general Vaidya metric
but is of course the case for the Schwarzschild and Minkowski metrics that are relevant
here). Either way one sees that the energy-momentum tensor has the characteristic
ingoing form $\sim n_\alpha n_\beta$.
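The obstruction to identifying $u_-$ and $u_+$ can be checked symbolically. The SymPy sketch below (assuming $r > 2m$ and the tortoise coordinate as written in the text) verifies that $t_- = u_- + r$ and $t_+ = u_+ + r^*$ bring the Minkowski and Schwarzschild metrics into the outgoing form $-f\,du^2 - 2\,du\,dr$, and that on the shell (25.94) the difference $u_+ - u_- = 2(r - r^*)$ depends on $r$, so no constant shift can identify the two coordinates.

```python
import sympy as sp

r, m = sp.symbols('r m', positive=True)
du, dr = sp.symbols('du dr')
f = 1 - 2*m/r
# Tortoise coordinate as in the text (valid for r > 2m; an additive constant is irrelevant)
rstar = r + 2*m*sp.log(r - 2*m)
assert sp.simplify(sp.diff(rstar, r) - 1/f) == 0   # dr*/dr = 1/f

# Outgoing coordinates (25.92): dt_- = du + dr and dt_+ = du + dr*/dr dr = du + dr/f
ds2_mink = -(du + dr)**2 + dr**2
ds2_schw = -f*(du + dr/f)**2 + dr**2/f
assert sp.simplify(ds2_mink - (-du**2 - 2*du*dr)) == 0
assert sp.simplify(ds2_schw - (-f*du**2 - 2*du*dr)) == 0

# On the shell (25.94): u_- = -2r and u_+ = -2r*
shift = (-2*rstar) - (-2*r)
print(sp.simplify(sp.diff(shift, r)))   # r-dependent, hence not a constant shift
```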
26 Interlude: Null Congruences and Horizons
In this section, we look at some general properties of radial null vector fields in spherical
symmetry, and introduce the notions of inaffinity and expansion of such vector fields.
Thus we consider a spherically symmetric metric, which, for the sake of concreteness, we can
(but need not) take to be of the general form (25.30),
We now consider two linearly independent radial and spherically symmetric null vector
fields $\ell^\alpha$ and $n^\alpha$, which we choose to be cross-normalised such that
$$\ell^\alpha\ell_\alpha = n^\alpha n_\alpha = 0 , \qquad \ell^\alpha n_\alpha = -1 . \qquad (26.2)$$
Remarks:
1. Here “radial” means that it has components only in the $\partial_{z^a}$-directions transverse
to the sphere, and “spherically symmetric” that the coefficients only depend on
the $z^a$ and not on the coordinates of the sphere (this can of course also, if desired,
be phrased in a more coordinate-independent way, e.g. as the statement that the
Lie derivatives of $\ell^\alpha$ and $n^\alpha$ along the Killing vectors generating the rotational
symmetry vanish, but for present purposes not much is gained by this).
2. In concrete applications we will choose nα to be ingoing (in the sense that future
directed null rays tangent to nα will move towards smaller values of r) and ℓα to
be (asymptotically) outgoing.
3. The minus sign in the cross normalisation is such that both vector fields are either
future or past oriented (and we will of course choose the former).
4. Note that the individual normalisation of $\ell^\alpha$ and $n^\alpha$ is not fixed by the above
conditions, i.e. one can still perform the boost
$$\ell^\alpha \to e^{\sigma}\ell^\alpha , \qquad n^\alpha \to e^{-\sigma}n^\alpha , \qquad \sigma = \sigma(z^a) .$$
This can e.g. be used to select a preferred normalisation for one of them. If $\ell^\alpha$
has been fixed, then, in spherical symmetry and with the assumption that $n^\alpha$ is
also purely radial (longitudinal), $n^\alpha$ is uniquely determined by the two conditions
$n^\alpha n_\alpha = 0$ and $n^\alpha\ell_\alpha = -1$.
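(and likewise for $n^\alpha$). Taking the scalar product with $\ell^\alpha$ and using
$$\ell^\alpha\nabla_\alpha\ell^\beta = \kappa_\ell\,\ell^\beta , \qquad n^\alpha\nabla_\alpha n^\beta = \kappa_n\,n^\beta . \qquad (26.6)$$
The boost freedom can then e.g. be used to choose either $\ell^\alpha$ or $n^\alpha$ to be affinely
parametrised (but usually not both of them simultaneously).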
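$$\kappa_\ell = -\ell^\alpha n_\beta\nabla_\alpha\ell^\beta = \ell^\alpha\ell^\beta\nabla_\alpha n_\beta , \qquad \kappa_n = -n^\alpha\ell_\beta\nabla_\alpha n^\beta = n^\alpha n^\beta\nabla_\alpha\ell_\beta , \qquad (26.8)$$
or
$$\kappa_\ell = \tfrac12\ell^\alpha\ell^\beta(\nabla_\alpha n_\beta + \nabla_\beta n_\alpha) = \tfrac12\ell^\alpha\ell^\beta L_n g_{\alpha\beta} , \qquad \kappa_n = \tfrac12 n^\alpha n^\beta(\nabla_\alpha\ell_\beta + \nabla_\beta\ell_\alpha) = \tfrac12 n^\alpha n^\beta L_\ell g_{\alpha\beta} . \qquad (26.9)$$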
Here Lℓ and Ln are the Lie derivatives. Thus the inaffinities encode the information
about the longitudinal projections of the derivatives ∇α ℓβ and ∇α nβ , or of the Lie
derivatives Lℓ gαβ and Ln gαβ .
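Other useful information is contained in the transverse (i.e. parallel to the sphere)
projections of these objects. To define them, note that associated with a choice of $\ell^\alpha$
and $n^\alpha$ we have the decomposition of the metric
$$g_{\alpha\beta} = -\ell_\alpha n_\beta - n_\alpha\ell_\beta + s_{\alpha\beta} , \qquad (26.10)$$
with the transverse metric $s_{\alpha\beta}$ characterised by
$$s_{\alpha\beta}\ell^\beta = s_{\alpha\beta}n^\beta = 0 \qquad (26.12)$$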
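and
$$V^\alpha\ell_\alpha = V^\alpha n_\alpha = 0 \;\Rightarrow\; g_{\alpha\beta}V^\beta = s_{\alpha\beta}V^\beta . \qquad (26.13)$$
We now define the expansions of $\ell^\alpha$ and $n^\alpha$ as the transverse spatial projections of the
divergence of $\ell^\alpha$ respectively $n^\alpha$, i.e.
$$\theta_\ell = s^{\alpha\beta}\nabla_\alpha\ell_\beta , \qquad \theta_n = s^{\alpha\beta}\nabla_\alpha n_\beta . \qquad (26.14)$$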
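We will now establish two useful results, namely (1) that $\theta_\ell$ equals the ordinary
4-divergence $\nabla_\alpha\ell^\alpha$ iff $\kappa_\ell = 0$ (and likewise for $n^\alpha$), and (2) that $\theta_\ell$ and $\theta_n$ can be
equivalently written (or defined) as
$$\theta_\ell = \frac{1}{\sqrt{s}}L_\ell\sqrt{s} , \qquad \theta_n = \frac{1}{\sqrt{s}}L_n\sqrt{s} , \qquad (26.15)$$
$$\nabla_\alpha\ell^\alpha = \theta_\ell + \kappa_\ell , \qquad \nabla_\alpha n^\alpha = \theta_n + \kappa_n . \qquad (26.17)$$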
is the spatial trace of the Lie derivative of the metric along ℓα . Using (26.10), one
therefore has
(because of the Leibniz rule for the Lie derivative and because sαβ is orthogonal
to ℓα and nα ), and therefore, finally,
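$$\theta_\ell = \tfrac12 s^{\alpha\beta}L_\ell s_{\alpha\beta} = \frac{1}{\sqrt{s}}L_\ell\sqrt{s} , \qquad \theta_n = \tfrac12 s^{\alpha\beta}L_n s_{\alpha\beta} = \frac{1}{\sqrt{s}}L_n\sqrt{s} \qquad (26.20)$$
by the usual formula for the variation of the determinant of a matrix. qed.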
[80] This is a bit cavalier. For a more careful discussion see e.g. E. Poisson, A Relativist's Toolkit, section
2.4.
Thus the expansions $\theta_n$ and $\theta_\ell$ measure the fractional change in cross-sectional area
along the ingoing and outgoing lightrays, respectively.
Examples:
1. Minkowski space
In Minkowski space, in spherical coordinates,
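2. Ingoing Vaidya
Recall that in section 25.3 we introduced the radial null vector fields (25.49)
$$n = -\partial_r , \qquad \ell = \partial_v + \tfrac12 f(v,r)\,\partial_r .$$
The ingoing vector field $n$ is affinely parametrised,
$$n^\alpha\nabla_\alpha n^\beta = 0 \;\Leftrightarrow\; \kappa_n = 0 , \qquad (26.26)$$
because
$$n^\alpha\nabla_\alpha n^\beta = \Gamma^\beta_{rr} = 0 \qquad (26.27)$$
(since the only non-trivial metric component $g_{r\beta}$ is the constant $g_{rv} = 1$). For $\ell$,
on the other hand, one finds (see (27.7))
$$\kappa_\ell = \frac{m(v)}{r^2} \qquad (26.28)$$
(non-affine parametrisation).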
As far as the expansions are concerned, since $n = -\partial_r$ one sees immediately that
$$\theta_n = -\frac{2}{r} , \qquad (26.29)$$
as in Minkowski space. The other expansion,
$$\theta_\ell = \frac{r - 2m(v)}{r^2} , \qquad (26.30)$$
is more interesting, for reasons that will be explained in section 26.3.
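The values (26.28)-(26.30) can be reproduced from the Lie-derivative formulae (26.9) and (26.20) using only partial derivatives. The SymPy sketch below assumes the vector fields $n = -\partial_r$ and $\ell = \partial_v + \frac12 f\partial_r$ of (25.49) and takes the transverse trace over the angular block, where $s_{\alpha\beta}$ coincides with $g_{\alpha\beta}$ in these coordinates. The helper functions (`lie_g`, `contract`, `expansion`) are illustrative names, not part of the notes.

```python
import sympy as sp

v, r, th, ph = sp.symbols('v r theta phi', positive=True)
m = sp.Function('m')(v)
x = [v, r, th, ph]
f = 1 - 2*m/r
g = sp.Matrix([[-f, 1, 0, 0],
               [1, 0, 0, 0],
               [0, 0, r**2, 0],
               [0, 0, 0, r**2*sp.sin(th)**2]])

ell = [1, f/2, 0, 0]   # l = d_v + (f/2) d_r
n = [0, -1, 0, 0]      # n = -d_r

def dot(X, Y):
    return sp.simplify(sum(g[a, b]*X[a]*Y[b] for a in range(4) for b in range(4)))

def lie_g(X):
    """Components (L_X g)_{ab} = X^c d_c g_{ab} + g_{cb} d_a X^c + g_{ac} d_b X^c."""
    return sp.Matrix(4, 4, lambda a, b: sp.simplify(sum(
        X[c]*sp.diff(g[a, b], x[c]) + g[c, b]*sp.diff(X[c], x[a]) + g[a, c]*sp.diff(X[c], x[b])
        for c in range(4))))

def contract(X, M, Y):
    return sp.simplify(sum(X[a]*M[a, b]*Y[b] for a in range(4) for b in range(4)))

def expansion(X):
    """theta_X = (1/2) s^{ij} (L_X g)_{ij} over the (diagonal) angular block, cf. (26.20)."""
    L = lie_g(X)
    return sp.simplify((L[2, 2]/g[2, 2] + L[3, 3]/g[3, 3])/2)

kappa_l = contract(ell, lie_g(n), ell)/2    # (26.9)
kappa_n = contract(n, lie_g(ell), n)/2      # (26.9)
theta_l, theta_n = expansion(ell), expansion(n)
print(dot(ell, ell), dot(n, n), dot(ell, n))   # normalisation (26.2): 0 0 -1
print(kappa_l, kappa_n, theta_l, theta_n)
```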
Finally, writing a general ansatz for the energy-momentum tensor of ingoing
Vaidya as
$$T_{\alpha\beta} = \rho(v,r)\,n_\alpha n_\beta , \qquad (26.31)$$
the covariant divergence of the energy-momentum tensor is, using the general
formulae obtained above,
In this section we will consider some general properties of null geodesic congruences (in
particular, not restricted to radial geodesics and spherical symmetry), namely the null
counterpart of the Raychaudhuri equation for timelike geodesic congruences discussed
in section 9.4.
Unlike in the case of radial lightrays in spherical symmetry, here the choice of $n^\alpha$ is
no longer unique, but this non-uniqueness will drop out of the final equations we will
derive. In particular one can (but need not) choose $n^\alpha$ to be parallel-transported along
$\ell^\alpha$ (in which case the above conditions on $n^\alpha$ hold everywhere along the congruence
if they are satisfied initially).
Associated with this choice of $n^\alpha$ we have a decomposition of the metric into a
longitudinal and a transverse spatial part, precisely as in (26.10)
and are therefore such that if e.g. $V^\alpha$ is a space-time vector, $v^\alpha \equiv s^\alpha{}_\beta V^\beta$ is a spatial
vector, i.e. orthogonal to $\ell^\alpha$ and $n^\alpha$,
$$v^\alpha = s^\alpha{}_\beta V^\beta \;\Rightarrow\; v^\alpha\ell_\alpha = v^\alpha n_\alpha = 0 . \qquad (26.40)$$
For a timelike congruence, this would be enough to ensure that $B_{\alpha\beta}$ is a transverse
spatial tensor, but this is not the case for a null congruence, since $B_{\alpha\beta}$ need not be
orthogonal to $n^\alpha$. Performing this projection explicitly, one sees that this spatial
projection $b_{\alpha\beta}$ is equal to
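This has two useful immediate consequences that we will make use of in the following,
namely that the trace (with respect to $g^{\alpha\beta}$) and the square of $b_{\alpha\beta}$ are equal to those of
$B_{\alpha\beta}$,
$$(26.43) \;\Rightarrow\; g^{\alpha\beta}B_{\alpha\beta} = g^{\alpha\beta}b_{\alpha\beta} = s^{\alpha\beta}b_{\alpha\beta} \quad\text{and}\quad B^{\alpha\beta}B_{\alpha\beta} = b^{\alpha\beta}b_{\alpha\beta} . \qquad (26.44)$$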
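We can now, as in the timelike case, decompose $b_{\alpha\beta}$ orthogonally into its irreducible
(trace, symmetric traceless, anti-symmetric) parts,
$$b_{\alpha\beta} = \tfrac12\theta_\ell\,s_{\alpha\beta} + \sigma_{\alpha\beta} + \omega_{\alpha\beta} , \qquad (26.45)$$
where
$$\theta_\ell = s^{\alpha\beta}b_{\alpha\beta} = s^{\alpha\beta}\nabla_\beta\ell_\alpha = g^{\alpha\beta}\nabla_\alpha\ell_\beta = \nabla_\alpha\ell^\alpha , \qquad (26.46)$$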
and we observe that (reassuringly) this agrees with the definition of the expansion given
previously in (26.14). The equivalence with the space-time divergence $\theta_\ell = \nabla_\alpha\ell^\alpha$ is due
to the fact that here the congruence is affinely parametrised, $\kappa_\ell = 0$, cf. (26.17).
As regards the other terms, σαβ and ωαβ are known as the shear tensor and rotation
tensor respectively. Because the decomposition is orthogonal, we have
Because the tensors appearing on the right-hand side of this equation are spatial tensors,
their squares are non-negative,
Our aim will now be, following up on the remark made in connection with (26.24) above,
to determine the change of the expansion θℓ along the null geodesics. In other words,
we want to calculate
$$\frac{d}{d\tau}\theta_\ell = L_\ell\theta_\ell = \ell^\alpha\nabla_\alpha\theta_\ell . \qquad (26.49)$$
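Just following one's tensor calculus nose, one finds
$$\begin{aligned} \frac{d}{d\tau}\theta_\ell &= \ell^\alpha\nabla_\alpha(\nabla_\beta\ell^\beta) \\ &= \ell^\alpha[\nabla_\alpha,\nabla_\beta]\ell^\beta + \ell^\alpha\nabla_\beta\nabla_\alpha\ell^\beta \\ &= -R_{\alpha\beta}\ell^\alpha\ell^\beta - (\nabla_\beta\ell^\alpha)(\nabla_\alpha\ell^\beta) + \nabla_\beta(\ell^\alpha\nabla_\alpha\ell^\beta) . \end{aligned} \qquad (26.50)$$
The 2nd term is just $-B^{\alpha\beta}B_{\beta\alpha}$ and the 3rd term is zero because $\ell$ is (affinely parametrised and)
geodesic. Thus one finds the Raychaudhuri equation for null congruences
$$\frac{d}{d\tau}\theta_\ell = -R_{\alpha\beta}\ell^\alpha\ell^\beta - \tfrac12\theta_\ell^2 - \sigma^{\alpha\beta}\sigma_{\alpha\beta} + \omega^{\alpha\beta}\omega_{\alpha\beta} . \qquad (26.51)$$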
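Here are two variations of this important equation:
1. Since $\ell^\alpha$ is null, we can replace the Ricci tensor in the above equation by the
Einstein tensor $G_{\alpha\beta} = R_{\alpha\beta} - g_{\alpha\beta}R/2$, and for a solution to the Einstein equations
we can express this in terms of the energy-momentum tensor. Then one has
$$\frac{d}{d\tau}\theta_\ell = -8\pi G_N T_{\alpha\beta}\ell^\alpha\ell^\beta - \tfrac12\theta_\ell^2 - \sigma^{\alpha\beta}\sigma_{\alpha\beta} + \omega^{\alpha\beta}\omega_{\alpha\beta} . \qquad (26.52)$$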
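2. If one does not assume that the null geodesic congruence is affinely parametrised,
then (26.51) picks up another term on the right-hand side (unsurprisingly proportional
to the inaffinity $\kappa_\ell$),
$$\frac{d}{d\tau}\theta_\ell = \kappa_\ell\theta_\ell - R_{\alpha\beta}\ell^\alpha\ell^\beta - \tfrac12\theta_\ell^2 - \sigma^{\alpha\beta}\sigma_{\alpha\beta} + \omega^{\alpha\beta}\omega_{\alpha\beta} . \qquad (26.53)$$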
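If the energy-momentum tensor satisfies what is known as the null energy condition,
$$T_{\alpha\beta}\ell^\alpha\ell^\beta \geq 0 , \qquad (26.54)$$
and the congruence is rotation-free, $\omega_{\alpha\beta} = 0$ (as is the case e.g. for the null generators
of a null hypersurface), then (26.52) implies in particular
$$\frac{d}{d\tau}\theta_\ell \leq -\tfrac12\theta_\ell^2 \leq 0 . \qquad (26.55)$$
In Minkowski space-time we would have an equality instead of the first inequality sign
(cf. (26.24)), so (26.55) is the (reasonable) statement that the expansion is smaller/slower
than in Minkowski space, i.e. that (reasonable) matter has the tendency to focus lightrays.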
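Analogously to the timelike case, this also has the consequence that if one has an initially
converging null congruence, $\theta_\ell(\tau_0) < 0$, then because of
$$\frac{d}{d\tau}\theta_\ell \leq -\tfrac12\theta_\ell^2 \;\Rightarrow\; \frac{1}{\theta_\ell(\tau)} \geq \frac{1}{\theta_\ell(\tau_0)} + \frac{\tau - \tau_0}{2} \qquad (26.56)$$
the expansion diverges, $\theta_\ell \to -\infty$, within a finite affine parameter distance
$\tau - \tau_0 \leq 2/|\theta_\ell(\tau_0)|$ (provided the geodesics can be extended that far).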
This usually indicates the formation of a (harmless) caustic where these null geodesics
cross (such as e.g. in Minkowski space where radially ingoing lightrays have expansion
θ = −2/r → −∞ as r → 0). However, this prediction of a would-be divergent expansion
can have far-reaching consequences in cases where one knows on other grounds that
caustics cannot form, because it means that the geodesic cannot be extended to where
one would find θℓ → −∞.
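The focusing bound (26.56) can be illustrated numerically. The plain-Python sketch below integrates the saturated equation $d\theta/d\tau = -\frac12\theta^2$ and compares it with its exact solution $\theta(\tau) = \theta_0/(1 + \theta_0(\tau - \tau_0)/2)$, which diverges at $\tau - \tau_0 = 2/|\theta_0|$. It then adds a hypothetical constant term $R_{\alpha\beta}\ell^\alpha\ell^\beta = 0.1 > 0$, compatible with the null energy condition, to show that the divergence is only reached earlier. The cut-off $\theta < -10^3$ is an arbitrary numerical stand-in for $\theta \to -\infty$.

```python
def rk4_step(rhs, t, y, h):
    """One classical fourth-order Runge-Kutta step for dy/dt = rhs(t, y)."""
    k1 = rhs(t, y)
    k2 = rhs(t + h/2, y + h/2*k1)
    k3 = rhs(t + h/2, y + h/2*k2)
    k4 = rhs(t + h, y + h*k3)
    return y + h/6*(k1 + 2*k2 + 2*k3 + k4)

def evolve(rhs, theta0, tau_stop, h=1e-3, theta_cut=-1e3):
    """Integrate from tau = 0 until tau >= tau_stop or theta < theta_cut (numerical blow-up)."""
    tau, theta = 0.0, theta0
    while tau < tau_stop and theta > theta_cut:
        theta = rk4_step(rhs, tau, theta, h)
        tau += h
    return tau, theta

def saturated(tau, theta):
    # Equality in (26.55): no matter, no shear, no rotation
    return -0.5*theta**2

def with_matter(tau, theta):
    # Hypothetical constant focusing term R_ab l^a l^b = 0.1 > 0
    return -0.1 - 0.5*theta**2

theta0 = -0.5                   # initially converging congruence
tau_bound = 2.0/abs(theta0)     # latest blow-up allowed by (26.56), here 4.0

tau_a, theta_a = evolve(saturated, theta0, 0.9*tau_bound)
exact_a = theta0/(1.0 + 0.5*theta0*tau_a)   # exact solution of the saturated equation
tau_b, theta_b = evolve(with_matter, theta0, tau_bound)
print(f"saturated: tau = {tau_a:.3f}, theta = {theta_a:.6f} (exact {exact_a:.6f})")
print(f"with matter: theta < -1000 at tau = {tau_b:.3f}, bound {tau_bound}")
```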
This sort of reasoning plays a crucial role both in the singularity theorems of general
relativity (where one is, however, usually concerned with timelike geodesic congruences)
as well as in the study of black holes and the laws governing the evolution of their event
horizons (where the interest is in the null geodesic congruences generating the horizon).
In particular, in the latter case the Raychaudhuri equation is the crucial ingredient in
the proof of the statement (Hawking’s theorem) that under reasonable conditions the
cross-sectional area of the event horizon of a black hole cannot decrease.
26.3 Apparent/Trapping Horizons of Vaidya Metrics
In the case of the Schwarzschild metric, the null hypersurface $r = 2m$ describes the
characteristic event horizon of a static black hole. It is evident that also for the general
ingoing Vaidya metric something special happens at those points of space-time where
$$f(v,r) = 0 \;\Leftrightarrow\; r = 2m(v) . \qquad (26.58)$$
However, as we will discuss now, this is not the event horizon of the Vaidya black hole.
Rather, this hypersurface is known as the apparent horizon or trapping horizon.
In the present context it is the locus where the lightcones “tilt over”, i.e. the boundary
between the region where the so-called outgoing future-oriented lightrays are
really locally outgoing in the sense that they move to larger values of $r$, $dr/d\tau > 0$,
and the region where also the supposedly “outgoing” future-oriented lightrays move to
smaller values of $r$, $dr/d\tau < 0$. This can be seen directly e.g. from the condition
$$-f(v,r)\,dv + 2\,dr = 0 \;\Leftrightarrow\; 2\frac{dr}{dv} = f(v,r) \qquad (26.59)$$
for outgoing ($v$ not constant) lightrays in ingoing coordinates. Then one sees that
$$\frac{dr}{dv} = \tfrac12 f(v,r) \quad \begin{cases} > 0 & \text{for } r > 2m(v):\ \text{truly outgoing} \\ < 0 & \text{for } r < 2m(v):\ \text{actually ingoing} \end{cases} \qquad (26.60)$$
The normal directions to the 2-sphere $S$ can be spanned by the two linearly independent
radial null vectors $n^\alpha$ (ingoing) and $\ell^\alpha$ (“outgoing”), and because of the spherical
symmetry the extrinsic geometry of the 2-sphere can be completely characterised by the
fractional change of the area element along $\ell$ and $n$, the expansions
$$\theta_\ell = \frac{1}{\sqrt{s}}L_\ell\sqrt{s} , \qquad \theta_n = \frac{1}{\sqrt{s}}L_n\sqrt{s} \qquad (26.62)$$
(cf. the discussion in section 26.1). Concretely, using (26.61), one finds for the expansions
$$\theta_\ell = \frac{\ell^\alpha\partial_\alpha\sqrt{s}}{\sqrt{s}} = \frac{2}{r}\ell^\alpha\partial_\alpha r , \qquad \theta_n = \frac{n^\alpha\partial_\alpha\sqrt{s}}{\sqrt{s}} = \frac{2}{r}n^\alpha\partial_\alpha r \qquad (26.63)$$
or simply
$$\theta_\ell = \frac{2}{r}\ell^r , \qquad \theta_n = \frac{2}{r}n^r . \qquad (26.64)$$
[81] See e.g. the discussion in I. Booth, J. Martin, On the proximity of black hole horizons: lessons from
Vaidya, arXiv:1007.1642 [gr-qc] and references therein.
These expansions are therefore a measure of the change of $r$ (and hence the induced
area) along the null directions $n$ and $\ell$. Since $n$ is ingoing, one expects $\theta_n < 0$, and this
expectation is indeed borne out in the ingoing Vaidya metric, for which one has, from
(25.49), $n^r = -1 < 0$, and thus
$$\theta_n = -\frac{2}{r} < 0 . \qquad (26.65)$$
This is independent of the mass function $m(v)$ and therefore, in particular, identical to
the inward expansion (perhaps better: contraction) of a sphere of constant $t$ and $r$
in Minkowski space along ingoing radial light rays. The non-triviality of the extrinsic
geometry of the surface is then characterised by $\theta_\ell$, and the 2-surface $S$ is called
$$\begin{cases} \text{untrapped} & \text{if } \theta_\ell > 0 \\ \text{marginally trapped} & \text{if } \theta_\ell = 0 \\ \text{trapped} & \text{if } \theta_\ell < 0 \end{cases} \qquad (26.66)$$
The existence of trapped surfaces is a characteristic feature of the region of the
Schwarzschild black hole inside the future horizon (causal evolution is necessarily towards
decreasing values of $r$), and also plays a key role in the (Hawking, Penrose, . . . ) singularity
theorems of classical general relativity. Specifically, in the case of the ingoing Vaidya
metric, we have, again from (25.49),
$$\ell = \partial_v + \tfrac12 f(v,r)\,\partial_r \;\Rightarrow\; \theta_\ell = \frac{r - 2m(v)}{r^2} , \qquad (26.67)$$
and thus $S$ is
$$\begin{cases} \text{untrapped} & \text{for } r > 2m(v) \\ \text{marginally trapped} & \text{for } r = 2m(v) \\ \text{trapped} & \text{for } r < 2m(v) \end{cases} \qquad (26.68)$$
In the present case, the union of all the marginally trapped surfaces (as $v$ varies) gives us
what is known as the trapping horizon or (somewhat sloppily) the apparent horizon.[82]
The induced metric on what we will henceforth refer to as the apparent horizon, namely
the hypersurface defined by $r = 2m(v)$ (26.58), is
Since we are assuming that m′ (v) ≥ 0, this hypersurface is spacelike unless m′ (v) = 0,
when it is null. In the latter case we are dealing with the Schwarzschild solution and
[82] I have oversimplified the exposition by (a) explicitly referring only to the spherically-symmetric
2-surfaces $S$ of constant $r$ and $v$, and (b) implicitly referring everything to a spherically symmetric
foliation of the space-time. In general, definitions and terminology require significantly more care.
See e.g. I. Booth, Black Hole Boundaries, arXiv:gr-qc/0508107, for an overview of, and guided tour
through, the zoo of quasi-local geometric definitions of horizons. Non-spherically symmetric foliations
and trapped surfaces in Vaidya space-times are discussed in some detail in A. Nielsen, M. Jasiulek,
B. Krishnan, E. Schnetter, The slicing dependence of non-spherically symmetric quasi-local horizons in
Vaidya Spacetimes, arXiv:1007.2990 [gr-qc].
the apparent horizon agrees with the usual (event) horizon r = 2m of the Schwarzschild
black hole which is of course a null hypersurface. Analogously, the apparent horizon of
the outgoing Vaidya metric (25.12) with m′ (u) < 0 is spacelike, with induced metric
This local geometric notion of an apparent horizon described above needs to be distin-
guished from the global causal notion of a true event horizon, whose existence char-
acterises the presence of a black hole - loosely speaking, a region of space-time that
is causally sealed off from the rest of space-time in the sense that no information can
escape “to the outside”. Correspondingly, the event horizon is then the boundary of
the region from which lightrays cannot escape to infinity.
This is also captured by, and made precise in, the “official” definition of a future event
horizon, as the boundary of the past of future null infinity $\mathscr{I}^+$ (which becomes a rigorous
definition once all the terms appearing in it have been properly defined). By its definition
as a causal boundary, this event horizon is a null surface. Analogously, a past event
horizon is defined as the boundary of the future of past null infinity $\mathscr{I}^-$.
However, this definition (used formally or informally) also makes it clear that the event
horizon is much more of a global and subtle (teleological) object than, say, the apparent
horizon (which we have already determined without any effort). In order to determine
the event horizon, it is not enough to know if a lightray locally or instantaneously moves
to larger values of r (this local information is completely captured by the expansions θn
and θℓ ). In order to be able to assert that this lightray will reach infinity, one needs to
make sure that it continues to move to larger values of r in the future. As the future
behaviour of the lightray depends on the future evolution of the geometry (e.g., in the
present context, on the form of the mass function m(v)), it is clear that the location of
the event horizon at a given time cannot be determined without knowing the (entire!)
future evolution of that space-time.
More specifically, in the present context of the Vaidya metric, if one has an initially really
outgoing lightray at some time vi , i.e. at some ri > 2m(vi ), it is not guaranteed that this
lightray will remain at r > 2m(v) for all v. If it crosses the apparent horizon r = 2m(v)
at some later “time” v, it reaches a local maximum of r there, dr/dτ = dr/dv = 0,
and then (at least at first) returns to smaller values of r. In particular, what may have
appeared initially to be a safe radial distance (where one can send lightrays outwards
locally) can become unsafe in the future if the mass increases (m′ (v) > 0, as we are
assuming).
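The phenomenon just described, an initially outgoing lightray that is later overtaken by the growing apparent horizon, can be seen by integrating (26.60) numerically. The sketch below uses a hypothetical mass function $m(v) = 1 + 0.1\,v$ for $v \geq 0$, capped at $m = 2$ so that the total mass is bounded, and a fixed-step RK4 integrator. Ray 1 starts just outside $2m(0) = 2$, ray 2 much further out, and the integration is stopped at an arbitrary small radius $r = 0.1$.

```python
def mass(v, m0=1.0, mu=0.1, m_final=2.0):
    """Hypothetical mass function: m0 + mu*v for v >= 0, capped at m_final (bounded, non-decreasing)."""
    return min(m0 + mu*v, m_final)

def drdv(v, r):
    """Outgoing radial lightrays in ingoing Vaidya, (26.60): dr/dv = f(v, r)/2."""
    return 0.5*(1.0 - 2.0*mass(v)/r)

def trace_ray(r0, v_max=60.0, h=1e-3, r_min=0.1):
    """RK4 integration from (v, r) = (0, r0) until v >= v_max or r <= r_min."""
    v, r = 0.0, r0
    vs, rs = [v], [r]
    while v < v_max and r > r_min:
        k1 = drdv(v, r)
        k2 = drdv(v + h/2, r + h/2*k1)
        k3 = drdv(v + h/2, r + h/2*k2)
        k4 = drdv(v + h, r + h*k3)
        r += h/6*(k1 + 2*k2 + 2*k3 + k4)
        v += h
        vs.append(v)
        rs.append(r)
    return vs, rs

vs1, rs1 = trace_ray(2.2)    # starts just outside the apparent horizon 2m(0) = 2
vs2, rs2 = trace_ray(10.0)   # starts far outside
i_max = max(range(len(rs1)), key=rs1.__getitem__)
print(f"ray 1: r_max = {rs1[i_max]:.4f} at v = {vs1[i_max]:.3f}, 2m(v) = {2*mass(vs1[i_max]):.4f}")
print(f"ray 1 stops at r = {rs1[-1]:.3f} (v = {vs1[-1]:.2f}); ray 2 reaches r = {rs2[-1]:.1f}")
```

At the turning point of ray 1 one should find $r \approx 2m(v)$, in line with the statement that $dr/dv = 0$ exactly where the ray crosses the apparent horizon.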
In order to determine the future event horizon rather than the apparent horizon, one
thus needs to determine the “last” outgoing lightray that can escape to infinity. The
event horizon is spanned / generated by this $S^2$-family of lightrays (and this description
makes it manifest that the event horizon is null). Evidently in the time-independent
Schwarzschild case the notions of event horizon and apparent horizon agree. In partic-
ular, if one has an initially outgoing lightray at r > 2m0 , then it will continue to satisfy
r > 2m0 in the future because r is increasing.
In general, however, the event horizon will not agree with the apparent horizon (and
will typically lie outside the apparent horizon). From what we have seen so far, locally
nothing particularly untoward or dangerous seems to be happening in the region
between the two, the danger apparently revealing itself only through the future evolution
of the space-time. However, in the above we have restricted attention to spherically
symmetric surfaces. If one considers more general trapped surfaces, and furthermore
relaxes the restriction to surfaces lying in a single time-slice, one can define a 3-surface
in space-time as the (outer) boundary of the region containing outer trapped surfaces
(θℓ < 0, no condition on θn ). This 3-surface will necessarily lie somewhere between the
apparent and the event horizon (and could coincide with one of them).
It was conjectured by Eardley that this 3-surface actually coincides with the event
horizon, and this conjecture was established by Ben-Dov in the case of Vaidya
space-times with finite total mass, i.e. with $m(v)$ bounded from above.[83] He also shows that
the restriction to genuinely trapped surfaces with $\theta_n < 0$ as well is not enough to fill
out the space between the apparent and event horizons. This issue has been further
analysed by Bengtsson and Senovilla who investigated if and when genuinely trapped
surfaces can extend into the intermediate region.[84]
Thus, if or when Eardley’s conjecture holds, this seems to provide the desired almost
local characterisation of the event horizon. However, this is somewhat misleading be-
cause the outer trapped spacelike surfaces that are required extend far into the future
and in this way manage to feed back the information about the future evolution of the
space-time into the location of the boundary 3-surface at an earlier time. Thus in spite
of these results there appears to be no good local in time characterisation of the event
horizon.
not one had encountered a black hole in one’s simulations is rightly considered to be a
somewhat non-constructive and non-productive procedure), but also by considerations
involving the mechanics and thermodynamics of black hole horizons, a lot of work has
gone into finding suitable quasi-local definitions and characterisations of
horizons and studying their properties.[85]
Our aims in the following will be much more modest and we will be content with simply
locating the event horizon itself in some simple examples of Vaidya (and other) space-
times and comparing it with the apparent horizon.
It is particularly easy to describe the evolution and growth of the event horizon in the
case of the collapsing spherical shell of null matter in Minkowski space discussed in
section 25.5, with metric (25.79)
$$ds^2 = -f(v,r)\,dv^2 + 2\,dv\,dr + r^2 d\Omega^2 , \qquad f(v,r) = 1 - \frac{2m_f}{r}\,\Theta(v) . \qquad (26.71)$$
For $v > 0$, outside the shell, the metric is the Schwarzschild metric and the event
= apparent horizon is simply the Schwarzschild event horizon at $r = r_s = 2m_f$. To
determine the event horizon in the interior of the shell, one needs to determine that
$S^2$-family of outgoing radial lightrays in Minkowski space which reaches $r_s = 2m_f$ at
$v = 0$, and thus connects to the exterior event horizon at the locus $v = 0$ of the shell.
Using the fact that a general outgoing lightray in ingoing Minkowski coordinates $(v, r)$
is described by
$$u = t - r = v - 2r = c \;\text{(constant)} , \qquad (26.72)$$
one finds that the relevant lightray has $c = -2r_s$, i.e. the event horizon inside the shell is
$r_{eh}(v) = r_s + v/2$ for $-2r_s \leq v \leq 0$. It thus starts growing from $r = 0$ at the time
$v = -2r_s$, before the shell has arrived at or crossed its Schwarzschild radius. By contrast,
the apparent horizon only exists for $v \geq 0$
(and coincides with the Schwarzschild event horizon $r = r_s$ in that region).
26.6 Example: Horizons in Oppenheimer-Snyder Collapse
This solution describes a collapsing dust star for τ < 0, collapsing to zero radius at time
τ = 0.
As the exterior geometry is just the Schwarzschild geometry, in the exterior region the
event = apparent horizon is the null surface $r = 2m$, coming into existence at the time
$\tau = \tau_f$ when the star crosses its Schwarzschild radius, i.e. at the time $\tau_f$ given by
$$\tau_f = -\tfrac43 m . \qquad (26.77)$$
The interest is therefore in the formation and evolution of horizons in the interior of the
star. In order to explore the causal structure of this (spherically symmetric) solution,
we look at radial null rays, characterised by
Since $H < 0$, the former describe ingoing radial null geodesics because
$$\frac{dr}{d\tau} = -1 + rH < 0 \quad \text{(ingoing)} , \qquad (26.80)$$
while the latter, satisfying
$$\frac{dr}{d\tau} = +1 + rH \quad \text{(“outgoing”)} , \qquad (26.81)$$
describe truly outgoing radial null geodesics only for $r < 1/|H| = -1/H$, while these geodesics are
also ingoing for $r > 1/|H|$. Thus there is an apparent horizon inside the star at
which shows that the apparent horizon is a timelike hypersurface in this case.
In order to determine the event horizon, we need to determine the interior outgoing
lightrays that reach the surface of the star just as the surface of the star passes through
its Schwarzschild radius, i.e. at the time τ = τf determined in (26.77).
$$\dot r = 1 + rH = 1 + \frac{2r}{3\tau} . \qquad (26.84)$$
The integration constant is determined by selecting the outgoing lightray with $r(\tau_f) = 2m$,
$$r(\tau_f) = 2m \;\Rightarrow\; c_0 = 3C , \qquad (26.88)$$
Collecting our intermediate results, we see that the surface of the star, the apparent
horizon and the event horizon are described by
1. At time $\tau = \tau_f = -4m/3$, for the event horizon we have
$$r_{eh}(\tau = \tau_f) = 2m \qquad (26.91)$$
as it should be.
2. The event horizon starts growing from the non-singular center of the star $r = 0$
at the time $\tau = \tau_i < 0$ determined by
with
$$\dot r_{eh}(\tau_i) = 1 , \qquad \dot r_{eh}(\tau_f) = 0 \qquad (26.95)$$
so the apparent horizon starts forming as the star crosses its Schwarzschild radius
at $\tau = \tau_f$. Indeed for $\tau < \tau_f$ one would have $r_{ah}(\tau) > R(\tau)$, but this would be
outside the star (and we determined the apparent horizon by studying the interior
lightrays). It then shrinks from $r = 2m$ at $\tau = \tau_f$ to $r = 0$ at $\tau = 0$.
4. Thus for $\tau_f < \tau < 0$ the apparent horizon has two “branches”, one inside the star
and one outside. Inside the star, trapped spheres only occur outside the apparent
horizon, i.e. in the region between the apparent horizon and the surface of the
star. Outside the star, they of course occur in the Schwarzschild black hole region
$r < r_{eh} = 2m$.
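The statements in the list above can be checked numerically once an explicit form of the star's surface is chosen. The equation for $R(\tau)$ is not contained in this excerpt; the sketch below assumes $R(\tau) = C(-\tau)^{2/3}$ with $C^3 = 9m/2$, which is the choice that reproduces $\tau_f = -4m/3$ and $c_0 = 3C$ quoted above. With $H = 2/(3\tau)$ read off from (26.84), the interior apparent horizon is $r_{ah} = -1/H = -3\tau/2$, and the event horizon is the solution $r_{eh}(\tau) = 3\tau + 3C(-\tau)^{2/3}$ of (26.84). Under these assumptions one finds $\tau_i = -C^3 = -9m/2$.

```python
C_CUBED_OVER_M = 4.5   # assumption: C^3 = 9m/2

def c_const(m):
    return (C_CUBED_OVER_M*m)**(1.0/3.0)

def R_star(tau, m):
    """Assumed surface radius R(tau) = C (-tau)^(2/3) of the collapsing dust star (tau < 0)."""
    return c_const(m)*(-tau)**(2.0/3.0)

def r_ah(tau):
    """Interior apparent horizon r = -1/H with H = 2/(3 tau)."""
    return -1.5*tau

def r_eh(tau, m):
    """Solution r = 3 tau + c0 (-tau)^(2/3) of (26.84), with c0 = 3C as in (26.88)."""
    return 3.0*tau + 3.0*c_const(m)*(-tau)**(2.0/3.0)

def rdot_eh(tau, m):
    """Derivative of r_eh with respect to tau."""
    return 3.0 - 2.0*c_const(m)*(-tau)**(-1.0/3.0)

def rdot_outgoing(tau, r):
    """Right-hand side of (26.84)."""
    return 1.0 + 2.0*r/(3.0*tau)

m = 1.0
tau_f = -4.0*m/3.0
tau_i = -C_CUBED_OVER_M*m   # r_eh vanishes where (-tau)^(1/3) = C
for tau in (tau_i, -3.0, -2.0, tau_f):
    print(f"tau = {tau:6.3f}: r_eh = {r_eh(tau, m):.4f}, R = {R_star(tau, m):.4f}, r_ah = {r_ah(tau):.4f}")
```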
27 Vaidya Metrics II: Radial Null and Timelike Geodesics
In order to improve our understanding of the physics of the Vaidya metrics, and to
explore their causal structure, in this section we now look at radial null and timelike
geodesics and their properties.
$$\ddot v + \frac{m(v)}{r^2}\dot v^2 = 0 , \qquad \ddot r + \frac{m'(v)}{r}\dot v^2 = 0 \qquad (27.2)$$
(the null condition (27.1) has been used to put the r-equation into this simple form) or
by appropriate first integrals of these equations arising from conserved charges. These
are available only for special choices of m(v). There are two cases (I am aware of),
namely
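Continuing for now with the general (but not generalised) Vaidya metric, ingoing radial
null geodesics satisfy $dv = 0$ or $\dot v = 0$, and thus from (27.2)
$$\ddot r = 0 \;\Rightarrow\; r(\tau) = r_0 + \dot r_0\,\tau . \qquad (27.3)$$
The tangent vector $\dot x^\alpha = dx^\alpha/d\tau$ is future-oriented, i.e. such that its scalar product
with $\partial_t = \partial_v$ is negative, for $\dot r_0 < 0$,
$$g_{\alpha\beta}\dot x^\alpha(\partial_t)^\beta = g_{\alpha v}\dot x^\alpha = \dot r = \dot r_0 \stackrel{!}{<} 0 , \qquad (27.4)$$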
so that indeed the radius decreases along future-oriented ingoing null geodesics.
Moreover, since $r$ is affinely related to $\tau$, $r$ is an affine parameter along these ingoing null
geodesics. In particular, this means that the ingoing null vector field $n = -\partial_r$ introduced
in (25.49) is affinely parametrised, i.e. satisfies
$$n^\alpha\nabla_\alpha n^\beta = 0 , \qquad (27.5)$$
as we had already seen in (26.27) in the examples of section 26.1.
By contrast, the outgoing null vector field ℓ of (25.49), chosen to satisfy ℓα nα = −1, is
then not affinely parametrised. Indeed, using
and noting (e.g. from the v-equation in (27.2)) that the only non-trivial Christoffel
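symbol $\Gamma^v_{\alpha\beta}$ is $\Gamma^v_{vv} = m(v)/r^2$, one deduces the inaffinity of $\ell^\alpha$,
$$\ell^\alpha\nabla_\alpha\ell^\beta = \frac{m(v)}{r^2}\ell^\beta \equiv \kappa_\ell\,\ell^\beta . \qquad (27.7)$$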
Affinely parametrised outgoing radial null geodesics, on the other hand, satisfy (27.2)
and
$$-f(v,r)\,dv + 2\,dr = 0 \;\Leftrightarrow\; -f(v,r)\dot v + 2\dot r = 0 \;\Leftrightarrow\; 2\frac{dr}{dv} = f(v,r) . \qquad (27.8)$$
Remarkably, with the help of the null condition (27.8) the non-linear second order
geodesic equations can be integrated to first-order differential equations.[86] As the
derivation is not given in that reference, we provide it here. First of all, we write
$$m'(v)\dot v = \dot m \qquad (27.9)$$
and use (27.8) to eliminate the remaining $\dot v$ from the $r$-equation in (27.2). Then one
finds
$$\begin{aligned} 0 = \ddot r + \frac{2\dot m}{r - 2m}\dot r &= \ddot r - \dot r\frac{d}{d\tau}\log(r - 2m) + \frac{\dot r^2}{r - 2m} \\ \Leftrightarrow\quad \frac{d}{d\tau}\log\bigl(\dot r/(r - 2m)\bigr) &= -\dot r/(r - 2m) \\ \Leftrightarrow\quad \dot r/(r - 2m) &= 1/(\tau - \tau_0) \end{aligned} \qquad (27.10)$$
or
$$\frac{dr}{d\tau} = \frac{r(\tau) - 2m(v(\tau))}{\tau - \tau_0} . \qquad (27.11)$$
From (27.8) one then also deduces
$$\frac{dv}{d\tau} = \frac{2r(\tau)}{\tau - \tau_0} , \qquad (27.12)$$
so that these are future-directed curves (increasing v) for τ > τ0 . Equations (27.11) and
(27.12) imply the outgoing null condition (27.8) and govern the behaviour of outgoing
lightrays in the Vaidya metric. Without loss of generality we can set τ0 = 0.
[86] R. Waugh, K. Lake, Backscattering radiation in the Vaidya metric near zero mass, Phys. Lett. B116
(1986) 154-156.
These 1st order equations allow us to rewrite the null geodesic equations (27.2) for
outgoing null geodesics in a way that will turn out to be useful later on: the geodesic
equation for r in (27.2) together with (27.12) leads to
$$\ddot r + \frac{4m'(v)}{\tau^2}\,r = 0 ; \qquad (27.13)$$
likewise, the geodesic equation for $v$ in (27.2) together with (27.12) leads to
$$\ddot v + \frac{4m(v)}{\tau^2} = 0 . \qquad (27.14)$$
These equations become particularly tractable (namely linear) in the case of a linear
mass function $m(v) = \mu v$ that we will study in detail later.
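The first-order system (27.11)-(27.12) can be cross-checked against the second-order geodesic equations (27.2). The plain-Python sketch below, for the linear mass function $m(v) = \mu v$ with a hypothetical $\mu = 0.05$, starts both systems at $\tau = 1$ (with $\tau_0 = 0$) on the same outgoing null ray ($v = 1$, $r = 5$), integrates them with a fixed-step RK4 scheme up to $\tau = 3$, and compares the results while monitoring the null condition (27.8).

```python
MU = 0.05   # hypothetical slope of the linear mass function m(v) = MU*v

def mass(v):
    return MU*v

def dmass(v):
    return MU

def second_order(tau, y):
    """Geodesic equations (27.2); y = [v, r, vdot, rdot]."""
    v, r, vd, rd = y
    return [vd, rd, -mass(v)/r**2*vd**2, -dmass(v)/r*vd**2]

def first_order(tau, y):
    """First-order system (27.11)-(27.12) with tau_0 = 0; y = [v, r]."""
    v, r = y
    return [2*r/tau, (r - 2*mass(v))/tau]

def rk4(rhs, tau, y, h, steps):
    """Fixed-step classical RK4 for a list-valued state."""
    for _ in range(steps):
        k1 = rhs(tau, y)
        k2 = rhs(tau + h/2, [yi + h/2*k for yi, k in zip(y, k1)])
        k3 = rhs(tau + h/2, [yi + h/2*k for yi, k in zip(y, k2)])
        k4 = rhs(tau + h, [yi + h*k for yi, k in zip(y, k3)])
        y = [yi + h/6*(a + 2*b + 2*c + d) for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        tau += h
    return tau, y

# Initial data at tau = 1; the derivatives follow from (27.11)-(27.12)
v0, r0 = 1.0, 5.0
y_init = [v0, r0, 2*r0, r0 - 2*mass(v0)]
_, (v2, r2, vd2, rd2) = rk4(second_order, 1.0, y_init, 1e-3, 2000)
_, (v1, r1) = rk4(first_order, 1.0, [v0, r0], 1e-3, 2000)

null_residual = -(1 - 2*mass(v2)/r2)*vd2 + 2*rd2   # null condition (27.8)
print(v2 - v1, r2 - r1, null_residual)
```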
As a warm-up exercise, a first application of the above results, and for comparison (and,
later on, matching) purposes, we first rederive the equations for outgoing lightrays and
for the horizon generators in the constant mass Schwarzschild case m(v) = m0 in ingoing
Eddington-Finkelstein coordinates before addressing the same problem in the dynamical
Vaidya context.
For the Schwarzschild metric one has $m'(v) = 0$ and (27.2) implies that $\ddot r = 0$ so that
$$\dot r = E \qquad (27.15)$$
For $\tau \to \tau_0$, these lightrays emerge from the past horizon $r = 2m_0$ and $v \to -\infty$.
This illustrates the past geodesic incompleteness of the ingoing Eddington-Finkelstein
coordinates.
A special case is the geodesic (or S 2 -family of geodesics) for E = 0, for which
Remarks:
1. The explicit solution (27.19) shows that the Kruskal coordinate
$$V = e^{v/4m_0} = \tau \qquad (27.20)$$
is an affine parameter along these null geodesics.
2. Note also, for comparison with the Vaidya metric, that (27.8) implies that the 2nd
derivative $d^2r/dv^2$ is
$$\frac{d^2r}{dv^2} = \frac{m_0}{2r^2}f(r) . \qquad (27.21)$$
For $r > 2m_0$ one has $f(r) > 0$. Thus $r(v)$ is convex and an initially outgoing
lightray, $dr/dv > 0$, will remain outgoing at all times. For $r < 2m_0$, on the other
hand, one has $f(r) < 0$ and therefore $dr/dv < 0$ and $d^2r/dv^2 < 0$. Thus $r(v)$ is
concave, moving towards smaller values of $r$, and will ultimately reach $r = 0$ at a
finite value of $v$.
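For completeness, (27.21) follows from (27.8) by the chain rule $d/dv = (dr/dv)\,d/dr$ along the ray. A short SymPy check, which also confirms the signs used in the argument above at one sample point on each side of $r = 2m_0$:

```python
import sympy as sp

r, m0 = sp.symbols('r m0', positive=True)
f = 1 - 2*m0/r
drdv = f/2                         # outgoing null condition (27.8) with m(v) = m0
d2rdv2 = sp.diff(drdv, r)*drdv     # chain rule along the ray: d/dv = (dr/dv) d/dr
print(sp.factor(d2rdv2))

outside = {r: 3*m0}   # r > 2m0: dr/dv > 0 and d2r/dv2 > 0 (convex, outgoing)
inside = {r: m0}      # r < 2m0: dr/dv < 0 and d2r/dv2 < 0 (concave, ingoing)
print(drdv.subs(outside), d2rdv2.subs(outside), drdv.subs(inside), d2rdv2.subs(inside))
```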
One can go through the same exercises for the outgoing Vaidya metric (25.12)
$$ ds^2 = -f(u,r)\,du^2 - 2\,du\,dr + r^2 d\Omega^2 \,, \qquad f(u,r) = 1 - \frac{2m(u)}{r} \,. \tag{27.22} $$
Here we now assume that the mass function m(u) is non-negative (m(u) ≥ 0) and
non-increasing (m′ (u) ≤ 0).
Outgoing null geodesics are given by u = const., and r is an affine parameter along
these outgoing null geodesics.
$$ \ddot r - \frac{m'(u)}{r}\,\dot u^2 = 0 \,, \qquad \ddot u - \frac{m(u)}{r^2}\,\dot u^2 = 0 \tag{27.24} $$
(note the sign flip relative to (27.2) in both equations). Again, these can be integrated
to first order equations, and the counterpart of (27.11) and (27.12) is
$$ \dot r = \frac{r - 2m(u)}{\tau} \,, \qquad \dot u = -\frac{2r}{\tau} \tag{27.25} $$
(a sign flip only in the 2nd equation, and we have set an integration constant τ0 to
zero). We have ṙ < 0 and u̇ > 0 for r > 2m and τ < 0, the restriction on the range of τ
already suggesting a potential future geodesic incompleteness of this coordinate system.
We will come back to this issue below.
With the help of (27.25), the equations (27.24) can be put into the form
In order to improve our understanding of the Vaidya geometry, we will now relate the
data given by the geometry and matter content (metric and energy-momentum tensor)
to those measured by an observer. We will concentrate on the outgoing (radiating)
Vaidya metric, but of course the ingoing case can be discussed in complete analogy.87
$$ (x^\alpha)^\bullet = \frac{dx^\alpha(\tau_o)}{d\tau_o} = (u^\bullet, r^\bullet, 0, 0) \,. \tag{27.27} $$
so that
$$ g_{\alpha\beta}\,(x^\alpha)^\bullet (x^\beta)^\bullet = -1 \quad\Leftrightarrow\quad 2u^\bullet r^\bullet + f(u,r)\,(u^\bullet)^2 = +1 \,. \tag{27.28} $$
This implies, in particular, that u• ≠ 0, and we will take future-oriented paths to mean
u• > 0. We will also mostly be interested in observers that travel (or at least start off)
in the region r > 2m(u) outside the (unphysical) past apparent horizon.
Introducing the quantity
$$ E_u = f\,u^\bullet + r^\bullet \tag{27.29} $$
(the rationale for this notation will be explained in section 27.4), one can solve (27.28)
for u•,
$$ u^\bullet = (E_u + r^\bullet)^{-1} \tag{27.30} $$
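That (27.30) solves (27.28), and that it implies (Eu)² − (r•)² = f as used in (27.46), is a two-line computation. Below is a sympy sketch that treats f, r• and Eu as independent symbols:

```python
import sympy as sp

f, rdot, E = sp.symbols('f rdot E_u', real=True)

# (27.30): u-dot as a function of E_u and r-dot
udot = 1/(E + rdot)

# (27.29), E_u = f*udot + rdot, solved for f
f_of_E = sp.solve(sp.Eq(E, f*udot + rdot), f)[0]

# The normalisation (27.28), 2 udot rdot + f udot^2 = 1, is then automatic
norm = sp.simplify(2*udot*rdot + f_of_E*udot**2)

print(norm)                  # 1
print(sp.expand(f_of_E))     # should equal E_u**2 - rdot**2, i.e. (27.46)
```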
For f(u,r) = 1 − 2m(u)/r with m(u) bounded one has an asymptotically flat metric in
the (crude) sense that f → 1 for r → ∞. In that case, it follows from the Vaidya line
element that the proper time for an observer at rest at infinity is simply dτ∞ = du. Thus
87
This discussion is adapted (and expanded) from R. Lindquist, R. Schwartz, C. Misner, Vaidya’s
Radiating Schwarzschild Metric, Phys. Rev. 137 (1965) 5B, B1364-B1368.
(27.30), u• = du/dτo = dτ∞/dτo, can be interpreted as the relation between the
observer's proper time and the proper time at infinity,
$$ d\tau_\infty = (E_u + r^\bullet)^{-1}\, d\tau_o \,. \tag{27.32} $$
As usual, this formula will be related to that for gravitational redshift to be discussed
below.
We will also use them in section 27.4 to take a more detailed look at, and interpret,
the equations for timelike geodesics, and in section 27.6 to locate and detect potential
surfaces of infinite redshift and discuss the issues that arise in relation to them.
1. The null wave vector k^α of outgoing lightrays, in particular of the outgoing
radiation due to the energy-momentum tensor, will be proportional to the affinely
parametrised outgoing null vector ℓ^α = (∂_r)^α (25.54). The frequency ωo of this
lightray as determined by the timelike observer will essentially be given by the
projection of the wave vector k^α onto the observer's rest-frame, namely
We can also equivalently phrase this in terms of the observer emitting outgoing
lightrays. Since outgoing lightrays travel along lines of constant u, signals with
initial separation ∆u are received at infinity with the same separation ∆u. The
different perceived frequencies are due to the differences in proper time, and this
explains the equivalence between (27.32) and (27.33).
2. It follows from (25.53) and (25.57) that the (null) energy density of the Vaidya
metric is
$$ \rho_{\rm out} = -\frac{m'(u)}{4\pi G_N r^2} \,, \tag{27.34} $$
so that the total flux through the sphere of radius r is independent of r and given
by
$$ F = 4\pi r^2 \rho_{\rm out} = -m'(u)/G_N \,, \tag{27.35} $$
and likewise
$$ F = m'(v)/G_N \tag{27.36} $$
for the ingoing Vaidya metric. Here the convention has been chosen that the flux
F ∼ εm′(w) (with ε = −1, w = u for outgoing and ε = +1, w = v for ingoing radiation)
is positive for both ingoing and outgoing radiation.
However, an observer will not necessarily detect this static energy density and flux.
The same reasoning as above shows that the energy density of the outgoing radiation
provided by the background energy-momentum tensor, as measured by this observer
in his rest-frame, is
$$ \rho_o = T_{\alpha\beta}(x^\alpha)^\bullet (x^\beta)^\bullet = \rho_{\rm out}\,(\ell_\alpha (x^\alpha)^\bullet)^2 = \rho_{\rm out}\,(u^\bullet)^2 \tag{27.37} $$
or
$$ \rho_o = -\frac{m'(u)}{4\pi G_N r^2}\,(u^\bullet)^2 \,. \tag{27.38} $$
Thus ρo can be written in terms of the observer's radial velocity r• as
$$ \rho_o = -\frac{m'(u)}{4\pi G_N r^2}\,(E_u + r^\bullet)^{-2} \,. \tag{27.39} $$
For an observer at rest at infinity one thus finds the total flux or luminosity
$$ F_\infty = \lim_{r^\bullet \to 0}\, \lim_{r\to\infty} 4\pi r^2 \rho_o = -m'(u)/G_N = F \tag{27.40} $$
by
$$ F_o = F_\infty\,(u^\bullet)^2 = F_\infty\,(E_u + r^\bullet)^{-2} \,. \tag{27.42} $$
As for the Hubble distance - redshift relation, the double redshift factor is due to
(a) the redshift of the energy (e.g. of each individual photon) as it moves outwards
and (b) the dilation of the time-interval over which the energy is emitted (e.g. of
the rate at which photons are emitted).
$$ E_u = -\frac{\partial L}{\partial u^\bullet} = f\,u^\bullet + r^\bullet \,, \tag{27.45} $$
thus justifying the notation / abbreviation introduced in (27.29). Equation (27.31),
$$ (r^\bullet)^2 + f(u,r) = (E_u)^2 \,, \tag{27.46} $$
then has the usual interpretation of a one-dimensional effective potential equation for
the radial dynamics. The main difference from the stationary case is that here Eu is not
conserved (nevertheless it provides a first integral of the 2nd order equations of motion).
For a solution of the Euler-Lagrange equations, one can determine the time-dependence
of Eu from
$$ E_u^\bullet = -\frac{\partial L}{\partial u} = -\frac{m'(u)}{r}\,(u^\bullet)^2 \,. \tag{27.47} $$
Using (27.41), this can also be written in the intuitively attractive form
$$ E_u^\bullet = \frac{G_N F_o}{r} \,, \tag{27.48} $$
stating that the change in energy of the particle is due to the energy flux (luminosity)
of the background energy-momentum tensor.
The radial equation of motion turns out to take the remarkably simple form
$$ r^{\bullet\bullet} = -\frac{G_N F_o}{r} - \frac{m(u)}{r^2} \,. \tag{27.49} $$
It is best obtained not by differentiating (27.46) (and dividing by r•, assuming that
one is not dealing with circular paths), since the resulting equation requires a bit of
rearrangement to be put into the form (27.49), but rather as the Euler-Lagrange equation
for r, rewritten using (27.30). Indeed, the Euler-Lagrange equation reads
$$ \frac{d}{d\tau_o}\frac{\partial L}{\partial r^\bullet} = \frac{\partial L}{\partial r} \quad\Leftrightarrow\quad u^{\bullet\bullet} = \frac{m(u)}{r^2}\,(u^\bullet)^2 \tag{27.50} $$
while
$$ -\frac{u^{\bullet\bullet}}{(u^\bullet)^2} = \left(\frac{1}{u^\bullet}\right)^\bullet = (r^\bullet + E_u)^\bullet \,. \tag{27.51} $$
Using (27.48) and (27.51), (27.50) can be written in the form (27.49).
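Equation (27.49) can be sanity-checked numerically. If it is consistent with (27.50), integrating the two together must preserve the normalisation (27.28). Then (27.46) holds along the path, and Eu can only grow when m′(u) ≤ 0.

The sketch below makes the following choices:

- It assumes a hypothetical smooth, non-increasing mass profile m(u) = 1 + (1 − tanh u)/2, in units G_N = c = 1.
- The initial data are arbitrary.
- It uses SciPy's solve_ivp.

```python
import numpy as np
from scipy.integrate import solve_ivp


def m(u):
    """Hypothetical non-increasing mass profile: m decreases from 2 to 1 around u = 0."""
    return 1.0 + 0.5*(1.0 - np.tanh(u))


def dm(u):
    """m'(u) <= 0 for the profile above."""
    return -0.5/np.cosh(u)**2


def rhs(tau, y):
    u, r, ud, rd = y
    udd = m(u)/r**2*ud**2            # (27.50)
    GF_o = -dm(u)*ud**2              # G_N F_o = -m'(u) (u^.)^2, cf. (27.47), (27.48)
    rdd = -GF_o/r - m(u)/r**2        # (27.49)
    return [ud, rd, udd, rdd]


# Start at rest (r^. = 0) at r = 10, u = -2; u^. then follows from (27.28)
u0, r0, rd0 = -2.0, 10.0, 0.0
ud0 = 1.0/np.sqrt(1.0 - 2.0*m(u0)/r0)

taus = np.linspace(0.0, 10.0, 201)
sol = solve_ivp(rhs, (0.0, 10.0), [u0, r0, ud0, rd0], t_eval=taus, rtol=1e-10, atol=1e-12)

u, r, ud, rd = sol.y
f = 1.0 - 2.0*m(u)/r
norm = f*ud**2 + 2.0*ud*rd            # should stay equal to 1, cf. (27.28)
E_u = f*ud + rd                       # (27.29)

print(np.max(np.abs(norm - 1.0)))             # small: (27.49) preserves the normalisation
print(np.max(np.abs(E_u**2 - rd**2 - f)))      # small: (27.46) holds along the path
print(np.all(np.diff(E_u) >= -1e-9))           # E_u never decreases, since m' <= 0
```

If the sign of the flux term in (27.49) is flipped, the normalisation drifts visibly. That is what makes this a useful check.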
Remarks:
1. This shows that the energy flux gives a new non-Newtonian long-range contribution
to the gravitational force, induced by the varying Newtonian term m(u)/r².
2. The new force term −G_N Fo/r potentially dominates at large distances. The
critical radius at which the new term begins to dominate is at r ∼ m/Fo (factors
of c suppressed), which is estimated by Lindquist et al to be totally irrelevant
for solar system dynamics, but might play a role in highly relativistic phases of
gravitational collapse.
3. If one considers non-radial motion, with conserved angular momentum L, then the
additional terms in the radial equation of motion are just the standard angular
momentum barrier term ∼ L²/r³ and the standard general relativistic correction
term ∼ m(u)L²/r⁴.
27.5 Future Incompleteness of Outgoing Eddington-Finkelstein Coordinates
We will now look at some issues related to the potential future incompleteness of the out-
going coordinate system at u = +∞ for the general outgoing Vaidya metric. We could
have analogously discussed the potential past-incompleteness of the ingoing coordinate
system at v = −∞, but past horizons are generally considered to be less physically
relevant than future horizons, which can form in the process of gravitational collapse,
and thus we focus on the outgoing case.
To set the stage for the subsequent discussions, and to remind ourselves of the basic
properties of outgoing coordinates, we will briefly recall the solutions for ingoing null
geodesics in these coordinates for the two special cases m(u) = 0 (Minkowski space)
and m(u) = m0 constant (Schwarzschild in outgoing Eddington-Finkelstein coordinates
that cover the Schwarzschild patch as well as the past white hole region).
1. For m(u) = 0 one has Minkowski space-time and the coordinate u is related to the
usual Minkowski coordinates (t, r) by u = t − r. The 1st order geodesic equations
(27.25) are simply
$$ \dot r = r/\tau \,, \qquad \dot u = -2r/\tau \,, \tag{27.52} $$
and therefore
The integration constant c can be identified with (and used to construct) the
ingoing lightcone coordinate v = t + r from the outgoing lightcone coordinate
u = t − r. Indeed, prior to the extension to τ > 0 (i.e. for geodesics that are
ingoing) one has
$$ c = u - 2E\tau = u + 2r = t + r = v \,, \tag{27.55} $$
so that ingoing null geodesics are lines of constant v.
2. For m(u) = m0 > 0 a positive constant one has the Schwarzschild geometry, and
the coordinate u is related to the usual Schwarzschild time coordinate t and the
tortoise coordinate r*,
$$ r^* = r + 2m_0 \log|r - 2m_0| \,, \tag{27.56} $$
by u = t − r*. The 1st order geodesic equations (27.25) are simply
which integrate to
$$ r(\tau) = 2m_0 - E\tau \,, \qquad u(\tau) = -4m_0 \log|\tau| + 2E\tau + c \qquad (-\infty < \tau < 0) \,. \tag{27.58} $$
In this case, the situation is quite different. As τ → 0− , one has r → 2m0 and
u → +∞. This is an infinite redshift surface and the future event horizon (points
on the past horizon correspond to r = 2m and u finite). Thus the space-time and
the coordinates need to be extended beyond u = +∞.
In the present case this is easily done by noting that the integration constant c
is (up to other constants) equal to the ingoing Eddington-Finkelstein coordinate
v = t + r ∗ . Indeed,
Thus ingoing radial lightrays are lines of constant v and the ingoing Eddington-
Finkelstein coordinates (v, r) cover the Schwarzschild patch as well as its future
extension beyond the future horizon (now located at r = 2m with v finite, v =
−∞ corresponding to the past horizon at which the newly constructed ingoing
Eddington-Finkelstein coordinates (v, r) are incomplete).
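The claim that the integration constant c in (27.58) is, up to a constant, the ingoing coordinate v = t + r* can be verified symbolically. In this sympy sketch:

- τ = −s with s > 0, so that all logarithms have positive arguments.
- v = u + 2r* follows from u = t − r* and v = t + r*.
- r* is taken in the form (27.56).

```python
import sympy as sp

s = sp.symbols('s', positive=True)         # s = -tau > 0, since -oo < tau < 0
E, m0, c = sp.symbols('E m_0 c', positive=True)
tau = -s


def d_dtau(expr):
    return -sp.diff(expr, s)               # d/dtau = -d/ds


# (27.58) written in terms of s = |tau|
r = 2*m0 - E*tau                           # = 2 m0 + E s
u = -4*m0*sp.log(s) + 2*E*tau + c          # = -4 m0 log|tau| + 2 E tau + c

# (27.58) solves the first-order equations (27.25) with m(u) = m0
res_r = sp.simplify(d_dtau(r) - (r - 2*m0)/tau)
res_u = sp.simplify(d_dtau(u) + 2*r/tau)

# Ingoing Eddington-Finkelstein coordinate v = t + r* = u + 2 r*, with r* from (27.56)
r_star = r + 2*m0*sp.log(r - 2*m0)
v = u + 2*r_star

dv_dtau = sp.simplify(d_dtau(v))                 # 0: v is constant along the ray
offset = sp.simplify(sp.expand_log(v - c))       # v - c is a constant: 4 m0 (1 + log E)

print(res_r, res_u, dv_dtau, offset)
```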
In section 27.3, we determined the gravitational redshift in the outgoing Vaidya
geometry. We now try to determine and locate possible surfaces of infinite redshift. To
that end, note that it follows from (27.32) that infinite time dilation or infinite
gravitational redshift occurs when Eu + r• → 0. It then follows from (27.31) that a
necessary condition for this to occur is that f(u,r) → 0, i.e. r → 2m(u) (from “above”).
Since f > 0 for r > 2m(u), and u• > 0 for a future-oriented path, it then follows from
(27.29) that
$$ E_u + r^\bullet = f\,u^\bullet + 2r^\bullet \to 0 \quad\Rightarrow\quad r^\bullet < 0 \,. \tag{27.61} $$
It is easy to see that this cannot occur at finite u:
which rules out a negative r•. Indeed, r = 2m(u) with u finite is exactly like
the past event horizon for the Schwarzschild metric which can only be crossed
along future-directed paths in the direction of increasing r (and in section 26.3 we
already identified this surface concretely as what is known as the past apparent or
trapping horizon of the outgoing Vaidya metric).
Thus there is an infinite redshift surface (of a freely falling observer relative to a
stationary observer) at r = 2m(u = ∞).
We will now show that, as in the Schwarzschild case, this infinite redshift surface is
at finite affine distance, i.e. that this surface can be reached in finite proper time (for
timelike geodesic observers) or affine parameter (for ingoing lightrays). This means
that the outgoing Vaidya coordinates are (like their Schwarzschild counterparts, the
outgoing Eddington-Finkelstein coordinates) future-incomplete and that the space-time
needs to be extended beyond this infinite redshift surface (an issue we will briefly turn
to afterwards).
To that end we use the 2nd of the equations (27.25), which we integrate to
$$ \dot u = -\frac{2r}{\tau} \;\Rightarrow\; \frac{d\tau}{\tau} = -\frac{du}{2r(u)} \;\Rightarrow\; \log|\tau| = -\int \frac{du}{2r(u)} \,, \tag{27.64} $$
or
$$ |\tau| = e^{-\int du/2r(u)} \,. \tag{27.65} $$
As r(u) → 2m(u), the leading term in this integral is
$$ -4\log|\tau| = \int \frac{du}{m(u)} + \ldots \tag{27.66} $$
It follows from (27.66) that r reaches 2m(u) at the finite value τ = 0 of the affine
parameter (selected by the choice of integration constant) iff the integral ∫du/m(u)
diverges for large u, i.e., roughly speaking, iff m(u) grows no faster than linearly at
large u. Since we are only considering non-increasing m(u) anyway, this shows that for
any non-increasing function m(u) that does not vanish identically for all u ≥ u0 for some
u0 (in that case the previous reasoning leading to the conclusion that r → 2m(u) requires
u → ∞ does not apply), the surface of infinite redshift r = 2m(∞) is at finite affine
distance.
1. Consider first the case when m(u) is bounded away from zero, i.e. when one has
In this case one can reduce the argument to that for the Schwarzschild metric with
constant mass m0. Indeed, in this case the integral (27.66) is
$$ -4\log|\tau| = \int \frac{du}{m(u)} \le \int \frac{du}{m_0} = u/m_0 \tag{27.68} $$
or
$$ u \ge -4m_0 \log|\tau| \,. \tag{27.69} $$
Thus one reproduces the Schwarzschild result (with inequality because the mass
was at least as big as m0 ), with the conclusion
2. This does not yet show what happens in the case where m(u) → 0 asymptotically
for u → ∞. A priori it is conceivable that whether one finds the Minkowski space
behaviour (u → ∞ only as τ → ∞, no need for completion) or the Schwarzschild
behaviour (u → ∞ for finite τ , completion required) depends on the rate at which
m(u) → 0.
If one assumes that for large u the mass function m(u) behaves like m(u) ∼ u^{−a}
for some a > 0, say, or goes to zero exponentially, then for large u the integral in
(27.66) gives
$$ m(u) \sim u^{-a} \;\Rightarrow\; -\log|\tau| \sim u^{a+1} \,, \qquad m(u) \sim e^{-au} \;\Rightarrow\; -\log|\tau| \sim e^{au} \,, \tag{27.71} $$
so this still implies u → ∞ for τ → 0− for any a > 0 (actually for any a > −1 in
the power-law case), in agreement with the general argument.
Thus we find that quite generally r = 2m(u = ∞) behaves exactly like the future
horizon of a static Schwarzschild black hole, which also “sits at” t = +∞, where t is
the Schwarzschild time, equivalently (up to some constant factor) the proper time of a
non-geodesic stationary observer, or at u = +∞, where u is the retarded Eddington-
Finkelstein coordinate.
As we have seen above, the outgoing Vaidya coordinates are future-incomplete and one
needs to future-extend the space-time beyond u = +∞. This issue was first pointed
out by Lindquist et al (footnote 87) who stated, however, that they were unable to find
an extension. Indeed, this issue turns out to be far from trivial and is not resolved in
general.
As regards this issue of future incompleteness and future extension, the two special
cases recalled in section 27.5 above should be prototypical in the sense that for the
non-increasing non-negative mass function m(u) of the outgoing Vaidya metric one only
has the following 4 options:
2. m(u) decreases to a positive constant value m0 > 0 at some finite value u = u0
(and then remains constant);
1. In the first case, for u ≥ u0 the metric is just the Minkowski space in outgoing
lightcone coordinates and the extension of the spacetime should not be an issue.
However, since this means that r = 2m(u = ∞) → 0, the singularity of the
metric at r = 0 is potentially dangerous. And indeed it is shown by Waugh
and Lake (footnote 86) that backscattered light emitted towards smaller values
of r is infinitely blueshifted for r → 0. This implies that the backreaction of the
backscattering cannot be ignored, and that classically the process of a black hole
or star radiating away all its mass to leave behind Minkowski space is unstable to
this backreaction.
2. In the second case, the metric is the Schwarzschild metric for u ≥ u0 and the
future extension of the metric is well known (and can e.g. be described by ingoing
Eddington-Finkelstein coordinates or by Kruskal-Szekeres coordinates).
3. In the third case, the infinite redshift surface recedes to r → 0, and the space-time
geometry is potentially singular there, either already prior to taking into account
backreaction or, as above, once backscattering is taken into account.
4. This leaves the fourth case as potentially the most interesting one to look at: there
is a surface of infinite redshift at the finite value r = 2m(u = ∞) of the radius, which
can be reached in finite proper time or affine parameter and behaves very much like a
future horizon.
• A future extension of the outgoing Vaidya metric in this case was first proposed
by W. Israel in 1967.88 It is based on a suitable generalisation of the remarkable
Israel coordinates for the Schwarzschild metric discussed in section 15.8.
• However, one of the problems, already realised and discussed by W. Israel, is that,
in order to extend the metric beyond u = ∞, one also has to extend the mass
function m(u) to that region, and it is not obvious (and in fact not true) that
88
W. Israel, Gravitational collapse of a radiating star, Physics Letters A24 (1967) 184.
there is a unique way of doing this. This issue was further discussed by Fayos et
al. who suggested a slight modification of the procedure proposed by Israel.89
• One could try to side-step this non-uniqueness issue by attempting to solve the
Einstein equations directly in Kruskal-like double-null coordinates, but this generally
leads to equations that cannot be solved analytically.90
89
F. Fayos, M. Martin-Prats, J. Senovilla, On the extension of Vaidya and Vaidya-Reissner-Nordström
spacetimes, Class. Quant. Grav. 12 (1995) 2565-2576.
90
See e.g. B. Waugh, K. Lake, Double-null coordinates for the Vaidya metric, Phys. Rev. D34 (1986)
2978-2984, and F. Girotto, A. Saa, Semi-analytical approach for the Vaidya metric in double-null coor-
dinates, arXiv:gr-qc/0406067.
28 Vaidya Metrics III: Linear Mass m(v) = µv (a case study)
In the following, in order to illustrate some of the properties of Vaidya metrics discussed
in the previous sections, we will focus mainly on the ingoing Vaidya metric (25.2), and
in particular on the case where the mass function is a linear function of v, m(v) =
µv. This tractable example already displays a rich and intricate structure (with a
subtle dependence on the value of the mass parameter µ), and gives a good idea of the
complexity of the properties of the general Vaidya metric.
m(v) = µv . (28.1)
In order to avoid unphysical negative masses, we will only consider this space-time for
v ≥ 0 and glue it to empty Minkowski space with metric ds² = −dv² + 2dv dr + r²dΩ²
at v = 0. Thus we are considering the Vaidya metric
with
$$ f(v,r) = 1 - \frac{2m(v)}{r} \,, \qquad m(v) = \begin{cases} 0 & v \le 0 \\ \mu v & v \ge 0 \end{cases} \tag{28.3} $$
Evidently this is a rather unphysical metric for v → ∞ (the mass tending to infinity),
and we will rectify this later on by glueing on a Schwarzschild metric at some time
v0 > 0 (with constant mass m0 = µv0 ).
In order to determine the event horizon, we will now first determine the outgoing
lightrays in the linear mass Vaidya metric. To that end, we need to solve the 2nd order null
geodesic equations (27.2) or the 1st order equations (27.11) and (27.12). Remarkably,
both sets of equations simplify tremendously in the linear mass case m(v) = µv (and we
will see later on that this can be attributed to an additional symmetry of the problem
arising from a homothety of the metric):
1. For a linear mass function, and written in terms of the new (non-affine) parameter
t with τ = exp t, the 1st order equations (27.11) and (27.12) are simply a coupled
system of linear homogeneous differential equations with constant coefficients,
namely
$$ \left.\begin{aligned} dr/dt &= r - 2\mu v \\ dv/dt &= 2r \end{aligned}\right\} \quad\Leftrightarrow\quad \frac{d}{dt}\begin{pmatrix} r \\ v \end{pmatrix} = \begin{pmatrix} 1 & -2\mu \\ 2 & 0 \end{pmatrix} \begin{pmatrix} r \\ v \end{pmatrix} \tag{28.4} $$
This system of equations can be solved in standard ways, essentially by diagonalising
and/or exponentiating the (2 × 2)-matrix appearing in the above equation.
2. Alternatively, one observes that for a linear mass function the non-linear coupled
2nd order null geodesic equations (27.2) reduce to two decoupled (and identical)
linear harmonic oscillator equations for r(τ) and v(τ) respectively. Indeed, note
that for a linear mass function m(v) = µv (27.13) and (27.14) reduce to
$$ \ddot r(\tau) + \frac{4\mu}{\tau^2}\, r(\tau) = 0 \,, \qquad \ddot v(\tau) + \frac{4\mu}{\tau^2}\, v(\tau) = 0 \,. \tag{28.5} $$
This is simply the equation of a time-dependent harmonic oscillator with the
special (scale-invariant) potential ∼ τ^{−2} and with the same frequency 4µ for both
r and v. It is straightforward to solve this equation for any µ.
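The two routes are related. The characteristic polynomial of the matrix in (28.4) is λ² − λ + 4µ, so its eigenvalues are exactly the exponents ω± of (28.7). The NumPy sketch below confirms this for three sample values of µ, one for each sign of ∆ = 1 − 16µ.

It also shows that at µ = 1/16 the matrix has only one eigenvector for the double eigenvalue 1/2. A Jordan block of this kind produces a t e^{t/2} = τ^{1/2} log τ term, the logarithm that appears in the µ = 1/16 solution.

```python
import numpy as np


def omega_pm(mu):
    """omega_pm from (28.7); a complex conjugate pair when mu > 1/16."""
    root = np.sqrt(complex(1.0 - 16.0*mu))
    return 0.5*(1.0 + root), 0.5*(1.0 - root)


def flow_matrix(mu):
    """Constant-coefficient matrix of the system (28.4)."""
    return np.array([[1.0, -2.0*mu],
                     [2.0, 0.0]])


results = {}
for mu in (3/64, 1/16, 5/64):        # Delta = 1/4, 0, -1/4
    eig = np.linalg.eigvals(flow_matrix(mu)).astype(complex)
    w = omega_pm(mu)
    # distance from each omega to the nearest eigenvalue
    results[mu] = max(min(abs(e - wk) for e in eig) for wk in w)
    print(mu, np.round(eig, 6), np.round(w, 6))

# At mu = 1/16 the double eigenvalue 1/2 has only one eigenvector (Jordan block)
rank = np.linalg.matrix_rank(flow_matrix(1/16) - 0.5*np.eye(2))
print(rank)   # 1
```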
Proceeding either way, one quickly learns that curiously the case µ = 1/16 is special and,
in fact, slightly more complicated (because of a resonance behaviour). In this section,
we will determine the outgoing null geodesics for all µ using the equations (28.5). An
alternative derivation based on the matrix equation (28.4) is given in the appendix
(section 28.6).
To proceed, we first observe that a priori the solutions to the 2nd order equations (28.5)
will involve 4 integration constants, but that (27.11) and (27.12) provide 2 relations
among them. In practice, it is then most convenient to determine the general solution
for r(τ ), and to then determine v(τ ) algebraically from r(τ ) and ṙ(τ ) using (27.11),
$$ \tau\dot r(\tau) = r(\tau) - 2\mu v(\tau) \quad\Leftrightarrow\quad v(\tau) = \frac{1}{2\mu}\left(r(\tau) - \tau\dot r(\tau)\right) . \tag{28.6} $$
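In order to solve (28.5), we write (reparametrise) the “frequency” 4µ in terms of a
parameter ω as
$$ 4\mu = \omega(1-\omega) \quad\Leftrightarrow\quad \omega_\pm = \tfrac12\left(1 \pm \sqrt{1 - 16\mu}\right) . \tag{28.7} $$
$$ \mu \ne \tfrac{1}{16}: \qquad r(\tau) = c_+ \tau^{\omega_+} + c_- \tau^{\omega_-} \,, \qquad v(\tau) = 2(c_+/\omega_+)\,\tau^{\omega_+} + 2(c_-/\omega_-)\,\tau^{\omega_-} \,. \tag{28.9} $$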
Depending on the sign of
$$ \Delta = 1 - 16\mu \tag{28.10} $$
there are two different sub-cases:
(a) For ∆ > 0, the two roots are real and positive.
(b) For ∆ < 0, the two roots are complex conjugates of each other,
$$ \omega_\pm = \tfrac12 \pm i\Omega \,, \qquad \Omega^2 = 4\mu - 1/4 > 0 \,. \tag{28.12} $$
In order for the solution to be real, the integration constants c± should then
also be complex conjugates of each other,
$$ c_\pm = c_1 \pm i c_2 \,. \tag{28.13} $$
Noting that
$$ \tau^{\omega_\pm} = \tau^{1/2} e^{\pm i\Omega \log\tau} \tag{28.14} $$
the general solution can then, if desired, be recast into a manifestly real (but
not necessarily more enlightening) form, now expressed in terms of real linear
combinations of τ^{1/2} cos(Ω log τ) and τ^{1/2} sin(Ω log τ).
$$ \mu = \tfrac{1}{16}: \qquad r(\tau) = c\,\tau^{1/2} + d\,\tau^{1/2}\log\tau \,, \qquad v(\tau) = 4r(\tau) - 8d\,\tau^{1/2} = (4c - 8d)\,\tau^{1/2} + 4d\,\tau^{1/2}\log\tau \,. \tag{28.15} $$
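The sympy sketch below checks that (28.9) and (28.15) solve (28.5) and are compatible with (28.6). To keep the exponents simple it uses two sample values of µ:

- µ = 3/64, for which ∆ = 1/4 and ω± = 3/4, 1/4.
- µ = 5/64, for which ∆ = −1/4 and ω± = 1/2 ± i/4.

It substitutes arbitrary numerical values for the integration constants and evaluates the residuals at a few sample points.

```python
import sympy as sp

tau = sp.symbols('tau', positive=True)
cp, cm, c, d = sp.symbols('c_p c_m c d')


def ode_residual(x, mu):
    """Left-hand side of (28.5): x'' + 4 mu x / tau^2."""
    return sp.diff(x, tau, 2) + 4*mu*x/tau**2


def v_from_r(r, mu):
    """v(tau) obtained from r(tau) via (28.6)."""
    return (r - tau*sp.diff(r, tau))/(2*mu)


def max_abs(expr):
    """Largest |expr| at a few sample points (arbitrary constants substituted)."""
    consts = {cp: 0.7, cm: -1.3, c: 0.4, d: 0.9}
    return max(abs(complex(expr.subs(consts).subs(tau, t).evalf())) for t in (0.3, 1.7, 5.0))


checks = {}
for mu in (sp.Rational(3, 64), sp.Rational(5, 64)):       # Delta = 1/4 and Delta = -1/4
    s = sp.sqrt(1 - 16*mu)
    wp, wm = (1 + s)/2, (1 - s)/2                          # (28.7)
    r = cp*tau**wp + cm*tau**wm                            # (28.9)
    v = 2*(cp/wp)*tau**wp + 2*(cm/wm)*tau**wm
    checks[mu] = [max_abs(ode_residual(r, mu)), max_abs(ode_residual(v, mu)),
                  max_abs(v_from_r(r, mu) - v)]

mu = sp.Rational(1, 16)                                    # resonant case (28.15)
r = c*sp.sqrt(tau) + d*sp.sqrt(tau)*sp.log(tau)
v = 4*r - 8*d*sp.sqrt(tau)
checks[mu] = [max_abs(ode_residual(r, mu)), max_abs(ode_residual(v, mu)),
              max_abs(v_from_r(r, mu) - v)]

for mu, res in checks.items():
    print(mu, res)        # all entries should be at round-off level
```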
It is clear from the above solutions (28.9) and (28.15) that the qualitative behaviour of
outgoing lightrays (and thus the lightcones) depends crucially on whether ∆ > 0, ∆ = 0
or ∆ < 0. We will make extensive use of these solutions below in order to determine the
horizon structure in the eternal Vaidya geometry and the Vaidya-glued-to-Schwarzschild
geometry, and I will just add some general qualitative comments here:
Remarks:
1. One noteworthy feature of the solutions (28.9) and (28.15) is that a simultaneous
scaling of (c+ , c− ) in (28.9) or (c, d) in (28.15) is simply equivalent to a scaling of
the coordinates (v, r). This fact reflects a scaling symmetry of the metric that we
will discuss below.
2. It turns out that, among all the above solutions, a special role will be played by
those null geodesics that are invariant under this scaling, i.e. which are such that
r and v are linearly related so that the geodesics are straight lines (d²r/dv² = 0)
in an (r, v)-diagram.
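From (27.8), which in the present (linear mass) case reads
$$ 2\frac{dr}{dv} = f(v,r) \quad\Leftrightarrow\quad \frac{dr}{dv} + \frac{\mu v}{r} = \frac12 \,, \tag{28.16} $$
one finds
$$ \frac{d^2r}{dv^2} = \frac{\mu}{r}\left(\frac{v}{2r}\left(1 - \frac{2\mu v}{r}\right) - 1\right) . \tag{28.17} $$
Thus r′′(v) = 0 along
$$ \frac{v}{2r}\left(1 - \frac{2\mu v}{r}\right) = 1 \quad\Leftrightarrow\quad r^2 - vr/2 + \mu v^2 = 0 \quad\Leftrightarrow\quad (r - v/4)^2 = (1 - 16\mu)(v/4)^2 \tag{28.18} $$
with solution
$$ r = \frac{v}{4}\left(1 \pm \sqrt{\Delta}\right) = \frac{\omega_\pm}{2}\, v \quad\Leftrightarrow\quad v = \frac{\omega_\mp}{2\mu}\, r \,. \tag{28.19} $$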
Again we see that the value µ = 1/16, ∆ = 0 plays a special role, as these lines
exist only for ∆ ≥ 0.
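The algebra leading from (28.16) to (28.17), (28.18) and (28.19) can be checked with sympy. The same computation also confirms the completed-square form of r′′(v) that is used in section 28.3. In this sketch the symbol x is introduced only for root finding.

```python
import sympy as sp

r, v, mu = sp.symbols('r v mu', positive=True)
x = sp.symbols('x')                                  # unrestricted symbol for root finding

drdv = sp.Rational(1, 2) - mu*v/r                    # (28.16)
d2rdv2 = sp.diff(drdv, v) + sp.diff(drdv, r)*drdv    # total v-derivative along r(v)

eq_28_17 = mu/r*(v/(2*r)*(1 - 2*mu*v/r) - 1)
eq_28_41 = -mu/r**3*((r - v/4)**2 + (mu - sp.Rational(1, 16))*v**2)

res_17 = sp.simplify(d2rdv2 - eq_28_17)
res_41 = sp.simplify(d2rdv2 - eq_28_41)

# Zeros of r''(v), cf. (28.18): x^2 - v x/2 + mu v^2 = 0
roots = sp.solve(x**2 - v*x/2 + mu*v**2, x)
s = sp.sqrt(1 - 16*mu)
expected = [v*(1 + s)/4, v*(1 - s)/4]                # omega_pm v / 2, cf. (28.19)

print(res_17, res_41)                                # 0 0
print(roots)
```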
3. For ∆ > 0 these are precisely the null geodesics with either c− = 0 or c+ = 0,
$$ \Delta > 0: \quad c_\mp = 0 \;\Rightarrow\; r(\tau) = \frac{\omega_\pm}{2}\, v(\tau) \,, \tag{28.20} $$
while for ∆ = 0 these are the lines with d = 0,
$$ \tau = e^{-2\lambda} \tag{28.22} $$
(so that scaling τ corresponds to shifting λ) and defining C = −2d, the solution
takes the form
This is the form of the solution given by Poisson.91 It is more convenient, however,
also for purposes of matching the Vaidya and Schwarzschild geodesics, to use the
affinely parametrised solution given in (28.15).
91
E. Poisson, A Relativist’s Toolkit: the Mathematics of Black Hole Mechanics, section 5.7.2.
28.2 Some Comments on Homotheties, Geodesics and Wronskians
We have seen in a number of different ways that Vaidya metrics with a linear mass
function have some special properties which hint at an underlying additional symmetry
in this case. In this section I collect some (not indispensable) comments related to this
symmetry, to homotheties, Wronskians and the like.
To discover and describe this additional symmetry, note that from the null geodesic
equations (27.2) one finds for a general mass function m(v)
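$$ \frac{d}{d\tau}\left(r\dot v - v\dot r\right) = -\frac{\dot v^2}{r}\left(m(v) - v\,m'(v)\right) . \tag{28.24} $$
Thus
$$ D := r\dot v - v\dot r \tag{28.25} $$
is a constant of motion for outgoing null geodesics iff m(v) = v∂_v m(v), i.e. iff m(v) = µv
is a linear function of v,
$$ \frac{d}{d\tau} D = 0 \quad\Leftrightarrow\quad m(v) = \mu v \,. \tag{28.26} $$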
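The derivative of D in (28.24) follows from the second-order equations alone. The sympy sketch below assumes (27.2) in the form r̈ = −(m′(v)/r)v̇², v̈ = −(m(v)/r²)v̇². It treats m(v) and m′(v) as independent symbols M and Mp at a given point.

With these conventions the right-hand side comes out as (v̇²/r)(v m′(v) − m(v)). In any case it vanishes identically exactly when m(v) = v m′(v).

```python
import sympy as sp

r, v, rd, vd, mu = sp.symbols('r v rdot vdot mu')
M, Mp = sp.symbols('M Mp')        # stand-ins for m(v) and m'(v) at the current point

# Assumed form of the radial null geodesic equations (27.2)
rdd = -Mp/r*vd**2
vdd = -M/r**2*vd**2

# d/dtau of D = r vdot - v rdot
dD = rd*vd + r*vdd - vd*rd - v*rdd

print(sp.factor(dD))                             # vdot**2 (v Mp - M) / r, up to ordering
print(sp.simplify(dD - vd**2/r*(v*Mp - M)))      # 0
print(dD.subs({M: mu*v, Mp: mu}))                # 0 for the linear mass m(v) = mu v
```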
This constant of motion can be understood and interpreted as the conserved charge
associated to the dilatation symmetry (homothety) of the Vaidya metric with a linear
mass function,
$$ (v, r) \to (\lambda v, \lambda r) \;\Rightarrow\; ds^2 = -(1 - 2\mu v/r)\,dv^2 + 2\,dv\,dr + r^2 d\Omega^2 \to \lambda^2 ds^2 \,. \tag{28.27} $$
and therefore (see the discussion in section 7.2, in particular equation (7.8)) leads to
the conserved charge
$$ D = g_{\alpha\beta}\, C^\alpha \dot x^\beta \tag{28.29} $$
Using the null condition (27.8), f (v, r)v̇ = 2ṙ, one then finds (28.25).
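Both statements, the homothety (28.27) and the form of the charge (28.29), can be checked directly. The sympy sketch below assumes the generator C = v∂_v + r∂_r of the scaling (28.27), which is the natural choice, although its defining equation is not displayed here.

The sketch does two things:

1. It computes the Lie derivative of the metric along C, which for a homothety equals 2g.
2. It evaluates g_αβ C^α ẋ^β on a radial path after imposing the null condition f v̇ = 2ṙ.

```python
import sympy as sp

v, r, th, ph, mu = sp.symbols('v r theta phi mu', positive=True)
vd, rd = sp.symbols('vdot rdot')
X = [v, r, th, ph]

f = 1 - 2*mu*v/r
g = sp.Matrix([[-f, 1, 0, 0],                     # ds^2 = -f dv^2 + 2 dv dr + r^2 dOmega^2
               [1, 0, 0, 0],
               [0, 0, r**2, 0],
               [0, 0, 0, r**2*sp.sin(th)**2]])
C = [v, r, 0, 0]                                  # dilatation generator C = v d_v + r d_r


def lie_derivative_metric(g, C, X):
    """(L_C g)_ab = C^c d_c g_ab + g_cb d_a C^c + g_ac d_b C^c."""
    n = len(X)
    L = sp.zeros(n, n)
    for a in range(n):
        for b in range(n):
            L[a, b] = (sum(C[c]*sp.diff(g[a, b], X[c]) for c in range(n))
                       + sum(g[c, b]*sp.diff(C[c], X[a]) for c in range(n))
                       + sum(g[a, c]*sp.diff(C[c], X[b]) for c in range(n)))
    return sp.simplify(L)


homothety = sp.simplify(lie_derivative_metric(g, C, X) - 2*g)
print(homothety)                                  # zero matrix: L_C g = 2 g

# Conserved charge (28.29) for a radial path, then impose the null condition f vdot = 2 rdot
xdot = [vd, rd, 0, 0]
D = sum(g[a, b]*C[a]*xdot[b] for a in range(4) for b in range(4))
D_null = D.subs(rd, f*vd/2)
print(sp.simplify(D_null - (r*vd - v*f*vd/2)))   # 0, i.e. D = r vdot - v rdot
```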
Remarks:
1. The scaling symmetry (28.27) implies that if (r(τ), v(τ)) is a solution to the
geodesic equations, then so is (λr(τ), λv(τ)), something that we had already
observed post facto based on the explicit solutions (28.9) and (28.15) of the null
geodesic equations. We see e.g. from (28.25) that this scaling is essentially
equivalent to changing the integration constant D by D → λ²D, and therefore the
freedom to choose the value of D reflects this scale invariance.
2. E.g. by calculating explicitly r v̇ − ṙv from the solutions (28.9) and (28.15), one
finds that the constant of motion D is related to the integration constants c± or
(c, d) by
$$ \mu \ne \tfrac{1}{16}: \quad D = -2\Delta\,\frac{c_+ c_-}{\omega_+\omega_-} \,, \qquad \mu = \tfrac{1}{16}: \quad D = 8d^2 \,. \tag{28.32} $$
From this one sees that the scale-invariant geodesics (28.20) (with c+ = 0 or
c− = 0) and (28.21) (with d = 0) are precisely the geodesics with D = 0 (which is
the only scale-invariant value under D → λ2 D).
The Wronskian D vanishes precisely when v(τ) and r(τ) are linearly dependent, i.e.
when they describe a straight line in an (r, v)-diagram, as we have indeed seen in
(28.20) and (28.21).
4. Inserting (27.11) and (27.12) into (28.25), one finds an algebraic relation between
r(τ ) and v(τ ), namely
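The values (28.32) and the algebraic relation of remark 4 can be checked on the explicit solutions. The sympy sketch below uses the sample value µ = 3/64, for which ∆ = 1/4. It assumes (27.11) and (27.12) in the form τṙ = r − 2µv, τv̇ = 2r.

With these forms, remark 4 gives Dτ = 2r² − rv + 2µv², or equivalently (r − v/4)² + (µ − 1/16)v² = Dτ/2. This is consistent with (28.43).

```python
import sympy as sp

tau = sp.symbols('tau', positive=True)
cp, cm, c, d = sp.symbols('c_p c_m c d')


def wronskian(r, v):
    """D = r vdot - v rdot, cf. (28.25)."""
    return sp.simplify(r*sp.diff(v, tau) - v*sp.diff(r, tau))


# Delta > 0 example: mu = 3/64, so Delta = 1/4, omega_+ = 3/4, omega_- = 1/4
mu = sp.Rational(3, 64)
Delta, wp, wm = 1 - 16*mu, sp.Rational(3, 4), sp.Rational(1, 4)
r = cp*tau**wp + cm*tau**wm                        # (28.9)
v = 2*(cp/wp)*tau**wp + 2*(cm/wm)*tau**wm
D = wronskian(r, v)
res_32 = sp.simplify(D + 2*Delta*cp*cm/(wp*wm))   # 0, cf. (28.32)

# Relation from inserting v-dot = 2r/tau and tau r-dot = r - 2 mu v into (28.25):
# D tau = 2 r^2 - r v + 2 mu v^2, i.e. (r - v/4)^2 + (mu - 1/16) v^2 = D tau / 2
res_alg = sp.simplify(D*tau - (2*r**2 - r*v + 2*mu*v**2))

# Resonant case mu = 1/16, (28.15)
r16 = c*sp.sqrt(tau) + d*sp.sqrt(tau)*sp.log(tau)
v16 = 4*r16 - 8*d*sp.sqrt(tau)
D16 = wronskian(r16, v16)

print(D, res_32, res_alg, D16)    # D = -8 c_+ c_- / 3, then 0, 0, and D16 = 8 d^2
```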
28.3 Event vs Apparent Horizons for m(v) = µv: Overview
With all this detailed knowledge of null geodesics in the linear mass Vaidya space-time
at our disposal, it is now straightforward to determine the causal properties of this
space-time (and subsequently of the space-time obtained by glueing Schwarzschild to
Vaidya). As in our discussion above, the sign of ∆ = 1 − 16µ and the value µ = 1/16
will turn out to play a special role.
The first and simplest exercise is to determine the apparent horizons and their geometry.
The apparent horizon (26.58) is the hypersurface
Thus in the linear mass case this is a straight line in an (r, v)-diagram. The induced
metric (26.69) on the apparent horizon is
$$ ds^2\big|_{f(v,r)=0} = 4m'(v)\,dv^2 + (2m(v))^2 d\Omega^2 = 4\mu\,dv^2 + 4\mu^2 v^2\, d\Omega^2 \,, \tag{28.38} $$
or, in terms of r,
$$ ds^2\big|_{f(v,r)=0} = \mu^{-1} dr^2 + r^2 d\Omega^2 \,. \tag{28.39} $$
Therefore the apparent horizon is spacelike for µ > 0. Radial outgoing null geodesics
reaching r = 2µv will attain their maximal radius there and then turn around to smaller
values of r.
The intrinsic geometry of the apparent horizon is thus manifestly flat for µ = 1, and
while the factor µ^{−1} may look harmless it actually leads to a non-trivial curvature tensor
for µ ≠ 1, with a curvature singularity at r = 0. Explicitly, one finds (see section 8.6,
in particular equations (8.51) and (8.53)) that e.g. the non-trivial components of the
Riemann tensor and the Ricci scalar are
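The curvature claim for (28.39) can be checked with a sympy sketch. It uses a small hand-written routine for Christoffel symbols and the Ricci scalar; the helper name `ricci_scalar` is hypothetical and introduced only here. The sign convention is the one that gives R = +2 for the unit 2-sphere.

The sketch also recovers the coefficient 4µ of dv² in (28.38) by pulling back 2 dv dr along r = 2µv. For (28.39) it returns R = 2(1 − µ)/r², which vanishes for µ = 1 and blows up at r = 0 otherwise.

```python
import sympy as sp


def ricci_scalar(g, X):
    """Ricci scalar of the metric g (sympy Matrix) in coordinates X, Levi-Civita connection."""
    n = len(X)
    ginv = sp.simplify(g.inv())
    # Christoffel symbols Gam[a][b][c] = Gamma^a_{bc}
    Gam = [[[sp.simplify(sum(ginv[a, e]*(sp.diff(g[e, b], X[c]) + sp.diff(g[e, c], X[b])
                                         - sp.diff(g[b, c], X[e])) for e in range(n))/2)
             for c in range(n)] for b in range(n)] for a in range(n)]
    R = 0
    for b in range(n):
        for c in range(n):
            # R_bc = d_a Gam^a_bc - d_c Gam^a_ab + Gam^a_ae Gam^e_bc - Gam^a_ce Gam^e_ab
            Ric_bc = sum(sp.diff(Gam[a][b][c], X[a]) - sp.diff(Gam[a][a][b], X[c])
                         + sum(Gam[a][a][e]*Gam[e][b][c] - Gam[a][c][e]*Gam[e][a][b]
                               for e in range(n))
                         for a in range(n))
            R += ginv[b, c]*Ric_bc
    return sp.simplify(R)


v, r, th, ph, mu = sp.symbols('v r theta phi mu', positive=True)

# Pull back -f dv^2 + 2 dv dr to r = 2 mu v (where f = 0), using dr = (dr/dv) dv, cf. (28.38)
r_ah = 2*mu*v
f_ah = (1 - 2*mu*v/r).subs(r, r_ah)
h_vv = sp.simplify(-f_ah + 2*sp.diff(r_ah, v))
print(h_vv)                                            # 4*mu

# Sanity check: the unit 2-sphere has R = 2
R_sphere = ricci_scalar(sp.diag(1, sp.sin(th)**2), [th, ph])

# Induced 3-metric (28.39): mu^{-1} dr^2 + r^2 dOmega^2
R_ah = ricci_scalar(sp.diag(1/mu, r**2, r**2*sp.sin(th)**2), [r, th, ph])
print(R_sphere, R_ah)                                  # 2, and 2(1 - mu)/r^2 up to rearrangement
```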
Further information is obtained from looking at the 2nd derivative d²r/dv², i.e. the 1st
derivative of f(v,r). From (28.17) one has
$$ \frac{d^2r}{dv^2} = -\frac{\mu}{r^3}\left[(r - v/4)^2 + (\mu - 1/16)\,v^2\right] , \tag{28.41} $$
and from (28.19) we know that r′′(v) = 0 along the scale-invariant null geodesics (28.20)
and (28.21) given by
$$ r = \frac{\omega_\pm}{2}\, v \,. \tag{28.42} $$
There are now three different cases to consider, depending on the sign of ∆.
There are no real roots for µ > 1/16 and from (28.41) one sees that r′′(v)
is strictly negative everywhere. It turns out that in this case no lightray will be
able to escape to infinity, i.e. there is no future lightlike infinity, and every lightray
ends up in the spacelike singularity at r = 0 in the future.
This does not follow just from the fact that d²r/dv² < 0. One has to show that all
outgoing lightrays eventually reach and subsequently cross the apparent horizon
at which dr/dv = 0. This can be established by using the explicit real form of
the solution, with its characteristic τ^{1/2}-modulated trigonometric dependence on
Ω log τ.
Thus in this case there is no event horizon (in fact, the entire space-time looks
like the interior of region II of the Kruskal extension of the Schwarzschild space-
time - this is clearly an artefact of having a mass that tends sufficiently rapidly to
infinity), and also the apparent horizon appears to be of no particular significance.
Note that it follows from (28.35) and (28.32) that for ∆ < 0 outgoing null geodesics
satisfy
$$ (r - v/4)^2 + (\mu - 1/16)\,v^2 = |c/\omega|^2(-\Delta)\,\tau \ge 0 \,, \tag{28.43} $$
absolute value. In particular if τi > 0 denotes the (initial) time at which v(τi) = 0
(we are discarding τi = 0 because for τ → 0 the functions τ^{1/2} cos(Ω log τ) and
τ^{1/2} sin(Ω log τ) are badly behaved), one has
This means that outgoing radial null geodesics start off at v = 0 at some positive
value of r. As one of these outgoing (families of) null geodesics will turn out
to become the event horizon for the metric obtained by combining Vaidya for
0 ≤ v ≤ v0 with Schwarzschild for v ≥ v0, this will lead to the conclusion that
in the case ∆ < 0 the singularity at r = 0 is hidden behind an event horizon (no
null geodesic emerging from r = 0 can escape to infinity).
2. µ = 1/16 or ∆ = 0
For µ = 1/16 one has a double root at r = v/4. One has d²r/dv² ≤ 0, with
equality only for the line (null hypersurface) r = v/4,
$$ \frac{d^2r}{dv^2} = 0 \quad\Rightarrow\quad r = v/4 \,. \tag{28.45} $$
This hypersurface is generated by the outgoing null geodesics with D = 0 (or
d = 0 in (28.15)).
It turns out that the behaviour of the other outgoing light rays depends strongly
on whether they are in the region r > v/4 or in the region r < v/4. If they are
in one of those regions initially, they always remain there, and for r > v/4 the
lightrays escape to infinity while for r < v/4 they reach a maximal radius at the
apparent horizon r = v/8 and then return to smaller values of r. These results
follow from the general solution (28.15) which we will analyse in a bit more detail
below.
Therefore the event horizon of the µ = 1/16 linear mass Vaidya black hole is the
scale-invariant null hypersurface r = v/4, outside the apparent horizon at r = v/8.
3. For 0 < µ < 1/16 there are two roots and a correspondingly richer phase diagram
for the behaviour of outgoing lightrays in the (r, v)-plane. The two hypersurfaces
r = r(v) along which r ′′ (v) = 0 are generated by the D = 0 null geodesics (28.42).
They divide the (r ≥ 0, v ≥ 0) quadrant into 3 wedges:
(a) For r > (ω+ /2)v one has r ′′ (v) < 0, and c+ > 0, c− < 0 (thus D > 0).
(b) For r = (ω+ /2)v one has r ′′ (v) = 0, and c+ > 0, c− = 0 (thus D = 0).
(c) For (ω− /2)v < r < (ω+ /2)v one has r ′′ (v) > 0, and c+ > 0, c− > 0 (thus
D < 0).
(d) For r = (ω− /2)v one has r ′′ (v) = 0, and c+ = 0, c− > 0 (thus D = 0).
(e) For r < (ω− /2)v one has r ′′ (v) < 0, and c+ < 0, c− > 0 (thus D > 0).
(which is always satisfied in the given range of µ as can be seen by squaring the
two positive sides of this inequality), the apparent horizon at r = 2µv lies in
the lowest wedge. This has the following implications for the behaviour of null
geodesics (that can also be checked from the explicit solution (28.9)):
• Null geodesics that start off in the lowest wedge cross the apparent horizon
at r = 2µv and then return to smaller values of r (and ultimately to r = 0 at
finite v), regardless of whether they were initially above the apparent horizon
(truly outgoing at that time) or below it.
• Null geodesics in the central region, including the two lines r = (ω±/2)v, start
off at r = 0 for v = 0 and reach r = ∞ for v = ∞, with r″(v) ≥ 0.
• Null geodesics in the region r > (ω₊/2)v have r(v = 0) > 0 and escape to
infinity with r″(v) < 0.
It follows that the lower line r = (ω− /2)v is the event horizon of the ∆ > 0 linear
mass Vaidya black hole. It again lies outside the apparent horizon.
28.4 Null Geodesics, Horizons and Singularities for µ = 1/16
In this section we take a brief but more detailed look at the special case µ = 1/16. The
solution for outgoing radial null geodesics is given in (28.15), which we now compactly
write as
$$r(\tau) = c\,\tau^{1/2} + d\,\tau^{1/2}\log\tau = v(\tau)/4 + 2d\,\tau^{1/2} \,. \tag{28.47}$$
The behaviour of these geodesics depends crucially on the sign of the constant d.
1. The solution with d = 0 is the special scale-invariant null geodesic r(τ) = v(τ)/4
(28.21) with
$$r(\tau) = v(\tau)/4 = c\,\tau^{1/2} \,. \tag{28.48}$$
This requires c > 0, and changing c by a constant positive factor is equivalent
to scaling τ . This thus describes the same geodesic, just with a different affine
parameter. This null geodesic emerges from r = 0 at v = 0 and τ = 0 and
evidently escapes to infinity as v → ∞ or τ → ∞.
2. For d ≠ 0 there are two distinct types of geodesics, depending on the sign of d,
namely those with r > v/4 at all times (d > 0) and those with r < v/4 at all
times (d < 0):
(a) For d > 0 one has r > v/4. The requirement v(τ) ≥ 0 (implying r(τ) ≥ 0 in
this case) leads to the condition
$$\tau \ge \tau_{\min} = e^{\,2 - c/d} : \qquad v(\tau_{\min}) = 0 \,, \quad r(\tau_{\min}) = 2d\,\tau_{\min}^{1/2} \,. \tag{28.49}$$
Monotonicity of the mass function (and the explicit solution) imply that these
will then not turn around again (and reach r = 0 at τ = τmax ).
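As a numerical sanity check on (28.47), here is a minimal Python sketch. It assumes, as in the alternative derivation via (28.74) with t = log τ, that outgoing radial null geodesics obey dr/dt = r − 2µv and dv/dt = 2r with µ = 1/16; all helper names are ours. It confirms by finite differences that (28.47) solves this system for either sign of d, and evaluates τ_min of (28.49) for d > 0.

```python
import math

MU = 1.0 / 16.0  # slope of the linear mass function m(v) = MU * v


def r_of(tau, c, d):
    """Radial coordinate along the geodesic (28.47)."""
    s = math.sqrt(tau)
    return c * s + d * s * math.log(tau)


def v_of(tau, c, d):
    """Advanced time from r = v/4 + 2 d tau^(1/2), i.e. v = 4 (r - 2 d tau^(1/2))."""
    return 4.0 * (r_of(tau, c, d) - 2.0 * d * math.sqrt(tau))


def residuals(tau, c, d, h=1e-5):
    """Residuals of dr/dt = r - 2 MU v and dv/dt = 2 r, with t = log(tau).

    Derivatives with respect to t are taken by central finite differences.
    """
    t = math.log(tau)
    tau_p, tau_m = math.exp(t + h), math.exp(t - h)
    dr_dt = (r_of(tau_p, c, d) - r_of(tau_m, c, d)) / (2.0 * h)
    dv_dt = (v_of(tau_p, c, d) - v_of(tau_m, c, d)) / (2.0 * h)
    r, v = r_of(tau, c, d), v_of(tau, c, d)
    return dr_dt - (r - 2.0 * MU * v), dv_dt - 2.0 * r


def tau_min(c, d):
    """For d > 0: v(tau_min) = 0 at log(tau_min) = 2 - c/d, cf. (28.49)."""
    return math.exp(2.0 - c / d)


for c, d in [(1.0, 0.5), (2.0, -0.3), (1.0, 0.0)]:
    print(c, d, [max(map(abs, residuals(tau, c, d))) for tau in (0.5, 2.0, 7.0)])
tm = tau_min(1.0, 0.5)
print(tm, v_of(tm, 1.0, 0.5), r_of(tm, 1.0, 0.5))  # v vanishes, r = 2 d tau_min^(1/2)
```

The sign of d shows up directly in r − v/4 = 2d τ^{1/2}, which is why the two families never cross the line r = v/4.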
Thus the null geodesic r = v/4 is the last null geodesic to (barely) escape to infinity,
and the event horizon is the null hypersurface r = v/4.
Remarks:
1. The event horizon at r = v/4 manifestly lies outside the apparent horizon r = v/8.
In particular, while the outgoing expansion (26.67)
$$\theta_\ell = \frac{r - 2m(v)}{r^2} = \frac{r - v/8}{r^2} \tag{28.55}$$
is zero on the apparent horizon (by our informal definition of the apparent horizon
in section 26.3), it is strictly positive on the event horizon,
$$\theta_\ell\big|_{v=4r} = \frac{1}{2r} > 0 \,. \tag{28.56}$$
2. The event horizon satisfies r(v = 0) = 0, so that it emerges from r = 0 at the time
v = 0. At this time v = 0 the “singularity” at r = 0 is massless, m(v = 0) = 0,
and one might be tempted to conclude, e.g. from the Kretschmann scalar (25.11),
which now has the form,
This becomes more manifest when one introduces, instead of r, the scale-invariant
coordinate x = v/r, say. Then one evidently has
4. The fact that there are lightrays (albeit only a single S²-family of lightrays) that can
escape from the singularity to infinity means that the singularity is not completely
hidden behind the event horizon, and is barely (marginally) naked. Such massless
singularities are considered to be relatively harmless, and therefore they are not
considered to be genuine counterexamples to the spirit of the (or an appropriately
formulated) cosmic censorship conjecture.
We now consider a slightly more realistic and interesting scenario in which there is an
ingoing shell of radiation (null dust) during a finite interval of time v, leaving behind a
Schwarzschild black hole. In equations this means that we consider the Vaidya metric
with
$$f(v,r) = 1 - \frac{2m(v)}{r} \,, \qquad m(v) = \begin{cases} 0 & v \le 0 \\ v/16 & 0 \le v \le v_0 \\ m_0 = v_0/16 & v \ge v_0 \end{cases} \tag{28.61}$$
Note that the mass function is continuous, but that its derivative m′(v) jumps at v = 0 and at v = v₀,
$$m'(v) = \tfrac{1}{16}\left[\theta(v) - \theta(v - v_0)\right] , \tag{28.62}$$
To return to the current setting, for v ≥ v0 the metric with f (v, r) given by (28.61) is
the Schwarzschild metric, with apparent horizon = event horizon at
$$v \ge v_0 : \qquad r_0 = 2m_0 = v_0/8 \,, \tag{28.63}$$
where we have chosen the freedom to scale τ so that v(τ = 1) = v0 . Thus this describes
the location of the horizon for τ ≥ 1.
For 0 ≤ v ≤ v₀, the apparent horizon is at r = v/8. This matches onto the apparent =
event horizon of the Schwarzschild black hole for v ≥ v₀.
In order to determine the event horizon of the total geometry, we now need to determine
the lightray for 0 ≤ v ≤ v₀ that matches onto the above event horizon of the Schwarzschild
geometry. Since r₀ = v₀/8 < v₀/4, it is clear that this is to be sought among the
geodesics given in (28.47) with d < 0,
$$r(\tau) = c\,\tau^{1/2} - |d|\,\tau^{1/2}\log\tau = v(\tau)/4 - 2|d|\,\tau^{1/2} \,. \tag{28.65}$$
Since
$$r(\tau = 1) = c \,, \qquad v(\tau = 1) = 4c + 8|d| \,, \tag{28.66}$$
Remarks:
1. Both r(τ) and v(τ) and their first derivatives are continuous at τ = 1. Note
that it is automatic that the null geodesic reaches its maximal radial value at
τ = 1, ṙ(τ = 1) = 0, because it crosses the apparent horizon r = v/8 there,
r(1) = r₀, v(1) = v₀ = 8r₀. As a consequence, the first derivatives of the two
pieces of r(τ) indeed match at τ = 1: both parts of the event horizon give ṙ(1) = 0.
The same is true for v̇(1); in both cases one has v̇(1) = 2r₀.
2. The explicit expression for the event horizon confirms that in the Vaidya region
it lies in the region between r = v/4 and the apparent horizon at r = v/8, thus
outside the apparent horizon. Explicitly, for the difference between the event
horizon and the apparent horizon (given at each time by r = v/8) one has
3. This last equation also gives us information about the rate of expansion of the
event horizon, given by (27.11) by
One might perhaps naively have expected the expansion rate of the horizon to
increase during a period of infalling matter, but this equation shows that quite
the opposite is true: during the period that matter falls in, the event horizon
of course grows, but its expansion rate decreases, the expansion stopping when
matter stops falling in at τ = 1. This is a general and somewhat counterintuitive
feature of event horizons, reflecting the non-locality of the event horizon (we also
observed this in the otherwise quite different example of event horizons in
the Oppenheimer-Snyder geometry of a collapsing star in section 26.6).
(a) all geodesics that satisfy r(v0 ) > r0 are outside the Schwarzschild horizon,
and thus describe outgoing null geodesics in the Schwarzschild geometry that
escape to infinity; and
(b) all geodesics that satisfy r(v0 ) < v0 /4 emerged from r = 0 at time v = 0.
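The matching conditions in the remarks above can be checked numerically. In this sketch (our own construction) the constants of the d < 0 geodesic (28.65) are fixed by r(1) = r₀ and v(1) = v₀ = 8r₀ via (28.66), which gives c = r₀ and |d| = r₀/2. The velocities are taken from the µ = 1/16 system (28.74) rewritten with t = log τ, i.e. dr/dτ = (r − v/8)/τ and dv/dτ = 2r/τ. The slope dr/dv is used only as a simple proxy for the expansion rate; it is not necessarily the quantity defined in (27.11).

```python
import math


def event_horizon(tau, r0):
    """Vaidya part (0 < tau <= 1) of the event horizon: (28.65) with c = r0, |d| = r0/2."""
    s = math.sqrt(tau)
    r = r0 * s - 0.5 * r0 * s * math.log(tau)
    v = 4.0 * r + 4.0 * r0 * s  # v = 4 (r + 2|d| tau^(1/2))
    return r, v


def velocities(tau, r0):
    """dr/dtau and dv/dtau from the mu = 1/16 equations written in tau = e^t."""
    r, v = event_horizon(tau, r0)
    return (r - v / 8.0) / tau, 2.0 * r / tau


def dr_dv(tau, r0):
    """Slope of the horizon in the (v, r)-plane: (r - v/8) / (2 r)."""
    r, v = event_horizon(tau, r0)
    return (r - v / 8.0) / (2.0 * r)


r0 = 1.5
print(event_horizon(1.0, r0))  # equals (r0, 8 r0): matches the Schwarzschild horizon
print(velocities(1.0, r0))     # equals (0, 2 r0): rdot(1) = 0 and vdot(1) = 2 r0
print([dr_dv(t, r0) for t in (0.001, 0.1, 0.5, 0.9, 1.0)])  # decreasing towards 0
```

Writing L = −log τ, the slope reduces to L/(8 + 4L), which falls monotonically from 1/4 near τ = 0 to 0 at τ = 1, in line with the remark that the horizon grows ever more slowly while matter is still falling in.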
To conclude this discussion, I just mention that the situation for µ ≠ 1/16, i.e. for the
Vaidya metric with
$$f(v,r) = 1 - \frac{2m(v)}{r} \,, \qquad m(v) = \begin{cases} 0 & v \le 0 \\ \mu v & 0 \le v \le v_0 \\ m_0 = \mu v_0 & v \ge v_0 \end{cases} \tag{28.72}$$
is the following:
1. For µ > 1/16 the entire singularity at r = 0 (the v-axis) is hidden behind the
event horizon and geodesics that reach infinity emerged from some finite value of
r at v = 0 (and for v < 0 there was just Minkowski space).
2. For µ < 1/16 there are many lightlike geodesics that emerge from r = 0 and
reach infinity. In this respect the situation is like the case µ = 1/16, because the geodesics that
escape from r = 0 to r → ∞ emerge from r = 0 at v = 0 (so that again this is a
massless singularity). It would have been worse if they had emerged from r = 0
at some time v > 0, where the singularity is massive, but this does not happen.
It is straightforward to establish all this analytically by using the explicit solution (28.9)
for the outgoing lightrays. It is also an instructive (and fairly elementary) exercise to
generalise the above discussion to the case where one initially has a constant mass
Schwarzschild black hole instead of Minkowski space, i.e. one has a mass function
$$m(v) = \begin{cases} m_0 & v \le v_0 \\ m_0 + \mu(v - v_0) & v_0 \le v \le v_1 \\ m_1 = m_0 + \mu(v_1 - v_0) & v \ge v_1 \end{cases} \tag{28.73}$$
and the corresponding apparent horizon at r = 2m(v).
Here is an alternative derivation of the solutions (28.9) and (28.15) for outgoing null
geodesics of the linear mass m(v) = µv ingoing Vaidya metric for any µ by determining
the solution of the coupled set of linear homogeneous differential equations (28.4) with
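constant coefficients,
$$\frac{d}{dt}\begin{pmatrix} r \\ v \end{pmatrix} = \begin{pmatrix} 1 & -2\mu \\ 2 & 0 \end{pmatrix} \begin{pmatrix} r \\ v \end{pmatrix} . \tag{28.74}$$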
We write this as
$$d\vec x/dt = L\vec x \tag{28.75}$$
with $\vec x = (r, v)^t$ ($(\cdot)^t$ denoting the transpose column vector) and
$$L = \begin{pmatrix} 1 & -2\mu \\ 2 & 0 \end{pmatrix} . \tag{28.76}$$
One standard (but in general cumbersome) way to solve this equation is to exponentiate
the matrix L,
$$\vec x(t) = e^{tL}\,\vec x(0) \,. \tag{28.77}$$
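For a 2 × 2 system both routes are easy to compare numerically. The sketch below (helper names ours, pure Python to stay self-contained) computes e^{tL} for the matrix L of (28.76) by a truncated Taylor series combined with scaling and squaring, and compares the result with the eigenvector ansatz, using the eigenpairs ω± = ½(1 ± √(1 − 16µ)) and a± = (ω±, 2)^t obtained in this section. The eigenvector route assumes µ ≠ 1/16, since the two roots coincide there.

```python
import cmath


def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def expm_2x2(m, terms=25, squarings=10):
    """exp(m): Taylor series for exp(m / 2^s), then squared s times."""
    scale = 2.0 ** squarings
    a = [[x / scale for x in row] for row in m]
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        term = [[x / n for x in row] for row in mat_mul(term, a)]  # a^n / n!
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def L_matrix(mu):
    """Coefficient matrix L of (28.74) and (28.76)."""
    return [[1.0, -2.0 * mu], [2.0, 0.0]]


def via_expm(mu, x0, t):
    """x(t) = exp(t L) x(0), cf. (28.77)."""
    e = expm_2x2([[t * x for x in row] for row in L_matrix(mu)])
    return (e[0][0] * x0[0] + e[0][1] * x0[1], e[1][0] * x0[0] + e[1][1] * x0[1])


def via_eigenvectors(mu, x0, t):
    """c_+ a_+ e^{w_+ t} + c_- a_- e^{w_- t} with a = (w, 2); requires mu != 1/16."""
    disc = cmath.sqrt(1.0 - 16.0 * mu)
    w_p, w_m = 0.5 * (1.0 + disc), 0.5 * (1.0 - disc)
    det = 2.0 * (w_p - w_m)  # determinant of the matrix with columns a_+, a_-
    c_p = (2.0 * x0[0] - w_m * x0[1]) / det
    c_m = (w_p * x0[1] - 2.0 * x0[0]) / det
    r = c_p * w_p * cmath.exp(w_p * t) + c_m * w_m * cmath.exp(w_m * t)
    v = c_p * 2.0 * cmath.exp(w_p * t) + c_m * 2.0 * cmath.exp(w_m * t)
    return r.real, v.real  # imaginary parts cancel for real initial data


for mu in (0.02, 0.1):  # real roots (Delta > 0) and complex roots (Delta < 0)
    print(mu, via_expm(mu, (1.0, 0.5), 1.3), via_eigenvectors(mu, (1.0, 0.5), 1.3))
```

Exponentiating works for every µ, including the degenerate case, whereas the eigenvector ansatz needs the extra resonant term discussed below when µ = 1/16.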
Another possibility is to diagonalise L either explicitly or implicitly. We follow the latter
approach by making the ansatz
$$\vec x(t) = \sum_J \tilde c_J\, \vec a_J\, e^{\omega_J t} \equiv \sum_J \tilde c_J\, \vec x_J(t) \,, \tag{28.78}$$
Thus the ω_J are the eigenvalues of L and the $\vec a_J$ are the corresponding eigenvectors.
The ω_J are the roots of the degree d polynomial equation
1. If all the eigenvalues / roots are distinct, then (28.78) already gives the general
solution.
3. If eigenvalues and eigenvectors are degenerate, then (28.78) does not provide d
linearly independent solutions. A resonance phenomenon (familiar from forced
oscillations) arises, resulting in the appearance of a term proportional to $t\,e^{\omega_J t}$.
Indeed, in such a case the general solution associated with the degenerate root ω_J
(let us assume that it is twice degenerate) is of the form
$$\vec x_J = \tilde c_J\, \vec a_J\, e^{\omega_J t} + \tilde d_J \left( \vec a_J\, t\, e^{\omega_J t} + \vec b_J\, e^{\omega_J t} \right) , \tag{28.81}$$
where $\vec b_J$ is evidently only defined modulo (i.e. up to addition of a multiple of) $\vec a_J$. Using
(28.82), it is easy to see that indeed the 2nd part of (28.81) is a solution of (28.75),
$$\left(\frac{d}{dt} - L\right)\left( \vec a_J\, t\, e^{\omega_J t} + \vec b_J\, e^{\omega_J t} \right) = 0 \,. \tag{28.83}$$
4. The eigenvectors $\vec a_J$ can either be readily constructed by hand (for small d) or, in
general, from the minors (cofactors) of the matrix M.
• First of all, the null eigenvectors of M(ω_J) can be chosen to be of the form
If M₂₂ = M₂₁ = 0, one can alternatively construct $\vec a_J$ from the components of the
1st row. This only fails if M(ω_J) = 0 identically, in which case the construction of
two linearly independent eigenvectors is trivial anyway: one can e.g. choose (1, 0)^t
and (0, 1)^t.
• Moreover, for (2×2)-matrices, case (2) in the above list can only arise if M (ωJ ) = 0
identically. Otherwise there will be a resonance iff one has a degenerate root.
Specifically and concretely, in the case at hand, with L the (2 × 2)-matrix given in
(28.76), one has
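$$L = \begin{pmatrix} 1 & -2\mu \\ 2 & 0 \end{pmatrix} \;\Rightarrow\; M(\omega) = \begin{pmatrix} 1-\omega & -2\mu \\ 2 & -\omega \end{pmatrix} , \tag{28.85}$$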
1. The characteristic equation is
$$\omega^2 - \omega + 4\mu = 0 \;\Rightarrow\; \omega_J = \tfrac12\left(1 \pm \sqrt{1 - 16\mu}\right) \equiv \omega_\pm \,, \tag{28.86}$$
and we see (rediscover from the present point of view) that there are 3 distinct
cases, depending on the sign of ∆ = 1 − 16µ (28.10):
$$\omega_+ = \tfrac12(1 + i\omega) = \omega_-^* \,, \qquad \omega^2 = 16\mu - 1 > 0 \,. \tag{28.88}$$
(c) µ = 1/16 or ∆ = 0:
In this case one has one degenerate root ω = 1/2, the matrix
$$M(\omega = 1/2) = \begin{pmatrix} 1/2 & -1/8 \\ 2 & -1/2 \end{pmatrix} \tag{28.89}$$
is not identically zero, and therefore in this case the solution will involve an
additional
$$t\,e^{t/2} = \tau^{1/2}\log\tau \tag{28.90}$$
term, as we already saw in the explicit solution (28.47).
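2. The eigenvectors $\vec a_J = \vec a_\pm$ can be chosen to have the simple form (28.84),
$$\vec a_\pm = \begin{pmatrix} -M_{22}(\omega_\pm) \\ M_{21}(\omega_\pm) \end{pmatrix} = \begin{pmatrix} \omega_\pm \\ 2 \end{pmatrix} . \tag{28.91}$$
3. For µ = 1/16, ω± = 1/2, and the secondary null vector can be chosen to be
$$\vec b = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \;\Rightarrow\; M\vec b = \begin{pmatrix} 1/2 & -1/8 \\ 2 & -1/2 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1/2 \\ 2 \end{pmatrix} = \vec a \,. \tag{28.92}$$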
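The eigenvalue data in items 1 to 3 are simple enough to verify directly. In this sketch (function names ours) the eigenvectors are built from the second row of M(ω) as in (28.91), each is checked to be a null vector of M(ω±), and for µ = 1/16 the secondary vector b = (1, 0)^t is checked against (28.92).

```python
import cmath


def M(mu, w):
    """M(omega) = L - omega * 1 for the matrix L of (28.85)."""
    return [[1.0 - w, -2.0 * mu], [2.0, -w]]


def apply(m, x):
    """Matrix-vector product for a 2x2 matrix."""
    return (m[0][0] * x[0] + m[0][1] * x[1], m[1][0] * x[0] + m[1][1] * x[1])


def roots(mu):
    """omega_pm = (1 +/- sqrt(1 - 16 mu)) / 2, cf. (28.86)."""
    disc = cmath.sqrt(1.0 - 16.0 * mu)
    return 0.5 * (1.0 + disc), 0.5 * (1.0 - disc)


def eigvec(mu, w):
    """(28.91): a = (-M_22(omega), M_21(omega))^t = (omega, 2)^t."""
    m = M(mu, w)
    return (-m[1][1], m[1][0])


for mu in (0.03, 1.0 / 16.0, 0.2):  # Delta > 0, Delta = 0, Delta < 0
    for w in roots(mu):
        print(mu, w, apply(M(mu, w), eigvec(mu, w)))  # null vector: both entries ~ 0

# Degenerate case mu = 1/16: M(1/2) b reproduces a, cf. (28.92)
print(apply(M(1.0 / 16.0, 0.5), (1.0, 0.0)), eigvec(1.0 / 16.0, 0.5))  # both (0.5, 2.0)
```

The first component of M(ω)a is ω − ω² − 4µ, which vanishes precisely because ω solves the characteristic equation; the second vanishes identically.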
Putting everything together, we can now find the general solution in the 3 cases. We
will immediately write things again in terms of the affine parameter τ, related to the parameter
t by τ = e^t.
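2. For ∆ = 0 the general solution has the form (28.81),
$$\begin{pmatrix} r(\tau) \\ v(\tau) \end{pmatrix} = \tilde c \begin{pmatrix} 1/2 \\ 2 \end{pmatrix} \tau^{1/2} + \tilde d \left[ \begin{pmatrix} 1/2 \\ 2 \end{pmatrix} \tau^{1/2}\log\tau + \begin{pmatrix} 1 \\ 0 \end{pmatrix} \tau^{1/2} \right] . \tag{28.94}$$
This agrees with the solution (28.15) with the identification c̃ = 2c − 4d, d̃ = 2d.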
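The identification c̃ = 2c − 4d, d̃ = 2d can be confirmed by evaluating both forms at a few values of τ. This short check (ours) uses a = (1/2, 2)^t and b = (1, 0)^t from (28.91) and (28.92), and assumes, as (28.83) indicates, that d̃ multiplies both the a τ^{1/2} log τ term and the b τ^{1/2} term; on the (28.47) side v is obtained from r = v/4 + 2dτ^{1/2}.

```python
import math


def form_2847(tau, c, d):
    """(28.47): r = c s + d s log(tau) and v = 4 (r - 2 d s), with s = tau^(1/2)."""
    s = math.sqrt(tau)
    r = c * s + d * s * math.log(tau)
    return r, 4.0 * (r - 2.0 * d * s)


def form_2894(tau, c_t, d_t):
    """(28.94): c_t a s + d_t (a s log(tau) + b s), with a = (1/2, 2), b = (1, 0)."""
    s, lg = math.sqrt(tau), math.log(tau)
    r = 0.5 * c_t * s + d_t * (0.5 * s * lg + s)
    v = 2.0 * c_t * s + 2.0 * d_t * s * lg
    return r, v


c, d = 0.7, -0.4
for tau in (0.3, 1.0, 5.0):
    print(form_2847(tau, c, d), form_2894(tau, 2 * c - 4 * d, 2 * d))
```

Expanding by hand gives the same result: the b-term contributes 2dτ^{1/2} to r, which exactly compensates the −2dτ^{1/2} coming from c̃/2 = c − 2d.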
29 Exact Wave-like Solutions of the Einstein Equations
Such wave-metrics have been studied in the context of four-dimensional general relativity
for a long time even though they are not (and were never meant to be) phenomenologically
realistic models of gravitational plane waves. The reason for this is that in
the far field, gravitational waves are so weak that the linearised Einstein equations and
their solutions are adequate to describe the physics, whereas the near-field strong gravitational
effects responsible for the production of gravitational waves, for which the
linearised equations are indeed insufficient, correspond to much more complicated solutions
of the Einstein equations (describing e.g. two very massive stars orbiting around
their common center of mass).
However, pp-waves have been useful and of interest as a theoretical play-ground since
they are in some sense the simplest essentially Lorentzian metrics with no non-trivial
Riemannian counterparts. As such they also provide a wealth of counterexamples to
conjectures that one might like to make about Lorentzian geometry by naive extrapo-
lation from the Riemannian case. They have also enjoyed some popularity in the string
theory literature as potentially exact and exactly solvable string theory “backgrounds”.
However, they seem to have made it into very few textbook accounts of general relativ-
ity, and the purpose of this section is to at least partially fill this gap by providing a
brief introduction to this topic.
We have seen in section 16 that a metric describing the propagation of a plane wave in
the x3 -direction (16.88) can be written as
We will now simply define a plane wave metric in general relativity to be a metric of
the above form, dropping the assumption that hij be “small”,
We will say that this is a plane wave metric in Rosen coordinates. This is not the
coordinate system in which plane waves are usually discussed, among other reasons
because typically in Rosen coordinates the metric exhibits spurious coordinate singu-
larities. This led to the mistaken belief in the past that there are no non-singular plane
wave solutions of the non-linear Einstein equations. We will establish the relation to
the more common and much more useful Brinkmann coordinates below.
Plane wave metrics are characterised by a single matrix-valued function of U , but two
metrics with quite different ḡij may well be isometric. For example,
is isometric to the flat Minkowski metric whose natural presentation in Rosen coordi-
nates is simply the Minkowski metric in lightcone coordinates,
This is not too difficult to see, and we will establish this as a consequence of a more
general result later on (but if you want to try this now, try scaling $\vec y$ by U and do
something to V...).
That (29.4) is indeed flat should in any case not be too surprising. It is the “null”
counterpart of the “spacelike” fact that ds² = dr² + r²dΩ², with dΩ² the unit line
element on the sphere, is just the flat Euclidean metric in polar coordinates, and the
“timelike” statement that
$$ds^2 = -dt^2 + t^2\, d\tilde\Omega^2 \,, \tag{29.6}$$
with dΩ̃² the unit line element on the hyperboloid, is just (a wedge of) the flat Minkowski
metric. In cosmology this is known as the Milne Universe. It is easy to check that this
is indeed a (rather trivial) solution of the Friedmann equations with k = −1, a(t) = t
and ρ = P = 0.
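The Milne claim is a one-line check. Assuming the Friedmann equations in the standard form (ȧ/a)² + k/a² = 8πρ/3 and ä/a = −4π(ρ + 3P)/3 (units with G = c = 1 are our choice; with ρ = P = 0 the constants drop out anyway), the following sketch evaluates both residuals by finite differences for a(t) = t.

```python
import math


def friedmann_residuals(a, t, k, rho=0.0, p=0.0, h=1e-3):
    """Residuals of the two Friedmann equations in units with G = c = 1.

    a is the scale factor as a callable; derivatives use central differences.
    """
    a0 = a(t)
    a_dot = (a(t + h) - a(t - h)) / (2.0 * h)
    a_ddot = (a(t + h) - 2.0 * a0 + a(t - h)) / h**2
    first = (a_dot / a0) ** 2 + k / a0**2 - 8.0 * math.pi * rho / 3.0
    second = a_ddot / a0 + 4.0 * math.pi * (rho + 3.0 * p) / 3.0
    return first, second


def milne(t):
    """Scale factor of the Milne universe, a(t) = t."""
    return t


for t in (0.5, 1.0, 3.0):
    # k = -1 satisfies both equations; k = 0 leaves the residual 1/t^2 in the first one
    print(t, friedmann_residuals(milne, t, k=-1), friedmann_residuals(milne, t, k=0))
```

The comparison with k = 0 shows that the negative spatial curvature of the hyperboloid is exactly what balances the expansion term 1/t².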
It is somewhat less obvious, but still true, that for example the two metrics
In the remainder of this section we will study gravitational plane waves in a more
systematic way. One of the characteristic features of the above plane wave metrics is
the existence of a nowhere vanishing covariantly constant null vector field, namely ∂V .
We thus begin by deriving the general metric (line element) for a space-time admitting
such a covariantly constant null vector field. We will from now on consider general
(d + 2)-dimensional space-times, where d is the number of transverse dimensions.
Thus, let Z be a parallel (i.e. covariantly constant) null vector of the (d+ 2)-dimensional
Lorentzian metric gµν , ∇µ Z ν = 0. This condition is equivalent to the pair of conditions
∇µ Zν + ∇ν Zµ = 0 (29.8)
∇µ Zν − ∇ν Zµ = 0 . (29.9)
The first of these says that Z is a Killing vector field, and the second that Z is also
a gradient vector field. If Z is nowhere zero, without loss of generality we can assume
that
Z = ∂v (29.10)
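for some coordinate v since this simply means that we are using a parameter along
the integral curves of Z as our coordinate v. In terms of components this means that
Z^µ = δ^µ_v, or
$$Z_\mu = g_{\mu v} \,. \tag{29.11}$$
Moreover, since Z is null,
$$Z_v = g_{vv} = 0 \,. \tag{29.12}$$
The Killing equation now implies that all the components of the metric are v-independent,
$$\partial_v g_{\mu\nu} = 0 \,. \tag{29.13}$$
Since the Christoffel symbols are symmetric in their lower indices, the second condition (29.9) reduces to
$$\nabla_\mu Z_\nu - \nabla_\nu Z_\mu = 0 \;\Leftrightarrow\; \partial_\mu Z_\nu - \partial_\nu Z_\mu = 0 \,, \tag{29.14}$$
which implies that locally we can find a function u = u(x^µ) such that
$$Z_\mu = g_{v\mu} = \partial_\mu u \,. \tag{29.15}$$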
There are no further constraints, and thus the general form of a metric admitting a
parallel null vector is, changing from the xµ -coordinates to {u, v, xa }, a = 1, . . . , d,
Note that if we had considered a metric with a covariantly constant timelike or spacelike
vector, then we would have obtained the above metric with an additional term of the
form ∓dv². In that case, the cross-term 2du dv could have been eliminated by shifting
v → v′ = v ∓ u, and the metric would have factorised into ∓dv′² plus a v′-independent
metric. Such a factorisation does in general not occur for a covariantly constant null
vector, which makes metrics with such a vector potentially more interesting than their
timelike or spacelike counterparts.
There are still residual coordinate transformations which leave the above form of the
metric invariant. For example, both K and Aa can be eliminated in favour of gab . We
will not pursue this here, as we are primarily interested in a special class of metrics
which are characterised by the fact that gab = δab ,
Such metrics are called plane-fronted waves with parallel rays, or pp-waves for short.
“plane-fronted” refers to the fact that the wave fronts u = const. are planar (flat),
and “parallel rays” refers to the existence of a parallel null vector. Once again, there
are residual coordinate transformations which leave this form of the metric invariant.
Among them are shifts of v, v → v + Λ(u, xa ), under which the coefficients K and Aa
transform as
$$K \to K + \tfrac12 \partial_u\Lambda \,, \qquad A_a \to A_a + \partial_a\Lambda \,. \tag{29.18}$$
Plane waves are a very special kind of pp-waves. By definition, a plane wave metric is
a pp-wave with A_a = 0 and K(u, x^a) quadratic in the x^a (zeroth and first order terms
in x^a can be eliminated by a coordinate transformation),
We will say that this is the metric of a plane wave in Brinkmann coordinates. The
relation between the expressions for a plane wave in Brinkmann coordinates and Rosen
coordinates will be explained in section 29.5. From now on barred quantities will refer
to plane wave metrics.
29.3 Geodesics, Light-Cone Gauge and Harmonic Oscillators
Rather than determining the geodesic equations by first calculating all the non-zero
Christoffel symbols, we make use of the fact that the geodesic equations can be obtained
more efficiently, and in a way that allows us to directly make use of the symmetries of
the problem, as the Euler-Lagrange equations of the Lagrangian
$$L = \tfrac12 \bar g_{\mu\nu}\dot x^\mu \dot x^\nu = \dot u\dot v + \tfrac12 A_{ab}(u)\,x^a x^b\,\dot u^2 + \tfrac12 \dot{\vec x}^{\,2} \,, \tag{29.22}$$
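$$u = p_v\tau \,. \tag{29.25}$$
Then the geodesic equations for the transverse coordinates are the Euler-Lagrange equations
$$\ddot x^a(\tau) = A_{ab}(p_v\tau)\,x^b(\tau)\,p_v^2 \,, \tag{29.26}$$
that is, with $\omega^2_{ab}(\tau) = -p_v^2 A_{ab}(p_v\tau)$, the harmonic oscillator equations
$$\ddot x^a(\tau) = -\omega^2_{ab}(\tau)\,x^b(\tau) \,. \tag{29.27}$$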
The constraint
for null geodesics (the case ε ≠ 0 can be dealt with in the same way) implies, and
thus provides a first integral for, the v-equation of motion. Multiplying the oscillator
equation by x^a and inserting this into the constraint, one finds that this can be further
integrated to
$$p_v\, v(\tau) = -\tfrac12 x^a(\tau)\,\dot x^a(\tau) + p_v v_0 \,. \tag{29.30}$$
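The first integral (29.30) is easy to test numerically. The sketch below rests on our own assumptions: d = 2, a constant diagonal profile A_ab = diag(A₁, A₂) (chosen so that the oscillator has closed-form solutions), and a hand-written fourth-order Runge-Kutta integrator. It integrates (29.26) together with v̇ obtained from the null constraint 2u̇v̇ + A_ab x^a x^b u̇² + ẋ² = 0 (twice the Lagrangian (29.22) set to zero, with u̇ = p_v), and monitors p_v v + ½ x^a ẋ^a, which (29.30) says is constant.

```python
import math


def rhs(state, A, pv):
    """d/dtau of (x1, x2, xdot1, xdot2, v) for a null geodesic with u = pv * tau."""
    x1, x2, y1, y2, v = state
    acc1 = pv**2 * A[0] * x1  # (29.26) for diagonal A
    acc2 = pv**2 * A[1] * x2
    # null constraint 2 pv vdot + pv^2 A_ab x^a x^b + |xdot|^2 = 0, solved for vdot
    v_dot = -(y1 * y1 + y2 * y2 + pv**2 * (A[0] * x1 * x1 + A[1] * x2 * x2)) / (2.0 * pv)
    return [y1, y2, acc1, acc2, v_dot]


def rk4(state, dt, steps, A, pv):
    """Classical fourth-order Runge-Kutta integration with fixed step dt."""
    for _ in range(steps):
        k1 = rhs(state, A, pv)
        k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)], A, pv)
        k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)], A, pv)
        k4 = rhs([s + dt * k for s, k in zip(state, k3)], A, pv)
        state = [s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + e)
                 for s, a, b, c, e in zip(state, k1, k2, k3, k4)]
    return state


def first_integral(state, pv):
    """p_v v + (1/2) x^a xdot^a, which (29.30) says is constant."""
    x1, x2, y1, y2, v = state
    return pv * v + 0.5 * (x1 * y1 + x2 * y2)


def oscillating_solution(tau, x0, xdot0, a_neg, pv):
    """Closed form of xddot = pv^2 a x for a < 0 (angular frequency pv * sqrt(-a))."""
    w = pv * math.sqrt(-a_neg)
    return x0 * math.cos(w * tau) + xdot0 / w * math.sin(w * tau)


A, pv = (-1.0, 0.5), 1.3  # hypothetical constant profile A_ab = diag(-1, 0.5)
start = [0.2, -0.1, 0.3, 0.4, 0.0]
end = rk4(start, 1e-3, 2000, A, pv)  # integrate up to tau = 2
print(first_integral(start, pv), first_integral(end, pv))
print(end[0], oscillating_solution(2.0, 0.2, 0.3, A[0], pv))
```

The direction with A₁ < 0 oscillates, while the one with A₂ > 0 grows exponentially, yet the combination p_v v + ½ x^a ẋ^a stays fixed in both cases.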
Note that a particular solution of the null geodesic equation is the purely “longitudinal”
null geodesic
xµ (τ ) = (u = pv τ, v = v0 , xa = 0) . (29.31)
Along this null geodesic, all the Christoffel symbols of the metric (in Brinkmann coor-
dinates) are zero. Hence Brinkmann coordinates can be regarded as a special case of
Fermi coordinates (briefly mentioned at the beginning of section 2.9).
In summary, we note that in the lightcone gauge the equation of motion for a relativistic
particle becomes that of a non-relativistic harmonic oscillator. This harmonic oscillator
equation appears in various different contexts when discussing plane waves, and will
therefore also reappear several times later on in this section.
It is easy to see that there is essentially only one non-vanishing component of the
Riemann curvature tensor of a plane wave metric, namely
In particular, therefore, because of the null (or chiral) structure of the metric, there is
only one non-trivial component of the Ricci tensor,
the Ricci scalar is zero,
R̄ = 0 , (29.38)
and the only non-zero component of the Einstein tensor (8.83) is
Thus, as claimed above, the metric is flat iff Aab = 0. Moreover, we see that in
Brinkmann coordinates the vacuum Einstein equations reduce to a simple algebraic
condition on Aab (regardless of its u-dependence), namely that it be traceless.
for arbitrary functions A(u) and B(u). This reflects the two polarisation states or degrees
of freedom of a four-dimensional graviton. Evidently, this generalises to arbitrary
dimensions: the degrees of freedom of the traceless matrix A_ab(u) correspond
precisely to those of a transverse traceless symmetric tensor (a.k.a. a graviton).
When the Ricci tensor is non-zero (Aab has non-vanishing trace), then plane waves solve
the Einstein equations with null matter or null fluxes, i.e. with an energy-momentum
tensor T̄µν whose only non-vanishing component is T̄uu ,
Examples are null Maxwell fields A_µ(u) with field strength
Physical matter (with positive energy density) corresponds to R̄uu > 0 or Tr A < 0.
It is pretty obvious by inspection that not just the scalar curvature but all the scalar
curvature invariants of a plane wave, i.e. scalars built from the curvature tensor and its
covariant derivatives, vanish since there is simply no way to soak up the u-indices.
Usually, an unambiguous way to ascertain that what appears to be a singularity of
a metric is a true curvature singularity rather than just a singularity in the choice
of coordinates is to exhibit a curvature invariant that is singular at that point. For
example, for the Schwarzschild metric one has the Kretschmann scalar (15.99) K =
R_µνρσR^µνρσ ∼ m²/r⁶, which shows that the singularity at r = 0 is a true singularity.
Now for plane waves all curvature invariants are zero. Does this mean that plane waves
are non-singular? Or, if not, how does one detect the presence of a curvature singularity?
One way to do this is to study the tidal forces acting on extended objects or families
of freely falling particles. Indeed, in a certain sense the main effect of curvature (or
gravity) is that initially parallel trajectories of freely falling non-interacting particles
(dust, pebbles,. . . ) do not remain parallel, i.e. that gravity has the tendency to focus
(or defocus) matter. This statement finds its mathematically precise formulation in the
geodesic deviation equation (9.28),
Here δxµ is the separation vector between nearby geodesics. We can apply this equation
to some family of geodesics of plane waves discussed in section 29.3. We will choose δxµ
to connect points on nearby geodesics with the same value of τ = u. Thus δu = 0, and
the geodesic deviation equation for the transverse separations δxa reduces to
$$\frac{d^2}{du^2}\,\delta x^a = -\bar R_{aubu}\,\delta x^b = A_{ab}\,\delta x^b \,. \tag{29.47}$$
This is (once again!) the harmonic oscillator equation, and generalises the corresponding
equation (16.93) of the linearised theory to the present case.
We could have also obtained this directly by varying the harmonic oscillator (geodesic)
equation for x^a, using δu = 0. We see that for negative eigenvalues of A_ab (physical
matter) this tidal force is attractive, leading to a focussing of the geodesics. For vacuum
plane waves, on the other hand, the tidal force is attractive in some directions and
repulsive in others (reflecting the quadrupole nature of gravitational waves).
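To see the focussing and defocussing explicitly, take a constant diagonal A_ab (our simplifying assumption) and an initially parallel pair of geodesics, δx^a(0) = 1 and dδx^a/du(0) = 0. Each component of (29.47) then solves δẍ = λ δx, with λ an eigenvalue of A, giving cos(√(−λ) u) for λ < 0 and cosh(√λ u) for λ > 0. The sketch below (names ours) compares a traceless, i.e. vacuum, choice with a negative-trace (matter) choice.

```python
import math


def separation(lam, u):
    """Solution of s'' = lam * s with s(0) = 1, s'(0) = 0 (one eigen-direction of A_ab)."""
    if lam < 0:
        return math.cos(math.sqrt(-lam) * u)  # attractive: oscillates and reaches zero
    if lam > 0:
        return math.cosh(math.sqrt(lam) * u)  # repulsive: grows monotonically
    return 1.0


def first_focal_point(lam):
    """First u > 0 at which the separation vanishes; infinite unless lam < 0."""
    return math.pi / (2.0 * math.sqrt(-lam)) if lam < 0 else math.inf


vacuum = (-4.0, 4.0)  # traceless A_ab: a vacuum plane wave
matter = (-4.0, -1.0)  # Tr A < 0: physical (positive energy) null matter
for name, eigs in (("vacuum", vacuum), ("matter", matter)):
    print(name, [first_focal_point(lam) for lam in eigs],
          [separation(lam, 0.5) for lam in eigs])
```

In the vacuum case one direction focusses at u = π/4 while the other spreads out, which is the quadrupolar pattern mentioned above; with matter both directions eventually focus.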
What is of interest to us here is the fact that the above equation shows that Aab itself
contains direct physical information. In particular, these tidal forces become infinite
where Aab (u) diverges. This is a true physical effect and hence the plane wave space-
time is genuinely singular at such points.
Since, on the other hand, the plane wave metric is clearly smooth for non-singular
Aab (u), we can thus summarise this discussion by the statement that a plane wave is
singular if and only if Aab (u) is singular somewhere.
29.5 From Rosen to Brinkmann coordinates (and back)
I still owe you an explanation of what the heuristic considerations of section 29.1 have
to do with the rest of this section. To that end I will now describe the relation between
the plane wave metric in Brinkmann coordinates,
It is clear that, in order to transform the non-flat transverse metric in Rosen coordinates
to the flat transverse metric in Brinkmann coordinates, one should change variables as
$$x^a = \bar E^a{}_i\, y^i \,, \tag{29.50}$$
This generates the flat transverse metric as well as a dU²-term quadratic in the x^a, as
desired, but there are also unwanted dU dx^a cross-terms. Provided that Ē satisfies the
symmetry condition
$$\dot{\bar E}_{ai}\,\bar E^i{}_b = \dot{\bar E}_{bi}\,\bar E^i{}_a \,, \tag{29.53}$$
these cross-terms can be removed by the shift
$$V \to V - \tfrac12 \dot{\bar E}_{ai}\,\bar E^i{}_b\, x^a x^b \,. \tag{29.54}$$
Apart from eliminating the dU dx^a-terms, this shift will also have the effect of generating
other dU²-terms. Thanks to the symmetry condition, the term quadratic in first
derivatives of Ē cancels that arising from ḡ_ij dy^i dy^j, and only a second-derivative part
remains. The upshot of this is that after the change of variables
$$U = u \,, \qquad V = v + \tfrac12 \dot{\bar E}_{ai}\,\bar E^i{}_b\, x^a x^b \,, \qquad y^i = \bar E^i{}_a\, x^a \,, \tag{29.55}$$
the metric takes the Brinkmann form, with
$$A_{ab} = \ddot{\bar E}_{ai}\,\bar E^i{}_b \,. \tag{29.56}$$
This can also be written as the harmonic oscillator equation
$$\ddot{\bar E}_{ai} = A_{ab}\,\bar E_{bi} \tag{29.57}$$
we had already encountered in the context of the geodesic (and geodesic deviation)
equation.
Note that from this point of view the Rosen coordinates are labelled by d out of 2d
linearly independent solutions of the oscillator equation, and the symmetry condition
can now be read as the constraint that the Wronskian among these solutions be zero.
Thus, given the metric in Brinkmann coordinates, one can construct the metric in Rosen
coordinates by solving the oscillator equation, choosing a maximally commuting set of
solutions to construct Ēai , and then determining ḡij algebraically from the Ēai .
In practice, once one knows that Rosen and Brinkmann coordinates are indeed just
two distinct ways of describing the same class of metrics, one does not need to perform
explicitly the coordinate transformation mapping one to the other. All one is interested
in is the above relation between ḡij (U ) and Aab (u), which essentially says that Aab is
the curvature of ḡij ,
$$A_{ab} = -\bar E^i{}_a\,\bar E^j{}_b\,\bar R_{UiUj} \,. \tag{29.58}$$
The equations simplify somewhat when the metric ḡ_ij(u) is diagonal,
$$\bar g_{ij}(u) = \bar e_i(u)^2\,\delta_{ij} \,. \tag{29.59}$$
In that case one can choose Ē_ai = ē_i δ_ia. The symmetry condition is automatically
satisfied because a diagonal matrix is symmetric, and one finds that A_ab is also diagonal,
$$A_{ab} = (\ddot{\bar e}_a/\bar e_a)\,\delta_{ab} \,. \tag{29.60}$$
Conversely, therefore, given a diagonal plane wave in Brinkmann coordinates, to obtain
the metric in Rosen coordinates one needs to solve the harmonic oscillator equations
$$\ddot{\bar e}_i(u) = A_{ii}(u)\,\bar e_i(u) \,. \tag{29.61}$$
Thus the Rosen metric determined by ḡ_ij(U) is flat iff ē_i(u) = a_i U + b_i for some constants
a_i, b_i. In particular, we recover the fact that the metric (29.4),
$$d\bar s^2 = 2\,dU dV + U^2 d\vec y^{\,2} \tag{29.62}$$
is flat. We see that the non-uniqueness of the metric in Rosen coordinates is due to
the integration ‘constants’ arising when trying to integrate a curvature tensor to a
corresponding metric.
As another example, consider the four-dimensional vacuum plane wave (29.40). Evi-
dently, one way of writing this metric in Rosen coordinates is
$$d\bar s^2 = 2\,dU dV + \sinh^2 U\, dX^2 + \sin^2 U\, dY^2 \,, \tag{29.63}$$
and more generally any plane wave with constant Aab can be chosen to be of this
trigonometric form in Rosen coordinates.
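The relation (29.60) between Rosen and Brinkmann data can be checked directly on (29.62) and (29.63). This sketch (ours) recovers A_aa = ë_a/ē_a by finite differences, treating the ē_a as functions of U = u: for ē = U it gives 0 (flat), and for ē₁ = sinh U, ē₂ = sin U it gives A = diag(1, −1), which is traceless and hence a vacuum wave. It also notes that ḡ₂₂ = sin²U vanishes again at U = π, where the Rosen chart of (29.63) degenerates.

```python
import math


def brinkmann_profile(e, u, h=1e-4):
    """A_aa = e''(u) / e(u), cf. (29.60), with e'' from central differences."""
    return (e(u + h) - 2.0 * e(u) + e(u - h)) / (h * h * e(u))


rosen_factors = {
    "flat (29.62)": [lambda u: u, lambda u: u],
    "vacuum (29.63)": [math.sinh, math.sin],
}
for name, factors in rosen_factors.items():
    diag = [brinkmann_profile(e, 0.8) for e in factors]
    print(name, diag, "trace:", sum(diag))

# For (29.63), g_22 = sin(U)^2 vanishes again at U = pi: the Rosen chart degenerates there.
print(math.sin(math.pi) ** 2)
```

Running the argument backwards, any other pair of solutions of (29.61) with the same A would give a different-looking but isometric Rosen metric, which is the non-uniqueness noted above.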
29.6 More on Rosen Coordinates
Collecting the results of the previous sections, we can now gain a better understanding
of the geometric significance (and shortcomings) of Rosen coordinates for plane waves.
defines a preferred family (congruence) of null geodesics, namely the integral curves of
the null vector field ∂U , i.e. the curves
(U (τ ), V (τ ), y k (τ )) = (τ, V, y k ) (29.65)
with affine parameter τ = U and parametrised by the constant values of the coordinates
(V, y k ). In particular, the “origin” V = y k = 0 of this congruence is the longitudinal
null geodesic (29.31) with v0 = 0 in Brinkmann coordinates.
In the region of validity of this coordinate system, there is a unique null geodesic of
this congruence passing through any point, and one can therefore label (coordinatise)
these points by specifying the geodesic (V, y k ) and the affine parameter U along that
geodesic, i.e. by Rosen coordinates.
We can now also understand the reasons for the failure of Rosen coordinates: they cease
to be well-defined (and give rise to spurious coordinate singularities) e.g. when geodesics
in the family (congruence) of null geodesics intersect: in that case there is no longer
a unique value of the coordinates (U, V, y k ) that one can associate to that intersection
point.
To illustrate this point, consider simply R2 with its standard metric ds2 = dx2 + dy 2 .
An example of a “good” congruence of geodesics is the straight lines parallel to the
x-axis. The corresponding “Rosen” coordinates (“Rosen” in quotes because we are not
talking about null geodesics) are simply the globally well-defined Cartesian coordinates,
x playing the role of the affine parameter U and y that of the transverse coordinates y k
labelling the geodesics. An example of a “bad” family of geodesics is the family of straight lines
through the origin. The corresponding “Rosen” coordinates are essentially just polar
coordinates. Away from the origin there is again a unique geodesic passing through any
point but, as is well known, this coordinate system breaks down at the origin.
With this in mind, we can now reconsider the “bad” Rosen coordinates
for flat space. As we have seen above, in Brinkmann coordinates the metric is manifestly
flat,
$$d\bar s^2 = 2\,du dv + d\vec x^{\,2} \,. \tag{29.67}$$
Using the coordinate transformation (29.55) from Rosen to Brinkmann coordinates, we
see that the geodesic lines y k = ck , V = c of the congruence defined by the metric (29.66)
correspond to the lines xk = ck u in Brinkmann (Minkowski) coordinates. But these are
precisely the straight lines through the origin. This explains the coordinate singularity
at U = 0 and further strengthens the analogy with polar coordinates mentioned at the
end of section 29.1.
More generally, we see from (29.55) that the relation between the Brinkmann coordinates
xa and the Rosen coordinates y k ,
$$x^a = \bar E^a{}_k(U)\, y^k \,, \tag{29.68}$$
and hence the expression for the geodesic lines y k = ck , becomes degenerate when Ēka be-
comes degenerate, i.e. precisely when ḡij becomes degenerate. Brinkmann coordinates,
on the other hand, provide a global coordinate chart for plane wave metrics.
satisfies
Ë/E = Tr A + (Tr M )2 − Tr(M 2 ) ≤ Tr A = −R̄uu , (29.70)
where use has been made of the expression R̄uu = − Tr A (29.37) for the Ricci tensor,
and where Mab is the symmetric matrix (29.53)
In particular, therefore, if R̄uu > 0, then E(u) is strictly concave downwards but positive
at a non-degenerate point, so that necessarily e(u0 ) = 0 for some finite value of u0 , and
the Rosen coordinate system breaks down there. By (29.44), R̄uu > 0 is equivalent to
positivity of the lightcone energy density, a very reasonable requirement on the matter
content.
We now study the isometries of a generic plane wave metric. In Brinkmann coordinates,
because of the explicit dependence of the metric on $u$ and the transverse coordinates,
only one isometry is manifest, namely that generated by the parallel null vector $Z = \partial_v$.
In Rosen coordinates, the metric depends neither on $V$ nor on the transverse coordinates
$y^k$, and one sees that in addition to $Z = \partial_V$ there are at least $d$ more Killing vectors,
namely the $\partial_{y^k}$. Together these form an Abelian translation algebra acting transitively
on the null hypersurfaces of constant $U$.

[93] This is adapted from G. Gibbons, Quantized Fields Propagating in Plane-Wave Spacetimes,
Commun. Math. Phys. 45 (1975) 191-202.
However, this is not the whole story. Indeed, one particularly interesting and peculiar
feature of plane wave space-times is the fact that they generically possess a solvable
(rather than semi-simple) isometry algebra, namely a Heisenberg algebra, only part of
which we have already seen above.
All Killing vectors $V$ can be found in a systematic way by solving the Killing equations
$$ L_V g_{\mu\nu} = \nabla_\mu V_\nu + \nabla_\nu V_\mu = 0 \,. \qquad (29.72) $$
I will not do this here but simply present the results of this analysis in Brinkmann
coordinates. The upshot is that a generic $(2+d)$-dimensional plane wave metric has a
$(2d+1)$-dimensional isometry algebra generated by the Killing vector $Z = \partial_v$ and the
$2d$ Killing vectors
$$ X(f_{(K)}) \equiv X_{(K)} = f_{(K)a}\,\partial_a - \dot f_{(K)a}\, x^a\, \partial_v \,. \qquad (29.73) $$
Here the $f_{(K)a}$, $K = 1, \ldots, 2d$, are the $2d$ linearly independent solutions of the harmonic
oscillator equation (again!)
$$ \ddot f_a(u) = A_{ab}(u) f_b(u) \,. \qquad (29.74) $$
These Killing vectors satisfy the algebra
Therefore the corresponding Killing vectors
$$ [Q_{(a)}, Z] = [P_{(a)}, Z] = 0 \,, \qquad [Q_{(a)}, Q_{(b)}] = [P_{(a)}, P_{(b)}] = 0 \,, \qquad [Q_{(a)}, P_{(b)}] = \delta_{ab}\, Z \,. \qquad (29.82) $$
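The claims (29.73) and (29.74) can be checked symbolically. The following SymPy sketch assumes one transverse dimension (d = 1) and the Brinkmann metric $2\,du\,dv + A(u)x^2du^2 + dx^2$. It verifies that $X(f)$ satisfies the Killing equation (29.72) once $\ddot f = A f$ is imposed, and computes the bracket of two such Killing vectors. The helper names are hypothetical, and the explicit bracket it prints is our own computation with these conventions, not a quotation of the text's formulas.

```python
import sympy as sp

# Brinkmann coordinates, d = 1 (assumption): ds^2 = 2 du dv + A(u) x^2 du^2 + dx^2
u, v, x = sp.symbols("u v x")
coords = (u, v, x)
A = sp.Function("A")(u)
f = sp.Function("f")(u)
h = sp.Function("h")(u)

g = sp.Matrix([[A * x**2, 1, 0],
               [1, 0, 0],
               [0, 0, 1]])


def X(fn):
    """Components (X^u, X^v, X^x) of X(f) = f d_x - f' x d_v, cf. (29.73)."""
    return [sp.S.Zero, -sp.diff(fn, u) * x, fn]


def lie_derivative_metric(V):
    """(L_V g)_ij = V^k d_k g_ij + g_kj d_i V^k + g_ik d_j V^k."""
    n = len(coords)
    return sp.Matrix(n, n, lambda i, j: sum(
        V[k] * sp.diff(g[i, j], coords[k])
        + g[k, j] * sp.diff(V[k], coords[i])
        + g[i, k] * sp.diff(V[k], coords[j]) for k in range(n)))


def bracket(V, W):
    """Lie bracket [V, W]^i = V^j d_j W^i - W^j d_j V^i."""
    return [sp.expand(sum(V[j] * sp.diff(W[i], coords[j])
                          - W[j] * sp.diff(V[i], coords[j]) for j in range(3)))
            for i in range(3)]


# Impose the harmonic oscillator equation (29.74): f'' = A f and h'' = A h.
on_shell = {sp.diff(f, u, 2): A * f, sp.diff(h, u, 2): A * h}

killing = lie_derivative_metric(X(f)).subs(on_shell).applyfunc(sp.simplify)
print(killing)   # zero matrix: X(f) is a Killing vector

comm = bracket(X(f), X(h))
print(comm)      # only a d_v component, i.e. a multiple of Z = d_v
# Its coefficient f'h - f h' is a Wronskian, hence constant on-shell:
print(sp.simplify(sp.diff(comm[1], u).subs(on_shell)))   # 0
```

The bracket comes out as $(\dot f h - f \dot h)\,\partial_v$, a multiple of $Z$ whose coefficient is constant on-shell; this central term is what makes the algebra a Heisenberg algebra of the form (29.82).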
Generically, a plane wave metric has just this Heisenberg algebra of isometries. It
acts transitively on the null hyperplanes u = const., with a simply transitive Abelian
subalgebra. However, for special choices of Aab (u), there may of course be more Killing
vectors. These could arise from internal symmetries of Aab , giving more Killing vectors
in the transverse directions. For example, the conformally flat plane waves (29.43)
have an additional SO(d) symmetry (and conversely SO(d)-invariance implies conformal
flatness).
Of more interest to us is the fact that for particular Aab (u) there may be Killing vectors
with a ∂u -component. The existence of such a Killing vector renders the plane wave
homogeneous (away from the fixed points of this extra Killing vector). The obvious
examples are plane waves with a u-independent profile Aab ,
which have the extra Killing vector $X = \partial_u$. Since $A_{ab}$ is $u$-independent, it can be
diagonalised by a $u$-independent orthogonal transformation acting on the $x^a$. Moreover,
the overall scale of $A_{ab}$ can be changed, $A_{ab} \to \mu^2 A_{ab}$, by the coordinate transformation
(boost)
$$ (u, v, x^a) \to (\mu u, \mu^{-1} v, x^a) \,. \qquad (29.84) $$
Thus these metrics are classified by the eigenvalues of Aab up to an overall scale and
permutations of the eigenvalues.
$$ \bar\nabla_\mu \bar R_{\lambda\nu\rho\sigma} = 0 \;\Leftrightarrow\; \partial_u A_{ab} = 0 \,. \qquad (29.85) $$
Thus a plane wave with constant wave profile Aab is what is known as a locally symmetric
space.
The existence of the additional Killing vector X = ∂u extends the Heisenberg algebra
to the harmonic oscillator algebra, with X playing the role of the number operator or
harmonic oscillator Hamiltonian. Indeed, X and Z = ∂v obviously commute, and the
commutator of X with one of the Killing vectors X(f ) is
Note that this is consistent, i.e. the right-hand side is again a Killing vector, because
when Aab is constant and f satisfies the harmonic oscillator equation then so does its
u-derivative f˙. In terms of the basis (29.81), we have
Another way of understanding the relation between X = ∂u and the harmonic oscillator
Hamiltonian is to look at the conserved charge associated with X for particles moving
along geodesics. As we have seen in section 7.1, given any Killing vector $X$, the quantity
$$ Q_X = X_\mu \dot x^\mu \qquad (29.88) $$
which we had already identified (up to a constant for non-null geodesics) as minus the
harmonic oscillator Hamiltonian in section 29.3. This is indeed a conserved charge iff
the Hamiltonian is time-independent i.e. iff Aab is constant.
We thus see that the dynamics of particles in a symmetric plane wave background is
intimately related to the geometry of the background itself.
Another class of examples of plane waves with an interesting additional Killing vector
are plane waves with the non-trivial profile
for some constant matrix $B_{ab} = A_{ab}(1)$. Without loss of generality one can then assume
that $B_{ab}$ and $A_{ab}$ are diagonal, with eigenvalues the oscillator frequency squares $-\omega_a^2$,
$$ d\bar s^2 = 2\,du\,dv + B_{ab}\, x^a x^b \,\frac{du^2}{u^2} + d\vec x^2 \qquad (29.92) $$
is invariant under the boost/scaling (29.84), corresponding to the extra Killing vector
Note that in this case the Killing vector $Z = \partial_v$ is no longer a central element of the
isometry algebra, since it has a non-trivial commutator with $X$,
$$ [X, Z] = Z \,. \qquad (29.94) $$
Moreover, one finds that the commutator of $X$ with a Heisenberg algebra Killing vector
$X(f)$, $f_a$ a solution to the harmonic oscillator equation, is the Heisenberg algebra Killing
vector
$$ [X, X(f)] = X(u\dot f) \,. \qquad (29.95) $$
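For the profile (29.92) these commutators can be checked explicitly. The SymPy sketch below assumes d = 1 and takes the extra Killing vector to be the boost generator $X = u\partial_u - v\partial_v$; this form is our inference from the scaling (29.84) and is consistent with (29.94), but the text's own expression for it is not reproduced here. The sketch checks that $X$ is a Killing vector of (29.92), that $[X, Z] = Z$, that $[X, X(f)] = X(u\dot f)$, and that $u\dot f$ again solves $\ddot f = B f/u^2$ when $f$ does.

```python
import sympy as sp

# Brinkmann coordinates, d = 1 (assumption), profile A(u) = B / u^2 as in (29.92)
u, v, x = sp.symbols("u v x")
B = sp.symbols("B")
coords = (u, v, x)
g = sp.Matrix([[B * x**2 / u**2, 1, 0],
               [1, 0, 0],
               [0, 0, 1]])
f = sp.Function("f")(u)


def lie_derivative_metric(V):
    """(L_V g)_ij = V^k d_k g_ij + g_kj d_i V^k + g_ik d_j V^k."""
    n = len(coords)
    return sp.Matrix(n, n, lambda i, j: sp.simplify(sum(
        V[k] * sp.diff(g[i, j], coords[k])
        + g[k, j] * sp.diff(V[k], coords[i])
        + g[i, k] * sp.diff(V[k], coords[j]) for k in range(n))))


def bracket(V, W):
    """Lie bracket [V, W]^i = V^j d_j W^i - W^j d_j V^i."""
    return [sp.simplify(sum(V[j] * sp.diff(W[i], coords[j])
                            - W[j] * sp.diff(V[i], coords[j]) for j in range(3)))
            for i in range(3)]


def X_of(fn):
    """Heisenberg-type Killing vector X(f) = f d_x - f' x d_v, cf. (29.73)."""
    return [sp.S.Zero, -sp.diff(fn, u) * x, fn]


X = [u, -v, sp.S.Zero]                   # assumed boost generator u d_u - v d_v
Z = [sp.S.Zero, sp.S.One, sp.S.Zero]     # Z = d_v

print(lie_derivative_metric(X))          # zero matrix: X is a Killing vector of (29.92)
print(bracket(X, Z))                     # [0, 1, 0], i.e. [X, Z] = Z
mismatch = [sp.simplify(a - b) for a, b in zip(bracket(X, X_of(f)), X_of(u * sp.diff(f, u)))]
print(mismatch)                          # [0, 0, 0], i.e. [X, X(f)] = X(u f')

# u f' solves the oscillator equation f'' = B f / u^2 whenever f does:
rule = B * f / u**2
F = u * sp.diff(f, u)
residual = sp.diff(F, u, 2) - B * F / u**2
residual = residual.subs(sp.diff(f, u, 3), sp.diff(rule, u)).subs(sp.diff(f, u, 2), rule)
print(sp.simplify(residual))             # 0
```

The third derivative is substituted before the second so that SymPy never has to match $\ddot f$ inside $\dddot f$.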
This concludes our brief discussion of plane wave metrics even though much more can
and perhaps should be said about plane wave and pp-wave metrics, in particular in the
context of the so-called Penrose Limit construction. For more on this see my lecture
notes [94] (from which I also took the material in this chapter).

[94] M. Blau, Lecture Notes on Plane Waves and Penrose Limits, available from
http://www.blau.itp.unibe.ch/Lecturenotes.html
30 Kaluza-Klein Theory
[Note: I have not taught, and hence not updated / corrected / improved, the material
and the presentation of the material in this section in a long time.]
Looking at the Einstein equations and the variational principle, we see that gravity is
nicely geometrised while the matter part has to be added by hand and is completely
non-geometric. This may be perfectly acceptable for phenomenological Lagrangians
(like that for a perfect fluid in Cosmology), but it would clearly be desirable to have a
unified description of all the fundamental forces of nature.
Today, the fundamental forces of nature are described by two very different concepts.
On the one hand, we have - as we have seen - gravity, in which forces are replaced by
geometry, and on the other hand there are the gauge theories of the electroweak and
strong interactions (the standard model) or their (grand unified, . . . ) generalisations.
Thus, if one wants to unify these forces with gravity, there are two possibilities:
1. One can try to realise gravity as a gauge theory (and thus geometry as a conse-
quence of the gauge principle).
2. Or one can try to realise gauge theories as gravity (and hence make them purely
geometric).
The first is certainly an attractive idea and has attracted a lot of attention. It is also
quite natural since gravity is, broadly speaking, already a gauge theory in that it has
a local invariance (under general coordinate transformations or, actively,
diffeomorphisms). Also, the behaviour of Christoffel symbols under general coordinate
transformations is analogous to the transformation behaviour of non-Abelian gauge
fields under gauge transformations, and the whole formalism of covariant derivatives
and curvatures is reminiscent of that of non-Abelian gauge theories.
At first sight, equating the Christoffel symbols with gauge fields (potentials) may ap-
pear to be a bit puzzling because we originally introduced the metric as the potential
of the gravitational field and the Christoffel symbol as the corresponding field strength
(representing the gravitational force). However, as we know, the concept of ‘force’ is
itself a gauge (coordinate) dependent concept in General Relativity, and therefore these
‘field strengths’ behave more like gauge potentials themselves, with their curvature, the
Riemann curvature tensor, encoding the gauge covariant information about the gravi-
tational field. This fact, which reflects deep properties of gravity not shared by other
forces, is just one of many which suggest that an honest gauge theory interpretation of
gravity may be hard to come by. But let us proceed in this direction for a little while
anyway.
Clearly, the gauge group should now not be some ‘internal’ symmetry group like U(1)
or SU(3), but rather a space-time symmetry group itself. Among the gauge groups that
have been suggested in this context, one finds
1. the translation group (this is natural because, as we have seen, the generators of
coordinate transformations are infinitesimal translations)
2. the Lorentz group (this is natural if one wants to view the Christoffel symbols as
the analogues of the gauge fields of gravity)
However, what - by and large - these investigations have shown is that the more one
tries to make a gauge theory look like Einstein gravity the less it looks like a standard
gauge theory and vice versa.
The main source of difference between gauge theory and gravity is the fact that in the
case of Yang-Mills theory the internal indices bear no relation to the space-time indices
whereas in gravity these are the same: contrast $F^a_{\mu\nu}$ with $(F_{\mu\nu})^\lambda{}_\sigma = R^\lambda{}_{\sigma\mu\nu}$.
In particular, in gravity one can contract the ‘internal’ with the space-time indices to
obtain a scalar Lagrangian, R, linear in the curvature tensor. This is fortunate because,
from the point of view of the metric, this is already a two-derivative object.
For Yang-Mills theory, on the other hand, this is not possible, and in order to construct
a Lagrangian which is a singlet under the gauge group one needs to contract the space-
time and internal indices separately, i.e. one has a Lagrangian quadratic in the field
strengths. This gives the usual two-derivative action for the gauge potentials.
In spite of these and other differences and difficulties, this approach has not been com-
pletely abandoned and the gauge theory point of view is still very fruitful and useful
provided that one appreciates the crucial features that set gravity apart from standard
gauge theories.
The second possibility alluded to above, to realise gauge theories as gravity, is much
more radical. But how on earth is one supposed to achieve this? The crucial idea
has been known since 1919/20 (T. Kaluza), with important contributions by O. Klein
(1926). So what is this idea?
In the early parts of the last century, the only other fundamental force that was known,
in addition to gravity, was electro-magnetism. In 1919, Kaluza submitted a paper (to
Einstein) in which he made a number of remarkable observations.
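First of all, he stressed the similarity between Christoffel symbols and the Maxwell field
strength tensor,
$$ \Gamma_{\mu\nu\lambda} = \tfrac12 (\partial_\nu g_{\mu\lambda} - \partial_\mu g_{\nu\lambda} + \partial_\lambda g_{\mu\nu}) \,, \qquad F_{\nu\mu} = \partial_\nu A_\mu - \partial_\mu A_\nu \,. \qquad (30.1) $$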
He then noted that $F_{\mu\nu}$ looks like a truncated Christoffel symbol and proposed, in order
to make this more manifest, to introduce a fifth dimension with a metric such that
$\Gamma_{\mu\nu5} \sim F_{\mu\nu}$. This is indeed possible. If one makes the identification
$$ A_\mu = g_{\mu5} \,, \qquad (30.2) $$
and the assumption that $g_{\mu5}$ is independent of the fifth coordinate $x^5$, then one finds,
using the standard formula for the Christoffel symbols, now extended to five dimensions,
that
$$ \Gamma_{\mu\nu5} = \tfrac12 (\partial_5 g_{\mu\nu} + \partial_\nu g_{\mu5} - \partial_\mu g_{\nu5}) = \tfrac12 (\partial_\nu A_\mu - \partial_\mu A_\nu) = \tfrac12 F_{\nu\mu} \,. \qquad (30.3) $$
But much more than this is true. Kaluza went on to show that when one postulates
a five-dimensional metric of the form (hatted quantities will from now on refer to
five-dimensional quantities)
$$ \hat R = R - \tfrac14 F_{\mu\nu}F^{\mu\nu} \,. \qquad (30.5) $$
This fact is affectionately known as the Kaluza-Klein Miracle! Moreover, the five-
dimensional geodesic equation turns into the four-dimensional Lorentz force equation
for a charged particle, and in this sense gravity and Maxwell theory have really been
unified in five-dimensional gravity.
However, although this is very nice, rather amazing in fact, and is clearly trying to tell
us something deep, there are numerous problems with this and it is not really clear
what has been achieved:
2. If it is to be treated as real, why should one make the assumption that the fields
are independent of x5 ? But if one does not make this assumption, one will not
get Einstein-Maxwell theory.
3. Moreover, if the fifth dimension is to be taken seriously, why are we justified in
setting g55 = 1? If we do not do this, we will not get Einstein-Maxwell theory.
In spite of all this and other questions, related to non-Abelian gauge symmetries or the
quantum behaviour of these theories, Kaluza’s idea has remained popular ever since or,
rather, has periodically created psychological epidemics of frantic activity, interrupted
by dormant phases. Today, Kaluza’s idea, with its many reincarnations and variations,
is an indispensable and fundamental ingredient in the modern theories of theoretical high
energy physics (supergravity and string theories) and many of the questions/problems
mentioned above have been addressed, understood and overcome.
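Let us now look at this more precisely. We consider a five-dimensional space-time with
coordinates $\hat x^M = (x^\mu, x^5)$ and a metric of the form (30.4). For later convenience, we
will introduce a parameter $\lambda$ into the metric (even though we will set $\lambda = 1$ for the time
being) and write it as
$$ \hat g_{\mu\nu} = g_{\mu\nu} + A_\mu A_\nu \,, \qquad \hat g_{\mu5} = A_\mu \,, \qquad \hat g_{55} = 1 \,, \qquad (30.7) $$
and its inverse is
$$ \hat g^{\mu\nu} = g^{\mu\nu} \,, \qquad \hat g^{\mu5} = -A^\mu \,, \qquad \hat g^{55} = 1 + A_\mu A^\mu \,. \qquad (30.8) $$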
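As a quick consistency check of (30.7) and (30.8), the following SymPy sketch multiplies the Kaluza-Klein metric by the claimed inverse. It uses a hypothetical two-dimensional base with a general symmetric metric $g$, which is enough to test the block structure, and it also confirms $\det\hat g = \det g$, which is what allows one to write $\sqrt{\hat g} = \sqrt{g}$ later on.

```python
import sympy as sp

# Hypothetical 2d base: general symmetric metric g_{mu nu} and potential A_mu
g11, g12, g22, A1, A2 = sp.symbols("g11 g12 g22 A1 A2")
g = sp.Matrix([[g11, g12], [g12, g22]])
A_low = sp.Matrix([A1, A2])          # A_mu
g_inv = g.inv()
A_up = g_inv * A_low                 # A^mu = g^{mu nu} A_nu
A_sq = (A_low.T * A_up)[0, 0]        # A_mu A^mu

# Kaluza-Klein metric (30.7), lambda = 1
ghat = sp.zeros(3, 3)
ghat[:2, :2] = g + A_low * A_low.T
ghat[:2, 2] = A_low
ghat[2, :2] = A_low.T
ghat[2, 2] = 1

# Claimed inverse (30.8)
ghat_inv = sp.zeros(3, 3)
ghat_inv[:2, :2] = g_inv
ghat_inv[:2, 2] = -A_up
ghat_inv[2, :2] = -A_up.T
ghat_inv[2, 2] = 1 + A_sq

print((ghat * ghat_inv).applyfunc(sp.simplify))   # identity matrix
print(sp.simplify(ghat.det() - g.det()))          # 0, i.e. det ghat = det g
```

The determinant identity follows from the Schur complement of the $\hat g_{55} = 1$ block, so it holds for any base dimension.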
We will (for now) assume that nothing depends on x5 (in the old Kaluza-Klein literature
this assumption is known as the cylindricity condition).
$$ F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu \,, \qquad B_{\mu\nu} = \partial_\mu A_\nu + \partial_\nu A_\mu \,, \qquad (30.9) $$
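$$ \hat\Gamma^\mu_{5\lambda} = -\tfrac12 F^\mu{}_\lambda \,, \qquad \hat\Gamma^5_{5\mu} = -\tfrac12 F_{\mu\nu}A^\nu \,, \qquad \hat\Gamma^\mu_{55} = \hat\Gamma^5_{55} = 0 \,. \qquad (30.10) $$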
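This does not look particularly encouraging, in particular because of the presence of
the $B_{\mu\nu}$ term, but Kaluza was not discouraged and proceeded to calculate the Riemann
tensor. I will spare you all the components of the Riemann tensor, but the Ricci tensor
we need:
$$ \hat R_{\mu\nu} = R_{\mu\nu} + \tfrac12 F_\mu{}^\rho F_{\rho\nu} + \tfrac14 F^{\lambda\rho}F_{\lambda\rho}A_\mu A_\nu + \tfrac12 (A_\nu \nabla_\rho F_\mu{}^\rho + A_\mu \nabla_\rho F_\nu{}^\rho) $$
$$ \hat R_{5\mu} = \tfrac12 \nabla_\nu F_\mu{}^\nu + \tfrac14 A_\mu F_{\nu\lambda}F^{\nu\lambda} $$
$$ \hat R_{55} = \tfrac14 F_{\mu\nu}F^{\mu\nu} \,. \qquad (30.11) $$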
This looks a bit more attractive and covariant but still not very promising. [However, if
you work in an orthonormal basis, if you know what that means, the result looks much
nicer. In such a basis only the first two terms in $\hat R_{\mu\nu}$ and the first term in $\hat R_{5\mu}$ are
present and $\hat R_{55}$ is unchanged, so that all the non-covariant looking terms disappear.]
Now the miracle happens. Calculating the curvature scalar, all the annoying terms drop
out and one finds
$$ \hat R = R - \tfrac14 F_{\mu\nu}F^{\mu\nu} \,, \qquad (30.12) $$
i.e. the Lagrangian of Einstein-Maxwell theory. For $\lambda \neq 1$, the second term would have
been multiplied by $\lambda^2$. We now consider the five-dimensional pure gravity Einstein-Hilbert
action
$$ \hat S = \frac{1}{8\pi\hat G}\int \sqrt{\hat g}\, d^5x\, \hat R \,. \qquad (30.13) $$
In order for the integral over $x^5$ to converge we assume that the $x^5$-direction is a circle
with radius $L$ and we obtain
$$ \hat S = \frac{2\pi L}{8\pi\hat G}\int \sqrt{g}\, d^4x\, \big(R - \tfrac14 \lambda^2 F_{\mu\nu}F^{\mu\nu}\big) \,. \qquad (30.14) $$
Therefore, if we make the identifications
$$ G_N = \hat G/2\pi L \,, \qquad \lambda^2 = 8\pi G_N \,, \qquad (30.15) $$
we obtain
$$ \hat S = \frac{1}{8\pi G_N}\int \sqrt{g}\, d^4x\, R - \frac14\int \sqrt{g}\, d^4x\, F_{\mu\nu}F^{\mu\nu} \,, \qquad (30.16) $$
i.e. precisely the four-dimensional Einstein-Maxwell Lagrangian! This amazing fact, that
coupled gravity-gauge theory systems can arise from higher-dimensional pure gravity,
is certainly trying to tell us something.
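The "miracle" (30.12) can be verified by brute force. The SymPy sketch below is our own check, not Kaluza's computation. To keep it fast it works in 2 + 1 rather than 4 + 1 dimensions (the identity $\hat R = R - \tfrac14 F_{\mu\nu}F^{\mu\nu}$ for the metric (30.7) does not depend on the base dimension), takes the base metric to be flat and Euclidean, so $R = 0$, and lets $A_\mu(x, y)$ be arbitrary $x^5$-independent functions. The function `ricci_scalar` is a hypothetical helper; its curvature conventions, stated in the docstring, give a round sphere positive curvature.

```python
import sympy as sp


def ricci_scalar(g, coords):
    """Ricci scalar of the metric g (a sympy Matrix) in the given coordinates.

    Conventions: Gamma^a_bc = 1/2 g^ad (d_b g_dc + d_c g_db - d_d g_bc),
    R^a_bcd = d_c Gamma^a_db - d_d Gamma^a_cb + Gamma^a_ce Gamma^e_db - Gamma^a_de Gamma^e_cb,
    R_bd = R^a_bad, so that a round sphere has R > 0.
    """
    n = len(coords)
    ginv = g.inv().applyfunc(sp.cancel)
    Gam = [[[sp.expand(sum(ginv[a, d] * (sp.diff(g[d, c], coords[b])
                                         + sp.diff(g[d, b], coords[c])
                                         - sp.diff(g[b, c], coords[d]))
                           for d in range(n)) / 2)
             for c in range(n)] for b in range(n)] for a in range(n)]

    def riemann(a, b, c, d):
        return (sp.diff(Gam[a][d][b], coords[c]) - sp.diff(Gam[a][c][b], coords[d])
                + sum(Gam[a][c][e] * Gam[e][d][b] - Gam[a][d][e] * Gam[e][c][b]
                      for e in range(n)))

    ricci = [[sum(riemann(a, b, a, d) for a in range(n)) for d in range(n)] for b in range(n)]
    return sp.expand(sum(ginv[b, d] * ricci[b][d] for b in range(n) for d in range(n)))


x, y, z = sp.symbols("x y z")                  # z plays the role of x^5
A1, A2 = sp.Function("A1")(x, y), sp.Function("A2")(x, y)

# Kaluza-Klein metric (30.7) over a flat Euclidean 2d base (assumption)
ghat = sp.Matrix([[1 + A1**2, A1 * A2, A1],
                  [A1 * A2, 1 + A2**2, A2],
                  [A1, A2, 1]])
F12 = sp.diff(A2, x) - sp.diff(A1, y)
F_sq = 2 * F12**2                              # F_{mu nu} F^{mu nu} on the flat base

Rhat = ricci_scalar(ghat, [x, y, z])
print(sp.expand(Rhat + F_sq / 4))              # 0, i.e. Rhat = R - F^2/4 with R = 0

nil = sp.Matrix([[1, 0, 0], [0, 1 + x**2, x], [0, x, 1]])   # A = (0, x), constant F_12 = 1
print(ricci_scalar(nil, [x, y, z]))            # -1/2
```

The second example, a constant field $F_{12} = 1$, gives $\hat R = -\tfrac14\cdot 2 = -\tfrac12$, the scalar curvature of the three-dimensional Heisenberg (Nil) geometry.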
and contrast this with the most general form of the line element in five dimensions,
namely
Clearly, the form of the general five-dimensional line element (30.18) is invariant under
arbitrary five-dimensional general coordinate transformations $x^M \to \xi^M(x^N)$. This
is not true, however, for the Kaluza-Klein ansatz (30.17), as a general $x^5$-dependent
coordinate transformation would destroy the $x^5$-independence of $g_{\mu\nu}$ and $A_\mu = \hat g_{\mu5}$
and would also not leave $\hat g_{55} = 1$ invariant.
The form of the Kaluza-Klein line element is, however, invariant under the following
two classes of coordinate transformations:
$$ x^5 \to x^5 \,, \qquad x^\mu \to \xi^\mu(x^\nu) \qquad (30.19) $$
$$ x^5 \to \xi^5(x^\mu, x^5) = x^5 + f(x^\mu) \,, \qquad x^\mu \to \xi^\mu(x^\nu) = x^\mu \,. \qquad (30.20) $$
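Under this transformation, $g_{\mu\nu}$ and $g_{55}$ are invariant, but $A_\mu = g_{\mu5}$ changes as
$$ A'_\mu = \hat g'_{\mu5} = \frac{\partial x^M}{\partial\xi^\mu}\frac{\partial x^N}{\partial\xi^5}\,\hat g_{MN} = \frac{\partial x^M}{\partial\xi^\mu}\,\hat g_{M5} = \hat g_{\mu5} - \frac{\partial f}{\partial x^\mu}\,\hat g_{55} = A_\mu - \partial_\mu f \,. \qquad (30.21) $$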
In other words, the Kaluza-Klein line element is invariant under the shift x5 →
x5 + f (xµ ) accompanied by Aµ → Aµ − ∂µ f (and this can of course also be read
off directly from the metric).
But this is precisely a gauge transformation of the vector potential Aµ and we see that in
the present context gauge transformations arise as remnants of five-dimensional general
covariance!
But now it is clear that we are guaranteed to get Einstein-Maxwell theory in four
dimensions: First of all, upon integration over x5 , the shift in x5 is irrelevant and
starting with the five-dimensional Einstein-Hilbert action we are bound to end up with
an action in four dimensions, depending on gµν and Aµ , which is (a) generally covariant
(in the four-dimensional sense), (b) second order in derivatives, and (c) invariant under
gauge transformations of Aµ . But then the only possibility is the Einstein-Maxwell
action.
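From this point of view, the gauge transformation of the vector potential arises from
the Lie derivative of $\hat g_{\mu5}$ along the vector field $f(x^\mu)\partial_5$:
$$ Y = f(x^\mu)\partial_5 \;\Rightarrow\; Y^\mu = 0 \,,\; Y^5 = f \;\Rightarrow\; Y_\mu = A_\mu f \,,\; Y_5 = f \,, \qquad (30.22) $$
$$ \begin{aligned} (L_Y\hat g)_{\mu5} &= \hat\nabla_\mu Y_5 + \hat\nabla_5 Y_\mu \\ &= \partial_\mu Y_5 - 2\hat\Gamma^M_{\mu5}Y_M \\ &= \partial_\mu f + F^\nu{}_\mu Y_\nu + F_{\mu\nu}A^\nu Y_5 \\ &= \partial_\mu f \;\Leftrightarrow\; \delta A_\mu = -\partial_\mu f \,. \end{aligned} \qquad (30.23) $$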
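The statement that $L_Y\hat g$ reproduces a gauge transformation can also be checked component by component. The SymPy sketch below assumes a hypothetical two-dimensional base with arbitrary metric functions $g_{\mu\nu}(x)$ and potential $A_\mu(x)$, and computes the Lie derivative of the Kaluza-Klein metric (30.7) along $Y = f(x)\partial_5$ from the coordinate formula $(L_Y\hat g)_{MN} = Y^K\partial_K\hat g_{MN} + \hat g_{KN}\partial_M Y^K + \hat g_{MK}\partial_N Y^K$.

```python
import sympy as sp

# Hypothetical 2d base with coordinates (x1, x2); x5 is the extra circle coordinate.
x1, x2, x5 = sp.symbols("x1 x2 x5")
coords = (x1, x2, x5)
g11, g12, g22 = (sp.Function(name)(x1, x2) for name in ("g11", "g12", "g22"))
A1, A2 = sp.Function("A1")(x1, x2), sp.Function("A2")(x1, x2)
A = [A1, A2]
f = sp.Function("f")(x1, x2)             # x5-independent gauge parameter

# Kaluza-Klein metric (30.7) with lambda = 1
ghat = sp.Matrix([[g11 + A1**2, g12 + A1 * A2, A1],
                  [g12 + A1 * A2, g22 + A2**2, A2],
                  [A1, A2, 1]])
Y = [sp.S.Zero, sp.S.Zero, f]            # Y = f(x) d_5, cf. (30.22)


def lie_derivative_metric(g, V):
    """(L_V g)_MN = V^K d_K g_MN + g_KN d_M V^K + g_MK d_N V^K."""
    n = len(coords)
    return sp.Matrix(n, n, lambda i, j: sp.expand(sum(
        V[k] * sp.diff(g[i, j], coords[k])
        + g[k, j] * sp.diff(V[k], coords[i])
        + g[i, k] * sp.diff(V[k], coords[j]) for k in range(n))))


LY = lie_derivative_metric(ghat, Y)
print(LY[0, 2], LY[1, 2], LY[2, 2])      # d_1 f, d_2 f, 0

# The mu-nu block changes exactly like A_mu A_nu under A -> A + df, so g_{mu nu} is untouched.
dA = [sp.diff(f, x1), sp.diff(f, x2)]
print([sp.expand(LY[a, b] - (A[a] * dA[b] + dA[a] * A[b])) for a in range(2) for b in range(2)])
```

The output $(L_Y\hat g)_{\mu5} = \partial_\mu f$, $(L_Y\hat g)_{55} = 0$ agrees with (30.23); with the sign convention used there this is $\delta A_\mu = -\partial_\mu f$, and the $\mu\nu$ block confirms that $g_{\mu\nu}$ itself does not transform.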
This point of view becomes particularly useful when one wants to obtain non-Abelian
gauge symmetries in this way (via a Kaluza-Klein reduction): One starts with a higher-
dimensional internal space with isometry group G and makes an analogous ansatz for the
metric. Then among the remnants of the higher-dimensional general coordinate trans-
formations there are, in particular, xµ -dependent ‘isometries’ of the internal metric.
These act like non-Abelian gauge transformations on the off-block-diagonal components
of the metric and, upon integration over the internal space, one is guaranteed to
get, perhaps among other things, the four-dimensional Einstein-Hilbert and Yang-Mills
actions.
30.4 Geodesics
There is something else that works very beautifully in this context, namely the description
of the motion of charged particles in four dimensions moving under the combined
influence of a gravitational and an electro-magnetic field. As we will see, these two
effects are also unified from a five-dimensional Kaluza-Klein point of view.
$$ \ddot x^M + \hat\Gamma^M_{NL}\,\dot x^N\dot x^L = 0 \,. \qquad (30.24) $$
Either because the metric (and hence the Lagrangian) does not depend on $x^5$, or because
we know that $V = \partial_5$ is a Killing vector of the metric, we know that we have a conserved
quantity
$$ \frac{\partial L}{\partial\dot x^5} \sim V_M\dot x^M = \dot x^5 + A_\mu\dot x^\mu \,, \qquad (30.25) $$
along the geodesic world lines. We will see in a moment what this quantity corresponds
to. The remaining xµ -component of the geodesic equation is
This is precisely the Lorentz law if one identifies the constant of motion with the ratio
of the charge and the mass of the particle,
$$ \dot x^5 + A_\mu\dot x^\mu = \frac{e}{m} \,. \qquad (30.28) $$
Hence electro-magnetic and gravitational forces are indeed unified. The fact that
charged particles take a different trajectory from neutral ones is not a violation of
the equivalence principle but only reflects the fact that they started out with a different
velocity in the x5 -direction!
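The reduction of the five-dimensional geodesic equation to the Lorentz force law can be tested symbolically. This SymPy sketch assumes a flat Euclidean two-dimensional base with coordinates $(x, y)$, calls the extra coordinate $z$, and uses the hypothetical sample potential $A = (y^2, x^3)$. It derives the Euler-Lagrange equations of the geodesic Lagrangian $\tfrac12\hat g_{MN}\dot x^M\dot x^N$, uses the $z$-equation (conservation of (30.25)) to eliminate $\ddot z$, and compares the result with $\ddot x^\mu = (e/m)F^\mu{}_\nu\dot x^\nu$, identifying $e/m$ with the conserved quantity as in (30.28).

```python
import sympy as sp

tau = sp.symbols("tau")
x, y, z = (sp.Function(name)(tau) for name in ("x", "y", "z"))   # z plays the role of x^5
X, Y = sp.symbols("X Y")
A1_expr, A2_expr = Y**2, X**3            # hypothetical sample potential A = (y^2, x^3)
A1, A2 = (e.subs({X: x, Y: y}) for e in (A1_expr, A2_expr))
xd, yd, zd = (sp.diff(q, tau) for q in (x, y, z))

# Geodesic Lagrangian for ds^2 = dx^2 + dy^2 + (dz + A1 dx + A2 dy)^2 (flat base: assumption)
Lag = sp.Rational(1, 2) * (xd**2 + yd**2 + (zd + A1 * xd + A2 * yd)**2)


def euler_lagrange(L, q):
    """d/dtau (dL/dq') - dL/dq, treating q and q' as independent."""
    return sp.diff(sp.diff(L, sp.diff(q, tau)), tau) - sp.diff(L, q)


charge = zd + A1 * xd + A2 * yd          # the conserved quantity (30.25), equal to e/m by (30.28)
zdd = sp.solve(euler_lagrange(Lag, z), sp.diff(z, tau, 2))[0]   # z-equation: d(charge)/dtau = 0
F12 = (sp.diff(A2_expr, X) - sp.diff(A1_expr, Y)).subs({X: x, Y: y})

eq_x = euler_lagrange(Lag, x).subs(sp.diff(z, tau, 2), zdd)
eq_y = euler_lagrange(Lag, y).subs(sp.diff(z, tau, 2), zdd)

# Lorentz force in the flat base: x'' = (e/m) F_12 y',  y'' = -(e/m) F_12 x'
print(sp.simplify(eq_x - (sp.diff(x, tau, 2) - charge * F12 * yd)))   # 0
print(sp.simplify(eq_y - (sp.diff(y, tau, 2) + charge * F12 * xd)))   # 0
```

Changing the initial velocity $\dot z$ only changes the value of `charge`, which is the sense in which charged and neutral particles differ merely by their velocity in the fifth direction.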
But now let us take a look at the equations of motion following from the five-dimensional
Einstein-Hilbert action. These are, as we are looking at the vacuum equations, just the
Ricci-flatness equations $\hat R_{MN} = 0$. But looking back at (30.11) we see that these are
clearly not equivalent to the Einstein-Maxwell equations. In particular, $\hat R_{55} = 0$ imposes
the constraint
$$ \hat R_{55} = 0 \;\Rightarrow\; F_{\mu\nu}F^{\mu\nu} = 0 \,, \qquad (30.30) $$
and only then do the remaining equations $\hat R_{\mu\nu} = 0$, $\hat R_{\mu5} = 0$ become equivalent to the
Einstein-Maxwell equations (30.29).
What happened? Well, for one, taking variations and making a particular ansatz for
the field configurations in the variational principle are two operations that in general do
not commute. In particular, the Kaluza-Klein ansatz is special because it imposes the
condition $\hat g_{55} = 1$. Thus in four dimensions there is no equation of motion corresponding
to $\hat g_{55}$, whereas $\hat R_{55} = 0$, the additional constraint, is just that, the equation arising
from varying $\hat g_{55}$. Thus Einstein-Maxwell theory is not a consistent truncation of
five-dimensional General Relativity.
But now we really have to ask ourselves what we have actually achieved. We would like
to claim that the five-dimensional Einstein-Hilbert action unifies the four-dimensional
Einstein-Hilbert and Maxwell actions, but on the other hand we want to reject the
five-dimensional Einstein equations? But then we are not ascribing any dynamics to
the fifth dimension and are treating the Kaluza-Klein miracle as a mere kinematical,
or mathematical, or bookkeeping device for the four-dimensional fields. This is clearly
rather artificial and unsatisfactory.
There are some other unsatisfactory features as well in the theory we have developed so
far. For instance we demanded that there be no dependence on x5 , which again makes
the five-dimensional point of view look rather artificial. If one wants to take the fifth
dimension seriously, one has to allow for an x5 -dependence of all the fields (and then
explain later, perhaps, why we have not yet discovered the fifth dimension in every-day
or high energy experiments).
With these issues in mind, we will now revisit the Kaluza-Klein ansatz, regarding the
fifth dimension as real and exploring the consequences of this. Instead of considering
directly the effect of a full (i.e. not restricted by any special ansatz for the metric) five-
dimensional metric on four-dimensional physics, we will start with the simpler case of
a free massless scalar field in five dimensions.
with $x^5$ a coordinate on a circle with radius $L$. Now consider a massless scalar field $\hat\phi$
on $M_5$, satisfying the five-dimensional massless Klein-Gordon equation
$$ \hat\Box\hat\phi(x^\mu, x^5) = \hat\eta^{MN}\partial_M\partial_N\hat\phi(x^\mu, x^5) = 0 \,. \qquad (30.32) $$
As $x^5$ is periodic with period $2\pi L$, we can make a Fourier expansion of $\hat\phi$ to make the
$x^5$-dependence more explicit,
$$ \hat\phi(x^\mu, x^5) = \sum_n \phi_n(x^\mu)\, e^{inx^5/L} \,. \qquad (30.33) $$
Plugging this expansion into the five-dimensional Klein-Gordon equation, we find that
this turns into an infinite number of decoupled equations, one for each Fourier mode
$\phi_n$ of $\hat\phi$, namely
$$ (\Box - m_n^2)\,\phi_n = 0 \,. \qquad (30.34) $$
Here $\Box$ of course now refers to the four-dimensional d'Alembertian, and the mass term
$$ m_n^2 = \frac{n^2}{L^2} \qquad (30.35) $$
arises from the $x^5$-derivative $\partial_5^2$ in $\hat\Box$.
Thus we see that, from a four-dimensional perspective, a massless scalar field in five
dimensions gives rise to one massless scalar field in four dimensions (the harmonic or
constant mode on the internal space) and an infinite number of massive fields. The
masses of these fields, known as the Kaluza-Klein modes, have the behaviour $m_n \sim n/L$.
In general, this behaviour, an infinite tower of massive fields with mass $\sim 1/$(length scale),
is characteristic of massive fields arising from dimensional reduction from some
higher-dimensional space.
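The tower $m_n = |n|/L$ can also be seen numerically, without Fourier analysis, by diagonalising a discretised $-\partial_5^2$ on the circle. The NumPy sketch below uses a periodic second-order finite-difference Laplacian, whose low-lying eigenvalues approximate $n^2/L^2$ from (30.35); the grid size and the function name `kk_masses_squared` are our own choices.

```python
import numpy as np


def kk_masses_squared(L=1.0, N=256, n_levels=7):
    """Lowest eigenvalues of -d^2/d(x5)^2 on a circle of radius L (period 2 pi L).

    Uses a periodic second-order finite-difference Laplacian on N grid points;
    the eigenvalues approximate the Kaluza-Klein masses squared m_n^2 = n^2 / L^2.
    """
    h = 2 * np.pi * L / N
    lap = -2.0 * np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)
    lap[0, -1] = lap[-1, 0] = 1.0        # periodic identification x5 ~ x5 + 2 pi L
    lap /= h**2
    return np.sort(np.linalg.eigvalsh(-lap))[:n_levels]


m2 = kk_masses_squared(L=1.0)
print(np.round(m2, 3))   # approximately 0, 1, 1, 4, 4, 9, 9 (in units of 1/L^2)
```

Each nonzero level appears twice, corresponding to the modes $e^{\pm i|n|x^5/L}$, and the single zero eigenvalue is the massless constant mode; the agreement degrades for $|n|$ comparable to the number of grid points, as expected of a finite-difference approximation.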
Next, instead of looking at a scalar field on Minkowski space times a circle with the
product metric, let us consider the Kaluza-Klein metric,
$$ \hat\Box\hat\phi(x^\mu, x^5) = \hat g^{MN}\hat\nabla_M\partial_N\hat\phi(x^\mu, x^5) = 0 \,. \qquad (30.37) $$
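Rather than spelling this out in terms of Christoffel symbols, it is more convenient to
use (4.40) and recall that $\sqrt{\hat g} = \sqrt{g} = 1$ to write this as
$$ \begin{aligned} \hat\Box &= \partial_M(\hat g^{MN}\partial_N) \\ &= \partial_\mu\hat g^{\mu\nu}\partial_\nu + \partial_5\hat g^{5\mu}\partial_\mu + \partial_\mu\hat g^{\mu5}\partial_5 + \partial_5\hat g^{55}\partial_5 \\ &= \eta^{\mu\nu}\partial_\mu\partial_\nu + \partial_5(-\lambda A^\mu\partial_\mu) + \partial_\mu(-\lambda A^\mu\partial_5) + (1 + \lambda^2 A_\mu A^\mu)\partial_5\partial_5 \\ &= \eta^{\mu\nu}(\partial_\mu - \lambda A_\mu\partial_5)(\partial_\nu - \lambda A_\nu\partial_5) + (\partial_5)^2 \,. \end{aligned} \qquad (30.38) $$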
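Acting with this operator on the Fourier decomposition of $\hat\phi$, we evidently again get an
infinite number of decoupled equations, one for each Fourier mode $\phi_n$ of $\hat\phi$, namely
$$ \Big[\eta^{\mu\nu}\Big(\partial_\mu - i\frac{\lambda n}{L}A_\mu\Big)\Big(\partial_\nu - i\frac{\lambda n}{L}A_\nu\Big) - m_n^2\Big]\phi_n = 0 \,. \qquad (30.39) $$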
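The step from (30.38) to (30.39) can be checked with SymPy. Since the $\lambda$-dependent form of the metric is not reproduced here, the sketch assumes $\hat g_{\mu\nu} = \eta_{\mu\nu} + \lambda^2 A_\mu A_\nu$, $\hat g_{\mu5} = \lambda A_\mu$, $\hat g_{55} = 1$, whose inverse has the entries used in (30.38). It also assumes a two-dimensional Minkowski base $(t, x)$ with signature $(-,+)$ to keep the computation small. It applies $\partial_M(\hat g^{MN}\partial_N)$, valid because $|\det\hat g| = 1$, to a single mode $\phi_n e^{inx^5/L}$ and compares the result with (30.39).

```python
import sympy as sp

t, x, x5 = sp.symbols("t x x5")
lam, L = sp.symbols("lambda L", positive=True)
n = sp.symbols("n", integer=True)
A_t, A_x = sp.Function("A_t")(t, x), sp.Function("A_x")(t, x)
phi = sp.Function("phi")(t, x)
coords = [t, x, x5]

eta = sp.diag(-1, 1)                     # 2d Minkowski base, signature (-,+): an assumption
A_low = sp.Matrix([A_t, A_x])            # A_mu, independent of x5

# Assumed lambda-dependent Kaluza-Klein metric (its inverse is the one used in (30.38))
ghat = sp.zeros(3, 3)
ghat[:2, :2] = eta + lam**2 * A_low * A_low.T
ghat[:2, 2] = lam * A_low
ghat[2, :2] = lam * A_low.T
ghat[2, 2] = 1
ghat_inv = ghat.inv().applyfunc(sp.cancel)

# One Fourier mode (30.33). Since |det ghat| = 1, Box = d_M (ghat^{MN} d_N).
mode = sp.exp(sp.I * n * x5 / L)
Phi = phi * mode
box = sum(sp.diff(ghat_inv[M, N] * sp.diff(Phi, coords[N]), coords[M])
          for M in range(3) for N in range(3))

q = lam * n / L                          # e_n / hbar of the n-th mode, cf. (30.42)


def D(expr, mu):
    """Gauge-covariant derivative (d_mu - i q A_mu) acting on expr."""
    return sp.diff(expr, coords[mu]) - sp.I * q * A_low[mu] * expr


# Left-hand side of (30.39), with m_n^2 = n^2 / L^2 from (30.35)
lhs_30_39 = sum(eta[mu, nu] * D(D(phi, nu), mu)
                for mu in range(2) for nu in range(2)) - (n / L)**2 * phi
print(sp.simplify(sp.expand(box - mode * lhs_30_39)))   # 0
```

No gauge condition on $A_\mu$ is needed: in both expressions $\partial_\mu$ acts on $A^\mu$ in the same way.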
This shows that the non-constant ($n \neq 0$) modes are not only massive but also charged
under the gauge field $A_\mu$. Comparing the operator
$$ \partial_\mu - i\frac{\lambda n}{L}A_\mu \qquad (30.40) $$
with the standard form of the minimal coupling,
$$ \frac{\hbar}{i}\partial_\mu - eA_\mu \,, \qquad (30.41) $$
we learn that the electric charge $e_n$ of the $n$'th mode is given by
$$ \frac{e_n}{\hbar} = \frac{n\lambda}{L} \,. \qquad (30.42) $$
In particular, these charges are all integer multiples of a basic charge, $e_n = ne$, with
$$ e = \frac{\hbar\lambda}{L} = \frac{\sqrt{8\pi G_N}\,\hbar}{L} \,. \qquad (30.43) $$
Thus we get a formula for $L$, the radius of the fifth dimension,
$$ L^2 = \frac{8\pi G_N\hbar^2}{e^2} = \frac{8\pi G_N\hbar}{e^2/\hbar} \,. \qquad (30.44) $$
Restoring the velocity of light in this formula, and identifying the present U(1) gauge
symmetry with the standard gauge symmetry, we recognise here the fine structure
constant
$$ \alpha = e^2/4\pi\hbar c \approx 1/137 \,, \qquad (30.45) $$
Another way of saying this is that the fact that $L$ is so tiny implies that the masses $m_n$
are huge, not far from the Planck mass
$$ m_P = \sqrt{\frac{\hbar c}{G_N}} \approx 10^{-5}\,\mathrm{g} \approx 10^{19}\,\mathrm{GeV} \,. \qquad (30.48) $$
These would never have been spotted in present-day accelerators. Thus the massive
modes are completely irrelevant for low-energy physics, the non-constant modes can be
dropped, and this provides a justification for neglecting the x5 -dependence. However,
this also means that the charged particles we know (electrons, protons, . . . ) cannot
possibly be identified with these Kaluza-Klein modes.
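To get a feeling for the numbers, the following Python sketch evaluates (30.44) with $c$ restored. It assumes Heaviside-Lorentz units, $e^2 = 4\pi\alpha\hbar c$ as in (30.45), in which case $L^2 = 8\pi G_N\hbar^2/(e^2c^2) = 2\ell_P^2/\alpha$ with $\ell_P = (\hbar G_N/c^3)^{1/2}$. It uses CODATA 2018 values for the constants, and the rest energy of the first Kaluza-Klein mode is then $m_1c^2 = \hbar c/L$. These are our own estimates, not the text's.

```python
import math

# CODATA 2018 values (SI); alpha is the fine structure constant of (30.45)
hbar = 1.054571817e-34      # J s
c = 299792458.0             # m / s
G_N = 6.67430e-11           # m^3 kg^-1 s^-2
alpha = 1 / 137.035999084
GeV = 1.602176634e-10       # J

l_planck = math.sqrt(hbar * G_N / c**3)
# (30.44) with c restored and e^2 = 4 pi alpha hbar c (Heaviside-Lorentz units):
# L^2 = 8 pi G_N hbar^2 / (e^2 c^2) = 2 l_P^2 / alpha
L = math.sqrt(2 / alpha) * l_planck
m1_GeV = hbar * c / L / GeV          # m_1 c^2 = hbar c / L
mP_GeV = hbar * c / l_planck / GeV   # Planck energy m_P c^2, cf. (30.48)

print(f"L   ~ {L:.2e} m")           # a few times 1e-34 m
print(f"m_1 ~ {m1_GeV:.1e} GeV")     # roughly 7e17 GeV
print(f"m_P ~ {mP_GeV:.1e} GeV")     # roughly 1.2e19 GeV
```

The ratio $m_1/m_P = \ell_P/L = \sqrt{\alpha/2} \approx 0.06$ makes precise the statement that the Kaluza-Klein masses are not far from the Planck mass.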
The way modern Kaluza-Klein theories address this problem is by identifying the light
charged particles we observe with the massless Kaluza-Klein modes. One then requires
the standard spontaneous symmetry breaking mechanism to equip them with the small
masses required by observation. This still leaves the question of how these particles
should pick up a charge (as the zero modes are not only massless but also not charged).
This is solved by going to higher dimensions, with non-Abelian gauge groups, for which
massless particles are no longer necessarily singlets of the gauge group (they could e.g.
live in the adjoint).
We have seen above that a massless scalar field in five dimensions gives rise to a massless
scalar field plus an infinite tower of massive scalar fields in four dimensions. What
happens for other fields (after all, we are ultimately interested in what happens to the
five-dimensional metric)?
Retaining, for the same reasons as before, only the massless, i.e. x5 -independent, modes
we therefore obtain a theory involving one scalar field and one Abelian vector field
from pure Maxwell theory in five dimensions. The Lagrangian for these fields would be
(dropping all $x^5$-derivatives)
$$ F_{MN}F^{MN} = F_{\mu\nu}F^{\mu\nu} + 2F_{\mu5}F^{\mu5} \;\to\; F_{\mu\nu}F^{\mu\nu} + 2(\partial_\mu\phi)(\partial^\mu\phi) \,. \qquad (30.49) $$
dimensional reduction or Kaluza-Klein reduction. But the terminology is not uniform
here - sometimes the latter term is used to indicate the reduction including all the
massive modes. Also, in general ‘massless’ is not the same as ‘x5 -independent’, and
then Kaluza-Klein reduction may refer to keeping the massless modes rather than the
x5 -independent modes one retains in dimensional reduction.
Likewise, we can now consider what happens to the five-dimensional metric $\hat g_{MN}(x^L)$.
From a four-dimensional perspective, this splits into three different kinds of fields,
namely a symmetric tensor $\hat g_{\mu\nu}$, a covector $A_\mu = \hat g_{\mu5}$ and a scalar $\phi = \hat g_{55}$. As
before, these will each give rise to a massless field in four dimensions (which we interpret
as the metric, a vector potential and a scalar field) as well as an infinite number of
massive fields.
We see that, in addition to the massless fields we considered before, in the old Kaluza-
Klein ansatz, we obtain one more massless field, namely the scalar field φ. Thus, even
if we may be justified in dropping all the massive modes, we should keep this massless
field in the ansatz for the metric and the action. With this in mind we now return to
the Kaluza-Klein ansatz.
Let us once again consider pure gravity in five dimensions, i.e. the Einstein-Hilbert
action
$$ \hat S = \frac{1}{8\pi\hat G}\int \sqrt{\hat g}\, d^5x\, \hat R \,. \qquad (30.50) $$
Let us now parametrise the full five-dimensional metric as
where all the fields depend on all the coordinates xµ , x5 . Any five-dimensional metric
can be written in this way and we can simply think of this as a change of variables
The only thing that may require some explanation is the strange overall power of $\phi$. To
see why this is a good choice, assume that the overall power is $\phi^a$ for some $a$. Then for
$\sqrt{\hat g}$ one finds
$$ \sqrt{\hat g} = \phi^{5a/2}\,\phi^{1/2}\sqrt{g} = \phi^{(5a+1)/2}\sqrt{g} \,. \qquad (30.54) $$
On the other hand, for the Ricci tensor one has, schematically,
and therefore
$$ \hat R = \hat g^{\mu\nu}R_{\mu\nu} + \ldots = \phi^{-a}g^{\mu\nu}R_{\mu\nu} + \ldots = \phi^{-a}R + \ldots \,. \qquad (30.56) $$
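Combining (30.54) and (30.56), $\sqrt{\hat g}\,\hat R = \phi^{(5a+1)/2 - a}\sqrt{g}\,R + \ldots = \phi^{(3a+1)/2}\sqrt{g}\,R + \ldots$.
Thus, if one wants the five-dimensional Einstein-Hilbert action to reduce to the standard
four-dimensional Einstein-Hilbert action (plus other things), without any non-minimal
coupling of the scalar field $\phi$ to the metric, the exponent must vanish, $3a + 1 = 0$, i.e. one
needs to choose $a = -1/3$, which is the choice made in (30.51,30.53).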
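Making a Fourier-mode expansion of all the fields, plugging this into the Einstein-Hilbert
action
$$ \frac{1}{8\pi\hat G}\int \sqrt{\hat g}\, d^5x\, \hat R \,, \qquad (30.58) $$
integrating over $x^5$ and retaining only the constant modes $g_{(0)\mu\nu}$, $A_{(0)\mu}$ and $\phi_{(0)}$, one
obtains the action
$$ S = \int \sqrt{g}\, d^4x\, \Big[\frac{1}{8\pi G_N}R(g_{(0)\mu\nu}) - \frac14\phi_{(0)}F_{(0)\mu\nu}F_{(0)}^{\mu\nu} - \frac{1}{48\pi G_N}\phi_{(0)}^{-2}g_{(0)}^{\mu\nu}\partial_\mu\phi_{(0)}\partial_\nu\phi_{(0)}\Big] \,. \qquad (30.59) $$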
Here we have once again made the identifications (30.15). This action may not look as
nice as before, but it is what it is. It is at least generally covariant and gauge invariant,
as expected. We also see very clearly that it is inconsistent with the equation of
motion for $\phi_{(0)}$,
$$ \Box\log\phi_{(0)} = \tfrac34\, 8\pi G_N\,\phi_{(0)}F_{(0)\mu\nu}F_{(0)}^{\mu\nu} \,, \qquad (30.60) $$
to set $\phi_{(0)} = 1$, as this would imply $F_{(0)\mu\nu}F_{(0)}^{\mu\nu} = 0$, in agreement with our earlier
observations regarding $\hat R_{55} = 0$.
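The equation of motion (30.60) follows from varying (30.59) with respect to $\phi_{(0)}$, and this can be checked in a few lines of SymPy. To keep the sketch small it assumes a static configuration depending on a single spatial coordinate $s$, so that $g^{\mu\nu}\partial_\mu\phi\,\partial_\nu\phi \to (d\phi/ds)^2$ and $\Box \to d^2/ds^2$, and it treats $F_{(0)\mu\nu}F_{(0)}^{\mu\nu}$ as a given function `Fsq(s)`. It shows that the Euler-Lagrange expression is proportional to (30.60).

```python
import sympy as sp

# Static configuration depending on one spatial coordinate s (assumption)
s = sp.symbols("s")
G = sp.symbols("G_N", positive=True)
phi = sp.Function("phi")(s)
Fsq = sp.Function("Fsq")(s)              # stands for F_(0)mu nu F_(0)^mu nu

# phi-dependent part of the Lagrangian density in (30.59)
lagr = -sp.Rational(1, 4) * phi * Fsq - phi**-2 * sp.diff(phi, s)**2 / (48 * sp.pi * G)

# Euler-Lagrange expression dL/dphi - d/ds (dL/dphi')
el = sp.diff(lagr, phi) - sp.diff(sp.diff(lagr, sp.diff(phi, s)), s)

# (30.60) written as "left side minus right side"
eq_30_60 = sp.diff(sp.log(phi), s, 2) - sp.Rational(3, 4) * 8 * sp.pi * G * phi * Fsq
print(sp.simplify(el - eq_30_60 / (24 * sp.pi * G * phi)))   # 0
```

Working with $\log\phi_{(0)}$ is natural because the kinetic term $\phi^{-2}(\partial\phi)^2$ is just $(\partial\log\phi)^2$.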
However, the configuration g(0)µν = ηµν , A(0)µ = 0, φ(0) = 1 is a solution to the equations
of motion and defines the ‘vacuum’ or ground state of the theory. From this point of
view the zero mode metric, (30.53) with the fields replaced by their zero modes, i.e. the
Kaluza-Klein ansatz with the inclusion of φ, has the following interpretation: as usual
in quantum theory, once one has chosen a vacuum, one can consider fluctuations around
that vacuum. The fields g(0)µν , A(0)µ , φ(0) are then the massless fluctuations around the
vacuum and are the fields of the low-energy action. The full classical or quantum theory
will also contain all the massive and charged Kaluza-Klein modes.
Even though in certain respects the Abelian theory we have discussed above is atypi-
cal, it is rather straightforward to generalise the previous considerations from Maxwell
theory to Yang-Mills theory for an arbitrary non-Abelian gauge group. Of course, to
achieve that, one needs to consider higher-dimensional internal spaces, i.e. gravity in
4 + d dimensions, with a space-time of the form M4 × Md . The crucial observation is
that gauge symmetries in four dimensions arise from isometries (Killing vectors) of the
metric on Md .
Let the coordinates on $M_d$ be $x^a$, denote by $g_{ab}$ the metric on $M_d$, and let $K_i^a$, $i =
1, \ldots, n$ denote the $n$ linearly independent Killing vectors of the metric $g_{ab}$. These
generate the Lie algebra of the isometry group $G$ via the Lie bracket
Md could for example be the group manifold of the Lie group G itself, or a homogeneous
space G/H for some subgroup H ⊂ G.
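As a concrete instance of this setup (our choice of example, not one discussed in the text), take $M_d = S^2 = SO(3)/SO(2)$ with its round metric. The SymPy sketch below writes the three standard rotation generators in $(\theta, \varphi)$ coordinates, checks the Killing equation for each, and computes their Lie brackets, which close on the Lie algebra of $SO(3)$.

```python
import sympy as sp

th, ph = sp.symbols("theta phi")
coords = (th, ph)
g = sp.diag(1, sp.sin(th)**2)            # round metric on M_d = S^2 = SO(3)/SO(2)

# Rotation generators about the x, y, z axes, components (K^theta, K^phi)
K = {
    "x": [-sp.sin(ph), -sp.cot(th) * sp.cos(ph)],
    "y": [sp.cos(ph), -sp.cot(th) * sp.sin(ph)],
    "z": [sp.S.Zero, sp.S.One],
}


def lie_derivative_metric(V):
    """(L_V g)_ij = V^k d_k g_ij + g_kj d_i V^k + g_ik d_j V^k."""
    return sp.Matrix(2, 2, lambda i, j: sp.simplify(sum(
        V[k] * sp.diff(g[i, j], coords[k])
        + g[k, j] * sp.diff(V[k], coords[i])
        + g[i, k] * sp.diff(V[k], coords[j]) for k in range(2))))


def bracket(V, W):
    """Lie bracket [V, W]^i = V^j d_j W^i - W^j d_j V^i."""
    return [sp.simplify(sum(V[j] * sp.diff(W[i], coords[j]) - W[j] * sp.diff(V[i], coords[j])
                            for j in range(2))) for i in range(2)]


for name, V in K.items():
    print(name, lie_derivative_metric(V))   # zero matrix: each K_i is a Killing vector

# With these sign choices [K_i, K_j] = -epsilon_ijk K_k
print(bracket(K["x"], K["y"]), bracket(K["y"], K["z"]), bracket(K["z"], K["x"]))
```

With these conventions the brackets come out as $[K_i, K_j] = -\epsilon_{ijk}K_k$; reversing the sign of all three generators flips the sign of the structure constants.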
$$ d\hat s^2 = g_{\mu\nu}dx^\mu dx^\nu + g_{ab}(dx^a + K_i^a A^i_\mu dx^\mu)(dx^b + K_j^b A^j_\nu dx^\nu) \,. \qquad (30.62) $$
Note the appearance of fields with the correct index structure to act as non-Abelian
gauge fields for the gauge group $G$, namely the $A^i_\mu$. Again these should be thought of
as fluctuations of the metric around its ‘ground state’, M4 × Md with its product metric
(gµν , gab ).
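i.e.
$$ \delta x^a = f^i(x^\mu)K_i^a(x^b) \,. \qquad (30.64) $$
$$ \delta\hat g_{\mu a} = L_V\hat g_{\mu a} \qquad (30.65) $$
can be seen to imply
$$ \delta A^i_\mu = D_\mu f^i \equiv \partial_\mu f^i - f_{jk}{}^i A^j_\mu f^k \,, \qquad (30.66) $$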
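i.e. precisely an infinitesimal non-Abelian gauge transformation. The easiest way to see
this is to use the form of the Lie derivative not in its covariant form,
$$ L_V\hat g_{\mu a} = \hat\nabla_\mu V_a + \hat\nabla_a V_\mu \,, \qquad (30.67) $$
but in the form (with $V^\mu = 0$)
$$ L_V\hat g_{\mu a} = V^c\partial_c\hat g_{\mu a} + \partial_\mu V^c\,\hat g_{ca} + \partial_a V^c\,\hat g_{\mu c} \,. \qquad (30.68) $$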
Inserting the definitions of $\hat g_{\mu a}$ and $V^a$, using the fact that the $K_i^a$ are Killing vectors
of the metric $g_{ab}$ and the relation (30.61), one finds
$$ L_{YM} \sim F^i_{\mu\nu}F^{j\,\mu\nu}K_i^a K_j^b\, g_{ab} \qquad (30.70) $$
The problem with this scenario (even before worrying about the inclusion of scalar
fields, of which there will be plenty in this case, one for each component of $g_{ab}$) is
that the four-dimensional space-time cannot be chosen to be flat. Rather, it must
have a huge cosmological constant. This arises because the dimensional reduction of
the $(4 + d)$-dimensional Einstein-Hilbert Lagrangian $\hat{R}$ will also include a contribution
from the scalar curvature $R_d$ of the metric on $M_d$. For a compact internal space with
non-Abelian isometries this scalar curvature is non-zero and will therefore lead to an
effective cosmological constant in the four-dimensional action. This cosmological constant
could be cancelled 'by hand' by introducing an appropriate cosmological constant
of the opposite sign into the $(4 + d)$-dimensional Einstein-Hilbert action, but this looks
rather contrived.
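The sketch below estimates the size of this term. It rests on simplifying assumptions made only for this purpose: no gauge fields, a product metric, and a homogeneous internal space, so that $R_d$ is constant. For definiteness it takes $M_d$ to be a round sphere $S^d$ of radius $r$. The scalar curvature of a product metric is the sum of the scalar curvatures of its factors. Integrating over $M_d$, with volume $V_d$, therefore gives

```latex
\hat{R} = R_4 + R_d , \qquad
R_d = \frac{d(d-1)}{r^2} \quad (M_d = S^d) , \qquad
\int d^{4+d}x \sqrt{\hat{g}}\, \hat{R}
 = V_d \int d^4x \sqrt{g}\, \left( R_4 + R_d \right) .
```

Compare this with a four-dimensional action normalised as $\int d^4x \sqrt{g}\,(R - 2\Lambda)$. At the level of the action, the constant $R_d$ then acts as an effective cosmological constant $\Lambda_{\rm eff} = -R_d/2$, whose magnitude $\sim d(d-1)/r^2$ is enormous when $r$ is small enough for the internal space to be unobservable.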
Nevertheless, this and other problems have not stopped people from looking for 'realistic'
Kaluza-Klein theories giving rise to the standard model gauge group in four dimensions.
Of course, in order to get the standard model action or something resembling it, fermions
need to be added to the $(4 + d)$-dimensional action.
An interesting observation in this regard is that the lowest possible dimension of a
homogeneous space with isometry group $G = SU(3) \times SU(2) \times U(1)$ is seven, so that the
dimension of space-time is eleven. This arises because the largest subgroup $H$ of $G$ for
which $G$ still acts effectively on $G/H$, giving rise to the smallest-dimensional homogeneous
space $G/H$ of $G$, is $SU(2) \times U(1) \times U(1)$. As the dimension of $G$ is $8 + 3 + 1 = 12$
and that of $H$ is $3 + 1 + 1 = 5$, the dimension of $G/H$ is $12 - 5 = 7$. This is intriguing
because eleven is also the highest dimension in which supergravity exists (in higher
dimensions, supersymmetry would require the existence of particles with spin $> 2$). That,
plus the hope that supergravity would have better quantum behaviour than ordinary
gravity, led to an enormous amount of activity on Kaluza-Klein supergravity in the
early 1980s.
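The dimension count in this argument is simple arithmetic. The sketch below repeats it using $\dim SU(n) = n^2 - 1$ and $\dim U(1) = 1$. It does not check the group-theoretic claim that $H$ is the largest admissible subgroup, which is taken from the text. The names are illustrative.

```python
def dim_su(n):
    """Dimension of the Lie group SU(n), i.e. n^2 - 1."""
    return n * n - 1

DIM_U1 = 1  # dimension of U(1)

# Standard model gauge group G = SU(3) x SU(2) x U(1)
dim_G = dim_su(3) + dim_su(2) + DIM_U1
# Subgroup H = SU(2) x U(1) x U(1)
dim_H = dim_su(2) + DIM_U1 + DIM_U1
# Internal homogeneous space G/H and total space-time dimension 4 + dim(G/H)
dim_internal = dim_G - dim_H
dim_spacetime = 4 + dim_internal

print(dim_G, dim_H, dim_internal, dim_spacetime)  # 12 5 7 11
```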
Unfortunately, it turned out both that supergravity, too, is sick at the quantum level
and that it is impossible to obtain a chiral fermion spectrum in four dimensions
from pure gravity plus spinors in $4 + d$ dimensions. One way around the latter problem
is to include explicit Yang-Mills fields already in $4 + d$ dimensions, but that appeared
to defeat the purpose of the whole Kaluza-Klein idea.
Today, the picture has changed: supergravity is regarded as a low-energy approximation
to string theory, which is believed to give a consistent description of quantum
gravity. These string theories typically live in ten dimensions, and thus one needs
to 'compactify' the theory on a small internal six-dimensional space, much as in the
Kaluza-Klein idea. Even though non-Abelian gauge fields now typically do not arise
from Kaluza-Klein reduction but rather from explicit gauge fields in ten dimensions (or from
objects called D-branes), in all other respects Kaluza's old idea is alive, doing very well,
and an indispensable part of the toolkit of modern theoretical high energy physics.
THE END