
Chapter 4

The Postulates of Quantum Mechanics

4.1 The Superposition Principle


Most of the fundamental concepts that are at the basis of quantum mechanics
have already emerged during the discussion of the previous chapter. Now the
problem is to formalize these concepts within a coherent structure, namely to
go ‘from words to facts’. We will try to proceed in this process of formalization
both by making good use of the already discussed examples and by arriving at
the formal aspects of the theory as much as possible from the physical point
of view (for example, this is the way we shall succeed in establishing that
observable quantities are represented by operators on the space of states).
In Sect. 3.4 we have already recognized the necessity to describe the states
of a system by means of the vectors of a complex vector space: this is the first
fundamental principle at the basis of quantum mechanics, known as the
Superposition Principle: the states of a system are represented by the
elements (vectors) of a vector space H over the complex field. Vectors
proportional to one another (by a nonzero complex factor) represent the same
state. Therefore the states are in correspondence with the rays of H.
Moreover, in Sect. 3.5, we have seen that – in order to give a predictive
character to the theory – it is necessary that the space H be endowed with a
Hermitian scalar product.
The dimension of H depends on the system under consideration: normally
it is infinite.
The space H, which we shall assume complete, is therefore a Hilbert space.
We also assume that H is separable (namely, it admits a countable orthonormal
basis).
As far as the (infinite) dimension and the separability are concerned, these
– in the case of one or more particles – will be deduced from the quantization
postulate to be introduced in the sequel.
One can now ask oneself whether the correspondence between states of the system
and rays of H is a bijective one, namely whether to any vector (or ray) of H
there always corresponds a state of the system. It is likely that this
assumption is not true and not even necessary; on this rather marginal point we
shall come back later and we shall see that, probably, it is sufficient to
assume that the physical states are in correspondence with a dense,
algebraically closed (i.e. closed under finite linear combinations) subset of H.
However there is no doubt that the hypothesis that H be a complete space is
extremely useful from the mathematical point of view – indeed it is a hypothesis
one cannot simply give up: that is why we shall assume it is fulfilled. This
(probably) means that the vectors representative of physical states are immersed
in a larger (closed) ambient space. We have used a tentative form because,
usually, one takes for granted that the correspondence states/rays is bijective.

© Springer International Publishing Switzerland 2016
L.E. Picasso, Lectures in Quantum Mechanics, UNITEXT for Physics,
DOI 10.1007/978-3-319-22632-3_4

The vectors of H will be denoted by the symbol | · · · ⟩ , called “ket” (Dirac
notation). The scalar product between two vectors | A ⟩ and | B ⟩ is denoted
by the ‘bra’-‘ket’:
$$ \langle B | A \rangle = \langle A | B \rangle^{*} \tag{4.1} $$
and is linear in | A ⟩ and antilinear in | B ⟩ ; the scalar product between
α | A ⟩ and β | B ⟩ ( α, β complex numbers) is β* α ⟨ B | A ⟩ : it is the same
as saying that one takes the scalar product between the “ket” α | A ⟩ and the
“bra” β* ⟨ B | (by the way we note that | A ⟩ and α | A ⟩ represent the same
state).
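The properties just listed can be checked numerically. The following is only a minimal sketch, assuming kets are stored as lists of complex components in some orthonormal basis; the function name `braket` and the sample numbers are illustrative choices of ours, not part of the text.

```python
def braket(b, a):
    """Scalar product <b|a>: antilinear in the bra b, linear in the ket a."""
    return sum(bk.conjugate() * ak for bk, ak in zip(b, a))

# Two illustrative kets in a 2-dimensional space (arbitrary numbers).
A = [1 + 2j, 0.5j]
B = [2 - 1j, 3 + 0j]
alpha, beta = 0.3 + 0.4j, 1 - 2j

# Eq. (4.1): <B|A> equals the complex conjugate of <A|B>.
lhs = braket(B, A)
rhs = braket(A, B).conjugate()

# Scalar product between alpha|A> and beta|B>: beta* alpha <B|A>.
scaled = braket([beta * x for x in B], [alpha * x for x in A])
expected = beta.conjugate() * alpha * braket(B, A)

print(abs(lhs - rhs) < 1e-12, abs(scaled - expected) < 1e-12)
```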
The important physical aspect of the superposition principle is that it
expresses the fact that the states, being represented by vectors, may ‘interfere’
with each other: if | A ⟩ and | B ⟩ represent two states, then
$$ \alpha\,|A\rangle + \beta\,|B\rangle \quad \text{for any } \alpha, \beta \in \mathbb{C} \tag{4.2} $$
still represents a state different from both | A ⟩ and | B ⟩ – just as for the
polarization states of light represented by the vectors (3.12) and, in general,
for waves.
Note that the states (4.2) are (varying α and β) ∞² , not ∞⁴ (α and β are
complex numbers!) because α and β can be multiplied by the same complex
factor: α | A ⟩ + β | B ⟩ and α′ | A ⟩ + β′ | B ⟩ represent the same state if
and only if α : α′ = β : β′ .
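The proportionality condition can be turned into a small test. This is a sketch under our own assumptions: | A ⟩ and | B ⟩ are taken to be linearly independent, neither pair of coefficients is (0, 0), and the ratio condition is rewritten in the division-free form α β′ = α′ β; the name `same_state` is hypothetical.

```python
def same_state(c1, c2, tol=1e-12):
    """True if alpha|A> + beta|B> and alpha'|A> + beta'|B> lie on the same ray.

    Assumes |A> and |B> are linearly independent and that neither pair of
    coefficients is (0, 0). The condition alpha : alpha' = beta : beta' is
    tested in the division-free form alpha * beta' == alpha' * beta.
    """
    (alpha, beta), (alpha_p, beta_p) = c1, c2
    return abs(alpha * beta_p - alpha_p * beta) < tol

# (1, i) and (2i, -2) differ by the common factor 2i: same state.
print(same_state((1, 1j), (2j, -2)))
# (1, i) and (1, -i) differ in the relative phase: different states.
print(same_state((1, 1j), (1, -1j)))
```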
In the example discussed in Sect. 3.3 | A ⟩ and | B ⟩ represent two states
of the photon: | A ⟩ ( | B ⟩ ) represents the state one has when only the hole A
(B) is open: when both holes are open the state of the photon is represented
by
$$ |C\rangle = |A\rangle + |B\rangle . \tag{4.3} $$
The vague expression used in Sect. 3.3 for the state C that ‘has to do with’
the states A and B is now translated into a precise mathematical form: the
vector | C ⟩ that represents the state C of the photon is a linear combination
of the vectors | A ⟩ and | B ⟩ that represent the states of the photon when
either only hole A or only hole B is open.
Sometimes, in order to keep the terminology quick enough, we shall simply say
‘the state | A ⟩ ’ instead of ‘the state represented by the vector | A ⟩ ’. The
abbreviation is not appropriate for two reasons: (i) there is no correspondence
between states and vectors, only between states and rays; (ii) one thing is the
state of the system, another thing is the way we represent it.
Always referring to the Young experiment (or to its analogue with neutrons)
we now ask how the ∞² states α | A ⟩ + β | B ⟩ can be physically
realized: we may change at will the ratio |α/β| by varying, by means of
diaphragms, the sizes of the two holes (or – in the experiment with neutrons
of Fig. 3.4 – by making use, for the semi-transparent mirror s1 , of a crystal
with suitable reflection and transmission coefficients); the relative phase
between α and β can be varied by putting (in the Young experiment) a small
glass slab of suitable thickness in front of one of the holes (it will have the
same effect as either the glass slab in the Mach–Zehnder interferometer or the
aluminum wedge in the neutron interferometer of Fig. 3.4). If we now send
many photons and then develop the photographic plate, we see that, in the first
case, according to the value of |α/β| , the contrast between the bright and the
dark fringes of the interference pattern changes; and if either α → 0 or
β → 0 , the latter becomes the diffraction pattern of the open hole, whereas in
the second case (i.e. in presence of the phase shifter) we see that the whole
interference pattern has undergone a translation. There exist, indeed, ∞²
interference patterns.
In the case of the states of polarization of light, we know that the ∞²
states corresponding to the vectors (3.12) are the ∞² states of elliptic
polarization of the photons.
In Sect. 3.3 we have said that the state of a system is defined by the way
in which the system is prepared: the postulates we will introduce in the next
sections will teach us how this information is codified in the vector | A ⟩ that
represents the state.

4.2 Observables
We call observables the quantities that can be measured on a system: in most
cases one has to do with the same quantities that can be measured according
to classical physics (an exception is provided by the spin, that has no classical
analogue). For example, in the case the system is a particle, energy, angular
momentum, the components qi , i = 1, 2, 3 of the position of the particle, the
components of its linear momentum etc. are observables (with some proviso
on the last two): in general the functions f (q, p) .
We already know the fundamental role played in quantum mechanics by
the process of measurement; therefore we think of associating one or more
instruments of measurement with each observable: for example we associate
a Heisenberg microscope with the observable ‘position q1 ’, a magnetic field
spectrometer with the observable ‘momentum | p |’, etc.
From now on, by the term ‘observable’ we shall mean both the quantity
that can be measured and an instrument suitable for measuring it.
Let ξ be an observable and ξ1 , ξ2 , · · · ξi · · · the possible results of the
measurements of ξ on the system. The real numbers ξi are called the eigen-
values of ξ (we shall see later that the denumerability of the eigenvalues
follows from the separability of H).

For example, the possible results of the measurements we have cited in the
previous chapter, when either dealing with the Young experiment modified by
the presence of mobile mirrors (Sect. 3.3) or with the experiment involving
the birefringent crystal (Sect. 3.5), were two (either one of the mirrors was
hit), and we can couple a device to our apparatus exhibiting a number on its
display, e.g. the numbers +1 and −1, depending on the result. In the latter
case the eigenvalues are two: ξ1 = +1, ξ2 = −1 .
More: the energies allowed for a system, i.e. its energy levels, are the
eigenvalues of the observable ‘energy’.
It will be a specific task of the theory to specify which are the eigenvalues
relative to each observable. Indeed one of the main problems of quantum
mechanics precisely is the determination of the eigenvalues for the different
observables and, among these, energy will have a privileged role.
In general, if the observable ξ is measured on a system in the state | A ⟩ ,
the result of the measurement is not a priori determined (see the experiments
with polaroid sheets), but all the numbers ξi can be found as result, with
probabilities pi that depend on the state | A ⟩ . In other words, if the
measurement of ξ is made many times on the system, which each time is in the
state | A ⟩ (this means that many copies of the system, all prepared in the
same way, are at one’s disposal), the different results ξi will be obtained with
frequencies proportional to pi .
We shall call eigenstates of ξ those particular states on which the result
of measurements of ξ is determined a priori, therefore it is always the same
(example: the two rectilinear polarization states of a photon, respectively par-
allel and orthogonal to the optical axis of a birefringent crystal, are eigenstates
of the observable associated with the crystal) and we shall call eigenvectors
of ξ the vectors of H that represent the eigenstates of ξ (the abuse made in
using the words ‘eigenvectors’ and ‘eigenvalues’, that have a precise meaning
in the framework of linear algebra, will be justified later).
An eigenstate of ξ corresponding to the eigenvalue ξi is a state
for which the result of the measurement always is ξi (so to any eigenstate
there corresponds one of the eigenvalues; we shall shortly see that the
converse is also true): therefore for it pi = 1 whereas, if j ≠ i , pj = 0 ; a
representative vector of it is denoted by | ξi ⟩ and is called an eigenvector
of ξ corresponding to ξi (we shall often improperly say ‘the eigenstate
| ξi ⟩ ’ instead of ‘the eigenstate of ξ represented by the vector | ξi ⟩ ’).
We postulate that:
If ξ is measured on a system and the result is ξi , immediately after the
measurement the system is in an eigenstate of ξ corresponding to ξi .
So, if the system is in a state | A ⟩ and a measurement of ξ is made,
we cannot know a priori which result we will obtain and in which state the
system will be after the measurement, but when the measurement has been
made and has given the result ξi , we know that the system is in an eigenstate
of ξ corresponding to the eigenvalue ξi : therefore, if immediately after the
first measurement of ξ a second measurement of the same observable is made,
certainly the same result will be obtained (and therefore to any eigenvalue
there corresponds at least one eigenvector).
So, in general, a measurement perturbs the state of the system: | A ⟩ →
| ξi ⟩ (an exception is provided by the case in which | A ⟩ itself is an
eigenstate of ξ). Note that this postulate expresses in general what we have
seen about the birefringent crystal in the previous chapter.
It is easy to realize that in many actual measurement processes this postulate
is contradicted. It must therefore be understood in the following sense: it
defines which are the ‘ideal’ instruments of measurement that correspond to
observable quantities, fixing their behaviour (so, in this sense, it is a
definition rather than a postulate). In addition, it postulates that for any
observable there exists at least one ‘ideal’ instrument suitable to measure it.
There may exist one or more eigenstates corresponding to a given eigen-
value ξi of the observable ξ: in the first case we will say that the eigenvalue
ξi is nondegenerate , while in the second we will say that the eigenvalue ξi
is degenerate . The physical importance of this definition is the following: if
a measurement of ξ yields the eigenvalue ξi as result, and if this is nonde-
generate, then we know in which state the system is after the measurement.
In the contrary case, if ξi is a degenerate eigenvalue, we only know that af-
ter the measurement the system is in an eigenstate of ξ corresponding to the
eigenvalue ξi , but we do not know which one.
A nondegenerate observable is an observable such that to any of its
eigenvalues there corresponds only one eigenstate, i.e. all of its eigenvalues are
nondegenerate; in most cases the observables are degenerate.
According to the above discussion, a measurement of a nondegenerate
observable always completely determines the state of the system after the
measurement; if instead, in correspondence with the eigenvalue found as a
consequence of the measurement there exists more than one eigenstate (de-
generate eigenvalue), the information on the state of the system after the
measurement is only partial.
We shall see in Sect. 4.4 that a postulate, known as the von Neumann
postulate, will enable us to determine the state immediately after the
measurement even when the result is a degenerate eigenvalue. For the time being,
consistently with this postulate (that we shall be able to enunciate only after
proving that the set of the eigenvectors of an observable, corresponding to a
given eigenvalue, is a linear manifold), we may assume that:
If a measurement of the observable ξ is made on the system in the state | A ⟩ ,
the state after the measurement is univocally determined by the initial state
| A ⟩ and by the found eigenvalue ξi .
We shall see that the von Neumann postulate will allow us to state that
the ‘arrival state’, within the set of all the possible eigenstates corresponding
to the found degenerate eigenvalue, is that for which the initial state has
undergone the least possible perturbation.
Example: the birefringent crystal is a nondegenerate observable if the system
is the photon regardless of its state of motion (i.e. the state space is H₂ );
if instead the system is ‘all’ the photon, then it is a degenerate observable:
indeed it allows us to determine the polarization state of the photons, not
their state of motion. In any event, the von Neumann postulate allows us to
state that if we take a photon in a well determined state (for example, one
that propagates in a given direction with energy E = h ν and is linearly
polarized at an angle ϑ with respect to the optical axis of a birefringent
crystal), after the measurement has established whether the photon emerges in
the ordinary or in the extraordinary ray, the photon is in a well determined
state that depends only on the initial state and on the result of the
measurement: the polarization state is either parallel or orthogonal to the
axis of the crystal, according to whether the photon has emerged respectively
either in the extraordinary or in the ordinary ray and, if our instrument is
‘ideal’ (in the above said meaning), the state of motion (i.e. energy and
direction of motion) has been left unchanged.
A device that allows us to determine both the polarization and the motion
state of the photons (crystal + prism + etc.) is instead a nondegenerate
observable.
Provisionally we take the following statement as a further postulate, even
if it will be seen to be a consequence of the other postulates we will enunciate
in the sequel:
Any state is eigenstate of some nondegenerate observable.
From the point of view of physics this means that any state can be ‘prepared’
by the measurement of a suitable observable: i.e. one uses the observable as
a filter (much as the polaroid or the birefringent crystal to prepare linearly
polarized photons) by making measurements on the system and accepting only
the state that is obtained when the measurement yields the desired result.
It therefore emerges that a type of information sufficient to characterize
the state of a system consists in knowing which nondegenerate observable has
been measured on the system and the result of such a measurement: it is not
necessary to know all the past history of the system.

4.3 Transition Probabilities


Let ξ be an observable. We know that a measurement of ξ on a system in
the state | A ⟩ will give one of the eigenvalues ξi as a result and that, owing
to the measurement process, the system makes a transition to an eigenstate
| ξi ⟩ . Therefore the probability pi of finding a given eigenvalue ξi as a
result is also called transition probability from | A ⟩ to | ξi ⟩ . In general,
let | A ⟩ and | B ⟩ be the representative vectors of two states; the
probability that, owing to a measurement on the system in the state | A ⟩ , the
system goes into the state | B ⟩ is called transition probability from | A ⟩ to
| B ⟩ .
One postulates that the above transition probability P( | A ⟩ → | B ⟩ ) does
not depend on the (‘ideal’) instrument used to perform the measurement and
is given by:
$$ P\big(|A\rangle \to |B\rangle\big) = \frac{|\langle B | A \rangle|^{2}}{\langle A | A \rangle\,\langle B | B \rangle}\,\cdot \tag{4.4} $$
Definition (4.4) is a ‘good definition’: indeed

• P does not depend on the arbitrary factor present in the correspondence
states ↔ vectors: it does depend, as it must, on the states, not on the
vectors chosen to represent them. Indeed, if α and β are (nonzero) complex
numbers, α | A ⟩ and β | B ⟩ represent the same states as | A ⟩ and | B ⟩ and
one has
$$ P\big(\alpha|A\rangle \to \beta|B\rangle\big) = \frac{|\alpha|^{2}\,|\beta|^{2}\,|\langle B | A \rangle|^{2}}{|\alpha|^{2}\,\langle A | A \rangle\;|\beta|^{2}\,\langle B | B \rangle} = P\big(|A\rangle \to |B\rangle\big) . $$

• 0 ≤ P ≤ 1 : it follows from the Schwarz inequality.

Note that P( | A ⟩ → | B ⟩ ) = 0 if and only if | A ⟩ and | B ⟩ are orthogonal
to each other: there occur no transitions among states (represented by vectors)
orthogonal to one another; whereas P = 1 if and only if | A ⟩ and | B ⟩
represent the same state.
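Usually it is convenient to represent the states by vectors normalized to 1:
$$ \langle A | A \rangle = 1 = \langle B | B \rangle $$
in which case (4.4) reads:
$$ P\big(|A\rangle \to |B\rangle\big) = |\langle B | A \rangle|^{2} \quad \text{(valid for normalized vectors).} \tag{4.5} $$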
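The properties of (4.4) lend themselves to a quick numerical check. What follows is a minimal sketch, assuming kets are lists of complex components in an orthonormal basis; `braket` and `transition_probability` are names chosen here for illustration and the vectors are arbitrary.

```python
def braket(b, a):
    """Scalar product <b|a>: antilinear in b, linear in a."""
    return sum(bk.conjugate() * ak for bk, ak in zip(b, a))

def transition_probability(a, b):
    """Eq. (4.4): |<B|A>|^2 / (<A|A> <B|B>), valid for unnormalized vectors."""
    return abs(braket(b, a)) ** 2 / (braket(a, a).real * braket(b, b).real)

A = [1, 1j]
B = [1, 0]

p = transition_probability(A, B)
# Rescaling the representative vectors must not change the probability.
p_rescaled = transition_probability([(2 - 1j) * x for x in A], [3j * x for x in B])
# Orthogonal vectors: no transition.
p_orthogonal = transition_probability([1, 0], [0, 1])
# Vectors on the same ray: the transition is certain.
p_same_ray = transition_probability(A, [2j * x for x in A])

print(p, p_rescaled, p_orthogonal, p_same_ray)
```

Working with (4.4) rather than (4.5) keeps the function safe to call on vectors that have not been normalized beforehand.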

The transition probability between two states | A ⟩ and | B ⟩ is, from the
operational point of view, always well defined inasmuch as we have postulated
that for any state | B ⟩ there always exists (at least) one observable that has
such a state as an eigenstate corresponding to a nondegenerate eigenvalue.
In any event, even if ξi is a degenerate eigenvalue of ξ, the probability
pi that a measurement of ξ on a state | A ⟩ gives ξi as a result is given by
|⟨ A | ξi ⟩|² (normalized vectors), where | ξi ⟩ is that eigenstate of ξ that,
thanks to the von Neumann postulate, is univocally determined by the initial
state | A ⟩ and by the eigenvalue ξi .
Example: let us consider a photon in the linear polarization state
$$ |e_\vartheta\rangle = \cos\vartheta\,|e_1\rangle + \sin\vartheta\,|e_2\rangle . $$
The probability that, owing to a measurement (e.g. by means of a birefringent
crystal) it makes a transition to the state | e1 ⟩ , is given by
$$ P\big(|e_\vartheta\rangle \to |e_1\rangle\big) = |\langle e_1 | e_\vartheta \rangle|^{2} = \cos^{2}\vartheta . $$
Indeed, the two vectors | e1 ⟩ and | e2 ⟩ are orthogonal to each other (as
elements of H₂ ) because a measurement on photons in the state | e1 ⟩ will
never give photons in the state | e2 ⟩ . One is back to Malus’ law.

4.4 Consequences and von Neumann Postulate


Let us now analyze some consequences of the postulates so far introduced.
We will firstly discuss the case of
Nondegenerate observables
Let ξ be a nondegenerate observable, ξi its eigenvalues and | ξi ⟩ its
eigenstates that we will assume normalized to 1.
The quantity |⟨ ξi | ξj ⟩|² is the probability that, owing to a measurement,
e.g. of the observable ξ, a transition from | ξi ⟩ to | ξj ⟩ or vice versa has
taken place. But, if i ≠ j , this probability is 0 because a measurement of ξ
on the state | ξi ⟩ will always give ξi as a result, never ξj . Therefore:
$$ \langle \xi_i | \xi_j \rangle = \delta_{ij} = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j . \end{cases} \tag{4.6} $$
Namely: the normalized eigenvectors of an observable ξ are an orthonormal
system of vectors. Let us demonstrate that such a system is also complete.
We will reason by contradiction: if a nonzero vector | A ⟩ orthogonal to all
the | ξi ⟩ existed, one would have pi = P( | A ⟩ → | ξi ⟩ ) = |⟨ ξi | A ⟩|² = 0
for any i, which is absurd since, by definition of probability, Σi pi = 1 .
Therefore:
the eigenvectors of the (nondegenerate) observable ξ form an orthonormal
basis.
Any vector | A ⟩ of the space H can therefore be expanded in series of the
vectors of the basis (Fourier series):
$$ |A\rangle = \sum_{i=1}^{\infty} a_i\,|\xi_i\rangle . \tag{4.7} $$
The coefficients ai of the Fourier series are calculated by taking the scalar
product of both sides of (4.7) with the generic (normalized) vector belonging
to the basis:
$$ \langle \xi_i | A \rangle = \sum_{j=1}^{\infty} a_j\,\langle \xi_i | \xi_j \rangle = a_i . \tag{4.8} $$
Equation (4.7) can then be rewritten by substituting the expression ⟨ ξi | A ⟩
for the ai :
$$ |A\rangle = \sum_{i=1}^{\infty} |\xi_i\rangle\,\langle \xi_i | A \rangle . \tag{4.9} $$
Note that, thanks to (4.6), ⟨ A | A ⟩ (the squared norm of the vector | A ⟩ ) is
given by
$$ \langle A | A \rangle = \sum_{i,j} a_i^{*}\,a_j\,\langle \xi_i | \xi_j \rangle = \sum_{i=1}^{\infty} |a_i|^{2} < \infty \tag{4.10} $$
so, only if Σi |ai|² < ∞ , does (4.7) define a vector belonging to H.
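Equations (4.8)–(4.10) can be illustrated in a finite-dimensional setting. The sketch below assumes a two-dimensional space and a hypothetical orthonormal eigenbasis (the two circular-polarization-like vectors); the names `braket`, `xi`, `coeffs` and `rebuilt` are ours.

```python
import math

def braket(b, a):
    """Scalar product <b|a>: antilinear in b, linear in a."""
    return sum(bk.conjugate() * ak for bk, ak in zip(b, a))

s = 1 / math.sqrt(2)
# Hypothetical orthonormal eigenbasis |xi_1>, |xi_2> of a nondegenerate observable.
xi = [[s, s * 1j], [s, -s * 1j]]

A = [0.6, 0.8j]  # a normalized vector to be expanded

# Eq. (4.8): a_i = <xi_i|A>.
coeffs = [braket(e, A) for e in xi]

# Eq. (4.9): |A> = sum_i |xi_i><xi_i|A>, rebuilt component by component.
rebuilt = [sum(c * e[k] for c, e in zip(coeffs, xi)) for k in range(len(A))]

# Eq. (4.10): <A|A> = sum_i |a_i|^2.
norm_sq = braket(A, A).real
sum_sq = sum(abs(c) ** 2 for c in coeffs)

print(coeffs, rebuilt, norm_sq, sum_sq)
```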
If the vector | A ⟩ is normalized, the coefficients ai have a rather direct
meaning: |ai|² is the transition probability pi from | A ⟩ to | ξi ⟩ (and from
(4.10) one has Σi pi = 1 ):
$$ p_i = |\langle A | \xi_i \rangle|^{2} = |a_i|^{2} . \tag{4.11} $$
If the | ξi ⟩ are normalized, but | A ⟩ is not, then
$$ p_i = \frac{|a_i|^{2}}{\langle A | A \rangle} . $$
The fact that only the |ai|² and not directly the ai have been given a physical
meaning should not lead one to think that only the absolute value of the ai is
endowed with a physical meaning: once the basis is fixed (i.e. once the vectors
| ξi ⟩ are fixed), changing the ai by a phase factor, ai → exp(i ϕi ) ai ,
amounts to changing the state, unless the ϕi are all equal to one another, in
which case | A ⟩ → exp (i ϕ) | A ⟩ , and the state is not changed.
Example: the states of polarization of a photon, represented by the vectors
$$ |e_\vartheta\rangle = \cos\vartheta\,|e_1\rangle + \sin\vartheta\,|e_2\rangle , \qquad |e_{\vartheta\varphi}\rangle = \cos\vartheta\,|e_1\rangle + \sin\vartheta\,e^{i\varphi}\,|e_2\rangle $$
are different from each other: linear polarization in the first case, elliptic in
the second, even if
$$ P\big(|e_\vartheta\rangle \to |e_1\rangle\big) = P\big(|e_{\vartheta\varphi}\rangle \to |e_1\rangle\big) , \qquad P\big(|e_\vartheta\rangle \to |e_2\rangle\big) = P\big(|e_{\vartheta\varphi}\rangle \to |e_2\rangle\big) . $$
If instead we want the probability of transmission through a polaroid sheet
with its optical axis at 45° in the x-y plane, i.e. the transition probability to
the state
$$ \frac{1}{\sqrt{2}}\,\big(|e_1\rangle + |e_2\rangle\big) , $$
the latter is different for the two states | eϑ ⟩ and | eϑϕ ⟩ . Therefore, in
general, two states represented by the vectors:
$$ |A\rangle = \sum_{j=1}^{\infty} a_j\,|\xi_j\rangle , \qquad |B\rangle = \sum_{j=1}^{\infty} a_j\,e^{i\varphi_j}\,|\xi_j\rangle \tag{4.12} $$
are different, even if they behave in the same way as far as the measurements
of the observable ξ are concerned (the probabilities | A ⟩ → | ξj ⟩ and | B ⟩ →
| ξj ⟩ are equal to each other), but – just because they are different states –
there will certainly exist some other observable η, measuring which will give
rise to a different behaviour of | A ⟩ and | B ⟩ .
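The role of the relative phase in the polarization example can be made concrete with numbers. This is only a sketch: the values ϑ = π/6 and ϕ = π/2 are arbitrary choices of ours, the vectors are written in the basis | e1 ⟩ , | e2 ⟩ , and `prob` implements (4.5) for normalized vectors.

```python
import cmath
import math

def braket(b, a):
    """Scalar product <b|a>: antilinear in b, linear in a."""
    return sum(bk.conjugate() * ak for bk, ak in zip(b, a))

def prob(a, b):
    """Eq. (4.5): transition probability between normalized vectors."""
    return abs(braket(b, a)) ** 2

theta, phi = math.pi / 6, math.pi / 2  # illustrative values only

e1, e2 = [1, 0], [0, 1]
diag = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # polarization at 45 degrees

e_lin = [math.cos(theta), math.sin(theta)]                          # linear
e_ell = [math.cos(theta), math.sin(theta) * cmath.exp(1j * phi)]    # elliptic

# Same probabilities towards |e1> and |e2> ...
same_1 = prob(e_lin, e1), prob(e_ell, e1)
same_2 = prob(e_lin, e2), prob(e_ell, e2)
# ... but different probabilities towards the 45-degree state.
different = prob(e_lin, diag), prob(e_ell, diag)

print(same_1, same_2, different)
```

Here the 45° polaroid plays the part of the ‘other observable η’ that tells the two states apart.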
Let us now discuss the case of
Degenerate Observables
Let us now examine how the previous results are modified if ξ is a degenerate
observable. Also in this case one has that
the eigenvectors of an observable corresponding to different eigenvalues are
orthogonal to one another
and that
the set of all the eigenvectors of an observable is complete.
The proofs, that we omit, are similar to those of the nondegenerate case (it
is however necessary to make use of another consequence of the von Neumann
postulate: if, owing to a measurement of the observable ξ, | A ⟩ → | ξ̄i ⟩ ,
then, from among all the eigenstates of ξ corresponding to the eigenvalue ξi ,
the vector | ξ̄i ⟩ is that for which the transition probability is a maximum).
We can no longer state that the eigenvectors of the (degenerate) observable
ξ form an orthonormal basis: we are not guaranteed (and it is not true)
that different eigenvectors corresponding to the same eigenvalue are mutually
orthogonal. Indeed the following important theorem holds:
Theorem: any linear combination of the eigenvectors of an observable
corresponding to the same eigenvalue still is an eigenvector of the observable
corresponding to the same eigenvalue.
Let ξ1 be a degenerate eigenvalue: the theorem states that if | ξ1′ ⟩ and
| ξ1″ ⟩ represent two eigenvectors of ξ corresponding to the same eigenvalue
ξ1 , any vector α | ξ1′ ⟩ + β | ξ1″ ⟩ represents an eigenstate of ξ
corresponding to the eigenvalue ξ1 .
Proof: let us put | A ⟩ = α | ξ1′ ⟩ + β | ξ1″ ⟩ . In order to show that | A ⟩ is
an eigenstate of ξ corresponding to the eigenvalue ξ1 , one must show that a
measurement of ξ on | A ⟩ always gives ξ1 as result, which is equivalent to
saying that a measurement of ξ will never give ξi , i ≥ 2 , namely pi = 0 for
i ≥ 2 . Let us reason by contradiction: suppose we find ξi ( i ≥ 2 ) as a result.
Then the system after the measurement is in an eigenstate of ξ corresponding
to ξi : | ξi ⟩ . The transition probability | A ⟩ → | ξi ⟩ is
pi = |⟨ A | ξi ⟩|² , but
$$ \langle A | \xi_i \rangle = \alpha^{*}\,\langle \xi_1' | \xi_i \rangle + \beta^{*}\,\langle \xi_1'' | \xi_i \rangle = 0 $$
for ⟨ ξ1′ | ξi ⟩ = ⟨ ξ1″ | ξi ⟩ = 0 (orthogonality of eigenvectors corresponding
to different eigenvalues), therefore pi = 0 , against the hypothesis.
Thanks to the continuity of the scalar product, the conclusion extends to
linear combinations of whatever (either finite or infinite) number of eigenvec-
tors and we can in conclusion state that
The set of all the eigenvectors of an observable corresponding to the same
eigenvalue is a (closed) linear subspace of the Hilbert space H.
This linear manifold is called the eigenspace of the observable corre-
sponding to the eigenvalue ξi .
The dimension of this manifold (that can be either finite or infinite), i.e. the
number of independent vectors it contains, is called degree of degeneracy
of the eigenvalue. For a nondegenerate eigenvalue the degree of degeneracy is
therefore 1. The gi ’s of (2.20) are precisely the degrees of degeneracy of the
eigenvalues of the observable energy, i.e. the number of independent states
corresponding to the energy level Ei .

We have seen that, for a degenerate observable, the set of its eigenvectors
is complete, but – contrary to the nondegenerate case – is not an orthonormal
system: indeed, in every degenerate eigenspace of an observable obviously
there exist vectors not orthogonal to one another. However, it is known that
from a complete set it is always possible, by means of an orthonormalization
process, to extract a complete orthonormal set. Therefore we are able to form
an orthonormal basis consisting of eigenvectors of a (degenerate) observable
ξ: such a basis contains all the eigenvectors corresponding to nondegenerate
eigenvalues and a system of mutually orthogonal vectors corresponding to
any degenerate eigenvalue, whose number equals its degree of degeneracy, i.e.
a system that is complete in the considered eigenspace. The choice of the
orthonormal basis is not unique, because clearly in any degenerate eigenspace
of the observable infinitely many choices of mutually orthogonal vectors are
possible.
In any event, once an orthonormal basis is fixed:
$$ |\xi_1^{(1)}\rangle ,\ |\xi_1^{(2)}\rangle ,\ \cdots ,\ |\xi_2^{(1)}\rangle ,\ \cdots , $$
one can still expand any vector | A ⟩ in terms of it:
$$ |A\rangle = a_1^{(1)}\,|\xi_1^{(1)}\rangle + a_1^{(2)}\,|\xi_1^{(2)}\rangle + \cdots + a_2^{(1)}\,|\xi_2^{(1)}\rangle + \cdots . \tag{4.13} $$
The coefficients $a_1^{(1)}, a_1^{(2)}, \cdots, a_2^{(1)}, \cdots$ are obtained as in (4.8) and (4.10)
still holds.
Two problems remain open:
1. what is the probability pi that a measurement of ξ on | A ⟩ gives the
degenerate eigenvalue ξi as a result? and
2. which is the state of the system after such a measurement?
To both questions the von Neumann postulate (which we have already partially
enunciated and utilized) gives the answer:
von Neumann Postulate: if a measurement of ξ on | A ⟩ gives the (degenerate)
eigenvalue ξi as a result, the state after the measurement is represented
by the (ray to which belongs the) vector | ξ̄i ⟩ that is obtained by orthogonally
projecting | A ⟩ onto the eigenspace of ξ corresponding to the eigenvalue ξi .
If | A ⟩ is given by (4.13), then:
$$ |\bar{\xi}_i\rangle = a_i^{(1)}\,|\xi_i^{(1)}\rangle + a_i^{(2)}\,|\xi_i^{(2)}\rangle + \cdots + a_i^{(n)}\,|\xi_i^{(n)}\rangle + \cdots \tag{4.14} $$
(the sum extends only to the vectors $|\xi_i^{(k)}\rangle$ that correspond to the eigenvalue
ξi and, if | A ⟩ is normalized to 1, in general the vector | ξ̄i ⟩ is not).
Now also the answer to the first problem is straightforward ( | A ⟩ being
normalized to 1):
$$ p_i = P\big(|A\rangle \to |\bar{\xi}_i\rangle\big) = \frac{|\langle A | \bar{\xi}_i \rangle|^{2}}{\langle \bar{\xi}_i | \bar{\xi}_i \rangle} = \frac{\big(|a_i^{(1)}|^{2} + |a_i^{(2)}|^{2} + \cdots + |a_i^{(n)}|^{2} + \cdots\big)^{2}}{|a_i^{(1)}|^{2} + |a_i^{(2)}|^{2} + \cdots + |a_i^{(n)}|^{2} + \cdots} = |a_i^{(1)}|^{2} + |a_i^{(2)}|^{2} + \cdots + |a_i^{(n)}|^{2} + \cdots \tag{4.15} $$
Therefore pi is the sum of the transition probabilities
$|A\rangle \to |\xi_i^{(1)}\rangle$, $|A\rangle \to |\xi_i^{(2)}\rangle$, · · · from | A ⟩ to the vectors of whatever
orthonormal set of eigenvectors of ξ corresponding to the eigenvalue ξi , in
number equal to the degree of degeneracy, i.e. complete in the considered
eigenspace.
Note that, since the vector representing the state after the measurement is
obtained by projection of the vector | A ⟩ , an equivalent way of expressing the
von Neumann postulate is to state that: the system effects the transition to
the state, from among the eigenstates of ξ corresponding to ξi , for which the
transition probability is a maximum; or that: the measurement has perturbed
the system the least possible; indeed, the orthogonal projection is, in the
manifold of the eigenvectors corresponding to ξi , the vector ‘closest’ to the
initial vector | A ⟩ .
An obvious consequence of the von Neumann postulate is that, if the initial
state | A ⟩ already is an eigenstate of ξ corresponding to the (degenerate)
eigenvalue ξi , the state is not perturbed by the measurement: indeed
the projection of | A ⟩ coincides with | A ⟩ ; therefore, in particular, even if
⟨ ξi′ | ξi″ ⟩ ≠ 0 , a measurement of ξ will never be able to induce the
transition | ξi′ ⟩ → | ξi″ ⟩ : the measurement of another observable η, for
which | ξi′ ⟩ is not one of its eigenstates but | ξi″ ⟩ is, will be necessary
to induce such a transition.
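The projection rule (4.14) and the probability (4.15) can be tried out on a small example. The following is a sketch under assumptions of ours: a three-dimensional space, a hypothetical observable whose eigenvalue ξ1 is doubly degenerate with the first two basis vectors spanning its eigenspace, and a normalized initial vector; `project` is an illustrative helper, not notation from the text.

```python
import math

def braket(b, a):
    """Scalar product <b|a>: antilinear in b, linear in a."""
    return sum(bk.conjugate() * ak for bk, ak in zip(b, a))

def project(a, eigenbasis):
    """Orthogonal projection of |a> onto the span of an orthonormal set."""
    coeffs = [braket(e, a) for e in eigenbasis]
    return [sum(c * e[k] for c, e in zip(coeffs, eigenbasis))
            for k in range(len(a))]

# Hypothetical eigenspace of the doubly degenerate eigenvalue xi_1.
eigenspace_1 = [[1, 0, 0], [0, 1, 0]]
A = [0.5, 0.5j, math.sqrt(0.5)]  # normalized initial vector

# Eq. (4.14): the (unnormalized) vector after the measurement.
xi_bar = project(A, eigenspace_1)

# Eq. (4.15): probability of finding xi_1, computed in two ways.
p1 = abs(braket(xi_bar, A)) ** 2 / braket(xi_bar, xi_bar).real
p1_sum = sum(abs(braket(e, A)) ** 2 for e in eigenspace_1)

# A second measurement leaves the (normalized) state unperturbed.
norm = math.sqrt(braket(xi_bar, xi_bar).real)
after = [x / norm for x in xi_bar]
after_again = project(after, eigenspace_1)

print(p1, p1_sum, after, after_again)
```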

4.5 Operators Associated with Observables


In the process of formalization of physical concepts we are pursuing, we have
represented the states of a system by means of vectors in a Hilbert space; there
now remains to understand in which way we shall formalize (i.e. by means of
which mathematical entities we shall represent) the observables that, from the
physical point of view, are characterized by the existence of those particular
states that we have called eigenstates and by the eigenvalues.
The terms we have used (eigenvectors, eigenvalues) were not introduced
casually: they anticipated the result of this process of formalization.
Indeed, it is possible to associate, in a quite
natural way, an operator $\xi^{\rm op}$ on H (namely a linear map from H into
H) to each observable $\xi$ in the following way: no matter how a basis consisting
of the eigenvectors $|\xi_i\rangle$ of the (possibly degenerate) observable $\xi$ is chosen,
one defines $\xi^{\rm op}$ on the vectors of this basis as
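$$\xi^{\rm op}\,|\xi_i\rangle \overset{\rm def}{=} \xi_i\,|\xi_i\rangle \tag{4.16}$$
and, just to start, one extends the definition by linearity to all the finite linear
combinations formed with the vectors of the basis: if $|A\rangle = \sum_{i=1}^{n} a_i\,|\xi_i\rangle$,
$$\xi^{\rm op}\,|A\rangle \overset{\rm def}{=} \sum_{i=1}^{n} a_i\,\xi^{\rm op}\,|\xi_i\rangle = \sum_{i=1}^{n} a_i\,\xi_i\,|\xi_i\rangle . \tag{4.17}$$
Then, when it is possible, the definition is extended to the infinite linear
combinations: if $|A\rangle = \sum_{i=1}^{\infty} a_i\,|\xi_i\rangle$ (with $\sum_{i=1}^{\infty} |a_i|^2 < \infty$),
$$\xi^{\rm op}\,|A\rangle \overset{\rm def}{=} \lim_{n\to\infty} \sum_{i=1}^{n} a_i\,\xi_i\,|\xi_i\rangle \equiv \sum_{i=1}^{\infty} a_i\,\xi_i\,|\xi_i\rangle . \tag{4.18}$$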
Let us firstly check that the definition is a ‘good definition’, namely indepen-
dent of the choice of the basis (although still consisting of eigenvectors of ξ):
indeed, within any eigenspace of the observable ξ, ξ op is a multiple of the
identity application:
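$$|\bar\xi_i\rangle = \sum_k \alpha_k\,|\xi_i^{(k)}\rangle \;\Rightarrow\; \xi^{\rm op}\,|\bar\xi_i\rangle = \sum_k \alpha_k\,\xi_i\,|\xi_i^{(k)}\rangle = \xi_i\,|\bar\xi_i\rangle \tag{4.19}$$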

and, as a consequence, does not depend on the basis chosen in the eigenspace.
Why did we say 'when it is possible' before writing (4.18)? Since
$$\Big\| \sum_{i=1}^{n} a_i\,\xi_i\,|\xi_i\rangle \Big\|^2 = \sum_{i=1}^{n} \xi_i^2\,|a_i|^2 ,$$
one realizes that, if the series $\sum_{i=1}^{\infty} \xi_i^2 |a_i|^2$ does not converge, by (4.10) the
series $\sum_{i=1}^{\infty} a_i\,\xi_i\,|\xi_i\rangle$ does not define a vector in H. Then, unless $|\xi_i| < M$ for
any $i$ (bounded operator: in this case $\sum_{i=1}^{\infty} \xi_i^2 |a_i|^2 < M^2 \sum_{i=1}^{\infty} |a_i|^2 < \infty$), it
is not possible to define the operator $\xi^{\rm op}$ on all H: the domain $D_\xi$ of the
operator $\xi^{\rm op}$, namely the set of vectors on which it is defined, consists of those
vectors for which $\sum_{i=1}^{\infty} \xi_i^2 |a_i|^2 < \infty$, and the latter (indeed already the set of
finite linear combinations) form a set that is dense in H; for the vectors that
belong to $D_\xi$ one has therefore:
$$\xi^{\rm op}\,|A\rangle = \sum_i a_i\,\xi_i\,|\xi_i\rangle , \qquad |A\rangle \in D_\xi , \qquad \overline{D_\xi} = H . \tag{4.20}$$
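To make the domain issue concrete, here is a minimal numerical sketch. The choice of eigenvalues $\xi_i = i$ and coefficients $a_i = 1/i$ is an assumption made only for illustration (it does not come from the text): the vector has finite norm, but the partial sums of $\sum \xi_i^2 |a_i|^2$ grow without bound, so this vector would not belong to the domain of such an operator.

```python
import math


def partial_sums(n):
    """Return (sum |a_i|^2, sum xi_i^2 |a_i|^2) for i = 1..n.

    Assumed for illustration only: eigenvalues xi_i = i (an unbounded
    spectrum) and coefficients a_i = 1/i.
    """
    norm_a_sq = 0.0    # squared norm of the truncated vector |A>
    norm_xia_sq = 0.0  # squared norm of the truncated vector xi|A>
    for i in range(1, n + 1):
        a_i = 1.0 / i
        xi_i = float(i)
        norm_a_sq += a_i ** 2
        norm_xia_sq += (xi_i * a_i) ** 2
    return norm_a_sq, norm_xia_sq


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        norm_a_sq, norm_xia_sq = partial_sums(n)
        # The first sum stays below pi^2 / 6, the second grows like n.
        print(n, norm_a_sq, norm_xia_sq)
```

With a bounded spectrum ($|\xi_i| < M$) the second sum would instead be bounded by $M^2$ times the first, which is the case discussed above.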

We will not worry excessively about these technical (domain) problems, which
certainly are important from the mathematical point of view, but absolutely
marginal from the physical standpoint. Moreover, in principle one should
state that from the physical point of view all the observables are represented by
bounded operators ($|\xi_i| < M$ for any $i$), since no instrument can yield results
'as large as one wishes': the scale of an instrument is always bounded both
from above and from below. The only reason why, in practice, we cannot
totally forget domain problems lies in the fact that almost all the operators
associated with the observables $f(q, p)$ (owing to the quantization postulate
we have not yet enunciated) will exhibit an unbounded spectrum: one can
say that such operators do not faithfully represent the physical observables,
namely the measurement instruments, but rather provide a mathematical
schematization for them. In other words, the root of the domain
problems is in the mathematical schematization, not in physics.

4.6 Properties of the Operators Associated with Observables


Let us now examine the properties of ξ op .
1. ξ op is a self-adjoint operator: ξ op = (ξ op )† .
Consistently with the just made statement that we do not want to be over-
whelmed by domain problems, in the case of unbounded operators we give
neither the definition of adjoint operator nor that of self-adjoint operator.
For bounded operators, defined on the whole H, such definitions coincide
with those of the finite-dimensional linear spaces: a bounded operator $\eta$
(defined on all H) by definition is known when, for any vector $|A\rangle$, the vector
$|C\rangle \equiv \eta\,|A\rangle$ is known; but any vector $|C\rangle$ is itself known when, for any
$|B\rangle$, the scalar products $\langle B | C \rangle$ are known (the knowledge of the scalar
products between $|C\rangle$ and the elements of an orthonormal basis is sufficient:
in the latter case $|C\rangle$ is given by an expression of the type (4.7)), so $\eta$ is
determined by the knowledge of the scalar products $\langle B | \eta | A \rangle$ for any $|A\rangle$
and for any $|B\rangle$ ($\langle B | \eta | A \rangle$ stands for the scalar product of $|B\rangle$ and
$|C\rangle = \eta\,|A\rangle$).
The adjoint $\eta^\dagger$ of $\eta$ can then be defined by the following equation:
$$\langle B | \eta^\dagger | A \rangle \overset{\rm def}{=} \langle A | \eta | B \rangle^* \tag{4.21}$$
(normally mathematicians write $\eta^*$ instead of $\eta^\dagger$, whereas we will reserve the
asterisk for the complex conjugation of numbers); $\eta$ is self-adjoint if $\eta = \eta^\dagger$.
Therefore $\xi^{\rm op} = (\xi^{\rm op})^\dagger$ (bounded) is equivalent to:
$$\langle B | \xi^{\rm op} | A \rangle = \langle A | \xi^{\rm op} | B \rangle^* \quad \text{for all } |A\rangle , |B\rangle \in H . \tag{4.22}$$
If $\xi^{\rm op}$ is not bounded, (4.22), holding for all the vectors $|A\rangle$, $|B\rangle$ in the
domain of $\xi^{\rm op}$, only expresses the fact that $\xi^{\rm op}$ is Hermitian (as $D_\xi$ is
dense, the correct term would be symmetric, however we will always use the
term Hermitian, which is the one normally used by physicists); in order that
$\xi^{\rm op}$ be self-adjoint, something more is needed (the domains on which $\xi^{\rm op}$
and $(\xi^{\rm op})^\dagger$ are defined must coincide).
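We will limit ourselves to showing that $\xi^{\rm op}$ is Hermitian (i.e. symmetric).
Proof (use will be made of: (4.9), the linearity of $\xi^{\rm op}$, the linearity of the
scalar product: $\langle B | \sum_i | A_i \rangle = \sum_i \langle B | A_i \rangle$, its Hermiticity, expressed by
(4.1), and the reality of the eigenvalues $\xi_i$):
$$\begin{aligned}
\langle B | \xi^{\rm op} | A \rangle &= \langle B | \xi^{\rm op} \sum_i | \xi_i \rangle \langle \xi_i | A \rangle = \sum_i \langle B | \xi^{\rm op} | \xi_i \rangle \langle \xi_i | A \rangle \\
&= \sum_i \xi_i \, \langle B | \xi_i \rangle \times \langle \xi_i | A \rangle = \sum_i \xi_i \, \langle A | \xi_i \rangle^* \times \langle \xi_i | B \rangle^* \\
&= \Big( \sum_i \xi_i \, \langle A | \xi_i \rangle \times \langle \xi_i | B \rangle \Big)^* = \langle A | \xi^{\rm op} | B \rangle^* .
\end{aligned}$$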

2. We have always used the terminology $|\xi_i\rangle$: eigenvectors; $\xi_i$: eigenvalues.
This terminology has a precise mathematical meaning, however ours has not
been an abuse of terms: indeed all the eigenvectors and eigenvalues of $\xi^{\rm op}$ (in
the mathematical sense) are the eigenvectors and eigenvalues of the observable
$\xi$ (in the physical sense) and vice versa. The second part of the proof lies in the
definition of $\xi^{\rm op}$ and (4.19). There remains to show that, if $\xi^{\rm op}\,|\mu\rangle = \mu\,|\mu\rangle$,
then $\mu$ is one of the eigenvalues $\xi_i$ of the observable $\xi$ and $|\mu\rangle$ is one of the
$|\xi_i\rangle$: however, this is an exercise we leave to the reader.
Let us summarize: every observable ξ corresponds to a self-adjoint operator
ξ op ; the eigenvectors and eigenvalues of ξ op are all and only the eigenvectors
and eigenvalues of ξ. We are therefore authorized, from now on, to identify,
and as a consequence to represent by the same symbol ξ, the observable and
the operator associated with it: therefore ξ simultaneously represents both a
physical quantity, the instrument (or the instruments) suitable to measure it
and the linear operator that corresponds to it: so we shall no longer use the
notation ξ op .
It is natural, at this point, to ask whether any self-adjoint operator is
associated with an observable.
Before answering this question we recall that the eigenvalues of a self-
adjoint operator are real and that eigenvectors corresponding to different
eigenvalues are orthogonal to one another: we recall the proof of these two
facts, just to practice with the formalism.
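By taking the scalar product of $|\eta'\rangle$ with $\eta\,|\eta'\rangle = \eta'\,|\eta'\rangle$ one obtains:
$$\langle \eta' | \eta | \eta' \rangle = \eta' \, \langle \eta' | \eta' \rangle$$
and as $\eta = \eta^\dagger$, from (4.22) it follows that $\langle \eta' | \eta | \eta' \rangle = \langle \eta' | \eta | \eta' \rangle^*$ and,
as a consequence, $\eta'$ is real.
Let now
$$\eta\,|\eta'\rangle = \eta'\,|\eta'\rangle , \qquad \eta\,|\eta''\rangle = \eta''\,|\eta''\rangle , \qquad \eta' \neq \eta'' .$$
By taking the scalar product of the first with $|\eta''\rangle$ and of the second with
$|\eta'\rangle$ one has
$$\langle \eta'' | \eta | \eta' \rangle = \eta' \, \langle \eta'' | \eta' \rangle , \qquad \langle \eta' | \eta | \eta'' \rangle = \eta'' \, \langle \eta' | \eta'' \rangle$$
and by subtracting the first from the complex conjugate of the second ($\eta = \eta^\dagger$!)
one finds:
$$(\eta'' - \eta') \, \langle \eta'' | \eta' \rangle = 0 \;\Rightarrow\; \langle \eta'' | \eta' \rangle = 0 .$$
It therefore seems that self-adjoint operators have the right properties to be
considered as operators associated with observables. It is however necessary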
to keep in mind that, from this point of view, in a (separable) Hilbert space
there may occur very unpleasant things. It may happen that a self-adjoint
operator does not have enough eigenvectors as to form a basis, while we know
that the set of the eigenvectors of an observable forms a basis (example: in
the space L2 (a, b) consisting of the square-integrable functions f (x) in the
interval a ≤ x ≤ b , the operator f (x) → x f (x) is self-adjoint and has
no eigenvector: indeed the equation x f (x) = λ f (x) has the only solution
f (x) ∼ 0 ).
The self-adjoint operators whose eigenvectors form a basis are the oper-
ators with purely discrete spectrum, while the others are the operators with
continuous spectrum plus – possibly – a discrete component (see Fig. 2.3).
We then postulate that
Any self-adjoint operator with purely discrete spectrum is associated with an
observable.
Any operator associated with an observable will itself be called an 'observable':
we will identify in this way, not only in the notation but even in the
name, measurement instruments and operators.
We shall however see in the sequel that it is not only advisable but even
right to attribute the name ‘observable’ also to self-adjoint operators that do
not have a purely discrete spectrum (as e.g. the energy of the hydrogen atom:
there are energy levels that form the discrete component of the spectrum,
and there is the continuum of the ionization states). We shall see that they
can be considered as ‘limits’ (in a sense to be specified) of operators endowed
with discrete spectrum and therefore correspond to limiting cases of bona fide
observables (also in this case one deals with those ‘limit’ concepts so frequent
in physics: point mass, instantaneous velocity, . . . ).
Also in this case (much as in the case of the bijective correspondence
between states and rays) perhaps it is neither true nor necessary to assume
that all the self-adjoint operators represent some observable.
What we had provisionally assumed, namely that for any state | A  there
exists at least one observable that possesses | A  as an eigenstate correspond-
ing to a nondegenerate eigenvalue, follows from the last postulate we have
enunciated: indeed, it is sufficient to take the observable corresponding to the
projector onto the one-dimensional linear manifold generated by the vector
| A  (the definition of projection operator will be recalled in the next section).
Anyway, in Sect. 6.3 we will come back to this problem.
There still remains a problem: given an observable quantity (e.g. energy,
angular momentum, . . . ) how do we know the operator that must be asso-
ciated with it? This problem will be given its answer by the quantization
postulate we will enunciate in Sect. 4.12.

4.7 Digression on Dirac Notation


The sole aim of this section is to compare the notation used by mathematicians
with Dirac's, the latter being the one used in almost all the texts on
quantum mechanics. Therefore, in the following formulae, all the technical
issues that concern domain problems etc. will be omitted.
The differences between the two types of notation originate from
1. the fact that mathematicians represent vectors simply by the letters of
the alphabet: $u, v, \cdots$, whereas we use the letters of the alphabet (or
any other symbol) boxed within the "ket" $|\;\rangle$: $|u\rangle , |v\rangle$; $|+\rangle , |-\rangle$;
$|\uparrow\rangle , |\downarrow\rangle$; $\cdots$ (we will make use even of such fancy symbols);
2. the different notation for the scalar product: $(u , v)$ for mathematicians,
$\langle u | v \rangle$ for us.
Indeed, Dirac “bra”  u | is the element of another vector space: the space
dual to H, i.e. the space of the linear and continuous functionals on H. This is
totally legitimate thanks to the Riesz theorem, according to which the space
dual to H is isomorphic to H itself.
Let us now examine the main notational differences that follow from the
above points: let $\xi$ be a (not necessarily self-adjoint) linear operator; the vector
that results from the application of $\xi$ to a vector is $\xi u$ for mathematicians
and $\xi\,|u\rangle$ (not $|\xi u\rangle$) for us. So we will write $\langle v | \xi | u \rangle$, which has the
same meaning as $(v , \xi u)$ for mathematicians. Moreover the adjoint $\xi^\dagger$ of an
operator $\xi$ is defined by means of the equation $(\xi^\dagger v , u) = (v , \xi u)$, which we
can write with our notation only after taking the complex conjugate of both
sides:
$$(u , \xi^\dagger v) = (v , \xi u)^* \;\longleftrightarrow\; \langle u | \xi^\dagger | v \rangle = \langle v | \xi | u \rangle^* .$$
The last equation amounts to saying that the "bra" corresponding to the
"ket" $\xi\,|u\rangle$ is $\langle u |\,\xi^\dagger$, and if $\xi\,|u\rangle = |v\rangle$, then $\langle u |\,\xi^\dagger = \langle v |$; in particular,
if $|\xi_i\rangle$ is an "eigenket" of $\xi = \xi^\dagger$, $\langle \xi_i |$ is an "eigenbra" of $\xi$ corresponding to
the eigenvalue $\xi_i$: $\langle \xi_i |\,\xi = \xi_i \, \langle \xi_i |$.
Let us list some properties of the Hermitian conjugation ( ξ → ξ † ) that
immediately follow from the definition of adjoint operator:

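$$\begin{aligned}
(\xi^\dagger)^\dagger &= \xi \\
(\alpha\,\xi)^\dagger &= \alpha^*\,\xi^\dagger \\
(\xi + \eta)^\dagger &= \xi^\dagger + \eta^\dagger \\
(\xi\,\eta)^\dagger &= \eta^\dagger\,\xi^\dagger .
\end{aligned} \tag{4.23}$$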
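The defining relation of the adjoint and two of the properties (4.23) can be checked numerically in a finite-dimensional space. The following is only a sketch: the 2×2 matrices `xi`, `eta` and the vectors `u`, `v` are arbitrary values chosen for illustration, and the helper functions are our own, not part of any library.

```python
def dagger(m):
    """Adjoint (conjugate transpose) of a matrix given as a list of rows."""
    return [[complex(m[i][j]).conjugate() for i in range(len(m))]
            for j in range(len(m[0]))]


def matmul(a, b):
    """Matrix product a b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def apply(m, v):
    """Vector m|v>."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def inner(u, v):
    """Scalar product <u|v>, antilinear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def close(a, b, tol=1e-12):
    """True if two matrices agree entry by entry within tol."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Arbitrary (not self-adjoint) operators and vectors, for illustration only.
xi = [[1 + 2j, 3j], [0.5, -1j]]
eta = [[2, 1 - 1j], [1j, 4]]
u = [1 + 1j, 2]
v = [0.5j, -1]

involution_ok = close(dagger(dagger(xi)), xi)              # (xi^dagger)^dagger = xi
product_rule_ok = close(dagger(matmul(xi, eta)),
                        matmul(dagger(eta), dagger(xi)))    # (xi eta)^dagger = eta^dagger xi^dagger
lhs = inner(u, apply(dagger(xi), v))                        # <u| xi^dagger |v>
rhs = inner(v, apply(xi, u)).conjugate()                    # <v| xi |u>^*
print(involution_ok, product_rule_ok, abs(lhs - rhs) < 1e-12)
```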

Dirac notations are not looked upon favourably by mathematicians, whereas
for physicists, who have to make extensive use of them, they are rather
comfortable because they follow rules of easy applicability: for example, to take
the scalar product of $|u\rangle$ with $|v\rangle$ means to 'glue' the bra $\langle v |$ and the ket
$|u\rangle$; moreover, much as the conjugate of $\alpha$ is $\alpha^*$ and the conjugate of an
operator $\xi$ is $\xi^\dagger$, the conjugate of a ket (bra) is the corresponding bra (ket);
this rule, which ensues from (4.21) and (4.23), allows one to write in an almost
automatic way an expression like $\langle v | \xi\,\eta\,\zeta \cdots | u \rangle^*$: it suffices to conjugate
every element and reverse their order:
$$\langle v | \xi\,\eta\,\zeta \cdots | u \rangle^* = \langle u | \cdots \zeta^\dagger\,\eta^\dagger\,\xi^\dagger | v \rangle .$$

We conclude this section by recalling the definition of projection operators
and showing how these too can be easily expressed with the Dirac notation.
Let $|v\rangle \in H$, $\langle v | v \rangle = 1$. The projection operator $P_v$ onto the
(one-dimensional) manifold generated by $|v\rangle$ is defined by
$$\text{for all } |u\rangle \in H : \quad P_v\,|u\rangle = |v\rangle \langle v | u \rangle$$
so, since applying $P_v$ to $|u\rangle$ is equivalent to 'gluing' $|v\rangle\langle v|$ to $|u\rangle$, we can
write:
$$P_v = |v\rangle \langle v| .$$
Let now V be a (closed) linear manifold of arbitrary dimension included in H.
If $|v_i\rangle \in V$, $i = 1, 2, \cdots$, is any set of orthonormal vectors that
generate V, the projection operator $P_V$ onto V is defined by:
$$\text{for all } |u\rangle \in H : \quad P_V\,|u\rangle = \sum_i |v_i\rangle \langle v_i | u \rangle$$
and we can therefore write:
$$P_V = \sum_i |v_i\rangle \langle v_i| . \tag{4.24}$$

In particular, if $|\xi_i\rangle$ is an orthonormal basis in H, the identity operator $\mathbf{1}$
(i.e. the projector onto all H: $\mathbf{1}\,|u\rangle = |u\rangle$) can be written as
$$\mathbf{1} = \sum_i |\xi_i\rangle \langle \xi_i| . \tag{4.25}$$
Equation (4.25) is known as the completeness relation because it expresses
the fact that the vectors $|\xi_i\rangle$ form a complete orthonormal set.
So, for example, by applying both sides of (4.25) to the vector $|u\rangle$, one
(re)obtains (4.9). Moreover:
$$\langle u | u \rangle = \langle u | \mathbf{1} | u \rangle = \sum_i \langle u | \xi_i \rangle \langle \xi_i | u \rangle = \sum_i \big| \langle u | \xi_i \rangle \big|^2$$
i.e. (4.10).
If $P$ is a projection operator, it is straightforward to verify that
$$P = P^\dagger , \qquad P^2 = P . \tag{4.26}$$
If $\xi$ is (the operator associated with) an observable and we denote by $P_i$ the
projector onto the manifold consisting of the eigenvectors of $\xi$ corresponding
to the eigenvalue $\xi_i$ (the eigenspace of $\xi$ corresponding to $\xi_i$), by (4.25) and
(4.24) one has:
$$\sum_i P_i = \mathbf{1} \;\Rightarrow\; \xi \equiv \xi \times \mathbf{1} = \xi \sum_i P_i = \sum_i \xi_i\,P_i = \sum_i |\xi_i\rangle \, \xi_i \, \langle \xi_i| \tag{4.27}$$
where the sum appearing in the last expression extends to all the vectors of
an orthonormal basis of eigenvectors of $\xi$; so if a given eigenvalue is $n$ times
degenerate ($n \le \infty$), in the sum there are $n$ eigenvectors corresponding to
that eigenvalue (in the last term of (4.27) the eigenvalue $\xi_i$ is placed between
the bra and the ket only for aesthetical reasons).
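As a hedged illustration of (4.24) to (4.27), the sketch below builds an operator from an assumed orthonormal eigenbasis of a two-dimensional space and assumed eigenvalues +1 and -1 (our own choice, not an example from the text), then checks the completeness relation, the projector properties and the eigenvalue equation.

```python
import math


def outer(u, v):
    """The operator |u><v| as a matrix (list of rows)."""
    return [[a * complex(b).conjugate() for b in v] for a in u]


def add(a, b):
    """Entry-wise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(c, a):
    """Matrix a multiplied by the number c."""
    return [[c * x for x in row] for row in a]


def matmul(a, b):
    """Matrix product a b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def close(a, b, tol=1e-12):
    """True if two matrices agree entry by entry within tol."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


s = 1 / math.sqrt(2)
basis = [[s, s], [s, -s]]    # assumed orthonormal eigenvectors
eigenvalues = [1.0, -1.0]    # assumed (nondegenerate) eigenvalues

projectors = [outer(e, e) for e in basis]                       # P_i = |xi_i><xi_i|
identity = add(projectors[0], projectors[1])                    # completeness, cf. (4.25)
xi = add(scale(eigenvalues[0], projectors[0]),
         scale(eigenvalues[1], projectors[1]))                  # spectral sum, cf. (4.27)

print(close(identity, [[1, 0], [0, 1]]))                        # sum of projectors is 1
print(all(close(matmul(p, p), p) for p in projectors))          # P^2 = P
print(close(xi, [[0, 1], [1, 0]]))                              # resulting matrix
```

In this basis the spectral sum reproduces the matrix with zeros on the diagonal and ones off the diagonal, whose eigenvectors are indeed the two assumed basis vectors.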

4.8 Mean Values


If we make $N$ measurements of the observable $\xi$ on the system in the state $|A\rangle$
(we recall that this means to have at one's disposal $N$ copies of the system, all
in the state $|A\rangle$, and that any measurement is made on one of such copies),
we find the eigenvalues $\xi_i$, each $N_i$ times ($\sum_i N_i = N$), as results.
We can define the mean value $\bar\xi$ of the observable $\xi$ in the state $|A\rangle$ as
the mean value of the obtained results; if $N$ is very large, one has:
$$\bar\xi \overset{\rm def}{=} \frac{1}{N} \sum_i \xi_i\,N_i = \sum_i \xi_i\,p_i , \qquad p_i = \frac{N_i}{N} \, \cdot \tag{4.28}$$
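Let us take $\langle A | A \rangle = 1$. Due to (4.15) (von Neumann postulate), if $P_i$ is (as
in the previous section) the projector onto the eigenspace of $\xi$ corresponding
to the eigenvalue $\xi_i$, one has
$$p_i \equiv P\big( |A\rangle \to |\bar\xi_i\rangle \big) = P\big( |A\rangle \to P_i\,|A\rangle \big) = \frac{\big| \langle A | P_i | A \rangle \big|^2}{\langle A | P_i \times P_i | A \rangle}$$
(the denominator is $\langle \bar\xi_i | \bar\xi_i \rangle$), and as (see (4.26)) $P_i^2 = P_i$ and, in addition,
$\langle A | P_i | A \rangle = \langle A | P_i^2 | A \rangle \ge 0$, one has
$$p_i = \langle A | P_i | A \rangle \tag{4.29}$$
(we have rewritten (4.15) using projectors).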


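Therefore, owing to (4.27),
$$\bar\xi = \sum_i \xi_i \, \langle A | P_i | A \rangle = \langle A | \sum_i \xi_i\,P_i\, | A \rangle = \langle A | \xi | A \rangle . \tag{4.30}$$
Unless the contrary is specified, $\langle A | A \rangle = 1$ will always be assumed.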


The quantity $\langle A | \xi | A \rangle$ usually is more correctly called "expectation
value of the observable $\xi$ in the state $|A\rangle$" because, $p_i$ being the
probabilities provided by the theory (i.e. theoretical probabilities), $\bar\xi$ is what,
according to the theory, should be expected as mean value of the results of
the measurements of $\xi$ on $|A\rangle$.
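The equality between the 'mean of the results' $\sum_i \xi_i p_i$ with $p_i = \langle A | P_i | A \rangle$ and the expectation value $\langle A | \xi | A \rangle$ can be checked on a small example. The sketch below assumes, purely for illustration, a three-dimensional space in which the eigenvalue 1 is twice degenerate and the eigenvalue 2 is nondegenerate; the state is an arbitrary normalized vector.

```python
import math


def inner(u, v):
    """Scalar product <u|v>, antilinear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def apply(m, v):
    """Vector m|v>."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


# Assumed observable, written in its own eigenbasis: xi = diag(1, 1, 2).
xi = [[1, 0, 0], [0, 1, 0], [0, 0, 2]]
# Projectors onto the two eigenspaces (eigenvalue -> projector).
projectors = {
    1: [[1, 0, 0], [0, 1, 0], [0, 0, 0]],
    2: [[0, 0, 0], [0, 0, 0], [0, 0, 1]],
}

s = 1 / math.sqrt(3)
state = [s, s, s]  # normalized |A>, chosen arbitrarily

# p_i = <A| P_i |A>, cf. (4.29)
probabilities = {value: inner(state, apply(p, state)).real
                 for value, p in projectors.items()}
mean_from_probabilities = sum(value * p for value, p in probabilities.items())
# <A| xi |A>, cf. (4.30)
mean_from_operator = inner(state, apply(xi, state)).real

print(probabilities)
print(mean_from_probabilities, mean_from_operator)
```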
Let us now consider the operator $\xi^2$ and let us examine its properties:
all the eigenvectors $|\xi_i\rangle$ of $\xi$ also are eigenvectors of $\xi^2$ corresponding to the
eigenvalues $\xi_i^2$:
$$\xi^2\,|\xi_i\rangle = \xi \times \xi\,|\xi_i\rangle = \xi\,\xi_i\,|\xi_i\rangle = \xi_i\,\xi\,|\xi_i\rangle = \xi_i^2\,|\xi_i\rangle$$
and, as the $|\xi_i\rangle$ form a complete set, also $\xi^2$ is an observable: if the map
$\xi_i \to \xi_i^2$ is injective (i.e. if $\xi$ has no opposite eigenvalues), then $\xi$ and $\xi^2$
have all and only the same eigenvectors and the observable corresponding to
$\xi^2$ is obtained by simply changing the scale of the instrument that measures
$\xi$; in the contrary case the instruments that measure $\xi$ and $\xi^2$ are different,
since there are eigenvectors of $\xi^2$ that are not eigenvectors of $\xi$.
Incidentally, in the same way we can define $f(\xi)$ as the operator that has
the same eigenvectors as $\xi$ and eigenvalues $f(\xi_i)$: if $\xi_i \to f(\xi_i)$ is injective,
then $f(\xi)$ does not differ, except for the scale, from the observable $\xi$.
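The expectation value of $\xi^2$:
$$\overline{\xi^2} = \langle A | \xi^2 | A \rangle = \sum_i \xi_i^2\,p_i$$
is the quadratic mean value of the results of the measurements of $\xi$ on $|A\rangle$.
The mean-square deviation $\Delta\xi = \sqrt{\overline{\xi^2} - \bar\xi^{\,2}}$ is therefore given by
$$(\Delta\xi)^2 = \langle A | \xi^2 | A \rangle - \langle A | \xi | A \rangle^2 \tag{4.31}$$
or equivalently by:
$$(\Delta\xi)^2 = \langle A | (\xi - \bar\xi)^2 | A \rangle . \tag{4.32}$$
Indeed: $\overline{(\xi - \bar\xi)^2} = \overline{(\xi^2 - 2\,\bar\xi\,\xi + \bar\xi^{\,2})} = \overline{\xi^2} - \bar\xi^{\,2}$.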
The meaning of Δξ is well known: it represents the size of the dispersion
of the results around the mean value; moreover, in the present framework one
has the following
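Theorem: $\Delta\xi = 0$ if and only if $|A\rangle$ is an eigenvector of $\xi$.
Indeed, if $\xi\,|A\rangle = \xi'\,|A\rangle$ one has:
$$\langle A | \xi | A \rangle = \xi' \, \langle A | A \rangle = \xi' \quad \text{and} \quad \langle A | \xi^2 | A \rangle = \xi'^2 \, \langle A | A \rangle = \xi'^2$$
whence $\Delta\xi = 0$. If, vice versa, $\Delta\xi = 0$, then:
$$0 = (\Delta\xi)^2 = \langle A | (\xi - \bar\xi) \times (\xi - \bar\xi) | A \rangle$$
and, since the latter is the squared norm of the vector $(\xi - \bar\xi)\,|A\rangle$ (recall that
$(\xi - \bar\xi) = (\xi - \bar\xi)^\dagger$), it must be that
$$(\xi - \bar\xi)\,|A\rangle = 0 \;\Rightarrow\; \xi\,|A\rangle = \bar\xi\,|A\rangle .$$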
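The theorem can be illustrated numerically with (4.31). In the sketch below the observable (a 2×2 matrix with eigenvalues +1 and -1) and the two states are assumptions chosen only for illustration: one state is an eigenvector, the other is not.

```python
import math


def inner(u, v):
    """Scalar product <u|v>, antilinear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def apply(m, v):
    """Vector m|v>."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def mean_and_uncertainty(op, state):
    """Return (<A|op|A>, Delta op) for a normalized state, using (4.31)."""
    mean = inner(state, apply(op, state)).real
    mean_sq = inner(state, apply(op, apply(op, state))).real
    # max(..., 0.0) guards against tiny negative values due to rounding.
    return mean, math.sqrt(max(mean_sq - mean ** 2, 0.0))


xi = [[0, 1], [1, 0]]       # assumed observable, eigenvalues +1 and -1
s = 1 / math.sqrt(2)
eigenstate = [s, s]         # eigenvector of xi with eigenvalue +1
superposition = [1, 0]      # not an eigenvector of xi

print(mean_and_uncertainty(xi, eigenstate))      # mean close to 1, uncertainty close to 0
print(mean_and_uncertainty(xi, superposition))   # mean 0, uncertainty 1
```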

For this reason $\Delta\xi$ is also called the uncertainty of $\xi$ in $|A\rangle$: if $\Delta\xi = 0$ the
value of $\xi$ in $|A\rangle$ is completely determined. As a consequence, by means of the
sole knowledge of the expectation values, it is possible to establish whether
a given state is an eigenstate of an observable and, in the affirmative case,
to know the corresponding eigenvalue. Moreover, thanks to (4.29), also the
transition probabilities can be expressed as expectation values.
So all the physical information that characterizes the state of a system –
transition probabilities, mean values of the observables, observables of which
the state is eigenstate and the corresponding eigenvalues – can be traced back
to expectation values of observables, and in this sense the knowledge of a state
is equivalent to the knowledge of all the expectation values in the state itself.
It is even possible to reformulate quantum mechanics by defining the state of a
system as the collection of the expectation values of all the observables (linear
positive functionals on the algebra of observables), instead of a vector of the
Hilbert space. This formulation has been proposed by J. von Neumann, I.E.
Segal, R. Haag and D. Kastler, and is equivalent, for a system of particles,
to the 'Hilbert' formulation due to Dirac.

4.9 Pure States and Statistical Mixtures


We have already insisted on the fact that, since the interpretation of quantum
mechanics is a statistical interpretation, saying ‘the state represented by the
vector | A  ’ presupposes the possibility that one can prepare many copies of
the system in the same state | A  : in this case one says that the system is in
a pure state .
Let us now assume we have $N_1 \gg 1$ copies of the system in the state
$|A\rangle$ and $N_2 \gg 1$ copies of the same system in the state $|B\rangle$ and that we
measure, on each of these $N = N_1 + N_2$ systems, an observable $\xi$: the theory
predicts that, if $p_i^{(A)}$ and $p_i^{(B)}$ are the probabilities of finding the eigenvalue
$\xi_i$ when the system is in the state $|A\rangle$ or in the state $|B\rangle$ respectively,
we will find the eigenvalue $\xi_i$ a number $N_1\,p_i^{(A)} + N_2\,p_i^{(B)}$ of times, therefore
with a probability
$$p_i = \frac{1}{N} \Big( N_1\,p_i^{(A)} + N_2\,p_i^{(B)} \Big) = \frac{N_1}{N}\,p_i^{(A)} + \frac{N_2}{N}\,p_i^{(B)} . \tag{4.33}$$
Therefore the mean value of the results of the measurements is:
$$\langle\langle \xi \rangle\rangle = \sum_i \xi_i\,p_i = \frac{N_1}{N} \, \langle A | \xi | A \rangle + \frac{N_2}{N} \, \langle B | \xi | B \rangle . \tag{4.34}$$
For example, the $N = N_1 + N_2$ states could have been obtained by sending
$N$ photons in the same polarization state on a birefringent crystal with the
optical axis at an angle $\vartheta$ with respect to the polarization direction: in the
latter case $p_1 \equiv N_1/N = \cos^2\vartheta$ and $p_2 \equiv N_2/N = \sin^2\vartheta$ ($N_1$ and $N_2$
are respectively the numbers of photons emerging in the extraordinary and
ordinary ray) are not 'certain numbers', but are themselves probabilities. They
do have (in the present case) their origin in the probabilistic nature of
quantum mechanics, but they are probabilities because we have not recorded
(e.g. by means of mobile mirrors) how many photons emerged in the extraordinary
ray and how many in the ordinary ray, just as if we threw a die and did not look
at the result. For this reason we have preferred to use, in (4.34), a notation
different from that used in (4.28) for the mean value of $\xi$: $\langle\langle \xi \rangle\rangle$ is a 'classical
mean of quantum means'.
In general, if a collection of quantum systems consists of systems in the
(not necessarily orthogonal) states $|u_1\rangle , |u_2\rangle , \cdots , |u_n\rangle , \cdots$ in
percentages $p_1 , p_2 , \cdots , p_n , \cdots$ respectively ($\sum_n p_n = 1$), one says that the
collection of our systems is a statistical mixture: the numbers $p_n$ can be either 'certain
percentages' or percentages due to our ignorance, namely due to the fact that
we have only partial information about how the systems have been prepared.
Statistical mixtures are, for example, either the set of states of a system after
some observable has been measured on them (see the above example with
photons), or any thermodynamic system (e.g. a gas): it is a statistical mixture
of its subsystems (the molecules).
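A statistical mixture is therefore described by the set of pairs
$$\{\, |u_1\rangle , p_1 ;\; |u_2\rangle , p_2 ;\; \cdots ;\; |u_n\rangle , p_n ;\; \cdots \} \tag{4.35}$$
and (4.33) and (4.34) are generalized in the form:
$$P(\text{mixture} \to |u\rangle) = \sum_n p_n \, \big| \langle u_n | u \rangle \big|^2 ; \qquad \langle\langle \xi \rangle\rangle = \sum_n p_n \, \langle u_n | \xi | u_n \rangle \tag{4.36}$$
namely the mean value of any observable in a statistical mixture is the 'mean
of the means'.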
Assume now that $|u\rangle$ and $|v\rangle$ are two orthogonal states; let us consider
a system in the pure state
$$|a\rangle = c_1\,|u\rangle + c_2\,|v\rangle , \qquad |c_1|^2 + |c_2|^2 = 1 \tag{4.37}$$
(the $S_p$ ensemble) and the statistical mixture of states $|u\rangle$ and $|v\rangle$ with
percentages $p_1 = |c_1|^2$ and $p_2 = |c_2|^2$ that we represent as
$$\{\, |u\rangle , p_1 = |c_1|^2 ;\; |v\rangle , p_2 = |c_2|^2 \,\} \tag{4.38}$$
(the $S_m$ ensemble): we ask ourselves in which way the two ensembles can be
distinguished, since in both cases the probability to find the system in the
state $|u\rangle$ is $p_1$ and the probability to find it in the state $|v\rangle$ is $p_2$.
If $\xi$ is an observable, the mean value of $\xi$ in the ensemble $S_p$ is:
$$\begin{aligned}
\bar\xi &= \big( c_1^* \langle u | + c_2^* \langle v | \big) \, \xi \, \big( c_1\,|u\rangle + c_2\,|v\rangle \big) \\
&= |c_1|^2 \langle u | \xi | u \rangle + |c_2|^2 \langle v | \xi | v \rangle + c_1^* c_2 \langle u | \xi | v \rangle + c_2^* c_1 \langle v | \xi | u \rangle
\end{aligned} \tag{4.39}$$
whereas the mean value of $\xi$ in the ensemble $S_m$ is given by (4.36):
$$\langle\langle \xi \rangle\rangle = |c_1|^2 \langle u | \xi | u \rangle + |c_2|^2 \langle v | \xi | v \rangle \tag{4.40}$$
so that the two mean values differ by the quantity $2\,{\rm Re}\big( c_1^* c_2 \langle u | \xi | v \rangle \big)$
that appears in (4.39) because the states $|u\rangle$ and $|v\rangle$ may interfere: this is
expressed by the fact that the state $|a\rangle$ is a coherent superposition of $|u\rangle$
and $|v\rangle$, while in the ensemble $S_m$ we have an incoherent mixture of the
states $|u\rangle$ and $|v\rangle$: in (4.37) also the phases of the complex numbers $c_1$
and $c_2$ are relevant (better: their relative phase) whereas in (4.38) only the
absolute values intervene.
If only observables having $|u\rangle$ and/or $|v\rangle$ as eigenstates are measured,
$\langle u | \xi | v \rangle = 0$ (by assumption $|u\rangle$ and $|v\rangle$ are orthogonal) and we cannot
distinguish the two ensembles, but, in general, $\langle u | \xi | v \rangle \neq 0$ and the two
ensembles provide different results.
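For example, assume that the ensemble $S_p$ consists of systems in the state
$|a\rangle = \frac{1}{\sqrt 2} \big( |u\rangle + |v\rangle \big)$ while the ensemble $S_m$ is $\{\, |u\rangle , p_1 = \frac12 ;\; |v\rangle , p_2 = \frac12 \,\}$,
and let us ask what is, in the two cases, the probability to find a system in
the state $|b\rangle = \frac{1}{\sqrt 2} \big( |u\rangle - |v\rangle \big)$. In the first case the probability is 0, since
the two states $|a\rangle$ and $|b\rangle$ are orthogonal to each other; in the second case,
instead:
$$p = p_1 \times P\big( |u\rangle \to |b\rangle \big) + p_2 \times P\big( |v\rangle \to |b\rangle \big) = \frac12 \times \frac12 + \frac12 \times \frac12 = \frac12 .$$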
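The two probabilities of this example can be reproduced with a few lines of code. This is a minimal sketch in which the orthogonal states $|u\rangle$ and $|v\rangle$ are represented, by assumption, as the two basis vectors of a two-dimensional space; the function name `transition_probability` is our own.

```python
import math


def inner(u, v):
    """Scalar product <u|v>, antilinear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def transition_probability(initial, final):
    """|<final|initial>|^2 for normalized state vectors."""
    return abs(inner(final, initial)) ** 2


s = 1 / math.sqrt(2)
u, v = [1, 0], [0, 1]   # assumed representation of the orthogonal states
a = [s, s]              # pure state (|u> + |v>) / sqrt(2)
b = [s, -s]             # target state (|u> - |v>) / sqrt(2)

# Pure state: a single amplitude, so the two contributions can cancel.
p_pure = transition_probability(a, b)

# Statistical mixture: classical weights times quantum probabilities, cf. (4.36).
mixture = [(0.5, u), (0.5, v)]
p_mixture = sum(weight * transition_probability(state, b)
                for weight, state in mixture)

print(p_pure, p_mixture)
```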
The difference between pure state and statistical mixture is well exemplified
by the experiment of neutron interferometry cited in Sect. 3.3 (Fig. 3.4), in the
two cases in which the mirrors s2 and s3 are either fixed or mobile mirrors
able to detect the transit of neutrons.
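Referring to Fig. 4.1, let us call $|x\rangle$ the state of a neutron that travels
'horizontally' (in the figure) and $|y\rangle$ that of a neutron that travels upwards. So
the neutrons that arrive at the semi-transparent mirror $s_1$ are in the state
$|x\rangle$ and downstream of $s_1$ are in the state
$$|a\rangle = \frac{1}{\sqrt 2} \big( |x\rangle + e^{i\alpha}\,|y\rangle \big) .$$
[Fig. 4.1: neutron interferometer; labels: mirrors $s_1$, $s_2$, $s_3$, $s_4$, wedge W, counters $C_1$, $C_2$, states $|x\rangle$ and $|y\rangle$.]
We have assumed equal transmission and reflection coefficients of $s_1$; the factor
$e^{i\alpha}$ is compatible with this assumption and a priori we cannot exclude that it
be introduced by the reflection (indeed, we shall see that, owing to reasons of
probability conservation, it must be equal to $\pm i$). In the reflections at mirrors
$s_2$ and $s_3$, $|x\rangle \to |y\rangle$ and $|y\rangle \to |x\rangle$ (the phase factors introduced by the
reflections at $s_2$ and $s_3$ are equal to each other, therefore irrelevant). So, if the
mirrors $s_2$ and $s_3$ are fixed:
$$|x\rangle \xrightarrow{(s_1)} |a\rangle \equiv \frac{1}{\sqrt 2} \big( |x\rangle + e^{i\alpha}\,|y\rangle \big) \xrightarrow{(s_2 , s_3)} |b\rangle \equiv \frac{1}{\sqrt 2} \big( |y\rangle + e^{i\alpha}\,|x\rangle \big) .$$
The wedge W on the path $s_2 \to s_4$ introduces the phase shift $\varphi$:
$$|b\rangle \xrightarrow{(W)} |b_\varphi\rangle \equiv \frac{1}{\sqrt 2} \big( |y\rangle + e^{i(\alpha+\varphi)}\,|x\rangle \big)$$
and finally the state $|b_\varphi\rangle$ hits the semi-transparent mirror $s_4$ where (by
symmetry)
$$|x\rangle \to \frac{1}{\sqrt 2} \big( |x\rangle + e^{i\alpha}\,|y\rangle \big) ; \qquad |y\rangle \to \frac{1}{\sqrt 2} \big( |y\rangle + e^{i\alpha}\,|x\rangle \big)$$
therefore:
$$\begin{aligned}
|b_\varphi\rangle \xrightarrow{(s_4)} |c\rangle &\equiv \frac{1}{\sqrt 2} \Big[ \frac{1}{\sqrt 2} \big( |y\rangle + e^{i\alpha}\,|x\rangle \big) + e^{i(\alpha+\varphi)} \, \frac{1}{\sqrt 2} \big( |x\rangle + e^{i\alpha}\,|y\rangle \big) \Big] \\
&= \frac12 \Big[ e^{i\alpha} \big( 1 + e^{i\varphi} \big) \, |x\rangle + \big( 1 + e^{i(2\alpha+\varphi)} \big) \, |y\rangle \Big] .
\end{aligned}$$
The probabilities $p_1$ and $p_2$ that either the counter $C_1$ or the counter $C_2$
clicks are respectively given by:
$$\begin{aligned}
p_1 &= \big| \langle c | x \rangle \big|^2 = \frac14 \, \big| 1 + e^{i\varphi} \big|^2 = \frac12 \, (1 + \cos\varphi) \\
p_2 &= \big| \langle c | y \rangle \big|^2 = \frac14 \, \big| 1 + e^{i(2\alpha+\varphi)} \big|^2 = \frac12 \, \big( 1 + \cos(2\alpha + \varphi) \big)
\end{aligned}$$
from which it follows that, owing to $p_1 + p_2 = 1$, $\alpha = \pm\pi/2$, i.e. we have
found again the result (3.4).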
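The chain of transformations $s_1$, $(s_2, s_3)$, W, $s_4$ can be followed numerically. The sketch below is our own illustration: states are represented by the pair of amplitudes of $|x\rangle$ and $|y\rangle$, and the function names (`beam_splitter`, `wedge`, `counter_probabilities`) are hypothetical helpers. It shows that the two counter probabilities add up to 1 for $\alpha = \pi/2$, while for another value such as $\alpha = 0$ they do not.

```python
import cmath
import math


def apply(m, v):
    """Vector m|v>, with v = [amplitude of |x>, amplitude of |y>]."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def beam_splitter(alpha):
    """Semi-transparent mirror: |x> -> (|x> + e^{i alpha}|y>)/sqrt(2), and x <-> y."""
    t = 1 / math.sqrt(2)
    r = cmath.exp(1j * alpha) / math.sqrt(2)
    return [[t, r], [r, t]]


MIRRORS = [[0, 1], [1, 0]]  # s2 and s3: |x> <-> |y> (common phase dropped)


def wedge(phi):
    """Phase shift phi on the |x> beam travelling from s2 to s4."""
    return [[cmath.exp(1j * phi), 0], [0, 1]]


def counter_probabilities(alpha, phi):
    """Return (p1, p2): probabilities that counter C1 or C2 clicks."""
    state = [1, 0]  # incoming neutron in the state |x>
    for element in (beam_splitter(alpha), MIRRORS, wedge(phi), beam_splitter(alpha)):
        state = apply(element, state)
    return abs(state[0]) ** 2, abs(state[1]) ** 2


for phi in (0.0, 0.7, math.pi):
    p1, p2 = counter_probabilities(math.pi / 2, phi)
    print(phi, p1, p2, p1 + p2)        # the sum is 1 for alpha = pi/2

print(sum(counter_probabilities(0.0, 0.0)))  # alpha = 0: the sum is not 1
```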
Let us now assume that $s_2$ and $s_3$ are mobile mirrors able to detect the
collision of a neutron. In the latter case
$$|x\rangle \xrightarrow{(s_1)} |a\rangle \equiv \frac{1}{\sqrt 2} \big( |x\rangle + e^{i\alpha}\,|y\rangle \big) \xrightarrow{(s_2 , s_3)} \{\, |x\rangle , p_1 = \tfrac12 ;\; |y\rangle , p_2 = \tfrac12 \,\}$$
therefore, downstream of $s_2$ and $s_3$ the system is no longer in a pure state,
but in a statistical mixture. The wedge W has no effect on the 50% of neutrons
that take the path $s_2 \to s_4$ ($e^{i\varphi}\,|x\rangle \sim |x\rangle$), therefore:
$$\{\, |x\rangle , p_1 = \tfrac12 ;\; |y\rangle , p_2 = \tfrac12 \,\} \xrightarrow{(s_4)}
\begin{cases}
\frac{1}{\sqrt 2} \big( |x\rangle + e^{i\alpha}\,|y\rangle \big) , & p_1 = \frac12 ; \\[4pt]
\frac{1}{\sqrt 2} \big( |y\rangle + e^{i\alpha}\,|x\rangle \big) , & p_2 = \frac12 .
\end{cases}$$
According to the first of (4.36), each of the two components of the mixture
has a probability
$$\frac12 \times \Big( \frac{1}{\sqrt 2} \Big)^2 = \frac14$$
to be detected by either $C_1$ or $C_2$, i.e. (as we have already said in Sect. 3.3)
the two counters always record the same number of neutrons (probability
$2 \times \frac14$), independently of the position of the wedge.
In conclusion, the difference between the pure state and the statistical
mixture
$$|b\rangle = \frac{1}{\sqrt 2} \big( |y\rangle + e^{i\alpha}\,|x\rangle \big) , \qquad \{\, |x\rangle , p_1 = \tfrac12 ;\; |y\rangle , p_2 = \tfrac12 \,\}$$
that we have downstream of the mirrors $s_2$, $s_3$, lies in that in the first case
it is possible, by inserting the semi-transparent mirror $s_4$ in the apparatus,
to make the two components $|x\rangle$ and $|y\rangle$ interfere with each other, while
in the second case there is no such possibility.
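For comparison with the pure-state case, the following sketch treats the mixture downstream of the detecting mirrors: each component goes through $s_4$ separately and the probabilities, not the amplitudes, are added with the classical weights. As before, states are represented by the amplitudes of $|x\rangle$ and $|y\rangle$ and the helper names are hypothetical.

```python
import cmath
import math


def apply(m, v):
    """Vector m|v>, with v = [amplitude of |x>, amplitude of |y>]."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def beam_splitter(alpha):
    """Semi-transparent mirror: |x> -> (|x> + e^{i alpha}|y>)/sqrt(2), and x <-> y."""
    t = 1 / math.sqrt(2)
    r = cmath.exp(1j * alpha) / math.sqrt(2)
    return [[t, r], [r, t]]


def counter_probabilities_mixture(alpha, phi):
    """Return (p1, p2) for the mixture {|x>, 1/2; |y>, 1/2} hitting s4.

    The wedge only multiplies the |x> component by the phase e^{i phi},
    which cannot affect the result because the components do not interfere.
    """
    mixture = [(0.5, [cmath.exp(1j * phi), 0]), (0.5, [0, 1])]
    splitter = beam_splitter(alpha)
    p1 = p2 = 0.0
    for weight, state in mixture:
        out = apply(splitter, state)
        p1 += weight * abs(out[0]) ** 2   # contribution to counter C1
        p2 += weight * abs(out[1]) ** 2   # contribution to counter C2
    return p1, p2


for phi in (0.0, 0.7, math.pi):
    print(phi, counter_probabilities_mixture(math.pi / 2, phi))
```

Both counters get probability 1/2 for every value of the phase, in agreement with the statement that the counts do not depend on the position of the wedge.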

4.10 Compatible Observables


In classical physics, given the state of a system, any observable has a well
determined value in that state: for example, given position and velocity of a
particle (i.e. its state) energy, angular momentum etc. are known. It is not
so in quantum mechanics: first of all, if ξ is an observable, only in the case
| A  represents an eigenstate of ξ it makes sense to say that ξ has a value
in | A  (the corresponding eigenvalue); in the contrary case we can say, for
example, which is the mean value of ξ in | A  , or which is the probability
that a measurement of ξ yields a given result. In summary, in this case only
statistical information is available.
Furthermore, if we have two observables $\xi$ and $\eta$ and if $|\xi'\rangle$ is an
eigenvector of $\xi$, then $\xi$ has a value in $|\xi'\rangle$, but in general $\eta$ does not possess
a value in this state. If it happens that a certain state simultaneously is an
eigenstate of both $\xi$ (eigenvalue $\xi'$) and $\eta$ (eigenvalue $\eta'$), then both $\xi$ and
$\eta$ have a determined value in that state, which we will accordingly represent by
$|\xi' , \eta'\rangle$. One has therefore, at least as far as the state $|\xi' , \eta'\rangle$ is concerned, a
kind of compatibility of the two observables $\xi$ and $\eta$. If the states on which $\xi$
and $\eta$ are compatible (i.e. states that are simultaneous eigenstates of $\xi$ and $\eta$)
are enough to form a basis, then we will say that $\xi$ and $\eta$ are compatible
observables. In other words:
1-st definition: two observables are said to be compatible if they admit a complete
set of simultaneous eigenvectors.
So, for example, if the system is a photon, two birefringent crystals whose
optical axes are neither parallel nor orthogonal to each other certainly are not
compatible observables, inasmuch as there exists no (polarization) state that is
a simultaneous eigenstate of both; they are instead compatible observables both
in the case the optical axes are parallel and in the case they are orthogonal
to each other.
We have already introduced in Sect. 3.6 the concept of compatible observ-
ables, but at first sight it does not seem that the two definitions have anything
to do with each other. The connection is provided by the fact that there is
complete equivalence between the definition given above and the following:
2-nd definition: two observables $\xi$ and $\eta$ are said to be compatible if it happens
that, given any eigenstate $|\xi'\rangle$ of $\xi$ and having made a measurement of $\eta$
in it, the state after such a measurement still is an eigenstate of $\xi$ (therefore
corresponding to the eigenvalue $\xi'$: $P\big( |\xi'\rangle \to |\xi''\rangle \big) = 0$ if $\xi'' \neq \xi'$).
It is known that the state, immediately after the measurement of $\eta$, is an
eigenstate of $\eta$: it is therefore a simultaneous eigenstate of $\xi$ and $\eta$: $|\xi' , \eta'\rangle$.
In general, if $\xi'$ is a degenerate eigenvalue, the state $|\xi' , \eta'\rangle$ after the
measurement of $\eta$ is different from the state $|\xi'\rangle$ in which $\eta$ is measured.
More than giving the proof of the equivalence of the two definitions
(which we will however give), it is important to emphasize the physical
significance of the compatibility of two observables, as it emerges from the
second definition: it says that two observables are compatible if they can both
be measured in a state and the second measurement, even if it perturbs the
state in which the system is after the first measurement, is such that the
information acquired with the first measurement is not lost; moreover, if the
first observable one measures is nondegenerate, the second measurement does
not perturb the state. Schematically:
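$$
|A\rangle \xrightarrow{\ \text{measurement of } \xi\ } |\xi'\rangle
\xrightarrow{\ \text{measurement of } \eta\ } |\xi', \eta'\rangle
$$
and if ξ is nondegenerate, then $|\xi', \eta'\rangle = |\xi'\rangle$.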


The proof of the equivalence of the two definitions, under the assumption
that ξ and η are nondegenerate, is very simple: if ξ and η are compatible
according to the first definition, since ξ and η have the same eigenvectors
(they are nondegenerate!), the measurement of η is a repetition of the first
measurement (only the scale of the instrument changes) and therefore does
not perturb the state: schematically (the sign ≡ means ‘by assumption equal
to’)
$$
|A\rangle \xrightarrow{\ \xi\ } |\xi'\rangle \equiv |\xi', \eta'\rangle
\xrightarrow{\ \eta\ } |\xi', \eta'\rangle .
$$
If instead ξ and η are compatible according to the second definition, the
proof is outlined in the following diagram, where the sign ≡ follows from the
hypothesis of nondegeneracy of ξ, so that there exists only one eigenstate
corresponding to the eigenvalue $\xi'$:
$$
|A\rangle \xrightarrow{\ \xi\ } |\xi'\rangle \xrightarrow{\ \eta\ }
|\xi', \eta'\rangle \equiv |\xi'\rangle .
$$

We propose the proof of the equivalence without the nondegeneracy
assumption of the observables just as an exercise.
Let us assume, according to the first definition, that ξ and η have a complete
set of simultaneous eigenvectors. One has
$$
|A\rangle \xrightarrow{\ \xi\ } |\xi'\rangle \xrightarrow{\ \eta\ } |\eta'\rangle ,
\qquad
|\eta'\rangle = \alpha\, |\xi', \eta'\rangle + \sum_i c_i\, |\xi_i, \eta'\rangle ,
\quad \xi_i \neq \xi'
$$
but, since by assumption the transition probability
$|\xi'\rangle \to |\eta'\rangle$ is nonvanishing and equals
$|\alpha|^2 \times |\langle \xi' | \xi', \eta' \rangle|^2$, one must have
$\alpha \neq 0$; furthermore, owing to the von Neumann postulate, such
probability must be a maximum, whence $|\alpha| = 1$, $c_i = 0$. In conclusion,
$|\eta'\rangle = |\xi', \eta'\rangle$.
Vice versa, let us assume (second definition) that for any $|\xi'\rangle$
$$
|\xi'\rangle \xrightarrow{\ \eta\ } |\xi', \eta'\rangle .
$$
Let us consider the (closed) linear manifold $\mathcal{V}$ generated by all the
simultaneous eigenvectors and let us assume (by contradiction) that
$\mathcal{V} \neq \mathcal{H}$. Let then $|\xi'\rangle \in \mathcal{V}^{\perp}$
(it exists!). But then, as $|\xi', \eta'\rangle \in \mathcal{V}$, we arrive at
the contradiction that
$P\big(|\xi'\rangle \xrightarrow{\ \eta\ } |\xi', \eta'\rangle\big) = 0$.
Then $\mathcal{V} = \mathcal{H}$.
There is now an important algebraic characterization concerning compat-
ible observables, expressed by the following
Theorem: Two observables are compatible if and only if

ξη = ηξ (4.41)

holds for the operators associated with them.


In the latter case one says that ξ and η commute with each other. The
expression:
[ξ, η] ≡ ξη − ηξ
is called the commutator of ξ and η, so that (4.41) reads:

[ξ, η] = 0 .

The proof we will give does not have the status of a rigorous proof
because, as usual, we shall ignore the problems relative to the domains of
the two operators: it rather aims at emphasizing the intuitive aspects of the
problem.
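If ξ and η are compatible, by definition the set of their simultaneous
eigenvectors $|\xi', \eta'\rangle$ is complete. One has:
$$
\begin{aligned}
\xi\, \eta\, |\xi', \eta'\rangle &= \xi\, |\xi', \eta'\rangle\, \eta' = |\xi', \eta'\rangle\, \xi' \eta' \\
\eta\, \xi\, |\xi', \eta'\rangle &= \eta\, |\xi', \eta'\rangle\, \xi' = |\xi', \eta'\rangle\, \eta' \xi' .
\end{aligned}
$$
Therefore the operators ξ η and η ξ give the same results on the vectors
$|\xi', \eta'\rangle$; but, since the vectors of the type $|\xi', \eta'\rangle$
generate the whole Hilbert space, (4.41) follows.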
The converse remains to be shown, namely that if [ξ, η] = 0, then ξ and
η have a complete set of simultaneous eigenvectors.
We start by proving a lemma that, owing to its importance and the frequent
use we will make of it in the sequel, deserves to be taken out of the
proof of the theorem.
Lemma: if [ξ, η] = 0 and if $\xi\, |\xi'\rangle = \xi'\, |\xi'\rangle$, then
$\eta\, |\xi'\rangle$ still is an eigenvector of ξ belonging to the eigenvalue
$\xi'$:
$$
\xi\, \big(\eta\, |\xi'\rangle\big) = \xi'\, \big(\eta\, |\xi'\rangle\big) . \tag{4.42}
$$

Indeed:
$$
\xi\, \eta\, |\xi'\rangle = \eta\, \xi\, |\xi'\rangle = \eta\, \xi'\, |\xi'\rangle = \xi'\, \eta\, |\xi'\rangle
$$
(notice that it was not even necessary to assume that η be a self-adjoint
operator).
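The content of the lemma can be checked on a small example. The following Python sketch uses two 3×3 matrices chosen here purely for illustration (they are an assumption of the example, not taken from the text): ξ has a doubly degenerate eigenvalue and η commutes with it. Applying η to an eigenvector of ξ gives a different vector, which is nonetheless still an eigenvector of ξ belonging to the same eigenvalue.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apply(a, v):
    """Action of the matrix a on the column vector v."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]


def commutator(a, b):
    """[a, b] = a b - b a."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


# Illustrative matrices (an assumption of this example, not from the text).
xi = [[1, 0, 0], [0, 1, 0], [0, 0, 2]]   # the eigenvalue 1 is doubly degenerate
eta = [[0, 1, 0], [1, 0, 0], [0, 0, 3]]  # mixes the two degenerate directions

ket = [1, 0, 0]              # eigenvector of xi belonging to the eigenvalue 1
eta_ket = apply(eta, ket)    # the vector eta |xi'>

commutes = commutator(xi, eta) == [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
still_eigenvector = apply(xi, eta_ket) == [1 * c for c in eta_ket]

print(commutes, eta_ket, still_eigenvector)
```

In this example η|ξ′⟩ is not a multiple of |ξ′⟩, which is possible only because the eigenvalue is degenerate.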
Let us now conclude the proof of the theorem. Let us first consider
the case in which one of the two observables, e.g. ξ, is nondegenerate. In
this case, since by assumption [ξ, η] = 0, the lemma immediately takes
us to the result: indeed, since $\eta\, |\xi'\rangle$ is an eigenvector of ξ
belonging to the nondegenerate eigenvalue $\xi'$, it must be a multiple of
$|\xi'\rangle$, namely:
$$
\eta\, |\xi'\rangle = \eta'\, |\xi'\rangle
$$
i.e. any eigenvector $|\xi'\rangle$ of ξ must also be an eigenvector of η.


Let us now consider the degenerate case. Let $\mathcal{H}_i$ be the eigenspace
of ξ belonging to the eigenvalue $\xi_i$; owing to the lemma, applying η to any
vector in $\mathcal{H}_i$ always gives, as a result, a vector in
$\mathcal{H}_i$, i.e. any eigenspace of ξ is invariant under η: η acts
independently on each $\mathcal{H}_i$. Even now it is intuitive that the
restriction of η to $\mathcal{H}_i$ has a set of eigenvectors complete in
$\mathcal{H}_i$; but all the vectors of $\mathcal{H}_i$ are eigenvectors of ξ,
therefore in each $\mathcal{H}_i$ we have a complete set of simultaneous
eigenvectors: $|\xi_i, \eta_j\rangle$. In this way a set of simultaneous
eigenvectors of ξ and η, complete for the whole $\mathcal{H}$, is obtained.
Let us now prove what we left to the intuition, namely that η has a set of
eigenvectors complete in $\mathcal{H}_i$.
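Let $|\eta_k\rangle$, $k = 1, 2, \cdots$ be a set of eigenvectors of η, complete
in $\mathcal{H}$. It is known that it is possible to effect, in a unique way,
the decomposition
$$
|\eta_k\rangle = |\eta_k^{(i)}\rangle + |\eta_k^{(\perp)}\rangle , \qquad
|\eta_k^{(i)}\rangle \in \mathcal{H}_i , \quad
|\eta_k^{(\perp)}\rangle \in \mathcal{H}_i^{(\perp)} ; \qquad
\mathcal{H}_i^{(\perp)} \equiv \bigoplus_{j \neq i} \mathcal{H}_j .
$$
But, from $\eta\, |\eta_k\rangle = \eta_k\, |\eta_k\rangle$, one has
$$
\eta\, |\eta_k^{(i)}\rangle + \eta\, |\eta_k^{(\perp)}\rangle
= \eta_k\, |\eta_k^{(i)}\rangle + \eta_k\, |\eta_k^{(\perp)}\rangle
$$
and, since $\eta\, |\eta_k^{(i)}\rangle \in \mathcal{H}_i$ and
$\eta\, |\eta_k^{(\perp)}\rangle \in \mathcal{H}_i^{(\perp)}$, owing to the
uniqueness of the decomposition of the vector $\eta\, |\eta_k\rangle$, one must
have:
$$
\eta\, |\eta_k^{(i)}\rangle = \eta_k\, |\eta_k^{(i)}\rangle , \qquad
\eta\, |\eta_k^{(\perp)}\rangle = \eta_k\, |\eta_k^{(\perp)}\rangle .
$$
Let now $|A\rangle \in \mathcal{H}_i$ be orthogonal to all the
$|\eta_k^{(i)}\rangle$. As $|A\rangle \in \mathcal{H}_i$, it is also orthogonal
to all the $|\eta_k^{(\perp)}\rangle$, therefore to all the $|\eta_k\rangle$.
So $|A\rangle$ is the null vector and, as a consequence, the set of all the
$|\eta_k^{(i)}\rangle$ is complete in $\mathcal{H}_i$.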
For the reader that has understood the theorem, the following remarks
should be superfluous.
1. The theorem does not say that, if [ ξ , η ] = 0 , then any eigenvector of one
observable also is an eigenvector of the other; it says that there exists (i.e.
one can find) a complete set of simultaneous eigenvectors. For example
the identity operator 1 commutes with any operator and any vector is an
eigenvector of 1, but certainly it is not true that any vector also is an
eigenvector of whatever observable.
2. It is nonetheless true that, if one of the observables – e.g. ξ – is nonde-
generate, then any eigenvector of ξ also is an eigenvector of η.
3. If [ξ, η] ≠ 0, it may happen that ξ and η have some simultaneous
eigenvectors, but certainly not enough to form a complete set.
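Remark 1 can be made concrete with a short Python sketch. The two commuting matrices below are hypothetical, chosen only for this illustration: ξ is degenerate, so not every eigenvector of ξ is an eigenvector of η, and yet a complete set of simultaneous eigenvectors can be exhibited. The helper `eigenvalue` is likewise an assumption of the example.

```python
def apply(a, v):
    """Action of the matrix a on the column vector v."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]


def eigenvalue(a, v, tol=1e-12):
    """Return lam if a v = lam v for the nonzero vector v, otherwise None."""
    w = apply(a, v)
    k = next(i for i, c in enumerate(v) if c != 0)  # first nonzero component
    lam = w[k] / v[k]
    if all(abs(w[i] - lam * v[i]) < tol for i in range(len(v))):
        return lam
    return None


# Illustrative commuting matrices (not taken from the text).
xi = [[1, 0, 0], [0, 1, 0], [0, 0, 2]]   # the eigenvalue 1 is doubly degenerate
eta = [[0, 1, 0], [1, 0, 0], [0, 0, 3]]

# An eigenvector of xi that is NOT an eigenvector of eta.
e1 = [1, 0, 0]
print(eigenvalue(xi, e1), eigenvalue(eta, e1))

# A complete set of simultaneous eigenvectors (left unnormalized).
common_basis = [[1, 1, 0], [1, -1, 0], [0, 0, 1]]
labels = [(eigenvalue(xi, v), eigenvalue(eta, v)) for v in common_basis]
print(labels)
```

Inside the degenerate eigenspace of ξ the simultaneous eigenvectors are those that diagonalize the restriction of η, in agreement with the proof of the theorem.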
The results we have obtained can be generalized to the case of more than
two observables: the observables ξ, η, ζ, · · · are compatible if the operators
associated with them all commute with one another. It can be then shown
that there exists a complete set of simultaneous eigenvectors $|\xi', \eta', \zeta', \cdots\rangle$ of
such observables; and viceversa: if a set of observables possesses a complete
set of simultaneous eigenvectors, then they commute with one another.
Let us now introduce the concept of complete set of compatible (or
commuting) observables: if ξ, η, ζ, ··· are n compatible observables, any
simultaneous eigenstate is identified by an n-tuple of eigenvalues
$\xi_i, \eta_j, \zeta_k, \cdots$: $|\xi_i, \eta_j, \zeta_k, \cdots\rangle$.
It may happen that for a given n-tuple there is more than just one
simultaneous eigenstate. If instead there never are two or more simultaneous
eigenstates of ξ, η, ζ, ··· belonging to the same n-tuple, then we
say that ξ, η, ζ, ··· form a complete set of compatible observables.
This definition generalizes the concept of nondegenerate observable to the
case of two or more observables. In less precise but more intuitive terms one
could say that a set of compatible observables is complete when, any single
observable being (possibly) degenerate, the set is globally nondegenerate.
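The definition can be illustrated with a minimal Python sketch. As a simplifying assumption of the example, each observable is given directly by the list of its eigenvalues on a common basis of simultaneous eigenvectors; the function name `is_complete` and the numerical values are hypothetical.

```python
from collections import Counter

# Entry k of each list is the eigenvalue on the k-th common basis vector.
xi = [1, 1, 2, 2]    # degenerate when taken alone
eta = [1, 2, 1, 2]   # degenerate when taken alone


def is_complete(*observables):
    """True if no tuple of eigenvalues labels more than one basis vector."""
    labels = Counter(zip(*observables))
    return all(count == 1 for count in labels.values())


print(is_complete(xi))        # xi alone
print(is_complete(xi, eta))   # the pair (xi, eta)
```

Each observable is degenerate, but every pair of eigenvalues occurs only once, so the pair is globally nondegenerate.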
The physical importance of this concept is the same as that of a nondegenerate
observable: given the n-tuple of eigenvalues $\xi_i, \eta_j, \zeta_k, \cdots$, the state of
the system is known. We leave to the reader the demonstration of the fact that,
much as for a nondegenerate observable, the set of the simultaneous eigen-
vectors of a complete set of compatible observables makes up an orthogonal
basis, and viceversa.

4.11 Uncertainty Relations


Let ξ and η be two observables (either compatible or not), and Δξ and Δη the
root mean squares of the results of measurements of ξ and η on a system in the
generic state $|s\rangle$. The following important theorem, due to H. P.
Robertson, relates the product of the uncertainties Δξ and Δη to the mean value
of the commutator of ξ and η in the state $|s\rangle$:
Uncertainty Relation:
$$
\Delta\xi\, \Delta\eta \;\ge\; \frac{1}{2}\, \big| \langle s \,|\, [\xi, \eta] \,|\, s \rangle \big| . \tag{4.43}
$$
Let us demonstrate (4.43). We introduce the non-Hermitian operators:
$$
\alpha \equiv \xi + i\, x\, \eta , \qquad \alpha^\dagger \equiv \xi - i\, x\, \eta
$$
where x is a real parameter. Let us consider the product $\alpha^\dagger \alpha$
(pay attention to the order of the factors, for in general ξ and η do not
commute):
$$
\alpha^\dagger \alpha \equiv (\xi - i\, x\, \eta)\, (\xi + i\, x\, \eta) = \xi^2 + x^2 \eta^2 + i\, x\, [\xi, \eta] . \tag{4.44}
$$

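Note that, if $\langle s | s \rangle = 1$, for any operator α the inequality
$$
\langle s | \alpha^\dagger \alpha | s \rangle \;\ge\; \langle s | \alpha^\dagger | s \rangle \times \langle s | \alpha | s \rangle \tag{4.45}
$$
holds as a consequence of either the Schwarz inequality applied to the vectors
$\alpha\, |s\rangle$ and $|s\rangle \langle s | \alpha | s \rangle$, or the
completeness relation (4.25): indeed, if the vectors $|s_i\rangle$ make up a
basis of which $|s\rangle$ is an element, e.g. $|s\rangle = |s_1\rangle$, then:
$$
\begin{aligned}
\langle s | \alpha^\dagger \alpha | s \rangle
&= \sum_i \langle s | \alpha^\dagger | s_i \rangle \langle s_i | \alpha | s \rangle \\
&= \langle s | \alpha^\dagger | s \rangle \langle s | \alpha | s \rangle
 + \sum_{i>1} \langle s | \alpha^\dagger | s_i \rangle \langle s_i | \alpha | s \rangle .
\end{aligned}
$$
In the above relation all the terms of the last sum ($i > 1$) are nonnegative
(each equals $|\langle s_i | \alpha | s \rangle|^2$), whence the claim.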
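Then from (4.44) and (4.45) one has:
$$
\langle s | \big( \xi^2 + x^2 \eta^2 + i\, x\, [\xi, \eta] \big) | s \rangle
\;\ge\; \langle s | (\xi - i\, x\, \eta) | s \rangle \times \langle s | (\xi + i\, x\, \eta) | s \rangle
$$
i.e., denoting mean values in the state $|s\rangle$ by an overbar,
$$
\overline{\xi^2} + x^2\, \overline{\eta^2} + x\, \overline{i\, [\xi, \eta]} \;\ge\; \overline{\xi}^{\,2} + x^2\, \overline{\eta}^{\,2}
$$
that is:
$$
x^2 (\Delta\eta)^2 + x\, \overline{i\, [\xi, \eta]} + (\Delta\xi)^2 \;\ge\; 0 .
$$
Since the sign ≥ must hold for any real x (note that i [ξ, η] is Hermitian, so
its mean value is real), the discriminant of this quadratic expression in the
variable x must be ≤ 0:
$$
\big( \overline{i\, [\xi, \eta]} \big)^2 - 4\, (\Delta\xi)^2 (\Delta\eta)^2 \;\le\; 0
$$
from which (4.43) immediately follows.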
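The inequality (4.43) can be checked numerically on a two-dimensional example. The sketch below is only an illustration under stated assumptions: as ξ and η we take the two Hermitian 2×2 matrices `SX` and `SY` (the Pauli matrices, which are not introduced in this section), and as $|s\rangle$ a normalized vector parametrized by two angles; the function names are hypothetical.

```python
import cmath
import math


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(a, b):
    """[a, b] = a b - b a."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


def mean(op, s):
    """Mean value <s|op|s> in the normalized state s."""
    n = len(s)
    return sum(s[i].conjugate() * op[i][j] * s[j]
               for i in range(n) for j in range(n))


def spread(op, s):
    """Uncertainty of op in s: sqrt(mean of op^2 minus squared mean of op)."""
    variance = mean(matmul(op, op), s).real - mean(op, s).real ** 2
    return math.sqrt(max(variance, 0.0))  # guard against rounding below zero


SX = [[0, 1], [1, 0]]        # plays the role of xi (assumption of the example)
SY = [[0, -1j], [1j, 0]]     # plays the role of eta (assumption of the example)


def both_sides(theta, phi):
    """Return the left and right hand sides of (4.43) for the chosen state."""
    s = [complex(math.cos(theta / 2)),
         cmath.exp(1j * phi) * math.sin(theta / 2)]
    lhs = spread(SX, s) * spread(SY, s)
    rhs = 0.5 * abs(mean(commutator(SX, SY), s))
    return lhs, rhs


for theta, phi in [(0.0, 0.0), (1.0, 0.3), (math.pi / 2, 0.0), (2.0, 2.5)]:
    lhs, rhs = both_sides(theta, phi)
    print(f"theta={theta:.2f} phi={phi:.2f}  lhs={lhs:.4f}  rhs={rhs:.4f}")
```

For θ = 0 the two sides coincide, so that state saturates the inequality for this particular pair of observables.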


The case in which the commutator [ξ, η] is a multiple of the identity
operator 1 is particularly important. Note that, as $\xi^\dagger = \xi$ and
$\eta^\dagger = \eta$,
$$
[\xi, \eta]^\dagger = (\xi\, \eta - \eta\, \xi)^\dagger = \eta\, \xi - \xi\, \eta = -[\xi, \eta]
$$
(i.e. [ξ, η] is an anti-Hermitian operator); then, if [ξ, η] is a multiple of 1,
the multiplicative factor must be a pure imaginary number:
$$
[\xi, \eta] = i\, c\, \mathbf{1} , \qquad c \in \mathbb{R}
$$
(from now on the identity operator will be omitted, as
$\mathbf{1}\, |s\rangle = |s\rangle$: numerical quantities and multiples of the
identity operator – there is no substantial difference between the two – are
often called c-numbers).
In the latter case, i.e. when the commutator [ξ, η] is a multiple of the
identity operator, (4.43) becomes:
$$
\Delta\xi\, \Delta\eta \;\ge\; \frac{1}{2}\, |c| \qquad (\text{valid if } [\xi, \eta] = i\, c) \tag{4.46}
$$
i.e. the product of the uncertainties is greater than or equal to a fixed
quantity that does not depend on the state one is considering.

4.12 Quantization Postulate


The main difference between classical physics and quantum mechanics lies
in the fact that, while in the classical scheme the observables give rise to a
commutative algebra (for example: q p and p q are the same thing), in the
quantum scheme the observables – being represented by operators – in general
do not obey the commutative property.
Now, given a physical system and having established which are its observ-
ables, the (quantum) theory is complete if it enables us to find the eigenvalues
and eigenvectors of the several observables, the degeneracies of the eigenvalues
and the transition probabilities between any two states of the system: it can be
shown that all this is possible if for any pair of observables the commutator
is known. We will not give a proof of the above statement, but we shall have
several occasions to realize that it is true.
The quantum scheme we have discussed so far is a general scheme and
makes no reference to any particular physical system. From now on the phys-
ical systems we shall be concerned with will consist of one or more particles:
indeed, we have in mind to apply quantum theory to atoms and to show how
it is possible to arrive at a no less than quantitative understanding of the
properties of even complex atoms.
What are the observables of a system consisting of n particles? Classically
we have the positions $q_i$ and momenta $p_i$ ($i = 1, 2, \cdots, 3n$) and their
functions (by the $q_i$ we shall always understand the Cartesian coordinates: as
in Nature there are no constraints, there is no reason not to make such a
choice). We make the hypothesis that the above quantities are the observables
for the system, even when it is considered from the quantum point of view.
Whether this hypothesis is right or wrong will emerge from the comparison
between the predictions of the theory and the experimental results: for
example, we shall see that, in the case of electrons, other observables (the
spin) will be needed.
According to the discussion of the previous sections, to any observable
there corresponds a self-adjoint operator. So, for example,
$q_i \to q_i^{\rm op}$, $p_i \to p_i^{\rm op}$
(we provisionally go back to the notation $\xi^{\rm op}$).
If we now have an observable f(q, p), we postulate that the operator
associated with it is obtained by replacing q and p in f respectively with
$q^{\rm op}$ and $p^{\rm op}$: in other words we postulate that
$f^{\rm op} \equiv f(q^{\rm op}, p^{\rm op})$. So, for example, in the case of
the harmonic oscillator to the energy
$$
H = \frac{1}{2m}\, \big( p^2 + m^2 \omega^2 q^2 \big)
$$
there corresponds the self-adjoint operator:
$$
H^{\rm op} = \frac{1}{2m}\, \big[ (p^{\rm op})^2 + m^2 \omega^2 (q^{\rm op})^2 \big] .
$$
Sometimes an ambiguity arises, due to the fact that, while in f(q, p) the order
of factors is immaterial, in $f(q^{\rm op}, p^{\rm op})$ it is important, so
that the resulting operator may fail to be Hermitian. In practice this
ambiguity is not very relevant: for example one can write:
$$
q\, p = \frac{1}{2}\, (q\, p + p\, q) \;\to\; \frac{1}{2}\, \big( q^{\rm op} p^{\rm op} + p^{\rm op} q^{\rm op} \big)
$$
that is Hermitian. From now on we will no longer write the symbol op to
distinguish the quantum operators: by q, p, f(q, p) we will always denote
the operators associated with the corresponding observables.
According to the above discussion, it is necessary to know the commutator
of any pair of observables f(q, p) and g(q, p). It can be seen that such
commutators are known if the following commutators:
$$
[q_i, q_j] , \qquad [q_i, p_j] , \qquad [p_i, p_j] \tag{4.47}
$$

are known. This can be achieved by means of the following essential


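Formal Properties of Commutators:
Let ξ, η, ζ be operators. One has:
$$
\begin{aligned}
&1.\quad [\xi, \eta] = -[\eta, \xi] , \\
&2.\quad [\xi, \eta]^\dagger = -[\xi, \eta] \quad \text{if } \xi = \xi^\dagger ,\ \eta = \eta^\dagger , \\
&3.\quad [\xi, \eta + \zeta] = [\xi, \eta] + [\xi, \zeta] , \\
&4.\quad [\xi, \eta\, \zeta] = \eta\, [\xi, \zeta] + [\xi, \eta]\, \zeta , \\
&5.\quad [\xi\, \eta, \zeta] = \xi\, [\eta, \zeta] + [\xi, \zeta]\, \eta , \\
&6.\quad \big[ \xi, [\eta, \zeta] \big] + \big[ \eta, [\zeta, \xi] \big] + \big[ \zeta, [\xi, \eta] \big] = 0 .
\end{aligned}
\tag{4.48}
$$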
Note – this observation may provide a good mnemonic rule – that 4 and 5
recall the Leibniz rule for the derivative of a product; 5 follows from 1 and 4;
6 is known as the Jacobi identity.
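Since the properties (4.48) are algebraic identities, they can be verified on arbitrary square matrices. The following Python sketch does so with small integer matrices chosen arbitrarily for the purpose (an assumption of the example); property 2 of the list, which requires Hermitian operators, is checked on two Hermitian 2×2 matrices. The helper names are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(a, b):
    """Entrywise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(c, a):
    """Multiple c * a of the matrix a."""
    return [[c * x for x in row] for row in a]


def commutator(a, b):
    """[a, b] = a b - b a."""
    return add(matmul(a, b), scale(-1, matmul(b, a)))


def dagger(a):
    """Hermitian conjugate (conjugate transpose)."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


# Arbitrary test matrices (illustrative values only).
A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
C = [[2, -1], [1, 3]]
ZERO = [[0, 0], [0, 0]]

# Two Hermitian matrices, used for property 2.
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]

checks = {
    "1": commutator(A, B) == scale(-1, commutator(B, A)),
    "2": dagger(commutator(SX, SY)) == scale(-1, commutator(SX, SY)),
    "3": commutator(A, add(B, C)) == add(commutator(A, B), commutator(A, C)),
    "4": commutator(A, matmul(B, C))
         == add(matmul(B, commutator(A, C)), matmul(commutator(A, B), C)),
    "5": commutator(matmul(A, B), C)
         == add(matmul(A, commutator(B, C)), matmul(commutator(A, C), B)),
    "6": add(add(commutator(A, commutator(B, C)),
                 commutator(B, commutator(C, A))),
             commutator(C, commutator(A, B))) == ZERO,
}
print(checks)
```

The order of the factors on the right hand sides of 4 and 5 matters: exchanging them would in general make the corresponding checks fail.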
It should be clear that, according to the above rules, the calculation of a
commutator of any f(q, p) and g(q, p) that are either polynomials or power
series in q and p reduces to the ‘elementary’ commutators (4.47). So,
in order to complete the scheme, it is necessary and sufficient to know the
commutators (4.47). Let us first note that not all of them can be vanishing:
if it were so, all the observables would commute with one another and one
would be taken back to the classical case.
In order to determine the commutators (4.47) we shall resort to an anal-
ogy between the quantum commutators and the Poisson brackets of classical
mechanics – an analogy that, as it will be seen a posteriori, will enable us to
recover classical mechanics as a limiting case of quantum mechanics.
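As is well known, in classical mechanics the Poisson bracket
$[f, g]_{\rm pb}$ between f(q, p) and g(q, p) is defined in the following way:
$$
[f, g]_{\rm pb} \equiv \sum_{i=1}^{3n} \left( \frac{\partial f}{\partial q_i} \frac{\partial g}{\partial p_i} - \frac{\partial f}{\partial p_i} \frac{\partial g}{\partial q_i} \right) . \tag{4.49}
$$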

The Poisson brackets enjoy the same formal properties as the commutators listed
in (4.48) (for the Poisson brackets, however, the order of factors in 4 and
5 is immaterial) and play an important role e.g. in the theory of canonical
transformations, or even in the equations of motion, which can be written in
the form:
$$
\frac{d}{dt}\, f(q, p) = [f, H]_{\rm pb} \tag{4.50}
$$
H being the Hamiltonian of the system. Due both to the importance of Poisson
brackets in classical mechanics and to their formal analogy with commutators,
we assume the following:
Quantization Postulate: the commutators (4.47) are proportional to the
corresponding Poisson brackets.
From (4.49) one has:
$$
[q_i, q_j]_{\rm pb} = 0 , \qquad [q_i, p_j]_{\rm pb} = \delta_{ij} , \qquad [p_i, p_j]_{\rm pb} = 0
$$

so the only nonvanishing commutator, the one between $q_i$ and $p_j$, must be
proportional to $\delta_{ij}$. What is the value of the proportionality
constant? Note that, owing to 2 of (4.48), such a constant must be pure
imaginary and must have the dimensions of an action: we postulate that it is
$i\hbar$ (the choice of the sign brings along no physical consequences, since
the transformation $i \to -i$ does not touch upon the properties that define
the imaginary unit: $i^2 = -1$ and $i^* = -i$).
So, in the end, we have the following commutation rules, or quantization
conditions, or Canonical Commutation Relations (usually referred to as CCR):
$$
[q_i, q_j] = 0 , \qquad [q_i, p_j] = i\hbar\, \delta_{ij} , \qquad [p_i, p_j] = 0 . \tag{4.51}
$$

The CCR (4.51) express an important property: observables referring to
different ($i \neq j$) degrees of freedom are compatible. There is instead
incompatibility between any $q_i$ and its canonically conjugate momentum $p_i$:
for such observables, whose commutator is a multiple of the identity, the
uncertainty relation can be written in the form (4.46):
$$
\Delta q_i\, \Delta p_i \;\ge\; \frac{1}{2}\, \hbar \tag{4.52}
$$
that must be compared with (3.18); in the present case, however, $\Delta x$ and
$\Delta p_x$ have a precise meaning given by (4.31): they are the root mean
squares of the results of measurements of x and $p_x$ in the same state.
Those states for which (4.52) holds with the equality sign (we shall encounter
an example in the next chapter) are called minimum uncertainty states.
Thanks to the proportionality between (4.51) and the corresponding Poisson
brackets, and to the formal properties (4.48) shared by both, it follows
that in many cases one has:
$$
[f, g] = i\hbar\, [f, g]_{\rm pb} . \tag{4.53}
$$
The equality (4.53) may break down due to the order of factors, which is
relevant only in the left hand side. For example:
$$
[q^2, p^2] = q\, [q, p^2] + [q, p^2]\, q = 2\, i\hbar\, (q\, p + p\, q)
$$
whereas
$$
i\hbar\, [q^2, p^2]_{\rm pb} = 4\, i\hbar\, q\, p .
$$
In any event, when the problem of the order of factors does not show up, (4.53)
holds and may provide a quick way to calculate complicated commutators.
For example, in such a way the following important commutators can be
calculated:
$$
\begin{aligned}
[f(q), g(q)] &= 0 ; & [f(p), g(p)] &= 0 ; \\
[q_i, f(p)] &= i\hbar\, \frac{\partial f}{\partial p_i} ; & [p_i, f(q)] &= -i\hbar\, \frac{\partial f}{\partial q_i} .
\end{aligned}
\tag{4.54}
$$
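The comparison between commutators and Poisson brackets can be mechanized for polynomials in a single pair q, p. The Python sketch below is an illustration under stated assumptions, not a general tool: operators are stored as dictionaries `{(m, n): c}` standing for sums of terms $c\, q^m p^n$ with all the q's written to the left, units are chosen so that ℏ = 1, and products are reordered with the identity $p^n q^m = \sum_k \binom{n}{k} \binom{m}{k}\, k!\, (-i\hbar)^k\, q^{m-k} p^{n-k}$, which follows from repeated use of $[q, p] = i\hbar$. All function names are hypothetical.

```python
from math import comb, factorial

HBAR = 1.0  # assumption: units in which hbar = 1


def clean(op, tol=1e-12):
    """Drop terms whose coefficient vanishes."""
    return {key: c for key, c in op.items() if abs(c) > tol}


def multiply(a, b):
    """Operator product, rewritten with every q to the left of every p."""
    out = {}
    for (m1, n1), c1 in a.items():
        for (m2, n2), c2 in b.items():
            # Move p^n1 past q^m2 using the reordering identity.
            for k in range(min(n1, m2) + 1):
                c = (c1 * c2 * comb(n1, k) * comb(m2, k) * factorial(k)
                     * (-1j * HBAR) ** k)
                key = (m1 + m2 - k, n1 + n2 - k)
                out[key] = out.get(key, 0) + c
    return clean(out)


def combine(a, b, sign=1):
    """Return a + sign * b."""
    out = dict(a)
    for key, c in b.items():
        out[key] = out.get(key, 0) + sign * c
    return clean(out)


def times(z, a):
    """Multiple z * a of the operator a."""
    return clean({key: z * c for key, c in a.items()})


def commutator(a, b):
    """[a, b] = a b - b a."""
    return combine(multiply(a, b), multiply(b, a), sign=-1)


def poisson(a, b):
    """Poisson bracket (4.49) for one degree of freedom, q's written first."""
    out = {}
    for (m1, n1), c1 in a.items():
        for (m2, n2), c2 in b.items():
            c = c1 * c2 * (m1 * n2 - n1 * m2)
            if c != 0:
                key = (m1 + m2 - 1, n1 + n2 - 1)
                out[key] = out.get(key, 0) + c
    return clean(out)


def same(a, b):
    """True if the two operators coincide term by term."""
    return combine(a, b, sign=-1) == {}


Q, P = {(1, 0): 1}, {(0, 1): 1}
Q2, P2 = multiply(Q, Q), multiply(P, P)

# (4.53) holds when no ordering problem shows up ...
print(same(commutator(Q, P2), times(1j * HBAR, poisson(Q, P2))))
# ... but fails for [q^2, p^2], which equals 2 i hbar (q p + p q) instead.
print(same(commutator(Q2, P2), times(1j * HBAR, poisson(Q2, P2))))
print(same(commutator(Q2, P2),
           times(2j * HBAR, combine(multiply(Q, P), multiply(P, Q)))))
```

In this representation the mismatch for $[q^2, p^2]$ shows up as an extra constant term, which is exactly what is produced by rewriting p q as q p − iℏ.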
