Quantum Information Processing Lecture Notes, Wolf
Michael M. Wolf
1 Mathematical framework
1.1 Hilbert spaces
1.2 Bounded Operators
    Ideals of operators
    Convergence of operators
    Functional calculus
1.3 Probabilistic structure of Quantum Theory
    Preparation
    Measurements
    Probabilities
    Observables and expectation values
1.4 Convexity
    Convex sets and extreme points
    Mixtures of states
    Majorization
    Convex functionals
    Entropy
1.5 Composite systems and tensor products
    Direct sums
    Tensor products
    Partial trace
    Composite and reduced systems
    Entropic quantities
1.6 Quantum channels and operations
    Schrödinger & Heisenberg picture
    Kraus representation and environment
    Choi-matrices
    Instruments
    Commuting dilations
1.7 Unbounded operators and spectral measures
2 Basic trade-offs
2.1 Uncertainty relations
    Variance-based preparation uncertainty relations
    Joint measurability
2.2 Information-disturbance
    No information without disturbance
2.3 Time-energy
    Mandelstam-Tamm inequalities
    Evolution to orthogonal states
These are (incomplete but hopefully growing) lecture notes of a course taught in
summer 2019 at the department of mathematics at the Technical University of
Munich.
Chapter 1
Mathematical framework
In fact, whenever a norm satisfies Eq.(1.1) for all ψ, ϕ, then we can reconstruct
a corresponding inner product via the polarization identity, which in the case of
a complex space reads
\[ \langle\psi,\varphi\rangle = \frac{1}{4}\sum_{k=0}^{3} i^{k}\,\bigl\|\varphi + i^{k}\psi\bigr\|^{2}. \tag{1.2} \]
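The polarization identity in Eq.(1.2) can be sanity-checked numerically on C³. The following is a minimal sketch in plain Python; the helper names `inner`, `norm_sq` and `polarization` are illustrative, and the inner product is assumed to be conjugate-linear in its first argument (the convention of Example 1.4).

```python
# Numerical check of the complex polarization identity, Eq. (1.2).
# Assumption: <psi, phi> is conjugate-linear in psi and linear in phi.

def inner(psi, phi):
    """Standard inner product on C^n: sum_i conj(psi_i) * phi_i."""
    return sum(a.conjugate() * b for a, b in zip(psi, phi))

def norm_sq(psi):
    """Squared norm induced by the inner product."""
    return inner(psi, psi).real

def polarization(psi, phi):
    """Right-hand side of Eq. (1.2): (1/4) * sum_k i^k * ||phi + i^k psi||^2."""
    total = 0
    for k in range(4):
        phase = 1j ** k
        shifted = [b + phase * a for a, b in zip(psi, phi)]
        total += phase * norm_sq(shifted)
    return total / 4

# Two arbitrary test vectors in C^3.
psi = [1 + 2j, -0.5j, 3.0]
phi = [0.25 - 1j, 2.0, 1j]

lhs = inner(psi, phi)          # inner product computed directly
rhs = polarization(psi, phi)   # inner product reconstructed from norms only
```

Both `lhs` and `rhs` should agree up to floating-point rounding, which illustrates that the norm alone determines the inner product.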
1 That is, every Cauchy sequence converges.
2 Note that the derivation of Cauchy-Schwarz does not use that ⟨ψ, ψ⟩ = 0 ⇒ ψ = 0. It
only requires that ⟨ψ, ψ⟩ ≥ 0.
\[ \sum_{k} |e_k\rangle\langle e_k| = \mathbb{1}. \tag{1.3} \]
To write expressions of this form even more compactly, the elements of a fixed
orthonormal basis are often simply specified by their label so that one writes
|k⟩ instead of |e_k⟩.
So far, this has been abstract Hilbert space theory. Before we proceed, some
concrete examples of Hilbert spaces:
Example 1.1. C^n becomes a Hilbert space when equipped with the standard
inner product
\[ \langle\psi,\varphi\rangle = \sum_{i=1}^{n} \overline{\psi_i}\,\varphi_i. \]
Example 1.2. The sequence space l2(N) of square-summable complex sequences
becomes a Hilbert space when equipped with the standard inner product
\[ \langle\psi,\varphi\rangle = \sum_{k} \overline{\psi_k}\,\varphi_k. \]
The standard orthonormal basis in this case is given by sequences
e_k, k ∈ N, such that the l’th element in e_k equals δ_{lk}.
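Example 1.3. The function space
\[ L^2(\mathbb{R}) := \Bigl\{ \psi : \mathbb{R}\to\mathbb{C} \;\Bigm|\; \int_{\mathbb{R}} |\psi(x)|^2\,dx < \infty \Bigr\} \Big/ \sim, \]
where ψ ∼ ϕ ⇔ ∫_R |ψ(x) − ϕ(x)|² dx = 0, becomes a separable Hilbert
space with
\[ \langle\psi,\varphi\rangle = \int_{\mathbb{R}} \overline{\psi(x)}\,\varphi(x)\,dx. \]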
Example 1.4. The space C^{n×m} of complex n × m matrices becomes a Hilbert
space with ⟨A, B⟩ = tr[A∗B].
Two Hilbert spaces H1 and H2 are called isomorphic if there is a bijection
U : H1 → H2 that preserves all inner products. U , which is called a Hilbert
space isomorphism, is then necessarily linear and it turns out that Hilbert spaces
are isomorphic iff they have the same dimension. Hence, all separable Hilbert
spaces are isomorphic to either C^n or l2(N); in particular, L2(R) ≃ l2(N).
Sometimes one has to deal with inner product spaces that are not complete.
In these cases the following theorem comes in handy and allows one to ‘upgrade’
every such space to a Hilbert space:
Theorem 1.1 (Completion theorem). For every inner product space X there is
a Hilbert space H and a linear map V : X → H that preserves all inner products³
so that V(X) is dense in H and equal to H if X is complete. The space H is then
called the completion of X. It is unique in the sense that if (V′, H′) give rise to
another completion, then there is a Hilbert space isomorphism U : H → H′ s.t.
V′ = U ∘ V.
This is often stated and used for real Hilbert spaces, but equally valid for
complex ones.
From now on, we will tacitly assume that all Hilbert spaces H, H1 , H2 , etc.
are complex and separable.
Exercise 1.1. Show that the closed unit ball of any Hilbert space is strictly convex.
Exercise 1.2. Show that any linear map U : H1 → H2 that preserves norms also
preserves inner products.
Exercise 1.3. a) Prove that ψ = ϕ iff ∀φ ∈ H : ⟨φ, ϕ⟩ = ⟨φ, ψ⟩.
b) Let A : H → H be linear, ψ, ϕ ∈ H. Verify the identity
\[ \langle\varphi, A\psi\rangle = \frac{1}{4}\sum_{k=0}^{3} i^{k}\,\bigl\langle \psi + i^{k}\varphi,\; A(\psi + i^{k}\varphi)\bigr\rangle. \]
Notes and literature Frigyes Riesz, David Hilbert and Hilbert’s student Erhard
Schmidt studied various aspects of concrete Hilbert spaces (mainly in the context of
integral equations or for l2(N)) in the first years of the 20th century. The introduction of
a geometric viewpoint, which led to the concept of orthogonality, is largely due to Schmidt.
The term Hilbert space was coined by Frigyes Riesz for concrete Hilbert spaces and it was
later used by John von Neumann for the underlying abstract concept. Hermann Weyl
introduced the name unitary space in parallel. Von Neumann, who included separability in
the definition of a Hilbert space, used the concept to unify Schrödinger’s wave mechanics
with the matrix mechanics of Werner Heisenberg, Pascual Jordan and Max Born. An
impetus for von Neumann’s work was a series of lectures given by David Hilbert in the
winter term 1926/27 on the development of quantum mechanics. Von Neumann attended
the lectures and quickly established a rigorous mathematical basis for what he had heard.
Soon after, this led to the foundational work “Über die Grundlagen der Quantenmechanik”.
A good way to learn about the mathematics of Hilbert spaces is from Paul Halmos’ “A
Hilbert space problem book”.
are all Hermitian and unitary. Together with σ0 := 1 they form a basis of the
space of 2 × 2 matrices.
Positivity is a crucial concept for many things that follow. It induces a
partial order within the set of Hermitian operators by understanding A ≥ B as
A − B ≥ 0. There are various ways of characterizing a positive operator. For
instance, A ≥ 0 holds iff A = A∗ ∧ spec(A) ⊆ [0, ∞), which in turn is equivalent
to the existence of a B ∈ B(H) so that A = B∗B. If such a B exists, it can
always be chosen positive itself, which then uniquely defines a positive square
root B =: √A ≥ 0 for any A ≥ 0. This in turn enables the definition of a
positive absolute value |A| := √(A∗A) ∈ B(H) for any A ∈ B(H). The absolute
value is also related to the original operator via the polar decomposition, which
states that for any A ∈ B(H) there is a partial isometry U such that A = U |A|.
Here U can be taken unitary iff ker(A) and ker(A∗ ) have the same dimension.
4 The term self-adjoint is used as well.
Using spectral theory, one can show that every Hermitian operator A ∈ B(H)
admits a unique decomposition of the form
In this case, the absolute value can also be expressed as |A| = A₊ + A₋. Another
way in which linear combinations of positive operators can be used is once again
a variant of the polarization formula, which for a pair of bounded
operators A, B ∈ B(H) takes the form
\[ B^{*}A = \frac{1}{4}\sum_{k=0}^{3} i^{k}\,\bigl(A + i^{k}B\bigr)^{*}\bigl(A + i^{k}B\bigr). \tag{1.7} \]
where s ∈ R_+^N is a null sequence whose non-zero elements are called singular
values of A and {e_k}, {f_k} are two orthonormal sets of vectors in H2 and H1,
respectively. The singular values of A are unique as a multiset. If H1 = H2 = H
each e_k can be chosen proportional (equal) to f_k iff A is normal (positive). In
these cases, Eq.(1.8) then leads to the spectral decomposition, with eigenvectors
e_k and eigenvalues s_k⟨f_k, e_k⟩.
If we restrict the space of compact operators to those for which s ∈ l2 (N) or
s ∈ l1 (N), we are led to the spaces of Hilbert-Schmidt class operators B2 (H1 , H2 )
and, in the case of equal spaces, the trace-class operators B1 (H), respectively.
These become Banach spaces when equipped with the Hilbert-Schmidt norm
‖A‖₂ := ‖s‖₂ and the trace-norm ‖A‖₁ := ‖s‖₁, respectively. With respect to
these norms B2(H1, H2) and B1(H) can be regarded as completions of the space of
finite-rank operators and we have the inclusion (with equalities iff dim(H) < ∞)
These inclusions also reflect the norm inequalities ‖A‖₁ ≥ ‖A‖₂ ≥ ‖A‖∞ := ‖A‖
for A ∈ B(H). All the spaces in Eq.(1.9) are ∗-ideals in B(H), which means that
they are closed under multiplication with elements of B(H) and under taking the
adjoint. Moreover, A, B ∈ B2(H) implies AB ∈ B1(H).
An alternative and equivalent definition of B2(H1, H2) and B1(H) is in terms
of the trace. For a positive operator A ∈ B(H), the trace tr[A] ∈ [0, ∞] is defined
as
\[ \operatorname{tr}[A] := \sum_{k} \langle e_k, A e_k\rangle, \tag{1.10} \]
where the sum runs over all elements of an orthonormal basis. Positivity
guarantees that the expression is independent of the choice of that basis. Then
B1(H) is the space of all operators for which tr[|A|] < ∞. For all trace-class
operators the trace is then unambiguously defined as well (thus the name) and
‖A‖₁ = tr[|A|]. This satisfies |tr[A]| ≤ ‖A‖₁ (as can be seen from the Schmidt
decomposition) and the Hölder inequality ‖AB‖₁ ≤ ‖A‖₁ ‖B‖∞ holds.
In a similar vein, we can express the Hilbert-Schmidt norm as ‖B‖₂ =
tr[B∗B]^{1/2} for any B ∈ B2(H1, H2). In fact, B2(H1, H2) becomes a Hilbert
space when equipped with the Hilbert-Schmidt inner product ⟨A, B⟩ := tr[A∗B]
(like in Example 1.4).
Example 1.6 (Operator bases). As a Hilbert space B2(H) admits an orthonormal
basis. A simple common choice is the set of matrix units {|k⟩⟨l|}, which exploits
an orthonormal basis {|k⟩} of H. If d := dim(H) < ∞, another useful basis
can be constructed from a discrete Weyl system: define a set {U_{k,l}}_{k,l=0}^{d−1} of d²
unitaries by
\[ U_{k,l} := \sum_{r=0}^{d-1} \eta^{rl}\, |k+r\rangle\langle r|, \qquad \eta := e^{2\pi i/d}, \tag{1.11} \]
where addition inside the ket is modulo d and {|k⟩}_{k=0}^{d−1} is again an orthonormal
basis of H. Then the U_{k,l}’s become orthonormal w.r.t. the Hilbert-Schmidt
inner product when divided by √d. Note that for d = 2, the U_{k,l}’s reduce to
the Pauli matrices (up to phases, i.e. scalar multiples of modulus 1).
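The orthonormality claim of Example 1.6 can be checked numerically for small d. The sketch below is plain Python with illustrative helper names (`weyl_unitary`, `hs_inner`, `gram_deviation`). It assumes the ket index in Eq.(1.11) is k + r modulo d, so that each U_{k,l} is a cyclic shift times a diagonal phase matrix and hence unitary.

```python
import cmath

def weyl_unitary(k, l, d):
    """U_{k,l} = sum_r eta^(r*l) |k+r><r| as a d x d nested list (assumed form)."""
    eta = cmath.exp(2j * cmath.pi / d)
    U = [[0j] * d for _ in range(d)]
    for r in range(d):
        U[(k + r) % d][r] = eta ** (r * l)
    return U

def hs_inner(A, B):
    """Hilbert-Schmidt inner product <A, B> = tr[A* B] = sum_ij conj(A_ij) B_ij."""
    d = len(A)
    return sum(A[i][j].conjugate() * B[i][j] for i in range(d) for j in range(d))

def gram_deviation(d):
    """Largest deviation of <U_a, U_b>/d from the Kronecker delta over all label pairs."""
    labels = [(k, l) for k in range(d) for l in range(d)]
    worst = 0.0
    for a in labels:
        for b in labels:
            value = hs_inner(weyl_unitary(*a, d), weyl_unitary(*b, d)) / d
            target = 1.0 if a == b else 0.0
            worst = max(worst, abs(value - target))
    return worst

deviation_d2 = gram_deviation(2)
deviation_d3 = gram_deviation(3)
```

Both deviations should vanish up to rounding, i.e. the d² operators U_{k,l}/√d form an orthonormal basis of the d × d matrices.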
Since B2(H) is a Hilbert space, the Riesz representation theorem guarantees
that every continuous linear functional on B2(H) is of the form
\[ A \mapsto \operatorname{tr}[BA], \tag{1.12} \]
for some B ∈ B2(H). That is, B2(H)′ ≃ B2(H). Via the same trace formula we
also have that B∞(H)′ ≃ B1(H) and B1(H)′ ≃ B(H). B(H)′, however, contains
more elements than those that can be obtained from Eq.(1.12) with B ∈ B1(H).
A frequently used property of the trace is that
These are generally related as follows: norm convergence implies weak-* convergence
(via Hölder’s inequality) and also strong convergence (via the Lipschitz
inequality). These two, in turn, imply weak convergence (by using B = |ϕ⟩⟨ψ| and
Cauchy-Schwarz, respectively). Moreover, on norm-bounded subsets of B(H)
weak and weak-* convergence are equivalent (as shown by employing the Schmidt
decomposition together with dominated convergence).
The expression in Eq.(1.3) is strongly convergent. More generally, any norm-
bounded increasing sequence of Hermitian operators is strongly convergent in
B(H). This is often useful to lift results from finite dimensions to infinite di-
mensions. Sometimes it is used together with the fact that if An → A and
Bn → B each converge strongly, then An Bn → AB converges strongly as well,
and An C → AC converges in norm for any C ∈ B∞ (H).
Each of the mentioned notions of convergence is based on a corresponding
topology on B(H). The weak-* topology, for instance, can be defined as the
smallest topology in which all functionals of the form B(H) ∋ A ↦ tr[AB] are
continuous for any B ∈ B1(H).
Preparation While the term ‘state’ is used for various different albeit related
mathematical objects (explained further down), a mathematically unambiguous
way to describe the preparation of a quantum system is the use of density
operators:
Proposition 1.5 (Purity). Let ρ ∈ B(H) be a density operator. Then 0 <
tr[ρ²] ≤ 1 with equality iff ρ describes a pure state. Moreover, if d := dim(H) <
∞, then tr[ρ²] ≥ 1/d with equality iff ρ = 1/d (which is then called maximally
mixed).
Proof. Since tr[ρ²] = ‖ρ‖₂², it is positive and non-zero. Hölder’s inequality
together with ‖ρ‖₁ = 1 gives tr[ρ²] ≤ ‖ρ‖. Since the operator norm, in this
case, equals the largest eigenvalue and all eigenvalues are positive and sum up
to one, we get ‖ρ‖ ≤ 1 with equality iff ρ has rank one.
For the lower bound in finite dimensions, we can invoke the Cauchy-Schwarz
inequality for the Hilbert-Schmidt inner product in order to get:
\[ 1 = \operatorname{tr}[\mathbb{1}\rho]^{2} \le \operatorname{tr}[\mathbb{1}]\,\operatorname{tr}[\rho^{2}] = d\,\operatorname{tr}[\rho^{2}]. \]
7 There is yet another, more general, mathematical meaning of the term ‘state’, namely as a
positive normalized linear functional. Clearly, every density operator induces such a functional
via A ↦ tr[ρA]. In fact, every weak-* continuous positive normalized linear functional on
B(H) is of this form. If one drops or relaxes the continuity requirement, there are, however,
other ‘states’ as well. Those arising from density operators are then called normal states and
the other ones singular states.
Example 1.7 (Bloch ball). There is a bijection between the set of density operators
on C² and the set of vectors r ∈ R³ with Euclidean norm ‖r‖ ≤ 1, given
by
\[ \rho = \frac{1}{2}\Bigl(\mathbb{1} + \sum_{i=1}^{3} r_i\,\sigma_i\Bigr). \tag{1.16} \]
The purity is then expressible as tr[ρ²] = ½(1 + ‖r‖²). Consequently, the
boundary coincides with the set of pure states and the origin corresponds to
the maximally mixed state. Physically, a two-level density operator (a ‘qubit’)
might for instance model:
◦ An atom in a double-well potential. ρ = |0⟩⟨0| and ρ = |1⟩⟨1| would then
correspond to the atom being left or right, respectively.
◦ A two-level atom with ρ = |0⟩⟨0|, ρ = |1⟩⟨1| referring to the ground and
excited state, respectively.
◦ The spin of an electron with ρ = |0⟩⟨0| ≙ spin up, ρ = |1⟩⟨1| ≙ spin down.
◦ Polarization degrees of freedom of light. North-/south pole correspond
to left-/right circular polarization while the east-/west pole correspond to
horizontal/vertical polarization. The center ρ = 1/2 then describes
unpolarized light.
The case dim(H) = 2 is very special in many ways. For instance, a nice
geometric representation of the set of all density operators as in Eq.(1.16) is not
possible in higher dimensions.
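The relation between Bloch vector and purity stated in Example 1.7 can be illustrated numerically. The following is a minimal sketch in plain Python; it assumes the standard Pauli matrices for σ1, σ2, σ3, and the function names `bloch_to_rho` and `purity` are illustrative.

```python
# Pauli matrices sigma_1, sigma_2, sigma_3 (standard convention, assumed here).
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def bloch_to_rho(r):
    """Density operator rho = (1 + sum_i r_i sigma_i) / 2 as in Eq. (1.16)."""
    rho = [[0.5 + 0j, 0j], [0j, 0.5 + 0j]]
    for r_i, sigma in zip(r, SIGMA):
        for a in range(2):
            for b in range(2):
                rho[a][b] += 0.5 * r_i * sigma[a][b]
    return rho

def purity(rho):
    """tr[rho^2] = sum_{a,b} rho_ab * rho_ba."""
    return sum(rho[a][b] * rho[b][a] for a in range(2) for b in range(2)).real

r = (0.3, -0.4, 0.5)                 # a vector inside the unit ball
rho = bloch_to_rho(r)
norm_sq = sum(x * x for x in r)
predicted = 0.5 * (1 + norm_sq)      # purity formula from Example 1.7
```

Here `purity(rho)` should coincide with `predicted`; unit vectors give purity 1 (pure states) and the origin gives purity 1/2 (the maximally mixed state).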
In infinite dimensions, as seen in Exercise 1.10, weak convergence can be
rather weak indeed, even when restricted to finite-rank operators. On the set
of density operators, however, normalization and positivity ensure that weak
convergence implies every other form of convergence:
Theorem 1.6 (Convergence to a density operator). Let ρ_n ∈ B1(H) be a
sequence of positive operators that converges weakly to a density operator ρ and
satisfies tr[ρ_n] → 1. Then ‖ρ_n − ρ‖₁ → 0.
Proof. Exploiting the spectral decomposition of ρ, we can find a finite-dimensional
orthogonal projection P for which 1 − tr[ρP] =: ϵ is arbitrarily small. That is,
for any ε > 0, we can achieve ϵ < ε in this way. With P⊥ := 1 − P we can
bound
\[ \|\rho-\rho_n\|_1 \le \|P(\rho-\rho_n)P\|_1 + 2\,\|P(\rho-\rho_n)P^{\perp}\|_1 + \|P^{\perp}(\rho-\rho_n)P^{\perp}\|_1. \tag{1.17} \]
The first term on the r.h.s. converges to zero, since it involves only finite-
dimensional operators on which weak convergence implies norm convergence (in
any norm). For the second term on the r.h.s. of Eq.(1.17) we first use that
P ρP ⊥ = 0 and then bound the remaining part via
Here, we have first used Hölder’s inequality, then the polar decomposition
ρn P ⊥ = U |ρn P ⊥ |, and in the third step the Cauchy-Schwarz inequality for
the Hilbert-Schmidt inner product.
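Finally, an upper bound for the third term on the r.h.s. of Eq.(1.17) is
\[ \|P^{\perp}(\rho-\rho_n)P^{\perp}\|_1 \le \operatorname{tr}[P^{\perp}\rho P^{\perp}] + \operatorname{tr}[P^{\perp}\rho_n P^{\perp}] = \epsilon + \operatorname{tr}[\rho_n] - \operatorname{tr}[P\rho_n P] \to 2\epsilon. \]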
In fact, the property just proven extends to the entire space of trace-class
operators: if T_n ∈ B1(H) converges weakly to T ∈ B1(H) and ‖T_n‖₁ → ‖T‖₁,
then T_n → T in trace-norm.
Probabilities Having introduced the basic mathematical objects that are
assigned to preparation and measurement, it remains to see how these are
combined in a way that eventually leads to probabilities. The following
postulate does exactly this:
Postulate 1.9 (Born’s rule). The probability p(Y|ρ, M) of measuring an
outcome in Y ∈ B if preparation and measurement are described by a density
operator ρ ∈ B1(H) and a POVM M : B → B(H), respectively, is given by
\[ p(Y|\rho,M) := \operatorname{tr}[\rho\,M(Y)]. \]
If ρ and M are clear from the context, we will simply write p(Y ) := p(Y |ρ, M )
and if X is discrete and B the corresponding power set, we will write p(x) or
px for p({x}).
The defining properties of density operators and POVMs now nicely play
together so that p(Y |ρ, M ) has all the necessary properties for an interpretation
in terms of probabilities:
Here interchanging the sum with the one in the trace is justified by positivity
of all expressions and Fubini-Tonelli.
If M and ρ are given, Born’s rule tells us how to compute quantum theory’s
prediction of the measurement probabilities. In practice, we typically know M
and ρ only for some simple cases together with some mathematical rules (yet
to be formalized in this lecture) telling us how to reduce more general cases to
these simple ones. The largest part of quantum theory (Schrödinger equation,
composite systems, etc.) is about those rules and their consequences.
Traditional text-book quantum theory often assumes ρ to be pure and M to
be sharp. We will soon see in which sense this is justified.
As a first application of the formalism, let us consider the problem of infor-
mation transmission via a d-level quantum system, i.e., one for which H = Cd .
Given an alphabet X of size |X| = m, is it possible to encode all its elements
into a d-level quantum system so that the information can finally be retrieved
exactly or at least with a small probability of error?
Following the rules of the formalism, we assign a density operator ρ_x ∈ B(H)
to each x ∈ X. Similarly, we assume that there is a measurement apparatus that
has X as the set of possible measurement outcomes so that a positive operator
M_x ∈ B(H) is assigned to each outcome and that Σ_{x∈X} M_x = 1. If ρ_x has been
prepared, the probability for measuring the correct outcome is then, according
to Born’s rule: p_x := tr[ρ_x M_x]. Now consider the average probability of success,
averaged uniformly over all x ∈ X:
Proposition 1.11. The average probability of success, when transmitting an
alphabet of size m over a d-level quantum system, satisfies (1/m) Σ_x p_x ≤ d/m.
Proof. The claim follows from the defining properties of POVMs and density
operators for instance via the use of Hölder’s inequality and the fact that
‖ρ_x‖∞ ≤ 1:
\[ \frac{1}{m}\sum_{x} p_x = \frac{1}{m}\sum_{x}\operatorname{tr}[\rho_x M_x] \le \frac{1}{m}\sum_{x}\|\rho_x\|_\infty\,\|M_x\|_1 \le \frac{1}{m}\sum_{x}\operatorname{tr}[M_x] = \frac{d}{m}. \]
This should be compared with the performance of the following naive classical
(= non-quantum) protocol that aims at transmitting a random element from
the alphabet X using only d of its elements: fix any subset D ⊆ X of d = |D|
elements; send x if x ∈ D and send an arbitrary element from D if x ∉ D. The
probability of success of this protocol is d/m. Prop.1.11 tells us that this
cannot be outperformed by any quantum protocol.
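The success probability d/m of the naive classical protocol can be reproduced by direct enumeration. The sketch below is plain Python with exact rational arithmetic; the function name, the choice D = {0, …, d−1} and the fallback symbol are illustrative assumptions, and d ≤ m is assumed.

```python
from fractions import Fraction

def naive_success_probability(m, d):
    """Average success probability of the naive protocol for a uniform input.

    The alphabet is X = {0, ..., m-1}; the subset D consists of the first d
    symbols (an arbitrary choice), and the first element of D is sent
    whenever the input lies outside D. Assumes 1 <= d <= m.
    """
    D = set(range(d))
    fallback = 0                         # arbitrary fixed element of D
    successes = 0
    for x in range(m):
        sent = x if x in D else fallback
        if sent == x:                    # receiver simply outputs what was sent
            successes += 1
    return Fraction(successes, m)

p_success = naive_success_probability(5, 2)   # alphabet of size 5, 2 usable symbols
```

For m = 5 and d = 2 this evaluates to 2/5, matching the upper bound of Prop. 1.11 for a two-level system.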
As a second simple application of the formalism, let us analyze to what
extent a change in ρ or M can alter the probability of a measurement outcome:
Corollary 1.12 (Lipschitz-bounds for probabilities). Let ρ, ρ′ ∈ B1(H) be
density operators, M, M′ : B → B(H) POVMs on a common measurable space
(X, B) and Y ∈ B. Then
where equality can be attained for every pair ρ, ρ′ by a suitable choice of the
POVM M. Similarly,
≤ ‖Δ₊‖₁ ‖M(Y)‖∞ ≤ ½ ‖ρ − ρ′‖₁,
where we have used kM (Y )k∞ ≤ 1, which is a consequence of 0 ≤ M (Y ) ≤
1 (cf. Exercise 1.8). Equality in all the involved inequalities is achieved for
M (Y ) = P+ . The operators (P+ , 1 − P+ ) then form a suitable POVM.
In order to arrive at Eq.(1.22), first note that Hölder’s inequality together
with ‖ρ‖₁ = 1 leads to the upper bound
\[ \operatorname{tr}\bigl[\rho\,\bigl(M(Y)-M'(Y)\bigr)\bigr] \le \|M(Y)-M'(Y)\|_\infty. \]
That this equals the supremum follows from the fact that the operator norm of
the Hermitian operator M(Y) − M′(Y) can already be obtained by taking the
supremum over all pure states ρ = |ψ⟩⟨ψ| on the l.h.s. (cf. Exercise 1.8d).
which in the discrete case reduces to M̂ = Σ_x m(x) M_x. We will also use the
common notation ⟨M̂⟩ := tr[ρM̂]. So far, M̂ is a formal expression that is not
guaranteed to be meaningful if m is not bounded. For simplicity, we will leave
the discussion of the unbounded case aside.
If the underlying POVM M is sharp, then M̂ = Σ_x m(x) M_x becomes a
spectral decomposition. In this case, we call M̂ an observable⁹ and notice that
each m(x) is then an eigenvalue of M̂ with corresponding spectral projection M_x.
That is, M̂ determines both m and M . In this way, any Hermitian operator is a
mathematically valid observable whose spectral decomposition determines the
set of possible measurement values and the POVM. Furthermore, since spectral
projections of a Hermitian operator that correspond to different eigenvalues are
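mutually orthogonal (i.e. M_x M_y = δ_{x,y} M_x, cf. Exercise 1.15) we can express
the variance as
\[ \operatorname{var}(m) = \operatorname{tr}\bigl[\rho\hat M^{2}\bigr] - \operatorname{tr}\bigl[\rho\hat M\bigr]^{2} =: \operatorname{var}(\hat M). \tag{1.24} \]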
Notice that this does not hold in general, i.e. when M is not sharp.
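For a sharp measurement, Eq.(1.24) can be compared with the variance computed directly from the outcome distribution. The following plain-Python sketch uses the observable σ_x on a qubit as an example; the density matrix entries and the helper names are illustrative, and var(m) is assumed to denote the ordinary variance of the values m(x) under the probabilities p_x = tr[ρM_x].

```python
def matmul(A, B):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(A):
    return A[0][0] + A[1][1]

# A qubit density operator (Hermitian, trace one, positive determinant).
rho = [[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]]

# Sharp POVM: spectral projections of sigma_x with measurement values +1 and -1.
P_plus = [[0.5, 0.5], [0.5, 0.5]]
P_minus = [[0.5, -0.5], [-0.5, 0.5]]
values = [1, -1]
M_hat = [[0, 1], [1, 0]]          # M_hat = (+1) * P_plus + (-1) * P_minus

# Variance of the outcome distribution p_x = tr[rho M_x].
probs = [trace(matmul(rho, P)).real for P in (P_plus, P_minus)]
mean = sum(v * p for v, p in zip(values, probs))
var_outcomes = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))

# Variance from the operator expression in Eq. (1.24).
second_moment = trace(matmul(rho, matmul(M_hat, M_hat))).real
var_operator = second_moment - trace(matmul(rho, M_hat)).real ** 2
```

Both `var_outcomes` and `var_operator` should agree, since the projections P_plus and P_minus are mutually orthogonal.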
Textbook descriptions of quantities like position, momentum, energy,
angular momentum and spin are usually in terms of observables (albeit in the
more general framework of not necessarily bounded self-adjoint operators). For
instance, the Pauli matrices, when divided by two, are the observables that
correspond to the three spin directions of a spin-1/2 particle.
Exercise 1.11. Show that every trace-class operator can be written as a linear combi-
nation of four density operators.
Exercise 1.12. Let V ∈ B(H1 , H2 ) be such that for every density operator ρ ∈ B1 (H1 )
the operator V ρV ∗ is again a density operator. What can be said about V ?
Exercise 1.13. Prove the Bloch ball representation in Eq.(1.16). (Hint: use the deter-
minant). For a given density operator on C2 , how can the vector r be obtained?
Exercise 1.14. For any H construct a POVM that implements a ‘biased coin’ whose
outcomes occur independently of the density operator with probabilities ½(1 ± b),
where b ∈ [0, 1] is a fixed bias.
Exercise 1.15. Let M : B → B(H) be a sharp POVM on (X, B). Show that Y ∩ J = ∅
implies that M (Y )M (J) = 0. From here, prove that the number of pairwise disjoint
elements in B on which M is non-zero is at most d if H = Cd .
Exercise 1.16. Show that two preparations described by density operators ρ1 , ρ2 ∈
B1 (H) can be distinguished with certainty in a statistical experiment iff ρ1 ρ2 = 0.
Exercise 1.17. Construct a pair of density operators ρ, ρ′ on a common Hilbert space
with the properties that: (i) their spectra coincide and each eigenvalue has multiplicity
one, (ii) there is no unitary U such that UρU∗ = ρ′.
1.4 Convexity
Convex sets and extreme points
◦ The dimension of a convex set is the dimension of the affine space gener-
ated by it.
Here, the decomposition into extreme points is unique for all x ∈ C iff the
convex set is a simplex, i.e., it has exactly d + 1 extreme points. The set of
probability distributions over a finite set, for instance, forms a simplex.
The infinite dimensional analogue of Caratheodory’s theorem requires taking
the closure of the set of extreme points. Then the analogous statement is true for
all topologies that are ‘locally convex’. This means that the topology arises from
(semi-)norms, as it is the case for all topologies discussed so far, in particular,
for the weak-* topology on B(H).
The second part of this theorem (due to Kadison and Pedersen) implies that
every element of the unit ball can be approximated up to 2/n in operator norm
by an equal-weight convex combination of n unitaries. This is reminiscent of
the following result that holds for inner product spaces. It has a very elegant
proof that exploits the probabilistic method—so we have to state it:
Theorem 1.17 (Maurey). Let C be a subset of an inner product space, φ ∈
conv(C) and b := sup_{ξ∈C} ‖ξ‖. For any n ∈ N there are elements ψ_1, …, ψ_n ∈ C
so that
\[ \Bigl\| \phi - \frac{1}{n}\sum_{i=1}^{n}\psi_i \Bigr\|^{2} \le \frac{b^{2}}{n}, \tag{1.25} \]
where the norm is the one induced by the inner product.
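Maurey's bound can be explored by brute force in a small example. The sketch below, in plain Python, takes a three-point set C in R² and a point φ in its convex hull (both chosen arbitrarily for illustration) and searches all n-tuples from C for the best equal-weight average. It only illustrates the statement of Thm. 1.17 and is not the probabilistic proof.

```python
from itertools import product

def norm_sq(v):
    """Squared Euclidean norm of a real vector."""
    return sum(x * x for x in v)

def combine(vectors, weights):
    """Linear combination sum_i weights[i] * vectors[i]."""
    dim = len(vectors[0])
    return tuple(sum(w * v[i] for v, w in zip(vectors, weights))
                 for i in range(dim))

# Illustrative data: a finite set C and a point phi in conv(C).
C = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
weights = [0.5, 0.3, 0.2]
phi = combine(C, weights)
b_sq = max(norm_sq(xi) for xi in C)      # b^2 = sup over C of ||xi||^2

def best_equal_weight_error(n):
    """Smallest ||phi - (1/n) sum_i psi_i||^2 over all n-tuples from C."""
    best = float("inf")
    for choice in product(C, repeat=n):
        avg = combine(choice, [1.0 / n] * n)
        err = norm_sq(tuple(p - a for p, a in zip(phi, avg)))
        best = min(best, err)
    return best

errors = {n: best_equal_weight_error(n) for n in range(1, 6)}
```

For every n the entry `errors[n]` should not exceed `b_sq / n`, in line with Eq.(1.25).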
Mixtures of states On any given Hilbert space H, the set of density opera-
tors S(H) := {ρ ∈ B1 (H)|ρ ≥ 0, tr [ρ] = 1} is a convex set: the trace is obviously
preserved by convex combinations and the sum of two positive operators is again
positive. In fact, slightly more is true: if (ρ_n)_{n∈N} is any sequence of density
operators and (λ_n)_{n∈N} is any sequence of positive numbers that sum up to one,
then
\[ \sum_{n=1}^{\infty} \lambda_n\rho_n \in S(H), \]
where the sequence of partial sums converges in trace norm. In order to see
this, realize that it is a Cauchy sequence (as ‖Σ_{n=k}^{l} λ_n ρ_n‖₁ ≤ Σ_{n=k}^{l} λ_n and
λ ∈ l1(N)) and that B1(H) is a Banach space.
Conversely, every single density operator can be convexly decomposed into
pure state density operators via its spectral decomposition, which in this case
coincides with the Schmidt decomposition
\[ \rho = \sum_{n} \lambda_n\,|\psi_n\rangle\langle\psi_n|, \]
where the λ_n’s are the eigenvalues and the ψ_n’s the corresponding orthonormal
eigenvectors. Pure state density operators cannot be convexly decomposed
further (Exercise 1.18). Consequently, the pure state density operators are
exactly the extreme points of S(H). If ρ is not pure, there are infinitely many
ways of decomposing it convexly into pure states—the spectral decomposition
is one of them and distinguishes itself by the fact that the ψn ’s are mutually
orthogonal.
Example 1.8 (Decompositions into pure states). For any density operator ρ ∈
B1 (H) convex decompositions into pure states can be constructed from any
orthonormal basis {ek } via the corresponding resolution of identity in Eq.(1.3):
if we multiply Eq.(1.3) from both sides with √ρ, we obtain
\[ \rho = \sum_{k} \sqrt{\rho}\,|e_k\rangle\langle e_k|\,\sqrt{\rho} = \sum_{k} p_k\,|\varphi_k\rangle\langle\varphi_k|, \tag{1.26} \]
with ϕ_k := √ρ e_k / ‖√ρ e_k‖ and p_k := ⟨e_k, ρ e_k⟩. Since every subspace that has
dimension greater than one admits an infinite number of inequivalent orthonormal
bases, this construction leads to an infinite number of different decompositions
unless ρ is pure. In Cor. 1.22 we will see that the resulting probability
distribution p is always at least as mixed as the distribution of eigenvalues of ρ.
Moreover, one can show that all countable convex decompositions into pure
states can be obtained in the described way if one allows in addition to first
embed isometrically into a larger Hilbert space and then follows the described
construction starting from an orthonormal basis of the larger space.
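The construction of Example 1.8 can be traced through for a qubit. The sketch below is plain Python; it assumes the illustrative choice ρ = diag(3/4, 1/4), for which √ρ is obtained entrywise, together with the real orthonormal basis e_± = (1, ±1)/√2. With real vectors no complex conjugation is needed.

```python
import math

rho_diag = [0.75, 0.25]                            # rho = diag(3/4, 1/4)
sqrt_rho_diag = [math.sqrt(x) for x in rho_diag]   # sqrt(rho), also diagonal

s = 1 / math.sqrt(2)
basis = [[s, s], [s, -s]]                          # orthonormal basis e_+, e_-

p = []        # weights p_k = <e_k, rho e_k>
phis = []     # unit vectors phi_k = sqrt(rho) e_k / ||sqrt(rho) e_k||
for e in basis:
    v = [sqrt_rho_diag[i] * e[i] for i in range(2)]    # sqrt(rho) e_k
    weight = sum(x * x for x in v)                     # ||sqrt(rho) e_k||^2 = p_k
    p.append(weight)
    phis.append([x / math.sqrt(weight) for x in v])

# Reassemble sum_k p_k |phi_k><phi_k| and compare with rho.
reconstructed = [[sum(p[k] * phis[k][i] * phis[k][j] for k in range(2))
                  for j in range(2)] for i in range(2)]
```

In this example both weights come out as 1/2, so the decomposition is an equal mixture of two non-orthogonal pure states, whereas the spectral decomposition has weights (3/4, 1/4).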
Convex combinations of density operators have a simple operational mean-
ing. To understand this, assume that an experimentalist has two preparation
devices at hand, which are described by density operators ρ0 , ρ1 ∈ B1 (H). As-
sume further, that for every single preparation of the system, she first flips a
coin and then uses one of the two devices depending on the outcome, say ρ1
with probability λ and ρ0 with probability 1 − λ. If eventually a measurement is
performed that is described by a POVM M , then the probability of measuring
an outcome in Y is given by
λp(Y|ρ1, M) + (1 − λ)p(Y|ρ0, M) = tr[(λρ1 + (1 − λ)ρ0) M(Y)],
where Born’s rule was used together with the linearity of the trace. Hence, the
overall preparation, which now includes the random choice of the experimental-
ist, is described by the convex combination λρ1 + (1 − λ)ρ0 .
Majorization In Prop. 1.5 we saw that the functional tr[ρ²] can be used to
quantify how pure or mixed a density operator is. Using functional calculus this
can be expressed as tr[ρ²] = tr[f(ρ)] with f(x) = x². This choice is somewhat
arbitrary since we could instead have used e.g. f(x) = x³, which also orders the
set of density operators from the maximally mixed state to the pure states. If
dim(H) > 2, however, the two orders turn out to be inequivalent, i.e. we can
find ρ1, ρ2 with tr[ρ1²] > tr[ρ2²] but tr[ρ1³] < tr[ρ2³]. So is there any reasonable
way of saying that ρ1 is more mixed (or pure) than ρ2? The answer to this
question is given by a preorder¹¹ that is based on the notion of majorization.
11 A preorder is a binary relation that is transitive and reflexive.
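A concrete pair of spectra in dimension three shows that the orders induced by f(x) = x² and f(x) = x³ can disagree. The sketch below uses exact rational arithmetic in plain Python; the two spectra are an illustrative choice (any density operators with these eigenvalues will do), and `power_trace` is a hypothetical helper name.

```python
from fractions import Fraction

def power_trace(eigenvalues, k):
    """tr[rho^k] computed from the eigenvalues of rho."""
    return sum(x ** k for x in eigenvalues)

# Eigenvalues of two density operators on C^3 (illustrative choice).
spec1 = [Fraction(1, 2), Fraction(1, 2), Fraction(0)]
spec2 = [Fraction(62, 100), Fraction(19, 100), Fraction(19, 100)]

purity1, purity2 = power_trace(spec1, 2), power_trace(spec2, 2)
cubic1, cubic2 = power_trace(spec1, 3), power_trace(spec2, 3)
```

Here `purity1 > purity2` while `cubic1 < cubic2`, so neither functional alone gives a canonical notion of "more mixed".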
and similarly for the transposed matrix. Note that in particular every permu-
tation matrix is unistochastic as it can be obtained by choosing U to be the
corresponding permutation of basis elements.
The following relates the concepts discussed so far in this paragraph:
Theorem 1.20. Let λ, µ be two finite (and equal-length) or infinite sequences
of non-negative real numbers with ‖λ‖₁ = ‖µ‖₁ = 1. Then the following are
equivalent:
(i) λ ≺ µ.
(ii) There is a doubly stochastic matrix M so that λ = M µ.
(iii) For all continuous convex functions f : [0, 1] → R that satisfy f(0) = 0:
\[ \sum_{k} f(\lambda_k) \le \sum_{k} f(\mu_k). \tag{1.29} \]
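The implications (ii) ⇒ (i) and (ii) ⇒ (iii) of Thm. 1.20 can be illustrated on a finite example. The sketch below is plain Python with exact fractions; it assumes the usual definition of λ ≺ µ via partial sums of the decreasingly ordered entries, and the matrix M, the vector µ and the helper names are illustrative choices.

```python
from fractions import Fraction as F

def majorized_by(lam, mu):
    """True iff lam ≺ mu: equal totals, and every partial sum of the
    decreasingly ordered lam is at most the corresponding one of mu."""
    a, b = sorted(lam, reverse=True), sorted(mu, reverse=True)
    if len(a) != len(b) or sum(a) != sum(b):
        return False
    sum_a = sum_b = 0
    for x, y in zip(a, b):
        sum_a += x
        sum_b += y
        if sum_a > sum_b:
            return False
    return True

def apply(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# A doubly stochastic matrix: all rows and all columns sum to one.
M = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 2), F(1, 4), F(1, 4)],
     [F(0),    F(1, 4), F(3, 4)]]
mu = [F(7, 10), F(2, 10), F(1, 10)]
lam = apply(M, mu)

square = lambda x: x * x        # a continuous convex f with f(0) = 0
lhs = sum(square(x) for x in lam)
rhs = sum(square(x) for x in mu)
```

One should find `majorized_by(lam, mu)` to be true and `lhs <= rhs`, while the converse relation `majorized_by(mu, lam)` fails.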
\[ \lambda \succ p. \tag{1.30} \]
Proof. Inserting the spectral decomposition ρ = Σ_i λ_i |ψ_i⟩⟨ψ_i| into p_k = ⟨e_k, ρ e_k⟩,
we obtain p = Mλ with M_{ki} := |⟨e_k, ψ_i⟩|². Since we can express ψ_i = U e_i for
a suitable unitary U, we get that M is a unistochastic matrix, so that by
Thm.1.20 λ ≻ p.
In the context of Example 1.8, this result implies that among all the decom-
positions into pure states to which Eq.(1.26) gives rise, the spectral decomposi-
tion is the least mixed.
Proof. First observe that c := ⟨ψ, Aψ⟩ ∈ [a, b] since a1 ≤ A ≤ b1. Assume for
the moment that c ∈ (a, b). By convexity of f we can find an affine function
l : [a, b] → R such that f ≥ l and f(c) = l(c). Then f(A) ≥ l(A) and therefore
\[ \langle\psi, f(A)\psi\rangle \ge \langle\psi, l(A)\psi\rangle = l(c) = f(c) = f\bigl(\langle\psi, A\psi\rangle\bigr), \]
Here, the first inequality is due to Eq.(1.31), the second inequality uses convexity
of f and the last step uses that (⟨ψ_k, A_λ ψ_k⟩) is the sequence of eigenvalues of
A_λ.
(ii) follows from Eq.(1.31) applied to each term in F(A) = Σ_k ⟨e_k, f(A)e_k⟩.
The following useful observation also allows one to lift inequalities of scalar
functions to inequalities of functions of operators under the trace:
Lemma 1.25. Let I ⊆ R be an open interval. If f_i, g_i : I → R and α_i ∈ R for
i ∈ {1, …, n} satisfy
\[ \sum_{i=1}^{n} \alpha_i\, f_i(a)\,g_i(b) \ge 0 \qquad \forall a, b \in I, \]
then
\[ \sum_{i=1}^{n} \alpha_i \operatorname{tr}\bigl[f_i(A)\,g_i(B)\bigr] \ge 0 \tag{1.33} \]
Proof. Both inequalities exploit Lemma 1.25. Eq.(1.34) then follows from the
fact that any convex function satisfies f(a) − f(b) ≥ (a − b)f′(b) and Eq.(1.26)
uses the mean-value version of Taylor’s theorem, which states that there is a
z ∈ [a, b] such that
\[ f(b) = f(a) + (b-a)\,f'(a) + \frac{1}{2}(b-a)^{2} f''(z). \]
Depending on the field, different bases of the logarithm are used: the natural
choice in information theory is log2 , whereas in thermodynamics and statistical
physics the natural logarithm ln is used.
On the relevant interval [0, 1] the function h is non-negative, continuous and
concave. By Cor.1.24 (i) this implies that the von Neumann entropy S is a non-
negative, concave functional on the set of density operators. From Cor. 1.21 we
get
ρ1 ≺ ρ2 ⇒ S(ρ1 ) ≥ S(ρ2 ).
For finite dimensional Hilbert spaces the von Neumann entropy is continuous,
which is implied by continuity of the eigenvalues. In infinite dimensions conti-
nuity has to be relaxed to lower semicontinuity. This means lim inf ρ→ρ0 S(ρ) ≥
S(ρ0 ) (cf. Example 1.11 and Exercise 1.21).
Since h(x) = 0 iff x ∈ {0, 1} we get that S(ρ) = 0 iff ρ is pure. On Cd the
maximum S(ρ) = log d is attained iff ρ = 1/d is maximally mixed. The infinite
dimensional case is elucidated by the following example:
Example 1.11 (Infinite entropy). Consider a sequence p_n := c/(n (log n)^γ) for
n > 2, γ ∈ (1, 2) and c a positive constant to be chosen shortly. From
∫ 1/(x (log x)^γ) dx = (log x)^{1−γ}/(1 − γ) it follows that p ∈ l¹(N) so that we
can choose c in a way that Σ_n p_n = 1. However, −Σ_n p_n log p_n = ∞ due
to the divergence of the integral ∫ 1/(x (log x)^{γ−1}) dx. Hence, if σ is a density
operator with eigenvalues (p_n), then S(σ) = ∞. Moreover, if ρ is any density
operator, then S((1 − ε)ρ + εσ) ≥ (1 − ε)S(ρ) + εS(σ) = ∞ for any ε > 0.
Consequently, on an infinite dimensional Hilbert space, the density operators
with infinite entropy are trace-norm dense in the set of all density operators.
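For finite spectra, the basic properties of S can be checked directly from the eigenvalues. The sketch below assumes the standard form S(ρ) = Σ_k h(p_k) with h(x) = −x log x and h(0) = 0; the function name `entropy` and the sample spectra are illustrative choices, not taken from the notes.

```python
import math

def entropy(eigenvalues, base=2.0):
    """Entropy sum_k h(p_k) with h(x) = -x log x and the convention h(0) = 0."""
    return sum(-p * math.log(p, base) for p in eigenvalues if p > 0.0)

d = 4
pure = [1.0, 0.0, 0.0, 0.0]        # spectrum of a pure state
mixed = [1.0 / d] * d              # spectrum of the maximally mixed state 1/d

# Two hypothetical spectra with rho1 majorized by rho2
# (partial sums 0.4, 0.7, 0.9, 1.0 versus 0.7, 0.9, 1.0, 1.0).
rho1 = [0.4, 0.3, 0.2, 0.1]
rho2 = [0.7, 0.2, 0.1, 0.0]

print(entropy(pure))                   # vanishes for a pure state
print(entropy(mixed), math.log2(d))    # maximal value log d
print(entropy(rho1) >= entropy(rho2))  # the more mixed spectrum has larger entropy
```

The base of the logarithm is a parameter here, in line with the remark that different fields use different bases.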
Exercise 1.18. Show that pure states are extreme points of the convex set of density
operators.
1.5. COMPOSITE SYSTEMS AND TENSOR PRODUCTS 29
Exercise 1.19. Let ρ1, ρ2 ∈ B(C^d) be two density operators. Prove that ρ1 ≺ ρ2 iff
there exist a finite set of unitaries U_i ∈ B(C^d) and corresponding probabilities p_i > 0,
Σ_i p_i = 1 so that ρ1 = Σ_i p_i U_i ρ2 U_i*.
Exercise 1.20. Denote by U_n all maps from B(C^d) to itself that are of the form
B(C^d) ∋ ρ ↦ Σ_{i=1}^n p_i U_i ρ U_i*, for some p_i ≥ 0, Σ_{i=1}^n p_i = 1 and unitaries U_i ∈ B(C^d).
Determine an m ∈ N (as a function of d) such that U_m = ∪_{n∈N} U_n.
Exercise 1.21. Construct a sequence of density operators of finite entropy that con-
verges in trace-norm to a pure state but has entropy diverging to ∞.
Definition 1.28 (Direct sum). Let H1 and H2 be Hilbert spaces. Their direct
sum is the Hilbert space H1 ⊕ H2 := {(ψ, ϕ) ∈ H1 × H2 } with inner product
Tensor products
H1 ⊗ H2 ≃ H2 ⊗ H1,
(H1 ⊗ H2) ⊗ H3 ≃ H1 ⊗ (H2 ⊗ H3), (1.37)
H1 ⊗ (H2 ⊕ H3) ≃ (H1 ⊗ H2) ⊕ (H1 ⊗ H3).
Proof. We could simply argue that the respective orthonormal bases have the
same cardinality and thus there has to be an isomorphism. For later use, however,
we follow a more explicit route. For that, it is convenient to introduce
the complex conjugate ψ̄ := Σ_k ⟨ψ, e_k⟩ e_k of an arbitrary element ψ ∈ H1 w.r.t.
a fixed orthonormal basis {e_k} ⊂ H1. Note that the operation ψ ↦ ψ̄ is an
involution that preserves the norm as well as orthogonality. Now we define
and extend it by linearity and continuity to the entire space. Then I is the
sought Hilbert space isomorphism since it is a bijection between orthonormal
bases: a product basis |e_k⟩ ⊗ |f_l⟩ of H1 ⊗ H2 and a basis of rank-one operators
|f_l⟩⟨e_k| of B2(H1, H2).
An important application of this isomorphism is a normal-form for elements
of a tensor product Hilbert space:
Theorem 1.31 (Schmidt decomposition for tensor products). For every Ψ ∈
H1 ⊗ H2 there is an r ∈ N ∪ {∞}, a sequence of strictly positive numbers (s_i)_{i=1}^r
and orthonormal bases {e_k} ⊂ H1, {f_l} ⊂ H2 such that
\[ \Psi = \sum_{i=1}^{r} s_i \, e_i \otimes f_i. \tag{1.42} \]
(A1 ψ1) ⊗ (A2 ψ2) and its extension by linearity. Then (A1 ⊗ A2)* = A1* ⊗ A2*
and if B_i ∈ B(H_i) then
The tensor product can be shown to preserve properties like unitarity, positivity,
Hermiticity, normality, boundedness, compactness, trace-class or
Hilbert-Schmidt-class. That is, if both A1 and A2 have one of these properties, then
so does A1 ⊗ A2. More specifically, ‖A1 ⊗ A2‖_p = ‖A1‖_p ‖A2‖_p holds for all
p ∈ [1, ∞] and if A1, A2 are trace-class, then tr[A1 ⊗ A2] = tr[A1] tr[A2].
A useful representation of the tensor product in the finite dimensional case
is the Kronecker product of matrices: if A and B are finite matrices, then A ⊗ B
can be represented as a block matrix
\[ A \otimes B = \begin{pmatrix} A_{11}B & A_{12}B & \cdots \\ A_{21}B & A_{22}B & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}. \]
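The block structure of the Kronecker product can be sketched in a few lines. The helper names `kron` and `trace` and the two sample matrices are illustrative assumptions; matrices are represented as plain lists of rows.

```python
def kron(A, B):
    """Kronecker product of matrices given as lists of rows:
    block (i, j) of the result equals A[i][j] * B."""
    return [
        [a * b for a in row_a for b in row_b]
        for row_a in A
        for row_b in B
    ]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 2]]
AB = kron(A, B)

for row in AB:
    print(row)
# The trace is multiplicative under the tensor product.
print(trace(AB), trace(A) * trace(B))
```

The last line illustrates the identity tr[A1 ⊗ A2] = tr[A1] tr[A2] for this particular pair of matrices.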
(ii) If H1 ≃ H2 ≃ C^d and I(Ψ) is invertible, then for any A ∈ B(H1) there is a
B ∈ B(H2), which can be obtained from A via a similarity transformation,
so that
(A ⊗ 1)Ψ = (1 ⊗ B)Ψ. (1.44)
If Ψ is maximally entangled, then B has the same singular values as A.
In particular, if I(Ψ) = 1/√d, then B = A^T.
Proof. (i) follows from the defining equation of the isomorphism, Eq.(1.41), via
\[ (A \otimes B)\,|\psi\rangle \otimes |\varphi\rangle = |A\psi\rangle \otimes |B\varphi\rangle \mapsto |B\varphi\rangle\langle\overline{A\psi}| = B\,|\varphi\rangle\langle\bar\psi|\,A^T. \]
Eq.(1.44) in (ii) follows from (i) by setting B := I(Ψ) A^T I(Ψ)^{−1}. Since A is
similar to A^T, B is similar to A. If in addition Ψ is maximally entangled, then
√d I(Ψ) is a unitary, so that the claim follows by inserting the singular value
decomposition of A.
Example 1.13 (GHZ and W-states). As a shorthand for e_k ⊗ f_l ⊗ g_m, where k, l, m
each label elements of an orthonormal basis, it is sometimes convenient to write
|k l m⟩. Using this notation, two prominent examples of states in C² ⊗ C² ⊗ C²
are the Greenberger-Horne-Zeilinger (GHZ) state (|000⟩ + |111⟩)/√2 and the
W-state (|100⟩ + |010⟩ + |001⟩)/√3.
Definition 1.33 (Tensor rank). The tensor rank of an element Ψ ∈ H1 ⊗ . . . ⊗
H_m is defined as
\[ R(\Psi) := \min\Bigl\{ r \in \mathbb{N} \;\Big|\; \Psi = \sum_{i=1}^{r} \psi_i^{(1)} \otimes \cdots \otimes \psi_i^{(m)},\ \psi_i^{(k)} \in H_k \Bigr\}. \]
linear combinations of the r products a_i(A) b_i(B). In this way, and by using
recursion, the (so far unknown) tensor rank of T provides an upper bound on
the (so far unknown) complexity of matrix multiplication. Note that naive
matrix multiplication would require d³ products but, as Strassen has observed,
R(T) < d³. Specifically, for d = 2 he found R(T) = 7.
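For d = 2, the seven products can be written out explicitly. The sketch below uses the commonly quoted form of Strassen's scheme; the function name `strassen_2x2` and the labels m1 to m7 are illustrative, and the entries may be numbers or, in a recursive use, matrix blocks supporting +, − and *.

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using 7 products instead of the naive 8."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B

    # Seven products of linear combinations of entries of A and of B.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # Each entry of the product is a linear combination of the seven products.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each m_k is one product of a linear form in the entries of A with a linear form in the entries of B, which is exactly the structure of a rank-7 decomposition of the multiplication tensor.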
\[ \operatorname{tr}_2[B] := \sum_k \bigl(\mathbb{1} \otimes \langle e_k|\bigr)\, B\, \bigl(\mathbb{1} \otimes |e_k\rangle\bigr). \tag{1.49} \]
According to the subsequent Lemma 1.36, the r.h.s. of this equation converges
in trace-norm to a trace-class operator. Hence, tr2 is well-defined and Eq.(1.47)
can be verified by insertion. Uniqueness of the map is implied by the fact
that specifying tr [XA] for all A ∈ B(H1 ) determines X. In particular, the
construction in Eq.(1.49) is basis-independent.
The properties summarized in Eq.(1.48) follow immediately from Eq.(1.47).
For instance, positivity of ⟨ψ, tr_2[B]ψ⟩ = tr[B(|ψ⟩⟨ψ| ⊗ 1)] is implied by
positivity of B together with |ψ⟩⟨ψ| ⊗ 1 ≥ 0 (cf. Exercise 1.6).
Finally, we prove the missing Lemma that shows trace-norm convergence of
the ansatz in Eq.(1.49). For later use, the formulation is slightly more general.
Lemma 1.36. Let (A_k)_{k∈N} ⊂ B(H1, H2) be a sequence of operators for which
lim_{n→∞} Σ_{k=1}^n A_k* A_k = X ∈ B(H1) converges weakly. Then for every B ∈
B1(H1) there is a B′ ∈ B1(H2) so that
\[ \Bigl\| B' - \sum_{k=1}^{n} A_k B A_k^* \Bigr\|_1 \to 0, \tag{1.50} \]
the second system and consider only the first part, the corresponding density
operator is given by ρ1 := tr2 [ρ]. This is then called a reduced density operator.
Similarly, if we discard the first subsystem, the reduced density operator that
describes the remaining part is ρ2 := tr1 [ρ]. If ρ is a pure state, the reduced
density operators can be read off its Schmidt decomposition:
Corollary 1.37. Let |Ψ⟩⟨Ψ| ∈ B1(H1 ⊗ H2) be a pure density operator with
Schmidt decomposition |Ψ⟩ = Σ_{i=1}^r √λ_i |e_i⟩ ⊗ |f_i⟩ with r ∈ N ∪ {∞}. Then its
Proof. The statement follows from inserting the Schmidt decomposition into the
explicit form of the partial trace in Eq.(1.49). The calculation simplifies if we
use the basis of the Schmidt decomposition in the respective partial trace.
Cor.1.37 leads to some simple but useful observations: the spectra of the
two reduced density operators coincide as multisets and, more qualitatively, the
rank of each reduced density operator equals the Schmidt rank. In particular,
Ψ is a simple tensor product (r = 1) iff the reduced states are pure.
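These observations can be checked numerically for two qubits. In the sketch below a vector Ψ = Σ_kl C[k][l] e_k ⊗ f_l is stored through its coefficient matrix C; the function names and the sample vector are illustrative assumptions. For 2×2 matrices, equal trace and determinant imply equal spectra.

```python
import math

def reduced_states(C):
    """Given coefficients C[k][l] of Psi = sum_kl C[k][l] e_k (x) f_l,
    return (rho1, rho2) = (tr_2 |Psi><Psi|, tr_1 |Psi><Psi|)."""
    n1, n2 = len(C), len(C[0])
    rho1 = [[sum(C[k][l] * C[m][l].conjugate() for l in range(n2))
             for m in range(n1)] for k in range(n1)]
    rho2 = [[sum(C[k][l] * C[k][m].conjugate() for k in range(n1))
             for m in range(n2)] for l in range(n2)]
    return rho1, rho2

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def purity(M):
    """tr[M^2] for a Hermitian matrix M; equals 1 iff the state is pure."""
    return sum(abs(x) ** 2 for row in M for x in row)

s = 1 / math.sqrt(3)
C = [[s, s], [0.0, s]]            # Psi = (|00> + |01> + |11>)/sqrt(3)
rho1, rho2 = reduced_states(C)
print(trace(rho1), det2(rho1))    # same trace and determinant ...
print(trace(rho2), det2(rho2))    # ... hence the same spectrum

prod1, prod2 = reduced_states([[1.0, 0.0], [0.0, 0.0]])   # Psi = |00>
print(purity(prod1), purity(prod2))                        # both reduced states are pure
```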
Another simple but useful observation is that the above corollary can be
read in reverse, and we can (at least mathematically) regard every mixed state
as the reduced state of some larger system that is described by a pure state:
Corollary 1.38 (Purification). Let ρ1 ∈ B1(H1) be a density operator of rank
r ∈ N ∪ {∞}. Then there is a Hilbert space H2 of dimension dim(H2) = r and
a pure state |Ψ⟩⟨Ψ| ∈ B1(H1 ⊗ H2) so that ρ1 = tr_2[|Ψ⟩⟨Ψ|].
Proof. We start with the spectral decomposition of ρ1, which we interpret as the
l.h.s. of Eq.(1.51), and construct a pure state Ψ via its Schmidt decomposition
with Schmidt coefficients √λ_i and the eigenvectors of ρ1 as orthonormal family
on the first tensor factor. Cor. 1.37 then guarantees that we recover ρ1 as the
partial trace of |ΨihΨ|.
Clearly, such a purification is not unique. Any state vector of the form
(1 ⊗ V )Ψ with V an isometry would also be a working purification.
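As a minimal worked example (the specific state is chosen here for illustration only), take a qubit density operator that is diagonal in the basis {|0⟩, |1⟩}. One purification and a second one obtained from a unitary V that swaps the basis of the second factor read:

```latex
\rho_1 = \tfrac{3}{4}\,|0\rangle\langle 0| + \tfrac{1}{4}\,|1\rangle\langle 1| ,
\qquad
|\Psi\rangle = \tfrac{\sqrt{3}}{2}\,|0\rangle\otimes|0\rangle + \tfrac{1}{2}\,|1\rangle\otimes|1\rangle ,
\qquad
(\mathbb{1}\otimes V)|\Psi\rangle = \tfrac{\sqrt{3}}{2}\,|0\rangle\otimes|1\rangle + \tfrac{1}{2}\,|1\rangle\otimes|0\rangle .
```

In both cases the partial trace over the second factor removes the cross terms, because the vectors on the second factor are orthogonal, and returns ρ1.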
Let us finally have a closer look at how the machinery of reduced and com-
posite systems works on the side of the measurements. Suppose there are two
independent measurement devices acting on the two parts of a composite sys-
tem, individually described by POVMs M1 and M2 . If Y1 ⊆ X1 and Y2 ⊆ X2 are
corresponding measurable sets of measurement outcomes, then the overall mea-
surement that now has outcomes in X1 × X2 , equipped with the product sigma-
algebra, is described by a POVM that satisfies M (Y1 × Y2 ) = M1 (Y1 ) ⊗ M2 (Y2 ).
Taking disjoint unions and complements (as in Lemma 1.8) this defines M on
the entire product sigma-algebra. The marginal probabilities are then given by
\[ p_1(Y_1) = p(Y_1 \times X_2) = \operatorname{tr}[\rho\, M(Y_1 \times X_2)] = \operatorname{tr}\bigl[\rho\,(M_1(Y_1) \otimes M_2(X_2))\bigr] = \operatorname{tr}\bigl[\rho\,(M_1(Y_1) \otimes \mathbb{1})\bigr] = \operatorname{tr}[\rho_1 M_1(Y_1)], \]
consistent with the definition and interpretation of the reduced density operator
ρ1 = tr_2[ρ].
If the overall state is described by a simple tensor product ρ = ρ1 ⊗ ρ2,
which is then called a product state, we obtain
\[ p(Y_1 \times Y_2) = \operatorname{tr}\bigl[(\rho_1 \otimes \rho_2)(M_1(Y_1) \otimes M_2(Y_2))\bigr] = \operatorname{tr}[\rho_1 M_1(Y_1)]\, \operatorname{tr}[\rho_2 M_2(Y_2)] = p_1(Y_1)\, p_2(Y_2). \]
This means that the measurement outcomes are independent. In other words,
there are no correlations between the subsystems if the preparation is described
by a product state.
Entropic quantities
Corollary 1.40 (Pinsker inequality). The relative entropy and the mutual
information as defined in Def.1.39 satisfy:
\[ S(\rho\|\sigma) \ge \frac{1}{2}\, \|\rho - \sigma\|_1^2, \tag{1.52} \]
\[ I(A:B) \ge \frac{1}{2}\, \|\rho_{AB} - \rho_A \otimes \rho_B\|_1^2. \tag{1.53} \]
In particular, S(ρ‖σ) = 0 and I(A : B) = 0 iff ρ = σ and ρ_AB = ρ_A ⊗ ρ_B,
respectively.
Proof. For ease of the argument, we are going to cheat a little bit and prove
Eqs. (1.52,1.53) for ‖·‖_2 instead of for ‖·‖_1. Clearly, the trace-norm bound is
the stronger result and we refer to ... for its proof.
By definition of the mutual information, Eq.(1.53) is a consequence of Eq.(1.52).
In order to arrive at Eq.(1.52), we use the fact that f(x) := x log x is strongly
convex on [0, 1] with f″(x) = 1/x ≥ 1. So we can apply Eq.(1.26) from which
the result then follows instantly.
Exercise 1.22. For i ∈ {1, 2} consider Ai ∈ B(Hi ). Show that if A1 , A2 are positive or
unitary then the same holds true for A1 ⊗ A2 .
Exercise 1.23 (Flip). Let H1 ' H2 ' Cd . By identifying bases of the two spaces we
can define a flip operator F ∈ B(H1 ⊗ H2 ) via F(ϕ ⊗ ψ) = ψ ⊗ ϕ.
(a) Determine the eigenvalues and eigenvectors of F.
(b) Prove that F is the unique operator satisfying tr [F(A ⊗ B)] = tr [AB] ∀A, B ∈
B(Cd ).
(c) Let (G_i)_{i=1}^{d²} ⊂ B(C^d) be any Hilbert-Schmidt-orthonormal basis of Hermitian
Remark: here we have tacitly introduced a third level tensor product, namely
the tensor product of linear maps on spaces of operators. T ⊗ idn is defined as
T ⊗ idn : A ⊗ B 7→ T (A) ⊗ B and linear extension to finite linear combinations.
Let us see how these properties come into play. If T : B1 (H1 ) → B1 (H2 )
is a trace-preserving and positive linear map, then T (ρ) is a density operator
whenever ρ is one. Recalling that ρ might describe a part of a larger system
whose other parts are left untouched by T , it is necessary to impose that not
only T maps density operators to density operators, but (T ⊗ id) does so as
well. This is captured by the notion of complete positivity. In principle, this
should hold not only for a finite-dimensional ‘innocent bystander’. We will see
later though, from the representation theory of completely positive maps, that
considering finite-dimensional systems is sufficient in this context.
Example 1.15 (Transposition). The paradigm of a map that is positive but not
completely positive is matrix transposition. Let Θ : B(H) → B(H), Θ(A) := A^T
be the transposition map w.r.t. a fixed basis {|k⟩} ⊂ H. This is a positive
map, since it preserves Hermiticity as well as the spectrum. However, for |ψ⟩ =
|00⟩ + |11⟩ ∈ H ⊗ C² we get
\[ (\Theta \otimes \mathrm{id}_2)\bigl(|\psi\rangle\langle\psi|\bigr) = \sum_{i,j=0}^{1} \Theta\bigl(|i\rangle\langle j|\bigr) \otimes |i\rangle\langle j| = \sum_{i,j=0}^{1} |j\rangle\langle i| \otimes |i\rangle\langle j|, \]
for which −1 is an element of the spectrum (cf. Exercise 1.23).
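The negative eigenvalue can be exhibited explicitly. The sketch below builds the 4×4 matrix Σ_ij |j⟩⟨i| ⊗ |i⟩⟨j| for d = 2 and applies it to the antisymmetric vector |01⟩ − |10⟩; the names `F`, `apply` and the ordering |i j⟩ → index d·i + j are conventions chosen here.

```python
d = 2

# Matrix of sum_{i,j} |j><i| (x) |i><j| in the product basis |a b> -> index d*a + b.
F = [[0] * (d * d) for _ in range(d * d)]
for i in range(d):
    for j in range(d):
        # |j><i| (x) |i><j| maps |i j> to |j i>.
        F[d * j + i][d * i + j] += 1

def apply(M, v):
    """Matrix-vector product for lists of rows."""
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]

antisym = [0, 1, -1, 0]   # |01> - |10>
sym = [1, 0, 0, 1]        # |00> + |11>

print(apply(F, antisym))  # equals -antisym: eigenvalue -1
print(apply(F, sym))      # equals sym: eigenvalue +1
```

Since a positive operator cannot have the eigenvalue −1, this confirms for d = 2 that the partially transposed operator is not positive.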
Let us turn to the Heisenberg picture. Assume that T ∗ : B(H2 ) → B(H1 ) is
a continuous, unital and positive linear map.12 If M : B → B(H2) is a POVM,
then M′ := T* ∘ M : B → B(H1) is a POVM as well. To see this, note that
positivity of T* implies positivity of M′(Y) for all Y ∈ B and if X = ∪_k X_k is
a countable disjoint partition of the set X of all possible outcomes into measurable
subsets X_k, then
\[ \sum_k M'(X_k) = T^*\Bigl(\sum_k M(X_k)\Bigr) = T^*(\mathbb{1}) = \mathbb{1}, \]
where we used continuity of T* in the first step and unitality in the last step.
Since Schrödinger picture and Heisenberg picture describe the same thing
from different viewpoints, they should lead to consistent predictions. As the pre-
dictions are in the end probabilities expressed through Born’s rule, the equiva-
lence of the two viewpoints should be expressible on this level. This equivalence
is established in the following theorem. For any map T in the Schrödinger pic-
ture it proves the existence of an equivalent description via a map T ∗ in the
12 The meaning of the ‘∗ ’ will become clear below. For now, read ‘T ∗ ’ just as an arbitrary
Imposing positivity of the l.h.s. for all ψ ∈ H1 and all positive A ∈ B(H2 ) is
equivalent to positivity of T . Imposing the same for the r.h.s. is equivalent
to positivity of T ∗ . So these conditions are equivalent. The same argument
applies to complete positivity by replacing T with T ⊗ idn and realizing that
(T ⊗ idn )∗ = T ∗ ⊗ idn .
Similarly, from Eq.(1.54) we derive the equation
\[ \operatorname{tr}[T(B) - B] = \operatorname{tr}\bigl[B\,(T^*(\mathbb{1}) - \mathbb{1})\bigr]. \tag{1.56} \]
Here the l.h.s. is zero for all B ∈ B1 (H1 ) iff T is trace-preserving, whereas the
r.h.s. is zero for all B ∈ B1 (H1 ) iff T ∗ is unital.
One important property of the dual map has been left aside and will be cov-
ered in the following corollary: continuity. Before proving this in a quantitative
way, some remarks on the involved norms are in order.
Both T and T* are maps between Banach spaces. If not specified otherwise,
their norms are the corresponding Banach space operator norms. That is,
‖T‖ = sup{‖T(B)‖_1 | ‖B‖_1 ≤ 1} and ‖T*‖ = sup{‖T*(A)‖_∞ | ‖A‖_∞ ≤ 1}. The
1.6. QUANTUM CHANNELS AND OPERATIONS 41
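involved trace-norm and the operator norm in B(H) are dual to each other in
the sense that
\[ \|B\|_1 = \sup_{\|A\|_\infty = 1} \bigl|\operatorname{tr}[AB]\bigr|, \quad \text{and} \quad \|A\|_\infty = \sup_{\|B\|_1 = 1} \bigl|\operatorname{tr}[AB]\bigr|. \tag{1.57} \]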
These equations can for instance be proven by means of the polar decomposition
and the Schmidt decomposition, respectively.
Corollary 1.43. Let T : B1(H1) → B1(H2) be a bounded linear map and T*
the corresponding dual map. Then ‖T*‖ = ‖T‖. Moreover, if T is positive,
these norms are equal to ‖T*(1)‖_∞. In particular, if T is positive and
trace-preserving, then for all B ∈ B1(H1), A ∈ B(H2):
Proof. Using the defining relation between T and T ∗ and Eq.(1.57) we obtain
To proceed, we exploit the convex structure of the unit balls in B1 (H1 ) and
B(H2 ) by which it suffices to take the suprema over all rank-one elements in the
trace-class and all unitaries in B(H2 ). The latter is justified by the Russo-Dye
theorem (Thm.1.16) and the former by the Schmidt-decomposition (Eq.(1.8)).
Thus
\[ \|T^*\| = \sup_{U} \sup_{\psi,\varphi} \bigl|\langle\varphi, T^*(U)\psi\rangle\bigr|, \tag{1.60} \]
where the suprema are taken over all unitaries U ∈ B(H2) and unit vectors
ϕ, ψ ∈ H1. Let us for the moment assume that H2 is finite dimensional. This
enables a spectral decomposition of the form U = Σ_k exp[iα_k] |e_k⟩⟨e_k| with
α_k ∈ R and {e_k} =: E ⊂ H2 an orthonormal basis. Inserting this into Eq.(1.60)
leads to
\[ \|T^*\| \le \sup_{E} \sup_{\psi,\varphi} \sum_k \bigl|\langle\varphi, T^*\bigl(|e_k\rangle\langle e_k|\bigr)\psi\rangle\bigr|, \tag{1.61} \]
Here, in the step from the first to the second line we have used positivity of
T ∗ together with two applications of Cauchy-Schwarz. Note that equality has
to hold in the inequality since U = 1 was a valid choice in the first place.
Eq.(1.58) then follows from unitality of T*, which for positive maps now implies
‖T‖ = ‖T*‖ = 1.
Finally, we have to come back to the assumption dim(H2) < ∞. Suppose
this is not the case. Then note that the core expression in Eq.(1.60) can also be
written as tr[U T(|ψ⟩⟨ϕ|)]. Since T(|ψ⟩⟨ϕ|) is a trace-class operator on H2 it
can be approximated arbitrarily well in trace-norm by a finite rank operator F.
So we may restrict ourselves to unitaries that act non-trivially only on the finite
dimensional subspace supp(F) + ran(F) and continue with the finite dimensional
argument.
Thm.1.42 constructs a map in the Heisenberg picture for any map in the
Schrödinger picture. What about the converse? In finite dimensions the sit-
uation is symmetric. There we can interpret the expression in Born’s rule as
Hilbert-Schmidt inner product w.r.t. which T* is the adjoint operator
corresponding to T. In infinite dimensions, the proof of Thm.1.42 relied on the
duality relation B1(H1)′ = B(H1), which does not hold in the other direction.
In other words, there are maps Φ : B(H2 ) → B(H1 ) in the Heisenberg picture
that have no predual that maps density operators to density operators. A map
Φ is called normal if there exists such a predual. Equivalently, Φ is normal if
it is continuous as a map from B(H2 ) to B(H1 ) when both spaces are equipped
with the weak-* topology.
\[ T(\rho) = \sum_{k=1}^{r} A_k \rho A_k^*. \tag{1.66} \]
k k =1
B_i := Σ_{j∈N} u_ij K_j defines a set of Kraus operators that represent the same map
via T(ρ) = Σ_{i∈N} B_i ρ B_i*.
Conversely, if {Ai }i∈N and {Bi }i∈N are two sets of Kraus-operators that
represent the same trace-preserving map and if either N is finite or both sets
contain an infinite number of zeros, then there is a unitary u s.t. B_i :=
Σ_{j∈N} u_ij A_j.
We will see later that every quantum channel can be represented in the ways
specified by Thm.1.44.
Example 1.16 (Phase damping channel). Let {|0⟩, |1⟩} denote an orthonormal
basis of C² and define ρ_ij := ⟨i|ρ|j⟩. A simple model of a ‘decoherence process’
is given by the phase damping channel that is parametrized by λ ∈ [0, 1] and
can be represented in the following ways:
\[ \rho \mapsto \begin{pmatrix} \rho_{00} & (1-\lambda)\rho_{01} \\ (1-\lambda)\rho_{10} & \rho_{11} \end{pmatrix} = \sum_{k=1}^{3} A_k \rho A_k^* \tag{1.67} \]
with A1 := √(1 − λ) 1, A2 := √λ |0⟩⟨0|, A3 := √λ |1⟩⟨1|.
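The equality of the two representations can be verified numerically. The following sketch uses plain nested lists for 2×2 matrices; the function names, the value λ = 0.25 and the test state are illustrative assumptions.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    """Conjugate transpose."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def phase_damping_kraus(lam):
    """Kraus operators sqrt(1-lam)*1, sqrt(lam)|0><0|, sqrt(lam)|1><1|."""
    a, b = math.sqrt(1 - lam), math.sqrt(lam)
    return [
        [[a, 0], [0, a]],
        [[b, 0], [0, 0]],
        [[0, 0], [0, b]],
    ]

def apply_channel(kraus, rho):
    """Evaluate sum_k A_k rho A_k^* for 2x2 matrices."""
    out = [[0j, 0j], [0j, 0j]]
    for A in kraus:
        term = matmul(matmul(A, rho), dagger(A))
        out = [[out[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return out

lam = 0.25
rho = [[0.6, 0.2 + 0.1j], [0.2 - 0.1j, 0.4]]
out = apply_channel(phase_damping_kraus(lam), rho)

# Direct form: diagonal untouched, off-diagonal entries scaled by (1 - lam).
expected = [[rho[0][0], (1 - lam) * rho[0][1]],
            [(1 - lam) * rho[1][0], rho[1][1]]]
print(out)
print(expected)
```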
ρ ↦ H ∗ ρ
defines a quantum channel, where ‘∗’ denotes the entry-wise product (a.k.a.
Hadamard product), i.e. (H ∗ ρ)_ij = H_ij ρ_ij, where the matrix elements are
w.r.t. a fixed orthonormal basis {|i⟩}_{i=1}^d. Showing that Hadamard channels
are indeed quantum channels is most easily done by observing that the set
of Hadamard channels coincides with the set of quantum channels with diagonal
Kraus operators. Consider a quantum channel ρ ↦ ρ′ := Σ_k A_k ρ A_k* with
⟨i|A_k|j⟩ = δ_ij a_ki. This is a Hadamard channel since ⟨i|ρ′|j⟩ = ⟨i|ρ|j⟩ H_ij with
H_ij = Σ_k a_ki ā_kj. For the converse direction, observe that the last equation can
be seen as decomposition of H into positive rank-one operators. In this way, we
can construct diagonal Kraus operators from H, and so prove that Hadamard
channels are indeed completely positive.
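As a worked instance (a sketch for the qubit phase damping channel with diagonal Kraus operators √(1 − λ) 1, √λ |0⟩⟨0| and √λ |1⟩⟨1|, indexed here by k = 1, 2, 3), the coefficients a_ki and the resulting matrix H are:

```latex
a_{1i} = \sqrt{1-\lambda}, \quad a_{2i} = \sqrt{\lambda}\,\delta_{i0}, \quad a_{3i} = \sqrt{\lambda}\,\delta_{i1}
\quad\Longrightarrow\quad
H_{ij} = \sum_{k=1}^{3} a_{ki}\,\bar a_{kj} = (1-\lambda) + \lambda\,\delta_{ij},
\qquad
H = \begin{pmatrix} 1 & 1-\lambda \\ 1-\lambda & 1 \end{pmatrix}.
```

Read in reverse, H = (1 − λ) vv^T + λ e_0 e_0^T + λ e_1 e_1^T with v = (1, 1)^T is a decomposition into positive rank-one operators, and each term yields one diagonal Kraus operator.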
Definition 1.47 (Choi matrix). For finite-dimensional H1 ≃ C^{d1} define |Ω⟩ :=
Σ_{i=1}^{d1} |ii⟩ ∈ H1 ⊗ H1 where each i labels an element of a fixed orthonormal
basis13. The Choi matrix C ∈ B1(H1 ⊗ H2) of a linear map T : B(H1) → B1(H2)
is defined as
\[ C := (\mathrm{id} \otimes T)\bigl(|\Omega\rangle\langle\Omega|\bigr). \]
Note that |Ω⟩/√d is a unit vector corresponding to a maximally entangled
state. The usefulness of the Choi matrix stems from a simple Lemma:
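To make the definition concrete, the sketch below computes C = Σ_ij |i⟩⟨j| ⊗ T(|i⟩⟨j|) for a qubit map and checks the identity tr[T(A)B] = tr[C(A^T ⊗ B)] on one pair of matrices. The map `T` (phase damping with a hypothetical parameter LAM = 0.25), the helper names and the test matrices are illustrative assumptions.

```python
def kron(A, B):
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def transpose(M):
    return [list(col) for col in zip(*M)]

LAM = 0.25  # hypothetical damping parameter

def T(X):
    """Phase damping on 2x2 matrices: off-diagonal entries scaled by 1 - LAM."""
    return [[X[0][0], (1 - LAM) * X[0][1]],
            [(1 - LAM) * X[1][0], X[1][1]]]

def unit(i, j, d=2):
    """Matrix unit |i><j|."""
    return [[1.0 if (r, c) == (i, j) else 0.0 for c in range(d)] for r in range(d)]

def choi(channel, d=2):
    """C = sum_{i,j} |i><j| (x) channel(|i><j|)."""
    C = [[0.0] * (d * d) for _ in range(d * d)]
    for i in range(d):
        for j in range(d):
            term = kron(unit(i, j, d), channel(unit(i, j, d)))
            C = [[C[r][c] + term[r][c] for c in range(d * d)] for r in range(d * d)]
    return C

C = choi(T)
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.5, 1.0], [-1.0, 2.0]]
lhs = trace(matmul(T(A), B))
rhs = trace(matmul(C, kron(transpose(A), B)))
print(C)
print(lhs, rhs)
```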
Lemma 1.48 (Cyclicity of maximally entangled state vectors). Let H1 ≃ C^{d1}
be finite-dimensional and |Ω⟩ := Σ_{i=1}^{d1} |ii⟩ ∈ H1 ⊗ H1. For any ψ ∈ H1 ⊗ H2
define A := I(ψ) ∈ B2(H1, H2), where I is the Hilbert-Schmidt isomorphism
constructed via Eq.(1.41) (w.r.t. the same basis that defines Ω). Then
Clearly, the statement of the Lemma holds similarly for interchanged tensor
factors. In particular, for any ψ ∈ H2 ⊗ H1 there is an A ∈ B2 (H1 , H2 ) so that
|ψ⟩ = (A ⊗ 1)|Ω⟩.
\[ \operatorname{tr}[T(A)B] = \operatorname{tr}\bigl[C\,(A^T \otimes B)\bigr], \quad \forall A \in B(H_1),\ B \in B(H_2), \tag{1.69} \]
where the transpose is w.r.t. the basis that is used in the definition of C.
Proof. (i) Note that via Eq.(1.69) T and C mutually determine each other so
that Eq.(1.69) specifies a bijection if we regard C as an unconstrained element
in B1 (H1 ⊗ H2 ). That this C is indeed the Choi matrix is verified by
Here we have used the property of the flip operator from Exercise 1.23 (b)
together with F = (Θ ⊗ id)(|Ω⟩⟨Ω|), where Θ denotes the matrix transposition.
(ii) Since C* = Σ_{i,j} |j⟩⟨i| ⊗ T(|i⟩⟨j|)* with mutually orthogonal |i⟩⟨j|, we
have that this equals C = Σ_{i,j} |j⟩⟨i| ⊗ T(|j⟩⟨i|) iff T(|i⟩⟨j|)* = T(|j⟩⟨i|) holds
for all i, j. In other words, C = C* iff T(A)* = T(A*) holds for all A = |i⟩⟨j|.
By expanding an arbitrary A in that basis, the general statement follows.
(iii) The requirements in the definition of complete positivity of T imply
positivity of the Choi matrix as a special case. In order to prove the converse,
realize that it suffices to show (id_n ⊗ T)(|ψ⟩⟨ψ|) ≥ 0 for all ψ ∈ C^n ⊗ H1
and all n ∈ N since the spectral decomposition of an arbitrary positive
trace-class operator allows us to restrict to rank-one operators. Lemma 1.48, with
interchanged tensor factors, now enables us to write |ψ⟩ = (A ⊗ 1)|Ω⟩ for some
A ∈ B(H1, C^n). Then
The representation claimed in Eq.(1.70) then follows from the fact that there
is a unique T corresponding to C (Thm.1.49 (i)). If T is completely positive,
then C is positive (Thm.1.49(iii)) so that we can choose ϕk = ψk and thus
B k = Ak .
In fact, one can show that every instrument can be obtained in this way.
For any quantum channel and any discrete POVM there are simple ways of
constructing an instrument that implements the channel or the POVM, respec-
tively.
On the one side, given a quantum channel T with Kraus representation
T(·) = Σ_{i∈X} K_i · K_i* where X ⊆ N is any index set, we can construct an
instrument via I_Y(·) := Σ_{i∈Y} K_i · K_i*. Here B would simply be the set of all
subsets of X. This instrument ‘implements’ T in the sense that I_X = T.
On the other side, given a POVM M on a discrete measurable space (X, B)
with B the powerset of X, we can construct an instrument
\[ I_Y(\rho) := \sum_{i \in Y} M(\{i\})^{1/2}\, \rho\, M(\{i\})^{1/2}. \tag{1.71} \]
V* M′(Y) V = M(Y). (1.72)
If the set X of measurement outcomes is finite, one can choose dim(K) =
Σ_{x∈X} rank(M_x), where M_x := M({x}) corresponds to the measurement
outcome x ∈ X.
We will provide an elementary proof for the case of finitely many measure-
ment outcomes, and sketch later how the general case follows from Stinespring’s
dilation theorem.
Proof. We define K̃ := ⊕_{x∈X} K_x with K_x := ker(M_x)^⊥ and equip it with an
inner product
\[ \langle\varphi, \phi\rangle_K := \sum_{x \in X} \langle\varphi_x, M_x \phi_x\rangle, \]
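\[ \operatorname{tr}[\rho\, M(Y)] = \operatorname{tr}\bigl[(\rho \otimes |\psi\rangle\langle\psi|)\, M'(Y)\bigr] \quad \forall \rho \in B_1(H). \tag{1.73} \]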
Hi = V ∗ Ki V,
with Ki := c Pi − Pn+i .
Proof. Regarding the range of all indices as Z_n with addition modulo n we set
\[ K_k := \sum_{i,j} |i\rangle\langle j| \otimes A_{i-j+k}, \]
Eq.(1.74). To see that this leads to a commuting set of operators note that
\[ K_{k_1} K_{k_2} = \sum_{i_1,j_2} |i_1\rangle\langle j_2| \otimes \sum_{j_1} A_{i_1-j_1+k_1} A_{j_1-j_2+k_2}. \tag{1.75} \]
(a) Show that any linear map T : B(H1 ) → B(H2 ) can be written as a linear combi-
nation of four completely positive maps.
(b) Write matrix transposition Θ(A) := AT as a real linear combination of two com-
pletely positive maps.
(c) Use the definition of complete positivity to prove that X → AXA∗ is completely
positive for any A ∈ B(H1 , H2 ).
(d) Show that if T1 , T2 are completely positive maps, then T1 ◦ T2 , T1 + T2 , T1 ⊗ T2
are completely positive as well.
(e) Show that for the partial trace(s) positivity implies complete positivity by using not
much more than the definitions of the partial trace and of complete positivity.
Exercise 1.27 (Positive but not completely). Let K ∈ Cd×d be such that K T = −K and
K ∗ K ≤ 1. Show that the map T : Cd×d → Cd×d defined as T (X) := tr [X] 1 − X −
KX T K ∗ is positive. Is it completely positive?
Exercise 1.28 (Kraus operators).
(a) Which is the minimal number of Kraus operators necessary to represent the phase
damping channel ?
(b) Decoherence and decay processes can often be described by a map of the form
will be necessary that D(H) is a dense subspace of H. The adjoint can then be
uniquely defined on D(A*) := {ϕ ∈ H | ψ ↦ ⟨ϕ, Aψ⟩ is continuous on D(H)} so
that
⟨ϕ, Aψ⟩ = ⟨A*ϕ, ψ⟩ ∀ψ ∈ D(A), ϕ ∈ D(A*). (1.76)
This definition directly exploits the Riesz-representation theorem, which only
gives rise to uniqueness of A∗ if D(A) is dense. D(A∗ ), however, is not auto-
matically dense – it may even happen that D(A∗ ) = {0}.
A densely defined operator A is called self-adjoint if A = A∗ and D(A) =
D(A∗ ). So bounded Hermitian operators are special cases of self-adjoint op-
erators. By the Hellinger-Toeplitz theorem a self-adjoint operator is bounded
iff it can be defined on all of H. This underlines that considering domains is
unavoidable for unbounded operators.
If A is self-adjoint, then (A*)* = A and the ranges of A ± i1 are the entire
Hilbert space. The latter is related to the fact that the Cayley transform
(A − i1)(A + i1)^{−1} =: U of a self-adjoint operator A defines a unitary. Exploiting this
relation, von Neumann was able to use the spectral theorem for unitaries, which
are necessarily bounded, to prove a spectral theorem for self-adjoint operators.
One formulation of the spectral theorem is in terms of projection valued
measures (PVMs). For any self-adjoint operator A there is a PVM P : B →
B(H), where B is the Borel σ-algebra on R, so that
\[ A = \int_{\mathbb{R}} \lambda \, dP(\lambda). \tag{1.77} \]
The integral is understood in the following weak sense: for any ψ ∈ D(A), ϕ ∈ H
we can define a Borel-measure µ : B → C via µ(Y) := ⟨ϕ, P(Y)ψ⟩ that satisfies
⟨ϕ, Aψ⟩ = ∫_R λ dµ(λ). The PVM P that is associated to A is called its spectral
measure and one can show that there is a one-to-one correspondence between
self-adjoint operators and PVMs on (R, B). Not surprisingly, λ ∈ R is an
eigenvalue of A iff P({λ}) ≠ 0. In this case P({λ}) is the corresponding spectral
projection.
As in the compact case, the spectral representation in Eq.(1.77) leads directly
to a functional calculus. For any measurable function f : R → C we can define
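\[ f(A) = \int_{\mathbb{R}} f(\lambda) \, dP(\lambda) \tag{1.78} \]
on
\[ D\bigl(f(A)\bigr) := \Bigl\{ \varphi \in H \;\Big|\; \int_{\mathbb{R}} |f(\lambda)|^2 \, d\langle\varphi, P(\lambda)\varphi\rangle < \infty \Bigr\}. \]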
Basic trade-offs
We will discuss central aspects of these three points in the following two sections.
In the case of observables or sharp POVMs, a central property in the discussion
of uncertainty relations for both preparation and measurement will be the non-
commutativity of operators. So, let us briefly recall some notation and useful
mathematical background related to commutators.
The commutator of two operators that act on the same space will be written
as [A, B] := AB − BA. If the operators are Hilbert-Schmidt class, then the
commutator is obviously trace-less and if A, B are Hermitian, the commutator
is anti-Hermitian (i.e., it becomes Hermitian when multiplied by i). A is said
to commute with B if [A, B] = 0. If a collection of normal, compact operators
commute pairwise, then they can be diagonalized simultaneously. That is, there
is a basis in which they are all diagonal. An analogous statement is true for
arbitrary sets of normal operators. Via continuous functional calculus this im-
plies that if [A, B] = 0 holds for two normal operators, then [f(A), B] = 0 holds
as well for any continuous function f. In particular, it holds for √A when A is
positive.
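The elementary commutator properties can be illustrated with 2×2 matrices. In the sketch below the Pauli matrices X and Z and the matrix A are chosen purely as examples, and the helper names are ours; the square A·A stands in for a simple function f(A).

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def commutator(A, B):
    """[A, B] := AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(len(A))] for i in range(len(A))]

def dagger(A):
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A))]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

X = [[0, 1], [1, 0]]     # Hermitian
Z = [[1, 0], [0, -1]]    # Hermitian
C = commutator(X, Z)

print(C)                 # [[0, -2], [2, 0]]
print(trace(C))          # traceless
print(dagger(C))         # equals -C, i.e. C is anti-Hermitian

A = [[2, 1], [1, 2]]
print(commutator(matmul(A, A), A))   # A*A commutes with A
```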
54 CHAPTER 2. BASIC TRADE-OFFS
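\[ \operatorname{var}(A)\operatorname{var}(B) \ge \frac{1}{4} \bigl|\langle [A, B] \rangle\bigr|^2 + \frac{1}{4} \bigl\langle \{A - \langle A\rangle, B - \langle B\rangle\}_+ \bigr\rangle^2. \tag{2.2} \]
Moreover, equality holds iff (αA − βB)ρ = γρ for some (α, β, γ) ∈ C³ \ {0}.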
if D(H_k H_l) is the domain of H_k H_l, then we need ψ ∈ ∩_{kl} D(H_k H_l). In this
way, Heisenberg’s uncertainty relation for position and momentum is obtained
from Cor.2.2 by neglecting the covariance term on the r.h.s. and inserting i1
for the commutator of the position and momentum operator.
Proof. As pointed out already, the inequality stated in Eq.(2.2) is just a refor-
mulation of the determinant inequality in Eq.(2.1) for the special case of two
observables. In order to characterize cases of equality we will, however, use a
different proof. Assume for the moment that ρ = |ψ⟩⟨ψ| and set Ã := A − ⟨A⟩1,
B̃ := B − ⟨B⟩1. Then Cauchy-Schwarz gives
\[ \|\tilde A \psi\|^2 \, \|\tilde B \psi\|^2 \ge |\langle\psi, \tilde A \tilde B \psi\rangle|^2 = \bigl(\mathrm{Re}\,\langle\psi, \tilde A \tilde B \psi\rangle\bigr)^2 + \bigl(\mathrm{Im}\,\langle\psi, \tilde A \tilde B \psi\rangle\bigr)^2. \tag{2.3} \]
Inserting the expressions defining Ã and B̃ then leads to the claimed uncertainty
relation in Eq.(2.2) for pure states. The advantage of this proof is that we know
that equality in the Cauchy-Schwarz inequality, and thus in the uncertainty
relation, holds iff αÃψ = βB̃ψ for some α, β ∈ C. This proves the claimed
characterization of cases of equality for pure states (with γ necessarily being
equal to α⟨A⟩ − β⟨B⟩).
The result can be lifted to mixed states by purification (Cor.1.38). If a unit
vector ψ ∈ H1 ⊗ H2 characterizes a purification of ρ = tr_2[|ψ⟩⟨ψ|] and if we use
Ã ⊗ 1 and B̃ ⊗ 1 in Eq.(2.3) instead of Ã and B̃, then we arrive at the general
form of the uncertainty relation in Eq.(2.2) for mixed states. Equality is then
attained iff ψ is in the kernel of (αÃ − βB̃) ⊗ 1 for some α, β ∈ C. Exploiting
the Schmidt-decomposition of ψ (cf. Cor.1.37) we can see that this is equivalent to the
statement that every eigenvector of ρ that corresponds to a non-zero eigenvalue
has to be in the kernel of (αÃ − βB̃). This, in turn, is equivalent to the claimed
characterization.
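A numerical sanity check of the inequality in Eq.(2.2) for a mixed qubit state can be sketched as follows. The density matrix, the choice A = X, B = Y (Pauli matrices) and all helper names are assumptions made for illustration; expectation values are computed as ⟨A⟩ = tr[ρA].

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def shifted(A, c):
    """Return A - c * identity."""
    return [[A[i][j] - (c if i == j else 0) for j in range(len(A))]
            for i in range(len(A))]

def expect(rho, A):
    """Expectation value tr[rho A]."""
    return trace(matmul(rho, A))

def uncertainty_sides(rho, A, B):
    """Return (var(A) var(B), right-hand side of Eq.(2.2))."""
    At = shifted(A, expect(rho, A))
    Bt = shifted(B, expect(rho, B))
    var_a = expect(rho, matmul(At, At)).real
    var_b = expect(rho, matmul(Bt, Bt)).real
    ab, ba = expect(rho, matmul(At, Bt)), expect(rho, matmul(Bt, At))
    comm = ab - ba            # <[A, B]>, purely imaginary for Hermitian A, B
    anti = (ab + ba).real     # <{A - <A>, B - <B>}_+>, real
    return var_a * var_b, 0.25 * abs(comm) ** 2 + 0.25 * anti ** 2

rho = [[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]]   # hypothetical mixed state
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
lhs, rhs = uncertainty_sides(rho, X, Y)
print(lhs, rhs)   # approximately 0.8064 and 0.1664
```

For this state the covariance term contributes 0.0064 to the right-hand side and the commutator term 0.16.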
Joint measurability
least one of them is sharp (i.e. projection valued). Then M1 and M2 are jointly
measurable iff they commute in the sense that ∀Yi ∈ Bi : [M1 (Y1 ), M2 (Y2 )] = 0.
In that case the joint POVM M : B → B(H) is characterized by M (Y1 × Y2 ) =
M1 (Y1 )M2 (Y2 ).
Proof. Assume that the two POVMs commute. Since commutativity is a prop-
erty that extends to the square root, we can use that
\[ M_1(Y_1) M_2(Y_2) = \sqrt{M_1(Y_1)}\, M_2(Y_2)\, \sqrt{M_1(Y_1)} =: M(Y_1 \times Y_2) \]
Following the same steps, we can show that M1 (Y1 )M2 (Y2 ) = M (Y1 × Y2 ).
Hence, M1 commutes with M2 .
2.2 Information-disturbance
No information without disturbance
Proof. Assuming (i) and using a Kraus decomposition of D(·) = Σ_l A_l · A_l*
we can exploit the bijective relation between a completely positive map and
its Choi matrix (cf. Thm.1.49) to show that (A_l K_j ⊗ 1)|Ω⟩ = c_lj |Ω⟩ for some
complex number c_lj and thus A_l K_j = c_lj 1. As D is unital, this leads to
V ∗F V ∝ 1 (2.6)
holds for all F ∈ Pd−1 but fails for some F ∈ Pd . What is the reason behind this
definition? Consider a quantum channel Φ : B1 (H) → B1 (H), which models the
noise/decoherence/errors that affect the n qubits, whose Kraus-operators {Ai }
are all in the linear span of P_t with t := ⌊(d − 1)/2⌋. Then the Kraus operators {K_i}
of T := Φ ◦ E satisfy Eq.(2.4) so that Thm.2.5 guarantees the existence of a
decoding quantum channel D such that D ◦ Φ ◦ E = id. d is called the distance
of the code and t can be interpreted as the number of errors the code corrects.
An important point to note is that a given [[n, k, 2t + 1]]-QECC does not
only work for one noise-characterizing channel Φ, but for all channels whose
Kraus-operators are in the linear span of P_t.
2.3 Time-energy
Mandelstam-Tamm inequalities
Moreover, for any Hermitian H, any unit vector ψ(0) and any τ ≥ 0 there is a
Hermitian A ∈ B(H) so that equality holds in Eq.(2.7) when evaluated at t = τ .
The inequalities in Eq.(2.9) then follow from Eq.(2.8) by using arccos(√(1/2)) =
π/4 and arccos(0) = π/2.
Note that p(t) can be interpreted as the probability of the system still being
in its initial state after time t. That is, if a projective measurement with two
outcomes and corresponding projectors P0 := |ψ(0)⟩⟨ψ(0)| and P1 := 1 − P0
is performed after time t, then the outcome corresponding to P0 occurs with
probability p(t).
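As a small numerical illustration of this interpretation, the sketch below evaluates the probability of the outcome P0 in Python. It assumes that p(t) denotes the squared overlap |⟨ψ(0), e^{−iHt}ψ(0)⟩|², as the interpretation above suggests, that ħ = 1, and that the state is specified by its populations in the eigenbasis of H. The function name `survival_probability` and the qubit example are hypothetical.

```python
import cmath
import math

def survival_probability(weights, energies, t):
    """Return p(t) = |<psi(0), exp(-iHt) psi(0)>|^2.

    weights[k] = |<psi(0), phi_k>|^2 are the populations of psi(0) in the
    eigenbasis of H and energies[k] the corresponding eigenvalues (hbar = 1).
    """
    overlap = sum(w * cmath.exp(-1j * e * t) for w, e in zip(weights, energies))
    return abs(overlap) ** 2

# Hypothetical qubit: equal superposition of eigenstates with energies 0 and E.
E = 2.0
weights = [0.5, 0.5]
energies = [0.0, E]

for t in (0.0, math.pi / (2 * E), math.pi / E):
    print(f"t = {t:.4f}, p(t) = {survival_probability(weights, energies, t):.4f}")
```

For this particular example p(t) = cos²(Et/2), so the state first becomes orthogonal to its initial state at t = π/E.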
Proof. Exploiting positivity of H and the spectral representation $H = \int_{\mathbb{R}_+} \lambda \, dP(\lambda)$
we can define a Borel-probability measure on [0, ∞) via µ(Y) := ⟨ψ, P(Y)ψ⟩.
For p > 0, the p'th moment of µ is then given by m_p = ⟨ψ, H^p ψ⟩ and its
characteristic function by
\[ \int_{\mathbb{R}_+} e^{-i\lambda t} \, d\mu(\lambda) = \int_{\mathbb{R}_+} e^{-i\lambda t} \, d\langle\psi, P(\lambda)\psi\rangle = \langle\psi, e^{-iHt}\psi\rangle. \]
The claim follows then from Lemma 2.8.
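In finite dimensions the measure µ is a weighted sum of point masses at the eigenvalues of H, and the two identities used in the proof can be verified numerically. The sketch below is a hedged illustration only: the 2×2 matrix `H`, the vector `psi`, the hard-coded spectral data and all function names are assumptions made for this example.

```python
import cmath

# Hypothetical example: positive H = [[2, 1], [1, 2]] with eigenvalues 1 and 3,
# eigenvectors (1, -1)/sqrt(2) and (1, 1)/sqrt(2), and the unit vector psi = (1, 0).
H = [[2.0, 1.0], [1.0, 2.0]]
psi = [1.0, 0.0]
eigenvalues = [1.0, 3.0]
weights = [0.5, 0.5]  # mu({lambda_k}) = |<psi, phi_k>|^2

def apply(matrix, vector):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def inner(u, v):
    """Inner product, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def moment_from_measure(p):
    """p-th moment of the discrete measure mu."""
    return sum(w * lam ** p for w, lam in zip(weights, eigenvalues))

def moment_from_operator(p):
    """<psi, H^p psi> for a non-negative integer p, by repeated application of H."""
    vector = psi
    for _ in range(p):
        vector = apply(H, vector)
    return inner(psi, vector).real

def char_from_measure(t):
    """Characteristic function of mu, evaluated as a finite sum."""
    return sum(w * cmath.exp(-1j * lam * t) for w, lam in zip(weights, eigenvalues))

def char_from_operator(t, terms=60):
    """<psi, exp(-iHt) psi> via the truncated power series of the exponential."""
    term = [complex(x) for x in psi]
    total = list(term)
    for n in range(1, terms):
        term = [(-1j * t / n) * x for x in apply(H, term)]
        total = [a + b for a, b in zip(total, term)]
    return inner(psi, total)
```

The power series is used here only so that the operator side is computed without reference to the spectral decomposition; for larger matrices or times a dedicated matrix-exponential routine would be the more robust choice.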
For p = 2 Eq.(2.11) is similar to the consequence that we obtained in Cor.2.7
from the Mandelstam-Tamm inequality. In fact, at first glance, Eq.(2.11) looks
even stronger since there is a missing factor 1/√2. Note, however, that Eq.(2.11)
requires an additional assumption, namely positivity of the Hamiltonian.
For p = 1 Eq.(2.11) is called the Margolus-Levitin bound, which directly
relates the energy of a pure state (w.r.t. a positive Hamiltonian) to the minimal
time it takes to evolve into an orthogonal state. So far, however, we do not
know under which circumstances a pure state ψ will ever evolve to an orthogonal
state under the time-evolution governed by the Hamiltonian H. To obtain
a better understanding of this matter, it is useful to import the following classic
result:
Lemma 2.10 (Kronecker-Weyl). Let x ∈ [0, 1)^d be a point in the unit-cube so
that 1, x_1, . . . , x_d are linearly independent over Q. Then the sequence of points
(nx)_{n∈N} ∈ [0, 1)^d, where each coordinate is understood mod 1, is uniformly
distributed (and thus in particular dense) in [0, 1)^d.
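The statement of the Lemma can be made tangible with a short numerical experiment. The Python sketch below counts how often the points nx mod 1 fall into a box and compares the frequency with the volume of the box. The choice x = (√2 mod 1, √3 mod 1), the box and the function name are assumptions made for illustration. Since floating-point numbers are rational, the code can only approximate the irrational point, so this is a plausibility check and not a proof.

```python
import math

def fraction_in_box(x, n_points, box):
    """Fraction of the points n*x mod 1, n = 1..n_points, that lie in the box.

    box is a list of half-open intervals (lo, hi), one per coordinate.
    """
    hits = 0
    for n in range(1, n_points + 1):
        point = [(n * xi) % 1.0 for xi in x]
        if all(lo <= c < hi for c, (lo, hi) in zip(point, box)):
            hits += 1
    return hits / n_points

# 1, sqrt(2), sqrt(3) are linearly independent over Q.
x = (math.sqrt(2) % 1.0, math.sqrt(3) % 1.0)
box = [(0.0, 0.5), (0.0, 0.25)]
volume = 0.5 * 0.25

frac = fraction_in_box(x, 20000, box)
print(f"empirical frequency = {frac:.4f}, box volume = {volume:.4f}")
```

Uniform distribution means that the empirical frequency converges to the volume for every such box as the number of points grows.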
With this Lemma we can now show that a necessary and ‘generically’ also
sufficient condition for a pure state to ever evolve to an orthogonal state is that
its overlap with any of the eigenvectors of the Hamiltonian is not larger than
1/2:
Theorem 2.11 (Condition for reaching minimal overlap). Let dim(H) < ∞,
H ∈ B(H) Hermitian with an orthonormal basis of eigenvectors {ϕ_i} and
corresponding eigenvalues {λ_i}. For any ψ ∈ H define p := max_i |⟨ψ, ϕ_i⟩|² and
ν := inf_{t∈R+} |⟨ψ, e^{−iHt}ψ⟩|. Then
\[ \nu \ge \max\{0, 2p - 1\}, \tag{2.12} \]
with equality if the eigenvalues {λi } (as a multiset) are linearly independent over
Q.
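The bound in Eq.(2.12) can be explored numerically. The following Python sketch evaluates the overlap on a finite time grid for a hypothetical three-level example with eigenvalues √2, √3, √5 (which are linearly independent over Q) and populations (0.7, 0.2, 0.1). All names and parameter values are assumptions chosen for illustration. Note that a minimum over a finite grid can only overestimate the infimum ν.

```python
import cmath
import math

def overlap(weights, eigenvalues, t):
    """|<psi, exp(-iHt) psi>| = |sum_k p_k exp(-i lambda_k t)|."""
    return abs(sum(p * cmath.exp(-1j * lam * t)
                   for p, lam in zip(weights, eigenvalues)))

def min_overlap_on_grid(weights, eigenvalues, t_max, dt):
    """Smallest overlap on the grid t = 0, dt, 2*dt, ..., up to t_max."""
    steps = int(t_max / dt)
    return min(overlap(weights, eigenvalues, k * dt) for k in range(steps + 1))

# Hypothetical populations p_k = |<psi, phi_k>|^2 and eigenvalues lambda_k.
weights = [0.7, 0.2, 0.1]
eigenvalues = [math.sqrt(2), math.sqrt(3), math.sqrt(5)]

bound = max(0.0, 2 * max(weights) - 1)
nu_grid = min_overlap_on_grid(weights, eigenvalues, t_max=200.0, dt=0.01)
print(f"bound max(0, 2p-1) = {bound:.3f}, smallest overlap on grid = {nu_grid:.3f}")
```

The overlap never drops below 2p − 1 on the grid, and for these rationally independent eigenvalues it comes close to that value, in line with the equality statement of the theorem.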
Proof. Using the spectral decomposition of H we can write |⟨ψ, e^{−iHt}ψ⟩| =
|Σ_k p_k e^{−iλ_k t}| where p_k := |⟨ψ, ϕ_k⟩|². Since the p_k's are positive and sum up to
one, this is a convex combination (i.e. a weighted average) of complex numbers
of modulus one. Assume w.l.o.g. that p = p_1. From the triangle-inequality and
using Σ_{k>1} p_k = 1 − p we obtain
\[ \Big| \sum_k p_k e^{-i\lambda_k t} \Big| \ge p - \Big| \sum_{k>1} p_k e^{i(\lambda_1 - \lambda_k)t} \Big| \ge p - \sum_{k>1} p_k = 2p - 1, \tag{2.13} \]