Quantum Information Processing Lecture Notes, Wolf

Mathematical Introduction to
Quantum Information Processing

(growing lecture notes, SS2019)
Michael M. Wolf
June 22, 2019

Contents
1 Mathematical framework
1.1 Hilbert spaces
1.2 Bounded Operators
    Ideals of operators
    Convergence of operators
    Functional calculus
1.3 Probabilistic structure of Quantum Theory
    Preparation
    Measurements
    Probabilities
    Observables and expectation values
1.4 Convexity
    Convex sets and extreme points
    Mixtures of states
    Majorization
    Convex functionals
    Entropy
1.5 Composite systems and tensor products
    Direct sums
    Tensor products
    Partial trace
    Composite and reduced systems
    Entropic quantities
1.6 Quantum channels and operations
    Schrödinger & Heisenberg picture
    Kraus representation and environment
    Choi-matrices
    Instruments
    Commuting dilations
1.7 Unbounded operators and spectral measures
2 Basic trade-offs
2.1 Uncertainty relations
    Variance-based preparation uncertainty relations
    Joint measurability
2.2 Information-disturbance
    No information without disturbance
2.3 Time-energy
    Mandelstam-Tamm inequalities
    Evolution to orthogonal states
These are (incomplete but hopefully growing) lecture notes of a course taught in
summer 2019 at the department of mathematics at the Technical University of
Munich.
Chapter 1
Mathematical framework
1.1 Hilbert spaces

This section will briefly summarize relevant concepts and properties of Hilbert
spaces.
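A complex Hilbert space is a vector space over the complex numbers, equipped
with an inner product $\langle\cdot,\cdot\rangle : \mathcal{H} \times \mathcal{H} \to \mathbb{C}$
and an induced norm $\|\psi\| := \langle\psi,\psi\rangle^{1/2}$
w.r.t. which it is complete.¹ Hence, every Hilbert space is in particular a
Banach space. We will use the physicists' convention that the inner product
is linear in its second and conjugate-linear in its first argument, so that
$\langle\psi, c\varphi\rangle = c\langle\psi,\varphi\rangle = \langle\bar{c}\psi,\varphi\rangle$ for all $c \in \mathbb{C}$.
The most important inequality for the inner product is the Cauchy-Schwarz
inequality $|\langle\psi,\varphi\rangle| \le \|\psi\|\,\|\varphi\|$, which immediately follows² from the identity
\[
  \|\psi\|^2 \|\varphi\|^2 - |\langle\psi,\varphi\rangle|^2
  = \frac{1}{\|\varphi\|^2} \Big\| \|\varphi\|^2 \psi - \langle\varphi,\psi\rangle \varphi \Big\|^2 \ge 0
  \qquad (\varphi \neq 0).
\]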
This also shows that equality holds iff ϕ and ψ are linearly dependent.
A characteristic property of any norm that is induced by an inner product
is that it satisfies the parallelogram law

\[
  \|\psi + \varphi\|^2 + \|\psi - \varphi\|^2 = 2\big( \|\psi\|^2 + \|\varphi\|^2 \big). \tag{1.1}
\]
In fact, whenever a norm satisfies Eq.(1.1) for all ψ, ϕ, then we can reconstruct
a corresponding inner product via the polarization identity, which in the case of
a complex space reads
\[
  \langle\psi,\varphi\rangle = \frac{1}{4} \sum_{k=0}^{3} i^k \big\| \varphi + i^k \psi \big\|^2. \tag{1.2}
\]
1 That is, every Cauchy sequence converges.
2 Note that the derivation of Cauchy-Schwarz does not use that hψ, ψi = 0 ⇒ ψ = 0. It
only requires that hψ, ψi ≥ 0.
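The three relations above (Cauchy-Schwarz, the parallelogram law (1.1) and the
polarization identity (1.2)) can be checked numerically. The following is a
minimal sketch in plain Python; the vectors and the helper names `inner`,
`norm_sq` and `add` are illustrative choices and not part of the notes.

```python
# Sanity check in C^3; psi and phi are arbitrary illustrative vectors.

def inner(psi, phi):
    """Inner product, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(psi, phi))

def norm_sq(psi):
    """Squared norm induced by the inner product."""
    return inner(psi, psi).real

def add(psi, phi, c=1):
    """Return psi + c * phi."""
    return [a + c * b for a, b in zip(psi, phi)]

psi = [1 + 2j, 0.5j, -1.0]
phi = [2 - 1j, 3.0, 1j]

# Cauchy-Schwarz: ||psi||^2 ||phi||^2 - |<psi, phi>|^2 >= 0
cs_gap = norm_sq(psi) * norm_sq(phi) - abs(inner(psi, phi)) ** 2

# Parallelogram law (1.1)
lhs = norm_sq(add(psi, phi)) + norm_sq(add(psi, phi, -1))
rhs = 2 * (norm_sq(psi) + norm_sq(phi))

# Polarization identity (1.2): <psi, phi> = 1/4 sum_k i^k ||phi + i^k psi||^2
polar = sum(1j ** k * norm_sq(add(phi, psi, 1j ** k)) for k in range(4)) / 4
```

Note that the order of the arguments in `add(phi, psi, 1j ** k)` matters: with
the inner product conjugate-linear in its first argument, swapping the two
vectors would return the complex conjugate of the inner product instead.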
A central concept that is enabled by an inner product is orthogonality: $\psi, \varphi$
are called orthogonal if $\langle\psi,\varphi\rangle = 0$. In that case
$\|\psi + \varphi\| = \|\psi - \varphi\|$, so that the
parallelogram law becomes the Pythagoras identity
$\|\psi + \varphi\|^2 = \|\psi\|^2 + \|\varphi\|^2$.
For any subset $S \subseteq \mathcal{H}$ the orthogonal complement $S^\perp$ is defined as the subset of
$\mathcal{H}$ whose elements are orthogonal to every element in $S$. $S^\perp$ is then necessarily
a closed linear subspace. Every closed linear subspace $\mathcal{H}_1 \subseteq \mathcal{H}$, in turn, gives
rise to a unique decomposition of any element $\psi \in \mathcal{H}$ as $\psi = \psi_1 + \psi_2$, where
$\psi_1 \in \mathcal{H}_1$ and $\psi_2 \in \mathcal{H}_2 = \mathcal{H}_1^\perp$. In this way, the Hilbert space decomposes into an
orthogonal direct sum $\mathcal{H} = \mathcal{H}_1 \oplus \mathcal{H}_2$. The $\psi_i$'s can equivalently be characterized
as those elements in $\mathcal{H}_i$ closest to $\psi$. Uniqueness of the $\psi_i$'s enables the definition
of two orthogonal projections $P_i : \mathcal{H} \to \mathcal{H}_i$, $\psi \mapsto \psi_i$, which are linear idempotent
maps related via $P_2 = \mathbb{1} - P_1$, where $\mathbb{1}$ denotes the identity map on $\mathcal{H}$.
Pursuing the idea of orthogonal decompositions of a Hilbert space further,
we are led to the concept of an orthonormal basis. An orthonormal basis is a set
$\{e_i\} \subseteq \mathcal{H}$ whose linear span is dense in $\mathcal{H}$ and whose elements satisfy
$\langle e_i, e_j\rangle = \delta_{ij}$. Its cardinality defines the dimension of the Hilbert space. Separability
of $\mathcal{H}$ means that there is a countable orthonormal basis. In that case, for
every $\psi \in \mathcal{H}$ we have $\psi = \sum_i \langle e_i,\psi\rangle e_i$ (converging in norm) and the Parseval
identity $\|\psi\|^2 = \sum_i |\langle e_i,\psi\rangle|^2$ holds. An orthonormal set of vectors can always
be extended to an orthonormal basis.
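As a small illustration of the basis expansion and the Parseval identity, the
following sketch expands a vector of $\mathbb{C}^2$ in an orthonormal basis. The basis
and the vector are arbitrary choices made for this example only.

```python
import math

def inner(psi, phi):
    """Inner product, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(psi, phi))

s = 1 / math.sqrt(2)
# An orthonormal basis of C^2 (hypothetical choice for this sketch).
basis = [[s, 1j * s], [s, -1j * s]]
psi = [0.3 - 0.4j, 1.2 + 0.5j]

# Expansion coefficients <e_i, psi>
coeffs = [inner(e, psi) for e in basis]

# Reconstruction psi = sum_i <e_i, psi> e_i
recon = [sum(c * e[i] for c, e in zip(coeffs, basis)) for i in range(2)]

# Parseval: ||psi||^2 = sum_i |<e_i, psi>|^2
parseval = sum(abs(c) ** 2 for c in coeffs)
norm_sq = inner(psi, psi).real
```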
Another property that Hilbert spaces share with their Euclidean ancestors is
expressed by the Riesz representation theorem: it states that every continuous
linear map from $\mathcal{H}$ into $\mathbb{C}$ is of the form $\psi \mapsto \langle\varphi,\psi\rangle$ for some $\varphi \in \mathcal{H}$, and vice
versa. In other words, there is a conjugate-linear bijection between $\mathcal{H}$ and its
topological dual space $\mathcal{H}'$ (i.e. the space of all continuous linear functionals).
The possible identification of $\mathcal{H}$ and $\mathcal{H}'$ motivates the so-called Dirac notation,
which writes $|\psi\rangle$ for elements of $\mathcal{H}$ and $\langle\varphi|$ for elements of $\mathcal{H}'$. These symbols
are then called ket and bra, respectively, and the inner product in this notation
reads $\langle\varphi|\psi\rangle$ (forming a “bra(c)ket”). If we restricted ourselves to
Euclidean spaces, kets and bras would be nothing but column vectors and row
vectors, respectively. Dirac notation also enables the introduction of a ket-bra
$|\psi\rangle\langle\varphi| : \mathcal{H} \to \mathcal{H}$ that defines a map $|\phi\rangle \mapsto |\psi\rangle\langle\varphi|\phi\rangle$. Using ket-bras, a necessary
and sufficient condition for a set of orthonormal vectors to form a basis of a
separable Hilbert space is given by
\[
  \sum_k |e_k\rangle\langle e_k| = \mathbb{1}. \tag{1.3}
\]
To write expressions of this form even more compactly, the elements of a fixed
orthonormal basis are often simply specified by their label so that one writes
$|k\rangle$ instead of $|e_k\rangle$.
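In finite dimensions a ket-bra is simply an outer product of a column vector
with a conjugated row vector. The sketch below builds ket-bras as nested lists
and checks the completeness relation (1.3) for one orthonormal basis of $\mathbb{C}^2$;
the helper names and the test vectors are assumptions of this example.

```python
import math

def inner(psi, phi):
    """<psi, phi>, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(psi, phi))

def ketbra(psi, phi):
    """Matrix of |psi><phi|: entry (i, j) equals psi_i * conj(phi_j)."""
    return [[a * complex(b).conjugate() for b in phi] for a in psi]

def apply(M, v):
    """Matrix-vector product."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

s = 1 / math.sqrt(2)
basis = [[s, 1j * s], [s, -1j * s]]  # orthonormal basis of C^2 (illustrative)

# Completeness relation (1.3): sum_k |e_k><e_k| should be the identity.
total = [[sum(ketbra(e, e)[i][j] for e in basis) for j in range(2)]
         for i in range(2)]

# Action of a ket-bra: |psi><phi| applied to chi equals <phi, chi> psi.
psi, phi, chi = [1, 2j], [1j, 1], [0.5, -1j]
lhs = apply(ketbra(psi, phi), chi)
rhs = [inner(phi, chi) * a for a in psi]
```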
So far, this has been abstract Hilbert space theory. Before we proceed, some
concrete examples of Hilbert spaces:
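Example 1.1. $\mathbb{C}^n$ becomes a Hilbert space when equipped with the standard
inner product $\langle\psi,\varphi\rangle = \sum_{i=1}^{n} \bar{\psi}_i \varphi_i$.
Example 1.2. The sequence space
$l_2(\mathbb{N}) := \{ \psi \in \mathbb{C}^{\mathbb{N}} \mid \sum_k |\psi_k|^2 < \infty \}$
becomes a Hilbert space when equipped with the standard inner product
$\langle\psi,\varphi\rangle = \sum_k \bar{\psi}_k \varphi_k$. The standard orthonormal basis in this case is given by sequences
$e_k$, $k \in \mathbb{N}$, such that the $l$'th element in $e_k$ equals $\delta_{lk}$.
Example 1.3. The function space
$L_2(\mathbb{R}) := \{ \psi : \mathbb{R} \to \mathbb{C} \mid \int_{\mathbb{R}} |\psi(x)|^2 \, dx < \infty \} / \sim$,
where $\psi \sim \varphi \Leftrightarrow \int_{\mathbb{R}} |\psi(x) - \varphi(x)|^2 \, dx = 0$, becomes a separable Hilbert
space with $\langle\psi,\varphi\rangle = \int_{\mathbb{R}} \overline{\psi(x)}\, \varphi(x) \, dx$.
Example 1.4. The space $\mathbb{C}^{n \times m}$ of complex $n \times m$ matrices becomes a Hilbert
space with $\langle A, B\rangle = \mathrm{tr}[A^* B]$.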
Two Hilbert spaces H1 and H2 are called isomorphic if there is a bijection
U : H1 → H2 that preserves all inner products. U , which is called a Hilbert
space isomorphism, is then necessarily linear and it turns out that Hilbert spaces
are isomorphic iff they have the same dimension. Hence, all separable Hilbert
spaces are isomorphic to either $\mathbb{C}^n$ or $l_2(\mathbb{N})$; in particular, $L_2(\mathbb{R}) \simeq l_2(\mathbb{N})$.
Sometimes one has to deal with inner product spaces that are not complete.
In these cases the following theorem comes in handy and allows us to ‘upgrade’
every such space to a Hilbert space:
Theorem 1.1 (Completion theorem). For every inner product space $\mathcal{X}$ there is
a Hilbert space $\mathcal{H}$ and a linear map $V : \mathcal{X} \to \mathcal{H}$ that preserves all inner products³
so that $V(\mathcal{X})$ is dense in $\mathcal{H}$ and equal to $\mathcal{H}$ if $\mathcal{X}$ is complete. The space $\mathcal{H}$ is then
called the completion of $\mathcal{X}$. It is unique in the sense that if $(V', \mathcal{H}')$ give rise to
another completion, then there is a Hilbert space isomorphism $U : \mathcal{H} \to \mathcal{H}'$ s.t.
$V' = U \circ V$.
As in the more general case of metric spaces, the completion is constructed

by considering equivalence classes of Cauchy-sequences in X . Usually, this con-
struction is, however, hardly used beyond the proof of this theorem, and it is
sound to regard H as a superspace of X that has been constructed from X by
adding all the elements that were missing for completeness.
We finally state a property that distinguishes Hilbert spaces from almost all
other normed spaces and has various applications in the form of a dimension
reduction argument:
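Lemma 1.2 (Johnson-Lindenstrauss). There is a universal constant $c \in \mathbb{R}$
such that for any $\epsilon \in (0, 1]$, Hilbert space $\mathcal{H}$, $n \in \mathbb{N}$, $\psi_1, \ldots, \psi_n \in \mathcal{H}$ there is a
linear map $L : \mathcal{H} \to \mathcal{H}_d$ that is a multiple of an orthogonal projection onto a
$d$-dimensional subspace $\mathcal{H}_d$ with
\[
  d \le \frac{c}{\epsilon^2} \log n,
\]
so that for all $i, j \in \{1, \ldots, n\}$:
\[
  (1 - \epsilon) \|\psi_i - \psi_j\|^2 \le \|L\psi_i - L\psi_j\|^2 \le (1 + \epsilon) \|\psi_i - \psi_j\|^2. \tag{1.4}
\]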
3 In other words, V is an isometry; see next section for the definition.
This is often stated and used for real Hilbert spaces, but equally valid for
complex ones.
From now on, we will tacitly assume that all Hilbert spaces H, H1 , H2 , etc.
are complex and separable.
Exercise 1.1. Show that the closed unit ball of any Hilbert space is strictly convex.
Exercise 1.2. Show that any linear map U : H1 → H2 that preserves norms also
preserves inner products.
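Exercise 1.3. a) Prove that $\psi = \varphi$ iff $\forall \phi \in \mathcal{H} : \langle\phi,\varphi\rangle = \langle\phi,\psi\rangle$.
b) Let $A : \mathcal{H} \to \mathcal{H}$ be linear, $\psi, \varphi \in \mathcal{H}$. Verify the identity
\[
  \langle\varphi, A\psi\rangle = \frac{1}{4} \sum_{k=0}^{3} i^k \big\langle \psi + i^k \varphi, A(\psi + i^k \varphi) \big\rangle.
\]
c) Let $A, B : \mathcal{H} \to \mathcal{H}$ be linear. Show that $A = B$ iff $\forall \psi \in \mathcal{H} : \langle\psi, A\psi\rangle = \langle\psi, B\psi\rangle$.
Why is this not true for real Hilbert spaces?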
Exercise 1.4. Prove that every separable, infinite dimensional Hilbert space is isomor-
phic to l2 (N).
Notes and literature Frigyes Riesz, David Hilbert and Hilbert’s student Erhard
Schmidt studied various aspects of concrete Hilbert spaces (mainly in the context of
integral equations or for $l_2(\mathbb{N})$) in the first years of the 20th century. The introduction of
a geometric viewpoint, which led to the concept of orthogonality, is largely due to Schmidt.
The term Hilbert space was coined by Frigyes Riesz for concrete Hilbert spaces and it was
later used by John von Neumann for the underlying abstract concept. Hermann Weyl
introduced the name unitary space in parallel. Von Neumann, who included separability in
the definition of a Hilbert space, used the concept to unify Schrödinger’s wave mechanics
with the matrix mechanics of Werner Heisenberg, Pascual Jordan and Max Born. An
impetus for von Neumann’s work was a series of lectures given by David Hilbert in the winter term
1926/27 on the development of quantum mechanics. Von Neumann attended the lectures
and quickly established a rigorous mathematical basis of what he had heard. Soon after,
this led to the foundational work “Über die Grundlagen der Quantenmechanik”.
A good way to learn about the mathematics of Hilbert spaces is from Paul Halmos’ “A
Hilbert space problem book”.
1.2 Bounded Operators

By an operator we mean a linear map between vector spaces. If these, say
$\mathcal{X}$ and $\mathcal{Y}$, are Banach spaces, we define $\mathcal{B}(\mathcal{X}, \mathcal{Y})$ to be the set of continuous
operators from $\mathcal{X}$ to $\mathcal{Y}$, and $\mathcal{B}(\mathcal{X}) := \mathcal{B}(\mathcal{X}, \mathcal{X})$. $\mathcal{B}(\mathcal{X}, \mathcal{Y})$ itself becomes a Banach
space when equipped with the operator norm $\|A\| := \sup_{x \neq 0} \|Ax\| / \|x\|$. So by
definition, the operator norm is the smallest Lipschitz constant of the operator.
The use of the letter $\mathcal{B}$ already suggests an elementary but crucial fact: an
operator between Banach spaces is continuous iff it is bounded (in the sense
that its operator norm is finite).
A commonly used procedure is the extension of a bounded operator: if
$A \in \mathcal{B}(\mathcal{L}, \mathcal{Y})$ is defined on a dense linear subspace $\mathcal{L} \subseteq \mathcal{X}$, then by the BLT
theorem (for ‘bounded linear transformation’) there exists a unique extension
$\tilde{A} \in \mathcal{B}(\mathcal{X}, \mathcal{Y})$ of $A = \tilde{A}|_{\mathcal{L}}$. In addition, $\|\tilde{A}\| = \|A\|$. This is often used when
defining a bounded operator by first specifying its action on a set whose linear
span is dense in $\mathcal{X}$ and then using that “by linearity and continuity” this extends
uniquely to the whole space.
We will encounter various types of operators on Hilbert spaces:
Definition 1.3. Let $A \in \mathcal{B}(\mathcal{H})$, $C \in \mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$.
(i) The adjoint $C^* \in \mathcal{B}(\mathcal{H}_2, \mathcal{H}_1)$ is defined via $\langle\psi, C\varphi\rangle =: \langle C^*\psi, \varphi\rangle$ $\forall \psi, \varphi$.
(ii) If $A^* A = A A^*$, then $A$ is called normal.
(iii) If $A^* = A$, then $A$ is called Hermitian.⁴
(iv) $C$ is called an isometry if $C^* C = \mathbb{1}$ and a unitary if in addition $C C^* = \mathbb{1}$.
(v) $C$ is called a partial isometry if it is an isometry on $\ker(C)^\perp$.
(vi) If $\langle\psi, A\psi\rangle \ge 0$ $\forall \psi \in \mathcal{H}$, then $A$ is called positive (a.k.a. positive
semidefinite) and we write $A \ge 0$.
(vii) If $A^2 = A$, then $A$ is called a projection and an orthogonal projection, if
in addition $A = A^*$.
In the physics literature $A^*$ is often written $A^\dagger$. The adjoint operation is
an involution, i.e., $(A^*)^* = A$, it preserves the operator norm, $\|A^*\| = \|A\|$, and
satisfies $(AB)^* = B^* A^*$. When representing the adjoint operator as a matrix in
a given orthonormal basis we see that the adjoint equals the complex conjugate
of the transpose since $\langle e_k, A^* e_l\rangle = \overline{\langle A^* e_l, e_k\rangle} = \overline{\langle e_l, A e_k\rangle}$.
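Example 1.5 (Pauli matrices). The Pauli matrices
\[
  \sigma_1 := \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
  \sigma_2 := \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad
  \sigma_3 := \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \tag{1.5}
\]
are all Hermitian and unitary. Together with $\sigma_0 := \mathbb{1}$ they form a basis of the
space of $2 \times 2$ matrices.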
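The claims of Example 1.5 are easy to verify by direct computation. The sketch
below checks Hermiticity and unitarity and expands an arbitrary $2 \times 2$ matrix
in the basis $\{\sigma_0, \ldots, \sigma_3\}$, using that $\mathrm{tr}[\sigma_i^* \sigma_j] = 2\delta_{ij}$. The helper
functions and the test matrix `A` are illustrative assumptions.

```python
def matmul(A, B):
    """Product of two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    """Adjoint: complex conjugate of the transpose."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) < tol
               for i in range(len(A)) for j in range(len(A[0])))

I2 = [[1, 0], [0, 1]]
sigma = [I2, [[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]

hermitian = all(close(dagger(s), s) for s in sigma)
unitary = all(close(matmul(dagger(s), s), I2) for s in sigma)

# Expansion A = sum_i c_i sigma_i with c_i = tr[sigma_i^* A] / 2.
A = [[1 + 2j, 3], [-1j, 4]]
coeffs = [trace(matmul(dagger(s), A)) / 2 for s in sigma]
recon = [[sum(c * s[i][j] for c, s in zip(coeffs, sigma)) for j in range(2)]
         for i in range(2)]
```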
Positivity is a crucial concept for many things that follow. It induces a
partial order within the set of Hermitian operators by understanding $A \ge B$ as
$A - B \ge 0$. There are various ways of characterizing a positive operator. For
instance, $A \ge 0$ holds iff $A = A^* \wedge \mathrm{spec}(A) \subseteq [0, \infty)$, which in turn is equivalent
to the existence of a $B \in \mathcal{B}(\mathcal{H})$ so that $A = B^* B$. If such a $B$ exists, it can
always be chosen positive itself, which then uniquely defines a positive square
root $B =: \sqrt{A} \ge 0$ for any $A \ge 0$. This in turn enables the definition of a
positive absolute value $|A| := \sqrt{A^* A} \in \mathcal{B}(\mathcal{H})$ for any $A \in \mathcal{B}(\mathcal{H})$. The absolute
value is also related to the original operator via the polar decomposition, which
states that for any $A \in \mathcal{B}(\mathcal{H})$ there is a partial isometry $U$ such that $A = U|A|$.
Here $U$ can be taken unitary iff $\ker(A)$ and $\ker(A^*)$ have the same dimension.
4 The term self-adjoint is used as well.
Using spectral theory, one can show that every Hermitian operator $A \in \mathcal{B}(\mathcal{H})$
admits a unique decomposition of the form
\[
  A = A_+ - A_- \quad \text{where } A_\pm \ge 0 \text{ and } A_+ A_- = 0. \tag{1.6}
\]
In this case, the absolute value can also be expressed as $|A| = A_+ + A_-$. Another
way in which linear combinations of positive operators can be used is once again
a variant of the polarization formula, which for the case of a pair of bounded
operators $A, B \in \mathcal{B}(\mathcal{H})$ takes on the form
\[
  B^* A = \frac{1}{4} \sum_{k=0}^{3} i^k \big( A + i^k B \big)^* \big( A + i^k B \big). \tag{1.7}
\]
Ideals of operators Various interesting subspaces of operators in $\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$
can be obtained as completions of the space of finite-rank operators
$\mathcal{B}_0(\mathcal{H}_1, \mathcal{H}_2) := \mathrm{lin}\{ |\psi\rangle\langle\varphi| \mid \psi \in \mathcal{H}_2, \varphi \in \mathcal{H}_1 \}$.
For instance, the closure of $\mathcal{B}_0(\mathcal{H}_1, \mathcal{H}_2)$ in
$\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$ w.r.t. the operator norm yields the space of compact operators
$\mathcal{B}_\infty(\mathcal{H}_1, \mathcal{H}_2)$. Every $A \in \mathcal{B}_\infty(\mathcal{H}_1, \mathcal{H}_2)$ admits a Schmidt decomposition. That is,
it can be written as
\[
  A = \sum_k s_k |e_k\rangle\langle f_k|, \tag{1.8}
\]
where $s \in \mathbb{R}_+^{\mathbb{N}}$ is a null sequence whose non-zero elements are called singular
values of $A$ and $\{e_k\}$, $\{f_k\}$ are two orthonormal sets of vectors in $\mathcal{H}_2$ and $\mathcal{H}_1$,
respectively. The singular values of $A$ are unique as a multiset. If $\mathcal{H}_1 = \mathcal{H}_2 = \mathcal{H}$,
each $e_k$ can be chosen proportional (equal) to $f_k$ iff $A$ is normal (positive). In
these cases, Eq.(1.8) then leads to the spectral decomposition, with eigenvectors
$e_k$ and eigenvalues $s_k \langle f_k, e_k\rangle$.
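If we restrict the space of compact operators to those for which $s \in l_2(\mathbb{N})$ or
$s \in l_1(\mathbb{N})$, we are led to the spaces of Hilbert-Schmidt class operators $\mathcal{B}_2(\mathcal{H}_1, \mathcal{H}_2)$
and, in the case of equal spaces, the trace-class operators $\mathcal{B}_1(\mathcal{H})$, respectively.
These become Banach spaces when equipped with the Hilbert-Schmidt norm
$\|A\|_2 := \|s\|_2$ and the trace-norm $\|A\|_1 := \|s\|_1$, respectively. With respect to
these norms $\mathcal{B}_2(\mathcal{H}_1, \mathcal{H}_2)$ and $\mathcal{B}_1(\mathcal{H})$ can be regarded as completions of the space of
finite-rank operators and we have the inclusions (with equalities iff $\dim(\mathcal{H}) < \infty$)
\[
  \mathcal{B}_0(\mathcal{H}) \subseteq \mathcal{B}_1(\mathcal{H}) \subseteq \mathcal{B}_2(\mathcal{H}) \subseteq \mathcal{B}_\infty(\mathcal{H}) \subseteq \mathcal{B}(\mathcal{H}). \tag{1.9}
\]
These inclusions also reflect the norm inequalities $\|A\|_1 \ge \|A\|_2 \ge \|A\|_\infty := \|A\|$
for $A \in \mathcal{B}(\mathcal{H})$. All the spaces in Eq.(1.9) are $*$-ideals in $\mathcal{B}(\mathcal{H})$, which means that
they are closed under multiplying with elements of $\mathcal{B}(\mathcal{H})$ and under taking the
adjoint. Moreover, $A, B \in \mathcal{B}_2(\mathcal{H})$ implies $AB \in \mathcal{B}_1(\mathcal{H})$.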
An alternative and equivalent definition of $\mathcal{B}_2(\mathcal{H}_1, \mathcal{H}_2)$ and $\mathcal{B}_1(\mathcal{H})$ is in terms
of the trace. For a positive operator $A \in \mathcal{B}(\mathcal{H})$, the trace $\mathrm{tr}[A] \in [0, \infty]$ is defined
as
\[
  \mathrm{tr}[A] := \sum_k \langle e_k, A e_k\rangle, \tag{1.10}
\]
where the sum runs over all elements of an orthonormal basis. Positivity guarantees
that the expression is independent of the choice of that basis. Then
$\mathcal{B}_1(\mathcal{H})$ is the space of all operators for which $\mathrm{tr}[|A|] < \infty$. For all trace-class
operators the trace is then unambiguously defined as well (thus the name) and
$\|A\|_1 = \mathrm{tr}[|A|]$. This satisfies $|\mathrm{tr}[A]| \le \|A\|_1$ (as can be seen from the Schmidt
decomposition) and the Hölder inequality $\|AB\|_1 \le \|A\|_1 \|B\|_\infty$ holds.
In a similar vein, we can express the Hilbert-Schmidt norm as
$\|B\|_2 = \mathrm{tr}[B^* B]^{1/2}$ for any $B \in \mathcal{B}_2(\mathcal{H}_1, \mathcal{H}_2)$. In fact, $\mathcal{B}_2(\mathcal{H}_1, \mathcal{H}_2)$ becomes a Hilbert
space when equipped with the Hilbert-Schmidt inner product $\langle A, B\rangle := \mathrm{tr}[A^* B]$
(as in Example 1.4).
Example 1.6 (Operator bases). As a Hilbert space, $\mathcal{B}_2(\mathcal{H})$ admits an orthonormal
basis. A simple common choice is the set of matrix units $\{|k\rangle\langle l|\}$, which exploits
an orthonormal basis $\{|k\rangle\}$ of $\mathcal{H}$. If $d := \dim(\mathcal{H}) < \infty$, another useful basis
can be constructed from a discrete Weyl system: define a set $\{U_{k,l}\}_{k,l=0}^{d-1}$ of $d^2$
unitaries by
\[
  U_{k,l} := \sum_{r=0}^{d-1} \eta^{rl} \, |k + r\rangle\langle r|, \qquad \eta := e^{2\pi i / d}, \tag{1.11}
\]
where addition inside the ket is modulo $d$ and $\{|k\rangle\}_{k=0}^{d-1}$ is again an orthonormal
basis of $\mathcal{H}$. Then the $U_{k,l}$'s become orthonormal w.r.t. the Hilbert-Schmidt
inner product when divided by $\sqrt{d}$. Note that for $d = 2$, the $U_{k,l}$'s reduce to
the Pauli matrices (up to phases, i.e. scalar multiples of modulus 1).
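The orthonormality of the rescaled Weyl operators can be confirmed numerically.
The following sketch assumes that the shift in Eq.(1.11) acts as $|r\rangle \mapsto |k + r\rangle$
(addition modulo $d$) and uses $d = 3$; the function names are hypothetical.

```python
import cmath

d = 3
eta = cmath.exp(2j * cmath.pi / d)

def weyl(k, l):
    """Matrix of U_{k,l} = sum_r eta^(r*l) |k + r><r|, addition mod d."""
    U = [[0j] * d for _ in range(d)]
    for r in range(d):
        U[(k + r) % d][r] = eta ** (r * l)
    return U

def hs_inner(A, B):
    """Hilbert-Schmidt inner product tr[A^* B] = sum_ij conj(A_ij) B_ij."""
    return sum(A[i][j].conjugate() * B[i][j]
               for i in range(d) for j in range(d))

def is_unitary(U, tol=1e-12):
    """Check U^* U = 1 column by column."""
    return all(abs(sum(U[m][i].conjugate() * U[m][j] for m in range(d))
                   - (1 if i == j else 0)) < tol
               for i in range(d) for j in range(d))

ops = {(k, l): weyl(k, l) for k in range(d) for l in range(d)}

# Gram matrix of the operators U_{k,l} / sqrt(d); it should be the identity.
gram = {(a, b): hs_inner(ops[a], ops[b]) / d for a in ops for b in ops}
```

Each `weyl(k, l)` is a cyclic shift multiplied by a diagonal matrix of phases,
which is why unitarity holds entry by entry.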
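Since $\mathcal{B}_2(\mathcal{H})$ is a Hilbert space, the Riesz representation theorem guarantees
that every continuous linear functional on $\mathcal{B}_2(\mathcal{H})$ is of the form
\[
  A \mapsto \mathrm{tr}[BA], \tag{1.12}
\]
for some $B \in \mathcal{B}_2(\mathcal{H})$. That is, $\mathcal{B}_2(\mathcal{H})' \simeq \mathcal{B}_2(\mathcal{H})$. Via the same trace formula we
also have that $\mathcal{B}_\infty(\mathcal{H})' \simeq \mathcal{B}_1(\mathcal{H})$ and $\mathcal{B}_1(\mathcal{H})' \simeq \mathcal{B}(\mathcal{H})$. $\mathcal{B}(\mathcal{H})'$, however, contains
more elements than those that can be obtained from Eq.(1.12) with $B \in \mathcal{B}_1(\mathcal{H})$.
A frequently used property of the trace is that
\[
  \mathrm{tr}[AB] = \mathrm{tr}[BA], \tag{1.13}
\]
if one of the operators is trace-class or both are Hilbert-Schmidt class. Similarly,
$\mathrm{tr}[A |\psi\rangle\langle\varphi|] = \langle\varphi, A\psi\rangle$.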
Convergence of operators Let us now have a look at different notions of
convergence in $\mathcal{B}(\mathcal{H})$. Norm convergence (a.k.a. uniform convergence) of the
form $\|A_n - A\| \to 0$ for $n \to \infty$ w.r.t. the operator norm is often too strong.
In infinite dimensions, the sum in Eq.(1.3), for instance, clearly does not converge in norm: if we
denote the $n$'th partial sum by $A_n$, then $\|A_n - A_{n-1}\| = \| |e_n\rangle\langle e_n| \| = 1$ in this
case. Weaker notions of convergence are:
◦ Weak convergence, which requires $\langle\psi, (A_n - A)\varphi\rangle \to 0$ for all $\varphi, \psi \in \mathcal{H}$,
◦ Weak-* convergence⁵, which requires $\mathrm{tr}[(A_n - A)B] \to 0$ $\forall B \in \mathcal{B}_1(\mathcal{H})$,
◦ Strong convergence, which requires $\|(A_n - A)\psi\| \to 0$ for all $\psi \in \mathcal{H}$.
These are generally related as follows: norm convergence implies weak-* convergence
(via Hölder’s inequality) and also strong convergence (via the Lipschitz
inequality $\|(A_n - A)\psi\| \le \|A_n - A\| \, \|\psi\|$). These two, in turn, imply weak convergence
(by using $B = |\varphi\rangle\langle\psi|$ and Cauchy-Schwarz, respectively). Moreover, on norm-bounded subsets of $\mathcal{B}(\mathcal{H})$
weak and weak-* convergence are equivalent (as shown by employing the Schmidt
decomposition together with dominated convergence).
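The gap between norm and strong convergence for the partial sums $A_n$ of
Eq.(1.3) can be made concrete. In the sketch below, $l_2(\mathbb{N})$ is replaced by a long
but finite list (an assumption of this example, as is the choice $\psi_k = 1/k$):
the tail norm $\|(A_n - \mathbb{1})\psi\|$ shrinks with $n$, while consecutive partial
sums always differ by a rank-one projection of norm one, witnessed by $e_n$.

```python
import math

N = 10_000  # truncation length standing in for l_2(N); assumption of this sketch
psi = [1 / k for k in range(1, N + 1)]

def project(v, n):
    """A_n v: keep the first n coordinates and set the rest to zero."""
    return v[:n] + [0.0] * (len(v) - n)

def norm(v):
    return math.sqrt(sum(abs(x) ** 2 for x in v))

def tail_norm(n):
    """|| (A_n - 1) psi || for the fixed vector psi."""
    return norm([a - b for a, b in zip(project(psi, n), psi)])

def step_gap(n):
    """|| (A_n - A_{n-1}) e_n ||, a lower bound on || A_n - A_{n-1} ||."""
    e_n = [0.0] * N
    e_n[n - 1] = 1.0
    return norm([a - b for a, b in zip(project(e_n, n), project(e_n, n - 1))])

tails = [tail_norm(n) for n in (10, 100, 1000)]  # decreasing towards zero
gaps = [step_gap(n) for n in (10, 100, 1000)]    # stays equal to one
```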
The expression in Eq.(1.3) is strongly convergent. More generally, any norm-
bounded increasing sequence of Hermitian operators is strongly convergent in
B(H). This is often useful to lift results from finite dimensions to infinite di-
mensions. Sometimes it is used together with the fact that if An → A and
Bn → B each converge strongly, then An Bn → AB converges strongly as well,
and An C → AC converges in norm for any C ∈ B∞ (H).
Each of the mentioned notions of convergence is based on a corresponding
topology on $\mathcal{B}(\mathcal{H})$. The weak-* topology, for instance, can be defined as the
smallest topology in which all functionals of the form $\mathcal{B}(\mathcal{H}) \ni A \mapsto \mathrm{tr}[AB]$ are
continuous for any $B \in \mathcal{B}_1(\mathcal{H})$.
Functional calculus If A is an operator on H and f : C → C a function,

there are different ways of defining f (A) depending on the properties of f and
A. We will briefly survey two of them that both generalize the straightforward
case of polynomial functions and both involve the spectrum of A.
Recall that the spectrum $\mathrm{spec}(A) \subseteq \mathbb{C}$ of a bounded operator is the set
of complex numbers $\lambda$ for which the operator $(\lambda \mathbb{1} - A)$ is not invertible (i.e.
it represents a map that is not bijective). If $f$ is holomorphic on a simply
connected domain $D \supset \mathrm{spec}(A)$ and $\Gamma$ is a rectifiable closed curve in $D$ that does
not intersect itself and surrounds $\mathrm{spec}(A)$, then Cauchy’s integral formula can
be used to define
\[
  f(A) := \frac{1}{2\pi i} \oint_\Gamma f(z) \, (z \mathbb{1} - A)^{-1} \, dz. \tag{1.14}
\]
This way of defining $f(A)$ is called holomorphic functional calculus. The integral
in Eq.(1.14) converges in operator norm and the resulting operator satisfies
$\mathrm{spec}\big(f(A)\big) = f\big(\mathrm{spec}(A)\big)$. Moreover, if $g : D \to \mathbb{C}$ is another holomorphic
function and $gf$ denotes the pointwise product, then $g(A) f(A) = (gf)(A)$.
If $f$ is merely continuous on a set that contains $\mathrm{spec}(A)$, then one can still
define $f(A)$ if $A$ is a normal operator. The idea is to exploit the spectral
decomposition and to let $f$ act directly on the spectrum of $A$. In particular, if
$A \in \mathcal{B}_1(\mathcal{H})$ has spectral decomposition $A = \sum_k \lambda_k |\psi_k\rangle\langle\psi_k|$, then
\[
  f(A) := \sum_k f(\lambda_k) |\psi_k\rangle\langle\psi_k|, \tag{1.15}
\]
where the sum converges in trace-norm. This is called continuous functional
calculus. If $f$ is analytic, it coincides with the holomorphic functional calculus.
That is, if the assumptions of both functional calculi are satisfied, then Eq.(1.14)
equals Eq.(1.15).
5 a.k.a. ultraweak convergence or σ-weak convergence.
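For a concrete instance, take $A = \sigma_1$ and $f = \exp$. The sketch below
evaluates Eq.(1.15) from the known eigen-decomposition of $\sigma_1$ and compares it
with a truncated power series, which for an entire function such as $\exp$ gives
the same operator as the contour integral in Eq.(1.14). The truncation order and
the function names are assumptions of this example.

```python
import math

s = 1 / math.sqrt(2)
# Eigenpairs of sigma_1 = [[0, 1], [1, 0]]: eigenvalues +1 and -1 with
# real unit eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
eigenpairs = [(1.0, (s, s)), (-1.0, (s, -s))]

def f_of_sigma1(f):
    """Eq.(1.15): sum_k f(lambda_k) |psi_k><psi_k| (eigenvectors are real here)."""
    return [[sum(f(lam) * v[i] * v[j] for lam, v in eigenpairs)
             for j in range(2)] for i in range(2)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exp_series(A, terms=30):
    """Truncated power series sum_n A^n / n! for a 2x2 matrix A."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]  # A^0 / 0!
    for n in range(1, terms + 1):
        total = [[total[i][j] + term[i][j] for j in range(2)] for i in range(2)]
        term = [[x / n for x in row] for row in matmul(term, A)]  # A^n / n!
    return total

sigma1 = [[0.0, 1.0], [1.0, 0.0]]
exp_spectral = f_of_sigma1(math.exp)       # [[cosh 1, sinh 1], [sinh 1, cosh 1]]
exp_power = exp_series(sigma1)
square = f_of_sigma1(lambda x: x * x)      # sigma_1^2, i.e. the identity
```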
Exercise 1.5. Let A, B ∈ B(H) be Hermitian. Show that

a) tr [AB] ∈ R if B ∈ B1 (H),
b) A ≥ B ∧ A ≤ B implies A = B,
c) A ≥ B implies that CAC ∗ ≥ CBC ∗ for all C ∈ B(H, H̃).
Exercise 1.6. Let A, B ∈ B(H) be positive and B ∈ B1 (H). Show that
a) tr [AB] ≥ 0,
b) tr [AB] = 0 implies AB = BA = 0.
Exercise 1.7. Let P ∈ B(H) be an orthogonal projection. Show that
a) 0 ≤ P ≤ 1,
b) if 0 ≤ A ≤ µP for some µ ∈ R+ and Hermitian A ∈ B(H), then A = AP = P AP .
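Exercise 1.8. For the operator norm on $\mathcal{B}(\mathcal{H})$, show that
a) $0 \le A \le B$ implies that $\|A\| \le \|B\|$,
b) $-\mathbb{1} \le C \le \mathbb{1}$ iff $\|C\| \le 1$ for Hermitian $C$,
c) $\|AB\| \le \|A\| \, \|B\|$,
d) $\|A^* A\| = \|A\|^2$ for all $A \in \mathcal{B}(\mathcal{H})$,
e)∗ $\|A\| = \sup_{\|\psi\| = 1} |\langle\psi, A\psi\rangle|$ for all normal $A$.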
Exercise 1.9. Let Q ∈ B(H) be positive and such that ker(Q) = {0}. Prove that
(A, B) 7→ tr [QA∗ B] defines an inner product on B2 (H).
Exercise 1.10. Construct a sequence of finite rank operators An ∈ B0 (H) that converges
weakly to zero but not strongly.
1.3 Probabilistic structure of Quantum Theory

Quantum theory can be regarded as a general theoretical framework for physical
theories. It consists of a mathematical core that becomes a physical theory
when a set of correspondence rules is added, telling us which mathematical objects
we have to use in different physical situations.
Quantum theory divides the description of any physical experiment into
two parts: preparation and measurement. This innocent-looking step already
covers one of the basic differences between the quantum and the classical world,
as in classical physics there is no need to talk about measurements in the first
place. Note also that the division of a physical process into preparation and
measurement is sometimes ambiguous, but, fortunately, quantum theoretical
predictions do not depend on the particular choice of the division.
A natural demand is that a physical theory should predict the outcome of
any measurement given all the information about the preparation, i.e., the initial
conditions, of the system. Quantum mechanics⁶ teaches us that this is in
general not possible and that all we can do is to predict the probabilities of
outcomes in statistical experiments, i.e., long series of experiments where all
relevant parameters in the procedure are kept unchanged. Thus, quantum mechanics
does not predict individual events, unless the corresponding probability
distribution happens to be concentrated on a single outcome. We will see later
that there are good reasons to believe that this ‘fuzziness’ is not due to
incompleteness of the theory and lacking knowledge about some hidden variables
but rather part of nature’s character. In fact, entanglement will be the leading
actor in that story. The fact that the appearance of probabilities is not only due
to the ignorance of the observer, but at the very heart of the description, means
that the measurement process can be regarded as a transition from possibilities to facts.
The preparation of a quantum system is the set of actions which determines
all probability distributions of any possible measurement. It has to be a proce-
dure which, when applied to a statistical ensemble, leads to converging relative
frequencies and thus allows us to talk about probabilities. Since many different
preparations can have the same effect in the sense that all the resulting proba-
bility distributions coincide it is reasonable to introduce the concept of a state,
which specifies the effect of a preparation regardless of how it has actually been
performed. Note that, in contrast to classical mechanics, a quantum ‘state’
does not refer to the attributes of an individual system but rather describes
a statistical ensemble—the effect of a preparation in a statistical experiment.
One should thus be careful with assigning states to individual systems. Talking
about the ‘state of an individual atom’ is more common but not necessarily
more meaningful than talking about the ‘Bernoulli distribution of an individual
coin’.
6 We use quantum mechanics and quantum theory synonymously.

Preparation While the term ‘state’ is used for various different albeit related
mathematical objects (explained further down), a mathematically unambiguous
way to describe the preparation of a quantum system is the use of density
operators:
Definition 1.4 (Density operators). $\rho \in \mathcal{B}_1(\mathcal{H})$ is called a density operator if
it is positive and satisfies $\mathrm{tr}[\rho] = 1$. A density operator is called pure if there
is a unit vector $\psi \in \mathcal{H}$ such that $\rho = |\psi\rangle\langle\psi|$, and it is called mixed otherwise.
A pure density operator is completely specified by the corresponding unit
vector $\psi$, which in turn is specified by the density operator up to a scalar of
modulus one (a ‘phase’). The term ‘state’ is used for both $\rho$ and $\psi$. To emphasize
the latter case, ‘state vector’ is sometimes used.⁷
On the level of state vectors, a natural mathematical operation is linear com-
bination: for any pair of unit vectors ψ1 , ψ2 new state vectors can be obtained
as ψ = c1 ψ1 + c2 ψ2 with appropriately chosen c1 , c2 ∈ C. ψ is then said to be
a superposition of ψ1 and ψ2 .
On the level of density operators, a superficially similar natural mathematical
operation is convex combination. As we will see below, this has, however, an
entirely different physical interpretation and it will usually change the purity of
the state.
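The difference between the two operations can already be seen for a qubit. The
sketch below compares the equal-weight superposition of $|0\rangle$ and $|1\rangle$ with the
equal-weight convex combination of $|0\rangle\langle 0|$ and $|1\rangle\langle 1|$, using $\mathrm{tr}[\rho^2]$ as an
indicator of purity; all names are illustrative.

```python
import math

def ketbra(psi):
    """Density operator |psi><psi| of a state vector psi."""
    return [[a * complex(b).conjugate() for b in psi] for a in psi]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(A):
    return sum(A[i][i] for i in range(2))

def purity(rho):
    """tr[rho^2]"""
    return trace(matmul(rho, rho)).real

ket0, ket1 = [1, 0], [0, 1]
s = 1 / math.sqrt(2)

# Superposition on the level of state vectors: psi = (|0> + |1>) / sqrt(2)
plus = [s * a + s * b for a, b in zip(ket0, ket1)]
rho_super = ketbra(plus)

# Convex combination on the level of density operators
rho_mix = [[0.5 * x + 0.5 * y for x, y in zip(r0, r1)]
           for r0, r1 in zip(ketbra(ket0), ketbra(ket1))]

purity_super = purity(rho_super)  # pure state
purity_mix = purity(rho_mix)      # maximally mixed qubit state
```

Both operators have unit trace and the same diagonal entries; they differ only
in the off-diagonal entries, which is exactly what the purity detects.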
Proposition 1.5 (Purity). Let $\rho \in \mathcal{B}(\mathcal{H})$ be a density operator. Then
$0 < \mathrm{tr}[\rho^2] \le 1$ with equality iff $\rho$ describes a pure state. Moreover, if $d := \dim(\mathcal{H}) < \infty$,
then $\mathrm{tr}[\rho^2] \ge 1/d$ with equality iff $\rho = \mathbb{1}/d$ (which is then called maximally
mixed).
Proof. Since $\mathrm{tr}[\rho^2] = \|\rho\|_2^2$, it is positive and non-zero. Hölder’s inequality
together with $\|\rho\|_1 = 1$ gives $\mathrm{tr}[\rho^2] \le \|\rho\|$. Since the operator norm, in this
case, equals the largest eigenvalue and all eigenvalues are positive and sum up
to one, we get $\|\rho\| \le 1$ with equality iff $\rho$ has rank one.
For the lower bound in finite dimensions, we can invoke the Cauchy-Schwarz
inequality for the Hilbert-Schmidt inner product in order to get:
\[
  1 = \mathrm{tr}[\mathbb{1}\rho]^2 \le \mathrm{tr}[\mathbb{1}] \, \mathrm{tr}[\rho^2] = d \, \mathrm{tr}[\rho^2].
\]
Equality in the Cauchy-Schwarz inequality holds iff $\rho$ is a multiple of $\mathbb{1}$, and
$\mathrm{tr}[\rho] = 1$ determines the prefactor.
Example 1.7 (Bloch ball). There is a bijection between the set of density operators
on $\mathbb{C}^2$ and the set of vectors $r \in \mathbb{R}^3$ with Euclidean norm $\|r\| \le 1$, given
by
\[
  \rho = \frac{1}{2} \Big( \mathbb{1} + \sum_{i=1}^{3} r_i \sigma_i \Big). \tag{1.16}
\]
The purity is then expressible as $\mathrm{tr}[\rho^2] = \frac{1}{2} \big( 1 + \|r\|^2 \big)$. Consequently, the
boundary coincides with the set of pure states and the origin corresponds to
the maximally mixed state. Physically, a two-level density operator (a ‘qubit’)
might for instance model:
◦ An atom in a double-well potential. $\rho = |0\rangle\langle 0|$ and $\rho = |1\rangle\langle 1|$ would then
correspond to the atom being left or right, respectively.
◦ A two-level atom with $\rho = |0\rangle\langle 0|$, $\rho = |1\rangle\langle 1|$ referring to the ground and
excited state, respectively.
◦ The spin of an electron with $\rho = |0\rangle\langle 0| \mathrel{\hat{=}}$ spin up, $\rho = |1\rangle\langle 1| \mathrel{\hat{=}}$ spin down.
◦ Polarization degrees of freedom of light. North-/south pole correspond
to left-/right circular polarization while the east-/west pole correspond to
horizontal/vertical polarization. The center $\rho = \mathbb{1}/2$ then describes
unpolarized light.
7 There is yet another, more general, mathematical meaning of the term ‘state’, namely as a
positive normalized linear functional. Clearly, every density operator induces such a functional
via $A \mapsto \mathrm{tr}[\rho A]$. In fact, every weak-* continuous positive normalized linear functional on
$\mathcal{B}(\mathcal{H})$ is of this form. If one drops or relaxes the continuity requirement, there are, however,
other ‘states’ as well. Those arising from density operators are then called normal states and
the other ones singular states.
The case dim(H) = 2 is very special in many ways. For instance, a nice
geometric representation of the set of all density operators as in Eq.(1.16) is not
possible in higher dimensions.
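The Bloch-ball parametrization of Eq.(1.16) translates directly into code. The
following sketch maps a Bloch vector to its density matrix and back (via
$r_i = \mathrm{tr}[\rho \sigma_i]$) and evaluates the purity; the function names and the sample
vector are assumptions of this example.

```python
def bloch_to_rho(r):
    """rho = (1 + x sigma_1 + y sigma_2 + z sigma_3) / 2 as a 2x2 nested list."""
    x, y, z = r
    return [[(1 + z) / 2, (x - 1j * y) / 2],
            [(x + 1j * y) / 2, (1 - z) / 2]]

def rho_to_bloch(rho):
    """Inverse map: r_i = tr[rho sigma_i]."""
    x = (rho[0][1] + rho[1][0]).real
    y = (1j * (rho[0][1] - rho[1][0])).real
    z = (rho[0][0] - rho[1][1]).real
    return (x, y, z)

def purity(rho):
    """tr[rho^2]; for Hermitian rho this equals the sum of |rho_ij|^2."""
    return sum(abs(rho[i][j]) ** 2 for i in range(2) for j in range(2))

r = (0.3, -0.4, 0.5)                         # ||r||^2 = 0.5, inside the ball
rho = bloch_to_rho(r)
purity_formula = (1 + sum(c * c for c in r)) / 2

pure = purity(bloch_to_rho((0.0, 0.0, 1.0)))   # on the boundary
mixed = purity(bloch_to_rho((0.0, 0.0, 0.0)))  # at the origin
```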
In infinite dimensions, as seen in Exercise 1.10, weak convergence can be
rather weak indeed, even when restricted to finite-rank operators. On the set
of density operators, however, normalization and positivity ensure that weak
convergence implies every other form of convergence:
Theorem 1.6 (Convergence to a density operator). Let ρn ∈ B1 (H) be a se-
quence of positive operators that converges weakly to a density operator ρ and
satisfies tr [ρn ] → 1. Then kρn − ρk1 → 0.
Proof. Exploiting the spectral decomposition of ρ, we can find a finite-dimensional
orthogonal projection P for which 1 − tr [ρP ] =: is arbitrarily small. That is,
for any ε > 0, we can achieve < ε in this way. With P ⊥ := 1 − P we can
bound
kρ − ρn k1 ≤ kP (ρ − ρn )P k1 + 2 P (ρ − ρn )P ⊥ 1 + P ⊥ (ρ − ρn )P ⊥ 1 . (1.17)

The first term on the r.h.s. converges to zero, since it involves only finite-
dimensional operators on which weak convergence implies norm convergence (in
any norm). For the second term on the r.h.s. of Eq.(1.17) we first use that
P ρP ⊥ = 0 and then bound the remaining part via
P ρn P ⊥ ≤ ρn P ⊥ = tr U ∗ √ρn √ρn P ⊥ ≤ tr [ρn ] tr [ρn P ⊥ ]

q
1 1
q √
= tr [ρn ] tr [ρn ] − tr [P ρn P ] → .
Here, we have first used Hölder’s inequality, then the polar decomposition
ρn P ⊥ = U |ρn P ⊥ |, and in the third step the Cauchy-Schwarz inequality for
the Hilbert-Schmidt inner product.
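Finally, an upper bound for the third term on the r.h.s. of Eq.(1.17) is
\[
\|P^\perp(\rho - \rho_n)P^\perp\|_1 \le \mathrm{tr}[P^\perp \rho P^\perp] + \mathrm{tr}[P^\perp \rho_n P^\perp]
= \epsilon + \mathrm{tr}[\rho_n] - \mathrm{tr}[P\rho_n P] \;\to\; 2\epsilon.
\]
Since $\epsilon$ can be chosen arbitrarily small, this shows $\|\rho - \rho_n\|_1 \to 0$.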
In fact, the property just proven extends to the entire space of trace-class
operators: if $T_n \in \mathcal{B}_1(\mathcal{H})$ converges weakly to $T \in \mathcal{B}_1(\mathcal{H})$ and $\|T_n\|_1 \to \|T\|_1$,
then $T_n \to T$ in trace-norm.
Measurements Let X be the set of all possible measurement outcomes in a

given description of an experiment. We will denote by B a σ-algebra over X.
If X is discrete, then B is usually just the power set and if X is a manifold
(in particular, if X = R), then the canonical choice for B is the corresponding
Borel σ-algebra. For the moment, we will treat the elements of X just as labels
without further physical meaning. The mathematical object assigned to each
measurement apparatus is then a positive operator valued measure (POVM):
Definition 1.7 (POVMs). A positive operator valued measure (POVM) on a
measurable space $(X, \mathcal{B})$ is a map $M : \mathcal{B} \to \mathcal{B}(\mathcal{H})$ that satisfies $M(Y) \ge 0$ for
all $Y \in \mathcal{B}$ and
\[
\sum_k M(X_k) = \mathbb{1} \tag{1.18}
\]
for any countable, disjoint partition $X = \cup_k X_k$ with $X_k \in \mathcal{B}$. A POVM is
called sharp if $M(Y)$ is an orthogonal projection for any $Y \in \mathcal{B}$. In this case,
$M$ is also called a projection valued measure (PVM).
Due to Eq.(1.18), $M$ is also called a resolution of identity in the literature. If
$X$ is discrete, $M$ is determined by the tuple of operators $M_x := M(\{x\})$ that
correspond to the singletons $x \in X$. Then $M(Y) = \sum_{x\in Y} M_x$ for any $Y \subseteq X$
and, with a slight abuse of terminology, one often calls the tuple $(M_x)_{x\in X}$ of
positive operators that sum up to $\mathbb{1}$ the POVM.
Positivity of the $M(Y)$ together with the normalization requirement in Eq.(1.18)
implies $0 \le M(Y) \le \mathbb{1}$.$^8$ Moreover:
Lemma 1.8. Let M : B → B(H) be a POVM and J, Y ∈ B.
(1) If J ⊆ Y , then M (J) + M (Y \J) = M (Y ) and M (J) ≤ M (Y ),
(2) M (J ∪ Y ) ≤ M (J) + M (Y ) with equality if Y ∩ J = ∅.

8 An element E ∈ B(H) that satisfies 0 ≤ E ≤ 1 is in this context often called effect
operator.
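Proof. Using Eq.(1.18) twice, we get
\[
\mathbb{1} = \begin{cases} M(Y) + M(X\setminus Y) \\ M(J) + M(Y\setminus J) + M(X\setminus Y). \end{cases}
\]
By subtraction of the two lines we obtain $M(Y) - M(J) = M(Y\setminus J) \ge 0$, which
proves (1). In order to arrive at (2), we exploit (1) for the sets $J$ and $J \cup Y$.
Then $M(J \cup Y) = M(J) + M((J \cup Y)\setminus J) \le M(J) + M(Y)$, where the last step
uses (1) once more for $(J \cup Y)\setminus J \subseteq Y$.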
If a POVM M is projection valued, then 0 ≤ M (Y ) ≤ 1 implies that

M (Y )M (J) = 0 whenever Y ∩ J = ∅ (cf. Exercise 1.15).
Probabilities Having introduced the basic mathematical objects that are as-
signed to preparation and measurement, it remains to see how these are com-
bined in a way that eventually leads to probabilities. This is what the following
postulate is doing:
Postulate 1.9 (Born’s rule). The probability p(Y |ρ, M ) of measuring an out-
come in Y ∈ B if preparation and measurement are described by a density
operator ρ ∈ B1 (H) and a POVM M : B → B(H), respectively, is given by
p(Y |ρ, M ) = tr [ρM (Y )] . (1.19)
If ρ and M are clear from the context, we will simply write p(Y ) := p(Y |ρ, M )
and if X is discrete and B the corresponding power set, we will write p(x) or
px for p({x}).
The defining properties of density operators and POVMs now nicely play
together so that p(Y |ρ, M ) has all the necessary properties for an interpretation
in terms of probabilities:
Corollary 1.10. For any density operator $\rho \in \mathcal{B}_1(\mathcal{H})$ and POVM $M : \mathcal{B} \to \mathcal{B}(\mathcal{H})$,
the map $p : Y \mapsto p(Y)$ that appears in Born's rule defines a probability
measure on $(X, \mathcal{B})$.
Proof. First observe that $\forall Y \in \mathcal{B} : 0 \le p(Y) \le 1$. The lower bound follows
from positivity of $\rho$ and $M(Y)$ (cf. Exercise 1.6a) and the upper bound from
Eq.(1.18) applied to the trivial partition of $X$ together with $\mathrm{tr}[\rho] = 1$. When
applying Eq.(1.18) to $X = X \cup \emptyset$ together with positivity of $M$, we obtain
further that $p(X) = 1$ and $p(\emptyset) = 0$.
Finally, we have to show that $\sum_k p(X_k) = 1$ for any countable disjoint
partition $X = \cup_k X_k$ with $X_k \in \mathcal{B}$. This again follows from Eq.(1.18) since
\[
\sum_k p(X_k) = \sum_k \mathrm{tr}[\rho M(X_k)] = \mathrm{tr}\Big[\rho \sum_k M(X_k)\Big] = \mathrm{tr}[\rho \mathbb{1}] = 1. \tag{1.20}
\]
Here interchanging the sum with the one in the trace is justified by positivity
of all expressions and Fubini-Tonelli.
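Born's rule and Corollary 1.10 can be checked numerically in finite dimensions. The following is a minimal sketch, assuming NumPy; the helper names `random_density_operator` and `random_povm`, the sizes `d`, `m` and the random construction itself are illustrative choices, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 3, 5  # Hilbert space dimension and number of outcomes (illustrative)

def random_density_operator(d, rng):
    """Random density operator: G G^* normalised to unit trace."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def random_povm(d, m, rng):
    """Random POVM: positive operators A_x rescaled so that sum_x M_x = 1."""
    a = [random_density_operator(d, rng) for _ in range(m)]
    s = sum(a)
    # S^{-1/2} from the spectral decomposition of the positive definite sum S
    w, v = np.linalg.eigh(s)
    s_inv_sqrt = (v / np.sqrt(w)) @ v.conj().T
    return [s_inv_sqrt @ ax @ s_inv_sqrt for ax in a]

rho = random_density_operator(d, rng)
povm = random_povm(d, m, rng)

# Born's rule, Eq.(1.19), for the singletons {x}
p = np.array([np.trace(rho @ mx).real for mx in povm])
```

The vector `p` is non-negative and sums to one, as Corollary 1.10 requires.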
If M and ρ are given, Born’s rule tells us how to compute quantum theory’s
prediction of the measurement probabilities. In practice, we typically know M
and ρ only for some simple cases together with some mathematical rules (yet
to be formalized in this lecture) telling us how to reduce more general cases to
these simple ones. The largest part of quantum theory (Schrödinger equation,
composite systems, etc.) is about those rules and their consequences.
Traditional text-book quantum theory often assumes ρ to be pure and M to
be sharp. We will soon see in which sense this is justified.
As a first application of the formalism, let us consider the problem of information
transmission via a $d$-level quantum system, i.e., one for which $\mathcal{H} = \mathbb{C}^d$.
Given an alphabet $X$ of size $|X| = m$, is it possible to encode all its elements
into a $d$-level quantum system so that the information can finally be retrieved
exactly or at least with a small probability of error?
Following the rules of the formalism, we assign a density operator $\rho_x \in \mathcal{B}(\mathcal{H})$
to each $x \in X$. Similarly, we assume that there is a measurement apparatus that
has $X$ as the set of possible measurement outcomes so that a positive operator
$M_x \in \mathcal{B}(\mathcal{H})$ is assigned to each outcome and that $\sum_{x\in X} M_x = \mathbb{1}$. If $\rho_x$ has been
prepared, the probability for measuring the correct outcome is then, according
to Born's rule: $p_x := \mathrm{tr}[\rho_x M_x]$. Now consider the average probability of success,
averaged uniformly over all $x \in X$:
Proposition 1.11. The average probability of success when transmitting an
alphabet of size $m$ over a $d$-level quantum system satisfies $\frac{1}{m}\sum_x p_x \le \frac{d}{m}$.
Proof. The claim follows from the defining properties of POVMs and density
operators, for instance via the use of Hölder's inequality and the fact that
$\|\rho_x\|_\infty \le 1$:
\[
\frac{1}{m}\sum_x p_x = \frac{1}{m}\sum_x \mathrm{tr}[\rho_x M_x] \le \frac{1}{m}\sum_x \|\rho_x\|_\infty \|M_x\|_1 \le \frac{1}{m}\sum_x \mathrm{tr}[M_x] = \frac{d}{m}.
\]
This should be compared with the performance of the following naive classical
(= non-quantum) protocol that aims at transmitting a random element from
the alphabet $X$ using only $d$ of its elements: fix any subset $D \subseteq X$ of $d = |D|$
elements; send $x$ if $x \in D$ and send an arbitrary element from $D$ if $x \notin D$. The
probability of success of this protocol is $d/m$. Prop.1.11 tells us that this
cannot be outperformed by any quantum protocol.
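The naive protocol can be phrased inside the quantum formalism, which shows that the bound of Prop.1.11 is attained. The sketch below assumes NumPy; the sizes `d`, `m` and the choice of $|0\rangle\langle 0|$ as the fallback state are illustrative.

```python
import numpy as np

d, m = 2, 5  # illustrative sizes with m > d
basis = np.eye(d)
proj = [np.outer(basis[k], basis[k]) for k in range(d)]  # |k><k|

# Encoding: letters x < d go to basis states, all other letters to |0><0|.
states = [proj[x] if x < d else proj[0] for x in range(m)]
# Decoding POVM: M_x = |x><x| for x < d and M_x = 0 otherwise.
povm = [proj[x] if x < d else np.zeros((d, d)) for x in range(m)]

# Average success probability (1/m) sum_x tr[rho_x M_x]
p_success = np.mean([np.trace(states[x] @ povm[x]) for x in range(m)])
```

Here `p_success` equals `d / m`, so the inequality in Prop.1.11 holds with equality for this encoding.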
As a second simple application of the formalism, let us analyze to what
extent a change in ρ or M can alter the probability of a measurement outcome:
Corollary 1.12 (Lipschitz-bounds for probabilities). Let $\rho, \rho' \in \mathcal{B}_1(\mathcal{H})$ be density
operators, $M, M' : \mathcal{B} \to \mathcal{B}(\mathcal{H})$ POVMs on a common measurable space
$(X, \mathcal{B})$ and $Y \in \mathcal{B}$. Then
\[
\big| p(Y|\rho, M) - p(Y|\rho', M) \big| \le \frac{1}{2}\,\|\rho - \rho'\|_1 , \tag{1.21}
\]
where equality can be attained for every pair $\rho, \rho'$ by a suitable choice of the
POVM $M$. Similarly,
\[
\sup_\rho \big| p(Y|\rho, M) - p(Y|\rho, M') \big| = \|M(Y) - M'(Y)\|_\infty . \tag{1.22}
\]
Proof. Consider the decomposition $(\rho - \rho') = \Delta_+ - \Delta_-$ into orthogonal positive
and negative parts (as introduced in Eq.(1.6)) and denote by $P_+$ the orthogonal
projection onto the closure of the range of $\Delta_+$. Then $\Delta_\pm \ge 0$, $P_+\Delta_+ = \Delta_+$
and $P_+\Delta_- = 0$. Moreover, $\mathrm{tr}[\rho - \rho'] = 0$ implies $\mathrm{tr}[\Delta_+] = \mathrm{tr}[\Delta_-]$ and
$|\rho - \rho'| = \Delta_+ + \Delta_-$ implies further that $\|\rho - \rho'\|_1 = 2\,\mathrm{tr}[\Delta_+]$. W.l.o.g. we
assume that $\mathrm{tr}[\Delta_+ M(Y)] \ge \mathrm{tr}[\Delta_- M(Y)]$ (otherwise interchange $\rho \leftrightarrow \rho'$). Then
using positivity of $M(Y)$ we obtain (by Born's rule, Exercise 1.6a and Hölder's
inequality):
\[
\big| p(Y|\rho, M) - p(Y|\rho', M) \big| = \mathrm{tr}[\Delta_+ M(Y)] - \mathrm{tr}[\Delta_- M(Y)] \le \mathrm{tr}[\Delta_+ M(Y)]
\le \|\Delta_+\|_1 \|M(Y)\|_\infty \le \frac{1}{2}\,\|\rho - \rho'\|_1 ,
\]
where we have used $\|M(Y)\|_\infty \le 1$, which is a consequence of $0 \le M(Y) \le \mathbb{1}$
(cf. Exercise 1.8). Equality in all the involved inequalities is achieved for
$M(Y) = P_+$. The operators $(P_+, \mathbb{1} - P_+)$ then form a suitable POVM.
In order to arrive at Eq.(1.22), first note that Hölder's inequality together
with $\|\rho\|_1 = 1$ leads to the upper bound
\[
\big| \mathrm{tr}\big[\rho\,(M(Y) - M'(Y))\big] \big| \le \|M(Y) - M'(Y)\|_\infty .
\]
That this equals the supremum follows from the fact that the operator norm of
the Hermitian operator $M(Y) - M'(Y)$ can already be obtained by taking the
supremum over all pure states $\rho = |\psi\rangle\langle\psi|$ on the l.h.s. (cf. Exercise 1.8d).
The fact that Eq.(1.21) is tight provides an operational interpretation for

the trace-norm distance of two density operators as a means of quantifying the
extent to which the two corresponding preparations can be distinguished in a
statistical experiment.
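The tightness of Eq.(1.21) can be illustrated for a pair of qubit states. This is a minimal sketch, assuming NumPy; the two density matrices `rho` and `rho_p` are arbitrary illustrative choices, and `p_plus` plays the role of $P_+$ from the proof.

```python
import numpy as np

rho = np.array([[0.75, 0.25], [0.25, 0.25]])   # illustrative density matrix
rho_p = np.array([[0.5, 0.0], [0.0, 0.5]])     # maximally mixed qubit state

# Spectral decomposition of the Hermitian difference rho - rho'
w, v = np.linalg.eigh(rho - rho_p)
trace_norm = np.abs(w).sum()                   # ||rho - rho'||_1

# Projection onto the range of the positive part Delta_+
v_plus = v[:, w > 0]
p_plus = v_plus @ v_plus.conj().T

# Difference of outcome probabilities for M(Y) = P_+
gap = np.trace((rho - rho_p) @ p_plus).real
# A non-optimal effect for comparison: M(Y) = |0><0|
gap_z = abs(np.trace((rho - rho_p) @ np.diag([1.0, 0.0])).real)
```

For the optimal choice $M(Y) = P_+$ the value `gap` coincides with `trace_norm / 2`, while the comparison effect gives the smaller value `gap_z`.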
Observables and expectation values So far we have treated the measurement
outcome merely as a label without further meaning. In practice, there is
often a numerical value assigned to every $x \in X$. We will denote this value by
$m(x) \in \mathbb{R}$ and assume that the function $m$ is measurable w.r.t. $\mathcal{B}$ and the Borel
σ-algebra on $\mathbb{R}$. Two frequently used quantities are the expectation value
$\langle m \rangle := \int_X m(x)\,dp(x)$ and the variance
$\mathrm{var}(m) := \int_X m(x)^2\,dp(x) - \langle m \rangle^2$.
If the probability measure $p$ is represented according to Born's rule, we can
write the expectation value as
\[
\langle m \rangle = \mathrm{tr}\big[\rho \hat{M}\big], \qquad \hat{M} := \int_X m(x)\,dM(x), \tag{1.23}
\]
which in the discrete case reduces to $\hat{M} = \sum_x m(x) M_x$. We will also use the
common notation $\langle \hat{M} \rangle := \mathrm{tr}[\rho \hat{M}]$. So far, $\hat{M}$ is a formal expression that is not
guaranteed to be meaningful if $m$ is not bounded. For simplicity, we will leave
the discussion of the unbounded case aside.
If the underlying POVM $M$ is sharp, then $\hat{M} = \sum_x m(x) M_x$ becomes a
spectral decomposition. In this case, we call $\hat{M}$ an observable$^9$ and notice that
each $m(x)$ is then an eigenvalue of $\hat{M}$ with corresponding spectral projection $M_x$.
That is, $\hat{M}$ determines both $m$ and $M$. In this way, any Hermitian operator is a
mathematically valid observable whose spectral decomposition determines the
set of possible measurement values and the POVM. Furthermore, since spectral
projections of a Hermitian operator that correspond to different eigenvalues are
mutually orthogonal (i.e. $M_x M_y = \delta_{x,y} M_x$, cf. Exercise 1.15) we can express
the variance as
\[
\mathrm{var}(m) = \mathrm{tr}\big[\rho \hat{M}^2\big] - \mathrm{tr}\big[\rho \hat{M}\big]^2 =: \mathrm{var}(\hat{M}). \tag{1.24}
\]
Notice that this does not hold in general, i.e. when $M$ is not sharp.
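The difference between the sharp and the non-sharp case in Eq.(1.24) can be made concrete for a qubit. The sketch below assumes NumPy; the state `rho`, the values $\pm 1$ and the 'unsharp' POVM $(\mathbb{1} \pm \eta Z)/2$ with the hypothetical parameter `eta` are illustrative choices.

```python
import numpy as np

def moments(rho, values, povm):
    """Return <m>, var(m) from the outcome distribution, and tr[rho M^2] - tr[rho M]^2."""
    p = np.array([np.trace(rho @ mx).real for mx in povm])  # Born's rule
    mean = values @ p
    var = (values**2) @ p - mean**2
    m_hat = sum(val * mx for val, mx in zip(values, povm))  # M-hat = sum_x m(x) M_x
    var_op = np.trace(rho @ m_hat @ m_hat).real - np.trace(rho @ m_hat).real**2
    return mean, var, var_op

z = np.diag([1.0, -1.0])                      # Pauli Z
rho = np.array([[0.8, 0.1], [0.1, 0.2]])      # illustrative density matrix
values = np.array([1.0, -1.0])                # m(x) for the two outcomes

sharp = [(np.eye(2) + z) / 2, (np.eye(2) - z) / 2]
eta = 0.5                                     # hypothetical unsharpness parameter
unsharp = [(np.eye(2) + eta * z) / 2, (np.eye(2) - eta * z) / 2]

mean_s, var_s, var_op_s = moments(rho, values, sharp)
mean_u, var_u, var_op_u = moments(rho, values, unsharp)
```

For the sharp measurement both variance expressions agree, whereas for the unsharp one the statistical variance `var_u` is strictly larger than `var_op_u`.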
Textbook descriptions of quantities like position, momentum, energy, angular
momentum and spin are usually in terms of observables (albeit in the
more general framework of not necessarily bounded self-adjoint operators). For
instance, the Pauli matrices, when divided by two, are the observables that
correspond to the three spin directions of a spin-$\frac{1}{2}$ particle.
Exercise 1.11. Show that every trace-class operator can be written as a linear
combination of four density operators.
Exercise 1.12. Let $V \in \mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$ be such that for every density operator $\rho \in \mathcal{B}_1(\mathcal{H}_1)$
the operator $V \rho V^*$ is again a density operator. What can be said about $V$?
Exercise 1.13. Prove the Bloch ball representation in Eq.(1.16). (Hint: use the
determinant). For a given density operator on $\mathbb{C}^2$, how can the vector $r$ be obtained?
Exercise 1.14. For any $\mathcal{H}$ construct a POVM that implements a 'biased coin' whose
outcomes occur independently of the density operator with probabilities $\frac{1}{2}(1 \pm b)$,
where $b \in [0, 1]$ is a fixed bias.
Exercise 1.15. Let $M : \mathcal{B} \to \mathcal{B}(\mathcal{H})$ be a sharp POVM on $(X, \mathcal{B})$. Show that $Y \cap J = \emptyset$
implies that $M(Y)M(J) = 0$. From here, prove that the number of pairwise disjoint
elements in $\mathcal{B}$ on which $M$ is non-zero is at most $d$ if $\mathcal{H} = \mathbb{C}^d$.
Exercise 1.16. Show that two preparations described by density operators $\rho_1, \rho_2 \in \mathcal{B}_1(\mathcal{H})$
can be distinguished with certainty in a statistical experiment iff $\rho_1 \rho_2 = 0$.
Exercise 1.17. Construct a pair of density operators $\rho, \rho'$ on a common Hilbert space
with the properties that: (i) their spectra coincide and each eigenvalue has multiplicity
one, (ii) there is no unitary $U$ such that $U \rho U^* = \rho'$.
9 Traditionally, the term observable is associated to self-adjoint operators. Sometimes,
however, it is also used more generally, often synonymous with measurement.

1.4 Convexity
Convex sets and extreme points
Definition 1.13. Let $V$ be a real vector space.$^{10}$
◦ A subset $C \subseteq V$ is called convex, if $x, y \in C$ implies that $\lambda x + (1-\lambda) y \in C$
for all $\lambda \in [0, 1]$.
◦ For a subset $S \subseteq V$ define the convex hull $\mathrm{conv}(S)$ as the set of all finite
linear combinations of the form $\sum_{i=1}^n \lambda_i x_i$ with $\lambda_i \ge 0$, $\sum_{i=1}^n \lambda_i = 1$,
$x_i \in S$ and $n \in \mathbb{N}$.
◦ The dimension of a convex set is the dimension of the affine space generated
by it.
◦ An extreme point of a convex set $C$ is an element $e \in C$ with the property
that $e = \lambda x + (1-\lambda) y$ with $x, y \in C$, $\lambda \in [0, 1]$ implies that $e \in \{x, y\}$.
We denote the set of extreme points of $C$ by $E(C)$.
Theorem 1.14 (Caratheodory). Let V be a normed space, C ⊆ V a compact

convex set of dimension d < ∞ and x ∈ C. There is a set of extreme points E ⊆
E(C) of size |E| ≤ (d + 1) so that x ∈ conv(E). In particular, C = conv(E(C)).
Here, the decomposition into extreme points is unique for all x ∈ C iff the
convex set is a simplex, i.e., it has exactly d + 1 extreme points. The set of
probability distributions over a finite set, for instance, forms a simplex.
The infinite dimensional analogue of Caratheodory’s theorem requires taking
the closure of the set of extreme points. Then the analogous statement is true for
all topologies that are ‘locally convex’. This means that the topology arises from
(semi-)norms, as it is the case for all topologies discussed so far, in particular,
for the weak-* topology on B(H).
Theorem 1.15 (Krein-Milman). Let $V$ be a locally convex topological vector
space and $C \subseteq V$ compact and convex. Then $C$ is the closure of the convex
hull of its extreme points, i.e. $\overline{\mathrm{conv}(E(C))} = C$.
By Alaoglu’s theorem, in the weak-* topology a set C ⊆ B(H) is compact iff

it is closed and norm-bounded. Hence, Krein-Milman applies especially to the
unit ball in B(H). For that particular case, however, there is a stronger result
that holds in the topology of the operator norm:
Theorem 1.16 (Russo-Dye, Kadison-Pedersen). In the operator-norm topology,
the unit ball $\{A \in \mathcal{B}(\mathcal{H}) \,|\, \|A\| \le 1\}$ is the norm-closed convex hull of the
set of unitary operators. Specifically, if $\|A\| \le 1 - \frac{2}{n}$ for some $n \in \mathbb{N}$, then there
are unitaries $(U_i)_{i=1}^n$ so that $\frac{1}{n}(U_1 + \ldots + U_n) = A$.
10 Note that every complex vector space is in particular a real vector space.
The second part of this theorem (due to Kadison and Pedersen) implies that
every element of the unit ball can be approximated up to 2/n in operator norm
by an equal-weight convex combination of n unitaries. This is reminiscent of
the following result that holds for inner product spaces. It has a very elegant
proof that exploits the probabilistic method—so we have to state it:
Theorem 1.17 (Maurey). Let $C$ be a subset of an inner product space, $\varphi \in \mathrm{conv}(C)$
and $b := \sup_{\xi \in C} \|\xi\|$. For any $n \in \mathbb{N}$ there are elements $\psi_1, \ldots, \psi_n \in C$
so that
\[
\Big\| \varphi - \frac{1}{n}\sum_{i=1}^n \psi_i \Big\|^2 \le \frac{b^2}{n}, \tag{1.25}
\]
where the norm is the one induced by the inner product.
Proof. As $\varphi$ is in the convex hull of $C$, there is a finite subset $\Xi \subseteq C$ so that
$\varphi = \sum_{z \in \Xi} \lambda_z z$, where $\lambda$ forms a probability distribution over $\Xi$. Let $Z_1, \ldots, Z_n$
be i.i.d. random variables with values in $\Xi$, distributed according to $\lambda$. Hence,
by construction, the expectation values are $E[Z_i] = \varphi$. Using this and the i.i.d.
property, it is straightforward to show that
\[
E\bigg[ \Big\| \varphi - \frac{1}{n}\sum_{i=1}^n Z_i \Big\|^2 \bigg] = \frac{1}{n}\Big( E\big[\|Z_i\|^2\big] - \|\varphi\|^2 \Big).
\]
Here, the r.h.s. can be bounded from above by $\frac{b^2}{n}$. Since the resulting inequality
holds for the expectation value, there has to be at least one realization of the
random variables for which it is true as well.
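For a small example the expectation in the proof can be evaluated exactly by enumerating all realizations of $(Z_1, \ldots, Z_n)$. The following sketch assumes NumPy; the three points `xi` (which here play the role of both $C$ and $\Xi$), the weights `lam` and the value of `n` are illustrative.

```python
import itertools
import numpy as np

xi = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])  # the set C = Xi (illustrative)
lam = np.array([0.5, 0.3, 0.2])                         # probability distribution lambda
phi = lam @ xi                                          # phi = sum_z lambda_z z
b2 = np.max(np.sum(xi**2, axis=1))                      # b^2 = sup ||xi||^2
n = 3

expected, best = 0.0, np.inf
for idx in itertools.product(range(len(xi)), repeat=n):
    idx = list(idx)
    prob = np.prod(lam[idx])                            # probability of this realization
    err = np.sum((phi - xi[idx].mean(axis=0))**2)       # ||phi - (1/n) sum_i Z_i||^2
    expected += prob * err
    best = min(best, err)

# Right-hand side of the identity used in the proof
rhs = (lam @ np.sum(xi**2, axis=1) - phi @ phi) / n
```

The enumerated expectation `expected` matches `rhs`, and the best realization `best` lies below the bound `b2 / n` of Eq.(1.25).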
Mixtures of states On any given Hilbert space H, the set of density opera-
tors S(H) := {ρ ∈ B1 (H)|ρ ≥ 0, tr [ρ] = 1} is a convex set: the trace is obviously
preserved by convex combinations and the sum of two positive operators is again
positive. In fact, slightly more is true: if (ρn )n∈N is any sequence of density
operators and (λn )n∈N is any sequence of positive numbers that sum up to one,
then
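\[
\sum_{n=1}^{\infty} \lambda_n \rho_n \in S(\mathcal{H}),
\]
where the sequence of partial sums converges in trace norm. In order to see
this, realize that it is a Cauchy sequence (as $\big\|\sum_{n=k}^{l} \lambda_n \rho_n\big\|_1 \le \sum_{n=k}^{l} \lambda_n$ and
$\lambda \in \ell^1(\mathbb{N})$) and that $\mathcal{B}_1(\mathcal{H})$ is a Banach space.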
Conversely, every single density operator can be convexly decomposed into
pure state density operators via its spectral decomposition, which in this case
coincides with the Schmidt decomposition
\[
\rho = \sum_n \lambda_n |\psi_n\rangle\langle\psi_n|,
\]
where the λn ’s are the eigenvalues and the ψn ’s the corresponding orthonormal
eigenvectors. Pure state density operators can not be convexly decomposed
further (Exercise 1.18). Consequently, the pure state density operators are
exactly the extreme points of S(H). If ρ is not pure, there are infinitely many
ways of decomposing it convexly into pure states—the spectral decomposition
is one of them and distinguishes itself by the fact that the ψn ’s are mutually
orthogonal.
Example 1.8 (Decompositions into pure states). For any density operator $\rho \in \mathcal{B}_1(\mathcal{H})$
convex decompositions into pure states can be constructed from any
orthonormal basis $\{e_k\}$ via the corresponding resolution of identity in Eq.(1.3):
if we multiply Eq.(1.3) from both sides with $\sqrt{\rho}$, we obtain
\[
\rho = \sum_k \sqrt{\rho}\,|e_k\rangle\langle e_k|\,\sqrt{\rho} = \sum_k p_k |\varphi_k\rangle\langle\varphi_k|, \tag{1.26}
\]
with $\varphi_k := \sqrt{\rho}\,e_k / \|\sqrt{\rho}\,e_k\|$ and $p_k := \langle e_k, \rho e_k \rangle$. Since every subspace that has
dimension greater than one admits an infinite number of inequivalent orthonormal
bases, this construction leads to an infinite number of different decompositions
unless $\rho$ is pure. In Cor. 1.22 we will see that the resulting probability
distribution $p$ is always at least as mixed as the distribution of eigenvalues of $\rho$.
Moreover, one can show that all countable convex decompositions into pure
states can be obtained in the described way if one allows in addition to first
embed isometrically into a larger Hilbert space and then follows the described
construction starting from an orthonormal basis of the larger space.
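The construction of Example 1.8 is easy to carry out numerically. This is a minimal sketch, assuming NumPy; the random density matrix and the random orthonormal basis (columns of `q`) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3

# Random full-rank density matrix (illustrative)
g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = g @ g.conj().T
rho /= np.trace(rho).real

# sqrt(rho) via the spectral decomposition
w, v = np.linalg.eigh(rho)
sqrt_rho = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

# Random orthonormal basis {e_k}: columns of a unitary from a QR decomposition
q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))

p = np.empty(d)
recon = np.zeros((d, d), dtype=complex)
for k in range(d):
    vec = sqrt_rho @ q[:, k]                  # sqrt(rho) e_k, not yet normalised
    p[k] = np.vdot(vec, vec).real             # p_k = <e_k, rho e_k>
    phi_k = vec / np.sqrt(p[k])               # pure state vector phi_k
    recon += p[k] * np.outer(phi_k, phi_k.conj())
```

The matrix `recon` reproduces `rho` as in Eq.(1.26), and the partial sums of the ordered weights `p` never exceed those of the ordered eigenvalues `w`.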
Convex combinations of density operators have a simple operational mean-
ing. To understand this, assume that an experimentalist has two preparation
devices at hand, which are described by density operators ρ0 , ρ1 ∈ B1 (H). As-
sume further, that for every single preparation of the system, she first flips a
coin and then uses one of the two devices depending on the outcome, say ρ1
with probability λ and ρ0 with probability 1 − λ. If eventually a measurement is
performed that is described by a POVM M , then the probability of measuring
an outcome in Y is given by

\[
\lambda\,p(Y|\rho_1, M) + (1-\lambda)\,p(Y|\rho_0, M) = \mathrm{tr}\big[\big(\lambda\rho_1 + (1-\lambda)\rho_0\big) M(Y)\big],
\]
where Born’s rule was used together with the linearity of the trace. Hence, the
overall preparation, which now includes the random choice of the experimental-
ist, is described by the convex combination λρ1 + (1 − λ)ρ0 .

Majorization In Prop. 1.5 we saw that the functional $\mathrm{tr}[\rho^2]$ can be used to
quantify how pure or mixed a density operator is. Using functional calculus this
can be expressed as $\mathrm{tr}[\rho^2] = \mathrm{tr}[f(\rho)]$ with $f(x) = x^2$. This choice is somewhat
arbitrary since we could instead have used e.g. $f(x) = x^3$, which also orders the
set of density operators from the maximally mixed state to the pure states. If
$\dim(\mathcal{H}) > 2$, however, the two orders turn out to be inequivalent, i.e. we can
find $\rho_1, \rho_2$ with $\mathrm{tr}[\rho_1^2] > \mathrm{tr}[\rho_2^2]$ but $\mathrm{tr}[\rho_1^3] < \mathrm{tr}[\rho_2^3]$. So is there any reasonable
way of saying that $\rho_1$ is more mixed (or pure) than $\rho_2$? The answer to this
question is given by a preorder$^{11}$ that is based on the notion of majorization.
11 A preorder is a binary relation that is transitive and reflexive.
Definition 1.18 (Majorization). Let $\lambda, \mu$ be two finite (and equal-length) or
infinite sequences of non-negative real numbers with $\|\lambda\|_1 = \|\mu\|_1 = 1$. By $\lambda^\downarrow, \mu^\downarrow$
we denote the corresponding sequences rearranged in non-increasing order. We
say that $\lambda$ is majorized by $\mu$ and we write $\lambda \prec \mu$ if
\[
\sum_{i=1}^{k} \lambda_i^\downarrow \le \sum_{i=1}^{k} \mu_i^\downarrow \qquad \forall k. \tag{1.27}
\]
For a pair of density operators $\rho_1, \rho_2 \in \mathcal{B}(\mathcal{H})$ we write $\rho_1 \prec \rho_2$ if the sequence
of eigenvalues of $\rho_1$ is majorized by the one of $\rho_2$.
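The condition in Eq.(1.27) translates directly into a comparison of cumulative sums. The sketch below assumes NumPy; the function name `is_majorized_by` and the two spectra `spec1`, `spec2` are illustrative. The spectra also give a concrete instance in dimension three where the orders induced by $\mathrm{tr}[\rho^2]$ and $\mathrm{tr}[\rho^3]$ disagree.

```python
import numpy as np

def is_majorized_by(lam, mu, tol=1e-12):
    """True if lam ≺ mu for finite probability vectors of equal length, cf. Eq.(1.27)."""
    lam_sorted = np.sort(lam)[::-1]           # non-increasing rearrangement
    mu_sorted = np.sort(mu)[::-1]
    return bool(np.all(np.cumsum(lam_sorted) <= np.cumsum(mu_sorted) + tol))

spec1 = np.array([0.5, 0.5, 0.0])             # eigenvalues of rho_1 (illustrative)
spec2 = np.array([0.65, 0.175, 0.175])        # eigenvalues of rho_2 (illustrative)
uniform = np.full(3, 1 / 3)                   # maximally mixed spectrum

purity = (np.sum(spec1**2), np.sum(spec2**2)) # tr[rho^2] for both
cubic = (np.sum(spec1**3), np.sum(spec2**3))  # tr[rho^3] for both
```

Here `purity[0] > purity[1]` but `cubic[0] < cubic[1]`, and accordingly neither spectrum majorizes the other, while the uniform spectrum is majorized by both.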
We will see that this is closely related to the following concept:
Definition 1.19 (Doubly stochastic matrices). Let $d \in \mathbb{N} \cup \{\infty\}$. A $d \times d$ matrix
with non-negative entries $M_{ij}$ is called doubly stochastic if for all $i$:
\[
\sum_{j=1}^{d} M_{ij} = \sum_{j=1}^{d} M_{ji} = 1. \tag{1.28}
\]
Example 1.9 (Permutation matrices). Let $N$ be either $\mathbb{N}$ or $\{1, \ldots, d\}$ for $d \in \mathbb{N}$.
Then any bijection $\pi : N \to N$ leads to a doubly stochastic matrix via
$M_{ij} := \delta_{i,\pi(j)}$ with $i, j \in N$. These are called permutation matrices. In the
finite dimensional case, Birkhoff's theorem states that permutation matrices
form the extreme points of the convex set of doubly stochastic matrices.
Example 1.10 (Unistochastic matrices). Let $U \in \mathcal{B}(\mathcal{H})$ be unitary and $\{e_k\} \subset \mathcal{H}$
an orthonormal basis. Then the matrix with elements $M_{ij} := |\langle e_i, U e_j \rangle|^2$ is
called unistochastic. This is an example of a doubly stochastic matrix, since
\[
\sum_j |\langle e_i, U e_j \rangle|^2 = \sum_j \langle e_i, U e_j \rangle \langle e_j, U^* e_i \rangle = \langle e_i, U U^* e_i \rangle = 1,
\]
and similarly for the transposed matrix. Note that in particular every permutation
matrix is unistochastic as it can be obtained by choosing $U$ to be the
corresponding permutation of basis elements.
The following relates the concepts discussed so far in this paragraph:
Theorem 1.20. Let $\lambda, \mu$ be two finite (and equal-length) or infinite sequences
of non-negative real numbers with $\|\lambda\|_1 = \|\mu\|_1 = 1$. Then the following are
equivalent:
(i) $\lambda \prec \mu$.
(ii) There is a doubly stochastic matrix $M$ so that $\lambda = M\mu$.
(iii) For all continuous convex functions $f : [0, 1] \to \mathbb{R}$ that satisfy $f(0) = 0$:
\[
\sum_k f(\lambda_k) \le \sum_k f(\mu_k). \tag{1.29}
\]
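The implication (ii) ⇒ (i), and with it (iii) for one particular $f$, can be checked numerically for a unistochastic matrix as in Example 1.10. This is a sketch under the assumption that NumPy is available; the random unitary `u`, the vector `mu` and the test function $f(x) = x^2$ are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4

# Random unitary from a QR decomposition of a complex Gaussian matrix
u, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
m = np.abs(u)**2                      # M_ij = |<e_i, U e_j>|^2 in the standard basis

mu = np.array([0.7, 0.2, 0.1, 0.0])   # illustrative probability vector
lam = m @ mu                          # lambda = M mu

# Partial sums of the non-increasing rearrangements, cf. Eq.(1.27)
lam_cum = np.cumsum(np.sort(lam)[::-1])
mu_cum = np.cumsum(np.sort(mu)[::-1])
```

Rows and columns of `m` sum to one, the partial sums satisfy `lam_cum <= mu_cum` entrywise, and `np.sum(lam**2) <= np.sum(mu**2)` holds as Eq.(1.29) predicts for $f(x) = x^2$.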
When applied to density operators, this gives:

Corollary 1.21. Let ρ1 , ρ2 ∈ B1 (H) be two density matrices. Then ρ1 ≺ ρ2 iff
for all continuous convex functions f : [0, 1] → R with f (0) = 0: tr [f (ρ1 )] ≤
tr [f (ρ2 )].
Consequently, majorization is a meaningful way of saying that one density
operator is more mixed than another. Note in particular that $\rho \prec |\psi\rangle\langle\psi|$ and
for $d$-dimensional quantum systems $\mathbb{1}/d \prec \rho$ holds for any density operator $\rho$.
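Corollary 1.22. Let $\{e_k\} \subset \mathcal{H}$ be an orthonormal basis, $\rho \in \mathcal{B}_1(\mathcal{H})$ a density
operator with eigenvalues $(\lambda_k)$ and $p_k := \langle e_k, \rho e_k \rangle$. Then
\[
\lambda \succ p. \tag{1.30}
\]
Proof. Inserting the spectral decomposition $\rho = \sum_i \lambda_i |\psi_i\rangle\langle\psi_i|$ into $p_k = \langle e_k, \rho e_k \rangle$,
we obtain $p = M\lambda$ with $M_{ki} := |\langle e_k, \psi_i \rangle|^2$. Since we can express $\psi_i = U e_i$ for
a suitable unitary $U$, we get that $M$ is a unistochastic matrix, so that by
Thm.1.20 $\lambda \succ p$.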
In the context of Example 1.8, this result implies that among all the decom-
positions into pure states to which Eq.(1.26) gives rise, the spectral decomposi-
tion is the least mixed.
Convex functionals In this paragraph we have a closer look at convex func-

tionals that are defined on sets of Hermitian operators and constructed from
convex functions in a single real variable by means of functional calculus (cf. p.
12).
Theorem 1.23. Let $f : [a, b] \subset \mathbb{R} \to \mathbb{R}$ be a continuous convex function and
$A \in \mathcal{B}_\infty(\mathcal{H})$ Hermitian with $\mathrm{spec}(A) \subseteq [a, b]$. Then, for every unit vector $\psi \in \mathcal{H}$:
\[
f\big(\langle \psi, A\psi \rangle\big) \le \langle \psi, f(A)\psi \rangle. \tag{1.31}
\]
Proof. First observe that $c := \langle \psi, A\psi \rangle \in [a, b]$ since $a\mathbb{1} \le A \le b\mathbb{1}$. Assume for
the moment that $c \in (a, b)$. By convexity of $f$ we can find an affine function
$l : [a, b] \to \mathbb{R}$ such that $f \ge l$ and $f(c) = l(c)$. Then $f(A) \ge l(A)$ and therefore
\[
\langle \psi, f(A)\psi \rangle \ge \langle \psi, l(A)\psi \rangle = l(c) = f(c) = f\big(\langle \psi, A\psi \rangle\big),
\]
where we have used that $l(A) = \alpha\mathbb{1} + \beta A$ if $l$ is of the form $l(x) = \alpha + \beta x$.
It remains to discuss the case $c \in \{a, b\}$. In this case, an affine function with
the stated properties might not exist if $f$ is 'infinitely steep' at the boundary.
However, for any $\epsilon > 0$ we can still find an affine function $l$ with $l \le f$ so that
$f(c) - l(c) \le \epsilon$. Following the same argument and using that we can choose any
$\epsilon > 0$ then completes the proof.
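Inequality (1.31) can be tested on a random Hermitian matrix, with $f(A)$ computed through the spectral decomposition. This is a minimal sketch, assuming NumPy; the choice $f = \exp$ and the random matrix and vector are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4

h = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
a = (h + h.conj().T) / 2                  # Hermitian matrix A
w, v = np.linalg.eigh(a)

f = np.exp                                # a continuous convex function (illustrative)
f_a = (v * f(w)) @ v.conj().T             # f(A) via functional calculus

psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)                # unit vector

lhs = f(np.vdot(psi, a @ psi).real)       # f(<psi, A psi>)
rhs = np.vdot(psi, f_a @ psi).real        # <psi, f(A) psi>
```

As predicted by Theorem 1.23, `lhs` does not exceed `rhs`.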
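Corollary 1.24 (Convex trace functions). Let $f : [0, 1] \to \mathbb{R}_+$ be convex,
continuous and so that $f(0) = 0$. Define $C(\mathcal{H}) := \{A \in \mathcal{B}_\infty(\mathcal{H}) \,|\, 0 \le A \le \mathbb{1}\}$ and
$F : C(\mathcal{H}) \to \mathbb{R}$, $F(A) := \mathrm{tr}[f(A)] \in [0, \infty]$. Then:
(i) $F$ is convex on $C(\mathcal{H})$.
(ii) For any $A \in C(\mathcal{H})$ and any orthonormal basis $\{e_k\}$ of $\mathcal{H}$:
\[
F(A) \ge \sum_k f\big(\langle e_k, A e_k \rangle\big). \tag{1.32}
\]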
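Proof. (i) Let $A_\lambda := \lambda A_1 + (1-\lambda) A_0$ be a convex combination of $A_0, A_1 \in C(\mathcal{H})$
and $\{\psi_k\} \subset \mathcal{H}$ an orthonormal basis of eigenvectors of $A_\lambda$. Then
\[
\begin{aligned}
\lambda F(A_1) + (1-\lambda) F(A_0) &= \lambda \sum_k \langle \psi_k, f(A_1)\psi_k \rangle + (1-\lambda) \sum_k \langle \psi_k, f(A_0)\psi_k \rangle \\
&\ge \lambda \sum_k f\big(\langle \psi_k, A_1\psi_k \rangle\big) + (1-\lambda) \sum_k f\big(\langle \psi_k, A_0\psi_k \rangle\big) \\
&\ge \sum_k f\big(\langle \psi_k, A_\lambda\psi_k \rangle\big) = F(A_\lambda).
\end{aligned}
\]
Here, the first inequality is due to Eq.(1.31), the second inequality uses convexity
of $f$ and the last step uses that $(\langle \psi_k, A_\lambda\psi_k \rangle)$ is the sequence of eigenvalues of
$A_\lambda$.
(ii) follows from Eq.(1.31) applied to each term in $F(A) = \sum_k \langle e_k, f(A)e_k \rangle$.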
The following useful observation also allows one to lift inequalities of scalar
functions to inequalities of functions of operators under the trace:
Lemma 1.25. Let $I \subseteq \mathbb{R}$ be an open interval. If $f_i, g_i : I \to \mathbb{R}$ and $\alpha_i \in \mathbb{R}$ for
$i \in \{1, \ldots, n\}$ satisfy
\[
\sum_{i=1}^{n} \alpha_i f_i(a) g_i(b) \ge 0 \qquad \forall a, b \in I,
\]
then
\[
\sum_{i=1}^{n} \alpha_i\,\mathrm{tr}[f_i(A) g_i(B)] \ge 0 \tag{1.33}
\]
holds for all Hermitian $A, B \in \mathcal{B}(\mathbb{C}^d)$ whose spectra are contained in $I$.
Proof. Inserting spectral decompositions $A = \sum_k \lambda_k |e_k\rangle\langle e_k|$ and $B = \sum_l \mu_l |f_l\rangle\langle f_l|$
we obtain
\[
\sum_{i=1}^{n} \alpha_i\,\mathrm{tr}[f_i(A) g_i(B)] = \sum_{k,l} |\langle e_k, f_l \rangle|^2 \sum_{i=1}^{n} \alpha_i f_i(\lambda_k) g_i(\mu_l) \ge 0.
\]
Corollary 1.26 (Klein inequalities). Let $I \subseteq \mathbb{R}$ be an open interval, $A, B \in \mathcal{B}(\mathbb{C}^d)$
Hermitian with spectra in $I$ and $f : I \to \mathbb{R}$ convex and differentiable.
Then
\[
\mathrm{tr}[f(A) - f(B)] \ge \mathrm{tr}[(A - B) f'(B)]. \tag{1.34}
\]
If $f$ is twice differentiable and strongly convex, i.e. $\inf_{x \in I} f''(x) =: c > 0$, then
\[
\mathrm{tr}[f(A) - f(B)] - \mathrm{tr}[(A - B) f'(B)] \ge \frac{c}{2}\,\|A - B\|_2^2 . \tag{1.35}
\]
Proof. Both inequalities exploit Lemma 1.25. Eq.(1.34) then follows from the
fact that any convex function satisfies $f(a) - f(b) \ge (a - b) f'(b)$ and Eq.(1.35)
uses the mean-value version of Taylor's theorem, which states that there is a
$z \in [a, b]$ such that
\[
f(b) = f(a) + (b - a) f'(a) + \frac{1}{2}(b - a)^2 f''(z).
\]
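Both Klein inequalities can be probed numerically. The sketch below assumes NumPy; the helper `funm_h` (function of a Hermitian matrix via `eigh`), the random positive matrices and the choices $f(x) = x \log x$ for Eq.(1.34) and $f(x) = x^2$ (where $c = 2$ and Eq.(1.35) becomes an equality) are illustrative.

```python
import numpy as np

def funm_h(a, func):
    """Apply a real function to a Hermitian matrix via its spectral decomposition."""
    w, v = np.linalg.eigh(a)
    return (v * func(w)) @ v.conj().T

def random_positive(d, rng):
    """Random strictly positive matrix, so that its spectrum lies in (0, inf)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return g @ g.conj().T + 0.1 * np.eye(d)

rng = np.random.default_rng(4)
d = 3
a, b = random_positive(d, rng), random_positive(d, rng)

# Eq.(1.34) for f(x) = x log x with f'(x) = log x + 1
f = lambda x: x * np.log(x)
df = lambda x: np.log(x) + 1.0
lhs = np.trace(funm_h(a, f) - funm_h(b, f)).real
rhs = np.trace((a - b) @ funm_h(b, df)).real

# Eq.(1.35) for f(x) = x^2: f'' = 2 everywhere, hence c = 2
sq_gap = np.trace(a @ a - b @ b).real - np.trace((a - b) @ (2 * b)).real
hs_norm_sq = np.linalg.norm(a - b, 'fro')**2   # ||A - B||_2^2
```

One finds `lhs >= rhs`, and for the quadratic function `sq_gap` coincides with `hs_norm_sq`, i.e. with $\frac{c}{2}\|A - B\|_2^2$ for $c = 2$.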
Entropy An important example of a convex trace function is the negative

entropy. Its classical manifestations are ubiquitous in information theory, sta-
tistical physics, probability theory and thermodynamics.
Definition 1.27 (Entropy). The von Neumann entropy (short entropy) of

a density operator ρ ∈ B1 (H) is defined as S(ρ) := tr [h(ρ)], where h(x) :=
−x log x with h(0) := 0.
Depending on the field, different bases of the logarithm are used: the natural
choice in information theory is log2 , whereas in thermodynamics and statistical
physics the natural logarithm ln is used.
On the relevant interval [0, 1] the function h is non-negative, continuous and
concave. By Cor.1.24 (i) this implies that the von Neumann entropy S is a non-
negative, concave functional on the set of density operators. From Cor. 1.21 we
get
ρ1 ≺ ρ2 ⇒ S(ρ1 ) ≥ S(ρ2 ).
For finite dimensional Hilbert spaces the von Neumann entropy is continuous,
which is implied by continuity of the eigenvalues. In infinite dimensions continuity
has to be relaxed to lower semicontinuity. This means $\liminf_{\rho \to \rho_0} S(\rho) \ge S(\rho_0)$
(cf. Example 1.11 and Exercise 1.21).
Since $h(x) = 0$ iff $x \in \{0, 1\}$ we get that $S(\rho) = 0$ iff $\rho$ is pure. On $\mathbb{C}^d$ the
maximum $S(\rho) = \log d$ is attained iff $\rho = \mathbb{1}/d$ is maximally mixed. The infinite
dimensional case is elucidated by the following example:

Example 1.11 (Infinite entropy). Consider a sequence $p_n := c/\big(n (\log n)^\gamma\big)$ for
$n > 2$, $\gamma \in (1, 2)$ and $c$ a positive constant to be chosen shortly. From
$\int 1/\big(x (\log x)^\gamma\big)\,dx = (\log x)^{1-\gamma}/(1-\gamma)$ it follows that $p \in \ell^1(\mathbb{N})$ so that we
can choose $c$ in a way that $\sum_n p_n = 1$. However, $-\sum_n p_n \log p_n = \infty$ due
to the divergence of the integral $\int 1/\big(x (\log x)^{\gamma-1}\big)\,dx$. Hence, if $\sigma$ is a density
operator with eigenvalues $(p_n)$, then $S(\sigma) = \infty$. Moreover, if $\rho$ is any density
operator, then $S\big((1-\epsilon)\rho + \epsilon\sigma\big) \ge (1-\epsilon) S(\rho) + \epsilon S(\sigma) = \infty$ for any $\epsilon > 0$.
Consequently, on an infinite dimensional Hilbert space, the density operators
with infinite entropy are trace-norm dense in the set of all density operators.
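Returning to finite dimensions, the properties of $S$ stated after Definition 1.27 are easy to verify numerically. This is a minimal sketch, assuming NumPy; the function name `von_neumann_entropy`, the cutoff for zero eigenvalues and the base-2 logarithm are illustrative choices.

```python
import numpy as np

def von_neumann_entropy(rho, base=2.0):
    """S(rho) = tr[h(rho)] with h(x) = -x log x and h(0) := 0."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-15]                      # zero eigenvalues do not contribute
    return float(-np.sum(w * np.log(w)) / np.log(base))

d = 4
pure = np.zeros((d, d))
pure[0, 0] = 1.0                          # pure state |0><0|
mixed = np.eye(d) / d                     # maximally mixed state
mix = 0.5 * pure + 0.5 * mixed            # convex combination

s_pure = von_neumann_entropy(pure)
s_mixed = von_neumann_entropy(mixed)
s_mix = von_neumann_entropy(mix)
```

The pure state has zero entropy, the maximally mixed state attains $\log_2 d$, and concavity shows up as `s_mix >= 0.5 * s_pure + 0.5 * s_mixed`.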
Exercise 1.18. Show that pure states are extreme points of the convex set of density
operators.
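Exercise 1.19. Let $\rho_1, \rho_2 \in \mathcal{B}(\mathbb{C}^d)$ be two density operators. Prove that $\rho_1 \prec \rho_2$ iff
there exist a finite set of unitaries $U_i \in \mathcal{B}(\mathbb{C}^d)$ and corresponding probabilities $p_i > 0$,
$\sum_i p_i = 1$ so that $\rho_1 = \sum_i p_i U_i \rho_2 U_i^*$.
Exercise 1.20. Denote by $\mathcal{U}_n$ all maps from $\mathcal{B}(\mathbb{C}^d)$ to itself that are of the form
$\mathcal{B}(\mathbb{C}^d) \ni \rho \mapsto \sum_{i=1}^{n} p_i U_i \rho U_i^*$, for some $p_i \ge 0$, $\sum_{i=1}^{n} p_i = 1$ and unitaries $U_i \in \mathcal{B}(\mathbb{C}^d)$.
Determine an $m \in \mathbb{N}$ (as a function of $d$) such that $\mathcal{U}_m = \bigcup_{n \in \mathbb{N}} \mathcal{U}_n$.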
Exercise 1.21. Construct a sequence of density operators of finite entropy that con-
verges in trace-norm to a pure state but has entropy diverging to ∞.
1.5 Composite systems and tensor products

For all kinds of mathematical spaces there are three basic ways of constructing
new spaces from old ones: quotients, sums and products. In the case of Hilbert
spaces, we have essentially discussed quotients already since the quotient of a
Hilbert space H by a subspace V can be identified with the orthogonal comple-
ment V ⊥ in H. In this section, we will have a closer look at the two remaining
constructions: direct sums and, in particular, tensor products.
Direct sums We begin with the simpler construction:
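Definition 1.28 (Direct sum). Let $\mathcal{H}_1$ and $\mathcal{H}_2$ be Hilbert spaces. Their direct
sum is the Hilbert space $\mathcal{H}_1 \oplus \mathcal{H}_2 := \{(\psi, \varphi) \in \mathcal{H}_1 \times \mathcal{H}_2\}$ with inner product
$\langle (\psi_1, \varphi_1), (\psi_2, \varphi_2) \rangle := \langle \psi_1, \psi_2 \rangle + \langle \varphi_1, \varphi_2 \rangle$.
Instead of $(\psi, \varphi)$ we also write $\psi \oplus \varphi$ for the elements of $\mathcal{H}_1 \oplus \mathcal{H}_2$.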
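This construction leads to a Hilbert space of dimension $\dim(\mathcal{H}_1 \oplus \mathcal{H}_2) = \dim(\mathcal{H}_1) + \dim(\mathcal{H}_2)$.
$\mathcal{H}_1$ and $\mathcal{H}_2$ can be regarded as embedded mutually orthogonal
subspaces $\mathcal{H}_1 \oplus 0$ and $0 \oplus \mathcal{H}_2$ of $\mathcal{H}_1 \oplus \mathcal{H}_2$. For a finite number of
Hilbert spaces the definition of $\bigoplus_n \mathcal{H}_n$ extends immediately and it is associative,
i.e. $(\mathcal{H}_1 \oplus \mathcal{H}_2) \oplus \mathcal{H}_3 = \mathcal{H}_1 \oplus (\mathcal{H}_2 \oplus \mathcal{H}_3)$. For an infinite sequence $(\mathcal{H}_n)_{n \in \mathbb{N}}$ of
Hilbert spaces, one defines the corresponding infinite direct sum Hilbert space
as
\[
\bigoplus_{n \in \mathbb{N}} \mathcal{H}_n := \Big\{ (\varphi_n)_{n \in \mathbb{N}} \;\Big|\; \varphi_n \in \mathcal{H}_n,\ \sum_{n \in \mathbb{N}} \|\varphi_n\|^2 < \infty \Big\},
\]
with inner product $\langle (\varphi_n)_{n \in \mathbb{N}}, (\psi_n)_{n \in \mathbb{N}} \rangle := \sum_{n \in \mathbb{N}} \langle \varphi_n, \psi_n \rangle$.
For $A \in \mathcal{B}(\mathcal{H}_1)$, $B \in \mathcal{B}(\mathcal{H}_2)$ we can define $(A \oplus B) \in \mathcal{B}(\mathcal{H}_1 \oplus \mathcal{H}_2)$ via
$(A \oplus B)(\varphi \oplus \psi) := A\varphi \oplus B\psi$. It is then straightforward to show that
$\|A \oplus B\| = \max\{\|A\|, \|B\|\}$ and that $A, B \ge 0$ implies $A \oplus B \ge 0$. When expressed
as a matrix, $A \oplus B$ simply becomes the block diagonal matrix
$\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}$.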
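The block diagonal form and the two stated properties of $A \oplus B$ can be checked on a small example. This is a sketch assuming NumPy; the matrices `a` and `b` are arbitrary positive examples.

```python
import numpy as np

a = np.array([[2.0, 1.0], [1.0, 2.0]])    # positive operator on H_1 = C^2 (illustrative)
b = np.array([[0.5]])                     # positive operator on H_2 = C^1 (illustrative)

# A ⊕ B as a block diagonal matrix
direct_sum = np.block([
    [a, np.zeros((2, 1))],
    [np.zeros((1, 2)), b],
])

op_norm = lambda x: np.linalg.norm(x, 2)  # operator norm = largest singular value
norm_sum = op_norm(direct_sum)
norm_max = max(op_norm(a), op_norm(b))
min_eig = np.linalg.eigvalsh(direct_sum).min()
```

Here `norm_sum` equals `norm_max`, and the smallest eigenvalue `min_eig` of the direct sum is non-negative.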
Tensor products
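Definition 1.29 (Tensor product Hilbert space). For any pair $\psi_1 \in \mathcal{H}_1$, $\psi_2 \in \mathcal{H}_2$
define a conjugate-bilinear functional $\psi_1 \otimes \psi_2 : \mathcal{H}_1 \times \mathcal{H}_2 \to \mathbb{C}$ by
$(\alpha, \beta) \mapsto \langle \alpha, \psi_1 \rangle \langle \beta, \psi_2 \rangle$. The algebraic tensor product of $\mathcal{H}_1$ and $\mathcal{H}_2$ is defined as the
space of all finite linear combinations of maps of the form $\psi_1 \otimes \psi_2$. The tensor
product Hilbert space $\mathcal{H}_1 \otimes \mathcal{H}_2$ of $\mathcal{H}_1$ and $\mathcal{H}_2$ is defined as the completion of
the algebraic tensor product w.r.t. the inner product
\[
\langle \varphi_1 \otimes \varphi_2, \psi_1 \otimes \psi_2 \rangle := \langle \varphi_1, \psi_1 \rangle \langle \varphi_2, \psi_2 \rangle, \tag{1.36}
\]
extended by linearity and continuity to the whole space.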

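If several Hilbert spaces are combined via the tensor product or direct sum
construction, then the following Hilbert space isomorphisms hold:
\[
\begin{aligned}
\mathcal{H}_1 \otimes \mathcal{H}_2 &\simeq \mathcal{H}_2 \otimes \mathcal{H}_1, \\
(\mathcal{H}_1 \otimes \mathcal{H}_2) \otimes \mathcal{H}_3 &\simeq \mathcal{H}_1 \otimes (\mathcal{H}_2 \otimes \mathcal{H}_3), \\
\mathcal{H}_1 \otimes (\mathcal{H}_2 \oplus \mathcal{H}_3) &\simeq (\mathcal{H}_1 \otimes \mathcal{H}_2) \oplus (\mathcal{H}_1 \otimes \mathcal{H}_3).
\end{aligned} \tag{1.37}
\]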
It should be noted that the concrete construction of $\mathcal{H}_1 \otimes \mathcal{H}_2$, which appears
in terms of conjugate-bilinear maps in the above definition, is usually not used.
What is used a lot, however, are the resulting properties. In particular linearity:
\[
\Big( \sum_{i=1}^{k} \psi_i \Big) \otimes \Big( \sum_{j=1}^{l} \varphi_j \Big) = \sum_{i=1}^{k} \sum_{j=1}^{l} \psi_i \otimes \varphi_j, \tag{1.38}
\]
\[
(c\psi) \otimes \varphi = c(\psi \otimes \varphi) = \psi \otimes (c\varphi), \quad \text{for } c \in \mathbb{C}. \tag{1.39}
\]
The constructed Hilbert space has $\dim(\mathcal{H}_1 \otimes \mathcal{H}_2) = \dim(\mathcal{H}_1)\dim(\mathcal{H}_2)$. In fact,
every pair of orthonormal bases $\{e_k\} \subset \mathcal{H}_1$, $\{f_l\} \subset \mathcal{H}_2$ gives rise to an
orthonormal basis $\{e_k \otimes f_l\} \subset \mathcal{H}_1 \otimes \mathcal{H}_2$. Such a basis is called a product basis
as all its elements are simple products. Expanding an element $\Psi \in \mathcal{H}_1 \otimes \mathcal{H}_2$
in this basis leads to $\Psi = \sum_{k,l} \Psi_{k,l}\, e_k \otimes f_l$, where $\Psi_{k,l} := \langle e_k \otimes f_l, \Psi \rangle$ satisfies
$\|\Psi\|^2 = \sum_{k,l} |\Psi_{k,l}|^2$ by Parseval's identity. The right hand side of this identity
looks like the square of the Hilbert-Schmidt-norm of the 'matrix' $(\Psi_{k,l})$. Hence,
the expansion suggests an isomorphism between elements of the tensor product
Hilbert space and elements of the space of Hilbert-Schmidt class operators. This
is formalized in the following:
Theorem 1.30 (Hilbert-Schmidt isomorphism). The tensor product Hilbert
space H1 ⊗ H2 is isomorphic to the space of Hilbert-Schmidt-class operators
B2 (H1 , H2 ). That is, there is a linear bijection I : H1 ⊗ H2 → B2 (H1 , H2 ) so
that for all Ψ, Φ ∈ H1 ⊗ H2 :
⟨Φ, Ψ⟩ = tr [I(Φ)∗ I(Ψ)] . (1.40)
Proof. We could simply argue that the respective orthonormal bases have the
same cardinality and thus there has to be an isomorphism. For later use,
however, we follow a more explicit route. For that, it is convenient to introduce
the complex conjugate ψ̄ := Σ_k ⟨ψ, ek⟩ ek of an arbitrary element ψ ∈ H1 w.r.t.
a fixed orthonormal basis {ek } ⊂ H1 . Note that the operation ψ ↦ ψ̄ is an
involution that preserves the norm as well as orthogonality. Now we define
I : |ψ⟩ ⊗ |ϕ⟩ ↦ |ϕ⟩⟨ψ̄| (1.41)
and extend it by linearity and continuity to the entire space. Then I is the
sought Hilbert space isomorphism since it is a bijection between orthonormal
bases: a product basis |ek⟩ ⊗ |fl⟩ of H1 ⊗ H2 and a basis of rank-one operators
|fl⟩⟨ek | of B2 (H1 , H2 ).
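In finite dimensions the isomorphism of Eq.(1.41) amounts to reshaping a coefficient vector into a matrix. The following NumPy sketch illustrates this under some assumptions that are not part of the notes: H1 = C^3 and H2 = C^4 with their standard bases, np.kron(psi, phi) as the representation of ψ ⊗ ϕ, and a helper that we call hs_iso purely for illustration.

```python
import numpy as np

def hs_iso(Psi, d1, d2):
    """Illustrative helper: send a vector in C^d1 (x) C^d2 to a d2 x d1 matrix
    such that e_k (x) f_l is mapped to the matrix unit |f_l><e_k|."""
    return Psi.reshape(d1, d2).T

rng = np.random.default_rng(0)
d1, d2 = 3, 4
Psi = rng.normal(size=d1 * d2) + 1j * rng.normal(size=d1 * d2)
Phi = rng.normal(size=d1 * d2) + 1j * rng.normal(size=d1 * d2)

# Eq. (1.40): <Phi, Psi> = tr[I(Phi)^* I(Psi)]
inner_vec = np.vdot(Phi, Psi)
inner_hs = np.trace(hs_iso(Phi, d1, d2).conj().T @ hs_iso(Psi, d1, d2))

# Eq. (1.41) on a product vector: the image has entries phi_l * psi_k
psi = rng.normal(size=d1) + 1j * rng.normal(size=d1)
phi = rng.normal(size=d2) + 1j * rng.normal(size=d2)
image = hs_iso(np.kron(psi, phi), d1, d2)
expected = np.outer(phi, psi)
```

Note that the map is linear in both ψ and ϕ, which is why no complex conjugation appears in the reshaping.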
An important application of this isomorphism is a normal-form for elements
of a tensor product Hilbert space:
Theorem 1.31 (Schmidt decomposition for tensor products). For every Ψ ∈
H1 ⊗ H2 there is an r ∈ N ∪ {∞}, a sequence of strictly positive numbers (s_i)_{i=1}^r
and orthonormal bases {ek } ⊂ H1 , {fl } ⊂ H2 such that
\[ \Psi = \sum_{i=1}^{r} s_i\, e_i \otimes f_i. \tag{1.42} \]
Moreover, the si ’s (called Schmidt coefficients) are as a multiset uniquely
determined by Ψ and satisfy Σ_{i=1}^r s_i² = ‖Ψ‖².
Proof. We exploit the isomorphism from Thm.1.30 together with the fact that
I(Ψ) is a compact operator for which there is a Schmidt decomposition I(Ψ) =
Σ_i s_i |f_i⟩⟨ē_i|. Applying the inverse I⁻¹ and using Eq.(1.41) then proves the
decomposition in Eq.(1.42). Uniqueness of the si ’s follows from the uniqueness
of the multiset of singular values of compact operators and Σ_{i=1}^r s_i² = ‖Ψ‖² is
an application of Parseval’s identity.
Since the Schmidt coefficients are uniquely determined by Ψ, the same is
true for their number r, which is called the Schmidt rank of Ψ. Obviously,
r ≤ min{dim(H1 ), dim(H2 )} and r = 1 iff Ψ is a simple tensor, i.e. of the form
Ψ = ϕ1 ⊗ ϕ2 for some ϕi ∈ Hi .
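Numerically, the Schmidt decomposition is a singular value decomposition of the coefficient matrix (Ψk,l ). The sketch below assumes finite dimensions, standard bases and np.kron as the tensor product of vectors; the function name schmidt and the tolerance are illustrative choices.

```python
import numpy as np

def schmidt(Psi, d1, d2, tol=1e-12):
    """Return Schmidt coefficients and vectors of Psi in C^d1 (x) C^d2 via an SVD."""
    U, s, Vh = np.linalg.svd(Psi.reshape(d1, d2), full_matrices=False)
    keep = s > tol                      # discard numerically vanishing coefficients
    return s[keep], U[:, keep], Vh[keep, :]

# Bell-type vector (|00> + |11>)/sqrt(2): Schmidt rank 2, coefficients 1/sqrt(2)
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
s_bell, _, _ = schmidt(bell, 2, 2)

# A simple tensor has Schmidt rank 1
prod = np.kron(np.array([1, 1j]) / np.sqrt(2), np.array([0.6, 0.8]))
s_prod, _, _ = schmidt(prod, 2, 2)

# Reconstruct a random vector from its decomposition, Eq. (1.42)
rng = np.random.default_rng(1)
Psi = rng.normal(size=6) + 1j * rng.normal(size=6)
s, E, F = schmidt(Psi, 2, 3)
rebuilt = sum(s[i] * np.kron(E[:, i], F[i, :]) for i in range(len(s)))
```

The columns of E and the rows of F play the roles of the orthonormal families {ei } and {fi } in Eq.(1.42).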
Example 1.12 (Maximally entangled states). A pure state represented by a unit
vector Ψ ∈ C^d ⊗ C^d is called a d-dimensional maximally entangled state if all
its Schmidt coefficients are equal to 1/√d (and thus r = d). The isomorphism
in Thm.1.30 then yields a bijection between the set of d-dimensional maximally
entangled states and the projective unitary group PU(d) (i.e., the quotient of U(d)
by U(1), which corresponds to the phases that lead to equivalent states). In
particular, the Hilbert-Schmidt-orthogonal basis of unitaries from Eq.(1.11) then
leads to an orthonormal basis Ψk,l := I⁻¹(Uk,l )/√d in C^d ⊗ C^d that consists of
d² maximally entangled states.
Before we discuss further properties of the Hilbert-Schmidt isomorphism, we
need to introduce the tensor product of operators. For Ai ∈ B(Hi ) one defines
the tensor product A1 ⊗A2 as an operator on H1 ⊗H2 via (A1 ⊗A2 )(ψ1 ⊗ψ2 ) :=
(A1 ψ1 ) ⊗ (A2 ψ2 ) and its extension by linearity. Then (A1 ⊗ A2 )∗ = A∗1 ⊗ A∗2
and if Bi ∈ B(Hi ) then
(A1 ⊗ A2 )(B1 ⊗ B2 ) = (A1 B1 ) ⊗ (A2 B2 ). (1.43)
The tensor product can be shown to preserve properties like unitarity,
positivity, Hermiticity, normality, boundedness, compactness, trace-class or
Hilbert-Schmidt-class. That is, if both A1 and A2 have one of these properties, then
so does A1 ⊗ A2 . More specifically, ‖A1 ⊗ A2‖p = ‖A1‖p ‖A2‖p holds for all
p ∈ [1, ∞] and if A1 , A2 are trace-class, then tr [A1 ⊗ A2 ] = tr [A1 ] tr [A2 ].
A useful representation of the tensor product in the finite dimensional case
is the Kronecker product of matrices: if A and B are finite matrices, then A ⊗ B
can be represented as a block matrix
\[ A \otimes B = \begin{pmatrix} A_{11}B & A_{12}B & \cdots \\ A_{21}B & A_{22}B & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}. \]
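The Kronecker product is available in NumPy as np.kron, so the block structure and the rules stated above can be checked directly. The following is a small sketch with randomly chosen real matrices; the sizes 2 and 3 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
A1, B1 = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
A2, B2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))

K = np.kron(A1, A2)                 # block matrix with blocks A1[i, j] * A2
top_left_block = K[:3, :3]          # should equal A1[0, 0] * A2

mixed_lhs = np.kron(A1, A2) @ np.kron(B1, B2)   # Eq. (1.43), left-hand side
mixed_rhs = np.kron(A1 @ B1, A2 @ B2)           # Eq. (1.43), right-hand side

trace_K = np.trace(K)               # multiplicativity of the trace
opnorm_K = np.linalg.norm(K, 2)     # operator norm (largest singular value)
```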
Now, let us have a closer look at properties of the particular Hilbert-Schmidt

isomorphism that we used in the proof of Thm.1.30 and see how it treats tensor
products of operators:
Corollary 1.32. Let I : H1 ⊗ H2 → B2 (H1 , H2 ) be the Hilbert-Schmidt iso-

morphism constructed via Eq.(1.41), and consider any Ψ ∈ H1 ⊗ H2 .
(i) For any A ∈ B(H1 ), B ∈ B(H2 ) we have I : (A ⊗ B)Ψ ↦ B I(Ψ) A^T ,
where A^T is the transpose of A in the basis used to define I.
(ii) If H1 ≃ H2 ≃ C^d and I(Ψ) is invertible, then for any A ∈ B(H1 ) there is a
B ∈ B(H2 ), which can be obtained from A via a similarity transformation,
so that
(A ⊗ 1)Ψ = (1 ⊗ B)Ψ. (1.44)
If Ψ is maximally entangled, then B has the same singular values as A.
In particular, if I(Ψ) = 1/√d, then B = A^T .
Proof. (i) follows from the defining equation of the isomorphism, Eq.(1.41), via
(A ⊗ B)|ψ⟩ ⊗ |ϕ⟩ = |Aψ⟩ ⊗ |Bϕ⟩ ↦ |Bϕ⟩⟨\overline{Aψ}| = B|ϕ⟩⟨ψ̄|A^T .
Eq.(1.44) in (ii) follows from (i) by setting B := I(Ψ)A^T I(Ψ)⁻¹ . Since A is
similar to A^T , B is similar to A. If in addition Ψ is maximally entangled, then
√d I(Ψ) is a unitary, so that the claim follows by inserting the singular value
decomposition of A.
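Eq.(1.44) is easy to test numerically. The sketch below assumes d = 3, standard bases, and the matrix representation I(Ψ) = (Ψk,l )^T of the isomorphism in that basis; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 3
I_d = np.eye(d)
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

# Maximally entangled vector sum_k e_k (x) e_k / sqrt(d); here I(Omega) = 1/sqrt(d)
Omega = I_d.reshape(d * d) / np.sqrt(d)
lhs_me = np.kron(A, I_d) @ Omega        # (A (x) 1) Omega
rhs_me = np.kron(I_d, A.T) @ Omega      # (1 (x) A^T) Omega

# Generic vector with invertible I(Psi): B = I(Psi) A^T I(Psi)^{-1}
Psi = rng.normal(size=d * d) + 1j * rng.normal(size=d * d)
X = Psi.reshape(d, d).T                 # I(Psi) in the standard basis
B = X @ A.T @ np.linalg.inv(X)
lhs = np.kron(A, I_d) @ Psi             # (A (x) 1) Psi
rhs = np.kron(I_d, B) @ Psi             # (1 (x) B) Psi
```

For the maximally entangled vector the operator simply moves to the other tensor factor as its transpose, which is the form used in teleportation-type arguments.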
Eq.(1.44), especially for maximally entangled Ψ, will play a crucial role in

applications such as quantum teleportation or quantum super-dense coding.
Let us finally have a closer look at tensor products of more than two spaces
and start with some popular examples:
Example 1.13 (GHZ and W-states). As a shorthand for ek ⊗fl ⊗gm , where k, l, m
each label elements of an orthonormal basis, it is sometimes convenient to write
|k l m⟩. Using this notation, two prominent examples of states in C² ⊗ C² ⊗ C²
are the Greenberger-Horne-Zeilinger (GHZ) state (|000⟩ + |111⟩)/√2 and the
W-state (|100⟩ + |010⟩ + |001⟩)/√3.
Definition 1.33 (Tensor rank). The tensor rank of an element Ψ ∈ H1 ⊗ . . . ⊗
Hm is defined as
\[ R(\Psi) := \min\Big\{ r\in\mathbb{N} \;\Big|\; \Psi = \sum_{i=1}^{r} \psi_i^{(1)}\otimes\cdots\otimes\psi_i^{(m)},\ \psi_i^{(k)}\in\mathcal{H}_k \Big\}. \]
The case m = 2 turns out to be significantly simpler and more well-behaved

than m > 2. For instance:
Proposition 1.34. Let H := H1 ⊗ . . . ⊗ Hm be a tensor product of spaces that
satisfy 2 ≤ dim(Hi ) < ∞ and let R : H → N be the tensor rank. For m = 2
the tensor rank is lower semi-continuous on H and R(Ψ) equals the Schmidt
rank of Ψ. For m ≥ 3 there are converging sequences Ψn → Ψ for n → ∞ with
R(Ψn ) < R(Ψ).
Proof. For m = 2 we can exploit the Hilbert-Schmidt isomorphism from Thm.1.30,
which then relates the tensor rank of Ψ to the rank of the operator I(Ψ). The
latter is equal to the Schmidt rank and known to be lower semi-continuous. One
way of showing that the rank of a matrix is lower semi-continuous is to argue
that the rank of a matrix is at most k iff all (k + 1) × (k + 1) minors vanish. As
the zero-set of a finite number of polynomials, this forms a closed set so that
Ψn → Ψ implies lim inf R(Ψn ) ≥ R(Ψ).
For m > 2 consider the simplest case H = C2 ⊗ C2 ⊗ C2 , which can be
embedded into all larger spaces. Denote by e, f ∈ C2 two orthogonal unit
vectors. The unnormalized W-state Ψ = e ⊗ e ⊗ f + e ⊗ f ⊗ e + f ⊗ e ⊗ e can
be shown to have tensor rank three. However, it can be obtained as a limit of
\[ \Psi_n = n\Big(e + \tfrac{1}{n} f\Big)\otimes\Big(e + \tfrac{1}{n} f\Big)\otimes\Big(e + \tfrac{1}{n} f\Big) - n\, e\otimes e\otimes e. \tag{1.45} \]
Consequently, for m > 2 the set {Ψ ∈ H|R(Ψ) ≤ k} is not closed in general.
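The convergence Ψn → Ψ in Eq.(1.45) can be observed numerically. The sketch below assumes e and f are the standard basis vectors of C² and uses a helper kron3 (an illustrative name) for threefold tensor products.

```python
import numpy as np

def kron3(a, b, c):
    """Threefold tensor product of vectors."""
    return np.kron(np.kron(a, b), c)

e = np.array([1.0, 0.0])
f = np.array([0.0, 1.0])
W = kron3(e, e, f) + kron3(e, f, e) + kron3(f, e, e)   # unnormalized W-state

def psi_n(n):
    """Eq. (1.45): a sum of two simple tensors, hence tensor rank at most two."""
    g = e + f / n
    return n * kron3(g, g, g) - n * kron3(e, e, e)

# Distance to the W-state for growing n; it decays like sqrt(3)/n
errors = [np.linalg.norm(psi_n(n) - W) for n in (1, 10, 100, 1000)]
```

Expanding the product shows that Ψn − Ψ consists of terms of order 1/n and 1/n², which explains the observed decay.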
Example 1.14 (Matrix-multiplication tensor). Consider H = H1 ⊗H2 ⊗H3 where
all three tensor factors are matrix spaces of the form C^{d×d} . Denoting the
matrix units by (ekl )ij := δk,i δl,j the matrix-multiplication tensor is defined
as T := Σ_{k,l,m=1}^d ekl ⊗ elm ⊗ emk . With its help, the matrix-product of two
matrices A, B ∈ C^{d×d} can be expressed as (AB)αβ = tr [T (B ⊗ A ⊗ eβα )]. If
T has tensor rank R(T ) = r, then there are linear maps a, b : C^{d×d} → C^r and
matrices (C_i)_{i=1}^r ⊂ C^{d×d} so that for every A, B ∈ C^{d×d} we have
\[ (AB)_{\alpha\beta} = \sum_{i=1}^{r} C_{i,\alpha\beta}\, a(A)_i\, b(B)_i. \tag{1.46} \]
This can be seen by inserting the assumed form T = Σ_{i=1}^r ui ⊗ vi ⊗ wi and
taking traces. Eq.(1.46) means that the elements of AB can be obtained as
linear combinations of the r products a(A)i b(B)i . In this way, and by using
recursion, the (so far unknown) tensor rank of T provides an upper bound on
the (so far unknown) complexity of matrix multiplication. Note that naive
matrix multiplication would require d³ products but, as Strassen has observed,
R(T ) < d³ . Specifically, for d = 2 he found a decomposition with seven terms,
i.e. R(T ) ≤ 7, which is in fact optimal.
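For d = 2, a decomposition of the form (1.46) with r = 7 can be written out explicitly. The sketch below uses the standard textbook form of Strassen's seven products; the function name strassen_2x2 is an illustrative choice, and each m_i is one product a(A)i b(B)i of linear functions of A and B.

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using seven scalar products (Strassen's scheme)."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Each entry of AB is a linear combination of the seven products
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4, m1 - m2 + m3 + m6]])

rng = np.random.default_rng(4)
A = rng.normal(size=(2, 2))
B = rng.normal(size=(2, 2))
C = strassen_2x2(A, B)
```

Applied recursively to block matrices, the same scheme multiplies larger matrices with fewer than d³ scalar products.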
Partial trace In classical probability theory, if we have a pair of random

variables with a given joint distribution, then there is a well-defined way of
assigning a marginal distribution to each of the random variables individually. In
the following theorem we construct the quantum analogue of this marginalizing
map. The analogy will then be made clearer in the subsequent paragraph.
Theorem 1.35 (Partial trace). There is a unique map (called partial trace)
tr2 : B1 (H1 ⊗ H2 ) → B1 (H1 ) for which
tr [B(A ⊗ 1)] = tr [tr2 [B]A] , ∀A ∈ B(H1 ), (1.47)
holds for all B ∈ B1 (H1 ⊗ H2 ). Moreover, tr2 is trace-norm continuous and
B≥0 ⇒ tr2 [B] ≥ 0,
B = B1 ⊗ B2 ⇒ tr2 [B] = B1 tr [B2 ] , (1.48)
tr [tr2 [B]] = tr [B] .
Proof. For any unit vector ψ ∈ H2 define a bounded linear map 1 ⊗ ⟨ψ| :
H1 ⊗H2 → H1 via ϕ1 ⊗ϕ2 ↦ ϕ1 ⟨ψ, ϕ2⟩ and extension by linearity and continuity
(which is possible since the map has operator norm one). Choose an orthonormal
basis {ek } ⊂ H2 and consider the ansatz
\[ \mathrm{tr}_2[B] := \sum_k \big(\mathbb{1}\otimes\langle e_k|\big)\, B\, \big(\mathbb{1}\otimes|e_k\rangle\big). \tag{1.49} \]
According to the subsequent Lemma 1.36, the r.h.s. of this equation converges
in trace-norm to a trace-class operator. Hence, tr2 is well-defined and Eq.(1.47)
can be verified by insertion. Uniqueness of the map is implied by the fact
that specifying tr [XA] for all A ∈ B(H1 ) determines X. In particular, the
construction in Eq.(1.49) is basis-independent.
The properties summarized in Eq.(1.48) follow immediately from Eq.(1.47).
For instance, positivity of ⟨ψ, tr2 [B]ψ⟩ = tr [B(|ψ⟩⟨ψ| ⊗ 1)] is implied by
positivity of B together with |ψ⟩⟨ψ| ⊗ 1 ≥ 0 (cf. Exercise 1.6).
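In finite dimensions the sum in Eq.(1.49) is an index contraction. The following sketch assumes operators on C^d1 ⊗ C^d2 stored as (d1·d2) × (d1·d2) matrices in the Kronecker ordering of np.kron; the helper name partial_trace and its keep argument are illustrative.

```python
import numpy as np

def partial_trace(B, d1, d2, keep=1):
    """Partial trace of an operator on C^d1 (x) C^d2 (illustrative helper).
    keep=1 traces out the second factor (tr_2), keep=2 the first (tr_1)."""
    T = B.reshape(d1, d2, d1, d2)       # indices: (row1, row2, col1, col2)
    if keep == 1:
        return np.einsum('ikjk->ij', T)
    return np.einsum('kikj->ij', T)

rng = np.random.default_rng(5)
d1, d2 = 2, 3
B = rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6))
A = rng.normal(size=(d1, d1)) + 1j * rng.normal(size=(d1, d1))

# Defining property, Eq. (1.47)
lhs = np.trace(B @ np.kron(A, np.eye(d2)))
rhs = np.trace(partial_trace(B, d1, d2) @ A)

# Product operators, Eq. (1.48)
B1, B2 = rng.normal(size=(d1, d1)), rng.normal(size=(d2, d2))
reduced_1 = partial_trace(np.kron(B1, B2), d1, d2, keep=1)
reduced_2 = partial_trace(np.kron(B1, B2), d1, d2, keep=2)
```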
Finally, we prove the missing Lemma that shows trace-norm convergence of
the ansatz in Eq.(1.49). For later use, the formulation is slightly more general.
Lemma 1.36. Let (Ak )k∈N ⊂ B(H1 , H2 ) be a sequence of operators for which
lim_{n→∞} Σ_{k=1}^n A∗k Ak = X ∈ B(H1 ) converges weakly. Then for every B ∈
B1 (H1 ) there is a B′ ∈ B1 (H2 ) so that
\[ \Big\| B' - \sum_{k=1}^{n} A_k B A_k^* \Big\|_1 \to 0, \tag{1.50} \]
and tr [B′ ] = tr [BX]. The map B ↦ B′ is linear, it commutes with the adjoint
map (i.e., (B′ )∗ = (B∗ )′ ) and if B = B∗ then ‖B′‖1 ≤ ‖X‖∞ ‖B‖1 .
Proof. By assumption Σ_{k≥n} A∗k Ak converges weakly to zero for n → ∞. This
implies weak-* convergence since we deal with a uniformly bounded subset of
operators. W.l.o.g. we assume that B ≥ 0 as we can always write it as a linear
combination of four positive trace-class operators. Then
\[ \Big\| \sum_{k\ge n} A_k B A_k^* \Big\|_1 = \mathrm{tr}\Big[ B \sum_{k\ge n} A_k^* A_k \Big] \to 0. \]
This implies that Σ_{k=1}^n Ak BA∗k is a Cauchy sequence in B1 (H2 ) and thus
convergent in trace-norm to some element B′ . tr [B′ ] = tr [BX] then follows from the
cyclic properties of the trace together with dominated convergence (or Fubini-
Tonelli).
Linearity of the map B ↦ B′ follows from linearity of B ↦ Ak BA∗k and the
commutation with the adjoint map from (Ak BA∗k )∗ = Ak B∗ A∗k . Finally, assume
that B = B∗ so that we can decompose B = B+ − B− into orthogonal positive
and negative parts. Then ‖B′‖1 ≤ ‖(B+ )′‖1 + ‖(B− )′‖1 = tr [(B+ + B− )X] ≤
‖X‖∞ ‖B‖1 .
By interchanging the labels 1 ↔ 2 and using the isomorphism H1 ⊗ H2 ≃
H2 ⊗ H1 we can define a partial trace tr1 : B1 (H1 ⊗ H2 ) → B1 (H2 ) in complete
analogy to Thm.1.35. The defining equation in this case would be tr [tr1 [B]A] =
tr [B(1 ⊗ A)] imposed for all A ∈ B(H2 ). More generally, for H := H1 ⊗. . .⊗Hn
we can define a partial trace for any non-empty subset Λ ⊆ {1, . . . , n}, which
then equals the composition of all individual partial traces, i.e. trΛ = ∏_{i∈Λ} tri .
Composite and reduced systems Within quantum theory, tensor products

are used to describe composite systems. If a system is composed of distinguish-
able subsystems that are individually assigned to Hilbert spaces H1 and H2 ,
respectively, then the description of the composite system is based on the ten-
sor product Hilbert space H1 ⊗ H2 . Here ‘distinguishable subsystems’ might
refer to spatially separated parts of a system or to different degrees of freedom
of one system, such as the spin and the position of a single electron. In this case,
one would describe the spin within C2 and the position within L2 (R3 ). Hence
C2 ⊗ L2 (R3 ) would be the Hilbert space underlying the description that covers
both degrees of freedom. Aspects of a system that exclude each other, on the
other hand, are reflected by a direct sum. Consider for instance a neutron n,
which can decay into a proton p, an electron e− and an electron-anti-neutrino
ν̄e , i.e. n → p + e− + ν̄e . This would be modeled using Hn ⊕ (Hp ⊗ He− ⊗ Hν̄e ) as
the overall Hilbert space since there is either the neutron or its decay products.
However, if a composite system consisted of a neutron, a proton, an
electron and an electron anti-neutrino, then we would use Hn ⊗Hp ⊗He− ⊗Hν̄e .
Suppose ρ ∈ B1 (H1 ⊗ H2 ) is a density operator that describes the prepara-
tion of a composite system composed of two subsystems. If we disregard say
the second system and consider only the first part, the corresponding density
operator is given by ρ1 := tr2 [ρ]. This is then called a reduced density operator.
Similarly, if we discard the first subsystem, the reduced density operator that
describes the remaining part is ρ2 := tr1 [ρ]. If ρ is a pure state, the reduced
density operators can be read off its Schmidt decomposition:
Corollary 1.37. Let |Ψ⟩⟨Ψ| ∈ B1 (H1 ⊗ H2 ) be a pure density operator with
Schmidt decomposition |Ψ⟩ = Σ_{i=1}^r √λi |ei⟩ ⊗ |fi⟩ with r ∈ N ∪ {∞}. Then its
reduced density operators are given by
\[ \rho_1 = \sum_{i=1}^{r} \lambda_i |e_i\rangle\langle e_i| \quad\text{and}\quad \rho_2 = \sum_{i=1}^{r} \lambda_i |f_i\rangle\langle f_i|. \tag{1.51} \]
Proof. The statement follows from inserting the Schmidt decomposition into the
explicit form of the partial trace in Eq.(1.49). The calculation simplifies if we
use the basis of the Schmidt decomposition in the respective partial trace.
Cor.1.37 leads to some simple but useful observations: the non-zero spectra of the
two reduced density operators coincide as multisets and, more qualitatively, the
rank of each reduced density operator equals the Schmidt rank. In particular,
Ψ is a simple tensor product (r = 1) iff the reduced states are pure.
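These observations can be checked on a random pure state. The sketch below assumes H1 = C² and H2 = C³ with standard bases and computes both partial traces by index contraction; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
d1, d2 = 2, 3
Psi = rng.normal(size=d1 * d2) + 1j * rng.normal(size=d1 * d2)
Psi /= np.linalg.norm(Psi)

rho = np.outer(Psi, Psi.conj()).reshape(d1, d2, d1, d2)
rho1 = np.einsum('ikjk->ij', rho)      # tr_2 |Psi><Psi|
rho2 = np.einsum('kikj->ij', rho)      # tr_1 |Psi><Psi|

# Squared Schmidt coefficients lambda_i, in descending order
schmidt_sq = np.linalg.svd(Psi.reshape(d1, d2), compute_uv=False) ** 2
spec1 = np.sort(np.linalg.eigvalsh(rho1))[::-1]
spec2 = np.sort(np.linalg.eigvalsh(rho2))[::-1]
```

Here rho2 acts on the larger space and therefore carries one additional eigenvalue zero, while its non-zero eigenvalues agree with those of rho1.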
Another simple but useful observation is that the above corollary can be
read in reverse, and we can (at least mathematically) regard every mixed state
as the reduced state of some larger system that is described by a pure state:
Corollary 1.38 (Purification). Let ρ1 ∈ B1 (H1 ) be a density operator of rank
r ∈ N ∪ {∞}. Then there is a Hilbert space H2 of dimension dim(H2 ) = r and
a pure state |Ψ⟩⟨Ψ| ∈ B1 (H1 ⊗ H2 ) so that ρ1 = tr2 [|Ψ⟩⟨Ψ|].
Proof. We start with the spectral decomposition of ρ1 , which we interpret as the
l.h.s. of Eq.(1.51), and construct a pure state Ψ via its Schmidt decomposition
with Schmidt coefficients √λi and the eigenvectors of ρ1 as orthonormal family
on the first tensor factor. Cor. 1.37 then guarantees that we recover ρ1 as the
partial trace of |Ψ⟩⟨Ψ|.
Clearly, such a purification is not unique. Any state vector of the form
(1 ⊗ V )Ψ with V an isometry would also be a working purification.
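The construction in the proof, and the freedom in choosing a purification, can be sketched as follows. The example assumes a rank-two density matrix on C³; the helper names purify and tr2, the tolerance and the specific eigenvalues 0.7 and 0.3 are illustrative choices.

```python
import numpy as np

def purify(rho, tol=1e-12):
    """Illustrative helper: Psi = sum_i sqrt(lambda_i) e_i (x) f_i, with f_i the
    standard basis of C^r and r the rank of rho."""
    lam, vecs = np.linalg.eigh(rho)
    keep = lam > tol
    lam, vecs = lam[keep], vecs[:, keep]
    r = len(lam)
    Psi = sum(np.sqrt(lam[i]) * np.kron(vecs[:, i], np.eye(r)[i]) for i in range(r))
    return Psi, r

def tr2(Psi, d, r):
    """Partial trace of |Psi><Psi| over the second factor C^r."""
    M = Psi.reshape(d, r)
    return M @ M.conj().T

rng = np.random.default_rng(7)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))
u, v = Q[:, 0], Q[:, 1]                                  # orthonormal vectors
rho = 0.7 * np.outer(u, u.conj()) + 0.3 * np.outer(v, v.conj())

Psi, r = purify(rho)

# Another purification: apply a unitary V on the second factor
V, _ = np.linalg.qr(rng.normal(size=(r, r)) + 1j * rng.normal(size=(r, r)))
Psi_alt = np.kron(np.eye(3), V) @ Psi
```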
Let us finally have a closer look at how the machinery of reduced and com-
posite systems works on the side of the measurements. Suppose there are two
independent measurement devices acting on the two parts of a composite sys-
tem, individually described by POVMs M1 and M2 . If Y1 ⊆ X1 and Y2 ⊆ X2 are
corresponding measurable sets of measurement outcomes, then the overall mea-
surement that now has outcomes in X1 × X2 , equipped with the product sigma-
algebra, is described by a POVM that satisfies M (Y1 × Y2 ) = M1 (Y1 ) ⊗ M2 (Y2 ).
Taking disjoint unions and complements (as in Lemma 1.8) this defines M on
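the entire product sigma-algebra. The marginal probabilities are then given by
\[ \begin{aligned} p_1(Y_1) = p(Y_1\times X_2) &= \mathrm{tr}\big[\rho\, M(Y_1\times X_2)\big] = \mathrm{tr}\big[\rho\,\big(M_1(Y_1)\otimes M_2(X_2)\big)\big] \\ &= \mathrm{tr}\big[\rho\,\big(M_1(Y_1)\otimes\mathbb{1}\big)\big] = \mathrm{tr}\big[\rho_1 M_1(Y_1)\big], \end{aligned} \]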
consistent with the definition and interpretation of the reduced density operator
ρ1 = tr2 [ρ].
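If the overall state is described by a simple tensor product ρ = ρ1 ⊗ ρ2 ,
which is then called a product state, we obtain
\[ \begin{aligned} p(Y_1\times Y_2) &= \mathrm{tr}\big[(\rho_1\otimes\rho_2)\big(M_1(Y_1)\otimes M_2(Y_2)\big)\big] = \mathrm{tr}\big[\rho_1 M_1(Y_1)\big]\,\mathrm{tr}\big[\rho_2 M_2(Y_2)\big] \\ &= p_1(Y_1)\, p_2(Y_2). \end{aligned} \]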
This means that the measurement outcomes are independent. In other words,
there are no correlations between the subsystems if the preparation is described
by a product state.
Entropic quantities
Definition 1.39 (Relative entropy & mutual information).
◦ Let ρ, σ ∈ B1 (H) be positive. If ker(ρ) ⊇ ker(σ), the relative entropy
between ρ and σ is defined as S(ρ‖σ) := tr [ρ (log(ρ) − log(σ))] where the
trace is taken in an eigenbasis of ρ. If ker(ρ) ⊉ ker(σ) then S(ρ‖σ) := ∞.
◦ Let ρAB ∈ B1 (HA ⊗ HB ) be a density operator with reduced density
operators ρA := trB [ρAB ] and ρB := trA [ρAB ]. The mutual information between
the subsystems A and B in ρAB is defined as I(A : B) := S(ρAB ‖ρA ⊗ ρB ).
A crucial property of both quantities is positivity together with the fact
that they are zero only in the obvious case:
Corollary 1.40 (Pinsker inequality). The relative entropy and the mutual
information as defined in Def.1.39 satisfy:
\[ S(\rho\|\sigma) \ge \tfrac{1}{2}\, \|\rho-\sigma\|_1^2, \tag{1.52} \]
\[ I(A:B) \ge \tfrac{1}{2}\, \|\rho_{AB} - \rho_A\otimes\rho_B\|_1^2. \tag{1.53} \]
In particular, S(ρ‖σ) = 0 and I(A : B) = 0 iff ρ = σ and ρAB = ρA ⊗ ρB ,
respectively.
Proof. For ease of the argument, we are going to cheat a little bit and prove
Eqs. (1.52,1.53) for ‖·‖2 instead of for ‖·‖1 . Clearly, the trace-norm bound is
the stronger result and we refer to ... for its proof.
By definition of the mutual information, Eq.(1.53) is a consequence of Eq.(1.52).
In order to arrive at Eq.(1.52), we use the fact that f (x) := x log x is strongly
convex on [0, 1] with f′′(x) = 1/x ≥ 1. So we can apply Eq.(1.26) from which
the result then follows instantly.
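A numerical sanity check of Eq.(1.52) is sketched below. It assumes the natural logarithm, full-rank density matrices on C³ drawn at random, and helper names (random_density, logm_psd, rel_entropy, trace_norm) that are purely illustrative.

```python
import numpy as np

def random_density(d, rng):
    """Random full-rank density matrix G G^* / tr[G G^*]."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def logm_psd(rho):
    """Matrix logarithm of a positive definite matrix via its eigenbasis."""
    lam, V = np.linalg.eigh(rho)
    return (V * np.log(lam)) @ V.conj().T

def rel_entropy(rho, sigma):
    """S(rho||sigma) = tr[rho (log rho - log sigma)] for full-rank arguments."""
    return np.trace(rho @ (logm_psd(rho) - logm_psd(sigma))).real

def trace_norm(X):
    """Trace norm of a Hermitian matrix (sum of absolute eigenvalues)."""
    return np.abs(np.linalg.eigvalsh(X)).sum()

rng = np.random.default_rng(8)
gaps = []
for _ in range(20):
    rho, sigma = random_density(3, rng), random_density(3, rng)
    # Pinsker: this difference should never be negative
    gaps.append(rel_entropy(rho, sigma) - 0.5 * trace_norm(rho - sigma) ** 2)
self_entropy = rel_entropy(rho, rho)
```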
Exercise 1.22. For i ∈ {1, 2} consider Ai ∈ B(Hi ). Show that if A1 , A2 are positive or
unitary then the same holds true for A1 ⊗ A2 .
Exercise 1.23 (Flip). Let H1 ≃ H2 ≃ C^d . By identifying bases of the two spaces we
can define a flip operator F ∈ B(H1 ⊗ H2 ) via F(ϕ ⊗ ψ) = ψ ⊗ ϕ.
(a) Determine the eigenvalues and eigenvectors of F.
(b) Prove that F is the unique operator satisfying tr [F(A ⊗ B)] = tr [AB] ∀A, B ∈
B(C^d ).
(c) Let (G_i)_{i=1}^{d²} ⊂ B(C^d ) be any Hilbert-Schmidt-orthonormal basis of Hermitian
operators. Show that F = Σ_{i=1}^{d²} Gi ⊗ Gi .
Exercise 1.24 (Partial trace). Consider an element of B(C^d ⊗ C^n ) in block matrix
representation. How can the partial traces be understood in this picture?
Exercise 1.25 (Monogamy). Alice, Bob and Charlie share a quantum system described
by a density operator ρ ∈ B1 (HA ⊗ HB ⊗ HC ) where HB ≃ HC . Suppose the reduced
density operator ρAB is pure. Show that ρAC = ρAB is not possible unless both are
simple products (i.e. their Schmidt rank is one).
1.6 Quantum channels and operations

So far, we have introduced and discussed aspects of preparation and measure-
ment. In this section, we will analyze the mathematical objects that are used
to describe anything that could happen to a quantum system between prepa-
ration and measurement. This could mean active operations performed by an
experimentalist, interactions either between parts of the system or with an en-
vironment or plain time-evolution.
Since quantum theory divides the description of every statistical experi-
ment into preparation and measurement, there are two natural ways to de-
scribe intermediate operations or evolutions: either by incorporating them into
the preparation or into the measurement description. These two viewpoints
are called Schrödinger picture and Heisenberg picture, respectively. While the
Schrödinger picture updates the density operator, the Heisenberg picture up-
dates the POVM.
Schrödinger & Heisenberg picture The mathematical maps that are to

describe the evolution/operation in either Schrödinger or Heisenberg picture
have to be consistent with the probabilistic interpretation. In particular, they
have to preserve convex combinations, which implies that they have to be affine
maps. These, however, can always be extended to linear maps: for instance, the
affine map ρ ↦ ρ′ = L(ρ) + C, where L is a linear map and C a constant, has
a linear extension from the trace-one-hyperplane to the entire space of
trace-class operators that is obtained by simply replacing C with C tr [ρ]. In this way,
we can without loss of generality restrict ourselves to linear maps. Elementary
properties of such maps are introduced in the following:
Definition 1.41. Let L ⊆ B(H1 ) be a linear subspace. A linear map T : L →
B(H2 ) is called
◦ trace-preserving if the image of any A ∈ L ∩ B1 (H1 ) under T is trace-class
and tr [T (A)] = tr [A],
◦ unital if T (1) = 1 (assuming 1 ∈ L),

◦ positive if T (A) ≥ 0 for all positive A ∈ L,
◦ completely positive if T ⊗ idn is positive for all n ∈ N, where idn is the

identity map on B(Cn ).
Remark: here we have tacitly introduced a third level tensor product, namely
the tensor product of linear maps on spaces of operators. T ⊗ idn is defined as
T ⊗ idn : A ⊗ B 7→ T (A) ⊗ B and linear extension to finite linear combinations.
Let us see how these properties come into play. If T : B1 (H1 ) → B1 (H2 )
is a trace-preserving and positive linear map, then T (ρ) is a density operator
whenever ρ is one. Recalling that ρ might describe a part of a larger system
whose other parts are left untouched by T , it is necessary to impose that not
only T maps density operators to density operators, but (T ⊗ id) does so as
well. This is captured by the notion of complete positivity. In principle, this
should hold not only for a finite-dimensional ‘innocent bystander’. We will see
later though, from the representation theory of completely positive maps, that
considering finite-dimensional systems is sufficient in this context.
Example 1.15 (Transposition). The paradigm of a map that is positive but not
completely positive is matrix transposition. Let Θ : B(H) → B(H), Θ(A) := A^T
be the transposition map w.r.t. a fixed basis {|k⟩} ⊂ H. This is a positive
map, since it preserves Hermiticity as well as the spectrum. However, for |ψ⟩ =
|00⟩ + |11⟩ ∈ H ⊗ C² we get
\[ (\Theta\otimes\mathrm{id}_2)\big(|\psi\rangle\langle\psi|\big) = \sum_{i,j=0}^{1} \Theta\big(|i\rangle\langle j|\big)\otimes|i\rangle\langle j| = \sum_{i,j=0}^{1} |j\rangle\langle i|\otimes|i\rangle\langle j|, \]
for which −1 is an element of the spectrum (cf. Exercise 1.23).
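The negative eigenvalue can be exhibited numerically. The sketch below assumes H = C² with the standard basis and implements Θ ⊗ id2 by swapping the row and column index of the first tensor factor; the variable names are illustrative.

```python
import numpy as np

psi = np.array([1.0, 0.0, 0.0, 1.0])          # |00> + |11>, unnormalized
P = np.outer(psi, psi)                         # positive rank-one operator

# Transpose only the first tensor factor: swap its row and column index
PT = P.reshape(2, 2, 2, 2).transpose(2, 1, 0, 3).reshape(4, 4)
eig_PT = np.linalg.eigvalsh(PT)                # eigenvalues in ascending order

# Plain transposition, by contrast, leaves the spectrum unchanged
rng = np.random.default_rng(9)
G = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
A = G @ G.conj().T                             # a positive matrix
eig_A, eig_AT = np.linalg.eigvalsh(A), np.linalg.eigvalsh(A.T)
```

The partially transposed operator is the flip operator on C² ⊗ C², whose eigenvalue −1 belongs to the antisymmetric vector |01⟩ − |10⟩.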
Let us turn to the Heisenberg picture. Assume that T∗ : B(H2 ) → B(H1 ) is
a continuous, unital and positive linear map.[12] If M : B → B(H2 ) is a POVM,
then M′ := T∗ ◦ M : B → B(H1 ) is a POVM as well. To see this, note that
positivity of T∗ implies positivity of M′(Y ) for all Y ∈ B and if X = ∪k Xk is
a countable disjoint partition of the set X of all possible outcomes into measurable
subsets Xk , then
\[ \sum_k M'(X_k) = T^*\Big(\sum_k M(X_k)\Big) = T^*(\mathbb{1}) = \mathbb{1}, \]
where we used continuity of T∗ in the first step and unitality in the last step.
Since Schrödinger picture and Heisenberg picture describe the same thing
from different viewpoints, they should lead to consistent predictions. As the pre-
dictions are in the end probabilities expressed through Born’s rule, the equiva-
lence of the two viewpoints should be expressible on this level. This equivalence
is established in the following theorem. For any map T in the Schrödinger
picture it proves the existence of an equivalent description via a map T∗ in the
Heisenberg picture. We will comment on the more subtle converse direction
below.
[12] The meaning of the ‘∗’ will become clear below. For now, read ‘T∗’ just as an arbitrary
symbol that we assign as a name to the map.
Theorem 1.42 (Schrödinger picture to Heisenberg picture). Let T : B1 (H1 ) →

B1 (H2 ) be a bounded linear map. Then there is a unique linear map T ∗ :
B(H2 ) → B(H1 ) (called the dual map) that satisfies ∀A ∈ B(H2 ), ρ ∈ B1 (H1 ):
tr [T (ρ)A] = tr [ρ T ∗ (A)] . (1.54)
Moreover, the following equivalences hold:
(i) T is positive iff T ∗ is positive,
(ii) T is completely positive iff T ∗ is completely positive,
(iii) T is trace-preserving iff T ∗ is unital.
Proof. Consider the map f : B1 (H1 ) → C defined by f (B) := tr [T (B)A] for
fixed A ∈ B(H2 ). Due to linearity of T , f is linear. It is also bounded since
Hölder’s inequality and boundedness of T lead to |f (B)| ≤ ‖T (B)‖1 ‖A‖∞ ≤
c ‖B‖1 for some constant c < ∞. Hence f is a continuous linear functional on
B1 (H1 ). The duality B1 (H1 )′ = B(H1 ) then implies the existence of a T∗ (A) ∈
B(H1 ) so that f (B) = tr [BT∗ (A)], which verifies Eq.(1.54). As the l.h.s. of
Eq.(1.54) depends linearly on A, T∗ (A) has to depend linearly on A as well so
that T∗ is a linear map. Uniqueness is guaranteed by the fact that specifying
tr [ρT∗ (A)] for all density operators ρ determines the operator T∗ (A).
As for positivity, we use the defining relation between T and T ∗ in the form
tr [T (|ψ⟩⟨ψ|)A] = ⟨ψ, T∗ (A)ψ⟩. (1.55)
Imposing positivity of the l.h.s. for all ψ ∈ H1 and all positive A ∈ B(H2 ) is
equivalent to positivity of T . Imposing the same for the r.h.s. is equivalent
to positivity of T ∗ . So these conditions are equivalent. The same argument
applies to complete positivity by replacing T with T ⊗ idn and realizing that
(T ⊗ idn )∗ = T ∗ ⊗ idn .
Similarly, from Eq.(1.54) we derive the equation
\[ \mathrm{tr}\big[T(B) - B\big] = \mathrm{tr}\big[B\,\big(T^*(\mathbb{1}) - \mathbb{1}\big)\big]. \tag{1.56} \]
Here the l.h.s. is zero for all B ∈ B1 (H1 ) iff T is trace-preserving, whereas the
r.h.s. is zero for all B ∈ B1 (H1 ) iff T ∗ is unital.
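The duality relation (1.54) and the equivalence (iii) can be illustrated on a concrete qubit map. The sketch below assumes a map given in the operator-sum form ρ ↦ Σ_k K_k ρ K_k∗ (the amplitude damping map, chosen here only as an example, with a hypothetical parameter gamma = 0.3); this form is discussed later in the notes and is used here merely as a convenient way to write down T and its dual.

```python
import numpy as np

gamma = 0.3                                    # hypothetical damping parameter
kraus = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]]),
         np.array([[0, np.sqrt(gamma)], [0, 0]])]

def T(rho):
    """Schrödinger picture: rho -> sum_k K rho K^*."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def T_dual(A):
    """Heisenberg picture: A -> sum_k K^* A K."""
    return sum(K.conj().T @ A @ K for K in kraus)

rng = np.random.default_rng(10)
G = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = G @ G.conj().T
rho /= np.trace(rho).real                      # random density matrix
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))

lhs = np.trace(T(rho) @ A)          # tr[T(rho) A]
rhs = np.trace(rho @ T_dual(A))     # tr[rho T*(A)]
```

For this example T is trace-preserving and, correspondingly, T_dual maps the identity to the identity.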
One important property of the dual map has been left aside and will be cov-
ered in the following corollary: continuity. Before proving this in a quantitative
way, some remarks on the involved norms are in order.
Both T and T∗ are maps between Banach spaces. If not specified otherwise,
their norms are the corresponding Banach space operator norms. That is,
‖T‖ = sup{‖T (B)‖1 | ‖B‖1 ≤ 1} and ‖T∗‖ = sup{‖T∗ (A)‖∞ | ‖A‖∞ ≤ 1}. The
involved trace-norm and the operator norm in B(H) are dual to each other in
the sense that
\[ \|B\|_1 = \sup_{\|A\|_\infty = 1} \big|\mathrm{tr}[AB]\big| \quad\text{and}\quad \|A\|_\infty = \sup_{\|B\|_1 = 1} \big|\mathrm{tr}[AB]\big|. \tag{1.57} \]
These equations can for instance be proven by means of the polar decomposition
and the Schmidt decomposition, respectively.
Corollary 1.43. Let T : B1 (H1 ) → B1 (H2 ) be a bounded linear map and T∗
the corresponding dual map. Then ‖T∗‖ = ‖T‖. Moreover, if T is positive,
these norms are equal to ‖T∗ (1)‖∞ . In particular, if T is positive and
trace-preserving, then for all B ∈ B1 (H1 ), A ∈ B(H2 ):
‖T (B)‖1 ≤ ‖B‖1 and ‖T∗ (A)‖∞ ≤ ‖A‖∞ . (1.58)
Proof. Using the defining relation between T and T∗ and Eq.(1.57) we obtain
\[ \|T^*\| = \sup_{\|A\|_\infty=1}\ \sup_{\|B\|_1=1} \big|\underbrace{\mathrm{tr}[B\,T^*(A)]}_{=\,\mathrm{tr}[T(B)A]}\big| = \|T\|. \tag{1.59} \]
To proceed, we exploit the convex structure of the unit balls in B1 (H1 ) and
B(H2 ) by which it suffices to take the suprema over all rank-one elements in the
trace-class and all unitaries in B(H2 ). The latter is justified by the Russo-Dye
theorem (Thm.1.16) and the former by the Schmidt-decomposition (Eq.(1.8)).
Thus
\[ \|T^*\| = \sup_{U}\ \sup_{\psi,\varphi} \big|\langle\varphi, T^*(U)\psi\rangle\big|, \tag{1.60} \]
where the suprema are taken over all unitaries U ∈ B(H2 ) and unit vectors
ϕ, ψ ∈ H1 . Let us for the moment assume that H2 is finite dimensional. This
enables a spectral decomposition of the form U = Σ_k exp[iαk ]|ek⟩⟨ek | with
αk ∈ R and {ek } =: E ⊂ H2 an orthonormal basis. Inserting this into Eq.(1.60)
leads to
\[ \|T^*\| \le \sup_{E}\ \sup_{\psi,\varphi} \sum_k \big|\langle\varphi, T^*\big(|e_k\rangle\langle e_k|\big)\psi\rangle\big| \tag{1.61} \]
\[ \phantom{\|T^*\|} = \sup_{E}\ \sup_{\psi} \sum_k \langle\psi, T^*\big(|e_k\rangle\langle e_k|\big)\psi\rangle = \|T^*(\mathbb{1})\|_\infty. \tag{1.62} \]
Here, in the step from the first to the second line we have used positivity of
T∗ together with two applications of Cauchy-Schwarz. Note that equality has
to hold in the inequality since U = 1 was a valid choice in the first place.
Eq.(1.58) then follows from unitality of T∗ , which for positive maps now implies
‖T‖ = ‖T∗‖ = 1.
Finally, we have to come back to the assumption dim(H2 ) < ∞. Suppose
this is not the case. Then note that the core expression in Eq.(1.60) can also be
written as tr [U T (|ψ⟩⟨ϕ|)]. Since T (|ψ⟩⟨ϕ|) is a trace-class operator on H2 it
can be approximated arbitrarily well in trace-norm by a finite rank operator F .
So we may restrict ourselves to unitaries that act non-trivially only on the finite
dimensional subspace supp(F )+ran(F ) and continue with the finite dimensional
argument.
Thm.1.42 constructs a map in the Heisenberg picture for any map in the
Schrödinger picture. What about the converse? In finite dimensions the sit-
uation is symmetric. There we can interpret the expression in Born’s rule as
Hilbert-Schmidt inner product w.r.t. which T ∗ is the adjoint operator corre-
sponding to T . In infinite dimensions, the proof of Thm.1.42 relied on the
duality relation B1 (H1 )′ = B(H1 ), which does not hold in the other direction.
In other words, there are maps Φ : B(H2 ) → B(H1 ) in the Heisenberg picture
that have no predual that maps density operators to density operators. A map
Φ is called normal if there exists such a predual. Equivalently, Φ is normal if
it is continuous as a map from B(H2 ) to B(H1 ) when both spaces are equipped
with the weak-* topology.
Kraus representation and environment We already know three elemen-

tary classes of linear maps that are completely positive and trace-preserving:
(i) Addition of an ancillary density operator σ via B 7→ B ⊗ σ.
(ii) Partial trace B 7→ tr2 [B] in a composite system.
(iii) Unitary evolution of the form B 7→ U BU ∗ , where U is a unitary.
Since complete positivity as well as the trace-preserving property is preserved
under composition of maps, any composition of the three elementary building
blocks is again completely positive and trace-preserving. In fact, we will see
later that this construction is exhaustive.
Theorem 1.44 (Kraus/environment representation). For T : B1 (H) → B1 (H)
the following are equivalent:
(1) There is a Hilbert space K, a unitary U ∈ B(H ⊗ K) and a density operator
σ ∈ B(K) s.t.
\[ T(\rho) = \mathrm{tr}_K\big[U(\rho\otimes\sigma)U^*\big], \tag{1.63} \]
(2) There is a Hilbert space K, a unitary W ∈ B(H ⊗ K) and a unit vector
ψ ∈ K s.t.
\[ T(\rho) = \mathrm{tr}_K\big[W(\rho\otimes|\psi\rangle\langle\psi|)W^*\big], \tag{1.64} \]
(3) There is a Hilbert space K and an isometry V : H → H ⊗ K s.t.
\[ T(\rho) = \mathrm{tr}_K\big[V\rho V^*\big], \tag{1.65} \]
(4) There is a finite or infinite sequence (A_k)_{k=1}^r ⊂ B(H), r ∈ N ∪ {∞} for
which Σ_{k=1}^r A∗k Ak = 1 converges weakly and
\[ T(\rho) = \sum_{k=1}^{r} A_k\, \rho\, A_k^*. \tag{1.66} \]
Remark: The Ak ’s are called Kraus-operators and Eq.(1.66) the Kraus
representation of T . As we have seen in Lemma 1.36, weak convergence of Σ_k A∗k Ak
to a bounded operator (which in this case is equivalent to strong convergence)
implies trace-norm convergence in Eq.(1.66).
Proof. To distinguish the auxiliary Hilbert spaces of the first three points, we
denote them by K1 , K2 and K3 . We will show (1)⇔(2)⇒(4)⇒(3)⇒(2).
Assume (1) holds. Then we can use a purification ψ ∈ K1 ⊗ K1 := K2 of
σ ∈ B1 (K1 ), as derived in Cor.1.38, and we obtain (2) by choosing W = U ⊗ 1.
Conversely, (2)⇒(1) since (2) is a special case of (1).
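Now suppose (2) holds. In order to show that (2)⇒(4), we set $A_k := (1 \otimes \langle e_k|)\, W\, (1 \otimes |\psi\rangle)$ for an orthonormal basis $\{e_k\} \subset \mathcal{K}_2$. Using the explicit construction of the partial trace in Eq.(1.49), we see that Eq.(1.66), after insertion of the $A_k$’s, becomes Eq.(1.64). Strong convergence of $\sum_k |e_k\rangle\langle e_k| = 1$ then implies strong convergence in
$$\sum_k A_k^* A_k = \sum_k (1 \otimes \langle\psi|)\, W^* \big(1 \otimes |e_k\rangle\langle e_k|\big) W\, (1 \otimes |\psi\rangle) = (1 \otimes \langle\psi|)\, \underbrace{W^* W}_{=1}\, (1 \otimes |\psi\rangle) = 1.$$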
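If (4) holds, then we can construct an isometry $V : \mathcal{H} \to \mathcal{H} \otimes \mathcal{K}_3$ with $\mathcal{K}_3 = l_2(\mathbb{N})$ if $r = \infty$ or otherwise $\mathcal{K}_3 = \mathbb{C}^r$ via $V : \varphi \mapsto \sum_k (A_k \varphi) \otimes e_k$ where $\{e_k\} \subset \mathcal{K}_3$ is any orthonormal basis. This is indeed an isometry, since
$$\langle \varphi, V^* V \varphi \rangle = \sum_k \langle \varphi, A_k^* A_k \varphi \rangle = \langle \varphi, \varphi \rangle.$$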
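Finally, assuming (3), we want to extend the isometry $V$ to a unitary in order to arrive at (2). To this end, take any unit vector $\psi \in \mathcal{K}_2 := \mathcal{K}_3$ and suppose the spaces $\mathcal{H} \otimes (\mathcal{K}_2 \ominus \mathbb{C}\psi) \simeq (\mathrm{ran} V)^\perp$ are isomorphic, which is certainly true if $\mathcal{H}$ has finite dimensions. Then there is a unitary $V' : \mathcal{H} \otimes (\mathcal{K}_2 \ominus \mathbb{C}\psi) \to (\mathrm{ran} V)^\perp$, which extends $V : \mathcal{H} \simeq \mathcal{H} \otimes \mathbb{C}\psi \to \mathcal{H} \otimes \mathcal{K}_2$ to a unitary $W := V \oplus V'$. If $(\mathrm{ran} V)^\perp$ is too small so that the assumed isomorphism does not hold, we first compose $V$ with a canonical embedding of $\mathcal{K}_3$ into $\mathcal{K}_3 \oplus \mathbb{C} =: \mathcal{K}_2$. Then $(\mathrm{ran} V)^\perp$ with the orthogonal complement taken in $\mathcal{H} \otimes \mathcal{K}_2$ is infinite dimensional and the desired isomorphism holds.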
Eq.(1.63) has a simple physical interpretation: we may think of $T$ as describing an interaction, which is characterized by $U$, with an environment that is initially uncorrelated with the system, described by a density operator $\sigma$ and traced out after the interaction.
The Kraus representation of a completely positive linear map is not unique.
This is, in fact, closely related to the non-uniqueness of the convex decomposi-
tion of a density operator into rank-one projections (cf. Example 1.8) and, in a
similar vein, one can show the following:
Proposition 1.45 (Ambiguity in the Kraus representation). Let $T : \mathcal{B}_1(\mathcal{H}_1) \to \mathcal{B}_1(\mathcal{H}_2)$ have a Kraus representation of the form $T(\rho) = \sum_{i \in N} K_i \rho K_i^*$ with $N \subseteq \mathbb{N}$. If $u_{ij}$ are the entries of a unitary matrix with index set $N \ni i, j$, then $B_i := \sum_{j \in N} u_{ij} K_j$ defines a set of Kraus operators that represent the same map via $T(\rho) = \sum_{i \in N} B_i \rho B_i^*$.
Conversely, if $\{A_i\}_{i \in N}$ and $\{B_i\}_{i \in N}$ are two sets of Kraus-operators that represent the same trace-preserving map and if either $N$ is finite or both sets contain an infinite number of zeros, then there is a unitary $u$ s.t. $B_i = \sum_{j \in N} u_{ij} A_j$.
Definition 1.46 (Quantum channels). A linear map T : B1 (H1 ) → B1 (H2 ) is

called a quantum channel if it is trace-preserving and completely positive.
We will see later that every quantum channel can be represented in the ways
specified by Thm.1.44.
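Example 1.16 (Phase damping channel). Let $\{|0\rangle, |1\rangle\}$ denote an orthonormal basis of $\mathbb{C}^2$ and define $\rho_{ij} := \langle i|\rho|j\rangle$. A simple model of a ‘decoherence process’ is given by the phase damping channel that is parametrized by $\lambda \in [0, 1]$ and can be represented in the following ways:
$$\rho \mapsto \begin{pmatrix} \rho_{00} & (1-\lambda)\rho_{01} \\ (1-\lambda)\rho_{10} & \rho_{11} \end{pmatrix} = \sum_{k=1}^{3} A_k \rho A_k^* \tag{1.67}$$
with $A_1 := \sqrt{1-\lambda}\, 1$, $A_2 := \sqrt{\lambda}\, |0\rangle\langle 0|$, $A_3 := \sqrt{\lambda}\, |1\rangle\langle 1|$.
In order to give an environment representation of this quantum channel, we specify an orthonormal basis $\{|i\rangle_K\}_{i=0}^2$ of the ancillary space $\mathcal{K} \simeq \mathbb{C}^3$ and define the isometry
$$V : |0\rangle \mapsto \sqrt{1-\lambda}\, |0\rangle \otimes |0\rangle_K + \sqrt{\lambda}\, |0\rangle \otimes |1\rangle_K,$$
$$V : |1\rangle \mapsto \sqrt{1-\lambda}\, |1\rangle \otimes |0\rangle_K + \sqrt{\lambda}\, |1\rangle \otimes |2\rangle_K.$$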
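The two representations in Example 1.16 can be compared numerically. The following is a minimal sketch, assuming NumPy and the ordering convention that the system index comes before the environment index; the helper names (`phase_damping_kraus`, `phase_damping_isometry`, `trace_out_env`) and the test state are illustrative choices, not part of the notes.

```python
import numpy as np

def phase_damping_kraus(lam):
    """Kraus operators A_1, A_2, A_3 of Eq.(1.67)."""
    id2 = np.eye(2)
    p0 = np.diag([1.0, 0.0])  # |0><0|
    p1 = np.diag([0.0, 1.0])  # |1><1|
    return [np.sqrt(1 - lam) * id2, np.sqrt(lam) * p0, np.sqrt(lam) * p1]

def apply_kraus(kraus, rho):
    """rho -> sum_k A_k rho A_k^*."""
    return sum(A @ rho @ A.conj().T for A in kraus)

def phase_damping_isometry(lam):
    """Isometry V : C^2 -> C^2 (x) C^3; basis index of |s> (x) |e>_K is 3*s + e."""
    V = np.zeros((6, 2))
    V[3 * 0 + 0, 0] = np.sqrt(1 - lam)
    V[3 * 0 + 1, 0] = np.sqrt(lam)
    V[3 * 1 + 0, 1] = np.sqrt(1 - lam)
    V[3 * 1 + 2, 1] = np.sqrt(lam)
    return V

def trace_out_env(X, d_sys, d_env):
    """Partial trace over the second (environment) tensor factor."""
    return np.einsum('iaja->ij', X.reshape(d_sys, d_env, d_sys, d_env))

lam = 0.3
rho = np.array([[0.6, 0.2 + 0.1j], [0.2 - 0.1j, 0.4]])

out_kraus = apply_kraus(phase_damping_kraus(lam), rho)
V = phase_damping_isometry(lam)
out_env = trace_out_env(V @ rho @ V.conj().T, 2, 3)

# Left-hand side of Eq.(1.67): off-diagonal entries are damped by (1 - lam).
expected = np.array([[rho[0, 0], (1 - lam) * rho[0, 1]],
                     [(1 - lam) * rho[1, 0], rho[1, 1]]])
```

Both `out_kraus` and `out_env` should agree with `expected` up to rounding, which mirrors the equivalence (3)⇔(4) of Thm.1.44 for this particular channel.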
Example 1.17 (Hadamard channels). The phase damping channel is a particular instance of a Hadamard channel. Let $H \in \mathbb{C}^{d \times d}$ be a positive matrix whose diagonal entries are all equal to 1. Then
$$\rho \mapsto H * \rho$$
defines a quantum channel, where ‘$*$’ denotes the entry-wise product (a.k.a. Hadamard product), i.e. $(H * \rho)_{ij} = H_{ij} \rho_{ij}$, where the matrix elements are w.r.t. a fixed orthonormal basis $\{|i\rangle\}_{i=1}^d$. Showing that Hadamard channels are indeed quantum channels is most easily done by observing that the set of Hadamard channels coincides with the set of quantum channels with diagonal Kraus operators. Consider a quantum channel $\rho \mapsto \rho' := \sum_k A_k \rho A_k^*$ with $\langle i|A_k|j\rangle = \delta_{ij} a_{ki}$. This is a Hadamard channel since $\langle i|\rho'|j\rangle = \langle i|\rho|j\rangle H_{ij}$ with $H_{ij} = \sum_k a_{ki} \bar{a}_{kj}$. For the converse direction, observe that the last equation can be seen as a decomposition of $H$ into positive rank-one operators. In this way, we can construct diagonal Kraus operators from $H$, and so prove that Hadamard channels are indeed completely positive.
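The converse direction of Example 1.17 can be sketched in code: an eigendecomposition of $H$ is one possible decomposition into positive rank-one operators, and each term yields a diagonal Kraus operator. This is an illustrative sketch assuming NumPy; the function name `diagonal_kraus_from_hadamard` and the randomly generated $H$ and $\rho$ are assumptions made for the example.

```python
import numpy as np

def diagonal_kraus_from_hadamard(H, tol=1e-12):
    """Diagonal Kraus operators A_k = diag(a_k) with sum_k a_ki conj(a_kj) = H_ij.

    Uses the spectral decomposition H = sum_k lam_k |v_k><v_k|, a_k = sqrt(lam_k) v_k.
    """
    evals, evecs = np.linalg.eigh(H)
    return [np.diag(np.sqrt(lam) * v)
            for lam, v in zip(evals, evecs.T) if lam > tol]

rng = np.random.default_rng(0)
d = 3

# A positive matrix with unit diagonal: rescale a random Gram matrix.
X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
G = X @ X.conj().T
s = 1.0 / np.sqrt(np.diag(G).real)
H = s[:, None] * G * s[None, :]

# A random density matrix as test input.
Y = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = Y @ Y.conj().T
rho = rho / np.trace(rho).real

kraus = diagonal_kraus_from_hadamard(H)
out_kraus = sum(A @ rho @ A.conj().T for A in kraus)
out_hadamard = H * rho                                  # entry-wise product
normalization = sum(A.conj().T @ A for A in kraus)      # should be the identity
```

The normalization $\sum_k A_k^* A_k = 1$ holds here because the diagonal entries of $H$ are all equal to 1.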
Choi-matrices If a quantum channel, or a more general linear map, acts on a

finite-dimensional input space (with possibly infinite-dimensional output space),
the following will turn out to be a useful representation tool:
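Definition 1.47 (Choi matrix). For finite-dimensional $\mathcal{H}_1 \simeq \mathbb{C}^{d_1}$ define $|\Omega\rangle := \sum_{i=1}^{d_1} |ii\rangle \in \mathcal{H}_1 \otimes \mathcal{H}_1$ where each $i$ labels an element of a fixed orthonormal basis$^{13}$. The Choi matrix $C \in \mathcal{B}_1(\mathcal{H}_1 \otimes \mathcal{H}_2)$ of a linear map $T : \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}_1(\mathcal{H}_2)$ is defined as
$$C := (\mathrm{id} \otimes T)\big(|\Omega\rangle\langle\Omega|\big).$$
Note that $|\Omega\rangle/\sqrt{d_1}$ is a unit vector corresponding to a maximally entangled state. The usefulness of the Choi matrix stems from a simple Lemma: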
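Lemma 1.48 (Cyclicity of maximally entangled state vectors). Let $\mathcal{H}_1 \simeq \mathbb{C}^{d_1}$ be finite-dimensional and $|\Omega\rangle := \sum_{i=1}^{d_1} |ii\rangle \in \mathcal{H}_1 \otimes \mathcal{H}_1$. For any $\psi \in \mathcal{H}_1 \otimes \mathcal{H}_2$ define $A := I(\psi) \in \mathcal{B}_2(\mathcal{H}_1, \mathcal{H}_2)$, where $I$ is the Hilbert-Schmidt isomorphism constructed via Eq.(1.41) (w.r.t. the same basis that defines $\Omega$). Then
$$|\psi\rangle = (1 \otimes A)|\Omega\rangle. \tag{1.68}$$

Proof. Expanding in a product basis like $|\psi\rangle = \sum_{i=1}^{d_1} \sum_k A_{ik}\, |i\rangle \otimes |e_k\rangle$ we obtain $I(\psi) = \sum_{i=1}^{d_1} \sum_k A_{ik}\, |e_k\rangle\langle i|$ so that Eq.(1.68) follows by insertion.
Clearly, the statement of the Lemma holds similarly for interchanged tensor factors. In particular, for any $\psi \in \mathcal{H}_2 \otimes \mathcal{H}_1$ there is an $A \in \mathcal{B}_2(\mathcal{H}_1, \mathcal{H}_2)$ so that $|\psi\rangle = (A \otimes 1)|\Omega\rangle$.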
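Eq.(1.68) is easy to check numerically in finite dimensions. The sketch below assumes NumPy and stores $\psi$ as a flat array with index $i \cdot d_2 + k$ for the basis vector $|i\rangle \otimes |e_k\rangle$; the dimensions and the random vector are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2 = 3, 4

# Random vector psi in H_1 (x) H_2 with coefficients A_ik.
psi = rng.normal(size=d1 * d2) + 1j * rng.normal(size=d1 * d2)

# I(psi) = sum_ik A_ik |e_k><i| is the d2 x d1 matrix with entries [k, i] = A_ik.
A = psi.reshape(d1, d2).T

# |Omega> = sum_i |ii> in H_1 (x) H_1.
omega = np.eye(d1).reshape(d1 * d1)

# (1 (x) A)|Omega> should reproduce psi.
reconstructed = np.kron(np.eye(d1), A) @ omega
```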
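Theorem 1.49 (Choi). Let $\mathcal{H}_1 \simeq \mathbb{C}^{d_1}$ be finite-dimensional and $T : \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}_1(\mathcal{H}_2)$ be a linear map with Choi matrix $C \in \mathcal{B}_1(\mathcal{H}_1 \otimes \mathcal{H}_2)$. Then
(i) The map $T \mapsto C$ is a bijection whose inverse ($C \mapsto T$) is characterized by
$$\mathrm{tr}[T(A)B] = \mathrm{tr}\big[C(A^T \otimes B)\big] \quad \forall A \in \mathcal{B}(\mathcal{H}_1),\ B \in \mathcal{B}(\mathcal{H}_2), \tag{1.69}$$
where the transpose is w.r.t. the basis that is used in the definition of $C$.
(ii) $C = C^*$ iff $T(A)^* = T(A^*)$ for all $A \in \mathcal{B}(\mathcal{H}_1)$.
(iii) $C$ is positive iff $T$ is completely positive.
(iv) $\mathrm{tr}_2[C] = 1$ iff $T$ is trace-preserving.
(v) $\mathrm{tr}_1[C] = 1$ iff $T$ is unital.

13 The notation $\mathcal{H}_1 \otimes \mathcal{H}_1$ should be read as $\mathcal{H}_0 \otimes \mathcal{H}_1$ where $\mathcal{H}_0$ is isomorphic to $\mathcal{H}_1$ and in addition we identify two orthonormal bases.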
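Proof. (i) Note that via Eq.(1.69) $T$ and $C$ mutually determine each other so that Eq.(1.69) specifies a bijection if we regard $C$ as an unconstrained element in $\mathcal{B}_1(\mathcal{H}_1 \otimes \mathcal{H}_2)$. That this $C$ is indeed the Choi matrix is verified by
$$\begin{aligned} \mathrm{tr}[T(A)B] &= \mathrm{tr}[A\,T^*(B)] = \mathrm{tr}\big[\mathbb{F}\,(\mathrm{id} \otimes T^*)(A \otimes B)\big] \\ &= \mathrm{tr}\big[|\Omega\rangle\langle\Omega|\,(\Theta \otimes T^*)(A \otimes B)\big] \\ &= \mathrm{tr}\big[(\mathrm{id} \otimes T)\big(|\Omega\rangle\langle\Omega|\big)\,(A^T \otimes B)\big]. \end{aligned}$$
Here we have used the property of the flip operator from Exercise 1.23 (b) together with $\mathbb{F} = (\Theta \otimes \mathrm{id})\big(|\Omega\rangle\langle\Omega|\big)$, where $\Theta$ denotes the matrix transposition.
(ii) Since $C^* = \sum_{i,j} |j\rangle\langle i| \otimes T\big(|i\rangle\langle j|\big)^*$ with mutually orthogonal $|i\rangle\langle j|$, we have that this equals $C = \sum_{i,j} |j\rangle\langle i| \otimes T\big(|j\rangle\langle i|\big)$ iff $T\big(|i\rangle\langle j|\big)^* = T\big(|j\rangle\langle i|\big)$ holds for all $i, j$. In other words, $C = C^*$ iff $T(A)^* = T(A^*)$ holds for all $A = |i\rangle\langle j|$. By expanding an arbitrary $A$ in that basis, the general statement follows.
(iii) The requirements in the definition of complete positivity of $T$ imply positivity of the Choi matrix as a special case. In order to prove the converse, realize that it suffices to show $(\mathrm{id}_n \otimes T)\big(|\psi\rangle\langle\psi|\big) \geq 0$ for all $\psi \in \mathbb{C}^n \otimes \mathcal{H}_1$ and all $n \in \mathbb{N}$ since the spectral decomposition of an arbitrary positive trace-class operator allows us to restrict to rank-one operators. Lemma 1.48, with interchanged tensor factors, now enables us to write $|\psi\rangle = (A \otimes 1)|\Omega\rangle$ for some $A \in \mathcal{B}(\mathcal{H}_1, \mathbb{C}^n)$. Then
$$(\mathrm{id}_n \otimes T)\big(|\psi\rangle\langle\psi|\big) = (A \otimes 1)\, \underbrace{(\mathrm{id}_{d_1} \otimes T)\big(|\Omega\rangle\langle\Omega|\big)}_{= C \geq 0}\, (A \otimes 1)^* \geq 0.$$
(iv) Using that $\mathrm{tr}\big[T(|i\rangle\langle j|)\big] = \langle j|T^*(1)|i\rangle$ the claim follows from $\mathrm{tr}_2[C] = \sum_{ij} |i\rangle\langle j|\, \mathrm{tr}\big[T(|i\rangle\langle j|)\big] = \big(T^*(1)\big)^T$.
(v) Using that $\mathrm{tr}[|i\rangle\langle j|] = \delta_{i,j}$ we get $\mathrm{tr}_1[C] = \sum_{ij} \mathrm{tr}[|i\rangle\langle j|]\, T\big(|i\rangle\langle j|\big) = T(1)$, which completes the proof.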

Part (iii) of Thm.1.49 should be particularly emphasized: while the definition of complete positivity requires positivity of $(\mathrm{id}_n \otimes T)(A)$ for all $n \in \mathbb{N}$ and all positive $A$, Choi’s theorem shows that $n = d_1$ and the choice $A = |\Omega\rangle\langle\Omega|$ is sufficient. Note that, by using that $T$ is completely positive iff $T^*$ is (Thm.1.42), we can equivalently apply Choi’s theorem to $T^*$, then with $n = d_2$. In both cases we are left with a square matrix of dimension $d_1 d_2$.
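The statements of Thm.1.49 can be illustrated with a small numerical sketch, assuming NumPy. The amplitude damping channel used below is not taken from these notes; it is an assumed example of a channel that is trace-preserving but not unital. Matrix transposition $\Theta$ serves as an example of a positive map whose Choi matrix is not positive. All function names are illustrative.

```python
import numpy as np

def choi_matrix(channel, d1):
    """C = sum_ij |i><j| (x) T(|i><j|) for a linear map given as a callable."""
    basis = np.eye(d1)
    blocks = [[channel(np.outer(basis[i], basis[j]).astype(complex))
               for j in range(d1)] for i in range(d1)]
    return np.block(blocks)

def partial_traces(C, d1, d2):
    """Return (tr_1[C], tr_2[C]) for C acting on C^d1 (x) C^d2."""
    C4 = C.reshape(d1, d2, d1, d2)
    return np.einsum('iaib->ab', C4), np.einsum('iaja->ij', C4)

# Assumed example: amplitude damping channel with damping parameter g.
g = 0.25
K0 = np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex)
K1 = np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)

def amplitude_damping(X):
    return K0 @ X @ K0.conj().T + K1 @ X @ K1.conj().T

def transposition(X):
    return X.T

C_ad = choi_matrix(amplitude_damping, 2)
C_theta = choi_matrix(transposition, 2)
tr1_ad, tr2_ad = partial_traces(C_ad, 2, 2)

# Check Eq.(1.69) on random operators A and B.
rng = np.random.default_rng(3)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
B = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
lhs = np.trace(amplitude_damping(A) @ B)
rhs = np.trace(C_ad @ np.kron(A.T, B))

min_eig_ad = np.linalg.eigvalsh(C_ad).min()        # non-negative: completely positive
min_eig_theta = np.linalg.eigvalsh(C_theta).min()  # negative: not completely positive
```

In this sketch `tr2_ad` is the identity (trace preservation), whereas `tr1_ad` equals $T(1)$ and differs from the identity because the assumed channel is not unital.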
We will now return to Kraus decompositions and in particular use the Choi
matrix to prove the existence of a structured Kraus decomposition for every
completely positive map with finite-dimensional input space.
Corollary 1.50 (Kraus decomposition). Let $T : \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}_1(\mathcal{H}_2)$ be a linear map and $d_i := \dim(\mathcal{H}_i)$ with $d_1 < \infty$. Then there are two Hilbert-Schmidt orthogonal families of operators $\{A_k\}_{k=1}^r$, $\{B_k\}_{k=1}^r$ in $\mathcal{B}_2(\mathcal{H}_1, \mathcal{H}_2)$ with $r \leq d_1 d_2$ such that
$$T(\cdot) = \sum_{k=1}^r A_k \cdot B_k^*. \tag{1.70}$$
Moreover, if $T$ is completely positive, we can in addition choose $B_k = A_k$ for all $k$.
Proof. We will construct the Kraus decomposition from the Choi matrix $C \in \mathcal{B}_1(\mathcal{H}_1 \otimes \mathcal{H}_2)$ of $T$ using Lemma 1.48. Since the Choi matrix is trace-class, we can invoke the Schmidt-decomposition for compact operators and write $C = \sum_{k=1}^r |\psi_k\rangle\langle\varphi_k|$, where $\{\psi_k\}$, $\{\varphi_k\}$ are two orthogonal families in $\mathcal{H}_1 \otimes \mathcal{H}_2$. Using Lemma 1.48 and defining $A_k := I(\psi_k)$, $B_k := I(\varphi_k)$ we can express $|\psi_k\rangle = (1 \otimes A_k)|\Omega\rangle$ and $|\varphi_k\rangle = (1 \otimes B_k)|\Omega\rangle$. As $I$ is an isomorphism onto the Hilbert-Schmidt class, the $A_k$’s are orthogonal w.r.t. the Hilbert-Schmidt inner product, and so are the $B_k$’s. The Choi matrix now reads
$$C = \sum_{k=1}^r (1 \otimes A_k)|\Omega\rangle\langle\Omega|(1 \otimes B_k)^*.$$
The representation claimed in Eq.(1.70) then follows from the fact that there is a unique $T$ corresponding to $C$ (Thm.1.49 (i)). If $T$ is completely positive, then $C$ is positive (Thm.1.49 (iii)) so that we can choose $\varphi_k = \psi_k$ and thus $B_k = A_k$.
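For a completely positive map the construction in the proof can be sketched as follows, assuming NumPy: the spectral decomposition of the positive Choi matrix plays the role of the Schmidt decomposition, and each scaled eigenvector is reshaped into a Kraus operator. The randomly generated channel with six redundant Kraus operators and the helper names are assumptions made for illustration.

```python
import numpy as np

def choi_from_kraus(kraus):
    """Choi matrix sum_k (1 (x) K_k)|Omega><Omega|(1 (x) K_k)^*."""
    # (1 (x) K)|Omega> has component K[a, i] at index (i, a), i.e. K.T flattened.
    vecs = [K.T.reshape(-1) for K in kraus]
    return sum(np.outer(v, v.conj()) for v in vecs)

def kraus_from_choi(C, d1, d2, tol=1e-10):
    """Hilbert-Schmidt orthogonal Kraus operators A_k = I(psi_k) from C >= 0."""
    evals, evecs = np.linalg.eigh(C)
    return [np.sqrt(lam) * v.reshape(d1, d2).T
            for lam, v in zip(evals, evecs.T) if lam > tol]

rng = np.random.default_rng(2)
d = 2

# Six random operators, normalized so that sum_k K_k^* K_k = 1.
raw = [rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)) for _ in range(6)]
S = sum(K.conj().T @ K for K in raw)
w, U = np.linalg.eigh(S)
S_inv_sqrt = (U / np.sqrt(w)) @ U.conj().T
kraus = [K @ S_inv_sqrt for K in raw]

C = choi_from_kraus(kraus)
kraus_min = kraus_from_choi(C, d, d)

# Compare both representations on a test state.
rho = np.array([[0.7, 0.1 - 0.2j], [0.1 + 0.2j, 0.3]])
out_original = sum(K @ rho @ K.conj().T for K in kraus)
out_minimal = sum(A @ rho @ A.conj().T for A in kraus_min)

# Gram matrix w.r.t. the Hilbert-Schmidt inner product; should be diagonal.
gram = np.array([[np.trace(A.conj().T @ B) for B in kraus_min] for A in kraus_min])
```

The number of reconstructed operators is at most $d_1 d_2 = 4$, in line with the bound $r \leq d_1 d_2$, even though the channel was specified with six Kraus operators.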
Instruments For describing processes that output classical information in the

form of a measurement outcome and a post-measurement quantum system, it
is useful to introduce instruments. In a way, instruments generalize quantum
channels and POVMs by merging them. We begin with the formal definition:
Definition 1.51 (Instrument (in Schrödinger picture)). Let $(X, \mathfrak{B})$ be a measurable space and denote by $CP(\mathcal{H}_1, \mathcal{H}_2)$ the set of completely positive maps from $\mathcal{B}_1(\mathcal{H}_1)$ to $\mathcal{B}_1(\mathcal{H}_2)$. A map $I : \mathfrak{B} \to CP(\mathcal{H}_1, \mathcal{H}_2)$, $Y \mapsto I_Y$ is called an instrument if (i) $I_X$ is trace-preserving, and (ii) for all countable disjoint partitions $X = \cup_k X_k$ with $X_k \in \mathfrak{B}$ it holds that $I_X(\rho) = \sum_k I_{X_k}(\rho)$ with convergence in trace-norm for all $\rho \in \mathcal{B}_1(\mathcal{H}_1)$.
Note that the definition implies that $I_J + I_Y = I_{J \cup Y}$ for all disjoint $J, Y \in \mathfrak{B}$.
The interpretation of an instrument is as follows. Upon input of a quantum system characterized by a density operator $\rho \in \mathcal{B}_1(\mathcal{H}_1)$, the instrument yields two outputs: (i) a measurement result that is contained in $Y$ with probability $p(Y) := \mathrm{tr}[I_Y(\rho)]$ and (ii) a quantum system described by a density operator in $\mathcal{B}_1(\mathcal{H}_2)$. Conditioned on having received a measurement outcome in $Y$, the quantum system at the output is described by the density operator $I_Y(\rho)/p(Y)$. That is, if one ignores the measurement outcome, the instrument gives rise to a quantum channel $I_X$, and if one ignores (i.e., traces out) the quantum output, the instrument gives rise to a POVM $Y \mapsto I_Y^*(1)$.
One way to arrive at an instrument is to use a quantum channel T : B1 (H1 ) →
B1 (H2 ⊗ H3 ) that outputs a composite system of which one part undergoes a
measurement that is described by a POVM M : B → B(H3 ). This results in an
instrument of the form
$$I_Y(\rho) = \mathrm{tr}_3\big[\big(1 \otimes M(Y)\big)\, T(\rho)\big].$$

In fact, one can show that every instrument can be obtained in this way.
For any quantum channel and any discrete POVM there are simple ways of
constructing an instrument that implements the channel or the POVM, respec-
tively.
On the one side, given a quantum channel $T$ with Kraus representation $T(\cdot) = \sum_{i \in X} K_i \cdot K_i^*$ where $X \subseteq \mathbb{N}$ is any index set, we can construct an instrument via $I_Y(\cdot) := \sum_{i \in Y} K_i \cdot K_i^*$. Here $\mathfrak{B}$ would simply be the set of all subsets of $X$. This instrument ‘implements’ $T$ in the sense that $I_X = T$.
On the other side, given a POVM $M$ on a discrete measurable space $(X, \mathfrak{B})$ with $\mathfrak{B}$ the powerset of $X$, we can construct an instrument
$$I_Y(\rho) := \sum_{i \in Y} M(\{i\})^{1/2}\, \rho\, M(\{i\})^{1/2}. \tag{1.71}$$
This is called the Lüders instrument corresponding to the POVM $M$. The instrument implements $M$ in the sense that $M(Y) = I_Y^*(1)$ for all $Y \in \mathfrak{B}$. If the POVM $M$ is in addition projection valued, then Eq.(1.71) is said to be an ideal measurement or an ideal instrument. Traditionally, these are the ones that are used in quantum mechanics text-books to describe measurements and their effect on the quantum system.
Note that one property of ideal measurements is repeatability. Physically,
this means that if we repeat the measurement (with the same ideal instrument),
then the outcome of the second measurement will be identical to the outcome
of the first measurement. Mathematically, this is reflected by the fact that
IY ◦ IY = IY for any Y ∈ B.
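Repeatability of ideal instruments, and its failure for unsharp POVMs, can be seen in a small sketch. It assumes NumPy and a discrete POVM given as a list of effects $M_i = M(\{i\})$ on $\mathbb{C}^3$; the concrete state and effects are illustrative choices.

```python
import numpy as np

def sqrtm_psd(M):
    """Square root of a positive semidefinite matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(M)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.conj().T

def lueders(povm, Y, rho):
    """I_Y(rho) = sum_{i in Y} M_i^{1/2} rho M_i^{1/2} for a discrete POVM."""
    roots = [sqrtm_psd(povm[i]) for i in Y]
    return sum(R @ rho @ R for R in roots)

rho = np.array([[0.5, 0.2, 0.1],
                [0.2, 0.3, 0.0],
                [0.1, 0.0, 0.2]], dtype=complex)

# Ideal instrument: a PVM made of the three basis projections.
pvm = [np.diag(row) for row in np.eye(3)]
Y = [0, 1]
once = lueders(pvm, Y, rho)
twice = lueders(pvm, Y, once)
prob_Y = np.trace(once).real            # equals tr[rho M(Y)] = rho_00 + rho_11

# Unsharp two-outcome POVM: repeating the measurement changes the output.
M0 = np.diag([0.8, 0.5, 0.1])
unsharp = [M0, np.eye(3) - M0]
once_unsharp = lueders(unsharp, [0], rho)
twice_unsharp = lueders(unsharp, [0], once_unsharp)
```

For the PVM, `twice` coincides with `once`, which is the relation $I_Y \circ I_Y = I_Y$; for the unsharp POVM the two outputs differ.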
Commuting dilations One of the recurrent mantras of quantum information

theory is the use of larger Hilbert spaces for simplifying mathematical represen-
tations. We have already seen two incarnations of this: the purification of mixed
state density operators and the representation of a quantum channel by a uni-
tary evolution acting on system plus environment. In this section we apply the
same mantra first to POVMs and later to sets of operators and represent them,
in a larger space, by PVMs and sets of commuting operators, respectively. The
core result is the following:
Theorem 1.52 (Naimark’s dilation theorem). Let $M : \mathfrak{B} \to \mathcal{B}(\mathcal{H})$ be a POVM on a measurable space $(X, \mathfrak{B})$. There exists a Hilbert space $\mathcal{K}$, an isometry $V : \mathcal{H} \to \mathcal{K}$ and a PVM $M' : \mathfrak{B} \to \mathcal{B}(\mathcal{K})$ s.t. for all $Y \in \mathfrak{B}$:
$$V^* M'(Y) V = M(Y). \tag{1.72}$$
If the set $X$ of measurement outcomes is finite, one can choose $\dim(\mathcal{K}) = \sum_{x \in X} \mathrm{rank}\, M_x$, where $M_x := M(\{x\})$ corresponds to the measurement outcome $x \in X$.
We will provide an elementary proof for the case of finitely many measure-
ment outcomes, and sketch later how the general case follows from Stinespring’s
dilation theorem.
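Proof. We define $\tilde{\mathcal{K}} := \bigoplus_{x \in X} \mathcal{K}_x$ with $\mathcal{K}_x := \ker(M_x)^\perp$ and equip it with an inner product
$$\langle \varphi, \phi \rangle_{\mathcal{K}} := \sum_{x \in X} \langle \varphi_x, M_x \phi_x \rangle,$$
where $\varphi = \oplus_x \varphi_x$ and $\phi = \oplus_x \phi_x$. The space $\mathcal{K}$ is then chosen to be the completion of $\tilde{\mathcal{K}}$ w.r.t. this inner product. Therefore, $\dim(\mathcal{K}) = \sum_{x \in X} \mathrm{rank}\, M_x$. $\mathcal{H}$ is isometrically embedded in $\tilde{\mathcal{K}}$, and thus in $\mathcal{K}$, as follows: for any $\psi \in \mathcal{H}$ let $\psi_x$ be the projection of $\psi$ to $\mathcal{K}_x$. Then $\Psi := \oplus_x \psi_x$ satisfies
$$\langle \Psi, \Psi \rangle_{\mathcal{K}} = \sum_{x \in X} \langle \psi_x, M_x \psi_x \rangle = \sum_x \langle \psi, M_x \psi \rangle = \langle \psi, \psi \rangle.$$
So $V : \psi \mapsto \Psi$ is an isometry. Defining $1_x$ the identity operator on $\mathcal{K}_x$ we construct a PVM $M'$ by setting $M'_x := 1_x$. Clearly, $M'_x \geq 0$, $(M'_x)^2 = M'_x$ and $\sum_x M'_x = 1$ so that $M'$ is indeed a PVM. Moreover, as desired
$$\langle \psi, V^* M'_x V \psi \rangle = \langle V\psi, M'_x V\psi \rangle_{\mathcal{K}} = \langle \psi, M_x \psi \rangle.$$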
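A Naimark dilation for finitely many outcomes can also be written down explicitly. The sketch below, assuming NumPy, uses a simpler and generally non-minimal variant of the construction in the proof: it takes $\mathcal{K} = \mathbb{C}^n \otimes \mathcal{H}$, $V\psi = \sum_x |x\rangle \otimes M_x^{1/2}\psi$ and $M'_x = |x\rangle\langle x| \otimes 1$, so that $\dim(\mathcal{K}) = n \dim(\mathcal{H})$ rather than $\sum_x \mathrm{rank}\, M_x$. The three-outcome qubit POVM and the helper names are illustrative assumptions.

```python
import numpy as np

def sqrtm_psd(M):
    """Square root of a positive semidefinite matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(M)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.conj().T

def naimark_dilation(povm):
    """Isometry V and PVM {P_x} on C^n (x) C^d with V^* P_x V = M_x."""
    n, d = len(povm), povm[0].shape[0]
    # Stacking the square roots vertically realizes V psi = sum_x |x> (x) M_x^{1/2} psi.
    V = np.vstack([sqrtm_psd(M) for M in povm])
    projectors = [np.kron(np.diag(np.eye(n)[x]), np.eye(d)) for x in range(n)]
    return V, projectors

# Assumed example: three rank-one effects on a qubit that sum to the identity.
angles = [0.0, np.pi / 3, 2 * np.pi / 3]
povm = [(2 / 3) * np.outer(v, v)
        for v in ([np.cos(t), np.sin(t)] for t in angles)]

V, projectors = naimark_dilation(povm)
compressed = [V.conj().T @ P @ V for P in projectors]
```

Each entry of `compressed` reproduces the corresponding effect of the POVM, which is Eq.(1.72) for singletons $Y = \{x\}$.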
One of the consequences of Naimark’s dilation theorem is that we can regard

every POVM as arising from a sharp measurement that is performed on system
plus environment:
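Corollary 1.53 (Environment representation of POVMs). Let $M : \mathfrak{B} \to \mathcal{B}(\mathcal{H})$ be a POVM on a measurable space $(X, \mathfrak{B})$. There is a Hilbert space $\mathcal{K}_0$, a unit vector $\psi \in \mathcal{K}_0$ and a PVM $M' : \mathfrak{B} \to \mathcal{B}(\mathcal{H} \otimes \mathcal{K}_0)$ so that for all $Y \in \mathfrak{B}$:
$$\mathrm{tr}[\rho M(Y)] = \mathrm{tr}\big[\big(\rho \otimes |\psi\rangle\langle\psi|\big) M'(Y)\big] \quad \forall \rho \in \mathcal{B}_1(\mathcal{H}). \tag{1.73}$$
Conversely, if $\psi$ and $M'$ are as specified, then Eq.(1.73) uniquely defines a POVM $M : \mathfrak{B} \to \mathcal{B}(\mathcal{H})$.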
Proof. W.l.o.g. we can assume that the space $\mathcal{K}$ appearing in Naimark’s theorem (Thm.1.52) is isomorphic to $\mathcal{H} \otimes \mathcal{K}_0$ for some Hilbert space $\mathcal{K}_0$. This can be achieved by isometrically embedding $\mathcal{K}$, if necessary, into a larger space, since this does not change the main result of Naimark’s theorem. For the same reason, we can assume that the isometry $V : \mathcal{H} \to \mathcal{K}$ of Naimark’s theorem is such that $V(\mathcal{H})$ is not dense in $\mathcal{K}$. Under these assumptions, by copying the argument of the proof of Thm.1.44, we can extend the isometry to a unitary $U \in \mathcal{B}(\mathcal{H} \otimes \mathcal{K}_0)$ so that $V = U(1 \otimes |\psi\rangle)$ for some unit vector $\psi \in \mathcal{K}_0$. Naimark’s theorem then leads to Eq.(1.73) after absorbing the unitary $U$ into the PVM $M'$.
For the converse direction note that $M(Y) := (1 \otimes \langle\psi|)\, M'(Y)\, (1 \otimes |\psi\rangle)$ inherits all necessary properties for becoming a POVM from $M'$.

A simple but central aspect of Naimark’s theorem is that operators that are
in general not commuting are represented by commuting ones in a larger space.
This point is emphasized in the following corollary:
Corollary 1.54 (Commuting Hermitian dilations). Let H1 , . . . , Hn ∈ B(H)

be Hermitian operators. There is a Hilbert space K of dimension dim(K) ≤
(n + 1)dim(H), an isometry V : H → K and pairwise commuting Hermitian
operators K1 , . . . , Kn ∈ B(K) s.t. Hi = V ∗ Ki V for all i.
Proof. Let $H_i = B_i - B_{n+i}$ be the decomposition of $H_i$ into its orthogonal positive and negative part. With $c := \big\| \sum_{i=1}^{2n} B_i \big\|_\infty$ define $A_i := B_i / c$ and $A_0 := 1 - \sum_{i=1}^{2n} A_i$. Then $A_0, \ldots, A_{2n}$ are positive operators that sum up to one, and therefore can be regarded as forming a POVM. To this POVM we can apply Naimark’s dilation theorem. The dimension of the dilation space $\mathcal{K}$ can then be bounded by $\dim(\mathcal{K}) \leq \sum_{i=0}^{2n} \mathrm{rank}(A_i) \leq (n+1)\dim(\mathcal{H})$ where the last inequality follows from $\mathrm{rank}(A_i) + \mathrm{rank}(A_{n+i}) \leq \dim(\mathcal{H})$ for all $i \in \{1, \ldots, n\}$. If we denote by $P_k \in \mathcal{B}(\mathcal{K})$ the orthogonal projection that Naimark’s theorem assigns to $A_k$ via the relation $A_k = V^* P_k V$, then we can express
$$H_i = V^* K_i V, \quad \text{with } K_i := c\,(P_i - P_{n+i}).$$
Commutativity of the $P_i$’s then implies that all $K_i$’s commute as well.
If we do not insist on Hermiticity of the commuting dilations, there is an

even simpler construction whose proof does not resort to Naimark’s theorem:
Proposition 1.55 (Commuting dilations). For any finite sequence of operators $A_0, \ldots, A_{n-1} \in \mathcal{B}(\mathcal{H})$ there exist pairwise commuting operators $K_0, \ldots, K_{n-1} \in \mathcal{B}(\mathbb{C}^n \otimes \mathcal{H})$ and a unit vector $|0\rangle \in \mathbb{C}^n$ s.t.
$$A_k = \big(\langle 0| \otimes 1\big)\, K_k\, \big(|0\rangle \otimes 1\big) \quad \forall k. \tag{1.74}$$
This means that we can regard $K_k$ as a (possibly infinite) ‘block matrix’ that contains $A_k$ in its north-west block.
Proof. Regarding the range of all indices as $\mathbb{Z}_n$ with addition modulo $n$ we set
$$K_k := \sum_{i,j} |i\rangle\langle j| \otimes A_{i-j+k},$$
for a fixed orthonormal basis $\{|i\rangle\}_{i=0}^{n-1} \subset \mathbb{C}^n$. This construction clearly satisfies Eq.(1.74). To see that this leads to a commuting set of operators note that
$$K_{k_1} K_{k_2} = \sum_{i_1, j_2} |i_1\rangle\langle j_2| \otimes \sum_{j_1} A_{i_1 - j_1 + k_1} A_{j_1 - j_2 + k_2}. \tag{1.75}$$
Replacing $j_1$ with $j_1 - k_2 + k_1$ does not change this expression (as we sum over all $j_1$ anyhow) but it effectively interchanges $k_1 \leftrightarrow k_2$. Hence, $K_{k_1} K_{k_2} = K_{k_2} K_{k_1}$.
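The block construction of the proof translates directly into code. The following sketch assumes NumPy and finite-dimensional $\mathcal{H} = \mathbb{C}^d$; the random input operators and the function name `commuting_dilations` are illustrative.

```python
import numpy as np

def commuting_dilations(ops):
    """K_k = sum_ij |i><j| (x) A_{i-j+k}, with indices taken modulo n."""
    n, d = len(ops), ops[0].shape[0]
    dilations = []
    for k in range(n):
        K = np.zeros((n * d, n * d), dtype=complex)
        for i in range(n):
            for j in range(n):
                # Block (i, j) of K_k is A_{(i - j + k) mod n}.
                K[i * d:(i + 1) * d, j * d:(j + 1) * d] = ops[(i - j + k) % n]
        dilations.append(K)
    return dilations

rng = np.random.default_rng(5)
n, d = 3, 2
ops = [rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)) for _ in range(n)]
Ks = commuting_dilations(ops)

# Largest commutator norm over all pairs; should vanish up to rounding.
max_commutator = max(np.linalg.norm(Ks[a] @ Ks[b] - Ks[b] @ Ks[a])
                     for a in range(n) for b in range(n))
```

The original operators, which in general do not commute, reappear as the north-west blocks `Ks[k][:d, :d]`.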
Exercise 1.26 (Complete positivity). Consider finite-dimensional Hilbert spaces.

(a) Show that any linear map T : B(H1 ) → B(H2 ) can be written as a linear combi-
nation of four completely positive maps.
(b) Write matrix transposition Θ(A) := AT as a real linear combination of two com-
pletely positive maps.
(c) Use the definition of complete positivity to prove that X → AXA∗ is completely
positive for any A ∈ B(H1 , H2 ).
(d) Show that if T1 , T2 are completely positive maps, then T1 ◦ T2 , T1 + T2 , T1 ⊗ T2
are completely positive as well.
(e) Show that for the partial trace(s) positivity implies complete positivity by using not
much more than the definitions of the partial trace and of complete positivity.
Exercise 1.27 (Positive but not completely). Let K ∈ Cd×d be such that K T = −K and
K ∗ K ≤ 1. Show that the map T : Cd×d → Cd×d defined as T (X) := tr [X] 1 − X −
KX T K ∗ is positive. Is it completely positive?
Exercise 1.28 (Kraus operators).
(a) Which is the minimal number of Kraus operators necessary to represent the phase
damping channel ?
(b) Decoherence and decay processes can often be described by a map of the form
$$T(\rho) = e^{-t} \rho + \big(1 - e^{-t}\big)\, \mathrm{tr}[\rho]\, \sigma,$$
where $t \in \mathbb{R}_+$ and $\sigma$ is a density operator. Find a Kraus representation for this map.
Exercise 1.29 (Dual maps). Derive the dual map (i.e., description in the Heisenberg
picture) of the following quantum channels:
(a) T (ρ) := λρ + (1 − λ)tr [ρ] σ, where σ is a density operator and λ ∈ [0, 1].
(b) The partial trace tr2 : B1 (H1 ⊗ H2 ) → B1 (H1 ).
(c) T (ρ) := ρ ⊗ σ where σ is a density operator.
(d) T (ρ) := (1tr [ρ] + ρT )/(d − 1), with d < ∞ the Hilbert space dimension.
Exercise 1.30 (Commuting dilations).
(a) Let σ1 , σ2 ∈ C2×2 be the first two Pauli matrices. Give an explicit construction
of two Hermitian, commuting block matrices Σ1 , Σ2 that are such that σi is the
north-west block of Σi , for both i ∈ {1, 2}.
(b) Prove the following statement: there is a Hilbert space K, an isometry V :
Cd → K and a Hermiticity-preserving linear map R : B(Cd ) → B(K) so that
(i) [R(ρ), R(σ)] = 0 for all ρ, σ ∈ B(Cd ) and (ii) V ∗ R(ρ)V = ρ for all ρ ∈ B(Cd ).
1.7 Unbounded operators and spectral measures

In this section we will have a brief look at how to generalize what we know
about Hermitian bounded operators to their unbounded ‘self-adjoint’ relatives.
An unbounded operator $A$ can usually not be defined on the entire Hilbert space $\mathcal{H}$ so that it is necessary to introduce its domain $D(A) \subseteq \mathcal{H}$. The adjoint $A^*$ of an operator $A : D(A) \to \mathcal{H}$ also has to be defined with more care. For that it will be necessary that $D(A)$ is a dense subspace of $\mathcal{H}$. The adjoint can then be uniquely defined on $D(A^*) := \{\varphi \in \mathcal{H} \mid \psi \mapsto \langle \varphi, A\psi \rangle \text{ is continuous on } D(A)\}$ so that
$$\langle \varphi, A\psi \rangle = \langle A^* \varphi, \psi \rangle \quad \forall \psi \in D(A),\ \varphi \in D(A^*). \tag{1.76}$$
This definition directly exploits the Riesz-representation theorem, which only gives rise to uniqueness of $A^*$ if $D(A)$ is dense. $D(A^*)$, however, is not automatically dense: it may even happen that $D(A^*) = \{0\}$.
A densely defined operator A is called self-adjoint if A = A∗ and D(A) =
D(A∗ ). So bounded Hermitian operators are special cases of self-adjoint op-
erators. By the Hellinger-Toeplitz theorem a self-adjoint operator is bounded
iff it can be defined on all of H. This underlines that considering domains is
unavoidable for unbounded operators.
If $A$ is self-adjoint, then $(A^*)^* = A$ and the ranges of $A \pm i1$ are the entire Hilbert space. The latter is related to the fact that the Cayley transform $(A - i1)(A + i1)^{-1} =: U$ of a self-adjoint operator $A$ defines a unitary. Exploiting this relation, von Neumann was able to use the spectral theorem for unitaries, which are necessarily bounded, to prove a spectral theorem for self-adjoint operators.
One formulation of the spectral theorem is in terms of projection valued measures (PVMs). For any self-adjoint operator $A$ there is a PVM $P : \mathfrak{B} \to \mathcal{B}(\mathcal{H})$, where $\mathfrak{B}$ is the Borel σ-algebra on $\mathbb{R}$, so that
$$A = \int_{\mathbb{R}} \lambda \, dP(\lambda). \tag{1.77}$$
The integral is understood in the following weak sense: for any $\psi \in D(A)$, $\varphi \in \mathcal{H}$ we can define a Borel-measure $\mu : \mathfrak{B} \to \mathbb{C}$ via $\mu(Y) := \langle \varphi, P(Y)\psi \rangle$ that satisfies $\langle \varphi, A\psi \rangle = \int_{\mathbb{R}} \lambda \, d\mu(\lambda)$. The PVM $P$ that is associated to $A$ is called its spectral measure and one can show that there is a one-to-one correspondence between self-adjoint operators and PVMs on $(\mathbb{R}, \mathfrak{B})$. Not surprisingly, $\lambda \in \mathbb{R}$ is an eigenvalue of $A$ iff $P(\{\lambda\}) \neq 0$. In this case $P(\{\lambda\})$ is the corresponding spectral projection.
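As in the compact case, the spectral representation in Eq.(1.77) leads directly to a functional calculus. For any measurable function $f : \mathbb{R} \to \mathbb{C}$ we can define
$$f(A) = \int_{\mathbb{R}} f(\lambda) \, dP(\lambda) \tag{1.78}$$
on
$$D\big(f(A)\big) := \Big\{ \varphi \in \mathcal{H} \ \Big|\ \int_{\mathbb{R}} |f(\lambda)|^2 \, d\langle \varphi, P(\lambda)\varphi \rangle < \infty \Big\}.$$
If $f$ is bounded, then Eq.(1.78) gives rise to a bounded operator.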
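In finite dimensions every Hermitian matrix is bounded and the spectral integral reduces to a finite sum over eigenvalues, so the following sketch can only illustrate the algebraic side of Eq.(1.78) and of the Cayley transform, not the domain questions that are the actual subject of this section. It assumes NumPy; the random Hermitian matrix and the function name are illustrative.

```python
import numpy as np

def functional_calculus(A, f):
    """f(A) = sum_lam f(lam) P({lam}) for a Hermitian matrix A."""
    evals, evecs = np.linalg.eigh(A)
    return (evecs * f(evals)) @ evecs.conj().T

rng = np.random.default_rng(4)
X = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (X + X.conj().T) / 2
one = np.eye(4)

# Cayley transform, computed directly and via the functional calculus.
U_direct = (A - 1j * one) @ np.linalg.inv(A + 1j * one)
U_calculus = functional_calculus(A, lambda lam: (lam - 1j) / (lam + 1j))

# Consistency check of the calculus with a polynomial.
A_squared = functional_calculus(A, lambda lam: lam ** 2)
```

Since $|(\lambda - i)/(\lambda + i)| = 1$ for real $\lambda$, the resulting matrix is unitary.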

Chapter 2
Basic trade-offs
2.1 Uncertainty relations

Heisenberg’s uncertainty relation is one of the most famous consequences of the
formalism of quantum theory. It is one out of at least three superficially related
consequences that can be traced back to Heisenberg’s original paper:
◦ Preparation uncertainty relations: Constraints on individual states regard-

ing how sharp the values of different observables can be in that state.
◦ Measurement uncertainty relations: Constraints on different measurements

concerning their simultaneous implementation.
◦ Measurement-disturbance relations: Constraints on the minimal distur-

bance caused by a quantum measurement.
We will discuss central aspects of these three points in the following two sections.
In the case of observables or sharp POVMs, a central property in the discussion
of uncertainty relations for both preparation and measurement will be the non-
commutativity of operators. So, let us briefly recall some notation and useful
mathematical background related to commutators.
The commutator of two operators that act on the same space will be written
as [A, B] := AB − BA. If the operators are Hilbert-Schmidt class, then the
commutator is obviously trace-less and if A, B are Hermitian, the commutator
is anti-Hermitian (i.e., it becomes Hermitian when multiplied by i). A is said
to commute with B if [A, B] = 0. If a collection of normal, compact operators
commute pairwise, then they can be diagonalized simultaneously. That is, there
is a basis in which they are all diagonal. An analogous statement is true for
arbitrary sets of normal operators. Via continuous functional calculus this implies that if $[A, B] = 0$ holds for two normal operators, then $[f(A), B] = 0$ holds as well for any continuous function $f$. In particular, it holds for $\sqrt{A}$ when $A$ is positive.
Variance-based preparation uncertainty relations
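Theorem 2.1 (Robertson uncertainty relation). Let $H_1, \ldots, H_n \in \mathcal{B}(\mathcal{H})$ be Hermitian, $\rho \in \mathcal{B}_1(\mathcal{H})$ a density operator and define $\langle X \rangle := \mathrm{tr}[\rho X]$ for any $X \in \mathcal{B}(\mathcal{H})$. Then the $n \times n$ covariance matrix$^1$ $V_{kl} := \frac{1}{2} \big\langle \{H_k - \langle H_k \rangle, H_l - \langle H_l \rangle\}_+ \big\rangle$ and the commutator matrix $\sigma_{kl} := \frac{i}{2} \langle [H_k, H_l] \rangle$ satisfy
$$V \geq i\sigma, \quad \text{and} \quad \det(V) \geq \det(\sigma). \tag{2.1}$$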
Remark: Positivity of covariance matrices is a well-known and simple to

show property for classical random variables. In the quantum context, the new
term that leads to a more demanding inequality is the commutator matrix.
Proof. We abbreviate $H_k - \langle H_k \rangle 1 =: A_k$ and define an $n \times n$ matrix $R_{kl} := \langle A_k A_l \rangle$. The claim is that $R \geq 0$. In order to see this, note that $R_{kl} = \langle A_k \sqrt{\rho}, A_l \sqrt{\rho} \rangle$ is a Gram matrix w.r.t. the Hilbert-Schmidt inner product and thus positive. Decomposing every matrix element $R_{kl}$ into real and imaginary part and using that $R_{kl} = \overline{R_{lk}}$ together with $[A_k, A_l] = [H_k, H_l]$ we obtain $R = V - i\sigma$. So the l.h.s. of Eq.(2.1) is just a reformulation of $R \geq 0$.
The determinant inequality, in turn, is implied by V ≥ iσ. Here, a central
ingredient in the argumentation is the anti-symmetry of σ. First, this implies
that det(σ) can be non-zero only in even dimensions. Second, assuming even
dimensions, σ can be block-diagonalized to a direct-sum of anti-symmetric 2 × 2
matrices via orthogonal transformations. From here one can use a classical result
by Williamson on symplectic normal forms, which allows one to map $V \mapsto S V S^T$
to the same block-diagonal structure while keeping $\sigma$ unchanged. Hence, the
sought implication is reduced to the one for 2×2 matrices, where it can be shown
by direct computation. For an alternative proof of the determinant inequality
see Exercise 2.4.
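Both inequalities of Eq.(2.1) can be probed numerically. The sketch below assumes NumPy and uses randomly drawn Hermitian matrices and a random full-rank density matrix on $\mathbb{C}^4$; these inputs and all function names are illustrative. The matrix inequality is checked for three observables and the determinant inequality both for all three and for the first two of them.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_hermitian(d):
    X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (X + X.conj().T) / 2

def random_density(d):
    X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = X @ X.conj().T
    return rho / np.trace(rho).real

def covariance_and_commutator(Hs, rho):
    """Covariance matrix V and commutator matrix sigma of Thm.2.1."""
    n, d = len(Hs), rho.shape[0]
    mean = lambda X: np.trace(rho @ X)
    As = [H - mean(H).real * np.eye(d) for H in Hs]   # centered observables
    V = np.zeros((n, n))
    sigma = np.zeros((n, n))
    for k in range(n):
        for l in range(n):
            V[k, l] = 0.5 * mean(As[k] @ As[l] + As[l] @ As[k]).real
            # <[H_k, H_l]> is purely imaginary, so (i/2)<[H_k, H_l]> is real.
            sigma[k, l] = (0.5j * mean(Hs[k] @ Hs[l] - Hs[l] @ Hs[k])).real
    return V, sigma

d = 4
rho = random_density(d)
Hs = [random_hermitian(d) for _ in range(3)]
V, sigma = covariance_and_commutator(Hs, rho)

min_eig = np.linalg.eigvalsh(V - 1j * sigma).min()      # V >= i sigma
det_gap_3 = np.linalg.det(V) - np.linalg.det(sigma)     # odd n: det(sigma) = 0
det_gap_2 = np.linalg.det(V[:2, :2]) - np.linalg.det(sigma[:2, :2])
```

For odd $n$ the determinant of the anti-symmetric matrix $\sigma$ vanishes, so only the $2 \times 2$ case carries information beyond positivity of $V$.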
For a pair of observables, writing out the determinant inequality immediately

leads to the following, better known, uncertainty relation:
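Corollary 2.2 (Robertson-Schrödinger uncertainty relation). Let $A, B \in \mathcal{B}(\mathcal{H})$ be Hermitian, $\rho \in \mathcal{B}_1(\mathcal{H})$ a density operator, $\langle X \rangle := \mathrm{tr}[\rho X]$ for any $X \in \mathcal{B}(\mathcal{H})$, and $\mathrm{var}(A) := \langle A^2 \rangle - \langle A \rangle^2$. Then
$$\mathrm{var}(A)\,\mathrm{var}(B) \geq \frac{1}{4} \big| \langle [A, B] \rangle \big|^2 + \frac{1}{4} \big\langle \{A - \langle A \rangle, B - \langle B \rangle\}_+ \big\rangle^2. \tag{2.2}$$
Moreover, equality holds iff $(\alpha A - \beta B)\rho = \gamma \rho$ for some $(\alpha, \beta, \gamma) \in \mathbb{C}^3 \setminus \{0\}$.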
Remark: This corollary as well as Robertson’s uncertainty relation in Thm. 2.1 also applies to Hermitian operators that are not necessarily bounded. The point that requires additional care in this case are the domains of all involved operators. For instance, in Robertson’s uncertainty relation, if $\rho = |\psi\rangle\langle\psi|$ and if $D(H_k H_l)$ is the domain of $H_k H_l$, then we need $\psi \in \bigcap_{kl} D(H_k H_l)$. In this way, Heisenberg’s uncertainty relation for position and momentum is obtained from Cor.2.2 by neglecting the covariance term on the r.h.s. and inserting $i1$ for the commutator of the position and momentum operator.

1 Here $\{\cdot, \cdot\}_+$ denotes the anti-commutator, defined as $\{A, B\}_+ = AB + BA$ and $H_k - \langle H_k \rangle$ should be read as $H_k - \langle H_k \rangle 1$.
Proof. As pointed out already, the inequality stated in Eq.(2.2) is just a reformulation of the determinant inequality in Eq.(2.1) for the special case of two observables. In order to characterize cases of equality we will, however, use a different proof. Assume for the moment that $\rho = |\psi\rangle\langle\psi|$ and set $\tilde{A} := A - \langle A \rangle 1$, $\tilde{B} := B - \langle B \rangle 1$. Then Cauchy-Schwarz gives
$$\big\| \tilde{A}\psi \big\|^2 \big\| \tilde{B}\psi \big\|^2 \geq \big| \langle \psi, \tilde{A}\tilde{B}\psi \rangle \big|^2 = \big( \mathrm{Re} \langle \psi, \tilde{A}\tilde{B}\psi \rangle \big)^2 + \big( \mathrm{Im} \langle \psi, \tilde{A}\tilde{B}\psi \rangle \big)^2. \tag{2.3}$$
Inserting the expressions defining $\tilde{A}$ and $\tilde{B}$ then leads to the claimed uncertainty relation in Eq.(2.2) for pure states. The advantage of this proof is that we know that equality in the Cauchy-Schwarz inequality, and thus in the uncertainty relation, holds iff $\alpha \tilde{A}\psi = \beta \tilde{B}\psi$ for some $\alpha, \beta \in \mathbb{C}$. This proves the claimed characterization of cases of equality for pure states (with $\gamma$ necessarily being equal to $\alpha \langle A \rangle - \beta \langle B \rangle$).
The result can be lifted to mixed states by purification (Cor.1.38). If a unit vector $\psi \in \mathcal{H}_1 \otimes \mathcal{H}_2$ characterizes a purification of $\rho = \mathrm{tr}_2\big[|\psi\rangle\langle\psi|\big]$ and if we use $\tilde{A} \otimes 1$ and $\tilde{B} \otimes 1$ in Eq.(2.3) instead of $\tilde{A}$ and $\tilde{B}$, then we arrive at the general form of the uncertainty relation in Eq.(2.2) for mixed states. Equality is then attained iff $\psi$ is in the kernel of $(\alpha \tilde{A} - \beta \tilde{B}) \otimes 1$ for some $\alpha, \beta \in \mathbb{C}$. Exploiting the Schmidt-decomposition of $\psi$ (1.37) we can see that this is equivalent to the statement that every eigenvector of $\rho$ that corresponds to a non-zero eigenvalue has to be in the kernel of $(\alpha \tilde{A} - \beta \tilde{B})$. This, in turn, is equivalent to the claimed characterization.
States, in particular pure states, that achieve equality in this uncertainty relation are sometimes called minimal uncertainty states. One should keep in mind, however, that they might not minimize the product of the variances (‘uncertainties’) among all states. Imposing equality only means that the two sides are equal; they are not necessarily small.
Joint measurability
Definition 2.3 (Joint measurability). Two POVMs Mi : Bi → B(H), i ∈ {1, 2}

on measurable spaces (Xi , Bi ) are jointly measurable if there exists a POVM
M : B → B(H) defined on the product σ-algebra B on X1 × X2 s.t.
M (Y1 , X2 ) = M1 (Y1 ) ∀Y1 ∈ B1 ,

M (X1 , Y2 ) = M2 (Y2 ) ∀Y2 ∈ B2 .
Theorem 2.4 (Joint measurability vs. commutativity). Consider two POVMs

Mi : Bi → B(H), i ∈ {1, 2} on measurable spaces (Xi , Bi ) and assume that at
least one of them is sharp (i.e. projection valued). Then M1 and M2 are jointly
measurable iff they commute in the sense that ∀Yi ∈ Bi : [M1 (Y1 ), M2 (Y2 )] = 0.
In that case the joint POVM M : B → B(H) is characterized by M (Y1 × Y2 ) =
M1 (Y1 )M2 (Y2 ).
Proof. Assume that the two POVMs commute. Since commutativity is a property that extends to the square root, we can use that
$$M_1(Y_1) M_2(Y_2) = \sqrt{M_1(Y_1)}\, M_2(Y_2)\, \sqrt{M_1(Y_1)} =: M(Y_1 \times Y_2)$$
defines a proper POVM, which by construction has $M_1$ and $M_2$ as its marginals in the sense of Def.2.3. So the two POVMs are jointly measurable. Note that for this direction we haven’t used that any of the POVMs is sharp.
Conversely, suppose there is a joint POVM M and that M1 is projection-
valued. The core of the argument will be the fact that if a positive operator A is
bounded from above by a projection P ≥ A, then A = AP = P A (cf. Exercise
1.7). This applies, in particular, to M1 (Y1 ) ≥ M (Y1 × Y2 ) and similarly to the
case where Y1 is replaced by Ȳ1 := X1 \ Y1 . Since M1 (Y1 )M1 (Ȳ1 ) = 0 this leads
to M (Ȳ1 × Y2 )M1 (Y1 ) = 0 and with M (Y1 × Y2 )M1 (Y1 ) = M (Y1 × Y2 ) we obtain
M2 (Y2 )M1 (Y1 ) = M (Y1 × Y2 )M1 (Y1 ) + M (Ȳ1 × Y2 )M1 (Y1 )

= M (Y1 × Y2 ).
Following the same steps, we can show that M1 (Y1 )M2 (Y2 ) = M (Y1 × Y2 ).
Hence, M1 commutes with M2 .
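The first direction of the proof can be illustrated with two commuting two-outcome POVMs on $\mathbb{C}^2 \otimes \mathbb{C}^2$, assuming NumPy. The sharp POVM acts on the first factor and the unsharp one on the second; the effect $E$ and all variable names are illustrative assumptions. For contrast, the last lines show a non-commuting pair on $\mathbb{C}^2$, for which the product of effects is not even Hermitian.

```python
import numpy as np

id2 = np.eye(2)
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
E = np.array([[0.7, 0.2], [0.2, 0.5]])           # an unsharp effect, 0 <= E <= 1

M1 = [np.kron(P0, id2), np.kron(P1, id2)]        # sharp POVM on the first factor
M2 = [np.kron(id2, E), np.kron(id2, id2 - E)]    # unsharp POVM on the second factor

# Joint POVM on the product outcome space: M(a, b) = M1(a) M2(b).
joint = {(a, b): M1[a] @ M2[b] for a in range(2) for b in range(2)}

marginal_1 = [joint[(a, 0)] + joint[(a, 1)] for a in range(2)]
marginal_2 = [joint[(0, b)] + joint[(1, b)] for b in range(2)]
min_eigs = [np.linalg.eigvalsh(joint[key]).min() for key in joint]

# Non-commuting pair on C^2: the product P0 E fails to be Hermitian.
product = P0 @ E
```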
2.2 Information-disturbance
No information without disturbance
Theorem 2.5 (Knill-Laflamme/no information without disturbance). Let T :

B1 (H1 ) → B1 (H2 ) be a quantum channel and dim(H1 ) < ∞. The following are
equivalent:
(i) There exists a quantum channel D : B1 (H2 ) → B1 (H1 ) s.t. D ◦ T = id.

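(ii) For any Kraus representation $T(\cdot) = \sum_{j=1}^r K_j \cdot K_j^*$ there is a density matrix $\sigma \in \mathbb{C}^{r \times r}$ so that
$$K_i^* K_j = \sigma_{ij} 1 \quad \forall i, j. \tag{2.4}$$
(iii) Any instrument $I : \mathfrak{B} \to CP(\mathcal{H}_1, \mathcal{H}_2)$ on a measurable space $(X, \mathfrak{B})$ that implements the channel (in the sense that $I_X = T$) satisfies:
$$I_Y^*(1) \propto 1 \quad \forall Y \in \mathfrak{B}. \tag{2.5}$$
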
Proof. Assuming (i) and using a Kraus decomposition of D(·) = Σ_l Al · Al∗
we can exploit the bijective relation between a completely positive map and
its Choi matrix (cf. Thm.1.49) to show that (Al Kj ⊗ 1)|Ω⟩ = clj |Ω⟩ for some
complex number clj and thus Al Kj = clj 1. As D∗ is unital, this leads to
Ki∗ Kj = Σ_l Ki∗ Al∗ Al Kj = σij 1 with σij = Σ_l c̄li clj . So σ is positive and since
Σ_j Kj∗ Kj = 1 we have to have tr [σ] = 1, which proves (i)⇒(ii).
If (ii) holds, and I is an instrument that implements T , then the Kraus
operators of IY have to satisfy Eq.(2.4) as well since they appear in a partic-
ular Kraus representation of T . Consequently, IY∗ (1) is proportional to 1. So
(ii)⇒(iii).
For the converse direction ((iii)⇒(ii)) note first that for every Kraus operator
K of T there exists an instrument with two outcomes, which we may label
with K and ¬K, whose corresponding completely positive maps are given by
K · K ∗ =: IK (·) and I¬K := T − IK , respectively. Then Eq.(2.5) applied to
this instrument implies that K ∗ K ∝ 1. If Kj and Ki are two Kraus-operators,
we know from the ambiguity of the Kraus representation (Prop. 1.45) that
a multiple of any linear combination of them is a possible Kraus-operator as
well. Consequently, in particular, (Ki + γKj )∗ (Ki + γKj ) ∝ 1 for any γ ∈ C.
An application of the polarization identity of Eq.(1.7) then implies Eq.(2.4).
Positivity of σ and tr [σ] = 1 are then consequences of its definition and of
unitality of T ∗ . So (iii)⇒(ii).
For proving (ii)⇒(i) we exploit the freedom in the Kraus representation
(Prop.1.45) again. It allows us to choose Kraus-operators for which σij = δij si
is diagonal (by using the unitary that diagonalizes σ to construct new Kraus-
operators according to Prop.1.45). Then each Ki = √si Vi is a multiple of an
isometry Vi : H1 → 𝒦i ⊆ H2 where the 𝒦i ’s are mutually orthogonal subspaces
of H2 . The map D(·) := Σ_i Vi∗ · Vi then satisfies that D ◦ T = id. Moreover,
Σ_i Vi Vi∗ = Σ_i 1_{𝒦i} ≤ 1 and if equality does not hold, which is then due to
𝒦0 := H2 ⊖ (⊕i 𝒦i ) being non-zero, we can always make D trace-preserving by
adding a suitable completely positive map from B1 (𝒦0 ) to B1 (H1 ).
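The construction in the step (ii)⇒(i) can be traced in a small example. The following sketch assumes a three-qubit repetition encoding and a noise model that applies at most one bit flip; both choices and the probabilities `s` are ours and serve only as an illustration. It checks Eq.(2.4) for the Kraus operators of T and then builds D(·) = Σ_i Vi∗ · Vi from the rescaled Kraus operators.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)

def kron_all(ops):
    """Tensor product of a list of matrices."""
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out

# Encoding isometry V: |0> -> |000>, |1> -> |111>.
V = np.zeros((8, 2))
V[0, 0] = 1.0
V[7, 1] = 1.0

# Assumed noise: no error or a single bit flip, with probabilities s.
s = np.array([0.7, 0.1, 0.1, 0.1])
errors = [kron_all([I2, I2, I2]), kron_all([X, I2, I2]),
          kron_all([I2, X, I2]), kron_all([I2, I2, X])]
kraus = [np.sqrt(si) * E @ V for si, E in zip(s, errors)]   # Kraus operators of T

# sigma_ij read off from K_i^* K_j = sigma_ij * 1, cf. Eq.(2.4).
sigma = np.array([[np.trace(Ki.conj().T @ Kj) / 2 for Kj in kraus] for Ki in kraus])
kl_holds = all(np.allclose(Ki.conj().T @ Kj, sigma[i, j] * I2)
               for i, Ki in enumerate(kraus) for j, Kj in enumerate(kraus))

# sigma is already diagonal here, so V_i = K_i / sqrt(s_i) are isometries
# with mutually orthogonal ranges, and D(.) = sum_i V_i^* . V_i.
isometries = [Ki / np.sqrt(si) for Ki, si in zip(kraus, s)]

def T(rho):
    return sum(K @ rho @ K.conj().T for K in kraus)

def D(omega):
    return sum(Vi.conj().T @ omega @ Vi for Vi in isometries)

rho = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])
recovered = D(T(rho))
```

In this example the four ranges already span all of H2, so D is trace-preserving without the additional completely positive map mentioned at the end of the proof.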
As already suggested by the name given to the theorem, the equivalence of

(i) and (iii) has an interpretation that should be emphasized. Point (iii) means
that no information about an input state ρ is contained in the measurement out-
comes since their probabilities are all proportional to tr [ρ] with proportionality
constants that depend on the instrument only, and not on ρ. Point (i) on the
other hand, means that any ‘disturbance’ caused by T can be undone by some
channel D. So (i)⇒(iii) means that if there is no (uncorrectable) disturbance,
then no information about the state of the system is revealed. Conversely,
(iii)⇒(i) means that if no information has leaked into the environment, then
the effect of T can be undone.
The equivalent condition (ii) is sometimes called Knill-Laflamme condition.
The following discussion aims at explaining the appearance of this condition in
its natural environment.
Example 2.1 (Quantum error correcting codes). The condition in Eq.(2.4) plays
a crucial role in the context of quantum error correction. To see how, we first
need to define what is a quantum error correcting code (QECC). A QECC

is a linear subspace that can be thought of as being the image of an isometry
V : (C²)^{⊗k} → H := (C²)^{⊗n} . In this case k qubits are encoded into n qubits
via a quantum channel E(ρ) := V ρV ∗ . Let Pd ⊂ B(H) be the set of all tensor
products of n Pauli matrices (including σ0 = 1) that differ on at most d tensor
factors from the identity σ0 . A QECC is called an [[n, k, d]] QECC, if
V ∗F V ∝ 1 (2.6)
holds for all F ∈ Pd−1 but fails for some F ∈ Pd . What is the reason behind this
definition? Consider a quantum channel Φ : B1 (H) → B1 (H), which models the
noise/decoherence/errors that affect the n qubits, whose Kraus-operators {Ai }
are all in the linear span of Pt with t := ⌊(d−1)/2⌋. Then the Kraus operators {Ki }
of T := Φ ◦ E satisfy Eq.(2.4) so that Thm.2.5 guarantees the existence of a
decoding quantum channel D such that D ◦ Φ ◦ E = id. d is called the distance
of the code and t can be interpreted as the number of errors the code corrects.
An important point to note is that a given [[n, k, 2t + 1]]-QECC does not
only work for one noise-characterizing channel Φ, but for all channels whose
Kraus-operators are in the linear span of Pt .
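The step from Eq.(2.6) to Eq.(2.4) used in Example 2.1 can be spelled out as follows. This is a sketch in which the expansion coefficients α_{ia} and the constants c_{ab} are notation introduced only here.

```latex
\begin{align*}
A_i &= \sum_{a} \alpha_{ia} F_a \quad (F_a \in P_t), \qquad K_i = A_i V, \\
K_i^* K_j &= \sum_{a,b} \bar{\alpha}_{ia}\,\alpha_{jb}\; V^* F_a F_b V
           = \Big(\sum_{a,b} \bar{\alpha}_{ia}\,\alpha_{jb}\, c_{ab}\Big)\, 1
           =: \sigma_{ij}\, 1 .
\end{align*}
```

Here we used that every F_a is Hermitian and that the product F_a F_b is, up to a phase in {±1, ±i}, a tensor product of Pauli matrices that differs from the identity on at most 2t ≤ d − 1 tensor factors, so that Eq.(2.6) gives V∗ F_a F_b V = c_{ab} 1. Positivity of σ and tr [σ] = 1 then follow as in the proof of Thm.2.5.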
2.3 Time-energy
Mandelstam-Tamm inequalities
Theorem 2.6 (Mandelstam-Tamm inequality). Let A, H ∈ B(H) be Hermitian,

ψ(t) = exp[−itH]ψ(0), t ∈ R+ describe the time evolution of pure states in H,
⟨A⟩ := ⟨ψ(t), Aψ(t)⟩, ∆(A) := (⟨A²⟩ − ⟨A⟩²)^{1/2} and ∆(H) analogously. Then

∆(H)∆(A) ≥ (1/2) |d⟨A⟩/dt|. (2.7)
Moreover, for any Hermitian H, any unit vector ψ(0) and any τ ≥ 0 there is a
Hermitian A ∈ B(H) so that equality holds in Eq.(2.7) when evaluated at t = τ .
Remark : Note that ∆(H) is time-independent. With the necessary care

concerning the domain, Eq.(2.7) also holds for an unbounded Hermitian Hamil-
tonian H.
Proof. With d/dt ⟨A⟩ = i⟨[H, A]⟩ Eq.(2.7) is a direct consequence of the Robertson-
Schrödinger uncertainty relation (Cor.2.2) when neglecting the anti-commutator
term and taking the square root (since ∆(A) = var(A)^{1/2} ).
For showing tightness of the inequality we define A := B + B ∗ with B :=
i(H − ⟨H⟩1)|ψ(τ )⟩⟨ψ(τ )|. By construction, ⟨A⟩|t=τ = 0 as well as ⟨{A, H −
⟨H⟩1}+ ⟩|t=τ = 0. Moreover, Aψ(τ ) = i(H − ⟨H⟩1)ψ(τ ) so that equality holds in
the Robertson-Schrödinger uncertainty relation and therefore also in Eq.(2.7).
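Both parts of Thm.2.6 can be checked numerically. The following sketch uses a randomly drawn Hamiltonian, initial state and observable in dimension 4; the dimension, the time τ = 0.8, the seed and all function names are assumptions made for illustration. It evaluates the difference between the two sides of Eq.(2.7) for a generic observable and for the observable A = B + B∗ from the proof.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hermitian(dim):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (g + g.conj().T) / 2

def expval(op, psi):
    """<psi, op psi> for a Hermitian op."""
    return np.vdot(psi, op @ psi).real

def spread(op, psi):
    """Delta(op) = (<op^2> - <op>^2)^(1/2) in the state psi."""
    return np.sqrt(max(expval(op @ op, psi) - expval(op, psi) ** 2, 0.0))

def rate(H, A, psi):
    """d<A>/dt = i <[H, A]> for psi(t) = exp(-itH) psi(0)."""
    return (1j * np.vdot(psi, (H @ A - A @ H) @ psi)).real

dim, tau = 4, 0.8
H = random_hermitian(dim)
psi0 = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi0 /= np.linalg.norm(psi0)

# psi(tau) = exp(-i tau H) psi(0) via the spectral decomposition of H.
evals, evecs = np.linalg.eigh(H)
psi = evecs @ (np.exp(-1j * tau * evals) * (evecs.conj().T @ psi0))

# Generic observable: left-hand side minus right-hand side of Eq.(2.7).
A = random_hermitian(dim)
gap_generic = spread(H, psi) * spread(A, psi) - 0.5 * abs(rate(H, A, psi))

# Observable from the proof: A = B + B^*, B = i (H - <H> 1) |psi><psi|.
B = 1j * (H - expval(H, psi) * np.eye(dim)) @ np.outer(psi, psi.conj())
A_tight = B + B.conj().T
gap_tight = spread(H, psi) * spread(A_tight, psi) - 0.5 * abs(rate(H, A_tight, psi))
```

The quantity `gap_generic` should be non-negative, while `gap_tight` should vanish up to rounding errors.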
Corollary 2.7 (Life-time/energy-width uncertainty relation).

Let ψ(t) = exp[−itH]ψ(0), t ∈ R+ describe the time evolution of pure states
and define p(t) := |⟨ψ(t), ψ(0)⟩|² . Then
∆(H) t ≥ arccos √p(t) , so that (2.8)
∆(H) t1/2 ≥ π/4, ∆(H) t0 ≥ π/2, (2.9)
where t1/2 and t0 are the shortest times (according to Eq.(2.8)) for p(t) to drop
to 1/2 and 0, respectively.
Proof. We apply the Mandelstam-Tamm uncertainty relation in Eq.(2.7) to A =
|ψ(0)⟩⟨ψ(0)|. Then p(t) = ⟨A⟩ and ∆(A) = (p(t) − p(t)²)^{1/2} so that

∆(H) τ ≥ ∫_0^τ |ṗ(t)| / (2 (p(t) − p(t)²)^{1/2}) dt ≥ arccos √p(τ) ,

where the last step uses d/dt arccos √p(t) = −ṗ(t) / (2 (p(t) − p(t)²)^{1/2}) together with p(0) = 1.
The inequalities in Eq.(2.9) then follow from Eq.(2.8) by using arccos(√(1/2)) =
π/4 and arccos(0) = π/2.
Note that p(t) can be interpreted as the probability of the system still being
in its initial state after time t. That is, if a projective measurement with two
outcomes and corresponding projectors P0 := |ψ(0)⟩⟨ψ(0)| and P1 := 1 − P0
is performed after time t, then the outcome corresponding to P0 occurs with
probability p(t).
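A worked example shows that the bounds of Cor.2.7 can be attained. The two-level Hamiltonian with level spacing E and the equal-weight initial state below are assumptions chosen for illustration.

```latex
\begin{align*}
H &= E\,|1\rangle\langle 1|, \qquad \psi(0) = \tfrac{1}{\sqrt{2}}\big(|0\rangle + |1\rangle\big), \qquad
\Delta(H) = \Big(\tfrac{E^2}{2} - \tfrac{E^2}{4}\Big)^{1/2} = \tfrac{E}{2}, \\
p(t) &= \Big|\tfrac{1}{2}\big(1 + e^{iEt}\big)\Big|^2 = \cos^2\!\big(\tfrac{Et}{2}\big), \qquad
\arccos\sqrt{p(t)} = \tfrac{Et}{2} = \Delta(H)\, t \quad \text{for } 0 \le t \le \tfrac{\pi}{E}, \\
t_{1/2} &= \tfrac{\pi}{2E}, \qquad t_0 = \tfrac{\pi}{E}, \qquad
\Delta(H)\, t_{1/2} = \tfrac{\pi}{4}, \qquad \Delta(H)\, t_0 = \tfrac{\pi}{2}.
\end{align*}
```

So for this state equality holds in Eq.(2.8) up to the time t0, and both inequalities in Eq.(2.9) are saturated.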
Evolution to orthogonal states In Cor.2.7 we have seen that the Mandelstam-

Tamm inequality implies a lower bound on the time it takes for a quantum sys-
tem to evolve to an orthogonal state. We will now discuss alternative bounds
of this type and then derive a condition for the feasibility of such an evolution.
The following Lemma will be the main ingredient in the proof of the subsequent
‘quantum-speed-limit’ theorem.
Lemma 2.8 (First zero of a characteristic function). Let µ be a Borel-probability
measure on [0, ∞). Define its characteristic function χ : R → C by χ(t) :=
∫_{R+} e^{−itλ} dµ(λ) and its p’th moment for any p > 0 by mp := ∫_{R+} λ^p dµ(λ).
Then
t0 := inf{t > 0 | χ(t) = 0} ≥ π / (2mp )^{1/p} . (2.10)
With this we can prove the following:
Theorem 2.9 (Generalized Margolus-Levitin bound). Let H ≥ 0 be self-adjoint

and ψ(t) := exp[−iHt]ψ, t ∈ R+ for some unit vector ψ in the domain of H.
If ⟨ψ, H^p ψ⟩ is defined for some p > 0, and t0 := inf{t > 0 | ⟨ψ, ψ(t)⟩ = 0}, then
t0 ⟨ψ, H^p ψ⟩^{1/p} ≥ π / 2^{1/p} . (2.11)
Proof. Exploiting positivity of H and the spectral representation H = ∫_{R+} λ dP (λ)
we can define a Borel-probability measure on [0, ∞) via µ(Y ) := ⟨ψ, P (Y )ψ⟩.
For p > 0, the p’th moment of µ is then given by mp = ⟨ψ, H^p ψ⟩ and its
characteristic function by
∫_{R+} e^{−iλt} dµ(λ) = ∫_{R+} e^{−iλt} d⟨ψ, P (λ)ψ⟩ = ⟨ψ, e^{−iHt} ψ⟩.
The claim follows then from Lemma 2.8.
For p = 2 Eq.(2.11) is similar to the consequence that we obtained in Cor.2.7
from the Mandelstam-Tamm inequality. In fact, at first glance, Eq.(2.11) looks
even stronger since there is a missing factor 1/√2. Note, however, that Eq.(2.11)
requires an additional assumption, namely positivity of the Hamiltonian.
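To see that Eq.(2.11) cannot be improved in general, one can evaluate both sides for a simple state. The sketch below assumes a two-level spectrum {0, E} with E = 2 and equal weights on both eigenvectors; these values and the function names are illustrative choices, not part of the notes.

```python
import numpy as np

E = 2.0                                   # assumed level spacing
energies = np.array([0.0, E])             # spectrum of H >= 0
weights = np.array([0.5, 0.5])            # |<psi, phi_k>|^2 for the two eigenvectors

def overlap(t):
    """<psi, exp(-iHt) psi> = sum_k weights_k exp(-i energies_k t)."""
    return np.sum(weights * np.exp(-1j * energies * t))

def moment(p):
    """<psi, H^p psi> = sum_k weights_k energies_k^p."""
    return np.sum(weights * energies ** p)

t0 = np.pi / E                            # first zero of the overlap for this state
ts = np.linspace(0.0, t0, 2001)[:-1]      # grid of times strictly before t0
min_before_t0 = min(abs(overlap(t)) for t in ts)

# Left- and right-hand side of Eq.(2.11) for a few values of p.
checks = {p: (t0 * moment(p) ** (1.0 / p), np.pi / 2 ** (1.0 / p)) for p in (0.5, 1, 2, 3)}
```

For this particular state the two entries of each pair in `checks` coincide, so the bound is attained for every p considered, including the Margolus-Levitin case p = 1.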
For p = 1 Eq.(2.11) is called the Margolus-Levitin bound, which directly
relates the energy of a pure state (w.r.t. a positive Hamiltonian) to the minimal
time it takes to evolve into an orthogonal state. So far, we do, however, not
know under which circumstances a pure state ψ will ever evolve to an orthogonal
state under the time-evolution governed by the Hamiltonian H. For obtaining
a better understanding of this matter, it is useful to import the following classic
result:
Lemma 2.10 (Kronecker-Weyl). Let x ∈ [0, 1)^d be a point in the unit-cube so
that 1, x1 , . . . , xd are linearly independent over Q. Then the sequence of points
(nx)n∈N in [0, 1)^d , where each coordinate is understood mod 1, is uniformly
distributed (and thus in particular dense) in [0, 1)^d .
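The statement of Lemma 2.10 can be illustrated by counting how many points of the sequence fall into a box. The sketch below assumes the point x = (√2, √3) mod 1 (for which 1, √2, √3 are linearly independent over Q), a finite number of 20000 points and an arbitrarily chosen box; a finite sample can of course only indicate, not prove, uniform distribution.

```python
import numpy as np

# Assumed example point: 1, sqrt(2), sqrt(3) are linearly independent over Q.
x = np.array([np.sqrt(2.0), np.sqrt(3.0)]) % 1.0
n = np.arange(1, 20001).reshape(-1, 1)
points = (n * x) % 1.0                    # rows are n*x mod 1 in [0, 1)^2

def box_fraction(lower, upper):
    """Fraction of the points that fall into the box [lower, upper)."""
    inside = np.all((points >= lower) & (points < upper), axis=1)
    return inside.mean()

lower, upper = np.array([0.2, 0.5]), np.array([0.6, 0.75])
volume = np.prod(upper - lower)           # Lebesgue measure of the box
fraction = box_fraction(lower, upper)     # should be close to volume
```

Uniform distribution means that `fraction` converges to `volume` for every such box as the number of points grows.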
With this Lemma we can now show that a necessary and ‘generically’ also
sufficient condition for a pure state to ever evolve to an orthogonal state is that
its overlap with any of the eigenvectors of the Hamiltonian is not larger than
1/2:
Theorem 2.11 (Condition for reaching minimal overlap). Let dim(H) < ∞,
H ∈ B(H) Hermitian with an orthonormal basis of eigenvectors {ϕi } and cor-
responding eigenvalues {λi }.
For any unit vector ψ ∈ H define p := max_i |⟨ψ, ϕi ⟩|² and
ν := inf_{t∈R+} |⟨ψ, e^{−iHt} ψ⟩|. Then
ν ≥ max{0, 2p − 1}, (2.12)
with equality if the eigenvalues {λi } (as a multiset) are linearly independent over
Q.
Proof. Using the spectral decomposition of H we can write |⟨ψ, e^{−iHt} ψ⟩| =
|Σ_k pk e^{−iλk t}| where pk := |⟨ψ, ϕk ⟩|² . Since the pk ’s are positive and sum up to
one, this is a convex combination (i.e. a weighted average) of complex numbers
of modulus one. Assume w.l.o.g. that p = p1 . From the triangle-inequality and
using Σ_{k>1} pk = 1 − p we obtain

|Σ_k pk e^{−iλk t}| ≥ p − |Σ_{k>1} pk e^{i(λ1 −λk )t}| ≥ p − Σ_{k>1} pk = 2p − 1, (2.13)

thus proving the inequality in Eq.(2.12).

For proving that equality holds if the λk ’s are independent over Q we want to
use Lemma 2.10. To this end, note that {tλk /2π}k ∪{1} are linearly independent
iff {λk /2π}k ∪ {1/t} is. The latter can, however, always be achieved by a
suitable choice of t > 0 if only the λk ’s are linearly independent: since R is
infinite-dimensional over Q we can always find a t > 0 s.t. 1/t is linearly
independent of the λk ’s. Consequently, Lemma 2.10 implies that for any α ∈
[0, 2π)^d with d := dim(H) there is a sequence with elements tn ∈ R+ s.t.
exp[−itn λk ] → exp[iαk ] for all k ∈ {1, . . . , d}. It remains to show that there
exists an α so that |Σ_k pk exp[iαk ]| = max{0, 2p − 1}. If p ≥ 1/2, a solution is
given by α1 = 0 and αk = π for all k ≥ 2. So consider the case p < 1/2. First,
we partition {1, . . . , d} = A1 ∪ A2 ∪ A3 into disjoint subsets that are chosen so
that pAi := Σ_{k∈Ai} pk are all three smaller than 1/2. For k in A1 , A2 , A3 we set
αk equal to 0, β and γ, respectively. β and γ then have to be chosen so that
pA1 + pA2 e^{iβ} = −pA3 e^{iγ} .
To see that this is feasible, regard the two sides of this equation as parametriza-
tions of two circles in the complex plane (when varying β, γ ∈ [0, 2π)). The
circles have radii pA2 and pA3 and their centers are pA1 apart. Assuming w.l.o.g.
that pA1 is the largest of the three weights, we see that the circles always in-
tersect, i.e. there is always a solution, since pA2 + pA3 = 1 − pA1 ≥ pA1 as
pA1 ≤ 1/2.
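The bound of Thm.2.11 can be explored numerically by minimizing the overlap over a finite time grid. The weights, energies, grid sizes and function names in the following sketch are assumptions made for illustration. Since a grid minimum can only approximate the infimum from above, the second case indicates the trend rather than the exact value ν = 0.

```python
import numpy as np

def min_overlap(weights, energies, t_max, steps):
    """Minimum of |sum_k p_k exp(-i lambda_k t)| over a uniform grid on [0, t_max]."""
    ts = np.linspace(0.0, t_max, steps)
    phases = np.exp(-1j * np.outer(ts, energies))      # shape (steps, d)
    return np.abs(phases @ weights).min()

def lower_bound(weights):
    """Right-hand side of Eq.(2.12): max{0, 2p - 1} with p the largest weight."""
    return max(0.0, 2.0 * np.max(weights) - 1.0)

# Case p >= 1/2: two levels; the value 2p - 1 is reached at t = pi.
w_big = np.array([0.7, 0.3])
nu_big = min_overlap(w_big, np.array([0.0, 1.0]), 2 * np.pi, 2001)

# Case p < 1/2: three levels with assumed energies 1, sqrt(2), sqrt(3).
w_small = np.array([0.4, 0.3, 0.3])
nu_small = min_overlap(w_small, np.sqrt(np.array([1.0, 2.0, 3.0])), 200.0, 200001)
```

For two levels only the energy difference matters, so the minimum 2p − 1 is attained exactly even though the assumed energies 0 and 1 are not linearly independent over Q. In the three-level case, extending `t_max` lets the grid minimum approach the bound max{0, 2p − 1} = 0.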
Corollary 2.12 (Condition for evolving to an orthogonal state). Let dim(H) < ∞.
Except for a null set in the set of Hermitian operators H ∈ B(H), every such
H satisfies the following: for any unit vector ψ ∈ H the evolved state ψ(t) :=
exp[−iHt]ψ satisfies inf_{t∈R+} |⟨ψ, ψ(t)⟩| = 0 iff the maximal overlap
max_i |⟨ψ, ϕi ⟩|² with any of the normalized eigenvectors ϕi of H is at most 1/2.
Proof. In order to be able to use Thm.2.11, we have to exclude any H whose
eigenvalues {λi } are linearly dependent over Q. With d := dim(H) this means
that there is a q ∈ Z^d \ {0} so that ⟨q, λ⟩ = 0. For a fixed q this equation
determines a hyperplane Sq := {λ ∈ R^d | ⟨q, λ⟩ = 0} that has Lebesgue measure
µ(Sq ) = 0. Since the set of all these hyperplanes is countable, we still have
µ(∪q Sq ) = 0. Since this is equally true in any basis, the set of Hamiltonians to
exclude forms a null set. For the remaining ones, Thm.2.11 proves the claim.
Exercise 2.1 (Commutator identity for 2×2 matrices). Show that for any A, B, C ∈ C^{2×2}
the relation [[A, B]², C] = 0 holds.
Exercise 2.2 (Uncertainty relations). Let H1 , H2 ∈ B(H) be Hermitian, ρ ∈ B1 (H) a
density operator and Ai := Hi − tr [ρHi ] 1.
(a) Express the inequality tr [ρBB ∗ ] ≥ 0 as an uncertainty relation for ρ, H1 , H2 by
inserting B := A1 + iγA2 and optimizing over all γ ∈ R.
(b) Apply the derived uncertainty relation for H ≃ C² to a pair of Pauli matrices.
Identify ‘minimal uncertainty states’ that achieve equality in this uncertainty re-
lation. Where are they located in the Bloch ball?
(c) Which uncertainty relation is obtained when optimizing over all γ ∈ C?

Exercise 2.3 (Canonical commutation relation). Let Q, P be operators on a Hilbert space
H that satisfy the ‘canonical commutation relation’ [Q, P ] = i1.
(a) Show that necessarily dim(H) = ∞ and that Q, P cannot be Hilbert-Schmidt class
operators.
(b) Prove that for any n ∈ N: [Q^n , P ] = i n Q^{n−1} .
(c) Use (b) to show that Q and P cannot both be bounded operators.
Exercise 2.4 (Tensor-power trick). We write A⊗n := A ⊗ . . . ⊗ A for the n-fold tensor
product of A.
(a) Let A, B ∈ B(H) be Hermitian, A invertible and A ≥ ±B (meaning that the
inequality holds for both signs). Show that A ≥ 0 and A⊗n ≥ ±B ⊗n for all
n ∈ N.
(b) Show that for H ≃ C^d there is a ψ ∈ H⊗d so that for any A ∈ B(H) : det(A) =
⟨ψ, A⊗d ψ⟩.
(c) Use (a) and (b) to prove that in Eq.(2.1) from Robertson’s uncertainty relation
V ≥ iσ implies det(V ) ≥ det(σ).
Exercise 2.5 (Quantum error correction).
(a) Why are Pauli matrices used in the definition of an [[n, k, d]]-QECC? What if an
‘error’ occurs that is not described by one of the three Pauli matrices?
(b) Assume you have encoded k qubits into n qubits using an [[n, k, d]] quantum error
correcting code. Unfortunately, d − 1 of the qubits were completely destroyed (a
cat jumped out of a box and knocked over this part of the experiment). The good
news is that the remaining qubits are perfectly intact. Show that and how you
can perfectly recover the state of the original k qubits.
Exercise 2.6 (Time-energy uncertainty relation).
(a) Formulate and prove the Mandelstam-Tamm uncertainty relation for mixed states.
(b) Consider a finite-dimensional Hamiltonian that satisfies 0 ≤ H ≤ 1 and that
governs the time evolution of a pure state via ψ(t) = exp[−iHt]ψ. Let t0 be the
first time so that ⟨ψ, ψ(t)⟩ = 0. Provide a lower bound on t0 that is as good as
possible and that does not depend on ψ.

