Notes 114
Fabian Haiden
Introduction
3 Lebesgue Measure
4 Integration
7 Lp spaces
9 Fourier analysis
Introduction
An alternative title for this course is The Journey towards Functional Anal-
ysis. Functional analysis is the theory of infinite-dimensional vector spaces over
the real or complex numbers. In typical examples the “vectors” are functions
of some specified type on a space, such as Rn . The motivation for this theory
and major applications come from:
• Partial differential equations, in particular proving existence of solutions.
• Finding a natural setting for the Fourier transform.
which is convergence in mean squared error. Of course the choice will depend
on the particular application one has in mind, but the point is that one needs
to carefully keep track of these different notions, unlike for Rn .
The next point, which is related to the first, is that an infinite-dimensional
vector space, together with a notion of distance of vectors, may not be complete
in the same sense in which Q is not complete, i.e. there are Cauchy sequences
which have no limit. This is an issue which manifests itself in Fourier theory.
Take a continuous function f : [0, 2π] → C which has Fourier coefficients
\[ a_n = \frac{1}{2\pi} \int_0^{2\pi} f(x) e^{-inx}\,dx, \qquad n \in \mathbb{Z}. \]
The Parseval identity says that the Fourier transform preserves length, i.e.
\[ \|a\|^2 := \sum_{n \in \mathbb{Z}} |a_n|^2 = \frac{1}{2\pi} \int_0^{2\pi} |f(x)|^2\,dx =: \|f\|^2 . \]
leading to the Dominated convergence theorem or when can one change the order
of integration, i.e.
\[ \iint_{X \times Y} f = \int_X \int_Y f = \int_Y \int_X f \]
with non-integer dimension, like fractals, for example Cantor’s set (see Figure 1).
Thus it interpolates between 0-dimensional measure (counting), 1-dimensional
measure (length), 2-dimensional measure (area), and so on. From a more ax-
iomatic point of view one can study abstract measure theory, which provides the
mathematical foundation for probability theory and information theory.
Chapter 1
1 The paradox
The Banach–Tarski “paradox” asserts that one can take a solid ball B in R3 ,
cut it into finitely many pieces, and then reassemble these pieces to give two
full copies of B.
Theorem 1.1 (Banach–Tarski). Let B be the solid unit ball in R3 , then there
exists a decomposition
(1) B = B1 ∪ . . . ∪ Bn
of B, and euclidean transformations (rotation-translations) T1 , . . . , Tn such that
there is another decomposition
(2) T1 (B1 ) ∪ . . . ∪ Tn (Bn ) = B ∪ B′
where B′ is a translated, disjoint copy of B.
The proof is non-constructive: it does not tell you what the pieces Bi are. One
can imagine them as very noisy, much more pathological in shape than the solids
one deals with in geometry. The number n, on the other hand, comes explicitly
out of the proof. One can achieve n = 10, though that requires a bit more work
than larger values of n. The paradox comes from the fact that it contradicts
basic conservation of mass and our everyday experience. Mathematically, it
shows the impossibility of assigning a volume, in any reasonable sense, to all
subsets of R3 .
2 The proof
The main steps of the proof are:
1. Find a paradoxical decomposition of the free group with two generators,
F2 .
2. Realize F2 as a group of rotations in R3 .
Step 1
What is F2 ? Its elements are possible strings of letters a, b, a−1 , and b−1 for
example
with the only restriction that no consecutive pair of letters can be cancelled, i.e.
aa−1 , a−1 a, bb−1 , b−1 b do not appear as substrings. Multiplication is defined by
concatenating the strings and then cancelling any forbidden pairs, for example
The neutral element, e, is the empty string, while the inverse is formed by
inverting each letter and reversing their order, for example
\[ (5) \qquad \left( a b^{-1} a^{-1} b \right)^{-1} = b^{-1} a b a^{-1} . \]
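The rules for multiplying and inverting strings can be tried out directly. The following is a minimal sketch in Python, assuming (purely for illustration) that the four letters are encoded as the characters `a`, `b`, `A` = a^-1 and `B` = b^-1; the function names are hypothetical and not part of the notes.

```python
# Letters of F2: 'a', 'b' and their inverses, encoded here as 'A' = a^-1, 'B' = b^-1.
INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}


def reduce_word(word):
    """Cancel adjacent inverse pairs until none remain (stack-based)."""
    stack = []
    for letter in word:
        if stack and stack[-1] == INVERSE[letter]:
            stack.pop()          # the pair x x^-1 cancels
        else:
            stack.append(letter)
    return "".join(stack)


def multiply(u, v):
    """Product in F2: concatenate, then cancel forbidden pairs."""
    return reduce_word(u + v)


def inverse(word):
    """Invert each letter and reverse the order."""
    return "".join(INVERSE[letter] for letter in reversed(word))


w = "aBAb"                         # the word a b^-1 a^-1 b from equation (5)
print(inverse(w))                  # the encoded form of b^-1 a b a^-1
print(multiply(w, inverse(w)) == "")   # w times its inverse is the empty string e
```

The stack is a convenient way to cancel pairs in one pass: a cancellation can expose a new forbidden pair, and the stack top is always the letter that would be adjacent to the next one.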
A beautiful way of visualizing F2 is via its Cayley graph, see Figure 2. The graph
has a vertex for each element of F2 and an edge whenever two such elements
are related by left-multiplication with one of the letters, a, b, a−1 , b−1 .
Write S(a) for the set of elements in F2 which start with a, and similarly
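S(b), S(a−1 ), S(b−1 ). Then on the one hand we have a partition
\[ F_2 = \{e\} \cup S(a) \cup S(a^{-1}) \cup S(b) \cup S(b^{-1}) \]
but also
\[ F_2 = S(a) \cup a S(a^{-1}) = S(b) \cup b S(b^{-1}) . \]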
To see this note that aS(a−1 ) is exactly the set of strings in F2 which do not
start with a. Here aS(a−1 ) is the set of elements of F2 which are of the form
a · x for some x ∈ S(a−1 ).
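The claim that aS(a^-1) consists exactly of the strings not starting with a can be sanity-checked on words of bounded length. This is only a finite experiment, not a proof; the encoding `A` = a^-1, `B` = b^-1 and the helper names are assumptions made for the sketch.

```python
INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}  # 'A' = a^-1, 'B' = b^-1


def reduced_words(max_len):
    """All reduced words of length <= max_len, including the empty word."""
    words, frontier = [""], [""]
    for _ in range(max_len):
        frontier = [w + x for w in frontier for x in "aAbB"
                    if not w or w[-1] != INVERSE[x]]
        words.extend(frontier)
    return words


def left_multiply_by_a(word):
    """Compute a * word for a reduced word."""
    return word[1:] if word.startswith("A") else "a" + word


n = 5
all_words = set(reduced_words(n))
not_starting_with_a = {w for w in all_words if not w.startswith("a")}

# Elements of S(a^-1) of length <= n + 1, then their images under x -> a * x.
S_a_inv = [w for w in reduced_words(n + 1) if w.startswith("A")]
shifted = {left_multiply_by_a(w) for w in S_a_inv}

print(shifted == not_starting_with_a)
```

Words of length up to n + 1 are needed in S(a^-1) because multiplying by a shortens each of them by one letter.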
[Figure 2: Cayley graph of F2 .]
F2 = S1 ∪ S2 ∪ S3 ∪ S4 = S1 ∪ aS2 = S3 ∪ bS4 .
Step 2
So far we have viewed F2 as an abstract group, its elements given by strings
of symbols. The next step is to realize each element of F2 as a rotation in 3-D
space, in a way compatible with composition and so that no two elements get
mapped to the same rotation. In other words, we want to find an injective group
homomorphism F2 → SO(3). We choose
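\[
a \mapsto \begin{pmatrix} \frac{3}{5} & \frac{4}{5} & 0 \\ -\frac{4}{5} & \frac{3}{5} & 0 \\ 0 & 0 & 1 \end{pmatrix} =: A,
\qquad
b \mapsto \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{3}{5} & \frac{4}{5} \\ 0 & -\frac{4}{5} & \frac{3}{5} \end{pmatrix} =: B
\]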
which are rotations about the z-axis and x-axis, respectively. In order to be
compatible with composition, we must send any other element of F2 to the
corresponding product of matrices, e.g. ab−1 7→ AB −1 . The tricky thing to
check is that no non-empty string is mapped to the identity matrix.
It suffices to consider 5A, 5B, 5A−1 , 5B −1 and check that no non-trivial
composition gives a matrix with all coefficients divisible by 5. Thus, we consider
coefficients as elements in F5 , the field with 5 elements, and the matrices
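\[
5A = \begin{pmatrix} 3 & 4 & 0 \\ -4 & 3 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
5B = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 3 & 4 \\ 0 & -4 & 3 \end{pmatrix}
\]
\[
5A^{-1} = \begin{pmatrix} 3 & -4 & 0 \\ 4 & 3 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
5B^{-1} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 3 & -4 \\ 0 & 4 & 3 \end{pmatrix}
\]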
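The divisibility criterion can be checked by brute force for short words. The sketch below multiplies the integer matrices 5A, 5B, 5A^-1, 5B^-1 modulo 5 over all reduced words up to length 6 and collects any word whose product vanishes. It is an experiment for small lengths only, not a replacement for the proof; the encoding `A` = a^-1, `B` = b^-1 and all function names are assumptions of the sketch.

```python
from functools import reduce

# Integer matrices 5A, 5A^-1, 5B, 5B^-1 (keys 'A' and 'B' stand for the inverses).
FIVE_TIMES = {
    "a": ((3, 4, 0), (-4, 3, 0), (0, 0, 5)),
    "A": ((3, -4, 0), (4, 3, 0), (0, 0, 5)),
    "b": ((5, 0, 0), (0, 3, 4), (0, -4, 3)),
    "B": ((5, 0, 0), (0, 3, -4), (0, 4, 3)),
}
INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}


def matmul_mod5(x, y):
    """Product of two 3x3 integer matrices with entries reduced mod 5."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(3)) % 5 for j in range(3))
        for i in range(3)
    )


def reduced_words(length):
    """All reduced words of exactly the given length (length >= 1)."""
    words = list("aAbB")
    for _ in range(length - 1):
        words = [w + x for w in words for x in "aAbB" if w[-1] != INVERSE[x]]
    return words


def vanishes_mod5(word):
    """True if the product of the matrices 5X over the letters X of word is 0 mod 5."""
    product = reduce(matmul_mod5, (FIVE_TIMES[x] for x in word))
    return all(entry % 5 == 0 for row in product for entry in row)


bad = [w for n in range(1, 7) for w in reduced_words(n) if vanishes_mod5(w)]
print(bad)                   # reduced words whose product vanishes mod 5
print(vanishes_mod5("aA"))   # a non-reduced word: 5A * 5A^-1 = 25 * identity
```

If a word of length n were mapped to the identity, the corresponding product of the matrices 5X would be 5^n times the identity, hence zero modulo 5, which is why a non-vanishing product rules this out.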
Step 3
Each rotation corresponding to an element of F2 \ {e} fixes exactly two points
on the unit sphere S 2 . We let C ⊂ S 2 be the union of these fixed points, which
is countable and invariant under F2 . Then F2 acts freely on S 2 \ C, meaning
that if we fix a point x in an orbit in S 2 \ C, then we can identify that orbit
with F2 via g ↦ gx.
Now comes the non-constructive step. The axiom of choice implies the exis-
tence of a subset X ⊂ S 2 \ C which contains exactly one point from each orbit.
If we let Xi = Si X = {gx | g ∈ Si , x ∈ X}, then
S 2 \ C = X1 ∪ X2 ∪ X3 ∪ X4 = X1 ∪ AX2 = X3 ∪ BX4 .
To get rid of the set C, we use the following simple trick. To illustrate it,
consider first the unit circle S 1 , any point p ∈ S 1 , and an irrational rotation R
of the plane. Then all the points p, R(p), R2 (p), R3 (p), . . . are distinct and if we
put P = {p, R(p), R2 (p), . . .}, Q = S 1 \ P , then
S 1 = P ∪ Q, S 1 \ {p} = RP ∪ Q
and these are disjoint unions. This means we can plug a hole in a circle by
cutting it into two pieces and rotating one of them! In fact the same thing works
if instead of p we have any countable subset of S 1 , since there are uncountably
many rotations. By the same reasoning, we can find a rotation R of 3-D space
and a decomposition S 2 = Y1 ∪ RY2 such that S 2 \ C = Y1 ∪ Y2 .
Combining the two ways of decomposing S 2 \ C we get a partition S 2 =
Z1 ∪ . . . ∪ Z8 of the 2-sphere into eight pieces and rotations T1 , . . . , T8 such that
S 2 = T1 Z1 ∪ . . . ∪ T4 Z4 = T5 Z5 ∪ . . . ∪ T8 Z8 .
Step 4
We are almost done. For each piece Zk of S 2 there is a corresponding subset of
the unit ball without its center, B \ {0}, consisting of all points which map to
Zk under radial projection. This immediately gives us a version of the theorem
for B \ {0}, and in order to get all of B we apply the same trick as above to a
circle in B which contains the origin.
3 Conclusion
The Banach–Tarski paradox shows the impossibility of assigning a volume Vol(A)
to every subset A of R3 such that
1. Vol(A) = Vol(T (A)) where T is a rotation or translation,
2. Vol(A ∪ B) = Vol(A) + Vol(B) for any disjoint A and B,
3. Vol(B) > 0 if B is a solid ball.
Rather than giving up on any of these properties, which are actually a rather
minimal set of assumptions, we will instead adopt the point of view that one
should not try to assign a volume to every subset of Rn , such as those
constructed in the proof above.
One could also argue that the axiom of choice is too strong and should be
rejected. However this would not lead to any simplification of measure theory
by itself.
Chapter 2
Even fairly good students, when they have obtained the solution of the
problem and written down neatly the argument, shut their books and
look for something else. Doing so, they miss an important and in-
structive phase of the work. ... A good teacher should understand and
impress on his students the view that no problem whatever is completely
exhausted.
– George Polya
Figure 3: Intervals of length 1/3, 1/9 and 1/27 in the complement of Cantor’s
set.
The construction is iterative: The first interval is the open middle third,
(1/3, 2/3), of [0, 1]. Its complement consists of the two closed intervals
[0, 1/3] and [2/3, 1], and we take the open middle thirds of these intervals,
and so on. At the n-th step we create
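$2^{n-1}$ intervals of length $1/3^n$, so the total length of all intervals is
\[ (2) \qquad \sum_{n=1}^{\infty} \frac{2^{n-1}}{3^n} = \frac{1}{2} \sum_{n=1}^{\infty} \left( \frac{2}{3} \right)^n = \frac{1}{2} \left( \frac{1}{1 - \frac{2}{3}} - 1 \right) = 1. \]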
The complement, C, of the above intervals, i.e. what remains of [0, 1] after we
repeatedly erase middle thirds, is called the Cantor set.
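The geometric series for the removed length can be checked with exact rational arithmetic. The sketch below sums the lengths removed in the first few steps and compares what remains with (2/3)^N, the total length of the 2^N closed intervals left after N steps. The function name is hypothetical.

```python
from fractions import Fraction


def removed_length(steps):
    """Total length of the open middle thirds removed in the first `steps` steps."""
    return sum(Fraction(2 ** (n - 1), 3 ** n) for n in range(1, steps + 1))


for steps in (1, 2, 3, 10):
    total = removed_length(steps)
    # What is left after `steps` steps: 2**steps intervals of length 3**-steps each.
    remaining = 1 - total
    print(steps, total, remaining == Fraction(2, 3) ** steps)
```

Since the remaining length (2/3)^N tends to 0, the removed lengths add up to 1 in the limit, in agreement with the series above.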
A proof that C is uncountable uses the ternary (base 3) expansion of real
numbers. For example, 3 becomes 10 in ternary, 1/3 becomes 0.1, and 2/3 becomes
0.2. As with decimals (1 = 0.99999 . . .), some rational numbers have two ternary
expansions: one ending with 0 repeating, and one ending with 2 repeating, e.g.
0.22012 = 0.22011222222 . . .
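The equality of the two ternary expansions can be verified exactly. In the sketch below the infinite tail of 2's is not summed digit by digit; instead it is replaced by the value of its geometric series, which is an elementary computation noted in the comment. The helper name is hypothetical.

```python
from fractions import Fraction


def from_ternary(digits):
    """Value of the finite ternary expansion 0.d1 d2 d3 ... as an exact fraction."""
    return sum(Fraction(d, 3 ** (i + 1)) for i, d in enumerate(digits))


terminating = from_ternary([2, 2, 0, 1, 2])   # 0.22012 in base 3

# 0.22011222... = 0.22011 + (2/3^6 + 2/3^7 + ...), and the tail sums to 1/3^5.
repeating = from_ternary([2, 2, 0, 1, 1]) + Fraction(1, 3 ** 5)

print(terminating, repeating, terminating == repeating)
```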
the subsequent years. Curiously, an engraving of an Egyptian column shows a
pattern reminiscent of the Cantor set, see Figure 4.
How complicated can a general open subset of R be? It turns out that they are
all disjoint unions of countably many open intervals.
Proposition 2.2. Any open subset E ⊂ R can be written as $E = \bigcup_{n=1}^{\infty} I_n$,
where the $I_n$ are disjoint open intervals.
Proof. An open interval I = (a, b) ⊂ E is maximal if there is no open interval
J ≠ I with I ⊂ J ⊂ E. Here we allow a = −∞ and b = +∞. If I1 , I2 ⊂ E
are maximal, then they must be equal or disjoint, since otherwise I1 ∪ I2 is
an interval strictly larger than either one. Also, any p ∈ E is contained in a
maximal interval (a, b) where
\[ a = \inf\{x \in E \mid (x, p] \subset E\}, \qquad b = \sup\{x \in E \mid [p, x) \subset E\}, \]
thus E is the disjoint union of the maximal open intervals it contains. There
are only countably many such intervals, since we can pick a distinct rational
number from each.
How many open subsets are there?
Proposition 2.3. The cardinality of the set of open subsets in R is the same
as the cardinality of R.
Proof. We give a proof that generalizes to any second countable topological
space. Note that there are countably many intervals (a, b) with rational
endpoints a, b ∈ Q. But any open E ⊂ R is completely determined by which such
intervals it contains: To test if x ∈ E, see if intervals with rational endpoints
closer and closer to x are contained in E. This produces for every open E ⊂ R a
countable sequence of 0’s and 1’s uniquely characterizing it, so there are at
most as many open sets as binary sequences, i.e. as real numbers. Conversely,
the intervals (x, x + 1), x ∈ R, are distinct open sets, so there are at least
as many open sets as real numbers.
We have seen two ways of fitting countably many disjoint intervals of total
length 1 into the unit interval [0, 1]. But can the total length be more than 1?
As one would expect, the answer turns out to be negative, a fact which quickly
follows from Lebesgue measure theory, which is the subject of the next chapter.
Figure 4: Egyptian column with pattern similar to the Cantor set. Engraving
of Ile de Philae from Description d’Egypte by Jean-Baptiste Prosper Jollois and
Edouard Devilliers, Imprimerie Imperiale, Paris, 1809-1828
Chapter 3
Lebesgue Measure
Consider the set A = Q∩[0, 1] of rational numbers between 0 and 1. Can you
find a finite number of open intervals I1 , . . . , In which cover A, but have total
length < 1? What about if you are allowed to use countably many intervals?
The answer to the first question is negative. The main point is that
\[ [0, 1] = \overline{A} \subset \overline{I_1 \cup \ldots \cup I_n} = \overline{I_1} \cup \ldots \cup \overline{I_n} \]
where we are allowed to interchange closure and union because we are dealing
with just finitely many intervals. If we denote by χE the characteristic function
\[ \chi_E(x) = \begin{cases} 1 & x \in E \\ 0 & x \notin E \end{cases} \]
where we use the Riemann integral. One could replace the use of the Riemann
integral by more elementary arguments, or the theory developed later in this
chapter.
On the other hand, if we can use countably many intervals, then the answer
is “yes”, and their total length can be made smaller than any positive number.
Indeed, Q, and thus A = Q ∩ [0, 1], is countable, so we can put these numbers in
a sequence q1 , q2 , q3 , . . .. Let ε > 0 and let In be the open interval of diameter
ε/2^n centered at qn . Then the In clearly cover A, and their total length is ε. In
fact, the same argument works for any countable subset A ⊂ R, e.g. Q which
cannot be covered by finitely many intervals of finite length at all!
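The covering argument can be illustrated with exact arithmetic on a finite initial segment of an enumeration of the rationals. The enumeration by increasing denominator used below is just one possible choice and is an assumption of the sketch, as are the function names; any enumeration would do.

```python
from fractions import Fraction
from math import gcd


def rationals_in_unit_interval(count):
    """First `count` rationals in [0, 1], enumerated by increasing denominator."""
    found, q = [Fraction(0), Fraction(1)], 2
    while len(found) < count:
        found.extend(Fraction(p, q) for p in range(1, q) if gcd(p, q) == 1)
        q += 1
    return found[:count]


def covering_intervals(points, eps):
    """Open interval of length eps / 2**n centered at the n-th point (n = 1, 2, ...)."""
    return [(x - eps / 2 ** (n + 1), x + eps / 2 ** (n + 1))
            for n, x in enumerate(points, start=1)]


eps = Fraction(1, 100)
points = rationals_in_unit_interval(50)
intervals = covering_intervals(points, eps)
total = sum(right - left for left, right in intervals)

print(float(total), total < eps)   # the total length stays below eps
```

However many points are covered, the total length is a partial sum of the geometric series and therefore never exceeds the chosen bound.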
The preceding example emphasizes the importance of allowing infinite col-
lections of intervals in the following definition.
Definition 3.1. The outer measure, m∗ (E), of E ⊂ R is
\[ m^*(E) = \inf \left\{ \sum_k l(I_k) \;\middle|\; E \subset \bigcup_k I_k \right\} \in \mathbb{R} \cup \{+\infty\} \]
where (Ik )k is any countable collection of open intervals and l(Ik ) is the length
of the k-th interval.
The following lemma summarizes some basic properties of m∗ .
Lemma 3.2.
1. If A ⊂ B, then m∗ (A) ≤ m∗ (B) (monotonicity).
2. m∗ ([a, b]) = b − a.
3. $m^*\left(\bigcup_n A_n\right) \le \sum_n m^*(A_n)$ if the An form a countable collection of
subsets of R (countable subadditivity).
Proof. Monotonicity is clear: If the intervals Ik cover B, then they also cover
A, and inf itself is monotonic.
Since [a, b] ⊂ (a − ε, b + ε) for any ε > 0 we have m∗ ([a, b]) ≤ b − a. On the
other hand, if Ik cover [a, b] then by compactness just finitely many suffice, and
they must have total length ≥ b − a by the same argument as in the special case
[0, 1] above.
We have already shown the last statement in the case when each An is a
point, and so $A = \bigcup A_n$ is countable and m∗ (A) = 0. We can use a similar idea
for the general case. First, if any m∗ (An ) = +∞, then there is nothing to show,
so we may assume that m∗ (An ) is finite for all n. Let ε > 0. By definition of
outer measure we can choose a cover $I_{n,k}$ of An by open intervals with
\[ \sum_k l(I_{n,k}) \le m^*(A_n) + \frac{\varepsilon}{2^n} . \]
Since the $I_{n,k}$ combined form a cover of $\bigcup A_n$ we get
\[ m^*\left( \bigcup_n A_n \right) \le \sum_{k,n} l(I_{n,k}) \le \sum_n m^*(A_n) + \varepsilon \]
This turns out to be false. A counterexample can be constructed using the
axiom of choice as follows.
Consider the relation ∼ on [0, 1] where x ∼ y if x − y ∈ Q. Since Q ⊂ R
is a subgroup, this is an equivalence relation. Using the axiom of choice, there
exists an E ⊂ [0, 1] which contains exactly one element from each equivalence
class. Such a set is called a Vitali set. By definition we have
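\[ [0, 1] \subset \bigcup_{q \in \mathbb{Q} \cap [-1,1]} (q + E) \subset [-1, 2] \]
hence
\[ 1 \le m^*\left( \bigcup_{q \in \mathbb{Q} \cap [-1,1]} (q + E) \right) \le 3 . \]
The translates q + E are pairwise disjoint and, by translation invariance, all
have the same outer measure m∗ (E). This contradicts countable additivity: a sum
of countably many copies of m∗ (E) is either 0 or +∞, never a number between 1
and 3. It even contradicts finite additivity: by countable subadditivity
m∗ (E) > 0, so finitely many of the q + E already have total outer measure
greater than 3.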
The argument above did not use the definition of m∗ directly, only its basic
properties. This shows that the failure of countable additivity is not just some
defect of the definition. More precisely, there is no way of assigning a number
m(A) ∈ [0, +∞] to every A ⊂ R such that
1. m is countably additive,
2. m is translation invariant,
3. 0 < m([a, b]) < +∞ for any a < b.
In order to recover countable additivity one is forced to restrict to a class of
subsets of R, called measurable, for which this property holds. This turns out to
be a very non-restrictive requirement: To construct non-measurable sets, one
needs the axiom of choice.
A subset E ⊂ R is measurable if for any X ⊂ R one has
m∗ (X ∩ E) + m∗ (X ∩ (R \ E)) = m∗ (X).
Intuitively, a measurable subset E cuts any other subset cleanly into two pieces.
Note that the left hand side is ≥ the right hand side for any E, X by subadditiv-
ity. Also, the equality is non-trivial only for m∗ (X) < +∞. If E is measurable,
then we define m(E) = m∗ (E), the Lebesgue measure of E. This notation is
to emphasize that m is not defined on non-measurable sets.
Theorem 3.3. If An ⊂ R, n ≥ 1 are disjoint measurable sets, then $\bigcup_n A_n$ is
also measurable and
\[ m\left( \bigcup_{n=1}^{\infty} A_n \right) = \sum_{n=1}^{\infty} m(A_n) \]
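\begin{align*}
m^*(X) &= m^*(X \cap A) + m^*(X \cap A^c) \\
&= m^*(X \cap A \cap B) + m^*(X \cap A \cap B^c) \\
&\quad + m^*(X \cap A^c \cap B) + m^*(X \cap A^c \cap B^c) \\
&\ge m^*(X \cap (A \cup B)) + m^*(X \cap (A \cup B)^c)
\end{align*}
where the last step uses subadditivity together with
\[ A \cup B = (A \cap B) \cup (A \cap B^c) \cup (A^c \cap B) \]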
Note that since each An is measurable and they are disjoint, we have
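\begin{align*}
m^*\left( X \cap \bigcup_{n=1}^{N} A_n \right) &= m^*(X \cap A_N) + m^*\left( X \cap \bigcup_{n=1}^{N-1} A_n \right) \\
&\;\;\vdots \\
&= \sum_{n=1}^{N} m^*(X \cap A_n).
\end{align*}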
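so applying monotonicity to $\bigcap_{n=1}^{N} A_n^c \supset \bigcap_{n=1}^{\infty} A_n^c$ and passing to the limit we
get
\[ m^*(X) \ge \underbrace{\sum_{n=1}^{\infty} m^*(X \cap A_n)}_{\ge\, m^*\left( \bigcup A_n \cap X \right)} + m^*\left( X \cap \bigcap_{n=1}^{\infty} A_n^c \right) \]
thus $\bigcup_{n=1}^{\infty} A_n$ is measurable. Putting $X = \bigcup_{n=1}^{\infty} A_n$ in the above estimate
shows that
\[ m^*\left( \bigcup_{n=1}^{\infty} A_n \right) \ge \sum_{n=1}^{\infty} m^*(A_n) \]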
and since the reverse inequality always holds, the theorem is proven.
The conclusion is that although m has a smaller domain of definition than
m∗ (the measurable subsets), it has much better properties. But which subsets
of R are measurable?
Theorem 3.4. The following subsets of R are measurable.
1. ∅
2. complements of measurable sets
3. countable unions and intersections of measurable sets
4. open and closed sets
5. sets with zero outer measure, the null sets
Properties 1), 2), and 3) are summarized by saying that measurable sets form
a σ-algebra. This is stronger than just having a boolean algebra — a collection
of subsets closed under complements and finite unions and intersections.
Proof. For 1), 2), there is nothing to check, and 5) follows immediately from
monotonicity. We already know that finite unions of measurable sets are mea-
surable, and thus by 2) also finite intersections, since A ∩ B = (Ac ∪ B c )c . If A
is a countable union of An , not necessarily disjoint, then
\[ A = \bigcup_{n=1}^{\infty} \left( A_n \setminus \bigcup_{k=1}^{n-1} A_k \right) \]
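The passage from an arbitrary union to a disjoint one is easy to see on a small example. In the sketch below finite sets of integers stand in for measurable sets, which is an assumption made only so that the construction can be run; the function name is hypothetical.

```python
def disjointify(sets):
    """Replace A_1, A_2, ... by the pieces A_n minus (A_1 ∪ ... ∪ A_{n-1})."""
    seen, pieces = set(), []
    for a in sets:
        pieces.append(set(a) - seen)   # keep only the points not covered so far
        seen |= set(a)
    return pieces


A = [{1, 2, 3}, {3, 4}, {1, 4, 5}, {2}]
pieces = disjointify(A)
print(pieces)

# The pieces are pairwise disjoint and have the same union as the original sets.
print(set().union(*pieces) == set().union(*A))
```

Each piece is obtained from the original sets using only complements and finite unions, which is why the pieces of measurable sets are again measurable.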
Since ε > 0 was arbitrary, (a, ∞) is measurable. Moreover, closure under com-
plements and intersections then shows that any interval is measurable, and thus
any open subset of R, as it is a countable union of open intervals (take intervals
with rational endpoints).
Borel–Cantelli Lemma
As an application of the above we prove the following result of probability theory,
stated in terms of measure theory.
Theorem 3.5 (Borel–Cantelli). Suppose Ek are measurable with $\sum_k m(E_k) < \infty$.
Then the set of points which belong to infinitely many Ek has measure zero.
Proof. We can write the set in question as
\[ N = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} E_k \]
To see that the assumption m(E1 ) < ∞ is needed, consider for example
En = [n, ∞).
Proof. Let $E = \bigcap_k E_k$, then E1 = E ∪ (E1 \ E2 ) ∪ (E2 \ E3 ) ∪ . . ., thus
\[ m(E_1) = m(E) + \sum_{k=1}^{\infty} m(E_k \setminus E_{k+1}) = m(E) + \lim_{k \to \infty} \left( m(E_1) - m(E_k) \right) \]
but since m(E1 ) < ∞ we can subtract it from both sides to get the result.
Proof of the theorem. The probability that only finitely many Ak occur is
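\[ P\left( \left( \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k \right)^c \right) = P\left( \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k^c \right) \]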
Another viewpoint on measurable sets
The measurable subsets of R can be thought of as a completion of the topological
(open or closed) ones, in the sense that the former can be well approximated by the
latter. The following theorem makes this precise.
Theorem 3.8. If E ⊂ R is measurable then for any ε > 0 there is an open set
U ⊃ E and a closed set F ⊂ E with m(U \ E) < ε and m(E \ F ) < ε.
Proof. Suppose first that m(E) < ∞. Then there are open intervals Ik , k ≥ 1
which cover E with
\[ \sum_k m(I_k) < m(E) + \varepsilon \]
so if $U = \bigcup_k I_k$ then by additivity (here we use that E is measurable!) we get
\[ m(U \setminus E) = m(U) - m(E) \le \sum_k m(I_k) - m(E) < \varepsilon . \]
For the case m(E) = ∞ we cut R into intervals [n, n + 1] so that E ∩ [n, n + 1]
has finite measure and use the above together with the ε/2^n trick.
The statement about closed sets follows by passing to complements: If
U ⊃ E c with m(U \ E c ) < ε, then F = U c ⊂ E is closed and U \ E c = E \ F .
Any set which is a countable union of closed sets is called an Fσ set, and
dually a countable intersection of open sets is called a Gδ set. Applying the
above theorem to ε = 1/n for all n and taking the union/intersection gives the
following corollary.
Corollary 3.9. If E ⊂ R is measurable then there is a Gδ set U ⊃ E and an
Fσ set F ⊂ E with m(U \ E) = m(E \ F ) = 0.
In particular this shows that every measurable set is the union of an Fσ set
and a set of measure zero. On the other hand, since closed sets and null sets
are measurable, the union of an Fσ set and a null set is measurable, thus:
Corollary 3.10. A subset E ⊂ R is measurable if and only if it can be written
as the union of an Fσ set and a null set (a set with outer measure zero).
\[ d(A, B) = m^*(A \,\triangle\, B) \]
Theorem 3.11 (Littlewood’s first principle of Lebesgue theory). Suppose E ⊂
R is measurable with m(E) < ∞. Then for any ε > 0 there is an I ⊂ R which
is a finite union of intervals such that
\[ m(E \,\triangle\, I) < \varepsilon . \]
Choose n so that
\[ \sum_{k=n}^{\infty} m(I_k) < \frac{\varepsilon}{2} \]
then $I = \bigcup_{k=1}^{n} I_k$ has the desired property.
The following is a variation of the above and may help you build some
intuition about measurable subsets. Suppose X ⊂ [0, 1] is an arbitrary subset,
perhaps not measurable. We “digitize” X with resolution n by approximating
it by a union, En , of intervals with endpoints
\[ \frac{k}{n}, \; \frac{k+1}{n}, \qquad 0 \le k < n \text{ integer} \]
\[ m^*(X \,\triangle\, E_n). \]
Chapter 4
Integration
The Lebesgue integral is defined in several stages, going from special to more
general classes of functions. There are two important conditions that need to
be imposed in order to make the integral well-defined.
1. Measurability
2. Positivity or absolute integrability
Let us comment a bit more on that. A function f : R → R is measurable if
for every y ∈ R the sublevel set
{x ∈ R | f (x) < y}
is measurable. (Replacing < with >, ≤, or ≥ gives an equivalent definition.)
It’s clear that some condition like that is needed, since measure is a special case
of integration in the sense that
\[ \int \chi_E = m(E) \]
and χE is measurable if and only if E is. Here we use again the notation χE for
the characteristic function defined by
\[ \chi_E(x) := \begin{cases} 1 & \text{if } x \in E \\ 0 & \text{if } x \notin E \end{cases} \]
Integral of a simple function
We begin by defining the integral of a simple function, which is a measur-
able function f : R → R attaining only finitely many values f (x) and having
bounded support in the sense that m(supp(f )) < ∞ where
\[ \operatorname{supp}(f) := \{ x \in \mathbb{R} \mid f(x) \neq 0 \}. \]
Note that in measure theory we do not take the closure, unlike for continuous
or smooth functions. Simple functions form a vector space over R which is
generated by functions χE with E measurable and m(E) < ∞. If f is a simple
function taking non-zero values a1 , . . . , an define
\[ \int f = \sum_{k=1}^{n} m(\{x \mid f(x) = a_k\})\, a_k = \sum_{a \in \mathbb{R}} m(\{x \mid f(x) = a\})\, a . \]
Note that {x | f (x) = a} is measurable since f is, and m({x | f (x) = a}) < ∞
for a ≠ 0 because f has bounded support.
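The defining sum is easy to evaluate once the level sets are known. The sketch below assumes, for the sake of a runnable example, that each level set is a finite union of disjoint intervals, so that its measure is just a sum of lengths; general measurable level sets cannot be represented this way. The data layout and function names are hypothetical.

```python
from fractions import Fraction


def measure(intervals):
    """Lebesgue measure of a finite union of pairwise disjoint intervals."""
    return sum(Fraction(right) - Fraction(left) for left, right in intervals)


def integral(simple_function):
    """Sum of a * m({f = a}) over the non-zero values a of a simple function.

    `simple_function` maps each non-zero value a to the level set {f = a},
    given as a list of disjoint intervals (left, right).
    """
    return sum(Fraction(a) * measure(level_set)
               for a, level_set in simple_function.items() if a != 0)


# f = 2 on [0, 1), f = -1 on [1, 3) and on [5, 6), f = 0 elsewhere
f = {2: [(0, 1)], -1: [(1, 3), (5, 6)]}
print(integral(f))   # 2 * 1 + (-1) * (2 + 1)
```

The value 0 is skipped in the sum, matching the convention that only the non-zero values, whose level sets have finite measure, contribute.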
Lemma 4.1. The integral above is a linear functional on the vector space of
simple functions.
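Proof. Clearly $\int cf = c \int f$ for any c ∈ R. For additivity of $\int$ we use (finite)
additivity of m:
\begin{align*}
\int (f + g) &= \sum_{c \in \mathbb{R}} m(\{x \mid f(x) + g(x) = c\})\, c \\
&= \sum_{a, b \in \mathbb{R}} m(\{x \mid f(x) = a,\ g(x) = b\})\, (a + b) \\
&= \sum_{a \in \mathbb{R}} m(\{x \mid f(x) = a\})\, a + \sum_{b \in \mathbb{R}} m(\{x \mid g(x) = b\})\, b \\
&= \int f + \int g
\end{align*}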
Measurable functions
Before we continue we need a slightly better understanding of which functions
are measurable. The many stability properties of measurable functions are in
some ways parallel to those of measurable subsets.
Theorem 4.2. The following functions are measurable.
• continuous functions
• monotone functions
• sums of measurable functions
• products of measurable functions
• pointwise limits of measurable functions
Proof. First, any continuous function f is measurable, since all the sets {x ∈
R | f (x) < y} are open. For a monotone function, these sets are intervals.
To prove that the sum of measurable functions f, g is measurable, note that
\[ \{x \mid f(x) + g(x) < a\} = \bigcup_{q \in \mathbb{Q}} \{x \mid f(x) < q\} \cap \{x \mid q + g(x) < a\} \]
where the right hand side is a countable union of measurable sets. Here we use
the fact that if f (x) + g(x) < a, then there is a rational number between f (x)
and a − g(x).
To show that the product f g is measurable we write it as
\[ fg = \frac{(f + g)^2 - f^2 - g^2}{2} \]
so we just need to show that f 2 is measurable for f measurable, but this follows
from
{x | f (x)2 < a2 } = {x | f (x) < a} ∩ {x | f (x) > −a}.
Finally, we want to show that if fn → f pointwise and all fn are measurable,
then f is measurable. This follows from
Proof. Let us assume first that f is measurable. Divide the interval [−M, M ]
into intervals [ak , ak+1 ) with ak+1 − ak < ε. Let Ek = f −1 ([ak , ak+1 )) ∩ [a, b],
i.e. the set of points in [a, b] where ak ≤ f < ak+1 , which is measurable, since
f is. We get simple functions
\[ \varphi = \sum_k a_k \chi_{E_k}, \qquad \psi = \sum_k a_{k+1} \chi_{E_k} \]
So the approximations from above and below are arbitrarily close, which implies
equality in the statement of the theorem.
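The approximation from below and above by simple functions can be tried on a concrete function. The sketch below takes f(x) = x^2 on [0, 1] (a choice made here for illustration, since for an increasing f the sets Ek are intervals whose length is explicit) and cuts the range into levels ak = kε. Floating point arithmetic is used, so the printed values are approximate; the function name is hypothetical.

```python
from math import sqrt


def lower_and_upper(eps):
    """Integrals of phi = sum a_k chi_{E_k} and psi = sum a_{k+1} chi_{E_k}
    for f(x) = x**2 on [0, 1], where a_k = k * eps and
    E_k = {x in [0, 1] : a_k <= f(x) < a_{k+1}}."""
    lower = upper = 0.0
    k = 0
    while k * eps <= 1:
        a_k, a_next = k * eps, (k + 1) * eps
        # f is increasing on [0, 1], so E_k is [sqrt(a_k), sqrt(a_next)) cut off at 1.
        m_k = min(sqrt(a_next), 1.0) - min(sqrt(a_k), 1.0)
        lower += a_k * m_k
        upper += a_next * m_k
        k += 1
    return lower, upper


for eps in (0.1, 0.01, 0.001):
    lo, hi = lower_and_upper(eps)
    print(eps, lo, hi, hi - lo)
```

The gap between the two integrals is ε times the total measure of the sets Ek, here ε times the length of [0, 1], so both values squeeze the integral 1/3 as ε shrinks.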
For the converse, suppose that inf = sup above, then there are sequences of
simple functions φn , ψn with φn ≤ f ≤ ψn and
\[ \int (\psi_n - \varphi_n) \to 0, \quad \text{as } n \to \infty. \]
Take φ = sup φn and ψ = inf ψn , which are measurable since they are pointwise
limits of measurable functions. We have φ ≤ f ≤ ψ, but in fact φ = ψ almost
everywhere (i.e. outside a set of measure zero). The reason is that if ψ − φ >
ε > 0 on a set E with m(E) > 0, then εχE ≤ ψn − φn for all n, and so
\[ \int (\psi_n - \varphi_n) \ge \varepsilon\, m(E) > 0 \]
Proof. The first statement is clear. For the second note that, using additivity
for simple functions,
\[ \int f + \int g = \sup_{\varphi_1 \le f,\ \varphi_2 \le g} \int (\varphi_1 + \varphi_2) \le \sup_{\varphi \le f + g} \int \varphi = \int (f + g) \]
and the reverse inequality follows similarly with the inf definition of the integral.
where φ ranges over bounded functions with bounded support.
Theorem 4.5 (Properties of integral for non-negative functions). Let f, g :
R → [0, +∞] be measurable.
1. $f \le g \implies \int f \le \int g$
2. $\int cf = c \int f$, for c ≥ 0
3. $\int (f + g) = \int f + \int g$
Proof. The first two are clear from the definition, so we only prove the third
property. The inequality “≥” follows as above in the proof for bounded
functions. On the other hand, if φ ≤ f + g then setting
\[ \varphi_1 := \min(\varphi, f) \le f, \qquad \varphi_2 := \varphi - \varphi_1 = \varphi - \min(\varphi, f) \le g \]
we get
\[ \int \varphi = \int \varphi_1 + \int \varphi_2 \le \int f + \int g \]
where f+ = max(f, 0), f− = − min(f, 0), f = f+ − f− are the positive and
negative parts of f . Absolute integrability ensures that $\int f_+, \int f_- < \infty$, so we
do not get ∞ − ∞, which is undefined.
Theorem 4.6. The integral of absolutely integrable functions is linear.
(f + g)+ + f− + g− = (f + g)− + f+ + g+ .
Putting the terms back to their original side of the equation and applying the
definition gives additivity.
It is easy to see that all the above definitions of $\int f$ are compatible with
one another, i.e. agree for functions which fall into more than one of the classes
considered. Also, if one wants to integrate f not over all of R but some interval
[a, b] one can simply take
\[ \int_a^b f := \int \chi_{[a,b]} f \]
or more generally for any measurable subset of R instead of [a, b]. If A and B
are disjoint then
\[ \int_{A \cup B} f = \int_A f + \int_B f . \]
As a consequence of linearity we have a form of monotonicity:
\[ f \le g \implies \int f \le \int g \]
and
\[ \left| \int f \right| \le \int |f| . \]
This means that even if f is undefined on a null set, for example a countable
set, then $\int f$ is still well-defined, because no matter how we extend f to all of
R, we get the same integral.
Extending all this to complex valued functions is straightforward. The
function f : R → C is defined to be measurable if and only if Re(f ) and Im(f ) are
measurable, and if also $\int |f| < \infty$ then we can define
\[ \int f = \int \operatorname{Re}(f) + i \int \operatorname{Im}(f) . \]
Convergence Theorems
In this section we consider the following problem: Suppose a sequence of func-
tions fn converges to f pointwise and that the integral of each fn and f is
defined. Under what conditions does
\[ \int f_n \to \int f \]
as n → ∞? In the case of the Riemann integral, a sufficient condition is that fn
are continuous functions on a compact interval and converge uniformly. These
are very strong assumptions, and we will see that there is a much more useful
answer in the context of Lebesgue integration.
Let’s assume for now that each fn is nonnegative and measurable, so the
same will be true for f and the integrals are well-defined. Here are some exam-
ples where interchanging limit and integration is not possible:
• Escape to horizontal infinity. We take $f_n = \chi_{[n,n+1]}$, then fn → f = 0
pointwise, but $\int f_n = 1$ and $\int f = 0$. Informally, the mass (area under
the curve) has escaped to +∞ as n → ∞.
• Escape to width infinity. Let $f_n = \frac{1}{n} \chi_{[0,n]}$, then fn → f = 0 uniformly,
but $\int f_n = 1$ and $\int f = 0$ as before. The mass is spread out over the
infinite real line, eventually having zero density everywhere.
• Escape to vertical infinity. The previous two examples relied on
non-compactness of R, or more precisely m(R) = ∞. However, even if we
restrict to functions on a compact interval, things can go wrong. Define
$f_n = n \chi_{[\frac{1}{n}, \frac{2}{n}]}$, then again fn → f = 0 pointwise, but $\int f_n = 1 \neq 0 = \int f$.
This is essentially a 90 degree rotated version of the previous example,
with mass being concentrated with unbounded density.
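The three escape scenarios can be checked numerically. Each fn above is a constant multiple of the characteristic function of an interval, so the sketch below stores it as a triple (height, left, right), a representation chosen here only for illustration, and computes integrals and pointwise values exactly.

```python
from fractions import Fraction


# Each f_n is height * (characteristic function of [left, right]).
def horizontal(n):
    return (Fraction(1), Fraction(n), Fraction(n + 1))


def width(n):
    return (Fraction(1, n), Fraction(0), Fraction(n))


def vertical(n):
    return (Fraction(n), Fraction(1, n), Fraction(2, n))


def integral(f):
    height, left, right = f
    return height * (right - left)


def value(f, x):
    height, left, right = f
    return height if left <= x <= right else Fraction(0)


x = Fraction(1, 3)  # a fixed sample point
for name, family in [("horizontal", horizontal), ("width", width), ("vertical", vertical)]:
    integrals = [integral(family(n)) for n in (1, 10, 100, 1000)]
    values = [value(family(n), x) for n in (1, 10, 100, 1000)]
    print(name, integrals, values)
```

In all three families the integrals stay equal to 1 while the values at the fixed point tend to 0, which is exactly the failure of interchanging limit and integral described in the list.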
Any convergence theorem needs a condition on the fn which rules out the
above escape scenarios. The first such condition is monotonicity, i.e. mass can
only be added, not subtracted or moved around.
Theorem 4.7 (Monotone convergence theorem). Let 0 ≤ f1 ≤ f2 ≤ . . . be an
increasing sequence of non-negative measurable functions, then
\[ \lim_{n \to \infty} \int f_n = \int f \]
for all simple functions 0 ≤ φ ≤ f . Let
\[ \varphi = \sum_{k=1}^{N} a_k \chi_{E_k} \]
thus if we consider
Tautologically we have
\[ f_n \ge \sum_{k=1}^{N} (1 - \varepsilon)\, a_k \chi_{E_{k,n}} \]
but since ε ∈ (0, 1) was arbitrary, this implies the other inequality.
Applying the theorem to partial sums yields the following corollary.
Corollary 4.8. Let fn : R → [0, ∞] be non-negative measurable functions, then
\[ \int \sum_{n=1}^{\infty} f_n = \sum_{n=1}^{\infty} \int f_n . \]
In the absence of monotonicity, equality typically fails, but the integral can
only decrease in the limit, as we found in the examples of mass “escaping to
infinity”.
Theorem 4.9 (Fatou’s lemma). Let fn : R → [0, ∞] be non-negative measurable
functions, then
\[ \int \liminf_{n \to \infty} f_n \le \liminf_{n \to \infty} \int f_n . \]
thus
\[ \limsup_{n \to \infty} \int f_n \le \int f . \]
Putting the two inequalities together shows the claim.
The construction of the integral and its properties presented in this chapter
generalize (with the same proofs) to Rn , or any abstract measure space.
Chapter 5
The definition of the integral in the previous chapter, as well as its proper-
ties, can be developed in a much more general setting without any additional
difficulty. This framework puts infinite sums and integrals (in any dimension),
as well as their weighted or fractal variants, on the same footing.
First, instead of R we consider any set X. Second, we need to specify which
subsets of X should be the measurable ones, which amounts to choosing a
σ-algebra.
Proof. This is just a restatement of things we already know. If A is a σ-algebra
containing the open sets, then by closure under complements it also contains the
closed sets, thus the Fσ -sets by closure under countable unions. But if A also
contains the null sets, then it contains every Lebesgue measurable set, as any
such set is the union of an Fσ and a null set. Thus the Lebesgue measurable
sets form the smallest σ-algebra containing open and null sets.
Elements of the σ-algebra generated by the open sets are called Borel sets.
This notion makes sense for any topological space X. Hence every Borel subset
of R is measurable, but the converse is not true. A way to see this is as follows.
One can show that the cardinality of the σ-algebra of Borel subsets is the same
as that of R, but the cardinality of the σ-algebra of measurable sets is the same
as that of all subsets of R, since every subset of the Cantor set is measurable.
Once we have fixed a σ-algebra of measurable sets on X, the final piece of
data needed is the measure.
Definition 5.3. A measure on a σ-algebra A is a map µ : A → [0, +∞] such
that
1. µ(∅) = 0
2. If Ek ∈ A, k ≥ 1, are disjoint, then
\[ \mu\left( \bigcup_{k=1}^{\infty} E_k \right) = \sum_{k=1}^{\infty} \mu(E_k). \]
When X is finite we can normalize the counting measure so that it becomes a
probability measure with µ(E) = |E|/|X|.
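The normalized counting measure on a finite set is simple enough to write down in code. The sketch below uses the outcomes of a die as the set X, which is an illustrative choice and not taken from the notes; the function name is likewise hypothetical.

```python
from fractions import Fraction


def uniform_probability(X):
    """Return mu with mu(E) = |E| / |X| for subsets E of the finite set X."""
    X = frozenset(X)

    def mu(E):
        E = frozenset(E)
        if not E <= X:
            raise ValueError("E must be a subset of X")
        return Fraction(len(E), len(X))

    return mu


mu = uniform_probability(range(1, 7))   # six outcomes, each of weight 1/6
even, small = {2, 4, 6}, {1}
print(mu(even), mu(small), mu(even | small))

# Additivity on disjoint sets and normalization mu(X) = 1.
print(mu(even | small) == mu(even) + mu(small), mu(range(1, 7)) == 1)
```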
The list of axioms for a measure µ is remarkably small. They imply mono-
tonicity and countable subadditivity, for example, and indeed all the properties
needed to define the integral. A property specific to the Lebesgue measure is
translation invariance. One chooses not to impose such a condition in general,
as it would exclude many of the interesting examples above.
Once one has fixed a measure µ, one can always add the subsets of null-sets
to the measurable sets. More precisely, one passes to the σ-algebra generated
by A and the subsets of the null sets, to which µ extends in the obvious way.
The triple (X, A, µ) is known as a measure space. Measurable functions
are defined as in the case of R (and do not depend on µ): A function f : X → R
is measurable if {x ∈ X | f (x) < a} is measurable for each a ∈ R. This is
equivalent to requiring the preimage under f of any Borel subset of R to be
measurable, because intervals (−∞, a) generate the Borel σ-algebra.
As in the case X = R one defines the integral of any measurable function
f : X → R (or X → C) which is either non-negative or absolutely integrable.
The main properties of the integral, including the convergence theorems, all go
through in general. To avoid confusion when dealing with multiple measures
one writes
\[ \int f \, d\mu \]
and
\[ \int f \, d\delta_0 = f(0). \]
Probability spaces
A probability space is simply a measure space (X, A, µ) with µ(X) = 1. Some
examples were given above. In the context of probability theory, X is called the
sample space, and its elements the outcomes. An element A of the σ-algebra A
is an event and its probability is P (A) := µ(A). The condition µ(X) = 1 ensures
we have P (A) ∈ [0, 1] and that the event X, which means “any outcome”, has
probability 1.
When X is finite, the σ-algebras on X correspond to partitions X = X1 ∪. . .∪
Xn with Xk being the minimal non-empty measurable subsets. Any probability
measure is completely specified by numbers pk = P (Xk ) ∈ [0, 1] with p1 + . . . +
pn = 1.
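The correspondence between σ-algebras on a finite set and partitions can be made concrete with a small sketch. The set X = {1, 2, 3, 4}, its blocks and the weights pk below are hypothetical choices for illustration only; the events are exactly the unions of blocks, and the probability of an event is the sum of the weights of the blocks it contains.

```python
from fractions import Fraction
from itertools import combinations


def sigma_algebra(partition):
    """Return every union of blocks of the partition, as a set of frozensets."""
    blocks = [frozenset(block) for block in partition]
    events = set()
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            # The union of zero blocks is the empty set.
            events.add(frozenset().union(*chosen))
    return events


def probability(event, partition, weights):
    """P(event) for an event that is a union of blocks; weights[k] plays the role of p_k."""
    total = Fraction(0)
    for block, p in zip(partition, weights):
        if frozenset(block) <= event:
            total += p
    return total


# Hypothetical example: X = {1, 2, 3, 4} with minimal events {1, 2}, {3}, {4}.
partition = [{1, 2}, {3}, {4}]
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
events = sigma_algebra(partition)
print(len(events))  # a partition into n blocks gives 2**n events
print(probability(frozenset({1, 2, 3}), partition, weights))
```

Note that {1} is not an event here: the σ-algebra cannot distinguish the outcomes 1 and 2.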
σ-algebras, besides being essential in Lebesgue theory, also give a way of
modeling incomplete information. For example, let’s say a coin is tossed twice
giving possible outcomes X = {HH, HT, T H, T T }. We could consider the σ-
algebra of all subsets, and the probability measure which assigns 1/4 to each
outcome. But suppose we do not know the results of both coin tosses, only if
they are the same or differ. This means our σ-algebra of events is
{∅, {HH, T T }, {HT, T H}, X}.
Chapter 6
Differentiation and Integration
particle moves continuously across a positive distance in finite time, but if we
measure its speed at some random moment in time we will almost surely find it
resting.
with 0 < a < 1 and b a positive odd integer such that ab > 1 + 3π/2. Intuitively,
the graph of f has wiggles on every scale, thus no well-defined slope no matter
how far one zooms in, see Figure 7. Similarly to the Cantor set, it is the frac-
tal nature of the Weierstrass function which is responsible for this unexpected
behavior.
Another example, which is a bit simpler to analyze, is based on a sawtooth
wave instead of cosine. Let w be the 2-periodic function on R which is equal to
Figure 7: Plot of a Weierstrass function.
Vitali coverings
The ball with center x and radius r > 0 in Rn is the subset
B(x, r) = {y ∈ Rn | |x − y| < r}.
If B = B(x, r) is a ball, write 3B := B(x, 3r).
Lemma 6.2. Let K ⊂ Rn be a compact subset covered by a collection E of balls.
Then there are disjoint B1 , . . . , Bn ∈ E such that 3B1 , . . . , 3Bn cover K.
Proof. By compactness we can assume that E is finite. We select the Bk “greed-
ily”, that is B1 should have maximal radius among the balls in E, B2 should
have maximal radius among the balls in E which are disjoint from B1 , and so
on. This inductively defines a collection B1 , . . . , Bn and we claim that the 3Bk
cover K.
Suppose, for contradiction, that x ∈ K, but $x \notin \bigcup_k 3B_k$. Then x is contained
in one of the balls B = B(a, r) ∈ E which was not chosen by the algorithm. This
means there is some minimal k such that B intersects Bk , and thus Bk = B(b, s)
has radius s ≥ r. Hence
|x − b| ≤ |x − a| + |a − b| < r + (r + s) ≤ 3s
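The greedy selection used in this proof can be sketched for a finite family of open balls. The representation of a ball as a pair (center, radius) and the sample family below are assumptions made for illustration. Sorting by decreasing radius and keeping each ball that is disjoint from the earlier picks produces the same selection as repeatedly choosing a ball of maximal radius among those disjoint from the ones already chosen (up to ties).

```python
import math


def disjoint(ball1, ball2):
    """Open balls B(a, r) and B(b, s) are disjoint iff |a - b| >= r + s."""
    (a, r), (b, s) = ball1, ball2
    return math.dist(a, b) >= r + s


def greedy_disjoint_subfamily(balls):
    """Scan balls by decreasing radius, keeping those disjoint from all earlier picks."""
    chosen = []
    for ball in sorted(balls, key=lambda ball: ball[1], reverse=True):
        if all(disjoint(ball, other) for other in chosen):
            chosen.append(ball)
    return chosen


def inside_tripled(ball, big):
    """Check B(a, r) is contained in B(b, 3s) via the sufficient test |a - b| + r <= 3s."""
    (a, r), (b, s) = ball, big
    return math.dist(a, b) + r <= 3 * s


# Hypothetical family of balls in the plane, given as (center, radius).
balls = [((0.0, 0.0), 1.0), ((1.5, 0.0), 0.8), ((3.0, 0.0), 0.5), ((0.0, 2.5), 0.4)]
chosen = greedy_disjoint_subfamily(balls)
# Every ball of the family lies inside the tripled version of some chosen ball.
covered = all(any(inside_tripled(ball, big) for big in chosen) for ball in balls)
print(len(chosen), covered)
```

The final check mirrors the estimate above: a ball that was skipped meets a chosen ball of at least the same radius, so it lies inside the tripled ball.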
for n ≥ 1.
Proof. For each n > 0 we can cover K by finitely many balls in E which all have
radius ≤ 1/n. Since the union of such coverings is also a Vitali covering, we can
assume that for every ε > 0 the collection E has only finitely many balls with
radius > ε. The same greedy algorithm as above then gives a finite or infinite
sequence of disjoint balls Bk with decreasing radius. If the sequence is finite,
then $K \subset \bigcup_k B_k$ and we are done. Otherwise, fix n ≥ 1 and suppose x ∈ K with
$x \notin \bigcup_{k=1}^{n} B_k$. By the Vitali property, x ∈ B for some B ∈ E with radius less
than that of any of B1 , . . . , Bn . If B was not chosen by the algorithm, then it must intersect
some Bm , m > n, and so x ∈ 3Bm by the same reasoning as above.
We return to the real line, where the previous theorem allows us to prove a
strong version of Littlewood’s first principle.
Theorem 6.4 (Vitali’s Lemma). Let E ⊂ R be measurable with m(E) < ∞
and ℰ a Vitali covering of E. Then for every ε > 0 there is a finite collection
of disjoint intervals B1 , . . . , Bn ∈ ℰ with
\[ m\Big(E \,\triangle\, \bigcup_{k=1}^{n} B_k\Big) < \varepsilon. \]
Proof. Since E has finite measure we can find K ⊂ E ⊂ U with K compact, U
open and m(U ) − m(K) < ε. We may remove all intervals from ℰ which are not
contained in U , while still keeping it a Vitali covering. The previous theorem
gives a sequence of disjoint Bk ∈ ℰ with
\[ K \subset \bigcup_{k=1}^{n} B_k \cup \bigcup_{k=n+1}^{\infty} 3B_k \]
for all n ≥ 1. In particular we have $\sum_k m(B_k) \le m(U) < \infty$, so choose n ≥ 1
with
\[ 3 \sum_{k=n+1}^{\infty} m(B_k) < \varepsilon \]
which implies
\[ m(U) - \varepsilon \le m(K) \le \sum_{k=1}^{n} m(B_k) + \varepsilon \]
so we get $m(U \setminus \bigcup_{k=1}^{n} B_k) < 2\varepsilon$. By assumption we also have m(U \ E) < ε, so
\[ m\Big(E \,\triangle\, \bigcup_{k=1}^{n} B_k\Big) < 3\varepsilon. \]
Lebesgue density
As an application of Vitali’s lemma we can prove Lebesgue’s density theorem.
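By definition, each x ∈ A is contained in an interval I of arbitrarily small size
so that the density of E in I is less than 1 − 1/n. These intervals form a Vitali
covering of A, so by Vitali’s lemma there are disjoint I1 , . . . , IN with
\[ m\Big(A \,\triangle\, \bigcup_{k=1}^{N} I_k\Big) < \varepsilon. \]
We get
\[ \begin{aligned}
m(A) &\le \varepsilon + \sum_{k=1}^{N} m(A \cap I_k) \\
&\le \varepsilon + \Big(1 - \frac{1}{n}\Big) \sum_{k=1}^{N} m(I_k) \\
&\le \varepsilon + \Big(1 - \frac{1}{n}\Big) (m(A) + \varepsilon)
\end{aligned} \]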
Since T is measure preserving and T (A) ⊂ A, T (A) can differ from A only
by a null set, thus δr (x) is constant on the orbits of T . But since the orbits
of T are dense in [0, 1) and δr is continuous, δr is in fact constant. Assume
that m(A) ≠ 0, then by Lebesgue’s density theorem there is an x0 ∈ X with
$\lim_{r \to 0} \delta_r(x_0) = 1$, thus $\lim \delta_r(x) = 1$ for all x ∈ [0, 1) and m(A) = 1.
Corollary 6.7. A measurable function f : [0, 1) → R which is invariant under
T agrees almost everywhere with a constant function.
Proof. Partition the codomain R into intervals Ik of length m(Ik ) ≤ ε. By the
theorem each $f^{-1}(I_k)$ has measure 0 or 1, but since m([0, 1)) = 1, there must
be exactly one Ik with $m(f^{-1}(I_k)) = 1$. Sending ε → 0 the intersection of such
intervals contains a single point, the value which f achieves almost everywhere.
Monotone functions
A function f : [a, b] → R is increasing if x ≤ y =⇒ f (x) ≤ f (y), decreasing
if −f is increasing, and monotone if it is either increasing or decreasing. We
can finally return to the proof of the following theorem.
Theorem 6.8. A monotone function f : [a, b] → R is differentiable almost
everywhere.
\[ D(I) = \frac{|f(I)|}{|I|} \]
Applying Vitali’s lemma to $A \cap \bigcup_k I_k$ gives disjoint intervals J1 , . . . , Jm with
$\bigcup J_k \subset \bigcup I_k$, D(Jk ) > s, and
\[ m\Big(\bigcup_k J_k\Big) = \sum_k |J_k| > m(A) - 2\varepsilon. \]
Proof. Modify f so that it is constant, equal to f (b), on [b, ∞). This does not
change either side of the equation. Consider
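\[ f_n(x) = n\Big(f\Big(x + \frac{1}{n}\Big) - f(x)\Big) \ge 0 \]
by Fatou’s lemma. However for n sufficiently large so that a + 1/n < b we get
\[ \begin{aligned}
\int_a^b f_n &= n\Big(\int_{a+1/n}^{b+1/n} f - \int_a^b f\Big) \\
&= n\int_b^{b+1/n} f - n\int_a^{a+1/n} f \\
&\le f(b) - f(a)
\end{aligned} \]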
by monotonicity of f .
A source of monotone functions are the Borel measures, which are measures
on the σ-algebra of Borel subsets of R. Let us restrict to measures with µ(R) <
∞ for simplicity, the finite measures. We have already seen some examples
of Borel measures in the previous section, but let us be more systematic now.
There are three basic types.
1. Absolutely continuous. Given a measurable f : R → [0, +∞] with
$\int f < \infty$ we get a measure $m_f$
\[ m_f(E) = \int_E f \, dm \]
is an increasing function from which µ can be uniquely recovered, since the
intervals (−∞, x] generate the Borel σ-algebra. The discontinuities in F come
from µpp , while µac is responsible for F′ = f . A singular continuous measure
gives a continuous function with F′ = 0 almost everywhere, like the Cantor
function.
thus
f+ (x) − f− (x) = f (x) − f (a)
which proves the claim.
Just like monotone functions are related to measures, functions of bounded
variation are related to signed measures which can take negative values. The
theorem then corresponds to the Hahn decomposition theorem, exhibiting a
signed measure µ as the difference µ+ − µ− of two unsigned ones.
Absolute continuity
We have seen that $\int_a^b f'$ makes sense for any function of bounded variation,
but may not be equal to f (b) − f (a). To recover the fundamental theorem of
calculus, we need to impose a stronger condition. A function f : [a, b] → R is
absolutely continuous if for any ε > 0 there is a δ > 0 such that if (ak , bk ),
k = 1, . . . , n are disjoint intervals in [a, b] of total length less than δ, then
\[ \sum_{k=1}^{n} |f(a_k) - f(b_k)| < \varepsilon. \]
The Cantor function is not absolutely continuous, because we can cover the
Cantor set, where all the variation is concentrated, by finitely many intervals
of arbitrarily small total length. The function √x, restricted to [0, 1], say, is
absolutely continuous on the other hand, even though it is infinitely
steep at 0. The following theorem shows that absolute continuity is a necessary
condition for the fundamental theorem of calculus to hold.
Theorem 6.11. Suppose f : [a, b] → R is absolutely integrable, then
$F(x) = \int_a^x f$ is absolutely continuous.
This is a corollary of the following fact.
Theorem 6.12. Let f ≥ 0 with $\int f < \infty$, then for every ε > 0 there is a δ > 0
so that $\int_E f < \varepsilon$ if m(E) < δ.
Proof. Let fn = min(f, n) be the truncation, then fn → f pointwise and thus
there is an N > 0 so that
\[ \int (f - f_N) < \frac{\varepsilon}{2} \]
by the monotone convergence theorem. Setting $\delta = \frac{\varepsilon}{2N}$ we have
\[ \int_E f \le \int_E (f - f_N) + N m(E) < \varepsilon \]
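To see concretely why the Cantor function fails the definition of absolute continuity given earlier, one can evaluate it exactly at the endpoints of the intervals of the n-th stage of the Cantor set construction. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and the function is computed from ternary digits with exact rational arithmetic, which is exact for the triadic endpoints used here.

```python
from fractions import Fraction


def cantor_function(x, depth=60):
    """Cantor function at x in [0, 1], read off from the ternary digits of x."""
    x = Fraction(x)
    if x >= 1:
        return Fraction(1)
    value, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = int(x)  # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:
            # The function is constant on the removed middle third.
            return value + scale
        value += scale * (digit // 2)  # ternary digit 2 becomes binary digit 1
        scale /= 2
    return value


def cantor_intervals(n):
    """The 2**n closed intervals of the n-th stage of the Cantor set construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_stage = []
        for a, b in intervals:
            third = (b - a) / 3
            next_stage += [(a, a + third), (b - third, b)]
        intervals = next_stage
    return intervals


for n in (1, 5, 10):
    stage = cantor_intervals(n)
    total_length = sum(b - a for a, b in stage)
    variation = sum(cantor_function(b) - cantor_function(a) for a, b in stage)
    print(n, float(total_length), variation)
```

The total length (2/3)^n tends to 0 while the summed increments stay equal to 1, so no δ works for ε < 1.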
whenever (ak , bk ), k = 1, . . . , n are disjoint intervals in [a, b] of total length less
than δ. This shows that $\|f\|_{BV} \le \lceil (b - a)/\delta \rceil$.
Proof. Let |f | ≤ C, then |F (x) − F (y)| ≤ C|x − y|, so F is Lipschitz continuous.
By injectivity of I it suffices to show that $\int_a^x (F' - f) = 0$ for all x ∈ [a, b].
This is an application of the dominated convergence theorem to the sequence
of functions
\[ F_n(x) := n(F(x + 1/n) - F(x)) \]
which converge pointwise a.e. to F′ and are dominated by the constant function
C on [a, b]. This gives us
\[ \begin{aligned}
\int_a^x F' &= \lim_{n \to \infty} \int_a^x F_n \\
&= \lim_{n \to \infty} n\Big(\int_x^{x+1/n} F - \int_a^{a+1/n} F\Big) \\
&= F(x) - F(a) = \int_a^x f
\end{aligned} \]
hence
\[ \int_a^x (F' - f) = 0 \]
which shows F′ = f a.e.
Having shown D ◦ I is the identity, it follows that I ◦ D is the identity on
AC([a, b]) as long as D is injective. This is the content of the following lemma.
Lemma 6.18. The only absolutely continuous functions F with F′(x) = 0
almost everywhere are the constant ones.
Proof. Let c ∈ [a, b] and E the set of points in [a, c] where F′ = 0. Fix ε > 0,
then each x ∈ E is contained in an open interval I with
\[ \left| \frac{F(x) - F(y)}{x - y} \right| \le \varepsilon \]
for x ≠ y ∈ I. Vitali’s lemma gives us disjoint open intervals I1 , . . . , In with this
property which cover E up to a set of measure < ε. By absolute continuity the
total variation of F over the complement of the Ik , which is a union of intervals,
is at most some ε′ which can be made arbitrarily small if ε → 0. Putting this
together we get
\[ |F(c) - F(a)| \le \varepsilon' + 2\varepsilon \sum_k m(I_k) \le \varepsilon' + 2\varepsilon (b - a) \]
Convex functions
A function f : R → R is convex if every one of its chords lies above the graph,
that is
f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y)
for any x, y ∈ R, λ ∈ [0, 1]. The definition makes sense more generally when
the domain of f is a convex set. For the discussion below, f could just as well
be defined on an open interval. Geometrically, convexity means that the secant
slope
\[ \frac{f(x) - f(y)}{x - y}, \qquad x < y \]
increases if x or y is increased.
Theorem 6.19. A convex function f : R → R is Lipschitz continuous on any
compact interval, and differentiable outside a countable set.
Proof. Any secant to f with endpoints in [a, b] has slope bounded above by the
slope of the secant from b to b + ε, since the endpoints of the latter are to the
right. This shows Lipschitz continuity on [a, b].
By convexity, the left and right derivative of f exist everywhere and are
increasing functions. Moreover, the left/right derivatives must be discontinuous
at the points where they are not equal, and a monotone function can have at
most countably many jumps.
A useful fact about convex functions is Jensen’s inequality.
Theorem 6.21 (Jensen’s inequality). Suppose (X, A, µ) is a probability space,
f : X → R absolutely integrable, and φ : R → R convex, then
\[ \varphi\Big(\int_X f \, d\mu\Big) \le \int_X \varphi \circ f \, d\mu. \]
In terms of the expectation value, this can be written as φ(E(f )) ≤ E(φ(f )).
The convexity condition φ(λx+(1−λ)y) ≤ λφ(x)+(1−λ)φ(y) is just the special
case where X has only two points, µ assigns them probabilities λ, (1 − λ), and
f takes values x, y.
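On a finite probability space the integrals in Jensen’s inequality are weighted sums, so the inequality φ(E(f)) ≤ E(φ(f)) can be checked directly. The sketch below is only an illustration; the sample values, the probabilities and the helper names are assumptions and not part of the notes.

```python
import math


def expectation(values, probs):
    """E(f) = sum of p_k * f(x_k) on a finite probability space."""
    return sum(p * v for v, p in zip(values, probs))


def jensen_gap(phi, values, probs):
    """E(phi(f)) - phi(E(f)), which is non-negative whenever phi is convex."""
    return expectation([phi(v) for v in values], probs) - phi(expectation(values, probs))


# Hypothetical three-point space: values of f and their probabilities.
values = [-1.0, 0.5, 2.0]
probs = [0.2, 0.5, 0.3]

for name, phi in [("exp", math.exp), ("abs", abs), ("square", lambda t: t * t)]:
    print(name, jensen_gap(phi, values, probs))

# Two-point case: this is exactly the convexity condition with lambda = 0.3.
lam, x, y = 0.3, -2.0, 5.0
print(jensen_gap(math.exp, [x, y], [lam, 1 - lam]) >= 0)
```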
Proof. Let
\[ x_0 = \int_X f \, d\mu \]
Chapter 7
Lp spaces
When does the right hand side make sense, i.e. define a periodic function on
R? Assuming that $\sum |a_n| < \infty$, the series converges uniformly to a continuous
function (Weierstrass M-test). If on the other hand we have only the weaker
condition $\sum |a_n|^2 < \infty$, then it is not clear that the series converges even
pointwise.
As an example consider
\[ \sum_{n=1}^{\infty} \frac{1}{n} e^{2\pi i n x} \]
which clearly diverges for x ∈ Z, but does converge at non-integer points as can
be seen from the identity
\[ \sum_{n=1}^{\infty} \frac{z^n}{n} = -\log(1 - z). \]
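The behaviour of this example can be explored numerically. The sketch below compares partial sums at a non-integer point with −log(1 − z) for z = e^{2πix}, using the principal branch of the logarithm (an assumption that is harmless here since 1 − z has positive real part for z ≠ 1 on the unit circle), and shows the growth of the partial sums at x = 0. The function names are hypothetical.

```python
import cmath


def partial_sum(x, terms):
    """Sum over n = 1..terms of e^{2 pi i n x} / n."""
    z = cmath.exp(2j * cmath.pi * x)
    total, power = 0j, 1 + 0j
    for n in range(1, terms + 1):
        power *= z  # power == z**n
        total += power / n
    return total


def limit_value(x):
    """-log(1 - z) with z = e^{2 pi i x}, valid for non-integer x."""
    z = cmath.exp(2j * cmath.pi * x)
    return -cmath.log(1 - z)


# Slow but genuine convergence at a non-integer point.
for terms in (10, 1000, 20000):
    print(terms, abs(partial_sum(0.25, terms) - limit_value(0.25)))

# At an integer point the series is the harmonic series and diverges.
print(partial_sum(0.0, 1000).real)
```

The error decays only like 1/N, reflecting that the coefficients 1/n are square summable but not absolutely summable.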
up to equality almost everywhere. This is a consequence of the completeness
of the inner product space L2 ([0, 1]) of such functions, and another advantage
of Lebesgue theory. We have already met L1 ([a, b]), the space of absolutely
integrable functions, in the discussion of the fundamental theorem of calculus.
More generally, one can define Lp (R) for p ∈ [1, ∞], which is the prototypical
example of a Banach space.
exists.
Proof. Suppose xn is a Cauchy sequence in V , then we can pass to a subsequence
such that $\|x_n - x_{n+1}\| < 2^{-n}$. This implies that the series
x1 + (x2 − x1 ) + (x3 − x2 ) + . . .
Lp (X)
Let X ⊂ R be a measurable set. For p ∈ [1, ∞) we define $L^p(X)$ as the normed
vector space of measurable functions f : X → R with $\int_X |f|^p < \infty$ with norm
\[ \|f\|_p := \Big(\int_X |f|^p\Big)^{1/p} \]
\[ \|x\|_p = \Big(\sum_{k=1}^{n} |x_k|^p\Big)^{1/p} \]
\[ \|x\|_\infty = \max_k |x_k| \]
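For vectors in R^n these norms are easy to compute, and one can watch the p-norm decrease towards the maximum norm as p grows. This is a minimal sketch; the function name and the sample vector are assumptions for illustration.

```python
import math


def p_norm(x, p):
    """The p-norm of a finite sequence x, with p = math.inf giving the maximum norm."""
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


x = [3, -4, 1]
for p in (1, 2, 4, 10, 50, math.inf):
    print(p, p_norm(x, p))
```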
We still need to check that Lp (X) is a Banach space, which follows from
results below.
Theorem 7.2 (Hölder’s inequality). Let p, q ∈ [1, ∞] with
\[ \frac{1}{p} + \frac{1}{q} = 1 \]
and f, g : X → R measurable, then
\[ \int_X |fg| \le \|f\|_p \|g\|_q. \]
norms is infinite. Thus, assume $\|f\|_p, \|g\|_q \in (0, \infty)$. Passing to $f/\|f\|_p$ and
$g/\|g\|_q$ and using homogeneity of the p-norm, we can in fact reduce to the case
$\|f\|_p = \|g\|_q = 1$. Young’s inequality and the condition on p and q imply the
pointwise estimate
\[ |f(x) g(x)| \le \frac{|f(x)|^p}{p} + \frac{|g(x)|^q}{q} \]
hence
\[ \int_X |fg| \le \int_X \Big(\frac{|f|^p}{p} + \frac{|g|^q}{q}\Big) = \frac{1}{p} + \frac{1}{q} = 1 \]
which shows the claim.
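Both ingredients of this argument can be checked numerically when the integral is replaced by a finite sum, i.e. for the counting measure on a finite set. The sample sequences and helper names below are assumptions made for illustration; this is a sanity check, not a proof.

```python
def young_sides(a, b, p):
    """Return (a*b, a^p/p + b^q/q) for a, b >= 0 and conjugate exponents p, q."""
    q = p / (p - 1)
    return a * b, a ** p / p + b ** q / q


def holder_sides(f, g, p):
    """Return (sum |f g|, ||f||_p ||g||_q) for finite sequences and 1 < p < infinity."""
    q = p / (p - 1)
    lhs = sum(abs(a * b) for a, b in zip(f, g))
    norm_f = sum(abs(a) ** p for a in f) ** (1 / p)
    norm_g = sum(abs(b) ** q for b in g) ** (1 / q)
    return lhs, norm_f * norm_g


# Hypothetical sample data.
f = [1.0, -2.0, 0.5]
g = [0.3, 4.0, -1.0]
for p in (1.5, 2.0, 3.0):
    print(p, young_sides(0.7, 2.5, p), holder_sides(f, g, p))
```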
The following shows that Lp (X) is a normed space.
Theorem 7.3 (Minkowski’s inequality). Let f, g : X → R be measurable, then
\[ \|f + g\|_p \le \|f\|_p + \|g\|_p \]
i.e. the triangle inequality holds for the p-norm, p ∈ [1, ∞].
Proof. The cases p = 1 and p = ∞ are easy, so assume p ∈ (1, ∞) and let
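1/q = 1 − 1/p. We have
\[ \begin{aligned}
\|f + g\|_p^p &= \int_X |f + g| \cdot |f + g|^{p-1} \\
&\le \int_X (|f| + |g|) |f + g|^{p-1} \\
&= \int_X |f| |f + g|^{p-1} + \int_X |g| |f + g|^{p-1} \\
&\le (\|f\|_p + \|g\|_p) \Big(\int_X |f + g|^{(p-1)q}\Big)^{1/q} \\
&= (\|f\|_p + \|g\|_p) \|f + g\|_p^{p-1}
\end{aligned} \]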
then
\[ \|F\|_p \le \sum_{k=1}^{\infty} \|f_k\|_p \]
by the triangle inequality and monotone convergence, so in particular
$F \in L^p(X)$ and F is finite almost everywhere. Hence also $f(x) = \sum_{k=1}^{\infty} f_k(x)$ is
defined almost everywhere and in $L^p(X)$, since |f | ≤ F . The series $\sum_k f_k$
converges to f also in p-norm because
\[ \Big\| f - \sum_{k=1}^{n} f_k \Big\|_p \le \sum_{k=n+1}^{\infty} \|f_k\|_p \to 0. \]
Duality
If V is a normed linear space over R, then the dual space V ∗ consists of linear
functionals ϕ : V → R which are bounded in the sense that
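\[ \sum_{k=1}^{\infty} |\varphi_k(v)| \le \|v\| \sum_{k=1}^{\infty} \|\varphi_k\| < \infty \]
for any v ∈ V . This means we can define $\varphi(v) := \sum_k \varphi_k(v)$, since the sum is
absolutely convergent. It is easy to see that ϕ is linear with $\|\varphi\| \le \sum_k \|\varphi_k\|$,
thus defines an element of V ∗ . Also,
\[ \Big| \varphi(v) - \sum_{k=1}^{n} \varphi_k(v) \Big| \le \|v\| \sum_{k=n+1}^{\infty} \|\varphi_k\| \longrightarrow 0, \quad \text{as } n \to \infty \]
hence $\varphi = \sum_k \varphi_k$.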
We did not assume that V is a Banach space, but it turns out that V ∗ only
depends on the completion of V . For infinite-dimensional V it is not obvious
that V ∗ ≠ {0}. The Hahn–Banach theorem, which we will get to later, allows
one to construct many bounded linear functionals on V .
Returning to the example V = Lp (R), note that if 1/p + 1/q = 1, then
Hölder’s inequality shows that
\[ \varphi_f(g) = \int f g \]
and
\[ \varphi_f(\operatorname{sign}(f) |f|^{q/p}) = \int |f|^{1+q/p} = \int |f|^q = 1. \]
Theorem 7.8. The map $f \mapsto \varphi_f$ defines an isomorphism $L^p(\mathbb{R})^* = L^q(\mathbb{R})$ for
p ≠ ∞, 1/p + 1/q = 1.
Proof. It only remains to show that the map is onto — injectivity follows from
the fact that the norm is preserved. Let p = 1 first. Given a bounded linear
functional ϕ : L1 (R) → R we must find a g ∈ L∞ (R) such that
\[ \varphi(f) = \int f g \]
for any f ∈ L1 (R). If the above relation holds, then the integral of g over [a, b]
is ϕ(χ[a,b] ). So the idea is to get g as the derivative of
\[ G(x) := \begin{cases} \varphi(\chi_{[0,x]}) & x \ge 0 \\ -\varphi(\chi_{[x,0]}) & x \le 0 \end{cases} \]
which satisfies
ϕ(χ[a,b] ) = G(b) − G(a)
for a < b. It follows from
ϕ(χ[a,b] ) = ϕg (χ[a,b] )
for a < b. By the same argument as before, ϕ and ϕg agree on bounded functions
with bounded support.
To verify that g ∈ Lq (R) consider the truncations gn of g with |gn | ≤ n and
support in [−n, n]. We get
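\[ \begin{aligned}
\int |g_n|^q &= \int g_n \cdot \operatorname{sign}(g_n) |g_n|^{q-1} \\
&\le \int g \cdot \operatorname{sign}(g) |g_n|^{q-1} \\
&= \varphi(\operatorname{sign}(g) |g_n|^{q-1}) \\
&\le \|\varphi\| \cdot \| |g_n|^{q-1} \|_p \\
&= \|\varphi\| \Big(\int |g_n|^q\Big)^{1/p}
\end{aligned} \]
hence
\[ \Big(\int |g_n|^q\Big)^{1/q} \le \|\varphi\| \]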
since 1 − 1/p = 1/q. Applying the monotone convergence theorem to the in-
creasing sequence |gn |q → |g|q shows that also kgkq ≤ kϕk. It follows from
Hölder’s inequality that ϕg ∈ Lp (R)∗ , and since it agrees with ϕ on the dense
set of bounded functions with bounded support, we get ϕg = ϕ.
The theorem still holds for general measure spaces, at least if p > 1. It also
holds in the case p = 1 if the measure space is σ-finite, i.e. a countable union
of sets of finite measure. This requires a different proof, since we relied heavily
on the fundamental theorem of calculus in the proof above. For p = ∞ the
theorem is in general not true — the dual of L∞ (X) is usually strictly bigger
than L1 (X).
A complete inner product space, like L2 (X), is called a Hilbert space. The Riesz
representation theorem tells us that L2 (R) is isomorphic to its own dual, more
specifically that any bounded linear functional on it is of the form v ↦ ⟨w, v⟩
for some w. This turns out to be true for any Hilbert space.
Hilbert spaces are easier to work with than general Banach spaces, since one
has a notion of orthogonal vectors, orthonormal basis, and so on, and the inner
product is bilinear, whereas a norm is only sublinear.
Chapter 8
Inner products
The dot product of vectors x, y ∈ Rn given by
\[ \langle x, y \rangle = x_1 y_1 + \ldots + x_n y_n \]
and has the geometric interpretation
where φ is the angle between x and y. When dealing with vectors in Cn one
defines
\[ \langle x, y \rangle = \bar{x}_1 y_1 + \ldots + \bar{x}_n y_n \]
(there are two conventions here, depending on if the first or second vector gets
conjugated).
More generally, an inner product on a complex vector space V assigns a
complex number ⟨x, y⟩ to any pair of vectors x, y and has the following proper-
ties.
1. $\langle x, y \rangle = \overline{\langle y, x \rangle}$
for any x, y ∈ V .
Proof. The Cauchy–Schwarz inequality is equivalent to
\[ \det \begin{pmatrix} \langle x, x \rangle & \langle x, y \rangle \\ \langle y, x \rangle & \langle y, y \rangle \end{pmatrix} \ge 0. \]
This 2-by-2 matrix represents the quadratic form v ↦ ⟨v, v⟩ restricted to the
2-dimensional subspace spanned by x and y, and so has positive determinant
by Sylvester’s criterion.
A more elementary proof goes as follows. Starting from the obvious inequal-
ity $\|x - y\|^2 \ge 0$ we get
\[ \operatorname{Re}\langle x, y \rangle \le \frac{1}{2}\|x\|^2 + \frac{1}{2}\|y\|^2 \]
by expanding out. The idea is to exploit symmetries of the terms to amplify this
to the CS inequality. First, the right-hand side is invariant under multiplying x
or y by some $e^{i\varphi}$, so
\[ \operatorname{Re}(e^{i\varphi} \langle x, y \rangle) \le \frac{1}{2}\|x\|^2 + \frac{1}{2}\|y\|^2. \]
Optimizing this inequality means choosing φ so that the left-hand side is maxi-
mal, i.e. so that $e^{i\varphi} \langle x, y \rangle$ is real and positive and thus
\[ |\langle x, y \rangle| \le \frac{1}{2}\|x\|^2 + \frac{1}{2}\|y\|^2. \]
This is closer to what we want to prove, but we are not quite there yet. The
next step is to use the symmetry $(x, y) \mapsto (\lambda x, \frac{1}{\lambda} y)$ of the left hand side which
implies
\[ |\langle x, y \rangle| \le \frac{\lambda^2}{2}\|x\|^2 + \frac{1}{2\lambda^2}\|y\|^2 \]
for any λ > 0. We can assume x, y ≠ 0 at this point, since otherwise the
inequality is trivial. Then the right-hand side attains its minimum at
\[ \lambda = \sqrt{\|y\| / \|x\|} \]
which yields the inequality. The general principle of using symmetries to opti-
mize inequalities is a very useful one and can often be used to prove (or disprove!)
them.
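The optimization over λ can be checked numerically. The sketch below uses the standard inner product on C^3, conjugate-linear in the first slot; the sample vectors and helper names are assumptions for illustration. At the optimal λ the bound collapses to ‖x‖‖y‖, and any other λ gives a weaker bound.

```python
import math


def inner(x, y):
    """<x, y> = sum of conj(x_k) * y_k."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def norm(x):
    return math.sqrt(inner(x, x).real)


def scaled_bound(x, y, lam):
    """Right-hand side (lam^2 / 2) ||x||^2 + ||y||^2 / (2 lam^2)."""
    return 0.5 * lam ** 2 * norm(x) ** 2 + 0.5 * norm(y) ** 2 / lam ** 2


# Hypothetical sample vectors in C^3.
x = [1 + 2j, -1j, 3 + 0j]
y = [2 - 1j, 4 + 0j, 1j]

lam_opt = math.sqrt(norm(y) / norm(x))
print(abs(inner(x, y)))               # left-hand side |<x, y>|
print(scaled_bound(x, y, 1.0))        # unoptimized bound
print(scaled_bound(x, y, lam_opt))    # optimized bound
print(norm(x) * norm(y))              # Cauchy-Schwarz bound, equal to the optimized one
```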
The triangle inequality follows easily from the Cauchy–Schwarz inequality:
holds defines an inner product via the polarization identities. The proof of all
these identities is by expanding out both sides using linearity.
Two vectors x, y ∈ V are orthogonal if ⟨x, y⟩ = 0. If x and y are orthogonal,
then $\|x + y\|^2 = \|x\|^2 + \|y\|^2$, which is a form of the Pythagorean theorem.
This generalizes, by induction, to finite sets of mutually orthogonal vectors. A
collection of vectors ei , i ∈ I is orthonormal if
\[ \langle e_i, e_j \rangle = \delta_{ij} := \begin{cases} 1 & i = j \\ 0 & i \ne j \end{cases} \]
i.e. all ei are mutually orthogonal and have length 1. Given linearly independent
vectors x1 , . . . , xn we can always find orthonormal vectors e1 , . . . , en with the
same span using the Gram–Schmidt algorithm. This proceeds by induction on
n. In the case n = 1 we just have $e_1 = x_1 / \|x_1\|$. Assuming we have already
found e1 , . . . , en−1 we let
\[ \tilde{e}_n = x_n - \sum_{k=1}^{n-1} \langle e_k, x_n \rangle e_k, \qquad e_n = \tilde{e}_n / \|\tilde{e}_n\|. \]
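The inductive step of Gram–Schmidt subtracts from the next vector its components along the orthonormal vectors found so far and then normalizes. Below is a minimal numerical sketch; the helper names and the input vectors are hypothetical. It subtracts the projections one at a time from the running residual (the "modified" variant), which agrees with the usual formula in exact arithmetic, and it assumes the inputs are linearly independent so that no residual is zero.

```python
import math


def inner(x, y):
    """<x, y> = sum of conj(x_k) * y_k (conjugate-linear in the first slot)."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors, preserving the successive spans."""
    basis = []
    for x in vectors:
        residual = list(x)
        for e in basis:
            coeff = inner(e, residual)  # component of the residual along e
            residual = [r - coeff * c for r, c in zip(residual, e)]
        length = math.sqrt(inner(residual, residual).real)
        basis.append([r / length for r in residual])
    return basis


# Hypothetical input: three linearly independent vectors in R^3.
basis = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
for e in basis:
    print([round(t, 4) for t in e])
```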
Hilbert spaces
In order to be able to take infinite sums of vectors, we need the hypothesis
of completeness. Define a Hilbert space to be an inner product space which
is complete (as a normed vector space). In particular any Hilbert space is a
Banach space, but not the other way around. Any finite-dimensional inner
product space is a Hilbert space, as is L2 (X) for any measure space X.
An important consequence of completeness is that certain convex optimiza-
tion problems have a unique solution.
Theorem 8.2. Let H be a Hilbert space, K ⊂ H a non-empty closed convex
subset, and p ∈ H a point. Then there exists a unique q ∈ K which minimizes
the distance $\|p - q\|$ to p.
\[ \delta = \inf_{x \in K} \|x\| \]
but (xm + xn )/2 ∈ K by convexity, and so $\|(x_m + x_n)/2\| \ge \delta$, thus
x = y + z, y ∈ V, z ∈ V ⊥
Now let y ∈ V be the closest point to x in V and z = x − y. We want to
show that z ∈ V ⊥ , i.e. ⟨z, w⟩ = 0 for all w ∈ V . Since y is minimizing the
distance we have
We may assume that ‖w‖ = 1 and set λ = ⟨w, z⟩ which gives $0 \le -|\langle z, w \rangle|^2$, thus
⟨z, w⟩ = 0. This shows that the decomposition x = y + z exists and that y is
the closest point in V to x. By the same reasoning, if one lets z be the closest
point to x in V ⊥ , then y = x − z ∈ V , which completes the proof.
The conclusion of the theorem is often written as H = V ⊕ V ⊥ , meaning
that H can be identified, as an inner product space, with the set of pairs (y, z),
y ∈ V , z ∈ V ⊥ . A consequence is that $\overline{V} = (V^\perp)^\perp$ where $\overline{V}$ is the closure of
V . The map which assigns to any x ∈ H its closest point in V is the orthogonal
projection to V and is linear.
Every vector x ∈ H defines an element ϕx in the dual space H ∗ with ϕx (y) =
⟨x, y⟩. Indeed boundedness of ϕx follows from the Cauchy–Schwarz inequality.
We already know in the special case H = L2 (R) that this gives a (conjugate
linear) isomorphism between H and H ∗ . In fact this is true in general for any
Hilbert space.
Theorem 8.5. (Riesz representation theorem) Let H be a Hilbert space, ϕ :
H → C a bounded linear functional. Then there exists a unique x ∈ H such
that ϕ = ϕx , i.e. ϕ(y) = ⟨x, y⟩ for all y ∈ H.
Proof. We first show uniqueness. Let ϕx = ϕy . Then ϕx−y = 0, so 0 =
ϕx−y (x − y) = ⟨x − y, x − y⟩, thus x = y.
Next, we prove existence. If ϕ = 0 identically, then we can just take x = 0,
so the claim is obvious. So assume ϕ ≠ 0, which implies that Ker(ϕ) = {x ∈
H | ϕ(x) = 0} is a proper closed subspace of H, thus has non-trivial orthogonal
complement. Choose y ∈ Ker(ϕ)⊥ of length one, then ϕ(y) ≠ 0. Note that for
any z ∈ H, the vector $z - \frac{\varphi(z)}{\varphi(y)} y$ is in Ker(ϕ), thus perpendicular to y, so
\[ \langle y, z \rangle - \frac{\varphi(z)}{\varphi(y)} = 0 \]
or
\[ \varphi(z) = \langle \overline{\varphi(y)}\, y, z \rangle \]
which proves the claim with $x = \overline{\varphi(y)}\, y$.
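In the finite-dimensional Hilbert space C^n the construction in this proof can be carried out by hand. The sketch below assumes the inner product is conjugate-linear in the first slot, in which case the representing vector is the complex conjugate of ϕ(y) times y. The functional ϕ(z) = Σ a_k z_k and its coefficients a are hypothetical; for such a functional the orthogonal complement of the kernel is spanned by the conjugate of a.

```python
import math


def inner(x, y):
    """<x, y> = sum of conj(x_k) * y_k (conjugate-linear in the first slot)."""
    return sum(s.conjugate() * t for s, t in zip(x, y))


# Hypothetical bounded linear functional on C^3: phi(z) = sum of a_k * z_k.
a = [2 - 1j, 3j, 1 + 0j]


def phi(z):
    return sum(c * t for c, t in zip(a, z))


# Unit vector y orthogonal to Ker(phi): the normalized conjugate of a.
length = math.sqrt(sum(abs(c) ** 2 for c in a))
y = [c.conjugate() / length for c in a]

# Representing vector x = conj(phi(y)) * y, as in the proof.
scale = phi(y).conjugate()
x = [scale * t for t in y]

z = [0.5 - 2j, 1 + 1j, -3 + 0.25j]
print(phi(z), inner(x, z))  # the two values agree up to rounding
```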
Orthonormal bases
In this section we use the completeness and orthogonality assumptions to make
sense of infinite linear combinations of vectors. Let H be a Hilbert space and let
e1 , e2 , . . . be a countable sequence of orthonormal vectors in H. This means that
each en has unit length and is orthogonal to all the other ek , n ≠ k. Suppose
cn ∈ C are scalars with $\sum_n |c_n|^2 < \infty$, then the series
\[ \sum_{n=1}^{\infty} c_n e_n \]
Proof. Let ε > 0, then by square-summability we can choose N > 0 such that
\[ \sum_{n=N}^{\infty} |c_n|^2 < \varepsilon \]
hence if m ≥ n ≥ N , then
\[ \Big\| \sum_{k=1}^{m} c_k e_k - \sum_{k=1}^{n} c_k e_k \Big\|^2 = \sum_{k=n+1}^{m} |c_k|^2 < \varepsilon \]
by Pythagoras. This proves that the partial sums form a Cauchy sequence and
thus have a limit since H is complete.
For the independence statement, choose N1 > 0 so that both
\[ \sum_{n=N_1}^{\infty} |c_n|^2 < \varepsilon, \qquad \sum_{n=N_1}^{\infty} |c_{\sigma(n)}|^2 < \varepsilon. \]
The theorem ensures that if ei , i ∈ I is any family of orthonormal vectors,
not necessarily countable, and ci ∈ C are square summable, then $\sum_{i \in I} c_i e_i$ is a
well-defined element in H.
Theorem 8.7. Let ei , i ∈ I be an orthonormal family of vectors in a Hilbert
space H. Then the map
\[ L^2(I) \to H, \qquad (c_i)_{i \in I} \mapsto \sum_{i \in I} c_i e_i \]
We call the smallest closed subspace containing all the ei the (Hilbert
space) span of the ei , which is usually bigger than the algebraic span, which
is the set of finite linear combinations of the ei .
Proof. We need to show that the map preserves the inner product. If I is finite,
then
\[ \Big\langle \sum_{i \in I} x_i e_i, \sum_{i \in I} y_i e_i \Big\rangle = \sum_{i \in I} \bar{x}_i y_i \]
We are particularly interested in the case when the ei span all of H, thus
form an (orthonormal) basis, in the Hilbert space sense.
Theorem 8.8. Let H be a Hilbert space and (ei )i∈I an orthonormal family of
vectors in H. The following conditions are equivalent.
1. The span of the ei is H.
3. The Parseval identity
\[ \|x\|^2 = \sum_{i \in I} |\langle e_i, x \rangle|^2 \]
in L2 ([0, 1]). It is easy to see that the en are orthonormal. The statement that
they form a basis is equivalent to the density of trigonometric polynomials in
L2 ([0, 1]). To show this, it suffices to approximate any characteristic function
of an interval by trigonometric polynomials with respect to the 2-norm, which
reduces to an explicit computation. More details will be given in the next
chapter.
An almost trivial example is L2 (Z) with the family of functions
\[ e_n(k) = \delta_{nk} = \begin{cases} 1 & n = k \\ 0 & n \ne k \end{cases} \]
which form an orthonormal basis. This generalizes to L2 (I) for any set I of
course.
Theorem 8.9. Any Hilbert space has an orthonormal basis.
Proof. This is a consequence of Zorn’s lemma, which states that if a partially
ordered set P has the property that every chain has an upper bound in P ,
then P contains a maximal element. In our case, P is the set of orthonormal
families in H. An upper bound on a chain of orthonormal families is given by
their union. Thus there exists some maximal orthonormal family (ei )i∈I . If the
span of this family is not all of H, then we could add to it some unit vector
in the orthogonal complement, contradicting maximality, thus ei must form a
basis.
Combining the previous two theorems, we conclude that every Hilbert space
H is isomorphic to L2 (I) for some set I with the counting measure. One can
show that the cardinality of I is independent of the choice of orthonormal basis.
For most Hilbert spaces of interest I is countable, which is equivalent to H
being separable, i.e. containing a countable dense subset. In particular, there is
essentially only one separable infinite-dimensional Hilbert space.
Note that we consider only orthonormal bases of Hilbert spaces. Dealing
with coordinates with respect to more general families of vectors becomes rather
difficult in the topological setting. Also we have so far not discussed the theory
of linear maps from one Hilbert space to another, which is a very non-trivial
extension of its finite-dimensional counterpart.
Chapter 9
Fourier analysis
Although fˆ typically looks very different from f , they do have the same
2-norm, and likewise inner products of functions are preserved under Fourier
transform.
The Fourier transform has many generalizations, depending on what kind
of “physical space” one considers and which kind of symmetries it has. Indeed
a large chunk of modern mathematics, roughly what is called representation
theory, can be thought of as generalizing Fourier analysis in some sense.
U (1) := {z ∈ C | |z| = 1} of complex numbers with absolute value 1. The
identification of the two is given by the exponential map
A trigonometric polynomial is a function of the form
\[ f(x) = \sum_{n=-N}^{N} c_n e^{inx} \]
Note that because $e^{imx} e^{inx} = e^{i(m+n)x}$, the product of two trigonometric
polynomials is again one.
Proof. The basic trigonometric polynomials en are concentrated at a single point
n ∈ Z in frequency space, but spread out in physical space. The first step is
to construct a sequence of trigonometric polynomials Q1 , Q2 , Q3 , . . . which are
concentrated more and more near a given point in physical space, which we can
take to be 0 ∈ R/(2πZ). More precisely we want
1. Qn (x) ≥ 0 for all x ∈ R,
2. $\frac{1}{2\pi} \int_0^{2\pi} Q_n(x)\,dx = 1$,
3. On each interval [ε, 2π − ε], ε > 0, Qn converge to 0 uniformly as n → ∞.
We can take
\[ Q_n(x) = C_n \Big(\frac{1 + \cos(x)}{2}\Big)^n \]
where the constants Cn > 0 are chosen so that $\frac{1}{2\pi} \int_0^{2\pi} Q_n(x)\,dx = 1$. Thus,
using that Qn is an even function,
\[ 1 = \frac{C_n}{\pi} \int_0^{\pi} \Big(\frac{1 + \cos(x)}{2}\Big)^n dx > \frac{C_n}{\pi} \int_0^{\pi} \Big(\frac{1 + \cos(x)}{2}\Big)^n \sin(x)\,dx = \frac{2 C_n}{\pi (n + 1)} \]
and since Qn is decreasing on [0, π],
\[ Q_n(x) \le Q_n(\varepsilon) \le \frac{\pi (n + 1)}{2} \Big(\frac{1 + \cos(\varepsilon)}{2}\Big)^n \to 0 \quad \text{as } n \to \infty \]
for x ∈ [ε, π]. This shows that the Qn satisfy the desired properties.
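The three properties of the kernels Qn can be checked numerically. The sketch below is an illustration with hypothetical function names: it computes Cn with a midpoint rule (which integrates a trigonometric polynomial over a full period exactly, up to rounding, once the number of nodes exceeds its degree) and compares it with the closed form 4^n / binom(2n, n), which follows from the mean value of cos^{2n}(x/2) over a period.

```python
import math


def normalizing_constant(n, steps=2000):
    """C_n such that (1 / 2pi) * integral over [0, 2pi] of C_n ((1 + cos x)/2)^n dx = 1."""
    h = 2 * math.pi / steps
    integral = h * sum(((1 + math.cos((k + 0.5) * h)) / 2) ** n for k in range(steps))
    return 2 * math.pi / integral


def Q(n, x):
    """The kernel Q_n(x) = C_n ((1 + cos x) / 2)^n."""
    return normalizing_constant(n) * ((1 + math.cos(x)) / 2) ** n


for n in (1, 10, 50, 200):
    c = normalizing_constant(n)
    # C_n stays below pi (n + 1) / 2, and Q_n(0.5) eventually decays to 0.
    print(n, c, math.pi * (n + 1) / 2, Q(n, 0.5))
```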
Let f : R/(2πZ) → C be a continuous 2π-periodic function. For each n ≥ 1
we let
\[ \begin{aligned}
P_n(x) &= \frac{1}{2\pi} \int_0^{2\pi} f(x - t) Q_n(t)\,dt \\
&= \frac{1}{2\pi} \int_0^{2\pi} f(t) Q_n(x - t)\,dt.
\end{aligned} \]
We may write $Q_n(x) = \sum_{k=-N}^{N} c_k e^{ikx}$, since Qn is a trigonometric polynomial.
Then
\[ P_n(x) = \sum_{k=-N}^{N} \frac{c_k}{2\pi} \Big(\int_0^{2\pi} f(t) e^{-ikt}\,dt\Big) e^{ikx} \]
where the Fourier coefficients fˆ(n) ∈ C are uniquely determined by
\[ \hat{f}(n) = \frac{1}{2\pi} \int_0^{2\pi} f(x) e^{-inx}\,dx \]
converge to f in 2-norm as N → ∞.
Convolution
The group structure on S 1 and Z allows us to define the convolution of functions
on these spaces. Convolution turns out to be Fourier dual to the pointwise
product of functions. While the latter naturally defines a product on L∞ , the
former is defined for a pair of functions in L1 .
We start with the discrete group Z. Given sequences of complex numbers
an , bn , n ∈ Z, we define
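\[ (a * b)(n) = \sum_{i+j=n} a_i b_j = \sum_{k \in \mathbb{Z}} a_k b_{n-k} \]
whenever the infinite sum makes sense. For example, if $a \in L^1(\mathbb{Z})$, i.e. $\sum |a_n| < \infty$,
and bn are bounded, then the terms of $\sum_{k \in \mathbb{Z}} a_k b_{n-k}$ are absolutely summable
with
\[ \|a * b\|_\infty \le \|a\|_1 \|b\|_\infty \]
If moreover $b \in L^1(\mathbb{Z})$, then $a * b \in L^1(\mathbb{Z})$ with
\[ \|a * b\|_1 \le \sum_{i,j \in \mathbb{Z}} |a_i b_j| = \|a\|_1 \|b\|_1 \]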
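For finitely supported sequences the convolution is a finite double sum, and both norm estimates can be verified directly. In this sketch a sequence is stored as a dictionary mapping an index n to the value a_n, with missing indices meaning 0; this representation, the sample sequences and the helper names are assumptions for illustration.

```python
def convolve(a, b):
    """(a * b)(n) = sum over i + j = n of a_i b_j, for finitely supported sequences."""
    result = {}
    for i, ai in a.items():
        for j, bj in b.items():
            result[i + j] = result.get(i + j, 0) + ai * bj
    return result


def norm1(a):
    return sum(abs(v) for v in a.values())


def norm_inf(a):
    return max(abs(v) for v in a.values())


# Hypothetical finitely supported sequences on Z.
a = {-1: 1, 0: -2, 1: 1}
b = {0: 3, 2: 1j}
c = convolve(a, b)
print(c)
print(norm1(c), "<=", norm1(a) * norm1(b))
print(norm_inf(c), "<=", norm1(a) * norm_inf(b))
```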
This turns out to be well-defined if f, g ∈ L1 (S 1 ) and we have
\[ \|f * g\|_1 \le \|f\|_1 \|g\|_1. \]
The standard (easy) proof of this uses Fubini’s theorem on double integrals,
which we have not covered so far. A more elementary proof uses the fact that
f ∗ g is approximated by a linear combination of translates of g and the triangle
inequality for the 1-norm.
Suppose now that f, g ∈ L2 (S 1 ) ⊂ L1 (S 1 ), then f ∗ g is in fact continuous.
To see this let
gx (t) = g(x − t)
then gx ∈ L2 (S 1 ) depends continuously on x ∈ R and therefore
\[ (f * g)(x) = \langle \bar{f}, g_x \rangle \]
\[
\widehat{f * g} = \hat f\, \hat g .
\]
Note that $\hat f, \hat g \in L^\infty(\mathbb{Z})$ and we have extended the Fourier transform to
$L^1(S^1) \supset L^2(S^1)$ using the same formula. It follows directly from the definition
Since convolution is bilinear, the statement is also true whenever f, g are trigonometric polynomials:
\[
\left(\sum_{n=-N}^{N} a_n e_n\right) * \left(\sum_{n=-N}^{N} b_n e_n\right) = \sum_{n=-N}^{N} a_n b_n e_n .
\]
Density of trigonometric polynomials in $L^1(S^1)$ and continuity of convolution,
$\|f * g\|_1 \le \|f\|_1 \|g\|_1$, imply the statement for general $f, g \in L^1(S^1)$.
For the second statement the idea is the same as before, this time using the
identity
\[
e_{m+n} = e_m e_n
\]
in L1 (S 1 ).
\[
S_N(f) = D_N * f
\]
where
\[
D_N := \sum_{n=-N}^{N} e_n
\]
is the Dirichlet kernel. The formula for $S_N(f)$ follows from the obvious identity
$\widehat{S_N(f)} = \widehat{D_N}\,\hat f$ in $L^2(\mathbb{Z})$.
Note that $D_N$ is given by a geometric series, and so we can easily find a
closed form as follows (recall that $z = e^{ix}$).
\begin{align*}
D_N(z) &= \sum_{n=-N}^{N} z^n \\
&= z^{-N}\left(1 + z + \dots + z^{2N}\right) \\
&= \frac{1 - z^{2N+1}}{z^{N}(1 - z)} \\
&= \frac{z^{N+1/2} - z^{-N-1/2}}{z^{1/2} - z^{-1/2}} \\
&= \frac{\sin((N + 1/2)x)}{\sin(x/2)}
\end{align*}
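The closed form of the Dirichlet kernel is easy to sanity-check numerically by comparing it with the defining sum. This is only an illustrative sketch; the function names are made up for this example.

```python
import cmath
import math


def dirichlet_sum(N, x):
    """D_N(x) computed directly as the sum of e^{inx} for n = -N, ..., N."""
    total = sum(cmath.exp(1j * n * x) for n in range(-N, N + 1))
    return total.real  # imaginary parts cancel in pairs n, -n


def dirichlet_closed(N, x):
    """D_N(x) from the closed form sin((N + 1/2) x) / sin(x / 2), for x not in 2*pi*Z."""
    return math.sin((N + 0.5) * x) / math.sin(x / 2)


for N in (1, 5, 20):
    for x in (0.3, 1.7, -2.9):
        print(N, x, dirichlet_sum(N, x), dirichlet_closed(N, x))
```

At multiples of 2π the closed form is a 0/0 expression and has to be read as the limit 2N + 1, which is what the direct sum gives there.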
with
\[
g(x) = \frac{f(x)}{\sin(x/2)} .
\]
Here we are integrating over two periods and dividing the result by two, since
sin(x/2) and sin((N + 1/2)x) are only 4π-periodic by themselves. Because we
can write $S_N(f)$ essentially as a Fourier coefficient of g, the claim will follow
as long as $g \in L^1(\mathbb{R}/(4\pi\mathbb{Z}))$. But this follows directly from our assumptions,
since f is differentiable wherever sin(x/2) = 0 and thus g is bounded near these
points.
Theorem 9.6 (Carleson). If f ∈ L2 (S 1 ), then SN (f ) → f pointwise almost
everywhere.
It was discovered by Fejér that the convergence properties of the Fourier
series can be radically improved by employing Cesàro summation, which means
replacing each term of a sequence by the running average of the terms so far. Thus, instead of
the $S_N(f)$ we consider the sequence
\[
T_N(f) = \frac{1}{N}\left(S_0(f) + \dots + S_{N-1}(f)\right) = F_N * f
\]
where
\[
F_N = \frac{1}{N}\left(D_0 + \dots + D_{N-1}\right)
\]
is the Fejér kernel. To better understand $F_N$, we find a closed formula. Compute
\begin{align*}
N F_N &= z^{-N+1} + 2z^{-N+2} + \dots + N + \dots + 2z^{N-2} + z^{N-1} \\
&= \left(z^{-(N-1)/2} + z^{-(N-3)/2} + \dots + z^{(N-1)/2}\right)^2 \\
&= \left(\frac{z^{N/2} - z^{-N/2}}{z^{1/2} - z^{-1/2}}\right)^2
\end{align*}
hence
\[
F_N(x) = \frac{\sin^2(Nx/2)}{N \sin^2(x/2)} .
\]
In particular, $F_N(x) \ge 0$ for all x and $F_N$ converges uniformly to 0 as N → ∞
on any interval [δ, 2π − δ], δ > 0. Also note that
\[
\frac{1}{2\pi}\int_0^{2\pi} F_N(x)\,dx = \langle F_N, e_0 \rangle = 1
\]
thus we can replace the $Q_n$ in the proof of the theorem on density of trigonometric
polynomials by $F_n$ and get the same conclusion.
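The three properties of the Fejér kernel used here (agreement of the closed form with the average of Dirichlet kernels, non-negativity, and mean value 1) can be checked numerically. The sketch below is an illustration under the formulas stated above; the function names and the choice of a midpoint rule are assumptions of this example.

```python
import math


def dirichlet(n, x):
    """Dirichlet kernel D_n(x) in closed form, for x not in 2*pi*Z."""
    return math.sin((n + 0.5) * x) / math.sin(x / 2)


def fejer_average(N, x):
    """F_N(x) as the average of D_0, ..., D_{N-1}."""
    return sum(dirichlet(n, x) for n in range(N)) / N


def fejer_closed(N, x):
    """F_N(x) from the closed formula sin^2(N x / 2) / (N sin^2(x / 2))."""
    return math.sin(N * x / 2) ** 2 / (N * math.sin(x / 2) ** 2)


def mean_value(N, samples=4096):
    """(1 / 2 pi) * integral of F_N over [0, 2 pi] by the midpoint rule.

    Midpoints never hit x = 0, where the closed formula is a 0/0 expression.
    """
    h = 2 * math.pi / samples
    total = sum(fejer_closed(N, (j + 0.5) * h) for j in range(samples))
    return total * h / (2 * math.pi)


print(fejer_average(6, 1.1), fejer_closed(6, 1.1))
print(mean_value(8))
```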
satisfying the boundary condition $f|_{S^1} = g$ for some given $g : S^1 = \partial D \to \mathbb{R}$,
which we assume to be continuous. Here, z = x + iy and $f_{xx}$ is shorthand
notation for $\partial^2 f/\partial x^2$. One can show that Laplace's PDE is equivalent to finding
a minimizer of the energy
\[
E(f) = \int_D f_x^2 + f_y^2 \; dx\,dy
\]
This converges to g(z) on the boundary by Abel's theorem (note that $\bar z = 1/z$
for z on the unit circle). To argue that ∆f = 0, we refer to basic results in
complex analysis which state that a power series $\sum a_n z^n$ is holomorphic
(complex differentiable), and thus f is a sum of a holomorphic and an anti-holomorphic
function, hence harmonic.
\[
f_{xx} = f_t
\]
which is to be solved for t > 0 given an initial condition f(x, 0) = g(x). First,
note that for each n ∈ Z the function
\[
e^{-n^2 t + inx}
\]
is a solution of the heat equation with $g(x) = e^{inx}$ a basis element. At least
formally we can write the general solution as
\[
f(x, t) = \sum_{n\in\mathbb{Z}} \hat g(n)\, e^{-n^2 t + inx} = (g * K_t)(x)
\]
We claim that
\[
\sum_{n\in\mathbb{Z}} e^{-n^2 t + inx} = \sqrt{\frac{\pi}{t}} \sum_{n\in\mathbb{Z}} e^{-(x - 2\pi n)^2/4t}
\]
for $x \in S^1$, t > 0. Since the left hand side is given as a Fourier series, it suffices
to compute the Fourier coefficients of the right hand side. Note that the right
hand side is a sum of translates, by 2πn, of the same function $e^{-x^2/4t}$. So
instead of integrating each of the translates over [0, 2π], we can just integrate
once over R. We compute
\[
\sqrt{\frac{\pi}{t}} \cdot \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-x^2/4t} e^{-inx}\,dx = \frac{1}{2\sqrt{\pi t}}\, e^{-n^2 t} \int_{-\infty}^{\infty} e^{-\left(x/(2\sqrt{t}) + in\sqrt{t}\right)^2} dx
\]
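The claimed identity between the Fourier series and the sum of translated Gaussians can be tested numerically by truncating both sums. This is a rough sketch for illustration; the truncation at 200 terms and the function names are choices made here, not part of the notes.

```python
import math


def heat_kernel_fourier(x, t, terms=200):
    """Sum over n in Z of exp(-n^2 t + i n x), truncated to |n| <= terms.

    The terms for n and -n combine to 2 exp(-n^2 t) cos(n x), so the sum is real.
    """
    tail = sum(math.exp(-n * n * t) * math.cos(n * x) for n in range(1, terms + 1))
    return 1.0 + 2.0 * tail


def heat_kernel_gaussians(x, t, terms=200):
    """sqrt(pi / t) times the sum over n of exp(-(x - 2 pi n)^2 / (4 t)), truncated."""
    total = sum(
        math.exp(-((x - 2 * math.pi * n) ** 2) / (4 * t))
        for n in range(-terms, terms + 1)
    )
    return math.sqrt(math.pi / t) * total


for t in (0.1, 1.0, 5.0):
    for x in (0.0, 1.0, 3.0):
        print(t, x, heat_kernel_fourier(x, t), heat_kernel_gaussians(x, t))
```

The two sides are complementary in practice: the Fourier side converges fastest for large t, the Gaussian side for small t.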
Theorem 9.9 (Weyl). If $x \in S^1 = \mathbb{R}/(2\pi\mathbb{Z})$ is an irrational multiple of 2π
and $f \in C(S^1)$, then
\[
\lim_{N\to\infty} \frac{1}{N} \sum_{k=1}^{N} f(kx) = \frac{1}{2\pi} \int_0^{2\pi} f .
\]
Proof. Both sides are bounded linear functionals on the Banach space $C(S^1)$
with the ∞-norm, since the average is bounded by $\|f\|_\infty$. Thus it suffices to
check the identity for $f(x) = e^{inx}$, n ∈ Z, whose algebraic span is dense. For
n = 0 both sides are obviously 1, so assume n ≠ 0. Then $e^{inx} \ne 1$ by irrationality
of x/(2π), so
\[
\sum_{k=1}^{N} e^{inkx} = \frac{e^{inx} - e^{i(N+1)nx}}{1 - e^{inx}}
\]
which has absolute value bounded by $2/|1 - e^{inx}|$ for any N, hence the averages
go to 0 as N → ∞.
nx mod 1, n = 1, 2, 3, . . .
in [0, 1) will be evenly distributed, e.g. the first digit after the decimal point is
equally likely to be any of 0, . . . , 9.
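The equidistribution statement about first digits can be observed experimentally. The following sketch counts the first decimal digit of the fractional parts of n·α for an irrational α; the choice α = √2, the sample size, and the function name are assumptions of this example.

```python
import math


def first_digit_frequencies(alpha, count):
    """Relative frequency of each first decimal digit of (n * alpha) mod 1, n = 1..count."""
    counts = [0] * 10
    for n in range(1, count + 1):
        frac = (n * alpha) % 1.0
        digit = min(9, int(10 * frac))  # guard against rounding up to 10
        counts[digit] += 1
    return [c / count for c in counts]


freqs = first_digit_frequencies(math.sqrt(2), 100_000)
for digit, freq in enumerate(freqs):
    print(digit, freq)
```

For a rational α the same experiment produces only finitely many fractional parts, so the frequencies typically do not approach 1/10.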
and $\hat f \in L^\infty(\mathbb{R})$. In a certain sense, this can be regarded as the limit of the
Fourier transform of a periodic function as the period goes to infinity. Let us
assume that f is smooth with compact support. Suppose R > 0 is large enough
so that the support of f is contained in [−R, R], then we can represent f as a
Fourier series on [−R, R] of the form
\[
f(x) = \sum_{n\in\mathbb{Z}} a_n e^{2\pi i n x/(2R)}
\]
with
\[
a_n = \frac{1}{2R} \int_{-R}^{R} f(x) e^{-2\pi i n x/(2R)}\,dx = \frac{1}{2R}\, \hat f\!\left(\frac{n}{2R}\right) .
\]
The right hand side is the function on Z obtained by sampling $\hat f$ on the regular
grid Z/(2R) and weighting by 1/(2R), so becomes a better and better approximation
to $\hat f$ as R → ∞. Furthermore, Parseval's identity implies
\[
\int_{-\infty}^{\infty} |f(x)|^2\,dx = \frac{1}{2R} \sum_{n\in\mathbb{Z}} \left|\hat f\!\left(\frac{n}{2R}\right)\right|^2 \to \int_{-\infty}^{\infty} |\hat f(\xi)|^2\,d\xi, \quad \text{as } R \to \infty
\]
where convergence to the integral follows from continuity of $\hat f$. This is the
Plancherel theorem, which says that the Fourier transform for functions on R
is an isometry. In particular, the Fourier transform is continuous with respect
to the 2-norm, and so extends from compactly supported smooth functions to all of
$L^2(\mathbb{R})$.
A useful property of the Fourier transform is that taking the derivative of $\hat f$
corresponds to multiplying f by −2πix pointwise. More precisely, we have:
Theorem 9.10. Suppose $f \in L^1(\mathbb{R})$ such that $xf \in L^1(\mathbb{R})$ as well. Then $\hat f$ is
continuously differentiable and
\[
\frac{d\hat f}{d\xi}(\xi) = \widehat{-2\pi i x f}\,(\xi) .
\]
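Proof. We want to show existence of
\[
\lim_{h\to 0} \int_{-\infty}^{\infty} f(x)\, \frac{e^{-2\pi i x(\xi+h)} - e^{-2\pi i x \xi}}{h}\,dx .
\]
The difference quotient satisfies
\[
\left| \frac{e^{-2\pi i x(\xi+h)} - e^{-2\pi i x \xi}}{h} \right| \le 2\pi |x|
\]
and so, since $|xf| \in L^1(\mathbb{R})$ by assumption, we can apply the dominated convergence
theorem which gives
\[
\hat f'(\xi) = -2\pi i \int_{-\infty}^{\infty} x f(x) e^{-2\pi i x \xi}\,dx = \widehat{-2\pi i x f}\,(\xi) .
\]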
\[
\widehat{\frac{df}{dx}}(\xi) = 2\pi i \xi\, \hat f(\xi) .
\]
Proof. We need to show the identity
\[
\int_{-\infty}^{\infty} \frac{df}{dx}(x)\, e^{-2\pi i x \xi}\,dx = 2\pi i \xi \int_{-\infty}^{\infty} f(x) e^{-2\pi i x \xi}\,dx .
\]
The idea is to use integration by parts, which we can do by our assumption on
f and
\[
\int_{-R}^{R} \frac{d}{dx}\left( f(x) e^{-2\pi i x \xi} \right) dx = \Big[ f(x) e^{-2\pi i x \xi} \Big]_{-R}^{R} \to 0
\]
as R → ∞.
The previous theorems required assumptions both on the regularity and the
decay of f. In order to not have to keep track of these, it is convenient to work
with a class of functions which have an unlimited amount of both regularity and
decay. A Schwartz function is an infinitely differentiable (aka $C^\infty$, smooth)
function f : R → C such that all functions
\[
|x|^n \frac{d^m f}{dx^m}(x), \quad m, n \ge 0
\]
are bounded on R. The space of Schwartz functions on R is denoted S(R). An
example of a Schwartz function is any smooth function with bounded support,
as is the Gaussian $e^{-x^2}$. If f ∈ S(R), then so are f′, xf, and $\hat f$, and S(R)
is closed under pointwise product and convolution of functions.
Uncertainty principle
An important idea about the Fourier transform is that not both f and fˆ can
be concentrated arbitrarily close to a single point. This is a consequence of the
general principle that Fourier transform interchanges small scale and large scale
structure. It leads to the famous Heisenberg uncertainty principle in quantum
mechanics concerning position and momentum of a particle.
We begin by defining operators (linear maps) X, D : S(R) → S(R) on
Schwartz functions given by
\[
(Xf)(x) := x f(x), \qquad (Df)(x) := \frac{i}{2\pi} \frac{d}{dx} f(x) .
\]
These satisfy the commutator relation
\[
DX - XD = \frac{i}{2\pi}
\]
since
\[
(D(X(f)) - X(D(f)))(x) = \frac{i}{2\pi} \left( f(x) + x f'(x) - x f'(x) \right) .
\]
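They are also self-adjoint in the sense that
\[
\langle Xf, g \rangle = \langle f, Xg \rangle, \qquad \langle Df, g \rangle = \langle f, Dg \rangle
\]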
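for any f, g ∈ S(R). The first identity is obvious, and the second follows by
integration by parts:
\begin{align*}
\langle Df, g \rangle &= \int_{-\infty}^{\infty} \overline{\frac{i}{2\pi} f'(x)}\; g(x)\,dx \\
&= \int_{-\infty}^{\infty} \overline{f(x)}\; \frac{i}{2\pi} g'(x)\,dx \\
&= \langle f, Dg \rangle
\end{align*}
Note that the sign coming from integration by parts cancels with the one from
$\bar{i} = -i$.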
Theorem 9.12. Let f ∈ S(R), then
\[
\|Xf\|_2\, \|Df\|_2 \ge \frac{1}{4\pi} \|f\|_2^2 .
\]
Proof. We start with the obvious inequality
for any a, b ∈ R. Expanding this out and using the self-adjointness and commutator
relations gives
\[
a^2 \|Xf\|_2^2 + b^2 \|Df\|_2^2 \ge -\frac{ab}{2\pi} \|f\|_2^2 .
\]
We optimize the inequality by setting
\[
a = \sqrt{\frac{\|Df\|_2}{2\|Xf\|_2}}, \qquad b = -\sqrt{\frac{\|Xf\|_2}{2\|Df\|_2}}
\]
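For convenience, here is the substitution of these values of a and b written out. It is a routine computation added as a worked step, and it assumes f ≠ 0 so that neither norm in the denominators vanishes.

```latex
\begin{align*}
a^2 \|Xf\|_2^2 &= \frac{\|Df\|_2}{2\|Xf\|_2}\, \|Xf\|_2^2 = \tfrac12 \|Xf\|_2 \|Df\|_2, \\
b^2 \|Df\|_2^2 &= \frac{\|Xf\|_2}{2\|Df\|_2}\, \|Df\|_2^2 = \tfrac12 \|Xf\|_2 \|Df\|_2, \\
-\frac{ab}{2\pi} &= \frac{1}{2\pi} \cdot \frac12 = \frac{1}{4\pi},
\end{align*}
so the inequality becomes
\[
\|Xf\|_2 \|Df\|_2 \ge \frac{1}{4\pi} \|f\|_2^2 .
\]
```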
The integral
\[
\int_{-\infty}^{\infty} (x - x_0)^2 |f(x)|^2\,dx
\]
is the dispersion of f about $x_0$. Hence, the theorem says that for normalized f,
f and $\hat f$ cannot both have arbitrarily small dispersion.
Let us set $x_0 = \xi_0 = 0$ for simplicity, then the inequality becomes an equality
only for the Gaussian
\[
f(x) = C e^{-\pi x^2/\sigma^2}
\]
where σ > 0 is arbitrary and $C = \sqrt[4]{2}/\sqrt{\sigma}$ is chosen so that $\|f\|_2 = 1$. The
Fourier transform is
\[
\hat f(\xi) = \sigma C e^{-\pi \sigma^2 \xi^2}
\]
so the Fourier transform of a Gaussian with variance $\sigma^2$ is again a Gaussian,
but with variance $\sigma^{-2}$.
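That the Gaussian attains equality in the uncertainty inequality can be confirmed numerically. The sketch below approximates the two 2-norms by a midpoint rule on a truncated interval; the truncation at 10σ, the step count, and all function names are assumptions of this illustration.

```python
import math


def gaussian(x, sigma):
    """f(x) = C exp(-pi x^2 / sigma^2) with C = 2^(1/4) / sqrt(sigma)."""
    c = 2 ** 0.25 / math.sqrt(sigma)
    return c * math.exp(-math.pi * x * x / sigma ** 2)


def gaussian_prime(x, sigma):
    """Derivative f'(x) of the Gaussian above."""
    return -2 * math.pi * x / sigma ** 2 * gaussian(x, sigma)


def l2_norm(func, half_width, steps=20000):
    """2-norm of func on [-half_width, half_width] by the midpoint rule."""
    h = 2 * half_width / steps
    total = sum(abs(func(-half_width + (j + 0.5) * h)) ** 2 for j in range(steps))
    return math.sqrt(total * h)


def uncertainty_product(sigma):
    """||Xf||_2 * ||Df||_2 for the normalized Gaussian of width sigma."""
    half_width = 10 * sigma  # the Gaussian is negligible beyond this
    norm_xf = l2_norm(lambda x: x * gaussian(x, sigma), half_width)
    # |Df| = |(i / 2 pi) f'| = |f'| / (2 pi)
    norm_df = l2_norm(lambda x: gaussian_prime(x, sigma) / (2 * math.pi), half_width)
    return norm_xf * norm_df


for sigma in (0.5, 1.0, 3.0):
    print(sigma, uncertainty_product(sigma), 1 / (4 * math.pi))
```

The product does not depend on σ: making f narrower shrinks the norm of Xf by exactly the factor by which the norm of Df grows.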
\[
\chi_n(k) = e^{2\pi i n k/N}, \quad n = 0, 1, \dots, N - 1 .
\]
So the dual to Z/NZ is the same cyclic group. Also, the $\chi_n$ form an orthonormal
basis of $L^2(\mathbb{Z}/N\mathbb{Z}) = \mathbb{C}^N$ since
\[
\langle \chi_m, \chi_n \rangle = \frac{1}{N} \sum_{k=0}^{N-1} e^{2\pi i (n-m) k/N}
\]
and the right hand side is clearly 1 if m = n, and otherwise 0 from
the formula for a geometric series. Any N orthonormal vectors in $\mathbb{C}^N$ give a
basis.
The Fourier transform of f : Z/NZ → C is
\[
\hat f(k) = \langle \chi_k, f \rangle = \frac{1}{N} \sum_{n=0}^{N-1} f(n) e^{-2\pi i n k/N}
\]
Then one easily checks the formula
\[
\hat f(k) = \frac{1}{2} \left( \hat f_0(k) + e^{-2\pi i k/N} \hat f_1(k) \right) .
\]
So the Fourier transform of a length N vector is obtained from the Fourier
transforms of two length N/2 vectors in O(N) operations. Building a recursive
algorithm, we can thus compute the Fourier transform in O(MN) = O(N log N)
operations. It is an open problem in computer science whether this is essentially
optimal, or whether there is an algorithm faster than O(N log N).
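The recursion can be sketched in a few lines. The code below assumes that $f_0$ and $f_1$ denote the even- and odd-indexed halves of f, i.e. $f_0(n) = f(2n)$ and $f_1(n) = f(2n+1)$ (an assumption, since their definition is not reproduced here), and it uses the normalization by 1/N from the definition of the transform above. The direct `dft` is included only as a reference for comparison.

```python
import cmath


def dft(f):
    """Direct O(N^2) transform: f_hat(k) = (1/N) sum_n f(n) exp(-2 pi i n k / N)."""
    size = len(f)
    return [
        sum(f[n] * cmath.exp(-2j * cmath.pi * n * k / size) for n in range(size)) / size
        for k in range(size)
    ]


def fft(f):
    """Recursive transform for len(f) a power of two, same normalization as dft."""
    size = len(f)
    if size == 1:
        return [complex(f[0])]
    if size % 2:
        raise ValueError("length must be a power of two")
    even = fft(f[0::2])  # transform of f_0(n) = f(2n), periodic in k with period N/2
    odd = fft(f[1::2])   # transform of f_1(n) = f(2n + 1)
    half = size // 2
    out = []
    for k in range(size):
        twiddle = cmath.exp(-2j * cmath.pi * k / size)
        out.append(0.5 * (even[k % half] + twiddle * odd[k % half]))
    return out


sample = [1, 2, 3, 4, 0, -1, 2.5, 1j]
print(fft(sample))
print(dft(sample))
```

Each level of the recursion does O(N) work in total and there are log₂ N levels, which is where the O(N log N) count comes from.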
Chapter 10
Thus meager sets behave in many ways like null sets. What is not obvious,
however, is that meager sets are truly “small”, and in particular that the entire
space X is not meager. This is where the completeness assumption comes in.
Theorem 10.1 (Baire). Any countable intersection of dense open sets U1 , U2 , U3 , . . .
in a complete metric space, X, is dense.
Proof. Let $B_0$ be an arbitrary ball in X. Since $U_1$ is dense the intersection
$U_1 \cap B_0$ is non-empty, and open since $U_1$ and $B_0$ are. Choose a ball $B_1$ such that
its closure is contained in $U_1 \cap B_0$. By induction we get balls $B_0 \supset B_1 \supset B_2 \supset \dots$
with the closure of $B_i$ contained in $U_i$ for i ≥ 1. We may also assume that the
$B_i$ are chosen small enough so that $\operatorname{diam}(B_i) \to 0$ as i → ∞. Then the centers
of the $B_i$ form a Cauchy sequence, and thus converge to a point x ∈ X by
completeness. By construction
\[
x \in B_0 \cap \bigcap_{i\ge 1} \overline{B_i} \subset B_0 \cap \bigcap_{i\ge 1} U_i
\]
so $\bigcap U_i$ intersects every open ball, i.e. is dense.
This result is also known as the Baire category theorem. Baire calls meager
sets of the first category and all other sets of the second category.
Corollary 10.2. The complement of any meager set is dense. In particular X
cannot be meager unless X = ∅.
Proof. Suppose E is meager, then
\[
E = \bigcup_{i\ge 1} E_i
\]
Theorem 10.5. There is no function f : [0, 1] → R which is continuous exactly
at the rational points.
Proof. Let E ⊂ [0, 1] be the set of points where f is continuous. We claim that
E is a $G_\delta$. To see this let $E_n$ be the set of x ∈ [0, 1] such that there exists a
δ > 0 with
\[
f([x - \delta, x + \delta]) \subset I
\]
for some interval I of length 1/n. Then $E_n$ is open and $E = \bigcap_{n\ge 1} E_n$.
If E = [0, 1] ∩ Q, then E is both meager and a dense $G_\delta$ in [0, 1], a contradiction.
Diophantine approximation
Diophantine approximation, named after Diophantus of Alexandria, is about
the approximation of real numbers by rational ones. An application of the
pigeonhole principle gives the following.
Theorem 10.6. If x is irrational, then there are infinitely many p/q ∈ Q with
\[
\left| x - \frac{p}{q} \right| \le \frac{1}{q^2} .
\]
x, 2x, . . . , nx mod 1
are all distinct in [0, 1). Thus there must be some 0 ≤ n1 < n2 ≤ n such that
n1 x mod 1 and n2 x mod 1 have distance ≤ 1/n. Set q = n2 − n1 , then
thus
\[
|p - qx| \le \frac{1}{q}
\]
for some integer p.
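The pigeonhole argument is constructive and can be turned into a small search. The sketch below is an illustration in floating point, not part of the notes: it looks at the fractional parts of 0·x, 1·x, ..., n·x, finds two that are closest together, and uses the difference of their indices as the denominator q. The function name is made up for this example.

```python
import math
from fractions import Fraction


def dirichlet_approximation(x, n):
    """Return p/q with 1 <= q <= n and |q*x - p| <= 1/n, found by pigeonhole.

    The n + 1 fractional parts of k*x, k = 0..n, lie in [0, 1), so two
    neighbours in sorted order are at distance at most 1/n.
    """
    points = sorted(((k * x) % 1.0, k) for k in range(n + 1))
    closest = min(zip(points, points[1:]), key=lambda pair: pair[1][0] - pair[0][0])
    (_, k1), (_, k2) = closest
    q = abs(k2 - k1)
    p = round(q * x)
    return Fraction(p, q)


x = math.sqrt(2)
for n in (5, 50, 500):
    r = dirichlet_approximation(x, n)
    print(n, r, abs(x - r), 1 / r.denominator ** 2)
```

Since |x − p/q| ≤ 1/(nq) and q ≤ n, the returned fraction satisfies the bound 1/q² from the theorem; letting n grow produces infinitely many distinct fractions when x is irrational.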
In the other direction, a number x ∈ R is Diophantine of exponent α if there
is a C > 0 with
\[
\left| x - \frac{p}{q} \right| > \frac{C}{q^{\alpha}}
\]
for all p/q ∈ Q. A number is Diophantine if it is Diophantine of exponent 2 + ε
for every ε > 0.
Theorem 10.7. A random number in [0, 1] is Diophantine, i.e. the Diophantine
numbers form a set of full measure.
Proof. Let ε > 0 and consider
\[
m(E_q) \le \frac{2(q+1)}{q^{2+\varepsilon}} \le \frac{4}{q^{1+\varepsilon}}
\]
hence
\[
\sum_{q=1}^{\infty} m(E_q) < \infty .
\]
Proof. Note that f has no rational roots (otherwise f has a linear factor), so
\[
\left| f\!\left(\frac{p}{q}\right) \right| \ge \frac{1}{q^{d}} .
\]
\[
\left| x - \frac{p}{q} \right| \ge \frac{1}{M q^{d}}
\]
Proof. Let
\[
E_n = \{ x \in [0, 1] \mid \exists p, q : |x - p/q| < q^{-n} \}
\]
which is open and contains Q ∩ [0, 1], thus dense. The intersection of the $E_n$, a
dense $G_\delta$ by Baire's theorem, is the set of Liouville numbers.
In particular a meager set can have full measure. On the other hand one can
also find dense $G_\delta$'s which are null sets. Take for example the intersection of
open $U_n \supset \mathbb{Q}$ with $m(U_n) \to 0$ as n → ∞.
\[
\|T(x)\| \le C \|x\|
\]
1. (Pointwise boundedness) For every x ∈ X the set {Ti (x) | i ∈ I} ⊂ Y is
bounded.
2. (Uniform boundedness) The set of operator norms $\{\|T_i\| \mid i \in I\} \subset \mathbb{R}$ is
bounded.
Proof. It is clear that 2. implies 1., since in this case we have a C, independent
of i, such that $\|T_i(x)\| \le C\|x\|$.
The other direction requires completeness of X and Baire's theorem. Define
so $E_n - E_n \subset E_{2n}$, but this means that $E_{2n}$ contains a ball $B_0(r)$ around the
origin. Uniform boundedness follows with $C = 2n/r$.
is bounded.
One application of the uniform boundedness principle is to show that there
are continuous functions whose Fourier series does not converge pointwise. Fix
$x \in S^1$ and consider the linear functionals $\varphi_N : C(S^1) \to \mathbb{C}$ given by evaluating
the N-th Fourier partial sum at x, i.e.
\[
\varphi_N(f) = \sum_{k=-N}^{N} \hat f(k) e^{ikx} = (D_N * f)(x) .
\]
but also $|\varphi_N(1)| = \|D_N\|_1$ thus $\|\varphi_N\| = \|D_N\|_1$. One can show that $\|D_N\|_1 \to \infty$
as N → ∞, so the family of operators $\varphi_N : X \to \mathbb{C}$ is unbounded. By the
uniform boundedness principle we conclude:
Theorem 10.13. For each x ∈ S 1 there is a dense set of continuous functions
in C(S 1 ) whose Fourier series does not converge at x.
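The growth of the 1-norms of the Dirichlet kernels, which drives this result, can be seen numerically. The sketch below approximates $\|D_N\|_1 = \frac{1}{2\pi}\int_0^{2\pi} |D_N(x)|\,dx$ with a midpoint rule, using the closed form $D_N(x) = \sin((N+1/2)x)/\sin(x/2)$; the sample count and function name are choices made for this illustration.

```python
import math


def dirichlet_l1_norm(N, samples=100_000):
    """Approximate (1 / 2 pi) * integral over [0, 2 pi] of |D_N(x)| by the midpoint rule."""
    h = 2 * math.pi / samples
    total = 0.0
    for j in range(samples):
        x = (j + 0.5) * h  # midpoints avoid x = 0, where sin(x / 2) vanishes
        total += abs(math.sin((N + 0.5) * x) / math.sin(x / 2))
    return total * h / (2 * math.pi)


for N in (1, 10, 100):
    print(N, dirichlet_l1_norm(N), math.log(N))
```

The computed values grow slowly, roughly like a constant multiple of log N, which is consistent with the norms being unbounded.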
Open mapping theorem
A map f : X → Y between topological spaces is open if it maps open sets to
open sets, i.e. U ⊂ X open implies f(U) ⊂ Y open. This is somewhat similar to
continuity, which is defined by the condition U ⊂ Y open implies $f^{-1}(U) \subset X$
open. Note that if a linear map T : X → Y between normed spaces is open,
then the image of the open unit ball is an open set containing the origin in Y,
thus contains some open ball of small radius. By homogeneity of T, any vector
y ∈ Y must be in the image of T, i.e. T is surjective. The second fundamental
result about bounded operators (after the uniform boundedness principle) is
that the converse is also true: surjectivity implies openness, at least when X, Y
are Banach spaces and T is bounded.
Lemma 10.14. Let T : X → Y be a bounded linear operator between Banach
spaces X, Y and B ⊂ X the open unit ball. If the closure $\overline{T(B)}$ has nonempty interior, then
T is open and hence surjective.
Proof. Let U be a nonempty open subset of $\overline{T(B)}$. Since B is convex and
symmetric, the same is true for $\overline{T(B)}$ and thus $(U - U)/2 \subset \overline{T(B)}$, which implies
that $\overline{T(B)}$ contains some open ball $B_0(r)$ around the origin. Given $y \in B_0(r)$
we want to show that T(x) = y for some x ∈ X. First, by assumption we can
choose $x_0 \in B$ with $\|T(x_0) - y\| \le r/2$. Then we can find $x_1 \in B/2$ with
\[
\|T(x_0) + T(x_1) - y\| \le \frac{r}{4} .
\]
By induction, we get a sequence $x_k$ with $\|x_k\| \le 1/2^k$ satisfying
\[
T\left( \sum_{k=0}^{\infty} x_k \right) = y
\]
and $\sum_k \|x_k\| < 2$. Hence $T(B_0(2)) \supset B_0(r)$, so T is an open mapping at the
origin. By linearity, T is open everywhere.
Theorem 10.15 (Open mapping theorem). Let T : X → Y be a bounded linear
operator between Banach spaces X, Y . If T (X) = Y , then T is open. Otherwise,
T (X) is meager.
Proof. If T is surjective, then
\[
Y = T(X) = T\left( \bigcup_{n=1}^{\infty} nB \right) = \bigcup_{n=1}^{\infty} nT(B) .
\]
There is a similar dichotomy for polynomial maps P : C → C, whose image
is either everything or a single point by the fundamental theorem of algebra.
This extends to more general complex differentiable maps, leading to the “open
mapping theorem” in complex analysis.
Corollary 10.16. Let T : X → Y be a bounded operator between Banach
spaces. If T is bijective, then the inverse T −1 is automatically bounded.
Proof. Because T is onto, T is open by the open mapping theorem. This means
the inverse pulls back open sets to open sets, i.e. is continuous.
The following application is due to Grothendieck.
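\begin{align*}
\left( \sum_{k=1}^{n} |f_k(x)|^2 \right)^{1/2} &= \left\| \sum_{k=1}^{n} \overline{f_k(x)}\, f_k \right\|_2 \\
&\ge \frac{1}{C} \left\| \sum_{k=1}^{n} \overline{f_k(x)}\, f_k \right\|_\infty \\
&\ge \frac{1}{C} \sum_{k=1}^{n} |f_k(x)|^2
\end{align*}
which implies
\[
\sum_{k=1}^{n} |f_k(x)|^2 \le C^2 .
\]
Integrating both sides over [0, 1] gives $n \le C^2$, which puts an upper bound on
the dimension of V.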
The graph is a closed subspace if and only if xn → x, T (xn ) → y implies
T (x) = y. This is a bit weaker than continuity of T , because there is nothing
to check for sequences where T (xn ) does not converge. However it turns out
that if X and Y are Banach spaces, then this condition on T is equivalent to
continuity.
Theorem 10.18 (Closed graph theorem). If T : X → Y is a linear map between
Banach spaces and the graph of T is closed, then T is bounded.
Proof. Let G = {(x, Tx)} ⊂ X × Y be the graph of T, which is a closed
subspace by assumption, thus by itself a Banach space. The projection map
G → X, (x, Tx) ↦ x is a continuous bijection, hence its inverse x ↦ (x, Tx) is
continuous by the open mapping theorem, thus T is bounded.
The theorem is not true for non-linear maps T. Take for example the function
f : R → R with f(x) = 1/x for x ≠ 0 and f(0) = 0, which is not continuous
but has a closed graph.
Let H be a Hilbert space and T : H → H linear. T is self-adjoint if
hT x, yi = hx, T yi
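for all x, y ∈ H. If $H = L^2(\mathbb{R})$, then one way of defining such operators is via a
kernel $K : \mathbb{R}^2 \to \mathbb{C}$ with $K(y, x) = \overline{K(x, y)}$ by
\[
(Tf)(x) = \int K(x, y) f(y)\,dy .
\]
\[
\langle Tx - y, z \rangle = \lim_{n\to\infty} \langle T(x - x_n), z \rangle = \lim_{n\to\infty} \langle x - x_n, Tz \rangle = 0
\]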
References and further reading
• C. McMullen: “Real Analysis.” (Math 114 course notes)
• T. Tao: “The Baire category theorem and its Banach space consequences”,
terrytao.wordpress.com/2009/02/01/245b-notes-9-the-baire-category-theorem-and-its-banach-space-consequences/
Chapter 11
[Figure: the curves C1, C2, C3, C4]
that Cn+1 is constructed from Cn by replacing each line segment with a scaled
down version of C1 , putting a spike into it. In fact the Cn converge uniformly to
a continuous curve C since we are moving points less and less in this procedure.
What is the length of C, the Koch curve? First, $C_1$ is 4/3 times longer than
the straight line segment at its base, which we may assume has length 1, so by
induction $C_n$ has length $(4/3)^n$. This geometric sequence goes to ∞ as n → ∞, so
C has infinite length in some sense.
It turns out that from a measure theoretic point of view we should not
consider C as a curve, something one-dimensional, but as a fractal with non-integer
dimension between 1 and 2. To find the dimension we use the self-similar
nature of C. The principle is the following. If we take a unit square
$[0, 1]^2$ and scale it by a factor of 2 we get $[0, 2]^2$ which is made of four disjoint
copies of the original. The dimension is log(4)/log(2) = 2. Similarly if we scale
a unit cube $[0, 1]^3$ by a factor of 2 we get eight copies, and the dimension is
log(8)/log(2) = 3. Now if we scale the Koch curve by a factor of 3 we get 4
copies of the original, so the dimension should be log(4)/log(3) = 1.26....
A similar trick works with the Cantor set, which we can find at the base
of the Koch curve. If we scale it by 3 we get two copies, so its dimension is
log(2)/log(3) = 0.63..., exactly half of that of the Koch curve.
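The scaling principle amounts to a one-line formula: a set made of `copies` copies of itself after scaling by `scale` has similarity dimension log(copies)/log(scale). A minimal sketch of the four computations above (the function name is chosen here for illustration):

```python
import math


def similarity_dimension(copies, scale):
    """Dimension d solving scale ** d == copies."""
    return math.log(copies) / math.log(scale)


square = similarity_dimension(4, 2)   # unit square scaled by 2
cube = similarity_dimension(8, 2)     # unit cube scaled by 2
koch = similarity_dimension(4, 3)     # Koch curve scaled by 3
cantor = similarity_dimension(2, 3)   # Cantor set scaled by 3

print(square, cube, koch, cantor)
```

The factor of two between the Koch curve and the Cantor set is just log(4) = 2 log(2).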
Of course self-similar sets, like the examples above, are very special. It was
F. Hausdorff who realized that the Lebesgue measure could be defined not just
for integer dimension d, leading to length, area, volume, and so on, but for any
positive real d. This also allows one to define the dimension of very general
subsets of Rn , even any metric space. We only give a sketch of the theory,
without going into detailed proofs (see references at the end of the chapter for
more details).
Hausdorff measure
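We begin by defining the α-dimensional Hausdorff outer measure, α ≥ 0,
of a subset $E \subset \mathbb{R}^n$ by
\[
m^*_\alpha(E) = \lim_{\delta\to 0} \inf \left\{ \sum_{i=1}^{\infty} (\operatorname{diam} E_i)^{\alpha} \;\middle|\; E \subset \bigcup_{i=1}^{\infty} E_i,\ \operatorname{diam} E_i \le \delta \right\}
\]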
Hausdorff dimension
If $E \subset \mathbb{R}^n$ is a Borel set, then $m_\alpha(E)$ is non-trivial for at most one α. More
precisely, there exists a unique α ≥ 0 such that
\[
m_\beta(E) = \begin{cases} +\infty & \beta < \alpha \\ 0 & \beta > \alpha \end{cases}
\]
and this $\alpha \in \mathbb{R}_{\ge 0}$ is called the Hausdorff dimension of E, $\dim_H(E)$. Note
that at the critical dimension α, nothing can be said about $m_\alpha(E)$, which could
in particular be +∞.
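As an illustration of how the critical exponent shows up, consider the standard covers of the Cantor set C by the $2^k$ intervals of length $3^{-k}$ from the k-th stage of its construction. The computation below is a sketch that only gives the upper bound for the dimension; the matching lower bound needs an additional argument that is not attempted here.

```latex
\[
\sum_{i=1}^{2^k} (\operatorname{diam} E_i)^{\beta} = 2^k \cdot 3^{-k\beta} = \left( \frac{2}{3^{\beta}} \right)^{k}
\longrightarrow
\begin{cases}
0 & \beta > \log 2/\log 3, \\
1 & \beta = \log 2/\log 3, \\
+\infty & \beta < \log 2/\log 3,
\end{cases}
\qquad k \to \infty .
\]
% Since diam E_i = 3^{-k} -> 0, these covers are admissible for every delta > 0, so
% m_beta(C) = 0 for beta > log 2 / log 3, and hence dim_H(C) <= log 2 / log 3.
```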
A d-dimensional linear subspace of $\mathbb{R}^n$ has the expected integer Hausdorff
dimension d. With a bit of work one can show that in the self-similar examples
(e.g. the Cantor set) the similarity dimension we computed is the same as the
Hausdorff dimension. A set with non-integer Hausdorff dimension is called a
fractal, though this is not a universally agreed upon definition. It is known that
for every 0 ≤ α ≤ n there are subsets of $\mathbb{R}^n$ with Hausdorff dimension α, i.e. all
dimensions can be realized.
Real world objects, unlike ideal mathematical ones, cannot have structure
on arbitrarily small scales, but they can still be approximately self-similar across
a wide range of scales and one can estimate their Hausdorff dimension. For
example, looking at the structure of cauliflower one sees that each branch carries
about 13 smaller branches of a third the size, so its Hausdorff dimension is about
log(13)/log(3) = 2.33.... The surface of the human lung is highly folded and
has dimension about 2.97, so it almost behaves like a solid, which is useful for
absorbing as much O2 as possible from a given volume of air.
Let $c_n : [0, 1] \to [0, 1]^2$ be the piecewise smooth curve obtained at the n-th
step of the construction. One shows that the $c_n$ converge uniformly to a
continuous map $c : [0, 1] \to [0, 1]^2$. This curve is non-rectifiable: it has no
well-defined length. Indeed the lengths of $c_n$ grow like $2^n$, so we would expect c to
have Hausdorff dimension 2. In fact, the image of c is the entire square $[0, 1]^2$!
It is perhaps intuitively clear from the picture that the image of c should be dense in
$[0, 1]^2$. A precise argument uses the fact that $c_n$ meets all little squares with
side length $1/2^{n-1}$ in $[0, 1]^2$. But now one can appeal to the general fact that
since [0, 1] is compact, and c continuous, the image c([0, 1]) must also be
compact, in particular closed. So if c([0, 1]) is dense, then it must be the entire
unit square. We summarize these facts, without proof, in the following theorem.
Theorem 11.1. If c : [0, 1] → [0, 1]2 is the Hilbert curve, then
1. c is continuous and onto, but not one-to-one (although it becomes a bijec-
tion after sets of measure zero are removed from the domain and target)
2. c is measure preserving
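A discrete version of the Hilbert curve is easy to compute. The sketch below uses the widely known index-to-cell conversion for a grid of $2^k \times 2^k$ cells; its orientation and indexing are an assumption and need not match the curves $c_n$ of these notes exactly. It illustrates the two facts behind the theorem: every cell is visited exactly once, and consecutive cells are adjacent, which is what makes the limit continuous and onto.

```python
def hilbert_cell(order, d):
    """Map index d in [0, 4**order) to a cell (x, y) of the 2**order by 2**order grid."""
    x = y = 0
    t = d
    s = 1
    while s < 2 ** order:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # rotate the quadrant by reflecting both coordinates
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x  # transpose
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_point(order, d):
    """Center of the d-th cell, as a point of the unit square."""
    side = 2 ** order
    x, y = hilbert_cell(order, d)
    return (x + 0.5) / side, (y + 0.5) / side


cells = [hilbert_cell(4, d) for d in range(4 ** 4)]
print(cells[:8])
print(hilbert_point(4, 100))
```

Every cell of the grid has the same area and corresponds to a parameter interval of the same length, which is the discrete counterpart of the measure preserving property.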
It is not possible to find a map $c : [0, 1] \to [0, 1]^2$ which is both continuous
and a bijection, as such a map would have continuous inverse, and the two spaces
cannot be topologically the same: removing one point from [0, 1] (other than the
two endpoints) gives a disconnected space, while removing any one point from
$[0, 1]^2$ still leaves the space connected. Still, the theorem shows that all $[0, 1]^n$,
n ≥ 1, are the same as measure spaces (up to sets of measure zero), so there
can be no intrinsic notion of dimension in measure theory.
References and further reading
• Stein, Shakarchi: “Real Analysis”, Chapter 7, (Princeton Lectures in Analysis 3)
• T. Tao: “Hausdorff dimension”,
terrytao.wordpress.com/2009/05/19/245c-notes-5-hausdorff-dimension-optional/