
Lecture Notes On Measure Theory and Functional Analysis

This document discusses the measure problem, which asks whether it is possible to assign a measure or size to every subset of the real line in a way that satisfies three desirable properties: normality, translation invariance, and countable additivity. It is shown that no such measure exists by the Vitali theorem. To resolve this, the notion of measurable sets is introduced, and measures are only required to be defined on measurable sets.

Uploaded by

Gary
Copyright
© © All Rights Reserved

Real and linear analysis

Course notes based on material from


“Measure theory” by Terence Tao, and
“Real analysis” by Bruckner, Bruckner, and Thompson
Contents

Part I. Measure theory

§1. The measure problem
§2. Elementary measure
§3. Jordan measure
§4. Riemann integration
§5. Introduction to Lebesgue measure
§6. Lebesgue outer measure
§7. Lebesgue measurability
§8. Lebesgue measure

Part II. Measure and integration

§9. Preview of integration, simple integration
§10. Lebesgue measurability of functions
§11. Lebesgue integration of nonnegative functions
§12. Lebesgue integration
§13. Convergence theorems
§14. Abstract measure theory
§15. Construction of abstract measures
§16. Abstract integration theory

Part III. Functional analysis

§17. Normed vector spaces
§18. The Hahn–Banach theorem
§19. Spaces of operators and the dual space
§20. Three results on Banach spaces
§21. The Banach space $L^p$
§22. The dual space of $L^p$
§23. Hilbert space
§24. Bases for Hilbert space

PART I

Measure theory

§1. The measure problem

READING. Tao, §1.1 introduction, and §1.2.3.

Before discussing the measure problem, let’s talk intuitively about what we mean by
“measure.” In this area of mathematics, measure is a number assigned to a set which
represents its size. Of course the term “size” also has many meanings. Our concept of
measure can accommodate many different types of sizes, such as: length, area, volume,
mass, and even probability. On the other hand, some other types of size are not usually
associated with the mathematical concept of measure, such as: cardinality, diameter, and
density.
The mathematical concept of a measure is thus beginning to seem geometric. But
considering our models of sets and spaces, in which coordinate axes are indexed by the
infinitesimal points of the real number line, measure really turns out to be an analytic
concept. (In particular, this means lots of ε’s will show up in our studies!)
For a simple set, it may be easy to decide what its measure should be. For example, if
we use the term measure to mean length, then the measure of the interval [4, 7] should be
3. But for a more complicated set, the decision may not be so easy. If you have seen the
construction of the Cantor set, think about how you would measure the length of that!
Thus we arrive at the “measure problem,” which asks whether it is even possible
to find a function which adequately measures subsets of the real line R. Of course it
is necessary to say what is considered adequate. The classical version of the measure
problem proposed the three properties below. Formally, the measure problem asks: Does
there exist a measure function m which assigns to each subset A ⊂ R a value m( A) ∈
[0, ∞] satisfying:
(a) (normality) m( I ) = the length of I for every interval I;
(b) (translation-invariance) m( x + A) = m( A) for every A; and
(c) (countable additivity) $m\bigl(\bigcup_{n=1}^{\infty} A_n\bigr) = \sum_{n=1}^{\infty} m(A_n)$ for every sequence of pairwise disjoint sets $A_n$.
Perhaps surprisingly, no such measure function m exists! While properties (a)–(c)
seem very natural, the three items unfortunately turn out to be mutually inconsistent.

1.1. THEOREM (Vitali). There exists a set A ⊂ R such that no measure can be assigned to A
consistently with (a)–(c).


PROOF. Rather than work on R, we will work on the half-open unit interval [0, 1)
with the addition operation taken modulo 1. This is ok, since if there is a measure m on
all subsets of R, then by properties (b) and (c), m restricts to a measure on subsets of [0, 1)
which satisfies property (b) with respect to addition modulo 1.
Now let Q1 denote the rationals of [0, 1), that is, Q1 = Q ∩ [0, 1), and consider the
collection of additive cosets of Q1 inside [0, 1). The cosets are of the form a + Q1 where
again addition is interpreted modulo 1. We now let A ⊂ [0, 1) denote a system of coset
representatives for this collection.
Now every number in [0, 1) can be written uniquely as a + q for a ∈ A and q ∈ Q1.
This means that the translates A + q, for q ∈ Q1, are pairwise disjoint and cover all of [0, 1).
In particular, by (a) the measure of $\bigcup_{q \in Q_1} (A + q)$ is exactly 1.
On the other hand, by (b) and (c) we have that

\[ m\Bigl(\bigcup_{q \in Q_1} (A + q)\Bigr) = \sum_{q \in Q_1} m(A + q) = \sum_{q \in Q_1} m(A). \]

By the previous paragraph, the left-hand side of the above equation is 1. On the other
hand the right-hand side is a sum of countably infinitely many copies of the single
nonnegative constant m(A), and hence must be either 0 or ∞. This is a contradiction! □
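The final step of the proof can be written out as a short worked equation. This is only a restatement of the argument above, using the fact that Q1 is countably infinite:

```latex
\[
  \sum_{q \in Q_1} m(A) \;=\;
  \begin{cases}
    0      & \text{if } m(A) = 0, \\
    \infty & \text{if } m(A) > 0,
  \end{cases}
\]
```

In neither case can the sum equal 1, which is the value forced by property (a).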
We remark that it is possible to modify the argument to apply directly to a measure on
R rather than going via the unit interval with addition modulo 1. See Tao for this version.
The lesson is that we must weaken our demands on a measure m. Dropping condition
(a) can lead to trivial measures. Dropping condition (b) takes away the geometric aspects
of the measure, and leads to interesting set-theoretic questions and constructions. Weak-
ening condition (c) to finite additivity leads to interesting solutions, but only in dimen-
sions ≤ 2. (In dimensions ≥ 3 the Banach–Tarski paradox again gives a contradiction.)
Yet the simplest path forward (and the one that we take) is to drop the tacit condi-
tion that every set be measurable. The set A constructed in Vitali’s proof is very artificial
and isn’t likely to occur in any of the most common analytical applications (see the notes
below). We want to be excused from the burden of deciding the measure of the set A.
This means we need to figure out what sets we will measure, and what sets we will not
measure. In the end, our measure function m will have a domain which is a proper subset
of P(R), but which still contains a rich collection of sets. And the measure will satisfy
properties (a)–(c) as long as they are applied to the sets in the domain of m.
Of course we are also interested in the measure problem for subsets of $R^n$. It can be
formulated in just the same way, with condition (a) replaced by the condition that the
measure of a box is equal to its volume. And a Vitali-type result can also easily be estab-
lished for this version of the measure problem.
In the next section, we will begin this process by taking a step backwards and building
measures with much smaller domains, which satisfy just fragments of (a)–(c).

NOTES AND FURTHER READING. The proof of Vitali’s theorem requires the Axiom of
Choice (AC). Specifically, it is needed to find a system of coset representatives for an
uncountable collection. Solovay showed that the use of AC is essential: assuming the
consistency of an inaccessible cardinal, it is consistent with ¬AC that there is a measure
function m on all subsets of R.

EXERCISE 1.1. Show that the properties (a)–(c) of a measure imply finite additivity: If
A and B are disjoint then m(A ∪ B) = m(A) + m(B).

EXERCISE 1.2. Show that the properties (a)–(c) of a measure imply the inclusion–exclusion
principle: For any sets A, B we have m(A ∪ B) + m(A ∩ B) = m(A) + m(B).

EXERCISE 1.3. Complete the details of the proof that if there is a measure on R
satisfying properties (a)–(c), then there is a measure on [0, 1) (with addition modulo 1)
satisfying properties (a)–(c).

EXERCISE 1.4. If A is a bounded set of real numbers, the supremum sup(A) is the least
upper bound of A, and the infimum inf(A) is the greatest lower bound of A. Show that
s = sup(A) if and only if:
◦ for all a ∈ A we have a ≤ s; and
◦ for all ε > 0, there exists a ∈ A such that s − a < ε.
Formulate and prove the analogous statement for the infimum.

§2. Elementary measure

READING. Tao, §1.1.1.

In the introduction we saw that we cannot hope to define a measure which will work
adequately on all subsets of Rn . In this section we start over and define a measure which
is capable of measuring only the simplest sorts of subsets of Rn . In doing so we will see
some of the difficulties which one encounters in defining even very simple measures, and
we will also see some of these difficulties resolved. Moreover we will have explicit use for
the elementary measure defined in this section, so doing so is not a digression at all.
Recall that a bounded interval is any subset of R of the form ( a, b), [ a, b), ( a, b], or [ a, b].
We shall use the term box for any subset of Rn which is a Cartesian product of bounded
intervals.

2.1. DEFINITION. A subset E ⊂ $R^n$ is elementary if it can be expressed as a union of
finitely many boxes.

For any elementary set E, we wish to define its elementary measure, or simply measure,
m( E). The measure of any interval will be defined to be its length, and the measure of
any box will be defined to be its volume. Thus if I = ( a, b) or [ a, b) or ( a, b] or [ a, b], then
we let m(I) = len(I) = b − a (in all four cases). And if $B = I_1 \times \dots \times I_n$ is a box, then we define
$m(B) = \mathrm{vol}(B) = \mathrm{len}(I_1) \cdots \mathrm{len}(I_n)$. Since we allow only bounded boxes, this product can never
be indeterminate (0 · ∞). So far, so good.
We now wish to define the measure of an elementary set to be the sum of the measures of the
finitely many boxes it is composed of. However there are two issues with this statement: first the
constituent boxes need not be disjoint, and second there is in general more than one way
to express an elementary set as a union of boxes. The following two lemmas address these
two issues.

2.2. LEMMA. Any elementary set E can be expressed as a finite union of disjoint boxes.

PROOF. First assume that E ⊂ $R^1$ and that $E = \bigcup_i I_i$. Then by considering all
endpoints of the $I_i$ in increasing order $a_1 < \dots < a_m$ it is easy to write E as the union of sets of the
form $(a_i, a_{i+1})$ together with sets of the form $[a_i, a_i]$ (single points). Such a union is clearly
disjoint.
In general if E ⊂ $R^n$ and $E = \bigcup_i B_i$ then for each dimension d ≤ n consider in turn
the dth sides $I_i^d$ of the boxes $B_i$. Again consider the endpoints of these intervals in increasing
order $a_1^d < \dots < a_{m_d}^d$. Then we can write E as a union of small boxes which are products of
sets of the form $(a_i^d, a_{i+1}^d)$ or of the form $[a_i^d, a_i^d]$ (single points). Such boxes are again disjoint. □
Figure 2.f1 shows an example of the method of the proof above.
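The grid method of the proof can also be turned into a small computation. The following Python sketch is an illustration only, not part of the notes: the function name `elementary_measure` and the representation of a box as a tuple of (lo, hi) pairs are assumptions made here. It cuts space along every box endpoint, keeps the open grid cells that lie inside the union, and adds their volumes. The lower-dimensional pieces (segments and points) are skipped because they contribute volume zero.

```python
from itertools import product

def elementary_measure(boxes):
    """Measure of a finite union of boxes in R^n (hypothetical helper).

    Each box is a tuple of (lo, hi) pairs, one pair per dimension.
    Endpoints are treated as closed; this does not change the measure.
    """
    if not boxes:
        return 0
    n = len(boxes[0])
    # Sorted distinct endpoints a_1 < ... < a_m in each dimension d.
    cuts = [sorted({e for b in boxes for e in b[d]}) for d in range(n)]
    total = 0
    # Visit every open grid cell; each one is either inside a given box
    # or disjoint from it, so testing the cell's midpoint is enough.
    for idx in product(*(range(len(c) - 1) for c in cuts)):
        cell = [(cuts[d][i], cuts[d][i + 1]) for d, i in enumerate(idx)]
        mid = [(lo + hi) / 2 for lo, hi in cell]
        if any(all(b[d][0] <= mid[d] <= b[d][1] for d in range(n)) for b in boxes):
            vol = 1
            for lo, hi in cell:
                vol *= hi - lo
            total += vol
    return total

# Two overlapping 2 x 2 squares: area 4 + 4 - 1.
print(elementary_measure([((0, 2), (0, 2)), ((1, 3), (1, 3))]))
```

Because the cells are disjoint, no overlap is ever counted twice, which is exactly the point of Lemma 2.2.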

2.3. LEMMA. Suppose the elementary set E can be expressed in two ways as a finite union of
disjoint boxes: $E = \bigsqcup B_i = \bigsqcup C_j$. Then $\sum \mathrm{vol}(B_i) = \sum \mathrm{vol}(C_j)$.

FIGURE 2.F1. On the left: An elementary set which is a union of two closed
boxes. On the right: the same elementary set expressed as a disjoint union
of seven open boxes, 20 open segments, and 14 points.

PROOF. We first note that if I is an interval with endpoints a, b, and if $a = a_1 < a_2 < \dots < a_m = b$
is an increasing sequence, then $\mathrm{len}(I) = \sum_{i=1}^{m-1} \mathrm{len}(a_i, a_{i+1})$. This is simply because the latter
summation telescopes.
Next if B is a box whose dth side has endpoints $a^d, b^d$, and if $a^d = a^d_1 < a^d_2 < \dots < a^d_{m_d} = b^d$,
then vol(B) = the sum of the volumes of all small boxes of the form $\prod_d (a^d_{i_d}, a^d_{i_d+1})$. We will call the set
of such small boxes a perfect grid. Intuitively, if you break a box into a perfect grid of
sub-boxes, then the volume of the box is the sum of the volumes of the sub-boxes.
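For instance, in dimension 2 the perfect grid identity is just the one-dimensional telescoping sum applied to each side, followed by expanding the product. The worked equation below is a sketch of that special case, writing $a^1 = a^1_1, b^1 = a^1_{m_1}$ and $a^2 = a^2_1, b^2 = a^2_{m_2}$ for the endpoints of the two sides:

```latex
\[
  (b^1 - a^1)(b^2 - a^2)
  = \Bigl(\sum_{i=1}^{m_1 - 1} (a^1_{i+1} - a^1_i)\Bigr)
    \Bigl(\sum_{j=1}^{m_2 - 1} (a^2_{j+1} - a^2_j)\Bigr)
  = \sum_{i=1}^{m_1 - 1} \sum_{j=1}^{m_2 - 1}
    (a^1_{i+1} - a^1_i)(a^2_{j+1} - a^2_j).
\]
```

Each term of the double sum is the volume of one small box of the grid.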
Now if B is a box and one expresses it as a disjoint union of sub-boxes $B = \bigsqcup B_i$, then
$\mathrm{vol}(B) = \sum \mathrm{vol}(B_i)$. This is because it is possible to find a refinement of the given disjoint
union which is a perfect grid as in the previous paragraph. That is, it is possible to write
$B = \bigsqcup D_i$ where $\{D_i\}$ is a perfect grid, and each $B_i$ is the union of a perfect grid of sets
taken from the collection $\{D_i\}$. Then one can simply apply the argument of the previous
paragraph to B and to each $B_i$.
Finally given E, $B_i$, and $C_j$ as in the statement of the lemma, one can find a third expression
$E = \bigsqcup D_k$ where $\{D_k\}$ is a refinement of both $\{B_i\}$ and of $\{C_j\}$. That is, each $B_i$ and each
$C_j$ is a disjoint union of elements of $\{D_k\}$. It follows from the previous paragraph that
$\sum \mathrm{vol}(B_i) = \sum \mathrm{vol}(D_k)$ and analogously that $\sum \mathrm{vol}(C_j) = \sum \mathrm{vol}(D_k)$. This completes the
proof. □
The two lemmas together imply that it is well-defined to characterize the elementary
measure with the expression $m(\bigsqcup B_i) = \sum m(B_i)$.

2.4. PROPOSITION. The elementary measure function m satisfies
(a) (normality) m( B) = vol( B) for any box B;
(b) (translation-invariance) m( x + E) = m( E) for any elementary set E; and
(c) (finite additivity) m( E ∪ F ) = m( E) + m( F ) for any disjoint elementary sets E, F.

Normality is clear from the definition of m. The translation-invariance is easy because
it is true of length and volume, and moreover is preserved even when we take disjoint
unions. The finite additivity property is again clear from the definition of m. We remark
that m satisfies countable additivity as well (restricted to elementary sets of course), but
that is much more difficult and will be addressed in the future.

The above three core properties imply further useful properties as well.

2.5. PROPOSITION. The elementary measure function m satisfies
◦ (monotonicity) m( E) ≤ m( F ) for elementary sets E ⊂ F; and
◦ (finite subadditivity) m( E ∪ F ) ≤ m( E) + m( F ) for elementary E, F.

These results give an essentially complete solution to the measure problem for ele-
mentary sets. It wasn’t too difficult to achieve, but perhaps not as easy as one would have
thought! Even so, what about measuring other simple sets such as circles, triangles, blobs,
Cantor sets, and so on? In the next section we will continue on the road to doing this.

EXERCISE 2.1 (Tao Ex 1.1.1). Show that the class of elementary sets is closed under the
operations: union, intersection, set difference, symmetric difference, and translation.

EXERCISE 2.2. Prove Proposition 2.5: The elementary measure satisfies the monotonicity
and finite subadditivity properties.

EXERCISE 2.3 (Tao Ex 1.1.4). Show that if E is an elementary subset of $R^m$ and F is an
elementary subset of $R^n$ then E × F is an elementary subset of $R^{m+n}$. Furthermore show
that m(E × F) = m(E) m(F).

§3. Jordan measure

READING. Tao, §1.1.2.

In the previous section we showed that the intuitive definition of area is sensible for
elementary sets, but then remarked that simple shapes like polygons and circles are not
elementary. It is easy to imagine extending the elementary measure to triangles by cutting
and rotating, and to polygons by gluing together triangles. However no such operation
can perfectly measure a circle.
Instead we will measure the circle the way it has always been done, by using approx-
imation. It is not hard to visualize a circle being approximated by elementary sets, using
smaller and smaller boxes near the boundary. The approximation technique will help us
measure most traditional geometric figures, and even many blobby thingies.

3.1. DEFINITION. Let A be a bounded subset of $R^n$. First define the inner and outer
Jordan measures (sometimes called lower and upper):

\[ m_{*j}(A) = \sup \{\, m(E) : E \subset A,\ E \text{ elementary} \,\} \]
\[ m^{*j}(A) = \inf \{\, m(F) : A \subset F,\ F \text{ elementary} \,\} \]

If $m_{*j}(A) = m^{*j}(A)$ we say that A is Jordan measurable, we call the common quantity the
Jordan measure of A, and we denote it by m(A).
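To see the two quantities in this definition squeeze together, here is a rough Python sketch for the closed unit disk. It is an illustration under stated assumptions, not part of the notes: the function name `disk_jordan_bounds` and the choice of a uniform grid of squares of side 1/n are made up for this example. The squares contained in the disk form an elementary set E inside it, and the squares meeting the disk form an elementary set F containing it, so their total areas bound the inner measure from below and the outer measure from above.

```python
import math

def disk_jordan_bounds(n):
    """Grid bounds for the Jordan measure of the closed unit disk.

    Uses closed squares of side 1/n with corners on the lattice (1/n)Z^2.
    Returns (inner, outer): the total area of the squares contained in
    the disk, and of the squares that meet the disk.
    """
    inside = meeting = 0
    for i in range(-n, n):
        for j in range(-n, n):
            # Work in integer units of 1/n, so the tests below are exact.
            far_x, far_y = max(abs(i), abs(i + 1)), max(abs(j), abs(j + 1))
            near_x = 0 if i <= 0 <= i + 1 else min(abs(i), abs(i + 1))
            near_y = 0 if j <= 0 <= j + 1 else min(abs(j), abs(j + 1))
            if far_x**2 + far_y**2 <= n * n:    # farthest corner lies in the disk
                inside += 1
            if near_x**2 + near_y**2 <= n * n:  # nearest point lies in the disk
                meeting += 1
    return inside / n**2, meeting / n**2

inner, outer = disk_jordan_bounds(50)
print(inner, outer, math.pi)
```

As n grows, the gap between the two returned values shrinks, since only the squares crossing the boundary circle are counted in one and not the other.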

It is immediate from the definition that Jordan measure extends elementary measure
in the sense that they agree on the elementary sets. This means we are justified in us-
ing “m” both for the elementary and Jordan measures. Moreover, we will show that the
Jordan measure inherits many of the properties of the elementary measure: normality,
translation-invariance, finite additivity, monotonicity, and finite subadditivity.
The normality and translation-invariance properties hold simply because they hold
for elementary measure, and these properties pass to the supremum. The additivity and
subadditivity properties will take a little more work. For instance, in order to even state
the finite additivity property, we first need to establish Boolean closure: the union of
measurable sets is measurable.
Before we begin these results, it will be useful to establish the following characteriza-
tion of Jordan measurability. As we will be working with approximations, the following
results also illustrate our first use of ε-style analytical arguments.

3.2. LEMMA. The set A is Jordan measurable if and only if either of the following holds:
◦ For all ε > 0 there are elementary sets E, F such that E ⊂ A ⊂ F and m(F ∖ E) < ε.
◦ For all ε > 0 there is an elementary set E such that $m^{*j}(E \triangle A) < \varepsilon$.

PROOF. We establish only the equivalence of Jordan measurability with the first item.
To begin, assume that A is Jordan measurable and let ε > 0 be given. By the definition
of the inner Jordan measure $m_{*j}$, we can find an elementary set E ⊂ A such that m(A) − m(E) < ε/2.
By the definition of the outer Jordan measure $m^{*j}$ we can find an elementary set F such that A ⊂ F
and m(F) − m(A) < ε/2. It follows that

m(F ∖ E) = m(F) − m(E) = (m(F) − m(A)) + (m(A) − m(E)) < ε

as desired.
For the converse, assume that the first bullet holds true, and let ε > 0 be arbitrary.
Then we can find elementary sets E, F such that E ⊂ A ⊂ F and m(F) − m(E) = m(F ∖ E) < ε.
From the definitions of inner and outer Jordan measure, we have that $m(E) \le m_{*j}(A) \le m^{*j}(A) \le m(F)$.
It follows that $m^{*j}(A) - m_{*j}(A) < \varepsilon$. Since ε was arbitrary, we may
conclude that $m_{*j}(A) = m^{*j}(A)$ and therefore that A is Jordan measurable. □
Note that in the proof, one has to be careful when making a claim such as m(F ∖ E) =
m(F) − m(E). It is true in the above cases because E ⊂ F and the elementary sets are closed
under set differences, so all three sets are elementary, and thus we may apply the finite
additivity property for elementary measure.
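Spelled out, the step is a one-line derivation. The sketch below assumes E ⊂ F with both sets elementary, as in the proof, and uses the fact that all measures involved are finite:

```latex
\[
  F = E \sqcup (F \setminus E)
  \quad\Longrightarrow\quad
  m(F) = m(E) + m(F \setminus E)
  \quad\Longrightarrow\quad
  m(F \setminus E) = m(F) - m(E).
\]
```

Without the inclusion E ⊂ F the first decomposition fails, and the subtraction formula can fail with it.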

3.3. PROPOSITION. If A, B are Jordan measurable, then so are A ∪ B, A ∩ B, and A ∖ B.

PROOF. We prove only the case of A ∪ B. Suppose that A, B are Jordan measurable and let ε > 0 be given.
By the previous lemma, we can find elementary sets E, F, E′, F′ such that E ⊂ A ⊂ F, and
E′ ⊂ B ⊂ F′, and m(F ∖ E), m(F′ ∖ E′) < ε/2. Then we have E ∪ E′ ⊂ A ∪ B ⊂ F ∪ F′
and, using the inclusion (F ∪ F′) ∖ (E ∪ E′) ⊂ (F ∖ E) ∪ (F′ ∖ E′) together with the monotonicity
and finite subadditivity of elementary measure,
m((F ∪ F′) ∖ (E ∪ E′)) ≤ m(F ∖ E) + m(F′ ∖ E′) < ε. Again by the previous lemma, this
shows that A ∪ B is Jordan measurable. □
We are now ready to establish the remaining stated properties of Jordan measure. The
following result states finite additivity, and the first paragraph of its proof gives finite
subadditivity. The monotonicity property follows immediately from finite additivity.

3.4. THEOREM. The Jordan measure satisfies finite additivity, that is, if A, B are Jordan
measurable and disjoint, then m(A ∪ B) = m(A) + m(B).

PROOF. We first show subadditivity, that is, that m(A ∪ B) ≤ m(A) + m(B). Let ε > 0
be given. Using the fact that $m = m^{*j}$ we can find elementary sets F, F′ such that A ⊂ F,
B ⊂ F′, m(F) − m(A) < ε/2, and m(F′) − m(B) < ε/2. Using the monotonicity and
subadditivity properties of the elementary measure, together with the definition of Jordan
measurability, we now have:

\[
\begin{aligned}
m(A \cup B) &= m^{*j}(A \cup B) \\
            &\le m(F \cup F') \\
            &\le m(F) + m(F') \\
            &< m(A) + m(B) + \varepsilon
\end{aligned}
\]

Since ε was arbitrary, we achieve the desired inequality.


Now additionally assume that A, B are disjoint, and again let ε > 0. This time using
$m = m_{*j}$, we can find elementary sets E, E′ such that E ⊂ A, E′ ⊂ B, m(A) − m(E) < ε/2,
and m(B) − m(E′) < ε/2. Using the fact that E, E′ are disjoint, the finite additivity of
elementary measure, and the definition of Jordan measurability, we now have:

\[
\begin{aligned}
m(A \cup B) &= m_{*j}(A \cup B) \\
            &\ge m(E \cup E') \\
            &= m(E) + m(E') \\
            &> m(A) + m(B) - \varepsilon
\end{aligned}
\]
Again letting ε tend to 0, we achieve that m(A ∪ B) ≥ m(A) + m(B). □
While you probably have a clear idea of what the elementary sets look like, it is now
time to give some examples and non-examples of Jordan measurable sets. Some simple
but useful new examples are the axis-parallel triangles. Suppose T is an axis-parallel
triangle with leg lengths a and b. To prove that T is Jordan measurable, note that two
copies of T essentially make up a box with area ab. Using the finite additivity, this implies
that the measure of T is the expected ab/2.
To make this argument we need to know that Jordan measure is invariant under 180°
rotation, which is clear because it is true for boxes. Moreover since the two copies of T
overlap in a line segment, we also need to know that the Jordan measure of a line segment
is 0. This fact follows from the more general result below.

3.5. LEMMA. Let f be a continuous function defined on a closed, bounded interval. Then the
graph of f, considered as a subset of $R^2$, has Jordan measure 0.

PROOF. Let I denote the domain of f. Recall that since I is closed and bounded, it
is compact. Recall also that a continuous function with a compact domain is uniformly
continuous: for any ε > 0 there exists a δ > 0 such that for any closed interval J ⊂ I, len(J) ≤ δ
implies len(f(J)) ≤ ε.
So let ε > 0 be given, and choose δ > 0 as above. Shrinking δ if necessary, we can
suppose that len(I)/δ is an integer k. Partitioning I into closed intervals $J_1, \dots, J_k$ each of length
δ, we have that the graph of f is contained in the set

\[ A = \bigcup_{i \le k} J_i \times [\min f(J_i), \max f(J_i)] \]

Note that the min and max values in the definition of A exist by the extreme value theorem.
Now A is a union of k many rectangles each of area at most δε. Thus A is elementary
and its measure is at most kδε. This latter value is len(I)ε, so the outer measure $m^{*j}$ of
the graph of f is at most len(I)ε. Taking ε → 0, we conclude that the graph of f is Jordan
measurable with measure 0. □
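The covering used in the proof is easy to compute in a special case. The Python sketch below is an illustration only: the function name `graph_cover_area` is hypothetical, and it assumes f is nondecreasing so that the minimum and maximum of f on each piece sit at the endpoints (a general continuous f would need a genuine extremum search). Exact rational arithmetic is used so the total area of the covering rectangles can be compared exactly.

```python
from fractions import Fraction

def graph_cover_area(f, a, b, k):
    """Total area of k rectangles covering the graph of f on [a, b].

    Assumes f is nondecreasing, so on each piece J_i = [x_i, x_{i+1}]
    the minimum and maximum of f are attained at the endpoints.
    """
    a, b = Fraction(a), Fraction(b)
    width = (b - a) / k
    area = Fraction(0)
    for i in range(k):
        left, right = a + i * width, a + (i + 1) * width
        # Area of the rectangle J_i x [min f(J_i), max f(J_i)].
        area += width * (f(right) - f(left))
    return area

for k in (10, 100, 1000):
    print(k, graph_cover_area(lambda x: x * x, 0, 1, k))
```

For an increasing f the heights telescope, so the covering area is (b − a)(f(b) − f(a))/k, which tends to 0 as k grows, in line with the lemma.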

It is now not difficult to conclude that all polygons are Jordan measurable and have
the expected measure. This is because all polygons can be decomposed into a union of
axis parallel triangles (possibly overlapping on their measure zero edges).
A simple example of a set which is not Jordan measurable is the set $Q_1 = Q \cap [0, 1]$
of rational numbers in the unit interval. Indeed the only elementary sets $E \subset Q_1$ are the
finite sets, and so $m_{*j}(Q_1) = 0$. And every elementary set F such that $Q_1 \subset F$ contains
a set of the form [0, 1] ∖ X where X is finite, while F = [0, 1] is itself such a set, and so $m^{*j}(Q_1) = 1$.
Intuitively, the Jordan measure works very well for classical geometric figures, but not
very well for relatively simple analytic objects such as countable dense sets, fat Cantor sets
(Cantor-type sets whose removed intervals have total length less than 1),
and so forth. To handle such sets, we will soon work to describe the Lebesgue measure,
which satisfies countable additivity. Before going to such generality, however, we explore
the connection between Jordan measure and Riemann integration.

EXERCISE 3.1 (Tao, Ex 1.1.6(4)(6)). Verify that Jordan measure agrees with the elementary
measure on elementary sets (thus satisfies the normality property). Verify that Jordan
measure satisfies the translation-invariance property.

EXERCISE 3.2 (See Tao, Ex 1.1.5). Complete the proof of Lemma 3.2: A is Jordan measurable
iff for all ε > 0 there is an elementary set E such that $m^{*j}(E \triangle A) < \varepsilon$.

EXERCISE 3.3 (Tao, Ex 1.1.6(1)). Complete the proof of Proposition 3.3: If A, B are
Jordan measurable, then so are A ∩ B and A ∖ B.

EXERCISE 3.4 (Tao, Ex 1.1.12). Say that A is Jordan null if A is Jordan measurable and
m(A) = 0. Show that any subset of a Jordan null set is a Jordan null set.

EXERCISE 3.5. Show that the outer Jordan measure $m^{*j}(A)$ is equal to:

\[ \inf \{\, \mathrm{vol}(B_1) + \dots + \mathrm{vol}(B_k) \mid B_1, \dots, B_k \text{ are boxes and } A \subset B_1 \cup \dots \cup B_k \,\} \]

EXERCISE 3.6 (Tao, Ex 1.1.19). Let A be an arbitrary bounded set, and let E be an elementary
set. Show that

\[ m^{*j}(A) = m^{*j}(A \cap E) + m^{*j}(A \setminus E) \]

EXERCISE 3.7. Show that A is Jordan measurable if and only if for all ε > 0 there exists
an elementary set E such that A ⊂ E and $m^{*j}(E \setminus A) < \varepsilon$.

§4. Riemann integration

READING. Tao, §1.1.3.

If the picture of Lemma 3.5 reminded you of Riemann sums, it should. Measure theory
is closely connected to integration theory, as both are concerned with calculating areas of
some regions. Moreover the Jordan measure corresponds neatly with the Riemann inte-
gral. The following presentation of the Riemann integral is actually attributed to Darboux.
Just as we defined the elementary measure before we defined the Jordan measure, we
will now define the “piecewise constant” integral before we define the Riemann integral.

4.1. DEFINITION. Let f be a real-valued function defined on [a, b]. Then f is said to
be piecewise constant if there exists a partition P of [a, b] into finitely many subintervals $I_j$
such that f takes a constant value $c_j$ on each interval $I_j$.

In other words, f is piecewise constant if f is of the form $\sum_{j=1}^{k} c_j \chi_{I_j}$, where the $I_j$ are intervals.
Here $\chi_{I_j}$ denotes the characteristic function of $I_j$, that is, $\chi_{I_j}(x) = 1$ if $x \in I_j$ and $\chi_{I_j}(x) = 0$
otherwise.

4.2. DEFINITION. If $f = \sum_{j=1}^{k} c_j \chi_{I_j}$ then the pc integral of f is defined to be $\sum_{j=1}^{k} c_j \,\mathrm{len}(I_j)$.

As was the case with the elementary measure, one must check that the value of the pc
integral is well-defined. That is, if f is expressed in two different ways as a pc function,
say $\sum c_j \chi_{I_j} = \sum d_k \chi_{J_k}$, then one must check that the two values $\sum c_j \,\mathrm{len}(I_j)$ and $\sum d_k \,\mathrm{len}(J_k)$
agree.
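A quick numerical check of this well-definedness claim can be sketched in Python. The helper name `pc_integral` and the encoding of each term as a pair (c, (lo, hi)) are assumptions made for this example; whether the endpoints belong to an interval is not recorded, since it does not affect the length.

```python
from fractions import Fraction

def pc_integral(pieces):
    """pc integral of sum_j c_j * chi_{I_j}, with pieces given as (c_j, (lo, hi))."""
    return sum(Fraction(c) * (Fraction(hi) - Fraction(lo)) for c, (lo, hi) in pieces)

# Two representations of the same function on [0, 3]:
# f = 2 on [0, 1) and f = 5 on [1, 3].
rep_a = [(2, (0, 1)), (5, (1, 3))]
# The same f written with overlapping terms:
# 2 on [0, 3], plus 3 on [1, 2), plus 3 on [2, 3].
rep_b = [(2, (0, 3)), (3, (1, 2)), (3, (2, 3))]

print(pc_integral(rep_a), pc_integral(rep_b))
```

Both representations give the same value, as the well-definedness check requires. A single example is of course not a proof; the general argument again goes through a common refinement.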

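4.3. DEFINITION. Let f be a bounded function on [a, b]. First define the lower and
upper Riemann integrals:

\[ \underline{\int} f = \sup \Bigl\{\, \mathrm{pc}\!\int g \;:\; g \le f,\ g \text{ pc} \,\Bigr\} \]
\[ \overline{\int} f = \inf \Bigl\{\, \mathrm{pc}\!\int h \;:\; f \le h,\ h \text{ pc} \,\Bigr\} \]

Then if $\underline{\int} f = \overline{\int} f$ we say that f is Riemann integrable, and denote the common value simply
by $\int f$.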
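As a concrete instance of this definition, the Python sketch below computes pc integrals of a minorant and a majorant for an increasing function on a uniform partition. It is an illustration under stated assumptions: the name `darboux_sums` is hypothetical, and f is assumed nondecreasing so that, on each piece, the best constant below f is its value at the left endpoint and the best constant above f is its value at the right endpoint.

```python
from fractions import Fraction

def darboux_sums(f, a, b, k):
    """pc integrals of a minorant and a majorant of f on a uniform partition.

    Assumes f is nondecreasing on [a, b]. Returns (lower, upper), which
    bound the lower integral from below and the upper integral from above.
    """
    a, b = Fraction(a), Fraction(b)
    h = (b - a) / k
    xs = [a + i * h for i in range(k + 1)]
    lower = sum(f(xs[i]) * h for i in range(k))      # g = f(left endpoint) on each piece
    upper = sum(f(xs[i + 1]) * h for i in range(k))  # h = f(right endpoint) on each piece
    return lower, upper

for k in (10, 100):
    lo, hi = darboux_sums(lambda x: x * x, 0, 1, k)
    print(k, lo, hi, hi - lo)
```

For f(x) = x² on [0, 1] the gap between the two values is exactly 1/k, so the lower and upper integrals coincide and f is Riemann integrable.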

4.4. PROPOSITION. The Riemann integral satisfies the three properties:
◦ (normality) If A is a Jordan measurable subset of [a, b], then $\chi_A$ is Riemann integrable
over [a, b] and $\int \chi_A = m(A)$.
◦ (linearity) If f, g are Riemann integrable then so are cf and f + g and we have
$\int cf = c \int f$, and $\int (f + g) = \int f + \int g$.
◦ (monotonicity) If f, g are Riemann integrable and f ≤ g then $\int f \le \int g$.

PROOF. We establish only the normality property. By Lemma 3.2, for any ε > 0 we can find
disjoint intervals $I_j$ and disjoint intervals $J_k$ such that $\bigcup I_j \subset A \subset \bigcup J_k$ and
$m(\bigcup J_k \setminus \bigcup I_j) < \varepsilon$. It is easy to see from the definition of the pc integral that
$\mathrm{pc}\!\int \chi_{\bigcup I_j} = m(\bigcup I_j)$, and similarly $\mathrm{pc}\!\int \chi_{\bigcup J_k} = m(\bigcup J_k)$. We now have

\[ m\Bigl(\bigcup I_j\Bigr) \le \underline{\int} \chi_A \le \overline{\int} \chi_A \le m\Bigl(\bigcup J_k\Bigr) \]

Since the left and right-hand sides differ by < ε, it follows that the lower and upper
integrals differ by < ε as well. Since ε was arbitrary, it follows that $\chi_A$ is integrable. And
since we also have

\[ m\Bigl(\bigcup I_j\Bigr) \le m(A) \le m\Bigl(\bigcup J_k\Bigr) \]

we may conclude that $\int \chi_A$ is equal to m(A). □
If one re-examines the definition and properties of the Jordan measure, it should be
clear that there is a close parallel between the Riemann integral and Jordan measure. The
normality property above begins to make this connection formal. The next result further
strengthens the two-way connection between the two notions.

4.5. THEOREM. If f is a nonnegative, bounded function on [a, b], then f is Riemann
integrable if and only if the region A = { (x, y) | 0 ≤ y ≤ f(x) } is Jordan measurable. Moreover, in
this case we have $\int f = m(A)$.

PROOF. First suppose that f is Riemann integrable and let ε > 0 be given. Choose pc
functions g, h such that g ≤ f ≤ h and $\mathrm{pc}\!\int (h - g) < \varepsilon$ (replacing g by max(g, 0) if
necessary, we may assume g ≥ 0). Let E be the region under the
graph of g and let F be the region under the graph of h. It is clear that E, F are elementary,
E ⊂ A ⊂ F, and m(F ∖ E) < ε.
Conversely if A is Jordan measurable and ε > 0 is given, we can find an elementary E such that E ⊂
A and m(A ∖ E) < ε. Using our usual grid argument, we can suppose that there is
a finite sequence of disjoint intervals $I_j$ such that E is a union of boxes with horizontal sides
selected from the $I_j$. Pairing each $I_j$ with the constant $c_j$ = the maximum of the vertical
coordinates of all of the boxes with horizontal side $I_j$, we obtain a pc function g ≤ f. It is easy
to see that $m(E) \le \mathrm{pc}\!\int g \le m(A)$. Since m(E) > m(A) − ε and ε was arbitrary, this shows that
the lower Riemann integral of f is at least m(A). We can proceed similarly using an outer
approximation F ⊃ A to show that the upper Riemann integral of f is at most m(A). Since the
lower integral never exceeds the upper integral, both are equal to m(A). □
Depending on when you last studied Riemann integration, you may better recall Rie-
mann’s classical approach rather than the Darboux approach above. This version involves
a quite expansive notation:
◦ f denotes a real-valued, bounded function defined on the interval [ a, b].
◦ x0 , x1 , . . . , xk denotes an increasing sequence of points in [ a, b] (they will be rec-
tangle endpoints), where x0 = a and xk = b.
◦ P denotes the partition of [ a, b] into subintervals defined by the xi , that is, into
subintervals [ xi−1 , xi ].
◦ δxi denotes the length of the ith interval, xi − xi−1 .
◦ ‖P‖ denotes the norm of the partition, max δxi.
◦ x1∗, . . . , xk∗ denotes any selection of points such that xi∗ ∈ [xi−1, xi].
With these pieces in hand, we can define the Riemann sums and the Riemann integral.

4.6. Definition. With $f$, $P$, $\delta x_i$, $x_i^*$ as above, the corresponding Riemann sum is:
$$R(f, P, x_i^*) = \sum_{i=1}^{k} f(x_i^*)\,\delta x_i$$
The Riemann integral of f on [a, b] is then defined by
$$\mathrm{R}\!\int_a^b f = \lim_{\|P\| \to 0} R(f, P, x_i^*)$$
provided this limit exists. Here the limit “exists” and equals L if for all $\epsilon > 0$ there exists $\delta > 0$ such that for all $P$ and $x_i^*$ we have $\|P\| < \delta$ implies $|R(f, P, x_i^*) - L| < \epsilon$.
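
As a rough numerical illustration of Definition 4.6, the following Python sketch evaluates Riemann sums for uniform partitions with left and right tags. The helper names `riemann_sum` and `uniform_partition` and the test function x² are choices made for this illustration only; they do not come from the notes.

```python
def riemann_sum(f, points, tags):
    """R(f, P, x*) = sum of f(x_i*) * (x_i - x_{i-1}) over the partition."""
    return sum(f(t) * (x1 - x0)
               for x0, x1, t in zip(points, points[1:], tags))


def uniform_partition(a, b, k):
    """Endpoints x_0 = a < x_1 < ... < x_k = b of k equal subintervals."""
    return [a + (b - a) * i / k for i in range(k + 1)]


def square(x):
    return x * x


if __name__ == "__main__":
    for k in (10, 100, 1000):
        pts = uniform_partition(0.0, 1.0, k)
        left = riemann_sum(square, pts, pts[:-1])   # tags at left endpoints
        right = riemann_sum(square, pts, pts[1:])   # tags at right endpoints
        print(k, left, right)
```

Since x² is increasing on [0, 1], the left-tagged and right-tagged sums bracket the integral 1/3, and both approach it as the norm of the partition shrinks.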

It is an exercise in both notation and partition management to check that f is Riemann integrable in the Darboux sense described earlier in this section if and only if f is Riemann integrable in the classical Riemann sense just defined.

Exercise 4.1 (See Tao, Ex 1.1.21). Show that the pc integral is well-defined, and satisfies the normality, linearity, and monotonicity properties.

Exercise 4.2 (Tao, Ex 1.1.22). Let f be a bounded function on the interval [a, b]. Then f is integrable in the Darboux sense if and only if f is integrable in the classical Riemann sense, and in this case the two values agree.

Exercise 4.3 (Tao, Ex 1.1.23). Let f : [a, b] → R. Show that if f is continuous, then f is Riemann integrable. Show that if f is bounded and piecewise continuous, then f is Riemann integrable.

Exercise 4.4 (Tao, Ex 1.1.24). Complete the proof of Proposition 4.4: Show that the Riemann integral satisfies the linearity and monotonicity properties. (Hint: first establish these properties for the pc integral.)

§5. Introduction to Lebesgue measure

Reading. Tao, §1.2, first few pages

The Jordan measure that we have constructed works very well for the sets that it
measures. And the Riemann integral works very well for the functions that it integrates.
But both have several shortcomings, some of which we have already discussed:
◦ Unbounded sets are not Jordan measurable, and unbounded functions are not
Riemann integrable
◦ There are examples of bounded sets which are open or closed, but still not Jordan
measurable
◦ A countable union of Jordan measurable sets need not be Jordan measurable
◦ A pointwise limit of Riemann integrable functions need not be Riemann inte-
grable, even if it is again bounded
In this section we will strengthen the definition of Jordan measure to obtain the Lebesgue
measure. The Lebesgue measure possesses stronger properties than the Jordan measure,
including the ability to measure a wider class of sets. The price for this is that it will be
harder to establish these properties.
To begin, recall from an exercise in the Jordan measure section that we can rewrite the
definition of outer Jordan measure as follows.
$$m^{*j}(A) = \inf \left\{ \sum_{i=1}^{k} \mathrm{vol}(B_i) \;\middle|\; B_i \text{ are boxes and } A \subset \bigcup_{i=1}^{k} B_i \right\}$$

The idea of the Lebesgue measure is simply to replace the finite union and summation
with a countable union and summation.

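5.1. Definition. Let A be any subset of $\mathbb{R}^n$. The Lebesgue outer measure of A is
$$m^{*}(A) = \inf \left\{ \sum_{i=1}^{\infty} \mathrm{vol}(B_i) \;\middle|\; B_i \text{ are boxes and } A \subset \bigcup_{i=1}^{\infty} B_i \right\}$$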

Notice that we have dropped the assumption that A is bounded. There are many
examples of unbounded sets with Lebesgue outer measure zero. In fact, every countable
set has Lebesgue outer measure zero.
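
The claim about countable sets can be checked concretely. The sketch below covers the first few rationals of [0, 1] by intervals whose i-th member has length ε/2^i, so the total length stays below ε. The enumeration order and the helper names `rationals_in_unit_interval` and `cover` are assumptions of this illustration, not part of the notes.

```python
from fractions import Fraction


def rationals_in_unit_interval(count):
    """First `count` distinct rationals p/q in [0, 1], by increasing q."""
    seen, out = set(), []
    q = 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out


def cover(points, eps):
    """Cover the i-th point (i = 1, 2, ...) by an interval of length eps / 2**i."""
    return [(x - eps / 2 ** (i + 1), x + eps / 2 ** (i + 1))
            for i, x in enumerate(points, start=1)]


if __name__ == "__main__":
    eps = Fraction(1, 100)
    pts = rationals_in_unit_interval(50)
    boxes = cover(pts, eps)
    total = sum(hi - lo for lo, hi in boxes)
    print(float(total), "<", float(eps))
```

However many points are covered, the total length is ε(1 − 2^(−N)) < ε, which is why the infimum in Definition 5.1 is 0 for a countable set.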
We also remark that we will not define an “inner” version of Lebesgue measure analogous to the Jordan inner measure. The reason is that we do not wish to assume that positive measure sets will contain any positive volume boxes. For example, the set $[0, 1] \setminus \mathbb{Q}$ should have a measure of 1 but has inner Jordan measure 0. In fact, even if we replace the finite summation from Jordan inner measure with a countable summation, the resulting inner measure would still be 0!
Without an inner measure, we cannot define Lebesgue measurability simply by requiring the outer and inner measures to agree. In order to find another way to define Lebesgue measurability, we recall from an earlier exercise that A is Jordan measurable if and only if for all $\epsilon > 0$ there exists an elementary set E such that $A \subset E$ and $m^{*j}(E \setminus A) < \epsilon$. This leads to the following definition.

5.2. Definition. Let A be any subset of $\mathbb{R}^n$. We say that A is Lebesgue measurable if for every $\epsilon > 0$ there exists a sequence of boxes $B_i$ such that $A \subset \bigcup B_i$ and $m^{*}(\bigcup B_i \setminus A) < \epsilon$. When this is the case, we define $m(A) = m^{*}(A)$ to be the Lebesgue measure of A.

Most sources actually define A to be Lebesgue measurable if for every $\epsilon > 0$ there
exists an open set O such that $A \subset O$ and $m^{*}(O \setminus A) < \epsilon$. While this definition using open
sets is more elegant, our official definition using unions of boxes agrees more closely with
our definition of Lebesgue outer measure. Our work of the next few sections will reveal
how to show that these two definitions are equivalent.
We will see in the rest of this section and the next that the Lebesgue measure agrees
with the Jordan measure on the Jordan measurable sets, and moreover is capable of mea-
suring significantly more sets. In fact the Lebesgue measurable sets encompass almost
everything seen in real analysis and its applications, with exceptions essentially boiling
down to certain Axiom of Choice constructions. The Lebesgue measure also satisfies all
the measure axioms that we have mentioned so far, including their countable versions.
Likewise, later on we will introduce the corresponding Lebesgue integral. This in-
tegral agrees with the Riemann integral, and is capable of integrating significantly more
functions. It also has significantly stronger properties than the Riemann integral, includ-
ing a countable version of linearity.
Before we begin working to establish all these claims, we study the Lebesgue outer
measure further. In order to proceed, it is useful to lay out what properties are expected
of an outer measure. The following will be referred to as the outer measure axioms.
(a) (empty set) m∗ (∅) = 0
(b) (monotonicity) If A ⊂ B then m∗ ( A) ≤ m∗ ( B)
(c) (countable subadditivity) $m^{*}(\bigcup A_n) \le \sum m^{*}(A_n)$

Since the outer measure applies to all sets, and we have seen there exist non-measurable
sets, we do not expect outer measure to satisfy countable additivity in general. Still axiom
(c) is quite strong: the Jordan outer measure does not satisfy countable subadditivity.

5.3. Proposition. The Lebesgue outer measure satisfies the outer measure axioms (a)–(c).

Proof. The axioms (a) and (b) are both trivial, so it remains to prove only axiom (c). Let $A_n$ be arbitrary sets and let $\epsilon > 0$ be given. From the definition of Lebesgue outer measure, for each n we can find a sequence of boxes $B_i^n$ such that $A_n \subset \bigcup_i B_i^n$ and $\sum_i \mathrm{vol}(B_i^n) - m^{*}(A_n) < \epsilon/2^n$. Taking unions, we have $\bigcup_n A_n \subset \bigcup_n \bigcup_i B_i^n$, and moreover:
$$m^{*}\Big(\bigcup_n A_n\Big) \le \sum_n \sum_i \mathrm{vol}(B_i^n) \le \sum_n \big(m^{*}(A_n) + \epsilon/2^n\big) \le \sum_n m^{*}(A_n) + 2\epsilon$$
Taking $\epsilon \to 0$, we obtain the desired inequality $m^{*}(\bigcup A_n) \le \sum m^{*}(A_n)$. □


The bookkeeping used in the above proof is called an “$\epsilon/2^n$ argument”, and is used
frequently in countable approximations.
In the next sections we will work to address under what circumstances the Lebesgue
outer measure satisfies additivity or otherwise behaves well.

Exercise 5.1 (Tao, Ex 1.2.1). Show that the countable union of Jordan measurable sets need not be Jordan measurable, even when bounded. Show that the countable intersection of Jordan measurable sets need not be Jordan measurable.

Exercise 5.2 (Tao, Ex 1.2.2). Give an example of a sequence of uniformly bounded, Riemann integrable functions on [0, 1] which converges pointwise to a function that is not Riemann integrable. Is it possible to give an example which converges uniformly?

Exercise 5.3. Show that $m^{*}(A) \le m^{*j}(A)$. Give an example of a set A such that $m^{*}(A) < m^{*j}(A)$.

Exercise 5.4. Show that Jordan outer measure does not satisfy countable subadditivity.

Exercise 5.5. Show that if A is Lebesgue null, that is, $m^{*}(A) = 0$, then A is Lebesgue measurable.

§6. Lebesgue outer measure

Reading. Tao, §1.2.1.

We have shown that the Lebesgue outer measure satisfies countable subadditivity. We
are really interested in additivity, but we know that even the finite additivity axiom cannot
hold for all sets. In the end, we will prove that countable additivity is true for measurable
sets. For the moment, we will be satisfied with the following version of additivity which
holds in special cases.

6.1. Lemma. Suppose that A, B are positively separated, that is, that $d(A, B) = \inf \{ d(x, y) \mid x \in A, y \in B \} > 0$. Then $m^{*}(A \cup B) = m^{*}(A) + m^{*}(B)$.

Proof. Subadditivity implies that $m^{*}(A \cup B) \le m^{*}(A) + m^{*}(B)$, so it remains only to show $m^{*}(A \cup B) \ge m^{*}(A) + m^{*}(B)$. Applying the definition of $m^{*}(A \cup B)$, given any $\epsilon > 0$ we can find boxes $C_i$ such that $A \cup B \subset \bigcup C_i$ and $\sum \mathrm{vol}(C_i) - m^{*}(A \cup B) < \epsilon$.
Let us first consider an easy case when each $C_i$ meets at most one of the sets A, B. Then we can rewrite the sequence $\{C_i\}$ as $\{D_i\} \cup \{E_i\}$, where the $D_i$’s meet only A and the $E_i$’s meet only B. Now
$$m^{*}(A \cup B) > \sum \mathrm{vol}(C_i) - \epsilon = \sum \mathrm{vol}(D_i) + \sum \mathrm{vol}(E_i) - \epsilon \ge m^{*}(A) + m^{*}(B) - \epsilon$$
Taking $\epsilon \to 0$, we are done in this case.
In the general case, we can reduce to the easy one by partitioning each $C_i$ into smaller boxes, each with diameter smaller than $d(A, B)$. Once this is done, each new box meets at most one of A, B and we may proceed as above. □
Up to this point, we have not yet shown that m∗ ever takes a nonzero value! In fact m∗
satisfies a strong normality axiom, which states that the outer measure of an elementary
set is equal to its elementary measure. When we proved this property for Jordan measure,
we started by showing that one cannot partition an interval into finitely many subinter-
vals whose lengths somehow add up to less than the original. For countable partitions
this is intuitively still true, but much harder to show!

6.2. Theorem. If E is an elementary subset of $\mathbb{R}^n$, then $m^{*}(E)$ agrees with the elementary measure $m_e(E)$.

Proof. It is clear that $m^{*}(E) \le m_e(E)$, since E is itself a union of boxes whose volumes sum to $m_e(E)$. Thus it remains only to show $m^{*}(E) \ge m_e(E)$. Appealing to the definition of $m^{*}(E)$, given any $\epsilon > 0$ we can find boxes $B_i$ such that $E \subset \bigcup B_i$ and $\sum \mathrm{vol}(B_i) - m^{*}(E) < \epsilon$. Rearranging, this says $m^{*}(E) > \sum \mathrm{vol}(B_i) - \epsilon$. Now we would like to say that $\sum \mathrm{vol}(B_i) \ge m_e(E)$, but unfortunately the elementary measure is only finitely subadditive.
In order to proceed, let us temporarily assume that E is closed and the $B_i$ are open. We recall that any closed and bounded set is compact, and that any covering of a compact set by open sets has a finite subcovering. Thus under these assumptions, we have that just finitely many of the $B_i$ are needed to cover E. Thus the argument of the previous paragraph works in this case!
In order to assume that the $B_i$ are open, we can enlarge each slightly and find an open box $B_i'$ such that $B_i \subset B_i'$ and $\mathrm{vol}(B_i') - \mathrm{vol}(B_i) < \epsilon/2^i$.
In order to assume that E is closed, first write it as a finite union of disjoint boxes $C_1, \ldots, C_k$. Shrinking each $C_i$ slightly, we can find a closed box $C_i' \subset C_i$ such that $m_e(C_i \setminus C_i') < \epsilon/k$. Replacing E with $\bigcup C_i'$ we obtain a closed set as desired. Each of these two replacements changes the quantities being compared by at most $\epsilon$, which is harmless since $\epsilon$ was arbitrary. □


As a consequence of the theorem, we now know that finite additivity holds for m∗ for
finite unions of disjoint boxes (after all it is true for the elementary measure). In fact it also
holds for finite unions of almost disjoint boxes: here two boxes are said to be almost disjoint
if they have disjoint interiors. This is because the elementary measure of the boundary of
a box is always zero. The next result extends this from finite to countable unions.
6.3. Theorem. Suppose $B_i$ is a sequence of pairwise almost disjoint boxes. Then $m^{*}(\bigcup B_i) = \sum \mathrm{vol}(B_i)$.
Proof. By subadditivity together with the previous theorem, we have $m^{*}(\bigcup B_i) \le \sum m^{*}(B_i) = \sum \mathrm{vol}(B_i)$. Hence it remains only to show $m^{*}(\bigcup B_i) \ge \sum \mathrm{vol}(B_i)$. For this, let $N \in \mathbb{N}$ be given, and note that $\bigcup_1^N B_i$ is an elementary set. Thus by monotonicity together with the previous theorem, we have
$$m^{*}\Big(\bigcup B_i\Big) \ge m^{*}\Big(\bigcup_{1}^{N} B_i\Big) = m_e\Big(\bigcup_{1}^{N} B_i\Big) = \sum_{1}^{N} \mathrm{vol}(B_i)$$
Taking $N \to \infty$, we obtain the desired inequality. □



We are finally making some progress: for unions of almost disjoint sequences of boxes,
the additivity property holds and the measure of the union is equal to the expected quan-
tity. This leads one to ask what kinds of sets can be written as unions of almost disjoint
sequences of boxes, and the following result shows this at least includes the open sets.
6.4. P ROPOSITION . Any open set O can be written as a union of a sequence of pairwise almost
disjoint boxes.
P ROOF. Consider the family Q of dyadic cubes, that is, cubes with each side of the form
[m/2n , (m + 1)/2n ] where n ≥ 0. The family Q has a nesting property: for any two cubes
in Q, either one is contained in the other or else the two cubes are almost disjoint.
It is not difficult to observe that Q is a basis for the topology of $\mathbb{R}^n$. In particular, for any x ∈ O there exists a cube Bx ∈ Q such that x ∈ Bx ⊂ O. The collection of all Bx
for x ∈ O is a covering of O by dyadic cubes. Now we eliminate duplicates from this
covering, that is, remove any cube in the covering that is contained in some other cube
of the covering. Since any nested chain of cubes has a maximal element, the cubes which
remain will still cover O. And by the nesting property, the cubes which remain will also
be almost disjoint. 
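
To make the construction in the proof concrete, here is a one-dimensional sketch that lists the maximal closed dyadic intervals contained in an open interval (a, b), up to a chosen depth. The function name `maximal_dyadic_intervals` and the depth cutoff `max_level` are assumptions of this illustration; the actual decomposition uses all levels and is usually infinite.

```python
from fractions import Fraction
from math import floor


def maximal_dyadic_intervals(a, b, max_level):
    """Maximal closed dyadic intervals [m/2^n, (m+1)/2^n] inside the open
    interval (a, b), using levels n = 0, 1, ..., max_level only."""
    chosen = []
    for n in range(max_level + 1):
        step = Fraction(1, 2 ** n)
        m = floor(a / step)
        while m * step < b:
            lo, hi = m * step, (m + 1) * step
            inside = a < lo and hi < b
            # Skip intervals already contained in a larger chosen interval.
            redundant = any(c_lo <= lo and hi <= c_hi for c_lo, c_hi in chosen)
            if inside and not redundant:
                chosen.append((lo, hi))
            m += 1
    return chosen


if __name__ == "__main__":
    pieces = maximal_dyadic_intervals(Fraction(0), Fraction(1), 5)
    for lo, hi in sorted(pieces):
        print(lo, hi)
    print("total length:", sum(hi - lo for lo, hi in pieces))
```

For (0, 1) and depth 5 the chosen intervals have total length 15/16, and increasing the depth pushes the total toward 1, in line with Theorem 6.3 applied to the full decomposition.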
The above result gives a very direct method of calculating the Lebesgue outer measure
(and hence the Lebesgue measure) of any open set! We close this section with the follow-
ing so-called “outer regularity lemma”, which shows that the Lebesgue outer measure is
determined by its values on the open sets. When combined with the previous result, this
gives a kind of general formula for the outer measure.

6.5. Lemma. Let A be any subset of $\mathbb{R}^n$. Then $m^{*}(A) = \inf \{ m^{*}(O) \mid O \text{ is open and } A \subset O \}$.

Proof. It is clear from monotonicity of $m^{*}$ that ≤ holds. Thus it remains only to show ≥. Let $\epsilon > 0$ be given. Applying the definition of $m^{*}(A)$ we can find boxes $B_i$ such that $A \subset \bigcup B_i$ and $\sum \mathrm{vol}(B_i) - m^{*}(A) < \epsilon$. Arguing as in an earlier proof, we can enlarge the $B_i$ slightly to assume without loss of generality that they are open. Then
$$m^{*}(A) \ge \sum \mathrm{vol}(B_i) - \epsilon \ge m^{*}\Big(\bigcup B_i\Big) - \epsilon \ge \inf \{ m^{*}(O) \mid O \text{ is open and } A \subset O \} - \epsilon$$
Taking $\epsilon \to 0$, we obtain the desired result. □
In the next section we will use these partial results to conclude that the Lebesgue
(outer) measure always behaves well on the measurable sets.

Exercise 6.1 (Tao, Ex 1.2.5). Suppose A is expressible as a countable union of pairwise almost disjoint boxes. Show that $m^{*}(A) = m_{*j}(A)$, where $m_{*j}$ denotes the Jordan inner measure.

Exercise 6.2 (Tao, Ex 1.2.6). Show that it is not true in general that
$$m^{*}(A) = \sup \{ m^{*}(O) \mid O \text{ is open and } O \subset A \}$$


§7. Lebesgue measurability

Reading. Tao, §1.2.2.

Recall that a set A is called Lebesgue measurable if it can be well-approximated from the outside by a countable union of boxes: for all $\epsilon > 0$ there is a sequence of boxes $B_i$ such that $A \subset \bigcup B_i$ and $m^{*}(\bigcup B_i \setminus A) < \epsilon$. In particular, this implies tautologically that countable unions of boxes are Lebesgue measurable. In the next result we work to establish that many, many other sets are Lebesgue measurable too.

7.1. Theorem. Open and closed sets are Lebesgue measurable. Complements, countable unions, and countable intersections of measurable sets are measurable.

Proof. Since the boxes form a base for the topology of $\mathbb{R}^n$, any open set can be written as a countable union of boxes. (Or see Proposition 6.4.) Thus by the remark above, open sets are Lebesgue measurable.
For countable unions, suppose that $A_n$ are Lebesgue measurable. Given $\epsilon > 0$, find for each n a countable union of boxes $U_n$ such that $A_n \subset U_n$ and $m^{*}(U_n \setminus A_n) < \epsilon/2^n$. Then we have
$$m^{*}\Big(\bigcup U_n \setminus \bigcup A_n\Big) \le m^{*}\Big(\bigcup (U_n \setminus A_n)\Big) \le \sum m^{*}(U_n \setminus A_n) \le \sum \epsilon/2^n \le 2\epsilon$$
Since $\bigcup U_n$ is again a countable union of boxes, this shows that $\bigcup A_n$ is measurable.
For closed sets, assume first that A is closed and bounded, and thus compact. Using the outer regularity lemma we can find an open set O such that $A \subset O$ and $m^{*}(O) - m^{*}(A) < \epsilon$. We wish to show that $m^{*}(O \setminus A) \le \epsilon$ too. Since $O \setminus A$ is open, we can use Proposition 6.4 to write $O \setminus A$ as an almost disjoint union of closed dyadic cubes $C_i$. Then for each N the set $\bigcup_1^N C_i$ is compact and disjoint from A, and thus positively separated from the compact set A. By Lemma 6.1, additivity holds for positively separated sets, so we have:
$$m^{*}(A) + m^{*}\Big(\bigcup_{1}^{N} C_i\Big) = m^{*}\Big(A \cup \bigcup_{1}^{N} C_i\Big) \le m^{*}(O) < m^{*}(A) + \epsilon$$
It follows that $\sum_1^N \mathrm{vol}(C_i) = m^{*}(\bigcup_1^N C_i) < \epsilon$, and taking $N \to \infty$ we have $\sum \mathrm{vol}(C_i) \le \epsilon$. By Theorem 6.3 this shows that $m^{*}(O \setminus A) \le \epsilon$, as desired.


In general a closed set can be written as a countable union of compact sets, and we
have already handled the case of countable unions.
For complements, let A be measurable and for each n find a union of boxes $U_n$ such that $A \subset U_n$ and $m^{*}(U_n \setminus A) < 1/n$. We can enlarge the constituent boxes of each $U_n$ slightly to find an open set $O_n$ such that $U_n \subset O_n$ and $m^{*}(O_n \setminus U_n) < 1/n$. (Here we are using normality and subadditivity to achieve this estimate.) Taking the intersection of the $O_n$ we now have $A \subset \bigcap O_n$ and $m^{*}(\bigcap O_n \setminus A) = 0$. Writing these two expressions in complement, they become $\bigcup O_n^c \subset A^c$ and $m^{*}(A^c \setminus \bigcup O_n^c) = 0$. Now $A^c$ can be expressed as a union of two sets: $\bigcup O_n^c$ and $A^c \setminus \bigcup O_n^c$. The first is Lebesgue measurable because it is a countable union of closed sets. The second is Lebesgue measurable because it is null (see an earlier exercise). Appealing again to the closure under unions, we conclude that $A^c$ is measurable too.
For countable intersections, we can simply apply De Morgan’s laws to reduce it to
complements and countable unions. Whew! □
The above theorem thus shows that the Lebesgue measurable sets form a σ-algebra,
that is, a family of sets that is closed under countable unions, countable intersections,
and complements. It moreover shows that the Lebesgue measurable sets include the
well-known class of Borel sets, that is, the σ-algebra generated by the open and closed
sets. The Borel sets are often identified as those which can be explicitly described. Most
sets we encounter in analysis can be explicitly described and are thus Borel and Lebesgue
measurable.
We now know that Borel sets are Lebesgue measurable, null sets are Lebesgue mea-
surable, and the measurable sets form a σ-algebra. The next result concludes that this
information characterizes the Lebesgue measurable sets.

7.2. Proposition. The collection of Lebesgue measurable sets is the least σ-algebra containing both the open sets and the Lebesgue null sets.

Proof. It is clear that the Lebesgue measurable sets are a σ-algebra containing the open sets and the Lebesgue null sets. On the other hand suppose that E is a Lebesgue measurable set. As in the proof of Theorem 7.1, for all n we can find open sets $O_n$ such that $E \subset O_n$ and $m^{*}(O_n \setminus E) < 1/n$. It follows that $N = \bigcap O_n \setminus E$ is Lebesgue null. Now we have that
$$E = \Big(\bigcap O_n\Big) \cap N^c$$
and thus E lies in the σ-algebra generated by the open sets and the Lebesgue null sets. □
We conclude this section with some useful equivalents of Lebesgue measurability,
similar to the ones we developed for Jordan measurability. The following result implies
that the Lebesgue measurable sets can be characterized as those which are “almost open.”

7.3. Lemma. A set A is Lebesgue measurable if and only if for all $\epsilon > 0$ there exists an open set O such that $m^{*}(O \,\triangle\, A) < \epsilon$.

Proof. Our original definition of Lebesgue measurability automatically gives an open set O such that $m^{*}(O \,\triangle\, A) < \epsilon$. Conversely, let A be any set and suppose the condition holds. Then for any $\epsilon$ we can find an open set $O_\epsilon$ such that $m^{*}(O_\epsilon \,\triangle\, A) < \epsilon$. Let $U_\epsilon = \bigcup_k O_{\epsilon/2^k}$. Then it is not difficult to check that $m^{*}(U_\epsilon \setminus A) \le \epsilon$, and $m^{*}(A \setminus U_\epsilon) = 0$. Finally letting $B = \bigcap_n U_{1/n}$ we have that B is a measurable set and both $m^{*}(A \setminus B) = 0$ and $m^{*}(B \setminus A) = 0$. We have thus shown that A differs from a measurable set by a null set, and we leave it as an exercise to check that this implies A is measurable too. □
Perhaps even more surprising, the Lebesgue measurable sets of finite measure can be
characterized as those which are “almost elementary”.

7.4. Lemma. A set A is Lebesgue measurable with finite Lebesgue measure if and only if for all $\epsilon > 0$ there exists an elementary set E such that $m^{*}(E \,\triangle\, A) < \epsilon$.

Proof. Suppose that A is Lebesgue measurable with finite measure and let $\epsilon > 0$ be given. Let O be an open set such that $A \subset O$ and $m^{*}(O \setminus A) < \epsilon$. Then O can be written as a union of almost disjoint boxes $O = \bigcup B_i$, and we know that $m(O) = \sum \mathrm{vol}(B_i)$.
Now $m(O) < m(A) + \epsilon$ and the right-hand side is finite, so the sum $\sum \mathrm{vol}(B_i)$ converges. Thus there exists some N such that $\sum_{N+1}^{\infty} \mathrm{vol}(B_i) < \epsilon$. Letting $E = \bigcup_1^N B_i$, we have that E is elementary and $m(O \setminus E) < \epsilon$. Thus we have
$$m^{*}(E \,\triangle\, A) = m^{*}(E \setminus A) + m^{*}(A \setminus E) \le m^{*}(O \setminus A) + m^{*}(O \setminus E) < 2\epsilon$$
which is sufficient to prove the implication. The converse implication is similar to the previous lemma. □
We have now established many useful properties of the outer measure m∗ and shown
that it has a broad collection of measurable sets. In the next section we will confirm as
promised that m∗ behaves very well when restricted to the collection of measurable sets.

Exercise 7.1 (See Tao, Ex 1.2.7). (a) Show that A is measurable iff for all $\epsilon > 0$ there exists a closed set $F \subset A$ such that $m^{*}(A \setminus F) < \epsilon$.
(b) Show that A is measurable iff for all $\epsilon > 0$ there exists a measurable set B such that $m^{*}(A \,\triangle\, B) < \epsilon$.

Exercise 7.2 (Tao, Ex 1.2.14). Show that any set A is contained in a Lebesgue measurable set B such that $m(B) = m^{*}(A)$.

Exercise 7.3 (Tao, Ex 1.2.15). Show the inner regularity property: If A is Lebesgue measurable, then
$$m(A) = \sup \{ m(K) \mid K \subset A,\ K \text{ compact} \}$$

§8. Lebesgue measure

Reading. Tao, §1.2.2

In the previous section we established that many sets are Lebesgue measurable. When
A is Lebesgue measurable we simply write m( A) for m∗ ( A), and we call m the Lebesgue
measure. We are finally ready to prove that the Lebesgue measure satisfies the require-
ments of a measure that we laid out in the first section, at least when they are applied to
Lebesgue measurable sets.

8.1. Theorem. The Lebesgue measure satisfies the axioms
(a) (normality) if B is a box then $m(B) = \mathrm{vol}(B)$;
(b) (translation-invariance) $m(x + A) = m(A)$ for every $x \in \mathbb{R}^n$ and measurable set A;
(c) (countable additivity) $m(\bigcup A_n) = \sum m(A_n)$ for every sequence of pairwise disjoint measurable sets $A_n$.

Proof. We have already established normality for $m^{*}$ and hence for m. Translation-invariance of $m^{*}$ is clear from the definition since it is clear for boxes.
For countable additivity, first recall that we always have subadditivity so we need only show $m(\bigcup A_n) \ge \sum m(A_n)$. Suppose first that the $A_n$ are compact. Then they are pairwise positively separated, so using Lemma 6.1 inductively we can establish that $m(\bigcup_1^N A_n) = \sum_1^N m(A_n)$. It follows that $m(\bigcup A_n) \ge \sum_1^N m(A_n)$. Taking $N \to \infty$ we have $m(\bigcup A_n) \ge \sum m(A_n)$ as desired.
Next assume that the $A_n$ are bounded but not necessarily closed. By the measurability of $A_n^c$ we can find open sets $O_n$ such that $A_n^c \subset O_n$ and $m^{*}(O_n \setminus A_n^c) < \epsilon/2^n$. Taking complements we thus have compact sets $K_n \subset A_n$ such that $m^{*}(A_n \setminus K_n) < \epsilon/2^n$. Now using the additivity for compact sets,
$$m\Big(\bigcup A_n\Big) \ge m\Big(\bigcup K_n\Big) = \sum m(K_n) \ge \sum \big(m(A_n) - \epsilon/2^n\big) = \sum m(A_n) - \epsilon$$
Taking $\epsilon \to 0$, we are finished in this case.
Finally for general $A_n$, decompose $\mathbb{R}^n$ into disjoint bounded cells $C_m$. Then $A_n = \bigcup_m (A_n \cap C_m)$. Now the sets $A_n \cap C_m$ are bounded, so applying the result for bounded sets twice we have:
$$m\Big(\bigcup_n A_n\Big) = \sum_n \sum_m m(A_n \cap C_m) = \sum_n m(A_n)$$
and the proof is complete. □



We have now established the existence and all of the promised axioms of the Lebesgue
measure. Additional useful properties can be derived from the axioms, such as the fol-
lowing result concerning continuity of the measure function.

8.2. Theorem.
◦ (upwards monotone convergence theorem) If $A_n$ are measurable and $A_{n+1} \supset A_n$ then $m(\bigcup A_n) = \lim m(A_n)$.
◦ (downwards monotone convergence theorem) If $A_n$ are measurable and $A_{n+1} \subset A_n$ then $m(\bigcap A_n) = \lim m(A_n)$, provided some $A_n$ has finite measure.

Proof. For the upwards MCT, let $A_n' = A_n \setminus A_{n-1}$ and note that the $A_n'$ are disjoint and have the same union as before: $\bigcup A_n' = \bigcup A_n$. Note that we implicitly set the value $A_0 = \emptyset$. Applying countable additivity, we now have
$$m\Big(\bigcup A_n\Big) = m\Big(\bigcup A_n'\Big) = \sum m(A_n') = \sum \big(m(A_n) - m(A_{n-1})\big) = \lim_{N} \sum_{n=1}^{N} \big(m(A_n) - m(A_{n-1})\big) = \lim_{N} m(A_N)$$
The last equality holds simply by telescoping cancellation.


For the downwards MCT, we can suppose without loss of generality that $A_1$ has finite measure. We thus take complements inside $A_1$ to obtain the sequence $B_n = A_1 \setminus A_n$. Then the $B_n$ form an increasing sequence and $\bigcup B_n = A_1 \setminus \bigcap A_n$. Using this and the upwards MCT, we now have
$$m(A_1) = m\Big(\bigcap A_n\Big) + m\Big(\bigcup B_n\Big) = m\Big(\bigcap A_n\Big) + \lim m(B_n) = m\Big(\bigcap A_n\Big) + \lim \big(m(A_1) - m(A_n)\big) = m\Big(\bigcap A_n\Big) + m(A_1) - \lim m(A_n)$$
Cancelling the $m(A_1)$ from the first and last expression, we obtain that $0 = m(\bigcap A_n) - \lim m(A_n)$, which implies the desired result. □

Exercise 8.1 (Tao Ex 1.2.11(iii)). Give a counterexample showing that the hypothesis that some $A_n$ has finite measure is necessary for the downwards MCT.

Exercise 8.2 (Tao Ex 1.2.12). Suppose you know that the domain of m is a σ-algebra, and m satisfies $m(\emptyset) = 0$ and the countable additivity property. Show that m satisfies the monotonicity property and the countable subadditivity property.

Exercise 8.3 (Tao Ex 1.2.13). Let us say that a sequence of sets $A_n$ converges to A if the characteristic functions $\chi_{A_n}$ converge pointwise to $\chi_A$.
(a) Show that if $A_n$ are Lebesgue measurable and $A_n$ converges to A then A is Lebesgue measurable. [Hint: Show that if $A_n$ converges to A then $A = \bigcup_n \bigcap_{m>n} A_m$ and also $A = \bigcap_n \bigcup_{m>n} A_m$.]
(b) Show that if the $A_n$ are all contained in a set of finite measure and $A_n$ converges to A, then $m(A_n) \to m(A)$. This is an example of the dominated convergence theorem.
(c) Give a counterexample showing that the hypothesis that $A_n$ are all contained in a set of finite measure cannot be replaced with the hypothesis that the values $m(A_n)$ are bounded.
PART II

Measure and integration

§9. Preview of integration, simple integration

Reading. Tao, §1.3 introduction

In this chapter of the course, we investigate integration of real and complex-valued functions. Just as the Jordan measure corresponded tightly with the Darboux/Riemann
integral, the Lebesgue measure can be associated with the so-called Lebesgue integral.
The integral will serve most of the purposes needed in calculus, and will also help set the
stage for the next chapter when we will introduce functional analysis.
Just as the Lebesgue measure generalized and extended the Jordan measure, the Lebesgue
integral will generalize and extend the Darboux/Riemann integral while still ensuring
that many of its key properties hold. In addition many new stronger properties will hold
as well, such as infinite versions of additivity and stability under limits.
Recall that the Darboux/Riemann integral was defined first for piecewise constant
functions, that is, functions which take constant values on each of finitely many intervals.
It was then extended using approximations or limits. Similarly, the Lebesgue integral will
be defined first for the so-called “simple” functions, and later extended using approxima-
tions or limits.

9.1. Definition. A function f mapping $\mathbb{R}^d$ into the extended nonnegative real numbers $[0, \infty]$ (or sometimes into $\mathbb{C}$) is called simple if there exists a partition of $\mathbb{R}^d$ into finitely many Lebesgue measurable subsets $A_1, \ldots, A_k$ such that f takes a constant value $c_i$ on each $A_i$.

Equivalently, we may say that f is simple if it is of the form $f = \sum_1^k c_i \chi_{A_i}$ where $A_i$ are Lebesgue measurable sets.
The simple functions are the source of the following commonly held intuition about
Lebesgue integration: While Riemann integration relies on cutting into vertical strips,
Lebesgue integration relies on cutting into horizontal strips. The idea is that the region
below a simple function consists of finitely many horizontal strips with measurable cross-
sections, and thus it is very simple to compute the integral of such a function, as is done
in the next definition. We can then approximate many non-simple functions using simple
functions, as is done in the next section.

9.2. Definition. If $f = \sum_1^k c_i \chi_{A_i}$ is a simple function and $f \ge 0$, then the simple integral of f is defined to be $s\!\int f = \sum_1^k c_i\, m(A_i)$.

Note that we assume f ≥ 0 to ensure that the value of the simple integral is never
indeterminate. As was the case with both elementary measure and pc integral, we have
to check that the simple integral is well-defined.

9.3. Lemma. If $f = \sum_1^l c_i \chi_{A_i}$ and $f = \sum_1^m d_j \chi_{B_j}$ then we have $\sum_1^l c_i\, m(A_i) = \sum_1^m d_j\, m(B_j)$.

Proof. We use a common refinement approach. By considering intersections of all $l + m$ sets, we can find a sequence of nonempty disjoint sets $C_1, \ldots, C_n$ such that each of the $A_i$ and $B_j$ can be written as a union of some of the $C_k$’s. Note that since the $A_i$ and $B_j$ are measurable, the $C_k$ are measurable too. Now let $x_k$ be an arbitrary point in $C_k$. We can calculate
$$\sum_{i=1}^{l} c_i\, m(A_i) = \sum_{i=1}^{l} c_i \sum_{k : A_i \supset C_k} m(C_k) = \sum_{k=1}^{n} \sum_{i : A_i \supset C_k} c_i\, m(C_k) = \sum_{k=1}^{n} f(x_k)\, m(C_k)$$
$$= \sum_{k=1}^{n} \sum_{j : B_j \supset C_k} d_j\, m(C_k) = \sum_{j=1}^{m} d_j \sum_{k : B_j \supset C_k} m(C_k) = \sum_{j=1}^{m} d_j\, m(B_j)$$
This is what we desired. □
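
The following Python sketch illustrates Definition 9.2 and Lemma 9.3 on a small example in one dimension. It makes a simplifying assumption: each set A_i is a finite union of pairwise disjoint half-open intervals [lo, hi), so that its measure is just total length. The names `length`, `simple_integral`, and `evaluate` are hypothetical helpers introduced here, not notation from the notes.

```python
from fractions import Fraction


def length(intervals):
    """Measure of a finite union of pairwise disjoint intervals [lo, hi)."""
    return sum(Fraction(hi) - Fraction(lo) for lo, hi in intervals)


def simple_integral(terms):
    """Sum of c_i * m(A_i) over the terms (c_i, A_i) of a representation."""
    return sum(Fraction(c) * length(A) for c, A in terms)


def evaluate(terms, x):
    """Value at x of the function sum_i c_i * chi_{A_i}."""
    return sum(c for c, A in terms if any(lo <= x < hi for lo, hi in A))


# Two representations of the same function: 1 on [0, 1), 3 on [1, 3).
rep1 = [(1, [(0, 1)]), (3, [(1, 3)])]   # disjoint sets
rep2 = [(1, [(0, 3)]), (2, [(1, 3)])]   # overlapping sets

if __name__ == "__main__":
    print(simple_integral(rep1), simple_integral(rep2))
```

Both representations give the value 1·1 + 3·2 = 1·3 + 2·2 = 7, as the lemma predicts.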


Having defined the simple integral, we outline some of its important properties that
will be used later on. The first of these is that the value of the simple integral is insensitive
to changes on a null set.
In order to state this property and others cleanly, we introduce the terminology almost
everywhere. If a statement S(x) with variable x ∈ R holds for every x outside of a null set,
we say that S is true almost everywhere. For example, if f(x) = 0 for x ∉ Q and f(x) = 1
for x ∈ Q (the Dirichlet function) we can say that f = 0 almost everywhere.

9.4. Proposition. Let f, g be simple functions.
(a) (equivalence) if $f = g$ almost everywhere then $s\!\int f = s\!\int g$.
(b) (monotonicity) if $f \le g$ almost everywhere then $s\!\int f \le s\!\int g$.
(c) (linearity) $s\!\int (c f) = c \cdot s\!\int f$, and $s\!\int (f + g) = s\!\int f + s\!\int g$.

Proof. We prove the first property (a), since after that properties (b) and (c) are very similar to the analogous properties of the Riemann integral. Given simple functions f and g with $f = g$ almost everywhere, we can refine their expressions to find measurable sets $A_1, \ldots, A_n$ and a null set N such that $f(x) = \sum c_i \chi_{A_i}(x)$ for all $x \notin N$ and $g(x) = \sum c_i \chi_{A_i}(x)$ for all $x \notin N$. Then clearly the simple integral of both f and g evaluates to $\sum c_i\, m(A_i) + 0$. □

Throwing away null sets is common in analysis, and thanks to our understanding of
the Lebesgue measure it carries with it a lot of power. When studying sets and functions in
the measure context, it will even be useful to modify our logic. We will use the quantifiers
∀∗ x and ∃∗ x to mean “the statement holds for all but a null set of x” and “there exists a
non-null set of x such that the statement holds”.
We close this section with a preview of how the definition of the Lebesgue integral
will proceed in several stages. In the next section, we will use a familiar approximation or
limit idea to extend the simple integral to a much wider class of nonnegative functions.
In two sections, we will show how to extend the integral from nonnegative functions to
complex-valued functions.
To imagine how this latter part will go, it is useful to recall the development of infinite
series. Recall that if $a_n \ge 0$ we can define $\sum a_n$ as simply $\sup_N \sum_{n=1}^{N} a_n$. Next if $a_n$ are
arbitrary real numbers we say that the terms are absolutely summable if $\sum |a_n| < \infty$. In
that case we split each term into its positive part $a_n^+ = \max(a_n, 0)$ and its negative part
$a_n^- = \max(-a_n, 0)$. In this way we have for each term $a_n = a_n^+ - a_n^-$ and we may define
$\sum a_n = \sum a_n^+ - \sum a_n^-$. Note that the assumption that $a_n$ is absolutely summable guarantees
that the latter expression is not indeterminate. Finally if $a_n$ are complex numbers then we
again assume that $\sum |a_n| < \infty$, that is, the terms are absolutely summable in the complex
sense. In that case we can divide each term into its real part $\Re a_n$ and imaginary part $\Im a_n$,
and define $\sum a_n = \sum \Re a_n + i \sum \Im a_n$.
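The splitting into positive and negative parts can be tried on a finite list of terms. The sketch below is an illustration only; the sample terms $a_n = (-1)^n / n^2$ and the helper name `split_sum` are assumptions chosen for the example, and exact rational arithmetic is used so that the identities hold without rounding.

```python
from fractions import Fraction

def split_sum(terms):
    """Sum real terms by splitting each one into its positive and negative part."""
    pos = sum(max(a, 0) for a in terms)    # sum of a_n^+ = max(a_n, 0)
    neg = sum(max(-a, 0) for a in terms)   # sum of a_n^- = max(-a_n, 0)
    return pos, neg, pos - neg

# Assumed example: a_n = (-1)^n / n^2 for n = 1..50 (absolutely summable).
terms = [Fraction((-1) ** n, n * n) for n in range(1, 51)]
pos, neg, total = split_sum(terms)
```

Here `pos - neg` recovers the ordinary sum, while `pos + neg` is the sum of the absolute values; both parts being finite is what rules out the indeterminate form $\infty - \infty$.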

EXERCISE 9.1. Show that a function $f$ is simple if and only if it can be expressed as
$f = \sum_{i=1}^{k} c_i \chi_{A_i}$, where $A_i$ are (not necessarily disjoint) Lebesgue measurable sets.

EXERCISE 9.2 (see Tao, Ex 1.3.1). Show that the simple integral satisfies the properties:
(a) (finiteness) $s\int f < \infty$ if and only if $f$ is finite almost everywhere and supported
on a set of finite measure
(b) (vanishing) $s\int f = 0$ if and only if $f = 0$ almost everywhere
(c) (normality) $s\int \chi_A = m(A)$ for any Lebesgue measurable set $A$
10. LEBESGUE MEASURABILITY OF FUNCTIONS 31

§10. Lebesgue measurability of functions

R EADING . Tao, §1.3.2

Just as the Riemann integral was able to integrate functions that can be well approx-
imated by pc functions, the Lebesgue integral will be able to integrate functions that can
be well approximated by simple functions. At the time we did not give a direct defini-
tion of the Riemann integrable functions. This time we will define in advance the class of
functions for which the Lebesgue integral will make sense. As indicated, we begin with
just the nonnegative real-valued functions.

10.1. DEFINITION. A nonnegative function $f$ on $\mathbb{R}^d$ is said to be a measurable function
if $f$ is the pointwise limit of nonnegative simple functions.

As was the case with Lebesgue measurable sets, the Lebesgue measurable functions
can be equivalently described in a number of ways, each being useful in some situations.

10.2. THEOREM. A nonnegative function $f$ is measurable if and only if either of the following
holds.
(a) there is a sequence $f_n$ of simple functions such that the $f_n$ are bounded and have bounded
support, the $f_n$ are increasing $f_n \le f_{n+1}$, and $f = \sup f_n$;
(b) for any open set $S$ (respectively: closed set, interval, ray, etc) the preimage $f^{-1}(S)$ is
Lebesgue measurable.

Before the proof, recall that given a sequence $x_n$ we define $\limsup x_n = \inf_N \sup_{n \ge N} x_n$
and $\liminf x_n = \sup_N \inf_{n \ge N} x_n$. The $\limsup$ is the largest limit point of $x_n$, that is, the
largest number that is the limit of a subsequence of $x_n$. Similarly, the $\liminf$ is the smallest
limit point of $x_n$. The limit $\lim x_n$ exists if and only if $\limsup x_n = \liminf x_n$, and $\lim x_n$
equals this common value.
PROOF. We first show that if $f$ is measurable, then (b) holds. So let $f_n$ be simple
functions such that $f = \lim f_n$ pointwise. Note that by the above discussion, we have that
$f = \limsup f_n$ pointwise. Now suppose that $S = (\lambda, \infty)$ is an open ray. Then we want to
say that
\[
\begin{aligned}
x \in f^{-1}(S) &\iff f(x) > \lambda \\
&\iff \inf_N \sup_{n \ge N} f_n(x) > \lambda \\
&\iff (\forall N)(\exists n \ge N)\ f_n(x) > \lambda \\
&\iff x \in \bigcap_N \bigcup_{n \ge N} f_n^{-1}\big((\lambda, \infty]\big)
\end{aligned}
\]

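Since the $f_n$ are simple, it is clear that the set in the last line is measurable and therefore
$f^{-1}(S)$ would be measurable. However the argument isn't right, since for instance it isn't
quite true that $(\forall n)\, z_n > \lambda$ implies $\inf_n z_n > \lambda$. The correct calculation introduces a couple
of additional steps but ultimately accomplishes the same thing (in the last line we restrict to
$\varepsilon = 1/i$ and $\eta = 1/j$ so that the unions and intersections are countable):
\[
\begin{aligned}
x \in f^{-1}(S) &\iff f(x) > \lambda \\
&\iff (\exists \varepsilon)\ f(x) \ge \lambda + \varepsilon \\
&\iff (\exists \varepsilon)(\forall N)\ \sup_{n \ge N} f_n(x) \ge \lambda + \varepsilon \\
&\iff (\exists \varepsilon)(\forall N)(\forall \eta)\ \sup_{n \ge N} f_n(x) > \lambda + \varepsilon - \eta \\
&\iff (\exists \varepsilon)(\forall N)(\forall \eta)(\exists n \ge N)\ f_n(x) > \lambda + \varepsilon - \eta \\
&\iff x \in \bigcup_i \bigcap_N \bigcap_j \bigcup_{n \ge N} f_n^{-1}\Big(\big(\lambda + \tfrac{1}{i} - \tfrac{1}{j}, \infty\big]\Big)
\end{aligned}
\]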

Once again, this establishes that f −1 (S) is measurable. Now an analogous argument will
allow us to handle the case when S = (−∞, µ). Since any open interval is an intersection
of two open rays, and any open set is a countable union of intervals, we can conclude that
for any open S the set f −1 (S) is measurable. This establishes (b).
Next we argue that (b) implies (a). Suppose that $f$ satisfies condition (b). Given any $n$,
we will define a nonnegative simple function $f_n \le f$ as follows:
\[
f_n(x) =
\begin{cases}
\max\left\{ \tfrac{i}{2^n} \;\middle|\; \tfrac{i}{2^n} \le f(x) \text{ and } \tfrac{i}{2^n} \le n \right\} & x \in [-n, n]^d \\
0 & \text{otherwise}
\end{cases}
\]
The above prescription clearly ensures that $f_n \le f$, that $f_n$ is bounded above by $n$, and
that $f_n$ has bounded support $[-n, n]^d$. Moreover it is not difficult to check that $f_n \le f_{n+1}$
and that $f_n \to f$. It remains only to show that $f_n$ is simple, and since $f_n$ clearly takes
just finitely many values we really only have to check that it takes each of its values on
a measurable set. For this, for example, when $0 < i/2^n < n$ we have $f_n(x) = i/2^n$ if and only if
$x \in [-n, n]^d$ and $f(x)$ lies in the interval $[\tfrac{i}{2^n}, \tfrac{i+1}{2^n})$. It follows from property (b)
that $f_n^{-1}(\tfrac{i}{2^n})$ is a measurable set, and
therefore we have that $f_n$ is simple and $f$ satisfies property (a).
Finally it is trivial that (a) implies f is measurable, so we have completed the proof.
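The dyadic construction of the $f_n$ can be sketched in one dimension. This is a hedged illustration rather than part of the proof: the function name `dyadic_approx` and the test function $f(x) = x^2$ are assumptions, and the formula uses the fact that the largest multiple of $2^{-n}$ below both $f(x)$ and $n$ is $\min(\lfloor 2^n f(x) \rfloor / 2^n,\, n)$.

```python
import math

def dyadic_approx(f, n, x):
    """Value at x of the n-th approximation f_n from the construction above (d = 1)."""
    if abs(x) > n:
        return 0.0                      # outside the support [-n, n]
    # largest multiple of 2^-n that is <= f(x), then capped at n
    return min(math.floor(f(x) * 2 ** n) / 2 ** n, n)

def f(x):
    return x * x   # assumed nonnegative test function

# Sample values of f_n(1.3) for n = 1, ..., 8; they increase towards f(1.3).
values = [dyadic_approx(f, n, 1.3) for n in range(1, 9)]
```

Once $n$ exceeds both $|x|$ and $f(x)$, the error $f(x) - f_n(x)$ is below $2^{-n}$, which is the source of the uniform convergence for bounded $f$ used later.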

It is worth remarking that by property (b) of Theorem 10.2, measurability can be viewed
as a massive generalization of continuity. Recall that a function $f$ is continuous if and only
if whenever $S$ is open we have $f^{-1}(S)$ open. In property (b), we ask merely that $f^{-1}(S)$
be Lebesgue measurable, a much weaker demand.
Notice also that since preimages are stable under unions, intersections, and complements,
property (b) implies that if $S$ is Borel then $f^{-1}(S)$ will be measurable too. But if $S$
is merely measurable, there is no guarantee that $f^{-1}(S)$ will be measurable! To see this
consider a function $f$ which is a bijection between $[0, 1]$ and a null set. For example one
can map $[0, 1]$ into the Cantor set $C$ injectively almost everywhere by operating on binary
and ternary expansions as follows:
\[
0.b_1 b_2 b_3 \cdots \ (\text{base } 2) \ \mapsto\ 0.(2b_1)(2b_2)(2b_3) \cdots \ (\text{base } 3)
\]
Now if $N$ is a Lebesgue nonmeasurable subset of $[0, 1]$, we have that $S = f(N)$ is null but
the preimage $f^{-1}(S)$, which differs from $N$ by at most a null set, is non-measurable.
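The digit map into the Cantor set can be tried on finite binary expansions. The following is a sketch under the assumption that only finitely many digits are given (so the remaining digits are zero); the names `cantor_embed` and `avoids_middle_thirds` are hypothetical helpers introduced for the illustration.

```python
from fractions import Fraction
from itertools import product

def cantor_embed(bits):
    """Send 0.b1 b2 b3... (base 2) to 0.(2b1)(2b2)(2b3)... (base 3), for a finite digit list."""
    return sum(Fraction(2 * b, 3 ** (k + 1)) for k, b in enumerate(bits))

def avoids_middle_thirds(y, depth):
    """Check that the first `depth` ternary digits of y in [0, 1) are all 0 or 2."""
    for _ in range(depth):
        y *= 3
        digit = int(y)      # next ternary digit
        if digit == 1:
            return False
        y -= digit
    return True

# Images of all 3-digit binary expansions.
images = [cantor_embed(bits) for bits in product([0, 1], repeat=3)]
```

Distinct digit strings have distinct images, and every image has only the ternary digits 0 and 2, so it lies in the Cantor set.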
To close the section, we extend the definition of measurable function from nonnegative
functions to complex-valued functions in the following way. Recall that if $f$ is
a real-valued function, then we can define its positive and negative parts:
\[
f^+ = \max(f, 0), \qquad f^- = \max(-f, 0)
\]
We then have that $f^+$ and $f^-$ are nonnegative functions with $f = f^+ - f^-$.

10.3. DEFINITION. If $f$ is an almost-everywhere defined complex-valued function on
$\mathbb{R}^d$ then $f$ is a measurable function if and only if the positive and negative parts of its real
and imaginary parts are measurable functions.

We note that the above definition is equivalent to the alternate approach of simply
replacing nonnegative simple functions with complex-valued simple functions in the
definition of measurable function.

10.4. LEMMA. If $f$ is a complex-valued function on $\mathbb{R}^d$ then $f$ is measurable if and only if $f$
is a pointwise limit of complex-valued simple functions.

E XERCISE 10.1. Show that lim xn = x if and only if lim inf xn = x = lim sup xn .

E XERCISE 10.2 (Tao, Ex 1.3.3). . . .

EXERCISE 10.3 (Tao, Ex 1.3.4). Show that if $f$ is a bounded, nonnegative measurable
function on $\mathbb{R}^d$, then there is a sequence of bounded simple functions $f_n$ which converges
uniformly to $f$ (not just pointwise).

EXERCISE 10.4 (Tao, Ex 1.3.5). Let $f$ be a nonnegative function on $\mathbb{R}^d$. Show that $f$ is
simple if and only if $f$ is measurable and takes on at most finitely many values.

EXERCISE 10.5 (Tao, Ex 1.3.6). If $f$ is a nonnegative measurable function, show that
the region under $f$ is a measurable set.
11. LEBESGUE INTEGRATION OF NONNEGATIVE FUNCTIONS 34

§11. Lebesgue integration of nonnegative functions

R EADING . Tao, §1.3.3

Previously we defined the simple functions, showed that they can be integrated in an
obvious way, and showed that this integral satisfies basic desirable properties such as
additivity. Next we will define the lower integral for an arbitrary nonnegative function using
approximations by simple integrals. After establishing some basic properties of the lower
integral, we will see that it behaves very well when applied to measurable functions, and
in that case we will simply call it the Lebesgue integral.

11.1. DEFINITION. Let $f$ be a nonnegative function on $\mathbb{R}^d$. We define the lower Lebesgue
integral of $f$ by
\[
\underline{\int} f = \sup\left\{ s\int g \;\middle|\; g \le f,\ g \text{ nonnegative simple} \right\}
\]
If $f$ is measurable, we define the Lebesgue integral of $f$ to be $\int f = \underline{\int} f$.

Before investigating the Lebesgue integral itself, we will describe several properties
of the lower Lebesgue integral. While it is also possible to define the upper Lebesgue
integral, it is of more limited use than in the Riemann case. Later we will define the upper
Lebesgue integral just for bounded functions with bounded support. In general there are
functions which are measurable and should have finite integral, that do have the correct
lower Lebesgue integral, but do not have a finite upper Lebesgue integral.
It is clear that the lower Lebesgue integral agrees with the simple integral on the sim-
ple functions. It also inherits the equivalence and monotonicity properties from the simple
integral, but not linearity. Recall that $m^*$ was merely subadditive; this is essentially
because it was defined as an infimum. On the other hand the lower Lebesgue integral will
be superadditive; this is essentially because it is defined as a supremum.

11.2. PROPOSITION. The lower Lebesgue integral satisfies the properties:
(a) (equivalence) if $f = g$ almost everywhere then $\underline{\int} f = \underline{\int} g$;
(b) (monotonicity) if $f \le g$ almost everywhere then $\underline{\int} f \le \underline{\int} g$; and
(c) (superadditivity) $\underline{\int} (f + g) \ge \underline{\int} f + \underline{\int} g$.

PROOF. The equivalence and monotonicity properties are clear from the analogous
properties of simple integrals. For superadditivity, let $\varepsilon > 0$ be given and find simple
functions $h$ and $k$ such that $h \le f$, $k \le g$, $\underline{\int} f - s\int h < \varepsilon$, and $\underline{\int} g - s\int k < \varepsilon$. Then we have
$h + k \le f + g$, and using monotonicity plus additivity for simple integrals:
\[
\begin{aligned}
\underline{\int} (f + g) &\ge s\int (h + k) \\
&= s\int h + s\int k \\
&> \underline{\int} f + \underline{\int} g - 2\varepsilon
\end{aligned}
\]
Letting $\varepsilon \to 0$, we obtain the desired result.


The following fundamental pair of results establish that arbitrary functions can be
approximated well by bounded functions with bounded support. This will be very useful
since the Lebesgue integral is much easier to work with in this case.

11.3. LEMMA. Let $f$ be a nonnegative function on $\mathbb{R}^d$. The lower Lebesgue integral satisfies
the following identities.
(a) (range truncation) If $f_N = \min(f, N)$ then $\underline{\int} f_N \to \underline{\int} f$.
(b) (support truncation) If $f_N = f \chi_{[-N,N]^d}$ then $\underline{\int} f_N \to \underline{\int} f$.

PROOF. (a) Let us first assume that $\underline{\int} f < \infty$. Given $\varepsilon > 0$ we can find a simple
function $g$ such that $g \le f$ and $\underline{\int} f - s\int g < \varepsilon$. By our assumption $g$ must be bounded
almost everywhere, which implies that for $N$ large enough we have $g \le f_N$ almost everywhere too.
Now by monotonicity $\underline{\int} f - \underline{\int} f_N \le \underline{\int} f - s\int g < \varepsilon$, which shows the desired result.
The argument is similar in the case $\underline{\int} f = \infty$.
(b) Again let $\varepsilon$ be given and find a simple function $g$ such that $g \le f$ and $\underline{\int} f - s\int g < \varepsilon$.
Write $g = \sum_{i=1}^{k} c_i \chi_{A_i}$ and $g_N = g \chi_{[-N,N]^d}$. Now we look at the simple integral of $g_N$:
\[
s\int g_N = \sum_{i=1}^{k} c_i\, m(A_i \cap [-N, N]^d) \ \to\ \sum_{i=1}^{k} c_i\, m(A_i) = s\int g
\]
Here $N \to \infty$ and we are applying the upwards monotone convergence theorem.
Thus we can find $N$ large enough that $s\int g - s\int g_N < \varepsilon$. Again using monotonicity we
conclude that $\underline{\int} f - \underline{\int} f_N \le \underline{\int} f - s\int g_N < 2\varepsilon$.
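As a hedged illustration of range truncation (not taken from the notes), consider the assumed example $f(x) = x^{-1/2}\chi_{(0,1]}(x)$ on $\mathbb{R}$, and suppose its integral may be computed by ordinary calculus. For $N \ge 1$ the truncation $f_N = \min(f, N)$ equals $N$ on $(0, N^{-2}]$ and equals $f$ on $(N^{-2}, 1]$, so the integrals of the truncations increase to the integral of $f$:

```latex
\int f_N
  = N \cdot N^{-2} + \int_{N^{-2}}^{1} x^{-1/2}\,dx
  = \frac{1}{N} + 2\left(1 - \frac{1}{N}\right)
  = 2 - \frac{1}{N}
  \;\longrightarrow\; 2 = \int f
  \qquad (N \to \infty).
```

The unbounded function $f$ is therefore recovered, at the level of integrals, from the bounded functions $f_N$.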
We are now ready to show that the Lebesgue integral behaves well when applied to
measurable functions.
11.4. THEOREM. If $f, g$ are nonnegative measurable functions, then $\int (f + g) = \int f + \int g$.

PROOF. First suppose that $f, g$ are bounded functions with bounded supports. For
such functions, it is useful to define the upper Lebesgue integral in the obvious way:
\[
\overline{\int} f = \inf\left\{ s\int h \;\middle|\; f \le h,\ h \text{ simple} \right\}
\]
We claim that under our hypotheses, we in fact have $\underline{\int} f = \overline{\int} f$ (and similarly for $g$ and
$f + g$).
To see this recall that since $f$ is measurable, we can find simple functions $f_n$ such that
$f_n \le f_{n+1}$ and $f_n \to f$. Note also that since $f$ is bounded, the construction of the $f_n$ from
Theorem 10.2 in fact showed that the $f_n$ converge uniformly to $f$. Thus given $\varepsilon > 0$ we can
find $n$ such that
\[
f \ge f_n \ge f - \varepsilon \chi_S
\]
where $S$ is a support for $f$. Taking $\underline{\int}$ of the first inequality and $\overline{\int}$ of the second, we obtain
\[
\underline{\int} f \ \ge\ s\int f_n \ \ge\ \overline{\int} (f - \varepsilon \chi_S) = \overline{\int} f - \varepsilon\, m(S)
\]
Letting $\varepsilon \to 0$ we obtain the desired result.
Now we have already shown that $\underline{\int}$ satisfies the superadditivity property. Using a
parallel argument, it is easy to show that $\overline{\int}$ satisfies the analogous subadditivity property.
And since $\underline{\int} = \overline{\int}$ on the functions $f$, $g$, and $f + g$, we can put the two together to conclude
the additivity property!
Finally for general measurable functions $f$ and $g$, we can always apply the truncation
lemmas to replace $f$ and $g$ with bounded functions with bounded supports. Each truncation
costs us an $\varepsilon$ in the additivity property, but afterwards we can let $\varepsilon \to 0$ and obtain
the desired result.
Later we will show that the Lebesgue integral even satisfies countable additivity for
nonnegative measurable functions.

EXERCISE 11.1 (Tao, ex 1.3.13). Let $f$ be a nonnegative measurable function on $\mathbb{R}$.
Show that $\int f$ is equal to the 2-dimensional Lebesgue measure of the region
\[
\{\, (x, y) \mid 0 \le y \le f(x) \,\}
\]

EXERCISE 11.2 (Tao, ex 1.3.18). Let $f$ be a nonnegative measurable function on $\mathbb{R}^d$.
(a) Show that if $\int f < \infty$ then $f$ is finite almost everywhere. Give a counterexample
to show that the converse is false.
(b) Show that $\int f = 0$ if and only if $f = 0$ almost everywhere.

EXERCISE 11.3. Give an example of a nonnegative function which is measurable, but
has different lower and upper Lebesgue integrals.
12. LEBESGUE INTEGRATION 37

§12. Lebesgue integration

R EADING . Tao, §1.3.4

We have now defined and explored the Lebesgue integral for nonnegative functions.
As previously explained, we will now proceed to extend this definition to signed and
even complex-valued functions. Some care will of course be needed; to see this consider
what would happen when trying to find the integral of sin( x ) or of 1/x over the whole
real line! In order to proceed, we will provide an assumption which guarantees that such
issues will not occur.

12.1. DEFINITION. Let $f$ be a complex-valued measurable function on $\mathbb{R}^d$ (it need only
be defined almost everywhere). We say that $f$ is absolutely integrable, or a member of $L^1$, if
$\int |f| < \infty$.

We can now define the Lebesgue integral for absolutely integrable functions f , by
using the real, imaginary, positive, and negative parts.

12.2. DEFINITION. If $f$ is absolutely integrable and real-valued, let $\int f = \int f^+ - \int f^-$.
If $f$ is absolutely integrable and complex-valued, let $\int f = \int \Re f + i \int \Im f$.

Since $f^+, f^- \le |f|$, the hypothesis of absolute integrability means that $\int f$ will not be
an indeterminate expression. The definition of $\int f$ agrees with the nonnegative integral
when both are defined.
Functions which are not absolutely integrable include $\sin(x)$ and $1/x$, defined on all
of $\mathbb{R}$. Another interesting example is the function $f = \sum_{n \ge 1} \frac{(-1)^n}{n} \chi_{[n,n+1)}$. In some sense this
integral “should” have value $\sum (-1)^n / n$, which is a convergent series. One may wish to
define an improper integral which works for $f$, but it is difficult to do so without sacrificing
some of the properties of the Lebesgue integral.
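The step function built from $(-1)^n / n$ can be explored numerically. The sketch below is an illustration with hypothetical helper names; it relies on two standard facts that are not proved in the notes, namely that the alternating series converges to $-\ln 2$ and that the harmonic series diverges.

```python
import math

def alternating_partial_sum(N):
    """Partial sum of (-1)^n / n for n = 1..N: the value the integral 'should' approach."""
    return sum((-1) ** n / n for n in range(1, N + 1))

def absolute_partial_sum(N):
    """Partial sum of 1/n for n = 1..N: the integral of |f| over [1, N + 1)."""
    return sum(1 / n for n in range(1, N + 1))

signed = alternating_partial_sum(10_000)
absolute = absolute_partial_sum(10_000)
```

The signed sums settle near $-\ln 2$, while the absolute sums keep growing without bound, which is exactly why $f$ fails to be absolutely integrable.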
Next, we consider some of the elementary properties of the absolutely convergent
integral. Our first result states that the real-valued version of the absolutely convergent
Lebesgue integral agrees with and extends the Riemann integral.

12.3. PROPOSITION. Let $f : [a, b] \to \mathbb{R}$ be a Riemann integrable function. Then interpreting
$f$ as a function defined on all of $\mathbb{R}$ which is zero outside of $[a, b]$, we have that $f$ is Lebesgue
absolutely integrable with the same value.

P ROOF. Let us first assume that f is nonnegative. Then since pc functions are simple,
we clearly have that the lower Darboux integral of f is less than or equal to the lower
Lebesgue integral of f . On the other hand by monotonicity the lower Lebesgue integral
of f is less than or equal to the upper Darboux integral of f . Since the lower and upper
Darboux integrals are equal, we must have that it agrees with the Lebesgue integral of f .
If $f$ is real-valued, then we can write $\int f = \int f^+ - \int f^-$, and this expression is valid
for both the Darboux and absolutely convergent Lebesgue integrals. Applying the previous
argument to both $f^+$ and $f^-$, we have the desired result.

Next, the absolutely convergent Lebesgue integral of course inherits many of the properties
of the nonnegative integral, and has some new ones too.

12.4. PROPOSITION. Let $f, g$ be absolutely integrable and let $c$ be a complex number.
(a) (linearity and conjugation) $\int (f + g) = \int f + \int g$, $\int cf = c \int f$, and $\int \bar{f} = \overline{\int f}$;
(b) (triangle inequality for integrals) $\left| \int f \right| \le \int |f|$.

PROOF. We should first check that if $f, g$ are absolutely integrable, then $f + g$ and $cf$
are absolutely integrable. For the first we can simply use the classical triangle inequality
$|f + g| \le |f| + |g|$. Then monotonicity implies that $\int |f + g| \le \int |f| + \int |g|$ is finite. For
the second, simply note that $\int |cf| = \int |c||f| = |c| \int |f|$ is again finite.
Now linearity is easily proved using the analogous property for the nonnegative integral.
For example, if $f, g$ are real-valued and $h = f + g$, then $h^+ - h^- = f^+ - f^- + g^+ - g^-$.
Rearranging terms we have $f^- + g^- + h^+ = f^+ + g^+ + h^-$. From the nonnegative
linearity we know $\int f^- + \int g^- + \int h^+ = \int f^+ + \int g^+ + \int h^-$. Rearranging back, we obtain
$\int h = \int f + \int g$.
For the triangle inequality for integrals, first assume that $f$ is real-valued. Then we
can write $f = f^+ - f^-$ and $|f| = f^+ + f^-$. Thus using linearity together with the triangle
inequality, we have
\[
\begin{aligned}
\left| \int f \right| &= \left| \int f^+ - \int f^- \right| \\
&\le \int f^+ + \int f^- \\
&= \int |f|
\end{aligned}
\]
Next if $f$ is complex-valued we can find an angle $\theta$ such that $e^{i\theta} \int f = \left| \int f \right|$. Again using
linearity, together with the definition of the complex-valued integral, we have
\[
\begin{aligned}
\left| \int f \right| &= \Re \left| \int f \right| \\
&= \Re\, e^{i\theta} \int f \\
&= \int \Re\, (e^{i\theta} f) \\
&\le \int |f|
\end{aligned}
\]
The last inequality follows from monotonicity, and gives the desired result.
One of the most appreciable aspects of the theory of integration is that the class of
absolutely integrable functions forms a vector space. Although the space is of course
infinite dimensional, it has a substantial amount of structure! Recall that a nonnegative
function $\| \cdot \|$ defined on a vector space is called a seminorm if $\|v + w\| \le \|v\| + \|w\|$, and
$\|cv\| = |c| \|v\|$.

12.5. PROPOSITION. The collection of absolutely integrable functions forms a vector space.
In fact it is a seminormed vector space with the seminorm $\|f\| = \int |f|$.

PROOF. We argued in the previous proof that $\int |f + g| \le \int |f| + \int |g|$ and also that
$\int |cf| = |c| \int |f|$. These two facts imply that the space of absolutely integrable functions
is closed under linear combinations and moreover that the two properties of the
seminorm hold.
A seminorm is called a norm if it additionally satisfies $\|v\| = 0 \implies v = 0$. If one is
willing to identify functions $f, g$ which agree almost everywhere as being equal, then the
seminorm $\|f\| = \int |f|$ becomes a true norm. Indeed, it is an exercise to check that if $\int |f| = 0$
then $f = 0$ almost everywhere, and thus in this sense $f = 0$.
While the vector space of absolutely integrable functions is infinite dimensional, the
next result shows that it is not too unwieldy topologically. Recall that a subset D of a
(semi-)normed vector space $V$ is dense if for every $v \in V$ and every $\varepsilon > 0$ there exists $d \in D$
such that $\|v - d\| < \varepsilon$. In other words, $D$ is dense if every element of $V$ can be
approximated by elements of $D$.

12.6. T HEOREM . The following are all dense subsets of the space of absolutely integrable
functions.
(a) absolutely integrable simple functions;
(b) absolutely integrable simple functions $\sum_{i=1}^{k} c_i \chi_{B_i}$ where $B_i$ are all boxes; and
(c) continuous, compactly supported functions.

PROOF. (a) First assume that $f$ is nonnegative. Then by the definition of the integral,
we can find a simple function $g$ such that $g \le f$ and $\int f - \int g < \varepsilon$. It follows that
$\int (f - g) < \varepsilon$, and since $f - g$ is nonnegative, clearly $\int |f - g| < \varepsilon$. It is easy to extend
this argument to complex-valued functions using the standard technique.
(b) We now know from (a) that it is sufficient to approximate any simple function by
a function of this type. Using linearity, it is enough to approximate a single term $\chi_A$, with
$m(A) < \infty$, by a function of this type. We have already seen that for any such $A$ there
exists an elementary set $E$ such that $m(A \,\triangle\, E) < \varepsilon$. This means that $\int |\chi_A - \chi_E| < \varepsilon$, so
the result follows.
(c) We now know from (b) that it is sufficient to approximate any $\chi_B$, $B$ a box, by a
function of this type. It is possible to do this explicitly. For example in one dimension
we have $B = I$ is an interval, and the step function $\chi_I$ can easily be approximated by a
continuous function which looks like a trapezoid.
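One way to make the trapezoid explicit is the following sketch; the particular formula for $g_\delta$ is an assumption chosen for illustration, not necessarily the function the author had in mind. For $I = [a, b]$ and a ramp width $\delta > 0$, let $g_\delta$ equal 1 on $I$, vanish outside $[a - \delta, b + \delta]$, and be linear in between. The difference $g_\delta - \chi_I$ is then supported on two triangles of base $\delta$ and height 1:

```latex
g_\delta(x) = \max\!\left(0,\; 1 - \frac{\operatorname{dist}(x, [a,b])}{\delta}\right),
\qquad
\int \left| \chi_{[a,b]} - g_\delta \right|
  = \underbrace{\tfrac{1}{2}\,\delta}_{\text{left ramp}}
  + \underbrace{\tfrac{1}{2}\,\delta}_{\text{right ramp}}
  = \delta .
```

Choosing $\delta < \varepsilon$ gives a continuous, compactly supported function within $\varepsilon$ of $\chi_I$ in the seminorm $\|f\| = \int |f|$.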
We will see later that this density result fits in with several results which loosely state
that integrable functions are “almost continuous.”

EXERCISE 12.1 (Tao, ex 1.3.25(i)). Let $f$ be absolutely integrable. Show that for any
$\varepsilon > 0$ there exists a bounded measurable set $A$ such that $\int |f| \chi_{A^c} < \varepsilon$.

EXERCISE 12.2 (Tao, ex 1.3.25(ii)). Let $f$ be a nonnegative measurable function, and
assume $f$ is finite almost everywhere. Show that for any $\varepsilon > 0$, there exists a measurable
set $A$ such that $m(A) < \varepsilon$ and $f$ is locally bounded outside of $A$. In other words, for every
$n$ there exists $M$ such that for all $x \in [-n, n]^d \setminus A$ we have $f(x) \le M$.

E XERCISE 12.3. Show that the space of absolutely integrable functions is separable,
that is, has a countable dense subset.
13. CONVERGENCE THEOREMS 41

§13. Convergence theorems

R EADING . Tao, §1.4.5, though assume all functions are defined on Rd .

In the previous section we have seen that the Lebesgue integral satisfies all of the key
properties that the Riemann integral does, while at the same time being able to integrate
many more functions. But given that the Lebesgue measure enjoys much stronger proper-
ties than the Jordan measure does, it is natural to ask whether the Lebesgue integral does
too.
In order to find such strong properties, a good test question to ask is whether $f_n \to f$
implies $\int f_n \to \int f$. We saw that in the case of the Riemann integral, this does hold if $f_n, f$
are all defined on an interval and $f_n \to f$ uniformly. In the case of the Lebesgue integral,
the same proof shows that it works when $f_n, f$ are all supported on a common set of finite
measure, and $f_n \to f$ uniformly.
But without these special hypotheses, such a convergence theorem can fail. So before
looking for situations where it does hold, let us examine some of the examples where it
does not.

13.1. EXAMPLE (Domain escape to infinity). Let $f_n = \chi_{[n,n+1]}$ and $f = 0$. That is, $f_n$ is
a sequence of moving unit bumps. Then $f_n \to f$ pointwise (not uniformly), but we have
$\int f_n = 1$ for all $n$, and $\int f = 0$.

13.2. EXAMPLE (Support escape to infinity). Let $f_n = \frac{1}{n} \chi_{[0,n]}$ and $f = 0$. That is, $f_n$ is a
sequence of widening and shortening bumps. Then $f_n \to f$ uniformly, but once again we
have $\int f_n = 1$ for all $n$, and $\int f = 0$.

13.3. EXAMPLE (Range escape to infinity). Let $f_n = n \chi_{[1/n,2/n]}$ and $f = 0$. That is, $f_n$
is a sequence of increasingly narrow and tall bumps. Then $f_n \to f$ pointwise (not uniformly),
but once again we have $\int f_n = 1$ for all $n$, and $\int f = 0$.
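The three bump families can be checked with exact arithmetic. The sketch below is an illustration only: each function is represented as a height times the indicator of an interval, so its integral is simply height times width, and the helper names (`bump`, `moving`, `widening`, `narrowing`) are hypothetical.

```python
from fractions import Fraction

def bump(height, left, right):
    """Return (function, integral) for height * indicator of [left, right]."""
    def f(x):
        return height if left <= x <= right else 0
    return f, height * (right - left)

def moving(n):      # Example 13.1: chi_[n, n+1]
    return bump(1, n, n + 1)

def widening(n):    # Example 13.2: (1/n) chi_[0, n]
    return bump(Fraction(1, n), 0, n)

def narrowing(n):   # Example 13.3: n chi_[1/n, 2/n]
    return bump(n, Fraction(1, n), Fraction(2, n))

x = Fraction(1, 3)   # a fixed sample point
integrals = [fam(n)[1] for fam in (moving, widening, narrowing) for n in range(1, 51)]
```

Every integral equals 1, yet at the fixed point `x` the values tend to 0 in all three families, so a unit of mass disappears in the limit each time.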

One should observe that in all three of our examples where a convergence theorem
fails, mass was destroyed in the limit. In particular, we do not have an example where
new mass is created in the limit. The next result states that mass can only be destroyed,
and never created.

13.4. THEOREM (Fatou's lemma). Let $f_n$ be nonnegative measurable functions. Then
\[
\int \liminf f_n \le \liminf \int f_n
\]
In particular, if $f_n \to f$ then we have $\int f \le \liminf \int f_n$.

We will prove Fatou’s lemma shortly, but first we will use it to prove the dominated
convergence theorem. The dominated convergence theorem essentially states that so long
as we can close off the avenues through which the mass of a region can escape to infinity,
then we will have a convergence theorem.

13.5. THEOREM (Dominated convergence theorem). Let $f_n$ be a sequence of measurable
complex-valued functions and suppose that $f_n \to f$ pointwise. Suppose that there exists a nonnegative
measurable function $G$ such that $\int |G| < \infty$ and for all $n$, we have $|f_n| \le G$. Then $\int f_n \to \int f$.

Intuitively speaking, the function G acts as an umbrella under which the convergence
f n → f occurs. The assumption that the umbrella covers just a finite amount of area
guarantees that mass cannot escape! The following proof will be carried out under the
assumption that Fatou’s lemma is true.
PROOF OF THE DOMINATED CONVERGENCE THEOREM. By separating $f_n$ and $f$ into
their real and imaginary parts, we may assume that they are all real-valued. Thus our
hypothesis says that $-G \le f_n \le G$. Since $f_n \to f$ we also have that $-G \le f \le G$. Now
$f_n + G$ is nonnegative, so we can use Fatou's lemma to obtain:
\[
\int (f + G) \le \liminf \int (f_n + G)
\]
Similarly $G - f_n$ is nonnegative so we can again use Fatou's lemma to obtain:
\[
\int (G - f) \le \liminf \int (G - f_n)
\]
Since $\int G$ is finite, we may subtract it from both sides of each inequality. Putting the two
inequalities together, we conclude that
\[
\limsup \int f_n \le \int f \le \liminf \int f_n
\]
It follows that $\int f = \lim \int f_n$.
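A hedged pair of worked examples (assumptions chosen for illustration, with integrals computed by ordinary calculus) shows the role of the dominating function $G$. In the first, $f_n(x) = x^n \chi_{[0,1]}(x)$ converges to $0$ almost everywhere under the umbrella $G = \chi_{[0,1]}$. In the second, the narrowing bumps $f_n = n\chi_{[1/n,2/n]}$ of Example 13.3 admit no integrable umbrella at all:

```latex
% Dominated case: f_n(x) = x^n \chi_{[0,1]}(x), with G = \chi_{[0,1]}.
|f_n| \le G, \qquad \int G = 1 < \infty, \qquad
\int f_n = \frac{1}{n+1} \;\longrightarrow\; 0 = \int f .

% Undominated case: any G with G \ge f_n for all n satisfies
% G \ge n on (2/(n+1), 2/n] \subset [1/n, 2/n], and these intervals are disjoint, so
\int G \;\ge\; \sum_{n \ge 1} n \left( \frac{2}{n} - \frac{2}{n+1} \right)
       \;=\; \sum_{n \ge 1} \frac{2}{n+1} \;=\; \infty .
```

This is consistent with the theorem: the conclusion fails for the narrowing bumps precisely because the hypothesis $\int |G| < \infty$ cannot be met.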
We still have to give a proof of Fatou’s lemma. As a first approximation, we will
examine what happens in the special case that the functions f n are nonnegative and the
sequence of f n ’s is monotone increasing in n.

13.6. THEOREM (Monotone convergence theorem). Let $f_n$ be a sequence of nonnegative
measurable functions, and suppose that $f_n \le f_{n+1}$ for all $n$. Let $f = \sup f_n$, so that we
automatically have $f_n \to f$. Then $\int f_n \to \int f$.

PROOF. We first observe that if the $f_n$ were all characteristic functions, then this theorem
would follow directly from the upwards monotone convergence theorem for Lebesgue
measurable sets. Thus our strategy is to reduce to a situation in which we can apply the
upwards monotone convergence theorem.
First, since $f_n \le f_{n+1}$, by the monotonicity of the integral we know that $\int f_n \le \int f_{n+1}$.
Similarly since $f_n \le f$ we know that $\int f_n \le \int f$. It follows that the values $\int f_n$ converge
and that $\lim \int f_n \le \int f$.
To show that $\int f \le \lim \int f_n$, it is sufficient to show that $\int g \le \lim \int f_n$ for any simple
function $g$ such that $g \le f$. Given such a $g$, we can express it as $g = \sum c_i \chi_{A_i}$ where the
$A_i$ are disjoint measurable sets. Using the range truncation lemma, we can suppose that
$c_i \ne \infty$ for all $i$, and by discarding terms we can also suppose that $c_i > 0$.
Now fix just one of the sets $A_i$ and let $\varepsilon > 0$ be given. We define the sets
\[
A_{i,n} = \{\, x \in A_i \mid f_n(x) > (1 - \varepsilon) c_i \,\}
\]
About this definition, we first note that it immediately implies $f_n \ge \sum_i (1 - \varepsilon) c_i \chi_{A_{i,n}}$. We
second note that for all $x \in A_i$, for $n$ large enough, we have $f_n(x) > (1 - \varepsilon) g(x) = (1 - \varepsilon) c_i$,
because $f_n(x)$ increases to $f(x) \ge c_i$. This means that $A_i$ is the increasing union of the $A_{i,n}$.
Thus the upwards monotone convergence
theorem for sets implies that $m(A_{i,n}) \to m(A_i)$ as $n \to \infty$.
Putting this all together, we have
\[
\begin{aligned}
\lim \int f_n &\ge \lim \int \sum (1 - \varepsilon) c_i \chi_{A_{i,n}} \\
&= \lim \sum (1 - \varepsilon) c_i\, m(A_{i,n}) \\
&= \sum (1 - \varepsilon) c_i\, m(A_i) \\
&= (1 - \varepsilon) \int g
\end{aligned}
\]
Taking $\varepsilon \to 0$ we therefore conclude that $\lim \int f_n \ge \int g$, as desired.
It is worth remarking that the monotone convergence theorem can fail for signed functions
$f_n$. As a silly example, if $f_n = -1/n$ then $f_n \to 0$, $\int f_n = -\infty$ but $\int f = 0$. The
monotone convergence theorem has the following easy but important consequence that
the nonnegative Lebesgue integral is countably linear, not just finitely linear!

13.7. COROLLARY (Tonelli's theorem). If $f_n$ is a sequence of nonnegative measurable
functions, then $\int \sum f_n = \sum \int f_n$.

PROOF. Using the monotone convergence theorem and then finite linearity, we have:
\[
\begin{aligned}
\int \sum f_n &= \int \lim_N \sum_{n=1}^{N} f_n \\
&= \lim_N \int \sum_{n=1}^{N} f_n \\
&= \lim_N \sum_{n=1}^{N} \int f_n \\
&= \sum \int f_n
\end{aligned}
\]
as desired.
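Countable linearity can be illustrated with an assumed example that is not in the notes: $f_n(x) = x^n (1 - x)$ on $[0, 1]$ for $n \ge 0$. By the geometric series, $\sum_n f_n = 1$ on $[0, 1)$, so $\int \sum f_n = 1$. The sketch below sums the individual integrals exactly, using the calculus value $\int_0^1 x^n\,dx = 1/(n+1)$; the function names are hypothetical.

```python
from fractions import Fraction

def integral_of_term(n):
    """Integral over [0, 1] of f_n(x) = x^n (1 - x), computed as 1/(n+1) - 1/(n+2)."""
    return Fraction(1, n + 1) - Fraction(1, n + 2)

def partial_sum_of_integrals(N):
    """Sum of the integrals of f_0, ..., f_N (a telescoping sum)."""
    return sum(integral_of_term(n) for n in range(N + 1))

partials = [partial_sum_of_integrals(N) for N in (0, 10, 1000)]
```

The partial sums equal $1 - 1/(N+2)$ and increase to 1, matching the integral of the sum as the corollary predicts.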
Next we use the very special monotone convergence theorem to establish Fatou's
lemma, which we needed in the proof of the dominated convergence theorem.
PROOF OF FATOU'S LEMMA. We will need the general fact that $\int \inf_n g_n \le \inf_n \int g_n$.
This holds simply because $\int \inf_n g_n \le \int g_m$ for any particular $m$, and then one can take the
infimum over $m$ on the right-hand side.
Now recall that $\liminf f_n = \lim_N \inf_{n \ge N} f_n$. The functions $\inf_{n \ge N} f_n$ are increasing in
$N$, so using the monotone convergence theorem together with the above we have
\[
\begin{aligned}
\int \liminf f_n &= \int \lim_N \inf_{n \ge N} f_n \\
&= \lim_N \int \inf_{n \ge N} f_n \\
&\le \lim_N \inf_{n \ge N} \int f_n \\
&= \liminf \int f_n
\end{aligned}
\]
as desired.

E XERCISE 13.1 (Tao, ex 1.44, 1.45). (a) Let An be measurable sets and assume that
∑ m( An ) < ∞. Show that almost every x ∈ Rd is contained in at most finitely
many of the An . [Hint: Use Tonelli’s theorem on the functions χ An .] This is the
Borel–Cantelli lemma.
(b) Give a counterexample to the above conclusion, showing that the hypothesis
∑ m( An ) < ∞ cannot be replaced by the weaker condition lim m( An ) = 0.

EXERCISE 13.2. Use the dominated convergence theorem to show that the harmonic
series $\sum \frac{1}{n}$ diverges. [Hint: Let $f_n = \frac{1}{n} \chi_{[0,n]}$, show that $\sum \frac{1}{n} < \infty$ plus the dominated
convergence theorem implies $\int f_n \to 0$, and obtain a contradiction from this.]
14. ABSTRACT MEASURE THEORY 45

§14. Abstract measure theory

R EADING . Tao, §1.4.1–1.4.3

The Lebesgue measure and integration theory that we have developed can be re-
garded as a model for an abstract concept of a measure and integral. The situation is
very similar to other areas of mathematics. Consider the following examples of a concrete
and abstract concept:
◦ the space $\mathbb{R}^d$ with its distance measurement $\|x - y\|$ is a model for the definition
of metric space;
◦ the space Rd with its family of open sets is a model for the definition of topological
space;
◦ the spaces R or C with their addition and multiplication operations are models
for the definition of field;
◦ the space $\mathbb{R}^d$ with its $\mathbb{R}$-linear combinations is a model for the definition of vector
space.
An abstract measure function µ on a space X will be one which satisfies the most
fundamental properties that we have worked to prove for the Lebesgue measure. Of
course the Lebesgue measure m can only be applied to the measurable subsets of R^d, so
we should only expect to be able to apply an abstract measure µ to a subcollection of the
subsets of X. Thus we make the following definition of an abstract space to take the place
of R^d and its measurable sets.

14.1. DEFINITION. A measurable space is a pair (X, B) where X is any set and B is a
Boolean σ-algebra on X: a collection of subsets of X which contains the sets ∅ and X, and
is closed under countable intersections, countable unions, and complements.

Thus if we let L denote the Lebesgue measurable subsets of R^d, we have that (R^d, L)
is a measurable space. For another example, if X is any set then we can always let F be
the collection of all countable or co-countable subsets of X, and then (X, F) is a measurable
space. Finally if X is any topological space then it can be viewed as a measurable space by
taking its σ-algebra to be the collection of Borel subsets of X: a set is said to be Borel if it
is constructible from the open sets using countable intersections, countable unions, and
complements.

14.2. DEFINITION. Suppose X is a set and B is a σ-algebra on X. A function µ : B →
[0, ∞] is said to be a measure if it satisfies
(a) (empty set) µ(∅) = 0; and
(b) (countable additivity) for every sequence of pairwise disjoint sets A_n ∈ B, we
have µ(⋃ A_n) = ∑ µ(A_n).

Thus the Lebesgue measure m on (R^d, L) is an example of a measure. For another
example, the cardinality measure µ(A) = |A| is a measure on the measurable space
(X, P(X)). Another very simple example is a Dirac measure on (X, P(X)): if x ∈ X we
can define µ_x(A) = 1 if x ∈ A and µ_x(A) = 0 if x ∉ A. Finally it is worth noting that given
several measures µ_i on the same measurable space (X, B), we can form new measures using
linear combinations with nonnegative coefficients. That is, for c_i ≥ 0 the measure
µ = ∑ c_i µ_i is defined by µ(A) = ∑ c_i µ_i(A).
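The cardinality measure, the Dirac measure, and nonnegative linear combinations can be tried out on a small finite set. The following is a minimal sketch, not part of the source notes: the function names (`counting_measure`, `dirac_measure`, `combine`, `is_additive`) are hypothetical, and on a finite set countable additivity reduces to the finite additivity that the check below tests.

```python
from itertools import combinations


def counting_measure(A):
    """Cardinality measure: mu(A) = |A|."""
    return len(A)


def dirac_measure(x):
    """Dirac measure at x: mu_x(A) = 1 if x is in A, and 0 otherwise."""
    return lambda A: 1 if x in A else 0


def combine(weighted):
    """Linear combination sum_i c_i * mu_i, given pairs (c_i, mu_i) with c_i >= 0."""
    return lambda A: sum(c * mu(A) for c, mu in weighted)


def powerset(X):
    """All subsets of the finite set X, as frozensets."""
    items = list(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_additive(mu, X):
    """Check mu(empty) = 0 and mu(A | B) = mu(A) + mu(B) for all disjoint A, B."""
    if mu(frozenset()) != 0:
        return False
    subsets = powerset(X)
    return all(mu(A | B) == mu(A) + mu(B)
               for A in subsets for B in subsets if not (A & B))


X = frozenset({1, 2, 3})
# Hypothetical example: twice the counting measure plus five times the Dirac measure at 3.
mu = combine([(2, counting_measure), (5, dirac_measure(3))])
```

Calling `is_additive(mu, X)` checks the two axioms of Definition 14.2 by brute force over all pairs of disjoint subsets. A function such as `lambda A: len(A) ** 2` fails the same check.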
Many of the properties of Lebesgue measure that we have established can be proved
solely using the axioms of a measure.

14.3. PROPOSITION. Suppose that B is a σ-algebra on X, and µ is a measure on B. Let A, B,
and A_n denote elements of B. Then we have:
◦ (monotonicity) if A ⊂ B then µ(A) ≤ µ(B);
◦ (inclusion–exclusion) µ(A ∪ B) + µ(A ∩ B) = µ(A) + µ(B);
◦ (countable subadditivity) µ(⋃ A_n) ≤ ∑ µ(A_n);
◦ (upwards monotone convergence) if A_{n+1} ⊃ A_n then µ(⋃ A_n) = lim µ(A_n); and
◦ (downwards monotone convergence) if A_{n+1} ⊂ A_n and µ(A_1) < ∞ then µ(⋂ A_n) =
lim µ(A_n).

Each of these items may be proved in exactly the same way as we have done for
Lebesgue measure. Of course some properties of Lebesgue measure don’t even make
sense to state for a general measure. For instance the normality and translation-invariance
properties don’t make sense in general, since we may not have the notions of boxes or
translations on a general space X.
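As a sketch of how the first two items of Proposition 14.3 follow from the axioms alone, one can split the sets involved into disjoint pieces. Finite additivity is used here; it follows from countable additivity by padding a finite family with copies of ∅. All quantities take values in [0, ∞].

```latex
% Monotonicity: for A \subset B in \mathcal{B}, write B as a disjoint union.
B = A \sqcup (B \setminus A)
\;\Longrightarrow\;
\mu(B) = \mu(A) + \mu(B \setminus A) \ge \mu(A).

% Inclusion-exclusion: split A \cup B into three disjoint pieces.
\mu(A \cup B) + \mu(A \cap B)
  = \bigl[\mu(A \setminus B) + \mu(A \cap B) + \mu(B \setminus A)\bigr] + \mu(A \cap B)
  = \mu(A) + \mu(B).
```

The last equality regroups the four terms as µ(A ∖ B) + µ(A ∩ B) = µ(A) and µ(B ∖ A) + µ(A ∩ B) = µ(B).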
One further property of the Lebesgue measure which does not appear in the above
list is that subsets of null sets should be null. This property does not follow directly from
the axioms of a measure, and instead must be made into an additional axiom. We say that
a σ-algebra B is complete with respect to µ if whenever A ∈ B is a null set, that is µ(A) = 0,
and A′ ⊂ A, then A′ ∈ B. We also say that the measure µ is complete if B is complete with
respect to µ. It is an exercise to check that if µ is a measure, then it can always be extended
to a complete measure.
At this point we have only given very trivial examples of measures, besides the Lebesgue
measure. Given the effort we invested to find the right construction of the Lebesgue measure,
it may not surprise you to know that it is difficult to construct interesting examples of
measures. Fortunately there is a very general and powerful method for constructing measures.
The basic idea behind it is that it is easy to construct measures which are just finitely additive
first. This mirrors the construction of Lebesgue measure, where we first constructed
the elementary and Jordan measures.
In order to define finitely additive measures, we have to modify the definition of mea-
surable space.

14.4. DEFINITION. A Boolean algebra on X is a collection of subsets of X which contains
the sets ∅ and X, and is closed under pairwise intersections, pairwise unions, and
complements.

We have seen several key examples of Boolean algebras on R^d: the collection of elementary
sets together with their complements, and the collection of Jordan measurable
sets together with their complements. If X is any set we can always discuss the trivial
Boolean algebra {∅, X} and the maximal Boolean algebra P(X).

14.5. DEFINITION. Suppose X is a set and A is a Boolean algebra on X. A function
µ : A → [0, ∞] is said to be a finitely additive measure if it satisfies
(a) (empty set) µ(∅) = 0; and
(b) (finite additivity) for all disjoint sets A, B ∈ A, we have µ(A ∪ B) = µ(A) + µ(B).

The elementary measure is an example of a finitely additive measure, provided we
take its value to be ∞ on any set which is the complement of an elementary set. A similar
statement holds for the Jordan measure. But there are many other examples of finitely
additive measures. For a simple example let X be any set, and A the Boolean algebra
of all finite or cofinite subsets of X. Then we can define a finitely additive measure by
µ(A) = |A|, the cardinality of A.
Similar to the case with measures, the two simple axioms of a finitely additive measure
further imply the monotonicity, inclusion–exclusion, and finite subadditivity properties.
Unfortunately not all finitely additive measures give rise to true measures, necessitat-
ing the following definition.

14.6. DEFINITION. Let X be any set, A a Boolean algebra on X, and µ0 a finitely additive
measure on A. Then µ0 is said to be a premeasure if it satisfies the additional axiom:
◦ for every sequence of pairwise disjoint sets A_n ∈ A, if ⋃ A_n ∈ A, then µ0(⋃ A_n) =
∑ µ0(A_n).

The condition does not quite say that µ0 is countably additive, but rather that it has
the potential to be countably additive. This is confirmed by the following keystone result
of the subject, which we will prove in the next section.

14.7. THEOREM (Consequence of Carathéodory’s extension theorem). Let X be a set, A
a Boolean algebra on X, and µ0 a premeasure on A. Let B be the σ-algebra generated by A. Then µ0
extends to a measure µ on B.

The idea behind the proof is to use an analog of our construction of the Lebesgue outer
measure m∗ . First we define the notion of an abstract outer measure.

14.8. DEFINITION. Let X be any set and µ∗ a function from the collection of all subsets
of X to [0, ∞]. Then µ∗ is said to be an outer measure if it satisfies
(a) (empty set) µ∗(∅) = 0;
(b) (monotonicity) if A ⊂ B then µ∗(A) ≤ µ∗(B); and
(c) (countable subadditivity) µ∗(⋃ A_n) ≤ ∑ µ∗(A_n).

If A is an algebra on X, and µ0 is any function from A to [0, ∞] which satisfies the empty set
axiom, then we can define
\[
\mu^*(E) = \inf\Bigl\{ \sum_i \mu_0(A_i) \;\Big|\; A_i \in \mathcal{A} \text{ and } E \subset \bigcup_i A_i \Bigr\}
\tag{14.e1}
\]

14.9. PROPOSITION. If µ∗ is constructed as in Equation (14.e1), then µ∗ is an outer measure.

PROOF. The empty set and monotonicity axioms are clear from the definition. For
countable subadditivity, let E_i be given and ε > 0. We may assume that ∑ µ∗(E_i) < ∞,
since otherwise there is nothing to prove. Then for every i there exists a sequence
of sets A_{i,n} ∈ A such that E_i ⊂ ⋃_n A_{i,n} and ∑_n µ0(A_{i,n}) − µ∗(E_i) < ε/2^i. It follows that
⋃_i E_i ⊂ ⋃_{i,n} A_{i,n} and ∑_{i,n} µ0(A_{i,n}) − ∑_i µ∗(E_i) ≤ ε. Therefore we have
µ∗(⋃_i E_i) − ∑_i µ∗(E_i) ≤ ε. Taking ε → 0 the proof is complete.


Recall that after we constructed Lebesgue outer measure m∗ we defined the collection
of m∗ -measurable sets as those which could be approximated well by open sets. If X
is just a set with an algebra or σ-algebra, we can’t refer to the open sets. Thus to prove
Carathéodory’s extension theorem, it remains to show how to identify the collection of µ∗ -
measurable sets. We will investigate this in the next section, as well as some applications
of the extension theorem.

EXERCISE 14.1 (Tao, ex 1.4.26). Let µ be a measure on (X, B). Show that B can be
extended to a σ-algebra B̂ and µ to a measure µ̂ on (X, B̂) in such a way that µ̂ is complete.

EXERCISE 14.2 (Tao, ex 1.4.49). Let f be a nonnegative Lebesgue measurable function.
Show that µ(A) = ∫ f χ_A is a measure.

EXERCISE 14.3 (Tao, ex 1.7.6). Give an example of a finitely additive measure that
is not a premeasure. [Hint: work on the measurable space (N, P(N)) and define µ0
separately for finite and infinite sets.]

§15. Construction of abstract measures

READING. Tao, §1.7

In the previous section, we showed how to abstract the properties of the elemen-
tary measure to a finitely additive measure, and how to abstract the properties of the
Lebesgue outer measure too. What we lacked was a definition of measurability, which in
the Lebesgue case relied on the boxes or open subsets of Rd . For a general space X, we
use the following much more subtle definition.

15.1. DEFINITION. Suppose that X is any set and µ∗ is an outer measure on X. Then a
subset A ⊂ X is said to be µ∗-measurable if for every subset S ⊂ X we have

\[ \mu^*(S) = \mu^*(S \cap A) + \mu^*(S \setminus A) \]

To put this definition in context, it is an exercise to show that this property holds for
Lebesgue measurable sets. Moreover it is this property that can be used to prove that a
bounded set is Lebesgue measurable if and only if its inner and outer measures agree.
The next result shows that the µ∗ -measurable sets are a good choice, in the sense that
µ∗ behaves well on the µ∗ -measurable sets (it is a measure).
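On a finite set the criterion of Definition 15.1 can be checked by brute force. The sketch below is an illustration only and is not taken from the source: `toy_outer` is a hypothetical outer measure on X = {0, 1, 2} (one can check by hand that it satisfies the three axioms of Definition 14.8), and `measurable_sets` is a hypothetical helper that tests the splitting condition against every S ⊂ X.

```python
from itertools import combinations


def powerset(X):
    """All subsets of the finite set X, as frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def measurable_sets(outer, X):
    """Subsets A of X with outer(S) == outer(S & A) + outer(S - A) for every S."""
    subsets = powerset(X)
    return [A for A in subsets
            if all(outer(S) == outer(S & A) + outer(S - A) for S in subsets)]


X = frozenset({0, 1, 2})


def toy_outer(S):
    """Hypothetical outer measure: 0 on the empty set, 2 on X, and 1 otherwise."""
    if not S:
        return 0
    return 2 if S == X else 1


def counting_outer(S):
    """Cardinality, viewed as an outer measure on all subsets of X."""
    return len(S)
```

For `toy_outer` only ∅ and X pass the test: for instance A = {0} fails with S = {0, 1}, since µ∗(S) = 1 but µ∗(S ∩ A) + µ∗(S ∖ A) = 2. For `counting_outer` every subset passes. In both cases the measurable sets form a σ-algebra, as the next theorem asserts in general.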

15.2. THEOREM (Carathéodory’s extension theorem part 1). If µ∗ is an outer measure on
X, then the µ∗-measurable subsets of X form a σ-algebra, and µ∗ is a measure when restricted to
the µ∗-measurable sets.

PROOF. The first part of the proof will be to show that the collection of µ∗-measurable
sets is a Boolean algebra and µ∗ is finitely additive on the µ∗-measurable sets. To begin, it
is clear that a set A is µ∗-measurable if and only if A^c is µ∗-measurable.
We now show that the µ∗-measurable sets are closed under pairwise unions. Suppose
that A, B are µ∗-measurable. Since µ∗ is subadditive for arbitrary
sets, it suffices to prove that for any set S we have

\[ \mu^*(S) \ge \mu^*(S \cap (A \cup B)) + \mu^*(S \setminus (A \cup B)) \]

To achieve this, we expand the right-hand side using subadditivity and then apply the
measurability of B followed by the measurability of A:

\begin{align*}
\mu^*(S \cap (A \cup B)) + \mu^*(S \cap (A \cup B)^c)
  &\le \mu^*(S \cap A \cap B^c) + \mu^*(S \cap A \cap B) + \mu^*(S \cap A^c \cap B) + \mu^*(S \cap A^c \cap B^c) \\
  &= \mu^*(S \cap A) + \mu^*(S \cap A^c) \\
  &= \mu^*(S)
\end{align*}

Next we show that µ∗ is finitely additive on the µ∗-measurable sets. Suppose that A, B
are disjoint and µ∗-measurable. Then using the measurability of A, we have

\begin{align*}
\mu^*(A \cup B) &= \mu^*((A \cup B) \cap A) + \mu^*((A \cup B) \setminus A) \\
  &= \mu^*(A) + \mu^*(B)
\end{align*}
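The second part of the proof is to show that the µ∗-measurable sets form a σ-algebra
and that µ∗ is countably additive on the µ∗-measurable sets. Let A_n be a sequence of
µ∗-measurable sets. Since the µ∗-measurable sets form an algebra, we can assume without
loss of generality that the A_n are pairwise disjoint. Then repeating the argument of
the previous paragraph with every set intersected with S, and applying induction, for any
set S we have µ∗(S ∩ ⋃_{n=1}^N A_n) = ∑_{n=1}^N µ∗(S ∩ A_n). It follows that

\begin{align*}
\mu^*(S) &= \mu^*\Bigl(S \cap \bigcup_{n=1}^{N} A_n\Bigr) + \mu^*\Bigl(S \setminus \bigcup_{n=1}^{N} A_n\Bigr) \\
  &= \sum_{n=1}^{N} \mu^*(S \cap A_n) + \mu^*\Bigl(S \setminus \bigcup_{n=1}^{N} A_n\Bigr)
\end{align*}

By monotonicity the last term is at least µ∗(S ∖ ⋃ A_n). Taking N → ∞, we obtain

\begin{align*}
\mu^*(S) &\ge \sum_{n=1}^{\infty} \mu^*(S \cap A_n) + \mu^*\Bigl(S \setminus \bigcup_{n} A_n\Bigr) \\
  &\ge \mu^*\Bigl(\bigcup_{n} (S \cap A_n)\Bigr) + \mu^*\Bigl(S \setminus \bigcup_{n} A_n\Bigr) \\
  &= \mu^*\Bigl(S \cap \bigcup_{n} A_n\Bigr) + \mu^*\Bigl(S \setminus \bigcup_{n} A_n\Bigr)
\end{align*}

The reverse inequality holds by subadditivity. Thus we have shown that ⋃ A_n is µ∗-measurable.
Finally if we take S = ⋃ A_n in the above, we obtain that µ∗(⋃ A_n) ≥ ∑ µ∗(A_n), which
together with countable subadditivity shows that µ∗ is countably additive as desired.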


Now we return to the motivational context where the outer measure µ∗ is constructed
from a finitely additive measure µ0 , as was done for Lebesgue measure. In this case,
provided the finitely additive measure µ0 was in fact a premeasure, µ∗ will be the desired
Carathéodory extension of µ0 .

15.3. THEOREM (Carathéodory’s extension theorem part 2/Hahn–Kolmogorov). Let
A be a Boolean algebra on X, and µ0 a premeasure on A. Let µ∗ be the outer measure induced
by µ0 according to Equation (14.e1) and B the σ-algebra of µ∗-measurable sets. Then B is an
extension of A, and µ∗ is an extension of µ0.

PROOF. We first show that if A lies in the algebra A then A is µ∗-measurable. For this, let S be an
arbitrary set with µ∗(S) < ∞ (if µ∗(S) = ∞ there is nothing to prove), let ε > 0, and find
sets B_n ∈ A such that S ⊂ ⋃ B_n and ∑ µ0(B_n) − µ∗(S) < ε. Then

\begin{align*}
\mu^*(S \cap A) + \mu^*(S \setminus A)
  &\le \sum \mu_0(B_n \cap A) + \sum \mu_0(B_n \setminus A) \\
  &= \sum \mu_0(B_n) \\
  &< \mu^*(S) + \varepsilon
\end{align*}

Taking ε → 0, we conclude that A is µ∗-measurable.
Next we show that if A lies in the algebra A then µ∗(A) = µ0(A). It is clear from the definition
that µ∗(A) ≤ µ0(A). To show that µ0(A) ≤ µ∗(A), let B_n ∈ A be such that A ⊂ ⋃ B_n.
We can assume without loss of generality that the B_n are pairwise disjoint. Then using the
axiom of a premeasure, we obtain

\begin{align*}
\mu_0(A) &= \sum \mu_0(A \cap B_n) \\
  &\le \sum \mu_0(B_n)
\end{align*}

Taking the infimum over all such sequences B_n, we conclude that µ0(A) ≤ µ∗(A), as
desired.
This concludes our proof of Carathéodory’s extension theorem. As an application, we
will now show how to produce a family of interesting measures on the real line, beyond
just the Lebesgue measure.

15.4. DEFINITION. An interval is said to be half-open if it is of the form (a, b] where
a ∈ [−∞, ∞) and b ∈ R, or else of the form (a, ∞). We let H denote the Boolean algebra
generated by the half-open intervals.

It is easy to see that H consists of finite disjoint unions of half-open intervals. The
σ-algebra generated by H is exactly the σ-algebra of Borel subsets of R. If F : R → R is an
increasing, right-continuous function, then we can define a measuring function on H by
letting µ_F((a, b]) = F(b) − F(a), and extending µ_F to disjoint unions in the obvious way.

15.5. THEOREM. For any increasing, right-continuous function F, µ_F is a premeasure on H.

PROOF OUTLINE. It is not difficult to see that µ_F is well-defined on H. Moreover µ_F is
finitely additive by construction.
Thus it remains only to show that µ_F satisfies the axiom of a premeasure. Suppose
that the A_n ∈ H are pairwise disjoint and that ⋃ A_n ∈ H too. We can assume without loss
of generality that the A_n are each single half-open intervals, and that ⋃ A_n is a single
half-open interval I.
It is clear from finite additivity and monotonicity that µ_F(⋃ A_n) ≥ ∑ µ_F(A_n), so it suffices
to show that µ_F(⋃ A_n) ≤ ∑ µ_F(A_n). In order to achieve this, let us define µ_F on open intervals by
letting µ_F((a, b)) = lim_{x→b−} F(x) − F(a), and similarly for closed intervals. Then using
the right-continuity of F, we can pay a total of ε (say ε/2^n for the nth interval) to replace
each A_n with an open interval U_n such that A_n ⊂ U_n. Similarly we can pay ε to replace I
with a closed (bounded) interval K such that K ⊂ I.
Now by the Heine–Borel theorem, just finitely many of the U_n suffice to cover K, and
hence µ_F(K) ≤ ∑ µ_F(U_n). It follows that

\begin{align*}
\mu_F\Bigl(\bigcup A_n\Bigr) &\le \mu_F(K) + \varepsilon \\
  &\le \sum \mu_F(U_n) + \varepsilon \\
  &\le \sum \mu_F(A_n) + 2\varepsilon
\end{align*}

Thus the result follows by taking ε → 0.


If F(x) = x, then the above construction simply gives µ_F = the Lebesgue measure m.
And in general, F acts as the cumulative distribution function of the measure µ_F. Measures
of this form are called Lebesgue–Stieltjes measures.
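The premeasure µ_F is easy to evaluate on elements of H. The sketch below is an illustration under stated assumptions: the helper `stieltjes_measure` and the two sample functions are hypothetical names, the input is a list of pairs (a, b) with finite endpoints representing the intervals (a, b], and the intervals are assumed to be pairwise disjoint (this is not checked).

```python
def stieltjes_measure(F, intervals):
    """mu_F of a finite union of half-open intervals (a, b], assumed pairwise disjoint."""
    return sum(F(b) - F(a) for a, b in intervals)


def identity(x):
    """F(x) = x, for which mu_F of an interval is its length."""
    return x


def step(x):
    """Right-continuous unit step: F(x) = 0 for x < 0 and F(x) = 1 for x >= 0."""
    return 1.0 if x >= 0 else 0.0
```

With `identity` the value on (0, 2] ∪ (3, 4.5] is the total length. With `step` an interval (a, b] receives mass 1 exactly when it contains 0, so on H the function µ_F agrees with the Dirac measure at 0.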

EXERCISE 15.1 (Tao, ex 1.7.9(i)). Let µ0 be a premeasure on (X, A) and let µ on (X, B)
be the Carathéodory extension. Show that for any B ∈ B [such that µ(B) < ∞?] there
exists C in the σ-algebra generated by A such that B ⊂ C and µ(C ∖ B) = 0.

EXERCISE 15.2 (Tao, ex 1.7.9(ii)(iii)). Let µ0 be a premeasure on (X, A), µ∗ the corresponding
outer measure, and µ on (X, B) the Carathéodory extension. Show that for any
set B such that µ∗(B) < ∞, we have B ∈ B if and only if for all ε > 0 there exists A ∈ A
such that µ∗(A △ B) < ε.

EXERCISE 15.3 (Tao, ex 1.7.15(i)). Suppose F is a monotone and non-decreasing function.
Show that F is continuous if and only if µ_F({x}) = 0 for every x ∈ R.

§16. Abstract integration theory

READING. Tao, §1.4.4


In the past several sections we have explained in detail how the theory of the Lebesgue
measure on Rd can be abstracted to yield a theory of abstract measures on abstract mea-
surable spaces. In this section we give a brief tour of how the theory can also be used
to describe abstract measurable functions, and to carry out integration with respect to
arbitrary measures.
Although the definition of measurable functions has many equivalent formulations,
our official definition will be the following “pre-image” version.
16.1. DEFINITION. Suppose that (X, B) is a measurable space. A function f which
maps X to [0, ∞], R, or C is called measurable if the pre-image of any open set is in B, that
is, for every open subset U of the codomain, f^{−1}(U) ∈ B.

16.2. PROPOSITION. The measurable functions on (X, B) are closed under sums, products,
continuous compositions, and pointwise limits.

PROOF OUTLINE. We address only the last point, that measurable functions are closed
under pointwise limits. For this we refer to the method of proof of Theorem 10.2. The idea
is that if f is the limit of the functions f_n, then it is also the lim sup of the functions f_n. This
allows one to write f^{−1}(U) as a countable Boolean combination of pre-images of open sets
under the f_n.
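To make the outline concrete in the real-valued case, here is a sketch of the kind of identity involved, for the open half-line U = (a, ∞). It is an illustration rather than the argument of Theorem 10.2 itself, and the index k is introduced here.

```latex
f^{-1}\bigl((a,\infty)\bigr)
  = \bigcup_{k \ge 1} \, \bigcup_{N \ge 1} \, \bigcap_{n \ge N}
    f_n^{-1}\bigl((a + \tfrac{1}{k}, \infty)\bigr)
```

Indeed, if f(x) > a then f(x) > a + 1/k for some k, and since f_n(x) → f(x) we have f_n(x) > a + 1/k for all large n. Conversely, if f_n(x) > a + 1/k for all n ≥ N, then f(x) ≥ a + 1/k > a. A symmetric identity handles (−∞, a), and every open subset of R is a countable union of intervals (a, b) = (a, ∞) ∩ (−∞, b).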
We next turn to the task of defining integration with respect to an abstract measure
µ. Intuitively µ is a distribution or density on the space X, and ∫ f dµ is a weighted,
continuous sum. For another kind of intuition, if µ is a probability measure on a probability
space, and f is a random variable, then ∫ f dµ denotes the expectation of f.
In order to define the integral with respect to µ, we mimic our original strategy and
begin with simple functions.
16.3. DEFINITION. Let (X, B) be a measurable space. A function f : X → [0, ∞] is said
to be simple if it is measurable and takes finitely many values, that is, if f = ∑_{i=1}^k c_i χ_{A_i} where
A_i ∈ B.
If f is the simple function above, and µ is a measure on (X, B), then we define the
simple integral of f with respect to µ by
\[ s\!\int f \, d\mu = \sum_{i=1}^{k} c_i \, \mu(A_i) \]
It is possible to check, as we have done in the case of Lebesgue integration, that the
simple integral is well-defined, linear, and has numerous other familiar properties.
16.4. DEFINITION. Suppose (X, B) is a measurable space, and µ is a measure on it. If
f : X → [0, ∞] is measurable, then we define the integral of f with respect to µ by
\[ \int f \, d\mu = \sup\Bigl\{ s\!\int g \, d\mu \;\Big|\; g \le f,\ g \text{ simple} \Bigr\} \]
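As a small illustration of the simple integral, consider the two elementary measures from §14 on the measurable space (X, P(X)). This is a sketch under the stated assumptions: in the first line the sets A_i are assumed to partition X, and in the second X is assumed to be the finite set {1, …, m}.

```latex
% Dirac measure \mu_x, with f = \sum_{i=1}^k c_i \chi_{A_i}, the A_i a partition of X, and x \in A_j:
s\!\int f \, d\mu_x = \sum_{i=1}^{k} c_i \, \mu_x(A_i) = c_j = f(x).

% Cardinality measure \mu(A) = |A| on X = \{1, \dots, m\}, with g \colon X \to [0, \infty):
s\!\int g \, d\mu = \sum_{j=1}^{m} g(j) \, \mu(\{j\}) = \sum_{j=1}^{m} g(j).
```

So integration against a Dirac measure is evaluation at a point, and integration against the cardinality measure is ordinary summation.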


It is not difficult to establish most of the basic properties of the integral previously
stated in the Lebesgue case.

16.5. PROPOSITION. The integral of nonnegative measurable functions satisfies the following
properties:
(a) (equivalence) if f = g almost everywhere then ∫ f dµ = ∫ g dµ;
(b) (monotonicity) if f ≤ g almost everywhere then ∫ f dµ ≤ ∫ g dµ;
(c) (range truncation) if f_N = min(f, N) then ∫ f_N dµ → ∫ f dµ;
(d) (support truncation) if A_n is a sequence of measurable sets such that A_n ⊂ A_{n+1} and
⋃ A_n = X, then ∫ f χ_{A_n} dµ → ∫ f dµ;
(e) (Markov’s inequality) if f is measurable and λ > 0, then µ({x | f(x) ≥ λ}) ≤ (1/λ) ∫ f dµ; and
(f) (additivity) ∫ (f + g) dµ = ∫ f dµ + ∫ g dµ.

PROOF. All of the properties (a)–(d) can be proved in a fashion similar to the case of
the Lebesgue integral.
The same is true of property (e), but we neglected to state it earlier so let us provide
the proof now. Let g = λ χ_{{x | f(x) ≥ λ}}. Then clearly g is a simple function and g ≤ f. It
follows that the simple integral of g is a lower bound for the integral of f, so
λ µ({x | f(x) ≥ λ}) ≤ ∫ f dµ, as desired.
The proof of additivity is similar to the Lebesgue case, with one wrinkle. Recall that
in the Lebesgue proof we applied the truncation lemma to the sets [−N, N]^d to assume
without loss of generality that f, g have finite measure support. In general we cannot
assume that X is a union of countably many sets of finite measure. Instead we let A_n =
{x | f(x) > 1/n or g(x) > 1/n}. This family of sets is increasing and its union is the set where
f or g is nonzero, so it contains the supports of f and g. We may assume that ∫ f dµ and
∫ g dµ are finite, since otherwise both sides of (f) are infinite by monotonicity. Then by
Markov’s inequality the sets A_n have finite measure, and the proof can now proceed by
truncating to A_n.
Now that the integral of nonnegative functions has been defined, we may again define
the absolutely convergent integral for complex-valued functions.

16.6. DEFINITION. Suppose that (X, B) is a measurable space and µ is a measure on it.
If f : X → C is measurable and defined µ-almost everywhere, we say that f is absolutely
integrable with respect to µ if ∫ |f| dµ < ∞.
In this case, we define ∫ f dµ = ∫ Re f dµ + i ∫ Im f dµ, where for real-valued functions
f we define ∫ f dµ = ∫ f^+ dµ − ∫ f^− dµ.
R R R

Once again, if (X, B) is a measurable space and µ is a measure on it, we can define
L^1(µ) to be the space of absolutely integrable complex-valued functions on X, where two such
functions are identified if they agree µ-almost everywhere. We can also define a norm on
the space L^1(µ) by ‖f‖ = ∫ |f| dµ. The following result follows using the same proof as
before.

16.7. PROPOSITION. The space L^1(µ) together with the norm ‖·‖ is a normed vector space.

Finally we remark that our major convergence theorems all hold true for integration
with respect to an abstract measure µ. These include the monotone convergence theorem,
Tonelli’s theorem, Fatou’s lemma, and the dominated convergence theorem.

EXERCISE 16.1 (See Tao, ex 1.4.29).
◦ Show that f is measurable if and only if f^+ and f^− are measurable.
◦ Show that sums and products of measurable functions are measurable.

EXERCISE 16.2 (See Tao, ex 1.4.36). Establish the support truncation property: if A_n is
a sequence of measurable sets such that A_n ⊂ A_{n+1} and ⋃ A_n = X, then ∫ f χ_{A_n} dµ →
∫ f dµ.
PART III

Functional analysis

§17. Normed vector spaces

READING. BBT §12.1, §12.3

In our study of integration theory, we studied several classes of real and complex-
valued functions. For example we studied the class of Lebesgue measurable functions,
and the class of absolutely integrable functions. Across analysis there are many other
important classes of functions, including the continuous functions, uniformly continuous
functions, differentiable functions, and so on.
Such function spaces have a lot of internal structure. For example if you can add or
scale the elements of the function space, then it will have the algebraic structure of a vector
space. And if you can measure distances between functions in the space, it should have a
geometry too. In the best cases, the algebraic and geometric structures make the function
space into a normed vector space.

17.1. DEFINITION. A normed vector space consists of a (real or complex) vector space X
together with a mapping ‖·‖ : X → [0, ∞) satisfying:
◦ (homogeneity) ‖ax‖ = |a| · ‖x‖;
◦ (triangle inequality) ‖x + y‖ ≤ ‖x‖ + ‖y‖; and
◦ (non-vanishing) ‖x‖ = 0 implies x = 0.
The mapping ‖·‖ is called a norm on X.

The norm gives rise to a metric on X defined by d(x, y) = ‖x − y‖. The metric has
several special properties not true in a general metric space. For example, it is uniform
throughout the space in the sense that it is translation invariant: d(x + z, y + z) = d(x, y).

17.2. DEFINITION. A normed vector space is called a Banach space if the associated
metric d(x, y) = ‖x − y‖ is complete.

Recall the definition of a complete metric d: whenever x_n is Cauchy, that is

(∀ε > 0)(∃N)(∀n, m ≥ N) d(x_n, x_m) < ε

then there exists x ∈ X such that x_n converges to x, that is

(∀ε > 0)(∃N)(∀n ≥ N) d(x_n, x) < ε


The property means that there are no holes in the space—any apparent point which can be
approximated actually exists. We will see the value of assuming that our normed vector
spaces are complete in later sections.
We now describe several familiar examples of normed vector spaces (in fact most of
them will be Banach spaces).

17.3. EXAMPLE. The ordinary finite-dimensional vector space R^d, together with its
usual Euclidean norm ‖x‖ = (∑_{i=1}^d x_i^2)^{1/2}, is a Banach space. It is a classical annoying
exercise to establish the triangle inequality. One can also verify that the completeness
property holds, using the classical fact/construction that it is true for R.

17.4. EXAMPLE. The space R^N consisting of all real sequences will not be a Banach
space in any reasonable way. However it does contain many classical Banach spaces. For
example we can consider the space of all square summable sequences
\[ \ell^2 = \Bigl\{ x \in \mathbb{R}^{\mathbb{N}} \;\Big|\; \sum x_n^2 < \infty \Bigr\} \]
This is a Banach space with its generalized Euclidean norm ‖x‖_2 = (∑ x_i^2)^{1/2}. It is not
particularly easy to verify that ℓ^2 is a Banach space and ‖·‖_2 is a complete norm on it. We
will do this later!

17.5. EXAMPLE. As we have mentioned, spaces of integrable functions may also form
Banach spaces. Let (X, B) be a measurable space and µ a measure on it. We have already
described the space L^1(µ) of all absolutely integrable functions on X. It is also possible to
use other powers, for example, let
\[ L^2(\mu) = \Bigl\{ f \colon X \to \mathbb{C} \text{ measurable} \;\Big|\; \int |f|^2 \, d\mu < \infty \Bigr\} \]
We may then define a norm on L^2 by ‖f‖_2 = (∫ |f|^2 dµ)^{1/2}. The spaces L^1(µ) and L^2(µ) are
not quite Banach spaces, but only because they fail to satisfy the non-vanishing property.
This can easily be remedied by identifying two functions if they agree almost everywhere.

17.6. EXAMPLE. As a final series of examples, if [a, b] is any closed, bounded interval
then let B[a, b] be the space of all bounded functions f : [a, b] → R together with the
supremum norm ‖f‖ = sup { |f(x)| | x ∈ [a, b] }. This is a Banach space which contains
many other spaces of independent interest. For example it contains the space C[a, b] of
continuous functions on [a, b], the space D[a, b] of all differentiable functions on [a, b] with
continuous derivative, and the space P[a, b] of all polynomial functions on [a, b]. The last
two turn out to be incomplete.

Having given the definition and basic examples of Banach spaces, we now briefly
study mappings between them. For ordinary vector spaces, the most natural mappings
are the operators, or linear mappings. For Banach spaces we primarily study operators
which are also continuous.

The motivating examples are the operators from C^n to C^m, which are simply the familiar
matrix transformations studied in linear algebra. It turns out that all operators from C^n
to C^m are continuous. It is tempting to believe that all operators are continuous because
they are linear, but this turns out not to be the case in infinite-dimensional settings.
Another example of a linear operator is the mapping L^1(µ) → C given by f ↦ ∫ f dµ.
We have of course seen that the mapping is linear, and it turns out to be continuous as
well.
As a final example, the mapping from D [ a, b] to B[ a, b] which takes a differentiable
function to its derivative is also linear. However it fails to be continuous with respect
to the supremum norm. Two functions can be very close together but also have very
different slopes!
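One possible family witnessing this failure is sketched below. It is an illustration rather than an example taken from the source, and it assumes a < b. The functions f_n are polynomials, so they lie in D[a, b].

```latex
f_n(x) = \frac{1}{n}\Bigl(\frac{x-a}{b-a}\Bigr)^{n}, \qquad
\|f_n\| = \sup_{x \in [a,b]} |f_n(x)| = \frac{1}{n} \longrightarrow 0,

f_n'(x) = \frac{1}{b-a}\Bigl(\frac{x-a}{b-a}\Bigr)^{n-1}, \qquad
\|f_n'\| = \frac{1}{b-a} \quad \text{for every } n \ge 1.
```

Thus f_n converges to 0 in the supremum norm while the derivatives f_n′ stay at distance 1/(b − a) from the derivative of 0, so differentiation is not continuous at 0.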
In the next two results, we show that the continuity of operators is very special when
compared with arbitrary continuous functions.

17.7. LEMMA. Let X, Y be normed vector spaces and let T : X → Y be an operator. If T is
continuous at one point x ∈ X, then T is uniformly continuous.

PROOF. Using the translation-invariance property, we can assume that x = 0. By
definition this means that for all ε > 0 there exists δ > 0 such that for all x ∈ X we have
‖x‖ < δ ⟹ ‖Tx‖ < ε. Given two points x, x′ we can apply the latter property to x − x′
and use the linearity of T to conclude that ‖x − x′‖ < δ ⟹ ‖Tx − Tx′‖ < ε. In other
words, T is uniformly continuous.
In the following lemma, an operator T : X → Y is said to be bounded if there exists a
real number M such that ‖Tx‖ ≤ M‖x‖ for all x ∈ X. Note the abuse of terminology! A
bounded operator is not bounded in the traditional sense that it takes values inside a ball
of finite radius. Rather a bounded operator is one which maps bounded sets to bounded sets.

17.8. LEMMA. Let X, Y be normed vector spaces and let T : X → Y be an operator. Then T is
continuous if and only if T is bounded.

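PROOF. First suppose that T is continuous. Apply the continuity at 0 with ε = 1 to obtain
δ > 0 such that ‖x‖ ≤ δ ⟹ ‖Tx‖ ≤ 1 (shrinking δ if necessary so that the implication
holds for vectors of norm exactly δ). Then for any x ≠ 0 we have

\[ \|Tx\| = \frac{\|x\|}{\delta} \, \Bigl\| T\Bigl( \delta \, \frac{x}{\|x\|} \Bigr) \Bigr\| \le 1 \cdot \frac{\|x\|}{\delta} \]

so T is bounded with M = 1/δ.
Conversely suppose that T is bounded, and let M > 0 be such that ‖Tx‖ ≤ M‖x‖ for
x ∈ X. Given any ε > 0 we let δ = ε/M. Then we have

\[ \|x\| < \delta \implies \|Tx\| \le M\|x\| < M \, \frac{\varepsilon}{M} = \varepsilon \]

This shows that T is continuous at the point 0, and hence by the previous lemma T is
continuous.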
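As a quick application of Lemma 17.8, the integration functional I(f) = ∫ f dµ on L^1(µ) mentioned earlier is bounded with M = 1. The sketch below assumes the triangle inequality for integrals, which holds for the abstract integral just as in the Lebesgue case.

```latex
|I(f)| = \Bigl| \int f \, d\mu \Bigr| \le \int |f| \, d\mu = \|f\|
```

Hence I is continuous, and in fact uniformly continuous by Lemma 17.7.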
EXERCISE 17.1 (BBT, ex 12:1.2). Show that the addition and constant multiple operations
are continuous on a normed vector space.

EXERCISE 17.2 (BBT, ex 12:1.3). Show that the unit ball of a normed vector space is
convex. That is, for x, y in the ball and λ ∈ (0, 1) we have λx + (1 − λ)y is also in the ball.

EXERCISE 17.3 (BBT, ex 12:3.1). Consider the operators D(f) = f′, (Sf)(x) = ∫_a^x f dµ,
and I(f) = ∫ f dµ. What are appropriate domains and codomains of each operator? Show
that S and I are continuous, and D is not continuous.

EXERCISE 17.4. Let D[0, 1] denote the space of differentiable functions on [0, 1] with
continuous derivative, equipped with the supremum norm of B[0, 1]. Show that D[0, 1] is
not complete.

§18. The Hahn–Banach theorem

READING. BBT §1.5

In this section we continue our study of operators on a space X. However we confine
our attention to the simplest operators, which are the ones that take values in the scalar
field R or C. Such operators are so fundamental that we give them the special name
“functional”. In this section we consider only the case of real normed vector spaces, but
the same results hold for complex spaces too.

18.1. DEFINITION. If X is a normed vector space, a linear functional on X is an operator
φ : X → R. A bounded linear functional on X is a bounded, which is to say continuous,
linear functional on X.

Let us briefly recall the situation for X = R^d with any of its norms. Here a linear
functional φ is determined by its values on a basis, and it follows that φ is of the form
x ↦ y^T x, that is, the dot product with a row vector or “dual vector”. It should not be
surprising that these mappings are always bounded, regardless of the norm on R^d.
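For the Euclidean norm, the boundedness of x ↦ y^T x can be sketched with the Cauchy–Schwarz inequality, which is assumed here as a known fact.

```latex
|\varphi(x)| = |y^{T} x| = \Bigl| \sum_{i=1}^{d} y_i x_i \Bigr|
  \le \Bigl( \sum_{i=1}^{d} y_i^2 \Bigr)^{1/2} \Bigl( \sum_{i=1}^{d} x_i^2 \Bigr)^{1/2}
  = \|y\| \, \|x\|
```

So φ is bounded with M = ‖y‖. For other norms on R^d the same conclusion follows from the equivalence of all norms on a finite-dimensional space, a standard fact not proved in this section.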
When X is an infinite-dimensional normed vector space, it is not true that all linear
functionals on X are bounded. In fact, given an infinite dimensional normed vector space
X, it is not immediately obvious that there are any nonzero bounded linear functionals on
X. In the rest of this section we present the Hahn–Banach theorem, which implies that on
any normed vector space, there really are lots of bounded linear functionals.
In order to state the Hahn–Banach theorem in its most powerful form, we need the
following generalization of a norm on a vector space.

18.2. DEFINITION. Let X be a vector space. A sublinear functional on X is a function
p : X → R that satisfies:
(a) (positive homogeneity) p(cx) = cp(x) for all c ≥ 0; and
(b) (subadditivity) p(x + y) ≤ p(x) + p(y).

Norms and seminorms are both examples of sublinear functionals. Another example
is the upper Riemann integral, defined on the space M[ a, b] of bounded functions on [ a, b].

18.3. THEOREM (Hahn–Banach). Let X be a vector space, Y ≤ X a subspace of X, and p
a sublinear functional on X. Then any linear functional φ0 on Y such that φ0 ≤ p extends to a
linear functional φ on X such that φ ≤ p.

Before proving the above abstract form of the Hahn–Banach theorem, we present sev-
eral key consequences regarding the construction of bounded linear functionals.

18.4. COROLLARY. Let X be a normed vector space, and Y ≤ X a subspace of X.
(a) Any bounded linear functional φ0 on Y extends to a bounded linear functional φ on X.
(b) If Y is closed and z ∉ Y, then there exists a bounded linear functional φ on X such that
φ(Y) = 0 and φ(z) ≠ 0.
(c) The bounded linear functionals separate points: for all x, x′ ∈ X, if x ≠ x′ then there is
a bounded linear functional φ on X such that φ(x) ≠ φ(x′).

PROOF. (a) Since φ₀ is a bounded linear functional on Y, there exists some M such that
φ₀(y) ≤ M‖y‖ for all y ∈ Y. We may therefore apply the Hahn–Banach theorem with the
sublinear functional p(x) = M‖x‖. Thus φ₀ extends to a linear functional φ on X such
that φ(x) ≤ M‖x‖ for all x ∈ X. Applying this to −x gives −φ(x) ≤ M‖x‖ as well, so
|φ(x)| ≤ M‖x‖. In particular, φ is bounded too.
(b) We first define a function φ₀ on the space Y + Rz which is bounded by p = the
norm. For this we will let φ₀(y + cz) = cφ₀(z) where φ₀(z) remains to be determined. In
order to satisfy φ₀(y + cz) ≤ ‖y + cz‖ we require that cφ₀(z) ≤ ‖y + cz‖ for all y ∈ Y and
all c ∈ R. Taking c > 0 and substituting y with −cy, we see that we must choose
φ₀(z) ≤ ‖−y + z‖ for all y ∈ Y; if in addition φ₀(z) ≥ 0, then the requirement holds
automatically when c ≤ 0. Since Y is closed we must have inf_{y∈Y} ‖y + z‖ ≠ 0 (otherwise
z would be a limit of elements of Y and hence in Y). It follows that we can choose a
positive value of φ₀(z), and we may then use part (a) to extend φ₀ to a bounded linear
functional φ on X that meets our requirements.
(c) If x ≠ x′ then x − x′ ≠ 0. Applying part (b) with Y = {0} and z = x − x′ we can find a
bounded linear functional φ such that φ(x − x′) ≠ 0. It follows that φ(x) ≠ φ(x′). □
We now return to the proof of the Hahn–Banach theorem.
PROOF OF THEOREM 18.3. We begin by showing that we can find a proper extension
of φ₀. Specifically, given any z ∈ X ∖ Y we will find an extension of φ₀ to a linear
functional φ₁ on Y ⊕ Rz satisfying φ₁ ≤ p. For this we will define

φ₁(y + cz) = φ₀(y) + cφ₁(z)

where φ₁(z) will be determined a little bit later. When we do choose φ₁(z), it will have to
satisfy the requirement that for all y ∈ Y and all c ∈ R:

φ₀(y) + cφ₁(z) ≤ p(y + cz)

To isolate φ₁(z) we must consider the cases of negative and positive values of the
coefficient c separately (when c = 0 the requirement is just the hypothesis φ₀ ≤ p). Thus
assume c > 0 and split the last inequality into two conditions:

(∀y) φ₀(y) + cφ₁(z) ≤ p(y + cz)

(∀y) φ₀(y) − cφ₁(z) ≤ p(y − cz)

Solving each for φ₁(z) and substituting y with cy in each gives us the two new conditions:

(∀y) φ₁(z) ≤ p(y + z) − φ₀(y)

(∀y) φ₁(z) ≥ −p(y − z) + φ₀(y)

In order for the constraints to be satisfiable, it is sufficient to have for all y, y′ ∈ Y that
−p(y − z) + φ₀(y) ≤ p(y′ + z) − φ₀(y′). And this is indeed the case, since

φ₀(y) + φ₀(y′) = φ₀(y + y′)
≤ p(y + y′)
= p(y − z + y′ + z)
≤ p(y − z) + p(y′ + z)

Thus we can find a suitable value for φ₁(z) and successfully extend φ₀ to φ₁ as required.
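The one-step extension can be made concrete in a small example. The sketch below is only an illustration under assumed data: X = R² with p the taxicab norm, Y the horizontal axis with φ₀((t, 0)) = t, and z = (0, 1). It approximates the two bounds on φ₁(z) from the proof by sampling y on a grid, then checks that one admissible choice gives an extension dominated by p.

```python
def p(v):
    """Sublinear functional on R^2: the taxicab norm (an assumed example)."""
    return abs(v[0]) + abs(v[1])

def phi0(t):
    """Linear functional on Y = {(t, 0)}: phi0((t, 0)) = t, so phi0 <= p on Y."""
    return t

z = (0.0, 1.0)                               # a vector outside Y
grid = [k / 10.0 for k in range(-50, 51)]    # sample values t for y = (t, 0)

# The two conditions from the proof, evaluated on the sampled y:
#   -p(y - z) + phi0(y) <= phi1(z) <= p(y + z) - phi0(y)
lower = max(-p((t - z[0], -z[1])) + phi0(t) for t in grid)
upper = min(p((t + z[0], z[1])) - phi0(t) for t in grid)

def phi1(v, c=0.5):
    """Extension to R^2 using the admissible choice phi1(z) = c."""
    return v[0] + c * v[1]

# Check phi1 <= p on a grid of points of R^2.
dominated = all(phi1((s, t)) <= p((s, t)) + 1e-12 for s in grid for t in grid)

print(lower, upper, dominated)
```

In this example the admissible interval for φ₁(z) is approximately [−1, 1], so the extension is far from unique.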
To complete the proof we wish to apply the above step repeatedly. Since the number
of steps will be uncountable in general, it is necessary to phrase our construction using
the standard Zorn’s lemma: if P is a partially ordered set and every chain of P has an upper
bound, then P has a maximal element.
Now let P be the collection of all linear functionals φ such that the domain of φ is
a subspace of X, φ extends φ₀, and φ ≤ p. We partially order P by function extension.
A chain C in P always has an upper bound, namely the set-theoretic union ⋃C of the
members of the chain, which is again a linear functional on a subspace of X because the
members of the chain are totally ordered by extension. Moreover the union will be ≤ p
because each member of the chain is ≤ p.
Therefore we can apply Zorn’s lemma to find an element φ of P which is maximal
with respect to function extension. We claim moreover that the domain of φ must be all
of X. Indeed, otherwise we can use the argument above to properly extend the domain of
φ to find a larger element φ′ of P. This contradicts the maximality of φ, and completes the
proof. □

EXERCISE 18.1 (BBT, ex 12:5.1). Let f be a bounded real-valued function on [0, 1], and
let U(f) denote the upper Lebesgue integral of f. Show that U is a sublinear functional.
What can you conclude from the Hahn–Banach theorem?

EXERCISE 18.2 (BBT, ex 12:5.2). Let ℓ^∞ denote the space of bounded real sequences
with the supremum norm, and let c denote the subspace of convergent real sequences.
Define p on ℓ^∞ by

p(x) = lim sup_{n→∞} (x₁ + ··· + x_n)/n

Verify that p is a sublinear functional such that lim x ≤ p(x) for all x ∈ c. If we apply the
Hahn–Banach theorem to obtain a bounded linear functional L extending lim, show that
lim inf x ≤ L(x) ≤ lim sup x and calculate the value of L(0, 1, 0, 1, . . .).

EXERCISE 18.3. Let R^d be equipped with any norm that makes it into a normed vector
space. Show that every linear functional on R^d is continuous.

§19. Spaces of operators and the dual space

READING. BBT §12.3, 12.7

In the past two sections we have introduced and discussed the continuous operators
T : X → Y between two normed vector spaces. In this section we study the collection of
all such operators as a space in its own right.

19.1. DEFINITION. Let X, Y be normed vector spaces. Then B(X, Y) denotes the space
of bounded linear operators T : X → Y.

We equip B(X, Y) with the operations of pointwise addition and pointwise scaling. In
other words, if T, T′ ∈ B(X, Y) then T + T′ is defined to be the operator (T + T′)(x) =
T(x) + T′(x) and cT is defined to be the operator (cT)(x) = cT(x). We also equip B(X, Y)
with the operator norm:

‖T‖ = inf { M | (∀x) ‖Tx‖ ≤ M‖x‖ }

In other words, the operator norm of a bounded operator T is the least value of M which
witnesses that T is bounded. Naturally we must show that the operations and norm
satisfy the properties of a normed vector space.
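To get a feeling for the operator norm, here is a small sketch on R² with the supremum norm, where the operator norm of a matrix is known to equal its largest absolute row sum. The matrix A and the helper names are assumptions made for the example; the random sampling only illustrates the definition and is not a proof.

```python
import random

def sup_norm(x):
    return max(abs(xi) for xi in x)

def apply(A, x):
    """Matrix-vector product, i.e. the operator T(x) = Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1.0, -2.0], [3.0, 0.5]]    # hypothetical operator on R^2

# For the sup norm, the least admissible M is the largest absolute row sum.
candidate = max(sum(abs(a) for a in row) for row in A)

random.seed(1)
ratios = []
for _ in range(2000):
    x = [random.uniform(-1, 1) for _ in range(2)]
    ratios.append(sup_norm(apply(A, x)) / sup_norm(x))

# Witness: a sign vector matching the signs of the row with the largest sum.
x_star = [1.0, 1.0]
attained = sup_norm(apply(A, x_star)) / sup_norm(x_star)

print(max(ratios) <= candidate, attained)
```

Every sampled ratio ‖Ax‖/‖x‖ stays below the candidate value, and the witness vector shows that no smaller M would do.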

19.2. PROPOSITION. (a) B(X, Y) is a normed vector space with the operations of
pointwise addition and scaling, and with the operator norm.
(b) If Y is a Banach space then so is B(X, Y).

PROOF. (a) We first show that the operator norm is indeed a norm. The homogeneity
and non-vanishing properties are easy to check. For the triangle inequality let T, T′ ∈
B(X, Y), and calculate ‖(T + T′)(x)‖ = ‖Tx + T′x‖ ≤ ‖Tx‖ + ‖T′x‖ ≤ ‖T‖‖x‖ + ‖T′‖‖x‖.
It follows that ‖T + T′‖ ≤ ‖T‖ + ‖T′‖, as desired.
It also follows from homogeneity and the triangle inequality that B(X, Y) is closed
under scalar multiplication and addition. Thus B(X, Y) is a normed vector space.
(b) It remains only to show that B(X, Y) is complete. For this let T_n be a sequence of
elements of B(X, Y) and assume that it is Cauchy in the operator norm. This means that
for all ε > 0, there exists N such that for all m, n ≥ N we have ‖T_m − T_n‖ < ε.
Now for any fixed x ∈ X, it follows from the last inequality that ‖T_m x − T_n x‖ ≤ ε‖x‖.
In particular, the sequence T_n x is a Cauchy sequence in the space Y. Since we are assuming
that Y is complete, the sequence T_n x converges and we define Tx = lim T_n x.
Now T is a well-defined function from X to Y, and by definition T is the pointwise
limit of the T_n. We need to check that T ∈ B(X, Y) and moreover that T_n → T in the
operator norm.
To see that T ∈ B(X, Y), first note that it is easy to check T is a linear map. For example,
T(x + y) = lim T_n(x + y) = lim (T_n(x) + T_n(y)) = T(x) + T(y). To see that T is bounded,
we first claim that the sequence of operator norms ‖T_n‖ is itself bounded. For this claim,
recall from the reverse triangle inequality that |‖T_n‖ − ‖T_m‖| ≤ ‖T_n − T_m‖. Thus the fact
that T_n is Cauchy implies that ‖T_n‖ is Cauchy in R, and any Cauchy sequence in R is
bounded. This completes the claim.
Now let M be a bound for the sequence ‖T_n‖. Then given x ∈ X, for all n we have
‖T_n x‖ ≤ M‖x‖. Taking the limit of both sides, we conclude that ‖Tx‖ ≤ M‖x‖, and this
means that T is a bounded operator. Thus we have shown that T ∈ B(X, Y).
Last we establish that T_n → T in operator norm. For this, given ε > 0 we have already
argued that we can find N such that m, n ≥ N implies ‖T_m x − T_n x‖ ≤ ε‖x‖ for all x ∈ X.
Now let n ≥ N be fixed and take the limit as m → ∞. This gives us ‖Tx − T_n x‖ ≤ ε‖x‖ for
all x. We thus have that ‖T − T_n‖ ≤ ε, which means that T_n → T in operator norm. □
The argument for part (b) is our first instance of a standard argument template. To
show that a given function space is complete, given a Cauchy sequence f_n one first
constructs a proposed limit function f, often the pointwise limit. One then checks that f
actually lies in the desired space, and that f_n → f in the desired norm.
In the previous section we investigated the special case of operators valued in the
scalar field R. Thus it is natural to focus on the space B(X, R) of bounded linear
functionals on a space X. We have previously hinted that the bounded linear functionals play the
role of dual vectors (or row vectors) in infinite-dimensional spaces. We are now ready to
make this a formal definition.

19.3. DEFINITION. If X is a normed (real) vector space then the dual of X is X* =
B(X, R).

By the results of this section, the dual space X* is always a Banach space with the
operator norm. By Corollary 18.4, the elements of the dual
space are plentiful in the sense that they separate the points of X. In fact, now that we have
introduced the operator norm, we can strengthen two of the statements in Corollary 18.4.

19.4. COROLLARY. Let X be a normed vector space, Y ≤ X a subspace of X.
(a) Any φ₀ ∈ Y* extends to an element φ ∈ X* such that ‖φ‖ = ‖φ₀‖.
(b) If Y is closed and z ∉ Y, then there exists φ ∈ X* such that φ(Y) = 0 and φ(z) ≠ 0.
Moreover one may take ‖φ‖ = 1 and φ(z) = inf_{y∈Y} ‖y + z‖.

PROOF. (a) Since φ is an extension of φ₀, we always have ‖φ‖ ≥ ‖φ₀‖. And in the
proof of Corollary 18.4(a) we may take M = ‖φ₀‖, so it is apparent that ‖φ‖ ≤ ‖φ₀‖.
(b) Recall that in the proof of Corollary 18.4(b), we showed one may define φ₀ on
Y + Rz in such a way that φ₀(Y) = 0 and φ₀(z) = inf_{y∈Y} ‖y + z‖ and φ₀ ≤ ‖·‖. It is not too
difficult to argue that with this definition, ‖φ₀‖ = 1. Therefore by part (a) we can extend
φ₀ to φ with ‖φ‖ = 1 as desired. □
We close this section with the following result about the double dual of a space.

19.5. PROPOSITION. Let X be a normed vector space. Then there is a norm-preserving
operator from X into its double dual X**.

PROOF. We define an embedding x ↦ x̂ from X to X** as follows. Given an element
x ∈ X, we define the corresponding element x̂ ∈ X** by the formula:

x̂(φ) = φ(x)

So x̂ is a function from X* to R. And x̂ is linear because x̂(φ₁ + φ₂) = (φ₁ + φ₂)(x) =
φ₁(x) + φ₂(x) = x̂(φ₁) + x̂(φ₂), and similarly for scalar multiples. Next x̂ is bounded
because |x̂(φ)| = |φ(x)| ≤ ‖φ‖ · ‖x‖. Thus we have verified that x̂ ∈ X**.
Now we verify that the map x ↦ x̂ is itself linear. Indeed, writing (x₁ + x₂)ˆ for the
element of X** corresponding to x₁ + x₂, we have (x₁ + x₂)ˆ(φ) =
φ(x₁ + x₂) = φ(x₁) + φ(x₂) = x̂₁(φ) + x̂₂(φ). And similarly for scalar multiplication.
Finally to show that x ↦ x̂ is norm-preserving, note that the calculation in the first
paragraph shows that ‖x̂‖ ≤ ‖x‖. To show that ‖x̂‖ ≥ ‖x‖, we use Corollary 19.4(a) to
obtain φ ∈ X* such that φ(x) = ‖x‖ and ‖φ‖ = 1 (for instance by extending the functional
cx ↦ c‖x‖ defined on the line Rx). Thus we have that x̂(φ) = φ(x) =
‖x‖ = ‖x‖‖φ‖, which witnesses that ‖x̂‖ ≥ ‖x‖. □
The above proposition may seem esoteric, but it has many uses. For example, if X is
incomplete then we can use it to give a concrete construction of the Banach space
completion of X. For this, observe that since x ↦ x̂ is norm-preserving, we have that X is
isomorphic to its image X̂ in X**. Since we have shown above that every dual is
complete, we know that X** is complete. It follows that the completion of X is isomorphic to
the closure of X̂ in X**.
It can even happen that the map x ↦ x̂ is surjective onto X**. This special property
will be investigated in future sections.

EXERCISE 19.1. Show that ‖T‖ is equal to sup { ‖Tx‖ : ‖x‖ ≤ 1 }.

EXERCISE 19.2 (BBT, ex 12:7.2). Show that ‖x‖ = sup { |φ(x)| : φ ∈ X* and ‖φ‖ = 1 }.

EXERCISE 19.3 (BBT, ex 17:7.5). If X, Y are Banach spaces and T ∈ B(X, Y), show that
(T*φ)(x) = φ(Tx) defines an element T* of B(Y*, X*) such that ‖T*‖ = ‖T‖.

EXERCISE 19.4 (BBT, ex 12:7.6(b)). Show that a Banach space X is reflexive if and only
if X* is reflexive.

EXERCISE 19.5. Complete the proof of Corollary 19.4(b).



§20. Three results on Banach spaces

READING. BBT §12.11, 12.13, 12.14

In our introduction to normed vector spaces, we singled out the special case when the
space is complete and called it a Banach space. However in our investigation we have
said very little that is special to Banach spaces. In this section we present several key
results that are essentially unique to Banach spaces because they rely on the completeness
property.
The three key results we will present are called the uniform boundedness principle,
the open mapping theorem, and the closed graph theorem. In each case, rather than
provide a proof we will state the result and give a sample application.

20.1. THEOREM (Uniform boundedness principle). Let X, Y be Banach spaces and F a
family of bounded operators from X to Y. Suppose that for all x ∈ X there exists a constant M_x
such that for all T ∈ F we have ‖Tx‖ ≤ M_x. Then there exists a constant M such that for all
T ∈ F we have ‖T‖ ≤ M.

It is often remarked that the uniform boundedness principle sounds too good to be
true—it has a pointwise hypothesis and a uniform conclusion. Regardless, it is true and
has a short proof from the Baire category theorem for complete metric spaces. Because
of its power the uniform boundedness principle is used quite frequently. We present just
one simple consequence concerning pointwise convergence of operators.

20.2. COROLLARY. Let X, Y be Banach spaces and let T_n : X → Y be a sequence of bounded
operators. If T_n → T pointwise, then T is a bounded operator too.

PROOF. We have already observed that a pointwise limit of operators is an operator.
Hence it remains only to check that T is bounded. Now given any x ∈ X, since {T_n x} is
a convergent sequence of Y, it is necessarily a bounded sequence of Y. In other words,
the sequence {‖T_n x‖} is bounded. By the uniform boundedness principle, there exists a
constant M such that ‖T_n‖ ≤ M for all n. Thus for any x ∈ X we have

‖Tx‖ = ‖lim T_n x‖
= lim ‖T_n x‖
≤ lim sup ‖T_n‖‖x‖
≤ M‖x‖

In particular, T is bounded and ‖T‖ ≤ M. □
For our next result, recall that a function is continuous if the preimage of any open
set is open. A somewhat less used but still very important property is the reverse. A
function is called open if the image of any open set is open. In the case that a function has
an inverse, the open property simply means that the inverse is continuous. However it is
still a valuable property even for functions which are not bijections.

20.3. THEOREM (Open mapping theorem). Let X, Y be Banach spaces and T : X → Y be
a bounded operator. If T is onto, then T is open.

We present just one simple consequence here concerning equivalence of norms. If X
is a vector space with two norms, ‖·‖_a and ‖·‖_b, we say that the two norms are equivalent if
there exist constants c, d such that for all x ∈ X we have ‖x‖_a ≤ c‖x‖_b and ‖x‖_b ≤ d‖x‖_a.
In other words, norms are equivalent if their unit balls can be rescaled to fit inside
one another. For example, the space R² can be equipped with the usual Euclidean norm
‖(x, y)‖₂ = √(x² + y²), and also with the taxicab norm ‖(x, y)‖₁ = |x| + |y|. The Euclidean
norm has a circular unit ball, and the taxicab norm has a diamond shaped unit ball. The
diamond fits inside the circle, and the circle can be scaled down by a factor of √2 to fit
inside the diamond. Thus the two norms are equivalent.
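The equivalence of the Euclidean and taxicab norms on R² can be checked numerically. The sketch below tests the inequalities ‖v‖₂ ≤ ‖v‖₁ ≤ √2 ‖v‖₂ on random points; the function names are chosen only for this illustration, and the sampling is a sanity check rather than a proof.

```python
import math
import random

def norm2(v):
    """Euclidean norm on R^2."""
    return math.hypot(v[0], v[1])

def norm1(v):
    """Taxicab norm on R^2."""
    return abs(v[0]) + abs(v[1])

random.seed(2)
points = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]

# The diamond sits inside the circle, and the circle shrunk by sqrt(2)
# sits inside the diamond.
equivalent = all(
    norm2(v) <= norm1(v) + 1e-12 and norm1(v) <= math.sqrt(2) * norm2(v) + 1e-12
    for v in points
)

# Both constants are sharp: (1, 0) gives equality in the first inequality,
# and (1, 1) gives equality in the second.
sharp_first = norm1((1.0, 0.0)) - norm2((1.0, 0.0))
sharp_second = math.sqrt(2) * norm2((1.0, 1.0)) - norm1((1.0, 1.0))

print(equivalent, sharp_first, sharp_second)
```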

20.4. COROLLARY. Suppose X is a vector space which is a Banach space with respect to each
of two norms ‖·‖_a and ‖·‖_b. If there is a constant c such that ‖x‖_a ≤ c‖x‖_b for all x ∈ X,
then ‖·‖_a and ‖·‖_b are equivalent.

PROOF. Let id : X → X denote the identity mapping. If we consider id as an operator
from (X, ‖·‖_b) to (X, ‖·‖_a), then the hypothesis implies that id is bounded and hence
continuous. Since id is onto, it follows from the open mapping theorem that id is open,
that is, it maps open sets to open sets. Since id is a bijection, this simply means that id⁻¹
is continuous and hence bounded. Thus we conclude that there exists a constant d such
that ‖x‖_b ≤ d‖x‖_a for all x ∈ X. □
Before stating our final result, recall from topology that if f : X → Y is a continuous
function then f has a closed graph, that is, the set of pairs {(x, y) ∈ X × Y | f(x) = y} is
closed in X × Y. The next theorem states that the converse holds for linear operators
between Banach spaces.

20.5. THEOREM (Closed graph theorem). Let X, Y be Banach spaces and T : X → Y be an
operator. If the graph of T is a closed subset of X × Y, then T is bounded.

Observe that T has a closed graph if and only if x_n → x and Tx_n → y implies y = Tx.
On the other hand, recall that T is continuous if and only if x_n → x implies Tx_n → Tx. So
it is easier to check that T has a closed graph than to check that T is continuous, because
when checking the former one can assume for free that Tx_n converges to something.
Rather than give a consequence of the closed graph theorem, we will give an
important example. Let C[a, b] be the Banach space of continuous functions on [a, b] with the
supremum norm, and let D[a, b] be the subspace of all functions with continuous
derivative. Let D : D[a, b] → C[a, b] be the derivative operator. To check D has a closed graph,
we suppose that f_n → f and Df_n → g in supremum norm and verify that Df = g. For
this we integrate both sides of Df_n → g and use the fundamental theorem of calculus to
conclude that f_n → G, where G is an antiderivative of g. Since f_n converges to f, we have
f = G. Now differentiating both sides we conclude that Df = g, as desired.

While we have just checked that D has a closed graph, it is also easy to check that D is not
bounded. Thus the contrapositive of the closed graph theorem implies that D[a, b] is not
a Banach space! This is an admittedly somewhat silly way to see this fact, since it is also
possible to argue directly that D[a, b] is not complete.
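To see concretely why the derivative operator is not bounded in the supremum norm, one can look at the monomials x^n on the interval [0, 1] (an assumed choice of [a, b] for this sketch). Each has supremum norm 1, while its derivative has supremum norm n, so no single constant M can satisfy ‖Df‖ ≤ M‖f‖. The helper names below are hypothetical, and the supremum is approximated on a grid that includes the endpoint where the maximum occurs.

```python
def sup_norm(f, a=0.0, b=1.0, steps=1000):
    """Approximate the supremum norm of f on [a, b] by sampling a grid."""
    return max(abs(f(a + (b - a) * k / steps)) for k in range(steps + 1))

def monomial(n):
    return lambda x: x ** n

def monomial_derivative(n):
    return lambda x: n * x ** (n - 1)

# Ratio ||D f_n|| / ||f_n|| for f_n(x) = x^n, n = 1, ..., 5.
ratios = [
    sup_norm(monomial_derivative(n)) / sup_norm(monomial(n))
    for n in range(1, 6)
]

print(ratios)
```

The ratios grow linearly in n, which is exactly the failure of boundedness.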

EXERCISE 20.1. Give an example of a function from R to R which has a closed graph
but is not continuous. Give an example of a function from R to R which is continuous and
surjective but not open. Is it possible to give a bijective example?

EXERCISE 20.2 (BBT, ex 12:13.2). Equip the space C[0, 1] with both the L¹ norm and the
supremum norm. Show that the L¹ norm is bounded by a constant times the supremum
norm. Show that the reverse is not true. Explain why the two results do not contradict
Corollary 20.4.

§21. The Banach space L^p

READING. BBT §13.1, 13.2

As we have seen, many of the most important Banach spaces are function spaces
arising in other areas of analysis. We have already seen the Banach space L¹ of absolutely
integrable functions, and we have seen that there are several other norms derived from
summation and integration. In this section we further investigate the L^p-spaces, which
generalize many of these important examples.

21.1. DEFINITION. Let (X, B) be a measurable space and let µ be a measure on it. For
any measurable f defined on X we let

‖f‖_p = ( ∫ |f|^p dµ )^{1/p}

We then define the space

L^p(µ) = { f | f is a measurable function on X and ‖f‖_p < ∞ }

with the understanding that f, g are identified when f − g = 0 almost everywhere.

Thus the spaces L¹(X) and L²(X) are each examples of L^p-spaces, but so are a variety
of others. When X is the finite set {1, . . . , n} with the counting measure, the resulting
space is just R^n with its p-norm. When X = N with the counting measure, the resulting
space is the sequence space ℓ^p with its p-norm.
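On a finite set the integral in the definition is a finite weighted sum, which makes the p-norm easy to compute. The following sketch, with hypothetical names and data, evaluates ‖f‖_p for the counting measure and for a measure that gives one point mass zero.

```python
def lp_norm(f, mu, p):
    """(sum over x of |f(x)|^p * mu({x}))^(1/p) for a measure mu on a finite set."""
    total = sum(abs(fx) ** p * weight for fx, weight in zip(f, mu))
    return total ** (1.0 / p)

f = [3.0, -4.0]              # a function on the two-point set {1, 2}
counting = [1.0, 1.0]        # counting measure

norm_1 = lp_norm(f, counting, 1)    # |3| + |-4|
norm_2 = lp_norm(f, counting, 2)    # sqrt(9 + 16)

# With a measure that ignores the second point, only f(1) contributes.
weighted = lp_norm(f, [0.5, 0.0], 2)

print(norm_1, norm_2, weighted)
```

The last value illustrates why functions that agree almost everywhere are identified: changing f on a set of measure zero does not change the norm.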
The rest of this section is devoted to verifying that whenever p ≥ 1, L^p really is a
Banach space with respect to the norm ‖·‖_p. Before we can prove this result, it is necessary
to establish the following fundamental inequality.

21.2. THEOREM (Hölder’s inequality). Let p, q > 1 be real numbers such that 1/p +
1/q = 1. If f ∈ L^p(µ) and g ∈ L^q(µ), then f g is absolutely integrable and

∫ |f g| dµ ≤ ‖f‖_p ‖g‖_q

PROOF. The theorem follows from a classical inequality which we will call Hölder’s
inequality for real numbers (it is often called Young’s inequality): for all real numbers
a, b ≥ 0,

ab ≤ a^p/p + b^q/q

There are many proofs and one is left as an exercise.
To begin the proof, note that given f, g as in the theorem statement, we can rescale to
assume that ‖f‖_p = ‖g‖_q = 1. This is because both ‖·‖_p and ‖·‖_q satisfy the positive
homogeneity property. (If either norm is zero then f g = 0 almost everywhere and there is
nothing to prove.)
Now our objective is to show that ∫ |f g| dµ ≤ 1. For this we plug a = |f(x)| and
b = |g(x)| into Hölder’s inequality for real numbers to obtain

|f(x) g(x)| ≤ (1/p) |f(x)|^p + (1/q) |g(x)|^q

Taking the integral of both sides we have

∫ |f g| dµ ≤ (1/p) (‖f‖_p)^p + (1/q) (‖g‖_q)^q = 1/p + 1/q = 1

as desired. □
The next result, known officially as Minkowski’s inequality, states that the norms ‖·‖_p
satisfy the triangle inequality.

21.3. THEOREM (Minkowski’s inequality). Suppose that p ≥ 1. If f, g ∈ L^p(µ), then
‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p.

PROOF. We can assume without loss of generality that f, g never take the value ∞. The
case p = 1 follows by integrating the pointwise inequality |f + g| ≤ |f| + |g|, so assume
p > 1. We begin by writing

|f(x) + g(x)|^p = |f(x) + g(x)| · |f(x) + g(x)|^{p−1}
≤ |f(x)| · |f(x) + g(x)|^{p−1} + |g(x)| · |f(x) + g(x)|^{p−1}

We now integrate both sides of this inequality, and then apply Hölder’s inequality to
each of the resulting terms. In the following calculation, we also note that our hypothesis
implies that the value q used in Hölder’s inequality is equal to p/(p − 1). Here is the
computation:

(‖f + g‖_p)^p ≤ ∫ |f| · |f + g|^{p−1} dµ + ∫ |g| · |f + g|^{p−1} dµ
≤ ‖f‖_p · ‖ |f + g|^{p−1} ‖_q + ‖g‖_p · ‖ |f + g|^{p−1} ‖_q
= ‖f‖_p · ‖ |f + g|^{p−1} ‖_{p/(p−1)} + ‖g‖_p · ‖ |f + g|^{p−1} ‖_{p/(p−1)}
= ‖f‖_p · (‖f + g‖_p)^{p−1} + ‖g‖_p · (‖f + g‖_p)^{p−1}
= (‖f‖_p + ‖g‖_p)(‖f + g‖_p)^{p−1}

We may now divide both sides by (‖f + g‖_p)^{p−1} to obtain the desired conclusion (if
‖f + g‖_p = 0 there is nothing to prove). □
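Both Hölder’s and Minkowski’s inequalities can be sanity-checked on finite sequences, which correspond to the counting measure on a finite set. The sketch below uses the assumed exponent p = 3 and random test vectors; the names are hypothetical, and a passing check is only evidence, not a proof.

```python
import random

def lp_norm(values, p):
    """p-norm of a finite sequence, i.e. the L^p norm for counting measure."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

p = 3.0
q = p / (p - 1.0)    # conjugate exponent, so that 1/p + 1/q = 1

random.seed(3)
holder_ok = True
minkowski_ok = True
for _ in range(1000):
    f = [random.uniform(-2, 2) for _ in range(5)]
    g = [random.uniform(-2, 2) for _ in range(5)]
    integral_fg = sum(abs(a * b) for a, b in zip(f, g))
    f_plus_g = [a + b for a, b in zip(f, g)]
    # Hoelder: sum |f g| <= ||f||_p * ||g||_q
    holder_ok = holder_ok and integral_fg <= lp_norm(f, p) * lp_norm(g, q) + 1e-9
    # Minkowski: ||f + g||_p <= ||f||_p + ||g||_p
    minkowski_ok = minkowski_ok and (
        lp_norm(f_plus_g, p) <= lp_norm(f, p) + lp_norm(g, p) + 1e-9
    )

print(holder_ok, minkowski_ok)
```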
We are now ready to prove that the L^p spaces are in fact Banach spaces.

21.4. THEOREM. The space L^p(µ) with the norm ‖·‖_p is a Banach space.

PROOF. It is clear that the norm is homogeneous and non-vanishing, and we have
just shown it satisfies the triangle inequality. This also implies that L^p(µ) is closed under
linear combinations and therefore it is a vector space. So it only remains to show that the
norm ‖·‖_p is complete.
For this let f_n be a sequence of elements of L^p(µ) which is Cauchy in the ‖·‖_p norm.
Passing to a subsequence if necessary (a Cauchy sequence with a convergent subsequence
converges), we can suppose without loss of generality that for
all n we have ‖f_{n+1} − f_n‖_p < 1/2^n. We first wish to show that this implies f_n has a
pointwise limit f.
Let g_k = ∑_{n=1}^{k} |f_{n+1} − f_n| and g = ∑_{n=1}^{∞} |f_{n+1} − f_n|. So g is the limit of the g_k. While the
function g may take the value +∞, we claim that this cannot happen too often. Indeed by
Minkowski’s inequality we have

‖g_k‖_p ≤ ∑_{n=1}^{k} ‖f_{n+1} − f_n‖_p ≤ ∑_{n=1}^{∞} 1/2^n = 1

It then follows from Fatou’s lemma that ∫ |g|^p ≤ lim inf ∫ |g_k|^p ≤ 1. Thus we can conclude
that g is finite µ-almost everywhere.
Now using a simple telescoping we can write:

lim_{n→∞} f_n(x) = lim_{k→∞} [ f₁(x) + ∑_{n=1}^{k} (f_{n+1}(x) − f_n(x)) ]

We have just observed that for µ-almost every x, the latter series is absolutely convergent.
Thus the series is convergent, and we can define the function f(x) = lim f_n(x).
It remains only to show that f lies in L^p and that f_n → f in the norm ‖·‖_p. For this,
let ε > 0 be given and choose N large enough that m, n ≥ N implies ‖f_m − f_n‖_p < ε. Fixing
n ≥ N and applying Fatou’s Lemma to the resulting m-sequence we obtain

∫ |f − f_n|^p ≤ lim inf_m ∫ |f_m − f_n|^p ≤ ε^p

This shows that ‖f − f_n‖_p → 0, or in other words that f_n → f in the norm of L^p. Finally
if n ≥ N then we have ‖f‖_p ≤ ‖f − f_n‖_p + ‖f_n‖_p < ∞, so f ∈ L^p too. □
While the results above apply to values of p such that 1 ≤ p < ∞, there is also a
version of L^p space for p = ∞.

21.5. DEFINITION. Let (X, B) be a measurable space and let µ be a measure on it. For
any measurable f defined on X we let

‖f‖_∞ = inf { M | |f(x)| ≤ M for µ-almost all x }

We then define the space

L^∞(µ) = { f | f is a measurable function on X and ‖f‖_∞ < ∞ }

The norm ‖f‖_∞ is called the essential supremum of f, and the members of L^∞ are said
to be essentially bounded. We will leave as an exercise the following generalizations of our
results for L^p to the case p = ∞.
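The difference between the supremum and the essential supremum is easiest to see on a finite set where some points have measure zero. In that setting the essential supremum is simply the largest value of |f| over the points of positive measure. The sketch below uses hypothetical data to illustrate this special case only.

```python
def ess_sup(f, mu):
    """Smallest M with |f(x)| <= M outside a set of mu-measure zero.

    On a finite set this is the maximum of |f| over points of positive mass.
    """
    return max((abs(fx) for fx, mass in zip(f, mu) if mass > 0), default=0.0)

f = [1.0, 100.0, -2.0]     # a function on a three-point set
mu = [0.5, 0.0, 0.5]       # the middle point is a set of measure zero

essential = ess_sup(f, mu)                 # ignores the value 100 on the null set
ordinary = max(abs(fx) for fx in f)        # the plain supremum sees it

print(essential, ordinary)
```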

21.6. THEOREM. Let (X, B) be a measurable space and µ a measure on it.
◦ If f ∈ L¹(µ) and g ∈ L^∞(µ) then f g is absolutely integrable and Hölder’s inequality is
true: ∫ |f g| dµ ≤ ‖f‖₁ · ‖g‖_∞.
◦ The space L^∞(µ) is a Banach space with the norm ‖·‖_∞.

EXERCISE 21.1 (BBT, ex 13:1.2). Show that the inequality ‖f + g‖₁ ≤ ‖f‖₁ + ‖g‖₁ is
an equality precisely when there exists a nonnegative measurable function h such that g = f h
for almost every element x of the set where f, g ≠ 0.

EXERCISE 21.2. Recall the argument from Theorem 12.6 that the absolutely integrable
simple functions are dense in L¹. Show that the absolutely integrable simple functions are
dense in L^p.

EXERCISE 21.3. Prove Theorem 21.6.

EXERCISE 21.4. Prove Hölder’s inequality for real numbers. (Should probably give a
hint.)

§22. The dual space of L^p

READING. BBT §13.6, Tao “An epsilon of room” §1.2

Recall that if X is a normed vector space, then its dual X* consists of all bounded
linear functionals on X. Although it is somewhat rare to be able to describe the space X*
completely, in this section we will be able to provide a complete description of (L^p)*.
The starting point in our search for bounded linear functionals on L^p is actually Hölder’s
inequality: if p, q are conjugate exponents (that is, 1/p + 1/q = 1), then for any f ∈ L^p and
g ∈ L^q we have:

|∫ f g dµ| ≤ ‖f‖_p · ‖g‖_q

This statement really says that for any g ∈ L^q, the linear functional φ on L^p defined
by

φ(f) = ∫ f g dµ

is in fact bounded. Thus we have already found a large supply of elements of (L^p)*. Our
main result says that every element of (L^p)* arises in this way.

22.1. THEOREM. Let (X, B) be a measurable space, µ a measure on it, and L^p = L^p(µ). Assume
that µ is σ-finite, that is, X = ⋃ A_n where µ(A_n) < ∞. Let 1 ≤ p < ∞ and let q be the conjugate
exponent, that is, 1/p + 1/q = 1. Then for every φ ∈ (L^p)* there exists a unique g ∈ L^q such
that

φ(f) = ∫ f g dµ

Moreover ‖φ‖ = ‖g‖_q, and (L^p)* ≅ L^q.

To prove this result, it will be necessary to introduce a generalization of measures
called signed measures. To motivate this from the study of linear functionals on L^p,
observe that any functional φ on L^p gives rise to something like a finitely additive measure
by defining ν(A) = φ(χ_A). Indeed, if A, B are disjoint sets then

ν(A ∪ B) = φ(χ_{A∪B}) = φ(χ_A + χ_B) = φ(χ_A) + φ(χ_B) = ν(A) + ν(B)

However there is no reason why such a function ν can’t take negative values, and so ν
need not be a measure in our original sense.

22.2. DEFINITION. Let (X, B) be a measurable space. A signed measure on (X, B) is a
function ν : B → [−∞, ∞] with the properties:
◦ ν(∅) = 0;
◦ ν does not take both the values ∞ and −∞; and
◦ If A_n are disjoint then ∑ ν(A_n) converges to ν(⋃ A_n).

Thus given a linear functional φ on L^p, the function ν described above could be called
a finitely additive signed measure. For a proper example of a signed measure, let µ be an
unsigned measure on (X, B), let g be any absolutely integrable function on X, and define

µ_g(A) = ∫ χ_A g dµ

Then it is not difficult to see from Fubini’s theorem that µ_g is a signed measure on (X, B).
It is clear from the definition of µ_g that if µ(A) = 0 then µ_g(A) = 0 too. The next
result states that this condition is sufficient to guarantee that a given signed measure ν is
actually of the form µ_g.

22.3. THEOREM (Radon–Nikodym). Let (X, B) be a measurable space and let µ be an
unsigned measure and ν be a signed measure on it. Assume that µ, ν are σ-finite. If
µ(A) = 0 ⟹ ν(A) = 0 for all A ∈ B, then there exists g ∈ L¹(µ) such that ν = µ_g.
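On a finite set the Radon–Nikodym theorem reduces to elementary arithmetic, which may help to demystify it. In the sketch below (all data and names are hypothetical) we build the signed measure µ_g from a density g, observe that it takes a negative value, and then recover a density from the point masses by g(x) = ν({x})/µ({x}) wherever µ({x}) > 0.

```python
points = [0, 1, 2]
mu = {0: 0.5, 1: 2.0, 2: 0.0}      # unsigned measure given by point masses
g = {0: 4.0, 1: -1.0, 2: 7.0}      # density; its value at the null point 2 is irrelevant

def mu_g(A):
    """The signed measure mu_g(A) = integral of chi_A * g d(mu) on a finite set."""
    return sum(g[x] * mu[x] for x in A)

def density(nu_point, mu):
    """Recover a density from point masses; set it to 0 on mu-null points."""
    return {x: (nu_point[x] / mu[x] if mu[x] > 0 else 0.0) for x in mu}

nu_point = {x: mu_g([x]) for x in points}
recovered = density(nu_point, mu)

# The recovered density reproduces mu_g on every subset of the space.
subsets = [[], [0], [1], [2], [0, 1], [0, 2], [1, 2], [0, 1, 2]]
reproduces = all(
    abs(sum(recovered[x] * mu[x] for x in A) - mu_g(A)) < 1e-12 for A in subsets
)

print(mu_g([1]), recovered, reproduces)
```

Note that the recovered density differs from g at the point of measure zero, which reflects the fact that the density is only determined up to µ-almost everywhere equality.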

We now have all the ingredients we need to prove that the dual of L^p is L^q. Indeed,
we have already introduced a simple correspondence between linear functionals φ and
signed measures ν with the property that ν(A) = φ(χ_A). Then the Radon–Nikodym
theorem gives us a correspondence between signed measures ν and absolutely integrable
functions g such that ν(A) = ∫ χ_A g dµ. Putting these together, we see that φ(χ_A) =
∫ χ_A g dµ. In other words we see that φ is of the desired form, at least for the characteristic
functions. We are therefore left to check that this property can be extended to arbitrary
functions f ∈ L^p, as well as the rest of the claims in the statement.
SKETCH OF PROOF OF THEOREM 22.1. In this proof, we will sketch only the case when
1 < p < ∞ and µ(X) < ∞. It is not essentially more difficult to complete the proof from
this simplified version.
Given a functional φ ∈ (L^p)*, we first define the mapping ν(A) = φ(χ_A). We have
already checked that ν is finitely additive. We claim that in fact ν is a signed measure.
Indeed, if A_n is a given sequence of pairwise disjoint sets, then using the finiteness of
µ(X) and the dominated convergence theorem we have:

∫ |χ_{⋃A_n} − χ_{⋃_{n=1}^{k} A_n}|^p dµ = ∫ |χ_{⋃_{n=k+1}^{∞} A_n}|^p dµ → 0

In other words, we have that χ_{⋃_{n=1}^{k} A_n} → χ_{⋃A_n} in the L^p-norm. Using the fact that φ is a
continuous function on L^p, it follows that

ν(⋃ A_n) = φ(χ_{⋃A_n})
= lim_k φ(χ_{⋃_{n=1}^{k} A_n})
= lim_k ν(⋃_{n=1}^{k} A_n)
= lim_k ∑_{n=1}^{k} ν(A_n)
= ∑ ν(A_n)

and so ν is countably additive.

Now by the Radon–Nikodym theorem, there exists a function g ∈ L¹ such that φ(χ_A) =
∫ χ_A g dµ for all sets A. (The theorem applies because µ(A) = 0 implies that χ_A = 0 in L^p,
and hence ν(A) = φ(χ_A) = 0.) It therefore follows from linearity that φ(f) = ∫ f g dµ for
all simple functions f.
We next claim that φ(f) = ∫ f g dµ for all functions f in L^∞. For this recall that any
bounded measurable function is a uniform limit of simple functions. So given f ∈ L^∞
let f_n be a sequence of simple functions such that f_n → f uniformly. Using the uniform
convergence on a finite measure space, we can easily argue that ∫ f_n g dµ → ∫ f g dµ. For
the same reason, we can also argue that f_n → f in L^p. Since φ is continuous on L^p we
therefore have that:

φ(f) = lim_n φ(f_n)
= lim_n ∫ f_n g dµ
= ∫ f g dµ

as desired.
While our next goal is of course to show that φ(f) = ∫ f g dµ for all functions f ∈ L^p,
we first take a break and show that g lies in L^q. In fact we will show that ‖g‖_q ≤ ‖φ‖. First
we can use the truncation lemma to suppose that g is bounded. Then |g|^q/g (interpreted
as 0 where g = 0) is bounded too, and so by our work for functions in L^∞ we can calculate:

∫ |g|^q dµ = ∫ (|g|^q/g) g dµ
= φ(|g|^q/g)
≤ ‖φ‖ · ‖ |g|^{q−1} ‖_p
= ‖φ‖ · ( ∫ |g|^q dµ )^{1/p}

Here the last step uses (q − 1)p = q. Dividing both sides by ( ∫ |g|^q dµ )^{1/p} and using
1 − 1/p = 1/q, this inequality implies that ‖g‖_q ≤ ‖φ‖, as desired. We remark that this
implies the functional f ↦ ∫ f g dµ is continuous, since Hölder’s inequality states that
|∫ f g dµ| ≤ ‖f‖_p · ‖g‖_q.
We now claim that we have $\phi(f) = \int f g \,d\mu$ for any $f \in L^p$. For this, recall that we
have previously shown that the simple functions are dense in $L^1$, that is, any $L^1$ function is
an $L^1$-limit of simple functions. The same argument can be used to show that the simple
functions are dense in $L^p$. We have shown above that the two functionals, $f \mapsto \phi(f)$ and
$f \mapsto \int f g \,d\mu$, agree on the simple functions. Our hypothesis states that $\phi$ is continuous, and
the previous paragraph implies that $f \mapsto \int f g \,d\mu$ is continuous too. Since two continuous
functions that agree on a dense set must agree on their domain, we can conclude that
$\phi(f) = \int f g \,d\mu$ for all $f \in L^p(\mu)$.
Finally we claim that $\|g\|_q = \|\phi\|$. Indeed we now know that $|\phi(f)| = |\int f g \,d\mu| \le
\|f\|_p \cdot \|g\|_q$, and hence that $\|\phi\| \le \|g\|_q$. We have also shown two paragraphs previously
that $\|g\|_q \le \|\phi\|$. This concludes the proof. ∎
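The equality $\|g\|_q = \|\phi\|$ can be checked numerically in the simplest setting. The following is a minimal sketch, assuming a finite measure space made of four point masses; the names `weights`, `g`, `lp_norm`, `pairing` and `extremal` are illustrative choices, not notation from these notes. It evaluates the functional $f \mapsto \int f g \,d\mu$ on the test function $|g|^q/g$ used in the proof and compares the resulting ratio with $\|g\|_q$.

```python
def lp_norm(f, weights, p):
    """p-norm of f with respect to the finite measure given by `weights`."""
    return sum(w * abs(v) ** p for v, w in zip(f, weights)) ** (1.0 / p)

def pairing(f, g, weights):
    """The functional f -> integral of f*g d(mu) on a finite measure space."""
    return sum(w * a * b for a, b, w in zip(f, g, weights))

def extremal(g, q):
    """The test function |g|^q / g from the proof (taken to be 0 where g vanishes)."""
    return [abs(v) ** q / v if v != 0 else 0.0 for v in g]

p = 3.0
q = p / (p - 1.0)                  # conjugate exponent, 1/p + 1/q = 1
weights = [0.5, 1.0, 2.0, 0.25]    # hypothetical point masses mu({i})
g = [1.5, -2.0, 0.0, 4.0]          # hypothetical density g

f = extremal(g, q)
ratio = abs(pairing(f, g, weights)) / lp_norm(f, weights, p)
g_norm = lp_norm(g, weights, q)
print(ratio, g_norm)               # the two values should agree up to rounding
```

In this finite setting the ratio $|\phi(f)| / \|f\|_p$ at $f = |g|^q/g$ coincides with $\|g\|_q$, which is the lower bound $\|g\|_q \le \|\phi\|$; Hölder’s inequality supplies the matching upper bound.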

NOTES AND FURTHER READING. The Radon–Nikodym theorem has a generalization
called the Lebesgue–Radon–Nikodym theorem, which states that given $\mu$, any signed
measure $\nu$ can be decomposed as $\nu = \mu_g + \delta$, where $\mu_g$ is the measure with density $g$
with respect to $\mu$, and where $\mu, \delta$ have disjoint supports.
We have stated that the dual of $L^1$ is equal to $L^\infty$ when $\mu$ is σ-finite. However the
reverse is usually not true: the dual of $L^\infty$ is in general not $L^1$. Instead the dual of $L^\infty(\mu)$
is the space of all bounded finitely additive signed measures $\nu$ such that
$\mu(E) = 0 \implies \nu(E) = 0$.

EXERCISE 22.1 (BBT, ex 13:6.1). Let $g \in L^1[0, 1]$. Show that the map $f \mapsto \int f g$ is a
bounded linear functional on $L^\infty[0, 1]$.

EXERCISE 22.2 (BBT, ex 13:6.2). Show that there is a nonzero bounded linear functional
on $L^\infty[0, 1]$ that vanishes on the (closed) subspace of continuous functions.

EXERCISE 22.3 (BBT, ex 13:6.3). Show that there is a bounded linear functional on
$L^\infty[0, 1]$ that is not of the form $f \mapsto \int f g$ for any $g \in L^1[0, 1]$.
23. HILBERT SPACE 77

§23. Hilbert space

READING. BBT §13.5, 14.1, 14.2, 14.3


Up until this point in our study of $L^p$ spaces, we have not been concerned with the
value of $p$ so long as $1 < p < \infty$. However it should not be surprising that there is
something special about the case $p = 2$. In this section we will uncover some of the
special properties of $L^2$, as well as use these properties to define a new type of space
called a Hilbert space.
Informally, $L^2$ is the $L^p$ space which is most closely analogous to classical Euclidean
space. This is because the norm $\|\cdot\|_2$ is a generalization of the Euclidean norm
$\|x\| = \sqrt{x_1^2 + \cdots + x_n^2}$. Intuitively, this means that $L^2$ has the “round ball”
geometry of finite-dimensional Euclidean space.
More formally, $L^2$ shares several key properties with classical Euclidean space that are
not shared by any other $L^p$. First, since the conjugate exponent of $p = 2$ is $q = 2$ also, the
previous section shows that $L^2$ is self-dual, that is, $(L^2)^* \cong L^2$. In detail, this means that
the bounded linear functionals on $L^2$ are all of the form $f \mapsto \int f g$ where $g \in L^2$ itself.
This recalls the case of Euclidean space $\mathbb{R}^d$ where the (bounded) linear functionals are all
given by an inner product $x \mapsto y^T x$.
The key idea of this section is that just like the pairing $y^T x$, the pairing $\int f g$ may be
regarded as an inner product, leading to the next definition. For the greatest generality,
we will now return to vector spaces with complex scalars.
23.1. DEFINITION. Let $X$ be a complex vector space. A function $\langle\cdot,\cdot\rangle : X \times X \to \mathbb{C}$ is
called an inner product if it satisfies
(a) (positivity/nonvanishing) for $x \in X$ we have $\langle x, x\rangle \ge 0$, and $\langle x, x\rangle = 0$ iff $x = 0$;
(b) (conjugate symmetry) for $x, y \in X$ we have $\langle x, y\rangle = \overline{\langle y, x\rangle}$; and
(c) (linearity in the first coordinate) $\langle c_1 x_1 + c_2 x_2, y\rangle = c_1 \langle x_1, y\rangle + c_2 \langle x_2, y\rangle$.
If $X$ admits an inner product then it automatically admits a norm $\|x\| = \sqrt{\langle x, x\rangle}$, and $X$
is called a Hilbert space if it is complete with respect to this norm.
We will prove shortly that the mapping $\|\cdot\|$ defined above really is a norm. We remark
that (b) and (c) together imply that $\langle\cdot,\cdot\rangle$ is conjugate linear in the second coordinate (we
leave it to the reader to state and verify this formally).
Thus the Banach space $X = L^2(\mu)$ (with the complex scalars) is a Hilbert space with
respect to the inner product
\[ \langle f, g\rangle = \int f \bar{g} \,d\mu \]
Similarly, the sequence space $\ell^2 = \{ x \in \mathbb{C}^{\mathbb{N}} \mid \sum |x_i|^2 < \infty \}$ is a Hilbert space with respect
to the inner product
\[ \langle x, y\rangle = y^* x = \sum x_i \bar{y}_i \]
Of course $\ell^2$ is really just an instance of $L^2(\mu)$, corresponding to the case when $\mu$ is the
counting measure on $\mathbb{N}$.
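To make the axioms concrete, here is a small Python sketch of the inner product $\langle x, y\rangle = \sum x_i \bar{y}_i$ restricted to finitely many coordinates. The function names `inner` and `norm` and the sample vectors are assumptions made for illustration. The sketch measures how far conjugate symmetry, linearity in the first coordinate, and conjugate linearity in the second coordinate are from holding exactly.

```python
def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    """Norm induced by the inner product, ||x|| = sqrt(<x, x>)."""
    return inner(x, x).real ** 0.5

x = [1 + 2j, 3 - 1j]
y = [2 - 1j, 1j]
c = 2 + 3j

# Axiom (b): <x, y> equals the complex conjugate of <y, x>.
conj_symmetry_gap = abs(inner(x, y) - inner(y, x).conjugate())
# Axiom (c): scalars pull out of the first slot unchanged.
first_slot_gap = abs(inner([c * a for a in x], y) - c * inner(x, y))
# Consequence of (b) and (c): scalars pull out of the second slot conjugated.
second_slot_gap = abs(inner(x, [c * b for b in y]) - c.conjugate() * inner(x, y))

print(conj_symmetry_gap, first_slot_gap, second_slot_gap, norm(x) ** 2)
```

All three gaps should vanish up to rounding, and `norm(x) ** 2` recovers $\sum |x_i|^2$.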

While a Hilbert space may seem like just a small “upgrade” from a Banach space, it is
quite significant. In fact, we will see in the next section that $L^2$ and $\ell^2$ are essentially the
only examples of Hilbert spaces.
We now lay out some of the most basic facts about Hilbert space. Our first result is the
following analog of Hölder’s inequality.

23.2. THEOREM (Schwarz inequality). Let $X$ be an inner product space. Then
$|\langle x, y\rangle| \le \|x\| \cdot \|y\|$.

PROOF. The proof is a simpler version of the proof of Hölder’s inequality. We may assume
$x \ne 0$, since otherwise the result is trivial. First, by multiplying $x$ by a scalar of the form
$e^{i\theta}$, we may assume that $\langle x, y\rangle$ is real. Next given $x, y$ we define a real function
$p(\alpha) = \langle \alpha x + y, \alpha x + y\rangle$ of the real variable $\alpha$. Then by axiom (a) we have that
$p(\alpha) \ge 0$. And by axioms (b) and (c) we have
\[ p(\alpha) = \alpha^2 \|x\|^2 + 2\alpha \langle x, y\rangle + \|y\|^2 \]
Thus $p$ is a quadratic and $p \ge 0$, which implies $p$ has at most one real root. This means that
the discriminant is non-positive, that is, $4|\langle x, y\rangle|^2 - 4\|x\|^2 \cdot \|y\|^2 \le 0$. This last inequality is
plainly equivalent to the desired result. ∎
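The discriminant argument can be traced numerically. The sketch below works in $\mathbb{C}^3$ with illustrative vectors; the helper names `rotate_to_real`, `p` and `discriminant` are assumptions, not notation from the notes. It rotates $x$ by a unit scalar so that $\langle x, y\rangle$ becomes real, evaluates the quadratic $p(\alpha)$, and computes its discriminant.

```python
import cmath

def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return inner(x, x).real ** 0.5

def rotate_to_real(x, y):
    """Multiply x by a unit scalar e^{i theta} so that <x, y> becomes real."""
    phase = cmath.exp(-1j * cmath.phase(inner(x, y)))
    return [phase * a for a in x]

def p(alpha, x, y):
    """p(alpha) = <alpha x + y, alpha x + y> for real alpha."""
    v = [alpha * a + b for a, b in zip(x, y)]
    return inner(v, v).real

def discriminant(x, y):
    """Discriminant 4<x,y>^2 - 4|x|^2|y|^2 of p after the rotation step."""
    xr = rotate_to_real(x, y)
    b = inner(xr, y).real
    return 4 * b * b - 4 * norm(xr) ** 2 * norm(y) ** 2

x = [1 + 2j, 3 - 1j, -2j]
y = [2 - 1j, 1j, 4 + 0j]
print(discriminant(x, y) <= 0, abs(inner(x, y)) <= norm(x) * norm(y))
```

When $y$ is a scalar multiple of $x$ the discriminant is zero up to rounding, which corresponds to the case of equality in the Schwarz inequality.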
We are now ready to prove the fact that every inner product space automatically has
a norm.
23.3. PROPOSITION. Let $X$ be an inner product space. Then $\|x\| = \sqrt{\langle x, x\rangle}$ makes $X$ into a
normed vector space.

PROOF. Since the nonvanishing and homogeneity properties are automatic from the
axioms, it remains only to verify the triangle inequality. For this we simply calculate:
\begin{align*}
\|x + y\|^2 &= \langle x + y, x + y\rangle \\
&= \langle x, x\rangle + \langle x, y\rangle + \langle y, x\rangle + \langle y, y\rangle \\
&\le \langle x, x\rangle + 2|\langle x, y\rangle| + \langle y, y\rangle \\
&\le \|x\|^2 + 2\|x\| \cdot \|y\| + \|y\|^2 \\
&= (\|x\| + \|y\|)^2
\end{align*}
Here, the first inequality uses axiom (b) and the ordinary triangle inequality, and the
second inequality uses the Schwarz inequality. Taking the square root of both sides, we
achieve the desired result. ∎
Perhaps the most important feature of Hilbert spaces that is not present in an ordinary
Banach space is that of orthogonality.

23.4. DEFINITION. Let $X$ be a Hilbert space. We say that vectors $x, y \in X$ are orthogonal
if $\langle x, y\rangle = 0$. Given a vector subspace $Y \subset X$ we define its orthogonal complement
$Y^\perp = \{ x \in X \mid (\forall y \in Y)\ \langle x, y\rangle = 0 \}$.

The orthogonal complement does not always behave as one would expect from classical
Euclidean space. For example, it is possible for a proper subspace $Y$ to have $Y^\perp = 0$
(this will be the case if $Y$ is dense in $X$). However if $Y$ is a closed subspace, then most
familiar properties do hold.

23.5. PROPOSITION. Let $X$ be a Hilbert space and let $Y \le X$ be a closed subspace. Then
$X = Y \oplus Y^\perp$ in the sense that every $x \in X$ can be uniquely expressed as $x = y + y'$ where $y \in Y$
and $y' \in Y^\perp$.

The idea behind the proof is as follows. Given $x$, let $y \in Y$ be the unique point in
$Y$ which is closest to $x$. Such a point $y$ exists and is unique thanks to the Euclidean-like
geometry of Hilbert space. (The basic fact here is that closed, convex sets have a unique
element of minimal norm.) One then checks that the difference $x - y$ lies in $Y^\perp$.
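In finite dimensions the decomposition can be computed directly. The following sketch assumes $X = \mathbb{C}^4$ and takes $Y$ to be the span of two linearly independent vectors; it uses Gram–Schmidt orthonormalization as a convenient stand-in for the closest-point argument, and all names (`orthonormalize`, `decompose`, `Y_span`) are hypothetical.

```python
def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return inner(x, x).real ** 0.5

def orthonormalize(vectors):
    """Gram-Schmidt: an orthonormal list with the same span (input assumed independent)."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = inner(w, e)
            w = [a - c * b for a, b in zip(w, e)]
        n = norm(w)
        basis.append([a / n for a in w])
    return basis

def decompose(x, spanning_vectors):
    """Return (y, y_perp) with y in Y = span(spanning_vectors) and y_perp orthogonal to Y."""
    y = [0j] * len(x)
    for e in orthonormalize(spanning_vectors):
        c = inner(x, e)
        y = [a + c * b for a, b in zip(y, e)]
    y_perp = [a - b for a, b in zip(x, y)]
    return y, y_perp

Y_span = [[1, 1j, 0, 0], [0, 1, 1, 0]]   # hypothetical spanning set for Y
x = [1 + 1j, 2, -1j, 3]
y, y_perp = decompose(x, Y_span)
print([abs(inner(y_perp, v)) for v in Y_span])   # both entries should be ~0
```

The component `y` is also the point of $Y$ closest to `x`: moving it along any direction of $Y$ can only increase the distance to `x`.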
This key fact makes it possible to define bases in Hilbert space, as we will do in the
next section. Here we present another useful consequence of the proposition. First recall
that our motivation for defining Hilbert spaces was the fact that $L^2$ is self-dual, and thus
the action of $(L^2)^*$ on $L^2$ behaves like an inner product. The next result states that the
converse holds, that is, if $X$ admits an inner product then $X$ is self-dual.

23.6. THEOREM. If $X$ is a Hilbert space, then $X$ is self-dual. That is, for any $\phi \in X^*$ there
exists $y \in X$ such that $\phi(x) = \langle x, y\rangle$. Moreover the correspondence $\phi \mapsto y$ is a conjugate-linear
isomorphism $X^* \cong X$.

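PROOF. Given $\phi$, we let $Y = \{ x \in X \mid \phi(x) = 0 \}$, which is a closed subspace because $\phi$
is continuous. If $Y = X$ then $\phi = 0$ and we may take $y = 0$. Assuming $Y \ne X$, Proposition 23.5
allows us to choose $z \in Y^\perp$ such that $\|z\| = 1$. We then let $y = \overline{\phi(z)}\, z$. Then we have
\begin{align*}
\phi(x) - \langle x, y\rangle &= \phi(x) - \phi(z) \langle x, z\rangle \\
&= \phi(x) \langle z, z\rangle - \phi(z) \langle x, z\rangle \\
&= \langle \phi(x) z - \phi(z) x, z\rangle \\
&= 0
\end{align*}
Here the last equality follows from the fact that $\phi(x) z - \phi(z) x$ lies in $Y$. ∎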
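The construction in the proof can be followed step by step in $\mathbb{C}^n$. In this sketch a functional is represented by a hypothetical coefficient list `coeffs`, so that $\phi(x) = \sum_i \texttt{coeffs}_i x_i$; the function `riesz_vector` is an illustrative name. The unit vector $z$ orthogonal to the kernel of $\phi$ is obtained by normalizing the conjugated coefficient vector, which is a finite-dimensional shortcut rather than the general argument.

```python
def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return inner(x, x).real ** 0.5

def phi(x, coeffs):
    """A linear functional on C^n, phi(x) = sum_i coeffs[i] * x[i]."""
    return sum(a * b for a, b in zip(coeffs, x))

def riesz_vector(coeffs):
    """Follow the proof: pick a unit z orthogonal to ker(phi), then y = conj(phi(z)) z."""
    # In C^n the conjugated coefficient vector w satisfies <x, w> = phi(x),
    # so w is orthogonal to ker(phi).
    w = [a.conjugate() for a in coeffs]
    n = norm(w)
    if n == 0:
        return [0j] * len(coeffs)        # phi = 0, so y = 0 works
    z = [a / n for a in w]
    pz = phi(z, coeffs)
    return [pz.conjugate() * a for a in z]

coeffs = [2 - 1j, 3j, 1 + 1j]
y = riesz_vector(coeffs)
x = [1 + 2j, -1j, 4 + 0j]
gap = abs(phi(x, coeffs) - inner(x, y))
print(gap)                               # should be ~0
```

Note that the representing vector comes out as the complex conjugate of the coefficient list, which reflects the conjugate linearity of the correspondence $\phi \mapsto y$.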

EXERCISE 23.1. Use the discussion in BBT, §14.2 to prove Proposition 23.5.
24. BASES FOR HILBERT SPACE 80

§24. Bases for Hilbert space

READING. BBT, §14.4

In the previous section we have seen that Hilbert spaces possess many properties
which are familiar from $\mathbb{R}^n$ and $L^2$. The special properties are made possible by the inner
product and its corresponding notion of orthogonality. In this section we make further
use of orthogonality, in particular introducing orthonormal bases.
Although bases are essential to the study of classical linear algebra, they have been
absent so far in our study of Banach spaces.

24.1. DEFINITION. Let $X$ be a Hilbert space. A subset $\{e_\alpha\}_{\alpha \in A}$ of $X$ is called an
orthonormal basis if it satisfies the following properties:
(a) (normality) for all $\alpha$, $\|e_\alpha\| = 1$;
(b) (orthogonality) for all $\alpha \ne \beta$, $\langle e_\alpha, e_\beta\rangle = 0$; and
(c) (maximality) for any $x \in X$, if $\langle x, e_\alpha\rangle = 0$ for all $\alpha \in A$, then $x = 0$.

It is important to note that for an infinite-dimensional Hilbert space, the concept of
orthonormal basis is very different from the concept of “basis” in a classical vector space.
It is true that a Hilbert space orthonormal basis is a linearly independent set. However, it is
not true that a Hilbert space orthonormal basis is a maximal linearly independent set. The
maximality property above states only that it is maximal with respect to being orthonormal.
Just as every vector space has a basis, every Hilbert space has an orthonormal basis.
Indeed, this follows from an elementary application of Zorn’s lemma. However, for our
concrete examples of Hilbert spaces, it is not difficult to identify a simple concrete basis.
For example, regard $X = \ell^2$ as the space of complex vectors with countably many
coordinates such that the coordinates are square-summable. We let $e_i$ be the vector with a
1 in the $i$th coordinate and a 0 in every other coordinate. Then it is easy to see that $\{e_i\}$ is
an orthonormal basis for $\ell^2$.
For another example, let $X = L^2[0, 1]$. Let $e_n = e^{2\pi i n x}$, where $n$ ranges over the integers
$\mathbb{Z}$. Then an easy calculation shows that the $e_n$ have unit norm and are pairwise orthogonal.
Maximality is somewhat harder to check; it follows from the Stone–Weierstraß theorem,
which we omit, but which easily implies that the linear span of the $e_n$ is dense in $L^2$.
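The “easy calculation” can be imitated numerically. The sketch below approximates the inner product $\langle f, g\rangle = \int_0^1 f \bar{g} \,dx$ by a uniform Riemann sum; the quadrature, the sample count, and the names `e`, `inner_l2` and `gram_entry` are assumptions made for illustration. For these exponentials the Riemann sum happens to be exact up to rounding whenever $|m - n|$ is smaller than the number of samples.

```python
import cmath

def e(n, x):
    """The n-th exponential e_n(x) = exp(2 pi i n x)."""
    return cmath.exp(2j * cmath.pi * n * x)

def inner_l2(f, g, samples=256):
    """Approximate <f, g> = integral over [0, 1] of f * conj(g) by a Riemann sum."""
    total = 0j
    for k in range(samples):
        x = k / samples
        total += f(x) * g(x).conjugate()
    return total / samples

def gram_entry(m, n):
    """Approximate <e_m, e_n>; expected to be 1 if m == n and 0 otherwise."""
    return inner_l2(lambda x: e(m, x), lambda x: e(n, x))

for m, n in [(3, 3), (3, -5), (0, 7)]:
    print(m, n, abs(gram_entry(m, n)))
```

This only checks normality and orthogonality; maximality is the part that genuinely requires an approximation theorem.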
Note that in the case of real $L^2$ space, one instead uses the basis consisting of the functions
$\sin(2\pi n x)$ and $\cos(2\pi n x)$ for $n \in \mathbb{N}$ (suitably normalized, and including the constant
function). This is the foundation of Fourier analysis. As we will
see in the next result, the fact that this is a Hilbert space basis means that any $L^2$ function
can be expressed uniquely as a countable linear combination of waves of different periods!
Recall that in the study of classical vector spaces, every element can be written uniquely
as a finite linear combination of basis elements. The following result shows that a Hilbert
space basis has an analogous property: every element can be written uniquely as an infinite
linear combination of Hilbert space basis elements.

24.2. THEOREM. Let $X$ be a Hilbert space and $\{e_\alpha\}_{\alpha \in A}$ an orthonormal basis for $X$. Then for
any $x \in X$, we have $x = \sum \langle x, e_\alpha\rangle e_\alpha$, with the convergence being in norm. Moreover, we have
Parseval’s identity, which states that $\|x\|^2 = \sum_{\alpha \in A} |\langle x, e_\alpha\rangle|^2$.

PROOF. In the proof, we will need the Pythagorean theorem, which states that if $e_1, \ldots, e_n$
are orthogonal, then $\|e_1 + \cdots + e_n\|^2 = \|e_1\|^2 + \cdots + \|e_n\|^2$. The calculation is the same as
the classical version, and is obtained by distributing out the expression
$\langle e_1 + \cdots + e_n, e_1 + \cdots + e_n\rangle$.
We first show one half of Parseval’s identity, namely that $\sum |\langle x, e_\alpha\rangle|^2 \le \|x\|^2$. (This is
called Bessel’s inequality.) For this we let $A_0 \subset A$ be a finite subset. Then an elementary
calculation together with the Pythagorean theorem gives
\[ \Bigl\| x - \sum_{\alpha \in A_0} \langle x, e_\alpha\rangle e_\alpha \Bigr\|^2 = \|x\|^2 - \sum_{\alpha \in A_0} |\langle x, e_\alpha\rangle|^2 \tag{24.e1} \]
Since the left-hand side is nonnegative, we have $\sum_{\alpha \in A_0} |\langle x, e_\alpha\rangle|^2 \le \|x\|^2$. Since $A_0$ was
arbitrary, this completes the claim.
Now we know that $\sum |\langle x, e_\alpha\rangle|^2$ converges. It follows that there are just countably many
nonzero terms; let us enumerate them $|\langle x, e_n\rangle|^2$. Then the sequence of partial sums of
$\sum |\langle x, e_n\rangle|^2$ is Cauchy. By the Pythagorean theorem,
\[ \Bigl\| \sum_{n=m}^{l} \langle x, e_n\rangle e_n \Bigr\|^2 = \sum_{n=m}^{l} |\langle x, e_n\rangle|^2 \]
Thus the sequence of partial sums of $\sum \langle x, e_n\rangle e_n$ is Cauchy too. Since $X$ is complete, we
conclude that there exists an element $y \in X$ defined by $y = \sum \langle x, e_\alpha\rangle e_\alpha$.
Next we claim that in fact $x = y$. Indeed, it is easy to see that $\langle x - y, e_\alpha\rangle = 0$ for all $\alpha$,
so the maximality of the orthonormal basis implies that $x - y = 0$.
Finally, we conclude the proof of Parseval’s identity by returning to Equation (24.e1).
Since we now know that the left-hand side converges to 0 as $A_0 \to A$, it follows that the
right-hand side does as well, establishing the equality. ∎
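A finite-dimensional instance of the theorem is easy to test. The sketch below assumes $X = \mathbb{C}^3$ and builds an orthonormal basis by Gram–Schmidt from three linearly independent vectors chosen for illustration; the helper names are hypothetical. It compares $\|x\|^2$ with the sum of the squared coefficients, rebuilds $x$ from its coefficients, and evaluates the partial sum over a two-element subfamily, as in Bessel’s inequality.

```python
def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return inner(x, x).real ** 0.5

def orthonormalize(vectors):
    """Gram-Schmidt: an orthonormal list with the same span (input assumed independent)."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = inner(w, e)
            w = [a - c * b for a, b in zip(w, e)]
        n = norm(w)
        basis.append([a / n for a in w])
    return basis

def coefficients(x, basis):
    """The coefficients <x, e> for each basis vector e."""
    return [inner(x, e) for e in basis]

def reconstruct(coeffs, basis):
    """The finite sum of coeff * e over the basis."""
    out = [0j] * len(basis[0])
    for c, e in zip(coeffs, basis):
        out = [a + c * b for a, b in zip(out, e)]
    return out

basis = orthonormalize([[1, 1j, 0], [0, 1, 1], [1, 0, 2j]])
x = [2 - 1j, 1j, 3]
coeffs = coefficients(x, basis)
parseval_gap = abs(norm(x) ** 2 - sum(abs(c) ** 2 for c in coeffs))
bessel_sum = sum(abs(c) ** 2 for c in coeffs[:2])   # finite subfamily A_0
print(parseval_gap, bessel_sum <= norm(x) ** 2)
```

In infinite dimensions the same identities hold, but only after the convergence argument given in the proof.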
One can interpret this result as saying that any Hilbert space $X$ looks remarkably like
$\ell^2$. That is, each element $x \in X$ is determined by its vector of coefficients $\langle x, e_\alpha\rangle$. Indeed,
our last result will show that this is a formal theorem. In order to state it, we need to
define a generalization of the sequence space $\ell^2$.

24.3. DEFINITION. For any index set $A$, we let
\[ \ell^2(A) = \Bigl\{ f : A \to \mathbb{C} \Bigm| \sum_{\alpha \in A} |f(\alpha)|^2 < \infty \Bigr\} \]
In other words, $\ell^2(A)$ is like a sequence space where the sequences may be indexed
by an arbitrary set other than $\mathbb{N}$. Another way to say it is that $\ell^2(A)$ is equal to $L^2(\mu)$,
where $\mu$ is the counting measure on $A$.

The space $\ell^2(A)$ is determined up to isomorphism by the cardinality of $A$. The
cardinality of $A$ is called the dimension of $\ell^2(A)$. This is because like $\ell^2$, the space $\ell^2(A)$ has
the obvious Hilbert space basis consisting of $\{e_\alpha\}_{\alpha \in A}$, where $e_\alpha(\alpha) = 1$ and $e_\alpha(\beta) = 0$
whenever $\beta \ne \alpha$.

24.4. THEOREM. Let $X$ be a Hilbert space. Then $X$ is isomorphic to $\ell^2(A)$ for some $A$ by a
linear bijection that preserves inner products.

PROOF. Let $\{e_\alpha\}_{\alpha \in A}$ be an orthonormal basis for $X$. The index set $A$ will be the same
set we use to form $\ell^2(A)$. We define a function $\phi : X \to \ell^2(A)$ by $\phi(x)(\alpha) = \langle x, e_\alpha\rangle$.
It is easy to see that $\phi$ is linear, and Parseval’s identity implies that $\phi$ preserves the
norm. In particular $\phi$ is injective. To see that $\phi$ preserves the inner product, it suffices to
note that the inner product can be recovered from the norm by the polarization identity
\[ 4\langle x, y\rangle = \|x + y\|^2 - \|x - y\|^2 + i\|x + iy\|^2 - i\|x - iy\|^2 \]
Finally to see that $\phi$ is surjective, let $f \in \ell^2(A)$ be given. Since $\sum_{\alpha \in A} |f(\alpha)|^2 < \infty$,
the series has just countably many nonzero terms and its partial sums are Cauchy. By
the Pythagorean theorem, the partial sums of $\sum_{\alpha \in A} f(\alpha) e_\alpha$ are Cauchy too. It follows
that there exists an element $x \in X$ defined by $x = \sum_{\alpha \in A} f(\alpha) e_\alpha$. Clearly $\phi(x) = f$, as
desired. ∎
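The polarization identity used in the proof can be sanity-checked on concrete vectors. The sketch below assumes the inner product on $\mathbb{C}^n$ that is linear in the first coordinate and conjugate linear in the second, as in Definition 23.1; the function name `polarization` and the sample vectors are illustrative.

```python
def inner(x, y):
    """Inner product on C^n: linear in x, conjugate-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return inner(x, x).real ** 0.5

def polarization(x, y):
    """Recover <x, y> from norms alone via the polarization identity."""
    def sq(c):
        # ||x + c y||^2 for a scalar c in {1, -1, i, -i}
        return norm([a + c * b for a, b in zip(x, y)]) ** 2
    return (sq(1) - sq(-1) + 1j * sq(1j) - 1j * sq(-1j)) / 4

x = [1 + 2j, 3 - 1j, -2j]
y = [2 - 1j, 1j, 4 + 0j]
print(polarization(x, y), inner(x, y))   # the two values should agree up to rounding
```

Since only norms enter the right-hand side, any linear map that preserves norms must also preserve inner products, which is how the identity is used above.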
Thus the result implies that, up to isomorphism, there is exactly one Hilbert space in each
dimension. Note that both $\ell^2$ and $L^2[0, 1]$ are Hilbert spaces of countably infinite dimension,
because we have seen above that each has a countably infinite orthonormal basis. The unique
countably infinite-dimensional Hilbert space, often denoted $H$, is by far the most widely used
in applications where an operator acts on infinitely many coordinates. Hilbert space appears
in the study of differential equations, Fourier analysis, quantum physics, and more.

EXERCISE 24.1. Prove the Pythagorean theorem in a Hilbert space.

EXERCISE 24.2. Prove the polarization identity in a Hilbert space.
