Lecture Notes On Measure Theory and Functional Analysis
PART I
Measure theory
Before discussing the measure problem, let’s talk intuitively about what we mean by
“measure.” In this area of mathematics, measure is a number assigned to a set which
represents its size. Of course the term “size” also has many meanings. Our concept of
measure can accommodate many different types of sizes, such as: length, area, volume,
mass, and even probability. On the other hand, some other types of size are not usually
associated with the mathematical concept of measure, such as: cardinality, diameter, and
density.
The mathematical concept of a measure is thus beginning to seem geometric. But
considering our models of sets and spaces, in which coordinate axes are indexed by the
infinitesimal points of the real number line, measure really turns out to be an analytic
concept. (In particular, this means lots of ε’s will show up in our studies!)
For a simple set, it may be easy to decide what its measure should be. For example, if
we use the term measure to mean length, then the measure of the interval [4, 7] should be
3. But for a more complicated set, the decision may not be so easy. If you have seen the
construction of the Cantor set, think about how you would measure the length of that!
Thus we arrive at the “measure problem,” which asks whether it is even possible
to find a function which adequately measures subsets of the real line R. Of course it
is necessary to say what is considered adequate. The classical version of the measure
problem proposed the three properties below. Formally, the measure problem asks: Does
there exist a measure function m which assigns to each subset A ⊂ R a value m( A) ∈
[0, ∞] satisfying:
(a) (normality) m( I ) = the length of I for every interval I;
(b) (translation-invariance) m( x + A) = m( A) for every A; and
(c) (countable additivity) $m\bigl(\bigcup_{n=1}^{\infty} A_n\bigr) = \sum_{n=1}^{\infty} m(A_n)$ for every sequence of pairwise disjoint sets $A_n$.
Perhaps surprisingly, no such measure function m exists! While properties (a)–(c)
seem very natural, the three items unfortunately turn out to be mutually inconsistent.
1.1. THEOREM (Vitali). There exists a set A ⊂ R such that no measure can be assigned to A
consistently with (a)–(c).
1. THE MEASURE PROBLEM
PROOF. Rather than work on R, we will work on the half-open unit interval [0, 1)
with the addition operation taken modulo 1. This is ok, since if there is a measure m on
all subsets of R, then by properties (b) and (c), m restricts to a measure on subsets of [0, 1)
which satisfies property (b) with respect to addition modulo 1.
Now let Q1 denote the rationals of [0, 1), that is, Q1 = Q ∩ [0, 1), and consider the
collection of additive cosets of Q1 inside [0, 1). The cosets are of the form a + Q1 where
again addition is interpreted modulo 1. We now let A ⊂ [0, 1) denote a system of coset
representatives for this collection.
Now every number in [0, 1) can be written uniquely as a + q for a ∈ A and q ∈ Q1.
This means that the translates A + q for q ∈ Q1 are pairwise disjoint and together cover
all of [0, 1). In particular, by (a) the measure of $\bigcup_{q \in Q_1} (A + q)$ is exactly 1.
On the other hand, by (b) and (c) we have that
\[ m\Bigl(\bigcup_{q \in Q_1} (A + q)\Bigr) = \sum_{q \in Q_1} m(A + q) = \sum_{q \in Q_1} m(A). \]
By the previous paragraph, the left-hand side of the above equation is 1. On the other
hand the right-hand side is an infinite sum of some nonnegative constant, and hence must
be either 0 or ∞. This is a contradiction!
We remark that it is possible to modify the argument to apply directly to a measure on
R rather than going via the unit interval with addition modulo 1. See Tao for this version.
The lesson is that we must weaken our demands on a measure m. Dropping condition
(a) can lead to trivial measures. Dropping condition (b) takes away the geometric aspects
of the measure, and leads to interesting set-theoretic questions and constructions. Weak-
ening condition (c) to finite additivity leads to interesting solutions, but only in dimen-
sions ≤ 2. (In dimensions ≥ 3 the Banach–Tarski paradox again gives a contradiction.)
Yet the simplest path forward (and the one that we take) is to drop the tacit condi-
tion that every set be measurable. The set A constructed in Vitali’s proof is very artificial
and isn’t likely to occur in any of the most common analytical applications (see the notes
below). We want to be excused from the burden of deciding the measure of the set A.
This means we need to figure out what sets we will measure, and what sets we will not
measure. In the end, our measure function m will have a domain which is a proper subset
of P (R), but which still contains a rich collection of sets. And the measure will satisfy
properties (a)–(c) as long as they are applied to the sets in the domain of m.
Of course we are also interested in the measure problem for subsets of R^n. It can be
formulated in just the same way, with condition (a) replaced by the condition that the
measure of a box is equal to its volume. And a Vitali-type result can also easily be estab-
lished for this version of the measure problem.
In the next section, we will begin this process by taking a step backwards and building
measures with much smaller domains, and satisfying just fragments of (a)–(c).
NOTES AND FURTHER READING. The proof of Vitali’s theorem requires the Axiom of
Choice. Specifically, it is needed to find a system of coset representatives for an uncount-
able collection. Solovay showed that the use of AC is essential, and that it is consistent
with ¬AC that there is a measure function m on all subsets of R.
EXERCISE 1.1. Show that the properties (a)–(c) of a measure imply finite additivity: If
A and B are disjoint then m( A ∪ B) = m( A) + m( B).
EXERCISE 1.2. Show that the properties (a)–(c) of a measure imply the inclusion–exclusion
principle: For any sets A, B we have m(A ∪ B) + m(A ∩ B) = m(A) + m(B).
EXERCISE 1.3. Complete the details of the proof that if there is a measure on R satisfying
properties (a)–(c), then there is a measure on [0, 1) (with addition modulo 1)
satisfying properties (a)–(c).
EXERCISE 1.4. If A is a bounded set of real numbers, the supremum sup(A) is the least
upper bound of A, and the infimum inf(A) is the greatest lower bound of A. Show that
s = sup(A) if and only if:
◦ for all a ∈ A we have a ≤ s; and
◦ for all ε > 0, there exists a ∈ A such that s − a < ε.
Formulate and prove the analogous statement for the infimum.
2. ELEMENTARY MEASURE
In the introduction we saw that we cannot hope to define a measure which will work
adequately on all subsets of Rn . In this section we start over and define a measure which
is capable of measuring only the simplest sorts of subsets of Rn . In doing so we will see
some of the difficulties which one encounters in defining even very simple measures, and
we will also see some of these difficulties resolved. Moreover we will have explicit use for
the elementary measure defined in this section, so doing so is not a digression at all.
Recall that a bounded interval is any subset of R of the form (a, b), [a, b), (a, b], or [a, b].
We shall use the term box for any subset of R^n which is a Cartesian product of bounded
intervals, and the term elementary set for any finite union of boxes.
For any elementary set E, we wish to define its elementary measure, or simply measure,
m( E). The measure of any interval will be defined to be its length, and the measure of
any box will be defined to be its volume. Thus if I = ( a, b) or [ a, b) or ( a, b] or [ a, b], then
we let m( I ) = len( I ) = b − a (in all four cases). And if B = ∏ In is a box, then we define
m( B) = vol( B) = ∏ len( In ). Since we allow only bounded boxes, this product can never
be indeterminate (0 · ∞). So far, so good.
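As a quick sanity check on these definitions, the following Python sketch computes len and vol for boxes given by their endpoints. The function names interval_length and box_volume, and the representation of a box as a list of endpoint pairs, are our own illustrative choices and not part of the notes.

```python
from math import prod


def interval_length(a, b):
    """Length of a bounded interval with endpoints a <= b.

    The same value is used for (a, b), [a, b), (a, b] and [a, b].
    """
    if b < a:
        raise ValueError("expected a <= b")
    return b - a


def box_volume(sides):
    """Volume of a box given as a list of (a, b) endpoint pairs, one per coordinate."""
    return prod(interval_length(a, b) for a, b in sides)


print(box_volume([(4, 7)]))          # 3, the interval [4, 7]
print(box_volume([(0, 2), (1, 4)]))  # 6, a 2 x 3 rectangle
print(box_volume([(0, 2), (5, 5)]))  # 0, a degenerate box (a segment in the plane)
```

Note that a degenerate box, in which some side is a single point, gets volume 0 with no special handling.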
We now wish to define the measure of an elementary set to be the sum of the volumes of the
finitely many boxes it is composed of. However there are two issues with this statement: first the
constituent boxes need not be disjoint, and second there is in general more than one way
to express an elementary set as a union of boxes. The following two lemmas address these
two issues.
2.2. LEMMA. Any elementary set E can be expressed as a finite union of disjoint boxes.
PROOF. First assume that E ⊂ R^1 and that $E = \bigcup I_i$. Then by considering all endpoints
of the $I_i$ in increasing order $a_1, \dots, a_m$ it is easy to write E as the union of sets of the
form $(a_i, a_{i+1})$ together with sets of the form $[a_i, a_i]$ (single points). Such a union is clearly
disjoint.
In general if E ⊂ R^n and $E = \bigcup B_i$ then for each dimension d ≤ n consider in turn
the dth sides $I_i^d$ of the boxes $B_i$. Again consider the endpoints of these intervals in increasing
order $a_1^d, \dots, a_{m_d}^d$. Then we can write E as a union of small boxes which are products of
sets of the form $(a_i^d, a_{i+1}^d)$ or of the form $[a_i^d, a_i^d]$. Such boxes are again disjoint.
Figure 2.f1 shows an example of the method of the proof above.
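The one-dimensional case of this argument can be sketched in a few lines of Python. The function name elementary_length and the use of a midpoint test to decide whether a piece belongs to E are our own assumptions; the idea of sorting all endpoints and working piece by piece is the one used in the proof.

```python
def elementary_length(intervals):
    """Total length of a finite union of bounded intervals given as (a, b) pairs.

    Endpoint types are ignored: single points have length 0, so they do not
    change the total.
    """
    # All endpoints a_1 < ... < a_m, as in the proof.
    endpoints = sorted({x for a, b in intervals for x in (a, b)})
    total = 0.0
    for left, right in zip(endpoints, endpoints[1:]):
        mid = (left + right) / 2
        # The open piece (left, right) contains no endpoint of any I_i, so it
        # lies entirely inside or entirely outside each I_i; the midpoint
        # decides which.
        if any(a < mid < b for a, b in intervals):
            total += right - left
    return total


print(elementary_length([(0, 2), (1, 3)]))            # overlapping intervals
print(elementary_length([(0, 1), (5, 6), (0.5, 1)]))  # one interval inside another
```

Because every piece is counted at most once, overlapping intervals are not double counted, which is exactly the first issue the lemma is meant to address.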
2.3. LEMMA. Suppose the elementary set E can be expressed in two ways as a finite union of
disjoint boxes: $E = \bigsqcup B_i = \bigsqcup C_j$. Then $\sum \operatorname{vol}(B_i) = \sum \operatorname{vol}(C_j)$.
taken from the collection { Di }. Then one can simply apply the argument of the previous
paragraph to B and to each Bi .
Finally given E, $B_i$, and $C_j$ as in the problem statement, one can find a third expression
$E = \bigsqcup D_k$ where $\{D_k\}$ is a refinement of both $\{B_i\}$ and of $\{C_j\}$. That is, each $B_i$ and each
The above three core properties imply further useful properties as well.
These results give an essentially complete solution to the measure problem for ele-
mentary sets. It wasn’t too difficult to achieve, but perhaps not as easy as one would have
thought! Even so, what about measuring other simple sets such as circles, triangles, blobs,
Cantor sets, and so on? In the next section we will continue on the road to doing this.
EXERCISE 2.1 (Tao Ex 1.1.1). Show that the class of elementary sets is closed under the
operations: union, intersection, set difference, symmetric difference, and translation.
EXERCISE 2.2. Prove Proposition 2.5: The elementary measure satisfies the monotonicity
and finite subadditivity properties.
In the previous section we showed that the intuitive definition of area is sensible for
elementary sets, but then remarked that simple shapes like polygons and circles are not
elementary. It is easy to imagine extending the elementary measure to triangles by cutting
and rotating, and to polygons by gluing together triangles. However no such operation
can perfectly measure a circle.
Instead we will measure the circle the way it has always been done, by using approx-
imation. It is not hard to visualize a circle being approximated by elementary sets, using
smaller and smaller boxes near the boundary. The approximation technique will help us
measure most traditional geometric figures, and even many blobby thingies.
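3.1. DEFINITION. Let A be a bounded subset of R^n. First define the inner and outer
Jordan measures (sometimes called lower and upper):
\[ m_{*j}(A) = \sup \{ m(E) : E \subset A,\ E \text{ elementary} \} \]
\[ m^{*j}(A) = \inf \{ m(F) : A \subset F,\ F \text{ elementary} \} \]
Then if $m_{*j}(A) = m^{*j}(A)$ we say that A is Jordan measurable, and denote the common
value simply by m(A).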
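To get a feel for the two approximations, here is a small Python sketch that brackets the area of the closed unit disk between an inner and an outer union of grid squares. The grid construction, the function name disk_grid_bounds, and the choice of the disk as a test set are assumptions made for illustration. The supremum and infimum in the definition range over all elementary sets, not just grid-aligned ones, so the two numbers returned are only a lower bound for the inner measure and an upper bound for the outer measure.

```python
def disk_grid_bounds(n):
    """Inner and outer grid approximations to the area of the closed unit disk.

    The square [-1, 1]^2 is cut into n*n closed squares of side h = 2/n.
    Returns (area of squares contained in the disk, area of squares meeting it).
    """
    h = 2.0 / n
    inner = outer = 0
    for i in range(n):
        for j in range(n):
            x0, x1 = -1 + i * h, -1 + (i + 1) * h
            y0, y1 = -1 + j * h, -1 + (j + 1) * h
            # The corner farthest from the origin decides containment.
            far = max(abs(x0), abs(x1)) ** 2 + max(abs(y0), abs(y1)) ** 2
            # The point of the square nearest the origin decides intersection.
            nx = 0.0 if x0 <= 0 <= x1 else min(abs(x0), abs(x1))
            ny = 0.0 if y0 <= 0 <= y1 else min(abs(y0), abs(y1))
            near = nx ** 2 + ny ** 2
            if far <= 1:
                inner += 1
            if near <= 1:
                outer += 1
    return inner * h * h, outer * h * h


for n in (10, 100, 400):
    lo, hi = disk_grid_bounds(n)
    print(n, lo, hi, hi - lo)
```

As n grows, the gap between the two bounds shrinks, which is the behaviour one expects of a Jordan measurable set.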
It is immediate from the definition that Jordan measure extends elementary measure
in the sense that they agree on the elementary sets. This means we are justified in us-
ing “m” both for the elementary and Jordan measures. Moreover, we will show that the
Jordan measure inherits many of the properties of the elementary measure: normality,
translation-invariance, finite additivity, monotonicity, and finite subadditivity.
The normality and translation-invariance properties hold simply because they hold
for elementary measure, and these properties pass to the supremum. The additivity and
subadditivity properties will take a little more work. For instance, in order to even state
the finite additivity property, we first need to establish Boolean closure: the union of
measurable sets is measurable.
Before we begin these results, it will be useful to establish the following characteriza-
tion of Jordan measurability. As we will be working with approximations, the following
results also illustrate our first use of ε-style analytical arguments.
3.2. LEMMA. The set A is Jordan measurable if and only if either of the following holds:
◦ For all ε > 0 there are elementary sets E, F such that E ⊂ A ⊂ F and m(F ∖ E) < ε.
◦ For all ε > 0 there is an elementary set E such that $m^{*j}(E \triangle A) < \varepsilon$.
PROOF. We establish only the equivalence of Jordan measurability with the first item.
To begin, assume that A is Jordan measurable and let ε > 0 be given. By the $m_{*j}$ definition
3. JORDAN MEASURE 10
of Jordan measure, we can find an elementary set E ⊂ A such that m(A) − m(E) < ε/2.
By the $m^{*j}$ definition of Jordan measure we can find an elementary set F such that A ⊂ F
and m(F) − m(A) < ε/2. It follows that
\[ m(F \setminus E) = m(F) - m(E) = (m(F) - m(A)) + (m(A) - m(E)) < \varepsilon \]
as desired.
For the converse, assume that the first bullet holds true, and let ε > 0 be arbitrary.
Then we can find elementary sets E, F such that E ⊂ A ⊂ F and m(F) − m(E) < ε.
From the definitions of inner and outer Jordan measure, we have that
$m(E) \le m_{*j}(A) \le m^{*j}(A) \le m(F)$. It follows that $m^{*j}(A) - m_{*j}(A) < \varepsilon$. Since ε was
arbitrary, we may conclude that $m_{*j}(A) = m^{*j}(A)$ and therefore that A is Jordan measurable.
Note that in the proof, one has to be careful when making a claim such as m(F ∖ E) =
m( F ) − m( E). It is true in the above cases because: the elementary sets are closed under
set differences, and so all three sets are elementary, and thus we may apply the finite
additivity property for elementary measure.
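For elementary sets E ⊂ F, the step described in this remark can be written out as follows. This is only a sketch of the reasoning, using the standard symbol $\sqcup$ for a disjoint union.

```latex
F = E \sqcup (F \setminus E)
\quad\Longrightarrow\quad
m(F) = m(E) + m(F \setminus E)
\quad\Longrightarrow\quad
m(F \setminus E) = m(F) - m(E).
```

The subtraction in the last step is legitimate because elementary sets are bounded, so all three measures are finite.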
PROOF. We prove only the case of A ∪ B. Suppose that A, B are Jordan measurable and let
ε > 0 be given. By the previous lemma, we can find elementary sets E, F, E′, F′ such that
E ⊂ A ⊂ F, and E′ ⊂ B ⊂ F′, and m(F ∖ E), m(F′ ∖ E′) < ε/2. Then we have
E ∪ E′ ⊂ A ∪ B ⊂ F ∪ F′ and using some algebra together with the finite subadditivity of
elementary measure, m((F ∪ F′) ∖ (E ∪ E′)) ≤ m(F ∖ E) + m(F′ ∖ E′) < ε. Again by the
previous lemma, this shows that A ∪ B is Jordan measurable.
We are now ready to establish the remaining stated properties of Jordan measure. The
following result states finite additivity, and the first paragraph of its proof gives finite
subadditivity. The monotonicity property follows immediately from finite additivity.
3.4. THEOREM. The Jordan measure satisfies finite additivity, that is, if A, B are Jordan
measurable and disjoint, then m(A ∪ B) = m(A) + m(B).
PROOF. We first show subadditivity, that is, that m(A ∪ B) ≤ m(A) + m(B). Let ε > 0
be given. Using the fact that $m = m^{*j}$ we can find elementary sets F, F′ such that A ⊂ F,
B ⊂ F′, m(F) − m(A) < ε/2, and m(F′) − m(B) < ε/2. Using the monotonicity and
subadditivity properties of the elementary measure, together with the definition of Jordan
measurability, we now have:
\[
\begin{aligned}
m(A \cup B) &= m^{*j}(A \cup B) \\
&\le m(F \cup F') \\
&\le m(F) + m(F') \\
&< m(A) + m(B) + \varepsilon
\end{aligned}
\]
\[
\begin{aligned}
m(A \cup B) &= m_{*j}(A \cup B) \\
&\ge m(E \cup E') \\
&= m(E) + m(E') \\
&> m(A) + m(B) - \varepsilon
\end{aligned}
\]
Again letting ε tend to 0, we achieve that m(A ∪ B) ≥ m(A) + m(B).
While you probably have a clear idea of what the elementary sets look like, it is now
time to give some examples and non-examples of Jordan measurable sets. Some simple
but useful new examples are the axis-parallel triangles. Suppose T is an axis-parallel
triangle with leg lengths a and b. To prove that T is Jordan measurable, note that two
copies of T essentially make up a box with area ab. Using the finite additivity, this implies
that the measure of T is the expected ab/2.
To make this argument we need to know that Jordan measure is invariant under 180°
rotation, which is clear because it is true for boxes. Moreover since the two copies of T
overlap in a line segment, we also need to know that the Jordan measure of a line segment
is 0. This fact follows from the more general result below.
3.5. LEMMA. Let f be a continuous function defined on a closed, bounded interval. Then the
graph of f , considered as a subset of R2 , has Jordan measure 0.
PROOF. Let I denote the domain of f. Recall that since I is closed and bounded, it
is compact. Recall also that a continuous function with a compact domain is uniformly
continuous: for any ε > 0 there exists a δ > 0 such that for any interval J ⊂ I, len(J) ≤ δ
implies len(f(J)) < ε.
So let ε > 0 be given, and choose δ > 0 as above. Shrinking δ if necessary, we can
suppose that len(I)/δ is an integer k. Partitioning I into intervals $J_1, \dots, J_k$ each of length
δ, we have that the graph of f is contained in the set
\[ A = \bigcup_{i \le k} J_i \times [\min f(J_i), \max f(J_i)] \]
Note that the min and max values in the definition of A exist by the extreme value theorem.
Now A is a union of k many rectangles each of area at most δε. Thus A is elementary
and its measure is at most kδε. This latter value is len(I)ε, so the upper measure $m^{*j}$ of
the graph of f is at most len(I)ε. Taking ε → 0, we conclude that the graph of f is Jordan
measurable with measure 0.
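The covering used in this proof can be explored numerically. The Python sketch below adds up the areas of the k rectangles over a uniform partition. The helper name graph_cover_area and the sampling used to estimate the min and max of f on each piece are our own assumptions; sampling only approximates the true extremes in general, though it is exact for monotone f because both endpoints of each piece are sampled.

```python
import math


def graph_cover_area(f, a, b, k, samples=200):
    """Approximate total area of the k rectangles J_i x [min f(J_i), max f(J_i)].

    The interval [a, b] is cut into k pieces of equal length, and the extremes
    of f on each piece are estimated from samples + 1 evenly spaced points.
    """
    delta = (b - a) / k
    area = 0.0
    for i in range(k):
        left = a + i * delta
        values = [f(left + delta * s / samples) for s in range(samples + 1)]
        area += delta * (max(values) - min(values))
    return area


for k in (10, 100, 1000):
    print(k, graph_cover_area(math.sin, 0.0, 2 * math.pi, k))
```

The printed areas shrink as k grows, in line with the bound len(I)ε from the proof.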
It is now not difficult to conclude that all polygons are Jordan measurable and have
the expected measure. This is because all polygons can be decomposed into a union of
axis parallel triangles (possibly overlapping on their measure zero edges).
A simple example of a set which is not Jordan measurable is the set Q1 = Q ∩ [0, 1]
of rational numbers in the unit interval. Indeed the only elementary sets E ⊂ Q1 are the
finite sets, and so $m_{*j}(Q_1) = 0$. And any elementary set F such that Q1 ⊂ F must contain
a set of the form [0, 1] ∖ X where X is finite, and so $m^{*j}(Q_1) = 1$.
Intuitively, the Jordan measure works very well for classical geometric figures, but not
very well for relatively simple analytic objects such as countable dense sets, the Cantor set,
and so forth. To handle such sets, we will soon work to describe the Lebesgue measure,
which satisfies countable additivity. Before going to such generality, however, we explore
the connection between Jordan measure and Riemann integration.
EXERCISE 3.1 (Tao, Ex 1.1.6(4)(6)). Verify that Jordan measure agrees with the elementary
measure on elementary sets (thus satisfies the normality property). Verify that Jordan
measure satisfies the translation-invariance property.
EXERCISE 3.2 (See Tao, Ex 1.1.5). Complete the proof of Lemma 3.2: A is Jordan measurable
iff for all ε > 0 there is an elementary set E such that $m^{*j}(E \triangle A) < \varepsilon$.
EXERCISE 3.3 (Tao, Ex 1.1.6(1)). Complete the proof of Proposition 3.3: If A, B are
Jordan measurable, then so are A ∩ B and A ∖ B.
EXERCISE 3.4 (Tao, Ex 1.1.12). Say that A is Jordan null if A is Jordan measurable and
m(A) = 0. Show that any subset of a Jordan null set is a Jordan null set.
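EXERCISE 3.5. Show that the outer Jordan measure $m^{*j}(A)$ is equal to:
\[ \inf \Bigl\{ \sum_{i=1}^{k} \operatorname{vol}(B_i) : B_i \text{ are boxes and } A \subset \bigcup_{i=1}^{k} B_i \Bigr\} \]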
EXERCISE 3.6 (Tao, Ex 1.1.19). Let A be an arbitrary bounded set, and let E be an elementary
set. Show that
\[ m^{*j}(A) = m^{*j}(A \cap E) + m^{*j}(A \setminus E) \]
EXERCISE 3.7. Show that A is Jordan measurable if and only if for all ε > 0 there exists
an elementary set E such that A ⊂ E and $m^{*j}(E \setminus A) < \varepsilon$.
4. RIEMANN INTEGRATION
If the picture of Lemma 3.5 reminded you of Riemann sums, it should. Measure theory
is closely connected to integration theory, as both are concerned with calculating areas of
some regions. Moreover the Jordan measure corresponds neatly with the Riemann inte-
gral. The following presentation of the Riemann integral is actually attributed to Darboux.
Just as we defined the elementary measure before we defined the Jordan measure, we
will now define the “piecewise constant” integral before we define the Riemann integral.
In other words, f is piecewise constant if f is of the form $\sum_{j=1}^{k} c_j \chi_{I_j}$, where the $I_j$ are intervals.
Here $\chi_{I_j}$ denotes the characteristic function of $I_j$, that is, $\chi_{I_j}(x) = 1$ if $x \in I_j$ and $\chi_{I_j}(x) = 0$
otherwise.
As was the case with the elementary measure, one must check that the value of the pc
integral is well-defined. That is, if f is expressed in two different ways as a pc function,
say $\sum c_j \chi_{I_j} = \sum d_k \chi_{J_k}$, then one must check that the two values $\sum c_j \operatorname{len}(I_j)$ and $\sum d_k \operatorname{len}(J_k)$
agree.
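4.3. DEFINITION. Let f be a bounded function on [a, b]. First define the lower and
upper Riemann integrals:
\[ \underline{\int} f = \sup \Bigl\{ \int_{pc} g : g \le f,\ g \text{ pc} \Bigr\} \]
\[ \overline{\int} f = \inf \Bigl\{ \int_{pc} h : f \le h,\ h \text{ pc} \Bigr\} \]
Then if $\underline{\int} f = \overline{\int} f$ we say that f is Riemann integrable, and denote the common value simply
by $\int f$.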
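The lower and upper integrals can be approximated by choosing particular pc functions below and above f. The Python sketch below does this for a nondecreasing f on a uniform partition, where the best constant on each piece is simply the value of f at the left or right endpoint. The function name darboux_sums, the uniform partition, and the monotonicity assumption are ours; for a general bounded f one would need the true infimum and supremum on each piece.

```python
def darboux_sums(f, a, b, k):
    """Integrals of a pc minorant and a pc majorant of a nondecreasing f on [a, b].

    [a, b] is cut into k pieces of equal width. Since f is assumed
    nondecreasing, f(left endpoint) bounds f from below on each piece and
    f(right endpoint) bounds it from above.
    """
    width = (b - a) / k
    lower = sum(f(a + i * width) * width for i in range(k))
    upper = sum(f(a + (i + 1) * width) * width for i in range(k))
    return lower, upper


for k in (10, 100, 1000):
    lo, hi = darboux_sums(lambda x: x * x, 0.0, 1.0, k)
    print(k, lo, hi, hi - lo)
```

For this f the gap between the two sums is (f(b) − f(a)) · (b − a)/k, which tends to 0, so the lower and upper integrals agree.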
PROOF. We establish only the normality property. By Lemma 3.2, for any ε > 0 we can find
disjoint intervals $I_j$ and disjoint intervals $J_k$ such that $\bigcup I_j \subset A \subset \bigcup J_k$ and
$m(\bigcup J_k \setminus \bigcup I_j) < \varepsilon$. It is easy to see from the definition of the pc integral that
$\int_{pc} \chi_{\bigcup I_j} = m(\bigcup I_j)$, and similarly $\int_{pc} \chi_{\bigcup J_k} = m(\bigcup J_k)$. We now have
\[ m\Bigl(\bigcup I_j\Bigr) \le \underline{\int} \chi_A \le \overline{\int} \chi_A \le m\Bigl(\bigcup J_k\Bigr) \]
Since the left and right-hand sides differ by < ε, it follows that the lower and upper
integrals differ by < ε as well. Since ε was arbitrary, it follows that $\chi_A$ is integrable. And
since we also have
\[ m\Bigl(\bigcup I_j\Bigr) \le m(A) \le m\Bigl(\bigcup J_k\Bigr) \]
we may conclude that $\int \chi_A$ is equal to m(A).
If one re-examines the definition and properties of the Jordan measure, it should be
clear that there is a close parallel between the Riemann integral and Jordan measure. The
normality property above begins to make this connection formal. The next result further
strengthens the two-way connection between the two notions.
PROOF. First suppose that f is Riemann integrable and let ε > 0 be given. Choose pc
functions g, h such that g ≤ f ≤ h and $\int_{pc} (h - g) < \varepsilon$. Let E be the region under the
graph of g and let F be the region under the graph of h. It is clear that E, F are elementary,
E ⊂ A ⊂ F, and m(F ∖ E) < ε.
Conversely if A is Jordan measurable we can find an elementary E such that E ⊂
A and m(A ∖ E) < ε. Using our usual grid argument, we can suppose that there is
a finite sequence of disjoint intervals $I_j$ such that E is a union of boxes with horizontal sides
selected from the $I_j$. Pairing each $I_j$ with the constant $c_j$ = the maximum of the vertical
coordinates of all of the boxes with horizontal side $I_j$, we obtain a pc function g. It is easy
to see that $m(E) \le \int_{pc} g \le m(A)$. This shows that the lower Riemann integral of f is
m(A). We can proceed similarly using an outer approximation B to show that the upper
Riemann integral of f is m(A) too.
Depending on when you last studied Riemann integration, you may better recall Rie-
mann’s classical approach rather than the Darboux approach above. This version involves
a quite expansive notation:
◦ f denotes a real-valued, bounded function defined on the interval [a, b].
◦ $x_0, x_1, \dots, x_k$ denotes an increasing sequence of points in [a, b] (they will be
rectangle endpoints), where $x_0 = a$ and $x_k = b$.
◦ P denotes the partition of [a, b] into subintervals defined by the $x_i$, that is, into
subintervals $[x_{i-1}, x_i]$.
◦ $\delta x_i$ denotes the length of the ith interval, $x_i - x_{i-1}$.
◦ $\|P\|$ denotes the norm of the partition, $\max \delta x_i$.
◦ $x_1^*, \dots, x_k^*$ denotes any selection of points such that $x_i^* \in [x_{i-1}, x_i]$.
With these pieces in hand, we can define the Riemann sums and the Riemann integral.
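4.6. DEFINITION. With f, P, $\delta x_i$, $x_i^*$ as above, the corresponding Riemann sum is:
\[ R(f, P, x_i^*) = \sum_{i=1}^{k} f(x_i^*)\, \delta x_i \]
The Riemann integral of f is then the limit
\[ \int_a^b f = \lim_{\|P\| \to 0} R(f, P, x_i^*) \]
provided this limit exists. Here the limit “exists” and equals L if for all ε > 0 there exists
δ > 0 such that for all P and $x_i^*$ we have $\|P\| < \delta$ implies $|R(f, P, x_i^*) - L| < \varepsilon$.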
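To make the notation concrete, the following Python sketch evaluates a Riemann sum for a given partition and selection of points. It assumes the standard form of the sum, namely f(x_i*) times the length of the ith subinterval, added over i. The function name riemann_sum and the argument names points and tags are our own.

```python
def riemann_sum(f, points, tags):
    """Sum of f(x_i*) * (x_i - x_{i-1}) over the partition.

    points is x_0 < x_1 < ... < x_k and tags is x_1*, ..., x_k* with
    x_{i-1} <= x_i* <= x_i.
    """
    if len(tags) != len(points) - 1:
        raise ValueError("need exactly one tag per subinterval")
    total = 0.0
    for left, right, tag in zip(points, points[1:], tags):
        if not left <= tag <= right:
            raise ValueError("tag outside its subinterval")
        total += f(tag) * (right - left)
    return total


k = 1000
points = [i / k for i in range(k + 1)]
midpoints = [(points[i] + points[i + 1]) / 2 for i in range(k)]
print(riemann_sum(lambda x: x * x, points, midpoints))  # close to 1/3
```

Here the partition has norm 1/k, and any other admissible choice of tags would give a value within (f(1) − f(0))/k of this one, since f is increasing on [0, 1].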
EXERCISE 4.1 (See Tao, Ex 1.1.21). Show that the pc integral is well-defined, and satisfies
the normality, linearity, and monotonicity properties.
EXERCISE 4.2 (Tao, Ex 1.1.22). Let f be a bounded function on the interval [a, b]. Then
f is integrable in the Darboux sense if and only if f is integrable in the classical Riemann
sense, and in this case the two values agree.
EXERCISE 4.4 (Tao, Ex 1.1.24). Complete the proof of Proposition 4.4: Show that the
Riemann integral satisfies the linearity and monotonicity properties. (Hint: first establish
these properties for the pc integral.)
5. INTRODUCTION TO LEBESGUE MEASURE
The Jordan measure that we have constructed works very well for the sets that it
measures. And the Riemann integral works very well for the functions that it integrates.
But there are several shortcomings that we have discussed, and several more too.
◦ Unbounded sets are not Jordan measurable, and unbounded functions are not
Riemann integrable
◦ There are examples of bounded sets which are open or closed, but still not Jordan
measurable
◦ A countable union of Jordan measurable sets need not be Jordan measurable
◦ A pointwise limit of Riemann integrable functions need not be Riemann inte-
grable, even if it is again bounded
In this section we will strengthen the definition of Jordan measure to obtain the Lebesgue
measure. The Lebesgue measure possesses stronger properties than the Jordan measure,
including the ability to measure a wider class of sets. The price for this is that it will be
harder to establish these properties.
To begin, recall from an exercise in the Jordan measure section that we can rewrite the
definition of outer Jordan measure as follows.
\[ m^{*j}(A) = \inf \Bigl\{ \sum_{i=1}^{k} \operatorname{vol}(B_i) : B_i \text{ are boxes and } A \subset \bigcup_{i=1}^{k} B_i \Bigr\} \]
The idea of the Lebesgue measure is simply to replace the finite union and summation
with a countable union and summation.
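Carrying out this replacement in the formula for the outer Jordan measure would give an expression of the following shape. We state it only as a sketch, assuming the usual convention that $m^{*}$ denotes the Lebesgue outer measure and that the infimum is taken over all countable coverings of A by boxes.

```latex
m^{*}(A) = \inf \left\{ \sum_{i=1}^{\infty} \operatorname{vol}(B_i)
  \;:\; B_i \text{ are boxes and } A \subset \bigcup_{i=1}^{\infty} B_i \right\}
```

Since every finite covering can be padded with boxes of volume 0 to give a countable one, the infimum here is taken over a larger family of coverings than in the Jordan case.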
Notice that we have dropped the assumption that A is bounded. There are many
examples of unbounded sets with Lebesgue outer measure zero. In fact, every countable
set has Lebesgue outer measure zero.
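One way to see the last claim is to enumerate the countable set as x_1, x_2, … and to cover x_i by an interval of length ε/2^i, so that the total length used stays below ε. The Python sketch below merely adds up these lengths for the first few points. The function name cover_length and the use of exact fractions are our own choices for illustration.

```python
from fractions import Fraction


def cover_length(num_points, eps):
    """Total length of intervals of lengths eps/2, eps/4, ..., eps/2**num_points.

    Covering the i-th point of a countable set by an interval of length
    eps/2**i uses this much total length for the first num_points points.
    """
    return sum(eps / 2 ** i for i in range(1, num_points + 1))


eps = Fraction(1, 100)
for n in (1, 5, 50):
    total = cover_length(n, eps)
    print(n, total, total < eps)
```

However many points are covered, the total stays strictly below ε, and since ε was arbitrary the outer measure of the set must be 0.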
We also remark that we will not define an “inner” version of Lebesgue measure analogous
to the Jordan inner measure. The reason is that we do not wish to assume that positive
measure sets will contain any positive volume boxes. For example, the set [0, 1] ∖ Q
should have a measure of 1 but has lower Jordan measure 0. In fact, even if we replace the
finite summation from Jordan inner measure with a countable summation, the resulting
inner measure would still be 0!
5.2. DEFINITION. Let A be any subset of R^n. We say that A is Lebesgue measurable if for
every ε > 0 there exists a sequence of boxes $B_i$ such that $A \subset \bigcup B_i$ and $m^*(\bigcup B_i \setminus A) < \varepsilon$.
Most sources actually define A to be Lebesgue measurable if for every ε > 0 there
exists an open set O such that A ⊂ O and m∗(O ∖ A) < ε. While this definition using open
sets is more elegant, our official definition using unions of boxes agrees more closely with
our definition of Lebesgue outer measure. Our work of the next few sections will reveal
how to show that these two definitions are equivalent.
We will see in the rest of this section and the next that the Lebesgue measure agrees
with the Jordan measure on the Jordan measurable sets, and moreover is capable of mea-
suring significantly more sets. In fact the Lebesgue measurable sets encompass almost
everything seen in real analysis and its applications, with exceptions essentially boiling
down to certain Axiom of Choice constructions. The Lebesgue measure also satisfies all
the measure axioms that we have mentioned so far, including their countable versions.
Likewise, later on we will introduce the corresponding Lebesgue integral. This in-
tegral agrees with the Riemann integral, and is capable of integrating significantly more
functions. It also has significantly stronger properties than the Riemann integral, includ-
ing a countable version of linearity.
Before we begin working to establish all these claims, we study the Lebesgue outer
measure further. In order to proceed, it is useful to lay out what properties are expected
of an outer measure. The following will be referred to as the outer measure axioms.
(a) (empty set) m∗ (∅) = 0
(b) (monotonicity) If A ⊂ B then m∗ ( A) ≤ m∗ ( B)
(c) (countable subadditivity) $m^*(\bigcup A_n) \le \sum m^*(A_n)$
Since the outer measure applies to all sets, and we have seen there exist non-measurable
sets, we do not expect outer measure to satisfy countable additivity in general. Still axiom
(c) is quite strong: the Jordan outer measure does not satisfy countable subadditivity.
5.3. PROPOSITION. The Lebesgue outer measure satisfies the outer measure axioms (a)–(c).
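PROOF. The axioms (a) and (b) are both trivial, so it remains to prove only axiom
(c). Let $A_n$ be arbitrary sets and let ε > 0 be given. From the definition of Lebesgue
outer measure, for each n we can find a sequence of boxes $B_i^n$ such that $A_n \subset \bigcup_i B_i^n$ and
$\sum_i \operatorname{vol}(B_i^n) \le m^*(A_n) + \varepsilon/2^n$. The boxes $B_i^n$, taken over all i and n, cover $\bigcup A_n$, so
\[
\begin{aligned}
m^*\Bigl(\bigcup A_n\Bigr) &\le \sum_n \sum_i \operatorname{vol}(B_i^n) \\
&\le \sum_n \bigl(m^*(A_n) + \varepsilon/2^n\bigr) \\
&\le \sum_n m^*(A_n) + 2\varepsilon
\end{aligned}
\]
Since ε was arbitrary, axiom (c) follows.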
EXERCISE 5.1 (Tao, Ex 1.2.1). Show that the countable union of Jordan measurable sets
need not be Jordan measurable, even when bounded. Show that the countable intersection
of Jordan measurable sets need not be Jordan measurable.
EXERCISE 5.3. Show that $m^*(A) \le m^{*j}(A)$. Give an example of a set A such that
$m^*(A) < m^{*j}(A)$.
EXERCISE 5.4. Show that Jordan outer measure does not satisfy countable subadditivity.
EXERCISE 5.5. Show that if A is Lebesgue null, that is, $m^*(A) = 0$, then A is Lebesgue
measurable.
6. LEBESGUE OUTER MEASURE
We have shown that the Lebesgue outer measure satisfies countable subadditivity. We
are really interested in additivity, but we know that even the finite additivity axiom cannot
hold for all sets. In the end, we will prove that countable additivity is true for measurable
sets. For the moment, we will be satisfied with the following version of additivity which
holds in special cases.
6.1. LEMMA. Suppose that A, B are positively separated, that is, that
d(A, B) = inf { d(x, y) | x ∈ A, y ∈ B } > 0. Then m∗(A ∪ B) = m∗(A) + m∗(B).
Let us first consider an easy case when each $C_i$ meets at most one of the sets A, B. Then
we can rewrite the sequence $\{C_i\}$ as $\{D_i\} \cup \{E_i\}$, where the $D_i$’s meet only A and the $E_i$’s
meet only B. Now
\[
\begin{aligned}
m^*(A \cup B) &> \sum \operatorname{vol}(C_i) - \varepsilon \\
&= \sum \operatorname{vol}(D_i) + \sum \operatorname{vol}(E_i) - \varepsilon \\
&\ge m^*(A) + m^*(B) - \varepsilon
\end{aligned}
\]
Taking ε → 0, we are done in this case.
In the general case, we can reduce to the easy one by partitioning each $C_i$ into smaller
boxes, each with diameter smaller than d(A, B). Once this is done, each new box meets at
most one of A, B and we may proceed as above.
Up to this point, we have not yet shown that m∗ ever takes a nonzero value! In fact m∗
satisfies a strong normality axiom, which states that the outer measure of an elementary
set is equal to its elementary measure. When we proved this property for Jordan measure,
we started by showing that one cannot partition an interval into finitely many subinter-
vals whose lengths somehow add up to less than the original. For countable partitions
this is intuitively still true, but much harder to show!
PROOF. It is clear that $m^*(E) \le m_e(E)$, since E is itself a union of boxes whose volumes
sum to m(E). Thus it remains only to show $m^*(E) \ge m_e(E)$. Appealing to the definition
of $m^*(E)$, given any ε > 0 we can find boxes $B_i$ such that $E \subset \bigcup B_i$ and
$\sum \operatorname{vol}(B_i) - m^*(E) < \varepsilon$. Rearranging, this says $m^*(E) > \sum \operatorname{vol}(B_i) - \varepsilon$. Now we would like to say that
$\sum \operatorname{vol}(B_i) \ge m_e(E)$, but unfortunately the elementary measure is only finitely subadditive.
In order to proceed, let us temporarily assume that E is closed and the Bi are open.
We recall that any closed and bounded set is compact, and that any covering of a compact
set by open sets has a finite subcovering. Thus under these assumptions, we have that
just finitely many of the Bi are needed to cover E. Thus the argument of the previous
paragraph works in this case!
In order to assume that the $B_i$ are open, we can enlarge each slightly and find an open
box $B_i'$ such that $B_i \subset B_i'$ and $\operatorname{vol}(B_i') - \operatorname{vol}(B_i) < \varepsilon/2^i$.
In order to assume that E is closed, first write it as a finite union of disjoint boxes
$C_1, \dots, C_k$. Shrinking each $C_i$ slightly, we can find a closed box $C_i' \subset C_i$ such that
$m_e(C_i \setminus C_i') < \varepsilon/k$. Replacing E with $\bigcup C_i'$ we obtain a closed set as desired.
As a consequence of the theorem, we now know that finite additivity holds for m∗ for
finite unions of disjoint boxes (after all it is true for the elementary measure). In fact it also
holds for finite unions of almost disjoint boxes: here two boxes are said to be almost disjoint
if they have disjoint interiors. This is because the elementary measure of the boundary of
a box is always zero. The next result extends this from finite to countable unions.
6.3. THEOREM. Suppose Bi is a sequence of pairwise almost disjoint boxes. Then m∗(⋃ Bi) =
∑ vol(Bi).
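PROOF. By subadditivity together with the previous theorem, we have m∗(⋃ Bi) ≤
∑ m∗(Bi) = ∑ vol(Bi). For the reverse inequality, monotonicity and finite additivity for
almost disjoint boxes give m∗(⋃ Bi) ≥ m∗(B1 ∪ · · · ∪ BN) = vol(B1) + · · · + vol(BN) for
every N; letting N → ∞ gives m∗(⋃ Bi) ≥ ∑ vol(Bi).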
6.5. L EMMA . Let A be any subset of Rn . Then m∗ ( A) = inf { m∗ (O) | O is open and A ⊂ O }.
    m∗(A) ≥ ∑ vol(Bi) − e
          ≥ m∗(⋃ Bi) − e
          ≥ inf { m∗(O) | O is open and A ⊂ O } − e
Taking e → 0, we obtain the desired result.
In the next section we will use these partial results to conclude that the Lebesgue
(outer) measure always behaves well on the measurable sets.
E XERCISE 6.2 (Tao, Ex 1.2.6). Show that it is not true in general that
that countable unions of boxes are Lebesgue measurable. In the next result we work to
establish that many, many other sets are Lebesgue measurable too.
7.1. T HEOREM . Open and closed sets are Lebesgue measurable. Complements, countable
unions, and countable intersections of measurable sets are measurable.
P ROOF. Since the boxes form a base for the topology of Rn , any open set can be written
as a union of boxes. (Or see Proposition 6.4.) Thus by the remark above, open sets are
Lebesgue measurable.
For countable unions, suppose that An are Lebesgue measurable. Given e > 0, find
for each n a countable union of boxes Un such that An ⊂ Un and m∗(Un ∖ An) < e/2^n.
Then we have
    m∗(⋃ Un ∖ ⋃ An) ≤ m∗(⋃ (Un ∖ An))
                    ≤ ∑ m∗(Un ∖ An)
                    ≤ ∑ e/2^n ≤ 2e
This shows that ⋃ An is measurable.
For closed sets, assume first that A is closed and bounded, and thus compact. Using
the outer regularity lemma we can find an open set O such that A ⊂ O and m∗(O) −
m∗(A) < e. We wish to show that m∗(O ∖ A) < e too. Since O ∖ A is open, we can use
Proposition 6.4 to write O ∖ A as an almost disjoint union of closed dyadic cubes Cn. Then
⋃_{n=1}^N Cn is compact and thus positively separated from the compact set A. By Lemma 6.1,
additivity holds for positively separated sets, so we have:
    m∗(A) + m∗(⋃_{n=1}^N Cn) = m∗(A ∪ ⋃_{n=1}^N Cn)
                              ≤ m∗(O)
                              < m∗(A) + e
are using normality and subadditivity to achieve this estimate.) Taking the intersection of
the On we now have A ⊂ ⋂ On and m∗(⋂ On ∖ A) = 0. Writing these two expressions in
as a union of two sets: ⋃ On^c and A^c ∖ ⋃ On^c. The first is Lebesgue measurable because it
is a countable union of closed sets. The second is Lebesgue measurable because it is null
(see an earlier exercise). Appealing again to the closure under unions, we conclude that
A^c is measurable too.
For countable intersections, we can simply apply De Morgan’s laws to reduce to
complements and countable unions. Whew!
The above theorem thus shows that the Lebesgue measurable sets form a σ-algebra,
that is, a family of sets that is closed under countable unions, countable intersections,
and complements. It moreover shows that the Lebesgue measurable sets include the
well-known class of Borel sets, that is, the σ-algebra generated by the open and closed
sets. The Borel sets are often identified as those which can be explicitly described. Most
sets we encounter in analysis can be explicitly described and are thus Borel and Lebesgue
measurable.
We now know that Borel sets are Lebesgue measurable, null sets are Lebesgue mea-
surable, and the measurable sets form a σ-algebra. The next result concludes that this
information characterizes the Lebesgue measurable sets.
7.2. P ROPOSITION . The collection of Lebesgue measurable sets is the least σ-algebra contain-
ing both the open sets and the Lebesgue null sets.
PROOF. It is clear that the Lebesgue measurable sets are a σ-algebra containing the
open sets and the Lebesgue null sets. On the other hand suppose that E is a Lebesgue
measurable set. By the previous lemma for all n we can find open sets On such that
E ⊂ On and m∗(On ∖ E) < 1/n. It follows that N = ⋂ On ∖ E is Lebesgue null. Now we have
that
    E = (⋂ On) ∩ N^c
and thus E lies in the σ-algebra generated by the open sets and the Lebesgue null sets.
We conclude this section with some useful equivalents of Lebesgue measurability,
similar to the ones we developed for Jordan measurability. The following result implies
that the Lebesgue measurable sets can be characterized as those which are “almost open.”
7.3. LEMMA. A set A is Lebesgue measurable if and only if for all e > 0 there exists an open
set O such that m∗(O ∆ A) < e.
and m∗(B ∖ A) = 0. We have thus shown that A differs from a measurable set by a null
set, and we leave it as an exercise to check that this implies A is measurable too.
Perhaps even more surprising, the Lebesgue measurable sets of finite measure can be
characterized as those which are “almost elementary”.
7.4. LEMMA. A set A is Lebesgue measurable with finite Lebesgue measure if and only if for
all e > 0 there exists an elementary set E such that m∗(E ∆ A) < e.
PROOF. Suppose that A is Lebesgue measurable with finite measure and let e > 0 be given.
Let O be an open set such that A ⊂ O and m∗(O ∖ A) < e. Then O can be written as a union
of almost disjoint boxes O = ⋃ Bi, and we know that m(O) = ∑ vol(Bi).
Now m(O) < m(A) + e and the right-hand side is finite, so the sum ∑ vol(Bi) converges.
Thus there exists some N such that ∑_{i>N} vol(Bi) < e. Letting E = ⋃_{i=1}^N Bi, we have
that E is elementary and m(O ∖ E) < e. Thus by subadditivity we have
    m∗(E ∆ A) ≤ m∗(E ∖ A) + m∗(A ∖ E)
              ≤ m∗(O ∖ A) + m∗(O ∖ E)
              < 2e
which is sufficient to prove the implication. The converse implication is similar to the
previous lemma.
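The truncation step in this proof can be tried out numerically. The following is only a sketch under stated assumptions: we take a hypothetical open set whose almost disjoint boxes Bi have volume 2^-i, and the function name `truncation_index` is ours, not part of these notes. It finds the least N for which the tail ∑_{i>N} vol(Bi) drops below e.

```python
from fractions import Fraction

def truncation_index(volumes, total, eps):
    """Return the least N >= 1 such that total - sum(volumes[:N]) < eps.

    `volumes` lists vol(B_1), vol(B_2), ... and `total` is their finite sum,
    so total minus the partial sum is exactly the tail sum over i > N.
    """
    partial = Fraction(0)
    for n, v in enumerate(volumes, start=1):
        partial += v
        if total - partial < eps:
            return n
    raise ValueError("not enough terms supplied to reach the tolerance")

# Hypothetical example: vol(B_i) = 2**-i, so the total volume is 1
# and the tail after N terms is exactly 2**-N.
volumes = [Fraction(1, 2**i) for i in range(1, 40)]
N = truncation_index(volumes, Fraction(1), Fraction(1, 1000))
print(N)  # 10, since 2**-10 < 1/1000 <= 2**-9
```

The elementary set E is then the union of the first N boxes, and everything discarded has total volume less than e.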
We have now established many useful properties of the outer measure m∗ and shown
that it has a broad collection of measurable sets. In the next section we will confirm as
promised that m∗ behaves very well when restricted to the collection of measurable sets.
EXERCISE 7.1 (See Tao, Ex 1.2.7). (a) Show that A is measurable iff for all e > 0
there exists a closed set F ⊂ A such that m∗(A ∖ F) < e.
(b) Show that A is measurable iff for all e > 0 there exists a measurable set B such
that m∗(A ∆ B) < e.
E XERCISE 7.2 (Tao, Ex 1.2.14). Show that any set A is contained in a Lebesgue measur-
able set B such that m( B) = m∗ ( A).
E XERCISE 7.3 (Tao, Ex 1.2.15). Show the inner regularity property: If A is Lebesgue
measurable, then
m( A) = sup { m(K ) | K ⊂ A, K compact }
8. LEBESGUE MEASURE 25
In the previous section we established that many sets are Lebesgue measurable. When
A is Lebesgue measurable we simply write m( A) for m∗ ( A), and we call m the Lebesgue
measure. We are finally ready to prove that the Lebesgue measure satisfies the require-
ments of a measure that we laid out in the first section, at least when they are applied to
Lebesgue measurable sets.
surable sets An .
PROOF. We have already established normality for m∗ and hence for m. Translation-
invariance of m∗ is clear from the definition since it is clear for boxes.
For countable additivity, first recall that we always have subadditivity so we need only
show m(⋃ An) ≥ ∑ m(An). Suppose first that the An are compact. Then they are pairwise
positively separated, so by Lemma 6.1 and monotonicity we have m(⋃ An) ≥
m(A1 ∪ · · · ∪ AN) = m(A1) + · · · + m(AN) for every N. Letting N → ∞ we obtain m(⋃ An) ≥
∑ m(An) as desired.
Next assume that the An are bounded but not necessarily closed. By the measurability
of An^c we can find open sets On such that An^c ⊂ On and m∗(On ∖ An^c) < e/2^n. Taking
complements we thus have compact sets Kn ⊂ An such that m∗(An ∖ Kn) < e/2^n. Now
using the additivity for compact sets,
    m(⋃ An) ≥ m(⋃ Kn)
            = ∑ m(Kn)
            ≥ ∑ (m(An) − e/2^n)
            = ∑ m(An) − e
Taking e → 0, we are finished in this case.
Finally for general An, decompose R^d into disjoint bounded cells Cm. Then An =
⋃_m (An ∩ Cm). Now the sets An ∩ Cm are bounded, so applying the result for bounded sets
twice we have:
    m(⋃ An) = ∑_n ∑_m m(An ∩ Cm)
            = ∑_n m(An)
We have now established the existence and all of the promised axioms of the Lebesgue
measure. Additional useful properties can be derived from the axioms, such as the fol-
lowing result concerning continuity of the measure function.
PROOF. For the upwards MCT, let An′ = An ∖ An−1 and note that the An′ are disjoint
and have the same union as before: ⋃ An′ = ⋃ An. Note that we implicitly set the value
A0 = ∅. Then we have
    m(⋃ An) = m(⋃ An′)
            = ∑ m(An′)
            = ∑ (m(An) − m(An−1))
            = lim_N ∑_{n=1}^N (m(An) − m(An−1))
            = lim_N m(AN)
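The telescoping in the upwards argument can be checked on a concrete increasing sequence. This is a small sketch, assuming the hypothetical sets An = [0, 1 − 1/n] in R (our choice, not from the notes), whose union is [0, 1); the helper names are ours.

```python
from fractions import Fraction

def m(n):
    """Lebesgue measure of A_n = [0, 1 - 1/n] for n >= 1; A_0 is empty."""
    return Fraction(0) if n == 0 else 1 - Fraction(1, n)

def measure_of_union_up_to(N):
    """Sum of m(A'_n) for n = 1..N, where A'_n is A_n with A_{n-1} removed."""
    return sum((m(n) - m(n - 1) for n in range(1, N + 1)), Fraction(0))

for N in (1, 10, 1000):
    # The sum telescopes, so it equals m(A_N) exactly.
    assert measure_of_union_up_to(N) == m(N)

print(float(m(1000)))  # 0.999, approaching m([0, 1)) = 1
```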
Cancelling the m(A1) from the first and last expression, we obtain that 0 = m(⋂ An) −
lim m(An), which implies the desired result.
E XERCISE 8.1 (Tao Ex 1.2.11(iii)). Give a counterexample showing that the hypothesis
that some An has finite measure is necessary for the downwards MCT.
E XERCISE 8.2 (Tao Ex 1.2.12). Suppose you know that the domain of m is a σ-algebra,
and m satisfies m(∅) = 0 and the countable additivity property. Show that m satisfies the
monotonicity property and the countable subadditivity property.
E XERCISE 8.3 (Tao Ex 1.2.13). Let us say that a sequence of sets An converges to A if
the characteristic functions χ An converge pointwise to χ A .
(a) Show that if An are Lebesgue measurable and An converges to A then A is Lebesgue
measurable. [Hint: Show that if An converges to A then A = ⋃_n ⋂_{m>n} Am and
also A = ⋂_n ⋃_{m>n} Am.]
(b) Show that if An are all contained in a set of finite measure and An converges to
A, then m(An) → m(A). This is an example of the dominated convergence theorem.
(c) Give a counterexample showing that the hypothesis that An are all contained in
a set of finite measure cannot be replaced with the hypothesis that the values
m( An ) are bounded.
PART II
9.1. DEFINITION. A function f mapping R^d into the extended real numbers [0, ∞]
(or sometimes into C) is called simple if there exists a partition of R^d into finitely many
Lebesgue measurable subsets A1, . . . , Ak such that f takes a constant value ci on each Ai.
Equivalently, we may say that f is simple if it is of the form f = ∑_{i=1}^k ci χAi where Ai are
Lebesgue measurable sets.
The simple functions are the source of the following commonly held intuition about
Lebesgue integration: While Riemann integration relies on cutting into vertical strips,
Lebesgue integration relies on cutting into horizontal strips. The idea is that the region
below a simple function consists of finitely many horizontal strips with measurable cross-
sections, and thus it is very simple to compute the integral of such a function, as is done
in the next definition. We can then approximate many non-simple functions using simple
functions, as is done in the next section.
9.2. DEFINITION. If f = ∑_{i=1}^k ci χAi is a simple function and f ≥ 0, then the simple integral
of f is defined to be s∫ f = ∑_{i=1}^k ci m(Ai).
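As an illustration of the definition, the simple integral is just a finite weighted sum. The sketch below is ours and makes two assumptions: the function is handed over as a list of pairs (ci, m(Ai)), and products follow the usual measure-theoretic convention 0 · ∞ = 0, which is assumed here rather than quoted from these notes.

```python
def simple_integral(terms):
    """Simple integral of f = sum c_i * chi_{A_i}, given pairs (c_i, m(A_i)).

    Both c_i and m(A_i) are nonnegative and may be float('inf');
    we assume the convention 0 * inf = 0.
    """
    total = 0
    for c, measure in terms:
        if c == 0 or measure == 0:
            continue  # convention: 0 * inf = 0
        total += c * measure
    return total

# Hypothetical example on R: f = 2 on [0, 1), 5 on [1, 3), 0 elsewhere.
f = [(2, 1), (5, 2), (0, float("inf"))]
print(simple_integral(f))  # 2*1 + 5*2 = 12
```

Because f ≥ 0, no term is ever negative, so the sum cannot produce an indeterminate ∞ − ∞.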
9. PREVIEW OF INTEGRATION, SIMPLE INTEGRATION 29
Note that we assume f ≥ 0 to ensure that the value of the simple integral is never
indeterminate. As was the case with both elementary measure and pc integral, we have
to check that the simple integral is well-defined.
PROOF. We prove the first property (i), since after that properties (ii) and (iii) are very
similar to the analogous properties of the Riemann integral. Given simple functions f and
g, we can refine their expressions to find measurable sets A1, . . . , An and a null set N such
that f(x) = ∑ ci χAi(x) for all x ∉ N and g(x) = ∑ ci χAi(x) for all x ∉ N. Then clearly the
simple integral of both f and g evaluates to ∑ ci m(Ai) + 0.
Throwing away null sets is common in analysis, and thanks to our understanding of
the Lebesgue measure it carries with it a lot of power. When studying sets and functions in
the measure context, it will even be useful to modify our logic. We will use the quantifiers
∀∗ x and ∃∗ x to mean “the statement holds for all but a null set of x” and “there exists a
non-null set of x such that the statement holds”.
We close this section with a preview of how the definition of the Lebesgue integral
will proceed in several stages. In the next section, we will use a familiar approximation or
limit idea to extend the simple integral to a much wider class of nonnegative functions.
In two sections, we will show how to extend the integral from nonnegative functions to
complex-valued functions.
To imagine how this latter part will go, it is useful to recall the development of infinite
series. Recall that if an ≥ 0 we can define ∑ an as simply sup_N ∑_{n=1}^N an. Next if an are
arbitrary real numbers we say that the terms are absolutely summable if ∑ |an| < ∞. In
that case we split each term into its positive part a_n^+ = max(an, 0) and its negative part
a_n^− = max(−an, 0). In this way we have for each term an = a_n^+ − a_n^− and we may define
∑ an = ∑ a_n^+ − ∑ a_n^−. Note that the assumption that an is absolutely summable guarantees
that the latter expression is not indeterminate. Finally if an are complex numbers then we
again assume that ∑ |an| < ∞, that is, the terms are absolutely summable in the complex
sense. In that case we can divide each term into its real part Re an and imaginary part Im an,
and define ∑ an = ∑ Re an + i ∑ Im an.
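The splitting into positive and negative parts can be seen in a few lines of code. This is a sketch only: it works with a finite truncation of a hypothetical series, an = (−1)^n / 2^n, chosen by us, and `signed_sum` is our own helper name.

```python
def signed_sum(terms):
    """Sum absolutely summable real terms via positive and negative parts."""
    pos = sum(max(a, 0.0) for a in terms)   # sum of the positive parts a_n^+
    neg = sum(max(-a, 0.0) for a in terms)  # sum of the negative parts a_n^-
    return pos - neg

# Hypothetical example: a_n = (-1)**n / 2**n for n = 0..59 (a truncation).
terms = [(-1) ** n / 2 ** n for n in range(60)]
value = signed_sum(terms)
print(abs(value - 2 / 3) < 1e-12)  # True: the full series sums to 2/3
```

Both partial sums are finite here (about 4/3 and 2/3), which is exactly what absolute summability guarantees.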
E XERCISE 9.1. Show that a function f is simple if and only if it can be expressed as
f = ∑1k ci χ Ai , where Ai are (not necessarily disjoint) Lebesgue measurable sets.
E XERCISE 9.2 (see Tao, Ex 1.3.1). Show that the simple integral satisfies the properties:
(a) (finiteness) s∫ f < ∞ if and only if f is finite almost everywhere and supported
Just as the Riemann integral was able to integrate functions that can be well approx-
imated by pc functions, the Lebesgue integral will be able to integrate functions that can
be well approximated by simple functions. At the time we did not give a direct defini-
tion of the Riemann integrable functions. This time we will define in advance the class of
functions for which the Lebesgue integral will make sense. As indicated, we begin with
just the nonnegative real-valued functions.
As was the case with Lebesgue measurable sets, the Lebesgue measurable functions
can be equivalently described in a number of ways, each being useful in some situations.
10.2. T HEOREM . A nonnegative function f is measurable if and only if either of the following
holds.
(a) there is a sequence f n of simple functions such that the f n are bounded and have bounded
support, the f n are increasing f n ≤ f n+1 , and f = sup f n ;
(b) for any open set S (respectively: closed set, interval, ray, etc) the preimage f −1 (S) is
Lebesgue measurable.
Before the proof, recall that given a sequence xn we define lim sup xn = inf_N sup_{n≥N} xn
and lim inf xn = sup_N inf_{n≥N} xn. The lim sup is the largest limit point of xn, that is, the
largest number that is the limit of a subsequence of xn . Similarly, the lim inf is the smallest
limit point of xn . The limit lim xn exists if and only if lim sup xn = lim inf xn , and lim xn
equals this common value.
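To get a feel for these definitions, one can compute the tail suprema and infima of a finite stretch of a sequence. The sketch below is only an approximation, since a finite list cannot see the whole tail; the example sequence xn = (−1)^n (1 + 1/n) and the helper names are our own.

```python
def tail_sups(xs):
    """sup_{n >= N} x_n for each N, computed over a finite list xs."""
    out, running = [], float("-inf")
    for x in reversed(xs):
        running = max(running, x)
        out.append(running)
    return out[::-1]

def tail_infs(xs):
    """inf_{n >= N} x_n for each N, computed over a finite list xs."""
    return [-s for s in tail_sups([-x for x in xs])]

# Hypothetical example: x_n = (-1)**n * (1 + 1/n), n = 1..2000.
xs = [(-1) ** n * (1 + 1 / n) for n in range(1, 2001)]
sups, infs = tail_sups(xs), tail_infs(xs)
# Tail sups decrease toward lim sup = 1; tail infs increase toward lim inf = -1.
print(sups[1000], infs[1000])
```

Here the lim sup and lim inf differ, so the sequence has no limit, in line with the criterion just stated.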
P ROOF. We first show that if f is measurable, then (b) holds. So let f n be simple
functions such that f = lim f n pointwise. Note that by the above discussion, we have that
f = lim sup f n pointwise. Now suppose that S = (λ, ∞) is an open ray. Then we want to
say that
    x ∈ f⁻¹(S) ⇐⇒ f(x) > λ
               ⇐⇒ inf_N sup_{n≥N} fn(x) > λ
               ⇐⇒ (∀N)(∃n ≥ N) fn(x) > λ
               ⇐⇒ x ∈ ⋂_N ⋃_{n≥N} fn⁻¹(λ, ∞]
Since the f n are simple, it is clear that the set in the last line is measurable and therefore
f −1 (S) would be measurable. However the argument isn’t right, since for instance it isn’t
10. LEBESGUE MEASURABILITY OF FUNCTIONS 32
quite true that (∀n)zn > λ implies infn zn > λ. The correct calculation introduces a couple
additional steps but ultimately accomplishes the same thing:
    x ∈ f⁻¹(S) ⇐⇒ f(x) > λ
               ⇐⇒ (∃e) f(x) ≥ λ + e
               ⇐⇒ (∃e)(∀N) sup_{n≥N} fn(x) ≥ λ + e
Once again, this establishes that f −1 (S) is measurable. Now an analogous argument will
allow us to handle the case when S = (−∞, µ). Since any open interval is an intersection
of two open rays, and any open set is a countable union of intervals, we can conclude that
for any open S the set f −1 (S) is measurable. This establishes (b).
Next we argue that (b) implies (a). Suppose that f satisfies condition (b). Given any n,
we will define a nonnegative simple function f n ≤ f as follows:
    fn(x) = max { i/2^n | i/2^n ≤ f(x) and i/2^n ≤ n }   if x ∈ [−n, n]^d,
    fn(x) = 0                                            otherwise.
The above prescription clearly ensures that f n ≤ f , that f n is bounded above by n, and
that f n has bounded support [−n, n]d . Moreover it is not difficult to check that f n ≤ f n+1
and that f n → f . It remains only to show that f n is simple, and since f n clearly takes
just finitely many values we really only have to check that it takes each of its values on
a measurable set. For this, for example when 0 < i/2^n < n and x ∈ [−n, n]^d we have
fn(x) = i/2^n if and only if f(x) lies in the interval [i/2^n, (i+1)/2^n). It follows from
property (b) that fn⁻¹(i/2^n) is a measurable set, and
therefore we have that fn is simple and f satisfies property (a).
Finally it is trivial that (a) implies f is measurable, so we have completed the proof.
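The construction of the approximating functions fn is concrete enough to run. The following sketch takes d = 1 and a hypothetical function f(x) = x² (our choice); it uses the observation that the largest multiple of 2^-n below both f(x) and n is obtained by flooring min(f(x), n) to the dyadic grid.

```python
import math

def dyadic_approx(f, n):
    """Return f_n: the largest multiple of 2**-n that is <= min(f(x), n)
    on the cube [-n, n] (here d = 1), and 0 outside it."""
    def f_n(x):
        if abs(x) > n:
            return 0.0
        capped = min(f(x), n)
        return math.floor(capped * 2 ** n) / 2 ** n
    return f_n

f = lambda x: x * x  # hypothetical nonnegative measurable function
x = 1.3
values = [dyadic_approx(f, n)(x) for n in range(1, 12)]
# The values increase with n and stay below f(x); once n exceeds both |x|
# and f(x), the gap f(x) - f_n(x) is smaller than 2**-n.
print(values[0], values[-1], f(x))
```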
It is worth remarking that by property (b) of Theorem 10.2, measurability can be viewed
as a massive generalization of continuity. Recall that a function f is continuous if and only
if whenever S is open we have f −1 (S) open. In property (b), we ask merely that f −1 (S)
be Lebesgue measurable, a much weaker demand.
Notice also that since preimages are stable under unions, intersections, and comple-
ments, property (b) implies that if S is Borel then f −1 (S) will be measurable too. But if S
is merely measurable, there is no guarantee that f −1 (S) will be measurable! To see this
consider a function f which is a bijection between [0, 1] and a null set. For example one
can map [0, 1] into the Cantor set C injectively almost everywhere by operating on binary
Now if N is a Lebesgue nonmeasurable subset of [0, 1], we have that S = f ( N ) is null but
the preimage f −1 (S) is non-measurable.
To close the section, we extend the definition of measurable function from nonnegative
functions only to complex-valued functions in the following way. Recall that if f is
a real-valued function, then we can define its positive and negative parts:
f + = max( f , 0)
f − = max(− f , 0)
We note that the above definition is equivalent to the alternate approach of simply
replacing nonnegative simple functions with complex-valued simple functions in the
definition of measurable function.
E XERCISE 10.1. Show that lim xn = x if and only if lim inf xn = x = lim sup xn .
Previously we defined the simple functions, showed that they can be integrated in an
obvious way, and showed that the integral satisfies basic desirable properties such as
additivity. Next we will define the lower integral for an arbitrary nonnegative function using
approximations by simple integrals. After establishing some basic properties of the lower
integral, we will see that it behaves very well when applied to measurable functions, and
in that case we will simply call it the Lebesgue integral.
Before investigating the Lebesgue integral itself, we will describe several properties
of the lower Lebesgue integral. While it is also possible to define the upper Lebesgue
integral, it is of more limited use than in the Riemann case. Later we will define the upper
Lebesgue integral just for bounded functions with bounded support. In general there are
functions which are measurable and should have finite integral, that do have the correct
lower Lebesgue integral, but do not have a finite upper Lebesgue integral.
It is clear that the lower Lebesgue integral agrees with the simple integral on the sim-
ple functions. It also inherits the equivalence and monotonicity properties from the simple
integral, but not linearity. Recall that m∗ was merely subadditive; this is essentially because
it was defined as an infimum. On the other hand the lower Lebesgue integral will
be superadditive; this is essentially because it is defined as a supremum.
PROOF. The equivalence and monotonicity properties are clear from the analogous
properties of simple integrals. For superadditivity, let e > 0 be given and find simple
functions h and k such that h ≤ f, k ≤ g, ∫f − s∫h < e, and ∫g − s∫k < e. Then we have
h + k ≤ f + g, and using monotonicity plus additivity for simple integrals:
    ∫(f + g) ≥ s∫(h + k)
             = s∫h + s∫k
             > ∫f + ∫g − 2e
11. LEBESGUE INTEGRATION OF NONNEGATIVE FUNCTIONS 35
11.3. LEMMA. Let f be a nonnegative function on R^d. The lower Lebesgue integral satisfies
the following identities.
(a) (range truncation) If fN = min(f, N) then ∫fN → ∫f.
(b) (support truncation) If fN = f χ[−N,N]^d then ∫fN → ∫f.
PROOF. (a) Let us first assume that ∫f < ∞. Given e > 0 we can find a simple
function g such that g ≤ f and ∫f − s∫g < e. By our assumption g must be bounded
almost everywhere, which implies that for N large enough we have g ≤ fN too. Now by
monotonicity ∫f − ∫fN ≤ ∫f − s∫g < e, which shows the desired result. The argument
Where here N → ∞ and we are applying the upwards monotone convergence theorem.
Thus we can find N large enough that s∫g − s∫gN < e. Again using monotonicity we
conclude that ∫f − ∫fN ≤ ∫f − s∫gN < 2e.
We are now ready to show that the Lebesgue integral behaves well when applied to
measurable functions.
11.4. THEOREM. If f, g are nonnegative measurable functions, then ∫(f + g) = ∫f + ∫g.
PROOF. First suppose that f, g are bounded functions with bounded supports. For
such functions, it is useful to define the upper Lebesgue integral in the obvious way:
    ∫̄ f = inf { s∫h | f ≤ h, h simple }
We claim that under our hypotheses, we in fact have ∫f = ∫̄f, that is, the lower and upper
integrals agree (and similarly for g and f + g).
To see this recall that since f is measurable, we can find simple functions f n such that
f n ≤ f n+1 and f n → f . Note also that since f is bounded, the construction of the f n from
Theorem 10.2 in fact showed that the f n converge uniformly to f . Thus given e > 0 we can
find n such that
f ≥ f n ≥ f − eχS
where S is a support for f. Taking the lower integral ∫ of the first inequality and the upper
integral ∫̄ of the second, we obtain
    ∫f ≥ s∫fn ≥ ∫̄(f − eχS) = ∫̄f − e m(S)
{ ( x, y) | 0 ≤ y ≤ f ( x ) }
We have now defined and explored the Lebesgue integral for nonnegative functions.
As previously explained, we will now proceed to extend this definition to signed and
even complex-valued functions. Some care will of course be needed; to see this consider
what would happen when trying to find the integral of sin( x ) or of 1/x over the whole
real line! In order to proceed, we will provide an assumption which guarantees that such
issues will not occur.
We can now define the Lebesgue integral for absolutely integrable functions f , by
using the real, imaginary, positive, and negative parts.
P ROOF. Let us first assume that f is nonnegative. Then since pc functions are simple,
we clearly have that the lower Darboux integral of f is less than or equal to the lower
Lebesgue integral of f . On the other hand by monotonicity the lower Lebesgue integral
of f is less than or equal to the upper Darboux integral of f . Since the lower and upper
Darboux integrals are equal, we must have that it agrees with the Lebesgue integral of f .
If f is real-valued, then we can write ∫f = ∫f+ − ∫f−, and this expression is valid
for both the Darboux and absolutely convergent Lebesgue integrals. Applying the previous
argument to both f+ and f−, we have the desired result.
12. LEBESGUE INTEGRATION 38
Next, the absolutely convergent Lebesgue integral of course inherits many of the properties
of the nonnegative integral, and has some new ones too.
12.4. PROPOSITION. (a) (linearity and conjugation) ∫(f + g) = ∫f + ∫g, ∫cf =
c∫f, and ∫f̄ is the complex conjugate of ∫f;
(b) (triangle inequality for integrals) |∫f| ≤ ∫|f|.
PROOF. We should first check that if f, g are absolutely integrable, then f + g and cf
are absolutely integrable. For the first we can simply use the classical triangle inequality
|f + g| ≤ |f| + |g|. Then monotonicity implies that ∫|f + g| ≤ ∫|f| + ∫|g| is finite. For
the second, simply note that ∫|cf| = ∫|c||f| = |c| ∫|f| is again finite.
Now linearity is easily proved using the analogous property for the nonnegative integral.
For example, if f, g are real-valued and h = f + g, then h+ − h− = f+ − f− + g+ −
g−. Rearranging terms we have f− + g− + h+ = f+ + g+ + h−. From the nonnegative
linearity we know ∫f− + ∫g− + ∫h+ = ∫f+ + ∫g+ + ∫h−. Rearranging back, we obtain
∫h = ∫f + ∫g.
For the triangle inequality for integrals, first assume that f is real-valued. Then we
can write f = f+ − f− and |f| = f+ + f−. Thus using linearity together with the triangle
inequality, we have
    |∫f| = |∫f+ − ∫f−|
         ≤ ∫f+ + ∫f−
         = ∫|f|
Next if f is complex-valued we can find an angle θ such that e^{iθ} ∫f = |∫f|. Again using
The last inequality follows from monotonicity, and gives the desired result.
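For a complex-valued simple function both sides of the triangle inequality are finite sums, so the statement and the rotation trick can be checked directly. The sketch below uses hypothetical values and measures chosen by us; it is an illustration, not part of the proof.

```python
import cmath

# Hypothetical complex-valued simple function: value c_i on a set of measure m_i.
terms = [(1 + 2j, 0.5), (-3 + 0j, 1.0), (2j, 0.25)]

integral = sum(c * m for c, m in terms)           # integral of f
abs_integral = sum(abs(c) * m for c, m in terms)  # integral of |f|

# Rotate by theta so that e^{i theta} * integral is real and nonnegative.
theta = -cmath.phase(integral)
rotated = cmath.exp(1j * theta) * integral

print(abs(rotated.imag) < 1e-12, abs(integral) <= abs_integral)  # True True
```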
One of the most appreciable aspects of the theory of integration is that the class of
absolutely integrable functions forms a vector space. Although the space is of course
infinite dimensional, it has a substantial amount of structure! Recall that a nonnegative
function ‖·‖ defined on a vector space is called a seminorm if ‖v + w‖ ≤ ‖v‖ + ‖w‖, and
‖cv‖ = |c|‖v‖.
12.5. PROPOSITION. The collection of absolutely integrable functions forms a vector space.
In fact it is a seminormed vector space with the seminorm ‖f‖ = ∫|f|.
PROOF. We argued in the previous proof that ∫|f + g| ≤ ∫|f| + ∫|g| and also that
∫|cf| = |c| ∫|f|. These two facts imply that the space of absolutely integrable functions
is closed under linear combinations and moreover that the two properties of the
seminorm hold.
A seminorm is called a norm if it additionally satisfies ‖v‖ = 0 =⇒ v = 0. If one is
willing to identify functions f, g which agree almost everywhere as being equal, then the
seminorm ‖f‖ = ∫|f| becomes a true norm. Indeed, it is an exercise to check that if ∫|f| = 0
then f = 0 almost everywhere, and thus in this sense f = 0.
While the vector space of absolutely integrable functions is infinite dimensional, the
next result shows that it is not too unwieldy topologically. Recall that a subset D of a
(semi-)normed vector space V is dense if for every v ∈ V and every e > 0 there exists d ∈ D
such that ‖v − d‖ < e. In other words, D is dense if every element of V can be approximated
by elements of D.
12.6. T HEOREM . The following are all dense subsets of the space of absolutely integrable
functions.
(a) absolutely integrable simple functions;
(b) absolutely integrable simple functions ∑_{i=1}^k ci χBi where Bi are all boxes; and
(c) continuous, compactly supported functions.
PROOF. (a) First assume that f is nonnegative. Then by the definition of the integral,
we can find a simple function g such that g ≤ f and ∫f − ∫g < e. It follows that
∫(f − g) < e, and since f − g is nonnegative, clearly ∫|f − g| < e. It is easy to extend
this argument to complex-valued functions using the standard technique.
(b) We now know from (a) that it is sufficient to approximate any simple function by
a function of this type. Using linearity, it is enough to approximate a single term χA, with
m(A) < ∞, by a function of this type. We have already seen that for any such A there
exists an elementary set E such that m(A ∆ E) < e. This means that ∫|χA − χE| < e, so
the result follows.
(c) We now know from (b) that it is sufficient to approximate any χB, B a box, by a
function of this type. It is possible to do this explicitly. For example in one dimension
we have B = I is an interval, and the step function χI can easily be approximated by a
continuous function which looks like a trapezoid.
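The trapezoid approximation in part (c) can be made explicit. In the sketch below, which is our own illustration, the trapezoid equals 1 on [a, b] and falls linearly to 0 over ramps of width delta on either side; the ramp width and the midpoint-rule estimate are assumptions of the example. The two triangular ramps have total area delta, so the distance ∫|χI − g| should come out close to delta.

```python
def trapezoid(a, b, delta):
    """Continuous function equal to 1 on [a, b], 0 outside (a - delta, b + delta),
    and linear on the two ramps in between."""
    def g(x):
        if a <= x <= b:
            return 1.0
        if a - delta < x < a:
            return (x - (a - delta)) / delta
        if b < x < b + delta:
            return ((b + delta) - x) / delta
        return 0.0
    return g

def l1_distance_to_indicator(a, b, delta, steps=20_000):
    """Midpoint-rule estimate of the integral of |chi_[a,b] - g|."""
    g = trapezoid(a, b, delta)
    lo, hi = a - delta, b + delta
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        indicator = 1.0 if a <= x <= b else 0.0
        total += abs(indicator - g(x)) * h
    return total

# Shrinking delta makes the trapezoid as close to chi_[0,1] as we like.
print(l1_distance_to_indicator(0.0, 1.0, 0.01))
```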
We will see later that this density result fits in with several results which loosely state
that integrable functions are “almost continuous.”
EXERCISE 12.1 (Tao, ex 1.3.25(i)). Let f be absolutely integrable. Show that for any
e > 0 there exists a bounded measurable set A such that ∫|f| χA^c < e.
E XERCISE 12.3. Show that the space of absolutely integrable functions is separable,
that is, has a countable dense subset.
13. CONVERGENCE THEOREMS 41
In the previous section we have seen that the Lebesgue integral satisfies all of the key
properties that the Riemann integral does, while at the same time being able to integrate
many more functions. But given that the Lebesgue measure enjoys much stronger proper-
ties than the Jordan measure does, it is natural to ask whether the Lebesgue integral does
too.
In order to find such strong properties, a good test question to ask is whether fn → f
implies ∫fn → ∫f. We saw that in the case of the Riemann integral, this does hold if fn, f
are all defined on an interval and f n → f uniformly. In the case of the Lebesgue integral,
the same proof shows that it works when f n , f are all supported on a common set of finite
measure, and f n → f uniformly.
But without these special hypotheses, such a convergence theorem can fail. So before
looking for situations where it does hold, let us examine some of the examples where it
does not.
13.1. EXAMPLE (Domain escape to infinity). Let fn = χ[n,n+1] and f = 0. That is, fn is
a sequence of moving unit bumps. Then fn → f pointwise (not uniformly), but we have
∫fn = 1 for all n, and ∫f = 0.
13.2. EXAMPLE (Support escape to infinity). Let fn = (1/n) χ[0,n] and f = 0. That is, fn is a
sequence of widening and shortening bumps. Then fn → f uniformly, but once again we
have ∫fn = 1 for all n, and ∫f = 0.
13.3. EXAMPLE (Range escape to infinity). Let fn = n χ[1/n,2/n] and f = 0. That is, fn
is a sequence of narrowing and tallening bumps. Then fn → f pointwise (not uniformly),
but once again we have ∫fn = 1 for all n, and ∫f = 0.
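The three examples are easy to tabulate exactly. In the following sketch (helper names are ours) each fn is a constant height times the indicator of an interval, so its integral is height times length; exact rational arithmetic avoids any rounding questions.

```python
from fractions import Fraction

def moving_bump(n):      # f_n = chi_[n, n+1]
    return lambda x: 1 if n <= x <= n + 1 else 0

def widening_bump(n):    # f_n = (1/n) chi_[0, n]
    return lambda x: Fraction(1, n) if 0 <= x <= n else 0

def narrowing_bump(n):   # f_n = n chi_[1/n, 2/n]
    return lambda x: n if Fraction(1, n) <= x <= Fraction(2, n) else 0

# Each f_n is height * chi_interval, so its integral is height * length.
integrals = {
    "moving":    [1 * ((n + 1) - n) for n in range(1, 6)],
    "widening":  [Fraction(1, n) * n for n in range(1, 6)],
    "narrowing": [n * (Fraction(2, n) - Fraction(1, n)) for n in range(1, 6)],
}
print(integrals)  # every entry equals 1

# At a fixed point the values are eventually 0, e.g. x = 1/3:
x = Fraction(1, 3)
print([narrowing_bump(n)(x) for n in (1, 3, 6, 7, 100)])  # [0, 3, 6, 0, 0]
```

In each family the integrals stay at 1 while the pointwise limit is 0, so the unit of mass is lost in the limit.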
One should observe that in all three of our examples where a convergence theorem
fails, mass was destroyed in the limit. In particular, we do not have an example where
new mass is created in the limit. The next result states that mass can only be destroyed,
and never created.
We will prove Fatou’s lemma shortly, but first we will use it to prove the dominated
convergence theorem. The dominated convergence theorem essentially states that so long
as we can close off the avenues through which the mass of a region can escape to infinity,
then we will have a convergence theorem.
Intuitively speaking, the function G acts as an umbrella under which the convergence
f n → f occurs. The assumption that the umbrella covers just a finite amount of area
guarantees that mass cannot escape! The following proof will be carried out under the
assumption that Fatou’s lemma is true.
PROOF OF THE DOMINATED CONVERGENCE THEOREM. By separating fn and f into
their real and imaginary parts, we may assume that they are all real-valued. Thus our
hypothesis says that −G ≤ fn ≤ G. Since fn → f we also have that −G ≤ f ≤ G. Now
fn + G is nonnegative, so we can use Fatou’s lemma to obtain:
    ∫(f + G) ≤ lim inf ∫(fn + G)
P ROOF. We first observe that if the f n were all characteristic functions, then this theo-
rem would follow directly from the upwards monotone convergence theorem for Lebesgue
measurable sets. Thus our strategy is to reduce to a situation in which we can apply the
upwards monotone convergence theorem.
First, since fn ≤ fn+1, by the monotonicity of the integral we know that ∫fn ≤ ∫fn+1.
Similarly since fn ≤ f we know that ∫fn ≤ ∫f. It follows that the values ∫fn converge
and that lim ∫fn ≤ ∫f.
To show that ∫f ≤ lim ∫fn, it is sufficient to show that ∫g ≤ lim ∫fn for any simple
function g such that g ≤ f. Given such a g, we can express it as g = ∑ ci χAi where the
Ai are disjoint measurable sets. Using the range truncation lemma, we can suppose that
ci ≠ ∞ for all i.
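Now fix just one of the sets $A_i$ and let $\epsilon > 0$ be given. We define the sets
\[ A_{i,n} = \{\, x \in A_i \mid f_n(x) \ge (1-\epsilon) c_i \,\}. \]
About this definition, we first note that it immediately implies $f_n \ge \sum (1-\epsilon) c_i \chi_{A_{i,n}}$. We
second note that for all $x \in A_i$, for $n$ large enough, we have $f_n(x) \ge (1-\epsilon) g(x) = (1-\epsilon) c_i$.
Since the $f_n$ are increasing, this means that the sets $A_{i,n}$ increase with $n$ and that $A_i$ is
the union of the $A_{i,n}$. Thus the upwards monotone convergence theorem for sets implies
that $m(A_{i,n}) \to m(A_i)$ as $n \to \infty$.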
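Putting this all together, we have
\[ \lim \int f_n \ge \lim \int \sum (1-\epsilon) c_i \chi_{A_{i,n}} = \lim \sum (1-\epsilon) c_i\, m(A_{i,n}) = (1-\epsilon) \sum c_i\, m(A_i) = (1-\epsilon) \int g. \]
Since $\epsilon > 0$ was arbitrary, it follows that $\int g \le \lim \int f_n$, as desired.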
The monotone convergence theorem has the following easy but important consequence:
the nonnegative Lebesgue integral is countably linear, not just finitely linear!
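PROOF. Using the monotone convergence theorem and then finite linearity, we have:
\begin{align*}
\int \sum_{n=1}^{\infty} f_n &= \int \lim_N \sum_{n=1}^{N} f_n \\
&= \lim_N \int \sum_{n=1}^{N} f_n \\
&= \lim_N \sum_{n=1}^{N} \int f_n \\
&= \sum_{n=1}^{\infty} \int f_n
\end{align*}
as desired.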
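For a concrete instance of countable linearity, take $f_n(x) = x^n/n!$ on $[0,1]$ for $n \ge 0$ (an example of our own choosing, not from the notes). Then $\int f_n = 1/(n+1)!$ and $\sum f_n = e^x$, so both sides should equal $e - 1$. A minimal numerical sketch in Python:

```python
import math

def integral_of_term(n):
    """Exact value of the integral of x**n / n! over [0, 1], namely 1/(n+1)!."""
    return 1.0 / math.factorial(n + 1)

# Sum of the integrals; the terms decay factorially, so 30 of them suffice
# to reach floating-point precision.
sum_of_integrals = sum(integral_of_term(n) for n in range(30))

# Integral of the sum, i.e. of exp(x) over [0, 1].
integral_of_sum = math.e - 1.0
```

The two quantities agree up to rounding error, as the corollary predicts.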
Next we use the very special monotone convergence theorem to establish Fatou’s
lemma, which we needed in the proof of the dominated convergence theorem.
PROOF OF FATOU’S LEMMA. We will need the general fact that $\int \inf_n g_n \le \inf_n \int g_n$.
This holds simply because $\int \inf_n g_n \le \int g_n$ for any particular $n$, and then one can take the
$\inf_n$ over both sides.
Now recall that $\liminf f_n = \lim_N \inf_{n \ge N} f_n$. The functions $\inf_{n \ge N} f_n$ are increasing in
$N$, so using the monotone convergence theorem together with the above we have
\begin{align*}
\int \liminf f_n &= \int \lim_N \inf_{n \ge N} f_n \\
&= \lim_N \int \inf_{n \ge N} f_n \\
&\le \lim_N \inf_{n \ge N} \int f_n \\
&= \liminf \int f_n
\end{align*}
as desired.
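The inequality in Fatou’s lemma can be strict. The following Python sketch illustrates this on a toy measure space of our own choosing (two points of mass 1/2 each, which is an assumption made purely for illustration): the functions $f_n$ alternate between the indicators of the two points, so each has integral 1/2 while $\liminf f_n = 0$ everywhere.

```python
from fractions import Fraction

# Toy measure space (an illustrative assumption): two points of mass 1/2 each.
mass = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def f(n, x):
    """f_n is the indicator of {0} for even n and of {1} for odd n."""
    return 1 if x == n % 2 else 0

def integral(g):
    """Integral of a function on the two-point space."""
    return sum(g(x) * mass[x] for x in mass)

def tail_inf(N, x, horizon=10):
    """inf of f_n(x) over N <= n < N + horizon; the sequence has period 2,
    so this equals the infimum over all n >= N."""
    return min(f(n, x) for n in range(N, N + horizon))

# Left side of Fatou: the integral of liminf f_n (all tail infima are 0).
lhs = integral(lambda x: max(tail_inf(N, x) for N in range(10)))

# Right side: liminf of the integrals (every integral equals 1/2).
rhs = min(integral(lambda x, n=n: f(n, x)) for n in range(10))
```

Here lhs is 0 and rhs is 1/2: half of the mass is "destroyed" in the limit, and none is created.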
EXERCISE 13.1 (Tao, ex 1.44, 1.45). (a) Let $A_n$ be measurable sets and assume that
$\sum m(A_n) < \infty$. Show that almost every $x \in \mathbb{R}^d$ is contained in at most finitely
many of the $A_n$. [Hint: Use Tonelli’s theorem on the functions $\chi_{A_n}$.] This is the
Borel–Cantelli lemma.
(b) Give a counterexample to the above conclusion, showing that the hypothesis
$\sum m(A_n) < \infty$ cannot be replaced by the weaker condition $\lim m(A_n) = 0$.
EXERCISE 13.2. Use the dominated convergence theorem to show that the harmonic
series $\sum \frac{1}{n}$ diverges. [Hint: Let $f_n = \frac{1}{n}\chi_{[0,n]}$, show that $\sum \frac{1}{n} < \infty$ plus the dominated
convergence theorem implies $\int f_n \to 0$, and obtain a contradiction from this.]
14. ABSTRACT MEASURE THEORY 45
The Lebesgue measure and integration theory that we have developed can be re-
garded as a model for an abstract concept of a measure and integral. The situation is
very similar to other areas of mathematics. Consider the following examples of a concrete
and abstract concept:
◦ the space $\mathbb{R}^d$ with its distance measurement $\|x - y\|$ is a model for the definition
of metric space;
◦ the space $\mathbb{R}^d$ with its family of open sets is a model for the definition of topological
space;
◦ the spaces $\mathbb{R}$ or $\mathbb{C}$ with their addition and multiplication operations are models
for the definition of field;
◦ the space $\mathbb{R}^d$ with its $\mathbb{R}$-linear combinations is a model for the definition of vector
space.
An abstract measure function µ on a space X will be one which satisfies the most
fundamental properties that we have worked to prove for the Lebesgue measure. Of
course the Lebesgue measure m can only be applied to the measurable subsets of Rd , so
we should only expect to be able to apply an abstract measure µ to a subcollection of the
subsets of X. Thus we make the following definition of an abstract space to take the place
of Rd and its measurable sets.
Thus if we let L denote the Lebesgue measurable subsets of Rd , we have that (Rd , L)
is a measurable space. For another example, if X is any set then we can always let F be
the collection of all countable or co-countable subsets of X. Finally if X is any topological
space then it can be viewed as a measurable space by taking its σ-algebra to be the collec-
tion of Borel subsets of X: A set is said to be Borel if it is constructible from the open sets
using countable intersections, countable unions, and complements.
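On a finite set the σ-algebra axioms can be checked mechanically. The Python sketch below assumes the usual axioms (the family contains the empty set and is closed under complements and countable unions); the function name is_sigma_algebra and both example families are ours, chosen only for illustration.

```python
from itertools import combinations

def is_sigma_algebra(X, family):
    """Check the sigma-algebra axioms for a family of subsets of a finite set X.
    On a finite set, countable unions reduce to pairwise unions."""
    X = frozenset(X)
    family = {frozenset(A) for A in family}
    if frozenset() not in family:
        return False
    if any(X - A not in family for A in family):
        return False
    return all(A | B in family for A, B in combinations(family, 2))

X = {1, 2, 3, 4}
good = [set(), {1, 2}, {3, 4}, X]    # generated by the partition {1,2} | {3,4}
bad = [set(), {1}, {2}, X]           # not closed under complements
```

With these inputs, is_sigma_algebra(X, good) holds while is_sigma_algebra(X, bad) fails, because the complement of {1} is missing from the second family.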
lim µ( An ).
Each of these items may be proved in exactly the same way as we have done for
Lebesgue measure. Of course some properties of Lebesgue measure don’t even make
sense to state for a general measure. For instance the normality and translation-invariance
properties don’t make sense in general, since we may not have the notions of boxes or
translations on a general space X.
One further property of the Lebesgue measure which does not appear in the above
list is that subsets of null sets should be null. This property does not follow directly from
the axioms of a measure, and instead must be made into an additional axiom. We say that
a σ-algebra $\mathcal{B}$ is complete (with respect to $\mu$) if whenever $A \in \mathcal{B}$ is a null set and
$A' \subset A$, then $A' \in \mathcal{B}$. We also say that the
measure $\mu$ is complete if it is defined on a complete σ-algebra. It is an exercise to check that
if $\mu$ is a measure, then it can always be extended to a complete measure.
At this point we have only given very trivial examples of measures, besides the Lebesgue
measure. Given the effort we invested to find the right construction of the Lebesgue measure,
it may not surprise you to know that it is difficult to construct interesting examples of
measures directly. Fortunately there is a very general and powerful method for constructing
measures. The basic idea behind it is that it is easy to first construct measures which are
just finitely additive. This mirrors the construction of Lebesgue measure, where we first
constructed the elementary and Jordan measures.
In order to define finitely additive measures, we have to modify the definition of mea-
surable space.
We have seen several key examples of Boolean algebras on Rd : the collection of ele-
mentary sets together with their complements, and the collection of Jordan measurable
sets together with their complements. If X is any set we can always discuss the trivial
Boolean algebra {∅, X } and the maximal Boolean algebra P ( X ).
14.6. DEFINITION. Let $X$ be any set, $\mathcal{A}$ a Boolean algebra on $X$, and $\mu_0$ a finitely additive
measure on $\mathcal{A}$. Then $\mu_0$ is said to be a premeasure if it satisfies the additional axiom:
◦ for every sequence of pairwise disjoint sets $A_n \in \mathcal{A}$, if $\bigcup A_n \in \mathcal{A}$, then
$\mu_0(\bigcup A_n) = \sum \mu_0(A_n)$.
The condition does not quite say that µ0 is countably additive, but rather that it has
the potential to be countably additive. This is confirmed by the following keystone result
of the subject, which we will prove in the next section.
The idea behind the proof is to use an analog of our construction of the Lebesgue outer
measure m∗ . First we define the notion of an abstract outer measure.
14.8. DEFINITION. Let $X$ be any set and $\mu^*$ a function on all subsets of $X$. Then $\mu^*$ is
said to be an outer measure if it satisfies
(a) (empty set) $\mu^*(\emptyset) = 0$;
(b) (monotonicity) if $A \subset B$ then $\mu^*(A) \le \mu^*(B)$; and
(c) (countable subadditivity) $\mu^*(\bigcup A_n) \le \sum \mu^*(A_n)$.
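PROOF. The empty set and monotonicity axioms are clear from the definition. For
countable subadditivity, let $E_i$ be given and $\epsilon > 0$. Then for every $i$ there exists a sequence
of sets $A_{i,n}$ such that $E_i \subset \bigcup_n A_{i,n}$ and $\sum_n \mu_0(A_{i,n}) - \mu^*(E_i) < \epsilon/2^i$. It follows that
$\bigcup_i E_i \subset \bigcup_{i,n} A_{i,n}$ and $\sum_{i,n} \mu_0(A_{i,n}) \le \sum_i \mu^*(E_i) + \epsilon$. Hence
$\mu^*(\bigcup_i E_i) \le \sum_i \mu^*(E_i) + \epsilon$, and since $\epsilon$ was arbitrary this gives countable subadditivity.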
EXERCISE 14.1 (Tao, ex 1.4.26). Let $\mu$ be a measure on $(X, \mathcal{B})$. Show that $\mathcal{B}$ can be
extended to a σ-algebra $\hat{\mathcal{B}}$ and $\mu$ to a measure $\hat\mu$ on $(X, \hat{\mathcal{B}})$ in such a way that $\hat\mu$ is complete.
EXERCISE 14.3 (Tao, ex 1.7.6). Give an example of a finitely additive measure that
is not a premeasure. [Hint: work on the measurable space $(\mathbb{N}, \mathcal{P}(\mathbb{N}))$ and define $\mu_0$
separately for finite and infinite sets.]
15. CONSTRUCTION OF ABSTRACT MEASURES 49
In the previous section, we showed how to abstract the properties of the elemen-
tary measure to a finitely additive measure, and how to abstract the properties of the
Lebesgue outer measure too. What we lacked was a definition of measurability, which in
the Lebesgue case relied on the boxes or open subsets of Rd . For a general space X, we
use the following much more subtle definition.
15.1. DEFINITION. Suppose that $X$ is any set and $\mu^*$ is an outer measure on $X$. Then a
subset $A \subset X$ is said to be $\mu^*$-measurable if for every subset $S \subset X$ we have
\[ \mu^*(S) = \mu^*(S \cap A) + \mu^*(S \setminus A). \]
To put this definition in context, it is an exercise to show that this property holds for
Lebesgue measurable sets. Moreover it is this property that can be used to prove that a
bounded set is Lebesgue measurable if and only if its inner and outer measures agree.
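To get a feel for how restrictive the measurability condition can be, the following Python sketch tests it by brute force on a three-point set. The outer measure used here (0 on the empty set, 2 on the whole space, 1 on everything else) is a toy example of our own, not one from the notes; one can check by hand that it is monotone and subadditive.

```python
from itertools import combinations

X = frozenset({0, 1, 2})

def subsets(s):
    """All subsets of the finite set s, as frozensets."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def outer(A):
    """A toy outer measure (our assumption): 0 on the empty set, 2 on X, 1 otherwise."""
    if not A:
        return 0
    return 2 if A == X else 1

def is_measurable(A):
    """Caratheodory's condition: A splits every test set S additively."""
    return all(outer(S) == outer(S & A) + outer(S - A) for S in subsets(X))

measurable_sets = [A for A in subsets(X) if is_measurable(A)]
```

Only the empty set and X itself pass the test. For example a singleton A fails with a two-point test set S containing it, since then the right-hand side is 1 + 1 while the left-hand side is 1.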
The next result shows that the µ∗ -measurable sets are a good choice, in the sense that
µ∗ behaves well on the µ∗ -measurable sets (it is a measure).
PROOF. The first part of the proof will be to show that the collection of $\mu^*$-measurable
sets is a Boolean algebra and $\mu^*$ is finitely additive on the $\mu^*$-measurable sets. To begin, it
is clear that a set $A$ is $\mu^*$-measurable if and only if $A^c$ is $\mu^*$-measurable.
We now show that the $\mu^*$-measurable sets are closed under pairwise unions. Suppose
that $A, B$ are $\mu^*$-measurable. Since we have already proved $\mu^*$ is subadditive for arbitrary
sets, it suffices to prove that for any set $S$ we have
\[ \mu^*(S) \ge \mu^*(S \cap (A \cup B)) + \mu^*(S \cap (A \cup B)^c). \]
To achieve this, we expand the right-hand side and then apply the measurability of $B$
followed by the measurability of $A$:
\begin{align*}
\mu^*(S \cap (A \cup B)) &+ \mu^*(S \cap (A \cup B)^c) \\
&\le \mu^*(S \cap A \cap B^c) + \mu^*(S \cap A \cap B) + \mu^*(S \cap A^c \cap B) + \mu^*(S \cap A^c \cap B^c) \\
&= \mu^*(S \cap A) + \mu^*(S \cap A^c) \\
&= \mu^*(S)
\end{align*}
Next we show that $\mu^*$ is finitely additive on the $\mu^*$-measurable sets. Suppose that $A, B$
are disjoint and $\mu^*$-measurable. Then using the measurability of $A$, we have
\begin{align*}
\mu^*(A \cup B) &= \mu^*((A \cup B) \cap A) + \mu^*((A \cup B) \setminus A) \\
&= \mu^*(A) + \mu^*(B)
\end{align*}
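The second part of the proof is to show that the $\mu^*$-measurable sets form a σ-algebra
and that $\mu^*$ is countably additive on the $\mu^*$-measurable sets. Let $A_n$ be a sequence of
$\mu^*$-measurable sets. Since the $\mu^*$-measurable sets form an algebra, we can assume without
loss of generality that the $A_n$ are pairwise disjoint. Then using the argument of
the previous paragraph, with $S \cap$ inserted and then induction, for any set $S$ we have
$\mu^*(S \cap \bigcup_1^N A_n) = \sum_1^N \mu^*(S \cap A_n)$. It follows that
\begin{align*}
\mu^*(S) &= \mu^*(S \cap \bigcup_1^N A_n) + \mu^*(S \setminus \bigcup_1^N A_n) \\
&= \sum_1^N \mu^*(S \cap A_n) + \mu^*(S \setminus \bigcup_1^N A_n)
\end{align*}
By monotonicity the last term is at least $\mu^*(S \setminus \bigcup_1^\infty A_n)$. Taking $N \to \infty$, we obtain
\begin{align*}
\mu^*(S) &\ge \sum \mu^*(S \cap A_n) + \mu^*(S \setminus \bigcup A_n) \\
&\ge \mu^*(\bigcup (S \cap A_n)) + \mu^*(S \setminus \bigcup A_n) \\
&= \mu^*(S \cap \bigcup A_n) + \mu^*(S \setminus \bigcup A_n)
\end{align*}
Since the reverse inequality holds by subadditivity, this shows that $\bigcup A_n$ is $\mu^*$-measurable.
Moreover every inequality in the display is then an equality, and taking $S = \bigcup A_n$ yields
$\mu^*(\bigcup A_n) = \sum \mu^*(A_n)$.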
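PROOF. We first show that if $A$ lies in $\mathcal{A}$ then $A$ is $\mu^*$-measurable. For this, let $S$ be an
arbitrary set and find sets $B_n \in \mathcal{A}$ such that $S \subset \bigcup B_n$ and $\sum \mu_0(B_n) - \mu^*(S) < \epsilon$. Then
\begin{align*}
\mu^*(S \cap A) + \mu^*(S \setminus A) &\le \sum \mu_0(B_n \cap A) + \sum \mu_0(B_n \setminus A) \\
&= \sum \mu_0(B_n) \\
&< \mu^*(S) + \epsilon
\end{align*}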
We can assume without loss of generality that the $B_n$ are pairwise disjoint. Then using the
axiom of a premeasure, we obtain
\begin{align*}
\mu_0(A) &= \sum \mu_0(A \cap B_n) \\
&\le \sum \mu_0(B_n)
\end{align*}
Taking the infimum over all such sequences $B_n$, we conclude that $\mu_0(A) \le \mu^*(A)$, as
desired.
This concludes our proof of Carathéodory’s extension theorem. As an application, we
will now show how to produce a family of interesting measures on the real line, beyond
just the Lebesgue measure.
It is easy to see that $\mathcal{H}$ consists of finite disjoint unions of half-open intervals. The
σ-algebra generated by $\mathcal{H}$ is exactly the σ-algebra of Borel subsets of $\mathbb{R}$. If $F : \mathbb{R} \to \mathbb{R}$ is an
increasing, right-continuous function, then we can define a measuring function on $\mathcal{H}$ by
letting $\mu_F((a, b]) = F(b) - F(a)$, and extending $\mu_F$ to disjoint unions in the obvious way.
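The definition of $\mu_F$ on finite disjoint unions of half-open intervals is easy to experiment with. In the Python sketch below, the function name mu_F and the three sample choices of $F$ (the identity, a unit jump at 0, and the arctangent) are our own illustrative assumptions; a union is represented as a list of (a, b) pairs standing for the intervals (a, b].

```python
import math

def mu_F(F, intervals):
    """Measure of a finite disjoint union of half-open intervals (a, b],
    given as a list of (a, b) pairs: the sum of F(b) - F(a)."""
    return sum(F(b) - F(a) for a, b in intervals)

def F_lebesgue(x):
    """F(x) = x recovers ordinary length."""
    return x

def F_jump(x):
    """Increasing and right-continuous, with a unit jump at 0 (a point mass)."""
    return 1.0 if x >= 0 else 0.0

length = mu_F(F_lebesgue, [(0, 1), (2, 5)])    # 1 + 3
mass_across_jump = mu_F(F_jump, [(-1, 0)])     # (-1, 0] contains the jump point 0
mass_after_jump = mu_F(F_jump, [(0, 1)])       # (0, 1] does not

# Finite additivity on adjacent pieces: (0, 2] is the union of (0, 1] and (1, 2].
whole = mu_F(math.atan, [(0, 2)])
parts = mu_F(math.atan, [(0, 1), (1, 2)])
```

Right-continuity of $F$ is what makes the half-open convention consistent: the jump at 0 is charged to the interval (-1, 0], which contains 0, and not to (0, 1].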
letting $\mu_F((a, b)) = \lim_{x \to b^-} F(x) - F(a)$, and similarly for closed intervals. Then using
the right-continuity of $F$, we can pay $\epsilon$ to replace each $A_n$ with an open $U_n$ such that
$A_n \subset U_n$. Similarly we can pay $\epsilon$ to replace $I$ with a closed (bounded) interval $K$ such that
$K \subset I$.
Now by the Heine–Borel theorem, just finitely many of the $U_n$ suffice to cover $K$, and
hence $\mu_F(K) \le \sum \mu_F(U_n)$. It follows that
\begin{align*}
\mu_F(\bigcup A_n) &\le \mu_F(K) + \epsilon \\
&\le \sum \mu_F(U_n) + \epsilon \\
&\le \sum \mu_F(A_n) + 2\epsilon
\end{align*}
It is not difficult to establish most of the basic properties of measurable functions pre-
viously stated in the Lebesgue case.
16.5. PROPOSITION. The integral of nonnegative measurable functions satisfies the following
properties:
(a) (equivalence) if $f = g$ almost everywhere then $\int f\,d\mu = \int g\,d\mu$;
(b) (monotonicity) if $f \le g$ almost everywhere then $\int f\,d\mu \le \int g\,d\mu$;
(c) (range truncation) if $f_N = \min(f, N)$ then $\int f_N\,d\mu \to \int f\,d\mu$; and
(d) (support truncation) if $A_n$ is a sequence of measurable sets such that $A_n \subset A_{n+1}$ and
$\bigcup A_n = X$, then $\int f\chi_{A_n}\,d\mu \to \int f\,d\mu$.
P ROOF. All of the properties (a)–(d) can be proved in a fashion similar to the case of
the Lebesgue integral.
The same is true of property (e), but we neglected to state it earlier so let us provide
the proof now. Let $g = \lambda\chi_{\{x \mid f(x) \ge \lambda\}}$. Then clearly $g$ is a simple function and $g \le f$. It
follows that the simple integral of $g$ is a lower bound for the integral of $f$, so
$\lambda\mu(\{x \mid f(x) \ge \lambda\}) \le \int f\,d\mu$, as desired.
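The inequality just proved (Markov’s inequality) can be checked by hand on a small example. The Python sketch below uses a three-point measure space with weights and function values of our own choosing, and computes both sides exactly with rational arithmetic.

```python
from fractions import Fraction

# A finite measure space (illustrative): points with weights, and a function f >= 0.
weights = {"a": Fraction(1, 2), "b": Fraction(3), "c": Fraction(1, 4)}
f = {"a": Fraction(4), "b": Fraction(1, 3), "c": Fraction(10)}

def integral(g):
    """Integral of g with respect to the weighted counting measure."""
    return sum(g[x] * weights[x] for x in weights)

def markov_lhs(lam):
    """lam * mu({x : f(x) >= lam}), the simple integral of g = lam * chi_{f >= lam}."""
    return lam * sum(weights[x] for x in weights if f[x] >= lam)

total = integral(f)
checks = [markov_lhs(lam) <= total for lam in (Fraction(1, 3), 1, 4, 5, 10, 11)]
```

Here total equals 11/2, and every entry of checks is True, in line with the bound $\lambda\mu(\{f \ge \lambda\}) \le \int f\,d\mu$.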
The proof of additivity is similar to the Lebesgue case, with one wrinkle. Recall that
in the Lebesgue proof we applied the truncation lemma to the sets [− N, N ]d to assume
without loss of generality that f , g have finite measure support. In general we cannot
assume that $X$ is a union of countably many sets of finite measure. Instead we let
$A_n = \{\, x \mid f(x) > \frac{1}{n} \text{ or } g(x) > \frac{1}{n} \,\}$. This family of sets is increasing and the union contains the
supports of f and g. By Markov’s inequality, the sets An have finite measure, and the
proof can now proceed by truncating to An .
Now that the integral of nonnegative functions has been defined, we may again define
the absolutely convergent Lebesgue integral for complex-valued functions.
before.
16.7. PROPOSITION. The space $L^1(\mu)$ together with the norm $\|\cdot\|$ is a normed vector space.
16. ABSTRACT INTEGRATION THEORY 55
Finally we remark that our major convergence theorems all hold true for integration
with respect to an abstract measure µ. These include the monotone convergence theorem,
Tonelli’s theorem, Fatou’s lemma, and the dominated convergence theorem.
EXERCISE 16.1 (See Tao, ex 1.4.29).
◦ Show that $f$ is measurable if and only if $f^+$ and $f^-$ are measurable.
◦ Show that sums and products of measurable functions are measurable.
EXERCISE 16.2 (See Tao, ex 1.4.36). Establish the support truncation property: if $A_n$ is
a sequence of measurable sets such that $A_n \subset A_{n+1}$ and $\bigcup A_n = X$, then
$\int f\chi_{A_n}\,d\mu \to \int f\,d\mu$.
PART III
Functional analysis
In our study of integration theory, we studied several classes of real and complex-
valued functions. For example we studied the class of Lebesgue measurable functions,
and the class of absolutely integrable functions. Across analysis there are many other
important classes of functions, including the continuous functions, uniformly continuous
functions, differentiable functions, and so on.
Such function spaces have a lot of internal structure. For example if you can add or
scale the elements of the function space, then it will have the algebraic structure of a vector
space. And if you can measure distances between functions in the space, it should have a
geometry too. In the best cases, the algebraic and geometric structures make the function
space into a normed vector space.
17.1. DEFINITION. A normed vector space consists of a (real or complex) vector space $X$
together with a mapping $\|\cdot\| : X \to [0, \infty)$ satisfying:
◦ (homogeneity) $\|ax\| = |a| \cdot \|x\|$;
◦ (triangle inequality) $\|x + y\| \le \|x\| + \|y\|$; and
◦ (non-vanishing) $\|x\| = 0$ implies $x = 0$.
The mapping $\|\cdot\|$ is called a norm on $X$.
The norm gives rise to a metric on $X$ defined by $d(x, y) = \|x - y\|$. The metric has
several special properties not true in a general metric space. For example, it is uniform
throughout the space in the sense that it is translation invariant: $d(x + z, y + z) = d(x, y)$.
17.2. DEFINITION. A normed vector space is called a Banach space if the associated
metric $d(x, y) = \|x - y\|$ is complete.
17. NORMED VECTOR SPACES 57
The property means that there are no holes in the space—any apparent point which can be
approximated actually exists. We will see the value of assuming that our normed vector
spaces are complete in later sections.
We now describe several familiar examples of normed vector spaces (in fact most of
them will be Banach spaces).
17.3. EXAMPLE. The ordinary finite-dimensional vector space $\mathbb{R}^d$, together with its
usual Euclidean norm $\|x\| = \left(\sum_1^d x_i^2\right)^{1/2}$, is a Banach space. It is a classical annoying
exercise to establish the triangle inequality. One can also verify that the completeness
property holds, using the classical fact/construction that it is true for $\mathbb{R}$.
17.4. EXAMPLE. The space $\mathbb{R}^{\mathbb{N}}$ consisting of all real sequences will not be a Banach
space in any reasonable way. However it does contain many classical Banach spaces. For
example we can consider the space of all square summable sequences
\[ \ell^2 = \{\, x \in \mathbb{R}^{\mathbb{N}} \mid \sum x_n^2 < \infty \,\}. \]
This is a Banach space with its generalized Euclidean norm $\|x\|_2 = \left(\sum x_i^2\right)^{1/2}$. It is not
particularly easy to verify that $\ell^2$ is a Banach space and $\|\cdot\|_2$ is a complete norm on it. We
will do this later!
17.5. EXAMPLE. As we have mentioned, spaces of integrable functions may also form
Banach spaces. Let $(X, \mathcal{B})$ be a measurable space and $\mu$ a measure on it. We have already
described the space $L^1(\mu)$ of all absolutely integrable functions on $X$. It is also possible to
use other powers, for example, let
\[ L^2(\mu) = \{\, f : X \to \mathbb{C} \mid \int |f|^2\,d\mu < \infty \,\}. \]
We may then define a norm on $L^2$ by $\|f\|_2 = \left(\int |f|^2\right)^{1/2}$. The spaces $L^1(\mu)$ and $L^2(\mu)$ are
not quite Banach spaces, but only because they fail to satisfy the non-vanishing property.
This can easily be remedied by identifying two functions if they agree almost everywhere.
Having given the definition and basic examples of Banach spaces, we now briefly
study mappings between them. For ordinary vector spaces, the most natural mappings
are the operators, or linear mappings. For Banach spaces we primarily study operators
which are also continuous.
The motivating examples are the operators from $\mathbb{C}^n$ to $\mathbb{C}^m$, which are simply the familiar
matrix transformations studied in linear algebra. It turns out that all operators from $\mathbb{C}^n$
to $\mathbb{C}^m$ are continuous. It is tempting to believe that all operators are continuous because
they are linear, but this turns out not to be the case in infinite-dimensional settings.
Another example of a linear operator is the mapping $L^1(\mu) \to \mathbb{C}$ given by $f \mapsto \int f\,d\mu$.
We have of course seen that the mapping is linear, and it turns out to be continuous as
well.
As a final example, the mapping from D [ a, b] to B[ a, b] which takes a differentiable
function to its derivative is also linear. However it fails to be continuous with respect
to the supremum norm. Two functions can be very close together but also have very
different slopes!
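To see the failure of continuity concretely, one standard family of witnesses (our choice, not taken from the notes) is $f_n(x) = \sin(n^2 x)/n$ on $[0, 1]$: the functions shrink to 0 in supremum norm while their derivatives $n\cos(n^2 x)$ grow without bound. The Python sketch below estimates both norms by sampling a uniform grid, so the values are approximations from below.

```python
import math

def f(n, x):
    """f_n(x) = sin(n^2 x) / n, a small but rapidly oscillating function."""
    return math.sin(n * n * x) / n

def df(n, x):
    """The derivative f_n'(x) = n cos(n^2 x)."""
    return n * math.cos(n * n * x)

def sup_on_grid(g, points=10001):
    """Approximate the supremum norm on [0, 1] by sampling a uniform grid."""
    return max(abs(g(k / (points - 1))) for k in range(points))

# Triples (n, approximate sup norm of f_n, approximate sup norm of f_n').
norms = [(n, sup_on_grid(lambda x: f(n, x)), sup_on_grid(lambda x: df(n, x)))
         for n in (1, 5, 25)]
```

For each n the sampled norm of $f_n$ is at most 1/n, while the norm of the derivative is n (attained at x = 0), so no constant M can satisfy $\|Df\| \le M\|f\|$.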
In the next two results, we show that the continuity of operators is very special when
compared with arbitrary continuous functions.
17.8. LEMMA. Let $X, Y$ be normed vector spaces and let $T : X \to Y$ be an operator. Then $T$ is
continuous if and only if $T$ is bounded.
PROOF. First suppose that $T$ is continuous. Apply the continuity at $0$ with $\epsilon = 1$ to obtain
$\delta > 0$ such that $\|x\| \le \delta \implies \|Tx\| \le 1$. Then for any $x \ne 0$ we have
\[ \|Tx\| = \frac{\|x\|}{\delta} \left\| T\left( \frac{\delta x}{\|x\|} \right) \right\| \le 1 \cdot \frac{\|x\|}{\delta}, \]
so $T$ is bounded with constant $1/\delta$.
Conversely suppose that $T$ is bounded, and let $M$ be such that $\|Tx\| \le M\|x\|$ for
all $x \in X$. Given any $\epsilon > 0$ we let $\delta = \epsilon/M$. Then we have
\[ \|x\| < \delta \implies \|Tx\| \le M\|x\| < M \frac{\epsilon}{M} = \epsilon. \]
This shows that $T$ is continuous at the point 0, and hence by the previous lemma $T$ is
continuous.
EXERCISE 17.1 (BBT, ex 12:1.2). Show that the addition and constant multiple operations
are continuous on a normed vector space.
EXERCISE 17.2 (BBT, ex 12:1.3). Show that the unit ball of a normed vector space is
convex. That is, for $x, y$ in the ball and $\lambda \in (0, 1)$ we have $\lambda x + (1 - \lambda) y$ is also in the ball.
EXERCISE 17.3 (BBT, ex 12:3.1). Consider the operators $D(f) = f'$, $(Sf)(x) = \int_a^x f\,d\mu$,
and $I(f) = \int f\,d\mu$. What are appropriate domains and codomains of each operator? Show
that $S$ and $I$ are continuous, and $D$ is not continuous.
EXERCISE 17.4. Let $D[0, 1]$ denote the space of differentiable functions on $[0, 1]$ with
continuous derivative, equipped with the supremum norm of $B[0, 1]$. Show that $D[0, 1]$ is
not complete.
18. THE HAHN–BANACH THEOREM 60
Let us briefly recall the situation for $X = \mathbb{R}^d$ with any of its norms. Here a linear
functional $\phi$ is determined by its values on a basis, and it follows that $\phi$ is of the form
$x \mapsto y^T x$, that is, the dot product with a row vector or “dual vector”. It should not be
surprising that these mappings are always bounded, regardless of the norm on $\mathbb{R}^d$.
When X is an infinite-dimensional normed vector space, it is not true that all linear
functionals on X are bounded. In fact, given an infinite dimensional normed vector space
X, it is not immediately obvious that there are any nonzero bounded linear functionals on
X. In the rest of this section we present the Hahn–Banach theorem, which implies that on
any normed vector space, there really are lots of bounded linear functionals.
In order to state the Hahn–Banach theorem in its most powerful form, we need the
following generalization of a norm on a vector space.
Norms and seminorms are both examples of sublinear functionals. Another example
is the upper Riemann integral, defined on the space M[ a, b] of bounded functions on [ a, b].
Before proving the above abstract form of the Hahn–Banach theorem, we present sev-
eral key consequences regarding the construction of bounded linear functionals.
(c) The bounded linear functionals separate points: for all $x, x' \in X$, if $x \ne x'$ then there is
a bounded linear functional $\phi$ on $X$ such that $\phi(x) \ne \phi(x')$.
PROOF. (a) Since $\phi_0$ is a bounded linear functional on $Y$, there exists some $M$ such that
$\phi_0(y) \le M\|y\|$ for all $y \in Y$. We may therefore apply the Hahn–Banach theorem with the
sublinear functional $p(x) = M\|x\|$. Thus $\phi_0$ extends to a linear functional $\phi$ on $X$ such
that $\phi(x) \le M\|x\|$. In particular, $\phi$ is bounded too.
(b) We first define a function $\phi_0$ on the space $Y + \mathbb{R}z$ which is bounded by $p = \|\cdot\|$, the
norm. For this we will let $\phi_0(y + cz) = c\phi_0(z)$ where $\phi_0(z)$ remains to be determined. In
order to satisfy $\phi_0(y + cz) \le \|y + cz\|$ we require that $c\phi_0(z) \le \|y + cz\|$ for all $y \in Y$.
Substituting $y$ with $-cy$, we see that we must choose $\phi_0(z) \le \|{-y} + z\|$ for all $y \in Y$.
Since $Y$ is closed we must have $\inf_{y \in Y} \|y + z\| \ne 0$ (otherwise $z$ would be a limit of
elements of $Y$ and hence in $Y$). It follows that we can choose a nonzero value of $\phi_0(z)$, and
we may then use part (a) to extend $\phi_0$ to a bounded linear functional $\phi$ on $X$ that meets
our requirements.
(c) If $x \ne x'$ then $x - x' \ne 0$. Applying part (b) with $Y = \{0\}$ we can find a bounded
linear functional $\phi$ such that $\phi(x - x') \ne 0$. It follows that $\phi(x) \ne \phi(x')$.
We now return to the proof of the Hahn–Banach theorem.
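PROOF OF THEOREM 18.3. We begin by showing that we can find a proper extension
of $\phi_0$. Specifically, given any $z \in X \setminus Y$ we will find an extension of $\phi_0$ to a linear
functional $\phi_1$ on $Y \oplus \mathbb{R}z$ satisfying $\phi_1 \le p$. For this we will define
\[ \phi_1(y + cz) = \phi_0(y) + c\,\phi_1(z) \]
where $\phi_1(z)$ will be determined a little bit later. When we do choose $\phi_1(z)$, it will have to
satisfy the requirement that for all $y \in Y$ and all $c \in \mathbb{R}$:
\[ \phi_0(y) + c\,\phi_1(z) \le p(y + cz). \]
To isolate $\phi_1(z)$ we must consider the cases of negative and positive values of the coefficient
$c$ separately. Thus assume $c > 0$ and split the last equation into two conditions:
\[ \phi_0(y) + c\,\phi_1(z) \le p(y + cz) \quad\text{and}\quad \phi_0(y) - c\,\phi_1(z) \le p(y - cz). \]
Dividing by $c$ and replacing $y/c$ with $y$, these become
$-p(y - z) + \phi_0(y) \le \phi_1(z) \le p(y + z) - \phi_0(y)$ for all $y \in Y$.
In order for the constraints to be satisfiable, it is sufficient to have for all $y, y' \in Y$ that
$-p(y - z) + \phi_0(y) \le p(y' + z) - \phi_0(y')$. And this is indeed the case, since
\begin{align*}
\phi_0(y) + \phi_0(y') &= \phi_0(y + y') \\
&\le p(y + y') \\
&= p(y - z + y' + z) \\
&\le p(y - z) + p(y' + z)
\end{align*}
Thus we can find a suitable value for $\phi_1(z)$ and successfully extend $\phi_0$ to $\phi_1$ as required.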
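The one-step extension can be carried out by hand in a small case. In the Python sketch below, all of the concrete choices are ours and serve only as an illustration: $X = \mathbb{R}^2$, $p$ is the $\ell^1$ norm, $Y$ is the first coordinate axis with $\phi_0(t, 0) = t$, and $z = (0, 1)$. The supremum and infimum over $Y$ are sampled on a finite grid, which happens to give the exact values here.

```python
from fractions import Fraction

def p(v):
    """The sublinear functional p: here the l^1 norm on R^2 (our choice)."""
    return abs(v[0]) + abs(v[1])

def phi0(t):
    """phi_0 on Y = {(t, 0)}: phi_0(t, 0) = t, which satisfies phi_0 <= p."""
    return t

z = (Fraction(0), Fraction(1))
grid = [Fraction(k, 10) for k in range(-100, 101)]   # sample points t for y = (t, 0)

# Admissible interval for phi_1(z), sampled over the grid:
#   sup_y [phi_0(y) - p(y - z)]  <=  phi_1(z)  <=  inf_y [p(y + z) - phi_0(y)]
lower = max(phi0(t) - p((t - z[0], -z[1])) for t in grid)
upper = min(p((t + z[0], z[1])) - phi0(t) for t in grid)

def phi1(v, c):
    """Extension to R^2 = Y + Rz determined by the choice phi_1(z) = c."""
    return phi0(v[0]) + c * v[1]

c = Fraction(1, 2)                                    # any value in [lower, upper]
dominated = all(phi1((t, s), c) <= p((t, s)) for t in grid for s in grid)
```

The admissible interval comes out as [-1, 1], so the extension is far from unique; any value of c in that interval gives a linear functional on the plane that stays below p.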
To complete the proof we wish to apply the above step repeatedly. Since the number
of steps will be uncountable in general, it is necessary to phrase our construction using
the standard Zorn’s lemma: if P is a partially ordered set and every chain of P has an upper
bound, then P has a maximal element.
Now let $P$ be the collection of all linear functionals $\phi$ such that the domain of $\phi$ is
a subspace of $X$, $\phi$ extends $\phi_0$, and $\phi \le p$. We partially order $P$ by function extension.
A chain $C$ in $P$ always has an upper bound, namely the set-theoretic union $\bigcup C$ of the
members of the chain. Moreover the union will be $\le p$ because each member of the chain
is $\le p$.
Therefore we can apply Zorn’s lemma to find an element $\phi$ of $P$ which is maximal
with respect to function extension. We claim moreover that the domain of $\phi$ must be all
of $X$. Indeed, otherwise we can use the argument above to properly extend the domain of
$\phi$ to find a larger element $\phi'$ of $P$. This contradicts the maximality of $\phi$, and completes the
proof.
EXERCISE 18.1 (BBT, ex 12:5.1). Let $f$ be a bounded real-valued function on $[0, 1]$, and
let $U(f)$ denote the upper Lebesgue integral of $f$. Show that $U$ is a sublinear functional.
What can you conclude from the Hahn–Banach theorem?
EXERCISE 18.2 (BBT, ex 12:5.2). Let $\ell^\infty$ denote the space of bounded real sequences
with the supremum norm, and let $c$ denote the subspace of convergent real sequences.
Define $p$ on $\ell^\infty$ by
\[ p(x) = \limsup_{n \to \infty} \frac{x_1 + \cdots + x_n}{n}. \]
Verify that $p$ is a sublinear functional such that $\lim x \le p(x)$. If we apply the Hahn–Banach
theorem to obtain a bounded linear functional $L$ extending $\lim$, show that
$\liminf x \le L(x) \le \limsup x$ and calculate the value of $L(0, 1, 0, 1, \ldots)$.
EXERCISE 18.3. Let $\mathbb{R}^d$ be equipped with any norm that makes it into a normed vector
space. Show that every linear functional on $\mathbb{R}^d$ is continuous.
19. SPACES OF OPERATORS AND THE DUAL SPACE 63
In the past two sections we have introduced and discussed the continuous operators
T : X → Y between two normed vector spaces. In this section we study the collection of
all such operators as a space in its own right.
19.1. DEFINITION. Let $X, Y$ be normed vector spaces. Then $B(X, Y)$ denotes the space
of bounded linear operators $T : X \to Y$.
\[ \|T\| = \inf \{\, M \mid (\forall x)\ \|Tx\| \le M\|x\| \,\} \]
In other words, the operator norm of a bounded operator T is the least value of M which
witnesses that T is bounded. Naturally we must show that the operations and norm
satisfy the properties of a normed vector space.
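In finite dimensions the operator norm can sometimes be computed exactly. The Python sketch below treats a matrix as an operator between copies of $\mathbb{R}^n$ equipped with the supremum norm; it relies on the standard facts that the least admissible $M$ equals the maximum of $\|Ax\|$ over the unit ball, and that for the supremum norm this maximum is attained at a vector of signs. The function names and the sample matrix are our own.

```python
from itertools import product

def sup_norm(v):
    """The supremum norm on R^n."""
    return max(abs(t) for t in v)

def apply(A, x):
    """Matrix-vector product, with A given as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def operator_norm_sup(A):
    """Operator norm of A for the sup norms on domain and codomain.
    x -> ||Ax|| is convex, so its maximum over the unit ball (a cube)
    is attained at a vertex, i.e. at a vector of signs."""
    n = len(A[0])
    return max(sup_norm(apply(A, signs)) for signs in product((-1, 1), repeat=n))

A = [[1, -2], [3, 4]]
norm_A = operator_norm_sup(A)
max_row_sum = max(sum(abs(a) for a in row) for row in A)
```

For this matrix both computations give 7, matching the familiar "maximum absolute row sum" formula, and 7 is then the least M with $\|Ax\| \le M\|x\|$ for all x.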
19.2. PROPOSITION. (a) $B(X, Y)$ is a normed vector space with the operations of pointwise
addition and scaling, and with the operator norm.
(b) If $Y$ is a Banach space then so is $B(X, Y)$.
PROOF. (a) We first show that the operator norm is indeed a norm. The homogeneity
and non-vanishing properties are easy to check. For the triangle inequality let
$T, T' \in B(X, Y)$, and calculate $\|(T + T')(x)\| = \|Tx + T'x\| \le \|Tx\| + \|T'x\| \le \|T\|\|x\| + \|T'\|\|x\|$.
It follows that $\|T + T'\| \le \|T\| + \|T'\|$, as desired.
It also follows from homogeneity and the triangle inequality that $B(X, Y)$ is closed
under scalar multiplication and addition. Thus $B(X, Y)$ is a normed vector space.
(b) It remains only to show that $B(X, Y)$ is complete. For this let $T_n$ be a sequence of
elements of $B(X, Y)$ and assume that it is Cauchy in the operator norm. This means that
for all $\epsilon > 0$, there exists $N$ such that for all $m, n \ge N$ we have $\|T_m - T_n\| < \epsilon$.
Now for any fixed $x \in X$, it follows from the last equation that $\|T_m x - T_n x\| \le \epsilon\|x\|$.
In particular, the sequence $T_n x$ is a Cauchy sequence in the space $Y$. Since we are assuming
that $Y$ is complete, the sequence $T_n x$ converges and we define $Tx = \lim T_n x$.
Now $T$ is a well-defined function from $X$ to $Y$, and by definition $T$ is the pointwise
limit of the $T_n$. We need to check that $T \in B(X, Y)$ and moreover that $T_n \to T$ in the
operator norm.
To see that $T \in B(X, Y)$, first note that it is easy to check $T$ is a linear map. For example,
$T(x + y) = \lim T_n(x + y) = \lim (T_n(x) + T_n(y)) = T(x) + T(y)$. To see that $T$ is bounded,
we first claim that the sequence of operator norms $\|T_n\|$ is itself bounded. For this claim,
recall from the reverse triangle inequality that $|\|T_n\| - \|T_m\|| \le \|T_n - T_m\|$. Thus the fact
that the sequence $T_n$ is Cauchy in the operator norm implies that $\|T_n\|$ is a Cauchy sequence
of real numbers, and hence bounded.
By the results of this section, the dual space X ∗ is always a Banach space with the
operator norm. By the Corollary to the Hahn–Banach theorem, the elements of the dual
space are plentiful in the sense that they separate the points of X. In fact, now that we have
introduced the operator norm, we can strengthen two of the statements in Corollary 18.4.
PROOF. (a) Since $\phi$ is an extension of $\phi_0$, we always have $\|\phi\| \ge \|\phi_0\|$. And in the
proof of Corollary 18.4(a), it is apparent that $\|\phi\| \le \|\phi_0\|$, provided we take $M = \|\phi_0\|$ there.
(b) Recall that in the proof of Corollary 18.4(b), we showed one may define $\phi_0$ on
$Y + \mathbb{R}z$ in such a way that $\phi_0(Y) = 0$ and $\phi_0(z) = \inf_{y \in Y} \|y + z\|$ and $\phi_0 \le \|\cdot\|$. It is not too
difficult to argue that with this definition, $\|\phi_0\| = 1$. Therefore by part (a) we can extend
$\phi_0$ to $\phi$ with $\|\phi\| = 1$ as desired.
We close this section with the following result about the double dual of a space.
19.5. PROPOSITION. Let $X$ be a normed vector space. Then there is a norm-preserving operator
from $X$ into its double dual $X^{**}$.
x̂ (φ) = φ( x )
EXERCISE 19.2 (BBT, ex 12:7.2). Show that $\|x\| = \sup \{\, |\phi(x)| : \phi \in X^* \text{ and } \|\phi\| = 1 \,\}$.
EXERCISE 19.3 (BBT, ex 17:7.5). If $X, Y$ are Banach spaces and $T \in B(X, Y)$, show that
$(T^*\phi)(x) = \phi(Tx)$ defines an element of $B(Y^*, X^*)$ such that $\|T^*\| = \|T\|$.
EXERCISE 19.4 (BBT, ex 12:7.6(b)). Show that a Banach space $X$ is reflexive if and only
if $X^*$ is reflexive.
In our introduction to normed vector spaces, we singled out the special case when the
space is complete and called it a Banach space. However in our investigation we have
said very little that is special to Banach spaces. In this section we present several key
results that are essentially unique to Banach spaces because they rely on the completeness
property.
The three key results we will present are called the uniform boundedness principle,
the open mapping theorem, and the closed graph theorem. In each case, rather than
provide a proof we will state the result and give a sample application.
It is often remarked that the uniform boundedness principle sounds too good to be
true—it has a pointwise hypothesis and a uniform conclusion. Regardless, it is true and
has a short proof from the Baire category theorem for complete metric spaces. Because
of its power the uniform boundedness principle is used quite frequently. We present just
one simple consequence concerning pointwise convergence of operators.
\begin{align*}
\|Tx\| &= \|\lim T_n x\| \\
&= \lim \|T_n x\| \\
&\le \limsup \|T_n\|\|x\| \\
&\le M\|x\|
\end{align*}
In particular, $T$ is bounded and $\|T\| \le M$.
For our next result, recall that a function is continuous if the preimage of any open
set is open. A somewhat less used but still very important property is the reverse. A
function is called open if the image of any open set is open. In the case that a function has
an inverse, the open property simply means that the inverse is continuous. However it is
still a valuable property even for functions which are not bijections.
20. THREE RESULTS ON BANACH SPACES 67
20.4. COROLLARY. Suppose $X$ is a Banach space with two complete norms $\|\cdot\|_a$ and $\|\cdot\|_b$.
If there is a constant $c$ such that $\|x\|_a \le c\|x\|_b$ for all $x \in X$, then $\|\cdot\|_a$ and $\|\cdot\|_b$ are
equivalent.
Observe that $T$ has a closed graph if and only if $x_n \to x$ and $Tx_n \to y$ implies $y = Tx$.
On the other hand, recall that $T$ is continuous if and only if $x_n \to x$ implies $Tx_n \to Tx$. So
it is easier to check that $T$ has a closed graph than to check that $T$ is continuous, because
when checking the former one can assume for free that $Tx_n$ converges to something.
Rather than give a consequence of the closed graph theorem, we will give an important
example. Let C[a, b] be the Banach space of continuous functions on [a, b] with the
supremum norm, and let D[a, b] be the subspace of all functions with continuous derivative.
Let D : D[a, b] → C[a, b] be the derivative operator. To check D has a closed graph,
we suppose that f_n → f and Df_n → g in supremum norm and verify that Df = g. For
this we integrate both sides of Df_n → g from a to x and use the fundamental theorem of
calculus to conclude that f_n → G, where G(x) = f(a) + ∫_a^x g is an antiderivative of g.
Since f_n converges to f, we have f = G. Now differentiating both sides we conclude that
Df = g, as desired.
While we have just checked that D has a closed graph, it is also easy to check that D is not
bounded. Thus the contrapositive of the closed graph theorem implies that D[a, b], with the
supremum norm, is not a Banach space! This is an admittedly somewhat silly way to see this
fact, since it is also possible to argue directly that D[a, b] is not complete.
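The unboundedness of D can be seen numerically. The sketch below is only an illustration, assuming the interval [0, 1] and the monomials f_n(x) = x^n (neither choice comes from the notes); the helper names are hypothetical and the supremum norm is approximated on a finite grid. Since ‖f_n‖ = 1 while ‖Df_n‖ = n, no constant M can satisfy ‖Df‖ ≤ M‖f‖ for all f.

```python
def sup_norm(func, a=0.0, b=1.0, samples=1001):
    """Approximate the supremum norm of func on [a, b] by sampling a grid."""
    points = (a + (b - a) * i / (samples - 1) for i in range(samples))
    return max(abs(func(x)) for x in points)


def derivative_ratio(n):
    """Return ||D f_n|| / ||f_n|| for the monomial f_n(x) = x**n on [0, 1]."""
    f = lambda x: x ** n
    df = lambda x: n * x ** (n - 1)  # exact derivative of x**n
    return sup_norm(df) / sup_norm(f)


# The ratio grows linearly with n, so it is not bounded by any constant.
ratios = {n: derivative_ratio(n) for n in (1, 5, 25, 125)}
```

Any other family with small sup norm and steep slopes, such as sin(nx)/n scaled appropriately, would serve equally well.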
EXERCISE 20.1. Give an example of a function from R to R which has a closed graph
but is not continuous. Give an example of a function from R to R which is continuous and
surjective but not open. Is it possible to give a bijective example?
EXERCISE 20.2 (BBT, ex 12:13.2). Equip the space C[0, 1] with both the L^1 norm and the
supremum norm. Show that the L^1 norm is bounded by a constant times the supremum
norm. Show that the reverse is not true. Explain why the two results do not contradict
Corollary 20.4.
21. THE BANACH SPACE L p 69
As we have seen, many of the most important Banach spaces are function spaces arising
in other areas of analysis. We have already seen the Banach space L^1 of absolutely
integrable functions, and we have seen that there are several other norms derived from
summation and integration. In this section we further investigate the L^p-spaces, which
generalize many of these important examples.
21.1. DEFINITION. Let (X, B) be a measurable space and let µ be a measure on it. For
any measurable f defined on X we let
\[
\|f\|_p = \left( \int |f|^p \, d\mu \right)^{1/p}
\]
Thus the spaces L^1(X) and L^2(X) are each examples of L^p-spaces, but so are a variety
of others. When X is the finite set {1, . . . , n} with the counting measure, the resulting
space is just R^n with its p-norm. When X = N with the counting measure, the resulting
space is the sequence space ℓ^p with its p-norm.
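For the counting measure on a finite set the integral in Definition 21.1 is just a finite sum, which makes the p-norm easy to compute directly. The following is a minimal sketch of that special case; the function name `p_norm` and the sample vector are illustrative assumptions, not part of the notes.

```python
def p_norm(values, p):
    """p-norm of a finite sequence, i.e. counting measure on {1, ..., n}."""
    if p < 1:
        raise ValueError("p must be at least 1 for this to be a norm")
    # The integral of |f|^p against counting measure is the sum of |f(i)|^p.
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


x = [3.0, -4.0]
norm_1 = p_norm(x, 1)  # |3| + |-4|
norm_2 = p_norm(x, 2)  # Euclidean length of (3, -4)
```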
The rest of this section is devoted to verifying that whenever p ≥ 1, L^p really is a
Banach space with respect to the norm ‖·‖_p. Before we can prove this result, it is necessary
to establish the following fundamental inequality.
21.2. THEOREM (Hölder’s inequality). Let p, q ≥ 1 be real numbers such that 1/p +
1/q = 1. If f ∈ L^p(µ) and g ∈ L^q(µ), then fg is absolutely integrable and
\[
\int |fg| \, d\mu \le \|f\|_p \, \|g\|_q
\]
PROOF. The theorem follows from a classical inequality which we will call Hölder’s
inequality for real numbers:
\[
ab \le \frac{a^p}{p} + \frac{b^q}{q}
\]
There are many proofs and one is left as an exercise.
To begin the proof, note that given f, g as in the theorem statement, we can rescale to
assume that ‖f‖_p = ‖g‖_q = 1. This is because both ‖·‖_p and ‖·‖_q satisfy the positive
homogeneity property.
Now our objective is to show that ∫ |fg| dµ ≤ 1. For this we plug a = |f(x)| and
b = |g(x)| into Hölder’s inequality for real numbers to obtain
\[
|f(x) g(x)| \le \frac{1}{p} |f(x)|^p + \frac{1}{q} |g(x)|^q
\]
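Hölder's inequality can be sanity-checked numerically in the counting-measure case, where the integrals become finite sums. The sketch below is an illustration under that assumption; `p_norm`, `holder_gap`, and the sample vectors are hypothetical names and data. The gap ‖f‖_p‖g‖_q − Σ|f g| should never be negative, and it should vanish when g = f and p = q = 2.

```python
def p_norm(values, p):
    """p-norm of a finite sequence (counting measure)."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def holder_gap(f, g, p):
    """Return ||f||_p * ||g||_q - sum |f g|, with q the conjugate exponent of p."""
    q = p / (p - 1.0)  # solves 1/p + 1/q = 1 when p > 1
    integral_of_product = sum(abs(a * b) for a, b in zip(f, g))
    return p_norm(f, p) * p_norm(g, q) - integral_of_product


f = [1.0, -2.0, 0.5, 3.0]
g = [0.25, 1.5, -4.0, 2.0]
gaps = [holder_gap(f, g, p) for p in (1.5, 2.0, 3.0, 10.0)]
equality_case = holder_gap(f, f, 2.0)  # zero up to rounding
```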
PROOF. We can assume without loss of generality that f, g never take the value ∞. We
begin by writing
\begin{align*}
|f(x) + g(x)|^p &= |f(x) + g(x)| \cdot |f(x) + g(x)|^{p-1} \\
                &\le |f(x)| \cdot |f(x) + g(x)|^{p-1} + |g(x)| \cdot |f(x) + g(x)|^{p-1}
\end{align*}
We now integrate both sides of this inequality, and then apply Hölder’s inequality to
each of the resulting terms. In the following calculation, we also note that our hypothesis
implies that the value q used in Hölder’s inequality is equal to p/(p − 1). Here is the
computation:
\[
(\|f + g\|_p)^p \le \int |f| \cdot |f + g|^{p-1} \, d\mu + \int |g| \cdot |f + g|^{p-1} \, d\mu
\]
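The remaining lines of this computation are not reproduced here. One standard way to finish, offered as a sketch rather than as the notes' own wording, applies Hölder's inequality with q = p/(p − 1) to each integral and uses (p − 1)q = p:

```latex
\begin{align*}
\|f + g\|_p^p
  &\le \int |f| \cdot |f + g|^{p-1} \, d\mu + \int |g| \cdot |f + g|^{p-1} \, d\mu \\
  &\le \left( \|f\|_p + \|g\|_p \right)
       \left( \int |f + g|^{(p-1)q} \, d\mu \right)^{1/q} \\
  &= \left( \|f\|_p + \|g\|_p \right) \|f + g\|_p^{p/q}
   = \left( \|f\|_p + \|g\|_p \right) \|f + g\|_p^{p-1}
\end{align*}
```

Assuming 0 < ‖f + g‖_p < ∞, dividing both sides by ‖f + g‖_p^{p−1} gives ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p.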
21.4. THEOREM. The space L^p(µ) with the norm ‖·‖_p is a Banach space.
PROOF. It is clear that the norm is homogeneous and non-vanishing, and we have
just shown it satisfies the triangle inequality. This also implies that L^p(µ) is closed under
linear combinations and therefore it is a vector space. So it only remains to show that the
norm ‖·‖_p is complete.
For this let f_n be a sequence of elements of L^p(µ) which is Cauchy in the ‖·‖_p norm.
Passing to a subsequence if necessary, we can suppose without loss of generality that for
all n we have ‖f_{n+1} − f_n‖_p < 1/2^n. We first wish to show that this implies f_n has a
pointwise limit f.
Let g_k = ∑_{n=1}^{k} |f_{n+1} − f_n| and g = ∑_{n=1}^{∞} |f_{n+1} − f_n|. So g is the limit of the g_k. While the
function g may take the value +∞, we claim that this cannot happen too often. Indeed by
It then follows from Fatou’s lemma that ∫ |g|^p ≤ lim inf ∫ |g_k|^p ≤ 1. Thus we can conclude
We have just observed that for µ-almost every x, the latter series is absolutely convergent.
Thus the series is convergent, and we can define the function f(x) = lim f_n(x).
It remains only to show that f lies in L^p and that f_n → f in the norm ‖·‖_p. For this,
let ε be given and choose N large enough that m, n ≥ N implies ‖f_m − f_n‖_p < ε. Fixing n
and applying Fatou’s Lemma to the resulting m-sequence we obtain
\[
\int |f - f_n|^p \le \liminf_m \int |f_m - f_n|^p \le \varepsilon^p
\]
21.5. DEFINITION. Let (X, B) be a measurable space and let µ be a measure on it. For
any measurable f defined on X we let
The norm ‖f‖_∞ is called the essential supremum of f, and the members of L^∞ are said
to be essentially bounded. We will leave as an exercise the following generalizations of our
results for L^p to the case p = ∞.
EXERCISE 21.1 (BBT, ex 13:1.2). Show that the inequality ‖f + g‖_1 ≤ ‖f‖_1 + ‖g‖_1 is
an equality precisely when there exists a nonnegative measurable function h such that g = fh
for almost every element x of the set where f, g ≠ 0.
EXERCISE 21.2. Recall the argument from Theorem 12.6 that the absolutely integrable
simple functions are dense in L^1. Show that the absolutely integrable simple functions are
dense in L^p.
EXERCISE 21.4. Prove Hölder’s inequality for real numbers. (Should probably give a
hint.)
22. THE DUAL SPACE OF L p 73
Recall that if X is a normed vector space, then its dual X* consists of all bounded
linear functionals on X. Although it is somewhat rare to be able to describe the space X*
completely, in this section we will be able to provide a complete description of (L^p)*.
The starting point in our search for bounded linear functionals on L^p is actually Hölder’s
inequality: if p, q are conjugate exponents (that is, 1/p + 1/q = 1), then for any f ∈ L^p and
g ∈ L^q we have:
\[
\left| \int f g \, d\mu \right| \le \|f\|_p \cdot \|g\|_q
\]
This statement really says that for any g ∈ L^q, the linear functional φ on L^p defined
by
\[
\phi(f) = \int f g \, d\mu
\]
is in fact bounded. Thus we have already found a large supply of elements of (L^p)*. Our
main result says that every element of (L^p)* arises in this way.
22.1. THEOREM. Let (X, B) be a measurable space, µ a measure on it, and L^p = L^p(µ). Assume
that µ is σ-finite, that is, X = ⋃ A_n where µ(A_n) < ∞. Let 1 ≤ p < ∞ and let q be the conjugate
exponent, that is, 1/p + 1/q = 1. Then for every φ ∈ (L^p)* there exists a unique g ∈ L^q such
that
\[
\phi(f) = \int f g \, d\mu
\]
Moreover ‖φ‖ = ‖g‖_q, and (L^p)* ≅ L^q.
\[
\nu(A \cup B) = \phi(\chi_{A \cup B}) = \phi(\chi_A + \chi_B) = \phi(\chi_A) + \phi(\chi_B) = \nu(A) + \nu(B)
\]
However there is no reason why such a function ν can’t take negative values, and so ν
need not be a measure in our original sense.
Thus given a linear functional φ on L^p, the function ν described above could be called
a finitely additive signed measure. For a proper example of a signed measure, let µ be an
unsigned measure on (X, B), let g be any absolutely integrable function on X, and define
\[
\mu_g(A) = \int \chi_A \, g \, d\mu
\]
Then it is not difficult to see from Fubini’s theorem that µ_g is a signed measure on (X, B).
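A toy instance may help fix ideas. The sketch below assumes µ is the counting measure on a four-point set, so that the integral defining µ_g reduces to a finite sum; the function name `mu_g` and the density values are illustrative assumptions. It shows that µ_g is additive on disjoint sets and can take negative values.

```python
def mu_g(subset, g):
    """Signed measure of `subset`: the sum of the density g over its points."""
    return sum(g[point] for point in subset)


# Hypothetical density on X = {0, 1, 2, 3}; negative values are allowed.
g = {0: 2.0, 1: -5.0, 2: 1.5, 3: 0.0}
A, B = {0, 1}, {2, 3}  # disjoint subsets of X

additivity_defect = mu_g(A | B, g) - (mu_g(A, g) + mu_g(B, g))
negative_value = mu_g(A, g)  # 2.0 + (-5.0)
```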
It is clear from the definition of µ_g that if µ(A) = 0 then µ_g(A) = 0 too. The next
result states that this condition is sufficient to guarantee that a given signed measure ν is
actually of the form µ_g.
We now have all the ingredients we need to prove that the dual of L^p is L^q. Indeed,
we have already introduced a simple correspondence between linear functionals φ and
signed measures ν with the property that ν(A) = φ(χ_A). Then the Radon–Nikodym
theorem gives us a correspondence between signed measures ν and absolutely integrable
functions g such that ν(A) = ∫ χ_A g dµ. Putting these together, we see that φ(χ_A) =
∫ χ_A g dµ. In other words we see that φ is of the desired form, at least for the characteristic
functions. We are therefore left to check that this property can be extended to arbitrary
functions f ∈ L^p, as well as the rest of the claims in the statement.
SKETCH OF PROOF OF THEOREM 22.1. In this proof, we will sketch only the case when
1 < p < ∞ and µ(X) < ∞. It is not essentially more difficult to complete the proof from
this simplified version.
Given a functional φ ∈ (L^p)*, we first define the mapping ν(A) = φ(χ_A). We have
already checked that ν is finitely additive. We claim that in fact ν is a signed measure.
Indeed, if A_n is a given sequence of pairwise disjoint sets, then using the finiteness of
µ(X) and the dominated convergence theorem we have:
\[
\int \left| \chi_{\bigcup A_n} - \chi_{\bigcup_1^k A_n} \right|^p d\mu
  = \int \left| \chi_{\bigcup_{k+1}^{\infty} A_n} \right|^p d\mu \to 0
\]
In other words, we have that χ_{⋃_1^k A_n} → χ_{⋃ A_n} in the L^p-norm. Using the fact that φ is a
continuous function on L^p, it follows that
\begin{align*}
\nu\Big( \bigcup A_n \Big) &= \phi(\chi_{\bigcup A_n}) \\
  &= \lim_k \phi(\chi_{\bigcup_1^k A_n}) \\
  &= \lim_k \nu\Big( \bigcup_1^k A_n \Big) \\
  &= \lim_k \sum_1^k \nu(A_n) \\
  &= \sum \nu(A_n)
\end{align*}
and so ν is countably additive.
Now by the Radon–Nikodym theorem, there exists a function g ∈ L^1 such that φ(χ_A) =
∫ χ_A g dµ for all sets A. It therefore follows from linearity that φ(f) = ∫ f g dµ for all simple
functions f.
We next claim that φ(f) = ∫ f g dµ for all functions f in L^∞. For this recall that any
\begin{align*}
\phi(f) &= \lim_n \phi(f_n) \\
        &= \lim_n \int f_n \, g \, d\mu \\
        &= \int f g \, d\mu
\end{align*}
as desired.
While our next goal is of course to show that φ(f) = ∫ f g dµ for all functions f ∈ L^p,
we first take a break and show that g lies in L^q. In fact we will show that ‖g‖_q ≤ ‖φ‖. First
we can use the truncation lemma to suppose that g is bounded. Then |g|^q/g is bounded
too, and so by our work for functions in L^∞ we can calculate:
\begin{align*}
\int |g|^q \, d\mu &= \int (|g|^q / g) \, g \, d\mu \\
  &= \phi(|g|^q / g) \\
  &\le \|\phi\| \cdot \|g^{q-1}\|_p \\
  &= \|\phi\| \cdot \left( \int |g|^q \, d\mu \right)^{1/p}
\end{align*}
This inequality implies that ‖g‖_q ≤ ‖φ‖, as desired. We remark that this implies the
functional f ↦ ∫ f g dµ is continuous, since Hölder’s inequality states that |∫ f g dµ| ≤
‖f‖_p · ‖g‖_q.
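The step from the displayed chain to ‖g‖_q ≤ ‖φ‖ is a division, which can be spelled out as follows. This is a sketch that assumes, as arranged above, that g is bounded and µ(X) < ∞, so that the integral is finite; the case where the integral is zero is trivial.

```latex
\begin{align*}
\int |g|^q \, d\mu \le \|\phi\| \left( \int |g|^q \, d\mu \right)^{1/p}
  \;\Longrightarrow\;
  \left( \int |g|^q \, d\mu \right)^{1 - 1/p} \le \|\phi\|
  \;\Longrightarrow\;
  \|g\|_q = \left( \int |g|^q \, d\mu \right)^{1/q} \le \|\phi\|
\end{align*}
```

The last implication uses 1 − 1/p = 1/q, which is exactly the conjugate exponent relation.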
We now claim that we have φ(f) = ∫ f g dµ for any f ∈ L^p. For this, recall that we
have previously shown that the simple functions are dense in L^1, that is any L^1 function is
an L^1-limit of simple functions. The same argument can be used to show that the simple
functions are dense in L^p. We have shown above that the two functionals, f ↦ φ(f) and
f ↦ ∫ f g dµ, agree on the simple functions. Our hypothesis states that φ is continuous, and
the previous paragraph implies that f ↦ ∫ f g dµ is continuous too. Since two continuous
functions that agree on a dense set must agree on their domain, we can conclude that
φ(f) = ∫ f g dµ for all f ∈ L^p(µ).
Finally we claim that ‖g‖_q = ‖φ‖. Indeed we now know that |φ(f)| = |∫ f g dµ| ≤
‖f‖_p · ‖g‖_q, and hence that ‖φ‖ ≤ ‖g‖_q. We have also shown two paragraphs previously
that ‖g‖_q ≤ ‖φ‖. This concludes the proof.
EXERCISE 22.1 (BBT, ex 13:6.1). Let g ∈ L^1[0, 1]. Show that the map f ↦ ∫ f g is a
EXERCISE 22.2 (BBT, ex 13:6.2). Show that there is a nonzero bounded linear functional
on L^∞[0, 1] that vanishes on the (closed) subspace of continuous functions.
EXERCISE 22.3 (BBT, ex 13:6.3). Show that there is a bounded linear functional on
L^∞[0, 1] that is not of the form f ↦ ∫ f g for any g ∈ L^1[0, 1].
23. HILBERT SPACE 77
This recalls the case of Euclidean space R^d where the (bounded) linear functionals are all
given by an inner product x ↦ y^T x.
The key idea of this section is that just like the pairing y^T x, the pairing ∫ f g may be
regarded as an inner product, leading to the next definition. For the greatest generality,
we will now return to vector spaces with complex scalars.
23.1. DEFINITION. Let X be a complex vector space. A function ⟨·, ·⟩ : X × X → C is
called an inner product if it satisfies
(a) (positivity/nonvanishing) for x ∈ X we have ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 iff x = 0;
(b) (conjugate symmetry) for x, y ∈ X we have $\langle x, y \rangle = \overline{\langle y, x \rangle}$; and
(c) (linearity in the first coordinate) ⟨c_1 x_1 + c_2 x_2, y⟩ = c_1⟨x_1, y⟩ + c_2⟨x_2, y⟩
If X admits an inner product then it automatically admits a norm ‖x‖ = √⟨x, x⟩, and X
is called a Hilbert space if this norm is complete.
We will prove shortly that the mapping ‖·‖ defined above really is a norm. We remark
that (b) and (c) together imply that ⟨·, ·⟩ is conjugate linear in the second coordinate (we
leave it to the reader to state and verify this formally).
Thus the Banach space X = L^2(µ) (with the complex scalars) is a Hilbert space with
respect to the inner product
\[
\langle f, g \rangle = \int f \bar{g} \, d\mu
\]
While a Hilbert space may seem like just a small “upgrade” from a Banach space, it is
quite significant. In fact, we will see in the next section that L^2 and ℓ^2 are essentially the
only examples of Hilbert spaces.
We now lay out some of the most basic facts about Hilbert space. Our first result is the
following analog of Hölder’s inequality.
23.2. THEOREM (Schwarz inequality). Let X be an inner product space. Then |⟨x, y⟩| ≤
‖x‖ · ‖y‖.
PROOF. The proof is a simpler version of the proof of Hölder’s inequality. First, by
multiplying x by a scalar of the form e^{iθ}, we may assume that ⟨x, y⟩ is real. Next given
x, y we define a real function p(α) = ⟨αx + y, αx + y⟩ of the real variable α. Then by axiom
(a) we have that p(α) ≥ 0. And by axiom (c) we have
\[
p(\alpha) = \alpha^2 \|x\|^2 + 2 \alpha \langle x, y \rangle + \|y\|^2
\]
Thus p is a quadratic and p ≥ 0, which implies p has at most one real root. This means that
the discriminant is non-positive, that is, 4|⟨x, y⟩|^2 − 4‖x‖^2 · ‖y‖^2 ≤ 0. This last inequality is
plainly equivalent to the desired result.
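The Schwarz inequality is easy to test on concrete vectors. The sketch below assumes the standard inner product on C^n, linear in the first coordinate as in Definition 23.1; the helper names and the sample vectors are illustrative. It also checks the equality case, which occurs when y is a scalar multiple of x.

```python
def inner(x, y):
    """Standard inner product on C^n, linear in the first coordinate."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def norm(x):
    """Norm induced by the inner product."""
    return abs(inner(x, x)) ** 0.5


x = [1 + 2j, -0.5j, 3.0 + 0j]
y = [2 - 1j, 4 + 0j, -1 + 1j]
schwarz_gap = norm(x) * norm(y) - abs(inner(x, y))

# Equality case: y is a scalar multiple of x.
scaled = [(2 - 3j) * a for a in x]
equality_gap = norm(x) * norm(scaled) - abs(inner(x, scaled))
```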
We are now ready to prove the fact that every inner product space automatically has
a norm.
23.3. PROPOSITION. Let X be an inner product space. Then ‖x‖ = √⟨x, x⟩ makes X into a
normed vector space.
PROOF. Since the nonvanishing and homogeneity properties are automatic from the
axioms, it remains only to verify the triangle inequality. For this we simply calculate:
\begin{align*}
\|x + y\|^2 &= \langle x + y, x + y \rangle \\
  &= \langle x, x \rangle + \langle x, y \rangle + \langle y, x \rangle + \langle y, y \rangle \\
  &\le \langle x, x \rangle + 2 |\langle x, y \rangle| + \langle y, y \rangle \\
  &\le \|x\|^2 + 2 \|x\| \cdot \|y\| + \|y\|^2 \\
  &= (\|x\| + \|y\|)^2
\end{align*}
Here, the first inequality uses axiom (b) and the ordinary triangle inequality, and the
second inequality uses the Schwarz inequality. Taking the square root of both sides, we
achieve the desired result.
Perhaps the most important feature of Hilbert spaces that is not present in an ordinary
Banach space is that of orthogonality.
23.4. DEFINITION. Let X be a Hilbert space. We say that vectors x, y ∈ X are orthogonal
if ⟨x, y⟩ = 0. Given a vector subspace Y ⊂ X we define its orthogonal complement Y^⊥ =
{ x ∈ X | (∀y ∈ Y) ⟨x, y⟩ = 0 }.
The orthogonal complement does not always behave as one would expect from classical
Euclidean space. For example, it is possible for a proper subspace Y to have Y^⊥ = 0
(this will be the case if Y is dense in X). However if Y is a closed subspace, then most
familiar properties do hold.
23.5. PROPOSITION. Let X be a Hilbert space and let Y ≤ X be a closed subspace. Then
X = Y ⊕ Y^⊥ in the sense that every x ∈ X can be uniquely expressed as x = y + y′ where y ∈ Y
and y′ ∈ Y^⊥.
The idea behind the proof is as follows. Given x, let y ∈ Y be the unique point in
Y which is closest to x. Such a point y exists and is unique thanks to the Euclidean-like
geometry of Hilbert space. (The basic fact here is that closed, convex sets have a unique
element of minimal norm.)
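In finite dimensions the decomposition x = y + y′ can be computed explicitly. The sketch below is an illustration that assumes X = R^3 and that Y is given by an orthonormal basis (the names `decompose` and `Y_basis` are hypothetical); in that case the closest point y is the sum of the components of x along the basis vectors.

```python
def dot(x, y):
    """Real inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))


def decompose(x, orthonormal_basis_of_Y):
    """Split x as y + y_perp with y in Y and y_perp orthogonal to Y."""
    y = [0.0] * len(x)
    for e in orthonormal_basis_of_Y:
        c = dot(x, e)  # component of x along e
        y = [yi + c * ei for yi, ei in zip(y, e)]
    y_perp = [xi - yi for xi, yi in zip(x, y)]
    return y, y_perp


Y_basis = [[1.0, 0.0, 0.0], [0.0, 0.6, 0.8]]  # orthonormal vectors spanning Y
x = [2.0, 1.0, -3.0]
y, y_perp = decompose(x, Y_basis)
```

Here `y_perp` is orthogonal to every basis vector of Y, which is the defining property of Y^⊥.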
This key fact makes it possible to define bases in Hilbert space, as we will do in the
next section. Here we present another useful consequence of the proposition. First recall
that our motivation for defining Hilbert spaces was the fact that L^2 is self-dual, and thus
the action of (L^2)* on L^2 behaves like an inner product. The next result states that the
converse holds, that is, if X admits an inner product then X is self-dual.
23.6. THEOREM. If X is a Hilbert space, then X is self-dual. That is, for any φ ∈ X* there
exists y ∈ X such that φ(x) = ⟨x, y⟩. Moreover the correspondence φ ↦ y is a conjugate-linear
isomorphism X* ≅ X.
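\begin{align*}
\phi(x) - \langle x, y \rangle &= \phi(x) - \phi(z) \langle x, z \rangle \\
  &= \phi(x) \langle z, z \rangle - \phi(z) \langle x, z \rangle \\
  &= \langle \phi(x) z - \phi(z) x, z \rangle \\
  &= 0
\end{align*}
Here the last equality follows from the fact that φ(x)z − φ(z)x lies in Y.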
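In C^n the representing vector of Theorem 23.6 can be written down directly. The sketch below assumes a functional of the form φ(x) = Σ c_i x_i with hypothetical coefficients `coeffs`; because the inner product is conjugate linear in the second coordinate, the representing vector is the entrywise conjugate of the coefficient vector, which is one way to see why the correspondence φ ↦ y is conjugate-linear rather than linear.

```python
def inner(x, y):
    """Standard inner product on C^n, linear in the first coordinate."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


coeffs = [2 + 1j, -3j, 0.5 + 0j]  # hypothetical functional phi(x) = sum c_i x_i


def phi(x):
    """A bounded linear functional on C^3."""
    return sum(c * xi for c, xi in zip(coeffs, x))


# Representing vector: phi(x) = <x, y> for every x.
y = [c.conjugate() for c in coeffs]

x = [1 - 1j, 2 + 0j, -4 + 3j]
difference = phi(x) - inner(x, y)
```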
EXERCISE 23.1. Use the discussion in BBT, §14.2 to prove Proposition 23.5.
24. BASES FOR HILBERT SPACE 80
In the previous section we have seen that Hilbert spaces possess many properties
which are familiar from R^n and L^2. The special properties are made possible by the inner
product and its corresponding notion of orthogonality. In this section we make further
use of orthogonality, in particular introducing orthonormal bases.
Although bases are essential to the study of classical linear algebra, they have been
absent so far in our study of Banach spaces.
24.1. DEFINITION. Let X be a Hilbert space. A subset {e_α}_{α∈A} of X is called an
orthonormal basis if it satisfies the following properties:
(a) (normality) for all α, ‖e_α‖ = 1;
(b) (orthogonality) for all α ≠ β, ⟨e_α, e_β⟩ = 0; and
(c) (maximality) for any x ∈ X, if ⟨x, e_α⟩ = 0 for all α ∈ A, then x = 0.
24.2. THEOREM. Let X be a Hilbert space and {e_α}_{α∈A} an orthonormal basis for X. Then for
any x ∈ X, we have x = ∑⟨x, e_α⟩e_α, with the convergence being in norm. Moreover, we have
Parseval’s identity, which states that ‖x‖^2 = ∑_{α∈A} |⟨x, e_α⟩|^2.
PROOF. In the proof, we will need the Pythagorean theorem, which states that if e_1, . . . , e_n
are orthogonal, then ‖e_1 + · · · + e_n‖^2 = ‖e_1‖^2 + · · · + ‖e_n‖^2. The calculation is the same as
the classical version, and is obtained by distributing out the expression ⟨e_1 + · · · + e_n, e_1 +
· · · + e_n⟩.
We first show one half of Parseval’s identity, namely that ∑ |⟨x, e_α⟩|^2 ≤ ‖x‖^2. (This is
called Bessel’s inequality.) For this we let A_0 ⊂ A be a finite subset. Then an elementary
calculation together with the Pythagorean theorem gives
\[
\Big\| x - \sum_{\alpha \in A_0} \langle x, e_\alpha \rangle e_\alpha \Big\|^2
  = \|x\|^2 - \sum_{\alpha \in A_0} |\langle x, e_\alpha \rangle|^2
\tag{24.e1}
\]
Since the left-hand side is nonnegative, we have ∑_{α∈A_0} |⟨x, e_α⟩|^2 ≤ ‖x‖^2. Since A_0 was
arbitrary, this completes the claim.
Now we know that ∑ |⟨x, e_α⟩|^2 converges. It follows that there are just countably many
nonzero terms; let us enumerate them as |⟨x, e_n⟩|^2. Then the sequence of partial sums of
∑ |⟨x, e_n⟩|^2 is Cauchy. By the Pythagorean theorem,
\[
\Big\| \sum_{n=m}^{l} \langle x, e_n \rangle e_n \Big\|^2 = \sum_{n=m}^{l} |\langle x, e_n \rangle|^2
\]
Thus the sequence of partial sums of ∑⟨x, e_n⟩e_n is Cauchy too. Since X is complete, we
conclude that there exists an element y ∈ X defined by y = ∑⟨x, e_α⟩e_α.
Next we claim that in fact x = y. Indeed, it is easy to see that ⟨x − y, e_α⟩ = 0 for all α,
so the maximality property (c) of the orthonormal basis implies that x − y = 0.
Finally, we conclude the proof of Parseval’s identity by returning to Equation (24.e1).
Since we now know that the left-hand side converges to 0 as A_0 → A, it follows that the
right-hand side does as well, establishing the equality.
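Both conclusions of the theorem can be checked by hand in a small example. The sketch below assumes X = C^2 with a rotated orthonormal basis; the angle `theta`, the vector `x`, and the helper names are illustrative choices. It computes the coefficients ⟨x, e_α⟩, rebuilds x from them, and compares ‖x‖^2 with the sum of squared coefficient moduli.

```python
import math


def inner(x, y):
    """Standard inner product on C^n, linear in the first coordinate."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


theta = 0.7  # arbitrary rotation angle
basis = [
    [complex(math.cos(theta)), complex(math.sin(theta))],
    [complex(-math.sin(theta)), complex(math.cos(theta))],
]

x = [3 - 1j, 2j]
coefficients = [inner(x, e) for e in basis]

# Expansion x = sum <x, e_k> e_k, computed coordinate by coordinate.
reconstructed = [
    sum(c * e[i] for c, e in zip(coefficients, basis)) for i in range(len(x))
]

norm_squared = inner(x, x).real
parseval_sum = sum(abs(c) ** 2 for c in coefficients)
```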
One can interpret this result as saying that any Hilbert space X looks remarkably like
ℓ^2. That is, each element x ∈ X is determined by its vector of coefficients ⟨x, e_α⟩. Indeed,
our last result will show that this is a formal theorem. In order to state it, we need to
define a generalization of the sequence space ℓ^2.
In other words, ℓ^2(A) is like a sequence space where the sequences may be indexed
by an arbitrary set other than N. Another way to say it is that ℓ^2(A) is equal to L^2(µ),
where µ is the counting measure on A.
PROOF. Let {e_α}_{α∈A} be an orthonormal basis for X. The index set A will be the same
set we use to form ℓ^2(A). We define a function φ : X → ℓ^2(A) by φ(x)(α) = ⟨x, e_α⟩.
It is easy to see that φ is linear, and Parseval’s identity implies that φ preserves the
norm. In particular φ is injective. To see that φ preserves the inner product, it suffices to
note that the inner product can be recovered from the norm by the polarization identity.
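The displayed identity is not reproduced here. For a complex inner product that is linear in the first coordinate, the standard form is ⟨x, y⟩ = (1/4) ∑_{k=0}^{3} i^k ‖x + i^k y‖^2, and the sketch below checks that form numerically on C^3. It is offered as an illustration with hypothetical helper names, not necessarily in the notation the notes use.

```python
def inner(x, y):
    """Standard inner product on C^n, linear in the first coordinate."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def norm_sq(x):
    """Squared norm induced by the inner product."""
    return inner(x, x).real


def polarization(x, y):
    """Recover <x, y> from norms alone via the polarization identity."""
    total = 0j
    for k in range(4):
        unit = 1j ** k  # runs through 1, i, -1, -i
        shifted = [a + unit * b for a, b in zip(x, y)]
        total += unit * norm_sq(shifted)
    return total / 4


x = [1 + 2j, -0.5j, 3.0 + 0j]
y = [2 - 1j, 4 + 0j, -1 + 1j]
recovered = polarization(x, y)
```

Since φ preserves norms and is linear, applying the same formula on both sides shows that it preserves inner products as well.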
Finally to see that φ is surjective, let f ∈ ℓ^2(A) be given. Since ∑_{α∈A} |f(α)|^2 < ∞,
the series has just countably many nonzero terms and its partial sums are Cauchy. By
the Pythagorean theorem, the partial sums of ∑_{α∈A} f(α)e_α are Cauchy too. It follows
that there exists an element x ∈ X defined by x = ∑_{α∈A} f(α)e_α. Clearly φ(x) = f, as
desired.
Thus the result implies that there is exactly one Hilbert space in each dimension. Note
that both ℓ^2 and L^2[0, 1] are Hilbert spaces of countable dimension, because we have seen
above that each has a countable basis. The unique countable dimensional Hilbert space,
often denoted H, is by far the most widely used in applications where an operator acts on
infinitely many coordinates. Hilbert space appears in the study of differential equations,
Fourier analysis, quantum physics, and more.