
Integration

Uploaded by

Rajesh Mal
Copyright
© All Rights Reserved
INTEGRATION: H.T. 2024, 16 LECTURES

Acknowledgement. These notes are a very small edit of the notes produced by
Charles Batty who lectured this course from 2018-21. I’m grateful to him for allowing
me to use his notes in this way. I am responsible for any typos / inaccuracies in the
notes (please let me know of any you find).
Stuart White
stuart.white@maths.ox.ac.uk

Reading

Z. Qian, Part A: Integration, available on the course Moodle page.
M. Capinski and E. Kopp, Measure, Integration and Probability, Springer SUMS
(2nd edition, 2004)
H.A. Priestley, Introduction to Integration, OUP, 1997
S. Axler, Measure Theory, Integration & Real Analysis, Springer, Graduate
Texts in Mathematics, 2020. This book is open access (https://measure.axler.net/)
E. M. Stein & R. Shakarchi, Real Analysis: Measure Theory, Integration and
Hilbert Spaces, Princeton Lectures in Analysis III, Princeton University Press,
2005
D.J.H. Garling, A Course in Mathematical Analysis, III (Part 6), CUP, 2014.

Qian’s notes were written for the course as he gave it in 2014-17, based on previous
versions of the course given by Alison Etheridge and Charles Batty. We will cover more
or less the same material, but not follow his notes exactly.
Capinski and Kopp is the most basic of the books, giving the theory in a basic style,
but with few worked examples; we shall follow their approach to the theory rather
closely. Priestley adopts a very different approach to the construction of the integral,
so early parts of her book look quite different from what we will do, but from about the
8th lecture onward everything comes together; she has lots of worked examples.
Stein and Shakarchi, and Garling, are a little more sophisticated in the theory.
Garling’s book is based on lectures given in Cambridge, and it has a good number of
worked examples.
Numerous other useful books may be found in libraries. Some may adopt different
approaches to the construction of the integral, but when they talk about Lebesgue
integration they all mean the same class of integrable functions and the same theorems.

Date: Version from 29 Sept 2023.



Introduction
In Prelims, you saw how to define $\int_a^b f(x)\,dx$ for a continuous function $f : [a,b] \to \mathbb{R}$, or more generally for Riemann integrable $f$. It had some good properties: the Fundamental Theorem of Calculus shows that it is more or less an inverse of differentiation, leading to rigorous statements concerning A level calculus. Moreover you saw that
$$(*)\qquad \lim_{n\to\infty} \int_a^b f_n(x)\,dx = \int_a^b f(x)\,dx$$
if $(f_n)$ converges to $f$ uniformly on $[a,b]$. This was useful (a) for integrating power series term-by-term, and (b) for finding $\lim_{n\to\infty} \int_\gamma f_n(z)\,dz$, where $\gamma$ is a contour of finite length, in complex analysis last term. However, the Riemann integral has various deficiencies:
(a) There are still functions which one feels one should be able to integrate, for which the Prelims definition fails to work. For example, let $f = \chi_{\mathbb{Q}\cap[0,1]}$ be the characteristic function of $\mathbb{Q}\cap[0,1]$. Then the lower and upper Riemann integrals are
$$\underline{\int_0^1} f(x)\,dx = 0, \qquad \overline{\int_0^1} f(x)\,dx = 1,$$
so the definition of the integral fails.


In particular, if we want to define the length of a subset $E$ of $\mathbb{R}$ by
$$m(E) = \int \chi_E(x)\,dx,$$
we need to extend the definition of integrals in some way beyond Riemann integration.
(b) There is a lack of theorems saying that
$$f_n \to f \implies \int f_n(x)\,dx \to \int f(x)\,dx,$$
particularly for integrals over $\mathbb{R}$ or unbounded subsets of $\mathbb{R}$. To some extent, this is unavoidable because of the following example:
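Example 0.1. Let $f_n(x) = n^2 x^n (1-x)$ $(0 \le x \le 1)$. Then $\lim_{n\to\infty} f_n(x) = 0$ for all $x \in [0,1]$, but
$$\int_0^1 f_n(x)\,dx = n^2\Big(\frac{1}{n+1} - \frac{1}{n+2}\Big) = \frac{n^2}{(n+1)(n+2)} \to 1 \quad (n \to \infty).$$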
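A quick numerical look at Example 0.1 may help. The sketch below is plain Python, and the helper names `f`, `exact_integral` and `midpoint_integral` are illustrative rather than taken from the notes. It evaluates $f_n$ at fixed points, computes the exact integral $n^2/((n+1)(n+2))$, and checks that value against a midpoint-rule approximation.

```python
from fractions import Fraction


def f(n, x):
    """f_n(x) = n^2 x^n (1 - x) on [0, 1]."""
    return n**2 * x**n * (1 - x)


def exact_integral(n):
    """Integral of f_n over [0, 1]: n^2 (1/(n+1) - 1/(n+2)) = n^2 / ((n+1)(n+2))."""
    return Fraction(n**2, (n + 1) * (n + 2))


def midpoint_integral(n, steps=100_000):
    """Midpoint-rule approximation of the same integral, as an independent check."""
    h = 1.0 / steps
    return h * sum(f(n, (k + 0.5) * h) for k in range(steps))


for n in (10, 100, 1000):
    # values at fixed x shrink towards 0, while the integrals approach 1
    print(n, f(n, 0.5), f(n, 0.9), float(exact_integral(n)))
```

The mass of $f_n$ concentrates in a spike near $x = 1$. This spike is invisible to pointwise limits, which is exactly why the limit and the integral cannot be interchanged here.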
This example is going to arise in any reasonable theory. But we would like some
more theorems of the form:
Suppose $(f_n)$ is a sequence of integrable functions, $f_n(x) \to f(x)$ for each $x$, and [supplementary assumptions to be inserted]. Then $f$ is integrable and $\int f(x)\,dx = \lim_{n\to\infty} \int f_n(x)\,dx$.
Lebesgue’s integration theory provides two very powerful theorems of this form
(Monotone Convergence Theorem, Dominated Convergence Theorem). The the-
orems are less good in Riemann integration, because one has to assume that the
limiting function is integrable.
(c) Riemann’s integration theory does not generalise to include various other contexts such as:
• probability theory, taking expectations of arbitrary random variables (continuous, discrete, hybrid, singular);
• summing infinite series.

Lebesgue’s theory resolves these difficulties, except where there is an unavoidable obstruction. In a sense the passage from Riemann integration to Lebesgue integration resembles the passage from rational numbers to real numbers: it completes the space of integrable functions, or it fills in the gaps.
The crucial ideas of Lebesgue’s construction are:

(i) Instead of using integrals to define lengths of sets, define the length of a set directly; then define integrals.
(ii) Instead of partitioning the x-axis into intervals and using step functions, partition the y-axis into intervals and consider corresponding “simple” functions.

There are other ways of constructing Lebesgue’s integral on R, including ways which
use step functions (see Priestley), but they don’t generalise so easily to probability (for
example). Once one gets the Monotone Convergence Theorem, then everything is the
same, however you got there. We then get a whole host of theorems about:

• passing limits through integrals,
• passing infinite sums through integrals,
• differentiating through integrals,
• interchanging two integrals (Fubini’s Theorem),
• changing variables.

Note that these processes do not always work—there are simple counterexamples
for the first 4! So all these theorems have conditions which must be checked before
using in applications. In this course, we do not take the position that you can just
assume all these processes work. On the other hand, we shall not go pedantically
through all details of the construction of the integral and the proofs of the theorems.
We’ll approach the construction in a way which generalises easily, but the proofs of
these generalisations are often not interesting. The construction up to the MCT will
take some time (around 8 lectures), and then useful theorems and applications will
come thick and fast.
Please be aware that all the Prelims theory remains valid in this context. Lebesgue
integration theory extends Riemann’s theory by enabling you to integrate more func-
tions. In particular, the Fundamental Theorem of Calculus (both versions), Integration
by Parts and Substitution remain valid under the assumptions given in Prelims.

1. Extended real number system

In this course, we shall often take infinite series of non-negative terms and limits of
(monotone) sequences. In order to avoid complications concerning divergence, it will
be convenient to work in the extended real numbers including −∞ and ∞, and to use
the notions of lim sup and lim inf.

Thus we consider the set $[-\infty,\infty] = \mathbb{R}\cup\{-\infty,\infty\}$. Addition and multiplication by $\infty$ are defined as follows (for $x\in\mathbb{R}$):
$$x + \infty = \infty + x = \infty, \qquad x - \infty = -\infty + x = -\infty,$$
$$x\cdot\infty = \infty\cdot x = (-x)\cdot(-\infty) = \begin{cases} \infty & (x>0),\\ -\infty & (x<0),\\ 0 & (x=0).\end{cases}$$

Note that
• $\infty - \infty$ is undefined;
• the usual laws (commutativity, associativity and distributivity) apply, provided that the relevant expressions are defined;
• the above are uncontroversial, except for $0\cdot\infty = 0$, which is convenient for our particular context but might be inappropriate in other mathematical contexts.

The ordering on $[-\infty,\infty]$ is the obvious one, and $\lim_{n\to\infty} a_n = \infty$ has the same meaning as in Prelims Analysis.
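Python floats already include $\pm\infty$, but IEEE 754 arithmetic makes $0\cdot\infty$ equal to `nan`. The convention above therefore has to be imposed by hand. Here is a minimal sketch; the function names `ext_add` and `ext_mul` are illustrative.

```python
import math


def ext_add(x, y):
    """Addition on [-inf, inf]; inf + (-inf) is left undefined, as in the notes."""
    if math.isinf(x) and math.isinf(y) and x != y:
        raise ValueError("inf - inf is undefined")
    return x + y


def ext_mul(x, y):
    """Multiplication on [-inf, inf] with the convention 0 * (+-inf) = 0."""
    if x == 0 or y == 0:
        return 0.0  # IEEE 754 would give nan for 0 * inf
    return x * y  # otherwise IEEE 754 already produces the signs listed above


print(0.0 * math.inf)          # nan: the IEEE 754 convention
print(ext_mul(0.0, math.inf))  # 0.0: the convention used in this course
```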
In this system, any subset $E$ has a supremum and an infimum in $[-\infty,\infty]$. Note that $\sup\emptyset = -\infty$. If $E\subseteq\mathbb{R}$, then $\sup E = \infty$ if and only if $E$ is not bounded above. For an increasing sequence $(a_n)$, $\lim_{n\to\infty} a_n = \sup\{a_n\}$. If $a_n \ge 0$ for all $n$, then $\sum a_n = \infty$ if and only if the series diverges.
Proposition 1.1. 1. Let $(a_n)$ be a sequence of non-negative terms. Then
$$\sum_{n=1}^\infty a_n = \sup\Big\{ \sum_{n\in J} a_n : J \text{ finite subset of } \mathbb{N} \Big\}.$$
2. Let $(b_{mn})_{m,n\ge1}$ be a double sequence of non-negative terms, and $\{(m_k,n_k) : k\ge1\}$ be any enumeration of $\mathbb{N}\times\mathbb{N}$. Then
$$\sum_{m=1}^\infty\sum_{n=1}^\infty b_{mn} = \sum_{n=1}^\infty\sum_{m=1}^\infty b_{mn} = \sum_{k=1}^\infty b_{m_k,n_k} = \sup\Big\{\sum_{(m,n)\in J} b_{mn} : J \text{ finite subset of } \mathbb{N}\times\mathbb{N}\Big\}.$$

In particular, Proposition 1.1 implies that $\sum a_n$ is independent of the order of the terms, and similarly $\sum\sum b_{mn}$ can be arbitrarily rearranged.
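Proposition 1.1(2) can be seen in action with the non-negative double sequence $b_{mn} = 2^{-(m+n)}$, an example chosen here rather than taken from the notes, whose sum is 1. The sketch computes two kinds of finite sums: sums over the squares $J = \{1,\dots,N\}^2$, and partial sums along a diagonal enumeration of $\mathbb{N}\times\mathbb{N}$. Both increase towards the same supremum. Exact fractions avoid rounding, and the helper names are illustrative.

```python
from fractions import Fraction
from itertools import islice


def b(m, n):
    """The example double sequence b_mn = 2^-(m+n)."""
    return Fraction(1, 2 ** (m + n))


def diagonal_enumeration():
    """Enumerate N x N as (1,1), (1,2), (2,1), (1,3), (2,2), (3,1), ..."""
    s = 2
    while True:
        for m in range(1, s):
            yield (m, s - m)
        s += 1


def square_sum(N):
    """Sum of b_mn over the finite set J = {1, ..., N} x {1, ..., N}."""
    return sum(b(m, n) for m in range(1, N + 1) for n in range(1, N + 1))


def diagonal_partial_sum(K):
    """Sum of b over the first K pairs of the diagonal enumeration."""
    return sum(b(m, n) for m, n in islice(diagonal_enumeration(), K))


for N, K in ((5, 15), (10, 55), (20, 210)):
    # both families of finite sums increase towards the same supremum, 1
    print(float(square_sum(N)), float(diagonal_partial_sum(K)))
```

Non-negativity is what makes the order irrelevant. Every finite sum is bounded by the supremum, and every enumeration eventually exhausts any given finite set $J$.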
A bounded sequence $(a_n)$ in $\mathbb{R}$ may not have a limit. It has a supremum and infimum, but for some large values of $n$, $a_n$ may not be close to them. Think for example about $a_n = (1+1/n)\sin n$. Asymptotically the values oscillate between $-1$ and $1$, but there are infinitely many values bigger than $1$ and infinitely many smaller than $-1$.
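For a sequence $(a_n)$ in $[-\infty,\infty]$, define
$$\limsup_{n\to\infty} a_n = \lim_{m\to\infty}\Big(\sup_{n\ge m} a_n\Big), \qquad \liminf_{n\to\infty} a_n = \lim_{m\to\infty}\Big(\inf_{n\ge m} a_n\Big).$$
The limits exist because $\big(\sup_{n\ge m} a_n\big)_{m\ge1}$ is a decreasing sequence and $\big(\inf_{n\ge m} a_n\big)_{m\ge1}$ is an increasing sequence. One can show that $\limsup_{n\to\infty} a_n$ is the largest $\ell\in[-\infty,\infty]$ such that some subsequence of $(a_n)$ converges to $\ell$. Similarly, $\liminf_{n\to\infty} a_n$ is the smallest such $\ell$.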
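Examples 1.2. 1. Let $a_n = (1+1/n)\sin n$. Then $\limsup_{n\to\infty} a_n = 1$, $\liminf_{n\to\infty} a_n = -1$.
2. Let $a_n = (-1)^n$. Then $\limsup_{n\to\infty} a_n = 1$, $\liminf_{n\to\infty} a_n = -1$.
3. Let $a_n = n(-1)^n$. Then $\limsup_{n\to\infty} a_n = \infty$, $\liminf_{n\to\infty} a_n = -\infty$.
4. Let
$$a_n = \begin{cases} 1 + 2^{-n} & (n \text{ prime}),\\ 0 & \text{otherwise}.\end{cases}$$
Then $\limsup_{n\to\infty} a_n = 1$, $\liminf_{n\to\infty} a_n = 0$.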
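The definition via tail suprema and infima can be explored numerically. Since a computer cannot take a supremum over infinitely many terms, the sketch replaces $\sup_{n\ge m}a_n$ by a maximum over $m\le n<H$ for a fixed horizon $H$. This is a finite approximation, not the true tail supremum. The names `tail_sup`, `tail_inf` and `HORIZON` are illustrative.

```python
import math


def tail_sup(a, m, horizon):
    """max of a(n) for m <= n < horizon: a finite stand-in for sup_{n >= m} a_n."""
    return max(a(n) for n in range(m, horizon))


def tail_inf(a, m, horizon):
    """min of a(n) for m <= n < horizon: a finite stand-in for inf_{n >= m} a_n."""
    return min(a(n) for n in range(m, horizon))


def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


def a1(n):  # Example 1: (1 + 1/n) sin n
    return (1 + 1 / n) * math.sin(n)


def a2(n):  # Example 2: (-1)^n
    return (-1) ** n


def a4(n):  # Example 4: 1 + 2^-n at primes, 0 otherwise
    return 1 + 2.0 ** -n if is_prime(n) else 0.0


HORIZON = 20_000
for name, a in (("1", a1), ("2", a2), ("4", a4)):
    # for a fixed horizon, tail maxima cannot increase and tail minima cannot decrease as m grows
    print(name, [(tail_sup(a, m, HORIZON), tail_inf(a, m, HORIZON)) for m in (10, 100, 1000)])
```

For Example 1 the tail maxima from $m = 1000$ already lie within about $10^{-3}$ of $1$. For Example 4 the term $2^{-n}$ is lost to floating-point rounding once $n$ is large, so the tail maximum is exactly `1.0`.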
n→∞ n→∞

Proposition 1.3. 1. $\liminf_{n\to\infty} a_n = -\limsup_{n\to\infty}(-a_n)$;
2. $\liminf_{n\to\infty} a_n \le \limsup_{n\to\infty} a_n$;
3. $\lim_{n\to\infty} a_n$ exists if and only if $\liminf_{n\to\infty} a_n = \limsup_{n\to\infty} a_n$; then all are equal;
4. If $a_n\le b_n$ for all $n$, then $\limsup_{n\to\infty} a_n \le \limsup_{n\to\infty} b_n$;
5. $\limsup_{n\to\infty}(a_n+b_n)\le\limsup_{n\to\infty}a_n+\limsup_{n\to\infty}b_n$ (if all sums exist).

lim sup and lim inf are useful for avoiding epsilontics. For example, consider the Sandwich Rule: suppose that $a_n\le b_n\le c_n$ for all $n$ and $\lim a_n = \lim c_n$. Then
$$\begin{aligned}
\limsup b_n &\le \limsup c_n && \text{(Proposition 1.3(4))}\\
&= \lim c_n && \text{(Proposition 1.3(3))}\\
&= \lim a_n && \text{(assumption)}\\
&= \liminf a_n && \text{(Proposition 1.3(3))}\\
&\le \liminf b_n && \text{(Proposition 1.3(1) and (4))}\\
&\le \limsup b_n && \text{(Proposition 1.3(2))}.
\end{aligned}$$
Hence equality holds throughout, so $\lim b_n = \lim a_n$, by Proposition 1.3(3).

2. Lebesgue measure

A measure of length for (all) subsets of $\mathbb{R}$ should be a function $m : \mathcal{P}(\mathbb{R}) \to [0,\infty]$ satisfying:
(i) $m(\emptyset) = 0$, $m(\{x\}) = 0$;
(ii) $m(I) = b-a$ if $I$ is an interval with endpoints $a, b$, where $a<b$;
(iii) $m(A+x) = m(A)$;
(iv) $m(\alpha A) = |\alpha|\, m(A)$;
(v) $m(A)\le m(B)$ if $A\subseteq B$ ($m$ is monotone);
(vi) $m(A\cup B) = m(A)+m(B)$ if $A\cap B = \emptyset$ ($m$ is finitely additive);
(vi)′ $m\big(\bigcup_{n=1}^\infty A_n\big) = \sum_{n=1}^\infty m(A_n)$ if $A_n\cap A_k = \emptyset$ for $k\ne n$ ($m$ is countably additive);
(vii) $m\big(\bigcup_{n=1}^\infty A_n\big) = \lim_{n\to\infty} m(A_n)$ if $(A_n)$ is an increasing sequence of sets.

In fact, there is very considerable redundancy here. For example, (v), (vi) and (vii) follow from (i) and (vi)′.
The status of (vi)′ is perhaps debatable, but it is usually assumed. It is equivalent to (vi) and (vii) together, and (vii) is essential to have a Monotone Convergence Theorem.
Let us attempt to construct such an $m$. For $A\subseteq\mathbb{R}$, suppose that $A\subseteq\bigcup_{n=1}^\infty I_n$ for intervals $I_n$. Let $I_n' = I_n\setminus(I_1\cup\cdots\cup I_{n-1})$. The sets $I_n'$ are disjoint and have the same union as the $I_n$, so
$$m(A) \le m\Big(\bigcup_n I_n'\Big) = \sum_n m(I_n') \le \sum_n m(I_n).$$

So we attempt to define $m$ as follows. First, for any interval $I$ with endpoints $a$ and $b$, define
$$|I| = b-a.$$
For $A\subseteq\mathbb{R}$, we define the outer measure of $A$ to be
$$m^*(A) = \inf\Big\{\sum_{n=1}^\infty |I_n| : I_n \text{ intervals},\ A\subseteq\bigcup_{n=1}^\infty I_n\Big\}.$$
We can always take $I_n = [-n,n]$, so the infimum is not over the empty set (but $m^*(A)$ may be infinite). It makes no difference if we restrict $I_n$ to being closed intervals, or open intervals.
Proposition 2.1. 1. $m^*(\emptyset) = 0$, $m^*(\{x\}) = 0$;
2. $m^*(I) = |I| = b-a$ if $I$ is any interval with endpoints $a, b$;
3. $m^*(A+x) = m^*(A)$;
4. $m^*(\alpha A) = |\alpha|\, m^*(A)$;
5. $m^*(A)\le m^*(B)$ if $A\subseteq B$;
6. $m^*(A\cup B)\le m^*(A)+m^*(B)$;
6′. $m^*\big(\bigcup_{n=1}^\infty A_n\big)\le\sum_{n=1}^\infty m^*(A_n)$.

Proof. (1), (3), (4) and (5) are easy; (6) and (6)′ are moderately tricky exercises (see Q8, Sheet 1). Let us prove (2). We will do it for $I = [a,b]$; the other cases then follow using (1), (5) and (6).
Firstly, $m^*[a,b]\le b-a$, because we may take $I_1 = [a,b]$ and $I_n = \{0\}$ for $n\ge2$.
Now suppose that $[a,b]\subseteq\bigcup_{n=1}^\infty I_n$, where $I_n$ is an interval with endpoints $a_n, b_n$. We may assume that each $I_n$ is bounded (otherwise $\sum_n|I_n| = \infty$ and there is nothing to prove) and that each $I_n$ intersects $[a,b]$. Take $\varepsilon>0$. Let
$$J_n = \big(a_n - \varepsilon 2^{-n},\ b_n + \varepsilon 2^{-n}\big) =: (c_n, d_n).$$
Then $J_n$ is open and $[a,b]\subseteq\bigcup_{n=1}^\infty J_n$. By the Heine–Borel Theorem, $[a,b]$ is compact, so $[a,b]\subseteq\bigcup_{n=1}^N J_n$ for some $N$.
Now it is almost obvious that $b-a\le\sum_{n=1}^N|J_n|$. To prove it, enumerate $\{c_n, d_n : n = 1,\dots,N\}$ in increasing order:
$$x_1 < x_2 < \cdots < x_k.$$
Then $x_1 < a < b < x_k$. Each interval $(x_i, x_{i+1})$ is contained in some $J_n$: since each $J_n$ meets $[a,b]$, the union $J_1\cup\cdots\cup J_N$ is the interval $(x_1,x_k)$, and $(x_i,x_{i+1})$ contains no endpoint. Moreover $J_n$ has endpoints $c_n = x_{k_n}$ and $d_n = x_{\ell_n}$, say. Hence
$$b-a < x_k - x_1 = \sum_{i=1}^{k-1}(x_{i+1}-x_i) \le \sum_{n=1}^N\sum_{i=k_n}^{\ell_n-1}(x_{i+1}-x_i) = \sum_{n=1}^N|J_n|.$$
Now
$$\sum_{n=1}^\infty|I_n| \ge \sum_{n=1}^N|I_n| = \sum_{n=1}^N|J_n| - \sum_{n=1}^N 2^{-(n-1)}\varepsilon > b-a-2\varepsilon.$$
This holds for every $\varepsilon>0$, so $\sum_{n=1}^\infty|I_n|\ge b-a$. Hence $m^*[a,b]\ge b-a$. □
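The counting step in this proof can be sketched in code: sort the endpoints, check that each gap lies in a single $J_n$, then add up the gaps. The function name `sweep_bound` and the example intervals are illustrative. The function treats each pair $(c, d)$ as an open interval $J_n$. It checks the two facts the proof uses, namely $x_1 < a$, $b < x_k$, and that every gap $(x_i, x_{i+1})$ lies inside some $J_n$. It then returns the three quantities in the chain $b-a < x_k - x_1 \le \sum_n |J_n|$.

```python
from fractions import Fraction


def sweep_bound(a, b, intervals):
    """intervals: list of (c, d) with c < d, read as open intervals J_n = (c, d).

    Checks the facts used in the proof and returns (b - a, x_k - x_1, sum of |J_n|)."""
    xs = sorted({e for (c, d) in intervals for e in (c, d)})
    if not (xs[0] < a and b < xs[-1]):
        raise ValueError("need x_1 < a and b < x_k")
    for lo, hi in zip(xs, xs[1:]):
        # (lo, hi) contains no endpoint, so it lies in J_n exactly when c_n <= lo and hi <= d_n
        if not any(c <= lo and hi <= d for (c, d) in intervals):
            raise ValueError(f"the gap ({lo}, {hi}) is not contained in any J_n")
    total = sum(d - c for (c, d) in intervals)
    return b - a, xs[-1] - xs[0], total


J = [(Fraction(-1, 10), Fraction(1, 2)),
     (Fraction(2, 5), Fraction(4, 5)),
     (Fraction(7, 10), Fraction(11, 10))]
length, span, total = sweep_bound(0, 1, J)
print(length, span, total)
```

Overlapping intervals are counted more than once in `total`. That is why only the inequality $x_k - x_1 \le \sum_n|J_n|$, not equality, is available in general.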

A subset $E$ of $\mathbb{R}$ is said to be null if $m^*(E) = 0$.

Corollary 2.2. 1. Any subset of a null set is null.
2. If $E_n$ is a null set for $n = 1,2,\dots$, then $\bigcup_{n=1}^\infty E_n$ is null.
3. Any countable subset of $\mathbb{R}$ is null.

Proof. [Direct proof of (2)] Let $\varepsilon>0$. There exist intervals $I_{r,n}$ such that $E_n\subseteq\bigcup_{r=1}^\infty I_{r,n}$ and $\sum_r|I_{r,n}|<\varepsilon 2^{-n}$. Now $\{I_{r,n} : r,n = 1,2,\dots\}$ is a countable family of intervals covering $\bigcup_n E_n$, and $\sum_n\sum_r|I_{r,n}| < \sum_n\varepsilon 2^{-n} = \varepsilon$. Hence $m^*\big(\bigcup_n E_n\big) = 0$. □
Example 2.3. Let $C_0 = [0,1]$, $C_1 = [0,\tfrac13]\cup[\tfrac23,1]$, $C_2 = [0,\tfrac19]\cup[\tfrac29,\tfrac13]\cup[\tfrac23,\tfrac79]\cup[\tfrac89,1]$, etc. In general, $C_n$ is the union of $2^n$ disjoint closed intervals, each of length $3^{-n}$, and $C_{n+1}$ is obtained from $C_n$ by deleting the open middle third of each of those intervals. Let $C = \bigcap_{n=1}^\infty C_n$. Then $C$ is a closed subset of $\mathbb{R}$, known as the Cantor set.
Clearly, $m^*(C)\le 2^n 3^{-n}$ for each $n$. Letting $n\to\infty$ shows that $C$ is null.
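The construction of the sets $C_n$ can be carried out exactly with rational arithmetic. This sketch builds $C_n$ as a list of closed intervals and confirms that it has $2^n$ pieces of total length $(2/3)^n$, the bound used above. The helper names are illustrative.

```python
from fractions import Fraction


def cantor_step(intervals):
    """Delete the open middle third of each closed interval [a, b]."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))
        out.append((b - third, b))
    return out


def cantor_level(n):
    """The closed intervals making up C_n, with exact rational endpoints."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = cantor_step(intervals)
    return intervals


for n in range(6):
    level = cantor_level(n)
    print(n, len(level), sum(b - a for a, b in level))  # 2^n pieces, total length (2/3)^n
```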

Let $x\in[0,1]$. Then $x\in C$ if and only if $x$ has a ternary expansion $x = \sum_{n=1}^\infty a_n 3^{-n}$ where each $a_n = 0$ or $2$. Then a variation of Cantor’s proof shows that $C$ is uncountable.
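The ternary description can be turned into an exact membership test for $C_n$. The sketch uses the self-similarity $C_n = \tfrac13 C_{n-1}\cup\big(\tfrac23+\tfrac13 C_{n-1}\big)$, which follows from the middle-thirds construction. At each stage, a point in the left third contributes ternary digit 0 and is rescaled by $x\mapsto 3x$. A point in the right third contributes digit 2 and is rescaled by $x\mapsto 3x-2$. A point in an open middle third is rejected. The name `in_cantor_level` is illustrative.

```python
from fractions import Fraction


def in_cantor_level(x, n):
    """Is the rational number x in C_n?  Points outside [0, 1] are rejected at the end."""
    x = Fraction(x)
    for _ in range(n):
        if x <= Fraction(1, 3):
            x = 3 * x  # left third: ternary digit 0, rescale
        elif x >= Fraction(2, 3):
            x = 3 * x - 2  # right third: ternary digit 2, rescale
        else:
            return False  # x lies in a deleted open middle third
    return 0 <= x <= 1


print(in_cantor_level(Fraction(1, 4), 50))  # 1/4 = 0.020202..._3 lies in every C_n
print(in_cantor_level(Fraction(1, 2), 1))   # 1/2 lies in the first deleted middle third
```

Endpoints such as $1/3 = 0.0222\ldots_3$ are handled correctly because the comparisons use $\le$ and $\ge$, matching the closed intervals in $C_n$.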

A property $Q$ of real numbers is said to hold almost everywhere (a.e.) if the set of real numbers for which $Q$ does not hold is a null set. For example, $\chi_C = 0$ a.e., i.e., $\chi_C(x) = 0$ for almost all $x$, because $C$ is null.
Now let us consider the question whether m∗ is countably additive.
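Example 2.4. Let $A$ be a subset of $[0,1]$ with the following properties:
(i) $x,y\in A$, $x\ne y \implies x-y\notin\mathbb{Q}$;
(ii) for any $x\in[0,1]$, there exists $q\in\mathbb{Q}$ such that $x+q\in A$.
Then
$$[0,1]\subseteq\bigcup_{q\in\mathbb{Q}\cap[-1,1]}(A-q)\subseteq[-1,2].$$
Moreover, the sets $A-q$ are disjoint (as $q$ varies), and there are countably many of them. If $m^*$ is countably additive, then, using translation invariance (Proposition 2.1(3)),
$$1 = m^*[0,1] \le m^*\Big(\bigcup_{q\in\mathbb{Q}\cap[-1,1]}(A-q)\Big) = \sum_{q\in\mathbb{Q}\cap[-1,1]} m^*(A-q) = \sum_{q\in\mathbb{Q}\cap[-1,1]} m^*(A) \le m^*[-1,2] = 3.$$
This is impossible. The middle sum adds up countably infinitely many copies of $m^*(A)$, so it is $0$ if $m^*(A) = 0$ and $\infty$ otherwise.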

Thus m∗ is not countably additive, provided that such a set A exists. The additive
group R is partitioned into the cosets of its additive subgroup Q, and (i) and (ii) say
that A contains exactly one member of each coset of Q. The existence of such a set
follows from the Axiom of Choice, an axiom of set theory beyond the basic axioms.
This shows that it is impossible to prove that m∗ is countably additive without using
some weird axiom which contradicts the Axiom of Choice. On the other hand, it can
be proved that it is impossible to show that m∗ is not countably additive, using only
the basic axioms of set theory.
This is bad news, but it is not so very bad because the badness occurs only with sets
which cannot be explicitly described. So we can rescue things by restricting attention
to a class of sets with good behaviour.
A subset $E$ of $\mathbb{R}$ is said to be (Lebesgue) measurable if
$$m^*(A) = m^*(A\cap E) + m^*(A\setminus E)$$
for all subsets $A$ of $\mathbb{R}$. Here, $A\setminus E = A\cap(\mathbb{R}\setminus E)$; it is not assumed that $E\subseteq A$.¹
Let $\mathcal{M}_{\mathrm{Leb}}$ be the set of all Lebesgue measurable subsets of $\mathbb{R}$.
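Proposition 2.5. 1. If $E$ is null then $E\in\mathcal{M}_{\mathrm{Leb}}$.
2. If $I$ is any interval, then $I\in\mathcal{M}_{\mathrm{Leb}}$.
3. If $E\in\mathcal{M}_{\mathrm{Leb}}$, then $\mathbb{R}\setminus E\in\mathcal{M}_{\mathrm{Leb}}$.
4. If $E_n\in\mathcal{M}_{\mathrm{Leb}}$ for $n = 1,2,\dots$, then $\bigcup_{n=1}^\infty E_n\in\mathcal{M}_{\mathrm{Leb}}$.
5. If $E_n\in\mathcal{M}_{\mathrm{Leb}}$ for $n = 1,2,\dots$ and $E_n\cap E_k = \emptyset$ whenever $n\ne k$, then $m^*\big(\bigcup_{n=1}^\infty E_n\big) = \sum_{n=1}^\infty m^*(E_n)$.

The proofs are exercises (Q9, Sheet 1 for 1, 2, 4 and 5), or can be found in books such as Capinski & Kopp. (3) is almost trivial.
Note that since $\bigcap_{n=1}^\infty E_n = \mathbb{R}\setminus\big(\bigcup_{n=1}^\infty(\mathbb{R}\setminus E_n)\big)$, $\mathcal{M}_{\mathrm{Leb}}$ is also closed under (finite or countable) intersections. The set $A$ of Example 2.4 is not Lebesgue measurable.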
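Corollary 2.6. All open subsets, and all closed subsets of $\mathbb{R}$, are Lebesgue measurable.

Proof. Any open subset of $\mathbb{R}$ is a countable union of intervals (see the optional exercise: Sheet 1 Q8), so it is measurable by Proposition 2.5(2) and (4). Closed sets are complements of open sets, so they are measurable by Proposition 2.5(3). □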

For $E\in\mathcal{M}_{\mathrm{Leb}}$, we shall write $m(E)$ for $m^*(E)$. Then $m : \mathcal{M}_{\mathrm{Leb}}\to[0,\infty]$ is countably additive.
The definition of Lebesgue measurability we have chosen is designed for use in the proof that the Lebesgue measurable sets are closed under countable unions. Also, the Carathéodory condition generalises very nicely. It is an essential part of the Carathéodory extension theorem, which is a fundamental tool for producing measures (see B8.1: Probability Measure and Martingales, or Chapter 6.1 of Stein and Shakarchi).
¹ The definition we use is the same as that of Capinski & Kopp and Zhongming Qian’s lecture notes (2017); it is known as the Carathéodory criterion for measurability. Etheridge had a different definition, Stein & Shakarchi have another, Garling has another, and Priestley has yet another. All these definitions are equivalent, but this requires some work. We will see the equivalence of the definition above with that used by Stein and Shakarchi after Corollary 2.7 (relying on your work proving that the Lebesgue measurable sets form a σ-algebra in Proposition 2.5).

While the Carathéodory condition is designed for use in proofs, it is hard to visualise, being a condition quantified over all sets $A$. So we end with an alternative description of Lebesgue measurable sets:

Corollary 2.7. Let $E\subseteq\mathbb{R}$ be a Lebesgue measurable set. Then for $\varepsilon>0$, there exists an open set $U\supseteq E$ with $m(U\setminus E)<\varepsilon$.

Proof. Suppose first that $m(E)<\infty$. Then we can find countably many open intervals $(I_n)_{n=1}^\infty$ with $E\subseteq\bigcup_{n=1}^\infty I_n$ and $\sum_n|I_n|<m(E)+\varepsilon$. Let $U = \bigcup_{n=1}^\infty I_n$, so $E\subseteq U$ and $m(E)\le m(U)\le\sum_n|I_n|<m(E)+\varepsilon$. Applying the definition of measurability with $A = U$,² we get $m(U) = m(E)+m(U\setminus E)<m(E)+\varepsilon$. As $m(E)<\infty$, it follows that $m(U\setminus E)<\varepsilon$.
When $m(E) = \infty$, let $E_n = E\cap[-n,n]$. Then $E_n$ is measurable with $m(E_n)<\infty$, and $E = \bigcup_{n=1}^\infty E_n$. For $\varepsilon>0$, use the previous paragraph to find open sets $U_n\supseteq E_n$ with $m(U_n\setminus E_n)<2^{-n}\varepsilon$. For $U = \bigcup_{n=1}^\infty U_n$ we have $E\subseteq U$ and
$$m(U\setminus E)\le m\Big(\bigcup_{n=1}^\infty(U_n\setminus E_n)\Big)\le\sum_{n=1}^\infty m(U_n\setminus E_n)<\varepsilon. \qquad □$$

Notice that if $E\in\mathcal{M}_{\mathrm{Leb}}$, then by repeatedly taking $\varepsilon = 1/n$ in Corollary 2.7 we can find a set $G$ which is a countable intersection of open sets³ such that $E\subseteq G$ and $m(G\setminus E) = 0$. Conversely, suppose $E\subseteq\mathbb{R}$ is such that there is a $G_\delta$-set $G$ with $E\subseteq G$ and $m^*(G\setminus E) = 0$. Then $N = G\setminus E$ is null, so Lebesgue measurable, and as $G$ is Lebesgue measurable, $E = G\setminus N$ is Lebesgue measurable. It follows that $E\subseteq\mathbb{R}$ is measurable if and only if for all $\varepsilon>0$ there is some open $U\supseteq E$ with $m^*(U\setminus E)<\varepsilon$; this is the definition of measurability used by Stein and Shakarchi.
Finally, let’s end this section by discussing the case of $\mathbb{R}^n$ for $n\ge2$, which we will use in Fubini’s theorem later in the course. We define a rectangle $R$ in $\mathbb{R}^n$ to be a product of intervals $I_1\times I_2\times\cdots\times I_n$, and let $|R| = \prod_{i=1}^n|I_i|$. Then for $A\subseteq\mathbb{R}^n$, the Lebesgue outer measure of $A$ is defined by
$$m^*(A) = \inf\Big\{\sum_{s=1}^\infty|R_s| : A\subseteq\bigcup_{s=1}^\infty R_s,\ R_s \text{ rectangles}\Big\},$$
and we proceed as in $\mathbb{R}$ to define the Lebesgue measurable sets via the Carathéodory condition. Note that, for example, for any $E\subseteq\mathbb{R}$, $E\times\{0\}$ is null in $\mathbb{R}^2$.
While the process is the same, a couple of small details are a little trickier in $\mathbb{R}^n$ for $n\ge2$. First, it is more fiddly than in $\mathbb{R}$ to formalise the (geometrically clear) fact that $|R|\le\sum_{i=1}^K|R_i|$ whenever a bounded rectangle $R$ satisfies $R\subseteq\bigcup_{i=1}^K R_i$ for bounded rectangles $R_i$ in $\mathbb{R}^n$.⁴ Second, open subsets of $\mathbb{R}$ are countable disjoint unions of open intervals (see exercises), but open subsets of $\mathbb{R}^n$ are not countable disjoint unions of open rectangles (think about what happens on the boundary). Instead, you have to allow unions of rectangles which only intersect in their boundaries. Nevertheless, Corollary 2.7 and the characterisations in the paragraph following it hold equally well in $\mathbb{R}^n$.
Specialising to $\mathbb{R}^2$ for notational purposes, one can then show that if $E_1, E_2\subseteq\mathbb{R}$ are measurable, then so is $E_1\times E_2$. One finds $G_1\supseteq E_1$ and $G_2\supseteq E_2$ with each $G_i$ a countable intersection of open sets and $G_i\setminus E_i$ null. Then $G_1\times G_2$ is a countable intersection of open sets, and you can check that $(G_1\times G_2)\setminus(E_1\times E_2)$ is null in $\mathbb{R}^2$ (see Stein and Shakarchi Proposition 3.3.6).⁵

² Or, as $U$ and $E$ are both measurable, use finite additivity.
³ A countable intersection of open sets is known as a $G_\delta$-set.
⁴ Though in the proof that $m^*(R) = |R|$ for closed and bounded rectangles, compactness is still the key trick.

3. Measure spaces and measurable functions

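Let $\Omega$ be any set, and $\mathcal{F}\subseteq\mathcal{P}(\Omega)$. We say that $\mathcal{F}$ is a σ-algebra (or σ-field) on $\Omega$ if:
(i) $\emptyset\in\mathcal{F}$,
(ii) If $E\in\mathcal{F}$, then $\Omega\setminus E\in\mathcal{F}$,
(iii) If $E_n\in\mathcal{F}$ for $n = 1,2,\dots$, then $\bigcup_{n=1}^\infty E_n\in\mathcal{F}$.
Then $(\Omega,\mathcal{F})$ is a measurable space, and sets in $\mathcal{F}$ are $\mathcal{F}$-measurable. As before, $\bigcap_{n=1}^\infty E_n\in\mathcal{F}$ if $E_n\in\mathcal{F}$ for $n = 1,2,\dots$.
A measure on $(\Omega,\mathcal{F})$ is a function $\mu:\mathcal{F}\to[0,\infty]$ such that
(i) $\mu(\emptyset) = 0$,
(ii) $\mu\big(\bigcup_{n=1}^\infty E_n\big) = \sum_{n=1}^\infty\mu(E_n)$ whenever $E_n$ are disjoint sets in $\mathcal{F}$.
Then $(\Omega,\mathcal{F},\mu)$ is a measure space.
A measure $\mu$ is finite if $\mu(\Omega)<\infty$; $\mu$ is a probability measure if $\mu(\Omega) = 1$, and then $(\Omega,\mathcal{F},\mu)$ is called a probability space.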
Examples 3.1. 1. $(\mathbb{R},\mathcal{M}_{\mathrm{Leb}},m)$ is a measure space. Also, $([0,1],\mathcal{M}_{\mathrm{Leb}}|_{[0,1]},m)$ is a probability space, where $\mathcal{M}_{\mathrm{Leb}}|_{[0,1]}$ is the set of all Lebesgue measurable subsets of $[0,1]$.
2. Let $\Omega$ be any set, $\mathcal{F} = \mathcal{P}(\Omega)$ and $\mu(E) = |E|$ (the number of elements of $E$). This is a measure space; $\mu$ is counting measure on $\Omega$.
3. In probability theory, let $\Omega$ be a sample space of all possible outcomes, $\mathcal{F}$ be the collection of all events $E$, and $P(E)$ be the probability that event $E$ occurs. Then $P$ is a probability measure on $(\Omega,\mathcal{F})$, and $(\Omega,\mathcal{F},P)$ is a probability space.
4. Let $F:\mathbb{R}\to\mathbb{R}$ be an increasing function. Note that $F$ may be discontinuous, but its left and right limits exist at each point. We assume that $F(x) = \lim_{y\to x+}F(y)$ for all $x$ (without essential loss). Define
$$m_F(a,b] = F(b)-F(a),$$
$$m_F^*(E) = \inf\Big\{\sum_{n=1}^\infty m_F(J_n) : J_n = (a_n,b_n],\ E\subseteq\bigcup_{n=1}^\infty J_n\Big\}.$$
Then $m_F^*$ has similar properties to $m^*$, but one has to be aware that $m_F^*(a,b) = F(b-)-F(a)$, $m_F^*([a,b]) = F(b)-F(a-)$, and $m_F^*(\{x\}) = 0$ if and only if $F$ is continuous at $x$. One can then define a σ-algebra $\mathcal{M}_F$, containing all intervals, in the same way as $\mathcal{M}_{\mathrm{Leb}}$, and $m_F^*$ is a measure on $\mathcal{M}_F$, written $m_F$. This is the Lebesgue–Stieltjes measure associated with $F$.
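For a concrete feel for these formulas, take the hypothetical choice $F(x) = x + \lfloor x\rfloor$, which is not from the notes. This $F$ is increasing and right-continuous, jumps by 1 at each integer, and has left limit $F(x-) = x + \lceil x\rceil - 1$. The sketch simply evaluates the formulas for $m_F^*$ on intervals and points stated above. It does not construct $m_F$ from the outer-measure infimum, and all function names are illustrative.

```python
import math
from fractions import Fraction


def F(x):
    """Hypothetical increasing, right-continuous F with a jump of 1 at each integer."""
    return x + math.floor(x)


def F_left(x):
    """Left limit F(x-) for this particular F."""
    return x + math.ceil(x) - 1


def mF_half_open(a, b):  # m_F((a, b]) = F(b) - F(a)
    return F(b) - F(a)


def mF_open(a, b):  # m_F((a, b)) = F(b-) - F(a)
    return F_left(b) - F(a)


def mF_closed(a, b):  # m_F([a, b]) = F(b) - F(a-)
    return F(b) - F_left(a)


def mF_point(x):  # m_F({x}) = F(x) - F(x-), non-zero exactly at jumps of F
    return F(x) - F_left(x)


a, b = Fraction(0), Fraction(1)
print(mF_open(a, b), mF_half_open(a, b), mF_closed(a, b))  # length 1, plus 1 per included jump
```

The three values for $(0,1)$, $(0,1]$ and $[0,1]$ differ only by which of the jumps at $0$ and $1$ they include. This is the bookkeeping the formulas above encode.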
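Proposition 3.2. Let $(\Omega,\mathcal{F},\mu)$ be a measure space.
1. If $A,B\in\mathcal{F}$ and $A\subseteq B$, then $\mu(A)\le\mu(B)$.
2. If $A_n\in\mathcal{F}$ and $A_n\subseteq A_{n+1}$ for all $n$, then $\mu\big(\bigcup_n A_n\big) = \lim_{n\to\infty}\mu(A_n)$.
3. If $A_n\in\mathcal{F}$ and $A_n\supseteq A_{n+1}$ for all $n$ and $\mu(A_1)<\infty$, then $\mu\big(\bigcap_n A_n\big) = \lim_{n\to\infty}\mu(A_n)$.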
⁵ The converse is false: consider a non-measurable subset $A\subseteq[0,1]$; then $A\times\{0\}$ is null in $\mathbb{R}^2$, so measurable.

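Proof. (1) Since $B = A\cup(B\setminus A)$ (disjoint union), $\mu(B) = \mu(A)+\mu(B\setminus A)\ge\mu(A)$.
(2) Let $A_1' = A_1$ and $A_r' = A_r\setminus A_{r-1}$ for $r\ge2$. Then $A_n = \bigcup_{r=1}^n A_r'$ and $\bigcup_{n=1}^\infty A_n = \bigcup_{r=1}^\infty A_r'$ (disjoint unions), so
$$\mu\Big(\bigcup_{n=1}^\infty A_n\Big) = \sum_{r=1}^\infty\mu(A_r') = \lim_{n\to\infty}\sum_{r=1}^n\mu(A_r') = \lim_{n\to\infty}\mu(A_n).$$
(3) is an exercise. □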

We will be interested in the smallest σ-algebra containing a given collection of sets.
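Proposition 3.3. Let $\Omega$ be a set, and $\mathcal{B}\subseteq\mathcal{P}(\Omega)$. Then there is a unique σ-algebra $\mathcal{F}_{\mathcal{B}}$ on $\Omega$ satisfying:
(i) $\mathcal{F}_{\mathcal{B}}$ is a σ-algebra and $\mathcal{B}\subseteq\mathcal{F}_{\mathcal{B}}$,
(ii) if $\mathcal{F}$ is a σ-algebra on $\Omega$ and $\mathcal{B}\subseteq\mathcal{F}$, then $\mathcal{F}_{\mathcal{B}}\subseteq\mathcal{F}$.

Proof. We let $\mathcal{F}_{\mathcal{B}}$ be the intersection of all σ-algebras on $\Omega$ which contain $\mathcal{B}$. There is at least one such σ-algebra, namely $\mathcal{P}(\Omega)$, and you should check that the intersection is a σ-algebra, so that (i) holds. By definition, (ii) holds. Notice that if $\mathcal{F}_{\mathcal{B}}'$ is another such σ-algebra, then applying (ii) for $\mathcal{F}_{\mathcal{B}}$ we have $\mathcal{F}_{\mathcal{B}}\subseteq\mathcal{F}_{\mathcal{B}}'$. Reversing the roles, we can apply (ii) for $\mathcal{F}_{\mathcal{B}}'$, giving $\mathcal{F}_{\mathcal{B}}'\subseteq\mathcal{F}_{\mathcal{B}}$. □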
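On a finite set $\Omega$, the σ-algebra $\mathcal{F}_{\mathcal{B}}$ can be computed directly. The sketch below takes a different route from the proof, though on finite sets the two are equivalent. It starts from $\mathcal{B}\cup\{\emptyset,\Omega\}$ and keeps adding complements and pairwise unions until nothing changes. On a finite set every countable union is a finite union, so the result is a σ-algebra. It also lies inside every σ-algebra containing $\mathcal{B}$. The function name is illustrative, and the generators are assumed to be subsets of $\Omega$.

```python
from itertools import combinations


def generated_sigma_algebra(omega, generators):
    """Smallest sigma-algebra on the FINITE set omega containing every set in generators."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        new = {omega - s for s in sets}                   # complements
        new |= {s | t for s, t in combinations(sets, 2)}  # pairwise unions
        if new <= sets:                                   # nothing new: closed
            return sets
        sets |= new


sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {1, 2}])
for s in sorted(sigma, key=lambda s: (len(s), sorted(s))):
    print(set(s))
```

Here the generators split $\{1,2,3,4\}$ into the atoms $\{1\}$, $\{2\}$ and $\{3,4\}$, so the generated σ-algebra consists of the $2^3 = 8$ unions of atoms. On infinite sets no such finite iteration is available, which is the point of the caution about Borel sets below.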

The σ-algebra $\mathcal{M}_{\mathrm{Bor}}$ generated by the intervals (that is, $\mathcal{F}_{\mathcal{B}}$ where $\mathcal{B}$ is the class of all intervals) is the Borel σ-algebra on $\mathbb{R}$. It can be described as the class of all subsets of $\mathbb{R}$ which can be obtained from intervals in a countable number of steps. Each step is one of taking the complement of a set, taking a countable union of sets, or taking a countable intersection of sets. However, this has to be treated with caution, because it is not necessarily possible to obtain a given Borel set by performing the countable number of steps in a single sequence.

Proposition 3.4. 1. Let $\mathcal{B}$ be any one of the following classes of subsets of $\mathbb{R}$:
(i) all intervals;
(ii) all intervals of the form $(a,\infty)$;
(iii) all intervals of the form $[a,b]$;
(iv) all open sets.
Then $\mathcal{M}_{\mathrm{Bor}}$ is the smallest σ-algebra on $\mathbb{R}$ containing $\mathcal{B}$.
2. $\mathcal{M}_{\mathrm{Bor}}\ne\mathcal{M}_{\mathrm{Leb}}$.
3. If $E\in\mathcal{M}_{\mathrm{Leb}}$, there exist $A,B\in\mathcal{M}_{\mathrm{Bor}}$ such that $A\subseteq E\subseteq B$ and $B\setminus A$ is null (so $E\setminus A$ and $B\setminus E$ are null).

Proof. (1) is an exercise involving showing that each interval can be obtained from members of $\mathcal{B}$, and each member of $\mathcal{B}$ can be obtained from intervals (see Sheet 2, Q3 for (ii)). (2) and (3) are quite deep results; (2) will be discussed in the appendix; (3) is Theorem 2.28 in Capinski & Kopp. □

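Let $(\Omega,\mathcal{F})$ be a measurable space. A function $f:\Omega\to\mathbb{R}$ is $\mathcal{F}$-measurable if $f^{-1}(I)\in\mathcal{F}$ for each interval $I$.

Proposition 3.5. Let $\mathcal{B}$ be any one of the classes of subsets of $\mathbb{R}$ listed in Proposition 3.4. Let $f:\Omega\to\mathbb{R}$. Then the following are equivalent: $f$ is $\mathcal{F}$-measurable; $f^{-1}(G)\in\mathcal{F}$ for all $G\in\mathcal{M}_{\mathrm{Bor}}$; $f^{-1}(G)\in\mathcal{F}$ for all $G\in\mathcal{B}$.

Proof. It is easily verified that $f_*(\mathcal{F}) := \{G\subseteq\mathbb{R} : f^{-1}(G)\in\mathcal{F}\}$ is a σ-algebra on $\mathbb{R}$. Hence if $f_*(\mathcal{F})$ contains a class $\mathcal{B}$ which generates the σ-algebra $\mathcal{M}_{\mathrm{Bor}}$, then it contains $\mathcal{M}_{\mathrm{Bor}}$, and the result follows. □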

In this course, we shall usually take $(\Omega,\mathcal{F})$ to be $(\mathbb{R},\mathcal{M}_{\mathrm{Leb}})$ or minor variants, but much of this section will apply to the general case as well. For simplicity, we may refer to $\mathcal{M}_{\mathrm{Leb}}$-measurable functions simply as measurable functions, or as Lebesgue measurable functions. We shall also be interested in cases where $\Omega$ is an interval (or a Lebesgue measurable subset) and $\mathcal{F} = \mathcal{M}_{\mathrm{Leb}}|_\Omega = \{E\in\mathcal{M}_{\mathrm{Leb}} : E\subseteq\Omega\}$. However, $f:\Omega\to\mathbb{R}$ is $\mathcal{M}_{\mathrm{Leb}}|_\Omega$-measurable if and only if $\tilde f:\mathbb{R}\to\mathbb{R}$ is measurable, where $\tilde f(x) = f(x)$ for $x\in\Omega$, and $\tilde f(x) = 0$ otherwise. So we may state results just for functions defined on $\mathbb{R}$.
Recall from the Analysis courses that $f:\mathbb{R}\to\mathbb{R}$ is continuous if and only if $f^{-1}(G)$ is open for every open set (or open interval) $G$. By Proposition 3.5, $f$ is (Lebesgue) measurable if and only if $f^{-1}(G)$ is (Lebesgue) measurable for every open set (or open interval) $G$.

Examples 3.6. 1. Constant functions are measurable.
2. The characteristic function $\chi_A$ of a subset $A$ of $\mathbb{R}$ is measurable if and only if $A$ is a measurable set. In particular, if $A$ is as in Example 2.4, then $\chi_A$ is not (Lebesgue) measurable.
3. Continuous functions $f:\mathbb{R}\to\mathbb{R}$ are measurable.
4. Monotone functions $f:\mathbb{R}\to\mathbb{R}$ are measurable.
5. If $f$ is continuous a.e., then $f$ is measurable.
6. If $f:\mathbb{R}\to\mathbb{R}$ is (Lebesgue) measurable and $g = f$ a.e., then $g$ is (Lebesgue) measurable.
7. In probability theory, measurable functions are called random variables.

It follows from the definition of measurable functions and Example 3.6(2) that
the existence of a non-measurable function is equivalent to the existence of a non-
measurable set. So their existence depends on the Axiom of Choice. Thus, we have the
following:
Fact of Life. ALL FUNCTIONS f : R → R THAT CAN BE EXPLICITLY DEFINED
ARE LEBESGUE MEASURABLE.
This is not exactly a mathematical theorem—it becomes one if one interprets “ex-
plicitly defined” in the right technical way. It is a true statement about the real world: a
non-measurable function involves some non-explicit choice process. Priestley compares
the existence of non-measurable functions to the existence of yetis.
Nevertheless, measurability is a real issue in some more advanced mathematics,
because:

(a) One may be interested not in Lebesgue measurability of functions $f$ on $\mathbb{R}$, but in measurability on some other measurable space $(\Omega,\mathcal{F})$. This occurs frequently in time-dependent probability theory, where $\mathcal{F}_t$ is the class of all events depending only on past history up to time $t$, not on the future (cf. Part B courses on martingales and stochastic calculus).

(b) One may be interested in functions f which are not real-valued, but take values in
an infinite-dimensional space. Then measurability is a real issue in many areas of
analysis, although you probably won’t see this in your undergraduate course.
So it is useful to accumulate general results about measurable functions, even if we only
state them for functions f : (R, MLeb ) → R.
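Proposition 3.7. Let $f$ and $g$ be measurable functions from $\mathbb{R}$ to $\mathbb{R}$. The following functions are measurable:
$$f+g,\quad fg,\quad \max(f,g),\quad h\circ f \text{ for any continuous function } h.$$
For example, $\alpha f$ is measurable, where $\alpha\in\mathbb{R}$.

Proof. For example,
$$(f+g)^{-1}(a,\infty) = \bigcup_{q\in\mathbb{Q}} f^{-1}(q,\infty)\cap g^{-1}(a-q,\infty),$$
since $f(x)+g(x)>a$ if and only if there is a rational $q$ with $f(x)>q>a-g(x)$. This set is measurable for every $a\in\mathbb{R}$, so $f+g$ is measurable by Propositions 3.4(1)(ii) and 3.5.
If $G$ is open in $\mathbb{R}$, then $h^{-1}(G)$ is open. Since $f$ is measurable, $f^{-1}(h^{-1}(G))$ is measurable, i.e., $(h\circ f)^{-1}(G)$ is measurable. □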

In fact, it suffices in Proposition 3.7 that h should be Borel measurable.


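Now we want to consider limits and suprema of sequences of functions $(f_n)$. Even if each $f_n$ is real-valued, the resulting functions may take the values $\infty$ and $-\infty$. A function $f:\mathbb{R}\to[-\infty,\infty]$ is measurable if $f^{-1}(a,\infty]\in\mathcal{M}_{\mathrm{Leb}}$ for all $a\in\mathbb{R}$. Equivalently, $f^{-1}(B)\in\mathcal{M}_{\mathrm{Leb}}$ for all $B\in\mathcal{M}_{\mathrm{Bor}}$ and $f^{-1}(\{\infty\})\in\mathcal{M}_{\mathrm{Leb}}$. Equivalently again, $\arctan\circ f$ is measurable, where $\arctan:[-\infty,\infty]\to[-\pi/2,\pi/2]$ is the inverse tan function (with $\arctan(\pm\infty) = \pm\pi/2$).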
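Proposition 3.8. Let $(f_n)$ be a sequence of measurable functions from $\mathbb{R}$ to $[-\infty,\infty]$. Then the following functions are measurable:
$$\sup_n f_n,\quad \inf_n f_n,\quad \limsup_{n\to\infty} f_n,\quad \liminf_{n\to\infty} f_n.$$
Hence, if $f(x) = \lim_{n\to\infty}f_n(x)$ a.e., then $f$ is measurable.

Proof. First,
$$\Big(\sup_n f_n\Big)^{-1}(a,\infty] = \bigcup_n f_n^{-1}(a,\infty]\in\mathcal{M}_{\mathrm{Leb}}.$$
Then
$$\inf_n f_n = -\sup_n(-f_n),\qquad \limsup_{n\to\infty} f_n = \inf_m g_m, \text{ where } g_m = \sup_{n\ge m} f_n,$$
and $\liminf_{n\to\infty}f_n = -\limsup_{n\to\infty}(-f_n)$. Finally, if $f = \lim_{n\to\infty}f_n$ a.e., then $f = \limsup_{n\to\infty}f_n$ a.e., so $f$ is measurable (as in Example 3.6(6)). □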

A function $\varphi:\mathbb{R}\to\mathbb{R}$ is simple if it is measurable and it takes only finitely many real values. So $\chi_E$ is simple if $E\in\mathcal{M}_{\mathrm{Leb}}$. If $\varphi,\psi$ are simple, then so are $\varphi+\psi$, $\varphi\psi$, $\alpha\varphi$, $\max(\varphi,\psi)$, and $h\circ\varphi$ for any function $h$.
Any function of the form $\sum_{j=1}^n\beta_j\chi_{E_j}$, where $\beta_j\in\mathbb{R}$ and $E_j\in\mathcal{M}_{\mathrm{Leb}}$, is simple. On the other hand, if $\varphi$ is simple with non-zero values $\alpha_1,\dots,\alpha_k$, and $B_i = \varphi^{-1}(\{\alpha_i\})$, then $B_i$ is measurable, and
$$(*)\qquad \varphi = \sum_{i=1}^k\alpha_i\chi_{B_i}.$$
In this form, we have
(i) the $\alpha_i$ are distinct and non-zero,
(ii) the $B_i$ are disjoint.
If these additional properties hold, then $(*)$ is unique (up to reordering of the terms). We shall then say that $\varphi$ is in standard, or canonical, form. For example, the standard form of $\chi_{(0,2)}+\chi_{[1,3]}$ is $1\chi_{(0,1)\cup[2,3]}+2\chi_{[1,2)}$.
In defining simple functions, some authors insist that the sets Bi , corresponding
to non-zero αi , must be bounded [Etheridge] or of finite measure [Stein & Shakarchi].
[Garling and Priestley avoid introducing simple functions.]
Examples 3.9. 1. Any step function is a simple function; for a step function, the sets $B_i$ in the standard representation must be finite unions of bounded intervals (or single points).
2. The function $\chi_{\mathbb{Q}\cap[0,1]}$ is a simple function but it is not a step function.
Proposition 3.10. Let $f : \mathbb{R} \to [0, \infty]$ be measurable. There is an increasing sequence $(\varphi_n)$ of non-negative simple functions $\varphi_n$ such that
$$f(x) = \lim_{n\to\infty} \varphi_n(x)$$
for all $x \in \mathbb{R}$.

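Proof. For $n = 1, 2, \dots$ and $k = 0, 1, 2, \dots, 4^n - 1$, let
$$B_{k,n} = \{x : k2^{-n} \le f(x) < (k+1)2^{-n}\}.$$
Let
$$\varphi_n(x) = \begin{cases} k2^{-n} & \text{if } x \in B_{k,n} \text{ for some (unique) } k, \\ 2^n & \text{if } f(x) \ge 2^n. \end{cases}$$
Then $\varphi_n \le \varphi_{n+1}$, $\varphi_n \le f$, $\varphi_n(x) > f(x) - 2^{-n}$ for all sufficiently large $n$ if $f(x) < \infty$, and $\varphi_n(x) = 2^n$ for all $n$ if $f(x) = \infty$. □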

Notice here that the approximating simple functions are constructed by taking
horizontal strips, unlike Prelims where vertical strips were used.
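The construction in the proof of Proposition 3.10 depends only on the value $y = f(x)$, so it can be checked pointwise. The following Python sketch (the function name `phi` is our own, not from the notes) evaluates $\varphi_n(x)$ from $y = f(x)$ and checks, at a few sample values, the three properties used in the proof.

```python
import math

def phi(n, y):
    """Value of the n-th approximant phi_n at a point where f takes the value y.

    y is a non-negative float or math.inf.  If y >= 2**n the point is in no
    strip B_{k,n}, so phi_n = 2**n; otherwise it lies in the horizontal strip
    B_{k,n} with k = floor(y * 2**n), and phi_n = k * 2**-n.
    """
    if y >= 2 ** n:
        return float(2 ** n)
    k = math.floor(y * 2 ** n)  # index of the strip containing y
    return k / 2 ** n

# Sample values f(x), including an infinite value.
samples = [0.0, 0.3, math.pi, 7.99, math.inf]
for y in samples:
    print(y, [phi(n, y) for n in range(1, 6)])
```

Each row is non-decreasing, never exceeds $y$, and once $2^n > y$ it is within $2^{-n}$ of $y$; for $y = \infty$ it is simply $2^n$.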
Theorem 3.11. A function $f : \mathbb{R} \to \mathbb{R}$ is measurable if and only if there is a sequence of step functions $\psi_n$ such that $f = \lim_n \psi_n$ a.e.

Proof. Stein & Shakarchi, Theorem 4.3, p.32. □



4. The Lebesgue integral: non-negative functions

We now start to define our notion of the integral. In contrast to Riemann’s theory
which simultaneously considers upper and lower approximations to the area under the
curve, in Lebesgue’s theory we approximate area that lies above the x-axis from below,
and area below the x-axis from above. This leads us to split any function into its positive
and negative parts, and integrate these separately. In this section, we’ll develop the
theory of integration for non-negative functions, and turn to the general case in the
following section.
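For a non-negative simple function $\varphi$ with standard form $\sum_{i=1}^k \alpha_i\chi_{B_i}$ (so $\alpha_i > 0$), the integral of $\varphi$ is defined to be:
$$\int_{\mathbb{R}} \varphi = \int_{-\infty}^\infty \varphi(x)\,dx = \sum_{i=1}^k \alpha_i m(B_i).$$
Note that $\int \varphi < \infty$ if and only if $m(B_i) < \infty$ for each $i$.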
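Proposition 4.1. Let $\varphi, \psi$ be non-negative simple functions, $\alpha \in [0, \infty)$.
1. If $\varphi = \sum_{j=1}^n \beta_j\chi_{E_j}$, where $\beta_j \ge 0$ and the $E_j$ are measurable (but not necessarily in standard form), then $\int \varphi = \sum_j \beta_j m(E_j)$.
2. $\int (\varphi + \psi) = \int \varphi + \int \psi$, $\int \alpha\varphi = \alpha\int \varphi$.
3. If $\varphi \le \psi$ then $\int \varphi \le \int \psi$.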

The first statement of Proposition 4.1 is not completely obvious, but fortunately it is true! [Capinski & Kopp define $\int \varphi$ to be $\sum_j \beta_j m(E_j)$, ignoring the question whether this is well-defined.]
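Proposition 4.1(1) says that the integral of a simple function can be computed from any representation $\sum_j \beta_j\chi_{E_j}$, not only from the standard form. The Python sketch below illustrates this in the special case (our simplifying assumption) where every $E_j$ is a bounded interval. Endpoints are ignored, so the standard form is recovered only up to finitely many points, which does not change any integral. The helper names are hypothetical.

```python
from fractions import Fraction

def canonical_form(terms):
    """Standard form of sum(beta * chi_(a, b)) for terms given as (beta, a, b).

    Returns {alpha: [intervals]} with the alpha distinct and non-zero and the
    intervals disjoint; endpoints (a finite, hence null, set) are ignored.
    """
    points = sorted({p for _, a, b in terms for p in (a, b)})
    form = {}
    for left, right in zip(points, points[1:]):
        mid = Fraction(left + right, 2)  # the function is constant on (left, right)
        value = sum(beta for beta, a, b in terms if a < mid < b)
        if value != 0:
            form.setdefault(value, []).append((left, right))
    return form

def integral_from_terms(terms):
    # Proposition 4.1(1): sum of beta_j * m(E_j) for an arbitrary representation
    return sum(beta * (b - a) for beta, a, b in terms)

def integral_from_form(form):
    # The definition: sum of alpha_i * m(B_i) for the standard form
    return sum(alpha * sum(r - l for l, r in pieces) for alpha, pieces in form.items())

terms = [(1, 0, 2), (1, 1, 3)]  # chi_(0,2) + chi_[1,3], endpoints ignored
form = canonical_form(terms)
print(form)                                                  # {1: [(0, 1), (2, 3)], 2: [(1, 2)]}
print(integral_from_terms(terms), integral_from_form(form))  # 4 4
```

This matches the standard form $1\chi_{(0,1)\cup[2,3]} + 2\chi_{[1,2)}$ given above, up to endpoints.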
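For a non-negative measurable function $f : \mathbb{R} \to [0, \infty]$, we define the integral of $f$ to be
$$\int f = \sup\Big\{\int \varphi : \varphi \text{ simple}, \ 0 \le \varphi \le f\Big\}.$$
For a measurable subset $E$ of $\mathbb{R}$, we define
$$\int_E f = \int_{\mathbb{R}} f\chi_E.$$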
For a measurable function $f : E \to [0, \infty)$, we define $\int_E f = \int_{\mathbb{R}} \tilde f$, where $\tilde f$ agrees with $f$ on $E$ and is $0$ on $\mathbb{R} \setminus E$.
Notice that for non-negative measurable functions $f : E \to \mathbb{R}$ we allow $\int_E f$ to take the value $\infty$, so the integral is always defined, but it may not be finite.⁶ We say that $f$ is integrable over $E$ if $\int_E f < \infty$.
This definition of integral corresponds to the lower integral in Prelims, but with
simple functions replacing step functions. If the Monotone Convergence Theorem is to
be true, then Proposition 3.10 shows that the integral must equal this supremum, but
it is still necessary to show that our definition has good properties.
⁶You should compare this with the next section: for functions taking both positive and negative values, we need the function to be integrable before we define $\int f$.

It is clear from the definition of integral that

(i) $\int \alpha f = \alpha \int f$ $(\alpha \ge 0)$;
(ii) if $f \le g$, then $\int f \le \int g$.

The first things to establish are

(iii) $\int (f + g) = \int f + \int g$,
(iv) the Monotone Convergence Theorem.
Theorem 4.2. [Monotone Convergence Theorem] If $(f_n)$ is an increasing sequence of non-negative measurable functions and $f = \lim_{n\to\infty} f_n$, then $\int f = \lim_{n\to\infty} \int f_n$.

This is the first of our three big convergence theorems. We'll give a slight strengthening of the theorem as Theorem 6.1.
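Proof. Since $f_n \le f$, it is immediate that $\sup_n \int f_n \le \int f$.
For the reverse inequality, we consider a simple function $\varphi$ such that $0 \le \varphi \le f$. We have to show that $\int \varphi \le \lim_{n\to\infty} \int f_n$. It then follows from the definition of $\int f$ that $\int f \le \lim_{n\to\infty} \int f_n$.
Take $\alpha \in (0, 1)$, and let
$$B_n = \{x : f_n(x) \ge \alpha\varphi(x)\}.$$
Then $B_n$ is measurable (since $f_n - \alpha\varphi$ is measurable), $B_n \subseteq B_{n+1}$ and $\bigcup_{n=1}^\infty B_n = \mathbb{R}$ (for each $x$, either $\varphi(x) = 0$, so $x \in B_1$, or $f(x) \ge \varphi(x) > \alpha\varphi(x)$, so $f_n(x) \ge \alpha\varphi(x)$ for all large $n$). Since $\alpha\varphi\chi_{B_n} \le f_n\chi_{B_n} \le f_n$,
$$(*) \qquad \alpha \int_{B_n} \varphi \le \int_{\mathbb{R}} f_n.$$
If $\varphi = \sum_{i=1}^k \beta_i \chi_{E_i}$, then
$$\int_{B_n} \varphi = \sum_{i=1}^k \beta_i m(E_i \cap B_n) \to \sum_{i=1}^k \beta_i m(E_i) = \int_{\mathbb{R}} \varphi$$
as $n \to \infty$, by Proposition 3.2(2). Taking limits in (*),
$$\alpha \int_{\mathbb{R}} \varphi \le \lim_{n\to\infty} \int_{\mathbb{R}} f_n.$$
Letting $\alpha \to 1-$ gives the required inequality. □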

We have not specified the range of integration. It could be $\mathbb{R}$, or it could be a fixed interval $I$. We can also apply the MCT when the range of integration depends on $n$, by taking $f_n$ to be $0$ elsewhere. See Example 4.8.
Corollary 4.3. [Baby MCT] Let $f$ be a non-negative measurable function, $(E_n)$ be an increasing sequence of measurable sets, and $E = \bigcup_{n=1}^\infty E_n$. Then $\int_E f = \sup_n \int_{E_n} f = \lim_{n\to\infty} \int_{E_n} f$, and so $f$ is integrable over $E$ if $\sup_n \int_{E_n} f < \infty$.

Proof. Apply Theorem 4.2 with $f_n = f\chi_{E_n}$, noting that $\chi_{E_n} \le \chi_{E_{n+1}}$ and $f \ge 0$, so $f_n \le f_{n+1}$, and $\chi_E(x) = \lim_{n\to\infty} \chi_{E_n}(x)$. □

The baby version of the MCT will be used a lot: it allows us to compute integrals over open or unbounded intervals as limits of integrals over closed bounded intervals, which can be evaluated with the fundamental theorem of calculus and the rest of the theory developed in Prelims.
Corollary 4.4. For non-negative measurable functions $f$ and $g$,
$$\int (f + g) = \int f + \int g.$$

Proof. Let $(\varphi_n)$ and $(\psi_n)$ be increasing sequences of non-negative simple functions, converging pointwise to $f$ and $g$ respectively (Proposition 3.10). Then $(\varphi_n + \psi_n)$ is an increasing sequence, converging to $f + g$. By MCT and Proposition 4.1(2),
$$\int (f+g) = \lim_{n\to\infty} \int (\varphi_n + \psi_n) = \lim_{n\to\infty} \Big(\int \varphi_n + \int \psi_n\Big) = \lim_{n\to\infty} \int \varphi_n + \lim_{n\to\infty} \int \psi_n = \int f + \int g. \qquad \square$$


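Corollary 4.5. [MCT for Series] Let $f_n$ be non-negative measurable functions and $f = \sum_{n=1}^\infty f_n$. Then $\int f = \sum_{n=1}^\infty \int f_n$. In particular, $f$ is integrable if and only if $\sum_n \int f_n < \infty$.

Proof. Let $g_n = \sum_{r=1}^n f_r$, so that $(g_n)$ is increasing with limit $f$, and $\int g_n = \sum_{r=1}^n \int f_r$ by Corollary 4.4. Now apply MCT. □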

In order to give any interesting examples, we need to know that the integral just defined agrees with the Riemann integral, initially for continuous functions on closed bounded intervals. We will come back to this in Section 5, but record the result for continuous functions here, for use in the next few examples.⁷
Corollary 4.6. Let $f : [a, b] \to [0, \infty)$ be continuous. Then the Lebesgue integral $\int^L_{[a,b]} f$ as defined above equals the Riemann integral $\int^R_{[a,b]} f$ as defined in first-year Integration.
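Footnote 7 sketches why Corollary 4.6 holds: the lower step functions for repeatedly bisected partitions increase to $f$, and their integrals increase to the Riemann integral. Below is a minimal Python sketch of the integrals of these step functions. To keep it simple we assume (this is our restriction, not the notes') that $f$ is increasing, so the infimum on each piece is attained at its left endpoint.

```python
def lower_step_integral(f, a, b, n):
    """Integral of the lower step function for the partition of [a, b]
    into 2**n equal pieces, obtained by bisecting n times.

    Assumes f is continuous and increasing, so inf of f on each piece
    is its value at the left endpoint.
    """
    pieces = 2 ** n
    h = (b - a) / pieces
    return sum(f(a + j * h) * h for j in range(pieces))

f = lambda x: x * x  # continuous, non-negative and increasing on [0, 1]
values = [lower_step_integral(f, 0.0, 1.0, n) for n in range(1, 13)]
print(values[:4], values[-1])  # increasing towards the Riemann integral 1/3
```

Because each bisection refines the previous partition, the lower step functions increase, as the MCT argument in the footnote requires.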
Example 4.7. Consider $f(x) = (1 - x)^{-1/2}$ on $(0, 1)$. By Baby MCT (Corollary 4.3), Corollary 4.6 and FTC (from Prelims),
$$\int_0^1 (1 - x)^{-1/2}\,dx = \lim_{n\to\infty} \int_0^{1 - \frac1n} (1 - x)^{-1/2}\,dx = \lim_{n\to\infty} 2(1 - n^{-1/2}) = 2.$$
For $0 \le x < 1$, the Binomial Theorem with exponent $-1/2$ or Taylor's Theorem in complex analysis gives
$$(1 - x)^{-1/2} = \sum_{n=0}^\infty \frac{(2n)!}{4^n (n!)^2} x^n.$$
By Corollary 4.5 and FTC,
$$\int_0^1 (1 - x)^{-1/2}\,dx = \sum_{n=0}^\infty \frac{(2n)!}{4^n (n!)^2} \int_0^1 x^n\,dx = \sum_{n=0}^\infty \frac{(2n)!}{4^n\, n!\,(n+1)!}.$$

⁷This is easier than the result in Section 5, as continuous functions are automatically measurable. One can prove it as follows. Choose the sequence of partitions obtained by repeatedly bisecting $[a, b]$, and take the step functions associated to the lower Riemann sums for these partitions. This gives an increasing sequence $(\varphi_n)$ of step functions such that $\lim_{n\to\infty} \varphi_n(x) = f(x)$ for all $x \in [a, b]$ (continuity ensures convergence everywhere) and $\lim_{n\to\infty} \int_a^b \varphi_n = \int^R_{[a,b]} f$. By the MCT (Theorem 4.2), $\lim_{n\to\infty} \int_a^b \varphi_n = \int^L_{[a,b]} f$. As you'll see in Section 5, for a general Riemann integrable $f$, we can arrange for the $\varphi_n$ to converge to $f$ almost everywhere, and then the same result holds.

The fact that the series above converges to $2$ can be obtained directly from the Binomial Expansion of $(1 - x)^{1/2}$, via Abel's continuity theorem (A2 lecture notes MT 2019, Theorem 13.24).
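A quick numerical sanity check of Example 4.7 (a sketch, not part of the argument): the partial sums of $\sum_n \frac{(2n)!}{4^n n!(n+1)!}$ increase towards $2$, but slowly. To avoid huge factorials, the sketch uses the ratio of consecutive terms, $t_{n+1}/t_n = (2n+1)/(2n+4)$, and cross-checks the first terms against `math.comb`.

```python
import math

def partial_sums(N):
    """Partial sums S_0, ..., S_N of sum_n (2n)! / (4**n * n! * (n+1)!)."""
    term, total, sums = 1.0, 0.0, []
    for n in range(N + 1):
        total += term                       # add t_n
        sums.append(total)
        term *= (2 * n + 1) / (2 * n + 4)   # t_{n+1} = t_n * (2n+1) / (2n+4)
    return sums

s = partial_sums(10_000)
# Cross-check of the first 30 terms using binomial coefficients directly.
direct = sum(math.comb(2 * n, n) / ((n + 1) * 4 ** n) for n in range(30))
print(s[29], direct)
print(s[10], s[1000], s[10_000])  # increasing towards 2, slowly

# For comparison, the integrals over [0, 1 - 1/n] from the FTC computation.
print([2 * (1 - n ** -0.5) for n in (4, 100, 10_000)])
```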
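Example 4.8. Consider $\displaystyle\int_0^{n\pi} \cos\Big(\frac{x}{2n}\Big) x^2 e^{-x^3}\,dx$. It is not obvious how to evaluate the integral for a given value of $n$, but we can use the MCT to find the limit of the integrals, as $n \to \infty$, as follows.
Let
$$f_n(x) = \cos\Big(\frac{x}{2n}\Big) x^2 e^{-x^3} \chi_{[0,n\pi]}(x) = \begin{cases} \cos\big(\frac{x}{2n}\big) x^2 e^{-x^3} & \text{if } 0 \le x \le n\pi, \\ 0 & \text{otherwise.} \end{cases}$$
Fix $n$ for a moment. We wish to show that $f_n(x) \le f_{n+1}(x)$ for all $x$. If $0 \le x \le n\pi$, then $0 \le \frac{x}{2(n+1)} \le \frac{x}{2n} \le \frac{\pi}{2}$ and $\cos$ is decreasing on $[0, \pi/2]$, so $0 \le \cos\frac{x}{2n} \le \cos\frac{x}{2(n+1)}$, and hence $f_n(x) \le f_{n+1}(x)$. If $n\pi < x \le (n+1)\pi$, then $f_n(x) = 0 \le f_{n+1}(x)$. If $x > (n+1)\pi$ (or if $x < 0$), then $f_n(x) = 0 = f_{n+1}(x)$. Thus we have established our claim that $f_n(x) \le f_{n+1}(x)$ for all $x$.
Noting that $f_n(x) \to f(x) = x^2 e^{-x^3}$ for all $x \ge 0$, the MCT gives
$$\lim_{n\to\infty} \int_0^{n\pi} \cos\Big(\frac{x}{2n}\Big) x^2 e^{-x^3}\,dx = \lim_{n\to\infty} \int_0^\infty f_n(x)\,dx = \int_0^\infty x^2 e^{-x^3}\,dx.$$
This can be computed using the Baby MCT at the first step below, and the FTC at the second:
$$\int_0^\infty x^2 e^{-x^3}\,dx = \lim_{n\to\infty} \int_0^n x^2 e^{-x^3}\,dx = \lim_{n\to\infty} \frac{1 - e^{-n^3}}{3} = \frac{1}{3}.$$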
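The MCT argument in Example 4.8 predicts that the integrals increase to $\frac13$. The sketch below checks this numerically with a composite Simpson rule; the choice of quadrature and the number of subintervals are our assumptions, and nothing here is a proof.

```python
import math

def simpson(g, a, b, m=4000):
    """Composite Simpson rule for g on [a, b]; m must be even."""
    h = (b - a) / m
    total = g(a) + g(b)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3

def integral_n(n):
    # The n-th integral of Example 4.8, over [0, n*pi]
    integrand = lambda x: math.cos(x / (2 * n)) * x * x * math.exp(-x ** 3)
    return simpson(integrand, 0.0, n * math.pi)

values = [integral_n(n) for n in (1, 2, 5, 20)]
print(values, 1 / 3)  # increasing, and approaching 1/3 from below
```

For large $x$, `math.exp(-x ** 3)` underflows to `0.0`, which is harmless here.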

5. The Lebesgue integral: general functions

Now we turn to integrability of functions which are not necessarily non-negative.


Let $f : \mathbb{R} \to [-\infty, \infty]$ be measurable. Let
$$f^+ = \max(f, 0), \qquad f^- = \max(-f, 0).$$
Note that $f^+$ and $f^-$ are measurable and non-negative, and
$$f = f^+ - f^-, \qquad |f| = f^+ + f^-.$$
We say that $f$ is integrable if $f$ is measurable and $\int f^+$ and $\int f^-$ are both finite. Notice that this requirement prevents any problems with $\infty - \infty$.⁸ Then the integral of $f$ is
$$\int f = \int f^+ - \int f^-.$$
Moreover, $f$ is integrable over a measurable subset $E$ if $f\chi_E$ is integrable. If $f : E \to [-\infty, \infty]$, then $f$ is integrable over $E$ if $\tilde f$ is integrable over $\mathbb{R}$. We write $f \in L^1(E)$ to mean that $f$ is integrable over $E$.
⁸It is possible to make sense of the quantity $\int f$ if one only has that one of $\int f^+$ or $\int f^-$ is finite, but this notion would not be well behaved; for example, we definitely want the sum of two integrable functions to be integrable.

Proposition 5.1. 1. If $f$ is integrable, then $|f|$ is integrable.
2. If $f$ is measurable and $|f|$ is integrable, then $f$ is integrable.
3. (Comparison Test) If $f$ is measurable and $|f| \le g$ for some integrable function $g$, then $f$ is integrable. If $|f| \ge g \ge 0$ for some measurable function $g$ which is not integrable, then $f$ is not integrable.
4. If $f, g$ are both integrable and $f + g$ is defined, then $f + g$ is integrable and $\int (f + g) = \int f + \int g$. For $\alpha \in \mathbb{R}$, $\alpha f$ is integrable and $\int \alpha f = \alpha \int f$. If $f \le g$, then $\int f \le \int g$.
5. If $f$ is integrable and $g = f$ a.e., then $g$ is integrable and $\int g = \int f$.
6. If $f$ is integrable then $f(x) \in \mathbb{R}$ a.e.
7. If $f$ is integrable and $\int |f| = 0$ then $f(x) = 0$ a.e.
8. If $f$ is integrable over a measurable set $E$ and $(E_n)$ is an increasing sequence of measurable sets with $\bigcup_{n=1}^\infty E_n = E$, then $\int_E f = \lim_{n\to\infty} \int_{E_n} f$.

Proof. (1) and (2) follow from $\int f^\pm \le \int |f| = \int f^+ + \int f^-$. (3) follows from $|f| \le g \implies \int |f| \le \int g$. (4) follows from $(f + g)^\pm \le f^\pm + g^\pm$ and $(f + g)^+ + f^- + g^- = (f + g)^- + f^+ + g^+$. (5): Since $|g - f| = 0$ a.e., any simple function $\varphi$ with $0 \le \varphi \le |g - f|$ is a.e. $0$, so its integral is $0$. Hence $\int |g - f| = 0$. (6), (7): Exercise, Sheet 2 Q9. (8): Apply Baby MCT to $f^+$ and $f^-$. □

By (5), changing a function on a null set does not affect integrability. So if we have a function $f$ defined a.e., we can talk about it being integrable by considering any extension of $f$, for example the extension by $0$. Also, integrability over $[a, b]$ is the same as integrability over $(a, b)$.
The following are corollaries of the Comparison Test.
Corollary 5.2. 1. If $g$ is integrable and $h$ is bounded and measurable, then $hg$ is integrable.
2. If $g$ is integrable over $\mathbb{R}$, then $g$ is integrable over any measurable subset of $\mathbb{R}$.
3. If $h$ is a bounded measurable function, then $h$ is integrable over any measurable subset of finite measure.

Proof. These follow from the Comparison Test, using
$$|gh| \le c|g|, \qquad |g\chi_E| \le |g|, \qquad |h\chi_E| \le c\chi_E,$$
where $c$ is a bound for $|h|$. □

Apart from Corollary 4.6, almost all the theory in Section 4 up to this point applies
to general measure spaces. Now we make some comments which are specific to the case
of Lebesgue measure.
Firstly, as promised in Section 4, the Lebesgue integral is more general than the Riemann (Prelims) integral. In fact, $f : [a, b] \to \mathbb{R}$ is Riemann integrable if and only if $f$ is bounded and continuous a.e.⁹ Any such $f$ is measurable and bounded,
⁹This is a very nice exercise, but off topic for us, so omitted. See Stein and Shakarchi Problem 1.6.4. The essential idea, which is useful for many Prelims exercises relating to continuity, is to consider the oscillation of a function $f$, $\omega_f(x) = \lim_{\delta\to0}\big(\sup_{y\in(x-\delta,x+\delta)} f(y) - \inf_{y\in(x-\delta,x+\delta)} f(y)\big)$. You can check that $f$ is continuous at $x$ if and only if $\omega_f(x) = 0$. So if $f$ is continuous a.e., then for any $\varepsilon > 0$ the set $A_\varepsilon = \{x \in [a, b] : \omega_f(x) \ge \varepsilon\}$ is null, so (using compactness, and you'll need to check it is compact) it can be covered by finitely many open intervals of total length $\varepsilon$. This should help you tackle Analysis 3, Sheet 2, Q4 to get Riemann integrability.

hence Lebesgue integrable. However, this is overkill for obtaining measurability. If $f$ is Riemann integrable, then $f$ is bounded and there are an increasing sequence $(\varphi_n)$ and a decreasing sequence $(\psi_n)$ of step functions such that $\varphi_n \le f \le \psi_n$ and $\lim_{n\to\infty} \int_a^b \varphi_n = \int^R_{[a,b]} f = \lim_{n\to\infty} \int_a^b \psi_n$. Let $g = \sup_n \varphi_n$ and $h = \inf_n \psi_n$. Then $g$ and $h$ are measurable, $g \le f \le h$ and $\int^L_{[a,b]} (h - g) \le \lim_{n\to\infty} \int_a^b (\psi_n - \varphi_n) = 0$. By Proposition 5.1(7), $g = h$ a.e. Then $f = g$ a.e., so $f$ is (Lebesgue) measurable. By Corollary 5.2(3), $f$ is Lebesgue integrable.
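Moreover, for a Riemann integrable $f : [a, b] \to \mathbb{R}$,
$$\begin{aligned} \int^R_{[a,b]} f &= \sup\Big\{\int_a^b \varphi : \varphi \text{ step}, \ \varphi \le f\Big\} \le \sup\Big\{\int_a^b \varphi : \varphi \text{ simple}, \ \varphi \le f\Big\} \\ &\le \int^L_{[a,b]} f \le \inf\Big\{\int_a^b \psi : \psi \text{ step}, \ f \le \psi\Big\} = \int^R_{[a,b]} f. \end{aligned}$$
Hence equality holds throughout, so the Lebesgue integral equals the Riemann integral.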
Given a function $f : I \to \mathbb{R}$, where $I$ is an interval in $\mathbb{R}$, how does one test whether $f$ is integrable over $I$? We can do the following:
• Note that $f$ is measurable (for example, using Examples 3.6).
• Replace $f$ by $|f|$: we can assume that $f$ is non-negative. (Proposition 5.1(1),(2))
• If $I$ is bounded and $f$ is bounded, then $f$ is integrable over $I$. (Corollary 5.2(3))
• If $I$ or $f$ is unbounded, we can probably consider an increasing sequence of bounded subintervals $I_n$, with union $I$, such that $f$ is bounded on each $I_n$.
• We may be able to evaluate $\int_{I_n} f$ by means of the FTC, Integration by Parts, or Substitution from Prelims theory. Then we can use Baby MCT (Corollary 4.3).
• If we cannot easily evaluate the integral of $f$, use the Comparison Test: we look for a simpler measurable function $g$ such that $g$ is known to be integrable and $0 \le f \le g$ (if we think $f$ is going to be integrable), or $g$ is known not to be integrable and $0 \le g \le f$ (if we think $f$ will not be integrable).
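Examples 5.3. 1. Consider $x^\alpha$ over $(0, 1)$, where $\alpha \in \mathbb{R}$. Note first that $x^\alpha$ is continuous, hence measurable, and non-negative. If $\alpha \ge 0$, then $x^\alpha$ is bounded (by $1$) on $(0, 1)$, hence integrable. If $\alpha < 0$, $x^\alpha$ has a singularity at $x = 0$, so we use Baby MCT with $I_n = [1/n, 1]$. By FTC,
$$\int_{1/n}^1 x^\alpha\,dx = \begin{cases} \dfrac{1 - n^{-(\alpha+1)}}{\alpha + 1} & (\alpha \ne -1) \\ \log n & (\alpha = -1) \end{cases} \;\to\; \begin{cases} \infty & (\alpha \le -1) \\ \dfrac{1}{\alpha + 1} & (\alpha > -1). \end{cases}$$
By Baby MCT, $x^\alpha$ is integrable over $(0, 1)$ if and only if $\alpha > -1$, and then $\int_0^1 x^\alpha\,dx = (\alpha + 1)^{-1}$.
2. Consider $x^\alpha$ over $[1, \infty)$. This is similar, but with $I_n = [1, n]$. Now
$$\int_1^n x^\alpha\,dx = \begin{cases} \dfrac{n^{\alpha+1} - 1}{\alpha + 1} & (\alpha \ne -1) \\ \log n & (\alpha = -1) \end{cases} \;\to\; \begin{cases} \infty & (\alpha \ge -1) \\ -\dfrac{1}{\alpha + 1} & (\alpha < -1). \end{cases}$$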

⁹(continued) Conversely, if $f$ is Riemann integrable, $n \in \mathbb{N}$ and $\varepsilon > 0$, take a partition $P$ such that $U(f; P) - L(f; P) < \varepsilon/n$ and consider the total length of the intervals in $P$ whose interior intersects $A_{1/n}$.

By Baby MCT, $x^\alpha$ is integrable over $(1, \infty)$ if and only if $\alpha < -1$, and then $\int_1^\infty x^\alpha\,dx = -(\alpha + 1)^{-1}$.
3. Consider $f(x) = x^\alpha/(1 + x^\beta)$ over $(0, \infty)$, where $\alpha \in \mathbb{R}$ and $\beta \ge 0$. For $0 < x \le 1$, $x^\alpha/2 \le f(x) \le x^\alpha$. By comparison, $f$ is integrable over $(0, 1)$ if and only if $x^\alpha$ is, i.e., $\alpha > -1$. For $x > 1$, $x^{\alpha-\beta}/2 \le f(x) < x^{\alpha-\beta}$, so, by comparison, $f$ is integrable over $(1, \infty)$ if and only if $x^{\alpha-\beta}$ is, i.e., $\alpha - \beta < -1$. Hence $f$ is integrable over $(0, \infty)$ if and only if $-1 < \alpha < \beta - 1$. [The case when $\beta < 0$ can be reduced to the previous case because $f(x) = x^{\alpha-\beta}/(1 + x^{-\beta})$.]
4. Consider $f(x) = (\sin x)/x$ over $(0, 2\pi)$. This function is continuous on $(0, 2\pi]$, hence measurable. If we define $f(0) = 1$, it becomes continuous, hence bounded, on $[0, 2\pi]$; in fact it is bounded above by $1$ and below by $-1/\pi$. So it is integrable over $(0, 2\pi)$.
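5. Consider $f(x) = (\sin x)/x$ over $(0, \infty)$. Now
$$\int_{r\pi}^{(r+1)\pi} \Big|\frac{\sin x}{x}\Big|\,dx \ge \int_{r\pi}^{(r+1)\pi} \frac{|\sin x|}{(r+1)\pi}\,dx = \frac{2}{(r+1)\pi}.$$
Hence,
$$\lim_{n\to\infty} \int_0^{n\pi} |f(x)|\,dx \ge \lim_{n\to\infty} \sum_{r=0}^{n-1} \frac{2}{(r+1)\pi} = \infty.$$
So $|f|$ is not integrable, and hence $f$ is not integrable, over $(0, \infty)$.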
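The limits in Examples 5.3(1), (2) and (5) can be tabulated. In the Python sketch below (all function names are ours), `int_small` and `int_large` evaluate the FTC formulas from (1) and (2). For (5), a Simpson-rule approximation of $\int_0^{n\pi}|f|$, computed one interval $[r\pi, (r+1)\pi]$ at a time, is compared with the lower bound $\sum_{r=0}^{n-1}\frac{2}{(r+1)\pi}$, which grows like $\frac{2}{\pi}\log n$. The quadrature is our own choice.

```python
import math

def int_small(alpha, n):
    """FTC value of the integral of x**alpha over [1/n, 1] (Example 5.3(1))."""
    if alpha == -1:
        return math.log(n)
    return (1 - n ** (-(alpha + 1))) / (alpha + 1)

def int_large(alpha, n):
    """FTC value of the integral of x**alpha over [1, n] (Example 5.3(2))."""
    if alpha == -1:
        return math.log(n)
    return (n ** (alpha + 1) - 1) / (alpha + 1)

def simpson(g, a, b, m=200):
    """Composite Simpson rule for g on [a, b]; m must be even."""
    h = (b - a) / m
    total = g(a) + g(b)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3

def abs_sinc(x):
    # |sin x / x|, extended continuously by 1 at x = 0
    return 1.0 if x == 0 else abs(math.sin(x) / x)

def integral_abs_sinc(n):
    # Integral of |sin x / x| over [0, n*pi]; sin has constant sign on each piece
    return sum(simpson(abs_sinc, r * math.pi, (r + 1) * math.pi) for r in range(n))

def lower_bound(n):
    # sum_{r=0}^{n-1} 2 / ((r+1) pi), the lower bound in Example 5.3(5)
    return sum(2 / ((r + 1) * math.pi) for r in range(n))

for alpha in (-0.5, -1.0, -2.0):
    print(alpha, [int_small(alpha, 10 ** k) for k in (1, 3, 6)],
          [int_large(alpha, 10 ** k) for k in (1, 3, 6)])
for n in (10, 100, 1000):
    print(n, integral_abs_sinc(n), lower_bound(n))
```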

Let us discuss the first-year theorems a little more carefully.


Theorem 5.4. (Fundamental Theorem of Calculus) Let $g$ be a function with a continuous derivative on a closed bounded interval $[a, b]$. Then $g'$ is integrable over $[a, b]$, and
$$\int_a^b g'(x)\,dx = g(b) - g(a).$$

The FTC should be treated with care, if the range of integration is unbounded (as
already discussed), or if the derivative does not exist at some points as the following
examples show.
Examples 5.5. 1. Let $f(x) = x\sin\frac1x$ $(x \in (0, 1])$, $f(0) = 0$. Then $f$ is continuous on $[0, 1]$ and differentiable on $(0, 1]$, but $f'(x) = \sin\frac1x - \frac1x\cos\frac1x \notin L^1(0, 1)$.


2. We define a function $\Phi : [0, 1] \to [0, 1]$ as follows. On the Cantor set $C$,
$$\Phi\Big(\sum_{n=1}^\infty a_n 3^{-n}\Big) = \sum_{n=1}^\infty \frac{a_n}{2} 2^{-n} \qquad (a_n = 0 \text{ or } 2).$$
Then put $\Phi = \frac12$ on $[\frac13, \frac23]$, $\frac14$ on $[\frac19, \frac29]$, etc. Then $\Phi$ is continuous, monotonic, and differentiable at each point of $[0, 1] \setminus C$ with $\Phi'(x) = 0$. So
$$\int_0^1 \Phi'(x)\,dx = 0 \ne \Phi(1) - \Phi(0).$$
This function $\Phi$ is called the Cantor-Lebesgue function, or the devil's staircase.
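The Cantor-Lebesgue function can be evaluated exactly at rational points from a ternary expansion: a digit $1$ means the point lies in one of the removed middle thirds, where $\Phi$ is constant. The Python sketch below assumes exact rational input (via `fractions.Fraction`) in $[0, 1]$ and truncates after a fixed number of ternary digits; the function name and the `depth` parameter are our own.

```python
from fractions import Fraction

def cantor(x, depth=60):
    """Phi(x) for rational x in [0, 1], from the first `depth` ternary digits of x."""
    x = Fraction(x)
    if x == 1:
        return Fraction(1)
    value = Fraction(0)
    for k in range(1, depth + 1):
        x *= 3
        digit = int(x)        # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:        # x is in a removed middle third, where Phi is constant
            return value + Fraction(1, 2 ** k)
        value += Fraction(digit // 2, 2 ** k)  # digit a_k = 2 contributes 2**-k
    return value

print(cantor(Fraction(2, 9)), cantor(Fraction(1, 2)))  # 1/4 1/2
print(float(cantor(Fraction(1, 4))))                   # approximately 1/3
print(cantor(1) - cantor(0))                           # 1

# Monotonic on a grid of points j/3**6.
grid = [Fraction(j, 729) for j in range(730)]
vals = [cantor(t) for t in grid]
print(all(a <= b for a, b in zip(vals, vals[1:])))     # True
```

On a middle third, such as around $\frac12$, a difference quotient of $\Phi$ is exactly $0$, which is the reason $\int_0^1 \Phi' = 0$ even though $\Phi(1) - \Phi(0) = 1$.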



Theorem 5.6. (Integration by Parts) Let $f$ and $g$ be continuously differentiable functions on a closed bounded interval $[a, b]$. Then
$$\int_a^b f(x)g'(x)\,dx = f(b)g(b) - f(a)g(a) - \int_a^b f'(x)g(x)\,dx.$$

Integration by parts must be treated with great care if the interval of integration is
an unbounded interval or the integrand has a singularity and you do not know whether
the integrals exist. In those circumstances you cannot infer the existence of one integral
from the existence of the other.
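Example 5.7. Consider $\int_0^a \frac{\sin x}{x}\,dx$. Integration by parts gives
$$\int_1^a \frac{\sin x}{x}\,dx = \cos 1 - \frac{\cos a}{a} - \int_1^a \frac{\cos x}{x^2}\,dx.$$
But $\big|\frac{\cos x}{x^2}\big| \le \frac{1}{x^2}$, so $\frac{\cos x}{x^2}$ is integrable over $[1, \infty)$, by Example 5.3(2) and the Comparison Test. Since $\frac{\cos a}{a} \to 0$, it follows from Proposition 5.1(8) that
$$\lim_{a\to\infty} \int_0^a \frac{\sin x}{x}\,dx = \int_0^1 \frac{\sin x}{x}\,dx + \cos 1 - \int_1^\infty \frac{\cos x}{x^2}\,dx.$$
Nevertheless, $\sin x/x$ is not integrable over $(0, \infty)$, by Example 5.3(5).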
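As a numerical sketch (not a proof), one can evaluate the right-hand side of the last display by truncating the integral over $[1, \infty)$ at a point $A$ of our choosing. Integrating by parts once more shows that the discarded tail $\int_A^\infty \frac{\cos x}{x^2}\,dx$ is at most $2/A^2$ in absolute value. The value $\pi/2$ used for comparison is the one obtained in Example 6.6.

```python
import math

def simpson(g, a, b, m):
    """Composite Simpson rule for g on [a, b]; m must be even."""
    h = (b - a) / m
    total = g(a) + g(b)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3

sinc = lambda x: 1.0 if x == 0 else math.sin(x) / x
A = 1001.0  # truncation point; the neglected tail is at most 2 / A**2
first = simpson(sinc, 0.0, 1.0, 100)
tail = simpson(lambda x: math.cos(x) / x ** 2, 1.0, A, 100_000)
limit = first + math.cos(1.0) - tail
print(limit, math.pi / 2)
```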

In the case of substitution, one can infer the existence of one integral from the
other. [Note: Priestley’s comment near the bottom of p.133 is misleading.]
Theorem 5.8. (Substitution) Let $g : I \to \mathbb{R}$ be a monotonic function with a continuous derivative on an interval $I$, and let $J$ be the interval $g(I)$. A (measurable) function $f : J \to \mathbb{R}$ is integrable over $J$ if and only if $(f \circ g)\cdot g'$ is integrable over $I$. Then
$$\int_J f(x)\,dx = \int_I f(g(y))\,|g'(y)|\,dy.$$

This theorem is not contained in the one in the first-year course, because $f$ is not required to be continuous or Riemann integrable. FTC gives the result when $f = \chi_{J'}$ for a bounded interval $J' \subseteq J$. One has to extend this to $f = \chi_E$ when $E \in \mathcal{M}_{\mathrm{Leb}}$, $E \subseteq J$, i.e., one needs $m(E) = \int_{g^{-1}(E)} |g'|$. After that, the rest follows fairly easily. See Theorem 7.4 in Qian's notes.
Example 5.9. Let $I = (0, 1)$, $g(y) = 1/y$, so $J = (1, \infty)$. Let $f(x) = x^\alpha$. Then $x^\alpha \in L^1(1, \infty)$ if and only if $y^{-\alpha-2} \in L^1(0, 1)$. This provides a passage between Examples 5.3(1) and (2).

Other measures. We make some comments about integration with respect to measures other than Lebesgue.

A function $f : \mathbb{N} \to \mathbb{R}$ is integrable with respect to counting measure $\mu$ if and only if $\sum f(n)$ is absolutely convergent, and then $\int f\,d\mu = \sum_{n=1}^\infty f(n)$. Thus the general theorems that follow will provide theorems about summing absolutely convergent series.
Next, consider a probability space (Ω, F, P). A measurable function is now just a
random variable X on this space, and the integral of X with respect to P is just the

expectation $\mathbb{E}(X)$; $X$ is integrable if and only if $|X|$ has finite expectation. The theory that follows applies to all random variables simultaneously: discrete, continuous, hybrid, singular.

6. The Convergence Theorems

The feature of Lebesgue integration theory which distinguishes it from other theories, and makes it much more manageable, is the group of theorems known as convergence theorems. These are the theorems, mentioned in the introduction, which enable one to pass limits or infinite sums through integrals, under certain conditions.
We have already seen the MCT, but we give a different form below to allow for increasing sequences of functions which are not necessarily non-negative. Notice that in this case, we must work with integrable functions so that we can add $\int f_1$ to both sides at the end of the argument.¹⁰
Theorem 6.1. Let $(f_n)$ be a sequence of integrable functions such that:
(1) for each $n$, $f_n \le f_{n+1}$ a.e.,
(2) $\sup_n \int f_n < \infty$.
Then $(f_n)$ converges a.e. to an integrable function $f$, and $\int f = \lim_{n\to\infty} \int f_n$.

Proof. By Proposition 5.1(6), $f_n(x) \in \mathbb{R}$ a.e. From this and assumption (1), we may redefine $f_n$ on the union of countably many null sets without changing any integrals, so we may assume that $f_n(x) \le f_{n+1}(x)$ and $f_n(x) \in \mathbb{R}$ for all $x$ and all $n$. Apply Theorem 4.2 to the increasing sequence of non-negative functions $f_n - f_1$. One obtains that $\int (f - f_1) = \lim_{n\to\infty} \int f_n - \int f_1$, which is finite by (2). Thus $f - f_1$ is integrable, so $f$ is integrable, which implies that $f$ is finite a.e. Adding $\int f_1$ to both sides, we obtain that $\int f = \lim_{n\to\infty} \int f_n$. □
Theorem 6.2. [Fatou's Lemma] Let $(f_n)$ be a sequence of non-negative measurable functions. Then
$$\int \liminf_{n\to\infty} f_n \le \liminf_{n\to\infty} \int f_n.$$

Proof. Let $g_r := \inf_{n \ge r} f_n$. Then $(g_r)$ increases to $\liminf_{n\to\infty} f_n$, and $g_r \le f_r$, so $\int g_r \le \int f_r$. By MCT,
$$\int \liminf_{n\to\infty} f_n = \lim_{r\to\infty} \int g_r = \liminf_{r\to\infty} \int g_r \le \liminf_{r\to\infty} \int f_r. \qquad \square$$


Note that in Example 0.1, with $f_n(x) = n^2 x(1 - x)^n$ on $(0, 1)$, we have $f_n \ge 0$, $\lim_{n\to\infty} f_n = 0$ and $\lim_{n\to\infty} \int f_n = 1$. So one can have $\int \limsup_{n\to\infty} f_n < \liminf_{n\to\infty} \int f_n$. However, if $f_n \le g$ for all $n$, where $g$ is integrable, then $\int \limsup_{n\to\infty} f_n \ge \limsup_{n\to\infty} \int f_n$ (apply Fatou to $g - f_n$).
One can also have $\int \limsup_{n\to\infty} f_n > \limsup_{n\to\infty} \int f_n$, for example $f_n(x) = \sin^2(x + n)$ on $(0, \pi)$.
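For the first example above, the integrals are Beta integrals, $\int_0^1 n^2x(1-x)^n\,dx = \frac{n^2}{(n+1)(n+2)}$, which tend to $1$ even though $f_n(x) \to 0$ for every $x \in (0, 1)$. A short Python sketch checking this exactly with fractions, plus a Simpson-rule comparison for one value of $n$ (the quadrature is our choice):

```python
from fractions import Fraction

def f(n, x):
    return n * n * x * (1 - x) ** n

def exact_integral(n):
    # int_0^1 x (1-x)^n dx = 1 / ((n+1)(n+2)), so the integral of f_n is n^2 / ((n+1)(n+2))
    return Fraction(n * n, (n + 1) * (n + 2))

def simpson(g, a, b, m=2000):
    """Composite Simpson rule for g on [a, b]; m must be even."""
    h = (b - a) / m
    total = g(a) + g(b)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3

for n in (1, 10, 100, 1000):
    print(n, float(exact_integral(n)), f(n, 0.5))  # integrals -> 1, values at 1/2 -> 0

numeric_50 = simpson(lambda x: f(50, x), 0.0, 1.0)
print(numeric_50, float(exact_integral(50)))
```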

¹⁰By contrast, the MCT, which has non-negative $f_n$, does not require the $f_n$ to have finite integrals, but then it does not conclude that $f$ is integrable.

Theorem 6.3. [Dominated Convergence Theorem] Let $(f_n)$ be a sequence of measurable functions such that:

(1) $(f_n(x))$ converges a.e. to a limit $f(x)$,

(2) there is an integrable function $g$ such that, for each $n$, $|f_n(x)| \le g(x)$ a.e.

Then $f$ is integrable, and $\int f = \lim_{n\to\infty} \int f_n$.

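Proof. Since $f$ is measurable (Proposition 3.8) and $|f(x)| \le g(x)$ a.e., $f$ is integrable by comparison. Apply Fatou's Lemma to the a.e. non-negative functions $g + f_n$ and $g - f_n$, to obtain
$$\int (g + f) \le \int g + \liminf_{n\to\infty} \int f_n \quad\text{and}\quad \int (g - f) \le \int g - \limsup_{n\to\infty} \int f_n.$$
Since $\int g < \infty$, subtracting $\int g$ gives $\limsup_{n\to\infty} \int f_n \le \int f \le \liminf_{n\to\infty} \int f_n$. □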
Example 6.4. Consider $\displaystyle\int_0^1 \frac{n^{3/2}xe^x}{1 + n^2x^2}\,dx$. It is difficult (impossible?) to evaluate the integrals themselves, but we can find the limit of the integrals, with the help of the DCT (Theorem 6.3). Let
$$f_n(x) = \frac{n^{3/2}xe^x}{1 + n^2x^2} = \frac{(nx)^{3/2}}{1 + n^2x^2}\cdot\frac{e^x}{x^{1/2}}.$$
The function $\dfrac{y^{3/2}}{1 + y^2}$ tends to $0$ as $y \to \infty$, so it is bounded for $y > 0$. It follows that $f_n(x) \to 0$ as $n \to \infty$, and there is a constant $c$ such that
$$0 \le f_n(x) \le \frac{ce^x}{x^{1/2}} \le \frac{ce}{x^{1/2}} \qquad (0 < x < 1).$$
Now let $g(x) = \dfrac{ce}{x^{1/2}}$. Then $g$ is integrable over $(0, 1)$ (Example 5.3(1)), so we have verified the conditions of the DCT (with $f = 0$). We can therefore conclude that
$$\lim_{n\to\infty} \int_0^1 \frac{n^{3/2}xe^x}{1 + n^2x^2}\,dx = 0.$$
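A numerical sketch for Example 6.4. It uses a Simpson rule on a fine uniform grid, because the integrand has a peak of width about $1/n$ near $0$; the grid size is our assumption. The computed integrals decrease, which is consistent with the limit $0$ given by the DCT, but the convergence is slow.

```python
import math

def simpson(g, a, b, m):
    """Composite Simpson rule for g on [a, b]; m must be even."""
    h = (b - a) / m
    total = g(a) + g(b)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3

def integral_n(n, m=100_000):
    # f_n(x) = n^{3/2} x e^x / (1 + n^2 x^2) on [0, 1]
    f_n = lambda x: n ** 1.5 * x * math.exp(x) / (1 + n * n * x * x)
    return simpson(f_n, 0.0, 1.0, m)

values = [integral_n(n) for n in (10, 100, 1000)]
print(values)  # decreasing, slowly
```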

Corollary 6.5. [Bounded Convergence Theorem] Let $I$ be a bounded interval, $(f_n)$ be a sequence in $L^1(I)$ converging a.e. to $f$, and suppose that there is a constant $c$ such that $|f_n(x)| \le c$ a.e., for all $n$. Then $f \in L^1(I)$, and $\int_I f = \lim_{n\to\infty} \int_I f_n$.

The next example involves, for the first time in this course, integration of a complex-valued function. A function $f : \mathbb{R} \to \mathbb{C}$ is integrable if $\operatorname{Re} f$ and $\operatorname{Im} f$ are both integrable, and then $\int f = \int \operatorname{Re} f + i\int \operatorname{Im} f$. Results which hold for real-valued integrable functions and which make sense for complex-valued functions are almost invariably true in the complex case, and can easily be deduced by applying the result to the real and imaginary parts separately. This is the case, for example, with the Comparison Test, FTC, Integration by Parts and the DCT. Note, however, that in Theorem 5.8 (Substitution), the function $f$ may be complex-valued, but the substitution $g(t)$ is assumed to be real-valued.
Example 6.6. Let $\gamma_r$ be the semi-circular contour $\{re^{i\theta} : 0 \le \theta \le \pi\}$, and consider
$$\int_{\gamma_r} \frac{e^{iz}}{z}\,dz = i\int_0^\pi e^{ir\cos\theta} e^{-r\sin\theta}\,d\theta.$$

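Since
$$\big|e^{ir\cos\theta} e^{-r\sin\theta}\big| \le 1 \quad \text{for all } r > 0,\ 0 \le \theta \le \pi,$$
and
$$e^{ir\cos\theta} e^{-r\sin\theta} \to \begin{cases} 0 & \text{as } r \to \infty, \text{ if } 0 < \theta < \pi, \\ 1 & \text{as } r \to 0, \end{cases}$$
the Bounded Convergence Theorem gives
$$\int_{\gamma_{R_n}} \frac{e^{iz}}{z}\,dz \to 0 \quad (R_n \to \infty), \qquad \int_{\gamma_{\varepsilon_n}} \frac{e^{iz}}{z}\,dz \to \pi i \quad (\varepsilon_n \to 0).$$
By Cauchy's Theorem,
$$0 = \int_{\gamma_{R_n}} \frac{e^{iz}}{z}\,dz - \int_{\gamma_{\varepsilon_n}} \frac{e^{iz}}{z}\,dz + \int_{\varepsilon_n}^{R_n} \frac{e^{ix} - e^{-ix}}{x}\,dx.$$
Letting $n \to \infty$, we obtain
$$\lim_{n\to\infty} \int_{\varepsilon_n}^{R_n} \frac{\sin x}{x}\,dx = \frac{\pi}{2}.$$
Hence $\lim_{a\to\infty} \int_0^a \frac{\sin x}{x}\,dx = \pi/2$ (see Example 5.7, and Part A Complex Analysis, Example 11.9 in MT2020 notes).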

Next we will apply the results above to term-by-term integration of series. We start by recalling the MCT for Series (Corollary 4.5 above).
Theorem 6.7. [Monotone Convergence Theorem for Series] Let $(g_n)$ be a sequence of integrable functions such that:
(1) for each $n$, $g_n \ge 0$ a.e.,
(2) $\sum_n \int g_n < \infty$.
Then $\sum_{n=1}^\infty g_n$ converges a.e. to an integrable function, and $\int \sum_{n=1}^\infty g_n = \sum_{n=1}^\infty \int g_n$.

Theorem 6.8. [Lebesgue's Series Theorem; Beppo Levi Theorem, ....] Let $(g_n)$ be a sequence of integrable functions such that $\sum_n \int |g_n| < \infty$. Then $\sum_{n=1}^\infty g_n$ converges a.e. to an integrable function, and $\int \sum_{n=1}^\infty g_n = \sum_{n=1}^\infty \int g_n$.

Proof. Apply MCT for Series to $g_n^+$ and $g_n^-$. Alternatively, apply MCT for Series to $|g_n|$ and use the fact that absolute convergence implies convergence. □
Theorem 6.9. Let $(g_n)$ be a sequence of integrable functions such that $\sum_n |g_n|$ is integrable. Then $\sum_{n=1}^\infty g_n$ converges a.e. to an integrable function, and $\int \sum_{n=1}^\infty g_n = \sum_{n=1}^\infty \int g_n$.

Proof. Clearly $\sum_{n=1}^k \int |g_n| \le \int \sum_{n=1}^\infty |g_n|$ for all $k$, so $\sum_{n=1}^\infty \int |g_n| \le \int \sum_{n=1}^\infty |g_n|$. Apply Theorem 6.8. □
Example 6.10. Let $\alpha > 0$, and consider $\int_0^1 x^{\alpha-1}e^{-x}\,dx$. Let $g_n(x) = (-1)^n x^{\alpha+n-1}/n!$, so that $\sum_{n=0}^\infty g_n(x) = x^{\alpha-1}e^{-x}$. Now
$$\int_0^1 |g_n(x)|\,dx = \frac{1}{(\alpha + n)\,n!},$$

so $\sum_n \int_0^1 |g_n(x)|\,dx < \infty$. Thus Lebesgue's Series Theorem tells us that our integral exists (we could have established this directly, by comparing the integrand with $x^{\alpha-1}$), and that
$$\int_0^1 x^{\alpha-1}e^{-x}\,dx = \sum_{n=0}^\infty \int_0^1 (-1)^n \frac{x^{\alpha+n-1}}{n!}\,dx = \sum_{n=0}^\infty \frac{(-1)^n}{(\alpha + n)\,n!}.$$
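A quick check of Example 6.10 using only Python's `math` module. For $\alpha = 1$ and $\alpha = 2$ the integral has the elementary values $1 - e^{-1}$ and $1 - 2e^{-1}$. For $\alpha = \frac12$ the substitution $x = t^2$ gives $\int_0^1 x^{-1/2}e^{-x}\,dx = 2\int_0^1 e^{-t^2}\,dt = \sqrt\pi\,\operatorname{erf}(1)$. The number of terms kept is our choice.

```python
import math

def series(alpha, terms=40):
    """Partial sum of sum_{n>=0} (-1)^n / ((alpha + n) n!)."""
    total, factorial = 0.0, 1.0
    for n in range(terms):
        if n > 0:
            factorial *= n
        total += (-1) ** n / ((alpha + n) * factorial)
    return total

print(series(1.0), 1 - math.exp(-1))      # integral of e^{-x} over (0, 1)
print(series(2.0), 1 - 2 * math.exp(-1))  # integral of x e^{-x} over (0, 1)
print(series(0.5), math.sqrt(math.pi) * math.erf(1.0))
```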

Example 6.11. Let $s \in \mathbb{R}$, and consider $\int_{-\infty}^\infty e^{-isx}e^{-x^2}\,dx$. The integrand is continuous, and $|e^{-isx}e^{-x^2}| = e^{-x^2} \le e\,e^{-|x|} \in L^1$ (exercise). If $g_n(x) = \frac{(-isx)^n}{n!}e^{-x^2}$, then $\sum_{n=0}^\infty g_n(x) = e^{-isx}e^{-x^2}$, and
$$\sum_{n=0}^\infty |g_n(x)| = e^{|sx| - x^2} \le e^{s^2/2}e^{-x^2/2} \in L^1.$$
It follows that $\sum_n |g_n| \in L^1$, so Theorem 6.9 shows that term-by-term integration is permissible, and
$$\int_{-\infty}^\infty e^{-isx}e^{-x^2}\,dx = \sum_{n=0}^\infty \int_{-\infty}^\infty \frac{(-isx)^n}{n!}e^{-x^2}\,dx.$$
Now
$$\int_{-\infty}^\infty x^n e^{-x^2}\,dx = \begin{cases} 0 & \text{if } n \text{ is odd}, \\ \dfrac{(2m)!\sqrt\pi}{4^m m!} & \text{if } n = 2m \end{cases}$$
(for $m = 0$ this is a standard trick, and one can use integration by parts and induction on $m$). Thus
$$\int_{-\infty}^\infty e^{-isx}e^{-x^2}\,dx = \sum_{m=0}^\infty \frac{(-is)^{2m}\sqrt\pi}{4^m m!} = \sqrt\pi\,e^{-s^2/4}.$$
The integral which we have just evaluated is very important: for example, apart from a few constants, it is the characteristic function of the normal distribution (as in Part A Probability); in analysts' language, it is the Fourier transform of the function $e^{-x^2}$ (as in DEs). There are other methods of evaluating the integral; one is given in Priestley (Complex Analysis, 22.12) and Part A Integral Transforms (Example 77 in HT2020 notes), and another will be given in Example 7.6.
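A numerical sketch comparing the three expressions in Example 6.11:
- the term-by-term series,
- a Simpson-rule approximation of the integral over a truncated range $[-L, L]$ (the values of $L$ and of the step count are our assumptions),
- the closed form $\sqrt\pi\,e^{-s^2/4}$.

Only the real part of the integral is computed, since $\sin(sx)e^{-x^2}$ is odd and contributes nothing.

```python
import math

def series_value(s, terms=80):
    # sum_m (-1)^m s^(2m) sqrt(pi) / (4^m m!), the result of term-by-term integration
    total, term = 0.0, math.sqrt(math.pi)
    for m in range(terms):
        total += term
        term *= -(s * s) / (4 * (m + 1))
    return total

def quadrature_value(s, L=10.0, m=20_000):
    # Simpson rule for the integral of cos(s x) e^{-x^2} over [-L, L]
    g = lambda x: math.cos(s * x) * math.exp(-x * x)
    h = 2 * L / m
    total = g(-L) + g(L)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * g(-L + j * h)
    return total * h / 3

for s in (0.0, 1.0, 3.0):
    closed = math.sqrt(math.pi) * math.exp(-s * s / 4)
    print(s, series_value(s), quadrature_value(s), closed)
```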

All theorems in this Section hold in general measure spaces. Corollary 6.5 holds in
finite measure spaces.

7. Integrals depending on a parameter

Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a function of two variables. In a while, we shall discuss the (double) integral, and the repeated integrals, of $f$. First, we merely consider the partial integral of $f$, obtained by integration with respect to one of the variables. Thus we suppose that for each fixed $y$, the function $x \mapsto f(x, y)$ is integrable. We can then define a function $F$ by:
$$F(y) = \int f(x, y)\,dx.$$

A natural, and important, question is whether $F$ is continuous, or differentiable, assuming that $f$ has corresponding properties. In general, the answer is negative (see Example 7.1), but if we impose some mild conditions of the type that appear in the DCT, then the answer is positive.
Example 7.1. Let $f(x, y) = ye^{-x^2y^2}$. Since $f(x, 0) = 0$ for all $x$, $F(0) = 0$. For fixed $y > 0$, we can make the substitution $t = yx$ and deduce that $F(y) = \int_{-\infty}^\infty e^{-t^2}\,dt \;(= \sqrt\pi)$; similarly, $F(y) = -\sqrt\pi$ for $y < 0$. Thus $F$ is discontinuous at $0$, even though $f$ is differentiable.
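The discontinuity in Example 7.1 is visible numerically. The sketch below approximates $F(y)$ by Simpson's rule over $[-10/|y|, 10/|y|]$, a truncation we chose because the integrand is negligible outside it. The values stay near $\pm\sqrt\pi$ however small $|y|$ is, while $F(0) = 0$.

```python
import math

def F(y, m=20_000):
    """Simpson-rule approximation of F(y), the integral over R of y * exp(-x^2 y^2)."""
    if y == 0:
        return 0.0  # the integrand vanishes identically
    L = 10.0 / abs(y)  # outside [-L, L] the integrand is below |y| * e^{-100}
    h = 2 * L / m
    integrand = lambda x: y * math.exp(-(x * y) ** 2)
    total = integrand(-L) + integrand(L)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * integrand(-L + j * h)
    return total * h / 3

for y in (1.0, 0.1, 0.001, 0.0, -0.001, -1.0):
    print(y, F(y))
print(math.sqrt(math.pi))
```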
Theorem 7.2. [Continuous-parameter DCT] Let $I$ and $J$ be intervals in $\mathbb{R}$, and $f : I \times J \to \mathbb{R}$ be a function such that:
(1) for each $y$ in $J$, $x \mapsto f(x, y)$ is integrable over $I$,
(2) for each $y$ in $J$, $\lim_{y'\to y} f(x, y') = f(x, y)$ a.e.$(x)$,
(3) there exists an integrable function $g$ on $I$ such that for each $y$ in $J$, $|f(x, y)| \le g(x)$ a.e.$(x)$.
Define $F(y) = \int_I f(x, y)\,dx$ $(y \in J)$. Then $F$ is continuous on $J$.

Remark. In condition (3) of Theorem 7.2, the function g does not depend on y.

Proof. Let $(y_n)$ be any sequence in $J$ converging to $y \in J$. Let $f_n(x) = f(x, y_n)$. Then $|f_n(x)| \le g(x)$ a.e., for all $n$, and $\lim_{n\to\infty} f_n(x) = f(x, y)$ a.e., so the conditions of the DCT are satisfied. The DCT implies that:
$$F(y_n) = \int_I f(x, y_n)\,dx \to \int_I f(x, y)\,dx = F(y).$$
Thus $F$ is continuous. □
Example 7.3. The Gamma function $\Gamma$ is defined by:
$$\Gamma(y) = \int_0^\infty e^{-x}x^{y-1}\,dx \qquad (y > 0).$$
We wish to show that $\Gamma$ is continuous, firstly for $y \in [1, 2]$. In order to apply Theorem 7.2, we take $I = (0, \infty)$, $J = [1, 2]$, and $f(x, y) = e^{-x}x^{y-1}$. Condition (1) of Theorem 7.2 is an exercise, and (2) is more or less trivial. For condition (3), we need to ensure that
$$(7.1) \qquad g(x) \ge \sup_{1 \le y \le 2} f(x, y) = \begin{cases} e^{-x} & (0 < x \le 1), \\ xe^{-x} & (x > 1). \end{cases}$$
We choose to take $g$ equal to the right-hand side of (7.1). Then $g$ is integrable over $(0, \infty)$ (exercise), so condition (3) of Theorem 7.2 is satisfied. Thus, Theorem 7.2 shows that $\Gamma$ is continuous on $[1, 2]$.
In fact, $\Gamma$ is continuous on $(0, \infty)$. However, it is impossible to establish this by applying Theorem 7.2 with $J = (0, \infty)$, for in condition (3), it would be necessary that
$$g(x) \ge \sup_{y > 0} f(x, y) = \begin{cases} x^{-1}e^{-x} & (0 < x \le 1), \\ \infty & (x > 1). \end{cases}$$

Such a function $g$ cannot possibly be integrable over $(0, \infty)$, so it is impossible to satisfy condition (3) of Theorem 7.2. Instead, we proceed as follows. For each $b > 0$, let $J_b = (a, c)$, where $a$ and $c$ are chosen so that $0 < a < b < c$, for example, $a = b/2$ and $c = 2b$. Then let
$$g_b(x) = \sup_{a < y < c} f(x, y) = \begin{cases} x^{a-1}e^{-x} & (0 < x \le 1), \\ x^{c-1}e^{-x} & (x > 1). \end{cases}$$
Then $g_b$ is integrable over $(0, \infty)$. Thus, Theorem 7.2 shows that $\Gamma$ is continuous on $(a, c)$, and in particular at $b$. But $b$ is arbitrary, so $\Gamma$ is continuous on $(0, \infty)$.
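The following Python sketch illustrates Example 7.3; it is an illustration, not a proof. It checks, on a finite grid only, that the function $g$ in (7.1) dominates $f(\cdot, y)$ for $y \in [1, 2]$, and compares Simpson-rule approximations of $\Gamma(y)$ with the standard library's `math.gamma`. Truncating $(0, \infty)$ at $60$ is our assumption. The exact value of $\int_0^\infty g$ is $(1 - e^{-1}) + 2e^{-1} = 1 + e^{-1}$.

```python
import math

def simpson(g, a, b, m):
    """Composite Simpson rule for g on [a, b]; m must be even."""
    h = (b - a) / m
    total = g(a) + g(b)
    for j in range(1, m):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3

def f(x, y):
    return math.exp(-x) * x ** (y - 1)

def g(x):
    # right-hand side of (7.1)
    return math.exp(-x) if x <= 1 else x * math.exp(-x)

# Domination checked on a grid of x in (0, 50] and y in [1, 2] (with rounding slack).
xs = [k / 100 for k in range(1, 5001)]
ys = [1 + k / 20 for k in range(21)]
dominated = all(f(x, y) <= g(x) * (1 + 1e-12) for x in xs for y in ys)

def gamma_numeric(y, X=60.0, m=60_000):
    # Simpson approximation of the integral of f(., y) over (0, X]
    return simpson(lambda x: f(x, y), 0.0, X, m)

print(dominated, simpson(g, 0.0, 60.0, 60_000), 1 + 1 / math.e)
for y in (1.0, 1.25, 1.5, 2.0):
    print(y, gamma_numeric(y), math.gamma(y))
```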

The point is that continuity is a local property: $F$ is continuous if $F$ is continuous at $y$ for all $y$ in the domain. We abstract the method to obtain the following version of Theorem 7.2, where the dominating function is defined locally depending on the parameter. Notice though that we still need a single $g_b$ to be valid over the entire open interval $J_b$.
Corollary 7.4. Let $I$ and $J$ be intervals in $\mathbb{R}$, and $f : I \times J \to \mathbb{R}$ be a function such that (1) and (2) of Theorem 7.2 hold, and

(3′) for each $b \in J$, there exist an open subinterval $J_b$ of $J$ containing $b$ and an integrable function $g_b$ on $I$ such that, for each $y \in J_b$, $|f(x, y)| \le g_b(x)$ a.e.$(x)$.

Then $F$ is continuous on $J$, where $F$ is as in Theorem 7.2.

Remark. The method of Theorem 7.2 can also be used to cover cases where $y \to y_0$ for a single point $y_0$, or $y \to \infty$. For example, suppose that there exist $a$ in $\mathbb{R}$ and a function $h : I \to \mathbb{R}$ such that

(1) for each $y > a$, $x \mapsto f(x, y)$ is integrable over $I$,

(2) $\lim_{y\to\infty} f(x, y) = h(x)$ a.e.$(x)$,

(3) there exists an integrable function $g$ on $I$ such that for each $y > a$, $|f(x, y)| \le g(x)$ a.e.$(x)$.

Then $F(y) \to \int_I h(x)\,dx$ as $y \to \infty$.

Now we turn to the question of differentiability of F. The sort of result we hope to have is this: if $\frac{\partial f}{\partial y}$ exists and some supplementary conditions are satisfied, then F is differentiable and
$$F'(y) = \int_I \frac{\partial f}{\partial y}(x,y)\,dx$$
(differentiation through, or under, the integral sign). The standard supplementary condition is that $\frac{\partial f}{\partial y}$ should be dominated by an integrable function, independent of y.

Theorem 7.5. Let I and J be intervals in R, and f : I × J → R be a function such that:

(1) for each y in J, x ↦ f(x, y) is integrable over I,
(2) for almost all x in I, $\frac{\partial f}{\partial y}(x,y)$ exists for all y ∈ J,
(3) there is an integrable function g : I → R such that, for almost all x ∈ I,
$$\left|\frac{\partial f}{\partial y}(x,y)\right| \le g(x) \quad\text{holds for all } y \in J.$$

Define $F(y) = \int_I f(x,y)\,dx$ (y ∈ J). Then F is differentiable on J and
$$F'(y) = \int_I \frac{\partial f}{\partial y}(x,y)\,dx.$$

Note that in condition (3) (and (2)) above we require a single null set N such that $\left|\frac{\partial f}{\partial y}(x,y)\right| \le g(x)$ holds for all y ∈ J and x ∈ I \ N. This is not a priori the same as requiring that, for all y ∈ J, $\left|\frac{\partial f}{\partial y}(x,y)\right| \le g(x)$ holds for almost all x ∈ I.¹¹ In practice, when you want to apply Theorem 7.5, it is quite likely that (3) will hold for all x and y (or perhaps for all but finitely many values of x).

Proof. Fix y in J, and let $(y_n)$ be any sequence in J converging to y (with $y_n \ne y$). Let
$$g_n(x) = \frac{f(x,y_n) - f(x,y)}{y_n - y}.$$
Then $g_n$ is integrable over I, and $g_n(x) \to \frac{\partial f}{\partial y}(x,y)$ for almost all x as n → ∞. Moreover, the Mean Value Theorem says that there exists a point $\xi_{x,n}$ (depending on x and n) between $y_n$ and y such that $g_n(x) = \frac{\partial f}{\partial y}(x,\xi_{x,n})$. It follows from (3) that $|g_n(x)| \le g(x)$ a.e.(x).¹² This shows that the Dominated Convergence Theorem is applicable, so
$$\frac{F(y_n) - F(y)}{y_n - y} = \int_I g_n(x)\,dx \to \int_I \frac{\partial f}{\partial y}(x,y)\,dx \quad\text{as } n \to \infty.$$
Since $(y_n)$ is an arbitrary sequence tending to y, and the right-hand side is independent of the choice of sequence, it follows that
$$\frac{F(y') - F(y)}{y' - y} \to \int_I \frac{\partial f}{\partial y}(x,y)\,dx \quad\text{as } y' \to y,$$
which completes the proof. □
Example 7.6. Let $f(x,s) = e^{-isx}e^{-x^2}$, and $F(s) = \int_{-\infty}^{\infty} f(x,s)\,dx$ (compare Example 6.11). This integral exists for all s. Moreover,
$$\frac{\partial f}{\partial s}(x,s) = -ix\,e^{-ixs}e^{-x^2},$$
so
$$\left|\frac{\partial f}{\partial s}(x,s)\right| = |x|e^{-x^2}.$$
Since
$$\int_{-n}^{n} |x|e^{-x^2}\,dx = 2\int_0^n xe^{-x^2}\,dx = 1 - e^{-n^2} \to 1$$
as n → ∞, we have $|x|e^{-x^2} \in L^1(\mathbb{R})$ (Baby MCT). Thus Theorem 7.5 is applicable, with I = J = R and $g(x) = |x|e^{-x^2}$. It follows that F is differentiable on R, and
$$F'(s) = -i\int_{-\infty}^{\infty} xe^{-isx}e^{-x^2}\,dx.$$
By integration by parts,
$$F'(s) = -\frac{s}{2}F(s).$$
Hence $\frac{d}{ds}\big(e^{s^2/4}F(s)\big) = 0$, so $F(s) = Ae^{-s^2/4}$ for some constant A. But $F(0) = \int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$, so $A = \sqrt{\pi}$.

¹¹ The latter would allow the null set N of those x for which the estimate fails to depend on y.

¹² This is where it matters that we have a single null set N for which $\left|\frac{\partial f}{\partial y}(x,y)\right| \le g(x)$ holds for all x ∈ I \ N and y ∈ J. Suppose we only had that, for each y ∈ J, there was a null set $N_y$ (depending on y) such that the estimate holds for x ∈ I \ $N_y$. Then, as $\xi_{x,n}$ depends on x, we would have no way of deducing that $|g_n(x)| \le g(x)$ a.e. I thank Oliver Riordan for bringing this issue to my attention.
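The identity $F(s) = \sqrt{\pi}\,e^{-s^2/4}$, and the formula for $F'(s)$ obtained by differentiating under the integral sign, can be checked numerically. The Python sketch below is an illustration under stated assumptions. It truncates the real line to [−10, 10], since the Gaussian tail beyond that is far below double precision, and it uses a midpoint rule with step 0.001. The names `midpoint`, `F`, `F_prime` and `closed_form` are hypothetical helpers. Since $\sin(sx)e^{-x^2}$ is odd, $F(s) = \int \cos(sx)e^{-x^2}\,dx$ is real, and the formula for $F'(s)$ above reduces to $F'(s) = -\int x\sin(sx)e^{-x^2}\,dx$.

```python
import math

def midpoint(func, lo, hi, n):
    """Composite midpoint rule with n subintervals on [lo, hi]."""
    h = (hi - lo) / n
    return h * math.fsum(func(lo + (k + 0.5) * h) for k in range(n))

L, N = 10.0, 20000  # truncation of R to [-L, L] and number of subintervals (assumed adequate)

def F(s):
    # the imaginary part vanishes because sin(sx) e^{-x^2} is odd in x
    return midpoint(lambda x: math.cos(s * x) * math.exp(-x * x), -L, L, N)

def F_prime(s):
    # derivative taken under the integral sign, as justified by Theorem 7.5
    return -midpoint(lambda x: x * math.sin(s * x) * math.exp(-x * x), -L, L, N)

def closed_form(s):
    return math.sqrt(math.pi) * math.exp(-s * s / 4)

for s in (0.0, 1.0, 2.0):
    print(s, F(s), closed_form(s), F_prime(s), -s / 2 * closed_form(s))
```

The printed columns should agree in pairs: `F(s)` with $\sqrt{\pi}e^{-s^2/4}$, and `F_prime(s)` with $-\tfrac{s}{2}F(s)$.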
Corollary 7.7. Let I and J be intervals in R, and f : I × J → R be a function such that (1) and (2) of Theorem 7.5 hold, and

(3′) for each b in J, there is an open subinterval $J_b$ of J containing b and an integrable function $g_b : I \to \mathbb{R}$ such that, for almost all x, $\left|\frac{\partial f}{\partial y}(x,y)\right| \le g_b(x)$ for all $y \in J_b$.

Then the conclusions of Theorem 7.5 hold.
Example 7.8. Let $f(x,y) = e^{-xy}(1+x^3)^{-1}$ (x ≥ 0, y ≥ 0). Since $0 \le f(x,y) \le (1+x^3)^{-1}$, x ↦ f(x, y) is integrable over [0, ∞) for each y ≥ 0. Moreover,
$$\frac{\partial f}{\partial y}(x,y) = -\frac{xe^{-xy}}{1+x^3},$$
so
$$\left|\frac{\partial f}{\partial y}(x,y)\right| \le \frac{x}{1+x^3} \quad (x \ge 0,\ y \ge 0).$$
Since $x(1+x^3)^{-1}$ is integrable over [0, ∞) (by comparison with $x^{-2}$ for x ≥ 1), Theorem 7.5 is applicable. It shows that F is differentiable on [0, ∞) and
$$F'(y) = -\int_0^\infty \frac{xe^{-xy}}{1+x^3}\,dx.$$

We would like to repeat this argument to show that $F''(y)$ exists (at least for y > 0), but this is more complicated. Indeed,
$$\frac{\partial^2 f}{\partial y^2}(x,y) = \frac{x^2e^{-xy}}{1+x^3}.$$
For y = 0, this function is not integrable (by comparison with $x^{-1}$), so we should only consider y > 0. However, it is not possible to apply Theorem 7.5 with f replaced by $\frac{\partial f}{\partial y}$ and with J = (0, ∞), because
$$\sup_{y>0}\left|\frac{\partial^2 f}{\partial y^2}(x,y)\right| = \frac{x^2}{1+x^3},$$
which is not integrable over [0, ∞). Instead, we must apply Corollary 7.7. Thus we take b > 0, let $J_b = (b/2, \infty)$, and
$$g_b(x) = \sup_{y>b/2}\left|\frac{\partial^2 f}{\partial y^2}(x,y)\right| = \frac{x^2e^{-xb/2}}{1+x^3} \le x^2e^{-xb/2}.$$
This function is integrable on [0, ∞). We conclude from Corollary 7.7, with f replaced by $\frac{\partial f}{\partial y}$ and J = (0, ∞), that $F''(y)$ exists for y > 0 and
$$F''(y) = \int_0^\infty \frac{x^2e^{-xy}}{1+x^3}\,dx.$$
Repeating this argument, it is possible to show that F is infinitely differentiable on (0, ∞) and to obtain integrals for all the derivatives.
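The formulas for $F'(y)$ and $F''(y)$ can be compared with finite-difference quotients of F. The Python sketch below is an illustration only. The truncation of [0, ∞) at 40, the grid size and the helper names are our own assumptions; they are adequate for y ≥ 1, where $e^{-40y}$ is negligible. `F1` and `F2` evaluate the integrals for $F'(y)$ and $F''(y)$ derived above, and these are compared with central difference quotients of `F`.

```python
import math

def midpoint(func, lo, hi, n):
    """Composite midpoint rule with n subintervals on [lo, hi]."""
    h = (hi - lo) / n
    return h * math.fsum(func(lo + (k + 0.5) * h) for k in range(n))

UPPER, N = 40.0, 20000  # [0, inf) truncated at UPPER; assumed adequate for y >= 1

def F(y):
    return midpoint(lambda x: math.exp(-x * y) / (1 + x ** 3), 0.0, UPPER, N)

def F1(y):
    # F'(y) = -∫ x e^{-xy} / (1 + x^3) dx  (Theorem 7.5)
    return -midpoint(lambda x: x * math.exp(-x * y) / (1 + x ** 3), 0.0, UPPER, N)

def F2(y):
    # F''(y) = ∫ x^2 e^{-xy} / (1 + x^3) dx  (Corollary 7.7, y > 0)
    return midpoint(lambda x: x * x * math.exp(-x * y) / (1 + x ** 3), 0.0, UPPER, N)

def central_first(y, h=1e-4):
    """Central difference approximation to F'(y)."""
    return (F(y + h) - F(y - h)) / (2 * h)

def central_second(y, h=1e-3):
    """Central second difference approximation to F''(y)."""
    return (F(y + h) - 2 * F(y) + F(y - h)) / (h * h)

for y in (1.0, 2.0):
    print(y, central_first(y), F1(y), central_second(y), F2(y))
```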

Remark. There are versions of Theorem 7.5 and Corollary 7.7 in which:
- the real variable y ∈ J is replaced by a complex variable z ∈ Ω, a domain in C;
- the function f is complex-valued, and z ↦ f(x, z) is holomorphic for each x;
- the conclusion is that F is holomorphic.

The proofs are almost the same, except that the use of the Mean Value Theorem should be replaced by the formula
$$g_n(x) = (z_n - z_0)^{-1}\int_{[z_0,z_n]} \frac{\partial f}{\partial w}(x,w)\,dw,$$
which leads to the estimate $|g_n(x)| \le g(x)$.

8. Double Integrals

In Section 7, we considered some properties concerning functions of two variables, but we confined integration to one of the variables. Now it is time to consider integration with respect to both variables. An example on Problem Sheet 1 shows that this is not just a matter of integrating first with respect to one variable, and then with respect to the other (repeated integration). What one has to do is threefold:
- define the class $L^1(\mathbb{R}^2)$ of integrable functions on $\mathbb{R}^2$, and their (double) integrals, in a way which treats both variables simultaneously;
- establish the theorem (Fubini’s Theorem) which ensures that the double integrals coincide with the repeated integrals for functions in $L^1(\mathbb{R}^2)$;
- establish a practical method (Tonelli’s Theorem) to determine whether a given function is integrable.
The first part of this is routine. The class $L^1(\mathbb{R}^2)$ of integrable functions on $\mathbb{R}^2$ is defined in exactly the same way as $L^1(\mathbb{R})$, except that intervals (a, b), and their lengths b − a, are replaced by rectangles (a, b) × (c, d) and their areas (b − a)(d − c). Then one defines, just as in Sections 2–4:
- outer measure;
- null sets (line segments etc. are null);
- measurable sets (all open sets etc. are measurable), as we discussed at the end of Section 2;
- measurable functions, simple functions, integrable functions and (double) integrals.

Moreover, the results of Sections 2–6, except Section 4 from Theorem 5.4 onwards, remain valid, with obvious changes of wording where necessary. More details may be found in Capinski & Kopp (Chap. 6, but in greater generality) or Stein & Shakarchi (from the beginning).
The (double) integral of an integrable function f over $\mathbb{R}^2$ may be denoted by any of the following:
$$\int_{\mathbb{R}^2} f, \qquad \iint f, \qquad \int_{\mathbb{R}^2} f(x,y)\,d(x,y), \qquad \iint f(x,y)\,d(x,y).$$

Theorem 8.1. (Tonelli) Let $f : \mathbb{R}^2 \to [0,\infty]$ be measurable. Then
(1) x ↦ f(x, y) is measurable for almost all y;
(2) $y \mapsto \int_{\mathbb{R}} f(x,y)\,dx$ (defined a.e.) is measurable;
(3)
$$\int_{\mathbb{R}^2} f(x,y)\,d(x,y) = \int_{\mathbb{R}}\left(\int_{\mathbb{R}} f(x,y)\,dx\right)dy.$$

Now we state two consequences of this in their traditional form.


Theorem 8.2. [Fubini’s Theorem] Let $f : \mathbb{R}^2 \to \mathbb{R}$ be integrable. Then, for almost all y, the function x ↦ f(x, y) is integrable. Moreover, defining $F(y) = \int_{\mathbb{R}} f(x,y)\,dx$ (for almost all y), F is integrable, and
$$\int_{\mathbb{R}^2} f(x,y)\,d(x,y) = \int_{\mathbb{R}}\left(\int_{\mathbb{R}} f(x,y)\,dx\right)dy.$$
Similarly,
$$\int_{\mathbb{R}}\left(\int_{\mathbb{R}} f(x,y)\,dx\right)dy = \int_{\mathbb{R}^2} f(x,y)\,d(x,y) = \int_{\mathbb{R}}\left(\int_{\mathbb{R}} f(x,y)\,dy\right)dx,$$
where the first repeated integral exists in the sense described above.

Proof. Apply Theorem 8.1 to $f^+$ and $f^-$, using Proposition 5.1(6) to get that $\int_{\mathbb{R}} f^{\pm}(x,y)\,dx < \infty$ a.e.(y). □

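Theorem 8.3. [Tonelli’s Theorem] Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a measurable function, and suppose that either of the following repeated integrals is finite:
$$\int_{\mathbb{R}}\left(\int_{\mathbb{R}} |f(x,y)|\,dx\right)dy, \qquad \int_{\mathbb{R}}\left(\int_{\mathbb{R}} |f(x,y)|\,dy\right)dx.$$
Then f is integrable. Hence, Fubini’s Theorem is applicable to both f and |f|.

Proof. Apply Theorem 8.1 to |f|, with the roles of x and y interchanged if the second repeated integral is the finite one. This gives $\int_{\mathbb{R}^2} |f| < \infty$. Then $f \in L^1(\mathbb{R}^2)$, by Proposition 5.1(2). □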

Remark. Note that, when applying Tonelli’s Theorem, one must verify that a repeated
integral of |f | is finite. It is not sufficient that the repeated integrals of f exist (see
Example 8.4), nor is it sufficient that the repeated integrals of f both exist and are
equal (see Example 8.7).
If E is a measurable subset of $\mathbb{R}^2$ and f : E → R is any function, then f is said to be integrable over E if $\tilde f$ is integrable over $\mathbb{R}^2$, where $\tilde f(x,y) = f(x,y)$ if (x, y) ∈ E and $\tilde f(x,y) = 0$ otherwise. Then $\int_E f$ is defined to be $\int_{\mathbb{R}^2} \tilde f$.

Fubini’s Theorem and Tonelli’s Theorem can be applied in this situation. However, when E is not a rectangle, great care must be taken to choose the correct limits of integration in the repeated integrals. If in any doubt, draw a sketch of the region. See Example 8.5.
In repeated integrals, one often omits the brackets around the inner integral and writes $\iint f(x,y)\,dy\,dx$, etc., with appropriate limits of integration. This means that one is integrating first with respect to y between the limits on the right-hand integral sign, which may be functions of x. Thus
$$\int_a^b \int_{\varphi(x)}^{\psi(x)} f(x,y)\,dy\,dx$$
denotes the repeated integral over the region E bounded by the curves y = φ(x) and y = ψ(x) and by the vertical lines x = a, x = b.
Example 8.4. Let $f(x,y) = \dfrac{x-y}{(x+y)^3}$ (0 < x < 1, 0 < y < 1). It was an exercise on Problem Sheet 1 that the repeated integrals of f exist but are not equal. It follows from the final part of Fubini’s Theorem that f is not integrable over the square (0, 1) × (0, 1).
Example 8.5. Consider
$$\int_0^1\left(\int_0^x \left(\frac{1-y}{x-y}\right)^{1/2} dy\right)dx.$$
As it stands, the inner integral is difficult. However, when the order of integration is reversed, the other repeated integral turns out to be easily evaluated. To justify the equality of the repeated integrals, we apply Tonelli’s Theorem, as the following discussion shows.
First, note that the integrand is continuous except on the line y = x, which is null. It is also non-negative throughout the range of integration, so in applying Tonelli’s Theorem it is unnecessary to replace f by |f|.

The next problem is to work out the limits of integration when we reverse the order. For this, we have to identify the region in $\mathbb{R}^2$ over which the double integral is taken. For each x between 0 and 1, we are integrating along the (vertical) line segment from y = 0 to y = x. As x runs from 0 to 1, this sweeps out the triangle with vertices (0, 0), (1, 0) and (1, 1). The integrand is continuous on the interior of the triangle (and we take it to be 0 outside the triangle), so it is measurable.

If we fix a value of y between 0 and 1, the values of x which give points within the triangle are those between x = y and x = 1; for other values of y there are no points within the triangle. Thus the limits of the reversed repeated integral are x = y and x = 1 in the inner integral, and y = 0 and y = 1 in the outer. This is confirmed by the following equalities of sets:
$$\{(x,y) \in \mathbb{R}^2 : 0 < y < x,\ 0 < x < 1\} = \{(x,y) \in \mathbb{R}^2 : 0 < y < x < 1\} = \{(x,y) \in \mathbb{R}^2 : y < x < 1,\ 0 < y < 1\},$$
but a sketch of the triangle is more informative!
Now the reversed repeated integral is:
$$\int_0^1\left(\int_y^1\left(\frac{1-y}{x-y}\right)^{1/2}dx\right)dy = \int_0^1 \Big[2(1-y)^{1/2}(x-y)^{1/2}\Big]_{x=y}^{x=1}\,dy = \int_0^1 2(1-y)\,dy = 1.$$
Since the integrand is non-negative and this repeated integral is finite, it follows from Tonelli’s Theorem that f is integrable over the triangle, and from Fubini’s Theorem that
$$\int_0^1\left(\int_0^x\left(\frac{1-y}{x-y}\right)^{1/2}dy\right)dx = 1.$$

The next example shows how it is both possible and useful to make changes of
variable within the inner integral of a repeated integral. The same technique will be
used in several subsequent examples.
Example 8.6. Let $f(x,y) = ye^{-y^2(1+x^2)}$. Since f is continuous, it is certainly measurable. We shall consider the integral of f over the positive quadrant (0, ∞) × (0, ∞). First we consider $\int_0^\infty f(x,y)\,dy$ for a fixed x. Making the change of variable $t = y(1+x^2)^{1/2}$ (x is a constant at this point),
$$\int_0^\infty f(x,y)\,dy = \int_0^\infty \frac{te^{-t^2}}{1+x^2}\,dt = \lim_{k\to\infty}\left[-\frac{e^{-t^2}}{2(1+x^2)}\right]_{t=0}^{t=k} = \frac{1}{2(1+x^2)}.$$
This function is integrable with respect to x, and
$$\int_0^\infty\left(\int_0^\infty f(x,y)\,dy\right)dx = \frac{\pi}{4}.$$
Since f(x, y) ≥ 0 for y ≥ 0, it follows from Tonelli’s Theorem that f is integrable over (0, ∞) × (0, ∞), and by Fubini’s Theorem,
$$\int_0^\infty\left(\int_0^\infty f(x,y)\,dx\right)dy = \frac{\pi}{4}.$$
In the inner integral, where y > 0 is fixed, we can make the change of variable u = xy, and obtain
$$\frac{\pi}{4} = \int_0^\infty\left(\int_0^\infty e^{-(y^2+u^2)}\,du\right)dy = \int_0^\infty\left(\int_0^\infty e^{-y^2}e^{-u^2}\,du\right)dy = \left(\int_0^\infty e^{-u^2}\,du\right)\left(\int_0^\infty e^{-y^2}\,dy\right) = \left(\int_0^\infty e^{-x^2}\,dx\right)^2.$$
It follows that
$$\int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}.$$

If f takes both positive and negative values, then to apply Tonelli’s Theorem, it is
necessary to consider |f |, or alternatively to consider separately the regions where f is
positive and where it is negative.
Example 8.7. Let $f(x,y) = \dfrac{xy}{x^4+y^4}$. Since f is odd both as a function of x and as a function of y, $\int_{-\infty}^{\infty} f(x,y)\,dy = 0$ for all x, and $\int_{-\infty}^{\infty} f(x,y)\,dx = 0$ for all y. Hence both repeated integrals exist and equal 0.

However, consider f over the quadrant x > 0, y > 0, which is part of the region where f(x, y) > 0. Putting y = xt (x > 0 fixed),
$$\int_0^\infty f(x,y)\,dy = \int_0^\infty \frac{x^3t}{x^4(1+t^4)}\,dt = \frac{c}{x},$$
where c is the constant $\int_0^\infty \frac{t}{1+t^4}\,dt$ (in fact c = π/4 > 0). Since $cx^{-1}$ is not integrable with respect to x over (0, ∞), it follows that f is not integrable over the quadrant, and therefore not integrable over the plane.
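The computation above can be made explicit. Substituting $u = y^2$ in the inner integral gives, for x > 0 and Y > 0,
$$\int_0^Y f(x,y)\,dy = \frac{\arctan(Y^2/x^2)}{2x}.$$
This tends to $\pi/(4x)$ as Y → ∞, confirming c = π/4. The Python sketch below compares this closed form with a direct numerical evaluation; it is an illustration only, and the helper names and the use of the midpoint rule are our own choices. Integrating π/(4x) over (ε, 1) gives (π/4) log(1/ε), which is unbounded as ε → 0. This is another way to see that f is not integrable over the quadrant.

```python
import math

def f(x, y):
    return x * y / (x ** 4 + y ** 4)

def inner_numeric(x, Y, n=50000):
    """Midpoint-rule approximation of the inner integral of f(x, .) over (0, Y)."""
    h = Y / n
    return h * math.fsum(f(x, (k + 0.5) * h) for k in range(n))

def inner_closed(x, Y):
    """Exact value of the integral of f(x, y) over 0 < y < Y for x > 0 (substitution u = y^2)."""
    return math.atan(Y * Y / (x * x)) / (2 * x)

for x in (0.5, 1.0, 2.0):
    # x times the inner integral over (0, infinity) should be close to c = pi / 4
    print(x, inner_numeric(x, 10.0), inner_closed(x, 10.0), x * inner_closed(x, 1e6))
```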

In practice, it often happens that one has no means of evaluating the repeated
integrals of f or |f |, but can nevertheless decide whether f is integrable. One technique
for this is to show that f is dominated by a simpler function which one can show to be
integrable (or that f dominates a function which one can show not to be integrable).
 
Example 8.8. Let $f(x,y) = \sin\left(\dfrac{1}{x^2+y^4}\right)\cos(x^2+y^3)$. We wish to show that f is integrable over the positive quadrant (0, ∞) × (0, ∞).

Since f is continuous in this region (although not continuous at (0, 0)), it is measurable. Moreover, f is bounded, and hence integrable over any bounded region, in particular over the square (0, 1) × (0, 1). Thus it suffices to show that f is integrable over the regions [1, ∞) × [0, ∞) and (0, 1) × (1, ∞).

Using the inequalities |sin t| ≤ |t| and |cos t| ≤ 1, it follows that $|f(x,y)| \le (x^2+y^4)^{-1}$. So it suffices to show that $(x^2+y^4)^{-1}$ is integrable over these two regions. Now
$$\int_1^\infty\left(\int_0^\infty \frac{dy}{x^2+y^4}\right)dx = \int_1^\infty\left(\int_0^\infty \frac{dz}{x^{3/2}(1+z^4)}\right)dx < \infty,$$
where we made the substitution $y = x^{1/2}z$ and used the integrability of $x^{-3/2}$ over [1, ∞) and of $(1+z^4)^{-1}$ over (0, ∞). Also,
$$\int_1^\infty\left(\int_0^1 \frac{dx}{x^2+y^4}\right)dy \le \int_1^\infty\left(\int_0^1 \frac{dx}{y^4}\right)dy = \int_1^\infty \frac{dy}{y^4} = \frac{1}{3}.$$
It follows from Tonelli’s Theorem that $(x^2+y^4)^{-1}$ is integrable over these two regions, so f is integrable over the quadrant.

Another useful technique for testing functions for integrability, and for evaluating
integrals, is to change variables. The reader will be familiar with this idea from courses
in applied mathematics and in A3 Probability, and will know that one has to take
account of the Jacobian of the transformation. The method is the extension to two
variables of Theorem 5.8. We shall state the result and give examples for polar coordinates x = r cos θ, y = r sin θ, when the Jacobian is r. This corresponds to the fact that a small rectangle with sides δr, δθ (area δr δθ) in the (r, θ)-space is transformed into an approximate rectangle of sides δr, r δθ (area r δr δθ) in the (x, y)-space.
Theorem 8.9. Let E be a measurable subset of $\mathbb{R}^2$, and f : E → R be a function. Let
$$E' = \{(r,\theta) : 0 \le r,\ 0 \le \theta < 2\pi,\ (r\cos\theta, r\sin\theta) \in E\}$$
and $g(r,\theta) = rf(r\cos\theta, r\sin\theta)$ for (r, θ) ∈ E′. Then f is integrable over E if and only if g is integrable over E′. In that case,
$$\int_E f(x,y)\,d(x,y) = \int_{E'} f(r\cos\theta, r\sin\theta)\,r\,d(r,\theta).$$
Example 8.10. In Example 8.6 we evaluated $\int_0^\infty e^{-x^2}\,dx$ using Fubini’s Theorem. Here, we shall evaluate the same integral by the more common method of polar coordinates.
Let E = (0, ∞) × (0, ∞) and $f(x,y) = e^{-(x^2+y^2)}$. Then
$$\int_0^\infty\int_0^\infty f(x,y)\,dy\,dx = \left(\int_0^\infty e^{-x^2}\,dx\right)\left(\int_0^\infty e^{-y^2}\,dy\right) = \left(\int_0^\infty e^{-x^2}\,dx\right)^2 < \infty.$$
It follows from Tonelli’s Theorem that f is integrable over E. In the notation of Theorem 8.9, $E' = \{(r,\theta) : 0 < r,\ 0 < \theta < \pi/2\}$, so it follows from Theorem 8.9 and Fubini’s Theorem that
$$\left(\int_0^\infty e^{-x^2}\,dx\right)^2 = \int_{E'} e^{-r^2}r\,d(r,\theta) = \int_0^{\pi/2}\int_0^\infty e^{-r^2}r\,dr\,d\theta = \frac{\pi}{4}.$$
This confirms that $\int_0^\infty e^{-x^2}\,dx = \sqrt{\pi}/2$.
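The polar-coordinate computation can be mirrored numerically. The Python sketch below is an illustration only. The cut-off radius 8 and the grid sizes are assumptions, chosen so that the neglected tails are below $e^{-64}$.
- `cartesian` approximates $\left(\int_0^\infty e^{-x^2}dx\right)^2$.
- `polar` approximates $\int_0^{\pi/2}\int_0^\infty e^{-r^2}r\,dr\,d\theta$. Since the polar integrand does not depend on θ, the θ-integral contributes a factor π/2.

Both should agree with π/4.

```python
import math

def midpoint(func, lo, hi, n):
    """Composite midpoint rule with n subintervals on [lo, hi]."""
    h = (hi - lo) / n
    return h * math.fsum(func(lo + (k + 0.5) * h) for k in range(n))

R = 8.0  # cut-off for the infinite ranges (assumed adequate: e^{-64} is negligible)

gauss = midpoint(lambda x: math.exp(-x * x), 0.0, R, 8000)
cartesian = gauss ** 2

# Theorem 8.9: integrate e^{-r^2} r over (0, R) x (0, pi/2); the theta-integral gives pi/2
polar = (math.pi / 2) * midpoint(lambda r: math.exp(-r * r) * r, 0.0, R, 80000)

print(gauss, math.sqrt(math.pi) / 2)
print(cartesian, polar, math.pi / 4)
```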
Example 8.11. As in Example 8.7, let $f(x,y) = \dfrac{xy}{x^4+y^4}$. In the notation of Theorem 8.9,
$$g(r,\theta) = \frac{1}{r}\cdot\frac{\sin\theta\cos\theta}{\sin^4\theta+\cos^4\theta}.$$
Since g is not integrable over [0, ∞) × [0, 2π) (because $r^{-1}$ is not integrable over [0, ∞)), f is not integrable over $\mathbb{R}^2$.
Example 8.12. Let $f(x,y) = \dfrac{x^2-y^2}{(x^2+y^2)^2}$. The square (0, 1) × (0, 1) is not very convenient for polar coordinates, but we can easily overcome this problem. Since f is bounded, hence integrable, over the bounded region $\{(x,y) : 0 < x < 1,\ 0 < y < 1,\ 1 < x^2+y^2\}$, f is integrable over the square if and only if it is integrable over the quarter-disc $E = \{(x,y) : 0 < x < 1,\ 0 < y < 1,\ x^2+y^2 \le 1\}$. In the notation of Theorem 8.9, $E' = \{(r,\theta) : 0 < r \le 1,\ 0 < \theta < \pi/2\}$ and
$$g(r,\theta) = r\cdot\frac{r^2(\cos^2\theta - \sin^2\theta)}{r^4} = \frac{\cos 2\theta}{r}.$$
Since $r^{-1}$ is not integrable over (0, 1), g is not integrable over the rectangle E′ (in (r, θ)-space), so f is not integrable over E.

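Now we state a version of Theorem 8.9 for general changes of coordinates. Let T : (u, v) ↦ (x, y) be a change of variables, and suppose that x, y are differentiable functions of u, v. Let $J_T$ be the Jacobian matrix:
$$J_T = \begin{pmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[1ex] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{pmatrix}.$$
Observe that $J_{S\circ T} = (J_S \circ T)\,J_T$ (Chain Rule).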
Theorem 8.13. Let E′ be an open subset of $\mathbb{R}^2$, let $T : E' \to \mathbb{R}^2$ be a one-to-one differentiable function of E′ onto a subset E of $\mathbb{R}^2$, and let f : E → R be a function. Then f is integrable over E if and only if $(f\circ T)\,|\det J_T|$ is integrable over E′. In that case,
$$\int_E f = \int_{E'} (f\circ T)\,|\det J_T|.$$
Writing $\frac{\partial(x,y)}{\partial(u,v)}$ for $\det J_T$, this formula becomes
$$\int_E f(x,y)\,d(x,y) = \int_{E'} f\big(x(u,v), y(u,v)\big)\left|\frac{\partial(x,y)}{\partial(u,v)}\right|d(u,v).$$
To recover Theorem 8.9 from Theorem 8.13, take $T(r,\theta) = (r\cos\theta, r\sin\theta)$, so that $\frac{\partial(x,y)}{\partial(r,\theta)} = r$.
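As a small numerical illustration of the last formula (not from the notes), the Python sketch below approximates the four partial derivatives in $J_T$ by central differences for polar coordinates. It then checks that $\det J_T \approx r$. The function `jacobian_det` and the step size h are hypothetical choices.

```python
import math

def T(r, theta):
    """Polar coordinates: (r, theta) -> (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)

def jacobian_det(fn, u, v, h=1e-6):
    """Approximate det J_fn at (u, v) using central differences for the partial derivatives."""
    xu = (fn(u + h, v)[0] - fn(u - h, v)[0]) / (2 * h)
    yu = (fn(u + h, v)[1] - fn(u - h, v)[1]) / (2 * h)
    xv = (fn(u, v + h)[0] - fn(u, v - h)[0]) / (2 * h)
    yv = (fn(u, v + h)[1] - fn(u, v - h)[1]) / (2 * h)
    return xu * yv - xv * yu

for r, theta in ((0.5, 0.3), (2.0, 1.7), (3.0, 5.0)):
    print(r, jacobian_det(T, r, theta))
```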
In the situation of Theorem 8.13, E is always measurable (continuous image of a Borel set), although this is not obvious.
One can extend Section 8 to $\mathbb{R}^n$ instead of $\mathbb{R}^2$. Moreover, for any (σ-finite) measure spaces $(\Omega_1, \mathcal{F}_1, \mu_1)$ and $(\Omega_2, \mathcal{F}_2, \mu_2)$, one can define a product $(\Omega_1\times\Omega_2, \mathcal{F}_1\otimes\mathcal{F}_2, \mu_1\times\mu_2)$ such that Fubini’s and Tonelli’s theorems hold.

9. $L^p$-spaces

A useful measure of distance between two integrable functions f and g is:
$$d_1(f,g) = \int |f-g| =: \|f-g\|_1.$$
Then
(i) $\|f\|_1 = 0$ if and only if f = 0 a.e. (Proposition 5.1(5),(7));
(ii) $\|\alpha f\|_1 = |\alpha|\,\|f\|_1$;
(iii) $\|f+g\|_1 \le \|f\|_1 + \|g\|_1$.
Consequently,
(i)′ $d_1(f,g) = 0$ if and only if f = g a.e.;
(ii)′ $d_1(g,f) = d_1(f,g)$;
(iii)′ $d_1(f,h) \le d_1(f,g) + d_1(g,h)$.
So $\|\cdot\|_1$ is almost a norm and $d_1$ is almost a metric (cf. Metric Spaces). There are two problems: we have not yet defined a suitable vector space, and $\|f\|_1 = 0$ does not imply that f is the zero function.
If we allow our integrable functions to take the values ∞ and −∞, then f + g may not be everywhere defined (but it is almost everywhere defined). Any integrable function is real-valued almost everywhere, so we will now take $\mathcal{L}^1$ to be the space of all integrable functions with real (or complex) values. Then we identify functions which are almost everywhere equal (actually, we have effectively been doing this for some time). Define an equivalence relation on $\mathcal{L}^1$ by
$$f \sim g \iff f = g \text{ a.e.}$$
Let [f] be the equivalence class of f, and $N = [0] = \{f : \mathbb{R}\to\mathbb{R} : f = 0 \text{ a.e.}\}$. Then N is a subspace of the vector space $\mathcal{L}^1$, and we can form the quotient space $L^1 := \mathcal{L}^1/N$ as a vector space whose elements are the equivalence classes [f] (cf. Linear Algebra). Let
$$\|[f]\|_1 = \int |f|.$$
Then $\|\cdot\|_1$ is well-defined, and it is a norm on $L^1$. The distinction between [f] and f is usually a distracting nuisance, so we suppress it and just write $\|f\|_1$ for the norm of f. However, it is occasionally necessary to be aware of the difference.
Now we have a notion of convergence:
$$f_n \to f \text{ in } L^1\text{-norm} \iff \lim_{n\to\infty}\|f_n - f\|_1 = 0 \iff \int |f_n - f| \to 0.$$

In probability this may be called convergence in mean. Actually, convergence in mean square is more convenient in some respects. For that, one considers the space $\mathcal{L}^2$ of all measurable functions f such that $|f|^2$ is integrable. Suppose that $f, g \in \mathcal{L}^2$. Then simple inequalities for real/complex numbers give
$$|f+g|^2 \le 2(|f|^2+|g|^2), \qquad |fg| \le \tfrac12(|f|^2+|g|^2).$$
So $f + g \in \mathcal{L}^2$ and fg is integrable. Thus $\mathcal{L}^2$ is a vector space, and we can put
$$\langle f, g\rangle_2 = \int f\bar{g}.$$
Then $\langle\cdot,\cdot\rangle_2$ is positive-semidefinite, linear in the first variable, and conjugate-symmetric, so it is almost an inner product. Again there is a small problem: $\langle f, f\rangle_2 = 0$ implies only that f ∈ N. So we form $L^2 = \mathcal{L}^2/N$, and we obtain an inner product on $L^2$. Hence, we get a well-defined norm on $L^2$ given by
$$\|[f]\|_2 = \|f\|_2 = \langle f, f\rangle_2^{1/2} = \left(\int |f|^2\right)^{1/2}.$$
Now, $\|f_n - f\|_2 \to 0$ (convergence in $L^2$-norm) corresponds exactly to convergence in mean square in the case of probability spaces.
Let’s see what happens if the indices 1 and 2 are replaced by some other real p > 0. Let $\mathcal{L}^p$ be the set of all measurable functions f such that $|f|^p$ is integrable. Since
$$|f+g|^p \le (2\max(|f|,|g|))^p = 2^p\max(|f|^p,|g|^p) \le 2^p(|f|^p+|g|^p),$$
$\mathcal{L}^p$ is a vector space. Let $L^p = \mathcal{L}^p/N$, and
$$\|f\|_p = \left(\int |f|^p\right)^{1/p}.$$
Now it is not obvious whether the triangle inequality holds.


Proposition 9.1. [Minkowski’s Inequality] For p ≥ 1 and $f, g \in L^p$, $\|f+g\|_p \le \|f\|_p + \|g\|_p$.

Proof. If f = 0 a.e. or g = 0 a.e., the inequality is trivial. So suppose that $\alpha := \|f\|_p > 0$ and $\beta := \|g\|_p > 0$.
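The function $t \mapsto t^p$ is continuous on [0, ∞), and its second derivative $p(p-1)t^{p-2}$ is non-negative on (0, ∞) (since p ≥ 1). This implies that it is convex, i.e.
$$(\lambda s + (1-\lambda)t)^p \le \lambda s^p + (1-\lambda)t^p$$
for 0 ≤ λ ≤ 1 and s, t ≥ 0. Apply this with
$$\lambda = \frac{\alpha}{\alpha+\beta}, \qquad s = \frac{|f(x)|}{\alpha}, \qquad t = \frac{|g(x)|}{\beta}.$$
This gives
$$\left(\frac{|f|+|g|}{\alpha+\beta}\right)^p \le \frac{1}{\alpha+\beta}\left(\frac{|f|^p}{\alpha^{p-1}} + \frac{|g|^p}{\beta^{p-1}}\right).$$
Now use |f + g| ≤ |f| + |g| and integrate. The right-hand side integrates to $(\alpha+\beta)^{-1}(\alpha+\beta) = 1$, so $\int |f+g|^p \le (\alpha+\beta)^p$. Taking pth roots gives the required inequality. □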
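For step functions on a uniform grid, the integrals in $\|\cdot\|_p$ are finite sums, so Minkowski’s inequality can be tested directly. The Python sketch below is an illustration under our own assumptions: step functions are represented as lists of values on consecutive subintervals of length h, and `lp_norm` and `minkowski_holds` are hypothetical helper names. It checks the inequality for random step functions and several p ≥ 1.

It also shows why the restriction p ≥ 1 matters. Take p = 1/2, with f and g the indicators of two disjoint unit intervals. Then $\|f+g\|_{1/2} = 4$, while $\|f\|_{1/2} + \|g\|_{1/2} = 2$.

```python
import random

def lp_norm(values, h, p):
    """L^p norm of the step function equal to values[k] on [k*h, (k+1)*h)."""
    return (h * sum(abs(v) ** p for v in values)) ** (1 / p)

def minkowski_holds(f, g, h, p, tol=1e-12):
    """Test ||f + g||_p <= ||f||_p + ||g||_p for step functions on the same grid."""
    lhs = lp_norm([a + b for a, b in zip(f, g)], h, p)
    return lhs <= lp_norm(f, h, p) + lp_norm(g, h, p) + tol

rng = random.Random(0)
for p in (1.0, 1.5, 2.0, 3.0, 7.0):
    for _ in range(200):
        f = [rng.uniform(-1, 1) for _ in range(20)]
        g = [rng.uniform(-1, 1) for _ in range(20)]
        assert minkowski_holds(f, g, 0.05, p)

# For p = 1/2 the triangle inequality fails:
print(minkowski_holds([1.0, 0.0], [0.0, 1.0], 1.0, 0.5))  # False
```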


So $L^p$ becomes a normed vector space whenever p ≥ 1.

We can also define a normed space for p = ∞. Let
$$\mathcal{L}^\infty(E) = \{f : E \to \mathbb{R} : f \text{ measurable and there exists } C > 0 \text{ such that } |f| \le C \text{ a.e.}\}.$$
This consists of the essentially bounded measurable functions on E. Given $f \in \mathcal{L}^\infty(E)$, define
$$\|f\|_\infty = \inf\{C > 0 : |f| \le C \text{ a.e.}\}.$$
This quantity is known as the essential supremum of |f| (sometimes denoted ess sup |f|). For $f \in \mathcal{L}^\infty(E)$, $\|f\|_\infty = 0$ if and only if f = 0 a.e. Then, just as for finite p, we can define $L^\infty(E) = \mathcal{L}^\infty(E)/N$. Setting $\|[f]\|_\infty = \|f\|_\infty$ gives a well-defined norm on $L^\infty(E)$ (exercise).
Another important inequality, proved in a similar spirit to Minkowski’s inequality, is:
Proposition 9.2. [Hölder’s Inequality] Let p, q ∈ (1, ∞) with 1/p + 1/q = 1. Let $f \in L^p$ and $g \in L^q$. Then $fg \in L^1$ and $\|fg\|_1 \le \|f\|_p\|g\|_q$.

The pair (p, q) are sometimes called Hölder conjugates. For p = q = 2, Hölder’s Inequality is the Cauchy–Schwarz Inequality. Notice also that Hölder’s inequality holds for the pair p = 1 and q = ∞.

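Proof. If $\|f\|_p = 0$ or $\|g\|_q = 0$, then fg = 0 a.e. and there is nothing to prove, so assume both are positive. Note first that the function $t \mapsto \log t$ is concave on (0, ∞), because its second derivative $-t^{-2}$ is negative. Hence, for s, t > 0,
$$\frac1p\log s + \frac1q\log t \le \log\left(\frac{s}{p} + \frac{t}{q}\right).$$
Exponentiate to obtain $s^{1/p}t^{1/q} \le \frac{s}{p} + \frac{t}{q}$; this also holds trivially if s = 0 or t = 0. Let $s = (|f(x)|/\|f\|_p)^p$ and $t = (|g(x)|/\|g\|_q)^q$. This gives
$$\frac{|fg|}{\|f\|_p\|g\|_q} \le \frac{|f|^p}{p\|f\|_p^p} + \frac{|g|^q}{q\|g\|_q^q}.$$
Integrate: the right-hand side integrates to 1/p + 1/q = 1. □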
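In the same spirit, the Python sketch below tests the inequality $s^{1/p}t^{1/q} \le s/p + t/q$ used in the proof (often called Young’s inequality). It then tests Hölder’s inequality for random step functions. This is illustrative only: step functions are lists of values on subintervals of length h, and the helper names are our own.

The sketch also checks a standard equality case. If $g = |f|^{p-1}$, then $\|fg\|_1 = \|f\|_p\|g\|_q$, since both sides equal $\int|f|^p$.

```python
import random

def lp_norm(values, h, p):
    """L^p norm of the step function equal to values[k] on [k*h, (k+1)*h)."""
    return (h * sum(abs(v) ** p for v in values)) ** (1 / p)

def conjugate(p):
    """Hölder conjugate q of p in (1, inf), so that 1/p + 1/q = 1."""
    return p / (p - 1)

def young_gap(s, t, p):
    """s/p + t/q - s^(1/p) t^(1/q), which the inequality in the proof says is >= 0."""
    q = conjugate(p)
    return s / p + t / q - s ** (1 / p) * t ** (1 / q)

def holder_sides(f, g, h, p):
    """Return (||fg||_1, ||f||_p ||g||_q) for step functions on the same grid."""
    q = conjugate(p)
    lhs = lp_norm([a * b for a, b in zip(f, g)], h, 1.0)
    return lhs, lp_norm(f, h, p) * lp_norm(g, h, q)

rng = random.Random(1)
for p in (1.2, 2.0, 3.5):
    for _ in range(200):
        assert young_gap(rng.uniform(0, 5), rng.uniform(0, 5), p) >= -1e-12
        f = [rng.uniform(-1, 1) for _ in range(20)]
        g = [rng.uniform(-1, 1) for _ in range(20)]
        lhs, rhs = holder_sides(f, g, 0.05, p)
        assert lhs <= rhs + 1e-12
```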
Corollary 9.3. If $1 \le p_1 < p_2 < \infty$ and $f \in L^{p_2}(a,b)$, then $f \in L^{p_1}(a,b)$ and
$$\|f\|_{p_1} \le (b-a)^{\frac{1}{p_1}-\frac{1}{p_2}}\|f\|_{p_2}.$$
Hence if $f_n \in L^{p_2}(a,b)$ and $\|f_n\|_{p_2} \to 0$, then $\|f_n\|_{p_1} \to 0$.

Proof. Apply Proposition 9.2 to the functions $|f|^{p_1}$ and $\chi_{(a,b)}$, with $p = p_2/p_1$. Then raise both sides to the power $1/p_1$. □
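Corollary 9.3 can also be checked numerically on an interval of finite length. The Python sketch below is illustrative: `lp_norm_interval` and `corollary_bound` are hypothetical helpers, and step functions are lists of values on equal subintervals. It compares $\|f\|_{p_1}$ with $(b-a)^{1/p_1-1/p_2}\|f\|_{p_2}$ for random step functions on (0, 3), and confirms that constant functions attain the bound.

```python
import math
import random

def lp_norm_interval(values, a, b, p):
    """L^p(a, b) norm of the step function taking values[k] on the k-th of len(values) equal subintervals."""
    h = (b - a) / len(values)
    return (h * sum(abs(v) ** p for v in values)) ** (1 / p)

def corollary_bound(values, a, b, p1, p2):
    """Right-hand side of Corollary 9.3: (b - a)^(1/p1 - 1/p2) ||f||_{p2}."""
    return (b - a) ** (1 / p1 - 1 / p2) * lp_norm_interval(values, a, b, p2)

rng = random.Random(2)
for p1, p2 in ((1.0, 2.0), (1.5, 4.0), (2.0, 3.0)):
    for _ in range(200):
        f = [rng.uniform(-2, 2) for _ in range(30)]
        assert lp_norm_interval(f, 0.0, 3.0, p1) <= corollary_bound(f, 0.0, 3.0, p1, p2) * (1 + 1e-12)

# a constant function attains the bound
const = [2.0] * 10
print(math.isclose(lp_norm_interval(const, 0.0, 3.0, 1.0), corollary_bound(const, 0.0, 3.0, 1.0, 2.0)))
```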

The inclusion $L^{p_2}(a,b) \subset L^{p_1}(a,b)$ in Corollary 9.3 is strict: consider $x^\alpha$ on (0, 1), with $-1/p_1 < \alpha \le -1/p_2$. Corollary 9.3 holds if (a, b) is replaced by any finite measure space (with b − a replaced by the total measure). However, $L^{p_1}(1,\infty)$ is not contained in $L^{p_2}(1,\infty)$ (exercise).
For p ≥ 1, $L^p$ is a normed space and hence a metric space for $d_p(f,g) = \|f-g\|_p$. How does convergence in $L^p$-norm compare with pointwise a.e. convergence?
Examples 9.4. 1. Convergence a.e. does not imply convergence in $L^p$-norm: if $f_n(x) = n^2x^n(1-x)$ (0 ≤ x ≤ 1), then $f_n(x) \to 0$ a.e., but $\|f_n\|_1 \to 1$.
2. Convergence in $L^p$-norm does not imply convergence a.e.: for $n = 2^r + k$, where $0 \le k < 2^r$, let $f_n$ be the characteristic function of $[k2^{-r}, (k+1)2^{-r}]$. Then $\|f_n\|_1 = 2^{-r} \le 2/n \to 0$, but for each x ∈ [0, 1], $f_n(x)$ takes the values 0 and 1 infinitely often.
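Both examples can be explored exactly with rational arithmetic. The Python sketch below is an illustration, and the helper names are our own. For the first example it evaluates
$$\|f_n\|_1 = n^2\Big(\frac{1}{n+1} - \frac{1}{n+2}\Big) = \frac{n^2}{(n+1)(n+2)}.$$
For the second example it computes the interval carrying $f_n$ from the decomposition $n = 2^r + k$. It then checks that the interval length $2^{-r}$ is at most 2/n. It also checks that at the point x = 1/3 the sequence $f_n(x)$ equals 1 exactly once in each block $2^r \le n < 2^{r+1}$, and hence infinitely often.

```python
from fractions import Fraction

def l1_norm_example1(n):
    """||f_n||_1 for f_n(x) = n^2 x^n (1 - x) on [0, 1], namely n^2 / ((n+1)(n+2))."""
    return Fraction(n * n, (n + 1) * (n + 2))

def typewriter_interval(n):
    """For n = 2^r + k with 0 <= k < 2^r, the interval [k 2^-r, (k+1) 2^-r] carrying f_n."""
    r = n.bit_length() - 1
    k = n - (1 << r)
    return Fraction(k, 1 << r), Fraction(k + 1, 1 << r)

def typewriter_value(n, x):
    """Value of f_n at x (1 on the closed interval, 0 elsewhere)."""
    lo, hi = typewriter_interval(n)
    return 1 if lo <= x <= hi else 0

# ||f_n||_1 = 2^-r <= 2/n
assert all(hi - lo <= Fraction(2, n) for n in range(1, 1000) for lo, hi in [typewriter_interval(n)])

# at x = 1/3 (not a dyadic rational) f_n(x) = 1 for exactly one n in each block 2^r <= n < 2^(r+1)
x = Fraction(1, 3)
hits = [n for n in range(1, 2 ** 12) if typewriter_value(n, x) == 1]
print(len(hits), float(l1_norm_example1(1000)))
```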
Theorem 9.5. Let p ∈ [1, ∞), and let $(f_n)$ be a sequence in $L^p$ which is Cauchy, i.e., for each ε > 0, there exists N such that $\|f_n - f_m\|_p < \varepsilon$ whenever m, n ≥ N. Then there exists $f \in L^p$ such that
1. there is a subsequence $(f_{n_k})$ such that $\lim_{k\to\infty} f_{n_k}(x) = f(x)$ a.e.;
2. $\lim_{n\to\infty}\|f_n - f\|_p = 0$.
Thus $L^p$ is a complete metric space.

Proof. [For p = 1.] By assumption, there exist $N_1 < N_2 < N_3 < \dots$ such that $\int |f_n - f_m| < 2^{-(r+1)}$ whenever $n, m \ge N_r$. In particular, $\int |f_{N_{r+1}} - f_{N_r}| < 2^{-(r+1)}$.
Let $g_1 = f_{N_1}$ and $g_r = f_{N_r} - f_{N_{r-1}}$ for r = 2, 3, …, so $\int |g_r| < 2^{-r}$ for r ≥ 2. By Lebesgue’s Series Theorem 6.8, $\sum_{r=1}^\infty g_r$ converges a.e. to some $f \in L^1$. Now
$$f_{N_k} = \sum_{r=1}^k g_r \to f \text{ a.e.},$$
$$\|f - f_{N_k}\|_1 = \int\left|\sum_{r=k+1}^\infty g_r\right| \le \int\sum_{r=k+1}^\infty |g_r| = \sum_{r=k+1}^\infty\int |g_r| < 2^{-k} \to 0.$$
If a Cauchy sequence has a convergent subsequence, then the whole sequence is convergent; see the Prelims proof that every Cauchy sequence in R is convergent.
For general p, the use of LST has to be replaced by Minkowski’s inequality plus Fatou’s Lemma. □
Corollary 9.6. 1. If $\|f_n - f\|_p \to 0$, then there is a subsequence $(f_{n_r})$ which converges to f a.e.
2. If $\|f_n - f\|_p \to 0$ and $f_n \to g$ a.e., then f = g a.e.
The Convergence Theorems provide situations in which a.e. convergence implies convergence in $L^p$-norm. Here is a general result in that direction, with a weaker conclusion (see the bonus sheet for a proof).
Theorem 9.7. [Egorov’s Theorem] Suppose that $f_n \to f$ a.e. Let E be a measurable set with m(E) < ∞ and let ε > 0. Then there is a measurable subset F of E with $m(E \setminus F) < \varepsilon$ such that $f_n \to f$ uniformly on F. In particular, $\|f_n - f\|_{L^p(F)} \to 0$ for all p ≥ 1.

It is very useful to identify natural dense subsets of the $L^p$ spaces. We can often establish results on dense subsets and extend them to all of $L^p$ by density (just as you often use density of Q and density of R \ Q in Prelims analysis arguments).
Theorem 9.8. Let 1 ≤ p < ∞ and $f \in L^p(\mathbb{R})$.
1. There is a sequence $(\psi_n)$ of step functions such that $\lim_{n\to\infty}\|f - \psi_n\|_p = 0$.
2. There is a sequence $(g_n)$ of continuous functions with compact support¹³ such that $\lim_{n\to\infty}\|f - g_n\|_p = 0$.

Part 1 of this result is closely related to Theorem 3.11, that measurable functions
are pointwise (a.e.) limits of sequences of step functions. For a proof when p = 1, see
Stein & Shakarchi, Theorem 2.4, p.71.
As an example of density in action, we will show that translation of a function is continuous in the $L^p$ norm. For f : R → R and h ∈ R, consider the translation $f_h(x) = f(x-h)$. As Lebesgue measure is translation invariant, it follows that $f \in L^p(\mathbb{R})$ if and only if $f_h \in L^p(\mathbb{R})$ for all h ∈ R.
Proposition 9.9. For 1 ≤ p < ∞ and $f \in L^p(\mathbb{R})$, $\lim_{h\to0}\|f_h - f\|_p = 0$.

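Proof. Given ε > 0, use Theorem 9.8(2) to find g which is continuous and of compact support such that $\|f - g\|_p < \varepsilon/3$. As g is continuous and of compact support, it is uniformly continuous. From this one obtains that $\lim_{h\to0}\|g_h - g\|_p^p = 0$. Indeed, if g vanishes outside [−R, R] and |h| ≤ 1, then $g_h - g$ vanishes outside [−R − 1, R + 1], so
$$\|g_h - g\|_p^p \le (2R+2)\sup_x|g(x-h) - g(x)|^p.$$
So one can find δ > 0 such that $\|g_h - g\|_p < \varepsilon/3$ whenever 0 < |h| < δ. Invariance of Lebesgue measure under translation gives $\|g_h - f_h\|_p = \|g - f\|_p$. Using this and Minkowski’s inequality, for 0 < |h| < δ one has
$$\|f - f_h\|_p \le \|f - g\|_p + \|g - g_h\|_p + \|g_h - f_h\|_p < 3\cdot\varepsilon/3 = \varepsilon. \qquad \square$$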

[You could equally use part (1) of Theorem 9.8, by showing that $\lim_{h\to0}\|\psi_h - \psi\|_p = 0$ for a step function ψ. This is an easy calculation: do the integral when ψ is the indicator function of an interval, and then use a triangle inequality argument.]
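For the indicator of an interval, the calculation in the bracketed remark is explicit. If ψ is the indicator of (a, b), then $|\psi_h - \psi|$ is the indicator of the symmetric difference of (a, b) and (a + h, b + h). This set has measure $2\min(|h|, b-a)$, so
$$\|\psi_h - \psi\|_p = (2\min(|h|, b-a))^{1/p} \to 0 \quad\text{as } h \to 0.$$
A short Python sketch of this formula follows; the function name is a hypothetical choice.

```python
def translation_distance(a, b, h, p):
    """||psi_h - psi||_p where psi is the indicator of (a, b) and psi_h(x) = psi(x - h)."""
    # psi_h is the indicator of (a + h, b + h); the two intervals overlap in a set of this length
    overlap = max(0.0, min(b, b + h) - max(a, a + h))
    # |psi_h - psi| is the indicator of the symmetric difference, of measure 2 * ((b - a) - overlap)
    return (2 * ((b - a) - overlap)) ** (1 / p)

for h in (0.5, 0.1, 0.01, 0.001):
    print(h, translation_distance(0.0, 1.0, h, 2.0))
```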
We end this section by revisiting the Fourier transform of an integrable function from the ASO Integral Transforms course, giving rigorous proofs of some properties from that course. Let $f \in L^1(\mathbb{R})$. The Fourier transform of f is the function $\hat f : \mathbb{R} \to \mathbb{C}$ defined by
$$\hat f(s) = \int_{\mathbb{R}} f(x)e^{-isx}\,dx.$$

Theorem 9.10. Let $f \in L^1(\mathbb{R})$.
1. $|\hat f(s)| \le \|f\|_1$ for all s.
2. $\hat f$ is continuous.
3. $\hat f(s) \to 0$ as s → ±∞. [Riemann–Lebesgue Lemma]
4. Let g(x) = xf(x). If $g \in L^1(\mathbb{R})$, then $\hat f$ is differentiable everywhere and $(\hat f)'(s) = -i\hat g(s)$.
5. If f has a continuous derivative $f' \in L^1(\mathbb{R})$, then the Fourier transform of f′ is $is\hat f(s)$.

¹³ That is, the closure of the set {x ∈ R : g(x) ≠ 0} is compact; equivalently, g vanishes outside some bounded interval.

Proof. (1) follows from $|f(x)e^{-isx}| = |f(x)|$. (2) follows from the continuous-parameter DCT (Theorem 7.2) with g(x) = |f(x)|.¹⁴
For (3), when $f = \chi_{(a,b)}$,
$$\hat f(s) = \frac{i(e^{-isb} - e^{-isa})}{s} \to 0 \quad\text{as } |s| \to \infty.$$
This extends to step functions, by linearity. Now take general $f \in L^1(\mathbb{R})$ and ε > 0. By Theorem 9.8, there is a step function φ such that $\|f - \varphi\|_1 < \varepsilon$, and there exists K such that $|\hat\varphi(s)| < \varepsilon$ whenever |s| > K. Then, using (1) for f − φ,
$$|\hat f(s)| \le |\hat f(s) - \hat\varphi(s)| + |\hat\varphi(s)| \le \|f - \varphi\|_1 + |\hat\varphi(s)| < 2\varepsilon \quad (|s| > K).$$
(4) can be proved by applying Theorem 7.5 with |g| as dominating function. (5) can be proved by using integration by parts over intervals $[a_n, b_n]$, where $a_n \to -\infty$, $f(a_n) \to 0$, $b_n \to \infty$ and $f(b_n) \to 0$. □
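The closed form for $\hat f$ when $f = \chi_{(a,b)}$ can be compared with a direct numerical evaluation of $\int_a^b e^{-isx}\,dx$. It also shows the decay quantitatively: $|\hat f(s)| \le 2/|s|$, alongside $|\hat f(s)| \le \|f\|_1 = b - a$ from part (1). The Python sketch below is an illustration only; it uses a midpoint rule, and the helper names are our own.

```python
import cmath

def fhat_indicator(a, b, s):
    """Fourier transform of the indicator of (a, b): i(e^{-isb} - e^{-isa})/s, and b - a at s = 0."""
    if s == 0:
        return complex(b - a)
    return 1j * (cmath.exp(-1j * s * b) - cmath.exp(-1j * s * a)) / s

def fhat_numeric(a, b, s, n=100000):
    """Midpoint-rule approximation of the integral of e^{-isx} over (a, b)."""
    h = (b - a) / n
    return h * sum(cmath.exp(-1j * s * (a + (k + 0.5) * h)) for k in range(n))

for s in (1.0, 10.0, 100.0, 1000.0):
    print(s, abs(fhat_indicator(0.0, 1.0, s)), 2 / s)
```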

One can alternatively prove (3) using the $L^1$-continuity of translations (Proposition 9.9), as follows. For $f \in L^1(\mathbb{R})$, making the change of variables y = x + π/s, we have
$$\hat f(s) = \int_{\mathbb{R}} f(x)e^{-isx}\,dx = \int_{\mathbb{R}} -f(x)e^{-isx-i\pi}\,dx = \int_{\mathbb{R}} -f(y-\pi/s)e^{-isy}\,dy.$$
Averaging the first and last expressions and applying Proposition 9.9 gives
$$|\hat f(s)| = \frac12\left|\int_{\mathbb{R}}\big(f(x) - f(x-\pi/s)\big)e^{-isx}\,dx\right| \le \frac12\|f - f_{\pi/s}\|_1 \to 0$$
as |s| → ∞.
The theorem about the Fourier transform of the convolution of two integrable functions (Theorem 81) is an application of Fubini/Tonelli. One can also formulate a Fourier inversion theorem (normalising appropriately) when both f and $\hat f$ are integrable; see Stein and Shakarchi, Section 2.4. In fact, Fourier inversion works particularly well in the $L^2$-setting. This will be developed further in the Fourier analysis course (and, for Fourier series, in the functional analysis course B4.2).

10. Absolutely continuous functions

This section consists of non-examinable material.


Recall from Section 4 that the Fundamental Theorem of Calculus is true for functions with a continuous derivative on [a, b] (Theorem 4.11, but proved in Prelims). However, it is false for the Cantor-Lebesgue function Φ, whose derivative exists and equals 0 a.e. on [0, 1] (Example 4.12).

¹⁴ Or, alternatively, by observing that $\hat f$ is a uniform limit of the continuous functions $\hat\varphi_n$, where $\varphi_n$ are step functions converging to f in $L^1$-norm.
The ideal Fundamental Theorem of Calculus would identify a class A of functions
F on [a, b] with both the following properties:
Rx
(i) If F ∈ A, then F is differentiable a.e., F 0 ∈ L1 (a, b), and a F 0 (y) dy = F (x)−F (a)
for all x ∈ [a, b]. Rx
(ii) If f ∈ L1 (a, b) and F (x) = a f (y) dy for x ∈ [a, b], then F ∈ A and F 0 = f a.e.

It is not obvious that such a class exists: its existence implies that the indefinite integral $F$ of an integrable function $f$ is differentiable a.e. and $F' = f$ a.e. In fact, this is true, and then $A$ is the class of all functions of the form $F(x) := c + \int_a^x f$ for some $c \in \mathbb{R}$ and some $f \in L^1(a, b)$. Remarkably, there is an intrinsic characterisation of such functions.
Let $I$ be an interval. A function $F : I \to \mathbb{R}$ is said to be absolutely continuous on $I$ if, for each $\varepsilon > 0$, there exists $\delta > 0$ such that
$$\sum_{r=1}^{n} |F(b_r) - F(a_r)| < \varepsilon$$
whenever $n \in \mathbb{N}$, $(a_r, b_r)$ $(r = 1, \dots, n)$ are disjoint subintervals of $I$ and $\sum_{r=1}^{n} (b_r - a_r) < \delta$.
If we only allowed n = 1 in this definition, we would have the definition of uniform
continuity on I. Recall from Prelims that any continuous function on [a, b] is uniformly
continuous.
Examples 10.1.
1. Recall that $F$ is Lipschitz if there exists $c$ such that $|F(y) - F(x)| \le c|y - x|$ for all $x, y$. Any Lipschitz function is absolutely continuous (take $\delta = \varepsilon/c$).
2. If $f$ is a bounded measurable function and $F(x) = \int_a^x f(y)\,dy$, then $F$ is Lipschitz.
3. The Cantor-Lebesgue function is not absolutely continuous on $[0, 1]$.
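Example 10.1.3 can be checked directly. At stage $k$ of the Cantor set construction there are $2^k$ disjoint intervals $[a_r, b_r]$ of length $3^{-k}$, so their total length $(2/3)^k$ tends to 0, yet $\Phi$ increases by exactly $2^{-k}$ across each of them. Hence $\sum_r |\Phi(b_r) - \Phi(a_r)| = 1$ for every $k$, and no $\delta$ works for $\varepsilon = 1$. The Python sketch below verifies this with exact rational arithmetic. It computes $\Phi$ from ternary digits, a standard description of the Cantor-Lebesgue function that we assume agrees with the construction of Example 4.12; the helper names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import product


def cantor(x, max_digits=200):
    """Cantor-Lebesgue function on [0, 1], computed from the ternary digits of x.

    Exact for rationals with a terminating ternary expansion (all inputs below).
    """
    x = Fraction(x)
    if x >= 1:
        return Fraction(1)
    if x <= 0:
        return Fraction(0)
    value, weight = Fraction(0), Fraction(1, 2)
    for _ in range(max_digits):
        if x == 0:
            break
        x *= 3
        digit = math.floor(x)  # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:
            # x lies in a removed middle third (or at its left end),
            # where the function is constant
            return value + weight
        value += weight * (digit // 2)  # ternary digit 2 becomes binary digit 1
        weight /= 2
    return value


def stage_intervals(k):
    """Yield the 2**k closed intervals [a, a + 3**-k] of the k-th stage of the
    Cantor set construction, where a has ternary digits d_1..d_k in {0, 2}."""
    length = Fraction(1, 3 ** k)
    for digits in product((0, 2), repeat=k):
        a = sum(Fraction(d, 3 ** (j + 1)) for j, d in enumerate(digits))
        yield a, a + length


for k in range(1, 11):
    intervals = list(stage_intervals(k))
    total_length = sum(b - a for a, b in intervals)  # equals (2/3)**k
    total_change = sum(abs(cantor(b) - cantor(a)) for a, b in intervals)  # equals 1
    print(f"k = {k:2d}  total length = {float(total_length):.5f}  "
          f"sum |Phi(b_r) - Phi(a_r)| = {total_change}")
```

Exact fractions are used because $\Phi$ is only Hölder continuous, so floating-point endpoints would blur the exact value 1 of each sum.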

Theorem 10.2. Let $f \in L^1(I)$ and $F(x) = \int_a^x f(y)\,dy$. Then $F$ is absolutely continuous on $I$.
Theorem 10.3. Let $F$ be an absolutely continuous function on $[a, b]$. Then $F$ is differentiable a.e., $F' \in L^1(a, b)$ and $F(x) - F(a) = \int_a^x F'(y)\,dy$ for all $x \in [a, b]$.
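To see Theorems 10.2 and 10.3 on a function that is absolutely continuous but not Lipschitz, consider the illustrative example $F(x) = \sqrt{x}$ on $[0, 1]$ (our choice, not from the notes). Since $F(x) = \int_0^x f(y)\,dy$ with $f(y) = 1/(2\sqrt{y}) \in L^1(0, 1)$, Theorem 10.2 shows that $F$ is absolutely continuous. However, $F'$ is unbounded, so $F$ is not Lipschitz. The Python sketch below checks numerically that $\int_0^x F'(y)\,dy = F(x) - F(0)$, using a midpoint rule (which never evaluates $F'$ at the singular endpoint), and that the difference quotients of $F$ at 0 blow up.

```python
import math


def F(x):
    return math.sqrt(x)


def F_prime(y):
    # Derivative of sqrt for y > 0: unbounded near 0, but integrable on (0, 1).
    return 0.5 / math.sqrt(y)


def integral_of_derivative(x, n=100_000):
    """Midpoint-rule approximation of the integral of F' over [0, x].
    Midpoints (k + 0.5) * dx are never 0, so the singularity is avoided."""
    dx = x / n
    return sum(F_prime((k + 0.5) * dx) for k in range(n)) * dx


for x in [0.01, 0.25, 1.0]:
    print(f"x = {x:4.2f}  integral of F' = {integral_of_derivative(x):.5f}  "
          f"F(x) - F(0) = {F(x) - F(0.0):.5f}")

# F is not Lipschitz: its difference quotients at 0 equal h**-0.5, which is unbounded.
for h in [1e-2, 1e-4, 1e-6]:
    print(f"h = {h:g}  (F(h) - F(0)) / h = {(F(h) - F(0.0)) / h:.1f}")
```

The midpoint rule converges slowly here (the error is of order $\sqrt{x/n}$) because of the singularity of $F'$ at 0; the agreement is nonetheless clear.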

One route to a proof of Theorem 10.2 is outlined in an optional exercise on the Supplementary Problem Sheet. There are various other proofs.
Theorem 10.3 is rather hard to prove. It is a remarkable theorem as differentiability
(a.e.) is inferred from an assumption that seems to be only a type of continuity. Chapter
3 of Stein and Shakarchi goes into this in detail. I recommend this for further reading
if you’ve enjoyed this course.
A corollary of Theorem 10.3 is that every Lipschitz function is differentiable a.e. Thus the Lipschitz functions are precisely the indefinite integrals of bounded measurable functions. (You may see a space of Lipschitz functions again in the functional analysis courses; you can use these ideas to identify the space of Lipschitz functions $f : [-1, 1] \to \mathbb{R}$ with $f(0) = 0$, equipped with an appropriate norm, with $L^\infty[-1, 1]$.)
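The "thus" in the preceding paragraph can be spelled out in two steps. The following is a sketch on $[a, b]$ using only Examples 10.1 and Theorem 10.3; the symbol $M$ for an a.e. bound on $f$ and $c$ for a Lipschitz constant are notation introduced here.

```latex
% (a) Indefinite integrals of bounded measurable functions are Lipschitz:
% if |f| \le M a.e. and F(x) = F(a) + \int_a^x f(t)\,dt, then for a \le x \le y \le b,
\[
  |F(y) - F(x)| = \Bigl|\int_x^y f(t)\,dt\Bigr| \le \int_x^y |f(t)|\,dt \le M\,(y - x).
\]
% (b) Conversely, let |F(y) - F(x)| \le c\,|y - x| for all x, y in [a, b].
% By Example 10.1.1, F is absolutely continuous, so by Theorem 10.3, F' exists a.e.,
% F' \in L^1(a, b) and F(x) = F(a) + \int_a^x F'(t)\,dt. At each point x where F'(x) exists,
\[
  |F'(x)| = \lim_{h \to 0} \frac{|F(x+h) - F(x)|}{|h|} \le c,
\]
% so F is the indefinite integral of F', which is bounded and measurable
% (after redefining it, say as 0, on the null set where F' does not exist).
```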
