
Prelims: Analysis II

Continuity and Differentiability

Hilary Term 2023

Paul Balister

[Preliminary version: September 25, 2022]


A note on these notes
These notes are designed to accompany the University of Oxford Prelims Analysis II
lecture course. They are adapted from, and owe much to, the notes of many previous
lecturers, in particular H.A. Priestley, Z. Qian and R. Heath-Brown. They are quite
heavily revised from the previous year’s notes, so will undoubtedly contain typos and other
mistakes. Please send any corrections or comments to Paul.Balister@maths.ox.ac.uk.

Lectures
To get the most out of the course you must attend the lectures. On the other hand,
you should also read the relevant section of the notes before attending the lecture. The
two complement each other, and having read the notes will make it easier to follow the
lectures (even if you did not follow everything in the notes), and learn more from the
lectures (even if you think you did follow everything in the notes). There will be more
explanation in the lectures than there is in the notes. On the other hand I will not put
everything on the board which is in the printed notes. In some places I have put in
extra examples which I will not have time to demonstrate in the lectures. There is some
extra material in the notes which I have put in for interest, but which I do not regard as
central to the course and will probably not be covered in the lectures. This material will
be marked as non-examinable.

Problem Sheets
The weekly problem sheets which accompany the lectures are an integral part of the
course. You will only really understand the definitions and theorems in the course by
doing the problems! I assume that week 1 tutorials are being devoted to the final sheets
from the Michaelmas Term courses. I therefore suggest that the problem sheets for this
course are tackled in tutorials in weeks 2–8, with the 8th sheet used as vacation work for
a tutorial in the first week of Trinity Term. The problem sheets contain bonus questions
‘for the enthusiasts’ — these are usually harder questions and students are not expected
to complete all, or even any, of them.

Contents

0 Summary of results from Analysis I
1 Functions and limits
2 Basic properties of limits
3 Continuity
4 The Boundedness Theorem and the IVT
5 Monotonic functions and the Continuous IFT
6 Uniform continuity
7 Uniform convergence
8 Differentiation
9 The Mean Value Theorem
10 Taylor's Theorem
11 L'Hôpital's Rule
12 The Binomial Expansion
0 Summary of results from Analysis I

I will not cover this section in the lectures as it is material you should be familiar with
from the Introduction to University Mathematics and Analysis I courses. I include it
here as a summary and reminder of things you should know. Refer to previous course
notes for more details.

Standard sets of numbers


N: the set of natural numbers¹, {1, 2, 3, . . .};
Z: the set of integers, {. . . , −2, −1, 0, 1, 2, . . .};
Q: the set of rational numbers, {p/q : p, q ∈ Z, q ≠ 0};

R: the set of all real numbers (the real line);


C: the set of all complex numbers (the complex plane, {a + ib : a, b ∈ R}).

We have
N ⊆ Z ⊆ Q ⊆ R ⊆ C.

Infinity (∞) and negative infinity (−∞) are a convenient device for expressing certain
notions concerning real numbers. They are not themselves real numbers.

Quantifiers
∀: “for all” or “for every” or “whenever”.
∃: “there exist(s)” or “there is (are)”.
Quantifiers matter. Treat them with care and respect. The order in which quantifiers
are written down is important. For example²
∀y ∈ R : ∃x ∈ R : x > y
is true as we can choose x depending on y, say x = y + 1, while
∃x ∈ R : ∀y ∈ R : x > y
is false as we need the same x to work for all y. Statements such as
'There is an x ∈ R such that x > y for all y ∈ R.'
are therefore ambiguous. Good discipline is to put quantifiers at the front of a statement
(even when written out in words), not at the back as an afterthought, and to read carefully
from left to right.
¹ Some people prefer to start from zero. It makes little difference in this course.
² I prefer to use : to separate ∀ and ∃ from the statements they are quantifying, as opposed to 's.t.', commas or spacing.

Arithmetic and ordering

The real numbers, with their usual arithmetic operations (+, −, ×, ÷) and usual ordering
(<, ≤, >, ≥), form an ordered field. See Analysis I notes for the formal details.
We define the modulus or absolute value by

         x,   if x > 0;
|x| =    0,   if x = 0;
        −x,   if x < 0.

Key facts about the modulus and inequalities:


(a) Triangle Inequality: for all a, b ∈ R, |a ± b| ≤ |a| + |b|.
(b) Reverse Triangle Inequality: for all a, b ∈ R, |a ± b| ≥ ||a| − |b|| ≥ |a| − |b|.
(c) Interval around a point: for all a, r, x ∈ R,
|x − a| < r ⇐⇒ a − r < x < a + r ⇐⇒ x ∈ (a − r, a + r).
The complex numbers C = {a + ib : a, b ∈ R} also form a field with the usual operations
+, −, × and ÷, but do not have an ordering³. We can still define the modulus or absolute
value as the real number |a + ib| = √(a² + b²). This satisfies the triangle inequality and
the reverse triangle inequality. The set {x ∈ C : |x − a| < r} is now a disc of radius r about
the point a in the complex plane.

Boundedness properties and the Completeness Axiom

A subset S of R is bounded above if there exists b ∈ R such that for all x ∈ S, x ≤ b


and bounded below if there exists a ∈ R such that for all x ∈ S, a ≤ x. A set S
is bounded if it is bounded above and below; this happens if and only if there exists
M ∈ R such that for all x ∈ S, |x| ≤ M . Note that the bounds don’t need to be in the
set S.
The notion of boundedness (but not upper or lower bounds) also applies to sets of complex
numbers: S ⊆ C is bounded if ∃M ∈ R : ∀z ∈ S : |z| ≤ M .
We assume that R satisfies the following.
Completeness Axiom. A non-empty subset S of R which is bounded above has a least
upper bound.

The least upper bound, also called the supremum, of S is denoted sup S and can easily
be seen to be unique when it exists⁴. In symbols, s = sup S satisfies
³ At least not an arithmetically useful one. Of course you could define any ordering you like on C, but it would not play well with + and ×.
⁴ Sometimes it is convenient to extend the definition of sup so that sup S = +∞ for sets that are not bounded above and sup ∅ = −∞. Similarly inf S = −∞ for sets that are not bounded below and inf ∅ = +∞. Properties (a) and (b) then still hold with the obvious ordering conventions on R ∪ {±∞}, and the standard mathematical interpretation of vacuous statements as being true.

(a) ∀x ∈ S : x ≤ s (s is an upper bound . . . )
(b) ∀b ∈ R : ((∀x ∈ S : x ≤ b) =⇒ s ≤ b) (. . . and it is the least one)
Combining (a) with the contrapositive of (b) we get the following.

Approximation property: if c < sup S then there exists an x ∈ S with c < x ≤ sup S.
The Completeness Axiom can equivalently be formulated as the assertion that a non-empty
subset of R which is bounded below has a greatest lower bound, or infimum, inf S. Re-
versing all the inequalities in the properties above for sup gives corresponding properties
for inf.
The Completeness Axiom underpins the deeper results in Analysis I and the same is true
in Analysis II.

Intervals

A subset I ⊆ R is called an interval if whenever I contains two points, it also contains


all points between them. In symbols:
∀x, y, z ∈ R : ((x, z ∈ I and x ≤ y ≤ z) =⇒ y ∈ I). (Interval property)
One can prove using the completeness axiom (exercise, see Problem Sheet 1, Q1) that
every interval is of one of the following forms:
∅ := {} (−∞, ∞) := R
(a, b) := {x ∈ R : a < x < b} (−∞, b) := {x ∈ R : x < b}
(a, b] := {x ∈ R : a < x ≤ b} (−∞, b] := {x ∈ R : x ≤ b}
[a, b) := {x ∈ R : a ≤ x < b} (a, ∞) := {x ∈ R : x > a}
[a, b] := {x ∈ R : a ≤ x ≤ b} [a, ∞) := {x ∈ R : x ≥ a}
An interval is called non-trivial if it has infinitely many points, i.e., it is not empty
(∅) and not a singleton set ([a, a] = {a}). Intervals on the left in the above table are
all bounded, the ones on the right are unbounded. Intervals of types ∅, (a, b), (−∞, b),
(a, ∞) and R are called open. Intervals of types ∅, [a, b], (−∞, b], [a, ∞) and R are called
closed — we will see why later.

Limits of sequences

A sequence of real (respectively complex, integer, . . . ) numbers is a function a : N → R


(respectively N → C, N → Z, . . . ) which assigns to each natural number n a real
(respectively complex, integer, . . . ) number a(n), which in this context is more usually
denoted an. We denote⁵ a sequence as (a1, a2, a3, . . .), or (an)n∈N, or (an)_{n=1}^∞, or more
usually we just abbreviate it as (an).
⁵ This is consistent with the notation for an ordered pair/n-tuple/vector (a1, a2) or (a1, . . . , an), which can be thought of as a function a : {1, . . . , n} → R giving a real value for each 'coordinate'. Not to be confused with the set {a1, . . . , an}, where the order does not matter and repetitions are ignored.

Terms such as boundedness, supremum, etc. when applied to sequences refer to the set
{an : n ∈ N} of values taken by the sequence.
The key definition in Analysis I is that of a limit of a sequence: a sequence (an) of real
or complex numbers tends to (or converges to) the limit ℓ ∈ R or C if⁶
∀ε > 0 : ∃N ∈ N : ∀n > N : |an − ℓ| < ε. (1)
We then write an → ℓ as n → ∞ or limn→∞ an = ℓ. We say (an ) converges if there
exists ℓ ∈ R or C such that an → ℓ, otherwise we say (an ) diverges.
Important fact: the limit of a convergent sequence is unique.
Useful fact: a convergent sequence is always bounded.
Sometimes, in the definition (1) of a limit, it is neater to work with the condition n ≥ N,
or require only that N ∈ R. This makes no difference as one can adjust N by 1 or replace
N ∈ R with⁷ ⌊N⌋ respectively (as n > N if and only if n > ⌊N⌋).
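Non-examinable aside: definition (1) can be exercised numerically. The Python sketch
below uses the sequence an = n/(n + 1) → 1 and the explicit witness N = ⌈1/ε⌉; both the
sequence and the choice of N are sample assumptions for the illustration, not part of the
course material.

    import math

    # Sketch: witnessing the limit a_n = n/(n+1) -> 1 from definition (1).
    # For each eps we exhibit N = ceil(1/eps); then n > N gives
    # |a_n - 1| = 1/(n+1) < eps.
    def a(n):
        return n / (n + 1)

    for eps in [0.1, 0.01, 0.001]:
        N = math.ceil(1 / eps)
        assert all(abs(a(n) - 1) < eps for n in range(N + 1, N + 1000))
        print(f"eps={eps}: N={N} works on the sampled range")

Of course a finite check proves nothing by itself; the assertion holds for all n > N by the
inequality in the comment.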
We say a sequence (an ) of real numbers tends to infinity if
∀M ∈ R : ∃N ∈ N : ∀n > N : an > M.
We then use the notation an → ∞ or limn→∞ an = ∞. A similar definition exists for
an → −∞.
Warning. If an → ±∞ we do not say that an converges. Also note that if (an) does not
converge, it does not imply that an → ±∞ (e.g., it might oscillate). Finally, an → ±∞
only makes sense for real sequences⁸ as the definition uses ordering, which is not defined
on C.
Important fact: whether or not a sequence converges, and what its limit is, does not
depend on the first few terms. Thus we only need the sequence to be defined from some
point onwards — we don’t have to start with a1 .
Algebra of Limits (AOL) (Real or Complex sequences): If an → a and bn → b as
n → ∞ then |an | → |a|; an ± bn → a ± b; an bn → ab; and, provided b ̸= 0, an /bn → a/b.
Also constant sequences converge: if all cn = c then cn → c.
Important fact: bn → b ̸= 0 implies that from some point onwards bn ̸= 0 (which is
needed in the proof that an /bn → a/b).
AOL results apply when the limits are in R or C. Generalisations to real sequences which
tend to ±∞ need care (and separate proofs even when they work, see later).
Limits preserve weak inequalities: If an → a and bn → b and an ≤ bn then a ≤ b.
(Also applies with a and/or b replaced by ±∞ with the obvious ordering conventions.)
Warning. This only applies to weak inequalities: an < bn does not imply lim an < lim bn.
Also, as inequalities are involved, this applies to real sequences only, as does the following.
⁶ And to fully make sense of this definition requires prior attendance of the Analysis I course!
⁷ The floor function ⌊x⌋ = max{n ∈ Z : n ≤ x} is x rounded down and the ceiling function ⌈x⌉ = min{n ∈ Z : n ≥ x} is x rounded up to the next integer. These are well-defined: see Analysis I.
⁸ Although |an| → ∞ makes perfect sense for complex sequences as then |an| is real.

A real sequence (an) is increasing⁹ (respectively strictly increasing, decreasing,
strictly decreasing) if m < n implies am ≤ an (respectively am < an, am ≥ an,
am > an).

Monotone limits: If (an ) is an increasing sequence of real numbers that is bounded


above, then it converges and limn→∞ an = sup{an : n ∈ N}. If (an ) is a decreasing
sequence of real numbers that is bounded below, then it converges and limn→∞ an =
inf{an : n ∈ N}.

If (an) is increasing and not bounded above, then¹⁰ an → +∞. Similarly, if (an) is
decreasing and not bounded below then an → −∞.

Sandwiching: If an ≤ bn ≤ cn and an → ℓ, cn → ℓ, then bn → ℓ.

To prove a version of sandwiching which also works for complex sequences we note the
following.

Observation (for real or complex sequences): an → 0 iff¹¹ |an| → 0.


Indeed, it is enough to note that ||an | − 0| = |an − 0| in the definition (1) of a limit.

Sandwiching, alternative form: If an → ℓ, |bn − an | ≤ rn and rn → 0, then bn → ℓ.

Proof. 0 ≤ |bn − an | ≤ rn (for real rn ) and rn → 0 implies |bn − an | → 0 by sandwiching.


Now |bn − an | → 0 =⇒ bn − an → 0 =⇒ bn = an + (bn − an ) → ℓ + 0 = ℓ by AOL.

Subsequences

A subsequence of a sequence (an ) is a sequence (asn ) = (as1 , as2 , . . . ) where (sn ) is a


strictly increasing sequence of natural numbers (i.e., s1 < s2 < · · · ).

Limits of subsequences: If an → ℓ as n → ∞ then, for any subsequence (asn ) of (an ),


asn → ℓ as n → ∞.

This result is often used in the form of the contrapositive: to show a sequence does not
converge it is enough to exhibit two subsequences that converge to different limits, or
find one that does not converge at all (e.g., because it tends to ±∞).

The following is one of the main theorems of Analysis I — we will be needing it!

Theorem 0.1 (Bolzano–Weierstrass Theorem). A bounded sequence of real or complex
numbers has a convergent subsequence.
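Non-examinable aside: the usual proof repeatedly bisects an interval containing the
sequence, keeping a half that still contains infinitely many terms. The Python sketch
below runs this idea on the bounded sequence an = (−1)ⁿ(1 + 1/n); the 'infinitely many
terms' test is replaced by a crude heuristic (checking a few terms of very large index), so
this is a demonstration of the idea on one sample sequence, not a proof.

    def a(n):
        return (-1) ** n * (1 + 1 / n)

    BIG = 10 ** 6    # heuristic stand-in for "infinitely many terms"

    def has_tail_terms(lo, hi):
        # Does [lo, hi] contain terms a(n) of very large index? (heuristic)
        return any(lo <= a(n) <= hi for n in range(BIG, BIG + 4))

    def next_index(lo, hi, start):
        # Smallest n > start with a(n) in [lo, hi]; exists for the halves
        # kept below, since they all contain terms of arbitrarily large index.
        n = start + 1
        while not (lo <= a(n) <= hi):
            n += 1
        return n

    lo, hi, idx = -2.0, 2.0, 0
    for _ in range(15):
        mid = (lo + hi) / 2
        if has_tail_terms(lo, mid):
            hi = mid      # keep the left half
        else:
            lo = mid      # otherwise keep the right half
        idx = next_index(lo, hi, idx)
        print(idx, a(idx))    # the extracted subsequence: it tends to -1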
⁹ Sometimes the term non-decreasing is used in place of increasing to emphasise that it is not necessarily strictly increasing. Similarly non-increasing is the same as (not necessarily strictly) decreasing.
¹⁰ This is one good reason to extend the definitions of inf and sup as in footnote 4 above.
¹¹ If and only if.

Cauchy sequences

A real or complex sequence (an ) is a Cauchy sequence if

∀ε > 0 : ∃N ∈ N : ∀n, m > N : |an − am | < ε.

Theorem 0.2 (Cauchy Convergence Criterion or General Principle of Convergence). A
sequence of real or complex numbers converges if and only if it is a Cauchy sequence.

The Cauchy convergence criterion is extremely useful when we want to show something
converges, but don’t know what the limit should be.

Series

Given a sequence (ak), the series ∑ ak or ∑_{k=1}^∞ ak is defined to be the limit (if it exists)
of the sequence (sn) of partial sums¹² sn := ∑_{k=1}^n ak as n → ∞.

Useful fact: If ∑ ak converges, then ak → 0.

A series ∑ ak is absolutely convergent if ∑ |ak| converges. Note that as ∑_{k=1}^n |ak|
is increasing in n, it is either bounded, so converges (∑_{k=1}^∞ |ak| = ℓ < ∞), or tends to
infinity (∑_{k=1}^∞ |ak| = ∞).

Important fact: absolute convergence implies convergence (for both real and complex
series).

Infinite triangle inequality: If ∑ |ak| converges then |∑_{k=1}^∞ ak| ≤ ∑_{k=1}^∞ |ak|.

[This applies to complex as well as real series. The proof starts with induction on n
to deduce the finite triangle inequality |∑_{k=1}^n ak| ≤ ∑_{k=1}^n |ak|. We then take limits,
using 'absolute convergence implies convergence' and 'limits preserve weak inequalities'
(exercise: write out the details!).]

Tests for convergence of series


Comparison Test: if 0 ≤ an ≤ bn and ∑ bn converges, then ∑ an converges.

Often this is applied to |an| (an can now be complex) to first show ∑ |an| converges:

Comparison Test⁺: if |an| ≤ bn and ∑ bn converges, then ∑ an converges (absolutely).

Ratio Test: assume an ≠ 0 are real or complex and |an+1/an| → ℓ as n → ∞. If
0 ≤ ℓ < 1 then ∑ an converges; if ℓ > 1 or ℓ = ∞ then ∑ an diverges.
¹² When working both with individual terms of the series and with the partial sums of the series it is sensible to use different dummy variables: here we use k for the first and n for the second.

[No information is obtained when ℓ = 1. The Ratio Test is quite weak as |an+1 /an |
converging is a rather strong assumption. But when it works it is often easy to apply.]
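Non-examinable aside: applied to an = xⁿ/n!, the Ratio Test gives |an+1/an| = |x|/(n + 1) → 0 < 1,
so the series for eˣ converges for every x. A quick Python check of the ratios (the choice
x = 10 is an arbitrary sample value):

    from math import factorial

    # Sketch: |a_{n+1}/a_n| for a_n = x^n/n! equals x/(n+1) and tends to 0.
    x = 10.0
    for n in (1, 9, 17, 25, 33):
        ratio = abs(x**(n + 1) / factorial(n + 1)) / abs(x**n / factorial(n))
        print(n, ratio)   # prints x/(n+1): 5.0, 1.0, 0.555..., ...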

Alternating Series Test: if (an) is a real (non-negative) decreasing sequence and an → 0
then ∑ (−1)ⁿ an converges.

[Strictly speaking, non-negativity follows automatically from the other conditions. This
is the only general test listed here that can show convergence for series that don't
converge absolutely.]
Integral Test: if f : [1, ∞) → R is a non-negative decreasing function, then ∑_{k=1}^∞ f(k)
converges iff ∫_1^n f(x) dx converges as n → ∞.

[A very powerful test for series with slowly decreasing positive terms. Often needs the
Comparison Test as well, so as to compare with something that one can integrate explicitly.]

Power series

A power series is a series of the form ∑ an xⁿ, where we consider x a (real or complex)
parameter that can be varied. It can be used to define a function f(x) := ∑_{n=0}^∞ an xⁿ
whenever this series converges.

The radius of convergence (ROC) of the power series ∑ an xⁿ is defined by¹³

R := sup{|x| : ∑ an xⁿ converges}, if this set is bounded;
R := +∞, otherwise.

[Sometimes the definition is given in terms of ∑ |an xⁿ| converging — it makes no differ-
ence. It also makes no difference if x is allowed to be complex, or is restricted to real
values — you get the same value of R as a consequence of the following theorem.]

Theorem 0.3. If ∑ an xⁿ is a power series with ROC R and x ∈ C, then
(a) if |x| < R then ∑ an xⁿ converges (and in fact it converges absolutely);
(b) if |x| > R then ∑ an xⁿ diverges (and in fact the terms an xⁿ are unbounded).
Proof. (a): As |x| < R there exists a y with |x| < |y| < R and ∑ an yⁿ converging (by the
approximation property of sup, or by the unboundedness of the set of convergence when
R = ∞). But then an yⁿ → 0, so in particular (an yⁿ) is bounded, say |an yⁿ| ≤ M. Now
∑ |an xⁿ| ≤ ∑ M(|x|/|y|)ⁿ converges by comparison with a geometric series. Absolute
convergence now implies convergence of ∑ an xⁿ.

(b): As |x| > R there exists a y with R < |y| < |x|. Assume ∑ an xⁿ converges.
Then an xⁿ → 0, so (an xⁿ) is bounded, say |an xⁿ| ≤ M (which from now on is all we
shall assume). As in the proof of (a) this implies that ∑ an yⁿ converges (absolutely),
contradicting the definition of R.
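Non-examinable aside: for the geometric series ∑ xⁿ the ROC is R = 1, and both parts of
Theorem 0.3 are visible numerically. A small Python sketch (the sample points 0.9 and
1.1 are arbitrary choices):

    # Sketch: partial sums of the geometric series sum x^n.  For |x| < 1 they
    # settle near 1/(1 - x); for |x| > 1 the terms x^n are unbounded and the
    # partial sums blow up.
    def partial_sum(x, n):
        return sum(x**k for k in range(n + 1))

    for x in (0.9, 1.1):
        print(x, [round(partial_sum(x, n), 3) for n in (10, 50, 100)])
    # x = 0.9: values approach 1/(1 - 0.9) = 10
    # x = 1.1: values grow without bound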
¹³ Another good reason to adopt the extension of the definition of sup in footnote 4 above.

1 Functions and limits

Functions

Analysis II is a course about functions. Given two sets X and Y (which will usually
be subsets of R in this course), a function f : X → Y assigns to each element x of the
set X an element f (x) of the set Y . Sometimes we also write x 7→ f (x). We call X the
domain of f, or dom(f); and Y the codomain¹⁴ of f, or codom(f). The image of f is
f (X) := {f (x) : x ∈ X}, i.e., the set of values that are actually achieved by f . This is a
subset, possibly a proper subset, of the codomain Y .
There is no expectation here that the mapping x 7→ f (x) has to be specified by a single
formula, or even a formula at all. Specification of a function ‘by cases’ or by complicated
rules will be common in this course — the modulus function is one example of this. Thus
we shall allow our examples to include functions like the following:
1. f : (0, 1] → R defined by f(x) := 1/q if x is rational with x = p/q in lowest terms,
and f(x) := 0 otherwise.

2. f : R → (−π/2, π/2) defined by tan(f(x)) = x. (This defines arctan.)

3. f : R → R defined by f(x) := ∑_{n=0}^∞ ((−1)ⁿ/(n!)²)(x/2)²ⁿ. (This is the Bessel function J₀(x).)

4. For x ∈ (−∞, 2.512), define a0 = x and inductively an+1 = e^(an/2) − 1 for n ≥ 0. Then
set f(x) := limn→∞ 2ⁿ an. (A bizarre function that satisfies f(x) = 2f(e^(x/2) − 1).)

We want to encompass the familiar functions of everyday mathematics: polynomials;


exponential functions; trigonometric functions; hyperbolic functions — all of which can
be defined on the whole of R. We shall also encounter associated inverse functions, log-
arithms; arcsin, etc. You will know from Analysis I that many of these functions can be
defined using power series. One of our objectives in Analysis II will be to develop prop-
erties of functions defined by power series (continuity, differentiability, useful inequalities
and limits, . . . ). But until our general theory of functions has been developed far enough
to cover this material we shall make use of the standard properties we need of standard
functions in our examples.
The material in this section is unashamedly technical, but necessary if we are to build
firm foundations for the study of real-valued functions defined on subsets of R, many of
them having graphs neither you nor any computer software can hope to sketch effectively.

Limit points

We want to define what is meant by the limit of a function. Intuitively f has a limit ℓ at
the point p if the values of f (x) are close to ℓ when x is close to (but not equal to) p. But
¹⁴ Some authors use range in place of the codomain, but others use range to mean the image. I will therefore avoid using this term.

for the definition of limit to be meaningful it is necessary that f is defined at ‘enough’
points close to p. So we are interested only in points p that x can get close to, where x
is in the domain of f . This leads us to the definition of a limit point.
Definition. Let E ⊆ R. A point p ∈ R is called a limit point (or cluster point or
accumulation point) of E if E contains points ̸= p arbitrarily close to p. Formally:

∀ε > 0 : ∃x ∈ E : 0 < |x − p| < ε.

Here p may be in E, but need not be. Note that the condition 0 < |x − p| is important
in the case that p ∈ E as we want points close to p that are not equal to p.

Example 1.1. Let E = (a, b] where a < b. Then p is a limit point of E if and only if
p ∈ [a, b]. To prove this, there are four cases to consider. If p < a take ε := |p − a| and
get a contradiction. If p > b take ε := |p − b|, similarly. If p ∈ [a, b), given ε > 0 choose
x = p + ½ min{ε, |p − b|}. If p = b (or indeed any p ∈ (a, b]) we can take
x = p − ½ min{ε, |p − a|}.
[Figure: the set E = (a, b] on the real line; its set of limit points is [a, b].]
The same conclusion holds when E = (a, b), E = [a, b) or E = [a, b].

Definition. A set E is called closed if it contains all its limit points. A set E is called
open if it is the complement of a closed set, or equivalently:

∀p ∈ E : ∃δ > 0 : (p − δ, p + δ) ⊆ E.

Exercise. Check the ‘equivalently’ condition is indeed equivalent, and that the definitions
of open and closed are consistent with the terminology used for intervals on page 3.

Example 1.2. Let E = {1/n : n ∈ N}. Here p is a limit point of E if and only if p = 0.
Indeed, if p < 0 or p > 1 take ε = |p| or |p − 1| respectively. If 1/(n+1) < p < 1/n take
ε = min{p − 1/(n+1), 1/n − p}. If p = 1/n then take ε = 1/(n(n+1)) and note that there
is no other point of E within distance ε of p. If p = 0 then for any ε > 0 pick n > 1/ε so
that |1/n − 0| < ε.
[Figure: the set E = {1/n} in (0, 1] accumulating at 0; its only limit point is 0.]

Definition. An isolated point of a set E is a point of E that is not a limit point of E,


i.e., it is a point p ∈ E such that for some δ > 0, (p − δ, p + δ) ∩ E = {p}.
For example, all the points of E in Example 1.2 are isolated points.

Example 1.3. Let E = Q. Then for any point p ∈ R, p is a limit point of E as there are
rationals in any set of the form (p, p + ε). Note that none of the points in Q are isolated.
A similar argument also applies to the set E = R \ Q of irrational numbers.
[Figure: E = Q; every point of R is a limit point.]

The notion of limit point is important well beyond the present course, in which we shall
encounter only simple instances of it. Much more exotic examples exist. The structure

of the real line is rich, with R having many subsets which are very complicated. Such
complexities are important in topology and measure theory for example. The following
gives a simple criterion for limit points.

Proposition 1.4 (Limit points in terms of sequences). A point p ∈ R is a limit point of


E ⊆ R if and only if there exists a sequence (pn ) of points with pn ∈ E, pn ̸= p, such that
limn→∞ pn = p.

Proof. If p is a limit point of E then for any n ∈ N choose ε := n1 . Then there exists
pn ∈ E such that 0 < |pn − p| < n1 . Now pn → p as n → ∞ (by sandwiching), and pn ∈ E
and pn ̸= p (by assumption).

Conversely, if such a sequence (pn ) exists, given ε > 0, ∃N ∈ N : ∀n ≥ N : |pn − p| < ε.


So in particular pN ∈ E and 0 < |pN − p| < ε as pN ̸= p.

Corollary 1.5. If E ⊆ R is closed and pn ∈ E with pn → p ∈ R as n → ∞, then p ∈ E.

Proof. Either p = pn ∈ E for some n, so p ∈ E; or p ̸= pn for all n in which case p is a


limit point of E by Proposition 1.4, and hence in E as E is closed.

Proposition 1.4 together with Example 1.3 gives the following useful consequences.

• Given x ∈ R, there exists a sequence (rn ) of rational numbers such that rn → x.

• Given x ∈ R, there exists a sequence (qn ) of irrational numbers such that qn → x.
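Non-examinable aside: the first bullet can be made completely explicit by truncating the
decimal expansion, as in the Python sketch below (the target x = π is an arbitrary sample
choice).

    from fractions import Fraction
    import math

    # Sketch: r_n = (x truncated to n decimal places) is rational and
    # satisfies |r_n - x| <= 10^(-n), so r_n -> x.
    x = math.pi
    for n in range(1, 6):
        r = Fraction(int(x * 10**n), 10**n)
        print(r, float(abs(r - x)))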

Limits of functions

Now we come to the most important definition in this course.

Definition. Let E ⊆ R and f : E → R be a real-valued function. Let p be a limit point


of E and let ℓ ∈ R. We say that f tends to (or converges to) ℓ as x tends to p if

∀ε > 0 : ∃δ > 0 : ∀x ∈ E : (0 < |x − p| < δ =⇒ |f (x) − ℓ| < ε). (2)

In words: given any ε > 0 we can find a δ > 0 such that f (x) will be within distance ε
of ℓ for any x ∈ E, x ̸= p, that is within distance δ of p.

[Figure: the graph of f lying within the band (ℓ − ε, ℓ + ε) for all x ∈ (p − δ, p + δ),
except possibly at p. You are given ε; you need to find δ.]
We also write this as limx→p f(x) = ℓ or f(x) → ℓ as x → p. If one needs to emphasise
the domain E we can write this more formally as

lim_{x→p, x∈E} f(x) = ℓ.

We say f(x) converges as x → p if p is a limit point of E and limx→p f(x) = ℓ for some
ℓ ∈ R. Otherwise we say f(x) diverges as x → p.

Note that, in the definition, δ may, and almost always will, depend on ε.
Important note. In the limit definition it may or may not happen that f is defined
at p. And when f(p) is defined, its value has no influence on whether or not limx→p f(x)
exists. Moreover, when the limit ℓ does exist and f(p) is defined, there is no reason to
assume that f(p) will equal ℓ.

Example 1.6. Let α > 0. Consider the function f(x) = |x|^α sin(1/x) on the domain
E := R \ {0}. We claim that f(x) → 0 as x → 0. Since |sin θ| ≤ 1 for any θ ∈ R, we have
||x|^α sin(1/x)| ≤ |x|^α for any x ≠ 0. For any ε > 0, choose δ := ε^(1/α) > 0. Then for
0 < |x − 0| < δ,

||x|^α sin(1/x) − 0| ≤ |x|^α < δ^α = ε.

According to the definition, |x|^α sin(1/x) → 0 as x → 0.
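Non-examinable aside: the recipe δ := ε^(1/α) can be spot-checked numerically, as in this
Python sketch (α = 1/2 and ε = 10⁻³ are sample values; a finite check is of course no
substitute for the proof above).

    import math

    # Sketch: with alpha = 0.5 and eps = 1e-3, delta = eps^(1/alpha) = 1e-6;
    # every sampled x with 0 < |x| < delta satisfies ||x|^alpha sin(1/x)| < eps.
    alpha, eps = 0.5, 1e-3
    delta = eps ** (1 / alpha)
    xs = [delta * t for t in (0.999, 0.5, 0.01, 1e-6)]
    assert all(abs(abs(x)**alpha * math.sin(1 / x)) < eps for x in xs)
    print("delta =", delta, "passes on the sampled points")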

Example 1.7. Let f be defined on E = R \ {0} by f(x) = 42.
Then 0 is a limit point of E. Let ℓ = 42. Then for x ≠ 0 we have
|f(x) − ℓ| = 0. So, for any ε > 0, we can take δ = 1, say, to get that

0 < |x − 0| < δ =⇒ |f(x) − ℓ| = 0 < ε.

So f(x) → 42 as x → 0.

Example 1.8. Let f be defined on R by

          x,   if x ∈ Q, x ≠ 0;
f(x) :=   2,   if x = 0;
         −x,   otherwise.

We claim that f(x) → 0 as x → 0. To prove this, simply note that |f(x) − 0| = |x| < ε
if 0 < |x − 0| < δ := ε. (Here, following the definition of limit, we omit consideration of
f(0), even though f is defined at 0.)
Example 1.9. Consider the function f(x) = x² on the domain E = R. Let a ∈ R. We
claim that f(x) → a² as x → a.

Note that |x² − a²| = |x − a||x + a|. We want this to be small when x is close to a.
Suppose that |x − a| < 1. Then

|x + a| = |x − a + 2a| ≤ |x − a| + |2a| < 1 + 2|a|.

So given ε > 0, choose δ := min{ε/(1 + 2|a|), 1} > 0. Then if 0 < |x − a| < δ we have

|x² − a²| ≤ |x − a|(1 + 2|a|) < δ(1 + 2|a|) ≤ ε.

This example serves to illustrate that going back to first principles to establish the limiting
value of a function may be a tedious task. Help will soon be at hand.
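Non-examinable aside: a quick numerical spot-check of the δ found in Example 1.9, with
sample values a = 3 and ε = 10⁻⁴ chosen for the illustration.

    # Sketch: delta = min(eps/(1 + 2|a|), 1) from Example 1.9; every sampled
    # x with 0 < |x - a| < delta satisfies |x^2 - a^2| < eps.
    a, eps = 3.0, 1e-4
    delta = min(eps / (1 + 2 * abs(a)), 1.0)
    xs = [a + delta * t for t in (0.999, -0.999, 0.5, -1e-3)]
    assert all(abs(x**2 - a**2) < eps for x in xs)
    print("delta =", delta, "passes on the sampled points")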

Remark. We saw in Example 1.9 that when considering a limit x → p we can restrict
attention to x close to p, say |x − p| < δ0 . Any subsequent δ that we find then just has
to be replaced by min{δ, δ0 } in definition (2) to make it work for all x.

Why do we not consider f(p)? One of our main motivations for considering function
limits stems from differential calculus. The recipe from school calculus of the derivative
of f can be cast in the form

(d/dx) f(x) := lim_{δx→0} (f(x + δx) − f(x)) / δx.

Clearly here we need δx to be non-zero as otherwise the quotient is undefined. To
provide a uniform and consistent theory of limits that includes this case, we therefore
systematically exclude f(p) from consideration.

The following result validates our definitions and notation. Compare with the corre-
sponding result for sequences and its proof.

Proposition 1.10 (Uniqueness of function limits). Let f : E → R and p be a limit point


of E. If f has a limit as x → p, then this limit is unique.

Proof. Suppose f(x) → ℓ1 and also f(x) → ℓ2 as x → p, where ℓ1 ≠ ℓ2. We now apply
the definition of a limit with ε := ½|ℓ1 − ℓ2| > 0:

∃δ1 > 0 : ∀x ∈ E : (0 < |x − p| < δ1 =⇒ |f (x) − ℓ1 | < ε),

∃δ2 > 0 : ∀x ∈ E : (0 < |x − p| < δ2 =⇒ |f (x) − ℓ2 | < ε).


Let δ := min{δ1 , δ2 } > 0. Since p is a limit point of E and δ > 0, ∃x ∈ E such that
0 < |x − p| < δ. Then for this x both |f (x) − ℓ1 | < ε and |f (x) − ℓ2 | < ε hold, and so

|ℓ1 − ℓ2 | = |(f (x) − ℓ2 ) − (f (x) − ℓ1 )| [add and subtract technique]


≤ |f (x) − ℓ2 | + |f (x) − ℓ1 | [triangle inequality]
< ε + ε
= |ℓ1 − ℓ2 |, [choice of ε]

and we have a contradiction.

Why do we need limit points? Note how the above proposition used the fact that
p was a limit point of E. Indeed, if p was not a limit point then limx→p f (x) = ℓ would
hold vacuously for every ℓ ∈ R as we could just take δ small enough so that no point of
E satisfied 0 < |x − p| < δ. Thus we need p to be a limit point to make the definition of

limits non-trivial. In particular, when we say f (x) converges as x → p, we always insist
that p is a limit point (see Problem Sheet 1, Q5(d) for a case when this is important).

Notice that all the examples presented so far have shown that function limits do exist.
Now let’s explore how to prove that a limit fails to exist. The proof of the following result
illustrates how to work with the contrapositive of the limit definition. The proposition
translates questions about function limits to questions about sequence limits, and vice
versa, and so allows to draw on results from Analysis I. Note the care needed to handle
the x ̸= p condition.

Proposition 1.11 (Function limits via sequences). Let f : E → R where E ⊆ R, and


assume p is a limit point of E. Then the following are equivalent.
(a) limx→p f (x) = ℓ.
(b) limn→∞ f (pn ) = ℓ for all sequences (pn ) with pn ∈ E, pn ̸= p and limn→∞ pn = p.
Proof. Suppose limx→p f (x) = ℓ and fix ε > 0. Then there exists a δ > 0 such that

∀x ∈ E : (0 < |x − p| < δ =⇒ |f (x) − ℓ| < ε).

Now suppose (pn ) is a sequence in E, with pn → p and pn ̸= p. Then, taking the ε in the
definition (1) of convergence of a sequence to be this δ, we have

∃N ∈ N : ∀n > N : |pn − p| < δ.

Putting the conditions together and using the fact that pn ∈ E and pn ̸= p (so 0 < |pn −p|)
we get
∃N ∈ N : ∀n > N : |f (pn ) − ℓ| < ε.
As this holds for any ε > 0, limn→∞ f (pn ) = ℓ by definition.

Conversely, suppose f(x) ̸→ ℓ as x → p. Then¹⁵

∃ε > 0 : ∀δ > 0 : ∃x ∈ E : (0 < |x − p| < δ and |f(x) − ℓ| ≥ ε).

Fix such an ε > 0 and choose δ := 1/n. Then ∃pn ∈ E with 0 < |pn − p| < 1/n and

|f (pn ) − ℓ| ≥ ε.

Thus we have found a sequence pn ∈ E, pn ̸= p, pn → p, for which f (pn ) ̸→ ℓ as


required.

Proposition 1.11 can be used to show that a limit limx→p f (x) does not exist by finding
two rival values for the limit, assuming it did exist.
¹⁵ Note how the negation is obtained by swapping ∀s and ∃s and negating the final statement, keeping the quantifiers in the same order.

Example 1.12. Consider the function f defined in Example 1.8, namely

          x,   if x ∈ Q, x ≠ 0;
f(x) :=   2,   if x = 0;
         −x,   otherwise.

We claim that, for any p ̸= 0, the limit limx→p f (x) fails to exist.

Assume p ̸= 0. Then as p is a limit point of Q\{0} (Example 1.3 with trivial modification
to avoid 0) there exists (by Proposition 1.4) a sequence (pn ) such that pn ∈ Q\{0}, pn ̸= p
and pn → p. Similarly there exists a sequence (qn ) such that qn ∈ R \ Q, qn ̸= p and
qn → p. Then
f (pn ) = pn → p and f (qn ) = −qn → −p.
Now if limx→p f (x) = ℓ then, by Proposition 1.11 and the uniqueness of sequence limits,
both ℓ = p and ℓ = −p would hold, a contradiction as p ̸= 0.

Example 1.13. To show that limx→0 sin(1/x) doesn't exist.

Let f(x) = sin(1/x) for x ≠ 0. Let pn = 1/(2nπ) and qn = 1/(2nπ + π/2).
Then both sequences (pn) and (qn) tend to 0 and pn, qn ≠ 0, but

lim_{n→∞} sin(1/pn) = lim_{n→∞} sin(2nπ) = 0   and   lim_{n→∞} sin(1/qn) = lim_{n→∞} sin(2nπ + π/2) = 1.

So limx→0 sin(1/x) cannot exist.
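Non-examinable aside: the two sequences of Example 1.13, evaluated in Python.

    import math

    # Sketch: p_n and q_n both tend to 0, but sin(1/p_n) = sin(2n*pi) = 0
    # while sin(1/q_n) = sin(2n*pi + pi/2) = 1 (up to rounding).
    for n in range(1, 5):
        p = 1 / (2 * n * math.pi)
        q = 1 / (2 * n * math.pi + math.pi / 2)
        print(n, p, math.sin(1 / p), q, math.sin(1 / q))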

Generalisations to complex numbers and vectors

The definitions of limit points and limits, together with Propositions 1.4, 1.10, 1.11 and
Corollary 1.5 extend immediately to C, and indeed to vectors in Rn or Cn , with essentially
identical proofs. We simply need to replace the real modulus with the complex modulus
|z|, or with the length |x| of a vector x in Rⁿ or Cⁿ, given in the usual way as

|x| = |(x1, . . . , xn)| = √(|x1|² + · · · + |xn|²).

The only properties of |.| that we need are |x| ≥ 0, with equality iff x = 0, plus the
triangle inequality (which implies the reverse triangle inequality), and these hold in all
the above cases. Thus we can define limits for functions C → C, R → C, C → R,
R → Rn , Rn → Rm , etc.

It is worth remarking that functions of more than one variable, such as f (x, y) : R×R → R
are just functions of a vector (x, y) ∈ R2 , and hence we have also defined multi-variable
limits such as
lim f (x, y).
(x,y)→(x0 ,y0 )

As this course is principally about real functions of one variable, we will not dwell on these
extensions too much in this course. One exception will be when we discuss continuity

of functions of several variables. Another is when we come to power series, which are
of extreme importance in complex analysis. In that case we will phrase our results in
terms of complex series. Nevertheless, it is worth noting that much of the material in
this course does generalise, except for the material in sections 4, 5, 9, 10 and 11, which
are only valid for real functions of one real variable.

Infinite limits

As for sequences, we sometimes want to consider the case when the function ‘tends to
infinity’. Note that although it appears in our vocabulary, we have not given infinity the
status of a number: it can only appear in certain phrases in our mathematical language
which are shorthand for quite complicated statements about real numbers. Also in this
case we can only consider functions whose codomain is R as we will need to use ordering¹⁶.

We follow the same idea used for sequence limits — we replace ‘close to ℓ’ with ‘large
enough’. That is, we replace
∀ε > 0 . . . =⇒ |f (x) − ℓ| < ε
with
∀M . . . =⇒ f (x) > M or ∀M . . . =⇒ f (x) < M
depending on whether ℓ = ∞ or ℓ = −∞. So, for example, limx→p f (x) = ∞ means
∀M ∈ R : ∃δ > 0 : ∀x ∈ E : (0 < |x − p| < δ =⇒ f (x) > M ),
and we also write this as f (x) → ∞ as x → p, or limx→p f (x) = ∞, or say f (x) tends
to ∞ as x tends to p.
Warning. As for sequences, we don't say f(x) converges when f(x) → ±∞. And again,
as for sequences, f(x) not converging does not imply f(x) → ±∞ (e.g., Example 1.13).

Note that uniqueness of limits (Proposition 1.10) and limits via sequences (Proposi-
tion 1.11) extend naturally to include ℓ = ±∞ with only minor changes in the proofs.
Example 1.14. 1/x² → ∞ as x → 0. Indeed, given M ≥ 1 we can set δ := 1/√M and note
that 0 < |x − 0| < δ implies 1/x² > 1/δ² = M. On the other hand 1/x ̸→ ∞ (why?), but we do
have 1/|x| → ∞ as x → 0.

Left-hand and right-hand limits



The way we have defined limits means that statements such as limx→0 √x = 0 make sense,
even though the domain of √x does not include some points very close to 0 (because they
are negative). However, even if a function is defined on both sides of a point p, we may
are negative). However, even if a function is defined on both sides of a point p, we may
sometimes wish to consider limits taking into account only the values f (x) for x < p, or
only the values f (x) for x > p.
¹⁶ Although one can always talk about |f(z)| tending to infinity when f(z) is complex.

Definition. Let f : E → R and let p ∈ R. Then we define the left-hand limit (or
limit from the left) limx→p− f (x) as the limit as x → p, if it exists, of the function f
restricted to E ∩ (−∞, p). In other words,

lim_{x→p−} f(x) = ℓ ⇐⇒ lim_{x→p, x∈E∩(−∞,p)} f(x) = ℓ.

Writing this out in terms of quantifiers, this is equivalent to

∀ε > 0 : ∃δ > 0 : ∀x ∈ E : (p − δ < x < p =⇒ |f (x) − ℓ| < ε).

Similarly define the right-hand limit (or limit from the right) limx→p+ f (x) as the
limit as x → p, if it exists, of the function f restricted to E ∩ (p, ∞). In other words,

lim_{x→p+} f(x) = ℓ ⇐⇒ lim_{x→p, x∈E∩(p,∞)} f(x) = ℓ ⇐⇒

∀ε > 0 : ∃δ > 0 : ∀x ∈ E : (p < x < p + δ =⇒ |f(x) − ℓ| < ε).

Naturally these definitions are only non-vacuous if p is a limit point of E ∩ (−∞, p) (p is


a left limit point of E) or E ∩ (p, ∞) (p is a right limit point of E) respectively.

Normally here E will be an interval with p in the interior, but sometimes we write, for
example, limx→0+ √x = 0 instead of limx→0 √x = 0 in cases where p is an endpoint of the
domain of the function, just to emphasise that we only need the function to be defined
on one side of p.

Sometimes we will use the notation f (p− ) and f (p+ ) for the left- and right-hand limits:

f(p−) := lim_{x→p−} f(x),    f(p+) := lim_{x→p+} f(x).

The proof of the following claim is good practice in using the definitions.
Proposition 1.15. Let f : E → R and let p ∈ R be both a left and right limit point of E.
Then for any ℓ ∈ R ∪ {±∞} the following are equivalent:
(a) limx→p f (x) = ℓ;
(b) Both limx→p+ f (x) = ℓ and limx→p− f (x) = ℓ.
Proof. Exercise (need separate proofs for ℓ = ±∞!). See also Proposition 2.14 below.
Example 1.16. Continuing Example 1.14: limx→0+ 1/x = +∞ and limx→0− 1/x = −∞.

Limits at infinity

Sometimes we want to extend the notion ‘f (x) → ℓ as x → p’ to cover p = ±∞. We note


that the domain E must¹⁷ be a subset of R as we will be using ordering. The natural
¹⁷ One can however define lim_{|z|→∞} f(z) in a fairly obvious way for functions defined on E ⊆ C. Indeed, limz→∞ f(z) is often defined this way in this case, although it causes conflict in notation when E ⊆ R.

analogue of the definition of a limit is to replace ‘sufficiently close to p’ with ‘sufficiently
large’, i.e., replace
∃δ > 0 . . . (0 < |x − p| < δ =⇒ . . . )
with
∃N . . . (x > N =⇒ . . . ) or ∃N . . . (x < N =⇒ . . . )
depending on whether p = +∞ or p = −∞. Thus limx→∞ f(x) = ℓ means

∀ε > 0 : ∃N ∈ R : ∀x ∈ E : (x > N =⇒ |f (x) − ℓ| < ε).

Note that we do not need to include the requirement that x ̸= p = ±∞ here as, by
assumption, f is only defined on real numbers E ⊆ R.

We do have to add a condition analogous to p being a limit point so as to make the


statement limx→∞ f (x) = ℓ non-vacuous. In this case we need that E is not bounded
above so that there are always some x ∈ E with x > N . Similarly, for limx→−∞ f (x) we
need that E is not bounded below.

The observant reader will have noticed that if E = N so that f : N → R is a sequence,


then the definition of limn→∞ f(n) = ℓ is just the same¹⁸ as the one given in Analysis I.

Example 1.17 (Integer powers). Let m be a positive integer. Then, as x → ∞, the


power xm → ∞ and as x → −∞, xm → ∞ if m is even and xm → −∞ if m is odd.
Moreover x−m → 0 as x → ±∞.

Proof. For m > 0 and M ∈ R we note that for x > N := max{M, 1} we have xᵐ ≥
x > M. So by definition xᵐ → ∞ as x → ∞. Now given ε > 0 we note that for
x > N := max{1/ε, 1} we have |x⁻ᵐ − 0| = 1/xᵐ ≤ 1/x < 1/N ≤ ε, so x⁻ᵐ → 0. The cases when
x → −∞ are similar, but need some care with the signs.

Remark. When considering limits as x → ∞ we can restrict attention to values of x


that are large enough, say x > M0 . Any final M that we obtain can then be replaced by
max{M, M0 } in the definition of a limit so that it works for all x. The above proof used
this to restrict to the case x > 1 where the inequalities were easier.

Propositions 1.4, 1.10 and 1.11 extend simply to p = ±∞ with only minor modifications:
we need to replace ‘p is a limit point of E’ by ‘E is unbounded above/below’ for p = +∞
or −∞ respectively. We can also drop the condition pn ̸= p as pn ∈ R.

2 Basic properties of limits

Our next task is to set up the basic machinery for working with function limits. The
following is perhaps the most useful result.
¹⁸ The definition given in Analysis I assumed N ∈ N, but one can always just replace N ∈ R with ⌊N⌋ to get an equivalent statement.

Theorem 2.1 (Algebra of Limits (AOL) for functions). Let E ⊆ R and let p be a limit
point of E. Let f, g : E → R and suppose that f (x) → a and g(x) → b as x → p. Then

|f (x)| → |a|, f (x) + g(x) → a + b, f (x) − g(x) → a − b,

f (x)g(x) → ab and f (x)/g(x) → a/b (if b ̸= 0)


as x → p. Also, if h(x) := c is a constant function on E then h(x) → c as x → p.

Proof. These can all be deduced from the Algebra of Limits for Sequences using Propo-
sition 1.11. Assume (pn ) is any sequence with pn ∈ E, pn ̸= p and pn → p. Then by
Proposition 1.11,
f (pn ) → a and g(pn ) → b.
We note that if b ̸= 0 then by taking ε := |b| > 0 in the definition of the limit, there is
some δ > 0 such that g(x) ̸= 0 when 0 < |x − p| < δ. Hence f (x)/g(x) is defined on some
E ′ ⊇ {x ∈ E : 0 < |x − p| < δ}, which still has p as a limit point. By AOL for sequences,

|f (pn )| → |a|, f (pn ) + g(pn ) → a + b, f (pn ) − g(pn ) → a − b,

f (pn )g(pn ) → ab, f (pn )/g(pn ) → a/b (b ̸= 0), h(pn ) → c.


As this holds for all such sequences (pn ), Proposition 1.11 implies the results.

Alternatively Theorem 2.1 can be proved directly from the definitions: mimic the proofs
given for sequences in Analysis I. (Change “∃N : ∀n : n > N =⇒” to “∃δ : ∀x ∈ E : 0 <
|x − p| < δ =⇒” throughout.)

Generalisations. AOL works for complex functions with no change in the proofs. One
can even extend it to functions on Rn or Cn (and so functions of several variables), or
functions to Rn or Cn , provided the statements make sense (e.g., we can’t divide two
vectors, but we can, for example, multiply a scalar valued function f (x) by a vector
valued function ⃗g (x)).

AOL and infinity. AOL works when x → p = ±∞ with only minor changes in the
proof. However, for cases when the actual limits a and/or b are infinite we need to be
a bit more careful. AOL works with the obvious interpretation of arithmetic operations
involving ±∞, except in the indeterminate cases: ∞ − ∞, ±∞ · 0, (±∞)/(±∞), and
any case of division by 0 (at least unless one makes further assumptions on f and g). See
Problem Sheet 1 for some examples. Since it is so useful, we will state it as a theorem.

Theorem 2.2 (Extended AOL). Let E ⊆ R and let p be a limit point of E or let
p = ±∞ with E unbounded above/below. Let f, g : E → R and suppose that f (x) → a
and g(x) → b as x → p where a, b ∈ R ∪ {±∞}. Then, as x → p,
(a) |f (x)| → |a| where we interpret | ± ∞| = ∞;
(b) f (x) ± g(x) → a ± b, except when we get ∞ − ∞ or −∞ + ∞. Here a ± b is
interpreted as ±∞ in the obvious way when one of a or b is infinite, or both are
infinite and are ‘pushing’ in the same direction.

(c) f (x)g(x) → ab, except when we get (±∞) · 0 or 0 · (±∞). Here ab is interpreted as
±∞ in the obvious way when a and/or b is infinite and neither is zero.
(d) f (x)/g(x) → a/b provided b ̸= 0 and except when we get (±∞)/(±∞). Here we
interpret a/(±∞) = 0 (for a finite) and (±∞)/b = ±∞ or ∓∞ (for b finite, b > 0
or b < 0).
Proof. A rather tedious exercise — there are many different cases to check!
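Non-examinable aside: the pairs below show why ∞ − ∞ is left out of Theorem 2.2. All
three pairs have f(x) → ∞ and g(x) → ∞ as x → ∞, yet the differences behave completely
differently (the specific pairs are sample choices).

    # Sketch: f -> inf and g -> inf, but f - g can tend to 0, to 7, or to inf.
    pairs = [
        (lambda x: x,     lambda x: x, "f - g -> 0"),
        (lambda x: x + 7, lambda x: x, "f - g -> 7"),
        (lambda x: x**2,  lambda x: x, "f - g -> +inf"),
    ]
    for f, g, label in pairs:
        print(label, [f(x) - g(x) for x in (10, 1000, 100000)])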
Example 2.3 (Polynomials). Let p(x) = an xn + an−1 xn−1 + · · · + a0 be a real polynomial
with an > 0, n > 0. Then p(x) → ∞ as x → ∞; and p(x) → ∞ (n even) or p(x) → −∞
(n odd) as x → −∞.

Proof. Write p(x) = xn q(x) where q(x) = an + an−1 x−1 + · · · + a0 x−n . As x−m → 0
as x → ∞ for m > 0 (Example 1.17), AOL gives q(x) → an as x → ∞. Now use the
Extended AOL together with Example 1.17 to show p(x) = xn q(x) → ±∞.
Example 2.4 (Rational functions). A rational function is a quotient of two polynomials:

f(x) = (an xⁿ + an−1 xⁿ⁻¹ + · · · + a0) / (bm xᵐ + bm−1 xᵐ⁻¹ + · · · + b0).

Assume an, bm ≠ 0. Then when taking a limit x → ∞ we can rewrite f as

f(x) = xⁿ⁻ᵐ · (an + an−1 x⁻¹ + · · · + a0 x⁻ⁿ) / (bm + bm−1 x⁻¹ + · · · + b0 x⁻ᵐ).

As x → ∞, x⁻ᵏ → 0 for k > 0, so by AOL

(an + an−1 x⁻¹ + · · · + a0 x⁻ⁿ) / (bm + bm−1 x⁻¹ + · · · + b0 x⁻ᵐ) → an/bm.

Thus, by Extended AOL and Example 1.17,

lim_{x→∞} f(x) = 0, if n < m;   an/bm, if n = m;   ±∞, if n > m;

where the ± in the last case is given by the sign of an/bm.
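Non-examinable aside: the three cases of Example 2.4 evaluated at large x in Python (the
sample rational functions are arbitrary choices).

    # Sketch: one rational function for each of the cases n < m, n = m, n > m.
    cases = [
        ("n < m, limit 0",           lambda x: (x + 1) / (x**2 + 1)),
        ("n = m, limit a_n/b_m = 2", lambda x: (2 * x**2 + x) / (x**2 + 3)),
        ("n > m, limit +inf",        lambda x: (x**3 + 1) / (x + 1)),
    ]
    for label, f in cases:
        print(label, [f(10.0**k) for k in (2, 4, 6)])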

We also have the following tools, just as for sequences.


Theorem 2.5 (Limits preserve weak inequalities). Let f, g : E → R and let p be a limit
point of E. If f (x) ≤ g(x) for all x ∈ E and f (x) → a, g(x) → b as x → p, then a ≤ b.
Theorem 2.6 (Sandwiching). Let f, g, h : E → R and let p be a limit point of E. If for
all x ∈ E, f (x) ≤ g(x) ≤ h(x) and f (x) → ℓ, h(x) → ℓ as x → p then g(x) → ℓ as
x → p.
Theorem 2.7 (Sandwiching, alternative form). Let f, g, h : E → R, and let p be a limit
point of E. If f (x) → ℓ as x → p and |f (x) − g(x)| ≤ h(x) with h(x) → 0 as x → p,
then g(x) → ℓ as x → p.

Proofs. Exercise. Apply Proposition 1.11 to the sequence versions.

These generalise to E ⊆ C etc., and to cases where p and/or ℓ are ±∞. The alternative
form of sandwiching also works when f and g are complex or vector-valued. The
following can also be extended to complex or vector-valued functions.

Theorem 2.8 (Limits of Compositions of Functions). Suppose f : E → R and g : E ′ → R


with f (E) ⊆ E ′ (so that g(f (x)) is defined for all x ∈ E). Let p be a limit point of E
and assume
(a) limx→p f (x) = q ∈ R, and
(b) f (x) ̸= q for all x ∈ E \ {p}.
Then q is a limit point of E ′ . If in addition
(c) limy→q g(y) = ℓ ∈ R ∪ {±∞}
then we have limx→p g(f (x)) = ℓ.
Corresponding statements also hold when p and/or q = ±∞.

Proof. Exercise. See also Problem Sheet 1. Note the requirement that f (x) ̸= q.

Example 2.9. Theorem 2.8 may seem a bit complicated, but it often naturally appears in
arguments about limits when we ‘change variables’. For example, consider the statement

lim_{x→x0} g(x) = lim_{h→0} g(x0 + h).

Here we take the statement to mean that if either limit exists then so does the other and
they are equal. A direct proof is easy, but one can also use Theorem 2.8.

In one direction, suppose limx→x0 g(x) = ℓ. Let x = x(h) = x0 + h. Then we can think
of g(x0 + h) as g(x(h)). Now x = x(h) → x0 as h → 0, but x ̸= x0 if h ̸= 0. Thus
limh→0 g(x0 + h) = limh→0 g(x(h)) = limx→x0 g(x) = ℓ by Theorem 2.8.

Conversely, suppose limh→0 g(x0 + h) = ℓ. Let h = h(x) = x − x0 . Then we can think


of g(x) as g(x0 + h(x)), a composition of the functions g(x0 + ·) and h(·). We have
h → 0 as x → x0 and h ̸= 0 for x ̸= x0 . Thus limx→x0 g(x) = limx→x0 g(x0 + h(x)) =
limh→0 g(x0 + h) = ℓ by Theorem 2.8.

Example 2.10. Theorem 2.8 can be used to investigate limits at ∞ of g(x) by considering
limits at 0 of g(1/x). Write y = y(x) = 1/x. Then y is defined for any sufficiently large x,
y ≠ 0 and y → 0 as x → ∞. So e.g., limx→∞ sin(1/x) = limx→∞ sin(y(x)) = limy→0 sin y = 0.
(Using standard properties of sin. In fact we can use limy→0+ here as we also have y > 0
for all large enough x.)

Example 2.11 (Real powers). For α > 0 real we have xα → ∞ as x → ∞. For α < 0
we have xα → 0 as x → ∞.

We assume standard limits of exp and log (Proposition 5.4 below) and recall that for real
α and x > 0 we define xα := exp(α log x).

Now log x → ∞ as x → ∞ (Proposition 5.4), and hence for α > 0, α log x → ∞ (Extended
AOL). Also exp y → ∞ as y → ∞ (Proposition 5.4), so (substituting y = α log x)
xα = exp(α log x) → ∞ (Theorem 2.8).
For α < 0, α log x → −∞ (Extended AOL). Now exp y → 0 as y → −∞ (Proposition 5.4),
so (substituting y = α log x) x^α = exp(α log x) → 0 as x → ∞ (Theorem 2.8).
Example 2.12 (Exponentials beat powers). Let α ∈ R and β > 0 be constants. Then
limx→∞ xα e−βx = 0.

Proof. We may restrict attention to x > 0. Then, by definition of exp,

0 ≤ x^α e^(−βx) = x^α / (1 + βx + · · · + (βx)ⁿ/n! + · · ·) ≤ x^α / ((βx)ⁿ/n!) = n! β⁻ⁿ x^(α−n)

for any fixed n. Fix a value of n > α. Then 0 ≤ n! β⁻ⁿ x^(α−n) → 0 as x → ∞ by
Example 2.11 and AOL. The result now follows by sandwiching.
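Non-examinable aside: the decay in Example 2.12, sampled in Python with α = 5 and
β = 0.1 (sample values; note the product only starts to decay once x passes α/β = 50).

    import math

    # Sketch: x^alpha * e^(-beta x) -> 0 as x -> infinity.
    alpha, beta = 5.0, 0.1
    for x in (100.0, 500.0, 1000.0):
        print(x, x**alpha * math.exp(-beta * x))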
Remark. Working with the power series for ex when x > 0, which has all terms positive,
is preferable to working with it when x < 0, as then we have terms of alternating sign.
Inequalities interact badly with expressions with mixed signs.
Example 2.13 (Powers beat logarithms). For α > 0,

lim_{x→∞} (log x)/x^α = 0   and   lim_{x→0+} x^α log x = 0.

Proof. Write y = log x so that (log x)/x^α = (log x)e^(−α log x) = y e^(−αy). Now y = log x → ∞ as
x → ∞, so by Theorem 2.8 and Example 2.12,

lim_{x→∞} (log x)/x^α = lim_{y→∞} y e^(−αy) = 0.

For the second statement write y = −log x so that x^α log x = −y e^(−αy). Now y = −log x →
∞ as x → 0+, so again by Theorem 2.8 and Example 2.12,

lim_{x→0+} x^α log x = −lim_{y→∞} y e^(−αy) = 0.

Finally we finish this section with another useful result on limits.


Proposition 2.14. Suppose f : E → R and let p ∈ R, ℓ ∈ R ∪ {±∞}.
(a) If p is a limit point of both E1 and E2 where E = E1 ∪ E2 then limx→p f (x) = ℓ if
and only if both limx→p, x∈E1 f (x) = ℓ and limx→p, x∈E2 f (x) = ℓ.
(b) If p is a limit point of E1 ⊆ E but not of E \ E1 (so E ∩ (p − δ, p + δ) ⊆ E1 for
some δ > 0), then limx→p f (x) = ℓ if and only if limx→p, x∈E1 f (x) = ℓ.
In particular if p is a limit point of E1 ⊆ E then limx→p f (x) = ℓ always implies
limx→p, x∈E1 f (x) = ℓ. Similar statements hold when p = ±∞.

Proof. Exercise.

Note that this implies Proposition 1.15. See also Analysis I, Problem sheet 4, Q2(a) for
a special case of the sequence version of this result.

3 Continuity

We all have a good informal idea of what it means to say that a function has a continuous
graph: we can draw it without lifting the pen from the paper. But we want now to use our
precise definition of ‘f (x) → ℓ as x → p’ to discuss the idea of continuity. We continue
the ε-δ theme of the previous section.

Again let us consider E ⊆ R and f : E → R. In the definition of limx→p f (x) in Section 1,


the point p need not belong to the domain E of f . Indeed, even when p ∈ E and f (p)
was defined, we steadfastly refused to acknowledge this when considering the limiting
behaviour of f (x) as x approaches p. Now we change our focus and consider the scenario
in which f (p) is defined and ask whether limx→p f (x) = f (p).

Definition. Let f : E → R, where E ⊆ R and p ∈ E. We say f is continuous at p if

∀ε > 0 : ∃δ > 0 : ∀x ∈ E : (|x − p| < δ =⇒ |f (x) − f (p)| < ε),

otherwise we say f is discontinuous, or has a discontinuity, at p. We say f is


continuous on E if f is continuous at every point p ∈ E.

Note that the ‘limit’ is now f (p) and we do not exclude x = p: to do so would be neither
necessary nor appropriate. We also do not require p to be a limit point of E.

Proposition 3.1. Let f : E → R, where E ⊆ R.


(a) f is continuous at any isolated point¹⁹ of E.
(b) If p ∈ E is a limit point of E, then f is continuous at p if and only if

lim f (x) exists and lim f (x) = f (p).


x→p x→p

Proof. (a) is immediate, since we may choose δ > 0 such that {x ∈ E : 0 < |x − p| < δ} =
∅. For such δ, we have x ∈ E and |x − p| < δ only if x = p, and then |f(x) − f(p)| < ε,
trivially.

(b): It is clear that if the continuity condition holds then the limit one does too. In the
other direction, the limit condition, provided the limit is f (p), delivers all that we need
for continuity; the inequality |f (x) − f (p)| < ε holds for x = p as well as the other points
x with |x − p| < δ.

Example 3.2 (Continuity of x and |x|). Let f (x) := x and g(x) := |x|. For f we can set
δ := ε and then clearly |x − p| < δ implies |f (x) − f (p)| = |x − p| < ε. For g note that
the reverse triangle inequality gives

|g(x) − g(p)| = ||x| − |p|| ≤ |x − p|.

Hence we can again take δ := ε in the ε-δ definition of continuity.


¹⁹ Recall that an isolated point of E is a point p ∈ E that is not a limit point of E.

Example 3.3. Let c ∈ R. Consider f defined on R by

f(x) := c, if x = 0;   1, otherwise.

Then limx→0 f (x) = 1. Hence f is continuous at 0 if and only if c = 1. (Compare with


Example 1.7.)

On the other hand, f is continuous at every point p ̸= 0, irrespective of the value of c.

Example 3.4. Let α > 0. The function f(x) = |x|^α sin(1/x) is not defined at x = 0 so it
makes no sense to ask if it is continuous there. In such circumstances we modify f in
some suitable way. So we look at

g(x) := |x|^α sin(1/x), if x ≠ 0;   0, if x = 0.

Then 0 is a limit point of the domain, and we calculated before that limx→0 g(x) = 0 =
g(0), so g is continuous at 0.

The following theorem is useful in showing a function is discontinuous by considering


suitable sequences of values. It follows immediately from Proposition 3.1 and the proof
of Proposition 1.11. Note that we now don’t need to assume pn ̸= p.

Theorem 3.5 (Continuity via sequences). Let f : E → R where E ⊆ R and p ∈ E.


Then f is continuous at p if and only if for every sequence (pn ) with pn ∈ E and pn → p
we have that f (pn ) → f (p) as n → ∞.

Example 3.6. Let f (x) = 1 when x is rational and f (x) = 0 when x is irrational. Since
any rational p has a sequence of irrationals pn → p we have f (pn ) = 0 ̸→ f (p) = 1. Since
any irrational p has a sequence of rationals pn → p we have f (pn ) = 1 ̸→ f (p) = 0. Thus
f is not continuous at any point.

We can use our characterisation of continuity at limit points in terms of limx→p f (x),
together with AOL to prove that the class of functions continuous at p is closed under
all the usual algebraic operations.

Theorem 3.7 (Continuity: algebraic operations). Let E ⊆ R, p ∈ E, and suppose


f, g : E → R are both continuous at p. Then the following functions are continuous at p:
|f (x)|, f (x) ± g(x), f (x)g(x), f (x)/g(x) (provided g(p) ̸= 0), and any constant function
h(x) := c.

Proof. This follows directly from the corresponding AOL results and Proposition 3.1.

Example 3.8 (Polynomials and rational functions). Let f : R → R be a polynomial.
Then f is continuous at every point of R. Further, consider the rational function
f(x) = r(x)/q(x), where r, q : R → R are polynomials. Then f is continuous at p provided
q(p) ≠ 0.

Proof. Example 3.2 shows that f (x) = x is continuous at every point. Then Theorem 3.7
and induction on degree gives that every polynomial is continuous. Theorem 3.7 then
also implies rational functions are continuous where the denominator is non-zero.

One of the key properties of continuous functions is that they ‘commute with limits’.
Theorem 3.9 (Continuous functions commute with limits). Let f : E → R and g : E ′ →
R be functions with f (E) ⊆ E ′ . Suppose p is a limit point of E, or p = ±∞ and E is
unbounded above/below. Suppose also that limx→p f (x) = ℓ ∈ E ′ and g is continuous at ℓ.
Then
lim_{x→p} g(f(x)) exists and equals g(lim_{x→p} f(x)) = g(ℓ).

Proof. Since g is continuous at ℓ, for any ε > 0 there is an η > 0 such that
∀y ∈ E ′ : (|y − ℓ| < η =⇒ |g(y) − g(ℓ)| < ε).
So as f (E) ⊆ E ′
∀x ∈ E : (|f (x) − ℓ| < η =⇒ |g(f (x)) − g(ℓ)| < ε).
But f (x) → ℓ as x → p so, as η > 0,
∃δ > 0 : ∀x ∈ E : (0 < |x − p| < δ =⇒ |f (x) − ℓ| < η).
Combining these assertions
∀ε > 0 : ∃δ > 0 : ∀x ∈ E : (0 < |x − p| < δ =⇒ |g(f (x)) − g(ℓ)| < ε).
Hence g(f (x)) → g(ℓ) as x → p. The cases when p = ±∞ are similar.
Corollary 3.10 (Composition of continuous functions). Let f : E → R and g : E ′ → R
with f (E) ⊆ E ′ . If f (x) is continuous at p ∈ E and g(x) is continuous at f (p), then
g(f (x)) is continuous at p.

Proof.[20] Combine Proposition 3.1 with Theorem 3.9: if p is isolated then there is nothing
to prove, and if p is a limit point of E then limx→p g(f (x)) = g(limx→p f (x)) by continuity
of g and limx→p f (x) = f (p) by continuity of f . Thus limx→p g(f (x)) = g(f (p)) and so
g(f (x)) is continuous at p.

Recall from Analysis I that certain functions from R → R — exp x, sin x, cos x, sinh x
and cosh x etc. — can be defined by power series, each of which has infinite radius of
convergence. You were told in Analysis I that a power series defines a function which is
continuous at each point within its interval of convergence. Later on (Theorem 7.13) we
shall justify this claim.

For now, we shall continue to take this fact on trust. This will allow us to use the
algebra of continuous functions and the composition of continuous functions to prove the
continuity of a wide variety of functions.
[20] If you are asked to prove this in an exam, don't assume Theorem 3.9, but write out a direct proof
using a similar argument to the one in the proof of Theorem 3.9.

Example 3.11. We claim that the function g : R → R given by

    g(x) := { x sin(1/x), if x ≠ 0;  0, if x = 0 }

is continuous at every point of R.

We have already proved that g is continuous at 0 (special case of Example 3.4).

If p ≠ 0: 1/x is continuous at p as p ≠ 0 (quotient of continuous functions) and sin x
is continuous at 1/p (property of sin). Hence sin(1/x) is continuous at p (composition of
continuous functions). So x sin(1/x) is continuous at p (product of continuous functions).

Left-continuity and right-continuity.

The definitions of one-sided limits lead on to notions of left- and right-continuity. We


say that a function f is left-continuous (or continuous from the left) at p if it is
continuous as a function restricted to E ∩ (−∞, p], namely

∀ε > 0 : ∃δ > 0 : ∀x ∈ E : (p − δ < x ≤ p =⇒ |f (x) − f (p)| < ε).

If p is a left limit point of E then this is equivalent to f (p− ) existing and f (p− ) = f (p).
Likewise f is right-continuous (or continuous from the right) at p if

∀ε > 0 : ∃δ > 0 : ∀x ∈ E : (p ≤ x < p + δ =⇒ |f (x) − f (p)| < ε).

Proposition 3.12. Let f : E → R and let p ∈ E. Then the following are equivalent:
(a) f is continuous at p;
(b) f is both left-continuous at p and right-continuous at p.
Proof. Exercise.

Example 3.13. Consider f : R → R given by

    f(x) = { x, if x ≥ 0;  x + 1, if x < 0 }.

Then f(0⁺) = 0, f(0⁻) = 1 and f(0) = 0. So lim_{x→0} f(x) does not exist and f fails to
be continuous at 0. It is right-continuous but not left-continuous at 0.

Remark. It should be clear that continuity of f : E → R at a point p depends only on f
restricted to a small region about p, say E ∩ (p − δ, p + δ). However, while continuity of
f at p implies that f restricted to say E1 = E ∩ [p, p + δ) is continuous, continuity of this
restricted function is not enough to imply continuity of the original f — in this case it
only implies right-continuity. To get the reverse implication needs p not to be limit point
of E \ E1 (see comments after Proposition 2.14), or equivalently E1 ⊇ E ∩ (p − δ, p + δ)
for some δ > 0.

The following example shows how we can ‘join’ two continuous functions if their limits
match up at the join.

Example 3.14. Consider f : R → R given by

    f(x) = { 2x⁵, if x ≥ 0;  5x², if x < 0 }.

Then limx→0+ f (x) and limx→0− f (x) both exist and equal 0. Hence f is continuous at 0.
In addition f is continuous at each point p ∈ (0, ∞) and at each point of p ∈ (−∞, 0) as
f is given by a polynomial in a small region around p. Therefore f is continuous on R.

Continuity is often helpful in evaluating limits as the following example shows.

Example 3.15. lim_{x→∞} x^{1/x} = 1.

Proof. Let x > 0. By definition, x^{1/x} = e^{x⁻¹ log x}. By Example 2.13, x⁻¹ log x → 0 as
x → ∞. Since exp is continuous at 0, Theorem 3.9 gives

    x^{1/x} = e^{x⁻¹ log x} → e⁰ = 1 as x → ∞.

Generalisations, continuity of functions of several variables (non-examinable)

The definition and basic properties of continuous functions extend immediately to com-
plex and even vector-valued functions (or functions on C or functions of several variables)
with essentially no changes in the proofs. One useful result (which is analogous to a result
on complex sequences from Analysis I) is the following.

Proposition 3.16. A function f : E → C is continuous iff both re(f ) and im(f ) are
continuous.

Proof. Exercise.

Indeed, functions to Rⁿ or Cⁿ are continuous iff each coordinate is given by a continuous


function. More interesting is when we consider functions of vectors, i.e., functions of
several variables. Suppose f : R × R → R is a function of two variables. The way we have
defined continuity is that we require

    lim_{(x,y)→(x0,y0)} f(x, y) = f(x0, y0),

or, to write it out more fully, for all ε > 0 there is a δ such that

|(x, y) − (x0 , y0 )| < δ =⇒ |f (x, y) − f (x0 , y0 )| < ε,

where |(x, y) − (x0 , y0 )| is the Euclidean distance from (x, y) to (x0 , y0 ) in the plane.

Example 3.17. Define f : R × R → R by

    f(x, y) := { xy/(x² + y²), if (x, y) ≠ (0, 0);  0, if (x, y) = (0, 0) }.

Consider lim_{(x,y)→(0,0)} f(x, y). It helps to use polar coordinates (x, y) = (r cos θ, r sin θ)
here as the condition |(x, y) − (0, 0)| < δ is just the condition r < δ. We have
f(x, y) = (1/r²)(r cos θ · r sin θ) = cos θ sin θ. If θ = π/4, so x = y, then f(x, y) = 1/2, while
if θ = 0, so y = 0, then f(x, y) = 0. As we can find such points (x, y) with arbitrarily small r,
f(x, y) does not tend to a limit as (x, y) → (0, 0).

Note however that for all x ≠ 0, lim_{y→0} f(x, y) = f(x, 0) = 0 as f(x, y) is a continuous
(rational) function of the variable y if we fix x ≠ 0. Thus lim_{x→0} lim_{y→0} f(x, y) = 0 =
lim_{y→0} lim_{x→0} f(x, y). Hence existence of iterated limits is not enough to imply a
multi-variable limit.

There are even examples of functions which are continuous along any line θ = constant
through the origin, but are not continuous at (0, 0). For example f(x, y) = xy²/(x² + y⁴) for
(x, y) ≠ (0, 0), f(0, 0) = 0.
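A quick numerical illustration (Python, purely illustrative and not part of the formal
development; the helper names f1 and f2 below are ours) makes the path-dependence visible:
f1 is the function of Example 3.17 and f2 the function just mentioned.

    # Illustration only: approach (0,0) along different paths.
    def f1(x, y):
        # Example 3.17: xy/(x^2 + y^2)
        return x * y / (x**2 + y**2)

    def f2(x, y):
        # xy^2/(x^2 + y^4): continuous along every line through the origin
        return x * y**2 / (x**2 + y**4)

    for t in [0.1, 0.01, 0.001]:
        # f1 equals 1/2 on the line y = x but 0 on the line y = 0.
        print(f1(t, t), f1(t, 0.0))
        # f2 tends to 0 along the line y = x, yet equals 1/2 on the parabola x = y^2.
        print(f2(t, t), f2(t**2, t))

Of course this only illustrates the claims above; the computation with cos θ sin θ in
Example 3.17 is the actual proof.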

Example 3.18. Now consider

    g(x, y) := { x²y/(x² + y²), if (x, y) ≠ (0, 0);  0, if (x, y) = (0, 0) }.

In this case, using polar coordinates, |g(x, y)| = |r cos²θ sin θ| ≤ r. Hence, taking δ := ε,
|(x, y) − (0, 0)| < δ implies r < δ which implies |g(x, y) − 0| < ε, so lim_{(x,y)→(0,0)} g(x, y) =
0 = g(0, 0) and g is continuous at (0, 0).

In the above examples we have continuity for all (x, y) ̸= (0, 0): it is easy to see the
functions f (x, y) := x and f (x, y) := y are continuous, so by algebraic properties of
continuous functions, (the suitable generalisation of) Theorem 3.7, any rational function
p(x, y)/q(x, y) is continuous at points where q(x, y) ≠ 0.

4 The Boundedness Theorem and the IVT

Let f : E → R. We say that f is bounded on E if the image f (E) = {f (x) : x ∈ E} is


bounded, i.e., if
∃M > 0 : ∀x ∈ E : |f (x)| ≤ M,
and similarly for bounded above/below.

When the set f (E) is bounded above (and E ̸= ∅), the Completeness Axiom tells us that

sup f := sup{f (x) : x ∈ E}

exists. When sup f ∈ f (E) we say that f attains its sup(remum). Corresponding
definitions apply to real-valued functions which are bounded below.

While the notion of boundedness is also available for a complex valued function f , the
notions of sup f and inf f make sense only when f is real-valued.

Here is the first Big Theorem of the course.

Theorem 4.1 (Boundedness Theorem). Suppose a < b and f : [a, b] → R is a continuous


function on the closed bounded (non-empty) interval [a, b]. Then
(a) f is bounded.
(b) f attains its sup and its inf. That is, there exist points ξ1 and ξ2 in [a, b] such that

f (ξ1 ) = sup f (x) and f (ξ2 ) = inf f (x).


x∈[a,b] x∈[a,b]

Note: in general ξ1 and ξ2 will not be unique.

Proof. (a): Argue by contradiction. Suppose f were unbounded. Then for any n ∈ N,
there exists xn ∈ [a, b] such that |f(xn)| > n. Since (xn) is bounded, by the Bolzano–
Weierstrass Theorem, there exists a subsequence (x_{s_n}) converging to p, say. As [a, b] is
closed and x_{s_n} ∈ [a, b] we must have p ∈ [a, b]. Now f is continuous at p and hence

    lim_{n→∞} f(x_{s_n}) = f(p),

so in particular the sequence (f(x_{s_n})) is convergent, and hence bounded. But
|f(x_{s_n})| > s_n ≥ n, so (f(x_{s_n})) is unbounded, a contradiction. Therefore f must be bounded.

(b): Let M = sup_{x∈[a,b]} f(x). Then by the approximation property of the supremum, for
all n ≥ 1 there exists an xn ∈ [a, b] with M − 1/n < f(xn) ≤ M. Since (xn) is bounded, by
the Bolzano–Weierstrass Theorem, there exists a subsequence (x_{s_n}) converging to p, say.
Then p ∈ [a, b] as [a, b] is closed. Now f is continuous at p and hence

    lim_{n→∞} f(x_{s_n}) = f(p).

But M − 1/s_n < f(x_{s_n}) ≤ M, so by sandwiching f(p) = lim_{n→∞} f(x_{s_n}) = M.

A similar argument deals with the infimum, or we can apply what we have done to −f
and get the result at once since for any bounded non-empty subset S of R,

inf{s : s ∈ S} = − sup{−s : s ∈ S}.

Example 4.2. Let E = (0, 1] and f : E → R be given by f(x) = 1/x. Then f is bounded


below and attains its inf: inf f = f (1). On the other hand f is not bounded above:
f (x) → ∞ as x → 0. Hence the requirement that E is closed in Theorem 4.1 is necessary.

Example 4.3. Let E = R and let f(x) = e^x. Then inf f = 0, but is not attained, and f
is not bounded above as f(x) → ∞ as x → ∞. Hence the requirement that E is bounded
in Theorem 4.1 is necessary.

Example 4.4. Let E = [0, 1] and f(x) = 1/x for 0 < x < 1 and f(0) = f(1) = 2. Then
f is unbounded above and inf f = 1 is not attained. Hence the requirement that f is
continuous in Theorem 4.1 is necessary.
Remark. On the other hand, looking at the proof of the Boundedness Theorem, one
sees that all we needed about the domain of f was that it was closed and bounded — it
did not need to be an interval. In fact it did not even need to be real. Any closed and
bounded subset of either R or C would do.[21] Such a subset is called compact, a concept
that will be of great importance in later courses.
Example 4.5. Assume f is a continuous complex-valued function defined on [a, b]. Then
|f | is continuous and real-valued and the Boundedness Theorem applies to |f |. Hence f
is bounded. Part (b) of the theorem involves order notions: we can no longer define sup f
and inf f when f is complex-valued.

So far we have concentrated on extreme values, the supremum and the infimum of a
continuous real-valued function on a closed bounded interval. What can we say about
possible values between these? Here is the second of our Big Theorems.
Theorem 4.6 (Intermediate Value Theorem (IVT)). Assume a < b, f : [a, b] → R is
continuous, and let c be a real number between f (a) and f (b). Then there is at least one
point ξ ∈ [a, b] such that f (ξ) = c.


Note: the restriction that f be real-valued is essential. Also, ξ need not be unique.

Proof. (Divide and Conquer.) By replacing f with −f if necessary, we may assume


f (a) ≤ c ≤ f (b). We shall inductively define a nested sequence of intervals [an , bn ],
[an+1 , bn+1 ] ⊆ [an , bn ], with f (an ) ≤ c ≤ f (bn ) and bn − an → 0.

We start with [a0, b0] = [a, b]. Now, having defined an and bn, let mn = (an + bn)/2 be
the midpoint of the interval [an, bn]. If f(mn) ≤ c, let [an+1, bn+1] = [mn, bn]; otherwise
let [an+1, bn+1] = [an, mn]. Then in either case we have f(an+1) ≤ c ≤ f(bn+1). Also
bn+1 − an+1 = (bn − an)/2, so by induction bn − an = (b − a)/2^n → 0.

Now (an ) is clearly increasing and bounded above (by b), so tends to a limit ξ ∈ [a, b].
Similarly (bn ) is clearly decreasing and bounded below (by a), so tends to a limit ξ ′ ∈ [a, b].
But bn − an → 0, so by AOL we have ξ = ξ ′ . Now by continuity of f and preservation of
weak inequalities by limits we have

    f(ξ) = f(lim_{n→∞} an) = lim_{n→∞} f(an) ≤ c.

Similarly

    f(ξ) = f(lim_{n→∞} bn) = lim_{n→∞} f(bn) ≥ c.

Thus f (ξ) = c.

Note that this proof gives an effective algorithm for homing in on a root of an equation
given by a continuous function, as ξ ∈ [an, bn] for all n and bn − an → 0.
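For illustration, here is a minimal Python sketch of the bisection scheme from the proof
(the function name bisect and the choice of 30 steps are ours, not part of the course),
applied to f(x) = x² − 2 with c = 0 on [1, 2], anticipating Example 4.7:

    # Illustration only: bisection as in the Divide and Conquer proof.
    def bisect(f, a, b, c, n):
        # assumes f is continuous with f(a) <= c <= f(b)
        for _ in range(n):
            m = (a + b) / 2
            if f(m) <= c:
                a = m      # maintains f(a_n) <= c
            else:
                b = m      # maintains c <= f(b_n)
        return a, b        # xi lies in [a, b], an interval of length (b-a)/2^n

    print(bisect(lambda x: x**2 - 2, 1.0, 2.0, 0.0, 30))  # brackets sqrt(2)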
[21] Or a closed and bounded subset of Rⁿ or Cⁿ — Bolzano–Weierstrass works in these cases too.

Proof. (Alternative inf/sup proof.) Again, by considering −f if necessary we may assume
that f(a) ≤ c ≤ f(b). Define

    S := {x ∈ [a, b] : f(x) ≤ c}.

Then a ∈ S so S ≠ ∅ and S is bounded above by b. So, by the Completeness Axiom,
ξ := sup S exists.[22] Since a ∈ S we have ξ = sup S ≥ a and since b is an upper bound
for S we have ξ = sup S ≤ b. Therefore ξ ∈ [a, b].

By the approximation property of sup there exists xn ∈ S with ξ − 1/n < xn ≤ ξ. Then
xn → ξ, so continuity of f together with preservation of weak inequalities gives

    f(ξ) = lim_{n→∞} f(xn) ≤ c.

Assume ξ < b and pick yn → ξ with ξ < yn < b. As yn > ξ we have yn ∉ S and so
f(yn) > c. As yn → ξ, continuity of f together with preservation of weak inequalities gives

    f(ξ) = lim_{n→∞} f(yn) ≥ c.

On the other hand, if ξ = b then clearly f(ξ) = f(b) ≥ c. Hence f(ξ) = c.


Example 4.7. There exists a unique positive number ξ such that ξ² = 2.

To prove this we consider f : [1, 2] → R defined by f(x) = x². Note that f(1) = 1 < 2 <
4 = f(2) and also, as f is a polynomial, it is continuous. Thus, by the IVT, there exists
ξ ∈ [1, 2] such that f(ξ) = 2, as required. Uniqueness can be proved as in Analysis I.

Remark. The proof of existence of √2 given in Analysis I relied crucially on the Completeness
Axiom and on a trichotomy argument, as did our proofs of the IVT.
Corollary 4.8 (Continuous image of an interval). If I is an interval and f : I → R is
continuous, then the image f (I) = {f (x) : x ∈ I} is an interval.

Proof. Pick x ≤ y ≤ z with x, z ∈ f (I), say x = f (a), z = f (b), a, b ∈ I. Then as I is an


interval we have [a, b] ⊆ I (or [b, a] ⊆ I), so we can consider f as a continuous function
on [a, b]. By the IVT there exists a ξ between a and b with f (ξ) = y, so y ∈ f (I). Thus
f (I) has the interval property.
Corollary 4.9 (Continuous image of a closed bounded interval). Let f : [a, b] → R be
continuous. Then f ([a, b]) = [c, d] for some c, d ∈ R.

Proof. By the Boundedness Theorem, part (a), we can define


c := inf f (x) and d := sup f (x).
x∈[a,b] x∈[a,b]

Clearly f ([a, b]) ⊆ [c, d].

By the Boundedness Theorem, part (b), there exist α ∈ [a, b] and β ∈ [a, b] such that
f (α) = c and f (β) = d. Hence c, d ∈ f ([a, b]).

But f ([a, b]) is an interval by Corollary 4.8, so [c, d] ⊆ f ([a, b]). Hence f ([a, b]) = [c, d].
[22] We can also consider inf{x : f(x) ≥ c} and construct a similar proof.

Remark. It is not necessarily the case that c or d is f (a) or f (b). Consider, for example,
sin x on [0, 2π].
 Remark. In the Part A Topology course you will find out more about continuity and
how to capture this property more elegantly than with the ε-δ definition. You will also
encounter more general definitions of compact sets (in R these are just closed and bounded
sets) and connected sets (in R these are just intervals). The Boundedness Theorem is a
special case of the general result that a continuous image of a compact set is compact.
The IVT (or its equivalent reformulation, Corollary 4.8) is a special case of the general
result that a continuous image of a connected set is connected.

5 Monotonic functions and the Continuous IFT

Definition. Let E ⊆ R and f : E → R. We say that f is increasing (respectively


decreasing, strictly increasing, strictly decreasing) if for all x, y ∈ E with x < y
we have f (x) ≤ f (y) (respectively f (x) ≥ f (y), f (x) < f (y), f (x) > f (y)). A function is
called (strictly) monotonic or (strictly) monotone on E if it is (strictly) increasing
or decreasing on E.

Note that a function which is strictly monotonic is injective: x ̸= y =⇒ f (x) ̸= f (y).


Recall (from the Introduction to University Mathematics course) that a function f : X →
Y has an inverse f −1 : Y → X if and only if f is bijective, i.e., it is both injective and
surjective. If we consider f as a function from X to its image f (X), then it is by definition
surjective. Hence any injective function f : X → Y has an inverse f −1 : f (X) → X
defined on the image f (X) of f .
We are now ready to prove the next Big Theorem of this course. It will tell us that a
continuous, strictly monotonic function on an interval has a continuous inverse.
Theorem 5.1 (The Continuous Inverse Function Theorem (C-IFT)). Let f : I → R be
a strictly monotonic and continuous function on the interval I. Then
(a) f is a bijection from I to the interval f (I); and
(b) the inverse map to f , f −1 : f (I) → I, is also strictly monotonic and continuous.
Proof. Assume without loss of generality that f is strictly increasing. We know from
Corollary 4.8 that its image f (I) is an interval. As f is strictly increasing, it is injective
and hence gives a bijection from I to f (I). Hence the inverse function f −1 : f (I) → I,
defined by f −1 (y) = x when f (x) = y, is well-defined. It is also strictly increasing as if
y1 = f (x1 ), y2 = f (x2 ), then x1 > x2 implies y1 > y2 and x1 = x2 implies y1 = y2 . Hence
if y1 < y2 we must have f −1 (y1 ) = x1 < x2 = f −1 (y2 ) by trichotomy.
It only remains to show that f⁻¹ is continuous. Fix y0 = f(x0) ∈ f(I) and ε > 0. Assume
first that x0 ± ε ∈ I and let

    δ := min{f(x0) − f(x0 − ε), f(x0 + ε) − f(x0)}.

[Figure: the graph of f near x0, with the band (y0 − δ, y0 + δ) mapping into (x0 − ε, x0 + ε).]

Note that f is strictly increasing, so δ > 0. Also, if y ∈ f (I) and |y − y0 | < δ then
f (x0 − ε) ≤ f (x0 ) − δ = y0 − δ < y < y0 + δ = f (x0 ) + δ ≤ f (x0 + ε).
As f −1 is strictly increasing, this implies x0 − ε < f −1 (y) < x0 + ε and hence |f −1 (y) −
f −1 (y0 )| < ε as required.

If either of x0 ± ε is not in I then one can either reduce ε until it is, in which case the δ
found for this smaller ε suffices, or if x0 is an endpoint of I just remove the undefined
term in the minimum defining δ. For example, if x0 = min I and x0 + ε ∈ I, then set
δ := f(x0 + ε) − f(x0). Now for |y − y0| < δ we have as above that f⁻¹(y) < x0 + ε. But
f⁻¹(y) ≥ x0 as f⁻¹(y) ∈ I and x0 = min I. So again |f⁻¹(y) − f⁻¹(y0)| < ε.
Remark. Problem Sheet 3 asks you to prove that, if f : I → R is a continuous, injective
function with f (a) < f (b) for some a < b, then f is strictly increasing on I. So in the
statement of the C-IFT it is sufficient that f : I → R is continuous and injective.
Example 5.2. For any integer n ≥ 1 there exists a continuous, strictly increasing nth root
function ⁿ√ : [0, ∞) → R (general n) or ⁿ√ : R → R (n odd).

Indeed, the power function f(x) := x^n is continuous and strictly increasing on [0, ∞). Its
image is unbounded above and 0^n = 0, so as the image is an interval it must be [0, ∞).
If n is odd then f is strictly increasing on the whole of R and its image is unbounded in
both directions, so must be R.
Warning. If you choose to use the notation f⁻¹ for the inverse function of f, when this
exists, then you must make very clear what you intend the domains and codomains of f
and f⁻¹ to be.
Example 5.3. sin x is strictly increasing on [−π/2, π/2] with image [−1, 1]. Hence we can
define arcsin : [−1, 1] → [−π/2, π/2]. If we chose a different interval on which to define sin
we might either not have sin strictly monotonic (e.g., on [0, π]) or, if it is, we might get a
very different definition for arcsin (e.g., if we consider sin on [π/2, 3π/2]).

Exponentials and Logarithms

Your likely first encounter with inverse functions would have occurred when you were
introduced to the (natural) logarithm function as the inverse of the exponential function.
Here we show how to exploit the C-IFT to establish the existence and basic properties of
log x (or ln x as you may have known it at school). However, before that we need some
properties of the exponential function.

We define exp(x), also written e^x, by

    exp(x) = Σ_{k=0}^∞ x^k/k! = 1 + x + x²/2! + x³/3! + ···    (3)

The most important property of the exponential is that for all x, y ∈ C,


exp(x + y) = exp(x) exp(y). (4)

We will not give a proof here, but will prove it later (for real x and y only). If you wish
to see a proof of (4) that uses only Analysis I material and works for complex x and y,
see the supplementary material on exponentials on the website. However, all the other
properties of exp that we shall need are fairly easy to deduce from (3) and (4).

Proposition 5.4. exp : R → R is a continuous, strictly increasing function on R with


image (0, ∞). As a result, it has a strictly increasing continuous inverse log : (0, ∞) → R
which satisfies
log(xy) = log(x) + log(y)
for all x, y > 0. Moreover, we have the limits

    lim_{x→∞} exp x = ∞,  lim_{x→−∞} exp x = 0,  lim_{x→∞} log x = ∞,  lim_{x→0+} log x = −∞,

and the useful inequality exp(x) ≥ 1 + x for all x ∈ R.

Proof. We will prove later that any function defined by a power series is continuous, and
indeed differentiable, inside its radius of convergence, but for now we provide a simple
direct proof that works just for exp.

Claim 1: exp(x) > 0 and exp(x) ≥ 1 + x for all x ∈ R.


Proof. For x ≥ 0 this is clear from the definition exp(x) = 1 + x + x²/2! + ··· as all
remaining terms are non-negative. In particular exp(x) > 0 for all x ≥ 0. Now taking
y = −x in (4) gives exp(−x) = 1/exp(x) > 0 for all x ≥ 0. Also, for x ∈ [0, 1),
exp(x) ≤ 1 + x + x² + ··· = 1/(1 − x), and so exp(−x) ≥ 1 − x for x ∈ [0, 1). As this also holds
trivially for x ≥ 1, exp(−x) ≥ 1 − x for all x ≥ 0 and the claim follows.

Claim 2: exp is continuous on R.


Proof. For |x| < 1 we have 1 + x ≤ exp(x) = 1/exp(−x) ≤ 1/(1 − x) by Claim 1, so sandwiching
and AOL gives exp(x) → 1 = exp(0) as x → 0. Now, by Example 2.9, (4) and AOL,

    lim_{x→x0} exp(x) = lim_{h→0} exp(x0 + h) = lim_{h→0} exp(x0) exp(h) = exp(x0) · lim_{h→0} exp(h) = exp(x0).

Claim 3: exp has image (0, ∞), limx→∞ exp(x) = ∞, limx→−∞ exp(x) = 0.

Proof. By Claim 1, exp(x) ≥ 1 + x, so exp(x) → ∞ as x → ∞ by sandwiching. Hence


exp(x) = 1/ exp(−x) → 0 as x → −∞ by Extended AOL. As exp is continuous its image
must be an interval. The only possibility is (0, ∞) as it is unbounded above, contains
points arbitrarily close to 0, but only contains positive numbers.

Claim 4: exp is strictly increasing.

Proof. If x < y then exp(y) = exp(x) exp(y − x), but exp(y − x) ≥ 1 + (y − x) > 1 and
exp(x) > 0, so exp(y) > exp(x).

The first part of the proposition now follows from the C-IFT and applying log to the
equation
exp(log(xy)) = xy = exp(log x) exp(log y) = exp(log x + log y).
The limits for log follow from monotonicity: given M set N := e^M; then for x > N,
log x > M. Given M set δ := e^{−M}; then for 0 < x < δ, log x < −M.
Corollary 5.5. For any α ∈ R the function x ↦ x^α is continuous on (0, ∞).

Proof. x^α := exp(α log x) is a composition of continuous functions.

More on monotonic functions (non-examinable)

The material in this section is non-examinable for Analysis II. In it we consider the
situation for an arbitrary monotonic function, not assumed to be continuous. We start
with a function analogue to the results from Analysis I on monotonic sequences.
Theorem 5.6 (One-sided limits of increasing functions). Let f : E → R be increasing.
If p ∈ E is a left limit point of E then the left-hand limit f (p− ) of f at p exists and
f (p− ) = sup{f (x) : x < p, x ∈ E} ≤ f (p). Similarly if p ∈ E is a right limit point of E
then f (p+ ) = inf{f (x) : x > p, x ∈ E} ≥ f (p).

Proof. The set {f (x) : x < p, x ∈ E} is non-empty as p is a left limit point of E, and
is bounded above by f (p) since f is increasing. Therefore by the Completeness Axiom
ℓ := sup{f (x) : x < p, x ∈ E} exists and ℓ ≤ f (p). We have to show that f (p− ) = ℓ. Let
ε > 0. By the Approximation Property for sup, there exists xε ∈ E, xε < p, such that
ℓ − ε < f (xε ) ≤ ℓ.
Choose δ := p − xε . Then δ > 0 as xε < p. Also, as f is increasing,
p − δ = xε < x < p =⇒ ℓ − ε < f (xε ) ≤ f (x) ≤ ℓ.
By definition f (p− ) = ℓ and we are done.
The result for f (p+ ) can be obtained by a similar argument, or by applying what we have
done to the function −f (−x) on (−b, −a) and juggling with the inequalities.
Definition. We say that a monotonic f has a jump discontinuity at p if either p is a
left limit point of E and f (p− ) ̸= f (p) or p is a right limit point of E and f (p+ ) ̸= f (p).
Corollary 5.7. Any monotonic function has at most countably many points of disconti-
nuity.

Proof. We may assume without loss of generality that f is increasing. If f (p− ) = f (p) =
f(p⁺) then f is continuous at p, so if f is discontinuous at p then one of the intervals
(f (p− ), f (p)) or (f (p), f (p+ )) must be non-trivial. But distinct ‘jump’ intervals must
be disjoint as if p < q then f (p− ) ≤ f (p) ≤ f (p+ ) ≤ f (x) ≤ f (q − ) ≤ f (q) ≤ f (q + )
for any x ∈ (p, q). Also, each non-trivial interval must contain a rational number, so
we can construct an injective map from the set of discontinuities of f to Q, which is
countable.

We stress that the behaviour of monotonic functions is very special. Consider, for example,
f : (0, 1) → R given by

    f(x) = { 1, if x ∈ Q;  0, if x ∉ Q }.
Then the left-hand and right-hand limits f (p− ) and f (p+ ) fail to exist for every p ∈ (0, 1).
Moreover f is discontinuous at every point of the uncountable set (0, 1).

Corollary 5.8. If I is an interval and f : I → R then f is continuous if and only if the


image f (I) is an interval.

Proof. We have already seen that if f is continuous then f (I) is an interval. Now suppose
f is discontinuous at a point p ∈ I. Then either f (p− ) ̸= f (p) or f (p) ̸= f (p+ ). Suppose
wlog f (p− ) ̸= f (p). Then (as f (p− ) is defined), there exists a q < p with q ∈ I and
f (q) ≤ f (p− ) < f (p). But any point in (f (p− ), f (p)) lies between f (q) and f (p), but is
not in the image of f . Hence f (I) is not an interval.

Remark. We note that this gives an alternative way of showing f −1 is continuous in the
proof of the Continuous IFT: f −1 (f (I)) = I is an interval, so f −1 must be continuous.

6 Uniform continuity

This section and the next one are unashamedly technical. In them we look closely at
conditions for continuity of functions and at convergence of sequences of functions. The
pay-off will be theorems which are important throughout analysis.

Definition. Let f : E → R. Then f is uniformly continuous on E if

∀ε > 0 : ∃δ > 0 : ∀p ∈ E : ∀x ∈ E : (|x − p| < δ =⇒ |f (x) − f (p)| < ε).

Compare this with the definition of f being continuous on E, i.e., at every p ∈ E:

∀p ∈ E : ∀ε > 0 : ∃δ > 0 : ∀x ∈ E : (|x − p| < δ =⇒ |f (x) − f (p)| < ε).

The difference between the two statements is in the order of the quantifiers. Swapping
∀’s doesn’t affect the meaning, but swapping the order in which ∀p and ∃δ occur does
change the meaning. Read the expressions from left to right. For uniform continuity on
E we need a δ which works universally — that is, for all p in E at the same time. For
continuity on E we first choose any p and then find δ that works just for that choice of p:
in this case δ is allowed to depend on p.

Of course if f : E → R is uniformly continuous on E then f is continuous on E. The


converse is false, as we now demonstrate.

Example 6.1. Consider f(x) = sin(1/x) on E = (0, 1]. Certainly f is continuous on E. We
shall show that f fails to be uniformly continuous on E.
Take ε = 1. We show that there is no δ > 0 that works in the condition for uniform
continuity.
Take sequences xn = 1/(2πn + π/2) and pn = 1/(2πn + 3π/2). Then |f(xn) − f(pn)| =
|1 − (−1)| = 2, but |xn − pn| → 0. So for any δ > 0, there exist pn and xn such that
|xn − pn| < δ but |f(xn) − f(pn)| ≥ 1. So f is not uniformly continuous.

This example demonstrates an effective strategy for showing a function is not uniformly
continuous: find sequences xn and yn with |xn − yn | → 0 but |f (xn ) − f (yn )| ̸→ 0.

Example 6.2. Consider f(x) = cos(x²) on R. Take sequences xn = √(2nπ) and yn =
√((2n + 1)π). Then |xn − yn| = |xn² − yn²|/|xn + yn| = π/|xn + yn| → 0 as n → ∞
but |f(xn) − f(yn)| = |1 − (−1)| = 2 ↛ 0. Hence f is continuous, but not uniformly
continuous on R.
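A small numerical check (Python, illustration only, not a proof) tabulates these sequences:

    # The points x_n, y_n get arbitrarily close, but cos(x^2) differs by 2.
    import math

    for n in [1, 10, 100, 1000]:
        x = math.sqrt(2 * n * math.pi)
        y = math.sqrt((2 * n + 1) * math.pi)
        print(n, y - x, abs(math.cos(x**2) - math.cos(y**2)))

The second column shrinks like π/(xn + yn) while the third stays at 2, exactly as in the
argument above.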
Uniform continuity is a condition that is found to be necessary in certain technical proofs
in analysis which involve continuous functions on intervals.[23] So the following theorem is
important beyond the present course.
Theorem 6.3 (Continuity implies uniform continuity on closed bounded intervals). If
f : [a, b] → R is continuous, then f is uniformly continuous on [a, b].

Proof. Suppose for a contradiction that f were not uniformly continuous. By the con-
trapositive of the uniform continuity condition there would exist ε > 0 such that for
any δ > 0 — which we choose as δ = 1/n for arbitrary n — there exists a pair of points
xn, yn ∈ [a, b], such that

    |xn − yn| < 1/n  but  |f(xn) − f(yn)| ≥ ε.
Since each xn ∈ [a, b], the sequence (xn) is bounded, and by the Bolzano–Weierstrass
Theorem there exists a subsequence (x_{s_n}) which converges to some p. Now p must be a
limit point of [a, b], so p ∈ [a, b]. But

    y_{s_n} = (y_{s_n} − x_{s_n}) + x_{s_n} → 0 + p = p

by AOL, so by continuity at p we have

    |f(x_{s_n}) − f(y_{s_n})| ≤ |f(x_{s_n}) − f(p)| + |f(y_{s_n}) − f(p)| → 0 as n → ∞.

This gives the required contradiction as we assumed |f(xn) − f(yn)| ≥ ε for all n.
Remark. We note that uniform continuity, unlike continuity, is a global property: in the
examples above sin(1/x) is uniformly continuous on all intervals of the form [ε, 1], ε > 0, but
not on (0, 1], while cos(x²) is uniformly continuous on all intervals of the form [0, N], but
not on [0, ∞). Also, these examples show that both the conditions of closed and bounded
are required in Theorem 6.3. Note also that f itself being bounded did not help at all
when it came to uniform continuity in Examples 6.1 and 6.2.
[23] For example, it will be used in Analysis III to show one can always integrate a continuous function
on a closed bounded interval.

The following is a very special class of functions that are uniformly continuous.

Definition. We say that f is Lipschitz continuous on E if there exists a constant


K > 0 such that
∀x, y ∈ E : |f (x) − f (y)| ≤ K|x − y|.

Assume f satisfies this condition. Given ε > 0 choose δ := ε/K. Then δ > 0 and for
x, y ∈ E for which |x − y| < δ,

|f (x) − f (y)| ≤ K|x − y| < ε.

Thus f is uniformly continuous on E.

Later we will see (via the Mean Value Theorem) that ‘bounded derivative’ is enough to
imply Lipschitz, and hence uniform continuity. However, not all Lipschitz functions are
differentiable.

Example 6.4. f(x) = √x is Lipschitz continuous on [1, ∞), but not on [0, 1]. It is
however uniformly continuous on the whole of [0, ∞).

To obtain the Lipschitz condition on [1, ∞) note that, for all x, y ≥ 1,

    |√x − √y| = |x − y|/(√x + √y) ≤ (1/2)|x − y|,

so K = 1/2 works. However, |√x − √0| ≤ K|x − 0| fails to hold when 0 < x < 1/K², so √x is
not Lipschitz on [0, 1].

Now √x is continuous on [0, 1] (as it is the inverse of the strictly increasing continuous
function x² : [0, 1] → [0, 1]), so it is uniformly continuous on [0, 1] and (by the above)
also on [1, ∞). We now stitch these two together to establish uniform continuity on
[0, 1] ∪ [1, ∞) = [0, ∞). However, this takes a bit of care.

We know

    ∀ε > 0 : ∃δ1 > 0 : ∀x, y ∈ [0, 1] : (|x − y| < δ1 =⇒ |√x − √y| < ε/2)

and

    ∀ε > 0 : ∃δ2 > 0 : ∀x, y ∈ [1, ∞) : (|x − y| < δ2 =⇒ |√x − √y| < ε/2).

Choose δ = min{δ1, δ2} > 0. Suppose that |x − y| < δ. If x, y ≥ 1 or x, y ≤ 1 we are
done. So suppose (wlog) that x ∈ [1, ∞) and y ∈ [0, 1] and |x − y| < δ. Then |x − 1| < δ
and |1 − y| < δ so that

    |√x − √y| ≤ |√x − √1| + |√1 − √y| < ε/2 + ε/2 = ε.

Hence |√x − √y| < ε whenever x, y ∈ [0, ∞) are such that |x − y| < δ. By definition,
f(x) = √x is uniformly continuous on [0, ∞).

Remark. In general, if f is uniformly continuous on intervals I and J and I ∩ J ≠ ∅,
then f is uniformly continuous on the interval I ∪ J. However this does not apply to the
union of infinitely many intervals: f uniformly continuous on [n, n + 1] for each n does
not imply f is uniformly continuous on [1, ∞), as we saw with the cos(x²) example.

In the case of an interval that is not closed one can still give a simple condition for uniform
continuity. The proof of the following is an exercise on Problem Sheet 4.

Proposition 6.5. Assume f : (a, b] → R is continuous. Then f is uniformly continuous


if and only if limx→a+ f (x) exists.

7 Uniform convergence

In Analysis one often wants to know how different limiting processes interact with one
other. In particular, does a limiting process, such as that involved in continuity, commute
with another type of limit? Sadly, however, the answer in general is ‘No’. This leads us
to try to find sufficient conditions under which the answer will be ‘Yes’. In this section
we take a first excursion into problems of this kind.

Pointwise convergence

Initially, we want to consider a sequence (fn ) of functions, where E ⊆ R and fn : E → R


for n ∈ N. Observe that, for each fixed x ∈ E, the sequence (fn (x)) is a sequence of real
numbers, whose behaviour we can analyse by the techniques of Analysis I.

We say (fn ) converges (pointwise) to the function f : E → R (and write f = lim fn or


fn → f on E) if for each x ∈ E the sequence (fn (x)) converges to f (x). That is,

∀x ∈ E : ∀ε > 0 : ∃N ∈ N : ∀n > N : |fn (x) − f (x)| < ε. [pointwise convergence on E]

Note that here N is allowed to depend on both x and ε.

Pointwise convergence is nothing unfamiliar. In saying, for example,


    e^x = 1 + x + x²/2! + ···  on R

we mean precisely that the partial sums of the series on the right-hand side converge
pointwise to the exponential function for each x ∈ R.

Example 7.1. Consider the sequence of functions (fn), where fn : [0, 1] → R is given by

    fn(x) := { 1 − nx, if 0 ≤ x < 1/n;  0, if x ≥ 1/n }.

Consider also the function f : [0, 1] → R given by

    f(x) := { 1, if x = 0;  0, otherwise }.

What happens as n increases? Note that for each fixed x ∈ [0, 1] we have f (x) =
limn→∞ fn (x) (separate cases x ̸= 0 and x = 0). Hence (fn ) converges pointwise to f .

Note that although all the fn are continuous, the pointwise-limit function f is not con-
tinuous at 0. Spelling this out,

    lim_{x→0} lim_{n→∞} fn(x) = lim_{x→0} f(x) = 0  but  lim_{n→∞} lim_{x→0} fn(x) = lim_{n→∞} 1 = 1.

The order in which the limits are taken affects the value of the iterated limit.


Moral: in general, iterated limits may squabble. They must be handled with care.

Uniform continuity leads to stronger results than continuity one point at a time. The idea
in the definition of uniform continuity was to require a ‘universal δ’. There is a parallel
with the key definition of this section, which we now give.

Uniform convergence

Definition. Let (fn ) be a sequence of functions fn : E → R or C. Then (fn ) converges


uniformly to f on E if[24]

∀ε > 0 : ∃N ∈ N : ∀n > N : ∀x ∈ E : |fn (x) − f (x)| < ε. [uniform convergence on E]

If this holds we write fn →ᵘ f on E. Note that specifying the set E is an integral part of
the definition. The order of the quantifiers matters: the uniform convergence condition
demands a universal N which is independent of x (although it may still depend on ε).

It is immediate from the definitions that if fn →ᵘ f on E then (fn) converges pointwise
to f on E.

The next theorem gives a reason why uniform convergence is a Good Thing.

Theorem 7.2 (Uniform limits preserve continuity). Let (fn ) be a sequence of continuous
functions on E which converges uniformly to f on E. Then f is continuous on E.

Proof. To prove continuity of f we first fix some p ∈ E and ε > 0.


[24] The order of ∀n and ∀x does not matter here, so could be swapped to make the correspondence with
the definition of uniform continuity clearer. However this form is slightly more convenient.

By uniform convergence we can find N ∈ N such that

    n > N =⇒ ∀x ∈ E : |fn(x) − f(x)| < ε/3.

Fix an n > N. Then by continuity of fn at p there exists δ > 0 such that

    |x − p| < δ =⇒ |fn(x) − fn(p)| < ε/3

(δ depending on n — but n is fixed). Hence for |x − p| < δ,

    |f(x) − f(p)| ≤ |f(x) − fn(x)| + |fn(x) − fn(p)| + |fn(p) − f(p)| < ε/3 + ε/3 + ε/3 = ε.

This suffices to prove our claim. Note that uniformity of convergence is needed to handle
the first term simultaneously for every relevant x.

We now convert the uniform convergence condition into a more amenable form.

Proposition 7.3 (Testing for uniform convergence). Assume f, fn : E → R or C. Then


the following statements are equivalent:
(a) fn →ᵘ f on E;
(b) for each sufficiently large n the set {|fn (x) − f (x)| : x ∈ E} is bounded and

    sn := sup_{x∈E} |fn(x) − f(x)| → 0 as n → ∞.

Proof. Assume (a). Then, given ε > 0 there exists N ∈ N such that for n > N and for
all x ∈ E we have |fn(x) − f(x)| < ε/2. So the first condition in (b) holds for such n and
hence sn is well defined. Fix n and take the supremum over x ∈ E to get, for n > N,

    0 ≤ sn = sup_{x∈E} |fn(x) − f(x)| ≤ ε/2 < ε.

Hence sn → 0.

Conversely, assume (b). Given ε > 0, choose N so that n > N implies sn < ε. Then, for
all n > N and all x ∈ E,

    |fn(x) − f(x)| ≤ sn < ε.

Hence fn →ᵘ f.

A few comments on working with Proposition 7.3 are in order. First of all, it allows us
to reduce testing for uniform convergence of (fn ) on E to three steps:

Step 1: find the pointwise limit.

With x ∈ E fixed, find f (x) := limn→∞ fn (x), or show it fails to exist (of course, if the
pointwise limit fails to exist for any x ∈ E, then certainly (fn ) does not converge uniformly
and we proceed no further). Look out for values of x which need special attention.

Step 2: calculate (or find bounds for) sn .

Assuming all fn and f are continuous and E is an interval [a, b] (the most common
scenario), the Boundedness Theorem applied to the continuous function |fn − f | tells us
the sup is attained, so we want to know the maximum value of |fn − f |. Frequently fn − f
will be of constant sign so we can get rid of the modulus signs. Then, if the functions fn
and f are differentiable the supremum (or infimum) of fn − f will be achieved either at a
or at b or at some interior point where fn′ (x) − f ′ (x) = 0. It is fine to use school calculus
to find maxima and minima by differentiation, when the derivative exists — we’ll validate
this technique later. See examples below for illustrations.

Step 3: see if sn tends to 0.

Now (sn ) is a sequence of real numbers. We are back in Analysis I territory, and can use
standard techniques and standard limits from that course.

Note that in Step 1 we work with fixed x and in Step 2 we work with fixed n (and in
Step 3 we don’t have x anymore): we never need to consider both x and n varying at the
same time.
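The three steps can also be mimicked numerically. The Python sketch below (illustrative
only; a finite grid of sample points gives just a lower bound for the supremum, and the
helper name sup_estimate is ours) estimates sn for fn(x) = x^n with f = 0 — compare with
Example 7.5 below:

    # Estimate s_n = sup |f_n - f| on a grid, for f_n(x) = x^n and f = 0.
    def sup_estimate(n, xs):
        return max(abs(x**n) for x in xs)

    grid_E = [k / 1000 for k in range(1000)]        # sample points of E = [0, 1)
    grid_b = [0.9 * k / 1000 for k in range(1001)]  # sample points of [0, 0.9]

    for n in [5, 20, 80]:
        # stays near 1 on [0, 1), but decays like 0.9^n on [0, 0.9]
        print(n, sup_estimate(n, grid_E), sup_estimate(n, grid_b))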

Example 7.4. Let

    fn(x) := { 1 − nx, if 0 ≤ x < 1/n;  0, if x ≥ 1/n }.
Step 1: Fix x. Suppose first that x ≠ 0. Then ∃N ∈ N such that 0 < 1/N < x
(Archimedean Property). This implies fn(x) = 0 for all n > N. Therefore fn(x) → 0 as
n → ∞ whenever x ≠ 0. If x = 0, then fn(0) = 1 and so fn(0) → 1.

We deduce that the pointwise limit indeed exists and equals f , where
    f(x) := { 1, if x = 0;  0, otherwise }.

Step 2: Now fix n and calculate sn .

    sn := sup_{x∈[0,1]} |fn(x) − f(x)| = sup_{x∈(0,1/n]} |1 − nx| = 1.

Step 3: Trivially, sn → 1 ̸= 0. Hence (fn ) is not uniformly convergent.

Of course the contrapositive of Theorem 7.2 gives an alternative proof that convergence
cannot be uniform.

Example 7.5. Let E = [0, 1) and let fn (x) = xn .

Step 1: For fixed x ∈ [0, 1), we have xn → 0 as n → ∞. Hence the pointwise limit is
f = 0.

Step 2: Trivially, sn = sup_{x∈[0,1)} |x^n| = 1. Indeed x ∈ [0, 1) implies |x^n| ≤ 1, but
x^n → 1 as x → 1⁻.

Step 3: Hence sn ̸→ 0, so convergence is not uniform.

Now consider what happens if, with fn as before, we work on [0, b], where b is a constant
with 0 ≤ b < 1. The pointwise limit is unchanged but now

    sn = sup_{x∈[0,b]} x^n = b^n → 0 as n → ∞.

Hence convergence is uniform on [0, b] for each fixed b < 1.

This example highlights that uniform convergence, or not, depends on the set E. It
makes no sense to say '(fn) converges uniformly' without specifying the set E on which
the functions are considered. Also being uniformly convergent on each En = [0, 1 − 1/n]
does not imply uniform convergence on ⋃_n En = [0, 1).
Example 7.6. Let E = [0, 1] and let

    fn(x) := nx/(1 + n²x²).

[Figure: graphs of f1, f2, f4, f8, f16, f32 on [0, 1], each with a bump moving towards 0.]

Clearly lim_{n→∞} fn(x) = 0 for every x ∈ [0, 1]. But fn(1/n) = 1/2, so that

    sup_{x∈[0,1]} |fn(x) − f(x)| ≥ 1/2 ↛ 0 as n → ∞

and so (fn) converges to 0 but not uniformly on [0, 1].

In fact x = 1/n is the point at which fn(x) is maximal, so sn = 1/2. One can find this point
by setting fn′ = 0. Note that we don't need to justify this; it is enough that x = 1/n breaks
uniform convergence.
Example 7.7. Let E = [0, 1] and consider

    fn(x) := nx³e^{−nx²}.

[Figure: graphs of f1, f2, f4, f8, f16, f32 on [0, 1].]

Step 1: Fix x. For x = 0, fn(x) = 0 for all n. For x > 0 we have, from the exponential series,

    0 ≤ fn(x) = nx³/(1 + nx² + (nx²)²/2! + ···) ≤ 2/(nx) → 0 as n → ∞.

So, by sandwiching, fn(x) → 0, and this is true for x = 0 too, trivially.


Step 2: Fix n and compute sn := sup{nx³e^{−nx²} : x ∈ [0, 1]}. We have

    (d/dx) nx³e^{−nx²} = 3nx²e^{−nx²} − 2n²x⁴e^{−nx²} = nx²(3 − 2nx²)e^{−nx²}

and this is zero when x = 0 (giving a minimum) and when 2nx² = 3 (giving a maximum).
Hence

    sn = [nx³e^{−nx²}]_{x=√(3/2n)} = n(3/2n)^{3/2}e^{−3/2} = C/√n

where C is a constant independent of n.

If you prefer a proof that does not rely on calculus one can note that for x > 1/√n,
0 ≤ fn(x) ≤ 2/(nx) ≤ 2/√n, and for x ≤ 1/√n, 0 ≤ fn(x) ≤ nx³ ≤ 1/√n; hence sn ≤ 2/√n. [If one
needs separate arguments to bound a function in different ranges, it is often easiest to
split at a point (here x = 1/√n) that is close to the maximum.]
Step 3: From Step 2, sn → 0 as n → ∞. Therefore fn →ᵘ 0 on [0, 1].
Example 7.8 (Partial sums of the geometric series). On E = (−1, 1) consider (fn) given by

    fn(x) := 1 + x + ··· + x^n = (1 − x^{n+1})/(1 − x).

Step 1: Fix x with |x| < 1 and let n → ∞. Then fn(x) → f(x) := 1/(1 − x).

Step 2: Fix n. Here

    { |fn(x) − f(x)| : |x| < 1 } = { |(1 − x^{n+1})/(1 − x) − 1/(1 − x)| : |x| < 1 } = { |x|^{n+1}/(1 − x) : |x| < 1 }

is not bounded above. To see this, consider what happens as x → 1⁻.

Just as we found for sequences of real numbers, there is a characterisation of uniform


convergence which does not depend on knowing the limit function.
Theorem 7.9 (Cauchy Criterion for uniform convergence of sequences). For n ∈ N let
fn : E → R or C. Then (fn) converges uniformly on E if and only if[25]

    ∀ε > 0 : ∃N ∈ N : ∀n, m > N : ∀x ∈ E : |fn(x) − fm(x)| < ε.

Proof. =⇒: Suppose (fn) converges uniformly on E with limit function f. Then

    ∀ε > 0 : ∃N ∈ N : ∀n > N : ∀x ∈ E : |fn(x) − f(x)| < ε/2.

So, for all ε > 0 there exists an N such that

    ∀n, m > N : ∀x ∈ E : |fn(x) − fm(x)| ≤ |fn(x) − f(x)| + |fm(x) − f(x)| < ε/2 + ε/2 = ε.
Hence the uniform Cauchy criterion holds.

⇐=: Suppose the uniform Cauchy criterion holds. Then for each x ∈ E, (fn(x)) is a
Cauchy sequence in R, so it is convergent. Let us denote its limit by f(x). Now

    ∀ε > 0 : ∃N ∈ N : ∀n, m > N : ∀x ∈ E : |fn(x) − fm(x)| < ε/2.

Fix ε > 0, N ∈ N, n > N and x ∈ E, and let m → ∞ in the above inequality. By AOL
and the preservation of weak inequalities,[26]

    |fn(x) − f(x)| = lim_{m→∞} |fn(x) − fm(x)| ≤ ε/2 < ε.

As this holds for all n > N and all x ∈ E, fn →ᵘ f on E.
[25] Note that this is just the pointwise Cauchy criterion but with the ∀x moved to after the ∃N.
[26] Note how < ε/2 changed to ≤ ε/2 here.

An important application of the Cauchy criterion is to series where we often do not know
what the limit should be. Indeed, we often use series to define a function.

As usual, we handle a series by considering its sequence of partial sums. Accordingly,
given a sequence (uk) of functions defined on a set E we say that the series Σuk converges
pointwise (uniformly) on E if (fn) converges pointwise (uniformly) on E, where

    fn(x) := u1(x) + u2(x) + ··· + un(x) = Σ_{k=1}^n uk(x).

Assume each uk is continuous on E. Then each fn is also continuous on E. As a
corollary of Theorem 7.2 we deduce that if Σuk converges uniformly on E then Σ_{k=1}^∞ uk
is continuous on E. So we need some way of determining when the convergence is uniform.

Corollary 7.10 (Cauchy Criterion for uniform convergence of series). Let (uk) be a
sequence of functions on E. Then Σuk converges uniformly on E if and only if

    ∀ε > 0 : ∃N ∈ N : ∀n > m > N : ∀x ∈ E : |u_{m+1}(x) + ··· + un(x)| < ε.


Proof. Apply Theorem 7.9 to the sequence of partial sums given by fn := Σ_{k=1}^n uk.

There is a more user-friendly sufficient condition for uniform convergence of a series. It


is not a necessary condition however.

Theorem 7.11 (Weierstrass' M-test). Suppose there exist real constants Mk such that

    ∀k : ∀x ∈ E : |uk(x)| ≤ Mk  and  ΣMk converges.

Then the series Σuk(x) converges uniformly on E.

Remark. It is critically important in the M-test that ΣMk is a convergent series of
constants: Mk must be independent of x.
Proof. Apply the Cauchy criterion (Theorem 0.2) to the partial sums of ΣMk:

    ∀ε > 0 : ∃N ∈ N : ∀n > m > N : |Σ_{k=1}^n Mk − Σ_{k=1}^m Mk| = M_{m+1} + ··· + Mn < ε.

Thus we have for each x ∈ E and all n > m > N ,

|fm (x) − fn (x)| = |um+1 (x) + · · · + un (x)| ≤ Mm+1 + · · · + Mn < ε. (5)

Hence[27] for each fixed x, (fn(x)) satisfies the Cauchy criterion, and so converges to f(x)
say. Thus the series Σuk converges pointwise.
[27] Actually we are now done by Corollary 7.10, but if you are asked to prove the M-test in an exam
you should write out the details as I have done here.

To check that convergence is uniform, take the limit as n → ∞ in (5) (with x and m
fixed) to get that for all m > N and x ∈ E,

    |fm(x) − f(x)| ≤ ε.

As ε was arbitrary and N did not depend on x, fm converges to f uniformly as m → ∞.
Example 7.12. On E = [0, 1] and for k ≥ 1, let uk(x) = x^p/(1 + k²x²) where p is a constant.

Assume p ≥ 2. Then, for x ∈ [0, 1],

    |uk(x)| ≤ x^{p−2}/k² ≤ Mk := 1/k².    (6)

Since Σk⁻² converges, Σuk(x) converges uniformly on [0, 1] by the M-test.

Now assume 1 < p < 2. The choice of Mk we used in (6) no longer works. Note that
uk(x) ≥ 0 so, for fixed k, let's find the maximum value of uk(x) on [0, 1] by differentiation.
We have

    uk′(x) = (px^{p−1}(1 + k²x²) − 2k²x^{p+1})/(1 + k²x²)²

and we see that the maximum of uk on [0, 1] is achieved at xk ∈ [0, 1] where xk = √(p/(2 − p))/k.
We deduce that, for all x ∈ [0, 1],

    0 ≤ uk(x) ≤ uk(xk) ≤ Mk := C/k^p,

where C is a positive constant depending on p but independent of x.
[Alternatively: if x < 1/k, uk(x) ≤ x^p ≤ 1/k^p; while if x ≥ 1/k, uk(x) ≤ x^p/(k²x²) = (1/k^p)(kx)^{p−2} ≤ 1/k^p.]
The series Σ1/k^p converges for p > 1 by the Integral Test. Hence Σuk converges uniformly
on [0, 1] by the M-test.

Remark. The M-test is useful when it works, but is not infallible. It investigates the
maximum of each term separately rather than of the expression arising in the uniform
Cauchy criterion, Corollary 7.10.

Power series

We now reach another Big Theorem.

Theorem 7.13 (Uniform convergence and continuity of power series). Let Σck x^k be a
real or complex power series with radius of convergence R ∈ (0, ∞].

(a) Σck x^k converges uniformly on {x : |x| ≤ ρ} for any (finite) ρ with 0 < ρ < R.

(b) f(x) := Σ_{k=0}^∞ ck x^k defines a continuous function f on {x : |x| < R}.

Proof. (a) Let Mk = |ck ρ^k|. Then as ρ < R, Σck ρ^k converges absolutely, and so ΣMk
converges. For |x| ≤ ρ, |ck x^k| ≤ Mk, so Σck x^k converges uniformly on {x : |x| ≤ ρ} by
the M-test.

(b) Fix x0 with |x0| < R and choose ρ so that |x0| < ρ < R. By (a), Σck x^k converges
uniformly on {x : |x| ≤ ρ} and, as polynomials are continuous, Theorem 7.2 implies that
the limit f(x) is continuous on {x : |x| ≤ ρ}. Hence f is continuous at x0.

Remark. We needed |x0| < ρ in the proof of (b). If |x0| = ρ we would only be able to
deduce some sort of one-sided continuity of f from continuity on {x : |x| ≤ ρ}.

Corollary 7.14. The following functions, given by power series with infinite radius of
convergence, are continuous on R:

exp x, sin x, cos x, sinh x, cosh x.

Functions derived from these via reciprocal and quotient, such as

cosec x, sec x, tan x, cot x

are continuous on any set on which the denominator is never zero.

Functions which can be derived from the above functions by application of the Continuous
Inverse Function Theorem are themselves continuous. This includes log x on (0, ∞) and
arctan x on (−∞, ∞).

Warning. We cannot stress too strongly that Theorem 7.13 is subtle and needs applying
with care.

Let Σck x^k be a power series with radius of convergence R > 0. In general Σck x^k will
not converge uniformly on {x : |x| < R}. Indeed, Example 7.8 shows that Σx^k is not
uniformly convergent on (−1, 1). It does however converge uniformly on any interval
[−ρ, ρ] with 0 < ρ < 1, and the limit is continuous on the whole of (−1, 1).
Remember that uniform convergence (and uniform continuity) are global properties: they
depend on the whole of E. Pointwise convergence and continuity are local properties —
for them to hold on E one just needs to check what happens at or near each x0 ∈ E.

Example 7.15. Consider the series Σ_{k=0}^∞ x^k cos(kx²) on E = [0, 1). By the Comparison
Test the series converges for each fixed x ∈ [0, 1).

For any η with 0 < η < 1,

    ∀x ∈ [0, η] : |x^k cos(kx²)| ≤ Mk := η^k  and  ΣMk converges.

By the M-test, the series converges uniformly on [0, η].

We don’t have a candidate for Mk which would show P∞ that the series is uniformly con-
vergent on [0, 1). Nonetheless we claim that f (x) := k=0 xk cos(kx2 ) defines a function
which is continuous on [0, 1). To do this, fix p with 0 ≤ p < 1 and choose η > 0 with
p < η < 1. Then the series converges uniformly on [0, η]. Since each function xk cos(kx2 )
is continuous on [0, η], Theorem 7.2 implies that f is continuous on [0, η] and hence is
continuous at p.

Example 7.16. Consider the series

    Σ_{k=0}^∞ k²x/(1 + k⁴x²).

We claim that this converges uniformly on [δ, 1] for each δ with 0 < δ < 1. Let Mk := k⁻²δ⁻¹.
Then, on [δ, 1],

    k²x/(1 + k⁴x²) ≤ k²x/(k⁴x²) = 1/(k²x) ≤ k⁻²δ⁻¹ = Mk.

Since ΣMk converges, we do indeed have uniform convergence on each interval [δ, 1].
We shall now show that the series is not uniformly convergent on the interval (0, 1].
[Note: failing to find an appropriate Mk is not enough — the M-test is sufficient but not
necessary for uniform convergence.]

If the series were uniformly convergent, the uniform Cauchy criterion would show that,
for any ε > 0 there exists N such that for all x ∈ (0, 1] and all n > N (taking m = n − 1,
so the sum reduces to the single term k = n),

    n²x/(1 + n⁴x²) < ε.

But for x = 1/n² this would give 1/2 < ε for every ε > 0, a contradiction. [More generally:
if Σuk(x) converges uniformly on E then uk(x) → 0 uniformly on E.]

But, localising to a point p ∈ (0, 1] and choosing δ such that 0 < δ < p, we see that the
series defines a function which is continuous on (0, 1].
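One can also see the obstruction numerically (Python, illustration only): the k = n term
of the series, evaluated at x = 1/n², equals 1/2 for every n, so no tail of the series can be
made uniformly small on (0, 1].

    # The k = n term at x = 1/n^2 is n^2*x/(1 + n^4*x^2) = 1/2 for all n.
    for n in [2, 10, 100]:
        x = 1.0 / n**2
        print(n, n**2 * x / (1 + n**4 * x**2))  # prints 0.5 each time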

8 Differentiation

In this section we look at differentiation, making use of the machinery of function limits
which we have developed. We rediscover all the familiar differentiation rules from school
calculus and start to explore examples of functions which are and are not differentiable.
Major theorems on differentiable functions come in the next section.
Definition. Let f : E → R, and let x0 ∈ E be a limit point of E ⊆ R. We say f is
differentiable at x0 if the following limit exists:
    lim_{x→x0} (f(x) − f(x0))/(x − x0).
When it exists we denote the limit by f ′ (x0 ) and we call it the derivative of f at x0 .
We say that f is differentiable on E if f is differentiable at every point of E.

Alternative notations: We shall, as convenient, adopt the various different ways of writing
derivatives with which you’ll be already familiar: for a differentiable function y = y(x):
    y′  or  dy/dx  or  (d/dx) y(x).
We next present a reformulation of the definition of differentiability at a point. The
central idea is to avoid the need for division, which often simplifies the algebra.[28]

Proposition 8.1 (Alternative formulation of differentiability). Let f : E → R and let x0


be a limit point of E. Then the derivative f ′ (x0 ) exists and equals ℓ iff one can write

f (x0 + h) = f (x0 ) + ℓh + ε(h)h

with ε(h) → 0 as h → 0.

Proof. Note that for any x = x0 + h ̸= x0 , f (x0 + h) = f (x0 ) + ℓh + ε(h)h is equivalent to

    ε(h) = (f(x0 + h) − f(x0))/((x0 + h) − x0) − ℓ.

Thus the definition of the derivative being equal to ℓ is precisely the condition (after the
change of variable x = x0 + h) that ε(h) → 0 as h → 0.

Example 8.2. It is immediate that f given by f(x) = x is differentiable on R with
f′(x) = 1. Indeed, we can take ℓ = 1, ε(h) = 0 in Proposition 8.1. Slightly more
interestingly, f(x) = x² is differentiable with f′(x) = 2x: take ℓ = 2x0 and ε(h) = h in
Proposition 8.1.
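As a quick sanity check of Proposition 8.1 (Python, illustration only; the helper name eps
is ours), for f(x) = x² and ℓ = 2x0 the error term ε(h) works out to exactly h, so it visibly
tends to 0:

    # eps(h) = (f(x0 + h) - f(x0) - l*h)/h; for f(x) = x^2 and l = 2*x0 this is h.
    def eps(f, x0, l, h):
        return (f(x0 + h) - f(x0) - l * h) / h

    x0 = 3.0
    for h in [0.1, 0.01, 0.001]:
        print(h, eps(lambda x: x**2, x0, 2 * x0, h))  # ~ h, up to rounding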

Another easy consequence is that differentiability implies continuity.

Proposition 8.3 (Differentiability implies continuity). Let f : E → R and let x0 be a


limit point of E. If f is differentiable at x0 , then it is continuous at x0 .

Proof. lim_{h→0} f(x0 + h) = f(x0) is immediate from Proposition 8.1 and AOL.

[Alternatively: lim_{x→x0}(f(x) − f(x0)) = lim_{x→x0} (f(x) − f(x0))/(x − x0) · lim_{x→x0}(x − x0) = f′(x0) · 0 = 0.]

Generalisations

Generalisations to functions C → C and R → C are straightforward. We can’t extend


to functions C → R. (Why: firstly f ′ would have to be in C anyway since we need to
divide by x − x0 ∈ C, but for a more fundamental problem wait for the Part A course
Metric spaces and Complex Analysis — it turns out that f would have to be constant for
f ′ to exist in any reasonable subset of C.) Extensions to vector-valued function are also
straightforward, but basically just amount to doing everything coordinatewise. Functions
of several variables, or functions of vectors, are a bit more complicated (see Multivariable
Calculus or, better, the Part A course Multidimensional Analysis and Geometry).
[28] Also, with minor changes, it allows for differentiation of functions defined on vectors. More on this
in the Part A course Multidimensional Analysis and Geometry.

Big-O and little-o notation

When expressing error terms, it is often convenient to use Landau’s big-O/little-o notation
that was introduced in Analysis I.
Definition. If f, g : E → R we say f (x) = O(g(x)) as x → p if there is a constant
M such that |f (x)/g(x)| ≤ M for x sufficiently close to p. We say f (x) = o(g(x)) if
f (x)/g(x) → 0 as x → p.

Here we include the possibility that p = ±∞ in which case ‘sufficiently close to p’ means
‘sufficiently large’ (or ‘sufficiently large and negative’ when p = −∞).
Example 8.4. We have x² = o(x), sin x = O(x), sin x = o(1) as x → 0. We have
x = o(x²), sin x = O(1), 1/x = o(1), log x = O(x) as x → ∞.
Example 8.5. We can write the condition for differentiability in Proposition 8.1 as
f (x0 + h) = f (x0 ) + f ′ (x0 )h + o(h) as h → 0.

Remark. Writing f(x) = O(g(x)) or f(x) = o(g(x)) is slight abuse of notation as the
RHS is really a set of possible functions, one of which matches the LHS. In particular,
o() or O() should only appear on one side in any equation (typically the RHS). It would
be very confusing to write e.g. O(√x) = o(x).

One-sided derivatives

If E = [a, b] then differentiability of f at a or b involves taking a one-sided limit. More
generally, sometimes it is helpful or necessary to consider one-sided versions of derivatives
even when we are not at one end of the domain. We say that f has a right-derivative
at x0 if

    lim_{x→x0+} (f(x) − f(x0))/(x − x0)

exists. This is equivalent to asking for the function f restricted to [x0, b) to have a
derivative at x0. When it does, we denote the limit by f+′(x0). (Alternative notation:
f′(x0⁺), but this can be confused with lim_{x→x0+} f′(x), which is not the same thing![29])
Similarly, f has a left-derivative at x0 if

    lim_{x→x0−} (f(x) − f(x0))/(x − x0)

exists, in which case we write it as f−′(x0).

The following result is easily proved.


Proposition 8.6. Let f : E → R and assume x0 ∈ E is both a left and right limit point
of E. Then the following are equivalent:
[29] Although it turns out that if f is continuous and lim_{x→x0+} f′(x) exists then so does f+′(x0)
and they are equal. See Problem Sheet 5.

(a) f is differentiable at x0 ;
(b) f has both left- and right-derivatives at x0 and they are equal.

Example 8.7. Consider f (x) = |x| on R. Here f is differentiable at any x0 ̸= 0. At 0


we have one-sided derivatives f−′ (0) = −1 and f+′ (0) = 1, so f ′ (0) fails to exist.

This example shows that a function which is continuous at a point x0 need not be differentiable at x0.

Example 8.8. Define f : R → R by

    f(x) = x^{3/2} for x > 0;  f(x) = 0 for x ≤ 0.

Then f′−(0) exists and equals 0, obviously. Also

    f′+(0) = lim_{x→0+} (x^{3/2} − 0)/(x − 0) = lim_{x→0+} √x = 0.

Hence, by Proposition 8.6, f′(0) exists and equals 0. Alternatively, we can give a direct sandwiching argument:

    |(f(x) − f(0))/(x − 0) − 0| ≤ |x|^{3/2}/|x| = √|x| → 0 as x → 0.
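[Aside, not from the printed notes: one-sided derivatives are easy to explore numerically. The following Python sketch (the helper names left_quotient and right_quotient are ours) computes one-sided difference quotients at 0 for |x| (Example 8.7) and for the f of Example 8.8.

    # One-sided difference quotients at 0 for |x| and for x^{3/2} (0 for x <= 0).
    def right_quotient(f, x0, h):
        return (f(x0 + h) - f(x0)) / h        # h > 0: approximates f'_+(x0)

    def left_quotient(f, x0, h):
        return (f(x0 - h) - f(x0)) / (-h)     # h > 0: approximates f'_-(x0)

    g = lambda x: x**1.5 if x > 0 else 0.0    # the f of Example 8.8

    for h in [1e-1, 1e-3, 1e-6]:
        print(left_quotient(abs, 0, h), right_quotient(abs, 0, h),  # -> -1 and +1
              left_quotient(g, 0, h), right_quotient(g, 0, h))      # -> 0 and 0

For |x| the two quotients stay at −1 and +1, so no derivative exists at 0; for Example 8.8 both shrink to 0, matching Proposition 8.6.]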

Now we start assembling the rules of differential calculus as you learned them at school,
but now obtained as consequences of AOL for function limits.

Theorem 8.9 (Algebraic properties of differentiation). Assume that f, g : E → R are


both differentiable at the limit point x0 ∈ E, and that a, b ∈ R. Then the following hold.
(a) Linearity: af (x) + bg(x) is differentiable at x0 with derivative af ′ (x0 ) + bg ′ (x0 ).
(b) Product Rule: f (x)g(x) is differentiable at x0 with derivative f ′ (x0 )g(x0 )+f (x0 )g ′ (x0 ).
(c) Quotient Rule: Assume g(x0) ≠ 0. Then f(x)/g(x) is differentiable at x0 with derivative

    (f′(x0)g(x0) − f(x0)g′(x0))/g(x0)².

Proof. (a)&(b) We have

    f(x0 + h) = f(x0) + f′(x0)h + ε1(h)h,
    g(x0 + h) = g(x0) + g′(x0)h + ε2(h)h,

where ε1(h), ε2(h) → 0 as h → 0. Then

    af(x0 + h) + bg(x0 + h) = af(x0) + bg(x0) + (af′(x0) + bg′(x0))h + [aε1(h) + bε2(h)]h

and

    f(x0 + h)g(x0 + h) = f(x0)g(x0) + (f(x0)g′(x0) + f′(x0)g(x0))h
                           + [f(x0)ε2(h) + g(x0)ε1(h) + (f′(x0) + ε1(h))(g′(x0) + ε2(h))h]h.

By standard AOL for function limits the expressions in square brackets tend to 0 as h → 0. Now by Proposition 8.1 we deduce that af(x) + bg(x) and f(x)g(x) are differentiable at x0, with derivatives af′(x0) + bg′(x0) and f(x0)g′(x0) + f′(x0)g(x0) respectively.

[If one wanted to write these proofs out using o-notation, one could write:

    af(x0 + h) + bg(x0 + h) = af(x0) + bg(x0) + af′(x0)h + bg′(x0)h + o(ah) + o(bh)
                            = af(x0) + bg(x0) + (af′(x0) + bg′(x0))h + o(h),
    f(x0 + h)g(x0 + h) = (f(x0) + f′(x0)h + o(h))(g(x0) + g′(x0)h + o(h))
                       = f(x0)g(x0) + f′(x0)g(x0)h + f(x0)g′(x0)h + f′(x0)g′(x0)h²
                           + o(f(x0)h) + o(f′(x0)h²) + o(g(x0)h) + o(g′(x0)h²) + o(h²)
                       = f(x0)g(x0) + (f′(x0)g(x0) + f(x0)g′(x0))h + o(h). ]

(c) We first give the result when f(x) := 1. Note that

    (1/g(x) − 1/g(x0))/(x − x0) = (−1/(g(x)g(x0))) · (g(x) − g(x0))/(x − x0).

Taking limits as x → x0 and using AOL and continuity of g at x0 gives that (1/g)′(x0) exists and

    (1/g)′(x0) = (−1/g(x0)²) · g′(x0).

The general quotient rule can then be obtained by combining this with the product rule:

    (f/g)′(x0) = f′(x0) · (1/g(x0)) + f(x0) · (−g′(x0)/g(x0)²) = (f′(x0)g(x0) − f(x0)g′(x0))/g(x0)².
Example 8.10. The power function xⁿ is differentiable at all points, for n ∈ N, and so are polynomials, and rational functions at points where the denominator is non-zero.
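[Aside: the algebraic rules can be sanity-checked numerically. The following Python sketch (D, our own symmetric difference-quotient helper) compares difference quotients of fg and f/g with the formulas of Theorem 8.9; it is an illustration, not a proof.

    import math

    def D(f, x0, h=1e-6):
        # symmetric difference quotient, approximates f'(x0)
        return (f(x0 + h) - f(x0 - h)) / (2 * h)

    f, fp = math.sin, math.cos     # f and its known derivative
    g, gp = math.exp, math.exp
    x0 = 0.7

    print(D(lambda x: f(x) * g(x), x0), fp(x0)*g(x0) + f(x0)*gp(x0))               # product rule
    print(D(lambda x: f(x) / g(x), x0), (fp(x0)*g(x0) - f(x0)*gp(x0)) / g(x0)**2)  # quotient rule

Each line prints two nearly equal numbers.]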

Higher Derivatives.

Suppose that f : (a, b) → R is differentiable at every point of (a, b). Then it makes sense to ask if f′ is differentiable at x0 ∈ (a, b). If it is, we denote its derivative by f″(x0).

We can seek to iterate this process. Write f^(0) = f, f^(1) = f′, and suppose f^(0), f^(1), . . . , f^(n) have been defined recursively at every point of (a, b) (we make this assumption to simplify matters). If f^(n) is differentiable at x0 ∈ (a, b) then we say f is (n + 1)-times differentiable at x0 and we write f^(n+1)(x0) := (f^(n))′(x0).

If f has derivatives of all orders on (a, b) (that is, f^(n)(x0) exists for each x0 ∈ (a, b) and for each n = 1, 2, . . .), we sometimes say it is infinitely differentiable on (a, b).

The following is proved by an easy induction using Linearity and the Product Rule.
(Compare with the proof of the binomial expansion of (1 + x)n for n a positive integer.)

Proposition 8.11 (Leibniz' Formula). Let f, g : (a, b) → R be n-times differentiable on (a, b). Then x ↦ f(x)g(x) is n-times differentiable and

    (fg)^(n)(x) = Σ_{j=0}^{n} (n choose j) f^(j)(x) g^(n−j)(x).

Proof. Exercise.
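[Aside: before attempting the induction, one can check Leibniz' Formula experimentally with the sympy library (assuming it is installed); this sketch verifies the n = 4 case for one choice of f and g.

    import sympy as sp

    x = sp.symbols('x')
    f, g, n = sp.sin(x), sp.exp(x), 4

    lhs = sp.diff(f * g, x, n)
    rhs = sum(sp.binomial(n, j) * sp.diff(f, x, j) * sp.diff(g, x, n - j)
              for j in range(n + 1))
    print(sp.simplify(lhs - rhs))   # prints 0

Of course this checks only one instance; the proof is the induction suggested above.]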

Chain Rule

Theorem 8.12 (Chain Rule). Assume that f : E → R and that g : E ′ → R with f (E) ⊆
E ′ (so that g ◦ f : E → R is defined ). Suppose further that f is differentiable at the limit
point x0 ∈ E and that g is differentiable at f (x0 ). Then g ◦ f is differentiable at x0 and

(g ◦ f )′ (x0 ) = g ′ (f (x0 ))f ′ (x0 ).

Proof. For convenience write y0 = f(x0). Then by Proposition 8.1 we have

    f(x0 + h) = f(x0) + f′(x0)h + ε1(h)h,
    g(y0 + η) = g(y0) + g′(y0)η + ε2(η)η,

where ε1(h), ε2(η) → 0 as h, η → 0. We define ε2(0) = 0 so that ε2 is continuous at 0 and note that the above also holds for η = 0. Now set

    η := f(x0 + h) − f(x0) = f′(x0)h + ε1(h)h

so that

    g(f(x0 + h)) − g(f(x0)) = g(y0 + η) − g(y0) = g′(y0)η + ε2(η)η
                            = g′(y0)f′(x0)h + [g′(y0)ε1(h) + ε2(η)(f′(x0) + ε1(h))]h.

Now η = f′(x0)h + ε1(h)h → 0 as h → 0. Thus30 ε2(η) → 0 as h → 0. So, by AOL, the expression in square brackets tends to 0 as h → 0. Thus g(f(x)) is differentiable at x0 and the derivative is g′(y0)f′(x0) = g′(f(x0))f′(x0).

Example 8.13. Let f(x) = x² cos(1/x) for x ≠ 0 and f(0) = 0. We shall assume that cos and sin are differentiable with the expected derivatives. This will follow from the Differentiation Theorem for power series (Theorem 8.16).

30 Note that we could have η = 0, so it is important that we defined ε2(0) = 0.

On R \ {0} we can apply the standard differentiation rules, including the Chain Rule, and we get, for x ≠ 0,

    f′(x) = 2x cos(1/x) + sin(1/x).

Now consider 0: for x ≠ 0,

    |(f(x) − f(0))/(x − 0)| = |x cos(1/x)| ≤ |x| → 0 as x → 0.

Therefore f′(0) exists and equals 0.

Note that the formula for f′(x) for x ≠ 0 shows that lim_{x→0} f′(x) fails to exist (the first term tends to 0, the second one does not have a limit as x → 0, so the sum cannot tend to a limit). We deduce that f′ is not continuous at 0. By the contrapositive of Proposition 8.3, f″(0) cannot exist. (Note that f is infinitely differentiable on R \ {0}.)
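[Aside: a numerical sketch (ours, in Python) makes this vivid: the difference quotient of f at 0 tends to 0, while the values of f′ near 0 keep oscillating.

    import math

    def f(x):
        return x**2 * math.cos(1/x) if x != 0 else 0.0

    def fp(x):  # the formula for f'(x), valid only for x != 0
        return 2*x*math.cos(1/x) + math.sin(1/x)

    for h in [1e-2, 1e-4, 1e-6]:
        print(h, (f(h) - f(0))/h, fp(h))
    # the middle column -> 0, while fp(h) does not settle down to any limit

]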

Inverse functions

Like the other main results in this section, our final theorem tells us how to build new
differentiable functions.

Theorem 8.14 (The Inverse Function Theorem31 (IFT)). Suppose I is a non-trivial interval and f : I → R is a strictly monotonic continuous function with inverse function g : f(I) → I. Assume that f is differentiable at x0 ∈ I and that f′(x0) ≠ 0. Then g is differentiable at f(x0) and

    g′(f(x0)) = 1/f′(x0).

Proof. The statement includes all the assumptions we imposed for the Continuous IFT. Hence f(I) is an interval and g : f(I) → I is continuous and strictly monotonic. Now let y0 = f(x0). Then

    g′(f(x0)) = lim_{y→y0} (g(y) − g(y0))/(y − y0) = lim_{y→y0} (x − x0)/(f(x) − f(x0)),

provided this last limit exists, and where we have defined x = g(y). But g is continuous, so x → x0 (and x ≠ x0 by injectivity of g) as y → y0, and thus

    lim_{y→y0} (x − x0)/(f(x) − f(x0)) = (lim_{x→x0} (f(x) − f(x0))/(x − x0))^{−1} = 1/f′(x0)

by Theorem 2.8 and AOL.


31 The IFT is usually quoted as saying f′(x0) ≠ 0 and f′ continuous at x0 implies f is invertible near x0, the inverse having the appropriate derivative. But f′(x0) ≠ 0 and f′ continuous imply f′ has a constant sign near x0, and as we will see later this will imply monotonicity near x0. The version given here therefore implies the standard form of the IFT, and is in fact stronger.

Still assuming the Differentiation Theorem for power series and its consequences for the elementary functions, we deduce that the following are differentiable and have the expected derivatives:

    log : (0, ∞) → R,      log′(y) = 1/y,
    arctan : R → R,        arctan′(y) = 1/(1 + y²).

To confirm the result for g(y) = log y, note that, for fixed y0 ∈ (0, ∞), Theorem 8.14 can be applied with f(x) = exp x. Write x0 = log y0 so y0 = exp x0. The formula in the theorem gives

    log′(y0) = 1/exp′(x0) = 1/exp(x0) = 1/y0.
The derivative of arctan is handled similarly, making use of standard trigonometric iden-
tities.
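[Aside: a quick numerical check of Theorem 8.14 in Python (a sketch; D is our difference-quotient helper):

    import math

    def D(f, x0, h=1e-6):
        return (f(x0 + h) - f(x0 - h)) / (2 * h)

    x0 = 1.3
    y0 = math.exp(x0)
    print(D(math.log, y0), 1 / D(math.exp, x0), 1 / y0)   # three nearly equal numbers
    print(D(math.atan, 2.0), 1 / (1 + 2.0**2))            # arctan'(2) = 1/5

]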

Differentiation of power series

Our objective in this section is to prove the Differentiation Theorem for power series
which was introduced, but not proved, in Analysis I, and states that one can differentiate
a power series ‘term-by-term’ provided one is strictly inside the radius of convergence of
the power series.

We will prove this in a manner that works for complex power series as it is an important
result in Complex Analysis as well, and the proof is identical.

We first show that the result of term-by-term differentiation is well-defined inside the
radius of convergence of the original series.
Lemma 8.15 (ROC of derivative power series). Suppose the power series Σ_{k≥0} ck x^k has radius of convergence R ∈ [0, ∞]. Then the power series Σ_{k≥1} k ck x^{k−1} also has radius of convergence R.

Proof. Suppose |x| < R. Then by the definition of R there exists y such that |x| < |y| < R and Σ ck y^k converges. But then ck y^k → 0 as k → ∞ and in particular the sequence (ck y^k) is bounded, say |ck y^k| ≤ M. Then |k ck x^{k−1}| ≤ M|y|^{−1} · k(|x|/|y|)^{k−1}. Now Σ k(|x|/|y|)^{k−1} converges by e.g., the Ratio Test. Thus by the Comparison Test Σ k ck x^{k−1} is (absolutely) convergent.

Conversely, if |x| > R we know ck x^k ̸→ 0, but then clearly k ck x^{k−1} ̸→ 0, so Σ k ck x^{k−1} is divergent.
Theorem 8.16 (Differentiation Theorem for power series). Let the real or complex power series f(x) := Σ_{k=0}^{∞} ck x^k have radius of convergence R ∈ (0, ∞]. Then f is differentiable in {x : |x| < R} and f′ is given by term-by-term differentiation:

    f′(x) = Σ_{k=1}^{∞} k ck x^{k−1}.

Proof. Fix x0 ∈ C with |x0| < R and fix ρ ∈ R with |x0| < ρ < R. By Lemma 8.15, g(x) := Σ_{k=1}^{∞} k ck x^{k−1} has radius of convergence R and hence g(x0) is well defined. We also observe, applying Lemma 8.15 again, that Σ_{k=2}^{∞} k(k − 1) ck x^{k−2} has ROC R, and so converges absolutely at ρ < R. In particular

    M := Σ_{k=2}^{∞} k(k − 1)|ck| ρ^{k−2} < ∞.

Now to show f′(x0) = g(x0) it is enough to bound

    (f(x) − f(x0))/(x − x0) − g(x0) = Σ_{k=1}^{∞} ck ((x^k − x0^k)/(x − x0) − k x0^{k−1})

when x is sufficiently close to x0, so wlog |x| < ρ. Summing a geometric series we have

    (x^k − x0^k)/(x − x0) = x0^{k−1} + x0^{k−2}x + · · · + x^{k−1},

so for k = 1 we have (x^k − x0^k)/(x − x0) = k x0^{k−1} and for k ≥ 2 we get

    (x^k − x0^k)/(x − x0) − k x0^{k−1}
      = (x0^{k−1} + x0^{k−2}x + · · · + x^{k−1}) − (x0^{k−1} + x0^{k−1} + · · · + x0^{k−1})
      = x0^{k−1}(1 − 1) + x0^{k−2}(x − x0) + x0^{k−3}(x² − x0²) + · · · + (x^{k−1} − x0^{k−1})
      = (x − x0) · (x0^{k−2} + x0^{k−3}(x + x0) + · · · + (x^{k−2} + · · · + x0^{k−2})).

Hence if |x0|, |x| < ρ we have

    |(x^k − x0^k)/(x − x0) − k x0^{k−1}| ≤ |x − x0|(ρ^{k−2} + 2ρ^{k−2} + · · · + (k − 1)ρ^{k−2}) = |x − x0| · (k(k − 1)/2) ρ^{k−2}.

But then

    |(f(x) − f(x0))/(x − x0) − g(x0)| ≤ |x − x0| · ½ Σ_{k=2}^{∞} k(k − 1)|ck| ρ^{k−2} = (M/2)|x − x0| → 0 as x → x0.
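[Aside: term-by-term differentiation is easy to observe numerically. A Python sketch for the geometric series Σ_{k≥0} x^k = 1/(1 − x), |x| < 1 (the truncation level N is our choice):

    x, N = 0.3, 60
    f_series  = sum(x**k for k in range(N))              # ~ 1/(1-x)
    fp_series = sum(k * x**(k-1) for k in range(1, N))   # ~ 1/(1-x)**2, by Theorem 8.16
    print(f_series,  1/(1-x))
    print(fp_series, 1/(1-x)**2)

Both pairs agree to many decimal places, consistent with f′(x) = Σ k x^{k−1} = 1/(1 − x)² inside the radius of convergence.]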

Example 8.17. The series defining exp, cos, sin, cosh, sinh all have infinite radius of convergence. The Differentiation Theorem gives, for x ∈ R,

    (d/dx) exp(x) = (d/dx) Σ_{k=0}^{∞} x^k/k! =∗ Σ_{k=0}^{∞} (d/dx) x^k/k! = Σ_{k=1}^{∞} x^{k−1}/(k − 1)! = exp(x);
    (d/dx) cos(x) = (d/dx) Σ_{k=0}^{∞} (−1)^k x^{2k}/(2k)! =∗ Σ_{k=0}^{∞} (d/dx) (−1)^k x^{2k}/(2k)! = Σ_{k=1}^{∞} (−1)^k x^{2k−1}/(2k − 1)! = − sin(x);
    (d/dx) sin(x) = (d/dx) Σ_{k=0}^{∞} (−1)^k x^{2k+1}/(2k + 1)! =∗ Σ_{k=0}^{∞} (d/dx) (−1)^k x^{2k+1}/(2k + 1)! = Σ_{k=0}^{∞} (−1)^k x^{2k}/(2k)! = cos(x);

and likewise for cosh x and sinh x. The equalities marked ∗ show the points at which we have differentiated term by term, as the Differentiation Theorem tells us we may.

A continuous but nowhere differentiable function

In this (non-examinable) section we construct a function that is continuous on R, but


not differentiable at any point. This construction might seem pathological, but in some
sense ‘most’ continuous functions are like this32 .

Define

    f(x) := Σ_{k=0}^{∞} 2^{−k} cos(10^k · 2πx).

[Figure: a partial sum of f, plotted on [0, 1].]

Note that f(x) converges uniformly on R by the M-test (with Mk = 2^{−k}). Hence f is continuous on R. It is also periodic33 with period 1.

Now comes the difficult bit: showing f is not differentiable anywhere.

Pick x0 ∈ R and define

    yn = 10^{−n} ⌊10^n x0⌋   and   zn = 10^{−n} (⌊10^n x0⌋ + 1/2).

In other words, yn is x0 'rounded down' to n decimal places and zn appends the digit 5 at the (n + 1)st place after the decimal point. Now summing from k = n onwards we have

    Σ_{k=n}^{∞} 2^{−k} cos(10^k · 2πyn) − Σ_{k=n}^{∞} 2^{−k} cos(10^k · 2πzn) = 2^{−n}(1 − (−1)) + 0 + · · · = 2 · 2^{−n},   (7)

as 10^k · 2πyn and 10^k · 2πzn are an even and odd multiple of π respectively for k = n, and both are even multiples of π for all k > n. Also, for any x, y, |cos(x) − cos(y)| = |2 sin((x + y)/2) sin((x − y)/2)| ≤ 2 · 1 · |(x − y)/2| = |x − y|, so for the first n terms of the sum we have

    |Σ_{k=0}^{n−1} 2^{−k} cos(10^k · 2πyn) − Σ_{k=0}^{n−1} 2^{−k} cos(10^k · 2πzn)| ≤ Σ_{k=0}^{n−1} 2^{−k} 10^k · 2π|yn − zn|
        = (1 + 5 + · · · + 5^{n−1}) · 2π · ½ · 10^{−n} = ((5^n − 1)/(5 − 1)) · π · 10^{−n} ≤ 2^{−n}.   (8)

Hence, by combining (7) and (8) and using the reverse triangle inequality, we have

    |f(zn) − f(yn)| ≥ 2 · 2^{−n} − 2^{−n} = 2^{−n}.

Now suppose f were differentiable at x0. Then

    f(yn) = f(x0) + f′(x0)(yn − x0) + o(yn − x0),
    f(zn) = f(x0) + f′(x0)(zn − x0) + o(zn − x0).

But then

    |f(zn) − f(yn)| ≤ |f′(x0)||zn − yn| + o(|yn − x0|) + o(|zn − x0|) ≤ K · 10^{−n}

for any K > |f′(x0)| when n is sufficiently large, as |yn − x0|, |zn − x0|, |zn − yn| ≤ 10^{−n}. But for large n this contradicts the fact that |f(zn) − f(yn)| ≥ 2^{−n}. Hence f′(x0) does not exist.

32 In the Part B course Continuous Martingales and Stochastic Calculus one constructs Brownian motion, which is a model of a random continuous function. It turns out that with probability 1 it is nowhere differentiable.
33 In fact it is a key result in Fourier analysis that any periodic continuous function can be written as an infinite series of trigonometric functions. Thus the form of f is not particularly special.
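[Aside: one can watch the difference quotients blow up numerically. A Python sketch (ours; we use exact rational arithmetic so that 10^k x can be reduced mod 1 before calling cos, avoiding floating-point trouble with huge arguments):

    import math
    from fractions import Fraction

    def f(x, K=30):
        # partial sum of f(x) = sum_k 2^{-k} cos(10^k * 2*pi*x), x a Fraction
        total = 0.0
        for k in range(K):
            frac = (10**k * x) % 1            # exact reduction mod 1
            total += 2.0**-k * math.cos(2 * math.pi * float(frac))
        return total

    x0 = Fraction(1, 3)                        # any sample point works
    for n in range(1, 8):
        yn = Fraction(math.floor(10**n * x0), 10**n)
        zn = yn + Fraction(1, 2 * 10**n)
        print(n, abs(f(zn) - f(yn)) / float(zn - yn))

The printed quotients grow like 5^n (the proof gives |f(zn) − f(yn)| ≥ 2^{−n} over a gap of ½ · 10^{−n}), so no derivative can exist at x0.]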
Remark. This example also shows that a uniform limit of differentiable functions is not necessarily differentiable.

9 The Mean Value Theorem

In this section we shall restrict attention to real-valued functions defined on intervals in R. While many of the results we obtained in the previous section for real-valued functions of a real variable have obvious analogues when R is replaced by C, the theory of differentiability of complex-valued functions on the complex plane turns out to be very different from that in the real case and is much more powerful. Complex Analysis is covered within the Part A Core. The results in this section however rely heavily on the fact that the functions are real-valued.
Definition. Let E ⊆ R and f : E → R.
(a) x0 ∈ E is a local maximum of f if there exists a δ > 0 such that f (x) ≤ f (x0 ) for
all x ∈ (x0 − δ, x0 + δ) ∩ E.
(b) x0 ∈ E is a local minimum of f if there exists a δ > 0 such that f (x) ≥ f (x0 ) for
all x ∈ (x0 − δ, x0 + δ) ∩ E.
A local maximum or minimum is called a local extremum. If the inequality is strict
(for x ̸= x0 ) we will say that the extremum is strict.

Here is the crucial property.


Theorem 9.1 (Fermat’s Theorem on Extrema). Let f : (a, b) → R and suppose that
x0 ∈ (a, b) is a local extremum and f is differentiable at x0 . Then f ′ (x0 ) = 0.

Proof. If x0 is a local maximum, then there exists δ > 0 such that whenever 0 < x − x0 < δ and x ∈ (a, b),

    (f(x) − f(x0))/(x − x0) ≤ 0,

so that

    f′(x0) = f′+(x0) = lim_{x→x0+} (f(x) − f(x0))/(x − x0) ≤ 0.

On the other hand, there exists δ > 0 such that whenever −δ < x − x0 < 0 and x ∈ (a, b),

    (f(x) − f(x0))/(x − x0) ≥ 0,

so that

    f′(x0) = f′−(x0) = lim_{x→x0−} (f(x) − f(x0))/(x − x0) ≥ 0.

We conclude that f′(x0) = 0.

A similar argument applies when x0 is a local minimum. (Or apply the above to −f.)
Remark. In Fermat’s theorem it is essential that the interval (a, b) is open. Why?

We now apply Fermat’s Theorem to obtain a simple criterion for the existence of a point
where f ′ = 0.
Theorem 9.2 (Rolle’s Theorem). Let a < b and f : [a, b] → R. Assume that
(a) f is continuous on [a, b];
(b) f is differentiable on (a, b);
(c) f (a) = f (b).
Then there exists ξ ∈ (a, b) such that f ′ (ξ) = 0.

Proof. As f is continuous on [a, b] it is bounded and attains its maximum and minimum on [a, b] (by the Boundedness Theorem). If f(x0) > f(a) for some x0 ∈ [a, b], let ξ ∈ [a, b] be such that f(ξ) = sup_{x∈[a,b]} f(x). As f(ξ) ≥ f(x0) > f(a) = f(b), we have ξ ∈ (a, b). Also ξ is clearly a local maximum of f and so by Fermat's result f′(ξ) = 0. Similarly if f(x0) < f(a) for some x0 ∈ [a, b] we can take ξ ∈ [a, b] such that f(ξ) = inf_{x∈[a,b]} f(x). The only remaining case is if f(x0) = f(a) for all x0 ∈ [a, b]. But then f is constant and so f′(ξ) = 0 for any ξ ∈ (a, b).

[Figures: sketches of functions f, g, h failing, respectively, continuity on all of [a, b], differentiability on all of (a, b), and f(a) = f(b); plus a fourth with all conditions satisfied.]

When using the theorem remember to check all conditions including the continuity and
differentiability conditions. For example, f : [0, 1] → R defined by f (x) = x for x ∈
[0, 1) and f (1) = 0 satisfies all the conditions except continuity at 1. The function
g : [−1, 1] → R given by g(x) = |x| satisfies all conditions except that g is not differentiable
at x = 0. And the function h : [0, 1] → R given by h(x) = x satisfies all conditions except
h(0) = h(1). But in all three cases there is no point at which the derivative is zero.
Remember that differentiability implies continuity. Thus the hypotheses (a) and (b) would be satisfied if f were differentiable on [a, b] (with one-sided derivatives at the endpoints). However, often it is important that Rolle holds under the given weaker conditions.
One way of expressing Rolle’s Theorem informally is by saying
‘Between any two zeros of f there is a zero of f ′ .’
The following is an example where Rolle’s Theorem is applied several times in this form.

Example 9.3. Assume that the real-valued function f is twice differentiable on [0, 1]
and that f ′′′ exists in (0, 1). Assume in addition that f (0) = f ′ (0) = f (1) = f ′ (1) = 0.
To prove: that there exists a point ξ ∈ (0, 1) at which f ′′′ (ξ) = 0.

The conditions are satisfied to apply Rolle’s Theorem to f on [0, 1] and so there exists
α ∈ (0, 1) such that f ′ (α) = 0. Now the conditions are satisfied to apply Rolle’s Theorem
to f ′ on each of [0, α] and [α, 1] to obtain β1 and β2 with 0 < β1 < α < β2 < 1 and
f ′′ (β1 ) = f ′′ (β2 ) = 0. Finally, since β1 , β2 ∈ (0, 1) on which f ′′′ is given to exist, we
know f ′′ is continuous on [β1 , β2 ] and differentiable on (β1 , β2 ), so we can apply Rolle’s
Theorem to f ′′ on [β1 , β2 ] to obtain the required point ξ ∈ (β1 , β2 ) with f ′′′ (ξ) = 0.

The next theorem is one of the most important and useful in the course. It is easily derived from Rolle's Theorem by adding a suitable linear function to f to make the endpoints agree.

Theorem 9.4 (Mean Value Theorem (MVT)). Let a < b and f : [a, b] → R. Assume
(a) f is continuous on [a, b]; and
(b) f is differentiable on (a, b).
Then there exists ξ ∈ (a, b) such that

    f′(ξ) = (f(b) − f(a))/(b − a).

Proof. Define F(x) := f(x) − f(a) − K(x − a) where K is chosen so that F(b) = F(a) = 0, namely

    K := (f(b) − f(a))/(b − a).

Certainly F : [a, b] → R is continuous, F is differentiable on (a, b) and, by choice of K, F(a) = F(b). Thus Rolle's Theorem applies, and so F′(ξ) = 0 for some ξ ∈ (a, b). But F′(x) = f′(x) − K so

    f′(ξ) = K = (f(b) − f(a))/(b − a).

[For examples showing that the conditions in the MVT are required, take the counterexamples following Rolle's Theorem and tilt your page/screen a bit.]

The following is a surprisingly useful generalisation of the Mean Value Theorem with a
very similar proof.

Theorem 9.5 (Cauchy’s MVT or Generalised MVT). Let a < b and f, g : [a, b] → R.
Assume
(a) f , g are continuous on [a, b]; and
(b) f , g are differentiable on (a, b).
Then there exists ξ ∈ (a, b) such that

    f′(ξ)(g(b) − g(a)) = g′(ξ)(f(b) − f(a)).

If in addition g′(x) ≠ 0 for all x ∈ (a, b), then g(b) ≠ g(a) and the conclusion can be written

    f′(ξ)/g′(ξ) = (f(b) − f(a))/(g(b) − g(a)).

Note. We cannot obtain this result by applying the MVT to f and g individually since
that way we’d obtain two ‘ξ’s, one for f and one for g, and these would in general not be
equal.

Proof. Suppose first that g(b) ≠ g(a). Define F(x) := f(x) − f(a) − K(g(x) − g(a)), where K is chosen so that F(b) = F(a) = 0, namely

    K := (f(b) − f(a))/(g(b) − g(a)).

Then F is continuous on [a, b], differentiable on (a, b) and F (a) = F (b). Hence by Rolle’s
theorem there exists ξ ∈ (a, b) such that

F ′ (ξ) = f ′ (ξ) − Kg ′ (ξ) = 0,

or equivalently
f ′ (ξ)(g(b) − g(a)) = g ′ (ξ)(f (b) − f (a))
as required.

If g(b) = g(a) then by Rolle’s theorem there is a point ξ ∈ (a, b) with g ′ (ξ) = 0, and this
ξ satisfies the required equation. Thus if g ′ (x) ̸= 0 for all x ∈ (a, b) then we must have
g(b) ̸= g(a) and the last statement of the theorem follows by simple algebra.

Here is one of the most useful corollaries of the MVT.

Theorem 9.6 (Constancy Theorem). Let I be an interval and f : I → R be differentiable


with f ′ (x) = 0 for all x ∈ I. Then f is constant on I.

Note that the interval I need not be bounded, but it does need to be an interval: f : (1, 2) ∪ (3, 4) → R defined by f(x) = 1 for x ∈ (1, 2) and f(x) = 2 for x ∈ (3, 4) is differentiable with zero derivative for all x ∈ (1, 2) ∪ (3, 4), but is not constant.

Proof. For any a, b ∈ I with a < b apply the MVT to f on [a, b]. (Note that f is
differentiable on I implies that f is continuous on [a, b] ⊆ I.) Then f (b) − f (a) =
f ′ (ξ)(b − a) for some ξ ∈ (a, b) ⊆ I. But f ′ (ξ) = 0, so that f (b) = f (a). Since this holds
for all a < b with a, b ∈ I, f is constant on I.

The following examples illustrate a method of using the Constancy Theorem to solve certain differential equations. The 'trick' is to manipulate them so that they look like (d/dx)F = 0 for some function F.

Example 9.7. Suppose that ϕ is a function on an interval I whose derivative is x². Then there exists a constant C such that, for all x ∈ I, ϕ(x) = x³/3 + C.

Let F(x) := ϕ(x) − x³/3. Then F is differentiable and F′(x) = x² − x² = 0. By the Constancy Theorem F(x) = C for some constant C and hence ϕ(x) = x³/3 + C.

Example 9.8 (exp(x + y) = exp(x) exp(y)). Fix a constant c and consider F (x) =
exp(x) exp(c−x). Then using the chain rule, product rule, and exp′ (x) = exp(x) (obtained
by the Differentiation Theorem for power series) we obtain

F ′ (x) = exp(x) exp(c − x) − exp(x) exp(c − x) = 0.

We deduce that F (x) is a constant: exp(x) exp(c − x) = F (x) = F (0) = 1 · exp(c).


Substituting c = x + y now gives exp(x + y) = exp(x) exp(y) for all x, y ∈ R.

Note that similar methods allow for proofs of all the usual trigonometric identities, at
least for real numbers.

Example 9.9 (Trigonometric addition formulae). Recall that sin(x) and cos(x) are de-
fined via power series on the whole of R and that sin′ (x) = cos(x) and cos′ (x) = − sin(x)
followed from the Differentiation Theorem for power series. Fix a constant c and consider
F (x) = cos(x) cos(c − x) − sin(x) sin(c − x). Then using the chain rule and product rule

F ′ (x) = − sin(x) cos(c − x) + cos(x) sin(c − x) − cos(x) sin(c − x) + sin(x) cos(c − x) = 0.

We deduce that F (x) is a constant: cos(x) cos(c − x) − sin(x) sin(c − x) = F (x) = F (0) =
cos(c). Substituting c = x + y now gives

cos(x + y) = cos(x) cos(y) − sin(x) sin(y). (9)

Similarly (or by differentiation w.r.t. x)

sin(x + y) = sin(x) cos(y) + cos(x) sin(y). (10)

Substituting y = −x into the formula for cos(x + y) and noting that sin(−x) = − sin(x) also gives the well-known formula

    cos²x + sin²x = 1   (11)

for all real x. It also holds for complex x; see the supplementary material on the exponential function on the website.

Example 9.10. We shall show that the general solution of the equation f′(x) = λf(x) for all x ∈ R, is f(x) = ae^{λx} where a is a constant. (That is, every solution is of this form.)

We spot that e^{λx} is a solution, so consider F(x) := f(x)/e^{λx} = e^{−λx}f(x). Then F′(x) = f′(x)e^{−λx} − f(x)λe^{−λx} = 0. Hence, by the Constancy Theorem, F(x) is constant, F(x) = a; that is, all solutions are of the form f(x) = ae^{λx}.

Corollary 9.11 (Derivatives and monotonicity). Let I be an interval and let f : I → R
be differentiable.
(a) If f ′ (x) ≥ 0 for all x ∈ I then f is increasing on I.
(b) If f ′ (x) ≤ 0 for all x ∈ I then f is decreasing on I.
(c) If f ′ (x) > 0 for all x ∈ I then f is strictly increasing on I.
(d) If f ′ (x) < 0 for all x ∈ I then f is strictly decreasing on I.
Proof. Simply fix a, b ∈ I with a < b and apply MVT to f on [a, b] to get f (b) − f (a) =
f ′ (ξ)(b − a) for some ξ ∈ (a, b) ⊆ I.
Remark. x³ is strictly increasing on R but has derivative 0 at x = 0, so the converses to (c) and (d) do not hold.
Example 9.12 (Alternating bounds on sin and cos). By (11) we have

    cos x ≤ 1.

If we set F(x) := x − sin x then F′(x) = 1 − cos x ≥ 0. Hence, by Corollary 9.11, F is increasing: F(x) ≥ F(0) = 0 for all x ≥ 0. Thus

    sin x ≤ x for x ≥ 0.

If we set F(x) := 1 − x²/2 − cos x then F′(x) = sin x − x ≤ 0 for x ≥ 0 so, by Corollary 9.11, F is decreasing: F(x) ≤ F(0) = 0 for all x ≥ 0. Thus

    cos x ≥ 1 − x²/2 for x ≥ 0.

If we set F(x) := x − x³/6 − sin x then F′(x) = 1 − x²/2 − cos x ≤ 0 for x ≥ 0 so, by Corollary 9.11, F is decreasing: F(x) ≤ F(0) = 0 for all x ≥ 0. Thus

    sin x ≥ x − x³/6 for x ≥ 0.

If we set F(x) := 1 − x²/2 + x⁴/24 − cos x then F′(x) = sin x − x + x³/6 ≥ 0 for x ≥ 0 so, by Corollary 9.11, F is increasing: F(x) ≥ F(0) = 0 for all x ≥ 0. Thus

    cos x ≤ 1 − x²/2 + x⁴/24 for x ≥ 0.

And so on and so on . . . Inductively we can bound both sin and cos above and below by their series terminated at odd or even numbers of terms respectively (exercise: write out a proof of this).
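[Aside: these inequalities are easy to spot-check numerically; a small Python sketch (the sample points are our choice):

    import math

    for x in [0.1, 0.5, 1.0, 1.5]:
        s, c = math.sin(x), math.cos(x)
        assert x - x**3/6 <= s <= x
        assert 1 - x**2/2 <= c <= 1 - x**2/2 + x**4/24
    print("bounds of Example 9.12 hold at the sample points")

]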
Example 9.13 (π). We can define π as twice the smallest positive solution of cos x = 0. Indeed, 1 − x²/2 ≤ cos x ≤ 1 − x²/2 + x⁴/24 by Example 9.12, so cos x > 0 for x < √2, but cos 2 ≤ −1/3. Thus by the IVT, there exists some π, 2√2 < π < 4, with cos(π/2) = 0. Moreover, this value is unique as cos′x = − sin x and sin x ≥ x − x³/6 = x(1 − x²/6) > 0 for 0 < x < 2, so cos x is strictly decreasing on [0, 2]. As sin(π/2) > 0 and cos(π/2) = 0 we deduce from (11) that sin(π/2) = 1. Then from (9) and (10) we deduce that

    cos(x + π/2) = − sin(x),   sin(x + π/2) = cos(x).

Applying this four times gives cos(x + 2π) = cos x and sin(x + 2π) = sin x.

Example 9.14 (Lipschitz functions revisited). Suppose I is an interval and f : I → R
is differentiable with bounded derivative, |f ′ (x)| ≤ M for all x ∈ I. Then f is Lipschitz
continuous: by the MVT |f (x) − f (y)| = |f ′ (ξ)||x − y| ≤ M |x − y| for some ξ between x
and y.

If I = [a, b] and in addition f ′ is continuous, then f ′ is bounded by the Boundedness


Theorem. Hence any continuously differentiable function on any closed bounded interval
is Lipschitz continuous.

Warning. f : [0, 1] → R defined by f(x) = √x does not satisfy these conditions even though f is continuously differentiable on (0, 1). We need the derivatives at the endpoints as well here.

Example 9.15 (Bernoulli's inequality). In Analysis I you met the useful inequality

    (1 + x)^r ≥ 1 + rx for x > −1, r ∈ N.

This was proved by induction. We now prove it for all real r ≥ 1. First we note that the standard formula for the derivative of a power still holds:

    (d/dx) x^r = (d/dx) exp(r log x) = (r/x) exp(r log x) = r exp(r log x − log x) = r x^{r−1}

for x > 0 and any r ∈ R. Now consider F(x) = (1 + x)^r − (1 + rx). Then F′(x) = r(1 + x)^{r−1} − r = r((1 + x)^{r−1} − 1). For r ≥ 1 and x ≥ 0 we have (1 + x)^{r−1} ≥ 1 (exp((r − 1) log(1 + x)) is increasing in x), so F′(x) ≥ 0 and hence F is increasing for x ≥ 0. Thus F(x) ≥ F(0) = 0 for x ≥ 0. Similarly (1 + x)^{r−1} ≤ 1 for x ∈ (−1, 0], so F′(x) ≤ 0 there and so F(x) ≥ F(0) = 0 for x ∈ (−1, 0].
Example 9.16 (Jordan's inequality). 2/π ≤ (sin x)/x ≤ 1 for x ∈ (0, π/2].

Proof. We have already proved the second inequality, and to prove the first it is enough to show that F(x) := (sin x)/x is decreasing on (0, π/2], as F(π/2) = 2/π. Differentiation gives

    F′(x) = (x cos x − sin x)/x².

So let's consider the derivative of G(x) := x cos x − sin x on (0, π/2]. We have G′(x) = −x sin x + cos x − cos x = −x sin x ≤ 0 as sin x > 0 on (0, π/2]. Hence G is decreasing so G(x) ≤ G(0) = 0 on (0, π/2]. Hence F′(x) ≤ 0 and so F(x) is decreasing on (0, π/2].

10 Taylor’s Theorem

Our objective in this section is to investigate how a real-valued function may be approximated by a polynomial. We emphasise that our methods rely on Rolle's Theorem and the Mean Value Theorem. This means that the results of this section are for real-valued functions only.

We begin by noting that the very definition of differentiability concerns the approximation
of a function by a linear function. Indeed f ′ (x0 ) exists if and only if we can write

f (x0 + h) = f (x0 ) + f ′ (x0 )h + ε(h)h

for some ε(h) → 0 as h → 0. Using Landau’s notation this is equivalent to

f (x0 + h) = f (x0 ) + f ′ (x0 )h + o(h) as h → 0.

The Mean Value Theorem gives another approximation, but with the added assumption
that f ′ exists in an interval. We have

f (x0 + h) = f (x0 ) + f ′ (ξ)h

for some ξ between x0 and x0 + h.

Suppose we wanted a better approximation to f near x0. A natural generalisation would be to approximate f with a quadratic, say

    f(x0 + h) ≈ f(x0) + f′(x0)h + Kh².

Assuming f has a second derivative, it would seem reasonable to choose K so the second derivatives matched. (Then the first derivatives of both sides would agree with just an o(h) error, and integrating this over a length h would give an error of o(h²).) This suggests that we should take K = ½f″(x0) and that

    f(x0 + h) = f(x0) + f′(x0)h + (f″(x0)/2)h² + o(h²).

More generally we could imagine higher and higher degree polynomial approximations to f, assuming f has derivatives we can match to sufficiently high order. Even better would be an extension of the MVT as this gives more control over the error, possibly something like

    f(x0 + h) = f(x0) + f′(x0)h + (f″(ξ)/2)h².

Taylor’s Theorem gives such an extension. We phrase the following in a similar way to
the MVT so as to give a natural generalisation of Theorem 9.4.

Theorem 10.1 (Taylor’s Theorem). Let a < b and f : [a, b] → R. Let n ≥ 0 be such
that
(a) f , f ′ , . . . , f (n) exist and are continuous on [a, b];
(b) f (n+1) exists on (a, b).
Then there exists ξ ∈ (a, b) such that

    f(b) = f(a) + f′(a)(b − a) + (f″(a)/2!)(b − a)² + · · · + (f^(n)(a)/n!)(b − a)^n + (f^(n+1)(ξ)/(n+1)!)(b − a)^{n+1}.

The same holds with b < a using intervals [b, a] and (b, a) in place of [a, b] and (a, b).

Proof. We will use induction on n. The case n = 0 is precisely the MVT: f(b) = f(a) + f′(ξ)(b − a) for some ξ ∈ (a, b).

Now assume n > 0 and define F : [a, b] → R by

    F(x) := f(x) − f(a) − f′(a)(x − a) − · · · − (f^(n)(a)/n!)(x − a)^n − (K/(n+1)!)(x − a)^{n+1},

where K is a constant chosen so that F(b) = 0. We also clearly have F(a) = 0 and, by assumption, F is continuous on [a, b] and differentiable on (a, b). Hence by Rolle's Theorem there exists c ∈ (a, b) such that F′(c) = 0. Now

    F′(x) = f′(x) − f′(a) − f″(a)(x − a) − · · · − (f^(n)(a)/(n−1)!)(x − a)^{n−1} − (K/n!)(x − a)^n

and by induction, applying the n − 1 case of the theorem to f′ on [a, c], we have

    f′(c) = f′(a) + f″(a)(c − a) + · · · + (f^(n)(a)/(n−1)!)(c − a)^{n−1} + (f^(n+1)(ξ)/n!)(c − a)^n

for some ξ ∈ (a, c) ⊆ (a, b). But then

    0 = F′(c) = f′(c) − f′(a) − f″(a)(c − a) − · · · − (f^(n)(a)/(n−1)!)(c − a)^{n−1} − (K/n!)(c − a)^n
              = (f^(n+1)(ξ)/n!)(c − a)^n − (K/n!)(c − a)^n.

Thus K = f^(n+1)(ξ). Recalling that we chose K so that F(b) = 0, the required result drops out.

The case when b < a is similar, or can be deduced from the above result by applying
it to f (−x) considered as a function [−a, −b] → R and carefully tracking all the sign
changes.

We can write Taylor's theorem in a form that matches our previous discussion by taking a = x0 and b = x0 + h:

    f(x0 + h) = f(x0) + f′(x0)h + · · · + (f^(n)(x0)/n!)h^n + (f^(n+1)(x0 + θh)/(n+1)!)h^{n+1}

where 0 < θ < 1, h can be either positive or negative and f, f′, . . . , f^(n+1) are assumed to exist in the appropriate ranges.

It is important to realise that the number θ here depends on h (and on x0 , which we


regard as fixed). We have in general no information on how θ varies with h, though
it may sometimes be possible to get information in the limit as h → 0 (see Problem
Sheet 7).

The further x0 + h is from x0 the less likely the polynomial part is to give a good approximation to f(x0 + h). Moreover it may be hard in specific cases to find a tight estimate of the size of the error term (h^{n+1}/(n+1)!) f^(n+1)(x0 + θh), especially since the value of θ is not known, so that we need a global upper bound covering all possible values of x0 + θh. However, on the assumption that f^(n+1) is bounded on [x0, x0 + h] we do have

    f(x0 + h) = f(x0) + f′(x0)h + · · · + (f^(n)(x0)/n!)h^n + O(h^{n+1}).

Example 10.2. Consider the expansion of f(x) = log(1 + x) around x = 0. We have f′(x) = 1/(1 + x), and by induction

    f^(n)(x) = (−1)^{n−1}(n − 1)!/(1 + x)^n for all n ≥ 1.

This gives

    log(1 + x) = x − x²/2 + x³/3 − · · · + (−1)^{n−1} x^n/n + (−1)^n x^{n+1}/((n + 1)(1 + θx)^{n+1}).   (12)

We note that 1 + θx lies between 1 and 1 + x, so for example, with n = 2 we have

    x − x²/2 + x³/(3(1 + x)³) ≤ log(1 + x) ≤ x − x²/2 + x³/3,

for x > −1 (consider x ∈ (−1, 0) and x ≥ 0 separately).
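[Aside: a Python sketch (ours) comparing the degree-n Taylor polynomial of log(1 + x) with the Lagrange remainder bound from (12); for x ≥ 0 the worst case in the remainder is θ = 0.

    import math

    def log_taylor(x, n):
        # degree-n Taylor polynomial of log(1+x) about 0
        return sum((-1)**(k-1) * x**k / k for k in range(1, n + 1))

    x, n = 0.5, 8
    err = math.log(1 + x) - log_taylor(x, n)
    bound = x**(n+1) / (n + 1)     # |E_n| <= x^{n+1}/(n+1) for x >= 0
    print(err, bound)              # the actual error lies within the bound

]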

Infinite Taylor series

A natural question is whether we can just let n → ∞ in Taylor's Theorem and obtain an infinite power series for f. The answer is unfortunately 'No' in general.

One obvious obstruction is that the higher derivatives may simply not exist. We have seen examples of functions that are continuous but not differentiable at a point. It is relatively easy to construct examples that are n times differentiable but not n + 1 times differentiable. One such example is

    f(x) = |x|^{n+1/2},

which is n but not n + 1 times differentiable at x = 0. (One can even get examples where this happens at every x. For example, one can integrate the example on page 56 n times.)

But let's assume f is infinitely differentiable, that is f^(n)(x) exists for all n ≥ 0 and all x in the domain of f. Is this enough to get the Taylor series to converge to f? Again, the answer is 'No' in general, however often it works. To see when it works, write

    f(x0 + h) = Σ_{k=0}^{n} (f^(k)(x0)/k!) h^k + En(h),

where En(h) = (f^(n+1)(x0 + θh)/(n+1)!) h^{n+1} is the error term. By AOL

    Σ_{k=0}^{∞} (f^(k)(x0)/k!) h^k = lim_{n→∞} (f(x0 + h) − En(h)) = f(x0 + h) − lim_{n→∞} En(h),

if this last limit exists. Thus f(x0 + h) is given by the infinite power series if and only if En(h) → 0 as n → ∞ (with x0 and h fixed).

Example 10.3. Continuing the example of log(1 + x), we construct the infinite Taylor series

    f(x) := x − x²/2 + x³/3 − · · · = Σ_{k=1}^{∞} (−1)^{k−1} x^k/k.

To determine whether or not this is really log(1 + x) we look at the error term in (12)

    En = ((−1)^n/(n + 1)) (x/(1 + θn x))^{n+1}.

Note that θ = θn depends on n (as well as x). As 0 < θn < 1 we see that if x ∈ [−1/2, 1] we have |x/(1 + θn x)| ≤ 1 (for negative x we need 1 + θn x ≥ 1 − |x| to be at least |x|, so x ≥ −1/2). Thus for x ∈ [−1/2, 1], |En| ≤ 1/(n + 1) → 0 as n → ∞ and so f(x) = log(1 + x).

For x > 1 the series f (x) does not converge (by e.g., the Ratio Test), so we don’t have
an infinite power series for log(1 + x), despite the fact that log(1 + x) is perfectly well
defined and infinitely differentiable between 0 and x.

For x ≤ −1 we could not hope for a series expression for log(1 + x) as log(1 + x) is not
defined.

This leaves the cases when −1 < x < − 12 where the series f (x) happily converges, but it
is not clear whether or not it converges to log(1 + x) as we do not have enough control
over the error term En .

In this case it turns out that f(x) does indeed equal log(1 + x). We can use the Differentiation Theorem for power series to deduce that

    f′(x) = 1 − x + x² − x³ + · · · = 1/(1 + x)

for |x| < 1 (the radius of convergence of f is R = 1). Thus g(x) := f(x) − log(1 + x) has derivative 0 in |x| < 1 and so by the Constancy Theorem g(x) is a constant for |x| < 1. As clearly g(0) = 0 we have

    f(x) = log(1 + x) for x ∈ (−1, 1).

We note that Taylor's theorem also gave this for x = 1, so we deduce that

    log(1 + x) = x − x²/2 + x³/3 − · · · for −1 < x ≤ 1.

Note that Taylor's theorem failed to prove this for x ∈ (−1, −1/2), although only because we did not have good enough bounds on θn. On the other hand the Constancy Theorem approach failed at x = 1, while the Taylor's Theorem approach worked there.

[The case x = 1 is also a spin-off of the definition of the Euler–Mascheroni constant, see
the Analysis I notes page 100. It is also a consequence of Abel’s Continuity Theorem,
the (non-examinable) Theorem 12.9 below.]

The above example shows that the infinite Taylor series may fail to converge even when
the function is infinitely differentiable in the appropriate range. Could it be therefore
that it is just convergence of the power series that we need? Unfortunately the answer is
again ‘No’ in general. It is possible that En (h) might converge to a non-zero value and
so the Taylor series converges, but to the wrong value!

Example 10.4. Consider f : R → R defined by

    f(x) := e^{−1/x²} for x ≠ 0;  f(x) := 0 for x = 0.

Some experimentation shows that we expect

    f^(k)(x) = Qk(1/x) e^{−1/x²} for x ≠ 0;  f^(k)(x) = 0 for x = 0,

for some polynomial Qk of degree 3k. We can prove this by induction: at points x ≠ 0 this is routine use of linearity, the product rule and the chain rule. But at x = 0 we need to take more care, and use the definition:

    (f^(k)(x) − f^(k)(0))/(x − 0) = x^{−1} Qk(1/x) e^{−1/x²},

which we must prove tends to zero as x → 0. Change the variable to t = 1/x; then we have t Qk(t) e^{−t²}, which is a finite sum of terms like t^s e^{−t²}, which we know tend to zero as |t| tends to infinity.

So for this function f the series Σ (f^(k)(0)/k!) x^k = 0, so it converges to 0 at every x. But the error term En(x) is the same for all n (it equals f(x)) and so does not tend to 0 at any point except 0.
Note that we can add this function to exp x and sin x and so on, and get functions with the same set of derivatives at 0 as these functions, so that they will have the same Taylor polynomials, but are different functions.
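[Aside: the flatness of this function at 0 is striking numerically. A Python sketch (ours):

    import math

    def f(x):
        return math.exp(-1/x**2) if x != 0 else 0.0

    for x in [0.5, 0.2, 0.1, 0.05]:
        print(x, f(x), f(x)/x)   # both -> 0 extremely fast, consistent with f'(0) = 0

At x = 0.05, f(x) = e^{−400} is already about 10^{−174}, which is why every Taylor coefficient at 0 vanishes.]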
Example 10.5. We can even construct infinitely differentiable functions whose Taylor series have zero radius of convergence. For example, let

    f(x) := Σ_{k=1}^{∞} sin(k³x)/k^k.

We note that this converges (very quickly). With a bit of work one can show that

    f′(x) = Σ_{k=1}^{∞} cos(k³x)/k^{k−3}.

This is not as easy as it looks! Here is one approach. Set g(x) := Σ_{k=1}^{∞} cos(k³x)/k^{k−3}. Then by applying Taylor's theorem to sin(k³x) we have

    f(x + h) = Σ_{k=1}^{∞} (sin(k³x) + k³ cos(k³x)h − ½k⁶ sin(k³ξk)h²)/k^k = f(x) + g(x)h + ε(h)h,

for some ξk between x and x + h and where

    |ε(h)| = |Σ_{k=1}^{∞} (−½k⁶ sin(k³ξk)h)/k^k| ≤ |h| Σ_{k=1}^{∞} 1/(2k^{k−6}).

But Σ_{k=1}^{∞} 1/(2k^{k−6}) converges to a constant, so ε(h) → 0 as h → 0.

In general

    f^(n)(x) = Σ_{k=1}^{∞} ± sin(k³x)/k^{k−3n} or Σ_{k=1}^{∞} ± cos(k³x)/k^{k−3n}.

Now |f^(n)(0)| = Σ_k k^{3n−k} ≥ n^{3n−n} = n^{2n} for n odd. Thus as n! ≤ n^n, |f^(n)(0)/n!| ≥ n^n. One can then deduce that the series Σ (f^(n)(0)/n!) x^n has zero radius of convergence (e.g., by the Ratio Test).
Example 10.6 (Real power series). Suppose we have a function defined by

    f(x) := Σ_{k=0}^{∞} ck x^k,   |x| < R,

where R > 0 is the radius of convergence of Σ ck x^k. Then the Differentiation Theorem for power series tells us that f has derivatives of all orders. Moreover, by induction,

    f^(n)(x) = Σ_{k=n}^{∞} ck k(k − 1) · · · (k − n + 1) x^{k−n},   |x| < R,

so in particular f^(n)(0) = n! cn. Therefore cn = f^(n)(0)/n! and so by definition

    f(x) = f(0) + f′(0)x + (f″(0)/2!)x² + · · · + (f^(k)(0)/k!)x^k + · · ·,   |x| < R.

So if we knew f could be expressed as a power series, then the infinite Taylor expansion would be that power series.
Example 10.7. Suppose f : R → R has the property that for all x, f′(x) = f(x) and f(0) = 1. Assuming such an f exists, and without knowing anything about the exponential function, we deduce that f^(n)(x) = f(x) exists and is continuous for all n (continuous as f^(n+1) exists). But then f^(n)(x) is bounded on any fixed interval [−N, N], say |f^(n)(x)| ≤ M. Hence by Taylor's theorem we deduce that

    f(x) = 1 + x + x²/2! + · · · + x^n/n! + En(x)

where |En(x)| ≤ M|x|^{n+1}/(n + 1)!. As En(x) → 0 as n → ∞, we deduce that in fact f(x) is given by the infinite Taylor series

    f(x) = Σ_{k=0}^{∞} x^k/k!.

As the Differentiation Theorem for power series shows that this power series differentiates to itself, we deduce that such a function f does exist (and is probably interesting enough to give a name to!).

Other differential equations can be 'solved' in a similar manner.

In fact, a power series f (x) can be expressed as an infinite Taylor series about any point
x0 strictly inside its radius of convergence.

Theorem 10.8. Suppose f(x) = Σ ck x^k is a real or complex power series with radius of convergence R ∈ (0, ∞]. Then for |x0| < R

    f(x0 + h) = Σ_{k=0}^{∞} (f^(k)(x0)/k!) h^k

for all h with |h| + |x0| < R.

Proof. See Part A Metric Spaces and Complex Analysis.

We say a function f is analytic at a point x0 if there exists some δ > 0 such that one can
write f (x0 + h) as a power series for |h| < δ. By the Differentiation Theorem this implies
that f is infinitely differentiable. By Example 10.6 it is also equivalent (for real-valued
functions) to the Taylor series of f about x0 converging to the function, at least when h is
sufficiently small. Theorem 10.8 states that any power series is analytic within its radius
of convergence. Examples 10.4 and 10.5 give functions that are infinitely differentiable
at 0 but not analytic there.

A brief aside

We have focused exclusively on use of the Taylor polynomials as polynomial approxima-


tions to a given function f on some closed interval, with suitable assumptions on existence
of derivatives f ′ , f ′′ , . . . . There are other possibilities that may be appropriate in certain
contexts. For example, one might want to construct a polynomial approximation which
agrees with f at some specified finite set of n points (a curve-fitting problem). This re-
quires Lagrange interpolation to obtain an approximating polynomial of degree n − 1.
Then one can use repeated applications of Rolle’s Theorem on a suitably defined function
— a strategy akin to that we used to prove Taylor’s Theorem. This and other similar
problems are taken up in the Part A course Numerical Analysis.

There are different versions of Taylor’s Theorem valid under different technical assump-
tions and with the remainder term expressible in different ways. An illustration can be
found on Problem sheet 8.

On the positive side we record that the picture changes radically when one considers complex-valued functions of a complex variable. Then the condition of differentiability is much stronger, and any complex-valued function differentiable on an open disc in C is in fact analytic, so infinitely differentiable there. This will be covered in the Part A course Metric Spaces and Complex Analysis.

11 L’Hôpital’s Rule

We have already indicated how the MVT and Taylor's theorem lead to useful inequalities involving the elementary functions and we have given examples of standard limits that can be obtained by basic AOL-style arguments. However, there are examples that cannot be obtained by these simple methods.

It should be apparent that what prevents us from using e.g., AOL directly to find a
limit is that we encounter one of the indeterminate forms not handled by Theorem 2.2.
For example, trying to find the limit of a quotient f (x)/g(x) as x → p, say, when the
individual limits, limx→p f (x) and limx→p g(x) are both 0.

What we are contending with here are limits which involve what are known generically
as indeterminate forms. They come in a variety of flavours, and our examples so far
illustrate how to deal, albeit in a somewhat ad hoc way, with many of the limits that
crop up frequently in practice. Can we be more systematic, and can we invoke theoretical tools to extend our catalogue of examples? The answer to both questions is a qualified 'Yes'.

In the remainder of this section we discuss a technique known as L’Hôpital’s Rule (or
maybe it should be referred to as L’Hôpital’s Rules). It is not our intention to provide a
comprehensive handbook of the various scenarios to which the L’Hôpital technique can
be adapted. In any case, indeterminate limits arising in applications often require special
treatment and call for ingenuity.

Let's consider first a simple case of a limit of a quotient of two functions

    lim_{x→p} f(x)/g(x).

If f (x) → a and g(x) → b with a, b finite and b ̸= 0 then we can use AOL. We can
also use Extended AOL for certain forms such as a/∞ (a ̸= ±∞). Cases of ±∞/b when
b ̸= ±∞ and a/0 when a ̸= 0 are guaranteed not to converge (see Problem Sheet 1), but
what about 0/0, say?

A trick that one can use when f(x) and g(x) are differentiable at p and f(p) = g(p) = 0 is to use the definition of differentiability to evaluate the limit:

    lim_{x→p} f(x)/g(x) = lim_{x→p} [(f(x) − f(p))/(x − p)] / [(g(x) − g(p))/(x − p)]
                        = [lim_{x→p} (f(x) − f(p))/(x − p)] / [lim_{x→p} (g(x) − g(p))/(x − p)] = f′(p)/g′(p),

provided g′(p) ≠ 0. We obtain the following.

Proposition 11.1 (Simple L'Hôpital Rule). Let f, g : E → R and let p ∈ E be a limit point of E. Assume that
(a) f (p) = g(p) = 0;
(b) f ′ (p) and g ′ (p) exist;

(c) g ′ (p) ̸= 0.
Then

    lim_{x→p} f(x)/g(x) exists and equals f′(p)/g′(p).

Example 11.2. Given that the Differentiation Theorem for power series tells us that sin x is differentiable with derivative cos x, we can immediately see that

    lim_{x→0} (sin x)/x = (cos 0)/1 = 1.

Other examples include

    lim_{x→0} log(1 + x)/sin x = (1/(1 + x))|_{x=0} / (cos x)|_{x=0} = 1/1 = 1

and

    lim_{x→0} x^{3/2}/tan x = ((3/2)x^{1/2})|_{x=0} / (sec²x)|_{x=0} = 0/1 = 0.
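[Aside: such limits can be double-checked with the sympy library (assuming it is available); a sketch:

    import sympy as sp

    x = sp.symbols('x')
    print(sp.limit(sp.sin(x)/x, x, 0))                       # 1
    print(sp.limit(sp.log(1 + x)/sp.sin(x), x, 0))           # 1
    print(sp.limit(x**sp.Rational(3, 2)/sp.tan(x), x, 0))    # 0

]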

Other indeterminate forms such as 1^∞ or ∞^0 or 0^0 can often be handled by writing out the power in terms of exp and using continuity and limit properties of the exponential function.

Example 11.3 (Euler's limit). lim_{x→0} (1 + x)^{1/x} = lim_{x→∞} (1 + 1/x)^x = e.

Proof. lim_{x→0} log(1 + x)/x = ((1 + x)^{−1})|_{x=0} = 1 by Proposition 11.1. Hence (1 + x)^{1/x} = e^{log(1+x)/x} → e¹ = e by continuity of exp at 1. For the second limit, substitute y = 1/x and let x → ∞.

Suppose now we wish to evaluate the limit

    lim_{x→0} (1 − cos x)/x²,

say. Unfortunately, in this case g(x) = x² has derivative 0 at x = 0, and so we can't apply the above result. The following gives a way of proceeding in this case. However, because the proof uses the MVT, we do need stronger conditions on both the numerator and denominator functions.

Theorem 11.4 (L'Hôpital's Rule, 0/0 form). Suppose f and g are real-valued functions defined in some interval (a, a + δ), δ > 0. Assume that
(a) f and g are differentiable in (a, a + δ);
(b) lim_{x→a+} f(x) = lim_{x→a+} g(x) = 0;
(c) g′(x) ≠ 0 on (a, a + δ);
(d) lim_{x→a+} f′(x)/g′(x) exists (in R ∪ {±∞}).

Then g(x) ≠ 0 on (a, a + δ) and

    lim_{x→a+} f(x)/g(x) exists and equals lim_{x→a+} f′(x)/g′(x).

For the case of left-hand limits replace (a, a + δ) by (a − δ, a).

For the case of a two-sided limit replace (a, a + δ) by (a − δ, a + δ) \ {a}.

Proof. We have opted to prove the version for a right-hand limit since we can do this
without the distraction of having to bother about the sign of x−a when working with the
Cauchy MVT. The left-hand limit version is proved likewise and the two-sided version
then follows from Proposition 1.15.

So assume conditions (a)–(d) hold as set out for the right-hand limit version. By (b) we may (re)define g(a) = f(a) = 0 so that f and g are continuous on [a, a + δ). We also know by Rolle's Theorem for g that g(x) = g(x) − g(a) ≠ 0 for x ∈ (a, a + δ), as g is continuous on [a, x] and differentiable with g′ ≠ 0 on (a, x). Now apply the Cauchy MVT to obtain ξx ∈ (a, x) such that

    f(x)/g(x) = (f(x) − f(a))/(g(x) − g(a)) = f′(ξx)/g′(ξx).

Since a < ξx < x, necessarily x → a forces ξx → a. The result now follows from (d) and Theorem 2.8.
Remark. Usually proving (d) gives (c) as a byproduct (possibly after reducing δ), for example if we used another application of L'Hôpital to determine the limit of f′(x)/g′(x). However there are situations where algebraic cancellation can occur in f′(x)/g′(x), hiding a sequence of points sneakily tending to a where g′ = 0. One can't use the theorem in this case, and indeed the conclusion can be false, so (c) does need to be checked.
Example 11.5. Consider lim_{x→0} (1 − cos x)/x². As 1 − cos x and x² are both differentiable and equal to zero at x = 0, we can apply L'Hôpital to get

    lim_{x→0} (1 − cos x)/x² = lim_{x→0} (sin x)/(2x)

provided the RHS limit exists and 2x ≠ 0 for x ≠ 0 near 0. But L'Hôpital can be applied again as sin x and 2x are both differentiable and equal to zero at x = 0. Thus

    lim_{x→0} (sin x)/(2x) = lim_{x→0} (cos x)/2 = 1/2.

As this last limit exists (and 2 ≠ 0 near x = 0), so does the original limit (and 2x ≠ 0 for x ≠ 0 near 0) and we finally deduce that lim_{x→0} (1 − cos x)/x² = 1/2.

Note that, as in the above example, it is quite common to apply L'Hôpital more than once. However the logic is somewhat backwards. Strictly speaking we should start with the evaluation of lim_{x→0} (sin x)/(2x), as until we know that that limit exists, we do not know the original limit exists. However it is easier to write the argument as follows, with the later lines justifying the earlier ones.

First note that 1 − cos x and x² are both infinitely differentiable, and so the derivative condition (a) in L'Hôpital holds throughout. At each stage we just need to check that numerator and denominator are both zero at x = 0 and that the denominator is non-zero nearby 0 (which, except at the end, is implied by the next application of L'Hôpital, and at the end is usually implied by continuity of the non-zero denominator). So

    lim_{x→0} (1 − cos x)/x² = lim_{x→0} (sin x)/(2x)   [L'Hôpital 0/0, provided RHS exists and denom ≠ 0 near 0]
                             = lim_{x→0} (cos x)/2      [L'Hôpital 0/0, provided RHS exists and denom ≠ 0 near 0]
                             = 1/2                      [Continuity of cos and AOL, and yes, 2 ≠ 0 near 0]
Example 11.6. Consider lim_{x→0} (sin x − x)/sinh³x. Applying the method in the previous example we get

    lim_{x→0} (sin x − x)/sinh³x
      = lim_{x→0} (cos x − 1)/(3 sinh²x cosh x)                                [L'H 0/0, provided. . . ]
      = lim_{x→0} (− sin x)/(6 sinh x cosh²x + 3 sinh³x)                        [L'H 0/0, provided. . . ]
      = lim_{x→0} (− cos x)/(6 cosh³x + 12 sinh²x cosh x + 9 sinh²x cosh x)     [L'H 0/0, provided. . . ]
      = −1/6                                                                    [Continuity + AOL]
Note however that the differentiation was beginning to get rather tedious. Indeed, one should avoid just simply applying L'Hôpital multiple times without thought. Often the calculations can be simplified by combining with AOL or other techniques. For example:

    lim_{x→0} (sin x − x)/sinh³x
      = lim_{x→0} (cos x − 1)/(3 sinh²x cosh x)                        [L'H 0/0, provided. . . ]
      = lim_{x→0} 1/(3 cosh x) · lim_{x→0} (cos x − 1)/sinh²x          [AOL]
      = (1/3) lim_{x→0} (− sin x)/(2 sinh x cosh x)                    [L'H 0/0, provided. . . ]
      = (1/3) lim_{x→0} (−1)/(2 cosh x) · lim_{x→0} (sin x)/(sinh x)   [AOL]
      = (−1/6) lim_{x→0} (cos x)/(cosh x)                              [L'H 0/0, provided. . . ]
      = −1/6                                                           [Continuity]
Again the justification is that each line holds provided the RHS limits exist and the
denominator is non-zero nearby x = 0, and thus the last line inductively justifies all the
previous ones. One needs to be a bit more careful that the factors we are taking out are
not hiding a sequence of zeros on the denominator, causing (c) to fail.

Of course, it is sometimes just easier to use Taylor's Theorem:

    lim_{x→0} (sin x − x)/sinh³x = lim_{x→0} ((x − x³/3! + O(x⁵)) − x)/(x + O(x³))³
                                 = lim_{x→0} (−x³/6 + O(x⁵))/(x(1 + O(x²)))³
                                 = lim_{x→0} (−1/6 + O(x²))/(1 + O(x²))³
                                 = −1/6.

Again, we emphasise that one should be on the lookout for AOL and other methods to simplify things, rather than just applying L'Hôpital multiple times without thought. For example,

    lim_{x→0} sin³x/(x³ + x⁴) = lim_{x→0} 1/(1 + x) · lim_{x→0} ((sin x)/x)³ = 1 · 1³ = 1.

One can extend L’Hôpital’s rule to the case when the limit is as x → ±∞ fairly easily
by replacing x with 1/x (see Problem Sheet 7). One can also extend L’Hôpital’s rule to
the case when f (x), g(x) → ∞ as x → a, although this requires a bit more work.

Theorem 11.7 (L'Hôpital's Rule, ∞/∞ form). Suppose f and g are real-valued functions defined in some interval (a, a + δ), δ > 0. Assume that
(a) f and g are differentiable in (a, a + δ);
(b) lim_{x→a+} |f(x)| = lim_{x→a+} |g(x)| = ∞;
(c) g′(x) ≠ 0 on (a, a + δ);
(d) lim_{x→a+} f′(x)/g′(x) exists.
Then

    lim_{x→a+} f(x)/g(x) exists and equals lim_{x→a+} f′(x)/g′(x).

For the case of left-hand limits replace (a, a + δ) by (a − δ, a).

For the case of a two-sided limit replace (a, a + δ) by (a − δ, a + δ) \ {a}.

Remark. We can't just replace f/g with (1/g)/(1/f) and apply the 0/0 form. (Why?)

Proof. We will just prove the right-hand limit version; the other cases follow quite easily. So assume conditions (a)–(d) hold as set out for the right-hand limit version. We also know by Rolle's Theorem for g that g(x) − g(c) ≠ 0 for a < x < c < a + δ. Apply the Cauchy MVT to obtain ξ_{x,c} ∈ (x, c) such that

    (f(x) − f(c))/(g(x) − g(c)) = f′(ξ_{x,c})/g′(ξ_{x,c}).

Now if f′(x)/g′(x) → ℓ ∈ R as x → a+ we can't deduce that ξ_{x,c} converges (as it is only restricted to be between a and c). However, given ε > 0 we can find a δ′ ∈ (0, δ) such that

    |(f(x) − f(c))/(g(x) − g(c)) − ℓ| = |f′(ξ_{x,c})/g′(ξ_{x,c}) − ℓ| < ε   (13)

for all a < x < c < a + δ′ (as then a < ξ_{x,c} < a + δ′). We want |f(x)/g(x) − ℓ| small so we need to do some algebraic manipulation on (13). Clearing the fraction in (13) gives

    |f(x) − f(c) − ℓg(x) + ℓg(c)| < ε|g(x) − g(c)|,

so by the triangle inequality

    |f(x) − ℓg(x)| < ε|g(x) − g(c)| + |f(c) − ℓg(c)|.

Hence

    |f(x)/g(x) − ℓ| < ε|1 − g(c)/g(x)| + |f(c) − ℓg(c)|/|g(x)|.   (14)

Now fix c and let x → a+. As |g(x)| → ∞ we see the RHS of (14) tends to ε · 1 + 0 = ε as x → a+. Thus for x sufficiently close to a we have

    |f(x)/g(x) − ℓ| < 2ε.

As this holds for any ε > 0, f(x)/g(x) → ℓ. Similar (easier) arguments apply when ℓ = ±∞.

12 The Binomial Expansion

By simple induction we can prove that for any natural number n (including 0) we have for all real or complex x that

    (1 + x)^n = Σ_{k=0}^{n} (n choose k) x^k = Σ_{k=0}^{∞} (n choose k) x^k,

where the coefficient (n choose k) of x^k can be proved to be

    (n choose k) = n!/(k!(n − k)!) = n(n − 1)(n − 2) · · · (n − k + 1)/(k(k − 1) · · · 1)   (= 0 if k > n).

We want to extend this result. We have also seen in our work on sequences and series that

    (1 + x)^{−1} = Σ_{k=0}^{∞} (−1)^k x^k for all |x| < 1,

and here the coefficient of x^k can be written as

    (−1)^k = (−1)(−2) · · · (−k)/(k(k − 1) · · · 1),
and we can prove by induction (for example using differentiation term by term) that for all n ∈ N we have that

    (1 + x)^{−n} = Σ_{k=0}^{∞} [(−n)(−n − 1) · · · (−n − k + 1)/(k(k − 1) · · · 1)] x^k for all |x| < 1,

so the binomial theorem above holds for all integers n if we define

    (n choose k) := n(n − 1)(n − 2) · · · (n − k + 1)/(k(k − 1) · · · 1).

We are going to generalise this — in the case of some real values of x — to all values of n,
not just integers. Note that this is altogether deeper: (1+x)p is defined for non-integral p,
and for (real) x > −1, to be the function exp(p log(1 + x)).

Definition. For all p ∈ R and all k ∈ N ∪ {0} we extend the definition of the binomial coefficient as follows:

    (p choose k) := p(p − 1)(p − 2) · · · (p − k + 1)/k!,

where we interpret the empty product as 1 when k = 0.

We now make sure that the key properties of binomial coefficients are still true in this
more general setting.

Lemma 12.1. For all k ≥ 1 and all p ∈ R

    (p choose k) = (p/k)(p − 1 choose k − 1) = ((p − k + 1)/k)(p choose k − 1)   and   (p + 1 choose k) = (p choose k) + (p choose k − 1).

Proof. The first claim is clear by taking out a factor of p/k or (p − k + 1)/k in the definition of (p choose k).

For the second we use the first claim (both parts) to show that

    (p choose k) + (p choose k − 1) = ((p − k + 1)/k)(p choose k − 1) + (p choose k − 1) = ((p + 1)/k)(p choose k − 1) = (p + 1 choose k).

Theorem 12.2 (Real Binomial Theorem). Let p be a real number. Then for all |x| < 1
∞  
p
X p k
(1 + x) = x .
k=0
k

Note that the coefficients are all non-zero provided p is not a natural number or zero; as
we have a proof of the expansion in that case we may assume that p ∈ / N ∪ {0}.

Lemma 12.3. The function $f$ defined on $(-1, 1)$ by $f(x) := (1+x)^p$ is differentiable, and satisfies $(1+x)f'(x) = pf(x)$. Also, $f(0) = 1$.

Proof. The derivative is easily obtained by the chain rule from the definition of $f$; it is $f'(x) = p(1+x)^{p-1}$. Multiplying by $(1+x)$ gives the required relationship. The value at 0 is clear.
Lemma 12.4. The radius of convergence of $\sum \binom{p}{k} x^k$ is $R = 1$.

Proof. Use the Ratio Test; we have that
\[
\left| \frac{\binom{p}{k} x^k}{\binom{p}{k-1} x^{k-1}} \right|
= \left| \frac{p-k+1}{k} \cdot x \right|
\to |(-1) \cdot x| = |x|
\]
as $k \to \infty$.
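The ratio appearing in this proof is easy to watch numerically (an illustration only, with binom as sketched earlier):

```python
from math import prod

def binom(p, k):  # generalised binomial coefficient, as sketched earlier
    return prod((p - j) / (j + 1) for j in range(k))

# binom(p,k)/binom(p,k-1) = (p - k + 1)/k, which tends to -1 as k grows.
p = 0.5
for k in [10, 100, 1000]:
    print(k, binom(p, k) / binom(p, k - 1))
```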
Lemma 12.5. The function $g$ defined on $(-1, 1)$ by $g(x) := \sum_{k=0}^{\infty} \binom{p}{k} x^k$ is differentiable, with derivative satisfying $(1+x)g'(x) = pg(x)$. Also, $g(0) = 1$.

Proof. We have
\[
\begin{aligned}
(1+x)g'(x) &= (1+x) \sum_{k=1}^{\infty} \binom{p}{k} k x^{k-1}
  && \text{Diff.\ of power series, } |x| < 1 \\
&= \sum_{k=1}^{\infty} \binom{p}{k} k x^{k-1} + \sum_{k=1}^{\infty} \binom{p}{k} k x^k \\
&= \sum_{k=0}^{\infty} \binom{p}{k+1} (k+1) x^k + \sum_{k=1}^{\infty} \binom{p}{k} k x^k
  && k \mapsto k+1 \text{ in 1st sum} \\
&= \sum_{k=0}^{\infty} \binom{p}{k} (p-k) x^k + \sum_{k=0}^{\infty} \binom{p}{k} k x^k
  && \binom{p}{k+1} = \tfrac{p-(k+1)+1}{k+1} \binom{p}{k} \\
&= p \sum_{k=0}^{\infty} \binom{p}{k} x^k = pg(x).
\end{aligned}
\]

Proof of the binomial theorem. Consider $F(x) = g(x)/f(x)$, which is well-defined on $(-1, 1)$ as $f(x) > 0$. By the Quotient Rule we can calculate $F'(x)$, and then use the lemmas:
\[
F'(x) = \frac{f(x)g'(x) - f'(x)g(x)}{f(x)^2}
= \frac{p}{1+x} \cdot \frac{f(x)g(x) - f(x)g(x)}{f(x)^2} = 0.
\]
Hence by the Constancy Theorem, $F(x)$ is constant, $F(x) = F(0) = 1$. This implies that $f(x) = g(x)$ on $(-1, 1)$.
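To see the theorem numerically (an illustration with arbitrary choices of $p$, $x$ and cut-off, using the binom sketch from earlier):

```python
from math import prod

def binom(p, k):  # generalised binomial coefficient, as sketched earlier
    return prod((p - j) / (j + 1) for j in range(k))

# Partial sums of sum_k binom(p,k) x^k against (1 + x)**p, for |x| < 1.
p, x = -0.5, 0.3
print(sum(binom(p, k) * x ** k for k in range(60)), (1 + x) ** p)
```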

Binomial Theorem at the end points (non-examinable)

The existence of these functions and their equality at the end points $x = \pm 1$ requires a more sophisticated argument. The following should be viewed as illustrations of the way various theorems can be exploited, rather than proofs to be learnt.
As we will be considering sums $\sum \binom{p}{n} x^n$ with $x = \pm 1$, it helps to first estimate how large the binomial coefficient $\binom{p}{n}$ is.

Lemma 12.6. For any $p \in \mathbb{R}$ we have $\binom{p}{n} = O(n^{-(p+1)})$ as $n \to \infty$.

Proof. We first note that
\[
\binom{p}{n} = \frac{p(p-1)\cdots(p-n+1)}{n(n-1)\cdots 1}
= \pm \Big(1 - \frac{p+1}{1}\Big) \cdots \Big(1 - \frac{p+1}{n}\Big).
\]
Now for $x \in [0, 1]$ we have $1 - x \le e^{-x}$ as $e^{-x} + x - 1$ has positive derivative for $x > 0$ and is 0 at $x = 0$. Let $s \in \mathbb{N}$ be fixed so that $s > p + 1$, say $s = \max(\lfloor p+2 \rfloor, 1)$. Then
\[
\Big| \binom{p}{n} \Big| \le \prod_{k=1}^{s-1} \Big| 1 - \frac{p+1}{k} \Big| \cdot \prod_{k=s}^{n} e^{-(p+1)/k}
= C \exp\Big( -(p+1) \sum_{k=s}^{n} \frac{1}{k} \Big),
\]
where $C$ is a constant just depending on $p$ (and $s$). But from Analysis I we know that $\sum_{k=1}^{n} \frac{1}{k} - \log n \to \gamma$ as $n \to \infty$, so in particular $\big| \sum_{k=s}^{n} \frac{1}{k} - \log n \big|$ is bounded as $n \to \infty$ (with $s$ fixed). Thus as $\exp x$ is increasing in $x$ we can bound
\[
\Big| \binom{p}{n} \Big| \le C \exp\big( -(p+1) \log n + C' \big) = C'' n^{-(p+1)},
\]
for suitable constants $C'$ and $C''$.
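Numerically, $n^{p+1} \big|\binom{p}{n}\big|$ does indeed stay bounded (and appears to converge), as the following sketch suggests (binom as earlier; the choice of $p$ is arbitrary):

```python
from math import prod

def binom(p, k):  # generalised binomial coefficient, as sketched earlier
    return prod((p - j) / (j + 1) for j in range(k))

# Lemma 12.6: n^(p+1) * |binom(p, n)| is bounded as n grows.
p = 0.5
for n in [10, 100, 1000, 10000]:
    print(n, n ** (p + 1) * abs(binom(p, n)))
```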

The case when x = 1


Theorem 12.7. For any $p > -1$ the series $\sum_{n=0}^{\infty} \binom{p}{n}$ is convergent with sum $2^p$.

Remark. It is easy to see that for $p \le -1$, $\binom{p}{n} \not\to 0$, so the series is divergent.

Proof. We apply Taylor's Theorem to $(1+x)^p$ on the interval $[0, 1]$ (with $n$ replaced by $n - 1$ for convenience). We have $\frac{1}{k!} \frac{d^k}{dx^k} (1+x)^p = \binom{p}{k} (1+x)^{p-k}$, so for each $n \ge 1$, there is a point $\xi_n \in (0, 1)$ such that
\[
2^p = \sum_{k=0}^{n-1} \binom{p}{k} + E_n, \quad \text{where } E_n = \binom{p}{n} (1+\xi_n)^{p-n}.
\]
Hence for $n > p$ we have $|E_n| \le \big| \binom{p}{n} \big|$. But then by Lemma 12.6, $|E_n| = O(n^{-(p+1)})$, and so $E_n \to 0$ as $n \to \infty$ since $p > -1$.

Remark. In the above proof we could not make use of the $(1+\xi_n)^{p-n}$ factor to show $E_n$ is small as we could have $\xi_n$ tending very rapidly to 0 as $n \to \infty$.
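A numerical check of Theorem 12.7 (a sketch with an arbitrary $p$ and cut-off; binom as earlier). For this $p$ the terms eventually alternate in sign, so the partial sums converge reasonably quickly:

```python
from math import prod

def binom(p, k):  # generalised binomial coefficient, as sketched earlier
    return prod((p - j) / (j + 1) for j in range(k))

# Partial sums of sum_n binom(p, n) against 2**p, for p > -1.
p = 0.5
print(sum(binom(p, n) for n in range(2000)), 2 ** p)
```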

The case when x = −1

For $x = -1$ we have not yet defined $(1+x)^p = 0^p$.

For $p \in \mathbb{N}$ we have the usual algebraic definition, so $0^p = 0$ for $p \ge 1$. Can we define $0^p$ sensibly for any other values of $p$?

For $p > 0$: when $x > 0$ we defined $x^p := \exp(p \log x)$. As $\log x \to -\infty$ as $x \to 0^+$, we have $\exp(p \log x) \to 0$ as $x \to 0^+$. Thus to make $x^p$ continuous at $x = 0$ we should define $0^p = 0$ for all $p > 0$. This we now do.

If $p = 0$: one normally defines $0^0 = 1$, although some authors prefer $0^0$ to be left undefined. Having $0^0 = 1$ certainly makes sense for polynomials and power series $\sum c_k x^k$ when $x = 0$, and is always interpreted this way in this context. The disadvantage of defining $0^0$ is that $x^y$ cannot be made continuous at $(x, y) = (0, 0)$, so one cannot assume $f(x)^{g(x)}$ converges when $f(x), g(x) \to 0$.

Theorem 12.8. For any $p > 0$ the series $\sum_{n=0}^{\infty} \binom{p}{n} (-1)^n$ is convergent with sum 0.

Remark. For p = 0 the sum is 1, and for p < 0 it is easy to show that the sum diverges.

Proof. In this case, Taylor’s theorem does not help. But we can get the result by showing
the binomial series is uniformly convergent, and hence continuous, on [−1, 1].

We have $\big| \binom{p}{n} x^n \big| \le M_n := \big| \binom{p}{n} \big|$ for all $x \in [-1, 1]$. But by Lemma 12.6, $M_n = O(n^{-(p+1)})$ and $\sum n^{-(p+1)}$ converges for $p > 0$ by the Integral Test. Thus by the Comparison Test, $\sum M_n$ converges and so we have uniform convergence of the series $\sum \binom{p}{n} x^n$ on $[-1, 1]$ by the $M$-test. As each of the terms $\binom{p}{n} x^n$ is continuous in $x$, this implies the infinite sum is continuous for all $x \in [-1, 1]$. Continuity at $x = -1$ then implies that
\[
\sum_{n=0}^{\infty} \binom{p}{n} (-1)^n = \lim_{x \to (-1)^+} (1+x)^p = \lim_{x \to (-1)^+} e^{p \log(1+x)} = \lim_{y \to -\infty} e^{py} = 0
\]
as $y = \log(1+x) \to -\infty$ as $x \to (-1)^+$, $p > 0$, and $\exp(z) \to 0$ as $z \to -\infty$.
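Numerically the partial sums do creep towards 0, though only slowly, consistent with the $O(n^{-(p+1)})$ tail bound (an illustration only; binom as earlier, choices arbitrary):

```python
from math import prod

def binom(p, k):  # generalised binomial coefficient, as sketched earlier
    return prod((p - j) / (j + 1) for j in range(k))

# Partial sums of sum_n binom(p, n) * (-1)**n tend to 0 for p > 0.
p = 0.5
for N in [10, 100, 1000]:
    print(N, sum(binom(p, n) * (-1) ** n for n in range(N)))
```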

Continuity of a real power series at the endpoints (non-examinable)

You have been aware ever since you first studied the convergence of power series in Analysis I that a real power series $\sum c_k x^k$ with finite non-zero radius of convergence $R$ converges absolutely for any $x$ for which $|x| < R$. You also saw examples which show that the series may converge absolutely, may converge non-absolutely, or may diverge, at each of the points $x = R$ and $x = -R$.
We showed in Section 7 that $f(x) := \sum_{k=0}^{\infty} c_k x^k$ defines a continuous function $f$ on $(-R, R)$, irrespective of how the series behaves at $\pm R$. But what if the series does converge at $\pm R$? Can we deduce that the value is what one would expect assuming $f$ is continuous there? In the examples we have seen it did, and indeed, the answer turns out to be Yes!

By replacing $f(x)$ with $f(\pm x/R)$ we may assume without loss of generality that $R = 1$ and we are interested in the series at $x = R = 1$. The following is then the result that we want.

Theorem 12.9 (Abel's Continuity Theorem). Assume that $\sum c_k$ converges. Then $\sum c_k x^k$ converges uniformly on $[0, 1]$. In particular $\sum_{k=0}^{\infty} c_k x^k$ is continuous on $[0, 1]$ and
\[
\lim_{x \to 1^-} \sum_{k=0}^{\infty} c_k x^k = \sum_{k=0}^{\infty} c_k.
\]

Remark. We note that uniform convergence of $\sum c_k x^k$ follows immediately from the $M$-test when $\sum |c_k|$ converges, so the interesting case is when $\sum c_k$ is not absolutely convergent.

Proof. Fix $\varepsilon > 0$. Then by the Cauchy Convergence Criterion for series, there is an $N$ such that for $n \ge m > N$,
\[
\Big| \sum_{k=m}^{n} c_k \Big| < \varepsilon.
\]
Now fix $m > N$ and define $S_n = \sum_{k=m}^{n} c_k$ for $n \ge m - 1$ with the convention that $S_{m-1} = 0$. We note that $c_n = S_n - S_{n-1}$ for all $n \ge m$. Thus$^{34}$
\[
\begin{aligned}
\sum_{k=m}^{n} c_k x^k &= \sum_{k=m}^{n} S_k x^k - \sum_{k=m}^{n} S_{k-1} x^k \\
&= \sum_{k=m}^{n} S_k x^k - \sum_{k=m-1}^{n-1} S_k x^{k+1} \\
&= \sum_{k=m}^{n-1} S_k (x^k - x^{k+1}) + S_n x^n.
\end{aligned}
\]

Hence by the Triangle inequality, and noting that $|S_n| < \varepsilon$ for $n \ge m$,
\[
\Big| \sum_{k=m}^{n} c_k x^k \Big| \le \sum_{k=m}^{n-1} \varepsilon (x^k - x^{k+1}) + \varepsilon x^n = \varepsilon x^m \le \varepsilon
\]
for any $x \in [0, 1]$. Thus by Cauchy's Criterion for uniform convergence of series, Corollary 7.10, we have that $\sum c_k x^k$ is uniformly convergent on $[0, 1]$.

Continuity of $\sum c_k x^k$ and the limit as $x \to 1^-$ now follow from Theorem 7.2.

Example 12.10. (Recall Example 10.3.) We have (by the Differentiation and Constancy theorems) that for $x \in (-1, 1)$,
\[
\log(1+x) = \sum_{k=1}^{\infty} \frac{(-1)^{k-1} x^k}{k}.
\]
As $\log(1+x)$ is continuous at $x = 1$ and $\sum \frac{(-1)^{k-1}}{k}$ converges by the Alternating Series Test, we deduce that
\[
1 - \tfrac{1}{2} + \tfrac{1}{3} - \tfrac{1}{4} + \cdots = \log 2.
\]
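A quick numerical look at this example (the cut-off $N$ is my arbitrary choice; the error of the $N$th partial sum is roughly $1/(2N)$):

```python
import math

# Partial sums of 1 - 1/2 + 1/3 - ... approach log 2.
N = 100000
print(sum((-1) ** (k - 1) / k for k in range(1, N + 1)), math.log(2))
```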

Warning. Abel's Theorem only applies in situations where the sum is a genuine power series of the form $\sum c_k x^k$. For example, recall that for $x \in (-1, 1)$
\[
-\log(1-x) = \sum_{k=1}^{\infty} \frac{x^k}{k}.
\]
Now consider for $x \in [0, 1]$ the series
\[
f(x) := \sum_{k=1}^{\infty} \frac{x^k - x^{2k}}{k}.
\]
Then clearly for $x \in (-1, 1)$,
\[
f(x) = -\log(1-x) + \log(1-x^2) = \log \frac{1-x^2}{1-x} = \log(1+x).
\]
But the series for $f(x)$ converges at $x = 1$ and $f(1) = \sum 0 = 0$. But $f(1^-) = \lim_{x \to 1^-} \log(1+x) = \log 2 \ne 0$.
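The same phenomenon is easy to see numerically (a sketch; the function name f and the cut-off are my own choices):

```python
import math

# f(x) = sum_k (x^k - x^(2k))/k equals log(1 + x) on (-1, 1), so
# f(x) -> log 2 as x -> 1-, yet every term vanishes at x = 1, so f(1) = 0.
def f(x, terms=100000):
    return sum((x ** k - x ** (2 * k)) / k for k in range(1, terms + 1))

for x in [0.9, 0.99, 0.999]:
    print(x, f(x), math.log(1 + x))
print(1.0, f(1.0))  # exactly 0
```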

$^{34}$ This is called Abel's summation formula – think 'integration by parts'.
