Exploring University Mathematics with Python

Siri Chongchitnan
Mathematics Institute
University of Warwick
Coventry, UK
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
Motivation
Python is now arguably the world's most popular programming language, thanks to its
gentle learning curve, clear syntax, wide range of open-source libraries and active online
support community. Over the past decade, Python programming has become a highly
desirable skill for employers, not only in the STEM and ICT sectors, but also in any industry
involving data. It comes as no surprise, then, that Python has now been integrated into school
and university curricula around the world.
A typical university mathematics curriculum would include some element of programming,
usually in a standalone module. However, in my experience teaching at several UK
universities, students often regard programming as just another module, separate
from the typical 'pen-and-paper' modules.
In my opinion, this is an extremely unhealthy viewpoint, because programming can
often help us gain a more solid understanding of mathematics in comparison to a purely
pen-and-paper approach. It is true that much of university mathematics is driven by theorems
and proofs, and it is also true that Python does not prove theorems. However, Python gives
us the power and freedom to glimpse into the unknown, leading us towards insightful
conjectures that would have been difficult to formulate otherwise.
Hence, I was motivated to write a mathematics textbook that is richly interwoven with
Python, rather than another Python textbook with some mathematical examples. The spirit
of this book is one of mathematical exploration and investigation. I want to show students
that Python can hugely enrich our understanding of mathematics through:
• Calculation: Performing complex calculations and numerical simulations instantly;
• Visualisation: Demonstrating key theorems with graphs, interactive plots and animations;
• Extension: Using numerical findings as inspiration for making deeper, more general conjectures.
In addition, I hope this book will also benefit mathematics lecturers and teachers who
want to incorporate programming into their course. I hope to convince educators that
programming can be a meaningful part of any mathematical module.
Structure
The topics covered in this book are broadly analysis, algebra, calculus, differential
equations, probability and statistics. Each chapter begins with a brief overview of the
subject and essential background knowledge, followed by a series of questions in which key
concepts and important theorems are explored with the help of Python programs which
have been succinctly annotated. All code is available to download online.
At the end of each section, I present a Discussion which dives deeper into the
topic. There are also a number of exercises (most of which involve coding) at the end of
each chapter.
Assumed knowledge
In terms of programming knowledge, this book does not assume that you are a highly
experienced user of Python. On the other hand, complete beginners to Python might struggle
to follow the code given. I suggest that the reader should have at least a basic
knowledge of programming (e.g. you know what a for loop does).
For completeness, I have included a section called Python 101 (Appendix A) which
gives instructions on installing Python and signposts to references that can help anyone pick
up Python quickly.
In terms of mathematical knowledge, I do not assume any university-level mathematics.
Students who are familiar with the material in the standard A-Level mathematics (or
equivalent) should be able to follow the mathematical discussions in this book.
Acknowledgements
I am extremely grateful to have received extensive comments from my early reviewers,
many of whom are students at my home department, Warwick Mathematics Institute.
I would also like to thank the team at Springer for their support. Special thanks to Richard
Kruel for his belief in my idea and his unwavering encouragement.
Siri Chongchitnan
Coventry, UK
The code
a) Downloading and using the code
All code is available to download from
https://github.com/siriwarwick/book
We will be coding in Python 3 (ideally 3.9 or higher). To find out which version you
have, see the code box below. If you don’t have Python installed, see Appendix A.
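A one-line version check along the same lines (a minimal sketch):

import sys
print(sys.version)   # e.g. '3.11.4 ...'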
There are many ways to run Python programs, but by default, I will assume that you
are working in JupyterLab (or Jupyter Notebook). You will be working with files with
the .ipynb extension. For more information on the Jupyter IDE (integrated development
environment), see https://jupyter.org.
There are alternative IDEs to Jupyter, for example:
• IDLE (which comes as standard with your Python distribution);
• Spyder (https://www.spyder-ide.org);
• PyCharm (https://www.jetbrains.com/pycharm);
• Replit (https://replit.com).
If you prefer these IDEs, you will be working with files with the .py extension.
Code and annotations will be given in grey boxes as shown below. The code is given on
the right, whilst the explanation is given on the left.
b) About %matplotlib
We will often use the line
%matplotlib
to make any plot interactive (rather than the usual static 'inline' plot). The zoom and pan buttons
in the GUI window will be particularly useful. To return to static plots, use the command:
%matplotlib inline
%matplotlib is one of the so-called 'magic' commands that only work in the Jupyter
environment and not in standard Python.
If you have difficulty running with just %matplotlib, try running this line of code in an
empty cell
%matplotlib -l
(that’s a small letter L after the dash). This should list all the graphics backends available
on your machine. Choose one that works for you. For example, if qt is on the list, replace
%matplotlib by
%matplotlib qt
c) Coding style
Everyone has their own style of coding. The code that I present in this book is just
one of many ways of approaching the problems. Keeping the purpose and the audience
in mind, I tried to minimise the use of special packages and clever one-liners,
and instead placed a greater emphasis on readability. In particular, I do not use the
if __name__ == "__main__": idiom. My goal in this book is to show how a basic
knowledge of Python goes a long way in helping us delve deeply into mathematics.
In short, every piece of code shown can be improved in some way.
A caveat: some code may produce warnings or even errors as new versions of Python
are rolled out after the publication of this book. If you feel something isn't quite right with the
code, check the book's GitHub page for possible updates.
d) Getting in touch
Comments, corrections and suggestions for improvement can be posted in the discussion
section of the book's GitHub page
https://github.com/siriwarwick/book/discussions
or emailed to siri.chongchitnan@warwick.ac.uk. I will be happy to hear from you
either way.
CONTENTS
1 Analysis
  1.1 Basics of NumPy and Matplotlib
  1.2 Basic concepts in analysis
  1.3 The ε, N definition of convergence for sequences
  1.4 Convergence of series
  1.5 The Harmonic Series
  1.6 The Fibonacci sequence
  1.7 The ε, δ definition of continuity
  1.8 Thomae's function
  1.9 The Intermediate Value Theorem and root finding
  1.10 Differentiation
  1.11 The Mean Value Theorem
  1.12 A counterexample in analysis
  1.13 Exercises

2 Calculus
  2.1 Basic calculus with SciPy
  2.2 Comparison of differentiation formulae
  2.3 Taylor series
  2.4 Taylor's Theorem and the Remainder term
  2.5 A continuous, nowhere differentiable function
  2.6 Integration with Trapezium Rule
  2.7 Integration with Simpson's Rule
  2.8 Improper integrals
  2.9 Fourier series
  2.10 Exercises

7 Probability
  7.1 Basic concepts in combinatorics
  7.2 Basic concepts in probability
  7.3 Basics of random numbers in Python
  7.4 Pascal's triangle

8 Statistics
  8.1 Basic concepts in statistics
  8.2 Basics of statistical packages
  8.3 Central Limit Theorem
  8.4 Student's t distribution and hypothesis testing
  8.5 χ² distribution and goodness of fit
  8.6 Linear regression and Simpson's paradox
  8.7 Bivariate normal distribution
  8.8 Random walks
  8.9 Bayesian inference
  8.10 Machine learning: clustering and classification
  8.11 Exercises

References

Index
8.6   regression.ipynb       Plots a regression line through data points and demonstrates Simpson's paradox
8.7   bivariate.ipynb        Plots the bivariate normal distribution and its contour ellipses
8.8   randomwalk.ipynb       Generates 1D random walks and plots the mean distance travelled as a function of time step
8.9   bayesian.ipynb         Plots the prior and posterior for Bayesian inference
8.10  clustering.ipynb       Performs k-means clustering
      classification.ipynb   Performs k-nearest neighbour (kNN) classification
Visualisation recipes
Recipe                                                          Section    Code

Plots
Plot with different types of lines                              1.3        sequence-convergence.ipynb
Plot a large number of curves, with a gradual colour change     2.3        taylor.ipynb
Two-panel plot in R²                                            1.5        harmonic.ipynb
Plot parametric curves and surfaces in R² and R³                3.1        see text
Plot polar curves                                               3.1        see text
Plotting in R³ with Plotly                                      5.3        planes.ipynb
Three-panel plot in R³                                          5.9        ranknullity.ipynb
Plot one panel in R² next to one in R³                          3.8        grad.ipynb
Plot a 3D surface and its contour lines                         3.8        grad.ipynb
Shade the area under a graph                                    8.4        ttest.ipynb
Draw a filled polygon                                           6.4        dihedral.ipynb

Sliders
Slider controlling a plot in R²                                 1.7        continuityslider.ipynb
Slider controlling a plot in R³                                 3.6        quadrics.ipynb
One slider controlling two plots in R²                          3.4        curvature.ipynb
One slider controlling two plots, one in R² and one in R³       3.5, 8.7   torsion.ipynb, bivariate.ipynb
Two sliders controlling one plot in R²                          2.5        weierstrass.ipynb

Heatmaps
Heatmap + vector field + slider (in R²)                         3.9        div.ipynb
Heatmap + slider (polar coordinates)                            3.10       curl.ipynb

Animations
Animation in R²                                                 4.4, 4.9   pendulum.ipynb, heat.ipynb
Animation with two panels in R²                                 6.12       zetaanim.ipynb
Animation in R³                                                 4.6, 4.10  lorenz.ipynb, wave.ipynb

Visualising matrices
Display a matrix with colour-coded elements                     6.3, 4.8   cayley.ipynb, mandelbrot.ipynb
Read in an image file                                           5.8        svd.ipynb

Visualising data
Plot a histogram with Matplotlib                                7.5        coin1.ipynb
Plot a histogram with Seaborn                                   8.3        CLT.ipynb
Fit a line through a scatter plot                               7.10       montecarlo.ipynb
Read data from a file and store as Pandas dataframes            8.6        regression.ipynb
Write data to a file                                            8.7        see text
Shade 2D regions according to classifications with Scikit-learn 8.10       classification.ipynb
CHAPTER ONE

Analysis
Real analysis is the study of real numbers and maps between them (i.e. functions). Analysis
provides a solid foundation for calculus, which in turn gives rise to the development of other
branches of mathematics. The main feature of analysis is its rigour, meaning that every
concept in analysis is defined precisely with logical statements without the need for pictures.
Fig. 1.1: Augustin-Louis Cauchy (1789–1857), one of the founders of modern analysis.
Cauchy is commemorated on a French stamp shown on the right. (Image source: [137].)
Analysis is a subject which many new university students find to be most different
from how mathematics is taught in school. The proof-driven nature of the subject can
be overwhelming to some students. In this chapter, we will see how Python can help us
visualise and understand key concepts in analysis.
In addition to real numbers, the main objects in analysis are sequences, series and
functions. We will first see how these objects can be represented and manipulated in
Python. We then give a survey of some key theorems in analysis and see how Python
can help us understand these theorems more deeply. We will focus on understanding and
visualising the theorems themselves, rather than the proofs. The proofs of the theorems
discussed in this chapter can be found in good analysis textbooks, amongst which we
recommend [10, 20, 91, 143, 189].
Sequences
A sequence (a_n) is simply a succession of numbers with the subscript n = 1, 2, 3, ... labelling
the order of appearance of each term. For example, the sequence (a_n) = (1/n²) contains the
terms a_1 = 1, a_2 = 1/4, a_3 = 1/9 and so on. The bracketed notation (a_n) denotes the entire
sequence, whereas a_n denotes a single term in the sequence.
Sequences can be finite, for example, (a_n)_{n=1}^{5} = 1, 1/4, 1/9, 1/16, 1/25. However, in analysis, we
are mainly interested in the behaviour of infinite sequences (a_n)_{n=1}^{∞}.
Sequences are easily represented as NumPy arrays, which are one of the most useful
objects in Python (another method is to use lists, which we will discuss later). Naturally, we
cannot store infinitely long arrays in Python, but an array that is long enough
will usually be sufficient to reveal something about the behaviour of the infinite sequence.
It is often necessary to create a sequence of equally spaced numbers between two given
numbers (in other words, an arithmetic progression). Such a sequence can be easily created
using NumPy’s linspace command:
np.linspace(first element, last element, how many elements)
The third argument is optional. If not provided, the default is 50 elements, equally spaced
between the first and the last elements. It is worth noting the terminology here: each term in
a sequence can be identified with an element in a NumPy array. linspace is especially
useful for plotting graphs, since we usually want equally spaced points along the x-axis.
The same sequence can also be created using the arange command of the form:
np.arange(first element, >last element, step)
where >last element means any number greater than the last element, but not exceeding last
element + step. This is because, by convention, the 2nd argument in arange is not included
in an arange array. In practice, just choose a number that is slightly larger than the final
element desired. The third argument is optional, with the default = 1.
If you want to create a short sequence by manually specifying every element, use the
following syntax:
np.array([element1, element2, . . . , last element])
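For instance, the three commands below create the same five-element sequence (a small illustrative sketch):

import numpy as np
a = np.linspace(0, 1, 5)                 # array([0., 0.25, 0.5, 0.75, 1.])
b = np.arange(0, 1.1, 0.25)              # same sequence; 1.1 is 'slightly larger' than 1
c = np.array([0, 0.25, 0.5, 0.75, 1])    # manual specification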
Here are some examples of how to create sequences in Python and obtain new ones using
array operations.
Array operations

Addition and scalar multiplication
(x_n) = (2a_n − 5)        x_n = 2*a_n - 5
(y_n) = (a_n + x_n)       y_n = a_n + x_n

Exponentiation
(x_n³)                    x_n**3
(3^{x_n})                 3**x_n
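For example, defining (a_n) = (1/n²) first (an illustrative sketch):

import numpy as np
n = np.arange(1, 6)
a_n = 1/n**2            # a_1, ..., a_5 = 1, 1/4, 1/9, 1/16, 1/25
x_n = 2*a_n - 5         # elementwise: each term is 2a_n - 5
y_n = a_n + x_n         # elementwise addition
print(x_n**3, 3**x_n)   # elementwise powers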
Series
Given a sequence (a_n), we investigate the corresponding series Σ_{n=1}^{∞} a_n by computing the partial sums

S_N = Σ_{n=1}^{N} a_n

for large values of N. These partial sums should give us a good idea about whether the
series converges to a finite number, or diverges (e.g. the sum becomes arbitrarily large, or
fluctuates without settling down to a value). We will study the convergence of a number of
interesting series in §1.4.
A useful command for evaluating series is sum(an array). Another useful operator in
Python 3 is the @ operator, which is equivalent to the dot product of two arrays, i.e.
x@y = np.dot(x, y) = sum(x*y)
Of course there is also the option of using the for loop. This is the most appropriate method
if you need to plot the partial sums. Here are two examples of series evaluation using various
methods.
Series evaluation

Σ_{n=1}^{100} n:
    import numpy as np
    x = np.arange(1, 101)
    sum(x)
    # or
    S = 0
    for n in np.arange(1, 101):
        S += n

Σ_{n=1}^{100} n(n + 3):
    x@(x+3)  # or
    sum(x*(x+3))  # or
    np.dot(x, x+3)  # or
    S = 0
    for n in np.arange(1, 101):
        S += n*(n+3)
Functions
A function f is a mapping which takes as its input an object x ∈ A (where A is called the
domain), and produces an output f (x) ∈ B (where B is called the codomain). We denote
such a mapping by the notation f : A → B.
Functions in Python are defined using the def command, which has a rich structure. A
function expects brackets in which arguments (if any) are placed. In most cases, you will
also want your function to return an output.
Instead of def, another way to define a function in a single line is to use the lambda
function (also called anonymous function).
For example, let F (x) = 2x + 1 and G(x, y) = x + y. In the box below, the left side
shows the def method. The right side shows the lambda-function method.
Defining functions

Method I: def
    def F(x):
        return 2*x + 1
    def G(x, y):
        return x + y

Method II: lambda functions
    F = lambda x: 2*x + 1
    G = lambda x, y: x + y
We often use lambda functions when creating very simple functions, or when feeding a
function into another function. In the latter case, the lambda function does not even need
to have a name (hence ‘anonymous’). This will be useful when we discuss integration in
Python – for a preview, take a look at the end of §3.1.
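For a small taste of the anonymous usage (a sketch assuming SciPy is available; integration is covered properly from §3.1):

from scipy.integrate import quad
# Integrate F(x) = 2x + 1 from 0 to 1 without naming the integrand
area, error = quad(lambda x: 2*x + 1, 0, 1)
print(area)   # 2.0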
Now that we can define sequences, series and functions in Python, we are ready to
investigate some key definitions and theorems in analysis.
1.3 The ε, N definition of convergence for sequences

In this section we investigate the convergence of the four sequences

x_n = 1 − 1/n,   y_n = (sin n)/n,   z_n = (ln n)/n,   E_n = (1 + 1/n)^n,

using the definition of convergence: a sequence (x_n) converges to the limit x if:

For any ε > 0, there exists an integer N ∈ N such that, for all n > N, |x_n − x| < ε.
This is one of the most important definitions in analysis. It gives a rigorous way to express
an intuitive concept of convergence as n → ∞ very precisely. Let’s first try to unpack the
definition.
Let's call |x_n − x| the error (i.e. the absolute difference between the nth term of the
sequence and the limit x). The definition above simply says that down the tail of the sequence,
the error becomes arbitrarily small.
If someone gives us ε (typically a tiny positive number), we need to find how far down
the tail we have to go for the error to be smaller than ε. The further we have to go down the
tail, the larger N becomes. In other words, N depends on ε.
Now let's consider the first sequence x_n = 1 − 1/n. The first step is to guess the limit as
n → ∞. Clearly, the 1/n term becomes negligible and we guess that the limit is 1. Now it
remains to prove this conjecture formally. We go back to the definition: for any given ε > 0,
the error is given by

|x_n − 1| = 1/n < 1/N < ε,

where the final inequality holds if we let N be any integer greater than 1/ε. For instance, if
someone gives us ε = 3/10, then any integer N ≥ 10/3 would work (e.g. N = 4). We have
shown that this choice of N implies that the definition holds, hence we have proved that
lim_{n→∞} x_n = 1.
The definition holds if there exists one such integer N, so that in the proof above, N + 1
or 3N² + 5 are also equally good candidates (one just goes further down the tail of the
sequence). However, it is still interesting to think about the smallest N (let's call it N_min)
that would make the definition of convergence work. We could tabulate some values of N_min
for the sequence x_n.
ε       0.3   0.2   0.1   0.05   0.03
N_min     4     5    10     20     34
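These values are quickly verified in Python, using the convention N_min = ⌈1/ε⌉ from the proof above (a small illustrative check):

import numpy as np
for eps in [0.3, 0.2, 0.1, 0.05, 0.03]:
    print(eps, int(np.ceil(1/eps)))   # prints 4, 5, 10, 20, 34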
We can easily confirm these calculations with the plot of the error |x_n − 1| against n
(solid line in fig. 1.2).
Next, let's consider the sequence y_n. What does it converge to? Since sin n is bounded
by −1 and 1 for all n ∈ N, whilst n can be arbitrarily large, we guess that y_n converges to 0.
The proof can be constructed in the same way as before. For any ε > 0, we find the error:
|y_n − 0| = |sin n|/n ≤ 1/n < 1/N < ε,   (1.1)

where the final inequality holds if we let N be any integer greater than 1/ε. However, the
latter may not be N_min, because we have introduced an additional estimate |sin n| ≤ 1.
It seems that N_min cannot be obtained easily by hand, and the error plot (dashed line in
fig. 1.2) also suggests this. For example, we find that when ε = 0.1, N_min = 8 rather than 10
as we might have expected from Eq. 1.1.
A similar analysis of the relationship between ε and N can be done for the remaining
sequences z_n and E_n. The sequence z_n converges to 0 because, intuitively, ln n is negligible
compared with n when n is large (we discuss how this can be proved more rigorously in the
Discussion section).
Finally, the famous sequence E_n can be regarded as the definition of Euler's number e:

e = lim_{n→∞} (1 + 1/n)^n.
The errors of all these sequences approach zero, as shown in fig. 1.2 below. Matplotlib’s
plot function renders the errors as continuous lines, but keep in mind that the errors form
sequences that are defined only at integer n, and the lines are for visualisation purposes only.
Fig. 1.2: The error, i.e. the absolute difference between the sequences x_n, y_n, z_n, E_n and
their respective limits.
The code sequence-convergence.ipynb plots fig. 1.2. Remember that all code in
this book can be downloaded from the GitHub page given in the Preface.
sequence-convergence.ipynb (for plotting fig. 1.2)

import numpy as np
import matplotlib.pyplot as plt

# Consider up to n = 40
n = np.arange(1, 41)
recp = 1/n
# Define x_n; its error is |x_n - x| where x = lim x_n = 1
xn = 1 - recp
err_xn = recp
# lim y_n = 0
yn = np.sin(n)/n
err_yn = abs(yn)
# lim z_n = 0
zn = np.log(n)/n
err_zn = abs(zn)
# lim E_n = e
En = (1 + recp)**n
err_En = abs(En - np.e)

# Plot the four error sequences (this plotting line is an assumed
# reconstruction, in the legend's order)
plt.plot(n, err_xn, n, err_yn, n, err_zn, n, err_En)
plt.xlim(0, 40)
plt.ylim(0, 0.4)
# r allows typesetting in LaTeX
plt.xlabel(r'$n$')
plt.ylabel('|Error|')
plt.legend([r'$|x_n-1|$', r'$|y_n|$',
            r'$|z_n|$', r'$|E_n-e|$'])
plt.grid('on')
plt.show()
If, from some point onwards, the error shrinks monotonically (i.e. it does not fluctuate like
the error for y_n), we can use the code below to search for N_min for a given value of epsilon.
Take E_n for example.
Finding N given ε

# For any given ε, say, 10⁻⁵
epsilon = 1e-5
n = 1
err = np.e - (1 + 1/n)**n
# Continue searching until the error < ε, increasing n by 1 per iteration
while err >= epsilon:
    n += 1
    err = np.e - (1 + 1/n)**n
In this case, we find that for the error to satisfy |E_n − e| < 10⁻⁵, we need n > N_min = 135912.
Discussion
• Floor and ceiling. Instead of saying "N is the smallest integer greater than or equal to
x", we can write

N = ⌈x⌉,

which reads "the ceiling of x". For example, in our proof that the sequence x_n = 1 − 1/n
converges, given any ε > 0, we can take N = ⌈ε⁻¹⌉.
Similarly, one can define ⌊x⌋ (the floor of x) as the largest integer less than or equal
to x.
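Both operations are available in NumPy (an illustrative aside):

import numpy as np
print(np.ceil(10/3))    # 4.0, i.e. the ceiling of 10/3
print(np.floor(10/3))   # 3.0, i.e. the floor of 10/3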
• Squeeze Theorem. Consider y_n = (sin n)/n. The numerator is bounded between −1 and
1, so we see that

−1/n ≤ y_n ≤ 1/n.

As n becomes large, y_n is squeezed between two tiny numbers of opposite signs. Thus,
we have good reasons to believe that y_n also converges to 0. This idea is formalised by
the following important theorem.

Theorem 1.1 (Squeeze Theorem) Let a_n, b_n, c_n be sequences such that a_n ≤ b_n ≤ c_n
for all n ∈ N. If lim_{n→∞} a_n = lim_{n→∞} c_n, then lim_{n→∞} a_n = lim_{n→∞} b_n = lim_{n→∞} c_n.
The Squeeze Theorem can also be used to prove that z_n = (ln n)/n → 0. Using the
inequality ln x < x for x > 0 (with x = √n), observe that

0 ≤ (ln n)/n = (2 ln √n)/n < 2√n/n = 2/√n.

Now take the limit as n → ∞ to find that lim_{n→∞} (ln n)/n = 0.
• Monotone convergence. Why is the sequence E_n = (1 + 1/n)^n convergent? First, we
will show that E_n < 3. The binomial expansion gives:

E_n = (1 + 1/n)^n = 1 + 1 + Σ_{k=2}^{n} (n choose k)·(1/n^k).

Observe that:

(n choose k)·(1/n^k) = (1/k!) · [n(n−1)(n−2)···(n−(k−1))]/[n·n·n···n]
                     = (1/k!) · 1 · (1 − 1/n)(1 − 2/n)···(1 − (k−1)/n)
                     < 1/k!
                     = 1/(1·2·3···k)
                     ≤ 1/(1·2·2···2)
                     = 1/2^(k−1).

Hence E_n < 1 + 1 + Σ_{k=2}^{∞} 1/2^(k−1) = 2 + 1 = 3.
Fig. 1.3: E_n = (1 + 1/n)^n.
In addition, the graph of the sequence E_n (fig. 1.3) shows that the sequence is strictly
increasing (i.e. E_n < E_{n+1}). The proof is an exercise in inequalities (see [20]). These
facts imply that E_n converges due to the following theorem for monotone (i.e. increasing
or decreasing) sequences.
Theorem 1.2 (Monotone Convergence Theorem) A monotone sequence is convergent
if and only if it is bounded.
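We can corroborate both properties of E_n numerically (a quick illustrative check, not a proof):

import numpy as np
n = np.arange(1, 41)
En = (1 + 1/n)**n
print(np.all(np.diff(En) > 0))   # True: E_n is strictly increasing
print(En.max() < 3)              # True: E_n stays below 3 (indeed below e)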
1.4 Convergence of series
By plotting the partial sums, conjecture whether each of the following series converges
or diverges.

a) Σ_{k=1}^{∞} 1/k²,   b) Σ_{k=1}^{∞} (−1)^(k+1)/k,   c) Σ_{k=1}^{∞} 1/√k,   d) Σ_{k=1}^{∞} 1/(1 + ln k).
Let's have a look at one way to plot the partial sums for (a), where S_n = Σ_{k=1}^{n} 1/k².
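Here is a minimal sketch of such a plot (np.cumsum computes all the partial sums at once; the book's own listing may differ in styling):

import numpy as np
import matplotlib.pyplot as plt

k = np.arange(1, 11)
S = np.cumsum(1/k**2)     # partial sums S_1, S_2, ..., S_10
plt.plot(k, S, 'o-')
plt.xlabel('Number of terms')
plt.ylabel('Partial sum')
plt.grid('on')
plt.show()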
The graphs for (b), (c) and (d) can be obtained similarly by augmenting the above code.
Fig. 1.4 shows the result. We are led to conjecture that series (a) and (b) converge, whilst (c)
and (d) diverge.

Fig. 1.4: The partial sums for the series (a)-(d) up to 10 terms.
Discussion
• p-series. Whilst the conjectures are correct, the graphs do not constitute a proof. In
analysis, we usually rely on a number of convergence tests to determine whether a series
converges or diverges. These tests yield the following very useful result in analysis:
Theorem 1.3 The p-series Σ 1/n^p converges if p > 1 and diverges if p ≤ 1,

where the shorthand Σ here means Σ_{n=n₀}^{∞} for some integer n₀. This theorem explains
why series (a) converges and (c) diverges.
• Euler's series. The exact expression for the sum in series (a) was found by Euler in
1734. The famous result

Σ_{n=1}^{∞} 1/n² = π²/6,   (1.2)
(sometimes called the Basel problem) has a special place in mathematics for its
wide-ranging connections, especially to number theory.
Euler's series (1.2) is often interpreted as a particular value of the Riemann zeta function
(ζ(2) = π²/6 ≈ 1.6449). See reference [4] for a number of accessible proofs of this
result. A proof using Fourier series will be discussed later in §2.9. We will study the
zeta function more carefully later in §6.11.
• Taylor series. The value of series (b) can be derived from the well-known Taylor
(Maclaurin) series

ln(1 + x) = x − x²/2 + x³/3 − x⁴/4 + ···

valid for x ∈ (−1, 1]. Substituting x = 1 shows that series (b) converges to ln 2 ≈ 0.6931.
We will discuss why this holds on the interval (−1, 1] in §2.4.
• Comparison test. Finally, we can understand why series (d) diverges by observing that,
because n > ln n for all n ∈ N, we have

1/(1 + ln n) > 1/(1 + n).

Thus, series (d) is greater than Σ 1/(1 + n), which is divergent (essentially a p-series with p = 1).
This technique of deducing whether a series converges or diverges by considering its magnitude
relative to another series (usually a p-series) can be formally expressed as follows.

Theorem 1.4 (Comparison Test) Let x_n and y_n be real sequences such that (eventually)
0 ≤ x_n ≤ y_n. Then a) Σ x_n converges if Σ y_n converges, b) Σ y_n diverges if Σ x_n
diverges.
1.5 The Harmonic Series
The Harmonic Series is the infinite series

Σ_{n=1}^{∞} 1/n = 1 + 1/2 + 1/3 + 1/4 + ···

Show that the partial sums of this series grow logarithmically (i.e. increase at the
same rate as the log function).

We wish to calculate the partial sums of the Harmonic Series, where each partial sum of
N terms is given by

Σ_{n=1}^{N} 1/n.
As in the previous section, to calculate the partial sum of N + 1 terms in Python, we simply
add one extra term to the partial sum of N terms.
Another thing to note is that, because the sum grows very slowly, we will get a more
informative graph if we plot the x-axis using log scale, whilst keeping the y-axis linear.
This is achieved using the command semilogx.
The question suggests that we might want to compare the partial sums with the (natural)
log curve. We will plot the two curves together on the same set of axes. In fact, if the
partial sums up to N terms grow like ln N, then it might even be interesting to also plot the
difference between the two.
The code harmonic.ipynb produces two graphs, one stacked on top of the other. The
top panel shows the growth of the harmonic series in comparison with the log. The difference
is shown in the bottom panel. The calculation itself is rather short, but, as with many
programs in this book, making the plots informative and visually pleasing takes a little more
work.
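A minimal sketch of the same construction (the full harmonic.ipynb listing, with labels and styling, is on the book's GitHub page):

import numpy as np
import matplotlib.pyplot as plt

N = 10**5
n = np.arange(1, N+1)
H = np.cumsum(1/n)                  # partial sums of the Harmonic Series

fig, (ax1, ax2) = plt.subplots(2, 1)
ax1.semilogx(n, H, n, np.log(n))    # top panel: harmonic series vs ln(n)
ax2.semilogx(n, H - np.log(n))      # bottom panel: their difference
plt.show()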
The resulting plots, shown in fig. 1.5, reveal a very interesting phenomenon: the upper
panel shows that the partial sums grow very slowly, just like the log, but offset by a constant.
When we plot the difference between the two curves, we see that the difference lies between
0.57 and 0.58.
These graphs lead us to conjecture that there is a constant γ such that

γ = lim_{N→∞} ( Σ_{n=1}^{N} 1/n − ln N ).   (1.3)
It is useful to express this as an approximation:

For large N,   Σ_{n=1}^{N} 1/n ≈ ln N + γ.   (1.4)
Discussion
• The Euler-Mascheroni constant. The constant γ is known as the Euler-Mascheroni
constant (not to be confused with Euler’s number e), where
γ = np.euler_gamma = 0.5772 . . .
consistent with our findings. The convergence can be proved by showing that the
difference is monotone decreasing (as seen in the lower panel of fig. 1.5) and bounded
below. Hence the limit exists by the monotone convergence theorem. A comprehensive
account of the history and mathematical significance of γ can be found in [121].
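A quick numerical check (illustrative):

import numpy as np
N = 10**6
n = np.arange(1, N+1)
print(sum(1/n) - np.log(N))   # 0.57721...
print(np.euler_gamma)         # 0.5772156649015329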
• The Harmonic Series is divergent. Another observation from the graphs is that the
Harmonic Series diverges to ∞, just like the log. In fact, we can deduce the divergence
by the following proof by contradiction. Suppose that the Harmonic Series converges to
S. Then, grouping the terms pairwise, we find

S = (1 + 1/2) + (1/3 + 1/4) + (1/5 + 1/6) + ···
  > (1/2 + 1/2) + (1/4 + 1/4) + (1/6 + 1/6) + ···
  = 1 + 1/2 + 1/3 + ···
  = S.

The conclusion S > S is absurd, so the Harmonic Series cannot converge.
See exercise 1 for an intriguing physical situation in which the Harmonic Series appears.
Fig. 1.5: Top: The partial sums of the Harmonic Series grow like ln n. Bottom: The
difference between the two curves approaches a constant γ ≈ 0.577.
1.6 The Fibonacci sequence
F_1 = 1,   F_2 = 1,   F_n = F_{n−1} + F_{n−2}   (n ≥ 3).
The Italian mathematician Fibonacci (1170–1250) mentioned the sequence in his Liber
Abaci (‘book of calculations’) published in 1202, although the sequence was already known
to ancient Indian mathematicians as early as around 200BC. The sequence is ubiquitous
in nature and has surprising connections to art and architecture. See [169] for a readable
account of the Fibonacci sequence.
A quick calculation of a handful of terms shows that the sequence grows rapidly:

(F_n) = 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...

In what follows, R_n = F_n/F_{n−1} is the ratio of consecutive terms. The growth of F_n and R_n is shown in fig. 1.6. The
figure is produced by the code fibonacci.ipynb.
Let's consider the growth of F_n. The semilog plot in fig. 1.6 is, to a good approximation,
a straight line. This suggests that, for large n at least, we have an equation of the form

ln F_n ≈ αn + β   =⇒   F_n ≈ Aφ^n,

where α and β are the gradient and y-intercept of the linear graph, and the constants A = e^β
and φ = e^α. We are only interested in the growth for large n so let's focus on the constant φ
for now.
The gradient α of the line can be calculated using two consecutive points at n and n − 1.
Joining these points gives a line with gradient

α = (ln F_n − ln F_{n−1})/1 = ln(F_n/F_{n−1}) = ln R_n   =⇒   φ = e^α = R_n.
Thus we conclude that F_n grows like φ^n for large n, where φ = lim_{n→∞} R_n, which, according
to the other plot in fig. 1.6, appears to be just below 1.62. In fact, Python tells us that
R_25 ≈ 1.618034.
Finally, we estimate the y-intercept from the expression

β = ln F_n − n ln R_n.
Fig. 1.6: Left: The Fibonacci sequence F_n plotted against n. The vertical scale is logarithmic.
Right: The ratio R_n between consecutive Fibonacci numbers appears to approach a constant
φ ≈ 1.618.
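The fragment of fibonacci.ipynb below relies on the array F and the indices Nmin, Nend and Nplt; here is a minimal sketch of that setup (assumed, not the book's exact listing):

import numpy as np
import matplotlib.pyplot as plt

Nmin, Nend = 1, 25
F = np.zeros(Nend + 1)
F[1] = F[2] = 1
for i in range(3, Nend + 1):
    F[i] = F[i-1] + F[i-2]        # the Fibonacci recursion
Nplt = np.arange(Nmin, Nend + 1)
fig, ax1 = plt.subplots()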
# Use vertical log scale to see the growth of F_n
ax1.semilogy(Nplt, F[Nmin:Nend+1], 'bo-', ms=3)
ax1.set_xlabel(r'$n$')
ax1.set_ylabel(r'$F_n$')
ax1.set_xlim(Nmin, Nend)
# Manual adjustment of tick frequency
ax1.set_xticks(range(Nmin, Nend+1, 5))
ax1.grid('on')
plt.show()
Discussion
• The Golden Ratio. φ in fact corresponds to the Golden Ratio

φ = (1 + √5)/2.
In fact, our estimate R_25 is remarkably accurate to 9 decimal places. The connection
between R_n and φ can be seen by observing that, from the recurrence relation, we find

F_n/F_{n−1} = 1 + F_{n−2}/F_{n−1}   =⇒   R_n = 1 + 1/R_{n−1}.   (1.5)

Thus if lim_{n→∞} R_n exists and equals φ, then it satisfies the equation

φ = 1 + 1/φ,

which defines the Golden Ratio.
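Iterating (1.5) from any positive starting value homes in on φ quickly (an illustrative sketch):

R = 1.0
for _ in range(25):
    R = 1 + 1/R        # the recurrence (1.5)
print(R)               # 1.618033988..., cf. (1 + 5**0.5)/2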
• Binet's formula. Python allowed us to discover part of a closed-form expression for
F_n called Binet's formula

F_n = (φ^n − (1 − φ)^n)/√5.

We will derive this formula when we study matrix diagonalisation in §5.7. For large
n, we see that F_n ≈ φ^n/√5. More precisely, for any positive integer n, the Fibonacci
number F_n is the integer closest to the number φ^n/√5. This follows from the fact that
the second term in Binet's formula is small. To see this, let r = 1 − φ and note that
|r|^n = (0.618...)^n < 1 for all n. Therefore,

(1/√5)·|1 − φ|^n < 1/√5 < 1/√4 = 1/2.
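We can verify Binet's formula numerically (illustrative):

import numpy as np
phi = (1 + np.sqrt(5))/2
n = np.arange(1, 11)
F = (phi**n - (1 - phi)**n)/np.sqrt(5)
print(np.round(F))     # 1, 1, 2, 3, 5, 8, 13, 21, 34, 55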
• Convergence of R_n. In fact, we can show directly that (R_n) converges. Using (1.5)
and the fact that R_n ≥ 3/2 for all n ≥ 3, we find, for n ≥ 4,

|R_{n+1} − R_n| = |1/R_n − 1/R_{n−1}|
              = |R_n − R_{n−1}| / |R_n R_{n−1}|
              ≤ (2/3)·(2/3)·|R_n − R_{n−1}| = (4/9)·|R_n − R_{n−1}|.

Successive differences therefore shrink geometrically, which guarantees that lim_{n→∞} R_n exists.
1.7 The ε, δ definition of continuity

Let A ⊆ R and f : A → R. The function f is said to be continuous at x_0 ∈ A if:

For any ε > 0, there exists δ > 0 such that for all x ∈ A, |x − x_0| < δ =⇒ |f(x) − f(x_0)| < ε.
Students are taught in school that a continuous function is one for which the graph can be
drawn without lifting the pen. However, at university, the focus of mathematics is shifted
away from drawings, descriptions and intuition to mathematics that is based purely on logic.
In this spirit, we want to be able to define continuity logically (i.e. using symbols, equations,
inequalities. . . ) without relying on drawings.
The ε-δ definition given above was first published in 1817 by the Bohemian mathematician
and philosopher Bernard Bolzano (1781–1848). It expresses precisely what it means for a
function to be continuous at a point purely in terms of inequalities. This is one of the most
important definitions in real analysis, and is also one which many beginning undergraduates
struggle to understand. Let’s first try to unpack what the definition says.
If f is continuous at x_0, it makes sense to demand that we can always find some y
values arbitrarily close to (or equal to) y_0 = f(x_0). Symbolically, the set of "all y values
arbitrarily close to y_0" can be expressed as follows.

Definition Let ε > 0 and y_0 ∈ R. The ε-neighbourhood of y_0 is the set of all y ∈ R such
that |y − y_0| < ε.

Now, we want the y values inside the ε-neighbourhood of y_0 to "come from" some values
of x. In other words, there should be some x values such that y = f(x). It is natural to
demand that those values of x should also be in some neighbourhood of x_0. We are satisfied
as long as "there exists such a neighbourhood" in the domain A. This statement can be
written symbolically as:

∃δ > 0 such that, for all x ∈ A with |x − x_0| < δ, f(x) lies in the ε-neighbourhood of y_0.
Now let's see how Python can help us visualise this ε-δ game for the function a)
f(x) = 2/x at x_0 = 1. The code continuity.ipynb produces a GUI (graphical user
interface) as shown in fig. 1.7. Warning: this code only works in a Jupyter environment¹.
continuity.ipynb (for plotting the top panel of fig. 1.7). Jupyter only.

import matplotlib.pyplot as plt
import numpy as np
# ipywidgets make the plot interactive
from ipywidgets import interactive

# f and its inverse (assumed reconstruction: the text studies f(x) = 2/x)
f = lambda x: 2/x
finverse = lambda y: 2/y

# Domain
x = np.linspace(0.5, 2)
# We will study the continuity of f at x0
x0 = 1
y = f(x)
y0 = f(x0)

# Now define a function of eps to feed into the slider
def plot(eps):
    y0p = y0 + eps          # f(x0) + eps
    y0m = y0 - eps          # f(x0) - eps
    x0p = finverse(y0p)     # x values of the above two points
    x0m = finverse(y0m)     # (we use them to calculate delta)
    plt.plot(x, y)          # (drawing of the dotted lines is elided here)
    plt.show()

# Finally, set slider for eps in [0.01, 0.4] in steps of 0.01
interactive(plot, eps=(0.01, 0.4, 0.01))
In the code, we use the ipywidgets² library to create a slider for the value of ε. Drag
the blue slider to change the separations of the horizontal dotted lines which correspond to
the ε-neighbourhood of f(x_0).
¹ If you are running a non-Jupyter IDE, Matplotlib's slider (to be discussed shortly) can be used to
produce fig. 1.7. The code is given on GitHub as continuity.py.
² https://ipywidgets.readthedocs.io
Fig. 1.7: Interactive plots illustrating the ε-δ definition of continuity. Here we investigate
the continuity at x_0 = 1 of functions defined by f(x) = 2/x (top), and g(x) = ln(x + 1)
(bottom). In each graph, you can use the slider to adjust the value of ε, and a viable value of
δ is displayed.
Given this ε, the largest δ-neighbourhood of x_0 can be found by taking the smallest
separation between the vertical dotted lines. In this example, we can write

δ_max = min{ |f⁻¹(y_0 + ε) − x_0|, |f⁻¹(y_0 − ε) − x_0| }.

This agrees with the value of δ displayed by the code at the bottom of the GUI.
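For f(x) = 2/x at x_0 = 1, this can be checked directly (a sketch; here f⁻¹(y) = 2/y):

finv = lambda y: 2/y
x0, y0, eps = 1, 2, 0.4
delta = min(abs(finv(y0 + eps) - x0), abs(finv(y0 - eps) - x0))
print(delta)    # 0.1666..., i.e. eps/(2 + eps)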
We can easily modify the code to illustrate the continuity of g(x) = ln(x + 1) at x_0 = 1,
as shown in the lower panel of fig. 1.7. We leave it as an exercise for you to show that the
exact expression for δ shown is given by

δ = 2(1 − e^(−ε)).
Fig. 1.8: An interactive GUI showing the graph of h(x) = (sin x)/x around x = 0 (with
h(0) := 1). The dotted lines show y = 1 and 1 ± ε, where the value of ε can be adjusted
using the slider. The coordinates of the cursor are given at the bottom. The readout shows
that for ε = 0.1, we need δ ≈ 0.79.
The GUI in fig. 1.8 uses Matplotlib's own Slider widget instead of ipywidgets. The key lines, lightly reconstructed here so that they run on their own, are:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider

# Assumed setup for this fragment: h(x) = sin(x)/x and an initial epsilon
f = lambda x: np.sin(x)/x
xarray = np.linspace(-3, 3, 200)
eps = 0.1

fig, ax = plt.subplots()
plt.subplots_adjust(bottom=0.2)            # leave a space at the bottom for a slider
plt.ylim(0.5, 1.15)
plt.plot(xarray, f(xarray), lw=2)          # plot the function with a thick line
axeps = plt.axes([0.15, 0.1, 0.7, 0.02])   # the slider's dimensions and location
eps_slide = Slider(axeps, '\u03B5',        # create a slider: range of values,
                   0, 0.15, valstep=0.001, # step size and...
                   valinit=eps)            # the initial value of epsilon
plt.show()
Discussion
• Proof of continuity. The graphs shown in this section do not prove continuity. They
only allow us to visualise the ε-δ game. Writing a rigorous ε-δ proof is an important
topic that will keep you very much occupied in your analysis course at university.
To give a flavour of what is involved, here is a rigorous proof that f(x) = 2/x is
continuous at x = 1.

Proof: For all ε > 0, take δ = min{1/2, ε/4}, so that ∀x ∈ R \ {0},

|x − 1| < δ   =⇒   |f(x) − f(1)| = |2/x − 2| = 2|x − 1|/|x|.   (*)

Since |x − 1| < 1/2, the reverse triangle inequality gives

−1/2 < |x| − 1 < 1/2   =⇒   2/3 < 1/|x| < 2.

Substituting this into (*), we find

|f(x) − f(1)| < 2·2·|x − 1| < 4δ ≤ ε.   ∎
• Continuous limits. A closely related concept is the limit of a function. We write

lim_{x→c} f(x) = L

if: for any ε > 0, there exists δ > 0 such that for all x ∈ A, 0 < |x − c| < δ =⇒ |f(x) − L| < ε.
The only difference between this definition and that of continuity is that for limits, there
is no mention of what happens at x = c, but only what happens close to c.
Using this definition, it can be shown that the following important theorem holds.
Theorem 1.6 Let A ⊆ R and define f : A → R. f is continuous at c ∈ A if and only if
lim_{x→c} f(x) = f(c).
This rigorous definition of the continuous limit lays a strong foundation for differentiation
and integration, both of which can be expressed as continuous limits, as we will see
later.
• The sinc function. In the language of limits, our plot of h(x) shows that

lim_{x→0} (sin x)/x = 1.   (1.6)
The proof of this limit based on the Squeeze Theorem (for continuous limits) can be
found in [20]. The function h(x) is sometimes written sinc(x). It has many real-world
applications, particularly in signal processing.
• Why comma? In the code, you may be wondering why we used a comma on the LHS
of the assignment

hP, = plt.plot(xarray, harrayP, 'r:')

rather than

hP = plt.plot(xarray, harrayP, 'r:')
Indeed, had we removed the comma, Python would report errors when the slider is
moved. So what is going on here? Well, let's ask Python what type the object hP is.
With the comma, the command type(hP) tells us that hP is a Line2D object. This
object has many properties including the x and y coordinates of the lines (called xdata
and ydata) and optional colours and line thickness attributes (type dir(hP) to see the
full list of attributes). Indeed, when the slider is moved, the update function updates
the y coordinates of the dashed line.
Without the comma, however, we find that hP is a list. Furthermore, len(hP)=1,
meaning that the object plt.plot(...) is in fact a list with one element (namely, the
Line2D object). When we move the slider, the y coordinates (ydata) are not a property
of this list, but rather of the object within the list. This explains why Python reports an
error.
To put this another way, we want to change the filling of the sandwich within the box,
rather than put new filling onto the box itself.
In summary, the comma tells Python to unpack the list (i.e. take out the sandwich), so
we can update its content.
1.8 Thomae's function
Thomae's function is defined by

f(x) = 1/q   if x ∈ Q and x = p/q in lowest form, where p, q ∈ N,
f(x) = 0     otherwise.

Plot this function. For how many values x ∈ (0, 1) is f(x) > 1/10? or 1/100?
Deduce that Thomae's function is continuous at irrational x and discontinuous at
rational x.
This function, named after the German mathematician Carl Johannes Thomae (1840–
1921), serves as a classic illustration of how university mathematics differs from school
mathematics. Few school students would have seen functions defined in such an exotic way.
Let's try to make sense of the function, for example, by first looking for values of x that
would be mapped to the value 1/8. A little experiment reveals that there are 4 such values:

1/8, 3/8, 5/8, 7/8.

Note that f(2/8) = f(6/8) = 1/4 and f(4/8) = 1/2.
It is clear that if f(x) = 1/8 then x must be a rational number of the form p/8 where
p and 8 have no common factors apart from 1. Another way to say this is that p has to be
coprime to 8. Yet another way to say this is that the greatest common divisor (gcd) of p and
8 is 1, i.e.

gcd(p, 8) = 1.

(More about the gcd when we discuss number theory in chapter 6.)
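NumPy's gcd makes the coprimality test a one-liner (illustrative):

import numpy as np
p = np.arange(1, 8)
print(np.gcd(p, 8))    # [1 2 1 4 1 2 1]; f(p/8) = 1/8 exactly where the gcd is 1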
Fig. 1.9: Thomae’s function
The graph of Thomae’s function is shown in fig. 1.9. To plot this graph, we can
conveniently use NumPy’s gcd function as shown in the code thomae.ipynb. The code
also counts how many values of x satisfy f (x) > 1/10.
thomae.ipynb (for plotting fig. 1.9)

import numpy as np
import matplotlib.pyplot as plt

# Assumed reconstruction of the loop described in the text:
xlist, ylist = [], []
for q in range(2, 100):
    for p in range(1, q):
        if np.gcd(p, q) == 1:   # p/q is in lowest form
            xlist.append(p/q)
            ylist.append(1/q)

# Plot the points with big red dots
plt.plot(xlist, ylist, 'or', ms=3)
plt.xlabel('x')
plt.ylabel('y')
plt.xlim(0,1)
plt.grid('on')
plt.show()
In the code, we use for loops to append values to empty lists. This creates two growing
lists of the x and y coordinates. Instead of lists, one could also use NumPy arrays (using
np.empty and np.append).
Running the code tells us that there are 27 values. As an exercise, try to write them all
down. As for the case f (x) > 1/100, the code gives us 3003 values.
What do these results mean? Well, they imply that given any number ε > 0, there are a
finite number of x such that f(x) > ε. This means that at any x_0 ∈ (0, 1), we can find a
neighbourhood of x_0 sufficiently small that it does not contain any values of x such that
f(x) > ε. In other words, |f(x)| < ε for all x sufficiently close to x_0.
Now let x_0 be any irrational number in (0, 1). Since f(x_0) = 0, the previous paragraph
gives the following result:

For any ε > 0, there exists δ > 0 such that, for all x with |x − x_0| < δ, |f(x) − f(x_0)| < ε.

This is precisely the ε-δ definition for the continuity of Thomae's function at any
irrational x_0 ∈ (0, 1).
Discussion
• The Archimedean property. In deducing the fact that there are a finite number of x
such that f (x) > ε, we implicitly used the following property of real numbers.
Theorem 1.7 (The Archimedean property) For any ε > 0, there exists an integer
n ∈ N such that ε > 1/n.
(Can you see precisely where this has been used?) This property sounds like an obvious
statement, but, like many results in introductory analysis, it has to be proven. It turns out
that the Archimedean property follows from other axioms, or properties of real numbers
which are satisfied by decree. These axioms consist of the usual rules of addition and
multiplication, plus some axioms on inequalities which allow us to compare sizes of
real numbers. In addition, we also need:

The Completeness Axiom. Every nonempty set of real numbers which is bounded above
has a least upper bound.

For example, the least upper bound of real numbers x ∈ (0, 1) is 1. The axiomatic
approach to constructing R is a hugely important topic in introductory real analysis
which you will encounter at university.
• Sequential criterion. The ε-δ definition for continuity is equivalent to the following.
Theorem 1.8 (Sequential criterion for continuity) The function f : A → R is continuous
at x_0 ∈ A if and only if, for every sequence x_n ∈ A converging to x_0, we have
f(x_n) → f(x_0).
This theorem can be used to prove that Thomae's function is discontinuous at any
rational number x_0 ∈ (0, 1). Consider the sequence

x_n = x_0 − √2/n,

which must necessarily be a sequence of irrational numbers (here we use the well-known
result that √2 is irrational). The sequence clearly converges to x_0 ∈ Q. Thus, we have
found a sequence x_n → x_0 such that f(x_n) = 0, which does not converge to f(x_0).
This proves that f cannot be continuous at x_0 ∈ Q ∩ (0, 1).
1.9 The Intermediate Value Theorem and root finding

Theorem 1.9 (Intermediate Value Theorem) Let f : [a, b] → R be continuous. Then for any
value y_0 between f(a) and f(b), there exists c ∈ (a, b) such that f(c) = y_0.
In other words, a continuous function f on [a, b] takes all possible values between f (a)
and f (b).
Now consider the function f(x) = e^(−x) − x, which comprises the exponential function
and a linear function. Both functions (and their difference) can be shown to be continuous
at every x ∈ R using the ε-δ definition. In particular, we note that f is continuous on the
interval [0, 1] and that f takes opposite signs at the end points of this interval, i.e.

f(0) = 1 > 0   and   f(1) = e^(−1) − 1 < 0.
Therefore, the IVT tells us that ∃c ∈ (0, 1) such that f (c) = 0. The graph below illustrates
that the root of f (x) = 0 indeed lies in this interval.
Fig. 1.10: The graph y = e^(−x) − x intersects the x-axis somewhere in the interval (0, 1). This
is consistent with the Intermediate Value Theorem.
One way to proceed is to consider the sign of f(x) at the midpoint x = 0.5. We find that

f(0.5) = 1/√e − 0.5 > 0.

Since f(x) has opposite signs at the endpoints of the 'bisected' interval (0.5, 1), the IVT
again implies that f has a root in (0.5, 1).
We can carry on bisecting this interval, studying the sign of f(x) at the midpoint, and
repeating until we trap a root c in an interval that is as small as the desired accuracy.
The code below illustrates how this process (called the bisection method of root finding)
can be iterated as many times as required to achieve a root with accuracy acc. The core of
the code is a while loop which repeats the iteration until the size of the bisected interval
shrinks below acc.
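Here is a minimal sketch of such a bisection loop (the variable name acc follows the text; other names are illustrative, and details such as the exact stopping rule may differ from the book's listing):

import numpy as np

f = lambda x: np.exp(-x) - x
a, b = 0, 1                 # bracketing interval: f(a), f(b) have opposite signs
acc = 0.5e-5                # desired accuracy

if f(a)*f(b) > 0:
    raise ValueError('f(a) and f(b) must have opposite signs')

i = 0
while (b - a)/2 >= acc:     # the midpoint is within acc of the root on exit
    i += 1
    c = (a + b)/2           # midpoint of the current interval
    if f(a)*f(c) > 0:       # root lies in (c, b)
        a = c
    else:                   # root lies in (a, c)
        b = c
print(f'Final iteration number {i}, x= {(a + b)/2}')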
Running the code above with acc = 0.5 × 10⁻⁵ (to ensure 5 dec. pl. accuracy) shows that

Final iteration number 17, x= 0.5671424865722656

Thus, we can report that the root of the equation e^(−x) − x = 0 is approximately 0.56714
(5 dec. pl.).
Discussion
• Newton-Raphson method. The bisection method is a relatively slow but reliable
method of root-finding for most practical applications. A faster root-finding method,
the Newton-Raphson method, is discussed in exercise 11. Although the
Newton-Raphson method is faster, it requires additional information, namely, the
expression for the derivative f′(x), which is not always available in real applications.
• Bracketing. For the bisection algorithm to start, we first need to bracket the root, i.e.
find an interval [a, b] in which f(a) and f(b) have opposite signs. However, this may
not be possible, for instance, with f(x) = x². In this case another root-finding method
must be used. See [131, 132] for other root-finding algorithms in Python.
1.9 The Intermediate Value Theorem and root finding 35
• Throwing. The Python command raise is useful for flagging a code if an error has
occurred. This practice is also known as throwing an error. The simplest usage is:

if (some conditions are satisfied):
    raise ValueError('Your error message')

It is good practice to be specific in your error message about what exactly has gone
wrong.
• Numerical analysis is the study of the accuracy, convergence and efficiency of numerical
algorithms. This field of study is essential in understanding the limitation of computers
for solving mathematical problems. We will explore some aspects of numerical analysis
in this book, particularly in the next chapter. For further reading on numerical analysis,
see, for example, [40, 170, 182].
1.10 Differentiation
For each of the following functions, plot its graph and its derivative on the interval
[−1, 1].

a) f(x) = sin πx,   b) g(x) = √|x|,   c) H(x) = x² sin(1/x²) for x ≠ 0, with H(0) = 0.

Which functions are differentiable at x = 0?
Let f : (a, b) → R and let c ∈ (a, b). The derivative of f at x = c, denoted f′(c), is
defined as:

f′(c) = lim_{h→0} [f(c + h) − f(c)]/h.   (1.7)

A function is said to be differentiable at x = c if the limit above is finite.
In school mathematics, the derivative is often defined as the rate of change of a function,
or the gradient of the tangent to y = f (x) at x = c. However, in university analysis,
pictorial definitions are not only unnecessary, but must also be avoided in favour of rigorous
logic-based definitions. The limit (1.7) has a precise definition in terms of ε-δ (see §1.7).
First let's consider the derivative of f(x) = sin πx. You will probably have learnt how
to differentiate this type of function at school. But let's see how we can also work this out
from first principles using the definition above. Recall the trigonometric identity:

sin α − sin β = 2 cos((α + β)/2) sin((α − β)/2).

Applying it to the definition (1.7), we find

f′(c) = lim_{h→0} [sin π(c + h) − sin πc]/h = lim_{h→0} cos(πc + πh/2) · lim_{h→0} sin(πh/2)/(h/2),   (1.8)
where we have broken up the limit into a product of two limits. The first limit in (1.8) is
simply cos πc (technical note: this step requires the continuity of cos). The second limit can
be evaluated using the result from Eq. 1.6 in §1.7, giving us π. Therefore, we have

f′(c) = π cos πc,

as you might have expected. Note that f′ is defined at all values of c ∈ R. In other words, f
is differentiable on R. In particular, it is certainly differentiable at x = 0 with f′(0) = π.
The next function g can be written in piecewise form (using the definition of the modulus)
as:

g(x) = √x for x ≥ 0,   and   g(x) = √(−x) for x < 0.

Differentiating each piece separately (for x ≠ 0) gives

g′(x) = 1/(2√x) for x > 0,   and   g′(x) = −1/(2√(−x)) for x < 0.
Before we consider H (x), let’s pause to consider how derivatives can be calculated on
the computer. Of course, one could simply differentiate the function by hand, then simply
code the result. However, in real applications, we may have limited information on the
function to be differentiated. Sometimes the expression for the function itself cannot be
easily written down. This means that it is often impractical to rely on the explicit expression
for the derivative.
Instead, we can work with a numerical approximation of the limit definition (1.7). For
example, we could say:

f′(x) ≈ [f(x + h) − f(x)]/h,   (1.9)

for a small value of h. This is called the forward-difference or forward-Euler estimate of the
derivative. The word 'forward' comes from the fact that the gradient at x is approximated as
the slope of the line joining the points on the curve at x and x + h, a little "forward" from x.
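In NumPy, (1.9) is a one-line, vectorised computation (a small sketch using f(x) = sin πx):

import numpy as np
h = 1e-6
f = lambda x: np.sin(np.pi*x)
x = np.linspace(-1, 1, 500)
fprime = (f(x + h) - f(x))/h   # forward-difference estimate of f'(x)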
Figure 1.11 shows the graphs of f, g (dashed blue lines) and their approximate derivatives
(solid orange lines) calculated using the forward-difference approximation (1.9) with
h = 10⁻⁶. Note in particular that the lower panel shows that the derivative of g blows up
near x = 0, where y = g(x) has a sharp point (similar to the graph of y = |x|). The
graph indeed confirms that g is not differentiable at x = 0.
Fig. 1.11: The graphs of functions f(x) = sin πx (top) and g(x) = √|x| (bottom) – in dashed
blue lines – and their derivatives (solid orange lines) calculated using the forward-difference
approximation (1.9) with h = 10⁻⁶.
The code differentiation.ipynb plots the graphs of g and g′. Note that we work on the positive and negative x values separately because, otherwise, Matplotlib would join up points around x = 0, creating a false visual impression of the values of g′(x) near 0.
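differentiation.ipynb is not reproduced here; the following minimal sketch (with our own variable names) shows the idea of treating the two halves of the domain separately.

import numpy as np
import matplotlib.pyplot as plt

g = lambda x: np.sqrt(np.abs(x))
h = 1e-6
xneg = np.linspace(-1, -1e-3, 500)   # negative x, kept away from 0
xpos = np.linspace(1e-3, 1, 500)     # positive x, kept away from 0
for x in (xneg, xpos):               # plot each half separately so Matplotlib
    plt.plot(x, g(x), 'b--')         # does not join points across x = 0
    plt.plot(x, (g(x + h) - g(x))/h, 'C1')
plt.show()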
Now let’s return to H (x). With simple modifications of the code, we can plot H and H 0
as shown in fig. 1.12. It appears that the derivative fluctuates wildly around x = 0. One
might even be tempted to conclude from the graph that H is not differentiable at x = 0.
Fig. 1.12: The graph of the function H(x) (dashed blue lines) and its derivative (solid orange lines) calculated using the approximation (1.9) with h = 10⁻⁶.
Let's check this claim using the limit definition (1.7):

H'(0) = \lim_{h \to 0} \frac{H(h) - H(0)}{h} = \lim_{h \to 0} h \sin(1/h^2).

Note that we replaced H(h) by the expression for h ≠ 0. This follows from the fact that the limit as h → 0 is defined without requiring us to know what happens at h = 0. Observe then that for any h ≠ 0 we have |h sin(1/h²)| ≤ |h|, so the Squeeze Theorem gives H′(0) = 0. In other words, H is differentiable at x = 0 after all: the wild fluctuations in fig. 1.12 are a numerical artefact, not a property of H.
Discussion
• How small should h be? In the forward-difference approximation (1.9), it appears as though the smaller h is, the more accurate the estimate for the derivative becomes. Surprisingly, this is not the case! You could try this yourself by changing h in the code from 10⁻⁶ to, say, 10⁻²⁰. What do you observe?
In fact, there is an optimal value of h which gives the most accurate answer for the derivative. Larger or smaller values of h give answers that are less accurate. We will explore this in chapter 2, in which we will also see that there are many other derivative approximations that are more accurate than the forward-difference formula (but they take more resources to compute).
• Differentiable means continuous. The following useful theorem establishes the link
between continuity and differentiability.
Theorem 1.10 If f : (a, b) → R is differentiable at a point c ∈ (a, b), then f is
continuous at c.
• Can computers really do maths? It is worth reminding ourselves that whilst computers can help us understand mathematics more deeply, they cannot think mathematically. It is our job to check and interpret the results that Python gives us. Often the answers we get are not what we expect (e.g. when the step size h is too small). Sometimes the answers are just plain wrong (e.g. H′(0)). So one should never treat a computer as an all-knowing black box that always gives the correct answer.
1.11 The Mean Value Theorem
Show that the function f (x) = (x − 1) sin x has a turning point in the interval (0,1).
Find the x coordinate of the turning point.
Method 1
It is natural for students to associate the phrase ‘turning point’ with where f′(x) = 0. This means that we have to solve the equation

f'(x) = \sin x + (x - 1)\cos x = 0.    (1.10)

Suppose cos x = 0; then Eq. 1.10 gives sin x = 0, but this is impossible because sin x and cos x cannot be zero at the same time, so we conclude that cos x ≠ 0. We can then safely divide Eq. 1.10 by cos x, giving
tan x = 1 − x. (1.11)
A quick sketch reveals that there are infinitely many solutions, but only one in (0, 1), as
shown in fig. 1.13.
Fig. 1.13: The curves y = tan x and y = 1 − x intersect infinitely many times on R, but only
once on (0, 1).
One way to locate this root is to do a bisection search for the solution of (1.11) in (0, 1) using the code in §1.9. Whilst this method will yield the x-coordinate of the turning point, it relies on the explicit expression for the derivative f′(x) that we just found. However, as discussed earlier, such an expression may not always be available.
Method 2
Let’s take another look at the problem. This time, suppose we don’t know how to
differentiate the function f (x) = (x − 1) sin x, nor do we know about its graph. Can we still
deduce that there is a turning point in the interval (0, 1)? Yes, according to the following
theorem:
Theorem 1.11 (Mean Value Theorem) If f : [a, b] → R is continuous on [a, b] and differentiable on (a, b), then there exists c ∈ (a, b) such that

f'(c) = \frac{f(b) - f(a)}{b - a}.
With (a, b) = (0, 1), we find that f(0) = f(1) = 0, so by the Mean Value Theorem, ∃c ∈ (0, 1) such that f′(c) = 0, i.e. there exists a turning point in (0, 1).
The Mean Value Theorem (MVT) is a very powerful theorem in analysis. The word
‘mean’ refers to the fact that at c, the gradient is simply the average trend on the interval
(a, b), i.e. the slope of the straight line joining the two endpoints of the curve.
To find the location of the turning point without manual differentiation, we can employ the forward-difference estimate (1.9)

f'_{\text{est}}(x) = \frac{f(x+h) - f(x)}{h},    (1.12)

for some small h (say 10⁻⁶). It is then just a matter of finding where f′_est(x) = 0 numerically.
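As an illustration, here is a minimal sketch of this approach (our construction, using SciPy's brentq root-finder rather than the bisection code of §1.9):

import numpy as np
from scipy.optimize import brentq

f = lambda x: (x - 1)*np.sin(x)
h = 1e-6
fprime_est = lambda x: (f(x + h) - f(x))/h   # forward-difference estimate (1.12)
print(brentq(fprime_est, 0, 1))              # root in (0, 1): approximately 0.48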
Fig. 1.14: The curve y = (x − 1) sin x (left) and its exact derivative (right) around the
turning point at x ≈ 0.48.
Discussion
• Rolle’s Theorem. It is a useful exercise to show that the MVT is a consequence of the
following theorem.
Theorem 1.12 (Rolle’s Theorem) If f : [a, b] → R is continuous on [a, b] and differen-
tiable on (a, b), with f (a) = f (b), then there exists c ∈ (a, b) such that f 0 (c) = 0.
As a consistency check, putting f (a) = f (b) in the MVT gives Rolle’s Theorem, so we
see that the MVT is a more general result.
Michel Rolle (1652–1719) was a French self-taught mathematician. Apart from Rolle's Theorem, he is also credited with introducing the notation \sqrt[n]{x} for the nth root of x.
• Cauchy’s Mean Value Theorem. A generalised version of the MVT is the following.
Theorem 1.13 (Cauchy’s Mean Value Theorem) Let f and g be functions that are
continuous on [a, b] and differentiable on (a, b), Suppose that g 0 (x) , 0 for all
x ∈ (a, b). Then there exists c ∈ (a, b) such that
f 0 (c) f (b) − f (a)
0
= .
g (c) g(b) − g(a)
As a consistency check, putting g(x) = x gives the MVT, so we see that Cauchy’s MVT
is a more general result.
1.12 A counterexample in analysis

Consider the function f : R → R defined by

f(x) = \begin{cases} x + 2x^2 \sin \dfrac{1}{x} & x \neq 0, \\ 0 & x = 0. \end{cases}    (1.13)

We will show that f′(0) > 0, yet there exists no neighbourhood of 0 in which f is strictly increasing.
Firstly, to prove that f′(0) > 0, we use the limit definition (1.7):

f'(0) = \lim_{h \to 0} \frac{f(0+h) - f(0)}{h} = 1 + \lim_{h \to 0} 2h \sin\frac{1}{h} = 1,
where the last step follows from the Squeeze Theorem as before (try justifying this by
following the calculation at the end of §1.10).
We now show that f is not increasing around x = 0. It suffices to show that in any neighbourhood of 0, we can find a point with negative gradient. Symbolically, we want to show that ∀δ > 0, ∃c ∈ (−δ, δ) such that f′(c) < 0 (we will say more about this in the Discussion section).
By applying the usual differentiation techniques, we find that for x ≠ 0,
f'(x) = 4x \sin\frac{1}{x} - 2\cos\frac{1}{x} + 1.    (1.14)
For all δ > 0, there exists an integer n ∈ N such that 0 < 1/n < 2πδ. This follows from the Archimedean property discussed in §1.8. Note that the point x = 1/(2πn) is within the neighbourhood (−δ, δ). However,

f'\left(\frac{1}{2\pi n}\right) = -1.
Hence, we have successfully used the counterexample to disprove the given statement.
Let’s use Python to visualise the curve y = f (x). The code below produces fig. 1.15,
which plots the curve in two neighbourhoods of 0 (the neighbourhood on the right panel is
10 times smaller than the left). We see a sinusoidal behaviour in both plots. Try increasing
the zoom level by a small modification of the code, or by using %matplotlib and using
the zoom button on the GUI (see §1.7). In any case, you should see a sinusoidal behaviour
no matter how much you zoom in (of course you should increase the plotting resolution in
the code accordingly). The figure suggests that within any δ neighbourhood of 0, we can
find points at which f 0 (x) is positive, negative or zero!
One way to understand this result is to see that the graph of the function f gets increasingly wiggly towards the origin, whilst being constrained to bounce between the parabolas y = x + 2x² and y = x − 2x² (where sin(1/x) = ±1). These parabolic boundaries intersect at 0, hence forcing f′(0) = 1. Try adding these parabolas to the plots in fig. 1.15.
Fig. 1.15: The curve y = f(x) where f(x) = x + 2x² sin(1/x) (x ≠ 0) and f(0) = 0. The right panel is the same plot but 10× zoomed in. The sinusoidal behaviour is seen no matter how much we zoom in towards the origin.
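The original listing is not reproduced in this copy; the following minimal sketch (resolution and window sizes are our own choices) generates plots like fig. 1.15.

import numpy as np
import matplotlib.pyplot as plt

f = lambda x: x + 2*x**2*np.sin(1/x)    # safe here: x = 0 is never sampled

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, w in zip(axes, (0.45, 0.045)):  # right panel is 10x zoomed in
    x = np.concatenate([np.linspace(-w, -w/1e4, 5000),
                        np.linspace(w/1e4, w, 5000)])
    ax.plot(x, f(x), lw=0.8)
    ax.set_xlim([-w, w])
axes[0].set_title('y=f(x)')
axes[1].set_title('Zoomed')
plt.show()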
Discussion
• Derivative of a monotone function. There is a subtle connection between the sign of
the derivative and the monotonicity of f . The MVT can be used to show that, on an
interval I,
f′(x) ≥ 0 on I ⇐⇒ f is increasing on I,
f′(x) > 0 on I =⇒ f is strictly increasing on I.
The converse to the second statement does not hold. Can you think of a simple
counterexample? A dramatic one is given in exercise 14.
• A counterexample of historical importance. Perhaps the most notorious counterexam-
ple in analysis is a function which is continuous everywhere but is nowhere differentiable.
The discovery of such a function by Karl Weierstrass in 1872 sent a shockwave through
the mathematical world, leading to the reform and development of analysis into the
rigorous subject that we know today. We will meet this function in the next chapter.
• More counterexamples in analysis have been compiled by Gelbaum and Olmsted [72],
a highly recommended book full of surprising and enlightening counterexamples.
1.13 Exercises
a. Show that using n books, the overhang (in units of books) can be written as the Harmonic Series

\frac{1}{2}\left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}\right).
Deduce that using 4 books, the overhang exceeds the length of a book.
b. Plot the overhang (in units of books) against the number of books used to build the tower. Consider up to 1000 books. Suggestion: use log scale on the horizontal axis.
c. On the same plot, plot the result when eq. 1.4 (the logarithmic approximation of the Harmonic Series) is used to calculate the overhang.
d. Using the log approximation:
i. estimate the overhang when 106 books are used to create the tower. (Ans:
around 7-book long overhang.)
ii. estimate the number of books needed to build a leaning tower with a 10-book
long overhang. (Your answer is probably greater than the estimated number of
physical books that exist in the world.)
2 (Famous approximations for π) Below are three historically important approximations
for π.
• Madhava series (14th century), sometimes called the Gregory-Leibniz approxima-
tion (1671–1673)
\pi = 4\left(1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots\right)

• Wallis product (1656)

\pi = 2\left(\frac{2}{1} \cdot \frac{2}{3} \cdot \frac{4}{3} \cdot \frac{4}{5} \cdot \frac{6}{5} \cdot \frac{6}{7} \cdots\right)

• Viète's formula (1593)

\pi = 2 \cdot \frac{2}{\sqrt{2}} \cdot \frac{2}{\sqrt{2+\sqrt{2}}} \cdot \frac{2}{\sqrt{2+\sqrt{2+\sqrt{2}}}} \cdots
Let n denote the number of iterations in each approximation scheme. For example, the
zeroth iteration (n = 0) gives 4, 2 and 2 for the three approximations respectively.
a. On the same set of axes, plot the results of the three approximations against the
number of iterations up to n = 10.
b. On a set of logarithmic axes, plot the absolute fractional error

E_n = \left|\frac{x_n - \pi}{\pi}\right|,

where xₙ is the estimate after n iterations, for the three approximations up to n = 100. This gives us an idea of how fast the approximations converge to π.
You should find that the error for Viète's formula does not appear to fall below a minimum of around 10⁻¹⁶. The reason for this is the machine epsilon, which will be discussed in §2.2.
c. Recall that an estimate x of π is accurate to p decimal places if |x − π| < 0.5 × 10−p .
For each of the three approximations of π, calculate how many iterations are needed
to obtain π accurate to 5 decimal places.
(Answers: 200000, 157080 and 9.)
3 (Ramanujan’s formula for π) In 1910, Ramanujan gave the following formula for π.
√ X ∞ −1
2 2 (4n)!(1103 + 26390n)
π = .
9801 n=0 (n!) 4 3964n
(Famously, he simply ‘wrote down’ many such formulae.) Calculate the first 3 iterations
of this approximation. How many decimal places is each approximation accurate to?
Try writing a code that calculates the series up to n terms. Can your code accurately
evaluate the result, say, when n = 10? If not, explain why.
Suggestion: In Python, we can calculate the factorial, say 15!, using the following
syntax:
import math
math.factorial(15)
The factorials in Ramanujan’s formula give rise to huge numbers. Think about what
can go wrong.
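One possible sketch (our construction, not a model answer) keeps the sum in exact rational arithmetic with fractions.Fraction, so that only the final step involving √2 is done in floating point:

import math
from fractions import Fraction

def ramanujan_pi(N):
    # Partial sum of Ramanujan's series for 1/pi, in exact rational arithmetic
    s = sum(Fraction(math.factorial(4*n)*(1103 + 26390*n),
                     math.factorial(n)**4 * 396**(4*n)) for n in range(N + 1))
    return 9801/(2*math.sqrt(2)*float(s))   # float arithmetic caps accuracy at ~16 digits

for N in range(3):
    print(N, ramanujan_pi(N))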
4 (Reciprocal Fibonacci number) Use fibonacci.ipynb as a starting point.
The reciprocal Fibonacci constant is given by

\psi = \sum_{n=1}^{\infty} \frac{1}{F_n} = 3.35988566624317755\ldots
U0 = 0, U1 = 1, Un = PUn−1 − QUn−2 .
E_n = |x_n - x|.

The speed of convergence can be quantified by two positive constants: the rate (C) and the order (q) of convergence, defined via the equation

\lim_{n \to \infty} \frac{E_{n+1}}{(E_n)^q} = C > 0.
You should first try to experiment on a piece of grid paper to come up with a conjecture
on which points are visible from (0, 0).
a. Write a code that produces a grid and marks each point that is visible from (0,0)
with a red dot (you will need to impose a sensible cutoff).
b. For each visible point (m, n), plot its image under the mapping

F(m, n) = \left(\frac{m}{m+n}, \frac{1}{m+n}\right).
What do you see? Can you explain how this mapping works?
(Answer: You see Thomae’s function!)
9 (Root-finding) Use the bisection code to solve the following problems. Give your
answers to 4 decimal places.
a. Solve the equation x 3 − x 2 − x − 1 = 0.
Suggestion: Start by plotting.
b. Solve \frac{\sin x}{x} = 0.9 (hence verifying the intersection point seen in fig. 1.8).
c. Find the numerical value of \sqrt{3} using only the four basic operations +, −, ×, ÷.
Suggestion: Start by defining f(x) = x² − 3.
10 (Generalised Golden Ratio) Suppose we generalise the Golden Ratio to φ n defined as
the positive root of the following order-n polynomial
x^n - x^{n-1} - x^{n-2} - \cdots - 1 = 0.
f(x) = \begin{cases} \dfrac{\sin x}{x} & \text{if } x \neq 0, \\ 1 & \text{if } x = 0. \end{cases}
On the same set of axes, plot its first and second derivatives, estimated using the
forward Euler method (i.e. do not differentiate anything by hand in this question). Try
to use different types of lines in your plot (think about colour-blind readers). Here is an
example output.
Fig. 1.16: A pretty plot of the sinc function and its first and second derivatives.
13 (Mean Value Theorem) Consider the sinc function f defined in the previous question.
Let’s apply the Mean Value Theorem (theorem 1.11) to f on the interval [0, π].
In particular, let’s determine the value(s) of c in the statement of the theorem. In other
words, we wish to find c ∈ [0, π] such that
f'(c) = \frac{f(\pi) - f(0)}{\pi}.
a. Show (by hand) that one solution is c = π.
b. Use a root-finding algorithm to find another solution to 4 decimal places.
Your answer should be consistent with the dashed line in fig. 1.16 above.
f(x) = \begin{cases} x(2 - \cos \ln x - \sin \ln x) & \text{if } x \in (0, 1], \\ 0 & \text{if } x = 0. \end{cases}
It can be shown that f is strictly increasing, yet there are infinitely many points where
f 0 (x) = 0 in (0, 1].
a. Use Python to plot the graph of the function. Your graph should demonstrate the
self-similar structure when we zoom in towards the origin.
b. Show (by hand) that there are points of inflection at x = e−2nπ where n ∈ N.
Indicate them on your plot.
CHAPTER
TWO
Calculus
Fig. 2.1: (L-R) Sir Isaac Newton (1642–1726) and Gottfried Leibniz (1646–1716) formulated
calculus independently, although the question of who came to it first was a highly contentious
intellectual feud of the late 17th century. (Image source: [137].)
How can we use Python to help us understand calculus? Since Python is primarily a numerical tool, we will be interested in the numerical values of derivatives and integrals rather than their algebraic expressions. For example, we are not interested in the problems

If f(x) = e^{-x^2}, find f'(x)    OR    Find \int \sin x\,dx,

which are symbolic in nature, and so must be solved by specialised packages that have been taught the rules of calculus (e.g. the SymPy library). Instead, we are interested in problems that require numerical answers, such as evaluating a derivative at a given point, or a definite integral, to a prescribed accuracy.
If you have an older version of SciPy, update it using pip. See Appendix A.
In this chapter, we will need the SciPy module scipy.integrate, which contains
several integration routines such as the Trapezium and Simpson’s Rules. The go-to workhorse
for computing integrals in Python is the function quad (for quadrature, a traditional term for
numerical integration). The quad function itself is not a single method but a set of routines
which work together to achieve an accurate answer efficiently. The algorithm behind quad
was initially conceived in Fortran in a package called QUADPACK, and is described in
detail by [165].
Here’s the basic syntax for the quad function.
Integration with SciPy
import numpy as np
import matplotlib.pyplot as plt
For integration with SciPy import scipy.integrate as integrate
2
Define f (x) = e −x f = lambda x: np.exp(-x**2)
R∞ 2
0
e −x dx integral, error = integrate.quad(f, 0, np.inf)
(0.8862269254527579, 7.101318390472462e-09)
Note that quad returns a pair of numbers (called a tuple of length 2). The first number is the value of the definite integral, and the second is an estimate of the absolute error (which should be tiny for a reliable answer). In this case, the exact answer is √π/2, which agrees with SciPy's answer to 16 decimal places.
The mathematical details of how functions like SciPy’s integrate work are usually
hidden from users behind a ‘Wizard-of-Oz’ curtain that users rarely look behind. This is
contrary to the spirit of this book whose emphasis is on mathematics and not commands.
Therefore, in this chapter, we will only use SciPy’s integrate function occasionally, and
only to confirm what we can do by other, more transparent methods.
As for differentiation, SciPy had a derivative routine, but it is now obsolete. In the
next section, we will discuss how to differentiate a function numerically.
E(h) = |actual value of the derivative − numerical approximation using step size h|.
The forward, backward and symmetric-difference formulae all approach the gradient of
the tangent to the graph y = f (x) at point x as the step size h → 0, so it is reasonable to
expect that the smaller the h, the more accurate the expression will be. However, we show in
this section that on the computer, this is not the case. This may come as a surprise to many
beginners. It is very important to be aware that when coding derivatives, hugely inaccurate
answers could result if h is too large or too small.
Let’s study the accuracy of these approximations at a fixed value x = 1 and calculate the
absolute difference between f 0 (1) and the approximations as we vary h. The code below
produces the graph of E(h), which in this case is
E(h) = |3 − approximation|
fx = f(x)        # f(x)
fxp = f(x+h)     # f(x + h), used in the forward and symmetric differences
fxm = f(x-h)     # f(x - h), used in the symmetric difference
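The function used in the book's listing is not shown in this extract; as an assumption for illustration, the sketch below takes f(x) = x³, for which f′(1) = 3, matching E(h) = |3 − approximation| above.

import numpy as np
import matplotlib.pyplot as plt

f = lambda x: x**3              # assumed test function with f'(1) = 3
x = 1.0
h = np.logspace(-20, -1, 500)
Efwd = np.abs(3 - (f(x + h) - f(x))/h)            # forward difference
Esym = np.abs(3 - (f(x + h) - f(x - h))/(2*h))    # symmetric difference
plt.loglog(h, Efwd, 'r', lw=0.5, label='Forward difference')
plt.loglog(h, Esym, 'k', lw=1.5, label='Symmetric difference')
plt.xlabel('h')
plt.ylabel('Absolute error E')
plt.legend()
plt.grid('on')
plt.show()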
Fig. 2.2 shows the logarithmic plot of the two approximations (forward and symmetric
differences). The graphs show a number of distinctive and surprising features:
Fig. 2.2: E(h), defined as |f′(1) − approximation|, is plotted for the forward-difference approximation (thin red line) and the symmetric-difference approximation (thick black line).
• For each approximation, there appears to be an optimal value of h which minimises the error, namely:

h_{\text{opt}} \sim \begin{cases} 10^{-8} & \text{(Forward difference)} \\ 10^{-6} & \text{(Symmetric difference)} \end{cases}

The symbol ∼ is used in this context to mean a rough order-of-magnitude estimate.
• The minimum errors (ignoring small-scale fluctuations) are roughly:

E(h_{\text{opt}}) \sim \begin{cases} 10^{-8} & \text{(Forward difference)} \\ 10^{-11} & \text{(Symmetric difference)} \end{cases}

• For h ≳ h_opt, we see a linear behaviour in both cases. Since this is a log-log plot, a line with gradient m actually corresponds to the power-law relation E(h) ∼ h^m. From the graph we see that

E(h \gtrsim h_{\text{opt}}) \sim \begin{cases} h & \text{(Forward difference)} \\ h^2 & \text{(Symmetric difference)} \end{cases}

• For 10⁻¹⁶ ≲ h ≲ h_opt, we see rapid fluctuations, but the overall trend for both approximations is

E(10^{-16} \lesssim h \lesssim h_{\text{opt}}) \sim h^{-1}.

• For h ≲ 10⁻¹⁶, both approximations give the same constant.
You should try changing the function and the point x at which the derivative is calculated. You should see that this has no effect on any of the observations summarised above. You should also verify that the backward-difference approximation gives essentially the same graph as the forward-difference graph (the fluctuations will be slightly different).
The key takeaway here is that numerical derivatives do not behave in the way we might expect: a smaller h does not produce a more accurate estimate. Use h ≈ h_opt whenever you code a derivative.
Discussion
• The machine epsilon. The reason behind the behaviour of E(h) that we saw is the
fact that computers use binary numbers. A computer uses 64 binary digits to represent
any real number as a floating-point number (or simply float). It can be shown that
this accuracy is dictated by a number called the machine epsilon, ε mach , defined
as the distance between 1 and the next floating-point number greater than 1. For
double-precision floats (which is what most computers today use by default), we have

\varepsilon_{\text{mach}} = 2^{-52} \approx 2.2 \times 10^{-16}.

See [182] for an explanation of how ε_mach is calculated. The machine epsilon is one of
the main reasons why numerical results are sometimes very different from theoretical
expectations.
For instance, ε mach is the reason why we see the flat plateau for very small h in fig.
2.2. If h . ε mach , the floating-point representation of 1 + h is 1, and the approximation
formulae give zero because f (x + h) = f (x − h) = f (x). Therefore E(h) = 3 if h is
too small, as seen in the graph.
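You can inspect ε_mach directly; a quick check (a sketch, using NumPy's finfo):

import numpy as np
print(np.finfo(float).eps)    # 2.220446049250313e-16 for double precision
print(1.0 + 1e-17 == 1.0)     # True: 1 + h rounds back to 1 when h < eps_mach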
• Truncation and rounding errors. The V-shape of the graphs is due to two numerical effects at play. One effect is the rounding error, E_R, which occurs when a number is represented in floating-point form and rounded up or down according to a set of conventions. It can be shown that

E_R \sim h^{-1},

and therefore E_R dominates at small h. The tiny fluctuations are due to different rounding rules being applied at different real numbers.
In addition, there is also the truncation error, ET , which is associated with the accuracy
of the approximation formula. In fact, Taylor’s Theorem (see §2.4) tells us that the error
in the forward and symmetric difference formulae can be expressed as follows.
Forward difference: f'(x) = \frac{f(x+h) - f(x)}{h} - \frac{h}{2} f''(\xi_1),    (2.1)

Symmetric difference: f'(x) = \frac{f(x+h) - f(x-h)}{2h} - \frac{h^2}{6} f'''(\xi_2),    (2.2)

for some numbers ξ₁ ∈ (x, x + h) and ξ₂ ∈ (x − h, x + h). The truncation error is simply each of the error terms above, arising from the truncation of the infinite Taylor series for f. Note that the powers of h in these remainder terms are exactly what we observed in the E(h) graph.
In exercise 2, you will explore the E(h) graph of a more exotic formula for the derivative.
• Big O notation. Another way to express the accuracy of the approximations (2.1)–(2.2) is to use the O(hⁿ) notation, where n is determined from the scaling of the error term. We say that the forward-difference formula is an O(h) approximation, and the symmetric-difference formula is O(h²). The higher the exponent n, the faster the error shrinks, so one does not have to use a tiny h to achieve good accuracy.
2.3 Taylor series
Evaluate and plot the partial sums of the Taylor series for:

a) sin x,  b) ln(1 + x),  c) 1/(1 + x).

In each case, at what values of x does the Taylor series converge?
Recall that the Taylor series for a smooth function f expanded about x = 0 (also known as the Maclaurin series) is given by:

f(x) = f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \frac{f'''(0)}{3!}x^3 + \cdots = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n.    (2.3)
The code taylor.ipynb plots the graph y = ln(1 + x) along with partial sums of the
Taylor series up to the x 40 term (fig. 2.3). As there are so many lines, it might help to start
plotting only, say, the 5th partial sum onwards. The first few partial sums do not tell you
much more than the fact that they are terrible approximations.
It might also help to systematically colour the various curves. In our plot, we adjust the
(r,g,b) values gradually to make the curves ‘bluer’ as the number of terms increases (i.e.
by increasing the b value from 0 to 1 and keeping r=g=0).
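taylor.ipynb is not reproduced here, but the colouring idea might be sketched as follows (the plotting window and line widths are our own choices):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-0.99, 2, 500)
S = np.zeros_like(x)
Nmax = 40
for n in range(1, Nmax + 1):
    S = S + (-1)**(n + 1)*x**n/n                      # nth partial sum of (2.5) for ln(1+x)
    if n >= 5:                                        # start from the 5th partial sum
        plt.plot(x, S, color=(0, 0, n/Nmax), lw=0.5)  # bluer = more terms
plt.plot(x, np.log(1 + x), 'k', lw=2)
plt.ylim([-2, 2])
plt.show()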
From the graphs (and further experimenting with the number of terms in the partial
sums), we can make the following observations.
• sin x – The Taylor series appears to converge to sin x at all x ∈ R.
• ln(1 + x) – The Taylor series appears to converge to ln(1 + x) for x ∈ (−1, 1), possibly
also at x = 1. For x > 1, the graphs for large n show a divergence (i.e. the y values
become arbitrarily large in the right neighbourhood of x = 1).
• 1/(1+x) – The Taylor series also appears to converge to 1/(1+x) for x ∈ (−1, 1), similar to ln(1 + x). At x = 1, the Taylor series gives alternating values of ±1, so clearly does not converge to the function value of 1/2. For |x| > 1, the Taylor series blows up.
Fig. 2.3: The graphs of y = sin x, ln(1 + x) and 1/(1+x) (thick black lines) and their Taylor series. The top panel shows up to 7 terms in the series (up to the term x¹³). The lower panels show the series up to the x⁴⁰ term, with bluer lines indicating a higher number of terms.
Discussion
• Radius of convergence. The radius of convergence, R, of a power series is a real number such that the series Σ aₙxⁿ converges for |x| < R, and diverges for |x| > R. The interval (−R, R) is called the interval of convergence of the series.
For example, the series (2.6) could be regarded as a geometric series with common ratio −x. We know that the geometric series converges to the sum to infinity 1/(1+x) if |x| < 1 and diverges when |x| > 1, as can be seen graphically in fig. 2.3. We say that the radius of convergence of the series is 1.
• Ratio Test. We discussed the comparison test in the previous chapter. Another useful
convergence test is the following result known as d’Alembert’s Ratio Test:
Theorem 2.1 (Ratio Test) Let Tₙ be a sequence such that Tₙ ≠ 0 eventually. Let

L = \lim_{n \to \infty} \left|\frac{T_{n+1}}{T_n}\right|.

If L < 1 then the series Σ Tₙ converges. If L > 1 then the series Σ Tₙ diverges.
For the Taylor series of sin x, the ratio of successive terms gives

L = \lim_{n \to \infty} \frac{x^2}{2n(2n+1)} = 0, \quad \text{for all } x \in \mathbb{R}.
This proves our conjecture that the Taylor series converges for all x.
However, we have not proved that the series converges to sin x (we only proved that it
converges to some function). We will come back to this in the next section.
• Differentiating and integrating power series. There is a relation between the Taylor series for 1/(1+x) and ln(1 + x). Here is a very useful theorem from analysis:

Theorem 2.2 The series Σ aₙxⁿ can be differentiated and integrated term by term within the interval of convergence. The resulting power series has the same radius of convergence.
Using this result, we can integrate both sides of Eq. 2.6 with respect to x as long as
|x| < 1, yielding exactly Eq. 2.5. This proves our conjecture that the two series share
the same interval of convergence.
However, we have not established the convergence at the end points x = ±1. We will
discuss this in the next section.
2.4 Taylor’s Theorem and the Remainder term 63
\sin x \approx P_{2N-1}(x) := x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + (-1)^{N+1} \frac{x^{2N-1}}{(2N-1)!} = \sum_{n=1}^{N} \frac{(-1)^{n+1}}{(2n-1)!} x^{2n-1},    (2.7)

\ln(1+x) \approx P_N(x) := x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^{N+1} \frac{x^N}{N} = \sum_{n=1}^{N} \frac{(-1)^{n+1}}{n} x^n.    (2.8)
You will probably be familiar with the approximation of a function f(x) as an order-k polynomial P_k(x) by truncating its Taylor series after a finite number of terms. At university, we are not only concerned with the series itself, but also with the error term (the ‘remainder’) in the expansion, denoted R_k(x). The following theorem gives a useful expression for the remainder term.

Theorem 2.3 (Taylor's Theorem) Let I = [a, b] and N = 0, 1, 2, . . .. Suppose that f and its derivatives f′, f″, . . . , f^{(N)} are continuous on I, and that f^{(N+1)} exists on (a, b). If x₀ ∈ I, then, ∀x ∈ I \ {x₀}, ∃ξ between x and x₀ such that

f(x) = f(x_0) + f'(x_0)(x - x_0) + \cdots + \frac{f^{(N)}(x_0)}{N!}(x - x_0)^N + R_N(x),

\text{where } R_N(x) = \frac{f^{(N+1)}(\xi)}{(N+1)!}(x - x_0)^{N+1}.
On the domain 0 < x ≤ 1 (where the Taylor series converges), we find that |R_N(x)| is bounded by:

\frac{1}{N+1} \frac{x^{N+1}}{(1+x)^{N+1}} < |R_N(x)| < \frac{x^{N+1}}{N+1}.    (2.10)
As the polynomial order N increases, we expect the quality of the approximation to improve,
and so R N (x) should shrink to 0. Indeed, as N → ∞, the RHS of (2.10) goes to zero. This
means that for all x ∈ (0, 1], the Taylor series for f converges to f(x) (and not to any other function).¹
Interestingly, this also explains why we can evaluate the Taylor series at x = 1, giving us
the familiar series for ln 2 which we saw in §1.4.
The code taylorthm.ipynb calculates R N (x) at a fixed x ∈ (0, 1] and plots it as a
function of N. Fig. 2.4 shows the graph of R N (x = 0.4) (solid red line) when ln(1 + x) is
approximated by the Taylor polynomial of order N. The upper and lower bounds for |R N |
(eq. 2.10) are shown in dotted lines. The graph shows that Taylor’s Theorem holds, and that
the constant ξ must be close to 0.
Now let’s turn to the function f (x) = sin x. Since the coefficients of the even powers of
x vanish, the 2N-th order approximation is the same as the (2N − 1)th order approximation.
In other words, R2N (x) = R2N −1 (x). Calculating the remainder using Taylor’s Theorem,
we find
R_{2N-1}(x) = R_{2N}(x) = \frac{(-1)^N \cos \xi}{(2N+1)!} x^{2N+1}, \quad \text{for some } \xi \in (0, x).
Note that the cosine is positive and decreasing on the interval (0, 0.4). Thus, we find the
upper and lower bounds for |R_N|:

\frac{\cos x}{(2N+1)!} x^{2N+1} < |R_{2N-1}(x)| = |R_{2N}(x)| < \frac{x^{2N+1}}{(2N+1)!}.    (2.11)
In fig. 2.4, we can see that the remainder and its bounds are almost indistinguishable, up until around N = 12, where |R_N| violates the bounds once it shrinks below a small number of order ε_mach ≈ 10⁻¹⁶. This is due to an additional error arising from the subtraction of two nearly equal numbers (namely, f(x) and P_N(x)).
In both cases, our conclusion is that Taylor’s Theorem is verified. In addition, we also
saw that the Taylor series for sin x converges much more rapidly than that of ln(1 + x). By
the time we reach the order-12 polynomial, the approximation for sin x is so good that the
error is smaller than ε mach .
1 Keen-eyed readers might remember from fig. 2.3 that the Taylor series of ln(1 + x) also converges to
ln(1 + x) when −1 < x < 0. However, our proof does not work in this case, since it no longer follows from
eq. 2.9 that R N (x) → 0. What is needed in that case is a different technique (for example, integrating
another Taylor series).
2.4 Taylor’s Theorem and the Remainder term 65
Fig. 2.4: The absolute value of the remainder term |R N (x)| for the function ln(1 + x)
(top) and sin x (bottom), evaluated at x = 0.4, plotted as a function of N (the order of the
approximating polynomial). As N increases, the remainder approaches zero. The bounds
for |R N | are plotted in dotted lines.
Discussion
• Ratio Lemma. We showed in the previous Section that the Taylor series for sin x
converges, but it remains to show that it converges to sin x. Here is a very useful result
which will help us prove this.
Theorem 2.4 (Ratio Lemma) Let aₙ be a sequence such that aₙ > 0. Suppose L is a constant such that 0 < L < 1 and a_{n+1}/aₙ ≤ L eventually. Then, aₙ converges to 0.

Let a_N be the sequence |x|^{2N+1}/(2N+1)! which appears on the RHS of Eq. 2.11 (where x ≠ 0). Then the ratio

\frac{a_{N+1}}{a_N} = \frac{x^2}{(2N+3)(2N+2)} \to 0,

regardless of the value of x. The Ratio Lemma says that a_N → 0, and therefore the remainder R_N → 0. This proves that the Taylor series of sin x converges to sin x for all x ∈ R.
• A pathological function. You may be wondering if it is possible for the Taylor series
of a function to converge to a different function. The answer is yes! Here is a classic
counterexample. Define f : R → R by
f(x) = \begin{cases} e^{-1/x^2} & \text{if } x \neq 0, \\ 0 & \text{if } x = 0. \end{cases}
It can be shown that f (n) (0) = 0 for all n ∈ N (see, for example, [91] for calculation
details). Therefore, the Taylor series of f converges to 0 everywhere, but only coincides
with f at a single point where x = 0.
Fig. 2.5: The Weierstrass function on two very different scales, showing its self-similar
structure.
Discussion
• Other monsters. Poincaré famously described Weierstrass’s function as a “monster".
Here are two other famous monsters. Both these functions will be explored in the
exercises.
– The Blancmange function. For m = 0, 1, 2, . . ., let f_m : R → R be defined by

f_0(x) = \min\{|x - k| : k \in \mathbb{Z}\}, \qquad f_m(x) = \frac{1}{2^m} f_0(2^m x).

The Blancmange function, g : R → R, is defined by g(x) = \sum_{m=0}^{\infty} f_m(x). It is continuous on R but is nowhere differentiable.
– Riemann’s function, R : R → R is defined by
sin(k 2 x)
∞
X
R(x) = . (2.12)
k=1
k2
An excerpt from the interactive plotting code:

fig, ax = plt.subplots()
plt.subplots_adjust(bottom=0.2)               # leave a space at the bottom for sliders
Wfunc, = plt.plot(x, g(x,a,b), 'k', lw=0.5)   # plot Weierstrass function (thin black line)
plt.xlim([-2, 2])
plt.ylim([-2, 2])
plt.grid('on')
plt.title('The Weierstrass Function')

a_slide.on_changed(update)                    # redraw when a slider is moved
b_slide.on_changed(update)
plt.show()
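The other monsters can be plotted in the same spirit. Here is a minimal sketch (our construction) of a truncated Blancmange sum:

import numpy as np
import matplotlib.pyplot as plt

def f0(x):
    return np.abs(x - np.round(x))   # distance from x to the nearest integer

x = np.linspace(0, 1, 2000)
g = sum(f0(2**m * x)/2**m for m in range(25))   # truncated sum over m
plt.plot(x, g, 'k', lw=0.5)
plt.show()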
2.6 Integration with Trapezium Rule

The composite Trapezium Rule with n strips of width h = (b − a)/n reads

\int_a^b f(x)\,dx \approx h\left[\frac{y_0 + y_n}{2} + \sum_{i=1}^{n-1} y_i\right],    (2.13)

where yᵢ = f(xᵢ) and xᵢ = a + ih. We say that the RHS is the approximation of the integral using the Trapezium Rule with n strips.

Use the Trapezium Rule to approximate the integral \int_1^2 \ln x\,dx.
How does the accuracy of the answer vary with the width h of each strip?
The Trapezium Rule approximates the area under the graph y = f (x) as the sum of
the area of n equally spaced trapezia. The left panel of fig. 2.6 demonstrates this idea
for f (x) = ln x, using 10 trapezia, or strips. The concavity of the curve suggests that the
Trapezium-Rule estimate will be less than the actual answer.
Similar to the error analysis of derivatives in §2.2, we can study the error in numerical integration by defining E(h) as the absolute difference between the Trapezium-Rule estimate and the actual answer. The latter can be obtained by integration by parts, yielding

\int_1^2 \ln x\,dx = [x \ln x - x]_1^2 = 2\ln 2 - 1.

Hence, we find

E(h) = |\text{Trapezium-Rule estimate} - (2\ln 2 - 1)|.
The graph of E(h) (produced by the code trapezium.ipynb) is shown on the right of
fig. 2.6. Since E(h) appears to be a straight line on the logarithmic plot, we can approximate
this behaviour as E ∝ h k where k is the gradient of the straight line. Python tells us that
Gradient of line = 1.99999.
Thus, we conjecture that the Trapezium Rule is an O(h2 ) approximation.
Fig. 2.6: Trapezium Rule. Left: The integration scheme for \int_1^2 \ln x\,dx using 10 trapezoidal strips. Right: The error E(h) plotted on log scales, showing that E(h) ∝ h².
Discussion
• The error term. It can be shown (e.g. [182]) that the error term in the Trapezium Rule
can be written as

\int_a^b f(x)\,dx = \frac{h}{2}\left[y_0 + y_n + 2\sum_{i=1}^{n-1} y_i\right] - \frac{(b-a)h^2}{12} f''(\xi),    (2.14)
for some ξ ∈ (a, b). The exponent of h in the error term confirms our finding that the
Trapezium Rule is an O(h2 ) approximation. Note from the formula that if f is a linear
function, the error vanishes. This is consistent with the geometric picture – the area
under a straight line is exactly a trapezium.
• numpy.trapz NumPy actually has a built-in Trapezium Rule. Look up the trapz
command in NumPy’s documentation and verify that it gives the same result as our
own code.
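For example, a quick check (note that np.trapz was renamed np.trapezoid in NumPy 2.0):

import numpy as np
x = np.linspace(1, 2, 100001)
print(np.trapz(np.log(x), x))    # approx 0.3862943... = 2 ln 2 - 1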
trapezium.ipynb (for plotting fig. 2.6)

import numpy as np
import matplotlib.pyplot as plt

a, b = 1, 2                       # integration limits
N = np.round(np.logspace(2, 5))   # numbers of strips, evenly spaced on a log scale
actual = 2*np.log(2) - 1          # exact answer for the integral
hlist = (b - a)/N                 # h (width of each strip)
error = []                        # E(h), filled in by the for loop

def trapz(y, h):
    # Eq. 2.13
    return h*(sum(y) - (y[0] + y[-1])/2)

for n in N:
    x = np.linspace(a, b, int(n) + 1)   # n strips need n+1 points
    error.append(abs(trapz(np.log(x), (b - a)/n) - actual))

plt.loglog(hlist, np.abs(error))
plt.xlim([1e-5, 1e-2])
plt.xlabel('h')
plt.ylabel('E(h)')
plt.grid('on')
plt.show()
2.7 Integration with Simpson's Rule

Simpson's Rule divides [a, b] into n strips, each consisting of two substrips of width h (so each strip has width H = 2h), and reads

\int_a^b f(x)\,dx \approx \frac{h}{3}\left[y_0 + y_{2n} + 4\sum_{i=1}^{n} y_{2i-1} + 2\sum_{i=1}^{n-1} y_{2i}\right],    (2.15)

where yᵢ = f(xᵢ).

Use Simpson's Rule to approximate the integral \int_1^2 \ln x\,dx.
How does the accuracy of the answer vary with the width of each strip?
The graph of E(H) is shown in the right panel of fig. 2.7. For easy comparison with the
result for Trapezium Rule, we plot this graph over the same domain as that in fig. 2.6. We
can make a few interesting observations from this graph.
Fig. 2.7: Simpson's Rule. Left: The integration scheme for \int_1^2 \ln x\,dx using 10 strips (20 substrips). Right: The error E(H) plotted on log scales. The domain is the same as that in fig. 2.6 for easier comparison.
2.7 Integration with Simpson’s Rule 73
• The error, given the same number of strips, is many orders of magnitude smaller than the error using the Trapezium Rule. For example, using the strip width 10⁻² gives a surprisingly accurate answer with E ∼ 10⁻¹¹ for Simpson's Rule, but for the Trapezium Rule, we find a much larger E ∼ 10⁻⁵.
• There appear to be two distinct regimes of the curve: the straight part where H ≳ 10⁻³, and the oscillating part for smaller H. Python tells us that the gradient of the straight part is
Gradient of line = 3.99997.
Thus, we conjecture that Simpson’s Rule is an O(h4 ) approximation. Together with the
previous point, we conclude that Simpson’s Rule is a far superior method in terms of
accuracy (at the expense of an increased number of calculations).
• The oscillating part of the curve occurs around E(H) ∼ 10−15 . This is a magnitude
comparable to ε mach . Indeed, we are just seeing the numerical artefact when subtracting
two nearly equal numbers in the calculation of E(H).
Here is the code for plotting E(H) and calculating the gradient of the straight part.
def simp(y, h):
    # Eq. 2.15
    return (h/3)*(y[0] + y[-1] +
                  4*sum(y[1:-1:2]) + 2*sum(y[2:-1:2]))

error = []
for n in N:                               # reuse a, b, N, actual from trapezium.ipynb
    x = np.linspace(a, b, 2*int(n) + 1)   # 2n substrips, so an odd number of points
    error.append(abs(simp(np.log(x), (b - a)/(2*n)) - actual))

Hlist = (b - a)/N                         # strip width H = 2h
mask = Hlist > 1e-3                       # the straight part of the curve
print('Gradient of line =',
      np.polyfit(np.log(Hlist[mask]), np.log(np.array(error)[mask]), 1)[0])

plt.loglog(Hlist, np.abs(error))
plt.xlim([1e-5, 1e-2])
plt.xlabel('H')
plt.ylabel('E(H)')
plt.grid('on')
plt.show()
Discussion
• Smaller h is not always better. Just as we saw in numerical differentiation, we have a
similar phenomenon for numerical integration: smaller h does not always guarantee a
more accurate answer. Fig. 2.7 shows that the minimum strip width (below which the
roundoff error dominates) occurs at around 10−3 .
• The error term. It can be shown (e.g. [182]) that the error term in Simpson’s Rule
can be written as

\int_a^b f(x)\,dx = \frac{h}{3}\left[y_0 + y_{2n} + 4\sum_{i=1}^{n} y_{2i-1} + 2\sum_{i=1}^{n-1} y_{2i}\right] - \frac{(b-a)h^4}{180} f^{(4)}(\xi),

for some ξ ∈ (a, b). The derivation of the error term is surprisingly much more difficult than that of the Trapezium Rule.
The exponent of h in the error term confirms our conjecture that Simpson's Rule is an O(h⁴) approximation.
Note from the formula that if f is a cubic function, the error vanishes. This too is
surprising given that Simpson’s Rule is based on parabolas. If we consider the simplest
case of a single strip on [x 0, x 2 ], the formula suggests that given a cubic curve on
this interval, we can draw a parabola which intersects the cubic at 3 points (where
x = x 0, x 1, x 2 ) such that both the cubic and the parabola have exactly the same area
under the curves! See exercise 9.
Another interesting observation is that the error term allows us to estimate the minimum strip width by setting the magnitude of the error term to be roughly ε_mach. We find

H_{\min} = 2h_{\min} \sim 2\left(\frac{180\,\varepsilon_{\text{mach}}}{(b-a)\, f^{(4)}(\xi)}\right)^{1/4}.
2.8 Improper integrals

Evaluate the following integrals numerically.

I_1 = \int_0^{\infty} e^{-x^2}\,dx, \qquad I_2 = \int_0^{\infty} x^2 e^{-x^2}\,dx, \qquad I_3 = \int_0^{\infty} \frac{\sin x}{x}\,dx.

An improper integral is an integral that either has an integration limit involving ∞ or one in which the integrand is not well-defined at one of the integration limits (e.g. the integrand of I₃ at x = 0).

When calculating an integral numerically, it is always useful to start by plotting the integrand to survey the behaviour of the function over the integration domain. This gives us a clue as to the kind of answer we might expect. Any discontinuities or divergence would be revealed in this step and we can then conclude that either the integral diverges, or plan to work strategically around the discontinuities. The graphs of the integrands of I₁ to I₃ are shown in fig. 2.8.

Fig. 2.8: The integrands y = e^{-x^2}, y = x^2 e^{-x^2} and y = \frac{\sin x}{x}.

We see that the integrands all appear to converge to y = 0 as x → ∞. However, this behaviour alone does not guarantee the convergence of the integrals (after all, \int_1^{\infty} \frac{1}{x}\,dx is divergent). But our graphical investigation has not revealed any obviously divergent integrals.

You might also notice that for I₃, the integrand does not diverge as x → 0, but approaches 1 (due to the limit lim_{x→0}(sin x/x) = 1, see eq. 1.6). To avoid zero division, it might be safest to define the integrand piecewise:

f(x) = \begin{cases} \dfrac{\sin x}{x} & x \neq 0, \\ 1 & x = 0. \end{cases}

Now let's see how these improper integrals can be evaluated in Python.
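The quad-based listing is not reproduced in this copy; a minimal sketch consistent with the discussion below is:

import numpy as np
import scipy.integrate as integrate

for f in (lambda x: np.exp(-x**2),        # integrand of I1
          lambda x: x**2*np.exp(-x**2),   # integrand of I2
          lambda x: np.sin(x)/x):         # integrand of I3
    print(integrate.quad(f, 0, np.inf))   # (value, error estimate)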
The error estimates for I₁ and I₂ look reassuringly tiny, but the error estimate for I₃ is alarmingly large (even larger than the answer itself!). Python also gives a warning message that "the integral is probably divergent, or slowly convergent". Further investigation is needed.
The quad method is really easy, but it gives us very little understanding of what is
happening mathematically. When coding becomes an opaque black box, it makes the
interaction between mathematics and computing less meaningful, and the problem becomes
less mathematically enlightening.
Second solution: substitution
Let’s take a look at a more transparent method, this time without using quad.
All the integrals involve infinity in a limit. But infinity is not a number that can be represented in binary, so to compute the integral numerically we must replace infinity with something finite. Simply replacing infinity with a large number is not always going to work; after all, it's not clear how large the large number should be.
Here is a more sophisticated strategy. To work out I = \int_0^{\infty} f(x)\,dx, let's first break it up into two integrals:

I = \int_0^{\alpha} f(x)\,dx + \int_{\alpha}^{\infty} f(x)\,dx,
where the break point α is a positive number to be determined. For the second integral, use a substitution to turn the limit ∞ into a finite number. Let's try u = 1/x (a different substitution might work even better, depending on the behaviour of the integrand). This yields

I = \int_0^{\alpha} f(x)\,dx + \int_0^{1/\alpha} \frac{f(1/u)}{u^2}\,du.    (2.16)
When α = 1, the two terms can also be combined into a single integral (upon renaming the dummy variable u as x):

I = \int_0^{1} \left[f(x) + \frac{f(1/x)}{x^2}\right] dx.    (2.17)
Let’s try using formula (2.16) to tackle I1 using Simpson’s Rule to evaluate each integral.
You can use our own Simpson’s Rule code, or, equivalently, scipy.integrate.simpson.
Supposing for now that the exact answer is
√
π
I1 = .
2
(We will explain this in the Discussion section.) Let’s vary the value of the break point α and
see how the error behaves. Using h = 10−3 in Simpson’s Rule, the code improper.ipynb
produces fig. 2.9.
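improper.ipynb is not reproduced in full; the key computation for a single value of α might be sketched as follows (the cutoff 10⁻⁸ and the makeodd trick are described in the bullet points below).

import numpy as np
from scipy.integrate import simpson

f = lambda x: np.exp(-x**2)
alpha, h, cutoff = 5.0, 1e-3, 1e-8

def makeodd(N):
    return N + 1 - (N % 2)   # Simpson's Rule needs an odd number of points

x1 = np.linspace(0, alpha, makeodd(int(alpha/h)))
x2 = np.linspace(cutoff, 1/alpha, makeodd(int(1/(alpha*h))))
integ = simpson(f(x1), x=x1) + simpson(f(1/x2)/x2**2, x=x2)   # eq. (2.16)
print(integ, np.sqrt(np.pi)/2)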
Fig. 2.9: The accuracy when I₁ = \int_0^{\infty} e^{-x^2}\,dx is evaluated by splitting it into two integrals (eq. 2.16) and using Simpson's Rule with h = 10⁻³. The graph shows the absolute error as the break point α varies. The absolute error is minimised when α ≳ 5.
Fig. 2.9 shows that the best accuracy is obtained when α is around 5 (where the absolute
error is machine-epsilon limited). Nevertheless, for any α the worst accuracy is still a
respectable 13 decimal places. With α = 5, Python gives the following excellent estimate
which is accurate to 15 decimal places:
integ = 0.886226925452758
exact = 0.8862269254527579
A couple of highlights from the code:
• In evaluating the second integral in eq. 2.16, we choose a tiny cutoff for the lower limit
(10−8 in the code) in place of zero, which would have produced a division-by-zero error.
It is always a good idea to check that the answer for the integral is insensitive to the
choice of the tiny cutoff. Indeed, varying the cutoff by a few orders of magnitude does
not change the shape of the graph.
• Simpson’s Rule requires an even number of strips, i.e. an odd number of points N in
np.linspace(a, b, N). We ensure that N is odd using the function
makeodd(N) = N + 1 - (N%2).
The syntax N%2 gives the remainder when N is divided by 2. Therefore, the function
gives N when N is odd, and N+1 when N is even.
• Nevertheless, SciPy’s integrate.simpson would still work without our makeodd
function, but one should be aware of how the routine deals with an odd number of
half-strips. The result can be different depending on your version of SciPy. For more on
this point, consult SciPy’s documentation2.
2 https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.simpson.
html
Applying the same splitting trick to integral I₂, you should find a similar behaviour in the absolute error. To work out the exact answer, one can integrate by parts and use the previous exact result to deduce that

I_2 = \frac{\sqrt{\pi}}{4}.
With α = 5, Python gives an excellent estimate accurate to 15 decimal places.
integ = 0.44311346272637947
exact = 0.44311346272637897
Finally, consider I₃. Here we simply truncate the integral at a large upper limit A and study the error as A increases. The integral converges very slowly, as shown in fig. 2.10. Here we have used the Trapezium Rule with strip width h = 10⁻³, and we have assumed that the exact answer is

I_3 = \frac{\pi}{2}.
(This will be explained in the Discussion). From fig. 2.10, it is clear that we have to integrate
to quite a large number to achieve high accuracy. More precisely, the error fluctuates
more and more as A increases, but there is an envelope A ∝ (error) −1 , which guarantees
a minimum achievable accuracy. We can deduce, for instance, that to achieve an answer
that is accurate to p decimal places (i.e. the error must be at most 0.5 × 10−p ), this can be
achieved when A ≈ 2 × 10 p (limited by rounding error).
Fig. 2.10: The absolute error, defined by \left|\int_0^{A} \frac{\sin x}{x}\,dx - \frac{\pi}{2}\right|, plotted as the upper limit A varies. The integral is evaluated using the Trapezium Rule with h = 10⁻³.
Although we have attacked these integrals using a variety of different numerical methods,
it is also possible to calculate these integrals using Python’s library for symbolic mathematics
called SymPy. We will explore SymPy in chapter 5, but if you are impatient, go straight to
the code box in the very last exercise in that chapter.
Finally, a rather different approach to numerical integration based on probability is Monte
Carlo integration, which will be discussed in §7.10.
Discussion
• Key to success with numerical integration. The main takeaway here is that there are
no magical Python functions which can deal with every integral accurately. The key to
success is to work with Python by using our mathematical understanding (e.g. introduce
a suitable transformation). Always investigate the behaviour of the integrands, and
where possible, avoid the black-box approach (where you leave everything to Python
without knowing what it does). Numerical integration is an art which takes experience
to master, so be patient!
• A Gaussian integral and a trick. Here is a neat integration trick to show that I₁ = √π/2. Let I = \int_{-\infty}^{\infty} e^{-x^2}\,dx. Note that by symmetry, I = 2I₁. Since we can also write I = \int_{-\infty}^{\infty} e^{-y^2}\,dy, we can multiply the two expressions for I and, assuming that we can move terms around, we have

I^2 = \int_{-\infty}^{\infty} e^{-x^2}\,dx \int_{-\infty}^{\infty} e^{-y^2}\,dy = \iint_{\mathbb{R}^2} e^{-(x^2+y^2)}\,dx\,dy.

Switching to polar coordinates (x = r cos θ, y = r sin θ, dx dy = r dr dθ) turns this into \int_0^{2\pi}\int_0^{\infty} e^{-r^2} r\,dr\,d\theta = \pi, so I = √π and hence I₁ = √π/2.
• The sine integral. A closely related function is the sine integral, defined as

\text{Si}(x) = \int_0^{x} \frac{\sin t}{t}\,dt.
This function occurs frequently in physical and engineering applications. The quickest
way to evaluate this is to use the following SciPy command:
scipy.special.sici(x)[0]
Note that only the first element of sici(x) is the sine integral. The second element is the cosine integral, which will be explored in exercise 13.
In general, there is no elementary expression for this integral, except at x = 0 (clearly Si(0) = 0) and as x → ∞, where the integral becomes I₃ = \int_0^{\infty} \frac{\sin x}{x}\,dx = π/2, as we saw earlier. At university, you will learn a number of different methods that can help you evaluate I₃ exactly. One technique is to use contour integration (in a topic called complex analysis). Another technique is the Laplace transform, a mathematical tool with a huge range of engineering applications.
Even without these advanced techniques, there is a clever trick which can help us evaluate the integral. Again this relies on some elementary knowledge of multivariable calculus. This trick is based on the following simple observation:

\int_0^{\infty} e^{-xy} \sin x\,dy = \frac{\sin x}{x},

(where x is held constant in the integral). Therefore, we can write the original integral as

I_3 = \int_0^{\infty} \left(\int_0^{\infty} e^{-xy} \sin x\,dx\right) dy,
where we have assumed that the order of integration can be switched (thanks to a result
in analysis called Fubini’s theorem). You would most likely have come across the inner
integral presented as an exercise in integration by parts with the use of a reduction
formula (whereby the original integral reappears upon integrating by parts twice). You
should verify that:
$$\int_0^{\infty} e^{-xy}\sin x\,dx = \frac{1}{1+y^2}.$$
Returning to the original integral, we then find:
$$I_3 = \int_0^{\infty} \frac{dy}{1+y^2} = \Big[\tan^{-1} y\Big]_0^{\infty} = \frac{\pi}{2}.$$
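The final step is easy to verify numerically; a minimal sketch using SciPy's quad:

import numpy as np
from scipy.integrate import quad

# Integral of 1/(1+y^2) over [0, ∞) should equal π/2 ≈ 1.5707963
val, _ = quad(lambda y: 1/(1 + y**2), 0, np.inf)
print(val, np.pi/2)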
2.9 Fourier series

Let $f$ be defined on $[0, 2\pi)$ by
$$f(x) = \begin{cases} 1, & x \in [0, \pi), \\ 0, & x \in [\pi, 2\pi), \end{cases}$$
and continued outside $[0, 2\pi)$ in the same way, so that the graph of $y = f(x)$ looks like a square wave.
Show that f can be written as a series in sin nx as
$$f(x) = \frac{1}{2} + \frac{2}{\pi}\left(\sin x + \frac{1}{3}\sin 3x + \frac{1}{5}\sin 5x + \cdots\right).$$
The topic of Fourier series is a vital part of university mathematics. We give a brief
summary here. The goal is to write a 2π-periodic function f (defined on R) as a sum of sines
and cosines of different frequencies and amplitudes. In other words, we want to express
f (x) as
$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\left(a_n \cos nx + b_n \sin nx\right), \qquad (2.18)$$
(the constant term a0 is traditionally written out separately to make the formula for an and
bn easier to remember). The French mathematician Joseph Fourier (1768–1830) proposed
such a series to study the problem of heat conduction, and showed that the coefficients of
the series are given by:
$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos nx\,dx, \qquad b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin nx\,dx. \qquad (2.19)$$
We can think of equation 2.18 as a decomposition of a function into different resolutions. The large-$n$ terms are high-frequency sinusoids, so they capture the small-scale fluctuations of $f$ in fine detail. Similarly, the small-$n$ sinusoids capture the low-frequency, broad-brush behaviour of $f$.
In fact, this decomposition can be generalised to a continuous range of frequencies n, in
which case the decomposition is called a Fourier transform. Fourier series and Fourier
transform constitute a topic called Fourier analysis, which is an indispensable tool in signal
and image processing, geoscience, economics and much more. See [65, 94, 190] for some
excellent books on Fourier analysis.
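Incidentally, the integrals (2.19) are straightforward to evaluate numerically. Here is a minimal sketch (an assumed setup using SciPy's quad) applied to the square wave defined above, anticipating the results derived below:

import numpy as np
from scipy.integrate import quad

# f = 1 on [0, π) and 0 on [π, 2π); by periodicity f also vanishes on
# [-π, 0), so the integrals in (2.19) reduce to integrals over [0, π].
for n in range(6):
    an = quad(lambda x: np.cos(n*x), 0, np.pi)[0]/np.pi
    bn = quad(lambda x: np.sin(n*x), 0, np.pi)[0]/np.pi
    print(n, round(an, 10), round(bn, 10))
# Expect: the n = 0 row gives a0 = 1; an = 0 for n ≥ 1;
# bn = 2/(nπ) for odd n and 0 for even n.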
Back to the square-wave function. Performing the integrations in Eqs. 2.19, we find:
$$a_0 = 1, \qquad a_n = 0, \qquad b_n = \frac{1-(-1)^n}{n\pi} = \begin{cases} \dfrac{2}{n\pi} & \text{for } n \text{ odd} \\[4pt] 0 & \text{for } n \text{ even} \end{cases}$$
where n = 1, 2, 3 . . . (you should verify these results). Putting these into the Fourier series
(2.18) gives:
$$f(x) = \frac{1}{2} + \frac{2}{\pi}\sum_{n\ \mathrm{odd}} \frac{\sin nx}{n}. \qquad (2.20)$$
The code below produces an interactive plot with a slider which changes the number of terms in the truncated Fourier series (n = 0, 1, 2, . . ., nmax). Some snapshots for different values of nmax are shown in fig. 2.11.
Here are some interesting observations from fig. 2.11.
• The more terms we use, the closer we are to the square wave. However, the series,
even with infinitely many terms, will never be exactly the same as the square wave. For
example, at the points of discontinuity of the square wave, the Fourier series
equals $\frac{1}{2}$ (just put $x = k\pi$ in (2.20)), but the square wave never takes this value.
This means that one has to take the equal sign in Eq. 2.20 with a pinch of salt. Some
people write ∼ instead of = to emphasise the difference.
• Near each discontinuity, there appears to be an overshoot above 1 (and undershoot
below 0). Try zooming into an overshoot and reading off the maximum value (choosing a large value of nmax). You should find that the Fourier series overshoots the square wave by just under 9%. The undershoot below 0 has the same magnitude.
Discussion
• Jump discontinuities. It can indeed be shown that if there is a jump discontinuity in
the function f at x = a, then its Fourier series converges to the average of the left and
right limits, i.e.
$$\frac{1}{2}\left(\lim_{x\to a^-} f(x) + \lim_{x\to a^+} f(x)\right). \qquad (2.21)$$
• Gibbs phenomenon. An overshoot (and an undershoot) always occurs when using the
truncated Fourier series to approximate a discontinuous function. For large nmax , the
overshoot moves closer to the point of discontinuity but does not disappear. In fact, the
overshoot can be shown to be around 9% of the magnitude of the jump. More precisely,
the fractional overshoot can be expressed as
$$\frac{1}{\pi}\int_0^{\pi} \frac{\sin x}{x}\,dx - \frac{1}{2} \approx 0.0894898722 \quad (10\ \text{dec. pl.}) \qquad (2.22)$$
Note the appearance of the sine integral!
The overshoot and undershoot of Fourier series near a discontinuity are known as Gibbs
phenomenon after the American mathematician Josiah Gibbs (1839–1903) who gave a
careful analysis of the overshoot magnitude. However, it was the English mathematician
Henry Wilbraham (1825–1883) who first discovered the phenomenon.
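The value in (2.22) can be checked instantly with the sine integral from SciPy; a minimal sketch:

import numpy as np
from scipy.special import sici

# Fractional overshoot (2.22): Si(π)/π − 1/2
print(sici(np.pi)[0]/np.pi - 0.5)   # ≈ 0.0894898722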
• Parseval’s theorem and ζ(2). There is an elegant relation between the function f and
its Fourier coefficients $a_n$ and $b_n$ (without the sines and cosines).
Parseval’s Theorem, named after the French mathematician Marc-Antoine Parseval
(1755–1836), states that
Fig. 2.11: The partial sums (thin blue curves) of the Fourier series (2.20) for the square-wave
function (shown in red) with nmax = 5, 35 and 95.
$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |f(x)|^2\,dx = \left(\frac{a_0}{2}\right)^2 + \frac{1}{2}\sum_{n=1}^{\infty} a_n^2 + \frac{1}{2}\sum_{n=1}^{\infty} b_n^2. \qquad (2.23)$$
Here is a brief interpretation of why this holds. The LHS is the average of | f (x)| 2 over
the period. The terms on the RHS are the averages of terms when the Fourier expansion
is squared. We are left with the averages of (a0 /2) 2 , (an cos nx) 2 and (bn sin nx) 2 (the
cross terms average to zero).
Applying Parseval’s identity to the square wave function, Eq. 2.23 becomes:
$$\frac{1}{2} = \frac{1}{4} + \frac{2}{\pi^2}\left(1 + \frac{1}{3^2} + \frac{1}{5^2} + \cdots\right)$$
$$\implies \sum_{k=1}^{\infty} \frac{1}{(2k-1)^2} = \frac{\pi^2}{8}. \qquad (2.24)$$
We end this chapter with a calculation of $\zeta(2) = \sum_{n=1}^{\infty} \frac{1}{n^2}$ that follows from Eq. 2.24.
Let $S = \zeta(2)$. Observe that:
$$\begin{aligned}
S &= \sum_{n=1}^{\infty} \frac{1}{n^2} = \sum_{n\ \mathrm{odd}} \frac{1}{n^2} + \sum_{n\ \mathrm{even}} \frac{1}{n^2} \\
  &= \sum_{k=1}^{\infty} \frac{1}{(2k-1)^2} + \sum_{k=1}^{\infty} \frac{1}{(2k)^2} = \frac{\pi^2}{8} + \frac{S}{4} \\
\implies S &= \frac{4}{3}\cdot\frac{\pi^2}{8} = \frac{\pi^2}{6}.
\end{aligned}$$
This beautiful result³ also verifies our numerical calculation of ζ(2) in §1.4.
³ A subtle point: in the second line of the working, we assumed that the sum S remains unchanged by the rearrangement of the terms in the series. Although it is safe to do so here, a rearrangement can change the sum of a series such as the alternating harmonic series. Look up Riemann's rearrangement theorem.
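Both (2.24) and ζ(2) = π²/6 are quick to check numerically; a minimal sketch:

import numpy as np

k = np.arange(1, 10**6 + 1)
print(np.sum(1/(2*k - 1)**2), np.pi**2/8)   # sum over odd squares → π²/8
print(np.sum(1/k**2), np.pi**2/6)           # ζ(2) → π²/6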
# Excerpt from the interactive Fourier series plot; the full listing
# also defines the array x, the function Fourier(x, nmax) implementing
# (2.20), the plotted line Ffunc and the slider n_slide.
fig, ax = plt.subplots()
plt.subplots_adjust(bottom=0.15)        # leave a space at the bottom for a slider

def update(val):
    nmax = n_slide.val                  # take nmax from the slider
    Ffunc.set_ydata(Fourier(x, nmax))   # recalculate the Fourier series
    fig.canvas.draw_idle()

n_slide.on_changed(update)              # register the callback so the slider redraws the plot
plt.show()
2.10 Exercises
1 Perform the following modifications on the code Eh.ipynb which we used to calculate the derivative of $f(x) = x^3$ at $x = 1$ (§2.2).
• Change the point x at which the derivative is calculated.
• Change the function f (x) to any differentiable function.
• Change the approximation to the backward-difference approximation.
Verify that the qualitative behaviour of E(h) is not changed by these modifications. (A sketch of the backward-difference case follows below.)
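A minimal sketch of the backward-difference modification (an assumed setup in the spirit of Eh.ipynb, which is not reproduced here):

import numpy as np
import matplotlib.pyplot as plt

# Backward-difference approximation of f'(1) for f(x) = x³ (exact value 3)
f = lambda x: x**3
h = np.logspace(-16, 0, 100)
E = np.abs((f(1.0) - f(1.0 - h))/h - 3.0)
plt.loglog(h, E)
plt.xlabel('h'); plt.ylabel('E(h)')
plt.show()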
2 (Five-point stencil) Use Eh.ipynb to help you with this question.
Apart from the forward, backward and symmetric-difference formulae for the derivative
discussed in §2.2, there are infinitely many other similar differentiation formulae. Those
involving more points typically produce a more accurate answer (given the same step
size h) at the expense of increased calculation time.
Here is an exotic one called the five-point stencil formula:
$$f'(x) \approx \frac{1}{12h}\left[-f(x+2h) + 8f(x+h) - 8f(x-h) + f(x-2h)\right].$$
a. Plot the graph of the absolute error E(h) for the derivative of $f(x) = \cos x$ at $x = 1$. You should see a V-shape similar to fig. 2.2. (A sketch for parts (a) and (b) appears after this exercise.)
b. On the same set of axes, plot the symmetric difference formula. How do the two
graphs compare?
c. If the five-point stencil formula is an $O(h^k)$ approximation, use your graph to show that k = 4.
Note: Such formulae are studied in a topic called finite-difference methods, which we
will explore in §4.9.
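A minimal sketch for parts (a) and (b) (an assumed setup, not Eh.ipynb itself):

import numpy as np
import matplotlib.pyplot as plt

f, x0, exact = np.cos, 1.0, -np.sin(1.0)
h = np.logspace(-16, 0, 500)
# Five-point stencil vs symmetric difference for f'(x0)
stencil = (-f(x0 + 2*h) + 8*f(x0 + h) - 8*f(x0 - h) + f(x0 - 2*h))/(12*h)
symm = (f(x0 + h) - f(x0 - h))/(2*h)
plt.loglog(h, np.abs(stencil - exact), label='five-point stencil')
plt.loglog(h, np.abs(symm - exact), label='symmetric difference')
plt.xlabel('h'); plt.ylabel('E(h)'); plt.legend()
plt.show()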
3 (Taylor series and convergence) Consider the following Taylor series.
$$\sinh x = \sum_{n=0}^{\infty} \frac{x^{2n+1}}{(2n+1)!},$$
$$\tan^{-1} x = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{2n+1},$$
$$\sqrt{1+x} = \sum_{n=0}^{\infty} \frac{(-1)^{n+1}(2n)!}{4^n (n!)^2 (2n-1)}\, x^n.$$
a. Modify the code taylor.ipynb (§2.3) to produce the graph of each of the above
functions and its Taylor series, similar to fig. 2.3.
Experiment with the domain of the plot, and conjecture the radius of convergence
for each Taylor series. (For example, you should find that for $\sqrt{1+x}$, the radius of convergence is 1.)
b. Choose a value of x for which all three Taylor series converge to the respective
function (for example, x = 0.3).
Plot the absolute error between each function and its Taylor polynomial as a function of the polynomial order N, similar to fig. 2.4 (but forget about the upper and lower bounds in dotted lines). Plot the absolute error for all three Taylor series on the same set of axes.
From the plot, deduce which Taylor series converges fastest and which slowest.
(Answer: $\sinh x$ is fastest, $\sqrt{1+x}$ is slowest. A sketch for this part appears after the exercise.)
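A minimal sketch for part (b) at x = 0.3, comparing the three partial sums directly (an assumed setup rather than a modification of taylor.ipynb):

import numpy as np
from math import factorial

x, N = 0.3, 20
ks = range(N)
sinh_N = sum(x**(2*k+1)/factorial(2*k+1) for k in ks)
atan_N = sum((-1)**k * x**(2*k+1)/(2*k+1) for k in ks)
sqrt_N = sum((-1)**(k+1) * factorial(2*k)
             / (4**k * factorial(k)**2 * (2*k-1)) * x**k for k in ks)
# Absolute errors after N terms; sinh converges fastest, sqrt slowest
print(abs(sinh_N - np.sinh(x)), abs(atan_N - np.arctan(x)),
      abs(sqrt_N - np.sqrt(1 + x)))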
4 (Taylor's Theorem) Consider $f(x) = \ln(1+x)$. We showed using Taylor's Theorem (theorem 2.3) that the remainder $R_N$ is given by eq. 2.9.
a. Fix x = 0.4. Using the Taylor polynomial of order 1, show that the constant ξ is approximately 0.1222 (to 4 dec. pl.).
b. Plot a graph of ξ as a function of N. You should obtain a decreasing function. Verify
that ξ always lies in the interval (0, 0.4), in accordance with Taylor’s Theorem.
On the domain [0, 1], you should obtain the graph in fig. 2.12 (which looks like the
eponymous dessert). Clearly you will need to impose some cutoff to approximate
the infinite sum.
c. Generalise the Blancmange function by redefining $f_m(x)$ as
$$f_{m,K}(x) = \frac{1}{K^m} f_0(K^m x).$$
Plot the generalised Blancmange function $g_K(x) = \sum_{m=0}^{\infty} f_{m,K}(x)$ for $K = 2, 3, 4, \ldots, 20$ on the same set of axes (a sketch follows after this exercise).
(Better yet, vary the plot using a slider for K. Use weierstrass.ipynb as a
template).
Conjecture the shape of the function when K → ∞. Can you prove it?
d. For K = 2, create a figure which shows the self-similarity structure of the graph (in
the style of fig. 1.15). Note also that the graph is periodic on R.
Conjecture the values of K ∈ R for which the graph shows i) periodicity, or ii)
self-similarity.
For an accessible account of the Blancmange function and its properties, see [200].
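A minimal sketch for part (c), assuming the standard Blancmange building block $f_0(x)$ = distance from $x$ to the nearest integer (this definition is an assumption here, as it comes from an earlier part of the exercise):

import numpy as np
import matplotlib.pyplot as plt

def f0(x):
    # Distance from x to the nearest integer (assumed definition of f0)
    return np.abs(x - np.round(x))

def g(x, K, cutoff=25):
    # Truncate the infinite sum at m = cutoff; later terms are negligible
    return sum(f0(K**m * x)/K**m for m in range(cutoff))

x = np.linspace(0, 1, 2000)
for K in [2, 3, 4]:
    plt.plot(x, g(x, K), label=f'K = {K}')
plt.legend()
plt.show()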