Exploring University Mathematics with Python

Siri Chongchitnan

Exploring University
Mathematics with Python
Siri Chongchitnan
Mathematics Institute
University of Warwick
Coventry, UK

ISBN 978-3-031-46269-6 ISBN 978-3-031-46270-2 (eBook)


https://doi.org/10.1007/978-3-031-46270-2

Mathematics Subject Classification (2020): 00-01

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the
whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not
imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions
that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Paper in this product is recyclable.


To my teachers
Preface

Motivation
Python is now arguably the world’s most popular programming language, thanks to its
gentle learning curve, clear syntax, wide range of open-source libraries and active online
support community. Over the past decade, Python programming has become a highly
desirable skill for employers, not only in the STEM and ICT sectors but also in any industry
involving data. It comes as no surprise, then, that Python has now been integrated into school
and university curricula around the world.
A typical university mathematics curriculum would include some element of programming,
usually in a standalone module. However, in my experience teaching at several UK
universities, students often regard programming as just another module that is disparate
from a typical ‘pen-and-paper’ module.
In my opinion, this is an extremely unhealthy viewpoint, because programming can
often help us gain a more solid understanding of mathematics in comparison to a purely
pen-and-paper approach. It is true that much of university mathematics is driven by theorems
and proofs, and it is also true that Python does not prove theorems. However, Python gives
us the power and freedom to glimpse into the unknown, leading us towards insightful
conjectures that would have been difficult to formulate otherwise.
Hence, I was motivated to write a mathematics textbook that is richly interwoven with
Python, rather than another Python textbook with some mathematical examples. The spirit
of this book is one of mathematical exploration and investigation. I want to show students
that Python can hugely enrich our understanding of mathematics through:
• Calculation: Performing complex calculations and numerical simulations instantly;
• Visualisation: Demonstrating key theorems with graphs, interactive plots and animations;
• Extension: Using numerical findings as inspiration for making deeper, more general
conjectures.

Who is this book for?


I wrote this book for all learners of mathematics, with the primary audience being mathematics
undergraduates who are curious to see how Python can enhance their understanding
of core university material. The topics I have chosen represent a mathematical tour of what
students typically study in the first and second years at university. As such, this book can
also serve as a preview for high-school students who are keen to learn what mathematics is
like at university.


In addition, I hope this book will also benefit mathematics lecturers and teachers who
want to incorporate programming into their course. I hope to convince educators that
programming can be a meaningful part of any mathematical module.
Structure
The topics covered in this book are broadly analysis, algebra, calculus, differential
equations, probability and statistics. Each chapter begins with a brief overview of the
subject and essential background knowledge, followed by a series of questions in which key
concepts and important theorems are explored with the help of Python programs which
have been succinctly annotated. All code is available to download online.
At the end of each section, I present a Discussion section which dives deeper into the
topic. There are also a number of exercises (most of which involve coding) at the end of
each chapter.
Assumed knowledge
In terms of programming knowledge, this book does not assume that you are a highly
experienced user of Python. On the other hand, complete beginners to Python might struggle
to follow the code given. I would suggest that the reader should have the most basic
knowledge of programming (e.g. you know what a for loop does).
For completeness, I have included a section called Python 101 (Appendix A) which
gives instructions on installing Python and signposts to references that can help anyone pick
up Python quickly.
In terms of mathematical knowledge, I do not assume any university-level mathematics.
Students who are familiar with the material in the standard A-Level mathematics (or
equivalent) should be able to follow the mathematical discussions in this book.
Acknowledgements
I am extremely grateful to have received extensive comments from my early reviewers,
many of whom are students at my home department, Warwick Mathematics Institute. They
are:

Maleeha Ahmad, Michael Cavaliere, Carl Beinuo Guo, Rachel Haddad, Trinity Jinglan Hu,
Rachel Siyi Hong, Kornel Ipacs, Kit Liu, William Mau, Ben Middlemass, Kush Patel,
Payal Patel, Safeeyah Rashid, Danny Robson, Zac Ruane, Reduan Soroar, Kyle Thompson,
Ben Wadsworth and Jiewei Xiong.

I would also like to thank the team at Springer for their support. Special thanks to Richard
Kruel for his belief in my idea and his unwavering encouragement.
Siri Chongchitnan
Coventry, UK

The code
a) Downloading and using the code
All code is available to download from

https://github.com/siriwarwick/book
We will be coding in Python 3 (ideally 3.9 or higher). To find out which version you
have, see the code box below. If you don’t have Python installed, see Appendix A.
There are many ways to run Python programs, but by default, I will assume that you
are working in JupyterLab (or Jupyter Notebook). You will be working with files with
the .ipynb extension. For more information on the Jupyter IDE (integrated development
environment), see https://jupyter.org.
There are alternative IDEs to Jupyter, for example:
• IDLE (which comes as standard with your Python distribution);
• Spyder (https://www.spyder-ide.org);
• PyCharm (https://www.jetbrains.com/pycharm);
• Replit (https://replit.com).
If you prefer these IDEs, you will be working with files with the .py extension.
Code and annotations will be given in grey boxes as shown below. The code is given on
the right, whilst the explanation is given on the left.

filename.ipynb (for checking Python version)

# Let's check your version of Python
from platform import python_version
python_version()

b) About %matplotlib
We will often use the line
%matplotlib
to make any plot interactive (rather than the usual static 'inline' plot). The zoom and pan buttons
in the GUI window will be particularly useful. To return the static plot, use the command:

%matplotlib inline

%matplotlib is one of the so-called 'magic' commands¹ that only work in the Jupyter
environment and not in standard Python.
If you have difficulty running with just %matplotlib, try running this line of code in an
empty cell
%matplotlib -l
(that’s a small letter L after the dash). This should list all the graphics backends available
on your machine. Choose one that works for you. For example, if qt is on the list, replace
%matplotlib by
%matplotlib qt

¹ For more on magic commands, see https://ipython.readthedocs.io/en/stable/interactive/magics.html

c) Coding style
Everyone has their own style of coding. The code that I present in this book is just
one of many ways of approaching the problems. Keeping the purpose and the audi-
ence in mind, I tried to minimise the use of special packages and clever one-liners,
but instead I placed a greater emphasis on readability. In particular, I do not use the
if __name__ == "__main__": idiom. My goal in this book is to show how a basic
knowledge of Python goes a long way in helping us delve deeply into mathematics.
In short, every piece of code shown can be improved in some ways.
A caveat: some code may produce warnings or even errors as new versions of Python
roll out after the publication of this book. If you feel something isn't quite right with the
code, check the book's GitHub page for possible updates.
d) Getting in touch
Comments, corrections and suggestions for improvement can be posted in the discussion
section of the book's GitHub page
https://github.com/siriwarwick/book/discussions
or emailed to siri.chongchitnan@warwick.ac.uk. I will be happy to hear from you
either way.
CONTENTS

1 Analysis
1.1 Basics of NumPy and Matplotlib
1.2 Basic concepts in analysis
1.3 The ε, N definition of convergence for sequences
1.4 Convergence of series
1.5 The Harmonic Series
1.6 The Fibonacci sequence
1.7 The ε, δ definition of continuity
1.8 Thomae's function
1.9 The Intermediate Value Theorem and root finding
1.10 Differentiation
1.11 The Mean Value Theorem
1.12 A counterexample in analysis
1.13 Exercises

2 Calculus
2.1 Basic calculus with SciPy
2.2 Comparison of differentiation formulae
2.3 Taylor series
2.4 Taylor's Theorem and the Remainder term
2.5 A continuous, nowhere differentiable function
2.6 Integration with Trapezium Rule
2.7 Integration with Simpson's Rule
2.8 Improper integrals
2.9 Fourier series
2.10 Exercises

3 Vector Calculus and Geometry
3.1 Basic concepts in vector calculus
3.2 The cycloid
3.3 Arc length of an ellipse
3.4 Curvature
3.5 Torsion
3.6 Quadric surfaces
3.7 Surface area
3.8 Normal to surfaces and the grad operator
3.9 The Divergence Theorem and the div operator
3.10 Stokes' theorem and the curl operator
3.11 Exercises

4 Differential Equations and Dynamical Systems
4.1 Basic concepts: ODEs, PDEs and recursions
4.2 Basics of Matplotlib animation
4.3 ODE I – first-order ODEs
4.4 ODE II – the pendulum
4.5 ODE III – the double pendulum
4.6 ODE IV – the Lorenz equations
4.7 The Logistic Map
4.8 The Mandelbrot set
4.9 PDE I – the heat equation
4.10 PDE II – the wave equation
4.11 Exercises

5 Linear Algebra
5.1 Basics of SymPy
5.2 Basic concepts in linear algebra
5.3 Linear systems in R^3
5.4 Four methods for solving Ax = b
5.5 Matrices as linear transformations
5.6 Eigenvalues and eigenvectors
5.7 Diagonalisation: Fibonacci revisited
5.8 Singular-Value Decomposition
5.9 The Rank-Nullity Theorem
5.10 Gram-Schmidt process and orthogonal polynomials
5.11 Exercises

6 Abstract Algebra and Number Theory
6.1 Basic concepts in abstract algebra
6.2 Basic concepts in number theory
6.3 Groups I – Cyclic group
6.4 Groups II – Dihedral group
6.5 Groups III – Symmetric and alternating groups
6.6 Quaternions
6.7 Elementary number theory I: Multiplicative inverse modulo n
6.8 Elementary number theory II: Chinese Remainder Theorem
6.9 Elementary number theory III: Quadratic residue and the reciprocity law
6.10 Analytic Number Theory I: The Prime Number Theorem
6.11 Analytic Number Theory II: The Riemann zeta function
6.12 Analytic Number Theory III: The Riemann Hypothesis
6.13 Exercises

7 Probability
7.1 Basic concepts in combinatorics
7.2 Basic concepts in probability
7.3 Basics of random numbers in Python
7.4 Pascal's triangle
7.5 Coin tossing
7.6 The Birthday Problem
7.7 The Monty Hall problem
7.8 The Normal distribution
7.9 The Poisson distribution
7.10 Monte Carlo integration
7.11 Buffon's needle
7.12 Exercises

8 Statistics
8.1 Basic concepts in statistics
8.2 Basics of statistical packages
8.3 Central Limit Theorem
8.4 Student's t distribution and hypothesis testing
8.5 χ² distribution and goodness of fit
8.6 Linear regression and Simpson's paradox
8.7 Bivariate normal distribution
8.8 Random walks
8.9 Bayesian inference
8.10 Machine learning: clustering and classification
8.11 Exercises

Appendix A: Python 101
A.1 Installation
A.2 Learning Python
A.3 Python data types
A.4 Random musings

References

Index

Biographical index



All code and brief descriptions


All code (in .ipynb and .py) is available to download from https://github.com/siriwarwick/book.

Each entry lists the section, the filename, and what the code does.

1. Analysis
1.3 sequence-convergence.ipynb: Plots terms in a sequence
1.4 series-convergence.ipynb: Plots partial sums of a series
1.5 harmonic.ipynb: Demonstrates that the harmonic number H_n is approximately ln n + γ
1.6 fibonacci.ipynb: Demonstrates that the Fibonacci sequence F_n scales like φ^n (where φ is the Golden Ratio)
1.7 continuity.ipynb: Explores the ε-δ definition of continuity using an ipywidgets slider (Jupyter only)
    continuityslider.ipynb: The same, using a Matplotlib slider
1.8 thomae.ipynb: Plots Thomae's function
1.9 bisection.ipynb: Performs bisection root finding
1.10 differentiation.ipynb: Performs numerical differentiation (forward Euler)
1.12 counterexample.ipynb: Shows a counter-intuitive property of the function f(x) = x + 2x² sin(1/x), showing its behaviour near the origin

2. Calculus
2.2 Eh.ipynb: Plots the error E(h) in numerical differentiation (forward difference) as a function of step size h
2.3 taylor.ipynb: Plots a function and its Taylor series of various degrees
2.4 taylorthm.ipynb: Plots the error R_N in the Taylor-series approximation as a function of the degree N
2.5 weierstrass.ipynb: Plots the Weierstrass function
2.6 trapezium.ipynb: Plots the error E(h) in numerical integration (Trapezium Rule) as a function of step size h
2.7 simpson.ipynb: Plots the error E(h) in numerical integration (Simpson's Rule) as a function of step size h
2.8 improper.ipynb: Plots the error in numerical integration of an improper integral
2.9 fourier.ipynb: Plots the Fourier series (up to n terms) of a given function

3. Vector Calculus
3.2 cycloid.ipynb: Generates a cycloid from a rolling wheel
3.3 ellipse.ipynb: Calculates the perimeter of an ellipse and compares with Ramanujan's approximation
3.4 curvature.ipynb: Plots a parametric curve in R^2 and its curvature κ
3.5 torsion.ipynb: Plots a parametric curve in R^3 and its torsion τ
3.6 quadrics.ipynb: Visualises a family of quadric surfaces
3.7 surfacearea.ipynb: Calculates the area of (part of) an ellipsoidal surface
3.8 grad.ipynb: Plots a 3D surface and its contour lines
3.9 div.ipynb: Demonstrates the divergence theorem
3.10 curl.ipynb: Demonstrates Stokes' theorem for a family of surfaces

4. Differential Equations
4.3 odesolver.ipynb: Solves a first-order ODE numerically and plots the result
4.4 pendulum.ipynb: Animates a pendulum by solving a 2nd-order ODE
4.5 doublependulum.ipynb: Animates the (chaotic) double pendulum by solving a system of ODEs
4.6 lorenz.ipynb: Animates the Lorenz system, showing a strange attractor
4.7 logistic.ipynb: Plots the bifurcation diagram for the logistic map
4.8 mandelbrot.ipynb: Plots the Mandelbrot set
    mandelbrot3D.ipynb: Visualises the connection between the Mandelbrot set and the logistic map
4.9 heat.ipynb: Animates the solution of the 1D heat equation
4.10 wave.ipynb: Animates the solution of the 2D wave equation

5. Linear Algebra
5.3 planes.ipynb: Visualises a system of linear equations as planes in R^3
5.4 solvetimes.ipynb: Plots the time taken to solve a linear system as a function of matrix dimension
5.5 transformation.ipynb: Visualises a 2×2 matrix as a geometric transformation
5.6 eigshow.ipynb: Visualises the eigenvectors and eigenvalues of a 2×2 matrix via a game
5.7 diagonalise.ipynb: Visualises matrix diagonalisation as a sequence of 3 transformations
5.8 svd.ipynb: Performs singular-value decomposition of an image
5.9 ranknullity.ipynb: Visualises the rank-nullity theorem
5.10 gramschmidt.ipynb: Plots the Legendre polynomials obtained via the Gram-Schmidt process

6. Algebra & Number Theory
6.3 cayley.ipynb: Produces the Cayley table for a group
6.4 dihedral.ipynb: Visualises the dihedral group as rotation and reflection of a polygon
6.5 permutation.ipynb: Produces the Cayley table for a permutation group
6.6 quaternion.ipynb: Rotates a point about an axis in R^3 using a quaternion
6.7 inversewheel.ipynb: Produces an intriguing pattern in a circle using pairs of multiplicative inverses modulo n
6.8 crt.ipynb: Visualises the solutions of a pair of congruences
6.9 legendre.ipynb: Visualises the Quadratic Reciprocity Law
6.10 pnt.ipynb: Visualises the Prime-Number Theorem
6.12 zetaanim.ipynb: Animates the image of the critical line Re(s) = 1/2 under the Riemann zeta function

7. Probability
7.4 pascal.ipynb: Produces Pascal's triangle modulo 2
7.5 coin1.ipynb: Simulates multiple coin throws and plots the distribution of the number of heads observed
    coin2.ipynb: Plots the distribution of the first occurrence of a particular sequence in multiple coin throws
7.6 birthday.ipynb: Simulates the birthday problem
7.7 montyhall.ipynb: Simulates the Monty Hall problem
7.8 normal.ipynb: Simulates the Galton board (visualising the normal distribution)
7.9 poisson.ipynb: Simulates scattering dots randomly in a grid and plots the distribution of dot counts per square
7.10 montecarlo.ipynb: Plots the fractional error for Monte Carlo integration as a function of the number of random points used
7.11 buffon.ipynb: Simulates Buffon's needle problem

8. Statistics
8.3 CLT.ipynb: Demonstrates the Central Limit Theorem by sampling from a given distribution
8.4 ttest.ipynb: Plots the distribution of the population mean from a small sample and performs a one-sample t-test
8.5 chi2test.ipynb: Plots the observed and expected frequencies for categorical data and performs a χ² test
8.6 regression.ipynb: Plots a regression line through data points and demonstrates Simpson's paradox
8.7 bivariate.ipynb: Plots the bivariate normal distribution and its contour ellipses
8.8 randomwalk.ipynb: Generates 1D random walks and plots the mean distance travelled as a function of time step
8.9 bayesian.ipynb: Plots the prior and posterior for Bayesian inference
8.10 clustering.ipynb: Performs k-means clustering
     classification.ipynb: Performs k-nearest neighbour (kNN) classification

Visualisation recipes
Each recipe lists the section where it appears and the corresponding code.

Plots
• Plot with different types of lines: 1.3, sequence-convergence.ipynb
• Plot a large number of curves, with a gradual colour change: 2.3, taylor.ipynb
• Two-panel plot in R^2: 1.5, harmonic.ipynb
• Plot parametric curves and surfaces in R^2 and R^3: 3.1, see text
• Plot polar curves: 3.1, see text
• Plotting in R^3 with Plotly: 5.3, planes.ipynb
• Three-panel plot in R^3: 5.9, ranknullity.ipynb
• Plot one panel in R^2 next to one in R^3: 3.8, grad.ipynb
• Plot a 3D surface and its contour lines: 3.8, grad.ipynb
• Shade the area under a graph: 8.4, ttest.ipynb
• Draw a filled polygon: 6.4, dihedral.ipynb

Sliders
• Slider controlling a plot in R^2: 1.7, continuityslider.ipynb
• Slider controlling a plot in R^3: 3.6, quadrics.ipynb
• One slider controlling two plots in R^2: 3.4, curvature.ipynb
• One slider controlling two plots, one in R^2 and one in R^3: 3.5, torsion.ipynb; 8.7, bivariate.ipynb
• Two sliders controlling one plot in R^2: 2.5, weierstrass.ipynb

Heatmaps
• Heatmap + vector field + slider (in R^2): 3.9, div.ipynb
• Heatmap + slider (polar coordinates): 3.10, curl.ipynb

Animations
• Animation in R^2: 4.4, pendulum.ipynb; 4.9, heat.ipynb
• Animation with two panels in R^2: 6.12, zetaanim.ipynb
• Animation in R^3: 4.6, lorenz.ipynb; 4.10, wave.ipynb

Visualising matrices
• Display a matrix with colour-coded elements: 6.3, cayley.ipynb; 4.8, mandelbrot.ipynb
• Read in an image file: 5.8, svd.ipynb

Visualising data
• Plot a histogram with Matplotlib: 7.5, coin1.ipynb
• Plot a histogram with Seaborn: 8.3, CLT.ipynb
• Fit a line through a scatter plot: 7.10, montecarlo.ipynb
• Read data from a file and store as Pandas dataframes: 8.6, regression.ipynb
• Write data to a file: 8.7, see text
• Shade 2D regions according to classifications with Scikit-learn: 8.10, classification.ipynb
CHAPTER
ONE

Analysis

Real analysis is the study of real numbers and maps between them (i.e. functions). Analysis
provides a solid foundation for calculus, which in turn gives rise to the development of other
branches of mathematics. The main feature of analysis is its rigour, meaning that every
concept in analysis is defined precisely with logical statements without the need for pictures.

Fig. 1.1: Augustin-Louis Cauchy (1789–1857), one of the founders of modern analysis.
Cauchy is commemorated on a French stamp shown on the right. (Image source: [137].)

Analysis is a subject which many new university students find to be most different
from how mathematics is taught in school. The proof-driven nature of the subject can
be overwhelming to some students. In this chapter, we will see how Python can help us
visualise and understand key concepts in analysis.
In addition to real numbers, the main objects in analysis are sequences, series and
functions. We will first see how these objects can be represented and manipulated in
Python. We then give a survey of some key theorems in analysis and see how Python
can help us understand these theorems more deeply. We will focus on understanding and
visualising the theorems themselves, rather than the proofs. The proofs of the theorems
discussed in this chapter can be found in good analysis textbooks, amongst which we
recommend [10, 20, 91, 143, 189].


1.1 Basics of NumPy and Matplotlib

NumPy is an indispensable Python library for mathematical computing. Most mathematical


objects and operations that you are familiar with are part of NumPy. Equally indispensable
is the Matplotlib library which is the key to visualising mathematics using graphs and
animations. It is standard practice to import NumPy and Matplotlib together using the
following two lines at the beginning of your code.
import numpy as np
import matplotlib.pyplot as plt

1.2 Basic concepts in analysis

Sequences

A sequence (a_n) is simply a succession of numbers, with the subscript n = 1, 2, 3, . . . labelling
the order of appearance of each term. For example, the sequence (a_n) = (1/n²) contains the
terms a_1 = 1, a_2 = 1/4, a_3 = 1/9 and so on. The bracketed notation (a_n) denotes the entire
sequence, whereas a_n denotes a single term in the sequence.
Sequences can be finite, for example, (a_n)_{n=1}^5 = 1, 1/4, 1/9, 1/16, 1/25. However, in analysis, we
are mainly interested in the behaviour of infinite sequences (a_n)_{n=1}^∞.
Sequences are easily represented as NumPy arrays, which are one of the most useful
objects in Python (another method is to use lists which we will discuss later). Naturally, we
cannot store infinitely long arrays in Python. But take the array to be long enough and it
will usually be sufficient to reveal something about the behaviour of the infinite sequence.
It is often necessary to create a sequence of equally spaced numbers between two given
numbers (in other words, an arithmetic progression). Such a sequence can be easily created
using NumPy’s linspace command:
np.linspace(first element, last element, how many elements)
The third argument is optional. If not provided, the default is 50 elements, equally spaced
between the first and the last elements. It is worth noting the terminology here: each term in
a sequence can be identified with an element in a NumPy array. linspace is especially
useful for plotting graphs, since we usually want equally spaced points along the x-axis.
The same sequence can also be created using the arange command of the form:
np.arange(first element, >last element, step)
where >last element means any number greater than the last element, but not exceeding last
element + step. This is because, by convention, the 2nd argument in arange is not included
in an arange array. In practice, just choose a number that is slightly larger than the final
element desired. The third argument is optional, with the default = 1.
If you want to create a short sequence by manually specifying every element, use the
following syntax:
np.array([element1, element2, . . . , last element])
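As a quick illustration of these conventions (a snippet of our own, not from the book's downloadable code):

import numpy as np

print(np.linspace(0, 1, 5))      # [0.   0.25 0.5  0.75 1.  ] : endpoint included
print(np.arange(0, 1, 0.25))     # [0.   0.25 0.5  0.75]      : endpoint excluded
print(np.array([1, 4, 9, 16]))   # [ 1  4  9 16]              : elements given manually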

Here are some examples of how to create sequences in Python and obtain new ones using
array operations.

Creating sequences as arrays

import numpy as np                         # standard header (for all code on this page)

np.linspace(0, 5, 100)                     # 100 evenly spaced values from 0 to 5
a_n = np.arange(3, 100)                    # (a_n) = 3, 4, 5, ..., 99
b_n = np.arange(3, 99.1, 2)                # (b_n) = 3, 5, 7, ..., 99
c_n = np.array([1/3, np.sqrt(2), np.pi])   # (c_n) = 1/3, √2, π
d_n = np.zeros(100)                        # (d_n) = 100 terms, all 0
e_n = np.full(100, 0.7)                    # (e_n) = 100 terms, all 0.7

Array operations

x_n = 2*a_n - 5                            # addition and scalar multiplication: (x_n) = (2a_n − 5)
y_n = a_n + x_n                            # (y_n) = (a_n + x_n)
x_n*y_n                                    # multiplication: (x_n y_n)
2/x_n                                      # division: (2/x_n)
x_n/y_n                                    # (x_n/y_n)
x_n**3                                     # exponentiation: (x_n³)
3**x_n                                     # (3^x_n)

Other useful array commands

len(x_n)                                   # the number of elements in the array x_n
x_n.shape                                  # shape of the array

Calling elements of (x_n) by index

x_n[0]                                     # the first element
x_n[4]                                     # the fifth element
x_n[-1]                                    # the last element
x_n[0:9]                                   # the first 9 elements (slicing)

z_n = np.zeros_like(x_n)                   # (z_n) = zero array, same shape as x_n
w_n = np.full_like(x_n, 0.7)               # (w_n) = array of 0.7, same shape as x_n

Here are some Python tips to help avoid errors.


• Initial index In Python, the index of each array starts from 0, not 1.
• ** means power In some programming languages (e.g. MATLAB), x² is coded as
x^2. However, in Python, the operation a^b is bitwise XOR, i.e. a and b are converted
to binary and added bitwise modulo 2 (see the short demonstration after this list). This
is something that we will seldom need in mathematics problems, so be careful!
• List vs array Instead of a NumPy array, one can also work with a Python list. There
are subtle differences between the two data types. For most numerical purposes, we
will be using arrays, although there will be occasions where we will also use lists, or
mix the two data types. We compare arrays and lists in detail in the Appendix A.3.
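Here is a short demonstration of the difference between ** and ^ (our own snippet, not from the book's downloadable code):

print(3**2)   # 9, since ** is exponentiation
print(3^2)    # 1, since ^ is bitwise XOR: 11 XOR 10 = 01 in binary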

Series

A series is the sum of the terms in a sequence. If the sequence is infinitely long, the sum of
the terms is an infinite series, denoted ∑_{n=1}^∞ x_n. For example, the sequence (1/n²)_{n=1}^∞ gives rise
to the series

    ∑_{n=1}^∞ 1/n² = 1 + 1/4 + 1/9 + 1/16 + · · ·

Of course, Python cannot deal with infinitely many terms, but we can still compute the
partial sums

    S_N = ∑_{n=1}^N x_n = x_1 + x_2 + · · · + x_N

for large values of N. These partial sums should give us a good idea about whether the
series converges to a finite number, or diverges (e.g. the sum becomes arbitrarily large, or
fluctuates without settling down to a value). We will study the convergence of a number of
interesting series in §1.4.
A useful command for evaluating series is sum(an array). Another useful operator in
Python 3 is the @ operator, which is equivalent to the dot product of two arrays, i.e.
x@y = np.dot(x, y) = sum(x*y)
Of course there is also the option of using the for loop. This is the most appropriate method
if you need to plot the partial sums. Here are two examples of series evaluation using various
methods.
Series evaluation

import numpy as np

# ∑_{n=1}^{100} n
x = np.arange(1, 101)
sum(x)
# or
S = 0
for n in np.arange(1, 101):
    S += n

# ∑_{n=1}^{100} n(n+3)
x@(x+3)         # or
sum(x*(x+3))    # or
np.dot(x, x+3)  # or
S = 0
for n in np.arange(1, 101):
    S += n*(n+3)

Functions

A function f is a mapping which takes as its input an object x ∈ A (where A is called the
domain), and produces an output f (x) ∈ B (where B is called the codomain). We denote
such a mapping by the notation f : A → B.

Functions in Python are defined using the def command, which has a rich structure. A
function expects brackets in which arguments (if any) are placed. In most cases, you will
also want your function to return an output.
Instead of def, another way to define a function in a single line is to use the lambda
function (also called anonymous function).
For example, let F (x) = 2x + 1 and G(x, y) = x + y. In the box below, the left side
shows the def method. The right side shows the lambda-function method.
Defining functions

# Method I: def
def F(x):
    return 2*x + 1

def G(x, y):
    return x + y

# Method II: lambda functions
F = lambda x: 2*x + 1
G = lambda x, y: x + y

We often use lambda functions when creating very simple functions, or when feeding a
function into another function. In the latter case, the lambda function does not even need
to have a name (hence ‘anonymous’). This will be useful when we discuss integration in
Python – for a preview, take a look at the end of §3.1.
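As a tiny preview (an illustrative snippet of our own; it assumes SciPy is installed), an anonymous lambda function can be fed straight into SciPy's quad integrator:

from scipy.integrate import quad

# Integrate x^2 from 0 to 1; quad returns (value, error estimate)
area, err = quad(lambda x: x**2, 0, 1)
print(area)    # 0.33333...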
Now that we can define sequences, series and functions in Python, we are ready to
investigate some key definitions and theorems in analysis.

1.3 The ε, N definition of convergence for sequences

A sequence of real numbers (x_n) is said to converge to x ∈ ℝ (and we write lim_{n→∞} x_n = x) if:

For any ε > 0, there exists an integer N ∈ ℕ such that, for all n > N, |x_n − x| < ε.

Consider sequences defined by the following expressions for each term:

a) x_n = 1 − 1/n    b) y_n = (sin n)/n    c) z_n = (ln n)/n    d) E_n = (1 + 1/n)ⁿ.

For each sequence, find the limit as n → ∞. Investigate the relationship between ε
and N.

This is one of the most important definitions in analysis. It gives a rigorous way to express
an intuitive concept of convergence as n → ∞ very precisely. Let’s first try to unpack the
definition.
Let's call |x_n − x| the error (i.e. the absolute difference between the nth term of the
sequence and the limit x). The definition above simply says that down the tail of the sequence,
the error becomes arbitrarily small.
If someone gives us ε (typically a tiny positive number), we need to find how far down
the tail we have to go for the error to be smaller than ε. The further we have to go down the
tail, the larger N becomes. In other words, N depends on ε.
Now let's consider the first sequence x_n = 1 − 1/n. The first step is to guess the limit as
n → ∞. Clearly, the 1/n term becomes negligible and we guess that the limit is 1. Now it
remains to prove this conjecture formally. We go back to the definition: for any given ε > 0,
the error is given by

    |x_n − 1| = 1/n < 1/N < ε,

where the final inequality holds if we let N be any integer greater than 1/ε. For instance, if
someone gives us ε = 3/10, then any integer N ≥ 10/3 would work (e.g. N = 4). We have
shown that this choice of N implies that the definition holds, hence we have proved that
lim_{n→∞} x_n = 1.
The definition holds if there exists one such integer N, so that in the proof above, N + 1
or 3N² + 5 are also equally good candidates (one just goes further down the tail of the
sequence). However, it is still interesting to think about the smallest N (let's call it N_min)
that would make the definition of convergence work. We could tabulate some values of N_min
for the sequence x_n.
ε       0.3   0.2   0.1   0.05   0.03
N_min   4     5     10    20     34
We can easily confirm these calculations with the plot of the error |x_n − 1| against n
(solid line in fig. 1.2).
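These tabulated values can be reproduced (a check of our own) by taking N to be the smallest integer ≥ 1/ε, i.e. N = ⌈1/ε⌉:

import numpy as np
for eps in [0.3, 0.2, 0.1, 0.05, 0.03]:
    print(eps, int(np.ceil(1/eps)))   # prints 4, 5, 10, 20, 34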

Next, let's consider the sequence y_n. What does it converge to? Since sin n is bounded
by −1 and 1 for all n ∈ ℕ, whilst n can be arbitrarily large, we guess that y_n converges to 0.
The proof can be constructed in the same way as before. For any ε > 0, we find the error:

    |y_n − 0| = |sin n|/n ≤ 1/n < 1/N < ε,    (1.1)

where the final inequality holds if we let N be any integer greater than 1/ε. However, the
latter may not be N_min, because we have introduced an additional estimate |sin n| ≤ 1.
It seems that N_min cannot be obtained easily by hand, and the error plot (dashed line in
fig. 1.2) also suggests this. For example, we find that when ε = 0.1, N_min = 8 rather than 10
as we might have expected from Eq. 1.1.
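We can find this N_min by brute force (a snippet of our own). Since |y_n| ≤ 1/n < ε for all n > 10 when ε = 0.1, it suffices to locate the last n at which the error still reaches ε:

import numpy as np
eps = 0.1
n = np.arange(1, 1001)           # safely beyond 1/eps
err = np.abs(np.sin(n)/n)
print(n[err >= eps].max())       # 8: the error stays below eps for all n > 8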
A similar analysis of the relationship between ε and N can be done for the remaining
sequences z_n and E_n. The sequence z_n converges to 0 because, intuitively, ln n is negligible
compared with n when n is large (we discuss how this can be proved more rigorously in the
Discussion section).
Finally, the famous sequence E_n can be regarded as the definition of Euler's number e:

    e = lim_{n→∞} (1 + 1/n)ⁿ.

The errors of all these sequences approach zero, as shown in fig. 1.2 below. Matplotlib’s
plot function renders the errors as continuous lines, but keep in mind that the errors form
sequences that are defined only at integer n, and the lines are for visualisation purposes only.

Fig. 1.2: The error, i.e. the absolute difference between the sequences x_n, y_n, z_n, E_n and
their respective limits.

The code sequence-convergence.ipynb plots fig. 1.2. Remember that all code in
this book can be downloaded from the GitHub page given in the Preface.

sequence-convergence.ipynb (for plotting fig. 1.2)

import numpy as np
import matplotlib.pyplot as plt

n = np.arange(1, 41)          # consider up to n = 40

recp = 1/n
xn = 1 - recp                 # define x_n
err_xn = recp                 # error |x_n - x| where x = lim x_n = 1

yn = np.sin(n)/n              # lim y_n = 0
err_yn = abs(yn)

zn = np.log(n)/n              # lim z_n = 0
err_zn = abs(zn)

En = (1 + recp)**n            # lim E_n = e
err_En = abs(En - np.e)

# Plot with dots (markersize = 3) joined by various line styles
plt.plot(n, err_xn, 'o-' , ms=3)
plt.plot(n, err_yn, 'o--', ms=3)
plt.plot(n, err_zn, 'o-.', ms=3)
plt.plot(n, err_En, 'o:' , ms=3)

plt.xlim(0, 40)
plt.ylim(0, 0.4)
plt.xlabel(r'$n$')            # r allows typesetting in LaTeX
plt.ylabel('|Error|')
plt.legend([r'$|x_n-1|$', r'$|y_n|$',
            r'$|z_n|$', r'$|E_n-e|$'])
plt.grid('on')
plt.show()

If at some point the error shrinks monotonically (i.e. it does not fluctuate like the error
for y_n), we can use the code below to search for N_min for a given value of epsilon.
Take E_n for example.
Finding N given ε

epsilon = 1e-5                    # for any given ε, say 10⁻⁵
n = 1
err = np.e - (1 + 1/n)**n
while err >= epsilon:             # continue searching until the error < ε
    n += 1                        # increase n by 1 per iteration
    err = np.e - (1 + 1/n)**n

print("N_min = ", n-1)            # report N_min

In this case, we find that for the error to satisfy |E_n − e| < 10⁻⁵, we need n > N_min = 135912.

Discussion
• Floor and ceiling. Instead of saying "N is the smallest integer greater than or equal to
x", we can write

    N = ⌈x⌉,

which reads "the ceiling of x". For example, in our proof that the sequence x_n = 1 − 1/n
converges, given any ε > 0, we can take N = ⌈ε⁻¹⌉.
Similarly, one can define ⌊x⌋ (the floor of x) as the largest integer less than or equal
to x.
• Squeeze Theorem. Consider y_n = (sin n)/n. The numerator is bounded between −1 and
1, so we see that

    −1/n ≤ y_n ≤ 1/n.

As n becomes large, y_n is squeezed between 2 tiny numbers of opposite signs. Thus,
we have good reasons to believe that y_n also converges to 0. This idea is formalised by
the following important theorem.

Theorem 1.1 (Squeeze Theorem) Let a_n, b_n, c_n be sequences such that a_n ≤ b_n ≤ c_n
for all n ∈ ℕ. If lim_{n→∞} a_n = lim_{n→∞} c_n, then lim_{n→∞} a_n = lim_{n→∞} b_n = lim_{n→∞} c_n.

The Squeeze Theorem can also be used to prove that z_n = (ln n)/n → 0. Using the
inequality ln x < x for x > 0, observe that

    0 ≤ (ln n)/n = (2 ln √n)/n < 2√n/n = 2/√n.

Now take the limit as n → ∞ to find that lim_{n→∞} (ln n)/n = 0.
• Monotone convergence. Why is the sequence E_n = (1 + 1/n)ⁿ convergent? First, we
will show that E_n < 3. The binomial expansion gives:

    E_n = (1 + 1/n)ⁿ = 1 + 1 + ∑_{k=2}^n (n choose k) · 1/nᵏ.

Observe that:

    (n choose k) · 1/nᵏ = (1/k!) · [n(n−1)(n−2) · · · (n−(k−1))] / (n·n·n · · · n)
                        = (1/k!) · 1 · (1 − 1/n)(1 − 2/n) · · · (1 − (k−1)/n)
                        < 1/k!
                        = 1/(1·2·3 · · · k)
                        ≤ 1/(1·2·2 · · · 2)
                        = 1/2^(k−1).

Therefore, the sequence is bounded above by the sum of a geometric series:

    E_n < 2 + ∑_{k=2}^n 1/2^(k−1) < 2 + ∑_{k=2}^∞ 1/2^(k−1) = 3.

Fig. 1.3: E_n = (1 + 1/n)ⁿ.

In addition, the graph of the sequence E_n (fig. 1.3) shows that the sequence is strictly
increasing (i.e. E_n < E_{n+1}). The proof is an exercise in inequalities (see [20]). These
facts imply that E_n converges, due to the following theorem for monotone (i.e. increasing
or decreasing) sequences.

Theorem 1.2 (Monotone Convergence Theorem) A monotone sequence is convergent
if and only if it is bounded.
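Both facts (monotonicity and boundedness) are easy to check numerically, at least over a finite range, with a snippet of our own:

import numpy as np
n = np.arange(1, 100001)
E = (1 + 1/n)**n
print(np.all(np.diff(E) > 0))    # True: E_n is strictly increasing (up to n = 10^5)
print(E.max() < 3)               # True: E_n stays below 3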

1.4 Convergence of series

By plotting the partial sums, conjecture whether each of the following series converges
or diverges.

a) ∑_{k=1}^∞ 1/k²   b) ∑_{k=1}^∞ (−1)^{k+1}/k   c) ∑_{k=1}^∞ 1/√k   d) ∑_{k=1}^∞ 1/(1 + ln k).

Let's have a look at one way to plot the partial sums for (a), where S_n = ∑_{k=1}^n 1/k².

series-convergence.ipynb (for plotting the blue solid curve in fig. 1.4)

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 11)             # calculate partial sums up to S_10
S = np.zeros(len(x))             # initialise S as an array of 10 zeros,
                                 # used to collect the partial sums

def afunc(k):                    # define each term as a function
    return 1/k**2

S[0] = afunc(1)                  # the first partial sum

for i in range(1, len(x)):
    S[i] = S[i-1] + afunc(x[i])  # S_i = S_{i-1} + (ith term)

plt.plot(x, S, 'o-')             # plot the partial sums, dots joined by a solid line
plt.xlim(1, 10)
plt.xlabel('Number of terms')
plt.ylabel('Partial sum')
plt.legend(['a'])
plt.grid('on')
plt.show()

The graphs for (b), (c) and (d) can be obtained similarly by augmenting the above code.
Fig. 1.4 shows the result. We are led to conjecture that series (a) and (b) converge, whilst (c)
and (d) diverge.
Fig. 1.4: The partial sums for the series (a)–(d) up to 10 terms.

Discussion
• p-series. Whilst the conjectures are correct, the graphs do not constitute a proof. In
analysis, we usually rely on a number of convergence tests to determine whether a series
converges or diverges. These tests yield the following very useful result in analysis:

Theorem 1.3 The p-series ∑ 1/nᵖ converges if p > 1 and diverges if p ≤ 1,

where the shorthand ∑ here means ∑_{n=n₀}^∞ for some integer n₀. This theorem explains
why series (a) converges and (c) diverges.
• Euler's series. The exact expression for the sum in series (a) was found by Euler in
1734. The famous result

    ∑_{n=1}^∞ 1/n² = π²/6    (1.2)

(sometimes called the Basel problem) has a special place in mathematics for its
wide-ranging connections, especially to number theory.
Euler's series (1.2) is often interpreted as a particular value of the Riemann zeta function
(ζ(2) = π²/6 ≈ 1.6449). See reference [4] for a number of accessible proofs of this
result. A proof using Fourier series will be discussed later in §2.9. We will study the
zeta function more carefully later in §6.11.
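We can watch the partial sums creep towards π²/6 with a quick check of our own:

import numpy as np
n = np.arange(1, 10**6 + 1)
print(np.sum(1/n**2), np.pi**2/6)   # 1.6449330..., 1.6449340...: the gap is about 1/N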
• Taylor series. The value of series (b) can be derived from the well-known Taylor
(Maclaurin) series

    ln(1 + x) = x − x²/2 + x³/3 − x⁴/4 + · · ·

valid for x ∈ (−1, 1]. Substituting x = 1 shows that series (b) converges to ln 2 ≈ 0.6931.
We will discuss why this holds on the interval (−1, 1] in §2.4.
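Again, a numerical check of our own shows the alternating sum approaching ln 2:

import numpy as np
k = np.arange(1, 10**6 + 1)
print(np.sum((-1.0)**(k+1)/k), np.log(2))   # 0.6931466..., 0.6931472...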
• Comparison test. Finally, we can understand why series (d) diverges by observing that,
because n > ln n for all n ∈ ℕ, we have

    1/(1 + ln n) > 1/(1 + n).

Thus, series (d) is greater than ∑ 1/(1 + n), which is divergent (it is, apart from its first
term, the p-series with p = 1).
This technique of deducing whether a series converges or diverges by considering its magnitude
relative to another series (usually a p-series) can be formally expressed as follows.

Theorem 1.4 (Comparison Test) Let x_n and y_n be real sequences such that (eventually)
0 ≤ x_n ≤ y_n. Then a) ∑ x_n converges if ∑ y_n converges; b) ∑ y_n diverges if ∑ x_n
diverges.

1.5 The Harmonic Series

The Harmonic Series is given by

    ∑_{n=1}^∞ 1/n = 1 + 1/2 + 1/3 + · · ·

Show that the partial sum of the series grows logarithmically (i.e. increases at the
same rate as the log function).

We wish to calculate the partial sums of the Harmonic Series, where each partial sum of
N terms is given by

    ∑_{n=1}^N 1/n.
As in the previous section, to calculate the partial sum of N + 1 terms in Python, we simply
add one extra term to the partial sum of N terms.
Another thing to note is that, because the sum grows very slowly, we will get a more
informative graph if we plot the x-axis using log scale, whilst keeping the y-axis linear.
This is achieved using the command semilogx.
The question suggests that we might want to compare the partial sums with the (natural)
log curve. We will plot the two curves together on the same set of axes. In fact, if the
partial sums up to N terms grow like ln N, then it might even be interesting to also plot the
difference between the two.
The code harmonic.ipynb produces two graphs, one stacked on top of the other. The
top panel shows the growth of the harmonic series in comparison with the log. The difference
is shown in the bottom panel. The calculation itself is rather short, but, as with many
programs in this book, making the plots informative and visually pleasing takes a little more
work.
The resulting plots, shown in fig. 1.5, reveal a very interesting phenomenon: the upper
panel shows that the partial sum grows very slowly, just like the log, but offset by a constant.
When we plot the difference between the two curves, we see that the difference lies between
0.57 and 0.58.
These graphs lead us to conjecture that there is a constant γ such that

    γ = lim_{N→∞} ( ∑_{n=1}^N 1/n − ln N ).    (1.3)

It is useful to express this as an approximation:

    For large N,   ∑_{n=1}^N 1/n ≈ ln N + γ.    (1.4)

harmonic.ipynb (for plotting fig. 1.5)

import numpy as np
import matplotlib.pyplot as plt

Nmax = 1e5                        # how many terms in the sum?
n = np.arange(1, Nmax+1)          # n = 1, 2, ..., Nmax
hplot = np.zeros_like(n)          # initialise hplot, used to collect the partial sums
harmo = 0                         # harmo is the running total

for N in np.arange(0, int(Nmax)): # collecting the partial sums
    harmo += 1/n[N]
    hplot[N] = harmo              # hplot[N] = ∑_{n=1}^{N+1} 1/n

# Create 2 stacked plots, specifying the dimensions in inches
fig, (ax1, ax2) = plt.subplots(2, figsize=(5,6))

# The upper plot (log scale on the x-axis) showing the Harmonic Series and ln n
ax1.semilogx(n, hplot, n, np.log(n), '--')
ax1.set_xlim([10, Nmax])
ax1.set_ylim([2, 12])
ax1.legend(['Harmonic series', r'$\ln n$'],
           loc = 'lower right')   # adjust legend location
ax1.grid('on')

# The lower plot in red (log x-axis) showing (Harmonic Series − ln n)
ax2.semilogx(n, hplot-np.log(n), 'r')
ax2.set_xlim([10, Nmax])
ax2.set_ylim([0.57, 0.63])
ax2.set_xlabel(r'$n$')
ax2.legend([r'Harmonic - $\ln n$'])
ax2.grid('on')
plt.show()

Discussion
• The Euler-Mascheroni constant. The constant γ is known as the Euler-Mascheroni
constant (not to be confused with Euler’s number e), where

γ = np.euler_gamma = 0.5772 . . .

consistent with our findings. The convergence can be proved by showing that the
difference is monotone decreasing (as seen in the lower panel of fig. 1.5) and bounded
below. Hence the limit exists by the monotone convergence theorem. A comprehensive
account of the history and mathematical significance of γ can be found in [121].
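A one-line check of our own against NumPy's built-in value:

import numpy as np
N = 10**6
print(np.sum(1/np.arange(1, N+1)) - np.log(N), np.euler_gamma)
# 0.5772161..., 0.5772156...: the difference is about 1/(2N)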
• The Harmonic Series is divergent. Another observation from the graphs is that the
Harmonic Series diverges to ∞, just like the log. In fact, we can deduce the divergence
by the following proof by contradiction. Suppose that the Harmonic series converges to
S, then, grouping the terms pairwise, we find

1 1 1 1 1
! ! !
S = 1+ + + + + +...
2 3 4 5 6
1 1 1 1 1 1
! ! !
> + + + + + +...
2 2 4 4 6 6
= S.

Several other accessible proofs can be found in [113].



See exercise 1 for an intriguing physical situation in which the Harmonic Series appears.

Fig. 1.5: Top: The partial sums of the Harmonic Series grow like ln n. Bottom: The
difference between the two curves approaches a constant γ ≈ 0.577.

1.6 The Fibonacci sequence

The Fibonacci sequence is given by the recursive relation:

    F_1 = 1,   F_2 = 1,   F_n = F_{n−1} + F_{n−2}.

Investigate the growth of the following quantities: a) F_n   b) R_n := F_n/F_{n−1}.

The Italian mathematician Fibonacci (1170–1250) mentioned the sequence in his Liber
Abaci (‘book of calculations’) published in 1202, although the sequence was already known
to ancient Indian mathematicians as early as around 200BC. The sequence is ubiquitous
in nature and has surprising connections to art and architecture. See [169] for a readable
account of the Fibonacci sequence.
A quick calculation of a handful of terms shows that

F_n = (1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, . . .),

and R_n is the ratio of consecutive terms. The growth of F_n and R_n is shown in fig. 1.6.
The figure is produced by the code fibonacci.ipynb.
Let's consider the growth of F_n. The semilog plot in fig. 1.6 is, to a good approximation,
a straight line. This suggests that, for large n at least, we have an equation of the form

    ln F_n ≈ αn + β   ⟹   F_n ≈ Aφⁿ,

where α and β are the gradient and y-intercept of the linear graph, and the constants A = e^β
and φ = e^α. We are only interested in the growth for large n, so let's focus on the constant φ
for now.
The gradient α of the line can be calculated using two consecutive points at n and n − 1.
Joining these points gives a line with gradient

    α = (ln F_n − ln F_{n−1})/1 = ln(F_n/F_{n−1}) = ln R_n
      ⟹ φ = e^α = R_n.

Thus we conclude that Fn grows like φⁿ for large n, where φ = lim_{n→∞} Rn, which, according to the other plot in fig. 1.6, appears to be just below 1.62. In fact, Python tells us that R25 ≈ 1.618034.
Finally, we estimate the y-intercept from the expression

    β = ln Fn − n ln Rn.

Using n = 25, Python tells us that A = e^β ≈ 0.4472136.
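As a quick sanity check, here is a minimal standalone sketch (independent of fibonacci.ipynb) that recomputes α, φ and A directly from the sequence:

import numpy as np

F = np.zeros(26)                            # F[1], ..., F[25]
F[1] = F[2] = 1
for i in range(3, 26):
    F[i] = F[i-1] + F[i-2]

n = 25
alpha = np.log(F[n]) - np.log(F[n-1])       # gradient of the semilog plot
phi = np.exp(alpha)                         # estimate of the growth rate
A = np.exp(np.log(F[n]) - n*alpha)          # estimate of e^beta
print(phi, A)                               # about 1.618034 and 0.4472136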




Fig. 1.6: Left: The Fibonacci sequence Fn plotted against n. The vertical scale is logarithmic.
Right: The ratio Rn between consecutive Fibonacci numbers appears to approach a constant
φ ≈ 1.618.

fibonacci.ipynb (for plotting fig. 1.6)

import numpy as np
import matplotlib.pyplot as plt

Nend = 25                                # Plot up to F_25

F = np.zeros(Nend+1)                     # Initialise the sequences F_n
R = np.zeros(Nend+1)                     # and R_n = F_n/F_{n-1}

F[1] = 1                                 # Define F_1 and F_2
F[2] = 1

for i in np.arange(3, Nend+1):           # Iterate the recurrence
    F[i] = F[i-1] + F[i-2]
    R[i] = F[i]/F[i-1]

# Plot two figures side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11,4))

Nmin = 5                                 # Smallest N value on the x-axis
Nplt = range(Nmin, Nend+1)               # Plotting on domain [Nmin, Nend]

# Use a vertical log scale to see the growth of F_n
ax1.semilogy(Nplt, F[Nmin:Nend+1], 'bo-', ms=3)
ax1.set_xlabel(r'$n$')
ax1.set_ylabel(r'$F_n$')
ax1.set_xlim(Nmin, Nend)
ax1.set_xticks(range(Nmin, Nend+1, 5))   # Manual adjustment of tick frequency
ax1.grid('on')

# Use linear scales for plotting R_n
ax2.plot(Nplt, R[Nmin:Nend+1], 'ro-', ms=3)
ax2.set_xlabel(r'$n$')
ax2.set_ylabel(r'$F_{n}/F_{n-1}$')
ax2.set_xlim(Nmin, Nend)
ax2.set_xticks(range(Nmin, Nend+1, 5))
ax2.grid('on')

plt.show()

Discussion
• The Golden Ratio. φ in fact corresponds to the Golden Ratio

    φ = (1 + √5)/2.

Our estimate R25 is remarkably accurate: it matches φ to 9 decimal places. The connection between Rn and φ can be seen by observing that, from the recurrence relation, we find

    Fn/Fn−1 = 1 + Fn−2/Fn−1  =⇒  Rn = 1 + 1/Rn−1.   (1.5)
Thus, if lim_{n→∞} Rn exists and equals φ, then it satisfies the equation

    φ = 1 + 1/φ,

which defines the Golden Ratio.
• Binet’s formula. Python allowed us to discover part of a closed-form expression for
Fn called Binet’s formula
    Fn = (φⁿ − (1 − φ)ⁿ)/√5.

We will derive this formula when we study matrix diagonalisation in §5.7. For large n, we see that Fn ≈ φⁿ/√5. More precisely, for any positive integer n, the Fibonacci number Fn is the integer closest to the number φⁿ/√5. This follows from the fact that the second term in Binet's formula is small. To see this, let r = 1 − φ and note that |r|ⁿ = (0.618 . . .)ⁿ < 1 for all n. Therefore,

    |1 − φ|ⁿ/√5 < 1/√5 < 1/√4 = 1/2.

This shows that Fn is the integer nearest to φⁿ/√5.
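The nearest-integer property is easy to verify numerically; here is a minimal sketch checking it for the first 24 Fibonacci numbers:

import numpy as np

phi = (1 + np.sqrt(5))/2
F = [1, 1]                                  # F_1 and F_2
for i in range(2, 24):
    F.append(F[-1] + F[-2])

for n in range(1, len(F) + 1):
    # F_n should equal the integer nearest to phi^n/sqrt(5)
    assert F[n-1] == round(phi**n / np.sqrt(5))
print('Verified for n = 1, ...,', len(F))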
• Contractive sequences. It remains to explain why the limit of Rn exists. This is a
consequence of the following theorem (see [20]).
Theorem 1.5 If a sequence xn eventually satisfies the relation

    |xn+1 − xn| ≤ C|xn − xn−1|,

where the constant C ∈ [0, 1), then xn is convergent.

Such a sequence is said to be contractive. We show that the sequence Rn is contractive as follows. From Eq. 1.5, it is clear that for n ∈ N, Rn ∈ [1, 2], which implies Rn+1 = 1 + 1/Rn ≥ 3/2. Using (1.5) again, we find that for n ≥ 4,

    |Rn+1 − Rn| = |1/Rn − 1/Rn−1|
                = |Rn − Rn−1| / (Rn Rn−1)
                ≤ (2/3) · (2/3) · |Rn − Rn−1|
                = (4/9) |Rn − Rn−1|.
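A short numerical sketch makes the contraction visible: after the first few terms, each successive difference |Rn+1 − Rn| is less than 4/9 times the previous one (the ratios in fact approach 1/φ² ≈ 0.382):

import numpy as np

F = [1.0, 1.0]
for i in range(2, 20):
    F.append(F[-1] + F[-2])
R = np.array([F[i]/F[i-1] for i in range(1, len(F))])

d = np.abs(np.diff(R))                      # |R_{n+1} - R_n|
print(d[1:]/d[:-1])                         # ratios settle below 4/9 ≈ 0.444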

1.7 The ε, δ definition of continuity

Students are taught in school that a continuous function is one for which the graph can be
drawn without lifting the pen. However, at university, the focus of mathematics is shifted
away from drawings, descriptions and intuition to mathematics that is based purely on logic.
In this spirit, we want to be able to define continuity logically (i.e. using symbols, equations,
inequalities. . . ) without relying on drawings.

A function f : A → R is said to be continuous at a point x0 ∈ A if, for all ε > 0, there exists δ > 0 such that, for all x ∈ A, we have

    |x − x0| < δ  =⇒  |f(x) − f(x0)| < ε.

For the following functions, investigate the relationship between ε and δ at x = x0.

    a) f(x) = 2/x   (x0 = 1),        b) g(x) = ln(x + 1)   (x0 = 1),

    c) h(x) = { (sin x)/x  if x ≠ 0;  1  if x = 0 }   (x0 = 0).

The ε-δ definition given above was first published in 1817 by the Bohemian mathematician
and philosopher Bernard Bolzano (1781–1848). It expresses precisely what it means for a
function to be continuous at a point purely in terms of inequalities. This is one of the most
important definitions in real analysis, and is also one which many beginning undergraduates
struggle to understand. Let’s first try to unpack what the definition says.
If f is continuous at x 0 , it makes sense to demand that we can always find some y
values arbitrarily close to (or equal to) y0 = f (x 0 ). Symbolically, the set of “all y values
arbitrarily close to y0 ” can be expressed as follows.
Definition Let ε > 0 and y0 ∈ R. The ε-neighbourhood of y0 is the set of all y ∈ R such
that |y − y0 | < ε.
Now, we want the y values inside the ε-neighbourhood of y0 to “come from” some values of x. In other words, there should be some x values such that y = f(x). It is natural to demand that those values of x should also be in some neighbourhood of x0. We are satisfied
as long as “there exists such a neighbourhood” in the domain A. This statement can be
written symbolically as:

∃δ > 0 such that ∀x ∈ A, |x − x 0 | < δ . . .


The symbol ∃ reads “there exists", and ∀ reads “for all".
Combining what we have discussed so far, we say that a function f is continuous at
x 0 if there exists a neighbourhood of x 0 which gets mapped by f into an arbitrarily small
neighbourhood of y0 = f (x 0 ). Symbolically:

∀ε > 0, ∃δ > 0 such that ∀x ∈ A, |x − x 0 | < δ =⇒ | f (x) − f (x 0 )| < ε.


This explains the intuition behind the definition. One could think about this as a game in
which we are given a value of ε, and our job is to find a suitable value of δ.
In fact, if we can find a certain value of δ that works, then so will any other value δ′ such that 0 < δ′ < δ. Often we may find that there is a largest value, δmax, that works for each choice of ε.

Now let's see how Python can help us visualise this ε-δ game for function a), f(x) = 2/x, at x0 = 1. The code continuity.ipynb produces a GUI (graphical user interface) as shown in fig. 1.7. Warning: this code only works in a Jupyter environment.¹

continuity.ipynb (for plotting the top panel of fig. 1.7; Jupyter only)

import matplotlib.pyplot as plt
import numpy as np
from ipywidgets import interactive       # ipywidgets makes the plot interactive

def f(x):                                # The function f
    return 2/x

def finverse(x):                         # The inverse function f^(-1)
    return 2/x

x = np.linspace(0.5, 2)                  # Domain
x0 = 1                                   # We study the continuity of f at x0
y = f(x)
y0 = f(x0)

# Now define a function of eps to feed into the slider
def plot(eps):
    y0p = y0 + eps                       # f(x0) + eps
    y0m = y0 - eps                       # f(x0) - eps
    x0p = finverse(y0p)                  # The x values of the above two points
    x0m = finverse(y0m)                  # (we use them to calculate delta)

    # Where to draw vertical and horizontal dotted lines
    vertical = [x0, x0p, x0m]
    horizontal = [y0, y0p, y0m]

    plt.plot(x, y, 'r')                  # Plot y = f(x) in red

    for Y in horizontal:                 # An easy way to plot horizontal lines
        plt.axhline(y = Y, color = 'k', linestyle = ':')     # (black dotted)
    for X in vertical:                   # ...and vertical lines (cyan dotted)
        plt.axvline(x = X, color = 'c', linestyle = ':')
    plt.show()

    # The largest viable value of delta
    delta = min(abs(x0-x0p), abs(x0-x0m))

    # Report values (using unicode for Greek letters)
    print(f'Given \u03B5 = {eps:.2}')
    print(f'Found \u03B4 = {delta:.4}')

# Finally, set slider for eps in [0.01, 0.4] in steps of 0.01
interactive(plot, eps=(0.01, 0.4, 0.01))

In the code, we use the ipywidgets² library to create a slider for the value of ε. Drag the blue slider to change the separations of the horizontal dotted lines, which correspond to the ε-neighbourhood of f(x0).

¹ If you are running a non-Jupyter IDE, Matplotlib's slider (to be discussed shortly) can be used to produce fig. 1.7. The code is given on GitHub as continuity.py.
² https://ipywidgets.readthedocs.io

Fig. 1.7: Interactive plots illustrating the ε-δ definition of continuity. Here we investigate the continuity at x0 = 1 of the functions defined by f(x) = 2/x (top) and g(x) = ln(x + 1) (bottom). In each graph, you can use the slider to adjust the value of ε, and a viable value of δ is displayed.

Given this ε, the largest δ-neighbourhood of x0 can be found by taking the smallest separation between the vertical dotted lines. In this example, we can write

    δmax = min{ |f⁻¹(y0 + ε) − x0|, |f⁻¹(y0 − ε) − x0| }.

Conveniently, f is decreasing and so f⁻¹(x) can be evaluated easily (in fact f = f⁻¹).


Thus, it is also possible to calculate precisely the value of δmax. Given ε = 1/4 as shown in the top panel of fig. 1.7, our formula gives

    δmax = min{ |f⁻¹(9/4) − 1|, |f⁻¹(7/4) − 1| }
         = min{1/9, 1/7}
         = 1/9.

This agrees with the value of δ displayed by the code at the bottom of the GUI.

We can easily modify the code to illustrate the continuity of g(x) = ln(x + 1) at x 0 = 1,
as shown in the lower panel of fig. 1.7. We leave it as an exercise for you to show that the
exact expression for δ shown is given by

    δmax = 2(1 − e^(−0.13)) ≈ 0.2438.
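(A quick numerical check of this value, a minimal sketch using the same min formula as before with the inverse g⁻¹(y) = e^y − 1:)

import numpy as np

ginv = lambda y: np.exp(y) - 1              # inverse of g(x) = ln(x+1)
x0, eps = 1, 0.13
y0 = np.log(x0 + 1)
delta = min(abs(ginv(y0 + eps) - x0), abs(ginv(y0 - eps) - x0))
print(delta)                                # 0.24381..., i.e. 2(1 - e^(-0.13))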


The previous code relies on the fact that the function is either strictly decreasing or
increasing and that the expression for the inverse function can be found explicitly. If these
conditions do not hold, the positions of the vertical lines may not be so easy to obtain. This
is the case for the function h shown in fig. 1.8, where
    h(x) = { (sin x)/x  if x ≠ 0;  1  if x = 0 }.

One way to get around this is to ask the GUI to read out the coordinates of where
the horizontal lines intersect the curve y = h(x). In fig. 1.8, with ε = 0.1, we find that
|h(x) − h(0)| = ε at x ≈ ±0.79, so taking any δ to be less than this value would make the
ε-δ definition work. Indeed, h(x) is continuous at x = 0.
The code continuityslider.ipynb produces the interactive GUI shown in fig. 1.8.
We use Matplotlib’s own Slider widget to create an interactive plot (which you can pan
and zoom in to get a more accurate value of δ).

Fig. 1.8: An interactive GUI showing the graph of h(x) = (sin x)/x around x = 0 (with h(0) ≔ 1). The dotted lines show y = 1 and y = 1 ± ε, where the value of ε can be adjusted using the slider. The coordinates of the cursor are given at the bottom. The readout shows that for ε = 0.1, we need δ ≈ 0.79.

continuityslider.ipynb (for plotting fig. 1.8)

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider
%matplotlib                              # See Preface "b) About %matplotlib"

def f(xarray):                           # The function takes an array
    y = np.zeros_like(xarray)
    # enumerate gives pairs: the indices (ind) and elements (x) in the array
    for ind, x in enumerate(xarray):
        if x == 0:
            y[ind] = 1                   # The function maps 0 to 1...
        else:
            y[ind] = np.sin(x)/x         # ...and maps nonzero x to (sin x)/x
    return y

xarray = np.linspace(-2, 2, 200)         # Choose domain to plot
y = f(xarray)
x0 = 0
y0 = f([x0])
eps = 0.1                                # Initial value of eps

# y arrays for 3 horizontal lines at y0 and y0 +/- eps
harray0 = np.full_like(xarray, y0)
harrayP = np.full_like(xarray, y0+eps)
harrayM = np.full_like(xarray, y0-eps)

fig, ax = plt.subplots()
plt.subplots_adjust(bottom = 0.2)        # Leave a space at the bottom for a slider
plt.ylim(0.5, 1.15)
plt.plot(xarray, f(xarray), lw=2)        # Plot the function with a thick line

# Plot the 3 horizontal dotted lines: y = y0 in blue, y = y0 +/- eps in red.
# Use commas to unpack the lists - see Discussion
h0, = plt.plot(xarray, harray0, 'b:')
hP, = plt.plot(xarray, harrayP, 'r:')
hM, = plt.plot(xarray, harrayM, 'r:')

# The slider's dimensions and location
axeps = plt.axes([0.15, 0.1, 0.7, 0.02])
# Create a slider: specify range of values, step size and initial value of eps
eps_slide = Slider(axeps, '\u03B5', 0, 0.15, valstep = 0.001, valinit = eps)

def update(val):                         # Update the plot if slider is changed
    eps = eps_slide.val                  # Take new value of eps
    hP.set_ydata(y0 + eps)               # The horizontal lines y0 +/- eps are
    hM.set_ydata(y0 - eps)               # updated
    fig.canvas.draw_idle()               # Redraw the graph

eps_slide.on_changed(update)

plt.show()

Discussion
• Proof of continuity. The graphs shown in this section do not prove continuity. They
only allow us to visualise the ε-δ game. Writing a rigorous ε-δ proof is an important
topic that will keep you very much occupied in your analysis course at university.
To give a flavour of what is involved, here is a rigorous proof that f (x) = 2/x is
continuous at x = 1.
Proof: For all ε > 0, take δ = min{1/2, ε/4}, so that ∀x ∈ R \ {0},

    |x − 1| < δ  =⇒  |f(x) − f(1)| = |2/x − 2| = 2|x − 1|/|x|.   (*)

Since |x − 1| < 1/2, the reverse triangle inequality gives

    −1/2 < |x| − 1 < 1/2  =⇒  2/3 < 1/|x| < 2.

Substituting this into (*), we find

    |f(x) − f(1)| < 4δ ≤ ε.  ∎

• Limits. A closely related concept to continuity is that of continuous limits.
Given a function f : (a, b) → R and a point c ∈ (a, b), we write

    lim_{x→c} f(x) = L,

if ∀ε > 0, ∃δ > 0 such that ∀x ∈ (a, b), we have

    0 < |x − c| < δ  =⇒  |f(x) − L| < ε.

The only difference between this definition and that of continuity is that for limits, there
is no mention of what happens at x = c, but only what happens close to c.
Using this definition, it can be shown that the following important theorem holds.
Theorem 1.6 Let A ⊆ R and define f : A → R. Then f is continuous at c ∈ A if and only if lim_{x→c} f(x) = f(c).

This rigorous definition of the continuous limit lays a strong foundation for differentiation
and integration, both of which can be expressed as continuous limits, as we will see
later.
• The sinc function. In the language of limits, our plot of h(x) shows that

    lim_{x→0} (sin x)/x = 1.   (1.6)

The proof of this limit based on the Squeeze Theorem (for continuous limits) can be found in [20]. The function h(x) is sometimes written sinc(x). It has many real-world applications, particularly in signal processing.

• Why comma? In the code, you may be wondering why we used a comma on the LHS
of the assignment

hP, = plt.plot(xarray, harrayP, 'r:')

rather than
hP = plt.plot(xarray, harrayP, 'r:')
Indeed, had we removed the comma, Python would report errors when the slider is
moved. So what is going on here? Well, let's ask Python what type the object hP is.
With the comma, the command type(hP) tells us that hP is a Line2D object. This object has many properties, including the x and y coordinates of the line (called xdata and ydata) and optional colour and line-thickness attributes (type dir(hP) to see the full list of attributes). Indeed, when the slider is moved, the update function updates the y coordinates of the dashed line.
Without the comma, however, we find that hP is a list. Furthermore, len(hP)=1, meaning that the object plt.plot(...) is in fact a list with one element (namely, the Line2D object). When we move the slider, the y coordinates (ydata) are not a property of this list, but rather of the object within the list. This explains why Python reports an error.
To put this in another way, we want to change the filling of the sandwich within the box,
rather than put new filling onto the box itself.
In summary, the comma tells Python to unpack the list (i.e. take out the sandwich), so
we can update its content.
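The following short sketch, runnable in any Python session with Matplotlib, makes the distinction concrete:

import matplotlib.pyplot as plt

line_list = plt.plot([0, 1], [0, 1])        # without comma: a list containing one Line2D
line, = plt.plot([0, 1], [0, 1])            # with comma: the Line2D object itself

print(type(line_list))                      # <class 'list'>
print(type(line))                           # <class 'matplotlib.lines.Line2D'>
line.set_ydata([1, 0])                      # works: Line2D has set_ydata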

1.8 Thomae’s function

Thomae's function f : (0, 1) → R is defined by

    f(x) = { 1/q  if x ∈ Q and x = p/q in lowest form, where p, q ∈ N;
             0    otherwise.

Plot this function. For how many values x ∈ (0, 1) is f(x) > 1/10? Or f(x) > 1/100?
Deduce that Thomae's function is continuous at irrational x and discontinuous at rational x.

This function, named after the German mathematician Carl Johannes Thomae (1840–
1921), serves as a classic illustration of how university mathematics differs from school
mathematics. Few school students would have seen functions defined in such an exotic way.
Let’s try to make sense of the function, for example, by first looking for values of x that
would be mapped to the value 1/8. A little experiment reveals that there are 4 such values:
    1/8, 3/8, 5/8, 7/8.
Note that f (2/8) = f (6/8) = 1/4 and f (4/8) = 1/2.
It is clear that if f (x) = 1/8 then x must be a rational number of the form p/8 where
p and 8 have no common factors apart from 1. Another way to say this is that p has to be
coprime to 8. Yet another way to say this is that the greatest common divisor (gcd) of p and
8 is 1, i.e.
gcd(p, 8) = 1.
(More about the gcd when we discuss number theory in chapter 6.)

Fig. 1.9: Thomae’s function

The graph of Thomae’s function is shown in fig. 1.9. To plot this graph, we can
conveniently use NumPy’s gcd function as shown in the code thomae.ipynb. The code
also counts how many values of x satisfy f (x) > 1/10.
thomae.ipynb (for plotting fig. 1.9)

import numpy as np
import matplotlib.pyplot as plt

xlist = []                               # Initialise x and y as empty lists
ylist = []

# Search for fractions with denominators from 2 to 199
for q in range(2, 200):
    for p in range(1, q):
        if np.gcd(p, q) == 1:            # If p and q are coprime...
            xlist.append(p/q)            # append the lists of x and y coordinates
            ylist.append(1/q)

plt.plot(xlist, ylist, 'or', ms=3)       # Plot the points with red dots
plt.xlabel('x')
plt.ylabel('y')
plt.xlim(0, 1)
plt.grid('on')
plt.show()

lim = 0.1                                # Count how many points are above 0.1
num = sum(y > lim for y in ylist)        # Use a comprehension to do this
print(f'Found {num} points above y={lim}')

In the code, we use for loops to append values to initially empty lists. This creates two growing lists of the x and y coordinates. Instead of lists, one could also use NumPy arrays (using np.empty and np.append).
Running the code tells us that there are 27 values. As an exercise, try to write them all
down. As for the case f (x) > 1/100, the code gives us 3003 values.
What do these results mean? Well, they imply that given any number ε > 0, there are a
finite number of x such that f (x) > ε. This means that at any x 0 ∈ (0, 1), we can find a
neighbourhood of x 0 sufficiently small that it does not contain any values of x such that
f (x) > ε. In other words, | f (x)| < ε for all x sufficiently close to x 0 .
Now let x 0 be any irrational number in (0, 1). Since f (x 0 ) = 0, the previous paragraph
gives the following result:

    ∀ε > 0, ∃δ > 0 such that, ∀x ∈ (0, 1), |x − x0| < δ  =⇒  |f(x) − f(x0)| < ε.

This is precisely the ε-δ definition for the continuity of Thomae’s function at any
irrational x 0 ∈ (0, 1).

Discussion
• The Archimedean property. In deducing the fact that there are a finite number of x
such that f (x) > ε, we implicitly used the following property of real numbers.
Theorem 1.7 (The Archimedean property) For any ε > 0, there exists an integer
n ∈ N such that ε > 1/n.
(Can you see precisely where this has been used?) This property sounds like an obvious
statement, but, like many results in introductory analysis, it has to be proven. It turns out

that the Archimedean property follows from other axioms, or properties of real numbers which are satisfied by decree. These axioms consist of the usual rules of addition and multiplication, plus some axioms on inequalities which allow us to compare sizes of real numbers. In addition, we also need:
The Completeness Axiom. Every nonempty set of real numbers that is bounded above has a least upper bound.
For example, the least upper bound of the set of real numbers x ∈ (0, 1) is 1. The axiomatic
approach to constructing R is a hugely important topic in introductory real analysis
which you will encounter at university.
• Sequential criterion. The ε-δ definition for continuity is equivalent to the following.
Theorem 1.8 (Sequential criterion for continuity) The function f : A → R is contin-
uous at x 0 ∈ A if and only if, for every sequence x n ∈ A converging to x 0 , we have
f (x n ) → f (x 0 ).
This theorem can be used to prove that Thomae’s function is discontinuous at any
rational number x 0 ∈ (0, 1). Consider the sequence

2
x n = x0 − ,
n
which must√necessarily be a sequence of irrational numbers (here we use the well-known
result that 2 is irrational). The sequence clearly converges to x 0 ∈ Q. Thus, we have
found a sequence x n → x 0 such that f (x n ) = 0 6→ f (x 0 ). This proves that f cannot be
continuous at x 0 ∈ Q ∩ (0, 1).
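The jump at a rational point can also be seen numerically. Here is a small sketch, a variant of the argument above which uses rational points with ever larger denominators (instead of the irrational sequence xn), together with Python's fractions module for exact arithmetic:

from fractions import Fraction

def thomae(x):
    x = Fraction(x)
    return Fraction(1, x.denominator)       # 1/q for x = p/q in lowest form

x0 = Fraction(1, 2)
print('f(x0) =', thomae(x0))                # 1/2
for k in [3, 6, 9]:
    x = x0 + Fraction(1, 10**k)             # a rational point approaching x0
    print(x, thomae(x))                     # f-values 10^(-k) -> 0, not 1/2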

1.9 The Intermediate Value Theorem and root finding

Solve the equation e⁻ˣ − x = 0.

The Intermediate Value Theorem (IVT) is an important result concerning functions


which are continuous on a closed bounded interval [a, b]. Continuity on an interval simply
means that the ε-δ definition applies to all points in the interval. Such functions have special
properties such as the following ‘intermediate value’ property.

Theorem 1.9 (Intermediate Value Theorem) Let f be a function which is continuous on


[a, b]. If v is a value strictly between f (a) and f (b), then there exists c ∈ (a, b) such that
f (c) = v.

In other words, a continuous function f on [a, b] takes all possible values between f (a)
and f (b).
Now consider the function f(x) = e⁻ˣ − x, which comprises the exponential function and a linear function. Both functions (and their difference) can be shown to be continuous at every x ∈ R using the ε-δ definition. In particular, we note that f is continuous on the interval [0, 1] and that f takes opposite signs at the end points of this interval, i.e.

    f(0) = 1 > 0,    f(1) = 1/e − 1 < 0.

Therefore, the IVT tells us that ∃c ∈ (0, 1) such that f (c) = 0. The graph below illustrates
that the root of f (x) = 0 indeed lies in this interval.

Fig. 1.10: The graph y = e⁻ˣ − x intersects the x-axis somewhere in the interval (0, 1). This is consistent with the Intermediate Value Theorem.

One way to proceed is to consider the sign of f(x) at the midpoint x = 0.5. We find that

    f(0.5) = 1/√e − 0.5 > 0.

Since f (x) has opposite signs at the endpoints of the ‘bisected’ interval (0.5, 1), the IVT
again implies that f has a root in (0.5, 1).

We can carry on bisecting this interval and study the sign of f (x) at the midpoint and
repeat until we achieve a root c in an interval that is as small as the desired accuracy.
The code below illustrates how this process (called the bisection method of root finding)
can be iterated as many times as required to achieve a root with accuracy acc. The core of
the code is a while loop which repeats the iteration until the size of the bisected interval
shrinks below acc.

bisection.ipynb (for performing root-finding using the bisection method)

import numpy as np
import matplotlib.pyplot as plt

def f(x):                                # Define f(x)
    return np.exp(-x) - x

acc = 0.5e-5                             # Specify accuracy required
a, b = 0, 1                              # Specify interval [a, b]
fa, fb = f(a), f(b)                      # Function values at interval endpoints

# If f(a) and f(b) don't have opposite signs, report error and abort mission
if fa*fb >= 0:
    raise ValueError('Root is not bracketed.')

n = 0                                    # Iteration counter

while (b-a)/2 > acc:                     # Repeat as long as the error is too large
    x = (a+b)/2                          # This step is the bisection
    fx = f(x)
    if fx == 0:                          # If lucky (found exact root)...
        break                            # ...jump out of the while loop
    if fx*fa < 0:                        # If root lies in [a, x]...
        b = x                            # ...make x the new b
    else:                                # Otherwise, root lies in [x, b]
        a = x
    print(f'Iteration number {n}, x={x}')   # Report result of every iteration
    n += 1                               # Increase iteration count and repeat

x = (a+b)/2                              # Final bisection
print(f'Final iteration number {n}, x= {x}')   # Report answer

Running the code above with acc = 0.5 × 10⁻⁵ (to ensure 5 dec. pl. accuracy) shows that

    Final iteration number 17, x= 0.5671424865722656

Thus, we can report that the root of the equation e⁻ˣ − x = 0 is approximately 0.56714 (5 dec. pl.).
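As a sanity check, we can cross-examine this answer against SciPy's root-finding routines; here is a minimal sketch using Brent's method on the same bracket:

import numpy as np
from scipy.optimize import brentq

f = lambda x: np.exp(-x) - x
print(brentq(f, 0, 1))                   # 0.567143..., agreeing to 5 dec. pl.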
Discussion
• Newton-Raphson method. The bisection method is a relatively slow but reliable method of root-finding for most practical applications. A faster root-finding method, the Newton-Raphson method, is discussed in exercise 11. Although the Newton-Raphson method is faster, it requires additional information, namely the expression for the derivative f′(x), which is not always available in real applications.
• Bracketing. For the bisection algorithm to start, we first need to bracket the root, i.e. find an interval [a, b] in which f(a) and f(b) have opposite signs. However, this may not be possible, for instance, with f(x) = x². In this case another root-finding method must be used. See [131, 132] for other root-finding algorithms in Python.

• Throwing. The Python command raise is useful for flagging a code if an error has
occurred. This practice is also known as throwing an error. The simplest usage is:
    if (some condition is satisfied):
        raise ValueError('Your error message')
It is good practice to be specific in your error message about what exactly has gone
wrong.
• Numerical analysis is the study of the accuracy, convergence and efficiency of numerical
algorithms. This field of study is essential in understanding the limitation of computers
for solving mathematical problems. We will explore some aspects of numerical analysis
in this book, particularly in the next chapter. For further reading on numerical analysis,
see, for example, [40, 170, 182].

1.10 Differentiation

For each of the following functions, plot its graph and its derivative on the interval
[−1, 1].
    a) f(x) = sin πx,    b) g(x) = √|x|,    c) H(x) = { x² sin(1/x²)  if x ≠ 0;  0  if x = 0 }.

Which functions are differentiable at x = 0?

Let f : (a, b) → R and let c ∈ (a, b). The derivative of f at x = c, denoted f′(c), is defined as:

    f′(c) = lim_{h→0} [f(c + h) − f(c)]/h.   (1.7)

A function is said to be differentiable at x = c if the limit above exists and is finite.
In school mathematics, the derivative is often defined as the rate of change of a function,
or the gradient of the tangent to y = f (x) at x = c. However, in university analysis,
pictorial definitions are not only unnecessary, but must also be avoided in favour of rigorous
logic-based definitions. The limit (1.7) has a precise definition in terms of ε-δ (see §1.7).
First let’s consider the derivative of f (x) = sin πx. You will probably have learnt how
to differentiate this type of function at school. But let’s see how we can also work this out
from first principles using the definition above. Recall the trigonometric identity:

    sin α − sin β = 2 cos((α + β)/2) sin((α − β)/2).

Using this identity in the limit, we have:


sin π(c + h) − sin πc
f 0 (c) = lim
h→0 h
2 cos π(c + h2 ) sin πh
2
= lim
h→0 h
h sin πh
= lim cos π(c + ) · lim π πh 2 . (1.8)
h→0 2 h→0 2

where we have broken up the limit into a product of two limits. The first limit in (1.8) is
simply cos πc (technical note: this step requires the continuity of cos). The second limit can
be evaluated using the result from Eq. 1.6 in §1.7, giving us π. Therefore, we have

    f′(c) = π cos πc,

as you might have expected. Note that f′ is defined at all values of c ∈ R. In other words, f is differentiable on R. In particular, it is certainly differentiable at x = 0, with f′(0) = π.
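A quick numerical sketch corroborates this: the difference quotient (f(h) − f(0))/h is already close to π for a small h.

import numpy as np

f = lambda x: np.sin(np.pi*x)
h = 1e-6
print((f(0 + h) - f(0))/h)               # 3.14159..., close to
print(np.pi)                             # pi itself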
The next function, g, can be written in piecewise form (using the definition of the modulus) as:

    g(x) = { √x  if x ≥ 0;  √(−x)  if x < 0 }.

This can be differentiated in the usual way for x ≠ 0, giving

    g′(x) = { 1/(2√x)  if x > 0;  −1/(2√(−x))  if x < 0 }.

But at x = 0, the limit definition gives

    g′(0) = lim_{h→0} √|h| / h,

which becomes arbitrarily large in magnitude near h = 0, so the limit does not exist. Thus, g is not differentiable at x = 0.

Before we consider H (x), let’s pause to consider how derivatives can be calculated on
the computer. Of course, one could simply differentiate the function by hand, then simply
code the result. However, in real applications, we may have limited information on the
function to be differentiated. Sometimes the expression for the function itself cannot be
easily written down. This means that it is often impractical to rely on the explicit expression
for the derivative.
Instead, we can work with a numerical approximation of the limit definition (1.7). For example, we could say:

    f′(x) ≈ [f(x + h) − f(x)]/h,   (1.9)

for a small value of h. This is called the forward-difference or forward-Euler estimate of the derivative. The word 'forward' comes from the fact that the gradient at x is approximated as the slope of the line joining the points on the curve at x and x + h, a little “forward” from x.
Figure 1.11 shows the graphs of f and g (dashed blue lines) and their approximate derivatives (solid orange lines) calculated using the forward-difference approximation (1.9) with h = 10⁻⁶. Note in particular that the lower panel shows that the derivative of g blows up near x = 0, where y = g(x) has a sharp point (similar to that of the graph of y = |x|). The graph indeed confirms that g is not differentiable at x = 0.



Fig. 1.11: The graphs of the functions f(x) = sin πx (top) and g(x) = √|x| (bottom) – in dashed blue lines – and their derivatives (solid orange lines) calculated using the forward-difference approximation (1.9) with h = 10⁻⁶.

The code differentiation.ipynb plots the graphs of g and g′. Note that we work on the positive and negative x values separately because, otherwise, Matplotlib would join up points around x = 0, creating a false visual impression of the values of g′(x) near 0.

differentiation.ipynb (for plotting the bottom panel of fig. 1.11)

import numpy as np
import matplotlib.pyplot as plt

def g(x):                                # Function to be differentiated
    return np.sqrt(np.abs(x))

h = 1e-6                                 # Pick a small h

# Work with x > 0 and x < 0 separately
xp = np.linspace(0, 1)
xn = np.linspace(-1, -h)
gxp = g(xp)
gxn = g(xn)
dgp = (g(xp+h) - gxp)/h                  # g'(x) approximation (forward Euler)
dgn = (g(xn+h) - gxn)/h

plt.plot(xp, gxp, 'b--',
         xp, dgp, 'orange',
         xn, gxn, 'b--',
         xn, dgn, 'orange')
plt.legend(["y=g(x)", "y=g'(x)"])
plt.grid('on')
plt.xlim([-1, 1])
plt.ylim([-2, 2])
plt.show()

Now let's return to H(x). With simple modifications of the code, we can plot H and H′ as shown in fig. 1.12. It appears that the derivative fluctuates wildly around x = 0. One might even be tempted to conclude from the graph that H is not differentiable at x = 0.


Fig. 1.12: The graphs of the function H(x) (blue dashed lines) and its derivative (solid orange lines) calculated using the approximation (1.9) with h = 10⁻⁶.

But graphs can be misleading! In fact, H is differentiable at x = 0, as we now prove using the limit definition:

    H′(0) = lim_{h→0} [H(h) − H(0)]/h = lim_{h→0} h sin(1/h²).

Note that we replaced H(h) by the expression for h ≠ 0. This follows from the fact that the limit as h → 0 is defined without requiring us to know what happens at h = 0. Observe then that for any h ≠ 0, we have

    −|h| ≤ h sin(1/h²) ≤ |h|,

since −1 ≤ sin(1/h²) ≤ 1. Taking the limit as h → 0, we conclude that H′(0) = 0 by the Squeeze Theorem.
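Numerically, the squeeze is easy to observe; a minimal sketch showing the difference quotient h sin(1/h²) trapped between ±|h|:

import numpy as np

H = lambda h: h**2 * np.sin(1/h**2)      # valid for h != 0
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    q = (H(h) - 0)/h                     # the difference quotient of H at 0
    print(f'h = {h:.0e}, quotient = {q: .3e}, bound = {h:.0e}')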

Discussion
• How small should h be? In the forward-difference approximation (1.9), it appears as though the smaller h is, the more accurate the estimate for the derivative becomes. Surprisingly, this is not the case! You could try this yourself by changing h in the code from 10⁻⁶ to, say, 10⁻²⁰. What do you observe?
In fact, there is an optimal value of h which gives the most accurate answer for the derivative. Larger or smaller values of h would give answers that are less accurate. We will explore this in chapter 2, in which we will also see that there are many other derivative approximations that are more accurate than the forward-difference formula (but they take more resources to compute).
• Differentiable means continuous. The following useful theorem establishes the link
between continuity and differentiability.
Theorem 1.10 If f : (a, b) → R is differentiable at a point c ∈ (a, b), then f is
continuous at c.

• Can computers really do maths? It is worth reminding ourselves that whilst computers can help us understand mathematics more deeply, they cannot think mathematically. It is our job to check and interpret the results that Python gives us. Often the answers we get are not what we expect (e.g. when the step size h is too small). Sometimes the answers are just plain wrong (e.g. H′(0)). So one should never treat a computer like an all-knowing black box that always gives the correct answer.

1.11 The Mean Value Theorem

Show that the function f (x) = (x − 1) sin x has a turning point in the interval (0,1).
Find the x coordinate of the turning point.

Method 1
It is natural for students to associate the phrase 'turning point' with where f′(x) = 0. This means that we have to solve the equation

    sin x + (x − 1) cos x = 0.   (1.10)

Suppose cos x = 0; then Eq. 1.10 gives sin x = 0. But this is impossible, because sin x and cos x cannot both be zero at the same time, so we conclude that cos x ≠ 0. We can then safely divide Eq. 1.10 by cos x, giving

tan x = 1 − x. (1.11)

A quick sketch reveals that there are infinitely many solutions, but only one in (0, 1), as
shown in fig. 1.13.


Fig. 1.13: The curves y = tan x and y = 1 − x intersect infinitely many times on R, but only
once on (0, 1).

One way to locate this root is to do a bisection search for the solution of (1.11) in (0, 1) using the code in §1.9. Whilst this method will yield the x-coordinate of the turning point, we needed the explicit expression for the derivative f′(x) that we just found. However, as discussed earlier, such an expression may not always be available.
Method 2
Let’s take another look at the problem. This time, suppose we don’t know how to
differentiate the function f (x) = (x − 1) sin x, nor do we know about its graph. Can we still
deduce that there is a turning point in the interval (0, 1)? Yes, according to the following
theorem:

Theorem 1.11 (Mean Value Theorem) If f : [a, b] → R is continuous on [a, b] and differentiable on (a, b), then there exists c ∈ (a, b) such that

    f′(c) = [f(b) − f(a)]/(b − a).

With (a, b) = (0, 1), we find that f(0) = f(1) = 0, so by the Mean Value Theorem, ∃c ∈ (0, 1) such that f′(c) = 0, i.e. there exists a turning point in (0, 1).
The Mean Value Theorem (MVT) is a very powerful theorem in analysis. The word
‘mean’ refers to the fact that at c, the gradient is simply the average trend on the interval
(a, b), i.e. the slope of the straight line joining the two endpoints of the curve.
To find the location of the turning point without manual differentiation, we can employ the forward-difference estimate (1.9):

    f′est(x) = [f(x + h) − f(x)]/h,   (1.12)

for some small h (say 10⁻⁶). It is then just a matter of finding where f′est(x) = 0 numerically using, say, the bisection code in §1.9.


Both methods give us, after 17 bisections, the following answer for the x coordinates of
the turning point
x = 0.47973 (5 dec. pl).
In summary, the existence of the turning point is guaranteed by the MVT, and its location
can be estimated without knowing the derivative explicitly. The plots below (of f and its
exact derivative) are consistent with our numerical answer for the turning point.

Fig. 1.14: The curve y = (x − 1) sin x (left) and its exact derivative (right) around the
turning point at x ≈ 0.48.

Discussion
• Rolle’s Theorem. It is a useful exercise to show that the MVT is a consequence of the
following theorem.
Theorem 1.12 (Rolle's Theorem) If f : [a, b] → R is continuous on [a, b] and differentiable on (a, b), with f(a) = f(b), then there exists c ∈ (a, b) such that f′(c) = 0.
As a consistency check, putting f (a) = f (b) in the MVT gives Rolle’s Theorem, so we
see that the MVT is a more general result.
Michel Rolle (1652–1719) was a French self-taught mathematician. Apart from Rolle's Theorem, he is also credited with introducing the notation ⁿ√x for the nth root of x.
• Cauchy’s Mean Value Theorem. A generalised version of the MVT is the following.
Theorem 1.13 (Cauchy's Mean Value Theorem) Let f and g be functions that are continuous on [a, b] and differentiable on (a, b). Suppose that g′(x) ≠ 0 for all x ∈ (a, b). Then there exists c ∈ (a, b) such that

    f′(c)/g′(c) = [f(b) − f(a)]/[g(b) − g(a)].
As a consistency check, putting g(x) = x gives the MVT, so we see that Cauchy’s MVT
is a more general result.

1.12 A counterexample in analysis

In mathematics, a counterexample is a specific example which proves that a statement is false. For instance, to disprove the statement “every prime number is odd”, one only needs to supply the counterexample: “2 is an even prime.”
In analysis, counterexamples are typically used to demonstrate the falsehood of statements
that seem intuitively true. Such counterexamples are very instructive. They warn us that we
cannot always rely on our intuition, and that functions are a more nuanced entity than we
might have thought when studying mathematics in school.
We will study one such counterexample in this section, with the help of Python for
visualisation.

Is the following statement true or false?


Let f be differentiable at c ∈ R with f′(c) > 0. Then there exists a neighbourhood of c in which f is strictly increasing.

f is said to be strictly increasing on an interval I if, ∀x1, x2 ∈ I such that x1 < x2, we have f(x1) < f(x2). f is said to be increasing on an interval I if, ∀x1, x2 ∈ I such that x1 ≤ x2, we have f(x1) ≤ f(x2).
The property f′(c) > 0 tells us that the gradient of the curve y = f(x) at c is positive. Thus, it makes intuitive sense to think that at points very close to x = c, the curve should also have positive gradients that are not drastically different from f′(c), suggesting that f is increasing around x = c. Besides, since f is differentiable at c, it is also continuous there (this is theorem 1.10). So it makes sense that we should not have any wild jumps around x = c that might invalidate continuity at c.
Yet the statement is false. Consider the following counterexample.

    f(x) = { x + 2x² sin(1/x)  if x ≠ 0;  0  if x = 0 }.   (1.13)

We will show that f′(0) > 0, yet there exists no neighbourhood of 0 in which f is strictly increasing.
Firstly, to prove that f′(0) > 0, we use the limit definition (1.7):

    f′(0) = lim_{h→0} [f(0 + h) − f(0)]/h
          = 1 + lim_{h→0} 2h sin(1/h)
          = 1,

where the last step follows from the Squeeze Theorem as before (try justifying this by
following the calculation at the end of §1.10).
We now show that f is not increasing around x = 0. It suffices to show that in any neighbourhood of 0, we can find a point with negative gradient. Symbolically, we want to show that ∀δ > 0, ∃c ∈ (−δ, δ) such that f′(c) < 0 (we will say more about this in the Discussion section).
By applying the usual differentiation techniques, we find that for x ≠ 0,

    f′(x) = 4x sin(1/x) − 2 cos(1/x) + 1.   (1.14)

For all δ > 0, there exists an integer n ∈ N such that 0 < 1/n < 2πδ. This follows from the Archimedean property discussed in §1.8. Note that the point x = 1/(2πn) is within the neighbourhood (−δ, δ). However,

    f′(1/(2πn)) = −1.

Hence, we have successfully used the counterexample to disprove the given statement.
Let’s use Python to visualise the curve y = f (x). The code below produces fig. 1.15,
which plots the curve in two neighbourhoods of 0 (the neighbourhood on the right panel is
10 times smaller than the left). We see a sinusoidal behaviour in both plots. Try increasing the zoom level by a small modification of the code, or by using %matplotlib and the zoom button on the GUI (see §1.7). In any case, you should see a sinusoidal behaviour no matter how much you zoom in (of course, you should increase the plotting resolution in the code accordingly). The figure suggests that within any δ-neighbourhood of 0, we can find points at which f′(x) is positive, negative or zero!
One way to understand this result is to see that the graph of f gets increasingly wiggly towards the origin, whilst being constrained to bounce between the parabolas y = x + 2x² and y = x − 2x² (where sin(1/x) = ±1). These parabolic boundaries intersect at 0, hence forcing f′(0) = 1. Try adding these parabolas to the plots in fig. 1.15.
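For instance, assuming the array x1 and the axes ax1 from counterexample.ipynb below, two extra lines suffice to overlay the parabolas on the left panel:

ax1.plot(x1, x1 + 2*x1**2, 'k:')         # y = x + 2x^2 (assumes x1, ax1 below)
ax1.plot(x1, x1 - 2*x1**2, 'k:')         # y = x - 2x^2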


Fig. 1.15: The curve y = f(x), where f(x) = x + 2x² sin(1/x) for x ≠ 0 and f(0) = 0. The right panel is the same plot zoomed in 10×. The sinusoidal behaviour is seen no matter how much we zoom in towards the origin.

counterexample.ipynb (for plotting fig. 1.15)

import numpy as np
import matplotlib.pyplot as plt

def f(xarray):                           # Define the function
    y = np.zeros_like(xarray)
    for i, x in enumerate(xarray):       # Given an x array...
        if x == 0:
            y[i] = 0                     # ...map 0 to 0
        else:                            # and map the rest to x + 2x^2 sin(1/x)
            y[i] = x*(1 + 2*x*np.sin(1/x))
    return y                             # then return an array

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,6))
x1 = np.linspace(-0.4, 0.4, 100)
x2 = np.linspace(-0.04, 0.04, 100)

ax1.plot(x1, f(x1))                      # Left panel (delta = 0.4)
ax1.set_ylim(-0.4, 0.4)
ax1.title.set_text('y=f(x)')
ax1.grid('on')

ax2.plot(x2, f(x2))                      # Right panel (delta = 0.04)
ax2.set_ylim(-0.04, 0.04)
ax2.title.set_text('Zoomed')
ax2.grid('on')

plt.show()

Discussion
• Derivative of a monotone function. There is a subtle connection between the sign of the derivative and the monotonicity of f. The MVT can be used to show that, on an interval I,

    f′(x) ≥ 0 on I  ⟺  f is increasing on I,
    f′(x) > 0 on I  =⇒  f is strictly increasing on I.

The converse to the second statement does not hold. Can you think of a simple counterexample? A dramatic one is given in exercise 14.
• A counterexample of historical importance. Perhaps the most notorious counterexam-
ple in analysis is a function which is continuous everywhere but is nowhere differentiable.
The discovery of such a function by Karl Weierstrass in 1872 sent a shockwave through
the mathematical world, leading to the reform and development of analysis into the
rigorous subject that we know today. We will meet this function in the next chapter.
• More counterexamples in analysis have been compiled by Gelbaum and Olmsted [72],
a highly recommended book full of surprising and enlightening counterexamples.

1.13 Exercises

1 (Book-stacking problem) Here is an interesting physical situation in which the Harmonic


Series appears. The problem was published by Paul Johnson in 1955 [101].
The code harmonic.ipynb may be helpful in this question.
I wish to create a leaning tower of books using multiple identical copies of a book.
Using n books arranged in a tower perpendicular to the edge of the table, I push the top
book as far out as I can, and do the same for the next book below, working my way
towards the bottom of the tower. See the figure below when n = 4. We can assume that
the books and the table are rigid enough that there are no vertical movements.

a. Show that using n books, the overhang (in units of books) can be written as the Harmonic Series

    (1/2)(1 + 1/2 + 1/3 + · · · + 1/n).
Deduce that using 4 books, the overhang exceeds the length of a book.
b. Plot the overhang (in units of books) against the number of books used to build the tower. Consider up to 1000 books. Suggestion: use a log scale on the horizontal axis.
c. On the same plot, plot the result when eq. 1.4 (the logarithmic approximation of the Harmonic Series) is used to calculate the overhang.
d. Using the log approximation:
i. estimate the overhang when 10⁶ books are used to create the tower. (Ans: around a 7-book-long overhang.)
ii. estimate the number of books needed to build a leaning tower with a 10-book-long overhang. (Your answer is probably greater than the estimated number of physical books that exist in the world.)
2 (Famous approximations for π) Below are three historically important approximations
for π.
• Madhava series (14th century), sometimes called the Gregory-Leibniz approximation (1671–1673):

    π = 4 (1 − 1/3 + 1/5 − 1/7 + · · ·)

• Wallis product (1656):

    π = 2 (2/1 · 2/3) (4/3 · 4/5) (6/5 · 6/7) · · ·

• Viète's formula (1593):

    π = 2 · (2/√2) · (2/√(2 + √2)) · (2/√(2 + √(2 + √2))) · · ·

Let n denote the number of iterations in each approximation scheme. For example, the
zeroth iteration (n = 0) gives 4, 2 and 2 for the three approximations respectively.
a. On the same set of axes, plot the results of the three approximations against the
number of iterations up to n = 10.
b. On a set of logarithmic axes, plot the absolute fractional error

    |(Estimate after n iterations − π) / π|

for the three approximations up to n = 100. This gives us an idea of how fast the
approximations converge to π.
You should find that the error for Viète's formula does not appear to go below a minimum of around 10⁻¹⁶. The reason for this is the machine epsilon, which will be discussed in §2.2.
c. Recall that an estimate x of π is accurate to p decimal places if |x − π| < 0.5 × 10⁻ᵖ. For each of the three approximations of π, calculate how many iterations are needed to obtain π accurate to 5 decimal places.
(Answers: 200000, 157080 and 9.)

3 (Ramanujan’s formula for π) In 1910, Ramanujan gave the following formula for π.

    π = [ (2√2/9801) Σ_{n=0}^{∞} (4n)! (1103 + 26390n) / ((n!)⁴ 396^(4n)) ]⁻¹.
(Famously, he simply ‘wrote down’ many such formulae.) Calculate the first 3 iterations
of this approximation. How many decimal places is each approximation accurate to?
Try writing a code that calculates the series up to n terms. Can your code accurately
evaluate the result, say, when n = 10? If not, explain why.
Suggestion: In Python, we can calculate the factorial, say 15!, using the following
syntax:

import math
math.factorial(15)
The factorials in Ramanujan’s formula give rise to huge numbers. Think about what
can go wrong.
4 (Reciprocal Fibonacci number) Use fibonacci.ipynb as a starting point.
The reciprocal Fibonacci constant is given by

    ψ = Σ_{n=1}^{∞} 1/Fn = 3.35988566624317755 . . .

a. Calculate the sum to 20 terms.


Suggestion: Try to do this in a vectorised way using arrays rather than a loop.
b. Calculate how many terms are needed to obtain ψ to 10 decimal places. (Answer:
49.)

5 (Generalised Fibonacci sequences) Use fibonacci.ipynb as a starting point.


a. Suppose we use different initial values F0 and F1. Use Python to investigate the behaviour of the ratio Rn ≔ Fn/Fn−1. Show that Rn always converges to the Golden Ratio.
(If you like a challenge, you could demonstrate this behaviour using sliders.)
b. (Lucas sequences) Let P and Q be fixed integers (not both zero). Define the Lucas
sequence, Un (P, Q), by the following rules:

U0 = 0, U1 = 1, Un = PUn−1 − QUn−2 .

Note that the Fibonacci sequence corresponds to Un (1, −1).


Write a code that plots the large-n behaviour of the ratio of consecutive terms Rn ≔ Un+1/Un for the following values of P and Q.
i. (P, Q) = (3, 1)
ii. (P, Q) = (2, 1)
iii. (P, Q) = (1, 1)
Make a conjecture on the range values of P such that Rn is a convergent sequence.
(Answer: see https://mathworld.wolfram.com/LucasSequence.html)
6 (Order of convergence) Given a convergent sequence, it is natural to ask: how fast
does the sequence converge? In this question, we explore how to quantify the speed of
convergence.
Suppose a sequence (x n ) converges to x. Define the absolute error, En , as the sequence

En = |x n − x|.

The speed of convergence can be quantified by two positive constants: the rate (C) and the order (q) of convergence, defined via the equation

    lim_{n→∞} En+1/(En)^q = C > 0.

a. Verify by hand that the sequence (1/n) converges with q = C = 1.


 

b. For most applications, only the order of convergence is of interest as it is a better


indicator of how fast the sequence converges. It can be shown that we can estimate
q using the formula
    q ≈ ln(En+1/En) / ln(En/En−1),
where n should be large enough so that q stabilises, but not so large that the
denominator in the formula is zero.
i. Using this approximation, verify (using Python) that the ratio of consecutive
Fibonacci numbers Rn = Fn+1 /Fn converges with order 1.
Suggestion: it might help to plot q as a function of n. You should find that the
graph settles on q = 1 before becoming undefined when n is too large.
ii. Consider the Madhava series for π in Question 2. Show that q < 1.
Technically, we say that the convergence of the series is sublinear (i.e. terribly
slow).
iii. Conjecture a sequence that converges superlinearly (q > 1) and demonstrate this property graphically.

7 In continuityslider.ipynb (§1.7), add a slider in the GUI so that the value of x 0


can be changed.
8 (Euclid’s orchard) Thomae’s function (§1.8) has an interesting connection to the
following problem, sometimes called Euclid’s orchard or the visible points problem.
Consider grid points with positive integer coordinates (m, n). A grid point P is said to
be visible from (0,0) if a straight line joining (0,0) to P does not pass through any other
grid points. For example, as shown in the figure below, the point (1, 2) is visible from
(0,0), but (2, 4) is not.

You should first try to experiment on a piece of grid paper to come up with a conjecture
on which points are visible from (0, 0).
a. Write a code that produces a grid and marks each point that is visible from (0,0)
with a red dot (you will need to impose a sensible cutoff).
b. For each visible point (m, n), plot its image under the mapping

    F(m, n) = ( m/(m + n), 1/(m + n) ).

What do you see? Can you explain how this mapping works?
(Answer: You see Thomae’s function!)
9 (Root-finding) Use the bisection code to solve the following problems. Give your
answers to 4 decimal places.
a. Solve the equation x³ − x² − x − 1 = 0.
Suggestion: Start by plotting.
b. Solve (sin x)/x = 0.9 (hence verifying the intersection point seen in fig. 1.8).
c. Find the numerical value of √3 using only the four basic operations + − × ÷.
Suggestion: Start by defining f(x) = x² − 3.
10 (Generalised Golden Ratio) Suppose we generalise the Golden Ratio to φn, defined as the positive root of the following order-n polynomial:

    xⁿ − xⁿ⁻¹ − xⁿ⁻² − · · · − 1 = 0.

For example, φ1 = 1 and φ2 ≈ 1.618.
Write a code that plots the sequence φn (obtained by bisection). Make a conjecture for the value of lim_{n→∞} φn.

11 (Newton-Raphson method) Let f be a differentiable function on R. The Newton-Raphson method for finding a root of the equation f(x) = 0 comprises the following steps.
• Start with an initial guess x0.
• Calculate

    xn+1 = xn − f(xn)/f′(xn),

where the expression for f′(x) must be explicitly known.
• Repeat the above iteration until |xn+1 − xn| < ε for a fixed tolerance ε specified by the user (e.g. 0.5 × 10⁻ᵖ for p-decimal-place accuracy). This step ensures that we stop only when the first p decimal places have stabilised.
a. Write a Python code for this method and use it to solve the problems in question 9.
b. Let xn be the estimate of the solution of f(x) = x² − 3 = 0 after n iterations of the Newton-Raphson method. Calculate the absolute error En = |xn − √3| and verify that the Newton-Raphson method converges with order 2. Do this by calculating q (defined in question 6).
12 (Creating a pretty plot) Plot the graph of the sinc function f : [0, 2π] → R defined by

    f(x) = { (sin x)/x  if x ≠ 0;  1  if x = 0 }.

On the same set of axes, plot its first and second derivatives, estimated using the forward-Euler method (i.e. do not differentiate anything by hand in this question). Try to use different types of lines in your plot (think about colour-blind readers). Here is an example output.

Fig. 1.16: A pretty plot of the sinc function and its first and second derivatives.

13 (Mean Value Theorem) Consider the sinc function f defined in the previous question.
Let’s apply the Mean Value Theorem (theorem 1.11) to f on the interval [0, π].
In particular, let's determine the value(s) of c in the statement of the theorem. In other words, we wish to find c ∈ [0, π] such that

    f′(c) = [f(π) − f(0)]/π.
a. Show (by hand) that one solution is c = π.
b. Use a root-finding algorithm to find another solution to 4 decimal places.
Your answer should be consistent with the dashed line in fig. 1.16 above.

14 (Another counterexample) Here is another interesting counterintuitive example in


analysis. Use counterexample.ipynb as a template.
Consider the function f : [0, 1] → R defined by

    f(x) = { x (2 − cos ln x − sin ln x)  if x ∈ (0, 1];  0  if x = 0 }.

It can be shown that f is strictly increasing, yet there are infinitely many points in (0, 1] where f′(x) = 0.
a. Use Python to plot the graph of the function. Your graph should demonstrate the
self-similar structure when we zoom in towards the origin.
b. Show (by hand) that there are points of inflection at x = e^(−2nπ), where n ∈ N. Indicate them on your plot.
CHAPTER
TWO

Calculus

Calculus broadly encompasses the study of differentiation and integration. At school,


these subjects are often taught in a rather algorithmic way with a focus on employing
various techniques of differentiation and integration using stock formulae and tricks. This
approach only scratches the surface of what has to be one of the most profound mathematical
inventions that have helped us understand the physical world and the laws of nature.
At university, the focus of calculus shifts dramatically from how to why you can
differentiate or integrate a function, and calculus becomes a much more rigorous subject
with close links to analysis. A good review of these links is given in [161, 189], for example.

Fig. 2.1: (L-R) Sir Isaac Newton (1642–1726) and Gottfried Leibniz (1646–1716) formulated
calculus independently, although the question of who came to it first was a highly contentious
intellectual feud of the late 17th century. (Image source: [137].)

How can we use Python to help us understand calculus? Since Python is primarily a
numerical tool, we will be interested in the numerical values of derivatives and integrals
rather than their algebraic expressions. For example, we are not interested in the problems
    If f(x) = e^(−x²), find f′(x)        OR        Find ∫ sin x dx,

which are symbolic in nature, and so must be solved by specialised packages that have been
taught the rules of calculus (e.g. the SymPy library). Instead, we are interested in problems
that require numerical answers, such as

    If f(x) = e^(−x²), find f′(2)        OR        Find ∫₀^π sin x dx,
which can be solved with basic binary arithmetic on any computer.
Calculus on the computer is a surprisingly subtle topic. Because Python does not know
even the most basic rules of differentiation and integration, we need to approximate those
quantities numerically using the gradient of a tangent for derivatives, and area under a graph
for integrals. The surprise here, as we will see, is that there is a limit to how accurate these
numerical answers can be, stemming from the inevitable fact that computers operate on a
system of floating-point binary numbers (this will be explored in §2.2).
We will also explore some applications of calculus, including applying Taylor’s Theorem
to quantify the error in approximating a function as a polynomial (§2.4) and approximating
a function as a Fourier series, comprising sine and cosine waves of various frequencies
(§2.9).

2.1 Basic calculus with SciPy

Another useful Python library for mathematicians is SciPy (https://www.scipy.org), which contains ready-made functions and constants for advanced mathematics. We will often use SciPy throughout this book.
You will need SciPy version 1.10 or later. To check this, run the following lines:
import scipy
scipy.__version__

If you have an older version of SciPy, update it using pip. See Appendix A.
In this chapter, we will need the SciPy module scipy.integrate, which contains
several integration routines such as the Trapezium and Simpson’s Rules. The go-to workhorse
for computing integrals in Python is the function quad (for quadrature, a traditional term for
numerical integration). The quad function itself is not a single method but a set of routines
which work together to achieve an accurate answer efficiently. The algorithm behind quad
was initially conceived in Fortran in a package called QUADPACK, and is described in
detail by [165].
Here’s the basic syntax for the quad function.
Integration with SciPy

import numpy as np
import matplotlib.pyplot as plt
import scipy.integrate as integrate    # for integration with SciPy

f = lambda x: np.exp(-x**2)            # define f(x) = e^(-x^2)
# Evaluate the integral of e^(-x^2) from 0 to infinity
integral, error = integrate.quad(f, 0, np.inf)

The output for (integral, error) is

(0.8862269254527579, 7.101318390472462e-09)

Note that quad returns a pair of numbers (called a tuple of length 2). The first number is the
value of the definite integral, and the second is an estimate of the absolute error (which

should be tiny for a reliable answer). In this case, the exact answer is √π/2, which agrees
with SciPy’s answer to 16 decimal places.
The mathematical details of how functions like SciPy’s integrate work are usually
hidden from users behind a ‘Wizard-of-Oz’ curtain that users rarely look behind. This is
contrary to the spirit of this book whose emphasis is on mathematics and not commands.
Therefore, in this chapter, we will only use SciPy’s integrate function occasionally, and
only to confirm what we can do by other, more transparent methods.
As for differentiation, SciPy had a derivative routine, but it is now obsolete. In the
next section, we will discuss how to differentiate a function numerically.

2.2 Comparison of differentiation formulae

Consider the following 3 formulae for approximating the derivative of a function f at a point x.
Let h > 0 be a small number (we call this the step size).
The forward-difference formula: f′(x) ≈ ( f(x + h) − f(x) ) / h.
The backward-difference formula: f′(x) ≈ ( f(x) − f(x − h) ) / h.
The symmetric-difference formula: f′(x) ≈ ( f(x + h) − f(x − h) ) / (2h).
Let f(x) = x³. Compare the actual value of the derivative f′(1) to the above approximations for a range of step sizes h. Do this by plotting the absolute error E(h) for each formula, where

E(h) = |actual value of the derivative − numerical approximation using step size h|.

The forward, backward and symmetric-difference formulae all approach the gradient of
the tangent to the graph y = f (x) at point x as the step size h → 0, so it is reasonable to
expect that the smaller the h, the more accurate the expression will be. However, we show in
this section that on the computer, this is not the case. This may come as a surprise to many
beginners. It is very important to be aware that when coding derivatives, hugely inaccurate
answers could result if h is too large or too small.
Let’s study the accuracy of these approximations at a fixed value x = 1 and calculate the
absolute difference between f 0 (1) and the approximations as we vary h. The code below
produces the graph of E(h), which in this case is

E(h) = |3 − approximation|

for h in the range [10−20, 1].



Eh.ipynb (for plotting fig. 2.2)

import numpy as np
import matplotlib.pyplot as plt

h = np.logspace(-20, 0, 300)    # range of h from 1e-20 to 1e0 (log scale)

x = 1                           # point of interest
f = lambda x: x**3              # define the function
actual = 3*x**2                 # actual expression for f'(x)

fx = f(x)
fxp = f(x+h)
fxm = f(x-h)

est1 = (fxp-fx)/h               # forward difference
est2 = (fxp-fxm)/(2*h)          # symmetric difference

err1 = abs(actual-est1)         # E(h) for each approximation
err2 = abs(actual-est2)

# Plot the errors on log-log axes
plt.loglog(h, err2, 'k', lw=2)
plt.loglog(h, err1, 'r', lw=1)
plt.legend(['Symmetric difference',
            'Forward difference'])
plt.xlabel(r'$h$')
plt.ylabel(r'Absolute error $E$')
plt.xlim([1e-20, 1])
plt.grid('on')
plt.show()

Fig. 2.2 shows the logarithmic plot of the two approximations (forward and symmetric
differences). The graphs show a number of distinctive and surprising features:

Fig. 2.2: E(h), defined as |f′(1) − approximation|, is plotted for the forward-difference approximation (thin red line) and the symmetric-difference approximation (thick black line).

• For each approximation, there appears to be an optimal value of h which minimises the error, namely h_opt ∼ 10⁻⁸ for the forward difference and h_opt ∼ 10⁻⁶ for the symmetric difference. The symbol ∼ is used in this context to mean a rough order-of-magnitude estimate.
• The minimum errors (ignoring small-scale fluctuations) are roughly E(h_opt) ∼ 10⁻⁸ for the forward difference and E(h_opt) ∼ 10⁻¹¹ for the symmetric difference.
• For h ≳ h_opt, we see a linear behaviour in both cases. Since this is a log-log plot, a line with gradient m actually corresponds to the power-law relation E(h) ∼ h^m. From the graph we see that E(h ≳ h_opt) ∼ h for the forward difference and ∼ h² for the symmetric difference.
• For 10⁻¹⁶ ≲ h ≲ h_opt, we see rapid fluctuations, but the overall trend for both approximations is E ∼ h⁻¹.
• For h ≲ 10⁻¹⁶, both approximations give the same constant.

You should try changing the function and the point x at which the derivative is calculated. You should see that this has no effect on any of the observations summarised above. You should also verify that the backward-difference approximation gives essentially the same graph as the forward-difference one (the fluctuations will be slightly different).
The key takeaway here is that numerical derivatives do not behave in the way we might expect: smaller h does not produce a more accurate estimate. Always use h ≈ h_opt whenever you code a derivative.
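For example, a general-purpose helper might look like this (a minimal sketch; the default step is the near-optimal value for the symmetric difference in double precision):

def derivative(f, x, h=1e-6):
    # symmetric difference with a near-optimal default step size
    return (f(x + h) - f(x - h))/(2*h)

print(derivative(lambda x: x**3, 1))    # approx 3, accurate to ~10 decimal places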
Discussion
• The machine epsilon. The reason behind the behaviour of E(h) that we saw is the
fact that computers use binary numbers. A computer uses 64 binary digits to represent
any real number as a floating-point number (or simply float). It can be shown that
this accuracy is dictated by a number called the machine epsilon, ε mach , defined
as the distance between 1 and the next floating-point number greater than 1. For
double-precision floats (which is what most computers today use by default), we have

ε_mach = 2⁻⁵² ≈ 2.2 × 10⁻¹⁶.

See [182] for an explanation of how ε mach is calculated. The machine epsilon is one of
the main reasons why numerical results are sometimes very different from theoretical
expectations.
For instance, ε_mach is the reason why we see the flat plateau for very small h in fig. 2.2. If h ≲ ε_mach, the floating-point representation of 1 + h is 1, and the approximation formulae give zero because f(x + h) = f(x − h) = f(x). Therefore E(h) = 3 if h is too small, as seen in the graph.
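You can inspect ε_mach, and the plateau effect, directly in Python (a small sketch using NumPy's float metadata):

import numpy as np

print(np.finfo(float).eps)    # machine epsilon: 2.220446049250313e-16
h = 1e-17                     # smaller than machine epsilon
print(1 + h == 1)             # True: 1 + h rounds back to 1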

• Truncation and rounding errors. The V-shape of the graphs is due to two numerical effects at play. One effect is the rounding error, E_R, which occurs when a number is represented in floating-point form and rounded up or down according to a set of conventions. It can be shown that

E_R ∼ h⁻¹,

and therefore E_R dominates at small h. The tiny fluctuations are due to different rounding rules being applied at different real numbers.
In addition, there is also the truncation error, ET , which is associated with the accuracy
of the approximation formula. In fact, Taylor’s Theorem (see §2.4) tells us that the error
in the forward and symmetric difference formulae can be expressed as follows.
Forward difference: f′(x) = ( f(x + h) − f(x) ) / h − (h/2) f″(ξ₁),    (2.1)
Symmetric difference: f′(x) = ( f(x + h) − f(x − h) ) / (2h) − (h²/6) f‴(ξ₂),    (2.2)

for some ξ₁ ∈ (x, x + h) and ξ₂ ∈ (x − h, x + h). The truncation error is simply each of the error terms above, arising from the truncation of the infinite Taylor series for f. Note that the powers of h in these remainder terms are exactly what we observed in the E(h) graph.
In exercise 2, you will explore the E(h) graph of a more exotic formula for the derivative.
• Big O notation. Another way to express the accuracy of the approximations (2.1)–(2.2) is to use the O(hⁿ) notation, where n is determined from the scaling of the error term. We say that the forward-difference formula is an O(h) approximation, and the symmetric-difference formula is O(h²). The higher the exponent n, the faster the error shrinks, so one does not have to use a tiny h to achieve good accuracy.

2.3 Taylor series

Evaluate and plot the partial sums of the Taylor series for:
a) sin x,  b) ln(1 + x),  c) 1/(1 + x).
In each case, at what values of x does the Taylor series converge?

Recall that the Taylor series for a smooth function f expanded about x = 0 (also known
as Maclaurin series) is given by:

f(x) = f(0) + f′(0)x + (f″(0)/2!) x² + (f‴(0)/3!) x³ + ... = Σ_{n=0}^∞ (f⁽ⁿ⁾(0)/n!) xⁿ.    (2.3)

For the given functions, we find

sin x = Σ_{n=1}^∞ ((−1)ⁿ⁺¹/(2n − 1)!) x^(2n−1),    (2.4)
ln(1 + x) = Σ_{n=1}^∞ ((−1)ⁿ⁺¹/n) xⁿ,    (2.5)
1/(1 + x) = Σ_{n=0}^∞ (−x)ⁿ.    (2.6)

The code taylor.ipynb plots the graph y = ln(1 + x) along with partial sums of the Taylor series up to the x⁴⁰ term (fig. 2.3). As there are so many lines, it might help to start plotting only from, say, the 5th partial sum onwards. The first few partial sums do not tell you much more than the fact that they are terrible approximations.
It might also help to systematically colour the various curves. In our plot, we adjust the
(r,g,b) values gradually to make the curves ‘bluer’ as the number of terms increases (i.e.
by increasing the b value from 0 to 1 and keeping r=g=0).
From the graphs (and further experimenting with the number of terms in the partial
sums), we can make the following observations.
• sin x – The Taylor series appears to converge to sin x at all x ∈ R.
• ln(1 + x) – The Taylor series appears to converge to ln(1 + x) for x ∈ (−1, 1), possibly
also at x = 1. For x > 1, the graphs for large n show a divergence (i.e. the y values
become arbitrarily large in the right neighbourhood of x = 1).
• 1/(1 + x) – The Taylor series also appears to converge to 1/(1 + x) for x ∈ (−1, 1), similar to ln(1 + x). At x = 1, the terms of the series are alternately ±1, so the partial sums oscillate and clearly do not converge to the function value of 1/2. For |x| > 1, the Taylor series blows up.

Fig. 2.3: The graphs of y = sin x, ln(1 + x) and 1/(1 + x) (thick black lines) and their Taylor series. The top panel shows up to 7 terms in the series (up to the term x¹³). The lower panels show the series up to the x⁴⁰ term, with bluer lines indicating a higher number of terms.

taylor.ipynb (for plotting the middle panel in fig. 2.3)

import numpy as np
import matplotlib.pyplot as plt

# nth term of the Taylor series for ln(1+x)
def nth_term(x, n):
    return -(-1)**n*x**n/n

n_max = 40                          # specify the maximum number of terms
x = np.linspace(-0.99, 2, 100)      # domain of the function
S = np.zeros_like(x)

# With every pass of the for loop, add a term to the partial sum
for n in np.arange(1, n_max+1):
    S = S + nth_term(x, n)
    b = n/n_max                     # adjust the colour (b value) of the curve
    if (n >= 5):                    # start plotting from n = 5
        plt.plot(x, S, label='_nolegend_',
                 color=(0, 0, b), lw=1)

# Plot the function itself as a thick black line
plt.plot(x, np.log(1+x), lw=2, color='k')
plt.legend([r'$y=\ln(1+x)$'], loc='upper left')
plt.xlim([-0.99, 2])
plt.ylim([-2, 2])
plt.grid('on')
plt.show()

Discussion
• Radius of convergence. The radius of convergence, R, of a power series Σ aₙxⁿ is a real number such that the series converges for |x| < R and diverges for |x| > R. The interval (−R, R) is called the interval of convergence of the series.
For example, the series (2.6) can be regarded as a geometric series with common ratio −x. We know that the geometric series converges to the sum to infinity 1/(1 + x) if |x| < 1 and diverges when |x| > 1, as can be seen graphically in fig. 2.3. We say that the radius of convergence of the series is 1.
• Ratio Test. We discussed the comparison test in the previous chapter. Another useful convergence test is the following result, known as d'Alembert's Ratio Test:

Theorem 2.1 (Ratio Test) Let Tₙ be a sequence such that Tₙ ≠ 0 eventually. Let

L = lim_{n→∞} |T_{n+1}/Tₙ|.

If L < 1 then the series Σ Tₙ converges. If L > 1 then the series Σ Tₙ diverges.

Applying the Ratio Test to the series (2.4) gives

L = lim_{n→∞} x²/(2n(2n + 1)) = 0, for all x ∈ R.

This proves our conjecture that the Taylor series converges for all x.
However, we have not proved that the series converges to sin x (we only proved that it
converges to some function). We will come back to this in the next section.

• Differentiating and integrating power series. There is a relation between the Taylor series for 1/(1 + x) and ln(1 + x). Here is a very useful theorem from analysis:

Theorem 2.2 The series Σ aₙxⁿ can be differentiated and integrated term by term within the interval of convergence. The resulting power series has the same radius of convergence.
Using this result, we can integrate both sides of Eq. 2.6 with respect to x as long as
|x| < 1, yielding exactly Eq. 2.5. This proves our conjecture that the two series share
the same interval of convergence.
However, we have not established the convergence at the end points x = ±1. We will
discuss this in the next section.
2.4 Taylor’s Theorem and the Remainder term 63

2.4 Taylor’s Theorem and the Remainder term

Suppose we approximate sin x and ln(1 + x) by the following polynomials (obtained by truncating their Taylor series):

sin x ≈ P_{2N−1}(x) := x − x³/3! + x⁵/5! − ... + (−1)^(N+1) x^(2N−1)/(2N − 1)!
      = Σ_{n=1}^N ((−1)ⁿ⁺¹/(2n − 1)!) x^(2n−1),    (2.7)

ln(1 + x) ≈ P_N(x) := x − x²/2 + x³/3 − ... + (−1)^(N+1) x^N/N
      = Σ_{n=1}^N ((−1)ⁿ⁺¹/n) xⁿ.    (2.8)

Quantify the accuracy of these approximations by investigating the difference between the function and its order-k polynomial approximation:

R_k(x) = f(x) − P_k(x).

You will probably be familiar with the approximation of a function f (x) as an order-k
polynomial Pk (x) by truncating its Taylor series after a finite number of terms. At university,
we are not only concerned with the series itself, but also the error term (the ‘remainder’) in
the expansion, defined by Rk (x). The following theorem gives a useful expression for the
remainder term.

Theorem 2.3 (Taylor’s Theorem) Let I = [a, b] and N = 0, 1, 2 . . .. Suppose that f and
its derivatives f 0, f 00, . . . f (N ) are continuous on I, and that f (N +1) exists on on (a, b). If
x 0 ∈ I, then, ∀x ∈ I \ {x 0 }, ∃ξ between x and x 0 such that

f (N ) (x 0 )
f (x) = f (x 0 ) + f 0 (x 0 )(x − x 0 ) + · · · + (x − x 0 ) N + R N (x)
N!
f (N +1) (ξ)
where R N (x) = (x − x 0 ) N +1
(N + 1)!

Note that when x = x 0 , the equality is trivial.


Although the theorem bears the name of the English mathematician Brook Taylor
(1685–1731) who studied the polynomial expansion of functions, it was Joseph-Louis
Lagrange (1736–1813) who provided the expression for the remainder term. R_N is often called the Lagrange form of the remainder.
In this form, the remainder is not completely determined due to the appearance of an
unknown ξ. However, in practice, this expression is sufficient for us to place a bound on R,
giving us some idea of the magnitude of the error.
Applying Taylor’s Theorem to f (x) = ln(1 + x), the remainder R N is found to be
R_N(x) = ((−1)ᴺ/(N + 1)) (x/(1 + ξ))^(N+1), for some ξ ∈ (0, x).    (2.9)

On the domain 0 < x ≤ 1 (where the Taylor series converges), we find that |R N (x)| is
bounded by:

(1/(N + 1)) (x/(1 + x))^(N+1) < |R_N(x)| < (1/(N + 1)) x^(N+1).    (2.10)
As the polynomial order N increases, we expect the quality of the approximation to improve, and so R_N(x) should shrink to 0. Indeed, as N → ∞, the RHS of (2.10) goes to zero. This means that for all x ∈ (0, 1], the Taylor series for f converges to f(x) (and not to any other function).¹
Interestingly, this also explains why we can evaluate the Taylor series at x = 1, giving us the familiar series for ln 2 which we saw in §1.4.
The code taylorthm.ipynb calculates R N (x) at a fixed x ∈ (0, 1] and plots it as a
function of N. Fig. 2.4 shows the graph of R N (x = 0.4) (solid red line) when ln(1 + x) is
approximated by the Taylor polynomial of order N. The upper and lower bounds for |R N |
(eq. 2.10) are shown in dotted lines. The graph shows that Taylor’s Theorem holds, and that
the constant ξ must be close to 0.
Now let’s turn to the function f (x) = sin x. Since the coefficients of the even powers of
x vanish, the 2N-th order approximation is the same as the (2N − 1)th order approximation.
In other words, R2N (x) = R2N −1 (x). Calculating the remainder using Taylor’s Theorem,
we find
R_{2N−1}(x) = R_{2N}(x) = ((−1)ᴺ cos ξ/(2N + 1)!) x^(2N+1), for some ξ ∈ (0, x).
Note that the cosine is positive and decreasing on the interval (0, 0.4). Thus, we find the
upper and lower bounds for |R N |:
(cos x/(2N + 1)!) x^(2N+1) < |R_{2N−1}(x)| = |R_{2N}(x)| < (1/(2N + 1)!) x^(2N+1).    (2.11)
In fig. 2.4, we can see that the remainder and its bounds are almost indistinguishable, up until around N = 12, where |R_N| violates the bounds as it shrinks below a small number of order ε_mach ≈ 10⁻¹⁶. This is due to an additional error arising from the subtraction of two nearly equal numbers (namely, f(x) and P_N(x)).
In both cases, our conclusion is that Taylor’s Theorem is verified. In addition, we also
saw that the Taylor series for sin x converges much more rapidly than that of ln(1 + x). By
the time we reach the order-12 polynomial, the approximation for sin x is so good that the
error is smaller than ε mach .

1 Keen-eyed readers might remember from fig. 2.3 that the Taylor series of ln(1 + x) also converges to
ln(1 + x) when −1 < x < 0. However, our proof does not work in this case, since it no longer follows from
eq. 2.9 that R N (x) → 0. What is needed in that case is a different technique (for example, integrating
another Taylor series).
2.4 Taylor’s Theorem and the Remainder term 65

Fig. 2.4: The absolute value of the remainder term |R N (x)| for the function ln(1 + x)
(top) and sin x (bottom), evaluated at x = 0.4, plotted as a function of N (the order of the
approximating polynomial). As N increases, the remainder approaches zero. The bounds
for |R N | are plotted in dotted lines.

Discussion
• Ratio Lemma. We showed in the previous Section that the Taylor series for sin x
converges, but it remains to show that it converges to sin x. Here is a very useful result
which will help us prove this.
Theorem 2.4 (Ratio Lemma) Let an be a sequence such that an > 0. Suppose L is a
constant such that 0 < L < 1 and an+1 /an ≤ L eventually. Then, an converges to 0.
Letting a N be the sequence |x| 2N +1 /(2N + 1)! which appears on the RHS of Eq. 2.11
x2
(where x , 0). Then the ratio a N +1 /a N = (2N +3)(2N +2) → 0 regardless of the value
of x. The Ratio Lemma says that a N → 0, and therefore the remainder R N → 0. This
proves that the Taylor series of sin x converges to sin x for all x ∈ R.

• A pathological function. You may be wondering if it is possible for the Taylor series
of a function to converge to a different function. The answer is yes! Here is a classic
counterexample. Define f : R → R by
f(x) = e^(−1/x²) if x ≠ 0, and f(x) = 0 if x = 0.

It can be shown that f⁽ⁿ⁾(0) = 0 for all n ∈ N (see, for example, [91] for calculation details). Therefore, the Taylor series of f converges to 0 everywhere, but only coincides with f at a single point, x = 0.

taylorthm.ipynb (for plotting the top panel of fig. 2.4)

import numpy as np
import matplotlib.pyplot as plt

# Term of degree N in the Taylor series of ln(1+x)
def Nth_term(x, N):
    return -(-1)**N*x**N/N

x = 0.4            # choose any x in (-1, 1]
N_max = 15         # specify the maximum polynomial order
Nlist = np.arange(1, N_max+1)
P = 0
PNlist = []        # values of the polynomial P_N(x)
lowlist = []       # lower bound for the remainder
hilist = []        # upper bound for the remainder

# Append the lists with every iteration
for N in Nlist:
    P = P + Nth_term(x, N)
    PNlist.append(P)
    Np = N+1
    low = (x/(1+x))**Np/Np       # lower and upper bounds in Eq. 2.10
    lowlist.append(low)
    hi = x**Np/Np
    hilist.append(hi)

# The remainder |R_N(x=0.4)|
RN = abs(PNlist - np.log(1+x))

# Plot |R_N| (thick red line) on a log y-axis
plt.semilogy(Nlist, RN, lw=2, color='r')
# Upper and lower bounds in dotted lines
plt.semilogy(Nlist, lowlist, 'r:', Nlist, hilist, 'r:')
plt.legend([r'$|R_N(x=0.4)|$'])
plt.xticks(Nlist)
plt.xlim([1, N_max])
plt.xlabel('Polynomial order N')
plt.ylim([1e-10, 0.1])
plt.grid('on')
plt.show()
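The bottom panel of fig. 2.4 can be produced in much the same way. Here is a minimal sketch (assuming the same conventions as taylorthm.ipynb), accumulating the partial sums of the sine series over the odd orders 2N − 1 and using the bounds in Eq. 2.11:

import numpy as np
import matplotlib.pyplot as plt
from math import factorial

x = 0.4
N_max = 8                        # highest N, i.e. polynomial order 2N-1 = 15
Nlist = np.arange(1, N_max+1)
P, PNlist, lowlist, hilist = 0, [], [], []

for N in Nlist:
    P += (-1)**(N+1)*x**(2*N-1)/factorial(2*N-1)   # add the x^(2N-1) term
    PNlist.append(P)
    # lower and upper bounds in Eq. 2.11
    lowlist.append(np.cos(x)*x**(2*N+1)/factorial(2*N+1))
    hilist.append(x**(2*N+1)/factorial(2*N+1))

RN = abs(np.array(PNlist) - np.sin(x))             # |R_(2N-1)(x=0.4)|

plt.semilogy(2*Nlist-1, RN, lw=2, color='r')
plt.semilogy(2*Nlist-1, lowlist, 'r:', 2*Nlist-1, hilist, 'r:')
plt.legend([r'$|R_N(x=0.4)|$'])
plt.xlabel('Polynomial order N')
plt.grid('on')
plt.show()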

2.5 A continuous, nowhere differentiable function

In 1872, Karl Weierstrass announced the discovery of a function which is continuous on R, but is nowhere differentiable. The Weierstrass function f : R → R is given by

f(x) = Σ_{n=0}^∞ aⁿ cos(bⁿπx),

where a ∈ (0, 1), b is an odd integer, and ab > 1 + 3π/2.
Plot the Weierstrass function for some choices of a and b.

A function is continuous wherever it is differentiable. However, it came as a surprise to the late 19th-century mathematical world that the converse is not true. The Weierstrass
function, arguably the most famous counterexample in mathematics, led to the development
of a rigorous foundation for analysis. Although Weierstrass’s counterexample was the first to
be published, and certainly the most impactful, such a pathological function was discovered
much earlier in 1830 by Bolzano. There are now many known examples of such a function.
An accurate plot of the function had to wait until the advent of computers (surely
frustrating mathematicians over the decades). Luckily for us, we can use Python to visualise
Weierstrass’s original function. The code below produces an interactive GUI with two
sliders for adjusting the values of a and b (fig 2.5). The code looks long, but it is simply a
straightforward extension of the single-slider code that we used in §1.7.
Using this GUI, we see that the parameter a adjusts the amplitude of the small-scale
fluctuations (as a → 0, f reduces to a regular cosine curve). The parameter b adjusts the
density of the small-scale fluctuations (higher b gives more substructures with high-frequency
oscillations).
The GUI also allows us to zoom into the curve, revealing a self-similar (or fractal-like) oscillating structure. This self-similarity is the key intuitive reason why the curve is not differentiable: zooming in always reveals the same oscillations, meaning that there are no smooth segments on the curve, despite the fact that the function is continuous on R.
Whilst these properties may be easy to understand intuitively, Weierstrass’s original
proof is rather technical. We refer interested readers to [161] for a readable walkthrough of
the proof. It should be mentioned that the conditions ab > 1 + 3π/2 and b an odd integer are
both specific to Weierstrass’s own working. It was shown by Hardy that the only conditions
required are that ab > 1 and b > 1 (not necessarily an integer).

Fig. 2.5: The Weierstrass function on two very different scales, showing its self-similar
structure.

Discussion
• Other monsters. Poincaré famously described Weierstrass's function as a “monster”. Here are two other famous monsters; both will be explored in the exercises.
– The Blancmange function. For m = 0, 1, 2, ..., let f_m : R → R be defined by

f₀(x) = min{|x − k| : k ∈ Z},   f_m(x) = (1/2^m) f₀(2^m x).

The Blancmange function, g : R → R, is defined by g(x) = Σ_{m=0}^∞ f_m(x). It is continuous on R but is nowhere differentiable.
– Riemann’s function, R : R → R is defined by

sin(k 2 x)

X
R(x) = . (2.12)
k=1
k2

Riemann conjectured that R was nowhere differentiable, and Weierstrass drew inspiration from this function in the construction of his counterexample. However, it is now known that R is differentiable only at points of the form πa/b, where a, b are odd coprime integers.

• Further reading. For historical accounts and accessible material on continuous, nowhere differentiable functions, see [118, 202]. A more comprehensive treatment can be found in [98].

weierstrass.ipynb (for plotting fig. 2.5)

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider
# Create an interactive GUI in a separate window
%matplotlib

a, b = 0.5, 13                  # initial values of a and b
m_max = 25                      # maximum number of terms in the series
x = np.linspace(-2, 2, 2500)    # x values (need high resolution)

# Each term in the series. Pass a and b as arguments
# so they can be updated with the sliders.
def fn(x, n, a, b):
    return a**n*np.cos(np.pi*x*b**n)

# The Weierstrass function: sum the terms from n = 0 to m_max
def g(x, a, b):
    S = np.zeros_like(x)
    for i in np.arange(0, m_max+1):
        S = S + fn(x, i, a, b)
    return S

fig, ax = plt.subplots()
plt.subplots_adjust(bottom=0.2)     # leave a space at the bottom for sliders

# Plot the Weierstrass function (thin black line)
Wfunc, = plt.plot(x, g(x, a, b), 'k', lw=0.5)
plt.xlim([-2, 2])
plt.ylim([-2, 2])
plt.grid('on')
plt.title('The Weierstrass Function')

# Create a slider for a in (0, 1)
axa = plt.axes([0.15, 0.05, 0.7, 0.02])
a_slide = Slider(axa, 'a', 0, 1, valstep=0.01, valinit=a)

# Create a slider for b
axb = plt.axes([0.15, 0.1, 0.7, 0.02])
b_slide = Slider(axb, 'b', 1, 25, valstep=0.01, valinit=b)

# Update the plot if a slider is changed
def update(val):
    a = a_slide.val     # take a and b from the sliders
    b = b_slide.val
    Wfunc.set_ydata(g(x, a, b))
    fig.canvas.draw_idle()

# Redraw when either slider moves
a_slide.on_changed(update)
b_slide.on_changed(update)

plt.show()

2.6 Integration with Trapezium Rule

Partition the interval [a, b] into n subintervals [x₀, x₁], [x₁, x₂], ..., [x_{n−1}, xₙ] of equal length (where x₀ = a and xₙ = b). Let h = (b − a)/n. The Trapezium Rule (or Trapezoidal Rule) states that the integral ∫_a^b f(x) dx can be approximated as:

∫_a^b f(x) dx ≈ (h/2) ( y₀ + yₙ + 2 Σ_{i=1}^{n−1} yᵢ ),    (2.13)

where yᵢ = f(xᵢ). We say that the RHS is the approximation of the integral using the Trapezium Rule with n strips.
Use the Trapezium Rule to approximate the integral ∫₁² ln x dx.
How does the accuracy of the answer vary with the width of each strip h?

The Trapezium Rule approximates the area under the graph y = f (x) as the sum of
the area of n equally spaced trapezia. The left panel of fig. 2.6 demonstrates this idea
for f (x) = ln x, using 10 trapezia, or strips. The concavity of the curve suggests that the
Trapezium-Rule estimate will be less than the actual answer.
Similar to the error analysis of derivatives in §2.2, we can study the error in numerical integration by defining E(h) as the absolute difference between the Trapezium-Rule estimate and the actual answer. The latter can be obtained by integration by parts, yielding ∫₁² ln x dx = [x ln x − x]₁² = 2 ln 2 − 1. Hence, we find

E(h) = |(2 ln 2 − 1) − Trapezium-Rule estimate| > 0.



The graph of E(h) (produced by the code trapezium.ipynb) is shown on the right of fig. 2.6. Since E(h) appears to be a straight line on the logarithmic plot, we can approximate this behaviour as E ∝ h^k, where k is the gradient of the straight line. Python tells us that
Gradient of line = 1.99999.
Thus, we conjecture that the Trapezium Rule is an O(h²) approximation.


Fig. 2.6: Trapezium Rule. Left: The integration scheme for ∫₁² ln x dx using 10 trapezoidal strips. Right: The error E(h) plotted on log scales, showing that E(h) ∝ h².

Discussion
• The error term. It can be shown (e.g. [182]) that the error term in the Trapezium Rule
can be written as
∫_a^b f(x) dx = (h/2) ( y₀ + yₙ + 2 Σ_{i=1}^{n−1} yᵢ ) − ((b − a)h²/12) f″(ξ),    (2.14)

for some ξ ∈ (a, b). The exponent of h in the error term confirms our finding that the
Trapezium Rule is an O(h2 ) approximation. Note from the formula that if f is a linear
function, the error vanishes. This is consistent with the geometric picture – the area
under a straight line is exactly a trapezium.
• numpy.trapz NumPy actually has a built-in Trapezium Rule. Look up the trapz
command in NumPy’s documentation and verify that it gives the same result as our
own code.
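A quick check (a sketch; note that np.trapz takes the y values first, then the x values, and is renamed np.trapezoid in recent NumPy versions):

import numpy as np

x = np.linspace(1, 2, 11)     # 10 strips on [1, 2]
y = np.log(x)
print(np.trapz(y, x))         # 0.38587..., slightly below the exact answer
print(2*np.log(2) - 1)        # exact: 0.38629...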
trapezium.ipynb (for plotting fig. 2.6)

import numpy as np
import matplotlib.pyplot as plt

a, b = 1, 2                     # integration limits
# Create evenly spaced values of N (number of strips) on a log scale
# (round gives the nearest integer)
N = np.round(np.logspace(2, 5))
actual = 2*np.log(2)-1          # exact answer for the integral
hlist = (b-a)/N                 # h (width of each strip)
error = []                      # E(h), to be filled in using the for loop

# Eq. 2.13
def trapz(y, h):
    return h*(sum(y)-(y[0]+y[-1])/2)

# Given a fixed number of strips,
for n in N:
    x = np.linspace(a, b, int(n+1))   # create the partition x_i
    y = np.log(x)                     # y_i
    h = (b-a)/n
    estim = trapz(y, h)               # apply the Trapezium Rule
    error.append(actual-estim)        # and collect the value of E(h)

plt.loglog(hlist, np.abs(error))
plt.xlim([1e-5, 1e-2])
plt.xlabel('h')
plt.ylabel('E(h)')
plt.grid('on')
plt.show()

# Calculate the gradient from the first and last points on the line, and report
k = (np.log(error[0])-np.log(error[-1]))/\
    (np.log(hlist[0])-np.log(hlist[-1]))
print(f'Gradient of line = {k:.5f}.')

2.7 Integration with Simpson’s Rule

Partition the interval [a, b] into 2n equal subintervals [x₀, x₁], [x₁, x₂], ..., [x_{2n−1}, x_{2n}] (where x₀ = a and x_{2n} = b). Let h = (b − a)/2n. Simpson's Rule states that the integral ∫_a^b f(x) dx can be approximated as:

∫_a^b f(x) dx ≈ (h/3) ( y₀ + y_{2n} + 4 Σ_{i=1}^{n} y_{2i−1} + 2 Σ_{i=1}^{n−1} y_{2i} ),    (2.15)

where yᵢ = f(xᵢ).
Use Simpson's Rule to approximate the integral ∫₁² ln x dx.
How does the accuracy of the answer vary with the width of each strip?

Thomas Simpson (1710–1761) was an English self-taught mathematician known today for his integration rule (although, as he himself acknowledged, the result was already known to Newton and a number of previous mathematicians).
In Simpson’s Rule, the curve y = f (x) is approximated using a parabola drawn over each
subinterval [x i , x i+2 ]. Each parabola goes through three points, namely, (x i , yi ), (x i+1, yi+1 )
and (x i+2, yi+2 ). For easier comparison with the Trapezium Rule, we say that the formula
(2.15) uses n strips, each with width H ≡ 2h, where h is the width of each substrip.
This means that one parabola is drawn per strip. Fig. 2.7 (left panel) demonstrates this
scheme with 10 strips: the thick red vertical dashed lines show the boundaries of strips at
x 0, x 2, . . . x 20 . The thin blue dashed lines are the centres of the strips at x 1, x 3, . . . x 19 .
As before, we define the absolute error in the approximation as

E(H) = |(2 ln 2 − 1) − Simpson's-Rule estimate|.

The graph of E(H) is shown in the right panel of fig. 2.7. For easy comparison with the
result for Trapezium Rule, we plot this graph over the same domain as that in fig. 2.6. We
can make a few interesting observations from this graph.

Fig. 2.7: Simpson's Rule. Left: The integration scheme for ∫₁² ln x dx using 10 strips (20 substrips). Right: The error E(H) plotted on log scales. The domain is the same as that in fig. 2.6 for easier comparison.

• The values of the error, given the same number of strips, are many orders of magnitude smaller than the corresponding errors for the Trapezium Rule. For example, using the strip width 10⁻² gives a surprisingly accurate answer with E ∼ 10⁻¹¹ for Simpson's Rule, whereas for the Trapezium Rule we find a much larger E ∼ 10⁻⁵.
• There appear to be two distinct regimes of the curve: the straight part where H ≳ 10⁻³, and the oscillating part for smaller H. Python tells us that the gradient of the straight part is
Gradient of line = 3.99997.
Thus, we conjecture that Simpson's Rule is an O(h⁴) approximation. Together with the previous point, we conclude that Simpson's Rule is a far superior method in terms of accuracy (at the expense of an increased number of calculations).
• The oscillating part of the curve occurs around E(H) ∼ 10⁻¹⁵, a magnitude comparable to ε_mach. Indeed, we are just seeing the numerical artefact of subtracting two nearly equal numbers in the calculation of E(H).
Here is the code for plotting E(H) and calculating the gradient of the straight part.

simpson.ipynb (for plotting fig. 2.7)

import numpy as np
import matplotlib.pyplot as plt

a, b = 1, 2                       # integration limits
# Create evenly spaced values of N (number of strips) on a log scale
N = np.round(np.logspace(2, 5, 300))
actual = 2*np.log(2)-1            # exact answer for the integral
Hlist = (b-a)/N                   # H (width of each strip)
error = []                        # E(H), to be filled in using the for loop

# Eq. 2.15
def simp(y, h):
    return (h/3)*(y[0]+y[-1]+\
           4*sum(y[1:-1:2])+2*sum(y[2:-1:2]))

# Given a fixed number of strips,
for n in N:
    n2 = 2*n                            # number of substrips
    x = np.linspace(a, b, int(n2+1))    # the partition x_i
    y = np.log(x)                       # y_i
    h = (b-a)/n2                        # width of each substrip
    estim = simp(y, h)                  # apply Simpson's Rule
    error.append(actual-estim)          # and collect the value of E(H)

plt.loglog(Hlist, np.abs(error))
plt.xlim([1e-5, 1e-2])
plt.xlabel('H')
plt.ylabel('E(H)')
plt.grid('on')
plt.show()

# Calculate the gradient of the straight part. Note that the first
# 100 elements of these lists lie in the right third of fig. 2.7.
k = (np.log(error[0])-np.log(error[50]))/\
    (np.log(Hlist[0])-np.log(Hlist[50]))
print(f'Gradient of line = {k:.5f}.')

Discussion
• Smaller h is not always better. Just as we saw in numerical differentiation, we have a
similar phenomenon for numerical integration: smaller h does not always guarantee a
more accurate answer. Fig. 2.7 shows that the minimum strip width (below which the
roundoff error dominates) occurs at around 10−3 .
• The error term. It can be shown (e.g. [182]) that the error term in Simpson’s Rule
can be written as

∫_a^b f(x) dx = (h/3) ( y₀ + y_{2n} + 4 Σ_{i=1}^{n} y_{2i−1} + 2 Σ_{i=1}^{n−1} y_{2i} ) − ((b − a)h⁴/180) f⁽⁴⁾(ξ),

for some ξ ∈ (a, b). The derivation of the error term is surprisingly much more difficult
than that of the Trapezium Rule.
The exponent of h in the error term confirms our conjecture that Simpson’s Rule is an
O(h4 ) approximation.
Note from the formula that if f is a cubic function, the error vanishes. This too is
surprising given that Simpson’s Rule is based on parabolas. If we consider the simplest
case of a single strip on [x 0, x 2 ], the formula suggests that given a cubic curve on
this interval, we can draw a parabola which intersects the cubic at 3 points (where
x = x 0, x 1, x 2 ) such that both the cubic and the parabola have exactly the same area
under the curves! See exercise 9.
Another interesting observation is that the error term allows us to estimate the minimum strip width by setting the magnitude of the error term to be roughly ε_mach. We find

H_min = 2h_min ∼ 2 (180 ε_mach / ((b − a) |f⁽⁴⁾(ξ)|))^(1/4).

With f(x) = ln x on [1, 2] and |f⁽⁴⁾(ξ)| ≥ 3/8, we find a conservative estimate of H_min ≲ 1.1 × 10⁻³. This is in good agreement with what we see in fig. 2.7. A similar calculation of h_min for the Trapezium Rule is left as an exercise.
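As a quick check, this estimate takes two lines (a sketch; the bound |f⁽⁴⁾| ≥ 3/8 comes from f⁽⁴⁾(x) = −6/x⁴ on [1, 2]):

import numpy as np

eps = np.finfo(float).eps           # machine epsilon
print(2*(180*eps/(1*3/8))**0.25)    # about 1.1e-3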
• Romberg integration. In fact, there is a beautiful connection between the Trapezium
and Simpson’s Rules. Each method approximates the function y = f (x) on a strip
with a degree-n polynomial (n = 1 and 2 respectively). One could clearly extend
this approximation by using a higher-order polynomial on each strip. This iterative
construction is formalised in the technique called Romberg integration, frequently used
in numerical applications as it has the advantage of being easy to implement using an
iterative scheme.
The Trapezium Rule is the lowest-order (O(h2 )) Romberg integration, and Simpson’s
Rule is the next order (O(h4 )) Romberg integration. The next order (O(h6 )) Romberg
integration is based on dividing each strip into 4 substrips, and approximating f (x)
over the strip using an order-4 polynomial. This approximation, known as Boole’s Rule,
will be explored in exercise 11.
Werner Romberg (1909–2003), was a German mathematician who escaped Nazi
Germany and settled in Norway. His integration method was based on previous work
by Maclaurin and Huygens. George Boole (1815–1864) was an English mathematician
and inventor of symbolic logic which we now call Boolean algebra.
2.8 Improper integrals

Evaluate the following integrals numerically:

I₁ = ∫₀^∞ e^(−x²) dx,   I₂ = ∫₀^∞ x² e^(−x²) dx,   I₃ = ∫₀^∞ (sin x)/x dx.

An improper integral is an integral that either has an integration limit involving ∞ or one in which the integrand is not well-defined at one of the integration limits (e.g. the integrand in I₃ is not well defined at x = 0).
When calculating an integral numerically, it is always useful to start by plotting the integrand to survey the behaviour of the function over the integration domain. This gives us a clue as to the kind of answer we might expect. Any discontinuities or divergence would be revealed in this step, and we can then conclude that either the integral diverges, or plan to work strategically around the discontinuities. The graphs of the integrands of I₁ to I₃ are shown in fig. 2.8.
You might also notice that for I₃, the integrand does not diverge as x → 0, but approaches 1 (due to the limit lim_{x→0} (sin x)/x = 1, see eq. 1.6). To avoid zero division, it might be useful for numerical purposes to define the integrand of I₃ as

f(x) = (sin x)/x if x ≠ 0, and f(x) = 1 if x = 0.

This makes f continuous on R.

Fig. 2.8: Graphs of the integrands of I₁ to I₃ (top to bottom), namely y = e^(−x²), y = x² e^(−x²) and y = (sin x)/x.

We see that the integrands all appear to converge to y = 0 as x → ∞. However, this behaviour alone does not guarantee the convergence of the integrals (after all, ∫₁^∞ (1/x) dx is divergent). But our graphical investigation has not revealed any obviously divergent integrals. Now let's see how these improper integrals can be evaluated in Python.

First solution: quad


The quickest method for tackling improper integrals is use scipy.integrate.quad
which accepts ∞ (np.inf) in the argument. The code for evaluating I1 was shown in the
beginning of this chapter (§2.1). Using quad, we obtain the following values of the integrals,
as well as the error estimates.
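These values come from calls like the following (a minimal sketch; quad samples interior points, so the integrand of I₃ is never evaluated exactly at x = 0, though a slow-convergence warning should be expected):

import numpy as np
from scipy.integrate import quad

I1 = quad(lambda x: np.exp(-x**2), 0, np.inf)
I2 = quad(lambda x: x**2*np.exp(-x**2), 0, np.inf)
I3 = quad(lambda x: np.sin(x)/x, 0, np.inf)    # expect a warning here
print(I1, I2, I3, sep='\n')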
I1, err1 = (0.8862269254527579, 7.101318390472462e-09)
I2, err2 = (0.4431134627263801, 8.053142703522972e-09)
I3, err3 = (2.247867963468921, 3.2903230524472544)

The error estimates for I1 and I2 look reassuringly tiny, but the error estimate for I3 is
alarmingly large (even larger than the answer itself!). Python also gives a warning message
that “The integral is probably divergent, or slowly convergent". Further investigation is
needed.
The quad method is really easy, but it gives us very little understanding of what is
happening mathematically. When coding becomes an opaque black box, it makes the
interaction between mathematics and computing less meaningful, and the problem becomes
less mathematically enlightening.
Second solution: substitution
Let’s take a look at a more transparent method, this time without using quad.
All the integrals involve infinity in the limit. But infinity is not a number that our finite integration schemes can handle directly, so we must replace it with a finite limit. Simply replacing infinity with a large number is not always going to work; after all, it's not clear how large the large number should be.
Here is a more sophisticated strategy. To work out I = ∫₀^∞ f(x) dx, let's first break it up into two integrals:

I = ∫₀^α f(x) dx + ∫_α^∞ f(x) dx,

where the break point α is a positive number to be determined. For the second integral, use a substitution to turn the limit ∞ into a finite number. Let's try u = 1/x (a different substitution might work even better, depending on the behaviour of the integrand). This yields

I = ∫₀^α f(x) dx + ∫₀^(1/α) f(1/u)/u² du.    (2.16)
When α = 1, the two terms can also be combined into a single integral (upon renaming the dummy variable u as x):

I = ∫₀¹ [ f(x) + f(1/x)/x² ] dx.    (2.17)

Let’s try using formula (2.16) to tackle I1 using Simpson’s Rule to evaluate each integral.
You can use our own Simpson’s Rule code, or, equivalently, scipy.integrate.simpson.
Supposing for now that the exact answer is I₁ = √π/2 (we will explain this in the Discussion section), let's vary the value of the break point α and see how the error behaves. Using h = 10⁻³ in Simpson's Rule, the code improper.ipynb produces fig. 2.9.


Fig. 2.9: The accuracy when I₁ = ∫₀^∞ e^(−x²) dx is evaluated by splitting it into two integrals (eq. 2.16) and using Simpson's Rule with h = 10⁻³. The graph shows the absolute error as the break point α varies. The absolute error is minimised when α ≳ 5.

Fig 2.9 shows that the best accuracy is obtained when α is around 5 (where the absolute
error is machine-epsilon limited). Nevertheless, for any α the worst accuracy is still a
respectable 13 decimal places. With α = 5, Python gives the following excellent estimate
which is accurate to 15 decimal places:
integ = 0.886226925452758
exact = 0.8862269254527579
A couple of highlights from the code:
• In evaluating the second integral in eq. 2.16, we choose a tiny cutoff for the lower limit (10⁻⁸ in the code) in place of zero, which would have produced a division-by-zero error.
It is always a good idea to check that the answer for the integral is insensitive to the
choice of the tiny cutoff. Indeed, varying the cutoff by a few orders of magnitude does
not change the shape of the graph.
• Simpson’s Rule requires an even number of strips, i.e. an odd number of points N in
np.linspace(a, b, N). We ensure that N is odd using the function
makeodd(N) = N + 1 - (N%2).
The syntax N%2 gives the remainder when N is divided by 2. Therefore, the function
gives N when N is odd, and N+1 when N is even.
• Nevertheless, SciPy’s integrate.simpson would still work without our makeodd
function, but one should be aware of how the routine deals with an odd number of
half-strips. The result can be different depending on your version of SciPy. For more on
this point, consult SciPy’s documentation2.

² https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.simpson.html

improper.ipynb (for producing fig. 2.9)

import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import simpson     # SciPy's Simpson's Rule

# Range of break point alpha in eq. 2.16
alpha = np.linspace(0.05, 7, 500)
error = []                              # each run of the for loop fills this list
f = lambda x: np.exp(-x**2)             # the first integrand of (2.16)
g = lambda x: np.exp(-1/x**2)/x**2      # the second integrand

exact = 0.5*np.sqrt(np.pi)              # exact value of the integral
h = 1e-3                                # width of each half-strip

# Function which always gives an odd integer (see text for explanation)
def makeodd(N):
    return N + 1 - (N%2)

# Loop over alpha values
for a in alpha:
    # Number of half-strips in each integral (int converts a float to an integer)
    N1 = int(a/h)
    N2 = int(1/(a*h))
    x1 = np.linspace(0, a, makeodd(N1))       # domain of the first integration
    # The lower limit of the second integral cannot be 0,
    # so we introduce a tiny cutoff 1e-8
    x2 = np.linspace(1e-8, 1/a, makeodd(N2))
    # Use Simpson's Rule to evaluate (2.16)
    # (note the syntax: y values, then x values)
    integ = simpson(f(x1), x1) + simpson(g(x2), x2)
    err = abs(exact-integ)
    error.append(err)                   # collect the absolute errors

# Plot the result with a green line (log vertical axis)
plt.semilogy(alpha, error, 'g')
plt.xlim([0, max(alpha)])
plt.xlabel(r'$\alpha$')
plt.ylabel('Absolute error')
plt.grid('on')
plt.show()

Applying the same splitting trick to integral I₂, you should find a similar behaviour in the absolute error. To work out the exact answer, one can integrate by parts and use the previous exact result to deduce that

I₂ = √π/4.

With α = 5, Python gives an excellent estimate accurate to 15 decimal places:
integ = 0.44311346272637947
exact = 0.44311346272637897

Finally, for the integral I₃, our trick produces

I₃ = ∫₀^α (sin x)/x dx + ∫₀^(1/α) sin(1/x)/x dx.

For small x, the sin(1/x) term fluctuates very rapidly. The resulting error from integrating this term numerically is enormous (try it!).
We could try neglecting the second troublesome integral and evaluating the first integral up to a large finite number α = A, i.e. we make the approximation

I₃ ≈ ∫₀^A (sin x)/x dx.

The integral converges very slowly, as shown in fig. 2.10. Here we have used the Trapezium Rule with strip width h = 10⁻³, and we have assumed that the exact answer is I₃ = π/2 (this will be explained in the Discussion). From fig. 2.10, it is clear that we have to integrate to quite a large number to achieve high accuracy. More precisely, the error fluctuates more and more as A increases, but there is an envelope A ∝ (error)⁻¹ which guarantees a minimum achievable accuracy. We can deduce, for instance, that an answer accurate to p decimal places (i.e. with error at most 0.5 × 10⁻ᵖ) can be achieved when A ≈ 2 × 10ᵖ (limited by rounding error).
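Here is a quick spot check of this rule of thumb (a sketch using NumPy's trapezium rule; the neglected tail of the integral is bounded by roughly 1/A):

import numpy as np

A = 2e3                                   # aim for about 3 decimal places
x = np.linspace(1e-8, A, int(A/1e-3))     # strip width about 1e-3
err = abs(np.trapz(np.sin(x)/x, x) - np.pi/2)
print(err)                                # should be at most ~0.5e-3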


Fig. 2.10: The absolute error, defined as |∫₀^A (sin x)/x dx − π/2|, plotted as the upper limit A varies. The integral is evaluated using the Trapezium Rule with h = 10⁻³.

Although we have attacked these integrals using a variety of different numerical methods,
it is also possible to calculate these integrals using Python’s library for symbolic mathematics
called SymPy. We will explore SymPy in chapter 5, but if you are impatient, go straight to
the code box in the very last exercise in that chapter.
Finally, a rather different approach to numerical integration based on probability is Monte
Carlo integration, which will be discussed in §7.10.

Discussion
• Key to success with numerical integration. The main takeaway here is that there are
no magical Python functions which can deal with every integral accurately. The key to
success is to work with Python by using our mathematical understanding (e.g. introduce
a suitable transformation). Always investigate the behaviour of the integrands, and
where possible, avoid the black-box approach (where you leave everything to Python
without knowing what it does). Numerical integration is an art which takes experience
to master, so be patient!
• A Gaussian integral and a trick. Here is a neat integration trick to show that I₁ = √π/2. Let I = ∫_{−∞}^{∞} e^(−x²) dx. Note that by symmetry, I = 2I₁. Since we can also write I = ∫_{−∞}^{∞} e^(−y²) dy, we can multiply the two expressions for I and, assuming that we can move terms around, we have

I² = ∫_{−∞}^{∞} e^(−x²) dx ∫_{−∞}^{∞} e^(−y²) dy = ∬_{R²} e^(−(x²+y²)) dx dy.

Here we need some knowledge of multivariable calculus. If double integration is unfamiliar to you, come back to revisit this point after studying the next chapter. Note that the domain in the final integral is the whole of R². We can evaluate this integral in polar coordinates (r, θ). We have x² + y² = r², and the area element dx dy = r dr dθ. Thus,

I² = ∫_{θ=0}^{2π} ∫_{r=0}^{∞} r e^(−r²) dr dθ = π.

Therefore I = √π and I₁ = √π/2.
This kind of integral, known as a Gaussian integral, occurs frequently in university mathematics, especially in probability and statistics (you may recognise that the integrand is the normal distribution). But if we change the integration limits to ∫_a^b e^(−x²) dx for arbitrary real numbers a, b, then there are no elementary expressions for the answer. This is why it is important for us to perform this kind of numerical integration accurately.
• The sine integral and another trick. The integral I₃, which we found tricky to handle numerically due to its slow convergence, is in fact a special value of the following function, called the sine integral:

Si(x) = ∫₀^x (sin t)/t dt.

This function occurs frequently in physical and engineering applications. The quickest way to evaluate it is to use the following SciPy command:
scipy.special.sici(x)[0]
Note that only the first element of sici(x) is the sine integral; the second element is the cosine integral, which will be explored in exercise 13. (A short demonstration is given in the code sketch at the end of this discussion.)
In general, there is no elementary expression for this integral, except at $x = 0$ (clearly $\mathrm{Si}(0) = 0$) and as $x \to \infty$, where the integral becomes $I_3 = \int_0^\infty \frac{\sin x}{x}\,dx = \pi/2$, as we saw earlier. At university, you will learn a number of different methods that can help you evaluate $I_3$ exactly. One technique is to use contour integration (in a topic called complex analysis). Another technique is the Laplace transform, which is a mathematical tool with a huge range of engineering applications.

Even without these advanced techniques, there is a clever trick which can help us evaluate the integral. Again this relies on some elementary knowledge of multivariable calculus. The trick is based on the following simple observation:
$$\int_0^\infty e^{-xy} \sin x\,dy = \frac{\sin x}{x},$$
(where $x$ is held constant in the integral). Therefore, we can write the original integral as
$$I_3 = \int_0^\infty \left( \int_0^\infty e^{-xy} \sin x\,dx \right) dy,$$
where we have assumed that the order of integration can be switched (thanks to a result in analysis called Fubini's theorem). You will most likely have come across the inner integral presented as an exercise in integration by parts with the use of a reduction formula (whereby the original integral reappears upon integrating by parts twice). You should verify that:
$$\int_0^\infty e^{-xy} \sin x\,dx = \frac{1}{1 + y^2}.$$
Returning to the original integral, we then find:
$$I_3 = \int_0^\infty \frac{dy}{1 + y^2} = \Big[ \tan^{-1} y \Big]_0^\infty = \frac{\pi}{2}.$$
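As promised, here is a minimal sketch (mine, not one of the book's listings) which demonstrates scipy.special.sici and verifies the two steps of the trick above symbolically with SymPy.

import numpy as np
import sympy as sp
from scipy.special import sici

# sici(x) returns the pair (Si(x), Ci(x))
print(sici(np.pi)[0])                 # Si(pi) approx. 1.8519, the maximum of Si
print(sici(1e8)[0], np.pi/2)          # Si(x) -> pi/2 as x grows

# The two steps of the Fubini trick, done symbolically
x, y = sp.symbols('x y', positive=True)
inner = sp.integrate(sp.exp(-x*y)*sp.sin(x), (x, 0, sp.oo))
print(sp.simplify(inner))                          # 1/(y**2 + 1)
print(sp.integrate(1/(1 + y**2), (y, 0, sp.oo)))   # pi/2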

2.9 Fourier series

Let $f : \mathbb{R} \to \mathbb{R}$ be a $2\pi$-periodic function defined by:
$$f(x) = \begin{cases} 1 & x \in [0, \pi) \\ 0 & x \in [\pi, 2\pi) \end{cases}$$
and continued outside $[0, 2\pi)$ in the same way, so that the graph of $y = f(x)$ looks like a square wave.
Show that $f$ can be written as a series in $\sin nx$ as
$$f(x) = \frac{1}{2} + \frac{2}{\pi} \left( \sin x + \frac{1}{3}\sin 3x + \frac{1}{5}\sin 5x + \cdots \right).$$

Plot the partial sums of this series.

The topic of Fourier series is a vital part of university mathematics. We give a brief
summary here. The goal is to write a 2π-periodic function f (defined on R) as a sum of sines
and cosines of different frequencies and amplitudes. In other words, we want to express
f (x) as

$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos nx + b_n \sin nx \right), \qquad (2.18)$$

(the constant term a0 is traditionally written out separately to make the formula for an and
bn easier to remember). The French mathematician Joseph Fourier (1768–1830) proposed
such a series to study the problem of heat conduction, and showed that the coefficients of
the series are given by:
$$a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos nx\,dx, \qquad b_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin nx\,dx. \qquad (2.19)$$
We can think of equation 2.18 as a decomposition of a function into different resolutions. Large-$n$ terms are high-frequency sinusoids, so they capture the small-scale fluctuations of $f$ in fine detail. Similarly, small-$n$ sinusoids capture the low-frequency, broad-brush behaviour of $f$.
In fact, this decomposition can be generalised to a continuous range of frequencies, in which case it is called a Fourier transform. Fourier series and the Fourier
transform constitute a topic called Fourier analysis, which is an indispensable tool in signal
and image processing, geoscience, economics and much more. See [65, 94, 190] for some
excellent books on Fourier analysis.
Back to the square-wave function. Performing the integrations in Eqs. 2.19, we find:
$$a_0 = 1, \qquad a_n = 0, \qquad b_n = \frac{1}{n\pi}\left(1 - (-1)^n\right) = \begin{cases} \dfrac{2}{n\pi} & \text{for } n \text{ odd} \\[4pt] 0 & \text{for } n \text{ even} \end{cases}$$

where n = 1, 2, 3 . . . (you should verify these results). Putting these into the Fourier series
(2.18) gives:

$$f(x) = \frac{1}{2} + \frac{2}{\pi} \sum_{\substack{n \geq 1 \\ n\ \text{odd}}} \frac{\sin nx}{n}. \qquad (2.20)$$
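If you would like to verify the Fourier coefficients symbolically, here is a minimal SymPy sketch (mine, not one of the book's listings); since $f$ vanishes on $[-\pi, 0)$, the integrals in Eqs. 2.19 reduce to integrals over $[0, \pi)$.

import sympy as sp

x = sp.Symbol('x')
n = sp.Symbol('n', positive=True, integer=True)

# f = 1 on [0, pi) and 0 on [-pi, 0), so integrate over [0, pi) only
a0 = sp.integrate(sp.S(1), (x, 0, sp.pi))/sp.pi
an = sp.integrate(sp.cos(n*x), (x, 0, sp.pi))/sp.pi
bn = sp.integrate(sp.sin(n*x), (x, 0, sp.pi))/sp.pi
print(a0, sp.simplify(an), sp.simplify(bn))
# Expected: 1, 0, (1 - (-1)**n)/(pi*n)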

The code below produces an interactive plot with a slider which changes the number of terms in the truncated Fourier series ($n = 0, 1, 2, \ldots, n_{\text{max}}$). Some snapshots for different values of $n_{\text{max}}$ are shown in fig. 2.11.
Here are some interesting observations from fig. 2.11.
• The more terms we use, the closer we are to the square wave. However, the series, even with infinitely many terms, will never be exactly the same as the square wave. For example, at the points of discontinuity of the square-wave function, the Fourier series equals $\frac{1}{2}$ (just put $x = k\pi$ in (2.20)), but the square wave never takes this value.
This means that one has to take the equals sign in Eq. 2.20 with a pinch of salt. Some people write $\sim$ instead of $=$ to emphasise the difference.
• Near each discontinuity, there appears to be an overshoot above 1 (and an undershoot below 0). Try zooming into an overshoot and reading off the maximum value (choosing a large value of $n_{\text{max}}$). You should find that the Fourier series overshoots the square wave by just under 9%. The undershoot below 0 is by the same amount.
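Rather than zooming in by eye, you can measure the overshoot directly; here is a minimal sketch (mine) which evaluates a high-order partial sum just to the right of the jump at $x = 0$ and prints its maximum. The truncation order and sampling window are my own choices.

import numpy as np

pi = np.pi
nmax = 1001                          # a large, odd truncation order
x = np.linspace(1e-4, 0.1, 20001)    # just to the right of the jump at x = 0
S = 0.5 + (2/pi)*sum(np.sin(n*x)/n for n in range(1, nmax + 1, 2))
print(S.max())                       # approx. 1.0895: an overshoot of about 9%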

Discussion
• Jump discontinuities. It can indeed be shown that if there is a jump discontinuity in the function $f$ at $x = a$, then its Fourier series converges there to the average of the left and right limits, i.e.
$$\frac{1}{2} \left( \lim_{x \to a^-} f(x) + \lim_{x \to a^+} f(x) \right). \qquad (2.21)$$

• Gibbs phenomenon. An overshoot (and an undershoot) always occurs when a truncated Fourier series is used to approximate a discontinuous function. For large $n_{\text{max}}$, the overshoot moves closer to the point of discontinuity but does not disappear. In fact, the overshoot can be shown to be around 9% of the magnitude of the jump. More precisely, the fractional overshoot can be expressed as
$$\frac{1}{\pi} \int_0^{\pi} \frac{\sin x}{x}\,dx - \frac{1}{2} \approx 0.0894898722 \quad (\text{10 dec. pl.}) \qquad (2.22)$$
Note the appearance of the sine integral!
The overshoot and undershoot of Fourier series near a discontinuity are known as Gibbs
phenomenon after the American mathematician Josiah Gibbs (1839–1903) who gave a
careful analysis of the overshoot magnitude. However, it was the English mathematician
Henry Wilbraham (1825–1883) who first discovered the phenomenon.
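Eq. 2.22 is itself a one-liner to check with SciPy's sine integral (a quick sketch, assuming SciPy is available):

import numpy as np
from scipy.special import sici

# Fractional Gibbs overshoot: Si(pi)/pi - 1/2
print(sici(np.pi)[0]/np.pi - 0.5)    # approx. 0.0894898722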
• Parseval's theorem and ζ(2). There is an elegant relation between the function $f$ and its Fourier coefficients $a_n$ and $b_n$ (without the sines and cosines).
Parseval’s Theorem, named after the French mathematician Marc-Antoine Parseval
(1755–1836), states that

Fig. 2.11: The partial sums (thin blue curves) of the Fourier series (2.20) for the square-wave
function (shown in red) with nmax = 5, 35 and 95.

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|^2\,dx = \left( \frac{a_0}{2} \right)^2 + \frac{1}{2} \sum_{n=1}^{\infty} a_n^2 + \frac{1}{2} \sum_{n=1}^{\infty} b_n^2. \qquad (2.23)$$

Here is a brief interpretation of why this holds. The LHS is the average of $|f(x)|^2$ over the period. The terms on the RHS are the averages of the terms obtained when the Fourier expansion is squared: we are left with the averages of $(a_0/2)^2$, $(a_n \cos nx)^2$ and $(b_n \sin nx)^2$ (the cross terms average to zero).
Applying Parseval's identity to the square-wave function, Eq. 2.23 becomes:
$$\frac{1}{2} = \frac{1}{4} + \frac{2}{\pi^2} \left( 1 + \frac{1}{3^2} + \frac{1}{5^2} + \cdots \right).$$

Another way to express this interesting result is:
$$\sum_{k=1}^{\infty} \frac{1}{(2k-1)^2} = \frac{\pi^2}{8}. \qquad (2.24)$$
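Here is a quick numerical sanity check of Eq. 2.24 (a sketch of mine; the truncation at a million terms is arbitrary):

import numpy as np

k = np.arange(1, 1_000_001)
print(np.sum(1/(2*k - 1)**2), np.pi**2/8)   # both approx. 1.2337006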

We end this chapter with a calculation of $\zeta(2) = \sum_{n=1}^{\infty} \frac{1}{n^2}$ that follows from Eq. 2.24. Let $S = \zeta(2)$. Observe that:
$$S = \sum_{n=1}^{\infty} \frac{1}{n^2} = \sum_{n\ \text{odd}} \frac{1}{n^2} + \sum_{n\ \text{even}} \frac{1}{n^2} = \sum_{k=1}^{\infty} \frac{1}{(2k-1)^2} + \sum_{k=1}^{\infty} \frac{1}{(2k)^2} = \frac{\pi^2}{8} + \frac{S}{4}$$
$$\implies S = \frac{4}{3} \cdot \frac{\pi^2}{8} = \frac{\pi^2}{6}.$$
This beautiful result³ also verifies our numerical calculation of ζ(2) in §1.4.

³ A subtle point: in the second equality of the working, we assumed that the sum S remains unchanged by the rearrangement of the terms in the series. Although it is safe to do so here, a rearrangement can change the sum of a series such as the alternating harmonic series. Look up Riemann's rearrangement theorem.

fourier.ipynb (for producing fig. 2.11)

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider
# Create an interactive GUI (Jupyter magic)
%matplotlib

nmax = 5                               # Initial value of nmax (an odd number)

pi = np.pi                             # A shorter way to call pi
x = np.linspace(-2*pi, 2*pi, 1001)     # Domain [-2pi, 2pi] at high resolution

# The square-wave function
def f(xarray):
    y = np.zeros_like(xarray)          # Pre-populate output with 0's
    for ind, x in enumerate(xarray):
        xmod = x%(2*pi)                # If x modulo 2pi is...
        if xmod < pi:                  # ...in the right domain then...
            y[ind] = 1                 # ...change the output from 0 to 1
        if x%pi == 0:                  # Insert discontinuities at
            y[ind] = np.nan            # multiples of pi
    return y

# The Fourier series 0.5 + (2/pi)*sum of sin(n*x)/n with n odd
def Fourier(x, nmax):
    S = np.zeros_like(x)
    for n in np.arange(1, nmax+1, 2):
        S += np.sin(n*x)/n
    return 0.5 + 2*S/pi

fig, ax = plt.subplots()
plt.subplots_adjust(bottom=0.15)       # Leave a space at the bottom for a slider

plt.plot(x, f(x), 'r', lw=1.5)         # Plot the square wave in red

# Plot the Fourier series in a thin blue line
Ffunc, = plt.plot(x, Fourier(x, nmax), 'b', lw=0.5)
plt.xlim([-2*pi, 2*pi])
plt.ylim([-0.2, 1.2])
plt.grid('on')
plt.title(r'Fourier series')

axn = plt.axes([0.15, 0.05, 0.7, 0.03])

# Create a slider for nmax from 1 to 101;
# keep nmax odd by setting the step size to 2
n_slide = Slider(axn, 'n', 1, 101, valstep=2, valinit=nmax)

def update(val):
    nmax = n_slide.val                 # Take nmax from slider
    Ffunc.set_ydata(Fourier(x, nmax))  # Recalculate the Fourier series
    fig.canvas.draw_idle()

n_slide.on_changed(update)             # Update the plot if slider is changed

plt.show()

2.10 Exercises

1 Perform the following modifications on the code Eh.ipynb which we used to calculate the derivative of $f(x) = x^3$ at $x = 1$ (§2.2).
• Change the point $x$ at which the derivative is calculated.
• Change the function $f(x)$ to any differentiable function.
• Change the approximation to the backward-difference approximation.
Verify that the qualitative behaviour of $E(h)$ is unchanged by these modifications.
2 (Five-point stencil) Use Eh.ipynb to help you with this question.
Apart from the forward, backward and symmetric-difference formulae for the derivative
discussed in §2.2, there are infinitely many other similar differentiation formulae. Those
involving more points typically produce a more accurate answer (given the same step
size h) at the expense of increased calculation time.
Here is an exotic one called the five-point stencil formula:
$$f'(x) \approx \frac{1}{12h} \left( -f(x+2h) + 8f(x+h) - 8f(x-h) + f(x-2h) \right).$$
a. Plot the graph of the absolute error E(h) for the derivative of f (x) = cos x at x = 1.
You should see a V-shape similar to fig. 2.2.
b. On the same set of axes, plot the symmetric difference formula. How do the two
graphs compare?
c. If the five-point stencil formula is an $O(h^k)$ approximation, use your graph to show that $k = 4$.
Note: Such formulae are studied in a topic called finite-difference methods, which we
will explore in §4.9.
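To get started, here is a minimal sketch (mine, not the book's Eh.ipynb) comparing the error of the five-point stencil against the symmetric difference for $f(x) = \cos x$ at $x = 1$:

import numpy as np

f, x0 = np.cos, 1.0
exact = -np.sin(x0)                    # the exact derivative of cos at x0
for h in 10.0**np.arange(-1, -13, -1):
    five = (-f(x0+2*h) + 8*f(x0+h) - 8*f(x0-h) + f(x0-2*h))/(12*h)
    symm = (f(x0+h) - f(x0-h))/(2*h)
    print(f'h={h:.0e}  five-point: {abs(five-exact):.2e}  '
          f'symmetric: {abs(symm-exact):.2e}')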
3 (Taylor series and convergence) Consider the following Taylor series.
$$\sinh x = \sum_{n=0}^{\infty} \frac{x^{2n+1}}{(2n+1)!},$$
$$\tan^{-1} x = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{2n+1},$$
$$\sqrt{1+x} = \sum_{n=0}^{\infty} \frac{(-1)^{n+1}\,(2n)!}{4^n\,(n!)^2\,(2n-1)}\, x^n.$$

a. Modify the code taylor.ipynb (§2.3) to produce the graph of each of the above
functions and its Taylor series, similar to fig. 2.3.
Experiment with the domain of the plot, and conjecture the radius of convergence for each Taylor series.
(For example, you should find that for $\sqrt{1+x}$, the radius of convergence is 1.)
b. Choose a value of x for which all three Taylor series converge to the respective
function (for example, x = 0.3).
Plot the absolute error
$$|R_N| = \big| f(x) - (\text{Taylor series of order } N) \big|$$
as a function of the polynomial order $N$, similar to fig. 2.4 (but forget about the upper and lower bounds in dotted lines). Plot the absolute error for all 3 Taylor series on the same set of axes.
From the plot, deduce which Taylor series converges fastest and slowest.
(Answer: $\sinh x$ is fastest, $\sqrt{1+x}$ is slowest.)

4 (Taylor's Theorem) Consider $f(x) = \ln(1 + x)$. We showed using Taylor's Theorem (theorem 2.3) that the remainder $R_N$ is given by eq. 2.9.
a. Fix $x = 0.4$. Using the Taylor polynomial of order 1, show that the constant $\xi$ is approximately 0.1222 (to 4 dec. pl.).
b. Plot a graph of $\xi$ as a function of $N$. You should obtain a decreasing function. Verify that $\xi$ always lies in the interval $(0, 0.4)$, in accordance with Taylor's Theorem.

5 (The Blancmange function) For $m = 0, 1, 2, \ldots$, let $f_m : \mathbb{R} \to \mathbb{R}$ be defined by
$$f_0(x) = \min\{ |x - k| : k \in \mathbb{Z} \}, \qquad f_m(x) = \frac{1}{2^m} f_0(2^m x).$$
a. Plot the graph of the function $f_m(x)$ for a few values of $m$. Show that the graphs are made of straight-line segments with sharp (non-differentiable) points, similar to $y = -|x|$.
b. Plot the Blancmange function defined by
$$g(x) = \sum_{m=0}^{\infty} f_m(x).$$

On the domain [0, 1], you should obtain the graph in fig. 2.12 (which looks like the
eponymous dessert). Clearly you will need to impose some cutoff to approximate
the infinite sum.
c. Generalise the Blancmange function by redefining $f_m(x)$ as
$$f_{m,K}(x) = \frac{1}{K^m} f_0(K^m x).$$
Plot the generalised Blancmange function $g_K(x) = \sum_{m=0}^{\infty} f_{m,K}(x)$ for $K = 2, 3, 4, \ldots, 20$ on the same set of axes.
(Better yet, vary the plot using a slider for K. Use weierstrass.ipynb as a
template).
Conjecture the shape of the function when K → ∞. Can you prove it?
d. For K = 2, create a figure which shows the self-similarity structure of the graph (in
the style of fig. 1.15). Note also that the graph is periodic on R.
Conjecture the values of K ∈ R for which the graph shows i) periodicity, or ii)
self-similarity.
For an accessible account of the Blancmange function and its properties, see [200].
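To get started on part b, here is a minimal sketch (mine; the truncation of the sum at m = 30 is an arbitrary cutoff) using the distance-to-the-nearest-integer form of $f_0$:

import numpy as np
import matplotlib.pyplot as plt

def f0(x):
    return np.abs(x - np.round(x))     # distance to the nearest integer

x = np.linspace(0, 1, 2001)
g = sum(f0(2**m * x)/2**m for m in range(31))   # truncate the sum at m = 30
plt.plot(x, g)
plt.show()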
