Python Numerical Analysis
Yaning Liu
Department of Mathematical and Statistical Sciences
University of Colorado Denver
Denver CO 80204
Giray Ökten
Department of Mathematics
Florida State University
Tallahassee FL 32306
To my wife Yanjun Pan-YL
Contents
1 Introduction 5
1.1 Review of Calculus 5
1.2 Python basics 8
1.3 Computer arithmetic 21
3 Interpolation 75
3.1 Polynomial interpolation 76
3.2 High degree polynomial interpolation 92
3.3 Hermite interpolation 96
3.4 Piecewise polynomials: spline interpolation 104
References 187
Index 188
Preface
The book is based on “First semester in Numerical Analysis with Julia”, written by Giray Ökten¹. The contents of the original book are retained, while all the algorithms are implemented in Python (Version 3.8.0). Python is an open source (under OSI), interpreted, general-purpose programming language with a large number of users around the world. Python was ranked third in August 2020 by the TIOBE programming community index², a measure of the popularity of programming languages, and is the top-ranked interpreted language.
We hope this book will better serve readers who are interested in a first course in Numerical Analysis, but are more familiar with Python for the implementation of the algorithms.
The first chapter of the book contains a self-contained tutorial for Python, including how to set up the computer environment. Anaconda, the open-source individual edition, is recommended for an easy installation of Python and effortless management of Python packages. The Jupyter environment, a web-based interactive development environment for Python as well as many other programming languages, is used throughout the book and is recommended to the readers for easy code development, graph visualization, and reproducibility.
The book was also inspired by a series of Open Educational Resources workshops at Uni-
versity of Colorado Denver and supported partially by the professional development funding
thereof. Yaning Liu also thanks his students in his Numerical Analysis classes, who enjoyed
using Python to implement the algorithms and motivated him to write a Numerical Analysis
textbook with codes in Python.
¹ https://open.umn.edu/opentextbooks/textbooks/first-semester-in-numerical-analysis-with-julia
² https://www.tiobe.com/tiobe-index/
Chapter 1
Introduction
3. Let {xn}, n = 1, 2, ..., be an infinite sequence of real numbers. The sequence has the limit x, i.e., lim_{n→∞} xn = x (or, written as xn → x as n → ∞), if for any ε > 0 there exists an integer N > 0 such that |xn − x| < ε whenever n > N.

The following two statements are equivalent:

1. f is continuous at x0;
2. if {xn} is any sequence converging to x0, then lim_{n→∞} f(xn) = f(x0).
The derivative of f at x0 is f′(x0) = lim_{x→x0} (f(x) − f(x0))/(x − x0), provided this limit exists.
Notation: C^n(A) denotes the set of all functions f such that f and its first n derivatives are continuous on A. If f is only continuous on A, then we write f ∈ C^0(A). C^∞(A) consists of functions that have derivatives of all orders, for example, f(x) = sin x or f(x) = e^x.
The following well-known theorems of Calculus will often be used in the remainder of the
book.
Theorem 4 (Mean value theorem). If f ∈ C^0[a, b] and f is differentiable on (a, b), then there exists c ∈ (a, b) such that

f′(c) = (f(b) − f(a))/(b − a).
Theorem 5 (Extreme value theorem). If f ∈ C^0[a, b], then the function attains a minimum and maximum value over [a, b]. If f is differentiable on (a, b), then the extreme values occur either at the endpoints a, b or where f′ is zero.
Theorem 7 (Taylor’s theorem). Suppose f ∈ C^n[a, b], f^(n+1) exists on (a, b), and x0 ∈ (a, b). Then, for x ∈ (a, b),

f(x) = Pn(x) + Rn(x)

where

Pn(x) = f(x0) + f′(x0)(x − x0) + f″(x0)(x − x0)²/2! + ... + f^(n)(x0)(x − x0)^n/n!

is the nth Taylor polynomial and

Rn(x) = f^(n+1)(ξ)(x − x0)^(n+1)/(n + 1)!

is the remainder term, for some ξ between x and x0.
Example. Let f(x) = x cos x − x.

1. Find the third Taylor polynomial P3(x) of f about x0 = π/2.
2. Compute the exact value for f(0.8), and the error |f(0.8) − P3(0.8)|.
3. Use the remainder term R3(x) to find an upper bound for the error |f(0.8) − P3(0.8)|. Compare the upper bound with the actual error found in part 2.
Therefore

P3(x) = −π/2 − (π/2 + 1)(x − π/2) − (x − π/2)² + (π/12)(x − π/2)³.
2. The exact value is f(0.8) = −0.2426 and the absolute error is |f(0.8) − P3(0.8)| = 0.06062.
3. We have |f(0.8) − P3(0.8)| = |R3(0.8)|, where

R3(0.8) = f^(4)(ξ)(0.8 − π/2)⁴/4!
and ξ is between 0.8 and π/2. We need to differentiate f one more time: f^(4)(x) = 4 sin x + x cos x. Since 0.8 < ξ < π/2, we can find an upper bound for f^(4)(ξ), and thus an upper bound for R3(0.8), using the triangle inequality:

|f^(4)(ξ)| = |4 sin ξ + ξ cos ξ| ≤ 4|sin ξ| + |ξ||cos ξ|.
Note that on 0.8 < ξ < π/2, sin ξ is a positive increasing function, so |sin ξ| < sin(π/2) = 1. For the second term, |ξ| attains its maximum value of π/2 on 0.8 < ξ < π/2, and cos ξ, which is a positive decreasing function on 0.8 < ξ < π/2, has maximum value cos(0.8) = 0.6967.
Putting these together, we get

|R3(0.8)| ≤ (|0.8 − π/2|⁴/4!)(4(1) + (π/2)(0.6967)) = 0.07493.

Therefore, our upper bound for the actual error (which is 0.06062 from part 2) is 0.07493.
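The computations in this example are easy to check in Python. The following sketch (with f, P3, and the bound taken from the derivation above) approximately reproduces the error 0.06062 and the bound 0.07493:

```python
import math

# f(x) = x*cos(x) - x and its third Taylor polynomial about x0 = pi/2,
# as derived in the example above
f = lambda x: x * math.cos(x) - x

def P3(x):
    h = x - math.pi / 2
    return -math.pi/2 - (math.pi/2 + 1)*h - h**2 + (math.pi/12)*h**3

actual_error = abs(f(0.8) - P3(0.8))    # about 0.06062

# upper bound from R3, using |f''''(xi)| <= 4*1 + (pi/2)*cos(0.8)
bound = abs(0.8 - math.pi/2)**4 / math.factorial(4) * (4 + (math.pi/2) * math.cos(0.8))
```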
Exercise 1.1-1: Find the second order Taylor polynomial for f(x) = e^x sin x about x0 = 0.

a) Compute P2(0.4) to approximate f(0.4). Use the remainder term R2(0.4) to find an upper bound for the error |P2(0.4) − f(0.4)|. Compare the upper bound with the actual error.

b) Compute ∫₀¹ P2(x)dx to approximate ∫₀¹ f(x)dx. Find an upper bound for the error using ∫₀¹ R2(x)dx, and compare it to the actual error.
In [1]: 2+3
Out[1]: 5
Now we import the sin and log functions, as well as the constant π, from the math package:
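The import cell itself is not shown in the extracted text; a minimal reconstruction covering the names used below would be:

```python
from math import sin, log, pi
```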
In [3]: sin(pi/4)
Out[3]: 0.7071067811865475
One way to learn about a function is to search for it online in the Python documentation
https://docs.python.org/3/. For example, the syntax for the logarithm function in the
math package is log(x[, b]) where b is the optional base. If b is not provided, the natural
logarithm of x (to base e) is computed.
In [4]: log(4, 2)
Out[4]: 2.0
NumPy arrays
NumPy is a useful Python package for array data structures, random number generation, linear algebra algorithms, and so on. A NumPy array is a data structure that can be used to represent vectors and matrices, for which the computations are also made easier. We first import the NumPy package and define an alias (np) for it:
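The creation cells are missing in the extracted text; a sketch consistent with the int64/float64 outputs below (the particular entries are chosen only for illustration) would be:

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50])   # integer entries give an integer dtype
print(x.dtype)                        # int64 on most platforms

y = x.astype(np.float64)              # convert to a floating-point array
print(y.dtype)                        # float64
```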
The following line of code shows the entries of the created array are integers of 64 bits.
In [7]: x.dtype
Out[7]: dtype('int64')
In [9]: x.dtype
Out[9]: dtype('float64')
A 1D NumPy array does not assume a particular row or column arrangement of the data, and hence taking the transpose of a 1D NumPy array is not valid. Here is another way to construct a 1D array, together with some array operations:
In [11]: x[-1]
Out[11]: 50
In [12]: min(x)
Out[12]: 10
In [13]: np.sum(x)
Out[13]: 150
In [15]: x[3]
Out[15]: 40
In [16]: x.size
Out[16]: 6
The NumPy package has a wide range of mathematical functions such as sin, log, etc.,
which can be applied elementwise to an array:
In [17]: x = np.array([1,2,3])
In [18]: np.sin(x)
Plotting
There are several packages for plotting functions, and we will use the PyPlot package, which is preinstalled in Anaconda. Let's plot two functions, sin 3x and cos x, and label them appropriately.
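The plotting cells are missing in the extracted text. Here is a sketch, assuming the standard import of matplotlib's pyplot module under the alias plt (the grid on [0, 2π] is also an assumption for illustration); in Jupyter the figure then renders inline:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2*np.pi, 200)          # grid of 200 points on [0, 2*pi]
plt.plot(x, np.sin(3*x), label='sin(3x)')
plt.plot(x, np.cos(x), label='cos(x)')
plt.legend()
```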
Matrix operations
NumPy uses 2D arrays to represent matrices. Let’s create a 3 × 3 matrix (2D array):
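The cell creating the matrix is missing; the entries below are an assumption for illustration (any nonsingular 3 × 3 matrix works with the operations that follow, such as A.T and the inverse):

```python
import numpy as np

# a nonsingular 3x3 matrix (hypothetical entries; det(A) = -2)
A = np.array([[1, -2, 3],
              [2, -5, 12],
              [0, 2, -10]])
```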
In [23]: A.T
In [24]: np.linalg.inv(A)
In [26]: v = np.array([0,0,1])
v
In [27]: np.dot(A, v)
In [28]: np.linalg.solve(A, v)
In [29]: np.dot(np.linalg.inv(A), v)
In [30]: np.linalg.matrix_power(A, 5)
Logic operations
Here are some basic logic operations:
In [31]: 2 == 3
Out[31]: False
In [32]: 2 <= 3
Out[32]: True
Out[33]: True
Out[34]: False
Out[35]: True
In [36]: (5 % 2) == 0
Out[36]: False
Out[37]: True
Defining functions
There are two ways to define a function. Here is the basic syntax:
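The definition cell is missing in the extracted text; a minimal reconstruction matching the calls below:

```python
def squareit(x):
    # return the square of the input
    return x**2
```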
In [39]: squareit(3)
Out[39]: 9
There is also a compact form for defining a function, when the body of the function is a short, simple expression:
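The lambda cell is missing; a reconstruction matching the call below:

```python
# compact (lambda) form of a function definition
cubeit = lambda x: x**3
```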
In [41]: cubeit(5)
Out[41]: 125
Suppose we want to pick the elements of an array that are greater than 0. This can be done
using:
In [42]: x = np.array([-2,3,4,5,-3,0])
x[x>0]
To count the number of elements that are greater than 0 in the array above, use
In [43]: x[x>0].size
Out[43]: 3
Types
In Python, there are several types for integers and floating-point numbers, such as int8, int64, float32, float64, and more advanced types for Boolean variables and strings. When we write a function, we do not have to declare the types of its variables: Python figures out the correct types as the code runs. This is called a dynamic type system. For example, consider the squareit function we defined before:
The type of x is not declared in the function definition. We can call it with real or integer
inputs, and Python will know what to do:
In [45]: squareit(5)
Out[45]: 25
In [46]: squareit(5.5)
Out[46]: 30.25
Now suppose we want to indicate that the input is expected to be a floating-point number. We can write another version of squareit that specifies the type.
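The annotated definition is missing from the extracted text; it would look like this (the float annotation is only a hint, as explained below):

```python
def typesquareit(x: float):
    # the annotation documents the intended type; Python does not enforce it
    return x**2
```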
The input x is now annotated with a type. However, the purpose of the annotation is only to remind users of the intended input type. In fact, the Python interpreter will not perform any type checking automatically, unless additional packages are used. In other words, there will be no difference in behavior between the squareit and typesquareit functions.
In [48]: typesquareit(5.5)
Out[48]: 30.25
In [49]: typesquareit(5)
Out[49]: 25
It can be seen that the function typesquareit has no problem taking the integer 5 as an input.
Control flow
Let's create a NumPy array of 10 floating-point entries. A simple way to do it is by using the function np.zeros(n), which creates an array of size n and sets each entry to zero. (A similar function is np.ones(n), which creates an array of size n with each entry set to 1.)
Out[50]: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
Now we will set the elements of the array to the values of the sin function.
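The cell filling the array is missing in the extracted text; a sketch using a for statement (the choice sin(n²) matches the while version shown next):

```python
import numpy as np

values = np.zeros(10)
for n in range(1, 11):
    values[n-1] = np.sin(n**2)   # fill entry n-1 with sin(n^2)
```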
In [52]: values
Then use a while statement to generate the values, and append them to the array.
In [54]: n = 1
         newvalues = np.array([])
         while n <= 10:
             # append sin(n**2) to the growing array; np.append returns a
             # copy, so we must reassign it to newvalues
             newvalues = np.append(newvalues, np.sin(n**2))
             n += 1
         newvalues
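The definition of f is missing above; a reconstruction matching the three outputs below:

```python
def f(x, y):
    # compare x and y, printing the relation between them
    if x < y:
        print(x, 'is less than', y)
    elif x > y:
        print(x, 'is greater than', y)
    else:
        print(x, 'is equal to', y)
```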
In [56]: f(2, 3)
2 is less than 3
In [57]: f(3, 2)
3 is greater than 2
In [58]: f(1, 1)
1 is equal to 1
In the next example we use if and while to find all the odd numbers in {1, . . . , 10}. The empty
array created in the first line is of int64 type.
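The cell for this example is missing in the extracted text; here is a sketch consistent with the description (an empty int64 array in the first line, an if inside a while):

```python
import numpy as np

odds = np.array([], dtype=np.int64)   # empty array of int64 type
n = 1
while n <= 10:
    if n % 2 == 1:
        odds = np.append(odds, n)     # collect the odd numbers
    n += 1
```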
In [60]: n = 1
while n <= 20:
if n%2 == 0:
print(n)
break
n += 1
Why did the above execution stop at 2? Let’s try removing break:
In [61]: n = 1
while n <= 20:
if n%2 == 0:
print(n)
n += 1
2
4
6
8
10
12
14
16
18
20
The break statement causes the code to exit the while loop as soon as it is executed.
Random numbers
These are 5 uniform random numbers from (0,1).
In [63]: np.random.rand(5)
And these are random numbers from the standard normal distribution:
In [64]: np.random.randn(5)
Here is a frequency histogram of 10⁵ random numbers from the standard normal distribution, using 50 bins:
In [65]: y = np.random.randn(10**5)
plt.hist(y, 50);
Sometimes we are interested in relative frequency histograms, where the height of each bin is the relative frequency of the numbers in the bin. Adding the option density=True outputs a relative frequency histogram:
In [66]: y = np.random.randn(10**5)
plt.hist(y, 50, density=True);
Exercise 1.2-1: In Python you can compute the factorial of a positive integer n by the built-in function factorial(n) from the subpackage special in SciPy (scipy.special.factorial(n)). Write your own version of this function, called factorial2, using a for loop. Use the time.time() function to compare the execution time of your version and the built-in version of the factorial function.
Exercise 1.2-2: Write a Python code to estimate the value of π using the following procedure: place a circle of diameter one in the unit square. Generate 10,000 pairs of random numbers (u, v) from the unit square. Count the number of pairs (u, v) that fall into the circle, and call this number n. Then n/10000 is approximately the area of the circle, π/4, so 4n/10000 is an estimate of π. (This approach is known as the Monte Carlo method.)
b) Write a Python code that computes f . Verify f (2, 3) matches your answer above.
Therefore the numbers that can be represented in a computer exactly form only a subset of the rational numbers. Any time the computer performs an operation whose outcome is not a number that can be represented exactly, an approximation will replace the exact number. This is called roundoff error: the error produced when a computer is used to perform real-number calculations.
x = s (.a1 a2 ... at)_β × β^e        (1.1)

where

s → sign of x = ±1
e → exponent, with bounds L ≤ e ≤ U
(.a1 ... at)_β = a1/β + a2/β² + ... + at/β^t ; the mantissa
β → base
t → number of digits; the precision.
In the floating-point representation (1.1), if we specify e in such a way that a1 ≠ 0, then the representation will be unique. This is called the normalized floating-point representation. For example, if β = 10, in the normalized floating-point representation we would write 0.012 as 0.12 × 10⁻¹, instead of choices like 0.012 × 10⁰ or 0.0012 × 10¹.
In most computers today, the base is β = 2. Bases 8 and 16 were used in old IBM mainframes in
the past. Some handheld calculators use base 10. An interesting historical example is a short-lived
computer named Setun developed at Moscow State University which used base 3.
There are several choices to make in the general floating-point model (1.1) for the values of s, β, t, e. The IEEE 64-bit floating-point representation is the specific model used in most computers today:

x = (−1)^s (1.a2 a3 ... a53)₂ × 2^(e−1023).        (1.2)
Some comments:
• Notice how s appears in different forms in equations (1.1) and (1.2). In (1.2), s is either 0 or
1. If s = 0, then x is positive. If s = 1, x is negative.
• Since β = 2, in the normalized floating-point representation of x the first (nonzero) digit after the radix point has to be 1, so we do not have to store this digit. That's why we write x as a number starting at 1 in (1.2). Even though the precision is t = 52, we are able to access up to the 53rd digit a53.
• The bounds for the exponent are: 0 ≤ e ≤ 2047. We will discuss where 2047 comes from
shortly. But first, let’s discuss why we have e−1023 as the exponent in the representation (1.2),
as opposed to simply e (which we had in the representation (1.1)). If the smallest exponent
possible was e = 0, then the smallest positive number the computer can generate would be
(1.00...0)2 = 1: certainly we need the computer to represent numbers less than 1! That’s why
we use the shifted expression e − 1023, called the biased exponent, in the representation
(1.2). Note that the bounds for the biased exponent are −1023 ≤ e − 1023 ≤ 1024.
Here is a schema that illustrates how the physical bits of a computer correspond to the representation above. Each cell in the table below, numbered 1 through 64, corresponds to a physical bit in the computer memory.
1 2 3 ... 12 13 ... 64
• The first bit is the sign bit: it stores the value for s, 0 or 1.
• The bits 2 through 12 store the exponent e (not e − 1023). Using 11 bits, one can generate the integers from 0 to 2¹¹ − 1 = 2047. Here is how we get the smallest and largest values for e:

e = (00...0)₂ = 0
e = (11...1)₂ = 2⁰ + 2¹ + ... + 2¹⁰ = (2¹¹ − 1)/(2 − 1) = 2047.
• The red bits, and there are 52 of them, store the digits a2 through a53 .
Example. Find the 64-bit representation of the number 10.375.

Solution. You can check that 10 = (1010)₂ and 0.375 = (.011)₂ by computing

10 = 0 × 2⁰ + 1 × 2¹ + 0 × 2² + 1 × 2³
0.375 = 0 × 2⁻¹ + 1 × 2⁻² + 1 × 2⁻³.

Then

10.375 = (1010.011)₂ = (1.010011)₂ × 2³.
0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 0 ... 0
Notice the first (sign) bit is 0 since the number is positive. The next 11 bits represent the exponent e = 3 + 1023 = 1026, and the next group of bits is the mantissa, filled with 0's after the last digit of the mantissa. In Python, although there is no built-in function that produces the bit-by-bit representation of a number, we can define the following function, named float2bin, which provides the bit representation of a floating-point number, based on the struct package:
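The definition of float2bin is not shown in the extracted text; here is one way to write it with the struct package, as the text describes: pack the float as an IEEE 754 double, reinterpret the 8 bytes as an unsigned integer, and format its 64 bits.

```python
import struct

def float2bin(x):
    # '>d' packs x as a big-endian IEEE 754 double; '>Q' reads the same
    # 8 bytes back as an unsigned 64-bit integer
    (n,) = struct.unpack('>Q', struct.pack('>d', x))
    return format(n, '064b')
```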
In [2]: float2bin(10.375)
Out[2]: '0100000000100100110000000000000000000000000000000000000000000000'
When the exponent bits are all set to zero, we have e = 0 and thus e − 1023 = −1023. This arrangement is reserved for ±0.0 and subnormal numbers. Subnormal numbers are an exception to our normalized floating-point representation, an exception that is useful in various ways. For details see Goldberg [9].
When the exponent bits are all set to one, we have e = 2047 and thus e − 1023 = 1024. This arrangement is reserved for ±∞ (all mantissa bits zero) as well as other special values such as NaN (not-a-number).
In conclusion, even though −1023 ≤ e − 1023 ≤ 1024 in (1.2), when it comes to representing nonzero real numbers we only have access to exponents in the range −1022 ≤ e − 1023 ≤ 1023.
Therefore, the smallest positive real number that can be represented in a computer is

(1.00...0)₂ × 2⁻¹⁰²² ≈ 2.2 × 10⁻³⁰⁸,

and the largest is

(1.11...1)₂ × 2¹⁰²³ ≈ 1.8 × 10³⁰⁸.

During a calculation, if a number less than the smallest floating-point number is obtained, then we obtain an underflow error. A number greater than the largest gives an overflow error.
Exercise 1.3-1: Consider the following toy model for a normalized floating-point represen-
tation in base 2: x = (−1)s (1.a2 a3 )2 × 2e where −1 ≤ e ≤ 1. Find all positive machine numbers
(there are 12 of them) that can be represented in this model. Convert the numbers to base 10,
and then carefully plot them on the number line, by hand, and comment on how the numbers are
spaced.
Representation of integers
In the previous section, we discussed representing real numbers in a computer. Here we will give
a brief discussion of representing integers. How does a computer represent an integer n? As in
real numbers, we start with writing n in base 2. We have 64 bits to represent its digits and sign.
As in the floating-point representation, we can allocate one bit for the sign, and use the rest, 63
bits, for the digits. This approach has some disadvantages when we start adding integers. Another
approach, known as the two’s complement, is more commonly used, including in Python.
For an example, assume we have 8 bits in our computer. To represent 12 in two’s complement
(or any positive integer), we simply write it in its base 2 expansion: (00001100)2 . To represent −12,
we do the following: flip all digits, replacing 1 by 0, and 0 by 1, and then add 1 to the result. When
we flip digits for 12, we get (11110011)2 , and adding 1 (in binary), gives (11110100)2 . Therefore
−12 is represented as (11110100)₂ in the two's complement approach. It may seem mysterious to go through all this trouble to represent −12, until you add the representations of 12 and −12:

(00001100)₂ + (11110100)₂ = (100000000)₂.

The first 8 digits of the sum (from right to left), which is all the computer can represent (ignoring the carried ninth digit 1), are (00000000)₂. So just like 12 + (−12) = 0 in base 10, the sum of the representations of these numbers is also 0.
We can repeat these calculations with 64-bits, using Python. The function int2bin defined below
outputs the digits of an integer, using two’s complement for negative numbers:
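The definition of int2bin is not shown; a compact version uses the fact that masking a Python integer with 2⁶⁴ − 1 yields exactly the two's complement digits of a negative integer:

```python
def int2bin(n):
    # n & (2**64 - 1) maps a negative n to 2**64 + n, whose binary digits
    # are the 64-bit two's complement representation of n
    return format(n & (2**64 - 1), '064b')
```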
In [2]: int2bin(12)
Out[2]: '0000000000000000000000000000000000000000000000000000000000001100'
In [3]: int2bin(-12)
Out[3]: '1111111111111111111111111111111111111111111111111111111111110100'
You can verify that the sum of these representations is 0, when truncated to 64-digits.
Here is another example illustrating the advantages of two’s complement. Consider −3 and 5,
with representations,
−3 = (11111101)2 and 5 = (00000101)2 .
The sum of −3 and 5 is 2; what about the binary sum of their representations? We have

(11111101)₂ + (00000101)₂ = (100000010)₂,

and if we ignore the ninth bit, the result is (00000010)₂ = (10)₂, which is indeed 2. Notice that if we followed the same approach used in the floating-point representation and allocated the leftmost bit to the sign of the integer, we would not have had this property.
We will not discuss integer representations and integer arithmetic any further. However one
useful fact to keep in mind is the following: in two’s complement, using 64 bits, one can represent
integers between −263 = −9223372036854775808 and 263 − 1 = 9223372036854775807. Any integer
below or above these bounds would overflow in such a fixed-width representation. However, Python is special compared to other programming languages in that it supports an arbitrary-precision integer implementation: integers of any size can be represented in Python, while this is not true for floats.
Example 10. From Calculus, we know that lim_{n→∞} n^n/n! = ∞. Therefore, computing n^n/n!, which is a float, for large n will cause overflow at some point. Here is a Python code for this calculation:
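The cell is missing here; the traceback further below shows that it used f = lambda n: n**n/factorial(n), with factorial imported from scipy.special. An equivalent sketch using only the standard library (scipy's factorial returns the float gamma(n + 1), as explained next):

```python
import math

# float factorial via the gamma function: factorial(n) = gamma(n + 1)
f = lambda n: n**n / math.gamma(n + 1)
```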
Let's have a closer look at this function. The Python function factorial(n) computes the factorial, returned as a float, as the gamma function at n + 1. If we call f with an integer input, then the above code will compute n^n as an integer and factorial(n) as a float. It will then divide these numbers, by first converting n^n to a float, and obtain a floating-point result. Let's compute f(n) = n^n/n! for n = 1, ..., 1000:
.
.
.
135 1.4629522660965042e+57
136 3.962087287769907e+57
137 1.0730739055834461e+58
138 2.90634235201109e+58
139 7.871822311096416e+58
140 2.132136506640898e+59
141 5.775183434192781e+59
142 1.5643266954396485e+60
143 4.2374040203696554e+60
---------------------------------------------------------------------------
<ipython-input-2-88a424afb432> in <module>
1 for n in range(1, 1001):
----> 2 print(n, f(n))
<ipython-input-1-6df979c5f68c> in <lambda>(n)
1 from scipy.special import factorial
----> 2 f = lambda n: n**n/factorial(n)
Notice that the process cannot proceed beyond n = 143. Where exactly does the error occur? Python can compute 144¹⁴⁴ exactly (arbitrary precision); however, when it is converted to a floating-point number, overflow occurs. For 143¹⁴³, overflow does not happen:
In [3]: float(143**143)
Out[3]: 1.6332525972973913e+308
In [4]: float(144**144)
---------------------------------------------------------------------------
<ipython-input-2-9d2f48a83404> in <module>
----> 1 float(144**144)
The function f(n) can be coded in a much better way if we rewrite n^n/n! as

(n/n) × (n/(n−1)) × ... × (n/1).

Each fraction can be computed separately and then multiplied, which will slow down the growth of the numbers. Here is a new code using a for statement.
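The improved code is missing in the extracted text; a sketch that accumulates the product of the factors n/i one at a time:

```python
def f(n):
    p = 1.0
    for i in range(1, n + 1):
        p *= n / i   # partial products n^k / k! grow much more slowly
    return p
```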
Let's compute f(n) = n^n/n! for n = 1, ..., 1000 again:
.
.
.
705 2.261381911989747e+304
706 6.142719392952918e+304
707 1.6685832310803885e+305
708 4.532475935558631e+305
709 1.2311857272492938e+306
710 3.34435267514059e+306
711 9.084499321839277e+306
712 2.4676885942863887e+307
713 6.70316853566547e+307
714 inf
715 inf
716 inf
717 inf
718 inf
.
.
.
The previous version of the code gave an overflow error when n = 144. This version has no difficulty computing n^n/n! for n = 144; in fact, we can go as high as n = 713. Overflow in floating-point arithmetic yields the output inf, which stands for infinity.
Another way to accommodate a larger value of n is to define the function f in this way:
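The modified cell is missing; in the text it passes the extra argument True to scipy.special.factorial, i.e., f = lambda n: n**n/factorial(n, True). A standard-library equivalent uses math.factorial, which already returns an exact integer:

```python
import math

# both n**n and math.factorial(n) are exact integers; the only
# floating-point step is the final division
f = lambda n: n**n / math.factorial(n)
```

Here the division fails with an overflow error once the exact quotient exceeds the largest double, which first happens at n = 714.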
With the additional input “True”, the function factorial(n) returns an integer with arbitrary precision, instead of a float, so that both the numerator and the denominator are integers. The process stops only when the division produces a floating-point number that is so large that overflow occurs. As a result, the same behavior as in the improved algorithm above is expected. Please verify that the process can continue until n = 713 with the modified f function.
We will discuss several features of computer arithmetic in the rest of this section. The discussion is easier to follow if we use the familiar base 10 representation of numbers instead of base 2. To this end, we introduce the normalized decimal floating-point representation

±0.d1 d2 ... dk × 10^n,

where 1 ≤ d1 ≤ 9 and 0 ≤ di ≤ 9 for all i = 2, 3, ..., k. Informally, we call these numbers k-digit decimal machine numbers.
• In chopping, we simply take the first k digits and ignore the rest: fl(x) = 0.d1 d2 ... dk × 10^n.
• In rounding, we take the nearest k-digit machine number: if d_{k+1} ≥ 5, we add one to dk and chop; otherwise we simply chop.
Example 11. Find 5-digit (k = 5) chopping and rounding values of the numbers below:

• π = 0.314159265... × 10¹
Chopping gives fl(π) = 0.31415 × 10¹ and rounding gives fl(π) = 0.31416 × 10¹.

• 0.0001234567
We need to write the number in the normalized representation first, as 0.1234567 × 10⁻³. Now chopping gives 0.12345 × 10⁻³ and rounding gives 0.12346 × 10⁻³.
Relative error is usually a better choice of measure, and we need to understand why. Notice how the only difference in the three cases is the exponent of the numbers. The absolute errors, 0.01 × 10⁰, 0.01 × 10⁻², and 0.01 × 10⁵, are different since the exponents are different. However, the relative error in each case is the same: 0.05.
Definition 14. The number x* is said to approximate x to s significant digits (or figures) if s is the largest nonnegative integer such that

|x − x*|/|x| ≤ 5 × 10⁻ˢ.

In Example 13 we had |x − x*|/|x| = 0.05 ≤ 5 × 10⁻², but not ≤ 5 × 10⁻³. Therefore we say x* = 0.21 approximates x = 0.20 to 2 significant digits (but not to 3 digits).
When the computer approximates a real number x by fl(x), what can we say about the error? The following result gives an upper bound for the relative error.
Lemma 15. The relative error of approximating x by fl(x) in the k-digit normalized decimal floating-point representation satisfies

|x − fl(x)|/|x| ≤ 10^(−k+1)            if chopping,
|x − fl(x)|/|x| ≤ (1/2) × 10^(−k+1)    if rounding.
Proof. We will give the proof for chopping; the proof for rounding is similar but tedious. Let

x = 0.d1 d2 ... dk d_{k+1} ... × 10^n.

Then

fl(x) = 0.d1 d2 ... dk × 10^n

and

|x − fl(x)|/|x| = (0.d_{k+1} d_{k+2} ... × 10^(n−k)) / (0.d1 d2 ... × 10^n) = (0.d_{k+1} d_{k+2} ... / 0.d1 d2 ...) × 10^(−k).

We have two simple bounds: 0.d_{k+1} d_{k+2} ... < 1 and 0.d1 d2 ... ≥ 0.1, the latter true since the smallest d1 can be is 1. Using these bounds in the equation above we get

|x − fl(x)|/|x| ≤ (1/0.1) × 10^(−k) = 10^(−k+1).
Machine epsilon
Machine epsilon ε is the smallest positive floating-point number for which fl(1 + ε) > 1. This means that if we add to 1.0 any number less than ε, the machine computes the sum as 1.0.
The number 1.0 in its binary floating-point representation is simply (1.00...0)₂, where a2 = a3 = ... = a53 = 0. We want to find the smallest number that gives a sum larger than 1.0 when added to 1.0. The answer depends on whether we chop or round.
If we are chopping, examine the binary addition

  1.00...00
+ 0.00...01   (the 1 is in position a53)
= 1.00...01

and notice that (0.00...01)₂ = (1/2)⁵² = 2⁻⁵² is the smallest number we can add to 1.0 such that the sum is different from 1.0.
If we are rounding, examine the binary addition

  1.00...00
+ 0.00...001   (the 1 is one position beyond a53)
= 1.00...001

which, rounded back to 52 fractional digits, becomes

  1.00...01.

Observe that the number added to 1.0 above is (0.00...001)₂ = (1/2)⁵³ = 2⁻⁵³, which is the smallest number that will make the sum larger than 1.0 with rounding.
In summary, we have shown

ε = 2⁻⁵²  if chopping,
ε = 2⁻⁵³  if rounding.
As a consequence, notice that we can restate the inequality in Remark 16 in a compact way using the machine epsilon as

|x − fl(x)|/|x| ≤ ε.
Remark 17. There is another definition of machine epsilon: it is the distance between 1.0 and the next floating-point number.

number 1.0:    (1.00...00)₂
next number:   (1.00...01)₂
distance:      (0.00...01)₂

Note that the distance (the absolute value of the difference) is (1/2)⁵² = 2⁻⁵². In this alternative definition, machine epsilon is not based on whether rounding or chopping is used. Also note that the distance between two adjacent floating-point numbers is not constant; it is smaller for smaller values and larger for larger values (see Exercise 1.3-1).
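Both characterizations of machine epsilon are easy to verify directly in Python (math.nextafter requires Python 3.9 or later):

```python
import math

eps = 2.0**-52
print(1.0 + eps > 1.0)                         # True: adding eps moves off 1.0
print(1.0 + 2.0**-54 == 1.0)                   # True: smaller increments are rounded away
print(math.nextafter(1.0, 2.0) - 1.0 == eps)   # True: eps is the gap to the next float
```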
Propagation of error
We discussed the resulting error when chopping or rounding is used to approximate a real number by
its machine version. Now imagine carrying out a long calculation with many arithmetical operations,
and at each step there is some error due to say, rounding. Would all the rounding errors accumulate
and cause havoc? This is a rather difficult question to answer in general. For a much simpler
example, consider adding two real numbers x, y. In the computer, the numbers are represented as
f l(x), f l(y). The sum of these number is f l(x) + f l(y); however, the computer can only represent
its floating-point version, f l(f l(x) + f l(y)). Therefore the relative error in adding two numbers is:
(x + y) − f l(f l(x) + f l(y))
.
x+y
In this section, we will look at some specific examples where roundoff error can cause problems, and
how we can avoid them.
From the relative errors, we see that f l(x) and f l(y) approximate x and y to six significant digits.
Let's see how the error propagates when we subtract y from x. The computer finds this difference by first computing fl(x) and fl(y), then taking their difference, and approximating that difference by its floating-point representation: fl(fl(x) − fl(y)).
Notice how large the relative error is compared to the absolute error! The machine version of x − y
approximates x − y to only one significant digit. Why did this happen? When we subtract two
numbers that are nearly equal, the leading digits of the numbers cancel, leaving a result close to
the rounding error. In other words, the rounding error dominates the difference.
The division yields an absolute error of 0.4 and a relative error of 9 × 10⁻⁶: the absolute error went from 4 × 10⁻⁶ to 0.4. Perhaps not surprisingly, division by a small number magnifies the absolute error but not the relative error.
Consider the computation of

(1 − cos x)/sin x

when x is near zero. This is a problem where we have both subtraction of nearly equal quantities, which happens in the numerator, and division by a small number, when x is close to zero. Let x = 0.1. Continuing with five-digit rounding, we have
The exact result to 8 digits is 0.050041708, and the relative error of this computation is 8.5 × 10−4 .
Next we will see how to reduce this error using a simple algebraic identity.
Applying the identity

(1 − cos x)/sin x = sin x/(1 + cos x)
removes both difficulties encountered before: there is no cancellation of significant digits or division
by a small number. Using five-digit rounding, we have

fl(sin 0.1/(1 + cos 0.1)) = 0.050042.

The relative error is 5.8 × 10⁻⁶, about a factor of 100 smaller than the error in the original computation.
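The same phenomenon is visible in double precision. In this sketch (the value x = 10⁻⁹ is an assumption chosen for illustration), the original form loses every significant digit while the rewritten form is accurate:

```python
import math

x = 1e-9
naive  = (1 - math.cos(x)) / math.sin(x)   # 1 - cos(x) rounds to 0: total cancellation
stable = math.sin(x) / (1 + math.cos(x))   # approximately x/2, to full precision
print(naive, stable)
```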
Notice the larger relative error in r2 compared to that of r1 , about a factor of 100, which is due to
cancellation of leading digits when we compute 11.0 − 10.82.
One way to fix this problem is to rewrite the offending expression by rationalizing the numerator:

r2 = (11.0 − √117)/2 = ((11.0 − √117)/2) × ((11.0 + √117)/(11.0 + √117)) = 4/(2(11.0 + √117)) = 2/(11.0 + √117),
an improvement of about a factor of 100, even though in the new way of computing r2 there are two operations where rounding error occurs instead of one.
Example 20. The simple procedure of adding numbers, even if they do not have mixed signs, can accumulate large errors due to rounding or chopping. Several sophisticated algorithms for adding large lists of numbers with smaller accumulated error than straightforward addition exist in the literature (see, for example, Higham [11]).
For a simple example, consider using four-digit arithmetic with rounding, and computing the
average $\frac{a+b}{2}$ of two numbers. For a = 2.954 and b = 100.9, the true average is 51.927. However,
four-digit arithmetic with rounding yields:
$$\mathrm{fl}\left(\frac{\mathrm{fl}(100.9+2.954)}{2}\right) = \mathrm{fl}\left(\frac{\mathrm{fl}(103.854)}{2}\right) = \mathrm{fl}\left(\frac{103.9}{2}\right) = 51.95,$$
which has a relative error of $4.43 \times 10^{-4}$. If we rewrite the averaging formula as $a + \frac{b-a}{2}$, on the other
hand, we obtain 51.93, which has a much smaller relative error of $5.78 \times 10^{-5}$. The following table
displays the exact and 4-digit computations, with the corresponding relative error at each step.
                   a       b       a + b    (a+b)/2   b - a    (b-a)/2   a+(b-a)/2
4-digit rounding   2.954   100.9   103.9    51.95     97.95    48.98     51.93
Exact                              103.854  51.927    97.946   48.973    51.927
Relative error                     4.43e-4  4.43e-4   4.08e-5  1.43e-4   5.78e-5
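The 4-digit computations in this example can be reproduced with Python's decimal module, which supports fixed-precision arithmetic. The snippet below is an added illustration, not the book's code; a 4-digit context with round-half-up mimics four-digit rounding arithmetic:

```python
from decimal import Decimal, Context, ROUND_HALF_UP

ctx = Context(prec=4, rounding=ROUND_HALF_UP)  # four-digit rounding arithmetic
a, b = Decimal("2.954"), Decimal("100.9")

avg1 = ctx.divide(ctx.add(a, b), Decimal(2))                   # fl((a+b)/2)
avg2 = ctx.add(a, ctx.divide(ctx.subtract(b, a), Decimal(2)))  # fl(a+(b-a)/2)
```

Here avg1 evaluates to 51.95 and avg2 to 51.93, matching the table.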
Example 21. There are two standard formulas given in textbooks to compute the sample variance
s2 of the numbers x1 , ..., xn :
1. $s^2 = \frac{1}{n-1}\left[\sum_{i=1}^n x_i^2 - \frac{1}{n}\left(\sum_{i=1}^n x_i\right)^2\right]$,

2. First compute $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$, and then $s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2$.
Both formulas can suffer from roundoff errors due to adding large lists of numbers if n is large,
as mentioned in the previous example. However, the first formula is also prone to error due to
cancellation of leading digits (see Chan et al [6] for details).
For an example, consider four-digit rounding arithmetic, and let the data be 1.253, 2.411, 3.174.
The sample variances computed from formula 1 and formula 2 are 0.93 and 0.9355, respectively. The exact
value, up to 6 digits, is 0.935562. Formula 2 is a numerically more stable choice for computing the
variance than the first one.
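As an added illustration, the four-digit computation can be mimicked with Python's decimal module; the following sketch evaluates both formulas on the data above in a 4-digit round-half-up context:

```python
from decimal import Decimal, Context, ROUND_HALF_UP

ctx = Context(prec=4, rounding=ROUND_HALF_UP)   # four-digit rounding arithmetic
data = [Decimal("1.253"), Decimal("2.411"), Decimal("3.174")]
n = Decimal(3)

# Formula 1: subtract two accumulated sums, then divide by n-1
sum_x = Decimal(0)
sum_sq = Decimal(0)
for x in data:
    sum_x = ctx.add(sum_x, x)
    sum_sq = ctx.add(sum_sq, ctx.multiply(x, x))
s2_formula1 = ctx.divide(
    ctx.subtract(sum_sq, ctx.divide(ctx.multiply(sum_x, sum_x), n)), n - 1)

# Formula 2: compute the mean first, then sum the squared deviations
mean = ctx.divide(sum_x, n)
dev_sq = Decimal(0)
for x in data:
    d = ctx.subtract(x, mean)
    dev_sq = ctx.add(dev_sq, ctx.multiply(d, d))
s2_formula2 = ctx.divide(dev_sq, n - 1)
```

Formula 1 yields 0.93 and formula 2 yields 0.9355, as stated above.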
Out[2]: 0.009183673977218275

This result has a relative error of 9.1. We can avoid this huge error if we simply rewrite the
above sum as
$$e^{-7} = \frac{1}{e^7} = \frac{1}{1 + 7 + \frac{7^2}{2!} + \frac{7^3}{3!} + \cdots}.$$
Out[3]: 0.0009118951837867185
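The original notebook cells are not shown in this excerpt; a sketch that reproduces the phenomenon (the truncation point N = 20 is an assumption) is:

```python
import math

N = 20  # truncation point (an assumption; the original cell is not shown)

# Adding terms of the alternating series for e^{-7} directly
direct = sum((-7.0)**i / math.factorial(i) for i in range(N + 1))

# Rewriting e^{-7} = 1/e^{7} and summing the positive series instead
recip = 1.0 / sum(7.0**i / math.factorial(i) for i in range(N + 1))

exact = math.exp(-7)
```

The direct alternating sum has a huge relative error, while the reciprocal form is accurate to several digits with the same number of terms.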
Exercise 1.3-2: The x-intercept of the line passing through the points $(x_1, y_1)$ and $(x_2, y_2)$
can be computed using either one of the following formulas:
$$x = \frac{x_1 y_2 - x_2 y_1}{y_2 - y_1}$$
or,
$$x = x_1 - \frac{(x_2 - x_1)y_1}{y_2 - y_1}$$
with the assumption $y_1 \neq y_2$.
b) Compute the x-intercept using each formula when (x1 , y1 ) = (1.02, 3.32) and (x2 , y2 ) =
(1.31, 4.31). Use three-digit rounding arithmetic.
c) Use Python (or a calculator) to compute the x-intercept using the full-precision of the device
(you can use either one of the formulas). Using this result, compute the relative and absolute
errors of the answers you gave in part (b). Discuss which formula is better and why.
Exercise 1.3-3: Write two functions in Python to compute the binomial coefficient $\binom{m}{k}$ using
the following formulas:

a) $\binom{m}{k} = \frac{m!}{k!(m-k)!}$ (m! is scipy.special.factorial(m) in Python.)

b) $\binom{m}{k} = \left(\frac{m}{k}\right)\left(\frac{m-1}{k-1}\right) \times \cdots \times \left(\frac{m-k+1}{1}\right)$
Then, experiment with various values for m, k to see which formula causes overflow first.
Exercise 1.3-4: Polynomials can be evaluated in a nested form (also called Horner's method)
that has two advantages: the nested form has significantly less computation, and it can reduce
roundoff error. For
$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{n-1}x^{n-1} + a_n x^n,$$
the nested form is
$$p(x) = a_0 + x\left(a_1 + x\left(a_2 + \cdots + x\left(a_{n-1} + x\,a_n\right)\cdots\right)\right).$$

a) Compute p(3.5) using three-digit rounding, and three-digit chopping arithmetic. What are
the absolute errors? (Note that the exact value of p(3.5) is 13.3.)

b) Then compute p(3.5) using three-digit rounding and chopping using the nested form. What
are the absolute errors? Compare the errors with the ones you found in (a).
Exercise 1.3-5: Consider the polynomial written in standard form: $5x^4 + 3x^3 + 4x^2 + 7x - 5$.

a) Write the polynomial in its nested form. (See the previous problem.)
b) How many multiplications does the nested form require when we evaluate the polynomial at a
real number? How many multiplications does the standard form require? Can you generalize
your answer to any nth degree polynomial?
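As a sketch of the nested-form idea (an added illustration, not a solution to the exercises), Horner's method evaluates an nth degree polynomial with only n multiplications:

```python
def horner(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_n x^n in nested form: n multiplications.

    coeffs = [a_0, a_1, ..., a_n]."""
    result = 0.0
    for a in reversed(coeffs):  # start from a_n and work inward
        result = result * x + a
    return result

# 5x^4 + 3x^3 + 4x^2 + 7x - 5 nested: -5 + x*(7 + x*(4 + x*(3 + x*5)))
```

For example, horner([-5, 7, 4, 3, 5], 2.0) returns 129.0, since $5\cdot 16 + 3\cdot 8 + 4\cdot 4 + 14 - 5 = 129$.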
Arya computes the product using her calculator: 2.312 × 0.003982 = 0.009206384, and stares at
the result in bewilderment. The numbers she multiplied had four-significant digits, but the product
has seven digits! Could this be the result of some magic, like a rabbit hopping out of a magician’s
hat that was only a handkerchief a moment ago? After some internal deliberations, Arya decides
to report the answer to her professor as 0.009206. Do you think Arya was correct in not reporting
all of the digits of the product?
1. Error due to the simplifying assumptions made in the development of a mathematical model
for the physical problem.
2. Programming errors.
5. Mathematical truncation error: error that results from the use of numerical methods in solving
a problem, such as evaluating a series by a finite sum, a definite integral by a numerical
integration method, or a differential equation by a numerical method.
Example 23. The volume of the Earth could be computed using the formula for the volume of a
sphere, $V = \frac{4}{3}\pi r^3$, where r is the radius. This computation involves the following approximations:
Exercise 1.3-6: The following is from "Numerical mathematics and computing" by Cheney
& Kincaid [7]:
In 1996, the Ariane 5 rocket launched by the European Space Agency exploded 40 seconds
after lift-off from Kourou, French Guiana. An investigation determined that the horizontal velocity
required the conversion of a 64-bit floating-point number to a 16-bit signed integer. It failed because
the number was larger than 32,767, which was the largest integer of this type that could be stored in
memory. The rocket and its cargo were valued at $500 million.
Search online, or in the library, to find another example of computer arithmetic going very
wrong! Write a short paragraph explaining the problem, and give a reference.
Chapter 2
Figure 2.1: Rhind Mathematical Papyrus. (British Museum Image under a Creative Com-
mons license.)
The papyrus has a collection of mathematical problems and their solutions; a translation is given
by Chace and Manning [2]. The following is Problem 26, taken from [2]:
CHAPTER 2. SOLUTIONS OF EQUATIONS: ROOT-FINDING 43
A quantity and its 1/4 added together become 15. What is the quantity?
Assume 4.
\1 4
\1/4 1
Total 5
As many times as 5 must be multiplied to give 15, so many times 4 must be multiplied
to give the required number. Multiply 5 so as to get 15.
\1 5
\2 10
Total 3
Multiply 3 by 4.
1 3
2 6
\4 12
The quantity is
12
1/4 3
Total 15
Arya’s instructor knows she has taken math classes and asks her if she could decipher this
solution. Although Arya’s initial reaction to this assignment can be best described using the word
"despair", she quickly realizes it is not as bad as she thought. Here is her thought process: the
question, in our modern notation is, find x if x + x/4 = 15. The solution starts with an initial guess
p = 4. It then evaluates x + x/4 when x = p, and finds the result to be 5: however, what we need is
15, not 5, and if we multiply both sides of p + p/4 = 5 by 3, we get (3p) + (3p)/4 = 15. Therefore,
the solution is 3p = 12.
Here is a more general analysis of this solution technique. Suppose we want to solve the equation
g(x) = a, and that g is a linear map, that is, g(λx) = λg(x) for any constant λ. Then, the solution
is $x = \frac{ap}{b}$ where p is an initial guess with g(p) = b. To see this, simply observe
$$g\left(\frac{ap}{b}\right) = \frac{a}{b}g(p) = a.$$
1. $|p_N - p_{N-1}| < \epsilon$,

2. $\left|\frac{p_N - p_{N-1}}{p_N}\right| < \epsilon$, $p_N \neq 0$, or

3. $|f(p_N)| < \epsilon$.
1. It is possible to have a sequence {pn } such that pn − pn−1 → 0 but {pn } diverges.
In our numerical results, we will experiment with various stopping criteria. However, the second
criterion is usually preferred over the others.
Exercise 2.1-1: Solve the following problems and discuss their relevance to the stopping
criteria.

a) Consider the sequence $p_n$ where $p_n = \sum_{i=1}^n \frac{1}{i}$. Argue that $p_n$ diverges, but $\lim_{n\to\infty}(p_n - p_{n-1}) = 0$.
Definition 24. Suppose $\{p_n\}$ converges to p. If there are constants C > 0 and $\alpha > 1$ such that
$$|p_{n+1} - p| \le C|p_n - p|^\alpha, \quad n \ge 1, \tag{2.1}$$
then we say $\{p_n\}$ converges to p with order of convergence $\alpha$.

Special cases:

• If $\alpha = 1$ and C < 1, we say the convergence is linear, and the rate of convergence is C. In
this case, using induction, we can show
$$|p_n - p| \le C^n|p_0 - p|. \tag{2.2}$$

There are some methods for which Equation (2.2) holds, but Equation (2.1) does not hold
for any C < 1. We still say these methods converge linearly. An example is the
bisection method.
The first sequence converges to 0 linearly, and the second quadratically. Here are a few iterations
of the sequences:
n     Linear              Quadratic
1     0.7                 0.7
4     0.24                4.75 × 10^{-3}
8     5.76 × 10^{-2}      3.16 × 10^{-40}
Observe how fast quadratic convergence is compared to linear convergence.
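The table entries are consistent with the model sequences below (the closed forms are an assumption made for illustration): a linearly convergent error shrinks by a constant factor each step, while a quadratically convergent error is roughly squared each step.

```python
# Assumed closed forms consistent with the table:
# linear:    p_n = 0.7**n           (each error is 0.7 times the previous one)
# quadratic: p_n = 0.7**(2**n - 1)  (each error is roughly the square of the last)
linear = {n: 0.7**n for n in (1, 4, 8)}
quadratic = {n: 0.7**(2**n - 1) for n in (1, 4, 8)}
```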
Example 26. Compute the first three iterations by hand for the function plotted in Figure (2.2).
Figure 2.2
Step 1 : To start, we need to pick an interval [a, b] that contains the root, that is, f (a)f (b) < 0. From
the plot, it is clear that [0, 4] is a possible choice. In the next few steps, we will be working with
a sequence of intervals. For convenience, let’s label them as [a, b] = [a1 , b1 ], [a2 , b2 ], [a3 , b3 ],
etc. Our first interval is then [a1 , b1 ] = [0, 4]. Next we find the midpoint of the interval,
p1 = 4/2 = 2, and use it to obtain two subintervals [0, 2] and [2, 4]. Only one of them
contains the root, and that is [2, 4].
Step 2: From the previous step, our current interval is $[a_2, b_2] = [2, 4]$. We find the midpoint
$p_2 = \frac{2+4}{2} = 3$, and form the subintervals [2, 3], [3, 4]. The one that contains the root is [3, 4].
Step 3: We have [a3 , b3 ] = [3, 4]. The midpoint is p3 = 3.5. We are now pretty close to the root
visually, and we stop the calculations!
• Stopping criteria
• It’s possible that the stopping criterion is not satisfied in a reasonable amount of time. We
need a maximum number of iterations we are willing to run the code with.
Remark 27.
2. There is a convenient stopping criterion for the bisection method that was not mentioned
before. One can stop when the interval [a, b] at step n is such that $|b - a| < \epsilon$. This is similar
to the first stopping criterion discussed earlier, but not the same. One can also use more than
one stopping criterion; an example is in the Python code that follows.
Let's use the bisection method to find the root of $f(x) = x^5 + 2x^3 - 5x - 2$, with $\epsilon = 10^{-4}$. Note
that [0, 2] contains a root, since f(0) < 0 and f(2) > 0. We set N = 20 below.
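The bisection code itself is not reproduced in this excerpt; a minimal sketch consistent with the stopping criteria and messages described above (the exact structure and printed text are assumptions) is:

```python
def bisection(f, a, b, eps, N):
    """Bisection sketch: stop when |f(p)| or the interval width drops below eps."""
    n = 1
    p = (a + b) / 2
    while n <= N:
        p = (a + b) / 2
        if abs(f(p)) < eps or abs(b - a) < eps:
            print('p is ', p, ' and the iteration number is ', n)
            return p
        if f(a) * f(p) < 0:   # root lies in the left subinterval
            b = p
        else:                 # root lies in the right subinterval
            a = p
        n += 1
    print('Method did not converge. The last iteration gives ',
          p, ' with function value ', f(p))
    return p
```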
In [4]: x = 1.319671630859375
x**5+2*x**3-5*x-2
Out[4]: 0.000627945623044468
Let’s see what happens if N is set too small and the method does not converge.
Method did not converge. The last iteration gives 1.3125 with
function value -0.14562511444091797
Theorem 28. Suppose that $f \in C^0[a, b]$ and f(a)f(b) < 0. The bisection method generates a
sequence $\{p_n\}$ approximating a zero p of f(x) with
$$|p_n - p| \le \frac{b-a}{2^n}, \quad n \ge 1.$$
Proof. Let the sequences $\{a_n\}$ and $\{b_n\}$ denote the left-end and right-end points of the subintervals
generated by the bisection method. Since at each step the interval is halved, we have
$$b_n - a_n = \frac{1}{2}(b_{n-1} - a_{n-1}).$$
By induction,
$$b_n - a_n = \frac{1}{2}(b_{n-1} - a_{n-1}) = \frac{1}{2^2}(b_{n-2} - a_{n-2}) = \cdots = \frac{1}{2^{n-1}}(b_1 - a_1).$$
Therefore $b_n - a_n = \frac{1}{2^{n-1}}(b - a)$. Observe that
$$|p_n - p| \le \frac{1}{2}(b_n - a_n) = \frac{1}{2^n}(b - a) \tag{2.3}$$
Proof. The bisection method does not satisfy (2.1) for any C < 1, but it satisfies a variant of (2.2)
with C = 1/2 from the previous theorem.
Finding the number of iterations to obtain a specified accuracy: Can we find n that will
ensure |pn − p| ≤ 10−L for some given L?
We have, from the proof of the previous theorem (see (2.3)): $|p_n - p| \le \frac{1}{2^n}(b-a)$. Therefore, we
can make $|p_n - p| \le 10^{-L}$ by choosing n large enough so that the upper bound $\frac{1}{2^n}(b-a)$ is less
than $10^{-L}$:
$$\frac{1}{2^n}(b-a) \le 10^{-L} \Rightarrow n \ge \log_2\left(\frac{b-a}{10^{-L}}\right).$$
Example 30. Determine the number of iterations necessary to solve $f(x) = x^5 + 2x^3 - 5x - 2 = 0$
with accuracy $10^{-4}$, a = 0, b = 2.
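Applying the bound just derived, a quick computation gives the answer:

```python
import math

# a = 0, b = 2, accuracy 10^{-4}: we need n >= log2((b - a) / 10^{-4})
n = math.ceil(math.log2((2 - 0) / 1e-4))  # smallest admissible n
```

Since $\log_2(2/10^{-4}) \approx 14.3$, the smallest n is 15.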
Exercise 2.2-1: Find the root of $f(x) = x^3 + 4x^2 - 10$ using the bisection method, with the
following specifications:
a) Modify the Python code for the bisection method so that the only stopping criterion is whether
f (p) = 0 (remove the other criterion from the code). Also, add a print statement to the code,
so that every time a new p is computed, Python prints the value of p and the iteration number.
b) Find the number of iterations N necessary to obtain an accuracy of $10^{-4}$ for the root, using
the theoretical results of Section 2.2. (The function f(x) has one real root in (1, 2), so set
a = 1, b = 2.)
c) Run the code using the value for N obtained in part (b) to compute p1 , p2 , ..., pN (set a =
1, b = 2 in the modified Python code).
d) The actual root, correct to six digits, is p = 1.36523. Find the absolute error when pN is used
to approximate the actual root, that is, find |p − pN |. Compare this error, with the upper
bound for the error used in part (b).
Exercise 2.2-2: Find an approximation to $25^{1/3}$ correct to within $10^{-5}$ using the bisection
algorithm, following the steps below:
b) Find an interval (a, b) that contains the root, using the Intermediate Value Theorem.

c) Determine, analytically, the number of iterates necessary to obtain the accuracy of $10^{-5}$.

d) Use the Python code for the bisection method to compute the iterate from (c), and compare
the actual absolute error with $10^{-5}$.
$$0 = f(p_0) + (p - p_0)f'(p_0) + \frac{(p-p_0)^2}{2!}f''(\xi(p))$$
Solving for p,
$$p = p_0 - \frac{f(p_0)}{f'(p_0)} - \frac{(p-p_0)^2}{2}\frac{f''(\xi(p))}{f'(p_0)}. \tag{2.4}$$
If $|p - p_0|$ is "small" then $(p - p_0)^2$ is even smaller, and the error term can be dropped to obtain
the following approximation:
$$p \approx p_0 - \frac{f(p_0)}{f'(p_0)}.$$
The idea in Newton's method is to set the next iterate, $p_1$, to this approximation:
$$p_1 = p_0 - \frac{f(p_0)}{f'(p_0)}.$$
$$p = p_1 - \frac{(p-p_0)^2}{2}\frac{f''(\xi(p))}{f'(p_0)}. \tag{2.5}$$
Summary: Start with an initial approximation $p_0$ to p and generate the sequence $\{p_n\}_{n=1}^{\infty}$ by
$$p_n = p_{n-1} - \frac{f(p_{n-1})}{f'(p_{n-1})}, \quad n \ge 1. \tag{2.6}$$
Graphical interpretation:

Start with $p_0$. Draw the tangent line at $(p_0, f(p_0))$ and approximate p by the x-intercept $p_1$ of the
line:
$$f'(p_0) = \frac{0 - f(p_0)}{p_1 - p_0} \Rightarrow p_1 - p_0 = -\frac{f(p_0)}{f'(p_0)} \Rightarrow p_1 = p_0 - \frac{f(p_0)}{f'(p_0)}.$$
Now draw the tangent at (p1 , f (p1 )) and continue.
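The iteration (2.6) can be sketched in Python as follows; the argument names pin, eps, N match those mentioned later in the text, while the relative-increment stopping criterion and printed messages are assumptions:

```python
def newton(f, fprime, pin, eps, N):
    """Newton's method sketch: stop on the relative increment |p - pin| / |p|."""
    n = 1
    p = pin
    while n <= N:
        p = pin - f(pin) / fprime(pin)   # the update (2.6)
        if abs(p - pin) / abs(p) < eps:
            print('p is ', p, ' and the iteration number is ', n)
            return p
        pin = p
        n += 1
    print('Method did not converge. The last iteration gives ',
          p, ' with function value ', f(p))
    return p
```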
Remark 31.

1. Clearly Newton's method will fail if $f'(p_n) = 0$ for some n. Graphically this means the
tangent line is parallel to the x-axis, so we cannot get the x-intercept.
2. Newton’s method may fail to converge if the initial guess p0 is not close to p. In Figure (2.3),
either choice for p0 results in a sequence that oscillates between two points.
Exercise 2.3-1: Sketch the graph for $f(x) = x^2 - 1$. What are the roots of the equation
f(x) = 0?
1. Let p0 = 1/2 and find the first two iterations p1 , p2 of Newton’s method by hand. Mark the
iterates on the graph of f you sketched. Do you think the iterates will converge to a zero of
f?
2. Let p0 = 0 and find p1 . What are your conclusions about the convergence of the iterates?
Let’s apply Newton’s method to find the root of f (x) = x5 + 2x3 − 5x − 2, a function we
considered before. First, we plot the function.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-2, 2, 500)
y = x**5 + 2*x**3 - 5*x - 2
ax = plt.gca()
ax.spines['left'].set_position('center')
ax.spines['bottom'].set_position('center')
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.set_ylim([-40, 40])
plt.plot(x,y);
The derivative is $f'(x) = 5x^4 + 6x^2 - 5$; we set pin = 1, eps = $10^{-4}$, and N = 20 in the code.

Recall that the bisection method required 16 iterations to approximate the root in [0, 2] as
p = 1.31967. (However, the stopping criteria used in the bisection and Newton's methods are slightly
different.) 1.31967 is the rightmost root in the plot. But there are other roots of the function. Let's
run the code with pin = 0.
Now we use pin = −2.0 which will give the leftmost root.
Theorem 32. Let $f \in C^2[a, b]$ and assume f(p) = 0, $f'(p) \neq 0$ for $p \in (a, b)$. If $p_0$ is chosen
sufficiently close to p, then Newton's method generates a sequence that converges to p with
$$\lim_{n\to\infty} \frac{p - p_{n+1}}{(p - p_n)^2} = -\frac{f''(p)}{2f'(p)}.$$
$$|p - p_1| = |p - p_0|\,|p - p_0|\left|\frac{f''(\xi(p))}{2f'(p_0)}\right| \le M|p - p_0|\,|p - p_0| < |p - p_0| \le \epsilon. \tag{2.7}$$
$$p - p_{n+1} = -\frac{(p - p_n)^2}{2}\frac{f''(\xi(p))}{f'(p_n)}. \tag{2.8}$$
Here ξ(p) is a number between p and pn . Since ξ(p) changes recursively with n, let’s update our
notation as: ξ(p) = ξn . Then, Equation (2.8) implies
$$M|p - p_n| \le (M|p - p_0|)^{2^n} \Rightarrow |p - p_n| \le \frac{1}{M}(M|p - p_0|)^{2^n}.$$
$$\lim_{n\to\infty} \frac{p - p_{n+1}}{(p - p_n)^2} = \lim_{n\to\infty}\left(-\frac{1}{2}\frac{f''(\xi_n)}{f'(p_n)}\right),$$
and since $\xi_n \to p$ and $p_n \to p$,
$$\lim_{n\to\infty} \frac{p - p_{n+1}}{(p - p_n)^2} = -\frac{1}{2}\frac{f''(p)}{f'(p)}$$
for some constant C > 0. Taking the absolute values of the limit established in the previous theorem,
we obtain
$$\lim_{n\to\infty} \left|\frac{p - p_{n+1}}{(p - p_n)^2}\right| = \lim_{n\to\infty} \frac{|p_{n+1} - p|}{|p_n - p|^2} = \frac{1}{2}\left|\frac{f''(p)}{f'(p)}\right|.$$
Let $C' = \frac{1}{2}\left|\frac{f''(p)}{f'(p)}\right|$. From the definition of limit of a sequence, for any $\epsilon > 0$, there exists an integer
N > 0 such that $\frac{|p_{n+1} - p|}{|p_n - p|^2} < C' + \epsilon$ whenever n > N. Set $C = C' + \epsilon$ to obtain $|p_{n+1} - p| \le C|p_n - p|^2$
for n > N.
Example 34. The Black-Scholes-Merton (BSM) formula, for which Myron Scholes and Robert
Merton were awarded the Nobel prize in economics in 1997, computes the fair price of a contract
known as the European call option. This contract gives its owner the right to purchase the asset
the contract is written on (for example, a stock), for a specific price denoted by K and called the
strike price (or exercise price), at a future time denoted by T and called the expiry. The formula
gives the value of the European call option, C, as
$$C = S\phi(d_1) - Ke^{-rT}\phi(d_2),$$
where S is the price of the asset at the present time, r is the risk-free interest rate, and $\phi(x)$ is the
distribution function of the standard normal random variable, given by
$$\phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt.$$
The constants $d_1, d_2$ are defined as
$$d_1 = \frac{\log(S/K) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}, \quad d_2 = d_1 - \sigma\sqrt{T}.$$
All the constants in the BSM formula can be observed, except for σ, which is called the volatility
of the underlying asset. It has to be estimated from empirical data in some way. We want to
concentrate on the relationship between C and σ, and think of C as a function of σ only. We
rewrite the BSM formula emphasizing σ:
$$C(\sigma) = S\phi(d_1) - Ke^{-rT}\phi(d_2).$$
It may look like the independent variable σ is missing on the right-hand side of the above formula,
but it is not: the constants $d_1, d_2$ both depend on σ. We can also think about $d_1, d_2$ as functions
of σ.
There are two questions financial engineers are interested in:
• Observe the price of an option $\hat{C}$ traded in the market, and find $\sigma^*$ for which the BSM
formula gives the output $\hat{C}$, i.e., $C(\sigma^*) = \hat{C}$. The volatility $\sigma^*$ obtained in this way is called
the implied volatility.
The second question can be answered using a root-finding method, in particular, Newton’s method.
To summarize, we want to solve the equation
$$C(\sigma) - \hat{C} = 0$$
for σ. To apply Newton's method we need the derivative
$$\frac{dC}{d\sigma} = S\frac{d\phi(d_1)}{d\sigma} - Ke^{-rT}\frac{d\phi(d_2)}{d\sigma}. \tag{2.9}$$
We will use the chain rule to compute the derivative $\frac{du}{d\sigma}$, where
$$u = \int_{-\infty}^{d_1} e^{-t^2/2}\,dt:$$
$$\frac{du}{d\sigma} = \frac{du}{dd_1}\frac{dd_1}{d\sigma}.$$
By the Fundamental Theorem of Calculus,
$$\frac{du}{dd_1} = e^{-d_1^2/2},$$
and therefore
$$\frac{d\phi(d_1)}{d\sigma} = \frac{e^{-d_1^2/2}}{\sqrt{2\pi}}\left(\sqrt{T} - \frac{\log(S/K) + (r + \sigma^2/2)T}{\sigma^2\sqrt{T}}\right).$$
Going back to the second derivative we need to compute in equation (2.9), we have:
$$\frac{d\phi(d_2)}{d\sigma} = \frac{1}{\sqrt{2\pi}}\frac{d}{d\sigma}\left(\int_{-\infty}^{d_2} e^{-t^2/2}\,dt\right).$$
Using the chain rule and the Fundamental Theorem of Calculus we obtain
$$\frac{d\phi(d_2)}{d\sigma} = \frac{e^{-d_2^2/2}}{\sqrt{2\pi}}\frac{dd_2}{d\sigma}.$$
Since $d_2$ is defined as $d_2 = d_1 - \sigma\sqrt{T}$, we can express $dd_2/d\sigma$ in terms of $dd_1/d\sigma$ as:
$$\frac{dd_2}{d\sigma} = \frac{dd_1}{d\sigma} - \sqrt{T}.$$
Putting the pieces together,
$$\frac{dC}{d\sigma} = \frac{Se^{-d_1^2/2}}{\sqrt{2\pi}}\left(\sqrt{T} - \frac{\log(S/K) + (r + \sigma^2/2)T}{\sigma^2\sqrt{T}}\right) + \frac{Ke^{-(rT + d_2^2/2)}}{\sqrt{2\pi}}\,\frac{\log(S/K) + (r + \sigma^2/2)T}{\sigma^2\sqrt{T}}. \tag{2.10}$$
We are ready to apply Newton’s method to solve the equation C(σ) − Ĉ = 0. Now let’s find
some data.
The General Electric Company (GE) stock is $7.01 on Dec 8, 2018, and a European call option
on this stock, expiring on Dec 14, 2018, is priced at $0.10. The option has strike price K = $7.5.
The risk-free interest rate is 2.25%. The expiry T is measured in years, and since there are 252
trading days in a year, T = 6/252. We put this information in Python:
In [1]: S = 7.01
K = 7.5
r = 0.0225
T = 6/252
We have not discussed how to compute the distribution function of the standard normal random
variable $\phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x e^{-t^2/2}\,dt$. In Chapter 4, we will discuss how to compute integrals
numerically, but for this example, we will use the built-in function Python has for $\phi(x)$. It is in a
subpackage of SciPy called stats:
The member function cdf(x) of stdnormal computes the standard normal distribution function
at x. We write a function phi(x) based on this member function, matching our notation $\phi(x)$ for
the distribution function.
We then load the newton function and run it to find the implied volatility which turns out to
be 62%.
$$p_n = p_{n-1} - \frac{f(p_{n-1})}{f'(p_{n-1})}, \quad n \ge 1.$$
If we do not know $f'(x)$ explicitly, or if its computation is expensive, we might approximate $f'(p_{n-1})$
by the finite difference
$$\frac{f(p_{n-1} + h) - f(p_{n-1})}{h} \tag{2.11}$$
for some small h. We then need to compute two values of f at each iteration to approximate $f'$.
Determining h in this formula brings some difficulty, but there is a way to get around this. We will
use the iterates themselves to rewrite the finite difference (2.11) as
$$\frac{f(p_{n-1}) - f(p_{n-2})}{p_{n-1} - p_{n-2}}.$$
Geometric interpretation: The slope of the secant line through the points $(p_{n-1}, f(p_{n-1}))$ and
$(p_{n-2}, f(p_{n-2}))$ is $\frac{f(p_{n-1}) - f(p_{n-2})}{p_{n-1} - p_{n-2}}$. The x-intercept of the secant line, which is set to $p_n$, is
$$p_n = p_{n-1} - \frac{f(p_{n-1})(p_{n-1} - p_{n-2})}{f(p_{n-1}) - f(p_{n-2})}, \quad n \ge 2. \tag{2.12}$$
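In code, the secant update can be sketched as follows (the signature, stopping criterion, and printed messages mirror the Newton sketch and are assumptions):

```python
import math

def secant(f, p0, p1, eps, N):
    """Secant method sketch: stop on the relative increment |p - p1| / |p|."""
    n = 1
    p = p1
    while n <= N:
        p = p1 - f(p1) * (p1 - p0) / (f(p1) - f(p0))  # x-intercept of the secant
        if abs(p - p1) / abs(p) < eps:
            print('p is ', p, ' and the iteration number is ', n)
            return p
        p0, p1 = p1, p
        n += 1
    print('Method did not converge. The last iteration gives ',
          p, ' with function value ', f(p))
    return p

root = secant(lambda x: math.cos(x) - x, 0.5, 1.0, 1e-4, 20)
```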
Theorem 35. Let $f \in C^2[a, b]$ and assume f(p) = 0, $f'(p) \neq 0$, for $p \in (a, b)$. If the initial guesses
$p_0, p_1$ are sufficiently close to p, then the iterates of the secant method converge to p with
$$\lim_{n\to\infty} \frac{|p - p_{n+1}|}{|p - p_n|^{r_0}} = \left|\frac{f''(p)}{2f'(p)}\right|^{r_1}$$
where $r_0 = \frac{\sqrt{5}+1}{2} \approx 1.62$, $r_1 = \frac{\sqrt{5}-1}{2} \approx 0.62$.
Let’s find the root of f (x) = cos x − x using the secant method, using 0.5 and 1 as the initial
guesses.
Exercise 2.4-1: Use the Python codes for the secant and Newton's methods to find solutions
for the equation $\sin x - e^{-x} = 0$ on $0 \le x \le 1$. Set tolerance to $10^{-4}$, and take $p_0 = 0$ in Newton's method,
and $p_0 = 0$, $p_1 = 1$ in the secant method. Do a visual inspection of the estimates and comment on the
convergence rates of the methods.
Exercise 2.4-2:
a) The function $y = \log x$ has a root at x = 1. Run the Python code for Newton's method with
$p_0 = 2$, $\epsilon = 10^{-4}$, N = 20, and then try $p_0 = 3$. Does Newton's method find the root in each
case? If Python gives an error message, explain what the error is.
b) One can combine the bisection method and Newton’s method to develop a hybrid method
that converges for a wider range of starting values p0 , and has better convergence rate than
the bisection method.
Write a Python code for a bisection-Newton hybrid method, as described below. (You can use
the Python codes for the bisection and Newton's methods from the lecture notes.) Your code
will input $f, f', a, b, \epsilon, N$ where f, f' are the function and its derivative, (a, b) is an interval that
contains the root (i.e., f(a)f(b) < 0), and $\epsilon$, N are the tolerance and the maximum number
of iterations. The code will use the same stopping criterion used in Newton's method.
The method will start with computing the midpoint of (a, b), call it p0 , and use Newton’s
method with initial guess p0 to obtain p1 . It will then check whether p1 ∈ (a, b). If p1 ∈ (a, b),
then the code will continue using Newton's method to compute the next iteration $p_2$. If
$p_1 \notin (a, b)$, then we will not accept $p_1$ as the next iteration: instead the code will switch
to the bisection method, determine which subinterval among $(a, p_0), (p_0, b)$ contains the root,
update the interval (a, b) as the subinterval that contains the root, and set $p_1$ to the midpoint
of this interval. Once $p_1$ is obtained, the code will check if the stopping criterion is satisfied.
If it is satisfied, the code will return p1 and the iteration number, and terminate. If it is
not satisfied, the code will use Newton’s method, with p1 as the initial guess, to compute
p2 . Then it will check whether p2 ∈ (a, b), and continue in this way. If the code does not
terminate after N iterations, output an error message similar to Newton’s method.
Apply the hybrid method to:
• a polynomial with a known root, and check if the method finds the correct root;
• y = log x with (a, b) = (0, 6), for which Newton’s method failed in part (a).
c) Do you think in general the hybrid method converges to the root, provided the initial interval
(a, b) contains the root, for any starting value p0 ? Explain.
to get
$$c = f(p_2)$$
$$b = \frac{(p_0 - p_2)(f(p_1) - f(p_2))}{(p_1 - p_2)(p_0 - p_1)} - \frac{(p_1 - p_2)(f(p_0) - f(p_2))}{(p_0 - p_2)(p_0 - p_1)} \tag{2.14}$$
$$a = \frac{f(p_0) - f(p_2)}{(p_0 - p_2)(p_0 - p_1)} - \frac{f(p_1) - f(p_2)}{(p_1 - p_2)(p_0 - p_1)}.$$
Now that we have determined P(x), the next step is to solve P(x) = 0, and set the next iterate
$p_3$ to its solution. To this end, put $w = x - p_2$ in (2.13) to rewrite the quadratic equation as
$$aw^2 + bw + c = 0.$$
$$\hat{w} = \hat{x} - p_2 = \frac{-2c}{b \pm \sqrt{b^2 - 4ac}}. \tag{2.15}$$
Let $\Delta = \sqrt{b^2 - 4ac}$. We have two roots (which could be complex numbers), $-2c/(b + \Delta)$ and
$-2c/(b - \Delta)$, and we need to pick one of them. We will pick the root that is closer to $p_2$, in other
words, the root that makes $|\hat{x} - p_2|$ the smallest. (If the numbers are complex, the absolute value
means the norm of the complex number.) Therefore we have
$$\hat{x} - p_2 = \begin{cases} \dfrac{-2c}{b + \Delta} & \text{if } |b + \Delta| > |b - \Delta| \\[6pt] \dfrac{-2c}{b - \Delta} & \text{if } |b + \Delta| \le |b - \Delta| \end{cases}. \tag{2.16}$$
The next iterate of Muller's method, $p_3$, is set to the value of $\hat{x}$ obtained from the above calculation,
that is,
$$p_3 = \hat{x} = \begin{cases} p_2 - \dfrac{2c}{b + \Delta} & \text{if } |b + \Delta| > |b - \Delta| \\[6pt] p_2 - \dfrac{2c}{b - \Delta} & \text{if } |b + \Delta| \le |b - \Delta| \end{cases}.$$
Remark 36.
3. Muller’s method converges for a variety of starting values even though pathological examples
that do not yield convergence can be found (for example, when the three starting values fall
on a line).
pzero = pone
pone = ptwo
ptwo = p
n += 1
y = f(p)
print('Method did not converge. The last iteration gives ',
p, ' with function value ', y)
The polynomial x5 +2x3 −5x−2 has three real roots, and two complex roots that are conjugates.
Let’s find them all, by experimenting with various initial guesses.
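For reference, a complete Muller function consistent with the update rule derived above and with the code fragment shown (the signature and default tolerances are assumptions) can be sketched as:

```python
import cmath

def muller(f, p0, p1, p2, eps=1e-10, N=100):
    """Muller sketch: fit a quadratic through three iterates and take its
    root closest to p2 (possibly complex)."""
    p = p2
    for n in range(N):
        c = f(p2)
        b = ((p0 - p2) * (f(p1) - f(p2)) / ((p1 - p2) * (p0 - p1))
             - (p1 - p2) * (f(p0) - f(p2)) / ((p0 - p2) * (p0 - p1)))
        a = ((f(p0) - f(p2)) / ((p0 - p2) * (p0 - p1))
             - (f(p1) - f(p2)) / ((p1 - p2) * (p0 - p1)))
        d = cmath.sqrt(b * b - 4 * a * c)            # Delta, possibly complex
        denom = b + d if abs(b + d) > abs(b - d) else b - d
        p = p2 - 2 * c / denom                        # the update (2.16)
        if abs(p - p2) < eps:
            return p
        p0, p1, p2 = p1, p2, p
    return p
```

Complex initial guesses steer the iteration toward the complex conjugate roots.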
We can formulate a root-finding problem as a fixed-point problem, and vice versa. For example,
assume we want to solve the root finding problem, f (p) = 0. Define g(x) = x−f (x), and observe that
if p is a fixed-point of g(x), that is, g(p) = p − f (p) = p, then p is a root of f (x). Here the function
g is not unique: there are many ways one can represent the root-finding problem f (p) = 0 as a
fixed-point problem, and as we will learn later, not all will be useful to us in developing fixed-point
iteration algorithms.
The next theorem answers the following questions: When does a function g have a fixed-point?
If it has a fixed-point, is it unique?
Theorem 38. 1. If g is a continuous function on [a, b] and g(x) ∈ [a, b] for all x ∈ [a, b], then
g has at least one fixed-point in [a, b].
2. If, in addition, |g(x)−g(y)| ≤ λ|x−y| for all x, y ∈ [a, b] where 0 < λ < 1, then the fixed-point
is unique.
Proof. Consider f(x) = g(x) − x. Assume $g(a) \neq a$ and $g(b) \neq b$ (otherwise the proof is over).
Then f(a) = g(a) − a > 0 since g(a) must be greater than a if it is not equal to a. Similarly,
f(b) = g(b) − b < 0. Then from the IVT, there exists $p \in (a, b)$ such that f(p) = 0, or g(p) = p. To
prove part 2, suppose there are two different fixed-points p, q. Then
$$|p - q| = |g(p) - g(q)| \le \lambda|p - q| < |p - q|,$$
which is a contradiction.
Remark 39. Let g be a differentiable function on [a, b] such that $|g'(x)| \le k$ for all $x \in (a, b)$ for
some positive constant k < 1. Then the hypothesis of part 2 of Theorem 38 is satisfied with λ = k.
Indeed, from the mean value theorem,
$$|g(x) - g(y)| = |g'(\xi)||x - y| \le k|x - y|$$
for all $x, y \in [a, b]$.
converges to p, the unique fixed-point of g in [a, b], for any starting point p0 ∈ [a, b].
Proof. Since $p_0 \in [a, b]$ and $g(x) \in [a, b]$ for all $x \in [a, b]$, all iterates $p_n \in [a, b]$. Observe that
$$|p - p_n| = |g(p) - g(p_{n-1})| \le \lambda|p - p_{n-1}| \le \cdots \le \lambda^n|p - p_0|.$$
Since $0 < \lambda < 1$, $\lambda^n \to 0$, and thus $p_n \to p$.
Remark 41. Theorem 40 still holds if the second condition, $|g(x) - g(y)| \le \lambda|x - y|$, is replaced by
$|g'(x)| \le k$ for all $x \in [a, b]$ where 0 < k < 1. (See Remark 39.)
Corollary 42. If g satisfies the hypothesis of Theorem 40, then the following error bounds hold.

1. $|p - p_n| \le \frac{\lambda^n}{1-\lambda}|p_1 - p_0|$

2. $|p - p_n| \le \frac{1}{1-\lambda}|p_{n+1} - p_n|$

3. $|p - p_{n+1}| \le \frac{\lambda}{1-\lambda}|p_{n+1} - p_n|$

4. $|p - p_n| \le \lambda^n \max\{p_0 - a, b - p_0\}$
[Two figures showing the lines y = x and y = g(x) for the fixed-point iteration.]
2. Let $p_0 = 1$. Use Corollary 42 to find n that ensures an estimate to p accurate to within $10^{-4}$.
Solution. 1. There are several ways we can write this problem as g(x) = x :
(a) Let $f(x) = x^3 - 2x^2 - 1$, and p be its root, that is, f(p) = 0. If we let g(x) = x − f(x),
then g(p) = p − f(p) = p, so p is a fixed-point of g. However, this choice for g will not
be helpful, since g does not satisfy the first condition of Theorem 40: $g(x) \notin [1, 3]$ for
all $x \in [1, 3]$ (for example, $g(3) = -5 \notin [1, 3]$).
(b) Since p is a root for f, we have $p^3 = 2p^2 + 1$, or $p = (2p^2 + 1)^{1/3}$. Therefore, p is the
solution to the fixed-point problem g(x) = x where $g(x) = (2x^2 + 1)^{1/3}$.
• g is increasing on [1, 3] and g(1) = 1.44, g(3) = 2.67, thus g(x) ∈ [1, 3] for all
x ∈ [1, 3]. Therefore, g satisfies the first condition of Theorem 40.
• $g'(x) = \frac{4x}{3(2x^2+1)^{2/3}}$, and g'(1) = 0.64, g'(3) = 0.56, and g' is decreasing on [1, 3].
Therefore g satisfies the condition in Remark 41 with λ = 0.64.
Then, from Theorem 40 and Remark 41, the fixed-point iteration converges if $g(x) = (2x^2 + 1)^{1/3}$.
We want $2(0.64^n) < 10^{-4}$, which implies $n\log 0.64 < -4\log 10 - \log 2$, or $n > \frac{-4\log 10 - \log 2}{\log 0.64} \approx 22.19$.
Therefore n = 23 is the smallest number of iterations that ensures an absolute error of $10^{-4}$.
Let's find the fixed-point of g(x) = x where $g(x) = (2x^2 + 1)^{1/3}$, with $p_0 = 1$. We studied this
problem in Example 43, where we found that 23 iterations guarantee an estimate accurate to within
$10^{-4}$. We set $\epsilon = 10^{-4}$ and N = 30 in the above code.
The exact value of the fixed-point, equivalently the root of $x^3 - 2x^2 - 1$, is 2.20556943. Then
the exact error is:
In [3]: 2.205472095330031-2.20556943
Out[3]: -9.733466996930673e-05
• The exact error, $|p_n - p|$, is guaranteed to be less than $10^{-4}$ after 23 iterations by Corollary
42, but as we observed in this example, this could happen before 23 iterations.

• The stopping criterion used in the code is based on $|p_n - p_{n-1}|$, not $|p_n - p|$, so the iteration
number that makes these quantities less than a tolerance will not be the same in general.
Theorem 44. Assume p is a solution of g(x) = x, and suppose g(x) is continuously differentiable
in some interval about p with $|g'(p)| < 1$. Then the fixed-point iteration converges to p, provided $p_0$
is chosen sufficiently close to p. Moreover, the convergence is linear if $g'(p) \neq 0$.
Proof. Since g' is continuous and $|g'(p)| < 1$, there exists an interval $I = [p - \epsilon, p + \epsilon]$ such that
$|g'(x)| \le k$ for all $x \in I$, for some k < 1. Then, from Remark 39, we know $|g(x) - g(y)| \le k|x - y|$
for all $x, y \in I$. Next, we argue that $g(x) \in I$ if $x \in I$. Indeed, if $|x - p| < \epsilon$, then
$$|g(x) - p| = |g(x) - g(p)| \le k|x - p| < \epsilon,$$
hence $g(x) \in I$. Now use Theorem 40, setting [a, b] to $[p - \epsilon, p + \epsilon]$, to conclude the fixed-point
iteration converges.

To prove convergence is linear, we note
$$|p_{n+1} - p| = |g(p_n) - g(p)| \le k|p_n - p|,$$
which is the definition of linear convergence (with k being a positive constant less than 1).
We can actually prove something more:
$$\lim_{n\to\infty} \frac{|p_{n+1} - p|}{|p_n - p|} = \lim_{n\to\infty} \frac{|g(p_n) - g(p)|}{|p_n - p|} = \lim_{n\to\infty} |g'(\xi_n)| = |g'(p)|.$$
The last equality follows since g' is continuous, and $\xi_n \to p$, which is a consequence of $\xi_n$ being
between p and $p_n$, and $p_n \to p$, as $n \to \infty$.
Example 45. Let $g(x) = x + c(x^2 - 2)$, which has the fixed-point $p = \sqrt{2} \approx 1.4142$. Pick a value for
c to ensure the convergence of fixed-point iteration. For the picked value c, determine the interval
of convergence I = [a, b], that is, the interval for which any $p_0$ from the interval gives rise to a
converging fixed-point iteration. Then write a Python code to test the results.
Solution. Theorem 44 requires |g'(p)| < 1. We have g'(x) = 1 + 2xc, and thus g'(√2) = 1 + 2√2 c. Therefore

|g'(√2)| < 1 ⇒ -1 < 1 + 2√2 c < 1 ⇒ -2 < 2√2 c < 0 ⇒ -1/√2 < c < 0.

Any c from this interval works: let's pick c = -1/4. Next we need an interval I = [√2 - eps, √2 + eps] such that |g'(x)| <= k for some k < 1, for all x in I. Plot g'(x) and observe that one choice is eps = 0.1, so that I = [√2 - 0.1, √2 + 0.1] = [1.3142, 1.5142]. Since g'(x) = 1 - x/2 is positive and decreasing on I = [1.3142, 1.5142], we have |g'(x)| <= 1 - 1.3142/2 = 0.3429 < 1 for any x in I. Then any starting value p0 from I gives convergence.
For c = -1/4, the function becomes g(x) = x - (x^2 - 2)/4. Pick p0 = 1.5 as the starting point. Using the fixed-point iteration code of the previous example, we obtain:
In [5]: 1.414214788550556-(2**.5)
Out[5]: 1.2261774609001463e-06
Let's experiment with other starting values. Although p0 = 2 is not in the interval of convergence I, we expect convergence since g'(2) = 0:
Let's try p0 = -5. Note that this is not only outside the interval of convergence I, but g'(-5) = 3.5 > 1, so we do not expect convergence.
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-8-2df659a6b4d1> in <module>
----> 1 fixedpt(lambda x: x-(x**2-2)/4, -5, 1e-5, 15)
<ipython-input-8-2df659a6b4d1> in <lambda>(x)
----> 1 fixedpt(lambda x: x-(x**2-2)/4, -5, 1e-5, 15)
OverflowError: (34, 'Result too large')
The iterates grow without bound until the floating-point computation of x**2 overflows.
In [4]: plt.plot(arr);
The graph suggests the limit of (pn - √2)/(p_{n-1} - √2) exists and is around 0.295, supporting linear convergence.
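A sketch of how the array arr of consecutive-error ratios can be generated (the iterates are recomputed directly here rather than returned from fixedpt):

```python
import numpy as np

g = lambda x: x - (x**2 - 2)/4   # fixed-point function with c = -1/4
p = 1.5
iterates = [p]
for _ in range(15):
    p = g(p)
    iterates.append(p)
# ratios (p_n - sqrt(2)) / (p_{n-1} - sqrt(2))
arr = [(iterates[n] - np.sqrt(2))/(iterates[n-1] - np.sqrt(2))
       for n in range(1, len(iterates))]
# the ratios approach |g'(sqrt(2))| = 1 - sqrt(2)/2, about 0.293
```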
In Theorem 44 we proved

lim_{n→∞} |p_{n+1} - p| / |pn - p| = |g'(p)|,

which implied that the fixed-point iteration has linear convergence, if g'(p) != 0. If this limit were zero, then we would have

lim_{n→∞} |p_{n+1} - p| / |pn - p| = 0,

which means the numerator is going to zero faster than the denominator. We could then ask if there is some α > 1 such that

lim_{n→∞} |p_{n+1} - p| / |pn - p|^α = nonzero constant.
Theorem 46. Assume p is a solution of g(x) = x where g ∈ C^α(I) for some interval I that contains p, and for some α >= 2. Furthermore assume

g'(p) = g''(p) = ... = g^(α-1)(p) = 0, and g^(α)(p) != 0.

Then if the initial guess p0 is sufficiently close to p, the fixed-point iteration pn = g(p_{n-1}), n >= 1, will have order of convergence of α, and

lim_{n→∞} (p_{n+1} - p)/(pn - p)^α = g^(α)(p)/α!.

Proof. Expand g(pn) in a Taylor series about p:

p_{n+1} = g(pn) = g(p) + g'(p)(pn - p) + ... + (g^(α-1)(p)/(α-1)!)(pn - p)^(α-1) + (g^(α)(ξn)/α!)(pn - p)^α,

where ξn is a number between pn and p, and all numbers are in I. From the hypothesis, this simplifies as

p_{n+1} = p + (g^(α)(ξn)/α!)(pn - p)^α  ⇒  (p_{n+1} - p)/(pn - p)^α = g^(α)(ξn)/α!.

From Theorem 44, if p0 is chosen sufficiently close to p, then lim_{n→∞} pn = p. The order of convergence is α with

lim_{n→∞} |p_{n+1} - p|/|pn - p|^α = lim_{n→∞} |g^(α)(ξn)|/α! = |g^(α)(p)|/α! != 0.
An application of this theorem is the analysis of Newton's method, which is the fixed-point iteration with

g(x) = x - f(x)/f'(x),

where p is a root of f with f'(p) != 0 (a simple root). Differentiating gives

g'(x) = 1 - ((f'(x))^2 - f(x)f''(x))/(f'(x))^2 = f(x)f''(x)/(f'(x))^2,

and thus, since f(p) = 0,

g'(p) = f(p)f''(p)/(f'(p))^2 = 0.

Similarly,

g''(x) = [(f'(x)f''(x) + f(x)f'''(x))(f'(x))^2 - f(x)f''(x) 2f'(x)f''(x)] / (f'(x))^4,

which implies

g''(p) = (f'(p)f''(p))(f'(p))^2 / (f'(p))^4 = f''(p)/f'(p).

If f''(p) != 0, then Theorem 46 implies Newton's method has quadratic convergence with

lim_{n→∞} (p_{n+1} - p)/(pn - p)^2 = g''(p)/2! = f''(p)/(2f'(p)).
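As an illustration (not from the text), this constant can be checked numerically for f(x) = x^2 - 2, whose root is p = √2; here f''(p)/(2f'(p)) = 1/(2√2) ≈ 0.3536:

```python
# check the quadratic-convergence constant of Newton's method
# for f(x) = x^2 - 2, whose root is p = sqrt(2)
f = lambda x: x**2 - 2
fprime = lambda x: 2*x
p = 2**0.5
x0 = 1.5
for _ in range(3):
    x1 = x0 - f(x0)/fprime(x0)          # Newton step
    ratio = abs(x1 - p)/(x0 - p)**2     # |p_{n+1}-p| / |p_n-p|^2
    x0 = x1
# ratio is approximately 1/(2*sqrt(2))
```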
Exercise 2.7-1: Use Theorem 38 (and Remark 39) to show that g(x) = 3^(-x) has a unique fixed-point on [1/4, 1]. Use Corollary 42, part (4), to find the number of iterations necessary to achieve 10^-5 accuracy. Then use the Python code to obtain an approximation, and compare the error with the theoretical estimate obtained from Corollary 42.
Exercise 2.7-2: Let g(x) = 2x − cx2 where c is a positive constant. Prove that if the
fixed-point iteration pn = g(pn−1 ) converges to a non-zero limit, then the limit is 1/c.
Chapter 3
Interpolation
In this chapter, we will study the following problem: given data (xi , yi ), i = 0, 1, ..., n, find a
function f such that f (xi ) = yi . This problem is called the interpolation problem, and f is called
the interpolating function, or interpolant, for the given data.
Interpolation is used, for example, when we use mathematical software to plot a smooth curve
through discrete data points, when we want to find the in-between values in a table, or when we
differentiate or integrate black-box type functions.
How do we choose f ? Or, what kind of function do we want f to be? There are several options.
Examples of functions used in interpolation are polynomials, piecewise polynomials, rational functions, trigonometric functions, and exponential functions. As we try to find a good choice for f for our data, some questions to consider are whether we want f to inherit the properties of the data (for example, if the data is periodic, should we use a trigonometric function as f?), and how we want f to behave between data points. In general, f should be easy to evaluate, and easy to integrate and differentiate.
Here is a general framework for the interpolation problem. We are given data, and we pick a family of functions from which the interpolant f will be chosen. Suppose the family of functions selected forms a vector space. Pick a basis for the vector space: φ0(x), φ1(x), ..., φn(x). Then the interpolating function can be written as a linear combination of the basis vectors (functions):

f(x) = Σ_{k=0}^n ak φk(x).

We want f to pass through the data points, that is, f(xi) = yi. Then determine ak so that

f(xi) = Σ_{k=0}^n ak φk(xi) = yi,  i = 0, 1, ..., n,
which is a system of n + 1 equations with n + 1 unknowns. Using matrices, the problem is to solve the matrix equation

Aa = y

for a, where

    [ φ0(x0)  ...  φn(x0) ]        [ a0 ]        [ y0 ]
A = [ φ0(x1)  ...  φn(x1) ] ,  a = [ a1 ] ,  y = [ y1 ] .
    [  ...          ...   ]        [ ...]        [ ...]
    [ φ0(xn)  ...  φn(xn) ]        [ an ]        [ yn ]
• Family: Polynomials
The space of polynomials up to degree n is a vector space. We will consider three choices for the basis for this vector space:
• Basis:
– Monomial (standard) basis: φk(x) = x^k
– Lagrange basis: φk(x) = Π_{j=0, j≠k}^n (x - xj)/(xk - xj)
– Newton basis: φk(x) = Π_{j=0}^{k-1} (x - xj)
where k = 0, 1, ..., n.
Once we decide on the basis, the interpolating polynomial can be written as a linear combination of the basis functions:

pn(x) = Σ_{k=0}^n ak φk(x).
Theorem 47. If points x0 , x1 , ..., xn are distinct, then for real values y0 , y1 , ..., yn , there is a unique
polynomial pn of degree at most n such that pn (xi ) = yi , i = 0, 1, ..., n.
We mentioned three families of basis functions for polynomials. The choice of a family of basis
functions affects:
• The accuracy of the numerical methods to solve the system of linear equations Aa = y.
• The ease at which the resulting polynomial can be evaluated, differentiated, integrated, etc.
Monomial form of polynomial interpolation. The basis functions are the monomials

φk(x) = x^k,  k = 0, 1, ..., n.

The interpolating polynomial pn(x) can be written as a linear combination of these basis functions as

pn(x) = a0 + a1 x + a2 x^2 + ... + an x^n.

We will determine ai using the fact that pn is an interpolant for the data, pn(xi) = yi, i = 0, 1, ..., n. This gives the matrix equation Aa = y for [a0, ..., an]^T, where [·]^T stands for the transpose of the vector. The coefficient matrix A is known as the Vandermonde matrix. This is usually an ill-conditioned matrix, which means solving the system of equations could result in large error in the coefficients ai. An intuitive way to understand the ill-conditioning is to plot several basis monomials, and note how hard they are to tell apart as the degree increases, making the columns of the matrix nearly linearly dependent.
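The ill-conditioning can also be seen numerically. The following sketch (not from the text) computes the condition number of the Vandermonde matrix for equally spaced nodes on [0, 1] as the degree grows:

```python
import numpy as np

# the condition number of the Vandermonde matrix grows rapidly
# with the polynomial degree n
for n in (5, 10, 15):
    nodes = np.linspace(0, 1, n+1)
    A = np.vander(nodes)
    print(n, np.linalg.cond(A))
```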
Solving the matrix equation Aa = y could also be expensive. Using Gaussian elimination to solve the matrix equation for a general matrix A requires O(n^3) operations. This means the number of operations grows like Cn^3, where C is a positive constant. However, there are some advantages to the monomial form: evaluating the polynomial is very efficient using Horner's method, which is the nested form discussed in Exercises 1.3-4, 1.3-5 of Chapter 1, requiring O(n) operations.
Differentiation and integration are also relatively efficient.
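A sketch of Horner's method for the monomial form, where coeffs holds [a0, a1, ..., an]:

```python
def horner(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x^n with n multiplications,
    using the nested form (...(an*x + a_{n-1})*x + ...)*x + a0."""
    result = 0.0
    for a in reversed(coeffs):
        result = result*x + a
    return result
```

For instance, for the polynomial -4 + 3x + x^2, horner([-4, 3, 1], 2) evaluates to 6.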
Lagrange form of polynomial interpolation. The basis functions are the Lagrange basis polynomials

lk(x) = Π_{j=0, j≠k}^n (x - xj)/(xk - xj),  k = 0, 1, ..., n.
We write the interpolating polynomial pn(x) as a linear combination of these basis functions as

pn(x) = Σ_{k=0}^n ak lk(x).

Since lk(xj) equals 1 if j = k and 0 otherwise, the interpolation conditions pn(xk) = yk immediately give ak = yk for k = 0, 1, ..., n.
The main advantage of the Lagrange form of interpolation is that finding the interpolating polyno-
mial is trivial: there is no need to solve a matrix equation. However, the evaluation, differentiation,
and integration of the Lagrange form of a polynomial is more expensive than, for example, the
monomial form.
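As an illustration (a sketch, not from the text), the Lagrange form can be evaluated directly from its definition:

```python
import numpy as np

def lagrange_eval(x, y, w):
    """Evaluate the Lagrange form sum_k y_k l_k(w) of the
    interpolating polynomial at the point w."""
    n = x.size - 1
    total = 0.0
    for k in range(n+1):
        lk = 1.0
        for j in range(n+1):
            if j != k:
                lk *= (w - x[j])/(x[k] - x[j])
        total += y[k]*lk
    return total
```

Note the nested loops: each evaluation costs O(n^2) operations, which is why evaluation in this form is more expensive than Horner's method.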
Example 48. Find the interpolating polynomial using the monomial basis and Lagrange basis functions for the data: (-1, -6), (1, 0), (2, 6).

Solution.
• Monomial basis: p2(x) = a0 + a1 x + a2 x^2, and the interpolation conditions give

[ 1  x0  x0^2 ] [ a0 ]   [ y0 ]      [ 1  -1  1 ] [ a0 ]   [ -6 ]
[ 1  x1  x1^2 ] [ a1 ] = [ y1 ]  ⇒  [ 1   1  1 ] [ a1 ] = [  0 ]
[ 1  x2  x2^2 ] [ a2 ]   [ y2 ]      [ 1   2  4 ] [ a2 ]   [  6 ]

that is, Aa = y. We can use Gaussian elimination to solve this matrix equation, or get help from Python:

In [4]: np.linalg.solve(A, y)

Out[4]: array([-4.,  3.,  1.])

Therefore

p2(x) = -4 + 3x + x^2.
• Lagrange basis: p2(x) = y0 l0(x) + y1 l1(x) + y2 l2(x) = -6 l0(x) + 0 l1(x) + 6 l2(x), where

l0(x) = ((x - 1)(x - 2))/((-1 - 1)(-1 - 2)) = ((x - 1)(x - 2))/6,  l2(x) = ((x + 1)(x - 1))/((2 + 1)(2 - 1)) = ((x + 1)(x - 1))/3,

therefore

p2(x) = -6 ((x - 1)(x - 2))/6 + 6 ((x + 1)(x - 1))/3 = -(x - 1)(x - 2) + 2(x + 1)(x - 1).
If we multiply out and collect the like terms, we obtain p2 (x) = −4 + 3x + x2 , which is the
polynomial we obtained from the monomial basis earlier.
Exercise 3.1-1: Prove that Σ_{k=0}^n lk(x) = 1 for all x, where lk are the Lagrange basis functions for n + 1 data points. (Hint: First verify the identity for n = 1 algebraically, for any two data points. For the general case, think about what special function's interpolating polynomial in Lagrange form is Σ_{k=0}^n lk(x).)
Newton form of polynomial interpolation. The basis functions are

πk(x) = Π_{j=0}^{k-1} (x - xj),  k = 0, 1, ..., n,

where π0(x), an empty product, is interpreted as 1. The interpolating polynomial pn, written as a linear combination of Newton basis functions, is

pn(x) = Σ_{k=0}^n ak πk(x) = a0 + a1(x - x0) + a2(x - x0)(x - x1) + ... + an(x - x0) · · · (x - x_{n-1}).

The interpolation conditions pn(xi) = yi give the matrix equation Aa = y for [a0, ..., an]^T. Note that the coefficient matrix A is lower triangular, and a can be solved by forward substitution, which is shown in the next example, in O(n^2) operations.
Example 49. Find the interpolating polynomial using Newton's basis for the data:
(-1, -6), (1, 0), (2, 6).

Solution. Evaluating the Newton basis functions π0(x) = 1, π1(x) = x + 1, π2(x) = (x + 1)(x - 1) at the nodes gives the lower-triangular system

[ 1  0  0 ] [ a0 ]   [ -6 ]
[ 1  2  0 ] [ a1 ] = [  0 ]
[ 1  3  3 ] [ a2 ]   [  6 ]

Forward substitution is:

a0 = -6
a0 + 2a1 = 0 ⇒ -6 + 2a1 = 0 ⇒ a1 = 3
a0 + 3a1 + 3a2 = 6 ⇒ -6 + 9 + 3a2 = 6 ⇒ a2 = 1.

Therefore p2(x) = -6 + 3(x + 1) + (x + 1)(x - 1).
Expanding and simplifying gives p2(x) = -4 + 3x + x^2, which is the polynomial discussed in Example 48.
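The forward substitution in this example can be carried out in code as follows (a sketch, not from the text; the O(n^2) cost is visible in the dot product inside the loop):

```python
import numpy as np

def forward_sub(A, b):
    """Solve the lower-triangular system A x = b by forward
    substitution in O(n^2) operations."""
    n = b.size
    x = np.zeros(n)
    for i in range(n):
        x[i] = (b[i] - A[i, :i] @ x[:i]) / A[i, i]
    return x
```

Applied to the system of Example 49 it returns the coefficients (-6, 3, 1).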
Summary: The interpolating polynomial p2(x) for the data (-1, -6), (1, 0), (2, 6), represented in three different basis functions, is:

Monomial: p2(x) = -4 + 3x + x^2
Lagrange: p2(x) = -(x - 1)(x - 2) + 2(x + 1)(x - 1)
Newton:   p2(x) = -6 + 3(x + 1) + (x + 1)(x - 1)
Similar to the monomial form, a polynomial written in Newton's form can be evaluated using Horner's method, which has O(n) complexity:

Example 50. Write p2(x) = -6 + 3(x + 1) + (x + 1)(x - 1) in nested form.

Solution. -6 + 3(x + 1) + (x + 1)(x - 1) = -6 + (x + 1)(2 + x); note that the left-hand side has 2 multiplications, and the right-hand side has 1.
In summary, the cost of finding the coefficients of the interpolating polynomial in each form is:
• Monomial → O(n^3)
• Lagrange → trivial
• Newton → O(n^2)
Evaluating the polynomials can be done efficiently using Horner’s method for monomial and
Newton forms. A modified version of Lagrange form can also be evaluated using Horner’s
method, but we do not discuss it here.
Exercise 3.1-2: Compute, by hand, the interpolating polynomial to the data (−1, 0), (0.5, 1),
(1, 0) using the monomial, Lagrange, and Newton basis functions. Verify the three polynomials are
identical.
It’s time to discuss some theoretical results for polynomial interpolation. Let’s start with proving
Theorem 47 which we stated earlier:
Theorem. If points x0 , x1 , ..., xn are distinct, then for real values y0 , y1 , ..., yn , there is a unique
polynomial pn of degree at most n such that pn (xi ) = yi , i = 0, 1, ..., n.
Proof. We have already established the existence of pn without mentioning it! The Lagrange form
of the interpolating polynomial constructs pn directly:
pn(x) = Σ_{k=0}^n yk lk(x) = Σ_{k=0}^n yk Π_{j=0, j≠k}^n (x - xj)/(xk - xj).
Let’s prove uniqueness. Assume pn , qn are two distinct polynomials satisfying the conclusion. Then
pn − qn is a polynomial of degree at most n such that (pn − qn )(xi ) = 0 for i = 0, 1, ..., n. This
means the non-zero polynomial (pn − qn ) of degree at most n, has (n + 1) distinct roots, which is a
contradiction.
The following theorem, which we state without proof, establishes the error of polynomial inter-
polation. Notice the similarities between this and Taylor’s Theorem 7.
Theorem 51. Let x0 , x1 , ..., xn be distinct numbers in the interval [a, b] and f ∈ C n+1 [a, b]. Then
for each x ∈ [a, b], there is a number ξ between x0 , x1 , ..., xn such that
f(x) - pn(x) = (f^(n+1)(ξ)/(n + 1)!) (x - x0)(x - x1) · · · (x - xn).
The following lemma is useful in finding upper bounds for |f (x) − pn (x)| using Theorem 51,
when the nodes x0 , ..., xn are equally spaced.
Lemma 52. Let x0, x1, ..., xn be equally spaced nodes in [a, b], that is, xi = a + ih with h = (b - a)/n. Then for any x in [a, b],

Π_{i=0}^n |x - xi| <= (1/4) h^(n+1) n!.
Proof. Since x is in [a, b], it falls into one of the subintervals: let x be in [xj, xj+1]. Consider the product |x - xj||x - xj+1|. Put s = |x - xj| and t = |x - xj+1|. The maximum of st given s + t = h, using Calculus, can be found to be h^2/4, which is attained when x is the midpoint, and thus s = t = h/2. Then

Π_{i=0}^n |x - xi| = |x - x0| · · · |x - x_{j-1}| |x - xj||x - x_{j+1}| |x - x_{j+2}| · · · |x - xn|
<= (h^2/4) |x - x0| · · · |x - x_{j-1}| |x - x_{j+2}| · · · |x - xn|
<= (h^2/4) |x_{j+1} - x0| · · · |x_{j+1} - x_{j-1}| |xj - x_{j+2}| · · · |xj - xn|
<= (h^2/4) (j + 1)h · · · 2h (2h) · · · (n - j)h
= (h^2/4) h^j (j + 1)! h^(n-j-1) (n - j)!
<= (1/4) h^(n+1) n!.
Example 53. Find an upper bound for the absolute error when f(x) = cos x is approximated by its interpolating polynomial pn(x) on [0, π/2]. For the interpolating polynomial, use 5 equally spaced nodes (n = 4) in [0, π/2], including the endpoints.

Solution. From Theorem 51,

|f(x) - p4(x)| = (|f^(5)(ξ)|/5!) |(x - x0) · · · (x - x4)|.

We have |f^(5)(ξ)| <= 1. The nodes are equally spaced with h = (π/2 - 0)/4 = π/8. Then, from the previous lemma,

|(x - x0) · · · (x - x4)| <= (1/4) (π/8)^5 4!,

and therefore

|f(x) - p4(x)| <= (1/5!) (1/4) (π/8)^5 4! = (1/20)(π/8)^5 ≈ 4.7 × 10^-4.
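This bound can be checked numerically (a sketch, not from the text; np.polyfit through 5 points with degree 4 recovers the interpolating polynomial):

```python
import numpy as np

# maximum actual error of the degree-4 interpolant of cos on [0, pi/2]
nodes = np.linspace(0, np.pi/2, 5)
coeffs = np.polyfit(nodes, np.cos(nodes), 4)
xs = np.linspace(0, np.pi/2, 1000)
max_err = np.max(np.abs(np.cos(xs) - np.polyval(coeffs, xs)))
# max_err stays below the theoretical bound 4.7e-4
```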
Exercise 3.1-3: Find an upper bound for the absolute error when f (x) = ln x is approxi-
mated by an interpolating polynomial of degree five with six nodes equally spaced in the interval
[1, 2].
We now revisit Newton’s form of interpolation, and learn an alternative method, known as
divided differences, to compute the coefficients of the interpolating polynomial. This approach
is numerically more stable than the forward substitution approach we used earlier. Let’s recall the
interpolation problem.
Let's think of the y-coordinates of the data, yi, as values of an unknown function f evaluated at xi, i.e., f(xi) = yi. Write the interpolant in Newton's form, pn(x) = a0 + a1(x - x0) + ... + an(x - x0) · · · (x - x_{n-1}). Substitute x = x0 in the interpolant to get

a0 = f(x0).

Substitute x = x1 to get f(x1) = a0 + a1(x1 - x0), hence

a1 = (f(x1) - f(x0))/(x1 - x0).

Substituting x = x2 and solving for a2 similarly gives

a2 = [ (f(x2) - f(x1))/(x2 - x1) - (f(x1) - f(x0))/(x1 - x0) ] / (x2 - x0).
Inspecting the formulas for a0, a1, a2 suggests the following simplified new notation, called divided differences: a0 = f(x0) = f[x0], a1 = (f(x1) - f(x0))/(x1 - x0) = f[x0, x1], and in general

ak = f[x0, x1, ..., xk].

With this notation, Newton's form of the interpolating polynomial is

pn(x) = f[x0] + Σ_{k=1}^n f[x0, x1, ..., xk](x - x0) · · · (x - x_{k-1}).
Definition 54. Given data (xi, f(xi)), i = 0, 1, ..., n, the divided differences are defined recursively as

f[xi] = f(xi),  i = 0, 1, ..., n,
f[x0, x1, ..., xk] = (f[x1, ..., xk] - f[x0, ..., x_{k-1}])/(xk - x0),  k = 1, ..., n.
Theorem 55. The ordering of the data in constructing divided differences is not important, that is,
the divided difference f [x0 , ..., xk ] is invariant under all permutations of the arguments x0 , ..., xk .
Proof. Consider the data (x0, y0), (x1, y1), ..., (xk, yk) and let pk(x) be its interpolating polynomial in Newton's form:

pk(x) = f[x0] + f[x0, x1](x - x0) + f[x0, x1, x2](x - x0)(x - x1) + ... + f[x0, ..., xk](x - x0) · · · (x - x_{k-1}).

Now let's consider a permutation of the xi; let's label them as x̃0, x̃1, ..., x̃k. The interpolating polynomial for the permuted data does not change, since the data x0, x1, ..., xk (omitting the y-coordinates) is the same as x̃0, x̃1, ..., x̃k, just in a different order. Therefore

pk(x) = f[x̃0] + f[x̃0, x̃1](x - x̃0) + f[x̃0, x̃1, x̃2](x - x̃0)(x - x̃1) + ... + f[x̃0, ..., x̃k](x - x̃0) · · · (x - x̃_{k-1}).

The coefficient of the highest degree term x^k of pk(x) is f[x0, ..., xk] in the first equation, and f[x̃0, ..., x̃k] in the second. Therefore they must be equal to each other.
Example 56. Find the interpolating polynomial for the data (-1, -6), (1, 0), (2, 6) using Newton's form and divided differences.

Solution. The divided differences are f[x0] = -6, f[x0, x1] = (0 - (-6))/(1 - (-1)) = 3, f[x1, x2] = (6 - 0)/(2 - 1) = 6, and f[x0, x1, x2] = (6 - 3)/(2 - (-1)) = 1. Therefore

p2(x) = -6 + 3(x + 1) + 1(x + 1)(x - 1),

which is the polynomial of Example 49.
Exercise: The following data is given for a function f:

x    1 2 4 6
f(x) 2 3 5 9

a) Construct a divided difference table for f by hand, and write the Newton form of the interpolating polynomial using the divided differences.
b) Assume you are given a new data point for the function: x = 3, y = 4. Find the new interpolating polynomial. (Hint: Think about how to update the interpolating polynomial you found in part (a).)
c) If you were working with the Lagrange form of the interpolating polynomial instead of the Newton form, and you were given an additional data point like in part (b), how easy would it be (compared to what you did in part (b)) to update your interpolating polynomial?
Example 57. Before the widespread availability of computers and mathematical software, the values of some often-used mathematical functions were disseminated to researchers and engineers via tables. The following table, taken from [1], displays some values of the gamma function, Γ(x) = ∫_0^∞ t^(x-1) e^(-t) dt.
Here are various estimates for Γ(1.761) using interpolating polynomials of increasing degrees:
Next we will change the ordering of the data and repeat the calculations. We will list the data
in decreasing order of the x-coordinates:
Summary of results: The following table displays the results for each ordering of the data,
together with the correct Γ(1.761) to 7 digits of accuracy.
a) Theorem 55 stated that the ordering of the data in divided differences does not matter. But
we see differences in the two tables above. Is this a contradiction?
c) p3 (1.761) is different in the two orderings, however, this difference is due to rounding error.
In other words, if the calculations can be done exactly, p3 (1.761) will be the same in each
ordering of the data. Why?
Exercise 3.1-6: Consider a function f (x) such that f (2) = 1.5713, f (3) = 1.5719, f (5) =
1.5738, and f (6) = 1.5751. Estimate f (4) using a second degree interpolating polynomial (inter-
polating the first three data points) and a third degree interpolating polynomial (interpolating the
first four data points). Round the final results to four decimal places. Is there any advantage here
in using a third degree interpolating polynomial?
A key observation is that even though all the divided differences have to be computed in order to obtain the ones needed for Newton's form, they do not all have to be stored. The following Python code is based on an efficient algorithm that goes through the divided difference calculations recursively, and stores an array of size m = n + 1 at any given time. In the final iteration, this array has the divided differences needed for Newton's form.
Let's explain the idea of the algorithm using the simple example of Table 3.1. The code creates an array a = (a0, a1, a2) of size m = n + 1, which is three in our example, and sets

a0 = y0, a1 = y1, a2 = y2.

In the first iteration, a2 and then a1 are updated:

a2 := (a2 - a1)/(x2 - x1) = (y2 - y1)/(x2 - x1),  a1 := (a1 - a0)/(x1 - x0) = (y1 - y0)/(x1 - x0) = f[x0, x1],

where updating a2 first ensures it uses the old value of a1. In the second iteration, only a2 is updated:

a2 := (a2 - a1)/(x2 - x0) = [ (y2 - y1)/(x2 - x1) - (y1 - y0)/(x1 - x0) ] / (x2 - x0) = f[x0, x1, x2].

The final array is a = (y0, f[x0, x1], f[x0, x1, x2]),
containing the divided differences needed to construct the Newton's form of the polynomial.
Here is the Python function diff that computes the divided differences. The code uses a function we have not used before: np.flip(np.arange(j,m)). An example illustrates best what it does:

In [2]: np.flip(np.arange(2,6))

Out[2]: array([5, 4, 3, 2])

In the code diff, the inputs are the x- and y-coordinates of the data. The numbering of the indices starts at 0.
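The listing of diff is lost in this excerpt; the following sketch is consistent with the algorithm described above, overwriting an array of size m in place:

```python
import numpy as np

def diff(x, y):
    """Compute the divided differences f[x0], f[x0,x1], ...,
    f[x0,...,xn] using a single array of size m = n+1."""
    m = x.size  # number of data points
    a = np.copy(y).astype(float)
    for j in range(1, m):
        # update from the end toward index j, so each update uses
        # the previous iteration's values
        for i in np.flip(np.arange(j, m)):
            a[i] = (a[i] - a[i-1]) / (x[i] - x[i-j])
    return a
```

For the data of Example 49, diff(np.array([-1., 1, 2]), np.array([-6., 0, 6])) returns array([-6., 3., 1.]).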
These are the divided differences in the second ordering of the data in Example 57:
In [5]: diff(np.array([1.765,1.760,1.755,1.750]),
np.array([0.92256,0.92137,0.92021,0.91906]))
Now let's write a code for the Newton form of polynomial interpolation. The inputs to the function newton are the x- and y-coordinates of the data, and where we want to evaluate the polynomial: z. The code uses the divided differences function diff discussed earlier to compute the coefficients, and then evaluates the polynomial at z by nested multiplication:
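The listing of newton is likewise missing from this excerpt; a sketch consistent with the description is below (diff is repeated so the sketch is self-contained):

```python
import numpy as np

def diff(x, y):
    # divided differences, as in the function diff above
    m = x.size
    a = np.copy(y).astype(float)
    for j in range(1, m):
        for i in np.flip(np.arange(j, m)):
            a[i] = (a[i] - a[i-1]) / (x[i] - x[i-j])
    return a

def newton(x, y, z):
    """Evaluate the Newton form of the interpolating polynomial
    at z by nested multiplication (Horner's method)."""
    m = x.size
    a = diff(x, y)
    value = a[m-1]
    for i in np.flip(np.arange(m-1)):
        value = a[i] + (z - x[i])*value
    return value
```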
In [7]: newton(np.array([1.765,1.760,1.755,1.750]),
np.array([0.92256,0.92137,0.92021,0.91906]), 1.761)
Out[7]: 0.92160496
Exercise 3.1-7: This problem discusses inverse interpolation which gives another method
to find the root of a function. Let f be a continuous function on [a, b] with one root p in the
interval. Also assume f has an inverse. Let x0 , x1 , ..., xn be n + 1 distinct numbers in [a, b] with
f (xi ) = yi , i = 0, 1, ..., n. Construct an interpolating polynomial Pn for f −1 (x), by taking your data
points as (yi , xi ), i = 0, 1, ..., n. Observe that f −1 (0) = p, the root we are trying to find. Then,
approximate the root p, by evaluating the interpolating polynomial for f −1 at 0, i.e., Pn (0) ≈ p.
Using this method, and the following data, find an approximation to the solution of log x = 0.
3.2 High degree polynomial interpolation

Consider Runge's function, f(x) = 1/(1 + x^2) on [-5, 5], a classical example of the shortcomings of high degree polynomial interpolation with equally spaced nodes. We start with taking four equally spaced x-coordinates between -5 and 5, and plot the corresponding interpolating polynomial and Runge's function. Matplotlib allows typing mathematics in captions of a plot using LaTeX. (LaTeX is a typesetting program this book is written with.) LaTeX commands need to be enclosed by a pair of dollar signs, in addition to a pair of quotation marks.
The next two graphs plot interpolating polynomials on 11 and 21 equally spaced data.
We observe that as the degree of the interpolating polynomial increases, the polynomial has
large oscillations toward the end points of the interval. In fact, it can be shown that for any x such
that 3.64 < |x| < 5, supn≥0 |f (x) − pn (x)| = ∞, where f is Runge’s function.
This troublesome behavior of high degree interpolating polynomials improves significantly if we consider data with x-coordinates that are not equally spaced. Consider the interpolation error of Theorem 51:

f(x) - pn(x) = (f^(n+1)(ξ)/(n + 1)!) (x - x0)(x - x1) · · · (x - xn).
Perhaps surprisingly, the right-hand side of the above equation is not minimized when the nodes,
xi , are equally spaced! The set of nodes that minimizes the interpolation error is the roots of
the so-called Chebyshev polynomials. The placing of these nodes is such that there are more nodes
towards the end points of the interval, than the middle. We will learn about Chebyshev polynomials
in Chapter 5. Using Chebyshev nodes in polynomial interpolation avoids the diverging behavior of polynomial interpolants as the degree increases, as observed in the case of Runge's function, for sufficiently smooth functions.
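A numerical comparison of equally spaced and Chebyshev nodes for Runge's function (a sketch, not from the text; np.polyfit interpolates exactly when the number of points equals degree + 1, and the Chebyshev node formula below is the standard one, scaled to [-5, 5]):

```python
import numpy as np

f = lambda x: 1/(1 + x**2)    # Runge's function
n = 10
xs = np.linspace(-5, 5, 2001)
eq_nodes = np.linspace(-5, 5, n+1)
cheb_nodes = 5*np.cos((2*np.arange(n+1) + 1)*np.pi/(2*n + 2))
eq_err = np.max(np.abs(f(xs) -
                np.polyval(np.polyfit(eq_nodes, f(eq_nodes), n), xs)))
cheb_err = np.max(np.abs(f(xs) -
                np.polyval(np.polyfit(cheb_nodes, f(cheb_nodes), n), xs)))
print(eq_err, cheb_err)
```

The equally spaced nodes give a maximum error near 2 (the oscillations near the endpoints), while the Chebyshev nodes give an error an order of magnitude smaller.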
Theorem 58. Suppose f ∈ C n [a, b] and x0 , x1 , ..., xn are distinct numbers in [a, b]. Then there
exists ξ ∈ (a, b) such that
f[x0, ..., xn] = f^(n)(ξ)/n!.
To prove this theorem, we need the generalized Rolle’s theorem.
Theorem 59 (Rolle's theorem). Suppose f is continuous on [a, b] and differentiable on (a, b). If f(a) = f(b), then there exists c in (a, b) such that f'(c) = 0.
Theorem 60 (Generalized Rolle's theorem). Suppose f is continuous on [a, b] and has n derivatives on (a, b). If f(x) = 0 at (n + 1) distinct numbers x0, x1, ..., xn in [a, b], then there exists c in (a, b) such that f^(n)(c) = 0.
Proof of Theorem 58. Consider the function g(x) = pn(x) - f(x). Observe that g(xi) = 0 for i = 0, 1, ..., n. From the generalized Rolle's theorem, there exists ξ in (a, b) such that g^(n)(ξ) = 0, which implies

pn^(n)(ξ) - f^(n)(ξ) = 0.

Since pn(x) = f[x0] + f[x0, x1](x - x0) + ... + f[x0, ..., xn](x - x0) · · · (x - x_{n-1}), the nth derivative pn^(n)(x) equals n! times the leading coefficient f[x0, ..., xn]. Therefore

f^(n)(ξ) = n! f[x0, ..., xn],

which gives the result.
3.3 Hermite interpolation

Data:
x0, x1, ..., xn
y0, y1, ..., yn; yi = f(xi)
y'0, y'1, ..., y'n; y'i = f'(xi)

We seek a polynomial that fits the y and y' values, that is, we seek a polynomial H(x) such that H(xi) = yi and H'(xi) = y'i, i = 0, 1, ..., n. This makes 2n + 2 equations, and if we let

H(x) = a0 + a1 x + ... + a_{2n+1} x^(2n+1),

then there are 2n + 2 unknowns, a0, ..., a_{2n+1}, to solve for. The following theorem shows that there is a unique solution to this system of equations; a proof can be found in Burden, Faires, Burden [4].
Theorem 61. If f ∈ C 1 [a, b] and x0 , ..., xn ∈ [a, b] are distinct, then there is a unique polynomial
H2n+1 (x), of degree at most 2n + 1, agreeing with f and f 0 at x0 , ..., xn . The polynomial can be
written as:
H_{2n+1}(x) = Σ_{i=0}^n yi hi(x) + Σ_{i=0}^n y'i h̃i(x)

where

hi(x) = (1 - 2(x - xi) l'i(xi)) li^2(x),  h̃i(x) = (x - xi) li^2(x).

Here li(x) is the ith Lagrange basis function for the nodes x0, ..., xn, and l'i(x) is its derivative. H_{2n+1}(x) is called the Hermite interpolating polynomial.
The only difference between Hermite interpolation and polynomial interpolation is that in the former we have the derivative information, which can go a long way in capturing the shape of the underlying function.

Example 62. Consider the data (-1.5, 0.071), (1.6, -0.029), (4.7, -0.012). The underlying function the data come from is cos x, but we pretend we do not know this. Figure (3.2) plots the underlying function, the data, and the polynomial interpolant for the data. Clearly, the polynomial interpolant does not come close to giving a good approximation to the underlying function cos x.
Figure 3.2
Now let's assume we also know the derivative of the underlying function at these nodes: y' = 1, -1, 1 at x = -1.5, 1.6, 4.7, respectively.
We then construct the Hermite interpolating polynomial, incorporating the derivative infor-
mation. Figure (3.3) plots the Hermite interpolating polynomial, together with the polynomial
interpolant, and the underlying function.
It is visually difficult to separate the Hermite interpolating polynomial from the underlying function cos x in Figure (3.3). Going from polynomial interpolation to Hermite interpolation results in a rather dramatic improvement in approximating the underlying function.
Figure 3.3
Divided differences can be used to construct the Hermite interpolating polynomial. The data are

x0, x1, ..., xn
y0, y1, ..., yn; yi = f(xi)
y'0, y'1, ..., y'n; y'i = f'(xi).

Define a new sequence z0, z1, ..., z_{2n+1} in which each node appears twice:

z0 = x0, z2 = x1, z4 = x2, ..., z_{2n} = xn
z1 = x0, z3 = x1, z5 = x2, ..., z_{2n+1} = xn,

and construct the divided differences of f at the zi. There is a little problem with some of the first divided differences above: they are undefined! Observe that

f[z0, z1] = f[x0, x0] = (f(x0) - f(x0))/(x0 - x0)

or, in general,

f[z_{2i}, z_{2i+1}] = f[xi, xi] = (f(xi) - f(xi))/(xi - xi)

for i = 0, ..., n.
From Theorem 58, we know f[x0, ..., xn] = f^(n)(ξ)/n! for some ξ between the min and max of x0, ..., xn. From a classical result by Hermite & Gennochi (see Atkinson [3], page 144), divided differences are continuous functions of their variables x0, ..., xn. This implies we can take the limit of the above result as xi → x0 for all i, which results in

f[x0, ..., x0] = f^(n)(x0)/n!.

In particular, f[x0, x0] = f'(x0); therefore we set the undefined first divided differences to

f[z_{2i}, z_{2i+1}] = f[xi, xi] = f'(xi)

for i = 0, 1, ..., n.
Example 63. Let’s compute the Hermite polynomial of Example 62. The data is:
i  xi    yi      y'i
0  -1.5  0.071   1
1  1.6   -0.029  -1
2  4.7   -0.012  1
Here n = 2, and

H5(x) = f[z0] + Σ_{i=1}^5 f[z0, ..., zi](x - z0) · · · (x - z_{i-1}).
The nodes are repeated, z0 = z1 = -1.5, z2 = z3 = 1.6, z4 = z5 = 4.7, with f(z) = 0.071, 0.071, -0.029, -0.029, -0.012, -0.012, and the undefined first divided differences are replaced by the derivatives. Using two-digit rounding, the divided differences are:

1st differences: f[z0, z1] = f'(z0) = 1, f[z1, z2] = (-0.029 - 0.071)/(1.6 + 1.5) = -0.032, f[z2, z3] = f'(z2) = -1, f[z3, z4] = (-0.012 + 0.029)/(4.7 - 1.6) = 0.0055, f[z4, z5] = f'(z4) = 1.
2nd differences: f[z0, z1, z2] = (-0.032 - 1)/3.1 = -0.33, f[z1, z2, z3] = (-1 + 0.032)/3.1 = -0.31, f[z2, z3, z4] = (0.0055 + 1)/3.1 = 0.32, f[z3, z4, z5] = (1 - 0.0055)/3.1 = 0.32.
3rd differences: f[z0, ..., z3] = (-0.31 + 0.33)/3.1 = 0.0065, f[z1, ..., z4] = (0.32 + 0.31)/6.2 = 0.10, f[z2, ..., z5] = (0.32 - 0.32)/3.1 = 0.
4th differences: f[z0, ..., z4] = (0.10 - 0.0065)/6.2 = 0.015, f[z1, ..., z5] = (0 - 0.10)/6.2 = -0.016.
5th difference: f[z0, ..., z5] = (-0.016 - 0.015)/6.2 = -0.005.

The divided differences f[z0], f[z0, z1], ..., f[z0, ..., z5] are the coefficients of the Hermite polynomial:

H5(x) = 0.071 + (x + 1.5) - 0.33(x + 1.5)^2 + 0.0065(x + 1.5)^2(x - 1.6) + 0.015(x + 1.5)^2(x - 1.6)^2 - 0.005(x + 1.5)^2(x - 1.6)^2(x - 4.7).
The following function hdiff computes the divided differences needed for Hermite interpolation. It is based on the function diff for computing divided differences for Newton interpolation. The inputs to hdiff are the x-coordinates, the y-coordinates y, and the derivatives yprime.

def hdiff(x, y, yprime):
    m = x.size # here m is the number of data points
    l = 2*m
    z = np.zeros(l)
    a = np.zeros(l)
    for i in range(m):
        z[2*i] = x[i]
        z[2*i+1] = x[i]
        a[2*i] = y[i]
        a[2*i+1] = y[i]
    for i in np.flip(np.arange(1, m)): # computes the first divided
                                       # differences using derivatives
        a[2*i+1] = yprime[i]
        a[2*i] = (a[2*i]-a[2*i-1]) / (z[2*i]-z[2*i-1])
    a[1] = yprime[0]
    for j in range(2, l): # computes the rest of the divided differences
        for i in np.flip(np.arange(j, l)):
            a[i] = (a[i]-a[i-1]) / (z[i]-z[i-j])
    return a
Note that in the hand calculations of Example 63, where two-digit rounding was used, we obtained 0.0065 for the first of the third divided differences, f[z0, z1, z2, z3]. In the Python output above, this divided difference is 0.0067.
The following function computes the Hermite interpolating polynomial, using the divided dif-
ferences obtained from hdiff, and then evaluates the polynomial at w.
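The evaluation function is missing from this excerpt; a self-contained sketch that evaluates the Hermite polynomial at w by nested multiplication is below (the name hermite and its exact signature are assumptions, and hdiff is repeated, in compact form, so the sketch runs on its own):

```python
import numpy as np

def hdiff(x, y, yprime):
    # divided differences with repeated nodes, as in hdiff above
    m = x.size
    l = 2*m
    z = np.repeat(x, 2).astype(float)
    a = np.repeat(y, 2).astype(float)
    for i in np.flip(np.arange(1, m)):
        a[2*i+1] = yprime[i]
        a[2*i] = (a[2*i] - a[2*i-1]) / (z[2*i] - z[2*i-1])
    a[1] = yprime[0]
    for j in range(2, l):
        for i in np.flip(np.arange(j, l)):
            a[i] = (a[i] - a[i-1]) / (z[i] - z[i-j])
    return a

def hermite(x, y, yprime, w):
    """Evaluate the Hermite interpolating polynomial at w, using
    the Newton form with each node repeated twice."""
    m = x.size
    a = hdiff(x, y, yprime)
    z = np.repeat(x, 2)
    value = a[2*m-1]
    for i in np.flip(np.arange(2*m-1)):
        value = a[i] + (w - z[i])*value
    return value
```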
Exercise 3.3-1: The following table gives the values of y = f (x) and y 0 = f 0 (x) where
f (x) = ex + sin 10x. Compute the Hermite interpolating polynomial and the polynomial interpolant
for the data in the table. Plot the two interpolating polynomials together with f (x) = ex + sin 10x
on (0, 3).
x  0  0.4    1     2    2.6  3
y  1  0.735  2.17  8.30 14.2 19.1
y' 11 -5.04  -5.67 11.5 19.9 21.6
3.4 Piecewise polynomials: spline interpolation

In piecewise polynomial interpolation we keep the degree of each polynomial piece low, avoiding the oscillations of high degree polynomial interpolation. In the simplest case, linear spline interpolation, consecutive data points are connected by line segments.

[Figure: a linear spline through (x_{i-1}, y_{i-1}), (xi, yi), (x_{i+1}, y_{i+1}), with the linear piece P(x) on [x_{i-1}, xi] and Q(x) on [xi, x_{i+1}].]

If P(x) = a + bx and Q(x) = c + dx denote the linear polynomials on [x_{i-1}, xi] and [xi, x_{i+1}], the interpolation conditions are

P(x_{i-1}) = y_{i-1}
P(xi) = yi
Q(xi) = yi
Q(x_{i+1}) = y_{i+1},

which is a system of four equations and four unknowns. We then repeat this procedure for all data points, (x0, y0), (x1, y1), ..., (xn, yn), to determine all of the linear polynomials.
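In practice, this piecewise linear interpolation is exactly what NumPy's np.interp computes; for instance:

```python
import numpy as np

x = np.array([0., 1., 2.])
y = np.array([0., 1., 0.])
print(np.interp(0.5, x, y))   # halfway up the first segment
print(np.interp(1.5, x, y))   # halfway down the second segment
```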
One disadvantage of linear spline interpolation is the lack of smoothness. The first derivative
of the spline is not continuous at the nodes (unless the data fall on a line). We can obtain better
smoothness by increasing the degree of the piecewise polynomials. In quadratic spline interpo-
lation, we connect the nodes via second degree polynomials.
[Figure: a quadratic spline, with the quadratic pieces P(x) on [x_{i-1}, xi] and Q(x) on [xi, x_{i+1}].]

The most common choice is cubic spline interpolation, in which cubic polynomials connect consecutive data points (x0, y0), (x1, y1), ..., (xn, yn), where x0 < x1 < ... < xn. In the figure below, the cubic polynomials interpolating pairs of data are labeled as S0, ..., S_{n-1} (we ignore the y-coordinates in the plot).
[Figure: the cubic pieces S0, ..., S_{i-1}, Si, ..., S_{n-1} connecting consecutive data points.]
Let

Si(x) = ai + bi x + ci x^2 + di x^3

denote the cubic polynomial connecting (xi, yi) and (x_{i+1}, y_{i+1}). The first group of equations are the interpolation conditions:

Si(xi) = yi
Si(x_{i+1}) = y_{i+1}

for i = 0, 1, ..., n - 1, which gives 2n equations. The next group of equations are about smoothness:
S'_{i-1}(xi) = S'_i(xi)
S''_{i-1}(xi) = S''_i(xi)

for i = 1, 2, ..., n - 1, which gives 2(n - 1) = 2n - 2 equations. The last two equations are called the boundary conditions. There are two choices:

1. Natural (free) boundary: S''_0(x0) = 0 and S''_{n-1}(xn) = 0.
2. Clamped boundary: S'_0(x0) = f'(x0) and S'_{n-1}(xn) = f'(xn).

Each boundary choice gives another two equations, bringing the total number of equations to 4n. There are 4n unknowns as well. Do these systems of equations have a unique solution? The answer is yes, and a proof can be found in Burden, Faires, Burden [4]. The spline obtained from the first boundary choice is called a natural spline, and the other one is called a clamped spline.
Example 64. Find the natural cubic spline that interpolates the data (0, 0), (1, 1), (2, 0).

Solution. The two cubic pieces are

S0(x) = a0 + b0 x + c0 x^2 + d0 x^3
S1(x) = a1 + b1 x + c1 x^2 + d1 x^3.

The interpolation conditions are:

S0(0) = 0 ⇒ a0 = 0
S0(1) = 1 ⇒ a0 + b0 + c0 + d0 = 1
S1(1) = 1 ⇒ a1 + b1 + c1 + d1 = 1
S1(2) = 0 ⇒ a1 + 2b1 + 4c1 + 8d1 = 0.

The smoothness and natural boundary conditions give:

S'0(1) = S'1(1) ⇒ b0 + 2c0 + 3d0 = b1 + 2c1 + 3d1
S''0(1) = S''1(1) ⇒ 2c0 + 6d0 = 2c1 + 6d1
S''0(0) = 0 ⇒ c0 = 0
S''1(2) = 0 ⇒ 2c1 + 12d1 = 0.

There are eight equations and eight unknowns. However, a0 = c0 = 0, so that reduces the number of equations and unknowns to six. We rewrite the equations below, substituting a0 = c0 = 0 and simplifying:
b0 + d0 = 1
a1 + b1 + c1 + d1 = 1
a1 + 2b1 + 4c1 + 8d1 = 0
b0 + 3d0 = b1 + 2c1 + 3d1
3d0 = c1 + 3d1
c1 + 6d1 = 0
We will use Python to solve this system of equations. To do that, we first rewrite the system of equations as a matrix equation

Ax = v,

where

    [ 1  1  0   0   0   0 ]        [ b0 ]        [ 1 ]
    [ 0  0  1   1   1   1 ]        [ d0 ]        [ 1 ]
A = [ 0  0  1   2   4   8 ] ,  x = [ a1 ] ,  v = [ 0 ] .
    [ 1  3  0  -1  -2  -3 ]        [ b1 ]        [ 0 ]
    [ 0  3  0   0  -1  -3 ]        [ c1 ]        [ 0 ]
    [ 0  0  0   0   1   6 ]        [ d1 ]        [ 0 ]
We enter the matrices A, v in Python and solve the equation Ax = v using the command
np.linalg.solve.
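The entry of A and v (not shown in this excerpt) can be done as follows:

```python
import numpy as np

A = np.array([[1, 1, 0, 0, 0, 0],
              [0, 0, 1, 1, 1, 1],
              [0, 0, 1, 2, 4, 8],
              [1, 3, 0, -1, -2, -3],
              [0, 3, 0, 0, -1, -3],
              [0, 0, 0, 0, 1, 6]], dtype=float)
v = np.array([1., 1., 0., 0., 0., 0.])
```

np.linalg.solve(A, v) then returns the unknowns in the order (b0, d0, a1, b1, c1, d1).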
In [3]: np.linalg.solve(A, v)
Out[3]: array([ 1.5, -0.5, -1. ,  4.5, -3. ,  0.5])

Therefore b0 = 1.5, d0 = -0.5, a1 = -1, b1 = 4.5, c1 = -3, d1 = 0.5, and the natural cubic spline is

s(x) = 1.5x - 0.5x^3 on [0, 1], and s(x) = -1 + 4.5x - 3x^2 + 0.5x^3 on [1, 2].
Solving the equations of a spline even for three data points can be tedious. Fortunately, there
is a general approach to solving the equations for natural and clamped splines, for any number of
data points. We will use this approach when we write Python codes for splines next.
Exercise 3.4-1: Find the natural cubic spline interpolant for the following data:
x -1 0 1
y 1 2 0
Exercise 3.4-2: The following is a clamped cubic spline for a function f defined on [1, 3]:

s(x) = s0(x) = (x - 1) + (x - 1)^2 - (x - 1)^3, if 1 <= x < 2
s(x) = s1(x) = a + b(x - 2) + c(x - 2)^2 + d(x - 2)^3, if 2 <= x <= 3.
The function CubicNatural takes the x- and y-coordinates of the data as input, and computes
the natural cubic spline interpolating the data, by solving the resulting matrix equation. The code
is based on Algorithm 3.4 of Burden, Faires, Burden [4]. The output is the coefficients of the
m − 1 cubic polynomials, ai , bi , ci , di , i = 0, ..., m − 2 where m is the number of data points. These
coefficients are stored in the arrays a, b, c, d and returned at the end of the function, so that we can
access these arrays later to evaluate the spline for a given value w.
import numpy as np

def CubicNatural(x, y):
    m = x.size # m is the number of data points
    n = m-1
    a = np.zeros(m)
    b = np.zeros(n)
    c = np.zeros(m)
    d = np.zeros(n)
    for i in range(m):
        a[i] = y[i]
    h = np.zeros(n)
    for i in range(n):
        h[i] = x[i+1] - x[i]
    u = np.zeros(n)
    u[0] = 0
    for i in range(1, n):
        u[i] = 3*(a[i+1]-a[i])/h[i]-3*(a[i]-a[i-1])/h[i-1]
    s = np.zeros(m)
    z = np.zeros(m)
    t = np.zeros(n)
    s[0] = 1
    z[0] = 0
    t[0] = 0
    for i in range(1, n):
        s[i] = 2*(x[i+1]-x[i-1])-h[i-1]*t[i-1]
        t[i] = h[i]/s[i]
        z[i] = (u[i]-h[i-1]*z[i-1])/s[i]
    s[m-1] = 1
    z[m-1] = 0
    c[m-1] = 0
    for i in np.flip(np.arange(n)):
        c[i] = z[i]-t[i]*c[i+1]
        b[i] = (a[i+1]-a[i])/h[i]-h[i]*(c[i+1]+2*c[i])/3
        d[i] = (c[i+1]-c[i])/(3*h[i])
    return a, b, c, d
Once the matrix equation is solved, and the coefficients of the cubic polynomials are computed
by CubicNatural, the next step is to evaluate the spline at a given value. This is done by the
following function CubicNaturalEval. The inputs are the value at which the spline is evaluated,
w, the x-coordinates of the data and the coefficients computed by CubicNatural. The function
first finds the interval [xi , xi+1 ], i = 0, ..., m − 2, w belongs to, and then evaluates the spline at w
using the corresponding cubic polynomial.
def CubicNaturalEval(w, x, coeff):
    m = x.size
    if w < x[0] or w > x[m-1]:
        print('error: spline evaluated outside its domain')
        return
    n = m-1
    p = 0
    for i in range(n):
        if w <= x[i+1]:
            break
        else:
            p += 1
    # p is the number of the subinterval w falls into, i.e., p=i means
    # w falls into the ith subinterval (x_i, x_{i+1}), and therefore
    # the value of the spline at w is
    # a_i+b_i*(w-x_i)+c_i*(w-x_i)^2+d_i*(w-x_i)^3.
    a = coeff[0]
    b = coeff[1]
    c = coeff[2]
    d = coeff[3]
    return a[p]+b[p]*(w-x[p])+c[p]*(w-x[p])**2+d[p]*(w-x[p])**3
Next we will compare Newton and natural cubic spline interpolation when applied to Runge's
function. We import the functions for Newton interpolation first.
def newton(x, a, z):
    # evaluates the Newton form of the interpolating polynomial at z,
    # where a holds the divided difference coefficients
    m = x.size
    sum = a[0]
    pr = 1.0
    for j in range(m-1):
        pr *= (z-x[j])
        sum += a[j+1]*pr
    return sum
Here is the code that computes the cubic spline, Newton interpolation, and plot them.
The cubic spline gives an excellent fit to Runge’s function on this scale: we cannot visually
separate it from the function itself.
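The plotting code itself is not reproduced above; the comparison can be sketched with SciPy's CubicSpline standing in for CubicNatural and NumPy's Polynomial.fit standing in for the Newton form (both substitutions are assumptions, as is the choice of 15 nodes):

```python
import numpy as np
from scipy.interpolate import CubicSpline

f = lambda x: 1/(1 + x**2)              # Runge's function
xi = np.linspace(-5, 5, 15)             # 15 equally spaced nodes (an assumption)
spline = CubicSpline(xi, f(xi), bc_type='natural')
poly = np.polynomial.Polynomial.fit(xi, f(xi), deg=len(xi) - 1)

xaxis = np.linspace(-5, 5, 1001)
spline_err = np.max(np.abs(spline(xaxis) - f(xaxis)))
poly_err = np.max(np.abs(poly(xaxis) - f(xaxis)))
print(spline_err, poly_err)             # the polynomial error is much larger
```

The spline error stays small across the whole interval, while the high-degree interpolating polynomial oscillates wildly near the endpoints.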
The following function CubicClamped computes the clamped cubic spline; the code is based
on Algorithm 3.5 of Burden, Faires, Burden [4]. The function CubicClampedEval evaluates the
spline at a given value.
In the following, we use natural and clamped cubic splines to interpolate data coming from sin x
at the x-coordinates: 0, π, 3π/2, 2π. The derivatives at the end points are both equal to 1.
Especially on the interval (0, π), the clamped spline gives a much better approximation to sin x
than the natural spline. However, adding an extra data point between 0 and π removes the visual
differences between the splines.
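The same experiment can be sketched with SciPy's CubicSpline (an assumption; the book uses its own CubicNatural and CubicClamped), where bc_type selects the boundary conditions:

```python
import numpy as np
from scipy.interpolate import CubicSpline

xi = np.array([0, np.pi, 3*np.pi/2, 2*np.pi])
yi = np.sin(xi)

nat = CubicSpline(xi, yi, bc_type='natural')
# clamped spline: first derivative equal to 1 at both end points
clam = CubicSpline(xi, yi, bc_type=((1, 1.0), (1, 1.0)))

# the clamped spline is much closer to sin x in the middle of (0, pi)
print(nat(np.pi/2), clam(np.pi/2), np.sin(np.pi/2))
```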
What Arya wants is a digitized version of the sketch; a figure that is smooth and can be
manipulated using graphics software. The letter NUH is a little complicated to apply a spline
interpolation directly, since it has some cusps. For such planar curves, we can use their parametric
representation, and use a cubic spline interpolation for x- and y-coordinates separately. To this end,
Arya picks eight points on the letter NUH, and labels them as t = 1, 2, ..., 8; see the figure below.
2
Seuss, 1955. On Beyond Zebra! Random House for Young Readers.
Then for each point she eyeballs the x and y-coordinates with the help of a graph paper. The
results are displayed in the table below.
t 1 2 3 4 5 6 7 8
x 0 0 -0.05 0.1 0.4 0.65 0.7 0.76
y 0 1.25 2.5 1 0.3 0.9 1.5 0
The next step is to fit a cubic spline to the data (t1 , x1 ), ..., (t8 , x8 ), and another cubic spline
to the data (t1 , y1 ), ..., (t8 , y8 ). Let’s call these splines xspline(t), yspline(t), respectively, since they
represent the x- and y-coordinates as functions of the parameter t. Plotting xspline(t), yspline(t)
will produce the letter NUH, as we can see in the following Python codes.
First, load the NumPy and Matplotlib packages, and copy and evaluate the functions Cubic-
Natural and CubicNaturalEval that we discussed earlier. Here is the letter NUH, obtained by
spline interpolation:
In [1]: t = np.array([1,2,3,4,5,6,7,8])
x = np.array([0,0,-0.05,0.1,0.4,0.65,0.7,0.76])
y = np.array([0,1.25,2.5,1,0.3,0.9,1.5,0])
taxis = np.linspace(1, 8, 700)
coeff = CubicNatural(t, x)
xspline = np.array(list(map(lambda x: CubicNaturalEval(x, t, coeff), taxis)))
coeff = CubicNatural(t, y)
yspline = np.array(list(map(lambda x: CubicNaturalEval(x, t, coeff), taxis)))
plt.plot(xspline, yspline);
This looks like it needs to be squeezed! Adjusting the aspect ratio gives a better image. In the
following, we use the commands
In [2]: w, h = plt.figaspect(2);
plt.figure(figsize=(w, h));
In [3]: t = np.array([1,2,3,4,5,6,7,8])
x = np.array([0,0,-0.05,0.1,0.4,0.65,0.7,0.76])
y = np.array([0,1.25,2.5,1,0.3,0.9,1.5,0])
taxis = np.linspace(1, 8, 700)
coeff = CubicNatural(t, x)
xspline = np.array(list(map(lambda x: CubicNaturalEval(x, t, coeff), taxis)))
coeff = CubicNatural(t, y)
yspline = np.array(list(map(lambda x: CubicNaturalEval(x, t, coeff), taxis)))
w, h = plt.figaspect(2)
plt.figure(figsize=(w, h))
plt.plot(xspline, yspline, linewidth=5);
Exercise 3.4-3: Limaçon is a curve, named after a French word for snail, which appears in
the study of planetary motion. The polar equation for the curve is r = 1 + c sin θ where c is a
constant. Below is a plot of the curve when c = 1.
The x, y coordinates of the dots on the curve are displayed in the following table:
Recreate the limaçon above, by applying the spline interpolation for plane curves approach used
in Arya and the letter NUH example to the points given in the table.
Chapter 4

Numerical Quadrature and Differentiation
\[ P_n(x) = \sum_{i=0}^{n} f(x_i) l_i(x) \]
\[ f(x) = P_n(x) + (x-x_0) \cdots (x-x_n)\, \frac{f^{(n+1)}(\xi(x))}{(n+1)!}, \]
where ξ(x) ∈ [a, b]. (We have written ξ(x) instead of ξ to emphasize that ξ depends on the value of
x.)
Taking the integral of both sides yields
\[ \int_a^b f(x)\,dx = \underbrace{\int_a^b P_n(x)\,dx}_{\text{quadrature rule}} + \underbrace{\frac{1}{(n+1)!} \int_a^b \prod_{i=0}^{n} (x-x_i)\, f^{(n+1)}(\xi(x))\,dx}_{\text{error term}}. \tag{4.1} \]
CHAPTER 4. NUMERICAL QUADRATURE AND DIFFERENTIATION 122
The first integral on the right-hand side gives the quadrature rule:
\[ \int_a^b P_n(x)\,dx = \int_a^b \left( \sum_{i=0}^{n} f(x_i) l_i(x) \right) dx = \sum_{i=0}^{n} \underbrace{\left( \int_a^b l_i(x)\,dx \right)}_{w_i} f(x_i) = \sum_{i=0}^{n} w_i f(x_i), \]
Theorem 65 (Weighted mean value theorem for integrals). Suppose f ∈ C 0 [a, b], the Riemann
integral of g(x) exists on [a, b], and g(x) does not change sign on [a, b]. Then there exists ξ ∈ (a, b)
with \( \int_a^b f(x) g(x)\,dx = f(\xi) \int_a^b g(x)\,dx \).
Two well-known numerical quadrature rules, trapezoidal rule and Simpson’s rule, are examples
of Newton-Cotes formulas:
• Trapezoidal rule
Let f ∈ C 2 [a, b]. Take two nodes, x0 = a, x1 = b, and use the linear Lagrange polynomial
\[ P_1(x) = \frac{x-x_1}{x_0-x_1} f(x_0) + \frac{x-x_0}{x_1-x_0} f(x_1) \]
\[ \int_a^b f(x)\,dx = \int_a^b P_1(x)\,dx + \frac{1}{2} \int_a^b \prod_{i=0}^{1} (x-x_i)\, f''(\xi(x))\,dx, \]
The first two integrals on the right-hand side are \( \frac{h}{2} f(x_0) \) and \( \frac{h}{2} f(x_1) \), where h = b − a. Let's evaluate
\[ \int_a^b (x-x_0)(x-x_1)\, f''(\xi(x))\,dx. \]
We will use Theorem 65 for this computation. Note that the function (x − x0 )(x − x1 ) =
(x − a)(x − b) does not change sign on the interval [a, b] and it is integrable: so this function
serves the role of g(x) in Theorem 65. The other term, f 00 (ξ(x)), serves the role of f (x).
Applying the theorem, we get
\[ \int_a^b (x-a)(x-b)\, f''(\xi(x))\,dx = f''(\xi) \int_a^b (x-a)(x-b)\,dx \]
where we kept the same notation ξ, somewhat inappropriately, as we moved f 00 (ξ(x)) from
inside to the outside of the integral. Finally, observe that
\[ \int_a^b (x-a)(x-b)\,dx = \frac{(a-b)^3}{6} = \frac{-h^3}{6}, \]
and putting the pieces together we obtain the trapezoidal rule:
\[ \int_a^b f(x)\,dx = \frac{h}{2} \left[ f(x_0) + f(x_1) \right] - \frac{h^3}{12} f''(\xi). \]
• Simpson’s rule
Let f ∈ C 4 [a, b]. Take three equally-spaced nodes, x0 = a, x1 = a + h, x2 = b, where
h = (b − a)/2, and use the second degree Lagrange polynomial P2 (x) to write
\[ \int_a^b f(x)\,dx = \int_a^b P_2(x)\,dx + \frac{1}{3!} \int_a^b \prod_{i=0}^{2} (x-x_i)\, f^{(3)}(\xi(x))\,dx, \]
The sum of the first three integrals on the right-hand side simplifies to
\( \frac{h}{3} [f(x_0) + 4f(x_1) + f(x_2)] \). The last integral cannot be evaluated using Theorem 65
directly, as in the trapezoidal rule, since the function (x − x0 )(x − x1 )(x − x2 ) changes
sign on [a, b]. However, a clever application of integration by parts transforms the integral
into one where Theorem 65 is applicable (see Atkinson [3] for details), and the integral
simplifies to \( -\frac{h^5}{90} f^{(4)}(\xi) \) for some ξ ∈ (a, b). In summary, we obtain
\[ \int_a^b f(x)\,dx = \frac{h}{3} \left[ f(x_0) + 4f(x_1) + f(x_2) \right] - \frac{h^5}{90} f^{(4)}(\xi). \]
Exercise 4.1-1: Prove that the sum of the weights in Newton-Cotes rules is b − a, for any n.
Definition 66. The degree of accuracy, or precision, of a quadrature formula is the largest positive
integer n such that the formula is exact for f (x) = xk , when k = 0, 1, ..., n, or equivalently, for any
polynomial of degree less than or equal to n.
Observe that the trapezoidal and Simpson’s rules have degrees of accuracy of one and three.
These two rules are examples of closed Newton-Cotes formulas; closed refers to the fact that the end
points a, b of the interval are used as nodes in the quadrature rule. Here is the general definition.
Definition 67 (Closed Newton-Cotes). The (n + 1)-point closed Newton-Cotes formula uses nodes
xi = x0 + ih, for i = 0, 1, ..., n, where x0 = a, xn = b, h = (b − a)/n, and
\[ w_i = \int_{x_0}^{x_n} l_i(x)\,dx = \int_{x_0}^{x_n} \prod_{j=0,\, j \neq i}^{n} \frac{x-x_j}{x_i-x_j}\,dx. \]
The following theorem provides an error formula for the closed Newton-Cotes formula. A proof
can be found in Isaacson and Keller [12].
Theorem 68. Suppose f ∈ C n+2 [a, b] and n is even. Then there exists ξ ∈ (a, b) such that
\[ \int_a^b f(x)\,dx = \sum_{i=0}^{n} w_i f(x_i) + \frac{h^{n+3} f^{(n+2)}(\xi)}{(n+2)!} \int_0^n t^2 (t-1) \cdots (t-n)\,dt. \]
If f ∈ C n+1 [a, b] and n is odd, then there exists ξ ∈ (a, b) such that
\[ \int_a^b f(x)\,dx = \sum_{i=0}^{n} w_i f(x_i) + \frac{h^{n+2} f^{(n+1)}(\xi)}{(n+1)!} \int_0^n t (t-1) \cdots (t-n)\,dt. \]
Some well known examples of closed Newton-Cotes formulas are the trapezoidal rule (n = 1),
Simpson’s rule (n = 2), and Simpson’s three-eighth rule (n = 3). Observe that in the (n + 1)-point
closed Newton-Cotes formula, if n is even, then the degree of accuracy is (n + 1), although the
interpolating polynomial is of degree n. The open Newton-Cotes formulas exclude the end points of
the interval.
Definition 69 (Open Newton-Cotes). The (n + 1)-point open Newton-Cotes formula uses nodes
xi = x0 + ih, for i = 0, 1, ..., n, where x0 = a + h, xn = b − h, h = (b − a)/(n + 2), and
\[ w_i = \int_a^b l_i(x)\,dx = \int_a^b \prod_{j=0,\, j \neq i}^{n} \frac{x-x_j}{x_i-x_j}\,dx. \]
The error formula for the open Newton-Cotes formula is given next; for a proof see Isaacson and
Keller [12].
Theorem 70. Suppose f ∈ C n+2 [a, b] and n is even. Then there exists ξ ∈ (a, b) such that
\[ \int_a^b f(x)\,dx = \sum_{i=0}^{n} w_i f(x_i) + \frac{h^{n+3} f^{(n+2)}(\xi)}{(n+2)!} \int_{-1}^{n+1} t^2 (t-1) \cdots (t-n)\,dt. \]
If f ∈ C n+1 [a, b] and n is odd, then there exists ξ ∈ (a, b) such that
\[ \int_a^b f(x)\,dx = \sum_{i=0}^{n} w_i f(x_i) + \frac{h^{n+2} f^{(n+1)}(\xi)}{(n+1)!} \int_{-1}^{n+1} t (t-1) \cdots (t-n)\,dt. \]
• Midpoint rule
Take one node, x0 = a + h, which corresponds to n = 0 in the above theorem to obtain
\[ \int_a^b f(x)\,dx = 2h f(x_0) + \frac{h^3 f''(\xi)}{3} \]
where h = (b − a)/2. This rule interpolates f by a constant (the value of f at the midpoint),
that is, a polynomial of degree 0, but it has degree of accuracy 1.
Remark 71. Both closed and open Newton-Cotes formulas using an odd number of nodes (n is even)
gain an extra degree of accuracy beyond that of the polynomial interpolant on which they are based.
This is due to cancellation of positive and negative error.
There are some drawbacks of Newton-Cotes formulas:
• In general, these rules are not of the highest degree of accuracy possible for the number of
nodes used.
• The use of a large number of equally spaced nodes may incur the erratic behavior associated
with high-degree polynomial interpolation. The weights of a high-order rule may be negative,
potentially leading to loss of significance errors.
• Let In denote the Newton-Cotes estimate of an integral based on n nodes. In may fail to
converge to the true integral as n → ∞, even for perfectly well-behaved integrands.
Example 72. Estimate \( \int_{0.5}^{1} x^x\,dx \) using the midpoint, trapezoidal, and Simpson's rules.
Solution. Let f (x) = xx . The midpoint estimate for the integral is 2hf (x0 ) where h =
(b − a)/2 = 1/4 and x0 = 0.75. Then the midpoint estimate, using 6 digits, is f (0.75)/2 =
0.805927/2 = 0.402964. The trapezoidal estimate is \( \frac{h}{2} [f(0.5) + f(1)] \) where h = 1/2, which
results in 1.707107/4 = 0.426777. Finally, for Simpson's rule, h = (b − a)/2 = 1/4, and thus the
estimate is
\[ \frac{h}{3} \left[ f(0.5) + 4f(0.75) + f(1) \right] = \frac{1}{12} \left[ 0.707107 + 4(0.805927) + 1 \right] = 0.410901. \]
Example 73. Find the constants c0 , c1 , and x1 so that the quadrature rule
\[ \int_0^1 f(x)\,dx \approx c_0 f(0) + c_1 f(x_1) \]
has the highest possible degree of accuracy.
Solution. We will find how many of the polynomials 1, x, x2 , ... the rule can integrate exactly. If
p(x) = 1, then
\[ \int_0^1 p(x)\,dx = c_0 p(0) + c_1 p(x_1) \Rightarrow 1 = c_0 + c_1 . \]
If p(x) = x, we get
\[ \int_0^1 p(x)\,dx = c_0 p(0) + c_1 p(x_1) \Rightarrow \frac{1}{2} = c_1 x_1 \]
and p(x) = x2 implies
\[ \int_0^1 p(x)\,dx = c_0 p(0) + c_1 p(x_1) \Rightarrow \frac{1}{3} = c_1 x_1^2 . \]
We have three unknowns and three equations, so we have to stop here. Solving the three equations
we get: c0 = 1/4, c1 = 3/4, x1 = 2/3. So the quadrature rule is of precision two and it is:
\[ \int_0^1 f(x)\,dx \approx \frac{1}{4} f(0) + \frac{3}{4} f\left( \frac{2}{3} \right). \]
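A quick numerical check of this rule (a sketch):

```python
# check the rule (1/4) f(0) + (3/4) f(2/3) on monomials
rule = lambda f: f(0)/4 + 3*f(2/3)/4

for k in range(5):
    exact = 1/(k + 1)                   # integral of x^k over [0, 1]
    print(k, exact, rule(lambda x: x**k))
# the rule matches the exact integral for k = 0, 1, 2, but not for k >= 3
```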
Exercise 4.1-2: Find c0 , c1 , c2 so that the quadrature rule \( \int_{-1}^{1} f(x)\,dx \approx c_0 f(-1) + c_1 f(0) + c_2 f(1) \) has degree of accuracy 2.
improving the accuracy significantly. Note that we have used five nodes, 0, 0.5, 1, 1.5, 2, which split
the domain (0, 2) into four subintervals.
The composite rules for midpoint, trapezoidal, and Simpson’s rule, with their error terms, are:
\[ \int_a^b f(x)\,dx = 2h \sum_{j=0}^{n/2} f(x_{2j}) + \frac{b-a}{6} h^2 f''(\xi) \tag{4.2} \]
\[ \int_a^b f(x)\,dx = \frac{h}{2} \left[ f(a) + 2 \sum_{j=1}^{n-1} f(x_j) + f(b) \right] - \frac{b-a}{12} h^2 f''(\xi) \tag{4.3} \]
\[ \int_a^b f(x)\,dx = \frac{h}{3} \left[ f(a) + 2 \sum_{j=1}^{n/2-1} f(x_{2j}) + 4 \sum_{j=1}^{n/2} f(x_{2j-1}) + f(b) \right] - \frac{b-a}{180} h^4 f^{(4)}(\xi) \tag{4.4} \]
Exercise 4.2-1: Show that the quadrature rule in Example 74 corresponds to taking n = 4
in the composite Simpson’s formula (4.4).
Exercise 4.2-2: Show that the absolute error for the composite trapezoidal rule decays at
the rate of 1/n2 , and the absolute error for the composite Simpson’s rule decays at the rate of 1/n4 ,
where n is the number of subintervals.
Example 75. Determine n that ensures the composite Simpson's rule approximates \( \int_1^2 x \log x\,dx \)
with an absolute error of at most 10−6 .
Solution. The error term for the composite Simpson's rule is \( \frac{b-a}{180} h^4 f^{(4)}(\xi) \) where ξ is some number
between a = 1 and b = 2, and h = (b − a)/n. Differentiate to get \( f^{(4)}(x) = \frac{2}{x^3} \). Then
\[ \left| \frac{b-a}{180} h^4 f^{(4)}(\xi) \right| = \frac{h^4}{180} \cdot \frac{2}{\xi^3} \le \frac{h^4}{90}. \]
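Setting h4 /90 ≤ 10−6 gives h ≤ (9 × 10−5 )1/4 ≈ 0.0974, so n = 1/h must be at least 11; since n must be even for Simpson's rule, n = 12 suffices. A sketch of the check, with a minimal composite Simpson implementation:

```python
import math

def composite_simpson(f, a, b, n):
    # composite Simpson's rule with n subintervals (n even)
    h = (b - a)/n
    s = f(a) + f(b)
    s += 4*sum(f(a + (2*j - 1)*h) for j in range(1, n//2 + 1))
    s += 2*sum(f(a + 2*j*h) for j in range(1, n//2))
    return s*h/3

exact = 2*math.log(2) - 3/4             # antiderivative x^2/2 log x - x^2/4
approx = composite_simpson(lambda x: x*math.log(x), 1, 2, 12)
print(abs(approx - exact))              # below the 1e-6 tolerance
```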
Trapezoidal rule
In [1]: def trap(f, a, b):
return (f(a)+f(b))*(b-a)/2
In [2]: trap(lambda x: x**x, 0.5, 1)
Out[2]: 0.42677669529663687
Simpson’s rule
In [3]: def simpson(f, a, b):
return (f(a)+4*f((a+b)/2)+f(b))*(b-a)/6
In [4]: simpson(lambda x: x**x, 0.5, 1)
Out[4]: 0.4109013813880978
Recall that the degree of accuracy of Simpson’s rule is 3. This means the rule integrates poly-
nomials 1, x, x2 , x3 exactly, but not x4 . We can use this as a way to verify our code:
In [5]: simpson(lambda x: x, 0, 1)
Out[5]: 0.5
In [6]: simpson(lambda x: x**2, 0, 1)
Out[6]: 0.3333333333333333
In [7]: simpson(lambda x: x**3, 0, 1)
Out[7]: 0.25
In [8]: simpson(lambda x: x**4, 0, 1)
Out[8]: 0.20833333333333334
Out[10]: 0.636294560831306
to within 10−4 , and compute the approximation, using the composite trapezoidal and composite
Simpson’s rule.
However, since h = (b − a)/n, the bound simplifies as hn = (b − a). Therefore no matter how large
n is, that is, how large the number of function evaluations is, the roundoff error is bounded by the
same constant (b − a) which only depends on the size of the interval.
Exercise 4.2-4: (This problem shows that numerical quadrature is stable with respect to
error in function values.) Assume the function values f (xi ) are approximated by f̃(xi ), so that
|f (xi ) − f̃(xi )| < ε for any xi ∈ (a, b). Find an upper bound on the error of numerical quadrature
\( \sum w_i f(x_i) \) when it is actually computed as \( \sum w_i \tilde{f}(x_i) \).
In Gaussian quadrature, we choose xi and wi in such a way that the quadrature rule has the highest possible accuracy. Note
that unlike Newton-Cotes formulas where we started labeling the nodes with x0 , in the Gaussian
quadrature the first node is x1 . This difference in notation is common in the literature, and each
choice makes the subsequent equations in the corresponding theory easier to read.
Example 76. Let (a, b) = (−1, 1), and n = 2. Find the “best” xi and wi .
There are four parameters to determine: x1 , x2 , w1 , w2 . We need four constraints. Let’s require
the rule to integrate the following functions exactly: f (x) = 1, f (x) = x, f (x) = x2 , and f (x) = x3 .
If the rule integrates f (x) = 1 exactly, then \( \int_{-1}^{1} dx = \sum_{i=1}^{2} w_i \), i.e., w1 + w2 = 2. If the rule
integrates f (x) = x exactly, then \( \int_{-1}^{1} x\,dx = \sum_{i=1}^{2} w_i x_i \), i.e., w1 x1 + w2 x2 = 0. Continuing this for
f (x) = x2 and f (x) = x3 , we obtain
\begin{align*} w_1 + w_2 &= 2\\ w_1 x_1 + w_2 x_2 &= 0\\ w_1 x_1^2 + w_2 x_2^2 &= \frac{2}{3}\\ w_1 x_1^3 + w_2 x_2^3 &= 0. \end{align*}
Solving the equations gives: \( w_1 = w_2 = 1 \), \( x_1 = -\frac{\sqrt{3}}{3} \), \( x_2 = \frac{\sqrt{3}}{3} \). Therefore the quadrature rule is:
\[ \int_{-1}^{1} f(x)\,dx \approx f\left( -\frac{\sqrt{3}}{3} \right) + f\left( \frac{\sqrt{3}}{3} \right). \]
Observe that:
• The accuracy of the rule is three, and it uses only two nodes. Recall that the accuracy of
Simpson’s rule is also three but it uses three nodes. In general Gaussian quadrature gives a
degree of accuracy of 2n − 1 using only n nodes.
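The two-point rule just derived can be checked directly (a sketch):

```python
import math

# two-point Gauss-Legendre rule on (-1, 1): nodes at ±1/sqrt(3), weights 1
def gauss2(f):
    t = 1/math.sqrt(3)
    return f(-t) + f(t)

# degree of accuracy 3: exact for 1, x, x^2, x^3, but not for x^4
print(gauss2(lambda x: x**2), 2/3)
print(gauss2(lambda x: x**4), 2/5)
```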
We were able to solve for the nodes and weights in the simple example above, however, as
the number of nodes increases, the resulting non-linear system of equations will be very diffi-
cult to solve. There is an alternative approach using the theory of orthogonal polynomials, a
topic we will discuss in detail later. Here, we will use a particular set of orthogonal polynomials
{L0 (x), L1 (x), ..., Ln (x), ...} called Legendre polynomials. We will give a definition of these poly-
nomials in the next chapter. For this discussion, we just need the following properties of these
polynomials:
\[ L_0(x) = 1, \quad L_1(x) = x, \quad L_2(x) = x^2 - \frac{1}{3}, \quad L_3(x) = x^3 - \frac{3}{5}x, \quad L_4(x) = x^4 - \frac{6}{7}x^2 + \frac{3}{35}. \]
How do these polynomials help with finding the nodes and the weights of the Gaussian quadrature
rule? The answer is short: the roots of the Legendre polynomials are the nodes of the quadrature
rule!
To summarize, the Gauss-Legendre quadrature rule for the integral of f over (−1, 1) is
\[ \int_{-1}^{1} f(x)\,dx = \sum_{i=1}^{n} w_i f(x_i) \]
where x1 , x2 , ..., xn are the roots of the nth Legendre polynomial, and the weights are computed
using the following theorem.
Theorem 77. Suppose that x1 , x2 , ..., xn are the roots of the nth Legendre polynomial Ln (x) and
the weights are given by
\[ w_i = \int_{-1}^{1} \prod_{j=1,\, j \neq i}^{n} \frac{x-x_j}{x_i-x_j}\,dx. \]
Then the Gauss-Legendre quadrature rule has degree of accuracy 2n − 1. In other words, if P (x) is
any polynomial of degree less than or equal to 2n − 1, then
\[ \int_{-1}^{1} P(x)\,dx = \sum_{i=1}^{n} w_i P(x_i). \]
Proof. Let’s start with a polynomial P (x) with degree less than n. Construct the Lagrange inter-
polant for P (x), using the nodes as x1 , ..., xn
\[ P(x) = \sum_{i=1}^{n} P(x_i) l_i(x) = \sum_{i=1}^{n} \left( \prod_{j=1,\, j \neq i}^{n} \frac{x-x_j}{x_i-x_j} \right) P(x_i). \]
There is no error term above because the error term depends on the nth derivative of P (x), but
P (x) is a polynomial of degree less than n, so that derivative is zero. Integrate both sides to get
\begin{align*} \int_{-1}^{1} P(x)\,dx &= \int_{-1}^{1} \left( \sum_{i=1}^{n} \prod_{j=1,\, j \neq i}^{n} \frac{x-x_j}{x_i-x_j}\, P(x_i) \right) dx = \sum_{i=1}^{n} \int_{-1}^{1} \prod_{j=1,\, j \neq i}^{n} \frac{x-x_j}{x_i-x_j}\, P(x_i)\,dx\\ &= \sum_{i=1}^{n} \left( \int_{-1}^{1} \prod_{j=1,\, j \neq i}^{n} \frac{x-x_j}{x_i-x_j}\,dx \right) P(x_i) = \sum_{i=1}^{n} w_i P(x_i). \end{align*}
Therefore the theorem is correct for polynomials of degree less than n. Now let P (x) be a polynomial
of degree greater than or equal to n, but less than or equal to 2n − 1. Divide P (x) by the Legendre
polynomial Ln (x) to get
P (x) = Q(x)Ln (x) + R(x).
Note that
\[ P(x_i) = Q(x_i) L_n(x_i) + R(x_i) = R(x_i), \]
since x1 , ..., xn are the roots of Ln . The quotient Q(x) has degree at most n − 1, and the Legendre
polynomial Ln (x) is orthogonal to every polynomial of lower degree, so \( \int_{-1}^{1} Q(x) L_n(x)\,dx = 0 \).
Since R(x) also has degree at most n − 1, the first part of the proof applies to R, and we obtain
\[ \int_{-1}^{1} P(x)\,dx = \int_{-1}^{1} R(x)\,dx = \sum_{i=1}^{n} w_i R(x_i) = \sum_{i=1}^{n} w_i P(x_i). \]
Table 4.1 displays the roots of the Legendre polynomials L2 , L3 , L4 , L5 and the corresponding
weights.
n   Roots                               Weights
2   1/√3 = 0.5773502692                 1
    −1/√3 = −0.5773502692               1
3   −(3/5)^{1/2} = −0.7745966692        5/9 = 0.5555555556
    0.0                                 8/9 = 0.8888888889
    (3/5)^{1/2} = 0.7745966692          5/9 = 0.5555555556
4   0.8611363116                        0.3478548451
    0.3399810436                        0.6521451549
    −0.3399810436                       0.6521451549
    −0.8611363116                       0.3478548451
5   0.9061798459                        0.2369268850
    0.5384693101                        0.4786286705
    0.0                                 0.5688888889
    −0.5384693101                       0.4786286705
    −0.9061798459                       0.2369268850
So far we discussed integrating functions over the interval (−1, 1). What if we have a different
Rb
integration domain? The answer is simple: change of variables! To compute a f (x)dx for any
a < b, we use the following change of variables:
\[ t = \frac{2x - a - b}{b - a} \iff x = \frac{1}{2} \left[ (b-a)t + a + b \right] \]
which gives
\[ \int_a^b f(x)\,dx = \frac{b-a}{2} \int_{-1}^{1} f\left( \frac{1}{2} \left[ (b-a)t + a + b \right] \right) dt. \]
For example, with a = 0.5 and b = 1, the integral of Example 72 becomes
\[ \int_{0.5}^{1} x^x\,dx = \frac{1}{4} \int_{-1}^{1} \left( \frac{t+3}{4} \right)^{\frac{t+3}{4}} dt. \]
For n = 2, the Gauss-Legendre rule gives
\[ \frac{1}{4} \int_{-1}^{1} \left( \frac{t+3}{4} \right)^{\frac{t+3}{4}} dt \approx \frac{1}{4} \left[ \left( \frac{-\frac{1}{\sqrt{3}}+3}{4} \right)^{\frac{-\frac{1}{\sqrt{3}}+3}{4}} + \left( \frac{\frac{1}{\sqrt{3}}+3}{4} \right)^{\frac{\frac{1}{\sqrt{3}}+3}{4}} \right] = 0.410759, \]
using six significant digits. We will next use Python for a five-node computation of the integral.
Now we compute \( \frac{1}{4} \int_{-1}^{1} \left( \frac{t+3}{4} \right)^{\frac{t+3}{4}} dt \) using the code:
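The listing is not preserved here; the following sketch generates the nodes and weights with NumPy's np.polynomial.legendre.leggauss (an assumption; the book builds them from Table 4.1) and reproduces the five-node value shown below:

```python
import numpy as np

def gauss_legendre(f, a, b, n):
    # n-node Gauss-Legendre rule on (a, b) via the change of variables
    t, w = np.polynomial.legendre.leggauss(n)
    x = ((b - a)*t + a + b)/2
    return (b - a)/2*np.sum(w*f(x))

print(gauss_legendre(lambda x: x**x, 0.5, 1, 5))
```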
Out[2]: 0.41081564812239885
The next theorem is about the error of the Gauss-Legendre rule. Its proof can be found in
Atkinson [3]. The theorem shows, in particular, that the degree of accuracy of the quadrature rule,
using n nodes, is 2n − 1.
Theorem 80. Let f ∈ C 2n [−1, 1]. The error of Gauss-Legendre rule satisfies
\[ \int_{-1}^{1} f(x)\,dx - \sum_{i=1}^{n} w_i f(x_i) = \frac{2^{2n+1} (n!)^4}{(2n+1) \left[ (2n)! \right]^2} \frac{f^{(2n)}(\xi)}{(2n)!} \]
for some ξ ∈ (−1, 1).
Using Stirling’s formula n! ∼ e−n nn (2πn)1/2 , where the symbol ∼ means the ratio of the two
sides converges to 1 as n → ∞, it can be shown that
\[ \frac{2^{2n+1} (n!)^4}{(2n+1) \left[ (2n)! \right]^2} \sim \frac{\pi}{4^n}. \]
This means the error of Gauss-Legendre rule decays at an exponential rate of 1/4n as opposed to,
for example, the polynomial rate of 1/n4 for composite Simpson’s rule.
Exercise 4.3-1: Prove that the sum of the weights in Gauss-Legendre quadrature is 2, for
any n.
Exercise 4.3-2: Approximate \( \int_1^{1.5} x^2 \log x\,dx \) using Gauss-Legendre rule with n = 2 and
n = 3. Compare the approximations to the exact value of the integral.
The domain R determines the difficulty in generalizing the one-dimensional formulas we learned
before. The simplest case would be a rectangular domain R = {(x, y)|a ≤ x ≤ b, c ≤ y ≤ d}. We
can then write the double integral as the iterated integral
\[ \iint_R f(x,y)\,dA = \int_a^b \left( \int_c^d f(x,y)\,dy \right) dx. \]
Apply the rule using n2 nodes to the inner integral to get the approximation
\[ \int_a^b \left( \sum_{j=1}^{n_2} w_j f(x, y_j) \right) dx \]
where the yj ’s are the nodes. Rewrite, by interchanging the integral and summation, to get
\[ \sum_{j=1}^{n_2} w_j \int_a^b f(x, y_j)\,dx \]
and apply the quadrature rule again, using n1 nodes, to get the approximation
\[ \sum_{j=1}^{n_2} w_j \left( \sum_{i=1}^{n_1} w_i f(x_i, y_j) \right). \]
For simplicity, we ignored the error term in the above derivation; however, its inclusion is straight-
forward.
For an example, let’s derive the two-dimensional Gauss-Legendre rule for the integral
\[ \int_0^1 \int_0^1 f(x,y)\,dy\,dx \tag{4.5} \]
using two nodes for each axis. Note that each integral has to be transformed to (−1, 1). Start with
the inner integral \( \int_0^1 f(x,y)\,dy \) and use
\[ t = 2y - 1, \quad dt = 2\,dy \]
to transform it to
\[ \frac{1}{2} \int_{-1}^{1} f\left( x, \frac{t+1}{2} \right) dt \]
and apply Gauss-Legendre rule with two nodes to get the approximation
\[ \frac{1}{2} \left( f\left( x, \frac{-1/\sqrt{3}+1}{2} \right) + f\left( x, \frac{1/\sqrt{3}+1}{2} \right) \right). \]
The integral (4.5) is then approximately
\[ \int_0^1 \frac{1}{2} \left( f\left( x, \frac{-1/\sqrt{3}+1}{2} \right) + f\left( x, \frac{1/\sqrt{3}+1}{2} \right) \right) dx. \]
Next, use the substitution
s = 2x − 1, ds = 2dx
to get
\[ \frac{1}{4} \int_{-1}^{1} \left( f\left( \frac{s+1}{2}, \frac{-1/\sqrt{3}+1}{2} \right) + f\left( \frac{s+1}{2}, \frac{1/\sqrt{3}+1}{2} \right) \right) ds. \]
Applying the Gauss-Legendre rule with two nodes to this integral gives the final approximation
\[ \frac{1}{4} \sum_{i=1}^{2} \sum_{j=1}^{2} f\left( \frac{s_i+1}{2}, \frac{t_j+1}{2} \right), \qquad s_i, t_j \in \left\{ -\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}} \right\}. \tag{4.6} \]
[Figure: the four Gauss-Legendre nodes (±1/√3, ±1/√3) plotted in the square (−1, 1) × (−1, 1).]
Similarly, applying Simpson's rule with n = 2 to the inner integral of (4.5) gives
\[ \int_0^1 \frac{1}{6} \left( f(x, 0) + 4 f(x, 0.5) + f(x, 1) \right) dx. \]
Apply Simpson’s rule again to this integral with n = 2 to obtain the final approximation:
\[ \frac{1}{6} \cdot \frac{1}{6} \Big[ f(0,0) + 4f(0,0.5) + f(0,1) + 4 \big( f(0.5,0) + 4f(0.5,0.5) + f(0.5,1) \big) + f(1,0) + 4f(1,0.5) + f(1,1) \Big]. \tag{4.7} \]
This integral can be evaluated exactly, and its value is 1. It is used as a test integral for numerical
quadrature rules. We evaluate equations (4.6) and (4.7) with \( f(x,y) = \frac{\pi}{2} \sin \pi x \cdot \frac{\pi}{2} \sin \pi y \).
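The evaluation of (4.6) and (4.7) for this test integral can be sketched as:

```python
import math

f = lambda x, y: (math.pi/2)*math.sin(math.pi*x)*(math.pi/2)*math.sin(math.pi*y)

# two-dimensional Gauss-Legendre approximation (4.6), two nodes per axis
r = 1/math.sqrt(3)
gauss = sum(f((s + 1)/2, (t + 1)/2) for s in (-r, r) for t in (-r, r))/4

# two-dimensional Simpson approximation (4.7)
simp = (f(0, 0) + 4*f(0, 0.5) + f(0, 1)
        + 4*(f(0.5, 0) + 4*f(0.5, 0.5) + f(0.5, 1))
        + f(1, 0) + 4*f(1, 0.5) + f(1, 1))/36
print(gauss, simp)                      # both close to the exact value 1
```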
The Monte Carlo method estimates the integral of f over a region R by \( \frac{\text{area}(R)}{n} \sum_{i=1}^{n} f(x_i) \),
where xi are pseudorandom vectors uniformly distributed in R. For the two-dimensional integral
we discussed before, the Monte Carlo estimate is
\[ \int_a^b \int_c^d f(x,y)\,dy\,dx \approx \frac{(b-a)(d-c)}{n} \sum_{i=1}^{n} f\big( a + (b-a)x_i,\; c + (d-c)y_i \big) \tag{4.8} \]
where xi , yi are pseudorandom numbers from (0, 1). In Python, the function np.random.rand()
generates a pseudorandom number from the uniform distribution between 0 and 1. The following
code takes the endpoints of the intervals a, b, c, d, and the number of nodes n, which is called the
sample size in the Monte Carlo literature, and returns the estimate for the integral using Equation
(4.8).
def mc(f, a, b, c, d, n):
    sum = 0.
    for i in range(n):
        sum += f(a+(b-a)*np.random.rand(),
                 c+(d-c)*np.random.rand())*(b-a)*(d-c)
    return sum/n
Now we use Monte Carlo to estimate the integral \( \int_0^1 \int_0^1 \frac{\pi}{2} \sin \pi x \cdot \frac{\pi}{2} \sin \pi y\,dy\,dx \):
In [2]: mc(lambda x, y: (np.pi**2/4)*np.sin(np.pi*x)*np.sin(np.pi*y), 0, 1, 0, 1, 500)
Out[2]: 0.9441778334708931
With n = 500, we obtain a Monte Carlo estimate of 0.944178. An advantage of the Monte
Carlo method is its simplicity: the above code can be easily generalized to higher dimensions. A
disadvantage of Monte Carlo is its slow rate of convergence, which is \( O(1/\sqrt{n}) \). Figure (4.3) displays
500 pseudorandom vectors from the unit square.
Example 81. Capstick & Keister [5] discuss some high dimensional test integrals, some with ana-
lytical solutions, motivated by physical applications such as the calculation of quantum mechanical
matrix elements in atomic, nuclear, and particle physics. One of the integrals with a known solution
is
\[ \int_{\mathbb{R}^s} \cos(\|t\|)\, e^{-\|t\|^2}\, dt_1\, dt_2 \cdots dt_s \]
where \( \|t\| = (t_1^2 + \ldots + t_s^2)^{1/2} \). This integral can be transformed to an integral over the s-dimensional
unit cube as
\[ \pi^{s/2} \int_{(0,1)^s} \cos \left( \left[ \frac{(F^{-1}(x_1))^2 + \ldots + (F^{-1}(x_s))^2}{2} \right]^{1/2} \right) dx_1\, dx_2 \cdots dx_s \tag{4.9} \]
where F −1 is the inverse of the cumulative distribution function of the standard normal distribution:
\[ F(x) = \frac{1}{(2\pi)^{1/2}} \int_{-\infty}^{x} e^{-s^2/2}\, ds. \]
The Monte Carlo estimate of (4.9) is
\[ \pi^{s/2}\, \frac{1}{n} \sum_{i=1}^{n} \cos \left( \left[ \frac{(F^{-1}(x_1^{(i)}))^2 + \ldots + (F^{-1}(x_s^{(i)}))^2}{2} \right]^{1/2} \right) \]
where \( x^{(i)} = (x_1^{(i)}, \ldots, x_s^{(i)}) \) is an s-dimensional vector of uniform random numbers between 0 and 1.
The following algorithm, known as the Beasley-Springer-Moro algorithm [8], gives an approximation
to F −1 (x).
def invnormal(u):
    # Beasley-Springer-Moro approximation to the inverse standard normal
    # cumulative distribution function; constants as tabulated in [8]
    a0, a1, a2, a3 = 2.50662823884, -18.61500062529, 41.39119773534, -25.44106049637
    b0, b1, b2, b3 = -8.47351093090, 23.08336743743, -21.06224101826, 3.13082909833
    c0 = 0.3374754822726147
    c1 = 0.9761690190917186
    c2 = 0.1607979714918209
    c3 = 0.0276438810333863
    c4 = 0.0038405729373609
    c5 = 0.0003951896511919
    c6 = 0.0000321767881768
    c7 = 0.0000002888167364
    c8 = 0.0000003960315187
    y = u-0.5
    if np.abs(y)<0.42:
        r = y*y
        x = y*(((a3*r+a2)*r+a1)*r+a0)/((((b3*r+b2)*r+b1)*r+b0)*r+1)
    else:
        r = u
        if y>0:
            r = 1-u
        r = np.log(-np.log(r))
        x = c0+r*(c1+r*(c2+r*(c3+r*(c4+r*(c5+r*(c6+r*(c7+r*c8)))))))
        if y<0:
            x = -x
    return x
The following is the Monte Carlo estimate of the integral. It takes the dimension s and the
sample size n as inputs.
The exact value of the integral for s = 25 is 1.356914 × 106 . The following code computes the
relative error of the Monte Carlo estimate with sample size n.
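The listings are not preserved here; a sketch of the estimate and its relative error, using scipy.stats.norm.ppf for F −1 instead of the Beasley-Springer-Moro approximation (an assumption, as are the seed and sample size), comparing magnitudes against the stated value:

```python
import numpy as np
from scipy.stats import norm

def mc_estimate(s, n, seed=1234):
    # Monte Carlo estimate of the integral (4.9)
    rng = np.random.default_rng(seed)
    x = rng.random((n, s))
    z = norm.ppf(x)                     # F^{-1} applied componentwise
    return np.pi**(s/2)*np.mean(np.cos(np.sqrt(np.sum(z**2, axis=1)/2)))

est = mc_estimate(25, 10**5)
relerror = abs(abs(est) - 1.356914e6)/1.356914e6   # compare magnitudes
print(est, relerror)
```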
Let’s plot the relative error of some Monte Carlo estimates. First, we generate sample sizes from
50,000 to 1,000,000 in increments of 50,000.
For each sample size, we compute the relative error, and then plot the results.
Figure 4.4: Monte Carlo relative error for the integral (4.9)
The latter integral can be evaluated using, for example, Simpson’s rule, provided f is smooth on
[0, π].
If the interval of integration is infinite, another approach that might work is truncation of the
interval to a finite one. The success of this approach depends on whether we can estimate the
resulting error.
Example 83. Consider the improper integral \( \int_0^\infty e^{-x^2} dx \). Write the integral as
\[ \int_0^\infty e^{-x^2} dx = \int_0^t e^{-x^2} dx + \int_t^\infty e^{-x^2} dx, \]
where t is the "level of truncation" to determine. We can estimate the first integral on the right-
hand side using a quadrature rule. The second integral on the right-hand side is the error due to
approximating \( \int_0^\infty e^{-x^2} dx \) by \( \int_0^t e^{-x^2} dx \). An upper bound for the error can be found easily for this
example: note that when x ≥ t, x2 = xx ≥ tx, thus
\[ \int_t^\infty e^{-x^2} dx \le \int_t^\infty e^{-tx} dx = \frac{e^{-t^2}}{t}. \]
When t = 5, \( e^{-t^2}/t \approx 10^{-12} \), therefore, approximating the integral \( \int_0^\infty e^{-x^2} dx \) by \( \int_0^5 e^{-x^2} dx \) will be
accurate within 10−12 . Additional error will come from estimating the latter integral by numerical
quadrature.
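The truncation strategy can be checked with a minimal composite Simpson implementation (a sketch; the choice of 100 subintervals is an assumption):

```python
import math

def composite_simpson(f, a, b, n):
    # composite Simpson's rule with n subintervals (n even)
    h = (b - a)/n
    s = f(a) + f(b)
    s += 4*sum(f(a + (2*j - 1)*h) for j in range(1, n//2 + 1))
    s += 2*sum(f(a + 2*j*h) for j in range(1, n//2))
    return s*h/3

# truncate the improper integral at t = 5 and integrate over [0, 5]
approx = composite_simpson(lambda x: math.exp(-x*x), 0, 5, 100)
print(abs(approx - math.sqrt(math.pi)/2))   # the exact value is sqrt(pi)/2
```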
\[ f'(x_0) \approx \frac{f(x_0 + h) - f(x_0)}{h} \]
for small h. What this formula lacks, however, is that it does not give any information about the error
of the approximation.
We will try another approach. Similar to Newton-Cotes quadrature, we will construct the
interpolating polynomial for f , and then use the derivative of the polynomial as an approximation
for the derivative of f .
Let’s assume f ∈ C 2 (a, b), x0 ∈ (a, b), and x0 + h ∈ (a, b). Construct the linear Lagrange
interpolating polynomial p1 (x) for the data (x0 , f (x0 )), (x1 , f (x1 )) = (x0 + h, f (x0 + h)). From
Theorem 51, we have
\begin{align*} f(x) &= \underbrace{\frac{x-x_1}{x_0-x_1} f(x_0) + \frac{x-x_0}{x_1-x_0} f(x_1)}_{p_1(x)} + \underbrace{\frac{f''(\xi(x))}{2!} (x-x_0)(x-x_1)}_{\text{interpolation error}}\\ &= \frac{x-(x_0+h)}{x_0-(x_0+h)} f(x_0) + \frac{x-x_0}{x_0+h-x_0} f(x_0+h) + \frac{f''(\xi(x))}{2!} (x-x_0)(x-x_0-h)\\ &= \frac{x-x_0-h}{-h} f(x_0) + \frac{x-x_0}{h} f(x_0+h) + \frac{f''(\xi(x))}{2!} (x-x_0)(x-x_0-h). \end{align*}
Differentiating both sides gives an expression for f'(x). We know ξ is between x0 and x0 + h;
however, we have no knowledge about ξ'(x), which appears in the derivative of the last term.
Fortunately, if we set x = x0 , the term with ξ'(x) vanishes and we get:
\[ f'(x_0) = \frac{f(x_0+h) - f(x_0)}{h} - \frac{h}{2} f''(\xi(x_0)). \]
This formula is called the forward-difference formula if h > 0 and backward-difference formula
if h < 0. Note that from this formula we can obtain a bound on the error
\[ \left| f'(x_0) - \frac{f(x_0+h) - f(x_0)}{h} \right| \le \frac{h}{2} \sup_{x \in (x_0, x_0+h)} \left| f''(x) \right|. \]
\begin{align*} f(x) &= \sum_{k=0}^{n} f(x_k) l_k(x) + \frac{(x-x_0)(x-x_1) \cdots (x-x_n)}{(n+1)!} f^{(n+1)}(\xi)\\ \Rightarrow f'(x) &= \sum_{k=0}^{n} f(x_k) l_k'(x) + \frac{d}{dx} \left[ \frac{(x-x_0)(x-x_1) \cdots (x-x_n)}{(n+1)!} \right] f^{(n+1)}(\xi)\\ &\quad + \frac{(x-x_0)(x-x_1) \cdots (x-x_n)}{(n+1)!} \frac{d}{dx} \left( f^{(n+1)}(\xi) \right). \end{align*}
If x = xj for j = 0, 1, ..., n, the last term vanishes, and using the following result
\[ \frac{d}{dx} \left[ (x-x_0)(x-x_1) \cdots (x-x_n) \right]_{x=x_j} = \prod_{k=0,\, k \neq j}^{n} (x_j - x_k) \]
we obtain
\[ f'(x_j) = \sum_{k=0}^{n} f(x_k) l_k'(x_j) + \frac{f^{(n+1)}(\xi(x_j))}{(n+1)!} \prod_{k=0,\, k \neq j}^{n} (x_j - x_k) \tag{4.10} \]
which is called the (n + 1)-point formula to approximate f'(xj ). The most common formulas use
n = 2 and n = 4. Here we discuss n = 2, that is, three-point formulas. The nodes are x0 , x1 , x2 .
The Lagrange basis polynomials and their derivatives are:
\begin{align*} l_0(x) &= \frac{(x-x_1)(x-x_2)}{(x_0-x_1)(x_0-x_2)} \Rightarrow l_0'(x) = \frac{2x-x_1-x_2}{(x_0-x_1)(x_0-x_2)}\\ l_1(x) &= \frac{(x-x_0)(x-x_2)}{(x_1-x_0)(x_1-x_2)} \Rightarrow l_1'(x) = \frac{2x-x_0-x_2}{(x_1-x_0)(x_1-x_2)}\\ l_2(x) &= \frac{(x-x_0)(x-x_1)}{(x_2-x_0)(x_2-x_1)} \Rightarrow l_2'(x) = \frac{2x-x_0-x_1}{(x_2-x_0)(x_2-x_1)} \end{align*}
These derivatives can be substituted in (4.10) to obtain the three-point formula. We can simplify
these formulas if the nodes are spaced equally, that is, x1 = x0 + h, x2 = x1 + h = x0 + 2h. Then,
we obtain
\begin{align} f'(x_0) &= \frac{1}{2h} \left[ -3f(x_0) + 4f(x_0+h) - f(x_0+2h) \right] + \frac{h^2}{3} f^{(3)}(\xi_0) \tag{4.11}\\ f'(x_0+h) &= \frac{1}{2h} \left[ -f(x_0) + f(x_0+2h) \right] - \frac{h^2}{6} f^{(3)}(\xi_1) \tag{4.12}\\ f'(x_0+2h) &= \frac{1}{2h} \left[ f(x_0) - 4f(x_0+h) + 3f(x_0+2h) \right] + \frac{h^2}{3} f^{(3)}(\xi_2). \tag{4.13} \end{align}
It turns out that the first and third equations ((4.11) and (4.13)) are equivalent. To see this,
first substitute x0 by x0 − 2h in the third equation to get (ignoring the error term)
\[ f'(x_0) = \frac{1}{2h} \left[ f(x_0-2h) - 4f(x_0-h) + 3f(x_0) \right], \]
and then replace h by −h, which recovers (4.11). Similarly, substituting x0 by x0 − h in the second
equation (4.12) gives the three-point midpoint formula
\[ f'(x_0) = \frac{1}{2h} \left[ f(x_0+h) - f(x_0-h) \right] - \frac{h^2}{6} f^{(3)}(\xi_1). \]
The three-point midpoint formula has some advantages: it has half the error of the endpoint formula,
and it has one less function evaluation. The endpoint formula is useful if one does not know the
value of f on one side, a situation that may happen if x0 is close to an endpoint.
Example 84. The following table gives the values of f (x) = sin x. Estimate f 0 (0.1), f 0 (0.3) using
an appropriate three-point formula.
x f (x)
0.1 0.09983
0.2 0.19867
0.3 0.29552
0.4 0.38942
Solution. To estimate f 0 (0.1), we set x0 = 0.1, and h = 0.1. Note that we can only use the
three-point endpoint formula.
$$f'(0.1) \approx \frac{1}{0.2}\left(-3(0.09983) + 4(0.19867) - 0.29552\right) = 0.99835.$$

To estimate $f'(0.3)$ we can use the three-point midpoint formula:

$$f'(0.3) \approx \frac{1}{0.2}\left(0.38942 - 0.19867\right) = 0.95375.$$
The correct answer is $\cos 0.3 = 0.955336$, and thus the absolute error is $1.59 \times 10^{-3}$. If we use the endpoint formula to estimate $f'(0.3)$ we set $h = -0.1$ and compute

$$f'(0.3) \approx \frac{1}{-0.2}\left(-3(0.29552) + 4(0.19867) - 0.09983\right) = 0.95855,$$

which has the larger absolute error $3.2 \times 10^{-3}$.
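The two estimates above are easy to check with a few lines of Python (a sketch, not code from the text):

```python
def endpoint3(f0, f1, f2, h):
    # three-point endpoint formula: f'(x0) ~ [-3f(x0) + 4f(x0+h) - f(x0+2h)] / (2h)
    return (-3*f0 + 4*f1 - f2) / (2*h)

def midpoint3(fminus, fplus, h):
    # three-point midpoint formula: f'(x0) ~ [f(x0+h) - f(x0-h)] / (2h)
    return (fplus - fminus) / (2*h)

# tabulated values of f(x) = sin(x)
print(endpoint3(0.09983, 0.19867, 0.29552, 0.1))  # ~ 0.99835
print(midpoint3(0.19867, 0.38942, 0.1))           # ~ 0.95375
```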
Estimate f 0 (1.01) using the noisy data, and compute its relative error. How does the relative
error compare with the relative error for the non-noisy data?
We next want to explore how to estimate the second derivative of f. A similar approach to
estimating f 0 can be taken and the second derivative of the interpolating polynomial can be used
as an approximation. Here we will discuss another approach, using Taylor expansions. Expand f
about x0 , and evaluate it at x0 + h and x0 − h:
$$f(x_0+h) = f(x_0) + h f'(x_0) + \frac{h^2}{2} f''(x_0) + \frac{h^3}{6} f^{(3)}(x_0) + \frac{h^4}{24} f^{(4)}(\xi_+)$$

$$f(x_0-h) = f(x_0) - h f'(x_0) + \frac{h^2}{2} f''(x_0) - \frac{h^3}{6} f^{(3)}(x_0) + \frac{h^4}{24} f^{(4)}(\xi_-)$$

where $\xi_+$ is between $x_0$ and $x_0+h$, and $\xi_-$ is between $x_0$ and $x_0-h$. Add the equations to get

$$f(x_0+h) + f(x_0-h) = 2f(x_0) + h^2 f''(x_0) + \frac{h^4}{24}\left[f^{(4)}(\xi_+) + f^{(4)}(\xi_-)\right].$$
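Solving this identity for $f''(x_0)$ and dropping the (bounded) error term gives the centered second-difference approximation $f''(x_0) \approx [f(x_0-h) - 2f(x_0) + f(x_0+h)]/h^2$, with error of order $h^2$. A quick numerical check (a sketch, not code from the text):

```python
import numpy as np

def second_diff(f, x0, h):
    # centered approximation to f''(x0), with O(h^2) error
    return (f(x0 - h) - 2*f(x0) + f(x0 + h)) / h**2

# f(x) = sin(x), so f''(0.5) = -sin(0.5)
print(second_diff(np.sin, 0.5, 0.01), -np.sin(0.5))
```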
What Arya’s engineering classmates want to do is compute the derivative information of the
black box, that is, f 0 (x), when x = 2. (The input to this black box can be any real number.)
Students want to use the three-point midpoint formula to estimate f 0 (2):
$$f'(2) \approx \frac{1}{2h}\left[f(2+h) - f(2-h)\right].$$
They debate how to pick h in this formula. One of them says they should make h as small as possible,
like 10⁻⁸. Arya is skeptical. She mutters to herself, "I know I slept through some of my numerical
analysis lectures, but not all!"
She tells her classmates about the cancellation of leading digits phenomenon, and to make her
point more convincing, she makes the following experiment: let f (x) = ex , and suppose we want
to compute f 0 (2) which is e2 . Arya uses the three-point midpoint formula above to estimate f 0 (2),
for various values of h, and for each case she computes the absolute value of the difference between
the three-point midpoint estimate and the exact solution e2 . She tabulates her results in the table
below. Clearly, smaller h does not result in smaller error. In this experiment, h = 10−6 gives the
smallest error.
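Arya's experiment is easy to reproduce. The sketch below (not the book's code) tabulates the absolute error of the three-point midpoint estimate of $f'(2) = e^2$ as h decreases; the error is smallest near $h = 10^{-6}$ and then grows again as roundoff takes over:

```python
import numpy as np

exact = np.exp(2)
errors = {}
for k in range(1, 13):
    h = 10.0 ** (-k)
    est = (np.exp(2 + h) - np.exp(2 - h)) / (2 * h)  # three-point midpoint formula
    errors[h] = abs(est - exact)
    print(f"h = {h:.0e}   error = {errors[h]:.2e}")
```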
Theoretical analysis
Numerical differentiation is a numerically unstable problem. To reduce the truncation error, we need
to decrease h, which in turn increases the roundoff error due to cancellation of significant digits in
the function difference calculation. Let e(x) denote the roundoff error in computing f (x) so that
f (x) = f˜(x) + e(x), where f˜ is the value computed by the computer. Consider the three-point
midpoint formula:
$$f'(x_0) - \frac{\tilde{f}(x_0+h) - \tilde{f}(x_0-h)}{2h} = f'(x_0) - \frac{f(x_0+h) - e(x_0+h) - f(x_0-h) + e(x_0-h)}{2h}$$

$$= f'(x_0) - \frac{f(x_0+h) - f(x_0-h)}{2h} + \frac{e(x_0-h) - e(x_0+h)}{2h}$$

$$= -\frac{h^2}{6} f^{(3)}(\xi) + \frac{e(x_0-h) - e(x_0+h)}{2h},$$

so that

$$\left| f'(x_0) - \frac{\tilde{f}(x_0+h) - \tilde{f}(x_0-h)}{2h} \right| \le \frac{h^2}{6} M + \frac{\epsilon}{h},$$
where we assumed $|f^{(3)}(\xi)| \le M$ and $|e(x)| \le \epsilon$. To reduce the truncation error $h^2 M/6$ one would decrease h, which would then result in an increase in the roundoff error $\epsilon/h$. An optimal value for h can be found with these assumptions using calculus: find the value of h that minimizes the function $s(h) = \frac{Mh^2}{6} + \frac{\epsilon}{h}$. The answer is $h = \sqrt[3]{3\epsilon/M}$.
Let's revisit the table Arya presented, where 10⁻⁶ was found to be the optimal value for h. The calculations were done using Python, which reports 15 digits when asked for e². Let's assume all these digits are correct, and thus let $\epsilon = 10^{-16}$. Since $f^{(3)}(x) = e^x$ and $e^2 \approx 7.4$, let's take M = 7.4. Then

$$h = \sqrt[3]{3\epsilon/M} = \sqrt[3]{3 \times 10^{-16}/7.4} \approx 3.4 \times 10^{-6},$$
which is in good agreement with the optimal value 10−6 of Arya’s numerical results.
Exercise 4.6-2: Find the optimal value for h that will minimize the error for the formula

$$f'(x_0) = \frac{f(x_0+h) - f(x_0)}{h} - \frac{h}{2} f''(\xi).$$

a) Consider estimating f'(1) where f(x) = x² using the above formula. What is the optimal value of h for estimating f'(1), assuming that the roundoff error is bounded by $\epsilon = 10^{-16}$ (which is the machine epsilon $2^{-53}$ in the 64-bit floating point representation)?
c) Discuss your findings in parts (a) and (b) and how they relate to each other.
Exercise 4.6-3: The function $\frac{1}{\sqrt{2\pi}} \int_0^x e^{-t^2/2}\,dt$ is related to the distribution function of the
standard normal random variable, a very important distribution in probability and statistics. Often
times we want to solve equations like
$$\frac{1}{\sqrt{2\pi}} \int_0^x e^{-t^2/2}\,dt = z \qquad (4.14)$$
for x, where z is some real number between 0 and 1. This can be done by using Newton’s method
to solve the equation f (x) = 0 where
$$f(x) = \frac{1}{\sqrt{2\pi}} \int_0^x e^{-t^2/2}\,dt - z.$$

Note that from the Fundamental Theorem of Calculus, $f'(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$. Newton's method will require
the calculation of

$$\frac{1}{\sqrt{2\pi}} \int_0^{p_k} e^{-t^2/2}\,dt \qquad (4.15)$$
where pk is a Newton iterate. This integral can be computed using numerical quadrature. Write
a Python code that takes z as its input, and outputs x, such that Equation (4.14) holds. In your
code, use the Python codes for Newton’s method and the composite Simpson’s rule you were given
in class. For Newton’s method set tolerance to 10−5 and p0 = 0.5, and for composite Simpson’s rule
take n = 10, when computing the integrals (4.15) that appear in Newton iteration. Then run your
code for z = 0.4 and z = 0.1, and report your output.
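One possible sketch of such a code, with simple stand-ins for the composite Simpson and Newton codes referenced above (which are not reproduced here):

```python
import numpy as np

def simpson(f, a, b, n):
    # composite Simpson's rule with n (even) subintervals
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    return h/3 * (y[0] + y[-1] + 4*np.sum(y[1:-1:2]) + 2*np.sum(y[2:-1:2]))

def F(x, z):
    # f(x) = (1/sqrt(2*pi)) * integral_0^x e^{-t^2/2} dt - z, with Simpson n = 10
    return simpson(lambda t: np.exp(-t**2/2) / np.sqrt(2*np.pi), 0.0, x, 10) - z

def solve_for_x(z, p0=0.5, tol=1e-5, maxiter=50):
    # Newton's method; f'(x) = e^{-x^2/2}/sqrt(2*pi) by the Fundamental Theorem of Calculus
    p = p0
    for _ in range(maxiter):
        fprime = np.exp(-p**2/2) / np.sqrt(2*np.pi)
        pnew = p - F(p, z) / fprime
        if abs(pnew - p) < tol:
            return pnew
        p = pnew
    return p

print(solve_for_x(0.4))  # ~ 1.2816
print(solve_for_x(0.1))  # ~ 0.2533
```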
Chapter 5
Approximation Theory
Arya’s professor thinks the relationship between the variables should be linear, but we do not see
data falling on a perfect line because of measurement error. The professor is not happy (professors are usually not happy when lab results act up) and asks Arya to come up with a linear formula,
something like y = ax + b, to explain the relationship. Arya first thinks about interpolation, but
quickly realizes it is not a good idea. (Why?). Let’s help Arya with her problem.
There are certainly many other choices we have for the line: we could increase or decrease the
slope a little, change the intercept a bit, and obtain multiple lines that have a visually good fit to
the data. The crucial question is, how can we decide which line is the "best" line, among all the
possible lines? If we can quantify how good the fit of a given line is to the data, and come up with
a notion for error, perhaps then we can find the line that minimizes this error.
Let’s generalize the problem a little. We have:
and we want to find a line that gives the “best” approximation to the data:
2. How do we find a, b that give the line with the "best" approximation?
Observe that for each xi , there is the corresponding yi of the data point, and f (xi ) = axi + b,
which is the predicted value by the linear approximation. We can measure error by considering the
deviations between the actual y coordinates and the predicted values:
There are several ways we can form a measure of error using these deviations, and each approach
gives a different line approximating the data. The best approximation means finding a, b that
minimizes the error measured in one of the following ways:
In this chapter we will discuss the least squares problem, the simplest one among the three options.
We want to minimize

$$E = \sum_{i=1}^{m} (y_i - ax_i - b)^2$$

over all possible values of a and b. A minimum occurs where the partial derivatives vanish:

$$\frac{\partial E}{\partial a} = 0 \quad \text{and} \quad \frac{\partial E}{\partial b} = 0.$$
We have:

$$\frac{\partial E}{\partial a} = \sum_{i=1}^{m} \frac{\partial}{\partial a}(y_i - ax_i - b)^2 = \sum_{i=1}^{m} (-2x_i)(y_i - ax_i - b) = 0$$

$$\frac{\partial E}{\partial b} = \sum_{i=1}^{m} \frac{\partial}{\partial b}(y_i - ax_i - b)^2 = \sum_{i=1}^{m} (-2)(y_i - ax_i - b) = 0,$$

which are called the normal equations. The solution to this system of equations is

$$a = \frac{m\sum_{i=1}^{m} x_i y_i - \sum_{i=1}^{m} x_i \sum_{i=1}^{m} y_i}{m\sum_{i=1}^{m} x_i^2 - \left(\sum_{i=1}^{m} x_i\right)^2}, \qquad b = \frac{\sum_{i=1}^{m} x_i^2 \sum_{i=1}^{m} y_i - \sum_{i=1}^{m} x_i y_i \sum_{i=1}^{m} x_i}{m\sum_{i=1}^{m} x_i^2 - \left(\sum_{i=1}^{m} x_i\right)^2}.$$
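These closed-form expressions are easy to check numerically; here is a sketch with made-up data (not data from the text):

```python
import numpy as np

def lsq_line(x, y):
    # slope a and intercept b from the normal equations
    m = len(x)
    sx, sy = np.sum(x), np.sum(y)
    sxx, sxy = np.sum(x * x), np.sum(x * y)
    denom = m * sxx - sx**2
    a = (m * sxy - sx * sy) / denom
    b = (sxx * sy - sxy * sx) / denom
    return a, b

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])  # roughly y = 2x + 1 with noise
a, b = lsq_line(x, y)
print(a, b)  # ~ 1.94, 1.09
```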
where m will usually be much larger than n. Similar to the above discussion, we want to minimize

$$E = \sum_{i=1}^{m} (y_i - P_n(x_i))^2 = \sum_{i=1}^{m} \left( y_i - \sum_{j=0}^{n} a_j x_i^j \right)^2$$

with respect to the parameters $a_n, a_{n-1}, \ldots, a_0$. For a minimum to occur, the necessary conditions are

$$\frac{\partial E}{\partial a_k} = 0 \;\Rightarrow\; -\sum_{i=1}^{m} y_i x_i^k + \sum_{j=0}^{n} a_j \left( \sum_{i=1}^{m} x_i^{k+j} \right) = 0$$

for $k = 0, 1, \ldots, n$ (we are skipping some algebra here!). The normal equations for polynomial
approximation are
$$\sum_{j=0}^{n} a_j \left( \sum_{i=1}^{m} x_i^{k+j} \right) = \sum_{i=1}^{m} y_i x_i^k \qquad (5.1)$$
for k = 0, 1, ..., n. This is a system of (n + 1) equations and (n + 1) unknowns. We can write this
system as a matrix equation
Aa = b (5.2)
where a is the unknown vector we are trying to find, and b is the constant vector

$$a = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix}, \qquad b = \begin{bmatrix} \sum_{i=1}^{m} y_i \\ \sum_{i=1}^{m} y_i x_i \\ \vdots \\ \sum_{i=1}^{m} y_i x_i^n \end{bmatrix}.$$
The equation Aa = b has a unique solution if the xi are distinct, and n ≤ m−1. Solving this equation
by computing the inverse matrix A−1 is not advisable, since there could be significant roundoff error.
Next, we will write a Python code for least squares approximation, and use the Python function
np.linalg.solve(A, b) to solve the matrix equation Aa = b for a. The np.linalg.solve function in
Python uses numerically optimized matrix factorizations, based on LAPACK routines, to solve
the matrix equation. More details on this topic can be found in Heath [10] (Chapter 3).
The function leastsqfit takes the x- and y-coordinates of the data, and the degree of the
polynomial we want to use, n, as inputs. It solves the matrix Equation (5.2).
A[i, j] = A[j, i]
a = np.linalg.solve(A, b)
return a
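Only the tail of leastsqfit survives above. A complete version consistent with the description (a reconstruction, not necessarily the book's exact code) might look like this:

```python
import numpy as np

def leastsqfit(x, y, n):
    # degree-n least squares fit via the normal equations (5.1)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    for k in range(n + 1):
        b[k] = np.sum(y * x**k)
        for j in range(k, n + 1):   # fill the upper triangle
            A[k, j] = np.sum(x**(k + j))
    for i in range(1, n + 1):       # mirror it: A is symmetric
        for j in range(i):
            A[i, j] = A[j, i]
    a = np.linalg.solve(A, b)
    return a
```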
Here is the data used to produce the first plot of the chapter, Arya's data:
The polynomial is −0.746667 + 3.15143x. The next function poly(x,a) takes the output of
a = leastsqfit, and evaluates the least squares polynomial at x.
For example, if we want to compute the least squares line at 3.5, we call the following functions:
Out[6]: 10.283333333333335
The next function computes the least squares error $E = \sum_{i=1}^{m} (y_i - p_n(x_i))^2$. It takes the output of a = leastsqfit, and the data, as inputs.
Out[8]: 2.6070476190476177
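The reported evaluation at 3.5 can be verified directly with a poly-style evaluator (a sketch; coefficients are ordered from the constant term up, as leastsqfit returns them):

```python
def poly(x, a):
    # evaluate a[0] + a[1]*x + ... + a[n]*x**n by Horner's rule
    p = 0.0
    for c in reversed(a):
        p = p * x + c
    return p

print(poly(3.5, [-0.746667, 3.15143]))  # ~ 10.2833, matching Out[6]
```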
Next we plot the least squares line and the data together.
Out[11]: 1.4869285714285714
Out[13]: 1.2664285714285732
Out[15]: 0.723214285714292
Finally, we try a fifth degree polynomial. Recall that the normal equations have a unique
solution when xi are distinct, and n ≤ m − 1. Since m = 6 in this example, n = 5 is the largest
degree with guaranteed unique solution.
The approximating polynomial of degree five is the interpolating polynomial! What is the least
squares error?
Now consider finding a function of the form $f(t) = a + bt + c\sin\frac{2\pi t}{365} + d\cos\frac{2\pi t}{365}$ (5.3) that has the best fit to some data $(t_1, T_1), \ldots, (t_m, T_m)$ in the least-squares sense. This function is
used in modeling weather temperature data, where t denotes time, and T denotes the temperature.
The following figure plots the daily maximum temperature during a period of 1,056 days, from 2016
until November 21, 2018, as measured by a weather station at Melbourne airport, Australia1 .
To find the best fit function of the form (5.3), we write the least squares error term
$$E = \sum_{i=1}^{m} (f(t_i) - T_i)^2 = \sum_{i=1}^{m} \left( a + bt_i + c\sin\frac{2\pi t_i}{365} + d\cos\frac{2\pi t_i}{365} - T_i \right)^2,$$
and set its partial derivatives with respect to the unknowns a, b, c, d to zero to obtain the normal
1. http://www.bom.gov.au/climate/data/
equations:

$$\frac{\partial E}{\partial a} = 0 \Rightarrow \sum_{i=1}^{m} 2\left(a + bt_i + c\sin\frac{2\pi t_i}{365} + d\cos\frac{2\pi t_i}{365} - T_i\right) = 0 \;\Rightarrow\; \sum_{i=1}^{m} \left(a + bt_i + c\sin\frac{2\pi t_i}{365} + d\cos\frac{2\pi t_i}{365} - T_i\right) = 0, \qquad (5.4)$$

$$\frac{\partial E}{\partial b} = 0 \Rightarrow \sum_{i=1}^{m} (2t_i)\left(a + bt_i + c\sin\frac{2\pi t_i}{365} + d\cos\frac{2\pi t_i}{365} - T_i\right) = 0 \;\Rightarrow\; \sum_{i=1}^{m} t_i\left(a + bt_i + c\sin\frac{2\pi t_i}{365} + d\cos\frac{2\pi t_i}{365} - T_i\right) = 0, \qquad (5.5)$$

$$\frac{\partial E}{\partial c} = 0 \Rightarrow \sum_{i=1}^{m} 2\sin\frac{2\pi t_i}{365}\left(a + bt_i + c\sin\frac{2\pi t_i}{365} + d\cos\frac{2\pi t_i}{365} - T_i\right) = 0 \;\Rightarrow\; \sum_{i=1}^{m} \sin\frac{2\pi t_i}{365}\left(a + bt_i + c\sin\frac{2\pi t_i}{365} + d\cos\frac{2\pi t_i}{365} - T_i\right) = 0, \qquad (5.6)$$

$$\frac{\partial E}{\partial d} = 0 \Rightarrow \sum_{i=1}^{m} 2\cos\frac{2\pi t_i}{365}\left(a + bt_i + c\sin\frac{2\pi t_i}{365} + d\cos\frac{2\pi t_i}{365} - T_i\right) = 0 \;\Rightarrow\; \sum_{i=1}^{m} \cos\frac{2\pi t_i}{365}\left(a + bt_i + c\sin\frac{2\pi t_i}{365} + d\cos\frac{2\pi t_i}{365} - T_i\right) = 0. \qquad (5.7)$$
Rearranging terms in equations (5.4, 5.5, 5.6, 5.7), we get a system of four equations and four
unknowns:

$$am + b\sum_{i=1}^{m} t_i + c\sum_{i=1}^{m}\sin\frac{2\pi t_i}{365} + d\sum_{i=1}^{m}\cos\frac{2\pi t_i}{365} = \sum_{i=1}^{m} T_i$$

$$a\sum_{i=1}^{m} t_i + b\sum_{i=1}^{m} t_i^2 + c\sum_{i=1}^{m} t_i\sin\frac{2\pi t_i}{365} + d\sum_{i=1}^{m} t_i\cos\frac{2\pi t_i}{365} = \sum_{i=1}^{m} T_i t_i$$

$$a\sum_{i=1}^{m}\sin\frac{2\pi t_i}{365} + b\sum_{i=1}^{m} t_i\sin\frac{2\pi t_i}{365} + c\sum_{i=1}^{m}\sin^2\frac{2\pi t_i}{365} + d\sum_{i=1}^{m}\sin\frac{2\pi t_i}{365}\cos\frac{2\pi t_i}{365} = \sum_{i=1}^{m} T_i\sin\frac{2\pi t_i}{365}$$

$$a\sum_{i=1}^{m}\cos\frac{2\pi t_i}{365} + b\sum_{i=1}^{m} t_i\cos\frac{2\pi t_i}{365} + c\sum_{i=1}^{m}\sin\frac{2\pi t_i}{365}\cos\frac{2\pi t_i}{365} + d\sum_{i=1}^{m}\cos^2\frac{2\pi t_i}{365} = \sum_{i=1}^{m} T_i\cos\frac{2\pi t_i}{365}$$
Next, we will use Python to load the data and define the matrices A, r, and then solve the equation
Ax = r, where x = [a, b, c, d]T .
We will use a package called pandas to import data. We import it in our notebook, along with
NumPy and Matplotlib:
We assume that the data which consists of temperatures is downloaded as a csv file in the same
directory where the Python notebook is stored. Make sure the data has no missing entries. The
function pd.read_csv imports the data into Python as a table (dataframe):
In [2]: df = pd.read_csv('WeatherData.csv')
df
Out[2]: Temp
0 28.7
1 27.5
2 28.2
3 24.5
4 25.6
... ...
1051 19.1
1052 27.1
1053 31.0
1054 27.0
1055 22.7
The next step is to store the part of the data we need as an array. In our table there is only
one column named Temp.
Let’s check the type of temp, its first entry, and its length:
In [4]: type(temp)
Out[4]: numpy.ndarray
In [5]: temp[0]
Out[5]: 28.7
In [6]: temp.size
Out[6]: 1056
There are 1,056 temperature values. The x-coordinates are the days, numbered t = 1, 2, ..., 1056.
Here is the array that stores these time values:
Next we define the matrix A, taking advantage of the fact that the matrix is symmetric. The
function np.sum(x) adds the entries of the array x.
In [8]: A = np.zeros((4,4))
A[0,0] = 1056
A[0,1] = np.sum(time)
A[0,2] = np.sum(np.sin(2*np.pi*time/365))
A[0,3] = np.sum(np.cos(2*np.pi*time/365))
A[1,1] = np.sum(time**2)
A[1,2] = np.sum(time*np.sin(2*np.pi*time/365))
A[1,3] = np.sum(time*np.cos(2*np.pi*time/365))
A[2,2] = np.sum(np.sin(2*np.pi*time/365)**2)
A[2,3] = np.sum(np.sin(2*np.pi*time/365)*np.cos(2*np.pi*time/365))
A[3,3] = np.sum(np.cos(2*np.pi*time/365)**2)
for i in range(1,4):
for j in range(i):
A[i,j] = A[j,i]
In [9]: A
Now we define the vector r. The function np.dot(x,y) takes the dot product of the arrays x, y.
For example, np.dot([1, 2, 3], [4, 5, 6]) = 1 × 4 + 2 × 5 + 3 × 6 = 32.
In [10]: r = np.zeros((4,1))
r[0] = np.sum(temp)
r[1] = np.dot(temp, time)
r[2] = np.dot(temp, np.sin(2*np.pi*time/365))
r[3] = np.dot(temp, np.cos(2*np.pi*time/365))
In [11]: r
Out[11]: array([[2.18615000e+04],
[1.13803102e+07],
[1.74207707e+03],
[2.78776127e+03]])
In [12]: np.linalg.solve(A, r)
Out[12]: array([[2.02897563e+01],
[1.16773021e-03],
[2.72116176e+00],
[6.88808561e+00]])
Recall that these constants are the values of a, b, c, d in the definition of f(t). Here is the best fitting function to the data:

$$f(t) = 20.2898 + 0.0012t + 2.7212\sin\frac{2\pi t}{365} + 6.8881\cos\frac{2\pi t}{365}.$$
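An equivalent and shorter route is to build the m×4 design matrix M and solve the least squares problem directly: the matrix A above is exactly MᵀM, and r is MᵀT. A sketch on synthetic data (the weather file is not included here, so we generate temperatures from made-up parameters plus noise):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(1, 1057, dtype=float)
true_params = np.array([20.3, 0.001, 2.7, 6.9])   # a, b, c, d (made up)
M = np.column_stack([np.ones_like(t), t,
                     np.sin(2*np.pi*t/365), np.cos(2*np.pi*t/365)])
T = M @ true_params + rng.normal(0.0, 2.0, t.size)  # synthetic temperatures

params, *_ = np.linalg.lstsq(M, T, rcond=None)
print(params)  # close to [20.3, 0.001, 2.7, 6.9]
```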
Linearizing data
For another example of non-polynomial least squares, consider finding the function f (x) = beax
with the best least squares fit to some data (x1 , y1 ), (x2 , y2 ), ..., (xm , ym ). We need to find a, b that
minimize
$$E = \sum_{i=1}^{m} (y_i - b e^{a x_i})^2.$$

One approach is to linearize: take the logarithm of both sides of

$$y = b e^{ax} \qquad (5.8)$$

to get $\log y = ax + \log b$, and rename the variables as $Y = \log y$, $B = \log b$. Then we obtain the expression
Y = ax + B (5.9)
which is a linear equation in the transformed variable. In other words, if the original variable y is
related to x via Equation (5.8), then Y = log y is related to x via a linear relationship given by
Equation (5.9). So, the new approach is to fit the least squares line Y = ax + B to the data
However, it is important to realize that the least squares fit to the transformed data is not
necessarily the same as the least squares fit to the original data. The reason is the deviations which
least squares minimize are distorted in a non-linear way by the transformation.
x 0 1 2 3 4 5
y 3 5 8 12 23 37
to which we will fit y = be^{ax} in the least-squares sense. The following table displays the data (x_i, log y_i), rounded to two digits:
x 0 1 2 3 4 5
Y = log y 1.1 1.6 2.1 2.5 3.1 3.6
In [2]: leastsqfit(x, y, 1)
Y = 0.5x + 1.1.
This equation corresponds to Equation (5.9), with a = 0.5 and B = 1.1. We want to obtain the
corresponding exponential Equation (5.8), where $b = e^B$. Since $e^{1.1} \approx 3$, the best fitting exponential
function to the data is y = 3ex/2 . The following graph plots y = 3ex/2 together with the data.
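The fit is easy to reproduce with a least squares line on (x, log y); a sketch (not the book's code):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 8.0, 12.0, 23.0, 37.0])

a, B = np.polyfit(x, np.log(y), 1)  # slope a, intercept B = log b
b = np.exp(B)
print(a, b)  # ~ 0.50 and 2.95, i.e. y ~ 3 e^{x/2}
```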
Exercise 5.1-1: Find the function of the form y = aex + b sin(4x) that best fits the data
below in the least squares sense.
x 1 2 3 4 5
y -4 6 -1 5 20
Exercise 5.1-2: Power-law type relationships are observed in many empirical data. Two
variables y, x are said to be related via a power-law if y = kxα , where k, α are some constants. The
following data2 lists the top 10 family names in the order of occurrence according to Census 2000.
Investigate whether relative frequency of occurrences and the rank of the name are related via a
power-law, by
a) Let y be the relative frequencies (number of occurrences divided by the total number of
occurrences), and x be the rank, that is, 1 through 10.
b) Use least squares to find a function of the form y = kxα . Use linearization.
c) Plot the data together with the best fitting function found in part (b).
polynomial will mean the polynomial that minimizes the least squares error:

$$E = \int_a^b \left( f(x) - \sum_{j=0}^{n} a_j x^j \right)^2 dx. \qquad (5.10)$$
2. https://www.census.gov/topics/population/genealogy/data/2000_surnames.html
$$\frac{\partial E}{\partial a_k} = \frac{\partial}{\partial a_k}\left[ \int_a^b f^2(x)\,dx - 2\int_a^b f(x)\sum_{j=0}^{n} a_j x^j\,dx + \int_a^b \left( \sum_{j=0}^{n} a_j x^j \right)^2 dx \right]$$

$$= -2\int_a^b f(x) x^k\,dx + 2\sum_{j=0}^{n} a_j \int_a^b x^{j+k}\,dx = 0,$$
which gives the (n + 1) normal equations for the continuous least squares problem:
$$\sum_{j=0}^{n} a_j \int_a^b x^{j+k}\,dx = \int_a^b f(x) x^k\,dx \qquad (5.11)$$
for k = 0, 1, ..., n. Note that the only unknowns in these equations are the aj ’s; hence this is a linear
system of equations. It is instructive to compare these normal equations with those of the discrete
least squares problem:
$$\sum_{j=0}^{n} a_j \left( \sum_{i=1}^{m} x_i^{k+j} \right) = \sum_{i=1}^{m} y_i x_i^k.$$
Example 86. Find the least squares polynomial approximation of degree 2 to f(x) = eˣ on (0, 2).

The normal equations are

$$\sum_{j=0}^{2} a_j \int_0^2 x^{j+k}\,dx = \int_0^2 e^x x^k\,dx, \qquad k = 0, 1, 2,$$

that is,

$$2a_0 + 2a_1 + \frac{8}{3}a_2 = e^2 - 1$$

$$2a_0 + \frac{8}{3}a_1 + 4a_2 = e^2 + 1$$

$$\frac{8}{3}a_0 + 4a_1 + \frac{32}{5}a_2 = 2e^2 - 2.$$
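The 3×3 system can be solved numerically; a sketch (not from the text):

```python
import numpy as np

e2 = np.exp(2)
A = np.array([[2, 2, 8/3],
              [2, 8/3, 4],
              [8/3, 4, 32/5]])
r = np.array([e2 - 1, e2 + 1, 2*e2 - 2])
a = np.linalg.solve(A, r)
print(a)  # ~ [1.1672, 0.0821, 1.4590]
```

so $P_2(x) \approx 1.1672 + 0.0821x + 1.4590x^2$; as a sanity check, $P_2(1) \approx 2.708$ is close to $e \approx 2.718$.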
The solution method we have discussed for the least squares problem by solving the normal
equations as a matrix equation has certain drawbacks:
• The integrals $\int_a^b x^{i+j}\,dx = \frac{b^{i+j+1} - a^{i+j+1}}{i+j+1}$ in the coefficient matrix give rise to a matrix equation that is prone to roundoff error.
• There is no easy way to go from Pn (x) to Pn+1 (x) (which we might want to do if we were not
satisfied by the approximation provided by Pn ).
There is a better approach to solving the discrete and continuous least squares problems, using the orthogonal polynomials we encountered in Gaussian quadrature. Both the discrete and continuous least squares problems try to find a polynomial $P_n(x) = \sum_{j=0}^{n} a_j x^j$ that satisfies some properties. Notice how the polynomial is written in terms of the monomial basis functions $x^j$, and recall how these basis functions caused numerical difficulties in interpolation. That was the reason we discussed different basis functions, like Lagrange and Newton, for the interpolation problem. So the idea is to write $P_n(x)$ in terms of some other basis functions,

$$P_n(x) = \sum_{j=0}^{n} a_j \phi_j(x),$$
which would then update the normal equations for continuous least squares (5.11) as

$$\sum_{j=0}^{n} a_j \int_a^b \phi_j(x)\phi_k(x)\,dx = \int_a^b f(x)\phi_k(x)\,dx$$

for $k = 0, 1, \ldots, n$. The normal equations for the discrete least squares (5.1) get a similar update:

$$\sum_{j=0}^{n} a_j \left( \sum_{i=1}^{m} \phi_j(x_i)\phi_k(x_i) \right) = \sum_{i=1}^{m} y_i \phi_k(x_i).$$
Going forward, the crucial observation is that the integral of the product of two functions, $\int \phi_j(x)\phi_k(x)\,dx$, or the summation of the product of two functions evaluated at some discrete points, $\sum \phi_j(x_i)\phi_k(x_i)$, can be viewed as an inner product $\langle \phi_j, \phi_k \rangle$ of two vectors in a suitably defined vector space. And when the functions (vectors) $\phi_j$ are orthogonal, the inner product $\langle \phi_j, \phi_k \rangle$ is 0 if $j \neq k$, which makes the normal equations trivial to solve. We will discuss the details in the next section.
Let's recall the definition of an inner product: it is a real-valued function with the following properties:

1. $\langle f, g \rangle = \langle g, f \rangle$
The mysterious function w(x) in (5.12) is called a weight function. Its job is to assign different
importance to different regions of the interval [a, b]. The weight function is not arbitrary; it has to
satisfy some properties.
With our new terminology and set-up, we can write the least squares problem as follows:
Problem (Continuous least squares). Given $f \in C^0[a, b]$, find a polynomial $P_n(x) \in \mathbf{P}_n$ that minimizes

$$\int_a^b w(x)(f(x) - P_n(x))^2\,dx = \langle f(x) - P_n(x), f(x) - P_n(x) \rangle.$$

We will see this inner product can be calculated easily if $P_n(x)$ is written as a linear combination of orthogonal basis polynomials: $P_n(x) = \sum_{j=0}^{n} a_j \phi_j(x)$.
We need some definitions and theorems to continue with our quest. Let’s start with a formal
definition of orthogonal functions.
Definition 88. Functions $\{\phi_0, \phi_1, \ldots, \phi_n\}$ are orthogonal for the interval $[a, b]$ and with respect to the weight function $w(x)$ if

$$\langle \phi_j, \phi_k \rangle = \int_a^b w(x)\phi_j(x)\phi_k(x)\,dx = \begin{cases} 0 & \text{if } j \neq k \\ \alpha_j > 0 & \text{if } j = k \end{cases}$$

where $\alpha_j$ is some constant. If, in addition, $\alpha_j = 1$ for all $j$, then the functions are called orthonormal.
How can we find an orthogonal or orthonormal basis for our vector space? Gram-Schmidt
process from linear algebra provides the answer.
Theorem 89 (Gram-Schmidt process). Given a weight function $w(x)$, the Gram-Schmidt process constructs a unique set of polynomials $\phi_0(x), \phi_1(x), \ldots, \phi_n(x)$, where the degree of $\phi_i(x)$ is $i$, such that

$$\langle \phi_j, \phi_k \rangle = \begin{cases} 0 & \text{if } j \neq k \\ 1 & \text{if } j = k. \end{cases}$$
Let’s discuss two orthogonal polynomials that can be obtained from the Gram-Schmidt process
using different weight functions.
Example 90 (Legendre polynomials). If $w(x) \equiv 1$ and $[a, b] = [-1, 1]$, the first four polynomials obtained from the Gram-Schmidt process, when the process is applied to the monomials $1, x, x^2, x^3, \ldots$, are

$$\phi_0(x) = \sqrt{\frac{1}{2}}, \quad \phi_1(x) = \sqrt{\frac{3}{2}}\,x, \quad \phi_2(x) = \sqrt{\frac{5}{2}}\left(\frac{3x^2 - 1}{2}\right), \quad \phi_3(x) = \sqrt{\frac{7}{2}}\left(\frac{5x^3 - 3x}{2}\right).$$

Often these polynomials are written in their orthogonal (rather than orthonormal) form; that is, we drop the requirement $\langle \phi_j, \phi_j \rangle = 1$ in the Gram-Schmidt process, and we scale the polynomials so that the value of each polynomial at 1 equals 1. The first four polynomials in that form are

$$L_0(x) = 1, \quad L_1(x) = x, \quad L_2(x) = \frac{3}{2}x^2 - \frac{1}{2}, \quad L_3(x) = \frac{5}{2}x^3 - \frac{3}{2}x.$$
These are the Legendre polynomials, polynomials we first discussed in Gaussian quadrature, Section 4.3.³ They can be obtained from the following recursion

$$L_{n+1}(x) = \frac{2n+1}{n+1}\,x\,L_n(x) - \frac{n}{n+1}\,L_{n-1}(x),$$
3. The Legendre polynomials in Section 4.3 differ from these by a constant factor. For example, in Section 4.3 the third polynomial was $L_2(x) = x^2 - \frac{1}{3}$, but here it is $L_2(x) = \frac{3}{2}\left(x^2 - \frac{1}{3}\right)$. Observe that multiplying these polynomials by a constant does not change their roots (which is what we were interested in for Gaussian quadrature) or their orthogonality.
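The recursion is straightforward to implement with coefficient arrays; a sketch using NumPy's convention that index j holds the coefficient of $x^j$ (not the book's code):

```python
import numpy as np
from numpy.polynomial import polynomial as P

def legendre_coeffs(n):
    # coefficient arrays of L_0, ..., L_n from the three-term recursion
    Ls = [np.array([1.0]), np.array([0.0, 1.0])]  # L0 = 1, L1 = x
    for k in range(1, n):
        nxt = P.polysub((2*k + 1)/(k + 1) * P.polymulx(Ls[k]),
                        (k/(k + 1)) * Ls[k - 1])
        Ls.append(nxt)
    return Ls[:n + 1]

Ls = legendre_coeffs(3)
print(Ls[2])  # coefficients of (3x^2 - 1)/2
print(Ls[3])  # coefficients of (5x^3 - 3x)/2
```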
Exercise 5.3-1: Show, by direct integration, that the Legendre polynomials L1 (x) and L2 (x)
are orthogonal.
Example 91 (Chebyshev polynomials). If we take $w(x) = (1 - x^2)^{-1/2}$ and $[a, b] = [-1, 1]$, and again drop the orthonormal requirement in Gram-Schmidt, we obtain the following orthogonal polynomials:

$$T_0(x) = 1, \quad T_1(x) = x, \quad T_2(x) = 2x^2 - 1, \quad T_3(x) = 4x^3 - 3x, \ldots$$

These polynomials are called Chebyshev polynomials and satisfy a curious identity: $T_n(x) = \cos(n \cos^{-1} x)$. They also satisfy the recursion $T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x)$ for $n = 1, 2, \ldots$, and

$$\langle T_j, T_k \rangle = \begin{cases} 0 & \text{if } j \neq k \\ \pi & \text{if } j = k = 0 \\ \pi/2 & \text{if } j = k > 0. \end{cases}$$
If we take the first n + 1 Legendre or Chebyshev polynomials, call them φ0 , ..., φn , then these
polynomials form a basis for the vector space Pn . In other words, they form a linearly independent
set of functions, and any polynomial from Pn can be written as a unique linear combination of
them. These statements follow from the following theorem, which we will leave unproved.
Theorem 92. 1. If $\phi_j(x)$ is a polynomial of degree $j$ for $j = 0, 1, \ldots, n$, then $\phi_0, \ldots, \phi_n$ are linearly independent.

2. If $\phi_0, \ldots, \phi_n$ are linearly independent in $\mathbf{P}_n$, then for any $q(x) \in \mathbf{P}_n$ there exist unique constants $c_0, \ldots, c_n$ such that $q(x) = \sum_{j=0}^{n} c_j \phi_j(x)$.
Exercise 5.3-2: Prove that if {φ0 , φ1 , ..., φn } is a set of orthogonal functions, then they must
be linearly independent.
We have developed what we need to solve the least squares problem using orthogonal polyno-
mials. Let’s go back to the problem statement:
Given $f \in C^0[a, b]$, find a polynomial $P_n(x) \in \mathbf{P}_n$ that minimizes

$$E = \int_a^b w(x)(f(x) - P_n(x))^2\,dx = \langle f(x) - P_n(x), f(x) - P_n(x) \rangle,$$

with $P_n(x)$ written as a linear combination of orthogonal basis polynomials: $P_n(x) = \sum_{j=0}^{n} a_j \phi_j(x)$.
In the previous section, we solved this problem using calculus by taking the partial derivatives of E
with respect to aj and setting them equal to zero. Now we will use linear algebra:
$$E = \left\langle f - \sum_{j=0}^{n} a_j\phi_j,\; f - \sum_{j=0}^{n} a_j\phi_j \right\rangle = \langle f, f \rangle - 2\sum_{j=0}^{n} a_j \langle f, \phi_j \rangle + \sum_{i}\sum_{j} a_i a_j \langle \phi_i, \phi_j \rangle$$

$$= \|f\|^2 - 2\sum_{j=0}^{n} a_j \langle f, \phi_j \rangle + \sum_{j=0}^{n} a_j^2 \langle \phi_j, \phi_j \rangle = \|f\|^2 - 2\sum_{j=0}^{n} a_j \langle f, \phi_j \rangle + \sum_{j=0}^{n} a_j^2 \alpha_j$$

$$= \|f\|^2 - \sum_{j=0}^{n} \frac{\langle f, \phi_j \rangle^2}{\alpha_j} + \sum_{j=0}^{n} \left( \frac{\langle f, \phi_j \rangle}{\sqrt{\alpha_j}} - a_j\sqrt{\alpha_j} \right)^2.$$

Minimizing this expression with respect to $a_j$ is now obvious: simply choose $a_j = \frac{\langle f, \phi_j \rangle}{\alpha_j}$, so that the last summation, $\sum_{j=0}^{n} \left( \frac{\langle f, \phi_j \rangle}{\sqrt{\alpha_j}} - a_j\sqrt{\alpha_j} \right)^2$, vanishes. Then we have solved the least squares problem: the minimizer is

$$P_n(x) = \sum_{j=0}^{n} \frac{\langle f, \phi_j \rangle}{\alpha_j}\,\phi_j(x) \qquad (5.13)$$

with the corresponding error

$$E = \|f\|^2 - \sum_{j=0}^{n} \frac{\langle f, \phi_j \rangle^2}{\alpha_j}.$$
If the polynomials φ0 , ..., φn are orthonormal, then these equations simplify by setting αj = 1.
We will appreciate the ease at which Pn (x) can be computed using this approach, via formula
(5.13), as opposed to solving the normal equations of (5.11) when we discuss some examples. But
first let’s see the other advantage of this approach: how Pn+1 (x) can be computed from Pn (x). In
$$P_{n+1}(x) = \sum_{j=0}^{n+1} \frac{\langle f, \phi_j \rangle}{\alpha_j}\,\phi_j(x) = \underbrace{\sum_{j=0}^{n} \frac{\langle f, \phi_j \rangle}{\alpha_j}\,\phi_j(x)}_{P_n(x)} + \frac{\langle f, \phi_{n+1} \rangle}{\alpha_{n+1}}\,\phi_{n+1}(x) = P_n(x) + \frac{\langle f, \phi_{n+1} \rangle}{\alpha_{n+1}}\,\phi_{n+1}(x),$$
Example 93. Find the least squares polynomial approximation of degree three to f (x) = ex on
(−1, 1) using Legendre polynomials.
Therefore

$$P_3(x) = \frac{2.35}{2} + \frac{3(0.7358)}{2}\,x + \frac{5(0.1431)}{2}\left(\frac{3}{2}x^2 - \frac{1}{2}\right) + \frac{7(0.02013)}{2}\left(\frac{5}{2}x^3 - \frac{3}{2}x\right)$$

$$= 0.1761x^3 + 0.5366x^2 + 0.9980x + 0.9961.$$
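The coefficients in this example can be reproduced with NumPy's Legendre utilities instead of the book's gauss function; a sketch (the small differences from the numbers above come from the rounded inner products 2.35, 0.7358, etc. used in the text):

```python
import numpy as np
from numpy.polynomial import legendre as L

nodes, weights = L.leggauss(20)  # Gauss-Legendre quadrature on [-1, 1]
fvals = np.exp(nodes)

coeffs = []
for j in range(4):
    Lj = L.Legendre.basis(j)(nodes)
    inner = np.sum(weights * fvals * Lj)  # <e^x, L_j>
    alpha = 2 / (2*j + 1)                 # <L_j, L_j>
    coeffs.append(inner / alpha)

# convert the Legendre series to powers of x: a0 + a1 x + a2 x^2 + a3 x^3
print(L.leg2poly(coeffs))  # close to [0.9961, 0.9980, 0.5366, 0.1761]
```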
Example 94. Find the least squares polynomial approximation of degree three to f (x) = ex on
(−1, 1) using Chebyshev polynomials.
$$P_3(x) = \sum_{j=0}^{3} \frac{\langle f, \phi_j \rangle}{\alpha_j}\,\phi_j(x),$$

but now $\phi_j$ and $\alpha_j$ will be replaced by $T_j$, the Chebyshev polynomials, and their corresponding constants; see Example 91. We have

$$\langle e^x, T_j \rangle = \int_{-1}^{1} \frac{e^x T_j(x)}{\sqrt{1 - x^2}}\,dx,$$

which is an improper integral, due to the singularity of the weight function at the endpoints. However, we can use the substitution $\theta = \cos^{-1} x$ to rewrite the integral as (see Section 4.5)

$$\langle e^x, T_j \rangle = \int_{-1}^{1} \frac{e^x T_j(x)}{\sqrt{1 - x^2}}\,dx = \int_0^{\pi} e^{\cos\theta}\cos(j\theta)\,d\theta.$$
The transformed integral is no longer improper, and its integrand is smooth, so we can use the composite Simpson's rule to estimate it. The following estimates are obtained by taking n = 20 in the composite Simpson's rule:

$$\langle e^x, T_0 \rangle = \int_0^{\pi} e^{\cos\theta}\,d\theta = 3.977$$

$$\langle e^x, T_1 \rangle = \int_0^{\pi} e^{\cos\theta}\cos\theta\,d\theta = 1.775$$

$$\langle e^x, T_2 \rangle = \int_0^{\pi} e^{\cos\theta}\cos 2\theta\,d\theta = 0.4265$$

$$\langle e^x, T_3 \rangle = \int_0^{\pi} e^{\cos\theta}\cos 3\theta\,d\theta = 0.06964$$
Therefore

$$P_3(x) = \frac{3.977}{\pi} + \frac{2(1.775)}{\pi}\,x + \frac{2(0.4265)}{\pi}(2x^2 - 1) + \frac{2(0.06964)}{\pi}(4x^3 - 3x) = 0.1773x^3 + 0.5431x^2 + 0.9970x + 0.9944.$$
In Python, the Legendre polynomials can be generated from the recursion

$$L_{n+1}(x) = \frac{2n+1}{n+1}\,x\,L_n(x) - \frac{n}{n+1}\,L_{n-1}(x).$$
Out[5]: 0.1431256282441218
Now that we have a code leg(x,n) that generates the Legendre polynomials, we can do the above computation without explicitly specifying the Legendre polynomial. For example, since $\frac{3}{2}x^2 - \frac{1}{2} = L_2$, we can apply the gauss function directly to $L_2(x)e^x$:
Out[6]: 0.14312562824412176
The following function polyLegCoeff(f,n) computes the coefficients $\frac{\langle f, L_j \rangle}{\alpha_j}$, $j = 0, 1, \ldots, n$, of the least squares polynomial $P_n(x) = \sum_{j=0}^{n} \frac{\langle f, L_j \rangle}{\alpha_j} L_j(x)$, for any f and n, where the $L_j$ are the Legendre polynomials. The coefficients are the outputs that are returned when the function is finished. Once the coefficients are computed, evaluating the polynomial can be done efficiently by using the coefficients. The next function polyLeg(x,n,A) evaluates the least squares polynomial $P_n(x)$ at x, where the coefficients $\frac{\langle f, L_j \rangle}{\alpha_j}$, stored in A, are obtained from calling the function polyLegCoeff(f,n).
Here we plot y = ex together with its least squares polynomial approximations of degree two and
three, using Legendre polynomials. Note that every time the function polyLegCoeff is evaluated,
a new coefficient array A is obtained.
The integrals in Example 94 were computed using the composite Simpson's rule with n = 20. For example, the second integral $\langle e^x, T_1 \rangle = \int_0^{\pi} e^{\cos\theta}\cos\theta\,d\theta$ is computed as:
Out[13]: 1.7754996892121808
Next we write two functions, polyChebCoeff(f,n) and polyCheb(x,n,A). The first function computes the coefficients $\frac{\langle f, T_j \rangle}{\alpha_j}$, $j = 0, 1, \ldots, n$, of the least squares polynomial $P_n(x) = \sum_{j=0}^{n} \frac{\langle f, T_j \rangle}{\alpha_j} T_j(x)$, for any f and n, where the $T_j$ are the Chebyshev polynomials. The coefficients are returned as the output of the first function. The integral $\langle f, T_j \rangle$ is transformed to the integral $\int_0^{\pi} f(\cos\theta)\cos(j\theta)\,d\theta$, similar to the derivation in Example 94, and then the transformed integral is computed using the composite Simpson's rule by polyChebCoeff.
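The transformed integrals are easy to check; a sketch of the Simpson computation behind Out[13] (assuming a composite Simpson implementation like the one used earlier in the chapter):

```python
import numpy as np

def simpson(f, a, b, n):
    # composite Simpson's rule with n (even) subintervals
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    return h/3 * (y[0] + y[-1] + 4*np.sum(y[1:-1:2]) + 2*np.sum(y[2:-1:2]))

# <e^x, T_1> = integral_0^pi e^{cos(theta)} cos(theta) d(theta)
val = simpson(lambda th: np.exp(np.cos(th)) * np.cos(th), 0.0, np.pi, 20)
print(val)  # ~ 1.7755
```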
Next we plot y = ex together with polynomial approximations of degree two and three using
Chebyshev basis polynomials.
The cubic Legendre and Chebyshev approximations are difficult to distinguish from the function
itself. Let’s compare the quadratic approximations obtained by Legendre and Chebyshev polyno-
mials. Below, you can see visually that Chebyshev does a better approximation at the end points
of the interval. Is this expected?
In the following, we compare second degree least squares polynomial approximations for $f(x) = e^{x^2}$. Compare how good the Legendre and Chebyshev polynomial approximations are in the mid-interval and toward the endpoints.
Exercise 5.3-3: Use Python to compute the least squares polynomial approximations
P2 (x), P4 (x), P6 (x) to sin 4x using Chebyshev basis polynomials. Plot the polynomials together
with sin 4x.
References
[1] Abramowitz, M., and Stegun, I.A., 1965. Handbook of mathematical functions: with formulas,
graphs, and mathematical tables (Vol. 55). Courier Corporation.
[2] Chace, A.B., and Manning, H.P., 1927. The Rhind Mathematical Papyrus: British Museum
10057 and 10058. Vol 1. Mathematical Association of America.
[3] Atkinson, K.E., 1989. An Introduction to Numerical Analysis, Second Edition, John Wiley &
Sons.
[4] Burden, R.L, Faires, D., and Burden, A.M., 2016. Numerical Analysis, 10th Edition, Cengage.
[5] Capstick, S., and Keister, B.D., 1996. Multidimensional quadrature algorithms at higher degree
and/or dimension. Journal of Computational Physics, 123(2), pp.267-273.
[6] Chan, T.F., Golub, G.H., and LeVeque, R.J., 1983. Algorithms for computing the sample
variance: Analysis and recommendations. The American Statistician, 37(3), pp.242-247.
[7] Cheney, E.W., and Kincaid, D.R., 2012. Numerical mathematics and computing. Cengage
Learning.
[8] Glasserman, P., 2013. Monte Carlo methods in Financial Engineering. Springer.
[9] Goldberg, D., 1991. What every computer scientist should know about floating-point arith-
metic. ACM Computing Surveys (CSUR), 23(1), pp.5-48.
[10] Heath, M.T., 2002. Scientific Computing: An Introductory Survey, Second Edition, McGraw-Hill.
[11] Higham, N.J., 1993. The accuracy of floating point summation. SIAM Journal on Scientific
Computing, 14(4), pp.783-799.
[12] Isaacson, E., and Keller, H.B., 1966. Analysis of Numerical Methods. John Wiley & Sons.
Index
Quadratic convergence, 45
Relative error, 30
Representation of integers, 25
Rhind papyrus, 41
Rolle’s theorem, 96
generalized, 96
Root-finding, 41
Rounding, 29
Runge’s function, 92
Secant method, 58
error theorem, 59
Python code, 60
Significant digits, 30
Simpson’s rule, 123
Spline interpolation, 103
clamped cubic, 106
cubic, 105
Python code, 109
linear, 104
natural cubic, 106
quadratic, 105
Runge’s function, 112
Stirling’s formula, 136
Subnormal numbers, 24
Superlinear convergence, 45