Numerical Methods
Jeffrey R. Chasnov
Adapted for: Numerical Methods for Engineers
Copyright © 2012 by Jeffrey Robert Chasnov
This work is licensed under the Creative Commons Attribution 3.0 Hong Kong License. To
view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/hk/ or send
a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105,
USA.
Preface
What follows were my lecture notes for Math 3311: Introduction to Numerical Meth-
ods, taught at the Hong Kong University of Science and Technology. Math 3311,
with two lecture hours per week, was primarily for non-mathematics majors and
was required by several engineering departments.
I also have some free online courses on Coursera. A lot of time and effort has gone
into their production, and the video lectures for these courses are of high quality.
You can explore these courses on the Coursera website. For example, if you want to learn vector calculus (also known as multivariable calculus, or calculus three), you can sign up for the corresponding course there.
Jeffrey R. Chasnov
Hong Kong
February 2021
Contents
1 IEEE Arithmetic 1
1.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Numbers with a decimal or binary point . . . . . . . . . . . . . . . . . 1
1.3 Examples of binary numbers . . . . . . . . . . . . . . . . . . . . . . . . 1
1.4 Hex numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.5 4-bit unsigned integers as hex numbers . . . . . . . . . . . . . . . . . . 1
1.6 IEEE single precision format: . . . . . . . . . . . . . . . . . . . . . . . . 2
1.7 Special numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.8 Examples of computer numbers . . . . . . . . . . . . . . . . . . . . . . 3
1.9 Inexact numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.9.1 Find smallest positive integer that is not exact in single precision 4
1.10 Machine epsilon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.11 IEEE double precision format . . . . . . . . . . . . . . . . . . . . . . . . 5
1.12 Roundoff error example . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Root Finding 7
2.1 Bisection Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Newton’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Secant Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3.1 Estimate √2 = 1.41421356 using Newton's Method . . . . . . . 8
2.3.2 Example of fractals using Newton’s Method . . . . . . . . . . . 8
2.4 Order of convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4.1 Newton’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4.2 Secant Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3 Systems of equations 13
3.1 Gaussian Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 LU decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3 Partial pivoting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.4 Operation counts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.5 System of nonlinear equations . . . . . . . . . . . . . . . . . . . . . . . 20
4 Least-squares approximation 23
4.1 Fitting a straight line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.2 Fitting to a linear combination of functions . . . . . . . . . . . . . . . . 24
5 Interpolation 27
5.1 Polynomial interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
5.1.1 Vandermonde polynomial . . . . . . . . . . . . . . . . . . . . . . 27
5.1.2 Lagrange polynomial . . . . . . . . . . . . . . . . . . . . . . . . 28
5.1.3 Newton polynomial . . . . . . . . . . . . . . . . . . . . . . . . . 28
5.2 Piecewise linear interpolation . . . . . . . . . . . . . . . . . . . . . . . . 29
5.3 Cubic spline interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . 30
5.4 Multidimensional interpolation . . . . . . . . . . . . . . . . . . . . . . . 33
6 Integration 35
6.1 Elementary formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.1.1 Midpoint rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.1.2 Trapezoidal rule . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.1.3 Simpson’s rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.2 Composite rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.2.1 Trapezoidal rule . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.2.2 Simpson’s rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.3 Local versus global error . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.4 Adaptive integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Chapter 1
IEEE Arithmetic
1.1 Definitions
Bit = 0 or 1
Byte = 8 bits
Word = Reals: 4 bytes (single precision)
8 bytes (double precision)
= Integers: 1, 2, 4, or 8 byte signed
1, 2, 4, or 8 byte unsigned
1.2 Numbers with a decimal or binary point

Decimal place values: 10^3  10^2  10^1  10^0  .  10^(-1)  10^(-2)  10^(-3)  10^(-4)
Binary place values:   2^3   2^2   2^1   2^0  .   2^(-1)   2^(-2)   2^(-3)   2^(-4)
1.6 IEEE single precision format:
The 32 bits are divided into a sign bit s (bit 0), a biased exponent e (bits 1-8), and a fraction f (bits 9-31):

s | e (8 bits) | f (23 bits)

# = (-1)^s × 2^(e-127) × 1.f

where s = sign
      e = biased exponent
      p = e - 127 = exponent
      1.f = significand (uses the binary point)
Subnormal numbers

Allow 1.f → 0.f (in software)

Smallest positive number = 0.0000 0000 ... 0001 × 2^(-126)
                         = 2^(-23) × 2^(-126) ≈ 1.4 × 10^(-45)
1.9 Inexact numbers

Consider the number 1/3, which is not exactly representable in binary. Its binary expansion can be found by repeatedly doubling and recording a one whenever the result exceeds unity:

0.333...  →  0.
0.666...  →  0.0
1.333...  →  0.01
0.666...  →  0.010
1.333...  →  0.0101
etc.
so that 1/3 exactly in binary is 0.010101 . . . . With only 23 bits to represent f , the
number is inexact and we have
f = 01010101010101010101011,
where we have rounded to the nearest binary number (here, rounded up). The
machine number 1/3 is then represented as
00111110101010101010101010101011
or in hex
3eaaaaab.
1.9.1 Find smallest positive integer that is not exact in single precision
Let N be the smallest positive integer that is not exact. Now, I claim that

N − 2 = 2^23 × 1.11...1,

and

N − 1 = 2^24 × 1.00...0.

The integer N would then require a one-bit in the 2^(-24) position, which is not available. Therefore, the smallest positive integer that is not exact is 2^24 + 1 = 16 777 217. In MATLAB, single(2^24) has the same value as single(2^24 + 1). Since single(2^24 + 1) is exactly halfway between the two consecutive machine numbers 2^24 and 2^24 + 2, MATLAB rounds to the number with a final zero-bit in f, which is 2^24.
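A quick MATLAB check of this claim (a sketch, not from the original notes) might read:

n = 2^24;
double(single(n-1)) == n-1    % true:  16777215 is exactly representable
double(single(n+1)) == n+1    % false: 16777217 is not
single(n+1) == single(n)      % true:  16777217 rounds to 16777216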
1.10 Machine epsilon

Find ε_mach

The number 1 in the IEEE format is written as 1 = 2^0 × 1.000...0, with 23 0's following the binary point. The number just larger than 1 has a 1 in the 23rd position after the binary point. Therefore,

ε_mach = 2^(-23) ≈ 1.192 × 10^(-7).
What is the distance between 1 and the number just smaller than 1? Here, the number just smaller than one can be written as

2^(-1) × 1.111...1 = 2^(-1) (1 + (1 − 2^(-23))) = 1 − 2^(-24).

Therefore, this distance is 2^(-24) = ε_mach/2.

The spacing between numbers is uniform between powers of 2, with logarithmic spacing of the powers of 2. That is, the spacing of numbers between 1 and 2 is 2^(-23), between 2 and 4 is 2^(-22), between 4 and 8 is 2^(-21), etc. This spacing changes for denormal numbers, where the spacing is uniform all the way down to zero.
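The spacing can be queried directly with the built-in MATLAB function eps; a short check of the statements above:

eps(single(1))   % spacing just above 1: 2^(-23), about 1.1921e-07
eps(single(2))   % spacing between 2 and 4: 2^(-22)
eps(single(4))   % spacing between 4 and 8: 2^(-21)
eps(1)           % double precision: 2^(-52), about 2.2204e-16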
1.11 IEEE double precision format

The 64 bits are divided into a sign bit s (bit 0), a biased exponent e (bits 1-11), and a fraction f (bits 12-63):

s | e (11 bits) | f (52 bits)

# = (-1)^s × 2^(e-1023) × 1.f

where s = sign
      e = biased exponent
      p = e - 1023 = exponent
      1.f = significand (uses the binary point)
1.12 Roundoff error example

Consider the quadratic equation

x^2 + 2bx − 1 = 0,

and the solution with b > 0 and x > 0 (the x_+ solution), given by

x = −b + √(b^2 + 1).     (1.1)
As b → ∞,

x = −b + √(b^2 + 1)
  = −b + b√(1 + 1/b^2)
  = b(√(1 + 1/b^2) − 1)
  ≈ b(1 + 1/(2b^2) − 1)
  = 1/(2b).
Now in double precision, realmin ≈ 2.2 × 10^(-308) and we would like x to be accurate to this value before it goes to 0 via denormal numbers. Therefore, x should be computed accurately up to b ≈ 1/(2 × realmin) ≈ 2 × 10^307. What happens if we compute (1.1) directly? Then x = 0 when b^2 + 1 = b^2, or 1 + 1/b^2 = 1. That is, 1/b^2 = ε_mach/2, or b = √(2/ε_mach) ≈ 10^8.
Instead, if we rewrite the root by multiplying and dividing by its conjugate, we obtain the equivalent form

x = 1 / (b(1 + √(1 + 1/b^2))).

In this form, if 1 + 1/b^2 = 1, then x = 1/(2b), which is the correct asymptotic form.
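A short MATLAB sketch (illustrative, not from the notes) comparing the two formulas shows the loss of accuracy in the naive form for large b:

b = 1e8;                                 % large coefficient
x_naive  = -b + sqrt(b^2 + 1);           % suffers catastrophic cancellation (gives 0)
x_stable = 1/(b*(1 + sqrt(1 + 1/b^2)));  % algebraically equivalent, stable
x_approx = 1/(2*b);                      % asymptotic value
fprintf('naive:  %.16e\n', x_naive)
fprintf('stable: %.16e\n', x_stable)
fprintf('approx: %.16e\n', x_approx)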
Chapter 2

Root Finding
Solve f ( x ) = 0 for x, when an explicit analytical solution is impossible.
2.2 Newton's Method

Newton's Method iterates

x_{n+1} = x_n − f(x_n)/f'(x_n).
Starting Newton’s Method requires a guess for x0 , hopefully close to the root x = r.
2.3 Secant Method

The Secant Method replaces the derivative in Newton's Method by a finite difference using the two most recent iterates:

f'(x_n) ≈ (f(x_n) − f(x_{n−1})) / (x_n − x_{n−1}).
2.3.1 Estimate √2 = 1.41421356 using Newton's Method

Applying Newton's Method to f(x) = x^2 − 2 gives the iteration

x_{n+1} = x_n − (x_n^2 − 2)/(2x_n).
We take as our initial guess x_0 = 1. Then

x_1 = 1 − (−1)/2 = 3/2 = 1.5,

x_2 = 3/2 − (9/4 − 2)/3 = 17/12 = 1.416667,

x_3 = 17/12 − ((17/12)^2 − 2)/(17/6) = 577/408 = 1.414216.
2.3.2 Example of fractals using Newton's Method

Consider the complex roots of the equation f(z) = 0, where

f(z) = z^3 − 1.

These roots are the three cubic roots of unity. Since

e^{i2πn} = 1,   n = 0, 1, 2, ...,

the three cubic roots of unity are

1, e^{i2π/3}, e^{i4π/3}.

With

e^{iθ} = cos θ + i sin θ,

and cos(2π/3) = −1/2, sin(2π/3) = √3/2, the three cubic roots of unity are

r_1 = 1,   r_2 = −1/2 + (√3/2)i,   r_3 = −1/2 − (√3/2)i.
The interesting idea here is to determine which initial values of z0 in the complex
plane converge to which of the three cubic roots of unity.
Newton's method is

z_{n+1} = z_n − (z_n^3 − 1)/(3z_n^2).
If the iteration converges to r1 , we color z0 red; r2 , blue; r3 , green. The result will
be shown in lecture.
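The coloring can be generated with a few lines of MATLAB; this sketch (grid size, iteration count, and color choices are my own illustrative values, not taken from the notes) assigns each starting point to the nearest root after iterating:

N = 400;                                   % grid resolution
[x, y] = meshgrid(linspace(-2, 2, N));
z = x + 1i*y;                              % initial values z0
for k = 1:30
    z = z - (z.^3 - 1)./(3*z.^2);          % Newton step, vectorized
end
roots3 = [1, exp(2i*pi/3), exp(4i*pi/3)];
[~, idx] = min(abs(z(:) - roots3), [], 2); % nearest root for each z0
image(reshape(idx, N, N))                  % color by root index
colormap([1 0 0; 0 0 1; 0 1 0])            % red, blue, green as in the text
axis equal off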
2.4 Order of convergence

Let r be the root and x_n the nth approximation to the root. Define the error as

e_n = r − x_n.

If, for large n, the errors satisfy the approximate relationship

|e_{n+1}| = k|e_n|^p,

with k a positive constant, then the root-finding method is said to be of order p.
2.4.1 Newton's Method

Newton's Method is given by

x_{n+1} = x_n − f(x_n)/f'(x_n).

Subtracting both sides from r, we have

r − x_{n+1} = r − x_n + f(x_n)/f'(x_n),

or

e_{n+1} = e_n + f(x_n)/f'(x_n).     (2.1)
We use Taylor series to expand the functions f(x_n) and f'(x_n) about the root r, using f(r) = 0. We have

f(x_n) = f(r) + (x_n − r) f'(r) + (1/2)(x_n − r)^2 f''(r) + ...
       = −e_n f'(r) + (1/2)e_n^2 f''(r) + ... ;     (2.2)

f'(x_n) = f'(r) + (x_n − r) f''(r) + (1/2)(x_n − r)^2 f'''(r) + ...
        = f'(r) − e_n f''(r) + (1/2)e_n^2 f'''(r) + ... .
To make further progress, we will make use of the following standard Taylor series:
1/(1 − ε) = 1 + ε + ε^2 + ... ,     (2.3)

which converges for |ε| < 1. Substituting (2.2) into (2.1), and using (2.3), yields
e_{n+1} = e_n + f(x_n)/f'(x_n)
        = e_n + (−e_n f'(r) + (1/2)e_n^2 f''(r) + ...) / (f'(r) − e_n f''(r) + (1/2)e_n^2 f'''(r) + ...)
        = e_n + (−e_n + (1/2)e_n^2 f''(r)/f'(r) + ...) / (1 − e_n f''(r)/f'(r) + ...)
        = e_n + (−e_n + (1/2)e_n^2 f''(r)/f'(r) + ...)(1 + e_n f''(r)/f'(r) + ...)
        = e_n + (−e_n + e_n^2 ((1/2) f''(r)/f'(r) − f''(r)/f'(r)) + ...)
        = −(1/2) (f''(r)/f'(r)) e_n^2 + ...
Therefore, we have shown that

|e_{n+1}| = k|e_n|^2

as n → ∞, with

k = (1/2) |f''(r)/f'(r)|,

provided f'(r) ≠ 0. Newton's method is thus of order 2 at simple roots.
2.4.2 Secant Method

The Secant Method iterates

x_{n+1} = x_n − f(x_n)(x_n − x_{n−1}) / (f(x_n) − f(x_{n−1})),

so that, with e_n = r − x_n,

e_{n+1} = e_n + f(x_n)(x_n − x_{n−1}) / (f(x_n) − f(x_{n−1})).

We need the relations

x_n − x_{n−1} = (r − x_{n−1}) − (r − x_n)
             = e_{n−1} − e_n,

and, Taylor expanding about the root r as before,

f(x_n)     = −e_n f'(r) + (1/2)e_n^2 f''(r) + ... ,
f(x_{n−1}) = −e_{n−1} f'(r) + (1/2)e_{n−1}^2 f''(r) + ... ,

so that

f(x_n) − f(x_{n−1}) = (e_{n−1} − e_n) f'(r) + (1/2)(e_n^2 − e_{n−1}^2) f''(r) + ...
                    = (e_{n−1} − e_n) [ f'(r) − (1/2)(e_{n−1} + e_n) f''(r) + ... ].
We therefore have

e_{n+1} = e_n + (−e_n f'(r) + (1/2)e_n^2 f''(r) + ...) / (f'(r) − (1/2)(e_{n−1} + e_n) f''(r) + ...)
        = e_n − e_n (1 − (1/2)e_n f''(r)/f'(r) + ...) / (1 − (1/2)(e_{n−1} + e_n) f''(r)/f'(r) + ...)
        = e_n − e_n (1 − (1/2)e_n f''(r)/f'(r) + ...)(1 + (1/2)(e_{n−1} + e_n) f''(r)/f'(r) + ...)
        = −(1/2) (f''(r)/f'(r)) e_{n−1} e_n + ... ,
or to leading order

|e_{n+1}| = (1/2) |f''(r)/f'(r)| |e_{n−1}| |e_n|.     (2.4)
The order of convergence is not yet obvious from this equation, and to determine
the scaling law we look for a solution of the form
|e_{n+1}| = k|e_n|^p.

From this ansatz we also have

|e_n| = k|e_{n−1}|^p,

and therefore

|e_{n+1}| = k^{p+1} |e_{n−1}|^{p^2}.
Substitution into (2.4) results in

k^{p+1} |e_{n−1}|^{p^2} = (k/2) |f''(r)/f'(r)| |e_{n−1}|^{p+1}.

Equating the coefficient and the power of e_{n−1} results in

k^p = (1/2) |f''(r)/f'(r)|,

and

p^2 = p + 1.
The order of convergence of the Secant Method, given by p, therefore is determined
to be the positive root of the quadratic equation p2 − p − 1 = 0, or
p = (1 + √5)/2 ≈ 1.618,
which coincidentally is a famous irrational number that is called The Golden Ra-
tio, and goes by the symbol Φ. We see that the Secant Method has an order of
convergence lying between the Bisection Method and Newton’s Method.
Chapter 3

Systems of equations
Consider the system of n linear equations and n unknowns, given by
Ax = b,
with

    [ a11  a12  ...  a1n ]        [ x1 ]        [ b1 ]
A = [ a21  a22  ...  a2n ],   x = [ x2 ],   b = [ b2 ].
    [ ...  ...  ...  ... ]        [ .. ]        [ .. ]
    [ an1  an2  ...  ann ]        [ xn ]        [ bn ]
3.1 Gaussian Elimination

Row reduction is performed on the augmented matrix [A | b]. Allowed operations are (1) multiply any row by a constant, (2) add a multiple of one row to another row, (3) interchange the order of any rows. The goal is to convert the original matrix into an upper-triangular matrix.

As an example (the one used throughout this chapter), take

    [ -3   2  -1 | -1 ]
    [  6  -6   7 | -7 ]
    [  3  -4   4 | -6 ]

We start with the first row of the matrix and work our way down as follows. First we multiply the first row by 2 and add it to the second row, and add the first row to the third row:

    [ -3   2  -1 | -1 ]
    [  0  -2   5 | -9 ]
    [  0  -2   3 | -7 ]
We then go to the second row. We multiply this row by −1 and add it to the third row:

    [ -3   2  -1 | -1 ]
    [  0  -2   5 | -9 ]
    [  0   0  -2 |  2 ]
The resulting equations can be determined from the matrix and are given by

-3x1 + 2x2 - x3 = -1,
      -2x2 + 5x3 = -9,
            -2x3 = 2.

These equations can be solved by backward substitution, starting from the last equation and working backwards. We have

-2x3 = 2                   →  x3 = -1,
-2x2 = -9 - 5x3 = -4       →  x2 = 2,
-3x1 = -1 - 2x2 + x3 = -6  →  x1 = 2.

Therefore,

    [ x1 ]   [  2 ]
    [ x2 ] = [  2 ].
    [ x3 ]   [ -1 ]
3.2 LU decomposition
The process of Gaussian Elimination also results in the factoring of the matrix A to
A = LU,
where L is a lower triangular matrix and U is an upper triangular matrix. Using the
same matrix A as in the last section, we show how this factorization is realized. We
have

    [ -3   2  -1 ]      [ -3   2  -1 ]
    [  6  -6   7 ]  →   [  0  -2   5 ]  = M1 A,
    [  3  -4   4 ]      [  0  -2   3 ]

where

        [ 1  0  0 ] [ -3   2  -1 ]   [ -3   2  -1 ]
M1 A  = [ 2  1  0 ] [  6  -6   7 ] = [  0  -2   5 ].
        [ 1  0  1 ] [  3  -4   4 ]   [  0  -2   3 ]
Note that the matrix M1 performs row elimination on the first column. Two times
the first row is added to the second row and one times the first row is added to
the third row. The entries of the column of M1 come from 2 = −(6/ − 3) and
1 = −(3/ − 3) as required for row elimination. The number −3 is called the pivot.
The next step is

    [ -3   2  -1 ]      [ -3   2  -1 ]
    [  0  -2   5 ]  →   [  0  -2   5 ]  = M2 (M1 A),
    [  0  -2   3 ]      [  0   0  -2 ]

where

             [ 1  0  0 ] [ -3   2  -1 ]   [ -3   2  -1 ]
M2 (M1 A)  = [ 0  1  0 ] [  0  -2   5 ] = [  0  -2   5 ].
             [ 0 -1  1 ] [  0  -2   3 ]   [  0   0  -2 ]
Here, M2 multiplies the second row by −1 = −(−2/ − 2) and adds it to the third
row. The pivot is −2.
We now have
M2 M1 A = U
or
A = M1^{-1} M2^{-1} U.
The inverse matrices are easy to find. The matrix M1 multiplies the first row by 2
and adds it to the second row, and multiplies the first row by 1 and adds it to the
third row. To invert these operations, we need to multiply the first row by −2 and
add it to the second row, and multiply the first row by −1 and add it to the third
row. To check, with
M1 M1−1 = I,
we have

    [ 1  0  0 ] [  1  0  0 ]   [ 1  0  0 ]
    [ 2  1  0 ] [ -2  1  0 ] = [ 0  1  0 ].
    [ 1  0  1 ] [ -1  0  1 ]   [ 0  0  1 ]
Similarly,

           [ 1  0  0 ]
M2^{-1}  = [ 0  1  0 ].
           [ 0  1  1 ]
Therefore,
L = M1^{-1} M2^{-1}
is given by
    [  1  0  0 ] [ 1  0  0 ]   [  1  0  0 ]
L = [ -2  1  0 ] [ 0  1  0 ] = [ -2  1  0 ],
    [ -1  0  1 ] [ 0  1  1 ]   [ -1  1  1 ]
which is lower triangular. The off-diagonal elements of M1^{-1} and M2^{-1} are simply combined to form L. Our LU decomposition is therefore

    [ -3   2  -1 ]   [  1  0  0 ] [ -3   2  -1 ]
    [  6  -6   7 ] = [ -2  1  0 ] [  0  -2   5 ].
    [  3  -4   4 ]   [ -1  1  1 ] [  0   0  -2 ]
Once L and U are known, the system Ax = b can be written as

(LU)x = L(Ux) = b.
We let
y = Ux,
and first solve
Ly = b
for y by forward substitution. We then solve
Ux = y
for x by backward substitution. When we count operations, we will see that solving
(LU)x = b is significantly faster once L and U are in hand than solving Ax = b
directly by Gaussian elimination.
We now illustrate the solution of LUx = b using our previous example, where

    [  1  0  0 ]        [ -3   2  -1 ]        [ -1 ]
L = [ -2  1  0 ],   U = [  0  -2   5 ],   b = [ -7 ].
    [ -1  1  1 ]        [  0   0  -2 ]        [ -6 ]

Solving Ly = b by forward substitution gives
y1 = −1,
y2 = −7 + 2y1 = −9,
y3 = −6 + y1 − y2 = 2.
Then solving Ux = y by backward substitution gives

-2x3 = 2                   →  x3 = -1,
-2x2 = -9 - 5x3 = -4       →  x2 = 2,
-3x1 = -1 - 2x2 + x3 = -6  →  x1 = 2,

the same solution as found earlier by Gaussian elimination.
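In MATLAB, the decomposition and solution can be checked with a few commands; a minimal sketch (the variable names are arbitrary):

A = [-3 2 -1; 6 -6 7; 3 -4 4];
b = [-1; -7; -6];
[L, U] = lu(A);      % LU decomposition (with partial pivoting in general)
y = L \ b;           % forward substitution
x = U \ y;           % backward substitution
disp(x)              % expected: [2; 2; -1]
disp(A*x - b)        % residual should be numerically zero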
3.3 Partial pivoting

Consider

    [ -2   2  -1 ]
A = [  6  -6   7 ].
    [  3  -8   4 ]
We interchange rows to place the largest element (in absolute value) in the pivot, or
a11 , position. That is,
     [  6  -6   7 ]
A →  [ -2   2  -1 ] = P12 A,
     [  3  -8   4 ]

where

      [ 0  1  0 ]
P12 = [ 1  0  0 ]
      [ 0  0  1 ]
is a permutation matrix that when multiplied on the left interchanges the first and
second rows of a matrix. Note that P12^{-1} = P12. The elimination step is then
          [ 6  -6    7  ]
P12 A  →  [ 0   0   4/3 ] = M1 P12 A,
          [ 0  -5   1/2 ]

where

     [   1   0  0 ]
M1 = [  1/3  1  0 ].
     [ -1/2  0  1 ]
The final step requires one more row interchange:

             [ 6  -6    7  ]
M1 P12 A  →  [ 0  -5   1/2 ] = P23 M1 P12 A = U.
             [ 0   0   4/3 ]
Since the permutation matrices given by P are their own inverses, we can write our
result as
(P23 M1 P23 )P23 P12 A = U.
Multiplication of M on the left by P interchanges rows while multiplication on the
right by P interchanges columns. That is,
    [  1    0  0 ]         [  1    0  0 ]         [  1    0  0 ]
P23 [  1/3  1  0 ] P23  =  [ -1/2  0  1 ] P23  =  [ -1/2  1  0 ].
    [ -1/2  0  1 ]         [  1/3  1  0 ]         [  1/3  0  1 ]
The net result on M1 is an interchange of the nondiagonal elements 1/3 and −1/2.
We can then multiply by the inverse of (P23 M1 P23) to obtain

P23 P12 A = (P23 M1 P23)^{-1} U,

which we write as

PA = LU.

Instead of L, MATLAB will write this as

A = (P^{-1} L) U.
For convenience, we will just denote (P−1 L) by L, but by L here we mean a permu-
tated lower triangular matrix.
For example, in MATLAB, to solve Ax = b for x using Gaussian elimination,
one types
x = A \ b;
where \ solves for x using the most efficient algorithm available, depending on the
form of A. If A is a general n × n matrix, then first the LU decomposition of A is
found using partial pivoting, and then x is determined from permuted forward and
backward substitution. If A is upper or lower triangular, then forward or backward
substitution (or their permuted version) is used directly.
If there are many different right-hand-sides, one would first directly find the
LU decomposition of A using a function call, and then solve using \. That is, one
would iterate for different b’s the following expressions:
[L, U] = lu(A);
y = L \ b;
x = U \ y;

or, equivalently, use the single line

x = U \ (L \ b);

where the parentheses are required. In lecture, I will demonstrate these solutions in
MATLAB using the matrix A = [−2, 2, −1; 6, −6, 7; 3, −8, 4]; which is the example
in the notes.
3.4 Operation counts

To estimate how the computational time of an algorithm scales with the size of the problem, suppose the multiplication of two n × n matrices takes a time Tn = kn^3 for some unknown constant k. To determine how much longer the multiplication of two 2n × 2n matrices takes, we compute

T2n = k(2n)^3 = 8kn^3 = 8Tn,

so that doubling the size of the matrices is expected to increase the computational time by a factor of 2^3 = 8.
Running MATLAB on my computer, the multiplication of two 2048 × 2048 ma-
trices took about 0.75 sec. The multiplication of two 4096 × 4096 matrices took about
6 sec, which is 8 times longer. Timing of code in MATLAB can be found using the
built-in stopwatch functions tic and toc.
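A sketch of such a timing experiment (sizes chosen arbitrarily; the measured times depend on the machine):

n = 2048;
A = rand(n); B = rand(n);
tic; C = A*B; t1 = toc;      % time an n x n multiplication
A = rand(2*n); B = rand(2*n);
tic; C = A*B; t2 = toc;      % time a 2n x 2n multiplication
fprintf('ratio t2/t1 = %.1f (expected about 8)\n', t2/t1)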
What is the operation count and therefore the scaling of Gaussian elimination?
Consider an elimination step with the pivot in the ith row and ith column. There
are both n − i rows below the pivot and n − i columns to the right of the pivot. To
perform elimination of one row, each matrix element to the right of the pivot must
be multiplied by a factor and added to the row underneath. This must be done for
all the rows. There are therefore (n − i )(n − i ) multiplication-additions to be done
for this pivot. Since we are interested in only the scaling of the algorithm, I will just
count a multiplication-addition as one operation.
To find the total operation count, we need to perform elimination using n − 1 pivots, so that

op. counts = Σ_{i=1}^{n-1} (n − i)^2
           = (n − 1)^2 + (n − 2)^2 + ... + (1)^2
           = Σ_{i=1}^{n-1} i^2.

Using Σ_{i=1}^{n-1} i^2 = (n − 1)n(2n − 1)/6, the operation count for Gaussian elimination therefore scales as n^3/3 for large n.
3.5 System of nonlinear equations

A system of nonlinear equations can be solved with a generalization of Newton's Method. We illustrate the idea for two equations and two unknowns, f(x, y) = 0 and g(x, y) = 0. Taylor expanding about the point (x_n, y_n),

f(x_{n+1}, y_{n+1}) = f(x_n, y_n) + (x_{n+1} − x_n) f_x(x_n, y_n) + (y_{n+1} − y_n) f_y(x_n, y_n) + ... ,
g(x_{n+1}, y_{n+1}) = g(x_n, y_n) + (x_{n+1} − x_n) g_x(x_n, y_n) + (y_{n+1} − y_n) g_y(x_n, y_n) + ... .
To obtain Newton’s method, we take f ( xn+1 , yn+1 ) = 0, g( xn+1 , yn+1 ) = 0 and drop
higher-order terms above linear. Although one can then find a system of linear
equations for xn+1 and yn+1 , it is more convenient to define the variables
∆xn = xn+1 − xn , ∆yn = yn+1 − yn .
The iteration equations will then be given by
xn+1 = xn + ∆xn , yn+1 = yn + ∆yn ;
and the linear equations to be solved for ∆xn and ∆yn are given by
[ f_x  f_y ] [ ∆x_n ]   [ -f ]
[ g_x  g_y ] [ ∆y_n ] = [ -g ],
where f , g, f x , f y , gx , and gy are all evaluated at the point ( xn , yn ). The two-
dimensional case is easily generalized to n dimensions. The matrix of partial deriva-
tives is called the Jacobian Matrix.
We illustrate Newton’s Method by finding the steady state solution of the Lorenz
equations, given by
σ (y − x ) = 0,
rx − y − xz = 0,
xy − bz = 0,
where x, y, and z are the unknown variables and σ, r, and b are the known param-
eters. Here, we have a three-dimensional homogeneous system f = 0, g = 0, and
h = 0, with
f ( x, y, z) = σ(y − x ),
g( x, y, z) = rx − y − xz,
h( x, y, z) = xy − bz.
The partial derivatives can be computed to be
f x = −σ, f y = σ, f z = 0,
gx = r − z, gy = −1, gz = − x,
h x = y, hy = x, hz = −b.
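A minimal MATLAB sketch of this multidimensional Newton iteration for the Lorenz steady state (the parameter values and starting guess below are my own illustrative choices, not taken from the notes):

sigma = 10; r = 28; b = 8/3;            % example parameter values
X = [5; 5; 5];                          % initial guess for (x, y, z)
for k = 1:20
    x = X(1); y = X(2); z = X(3);
    F = [sigma*(y - x); r*x - y - x*z; x*y - b*z];    % f, g, h
    J = [-sigma, sigma,  0;                           % Jacobian matrix
          r - z,    -1, -x;
              y,     x, -b];
    dX = J \ (-F);                      % solve the linear system for the update
    X = X + dX;
end
disp(X)   % converges to a steady state, e.g. near [sqrt(b*(r-1)); sqrt(b*(r-1)); r-1]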
Chapter 4

Least-squares approximation
The method of least-squares is commonly used to fit a parameterized curve to
experimental data. In general, the fitting curve is not expected to pass through the
data points, making this problem substantially different from the one of interpola-
tion.
We consider here only the simplest case of the same experimental error for all
the data points. Let the data to be fitted be given by (x_i, y_i), with i = 1 to n.

4.1 Fitting a straight line

Suppose the fitting curve is the line y = αx + β. The least-squares fit chooses α and β to minimize the sum of the squared residuals,

ρ = Σ_{i=1}^n (y_i − (αx_i + β))^2.

Setting ∂ρ/∂α = 0 and ∂ρ/∂β = 0 gives two equations. These equations form a system of two linear equations in the two unknowns α and β, which is evident when rewritten in the form

α Σ_{i=1}^n x_i^2 + β Σ_{i=1}^n x_i = Σ_{i=1}^n x_i y_i,

α Σ_{i=1}^n x_i + βn = Σ_{i=1}^n y_i.
4.2 Fitting to a linear combination of functions
Suppose, more generally, that the fitting function is a linear combination of m functions, y(x) = Σ_{j=1}^m c_j f_j(x), to be fitted to the n data points (x_i, y_i). Define the n-by-m matrix A with elements

A_{ij} = f_j(x_i),     (4.1)

and the column vectors c = (c_1, ..., c_m)^T and y = (y_1, ..., y_n)^T. The residual vector and the sum of squared residuals are then

r = y − Ac,

ρ = r^T r
  = (y − Ac)^T (y − Ac)
  = y^T y − c^T A^T y − y^T Ac + c^T A^T Ac.
Since ρ is a scalar, each term in the above expression must be a scalar, and since the
transpose of a scalar is equal to the scalar, we have
c^T A^T y = (c^T A^T y)^T = y^T Ac.

Therefore,

ρ = y^T y − 2y^T Ac + c^T A^T Ac.
To find the minimum of ρ, we will need to solve ∂ρ/∂c j = 0 for j = 1, . . . , m.
To take the derivative of ρ, we switch to a tensor notation, using the Einstein sum-
mation convention, where repeated indices are summed over their allowable range.
We can write
ρ = y_i y_i − 2y_i A_{ik} c_k + c_i A^T_{ik} A_{kl} c_l.
Taking the partial derivative, we have
∂ρ/∂c_j = −2y_i A_{ik} (∂c_k/∂c_j) + (∂c_i/∂c_j) A^T_{ik} A_{kl} c_l + c_i A^T_{ik} A_{kl} (∂c_l/∂c_j).
Now,

∂c_i/∂c_j = 1 if i = j, and 0 otherwise.
Therefore,
∂ρ/∂c_j = −2y_i A_{ij} + A^T_{jk} A_{kl} c_l + c_i A^T_{ik} A_{kj}.
Now,
c_i A^T_{ik} A_{kj} = c_i A_{ki} A_{kj}
                    = A_{kj} A_{ki} c_i
                    = A^T_{jk} A_{ki} c_i
                    = A^T_{jk} A_{kl} c_l.
Therefore,
∂ρ/∂c_j = −2y_i A_{ij} + 2A^T_{jk} A_{kl} c_l.
With the partials set equal to zero, we have
A^T_{jk} A_{kl} c_l = y_i A_{ij},

or

A^T_{jk} A_{kl} c_l = A^T_{ji} y_i.
In vector notation, we have
AT Ac = AT y. (4.2)
Equation (4.2) is the so-called normal equation, and can be solved for c by Gaus-
sian elimination using the MATLAB backslash operator. After constructing the
matrix A given by (4.1), and the vector y from the data, one can code in MATLAB
c = (A'*A)\(A'*y);

But in fact the MATLAB backslash operator will automatically solve the least-squares problem when the matrix A is not square, so that the MATLAB code
c = A\y;
yields the same result.
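For instance, a MATLAB sketch of fitting a quadratic c_1 x^2 + c_2 x + c_3 to data (the data here are synthetic, made up purely for illustration):

x = (0:0.1:1)';                                 % sample points
y = 2*x.^2 - x + 0.5 + 0.05*randn(size(x));     % noisy data to be fitted
A = [x.^2, x, ones(size(x))];                   % basis-function matrix, A_ij = f_j(x_i)
c = A \ y;                                      % least-squares solution
disp(c')                                        % coefficients, close to [2 -1 0.5]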
Chapter 5

Interpolation
Consider the following problem: Given the values of a known function y = f ( x )
at a sequence of ordered points x0 , x1 , . . . , xn , find f ( x ) for arbitrary x. When x0 ≤
x ≤ xn , the problem is called interpolation. When x < x0 or x > xn the problem is
called extrapolation.
With yi = f ( xi ), the problem of interpolation is basically one of drawing a
smooth curve through the known points ( x0 , y0 ), ( x1 , y1 ), . . . , ( xn , yn ). This is not the
same problem as drawing a smooth curve that approximates a set of data points that
have experimental error. This latter problem is called least-squares approximation.
Here, we will consider three interpolation algorithms: (1) polynomial interpola-
tion; (2) piecewise linear interpolation, and; (3) cubic spline interpolation.
5.1 Polynomial interpolation

The n + 1 points (x_0, y_0), (x_1, y_1), ..., (x_n, y_n) can be interpolated by a unique polynomial of degree n.

5.1.1 Vandermonde polynomial

The interpolating polynomial can be written as

P_n(x) = c_0 x^n + c_1 x^{n-1} + ... + c_n.

Then we can immediately form n + 1 linear equations for the n + 1 unknown coefficients c_0, c_1, ..., c_n using the n + 1 known points:

y_0 = c_0 x_0^n + c_1 x_0^{n-1} + ... + c_{n-1} x_0 + c_n,
y_1 = c_0 x_1^n + c_1 x_1^{n-1} + ... + c_{n-1} x_1 + c_n,
...
y_n = c_0 x_n^n + c_1 x_n^{n-1} + ... + c_{n-1} x_n + c_n.
The matrix is called the Vandermonde matrix, and can be constructed using the
MATLAB function vander.m. The system of linear equations can be solved in MAT-
LAB using the \ operator, and the MATLAB function polyval.m can used to inter-
polate using the c coefficients. I will illustrate this in class and place the code on
the website.
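A sketch of this procedure in MATLAB (the interpolation points are arbitrary illustrations):

x = [0; 1; 2; 3];              % interpolation points
y = [1; 2; 0; 5];              % function values at those points
A = vander(x);                 % Vandermonde matrix
c = A \ y;                     % polynomial coefficients, highest power first
xx = linspace(0, 3, 101);
yy = polyval(c, xx);           % evaluate the interpolating polynomial
plot(x, y, 'o', xx, yy, '-')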
5.1.2 Lagrange polynomial

The Lagrange polynomial is the most clever construction of the interpolating polynomial P_n(x), and leads directly to an analytical formula. The Lagrange polynomial
is the sum of n + 1 terms and each term is itself a polynomial of degree n. The full
polynomial is therefore of degree n. Counting from 0, the ith term of the Lagrange
polynomial is constructed by requiring it to be zero at x j with j 6= i, and equal to yi
when j = i. The polynomial can be written as
P_n(x) =   (x − x_1)(x − x_2)...(x − x_n) y_0 / [(x_0 − x_1)(x_0 − x_2)...(x_0 − x_n)]
         + (x − x_0)(x − x_2)...(x − x_n) y_1 / [(x_1 − x_0)(x_1 − x_2)...(x_1 − x_n)]
         + ...
         + (x − x_0)(x − x_1)...(x − x_{n-1}) y_n / [(x_n − x_0)(x_n − x_1)...(x_n − x_{n-1})].
It can be clearly seen that the first term is equal to zero when x = x1 , x2 , . . . , xn and
equal to y0 when x = x0 ; the second term is equal to zero when x = x0 , x2 , . . . xn and
equal to y1 when x = x1 ; and the last term is equal to zero when x = x0 , x1 , . . . xn−1
and equal to yn when x = xn . The uniqueness of the interpolating polynomial
implies that the Lagrange polynomial must be the interpolating polynomial.
5.1.3 Newton polynomial

The Newton polynomial is somewhat more clever than the Vandermonde polyno-
mial because it results in a system of linear equations that is lower triangular, and
therefore can be solved by forward substitution. The interpolating polynomial is
written in the form
P_n(x) = c_0 + c_1(x − x_0) + c_2(x − x_0)(x − x_1) + ... + c_n(x − x_0)...(x − x_{n-1}).

Requiring the polynomial to pass through the data points gives

y_0 = c_0,
y1 = c0 + c1 ( x1 − x0 ),
y2 = c0 + c1 ( x2 − x0 ) + c2 ( x2 − x0 )( x2 − x1 ),
.. .. ..
. . .
yn = c0 + c1 ( xn − x0 ) + c2 ( xn − x0 )( xn − x1 ) + · · · + cn ( xn − x0 ) · · · ( xn − xn−1 ).
This system of linear equations is lower triangular as can be seen from the matrix
form
[ 1      0               0                      ...   0                              ] [ c_0 ]   [ y_0 ]
[ 1   (x_1 − x_0)        0                      ...   0                              ] [ c_1 ]   [ y_1 ]
[ .      .               .                       .    .                              ] [  .  ] = [  .  ]
[ 1   (x_n − x_0)   (x_n − x_0)(x_n − x_1)      ...   (x_n − x_0)...(x_n − x_{n-1})  ] [ c_n ]   [ y_n ].
5.2 Piecewise linear interpolation

The simplest interpolation connects consecutive data points by straight lines. The interpolating function is

g(x) = g_i(x), for x_i ≤ x ≤ x_{i+1},

where

g_i(x) = a_i(x − x_i) + b_i,

and i = 0, 1, ..., n − 1.
We now require y = gi ( x ) to pass through the endpoints ( xi , yi ) and ( xi+1 , yi+1 ).
We have
y_i = b_i,
y_{i+1} = a_i(x_{i+1} − x_i) + b_i.

Solving for a_i and b_i, we find

a_i = (y_{i+1} − y_i)/(x_{i+1} − x_i),   b_i = y_i.
5.3 Cubic spline interpolation

The n + 1 points to be interpolated are again

(x_0, y_0), (x_1, y_1), ... (x_n, y_n).

Here, we use n piecewise cubic polynomials for interpolation,

g_i(x) = a_i(x − x_i)^3 + b_i(x − x_i)^2 + c_i(x − x_i) + d_i,   i = 0, 1, ..., n − 1,

with the overall interpolation function given by

g(x) = g_i(x), for x_i ≤ x ≤ x_{i+1}.
To achieve a smooth interpolation we impose that g( x ) and its first and second
derivatives are continuous. The requirement that g( x ) is continuous (and goes
through all n + 1 points) results in the two constraints
gi ( x i ) = y i , i = 0 to n − 1, (5.1)
gi ( x i + 1 ) = y i + 1 , i = 0 to n − 1. (5.2)
The requirement that g'(x) is continuous results in

g'_i(x_{i+1}) = g'_{i+1}(x_{i+1}),   i = 0 to n − 2,     (5.3)

and the requirement that g''(x) is continuous results in

g''_i(x_{i+1}) = g''_{i+1}(x_{i+1}),   i = 0 to n − 2.     (5.4)
There are n cubic polynomials gi ( x ) and each cubic polynomial has four free co-
efficients; there are therefore a total of 4n unknown coefficients. The number of
constraining equations from (5.1)-(5.4) is 2n + 2(n − 1) = 4n − 2. With 4n − 2 con-
straints and 4n unknowns, two more conditions are required for a unique solution.
These are usually chosen to be extra conditions on the first g0 ( x ) and last gn−1 ( x )
polynomials. We will discuss these extra conditions later.
We now proceed to determine equations for the unknown coefficients of the
cubic polynomials. The polynomials and their first two derivatives are given by
g_i(x) = a_i(x − x_i)^3 + b_i(x − x_i)^2 + c_i(x − x_i) + d_i,     (5.5)
g'_i(x) = 3a_i(x − x_i)^2 + 2b_i(x − x_i) + c_i,     (5.6)
g''_i(x) = 6a_i(x − x_i) + 2b_i.     (5.7)
We will consider the four conditions (5.1)-(5.4) in turn. From (5.1) and (5.5), we
have
di = yi , i = 0 to n − 1, (5.8)
which directly solves for all of the d-coefficients.
To satisfy (5.2), we first define
h i = x i +1 − x i ,
and
f i = y i +1 − y i .
Now, from (5.2) and (5.5), using (5.8), we obtain the n equations

a_i h_i^3 + b_i h_i^2 + c_i h_i = f_i,   i = 0 to n − 1.     (5.9)

From (5.3) and (5.6) we obtain the n − 1 equations

3a_i h_i^2 + 2b_i h_i + c_i = c_{i+1},   i = 0 to n − 2,     (5.10)

and from (5.4) and (5.7) the n − 1 equations

3a_i h_i + b_i = b_{i+1},   i = 0 to n − 2.     (5.11)

It is convenient to define an additional coefficient b_n by

b_n = 3a_{n-1} h_{n-1} + b_{n-1},     (5.12)

so that (5.11) holds for i = 0 to n − 1 and can be solved for the a-coefficients:

a_i = (1/(3h_i)) (b_{i+1} − b_i),   i = 0 to n − 1.     (5.13)

From (5.9) and (5.13), the c-coefficients are

c_i = (1/h_i)(f_i − a_i h_i^3 − b_i h_i^2)
    = (1/h_i)(f_i − (1/(3h_i))(b_{i+1} − b_i) h_i^3 − b_i h_i^2)
    = f_i/h_i − (1/3) h_i (b_{i+1} + 2b_i),   i = 0 to n − 1.     (5.14)
We can now find an equation for the b-coefficients by substituting (5.8), (5.13)
and (5.14) into (5.10):
(1/(3h_i))(b_{i+1} − b_i) · 3h_i^2 + 2b_i h_i + f_i/h_i − (1/3)h_i(b_{i+1} + 2b_i)
    = f_{i+1}/h_{i+1} − (1/3)h_{i+1}(b_{i+2} + 2b_{i+1}),

which simplifies to

(1/3)h_i b_i + (2/3)(h_i + h_{i+1}) b_{i+1} + (1/3)h_{i+1} b_{i+2} = f_{i+1}/h_{i+1} − f_i/h_i,     (5.15)

valid for i = 0 to n − 2.
We can therefore write the matrix equation for the b-coefficients, leaving the first and last row absent, as
[ (missing first row)                                                                      ] [ b_0     ]   [ (missing)                          ]
[ (1/3)h_0   (2/3)(h_0 + h_1)   (1/3)h_1    ...        0                0            0     ] [ b_1     ]   [ f_1/h_1 − f_0/h_0                  ]
[    .             .               .         .         .                .            .     ] [   .     ] = [   .                                ]
[    0             0               0        ...   (1/3)h_{n-2}   (2/3)(h_{n-2} + h_{n-1})   (1/3)h_{n-1} ] [ b_{n-1} ]   [ f_{n-1}/h_{n-1} − f_{n-2}/h_{n-2} ]
[ (missing last row)                                                                       ] [ b_n     ]   [ (missing)                          ]
Once the missing first and last equations are specified, the matrix equation for the
b-coefficients can be solved by Gaussian elimination. And once the b-coefficients are
determined, the a- and c-coefficients can also be determined from (5.13) and (5.14),
with the d-coefficients already known from (5.8). The piecewise cubic polynomials,
then, are known and g( x ) can be used for interpolation to any value x satisfying
x0 ≤ x ≤ x n .
The missing first and last equations can be specified in several ways, and here
we show the two ways that are allowed by the MATLAB function spline.m. The
first way should be used when the derivative g0 ( x ) is known at the endpoints x0
and xn ; that is, suppose we know the values of α and β such that
g'_0(x_0) = α,   g'_{n-1}(x_n) = β.
From the known value of α, and using (5.6) and (5.14), we have
α = c_0 = f_0/h_0 − (1/3)h_0(b_1 + 2b_0).

Therefore, the missing first equation is determined to be

(2/3)h_0 b_0 + (1/3)h_0 b_1 = f_0/h_0 − α.     (5.16)
From the known value of β, and using (5.6), (5.13), and (5.14), a similar calculation determines the missing last equation.

The second way to specify the missing first and last equations is the so-called not-a-knot condition, which requires the first two and the last two cubic polynomials to be identical:

g_0(x) = g_1(x),   g_{n-2}(x) = g_{n-1}(x).
Considering the first of these conditions and equating the two cubics,

a_0(x − x_0)^3 + b_0(x − x_0)^2 + c_0(x − x_0) + d_0
    = a_1(x − x_1)^3 + b_1(x − x_1)^2 + c_1(x − x_1) + d_1,

the coefficient of x^3 on each side must match, so that

a_0 = a_1.
The MATLAB subroutines spline.m and ppval.m can be used for cubic spline
interpolation (see also interp1.m). I will illustrate these routines in class and post
sample code on the course web site.
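A short usage sketch of these routines (the data are arbitrary):

x = 0:5;                       % data sites
y = [0 1 0 -1 0 1];            % data values
pp = spline(x, y);             % construct the piecewise cubic (not-a-knot ends)
xx = linspace(0, 5, 201);
yy = ppval(pp, xx);            % evaluate the spline
plot(x, y, 'o', xx, yy, '-')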
5.4 Multidimensional interpolation

Suppose we are given the values of a function of two variables,

z = f(x, y),

on a grid of points, so that the known data are

z_ij = f(x_i, y_j).
Chapter 6
Integration
We want to construct numerical algorithms that can perform definite integrals of the form

I = ∫_a^b f(x) dx.     (6.1)
Calculating these definite integrals numerically is called numerical integration, nu-
merical quadrature, or more simply quadrature.
6.1 Elementary formulas

The elementary integration formulas apply to a single small interval, here taken to be 0 ≤ x ≤ h; they are combined into composite rules over [a, b] in the next section.

6.1.1 Midpoint rule

The midpoint rule approximates the integral over [0, h] by the width of the interval times the value of the function at the midpoint, I_h = ∫_0^h f(x) dx ≈ h f(h/2). To determine its error, expand f(x) in a Taylor series about the midpoint x = h/2:

f(x) = f(h/2) + (x − h/2) f'(h/2) + (x − h/2)^2/2 f''(h/2)
       + (x − h/2)^3/6 f'''(h/2) + (x − h/2)^4/24 f''''(h/2) + ...
I_h = ∫_0^h [ f(h/2) + (x − h/2) f'(h/2) + (x − h/2)^2/2 f''(h/2)
            + (x − h/2)^3/6 f'''(h/2) + (x − h/2)^4/24 f''''(h/2) + ... ] dx.
Changing variables by letting y = x − h/2 and dy = dx, and simplifying the integral depending on whether the integrand is even or odd, we have

I_h = h f(h/2) + ∫_{-h/2}^{h/2} [ y f'(h/2) + y^2/2 f''(h/2) + y^3/6 f'''(h/2) + y^4/24 f''''(h/2) + ... ] dy
    = h f(h/2) + ∫_0^{h/2} [ y^2 f''(h/2) + y^4/12 f''''(h/2) + ... ] dy.
The integrals that we need here are

∫_0^{h/2} y^2 dy = h^3/24,   ∫_0^{h/2} y^4 dy = h^5/160.
Therefore,

I_h = h f(h/2) + h^3/24 f''(h/2) + h^5/1920 f''''(h/2) + ... .     (6.3)
6.1.2 Trapezoidal rule

The trapezoidal rule approximates the integral over [0, h] by the area of the trapezoid with parallel sides f(0) and f(h), that is, I_h ≈ (h/2)(f(0) + f(h)). To determine its error, Taylor expand f(0) and f(h) about x = h/2:

f(0) = f(h/2) − (h/2) f'(h/2) + h^2/8 f''(h/2) − h^3/48 f'''(h/2) + h^4/384 f''''(h/2) + ... ,

and

f(h) = f(h/2) + (h/2) f'(h/2) + h^2/8 f''(h/2) + h^3/48 f'''(h/2) + h^4/384 f''''(h/2) + ... .
Adding and multiplying by h/2 we obtain

(h/2)(f(0) + f(h)) = h f(h/2) + h^3/8 f''(h/2) + h^5/384 f''''(h/2) + ... .
We now substitute for the first term on the right-hand-side using the midpoint rule formula (6.3):

(h/2)(f(0) + f(h)) = I_h − h^3/24 f''(h/2) − h^5/1920 f''''(h/2)
                     + h^3/8 f''(h/2) + h^5/384 f''''(h/2) + ... ,

and solving for I_h, we find

I_h = (h/2)(f(0) + f(h)) − h^3/12 f''(h/2) − h^5/480 f''''(h/2) + ... .     (6.4)
6.1.3 Simpson's rule

An analogous calculation over the double interval [0, 2h], combining the endpoint and midpoint values so that the h^3 error term cancels, yields Simpson's rule:

I_{2h} = (h/3)(f(0) + 4f(h) + f(2h)) − h^5/90 f''''(h) + ... .     (6.5)
6.2 Composite rules

6.2.1 Trapezoidal rule

Suppose the function f(x) is known at the n + 1 points x_0 = a < x_1 < ... < x_n = b, and define

f_i = f(x_i),   h_i = x_{i+1} − x_i.

Then

∫_a^b f(x) dx = Σ_{i=0}^{n-1} ∫_{x_i}^{x_{i+1}} f(x) dx
             = Σ_{i=0}^{n-1} ∫_0^{h_i} f(x_i + s) ds,
where the last equality arises from the change-of-variables s = x − xi . Applying the
trapezoidal rule to the integral, we have
∫_a^b f(x) dx = Σ_{i=0}^{n-1} (h_i/2)(f_i + f_{i+1}).     (6.6)
If the points are not evenly spaced, say because the data are experimental values,
then the hi may differ for each value of i and (6.6) is to be used directly.
However, if the points are evenly spaced, say because f(x) can be computed, we have h_i = h, independent of i. We can then define

x_i = a + ih,   i = 0, 1, ..., n;   h = (b − a)/n.
The composite trapezoidal rule for evenly spaced points then becomes

∫_a^b f(x) dx = (h/2) Σ_{i=0}^{n-1} (f_i + f_{i+1})
             = (h/2)(f_0 + 2f_1 + ... + 2f_{n-1} + f_n).     (6.7)
The first and last terms have a multiple of one; all other terms have a multiple of
two; and the entire sum is multiplied by h/2.
6.2.2 Simpson's rule

For evenly spaced points, the composite Simpson's rule is obtained by applying (6.5) over successive pairs of subintervals. Note that n must be even for this scheme to work. Combining terms, we have

∫_a^b f(x) dx = (h/3)(f_0 + 4f_1 + 2f_2 + 4f_3 + 2f_4 + ... + 4f_{n-1} + f_n).
The first and last terms have a multiple of one; the even indexed terms have a
multiple of 2; the odd indexed terms have a multiple of 4; and the entire sum is
multiplied by h/3.
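A minimal MATLAB sketch of both composite rules applied to a test integral (the integrand and n are arbitrary illustrations):

f = @(x) exp(-x.^2);            % test integrand
a = 0; b = 1; n = 100;          % n must be even for Simpson's rule
h = (b - a)/n;
fx = f(a + h*(0:n));
T = h/2*(fx(1) + 2*sum(fx(2:n)) + fx(n+1));                          % trapezoidal
S = h/3*(fx(1) + 4*sum(fx(2:2:n)) + 2*sum(fx(3:2:n-1)) + fx(n+1));   % Simpson
fprintf('trapezoid: %.10f\nSimpson:   %.10f\n', T, S)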
6.3 Local versus global error

Consider the composite trapezoidal rule with evenly spaced points. Keeping the local error term of (6.4), with ξ_i a point in the ith subinterval, we have

∫_a^b f(x) dx = (h/2)(f_0 + 2f_1 + ... + 2f_{n-1} + f_n) − (h^3/12) Σ_{i=0}^{n-1} f''(ξ_i).

The sum of the local errors can be written in terms of the average value of the second derivative over the subintervals, denoted ⟨f''(ξ_i)⟩:

−(h^3/12) Σ_{i=0}^{n-1} f''(ξ_i) = −(n h^3/12) ⟨f''(ξ_i)⟩.

With

n = (b − a)/h,
so that the error term becomes

−(n h^3/12) ⟨f''(ξ_i)⟩ = −((b − a) h^2/12) ⟨f''(ξ_i)⟩
                       = O(h^2).
Therefore, the global error is O(h2 ). That is, a halving of the grid spacing only
decreases the global error by a factor of four.
Similarly, Simpson’s rule has a local error of O(h5 ) and a global error of O(h4 ).
6.4 Adaptive integration

The idea of adaptive integration is to refine the grid only where it is needed to achieve a specified accuracy. We describe the adaptive version of Simpson's rule. Simpson's rule is first applied over the whole interval [a, b] using its midpoint c, and then over the two half intervals [a, c] and [c, b] using their midpoints d and e; comparing the two results provides an estimate of the error.
(Figure: the integration interval with points labeled a, d, c, e, b from left to right; c is the midpoint of [a, b], and d and e are the midpoints of [a, c] and [c, b].)
At Level 1, define

h = (b − a)/2.

The exact integral over [a, b], the Simpson approximation using the whole interval, and the Simpson approximation using the two half intervals are

I = (h/3)(f(a) + 4f(c) + f(b)) − (h^5/90) f''''(ξ),

S1 = (h/3)(f(a) + 4f(c) + f(b)),

S2 = (h/6)(f(a) + 4f(d) + 2f(c) + 4f(e) + f(b)),

with the corresponding error terms

E1 = −(h^5/90) f''''(ξ),

E2 = −(h^5/(2^5 · 90)) (f''''(ξ_l) + f''''(ξ_r)).
Now we ask whether S2 is accurate enough, or must we further refine the calcula-
tion and go to Level 2? To answer this question, we make the simplifying approxi-
mation that all of the fourth-order derivatives of f ( x ) in the error terms are equal;
that is,
f''''(ξ) = f''''(ξ_l) = f''''(ξ_r) = C.
Then

E1 = −(h^5/90) C,

E2 = −(h^5/(2^4 · 90)) C = (1/16) E1.
Then since
S1 + E1 = S2 + E2 ,
and
E1 = 16E2 ,
we have for our estimate for the error term E2,

E2 = (S2 − S1)/15.
Therefore, given some specific value of the tolerance tol, if
|S2 − S1|/15 < tol,
then we can accept S2 as I. If the tolerance is not achieved for I, then we proceed to
Level 2.
The computation at Level 2 further divides the integration interval from a to b
into the two integration intervals a to c and c to b, and proceeds with the above
procedure independently on both halves. Integration can be stopped on either half
provided the tolerance is less than tol/2 (since the sum of both errors must be less
than tol). Otherwise, either half can proceed to Level 3, and so on.
As a side note, the two values of I given above (for integration with step sizes h and h/2) can be combined to give a more accurate value for I given by

I = (16 S2 − S1)/15 + O(h^7),
where the error terms of O(h5 ) approximately cancel. This free lunch, so to speak,
is called Richardson’s extrapolation.
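One possible recursive MATLAB implementation of this procedure (a sketch; the function name and interface are my own, and the acceptance test is the (S2 − S1)/15 criterion derived above):

function I = adaptive_simpson(f, a, b, tol)
% Adaptive Simpson quadrature of f over [a,b] to tolerance tol.
    c = (a + b)/2;  d = (a + c)/2;  e = (c + b)/2;
    h = (b - a)/2;
    S1 = h/3*(f(a) + 4*f(c) + f(b));
    S2 = h/6*(f(a) + 4*f(d) + 2*f(c) + 4*f(e) + f(b));
    if abs(S2 - S1)/15 < tol
        I = S2;                                    % accept this level
    else
        I = adaptive_simpson(f, a, c, tol/2) ...   % refine both halves,
          + adaptive_simpson(f, c, b, tol/2);      % each to half the tolerance
    end
end

For example, adaptive_simpson(@(x) sin(x), 0, pi, 1e-8) returns a value close to 2.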
Chapter 7

Ordinary Differential Equations

A classic first-order initial value problem is the RC circuit, in which a charged capacitor discharges through a resistor; with q(0) = q_0, the solution is

q(t) = q_0 e^{-t/RC}.
The classic second-order initial value problem is the RLC circuit, with differen-
tial equation
L d²q/dt² + R dq/dt + q/C = 0.
Here, a charged capacitor is connected to a closed circuit, and the initial conditions
satisfy
q(0) = q_0,   dq/dt(0) = 0.
The solution is obtained for the second-order equation by the ansatz
q(t) = e^{rt},
which results in the following so-called characteristic equation for r:
Lr² + Rr + 1/C = 0.
If the two solutions for r are distinct and real, then the two found exponential
solutions can be multiplied by constants and added to form a general solution. The
constants can then be determined by requiring the general solution to satisfy the
two initial conditions. If the roots of the characteristic equation are complex or
degenerate, a general solution to the differential equation can also be found.
The temperature of the rod is maximum at x = 1/2 and goes smoothly to zero at
the ends.
As an example of an eigenvalue problem, consider

y'' + λ²y = 0,

with boundary conditions y(0) = 0 and y(1) = 0. The general solution is y(x) = A cos λx + B sin λx. The boundary condition y(0) = 0 requires

A = 0.

The boundary condition y(1) = 0 then requires

B sin λ = 0,

so that nontrivial solutions exist only for λ = π, 2π, 3π, .... The corresponding negative values of λ are also solutions, but their inclusion only changes the corresponding values of the unknown B constant. A linear superposition of all the solutions results in the general solution

y(x) = Σ_{n=1}^∞ B_n sin nπx.
For each eigenvalue nπ, we say there is a corresponding eigenfunction sin nπx.
When the differential equation can not be solved analytically, a numerical method
should be able to solve for both the eigenvalues and eigenfunctions.
Consider the first-order ode

ẋ = f(t, x),     (7.3)

with the initial condition x(0) = x_0. Define t_n = n∆t and x_n = x(t_n). A Taylor series expansion of x_{n+1} results in

x_{n+1} = x(t_n + ∆t)
        = x(t_n) + ∆t ẋ(t_n) + O(∆t²)
        = x_n + ∆t f(t_n, x_n) + O(∆t²).

The Euler Method neglects the O(∆t²) term and advances the solution by

x_{n+1} = x_n + ∆t f(t_n, x_n).
We say that the Euler method steps forward in time using a time-step ∆t, starting
from the initial value x0 = x (0). The local error of the Euler Method is O(∆t2 ).
The global error, however, incurred when integrating to a time T, is a factor of 1/∆t
larger and is given by O(∆t). It is therefore customary to call the Euler Method a
first-order method.
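A minimal MATLAB sketch of the Euler Method applied to a test problem (the equation ẋ = −x with x(0) = 1 is an arbitrary illustration with exact solution e^{-t}):

f = @(t, x) -x;                  % right-hand side f(t, x)
dt = 0.01;  T = 2;               % time step and final time
N = round(T/dt);
x = zeros(1, N+1);  x(1) = 1;    % initial condition x(0) = 1
t = (0:N)*dt;
for n = 1:N
    x(n+1) = x(n) + dt*f(t(n), x(n));   % Euler step
end
fprintf('error at t = %g: %.2e\n', T, abs(x(end) - exp(-T)))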
The Modified Euler Method instead averages the slopes at the beginning and the end of the time step:

x_{n+1} = x_n + (∆t/2) (f(t_n, x_n) + f(t_n + ∆t, x_{n+1})).
The obvious problem with this formula is that the unknown value xn+1 appears
on the right-hand-side. We can, however, estimate this value, in what is called the
predictor step. For the predictor step, we use the Euler method to find
x^p_{n+1} = x_n + ∆t f(t_n, x_n).
The corrector step then uses this predicted value:

x_{n+1} = x_n + (∆t/2) (f(t_n, x_n) + f(t_n + ∆t, x^p_{n+1})).
The Modified Euler Method can be rewritten in the following form that we will
later identify as a Runge-Kutta method:
k_1 = ∆t f(t_n, x_n),
k_2 = ∆t f(t_n + ∆t, x_n + k_1),     (7.4)
x_{n+1} = x_n + (1/2)(k_1 + k_2).
The general second-order Runge-Kutta method takes the form

k_1 = ∆t f(t_n, x_n),
k_2 = ∆t f(t_n + α∆t, x_n + βk_1),     (7.5)
x_{n+1} = x_n + ak_1 + bk_2,

with the constants α, β, a, and b to be determined.
First, we Taylor expand the exact solution. Using ẋ = f(t, x) and ẍ = f_t + f f_x, we have

x_{n+1} = x_n + ∆t f(t_n, x_n) + (1/2)(∆t)² (f_t(t_n, x_n) + f(t_n, x_n) f_x(t_n, x_n)) + ... .     (7.6)
Second, we compute x_{n+1} from the Runge-Kutta method given by (7.5). Substituting in k_1 and k_2, we have

x_{n+1} = x_n + a∆t f(t_n, x_n) + b∆t f(t_n + α∆t, x_n + β∆t f(t_n, x_n)).
Taylor series expanding the last term and comparing with (7.6) results in the conditions

a + b = 1,
αb = 1/2,
βb = 1/2.
There are three equations for four parameters, and there exists a family of second-
order Runge-Kutta methods.
The Modified Euler Method given by (7.4) corresponds to α = β = 1 and a =
b = 1/2. Another second-order Runge-Kutta method, called the Midpoint Method,
corresponds to α = β = 1/2, a = 0 and b = 1. This method is written as
k_1 = ∆t f(t_n, x_n),
k_2 = ∆t f(t_n + ∆t/2, x_n + k_1/2),
x_{n+1} = x_n + k_2.
A general three-stage Runge-Kutta method has the form

k_1 = ∆t f(t_n, x_n),
k_2 = ∆t f(t_n + α∆t, x_n + βk_1),
k_3 = ∆t f(t_n + γ∆t, x_n + δk_1 + εk_2),
x_{n+1} = x_n + ak_1 + bk_2 + ck_3.
The minimum number of stages required to achieve a given order of convergence is:

order              2   3   4   5   6   7   8
minimum # stages   2   3   4   6   7   9   11
Because of the jump in the number of stages required between the fourth-order
and fifth-order method, the fourth-order Runge-Kutta method has some appeal.
The general fourth-order method starts with 13 constants, and one then finds 11
constraints. A particularly simple fourth-order method that has been widely used
is given by
k_1 = ∆t f(t_n, x_n),
k_2 = ∆t f(t_n + ∆t/2, x_n + k_1/2),
k_3 = ∆t f(t_n + ∆t/2, x_n + k_2/2),
k_4 = ∆t f(t_n + ∆t, x_n + k_3),
x_{n+1} = x_n + (1/6)(k_1 + 2k_2 + 2k_3 + k_4).
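A compact MATLAB sketch of this classical fourth-order method (again applied to the illustrative test problem ẋ = −x):

f = @(t, x) -x;
dt = 0.1;  N = 20;  x = 1;  t = 0;
for n = 1:N
    k1 = dt*f(t,        x);
    k2 = dt*f(t + dt/2, x + k1/2);
    k3 = dt*f(t + dt/2, x + k2/2);
    k4 = dt*f(t + dt,   x + k3);
    x = x + (k1 + 2*k2 + 2*k3 + k4)/6;
    t = t + dt;
end
fprintf('x(%g) = %.10f, exact = %.10f\n', t, x, exp(-t))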
e = |x_{n+1} − x'_{n+1}|.

Now e is of O(∆t^5), where ∆t is the step size taken. Let ∆τ be the estimated step size required to get the desired error ε. Then we have

e/(∆t)^5 = ε/(∆τ)^5,

or, solving for ∆τ,

∆τ = ∆t (ε/e)^{1/5}.
A second-order equation ẍ = f(t, x, ẋ) can be integrated by first writing it as the pair of first-order equations

ẋ = u,
u̇ = f(t, x, u).

This trick also works for higher-order equations. For another example, the third-order equation

d³x/dt³ = f(t, x, ẋ, ẍ)

can be written as

ẋ = u,
u̇ = v,
v̇ = f(t, x, u, v).
For a coupled system of two first-order equations, ẋ = f(t, x, y) and ẏ = g(t, x, y), the fourth-order Runge-Kutta method generalizes to

k_1 = ∆t f(t_n, x_n, y_n),
l_1 = ∆t g(t_n, x_n, y_n),
k_2 = ∆t f(t_n + ∆t/2, x_n + k_1/2, y_n + l_1/2),
l_2 = ∆t g(t_n + ∆t/2, x_n + k_1/2, y_n + l_1/2),
k_3 = ∆t f(t_n + ∆t/2, x_n + k_2/2, y_n + l_2/2),
l_3 = ∆t g(t_n + ∆t/2, x_n + k_2/2, y_n + l_2/2),
k_4 = ∆t f(t_n + ∆t, x_n + k_3, y_n + l_3),
l_4 = ∆t g(t_n + ∆t, x_n + k_3, y_n + l_3),
x_{n+1} = x_n + (1/6)(k_1 + 2k_2 + 2k_3 + k_4),
y_{n+1} = y_n + (1/6)(l_1 + 2l_2 + 2l_3 + l_4).
Finite difference approximations to the derivatives of y(x) on a grid with spacing h follow from Taylor series. The first-order forward and backward difference approximations to the first derivative are

y'(x) = (y(x + h) − y(x))/h + O(h),

y'(x) = (y(x) − y(x − h))/h + O(h).
The more widely-used second-order approximation is called the central difference
approximation and is given by
y'(x) = (y(x + h) − y(x − h))/(2h) + O(h²).
The finite difference approximation to the second derivative can be found from
considering
y(x + h) + y(x − h) = 2y(x) + h² y''(x) + (1/12) h⁴ y''''(x) + ... ,

from which we find

y''(x) = (y(x + h) − 2y(x) + y(x − h))/h² + O(h²).
Sometimes a second-order method is required for x on the boundaries of the do-
main. For a boundary point on the left, a second-order forward difference method
requires the additional Taylor series
y(x + 2h) = y(x) + 2h y'(x) + 2h² y''(x) + (4/3) h³ y'''(x) + ... .
We combine the Taylor series for y(x + h) and y(x + 2h) to eliminate the term proportional to h²:

y(x + 2h) − 4y(x + h) = −3y(x) − 2h y'(x) + O(h³).

Therefore,

y'(x) = (−3y(x) + 4y(x + h) − y(x + 2h))/(2h) + O(h²).
For a boundary point on the right, we send h → −h to find

y'(x) = (3y(x) − 4y(x − h) + y(x − 2h))/(2h) + O(h²).
2y_1 − y_2 = h² f_1 + A,
−y_{n-1} + 2y_n = h² f_n + B,
−y_1 + 2y_2 − y_3 = h² f_2,
−y_2 + 2y_3 − y_4 = h² f_3,
...
M = diag(-ones(n-1,1),-1) + diag(2*ones(n,1),0) + diag(-ones(n-1,1),1);
y = M\b;
d²y/dx² = f(x, y, dy/dx),
with y(0) = A and y(1) = B, we use a shooting method. First, we formulate the
ode as an initial value problem. We have
dy/dx = z,
dz/dx = f(x, y, z).
The initial condition y(0) = A is known, but the second initial condition z(0) = b
is unknown. Our goal is to determine b such that y(1) = B.
In fact, this is a root-finding problem for an appropriately defined function. We
define the function F = F (b) such that
F (b) = y(1) − B.
In other words, F (b) is the difference between the value of y(1) obtained from
integrating the differential equations using the initial condition z(0) = b, and B.
Our root-finding routine will want to solve F (b) = 0. (The method is called shooting
because the slope of the solution curve for y = y( x ) at x = 0 is given by b, and the
solution hits the value y(1) at x = 1. This looks like pointing a gun and trying to
shoot the target, which is B.)
To determine the value of b that solves F (b) = 0, we iterate using the Secant
method, given by
b_{n+1} = b_n − F(b_n) (b_n − b_{n-1}) / (F(b_n) − F(b_{n-1})).
We need to start with two initial guesses for b, solving the ode for the two
corresponding values of y(1). Then the Secant Method will give us the next value
of b to try, and we iterate until |y(1) − B| < tol, where tol is some specified tolerance
for the error.
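A sketch of the shooting method in MATLAB, for the illustrative linear test problem y'' = −y with y(0) = 0 and y(1) = 1 (so the exact slope is b = 1/sin(1)); the helper name F and the use of ode45 as the integrator are my own choices, not prescribed by the notes:

A = 0;  B = 1;                         % boundary values y(0) = A, y(1) = B
b0 = 0.5;  b1 = 1.5;                   % two initial guesses for the slope y'(0)
F0 = F(b0, A, B);  F1 = F(b1, A, B);
for k = 1:20                           % Secant iteration for F(b) = 0
    b2 = b1 - F1*(b1 - b0)/(F1 - F0);
    b0 = b1;  F0 = F1;
    b1 = b2;  F1 = F(b1, A, B);
    if abs(F1) < 1e-9, break, end
end
fprintf('b = %.8f (exact 1/sin(1) = %.8f)\n', b1, 1/sin(1))

function r = F(b, A, B)
% Integrate y'' = -y from x = 0 to 1 with y(0) = A, y'(0) = b; return y(1) - B.
    opts = odeset('RelTol', 1e-10, 'AbsTol', 1e-12);
    [~, Y] = ode45(@(x, Y) [Y(2); -Y(1)], [0 1], [A; b], opts);
    r = Y(end, 1) - B;
end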
Discretizing the eigenvalue problem y'' = −λ²y with y(0) = y(1) = 0 on n interior grid points with spacing h, the first and last finite-difference equations are

−2y_1 + y_2 = −h²λ²y_1,
y_{n-1} − 2y_n = −h²λ²y_n.
It is of interest to see how the solution develops with increasing n. The smallest
possible value is n = 1, corresponding to a single interior point, and since h = 1/2
we have
−2y_1 = −(1/4)λ²y_1,

so that λ² = 8, or λ = 2√2 = 2.8284. This is to be compared to the first eigenvalue
which is λ = π. When n = 2, we have h = 1/3, and the resulting two equations
written in matrix form are given by
[ -2   1 ] [ y_1 ]            [ y_1 ]
[  1  -2 ] [ y_2 ] = −(1/9)λ² [ y_2 ].
This is a matrix eigenvalue problem with the eigenvalue given by µ = −λ²/9. The solution for µ is arrived at by solving
det([ -2-µ   1 ; 1   -2-µ ]) = 0,
with resulting quadratic equation

(2 + µ)² − 1 = 0.

The solutions are µ = −1, −3, and since λ = 3√(−µ), we have λ = 3, 3√3 = 5.1962.
These two eigenvalues serve as rough approximations to the first two eigenvalues
π and 2π.
With A an n-by-n matrix, the MATLAB variable mu=eig(A) is a vector containing
the n eigenvalues of the matrix A. The built-in function eig.m can therefore be used
to find the eigenvalues. With n grid points, the smaller eigenvalues will converge
more rapidly than the larger ones.
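A MATLAB sketch of this computation for general n (the grid construction is the straightforward one described above):

n = 50;                                       % number of interior grid points
h = 1/(n+1);
M = diag(2*ones(n,1)) - diag(ones(n-1,1),1) - diag(ones(n-1,1),-1);
mu = eig(M);                                  % eigenvalues of the tridiagonal matrix, mu = h^2*lambda^2
lambda = sort(sqrt(mu))/h;                    % approximations to lambda
disp(lambda(1:4)')                            % compare with pi, 2*pi, 3*pi, 4*pi
disp(pi*(1:4))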
We can also consider boundary conditions on the derivative, or mixed boundary conditions. For example, consider the mixed boundary conditions given by y(0) = 0 and y'(1) = 0. The eigenvalues of (7.9) can then be determined analytically to be λ_i = (i − 1/2)π, with i a natural number.

The difficulty we now face is how to implement a boundary condition on the derivative. Our computation of y'' uses a second-order method, and we would like the computation of the first derivative to also be second order. The condition y'(1) = 0 occurs on the right-most boundary, and we can make use of the second-order backward-difference approximation to the derivative that we have previously derived. This finite-difference approximation for y'(1) can be written as
y'_{n+1} = (3y_{n+1} − 4y_n + y_{n-1})/(2h).     (7.11)
Now, the nth finite-difference equation was given by

y_{n-1} − 2y_n + y_{n+1} = −h²λ²y_n,

and we now replace the value y_{n+1} using (7.11); that is,

y_{n+1} = (1/3)(2h y'_{n+1} + 4y_n − y_{n-1}).
Implementing the boundary condition y'_{n+1} = 0, we have

y_{n+1} = (4/3)y_n − (1/3)y_{n-1}.
Therefore, the nth finite-difference equation becomes

(2/3)y_{n-1} − (2/3)y_n = −h²λ²y_n.
For example, when n = 2, the finite difference equations become
[ -2     1   ] [ y_1 ]            [ y_1 ]
[ 2/3  -2/3  ] [ y_2 ] = −(1/9)λ² [ y_2 ].
The eigenvalue problem can also be solved with a shooting method. Writing the equation as the first-order system

y' = z,
z' = −λ²y,

we integrate from x = 0 to x = 1 using the initial conditions y(0) = 0 and y'(0) = 1, and search for the values of λ satisfying

F(λ) = 0,     (7.12)
where F (λ) = y(1), obtained by solving the initial value problem. Again, an itera-
tion for the roots of F (λ) can be done using the Secant Method. For the eigenvalue
problem, there are an infinite number of roots, and the choice of the two initial
guesses for λ will then determine to which root the iteration will converge.
For this simple problem, it is possible to write explicitly the equation F (λ) = 0.
The general solution to (7.9) is given by

y(x) = A cos λx + B sin λx.

The initial condition y(0) = 0 yields A = 0. The initial condition y'(0) = 1 yields B = 1/λ.
Therefore,

y(x) = sin(λx)/λ.
Since F(λ) = y(1), we have

F(λ) = sin λ / λ,
and the roots occur when λ = π, 2π, . . . .
If the boundary conditions were y(0) = 0 and y'(1) = 0, for example, then we would simply redefine F(λ) = y'(1). We would then have
F(λ) = cos λ,
and the roots occur when λ = π/2, 3π/2, . . . .