MA2000 Notes

COLLEGE OF SCIENCE AND ENGINEERING
MA2000
MATHEMATICS FOR SCIENTISTS AND ENGINEERS
LECTURE NOTES
(Including Tutorial Exercises)
© Mathematics Discipline, James Cook University.

Contents
1 Partial Derivatives
1.1 Functions of two variables
1.2 Derivatives
1.2.1 Definition of partial derivative
1.2.2 Higher Order Derivatives
1.3 Differentials
1.4 Directional derivatives
1.5 Chain rules
1.6 Functions of Three Variables
1.7 Tangent Planes
1.8 EXERCISES - Partial Derivatives
1.9 SELECTED ANSWERS - Partial Derivatives
2 Line Integrals
2.1 Arc Length
2.2 Integral of a function, along a curve
2.3 Work Done
2.4 Path dependence
2.5 EXERCISES - Line Integrals
2.6 SELECTED ANSWERS - Line Integrals
3 Multiple Integration
3.1 Double Integrals
3.1.1 Polar Coordinates
3.2 Triple Integrals
3.2.1 Polar Coordinates
3.3 Surface Integrals
3.4 EXERCISES - Multiple Integration
3.5 SELECTED ANSWERS - Multiple Integration
4 Vector Calculus
4.1 Divergence of a Vector Field
4.2 Curl of a Vector Field
4.2.1 Some Important Identities
4.3 EXERCISES - Vector Calculus
4.4 SELECTED ANSWERS - Vector Calculus
5 Fourier Series
5.1 Periodic Functions
5.1.1 Trigonometric Series
5.2 Half-range Series
5.3 EXERCISES - Fourier Series
5.4 SELECTED ANSWERS - Fourier Series
6 Eigenvalues and Eigenvectors
6.1 Revision of Matrices
6.1.1 Matrix Algebra
6.1.2 Solving systems of linear equations
6.1.3 Homogeneous Systems
6.1.4 The inner product
6.2 Eigenvalues and Eigenvectors
6.3 EXERCISES - Eigenvalues and eigenvectors
6.4 SELECTED ANSWERS - Eigenvalues and eigenvectors
7 Partial Differential Equations
7.1 Types of Partial Differential Equation
7.2 Separation of Variables
7.2.1 The Heat Equation
7.2.2 The Wave Equation
7.3 EXERCISES - Partial Differential Equations
7.4 SELECTED ANSWERS - Partial Differential Equations
8 Probability and Statistics
8.1 Probability Fundamentals
8.2 Conditional Probability and Independence
8.2.1 Independence
8.3 Permutations and Combinations
8.4 Random Variables (X)
8.4.1 Probability Distributions for Discrete Random Variables
8.4.2 Probability Distributions for Continuous Random Variables
8.4.3 Expectation for Discrete Random Variables
8.4.4 Expectation for Continuous Random Variables
8.5 EXERCISES - Probability and Statistics
8.6 SELECTED ANSWERS - Probability and Statistics
9 Discrete Probability Distributions
9.1 Binomial Distribution
9.2 Poisson Distribution
9.3 Hypergeometric Distribution
9.4 EXERCISES - Discrete Probability Distributions
9.5 SELECTED ANSWERS - Discrete Probability Distributions
10 Continuous Probability Distributions
10.1 Uniform Probability Density Function
10.2 Normal Distribution (Gaussian Distribution)
10.2.1 Standard Normal Distribution
10.3 Log-Normal Distribution
10.4 Exponential Distribution
10.5 Gamma Distribution
10.6 Weibull Distribution
10.7 EXERCISES - Continuous Probability Distributions
10.8 SELECTED ANSWERS - Continuous Probability Distributions
11 Sampling and Hypothesis Testing
11.1 Chebychev’s Theorem
11.2 Sampling and Sampling Distributions
11.2.1 Populations and Samples
11.2.2 Measures of Location
11.2.3 Measures of Variability
11.3 Central Limit Theorem
11.4 Statistical Hypothesis Testing
11.4.1 Similarities with a Court Trial
11.4.2 Errors in Hypothesis Testing
11.5 EXERCISES - Sampling and Hypothesis Testing
11.6 SELECTED ANSWERS - Sampling and Hypothesis Testing
1 Partial Derivatives
1.1 Functions of two variables
Functions of one variable, y = f(x), have one independent variable (x) and one dependent
variable (y). We can represent such functions on a 2-dimensional graph. For example, on the
graph of y = f(x), the slope of the tangent at any point is the derivative of the function.
Functions of two variables will have the form z = f (x, y). They have two independent
variables (x and y) and one dependent variable (z).
For example, the volume of a cone,
$$V = \tfrac{1}{3}\pi r^2 h,$$
depends on the height h and the radius of the base r. The volume V is the dependent
variable. We can vary r, keeping h fixed, or vary h keeping r fixed, or vary both. In each
case, V will generally change.
As a second example, consider a vibrating string that is fixed at x = 0 and x = l. Its
displacement u will depend on x and t. For example,
$$u(x, t) = \sin\frac{\pi x}{l}\,\cos\frac{\pi c t}{l}.$$
[Figure: displacement u of a string fixed at x = 0 and x = l.]
To represent z = f(x, y) graphically, we need three dimensions. The function actually
represents a two-dimensional surface in three dimensions. Consider the function $z = x^2 + y^2$.
At each point in the x–y plane, calculate z and plot the point (x, y, z).
The points make up a surface called a paraboloid. It is actually the parabola $z = x^2$
rotated about the z axis.
[Figure: the paraboloid $z = f(x, y) = x^2 + y^2$, plotted above the point (x, y).]
Another example is the function
$$z = \sqrt{a^2 - x^2 - y^2}.$$
This represents a hemisphere of radius a.
[Figure: hemisphere of radius a.]
The function z = xy represents a ‘saddle’.
[Figure: the saddle surface z = xy.]
Finally, we have already seen that the function z = ax + by + c represents a flat plane.
Another way to represent these functions is to use ‘level curves’. For example, if
$z = f(x, y) = x^2 + y^2$, the level curves are z = constant or $x^2 + y^2 = \text{constant} = a^2$.
This will be a family of circles.
If z = xy, the level curves are xy = constant, i.e.
$$y = \frac{\text{constant}}{x}.$$
This will be a family of hyperbolae.
Families of level curves can be seen as contour maps of the surface.
1.2 Derivatives
The derivative is the rate of change of the function as x and y change. But the change in
(x, y) can be in any direction.
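For example, for the cone, $V = \tfrac{1}{3}\pi r^2 h$, we can analyse how V changes as h → h + δh:
$$V \to V + \delta V = \tfrac{1}{3}\pi r^2 (h + \delta h) = \tfrac{1}{3}\pi r^2 h + \tfrac{1}{3}\pi r^2\,\delta h$$
so $\delta V = \tfrac{1}{3}\pi r^2\,\delta h$. The rate of change is $\dfrac{\delta V}{\delta h} = \tfrac{1}{3}\pi r^2$, so
$$\frac{\partial V}{\partial h} = \lim_{\delta h \to 0} \frac{\delta V}{\delta h} = \tfrac{1}{3}\pi r^2.$$
This is the partial derivative with respect to h (keeping r constant).
The ∂ indicates that the other variable is kept constant.
On the other hand, if we change r:
$$V + \delta V = \tfrac{1}{3}\pi (r + \delta r)^2 h$$
so $\delta V = \tfrac{1}{3}\pi \left(2r\,\delta r + (\delta r)^2\right) h$. The rate of change is $\dfrac{\delta V}{\delta r} = \tfrac{1}{3}\pi (2r + \delta r) h$ and
$$\frac{\partial V}{\partial r} = \lim_{\delta r \to 0} \frac{\delta V}{\delta r} = \tfrac{2}{3}\pi r h.$$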
Note that we do each differentiation keeping the other variable fixed.
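The two limits above can be checked numerically. The following is a minimal sketch, assuming illustrative values r = 2 and h = 5 (these are not from the notes); the helper name `partial` is our own. It varies one argument at a time with a small step while the other is held fixed.

```python
import math

def cone_volume(r, h):
    """Volume of a cone with base radius r and height h."""
    return math.pi * r**2 * h / 3.0

def partial(f, args, index, step=1e-6):
    """Central-difference estimate of the partial derivative of f with
    respect to args[index], holding every other argument fixed."""
    up, down = list(args), list(args)
    up[index] += step
    down[index] -= step
    return (f(*up) - f(*down)) / (2.0 * step)

r, h = 2.0, 5.0  # assumed sample values, chosen only for illustration
dV_dr = partial(cone_volume, (r, h), 0)
dV_dh = partial(cone_volume, (r, h), 1)

# Compare with the formulas derived above.
print(dV_dh, math.pi * r**2 / 3.0)
print(dV_dr, 2.0 * math.pi * r * h / 3.0)
```

Each printed pair should agree to several decimal places, since the finite difference only approximates the limit.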
1.2.1 Definition of partial derivative
If f(x, y) is a function of two variables,
$$\frac{\partial f}{\partial x}(x_0, y_0) = \lim_{\Delta x \to 0} \frac{f(x_0 + \Delta x, y_0) - f(x_0, y_0)}{\Delta x}$$
$$\frac{\partial f}{\partial y}(x_0, y_0) = \lim_{\Delta y \to 0} \frac{f(x_0, y_0 + \Delta y) - f(x_0, y_0)}{\Delta y}$$
This gives the rate at which f is changing as we move in the x or y direction respectively.
On the surface, the derivative indicates a movement either uphill or downhill and how fast.
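Example: $f(x, y) = 2x^2 + 2x^3 y^4 - 2xy$
$$\frac{\partial f}{\partial x} = 4x + 6x^2 y^4 - 2y \quad\text{and}\quad \frac{\partial f}{\partial y} = 8x^3 y^3 - 2x$$
Example: $f(x, y) = e^x \sin(xy)$
$$\frac{\partial f}{\partial x} = e^x \sin(xy) + e^x y \cos(xy) \quad\text{and}\quad \frac{\partial f}{\partial y} = e^x x \cos(xy)$$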
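Note that $\left(\dfrac{\partial f}{\partial x}\right)_y$ and $\left(\dfrac{\partial f}{\partial y}\right)_x$ is the notation used when it is not clear which variable is to be
held constant.
That is, to calculate $\left(\dfrac{\partial f}{\partial y}\right)_x$, we first express f as a function of x and y, and not some related
variables.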
1.2.2 Higher Order Derivatives
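We can calculate higher derivatives similarly.
$$\frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right)$$
(differentiate twice with respect to x keeping y fixed.)
$$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right)$$
(differentiate with respect to y then x.)
$$\frac{\partial^2 f}{\partial y \partial x} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right)$$
(differentiate with respect to x then y.)
etc.
Note: A shorthand notation is often used:
$$f_x = \frac{\partial f}{\partial x}, \quad f_y = \frac{\partial f}{\partial y}, \quad f_{xx} = \frac{\partial^2 f}{\partial x^2}, \quad f_{yy} = \frac{\partial^2 f}{\partial y^2}, \quad f_{xy} = \frac{\partial^2 f}{\partial y \partial x}, \quad f_{yx} = \frac{\partial^2 f}{\partial x \partial y}$$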
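Example: $f(x, y) = 2x^2 + 2x^3 y^4 - 2xy$
$$f_x = \frac{\partial f}{\partial x} = 4x + 6x^2 y^4 - 2y, \qquad f_y = \frac{\partial f}{\partial y} = 8x^3 y^3 - 2x$$
$$f_{xx} = \frac{\partial^2 f}{\partial x^2} = 4 + 12xy^4, \qquad f_{yx} = \frac{\partial^2 f}{\partial x \partial y} = 24x^2 y^3 - 2$$
$$f_{xy} = \frac{\partial^2 f}{\partial y \partial x} = 24x^2 y^3 - 2, \qquad f_{yy} = \frac{\partial^2 f}{\partial y^2} = 24x^3 y^2$$
Note that $\dfrac{\partial^2 f}{\partial y \partial x} = \dfrac{\partial^2 f}{\partial x \partial y}$. This is true whenever the mixed second derivatives are continuous, which covers most functions met in practice.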
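The equality of the mixed derivatives in this example can also be checked numerically. The sketch below nests two central differences in either order; the sample point (1.5, 0.5) and the step size are assumptions made for illustration, and the helper names `d_dx` and `d_dy` are our own.

```python
def f(x, y):
    """The example function f(x, y) = 2x^2 + 2x^3 y^4 - 2xy."""
    return 2 * x**2 + 2 * x**3 * y**4 - 2 * x * y

def d_dx(g, step=1e-3):
    """Return a function approximating the partial derivative of g in x."""
    return lambda x, y: (g(x + step, y) - g(x - step, y)) / (2 * step)

def d_dy(g, step=1e-3):
    """Return a function approximating the partial derivative of g in y."""
    return lambda x, y: (g(x, y + step) - g(x, y - step)) / (2 * step)

f_xy = d_dy(d_dx(f))  # differentiate with respect to x, then y
f_yx = d_dx(d_dy(f))  # differentiate with respect to y, then x

x0, y0 = 1.5, 0.5                  # assumed sample point
exact = 24 * x0**2 * y0**3 - 2     # the formula 24 x^2 y^3 - 2 found above
print(f_xy(x0, y0), f_yx(x0, y0), exact)
```

The three printed numbers should agree closely; small differences come from the finite step size.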
1.3 Differentials
We already know how a function of two variables changes if we change the value of x or y
by a small amount. The rate of change is given by the corresponding partial derivative. We
can now get an idea of how the function changes if we change the values of both x and y by
small amounts. Thus, for a function, z = f (x, y), we want to find what happens to z if we
move from (x, y) to (x + ∆x, y + ∆y). The change in z will be given by
$$\Delta z = f(x + \Delta x, y + \Delta y) - f(x, y).$$
We can write this as
$$\Delta z = \left[f(x + \Delta x, y + \Delta y) - f(x + \Delta x, y)\right] + \left[f(x + \Delta x, y) - f(x, y)\right].$$
The first term is the change due to the change in y and the second term is the change due
to the change in x. We can approximate these separately as
$$f(x + \Delta x, y + \Delta y) - f(x + \Delta x, y) \approx f_y\,\Delta y,$$
$$f(x + \Delta x, y) - f(x, y) \approx f_x\,\Delta x.$$
Therefore,
$$\Delta z \approx f_x\,\Delta x + f_y\,\Delta y.$$
(Note that we might expect that $f_y$ would need to be evaluated at the point
(x + ∆x, y) rather than the point (x, y). However, the difference that would result is small
and, to the accuracy of the approximations we have made, we can evaluate both $f_x$ and $f_y$
at (x, y).)
The formula for ∆z is called the increment formula and it is analogous to the situation for
functions of one variable where $\delta y \approx \dfrac{dy}{dx}\,\delta x$. We simply have an extra term for the extra
independent variable. It applies to small but finite changes, ∆x and ∆y. If we consider changes
in x and y that are infinitesimally small we usually write these changes as dx and dy. The
corresponding change in z is
$$dz = f_x\,dx + f_y\,dy.$$
dz is called the differential.
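Example: The area of a right-angled triangle with base x and angle θ is $A = \tfrac{1}{2} x^2 \tan\theta$.
[Figure: right-angled triangle with base x, angle θ and height x tan θ.]
(a) Approximate the change in the area if x changes from 1 to 0.95 and θ changes from 45°
to 50°.
The increment formula is
$$\Delta A \approx \frac{\partial A}{\partial x}\,\Delta x + \frac{\partial A}{\partial \theta}\,\Delta\theta.$$
Now $\partial A/\partial x = x \tan\theta = 1$ (if x = 1 and θ = 45°) and $\partial A/\partial\theta = \tfrac{1}{2} x^2 \sec^2\theta = 1$. Therefore,
$$\Delta A \approx \frac{\partial A}{\partial x}\,\Delta x + \frac{\partial A}{\partial \theta}\,\Delta\theta = \Delta x + \Delta\theta = -0.05 + \frac{5\pi}{180} \approx 0.0373$$
(b) If x changes from 1 to 0.95, calculate the change in θ required to keep A the same.
In this case,
$$\Delta A = 0 = \frac{\partial A}{\partial x}\,\Delta x + \frac{\partial A}{\partial \theta}\,\Delta\theta$$
∴ 0 = −0.05 + ∆θ.
∴ ∆θ = 0.05 radians = 2.86°.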
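To see how good the increment formula is for part (a), we can compare it with the exact change in area. This is a minimal sketch using only the quantities in the example; the variable names are our own.

```python
import math

def area(x, theta):
    """Area of the right-angled triangle: A = (1/2) x^2 tan(theta)."""
    return 0.5 * x**2 * math.tan(theta)

x0, theta0 = 1.0, math.radians(45.0)   # starting base and angle
dx, dtheta = -0.05, math.radians(5.0)  # changes in base and angle (radians)

# Partial derivatives evaluated at the starting point.
A_x = x0 * math.tan(theta0)
A_theta = 0.5 * x0**2 / math.cos(theta0) ** 2

approx_change = A_x * dx + A_theta * dtheta
exact_change = area(x0 + dx, theta0 + dtheta) - area(x0, theta0)
print(approx_change, exact_change)
```

The two values differ by roughly 0.0005, which reflects the second-order terms that the increment formula neglects.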
1.4 Directional derivatives
For functions of two variables, z = f (x, y), we know the derivative in the x direction is fx
and the derivative in the y direction is fy . What about other directions?
Represent f(x, y) by its level curves and consider the point $(x_0, y_0)$. A displacement, (∆x, ∆y), from
that point will make an angle θ with the x–axis.
The unit vector in that direction is
$$\hat{\underset{\sim}{u}} = \cos\theta\,\underset{\sim}{i} + \sin\theta\,\underset{\sim}{j}$$
where $\cos\theta = \dfrac{\Delta x}{\Delta s}$, $\sin\theta = \dfrac{\Delta y}{\Delta s}$ and $\Delta s = \sqrt{(\Delta x)^2 + (\Delta y)^2}$. If we move in the direction of $\hat{\underset{\sim}{u}}$,
the change in f is
$$\Delta f = \Delta z \approx f_x\,\Delta x + f_y\,\Delta y.$$
The rate of change is $\dfrac{\Delta f}{\Delta s}$. The directional derivative is
$$\frac{df}{ds} = \lim_{\Delta s \to 0} \frac{\Delta f}{\Delta s} = \lim_{\Delta s \to 0} \left( f_x \frac{\Delta x}{\Delta s} + f_y \frac{\Delta y}{\Delta s} \right) = f_x \cos\theta + f_y \sin\theta.$$
The directional derivative, $\dfrac{df}{ds}$, depends on the direction, θ (or $\hat{\underset{\sim}{u}}$), as well as the position
$(x_0, y_0)$. This is the quantity that tells us if we are going uphill or downhill and how fast.
For displacements parallel to the x–axis, θ = 0 and the directional derivative is $\dfrac{df}{ds} = f_x$ (the
partial derivative). For displacements parallel to the y–axis, θ = π/2 and the directional
derivative is $\dfrac{df}{ds} = f_y$.
The directional derivative is sometimes written as $D_{\hat{\underset{\sim}{u}}} f$.
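Example: Calculate $D_{\hat{\underset{\sim}{u}}} f$ in the direction of $\underset{\sim}{i} + 2\underset{\sim}{j}$ for the function
$f(x, y) = 2x^2 + 2x^3 y^4 - 2xy$ at the point (3, 1).
Now, $f_x = 4x + 6x^2 y^4 - 2y = 64$ at (3, 1) and $f_y = 8x^3 y^3 - 2x = 210$ at (3, 1).
$\hat{\underset{\sim}{u}}$ is in the direction $\underset{\sim}{i} + 2\underset{\sim}{j}$ so:
$$\hat{\underset{\sim}{u}} = \frac{1}{\sqrt{5}}\,\underset{\sim}{i} + \frac{2}{\sqrt{5}}\,\underset{\sim}{j}$$
$$\therefore\ D_{\hat{\underset{\sim}{u}}} f = 64 \times \frac{1}{\sqrt{5}} + 210 \times \frac{2}{\sqrt{5}} = 216.45.$$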
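The arithmetic in this example is easy to reproduce. The sketch below, with helper names of our own choosing, normalises the direction vector and combines it with the partial derivatives found above.

```python
import math

def grad_f(x, y):
    """Partial derivatives (f_x, f_y) of f = 2x^2 + 2x^3 y^4 - 2xy."""
    return (4 * x + 6 * x**2 * y**4 - 2 * y, 8 * x**3 * y**3 - 2 * x)

def directional_derivative(grad, direction):
    """f_x u_x + f_y u_y, where u is the unit vector along `direction`."""
    length = math.hypot(direction[0], direction[1])
    unit = (direction[0] / length, direction[1] / length)
    return grad[0] * unit[0] + grad[1] * unit[1]

value = directional_derivative(grad_f(3.0, 1.0), (1.0, 2.0))
print(round(value, 2))
```

Note that the direction must be converted to a unit vector first; using i + 2j directly would overstate the rate of change by a factor of √5.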
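The directional derivative can also be written in terms of vectors. For example,
$$D_{\hat{\underset{\sim}{u}}} f = \frac{df}{ds} = f_x \cos\theta + f_y \sin\theta = (f_x\,\underset{\sim}{i} + f_y\,\underset{\sim}{j}) \cdot (u_x\,\underset{\sim}{i} + u_y\,\underset{\sim}{j}).$$
The vector on the right is just $\hat{\underset{\sim}{u}}$. The other vector is a new vector called grad f or $\underset{\sim}{\nabla} f$. Thus
$$D_{\hat{\underset{\sim}{u}}} f = \underset{\sim}{\nabla} f \cdot \hat{\underset{\sim}{u}}.$$
The symbol $\underset{\sim}{\nabla}$ represents a vector differential operator which can be written as $\underset{\sim}{i}\,\dfrac{\partial}{\partial x} + \underset{\sim}{j}\,\dfrac{\partial}{\partial y}$.
Note that $\underset{\sim}{\nabla} f$ contains information about the function and the position and $\hat{\underset{\sim}{u}}$ gives the
direction in which we want the derivative.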
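Example: Consider the function $z = f(x, y) = x^2 + y^2$. Determine the directional
derivative at $(\sqrt{3}/2, 1/2)$ in the direction of $\underset{\sim}{u} = (1, 1)$.
For this function, the level curves are circles.
[Figure: circular level curves with the direction $\underset{\sim}{u}$ drawn at the point.]
First we calculate $f_x$ and $f_y$:
$$\frac{\partial f}{\partial x} = 2x = \sqrt{3} \quad\text{and}\quad \frac{\partial f}{\partial y} = 2y = 1$$
so $\underset{\sim}{\nabla} f = \sqrt{3}\,\underset{\sim}{i} + \underset{\sim}{j}$.
This tells us what the function looks like at the point. Also, $\hat{\underset{\sim}{u}} = (1/\sqrt{2}, 1/\sqrt{2})$. Therefore
$$\underset{\sim}{\nabla} f \cdot \hat{\underset{\sim}{u}} = \sqrt{3}/\sqrt{2} + 1/\sqrt{2} \approx 1.93.$$
This tells us how fast f is increasing in the direction of $\hat{\underset{\sim}{u}}$.
Note that f is increasing in this direction as the diagram suggests.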
It also helps to draw the direction of $\underset{\sim}{\nabla} f$ on the diagram.
[Figure: level curves with $\underset{\sim}{u}$ and $\underset{\sim}{\nabla} f$ drawn at the point.]
If it is drawn carefully, it is perpendicular to the level curves.
We expect this since, if we move along the level curve, df/ds will be zero. Therefore, if we
choose $\hat{\underset{\sim}{u}}$ to point along the level curve,
$$\underset{\sim}{\nabla} f \cdot \hat{\underset{\sim}{u}} = 0,$$
i.e. a vector parallel to the level curve is perpendicular to $\underset{\sim}{\nabla} f$.
Note that this gives us a quick way of finding the normal to any curve. If the curve is defined
by f(x, y) = const, then the normal to the curve is $\underset{\sim}{\nabla} f = f_x\,\underset{\sim}{i} + f_y\,\underset{\sim}{j}$. This helps to give an
understanding of the ‘meaning’ of $\underset{\sim}{\nabla} f$.
The other thing we may want to know is the direction in which f increases the most rapidly.
Now $df/ds = \underset{\sim}{\nabla} f \cdot \hat{\underset{\sim}{u}} = |\underset{\sim}{\nabla} f|\,|\hat{\underset{\sim}{u}}| \cos\phi$ where φ is the angle between $\underset{\sim}{\nabla} f$ and $\hat{\underset{\sim}{u}}$.
So $df/ds = |\underset{\sim}{\nabla} f| \cos\phi$. This is a maximum if φ = 0, i.e. if $\hat{\underset{\sim}{u}} \propto \underset{\sim}{\nabla} f$.
Therefore, $\underset{\sim}{\nabla} f$ points in the direction of maximum increase. Note also that the maximum
steepness is given by $|\underset{\sim}{\nabla} f|$.
Also note that $\underset{\sim}{\nabla} f$ is a vector function obtained from a scalar function. In many cases,
where a force field is obtained from a scalar potential, the relationship is $\underset{\sim}{F} = -\underset{\sim}{\nabla} V$. Vector
functions that can be obtained from a scalar function in this way are very important.
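The claim that the gradient points in the direction of maximum increase can be illustrated by brute force. The sketch below takes the gradient (√3, 1) from the earlier example, scans directions θ in one-degree steps (an arbitrary resolution chosen for illustration), and compares the best direction with the direction and length of the gradient.

```python
import math

grad = (math.sqrt(3.0), 1.0)  # grad f = (2x, 2y) at (sqrt(3)/2, 1/2)

def rate(theta):
    """Directional derivative f_x cos(theta) + f_y sin(theta)."""
    return grad[0] * math.cos(theta) + grad[1] * math.sin(theta)

# Scan whole-degree directions and keep the one with the largest rate.
best_deg = max(range(360), key=lambda d: rate(math.radians(d)))
best_rate = rate(math.radians(best_deg))

grad_deg = math.degrees(math.atan2(grad[1], grad[0]))  # direction of grad f
grad_norm = math.hypot(grad[0], grad[1])               # |grad f|
print(best_deg, grad_deg, best_rate, grad_norm)
```

The best scanned direction coincides with the angle of the gradient, and the rate in that direction equals its magnitude.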
1.5 Chain rules
For functions of one variable, if y = y(x), and x = x(t) then we recall the Chain Rule:
$$\frac{dy}{dt} = \frac{dy}{dx}\,\frac{dx}{dt}.$$
Similarly, for functions of two variables, z = f(x, y), x and y might both depend on t. For
example, (x(t), y(t)) might be the coordinates of a moving particle, so $\underset{\sim}{r} = x(t)\,\underset{\sim}{i} + y(t)\,\underset{\sim}{j}$, and
f(x, y) might be some potential that affects the motion of the particle. The potential ‘seen’
by the particle is z = f(x(t), y(t)). In this case, z really depends on t only.
The derivative $\dfrac{dz}{dt}$ is a measure of how rapidly the potential is changing for the particle.
In time δt, the change in f is $\delta f \approx f_x\,\delta x + f_y\,\delta y$. We want
$$\lim_{\delta t \to 0} \frac{\delta f}{\delta t} = \lim_{\delta t \to 0} \left( f_x \frac{\delta x}{\delta t} + f_y \frac{\delta y}{\delta t} \right) = f_x \frac{dx}{dt} + f_y \frac{dy}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}$$
Note this is the same as the chain rule for functions of one variable, except that we have two
terms instead of one. It can also be written as
$$\frac{df}{dt} = \underset{\sim}{\nabla} f \cdot \frac{d\underset{\sim}{r}}{dt}.$$
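Example: A particle moves through a potential field $V = 3x^2 y + e^x$ along a path
$\underset{\sim}{r} = t\,\underset{\sim}{i} + \sin t\,\underset{\sim}{j}$. Calculate $\dfrac{dV}{dt}$ when t = π.
First, we calculate
$$\underset{\sim}{\nabla} V = \frac{\partial V}{\partial x}\,\underset{\sim}{i} + \frac{\partial V}{\partial y}\,\underset{\sim}{j} = (6xy + e^x)\,\underset{\sim}{i} + 3x^2\,\underset{\sim}{j}$$
To get $\underset{\sim}{\nabla} V$ in terms of t, we use the path, where x(t) = t and y(t) = sin t, which gives:
$$\underset{\sim}{\nabla} V = (6t \sin t + e^t)\,\underset{\sim}{i} + 3t^2\,\underset{\sim}{j}$$
Since $\underset{\sim}{r} = x(t)\,\underset{\sim}{i} + y(t)\,\underset{\sim}{j}$, we also have
$$\frac{d\underset{\sim}{r}}{dt} = \frac{dx}{dt}\,\underset{\sim}{i} + \frac{dy}{dt}\,\underset{\sim}{j} = \underset{\sim}{i} + \cos t\,\underset{\sim}{j}.$$
Therefore
$$\frac{dV}{dt} = \underset{\sim}{\nabla} V \cdot \frac{d\underset{\sim}{r}}{dt} = (6t \sin t + e^t) + 3t^2 \cos t$$
$$\therefore\ \left.\frac{dV}{dt}\right|_{t=\pi} = e^{\pi} - 3\pi^2$$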
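As a check on this example, we can compare the chain-rule result with a direct numerical derivative of V along the path. This is a sketch with function names of our own; the step size in the finite difference is an arbitrary small value.

```python
import math

def V(x, y):
    """Potential field V = 3x^2 y + e^x."""
    return 3 * x**2 * y + math.exp(x)

def path(t):
    """Position (x(t), y(t)) = (t, sin t) of the particle."""
    return (t, math.sin(t))

def dV_dt_chain(t):
    """Chain rule: grad V . dr/dt evaluated on the path."""
    x, y = path(t)
    grad = (6 * x * y + math.exp(x), 3 * x**2)
    velocity = (1.0, math.cos(t))
    return grad[0] * velocity[0] + grad[1] * velocity[1]

def dV_dt_numeric(t, step=1e-6):
    """Central difference of V(x(t), y(t)) with respect to t."""
    return (V(*path(t + step)) - V(*path(t - step))) / (2 * step)

print(dV_dt_chain(math.pi), dV_dt_numeric(math.pi),
      math.exp(math.pi) - 3 * math.pi**2)
```

All three printed values should agree to several decimal places.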
1.6 Functions of Three Variables
We now extend the above concepts to functions of three variables, i.e. w = f(x, y, z). Most
of the formulae generalise. The main difficulty is in visualizing what is happening. For
example, consider
$$w = x^2 + y^2 + z^2.$$
We can’t ‘plot’ this as a surface, as we would need four dimensions. So, we can’t visualize it
in this way.
We can try to look at the ‘level curves’, i.e. f(x, y, z) = const or
$$x^2 + y^2 + z^2 = \text{constant}.$$
Note that these are now level surfaces. In fact they are spheres with centre at the origin.
They need three dimensions rather than two dimensions.
There will be three partial derivatives. The differential is now
$$dw = f_x\,dx + f_y\,dy + f_z\,dz.$$
The increment formula is
$$\delta w \approx f_x\,\delta x + f_y\,\delta y + f_z\,\delta z.$$
The directional derivative is
$$D_{\hat{\underset{\sim}{u}}} f = \underset{\sim}{\nabla} f \cdot \hat{\underset{\sim}{u}},$$
where $\underset{\sim}{\nabla} f = f_x\,\underset{\sim}{i} + f_y\,\underset{\sim}{j} + f_z\,\underset{\sim}{k}$ and $\hat{\underset{\sim}{u}}$ is a three dimensional unit vector in the direction we are
interested in.
There is also a chain rule. If V(t) = f(x(t), y(t), z(t)), then
$$\frac{dV}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt} = \underset{\sim}{\nabla} f \cdot \frac{d\underset{\sim}{r}}{dt}.$$
1.7 Tangent Planes
As for functions of two variables, $\underset{\sim}{\nabla} f$ is perpendicular to the level curves, only in this case
they are level surfaces. This gives us a way to find the normal of a surface. Suppose we have
a surface z = g(x, y). If
$$f(x, y, z) = z - g(x, y)$$
then the equation f(x, y, z) = 0 will define the same surface. This is a level surface of f.
The normal to this surface is given by $\underset{\sim}{\nabla} f$, i.e.
$$\underset{\sim}{n} \propto \underset{\sim}{\nabla} f = -\frac{\partial g}{\partial x}\,\underset{\sim}{i} - \frac{\partial g}{\partial y}\,\underset{\sim}{j} + \underset{\sim}{k}.$$
This is the same as the formula we would get if we calculated the normal to the surface
z = g(x, y) using other methods. Once we know the normal to the surface, we can calculate
the equation of the tangent plane.
The tangent plane at $(x_0, y_0, z_0)$ will be perpendicular to the normal to
the surface at that point.
Exercise
Show that the equation of the tangent plane to the surface z = g(x, y) at the point $(x_0, y_0, z_0)$
can be expressed as
$$-(x - x_0) g_x - (y - y_0) g_y + (z - z_0) = 0$$
or
$$-g_x x - g_y y + z = (-g_x x_0 - g_y y_0 + z_0) = \text{const}.$$
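To illustrate how the normal and tangent plane formulas are used, the sketch below applies them to the paraboloid $z = x^2 + y^2$ from Section 1.1 at the point (1, 1, 2). The choice of surface and point is ours, and the function `tangent_plane` is a hypothetical helper, not part of the notes.

```python
def tangent_plane(g_x, g_y, point):
    """Return ((a, b, c), d) describing the plane a x + b y + c z = d that is
    tangent to z = g(x, y) at `point`, using the normal (-g_x, -g_y, 1)."""
    x0, y0, z0 = point
    normal = (-g_x(x0, y0), -g_y(x0, y0), 1.0)
    constant = normal[0] * x0 + normal[1] * y0 + normal[2] * z0
    return normal, constant

# Partial derivatives of g(x, y) = x^2 + y^2 (assumed example surface).
g_x = lambda x, y: 2 * x
g_y = lambda x, y: 2 * y

normal, constant = tangent_plane(g_x, g_y, (1.0, 1.0, 2.0))
print(normal, constant)  # plane: -2x - 2y + z = -2
```

The point (1, 1, 2) satisfies the resulting equation, as it must, since the tangent plane touches the surface there.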
1.8 EXERCISES - Partial Derivatives
1. (a) Find $\dfrac{\partial z}{\partial x}$ and $\dfrac{\partial z}{\partial y}$ for each of the following functions.
   (i) $z = e^{2x - 3y}$ (ii) $z = x \ln(x^2 + y^2)$ (iii) $z = \sin x + x \cos y$
   (b) For each of the following functions, evaluate each of the 2nd order derivatives and
   verify that $\dfrac{\partial^2 f}{\partial x \partial y} = \dfrac{\partial^2 f}{\partial y \partial x}$.
   (i) $f(x, y) = 2x^3 y + 3xy^2 - 4y^3$ (ii) $f(x, y) = \cosh(xy + 2y)$
2. (a) If $f(x, y) = \sin(x^2 + y^2)$ find $f_x$ and $f_y$.
   (b) If f(x, t) = g(x − ct) where c is a constant and g(u) can be any function, show
   that
   $$\frac{\partial f}{\partial t} + c\,\frac{\partial f}{\partial x} = 0$$
3. On the attached sheet are the surfaces and level curves of the functions
   (i) $z = \sin\sqrt{x^2 + y^2}$ (ii) $z = x^2 y^2 e^{-x^2 - y^2}$ (iii) $z = \dfrac{1}{x^2 + 4y^2}$
   (iv) $z = x^3 - 3xy^2$ (v) $z = \sin x \sin y$ (vi) $z = \sin^2 x + \tfrac{1}{4} y^2$
   Match each graph with its function.
4. (a) Use the increment formula to approximate the following. (Do not use your calculator.)
   (i) $\sqrt{0.99}\;e^{0.02}$ (ii) $\sqrt{3.01^2 + 3.97^2}$
   (b) A boundary stripe 3 cm wide is painted around a netball court (whose dimensions
   are 15 m by 30 m). Use the increment formula to approximate the number of
   square metres of paint in the stripe.
   (c) The total resistance of two resistors in parallel is given by
   $$R = \left( \frac{1}{R_1} + \frac{1}{R_2} \right)^{-1}$$
   If $R_1$ and $R_2$ are measured to be 75 ± 5 ohms and 30 ± 2 ohms respectively, find
   the maximum error in the calculated value of R.
5. For each of the following, use the chain rule to find df/dt as a function of t.
   (a) $f(x, y) = x e^{x + y}$, x(t) = 2t, $y(t) = t^2$.
   (b) $f(x, y) = x^2 y + \cos(y)$, $x(t) = e^t$, y(t) = 1 − t.
   Check your answers by first substituting the expressions for x and y into f(x, y).
6. Calculate $\underset{\sim}{\nabla} f$, for each of the functions in question (5).
7. Find the directional derivatives of each of the following functions in the direction
   indicated.
   (a) $f(x, y) = x^2 y + y^2 x$, at (1, 1) in the direction $\underset{\sim}{u} = 2\underset{\sim}{i} + 3\underset{\sim}{j}$.
   (b) $f(x, y) = e^x y + xy$, at (0, 1) in the direction $\underset{\sim}{u} = \underset{\sim}{i} - 4\underset{\sim}{j}$.
   (c) $f(x, y, z) = x^2 z + yz + y^2 z^2$, at (1, 2, 1) in the direction $\underset{\sim}{u} = \underset{\sim}{i} + 2\underset{\sim}{j} - 2\underset{\sim}{k}$.
8. For the functions in question (7), find the direction in which the function is increasing
   most rapidly at the given point.
9. For each of the following functions, find the direction of the normal and the equation
   of the tangent plane at the given point.
   (a) $z = \sqrt{4 - x^2 - y^2}$ at $(1, 1, \sqrt{2})$
   (b) $z = x + 2y^3$ at (1, 1, 3)
10. Plane polar coordinates, (r, θ), and Cartesian coordinates, (x, y), are related by the
    formulae $r = \sqrt{x^2 + y^2}$ and x = r cos θ. Calculate ∂r/∂x and ∂x/∂r and show that
    $$\frac{\partial r}{\partial x}\,\frac{\partial x}{\partial r} \neq 1.$$
*11. In thermodynamics, the ideal gas equation for a fixed mass of gas is PV = RT where
    R is a constant.
    (a) Show that
    $$\left( \frac{\partial V}{\partial P} \right)_T = -\left( \frac{\partial T}{\partial P} \right)_V \bigg/ \left( \frac{\partial T}{\partial V} \right)_P$$
    (b) Also show that
    $$\left( \frac{\partial V}{\partial P} \right)_T \left( \frac{\partial P}{\partial T} \right)_V \left( \frac{\partial T}{\partial V} \right)_P = -1$$
12. The equations in Q11 are valid whenever P, V and T are related by an ‘equation of
    state’ such as f(P, V, T) = 0. Prove the equations again for this equation of state
    starting with
    $$df = f_P\,dP + f_V\,dV + f_T\,dT = 0$$
    and calculate the partial derivatives by setting dP = 0, dV = 0 and dT = 0 in turn.
1.9 SELECTED ANSWERS - Partial Derivatives
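1. (a) (i) $\partial z/\partial x = 2e^{2x - 3y}$, $\partial z/\partial y = -3e^{2x - 3y}$; (ii) $z_x = \ln(x^2 + y^2) + 2x^2/(x^2 + y^2)$,
   $z_y = 2xy/(x^2 + y^2)$; (b) (i) $f_{xx} = 12xy$, $f_{yy} = 6x - 24y$, $f_{xy} = f_{yx} = 6x^2 + 6y$.
2. (a) $f_x = 2x \cos(x^2 + y^2)$, $f_y = 2y \cos(x^2 + y^2)$.
3. (i) B III, (iv) A VI.
4. (a) (i) Let $z(x, y) = \sqrt{x}\,e^y$, $(x_0, y_0) = (1, 0)$, then (δx, δy) = (−0.01, 0.02) gives z =
   1.015; (b) 2.7 m²; (c) 1.43 Ω.
5. (a) $2(1 + 2t + 2t^2)e^{2t + t^2}$.
6. (a) $(1 + x)e^{x + y}\,\underset{\sim}{i} + x e^{x + y}\,\underset{\sim}{j}$; (b) $2xy\,\underset{\sim}{i} + (x^2 - \sin y)\,\underset{\sim}{j}$.
7. (a) $15/\sqrt{13}$; (b) $-2/\sqrt{17}$; (c) −10/3.
8. (a) $3\underset{\sim}{i} + 3\underset{\sim}{j}$; (b) $2\underset{\sim}{i} + \underset{\sim}{j}$; (c) $2\underset{\sim}{i} + 5\underset{\sim}{j} + 11\underset{\sim}{k}$.
9. (a) $x + y + \sqrt{2}\,z = 4$; (b) x + 6y − z = 4.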
2 Line Integrals
To evaluate an integral along a curve, we are often required to ‘parameterise’ the curve, i.e.
express the curve in vector-parametric form. For example, a curve in the plane is expressed as
$$\underset{\sim}{r} = x(t)\,\underset{\sim}{i} + y(t)\,\underset{\sim}{j}$$
and a curve in space as
$$\underset{\sim}{r} = x(t)\,\underset{\sim}{i} + y(t)\,\underset{\sim}{j} + z(t)\,\underset{\sim}{k}.$$
Example: Parameterise the curve $y = x^2$ from x = 0 to x = 1.
Introduce a parameter by letting x = t, therefore $y = t^2$, and
$$\underset{\sim}{r}(t) = x(t)\,\underset{\sim}{i} + y(t)\,\underset{\sim}{j} = t\,\underset{\sim}{i} + t^2\,\underset{\sim}{j}$$
and we can see that: at x = 0, t = 0, $\underset{\sim}{r}(0) = (0, 0)$;
at x = 1, t = 1, $\underset{\sim}{r}(1) = (1, 1)$.
Similarly, we parameterise the circle $x^2 + y^2 = a^2$ by using the parametric
equations x = a cos t, y = a sin t to get:
$$\underset{\sim}{r}(t) = a \cos t\,\underset{\sim}{i} + a \sin t\,\underset{\sim}{j}$$
2.1 Arc Length

We have seen how to parameterize a curve $\underset{\sim}{r} = \underset{\sim}{r}(t)$. There are many instances where we
need to do an integral along a curve. This is similar to doing an integral along the x-axis,
but is more general. There are essentially two types of line integrals and the means for
calculating them are similar.
For one type, the simplest example is to find the length of a curve. (A method for doing
this for two dimensional curves is given in first year, but we also need to be able to do it
for curves in three dimensions.) The problem turns into an integral because we first divide
the line into small segments, then find the length of each segment and add these lengths
together. We then take the limit as the length of each segment approaches zero.
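If the curve is $\underset{\sim}{r} = \underset{\sim}{r}(t)$, we can divide it into segments δt. The length of each segment will be δs
where
$$\delta s \approx \sqrt{(\delta x)^2 + (\delta y)^2 + (\delta z)^2}.$$
[Figure: a curve in space from ‘start’ to ‘finish’, divided into segments of length δs.]
We then take the sum
$$\sum_{\text{start}}^{\text{finish}} \delta s.$$
We then take $\lim_{\delta s \to 0}$ and the sum becomes $\displaystyle\int_{\text{start}}^{\text{finish}} ds$ or $\displaystyle\int_C ds$, where C is used to indicate the
contour of integration.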
We still need to know how to evaluate this integral. There are a few different approaches, but
most are equivalent to the following. To do this, the curve needs to be parameterized. If it
isn’t, then this will be the first thing to do. (Sometimes, it may be necessary to parameterize
the curve in two or more different sections.) Then, for each segment of the curve,
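$$\delta s = \sqrt{(\delta x)^2 + (\delta y)^2 + (\delta z)^2} = \sqrt{\left(\frac{\delta x}{\delta t}\right)^2 + \left(\frac{\delta y}{\delta t}\right)^2 + \left(\frac{\delta z}{\delta t}\right)^2}\;\delta t$$
so the length is:
$$s = \sum_{t=a}^{t=b} \sqrt{\left(\frac{\delta x}{\delta t}\right)^2 + \left(\frac{\delta y}{\delta t}\right)^2 + \left(\frac{\delta z}{\delta t}\right)^2}\;\delta t.$$
When we take the limit as δt → 0, $\dfrac{\delta x}{\delta t} \to \dfrac{dx}{dt}$ etc. and δt is replaced by dt.
Thus $ds = \sqrt{\left(\dfrac{dx}{dt}\right)^2 + \left(\dfrac{dy}{dt}\right)^2 + \left(\dfrac{dz}{dt}\right)^2}\;dt$ and the length of the curve is:
$$s = \int_a^b \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 + \left(\frac{dz}{dt}\right)^2}\;dt.$$
(Note that the integrand here is $\dfrac{ds}{dt}$ or $\left|\dfrac{d\underset{\sim}{r}}{dt}\right|$.) Everything here should be known so we can,
in principle, calculate the length.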
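Example: Find the length of the curve $\underset{\sim}{r}(t) = \cos t\,\underset{\sim}{i} + \sin t\,\underset{\sim}{j} + t\,\underset{\sim}{k}$ between t = 0 and
t = 2π (one turn of a helix).
We calculate
$$\frac{dx}{dt} = -\sin t, \qquad \frac{dy}{dt} = \cos t, \qquad \frac{dz}{dt} = 1$$
so
$$\frac{ds}{dt} = \sqrt{\sin^2 t + \cos^2 t + 1} = \sqrt{2}.$$
The arc length is:
$$s = \int_0^{2\pi} \frac{ds}{dt}\,dt = \int_0^{2\pi} \sqrt{2}\,dt = \sqrt{2}\,[t]_0^{2\pi} = 2\pi\sqrt{2}$$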
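The arc length formula translates directly into a numerical sum over small steps δt. The following sketch uses a simple midpoint rule with 1000 segments (both are our choices, not part of the notes) and applies it to the helix above.

```python
import math

def arc_length(velocity, a, b, n=1000):
    """Midpoint-rule estimate of the integral of |dr/dt| from t = a to t = b.
    `velocity(t)` returns the components (dx/dt, dy/dt, dz/dt)."""
    dt = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * dt
        speed = math.sqrt(sum(c * c for c in velocity(t)))  # ds/dt
        total += speed * dt
    return total

def helix_velocity(t):
    """dr/dt for r(t) = cos t i + sin t j + t k."""
    return (-math.sin(t), math.cos(t), 1.0)

length = arc_length(helix_velocity, 0.0, 2 * math.pi)
print(length, 2 * math.pi * math.sqrt(2))
```

For the helix the speed is constant, so the numerical sum matches $2\pi\sqrt{2}$ almost exactly; for a general curve the agreement improves as n increases.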
2.2 Integral of a function, along a curve
Sometimes, we need to calculate integrals of the type
$$\int_C f(x, y, z)\,ds.$$
This type of integral can arise if we have a piece of wire, say, where the density depends on
position and we want to find the total mass.
Example: A piece of wire is bent into the shape of a parabola $y = 1 - x^2$ between x = 0 and
x = 1. The density (mass/unit length) at the point (x, y) is ρ(x, y) = ax (i.e., proportional
to the distance from the y-axis). Calculate the total mass of the wire.
[Figure: the parabola $y = 1 - x^2$ between x = 0 and x = 1, with a segment δs marked.]
If we divide the curve into segments again, the mass of each segment is ρδs. The total mass is
$$\sum \rho(x, y)\,\delta s.$$
In the limit as δs → 0, $M = \displaystyle\int_C \rho(x, y)\,ds$. We can then write ds = (ds/dt)dt.
How can we choose a parameter? Usually, we can let t = x. Then $y = 1 - t^2$ so
$$\underset{\sim}{r}(t) = x\,\underset{\sim}{i} + y\,\underset{\sim}{j} = t\,\underset{\sim}{i} + (1 - t^2)\,\underset{\sim}{j}.$$
The density is ρ = ax = at and
$$\frac{ds}{dt} = \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2} = \sqrt{1 + (2t)^2}.$$
t goes from 0 to 1, so the mass is:
$$M = \int_0^1 \rho\,\frac{ds}{dt}\,dt = \int_0^1 at(1 + 4t^2)^{1/2}\,dt = a\left[\frac{1}{8}\cdot\frac{2}{3}(1 + 4t^2)^{3/2}\right]_0^1 = \frac{1}{12}\,a\left[5^{3/2} - 1\right]$$
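The mass integral can be checked numerically in the same way as the arc length. In this sketch we assume a = 1 for the constant of proportionality (the notes leave a general) and use a midpoint rule with 2000 segments; the function names are our own.

```python
import math

def scalar_line_integral(density, path, speed, t0, t1, n=2000):
    """Midpoint-rule estimate of the integral of density * (ds/dt) dt."""
    dt = (t1 - t0) / n
    total = 0.0
    for k in range(n):
        t = t0 + (k + 0.5) * dt
        total += density(*path(t)) * speed(t) * dt
    return total

a = 1.0  # assumed value of the density constant, for illustration only

mass = scalar_line_integral(
    density=lambda x, y: a * x,                  # rho(x, y) = a x
    path=lambda t: (t, 1 - t**2),                # r(t) = t i + (1 - t^2) j
    speed=lambda t: math.sqrt(1 + (2 * t)**2),   # ds/dt
    t0=0.0, t1=1.0)

print(mass, a * (5**1.5 - 1) / 12)
```

The two printed values should agree to about six decimal places.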
2.3 Work Done
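The other type of line integral is often needed when we have a vector field $\underset{\sim}{F}(x, y, z)$ rather
than a scalar field, f(x, y, z). An example might be if $\underset{\sim}{F}$ is the force acting at any point in
space, and we want to find the work done on a particle as it moves along a curve in space.
If $\underset{\sim}{F}$ is a constant, then the work done is $W = \underset{\sim}{F} \cdot \underset{\sim}{d}$, where $\underset{\sim}{d}$ is the displacement.
If $\underset{\sim}{F}$ varies with position, we have to break the displacement up into small segments over
which $\underset{\sim}{F}$ is nearly constant.
If the displacement in each segment is $\delta\underset{\sim}{r}$, then
$$\delta W = \underset{\sim}{F} \cdot \delta\underset{\sim}{r}.$$
[Figure: a curve in space divided into small displacements $\delta\underset{\sim}{r}$.]
We then have
$$W = \sum_{\text{start}}^{\text{finish}} \underset{\sim}{F} \cdot \delta\underset{\sim}{r}.$$
As $|\delta\underset{\sim}{r}| \to 0$, and the number of segments gets larger, we get
$$W = \int_C \underset{\sim}{F} \cdot d\underset{\sim}{r} = \int_C \underset{\sim}{F}(x, y, z) \cdot d\underset{\sim}{r}.$$
Once again, it is helpful to express this in terms of a parameter, t. Since $\delta\underset{\sim}{r} \approx \dfrac{d\underset{\sim}{r}}{dt}\,\delta t$, we can
write $d\underset{\sim}{r} = \dfrac{d\underset{\sim}{r}}{dt}\,dt$. Then
$$W = \int_{t_0}^{t_1} \underset{\sim}{F}(x(t), y(t), z(t)) \cdot \frac{d\underset{\sim}{r}}{dt}\,dt.$$
Remember that $\underset{\sim}{r}(t) = x(t)\,\underset{\sim}{i} + y(t)\,\underset{\sim}{j} + z(t)\,\underset{\sim}{k}$ and so
$$\frac{d\underset{\sim}{r}}{dt} = \frac{dx}{dt}\,\underset{\sim}{i} + \frac{dy}{dt}\,\underset{\sim}{j} + \frac{dz}{dt}\,\underset{\sim}{k}.$$
Example: Find the work done by a force
$$\underset{\sim}{F}(x, y, z) = (y + z^2)\,\underset{\sim}{i} + xz\,\underset{\sim}{j} + y\,\underset{\sim}{k}$$
acting on a particle that moves from (0, 1, 0) to (1, 0, −1) along the curve:
$$\underset{\sim}{r}(t) = t\,\underset{\sim}{i} + (1 - t^2)\,\underset{\sim}{j} - t\,\underset{\sim}{k}$$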
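Notice that we have two different vector functions here. One is a vector field defined in
space. The other represents a curve in that space. The work done is
$$W = \int \underset{\sim}{F} \cdot d\underset{\sim}{r} = \int \underset{\sim}{F} \cdot \frac{d\underset{\sim}{r}}{dt}\,dt.$$
Now $d\underset{\sim}{r}/dt = \underset{\sim}{i} - 2t\,\underset{\sim}{j} - \underset{\sim}{k}$. Also, x = t, $y = 1 - t^2$ and z = −t, so
$\underset{\sim}{F}(x(t), y(t), z(t)) = \underset{\sim}{i} - t^2\,\underset{\sim}{j} + (1 - t^2)\,\underset{\sim}{k}$. We can then calculate
$$\underset{\sim}{F} \cdot \frac{d\underset{\sim}{r}}{dt} = 1 + 2t^3 - 1 + t^2 = 2t^3 + t^2$$
Therefore, the work done is:
$$W = \int_0^1 (2t^3 + t^2)\,dt = \left[\frac{1}{2}t^4 + \frac{1}{3}t^3\right]_0^1 = \frac{5}{6}$$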
Quite often, this type of integral is written a bit differently. Suppose
$$\underset{\sim}{F}(x, y, z) = u\,\underset{\sim}{i} + v\,\underset{\sim}{j} + w\,\underset{\sim}{k}$$
and $\underset{\sim}{r} = x\,\underset{\sim}{i} + y\,\underset{\sim}{j} + z\,\underset{\sim}{k}$.
Then $d\underset{\sim}{r} = dx\,\underset{\sim}{i} + dy\,\underset{\sim}{j} + dz\,\underset{\sim}{k}$ and
$$\underset{\sim}{F} \cdot d\underset{\sim}{r} = u(x, y, z)\,dx + v(x, y, z)\,dy + w(x, y, z)\,dz$$
and the integral becomes
$$\int_C u(x, y, z)\,dx + \int_C v(x, y, z)\,dy + \int_C w(x, y, z)\,dz.$$
These are three separate integrals, calculated along the same curve C (not along the axes).
The connection with the work done is not quite so obvious here.
If the integral is given in this form, it is always possible to go back to the $\displaystyle\int \underset{\sim}{F} \cdot d\underset{\sim}{r}$ form. Or
we could start with the parameterized form of the curve and calculate each integral directly.
Example: Calculate $\int xy\,dx + \int (y^2 + 2x)\,dy$ along the curve $x = t$, $y = e^t$, $z = \sin t$ from $t = 1$ to $t = 2$.
Notice there is no $\int \cdots\,dz$ term here. Effectively, $w(x, y, z) = 0$.
Once again, we write everything in terms of $t$:
$$dx = \frac{dx}{dt}\,dt = dt, \qquad dy = \frac{dy}{dt}\,dt = e^t\,dt.$$
So the integral becomes
$$\int xy\,dx + \int (y^2 + 2x)\,dy = \int_1^2 t e^t\,dt + \int_1^2 (e^{2t} + 2t)\,e^t\,dt$$
$$= \Big[(t - 1)e^t\Big]_1^2 + \Big[\tfrac{1}{3}e^{3t} + 2(t - 1)e^t\Big]_1^2$$
$$= e^2 + \tfrac{1}{3}e^6 - \tfrac{1}{3}e^3 + 2e^2$$
$$= 3e^2 + \tfrac{1}{3}e^6 - \tfrac{1}{3}e^3.$$
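As a quick check of this result, the sketch below (an illustration only; the function names are assumptions, not from the notes) substitutes the parameterization into $xy\,dx + (y^2 + 2x)\,dy$ and integrates numerically from $t = 1$ to $t = 2$.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3

def integrand(t):
    # Curve: x = t, y = e^t, so dx/dt = 1 and dy/dt = e^t
    x, y = t, math.exp(t)
    dx_dt, dy_dt = 1.0, math.exp(t)
    return x * y * dx_dt + (y**2 + 2 * x) * dy_dt

numeric = simpson(integrand, 1.0, 2.0)
closed_form = 3 * math.exp(2) + math.exp(6) / 3 - math.exp(3) / 3
print(numeric, closed_form)  # the two values should agree closely
```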
2.4 Path dependence
Consider moving between two points along two different paths. This all takes place in the same vector field, but each particle sees different parts of that vector field.
The ‘work done’ could be quite different for different paths. For example, if $\mathbf{F} = xy\,\mathbf{i} + x\,\mathbf{j}$ and $\mathbf{r}(t) = t\,\mathbf{i} + t\,\mathbf{j}$ for $0 \le t \le 1$, the integral is
$$\int_{C_1} \mathbf{F} \cdot d\mathbf{r} = \int xy\,dx + \int x\,dy = \int_0^1 t^2\,dt + \int_0^1 t\,dt = \frac{1}{3} + \frac{1}{2} = \frac{5}{6}.$$
But if $\mathbf{r}(t) = t\,\mathbf{i} + t^2\,\mathbf{j}$ for $0 \le t \le 1$, we get
$$\int_{C_2} \mathbf{F} \cdot d\mathbf{r} = \int_0^1 t^3\,dt + \int_0^1 t(2t)\,dt = \frac{1}{4} + \frac{2}{3} = \frac{11}{12}.$$
If we do a complete (closed) loop, and get back to where we started, the work done is
$$\int_{C_2} \mathbf{F} \cdot d\mathbf{r} - \int_{C_1} \mathbf{F} \cdot d\mathbf{r} = \frac{1}{12}.$$
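The two path integrals can be compared numerically. This is a minimal sketch under the assumption that each path is supplied as a pair of functions (position and derivative); the names `line_integral`, `w1` and `w2` are illustrative only.

```python
def simpson(g, a, b, n=200):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3

def F(x, y):
    # Field from the text: F = xy i + x j
    return (x * y, x)

def line_integral(r, dr_dt, t0=0.0, t1=1.0):
    """Integrate F(r(t)) . r'(t) over t0 <= t <= t1."""
    def integrand(t):
        fx, fy = F(*r(t))
        dx, dy = dr_dt(t)
        return fx * dx + fy * dy
    return simpson(integrand, t0, t1)

w1 = line_integral(lambda t: (t, t), lambda t: (1.0, 1.0))         # straight line C1
w2 = line_integral(lambda t: (t, t**2), lambda t: (1.0, 2 * t))    # parabola C2
print(w1, w2, w2 - w1)  # expected to match 5/6, 11/12 and 1/12
```

The non-zero difference for the closed loop is what shows that this particular field is not conservative.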
There are some fields, $\mathbf{F}$, for which
$$\int_C \mathbf{F} \cdot d\mathbf{r} = 0$$
for all closed loops. Such fields are called conservative: the work done around any closed loop is always zero. These are force fields that can be derived from a potential function (e.g. gravity, electric fields). In this case, $\mathbf{F} = -\nabla V$.
Any vector field which is the gradient of a scalar is conservative. The scalar is (minus) the potential of the force. For such fields, $\int_C \mathbf{F} \cdot d\mathbf{r}$ depends only on the end-points and not on the curve in between. In fact, if $\mathbf{F} = \nabla f$ then
$$\int_C \mathbf{F} \cdot d\mathbf{r} = f(\text{end}) - f(\text{start}).$$
If we have a conservative vector field and we know it was created by the gradient of a scalar function, we can find the function by the following process.
Example: Given the conservative vector field
$$\mathbf{F} = (y\cos x + y^2)\,\mathbf{i} + (\sin x + 2xy - 2y)\,\mathbf{j},$$
find the scalar function $f(x, y)$ such that $\mathbf{F} = \nabla f$.
Using $\mathbf{F} = \nabla f = \dfrac{\partial f}{\partial x}\,\mathbf{i} + \dfrac{\partial f}{\partial y}\,\mathbf{j}$, we see that
$$\frac{\partial f}{\partial x} = y\cos x + y^2.$$
Integrate with respect to $x$ ($y$ is const):
$$f(x, y) = y\sin x + xy^2 + g(y). \qquad (1)$$
Note that the integration ‘constant’ is now a function of $y$.
Differentiate with respect to $y$ ($x$ is const):
$$\frac{\partial f}{\partial y} = \sin x + 2xy + \frac{dg}{dy}.$$
Equate this to the expression for $\partial f/\partial y$ in the vector field to get
$$\frac{dg}{dy} = -2y \quad\therefore\quad g(y) = -y^2 + c.$$
Substitute $g(y)$ into the expression for the scalar function (1):
$$f(x, y) = y\sin x + xy^2 - y^2 + c.$$
This process can be extended to a three-dimensional conservative vector field.
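One way to sanity-check a potential found this way is to differentiate it numerically and compare with the field. The sketch below assumes central finite differences with an arbitrarily chosen step and a few arbitrary test points; none of these choices come from the notes.

```python
import math

def f(x, y, c=0.0):
    # Potential found in the example: f = y sin x + x y^2 - y^2 + c
    return y * math.sin(x) + x * y**2 - y**2 + c

def F(x, y):
    # Given field: (y cos x + y^2) i + (sin x + 2xy - 2y) j
    return (y * math.cos(x) + y**2, math.sin(x) + 2 * x * y - 2 * y)

def grad(func, x, y, h=1e-5):
    """Central-difference approximation to (df/dx, df/dy)."""
    fx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    fy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return (fx, fy)

points = [(0.3, -1.2), (1.0, 2.0), (-2.5, 0.7)]  # arbitrary test points
max_err = max(
    abs(g - e)
    for p in points
    for g, e in zip(grad(f, *p), F(*p))
)
print(max_err)  # should be very small if grad f matches F
```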
2.5 EXERCISES - Line Integrals
1. Calculate $\int_C f(x, y, z)\,ds$ for the contour
$$\mathbf{r}(t) = \cos t\,\mathbf{i} + \sin t\,\mathbf{j} + t\,\mathbf{k}, \qquad 0 \le t \le \pi,$$
   for each of the functions
   (a) $f(x, y, z) = x^2$.
   (b) $f(x, y, z) = yz + x$.
2. (a) A piece of wire is bent into the shape of a semicircle of radius $a$. If the mass/unit length is equal to $\lambda$ (a constant), find the mass of the wire. (You can parameterize the shape by $\mathbf{r}(t) = a\cos t\,\mathbf{i} + a\sin t\,\mathbf{j}$.)
   The coordinates of the centre of mass of the piece of wire are given by
$$\bar{x} = \frac{\int_C \lambda x\,ds}{\int_C \lambda\,ds}, \qquad \bar{y} = \frac{\int_C \lambda y\,ds}{\int_C \lambda\,ds}.$$
   Find the centre of mass of the wire.
   (b) Repeat the calculation of part (a) if the mass/unit length is $\lambda = \alpha y$ where $\alpha$ is a constant.
3. (a) Find the length of the curve $y = x^{3/2}$ between $x = 1$ and $x = 2$.
   (b) For the curve in part (a), calculate $\int_C \dfrac{y^2}{x^2}\,ds$.
4. Calculate the work done (i.e. $\int_C \mathbf{F} \cdot d\mathbf{r}$) for each of the following forces as the particle moves along the given curve.
   (a) $\mathbf{F} = x\,\mathbf{i} + y\,\mathbf{j}$, $\mathbf{r}(t) = a\cos t\,\mathbf{i} + b\sin t\,\mathbf{j}$, $t : 0 \to \pi$.
   (b) $\mathbf{F} = (x^2 + y)\,\mathbf{i} + x\,\mathbf{j}$, along the curve $y = 1 - x^2$, as $x$ goes from 0 to 1.
5. Consider the function $f(x, y) = x^2 y$. Find $\mathbf{F} = \nabla f$ and calculate $\int_C \mathbf{F} \cdot d\mathbf{r}$ along each of the following curves.
   (a) Along the straight line from (0,0) to (1,1).
   (b) Along the straight line path from (0,0) to (0,1) and then from (0,1) to (1,1).
   (c) Along the straight line path from (0,0) to (1,0) and then from (1,0) to (1,1).
   Show that in each case the answer is the same as $f(1, 1) - f(0, 0)$.
6. If $\mathbf{F}(x, y, z) = \sin y\,\mathbf{i} + (x\cos y + z)\,\mathbf{j} + y\,\mathbf{k}$, find a function $f(x, y, z)$ such that $\mathbf{F} = \nabla f$.
2.6 SELECTED ANSWERS - Line Integrals
1. (a) $\pi/\sqrt{2}$; (b) $\sqrt{2}\,\pi$.
2. (a) $(\bar{x}, \bar{y}) = (0, 2a/\pi)$; (b) $(\bar{x}, \bar{y}) = (0, \pi a/4)$.
3. (a) $\simeq 2.086$; (b) $\simeq 3.174$.
4. (a) 0; (b) 1/3.
5. $\mathbf{F} = 2xy\,\mathbf{i} + x^2\,\mathbf{j}$; (a) 1; (b) 0 + 1 = 1; (c) 0 + 1 = 1.
6. $f(x, y, z) = x\sin y + yz + c$, ($c$ = const.)
3 Multiple Integration
3.1 Double Integrals
Consider a rectangular plate between $x = a$ and $x = b$ and between $y = c$ and $y = d$, where the density (the surface density), $\sigma(x, y)$, depends on the position.
If we know $\sigma$, how do we find the total mass?
We can divide the area into small rectangles so that $\sigma$ is approximately constant on each one. These rectangles can be arranged in strips parallel to one of the axes.
If a rectangle has dimensions $\delta x \times \delta y$, the area is $\delta x\,\delta y$ and the mass is $\delta M \simeq \sigma(x, y)\,\delta x\,\delta y$.
We now have to add these up, but we do this in a particular order. We add the rectangles along each strip first. This gives the mass as
$$\sum_{x=a}^{x=b} \sigma(x, y)\,\delta x\,\delta y = \left(\sum_{x=a}^{x=b} \sigma(x, y)\,\delta x\right)\delta y.$$
This is the mass of a strip at ‘height’ $y$ and of thickness $\delta y$. We then add all the strips together to get
$$\sum_{y=c}^{y=d}\ \sum_{x=a}^{x=b} \sigma(x, y)\,\delta x\,\delta y.$$
This is approximately the total mass.
We then let $\delta x$ and $\delta y$ approach 0. The mass of the strip becomes approximately
$$\left(\int_a^b \sigma(x, y)\,dx\right)\delta y.$$
Note that the value of the integral here can depend on $y$ (which is actually a constant along each strip). The total mass is
$$\int_c^d \int_a^b \sigma(x, y)\,dx\,dy.$$
Note that the ‘inner’ limits, $a$ and $b$, refer to $x$ and the ‘outer’ limits, $c$ and $d$, refer to $y$.
Example: If $a = 0$, $b = 1$, $c = 0$ and $d = 2$ and $\sigma = xy + y^2$, the mass is:
$$\int_0^2 \int_0^1 (xy + y^2)\,dx\,dy = \int_0^2 \left[\tfrac{1}{2}x^2 y + y^2 x\right]_{x=0}^{x=1} dy$$
$$= \int_0^2 \left(\tfrac{1}{2}y + y^2\right) dy \quad\text{(note the first integral gives a function of $y$)}$$
$$= \left[\tfrac{1}{4}y^2 + \tfrac{1}{3}y^3\right]_0^2 = 1 + \frac{8}{3} = \frac{11}{3}.$$
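The "inner integral first, then outer integral" procedure maps directly onto nested one-dimensional quadrature. The sketch below is illustrative only (the helper `simpson` and the variable names are assumptions); it computes the mass of this plate with the $x$ integral done first and again with the $y$ integral done first.

```python
def simpson(g, a, b, n=100):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3

def sigma(x, y):
    # Surface density from the example
    return x * y + y**2

# Strips parallel to the x-axis: integrate over x first, then over y.
mass_x_first = simpson(lambda y: simpson(lambda x: sigma(x, y), 0.0, 1.0), 0.0, 2.0)

# Strips parallel to the y-axis: integrate over y first, then over x.
mass_y_first = simpson(lambda x: simpson(lambda y: sigma(x, y), 0.0, 2.0), 0.0, 1.0)

print(mass_x_first, mass_y_first)  # both should be close to 11/3
```

For a rectangular region the two orders agree, as expected.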
Note the contributions from the rectangles were added parallel to the x-axis first. We could also add the rectangles parallel to the y-axis first.
The mass of the strip at position $x$ is approximately
$$\left(\sum_{y=c}^{y=d} \sigma(x, y)\,\delta y\right)\delta x.$$
The total mass is approximately
$$\sum_{x=a}^{x=b}\ \sum_{y=c}^{y=d} \sigma(x, y)\,\delta y\,\delta x.$$
In the limit, this becomes
$$\int_a^b \int_c^d \sigma(x, y)\,dy\,dx.$$
Note that we have changed the order of integration. We do the y integral first. Also, the order of the limits has changed. This simple change works only if the region is rectangular.
These integrals are called double integrals. They involve an integration with respect to two variables. We will also see that it is possible to write these integrals in a more general form.
A double integral can be used to represent a volume, in much the same way that a single integral represents an area under a graph. For example, consider a lake that covers a region $A$.
Suppose we know the depth of the lake at any point to be $d(x, y)$. What will be the total volume?
Divide the surface of the lake into small regions. (They needn’t be rectangles.) Let the area of one of these regions (at $(x, y)$) be $\delta A$. The volume of the lake beneath this region will be
$$\delta V \simeq d(x, y) \times \delta A.$$
We now take the sum of all these values to get
$$V \simeq \sum_{\text{whole lake}} d(x, y)\,\delta A.$$
If we let the size of the elements get smaller, the volume is written as
$$V = \iint_A d(x, y)\,dA.$$
Note that we use two integral signs to indicate that it is a double integral. Also, the $A$ tells us what the region of integration is: in this case the surface of the lake, which may not be a rectangle.
This form of the integral is really a symbolic form. We need to be able to convert this to the other form in order to evaluate the integral. If $A$ is a rectangle, we can simply reproduce the previous form. If $A$ is not a rectangle, we have a bit more difficulty. Consider the case where $A$ is a triangle.
[Figure: the triangle $A$ bounded by the line $y = 2x$, the x-axis and the line $x = 1$.]
Let us calculate
$$\iint_A xy\,dA$$
for this shape. We divide the shape into small rectangles and add the contributions along strips of constant $y$.
This gives
$$\int_0^2 \int_{y/2}^1 xy\,dx\,dy.$$
Note that the first integral is from $x = y/2$ to $x = 1$ since this is the extent of each strip. The strips are then added from $y = 0$ to $y = 2$.
This integral is now:
$$\int_0^2 \left[\tfrac{1}{2}x^2 y\right]_{x=y/2}^{x=1} dy = \int_0^2 \left(\tfrac{1}{2}y - \tfrac{1}{8}y^3\right) dy = \left[\tfrac{1}{4}y^2 - \tfrac{1}{32}y^4\right]_0^2 = \frac{1}{2}.$$
Note that if we want to change the order here, we cannot simply change the limits.
$$\int_{y/2}^1 \int_0^2 xy\,dy\,dx$$
would not be correct as the final answer would depend on the lower limit which is $y/2$. Instead, we must go back to the diagram.
We are now adding rectangles along the vertical strips (i.e. $x$ is constant) first. Each strip runs from $y = 0$ up to the line $y = 2x$. The strips are then added from $x = 0$ to $x = 1$. Therefore, we have
$$\int_0^1 \int_0^{2x} xy\,dy\,dx.$$
We evaluate this integral:
$$\int_0^1 \int_0^{2x} xy\,dy\,dx = \int_0^1 \left[\tfrac{1}{2}xy^2\right]_{y=0}^{y=2x} dx = \int_0^1 2x^3\,dx = \left[\tfrac{1}{2}x^4\right]_0^1 = \frac{1}{2}.$$
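When the region is not rectangular, the inner limits depend on the outer variable. A short numerical sketch (names and the quadrature rule are assumptions for illustration) shows both orders of integration over the triangle giving the same value.

```python
def simpson(g, a, b, n=100):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3

# Strips of constant y: x runs from y/2 to 1, then y runs from 0 to 2.
strips_const_y = simpson(
    lambda y: simpson(lambda x: x * y, y / 2, 1.0), 0.0, 2.0
)

# Strips of constant x: y runs from 0 to 2x, then x runs from 0 to 1.
strips_const_x = simpson(
    lambda x: simpson(lambda y: x * y, 0.0, 2 * x), 0.0, 1.0
)

print(strips_const_y, strips_const_x)  # both should be close to 1/2
```

Note how the inner limits are written as functions of the outer variable, exactly as read off the diagram.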
Whenever we put in the limits for an irregular region, we need to have a clear idea of the shape. We also need to be able to describe the shape if we are given the limits. For example, consider the shape of $A$ for the integral:
$$\int_{-1}^{1} \int_0^{\sqrt{1 - x^2}} f(x, y)\,dy\,dx.$$
The integral says that we fix $x$ first. Then $y$ goes from 0 to $\sqrt{1 - x^2}$. Thus the shape is bounded above by the curve $y = \sqrt{1 - x^2}$ and bounded below by $y = 0$.
The curve is the semicircle $x^2 + y^2 = 1$, $y \ge 0$. The limits on $x$ show that the whole of this half disc is contained in the region of integration.
If we want to change the order of integration, we first have to sketch the region and then, from this sketch, work out the new limits.
3.1.1 Polar Coordinates
We have seen what to do for rectangular coordinates, but many problems are best done in terms of polar coordinates. For example, calculate the integral $\iint_A r^2\,dA$, where $A$ is the semicircle in the previous problem. (This integral often arises in moments of inertia problems.) The integral is fairly easy in polar coordinates, but is a bit more difficult in rectangular coordinates. However, we first need to know $\delta A$ in terms of $\delta r$ and $\delta\theta$.
The elements we choose are no longer rectangles, but they will almost be rectangles. It is the expression for the area that is a bit different.
The ‘amount’ of area bounded by changes $\delta r$ and $\delta\theta$ will depend on where the area is located.
The shape is bounded by sides of lengths $\delta r$ and $r\,\delta\theta$, so the area is
$$\delta A \simeq r\,\delta r\,\delta\theta.$$
(Note this is not just $\delta r\,\delta\theta$.)
Therefore, the above integral becomes
$$\iint_A r^2\,dA = \iint_A r^2\,r\,dr\,d\theta.$$
The extra factor, $r$, is called the Jacobian. This type of factor will appear whenever we choose coordinates that are not rectangular. It expresses the fact that the coordinate ‘patches’ will not all be the same size.
For the semicircle, $r$ goes from 0 to 1 and $\theta$ goes from 0 to $\pi$.
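We include the limits to get:
$$\int_0^\pi \int_0^1 r^3\,dr\,d\theta = \int_0^\pi \left[\tfrac{1}{4}r^4\right]_0^1 d\theta = \int_0^\pi \tfrac{1}{4}\,d\theta = \tfrac{1}{4}\big[\theta\big]_0^\pi = \frac{\pi}{4}.$$
Note that, if we just find $\iint_A dA$, we will be finding the area of the region $A$.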
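To see that the polar form (with its extra factor of $r$) and the rectangular form describe the same integral, the sketch below evaluates $\iint_A r^2\,dA$ over the half disc both ways. It is an illustration under stated assumptions: nested Simpson's rule, with the rectangular version using the limits from the earlier semicircle example. The rectangular version converges more slowly because of the square root in the limit, so only rough agreement should be expected.

```python
import math

def simpson(g, a, b, n=200):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3

# Polar form: integrand r^2 times the Jacobian r, 0 <= r <= 1, 0 <= theta <= pi.
polar = simpson(lambda theta: simpson(lambda r: r**2 * r, 0.0, 1.0), 0.0, math.pi)

# Rectangular form: r^2 = x^2 + y^2, 0 <= y <= sqrt(1 - x^2), -1 <= x <= 1.
def inner(x):
    top = math.sqrt(max(0.0, 1.0 - x * x))  # guard against tiny negative rounding
    return simpson(lambda y: x * x + y * y, 0.0, top)

cartesian = simpson(inner, -1.0, 1.0, n=2000)

print(polar, cartesian, math.pi / 4)
```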
3.2 Triple Integrals
We can extend these ideas to the integral over a whole volume. We call the region $V$.
Consider an object with density $\rho(x, y, z)$ (mass/volume). The total mass is
$$M = \iiint_V \rho(x, y, z)\,dV.$$
If $\rho = 1$, then we calculate the volume as $V = \iiint_V dV$.
The idea of the calculation is the same as for double integrals. In rectangular coordinates, we divide the region into small cubes with dimensions $\delta x \times \delta y \times \delta z$. The mass of each one is
$$\delta M = \rho\,\delta V = \rho\,\delta x\,\delta y\,\delta z.$$
1. We add these along (say) the vertical direction (i.e. integrate with respect to $z$) to get a thin ‘tube’ (with cross-section $\delta x\,\delta y$).
2. Then add these tubes along (say) the y direction (i.e. integrate with respect to $y$) to get a thin plate (with thickness $\delta x$).
3. Finally, we add all the plates (integrate with respect to $x$) to get the final result.
If $V$, the region of integration, is rectangular shaped (that is, $z$ goes from $a$ to $b$, $y$ goes from $c$ to $d$ and $x$ goes from $e$ to $f$) we get
$$\int_e^f \int_c^d \int_a^b \rho(x, y, z)\,dz\,dy\,dx.$$
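Example: Calculate the integral of $f(x, y, z) = xz + e^y$ over the cube $0 \le x \le 1$, $0 \le y \le 1$, $0 \le z \le 1$.
We have
$$\iiint_V f(x, y, z)\,dV = \int_0^1 \int_0^1 \int_0^1 (xz + e^y)\,dz\,dy\,dx$$
$$= \int_0^1 \int_0^1 \left[\tfrac{1}{2}xz^2 + ze^y\right]_{z=0}^{z=1} dy\,dx$$
$$= \int_0^1 \int_0^1 \left(\tfrac{1}{2}x + e^y\right) dy\,dx$$
$$= \int_0^1 \left[\tfrac{1}{2}xy + e^y\right]_{y=0}^{y=1} dx$$
$$= \int_0^1 \left(\tfrac{1}{2}x + e - 1\right) dx$$
$$= \left[\tfrac{1}{4}x^2 + (e - 1)x\right]_0^1 = \tfrac{1}{4} + e - 1 = e - \tfrac{3}{4}.$$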
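The tube, plate, volume ordering described above corresponds to three nested one-dimensional integrals. The following sketch (illustrative only; the helper name and the number of subintervals are assumptions) reproduces the value $e - 3/4$ numerically.

```python
import math

def simpson(g, a, b, n=20):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3

def f(x, y, z):
    return x * z + math.exp(y)

# Innermost: integrate over z (a 'tube'); then over y (a 'plate'); then over x.
triple = simpson(
    lambda x: simpson(
        lambda y: simpson(lambda z: f(x, y, z), 0.0, 1.0),
        0.0, 1.0),
    0.0, 1.0)

print(triple, math.e - 0.75)  # the two values should agree closely
```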
3.2.1 Polar Coordinates
As with double integrals, sometimes triple integrals appear easier in polar coordinates. In three dimensions, there are two standard types of polar coordinates, cylindrical and spherical. They each have their own Jacobian (the volume ‘correction’ factor) that must appear in the integral.
Cylindrical polars are like plane polars with a $z$ coordinate included. The $\theta$ is often relabelled as $\phi$, the azimuthal angle.
Thus, we have coordinates $(r, \phi, z)$ with
$$x = r\cos\phi, \qquad y = r\sin\phi, \qquad z = z.$$
Note that $r$ is the distance from the z axis, and the surfaces $r$ = constant are cylinders. Hence the name.
We can also have spherical polars, $(r, \theta, \phi)$. Here $r$ is the distance from the origin, so the surfaces $r$ = constant are spheres.
$\phi$ is still the azimuthal angle (ranging from 0 to $2\pi$). This is the angle the projection of the position vector on the x-y plane makes with the x-axis.
$\theta$, the co-latitude, is the angle the position vector makes with the z-axis. It ranges from 0 to $\pi$.
We have
$$x = r\sin\theta\cos\phi, \qquad y = r\sin\theta\sin\phi, \qquad z = r\cos\theta.$$
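The Jacobian for cylindrical polars is
$$J = \frac{\partial(x, y, z)}{\partial(r, \phi, z)} = \begin{vmatrix} \partial x/\partial r & \partial y/\partial r & \partial z/\partial r \\ \partial x/\partial\phi & \partial y/\partial\phi & \partial z/\partial\phi \\ \partial x/\partial z & \partial y/\partial z & \partial z/\partial z \end{vmatrix} = r.$$
For spherical polars,
$$J = \frac{\partial(x, y, z)}{\partial(r, \theta, \phi)} = \begin{vmatrix} \partial x/\partial r & \partial y/\partial r & \partial z/\partial r \\ \partial x/\partial\theta & \partial y/\partial\theta & \partial z/\partial\theta \\ \partial x/\partial\phi & \partial y/\partial\phi & \partial z/\partial\phi \end{vmatrix} = r^2\sin\theta.$$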
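These determinants can be spot-checked numerically. The sketch below is an illustration, not a proof: it builds the matrix of partial derivatives by central differences (the step size and the sample points are arbitrary assumptions) and evaluates the 3 × 3 determinant, with rows arranged as in the formulas above.

```python
import math

def cylindrical(r, phi, z):
    return (r * math.cos(phi), r * math.sin(phi), z)

def spherical(r, theta, phi):
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def jacobian_det(transform, q, step=1e-6):
    """Determinant of d(x, y, z)/d(q1, q2, q3) using central differences."""
    m = []
    for i in range(3):
        q_plus, q_minus = list(q), list(q)
        q_plus[i] += step
        q_minus[i] -= step
        xp, xm = transform(*q_plus), transform(*q_minus)
        # Row i holds the derivatives of (x, y, z) with respect to q[i].
        m.append([(a - b) / (2 * step) for a, b in zip(xp, xm)])
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

j_cyl = jacobian_det(cylindrical, (2.0, 0.7, 1.5))          # expect about r = 2
j_sph = jacobian_det(spherical, (2.0, math.pi / 3, 0.4))    # expect about r^2 sin(theta)
print(j_cyl, j_sph, 4 * math.sin(math.pi / 3))
```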
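Example: Calculate the volume of a hemisphere of radius $a$.
$$V = \iiint_V dV = \int_0^{2\pi} \int_0^{\pi/2} \int_0^a r^2\sin\theta\,dr\,d\theta\,d\phi$$
$$= \int_0^{2\pi} \int_0^{\pi/2} \left[\tfrac{1}{3}r^3\sin\theta\right]_{r=0}^{r=a} d\theta\,d\phi$$
$$= \int_0^{2\pi} \int_0^{\pi/2} \tfrac{1}{3}a^3\sin\theta\,d\theta\,d\phi$$
$$= \tfrac{1}{3}a^3 \int_0^{2\pi} \big[-\cos\theta\big]_0^{\pi/2} d\phi$$
$$= \tfrac{1}{3}a^3 \int_0^{2\pi} d\phi = \frac{2\pi a^3}{3}.$$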
3.3 Surface Integrals
This is the last type of integral we need to look at. We have seen that if we have a plane object with density $\sigma(x, y)$, the mass is $\iint_A \sigma\,dA$. What if we have a curved surface, with density $\sigma(x, y, z)$? The surface could be the surface of a sphere or a paraboloid etc., where the density depends on the position.
Once again, we can divide the region into small elements of surface $\delta S$. The mass will be approximately
$$\sum_{\text{surface}} \sigma(x, y, z)\,\delta S.$$
In the limit as $\delta S \to 0$, this becomes
$$\iint_S \sigma(x, y, z)\,dS.$$
Here, $S$ is the region of integration.
Another type of integral for a general surface is when we have a vector field defined throughout space. The vector field can simply be regarded as a ‘collection of arrows’ in space. We want to know how many of these arrows ‘point through’ some surface. Think of a fluid moving with velocity $\mathbf{v}(x, y, z)$. At what rate is the fluid moving through some surface $S$?
We divide the surface into elements $\delta S$ with unit normal $\mathbf{n}$. The rate at which the fluid is flowing through is proportional to $\delta S$ and also proportional to $|\mathbf{v}|$. Also, only the component of $\mathbf{v}$ that is normal to the surface will count.
Therefore the rate is
$$|\mathbf{v}|\cos\theta\,\delta S = |\mathbf{v}|\,|\mathbf{n}|\cos\theta\,\delta S = \mathbf{v} \cdot \mathbf{n}\,\delta S,$$
where $\theta$ is the angle between $\mathbf{v}$ and $\mathbf{n}$, and $|\mathbf{n}| = 1$.
So the total rate of flow across the surface is
$$\iint_S \mathbf{v} \cdot \mathbf{n}\,dS.$$
Note that the product $\mathbf{n}\,dS$ is often written as $d\mathbf{S}$.
The calculation of such integrals in general is a little beyond what we have already done. However, if the surface is actually a plane that is perpendicular to one of the axes, then the integral can be calculated using the technique we have already seen.
Example: If $\mathbf{F} = xy\,\mathbf{i} + (x + yz)\,\mathbf{j} + (x + y^2)z\,\mathbf{k}$, calculate $\iint_S \mathbf{F} \cdot \mathbf{n}\,dS$ over the surface $z = 2$ for $0 \le x \le 1$, $0 \le y \le 1$.
The surface is perpendicular to the z-axis and the integral over this surface is similar to an integral over the square in the x-y plane.
We have to use the fact that $z = 2$ on the surface and also $\mathbf{n} = \mathbf{k}$ on the surface. Therefore
$$\mathbf{F} \cdot \mathbf{n} = (x + y^2)z = 2(x + y^2).$$
The element of area is $\delta S = \delta x\,\delta y$ (in fact the same as $\delta A$ previously). Therefore, $dS = dx\,dy$.
The integral becomes:
$$\int_0^1 \int_0^1 2(x + y^2)\,dx\,dy = \int_0^1 \big[x^2 + 2xy^2\big]_{x=0}^{x=1} dy = \int_0^1 (1 + 2y^2)\,dy = \left[y + \tfrac{2}{3}y^3\right]_0^1 = \frac{5}{3}.$$
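A numerical version of this flux calculation is sketched below, assuming the field components as read from the example. Only the $\mathbf{k}$ component contributes, since $\mathbf{n} = \mathbf{k}$ on this surface; the helper names are illustrative.

```python
def simpson(g, a, b, n=100):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * g(a + k * h)
    return s * h / 3

def F(x, y, z):
    # Field as read from the example; only the k-component matters on z = 2.
    return (x * y, x + y * z, (x + y**2) * z)

n_hat = (0.0, 0.0, 1.0)  # unit normal to the plane z = 2

def f_dot_n(x, y):
    return sum(f * n for f, n in zip(F(x, y, 2.0), n_hat))

flux = simpson(lambda y: simpson(lambda x: f_dot_n(x, y), 0.0, 1.0), 0.0, 1.0)
print(flux)  # should be close to 5/3
```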
This integral is called the flux of $\mathbf{F}$ through $S$. There are usually two choices for the direction of $\mathbf{n}$, but note that, if we have a closed surface, the flux is usually taken in the outward direction. In this case, we get the rate at which fluid is leaving the enclosed volume. This in turn must be related to the rate of change of the mass of the fluid remaining. We will see one consequence of this relation in the next section.
3.4 EXERCISES - Multiple Integration
1. Calculate the following double integrals.
   (a) $\int_0^2 \int_0^\pi (y\sin x + x)\,dx\,dy$.
   (b) $\iint_A (xe^y + y\sqrt{x})\,dA$, where $A$ is the region $1 \le x \le 4$, $0 \le y \le 1$.
   (c) Evaluate the integral in part (a) by changing the order first.
   (d) $\iint_A r\sin^2\theta\,dA$, where $A$ is the region above the x-axis bounded by the circle $x^2 + y^2 = a^2$.
   (e) $\int_0^{\pi/3} \int_0^a r\cos\theta\;r\,dr\,d\theta$. What is the shape of the region in this case?
2. Find the mass of the triangle bounded by the line $x + y = 1$ and the x and y axes if the density is $\sigma(x, y) = x^2 + y^2$.
3. The region $A$ lies in the first quadrant and is bounded by the curve $y = 1 - x^2$ and the x and y axes. What are the limits needed on the integral signs to calculate $\iint_A f(x, y)\,dA$? (Do the y integral first.)
   Hence calculate (i) the area of $A$ and (ii) the position of the centroid of $A$.
   (The centroid has coordinates $\bar{x} = \dfrac{\iint_A x\,dA}{\text{Area}}$, $\bar{y} = \dfrac{\iint_A y\,dA}{\text{Area}}$.)
4. Find the centroid of that part of the disc of radius 1 with centre at the origin that lies in the first quadrant.
5. For the following integral, change the order of integration and then evaluate the integral.
$$\int_0^2 \int_0^y x^2 e^{xy}\,dx\,dy.$$
6. (a) Evaluate $\int_0^1 \int_1^2 \int_0^1 (xz + y)\,dz\,dy\,dx$.
   (b) The volume of a region $V$ can be expressed as $\iiint_V dV$. Find the volume of the region for which $0 \le x \le 1$, $0 \le y \le 1$ and $0 \le z \le 2 - x - y$, i.e. the region between the x-y plane and the plane $z = 2 - x - y$ for $0 \le x \le 1$, $0 \le y \le 1$.
   (c) Find the mass of the cylinder $x^2 + y^2 \le a^2$ that lies between $z = 0$ and $z = h$ if the density is given by $\rho(x, y, z) = \alpha$ (a constant). What if $\rho = \alpha r$ where $r$ is the distance from the z-axis?
   (d) Calculate $\iiint_V r^2\,dV$, where $V$ is the hemisphere $x^2 + y^2 + z^2 \le a^2$, $z \ge 0$. ($r$ is the distance from the origin in this case.)
7. If $\mathbf{F} = (2x + y)\,\mathbf{i} + (z^2 + x)\,\mathbf{j} + xz\,\mathbf{k}$, calculate the flux of $\mathbf{F}$ (in the direction pointing outwards) through each of the six faces of the cube $0 \le x \le 1$, $0 \le y \le 1$, $0 \le z \le 1$.
8. Prove that the Jacobian used for changing co-ordinate systems from Cartesian to cylindrical polars is $J = r$, and when changing from Cartesian to spherical polars is $J = r^2\sin\theta$.
9.* A vertical dam wall is in the shape of a triangle bounded by the lines $y = -\dfrac{x}{4} - \dfrac{1}{2}$, $y = \dfrac{x}{2} - \dfrac{1}{2}$ and $y = 0$. (The level of the water is at $y = 0$.) The pressure (force/unit area) at depth $d$ is $\rho g d$ where the density $\rho$ is assumed to be constant and $g$ is the acceleration due to gravity.
   (a) Calculate the total force exerted on the wall by the water.
   (b) Calculate the depth of the centre of pressure which is given by $d = \dfrac{\iint_A \rho g y^2\,dA}{\text{Total force}}$.
10.* (a) Calculate the volume of the solid bounded by the paraboloid $z = 3 - (x^2 + y^2)$ and the x-y plane.
   (b) Also find the volume of the region between this paraboloid and the surface $z = 2x$. (In part (b), once you have carried out the z integration, the remaining integral is best expressed in terms of $r$ and $\theta$ where $x = -1 + r\cos\theta$ and $y = r\sin\theta$.)
3.5 SELECTED ANSWERS - Multiple Integration
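1. (a) $4 + \pi^2$; (b) $\tfrac{15}{2}(e - 1) + \tfrac{7}{3}$; (c) $4 + \pi^2$; (d) $\pi a^3/6$; (e) $\tfrac{1}{2\sqrt{3}}a^3$.
2. 1/6.
3. (i) Area = 2/3, (ii) $(\bar{x}, \bar{y}) = (3/8, 2/5)$.
4. $(\bar{x}, \bar{y}) = \left(\dfrac{4}{3\pi}, \dfrac{4}{3\pi}\right)$.
5. $(e^4 + 3)/4$.
6. (a) 7/4; (b) 1; (c) $\pi\alpha a^2 h$, $2\pi\alpha a^3 h/3$; (d) $2\pi a^5/5$.
7. (i) $\tfrac{5}{2}$ $(+\mathbf{i})$; (ii) $-\tfrac{1}{2}$ $(-\mathbf{i})$; (iii) $\tfrac{5}{6}$ $(+\mathbf{j})$; (iv) $-\tfrac{5}{6}$ $(-\mathbf{j})$; (v) $\tfrac{1}{2}$ $(+\mathbf{k})$; (vi) 0 $(-\mathbf{k})$.
9. (a) $F = \rho g/8$; (b) $d = 1/4$.
10. (a) $9\pi/2$; (b) $8\pi$.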
4 Vector Calculus
4.1 Divergence of a Vector Field
The flux of a vector field, $\mathbf{F}$, over the surface, $S$, of a volume, $V$, is the total amount of $\mathbf{F}$ ‘pointing out’ of the volume. If $\mathbf{F}$ represents the velocity of a fluid, then the flux would be the rate at which fluid was leaving the volume. Generally, the larger the volume, $V$, the larger the flux will be. Note though that we can have a negative flux, if the vector field points into the volume.
We could divide the flux by the volume to get the rate of ‘outflow’ per unit volume. This would give $\dfrac{1}{V}\iint_S \mathbf{F} \cdot d\mathbf{S}$, which is like a flux/unit volume. If we do this over a small volume, $\delta V$, and let $\delta V \to 0$, we get some information about whether the fluid is moving in towards or away from a particular point. This quantity is called the divergence. Thus, the divergence is
$$\lim_{\delta V \to 0} \frac{1}{\delta V} \iint_S \mathbf{F} \cdot d\mathbf{S}.$$
It is like a flux density.
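Initially, we will try to find the flux over a small volume, $\delta V = \delta x\,\delta y\,\delta z$, of a field,
$$\mathbf{F} = u\,\mathbf{i} + v\,\mathbf{j} + w\,\mathbf{k},$$
near the point $(x, y, z)$.
We look first at the two faces that are perpendicular to the x-axis.
For the face containing the point $(x, y, z)$, $\mathbf{n} = -\mathbf{i}$, so $\mathbf{F} \cdot \mathbf{n} = -u(x, y, z)$. The flux is: $-u(x, y, z)\,\delta y\,\delta z$.
For the face containing the point $(x + \delta x, y, z)$, $\mathbf{n} = \mathbf{i}$, so $\mathbf{F} \cdot \mathbf{n} = u(x + \delta x, y, z)$. The flux is: $u(x + \delta x, y, z)\,\delta y\,\delta z$.
The total contribution to the flux is
$$\big(u(x + \delta x, y, z) - u(x, y, z)\big)\,\delta y\,\delta z \simeq \frac{\partial u}{\partial x}\,\delta x\,\delta y\,\delta z.$$
We can similarly calculate the flux through the two other pairs of faces to be $(\partial v/\partial y)\,\delta x\,\delta y\,\delta z$ and $(\partial w/\partial z)\,\delta x\,\delta y\,\delta z$, so the total flux is
$$\left(\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z}\right)\delta x\,\delta y\,\delta z.$$
To get the flux density we divide by the volume, $\delta V = \delta x\,\delta y\,\delta z$, to get
$$\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z}.$$
This is the divergence. It can be written as
$$\left(\mathbf{i}\,\frac{\partial}{\partial x} + \mathbf{j}\,\frac{\partial}{\partial y} + \mathbf{k}\,\frac{\partial}{\partial z}\right) \cdot \big(u\,\mathbf{i} + v\,\mathbf{j} + w\,\mathbf{k}\big) = \nabla \cdot \mathbf{F}.$$
The left hand factor is the familiar ‘grad’ operator, but now in a different role. This time it is differentiating a vector field using the dot product.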
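The "flux per unit volume" idea can be tried out numerically. In the sketch below the field $\mathbf{F} = x^2 y\,\mathbf{i} + yz\,\mathbf{j} + xz^2\,\mathbf{k}$ is a hypothetical example chosen for illustration (it does not appear in these notes), and the flux through each face of a small cube is approximated by the value of $\mathbf{F} \cdot \mathbf{n}$ at the face centre times the face area.

```python
def F(x, y, z):
    # Hypothetical field for illustration: u = x^2 y, v = y z, w = x z^2
    return (x**2 * y, y * z, x * z**2)

def flux_density(field, p, side=1e-3):
    """Outward flux through a small cube centred at p, divided by its volume."""
    half = side / 2
    area = side * side
    total = 0.0
    for axis in range(3):
        plus, minus = list(p), list(p)
        plus[axis] += half    # centre of the face with outward normal +e_axis
        minus[axis] -= half   # centre of the face with outward normal -e_axis
        total += (field(*plus)[axis] - field(*minus)[axis]) * area
    return total / side**3

point = (1.0, 2.0, 3.0)
x, y, z = point
exact_div = 2 * x * y + z + 2 * x * z   # du/dx + dv/dy + dw/dz for this field
approx_div = flux_density(F, point)
print(approx_div, exact_div)  # the two values should agree closely
```

Shrinking `side` corresponds to letting $\delta V \to 0$ in the definition above.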
A physical interpretation of this operation is to consider a fluid whose velocity is $\mathbf{v}(x, y, z)$ and whose density is $\rho(x, y, z)$. Then $\nabla \cdot (\rho\mathbf{v})$ at any point is the tendency of the fluid to move away from that point.
There is an important identity relating the divergence to the total flux. We have said that the divergence is the flux density. It follows that if we integrate the divergence over a volume, we expect to get the total flux for that volume (or the surface of that volume). Thus
$$\underbrace{\iiint_V \nabla \cdot \mathbf{F}\,dV}_{\text{integral of flux density}} = \underbrace{\iint_S \mathbf{F} \cdot d\mathbf{S}}_{\text{total flux}}.$$
This is called the divergence theorem or Gauss’ theorem.
The divergence theorem also leads to an important identity in fluid mechanics. If $\mathbf{F} = \rho\mathbf{v}$, then $\iint_S \rho\mathbf{v} \cdot \mathbf{n}\,dS$ is the rate at which fluid leaves the volume enclosed by $S$. The total mass in this volume is $\iiint_V \rho\,dV$, so the rate of change of the mass is $\dfrac{d}{dt}\iiint_V \rho\,dV$.
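Therefore
$$\frac{d}{dt}\iiint_V \rho\,dV = -\iint_S \rho\mathbf{v} \cdot \mathbf{n}\,dS$$
$$\therefore\quad \iiint_V \frac{\partial\rho}{\partial t}\,dV = -\iiint_V \nabla \cdot (\rho\mathbf{v})\,dV \quad\text{(using the divergence theorem)}$$
$$\therefore\quad \iiint_V \left(\frac{\partial\rho}{\partial t} + \nabla \cdot (\rho\mathbf{v})\right) dV = 0$$
$$\therefore\quad \frac{\partial\rho}{\partial t} + \nabla \cdot (\rho\mathbf{v}) = 0 \quad\text{everywhere,}$$
since the volume $V$ is arbitrary.
This is called the continuity equation which actually expresses the conservation of mass. That is, any mass that flows in or out of the volume will affect what is left. This equation will appear in any equations for fluid flow. Note that, if the fluid is incompressible, then $\rho$ is a constant and the equation becomes $\nabla \cdot \mathbf{v} = 0$.
Note also that a magnetic field $\mathbf{B}$ also satisfies $\nabla \cdot \mathbf{B} = 0$. Such fields are called solenoidal.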
4.2 Curl of a Vector Field
The other important quantity is the curl of a vector field. This is effectively the tendency of a fluid to rotate about a point. This may depend on the direction. Thus the curl is a vector quantity.
To work out this ‘tendency to rotate’ perpendicular to a given direction, we could put a paddle wheel into the fluid and see how fast it rotates.
Mathematically, we put a closed curve $C$ in the fluid, in a plane normal to the given direction and calculate
$$\int_C \mathbf{F} \cdot d\mathbf{r} \quad\text{around this closed curve.}$$
This is the circulation of $\mathbf{F}$ around $C$.
To find the component of the curl of $\mathbf{F}$ in that direction, we divide by the area enclosed by $C$ and take the limit as this area approaches zero. Thus, the component is
$$\lim_{\delta S \to 0} \frac{1}{\delta S} \int_C \mathbf{F} \cdot d\mathbf{r}.$$
This is a ‘circulation density’.
To find the component of curl $\mathbf{F}$ in the z direction, put in a (rectangular) curve parallel to the x-y plane.
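We calculate the circulation and divide by the area. Then let the size of the curve get smaller. This will give the z component of curl $\mathbf{F}$.
If $\mathbf{F} = u\,\mathbf{i} + v\,\mathbf{j} + w\,\mathbf{k}$, the contributions to $\int_C \mathbf{F} \cdot d\mathbf{r}$ from the two segments parallel to the y-axis can be calculated as follows.
Along the segment through $(x, y, z)$, $d\mathbf{r} = -\mathbf{j}\,\delta y$ and $\mathbf{F} \cdot d\mathbf{r} = -v(x, y, z)\,\delta y$.
Along the segment through $(x + \delta x, y, z)$, $d\mathbf{r} = \mathbf{j}\,\delta y$ and $\mathbf{F} \cdot d\mathbf{r} = v(x + \delta x, y, z)\,\delta y$.
The total contribution is
$$\big(v(x + \delta x, y, z) - v(x, y, z)\big)\,\delta y \simeq \frac{\partial v}{\partial x}\,\delta x\,\delta y.$$
The contribution from the other two segments is $-(\partial u/\partial y)\,\delta x\,\delta y$. The sum of these two terms divided by the area enclosed by the curve is
$$\frac{\partial v}{\partial x} - \frac{\partial u}{\partial y}.$$
This is the z component of curl $\mathbf{F}$.
The components in the x and y directions are
$$\frac{\partial w}{\partial y} - \frac{\partial v}{\partial z} \quad\text{and}\quad \frac{\partial u}{\partial z} - \frac{\partial w}{\partial x}.$$
Therefore
$$\operatorname{curl}\mathbf{F} = \left(\frac{\partial w}{\partial y} - \frac{\partial v}{\partial z}\right)\mathbf{i} + \left(\frac{\partial u}{\partial z} - \frac{\partial w}{\partial x}\right)\mathbf{j} + \left(\frac{\partial v}{\partial x} - \frac{\partial u}{\partial y}\right)\mathbf{k}.$$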
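This can be written in the form
$$\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ u & v & w \end{vmatrix}.$$
In this form, it looks like $\nabla \times \mathbf{F}$ and this is how it is most often written. Note that this is another use of the $\nabla$ operator.
Example: If $\mathbf{F} = -xy\,\mathbf{i} + z^2\,\mathbf{j} - yz\,\mathbf{k}$, calculate $\nabla \times \mathbf{F}$.
$$\nabla \times \mathbf{F} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ -xy & z^2 & -yz \end{vmatrix} = (-z - 2z)\,\mathbf{i} - (0 - 0)\,\mathbf{j} + (0 + x)\,\mathbf{k} = -3z\,\mathbf{i} + x\,\mathbf{k}.$$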
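The component formula for the curl can be checked against this example with finite differences. This is a minimal sketch: the function name `curl`, the step size and the sample point are assumptions made for illustration.

```python
def F(x, y, z):
    # Field from the example: u = -xy, v = z^2, w = -yz
    return (-x * y, z**2, -y * z)

def curl(field, p, step=1e-5):
    """Central-difference approximation to curl F at the point p."""
    def d(comp, var):
        # Partial derivative of component `comp` with respect to coordinate `var`
        plus, minus = list(p), list(p)
        plus[var] += step
        minus[var] -= step
        return (field(*plus)[comp] - field(*minus)[comp]) / (2 * step)
    return (d(2, 1) - d(1, 2),   # dw/dy - dv/dz
            d(0, 2) - d(2, 0),   # du/dz - dw/dx
            d(1, 0) - d(0, 1))   # dv/dx - du/dy

point = (1.0, 2.0, 3.0)
numeric = curl(F, point)
expected = (-3 * point[2], 0.0, point[0])   # -3z i + x k from the worked example
print(numeric, expected)
```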
Note that each component of $\nabla \times \mathbf{F}$ is like a circulation/unit area or a circulation density. If we integrate this density over some surface $S$, we expect to get back the total circulation around the boundary of the surface. That is,
$$\iint_S (\nabla \times \mathbf{F}) \cdot \mathbf{n}\,dS = \int_C \mathbf{F} \cdot d\mathbf{r}$$
where $C$ is the boundary curve of the surface $S$. This is called Stokes’ Theorem.
One consequence of this is that if we fix $C$ and calculate
$$\iint_S (\nabla \times \mathbf{F}) \cdot \mathbf{n}\,dS$$
over a surface bounded by $C$, we get the same answer no matter which surface we choose.
4.2.1 Some Important Identities
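If $f(x, y, z)$ and $g(x, y, z)$ are scalar functions and $\mathbf{F}$ and $\mathbf{G}$ are vector fields the following identities hold.
$$\nabla(f + g) = \nabla f + \nabla g$$
$$\nabla \cdot (\mathbf{F} + \mathbf{G}) = \nabla \cdot \mathbf{F} + \nabla \cdot \mathbf{G}$$
$$\nabla \times (\mathbf{F} + \mathbf{G}) = \nabla \times \mathbf{F} + \nabla \times \mathbf{G}$$
PRODUCT RULES
$$\nabla(fg) = f\,\nabla g + g\,\nabla f$$
$$\nabla \cdot (f\mathbf{G}) = \nabla f \cdot \mathbf{G} + f\,\nabla \cdot \mathbf{G}$$
$$\nabla \cdot (\mathbf{F} \times \mathbf{G}) = (\nabla \times \mathbf{F}) \cdot \mathbf{G} - \mathbf{F} \cdot (\nabla \times \mathbf{G})$$
$$\nabla \times (f\mathbf{G}) = \nabla f \times \mathbf{G} + f\,\nabla \times \mathbf{G}$$
SECOND DERIVATIVES
$$\nabla \cdot (\nabla f) = \nabla^2 f$$
$$\nabla \times (\nabla f) = \mathbf{0}$$
$$\nabla \cdot (\nabla \times \mathbf{F}) = 0$$
Note that $\nabla \times \mathbf{G} = \mathbf{0}$ if and only if there is a scalar function $f(x, y, z)$ such that $\mathbf{G} = \nabla f$. In this case, $\mathbf{G}$ is conservative or irrotational.
Also, $\nabla \cdot \mathbf{G} = 0$ if and only if there is a vector field $\mathbf{F}$ such that $\mathbf{G} = \nabla \times \mathbf{F}$. In this case $\mathbf{G}$ is incompressible or solenoidal.
GAUSS’ THEOREM
$$\iiint_V \nabla \cdot \mathbf{F}\,dV = \iint_S \mathbf{F} \cdot \mathbf{n}\,dS$$
STOKES’ THEOREM
$$\iint_S (\nabla \times \mathbf{F}) \cdot \mathbf{n}\,dS = \int_C \mathbf{F} \cdot d\mathbf{r}$$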
4.3 EXERCISES - Vector Calculus
1. Calculate $\nabla \cdot \mathbf{F}$ and $\nabla \times \mathbf{F}$ for the following cases.
   (a) $\mathbf{F} = (x^2 + y)\,\mathbf{i} + 2yz\,\mathbf{j} + xz^2\,\mathbf{k}$.
   (b) $\mathbf{F} = (x + 2z)\,\mathbf{i} + (x + y)\,\mathbf{k}$.
   In each case, verify that $\nabla \cdot (\nabla \times \mathbf{F}) = 0$.
2. If $f(x, y, z) = x\sin y + zy^2$, calculate $\nabla^2 f = \partial^2 f/\partial x^2 + \partial^2 f/\partial y^2 + \partial^2 f/\partial z^2$ and also calculate $\nabla \cdot (\nabla f)$ by calculating $\nabla f$ first.
   Verify that $\nabla \times (\nabla f) = \mathbf{0}$.
3. If $\mathbf{F} = (2x + y)\,\mathbf{i} + (z^2 + x)\,\mathbf{j} + xz\,\mathbf{k}$ (Q7 in EXERCISES - Multiple Integration), calculate $\nabla \cdot \mathbf{F}$ and $\iiint_V \nabla \cdot \mathbf{F}\,dV$, where $V$ is the cube $0 \le x \le 1$, $0 \le y \le 1$, $0 \le z \le 1$.
   Show that this is equal to the total flux of $\mathbf{F}$ through the six faces that you have calculated previously.
4. If $\mathbf{F} = xy\,\mathbf{i} + x^2\,\mathbf{j} + y\,\mathbf{k}$, show that $\int_C \mathbf{F} \cdot d\mathbf{r} = \iint_S (\nabla \times \mathbf{F}) \cdot \mathbf{n}\,dS$ if
   (a) $S$ is the square $0 \le x \le 1$, $0 \le y \le 1$.
   (b) $S$ is the unit circle $x^2 + y^2 = 1$.
5. (a) Prove that for any function, $f(x, y, z)$, $\nabla \times (\nabla f) = \mathbf{0}$.
   (b) Prove that for any vector field, $\mathbf{F} = u\,\mathbf{i} + v\,\mathbf{j} + w\,\mathbf{k}$, $\nabla \cdot (\nabla \times \mathbf{F}) = 0$.
4.4 SELECTED ANSWERS - Vector Calculus

1. (a) $2x + 2z + 2xz$, $-2y\,\mathbf{i} - z^2\,\mathbf{j} - \mathbf{k}$; (b) 1, $\mathbf{i} + \mathbf{j}$.
2. $\nabla^2 f = -x\sin y + 2z$.
4. (a) 1/2; (b) 0.
5 Fourier Series
5.1 Periodic Functions
Fourier series are infinite series of trigonometric functions. They are used to represent periodic functions. They can be seen as a means of approximating these functions, in the same way as a Taylor series will approximate a function. One advantage of the Fourier series is that it can be used to approximate periodic functions that have discontinuities, such as the sawtooth wave function. We will see that Fourier series can be used to solve many partial differential equations.
A periodic function is a function, $f$, for which there exists $T > 0$ such that
$$f(t + T) = f(t) \quad\text{for all } t.$$
[Figure: graph of a periodic function $f(t)$ with period $T$.]
A wave such as a saw tooth wave, or square wave is periodic.
A sound wave of fixed pitch is periodic.
The smallest such value of $T$ is called the period of the function.
If we try to approximate periodic functions by a Taylor series, we have problems.
i. Polynomials aren’t periodic. This means that the series can take a long time to converge. For example, the series for $\sin x$ converges slowly, except near the origin.
ii. These functions often have discontinuities, which Taylor series can’t handle.
5.1.1 Trigonometric Series
Instead, we use a set of periodic functions to approximate periodic functions. If $T = 2\pi$, we could use the functions
$$\sin x,\ \cos x,\quad \sin 2x,\ \cos 2x,\quad \sin 3x,\ \cos 3x,\quad \ldots,\quad \sin nx,\ \cos nx,\quad \text{etc.}$$
For any function, $f$, we write
$$f(x) \simeq \tfrac{1}{2}a_0 + a_1\cos x + a_2\cos 2x + a_3\cos 3x + \cdots + b_1\sin x + b_2\sin 2x + b_3\sin 3x + \cdots.$$
This form will clearly cope with periodic functions. We will see that it will also work for discontinuous functions. Note that
i. If $f$ is an even function, then $f(-x) = f(x)$ and we get only cos terms in the series.
ii. If $f$ is an odd function, then $f(-x) = -f(x)$ and we get only sin terms in the series.
Consider the square wave with period $2\pi$ (here $f(x) = 1$ for $0 < x < \pi$ and $f(x) = -1$ for $-\pi < x < 0$).
This is an odd function, so we can approximate it with a sin series.
The first approximation is $\tilde{f}(x) = b_1\sin x$.
The amplitude, $b_1$, of the sin wave will be chosen to make the two functions close.
We choose $b_1$ so that $\int_{-\pi}^{\pi} \big(f(x) - b_1\sin x\big)^2\,dx$ is small. In the present example, this means that $\int_0^\pi (1 - b_1\sin x)^2\,dx$ is small. If we expand the brackets, we get
$$\int_0^\pi dx - 2b_1\int_0^\pi \sin x\,dx + b_1^2\int_0^\pi \sin^2 x\,dx.$$
This has to be minimised. The integral is equal to $\pi - 4b_1 + \tfrac{\pi}{2}b_1^2$. The derivative with respect to $b_1$ is $-4 + \pi b_1$. Therefore $b_1 = 4/\pi$ is the optimum value.
The next approximation is $\tilde{f}(x) = b_1\sin x + b_2\sin 2x$.
However, the graph of $\sin 2x$ is not symmetric about $\pi/2$, so the $\sin 2x$ term won’t contribute.
Therefore, the next approximation is $\tilde{f}(x) = b_1\sin x + b_3\sin 3x$.
Again, we require $\int_{-\pi}^{\pi} \big(f(x) - \tilde{f}(x)\big)^2\,dx$ to be a minimum. We get the same value for $b_1$ as before, namely $4/\pi$. The optimum value of $b_3$ is $(4/\pi) \times (1/3)$.
Therefore
$$f(x) \simeq \frac{4}{\pi}\left(\sin x + \frac{1}{3}\sin 3x\right).$$
The extra ‘ripple’ in $\sin 3x$ makes the approximation a little more like the square wave.
Continuing in this way, we get
$$f(x) \simeq \frac{4}{\pi}\left(\sin x + \frac{1}{3}\sin 3x + \frac{1}{5}\sin 5x + \frac{1}{7}\sin 7x + \cdots\right).$$
The more terms we have, the more accurate will be the approximation.
[Figure: the approximations to $f(x)$ using 3 terms, 10 terms and 30 terms.]
Note that at the points where the function is discontinuous, the convergence is not very good. This overshoot phenomenon is a characteristic feature of this type of approximation.
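Both behaviours can be seen by evaluating the partial sums directly. The sketch below is illustrative (the function name `partial_sum` and the sampling grid are assumptions): it measures the error at $x = \pi/2$, well away from the jumps, for 3, 10 and 30 terms, and the largest value reached by the 30-term sum on $(0, \pi)$, where the square wave itself equals 1.

```python
import math

def partial_sum(x, terms):
    """First `terms` non-zero terms of (4/pi)(sin x + sin 3x / 3 + sin 5x / 5 + ...)."""
    return (4 / math.pi) * sum(
        math.sin((2 * k + 1) * x) / (2 * k + 1) for k in range(terms)
    )

# Error at x = pi/2, where the square wave equals 1.
errors = [abs(partial_sum(math.pi / 2, n) - 1.0) for n in (3, 10, 30)]

# Largest value of the 30-term sum on a fine grid over (0, pi).
overshoot = max(partial_sum(math.pi * k / 6000, 30) for k in range(1, 6000))

print(errors)     # the error shrinks as more terms are included
print(overshoot)  # noticeably above 1, near the jump at x = 0
```

Away from the discontinuities the error falls as terms are added, while the peak next to the jump stays well above 1.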
The general formula to find the coefficients ai and bi can be found as follows. If
1 ∞
f (x) = a0 + (an cos nx + bn sin nx) (5.1)
X
2 n=1
then the integral of both sides over one complete period must be the same. We can integrate
from −π to π or from 0 to 2π or π/2 to 5π/2 etc. We find
Z π
1Z π ∞ Z π Z π

f (x) dx = a0 dx + cos nx dx + bn sin nx dx
X
an
−π 2 −π n=1 −π −π
∞
= πa0 + (0 + 0)
X
n=1
= πa0
Therefore
1Zπ
a0 = f (x) dx.
π −π
To find the value of $a_m$, we multiply equation (5.1) by $\cos mx$ and integrate. Therefore,
\[ \int_{-\pi}^{\pi} f(x) \cos mx \, dx = \frac{a_0}{2} \int_{-\pi}^{\pi} \cos mx \, dx + \sum_{n=1}^{\infty} \left( a_n \int_{-\pi}^{\pi} \cos nx \cos mx \, dx + b_n \int_{-\pi}^{\pi} \sin nx \cos mx \, dx \right). \]
The coefficient of $a_0$ in this equation is zero. Similarly the coefficient of $b_n$ is zero, since $\sin nx \cos mx$ is an odd function. The coefficients of $a_n$ are mainly zero, since
\[ \int_{-\pi}^{\pi} \cos nx \cos mx \, dx = 0 \]
unless $n = m$. In that case, $\int_{-\pi}^{\pi} \cos^2 nx \, dx = \pi$, so
\[ a_m = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \cos mx \, dx. \]
A similar procedure (namely multiplying by $\sin mx$ and integrating) gives
\[ b_m = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x) \sin mx \, dx. \]
These formulae are called Euler's formulae for the Fourier coefficients $a_n$ and $b_n$. The series that we get is called the Fourier series.
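Euler's formulae can also be evaluated numerically. The sketch below is illustrative only and is not part of the original notes: `integrate` is a simple midpoint rule written for this example, and `square_wave` is the odd square wave of period $2\pi$ that takes the values $\pm 1$.

```python
import math

def integrate(g, a, b, n=20000):
    """Midpoint rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def fourier_coefficients(f, m):
    """Return (a_m, b_m) from Euler's formulae for a 2*pi-periodic f."""
    a_m = integrate(lambda x: f(x) * math.cos(m * x), -math.pi, math.pi) / math.pi
    b_m = integrate(lambda x: f(x) * math.sin(m * x), -math.pi, math.pi) / math.pi
    return a_m, b_m

def square_wave(x):
    """Odd square wave on (-pi, pi): -1 for x < 0 and +1 for x > 0."""
    return 1.0 if x > 0 else -1.0

for m in range(1, 6):
    a_m, b_m = fourier_coefficients(square_wave, m)
    print(m, round(a_m, 6), round(b_m, 6))
```

The computed $a_m$ vanish, the even $b_m$ vanish, and the odd $b_m$ agree with $4/(m\pi)$, in line with the square wave series found earlier.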
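If the period is $T$ (rather than $2\pi$) then
\[ f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos \frac{2\pi n x}{T} + b_n \sin \frac{2\pi n x}{T} \right) \]
where
\[
\begin{aligned}
a_0 &= \frac{2}{T} \int_{-T/2}^{T/2} f(x) \, dx \\
a_n &= \frac{2}{T} \int_{-T/2}^{T/2} f(x) \cos \frac{2\pi n x}{T} \, dx \\
b_n &= \frac{2}{T} \int_{-T/2}^{T/2} f(x) \sin \frac{2\pi n x}{T} \, dx.
\end{aligned}
\]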
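Example: Consider the following function.
[Figure: graph of $f(x)$ against $x$, with $x = -1$, $1$ and $3$ marked on the axis.]
The period is $T = 2$. This means that the integrals can go from $-1$ to $1$ or from $0$ to $2$ etc.
Note also that the function is even (that is, $f(-x) = f(x)$) so we do not expect any sine terms in the answers ($b_n = 0$).
Therefore:
\[ f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos n\pi x \]
The function is
\[ f(x) = \begin{cases} 1 + x, & -1 < x < 0 \\ 1 - x, & 0 < x < 1 \end{cases} \]
Therefore:
\[
\begin{aligned}
a_0 &= \frac{2}{2} \int_{-1}^{1} f(x) \, dx \\
&= \int_{-1}^{0} (1 + x) \, dx + \int_{0}^{1} (1 - x) \, dx \\
&= \frac{1}{2} + \frac{1}{2} = 1
\end{aligned}
\]
\[
\begin{aligned}
a_n &= \frac{2}{2} \int_{-1}^{1} f(x) \cos n\pi x \, dx \\
&= \int_{-1}^{0} (1 + x) \cos n\pi x \, dx + \int_{0}^{1} (1 - x) \cos n\pi x \, dx
\end{aligned}
\]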
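Now:
\[
\begin{aligned}
\int_0^1 (1 - x) \cos n\pi x \, dx &= \left[ \frac{1}{n\pi} (1 - x) \sin n\pi x \right]_0^1 + \frac{1}{n\pi} \int_0^1 \sin n\pi x \, dx \\
&= 0 + \left[ -\frac{1}{n^2\pi^2} \cos n\pi x \right]_0^1 \\
&= -\frac{1}{n^2\pi^2} \bigl( \cos n\pi - 1 \bigr) \\
&= -\frac{1}{n^2\pi^2} \bigl( (-1)^n - 1 \bigr) \\
&= \frac{1}{n^2\pi^2} \begin{cases} 0, & n \text{ even} \\ 2, & n \text{ odd} \end{cases}
\end{aligned}
\]
Also:
\[ \int_{-1}^{0} (1 + x) \cos n\pi x \, dx = \frac{1}{n^2\pi^2} \begin{cases} 0, & n \text{ even} \\ 2, & n \text{ odd} \end{cases} \]
Therefore:
\[ a_n = \begin{cases} 0, & n \text{ even} \\ \dfrac{4}{n^2\pi^2}, & n \text{ odd} \end{cases} \]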
and so the Fourier Series for $f(x)$ is given by:
\[
\begin{aligned}
f(x) &= \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos n\pi x \\
&= \frac{1}{2} + a_1 \cos \pi x + a_3 \cos 3\pi x + a_5 \cos 5\pi x + \cdots \\
&= \frac{1}{2} + \frac{4}{\pi^2} \left( \cos \pi x + \frac{1}{9} \cos 3\pi x + \frac{1}{25} \cos 5\pi x + \cdots \right)
\end{aligned}
\]
Note that if we put $x = 0$ in this equation and use the fact that $f(0) = 1$ and $\cos 0 = 1$, we obtain the identity
\[ 1 + \frac{1}{9} + \frac{1}{25} + \cdots = \frac{\pi^2}{8} \]
Note: We can rewrite the series in closed form (using the summation sign) and include only the non-zero coefficients, i.e. the odd coefficients in this example. We do this by changing the index of the summation, $n$.
We make the substitution: $n = 2m - 1$. Therefore as $m = 1, 2, 3, \ldots$, $n = 1, 3, 5, \ldots$.
The Fourier Series can therefore be written in closed form as:
\[ f(x) = \frac{1}{2} + \sum_{m=1}^{\infty} \frac{4}{(2m - 1)^2 \pi^2} \cos[(2m - 1)\pi x] \]
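As a numerical check on this example, the following sketch (not part of the original notes; the names `triangle` and `triangle_series` are illustrative) compares a partial sum of the series with the function itself, and compares a partial sum of $1 + \frac{1}{9} + \frac{1}{25} + \cdots$ with $\pi^2/8$.

```python
import math

def triangle_series(x, terms):
    """Partial sum 1/2 + sum over m of 4 / ((2m-1)^2 pi^2) * cos((2m-1) pi x)."""
    total = 0.5
    for m in range(1, terms + 1):
        n = 2 * m - 1
        total += 4 / (n * n * math.pi ** 2) * math.cos(n * math.pi * x)
    return total

def triangle(x):
    """The example function on -1 < x < 1: 1 + x for x < 0, 1 - x otherwise."""
    return 1 + x if x < 0 else 1 - x

for x in (-0.5, 0.0, 0.25, 0.9):
    print(x, triangle(x), round(triangle_series(x, 200), 5))

odd_reciprocal_squares = sum(1 / (2 * m - 1) ** 2 for m in range(1, 100001))
print(odd_reciprocal_squares, math.pi ** 2 / 8)
```

Because the coefficients fall off like $1/n^2$, a few hundred terms already reproduce the function to about three decimal places.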
5.2 Half-range Series
We have looked at Fourier series for periodic functions defined on all real numbers. They
can also be used for functions defined on finite intervals, [0, l]. For such a function, we
simply extend it to the real line, making it periodic. Usually we make the extended function
either even or odd. In this case, the expansion will involve only cos terms or only sin terms
respectively.
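For example, if $f(x) = l - x$ on $[0, l]$, we can extend it as follows.
[Figure, EVEN: $f(x)$ on $[0, l]$, then extended to $[-l, l]$ as an even function, then extended periodically (axis marked at $-l$, $l$, $3l$).]
OR
[Figure, ODD: $f(x)$ on $[0, l]$, then extended to $[-l, l]$ as an odd function, then extended periodically (axis marked at $-l$, $l$, $3l$).]
These are the even and odd periodic extensions of the function.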
For a function defined over $[0, l]$, the period of the extended function is $T = 2l$. The resulting series are called the $\frac{1}{2}$-range expansions of the functions. We can find the $\frac{1}{2}$-range cos series and the $\frac{1}{2}$-range sin series.
The general $\frac{1}{2}$-range Fourier series are written as:
EVEN: $\frac{1}{2}$-range cos series:
\[ f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos \frac{n\pi x}{l} \]
ODD: $\frac{1}{2}$-range sin series:
\[ f(x) = \sum_{n=1}^{\infty} b_n \sin \frac{n\pi x}{l} \]
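To find the coefficients of the $\frac{1}{2}$-range sin series, we calculate
\[
\begin{aligned}
a_0 &= \frac{1}{l} \int_{-l}^{l} f(x) \, dx = 0 \quad \text{(since $f(x)$ is the odd extension)}, \\
a_n &= \frac{1}{l} \int_{-l}^{l} f(x) \cos \frac{n\pi x}{l} \, dx = 0 \quad \text{(again since $f(x)$ is odd and so the integrand is odd)}, \\
b_n &= \frac{1}{l} \int_{-l}^{l} f(x) \sin \frac{n\pi x}{l} \, dx \\
&= \frac{2}{l} \int_{0}^{l} f(x) \sin \frac{n\pi x}{l} \, dx \quad \text{(since the integrand is even)}.
\end{aligned}
\]
To find the coefficients of the $\frac{1}{2}$-range cos series, we calculate
\[
\begin{aligned}
a_0 &= \frac{1}{l} \int_{-l}^{l} f(x) \, dx \\
&= \frac{2}{l} \int_{0}^{l} f(x) \, dx \quad \text{(since $f(x)$ is now the even extension)}, \\
a_n &= \frac{1}{l} \int_{-l}^{l} f(x) \cos \frac{n\pi x}{l} \, dx \\
&= \frac{2}{l} \int_{0}^{l} f(x) \cos \frac{n\pi x}{l} \, dx \quad \text{(since the integrand is even)}, \\
b_n &= \frac{1}{l} \int_{-l}^{l} f(x) \sin \frac{n\pi x}{l} \, dx = 0 \quad \text{(since the integrand is odd)}.
\end{aligned}
\]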
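Therefore, for $f(x) = l - x$ on $[0, l]$, the coefficients of the cos series are
\[ a_0 = \frac{2}{l} \int_0^l (l - x) \, dx = l \]
and
\[ a_n = \frac{2}{l} \int_0^l (l - x) \cos \frac{n\pi x}{l} \, dx = \frac{2l}{n^2\pi^2} \begin{cases} 0, & n \text{ even} \\ 2, & n \text{ odd} \end{cases} \]
And the $\frac{1}{2}$-range cos series, $f(x) = \dfrac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos \dfrac{n\pi x}{l}$, is given by:
\[ f(x) = \frac{l}{2} + \frac{4l}{\pi^2} \left( \cos \frac{\pi x}{l} + \frac{1}{9} \cos \frac{3\pi x}{l} + \frac{1}{25} \cos \frac{5\pi x}{l} + \cdots \right). \]
This converges to the even periodic extension.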
The $\frac{1}{2}$-range cos series can be written in closed form. Let $n = 2m - 1$ to give:
\[ f(x) = \frac{l}{2} + \frac{4l}{\pi^2} \sum_{m=1}^{\infty} \frac{1}{(2m - 1)^2} \cos \frac{(2m - 1)\pi x}{l} \]
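Alternatively, the coefficients of the $\frac{1}{2}$-range sin series are
\[
\begin{aligned}
b_n &= \frac{2}{l} \int_0^l (l - x) \sin \frac{n\pi x}{l} \, dx \\
&= \frac{2}{l} \left\{ \left[ -\frac{l}{n\pi} (l - x) \cos \frac{n\pi x}{l} \right]_0^l - \frac{l}{n\pi} \int_0^l \cos \frac{n\pi x}{l} \, dx \right\} \\
&= \frac{2}{l} \left\{ \frac{l^2}{n\pi} - \frac{l}{n\pi} \left[ \frac{l}{n\pi} \sin \frac{n\pi x}{l} \right]_0^l \right\} \\
&= \frac{2l}{n\pi}
\end{aligned}
\]
And the $\frac{1}{2}$-range sin series, $f(x) = \sum_{n=1}^{\infty} b_n \sin \dfrac{n\pi x}{l}$, is given by:
\[ f(x) = \frac{2l}{\pi} \left( \sin \frac{\pi x}{l} + \frac{1}{2} \sin \frac{2\pi x}{l} + \frac{1}{3} \sin \frac{3\pi x}{l} + \cdots \right), \quad \text{on the interval } [0, l] \]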
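The graphs of these approximations are as follows.
[Figure: graphs of the partial sums of the two series against $x$. The panels in one column are labelled 1 term, 2 terms, 3 terms; those in the other are labelled 1 term, 3 terms, 7 terms.]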
Note the following.
• In this case, the cos series converges more rapidly.
• An overshoot develops at the discontinuity in the sin series.
• Both series do converge to the function at each point (except at the discontinuities).
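The difference in convergence rate can be checked numerically. The sketch below is not part of the original notes: the function names are illustrative, $l = 1$ is an arbitrary choice, and the error is measured on a grid over $[0.1\,l, l]$ so that the jump of the odd extension at $x = 0$ is avoided.

```python
import math

def cos_series(x, l, terms):
    """Half-range cos series of l - x, using the first `terms` odd values of n."""
    total = l / 2
    for m in range(1, terms + 1):
        n = 2 * m - 1
        total += (4 * l / math.pi ** 2) * math.cos(n * math.pi * x / l) / (n * n)
    return total

def sin_series(x, l, terms):
    """Half-range sin series of l - x, using n = 1, ..., terms."""
    return (2 * l / math.pi) * sum(
        math.sin(n * math.pi * x / l) / n for n in range(1, terms + 1)
    )

def max_error(series, l, terms, points=91):
    """Largest |series - (l - x)| on a grid over [0.1 l, l]."""
    xs = [0.1 * l + 0.9 * l * i / (points - 1) for i in range(points)]
    return max(abs(series(x, l, terms) - (l - x)) for x in xs)

l = 1.0
for terms in (3, 10, 30):
    print(terms, max_error(cos_series, l, terms), max_error(sin_series, l, terms))
```

The cos coefficients decay like $1/n^2$ while the sin coefficients decay only like $1/n$, which is why the cos series error is the smaller of the two for the same number of terms.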
5.3 EXERCISES - Fourier Series
1. For each of the following functions, write down $f(-x)$ and decide if the function is even or odd or neither.
(i) $f(x) = x^2$, (ii) $f(x) = e^x$, (iii) $f(x) = x \sin x$, (iv) $f(x) = x^3 + 3x$, (v) $f(x) = \tan x + 2x^2$.
2. Sketch graphs of each of the following periodic functions.
(a) $f(x) = -x$, $-\pi \le x \le \pi$, Period $= 2\pi$.
(b) $f(x) = 1 - x^2$, $-1 \le x \le 1$, Period $= 2$.
(c) $f(x) = \begin{cases} 0, & -\pi/2 \le x \le 0 \\ x, & 0 \le x \le \pi/2 \end{cases}$, Period $= \pi$.
3. (a) Calculate the coefficients of the Fourier series for the function in 2(c).
(b) Calculate the coefficients of the Fourier series for the function $f(x) = |\sin x|$.
4. The Fourier series for the square wave
\[ f(x) = \begin{cases} -1, & -\pi \le x \le 0 \\ 1, & 0 \le x \le \pi \end{cases} \]
(period $2\pi$) is
\[ f(x) = \frac{4}{\pi} \left( \sin x + \frac{1}{3} \sin 3x + \frac{1}{5} \sin 5x + \cdots \right). \]
Substitute $x = \pi/2$ into this equation to obtain an expression for the series
\[ 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots . \]
5. If $f(x) = x^2$ for $0 \le x \le \pi$, sketch the even and odd periodic extensions of $f$. Also calculate the $\frac{1}{2}$-range cos series and the $\frac{1}{2}$-range sin series for $f$.
6. Prove the result that
\[ \int_{-\pi}^{\pi} \cos nx \cos mx \, dx = \begin{cases} 0, & m \ne n \\ \pi, & m = n \end{cases}. \]
(Use the identity $\cos A \cos B = \frac{1}{2} \bigl( \cos(A + B) + \cos(A - B) \bigr)$.)
5.4 SELECTED ANSWERS - Fourier Series
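3. (a) $a_0 = \pi/4$, $a_m = 0$ ($m$ even), $a_m = -1/(m^2\pi)$ ($m$ odd), $b_m = -(-1)^m/(2m)$.
(b) $a_0 = 4/\pi$, $a_m = -\dfrac{4}{\pi} \dfrac{1}{(2m)^2 - 1}$, $b_m = 0$.
5. Cos series: $f(x) = \dfrac{\pi^2}{3} - 4 \left( \cos x - \dfrac{1}{2^2} \cos 2x + \dfrac{1}{3^2} \cos 3x - \cdots \right)$.
Sin series: $b_m = -\dfrac{2\pi}{m} (-1)^m + \dfrac{4}{m^3\pi} \bigl( (-1)^m - 1 \bigr)$.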
6 Eigenvalues and Eigenvectors
6.1 Revision of Matrices
We need to revise the idea of a matrix. Firstly, consider a set of $m$ linear equations for $n$ unknowns.
\[
\begin{aligned}
a_{11} x + a_{12} y + a_{13} z + \cdots &= b_1 \\
a_{21} x + a_{22} y + a_{23} z + \cdots &= b_2 \\
a_{31} x + a_{32} y + a_{33} z + \cdots &= b_3 \\
&\;\;\vdots \\
a_{m1} x + a_{m2} y + a_{m3} z + \cdots &= b_m .
\end{aligned}
\]
$a_{11}$, $a_{12}$, etc. are constants.
$b_1$, $b_2$, etc. are constants.
$x$, $y$, $z$ etc. are unknowns ($n$ of them).
These equations can be written as a matrix equation,
\[ A \underset{\sim}{x} = \underset{\sim}{b}, \]
where
\[
A = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{pmatrix}
\quad \text{and} \quad
\underset{\sim}{b} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_m \end{pmatrix}.
\]
The unknowns form a single column:
\[ \underset{\sim}{x} = \begin{pmatrix} x \\ y \\ z \\ \vdots \end{pmatrix}. \]
Essentially, a matrix is a rectangular array of numbers. An $m \times n$ matrix has $m$ rows and $n$ columns. The $b_i$'s also form a column vector, i.e.
\[ \underset{\sim}{b} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}. \]
This can be viewed as an $m \times 1$ matrix. Similarly, the vector $\underset{\sim}{x}$ is an $n \times 1$ matrix.
6.1.1 Matrix Algebra
We can add and subtract matrices if they have the same number of rows and columns. We can also multiply a matrix by a number. Therefore, if
\[ A = \begin{pmatrix} 1 & -3 & 2 \\ -1 & 1 & 1 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 3 & 1 & -2 \\ 1 & -2 & 3 \end{pmatrix}, \]
we can calculate:
\[ A + B = \begin{pmatrix} 1 + 3 & -3 + 1 & 2 - 2 \\ -1 + 1 & 1 - 2 & 1 + 3 \end{pmatrix} = \begin{pmatrix} 4 & -2 & 0 \\ 0 & -1 & 4 \end{pmatrix}. \]
Also,
\[ 5A = \begin{pmatrix} 5 & -15 & 10 \\ -5 & 5 & 5 \end{pmatrix}. \]
If $A$ is an $m \times n$ matrix, the transpose of $A$ is the $n \times m$ matrix obtained by interchanging rows and columns. Thus, if
\[ A = \begin{pmatrix} 1 & -3 & 2 \\ -1 & 1 & 1 \end{pmatrix}, \quad \text{the transpose is:} \quad A^T = \begin{pmatrix} 1 & -1 \\ -3 & 1 \\ 2 & 1 \end{pmatrix}. \]
Note that $\underset{\sim}{b}^T = (b_1, b_2, \ldots, b_m)$ is a $1 \times m$ matrix. It is called a row vector. Also, a square matrix is symmetric if $A^T = A$. That is, interchanging rows and columns leaves a symmetric matrix unchanged.
In forming $A \underset{\sim}{x}$, we multiply a matrix by a vector, i.e. we can multiply an $m \times n$ matrix by an $n$ dimensional vector. e.g.
\[ \begin{pmatrix} 1 & 1 \\ -2 & 3 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} 4 \\ -1 \end{pmatrix} = \begin{pmatrix} 1 \times 4 + 1 \times (-1) \\ (-2) \times 4 + 3 \times (-1) \\ 1 \times 4 + 3 \times (-1) \end{pmatrix} = \begin{pmatrix} 3 \\ -11 \\ 1 \end{pmatrix}. \]
We can extend this to multiply a matrix by a matrix provided the sizes match. e.g.
\[
\begin{aligned}
\begin{pmatrix} 1 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 & 1 \\ 2 & 1 & 1 \end{pmatrix}
&= \begin{pmatrix} 1 \times 1 + 3 \times 2 & 1 \times (-1) + 3 \times 1 & 1 \times 1 + 3 \times 1 \\ 2 \times 1 + 1 \times 2 & 2 \times (-1) + 1 \times 1 & 2 \times 1 + 1 \times 1 \end{pmatrix} \\
&= \begin{pmatrix} 7 & 2 & 4 \\ 4 & -1 & 3 \end{pmatrix}.
\end{aligned}
\]
In the multiplication, each column of the second matrix is treated like a column vector. The product of the first matrix with each column of the second matrix gives a column of the product matrix. The sizes of the matrices are
\[ (2 \times 2) \times (2 \times 3). \]
Note that the number of columns of the first must equal the number of rows of the second. In general, an $(m \times n)$ matrix multiplied by an $(n \times p)$ matrix gives an $m \times p$ matrix.
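The row-by-column rule can be written out directly. The following is a minimal sketch, not part of the original notes, using plain Python lists of rows; the function name `matmul` is chosen here for illustration.

```python
def matmul(first, second):
    """Multiply an (m x n) matrix by an (n x p) matrix, both given as lists of rows."""
    n = len(second)
    if any(len(row) != n for row in first):
        raise ValueError("columns of the first must equal rows of the second")
    p = len(second[0])
    # Entry (i, j) is the product of row i of the first with column j of the second.
    return [
        [sum(row[k] * second[k][j] for k in range(n)) for j in range(p)]
        for row in first
    ]

first = [[1, 3], [2, 1]]
second = [[1, -1, 1], [2, 1, 1]]
print(matmul(first, second))  # [[7, 2, 4], [4, -1, 3]]
```

Calling `matmul(second, first)` raises the error, because a $(2 \times 3)$ matrix cannot be multiplied by a $(2 \times 2)$ matrix.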
Properties of Matrix Operations
• A+B =B+A
• A(B + C) = AB + AC
• (A + B)C = AC + BC
• (λA)B = λ(AB) λ is a constant
• A(λB) = λ(AB)
• A(BC) = (AB)C
But note that $AB$ need not equal $BA$. For example, if the multiplication in the previous example is done in the opposite order, we get
\[ \begin{pmatrix} 1 & -1 & 1 \\ 2 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 3 \\ 2 & 1 \end{pmatrix}. \]
This is not even defined, as the number of columns of the first is not equal to the number of rows of the second. Thus matrix multiplication is associative, but not commutative.
The square matrix of order $n$ that has ones on the main diagonal and zeros elsewhere is the unit matrix, $I_n$ (or just $I$). e.g.
\[ I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
Note that $IA = A$ and $BI = B$ for all $A$ and $B$.
Note that we cannot divide by a matrix. However, if $A$ is a square matrix, we can sometimes find a matrix, $A^{-1}$, such that
\[ A A^{-1} = A^{-1} A = I. \]
$A^{-1}$ is called the inverse of $A$.
6.1.2 Solving systems of linear equations
Matrix notation can be used to solve linear equations. For example, to solve the equations
\[
\begin{aligned}
x - y + 2z &= -4 \\
2x + y - z &= 9 \\
-3x - 3y + 2z &= -20
\end{aligned}
\]
using matrix notation, we write
\[ \begin{pmatrix} 1 & -1 & 2 \\ 2 & 1 & -1 \\ -3 & -3 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -4 \\ 9 \\ -20 \end{pmatrix}. \]
A shorthand for this is the augmented matrix form
\[ [A | \underset{\sim}{b}] = \left( \begin{array}{ccc|c} 1 & -1 & 2 & -4 \\ 2 & 1 & -1 & 9 \\ -3 & -3 & 2 & -20 \end{array} \right). \]
We proceed to get the matrix into upper triangular form. We select one row (usually the first) and an element in that row (usually the first). This element is called the pivot. We subtract multiples of this row from the other rows in order to get zeros beneath the pivot. We get
\[ \left( \begin{array}{ccc|c} 1 & -1 & 2 & -4 \\ 0 & 3 & -5 & 17 \\ 0 & -6 & 8 & -32 \end{array} \right) \begin{array}{l} \\ R_2 - 2R_1 \\ R_3 + 3R_1 \end{array} \]
Then use the second element in the second row as the pivot, and get zeros below this element.
\[ \left( \begin{array}{ccc|c} 1 & -1 & 2 & -4 \\ 0 & 3 & -5 & 17 \\ 0 & 0 & -2 & 2 \end{array} \right) \begin{array}{l} \\ \\ R_3 + 2R_2 \end{array} \]
The coefficient part is now in upper triangular form. This procedure is called row reduction or Gaussian elimination.
The last row now says $-2z = 2$. We can solve this to get $z = -1$.
The second row says $3y - 5z = 17$. Substitute $z = -1$ to get $y = 4$.
Then substitute into the first row (i.e. $x - y + 2z = -4$) to get $x = 2$.
This procedure is called back substitution.
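Row reduction followed by back substitution is straightforward to automate. The sketch below is illustrative only and is not part of the original notes: it assumes a square system in which every pivot is non-zero, so no row swaps are needed, and it uses exact fractions to avoid rounding. The name `solve_by_elimination` is hypothetical.

```python
from fractions import Fraction

def solve_by_elimination(augmented):
    """Row reduce [A|b] to upper triangular form, then back substitute.

    Assumes a square system whose pivots are all non-zero (no row swaps).
    """
    rows = [[Fraction(v) for v in row] for row in augmented]
    n = len(rows)
    for p in range(n):                      # pivot is rows[p][p]
        for r in range(p + 1, n):           # clear the entries beneath it
            factor = rows[r][p] / rows[p][p]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[p])]
    solution = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):          # back substitution, last row first
        known = sum(rows[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (rows[i][n] - known) / rows[i][i]
    return rows, solution

augmented = [[1, -1, 2, -4], [2, 1, -1, 9], [-3, -3, 2, -20]]
triangular, solution = solve_by_elimination(augmented)
print(triangular)
print(solution)
```

For the example above, the triangular form and the solution $x = 2$, $y = 4$, $z = -1$ match the hand calculation.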
This procedure always works unless we get a row of zeros in the last line. For example, we might end up with
\[ \left( \begin{array}{ccc|c} a_1 & b_1 & c_1 & d_1 \\ 0 & b_2 & c_2 & d_2 \\ 0 & 0 & 0 & d_3 \end{array} \right). \]
The last equation says that $0 \times z = d_3$. There are two possibilities.
1. $d_3 \ne 0$. There can be no solution. The equations are inconsistent.
2. $d_3 = 0$. The final equation is always satisfied. We really have only two equations, so there are infinitely many solutions.
For example, if we have
\[ \left( \begin{array}{ccc|c} 5 & 3 & -2 & -3/2 \\ 0 & 2 & -1 & -3 \\ 0 & 0 & 0 & 0 \end{array} \right), \]
we can let $z$ take any value, $\lambda$ say, which will be a parameter.
The second equation then tells us $2y - z = -3$ or $y = \dfrac{\lambda - 3}{2}$.
The first equation then gives
\[ 5x + 3 \left( \frac{\lambda - 3}{2} \right) - 2\lambda = -\frac{3}{2} \]
or $x = \dfrac{\lambda + 6}{10}$. The solutions then have the form
\[ x = \frac{\lambda + 6}{10}, \quad y = \frac{\lambda - 3}{2}, \quad z = \lambda, \]
where $\lambda$ is any number. This is just the equation of a straight line with parameter $\lambda$. The direction is $(1/10, 1/2, 1)$ and $(3/5, -3/2, 0)$ is a point on the line.
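The case where we get a row of zeros arises if the determinant of the matrix is zero. The determinant of a matrix can be found as follows. If $A$ is a $2 \times 2$ matrix,
\[ \det A = \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc. \]
If $A$ is a $3 \times 3$ matrix, the determinant can be calculated by expanding along the first row:
\[
\begin{aligned}
\det A = \det \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} &= \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} \\
&= a \begin{vmatrix} e & f \\ h & i \end{vmatrix} - b \begin{vmatrix} d & f \\ g & i \end{vmatrix} + c \begin{vmatrix} d & e \\ g & h \end{vmatrix}
\end{aligned}
\]
For an $n \times n$ determinant,
\[ \det A = a_{11} M_{11} - a_{12} M_{12} + a_{13} M_{13} - a_{14} M_{14} + \cdots + (-1)^{n+1} a_{1n} M_{1n}, \]
where $M_{ij}$ is the $(n - 1) \times (n - 1)$ determinant of the matrix with the $i$th row and the $j$th column removed.
It follows that the determinant of an upper triangular matrix can be found by taking the product of the diagonal elements. Note also that the determinant of a matrix is unchanged if a multiple of one row is added to another row. Therefore, the determinant of a matrix can be calculated by using row reduction to put the matrix into upper triangular form and then taking the product of the diagonal elements.
i.e. to evaluate
\[ \begin{vmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 1 & -1 \\ 2 & 1 & -1 & -2 \\ 1 & 5 & -1 & 1 \end{vmatrix} \]
we write
\[
\begin{aligned}
\begin{vmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 1 & -1 \\ 2 & 1 & -1 & -2 \\ 1 & 5 & -1 & 1 \end{vmatrix}
&= \begin{vmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & -2 \\ 0 & -1 & -3 & -4 \\ 0 & 4 & -2 & 0 \end{vmatrix}
\begin{array}{l} \\ R_2 - R_1 \\ R_3 - 2R_1 \\ R_4 - R_1 \end{array} \\
&= \begin{vmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & -3 & -6 \\ 0 & 0 & -2 & 8 \end{vmatrix}
\begin{array}{l} \\ \\ R_3 + R_2 \\ R_4 - 4R_2 \end{array} \\
&= \begin{vmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & -3 & -6 \\ 0 & 0 & 0 & 12 \end{vmatrix}
\begin{array}{l} \\ \\ \\ R_4 - \frac{2}{3} R_3 \end{array} \\
&= 1 \times 1 \times (-3) \times 12 = -36.
\end{aligned}
\]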
Note: If the determinant of a matrix is zero, then row reduction will produce a matrix
with a row of zeros.
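The row reduction method for determinants can be sketched in code as follows. This is an illustration, not part of the original notes. It adds one step the text does not need for its example: if a pivot is zero, two rows are swapped, and each swap changes the sign of the determinant (a standard property). The name `determinant` is chosen for this example.

```python
from fractions import Fraction

def determinant(matrix):
    """Determinant by row reduction to upper triangular form.

    A row swap (needed only when a pivot is zero) changes the sign.
    """
    rows = [[Fraction(v) for v in row] for row in matrix]
    n = len(rows)
    sign = 1
    for p in range(n):
        swap = next((r for r in range(p, n) if rows[r][p] != 0), None)
        if swap is None:
            return Fraction(0)              # no pivot available in this column
        if swap != p:
            rows[p], rows[swap] = rows[swap], rows[p]
            sign = -sign
        for r in range(p + 1, n):
            factor = rows[r][p] / rows[p][p]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[p])]
    product = Fraction(sign)
    for i in range(n):
        product *= rows[i][i]               # product of the diagonal elements
    return product

example = [[1, 1, 1, 1], [1, 2, 1, -1], [2, 1, -1, -2], [1, 5, -1, 1]]
print(determinant(example))  # -36
```

Exact fractions are used so that a zero determinant is reported as exactly zero rather than as a small rounding error.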
6.1.3 Homogeneous Systems
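Consider the equations $A \underset{\sim}{x} = \underset{\sim}{0}$. Clearly, $\underset{\sim}{x} = \underset{\sim}{0}$ is a solution, so if $\det A \ne 0$, it must be the only solution. If we want a non-zero solution, we must have $\det A = 0$. This means there will be many solutions, since if $\underset{\sim}{x}$ is a solution, then $\lambda \underset{\sim}{x}$ will also be a solution.
Example: If
\[ A = \begin{pmatrix} 1 & -1 & 2 \\ 1 & 3 & k \\ -2 & 1 & -1 \end{pmatrix}, \]
find the value of $k$ for which the equations $A \underset{\sim}{x} = \underset{\sim}{0}$ will have a non-zero solution. Find the solutions in this case.
Now
\[ \det A = \begin{vmatrix} 3 & k \\ 1 & -1 \end{vmatrix} + \begin{vmatrix} 1 & k \\ -2 & -1 \end{vmatrix} + 2 \begin{vmatrix} 1 & 3 \\ -2 & 1 \end{vmatrix} = k + 10. \]
Therefore, for non-zero solutions, $k = -10$. Then
\[ [A | \underset{\sim}{b}] = \left( \begin{array}{ccc|c} 1 & -1 & 2 & 0 \\ 1 & 3 & -10 & 0 \\ -2 & 1 & -1 & 0 \end{array} \right) \to \left( \begin{array}{ccc|c} 1 & -1 & 2 & 0 \\ 0 & 4 & -12 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right). \]
So, if we choose an arbitrary value for $z$, we can find the values of $x$ and $y$. Therefore, we put $z = \lambda$. We get $y = 3\lambda$ and $x = \lambda$. The solution is
\[ \underset{\sim}{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \lambda \\ 3\lambda \\ \lambda \end{pmatrix} = \lambda \begin{pmatrix} 1 \\ 3 \\ 1 \end{pmatrix}. \]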
6.1.4 The inner product
Consider two column vectors
\[ \underset{\sim}{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix} \quad \text{and} \quad \underset{\sim}{y} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{pmatrix}. \]
The product, $\underset{\sim}{x}^T \underset{\sim}{y}$, is
\[ \begin{pmatrix} x_1, & x_2, & x_3, & \cdots, & x_n \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{pmatrix} = x_1 y_1 + x_2 y_2 + x_3 y_3 + \cdots + x_n y_n, \]
a single number. This is called the inner product of $\underset{\sim}{x}$ and $\underset{\sim}{y}$ and is a generalization of the dot product in three dimensions. For this reason it is often written as $\underset{\sim}{x} \cdot \underset{\sim}{y}$. It satisfies all the properties of the dot product in three dimensions. (Note that there is no similar generalization of the cross product to higher dimensions.)
6.2 Eigenvalues and Eigenvectors
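The equation $A \underset{\sim}{x} = \underset{\sim}{b}$ suggests that the matrix $A$ transforms the vector $\underset{\sim}{x}$ into the vector $\underset{\sim}{b}$. An important case arises when $A \underset{\sim}{x} \propto \underset{\sim}{x}$, i.e.
\[ A \underset{\sim}{x} = \lambda \underset{\sim}{x}. \]
This will only happen for special vectors, $\underset{\sim}{x}$. $\lambda$ is a scalar, called the eigenvalue, and $\underset{\sim}{x}$ is an eigenvector of the matrix $A$. To see how this can arise, consider the simple harmonic oscillator.
[Figure: a mass $m$ hanging on a spring of stiffness $k$, with displacement $y$.]
The motion is governed by the equation
\[ m \ddot{y} = -ky. \]
We know that the solution is $y = a \sin(\omega t + \epsilon)$.
In the case of a coupled oscillator, there are two equations of motion to consider. The tension in the first spring is $k_1 y_1$ and the tension in the second spring is $k_2 (y_2 - y_1)$. We get the following.
[Figure: a mass $m_1$ hanging on a spring of stiffness $k_1$, with a second mass $m_2$ hanging below it on a spring of stiffness $k_2$; the displacements are $y_1$ and $y_2$.]
\[
\begin{aligned}
m_1 \ddot{y}_1 &= -k_1 y_1 + k_2 (y_2 - y_1) \\
m_2 \ddot{y}_2 &= -k_2 (y_2 - y_1).
\end{aligned}
\]
Consider the case where
\[ m_1 = 3, \quad m_2 = 1, \quad k_1 = 9, \quad k_2 = 6. \]
Then $3\ddot{y}_1 = -9y_1 + 6(y_2 - y_1)$ and $\ddot{y}_2 = -6(y_2 - y_1)$. Therefore, the equations become
\[
\begin{aligned}
\ddot{y}_1 &= -5y_1 + 2y_2 \\
\ddot{y}_2 &= 6y_1 - 6y_2
\end{aligned}
\]
and in matrix form:
\[ \begin{pmatrix} \ddot{y}_1 \\ \ddot{y}_2 \end{pmatrix} = \begin{pmatrix} -5 & 2 \\ 6 & -6 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \quad \text{or} \quad \underset{\sim}{\ddot{y}} = A \underset{\sim}{y}. \]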
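We now look for solutions for which $y_1 = a_1 \sin(\omega t + \epsilon)$ and $y_2 = a_2 \sin(\omega t + \epsilon)$. This means that
\[ \underset{\sim}{y} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \sin(\omega t + \epsilon) \quad \text{and} \quad \underset{\sim}{\ddot{y}} = -\omega^2 \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \sin(\omega t + \epsilon). \]
This means that $\underset{\sim}{\ddot{y}} = -\omega^2 \underset{\sim}{y}$ and the set of differential equations becomes $A \underset{\sim}{y} = -\omega^2 \underset{\sim}{y}$. Now both sides have a factor of $\sin(\omega t + \epsilon)$. If we divide by this factor, we get
\[ A \underset{\sim}{a} = -\omega^2 \underset{\sim}{a}. \]
Here $\underset{\sim}{a} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$ is an eigenvector and $-\omega^2$ is the eigenvalue. The equations for $\underset{\sim}{a}$ are
\[ \begin{pmatrix} -5 & 2 \\ 6 & -6 \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = -\omega^2 \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}. \]
These are
\[
\begin{aligned}
-5a_1 + 2a_2 &= -\omega^2 a_1 \\
6a_1 - 6a_2 &= -\omega^2 a_2,
\end{aligned}
\]
which can be written as
\[
\begin{aligned}
(-5 + \omega^2) a_1 + 2a_2 &= 0 \\
6a_1 + (-6 + \omega^2) a_2 &= 0,
\end{aligned}
\]
or
\[ \begin{pmatrix} -5 + \omega^2 & 2 \\ 6 & -6 + \omega^2 \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]
Now this is a homogeneous set of equations. Since we want non-zero solutions, we will need to have
\[ \det \begin{pmatrix} -5 + \omega^2 & 2 \\ 6 & -6 + \omega^2 \end{pmatrix} = 0. \]
Therefore, $(-5 + \omega^2)(-6 + \omega^2) - 12 = 0$, or $\omega^4 - 11\omega^2 + 18 = 0$. Therefore, $\omega^2 = 2$ or $\omega^2 = 9$.
Thus we have determined the eigenvalues of the matrix, namely $-\omega^2 = -2$ and $-\omega^2 = -9$. Note that these eigenvalues are related to $\omega$, the frequency of the motion.
We now take each of these eigenvalues in turn and calculate the corresponding eigenvector.
If $\omega^2 = 2$,
\[ \begin{pmatrix} -3 & 2 \\ 6 & -4 \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]
This can be solved to give $\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = C \begin{pmatrix} 2 \\ 3 \end{pmatrix}$.
If $\omega^2 = 9$,
\[ \begin{pmatrix} 4 & 2 \\ 6 & 3 \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]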
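This can be solved to give $\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = D \begin{pmatrix} 1 \\ -2 \end{pmatrix}$.
We now have two solutions for $\underset{\sim}{y}$. These are
\[ \underset{\sim}{y} = C \begin{pmatrix} 2 \\ 3 \end{pmatrix} \sin(\sqrt{2}\, t + \epsilon_1) \]
and
\[ \underset{\sim}{y} = D \begin{pmatrix} 1 \\ -2 \end{pmatrix} \sin(3t + \epsilon_2). \]
The general solution is
\[ \underset{\sim}{y} = C \begin{pmatrix} 2 \\ 3 \end{pmatrix} \sin(\sqrt{2}\, t + \epsilon_1) + D \begin{pmatrix} 1 \\ -2 \end{pmatrix} \sin(3t + \epsilon_2). \]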
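The eigenvalues and eigenvectors found for the coupled oscillator can be checked with a few lines of code. This sketch is not part of the original notes; the helper names are illustrative, and the quadratic formula used here assumes the roots are real, which holds for this matrix.

```python
import math

def eigenvalues_2x2(a, b, c, d):
    """Roots of det([[a - s, b], [c, d - s]]) = s^2 - (a + d) s + (a d - b c) = 0.

    Assumes the roots are real.
    """
    trace, det = a + d, a * d - b * c
    root = math.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

def apply(matrix, v):
    """Matrix-vector product for a 2 x 2 matrix."""
    return [matrix[0][0] * v[0] + matrix[0][1] * v[1],
            matrix[1][0] * v[0] + matrix[1][1] * v[1]]

A = [[-5, 2], [6, -6]]
print(eigenvalues_2x2(-5, 2, 6, -6))   # the eigenvalues are the values of -omega^2

# Check A a = -omega^2 a for each eigenvector found above.
for omega_squared, a in ((2, [2, 3]), (9, [1, -2])):
    print(apply(A, a), [-omega_squared * x for x in a])
```

Each printed pair of vectors should agree, confirming that $(2, 3)^T$ belongs to $\omega^2 = 2$ and $(1, -2)^T$ belongs to $\omega^2 = 9$.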
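We now look at the general eigenvalue problem. This is written as $A \underset{\sim}{x} = \lambda \underset{\sim}{x}$. If we rewrite this as $A \underset{\sim}{x} = \lambda I \underset{\sim}{x}$, where $I$ is the identity matrix, then
\[ (A - \lambda I) \underset{\sim}{x} = \underset{\sim}{0}. \]
For non-zero solutions of this equation, we need
\[ \det(A - \lambda I) = 0. \]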
Example: If
\[ A = \begin{pmatrix} 2 & 4 & -2 \\ 4 & 2 & 2 \\ -2 & 2 & 5 \end{pmatrix}, \]
then to determine the eigenvalues, we require:
\[ \det \begin{pmatrix} 2 - \lambda & 4 & -2 \\ 4 & 2 - \lambda & 2 \\ -2 & 2 & 5 - \lambda \end{pmatrix} = 0. \]
This will be a cubic equation in $\lambda$ and will have three solutions. Each solution, $\lambda_i$, will give rise to an eigenvector, $\underset{\sim}{v}_i$.
In general, the equation $\det(A - \lambda I) = 0$ is an $n$th degree polynomial equation, called the characteristic equation of $A$. Typically, there will be $n$ solutions, and each solution will give an eigenvector. These eigenvalues and eigenvectors are characteristic properties of the matrix, $A$. It sometimes happens that some of the eigenvalues are the same. Sometimes, if two eigenvalues are the same, this eigenvalue will have two eigenvectors associated with it. In this case, there are still $n$ different eigenvectors. This doesn't always happen though, and sometimes, if two eigenvalues are the same, there will not be $n$ eigenvectors.
There are three important theorems here.
Theorem 1: If $A$ is real and symmetric, then the eigenvectors of distinct eigenvalues of $A$ will be orthogonal. (This means that their inner product is zero.)
Theorem 2: If $A$ is a real and symmetric $n \times n$ matrix, the eigenvalues will all be real and there will be $n$ mutually orthogonal eigenvectors.
Theorem 3: The trace of a matrix $A$, denoted by $\operatorname{tr}(A)$, equals the sum of all eigenvalues, i.e.
\[ \operatorname{tr}(A) = \sum_{i=1}^{n} \lambda_i \]
These results are important, as many of the matrices that arise in practice are real and symmetric. In particular, the matrix in the example we started is real and symmetric, so these theorems will apply.
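The eigenvalue equation for this example is
\[ \det \begin{pmatrix} 2 - \lambda & 4 & -2 \\ 4 & 2 - \lambda & 2 \\ -2 & 2 & 5 - \lambda \end{pmatrix} = 0. \]
\[
\begin{aligned}
&\therefore \; (2 - \lambda) \begin{vmatrix} 2 - \lambda & 2 \\ 2 & 5 - \lambda \end{vmatrix} - 4 \begin{vmatrix} 4 & 2 \\ -2 & 5 - \lambda \end{vmatrix} + (-2) \begin{vmatrix} 4 & 2 - \lambda \\ -2 & 2 \end{vmatrix} = 0. \\
&\therefore \; (2 - \lambda)(\lambda^2 - 7\lambda + 6) - 4(-4\lambda + 24) - 2(-2\lambda + 12) = 0. \\
&\therefore \; (2 - \lambda)(\lambda - 1)(\lambda - 6) + 20(\lambda - 6) = 0. \\
&\therefore \; (\lambda - 6) \bigl[ (2 - \lambda)(\lambda - 1) + 20 \bigr] = 0. \\
&\therefore \; (\lambda - 6) \bigl[ -\lambda^2 + 3\lambda + 18 \bigr] = 0. \\
&\therefore \; (\lambda - 6)(\lambda - 6)(\lambda + 3) = 0. \\
&\therefore \; (\lambda - 6)^2 (\lambda + 3) = 0.
\end{aligned}
\]
The eigenvalues are $\lambda = 6$ and $\lambda = -3$. Note that $\lambda = 6$ is an eigenvalue of order 2. However, since $A$ is real and symmetric, we expect to get two eigenvectors associated with this eigenvalue.
If $\lambda = -3$ and $\underset{\sim}{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$, the equation $(A - \lambda I) \underset{\sim}{x} = \underset{\sim}{0}$ becomes
\[ \begin{pmatrix} 5 & 4 & -2 \\ 4 & 5 & 2 \\ -2 & 2 & 8 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \]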
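The row reduced, augmented matrix is
\[ \left( \begin{array}{ccc|c} 5 & 4 & -2 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right), \]
so
\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 2z \\ -2z \\ z \end{pmatrix} = z \begin{pmatrix} 2 \\ -2 \\ 1 \end{pmatrix}. \]
If $\lambda = 6$,
\[ \begin{pmatrix} -4 & 4 & -2 \\ 4 & -4 & 2 \\ -2 & 2 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \]
The row reduced, augmented matrix is
\[ \left( \begin{array}{ccc|c} 2 & -2 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right). \]
This means that $2x - 2y + z = 0$, or $x = y - \frac{1}{2} z$. Therefore,
\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} y - \frac{1}{2} z \\ y \\ z \end{pmatrix} = y \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + z \begin{pmatrix} -\frac{1}{2} \\ 0 \\ 1 \end{pmatrix}. \]
Therefore, the eigenvalues and corresponding eigenvectors are
\[ \lambda = -3: \; \begin{pmatrix} 2 \\ -2 \\ 1 \end{pmatrix}; \qquad \lambda = 6: \; \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \; \begin{pmatrix} -\frac{1}{2} \\ 0 \\ 1 \end{pmatrix}. \]
Note that there are two eigenvectors for $\lambda = 6$. Also each of these is orthogonal to the eigenvector for $\lambda = -3$. Any linear combination of the eigenvectors for $\lambda = 6$ will also be an eigenvector with eigenvalue 6. This means that we can replace $\begin{pmatrix} -\frac{1}{2} \\ 0 \\ 1 \end{pmatrix}$ with
\[ 2 \begin{pmatrix} -\frac{1}{2} \\ 0 \\ 1 \end{pmatrix} + \frac{1}{2} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} -\frac{1}{2} \\ \frac{1}{2} \\ 2 \end{pmatrix} \]
to get a complete set of orthogonal eigenvectors.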
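These results can be verified by direct multiplication. The sketch below is not part of the original notes; the helper names `apply` and `inner` are illustrative. It checks $A \underset{\sim}{v} = \lambda \underset{\sim}{v}$ for each eigenvector, the mutual orthogonality of the final set, and the trace result of Theorem 3.

```python
def apply(matrix, v):
    """Matrix-vector product with the matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]

def inner(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

A = [[2, 4, -2], [4, 2, 2], [-2, 2, 5]]
pairs = [(-3, [2, -2, 1]), (6, [1, 1, 0]), (6, [-0.5, 0.5, 2])]

# Each line shows A v next to lambda v; the two should agree.
for eigenvalue, v in pairs:
    print(apply(A, v), [eigenvalue * x for x in v])

# Inner products of every pair of eigenvectors (all zero if orthogonal).
vectors = [v for _, v in pairs]
print([inner(vectors[i], vectors[j]) for i in range(3) for j in range(i + 1, 3)])

# Trace of A next to the sum of the eigenvalues.
print(sum(A[i][i] for i in range(3)), sum(e for e, _ in pairs))
```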
6.3 EXERCISES - Eigenvalues and eigenvectors
1. Find the general solution to the following sets of equations.
(i)
\[
\begin{aligned}
2x + 3y &= 0 \\
4x + 6y &= 0.
\end{aligned}
\]
(ii)
\[
\begin{aligned}
x + 2y - z &= 0 \\
x + 3y &= 0 \\
x + y - 2z &= 0.
\end{aligned}
\]
2. Find the eigenvalues and corresponding eigenvectors for the following matrices.
(i) $A = \begin{pmatrix} 6 & 3 \\ 2 & 7 \end{pmatrix}$. (ii) $A = \begin{pmatrix} 4 & 0 & 1 \\ -2 & 1 & 0 \\ -2 & 0 & 1 \end{pmatrix}$.
3. (a) Find the eigenvalues and eigenvectors of the matrix
\[ A = \begin{pmatrix} 1 & 1 & 2 \\ 1 & 2 & 1 \\ 2 & 1 & 1 \end{pmatrix}. \]
(b) Show that the eigenvectors are mutually orthogonal.
4. The two coupled differential equations
\[
\begin{aligned}
y_1' &= -2y_1 + 2y_2 \\
y_2' &= 2y_1 + y_2
\end{aligned}
\]
can be written as $\underset{\sim}{y}' = A \underset{\sim}{y}$ where $A = \begin{pmatrix} -2 & 2 \\ 2 & 1 \end{pmatrix}$.
(a) Show that $\underset{\sim}{y} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} e^{\lambda x}$ is a solution of this equation if $\lambda$ is an eigenvalue of $A$ and $\underset{\sim}{a}$ is an eigenvector.
(b) Find the eigenvalues and eigenvectors of $A$ and hence write down the general solution of the differential equations. (The general solution will have two arbitrary constants.)
(c) Also find the solution that satisfies $y_1(0) = 1$, $y_2(0) = 0$.
(d) Show that the equations can also be solved by using the second equation to write $y_1$ in terms of $y_2$ and $y_2'$ and substituting this into the first equation. The resulting equation is a second order differential equation for $y_2$. Solve this equation for $y_2$ and then find the solution for $y_1$.
5. (a) If $\lambda$ is an eigenvalue of a matrix $A$ and $\underset{\sim}{x}$ is the corresponding eigenvector, show that $\lambda^2$ is an eigenvalue of $A^2$ and $\underset{\sim}{x}$ is the corresponding eigenvector.
(b) If the eigenvalues of a matrix $A$ are $\lambda_1, \ldots, \lambda_n$ (all non-zero), show that the eigenvalues of $A^{-1}$ are $\lambda_1^{-1}, \ldots, \lambda_n^{-1}$.
(c) If the eigenvalues of a matrix $A$ are $\lambda_1, \ldots, \lambda_n$, what are the eigenvalues of the matrix $A - kI$?
6. (a) Find the eigenvalues and eigenvectors of the matrix
\[ A = \begin{pmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{pmatrix}. \]
(b) Show that the eigenvalues of any upper triangular matrix are just the diagonal elements of the matrix.
6.4 SELECTED ANSWERS - Eigenvalues and eigenvectors
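1. (i) $\begin{pmatrix} x \\ y \end{pmatrix} = \lambda \begin{pmatrix} 3 \\ -2 \end{pmatrix}$, (ii) $\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \lambda \begin{pmatrix} 3 \\ -1 \\ 1 \end{pmatrix}$.
2. (i) $\lambda_1 = 9$, $\underset{\sim}{v}_1 = (1, 1)^T$, $\lambda_2 = 4$, $\underset{\sim}{v}_2 = (-3/2, 1)^T$;
(ii) $\lambda_1 = 1$, $\underset{\sim}{v}_1 = (0, 1, 0)^T$, $\lambda_2 = 2$, $\underset{\sim}{v}_2 = (-1/2, 1, 1)^T$, $\lambda_3 = 3$, $\underset{\sim}{v}_3 = (-1, 1, 1)^T$.
3. (a) $\lambda_1 = 1$, $\underset{\sim}{v}_1 = (1, -2, 1)^T$, $\lambda_2 = 4$, $\underset{\sim}{v}_2 = (1, 1, 1)^T$, $\lambda_3 = -1$, $\underset{\sim}{v}_3 = (-1, 0, 1)^T$.
4. (a) See lecture notes.
(b) $\lambda_1 = -3$, $\underset{\sim}{v}_1 = (-2, 1)^T$, $\lambda_2 = 2$, $\underset{\sim}{v}_2 = (1, 2)^T$,
$y_1 = -2Ce^{-3x} + De^{2x}$, $y_2 = Ce^{-3x} + 2De^{2x}$;
(c) $y_1 = \frac{4}{5} e^{-3x} + \frac{1}{5} e^{2x}$, $y_2 = -\frac{2}{5} e^{-3x} + \frac{2}{5} e^{2x}$;
(d) Equation is $y_2'' + y_2' - 6y_2 = 0$.
6. $\lambda = a, d, f$.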
7 Partial Differential Equations
7.1 Types of Partial Differential Equation
We know that if $y(x)$ is a function of one variable, the derivatives will be ordinary derivatives, $\dfrac{dy}{dx}$. Such a function may satisfy an ordinary differential equation such as $y'' + ay' + by = 0$. Usually the differential equation will be of the form $g(x, y', y'', \ldots) = 0$. If $z = f(x, y)$ is a function of more than one variable, the derivatives will be partial derivatives, $\partial f/\partial x$ and $\partial f/\partial y$. For such functions, any differential equations will involve partial derivatives. e.g.
\[ g\left( x, y, f, \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial^2 f}{\partial x^2}, \frac{\partial^2 f}{\partial x \partial y}, \frac{\partial^2 f}{\partial y^2}, \ldots \right) = 0. \]
Such equations are partial differential equations. For functions of three variables, they can be more complicated.
Some differential equations of this type are simple enough and we can handle them using familiar methods. (They are effectively ordinary differential equations.) Others are more difficult and need 'new' methods. Some of these equations are important in Science and Engineering. For example, the Heat equation describes the flow of heat through a solid.
An example of a simple partial differential equation is the following. If $u = u(x, y)$ and
\[ \frac{\partial u}{\partial x} = y \sin x + e^y, \]
we can treat $y$ as a (constant) parameter and integrate to get
\[ u = -y \cos x + x e^y + g(y). \]
Here the arbitrary constant can actually be an arbitrary function of $y$. If the equation involves derivatives with respect to both variables, this simple type of solution is not possible.
We will look at some second order equations of this type. There are two particular equations we will look at, namely
i. Heat Equation
\[ \frac{\partial u}{\partial t} = \nu \frac{\partial^2 u}{\partial x^2}, \quad \text{where $\nu$ is a constant.} \]
The Heat Equation describes the temperature distribution in a thin rod. The 3D equation:
\[ \frac{\partial u}{\partial t} = \nu \left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} \right) \]
describes the distribution in 3D objects. The equation is sometimes also called the diffusion equation.
ii. Wave Equation
\[ \frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2}. \]
This describes the movement or vibrations (in a string) or propagation of signals (in a coaxial cable). In 3D the equation,
\[ \frac{\partial^2 u}{\partial t^2} = c^2 \left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} \right) \]
describes electro-magnetic waves. It can also describe water waves.
[Figure: the displacement $u$ of a string plotted against $x$, from $x = 0$ to $x = l$.]
These equations are linear, homogeneous, and second order. They have constant coefficients. This means we can add two solutions to get another solution.
7.2 Separation of Variables
This is the fundamental method for solving such equations. It depends on being able to add
solutions to get another solution. This is called the principle of superposition.
7.2.1 The Heat Equation
Like all differential equations, the Heat Equation, $\dfrac{\partial u}{\partial t} = \nu \dfrac{\partial^2 u}{\partial x^2}$, needs extra information to determine a solution uniquely. For example, we would expect to know the initial temperature distribution as well as the temperature on the boundaries. We might have
\[
\begin{aligned}
u(x, 0) &= 1 && \text{initial condition} \\
u(0, t) = u(l, t) &= 0 && \text{boundary conditions.}
\end{aligned}
\]
The resulting temperature (graphs of the solution at different times) might look like:
[Figure: the temperature $u$ plotted against $x$ on $[0, l]$ at times $t = 0.05$, $t = 0.1$, $t = 0.3$ and $t = 1$; the vertical axis is marked at 1.]
Before solving the equation, we need to establish the 'principle of superposition'. If $u_1$ and $u_2$ both satisfy the differential equation, then
\[
\begin{aligned}
\frac{\partial u_1}{\partial t} &= \nu \frac{\partial^2 u_1}{\partial x^2}, \\
\frac{\partial u_2}{\partial t} &= \nu \frac{\partial^2 u_2}{\partial x^2}.
\end{aligned}
\]
We can add these two equations to get
\[ \frac{\partial (u_1 + u_2)}{\partial t} = \nu \frac{\partial^2 (u_1 + u_2)}{\partial x^2}. \]
That is, $u_1 + u_2$ is also a solution. Therefore, we can look for a set of solutions of a certain type and hope to be able to add multiples of them in such a way that the boundary and initial conditions will be satisfied.
We assume the solution is of the form:
\[ u(x, t) = X(x) T(t). \]
This 'separates' the function of two variables, $u$, into two functions of one variable.
From the assumption above, we have $\dfrac{\partial u}{\partial t} = X \dot{T}$ and $\dfrac{\partial^2 u}{\partial x^2} = X'' T$.
We substitute into the DE to get:
\[ X \dot{T} = \nu X'' T. \]
\[ \therefore \; \frac{1}{\nu} \frac{\dot{T}}{T} = \frac{X''}{X}. \]
Now the LHS of this last equation depends on $t$ only, and the RHS depends on $x$ only. The only way that this can happen is if both sides are constant. Therefore we have
\[ \frac{1}{\nu} \frac{\dot{T}}{T} = \frac{X''}{X} = k, \]
where $k$ is called the constant of separation. These equations can now be written separately as
\[ \dot{T} - \nu k T = 0 \quad \text{and} \quad X'' - kX = 0. \]
Note that the differential equation has 'separated' into two ordinary differential equations. Both of these equations can be solved, but the nature of the solutions of the second depends on the value of $k$. Also, the boundary conditions of the partial differential equation suggest that we require $X(0) = X(l) = 0$ as conditions on $X$. This actually restricts the possible values of $k$. There are three possible cases:
i. $k > 0$.
In this case, let $k = q^2$. Then the solution for $X$ is $X(x) = Ae^{qx} + Be^{-qx}$. We now look at the boundary conditions.
\[
\begin{aligned}
X(0) = 0 &\Rightarrow A + B = 0 \\
X(l) = 0 &\Rightarrow Ae^{ql} + Be^{-ql} = 0.
\end{aligned}
\]
These equations imply that $A = B = 0$. This implies a trivial solution and therefore there is no non-zero solution of this type.
ii. $k = 0$.
In this case, $X(x) = Ax + B$ and again the boundary conditions imply that $A = B = 0$. This also implies a trivial solution.
iii. $k < 0$.
In this final case, let $k = -p^2$. The solution is then $X(x) = A \cos px + B \sin px$. Then
\[
\begin{aligned}
X(0) = 0 &\Rightarrow A = 0 \\
X(l) = 0 &\Rightarrow B \sin pl = 0.
\end{aligned}
\]
Now $B$ can't be zero (or all solutions would be zero/trivial) so $\sin pl = 0$. This means that $pl = n\pi$ for some integer $n$. Therefore, $p = n\pi/l$ and $k = -\dfrac{n^2\pi^2}{l^2}$. Therefore
\[ X_n(x) = B_n \sin \frac{n\pi x}{l}. \]
This gives a range of possible solutions for $X$. Note that the solutions can be labelled by the subscript $n$.
Now the value of $k$ can be used to find the corresponding value of $T_n$. We have
\[ \dot{T}_n + \frac{\nu n^2 \pi^2}{l^2} T_n = 0 \]
or
\[ \dot{T}_n + \lambda_n^2 T_n = 0 \quad \text{where} \quad \lambda_n^2 = \frac{\nu n^2 \pi^2}{l^2} \]
so
\[ T_n(t) = A_n e^{-\lambda_n^2 t} \]
and
\[ u_n(x, t) = X_n(x) T_n(t) = B_n \sin \frac{n\pi x}{l} \, e^{-\lambda_n^2 t}. \]
(The constant $A_n$ can be absorbed into $B_n$.) Now each such function $u_n(x, t)$ satisfies the differential equation and each one also satisfies the boundary conditions. However, the initial condition, $u(x, 0) = 1$, is not satisfied. It may be though, that we can take a sum of these solutions in such a way that this initial condition is satisfied. That is, we write
\[ u(x, t) = \sum_n u_n(x, t) = \sum_{n=1}^{\infty} B_n \sin \frac{n\pi x}{l} \, e^{-\lambda_n^2 t} \]
This initial condition then becomes
\[ u(x, 0) = 1 = \sum_{n=1}^{\infty} B_n \sin \frac{n\pi x}{l}. \]
We need to choose the coefficients $B_n$ so that this equation is satisfied. But this equation implies that the $B_n$ are the coefficients of the $\frac{1}{2}$-range Fourier sine series for the function $f(x) = 1$. This means that
\[ B_n = \frac{2}{l} \int_0^l \sin \frac{n\pi x}{l} \, dx = \begin{cases} \dfrac{4}{n\pi}, & (n \text{ odd}) \\ 0, & (n \text{ even}). \end{cases} \]
As the summation in the solution $u(x, t)$ includes all terms for $n = 1, 2, 3, \ldots$, but $B_n = 0$ for even values of $n$, we make a substitution in order to use only the odd values.
Let $n = 2m - 1$, so that when $m = 1, 2, 3, \ldots$, $n = 1, 3, 5, \ldots$
Therefore,
\[ u(x, t) = \frac{4}{\pi} \sum_{m=1}^{\infty} \frac{1}{2m - 1} \sin \frac{(2m - 1)\pi x}{l} \, e^{-(\nu (2m - 1)^2 \pi^2 / l^2) t}. \]
Note that as $t \to \infty$, $u \to 0$. For large values of $t$, the temperature is roughly proportional to $\sin \dfrac{\pi x}{l}$, as this part of the solution decays least rapidly.
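The series solution is easy to evaluate numerically. The following is a minimal sketch, not part of the original notes: the function name `heat_solution`, the default values $l = 1$ and $\nu = 1$, and the number of terms are all assumptions made for illustration.

```python
import math

def heat_solution(x, t, l=1.0, nu=1.0, terms=200):
    """Partial sum of the series for u(x, t) with u(x, 0) = 1 and u(0, t) = u(l, t) = 0."""
    total = 0.0
    for m in range(1, terms + 1):
        n = 2 * m - 1
        decay = math.exp(-nu * (n * math.pi / l) ** 2 * t)
        total += math.sin(n * math.pi * x / l) * decay / n
    return 4 / math.pi * total

# Compare the full sum at the midpoint x = l/2 with its first term alone.
for t in (0.05, 0.1, 0.3, 1.0):
    leading = 4 / math.pi * math.exp(-math.pi ** 2 * t)
    print(t, heat_solution(0.5, t), leading)
```

As $t$ grows, the full sum and the first term alone become indistinguishable, which illustrates why the temperature is eventually proportional to $\sin(\pi x / l)$.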
Another type of boundary condition that is common for this type of problem is that of an
insulated end.
This means that no heat flows across the end.
[Figure: an insulated end of the bar (shown hatched), and a sketch of temperature T against x with the direction of heat flow marked.]
If there is a temperature gradient at this end, then the heat must flow from hot to cold
there. (The heat flow is proportional to the gradient of the temperature.) Therefore,
no heat flow means no temperature gradient.
Therefore the boundary condition is
\[ \frac{\partial u}{\partial x}(l,t) = u_x(l,t) = 0. \]
Such conditions can be treated similarly to the other type of boundary condition.
7.2.2 The Wave Equation
The wave equation is
\[ \frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2}. \]
We consider the auxiliary conditions we would need for the case of a vibrating string.
If the string is fixed at both ends, we expect
\[ u(0,t) = u(l,t) = 0. \]
Also, the initial position and initial velocity can be specified, i.e.
\[ u(x,0) = f(x), \qquad u_t(x,0) = g(x). \]
[Figure: displacement u of the string plotted against x, with the ends fixed at x = 0 and x = l.]
Again, we can superimpose solutions, so we look for solutions of the type u(x, t) = X(x)T (t).
If we substitute this into the differential equation, we get $X(x)\ddot{T}(t) = c^2 X''(x) T(t)$. Therefore,
\[ \frac{\ddot{T}}{c^2 T} = \frac{X''}{X}. \]
The left side of this equation is a function of t only and the right side is a function of x only, so once again, they both must be equal to a constant k. We get the equations
\[ X'' - kX = 0 \quad \text{and} \quad \ddot{T} - c^2 k T = 0. \]
Once again, the boundary conditions imply X(0) = X(l) = 0, so we can follow the same
procedure as for the Heat Equation (k > 0, k = 0 =⇒ trivial solutions) to get
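\[ k = -\frac{n^2\pi^2}{l^2} \quad \text{and} \quad X_n(x) = B_n \sin\frac{n\pi x}{l}. \]
The equation for $T_n$ is
\[ \ddot{T}_n + \lambda_n^2 T_n = 0, \qquad \lambda_n = \frac{cn\pi}{l}, \]
so
\[ T_n = A_n \cos\lambda_n t + B_n \sin\lambda_n t \]
and
\[ u_n(x,t) = \sin\frac{n\pi x}{l}\,\bigl(A_n \cos\lambda_n t + B_n \sin\lambda_n t\bigr). \]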
The arbitrary constant Bn in Xn (x) has been absorbed into An and Bn .
These functions will satisfy the differential equation and the boundary conditions. We still
have to satisfy the initial conditions. In this case, there are two of them. We write
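\[ u(x,t) = \sum_n u_n(x,t) = \sum_{n=1}^{\infty} \sin\frac{n\pi x}{l}\,\bigl(A_n \cos\lambda_n t + B_n \sin\lambda_n t\bigr). \]
It follows that
\[ u(x,0) = f(x) = \sum_{n=1}^{\infty} A_n \sin\frac{n\pi x}{l}. \]
Hence, the An are the coefficients of the half-range Fourier sine series of f (x). Also,
\[ u_t(x,t) = \sum_{n=1}^{\infty} \sin\frac{n\pi x}{l}\,\bigl(-A_n \lambda_n \sin\lambda_n t + B_n \lambda_n \cos\lambda_n t\bigr) \]
\[ \therefore \quad u_t(x,0) = g(x) = \sum_{n=1}^{\infty} B_n \lambda_n \sin\frac{n\pi x}{l}. \]
Therefore, Bn λn are the coefficients of the half-range Fourier sine series for g(x).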
If we are given functions f (x) and g(x), we can calculate An and Bn using the half range
Fourier series integrals.
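The calculation of An and Bn can be sketched numerically as follows. This is only an illustration: the function names `sine_coefficient` and `wave_coefficients`, the use of Simpson's rule, and the sample values l = 2 and c = 1 are assumptions made for the example. The initial shape used is the one from Exercise 5 of Section 7.3.

```python
import math

def sine_coefficient(func, n, l, steps=2000):
    """Approximate (2/l) * integral_0^l func(x) sin(n pi x / l) dx
    using the composite Simpson rule (steps must be even)."""
    h = l / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * func(x) * math.sin(n * math.pi * x / l)
    return (2.0 / l) * total * h / 3.0

def wave_coefficients(f, g, n, l, c):
    """Return (A_n, B_n) for initial position f and initial velocity g."""
    lam = c * n * math.pi / l                 # lambda_n = c n pi / l
    a_n = sine_coefficient(f, n, l)           # A_n is the sine coefficient of f
    b_n = sine_coefficient(g, n, l) / lam     # B_n * lambda_n is the sine coefficient of g
    return a_n, b_n

# Sample data (assumed values): l = 2, c = 1, string released from rest.
l, c = 2.0, 1.0
f = lambda x: 2 * math.sin(math.pi * x / l) - math.sin(2 * math.pi * x / l)
g = lambda x: 0.0

coefficients = [wave_coefficients(f, g, n, l, c) for n in range(1, 4)]
print(coefficients)
```

Because this f is already a combination of two sine modes, the computed values should be close to A1 = 2, A2 = −1, with all other coefficients close to zero.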
Note that each term in the series for u(x, t), i.e.
\[ u_n(x,t) = \sin\frac{n\pi x}{l}\,\bigl(A_n \cos\lambda_n t + B_n \sin\lambda_n t\bigr), \]
is a harmonic of the motion.
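E.g., if n = 1, then for fixed t, the shape of the string is a sine curve. This will oscillate as t increases.
[Figure: the first harmonic $u_1$ plotted against x on 0 ≤ x ≤ l.]
If n = 2, we get the next harmonic.
[Figure: the second harmonic $u_2$ plotted against x on 0 ≤ x ≤ l.]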
As n increases, the frequency of the nth harmonic increases. Thus, we see that the vibration
can be composed as a linear combination of harmonics.
Note: Depending on the boundary and initial conditions of the PDE, during the separation
of variables method, it is possible to have non-trivial solutions for two different values of the
separation constant k. i.e. non-trivial solutions for k < 0 and k = 0.
Recall that if two different solutions to one differential equation are determined, the sum of
the two solutions will also be a solution.
7.3 EXERCISES - Partial Differential Equations
In questions 1–2, u is a function of x and y.
1. Find the solution of each of the following partial differential equations.
(a) $u_{xx} = 0$.
(b) $u_{xyy} = \cos 2x$.
(c) $u_x = 0$, $u_y = 0$.
(d) $u_{xx} = 0$, $u_{xy} = 0$, $u_{yy} = 0$.
2. Use ‘separation of variables’ to find (some) solutions of the following partial differential
equations.
(a) $u_x + u_y = 0$.
(b) $x u_x - y u_y = 0$.
(c) $u_x + u_y = 2u$.
3. If f and g are arbitrary functions of one variable, show that the following expressions
satisfy the corresponding equation from question 2.
(a) $f(x - y)$, (b) $f(xy)$, (c) $f(x - y)e^{x+y}$.
4. An iron bar of length l has initial temperature u(x, 0) = T0 x/l where T0 is a constant.
For t > 0, the temperature of the end points is fixed at 0. Find an expression for the
temperature in the bar for t > 0.
5. A string is held fixed at end points x = 0 and x = l. Initially, it is extended so that
its shape is $u(x,0) = 2\sin\frac{\pi x}{l} - \sin\frac{2\pi x}{l}$ and its initial velocity is $u_t(x,0) = 0$. If its
subsequent motion satisfies the wave equation, $u_{tt} = c^2 u_{xx}$, use separation of variables
to find this motion.
6. Find the temperature, u(x, t), in a bar of length, l, that is perfectly insulated at the ends
x = 0 and x = l. Assume that the temperature satisfies ut = νuxx and u(x, 0) = f (x).
In the case of insulated ends, the boundary conditions are
\[ \frac{\partial u}{\partial x}(0,t) = \frac{\partial u}{\partial x}(l,t) = 0. \]
Show that the method of separation of variables gives the solution in the form
\[ u(x,t) = A_0 + \sum_{n=1}^{\infty} A_n \cos\frac{n\pi x}{l}\, e^{-\nu(n\pi/l)^2 t}. \]
What happens as t → ∞? Is this what you would expect?
7.4 SELECTED ANSWERS - Partial Differential Equations
1. (a) $u(x,y) = x f(y) + g(y)$, (b) $u(x,y) = \frac{1}{4} y^2 \sin 2x + y\,p(x) + q(x) + r(y)$,
(c) $u(x,y) = c$ (a constant), (d) $u(x,y) = Ax + By + C$.
2. (a) $u(x,y) = Ce^{k(x-y)}$, (b) $u(x,y) = C(xy)^k$, (c) $u(x,y) = Ce^{2y}e^{k(x-y)}$.
4. \[ u(x,t) = \frac{2T_0}{\pi}\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} \sin\frac{n\pi x}{l}\, e^{-\nu(n\pi/l)^2 t}. \]
5. u(x, t) = 2 sin(πx/l) cos(cπt/l) − sin(2πx/l) cos(2cπt/l).
8 Probability and Statistics
8.1 Probability Fundamentals
An experiment is the process of obtaining an observation.
An event is the outcome of an experiment.
Example: Toss a six-sided die
Some events:
• E1 : Observe a 1.
• A: Observe an odd number.
• B: Observe an even number.
Events A and B can be regarded as combinations of events E1 to E6 .
An event that cannot be regarded as a combination of other events is known as a simple event.
In the example above: Events E1 to E6 are simple events. Events A and B are not.
The Sample Space (S) is the list (or set) of all the simple events possible from an experiment.
For the example of tossing a six-sided die, the sample space is: S = {E1 , E2 , . . . , E6 }
Example: Tossing two coins. The sample space includes four simple events:
S = {E1 (H H), E2 (H T ), E3 (T H), E4 (T T )}
Note that E2 and E3 are different events as the order (or the outcome of each coin) is specific.
Venn Diagram
A Venn Diagram shows the outcomes (or events) of an experiment as a portion of the sample
space (S)
An example of a Venn diagram displaying three events A, B and C, each made up of different
outcomes is shown below:
Probability of an Event
The probability of an event is a measurement of the likelihood that an event will occur in
the next experiment.
One way to estimate this is to repeat the experiment a large number of times (N ) and record
the number of times (n) that event A occurs.
The probability of an event is estimated by:
\[ P(A) = \frac{n}{N} \]
Theoretically we would need an infinite number of trials for the probability to be exact.
Practically, we would expect, for a fair die, that the numbers 1 to 6 would occur with equal
frequency and hence:
\[ P(E_1) = P(E_2) = P(E_3) = P(E_4) = P(E_5) = P(E_6) = \frac{1}{6} \]
Mutually Exclusive Events
Two events are mutually exclusive if, when one event occurs, the other cannot.
Simple events are, by definition, mutually exclusive.
For simple events: $0 \le P(E_i) \le 1$, for all simple events $E_i$, and
\[ \sum_i P(E_i) = 1 \]
The probability of an event A is equal to the sum of the probabilities of all the simple events
contained in A.
For the single die tossing experiment:
\[ P(A) = P(E_1) + P(E_3) + P(E_5) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2} \]
Compound Events are a combination of simple events.
They can be formed in one of two ways or as a combination of the two.
Intersection: The intersection of two events A and B is the event that both A and B occur.
It is denoted as AB or A ∩ B
Union: The union of two events A and B is the event that A or B or both occur. It is
denoted as A ∪ B, and the probability P (A ∪ B) can be visualised with the Venn diagram:
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
For mutually exclusive events A and B: P (A ∩ B) = 0
Therefore: P (A ∪ B) = P (A) + P (B)
Complement: The complement of an event A consists of all of the simple events that are
not in event A.
Denoted as Ā, we can see that: P (A) = 1 − P (Ā)
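The union and complement rules can be checked by direct counting for the single die experiment. In this sketch the events are represented as Python sets; the helper `prob` and the extra event `at_most_two` are hypothetical names introduced only for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # simple events of a fair six-sided die

def prob(event):
    """Probability of an event when all simple events are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

odd = {1, 3, 5}          # event A in the notes
even = {2, 4, 6}         # event B in the notes
at_most_two = {1, 2}     # an extra event, assumed for this example

# Union rule: P(A or C) = P(A) + P(C) - P(A and C)
union_direct = prob(odd | at_most_two)
union_by_rule = prob(odd) + prob(at_most_two) - prob(odd & at_most_two)

# Mutually exclusive events have an empty intersection.
p_odd_and_even = prob(odd & even)

# Complement rule: P(A) = 1 - P(not A)
p_not_odd = prob(sample_space - odd)

print(union_direct, union_by_rule, p_odd_and_even, p_not_odd)
```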
8.2 Conditional Probability and Independence
Two events may be related such that the probability of one event occurring depends on
whether or not the other event has occurred.
For instance – the probability of rain is dependent on whether or not the day is cloudy.
The conditional probability of A given that B has occurred is written as: P (A |B)
The conditional probability of A given that B has occurred is:
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad \text{or} \quad P(A \cap B) = P(A \mid B)\,P(B) \]
Similarly, the conditional probability of B given A is:
\[ P(B \mid A) = \frac{P(A \cap B)}{P(A)} \quad \text{or} \quad P(A \cap B) = P(B \mid A)\,P(A) \]
Example: Consider an experiment that randomly chooses a single object from a total of
1100 objects. 750 objects are coloured red, and 350 are coloured blue. Of the red objects,
600 are made of glass and 150 made of metal. Of the blue objects, 50 are glass and 300
metal. Given that a red object is chosen, calculate the probability of the object being made
of glass:
The above information is summarised in the table:
            Red (R)   Blue (B)   Total
Glass (G)     600        50       650
Metal (M)     150       300       450
Total         750       350      1100
Using the table, given that a red object is chosen (2nd column), the probability of the object
being glass is:
\[ P(G \mid R) = \frac{600}{750} = \frac{4}{5} = 0.8 \]
Alternatively, using: $P(G \mid R) = \dfrac{P(R \cap G)}{P(R)}$
Probability that the object is red: $P(R) = \dfrac{750}{1100}$
Probability that the object is red and made of glass: $P(R \cap G) = \dfrac{600}{1100}$
Therefore:
\[ P(G \mid R) = \frac{P(R \cap G)}{P(R)} = \frac{(600/1100)}{(750/1100)} = \frac{600}{750} = 0.8 \]
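The same calculation can be written as a short Python sketch that works directly from the counts in the table. The dictionary layout and the helper name `prob` are assumptions made for this illustration.

```python
from fractions import Fraction

# Counts from the table above, keyed by (colour, material).
counts = {
    ("red", "glass"): 600, ("red", "metal"): 150,
    ("blue", "glass"): 50, ("blue", "metal"): 300,
}
total = sum(counts.values())

def prob(colour=None, material=None):
    """Probability that a randomly chosen object matches the given
    colour and/or material (None means 'any')."""
    matching = sum(
        n for (c, m), n in counts.items()
        if (colour is None or c == colour) and (material is None or m == material)
    )
    return Fraction(matching, total)

p_red = prob(colour="red")                              # P(R)
p_red_and_glass = prob(colour="red", material="glass")  # P(R and G)
p_glass_given_red = p_red_and_glass / p_red             # P(G | R)
print(p_glass_given_red)
```

Using exact fractions avoids rounding, so the result can be compared directly with the value 4/5 obtained above.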
8.2.1 Independence
Two events A and B are independent if:
P (A |B) = P (A) or P (B |A) = P (B)
i.e. The fact that B has occurred has no influence on the probability of A occurring and
vice versa.
Multiplicative (Product) Rule
If events A and B can occur in a particular experiment:
P (A ∩ B) = P (A)P (B |A) = P (B ∩ A) = P (B)P (A |B)
If two events A and B are independent, then:
P (A ∩ B) = P (A)P (B)
Example: In a box of 40 capacitors, 5 are defective. If 2 capacitors are chosen at random

without replacing the first, calculate the probability that both are defective.
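Let A be the event that the first capacitor is defective, and B be the event that the second
capacitor is defective. A and B are not independent: because the first capacitor is not
replaced, the probability of B depends on whether A has occurred, with $P(B \mid A) = \frac{4}{39}$.
Using the multiplicative rule:
\[ P(A \cap B) = P(A)\,P(B \mid A) = \frac{5}{40} \cdot \frac{4}{39} = \frac{1}{78} = 0.0128 \]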
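The value 1/78 can be confirmed by listing every ordered pair of capacitors that could be drawn without replacement. The labelling of the capacitors (0 to 39, with the first five taken to be the defective ones) is an arbitrary assumption for this sketch.

```python
from fractions import Fraction
from itertools import permutations

# Label the capacitors 0..39 and (arbitrarily) call the first five defective.
defective = set(range(5))

# Every ordered pair (first draw, second draw) with no repeats is equally likely.
draws = list(permutations(range(40), 2))
both_defective = [d for d in draws if d[0] in defective and d[1] in defective]

p_by_counting = Fraction(len(both_defective), len(draws))
p_by_rule = Fraction(5, 40) * Fraction(4, 39)   # P(A) * P(B | A)
print(p_by_counting, p_by_rule)
```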
Example: An electrical system consists of 4 components. The system works if components
A and B work and if C or D work. Assume that the reliability of each component is
independent. The probabilities of each component working are: P (A) = P (B) = 0.9 and
P (C) = P (D) = 0.8. Calculate the probability that the whole system works.
The system configuration is shown in the figure.
The probability that the entire system works is given by:
\[ \begin{aligned}
P[A \cap B \cap (C \cup D)] &= P(A)P(B)P(C \cup D) \\
&= P(A)P(B)[P(C) + P(D) - P(C \cap D)] \\
&= P(A)P(B)[P(C) + P(D) - P(C)P(D)] \\
&= (0.9)(0.9)[0.8 + 0.8 - 0.8^2] \\
&= 0.7776
\end{aligned} \]
Alternatively:
\[ \begin{aligned}
P[A \cap B \cap (C \cup D)] &= P(A)P(B)P(C \cup D) \\
&= P(A)P(B)[1 - P(\bar{C} \cap \bar{D})] \\
&= P(A)P(B)[1 - P(\bar{C})P(\bar{D})] \\
&= P(A)P(B)[1 - (1 - P(C))(1 - P(D))] \\
&= (0.9)(0.9)[1 - (1 - 0.8)(1 - 0.8)] \\
&= 0.7776
\end{aligned} \]
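A third way to check this result is brute force: list all 16 working/failed combinations of the four components and add up the probabilities of those in which the system works. The names `p_work` and `system_works` below are illustrative assumptions; independence of the components is assumed, as in the example.

```python
from fractions import Fraction
from itertools import product

# Probability that each component works (from the example).
p_work = {"A": Fraction(9, 10), "B": Fraction(9, 10),
          "C": Fraction(4, 5), "D": Fraction(4, 5)}

def system_works(state):
    """The system works if A and B work and at least one of C, D works."""
    return state["A"] and state["B"] and (state["C"] or state["D"])

reliability = Fraction(0)
for flags in product([True, False], repeat=4):
    state = dict(zip("ABCD", flags))
    # Independence: the probability of a state is the product over components.
    weight = Fraction(1)
    for name, ok in state.items():
        weight *= p_work[name] if ok else 1 - p_work[name]
    if system_works(state):
        reliability += weight

print(float(reliability))
```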
8.3 Permutations and Combinations
A combination is a choice of r objects from a list of n possible (or total) objects, where the
order of the objects does not matter.
A permutation is an ordered array of r distinct objects.
Example: Suppose we have 3 objects: a1 , a2 and a3 , and we choose 2 objects.
The list of all possible combinations of two objects is:
Combinations of two
a1 and a2
a1 and a3
a2 and a3
And we see there are 3 possible combinations.
The list of all possible permutations of two objects is:
Combinations of two Reordered Combinations

a1 and a2 a2 and a1
a1 and a3 a3 and a1
a2 and a3 a3 and a2
And we see there are 6 possible permutations.
In general, the number of permutations in choosing r objects from a list of n is:
\[ P_r^n = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!} \]
The number of combinations possible, in choosing r objects from a list of n is:
\[ C_r^n = \binom{n}{r} = \frac{n!}{r!\,(n-r)!} \]
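These counts can be reproduced with the Python standard library, both by listing the arrangements and by evaluating the formulas. The object labels follow the example above; `math.perm` and `math.comb` require Python 3.8 or later.

```python
import math
from itertools import combinations, permutations

objects = ["a1", "a2", "a3"]

# Enumerate the ordered and unordered choices of 2 objects from 3.
perms = list(permutations(objects, 2))
combs = list(combinations(objects, 2))
print(perms)
print(combs)

# The formulas n!/(n-r)! and n!/(r!(n-r)!) give the same counts.
n, r = len(objects), 2
n_perms = math.perm(n, r)
n_combs = math.comb(n, r)
print(n_perms, n_combs)
```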
8.4 Random Variables (X)
A random variable is a rule (or function) that assigns a real number (or value) to each
event (or outcome) in the sample space of an experiment.
A random variable may be discrete (having specific values) or continuous (any value in a
continuous range).
Discrete Random Variables
A random variable is discrete if it can only assume a countable number of values.
Discrete random variables frequently represent count data.
Categorical data can be described using random variables by arbitrarily assigning a value to
each category.
Example: Consider an experiment involving the toss of two coins. We can arbitrarily
choose a value for each outcome.
Random Variable (X) Result of coin toss

0 H and H
1 H and T
2 T and T
Continuous Random Variables
A continuous random variable can take on any value in a continuous interval.
Typically, they arise from values of measured quantities (e.g. length, temperature, weight,
etc.).
8.4.1 Probability Distributions for Discrete Random Variables
Probability Mass Function (pmf)
A probability distribution is a function f (x) that gives the probability for each possible value
of the random variable.
Example: For the two coin toss experiment:
Outcome (x) Result of coin toss Probability f (x)

0 H and H 1/4
1 H and T 1/2
2 T and T 1/4
The probability distribution is read as the probability that random variable X is equal to a
particular number x.
f (x) = P (X = x)
In the table above: $P(x = 1) = \frac{1}{2}$ and $P(1) = \frac{1}{2}$
2 2
The set of ordered pairs [x, f (x)] is known as a probability function, probability mass
function (pmf) or probability distribution.
Some important properties are: $f(x) \ge 0$ for all values of x, and $\sum_x f(x) = 1$
We can plot the probability mass function as:
For the example above: $P(x \ne 1) = 1 - P(x = 1)$
Cumulative Distribution Function (cdf)
The cumulative distribution function gives us the probability that random variable X is less
than or equal to some value x.
\[ F(x) = P(X \le x) = \sum_{t \le x} P(X = t) = \sum_{t \le x} f(t) \]
We can sketch the graph of the cumulative distribution function, F (x), by adding values
from the probability mass function, f (x).
8.4.2 Probability Distributions for Continuous Random Variables
A continuous random variable has a zero probability of being exactly equal to a particular
value. i.e. P (X = x) = 0.
We are interested in calculating the probability that a continuous random variable lies in a
certain range of values: $P(a \le X \le b)$
Note: As P (X = a) = P (X = b) = 0, then $P(a \le X \le b) = P(a < X < b)$
Probability Density Function
The probability distribution is described in terms of a probability density function (pdf).
This is the continuous analogue of the probability mass function for discrete random variables. The pdf
is defined by:
\[ P(a \le X \le b) = \int_a^b f(x)\,dx \]
Some important properties are: $f(x) \ge 0$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$
Cumulative Distribution Function (cdf)
As for discrete random variables, the cumulative distribution function evaluates the probability
that random variable X is less than or equal to some value x.
\[ F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt \]
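Example: Consider the probability density function:
\[ f(x) = \begin{cases} k\sqrt{x} & 0 \le x < 4 \\ 0 & \text{elsewhere} \end{cases} \]
a) Evaluate k and b) P (3 < X < 4)
a) To evaluate k, we use $\int_{-\infty}^{\infty} f(x)\,dx = 1$, to get:
\[ \int_0^4 k\sqrt{x}\,dx = 1 \]
\[ \left[ \frac{2}{3} k x^{3/2} \right]_0^4 = 1 \]
\[ \frac{16}{3} k = 1 \]
\[ k = \frac{3}{16} \]
b)
\[ \begin{aligned}
P(3 < X < 4) &= \int_3^4 \frac{3}{16}\sqrt{x}\,dx \\
&= \frac{3}{16} \cdot \frac{2}{3} \left[ x^{3/2} \right]_3^4 \\
&= 1 - \frac{1}{8}\sqrt{27} \\
&= 0.3505
\end{aligned} \]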
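Both parts of this example can be checked numerically. The sketch below uses a simple midpoint rule; the helper name `midpoint_integral` and the number of subintervals are assumptions chosen for illustration, and any other quadrature rule would do.

```python
import math

def midpoint_integral(func, a, b, steps=100_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

# (a) Normalisation: k * integral_0^4 sqrt(x) dx = 1, so k = 1 / integral.
k = 1.0 / midpoint_integral(math.sqrt, 0.0, 4.0)

# (b) P(3 < X < 4) = integral_3^4 k sqrt(x) dx.
prob_3_to_4 = midpoint_integral(lambda x: k * math.sqrt(x), 3.0, 4.0)

print(k, prob_3_to_4)
```

The printed values should be close to 3/16 = 0.1875 and 1 − √27/8 respectively.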
8.4.3 Expectation for Discrete Random Variables
Population Mean (µ)
The population mean is also called the expected value, E[X] of the random variable X.
Consider the previous example of tossing two fair coins.
Outcome (x) Result of coin toss Probability f (x)

0 H and H 1/4
1 H and T 1/2
2 T and T 1/4
If we repeated the experiment many times, say 1 000 000 we would expect:
Outcome (x) Number of times x is the result

0 250 000
1 500 000
2 250 000
The average value of x would be:

\[ \begin{aligned}
\frac{250000(0) + 500000(1) + 250000(2)}{1000000}
&= \frac{250000}{1000000}(0) + \frac{500000}{1000000}(1) + \frac{250000}{1000000}(2) \\
&= \frac{1}{4}(0) + \frac{1}{2}(1) + \frac{1}{4}(2) \\
&= \frac{1}{2} + \frac{1}{2} \\
&= 1
\end{aligned} \]
This can also be written as: $(0)P(X = 0) + (1)P(X = 1) + (2)P(X = 2) = \sum_x x\,P(X = x) = 1$
The expected value of X, or E[X], is given by:
\[ \mu = E[X] = \sum_x x\,P(X = x) = \sum_x x f(x) \]
Example: A lot containing 7 components is sampled by a quality inspector; the lot contains
4 good components and 3 defective components. A sample of 3 is taken by the inspector.
Calculate the expected value of the number of good components in this sample.
Let X represent the number of good components in the sample. For the probability mass
function f (x), we need the probability of there being 0, 1, 2 or 3 good components in the
sample, i.e. P (X = x), x = 0, 1, 2, 3.
The total number of combinations in choosing 3 components from 7 is: $\binom{7}{3}$
The number of combinations that have x good components chosen from the 4 possible, and
3 − x defective components from the 3 possible is: $\binom{4}{x}\binom{3}{3-x}$. Therefore the pmf is:
\[ f(x) = \frac{\binom{4}{x}\binom{3}{3-x}}{\binom{7}{3}} \]
which gives: f (0) = 1/35, f (1) = 12/35, f (2) = 18/35, f (3) = 4/35. Therefore
\[ \mu = E[X] = \frac{1}{35}(0) + \frac{12}{35}(1) + \frac{18}{35}(2) + \frac{4}{35}(3) = \frac{12}{7} \approx 1.7 \]
Therefore, if a sample of size 3 is selected at random over and over again from a lot of 4 good
components and 3 defective components, it will contain, on average, 1.7 good components.
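The probabilities and the mean in this example can be reproduced exactly with a few lines of Python. The function name `pmf` and its default arguments are assumptions for this sketch; `math.comb` evaluates the binomial coefficients.

```python
from fractions import Fraction
from math import comb

def pmf(x, good=4, defective=3, sample=3):
    """P(X = x): choose x of the good components and (sample - x) of the
    defective ones, out of all ways to choose the sample."""
    return Fraction(comb(good, x) * comb(defective, sample - x),
                    comb(good + defective, sample))

values = range(0, 4)                         # x = 0, 1, 2, 3
probabilities = [pmf(x) for x in values]
mean = sum(x * pmf(x) for x in values)       # E[X] = sum of x f(x)

print(probabilities)
print(mean, float(mean))
```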
Variance (σ 2 )
The variance σ² of a random variable X, or var[X], is defined as the mean value, or the
expected value, of the square of the deviation of X from its mean, i.e.
\[ \sigma^2 = E[(X - \mu)^2] \]
In general, the expected value of a function g(X) of a discrete random variable is given by:
\[ E[g(X)] = \sum_x g(x)\,P(X = x) \]
It follows that the variance of X is:
\[ \sigma^2 = \mathrm{var}[X] = E[(X - \mu)^2] = \sum_x (x - \mu)^2 P(X = x) \]
Expectation values satisfy the following properties:
1. E[cX] = cE[X]
2. E[X + Y ] = E[X] + E[Y ]
3. E[XY ] = E[X]E[Y ] if X and Y are independent.
We can also calculate other expectation values related to the random variable X. For
example, we can calculate: E[X 2 ], E[X 3 ], E[X 4 ]. These are called the moments of X.
Example: For the two coin toss problem considered previously (µ = 1):
Outcome (x)   Result of coin toss   Probability f(x)   (x − µ)²   f(x)(x − µ)²
0             H and H               1/4                1          1/4
1             H and T               1/2                0          0
2             T and T               1/4                1          1/4
\[ \sigma^2 = \sum_x (x - \mu)^2 P(X = x) = \frac{1}{2} \]
Note: A shortcut formula for the variance is often used:
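\[ \begin{aligned}
\sigma^2 = E[(X - \mu)^2] &= \sum_x (x - \mu)^2 f(x) \\
&= \sum_x (x^2 - 2\mu x + \mu^2) f(x) \\
&= \sum_x x^2 f(x) - 2\mu \sum_x x f(x) + \mu^2 \sum_x f(x)
\end{aligned} \]
As $\mu = \sum_x x f(x)$ and $\sum_x f(x) = 1$, then:
\[ \sigma^2 = \sum_x x^2 f(x) - \mu^2 = E[X^2] - \mu^2 \]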
Standard Deviation
The population standard deviation is the square root of the variance: $\sigma = \sqrt{\sigma^2}$
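For the two coin toss example, the definition of the variance and the shortcut formula can be compared directly. The dictionary `pmf` below simply encodes the table of f(x) values; the variable names are illustrative.

```python
from fractions import Fraction

# f(x) for the two coin toss example: x = 0, 1, 2.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mean = sum(x * p for x, p in pmf.items())                        # mu = E[X]
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())  # E[(X - mu)^2]
second_moment = sum(x ** 2 * p for x, p in pmf.items())          # E[X^2]
var_shortcut = second_moment - mean ** 2                         # E[X^2] - mu^2
std_dev = float(var_shortcut) ** 0.5                             # sigma

print(mean, var_definition, var_shortcut, std_dev)
```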
8.4.4 Expectation for Continuous Random Variables
The expectation values for a continuous random variable are similar in form to the discrete
random variable equations.
As the probability density functions are defined in terms of integrals, we get:
Mean:
\[ \mu = E[X] = \int_{-\infty}^{\infty} x f(x)\,dx \]
Variance:
\[ \sigma^2 = \mathrm{var}[X] = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = E[X^2] - \mu^2 \]
8.5 EXERCISES - Probability and Statistics
1. An experiment consists of flipping a coin and then tossing a single die if the coin shows
heads or two dice if the coin shows tails. Using the notation H4, for example, to denote
the outcome that the coin shows heads and the die comes up 4, and T 15 to denote the
outcome that the coin comes up tails and the dice show a 1 and a 5, illustrate the 27
elements of the sample space by using a tree diagram. The order of the result of the
dice is not important.
2. A store advertises for two positions: one is front of house and one is for stacking
shelves. The store receives 4 applicants, two of whom are teenagers and the other two
are adults.
The store selects one applicant for the position of front of house at random and then
selects one of the remaining applicants for the shelf-stacking position at random.
Using the notation of T1 A2 to denote that the first teenager has been selected for
front of house and the second adult has been selected for stacking shelves, answer the
following questions:
(a) list the elements of the sample space.

(b) list the elements that correspond to the front of house position being filled by a
teenager. Call this event B.
(c) list the elements in which exactly one of the two positions is filled by an adult.
Call this event C.
(d) list the elements in which no adult is employed. Call this event D.
(e) list the elements of the set (B and C).
(f) list the elements of the set (B and D).
(g) construct a Venn diagram to illustrate the intersections and unions of the events
B, C and D.
3. (a) 6 people are in a queue to buy a ticket. How many ways can they be lined up?
(b) If, within the queue, there is a group of 3 people who insist on being next to one
another, how many ways are possible?
(c) If, instead, 2 people cannot follow one another, how many ways are possible?
4. Three people are selected at random from a pool of 40 to join a jury. Find the number
of sample points in S for choosing a jury.
5. In how many ways can a group of 5 people be seated at a poker table in a circle?
(Permutations in which everyone has the same neighbours as another permutation
should be discounted)
6. You have 7 friends who are willing to help you move but you only need 3. How many
possible groups of friends can you choose?
7. A box contains 500 lucky-dip prizes, of which 75 are worth $1, 150 are worth $2, and
275 are worth $5. What is the sample space? Assign probabilities to these possible
samples and then find the probability that the first prize drawn is worth more than $1.
8. A physicist has applied for two grants. They estimate that the probability that they
receive grant A is 0.7, and the probability that they receive grant B is 0.4, and they
also estimate that the probability that they obtain any grant is 0.8. What is the
probability that the person: (a) obtains both grants? (b) obtains neither grant?
9. A 4th year subject in engineering has 10 undergrads, 30 postgrads, and 10 postdocs
sitting in the course. At the end of semester, 3 of the undergrads, 10 of the postgrads,
and 5 of the postdocs get a HD for the subject. If a student is chosen at random from
this class and is found to have earned a HD, what is the probability that he or she is
a postgrad?
10. Two transport vans contain a concert band’s instruments. One contains 5 clarinets and
2 flutes and the other contains 2 clarinets, 3 flutes and 1 drumkit. If 1 instrument is
taken at random from each of the vans, find the probability that: (a) both instruments
are clarinets; (b) one instrument is a clarinet and one is a flute; (c) the two instruments
are of different types, (d) given that one is a flute, the other is a clarinet.
11. A bakery makes and sells two types of cake for special occasions: sponge and carrot.
Based on long-range sales, the probability that a customer purchases a sponge
cake is 0.75. Of those that purchase carrot cakes, 90% also ask for frosting. But only
50% of the buyers of sponge cakes ask for frosting. A randomly selected buyer orders
a cake with frosting. What is the probability that the cake is a sponge cake?
12. The circuit below functions if there is an unbroken path from left to right. The proba-
bility for each component to be working is shown in each box and is independent from
the other components. What is the probability:
(a) That the circuit fails?
(b) That component A is working, given that the circuit is working?
(c) That component E is broken, given that the circuit is broken?
13. Draw a Venn diagram showing the sets of probability that correspond to the following
statements. The population of a city has been classified into children and adults. Of
the adults, a proportion are found to be employed and no children are employed. It
is found that a larger proportion of children play sports when compared to adults.
Finally, the proportion of the population with blue eyes is the same for children and
adults as is the proportion of people with brown eyes.
After drawing the diagram, point out any different probabilities that are independent,
mutually exclusive, or otherwise correlated.
14. The probability for a customer to write a bad review about a restaurant is P (X) = 10%,
the probability of a hot day is P (Y ) = 50% and the proportion of reviews that are bad
and occur on hot days is P (X and Y ) = 2%. Can you say that event X and Y are
independent, mutually exclusive or otherwise? Justify why in all three cases.
15. Classify the following variables as discrete or continuous:
(a) The number of car accidents per year in Queensland.

(b) The length of time to play 18 holes of golf.
(c) The amount of milk produced yearly by a particular cow.
(d) The number of eggs laid each month by a hen.
(e) The number of times the measured heart rate of a patient exceeds 90 bpm.
(f) The weight of grain produced by a farm.
(g) The proportion of people in a theatre that are eating popcorn.
16. In a gambling game, a woman is paid $3 if she draws a jack or a queen and $5 if she
draws a king or an ace from an ordinary deck of 52 playing cards. If she draws any
other card, she loses. (a) How much should she pay to play if the game is fair, that is,
if its expected payoff is zero? (b) In this case, what is the variance of the payoff?
17. Computer technology has produced an environment in which “robots” operate with
the use of microprocessors. The probability that a robot fails during any 6 hour shift
is 0.10. What is the probability that a robot will operate at most 5 shifts before it
fails?
18. The probability distribution of X, the number of imperfections per 10 meters of a
synthetic fabric in continuous rolls of uniform width, is given by:
x 0 1 2 3 4
f (x) 0.41 0.37 0.16 0.05 0.01
Find the average number of imperfections per 10 meters of this fabric and the variance
in the number of imperfections.
19. A cereal manufacturer is aware that the weight of the product in the box varies slightly
from box to box. In fact, considerable historical data have allowed the determination of
the density function that describes the probability structure for the weight (in ounces).
Letting X be the random variable weight, in ounces, the density function can be
described as $f(x) = 2/5$ if $23.75 \le x \le 26.25$, and $f(x) = 0$ elsewhere.
(a) Verify that this is a valid density function;

(b) Determine the probability that the weight is smaller than 24 ounces;
(c) The company desires that the weight exceeding 26 ounces be an extremely rare
occurrence. What is the probability that this rare occurrence does actually occur?
(d) Express the cumulative distribution function, F (x), graphically.
20. The waiting time, in hours, between successive speeders spotted by a radar unit is a
continuous random variable with cumulative distribution function:
\[ F(x) = \begin{cases} 0 & x < 0 \\ 1 - e^{-8x} & x \ge 0 \end{cases} \]
(a) Find the probability of waiting less than 12 minutes between successive speeders.
(b) Find the probability of waiting more than an hour between successive speeders.
(c) Find the probability density function and comment on the relative likelihood of
finding a successive speeder at 10 min as compared to finding a speeder at 30 min.
21. Consider the density function
\[ f(x) = \begin{cases} k\sqrt{x} & 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases} \]
(a) Find the value of k.

(b) Find F (x) and use it to evaluate P (0.3 < X < 0.6).
22. Consider again the information in Q19. Use the definitions to compute (a) the expected
value as well as (b) the variance of the weight of the product in a cereal box.
23. Based on historical data, suppose that the probabilities are 0.4, 0.3, 0.2, and 0.1,
respectively, that 0, 1, 2, or 3 power failures will strike a certain subdivision in any
given year.
(a) Find the mean and variance of the random variable X representing the number
of power failures striking this subdivision.
(b) Suppose that a certain quantity Z of interest to the division relates to the number
of power failures as Z = 5X + 3. Find the mean and variance of Z (hint: notice
that Z is also a random variable that is a linear combination of X).
24. Show that the following is true: $\mathrm{Var}[X] = E[(X - \mu)^2] = E[X^2] - E[X]^2$
25. Assume that the velocity in the x-direction of atoms in a gas takes the distribution
\[ f(v_x) = \begin{cases} 1/2 & -1 < v_x < 1 \\ 0 & \text{elsewhere} \end{cases} \]
(a) Find the expected value and standard deviation of $v_x$.
(b) Use the results of part (a) to find the expected value of $v_x^2$.
26. A continuous random variable X has the density function:
\[ f(x) = \begin{cases} e^{-x} & x > 0 \\ 0 & \text{elsewhere} \end{cases} \]
Find the expected value of $g(X) = e^{2X/3}$.
8.6 SELECTED ANSWERS - Probability and Statistics
1.
2. (a) S = {T1 T2 , T1 A1 , T1 A2 ; T2 T1 , T2 A1 , T2 A2 ; A1 T1 , A1 T2 , A1 A2 ; A2 T1 , A2 T2 , A2 A1 }
(b) B = {T1 T2 , T1 A1 , T1 A2 , T2 T1 , T2 A1 , T2 A2 }
(c) C = {T1 A1 , T1 A2 , T2 A1 , T2 A2 , A1 T1 , A1 T2 , A2 T1 , A2 T2 } (d) D = {T1 T2 , T2 T1 }
(e) B ∩ C = {T1 A1 , T1 A2 , T2 A1 , T2 A2 } (f) B ∩ D = {T1 T2 , T2 T1 }
(g)
3. (a) 6! = 720 (b) (24)(6) = 144 (c) 720 − 240 = 480

4. $C_3^{40}$ combinations: $C_3^{40} = \binom{40}{3} = \dfrac{40!}{3!\,37!} = \dfrac{(40)(39)(38)}{(3)(2)} = 9880$
5. 4! = 24 or 5!/5 = 24
6. $C_3^7 = \binom{7}{3} = \dfrac{7!}{3!\,4!} = \dfrac{(7)(6)(5)}{(3)(2)} = 35$
7. S = {$1, $2, $5}, P (draw > $1) = P ($2 or $5) = P ($2) + P ($5) = 0.85, These events
are mutually exclusive.
8. (a) 0.3 or 30% (b) 0.2 or 20%

9. $\frac{10}{18} = 0.5556$ or 55.56%
10. (a) 0.2381=23.81% (b) 0.4523=45.23% (c) 0.6190=61.9% (d) 0.7036=70.36%
11. 0.6250 = 62.5%
12. (a) 0.1761 = 17.61% (b) 0.7345 = 73.45% (c) 0.3691 = 36.91%
13.
14. X and Y are not independent. X and Y are not mutually exclusive.
15. (a) discrete, (b) continuous, (c) continuous, (d) discrete, (e) discrete, (f) continuous,
(g) Can be continuous or discrete.
16. (a) Should not pay more than $1.23 to play. (b) Variance: $3.72
17. 0.4686 = 46.86%
18. 0.88
19. (a) Valid density function (b) 0.1 = 10% (c) 0.1 = 10%
(d)
\[ F(x) = \int_{-\infty}^{x} f(t)\,dt = \begin{cases} 0 & x < 23.75 \\ \dfrac{2}{5}(x - 23.75) & 23.75 \le x \le 26.25 \\ 1 & x > 26.25 \end{cases} \]
20. (a) 0.7981 = 79.81% (b) 0.0003 = 0.03% (c) 14.39 ≈ 14.4 times more likely
21. (a) k = 3/2 (b) 0.3004 = 30.04%
22. (a) Mean: µ = E(X) = 25 (b) Variance: σ 2 = 0.5208
23. (a) Mean: E(X) = 1, Variance: V ar(X) = 1 (b) V ar(Z) = V ar(5X + 3) = 25
24. Proof
25. (a) $E[v_x] = 0$, $\sigma^2 = 1/3$, $\sigma = 1/\sqrt{3}$ (b) Using: $E(v_x^2) = 1/3$
26. E[g(x)] = 3
9 Discrete Probability Distributions
We will consider three specific discrete probability distributions: the Binomial Distribution,
the Poisson Distribution and the Hypergeometric Distribution. We will investigate the
probability mass function (pmf), the cumulative distribution function (cdf) and the
expectation values (mean and variance) of each.
9.1 Binomial Distribution

A binomial experiment satisfies the following criteria:
1. The experiment consists of n identical trials.

2. Each trial has two possible outcomes. Generally called success (1) or failure (0).
3. For each trial, the probability of success is p, and the probability of failure is q = (1−p).
The probabilities remain the same for each trial.
4. The trials are independent.
5. We are interested in X, the number of successes observed for the n trials.
Consider the problem of tossing a single coin n times.
We can also allow for the possibility that the coin is not ‘fair’, i.e. allow the probability of
a head to differ from the probability of a tail. Let the probability of a head be p and the
probability of a tail be q. As we only have two possible outcomes: q = 1 − p
For n tosses, calculate the probability that there are i heads.
Initially consider the case where the first i tosses are heads and the rest are tails. i.e. the
sequence of outcomes is:
\[ \underbrace{111\ldots1}_{i \text{ ones}}\ \underbrace{000\ldots0}_{n-i \text{ zeros}} \]
The probability of this occurring is $p^i q^{n-i}$. This is also the probability for any particular
sequence of i heads and n − i tails.
The second part is to work out the number of different ways that we can get i heads and
n − i tails. This will be the same as the number of ways that we can order the digits
111 . . . 1000 . . . 0.
This is the same as the number of combinations: $C_r^n = \binom{n}{r} = \dfrac{n!}{r!\,(n-r)!}$ (with r = i), or the binomial
coefficient.
Putting these results together, we get the probability mass function which states the prob-
ability of x successes from n trials:
$$f(x) = P(X = x) = \binom{n}{x} p^x q^{n-x} = \binom{n}{x} p^x (1-p)^{n-x}$$
Note: This is often written as: X ∼ binom(x; n, p)
Note: From the Binomial Theorem, the probability of getting any number of successes is:
$$\sum_{k=0}^{n} f(k) = \sum_{k=0}^{n} P(X = k) = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} = (p + q)^n = 1 \quad \text{as expected}$$
The graph of the binomial distribution for p = 0.7, q = 0.3 and n = 20 is shown:
The cumulative distribution function corresponding to the binomial distribution is:

$$F(m) = P(X \le m) = \sum_{i=0}^{m} P(X = i) = \sum_{i=0}^{m} \binom{n}{i} p^i q^{n-i}$$
The graph of the cumulative distribution function for p = 0.7, q = 0.3 and n = 20 is:
Expected Values for the Binomial Distribution
Mean: µ = np
Variance: $\sigma^2 = npq$
Standard Deviation: $\sigma = \sqrt{npq}$
Example: Components are mass-produced and then subjected to a shock test. The probability
that a single component will survive a shock test is 0.75. (a) Find the probability that
exactly 2 of the next 4 components will survive. (b) Find the probability that at least 3 of
the 4 components survive.
(a) Using binom(x; n, p): $f(x) = P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$
where: x = 2, n = 4, p = 0.75
$$\text{binom}(2; 4, 0.75):\quad P(X = 2) = \binom{4}{2}(0.75)^2(1 - 0.75)^{4-2} = \frac{4!}{2!(4-2)!}(0.75)^2(0.25)^2 = 0.2109$$
(b) Probability of at least 3 survive means that 3 or 4 out of 4 survive:
$$P(X = 3) + P(X = 4) = \binom{4}{3}(0.75)^3(1 - 0.75)^{4-3} + \binom{4}{4}(0.75)^4(1 - 0.75)^{4-4} = 4(0.75)^3(0.25) + (0.75)^4 = 0.7383$$
The mean: $\mu = np = (4)(0.75) = 3$
And the variance: $\sigma^2 = npq = (4)(0.75)(0.25) = 0.75$
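The shock-test example can be checked numerically. The following is a minimal sketch in Python; the helper name `binom_pmf` is an assumption made for illustration and is not part of these notes.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial experiment with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 4, 0.75

# (a) exactly 2 of the 4 components survive
p_exactly_2 = binom_pmf(2, n, p)

# (b) at least 3 of the 4 components survive
p_at_least_3 = binom_pmf(3, n, p) + binom_pmf(4, n, p)

mean = n * p
variance = n * p * (1 - p)

print(round(p_exactly_2, 4))   # 0.2109
print(round(p_at_least_3, 4))  # 0.7383
print(mean, variance)          # 3.0 0.75
```

Summing `binom_pmf(x, n, p)` over all x from 0 to n should return 1, which is the Binomial Theorem check noted earlier.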
Example: There are 20 marbles in a jar, 2 white marbles and 18 black marbles. 3 marbles
are chosen at random without replacement.
In this example, trials are not independent as the probability of getting a white marble on
the second pick depends significantly on the outcome of the first pick. Therefore using a
binomial distribution would not give accurate approximations.
If we had 200 marbles in the jar, the difference in probability would be small and using the
binomial distribution would give more accurate approximations.
9.2 Poisson Distribution
The Poisson distribution of a discrete random variable expresses the probability of a given
number of events occurring in a fixed interval of time or a fixed region in space, if these
events occur with a known constant mean rate. Events must occur independently of the time
since the last event.
Examples Include:
• The number of traffic accidents at an intersection over a one week period.

• The number of arrivals at a checkout counter over a one hour period.
• The number of bacteria in a given volume of fluid.
The probability mass function: $P(X = x) = f(x) = \frac{\lambda^x e^{-\lambda}}{x!}$
where λ is the mean and x = 0, 1, 2, 3, . . ..
Note: This is often written as: Pois(x; λ)
The variance of the Poisson distribution is equal to the mean: $\sigma^2 = \lambda$
Graphs of the probability mass function for different values of λ:
Example: In a laboratory experiment, the average number of radioactive particles passing

through a counter in 1ms is 4. Calculate the probability that exactly 6 particles pass through
a counter in 1ms.
$$\text{Pois}(x; \lambda):\quad P(X = x) = f(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad \text{where } x = 6,\ \lambda = 4$$
$$\text{Pois}(6; 4):\quad P(X = 6) = f(6) = \frac{4^6 e^{-4}}{6!} = 0.1042$$
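As a quick check of the radioactive-particle example, here is a short Python sketch of the Poisson pmf. The function name `poisson_pmf` is an illustrative assumption.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson distribution with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

# Mean of 4 particles per millisecond, probability of exactly 6
p_six = poisson_pmf(6, 4)
print(round(p_six, 4))  # 0.1042
```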
Poisson Approximation to the Binomial Distribution
The Poisson distribution is the limiting case of the Binomial distribution as the number of
trials n increases to infinity while the mean np = λ stays fixed (so that p becomes small).
Consequently, the Poisson distribution can be used as a good approximation to the Binomial
distribution if there is a large number of trials and the probability of success is small.
Example: In the manufacture of a product, defects occur at an average of 1 in 1000 items.

Calculate the probability that a random sample of 5000 items will contain 6 defects.
This is a binomial experiment with n = 5000 and p = 0.001: binom(6; 5000, 0.001)
As n is large, we can approximate the probability using a Poisson distribution: Pois(6; 5),
where λ = 5 is the mean of the binomial distribution: µ = np = (5000)(0.001).
Calculating each of the functions above:

binom(6; 5000, 0.001): $f(6) = \binom{5000}{6}(0.001)^6(0.999)^{4994} = 0.1463$

Pois(6; 5): $f(6) = \frac{5^6 e^{-5}}{6!} = 0.1462$
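The comparison above can be reproduced with a few lines of Python. This is a sketch only; the variable names are illustrative assumptions.

```python
from math import comb, exp, factorial

n, p, x = 5000, 0.001, 6
lam = n * p  # mean of the binomial distribution, used as the Poisson mean

# Exact binomial probability of exactly 6 defects
binom_value = comb(n, x) * p**x * (1 - p) ** (n - x)

# Poisson approximation with lambda = np
pois_value = lam**x * exp(-lam) / factorial(x)

print(round(binom_value, 4), round(pois_value, 4))
```

The two values agree to about three decimal places, which is what the approximation promises when n is large and p is small.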
9.3 Hypergeometric Distribution
For the binomial distribution, the probability of a success (or of a failure) is constant for
successive events. i.e. all trials are independent. This is equivalent to choosing items from
a set with replacement.
If an experiment consists of choosing items from a set without replacement, then the
probability of success depends on the outcome of the previous trials.
The hypergeometric distribution describes the probability of a given number of successes in
a sample taken from the population of all possible outcomes that contains a known number
of successes.
Examples include:
• A box contains 10 red balls, 10 yellow and 20 blue. What’s the probability that 5 blue
balls are chosen from a random sample of 10 balls.
• The probability of choosing 5 red cards (diamonds or hearts) in a hand of 10 cards
chosen from a standard deck of 52.
• A group has 60% females and 40% males. What’s the probability a random sample of
10 people will have 7 females.
Define the following variables:
• N = number of elements in the population.

• k = number of elements in the population that are successes. Therefore N − k is the
number of failures.
• n = number of elements in the sample.
• x = number of successes in the sample. This is what we’re interested in: P (X = x)
The probability mass function for the hypergeometric distribution is:

$$P(X = x) = f(x) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$$
The mean and variance of the hypergeometric distribution are similar to those of the binomial
distribution, with a correction for population size.
Mean: $\mu = n\left(\frac{k}{N}\right)$
Variance: $\sigma^2 = n\left(\frac{k}{N}\right)\left(\frac{N-k}{N}\right)\left(\frac{N-n}{N-1}\right)$
Example: A bucket contains 8 balls, 5 of which are red and 3 of which are white. A
sample of 4 balls is randomly selected from the bucket. (a) Find the probability distribution
for x, the number of white balls in the sample. (b) Find the mean and variance of x.
For this example: N = 8, n = 4, k = 3, N − k = 5
$$\text{Therefore:}\quad P(X = x) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}} = \frac{\binom{3}{x}\binom{5}{4-x}}{\binom{8}{4}}$$
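(a) The possible values of x are 0, 1, 2, 3
$$P(X = 0) = f(0) = \frac{\binom{3}{0}\binom{5}{4}}{\binom{8}{4}} = \frac{(1)(5)}{70} = 5/70 = 0.0714$$
$$P(X = 1) = f(1) = \frac{\binom{3}{1}\binom{5}{3}}{\binom{8}{4}} = \frac{(3)(10)}{70} = 3/7 = 0.4286$$
$$P(X = 2) = f(2) = \frac{\binom{3}{2}\binom{5}{2}}{\binom{8}{4}} = \frac{(3)(10)}{70} = 3/7 = 0.4286$$
$$P(X = 3) = f(3) = \frac{\binom{3}{3}\binom{5}{1}}{\binom{8}{4}} = \frac{(1)(5)}{70} = 5/70 = 0.0714$$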
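(b) Mean and Variance of x.
Mean: $\mu = 4\left(\frac{3}{8}\right) = \frac{3}{2}$
Variance: $\sigma^2 = 4\left(\frac{3}{8}\right)\left(\frac{8-3}{8}\right)\left(\frac{8-4}{8-1}\right) = \frac{15}{28}$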
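The bucket example can be verified exactly using rational arithmetic. The sketch below is one possible way to do this in Python; `hypergeom_pmf` is a hypothetical helper name, and the mean and variance are computed directly from the pmf rather than from the closed-form expressions, so it serves as an independent check.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x: int, N: int, k: int, n: int) -> Fraction:
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of N elements containing k successes."""
    return Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))

N, k, n = 8, 3, 4  # 8 balls, 3 white (successes), sample of 4

# Probability distribution of the number of white balls in the sample
pmf = {x: hypergeom_pmf(x, N, k, n) for x in range(min(k, n) + 1)}

# Mean and variance from the definition of expected value
mean = sum(x * prob for x, prob in pmf.items())
variance = sum((x - mean) ** 2 * prob for x, prob in pmf.items())

for x, prob in pmf.items():
    print(x, prob, round(float(prob), 4))
print(mean, variance)  # 3/2 15/28
```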
9.4 EXERCISES - Discrete Probability Distributions
1. A shipment of 7 television sets contains 2 defective sets. A hotel makes a random
purchase of 3 of the sets. If X is the number of defective sets purchased by the hotel,
find the probability distribution of X. Express the results graphically as a probability
histogram, then find the following probabilities:
(a) P (X = 1)
(b) $P(0 < X \le 2)$
2. A safety engineer claims that only 40% of all workers wear safety helmets when they
eat lunch at the workplace. Assuming that this claim is right, find the probability that
4 of 6 workers randomly chosen will be wearing their helmets while having lunch at
the workplace.
3. One prominent physician claims that 70% of those with lung cancer are chain smokers.
If his assertion is correct,
(a) Find the probability that of 20 such patients recently admitted to a hospital, more
than 18 are chain smokers.
(b) Find the probability that of 10 such patients recently admitted to a hospital,
fewer than half are chain smokers.
4. A manufacturer knows that on the average 20% of the electric toasters which he makes
will require repairs within 1 year after they are sold. When 20 toasters are randomly
selected, find appropriate numbers x and y such that
(a) the probability that at least x of them will require repairs is less than 0.5
(b) the probability that at least y of them will not require repairs is greater than 0.8.
5. To avoid detection at customs, a traveler places 6 narcotic tablets in a bottle containing
9 vitamin pills that are similar in appearance. If the customs official selects 3 of the
tablets at random for analysis, what is the probability that the traveler will be arrested
for illegal possession of narcotics?
6. Population studies of biology and the environment often tag and release subjects in
order to estimate size and degree of certain features in the population. Ten animals of
a certain population thought to be extinct (or near extinction) are caught, tagged and
released in a certain region. After a period of time a random sample of 15 of this type
of animal is selected in the region. What is the probability that 5 of those selected are
tagged animals if there are 25 animals of this type in the region?
7. A secretary makes 2 errors per page, on average. What is the probability that on the
next page he or she will make
(a) 4 or more errors?
(b) No errors?
8. Changes in airport procedures require considerable planning. Arrival rates of aircraft
are important factors that must be taken into account. Suppose small aircraft arrive
at a certain airport, according to a Poisson process, at the rate of 6 per hour.
(a) What is the probability that exactly 4 small aircraft arrive during a 1 hour period?
(b) What is the probability that at least 4 arrive during a 1 hour period?
(c) If we define a working day as 12 hours, what is the probability that at least 5
small aircraft arrive during a working day?
9. Hospital administrators in large cities anguish about problems with traffic in emergency
rooms in hospitals. For a particular hospital in a large city, the staff on hand cannot
accommodate the patient traffic if there are more than 3 emergency cases in a given
hour. It is assumed that patient arrival follows a Poisson process and historical data
suggest that, on the average, one emergency arrives per hour.
(a) What is the probability that in a given hour the staff can no longer accommodate
the traffic?
(b) What is the probability that more than 4 emergencies arrive during a 3 hour shift
of personnel?
9.5 SELECTED ANSWERS - Discrete Probability Distributions
1. (a) 0.5714 = 57.14% (b) 0.7143 = 71.43%
2. 0.1382 = 13.82%
3. (a) 0.0076 = 0.76% (b) 0.0473 = 4.73%
4. (a) 5 (b) 15
5. 0.8154 = 81.54%
6. 0.2315 = 23.15%
7. (a) 0.1431 = 14.31% (b) 0.1353 = 13.53%
8. (a) 0.1339 = 13.39% (b) 0.8488 = 84.88% (c) 1 = 100%
9. (a) 0.0190 = 1.9% (b) 0.1845 = 18.45%
10 Continuous Probability Distributions
10.1 Uniform Probability Density Function
Assume that we have a continuous random variable that can take a value between two points
a and b. The uniform distribution is a continuous probability distribution and is concerned
with events that are equally likely to occur within the interval $a \le X \le b$.
The uniform probability density function (pdf) has a rectangular shape over the interval.
As the total area under the curve (i.e. the total probability) must equal 1, the height is:
1/(b − a)
The uniform probability density function (pdf) is described by:
$$f(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
Mean: $\mu = E[X] = \int_a^b x f(x)\,dx = \frac{a+b}{2}$
Note: The area under the probability density function on each side of the mean must be
the same (i.e. 50%).
Variance: $\sigma^2 = E[(X - \mu)^2] = \frac{(b-a)^2}{12}$
The probability of an event occurring between any two points c and d within the interval
(a, b) is equal to the area under the uniform pdf curve and is given by:
$$P(c \le x \le d) = \int_c^d \frac{1}{b-a}\,dx = \frac{d-c}{b-a}$$
Example: Buses arrive at the university bus stop every 30 minutes. A student arrives at
the bus stop at a random time. The time that the student waits for the next bus to arrive
(X) could be described by a uniform distribution over the interval from 0 to 30 mins.
a) Determine the probability density function.
b) Find the probability that the waiting time will exceed 10 minutes: P (x > 10)
c) Calculate the mean and standard deviation of x.
d) Calculate the probability that the waiting time will lie within one standard deviation
of the mean i.e. (µ ± σ)
a) Probability density function:
$$f(x) = \begin{cases} \dfrac{1}{30} & 0 \le x \le 30 \\ 0 & \text{otherwise} \end{cases}$$
b) Probability that waiting time will exceed 10 mins.

$$P(x > 10) = \int_{10}^{\infty} f(x)\,dx = \int_{10}^{30} \frac{1}{30}\,dx = \frac{2}{3}$$
c) Mean and Standard Deviation
Mean: $\mu = \frac{a+b}{2} = \frac{0+30}{2} = 15$ mins.
Standard Deviation: $\sigma = \frac{b-a}{\sqrt{12}} = \frac{30-0}{\sqrt{12}} = 8.66$ mins.
d) Probability that the wait time will lie within one standard deviation of the mean.
$$P(6.34 \le x \le 23.66) = \int_{6.34}^{23.66} f(x)\,dx = \int_{6.34}^{23.66} \frac{1}{30}\,dx = 0.577$$
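The bus-stop example can be reproduced with a small Python sketch. The helper `uniform_prob` is an assumed name; it simply evaluates (d − c)/(b − a) after clipping the limits to the interval [a, b].

```python
from math import sqrt

a, b = 0.0, 30.0  # waiting time is uniform on [0, 30] minutes

def uniform_prob(c: float, d: float, a: float, b: float) -> float:
    """P(c <= X <= d) for X uniform on [a, b]; limits outside [a, b] are clipped."""
    lo, hi = max(c, a), min(d, b)
    return max(hi - lo, 0.0) / (b - a)

mean = (a + b) / 2
sigma = (b - a) / sqrt(12)

p_wait_over_10 = uniform_prob(10, b, a, b)
p_within_one_sigma = uniform_prob(mean - sigma, mean + sigma, a, b)

print(mean, round(sigma, 2))          # 15.0 8.66
print(round(p_wait_over_10, 4))       # 0.6667
print(round(p_within_one_sigma, 3))   # 0.577
```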
10.2 Normal Distribution (Gaussian Distribution)
The normal distribution is a particularly important distribution in statistics. Many natural
phenomena can be modelled using a normal distribution.
The normal probability density function is given by:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2} \qquad -\infty < x < \infty$$
Where µ is the mean and σ is the standard deviation.
This function describes a family of normal distributions for different values of µ and σ.
The graph of this function is symmetric about x = µ and approaches zero as x → ±∞.
The general shape of the distribution is bell-shaped.

As for all density functions, the total area under the curve must be equal to one: $\int_{-\infty}^{\infty} f(x)\,dx = 1$
The shape of the distribution is affected by the value of σ.
A large value of σ gives a distribution curve with reduced height and greater spread.
A small value of σ gives a distribution with increased height and reduced spread.
Note: The notation used to indicate that a random variable X follows a normal distribution
with a mean of µ and a standard deviation of σ is:
X ∼ N (µ, σ)
10.2.1 Standard Normal Distribution
To calculate the probability associated with a random variable described by a normal dis-
tribution, we require integrals of the form:
$$P(a \le X \le b) = \int_a^b f(x)\,dx = \int_a^b \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}\,dx$$
This is a complicated integral and will change for different values of µ and σ.
The standard normal distribution (for variable Z) is a transformation of the normal distri-
bution (for variable X) obtained by the transformation:
$$Z = \frac{X - \mu}{\sigma}$$
This gives a transformed distribution with a mean µ = 0 and standard deviation σ = 1.
The standard normal density function becomes:
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$
The graph of the standard normal density function is called the standard normal curve:
The value of Z corresponding to a particular value of X is known as the Z-score for that
value.
Example: If X ∼ N (50, 10), we transform to Z ∼ N (0, 1) using $Z = \frac{X - 50}{10}$.
Therefore the X-values $x_1 = 45$ and $x_2 = 62$ are transformed to Z-scores:
$$z_1 = \frac{45 - 50}{10} = -0.5 \quad \text{and} \quad z_2 = \frac{62 - 50}{10} = 1.2$$
and
$$P(45 < X < 62) = P(-0.5 < Z < 1.2)$$
To evaluate the probability that Z takes a value between two z-scores, we need to evaluate
the area under the standard normal curve. From the last example:
$$P(-0.5 < Z < 1.2) = \frac{1}{\sqrt{2\pi}} \int_{-0.5}^{1.2} e^{-z^2/2}\,dz$$
Fortunately we are not required to evaluate this integral. Areas under the standard normal
curve have previously been calculated and tabulated:
STANDARD NORMAL CURVE AREAS
This table gives areas under the standard normal distribution φ, between 0 and t > 0
in steps of 0.01. The row label gives t to one decimal place and the column heading gives
the second decimal digit.
t 0 1 2 3 4 5 6 7 8 9
0.0 .0000 .0040 .0080 .0120 .0160 .0199 .0239 .0279 .0319 .0359
0.1 .0398 .0438 .0478 .0517 .0557 .0596 .0636 .0675 .0714 .0754
0.2 .0793 .0832 .0871 .0910 .0948 .0987 .1026 .1064 .1103 .1141
0.3 .1179 .1217 .1255 .1293 .1331 .1368 .1406 .1443 .1480 .1517
0.4 .1554 .1591 .1628 .1664 .1700 .1736 .1772 .1808 .1844 .1879
0.5 .1915 .1950 .1985 .2019 .2054 .2088 .2123 .2157 .2190 .2224
0.6 .2258 .2291 .2324 .2357 .2389 .2422 .2454 .2486 .2518 .2549
0.7 .2580 .2612 .2642 .2673 .2704 .2734 .2764 .2794 .2823 .2852
0.8 .2881 .2910 .2939 .2967 .2996 .3023 .3051 .3078 .3106 .3133
0.9 .3159 .3186 .3212 .3238 .3264 .3289 .3315 .3340 .3365 .3389
1.0 .3413 .3438 .3461 .3485 .3508 .3531 .3554 .3577 .3599 .3621
1.1 .3643 .3665 .3686 .3708 .3729 .3749 .3770 .3790 .3810 .3830
1.2 .3849 .3869 .3888 .3907 .3925 .3944 .3962 .3980 .3997 .4015
1.3 .4032 .4049 .4066 .4082 .4099 .4115 .4131 .4147 .4162 .4177
1.4 .4192 .4207 .4222 .4236 .4251 .4265 .4279 .4292 .4306 .4319
1.5 .4332 .4345 .4357 .4370 .4382 .4394 .4406 .4418 .4429 .4441
1.6 .4452 .4463 .4474 .4484 .4495 .4505 .4515 .4525 .4535 .4545
1.7 .4554 .4564 .4573 .4582 .4591 .4599 .4608 .4616 .4625 .4633
1.8 .4641 .4649 .4656 .4664 .4671 .4678 .4686 .4693 .4699 .4706
1.9 .4713 .4719 .4726 .4732 .4738 .4744 .4750 .4756 .4761 .4767
2.0 .4772 .4778 .4783 .4788 .4793 .4798 .4803 .4808 .4812 .4817
2.1 .4821 .4826 .4830 .4834 .4838 .4842 .4846 .4850 .4854 .4857
2.2 .4861 .4864 .4868 .4871 .4875 .4878 .4881 .4884 .4887 .4890
2.3 .4893 .4896 .4898 .4901 .4904 .4906 .4909 .4911 .4913 .4916
2.4 .4918 .4920 .4922 .4925 .4927 .4929 .4931 .4932 .4934 .4936
2.5 .4938 .4940 .4941 .4943 .4945 .4946 .4948 .4949 .4951 .4952
2.6 .4953 .4955 .4956 .4957 .4959 .4960 .4961 .4962 .4963 .4964
2.7 .4965 .4966 .4967 .4968 .4969 .4970 .4971 .4972 .4973 .4974
2.8 .4974 .4975 .4976 .4977 .4977 .4978 .4979 .4979 .4980 .4981
2.9 .4981 .4982 .4982 .4983 .4984 .4984 .4985 .4985 .4986 .4986
3.0 .4987 .4987 .4987 .4988 .4988 .4989 .4989 .4989 .4990 .4990
3.1 .4990 .4991 .4991 .4991 .4992 .4992 .4992 .4992 .4993 .4993
3.2 .4993 .4993 .4994 .4994 .4994 .4994 .4994 .4995 .4995 .4995
3.3 .4995 .4995 .4995 .4996 .4996 .4996 .4996 .4996 .4996 .4997
3.4 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4998
3.5 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998
3.6 .4998 .4998 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.7 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.8 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.9 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000
In the above table, all probabilities are displayed as P (0 < Z < t), i.e. the area under the
curve from Z = 0 to Z = t. This is why the largest value in the table is 0.5 which is half of
the total area under the curve.
So we need to use the symmetry of the graph, and the resulting values in the table to
calculate probabilities.
From the last example, to calculate the desired probability:
P (−0.5 < Z < 1.2) = P (−0.5 < Z < 0) + P (0 < Z < 1.2)
= P (0 < Z < 0.5) + P (0 < Z < 1.2)
= 0.1915 + 0.3849
= 0.5764
Example: From the table: $P(0 \le Z \le 0.7) = 0.2580$
This is the probability that Z falls between 0 and 0.7 standard deviations from the mean.
Example: Determine the area under the curve within the range:
(a) One standard deviation either side of the mean.
(b) Two standard deviations either side of the mean.
(a) For one standard deviation on one side of the mean, the area is given by:
P (0 < Z < 1) = 0.3413
So for one standard deviation either side of the mean, the area is:
P (−1 < Z < 1) = 2P (0 < Z < 1) = 2(0.3413) = 0.6826 = 68.26%
(b) For two standard deviations either side of the mean, the area is given by:
$P(-2 < Z < 2) = 2P(0 < Z < 2) = 2(0.4772) = 0.9544 = 95.44\%$
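The areas within k standard deviations of the mean can also be computed without the table, since for a standard normal variable $P(-k < Z < k) = \mathrm{erf}(k/\sqrt{2})$. The sketch below uses Python's standard `math.erf`; the function name `prob_within` is an illustrative assumption.

```python
from math import erf, sqrt

def prob_within(k: float) -> float:
    """P(-k < Z < k) for a standard normal variable Z."""
    return erf(k / sqrt(2))

print(round(prob_within(1), 4))  # 0.6827
print(round(prob_within(2), 4))  # 0.9545
```

The results differ from the table-based values (0.6826 and 0.9544) only in the fourth decimal place, because the table entries are themselves rounded to four places before being doubled.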
Example: The heights of a plantation of seedlings are found to fit a normal distribution
with a mean of 72 cm and a standard deviation of 8 cm. What is the probability that a
randomly selected seedling will be between 68 cm and 82 cm tall?
Let X be the height of seedlings that is normally distributed with µ = 72 and σ = 8.
Therefore X ∼ N (72, 8)
Transform to z-scores using: $z = \frac{x - \mu}{\sigma} = \frac{x - 72}{8}$, so that Z ∼ N (0, 1).
For x = 68, z = −0.5 and for x = 82, z = 1.25.
Therefore: P (68 < x < 82) = P (−0.5 < z < 1.25)
= P (0 < z < 0.5) + P (0 < z < 1.25)
= 0.1915 + 0.3944
= 0.5859
Therefore the probability that a randomly chosen seedling has a height between 68 and 82
cm is 0.5859 or 58.59%.
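Table look-ups such as the one above can be cross-checked with the error function, using $\Phi(z) = \tfrac{1}{2}\left[1 + \mathrm{erf}(z/\sqrt{2})\right]$ for the standard normal cdf. The following Python sketch assumes a helper called `normal_cdf`, which is not part of these notes.

```python
from math import erf, sqrt

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X <= x) for X ~ N(mu, sigma), where sigma is the standard deviation."""
    z = (x - mu) / sigma          # transform to a z-score
    return 0.5 * (1 + erf(z / sqrt(2)))

# Seedling heights: X ~ N(72, 8), probability of a height between 68 and 82 cm
p_seedling = normal_cdf(82, 72, 8) - normal_cdf(68, 72, 8)

# Earlier example: X ~ N(50, 10), P(45 < X < 62)
p_earlier = normal_cdf(62, 50, 10) - normal_cdf(45, 50, 10)

print(round(p_seedling, 4))  # about 0.5858 (table value 0.5859)
print(round(p_earlier, 4))   # about 0.5764
```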
Normal Distribution approximation to the Binomial Distribution
We have already used the Poisson Distribution as an approximation to the Binomial Distri-
bution when n is large and p is small.
The normal distribution is also a good approximation to the binomial distribution when p
is close to 0.5 and n is large.
To demonstrate the approximation, define the mean and standard deviation of the normal
distribution as:
$$\mu = np \quad \text{and} \quad \sigma = \sqrt{npq}$$
Example: Consider a Binomial distribution with n = 15 and p = 0.4, i.e. X ∼ binom(x; 15, 0.4)
$$\text{Therefore:}\quad P(X = 4) = f(4) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{15!}{4!\,11!}(0.4)^4(1 - 0.4)^{15-4} = 0.1268$$
To approximate this with a normal distribution, using the random variable $\bar{X}$, let: $\mu = np = 6$
and $\sigma = \sqrt{npq} = 1.897$, i.e. $\bar{X} \sim N(6, 1.897)$
To approximate P (X = 4) we use $P(3.5 < \bar{X} < 4.5)$. Transform to a standard normal
distribution using $Z = \frac{\bar{X} - \mu}{\sigma}$ with z-scores $z_1 = -1.32$ and $z_2 = -0.79$.
Therefore P (−1.32 < Z < −0.79) = 0.4066 − 0.2852 = 0.1214, giving a close approximation.
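The quality of this approximation can be examined numerically. The sketch below, with assumed helper names, compares the exact binomial probability with the normal approximation over the interval 3.5 to 4.5. Because it does not round the z-scores to two decimal places, the approximate value differs slightly from the table-based 0.1214.

```python
from math import comb, erf, sqrt

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ N(mu, sigma), where sigma is the standard deviation."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, p, x = 15, 0.4, 4
q = 1 - p

# Exact binomial probability P(X = 4)
exact = comb(n, x) * p**x * q ** (n - x)

# Normal approximation with mu = np and sigma = sqrt(npq), over 3.5 < X < 4.5
mu, sigma = n * p, sqrt(n * p * q)
approx = normal_cdf(x + 0.5, mu, sigma) - normal_cdf(x - 0.5, mu, sigma)

print(round(exact, 4), round(approx, 4))
```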
10.3 Log-Normal Distribution
This distribution applies where a natural log transformation of the random variable results
in a normal probability distribution.
[Figure: two panels, labelled y = f(x) and y = ln[f(x)]]
The log-normal probability density function is given by:
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-\frac{1}{2}\left(\frac{\ln x - \mu}{\sigma}\right)^2} \quad \text{for } x > 0$$
The mean and standard deviation of the log-normal random variable X are related to the
mean and standard deviation of ln X.
Letting Y = ln(X), we can show that:
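mean: $\mu_y = \ln \mu_x - \frac{1}{2}\sigma_y^2$ or $\mu_x = e^{\left[\mu_y + \frac{1}{2}\sigma_y^2\right]}$
variance: $\sigma_y^2 = \ln\left[1 + \left(\frac{\sigma_x}{\mu_x}\right)^2\right]$ or $\sigma_x^2 = e^{\left[2\mu_y + \sigma_y^2\right]}\left(e^{\sigma_y^2} - 1\right)$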
These equations are not particularly convenient to use directly. As y = ln(x), we can use
standard normal distribution tables for log-normal distributions.
Example: The particle size of the material coming out of a rock crusher (X) follows a
log-normal distribution with a mean of 2 cm and a standard deviation of 1 cm. Particles are
put through a sieve screen with a mesh size of 1 cm. Determine the proportion of particles
that will pass through the screen, i.e. P (X < 1)
Given: µx = 2, σx = 1, we first calculate the mean and variance of the corresponding normal
distribution, Y = ln(X).
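variance: $\sigma_y^2 = \ln\left[1 + \left(\frac{\sigma_x}{\mu_x}\right)^2\right] = \ln\left[1 + \left(\frac{1}{2}\right)^2\right] = 0.2231$, ∴ $\sigma_y = 0.4724$
mean: $\mu_y = \ln \mu_x - \frac{1}{2}\sigma_y^2 = \ln 2 - \frac{1}{2}(0.2231) = 0.5815$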
We transform to a standard normal distribution in order to use the probability tables:

$$Z = \frac{Y - \mu_y}{\sigma_y} = \frac{\ln X - \mu_y}{\sigma_y}$$
The Z-score corresponding to X = 1 or Y = ln X = 0 is: $Z = \frac{\ln 1 - 0.5815}{0.4724} = -1.2311$
From the standard normal distribution tables: P (X < 1) = P (Z < −1.23) = 0.5 − 0.3907 = 0.1093
Therefore, 10.93% of the particles will pass through the screen.
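The rock-crusher calculation can be followed step by step in code. This is a minimal Python sketch under the same formulas as above; the variable names are assumptions, and the standard normal cdf is evaluated with `math.erf` instead of the table, so the final digit may differ slightly from the tabulated result.

```python
from math import erf, log, sqrt

mu_x, sigma_x = 2.0, 1.0  # mean and standard deviation of particle size X (cm)

# Parameters of Y = ln(X), which is normally distributed
var_y = log(1 + (sigma_x / mu_x) ** 2)
sigma_y = sqrt(var_y)
mu_y = log(mu_x) - 0.5 * var_y

# Z-score for the 1 cm mesh size, then P(X < 1) = P(Z < z)
z = (log(1.0) - mu_y) / sigma_y
p_pass = 0.5 * (1 + erf(z / sqrt(2)))

print(round(var_y, 4), round(sigma_y, 4))  # 0.2231 0.4724
print(round(z, 4))                         # about -1.2312
print(round(p_pass, 3))                    # 0.109
```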
10.4 Exponential Distribution
The exponential distribution is often used to describe the amount of time until some specific
event happens. It is a process in which events happen continuously and independently at
a constant average rate. This rate is called the distribution rate λ. So the exponential
distribution can be used to describe the time between occurrences of successive events as
time progresses continuously.
The exponential probability density function is given by:
$$f(x) = \lambda e^{-\lambda x} \qquad x > 0;\ \lambda > 0$$
where the mean, $\mu = \frac{1}{\lambda}$ and standard deviation, $\sigma = \frac{1}{\lambda}$
An example of the graph of this function is shown below:
The cumulative distribution function is given by:

$$F(x) = \int_0^x \lambda e^{-\lambda u}\,du = 1 - e^{-\lambda x}$$
Relationship between Exponential and Poisson Distributions
Recall that the Poisson (discrete) distribution describes the probability that x events will
occur over a given length of time, and has a probability mass function:
$$f(x) = \frac{\mu^x e^{-\mu}}{x!}$$
where the single parameter µ is the mean number of events that occur in the given time.
We can define an average rate of occurrence as: $\lambda = \frac{\mu}{t}$
For example, if µ = 5 occurrences in 10 hrs, then $\lambda = \frac{5}{10} = 0.5$ occurrences per hour.
Substituting into the Poisson pmf gives: $f(x) = \frac{(\lambda t)^x e^{-\lambda t}}{x!}$
The probability of no events occurring in time t is: $P(X = 0) = f(0) = \frac{(\lambda t)^0 e^{-\lambda t}}{0!} = e^{-\lambda t}$
If we let X be the random variable for the time required for the first event to occur, i.e. the
time to the first event, then the probability that the length of time to the first event will
exceed x is the same as the probability that no events occur in time x.
Using the above equation: $P(X > x) = e^{-\lambda x}$
Therefore: $P(0 < X \le x) = 1 - e^{-\lambda x}$
This is the cumulative distribution function (cdf) from above: $F(x) = \int_0^x f(u)\,du$
Therefore, we differentiate to find the probability density function: $f(x) = \frac{dF(x)}{dx} = \lambda e^{-\lambda x}$.
This is the density function for the exponential distribution.
Note: For the exponential distribution, we often write: $\lambda = \frac{1}{\beta}$, which gives a pdf of:
$$f(x) = \frac{1}{\beta}\, e^{-x/\beta}$$
The mean and standard deviation are given by: µ = β and σ = β
In this form, β is the mean time between events. In reliability theory, where we are concerned
with equipment failure, β is called the mean time between failures, or mean time to failure
(MTTF), and $\lambda = \frac{1}{\beta}$ is the mean failure rate (e.g. failures per hour, per cycle etc.).
In this application, the exponential distribution is based on the assumption of a constant
mean failure rate.
Example: A device has a mean failure rate of 0.05 failures per hour of operation. Calculate
the probability that the device will fail in the first 10 hours of operation.
Using the exponential pdf with λ = 0.05:
$$P(X \le 10) = \int_0^{10} f(x)\,dx = \int_0^{10} \lambda e^{-\lambda x}\,dx = \left[-e^{-0.05x}\right]_0^{10} = 1 - e^{-0.05(10)} = 0.3935$$
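The same result follows directly from the cumulative distribution function. A short Python sketch, with `exponential_cdf` as an assumed helper name:

```python
from math import exp

def exponential_cdf(x: float, lam: float) -> float:
    """F(x) = 1 - exp(-lam * x) for x > 0, and 0 otherwise."""
    return 1 - exp(-lam * x) if x > 0 else 0.0

lam = 0.05                 # mean failure rate (failures per hour)
mttf = 1 / lam             # mean time to failure, beta = 1/lambda
p_fail_10h = exponential_cdf(10, lam)

print(mttf)                    # 20.0
print(round(p_fail_10h, 4))    # 0.3935
```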
10.5 Gamma Distribution
The Gamma Distribution models the time that elapses until a specified
number of Poisson events have occurred.
We let α be the specified number of events, and β is the mean time between events as for
the exponential distribution.
The gamma distribution is based on the gamma function, which is defined as:
$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx \qquad \alpha > 0$$
Some properties of the gamma function

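1. $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$
2. $\Gamma(1) = 1$
3. $\Gamma(n) = (n - 1)!$ for a positive integer n
4. $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$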
The Gamma Function generalizes the factorial function for arbitrary values of α.
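The gamma distribution includes the two parameters α and β and has a probability density
function:
$$f(x) = \begin{cases} \dfrac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta} & x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
where α > 0 and β > 0.
The mean and variance of the gamma distribution are:
$$\mu = \alpha\beta \quad \text{and} \quad \sigma^2 = \alpha\beta^2$$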
The exponential distribution is a special case of the gamma distribution with α = 1.
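The properties of the gamma function, and the reduction of the gamma distribution to the exponential distribution when α = 1, can be checked with Python's standard `math.gamma`. The function `gamma_pdf` and the test values (x = 3, β = 20) below are illustrative assumptions.

```python
from math import exp, factorial, gamma, isclose, pi, sqrt

def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """Gamma density with parameters alpha and beta (0 for x <= 0)."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * exp(-x / beta) / (beta**alpha * gamma(alpha))

# Properties of the gamma function
print(isclose(gamma(4.5), 3.5 * gamma(3.5)))   # Gamma(a) = (a - 1) Gamma(a - 1)
print(isclose(gamma(5), factorial(4)))         # Gamma(n) = (n - 1)!
print(isclose(gamma(0.5), sqrt(pi)))           # Gamma(1/2) = sqrt(pi)

# With alpha = 1 the gamma pdf equals the exponential pdf (1/beta) exp(-x/beta)
x, beta = 3.0, 20.0
print(isclose(gamma_pdf(x, 1.0, beta), exp(-x / beta) / beta))
```

Each line should print `True`.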
10.6 Weibull Distribution
Like the gamma and exponential distributions, the Weibull distribution is also applied to
reliability and life-testing problems such as the time to failure or life length of a component,
measured from some specified time until it fails.
A random variable T that is described by a Weibull distribution with two parameters α (the
scale parameter) and β (the shape parameter) has a probability density function
$$f(t) = \alpha\beta t^{\beta-1} e^{-\alpha t^{\beta}}$$
where t > 0, α > 0 and β > 0.
The shape factor β, is related to the mean failure rate which is not necessarily constant in
this case, unlike the exponential distribution. And the scale factor α, is used to describe the
variability in the random variable being described.
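The mean and variance are given by:
$$\mu = \alpha^{-1/\beta}\,\Gamma\left(1 + \frac{1}{\beta}\right) \quad \text{and} \quad \sigma^2 = \alpha^{-2/\beta}\left\{\Gamma\left(1 + \frac{2}{\beta}\right) - \left[\Gamma\left(1 + \frac{1}{\beta}\right)\right]^2\right\}$$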
The cumulative distribution function (cdf) is given by:
$$F(t) = 1 - e^{-\alpha t^{\beta}}$$
where t > 0, α > 0 and β > 0
The shape of the plot of the Weibull pdf varies considerably with values of α and β. As such
it has wide application, and it is widely used in equipment reliability modelling. The figure
shows graphs of f (t) for α = 1 and different β values.
Example: The time to failure of a machine component follows a Weibull Distribution.
Let T be the random variable describing the time to failure, in hours with parameter values
α = 0.01 and β = 2. Calculate the probability that the machine part fails before 10 hrs of
operation.
$$P(T < 10) = F(10) = 1 - e^{-(0.01)(10)^2} = 1 - 0.3679 = 0.6321$$
i.e. there is a 63.21% chance of failure within the first 10 hours.
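This example can be reproduced in Python using the cdf and the mean formula given earlier in this section. The function names are assumptions made for illustration; the mean time to failure is an extra quantity not asked for in the example.

```python
from math import exp, gamma

def weibull_cdf(t: float, alpha: float, beta: float) -> float:
    """F(t) = 1 - exp(-alpha * t**beta) for t > 0, and 0 otherwise."""
    return 1 - exp(-alpha * t**beta) if t > 0 else 0.0

def weibull_mean(alpha: float, beta: float) -> float:
    """Mean: alpha**(-1/beta) * Gamma(1 + 1/beta)."""
    return alpha ** (-1 / beta) * gamma(1 + 1 / beta)

alpha, beta = 0.01, 2

p_fail_before_10 = weibull_cdf(10, alpha, beta)
mean_life = weibull_mean(alpha, beta)

print(round(p_fail_before_10, 4))  # 0.6321
print(round(mean_life, 2))         # about 8.86 hours
```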
Failure Rate for the Weibull Distribution
Let R(t) be the reliability of a component.
The reliability is defined as the probability that the component will survive at least until a
specified time under operating conditions.
(Conversely – the unreliability is the probability of failure within the specified time period).
$$R(t) = P(T > t) = \int_t^{\infty} f(u)\,du = 1 - F(t)$$
Where f (t) is the probability density function of the time to failure and F (t) is the corre-
sponding cumulative distribution function.
The conditional probability that the component will fail in the time interval from T = t to
T = t + ∆t, given that it has survived to time t, is given by:
$$\frac{F(t + \Delta t) - F(t)}{R(t)}$$
The failure rate function (failures per unit time) Z(t) is calculated by dividing by ∆t and
taking the limit as ∆t → 0. Therefore:
$$Z(t) = \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t}\,\frac{1}{R(t)} = \frac{F'(t)}{R(t)} = \frac{f(t)}{R(t)} = \frac{f(t)}{1 - F(t)}$$
For the Weibull Distribution, this gives:
$$Z(t) = \alpha\beta t^{\beta-1} \quad \text{for } t > 0$$
We can use this to model component failure where the failure rate is not constant. We can
interpret different failure rates as:
(a) β = 1: Constant failure rate. Weibull Dist. reduces to the exponential dist.
(b) β > 1 : The failure rate increases with time. i.e. the components show wear or damage.
(c) β < 1 : The failure rate decreases with time. The components get stronger with time.
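The three cases can be illustrated numerically. The sketch below, with assumed function names and arbitrarily chosen sample times, evaluates Z(t) = αβt^(β−1) for α = 1 and three values of β, and also confirms that f(t)/R(t) gives the same result as the closed-form failure rate.

```python
from math import exp, isclose

def weibull_pdf(t: float, alpha: float, beta: float) -> float:
    """f(t) = alpha * beta * t**(beta - 1) * exp(-alpha * t**beta), t > 0."""
    return alpha * beta * t ** (beta - 1) * exp(-alpha * t**beta)

def reliability(t: float, alpha: float, beta: float) -> float:
    """R(t) = 1 - F(t) = exp(-alpha * t**beta)."""
    return exp(-alpha * t**beta)

def failure_rate(t: float, alpha: float, beta: float) -> float:
    """Z(t) = alpha * beta * t**(beta - 1)."""
    return alpha * beta * t ** (beta - 1)

alpha = 1.0
times = (1.0, 2.0, 4.0)  # sample times chosen only for illustration
for beta in (0.5, 1.0, 2.0):
    rates = [round(failure_rate(t, alpha, beta), 3) for t in times]
    print(beta, rates)   # decreasing, constant, increasing respectively

# Z(t) agrees with f(t) / R(t)
print(isclose(weibull_pdf(3.0, 0.01, 2.0) / reliability(3.0, 0.01, 2.0),
              failure_rate(3.0, 0.01, 2.0)))
```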
10.7 EXERCISES - Continuous Probability Distributions
1. A company pays its employees an average wage of $15.90 an hour with a standard
deviation of $1.50. If the wages are approximately normally distributed and paid to
the nearest cent, calculate:
(a) the percentage of workers that receive wages between $13.75 and $16.22 per hour
(b) the hourly wage that the highest 5% of employees get paid.
2. The IQs of 600 applicants to a certain college are approximately normally distributed
with a mean of 115 and a standard deviation of 12. If the college requires an IQ of at
least 95, how many of these students will be rejected on this basis of IQ, regardless of
their other qualifications? Note that IQs are recorded to the nearest integers.
3. The length of time for one individual to be served at a cafeteria is a random variable
having an exponential distribution with a mean of 4 minutes. What is the probability
that a person is served in less than 3 minutes on at least 4 of the next 6 days?
4. The response time of a computer system, in seconds, has an exponential distribution
with a mean of 3 seconds. (a) What is the probability that response time exceeds 5
seconds? (b) What is the probability that response time exceeds 10 seconds?
5. The service life, in years, of a hearing aid battery is a random variable having a Weibull
distribution with $\alpha = \frac{1}{2}$ and β = 2.
(a) After what time, is it expected that 50% of a batch of these batteries are dead?
(b) What is the probability that such a battery will be operating after 2 years?
6. The life of a car door seal has a Weibull distribution with failure rate $Z(t) = \frac{1}{\sqrt{t}}$.
Find the probability that such a seal is still intact after 4 years. Hint: the Weibull
distribution can be written as: $f(t) = Z(t)\, e^{-\alpha t^{\beta}}$.
7. A manufacturer of a large machine wishes to buy rivets from one of two manufactur-
ers. It is important that the breaking strengths of each rivet exceed 10,000 psi. Two
manufacturers (A and B) offer this type of rivet and both have rivets whose breaking
strength is normally distributed. The mean breaking strengths for manufacturers A
and B are 14,000 psi and 13,000 psi, respectively. The standard deviations are 2000
psi and 1000 psi, respectively. Which manufacturer will produce, on the average, the
fewest number of defective rivets?
8. The life of a device follows an exponential distribution with an advertised failure rate
of 0.01 per hour.
(a) What is the mean time to failure?

(b) What is the probability that 200 hours will pass before a failure is observed?
10.8 SELECTED ANSWERS - Continuous Probability Distributions
1. (a) 0.5068 = 50.68% (b) 18.3674 = $ 18.37
2. 26 students (rounded to the nearest integer).
3. 0.3968 = 39.68%
4. (a) 0.1889 = 18.89% (b) 0.0357 = 3.57%

5. (a) $\sqrt{2 \ln 2} = 1.1774$ ≈ 1 year, 64 days (b) 0.1353 = 13.53%
6. 0.0183 = 1.83%
7. Manufacturer B will, on average, produce fewer defective rivets.

8. (a) Mean: $\mu = \frac{1}{\lambda} = 100$ (b) 0.1353 = 13.53%
11 Sampling and Hypothesis Testing
11.1 Chebychev’s Theorem
Chebychev found that, for any probability distribution, the fraction of the area lying between
two values symmetric about the mean is related to the standard deviation.
The probability that a random variable falls between two values is equal to the area under
the distribution between those values.
Statement of Chebychev’s Theorem
The probability that any random variable X will assume a value within k standard deviations
of the mean is at least $1 - \frac{1}{k^2}$, i.e.
$$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2}$$
Example: For k = 2, a random variable X has a minimum probability of $1 - \frac{1}{2^2} = \frac{3}{4}$ of
falling within two standard deviations either side of the mean.
Note that the theorem gives a minimum value of the probability. The actual probability will
be at least this value, and is often considerably greater.
Chebychev’s Theorem holds for any probability distribution (discrete or continuous).
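As a rough numerical check of the theorem, the following Python sketch compares the Chebychev lower bound with the actual probability of falling within k standard deviations of the mean. The helper names (`chebychev_lower_bound`, `normal_within_k`, `exponential_within_k`) and the choice of a standard normal and a rate-1 exponential distribution are assumptions made purely for illustration; they do not come from the notes.

```python
import math
from statistics import NormalDist


def chebychev_lower_bound(k: float) -> float:
    """Chebychev's minimum probability of lying within k standard deviations."""
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 - 1.0 / k**2


def normal_within_k(k: float) -> float:
    """Actual P(mu - k*sigma < X < mu + k*sigma) for a normal distribution."""
    z = NormalDist()  # standard normal: mu = 0, sigma = 1
    return z.cdf(k) - z.cdf(-k)


def exponential_within_k(k: float, rate: float = 1.0) -> float:
    """Actual probability for an exponential distribution (illustrative choice)."""
    mu = sigma = 1.0 / rate           # mean and standard deviation are both 1/rate
    lower = max(mu - k * sigma, 0.0)  # the distribution has no mass below zero
    upper = mu + k * sigma

    def cdf(x: float) -> float:
        return 1.0 - math.exp(-rate * x)

    return cdf(upper) - cdf(lower)


if __name__ == "__main__":
    for k in (1.5, 2.0, 3.0):
        print(f"k = {k}: bound = {chebychev_lower_bound(k):.4f}, "
              f"normal = {normal_within_k(k):.4f}, "
              f"exponential = {exponential_within_k(k):.4f}")
```

In each case the actual probability should come out at or above the bound, which illustrates why the theorem only gives a minimum value.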
11.2 Sampling and Sampling Distributions
11.2.1 Populations and Samples
A population is the entire set or collection of observations. i.e. it is the totality of possible
observations with which we are concerned.
• Each observation is a value of random variable X.
• X has a probability distribution f (x).
A sample is a smaller set of observations taken from the population. i.e. it is a subset of
the population. For example, the total number of resistors in a box might be 100 000. A
sample of 100 resistors may be taken from the box.
Generally we are interested in making inferences about the characteristics of a population
from a sample.
A parameter is a descriptive measure used to describe a characteristic of a population. For
example, the population mean µ and the population standard deviation σ.
A statistic is a descriptive measure used to describe a characteristic of a sample.
The main objective of statistics is to make inferences about a population based on the
information contained in a sample.
11.2.2 Measures of Location
Sample Mean: For a set of observations x1 , x2 , x3 , . . . , xn , the sample mean is given by:
$$\bar{x} = \sum_{i=1}^{n} \frac{x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$$
Sample Median: For an ordered set of observations x1 , x2 , x3 , . . . , xn , the sample median
is “the middle number”:
$$\tilde{x} = \begin{cases} x_{(n+1)/2} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{n/2} + x_{n/2+1}\right) & \text{if } n \text{ is even} \end{cases}$$
The mean is sensitive to extreme observations, whereas the median is less affected.
11.2.3 Measures of Variability
Sample Range: $x_{\max} - x_{\min}$
Sample Variance: $s^2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n-1}$
Sample Standard Deviation: $s = \sqrt{s^2}$
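The measures of location and variability above can be checked numerically. The sketch below implements the formulas directly and compares them with Python's standard `statistics` module. The function names and the data set are hypothetical and chosen only for illustration.

```python
import math
import statistics


def sample_mean(xs):
    """x-bar: the sum of the observations divided by n."""
    return sum(xs) / len(xs)


def sample_median(xs):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                         # x_{(n+1)/2} in 1-based indexing
    return 0.5 * (ordered[mid - 1] + ordered[mid])  # (x_{n/2} + x_{n/2+1}) / 2


def sample_variance(xs):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    x_bar = sample_mean(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)


def sample_std(xs):
    """s: the square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


if __name__ == "__main__":
    data = [4.9, 5.1, 5.0, 5.3, 4.7, 9.8]  # hypothetical data; 9.8 is an extreme value
    print("range   :", max(data) - min(data))
    print("mean    :", sample_mean(data), "vs", statistics.mean(data))
    print("median  :", sample_median(data), "vs", statistics.median(data))
    print("variance:", sample_variance(data), "vs", statistics.variance(data))
    print("std dev :", sample_std(data), "vs", statistics.stdev(data))
```

For this hypothetical data set the single extreme value pulls the mean (about 5.8) well above the median (about 5.05), which is the sensitivity noted in 11.2.2.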
11.3 Central Limit Theorem
The central limit theorem is an important concept in statistics and is widely used.
• Consider a population with a mean µ, and standard deviation σ, with some probability
distribution.
• Repeatedly take sufficiently large random samples (of size n) from the population.
• The sample means are a random variable and hence will have their own probability
distribution.
• The central limit theorem states that the distribution of the sample means approaches a
normal distribution as the sample size n increases.
• This is irrespective of the shape of the original probability distribution. For example
if the population distribution is binomial (discrete) or exponential (continuous) – the
distribution of sample means will approximate a normal distribution for a large enough
sample.
• If the population distribution is normal, the sampling distribution of the means will
be normal irrespective of sample size.
• What is a large enough sample size?
– As a rule of thumb, n > 30.
– If the population exhibits a normal distribution, then the central limit theorem
holds for samples of any size.
• In some way it explains why the normal distribution is so prevalent.
For the probability distribution of the sample means x̄:
Mean of sample means: $\mu_{\bar{x}} = \mu$
Standard deviation of the sample means: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
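The two results above can be illustrated with a small simulation. The sketch below repeatedly draws samples of size n from an exponential population and looks at the resulting sample means. The function name `simulate_sample_means`, the exponential population with rate 1 (so µ = σ = 1), the sample size of 36 and the number of repetitions are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_sample_means(n, num_samples, rate=1.0, seed=0):
    """Draw num_samples random samples of size n from an exponential
    population and return the list of their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(rate) for _ in range(n)]
        means.append(sum(sample) / n)
    return means


if __name__ == "__main__":
    n = 36
    mu, sigma = 1.0, 1.0  # exponential population with rate 1
    means = simulate_sample_means(n=n, num_samples=5000)
    print("mean of sample means:", statistics.mean(means), "theory:", mu)
    print("sd of sample means  :", statistics.stdev(means),
          "theory:", sigma / n ** 0.5)
```

The simulated mean and standard deviation of the sample means should land close to µ and σ/√n respectively, even though the population itself is strongly skewed.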
11.4 Statistical Hypothesis Testing
Hypothesis testing involves using sample data to test whether or not a claim about a par-
ticular population parameter is true.
It involves consideration of two contradictory hypotheses, known as the null and alternative
hypotheses.
Alternative Hypothesis (H1 ):

• The hypothesis that the researcher wishes to support.
Null Hypothesis (H0 ):
• The contradiction to the alternative hypothesis.
• Initially assumed to be true.
• Typically represents the pessimistic point of view or the status quo.
• It is generally specific. i.e. it identifies one particular value for the parameter under
test.
11.4.1 Similarities with a Court Trial
Statistical testing of a hypothesis follows a process similar to a court trial:

• The court assumes the accused is innocent until proven guilty.
• The prosecution must present sufficient evidence to contradict the “not guilty” assumption.
• If the prosecution does not present compelling evidence to disprove the “not guilty”
assumption then:
– Innocence is not proved.
– Rather, there is insufficient evidence to conclude guilt.
In a statistical hypothesis test:
• The null hypothesis is assumed to be correct.
• The researcher (playing the role of the prosecutor) believes the alternative hypothesis
to be true.
• Evidence from the sample (sample statistics) is used to decide whether to reject the null hypothesis.
A statistical hypothesis test has four main elements:
• A null hypothesis.
• An alternative hypothesis.
• A test statistic.
• A rejection region.
Example: Consider a population of students at a university. Consider the weight of these
students.
• Null Hypothesis: The average weight of students is 68 kg.
• Alternative Hypothesis: The average weight of students is not 68 kg.
• Test Statistic: The mean of a sufficiently large sample of students. Say, for example,
we take a sample of 36 students and calculate the mean weight.
• Rejection Region: If the mean of the sample is less than 67 or greater than 69, we reject
the null hypothesis, i.e. we accept that the mean of the population is something other than 68.
The choice of the boundaries of the rejection region is (at this stage) somewhat arbitrary.
The boundaries take on more significance after the next section.
11.4.2 Errors in Hypothesis Testing
Type I Error: A False-Positive
Rejecting the null hypothesis when it is actually true.
(In the court case analogy this would be where the accused is found guilty when they are
actually innocent.)
The probability of making a type I error is denoted by α, which is also known as the level of
significance or the significance level.
Type II Error: A False-Negative
Failing to reject the null hypothesis when it is actually false.
(In the court case analogy this would be where the accused is not found guilty when they
actually are guilty.)
The probability of making a type II error is denoted by β.
Decision Table
Decision     | H0 is True        | H0 is False
Reject H0    | Type I Error      | Correct Decision
Accept H0    | Correct Decision  | Type II Error
Example (revisited): For the example above, let us consider that we have a sample size
of 36 students.
The test statistic is the mean of the sample, x̄.
It would be reasonable to assume that the central limit theorem applies and that the sample
means follow a normal distribution.
Assume that we know that the standard deviation of the population σ = 3.6. Note that
we often do not know this and need to use the standard deviation of the sample s, as an
estimate of σ.
Mean of the sample means: $\mu_{\bar{x}} = \mu$
Standard deviation of the sample means: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.6}{\sqrt{36}} = 0.6$
Probability of a type I error
The probability of committing a type I error (i.e. rejecting that the mean is 68 when it is
actually true) is given by:
α = P (x̄ < 67 when µ = 68) + P (x̄ > 69 when µ = 68)
Transform to a standard normal distribution:

Z-scores: $z_1 = \frac{67 - 68}{0.6} = -1.67$ and $z_2 = \frac{69 - 68}{0.6} = 1.67$
From normal distribution probability tables we get: α = P(Z < −1.67) + P(Z > 1.67) = 2 × 0.0475 = 0.0950,
i.e. there is a 9.5% chance that the sample we took has a mean in the rejection region even
though the mean of the population is 68.
To reduce the chance of a type I error:

• Increase the sample size.
• Increase the size of the acceptance region. (Reduce the size of the rejection region)
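The type I error calculation above can be reproduced without tables. The following sketch uses `statistics.NormalDist` from the Python standard library for the sampling distribution of x̄. The function name `type_i_error` and its parameter names are hypothetical, and the alternative sample size (64) and wider acceptance region (66.5 to 69.5) are illustrative values that do not appear in the notes.

```python
from math import sqrt
from statistics import NormalDist


def type_i_error(mu0, sigma, n, lower, upper):
    """alpha = P(x_bar < lower) + P(x_bar > upper) when the true mean is mu0."""
    # Sampling distribution of the mean: normal with sd sigma / sqrt(n).
    sampling_dist = NormalDist(mu=mu0, sigma=sigma / sqrt(n))
    return sampling_dist.cdf(lower) + (1.0 - sampling_dist.cdf(upper))


if __name__ == "__main__":
    alpha = type_i_error(mu0=68, sigma=3.6, n=36, lower=67, upper=69)
    print(f"alpha (n = 36, accept 67 to 69)    : {alpha:.4f}")
    # Effect of the two remedies listed above (illustrative values):
    print(f"alpha (n = 64, accept 67 to 69)    : "
          f"{type_i_error(68, 3.6, 64, 67, 69):.4f}")
    print(f"alpha (n = 36, accept 66.5 to 69.5): "
          f"{type_i_error(68, 3.6, 36, 66.5, 69.5):.4f}")
```

Because the z-scores are not rounded to two decimal places here, the computed α comes out slightly above the table-based value of 0.0950 (roughly 0.096). Both a larger sample and a wider acceptance region should give a smaller α.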
Probability of a type II error
A type II error is committed when we fail to reject the hypothesis that the mean is 68 even
though it is not.
It is not possible to calculate the probability of making a type II error without making some
specific assumption about the alternative hypothesis.
Example: Assume that the true mean is µ = 70 kg.
A type II error will occur when the sample mean falls between 67 and 69 while µ = 70 kg is
true.
The probability of a type II error is:
β = P (67 ≤ x̄ ≤ 69 when µ = 70)
Transform to a standard normal distribution:

Z-scores: $z_1 = \frac{67 - 70}{0.6} = -5$ and $z_2 = \frac{69 - 70}{0.6} = -1.67$ ⇒ β = 0.047 = 4.7%
To reduce the chance of a type II error:

• Increase the sample size.
• Reduce the size of the acceptance region. (increase the size of the rejection region)
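The type II error for a specific alternative can be computed in the same spirit. The sketch below again relies on `statistics.NormalDist`. The function name `type_ii_error` and its parameter names are hypothetical, and the alternative sample size (64) and narrower acceptance region (67.5 to 68.5) are illustrative values that do not come from the notes.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu_true, sigma, n, lower, upper):
    """beta = P(lower <= x_bar <= upper) when the true mean is mu_true."""
    # Sampling distribution of the mean under the assumed alternative.
    sampling_dist = NormalDist(mu=mu_true, sigma=sigma / sqrt(n))
    return sampling_dist.cdf(upper) - sampling_dist.cdf(lower)


if __name__ == "__main__":
    beta = type_ii_error(mu_true=70, sigma=3.6, n=36, lower=67, upper=69)
    print(f"beta (n = 36, accept 67 to 69)    : {beta:.4f}")
    # Effect of the two remedies listed above (illustrative values):
    print(f"beta (n = 64, accept 67 to 69)    : "
          f"{type_ii_error(70, 3.6, 64, 67, 69):.4f}")
    print(f"beta (n = 36, accept 67.5 to 68.5): "
          f"{type_ii_error(70, 3.6, 36, 67.5, 68.5):.4f}")
```

With unrounded z-scores the first value comes out at roughly 0.048, close to the table-based 4.7%. A larger sample or a narrower acceptance region should reduce β, at the cost (for the narrower region) of a larger α.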
11.5 EXERCISES - Sampling and Hypothesis Testing
1. An electrical firm manufactures a 100-watt light bulb, which, according to specifications
written on the package, has a mean life of 900 hours with a standard deviation of 50
hours. At most, what percentage of the bulbs fail to last even 700 hours? Assume that
the distribution is symmetric about the mean.
2. At a particular school, it is found that the mean height of children in grade 10 is 150cm
and the standard deviation is 5 cm. A teacher suspects there is a problem with this
result and believes they can prove it, as their class contains 10 children that are taller
than 165 cm and 8 children that are shorter than 135 cm. Given there are 100 children
in grade 10, how can the teacher prove there is a problem?
3. Suppose that an allergist wishes to test the hypothesis that at least 30% of the public
is allergic to some cheese products. Explain how the allergist could commit
(a) a type I error?
(b) a type II error?
4. The proportion of adults living in a small town who are college graduates is estimated
to be p = 0.6. To test this hypothesis, a random sample of 15 adults is selected. If
the number of college graduates in our sample is anywhere from 6 to 12, we shall not
reject the null hypothesis that p = 0.6; otherwise, we shall conclude that p ≠ 0.6.
(a) Evaluate the type I error assuming that p = 0.6. Use the binomial distribution.
(b) Evaluate the type II error for the alternatives of p = 0.5 and p = 0.7.
(c) Is this a good test procedure?
5. A dry cleaning establishment claims that a new spot remover will remove more than
70% of the spots to which it is applied. To check this claim, the spot remover will be
used on 12 spots chosen at random. If fewer than 11 of the spots are removed, we shall
not reject the null hypothesis that p = 0.7; otherwise, we conclude that p > 0.7.
(a) Evaluate the type I error, assuming that p = 0.7.

(b) Evaluate the type II error for the alternative p = 0.9.
6. A manufacturer has developed a new fishing line, which he claims has a mean breaking
strength of 15 kilograms with a standard deviation of 0.5 kilograms. To test the
hypothesis that µ = 15 kilograms against the alternative that µ < 15 kilograms, a
random sample of 50 lines will be tested. The critical region is defined to be x̄ < 14.9.
(a) Find the probability of committing a type I error.

(b) Evaluate the type II error for the alternatives µ = 14.8 and µ = 14.9 kilograms.
7. A manufacturer of transistors guarantees that in a batch of 100 transistors, on average
90 will be free of defects. Assume a normal distribution with a standard deviation of 4.
(a) If we wish to test this hypothesis with a 5% significance level against the case of
there being more transistors with defects, what is the critical region?
(b) If we instead believe that only 80 transistors on average are free of defect, what
is the chance of a type II error?
8. It is stated that the length of string in a roll is 5m ± 0.1m with a 95% confidence level
(i.e. a 5% significance level of a type I error).
(a) Assuming a normally distributed length, what is the standard deviation?
(b) Given the alternative of 5m and a standard deviation of 1m, what is the proba-
bility of a type II error?
9. The number of holes in a 100 m² area of old roofing iron is believed to be 3 and follows
an exponential distribution.
(a) Given a 5% significance level, where we are concerned only with more holes, what
is the critical region to reject the null hypothesis?
(b) A buyer believes there is a much larger number of 10 holes on average. What is
the probability of a type II error, i.e. how easily can they prove this?
(c) The seller believes there is only 1 hole on average. Find the 5% significance level,
when we are concerned only with fewer holes and find the probability of a type
II error in this case. How easily can the seller prove their case?
10. The time that a sacrificial anode lasts on the hull of a ship is believed to follow a
Weibull distribution with β = 3.
(a) It is also believed that 50% of the anodes last less than 1 year. What is the value
of α in the Weibull distribution?
(b) What is the critical region for a 5% significance level?
(c) Given the alternative, that 50% of the anodes last only 6 months, what is the
probability of a type II error?
11.6 SELECTED ANSWERS - Sampling and Hypothesis Testing
1. P (X ≤ 700) ≤ 0.0312
2. P (135 < X < 165) ≥ 0.8889, P (outside range) ≤ 11.11%
3. (a) The allergist concludes that less than 30% of the public are allergic to some cheese
products when, in fact, 30% or more are allergic.
(b) The allergist concludes that at least 30% of the public are allergic to some cheese
products when, in fact, less than 30% are allergic.
4. (a) 0.0609 = 6.09% (b) p = 0.5: 0.8454 = 84.54%. p = 0.7: 0.8695 = 86.95%
(c) Not a good test.
5. (a) 0.0850 = 8.5% (b) 0.3410 = 34.1%
6. (a) 0.0793 = 7.93% (b) 0.5 = 50%
7. (a) xC = 83.42 (b) 0.1959 = 19.59%
8.
9.
10.