MA2000 Notes
MA2000
LECTURE NOTES
(Including Tutorial Exercises)
2 Line Integrals
2.1 Arc Length
2.2 Integral of a function, along a curve
2.3 Work Done
2.4 Path dependence
2.5 EXERCISES - Line Integrals
2.6 SELECTED ANSWERS - Line Integrals
3 Multiple Integration
3.1 Double Integrals
3.1.1 Polar Coordinates
3.2 Triple Integrals
3.2.1 Polar Coordinates
3.3 Surface Integrals
3.4 EXERCISES - Multiple Integration
3.5 SELECTED ANSWERS - Multiple Integration
4 Vector Calculus
4.1 Divergence of a Vector Field
4.2 Curl of a Vector Field
4.2.1 Some Important Identities
4.3 EXERCISES - Vector Calculus
4.4 SELECTED ANSWERS - Vector Calculus
5 Fourier Series
5.1 Periodic Functions
5.1.1 Trigonometric Series
5.2 Half-range Series
5.3 EXERCISES - Fourier Series
5.4 SELECTED ANSWERS - Fourier Series
Functions of two variables will have the form z = f (x, y). They have two independent
variables (x and y) and one dependent variable (z).
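As a second example, consider a vibrating string that is fixed at x = 0 and x = l. Its displacement u will depend on x and t. For example
$$u(x, t) = \sin\frac{\pi x}{l}\,\cos\frac{\pi c t}{l}.$$
[Figure: the displacement u of the string plotted against x, between x = 0 and x = l.]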
To represent z = f (x, y) graphically, we need three dimensions. The function actually
represents a two dimensional surface in three dimensions. Consider the function $z = x^2 + y^2$.
At each point in the x–y plane, calculate z and plot the point (x, y, z).
The points make up a surface called a paraboloid. It is actually the parabola $z = x^2$ rotated about the z-axis.
[Figure: the paraboloid $z = f(x, y) = x^2 + y^2$, with a point (x, y) in the x–y plane and the corresponding height z.]
The function $z = xy$ represents a 'saddle'.
[Figure: the saddle surface $z = xy$.]
Finally, we have already seen that the function z = ax + by + c represents a flat plane.
Another way to represent these functions is to use 'level curves'. For example, if $z = f(x, y) = x^2 + y^2$, the level curves are z = constant or $x^2 + y^2 = \text{constant} = a^2$.
1.2 Derivatives
The derivative is the rate of change of the function as x and y change. But the change in
(x, y) can be in any direction.
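For example, for the cone, $V = \frac{1}{3}\pi r^2 h$, we can analyse how V changes as h → h + δh:
$$V \to V + \delta V = \frac{1}{3}\pi r^2 (h + \delta h) = \frac{1}{3}\pi r^2 h + \frac{1}{3}\pi r^2\,\delta h$$
so $\delta V = \frac{1}{3}\pi r^2\,\delta h$. The rate of change is $\frac{\delta V}{\delta h} = \frac{1}{3}\pi r^2$ so
$$\frac{\partial V}{\partial h} = \lim_{\delta h \to 0}\frac{\delta V}{\delta h} = \frac{1}{3}\pi r^2.$$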
This is the partial derivative with respect to h (keeping r constant).
The ∂ indicates that the other variable is kept constant.
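Similarly, as r → r + δr (keeping h constant),
$$V + \delta V = \frac{1}{3}\pi (r + \delta r)^2 h$$
so $\delta V = \frac{1}{3}\pi\left(2r\,\delta r + (\delta r)^2\right)h$. The rate of change is $\frac{\delta V}{\delta r} = \frac{1}{3}\pi(2r + \delta r)h$ and
$$\frac{\partial V}{\partial r} = \lim_{\delta r \to 0}\frac{\delta V}{\delta r} = \frac{2}{3}\pi r h.$$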
Note that we do each differentiation keeping the other variable fixed.
On the surface, the derivative indicates a movement either uphill or downhill and how fast.
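We can also take higher derivatives. The notation is
$$f_x = \frac{\partial f}{\partial x},\quad f_y = \frac{\partial f}{\partial y},\quad f_{xx} = \frac{\partial^2 f}{\partial x^2},\quad f_{yy} = \frac{\partial^2 f}{\partial y^2},\quad f_{xy} = \frac{\partial^2 f}{\partial y\,\partial x},\quad f_{yx} = \frac{\partial^2 f}{\partial x\,\partial y},\quad\text{etc.}$$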
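Example: $f(x, y) = 2x^2 + 2x^3 y^4 - 2xy$
$$f_x = \frac{\partial f}{\partial x} = 4x + 6x^2 y^4 - 2y,\qquad f_y = \frac{\partial f}{\partial y} = 8x^3 y^3 - 2x$$
$$f_{xx} = \frac{\partial^2 f}{\partial x^2} = 4 + 12xy^4,\qquad f_{yx} = \frac{\partial^2 f}{\partial x\,\partial y} = 24x^2 y^3 - 2$$
$$f_{xy} = \frac{\partial^2 f}{\partial y\,\partial x} = 24x^2 y^3 - 2,\qquad f_{yy} = \frac{\partial^2 f}{\partial y^2} = 24x^3 y^2$$
Note that $\frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial^2 f}{\partial x\,\partial y}$. This is true for many functions (in particular, whenever the second partial derivatives are continuous).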
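The equality of the mixed derivatives in the example can be checked numerically. The following is a minimal sketch using central differences; the step size h and the test point are arbitrary choices, not part of the notes.

```python
# Finite-difference check of the example f(x, y) = 2x^2 + 2x^3 y^4 - 2xy.
# The step size h and the test point are illustrative assumptions.

def f(x, y):
    return 2 * x**2 + 2 * x**3 * y**4 - 2 * x * y

def fx(x, y, h=1e-3):
    # central difference in x, keeping y fixed
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def fy(x, y, h=1e-3):
    # central difference in y, keeping x fixed
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def fxy(x, y, h=1e-3):
    # differentiate f_x with respect to y
    return (fx(x, y + h) - fx(x, y - h)) / (2 * h)

def fyx(x, y, h=1e-3):
    # differentiate f_y with respect to x
    return (fy(x + h, y) - fy(x - h, y)) / (2 * h)

x0, y0 = 1.5, 0.5
exact_mixed = 24 * x0**2 * y0**3 - 2   # formula from the example
print(fxy(x0, y0), fyx(x0, y0), exact_mixed)
```

Both numerical values should agree with the formula $24x^2y^3 - 2$ to within the accuracy of the finite differences.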
1.3 Differentials
We already know how a function of two variables changes if we change the value of x or y
by a small amount. The rate of change is given by the corresponding partial derivative. We
can now get an idea of how the function changes if we change the values of both x and y by
small amounts. Thus, for a function, z = f (x, y), we want to find what happens to z if we
move from (x, y) to (x + ∆x, y + ∆y). The change in z will be given by
$$\Delta z = f(x + \Delta x, y + \Delta y) - f(x, y) = \left[f(x + \Delta x, y + \Delta y) - f(x + \Delta x, y)\right] + \left[f(x + \Delta x, y) - f(x, y)\right].$$
The first term is the change due to the change in y and the second term is the change due to the change in x. We can approximate these separately as
$$f(x + \Delta x, y + \Delta y) - f(x + \Delta x, y) \simeq f_y\,\Delta y,\qquad f(x + \Delta x, y) - f(x, y) \simeq f_x\,\Delta x.$$
Therefore,
$$\Delta z \simeq f_x\,\Delta x + f_y\,\Delta y.$$
(Note that we might expect that $f_y$ would need to be evaluated at the point (x + ∆x, y) rather than the point (x, y). However, the difference that would result is small and, to the accuracy of the approximations we have made, we can evaluate both $f_x$ and $f_y$ at (x, y).)
The formula for ∆z is called the increment formula and it is analogous to the situation for functions of one variable where $\delta y \simeq \frac{dy}{dx}\,\delta x$. We simply have an extra term for the extra independent variable. It is an approximation that applies to small but finite changes, ∆x and ∆y. If we consider changes in x and y that are infinitesimally small we usually write these changes as dx and dy. The corresponding change in z is
$$dz = f_x\,dx + f_y\,dy.$$
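As a rough illustration of the increment formula, the sketch below compares the exact change in a function with the estimate $f_x\Delta x + f_y\Delta y$. The function $f(x, y) = x^2 y$ and the numbers used are hypothetical choices made only for this illustration.

```python
# Increment formula: compare the exact change in f with f_x*dx + f_y*dy.
# f(x, y) = x^2 * y is a hypothetical example, not taken from the notes.

def f(x, y):
    return x**2 * y

def increment_estimate(x, y, dx, dy):
    f_x = 2 * x * y      # partial derivative with respect to x
    f_y = x**2           # partial derivative with respect to y
    return f_x * dx + f_y * dy

x, y, dx, dy = 2.0, 3.0, 0.01, -0.02
exact_change = f(x + dx, y + dy) - f(x, y)
approx_change = increment_estimate(x, y, dx, dy)
print(exact_change, approx_change)
```

The two values differ only by terms of second order in the small changes, which is why the estimate improves as ∆x and ∆y shrink.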
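Example: The area of a right-angled triangle with base x and angle θ is $A = \frac{1}{2}x^2\tan\theta$.
(b) If x changes from 1 to 0.95, calculate the change in θ required to keep A the same.
In this case,
$$\Delta A = 0 = \frac{\partial A}{\partial x}\,\Delta x + \frac{\partial A}{\partial \theta}\,\Delta\theta$$
$$\therefore\ 0 = -0.05 + \Delta\theta.$$
$$\therefore\ \Delta\theta = 0.05\ \text{radians} = 2.86^\circ.$$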
1.4 Directional derivatives
For functions of two variables, z = f (x, y), we know the derivative in the x direction is fx
and the derivative in the y direction is fy . What about other directions?
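Consider the direction given by the unit vector
$$\underset{\sim}{\hat{u}} = \cos\theta\,\underset{\sim}{i} + \sin\theta\,\underset{\sim}{j}$$
where $\cos\theta = \frac{\Delta x}{\Delta s}$, $\sin\theta = \frac{\Delta y}{\Delta s}$ and $\Delta s = \sqrt{(\Delta x)^2 + (\Delta y)^2}$. If we move in the direction of $\underset{\sim}{\hat{u}}$, the change in f is
$$\Delta f = \Delta z \simeq f_x\,\Delta x + f_y\,\Delta y.$$
The rate of change is $\frac{\Delta f}{\Delta s}$. The directional derivative is
$$\frac{df}{ds} = \lim_{\Delta s \to 0}\frac{\Delta f}{\Delta s} = \lim_{\Delta s \to 0}\left(f_x\frac{\Delta x}{\Delta s} + f_y\frac{\Delta y}{\Delta s}\right) = f_x\cos\theta + f_y\sin\theta.$$
The directional derivative, $\frac{df}{ds}$, depends on the direction, θ (or $\underset{\sim}{\hat{u}}$), as well as the position $(x_0, y_0)$. This is the quantity that tells us if we are going uphill or downhill and how fast.
For displacements parallel to the x-axis, θ = 0 and the directional derivative is $\frac{df}{ds} = f_x$ (the partial derivative). For displacements parallel to the y-axis, θ = π/2 and the directional derivative is $\frac{df}{ds} = f_y$.
The directional derivative is sometimes written as $D_{\hat{u}} f$.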
Example: Calculate D∼û f in the direction of ∼i + 2∼j for the function
f (x, y) = 2x + 2x3 y 4 − 2xy at the point (3, 1).
2
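The directional derivative can also be written in terms of vectors. For example,
$$D_{\hat{u}} f = \frac{df}{ds} = f_x\cos\theta + f_y\sin\theta = (f_x\,\underset{\sim}{i} + f_y\,\underset{\sim}{j}) \cdot (u_x\,\underset{\sim}{i} + u_y\,\underset{\sim}{j}).$$
The vector on the right is just $\underset{\sim}{\hat{u}}$. The other vector is a new vector called grad f or $\underset{\sim}{\nabla} f$. Thus
$$D_{\hat{u}} f = \underset{\sim}{\nabla} f \cdot \underset{\sim}{\hat{u}}.$$
The symbol $\underset{\sim}{\nabla}$ represents a vector differential operator which can be written as $\underset{\sim}{i}\,\frac{\partial}{\partial x} + \underset{\sim}{j}\,\frac{\partial}{\partial y}$.
Note that $\underset{\sim}{\nabla} f$ contains information about the function and the position and $\underset{\sim}{\hat{u}}$ gives the direction in which we want the derivative.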
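[Figure: circular level curves in the x–y plane, with the direction $\underset{\sim}{\hat{u}}$ drawn at the point.]
For this function, the level curves are circles.
First we calculate $f_x$ and $f_y$:
$$\frac{\partial f}{\partial x} = 2x = \sqrt{3}\quad\text{and}\quad\frac{\partial f}{\partial y} = 2y = 1$$
so $\underset{\sim}{\nabla} f = \sqrt{3}\,\underset{\sim}{i} + \underset{\sim}{j}$.
This tells us what the function looks like at the point. Also, $\underset{\sim}{\hat{u}} = (1/\sqrt{2},\ 1/\sqrt{2})$. Therefore
$$\underset{\sim}{\nabla} f \cdot \underset{\sim}{\hat{u}} = \sqrt{3}/\sqrt{2} + 1/\sqrt{2} \simeq 1.93.$$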
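The calculation above can be reproduced in a few lines. This sketch assumes that the function is $f(x, y) = x^2 + y^2$ and that the point is $(\sqrt{3}/2,\ 1/2)$, which is what the values $2x = \sqrt{3}$ and $2y = 1$ suggest; the function names are illustrative.

```python
import math

# Directional derivative as grad f . u_hat.
# Assumption: f(x, y) = x^2 + y^2 at the point (sqrt(3)/2, 1/2),
# inferred from 2x = sqrt(3) and 2y = 1 in the worked example.

def grad_f(x, y):
    return (2 * x, 2 * y)

def directional_derivative(grad, direction):
    # normalise the direction to obtain the unit vector u_hat
    norm = math.hypot(direction[0], direction[1])
    ux, uy = direction[0] / norm, direction[1] / norm
    return grad[0] * ux + grad[1] * uy

point = (math.sqrt(3) / 2, 0.5)
value = directional_derivative(grad_f(*point), (1.0, 1.0))
print(round(value, 2))
```

Normalising the direction inside the function means any vector along the required direction can be passed in, not just a unit vector.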
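It also helps to draw the direction of $\underset{\sim}{\nabla} f$ on the diagram.
[Figure: the level curves with $\underset{\sim}{\hat{u}}$ and $\underset{\sim}{\nabla} f$ drawn at the point.]
If it is drawn carefully, it is perpendicular to the level curves.
We expect this since, if we move along the level curve, df/ds will be zero. Therefore, if we choose $\underset{\sim}{\hat{u}}$ to point along the level curve,
$$\underset{\sim}{\nabla} f \cdot \underset{\sim}{\hat{u}} = 0.$$
Note that this gives us a quick way of finding the normal to any curve. If the curve is defined by f(x, y) = const, then the normal to the curve is $\underset{\sim}{\nabla} f = f_x\,\underset{\sim}{i} + f_y\,\underset{\sim}{j}$. This helps to give an understanding of the 'meaning' of $\underset{\sim}{\nabla} f$.
The other thing we may want to know is the direction in which f increases the most rapidly. Now $df/ds = \underset{\sim}{\nabla} f \cdot \underset{\sim}{\hat{u}} = |\underset{\sim}{\nabla} f|\,|\underset{\sim}{\hat{u}}|\cos\phi$ where φ is the angle between $\underset{\sim}{\nabla} f$ and $\underset{\sim}{\hat{u}}$. Since $|\underset{\sim}{\hat{u}}| = 1$, $df/ds = |\underset{\sim}{\nabla} f|\cos\phi$. This is a maximum if φ = 0, i.e. if $\underset{\sim}{\hat{u}} \propto \underset{\sim}{\nabla} f$.
Therefore, $\underset{\sim}{\nabla} f$ points in the direction of maximum increase. Note also that the maximum steepness is given by $|\underset{\sim}{\nabla} f|$.
Also note that $\underset{\sim}{\nabla} f$ is a vector function obtained from a scalar function. In many cases, where a force field is obtained from a scalar potential, the relationship is $\underset{\sim}{F} = -\underset{\sim}{\nabla} V$. Vector functions that can be obtained from a scalar function in this way are very important.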
1.5 Chain rules
For functions of one variable, if y = y(x), and x = x(t) then we recall the Chain Rule:
$$\frac{dy}{dt} = \frac{dy}{dx}\frac{dx}{dt}.$$
Similarly, for functions of two variables, z = f(x, y), x and y might both depend on t. For example, (x(t), y(t)) might be the coordinates of a moving particle, so $\underset{\sim}{r} = x(t)\,\underset{\sim}{i} + y(t)\,\underset{\sim}{j}$, and f(x, y) might be some potential that affects the motion of the particle. The potential 'seen' by the particle is z = f(x(t), y(t)). In this case, z really depends on t only.
The derivative $\frac{dz}{dt}$ is a measure of how rapidly the potential is changing for the particle.
In time δt, the change in f is $\delta f \simeq f_x\,\delta x + f_y\,\delta y$. We want
$$\lim_{\delta t \to 0}\frac{\delta f}{\delta t} = \lim_{\delta t \to 0}\left(f_x\frac{\delta x}{\delta t} + f_y\frac{\delta y}{\delta t}\right) = f_x\frac{dx}{dt} + f_y\frac{dy}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}.$$
Note this is the same as the chain rule for functions of one variable, except that we have two terms instead of one. It can also be written as
$$\frac{df}{dt} = \underset{\sim}{\nabla} f \cdot \frac{d\underset{\sim}{r}}{dt}.$$
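$$\underset{\sim}{\nabla} V = (6t\sin t + e^t)\,\underset{\sim}{i} + 3t^2\,\underset{\sim}{j}$$
As $\underset{\sim}{r} = x(t)\,\underset{\sim}{i} + y(t)\,\underset{\sim}{j}$, we can see that x(t) = t, y(t) = sin t.
Therefore $\frac{d\underset{\sim}{r}}{dt} = \frac{dx}{dt}\,\underset{\sim}{i} + \frac{dy}{dt}\,\underset{\sim}{j} = \underset{\sim}{i} + \cos t\,\underset{\sim}{j}$. Therefore
$$\frac{dV}{dt} = \underset{\sim}{\nabla} V \cdot \frac{d\underset{\sim}{r}}{dt} = (6t\sin t + e^t) + 3t^2\cos t$$
$$\therefore\ \left.\frac{dV}{dt}\right|_{t=\pi} = e^{\pi} - 3\pi^2$$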
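The chain rule result can be compared with a direct numerical derivative along the path. The sketch below assumes the potential is $V(x, y) = 3x^2 y + e^x$, which is one function whose gradient matches the expression above when x = t and y = sin t; this choice of V is an inference, not something stated in the example.

```python
import math

# Chain rule check: dV/dt = grad V . dr/dt along x = t, y = sin t.
# Assumption: V(x, y) = 3 x^2 y + e^x (consistent with the gradient shown).

def V(x, y):
    return 3 * x**2 * y + math.exp(x)

def V_along_path(t):
    return V(t, math.sin(t))

def dV_dt_chain(t):
    # (6 t sin t + e^t) * dx/dt + 3 t^2 * dy/dt with dx/dt = 1, dy/dt = cos t
    return (6 * t * math.sin(t) + math.exp(t)) + 3 * t**2 * math.cos(t)

def dV_dt_numeric(t, h=1e-5):
    # central difference of V evaluated along the path
    return (V_along_path(t + h) - V_along_path(t - h)) / (2 * h)

t = math.pi
print(dV_dt_chain(t), dV_dt_numeric(t), math.exp(math.pi) - 3 * math.pi**2)
```

All three printed values should agree closely, which confirms that differentiating along the path and applying the chain rule give the same rate of change.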
1.6 Functions of Three Variables
We now extend the above concepts to functions of three variables, i.e. w = f (x, y, z). Most
of the formulae generalise. The main difficulty is in visualizing what is happening. For
example, consider
$$w = x^2 + y^2 + z^2.$$
We can't 'plot' this as a surface, as we would need four dimensions. So, we can't visualize it in this way. We can, however, still consider the surfaces on which w is constant:
$$x^2 + y^2 + z^2 = \text{constant}.$$
Note that these are now level surfaces. In fact they are spheres with centre at the origin.
They need three dimensions rather than two dimensions.
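The differential is
$$dw = f_x\,dx + f_y\,dy + f_z\,dz.$$
The directional derivative is $D_{\hat{u}} f = \underset{\sim}{\nabla} f \cdot \underset{\sim}{\hat{u}}$, where $\underset{\sim}{\nabla} f = f_x\,\underset{\sim}{i} + f_y\,\underset{\sim}{j} + f_z\,\underset{\sim}{k}$ and $\underset{\sim}{\hat{u}}$ is a three dimensional unit vector in the direction we are interested in.
The chain rule becomes
$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt} = \underset{\sim}{\nabla} f \cdot \frac{d\underset{\sim}{r}}{dt}.$$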
1.7 Tangent Planes
As for functions of two variables, $\underset{\sim}{\nabla} f$ is perpendicular to the level curves, only in this case they are level surfaces. This gives us a way to find the normal of a surface. Suppose we have a surface z = g(x, y). If
$$f(x, y, z) = z - g(x, y)$$
then the equation f(x, y, z) = 0 will define the same surface. This is a level surface of f. The normal to this surface is given by $\underset{\sim}{\nabla} f$, i.e.
$$\underset{\sim}{n} \propto \underset{\sim}{\nabla} f = -\frac{\partial g}{\partial x}\,\underset{\sim}{i} - \frac{\partial g}{\partial y}\,\underset{\sim}{j} + \underset{\sim}{k}.$$
This is the same as the formula we would get if we calculated the normal to the surface
z = g(x, y) using other methods. Once we know the normal to the surface, we can calculate
the equation of the tangent plane.
Exercise
Show that the equation of the tangent plane to the surface z = g(x, y) at the point $(x_0, y_0, z_0)$ can be expressed as
$$-(x - x_0)\,g_x - (y - y_0)\,g_y + (z - z_0) = 0$$
or
$$-g_x\,x - g_y\,y + z = (-g_x\,x_0 - g_y\,y_0 + z_0) = \text{const}.$$
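A small numerical illustration of the normal and the tangent plane may help. The sketch below uses the hypothetical surface $z = g(x, y) = x^2 + y^2$ at the point (1, 2, 5); it evaluates the left-hand side of the tangent plane equation at the point of contact and at a nearby point on the surface.

```python
# Normal vector and tangent plane for z = g(x, y).
# g(x, y) = x^2 + y^2 and the point (1, 2) are hypothetical choices.

def g(x, y):
    return x**2 + y**2

def g_x(x, y):
    return 2 * x

def g_y(x, y):
    return 2 * y

def tangent_plane_residual(x, y, z, x0, y0):
    # left-hand side of -(x - x0) g_x - (y - y0) g_y + (z - z0) = 0,
    # with g_x and g_y evaluated at (x0, y0)
    z0 = g(x0, y0)
    return -(x - x0) * g_x(x0, y0) - (y - y0) * g_y(x0, y0) + (z - z0)

x0, y0 = 1.0, 2.0
normal = (-g_x(x0, y0), -g_y(x0, y0), 1.0)   # proportional to grad(z - g)
residual_at_contact = tangent_plane_residual(x0, y0, g(x0, y0), x0, y0)
residual_nearby = tangent_plane_residual(1.01, 2.01, g(1.01, 2.01), x0, y0)
print(normal, residual_at_contact, residual_nearby)
```

The residual vanishes at the point of contact and is only second-order small at the nearby surface point, which is what it means for the plane to be tangent to the surface.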
1.8 EXERCISES - Partial Derivatives
1. (a) Find $\frac{\partial z}{\partial x}$ and $\frac{\partial z}{\partial y}$ for each of the following functions.
(i) $z = e^{2x-3y}$ (ii) $z = x\ln(x^2 + y^2)$ (iii) $z = \sin x + x\cos y$
(b) For each of the following functions, evaluate each of the 2nd order derivatives and verify that $\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x}$.
(i) $f(x, y) = 2x^3 y + 3xy^2 - 4y^3$ (ii) $f(x, y) = \cosh(xy + 2y)$
x + 4y 2
(iv) $z = x^3 - 3xy^2$ (v) $z = \sin x\,\sin y$ (vi) $z = \sin^2 x + \frac{1}{4}y^2$
Match each graph with its function.
4. (a) Use the increment formula to approximate the following. (Do not use your calculator.)
(i) $\sqrt{0.99}\,e^{0.02}$ (ii) $\sqrt{3.01^2 + 3.97^2}$
(b) A boundary stripe 3cm wide is painted around a netball court (whose dimensions are 15m by 30m). Use the increment formula to approximate the number of square metres of paint in the stripe.
(c) The total resistance of two resistors in parallel is given by
$$R = \left(\frac{1}{R_1} + \frac{1}{R_2}\right)^{-1}$$
5. For each of the following, use the chain rule to find df /dt as a function of t.
Check your answers by first substituting the expressions for x and y into f (x, y).
6. Calculate $\underset{\sim}{\nabla} f$ for each of the functions in question (5).
7. Find the directional derivatives of each of the following functions in the direction
indicated.
8. For the functions in question (7), find the direction in which the function is increasing
most rapidly at the given point.
9. For each of the following functions, find the direction of the normal and the equation
of the tangent plane at the given point.
(a) $z = \sqrt{4 - x^2 - y^2}$ at $(1, 1, \sqrt{2})$
(b) $z = x + 2y^3$ at (1, 1, 3)
10. Plane polar coordinates, (r, θ), and Cartesian coordinates, (x, y), are related by the formulae $r = \sqrt{x^2 + y^2}$ and $x = r\cos\theta$. Calculate ∂r/∂x and ∂x/∂r and show that
$$\frac{\partial r}{\partial x}\,\frac{\partial x}{\partial r} \neq 1.$$
*11. In thermodynamics, the ideal gas equation for a fixed mass of gas is P V = RT where
R is a constant.
12. The equations in Q11 are valid whenever P , V and T are related by an ‘equation of
state’ such as f (P, V, T ) = 0. Prove the equations again for this equation of state
starting with
df = fP dP + fV dV + fT dT = 0
1.9 SELECTED ANSWERS - Partial Derivatives
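1. (a) (i) $\partial z/\partial x = 2e^{2x-3y}$, $\partial z/\partial y = -3e^{2x-3y}$, (ii) $z_x = \ln(x^2 + y^2) + 2x^2/(x^2 + y^2)$, $z_y = 2xy/(x^2 + y^2)$; (b) (i) $f_{xx} = 12xy$, $f_{yy} = 6x - 24y$, $f_{xy} = f_{yx} = 6x^2 + 6y$,
6. (a) $(1 + x)e^{x+y}\,\underset{\sim}{i} + xe^{x+y}\,\underset{\sim}{j}$; (b) $2xy\,\underset{\sim}{i} + (x^2 - \sin y)\,\underset{\sim}{j}$.
7. (a) $15/\sqrt{13}$; (b) $-2/\sqrt{17}$; (c) $-10/3$.
8. (a) $3\,\underset{\sim}{i} + 3\,\underset{\sim}{j}$; (b) $2\,\underset{\sim}{i} + \underset{\sim}{j}$; (c) $2\,\underset{\sim}{i} + 5\,\underset{\sim}{j} + 11\,\underset{\sim}{k}$.
9. (a) $x + y + \sqrt{2}\,z = 4$; (b) $x + 6y - z = 4$.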
2 Line Integrals
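To evaluate an integral along a curve, we are often required to 'parameterise' the curve, i.e. express a function using the vector-parametric form of the curve. For example
express: f(x, y) as $\underset{\sim}{r} = x(t)\,\underset{\sim}{i} + y(t)\,\underset{\sim}{j}$
or: f(x, y, z) as $\underset{\sim}{r} = x(t)\,\underset{\sim}{i} + y(t)\,\underset{\sim}{j} + z(t)\,\underset{\sim}{k}$
$$\underset{\sim}{r}(t) = a\cos t\,\underset{\sim}{i} + a\sin t\,\underset{\sim}{j}$$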
For one type, the simplest example is to find the length of a curve. (A method for doing
this for two dimensional curves is given in first year, but we also need to be able to do it
for curves in three dimensions.) The problem turns into an integral because we first divide
the line into small segments, then find the length of each segment and add these lengths
together. We then take the limit as the length of each segment approaches zero.
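If the curve is $\underset{\sim}{r} = \underset{\sim}{r}(t)$, we can divide it into segments δt. The length of each segment will be δs where
$$\delta s \simeq \sqrt{(\delta x)^2 + (\delta y)^2 + (\delta z)^2}.$$
We then take
$$\sum_{\text{start}}^{\text{finish}} \delta s.$$
[Figure: a curve in three dimensions running from 'start' to 'finish', with a segment δs marked.]
We then take $\lim_{\delta s \to 0}$ and the sum becomes $\int_{\text{start}}^{\text{finish}} ds$ or $\int_C ds$, where C is used to indicate the contour of integration.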
We still need to know how to evaluate this integral. There are a few different approaches, but
most are equivalent to the following. To do this, the curve needs to be parameterized. If it
isn’t, then this will be the first thing to do. (Sometimes, it may be necessary to parameterize
the curve in two or more different sections.) Then, for each segment of the curve,
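$$\delta s = \sqrt{(\delta x)^2 + (\delta y)^2 + (\delta z)^2} = \sqrt{\left(\frac{\delta x}{\delta t}\right)^2 + \left(\frac{\delta y}{\delta t}\right)^2 + \left(\frac{\delta z}{\delta t}\right)^2}\ \delta t.$$
In the limit δt → 0 the sum over segments becomes an integral over t, so the length is
$$\int_C ds = \int_{t_0}^{t_1}\sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 + \left(\frac{dz}{dt}\right)^2}\ dt,$$
where $t_0$ and $t_1$ are the parameter values at the start and finish of the curve.
(Note that the integrand here is $\frac{ds}{dt}$ or $\left|\frac{d\underset{\sim}{r}}{dt}\right|$.) Everything here should be known so we can, in principle, calculate the length.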
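The recipe above (parameterise, form |dr/dt|, integrate over t) can be sketched numerically. The example below uses a helix, $\underset{\sim}{r}(t) = \cos t\,\underset{\sim}{i} + \sin t\,\underset{\sim}{j} + t\,\underset{\sim}{k}$ for 0 ≤ t ≤ π, as an illustrative curve and a simple midpoint rule; the function names and the number of segments are assumptions made for the sketch.

```python
import math

# Arc length as the integral of |dr/dt| over the parameter t.
# Illustrative curve: the helix r(t) = (cos t, sin t, t), 0 <= t <= pi.

def speed(t):
    # |dr/dt| for the helix: dr/dt = (-sin t, cos t, 1)
    dx, dy, dz = -math.sin(t), math.cos(t), 1.0
    return math.sqrt(dx**2 + dy**2 + dz**2)

def arc_length(t0, t1, n=1000):
    # midpoint rule: add up |dr/dt| * dt over n small segments
    dt = (t1 - t0) / n
    return sum(speed(t0 + (k + 0.5) * dt) for k in range(n)) * dt

length = arc_length(0.0, math.pi)
print(length, math.sqrt(2) * math.pi)
```

For this curve |dr/dt| is the constant $\sqrt{2}$, so the length is $\sqrt{2}\,\pi$ and the numerical sum reproduces it.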
2.2 Integral of a function, along a curve
Sometimes, we need to calculate integrals of the type
$$\int_C f(x, y, z)\,ds.$$
This type of integral can arise if we have a piece of wire, say, where the density depends on position and we want to find the total mass.
Example: A piece of wire is bent into the shape of a parabola $y = 1 - x^2$ between x = 0 and x = 1. The density (mass/unit length) at the point (x, y) is ρ(x, y) = ax (i.e., proportional to the distance from the y-axis). Calculate the total mass of the wire.
[Figure: the wire in the x–y plane, following the parabola from x = 0 to x = 1.]
In the limit as δs → 0, $M = \int_C \rho(x, y)\,ds$. We can then write ds = (ds/dt) dt.
$$\underset{\sim}{r}(t) = x\,\underset{\sim}{i} + y\,\underset{\sim}{j} = t\,\underset{\sim}{i} + (1 - t^2)\,\underset{\sim}{j}.$$
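With this parameterisation, ds/dt = $\sqrt{1 + 4t^2}$ and the mass becomes an ordinary integral over t. The following sketch evaluates it with a midpoint rule, taking a = 1 as an arbitrary value for the constant, and compares the result with the closed form $a(5\sqrt{5} - 1)/12$ obtained by integrating $t\sqrt{1 + 4t^2}$ directly.

```python
import math

# Mass of the wire: M = integral of rho * (ds/dt) dt for r(t) = (t, 1 - t^2).
# The constant a = 1 and the number of segments n are illustrative assumptions.

def wire_mass(a=1.0, n=2000):
    dt = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        density = a * t                            # rho = a x with x = t
        speed = math.sqrt(1.0 + (2.0 * t) ** 2)    # |dr/dt|, dr/dt = (1, -2t)
        total += density * speed * dt
    return total

mass = wire_mass()
closed_form = (5 * math.sqrt(5) - 1) / 12          # a = 1
print(mass, closed_form)
```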
2.3 Work Done
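The other type of line integral is often needed when we have a vector field $\underset{\sim}{F}(x, y, z)$ rather than a scalar field, f(x, y, z). An example might be if $\underset{\sim}{F}$ is the force acting at any point in space, and we want to find the work done on a particle as it moves along a curve in space.
If $\underset{\sim}{F}$ is a constant, then the work done is $W = \underset{\sim}{F} \cdot \underset{\sim}{d}$, where $\underset{\sim}{d}$ is the displacement.
[Figure: a curve in three dimensions divided into small displacements $\delta\underset{\sim}{r}$.]
If $\underset{\sim}{F}$ varies with position, we have to break the displacement up into small segments over which $\underset{\sim}{F}$ is nearly constant. If the displacement in each segment is $\delta\underset{\sim}{r}$, then
$$\delta W = \underset{\sim}{F} \cdot \delta\underset{\sim}{r}.$$
We then have
$$W = \sum_{\text{start}}^{\text{finish}} \underset{\sim}{F} \cdot \delta\underset{\sim}{r}.$$
As $|\delta\underset{\sim}{r}| \to 0$, and the number of segments gets larger, we get
$$W = \int_C \underset{\sim}{F} \cdot d\underset{\sim}{r} = \int_C \underset{\sim}{F}(x, y, z) \cdot d\underset{\sim}{r}.$$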
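Once again, it is helpful to express this in terms of a parameter, t. Since $\delta\underset{\sim}{r} \simeq \frac{d\underset{\sim}{r}}{dt}\,\delta t$, we can write $d\underset{\sim}{r} = \frac{d\underset{\sim}{r}}{dt}\,dt$. Then
$$W = \int_{t_0}^{t_1} \underset{\sim}{F}(x(t), y(t), z(t)) \cdot \frac{d\underset{\sim}{r}}{dt}\,dt.$$
Remember that $\underset{\sim}{r}(t) = x(t)\,\underset{\sim}{i} + y(t)\,\underset{\sim}{j} + z(t)\,\underset{\sim}{k}$ and so
$$\frac{d\underset{\sim}{r}}{dt} = \frac{dx}{dt}\,\underset{\sim}{i} + \frac{dy}{dt}\,\underset{\sim}{j} + \frac{dz}{dt}\,\underset{\sim}{k}.$$
Example: Calculate the work done by the force
$$\underset{\sim}{F}(x, y, z) = (y + z^2)\,\underset{\sim}{i} + xz\,\underset{\sim}{j} + y\,\underset{\sim}{k}$$
acting on a particle that moves from (0, 1, 0) to (1, 0, −1) along the curve:
$$\underset{\sim}{r}(t) = t\,\underset{\sim}{i} + (1 - t^2)\,\underset{\sim}{j} - t\,\underset{\sim}{k}$$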
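Notice that we have two different vector functions here. One is a vector field defined in space. The other represents a curve in that space. The work done is
$$W = \int \underset{\sim}{F} \cdot d\underset{\sim}{r} = \int \underset{\sim}{F} \cdot \frac{d\underset{\sim}{r}}{dt}\,dt.$$
On the curve, $\underset{\sim}{F} = \underset{\sim}{i} - t^2\,\underset{\sim}{j} + (1 - t^2)\,\underset{\sim}{k}$ and $\frac{d\underset{\sim}{r}}{dt} = \underset{\sim}{i} - 2t\,\underset{\sim}{j} - \underset{\sim}{k}$, so
$$\underset{\sim}{F} \cdot \frac{d\underset{\sim}{r}}{dt} = 1 + 2t^3 - 1 + t^2 = 2t^3 + t^2$$
Therefore, the work done is:
$$W = \int_0^1 (2t^3 + t^2)\,dt = \left[\frac{1}{2}t^4 + \frac{1}{3}t^3\right]_0^1 = \frac{5}{6}$$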
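The work integral in this example can also be approximated directly from the sum of $\underset{\sim}{F} \cdot \delta\underset{\sim}{r}$ over small segments. The sketch below uses a midpoint rule in t; the function names and the number of segments are illustrative choices.

```python
# Work done W = integral of F(r(t)) . dr/dt dt for the example above.
# F, r and dr/dt follow the worked example; n is an illustrative choice.

def F(x, y, z):
    return (y + z**2, x * z, y)

def r(t):
    return (t, 1 - t**2, -t)

def dr_dt(t):
    return (1.0, -2 * t, -1.0)

def work(t0, t1, n=2000):
    dt = (t1 - t0) / n
    total = 0.0
    for k in range(n):
        t = t0 + (k + 0.5) * dt          # midpoint of the segment
        Fx, Fy, Fz = F(*r(t))            # field evaluated on the curve
        vx, vy, vz = dr_dt(t)
        total += (Fx * vx + Fy * vy + Fz * vz) * dt
    return total

W = work(0.0, 1.0)
print(W, 5 / 6)
```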
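Quite often, this type of integral is written a bit differently. Suppose
$$\underset{\sim}{F}(x, y, z) = u\,\underset{\sim}{i} + v\,\underset{\sim}{j} + w\,\underset{\sim}{k}$$
and $\underset{\sim}{r} = x\,\underset{\sim}{i} + y\,\underset{\sim}{j} + z\,\underset{\sim}{k}$. Then $d\underset{\sim}{r} = dx\,\underset{\sim}{i} + dy\,\underset{\sim}{j} + dz\,\underset{\sim}{k}$ and
$$\underset{\sim}{F} \cdot d\underset{\sim}{r} = u(x, y, z)\,dx + v(x, y, z)\,dy + w(x, y, z)\,dz$$
so that $\int_C \underset{\sim}{F} \cdot d\underset{\sim}{r} = \int_C u\,dx + \int_C v\,dy + \int_C w\,dz$. These are three separate integrals, calculated along the same curve C (not along the axes). The connection with the work done is not quite so obvious here.
If the integral is given in this form, it is always possible to go back to the $\int \underset{\sim}{F} \cdot d\underset{\sim}{r}$ form. Or we could start with the parameterized form of the curve and calculate each integral directly.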
Example: Calculate $\int xy\,dx + \int (y^2 + 2x)\,dy$ along the curve x = t, $y = e^t$, z = sin t from t = 1 to t = 2.
$$dx = \frac{dx}{dt}\,dt = dt,\qquad dy = \frac{dy}{dt}\,dt = e^t\,dt.$$
So the integral becomes:
$$\int xy\,dx + \int (y^2 + 2x)\,dy = \int_1^2 te^t\,dt + \int_1^2 (e^{2t} + 2t)e^t\,dt$$
$$= \left[(t - 1)e^t\right]_1^2 + \left[\frac{1}{3}e^{3t} + 2(t - 1)e^t\right]_1^2$$
$$= e^2 + \frac{1}{3}e^6 - \frac{1}{3}e^3 + 2e^2$$
$$= 3e^2 + \frac{1}{3}e^6 - \frac{1}{3}e^3$$
2.4 Path dependence
Consider moving between two points along two different paths.
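[Figure: two different paths, $C_1$ and $C_2$, joining the same two points.]
The 'work done' could be quite different for different paths. For example, if $\underset{\sim}{F} = xy\,\underset{\sim}{i} + x\,\underset{\sim}{j}$ and $\underset{\sim}{r}(t) = t\,\underset{\sim}{i} + t\,\underset{\sim}{j}$ for 0 ≤ t ≤ 1, the integral is
$$\int_{C_1} \underset{\sim}{F} \cdot d\underset{\sim}{r} = \int xy\,dx + \int x\,dy = \int_0^1 t^2\,dt + \int_0^1 t\,dt = \frac{1}{3} + \frac{1}{2} = \frac{5}{6}$$
But if $\underset{\sim}{r}(t) = t\,\underset{\sim}{i} + t^2\,\underset{\sim}{j}$ for 0 ≤ t ≤ 1, we get
$$\int_{C_2} \underset{\sim}{F} \cdot d\underset{\sim}{r} = \int_0^1 t^3\,dt + \int_0^1 t(2t)\,dt = \frac{1}{4} + \frac{2}{3} = \frac{11}{12}$$
If we do a complete (closed) loop, and get back to where we started, the work done is
$$\int_{C_2} \underset{\sim}{F} \cdot d\underset{\sim}{r} - \int_{C_1} \underset{\sim}{F} \cdot d\underset{\sim}{r} = \frac{1}{12}$$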
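The path dependence can be seen numerically by evaluating the same field along both paths. The sketch below uses a generic midpoint-rule helper, `line_integral`, which is an illustrative name rather than anything defined in the notes.

```python
# Path dependence: the same field F = (xy, x) integrated along two paths
# from (0, 0) to (1, 1). The helper name and n are illustrative choices.

def line_integral(F, r, dr_dt, t0=0.0, t1=1.0, n=2000):
    dt = (t1 - t0) / n
    total = 0.0
    for k in range(n):
        t = t0 + (k + 0.5) * dt
        x, y = r(t)
        Fx, Fy = F(x, y)
        vx, vy = dr_dt(t)
        total += (Fx * vx + Fy * vy) * dt
    return total

def F(x, y):
    return (x * y, x)

# C1: r(t) = (t, t);  C2: r(t) = (t, t^2)
W1 = line_integral(F, lambda t: (t, t), lambda t: (1.0, 1.0))
W2 = line_integral(F, lambda t: (t, t**2), lambda t: (1.0, 2 * t))
print(W1, W2, W2 - W1)
```

The two values approximate 5/6 and 11/12, and their difference approximates the 1/12 obtained for the closed loop.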
There are some fields, $\underset{\sim}{F}$, for which
$$\int_C \underset{\sim}{F} \cdot d\underset{\sim}{r} = 0$$
for all closed loops. Such fields are called conservative. The work done around any closed loop is always zero. These are force fields that can be derived from a potential function (e.g. gravity, electric fields). In this case, $\underset{\sim}{F} = -\underset{\sim}{\nabla} V$.
Any vector field which is the gradient of a scalar is conservative. The scalar is (minus) the potential of the force. For such fields, $\int_C \underset{\sim}{F} \cdot d\underset{\sim}{r}$ depends only on the end-points and not on the curve in between. In fact, if $\underset{\sim}{F} = \underset{\sim}{\nabla} f$ then
$$\int_C \underset{\sim}{F} \cdot d\underset{\sim}{r} = f(\text{end}) - f(\text{start}).$$
If we have a conservative vector field and we know it was created by the gradient of a scalar
function, we can find the function by the following process.
$$\underset{\sim}{F} = (y\cos x + y^2)\,\underset{\sim}{i} + (\sin x + 2xy - 2y)\,\underset{\sim}{j}$$
Substitute g(y) into the expression for the scalar function (1):
$$f(x, y) = y\sin x + xy^2 - y^2 + c$$
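The result can be checked against the end-point formula for conservative fields. The sketch below integrates $\underset{\sim}{F}$ along an arbitrarily chosen path from (0, 0) to (1, 2), namely $\underset{\sim}{r}(t) = t\,\underset{\sim}{i} + 2t^2\,\underset{\sim}{j}$, and compares the answer with f(end) − f(start), taking c = 0; the path and the helper name are assumptions made for the illustration.

```python
import math

# Conservative field check: the line integral of F = grad f should equal
# f(end) - f(start). The path r(t) = (t, 2 t^2) is an arbitrary choice.

def f(x, y):
    return y * math.sin(x) + x * y**2 - y**2     # constant c taken as 0

def F(x, y):
    return (y * math.cos(x) + y**2, math.sin(x) + 2 * x * y - 2 * y)

def line_integral_along_path(n=4000):
    dt = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        x, y = t, 2 * t * t
        Fx, Fy = F(x, y)
        total += (Fx * 1.0 + Fy * 4 * t) * dt    # dr/dt = (1, 4t)
    return total

integral = line_integral_along_path()
difference = f(1.0, 2.0) - f(0.0, 0.0)
print(integral, difference)
```

Any other path between the same end-points should give the same value, which is the practical meaning of path independence.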
2.5 EXERCISES - Line Integrals
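1. Calculate $\int_C f(x, y, z)\,ds$ for the contour
$$\underset{\sim}{r}(t) = \cos t\,\underset{\sim}{i} + \sin t\,\underset{\sim}{j} + t\,\underset{\sim}{k},\qquad 0 \le t \le \pi$$
(a) $f(x, y, z) = x^2$.
(b) $f(x, y, z) = yz + x$.
2. (a) A piece of wire is bent into the shape of a semicircle of radius a. If the mass/unit length is equal to λ (a constant), find the mass of the wire. (You can parameterize the shape by $\underset{\sim}{r}(t) = a\cos t\,\underset{\sim}{i} + a\sin t\,\underset{\sim}{j}$.)
The coordinates of the centre of mass of the piece of wire are given by
$$\bar{x} = \frac{\int_C \lambda x\,ds}{\int_C \lambda\,ds},\qquad \bar{y} = \frac{\int_C \lambda y\,ds}{\int_C \lambda\,ds}.$$
(a) $\underset{\sim}{F} = x\,\underset{\sim}{i} + y\,\underset{\sim}{j}$, $\underset{\sim}{r}(t) = a\cos t\,\underset{\sim}{i} + b\sin t\,\underset{\sim}{j}$, t : 0 → π.
(b) $\underset{\sim}{F} = (x^2 + y)\,\underset{\sim}{i} + x\,\underset{\sim}{j}$, along the curve $y = 1 - x^2$, as x goes from 0 to 1.
5. Consider the function $f(x, y) = x^2 y$. Find $\underset{\sim}{F} = \underset{\sim}{\nabla} f$ and calculate $\int_C \underset{\sim}{F} \cdot d\underset{\sim}{r}$ along each of the following curves.
Show that in each case the answer is the same as f(1, 1) − f(0, 0).
6. If $\underset{\sim}{F}(x, y, z) = \sin y\,\underset{\sim}{i} + (x\cos y + z)\,\underset{\sim}{j} + y\,\underset{\sim}{k}$, find a function f(x, y, z) such that $\underset{\sim}{F} = \underset{\sim}{\nabla} f$.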
2.6 SELECTED ANSWERS - Line Integrals
1. (a) $\pi/\sqrt{2}$; (b) $\sqrt{2}\,\pi$.
5. $\underset{\sim}{F} = 2xy\,\underset{\sim}{i} + x^2\,\underset{\sim}{j}$; (a) 1; (b) 0 + 1 = 1; (c) 0 + 1 = 1.
3 Multiple Integration
3.1 Double Integrals
[Figure: a rectangular region in the x–y plane, extending from x = a to x = b.]
If a rectangle has dimensions δx × δy, the area is δxδy and the mass is δM ≃ σ(x, y) δxδy. We now have to add these up, but we do this in a particular order. We add the rectangles along each strip first. This gives the mass as
$$\sum_{x=a}^{x=b} \sigma(x, y)\,\delta x\,\delta y = \left(\sum_{x=a}^{x=b} \sigma(x, y)\,\delta x\right)\delta y.$$
This is the mass of a strip at 'height' y and of thickness δy. We then add all the strips together to get
$$\sum_{y=c}^{y=d}\ \sum_{x=a}^{x=b} \sigma(x, y)\,\delta x\,\delta y.$$
We then let δx and δy approach 0. The mass of the strip becomes approximately
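$$\left(\int_a^b \sigma(x, y)\,dx\right)\delta y.$$
Note that the value of the integral here can depend on y (which is actually a constant along each strip). The total mass is
$$\int_c^d\!\int_a^b \sigma(x, y)\,dx\,dy.$$
Note that the 'inner' limits, a and b, refer to x and the 'outer' limits, c and d, refer to y.
Alternatively, we can add the rectangles along vertical strips (constant x) first and then add the strips together. The sum is then
$$\sum_{x=a}^{x=b}\ \sum_{y=c}^{y=d} \sigma(x, y)\,\delta y\,\delta x.$$
[Figure: the same rectangular region divided into vertical strips at position x.]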
Note that we have changed the order of integration. We do the y integral first. Also, the
order of the limits has changed. This simple change works only if the region is rectangular.
These integrals are called double integrals. They involve an integration with respect to two
variables. We will also see that it is possible to write these integrals in a more general form.
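The two orders of summation can be compared numerically on a rectangle. The sketch below uses a hypothetical density $\sigma(x, y) = x + y^2$ on 0 ≤ x ≤ 2, 0 ≤ y ≤ 1, for which the exact mass is 8/3; the density, limits and function names are assumptions made only for this illustration.

```python
# Double sum over a rectangle in both orders (midpoint rule).
# sigma and the limits are hypothetical choices for illustration.

def sigma(x, y):
    return x + y**2

def mass_x_first(a, b, c, d, n=200):
    dx, dy = (b - a) / n, (d - c) / n
    total = 0.0
    for j in range(n):                       # outer sum over horizontal strips
        y = c + (j + 0.5) * dy
        strip = sum(sigma(a + (i + 0.5) * dx, y) for i in range(n)) * dx
        total += strip * dy
    return total

def mass_y_first(a, b, c, d, n=200):
    dx, dy = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):                       # outer sum over vertical strips
        x = a + (i + 0.5) * dx
        strip = sum(sigma(x, c + (j + 0.5) * dy) for j in range(n)) * dy
        total += strip * dx
    return total

m1 = mass_x_first(0.0, 2.0, 0.0, 1.0)
m2 = mass_y_first(0.0, 2.0, 0.0, 1.0)
print(m1, m2)
```

For a rectangular region both orders add up exactly the same small rectangles, so the two results agree apart from rounding.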
A double integral can be used to represent a volume, in much the same way that a single
integral represents an area under a graph. For example, consider a lake that covers a region
A.
[Figure: the surface of the lake, the region A, in the x–y plane.]
Divide the surface of the lake into small regions. (They needn't be rectangles.) Let the area of one of these regions (at (x, y)) be δA. The volume of the lake beneath this region will be approximately d(x, y) δA, where d(x, y) is the depth of the lake at (x, y).
We now take the sum of all these values to get
$$V \simeq \sum_{\text{whole lake}} d(x, y)\,\delta A.$$
If we let the size of the elements get smaller, the volume is written as
$$V = \iint_A d(x, y)\,dA.$$
Note that we use two integral signs to indicate that it is a double integral. Also, the A tells
us what is the region of integration—in this case the surface of the lake, which may not be
a rectangle.
This form of the integral is really a symbolic form. We need to be able to convert this to the
other form in order to evaluate the integral. If A is a rectangle, we can simply reproduce
the previous form. If A is not a rectangle, we have a bit more difficulty. Consider the case
where A is a triangle.
[Figure: the triangle A bounded by the line y = 2x, the line x = 1 and the x-axis.]
Let us calculate
$$\iint_A xy\,dA$$
for this shape. We divide the shape into small rectangles and add the contributions along strips of constant y.
This gives
$$\int_0^2\!\int_{y/2}^1 xy\,dx\,dy.$$
Note that the first integral is from x = y/2 to x = 1 since this is the extent of each strip. The strips are then added from y = 0 to y = 2.
This integral is now:
$$\int_0^2 \left[\frac{1}{2}x^2 y\right]_{x=y/2}^{x=1} dy = \int_0^2 \left(\frac{1}{2}y - \frac{1}{8}y^3\right) dy = \left[\frac{1}{4}y^2 - \frac{1}{32}y^4\right]_0^2 = \frac{1}{2}$$
Note that if we want to change the order here, we cannot simply change the limits. $\int_{y/2}^1\!\int_0^2 xy\,dy\,dx$ would not be correct as the final answer would depend on the lower limit which is y/2. Instead, we must go back to the diagram.
[Figure: the same triangle, bounded by y = 2x, x = 1 and the x-axis, now divided into vertical strips.]
We are now adding rectangles along the vertical strips (i.e. x is constant) first. Each strip extends from y = 0 to y = 2x. The strips are then added from x = 0 to x = 1. Therefore, we have
$$\int_0^1\!\int_0^{2x} xy\,dy\,dx$$
We evaluate this integral:
$$\int_0^1\!\int_0^{2x} xy\,dy\,dx = \int_0^1 \left[\frac{1}{2}xy^2\right]_{y=0}^{y=2x} dx = \int_0^1 2x^3\,dx = \left[\frac{1}{2}x^4\right]_0^1 = \frac{1}{2}$$
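Both orders of integration over the triangle can be checked with a small helper that allows the inner limits to depend on the outer variable. This is a minimal midpoint-rule sketch; the helper name `iterated` and the number of subdivisions are illustrative choices.

```python
# Iterated integral over a non-rectangular region, in both orders.
# 'iterated' is an illustrative helper; f is called as f(outer, inner).

def iterated(f, outer_lo, outer_hi, inner_lo, inner_hi, n=400):
    d_outer = (outer_hi - outer_lo) / n
    total = 0.0
    for i in range(n):
        u = outer_lo + (i + 0.5) * d_outer
        lo, hi = inner_lo(u), inner_hi(u)      # strip limits depend on u
        d_inner = (hi - lo) / n
        strip = sum(f(u, lo + (j + 0.5) * d_inner) for j in range(n)) * d_inner
        total += strip * d_outer
    return total

# x integral first: x from y/2 to 1, then y from 0 to 2
x_first = iterated(lambda y, x: x * y, 0.0, 2.0, lambda y: y / 2, lambda y: 1.0)
# y integral first: y from 0 to 2x, then x from 0 to 1
y_first = iterated(lambda x, y: x * y, 0.0, 1.0, lambda x: 0.0, lambda x: 2 * x)
print(x_first, y_first)
```

Both estimates approach 1/2, in agreement with the two hand calculations.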
Whenever we put in the limits for an irregular region, we need to have a clear idea of the
shape. We also need to be able to describe the shape if we are given the limits. For example,
consider the shape of A for the integral:
$$\int_{-1}^{1}\!\int_0^{\sqrt{1-x^2}} f(x, y)\,dy\,dx$$
The integral says that we fix x first. Then y goes from 0 to $\sqrt{1 - x^2}$. Thus the shape is bounded above by the curve $y = \sqrt{1 - x^2}$ and bounded below by y = 0.
[Figure: the region under the curve $y = \sqrt{1 - x^2}$, from x = −1 to x = 1.]
The curve is the semicircle $x^2 + y^2 = 1$. The limits on x show that the whole of this half-disc is contained in the region of integration.
If we want to change the order of integration, we first have to sketch the region and then, from this sketch, work out the new limits.
[Figure: plane polar coordinates, showing an element of area at distance r from the origin bounded by changes δr and δθ.]
The 'amount' of area bounded by changes δr and δθ will depend on where the area is located.
$$\delta A \simeq r\,\delta r\,\delta\theta.$$
(Note this is not just δrδθ.)
The extra factor, r, is called the Jacobian. This type of factor will appear whenever we choose coordinates that are not rectangular. It expresses the fact that the coordinate 'patches' will not all be the same size.
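We include the limits to get:
$$\int_0^{\pi}\!\int_0^1 r^3\,dr\,d\theta = \int_0^{\pi} \left[\frac{1}{4}r^4\right]_0^1 d\theta = \int_0^{\pi} \frac{1}{4}\,d\theta = \frac{1}{4}\left[\theta\right]_0^{\pi} = \frac{\pi}{4}$$
Note that, if we just find $\iint_A dA$, we will be finding the area of the region A.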
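The role of the Jacobian can be illustrated by summing over small polar 'patches' of area r δr δθ. The sketch below assumes the integrand $r^3$ above arises from the function $r^2$ multiplied by the Jacobian r, integrated over the upper half of the unit disc; the helper name and grid size are illustrative.

```python
import math

# Double integral in plane polar coordinates, including the Jacobian r.
# Assumption: the integrand r^3 comes from f = r^2 times the Jacobian r.

def polar_integral(f, r_max, theta_max, n=200):
    dr, dtheta = r_max / n, theta_max / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            theta = (j + 0.5) * dtheta
            total += f(r, theta) * r * dr * dtheta   # patch area is r dr dtheta
    return total

half_disc_area = polar_integral(lambda r, theta: 1.0, 1.0, math.pi)
r_squared_integral = polar_integral(lambda r, theta: r**2, 1.0, math.pi)
print(half_disc_area, r_squared_integral)
```

With f = 1 the sum gives the area of the half-disc, π/2, and with f = r² it approaches π/4.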
3.2 Triple Integrals
We can extend these ideas to the integral over a whole volume. We call the region V.
Consider an object with density ρ(x, y, z) (mass/volume). The total mass is
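$$M = \iiint_{\mathcal{V}} \rho(x, y, z)\,dV.$$
If ρ = 1, then we calculate the volume as: $V = \iiint_{\mathcal{V}} dV$.
The idea of the calculation is the same as for double integrals. In rectangular coordinates, we divide the region into small cubes with dimensions δx × δy × δz.
The mass is
$$\delta M = \rho\,\delta V = \rho\,\delta x\,\delta y\,\delta z.$$
[Figure: a small volume element with sides δx, δy and δz.]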
1. We add these along (say) the vertical direction (i.e. integrate with respect to z) to get
a thin ‘tube’ (with cross-section δxδy).
2. Then add these tubes along (say) the y direction (i.e. integrate with respect to y) to
get a thin plate (with thickness δx).
3. Finally, we add all the plates (integrate with respect to x) to get the final result.
If V, the region of integration, is rectangular shaped, (that is, z goes from a to b, y goes
from c to d and x goes from e to f ) we get
$$\int_e^f\!\int_c^d\!\int_a^b \rho(x, y, z)\,dz\,dy\,dx.$$
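Example: Calculate the integral of $f(x, y, z) = xz + e^y$ over the cube 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, 0 ≤ z ≤ 1.
We have
$$\iiint_{\mathcal{V}} f(x, y, z)\,dV = \int_0^1\!\int_0^1\!\int_0^1 (xz + e^y)\,dz\,dy\,dx$$
$$= \int_0^1\!\int_0^1 \left[\frac{1}{2}xz^2 + ze^y\right]_{z=0}^{z=1} dy\,dx$$
$$= \int_0^1\!\int_0^1 \left(\frac{1}{2}x + e^y\right) dy\,dx$$
$$= \int_0^1 \left[\frac{1}{2}xy + e^y\right]_{y=0}^{y=1} dx$$
$$= \int_0^1 \left(\frac{1}{2}x + e - 1\right) dx$$
$$= \left[\frac{1}{4}x^2 + (e - 1)x\right]_0^1$$
$$= \frac{1}{4} + e - 1$$
$$= e - \frac{3}{4}$$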
3.2.1 Polar Coordinates
As with double integrals, sometimes triple integrals appear easier in polar coordinates. In
three dimensions, there are two standard types of polar coordinates, cylindrical and spherical.
They each have their own Jacobian (the volume ‘correction’ factor) that must appear in the
integral.
Cylindrical polars are like plane polars with a z coordinate included. The θ is often relabelled
as φ, the azimuthal angle.
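Thus, we have coordinates (r, φ, z) with
$$x = r\cos\phi,\qquad y = r\sin\phi,\qquad z = z$$
[Figure: cylindrical polar coordinates, showing r and z for a point in three dimensions.]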
Note that r is the distance from the z axis, and the surfaces r = constant are cylinders.
Hence the name.
We can also have spherical polars, (r, θ, φ). Here r is the distance from the origin, so the
surfaces r = constant are spheres.
[Figure: spherical polar coordinates, showing the position vector of length r.]
φ is still the azimuthal angle (ranging from 0 to 2π). This is the angle the projection of the position vector on the x-y plane makes with the x-axis.
θ, the co-latitude, is the angle the position vector makes with the z-axis. It ranges from 0 to π.
We have
$$x = r\sin\theta\cos\phi,\qquad y = r\sin\theta\sin\phi,\qquad z = r\cos\theta.$$
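As a quick check of these formulae, the sketch below converts a point from spherical polars to Cartesian coordinates and confirms that its distance from the origin is r. The function name and the sample values are illustrative.

```python
import math

# Spherical polars (r, theta, phi) to Cartesian (x, y, z).
# theta is the co-latitude (angle from the z-axis), phi the azimuthal angle.

def spherical_to_cartesian(r, theta, phi):
    x = r * math.sin(theta) * math.cos(phi)
    y = r * math.sin(theta) * math.sin(phi)
    z = r * math.cos(theta)
    return (x, y, z)

x, y, z = spherical_to_cartesian(2.0, math.pi / 3, math.pi / 4)
distance = math.sqrt(x * x + y * y + z * z)
print((x, y, z), distance)
```

The distance from the origin comes out equal to r whatever angles are used, which is why the surfaces r = constant are spheres.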
3.3 Surface Integrals
This is the last type of integral we need to look at. We have seen that if we have a plane object with density σ(x, y), the mass is $\iint_A \sigma\,dA$. What if we have a curved surface, with density σ(x, y, z)? The surface could be the surface of a sphere or a paraboloid etc., where the density depends on the position.
Once again, we can divide the region into small elements of surface δS. The mass will be approximately
$$\sum_{\text{surface}} \sigma(x, y, z)\,\delta S.$$
[Figure: a curved surface S divided into small elements δS.]
Another type of integral for a general surface is when we have a vector field defined through-
out space. The vector field can simply be regarded as a ‘collection of arrows’ in space. We
want to know how many of these arrows 'point through' some surface. Think of a fluid moving with velocity $\underset{\sim}{v}(x, y, z)$. At what rate is the fluid moving through some surface S?
Note that the product $\underset{\sim}{n}\,dS$ is often written as $d\underset{\sim}{S}$.
The calculation of such integrals in general is a little beyond what we have already done.
However, if the surface is actually a plane that is perpendicular to one of the axes, then the
integral can be calculated using the technique we have already seen.
Example: If $\underset{\sim}{F} = xy\,\underset{\sim}{i} + (x + yz)\,\underset{\sim}{j} + (x + y^2)z\,\underset{\sim}{k}$, calculate $\iint_S \underset{\sim}{F} \cdot \underset{\sim}{n}\,dS$ over the surface z = 2 for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1.
The surface is perpendicular to the z-axis and the integral over this surface is similar to an integral over the square in the x-y plane.
[Figure: the square surface at z = 2 above the unit square in the x–y plane.]
We have to use the fact that z = 2 on the surface and also $\underset{\sim}{n} = \underset{\sim}{k}$ on the surface. Therefore
$$\underset{\sim}{F} \cdot \underset{\sim}{n} = (x + y^2)z = 2(x + y^2).$$
The element of area is δS = δxδy (in fact the same as δA previously). Therefore, dS = dxdy.
The integral becomes:
$$\int_0^1\!\int_0^1 2(x + y^2)\,dx\,dy = \int_0^1 \left[x^2 + 2xy^2\right]_{x=0}^{x=1} dy = \int_0^1 (1 + 2y^2)\,dy = \left[y + \frac{2}{3}y^3\right]_0^1 = \frac{5}{3}$$
This integral is called the flux of $\underset{\sim}{F}$ through S. There are usually two choices for the direction of $\underset{\sim}{n}$, but note that, if we have a closed surface, the flux is usually taken in the outward direction. In this case, we get the rate at which fluid is leaving the enclosed volume. This in turn must be related to the rate of change of the mass of the fluid remaining. We will see one consequence of this relation in the next section.
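The flux in this example can be approximated by summing $\underset{\sim}{F} \cdot \underset{\sim}{n}\,\delta S$ over small squares on the surface z = 2. The sketch below does this with a midpoint rule; the function names and grid size are illustrative choices.

```python
# Flux of F through the flat surface z = 2, 0 <= x <= 1, 0 <= y <= 1,
# with unit normal n = k. Names and grid size are illustrative.

def F(x, y, z):
    return (x * y, x + y * z, (x + y**2) * z)

def flux_through_plane(z_value=2.0, n=200):
    h = 1.0 / n
    normal = (0.0, 0.0, 1.0)
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            Fx, Fy, Fz = F(x, y, z_value)
            F_dot_n = Fx * normal[0] + Fy * normal[1] + Fz * normal[2]
            total += F_dot_n * h * h         # element of area dS = dx dy
    return total

flux = flux_through_plane()
print(flux, 5 / 3)
```

Only the component of the field along the normal contributes, so the first two components of $\underset{\sim}{F}$ play no part in this particular flux.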
3.4 EXERCISES - Multiple Integration
1. Calculate the following double integrals.
(a) $\int_0^2\!\int_0^{\pi} (y\sin x + x)\,dx\,dy$.
(b) $\iint_A (xe^y + y\sqrt{x})\,dA$, where A is the region 1 ≤ x ≤ 4, 0 ≤ y ≤ 1.
(c) Evaluate the integral in part (a) by changing the order first.
(d) $\iint_A r\sin^2\theta\,dA$, where A is the region above the x-axis bounded by the circle $x^2 + y^2 = a^2$.
(e) $\int_0^{\pi/3}\!\int_0^a r\cos\theta\ r\,dr\,d\theta$. What is the shape of the region in this case?
2. Find the mass of the triangle bounded by the line x + y = 1 and the x and y axes if
the density is σ(x, y) = x2 + y 2 .
3. The region A lies in the first quadrant and is bounded by the curve $y = 1 - x^2$ and the x and y axes. What are the limits needed on the integral signs to calculate
$$\iint_A f(x, y)\,dA?$$
(Do the y integral first.)
Hence calculate (i) the area of A and (ii) the position of the centroid of A.
(The centroid has coordinates $\bar{x} = \frac{\iint_A x\,dA}{\text{Area}}$, $\bar{y} = \frac{\iint_A y\,dA}{\text{Area}}$.)
4. Find the centroid of that part of the disc of radius 1 with centre at the origin that lies in the first quadrant.
5. For the following integral, change the order of integration and then evaluate the integral.
\[ \int_0^2\!\int_0^y x^2 e^{xy}\,dx\,dy. \]
6. (a) Evaluate $\int_0^1\!\int_1^2\!\int_0^1 (xz + y)\,dz\,dy\,dx$.
(b) The volume of a region V can be expressed as $\iiint_V dV$. Find the volume of the region for which 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 and 0 ≤ z ≤ 2 − x − y, i.e. the region between the x–y plane and the plane z = 2 − x − y for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1.
(c) Find the mass of the cylinder $x^2 + y^2 \le a^2$ that lies between z = 0 and z = h if the density is given by ρ(x, y, z) = α (a constant).
What if ρ = αr where r is the distance from the z-axis?
(d) Calculate $\iiint_V r^2\,dV$, where V is the hemisphere $x^2 + y^2 + z^2 \le a^2$, z ≥ 0. (r is the distance from the origin in this case.)
7. If $\underset{\sim}{F} = (2x + y)\underset{\sim}{i} + (z^2 + x)\underset{\sim}{j} + xz\,\underset{\sim}{k}$, calculate the flux of ∼F (in the direction pointing outwards) through each of the six faces of the cube 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, 0 ≤ z ≤ 1.
8. Prove that the Jacobian used for changing co-ordinate systems from Cartesian to cylindrical polars is J = r, and when changing from Cartesian to spherical polars is $J = r^2\sin\theta$.
9.* A vertical dam wall is in the shape of a triangle bounded by the lines $y = -\frac{x}{4} - \frac{1}{2}$, $y = \frac{x}{2} - \frac{1}{2}$ and y = 0. (The level of the water is at y = 0.) The pressure (force/unit area) at depth d is ρgd where the density ρ is assumed to be constant and g is the gravitational constant.
(a) Calculate the total force exerted on the wall by the water.
(b) Calculate the depth of the centre of pressure which is given by
\[ d = \frac{\iint_A \rho g y^2\,dA}{\text{Total force}}. \]
10.* (a) Calculate the volume of the solid bounded by the paraboloid $z = 3 - (x^2 + y^2)$ and the x–y plane.
(b) Also find the volume of the region between this paraboloid and the surface z = 2x.
(In part (b), once you have carried out the z integration, the remaining integral is best expressed in terms of r and θ where x = −1 + r cos θ and y = r sin θ.)
3.5 SELECTED ANSWERS - Multiple Integration
1. (a) $4 + \pi^2$; (b) $\frac{15}{2}(e - 1) + \frac{7}{3}$; (c) $4 + \pi^2$; (d) $\pi a^3/6$; (e) $\frac{1}{2\sqrt{3}}a^3$.
2. 1/6.
6. (a) 7/4; (b) 1; (c) $\pi\alpha a^2 h$, $2\pi\alpha a^3 h/3$; (d) $2\pi a^5/5$.
7. (i) $\frac{5}{2}$ (+∼i); (ii) $-\frac{1}{2}$ (−∼i); (iii) $\frac{5}{6}$ (+∼j); (iv) $-\frac{5}{6}$ (−∼j); (v) $\frac{1}{2}$ (+∼k); (vi) 0 (−∼k).
8.
4 Vector Calculus
4.1 Divergence of a Vector Field
The flux of a vector field, ∼F, over the surface, S, of a volume, V, is the total amount of ∼F ‘pointing out’ of the volume. If ∼F represents the velocity of a fluid, then the flux would be the rate at which fluid was leaving the volume. Generally, the larger the volume, V, the larger the flux will be. Note though that we can have a negative flux, if the vector field points into the volume.
We could divide the flux by the volume to get the rate of ‘outflow’ per unit volume. This would give
\[ \frac{1}{V}\iint_S \underset{\sim}{F}\cdot d\underset{\sim}{S}, \]
which is like a flux/unit volume. If we do this over a small volume, δV, and let δV → 0, we get some information about whether the fluid is moving in towards or away from a particular point. This quantity is called the divergence. Thus, the divergence is
\[ \lim_{\delta V\to 0}\frac{1}{\delta V}\iint_S \underset{\sim}{F}\cdot d\underset{\sim}{S}. \]
Initially, we will try to find the flux over a small volume, δV = δxδyδz, of a field,
\[ \underset{\sim}{F} = u\,\underset{\sim}{i} + v\,\underset{\sim}{j} + w\,\underset{\sim}{k}. \]
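The total contribution to the flux from the pair of faces perpendicular to the x-axis is
\[ \big[u(x + \delta x, y, z) - u(x, y, z)\big]\,\delta y\,\delta z \simeq \frac{\partial u}{\partial x}\,\delta x\,\delta y\,\delta z. \]
We can similarly calculate the flux through the two other pairs of faces to be (∂v/∂y)δxδyδz and (∂w/∂z)δxδyδz, so the total flux is
\[ \left(\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z}\right)\delta x\,\delta y\,\delta z. \]
Dividing by the volume δV = δxδyδz, the divergence is
\[ \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z}. \]
This can be written as
\[ \left(\underset{\sim}{i}\frac{\partial}{\partial x} + \underset{\sim}{j}\frac{\partial}{\partial y} + \underset{\sim}{k}\frac{\partial}{\partial z}\right)\cdot\big(u\,\underset{\sim}{i} + v\,\underset{\sim}{j} + w\,\underset{\sim}{k}\big) = \underset{\sim}{\nabla}\cdot\underset{\sim}{F}. \]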
The left hand factor is the familiar ‘grad’ operator, but now in a different role. This time it
is differentiating a vector field using the dot product.
There is an important identity relating the divergence to the total flux. We have said that
the divergence is the flux density. It follows that if we integrate the divergence over a volume,
we expect to get the total flux for that volume (or the surface of that volume). Thus
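\[ \underbrace{\iiint_V \underset{\sim}{\nabla}\cdot\underset{\sim}{F}\,dV}_{\text{(integral of flux density)}} = \underbrace{\iint_S \underset{\sim}{F}\cdot d\underset{\sim}{S}}_{\text{(total flux)}}. \]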
This is called the divergence theorem or Gauss’ theorem.
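As an illustration, both sides of the divergence theorem can be estimated numerically for the field of Exercise 7 in the Multiple Integration exercises, F = (2x + y)i + (z² + x)j + xz k, on the unit cube. This is only a sketch: the midpoint rule, the central-difference divergence, and the names `outward_flux` and `volume_integral_of_divergence` are assumptions made for the example.

```python
def F(x, y, z):
    """The vector field of Exercise 7: (2x + y, z^2 + x, xz)."""
    return (2 * x + y, z * z + x, x * z)


def outward_flux(n=20):
    """Total outward flux through the six faces of the unit cube (midpoint rule)."""
    h = 1.0 / n
    mids = [(i + 0.5) * h for i in range(n)]
    total = 0.0
    for a in mids:
        for b in mids:
            total += (F(1, a, b)[0] - F(0, a, b)[0]) * h * h   # faces x = 1 and x = 0
            total += (F(a, 1, b)[1] - F(a, 0, b)[1]) * h * h   # faces y = 1 and y = 0
            total += (F(a, b, 1)[2] - F(a, b, 0)[2]) * h * h   # faces z = 1 and z = 0
    return total


def divergence(x, y, z, eps=1e-4):
    """Central-difference estimate of du/dx + dv/dy + dw/dz."""
    du = (F(x + eps, y, z)[0] - F(x - eps, y, z)[0]) / (2 * eps)
    dv = (F(x, y + eps, z)[1] - F(x, y - eps, z)[1]) / (2 * eps)
    dw = (F(x, y, z + eps)[2] - F(x, y, z - eps)[2]) / (2 * eps)
    return du + dv + dw


def volume_integral_of_divergence(n=20):
    """Integral of the divergence over the unit cube (midpoint rule)."""
    h = 1.0 / n
    mids = [(i + 0.5) * h for i in range(n)]
    return sum(divergence(x, y, z) for x in mids for y in mids for z in mids) * h ** 3


if __name__ == "__main__":
    print(outward_flux(), volume_integral_of_divergence())   # both should be close to 5/2
```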
The divergence theorem also leads to an important identity in fluid mechanics. If ∼F = ρ∼v, then $\iint_S \rho\,\underset{\sim}{v}\cdot\underset{\sim}{n}\,dS$ is the rate at which fluid leaves the volume enclosed by S. The total mass in this volume is $\iiint_V \rho\,dV$, so the rate of change of the mass is $\frac{d}{dt}\iiint_V \rho\,dV$.
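Therefore
\[ \frac{d}{dt}\iiint_V \rho\,dV = -\iint_S \rho\,\underset{\sim}{v}\cdot\underset{\sim}{n}\,dS \]
\[ \therefore\quad \iiint_V \frac{\partial\rho}{\partial t}\,dV = -\iiint_V \underset{\sim}{\nabla}\cdot(\rho\,\underset{\sim}{v})\,dV \]
Since this holds for every volume V, the integrands must be equal:
\[ \frac{\partial\rho}{\partial t} + \underset{\sim}{\nabla}\cdot(\rho\,\underset{\sim}{v}) = 0. \]
This is called the continuity equation which actually expresses the conservation of mass. That is, any mass that flows in or out of the volume will affect what is left. This equation will appear in any equations for fluid flow. Note that, if the fluid is incompressible, then ρ is a constant and the equation becomes $\underset{\sim}{\nabla}\cdot\underset{\sim}{v} = 0$.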
4.2 Curl of a Vector Field
The other important quantity is the curl of a vector field. This is effectively the tendency of
a fluid to rotate about a point. This may depend on the direction. For example
To find the component of the curl of ∼F in that direction, we divide by the area enclosed by
C and take the limit as this area approaches zero. Thus, the component is
\[ \lim_{\delta S\to 0}\frac{1}{\delta S}\oint_C \underset{\sim}{F}\cdot d\underset{\sim}{r}. \]
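If $\underset{\sim}{F} = u\,\underset{\sim}{i} + v\,\underset{\sim}{j} + w\,\underset{\sim}{k}$, the contributions to $\oint_C \underset{\sim}{F}\cdot d\underset{\sim}{r}$ from the two segments parallel to the y-axis can be calculated as follows.
[Figure: a small closed curve C in the x-y plane, with sides at x and x + δx.]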
The contribution from the other two segments is −(∂u/∂y)δxδy. The sum of these two terms
divided by the area enclosed by the curve is
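\[ \frac{\partial v}{\partial x} - \frac{\partial u}{\partial y}. \]
The components in the other two directions are, similarly,
\[ \frac{\partial w}{\partial y} - \frac{\partial v}{\partial z} \quad\text{and}\quad \frac{\partial u}{\partial z} - \frac{\partial w}{\partial x}. \]
Therefore
\[ \operatorname{curl}\underset{\sim}{F} = \left(\frac{\partial w}{\partial y} - \frac{\partial v}{\partial z}\right)\underset{\sim}{i} + \left(\frac{\partial u}{\partial z} - \frac{\partial w}{\partial x}\right)\underset{\sim}{j} + \left(\frac{\partial v}{\partial x} - \frac{\partial u}{\partial y}\right)\underset{\sim}{k}. \]
This can be written in the form
\[ \begin{vmatrix} \underset{\sim}{i} & \underset{\sim}{j} & \underset{\sim}{k} \\[2pt] \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\[6pt] u & v & w \end{vmatrix}. \]
In this form, it looks like $\underset{\sim}{\nabla}\times\underset{\sim}{F}$ and this is how it is most often written. Note that this is another use of the $\underset{\sim}{\nabla}$ operator.
Example: If $\underset{\sim}{F} = -xy\,\underset{\sim}{i} + z^2\,\underset{\sim}{j} - yz\,\underset{\sim}{k}$, calculate $\underset{\sim}{\nabla}\times\underset{\sim}{F}$.
\begin{align*}
\underset{\sim}{\nabla}\times\underset{\sim}{F} &= \begin{vmatrix} \underset{\sim}{i} & \underset{\sim}{j} & \underset{\sim}{k} \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ -xy & z^2 & -yz \end{vmatrix} \\
&= (-z - 2z)\,\underset{\sim}{i} - (0 - 0)\,\underset{\sim}{j} + (0 + x)\,\underset{\sim}{k} \\
&= -3z\,\underset{\sim}{i} + x\,\underset{\sim}{k}
\end{align*}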
Note that each component of $\underset{\sim}{\nabla}\times\underset{\sim}{F}$ is like a circulation/unit area or a circulation density. If we integrate this density over some surface S, we expect to get back the total circulation around the boundary of the surface. That is,
\[ \iint_S (\underset{\sim}{\nabla}\times\underset{\sim}{F})\cdot\underset{\sim}{n}\,dS = \oint_C \underset{\sim}{F}\cdot d\underset{\sim}{r} \]
where C is the boundary curve of the surface S. This is called Stokes’ Theorem.
4.2.1 Some Important Identities
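\[ \underset{\sim}{\nabla}(f + g) = \underset{\sim}{\nabla}f + \underset{\sim}{\nabla}g \]
\[ \underset{\sim}{\nabla}\cdot(\underset{\sim}{F} + \underset{\sim}{G}) = \underset{\sim}{\nabla}\cdot\underset{\sim}{F} + \underset{\sim}{\nabla}\cdot\underset{\sim}{G} \]
\[ \underset{\sim}{\nabla}\times(\underset{\sim}{F} + \underset{\sim}{G}) = \underset{\sim}{\nabla}\times\underset{\sim}{F} + \underset{\sim}{\nabla}\times\underset{\sim}{G} \]
PRODUCT RULES
\[ \underset{\sim}{\nabla}(fg) = f\,\underset{\sim}{\nabla}g + g\,\underset{\sim}{\nabla}f \]
\[ \underset{\sim}{\nabla}\cdot(f\underset{\sim}{G}) = \underset{\sim}{\nabla}f\cdot\underset{\sim}{G} + f\,\underset{\sim}{\nabla}\cdot\underset{\sim}{G} \]
\[ \underset{\sim}{\nabla}\cdot(\underset{\sim}{F}\times\underset{\sim}{G}) = (\underset{\sim}{\nabla}\times\underset{\sim}{F})\cdot\underset{\sim}{G} - \underset{\sim}{F}\cdot(\underset{\sim}{\nabla}\times\underset{\sim}{G}) \]
\[ \underset{\sim}{\nabla}\times(f\underset{\sim}{G}) = \underset{\sim}{\nabla}f\times\underset{\sim}{G} + f\,\underset{\sim}{\nabla}\times\underset{\sim}{G} \]
SECOND DERIVATIVES
\[ \underset{\sim}{\nabla}\cdot(\underset{\sim}{\nabla}f) = \nabla^2 f \]
\[ \underset{\sim}{\nabla}\times(\underset{\sim}{\nabla}f) = \underset{\sim}{0} \]
\[ \underset{\sim}{\nabla}\cdot(\underset{\sim}{\nabla}\times\underset{\sim}{F}) = 0 \]
Note that $\underset{\sim}{\nabla}\times\underset{\sim}{G} = \underset{\sim}{0}$ if and only if there is a scalar function f(x, y, z) such that $\underset{\sim}{G} = \underset{\sim}{\nabla}f$. In this case, ∼G is conservative or irrotational.
Also, $\underset{\sim}{\nabla}\cdot\underset{\sim}{G} = 0$ if and only if there is a vector field ∼F such that $\underset{\sim}{G} = \underset{\sim}{\nabla}\times\underset{\sim}{F}$. In this case ∼G is incompressible or solenoidal.
GAUSS’ THEOREM
\[ \iiint_V \underset{\sim}{\nabla}\cdot\underset{\sim}{F}\,dV = \iint_S \underset{\sim}{F}\cdot\underset{\sim}{n}\,dS \]
STOKES’ THEOREM
\[ \iint_S (\underset{\sim}{\nabla}\times\underset{\sim}{F})\cdot\underset{\sim}{n}\,dS = \oint_C \underset{\sim}{F}\cdot d\underset{\sim}{r} \]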
4.3 EXERCISES - Vector Calculus
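1. Calculate $\underset{\sim}{\nabla}\cdot\underset{\sim}{F}$ and $\underset{\sim}{\nabla}\times\underset{\sim}{F}$ for the following cases.
(a) $\underset{\sim}{F} = (x^2 + y)\underset{\sim}{i} + 2yz\,\underset{\sim}{j} + xz^2\,\underset{\sim}{k}$.
(b) $\underset{\sim}{F} = (x + 2z)\underset{\sim}{i} + (x + y)\underset{\sim}{k}$.
3. If $\underset{\sim}{F} = (2x + y)\underset{\sim}{i} + (z^2 + x)\underset{\sim}{j} + xz\,\underset{\sim}{k}$ (Q7 in EXERCISES - Multiple Integration).
Calculate: $\underset{\sim}{\nabla}\cdot\underset{\sim}{F}$ and $\iiint_V \underset{\sim}{\nabla}\cdot\underset{\sim}{F}\,dV$.
Show that this is equal to the total flux of ∼F through the six faces that you have calculated previously.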
Z ZZ
F = xy ∼i + x ∼j + y ∼
4. If ∼ 2
k, show that F
∼ r=
· d∼ (∇ F) · ∼
∼ ×∼ n dS if
C S
4.4 SELECTED ANSWERS - Vector Calculus
2. $\nabla^2 f = -x\sin y + 2z$.
5 Fourier Series
5.1 Periodic Functions
Fourier series are infinite series of trigonometric functions. They are used to represent
periodic functions. They can be seen as a means of approximating these functions, in the
same way as a Taylor series will approximate a function. One advantage of the Fourier Series
is that it can be used to approximate periodic functions that have discontinuities, such as the
sawtooth wave function. We will see that Fourier Series can be used to solve many partial
differential equations.
For example
[Figures: graphs of periodic functions f(t) against t, with the period T marked.]
i. Polynomials aren’t periodic. This means that the series can take a long time to converge. For example, the series for sin x converges slowly, except near the origin.
ii. These functions often have discontinuities, which Taylor series can’t handle.
This form will clearly cope with periodic functions. We will see that it will also work for
discontinuous functions. Note that
i. If f is an even function, then f (−x) = f (x) and we get only cos terms in the series.
ii. If f is an odd function, then f (−x) = −f (x) and we get only sin terms in the series.
Consider the square wave with period 2π.
[Figure: graph of the square wave f(x).]
We choose $b_1$ so that $\int_{-\pi}^{\pi} \big[f(x) - b_1\sin x\big]^2 dx$ is small. In the present example, this means that $\int_0^{\pi} \big[1 - b_1\sin x\big]^2 dx$ is small. If we expand the brackets, we get
\[ \int_0^{\pi} dx - 2b_1\int_0^{\pi}\sin x\,dx + b_1^2\int_0^{\pi}\sin^2 x\,dx. \]
This has to be minimised. The integral is equal to $\pi - 4b_1 + \frac{\pi}{2}b_1^2$. The derivative with respect to $b_1$ is $-4 + \pi b_1$. Therefore $b_1 = 4/\pi$ is the optimum value.
Therefore, the next approximation is $\tilde{f}(x) = b_1\sin x + b_3\sin 3x$.
Again, we require $\int_{-\pi}^{\pi}\big[f(x) - \tilde{f}(x)\big]^2 dx$ to be a minimum. We get the same value for $b_1$ as before, namely 4/π. The optimum value of $b_3$ is (4/π) × (1/3).
Therefore
\[ f(x) \simeq \frac{4}{\pi}\left(\sin x + \frac{1}{3}\sin 3x\right). \]
The extra ‘ripple’ in sin 3x makes the approximation a little more like the square wave.
\[ f(x) \simeq \frac{4}{\pi}\left(\sin x + \frac{1}{3}\sin 3x + \frac{1}{5}\sin 5x + \frac{1}{7}\sin 7x + \cdots\right). \]
The more terms we have, the more accurate will be the approximation.
[Figures: successive approximations to the square wave plotted against x.]
Note that at the points where the function is discontinuous, the convergence is not very
good. This overshoot phenomenon is a characteristic feature of this type of approximation.
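The behaviour described above can be explored with a short script that sums the first few terms of the square-wave series. It is a sketch under stated assumptions: the function names and the sampling grid are illustrative, and the overshoot is simply read off as the largest sampled value on 0 < x < π.

```python
import math


def square_wave_partial_sum(x, terms):
    """(4/pi) * [sin x + sin 3x / 3 + ...], keeping the first `terms` non-zero terms."""
    return (4 / math.pi) * sum(
        math.sin((2 * k - 1) * x) / (2 * k - 1) for k in range(1, terms + 1)
    )


def max_on_half_period(terms, samples=2000):
    """Largest sampled value of the partial sum on 0 < x < pi."""
    return max(
        square_wave_partial_sum(math.pi * i / samples, terms) for i in range(1, samples)
    )


if __name__ == "__main__":
    for terms in (1, 2, 4, 50):
        mid = square_wave_partial_sum(math.pi / 2, terms)   # the square wave equals 1 here
        peak = max_on_half_period(terms)                    # stays above 1: the overshoot
        print(terms, round(mid, 4), round(peak, 4))
```

With more terms the value at x = π/2 settles towards 1, while the peak near the jump does not shrink to 1.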
The general formula to find the coefficients ai and bi can be found as follows. If
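\[ f(x) = \frac{1}{2}a_0 + \sum_{n=1}^{\infty}\big(a_n\cos nx + b_n\sin nx\big) \tag{5.1} \]
then the integral of both sides over one complete period must be the same. We can integrate from −π to π or from 0 to 2π or π/2 to 5π/2 etc. We find
\begin{align*}
\int_{-\pi}^{\pi} f(x)\,dx &= \frac{1}{2}a_0\int_{-\pi}^{\pi} dx + \sum_{n=1}^{\infty}\left(a_n\int_{-\pi}^{\pi}\cos nx\,dx + b_n\int_{-\pi}^{\pi}\sin nx\,dx\right) \\
&= \pi a_0 + \sum_{n=1}^{\infty}(0 + 0) \\
&= \pi a_0
\end{align*}
Therefore
\[ a_0 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\,dx. \]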
To find the value of am , we multiply equation (5.1) by cos mx and integrate. Therefore,
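\[ \int_{-\pi}^{\pi} f(x)\cos mx\,dx = \frac{a_0}{2}\int_{-\pi}^{\pi}\cos mx\,dx + \sum_{n=1}^{\infty}\left(a_n\int_{-\pi}^{\pi}\cos nx\cos mx\,dx + b_n\int_{-\pi}^{\pi}\sin nx\cos mx\,dx\right). \]
The coefficient of $a_0$ in this equation is zero. Similarly the coefficient of $b_n$ is zero, since sin nx cos mx is an odd function. The coefficients of $a_n$ are mainly zero, since
\[ \int_{-\pi}^{\pi}\cos nx\cos mx\,dx = 0 \]
unless n = m. In that case, $\int_{-\pi}^{\pi}\cos^2 nx\,dx = \pi$ so
\[ a_m = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos mx\,dx. \]
Similarly, multiplying by sin mx and integrating gives
\[ b_m = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin mx\,dx. \]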
These formulae are called Euler’s formulae for the Fourier coefficients an and bn . The series
that we get is called the Fourier series.
If the period is T (rather than 2π) then
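\[ f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty}\left(a_n\cos\frac{2\pi nx}{T} + b_n\sin\frac{2\pi nx}{T}\right) \]
where
\begin{align*}
a_0 &= \frac{2}{T}\int_{-T/2}^{T/2} f(x)\,dx \\
a_n &= \frac{2}{T}\int_{-T/2}^{T/2} f(x)\cos\frac{2\pi nx}{T}\,dx \\
b_n &= \frac{2}{T}\int_{-T/2}^{T/2} f(x)\sin\frac{2\pi nx}{T}\,dx.
\end{align*}
[Figure: graph of f(x) against x, with −1, 1 and 3 marked on the x-axis.]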
Note also that the function is even (that is, f (−x) = f (x)) so we do not expect any sine
terms in the answers (bn = 0).
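Therefore:
\[ f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n\cos n\pi x \]
The function is
\[ f(x) = \begin{cases} 1 + x, & -1 < x < 0 \\ 1 - x, & 0 < x < 1 \end{cases} \]
Therefore:
\begin{align*}
a_0 &= \frac{2}{2}\int_{-1}^{1} f(x)\,dx \\
&= \int_{-1}^{0}(1 + x)\,dx + \int_0^1 (1 - x)\,dx \\
&= \frac{1}{2} + \frac{1}{2} = 1 \\
a_n &= \frac{2}{2}\int_{-1}^{1} f(x)\cos n\pi x\,dx \\
&= \int_{-1}^{0}(1 + x)\cos n\pi x\,dx + \int_0^1 (1 - x)\cos n\pi x\,dx
\end{align*}
Now:
\begin{align*}
\int_0^1 (1 - x)\cos n\pi x\,dx &= \left[\frac{1}{n\pi}(1 - x)\sin n\pi x\right]_0^1 + \frac{1}{n\pi}\int_0^1 \sin n\pi x\,dx \\
&= 0 + \left[-\frac{1}{n^2\pi^2}\cos n\pi x\right]_0^1 \\
&= -\frac{1}{n^2\pi^2}\big(\cos n\pi - 1\big) \\
&= -\frac{1}{n^2\pi^2}\big((-1)^n - 1\big) \\
&= \frac{1}{n^2\pi^2}\times\begin{cases} 0, & n \text{ even} \\ 2, & n \text{ odd} \end{cases}
\end{align*}
Also:
\[ \int_{-1}^{0}(1 + x)\cos n\pi x\,dx = \frac{1}{n^2\pi^2}\times\begin{cases} 0, & n \text{ even} \\ 2, & n \text{ odd} \end{cases} \]
Therefore:
\[ a_n = \begin{cases} 0, & n \text{ even} \\ \dfrac{4}{n^2\pi^2}, & n \text{ odd} \end{cases} \]
and so the Fourier Series for f(x) is given by:
\begin{align*}
f(x) &= \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n\cos n\pi x \\
&= \frac{1}{2} + a_1\cos\pi x + a_3\cos 3\pi x + a_5\cos 5\pi x + \cdots \\
&= \frac{1}{2} + \frac{4}{\pi^2}\left(\cos\pi x + \frac{1}{9}\cos 3\pi x + \frac{1}{25}\cos 5\pi x + \cdots\right)
\end{align*}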
Note that if we put x = 0 in this equation and use the fact that f (0) = 1 and cos 0 = 1, we
obtain the identity
\[ 1 + \frac{1}{9} + \frac{1}{25} + \cdots = \frac{\pi^2}{8} \]
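The identity can be checked by summing a large number of terms. A minimal sketch (the function name is illustrative):

```python
import math


def odd_reciprocal_squares(terms):
    """Partial sum 1 + 1/9 + 1/25 + ... over the first `terms` odd integers."""
    return sum(1.0 / (2 * m - 1) ** 2 for m in range(1, terms + 1))


if __name__ == "__main__":
    print(odd_reciprocal_squares(100000), math.pi ** 2 / 8)   # the two values should be close
```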
Note: We can rewrite the series in closed form (using the summation sign) and include only
the non-zero coefficients, i.e. the odd coefficients in this example. We do this by changing
the index of the summation, n.
\[ f(x) = \frac{1}{2} + \sum_{m=1}^{\infty}\frac{4}{(2m - 1)^2\pi^2}\cos[(2m - 1)\pi x] \]
5.2 Half-range Series
We have looked at Fourier series for periodic functions defined on all real numbers. They
can also be used for functions defined on finite intervals, [0, l]. For such a function, we
simply extend it to the real line, making it periodic. Usually we make the extended function
either even or odd. In this case, the expansion will involve only cos terms or only sin terms
respectively.
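[Figure: a function f(x) defined on [0, l], extended first to [−l, l] and then periodically (x-axis marks at −l, l and 3l), once as an EVEN function and once as an ODD function.]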
These are the even and odd periodic extensions of the function.
For a function defined over [0, l], the period of the extended function is T = 2l. The resulting
series are called the ½-range expansions of the functions. We can find the ½-range cos series and the ½-range sin series.
The general ½-range Fourier series are written as:
EVEN: ½-range cos series:
\[ f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n\cos\frac{n\pi x}{l} \]
ODD: ½-range sin series:
\[ f(x) = \sum_{n=1}^{\infty} b_n\sin\frac{n\pi x}{l} \]
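To find the coefficients of the ½-range sin series, we calculate
\begin{align*}
a_0 &= \frac{1}{l}\int_{-l}^{l} f(x)\,dx = 0 \quad\text{(since $f(x)$ is the odd extension),} \\
a_n &= \frac{1}{l}\int_{-l}^{l} f(x)\cos\frac{n\pi x}{l}\,dx = 0 \quad\text{(again since $f(x)$ is odd and so the integrand is odd),} \\
b_n &= \frac{1}{l}\int_{-l}^{l} f(x)\sin\frac{n\pi x}{l}\,dx \\
&= \frac{2}{l}\int_0^l f(x)\sin\frac{n\pi x}{l}\,dx \quad\text{(since the integrand is even).}
\end{align*}
To find the coefficients of the ½-range cos series, we calculate
\begin{align*}
a_0 &= \frac{1}{l}\int_{-l}^{l} f(x)\,dx \\
&= \frac{2}{l}\int_0^l f(x)\,dx \quad\text{(since $f(x)$ is now the even extension),} \\
a_n &= \frac{1}{l}\int_{-l}^{l} f(x)\cos\frac{n\pi x}{l}\,dx \\
&= \frac{2}{l}\int_0^l f(x)\cos\frac{n\pi x}{l}\,dx \quad\text{(since the integrand is even),} \\
b_n &= \frac{1}{l}\int_{-l}^{l} f(x)\sin\frac{n\pi x}{l}\,dx = 0 \quad\text{(since the integrand is odd).}
\end{align*}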
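Therefore, for f(x) = l − x on [0, l], the coefficients of the cos series are
\begin{align*}
a_0 &= \frac{2}{l}\int_0^l (l - x)\,dx = l \\
\text{and}\quad a_n &= \frac{2}{l}\int_0^l (l - x)\cos\frac{n\pi x}{l}\,dx = \frac{2l}{n^2\pi^2}\times\begin{cases} 0, & n \text{ even} \\ 2, & n \text{ odd} \end{cases}
\end{align*}
And the ½-range cos series: $f(x) = \dfrac{a_0}{2} + \sum_{n=1}^{\infty} a_n\cos\dfrac{n\pi x}{l}$ is given by:
\[ f(x) = \frac{l}{2} + \frac{4l}{\pi^2}\left(\cos\frac{\pi x}{l} + \frac{1}{9}\cos\frac{3\pi x}{l} + \frac{1}{25}\cos\frac{5\pi x}{l} + \cdots\right). \]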
The ½-range cos series can be written in closed form. Let n = 2m − 1 to give:
\[ f(x) = \frac{l}{2} + \frac{4l}{\pi^2}\sum_{m=1}^{\infty}\frac{1}{(2m - 1)^2}\cos\frac{(2m - 1)\pi x}{l} \]
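Alternatively, the coefficients of the ½-range sin series are
\begin{align*}
b_n &= \frac{2}{l}\int_0^l (l - x)\sin\frac{n\pi x}{l}\,dx \\
&= \frac{2}{l}\left\{\left[-\frac{l}{n\pi}(l - x)\cos\frac{n\pi x}{l}\right]_0^l - \frac{l}{n\pi}\int_0^l \cos\frac{n\pi x}{l}\,dx\right\} \\
&= \frac{2}{l}\left\{\frac{l^2}{n\pi} - \frac{l}{n\pi}\left[\frac{l}{n\pi}\sin\frac{n\pi x}{l}\right]_0^l\right\} \\
&= \frac{2l}{n\pi}
\end{align*}
And the ½-range sin series: $f(x) = \sum_{n=1}^{\infty} b_n\sin\dfrac{n\pi x}{l}$ is given by:
\[ f(x) = \frac{2l}{\pi}\left(\sin\frac{\pi x}{l} + \frac{1}{2}\sin\frac{2\pi x}{l} + \frac{1}{3}\sin\frac{3\pi x}{l} + \cdots\right), \quad\text{on the interval } [0, l] \]
[Figures: partial sums of the two series plotted against x; the surviving panel labels are “2 terms”, “3 terms”, “3 terms” and “7 terms”.]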
Note the following.
• In this case, the cos series converges more rapidly.
• An overshoot develops at the discontinuity in the sin series.
• Both series do converge to the function at each point (except at the discontinuities).
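The two half-range series for f(x) = l − x can be compared numerically. The sketch below uses the coefficients derived above; the function names, the choice l = 1 and the cut-off `n_max` are assumptions made for the illustration.

```python
import math


def cos_series(x, l, n_max):
    """Half-range cos series for f(x) = l - x, summing odd n up to n_max."""
    total = l / 2
    for n in range(1, n_max + 1, 2):        # even n have zero coefficient
        total += (4 * l / (n * n * math.pi ** 2)) * math.cos(n * math.pi * x / l)
    return total


def sin_series(x, l, n_max):
    """Half-range sin series for f(x) = l - x, summing n = 1..n_max."""
    return sum(
        (2 * l / (n * math.pi)) * math.sin(n * math.pi * x / l)
        for n in range(1, n_max + 1)
    )


if __name__ == "__main__":
    l = 1.0
    for x in (0.0, 0.1, 0.5, 0.9):
        exact = l - x
        print(x, exact, cos_series(x, l, 99), sin_series(x, l, 99))
```

At x = 0 the sin series gives 0 for any number of terms, whereas f(0) = l; this is the discontinuity of the odd extension referred to in the notes above.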
5.3 EXERCISES - Fourier Series
1. For each of the following functions, write down f (−x) and decide if the function is
even or odd or neither.
(i) f (x) = x2 , (ii) f (x) = ex , (iii) f (x) = x sin x, (iv) f (x) = x3 + 3x,
(v) f (x) = tan x + 2x2 .
3. (a) Calculate the coefficients of the Fourier series for the function in 2(c).
(b) Calculate the coefficients of the Fourier series for the function
f (x) = | sin x|.
4. The Fourier series for the square wave
\[ f(x) = \begin{cases} -1, & -\pi \le x \le 0 \\ 1, & 0 \le x \le \pi \end{cases} \]
(period 2π) is
\[ f(x) = \frac{4}{\pi}\left(\sin x + \frac{1}{3}\sin 3x + \frac{1}{5}\sin 5x + \cdots\right). \]
Substitute x = π/2 into this equation to obtain an expression for the series
\[ 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots. \]
5. If $f(x) = x^2$ for 0 ≤ x ≤ π, sketch the even and odd periodic extensions of f. Also calculate the ½-range cos series and the ½-range sin series for f.
(Use the identity $\cos A\cos B = \frac{1}{2}\big[\cos(A + B) + \cos(A - B)\big]$.)
5.4 SELECTED ANSWERS - Fourier Series
3. (a) $a_0 = \pi/4$, $a_m = 0$ (m even), $a_m = -1/(m^2\pi)$ (m odd), $b_m = -(-1)^m/(2m)$.
(b) $a_0 = 4/\pi$, $a_m = -\dfrac{4}{\pi}\,\dfrac{1}{(2m)^2 - 1}$, $b_m = 0$.
5. Cos series: $f(x) = \dfrac{\pi^2}{3} - 4\left(\cos x - \dfrac{1}{2^2}\cos 2x + \dfrac{1}{3^2}\cos 3x - \cdots\right)$.
Sin series: $b_m = -\dfrac{2\pi}{m}(-1)^m + \dfrac{4}{m^3\pi}\big((-1)^m - 1\big)$.
6 Eigenvalues and Eigenvectors
6.1 Revision of Matrices
We need to revise the idea of a matrix. Firstly, consider a set of m linear equations for n
unknowns.
\begin{align*}
a_{11}x + a_{12}y + a_{13}z + \cdots &= b_1 \\
a_{21}x + a_{22}y + a_{23}z + \cdots &= b_2 \\
a_{31}x + a_{32}y + a_{33}z + \cdots &= b_3 \\
&\;\;\vdots \\
a_{m1}x + a_{m2}y + a_{m3}z + \cdots &= b_m.
\end{align*}
a11 , a12 , etc. are constants.
b1 , b2 , etc. are constants.
x, y, z etc. are unknowns (n of them).
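\[ A\underset{\sim}{x} = \underset{\sim}{b}, \]
where
\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix} \quad\text{and}\quad \underset{\sim}{b} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_m \end{pmatrix}. \]
The unknowns form a single column:
\[ \underset{\sim}{x} = \begin{pmatrix} x \\ y \\ z \\ \vdots \end{pmatrix}. \]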
6.1.1 Matrix Algebra
We can add and subtract matrices if they have the same number of rows and columns.
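We can also multiply a matrix by a number. Therefore, if
\[ A = \begin{pmatrix} 1 & -3 & 2 \\ -1 & 1 & 1 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 3 & 1 & -2 \\ 1 & -2 & 3 \end{pmatrix}, \]
we can calculate:
\[ A + B = \begin{pmatrix} 1 + 3 & -3 + 1 & 2 - 2 \\ -1 + 1 & 1 - 2 & 1 + 3 \end{pmatrix} = \begin{pmatrix} 4 & -2 & 0 \\ 0 & -1 & 4 \end{pmatrix}. \]
Also,
\[ 5A = \begin{pmatrix} 5 & -15 & 10 \\ -5 & 5 & 5 \end{pmatrix}. \]
Note that $\underset{\sim}{b}^T = (b_1, b_2, \ldots, b_m)$ is a 1 × m matrix. It is called a row vector. Also, a square matrix is symmetric if $A^T = A$. That is, interchanging rows and columns leaves a symmetric matrix unchanged.
In forming A∼x, we multiply a matrix by a vector, i.e. we can multiply an m × n matrix by an n dimensional vector. e.g.
\[ \begin{pmatrix} 1 & 1 \\ -2 & 3 \\ 1 & 3 \end{pmatrix}\begin{pmatrix} 4 \\ -1 \end{pmatrix} = \begin{pmatrix} 1\times 4 + 1\times(-1) \\ (-2)\times 4 + 3\times(-1) \\ 1\times 4 + 3\times(-1) \end{pmatrix} = \begin{pmatrix} 3 \\ -11 \\ 1 \end{pmatrix}. \]
We can extend this to multiply a matrix by a matrix provided the sizes match. e.g.
\begin{align*}
\begin{pmatrix} 1 & 3 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} 1 & -1 & 1 \\ 2 & 1 & 1 \end{pmatrix} &= \begin{pmatrix} 1\times 1 + 3\times 2 & 1\times(-1) + 3\times 1 & 1\times 1 + 3\times 1 \\ 2\times 1 + 1\times 2 & 2\times(-1) + 1\times 1 & 2\times 1 + 1\times 1 \end{pmatrix} \\
&= \begin{pmatrix} 7 & 2 & 4 \\ 4 & -1 & 3 \end{pmatrix}.
\end{align*}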
In the multiplication, each column of the second matrix is treated like a column vector. The
product of the first matrix with each column of the second matrix gives a column of the
product matrix. The sizes of the matrices are
(2 × 2) × (2 × 3).
Note that the number of columns of the first must equal the number of rows of the second.
In general, an (m × n) matrix multiplied by an (n × p) matrix gives an m × p matrix.
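The row-by-column rule can be written out directly. The following is a minimal sketch using plain Python lists of rows; `mat_mul` is an illustrative name, and the size check mirrors the requirement that the number of columns of the first matrix equals the number of rows of the second.

```python
def mat_mul(A, B):
    """Product of an (m x n) matrix A and an (n x p) matrix B, given as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("columns of the first matrix must equal rows of the second")
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


if __name__ == "__main__":
    A = [[1, 3], [2, 1]]                  # 2 x 2
    B = [[1, -1, 1], [2, 1, 1]]           # 2 x 3
    print(mat_mul(A, B))                  # [[7, 2, 4], [4, -1, 3]]
```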
Properties of Matrix Operations
• A+B =B+A
• A(B + C) = AB + AC
• (A + B)C = AC + BC
• A(λB) = λ(AB)
• A(BC) = (AB)C
But note that AB need not equal BA. For example, if the multiplication in the previous
example is done in the opposite order, we get
\[ \begin{pmatrix} 1 & -1 & 1 \\ 2 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 3 \\ 2 & 1 \end{pmatrix}. \]
This is not even defined, as the number of columns of the first is not equal to the number of
rows of the second. Thus matrix multiplication is associative, but not commutative.
The square matrix of order n, that has ones on the main diagonal and zeros elsewhere, is
the unit matrix, In (or just I). e.g.
\[ I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
Note that we cannot divide by a matrix. However, if A is a square matrix, we can sometimes find a matrix, $A^{-1}$, such that
\[ AA^{-1} = A^{-1}A = I. \]
6.1.2 Solving systems of linear equations
Matrix notation can be used to solve linear equations. For example, to solve the equations
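\begin{align*}
x - y + 2z &= -4 \\
2x + y - z &= 9 \\
-3x - 3y + 2z &= -20
\end{align*}
we write them in matrix form as
\[ \begin{pmatrix} 1 & -1 & 2 \\ 2 & 1 & -1 \\ -3 & -3 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -4 \\ 9 \\ -20 \end{pmatrix} \]
and form the augmented matrix
\[ [A|\underset{\sim}{b}] = \left(\begin{array}{ccc|c} 1 & -1 & 2 & -4 \\ 2 & 1 & -1 & 9 \\ -3 & -3 & 2 & -20 \end{array}\right). \]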
We proceed to get the matrix into upper triangular form. We select one row (usually the
first) and an element in that row (usually the first). This element is called the pivot. We
subtract multiples of this row from the other rows in order to get zeros beneath the pivot.
We get
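\[ \left(\begin{array}{ccc|c} 1 & -1 & 2 & -4 \\ 0 & 3 & -5 & 17 \\ 0 & -6 & 8 & -32 \end{array}\right) \begin{array}{l} \\ R_2 - 2R_1 \\ R_3 + 3R_1 \end{array} \]
Then use the second element in the second row as the pivot, and get zeros below this element.
\[ \left(\begin{array}{ccc|c} 1 & -1 & 2 & -4 \\ 0 & 3 & -5 & 17 \\ 0 & 0 & -2 & 2 \end{array}\right) \begin{array}{l} \\ \\ R_3 + 2R_2 \end{array} \]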
The coefficient part is now in upper triangular form. This procedure is called row reduction
or Gaussian elimination.
The last row now says −2z = 2. We can solve this to get z = −1.
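Once z is known, the second row gives y and the first row gives x (back substitution). The whole procedure can be sketched in a few lines; this is an illustrative implementation using exact fractions, and the function name `solve_by_row_reduction` and the row-swap step (used only if a pivot happens to be zero) are assumptions, not part of the notes.

```python
from fractions import Fraction


def solve_by_row_reduction(augmented):
    """Solve a square system given as an augmented matrix [A | b] (list of rows)."""
    n = len(augmented)
    m = [[Fraction(v) for v in row] for row in augmented]
    for col in range(n):
        # Find a row with a non-zero pivot; raises StopIteration if none exists.
        pivot_row = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot_row] = m[pivot_row], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            # Subtract a multiple of the pivot row to get a zero beneath the pivot.
            m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    # Back substitution, starting from the last row.
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


if __name__ == "__main__":
    system = [[1, -1, 2, -4], [2, 1, -1, 9], [-3, -3, 2, -20]]
    print(solve_by_row_reduction(system))   # x = 2, y = 4, z = -1
```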
This procedure always works unless we get a row of zeros in the last line. For example, we
might end up with
\[ \left(\begin{array}{ccc|c} a_1 & b_1 & c_1 & d_1 \\ 0 & b_2 & c_2 & d_2 \\ 0 & 0 & 0 & d_3 \end{array}\right). \]
0 0 0 d3
The last equation says that 0 × z = d3 . There are two possibilities.
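If, for example, the reduced matrix is
\[ \left(\begin{array}{ccc|c} 5 & 3 & -2 & -\frac{3}{2} \\ 0 & 2 & -1 & -3 \\ 0 & 0 & 0 & 0 \end{array}\right), \]
we can let z take any value, λ say, which will be a parameter.
The second equation then tells us 2y − z = −3 or $y = \dfrac{\lambda - 3}{2}$.
The first equation then gives
\[ 5x + 3\,\frac{\lambda - 3}{2} - 2\lambda = -\frac{3}{2} \]
or $x = \dfrac{\lambda + 6}{10}$. The solutions then have the form
\[ x = \frac{\lambda + 6}{10}, \quad y = \frac{\lambda - 3}{2}, \quad z = \lambda, \]
where λ is any number. This is just the equation of a straight line with parameter λ. The direction is (1/10, 1/2, 1) and (3/5, −3/2, 0) is a point on the line.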
The case where we get a row of zeros arises if the determinant of the matrix is zero. The
determinant of a matrix can be found as follows. If A is a 2 × 2 matrix,
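\[ \det A = \det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc. \]
If A is a 3 × 3 matrix, the determinant can be calculated by expanding along the first row:
\begin{align*}
\det A = \det\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} &= \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} \\
&= a\begin{vmatrix} e & f \\ h & i \end{vmatrix} - b\begin{vmatrix} d & f \\ g & i \end{vmatrix} + c\begin{vmatrix} d & e \\ g & h \end{vmatrix}
\end{align*}
For an n × n determinant,
\[ \det A = a_{11}M_{11} - a_{12}M_{12} + a_{13}M_{13} - a_{14}M_{14} + \cdots + (-1)^{n+1}a_{1n}M_{1n}, \]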
where Mij is the (n − 1) × (n − 1) determinant of the matrix with the ith row and the jth
column removed.
It follows that the determinant of an upper triangular matrix can be found by taking the
product of the diagonal elements. Note also that the determinant of a matrix is unchanged
if a multiple of one row is added to another row. Therefore, the determinant of a matrix can
be calculated by using row reduction to put the matrix into upper triangular form and then
taking the product of the diagonal elements.
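i.e. to evaluate
\[ \begin{vmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 1 & -1 \\ 2 & 1 & -1 & -2 \\ 1 & 5 & -1 & 1 \end{vmatrix} \]
we write
\begin{align*}
\begin{vmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 1 & -1 \\ 2 & 1 & -1 & -2 \\ 1 & 5 & -1 & 1 \end{vmatrix}
&= \begin{vmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & -2 \\ 0 & -1 & -3 & -4 \\ 0 & 4 & -2 & 0 \end{vmatrix} \begin{array}{l} \\ R_2 - R_1 \\ R_3 - 2R_1 \\ R_4 - R_1 \end{array} \\
&= \begin{vmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & -3 & -6 \\ 0 & 0 & -2 & 8 \end{vmatrix} \begin{array}{l} \\ \\ R_3 + R_2 \\ R_4 - 4R_2 \end{array} \\
&= \begin{vmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & -3 & -6 \\ 0 & 0 & 0 & 12 \end{vmatrix} \begin{array}{l} \\ \\ \\ R_4 - \frac{2}{3}R_3 \end{array} \\
&= 1\times 1\times(-3)\times 12 = -36.
\end{align*}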
Note: If the determinant of a matrix is zero, then row reduction will produce a matrix
with a row of zeros.
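The row-reduction method for determinants can be sketched as follows. The code uses exact fractions; the function name is illustrative, and the sign change on a row swap is a standard property of determinants that is assumed here (it is only needed when a pivot is zero, which does not happen in the example above).

```python
from fractions import Fraction


def det_by_row_reduction(rows):
    """Determinant of a square matrix, found by reducing it to upper triangular form."""
    m = [[Fraction(v) for v in row] for row in rows]
    n = len(m)
    sign = 1
    for col in range(n):
        pivot_row = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot_row is None:
            return Fraction(0)            # no pivot available: the determinant is zero
        if pivot_row != col:
            m[col], m[pivot_row] = m[pivot_row], m[col]
            sign = -sign                  # swapping two rows changes the sign
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            # Adding a multiple of one row to another leaves the determinant unchanged.
            m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    result = Fraction(sign)
    for i in range(n):
        result *= m[i][i]                 # product of the diagonal elements
    return result


if __name__ == "__main__":
    A = [[1, 1, 1, 1], [1, 2, 1, -1], [2, 1, -1, -2], [1, 5, -1, 1]]
    print(det_by_row_reduction(A))        # -36
```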
6.1.3 Homogeneous Systems
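Consider the equations $A\underset{\sim}{x} = \underset{\sim}{0}$. Clearly, $\underset{\sim}{x} = \underset{\sim}{0}$ is a solution, so if det A ≠ 0, it must be the only solution. If we want a non-zero solution, we must have det A = 0. This means there will be many solutions, since if ∼x is a solution, then λ∼x will also be a solution.
For example, consider the matrix
\[ A = \begin{pmatrix} 1 & -1 & 2 \\ 1 & 3 & k \\ -2 & 1 & -1 \end{pmatrix}. \]
Now
\[ \det A = \begin{vmatrix} 3 & k \\ 1 & -1 \end{vmatrix} + \begin{vmatrix} 1 & k \\ -2 & -1 \end{vmatrix} + 2\begin{vmatrix} 1 & 3 \\ -2 & 1 \end{vmatrix} = k + 10. \]
Therefore, for non-zero solutions, k = −10. Then
\[ [A|\underset{\sim}{b}] = \left(\begin{array}{ccc|c} 1 & -1 & 2 & 0 \\ 1 & 3 & -10 & 0 \\ -2 & 1 & -1 & 0 \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & -1 & 2 & 0 \\ 0 & 4 & -12 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right). \]
So, if we choose an arbitrary value for z, we can find the values of x and y. Therefore, we put z = λ. We get y = 3λ and x = λ. The solution is
\[ \underset{\sim}{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \lambda \\ 3\lambda \\ \lambda \end{pmatrix} = \lambda\begin{pmatrix} 1 \\ 3 \\ 1 \end{pmatrix}. \]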
6.1.4 The inner product
The product, $\underset{\sim}{x}^T\underset{\sim}{y}$, is
\[ (x_1, x_2, x_3, \cdots, x_n)\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{pmatrix} = x_1y_1 + x_2y_2 + x_3y_3 + \cdots + x_ny_n, \]
6.2 Eigenvalues and Eigenvectors
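The equation $A\underset{\sim}{x} = \underset{\sim}{b}$ suggests that the matrix A transforms the vector ∼x into the vector ∼b. An important case arises when $A\underset{\sim}{x} \propto \underset{\sim}{x}$, i.e.
\[ A\underset{\sim}{x} = \lambda\underset{\sim}{x}. \]
[Figure: diagram with the labels m2 and y2.]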
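We now look for solutions for which $y_1 = a_1\sin(\omega t + \phi)$ and $y_2 = a_2\sin(\omega t + \phi)$. This means that
\[ \underset{\sim}{y} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}\sin(\omega t + \phi) \quad\text{and}\quad \underset{\sim}{\ddot{y}} = -\omega^2\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}\sin(\omega t + \phi). \]
This means that $\underset{\sim}{\ddot{y}} = -\omega^2\underset{\sim}{y}$ and the set of differential equations becomes $A\underset{\sim}{y} = -\omega^2\underset{\sim}{y}$. Now both sides have a factor of sin(ωt + φ). If we divide by this factor, we get
\[ A\underset{\sim}{a} = -\omega^2\underset{\sim}{a}. \]
Here $\underset{\sim}{a} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$ is an eigenvector and $-\omega^2$ is the eigenvalue. The equations for ∼a are
\[ \begin{pmatrix} -5 & 2 \\ 6 & -6 \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = -\omega^2\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}. \]
These are
\begin{align*}
-5a_1 + 2a_2 &= -\omega^2 a_1 \\
6a_1 - 6a_2 &= -\omega^2 a_2,
\end{align*}
which can be written as
\begin{align*}
(-5 + \omega^2)a_1 + 2a_2 &= 0 \\
6a_1 + (-6 + \omega^2)a_2 &= 0,
\end{align*}
or
\[ \begin{pmatrix} -5 + \omega^2 & 2 \\ 6 & -6 + \omega^2 \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]
Now this is a homogeneous set of equations. Since we want non-zero solutions, we will need to have
\[ \det\begin{pmatrix} -5 + \omega^2 & 2 \\ 6 & -6 + \omega^2 \end{pmatrix} = 0. \]
Expanding the determinant gives $\omega^4 - 11\omega^2 + 18 = 0$, that is $(\omega^2 - 2)(\omega^2 - 9) = 0$, so $\omega^2 = 2$ or $\omega^2 = 9$.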
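We now take each of these eigenvalues in turn and calculate the corresponding eigenvector.
If $\omega^2 = 2$,
\[ \begin{pmatrix} -3 & 2 \\ 6 & -4 \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]
This can be solved to give $\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = C\begin{pmatrix} 2 \\ 3 \end{pmatrix}$.
If $\omega^2 = 9$,
\[ \begin{pmatrix} 4 & 2 \\ 6 & 3 \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]
This can be solved to give $\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = D\begin{pmatrix} 1 \\ -2 \end{pmatrix}$.
We now have two solutions for ∼y. These are
\[ \underset{\sim}{y} = C\begin{pmatrix} 2 \\ 3 \end{pmatrix}\sin(\sqrt{2}\,t + \phi_1) \quad\text{and}\quad \underset{\sim}{y} = D\begin{pmatrix} 1 \\ -2 \end{pmatrix}\sin(3t + \phi_2). \]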
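For a 2 × 2 matrix the eigenvalues are the roots of a quadratic, so the example can be checked with a few lines of code. This is a sketch only: the function name `eigenvalues_2x2` is illustrative, and the formula λ² − (a + d)λ + (ad − bc) = 0 is the expanded form of det(A − λI) = 0 for the matrix with rows (a, b) and (c, d).

```python
import math


def eigenvalues_2x2(a, b, c, d):
    """Real eigenvalues of the matrix [[a, b], [c, d]], smallest first."""
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det        # discriminant of the characteristic quadratic
    if disc < 0:
        raise ValueError("the eigenvalues are complex")
    root = math.sqrt(disc)
    return ((trace - root) / 2, (trace + root) / 2)


if __name__ == "__main__":
    lam_small, lam_large = eigenvalues_2x2(-5, 2, 6, -6)
    # The eigenvalues are -omega^2, so omega^2 is minus each eigenvalue.
    print(-lam_large, -lam_small)         # 2.0 and 9.0
```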
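\[ (A - \lambda I)\underset{\sim}{x} = \underset{\sim}{0}. \]
\[ \det(A - \lambda I) = 0. \]
For example, for the matrix
\[ A = \begin{pmatrix} 2 & 4 & -2 \\ 4 & 2 & 2 \\ -2 & 2 & 5 \end{pmatrix}, \]
this is
\[ \det\begin{pmatrix} 2 - \lambda & 4 & -2 \\ 4 & 2 - \lambda & 2 \\ -2 & 2 & 5 - \lambda \end{pmatrix} = 0. \]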
This will be a cubic equation in λ and will have three solutions. Each solution, λi , will give
rise to an eigenvector, ∼
vi .
In general, the equation det(A − λI) = 0 is an nth degree polynomial equation, called the
characteristic equation of A. Typically, there will be n solutions, and each solution will
give an eigenvector. These eigenvalues and eigenvectors are characteristic properties of the
matrix, A. It sometimes happens that some of the eigenvalues are the same. Sometimes, if two eigenvalues are the same, then this eigenvalue will have two eigenvectors associated with it. In this case, there are still n different eigenvectors. This doesn’t always happen though and sometimes, if two eigenvalues are the same, there will not be n eigenvectors.
There are three important theorems here.
Theorem 1: If A is real and symmetric, then the eigenvectors of distinct eigenvalues of A
will be orthogonal. (This means that their inner product is zero).
Theorem 2: If A is a real and symmetric n × n matrix, the eigenvalues will all be real
and there will be n mutually orthogonal eigenvectors.
Theorem 3: The trace of a matrix A, denoted by tr(A), equals the sum of all eigenvalues, i.e.
\[ \operatorname{tr}(A) = \sum_{i=1}^{n}\lambda_i \]
These results are important, as many of the matrices that arise in practice are real and
symmetric. In particular, the matrix in the example we started is real and symmetric, so
these theorems will apply.
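\[ \det\begin{pmatrix} 2 - \lambda & 4 & -2 \\ 4 & 2 - \lambda & 2 \\ -2 & 2 & 5 - \lambda \end{pmatrix} = 0. \]
\begin{align*}
\therefore\; & (2 - \lambda)\begin{vmatrix} 2 - \lambda & 2 \\ 2 & 5 - \lambda \end{vmatrix} - 4\begin{vmatrix} 4 & 2 \\ -2 & 5 - \lambda \end{vmatrix} + (-2)\begin{vmatrix} 4 & 2 - \lambda \\ -2 & 2 \end{vmatrix} = 0. \\
\therefore\; & (2 - \lambda)(\lambda^2 - 7\lambda + 6) - 4(-4\lambda + 24) - 2(-2\lambda + 12) = 0. \\
\therefore\; & (2 - \lambda)(\lambda - 1)(\lambda - 6) + 20(\lambda - 6) = 0. \\
\therefore\; & (\lambda - 6)\big[(2 - \lambda)(\lambda - 1) + 20\big] = 0. \\
\therefore\; & (\lambda - 6)\big[-\lambda^2 + 3\lambda + 18\big] = 0. \\
\therefore\; & (\lambda - 6)(\lambda - 6)(\lambda + 3) = 0. \\
\therefore\; & (\lambda - 6)^2(\lambda + 3) = 0.
\end{align*}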
The eigenvalues are λ = 6 and λ = −3. Note that λ = 6 is an eigenvalue of order 2.
However, since A is real and symmetric, we expect to get two eigenvectors associated with
this eigenvalue.
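If λ = −3 and $\underset{\sim}{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$, the equation $(A - \lambda I)\underset{\sim}{x} = \underset{\sim}{0}$ becomes
\[ \begin{pmatrix} 5 & 4 & -2 \\ 4 & 5 & 2 \\ -2 & 2 & 8 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \]
The row reduced, augmented matrix is
\[ \left(\begin{array}{ccc|c} 5 & 4 & -2 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right), \]
so
\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 2z \\ -2z \\ z \end{pmatrix} = z\begin{pmatrix} 2 \\ -2 \\ 1 \end{pmatrix}. \]
If λ = 6,
\[ \begin{pmatrix} -4 & 4 & -2 \\ 4 & -4 & 2 \\ -2 & 2 & -1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \]
The row reduced, augmented matrix is
\[ \left(\begin{array}{ccc|c} 2 & -2 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right). \]
\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} y - \frac{1}{2}z \\ y \\ z \end{pmatrix} = y\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + z\begin{pmatrix} -\frac{1}{2} \\ 0 \\ 1 \end{pmatrix}. \]
\[ \begin{array}{c|c} \lambda = -3 & \lambda = 6 \\ \hline \begin{pmatrix} 2 \\ -2 \\ 1 \end{pmatrix} & \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix},\ \begin{pmatrix} -\frac{1}{2} \\ 0 \\ 1 \end{pmatrix} \end{array} \]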
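Note that there are two eigenvectors for λ = 6. Also each of these is orthogonal to the eigenvector for λ = −3. Any linear combination of the eigenvectors for λ = 6 will also be an eigenvector with eigenvalue 6. This means that we can replace $\begin{pmatrix} -\frac{1}{2} \\ 0 \\ 1 \end{pmatrix}$ with
\[ 2\begin{pmatrix} -\frac{1}{2} \\ 0 \\ 1 \end{pmatrix} + \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} -\frac{1}{2} \\ \frac{1}{2} \\ 2 \end{pmatrix} \]
to get a complete set of orthogonal eigenvectors.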
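The eigenvectors, their orthogonality and the trace result (Theorem 3) can all be verified directly for this matrix. A minimal sketch, using exact fractions; the helper names `mat_vec` and `dot` are illustrative.

```python
from fractions import Fraction

A = [[2, 4, -2], [4, 2, 2], [-2, 2, 5]]

# (eigenvalue, eigenvector) pairs taken from the worked example above.
pairs = [
    (-3, [2, -2, 1]),
    (6, [1, 1, 0]),
    (6, [Fraction(-1, 2), Fraction(1, 2), 2]),
]


def mat_vec(M, v):
    """Product of a matrix (list of rows) with a column vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    for lam, v in pairs:
        print(mat_vec(A, v) == [lam * x for x in v])        # A v = lambda v
    print(dot(pairs[0][1], pairs[1][1]),
          dot(pairs[0][1], pairs[2][1]),
          dot(pairs[1][1], pairs[2][1]))                    # all zero: mutually orthogonal
    print(sum(A[i][i] for i in range(3)), sum(lam for lam, _ in pairs))   # trace and eigenvalue sum
```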
6.3 EXERCISES - Eigenvalues and eigenvectors
1. Find the general solution to the following sets of equations.
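(i) 2x + 3y = 0, 4x + 6y = 0.
(ii) x + 2y − z = 0, x + 3y = 0, x + y − 2z = 0.
2. Find the eigenvalues and corresponding eigenvectors for the following matrices.
(i) $A = \begin{pmatrix} 6 & 3 \\ 2 & 7 \end{pmatrix}$.
(ii) $A = \begin{pmatrix} 4 & 0 & 1 \\ -2 & 1 & 0 \\ -2 & 0 & 1 \end{pmatrix}$.
\[ A = \begin{pmatrix} 1 & 1 & 2 \\ 1 & 2 & 1 \\ 2 & 1 & 1 \end{pmatrix}. \]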
5. (a) If λ is an eigenvalue of a matrix A and ∼x is the corresponding eigenvector, show that $\lambda^2$ is an eigenvalue of $A^2$ and ∼x is the corresponding eigenvector.
(b) If the eigenvalues of a matrix A are $\lambda_1, \ldots, \lambda_n$ (all non-zero), show that the eigenvalues of $A^{-1}$ are $\lambda_1^{-1}, \ldots, \lambda_n^{-1}$.
(c) If the eigenvalues of a matrix A are λ1 , . . . , λn , what are the eigenvalues of the
matrix A − kI?
0 0 f
(b) Show that the eigenvalues of any upper triangular matrix are just the diagonal
elements of the matrix.
6.4 SELECTED ANSWERS - Eigenvalues and eigenvectors
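1. (i) $\begin{pmatrix} x \\ y \end{pmatrix} = \lambda\begin{pmatrix} 3 \\ -2 \end{pmatrix}$, (ii) $\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \lambda\begin{pmatrix} 3 \\ -1 \\ 1 \end{pmatrix}$.
2. (i) $\lambda_1 = 9$, $\underset{\sim}{v}_1 = (1, 1)^T$, $\lambda_2 = 4$, $\underset{\sim}{v}_2 = (-3/2, 1)^T$;
(ii) $\lambda_1 = 1$, $\underset{\sim}{v}_1 = (0, 1, 0)^T$, $\lambda_2 = 2$, $\underset{\sim}{v}_2 = (-1/2, 1, 1)^T$, $\lambda_3 = 3$, $\underset{\sim}{v}_3 = (-1, 1, 1)^T$.
3. (a) $\lambda_1 = 1$, $\underset{\sim}{v}_1 = (1, -2, 1)^T$, $\lambda_2 = 4$, $\underset{\sim}{v}_2 = (1, 1, 1)^T$, $\lambda_3 = -1$, $\underset{\sim}{v}_3 = (-1, 0, 1)^T$.
6. λ = a, d, f.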
7 Partial Differential Equations
7.1 Types of Partial Differential Equation
We know that if y(x) is a function of one variable, the derivatives will be ordinary derivatives, $\dfrac{dy}{dx}$. Such a function may satisfy an ordinary differential equation such as $y'' + ay' + by = 0$. Usually the differential equation will be of the form $g(x, y', y'', \ldots) = 0$. If z = f(x, y) is a function of more than one variable, the derivatives will be partial derivatives, ∂f/∂x and ∂f/∂y. For such functions, any differential equations will involve partial derivatives. e.g.
\[ g\left(x, y, f, \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial^2 f}{\partial x^2}, \frac{\partial^2 f}{\partial x\,\partial y}, \frac{\partial^2 f}{\partial y^2}, \ldots\right) = 0. \]
Such equations are partial differential equations. For functions of three variables, they can
be more complicated.
Some differential equations of this type are simple enough and we can handle them using
familiar methods. (They are effectively ordinary differential equations.) Others are more
difficult and need ‘new’ methods. Some of these equations are important in Science and
Engineering. For example, the Heat equation describes the flow of heat through a solid.
For example, given
\[ \frac{\partial u}{\partial x} = y\sin x + e^y, \]
we can treat y as a (constant) parameter and integrate to get
\[ u = -y\cos x + xe^y + C(y). \]
Here the arbitrary constant can actually be an arbitrary function of y. If the equation involves derivatives with respect to both variables, this simple type of solution is not possible.
We will look at some second order equations of this type. There are two particular equations
we will look at, namely
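i. Heat Equation: $\dfrac{\partial u}{\partial t} = \nu\dfrac{\partial^2 u}{\partial x^2}$, where ν is a constant.
The Heat Equation describes the temperature distribution in a thin rod. The 3D equation:
\[ \frac{\partial u}{\partial t} = \nu\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}\right) \]
describes the distribution in 3D objects.
ii. Wave Equation: $\dfrac{\partial^2 u}{\partial t^2} = c^2\dfrac{\partial^2 u}{\partial x^2}$.
This describes the movement or vibrations (in a string) or propagation of signals (in a coaxial cable). In 3D the equation is
\[ \frac{\partial^2 u}{\partial t^2} = c^2\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2}\right) \]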
These equations are linear, homogeneous, and second order. They have constant coefficients.
This means we can add two solutions to get another solution.
7.2 Separation of Variables
This is the fundamental method for solving such equations. It depends on being able to add
solutions to get another solution. This is called the principle of superposition.
Like all differential equations, the Heat Equation, $\dfrac{\partial u}{\partial t} = \nu\dfrac{\partial^2 u}{\partial x^2}$, needs extra information to
determine a solution uniquely. For example, we would expect to know the initial temperature
distribution as well as the temperature on the boundaries. We might have
The resulting temperature (graphs of the solution at different times) might look like:
[Figure: graphs of the temperature u against x on 0 ≤ x ≤ l at times t = 0.05, t = 0.1, t = 0.3 and t = 1.]
Before solving the equation, we need to establish the ‘principle of superposition’. If u1 and
u2 both satisfy the differential equation, then
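\[ \frac{\partial u_1}{\partial t} = \nu\frac{\partial^2 u_1}{\partial x^2}, \]
\[ \frac{\partial u_2}{\partial t} = \nu\frac{\partial^2 u_2}{\partial x^2}. \]
We can add these two equations to get
\[ \frac{\partial(u_1 + u_2)}{\partial t} = \nu\frac{\partial^2(u_1 + u_2)}{\partial x^2}. \]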
That is, u1 + u2 is also a solution. Therefore, we can look for a set of solutions of a certain
type and hope to be able to add multiples of them in such a way that the boundary and
initial conditions will be satisfied.
This ‘separates’ the function of two variables, u, into two functions of one variable.
From the assumption above, we have $\dfrac{\partial u}{\partial t} = X\dot{T}$ and $\dfrac{\partial^2 u}{\partial x^2} = X''T$.
We substitute into the DE to get:
\[ X\dot{T} = \nu X''T. \]
\[ \therefore\quad \frac{1}{\nu}\frac{\dot{T}}{T} = \frac{X''}{X}. \]
Now the LHS of this last equation depends on t only, and the RHS depends on x only. The only way that this can happen is if both sides are constant. Therefore we have
\[ \frac{1}{\nu}\frac{\dot{T}}{T} = \frac{X''}{X} = k, \]
where k is called the constant of separation. These equations can now be written separately as
\[ \dot{T} - \nu kT = 0 \quad\text{and}\quad X'' - kX = 0. \]
Note that the differential equation has ‘separated’ into two ordinary differential equations.
Both of these equations can be solved, but the nature of the solutions of the second depends
on the value of k. Also, the boundary conditions of the partial differential equation suggest
that we require X(0) = X(l) = 0 as conditions on X. This actually restricts the possible
values of k. There are three possible cases:
i. k > 0.
In this case, let $k = q^2$. Then the solution for X is $X(x) = Ae^{qx} + Be^{-qx}$. We now
look at the boundary conditions.
$X(0) = 0 \Rightarrow A + B = 0$
$X(l) = 0 \Rightarrow Ae^{ql} + Be^{-ql} = 0.$
These equations imply that A = B = 0. This implies a trivial solution and therefore
there is no non-zero solution of this type.
ii. k = 0.
In this case, X(x) = Ax+B and again the boundary conditions imply that A = B = 0.
This also implies a trivial solution.
iii. k < 0.
In this final case, let $k = -p^2$. The solution is then $X(x) = A\cos px + B\sin px$. Then
$X(0) = 0 \Rightarrow A = 0$
$X(l) = 0 \Rightarrow B\sin pl = 0.$
Now B can’t be zero (or all solutions would be zero/trivial) so $\sin pl = 0$. This means
that $pl = n\pi$ for some integer n. Therefore, $p = n\pi/l$ and $k = -\frac{n^2\pi^2}{l^2}$. Therefore
$$X_n(x) = B_n\sin\frac{n\pi x}{l}.$$
This gives a range of possible solutions for X. Note that the solutions can be labelled
by the subscript n.
Now the value of k can be used to find the corresponding value of Tn . We have
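$$\dot{T}_n + \frac{\nu n^2\pi^2}{l^2}T_n = 0$$
or $\dot{T}_n + \lambda_n^2 T_n = 0$ where $\lambda_n^2 = \frac{\nu n^2\pi^2}{l^2}$,
so $T_n(t) = A_n e^{-\lambda_n^2 t}$
and
$$u_n(x,t) = X_n(x)T_n(t) = B_n\sin\frac{n\pi x}{l}\,e^{-\lambda_n^2 t}.$$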
(The constant An can be absorbed into Bn .) Now each such function un (x, t) satisfies the
differential equation and each one also satisfies the boundary conditions. However, the initial
condition, u(x, 0) = 1, is not satisfied. It may be though, that we can take a sum of these
solutions in such a way that this initial condition is satisfied. That is, we write
$$u(x,t) = \sum_n u_n(x,t) = \sum_{n=1}^{\infty} B_n\sin\frac{n\pi x}{l}\,e^{-\lambda_n^2 t}.$$
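We need to choose the coefficients $B_n$ so that the initial condition is satisfied. Setting $t = 0$
gives $1 = \sum_{n=1}^{\infty} B_n\sin\frac{n\pi x}{l}$, which implies that the $B_n$ are the coefficients of the
half-range Fourier sine series for the function $f(x) = 1$. This means that
$$B_n = \frac{2}{l}\int_0^l \sin\frac{n\pi x}{l}\,dx = \begin{cases} \dfrac{4}{n\pi}, & (n \text{ odd}) \\ 0, & (n \text{ even}). \end{cases}$$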
As the summation in the solution u(x, t) includes all terms for n = 1, 2, 3, . . ., but $B_n = 0$
for even values of n, we make the substitution $n = 2m - 1$ in order to use only the odd values.
Therefore,
$$u(x,t) = \frac{4}{\pi}\sum_{m=1}^{\infty}\frac{1}{2m-1}\sin\frac{(2m-1)\pi x}{l}\,e^{-(\nu(2m-1)^2\pi^2/l^2)t}.$$
Note that as t → ∞, u → 0. For large values of t, the temperature is roughly proportional
to $\sin\frac{\pi x}{l}$, as this part of the solution decays least rapidly.
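The series solution can be checked numerically by summing a finite number of terms. The following is a minimal sketch, assuming ν = 1 and l = 1 (values chosen only for illustration; the notes do not specify them) and a hypothetical helper name heat_series:

```python
import math

def heat_series(x, t, nu=1.0, length=1.0, terms=200):
    """Partial sum of the series solution with u(x, 0) = 1 and
    u(0, t) = u(l, t) = 0. nu and length are assumed values."""
    total = 0.0
    for m in range(1, terms + 1):
        n = 2 * m - 1  # only odd n contribute, since B_n = 0 for even n
        lam_sq = nu * (n * math.pi / length) ** 2
        total += math.sin(n * math.pi * x / length) * math.exp(-lam_sq * t) / n
    return 4.0 / math.pi * total

# Temperature at the midpoint of the bar at several times
for t in (0.05, 0.1, 0.3, 1.0):
    print(t, heat_series(0.5, t))
```

For the larger times the printed values are dominated by the m = 1 term, which is the slowly decaying sin(πx/l) mode described above.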
Another type of boundary condition that is common for this type of problem is that of an
insulated end.
[Figure: a bar with an insulated end at x = l.]
This means that no heat flows across the end.
$$\frac{\partial u}{\partial x}(l,t) = u_x(l,t) = 0.$$
Such conditions can be treated similarly to the other type of boundary condition.
7.2.2 The Wave Equation
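The wave equation is $\frac{\partial^2 u}{\partial t^2} = c^2\frac{\partial^2 u}{\partial x^2}$.
We consider the auxiliary conditions we would need for the case of a vibrating string.
If the string is fixed at both ends, we expect
$u(0,t) = u(l,t) = 0.$
We also need the initial displacement and the initial velocity of the string:
$u(x,0) = f(x)$
$u_t(x,0) = g(x).$
[Figure: initial shape of the string, u(x, 0) = f(x), plotted against x for 0 ≤ x ≤ l.]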
Again, we can superimpose solutions, so we look for solutions of the type u(x, t) = X(x)T (t).
If we substitute this into the differential equation, we get $X(x)\ddot{T}(t) = c^2X''(x)T(t)$. Therefore,
$$\frac{\ddot{T}}{c^2T} = \frac{X''}{X}.$$
Each side of this equation is a function of x or t only, so once again, they both must be
constant. We get the equations
$$X'' - kX = 0 \quad\text{and}\quad \ddot{T} - c^2kT = 0.$$
Once again, the boundary conditions imply X(0) = X(l) = 0, so we can follow the same
procedure as for the Heat Equation (k > 0, k = 0 =⇒ trivial solutions) to get
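$$k = -\frac{n^2\pi^2}{l^2}$$
and $X_n(x) = B_n\sin\frac{n\pi x}{l}$.
The equation for $T_n$ is: $\ddot{T}_n + \lambda_n^2T_n = 0$, $\lambda_n = \frac{cn\pi}{l}$
so: $T_n = A_n\cos\lambda_n t + B_n\sin\lambda_n t$
and: $u_n(x,t) = \sin\frac{n\pi x}{l}\left(A_n\cos\lambda_n t + B_n\sin\lambda_n t\right)$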
The arbitrary constant Bn in Xn (x) has been absorbed into An and Bn .
These functions will satisfy the differential equation and the boundary conditions. We still
have to satisfy the initial conditions. In this case, there are two of them. We write
$$u(x,t) = \sum_n u_n(x,t) = \sum_{n=1}^{\infty}\sin\frac{n\pi x}{l}\left(A_n\cos\lambda_n t + B_n\sin\lambda_n t\right).$$
It follows that
$$u(x,0) = f(x) = \sum_{n=1}^{\infty}A_n\sin\frac{n\pi x}{l}.$$
Hence, the $A_n$ are the coefficients of the half-range Fourier sine series of f(x). Also,
$$u_t(x,t) = \sum_{n=1}^{\infty}\sin\frac{n\pi x}{l}\left(-A_n\lambda_n\sin\lambda_n t + B_n\lambda_n\cos\lambda_n t\right)$$
$$\therefore\quad u_t(x,0) = g(x) = \sum_{n=1}^{\infty}B_n\lambda_n\sin\frac{n\pi x}{l}.$$
Therefore, $B_n\lambda_n$ are the coefficients of the half-range Fourier sine series for g(x).
If we are given functions f (x) and g(x), we can calculate An and Bn using the half range
Fourier series integrals.
Note that each term in the series for u(x, t), i.e.
$$u_n(x,t) = \sin\frac{n\pi x}{l}\left(A_n\cos\lambda_n t + B_n\sin\lambda_n t\right),$$
is a harmonic of the motion.
[Figure: the first two harmonics, $u_1$ and $u_2$.]
As n increases, the frequency of the nth harmonic increases. Thus, we see that the vibration
can be expressed as a linear combination of harmonics.
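The recipe for the vibrating string (compute the half-range sine coefficients of f and g, then sum the harmonics) can be sketched numerically. This is only an illustration under stated assumptions: c = 1 and l = 1 are assumed values, the function names are hypothetical, and the integrals are estimated with a simple midpoint rule rather than evaluated exactly.

```python
import math

def sine_coefficient(func, n, length=1.0, steps=2000):
    """Midpoint-rule estimate of (2/l) * integral_0^l func(x) sin(n pi x / l) dx."""
    h = length / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += func(x) * math.sin(n * math.pi * x / length)
    return 2.0 / length * total * h

def string_displacement(x, t, f, g, c=1.0, length=1.0, terms=20):
    """Partial sum of the harmonics u_n(x, t) for initial shape f and velocity g."""
    total = 0.0
    for n in range(1, terms + 1):
        lam = c * n * math.pi / length
        a_n = sine_coefficient(f, n, length)
        # B_n * lambda_n are the sine coefficients of g, so divide by lambda_n
        b_n = sine_coefficient(g, n, length) / lam
        total += math.sin(n * math.pi * x / length) * (
            a_n * math.cos(lam * t) + b_n * math.sin(lam * t)
        )
    return total

# Hypothetical initial data: a single harmonic released from rest
f = lambda x: math.sin(math.pi * x)
g = lambda x: 0.0
print(string_displacement(0.5, 0.0, f, g))  # close to 1 at the midpoint
```

With this choice of f only the first harmonic is present, so the string simply oscillates in the shape sin(πx/l).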
Note: Depending on the boundary and initial conditions of the PDE, the separation
of variables method may give non-trivial solutions for more than one case of the
separation constant k, e.g. non-trivial solutions for both k < 0 and k = 0.
Recall that if two different solutions to one differential equation are determined, the sum of
the two solutions will also be a solution.
7.3 EXERCISES - Partial Differential Equations
In questions 1–2, u is a function of x and y.
(a) $u_{xx} = 0$.
(b) $u_{xyy} = \cos 2x$.
(c) $u_x = 0$, $u_y = 0$.
(d) $u_{xx} = 0$, $u_{xy} = 0$, $u_{yy} = 0$.
2. Use ‘separation of variables’ to find (some) solutions of the following partial differential
equations.
(a) $u_x + u_y = 0$.
(b) $xu_x - yu_y = 0$.
(c) $u_x + u_y = 2u$.
3. If f and g are arbitrary functions of one variable, show that the following expressions
satisfy the corresponding equation from question 2.
(a) $f(x - y)$, (b) $f(xy)$, (c) $f(x - y)e^{x+y}$.
4. An iron bar of length l has initial temperature u(x, 0) = T0 x/l where T0 is a constant.
For t > 0, the temperature of the end points is fixed at 0. Find an expression for the
temperature in the bar for t > 0.
6. Find the temperature, u(x, t), in a bar of length, l, that is perfectly insulated at the ends
x = 0 and x = l. Assume that the temperature satisfies ut = νuxx and u(x, 0) = f (x).
In the case of insulated ends, the boundary conditions are
$$\frac{\partial u}{\partial x}(0,t) = \frac{\partial u}{\partial x}(l,t) = 0.$$
Show that the method of separation of variables gives the solution in the form
$$u(x,t) = A_0 + \sum_{n=1}^{\infty}A_n\cos\frac{n\pi x}{l}\,e^{-\nu(n\pi/l)^2t}.$$
7.4 SELECTED ANSWERS - Partial Differential Equations
1. (a) $u(x,y) = xf(y) + g(y)$, (b) $u(x,y) = \frac{1}{4}y^2\sin 2x + yp(x) + q(x) + r(y)$,
(c) $u(x,y) = c$ (a constant), (d) $u(x,y) = Ax + By + C$.
2. (a) $u(x,y) = Ce^{k(x-y)}$, (b) $u(x,y) = C(xy)^k$, (c) $u(x,y) = Ce^{2y}e^{k(x-y)}$.
4. $$u(x,t) = \frac{2T_0}{\pi}\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n}\sin\frac{n\pi x}{l}\,e^{-\nu(n\pi/l)^2t}.$$
8 Probability and Statistics
8.1 Probability Fundamentals
An experiment is the process of obtaining an observation.
Some events:
• E1 : Observe a 1.
• E2 : Observe a 2.
• E3 : Observe a 3.
• E4 : Observe a 4.
• E5 : Observe a 5.
• E6 : Observe a 6.
• A: Observe an odd number.
• B: Observe an even number.
Events A and B can be regarded as combinations of events E1 to E6 .
In the example above, events E1 to E6 are simple events. Events A and B are not.
The Sample Space (S) is the list (or set) of all the simple events possible from an experiment.
For the example of tossing a six-sided die, the sample space is: S = {E1 , E2 , . . . , E6 }
Example: Tossing two coins. The sample space includes four simple events:
Note that E2 and E3 are different events as the order (or the outcome of each coin) is specific.
Venn Diagram
A Venn Diagram shows the outcomes (or events) of an experiment as a portion of the sample
space (S)
An example of a Venn diagram displaying three events A, B and C, each made up of different
outcomes is shown below:
Probability of an Event
The probability of an event is a measurement of the likelihood that an event will occur in
the next experiment.
One way to estimate this is to repeat the experiment a large number of times (N) and record
the number of times (n) that event A occurs. The probability of A is then estimated by the
relative frequency, $P(A) \approx n/N$.
Theoretically we would need an infinite number of trials for the probability to be exact.
Practically, we would expect, for a fair die, that the numbers 1 to 6 would occur with equal
frequency and hence:
$$P(E_1) = P(E_2) = P(E_3) = P(E_4) = P(E_5) = P(E_6) = \frac{1}{6}$$
Mutually Exclusive Events
Two events are mutually exclusive if, when one event occurs, the other cannot.
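As the simple events are mutually exclusive and together make up the sample space, their
probabilities sum to 1:
$$\sum_i P(E_i) = 1$$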
The probability of an event A is equal to the sum of the probabilities of all the simple events
contained in A.
$$P(A) = P(E_1) + P(E_3) + P(E_5) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}$$
Intersection: The intersection of two events A and B is the event that both A and B occur.
It is denoted as AB or A ∩ B
Union: The union of two events A and B is the event that A or B or both occur. It is
denoted as A ∪ B, and the probability P (A ∪ B) can be visualised with the Venn diagram:
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
Complement: The complement of an event A consists of all of the simple events that are
not in event A.
8.2 Conditional Probability and Independence
Two events may be related such that the probability of one event occurring depends on
whether or not the other event has occurred.
For instance – the probability of rain is dependent on whether or not the day is cloudy.
The conditional probability of A given that B has occurred is written as: P (A |B)
$$P(A\,|\,B) = \frac{P(A \cap B)}{P(B)} \quad\text{or}\quad P(A \cap B) = P(A\,|\,B)\,P(B)$$
$$P(B\,|\,A) = \frac{P(A \cap B)}{P(A)} \quad\text{or}\quad P(A \cap B) = P(B\,|\,A)\,P(A)$$
Example: Consider an experiment that randomly chooses a single object from a total of
1100 objects. 750 objects are coloured red, and 350 are coloured blue. Of the red objects,
600 are made of glass and 150 made of metal. Of the blue objects, 50 are glass and 300
metal. Given that a red object is chosen, calculate the probability of the object being made
of glass:
Using the table, given that a red object is chosen (2nd column), the probability of the object
being glass is:
$$P(G\,|\,R) = \frac{600}{750} = \frac{4}{5} = 0.8$$
Alternatively, using: $P(G\,|\,R) = \frac{P(R \cap G)}{P(R)}$
Probability that the object is red: $P(R) = \frac{750}{1100}$
Probability that the object is red and made of glass: $P(R \cap G) = \frac{600}{1100}$
Therefore: $P(G\,|\,R) = \frac{P(R \cap G)}{P(R)} = \frac{(600/1100)}{(750/1100)} = \frac{600}{750} = 0.8$
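The same calculation can be reproduced from the raw counts. The sketch below uses exact fractions; the dictionary layout and variable names are illustrative choices, not part of the notes.

```python
from fractions import Fraction

# Counts from the example: (colour, material) -> number of objects
counts = {
    ("red", "glass"): 600,
    ("red", "metal"): 150,
    ("blue", "glass"): 50,
    ("blue", "metal"): 300,
}
total = sum(counts.values())  # 1100 objects in all

# P(R) and P(R and G) as exact fractions of the total
p_red = Fraction(counts[("red", "glass")] + counts[("red", "metal")], total)
p_red_and_glass = Fraction(counts[("red", "glass")], total)

# Conditional probability P(G | R) = P(R and G) / P(R)
p_glass_given_red = p_red_and_glass / p_red
print(p_glass_given_red, float(p_glass_given_red))  # 4/5 0.8
```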
8.2.1 Independence
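Two events A and B are independent if $P(A\,|\,B) = P(A)$ and $P(B\,|\,A) = P(B)$, i.e. the fact
that B has occurred has no influence on the probability of A occurring and vice versa. For
independent events:
$$P(A \cap B) = P(A)P(B)$$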
Let A be the event that the first capacitor is defective, and B be the event that the sec-
ond capacitor is defective. A and B are independent events, but they each have different
probabilities of occurring.
$$P(A \cap B) = \frac{5}{40}\cdot\frac{4}{39} = \frac{1}{78} = 0.0128$$
Example: An electrical system consists of 4 components. The system works if components
A and B work and if C or D work. Assume that the reliability of each component is
independent. The probability of each component working is: P(A) = P(B) = 0.9 and
P (C) = P (D) = 0.8. Calculate the probability that the whole system works.
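One way to work this example is sketched below. It reads the system as A and B in series with the parallel pair (C or D), which is an interpretation of the wording above, and uses independence to multiply probabilities.

```python
p_a = p_b = 0.9
p_c = p_d = 0.8

# "C or D works" is the complement of "both C and D fail"
p_c_or_d = 1 - (1 - p_c) * (1 - p_d)

# Independence lets us multiply the probabilities of the parts in series
p_system = p_a * p_b * p_c_or_d
print(round(p_system, 4))  # 0.7776
```

The same value for the parallel pair follows from the union rule, P(C ∪ D) = P(C) + P(D) − P(C)P(D) = 0.96.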
8.3 Permutations and Combinations
A combination is a choice of r objects from a list of n possible (or total) objects, where the
order of the objects does not matter.
Combinations of two
a1 and a2
a1 and a3
a2 and a3
$$P^n_r = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!}$$
8.4 Random Variables (X)
A random variable is a rule (or function) that assigns a real number (or value) to each
event (or outcome) in the sample space of an experiment.
A random variable may be discrete (having specific values) or continuous (any value in a
continuous range).
Categorical data can be described using random variables by arbitrarily assigning a value to
each category.
Example: Consider an experiment involving the toss of two coins. We can arbitrarily
choose a value for each outcome.
Typically, they arise from values of measured quantities (e.g. length, temperature, weight,
etc.).
A probability distribution is a function f (x) that gives the probability for each possible value
of the random variable.
The probability distribution is read as the probability that random variable X is equal to a
particular number x.
$$f(x) = P(X = x)$$
In the table above: $P(X = 1) = \frac{1}{2}$ and $P(1) = \frac{1}{2}$
The set of ordered pairs [x, f (x)] is known as a probability function, probability mass
function (pmf) or probability distribution.
Some important properties are: $f(x) \ge 0$ for all values of x, and $\sum_x f(x) = 1$
The cumulative distribution function gives us the probability that random variable X is less
than or equal to some value x.
$$F(x) = P(X \le x) = \sum_{t \le x} P(X = t) = \sum_{t \le x} f(t)$$
We can sketch the graph of the cumulative distribution function, F (x), by adding values
from the probability mass function, f (x).
8.4.2 Probability Distributions for Continuous Random Variables
A continuous random variable has a zero probability of being exactly equal to a particular
value. i.e. P (X = x) = 0.
We are interested in calculating the probability that a continuous random variable lies in a
certain range of values: $P(a \le X \le b)$
Some important properties are: $f(x) \ge 0$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$
As for discrete random variables, the cumulative distribution function evaluates the probability
that random variable X is less than or equal to some value x.
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$
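Example: Consider the probability density function:
$$f(x) = \begin{cases} k\sqrt{x}, & 0 \le x < 4 \\ 0, & \text{elsewhere} \end{cases}$$
a) To evaluate k, we use $\int_{-\infty}^{\infty} f(x)\,dx = 1$, to get:
$$\int_0^4 k\sqrt{x}\,dx = 1$$
$$\left[\frac{2}{3}kx^{3/2}\right]_0^4 = 1$$
$$\frac{16}{3}k = 1$$
$$k = \frac{3}{16}$$
b) $$P(3 < X < 4) = \int_3^4 \frac{3}{16}\sqrt{x}\,dx = \frac{3}{16}\left[\frac{2}{3}x^{3/2}\right]_3^4 = 1 - \frac{1}{8}\sqrt{27} = 0.3505$$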
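Both parts of this example can be checked numerically. The sketch below uses a simple midpoint rule (an assumed choice of quadrature, with a hypothetical helper name integrate) instead of the exact antiderivative.

```python
import math

def integrate(func, a, b, steps=100_000):
    """Midpoint-rule estimate of the integral of func over [a, b]."""
    h = (b - a) / steps
    return sum(func(a + (i + 0.5) * h) for i in range(steps)) * h

# a) k is fixed by requiring the total area under k*sqrt(x) on [0, 4] to be 1
k = 1.0 / integrate(math.sqrt, 0.0, 4.0)

# b) probability that X lies between 3 and 4
prob = integrate(lambda x: k * math.sqrt(x), 3.0, 4.0)

print(round(k, 4), round(prob, 4))  # 0.1875 0.3505
```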
The population mean is also called the expected value, E[X] of the random variable X.
If we repeated the experiment many times, say 1 000 000, we would expect:
$$\mu = E[X] = \sum_x xP(X = x) = \sum_x xf(x)$$
Example: A lot containing 7 components is sampled by a quality inspector; the lot contains
4 good components and 3 defective components. A sample of 3 is taken by the inspector.
Calculate the expected value of the number of good components in this sample.
Let X represent the number of good components in the sample. For the probability mass
function f(x), we need the probability of there being 0, 1, 2 or 3 good components in the
sample, i.e. P(X = x), x = 0, 1, 2, 3.
The total number of combinations in choosing 3 components from 7 is: $\binom{7}{3}$
The number of combinations that have x good components chosen from the 4 possible, and
3 − x defective components from the 3 possible is: $\binom{4}{x}\binom{3}{3-x}$. Therefore the pmf is:
$$f(x) = \frac{\binom{4}{x}\binom{3}{3-x}}{\binom{7}{3}}$$
which gives: f(0) = 1/35, f(1) = 12/35, f(2) = 18/35, f(3) = 4/35. Therefore
$$\mu = E[X] = (0)\frac{1}{35} + (1)\frac{12}{35} + (2)\frac{18}{35} + (3)\frac{4}{35} = \frac{12}{7} \approx 1.7$$
Therefore, if a sample of size 3 is selected at random over and over again from a lot of 4 good
components and 3 defective components, it will contain, on average, 1.7 good components.
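The pmf and expected value in this example can be reproduced with exact arithmetic. This is a short sketch; the variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

good, defective, sample = 4, 3, 3

def pmf(x):
    """P(X = x): x good components in a sample of 3 drawn from 4 good, 3 defective."""
    return Fraction(
        comb(good, x) * comb(defective, sample - x),
        comb(good + defective, sample),
    )

# Expected number of good components, E[X] = sum of x * f(x)
mean = sum(x * pmf(x) for x in range(sample + 1))
print([str(pmf(x)) for x in range(sample + 1)], mean)
```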
Variance ($\sigma^2$)
The variance $\sigma^2$ of a random variable X, also written var[X], is defined as the mean value, or the
expected value, of the square of the deviation of X from its mean, i.e.
$$\sigma^2 = E[(X - \mu)^2]$$
In general, the expected value of a function g(X) of a discrete random variable is given by:
$$E[g(X)] = \sum_x g(x)P(X = x)$$
1. E[cX] = cE[X]
We can also calculate other expectation values related to the random variable X. For
example, we can calculate: $E[X^2]$, $E[X^3]$, $E[X^4]$. These are called the moments of X.
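Example: For the two coin toss problem considered previously (µ = 1)

Outcome (x)   Result of Coin Toss   Probability f(x)   (x − µ)²   f(x)(x − µ)²
0             H and H               1/4                1          1/4
1             H and T               1/2                0          0
2             T and T               1/4                1          1/4

$$\sigma^2 = \sum_x (x - \mu)^2P(X = x) = 1/2$$
Expanding the square gives $\sigma^2 = \sum_x x^2f(x) - 2\mu\sum_x xf(x) + \mu^2\sum_x f(x)$. Since
$\sum_x xf(x) = \mu$ and $\sum_x f(x) = 1$, this simplifies to
$$\sigma^2 = \sum_x x^2f(x) - \mu^2 = E[X^2] - \mu^2$$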
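The two expressions for the variance can be compared directly on the coin toss table. This is a minimal check using exact fractions; the dictionary name is an illustrative choice.

```python
from fractions import Fraction

# pmf of the two coin toss example: outcome x -> f(x)
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

mu = sum(x * p for x, p in pmf.items())

# Definition: expected squared deviation from the mean
var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Shortcut: E[X^2] - mu^2
var_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mu ** 2

print(mu, var_definition, var_shortcut)  # 1 1/2 1/2
```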
Standard Deviation
The population standard deviation is the square root of the variance: $\sigma = \sqrt{\sigma^2}$
The expectation values for a continuous random variable are similar in form to the discrete
random variable equations.
8.5 EXERCISES - Probability and Statistics
1. An experiment consists of flipping a coin and then tossing a single die if the coin shows
heads or two dice if the coin shows tails. Using the notation H4, for example, to denote
the outcome that the coin shows heads and the die comes up 4, and T 15 to denote the
outcome that the coin comes up tails and the dice show a 1 and a 5, illustrate the 27
elements of the sample space by using a tree diagram. The order of the result of the
dice is not important.
2. A store advertises for two positions: one is front of house and one is for stacking
shelves. The store receives 4 applicants, two of whom are teenagers and the other two
are adults.
The store selects one applicant for the position of front of house at random and then
selects one of the remaining applicants for the shelf-stacking position at random.
Using the notation of T1 A2 to denote that the first teenager has been selected for
front of house and the second adult has been selected for stacking shelves, answer the
following questions:
3. (a) 6 people are in a queue to buy a ticket. How many ways can they be lined up?
(b) If, within the queue, there is a group of 3 people who insist on being next to one
another, how many ways are possible?
(c) If, instead, 2 people cannot follow one another, how many ways are possible?
4. Three people are selected at random from a pool of 40 to join a jury. Find the number
of sample points in S for choosing a jury.
5. In how many ways can a group of 5 people be seated at a poker table in a circle?
(Permutations in which everyone has the same neighbours as another permutation
should be discounted)
6. You have 7 friends who are willing to help you move but you only need 3. How many
possible groups of friends can you choose?
7. A box contains 500 lucky-dip prizes, of which 75 are worth $1, 150 are worth $2, and
275 are worth $5. What is the sample space? Assign probabilities to these possible
samples and then find the probability that the first prize drawn is worth more than $1.
8. A physicist has applied for two grants. They estimate that the probability that they
receive grant A is 0.7, and the probability that they receive grant B is 0.4, and they
also estimate that the probability that they obtain any grant is 0.8. What is the
probability that the person: (a) obtains both grants? (b) obtains neither grant?
9. A 4th year subject in engineering has 10 undergrads, 30 postgrads, and 10 postdocs
sitting in the course. At the end of semester, 3 of the undergrads, 10 of the postgrads,
and 5 of the postdocs get a HD for the subject. If a student is chosen at random from
this class and is found to have earned a HD, what is the probability that he or she is
a postgrad?
10. Two transport vans contain a concert band’s instruments. One contains 5 clarinets and
2 flutes and the other contains 2 clarinets, 3 flutes and 1 drumkit. If 1 instrument is
taken at random from each of the vans, find the probability that: (a) both instruments
are clarinets; (b) one instrument is a clarinet and one is a flute; (c) the two instruments
are of different types, (d) given that one is a flute, the other is a clarinet.
11. A bakery makes and sells two types of cake for special occasions: sponge and carrot.
Based on long-range sales, the probability that a customer purchases a sponge
cake is 0.75. Of those that purchase carrot cakes, 90% also ask for frosting. But only
50% of the buyers of sponge cakes ask for frosting. A randomly selected buyer orders
a cake with frosting. What is the probability that the cake is a sponge cake?
12. The circuit below functions if there is an unbroken path from left to right. The probability
for each component to be working is shown in each box and is independent of
the other components. What is the probability:
(a) That the circuit fails?
(b) That component A is working, given that the circuit is working?
(c) That component E is broken, given that the circuit is broken?
13. Draw a Venn diagram showing the sets of probability that correspond to the following
statements. The population of a city has been classified into children and adults. Of
the adults, a proportion are found to be employed and no children are employed. It
is found that a larger proportion of children play sports when compared to adults.
Finally, the proportion of the population with blue eyes is the same for children and
adults as is the proportion of people with brown eyes.
After drawing the diagram, point out any different probabilities that are independent,
mutually exclusive, or otherwise correlated.
14. The probability for a customer to write a bad review about a restaurant is P (X) = 10%,
the probability of a hot day is P (Y ) = 50% and the proportion of reviews that are bad
and occur on hot days is P (X and Y ) = 2%. Can you say that event X and Y are
independent, mutually exclusive or otherwise? Justify why in all three cases.
15. Classify the following variables as discrete or continuous:
16. In a gambling game, a woman is paid $3 if she draws a jack or a queen and $5 if she
draws a king or an ace from an ordinary deck of 52 playing cards. If she draws any
other card, she loses. (a) How much should she pay to play if the game is fair, that is,
if its expected payoff is zero? (b) In this case, what is the variance of the payoff?
17. Computer technology has produced an environment in which “robots” operate with
the use of microprocessors. The probability that a robot fails during any 6 hour shift
is 0.10. What is the probability that a robot will operate at most 5 shifts before it
fails?
18. The probability distribution of X, the number of imperfections per 10 meters of a
synthetic fabric in continuous rolls of uniform width, is given by:
x 0 1 2 3 4
f (x) 0.41 0.37 0.16 0.05 0.01
Find the average number of imperfections per 10 meters of this fabric and the variance
in the number of imperfections.
19. A cereal manufacturer is aware that the weight of the product in the box varies slightly
from box to box. In fact, considerable historical data have allowed the determination of
the density function that describes the probability structure for the weight (in ounces).
Letting X be the random variable weight, in ounces, the density function can be
described as $f(x) = 2/5$ if $23.75 \le x \le 26.25$, and $f(x) = 0$ elsewhere.
20. The waiting time, in hours, between successive speeders spotted by a radar unit is a
continuous random variable with cumulative distribution function:
$$F(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-8x}, & x \ge 0 \end{cases}$$
(a) Find the probability of waiting less than 12 minutes between successive speeders.
(b) Find the probability of waiting more than an hour between successive speeders.
(c) Find the probability density function and comment on the relative likelihood of
finding a successive speeder at 10 min as compared to finding a speeder at 30 min.
21. Consider the density function
$$f(x) = \begin{cases} k\sqrt{x}, & 0 < x < 1 \\ 0, & \text{elsewhere} \end{cases}$$
24. Show that the following is true: $\mathrm{Var}[X] = E[(X - \mu)^2] = E[X^2] - E[X]^2$
25. Assume that the velocity in the x-direction of atoms in a gas takes the distribution
$$f(v_x) = \begin{cases} 1/2, & -1 < v_x < 1 \\ 0, & \text{elsewhere} \end{cases}$$
8.6 SELECTED ANSWERS - Probability and Statistics
1.
2. (a) S = {T1 T2 , T1 A1 , T1 A2 ; T2 T1 , T2 A1 , T2 A2 ; A1 T1 , A1 T2 , A1 A2 ; A2 T1 , A2 T2 , A2 A1 }
(b) B = {T1 T2 , T1 A1 , T1 A2 , T2 T1 , T2 A1 , T2 A2 }
(c) C = {T1 A1 , T1 A2 , T2 A1 , T2 A2 , A1 T1 , A1 T2 , A2 T1 , A2 T2 } (d) D = {T1 T2 , T2 T1 }
(e) B ∩ C = {T1 A1 , T1 A2 , T2 A1 , T2 A2 } (f) B ∩ D = {T1 T2 , T2 T1 }
(g)
12. (a) 0.1761 = 17.61% (b) 0.7345 = 73.45% (c) 0.3691 = 36.91%
13.
14. X and Y are not independent. X and Y are not mutually exclusive.
15. (a) discrete, (b) continuous, (c) continuous, (d) discrete, (e) discrete, (f) continuous,
(g) Can be continuous or discrete.
16. (a) Should not pay more than $1.23 to play. (b) Variance: $3.72
18. 0.88
19. (a) Valid density function (b) 0.1 = 10% (c) 0.1 = 10%
(d) $$F(x) = \int_{-\infty}^{x} f(t)\,dt = \begin{cases} 0, & x < 23.75 \\ \frac{2}{5}(x - 23.75), & 23.75 \le x < 26.25 \\ 1, & x \ge 26.25 \end{cases}$$
20. (a) 0.7981 = 79.81% (b) 0.0003 = 0.03% (c) 14.39 ≈ 14.4 times more likely
24. Proof
25. (a) $E[v_x] = 0$, $\sigma^2 = 1/3$, $\sigma = 1/\sqrt{3}$ (b) Using: $E(v_x^2) = 1/3$
26. E[g(x)] = 3
9 Discrete Probability Distributions
We will consider three specific discrete probability distributions: The Binomial Distribution,
The Poisson Distribution and The Hypergeometric Distribution. We will investigate the
probability mass function (pmf), the cumulative distribution function (cdf) and the expectation
values (mean and variance) of each.
We can also allow for the possibility that the coin is not ‘fair’, i.e. allow the probability of
a head to be different from the probability of a tail. Let the probability of a head be p and the
probability of a tail be q. As we only have two possible outcomes: q = 1 − p
Initially consider the case where the first i tosses are heads and the rest are tails. i.e. the
sequence of outcomes is:
$$\underbrace{111\ldots1}_{i\text{ ones}}\;\underbrace{000\ldots0}_{n-i\text{ zeros}}$$
The probability of this occurring is $p^iq^{n-i}$. This is also the probability for any particular
sequence of i heads and n − i tails.
The second part is to work out the number of different ways that we can get i heads and
n − i tails. This will be the same as the number of ways that we can order the digits
111 . . . 1000 . . . 0.
This is the same as the number of combinations: $C^n_r = \binom{n}{r} = \frac{n!}{r!(n-r)!}$, or the binomial
coefficient.
Putting these results together, we get the probability mass function which states the probability
of x successes from n trials:
$$f(x) = P(X = x) = \binom{n}{x}p^xq^{n-x} = \binom{n}{x}p^x(1-p)^{n-x}$$
Note: From the Binomial Theorem, the probability of getting any number of successes is:
$$\sum_{k=0}^{n}f(k) = \sum_{k=0}^{n}P(X = k) = \sum_{k=0}^{n}\binom{n}{k}p^kq^{n-k} = (p+q)^n = 1 \quad\text{as expected}$$
The graph of the binomial distribution for p = 0.7, q = 0.3 and n = 20 is shown:
The graph of the cumulative distribution function for p = 0.7, q = 0.3 and n = 20 is:
Expected Values for the Binomial Distribution
Mean: µ = np
Variance: $\sigma^2 = npq$
Standard Deviation: $\sigma = \sqrt{npq}$
Example: Components are mass-produced and tested with a shock test. The probability
that a single component will survive a shock test is 0.75. (a) Find the probability that
exactly 2 of the next 4 components will survive. (b) Find the probability that at least 3 of
the 4 components survive.
(a) Using: binom(x; n, p): $f(x) = P(X = x) = \binom{n}{x}p^x(1-p)^{n-x}$
where: x = 2, n = 4, p = 0.75
$$\text{binom}(2; 4, 0.75):\quad P(X = 2) = \binom{4}{2}(0.75)^2(1-0.75)^{4-2} = \frac{4!}{2!(4-2)!}(0.75)^2(0.25)^2 = 0.2109$$
(b) At least 3 of the 4 components survive if X = 3 or X = 4:
$$P(X = 3) + P(X = 4) = \binom{4}{3}(0.75)^3(1-0.75)^{4-3} + \binom{4}{4}(0.75)^4(1-0.75)^{4-4} = 4(0.75)^3(0.25) + (0.75)^4 = 0.7383$$
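The two answers in the shock test example can be reproduced with a few lines of code. The function name binom_pmf is a hypothetical helper that mirrors the binom(x; n, p) notation used above.

```python
from math import comb

def binom_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# (a) exactly 2 of the next 4 components survive
p_exactly_two = binom_pmf(2, 4, 0.75)

# (b) at least 3 of the 4 components survive
p_at_least_three = binom_pmf(3, 4, 0.75) + binom_pmf(4, 4, 0.75)

print(round(p_exactly_two, 4), round(p_at_least_three, 4))  # 0.2109 0.7383
```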
Example: There are 20 marbles in a jar, 2 white marbles and 18 black marbles. 3 marbles
are chosen at random without replacement.
In this example, trials are not independent as the probability of getting a white marble on
the second pick depends significantly on the outcome of the first pick. Therefore using a
binomial distribution would not give accurate approximations.
If we had 200 marbles in the jar, the difference in probability would be small and using the
binomial distribution would give more accurate approximations.
9.2 Poisson Distribution
The Poisson distribution of a discrete random variable expresses the probability of a given
number of events occurring in a fixed interval of time or a fixed region in space, if these
events occur with a known constant mean rate. Events must occur independently of the time
since the last event.
Examples Include:
Poisson Approximation to the Binomial Distribution
The Poisson distribution is the limiting case of the Binomial distribution as the number of
trials increases to infinity while the mean, np, stays fixed.
Consequently, the Poisson distribution can be used as a good approximation to the Binomial
distribution if there is a large number of trials and the probability of success in each trial is small.
This is a binomial experiment with n = 5000 and p = 0.001: binom(6; 5000, 0.001)
As n is large, we can approximate the probability using a Poisson distribution: Pois(6; 5),
where λ = 5 is the mean of the binomial distribution: µ = np = (5000)(0.001).
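The quality of the approximation can be checked by evaluating both probabilities. The sketch below assumes the standard Poisson pmf, $\lambda^x e^{-\lambda}/x!$; the function names are hypothetical helpers.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """Exact binomial probability of x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """Poisson probability of x events when the mean number of events is lam."""
    return lam ** x * exp(-lam) / factorial(x)

n, p = 5000, 0.001
exact = binom_pmf(6, n, p)
approx = poisson_pmf(6, n * p)  # lambda = np = 5

print(exact, approx, abs(exact - approx))
```

The two values agree to about three decimal places, which illustrates why the Poisson form is convenient when n is large and p is small.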
9.3 Hypergeometric Distribution
For the binomial distribution, the probability of a success (or of a failure) is constant for
successive events. i.e. all trials are independent. This is equivalent to choosing items from
a set with replacement.
If an experiment consists of choosing items from a set without replacement, then the
probability of success depends on the outcome of the previous trials.
Examples include:
• A box contains 10 red balls, 10 yellow and 20 blue. What’s the probability that 5 blue
balls are chosen from a random sample of 10 balls.
• The probability of choosing 5 red cards (diamonds or hearts) in a hand of 10 cards
chosen from a standard deck of 52.
• A group has 60% females and 40% males. What’s the probability a random sample of
10 people will have 7 females.
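The mean and variance of the hypergeometric distribution are similar to those of the binomial
distribution, with a correction for population size. Here N is the population size, k is the
number of successes in the population and n is the sample size.
Mean: $\mu = n\left(\frac{k}{N}\right)$
Variance: $\sigma^2 = n\left(\frac{k}{N}\right)\left(\frac{N-k}{N}\right)\left(\frac{N-n}{N-1}\right)$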
Example: A bucket contains 8 balls, 5 of which are red and 3 of which are white. A
sample of 4 balls is randomly selected from the bucket. (a) Find the probability distribution
for x, the number of white balls in the sample. (b) Find the mean and variance of x.
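$$P(X = 0) = f(0) = \frac{\binom{3}{0}\binom{5}{4}}{\binom{8}{4}} = \frac{(1)(5)}{70} = 5/70 = 0.0714$$
$$P(X = 1) = f(1) = \frac{\binom{3}{1}\binom{5}{3}}{\binom{8}{4}} = \frac{(3)(10)}{70} = 3/7 = 0.4286$$
$$P(X = 2) = f(2) = \frac{\binom{3}{2}\binom{5}{2}}{\binom{8}{4}} = \frac{(3)(10)}{70} = 3/7 = 0.4286$$
$$P(X = 3) = f(3) = \frac{\binom{3}{3}\binom{5}{1}}{\binom{8}{4}} = \frac{(1)(5)}{70} = 5/70 = 0.0714$$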
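For part (b), the mean and variance can be computed directly from the distribution and compared with the hypergeometric formulas. This is a sketch under the assumption that N = 8 is the population size, k = 3 the number of white balls and n = 4 the sample size.

```python
from fractions import Fraction
from math import comb

N, k, n = 8, 3, 4  # population size, white balls in the bucket, sample size

# pmf of the number of white balls in the sample
pmf = {
    x: Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))
    for x in range(min(k, n) + 1)
}

# Mean and variance computed from the distribution itself
mean = sum(x * p for x, p in pmf.items())
variance = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2

# The same quantities from the closed-form expressions
mean_formula = Fraction(n * k, N)
variance_formula = n * Fraction(k, N) * Fraction(N - k, N) * Fraction(N - n, N - 1)

print(mean, variance)  # 3/2 15/28
```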
9.4 EXERCISES - Discrete Probability Distributions
1. A shipment of 7 television sets contains 2 defective sets. A hotel makes a random
purchase of 3 of the sets. If X is the number of defective sets purchased by the hotel,
find the probability distribution of X. Express the results graphically as a probability
histogram, then find the following probabilities:
(a) P (X = 1)
(b) $P(0 < X \le 2)$
2. A safety engineer claims that only 40% of all workers wear safety helmets when they
eat lunch at the workplace. Assuming that this claim is right, find the probability that
4 of 6 workers randomly chosen will be wearing their helmets while having lunch at
the workplace.
3. One prominent physician claims that 70% of those with lung cancer are chain smokers.
If his assertion is correct,
(a) Find the probability that of 20 such patients recently admitted to a hospital, more
than 18 are chain smokers.
(b) Find the probability that of 10 such patients recently admitted to a hospital,
fewer than half are chain smokers.
4. A manufacturer knows that on the average 20% of the electric toasters which he makes
will require repairs within 1 year after they are sold. When 20 toasters are randomly
selected, find appropriate numbers x and y such that
(a) the probability that at least x of them will require repairs is less than 0.5
(b) the probability that at least y of them will not require repairs is greater than 0.8.
5. To avoid detection at customs, a traveler places 6 narcotic tablets in a bottle containing
9 vitamin pills that are similar in appearance. If the customs official selects 3 of the
tablets at random for analysis, what is the probability that the traveler will be arrested
for illegal possession of narcotics?
6. Population studies of biology and the environment often tag and release subjects in
order to estimate size and degree of certain features in the population. Ten animals of
a certain population thought to be extinct (or near extinction) are caught, tagged and
released in a certain region. After a period of time a random sample of 15 of this type
of animal is selected in the region. What is the probability that 5 of those selected are
tagged animals if there are 25 animals of this type in the region?
7. A secretary makes 2 errors per page, on average. What is the probability that on the
next page he or she will make
(a) 4 or more errors?
(b) No errors?
8. Changes in airport procedures require considerable planning. Arrival rates of aircraft
are important factors that must be taken into account. Suppose small aircraft arrive
at a certain airport, according to a Poisson process, at the rate of 6 per hour.
(a) What is the probability that exactly 4 small aircraft arrive during a 1 hour period?
(b) What is the probability that at least 4 arrive during a 1 hour period?
(c) If we define a working day as 12 hours, what is the probability that at least 5
small aircraft arrive during a working day?
9. Hospital administrators in large cities anguish about problems with traffic in emergency
rooms in hospitals. For a particular hospital in a large city, the staff on hand cannot
accommodate the patient traffic if there are more than 3 emergency cases in a given
hour. It is assumed that patient arrival follows a Poisson process and historical data
suggest that, on the average, one emergency arrives per hour.
(a) What is the probability that in a given hour the staff can no longer accommodate
the traffic?
(b) What is the probability that more than 4 emergencies arrive during a 3 hour shift
of personnel?
9.5 SELECTED ANSWERS - Discrete Probability Distributions
1. (a) 0.5714 = 57.14% (b) 0.7143 = 71.43%
2. 0.1382 = 13.82%
4. (a) 5 (b) 15
5. 0.8154 = 81.54%
6. 0.2315 = 23.15%
10 Continuous Probability Distributions
10.1 Uniform Probability Density Function
Assume that we have a continuous random variable that can take a value between two points
(a and b). The uniform distribution is a continuous probability distribution and is concerned
with events that are equally likely to occur within the interval a ≤ X ≤ b.
The uniform probability density function (pdf) has a rectangular shape over the interval.
As the total area under the curve (i.e. the total probability) must equal 1, the height is:
1/(b − a)
$$f(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
Mean: $\mu = E[X] = \int_a^b x f(x)\,dx = \dfrac{a+b}{2}$
Note: The area under the probability density function on each side of the mean must be
the same (i.e. 50%).
Variance: $\sigma^2 = E[(X - \mu)^2] = \dfrac{(b-a)^2}{12}$
The probability of an event occurring between any two points c and d within the interval
(a, b) is equal to the area under the uniform pdf curve and is given by:
$$P(c \le x \le d) = \int_c^d \frac{1}{b-a}\,dx = \frac{d-c}{b-a}$$
Example: Buses arrive at the university bus stop every 30 minutes. A student arrives at
the bus stop at a random time. The time that the student waits for the next bus to arrive
(X) could be described by a uniform distribution over the interval from 0 to 30 mins.
a) Determine the probability density function.
b) Find the probability that the waiting time will exceed 10 minutes: P (x > 10)
c) Calculate the mean and standard deviation of x.
d) Calculate the probability that the waiting time will lie within one standard deviation
of the mean i.e. (µ ± σ)
$$f(x) = \begin{cases} \dfrac{1}{30} & 0 \le x \le 30 \\ 0 & \text{otherwise} \end{cases}$$
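The remaining parts of the bus example can be checked numerically. The following is a minimal Python sketch, assuming only the uniform-distribution formulas stated earlier in this section; the function names `uniform_pdf` and `uniform_prob` are illustrative and not part of the notes.

```python
import math

# Interval of the uniform distribution: waiting time between 0 and 30 minutes.
a, b = 0.0, 30.0

def uniform_pdf(x, a, b):
    """Height 1/(b - a) inside [a, b], zero elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_prob(c, d, a, b):
    """P(c <= X <= d) for a <= c <= d <= b: the rectangle area (d - c)/(b - a)."""
    return (d - c) / (b - a)

mean = (a + b) / 2                    # mu = (a + b)/2
std = math.sqrt((b - a) ** 2 / 12)    # sigma = sqrt((b - a)^2 / 12)

p_wait_over_10 = uniform_prob(10.0, b, a, b)                  # part (b)
p_within_one_std = uniform_prob(mean - std, mean + std, a, b)  # part (d)

print(f"pdf height      : {uniform_pdf(5.0, a, b):.4f}")
print(f"P(X > 10)       : {p_wait_over_10:.4f}")
print(f"mean, std       : {mean:.2f}, {std:.4f}")
print(f"P(mu +/- sigma) : {p_within_one_std:.4f}")
```

Because the density is flat, every probability here is just a width divided by b − a, so no integration routine is needed.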
10.2 Normal Distribution (Gaussian Distribution)
The normal distribution is a particularly important distribution in statistics. Many natural
phenomena can be modelled using a normal distribution.
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}, \quad -\infty < x < \infty$$
This function describes a family of normal distributions for different values of µ and σ.
The graph of this function is symmetric about x = µ and approaches zero as x → ±∞.
A large value of σ gives a distribution curve with reduced height and greater spread.
A small value of σ gives a distribution with increased height and reduced spread.
Note: The notation used to indicate that a random variable X follows a normal distribution
with a mean of µ and a standard deviation of σ is:
X ∼ N (µ, σ)
10.2.1 Standard Normal Distribution
To calculate the probability associated with a random variable described by a normal
distribution, we require integrals of the form:
$$P(a \le X \le b) = \int_a^b f(x)\,dx = \int_a^b \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}\,dx$$
This is a complicated integral and will change for different values of µ and σ.
The standard normal distribution (for variable Z) is a transformation of the normal
distribution (for variable X) obtained by the transformation:
$$Z = \frac{X - \mu}{\sigma}$$
This gives a transformed distribution with a mean µ = 0 and standard deviation σ = 1:
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$
The graph of the standard normal density function is called the standard normal curve:
The value of Z corresponding to a particular value of X is known as the Z-score for that
value.
Example: If X ∼ N(50, 10), we transform to Z ∼ N(0, 1) using $Z = \dfrac{X - 50}{10}$.
Therefore the X-values x1 = 45 and x2 = 62 are transformed to Z-scores:
$$z_1 = \frac{45 - 50}{10} = -0.5 \quad \text{and} \quad z_2 = \frac{62 - 50}{10} = 1.2$$
and
P (45 < X < 62) = P (−0.5 < Z < 1.2)
To evaluate the probability that Z takes a value between two z-scores, we need to evaluate
the area under the standard normal curve. From the last example:
$$P(-0.5 < Z < 1.2) = \frac{1}{\sqrt{2\pi}} \int_{-0.5}^{1.2} e^{-z^2/2}\,dz$$
Fortunately we are not required to evaluate this integral. Areas under the standard normal
curve have previously been calculated and tabled:
t 0 1 2 3 4 5 6 7 8 9
0.0 .0000 .0040 .0080 .0120 .0160 .0199 .0239 .0279 .0319 .0359
0.1 .0398 .0438 .0478 .0517 .0557 .0596 .0636 .0675 .0714 .0754
0.2 .0793 .0832 .0871 .0910 .0948 .0987 .1026 .1064 .1103 .1141
0.3 .1179 .1217 .1255 .1293 .1331 .1368 .1406 .1443 .1480 .1517
0.4 .1554 .1591 .1628 .1664 .1700 .1736 .1772 .1808 .1844 .1879
0.5 .1915 .1950 .1985 .2019 .2054 .2088 .2123 .2157 .2190 .2224
0.6 .2258 .2291 .2324 .2357 .2389 .2422 .2454 .2486 .2518 .2549
0.7 .2580 .2612 .2642 .2673 .2704 .2734 .2764 .2794 .2823 .2852
0.8 .2881 .2910 .2939 .2967 .2996 .3023 .3051 .3078 .3106 .3133
0.9 .3159 .3186 .3212 .3238 .3264 .3289 .3315 .3340 .3365 .3389
1.0 .3413 .3438 .3461 .3485 .3508 .3531 .3554 .3577 .3599 .3621
1.1 .3643 .3665 .3686 .3708 .3729 .3749 .3770 .3790 .3810 .3830
1.2 .3849 .3869 .3888 .3907 .3925 .3944 .3962 .3980 .3997 .4015
1.3 .4032 .4049 .4066 .4082 .4099 .4115 .4131 .4147 .4162 .4177
1.4 .4192 .4207 .4222 .4236 .4251 .4265 .4279 .4292 .4306 .4319
1.5 .4332 .4345 .4357 .4370 .4382 .4394 .4406 .4418 .4429 .4441
1.6 .4452 .4463 .4474 .4484 .4495 .4505 .4515 .4525 .4535 .4545
1.7 .4554 .4564 .4573 .4582 .4591 .4599 .4608 .4616 .4625 .4633
1.8 .4641 .4649 .4656 .4664 .4671 .4678 .4686 .4693 .4699 .4706
1.9 .4713 .4719 .4726 .4732 .4738 .4744 .4750 .4756 .4761 .4767
2.0 .4772 .4778 .4783 .4788 .4793 .4798 .4803 .4808 .4812 .4817
2.1 .4821 .4826 .4830 .4834 .4838 .4842 .4846 .4850 .4854 .4857
2.2 .4861 .4864 .4868 .4871 .4875 .4878 .4881 .4884 .4887 .4890
2.3 .4893 .4896 .4898 .4901 .4904 .4906 .4909 .4911 .4913 .4916
2.4 .4918 .4920 .4922 .4925 .4927 .4929 .4931 .4932 .4934 .4936
2.5 .4938 .4940 .4941 .4943 .4945 .4946 .4948 .4949 .4951 .4952
2.6 .4953 .4955 .4956 .4957 .4959 .4960 .4961 .4962 .4963 .4964
2.7 .4965 .4966 .4967 .4968 .4969 .4970 .4971 .4972 .4973 .4974
2.8 .4974 .4975 .4976 .4977 .4977 .4978 .4979 .4979 .4980 .4981
2.9 .4981 .4982 .4982 .4983 .4984 .4984 .4985 .4985 .4986 .4986
3.0 .4987 .4987 .4987 .4988 .4988 .4989 .4989 .4989 .4990 .4990
3.1 .4990 .4991 .4991 .4991 .4992 .4992 .4992 .4992 .4993 .4993
3.2 .4993 .4993 .4994 .4994 .4994 .4994 .4994 .4995 .4995 .4995
3.3 .4995 .4995 .4995 .4996 .4996 .4996 .4996 .4996 .4996 .4997
3.4 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4997 .4998
3.5 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998 .4998
3.6 .4998 .4998 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.7 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.8 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999 .4999
3.9 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000 .5000
In the above table, all probabilities are displayed as P (0 < Z < t), i.e. the area under the
curve from Z = 0 to Z = t. This is why the largest value in the table is 0.5 which is half of
the total area under the curve.
So we need to use the symmetry of the graph, and the resulting values in the table to
calculate probabilities.
P (−0.5 < Z < 1.2) = P (−0.5 < Z < 0) + P (0 < Z < 1.2)
= P (0 < Z < 0.5) + P (0 < Z < 1.2)
= 0.1915 + 0.3849
= 0.5764
This is the probability that Z falls between 0.5 standard deviations below the mean and 1.2
standard deviations above it.
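Table lookups like this one can be cross-checked numerically. The following is a minimal Python sketch, assuming the standard normal cumulative probability is obtained from the error function `math.erf` instead of the printed table; the helper names `phi`, `table_area` and `normal_prob` are illustrative only.

```python
import math

def phi(z):
    """Standard normal cumulative probability P(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def table_area(t):
    """P(0 < Z < t) for t >= 0, i.e. the quantity listed in the table."""
    return phi(t) - 0.5

def normal_prob(lo, hi, mu, sigma):
    """P(lo < X < hi) for X ~ N(mu, sigma), using the Z-score transformation."""
    z_lo = (lo - mu) / sigma
    z_hi = (hi - mu) / sigma
    return phi(z_hi) - phi(z_lo)

# Example from the notes: X ~ N(50, 10), P(45 < X < 62) = P(-0.5 < Z < 1.2).
p = normal_prob(45.0, 62.0, mu=50.0, sigma=10.0)

print(f"P(0 < Z < 0.5)  = {table_area(0.5):.4f}")
print(f"P(0 < Z < 1.2)  = {table_area(1.2):.4f}")
print(f"P(45 < X < 62)  = {p:.4f}")
```

The symmetry argument used with the table corresponds here to `phi(z_hi) - phi(z_lo)`, which handles negative z-scores directly.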
Example: Determine the area under the curve within the range:
(a) One standard deviation either side of the mean.
(b) Two standard deviations either side of the mean.
(a) For one standard deviation on one side of the mean, the area is given by:
So for one standard deviation either side of the mean, the area is:
(b) For two standard deviations either side of the mean, the area is given by:
Example: The heights of a plantation of seedlings are found to fit a normal distribution
with a mean of 72 cm and a standard deviation of 8 cm. What is the probability that a
randomly selected seedling will be between 68 cm and 82 cm tall?
For x = 68, z = −0.5 and for x = 82, z = 1.25.
Therefore the probability that a randomly chosen seedling has a height between 68 and 82
cm is 0.5859 or 58.59%.
We have already used the Poisson Distribution as an approximation to the Binomial Distri-
bution when n is large and p is small.
The normal distribution is also a good approximation to the binomial distribution when p
is close to 0.5 and n is large.
To demonstrate the approximation, define the mean and standard deviation of the normal
distribution as:
$$\mu = np \quad \text{and} \quad \sigma = \sqrt{npq}, \quad \text{where } q = 1 - p$$
To approximate this with a normal distribution, using the random variable X̄, let:
$\mu = np = 6$ and $\sigma = \sqrt{npq} = 1.897$, i.e. X̄ ∼ N(6, 1.897)
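The quality of the approximation can be explored with a short Python sketch. Two things here are assumptions and do not come from the notes: the values n = 15 and p = 0.4 are hypothetical, chosen only because they reproduce µ = 6 and σ = 1.897, and the half-unit continuity correction is a common device for approximating a discrete count by a continuous curve.

```python
import math

def phi(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_pmf(k, n, p):
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

# Hypothetical binomial parameters (assumed, not given in the notes).
n, p = 15, 0.4
q = 1.0 - p

mu = n * p                      # mean of the approximating normal
sigma = math.sqrt(n * p * q)    # standard deviation of the approximating normal

k = 4
exact = binom_pmf(k, n, p)
# Continuity correction (assumed): treat the count k as the interval k - 0.5 to k + 0.5.
approx = phi((k + 0.5 - mu) / sigma) - phi((k - 0.5 - mu) / sigma)

print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
print(f"exact  P(X = {k}) = {exact:.4f}")
print(f"normal approx    = {approx:.4f}")
```

Changing `k`, `n` or `p` gives a feel for how the agreement improves as n grows and p moves towards 0.5.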
10.3 Log-Normal Distribution
This distribution applies where a natural log transformation of the random variable results
in a normal probability distribution.
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-\frac{1}{2}\left(\frac{\ln x - \mu}{\sigma}\right)^2} \quad \text{for } x > 0$$
The mean and standard deviation of the log-normal random variable X are related to the
mean and standard deviation of ln X.
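With Y = ln X:
Mean: $\mu_y = \ln \mu_x - \dfrac{1}{2}\sigma_y^2$ or $\mu_x = e^{\mu_y + \frac{1}{2}\sigma_y^2}$
Variance: $\sigma_y^2 = \ln\left[1 + \left(\dfrac{\sigma_x}{\mu_x}\right)^2\right]$ or $\sigma_x^2 = e^{2\mu_y + \sigma_y^2}\left(e^{\sigma_y^2} - 1\right)$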
These equations are not particularly convenient to use directly. As y = ln(x), we can use
standard normal distribution tables for log-normal distributions.
Example: The particle size of the material coming out of a rock crusher (X) follows a
log-normal distribution with a mean of 2 cm and a standard deviation of 1 cm. Particles are
put through a sieve screen with a mesh size of 1 cm. Determine the proportion of particles
that will pass through the screen, i.e. P(X < 1).
Given: µx = 2, σx = 1, we first calculate the mean and variance of the corresponding normal
distribution, Y = ln(X).
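Variance: $\sigma_y^2 = \ln\left[1 + \left(\dfrac{\sigma_x}{\mu_x}\right)^2\right] = \ln\left[1 + \left(\dfrac{1}{2}\right)^2\right] = 0.2231$, ∴ $\sigma_y = 0.4724$
Mean: $\mu_y = \ln \mu_x - \dfrac{1}{2}\sigma_y^2 = \ln 2 - \dfrac{1}{2}(0.2231) = 0.5815$
The Z-score corresponding to X = 1 or Y = ln X = 0 is: $Z = \dfrac{\ln 1 - 0.5815}{0.4724} = -1.2311$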
From the standard normal distribution tables: P (X < 1) = 0.5 − 0.3907 = 0.1093
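The rock crusher calculation can be reproduced step by step in Python. This is a minimal sketch, assuming the conversion formulas for µy and σy given earlier and computing the normal probability with `math.erf` instead of the table, so the final digits differ slightly from the tabulated answer; the variable names are illustrative.

```python
import math

def phi(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Mean and standard deviation of the log-normal variable X (particle size, cm).
mu_x, sigma_x = 2.0, 1.0
mesh = 1.0  # sieve mesh size, cm

# Parameters of the corresponding normal variable Y = ln X.
var_y = math.log(1.0 + (sigma_x / mu_x) ** 2)
sigma_y = math.sqrt(var_y)
mu_y = math.log(mu_x) - 0.5 * var_y

# Z-score of Y = ln(mesh), then P(X < mesh) = P(Z < z).
z = (math.log(mesh) - mu_y) / sigma_y
p_pass = phi(z)

print(f"var_y = {var_y:.4f}, sigma_y = {sigma_y:.4f}, mu_y = {mu_y:.4f}")
print(f"z = {z:.4f}, P(X < {mesh}) = {p_pass:.4f}")
```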
10.4 Exponential Distribution
The exponential distribution is often used to describe the amount of time until some specific
event happens. It is a process in which events happen continuously and independently at
a constant average rate. This rate is called the distribution rate λ. So the exponential
distribution can be used to describe the time between occurrences of successive events as
time progresses continuously.
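The probability density function is $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$,
where the mean is $\mu = \dfrac{1}{\lambda}$ and the standard deviation is $\sigma = \dfrac{1}{\lambda}$.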
An example of the graph of this function is shown below:
Recall that the Poisson (discrete) distribution describes the probability that x events will
occur over a given length of time, and has a probability mass function:
$$f(x) = \frac{\mu^x e^{-\mu}}{x!}$$
where the single parameter µ is the mean number of events that occur in the given time.
We can define an average rate of occurrence as: $\lambda = \dfrac{\mu}{t}$
For example, if µ = 5 occurrences in 10 hrs, then $\lambda = \dfrac{5}{10} = 0.5$ occurrences per hour.
Substituting into the Poisson pmf gives: $f(x) = \dfrac{(\lambda t)^x e^{-\lambda t}}{x!}$
The probability of no events occurring in time t is: $P(X = 0) = f(0) = \dfrac{(\lambda t)^0 e^{-\lambda t}}{0!} = e^{-\lambda t}$
If we let X be the random variable for the time required for the first event to occur, i.e. the
time to the first event, then the probability that the length of time to the first event will
exceed x is the same as the probability that no events occur in time x:
$$P(X > x) = e^{-\lambda x}$$
so the cumulative distribution function is $F(x) = P(X \le x) = 1 - e^{-\lambda x}$.
Therefore, we differentiate to find the probability density function: $f(x) = \dfrac{dF(x)}{dx} = \lambda e^{-\lambda x}$.
This is the density function for the exponential distribution.
Note: For the exponential distribution, we often write $\lambda = \dfrac{1}{\beta}$, which gives a pdf of:
$$f(x) = \frac{1}{\beta}\, e^{-x/\beta}$$
The mean and standard deviation are given by: µ = β and σ = β
In this form, β is the mean time between events. In reliability theory, where we are concerned
with equipment failure, β is called the mean time between failures, or mean time to failure
(MTTF), and $\lambda = \dfrac{1}{\beta}$ is the mean failure rate (e.g. failures per hour, per cycle, etc.).
In this application, the exponential distribution is based on the assumption of a constant
mean failure rate.
Example: A device has a mean failure rate of 0.05 failures per hour of operation. Calculate
the probability that the device will fail in the first 10 hours of operation.
$P(X \le 10) = F(10) = 1 - e^{-0.05(10)} = 0.3935$
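The link between the Poisson and exponential distributions can be checked numerically for this example. The following minimal Python sketch assumes the cumulative form F(x) = 1 − e^(−λx) implied by the derivation above; the function names are illustrative only.

```python
import math

def exp_pdf(x, lam):
    """Exponential density lam * exp(-lam * x) for x >= 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exp_cdf(x, lam):
    """P(X <= x): probability that the first event occurs by time x."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def poisson_pmf(k, mu):
    """Poisson probability of exactly k events when the mean count is mu."""
    return mu ** k * math.exp(-mu) / math.factorial(k)

lam = 0.05   # mean failure rate, failures per hour (from the example)
t = 10.0     # hours of operation

p_fail = exp_cdf(t, lam)                 # device fails within the first 10 hours
p_no_events = poisson_pmf(0, lam * t)    # no failures in 10 hours (Poisson view)
mttf = 1.0 / lam                         # beta, the mean time to failure

print(f"P(fail within {t:.0f} h) = {p_fail:.4f}")
print(f"P(no failure in {t:.0f} h) = {p_no_events:.4f}")
print(f"mean time to failure = {mttf:.1f} h")
```

The two probabilities sum to 1, which is exactly the argument used to derive the exponential density from the Poisson pmf.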
10.5 Gamma Distribution
The Gamma Distribution models the time that elapses until a specified number of Poisson
events have occurred.
We let α be the specified number of events, and β is the mean time between events as for
the exponential distribution.
The gamma distribution is based on the gamma function, which is defined as:
$$\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx, \quad \alpha > 0$$
The gamma distribution includes the two parameters α and β and has a probability density
function:
$$f(x) = \begin{cases} \dfrac{1}{\beta^\alpha \Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta} & x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
where α > 0 and β > 0.
The mean and variance are: µ = αβ and σ² = αβ²
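The mean and variance formulas can be checked numerically. This is a minimal Python sketch, assuming hypothetical parameter values α = 3 and β = 2 (not taken from the notes) and a simple trapezoidal rule for the integrals; `math.gamma` supplies Γ(α). It also checks the standard fact that α = 1 gives the exponential density with mean β.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density x^(alpha-1) e^(-x/beta) / (beta^alpha Gamma(alpha)) for x > 0."""
    if x <= 0:
        return 0.0
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))

def integrate(f, lo, hi, steps=20000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

# Hypothetical parameters: 3 events, mean time between events of 2 units.
alpha, beta = 3.0, 2.0
upper = 100.0  # the tail beyond this point is negligible for these parameters

area = integrate(lambda x: gamma_pdf(x, alpha, beta), 0.0, upper)
mean = integrate(lambda x: x * gamma_pdf(x, alpha, beta), 0.0, upper)
var = integrate(lambda x: (x - mean) ** 2 * gamma_pdf(x, alpha, beta), 0.0, upper)

print(f"total area = {area:.4f}")
print(f"mean = {mean:.4f}  (alpha * beta   = {alpha * beta})")
print(f"var  = {var:.4f}  (alpha * beta^2 = {alpha * beta ** 2})")

# With alpha = 1 the density reduces to the exponential pdf (1/beta) e^(-x/beta).
x0 = 1.5
print(gamma_pdf(x0, 1.0, beta), math.exp(-x0 / beta) / beta)
```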
10.6 Weibull Distribution
Like the gamma and exponential distributions, the Weibull distribution is also applied to
reliability and life-testing problems such as the time to failure or life length of a component,
measured from some specified time until it fails.
A random variable T that is described by a Weibull distribution with two parameters α (the
scale parameter) and β (the shape parameter) has a probability density function
$$f(t) = \alpha\beta\, t^{\beta-1} e^{-\alpha t^{\beta}}$$
The shape factor β is related to the mean failure rate which, unlike for the exponential
distribution, is not necessarily constant in this case. The scale factor α is used to describe
the variability in the random variable being described.
The cumulative distribution function is:
$$F(t) = 1 - e^{-\alpha t^{\beta}}$$
The shape of the plot of the Weibull pdf varies considerably with the values of α and β. As such
it has wide application and is widely used in equipment reliability modelling. The figure shows
graphs of f(t) for α = 1 and different β values.
Example: The time to failure of a machine component follows a Weibull Distribution.
Let T be the random variable describing the time to failure, in hours with parameter values
α = 0.01 and β = 2. Calculate the probability that the machine part fails before 10 hrs of
operation.
$$P(T < 10) = F(10) = 1 - e^{-(0.01)(10)^2} = 1 - 0.3679 = 0.6321$$
The reliability is defined as the probability that the component will survive at least until a
specified time under operating conditions.
(Conversely – the unreliability is the probability of failure within the specified time period).
$$R(t) = P(T > t) = \int_t^\infty f(t)\,dt = 1 - F(t)$$
Where f (t) is the probability density function of the time to failure and F (t) is the corre-
sponding cumulative distribution function.
The conditional probability that the component will fail in the time interval from T = t to
T = t + ∆t, given that it has survived to time t, is given by:
$$\frac{F(t + \Delta t) - F(t)}{R(t)}$$
The failure rate function (failures per unit time) Z(t) is calculated by dividing by ∆t and
taking the limit as ∆t → 0. Therefore:
$$Z(t) = \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t} \cdot \frac{1}{R(t)} = \frac{F'(t)}{R(t)} = \frac{f(t)}{R(t)} = \frac{f(t)}{1 - F(t)}$$
We can use this to model component failure where the failure rate is not constant. We can
interpret different failure rates as:
(a) β = 1: Constant failure rate. Weibull Dist. reduces to the exponential dist.
(b) β > 1 : The failure rate increases with time. i.e. the components show wear or damage.
(c) β < 1 : The failure rate decreases with time. The components get stronger with time.
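The three cases can be illustrated with a short Python sketch built from the Weibull pdf, cumulative function and reliability given in this section. The function names and the comparison values α = 1 with β = 0.5, 1 and 2 are illustrative assumptions; the first calculation repeats the machine component example with α = 0.01 and β = 2.

```python
import math

def weibull_pdf(t, alpha, beta):
    """f(t) = alpha * beta * t^(beta-1) * exp(-alpha * t^beta) for t > 0."""
    return alpha * beta * t ** (beta - 1) * math.exp(-alpha * t ** beta)

def weibull_cdf(t, alpha, beta):
    """F(t) = 1 - exp(-alpha * t^beta): probability of failure before time t."""
    return 1.0 - math.exp(-alpha * t ** beta)

def reliability(t, alpha, beta):
    """R(t) = 1 - F(t): probability of surviving beyond time t."""
    return 1.0 - weibull_cdf(t, alpha, beta)

def failure_rate(t, alpha, beta):
    """Z(t) = f(t) / R(t); for the Weibull this equals alpha * beta * t^(beta-1)."""
    return weibull_pdf(t, alpha, beta) / reliability(t, alpha, beta)

# Machine component example from the notes.
p_fail = weibull_cdf(10.0, alpha=0.01, beta=2.0)
print(f"P(T < 10) = {p_fail:.4f}")

# Failure rate at two times for the three shape cases (alpha = 1 assumed).
for beta in (0.5, 1.0, 2.0):
    z1 = failure_rate(1.0, 1.0, beta)
    z4 = failure_rate(4.0, 1.0, beta)
    print(f"beta = {beta}: Z(1) = {z1:.3f}, Z(4) = {z4:.3f}")
```

Comparing Z(1) with Z(4) for each β shows the decreasing, constant and increasing failure rates listed above.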
10.7 EXERCISES - Continuous Probability Distributions
1. A company pays its employees an average wage of $15.90 an hour with a standard
deviation of $1.50. If the wages are approximately normally distributed and paid to
the nearest cent, calculate:
(a) the percentage of workers that receive wages between $13.75 and $16.22 per hour
(b) the hourly wage that the highest 5% of employees get paid.
2. The IQs of 600 applicants to a certain college are approximately normally distributed
with a mean of 115 and a standard deviation of 12. If the college requires an IQ of at
least 95, how many of these students will be rejected on this basis of IQ, regardless of
their other qualifications? Note that IQs are recorded to the nearest integers.
3. The length of time for one individual to be served at a cafeteria is a random variable
having an exponential distribution with a mean of 4 minutes. What is the probability
that a person is served in less than 3 minutes on at least 4 of the next 6 days?
4. The response time of a computer system, in seconds, has an exponential distribution
with a mean of 3 seconds. (a) What is the probability that response time exceeds 5
seconds? (b) What is the probability that response time exceeds 10 seconds?
5. The service life, in years, of a hearing aid battery is a random variable having a Weibull
distribution with α = 1/2 and β = 2.
(a) After what time, is it expected that 50% of a batch of these batteries are dead?
(b) What is the probability that such a battery will be operating after 2 years?
6. The life of a car door seal has a Weibull distribution with failure rate $Z(t) = \dfrac{1}{\sqrt{t}}$.
Find the probability that such a seal is still intact after 4 years. Hint: the Weibull
distribution can be written as: $f(t) = Z(t)\,e^{-\alpha t^{\beta}}$.
7. A manufacturer of a large machine wishes to buy rivets from one of two manufactur-
ers. It is important that the breaking strengths of each rivet exceed 10,000 psi. Two
manufacturers (A and B) offer this type of rivet and both have rivets whose breaking
strength is normally distributed. The mean breaking strengths for manufacturers A
and B are 14,000 psi and 13,000 psi, respectively. The standard deviations are 2000
psi and 1000 psi, respectively. Which manufacturer will produce, on the average, the
fewest number of defective rivets?
8. The life of a device follows an exponential distribution with an advertised failure rate
of 0.01 per hour.
10.8 SELECTED ANSWERS - Continuous Probability
Distributions
1. (a) 0.5068 = 50.68% (b) 18.3674 = $ 18.37
3. 0.3968 = 39.68%
6. 0.0183 = 1.83%
11 Sampling and Hypothesis Testing
11.1 Chebychev’s Theorem
Chebychev found that the fraction of area for a probability distribution between any two
values symmetric about the mean is related to the standard deviation.
The probability that a random variable falls between two values is equal to the area.
The probability that any random variable X will assume a value within k standard deviations
of the mean is at least $1 - \dfrac{1}{k^2}$, i.e.
$$P(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2}$$
Example: For k = 2, random variable X has a minimum probability of $1 - \dfrac{1}{2^2} = \dfrac{3}{4}$ of
falling within two standard deviations either side of the mean.
Note that the theorem gives a minimum value of the probability. The actual probability will
be at least this value and is usually greater.
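To see how conservative the bound can be, the following minimal Python sketch compares it with the exact probability for one particular case, a normal distribution. The choice of the normal distribution and of the k values is an assumption made for illustration; the theorem itself applies to any distribution.

```python
import math

def chebyshev_bound(k):
    """Minimum probability of falling within k standard deviations of the mean."""
    return 1.0 - 1.0 / k ** 2

def normal_within(k):
    """Exact P(-k < Z < k) for a normal variable, via the error function."""
    return math.erf(k / math.sqrt(2.0))

for k in (1.5, 2.0, 3.0):
    print(f"k = {k}: Chebychev bound = {chebyshev_bound(k):.4f}, "
          f"normal = {normal_within(k):.4f}")
```

For k = 2 the bound guarantees 0.75, while a normal variable actually has about 0.95 of its probability within two standard deviations.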
11.2 Sampling and Sampling Distributions
A population is the entire set or collection of observations. i.e. it is the totality of possible
observations with which we are concerned.
• Each observation is a value of random variable X.
• X has a probability distribution f (x).
A sample is a smaller set of observations taken from the population. i.e. it is a subset of
the population. For example, the total number of resistors in a box might be 100 000. A
sample of 100 resistors may be taken from the box.
The main objective of statistics is to make inferences about a population based on the
information contained in a sample.
Sample Mean: For a set of observations x1, x2, x3, ..., xn, the sample mean is given by:
$$\bar{x} = \sum_{i=1}^{n} \frac{x_i}{n} = \frac{x_1 + x_2 + \dots + x_n}{n}$$
The mean is sensitive to extreme observations whereas the median is less affected.
Sample Variance: $s^2 = \sum_{i=1}^{n} \dfrac{(x_i - \bar{x})^2}{n-1}$
Sample Standard Deviation: $s = \sqrt{s^2}$
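These sample statistics are straightforward to compute directly. The following minimal Python sketch assumes the usual n − 1 divisor for the sample variance and uses a small set of hypothetical resistor readings (invented for illustration); the standard library `statistics` module is used only as a cross-check.

```python
import math
import statistics

def sample_mean(xs):
    """Sum of the observations divided by the number of observations."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Hypothetical sample of five resistor readings (ohms).
data = [98.0, 101.0, 100.0, 99.0, 102.0]

x_bar = sample_mean(data)
s2 = sample_variance(data)
s = math.sqrt(s2)
print(f"mean = {x_bar}, variance = {s2}, std dev = {s:.4f}")

# Cross-check against the standard library.
print(statistics.mean(data), statistics.variance(data), statistics.stdev(data))

# One extreme observation moves the mean far more than the median.
with_outlier = data + [150.0]
print(f"mean with outlier   = {sample_mean(with_outlier):.2f}")
print(f"median with outlier = {statistics.median(with_outlier):.2f}")
```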
11.3 Central Limit Theorem
The central limit theorem is an important concept in statistics and is widely used.
• Consider a population with a mean µ, and standard deviation σ, with some probability
distribution.
• Repeatedly take sufficiently large random samples (of size n) from the population.
• The sample means are a random variable and hence will have their own probability
distribution.
• The central limit theorem states that the distribution of the sample means will approach
a normal distribution as the sample size n becomes large.
• This is irrespective of the shape of the original probability distribution. For example
if the population distribution is binomial (discrete) or exponential (continuous) – the
distribution of sample means will approximate a normal distribution for a large enough
sample.
• If the population distribution is normal, the sampling distribution of the means will
be normal irrespective of sample size.
• What is a large enough sample size?
– n > 30
– If the population exhibits a normal distribution, then the central limit theorem
holds for samples of any size.
• In some way it explains why the normal distribution is so prevalent.
The probability distribution of the sample means x̄ has mean $\mu_{\bar{x}} = \mu$ and standard
deviation $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.
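The behaviour described by the central limit theorem can be observed in a small simulation. This is a minimal Python sketch under stated assumptions: the population is exponential with a hypothetical rate of 0.5 (so its mean and standard deviation are both 2), the sample size is n = 36, and the sample means are expected to have mean µ and standard deviation σ/√n, the standard result for sample means.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

lam = 0.5                 # hypothetical rate of the exponential population
pop_mean = 1.0 / lam      # population mean
pop_std = 1.0 / lam       # population standard deviation
n = 36                    # size of each sample
num_samples = 5000        # number of samples drawn

# Draw many samples and record the mean of each one.
means = [
    statistics.fmean(random.expovariate(lam) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(means)
std_of_means = statistics.stdev(means)
std_error = pop_std / math.sqrt(n)  # expected spread of the sample means

# Fraction of sample means within one standard error of the population mean;
# for a normal distribution this would be about 0.68.
frac = sum(abs(m - pop_mean) < std_error for m in means) / num_samples

print(f"mean of sample means = {mean_of_means:.4f} (population mean {pop_mean})")
print(f"std of sample means  = {std_of_means:.4f} (sigma/sqrt(n) = {std_error:.4f})")
print(f"fraction within one standard error = {frac:.3f}")
```

Even though the exponential population is strongly skewed, the sample means cluster symmetrically around the population mean.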
11.4 Statistical Hypothesis Testing
Hypothesis testing involves using sample data to test whether or not a claim about a par-
ticular population parameter is true.
It involves consideration of two contradictory hypotheses, known as the null and alternative
hypotheses.
Example: Consider a population of students at a university. Consider the weight of these
students.
• Null Hypothesis: The average weight of students is 68 kg.
• Alternative Hypothesis: The average weight of students is not 68 kg.
• Test Statistic: The mean of a sufficiently large sample of students. Say, for example,
we take a sample of 36 students and calculate the mean weight.
• Rejection Region:
The choice of the boundaries of the rejection region (at this stage) is somewhat arbitrary.
They take on a bit more significance after the next section.
If the mean of the sample is less than 67 or greater than 69, we reject the null hypothesis.
i.e. we accept that the mean of the population is something other than 68.
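Type I error: rejecting the null hypothesis when it is true.
(In the court case analogy this would be where the accused is found guilty when they are
actually innocent.)
Type II error: not rejecting the null hypothesis when it is false.
(In the court case analogy this would be where the accused is not found guilty when they
actually are.)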
Decision Table
Decision         Null Hypothesis True     Null Hypothesis False
Reject H0        Type I Error             Correct Decision
Accept H0        Correct Decision         Type II Error
Example (revisited): For the example above, let us consider that we have a sample size
of 36 students.
It would be reasonable to assume that the central limit theorem applies and that the sample
means follow a normal distribution.
Assume that we know that the standard deviation of the population σ = 3.6. Note that
we often do not know this and need to use the standard deviation of the sample s, as an
estimate of σ.
The probability of committing a type I error (i.e. rejecting that the mean is 68 when it is
actually true) is given by:
i.e. there is a 9.5% chance that the sample we took has a mean in the rejection region even
though the mean of the population is 68.
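The type I error probability can be reproduced numerically. The following minimal Python sketch assumes that the sample means are normally distributed with standard deviation σ/√n, the standard consequence of the central limit theorem, and uses `math.erf` in place of the table, so the result agrees with the quoted figure to within rounding.

```python
import math

def phi(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu0 = 68.0      # mean under the null hypothesis (kg)
sigma = 3.6     # population standard deviation (kg)
n = 36          # sample size
lo, hi = 67.0, 69.0  # reject H0 if the sample mean is below lo or above hi

se = sigma / math.sqrt(n)  # standard deviation of the sample mean

# P(sample mean < 67) + P(sample mean > 69), given that mu really is 68.
alpha = phi((lo - mu0) / se) + (1.0 - phi((hi - mu0) / se))

print(f"standard error = {se:.2f}")
print(f"P(type I error) = {alpha:.4f}")
```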
Probability of a type II error
A type II error is committed when we do not reject that the mean is 68 when it is in fact
not 68.
It is not possible to calculate the probability of making a type II error without making some
specific assumption about the alternative hypothesis.
For example, if the true mean is µ = 70 kg, a type II error will occur when the sample mean
falls between 67 and 69.
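A minimal Python sketch of this calculation follows, assuming the same setting as the type I error example (σ = 3.6, n = 36, sample means normally distributed with standard deviation σ/√n). The function name `type_ii_error` and the extra alternative means 68.5 and 69 are illustrative assumptions added to show how the error probability depends on the specific alternative.

```python
import math

def phi(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

sigma = 3.6          # population standard deviation (kg)
n = 36               # sample size
lo, hi = 67.0, 69.0  # H0 is not rejected when the sample mean lies in (lo, hi)
se = sigma / math.sqrt(n)

def type_ii_error(true_mu):
    """P(lo < sample mean < hi) when the population mean is really true_mu."""
    return phi((hi - true_mu) / se) - phi((lo - true_mu) / se)

beta_70 = type_ii_error(70.0)
print(f"P(type II error | mu = 70) = {beta_70:.4f}")

# Hypothetical alternatives closer to 68 are much harder to detect.
for mu in (68.5, 69.0):
    print(f"P(type II error | mu = {mu}) = {type_ii_error(mu):.4f}")
```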
11.5 EXERCISES - Sampling and Hypothesis Testing
1. An electrical firm manufactures a 100-watt light bulb, which, according to specifications
written on the package, has a mean life of 900 hours with a standard deviation of 50
hours. At most, what percentage of the bulbs fail to last even 700 hours? Assume that
the distribution is symmetric about the mean.
2. At a particular school, it is found that the mean height of children in grade 10 is 150cm
and the standard deviation is 5 cm. A teacher suspects there is a problem with this
result and believes they can prove it, as their class contains 10 children that are taller
than 165 cm and 8 children that are shorter than 135 cm. Given there are 100 children
in grade 10, how can the teacher prove there is a problem?
3. Suppose that an allergist wishes to test the hypothesis that at least 30% of the public
is allergic to some cheese products. Explain how the allergist could commit
(a) a type I error?
(b) a type II error?
4. The proportion of adults living in a small town who are college graduates is estimated
to be p = 0.6. To test this hypothesis, a random sample of 15 adults is selected. If
the number of college graduates in our sample is anywhere from 6 to 12, we shall not
reject the null hypothesis that p = 0.6; otherwise, we shall conclude that p ≠ 0.6.
(a) Evaluate the type I error assuming that p = 0.6. Use the binomial distribution.
(b) Evaluate the type II error for the alternatives of p = 0.5 and p = 0.7.
(c) Is this a good test procedure?
5. A dry cleaning establishment claims that a new spot remover will remove more than
70% of the spots to which it is applied. To check this claim, the spot remover will be
used on 12 spots chosen at random. If fewer than 11 of the spots are removed, we shall
not reject the null hypothesis that p = 0.7; otherwise, we conclude that p > 0.7.
6. A manufacturer has developed a new fishing line, which he claims has a mean breaking
strength of 15 kilograms with a standard deviation of 0.5 kilograms. To test the
hypothesis that µ = 15 kilograms against the alternative that µ < 15 kilograms, a
random sample of 50 lines will be tested. The critical region is defined to be x̄ < 14.9.
8. It is stated that the length of string in a roll is 5m ± 0.1m with a 95% confidence level
(i.e. a 5% significance level of a type I error).
(a) Assuming a normally distributed length, what is the standard deviation?
(b) Given the alternative of 5m and a standard deviation of 1m, what is the proba-
bility of a type II error?
9. The number of holes in a 100 m² area of old roofing iron is believed to be 3 and follows
an exponential distribution.
(a) Given a 5% significance level, where we are concerned only with more holes, what
is the critical region to reject the null hypothesis?
(b) A buyer believes there is a much larger number of 10 holes on average. What is
the probability of a type II error, i.e. how easily can they prove this?
(c) The seller believes there is only 1 hole on average. Find the 5% significance level,
when we are concerned only with fewer holes and find the probability of a type
II error in this case. How easily can the seller prove their case?
10. The time that a sacrificial anode lasts on the hull of a ship is believed to follow a
Weibull distribution with β = 3.
(a) It is also believed that 50% of the anodes last less than 1 year. What is the value
of α in the Weibull distribution?
(b) What is the critical region for a 5% significance level?
(c) Given the alternative, that 50% of the anodes last only 6 months, what is the
probability of a type II error?
11.6 SELECTED ANSWERS - Sampling and Hypothesis Testing
1. P(X ≤ 700) < 0.0312
3. (a) The allergist concludes that less than 30% of the public are allergic to some cheese
products when, in fact, 30% or more are allergic.
(b) The allergist concludes that at least 30% of the public are allergic to some cheese
products when, in fact, less than 30% are allergic.
4. (a) 0.0609 = 6.09% (b) p = 0.5: 0.8454 = 84.54%. p = 0.7: 0.8695 = 86.95%
(c) Not a good test.
8.
9.
10.