Lecture11 (Week 12) Updated

AMA1110 Basic Mathematics I - Calculus and
Probability & Statistics

Lecture 11
19 November 2019
Dr. Guofeng Zhang
Email: guofeng.zhang@polyu.edu.hk
Office: TU832
Telephone: 2766 6936
Student Consultation Hours: Tue. 13:00–15:00
A brief review of the last lecture
Compute the mean and variance from the probability distribution table:
$$\mu = E[X] := \sum_u u\, f_X(u),$$
$$\sigma^2 = \mathrm{Var}(X) := \sum_u (u - \mu)^2 f_X(u) = \sum_u u^2 f_X(u) - \mu^2,$$
where $f_X(u) = P(X = u)$ is the probability function of the discrete random variable $X$.
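Binomial distribution: $X \sim \mathrm{Binomial}(n, p)$.
$$P(X = k) = \binom{n}{k} p^k q^{n-k}, \quad k = 0, 1, \ldots, n,$$
$$E[X] = np, \quad \mathrm{Var}(X) = npq, \quad \text{where } q = 1 - p.$$
Poisson distribution: $X \sim \mathrm{Poisson}(\lambda)$.
$$P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \qquad E[X] = \mathrm{Var}(X) = \lambda.$$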
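The table-based formulas for the mean and variance can be checked numerically. The sketch below is only an illustration: the helper name `mean_and_variance` and the choice n = 4, p = 0.3 are assumptions made for this example, not part of the lecture.

```python
from math import comb

def mean_and_variance(table):
    """table maps each value u to f_X(u) = P(X = u)."""
    mu = sum(u * prob for u, prob in table.items())
    var = sum(u * u * prob for u, prob in table.items()) - mu ** 2
    return mu, var

# Hypothetical example: the probability table of X ~ Binomial(4, 0.3),
# built from the binomial probability function.
n, p = 4, 0.3
q = 1 - p
binomial_table = {k: comb(n, k) * p**k * q**(n - k) for k in range(n + 1)}

mu, var = mean_and_variance(binomial_table)
print(mu, var)  # should agree with np = 1.2 and npq = 0.84 up to rounding
```

The same helper works for any finite probability table, as long as the probabilities sum to 1.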
Outline
(1) Continuous random variables

(2) Normal distribution: standard normal distribution;
standardization
(3) Approximation of binomial distribution by normal
distribution
§11.2 Continuous Random Variables
There are many random variables taking continuous values (namely real
numbers), like
• the life length of a light bulb
• time taken to finish an assignment
• the length of a person’s sleep in a certain night
Such random variables are called continuous random variables.
In mathematics, the notion of a probability density function is used to describe the probability distribution of a continuous random variable.
First, let us visualize this notion by means of a graph.
[Figure: a typical probability density function $y = f(x)$. The area under the curve between $a$ and $b$ equals $P(a \le X \le b) = \int_a^b f(x)\,dx$.]
Suppose X is a continuous random variable with density f(x). Then the area under the curve y = f(x) within the interval [a, b] is the probability P(a ≤ X ≤ b), namely the probability of X taking values in the interval [a, b].
Since P(S) = 1, where S is the sample space, the total area under the whole curve y = f(x) on x ∈ (−∞, ∞) is exactly 1.
Definition 1 (probability density function)
Let X be a continuous random variable, and let f(x) be a function such that
• $f(x) \ge 0$ for all $x$;
• for any $-\infty \le a \le b \le \infty$, $P(a \le X \le b)$ is exactly $\int_a^b f(t)\,dt$, namely the area under the curve $y = f(x)$ within the interval $[a, b]$;
• $P(-\infty < X < \infty) = \int_{-\infty}^{\infty} f(x)\,dx = 1$.
Then f(x) is called the probability density function (pdf) of X.
Remark. In this course, the integral notation $\int_a^b f(t)\,dt$ denotes the area under the curve y = f(x) within the interval [a, b]. We will not formulate or require the precise definition of an integral.
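To make the "area under the curve" reading concrete, the area can be approximated by summing thin rectangles. The sketch below is only an illustration: the density f(x) = 2x on [0, 1] is a hypothetical example that does not appear in the lecture, and `area_under` is a helper name introduced here.

```python
def area_under(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the area under y = f(x) on [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

def density(x):
    # Hypothetical density for illustration: f(x) = 2x on [0, 1], 0 elsewhere.
    return 2 * x if 0 <= x <= 1 else 0.0

prob = area_under(density, 0.2, 0.5)     # approximates P(0.2 <= X <= 0.5)
total = area_under(density, -1.0, 2.0)   # total area, should be close to 1
print(prob, total)
```

For this density the area between 0.2 and 0.5 is a trapezoid, so the result can also be checked by elementary geometry.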
Definition 2 (mean and variance of a continuous random variable)
Let X be a continuous random variable with probability density
function $f(x)$. The mean of $X$ is defined as
$$\mu = E[X] := \int_{-\infty}^{\infty} x f(x)\,dx,$$
and the variance is defined as
$$\sigma^2 = \mathrm{Var}(X) := \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx.$$
Remark. We also have
$$\mathrm{Var}(X) = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2.$$
Remark. Using the above definition to compute E[X] and Var(X) for a
continuous random variable X is NOT required in this course.
Definition 3 (Standard Deviation)
Let X be a random variable (either discrete or continuous). The
standard deviation of X is the nonnegative square root of Var(X):
$$\sigma = \sqrt{\mathrm{Var}(X)} \ge 0.$$
§11.2.1 Normal Distributions
The collection of normal distributions is the most important family of probability distributions in statistics.
• It enjoys some nice mathematical properties. (For example, if X is normal, then a + bX is also normal for any constants a and b with b ≠ 0.)
• Normal distributions are observed in many physical experiments (people's heights, weights of eggs from a farm, etc.).
• The Central Limit Theorem (to be discussed in Lecture 12) says that, for a large random sample from any distribution with finite variance, the distribution of the average of the sample (namely the sample mean) is approximately normal.
These nice properties motivate us to investigate normal distributions in detail, which is the task of this lecture.
Definition 4 (Definition 11.2.3)
We say that X has a normal distribution $N(\mu, \sigma^2)$ with parameters $\mu$ and $\sigma > 0$ if it has the probability density function
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty.$$
Some advanced calculus is needed to prove that the area under f is 1.

Theorem 1 (Theorem 11.2.4)
If $X \sim N(\mu, \sigma^2)$, then its mean and variance are
$$E[X] = \mu, \quad \mathrm{Var}(X) = \sigma^2.$$
Remark. Theorem 11.2.4 can be proved without the knowledge of integration.
Facts
For constants a ≠ 0 and b,
$$X \sim N(\mu, \sigma^2) \iff aX + b \sim N(a\mu + b,\ a^2\sigma^2). \tag{1}$$
Example 1
Suppose $X \sim N(2, 3^2)$. Find $E[-3X]$ and $\mathrm{Var}(-3X)$.
Solution. By (1) with a = −3 and b = 0, $-3X \sim N(-6, 9 \times 3^2) = N(-6, 81)$. We have E[−3X] = −6 and Var(−3X) = 81.
The Standard Normal Distribution
Definition 5 (The Standard Normal Distribution)
The normal distribution N (0, 1) (i.e., with mean 0 and variance 1) is
called the standard normal distribution.
Motivation:
Standardization
$$X \sim N(\mu, \sigma^2) \iff Z := \frac{X - \mu}{\sigma} \sim N(0, 1) \tag{2}$$
Question. How to prove Equation (2)?
Therefore, to calculate the probability P(a ≤ X ≤ b) for any $X \sim N(\mu, \sigma^2)$, we need only the table for the standard normal distribution N(0, 1).
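One possible answer to the question about Equation (2), sketched under the assumption that Fact (1) may be used: take the constants $a = 1/\sigma$ and $b = -\mu/\sigma$ in (1). The reverse direction follows in the same way by applying (1) to $Z$ with $a = \sigma$ and $b = \mu$.

```latex
Z = \frac{X - \mu}{\sigma} = \frac{1}{\sigma} X - \frac{\mu}{\sigma}
  \sim N\!\left(\frac{1}{\sigma}\,\mu - \frac{\mu}{\sigma},\ \frac{1}{\sigma^2}\,\sigma^2\right)
  = N(0, 1).
```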
Normal distributions with mean 0 and different variances
The smaller σ is, the more sharply peaked the curve is and the more tightly it is concentrated around the y-axis.
Properties of the standard normal distribution.
(1) Total probability is 1:
P(−∞ ≤ Z ≤ ∞) = 1
(2) From item (1) above, we have
P(Z ≤ a) = 1 − P(Z ≥ a) (3)
(3) As the standard normal density is symmetric about the y-axis, we have
P(Z ≤ 0) = P(Z ≥ 0) = 0.5
(4) According to item (3) above, if P(Z ≤ a) < 0.5, then a < 0. Moreover,
P(Z ≤ a) = P(Z ≥ −a) for all a ≤ 0 (4)
(5) From Equations (3) and (4) above, if P(Z ≥ b) > 0.5, then b < 0. Moreover,
P(Z ≥ b) = 1 − P(Z ≥ −b) for all b ≤ 0 (5)
Remark. You might have seen other tables of the standard normal
distribution. But in this course, we will always use the above one!
Example 2
Let Z ∼ N (0, 1). Find P (Z > 0.36).
Solution.
P (Z > 0.36) = P (Z ≥ 0.36) = 0.3594.
Example 3
Let Z ∼ N (0, 1). Find P (Z < −0.36), P (Z < 0.3574) and
P (Z > 5).
The table of the standard normal distribution usually covers only [0, 4] with limited precision. One should use rounding and algebra to find probabilities that are not covered.
Solutions.
P(Z < −0.36) = P(Z > 0.36) = 0.3594, since the standard normal density is symmetric about the y-axis.
P(Z < 0.3574) = 1 − P(Z ≥ 0.3574) ≈ 1 − P(Z ≥ 0.36) = 0.6406.
Since P(Z > 3.99) = 0.0000 (rounded to 4 decimal places) already, we have P(Z > 5) < P(Z > 3.99) = 0.0000, so P(Z > 5) ≈ 0.0000.
Example 4
Suppose Z ∼ N(0, 1). Find (a) P(0 ≤ Z ≤ 2.32); (b) P(−1.54 ≤ Z ≤ 0); (c) P(−1.54 ≤ Z ≤ 2.32).
Solutions.
(a) P(0 ≤ Z ≤ 2.32) = 0.5 − P(Z ≥ 2.32) = 0.5 − 0.0102 = 0.4898.
(b) P(−1.54 ≤ Z ≤ 0) = P(0 ≤ Z ≤ 1.54) = 0.5 − P(Z > 1.54) = 0.5 − 0.0618 = 0.4382.
(c) P(−1.54 ≤ Z ≤ 2.32) = P(−1.54 ≤ Z ≤ 0) + P(0 ≤ Z ≤ 2.32) = 0.4382 + 0.4898 = 0.9280.
Example. Suppose Z ∼ N (0, 1). Find
1. P(Z < 0.92)
2. P(Z > 2.48)
3. P(Z < −1.62)
4. P(0 ≤ Z ≤ 1.84)
5. P(−2.19 ≤ Z ≤ 0)
6. P(0.21 < Z < 0.95)
7. P(−0.58 < Z < 1.36)
8. P(Z > −2.52)
Solutions.
1. P (Z < 0.92) = 1 − P (Z > 0.92) = 1 − 0.1788 = 0.8212.
2. P (Z > 2.48) = 0.00657.
3. P (Z < −1.62) = P (Z > 1.62) = 0.0526.
4. P (0 ≤ Z ≤ 1.84) = 0.5 − P (Z ≥ 1.84) = 0.5 − 0.0329 = 0.4671.
5. P (−2.19 ≤ Z ≤ 0) = 0.5 − P (Z ≥ 2.19) = 0.5 − 0.0143 = 0.4857.
6. P (0.21 < Z < 0.95) = P (Z > 0.21) − P (Z > 0.95) =
0.4168 − 0.1711 = 0.2457.
7. P (−0.58 < Z < 1.36) = 1 − P (Z > 0.58) − P (Z > 1.36) =
1 − 0.2810 − 0.0869 = 0.6321.
8. P (Z > −2.52) = 0.5+[0.5−P (Z > 2.52)] = 1−0.00587 = 0.99413.
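The table look-ups in these eight solutions can be cross-checked in software. The sketch below assumes, as the solutions suggest, that the course table lists upper-tail probabilities P(Z ≥ z); the helper name `upper_tail` is introduced here for illustration and relies on the standard identity P(Z ≥ z) = erfc(z/√2)/2.

```python
from math import erfc, sqrt

def upper_tail(z):
    """P(Z >= z) for Z ~ N(0, 1), the kind of value read from the table."""
    return 0.5 * erfc(z / sqrt(2))

# Each entry mirrors the algebra used in the corresponding solution.
answers = {
    "1. P(Z < 0.92)": 1 - upper_tail(0.92),
    "2. P(Z > 2.48)": upper_tail(2.48),
    "3. P(Z < -1.62)": upper_tail(1.62),
    "4. P(0 <= Z <= 1.84)": 0.5 - upper_tail(1.84),
    "5. P(-2.19 <= Z <= 0)": 0.5 - upper_tail(2.19),
    "6. P(0.21 < Z < 0.95)": upper_tail(0.21) - upper_tail(0.95),
    "7. P(-0.58 < Z < 1.36)": 1 - upper_tail(0.58) - upper_tail(1.36),
    "8. P(Z > -2.52)": 1 - upper_tail(2.52),
}
for label, value in answers.items():
    print(f"{label} = {value:.4f}")
```

Small differences in the last digit are expected, because the table values are themselves rounded.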
Example 5
Let Z ∼ N (0, 1). Find a and b so that P (Z ≤ a) = 0.33 and
P (Z ≥ b) = 0.99.
Solutions.
As P(Z ≤ a) = 0.33 < 0.5, we have a < 0 (please refer to Equation (4)). Then P(Z ≤ a) = P(Z ≥ −a) = 0.33. From the table of the standard normal distribution we see −a = 0.44, so a = −0.44.
As P(Z ≥ b) = 0.99 > 0.5, we know that b < 0 (please refer to Equation (5)). Then P(Z ≥ b) = 1 − P(Z ≥ −b) = 0.99, that is, P(Z ≥ −b) = 0.01. From the table of the standard normal distribution we see −b = 2.33, so b = −2.33.
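Reading the table "backwards" as in this example corresponds to evaluating the inverse of the cumulative distribution function. A minimal sketch, assuming Python 3.8 or later so that `statistics.NormalDist` is available (the variable names are illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

a = std_normal.inv_cdf(0.33)       # solves P(Z <= a) = 0.33
b = std_normal.inv_cdf(1 - 0.99)   # P(Z >= b) = 0.99 is the same as P(Z <= b) = 0.01

print(round(a, 2), round(b, 2))
```

The results can be compared with the table-based values of a and b found above.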
Standardizing a normal random variable
Fact
Recall that if $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$.
Example 6
Let X be a continuous random variable that is normally distributed with a mean of 25 and a standard deviation of 4. Find the area under the density curve between x = 18 and x = 34, that is, P(18 ≤ X ≤ 34).
Solution.
$X \sim N(25, 4^2)$. Let $Z = \frac{X - 25}{4}$. Then $Z \sim N(0, 1)$.
$$P(18 \le X \le 34) = P\left(\frac{18 - 25}{4} \le \frac{X - 25}{4} \le \frac{34 - 25}{4}\right) = P(-1.75 \le Z \le 2.25)$$
$$= 1 - P(Z \ge 2.25) - P(Z \ge 1.75) = 1 - 0.0122 - 0.0401 = 0.9477.$$
Example 7
Suppose that X has a normal distribution such that P(X < 116) = 0.2 and P(X < 328) = 0.9. Determine the mean and variance of X.
Solution. Assume $X \sim N(\mu, \sigma^2)$ and let $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$. We have
$$0.2 = P(X < 116) = P\left(\frac{X - \mu}{\sigma} < \frac{116 - \mu}{\sigma}\right) = P\left(Z < \frac{116 - \mu}{\sigma}\right).$$
Checking the table gives $\frac{116 - \mu}{\sigma} = -0.84$. Similarly,
$$0.9 = P\left(Z < \frac{328 - \mu}{\sigma}\right) \implies P\left(Z > \frac{328 - \mu}{\sigma}\right) = 0.1,$$
so $\frac{328 - \mu}{\sigma} = 1.28$. Combining the two equations, we obtain
$$116 - \mu = -0.84\sigma, \qquad 328 - \mu = 1.28\sigma.$$
Solving them gives µ = 200 and σ = 100. So the mean is 200 and the variance is 10000.
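The last step is a pair of linear equations in µ and σ, so it can be solved by elimination. A small sketch using the same two table values (the variable names are chosen here for illustration):

```python
# Table look-ups used in the solution: P(Z < -0.84) ~ 0.2 and P(Z < 1.28) ~ 0.9.
z_low, z_high = -0.84, 1.28
x_low, x_high = 116.0, 328.0

# Each point satisfies x = mu + z * sigma; subtracting the two equations eliminates mu.
sigma = (x_high - x_low) / (z_high - z_low)
mu = x_low - z_low * sigma

print(mu, sigma, sigma ** 2)
```

Because the z-values are rounded table entries, the resulting mean and variance are approximations rather than exact parameters.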
Finding an X value for a normal distribution
Example 8
It is known that the life of a calculator manufactured by Intel
Corporation has a normal distribution with a mean of 54 months and a
standard deviation of 7.65 months. What should the warranty period be
to replace a malfunctioning calculator if the company does not want to
replace more than 1% of all the calculators sold?
Solution.
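Let X be the life length of a calculator. Then $X \sim N(54, 7.65^2)$.
Let $Z = \frac{X - 54}{7.65}$. Then $Z \sim N(0, 1)$.
Let a be the warranty period. Then P(0 ≤ X ≤ a) = 0.01, so
$$P\left(\frac{0 - 54}{7.65} \le \frac{X - 54}{7.65} \le \frac{a - 54}{7.65}\right) = P\left(-7.05882 \le Z \le \frac{a - 54}{7.65}\right)$$
$$= P\left(Z \ge \frac{54 - a}{7.65}\right) - P(Z \ge 7.05882) \approx P\left(Z \ge \frac{54 - a}{7.65}\right) = 0.01.$$
Checking the table, we find that P(Z ≥ 2.33) = 0.01. Therefore
$$2.33 = \frac{54 - a}{7.65} \implies a = 36.1755,$$
so the warranty period should be about 36 months.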
Normal distribution approximating Binomial distribution
Motivating Example
Let X ∼ Binomial(n, p) with n = 10,000 and p = 0.4. Find P(3000 ≤ X ≤ 3900).
[The brute force way]
$$P(3000 \le X \le 3900) = \sum_{k=3000}^{3900} \binom{10000}{k}\, 0.4^k \times 0.6^{10000-k} = \cdots$$
The computational load is heavy.
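With a computer the brute-force sum is feasible, although it needs some care. The sketch below is one possible way to do it, not the method used in the lecture: it works with exact integers, because floating-point powers such as 0.4 to the power 3000 underflow to zero.

```python
from math import comb

n = 10_000

# Exact integer arithmetic: 0.4**k * 0.6**(n - k) = 4**k * 6**(n - k) / 10**n.
numerator = sum(comb(n, k) * 4**k * 6**(n - k) for k in range(3000, 3901))

# Python's true division of two large integers returns a correctly rounded float.
exact = numerator / 10**n
print(exact)
```

The sum has 901 terms, each an integer with thousands of digits, which illustrates why a table-based approximation is attractive for hand calculation.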
Recall the animation of Binomial(80, p).
[Figure: probability functions of Binomial(80, 0), Binomial(80, 0.3), Binomial(80, 0.6) and Binomial(80, 0.99); horizontal axis k from 0 to 80, vertical axis probability from 0 to 1.]
For p not too close to 0 or 1, the probability function looks like a normal density curve.
A close look
[Figure: red dots: Binomial(80, 0.3); black line: N(24, 16.8); horizontal axis from 0 to 80, vertical axis from 0.00 to 0.10.]
Fact
Let X ∼ Binomial(n, p) with n large enough that np ≥ 5 and n(1 − p) ≥ 5. Then X can be approximated by a normal random variable Y ∼ N(np, np(1 − p)).
Remark:
Recall that in this case E[X] = np and Var(X) = np(1 − p). That is, np(1 − p) is the variance and $\sqrt{np(1 - p)}$ is the standard deviation.
Normal distribution $Y \sim N(\mu, \sigma^2)$ as an approximation to the binomial distribution X ∼ Binomial(n, p)
Step 1 Check whether the approximation works by validating
np ≥ 5 and n(1 − p) ≥ 5
Step 2 Compute µ and σ for the normal distribution Y by
µ = np and σ² = np(1 − p)
Step 3 Correction for continuity: For any integer k,
(a) P(X = k) ≈ P(k − 0.5 < Y < k + 0.5) (slightly higher)
(b) P(X ≥ k) ≈ P(Y > k − 0.5) (slightly higher)
(c) P(X > k) ≈ P(Y > k + 0.5) (slightly lower)
(d) P(X ≤ k) ≈ P(Y < k + 0.5) (slightly higher)
(e) P(X < k) ≈ P(Y < k − 0.5) (slightly lower)
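The three steps above can be sketched as a small function. This is an illustration under stated assumptions, not a definitive implementation: the name `binomial_normal_approx` is hypothetical, the function handles only probabilities of the form P(lo ≤ X ≤ hi) with integer bounds, and it uses `statistics.NormalDist` (Python 3.8 or later) in place of the printed table.

```python
from math import sqrt
from statistics import NormalDist

def binomial_normal_approx(n, p, lo, hi):
    """Approximate P(lo <= X <= hi) for X ~ Binomial(n, p), with integers lo <= hi."""
    # Step 1: check that the approximation is reasonable.
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("need np >= 5 and n(1 - p) >= 5")
    # Step 2: Y ~ N(np, np(1 - p)); NormalDist takes the standard deviation.
    y = NormalDist(n * p, sqrt(n * p * (1 - p)))
    # Step 3: continuity correction widens the interval by 0.5 on each side.
    return y.cdf(hi + 0.5) - y.cdf(lo - 0.5)

# Usage: Binomial(10000, 0.4) and P(3000 <= X <= 3900).
approx = binomial_normal_approx(10_000, 0.4, 3000, 3900)
print(f"{approx:.4f}")
```

One-sided cases such as P(X ≥ k) follow the same pattern with only one corrected bound, as in items (b) to (e).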
The motivating example, solved with the table
Let X ∼ Binomial(n, p) with n = 10000 and p = 0.4. Find P(3000 ≤ X ≤ 3900).
Solutions.
We have np = 4000 > 5 and n(1 − p) = 6000 > 5, so we use Y ∼ N(np, np(1 − p)) = N(4000, 2400) to approximate X.
Let $Z = \frac{Y - 4000}{\sqrt{2400}}$; then approximately $Z \sim N(0, 1)$.
$$P(3000 \le X \le 3900) \approx P(2999.5 \le Y \le 3900.5) = P\left(\frac{2999.5 - 4000}{\sqrt{2400}} \le Z \le \frac{3900.5 - 4000}{\sqrt{2400}}\right)$$
$$= P(-20.42 \le Z \le -2.03) = P(2.03 \le Z \le 20.42) = P(Z \ge 2.03) - P(Z \ge 20.42) = 0.0212 - 0 = 0.0212.$$
Example 9
Let X ∼ Binomial(n, p) with n = 100 and p = 0.4. Find P(30 ≤ X < 39).
Solutions.
We have np = 40 > 5 and n(1 − p) = 60 > 5, so X is approximated by Y ∼ N(40, 24).
Let $Z = \frac{Y - 40}{\sqrt{24}}$; then approximately $Z \sim N(0, 1)$.
$$P(30 \le X < 39) = P(30 \le X \le 38) \approx P(29.5 \le Y \le 38.5) = P\left(\frac{29.5 - 40}{\sqrt{24}} \le Z \le \frac{38.5 - 40}{\sqrt{24}}\right)$$
$$= P(-2.14 < Z < -0.31) = P(0.31 < Z < 2.14) = P(Z > 0.31) - P(Z > 2.14) = 0.3783 - 0.0162 = 0.3621.$$
Example 10 (Q4(c), Semester 1, 2017/2018)
A large electronic office product contains 1000 electronic components. Assume that the probability that each component operates without failure during the useful life of the product is 0.995, and assume that the components fail independently. Approximate the probability that 8 or more of the original 1000 components fail during the useful life of the product.
Solution. Let X be the number of electronic components that fail during the useful life of the product. Then X ∼ Binomial(1000, 0.005). As
1000 × 0.005 = 5 ≥ 5 and 1000 × (1 − 0.005) = 995 > 5,
we use Y ∼ N(1000 × 0.005, 1000 × 0.005 × (1 − 0.005)) = N(5, 4.975) to approximate X. Let $Z = \frac{Y - 5}{\sqrt{4.975}}$. Then approximately Z ∼ N(0, 1).
$$P(X \ge 8) \approx P(Y > 7.5) = P\left(Z > \frac{7.5 - 5}{\sqrt{4.975}}\right) = P(Z > 1.12084) \approx P(Z > 1.12) = 0.1314.$$
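Since np sits exactly on the boundary of the rule of thumb here, it is instructive to compare the normal approximation with the exact binomial tail. The comparison below is a sketch added for illustration (it is not part of the exam solution) and assumes the failure count is Binomial(1000, 0.005).

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 1000, 0.005   # p is the probability that a single component fails

# Exact tail: P(X >= 8) = 1 - P(X <= 7).
exact = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(8))

# Normal approximation with continuity correction: P(Y > 7.5).
y = NormalDist(n * p, sqrt(n * p * (1 - p)))
approx = 1 - y.cdf(7.5)

print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

The two values should be close but not identical, which is what the word "approximate" in the question allows for.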
Example 11 (Q4, Semester 2, 2018/2019)
A manufacturer fills jars with coffee. The weight of coffee, W grams, in a jar can be modelled by a normal distribution with mean 232 grams and standard deviation 5 grams.
(a) Find P(W < 227).
(b) One hundred jars of coffee are selected at random. Find the probability that more than 23 jars contain less than 227 grams of coffee.
Solution. We are given $W \sim N(232, 5^2)$. Define $Z = \frac{W - 232}{5}$. Then Z ∼ N(0, 1).
(a) $P(W < 227) = P\left(Z < \frac{227 - 232}{5}\right) = P(Z < -1) = P(Z > 1) = 0.1587.$
(b) Let X be the number of jars that contain less than 227 grams of coffee. Then X ∼ Binomial(100, 0.1587). As 100 × 0.1587 = 15.87 ≥ 5 and 100 × (1 − 0.1587) = 84.13 ≥ 5, we use Y ∼ N(100 × 0.1587, 100 × 0.1587 × (1 − 0.1587)) = N(15.87, 13.3514) to approximate X.
Let $M = \frac{Y - 15.87}{\sqrt{13.3514}}$. Then approximately M ∼ N(0, 1), and
$$P(X > 23) \approx P(Y > 23.5) = P\left(M > \frac{23.5 - 15.87}{\sqrt{13.3514}}\right) = P(M > 2.08815) \approx P(M > 2.09) = 0.0183.$$
Example 12 (Q4, Semester 1, 2018/2019)
There are, on average, 120 students entering the university every hour. The university has only two gates, an east gate and a south gate. Every student has a 60% probability of entering the university through the east gate. Assume students enter the university independently.
(a) On average, how many students enter the university from 8:00 to 8:05? What is the probability that at least 5 students enter the university from 8:00 to 8:05?
(b) If there are 10 students entering the university from 8:05 to 8:10, what is the probability that more than 5 students enter through the south gate from 8:05 to 8:10?
(c) If there are 180 students entering the university from 8:30 to 9:00, use the normal distribution to estimate the probability that at most 100 students enter through the east gate from 8:30 to 9:00.
Solution.
(a) As there are, on average, 120 students entering the university every hour, on average 120 × 5/60 = 10 students enter the university from 8:00 to 8:05. Let X be the number of students entering the university from 8:00 to 8:05. Then X ∼ Poisson(10). We have
$$P(X \ge 5) = 1 - \sum_{k=0}^{4} P(X = k) = 1 - e^{-10}\left(1 + \frac{10^1}{1!} + \frac{10^2}{2!} + \frac{10^3}{3!} + \frac{10^4}{4!}\right) = 0.9707.$$
(b) Let Y be the number of students entering the university through the south gate from 8:05 to 8:10. Then Y ∼ Binomial(10, 0.4), and
$$P(Y > 5) = \sum_{k=6}^{10} \binom{10}{k} \times 0.4^k \times 0.6^{10-k} = 0.1662.$$
(c) Let A be the number of students entering the university through the east gate from 8:30 to 9:00. Then A ∼ Binomial(180, 0.6), with 180 × 0.6 = 108 and 180 × 0.6 × 0.4 = 43.2. Let W ∼ N(108, 43.2). Define $Z = \frac{W - 108}{\sqrt{43.2}}$. Then Z ∼ N(0, 1). We have
$$P(A \le 100) \approx P(W < 100.5) = P\left(Z < \frac{100.5 - 108}{\sqrt{43.2}}\right) = P(Z < -1.14109) \approx P(Z > 1.14) = 0.1271.$$
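All three parts can be reproduced with the Python standard library. The sketch below is a cross-check under the same modelling assumptions as the solution (Poisson arrivals, independent gate choices, normal approximation with continuity correction); the variable names `p_a`, `p_b` and `p_c` are introduced here for illustration.

```python
from math import comb, exp, factorial, sqrt
from statistics import NormalDist

# (a) X ~ Poisson(10): P(X >= 5) = 1 - P(X <= 4).
lam = 120 * 5 / 60
p_a = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(5))

# (b) Y ~ Binomial(10, 0.4): P(Y > 5) = P(Y = 6) + ... + P(Y = 10).
p_b = sum(comb(10, k) * 0.4**k * 0.6**(10 - k) for k in range(6, 11))

# (c) A ~ Binomial(180, 0.6), approximated by W ~ N(108, 43.2);
#     continuity correction turns P(A <= 100) into P(W < 100.5).
w = NormalDist(180 * 0.6, sqrt(180 * 0.6 * 0.4))
p_c = w.cdf(100.5)

print(f"(a) {p_a:.4f}  (b) {p_b:.4f}  (c) {p_c:.4f}")
```

Part (c) may differ from the table-based answer in the last digit, because the table rounds z to two decimal places.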