STATISTICAL METHODS II: STAT 154
J.K. AFRIYIE
Department of Statistics and Actuarial Science
KNUST
Probability Theory
June 8, 2023
COURSE OUTLINE
Special Discrete Distributions
Binomial distribution
Poisson distribution
Special Continuous Distributions
Normal distribution
Confidence Interval
Mean
Proportion
Variance
Standard Deviation
Hypothesis testing
Mean
Proportion
Variance
Standard Deviation
DISCRETE DISTRIBUTIONS: Binomial Distribution
The binomial distribution is used to model experiments
consisting of a fixed number of identical and independent
trials, each of which results in one of two outcomes.
Examples include tossing a coin n times and counting the
number of heads, or firing a sequence of n shots and
counting the number of hits (the rest being misses).
Binomial Distribution
Conditions of a Binomial Distribution
1 The experiment consists of a fixed number of trials, n.
2 The trials are independent and identical.
3 Each trial results in one of two outcomes, called success
or failure.
4 The probability of success in a single trial is p and remains
the same from trial to trial. The probability of failure,
also in a single trial, is q = 1 − p.
5 The random variable of interest, X, is the number of
successes observed during the n trials.
Binomial Distribution
Definition
A random variable X has the binomial distribution with
number of trials n and probability of success p if
$$P(X = x) = {}^{n}C_{x}\, p^{x} (1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n$$
where ${}^{n}C_{x} = \frac{n!}{x!\,(n - x)!}$.
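As a quick check of the definition, here is a minimal Python sketch of the binomial probability function using only the standard library (`math.comb`, Python 3.8+). The name `binomial_pmf` is a hypothetical helper chosen for illustration; the example evaluates X ∼ Bin(4, 0.5).

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bin(n, p), computed as nCx * p^x * (1 - p)^(n - x)."""
    if not 0 <= x <= n:
        return 0.0  # outside the support 0, 1, ..., n
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# X ~ Bin(4, 0.5), e.g. the number of heads in four fair coin flips
probs = [binomial_pmf(x, 4, 0.5) for x in range(5)]
print(probs)       # [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(sum(probs))  # 1.0
```

The probabilities over x = 0, 1, ..., n sum to 1, as they must for any probability distribution.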
Binomial Distribution
Mean, Variance and Standard Deviation of a Binomial
Distribution
$$E[X] = \mu = np$$
$$\mathrm{Var}(X) = \sigma^{2} = np(1 - p)$$
$$SD(X) = \sigma = \sqrt{np(1 - p)}$$
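The formulas for the mean and variance can be checked numerically by summing directly over the probability function. This is a sketch under an arbitrary illustrative choice of n = 10 and p = 0.3; `binomial_moments` is a hypothetical helper name.

```python
from math import comb, sqrt

def binomial_moments(n: int, p: float):
    """Return (mean, variance) of Bin(n, p), computed directly from the pmf."""
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(pmf))               # E[X] = sum of x P(X = x)
    variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))
    return mean, variance

n, p = 10, 0.3
mean, var = binomial_moments(n, p)
print(mean, n * p)                       # both ≈ 3.0
print(var, n * p * (1 - p))              # both ≈ 2.1
print(sqrt(var), sqrt(n * p * (1 - p)))  # both ≈ 1.449
```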
Binomial Distribution
Example
Four fair coins are flipped. If their outcomes are assumed
independent,
a. what is the probability of obtaining two heads?
b. calculate the mean and variance.
Binomial Distribution
Solution
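Let X equal the number of heads ("successes") that appear.
Then X is a binomial random variable with parameters n = 4
and p = 1/2. Hence
a. $$P(X = 2) = {}^{4}C_{2} \left(\tfrac{1}{2}\right)^{2} \left(\tfrac{1}{2}\right)^{2} = \frac{6}{16} = \frac{3}{8}$$
b. $$E(X) = np = 4\left(\tfrac{1}{2}\right) = 2$$
$$\mathrm{Var}(X) = np(1 - p) = 4\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right) = 1$$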
Binomial Distribution
Example
It is known that each item produced by a certain machine is
defective with probability 0.1, independently of the other items.
What is the probability that in a sample of 3 items, at most one
will be defective?
Binomial Distribution
Solution
If X is the number of defective items in the sample, then X is a
binomial random variable with parameters (3, 0.1). Hence the
desired probability is given by
$$P(X \le 1) = P(X = 0) + P(X = 1)$$
$$= {}^{3}C_{0} (0.1)^{0} (0.9)^{3} + {}^{3}C_{1} (0.1)^{1} (0.9)^{2}$$
$$= 0.729 + 0.243 = 0.972$$
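An "at most k" probability is a sum of probability-function terms for x = 0, 1, ..., k. A minimal sketch of that cumulative sum, assuming a hypothetical helper named `binomial_cdf`, reproduces the defective-items answer:

```python
from math import comb

def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Bin(n, p): add the pmf terms for x = 0, 1, ..., k."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

# Defective-items example: X ~ Bin(3, 0.1), P(X <= 1)
print(round(binomial_cdf(1, 3, 0.1), 4))  # 0.972
```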
Binomial Distribution
Example
The random variable X has the binomial distribution
X ∼ Bin(6, 0.6). Find P(X ≥ 1).
Binomial Distribution
$$P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0)$$
$$= 1 - \left[{}^{6}C_{0} (0.6)^{0} (0.4)^{6}\right]$$
$$= 1 - 0.004096 = 0.9959$$
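The complement rule avoids summing six terms: only P(X = 0) is needed. A short sketch, with `prob_at_least_one` as a hypothetical helper name:

```python
from math import comb

def prob_at_least_one(n: int, p: float) -> float:
    """P(X >= 1) = 1 - P(X = 0) for X ~ Bin(n, p)."""
    p_zero = comb(n, 0) * p**0 * (1 - p) ** n  # only the x = 0 term is needed
    return 1 - p_zero

# X ~ Bin(6, 0.6): P(X = 0) = 0.4^6 = 0.004096
print(round(prob_at_least_one(6, 0.6), 4))  # 0.9959
```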
BINOMIAL DISTRIBUTION
Examples
1. A fair coin is tossed 3 times. Find the probability of
getting exactly two heads.
2. A survey found that one out of five Americans say he or
she has visited a doctor in any given month. If 10 people
are selected at random, find the probability that at least
one will have visited a doctor last month.
3. A survey from Teenage Research Unlimited found that 30%
of teenage consumers receive their spending money from
part-time jobs. If 5 teenagers are selected at random, find
the probability that at least 3 of them will have part-time
jobs.
Poisson Distribution
The Poisson distribution is used as a model for describing the
number of times some random event occurs in an interval of
time or space. Some examples are
the number of claims processed by a certain insurance
company in a given month
the number of road traffic accidents in an area during a
given time interval
the number of errors a typist makes in typing a page of a
text
the number of admissions to a clinic in a given time
interval.
In all these examples, µ is the average number of times the
event occurs in the given interval.
Poisson Distribution
Definition
The Poisson distribution for the random variable X,
representing the number of occurrences of an event in a given
interval of time, space or volume, is defined by
$$P(X = x) = \frac{\mu^{x} e^{-\mu}}{x!}, \quad x = 0, 1, 2, \ldots \text{ and } \mu > 0$$
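A minimal Python sketch of the Poisson probability function, which also checks numerically that the probabilities sum to 1 and that the mean and variance both equal µ. The choice µ = 3 and the cut-off at x = 49 (the remaining tail is negligible for µ = 3) are illustrative assumptions; `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    """P(X = x) = mu^x e^(-mu) / x! for X ~ Poisson(mu)."""
    return mu**x * exp(-mu) / factorial(x)

mu = 3.0
probs = [poisson_pmf(x, mu) for x in range(50)]  # tail beyond x = 49 is negligible for mu = 3
total = sum(probs)
mean = sum(x * px for x, px in enumerate(probs))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(probs))
print(total)           # ≈ 1.0
print(mean, variance)  # both ≈ 3.0
```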
Poisson Distribution
Mean and Variance of Poisson Distribution
$$E(X) = \mathrm{Var}(X) = \mu$$
Poisson Distribution
Example
The number of power surges in an electric grid has a Poisson
distribution with a mean of one power surge every twelve hours.
a. What is the probability that there will be at least one
power surge in a 12-hour period?
b. What is the probability that there will be no more than
one power surge in a 24-hour period?
c. What is the probability that there will be more than three
power surges in a 24-hour period?
d. What is the probability that there will be exactly three
power surges in a 6-hour period?
Poisson Distribution
Solution
a. In 12 hours, the mean number of power surges is 1.
$$P(X \ge 1) = 1 - P(X = 0) = 1 - \frac{1^{0} e^{-1}}{0!} = 1 - e^{-1} = 0.632$$
Poisson Distribution
Solution
b. In 12 hours, the mean number of power surges is 1. In 24
hours, the mean number is therefore 2.
$$P(X \le 1) = P(X = 0) + P(X = 1) = \frac{2^{0} e^{-2}}{0!} + \frac{2^{1} e^{-2}}{1!}$$
$$= e^{-2} + 2e^{-2} = 3e^{-2} = 0.406$$
POISSON DISTRIBUTION
Solution
c. In 12 hours, the mean number of power surges is 1. In 24
hours, the mean number is therefore 2.
$$P(X > 3) = 1 - P(X \le 3) = 1 - [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)]$$
$$= 1 - \left[\frac{2^{0} e^{-2}}{0!} + \frac{2^{1} e^{-2}}{1!} + \frac{2^{2} e^{-2}}{2!} + \frac{2^{3} e^{-2}}{3!}\right]$$
$$= 1 - \left[e^{-2} + 2e^{-2} + 2e^{-2} + \frac{4e^{-2}}{3}\right] = 1 - \frac{19}{3}e^{-2}$$
$$= 1 - 0.857 = 0.143$$
d. In 12 hours, the mean number of power surges is 1. In 6
hours, the mean number is therefore 0.5.
$$P(X = 3) = \frac{0.5^{3} e^{-0.5}}{3!} = 0.0126$$
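The key step in all four parts is rescaling µ to the length of the interval. A sketch under the slide's assumption of one surge per 12 hours on average; `poisson_pmf`, `poisson_cdf` and `mean_for` are hypothetical helper names.

```python
from math import exp, factorial

def poisson_pmf(x: int, mu: float) -> float:
    return mu**x * exp(-mu) / factorial(x)

def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu)."""
    return sum(poisson_pmf(x, mu) for x in range(k + 1))

rate_per_hour = 1 / 12  # one surge per 12 hours on average

def mean_for(hours: float) -> float:
    """Poisson mean for an interval of the given length (rate times length)."""
    return rate_per_hour * hours

a = 1 - poisson_cdf(0, mean_for(12))  # P(X >= 1), mu = 1
b = poisson_cdf(1, mean_for(24))      # P(X <= 1), mu = 2
c = 1 - poisson_cdf(3, mean_for(24))  # P(X > 3), mu = 2
d = poisson_pmf(3, mean_for(6))       # P(X = 3), mu = 0.5
print(round(a, 3), round(b, 3), round(c, 3), round(d, 4))  # 0.632 0.406 0.143 0.0126
```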
POISSON DISTRIBUTION
Examples
A sales firm receives, on average, 3 calls per hour on its
toll-free number. For any given hour, find the probability
that it will receive the following:
a. at most 3 calls
b. at least 3 calls
c. 5 or more calls
If there are 200 typographical errors randomly distributed
in a 500-page manuscript, find the probability that a given
page contains exactly 3 errors.
CONTINUOUS DISTRIBUTIONS: Normal Distribution
Normal Distribution
The single most important distribution in probability and
statistics is the normal probability distribution.
It provides a good model for continuous random variables
involving measurements such as time, heights/weights of
persons, marks scored in an examination, amount of rainfall,
growth rate and many other scientific measurements.
The density function of a normal probability distribution is
bell shaped and symmetric about the mean.
[Figure: Normal distribution curve]
Normal Distribution
Properties of the Normal Distribution
A normal distribution curve is bell-shaped.
The mean, median and mode are equal and are located at
the centre of the distribution.
A normal distribution curve is unimodal (i.e., it has only
one mode).
The curve is symmetric about the mean.
The curve is continuous and never touches the x-axis; it
approaches the axis asymptotically on both sides.
The total area under a normal distribution curve is equal
to 1.00, or 100%.
The area under the part of a normal curve that lies within
1 standard deviation of the mean is approximately 0.68, or
68%; within 2 standard deviations, about 0.95, or 95%; and
within 3 standard deviations, about 0.997, or 99.7%.
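The 68/95/99.7 figures can be reproduced from the error function: for any normal X, P(µ − kσ < X < µ + kσ) = 2Φ(k) − 1 = erf(k/√2). A minimal sketch using Python's standard `math.erf`; `prob_within` is a hypothetical helper name.

```python
from math import erf, sqrt

def prob_within(k: float) -> float:
    """P(mu - k*sigma < X < mu + k*sigma) for any normal X, equal to erf(k / sqrt(2))."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```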
Normal Distribution
Normal Density Function
A random variable X is said to have a normal probability
distribution with parameters µ and σ², if it has a probability
density function given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}, \quad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0$$
The normal random variable X is denoted X ∼ N(µ, σ²).
Normal Distribution
Standard Normal Density Function
If µ = 0 and σ = 1, we call it the standard normal random
variable, with density
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^{2}}{2}}, \quad -\infty < z < \infty$$
The standard normal random variable Z is denoted
Z ∼ N(0, 1).
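A minimal sketch of the normal density in Python. It shows the peak value 1/√(2π) of the standard normal curve, the symmetry about the mean, and (with a simple midpoint sum over [−8, 8], an illustrative numerical choice) that the total area is 1. `normal_pdf` is a hypothetical helper name; the N(24, 4) parameters are borrowed from the commuting-time example later in these notes.

```python
from math import exp, pi, sqrt

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2); the defaults give the standard normal density f(z)."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

print(round(normal_pdf(0.0), 4))  # 0.3989, the peak 1/sqrt(2*pi)

# Symmetric about the mean: f(mu - d) equals f(mu + d)
print(normal_pdf(24 - 3, 24, 2) == normal_pdf(24 + 3, 24, 2))  # True

# Midpoint-rule estimate of the total area under f(z) on [-8, 8]
step = 0.001
area = sum(normal_pdf(-8 + (i + 0.5) * step) for i in range(16_000)) * step
print(round(area, 6))  # 1.0
```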
[Figure: Graphical representation of the normal density]
Normal Distribution
Mean, Variance, and Standard deviation of a Normal Random
Variable
Theorem
If X ∼ N(µ, σ²), then E(X) = µ and Var(X) = σ².
Computing Probabilities
If X ∼ N(µ, σ²), then the z-transform (z-score) of X,
$$Z = \frac{X - \mu}{\sigma},$$
is an N(0, 1) random variable.
We therefore compute probabilities using the standard normal
random variable.
Normal Distribution
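Computing Probabilities
The probability P(a ≤ X ≤ b), where X ∼ N(µ, σ²), can be
written as
$$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx = \int_{a}^{b} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}\,dx$$
$$= \int_{z_a}^{z_b} f(z)\,dz = \int_{z_a}^{z_b} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^{2}}{2}}\,dz, \quad z_a = \frac{a - \mu}{\sigma},\ z_b = \frac{b - \mu}{\sigma}$$
$$P(z_a \le Z \le z_b) = P(Z < z_b) - P(Z < z_a) = \Phi(z_b) - \Phi(z_a)$$
The values of Φ(z) = P(Z ≤ z), the integral of f(z) from −∞ to z,
are read from the standard normal cumulative distribution table.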
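In place of a printed table, Φ(z) can be computed from the error function via Φ(z) = (1 + erf(z/√2))/2. A minimal sketch of the two-step recipe above (standardize, then subtract Φ values); `std_normal_cdf` and `normal_interval_prob` are hypothetical helper names. Python's standard `statistics.NormalDist().cdf` gives the same Φ values.

```python
from math import erf, sqrt

def std_normal_cdf(z: float) -> float:
    """Phi(z) = P(Z <= z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_interval_prob(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a <= X <= b) for X ~ N(mu, sigma^2), computed as Phi(z_b) - Phi(z_a)."""
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return std_normal_cdf(z_b) - std_normal_cdf(z_a)

# X ~ N(24, 2^2): P(24 <= X <= 28) = Phi(2) - Phi(0)
print(round(normal_interval_prob(24, 28, 24, 2), 4))  # 0.4772
```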
Normal Distribution
The following should be noted when evaluating Φ(z):
$$\Phi(-k) = 1 - \Phi(k) = 1 - P(Z \le k)$$
$$P(Z > k) = 1 - P(Z \le k) = 1 - \Phi(k)$$
$$P(-k \le Z \le k) = \Phi(k) - \Phi(-k) = \Phi(k) - [1 - \Phi(k)] = 2\Phi(k) - 1$$
Normal Distribution
Finding Areas Under the Standard Normal Distribution Curve
1. Find the area to the left of Z = 1.59
2. Find P (Z ≤ −1.42)
3. Find the area to the right of Z = 2.09
4. Find P (Z > 0.09)
5. Find the area between Z = −1.96 and Z = 2.58
6. Find P (−1.25 < Z < 0.09)
Normal Distribution
Solution
1. P (Z < 1.59) = 0.9441
2. P (Z ≤ −1.42) = 0.0778
3. P (Z > 2.09) = 1 − P (Z ≤ 2.09) = 1 − 0.9817 = 0.0183
4. P (Z > 0.09) = 1 − P (Z ≤ 0.09) = 1 − 0.5359 = 0.4641
5. P (−1.96 < Z < 2.58) = 0.9951 − 0.0250 = 0.9701
6. P (−1.25 < Z < 0.09) = 0.5359 − 0.1056 = 0.4303
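The six table answers can be cross-checked with Python's standard `statistics.NormalDist` (Python 3.8+). The computed values agree with the table-based answers to within table rounding (for example, item 6 evaluates to about 0.4302 rather than 0.4303 because the table entries are themselves rounded).

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)
answers = {
    "P(Z < 1.59)": Z.cdf(1.59),
    "P(Z <= -1.42)": Z.cdf(-1.42),
    "P(Z > 2.09)": 1 - Z.cdf(2.09),
    "P(Z > 0.09)": 1 - Z.cdf(0.09),
    "P(-1.96 < Z < 2.58)": Z.cdf(2.58) - Z.cdf(-1.96),
    "P(-1.25 < Z < 0.09)": Z.cdf(0.09) - Z.cdf(-1.25),
}
for label, value in answers.items():
    print(f"{label} = {value:.4f}")
```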
Normal Distribution
Example 1
For Z ∼ N(0, 1), calculate P(Z ≥ 1.13).
Normal Distribution
Solution
P (Z ≥ 1.13) = 1 − P (Z < 1.13) = 1 − Φ(1.13)
From the cumulative standard normal table, we read
Φ(1.13) = 0.8708
P (Z ≥ 1.13) = 1 − 0.8708 = 0.1292
Normal Distribution
Example 2
For X ∼ N (5, 4), calculate
1. P (X < 6)
2. P (8 < X < 11)
Normal Distribution
Solution
Using the z-transform with µ = 5 and σ = √4 = 2, we obtain
1. $$P(X < 6) = P\left(\frac{X - \mu}{\sigma} < \frac{6 - 5}{2}\right) = P(Z < 0.5) = \Phi(0.5) = 0.6915$$
2. $$P(8 < X < 11) = P\left(\frac{8 - 5}{2} < \frac{X - 5}{2} < \frac{11 - 5}{2}\right) = P(1.5 < Z < 3.0)$$
$$= \Phi(3.0) - \Phi(1.5) = 0.9987 - 0.9332 = 0.0655$$
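The same two answers follow directly from Python's standard `statistics.NormalDist`, which standardizes internally. Note that `NormalDist` takes the standard deviation, not the variance, so N(5, 4) is entered with `sigma=2`.

```python
from statistics import NormalDist

X = NormalDist(mu=5, sigma=2)  # N(5, 4): variance 4, so sigma = 2
p1 = X.cdf(6)                  # P(X < 6)
p2 = X.cdf(11) - X.cdf(8)      # P(8 < X < 11)
print(round(p1, 4), round(p2, 4))  # 0.6915 0.0655
```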
Normal Distribution
Example 3
The commuting time between a person’s home and office is a
normal random variable X with mean 24 minutes and standard
deviation 2 minutes.
Find:
1 the probability that it takes a person between 24 and 28
minutes to get to work.
2 the probability that it takes a person less than 19 minutes.
3 the probability that it takes a person more than 20
minutes.
4 the value of k such that P (X > k) = 0.0250.
Normal Distribution
Solution
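1. $$P(24 < X < 28) = P\left(\frac{24 - 24}{2} < \frac{X - 24}{2} < \frac{28 - 24}{2}\right) = P(0 < Z < 2)$$
$$= \Phi(2) - \Phi(0) = 0.9772 - 0.5 = 0.4772$$
2. $$P(X < 19) = P\left(\frac{X - 24}{2} < \frac{19 - 24}{2}\right) = P(Z < -2.5) = \Phi(-2.5)$$
$$= 1 - \Phi(2.5) = 1 - 0.9938 = 0.0062$$
3. $$P(X > 20) = 1 - \Phi\left(\frac{20 - 24}{2}\right) = 1 - \Phi(-2) = 1 - [1 - \Phi(2)] = \Phi(2) = 0.9772$$
4. $$P(X > k) = 0.0250 \implies 1 - P\left(Z < \frac{k - 24}{2}\right) = 0.0250$$
$$\Phi\left(\frac{k - 24}{2}\right) = 0.9750 \implies k = 2\,\Phi^{-1}(0.9750) + 24 = 2(1.96) + 24 = 27.92$$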
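A sketch of all four parts with Python's standard `statistics.NormalDist`. Part 4 uses `inv_cdf`, which plays the role of reading the table backwards to find Φ⁻¹(0.9750) ≈ 1.96.

```python
from statistics import NormalDist

T = NormalDist(mu=24, sigma=2)  # commuting time in minutes
p1 = T.cdf(28) - T.cdf(24)      # P(24 < X < 28)
p2 = T.cdf(19)                  # P(X < 19)
p3 = 1 - T.cdf(20)              # P(X > 20)
k = T.inv_cdf(1 - 0.0250)       # k such that P(X > k) = 0.0250
print(round(p1, 4), round(p2, 4), round(p3, 4), round(k, 2))
```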
Normal Distribution
DISTRIBUTION OF SAMPLE MEANS
A sampling distribution of sample means is a
distribution using the means computed from all possible
random samples of a specific size taken from a population.
Sampling error is the difference between a sample
measure and the corresponding population measure that arises
because the sample is not a perfect representation of
the population.
Normal Distribution
PROPERTIES OF DISTRIBUTION OF SAMPLE MEANS
The mean of the sample means will be the same as the
population mean, i.e.
$$E(\bar{X}) = \mu$$
The standard deviation of the sample means will be smaller
than the standard deviation of the population, and it will
be equal to the population standard deviation divided by
the square root of the sample size, i.e.
$$SD(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
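Both properties can be verified exactly on a tiny example by listing every possible sample drawn with replacement. The population {2, 4, 6, 8} and sample size n = 2 are hypothetical values chosen for illustration; `pstdev` is the population standard deviation from Python's `statistics` module.

```python
from itertools import product
from math import sqrt
from statistics import fmean, pstdev

population = [2, 4, 6, 8]  # a small hypothetical population
n = 2                      # sample size

# Every ordered sample of size n drawn with replacement: 4**2 = 16 samples
sample_means = [fmean(s) for s in product(population, repeat=n)]

mu = fmean(population)
sigma = pstdev(population)            # population standard deviation
print(fmean(sample_means), mu)        # 5.0 5.0
print(pstdev(sample_means), sigma / sqrt(n))  # both ≈ 1.5811
```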
Normal Distribution
THE CENTRAL LIMIT THEOREM
As the sample size n increases without limit, the shape of
the distribution of the sample means taken with
replacement from a population with mean µ and standard
deviation σ will approach a normal distribution.
As previously shown, this distribution will have mean µ
and standard deviation σ/√n. Thus,
$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
When the original variable is normally distributed, the
distribution of the sample means will be normally
distributed for any sample size n.
When the distribution of the original variable is not
normal, a sample size of 30 or more is needed to use a
normal distribution to approximate the distribution of the
sample means. The larger the sample, the better the
approximation.
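A small simulation illustrates the theorem for a clearly non-normal population. The exponential population (mean 1, standard deviation 1), the seed, n = 30 and 10,000 repetitions are all illustrative assumptions; only Python's standard `random` and `statistics` modules are used.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(2023)  # fixed seed so the run is reproducible
mu, sigma = 1.0, 1.0       # exponential(rate = 1) population: mean 1, sd 1, strongly skewed
n, reps = 30, 10_000

# Each entry is the mean of one random sample of size n
means = [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

se = sigma / n**0.5          # sigma / sqrt(n) ≈ 0.1826
print(fmean(means), mu)      # mean of the sample means, close to mu
print(pstdev(means), se)     # spread of the sample means, close to sigma / sqrt(n)

# Share of sample means within mu ± 1.96 se; near 0.95 if the normal approximation is good
coverage = sum(abs(m - mu) < 1.96 * se for m in means) / reps
print(coverage)
```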
EXAMPLES
A. C. Nielsen reported that children between the ages of 2
and 5 watch an average of 25 hours of television per week.
Assume the variable is normally distributed and the
standard deviation is 3 hours. If 20 children between the
ages of 2 and 5 are randomly selected, find the probability
that the mean number of hours they watch television
will be greater than 26.3 hours.
SOLUTION n = 20, µ = 25, σ = 3, X̄ = 26.3, P(X̄ > 26.3) = ?
$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{26.3 - 25}{3/\sqrt{20}} = \frac{1.3}{0.671} = 1.94$$
$$P(z > 1.94) = 1 - 0.9738 = 0.0262$$
EXAMPLE
The average age of a vehicle registered in the United States
is 8 years, or 96 months. Assume the standard deviation is
16 months. If a random sample of 36 vehicles is selected,
find the probability that the mean of their age is between
90 and 100 months.
SOLUTION n = 36, µ = 96, σ = 16, P(90 < X̄ < 100) = ?
$$z_1 = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{90 - 96}{16/\sqrt{36}} = -2.25; \quad z_2 = \frac{100 - 96}{16/\sqrt{36}} = 1.50$$
$$P(-2.25 < z < 1.50) = 0.9332 - 0.0122 = 0.9210$$
EXAMPLE
The average number of pounds of meat that a person
consumes per year is 218.4 pounds. Assume that the
standard deviation is 25 pounds and the distribution is
approximately normal.
a. Find the probability that a person selected at random
consumes less than 224 pounds per year.
b. If a sample of 40 individuals is selected, find the probability
that the mean of the sample will be less than 224 pounds
per year.
SOLUTION
a. $$z = \frac{X - \mu}{\sigma} = \frac{224 - 218.4}{25} = 0.22; \quad P(z < 0.22) = 0.5871$$
b. $$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{224 - 218.4}{25/\sqrt{40}} = 1.42; \quad P(z < 1.42) = 0.9222$$
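The three sample-mean examples share one calculation: standardize with σ/√n, then look up Φ. A sketch using Python's standard `statistics.NormalDist`; rounding z to two decimals is a deliberate choice here to mirror the table lookup used on the slides, and `z_for_mean` is a hypothetical helper name. A single observation is treated as a sample of size n = 1.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function

def z_for_mean(xbar: float, mu: float, sigma: float, n: int) -> float:
    """z = (xbar - mu) / (sigma / sqrt(n)), rounded to 2 decimals like a table lookup."""
    return round((xbar - mu) / (sigma / sqrt(n)), 2)

# Television example: P(Xbar > 26.3) with mu = 25, sigma = 3, n = 20
p_tv = 1 - PHI(z_for_mean(26.3, 25, 3, 20))
# Vehicle-age example: P(90 < Xbar < 100) with mu = 96, sigma = 16, n = 36
p_age = PHI(z_for_mean(100, 96, 16, 36)) - PHI(z_for_mean(90, 96, 16, 36))
# Meat example: one person (n = 1) versus the mean of a sample of 40
p_one = PHI(z_for_mean(224, 218.4, 25, 1))
p_forty = PHI(z_for_mean(224, 218.4, 25, 40))
print(round(p_tv, 4), round(p_age, 4), round(p_one, 4), round(p_forty, 4))
```

Comparing `p_one` with `p_forty` shows the effect of the smaller standard deviation σ/√n: the same cut-off of 224 pounds is much further out in standard-error units for a sample mean.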
THE NORMAL APPROXIMATION TO THE
BINOMIAL DISTRIBUTION
A normal distribution is often used to solve problems that
involve the binomial distribution, since when n is large (say,
100), the calculations are too difficult to do by hand using
the binomial distribution. The approximation is appropriate
when np ≥ 5 and nq ≥ 5.
A correction for continuity is a correction employed
when a continuous distribution is used to approximate a
discrete distribution.
SUMMARY OF THE NORMAL APPROXIMATION TO THE BINOMIAL DISTRIBUTION
[Table: summary of the normal approximation to the binomial distribution]
EXAMPLES
A magazine reported that 6% of American drivers read the
newspaper while driving. If 300 drivers are selected at
random, find the probability that exactly 25 say they read
the newspaper while driving.
SOLUTION p = 0.06, q = 0.94 and n = 300, P(X = 25) = ?
np = 300(0.06) = 18 and nq = 300(0.94) = 282
Thus, np ≥ 5 and nq ≥ 5.
$$\mu = np = 300(0.06) = 18, \quad \sigma = \sqrt{npq} = \sqrt{300(0.06)(0.94)} = 4.11$$
With the correction for continuity, P(X = 25) ≈ P(24.5 < X < 25.5), so
$$z_1 = \frac{24.5 - 18}{4.11} = 1.58; \quad z_2 = \frac{25.5 - 18}{4.11} = 1.82$$
$$P(1.58 < z < 1.82) = 0.9656 - 0.9429 = 0.0227$$
CONT’D
Of the members of a bowling league, 10% are widowed. If
200 bowling league members are selected at random, find
the probability that 10 or more will be widowed.
SOLUTION
p = 0.10, q = 0.90 and n = 200, P(X ≥ 10) = ?
np = 200(0.10) = 20 and nq = 200(0.90) = 180
Thus, np ≥ 5 and nq ≥ 5.
$$\mu = np = 200(0.10) = 20, \quad \sigma = \sqrt{npq} = \sqrt{200(0.10)(0.90)} = 4.24$$
$$P(X \ge 10) \approx P(X > 9.5), \quad z = \frac{9.5 - 20}{4.24} = -2.48$$
$$P(z > -2.48) = 1 - 0.0066 = 0.9934$$
CONT’D
If a baseball player’s batting average is 0.320 (32%), find
the probability that the player will get at most 26 hits in
100 times at bat.
SOLUTION
p = 0.32, q = 0.68 and n = 100, P(X ≤ 26) = ?
np = 100(0.32) = 32 and nq = 100(0.68) = 68
Thus, np ≥ 5 and nq ≥ 5.
$$\mu = np = 100(0.32) = 32, \quad \sigma = \sqrt{npq} = \sqrt{100(0.32)(0.68)} = 4.66$$
$$P(X \le 26) \approx P(X < 26.5), \quad z = \frac{26.5 - 32}{4.66} = -1.18$$
$$P(z < -1.18) = 0.1190$$
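A sketch of the normal approximation with the correction for continuity, applied to the three examples, alongside the exact binomial sum for comparison. Unlike the slides, this code does not round σ or z before the lookup, so the fourth decimal can differ slightly from the table-based answers. `normal_approx` and `exact` are hypothetical helper names; for the "at most 26" case the lower limit 0 adds a negligible correction compared with the slide's P(X < 26.5).

```python
from math import comb, sqrt
from statistics import NormalDist

def normal_approx(lo: int, hi: int, n: int, p: float) -> float:
    """Approximate P(lo <= X <= hi) for X ~ Bin(n, p) using N(np, npq) and a ±0.5 correction."""
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return approx.cdf(hi + 0.5) - approx.cdf(lo - 0.5)

def exact(lo: int, hi: int, n: int, p: float) -> float:
    """Exact binomial probability P(lo <= X <= hi), for comparison."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(lo, hi + 1))

cases = {
    "newspaper, P(X = 25)": (25, 25, 300, 0.06),
    "bowling, P(X >= 10)": (10, 200, 200, 0.10),
    "batting, P(X <= 26)": (0, 26, 100, 0.32),
}
for label, (lo, hi, n, p) in cases.items():
    print(f"{label}: normal {normal_approx(lo, hi, n, p):.4f}, exact {exact(lo, hi, n, p):.4f}")
```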
ASSIGNMENT
1 The average individual monthly spending in the United
States for paging and messaging services is $10.15. If the
standard deviation is $2.45 and the amounts are normally
distributed, what is the probability that a randomly
selected user of these services pays more than $15.00 per
month? Between $12.00 and $14.00 per month?
2 According to the government, 5.3% of those employed are
multiple job holders. In a random sample of 150 people
who are employed, what is the probability that fewer than
10 hold multiple jobs? What is the probability that more
than 50 are not multiple-job holders?