Chapter 4 - Random Variables
Content
4.1 Random Variables
4.2 Discrete Random Variables
4.3 Expected Value
4.4 Expectation of a Function of a Random Variable
4.5 Variance
4.6 The Bernoulli and Binomial Random Variables
4.7 The Poisson Random Variable
4.8 Other Discrete Probability Distributions
4.9 Expected Value of Sums of Random Variables
4.10 Properties of the Cumulative Distribution Function
Random Variables
Introduction
Random variable: “A (mathematical) function mapping outcomes in a sample space S to the
real numbers.”
Random variables are denoted by capital letters (often X, Y and Z). Therefore by definition:
𝑋: 𝑆 → ℝ
Particular values of random variables are denoted by lowercase letters (e.g. X = x)
We are often interested in some function of the outcome rather than the outcome itself.
For example, we may be interested in the total number of heads appearing in 10 coin tosses,
not in the actual order of the heads and tails. These quantities, which are real-valued
functions defined on S, are known as random variables.
The value of a random variable is determined by the outcome of the experiment and as a
result we can assign probabilities to the possible values of the random variable.
Types of Random Variables
The distinction between the 2 types is based on the range of values that they can assume.
2. Two items are selected from a production line and each is inspected to determine whether
it is defective. D = defective and N = non-defective.
𝑆 = {(𝐷, 𝐷), (𝐷, 𝑁), (𝑁, 𝐷), (𝑁, 𝑁)}
Let X = number of defective items, then
𝑿 ∈ {𝟎, 𝟏, 𝟐}
Note:
• The probability distribution of a discrete random variable consists of the values that it
can assume, together with the probability of each of those values.
Introductory Example
Suppose we have an unfair, biased coin, for which P(Heads) = 0.4, and therefore P(Tails) =
0.6. Our experiment consists of tossing this coin twice, and we will define the random
variable X = the number of heads obtained.
Writing H for Heads and T for Tails, the sample space for this experiment is:
𝑺 = {𝑯𝑯; 𝑯𝑻; 𝑻𝑯; 𝑻𝑻}
Therefore, the possible values that X can take are 0, 1, or 2:
Outcome   Nr of Heads   Probability
HH        2             0.4 × 0.4 = 0.16
HT        1             0.4 × 0.6 = 0.24
TH        1             0.6 × 0.4 = 0.24
TT        0             0.6 × 0.6 = 0.36
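The table above can be collapsed into the pmf of X by adding the probabilities of all outcomes with the same number of heads. The following is a minimal Python sketch of that step; the variable names are illustrative and not part of the notes.

```python
from itertools import product

p_heads = 0.4  # P(Heads) for the biased coin in the example
prob = {"H": p_heads, "T": 1 - p_heads}

# Accumulate P(X = x) over the four outcomes HH, HT, TH, TT.
pmf = {}
for outcome in product("HT", repeat=2):
    x = outcome.count("H")                     # value of X for this outcome
    p = prob[outcome[0]] * prob[outcome[1]]    # the two tosses are independent
    pmf[x] = pmf.get(x, 0.0) + p

for x in sorted(pmf):
    print(x, round(pmf[x], 2))
```

The two outcomes HT and TH both give X = 1, which is why P(X = 1) = 0.24 + 0.24 = 0.48.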
Solution:
$P(0 < X \le 2) = P(X = 1) + P(X = 2) = \frac{3}{8} + \frac{3}{8} = \frac{3}{4}$
Solution:
$P(X = x) = \binom{3}{x} \left(\frac{1}{2}\right)^{3}$
Solution:
Example 4.1c
Four balls are to be randomly selected, without replacement, from an urn that contains 20
balls numbered 1 through 20. If X is the largest numbered ball selected, then X is a random
variable. Which values can X take, and with what probabilities?
Solution:
X takes on one of the values 4, 5, …, 20. Because each of the $\binom{20}{4}$ possible selections
of 4 of the 20 balls is equally likely, the probability that X takes on each of its possible
values is:
$P(X = i) = \frac{\binom{i-1}{3}}{\binom{20}{4}}, \quad i = 4, 5, \dots, 20$
This is so because the number of selections that result in X = i is the number of selections
that result in ball numbered i and three of the balls numbered 1 through i – 1 being selected.
As there are $\binom{1}{1}\binom{i-1}{3}$ such selections, the preceding equation follows.
In this equation i is the largest numbered ball chosen, with 3 balls chosen that are numbered
less than i and no balls with a number larger than i were selected.
$\sum_{i=4}^{20} P(X = i) = 1$
Suppose now that we want to determine P(X > 10). One way, of course, is to just use the
preceding to obtain:
$P(X > 10) = P(X = 11) + P(X = 12) + \dots + P(X = 19) + P(X = 20) = \sum_{i=11}^{20} P(X = i)$
$\sum_{i=11}^{20} P(X = i) = \sum_{i=11}^{20} \frac{\binom{i-1}{3}}{\binom{20}{4}} = \frac{\binom{10}{3} + \binom{11}{3} + \dots + \binom{19}{3}}{\binom{20}{4}}$
However, a more direct approach for determining P(X > 10) would be to use:
$P(X > 10) = 1 - P(X \le 10)$
$= 1 - \sum_{i=4}^{10} \frac{\binom{i-1}{3}}{\binom{20}{4}}$
$= 1 - \frac{\binom{10}{4}}{\binom{20}{4}}$ (last step: $\binom{n}{r} = \binom{n-1}{r-1} + \binom{n-1}{r}$)
where the preceding results because X will be less than or equal to 10 when the 4 balls chosen
are among balls numbered 1 through 10.
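Both routes to P(X > 10) in Example 4.1c can be checked with exact arithmetic. The sketch below uses Python's `math.comb` and `fractions.Fraction`; the variable names are chosen here for illustration only.

```python
from fractions import Fraction
from math import comb

total = comb(20, 4)  # number of equally likely selections of 4 balls

# pmf of the largest ball number: P(X = i) = C(i-1, 3) / C(20, 4)
pmf = {i: Fraction(comb(i - 1, 3), total) for i in range(4, 21)}

# Route 1: add the pmf over i = 11, ..., 20.
p_direct = sum(pmf[i] for i in range(11, 21))

# Route 2: complement of "all four balls are among 1..10".
p_complement = 1 - Fraction(comb(10, 4), total)

print(sum(pmf.values()))          # the pmf sums to 1
print(p_direct, p_complement)     # the two routes agree
print(float(p_direct))
```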
Example 4.1d
Independent trials consisting of the flipping of a coin having probability p of coming up heads
are continually performed until either a head occurs or a total of n flips is made. If we let X
denote the number of times the coin is flipped, then X is a random variable taking on one of
the values 1, 2, 3, …, n with respective probabilities.
Solution:
$P(X = 1) = P\{(H)\} = p$
$P(X = 2) = P\{(T, H)\} = (1 - p)p$
$P(X = 3) = P\{(T, T, H)\} = (1 - p)^2 p$
…
$P(X = n - 1) = P\{(T, T, \dots, T, H)\} = (1 - p)^{n-2} p$
$P(X = n) = P\{(T, T, \dots, T, T), (T, T, \dots, T, H)\} = (1 - p)^{n-1}$
Note that:
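$\sum_{i=1}^{n} P(X = i) = \sum_{i=1}^{n-1} p(1 - p)^{i-1} + (1 - p)^{n-1}$
$= p\left[\frac{1 - (1 - p)^{n-1}}{1 - (1 - p)}\right] + (1 - p)^{n-1}$
$= 1 - (1 - p)^{n-1} + (1 - p)^{n-1}$
$= 1$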
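The check that the probabilities in Example 4.1d add up to 1 can also be done numerically. This is a small sketch; the function name `flips_pmf` and the values n = 6, p = 0.3 are assumptions chosen only for illustration.

```python
def flips_pmf(n, p):
    """pmf of the number of flips until the first head or until n flips are made."""
    # For k < n the sequence must be k-1 tails followed by a head.
    pmf = {k: (1 - p) ** (k - 1) * p for k in range(1, n)}
    # X = n whenever the first n-1 flips are all tails, whatever the last flip shows.
    pmf[n] = (1 - p) ** (n - 1)
    return pmf

pmf = flips_pmf(6, 0.3)
total = sum(pmf.values())
print(pmf)
print(total)
```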
Compute:
a) P{X < 3}
Solution:
$P\{X < 3\} = \lim_{n \to \infty} P\left\{X \le 3 - \frac{1}{n}\right\} = \lim_{n \to \infty} F\left(3 - \frac{1}{n}\right) = \frac{11}{12}$
b) P{X = 1}
Solution:
$P\{X = 1\} = P\{X \le 1\} - P\{X < 1\}$
$= F(1) - \lim_{n \to \infty} F\left(1 - \frac{1}{n}\right)$
$= \frac{2}{3} - \frac{1}{2} = \frac{1}{6}$
c) P{X > 0.5}
Solution:
$P\{X > 0.5\} = 1 - P\{X \le 0.5\}$
$= 1 - F(0.5) = \frac{3}{4}$
d) P{2 < X ≤ 4}
Solution:
$P\{2 < X \le 4\} = F(4) - F(2)$
$= 1 - \frac{11}{12} = \frac{1}{12}$
Discrete Random Variables
Probability Mass Function of a Random Variable
Definition
The probability mass function (pmf) of a discrete random variable X is defined as:
𝑝(𝑎) = 𝑃{𝑋 = 𝑎}, 𝑓𝑜𝑟 𝑒𝑣𝑒𝑟𝑦 𝑣𝑎𝑙𝑢𝑒 𝒂
We also have:
$\sum_{i=1}^{\infty} p(x_i) = 1$
Now consider the probability mass function of the random variable representing the sum
when two dice are rolled:
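As an illustration, the pmf of the sum of two fair dice can be tabulated by counting the 36 equally likely outcomes. The sketch below is one way to do this in Python; it assumes fair, independent dice, and the names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die1, die2) pairs give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# p(s) = (number of pairs with sum s) / 36
pmf = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

for s, p in pmf.items():
    print(s, p)
```

The probabilities rise from 1/36 at a sum of 2 to 6/36 at a sum of 7 and fall back to 1/36 at a sum of 12, and they add up to 1 as required of a pmf.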
Textbook Examples
Example 4.2a
The probability mass function of a random variable X is given by $p(i) = \frac{c\lambda^i}{i!}, \; i = 0, 1, 2, \dots$
where λ is some positive value. Find:
a) P{X = 0}
Solution:
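Since $\sum_{i=0}^{\infty} p(i) = 1$, we have:
$c \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} = 1$
which, because $e^x = \sum_{i=0}^{\infty} \frac{x^i}{i!}$, implies that
$c e^{\lambda} = 1$ or $c = e^{-\lambda}$
Hence:
$P\{X = 0\} = \frac{e^{-\lambda}\lambda^0}{0!} = e^{-\lambda}$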
b) P{X > 2}
Solution:
$P\{X > 2\} = 1 - P\{X \le 2\}$
$= 1 - P\{X = 0\} - P\{X = 1\} - P\{X = 2\}$
$= 1 - e^{-\lambda} - \lambda e^{-\lambda} - \frac{\lambda^2 e^{-\lambda}}{2}$
The cumulative distribution function F of a discrete random variable X can also be expressed
in terms of the probability mass function:
$F(a) = \sum_{x \le a} p(x)$
Solution:
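Since $\sum_{x=1}^{\infty} p(x) = 1$, we have:
$c \sum_{x=1}^{\infty} \left(\frac{1}{3}\right)^{x-1} = 1$
$\Rightarrow c\left[1 + \frac{1}{3} + \frac{1}{9} + \dots\right] = 1$
$\Rightarrow c\left[\frac{1}{1 - \frac{1}{3}}\right] = 1$
$\Rightarrow c = \frac{2}{3}$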
b) P{X = 1}
Solution:
$P(X = 1) = p(1) = \frac{2}{3}\left(\frac{1}{3}\right)^{1-1} = \frac{2}{3}$
c) P{X > 2}
Solution:
$P(X > 2) = p(3) + p(4) + \dots$
$= 1 - P(X \le 2)$
$= 1 - p(1) - p(2)$
$= 1 - \frac{2}{3} - \frac{2}{9}$
$= \frac{1}{9}$
Example 4.2
The cumulative distribution function of a discrete random variable X is given by:
$F(x) = \begin{cases} 0, & \text{for } x < 0 \\ \frac{1}{4}, & \text{for } 0 \le x < 2 \\ \frac{2}{3}, & \text{for } 2 \le x < 3 \\ 1, & \text{for } x \ge 3 \end{cases}$
Determine:
a) The pmf of X
Solution:
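$p(0) = \frac{1}{4}$
$p(2) = \frac{2}{3} - \frac{1}{4} = \frac{5}{12}$
$p(3) = 1 - \frac{2}{3} = \frac{1}{3}$
Therefore:
$p(x) = \begin{cases} \frac{1}{4}, & \text{for } x = 0 \\ \frac{5}{12}, & \text{for } x = 2 \\ \frac{1}{3}, & \text{for } x = 3 \\ 0, & \text{otherwise} \end{cases}$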
b) P{0 < X ≤ 2}
Solution:
$P(0 < X \le 2) = F(2) - F(0)$
$= \frac{2}{3} - \frac{1}{4} = \frac{5}{12}$
c) P{2 < X ≤ 3}
Solution:
$P(2 < X \le 3) = F(3) - F(2)$
$= 1 - \frac{2}{3} = \frac{1}{3}$
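The step from a discrete CDF to its pmf (take the size of each jump) can be sketched in Python with exact fractions. The list `jumps` below encodes the CDF of this example; the helper name `cdf` is an assumption made for illustration.

```python
from fractions import Fraction

# (jump point, value of F from that point onward) for the CDF in this example
jumps = [(0, Fraction(1, 4)), (2, Fraction(2, 3)), (3, Fraction(1))]

def cdf(x):
    """Right-continuous step function F(x) = P(X <= x)."""
    value = Fraction(0)
    for point, level in jumps:
        if x >= point:
            value = level
    return value

# The pmf at each jump point is the size of the jump of F there.
pmf = {}
previous = Fraction(0)
for point, level in jumps:
    pmf[point] = level - previous
    previous = level

print(pmf)
print(cdf(2) - cdf(0))   # P(0 < X <= 2)
print(cdf(3) - cdf(2))   # P(2 < X <= 3)
```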
Expected Value
The expectation of a random variable and its variance allow us to describe a random variable
in terms of its location and its spread.
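The expected value of a random variable is a weighted average of its possible values. Under
the relative frequency interpretation of probability, it is the long-run average value of the
random variable over many independent repetitions of the experiment.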
The variance is a measure of how widely the values are dispersed around this central value.
Expected Value
Definition
The expected value of a discrete random variable X with pmf p(x) is defined as:
$E(X) = \sum_{x: p(x) > 0} x \, p(x)$
In words, the expected value of X is a weighted average of the possible values that X can
take on, each value being weighted by the probability that X assumes it.
Example 1
Pmf of X is given by:
$p(0) = \frac{1}{2} = p(1)$
Then
$E[X] = 0\left(\frac{1}{2}\right) + 1\left(\frac{1}{2}\right) = \frac{1}{2}$
is just the ordinary average of the 2 possible values, 0 and 1, that X can assume.
Example 2
Pmf of X is given by:
$p(0) = \frac{1}{3}, \quad p(1) = \frac{2}{3}$
Then
$E[X] = 0\left(\frac{1}{3}\right) + 1\left(\frac{2}{3}\right) = \frac{2}{3}$
is a weighted average of the 2 possible values, 0 and 1, where the value 1 is given twice as
much weight as the value 0, since $p(1) = 2p(0)$.
Example 3
Data from example 4.1a
x    P(X = x)
0    1/8
1    3/8
2    3/8
3    1/8
$E(X) = \sum_{x: p(x) > 0} x \, p(x)$
$= 0\left(\frac{1}{8}\right) + 1\left(\frac{3}{8}\right) + 2\left(\frac{3}{8}\right) + 3\left(\frac{1}{8}\right)$
$= \frac{3}{2} = 1.5$
∑ 𝑥𝑖 . 𝑝(𝑥𝑖 ) = 𝐸[𝑋]
𝑖=1
Textbook examples
Example 4.3a
Find E(X), where X is the outcome when we roll a fair die.
Solution:
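Since $p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = \frac{1}{6}$, we obtain:
$E(X) = 1\left(\frac{1}{6}\right) + 2\left(\frac{1}{6}\right) + 3\left(\frac{1}{6}\right) + 4\left(\frac{1}{6}\right) + 5\left(\frac{1}{6}\right) + 6\left(\frac{1}{6}\right) = \frac{7}{2}$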
Example 4.3b
We say that I is an indicator variable for the event A if:
$I = \begin{cases} 1, & \text{if } A \text{ occurs} \\ 0, & \text{if } A^c \text{ occurs} \end{cases}$
Find E(I).
Solution:
Since $p(1) = P(A)$ and $p(0) = 1 - P(A)$, we have:
$E(I) = 1 \cdot P(A) + 0 \cdot (1 - P(A)) = P(A)$
Example 4.3d
A school class of 120 students is driven in 3 buses to a symphonic performance. There are 36
students in one of the buses, 40 in another, and 44 in the third bus. When the buses arrive,
one of the 120 students is randomly chosen. Let X denote the number of students on the bus
of that randomly chosen student, and find E(X).
Solution:
Since a randomly chosen student is equally likely to be any of the 120 students, it follows that
$P\{X = 36\} = \frac{36}{120} = \frac{3}{10}$,
$P\{X = 40\} = \frac{40}{120} = \frac{1}{3}$,
$P\{X = 44\} = \frac{44}{120} = \frac{11}{30}$
Hence,
$E(X) = 36\left(\frac{3}{10}\right) + 40\left(\frac{1}{3}\right) + 44\left(\frac{11}{30}\right) = \frac{1208}{30} = 40.2667$
• The average number of students on a bus is 40 (120/3). However, the more students there
are on a bus, the more likely it is that a randomly chosen student was on that bus. As a
result, the buses with more students are given more weight than those with fewer, and E(X)
exceeds 40.
Example 4.3d (adapted)
Consider three containers, containing 36, 40 and 44 items respectively. One item is selected
randomly. Let X = the number of items in the container from which the chosen item comes.
Calculate E(X). Now define Y = the number of items in a randomly selected container. What
is E(Y)?
Solution:
Probability distribution of X:
x     P(X = x)
36    36/120
40    40/120
44    44/120
Therefore:
$E(X) = 36\left(\frac{3}{10}\right) + 40\left(\frac{1}{3}\right) + 44\left(\frac{11}{30}\right) = \frac{1208}{30} = 40.2667$
Probability distribution of Y:
y     P(Y = y)
36    1/3
40    1/3
44    1/3
Therefore:
$E(Y) = 36\left(\frac{1}{3}\right) + 40\left(\frac{1}{3}\right) + 44\left(\frac{1}{3}\right) = 40$
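The difference between E(X) (container of a randomly chosen item) and E(Y) (randomly chosen container) can be reproduced with exact fractions. This is a minimal sketch; the variable names are illustrative.

```python
from fractions import Fraction

sizes = [36, 40, 44]      # items per container
total = sum(sizes)        # 120 items in all

# X: size of the container of a randomly chosen ITEM, so P(X = s) = s / 120.
e_x = sum(Fraction(s, total) * s for s in sizes)

# Y: size of a randomly chosen CONTAINER, so P(Y = s) = 1 / 3.
e_y = sum(Fraction(1, len(sizes)) * s for s in sizes)

print(e_x, float(e_x))
print(e_y)
```

Choosing an item first weights each container by its size, which is why E(X) comes out slightly above E(Y) = 40.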
Expectation of a Function of a Random Variable
If you want to compute the expected value of some function of X (discrete random variable),
first compute g(X) and then compute E[g(X)] using the definition of expected value.
Textbook Examples
Example 4.4a
Let X denote a random variable that takes on any of the values -1, 0, and 1 with respective
probabilities:
𝑃{𝑋 = −1} = 0.2, 𝑃{𝑋 = 0} = 0.5, 𝑃{𝑋 = 1} = 0.3
Compute $E(X^2)$.
Solution:
Let $Y = X^2$. Then the probability mass function of Y is given by:
$P\{Y = 1\} = P\{X = -1\} + P\{X = 1\} = 0.2 + 0.3 = 0.5$
$P\{Y = 0\} = P\{X = 0\} = 0.5$
Hence,
$E(X^2) = E(Y) = 1(0.5) + 0(0.5) = 0.5$
Note:
• $0.5 = E[X^2] \ne (E[X])^2 = (0.1)^2 = 0.01$
Proposition 4.1
If X is a discrete random variable that takes on one of the values $x_i$, i ≥ 1, with respective
probabilities $p(x_i)$, then, for any real-valued function g,
$E[g(X)] = \sum_i g(x_i) \, p(x_i)$
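Proposition 4.1 says that E[g(X)] can be computed directly from the pmf of X, without first deriving the pmf of g(X). The sketch below applies this to the data of Example 4.4a; the helper name `expectation` is an assumption made for illustration.

```python
pmf_x = {-1: 0.2, 0: 0.5, 1: 0.3}   # pmf from Example 4.4a

def expectation(pmf, g=lambda x: x):
    """E[g(X)] as the sum of g(x) * p(x) over the support of X."""
    return sum(g(x) * p for x, p in pmf.items())

e_x = expectation(pmf_x)                      # E[X]
e_x2 = expectation(pmf_x, lambda x: x ** 2)   # E[X^2]

print(e_x, e_x2, e_x ** 2)
```

The result for E[X^2] matches the value obtained in the example through the pmf of Y = X^2, and it differs from (E[X])^2.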
Corollary 4.1
If a and b are constants then:
𝐸[𝑎𝑋 + 𝑏] = 𝑎𝐸[𝑋] + 𝑏
Proof
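$E[aX + b] = \sum_{x: p(x) > 0} (ax + b) \, p(x)$
$= a \sum_{x: p(x) > 0} x \, p(x) + b \sum_{x: p(x) > 0} p(x)$
$= aE[X] + b$
The nth moment of X is defined as:
$\mu_n = E(X^n) = \sum_{x: p(x) > 0} x^n \, p(x)$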
Note:
• The expected value of X = the 1st non-central moment of X.
• The expected value of a random variable is referred to as the mean of the random variable
and is often written as μ.
Variance
Distribution of a Random Variable
Properties of distribution that should be attended to:
• Location
o Around what central point are the values of X distributed.
o Measure of location = expected value E(X), of the random variable X.
• Spread
o How widely are the x-values dispersed around a central value.
o Measure of spread = standard deviation and variance
The expected value therefore describes the location of a distribution but says nothing about its spread.
Example
𝒙 𝑷(𝑿 = 𝒙)
−4 0.2
2 0.6
8 0.2
We expect a random variable to take on values around its mean, and therefore it makes
sense to look at how far X is from its mean, on average.
Definition: Variance
If X is a random variable with mean μ, then the variance of X, denoted by Var(X) is defined
by:
𝑽𝒂𝒓(𝑿) = 𝑬[(𝑿 − 𝝁)𝟐 ]
We generally choose to consider the squared difference; that is, 𝐸[(𝑋 − 𝜇)2 ].
In the above example therefore, we can calculate:
$E[(X - \mu)^2] = (-4 - 2)^2(0.2) + (2 - 2)^2(0.6) + (8 - 2)^2(0.2) = 14.4 = Var(X)$
$E[(Y - \mu)^2] = (-10 - 2)^2(0.2) + (0)^2(0.6) + (14 - 2)^2(0.2) = 57.6 = Var(Y)$
Since Var(Y) > Var(X), it indicates that Y is more “spread out” than X.
Definition: Standard Deviation
The standard deviation of a random variable X is the square root of the variance. It is
denoted by:
𝑺𝑫(𝑿) = √𝑽𝒂𝒓(𝑿)
An alternative formula for Var(X) is given by:
𝑽𝒂𝒓(𝑿) = 𝑬[𝑿𝟐 ] − (𝑬[𝑿])𝟐
Derivation of this formula:
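$Var(X) = E[(X - \mu)^2]$ (by definition)
$= \sum_x (x - \mu)^2 \, p(x)$ (Proposition 4.1)
$= \sum_x (x^2 - 2\mu x + \mu^2) \, p(x)$
$= \sum_x x^2 p(x) - 2\mu \sum_x x \, p(x) + \mu^2 \sum_x p(x)$
$= E(X^2) - 2\mu \cdot \mu + \mu^2$
$= E(X^2) - 2\mu^2 + \mu^2$
$= E(X^2) - \mu^2$
$= E(X^2) - [E(X)]^2$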
Identity
𝑽𝒂𝒓(𝒂𝑿 + 𝒃) = 𝒂𝟐 𝑽𝒂𝒓(𝑿), for constants a and b and a random variable X.
E[X + Y] = E[X] + E[Y], for any random variables X and Y (independence is not required)
Var[X + Y] = Var[X] + Var[Y], if X and Y are independent
Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y), if independence does not hold
Proof
$Var(aX + b) = Var(aX) + Var(b)$
$= a^2 Var(X) + 0$
Note:
• Var[c] = 0 for any constant c
• E[c] = c for any constant c
• 𝐸[𝑎𝑋 + 𝑏] = 𝐸[𝑎𝑋] + 𝐸[𝑏] = 𝑎𝐸[𝑋] + 𝑏
Textbook Examples
Example 4.5a
Calculate Var(X) if X represents the outcome when a fair die is rolled.
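x    P(X = x)
1    1/6
2    1/6
3    1/6
4    1/6
5    1/6
6    1/6
$E[X] = 1\left(\frac{1}{6}\right) + 2\left(\frac{1}{6}\right) + 3\left(\frac{1}{6}\right) + 4\left(\frac{1}{6}\right) + 5\left(\frac{1}{6}\right) + 6\left(\frac{1}{6}\right) = \frac{7}{2}$
$E[X^2] = 1^2\left(\frac{1}{6}\right) + 2^2\left(\frac{1}{6}\right) + 3^2\left(\frac{1}{6}\right) + 4^2\left(\frac{1}{6}\right) + 5^2\left(\frac{1}{6}\right) + 6^2\left(\frac{1}{6}\right) = \frac{91}{6}$
$Var(X) = E[X^2] - (E[X])^2$
$= \frac{91}{6} - \left(\frac{7}{2}\right)^2 = \frac{35}{12}$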
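The two variance formulas, the definition E[(X − μ)²] and the shortcut E[X²] − (E[X])², can be compared on the fair-die example. The following is a short sketch using exact fractions; the names are illustrative.

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die

mean = sum(x * p for x, p in pmf.items())               # E[X]
second_moment = sum(x ** 2 * p for x, p in pmf.items()) # E[X^2]

var_shortcut = second_moment - mean ** 2                          # E[X^2] - (E[X])^2
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items()) # E[(X - mu)^2]

print(mean, second_moment, var_shortcut, var_definition)
```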
Additional Problem
Example 1
A salesman has scheduled two appointments to sell encyclopedias. His first appointment will
lead to a sale with probability 0.3, and his second will lead independently to a sale with
probability 0.6. Any sale made is equally likely to be either for the deluxe model, which costs
$1000, or the standard model, which costs $500. Determine the probability mass function
of X, the total dollar value of all sales.
Solution:
Appointment   Sale             Value ($)   Probability
1             No sale          0           0.7
1             Standard model   500         0.3(0.5) = 0.15
1             Deluxe model     1 000       0.3(0.5) = 0.15
2             No sale          0           0.4
2             Standard model   500         0.6(0.5) = 0.3
2             Deluxe model     1 000       0.6(0.5) = 0.3
The distribution of X is therefore as follows:
𝑃(𝑋 = 0) = (0.7)(0.4) = 0.28
𝑃(𝑋 = 500) = (0.15)(0.4) + (0.7)(0.3) = 0.27
𝑃(𝑋 = 1 000) = (0.15)(0.4) + (0.7)(0.3) + (0.15)(0.3) = 0.315
𝑃(𝑋 = 1 500) = (0.15)(0.3) + (0.15)(0.3) = 0.09
𝑃(𝑋 = 2 000) = (0.15)(0.3) = 0.045
$p(x) = \begin{cases} 0.280, & x = 0 \\ 0.270, & x = 500 \\ 0.315, & x = 1\,000 \\ 0.090, & x = 1\,500 \\ 0.045, & x = 2\,000 \end{cases}$
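The pmf of the total sales value can be cross-checked by enumerating every combination of the two appointments. The sketch below assumes, as in the example, that the appointments are independent; the helper name `appointment` is hypothetical.

```python
from itertools import product

def appointment(p_sale):
    """Value distribution of one appointment: no sale, standard ($500), or deluxe ($1000)."""
    return {0: 1 - p_sale, 500: p_sale / 2, 1000: p_sale / 2}

first, second = appointment(0.3), appointment(0.6)

# Add the probabilities of all (first, second) combinations with the same total value.
pmf = {}
for (v1, p1), (v2, p2) in product(first.items(), second.items()):
    pmf[v1 + v2] = pmf.get(v1 + v2, 0.0) + p1 * p2

for value in sorted(pmf):
    print(value, round(pmf[value], 3))
```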
Example 2
Four independent flips of a fair coin are made. Let X denote the number of heads obtained.
Plot the probability mass function of the random variable X – 2.
Solution:
Sample space (16 equally likely outcomes) = {(T,T,T,T), (H,T,T,T), (T,H,T,T), (T,T,H,T),
(T,T,T,H), (H,H,T,T), (H,T,H,T), (H,T,T,H), (T,H,H,T), (T,H,T,H), (T,T,H,H), (H,H,H,T),
(H,H,T,H), (H,T,H,H), (T,H,H,H), (H,H,H,H)}
X    Y = X − 2    P(Y = y)
0    −2           1/16
1    −1           4/16
2    0            6/16
3    1            4/16
4    2            1/16
Example 3
Two coins are to be flipped. The first coin will land on heads with probability 0.6, the second
with probability 0.7. Assume that the results of the flips are independent, and let X equal the
total number of heads that result.
a) Find P(X = 1)
Solution:
$P(X = 1) = P[(H, T)] + P[(T, H)]$
$= (0.6)(0.3) + (0.4)(0.7) = 0.46$
b) Determine E(X)
Solution:
$P(X = 0) = P[(T, T)] = (0.4)(0.3) = 0.12$
$P(X = 1) = P[(H, T)] + P[(T, H)] = 0.46$
$P(X = 2) = P[(H, H)] = (0.6)(0.7) = 0.42$
$E(X) = 0(0.12) + 1(0.46) + 2(0.42) = 1.3$
Example 4
A sample of 3 items is selected at random from a box containing 20 items of which 4 are
defective. Find the expected number of defective items in the sample.
Solution:
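Let X = number of defective items in the sample
The distribution of X is as follows:
x    Calculation                                    p(x)
0    $\binom{4}{0}\binom{16}{3} / \binom{20}{3}$    560/1 140 = 0.491
1    $\binom{4}{1}\binom{16}{2} / \binom{20}{3}$    480/1 140 = 0.421
2    $\binom{4}{2}\binom{16}{1} / \binom{20}{3}$    96/1 140 = 0.084
3    $\binom{4}{3}\binom{16}{0} / \binom{20}{3}$    4/1 140 = 0.0035
Hence $E(X) = \frac{0(560) + 1(480) + 2(96) + 3(4)}{1\,140} = \frac{684}{1\,140} = 0.6$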
Solution:
Let X = amount you win
The distribution of X is as follows:
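Outcome   Gain = x   Calculation                                                                    Probability
RR/BB     1.10       $\left[\binom{5}{2}\binom{5}{0} + \binom{5}{0}\binom{5}{2}\right] / \binom{10}{2}$   4/9
RB/BR     −1.00      $\binom{5}{1}\binom{5}{1} / \binom{10}{2}$                                     5/9
$E(X) = (1.10)\left(\frac{4}{9}\right) + (-1.00)\left(\frac{5}{9}\right) = -0.0667$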
(b) The variance of the amount you win.
Solution:
Let X = amount you win
$E(X^2) = (1.10)^2\left(\frac{4}{9}\right) + (-1.00)^2\left(\frac{5}{9}\right) = 1.0933$
$Var(X) = E(X^2) - [E(X)]^2$
$= 1.0933 - (-0.0667)^2$
$= 1.0889$
Note:
• If X has distribution function $F_X$, the distribution function of $e^X$ is obtained as follows.
• Let $Y = e^X$. Then, for y > 0:
$F_X(x) = P(X \le x)$
$F_Y(y) = P(Y \le y)$
$= P(e^X \le y)$
$= P(X \le \ln y)$
$= F_X(\ln y)$
Distributions
We will consider 5 specific discrete probability distributions
In each case we will especially consider the following aspects:
(i) Under what circumstances, i.e. in what situations, can the distribution be used?
(ii) The pmf of the distribution
(iii) The parameters of the distribution, and interpretation of them in a practical situation
(iv) What relationships are there between the distributions?
(v) What are the expected value and the variance of the distribution?
(vi) Special properties of the distribution
If these requirements are met, the random variable X is defined as X = the number of
successes in the n repetitions. The probability distribution of X is a binomial distribution with
parameters n and p. This is denoted by 𝑿~𝒃𝒊𝒏𝒐𝒎(𝒏, 𝒑)
If n = 1 we call the binomial distribution a Bernoulli distribution.
The pmf of the binomial distribution is given by:
$p(x) = \binom{n}{x} p^x (1 - p)^{n-x}, \quad x = 0, 1, 2, \dots, n$
Note:
$\sum_{x=0}^{n} \binom{n}{x} p^x (1 - p)^{n-x} = 1$
$= 1 - \left[\binom{10}{0}(0.01)^0(0.99)^{10} + \binom{10}{1}(0.01)^1(0.99)^9\right]$
$= 1 - [0.9044 + 0.0914] = 0.0042$
Example 4.6b extension
Suppose someone buys 3 packages of screws. What is the probability that exactly 1 package
is brought back for replacement?
Solution:
In this case, we can see the purchase of 3 packages as a (different) binomial experiment.
If we define Y = the number of packages returned (out of the 3 purchased), then Y follows a
binomial distribution with n = 3. Success in this case is seen as the return of a package, which
means that P(success) = P(package is returned/replaced) = 0.0042.
Therefore, 𝑌 ~ 𝑏𝑖𝑛𝑜𝑚(3; 0.0042).
Thus:
$P(Y = 1) = \binom{3}{1}(0.0042)^1(1 - 0.0042)^{3-1} = 0.0125$
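The two-stage calculation above (a binomial for screws within a package, then a binomial for packages) can be sketched in Python. The helper `binom_pmf` is defined here for illustration. Note that carrying the unrounded return probability through gives a slightly different final figure than the rounded value 0.0042 used in the notes.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ binom(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Stage 1: a package of 10 screws (each defective with probability 0.01)
# is returned if it contains more than 1 defective screw.
p_return = 1 - binom_pmf(0, 10, 0.01) - binom_pmf(1, 10, 0.01)

# Stage 2: exactly 1 of 3 packages is returned.
p_one_rounded = binom_pmf(1, 3, 0.0042)   # with the rounded value from the notes
p_one_exact = binom_pmf(1, 3, p_return)   # with the unrounded value

print(round(p_return, 6))
print(round(p_one_rounded, 4), round(p_one_exact, 4))
```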
Additional Examples
Example 4.1
On a multiple-choice exam with 5 possible answers for each of the 15 questions on the exam,
what is the probability that a student will pass the exam if he/she randomly guesses the
answers?
Let X = number of correct guesses
Then 𝑋 ~ 𝑏𝑖𝑛𝑜𝑚(𝑛 = 15; 𝑝 = 0.2)
The pmf of X is given by:
$p(x) = \binom{15}{x}(0.2)^x(0.8)^{15-x}, \quad x = 0, 1, 2, \dots, 15$
To pass the exam, they need to get at least 50% which implies at least 8 correct answers.
Therefore:
𝑃(𝑝𝑎𝑠𝑠𝑒𝑠) = 𝑃(𝑋 ≥ 8)
= 𝑃(𝑋 = 8) + 𝑃(𝑋 = 9) + 𝑃(𝑋 = 10) + ⋯ + 𝑃(𝑋 = 13) + 𝑃(𝑋 = 14) + 𝑃(𝑋 = 15)
Since that would require a lot of tedious work, we make use of tables for the binomial
distribution, which allow us to look up the probabilities we require. In these tables we
make use of n (number of repetitions), a (number of successes) and π (probability of success).
In this example, n = 15 and π = 0.2, and we thus look in the part of the table for n = 15 and the column π = 0.2.
The table gives cumulative probabilities. In the row where a = 7 we find P(X ≤ 7).
Therefore, for X ~ binom(n = 15; p = 0.2), P(X ≤ 7) = 0.9958 and thus:
𝑃(𝑋 ≥ 8) = 1 − 𝑃(𝑋 ≤ 7)
= 1 − 0.9958
= 𝟎. 𝟎𝟎𝟒𝟐
Example 4.2
Suppose we have a biased coin, for which the probability of obtaining heads is 0.4. Suppose
this coin is tossed 20 times. Let X = number of heads in the 20 tosses of the coin.
This is a binomial distribution, therefore: 𝑋 ~ 𝑏𝑖𝑛𝑜𝑚(𝑛 = 20; 𝑝 = 0.4)
Calculate:
(a) P(at most 6 heads)
𝑃(𝑋 ≤ 6) = 𝟎. 𝟐𝟓𝟎𝟎
This is taken from the binomial cumulative probability table.
(b) P(at least 9 heads)
$P(X \ge 9) = 1 - P(X \le 8)$
$= 1 - 0.5956$
$= 0.4044$
Using this formula we see that we can find 𝑃(𝑌 ≤ 11) with probability 0.4 (instead of 0.6).
From the table, this is the probability 0.9435 and the answer is therefore the same as sol 1.
Let:
𝑦 =𝑥−1⇒𝑥 =𝑦+1
Therefore:
$E(X^k) = np \sum_{y=0}^{n-1} (y + 1)^{k-1} \binom{n-1}{y} p^y (1 - p)^{n-1-y} = np \, E[(Y + 1)^{k-1}]$, where Y ~ binomial(n − 1, p)
For k = 1:
$E(X) = np \, E[(Y + 1)^{1-1}]$
$= np \, E[(Y + 1)^0]$
$= np \, E(1)$
$= np$
For k = 2:
$E(X^2) = np \, E[(Y + 1)^{2-1}]$
$= np \, E[(Y + 1)^1]$
$= np \, [E(Y) + E(1)]$
$= np \, [(n - 1)p + 1]$ (since Y ~ binomial(n − 1, p))
$= n^2p^2 - np^2 + np$
Formally, suppose X = the number of events occurring per unit of time according to a Poisson
distribution. We denote this by 𝑿~𝑷𝒐(𝝀) where λ is the parameter of the distribution.
The pmf of X is given by:
$p(x) = P(X = x) = e^{-\lambda} \cdot \frac{\lambda^x}{x!}, \quad x = 0, 1, 2, \dots$
The interpretation of the parameter λ is the average number of events occurring per unit of
time (or space).
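The expected value follows from the pmf:
$E(X) = \sum_{x: p(x) > 0} x \, p(x)$
$= \sum_{x=0}^{\infty} x \left(e^{-\lambda} \cdot \frac{\lambda^x}{x!}\right)$
$= 0 \cdot e^{-\lambda} \cdot \frac{\lambda^0}{0!} + \sum_{x=1}^{\infty} x \left(e^{-\lambda} \cdot \frac{\lambda^x}{x!}\right)$
$= \sum_{x=1}^{\infty} x \left(e^{-\lambda} \cdot \frac{\lambda^x}{x!}\right)$
$= \sum_{x=1}^{\infty} \left(e^{-\lambda} \cdot \frac{\lambda^x \cdot x}{x(x - 1)!}\right)$
$= \sum_{x=1}^{\infty} \left(e^{-\lambda} \cdot \frac{\lambda^x}{(x - 1)!}\right)$
$= \sum_{x=1}^{\infty} \left(e^{-\lambda} \cdot \frac{\lambda \cdot \lambda^{x-1}}{(x - 1)!}\right)$
$= e^{-\lambda} \lambda \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x - 1)!}$
$= e^{-\lambda} \lambda \sum_{y=0}^{\infty} \frac{\lambda^y}{y!}$ (Let $y = x - 1$)
$= e^{-\lambda} \lambda e^{\lambda} = \lambda$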
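Similarly, for the second moment (again with $y = x - 1$):
$E(X^2) = \lambda \sum_{y=0}^{\infty} (y + 1) \frac{e^{-\lambda} \lambda^y}{y!}$
$= \lambda \left[\sum_{y=0}^{\infty} \frac{y \cdot e^{-\lambda} \lambda^y}{y!} + \sum_{y=0}^{\infty} \frac{1 \cdot e^{-\lambda} \lambda^y}{y!}\right]$
$= \lambda(\lambda + 1)$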
$p(x) = e^{-0.5} \cdot \frac{0.5^x}{x!}, \quad x = 0, 1, 2, \dots$
Thus to answer the question:
$P(X \ge 1) = 1 - P(X < 1)$
$= 1 - P(X = 0)$
$= 1 - e^{-0.5} \cdot \frac{0.5^0}{0!}$
$= 1 - e^{-0.5}$
$= 1 - 0.6065$
$= 0.3935$
(b) of finding a total of 4 typographical errors on 3 pages.
$p(x) = e^{-1.5} \cdot \frac{1.5^x}{x!}, \quad x = 0, 1, 2, \dots$
Thus to answer the question:
$P(X = 4) = e^{-1.5} \cdot \frac{1.5^4}{4!}$
$= 0.0471$
(c) of finding at least 1 error on 6 out of the next 10 pages.
In this case we can view every page as a “repetition” in a binomial distribution. From part
(a), we know that P(at least 1 error per page) = 0.3935
Let: 𝑌 = 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑝𝑎𝑔𝑒𝑠 (𝑜𝑢𝑡 𝑜𝑓 10) 𝑐𝑜𝑛𝑡𝑎𝑖𝑛𝑖𝑛𝑔 𝑎𝑡 𝑙𝑒𝑎𝑠𝑡 1 𝑒𝑟𝑟𝑜𝑟
Then: 𝒀~𝒃𝒊𝒏𝒐𝒎(𝒏 = 𝟏𝟎; 𝒑 = 𝟎. 𝟑𝟗𝟑𝟓)
Therefore the pmf of Y is:
$p(y) = \binom{10}{y}(0.3935)^y(1 - 0.3935)^{10-y}, \quad y = 0, 1, 2, \dots, 10$
Thus to answer the question:
$P(Y = 6) = \binom{10}{6}(0.3935)^6(1 - 0.3935)^{10-6}$
$= 0.1055$
Example 4.7b
Suppose that the probability that an item produced by a certain machine will be defective is
0.1. Find the probability that a sample of 10 items will contain at most 1 defective item.
Solution:
Binomial distribution
Let: 𝑋 = 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑑𝑒𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑖𝑡𝑒𝑚𝑠 𝑖𝑛 𝑎 𝑠𝑎𝑚𝑝𝑙𝑒 𝑜𝑓 10 𝑖𝑡𝑒𝑚𝑠
Then: 𝑿~𝑩𝒊𝒏𝒐𝒎(𝒏 = 𝟏𝟎, 𝒑 = 𝟎. 𝟏)
𝑃(𝑋 ≤ 1) = 𝑃(𝑋 = 0) + 𝑃(𝑋 = 1)
$= \binom{10}{0}(0.1)^0(0.9)^{10-0} + \binom{10}{1}(0.1)^1(0.9)^{10-1}$
$= 0.7361$
Poisson distribution
Let: X = number of defective items in a sample of 10 items (approximated by a Poisson random variable)
Then: 𝑿~𝑷𝒐(𝝀 = 𝟏)
Where: 𝝀 = 𝑡ℎ𝑒 𝑎𝑣𝑒𝑟𝑎𝑔𝑒 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑑𝑒𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑖𝑡𝑒𝑚𝑠 𝑝𝑒𝑟 𝑢𝑛𝑖𝑡 (𝑠𝑎𝑚𝑝𝑙𝑒 𝑜𝑓 10)
Therefore, the pmf of X is:
$p(x) = e^{-1} \cdot \frac{1^x}{x!}, \quad x = 0, 1, 2, \dots$
Thus to answer the question:
$P(X \le 1) = P(X = 0) + P(X = 1) = e^{-1} \cdot \frac{1^0}{0!} + e^{-1} \cdot \frac{1^1}{1!} = 0.7358$
Example 4.7c
Consider an experiment that consists of counting the number of α particles given off in a
1-second interval by 1 gram of radioactive material. If we know from past experience that on
the average, 3.2 such α particles are given off, what is a good approximation to the
probability that no more than 2 α particles will appear?
Solution:
If we think of the gram of radioactive material as consisting of a large number n of atoms,
each of which has probability 3.2/n of disintegrating and sending off an α particle during the
second considered, then we see that to a very close approximation, the number of particles
given off will be a Poisson random variable with parameter λ = 3.2. Hence, the desired
probability is:
𝑃(𝑋 ≤ 2) = 𝑃(𝑋 = 0) + 𝑃(𝑋 = 1) + 𝑃(𝑋 = 2)
$= e^{-3.2} + 3.2e^{-3.2} + \frac{(3.2)^2}{2} e^{-3.2} = 0.3799$
Additional Examples
Example 1
A certain defect occurs in 1% of the items manufactured in a factory. Consider 1 000 items
selected randomly from the factory production.
Let: 𝑋 = 𝑡ℎ𝑒 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑑𝑒𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑖𝑡𝑒𝑚𝑠 𝑖𝑛 𝑡ℎ𝑒 𝑠𝑎𝑚𝑝𝑙𝑒.
(a) According to the binomial distribution, what is the probability mass function of X?
$p(x) = \binom{1\,000}{x}(0.01)^x(0.99)^{1\,000-x}, \quad x = 0, 1, 2, \dots, 1\,000$
(b) Use the binomial probability mass function and calculate the probability of finding 8
defective items in the sample.
𝑃(𝑋 = 8) = 𝑝(8)
$= \binom{1\,000}{8}(0.01)^8(0.99)^{1\,000-8}$
$= 0.1128$
(c) According to the Poisson distribution, what is the probability mass function of X?
λ = np = (1 000)(0.01) = 𝟏𝟎
That is, n is large and p is small, so that np is of moderate size, and we can use the Poisson
distribution as an approximation to the binomial distribution. Therefore,
$p(x) = e^{-10} \cdot \frac{10^x}{x!}, \quad x = 0, 1, 2, \dots$
(d) Use the Poisson probability mass function and calculate the probability of finding 8
defective items in the sample.
𝑃(𝑋 = 8) = 𝑝(8)
$= e^{-10} \cdot \frac{10^8}{8!}$
$= 0.1126$
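The closeness of the binomial and Poisson answers in parts (b) and (d) can be reproduced directly. The following sketch uses only the Python standard library; the function names are illustrative.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """P(X = x) for X ~ binom(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam ** x / factorial(x)

n, p = 1000, 0.01
lam = n * p   # Poisson parameter used for the approximation

exact = binom_pmf(8, n, p)
approx = poisson_pmf(8, lam)
print(round(exact, 4), round(approx, 4))
```

With n = 1 000 and p = 0.01 the two values differ only in the fourth decimal place, which is the sense in which the Poisson distribution approximates the binomial here.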
Example 2
Suppose the number of claims at a short-term insurer can be described by a Poisson
distribution with average 5 per day.
Let: 𝑋 = 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑐𝑙𝑎𝑖𝑚𝑠 𝑝𝑒𝑟 𝑑𝑎𝑦
Then: 𝑋~𝑃𝑜(𝜆 = 5)
(a) Write down the probability mass function of the number of claims received per day.
$p(x) = e^{-5} \cdot \frac{5^x}{x!}, \quad x = 0, 1, 2, \dots$
(b) Determine the probability that at most 4 claims are received on any given day.
$P(X \le 4) = p(0) + p(1) + p(2) + p(3) + p(4) = e^{-5}\left(1 + 5 + \frac{25}{2} + \frac{125}{6} + \frac{625}{24}\right)$
$= 0.4405$
One can also use the cumulative probability tables to find these values:
(c) Determine the probability that at most 20 claims are received in a 4-day week.
(d) Determine the probability that at least 7 claims are received in a period of 3 days.
If these requirements are met, the random variable X is defined as X = the number of
repetitions required to obtain the first success. The probability distribution of X is a
geometric distribution with parameter p. This is denoted by 𝑿~𝒈𝒆𝒐𝒎𝒆𝒕𝒓𝒊𝒄(𝒑)
The pmf of X is given by:
$p(n) = P(X = n) = (1 - p)^{n-1} p, \quad n = 1, 2, 3, \dots$
Derivation of the pmf of a geometric distribution:
Solution:
Let: 𝑋 = 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑑𝑟𝑎𝑤𝑠 𝑛𝑒𝑒𝑑𝑒𝑑 𝑡𝑜 𝑠𝑒𝑙𝑒𝑐𝑡 𝑎 𝑏𝑙𝑎𝑐𝑘 𝑏𝑎𝑙𝑙
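Thus: $P(\text{success}) = P(\text{black ball}) = p = \frac{M}{M + N}$ and $1 - p = \frac{N}{M + N}$
$P(X = n) = (1 - p)^{n-1} \cdot p = \left(\frac{N}{M + N}\right)^{n-1} \cdot \left(\frac{M}{M + N}\right) = \frac{M N^{n-1}}{(M + N)^n}$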
b) At least k draws are needed?
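$P(X \ge k) = \left(\frac{M}{M + N}\right) \sum_{n=k}^{\infty} \left(\frac{N}{M + N}\right)^{n-1}$
$= \frac{\left(\frac{M}{M + N}\right)\left(\frac{N}{M + N}\right)^{k-1}}{1 - \frac{N}{M + N}}$
$= \left(\frac{N}{M + N}\right)^{k-1}$
$= (1 - p)^{k-1}$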
$P(X > n) = \sum_{x=n+1}^{\infty} p(1 - p)^{x-1}$
$= p \sum_{x=n+1}^{\infty} (1 - p)^{x-1}$
Example
Let’s suppose k = 2 and n = 3. Then: 𝑃(𝑋 = 5 | 𝑋 > 3) = 𝑃(𝑋 = 2)
X > 3 means that the first 3 trials were all failures. So, the required conditional probability is
the probability that the 5th trial is the first success, given that the first 3 trials were all failures.
However, due to the independence of the trials, we can actually see the 4th trial as a new
“starting point” (the 3 failures that have already occurred have no effect on the future
outcomes), and we therefore only have to calculate the probability that the 2nd trial is a
success.
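The claim P(X = 5 | X > 3) = P(X = 2) can be checked numerically for a geometric random variable. The sketch below uses the pmf (1 − p)^(n−1) p and the tail P(X > n) = (1 − p)^n; the value p = 0.4 is an assumption chosen only for illustration, since the example does not fix p.

```python
def geom_pmf(n, p):
    """P(X = n): first success on trial n."""
    return (1 - p) ** (n - 1) * p

def geom_tail(n, p):
    """P(X > n): the first n trials are all failures."""
    return (1 - p) ** n

p = 0.4  # assumed success probability, for illustration only

# Since {X = 5} is contained in {X > 3}, P(X = 5 | X > 3) = P(X = 5) / P(X > 3).
conditional = geom_pmf(5, p) / geom_tail(3, p)
unconditional = geom_pmf(2, p)

print(conditional, unconditional)
```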
Negative Binomial Distribution: 𝑿~𝒏𝒆𝒈𝒂𝒕𝒊𝒗𝒆 𝒃𝒊𝒏𝒐𝒎𝒊𝒂𝒍(𝒓, 𝒑)
When can the negative binomial distribution be used?
• We have an experiment with two possible outcomes, called success and failure. This
experiment is repeated under the same conditions.
• The repetitions (trials) are independent.
• P(success) = p remains constant throughout all the repetitions.
If these requirements are met, the random variable X is defined as X = the number of
repetitions required to obtain the rth success. The probability distribution of X is a negative
binomial distribution with parameters r & p. This is denoted by 𝑿~𝒏𝒆𝒈𝒂𝒕𝒊𝒗𝒆 𝒃𝒊𝒏𝒐𝒎(𝒓, 𝒑)
Note:
• The geometric distribution is a special case of negative binomial distribution (i.e. r = 1)
o Geometric distribution describes: number of trials for 1st success
o Negative binomial distribution describes: number of trials for rth (r ≥1) success
$\sum_{i=n+1}^{\infty} P(X = i) = \sum_{i=0}^{r-1} P(Y = i) \quad \therefore \quad \sum_{i=n+1}^{\infty} \binom{i-1}{r-1} p^r (1 - p)^{i-r} = \sum_{i=0}^{r-1} \binom{n}{i} p^i (1 - p)^{n-i}$
∴ 𝑃(3 𝑠𝑢𝑐𝑐𝑒𝑠𝑠𝑒𝑠 𝑏𝑒𝑓𝑜𝑟𝑒 5 𝑓𝑎𝑖𝑙𝑢𝑟𝑒𝑠) = 𝑃(3𝑟𝑑 𝑠𝑢𝑐𝑐𝑒𝑠𝑠 𝑜𝑛 3𝑟𝑑 𝒐𝒓 4𝑡ℎ 𝒐𝒓 … 𝒐𝒓 7𝑡ℎ 𝑡𝑟𝑖𝑎𝑙)
= 𝑃(𝑋 ≤ 7), 𝑤ℎ𝑒𝑟𝑒 𝑋~𝑛𝑒𝑔𝑎𝑡𝑖𝑣𝑒 𝑏𝑖𝑛𝑜𝑚(𝑟 = 3, 𝑝)
$= \sum_{x=3}^{7} \binom{x-1}{2} p^3 (1 - p)^{x-3}$
$\therefore P(r \text{ successes before } m \text{ failures}) = \sum_{x=r}^{r+m-1} \binom{x-1}{r-1} p^r (1 - p)^{x-r}$
Additional Examples
Example 1
A coin has P(heads) = 0.4. The coin is tossed repeatedly under identical circumstances.
(a) Write down the pmf of the number of heads obtained in 20 tosses of the coin.
Solution:
Let: 𝑋 = 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 ℎ𝑒𝑎𝑑𝑠 𝑖𝑛 20 𝑡𝑜𝑠𝑠𝑒𝑠
Thus: 𝑿~𝒃𝒊𝒏𝒐𝒎(𝒏 = 𝟐𝟎, 𝒑 = 𝟎. 𝟒)
$p(x) = \binom{20}{x}(0.4)^x(0.6)^{20-x}, \quad x = 0, 1, 2, \dots, 20$
Solution:
𝑃(𝑎𝑡 𝑚𝑜𝑠𝑡 7 ℎ𝑒𝑎𝑑𝑠 𝑖𝑛 20 𝑡𝑜𝑠𝑠𝑒𝑠) = 𝑃(𝑋 ≤ 7)
= 𝟎. 𝟒𝟏𝟓𝟗
(c) Write down the pmf of the number of tosses required to obtain the first heads.
Solution:
Let 𝑌 = 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑡𝑜𝑠𝑠𝑒𝑠 𝑢𝑛𝑡𝑖𝑙 1𝑠𝑡 ℎ𝑒𝑎𝑑
Thus: 𝒀~𝒈𝒆𝒐𝒎𝒆𝒕𝒓𝒊𝒄(𝒑 = 𝟎. 𝟒)
$p(y) = (0.4)(0.6)^{y-1}, \quad y = 1, 2, 3, \dots$
(d) Calculate the probability that at least 3 tosses are required to obtain the first heads.
Solution:
𝑃(𝑎𝑡 𝑙𝑒𝑎𝑠𝑡 3 𝑡𝑜𝑠𝑠𝑒𝑠 𝑢𝑛𝑡𝑖𝑙 1𝑠𝑡 ℎ𝑒𝑎𝑑𝑠) = 𝑃(𝑌 ≥ 3)
= 1 − 𝑃(𝑌 ≤ 2)
= 1 − 𝑝(1) − 𝑝(2)
$= 1 - (0.4)(0.6)^{1-1} - (0.4)(0.6)^{2-1}$
= 𝟎. 𝟑𝟔
(e) Write down the pmf of the number of tosses required to obtain 5 heads.
Solution:
Let 𝑍 = 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑡𝑜𝑠𝑠𝑒𝑠 𝑢𝑛𝑡𝑖𝑙 5𝑡ℎ ℎ𝑒𝑎𝑑𝑠
Thus: 𝒁~𝒏𝒆𝒈𝒂𝒕𝒊𝒗𝒆 𝒃𝒊𝒏𝒐𝒎(𝒓 = 𝟓, 𝒑 = 𝟎. 𝟒)
$p(z) = \binom{z-1}{5-1}(0.4)^5(0.6)^{z-5}, \quad z = 5, 6, 7, \dots$
(f) Calculate the probability that at most 10 tosses are required to obtain 5 heads.
= 𝟎. 𝟑𝟔𝟔𝟗
Solution 2: Link between binomial and negative binomial (𝑃(𝑋 > 𝑛) = 𝑃(𝑌 < 𝑟))
𝒁~𝒏𝒆𝒈𝒂𝒕𝒊𝒗𝒆 𝒃𝒊𝒏𝒐𝒎(𝒓 = 𝟓, 𝒑 = 𝟎. 𝟒)
𝑾~𝒃𝒊𝒏𝒐𝒎(𝒏 = 𝟏𝟎, 𝒑 = 𝟎. 𝟒)
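The link between the two distributions is that the 5th head occurs within 10 tosses exactly when there are at least 5 heads in the first 10 tosses, so P(Z ≤ 10) = P(W ≥ 5). A short numerical check of this identity, written as a sketch with illustrative variable names:

```python
from math import comb

r, p, n = 5, 0.4, 10

# Negative binomial route: the 5th head arrives on toss z, for z = 5, ..., 10.
p_negbin = sum(comb(z - 1, r - 1) * p ** r * (1 - p) ** (z - r)
               for z in range(r, n + 1))

# Binomial route: at least 5 heads among the first 10 tosses.
p_binom = sum(comb(n, w) * p ** w * (1 - p) ** (n - w)
              for w in range(r, n + 1))

print(round(p_negbin, 4), round(p_binom, 4))
```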
If these requirements are met, the random variable X is defined as X = the number of
successes obtained in the sample. The probability distribution of X is a hypergeometric
distribution with parameters N, n & m. This is denoted by 𝑿~𝒉𝒚𝒑𝒆𝒓𝒈𝒆𝒐𝒎𝒆𝒕𝒓𝒊𝒄(𝑵, 𝒏, 𝒎)
Note:
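• If the sample was selected with replacement, or if the population is much larger than the
sample, then X ~ binomial(n, p) with p = m/N (exactly in the first case, approximately in the second):
$P(X = i) = \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}} \approx \dots \approx \binom{n}{i} p^i (1 - p)^{n-i}$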
But since it is without replacement, the pmf of X is given by:
$P(X = i) = \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}}, \quad i = 0, 1, 2, \dots, n$
Derivation of the pmf of a hypergeometric distribution:
Additional Example
Example 2
Suppose that a box contains 10 black balls and 20 red balls. If 5 balls are selected at random
without replacement, what is the probability that 2 black balls will be obtained?
Solution:
Let: 𝑋 = 𝑡ℎ𝑒 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑏𝑙𝑎𝑐𝑘 𝑏𝑎𝑙𝑙𝑠 𝑠𝑒𝑙𝑒𝑐𝑡𝑒𝑑
Thus: 𝑿~𝒉𝒚𝒑𝒆𝒓𝒈𝒆𝒐𝒎𝒆𝒕𝒓𝒊𝒄(𝑵 = 𝟑𝟎, 𝒏 = 𝟓, 𝒎 = 𝟏𝟎)
Therefore:
𝑃(2 𝑏𝑙𝑎𝑐𝑘 𝑏𝑎𝑙𝑙𝑠 𝑎𝑟𝑒 𝑜𝑏𝑡𝑎𝑖𝑛𝑒𝑑) = 𝑃(𝑋 = 2)
$= \frac{\binom{10}{2}\binom{30-10}{5-2}}{\binom{30}{5}} = \frac{\binom{10}{2}\binom{20}{3}}{\binom{30}{5}}$
$= \frac{950}{2\,639} = 0.3599$
Note:
• If we were selecting balls with replacement in this example, we would have:
$X \sim \text{binomial}\left(n = 5, \; p = \frac{10}{30} = \frac{1}{3}\right)$
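The effect of sampling without versus with replacement in this example can be compared numerically. The sketch below evaluates the hypergeometric pmf exactly and the binomial pmf for the with-replacement case; the function name `hypergeom_pmf` is chosen here for illustration.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(i, N, n, m):
    """P(X = i): i successes in a sample of n drawn without replacement
    from a population of N items containing m successes."""
    return Fraction(comb(m, i) * comb(N - m, n - i), comb(N, n))

# Without replacement: 2 black balls in 5 draws from 10 black + 20 red.
p_without = hypergeom_pmf(2, N=30, n=5, m=10)

# With replacement: X ~ binomial(n = 5, p = 1/3).
p_with = comb(5, 2) * (1 / 3) ** 2 * (2 / 3) ** 3

print(p_without, round(float(p_without), 4))
print(round(p_with, 4))
```

The population of 30 balls is not much larger than the sample of 5, so the two probabilities differ noticeably here.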
Textbook Example
Example 4.8i
A purchaser of electrical components buys them in lots of size 10. It is his policy to inspect 3
components randomly from a lot and to accept the lot only if all 3 are non-defective. If 30
percent of the lots have 4 defective components and 70 percent have only 1, what
proportion of lots does the purchaser reject?
Solution:
Let: 𝐴 = 𝑡ℎ𝑒 𝑒𝑣𝑒𝑛𝑡 𝑡ℎ𝑎𝑡 𝑡ℎ𝑒 𝑝𝑢𝑟𝑐ℎ𝑎𝑠𝑒𝑟 𝑎𝑐𝑐𝑒𝑝𝑡𝑠 𝑎 𝑙𝑜𝑡
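$P(A) = P(A \mid \text{lot has 4 defective})\left(\frac{3}{10}\right) + P(A \mid \text{lot has 1 defective})\left(\frac{7}{10}\right)$
$= \frac{\binom{4}{0}\binom{6}{3}}{\binom{10}{3}}\left(\frac{3}{10}\right) + \frac{\binom{1}{0}\binom{9}{3}}{\binom{10}{3}}\left(\frac{7}{10}\right) = \frac{54}{100}$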
Hence, 46% of the lots are rejected
Expected Value & Variance of Hypergeometric Distribution
$E(X) = \frac{nm}{N}$
and
$Var(X) = \frac{nm}{N}\left[\frac{(n - 1)(m - 1)}{N - 1} + 1 - \frac{nm}{N}\right]$
Letting $p = \frac{m}{N}$:
$Var(X) = np(1 - p)\left[1 - \frac{n - 1}{N - 1}\right]$
Note:
For X ~ binomial(n, p):
• $E[(1 - p)^X] = (1 - p^2)^n$
• $E\left[\frac{1}{X + 1}\right] = \frac{1 - (1 - p)^{n+1}}{(n + 1)p}$
Additional Examples
Suppose the number of calls handled by a switchboard is Poisson distributed with a mean of
2 calls per minute.
Let Xt = number of calls handled in t minutes, t > 0.
Then 𝑋𝑡 ~𝑃𝑜𝑖𝑠𝑠𝑜𝑛(2𝑡)
(a) Write down the probability mass function of the number of calls that are handled in a
5-minute period.
Solution:
𝑋5 ~𝑃𝑜𝑖𝑠𝑠𝑜𝑛(𝜆 = 2(5) = 10)
$p(x) = P(X_5 = x) = e^{-10} \cdot \frac{10^x}{x!}, \quad x = 0, 1, 2, \dots$
(b) Calculate the probability that 15 calls have to be handled in a period of 6 minutes.
Solution:
𝑋6 ~𝑃𝑜𝑖𝑠𝑠𝑜𝑛(𝜆 = 2(6) = 12)
$p(15) = P(X_6 = 15) = e^{-12} \cdot \frac{12^{15}}{15!} = 0.0724$
(c) Calculate the probability that at least 12 calls have to be handled in a period of 5
minutes.
Solution:
𝑋5 ~𝑃𝑜𝑖𝑠𝑠𝑜𝑛(𝜆 = 2(5) = 10)
𝑷(𝑿𝟓 ≥ 𝟏𝟐) = 𝟏 − 𝑷(𝑿𝟓 ≤ 𝟏𝟏)
= 𝟏 − 𝟎. 𝟔𝟗𝟔𝟖
= 𝟎. 𝟑𝟎𝟑𝟐
(d) Calculate the probability that at most 7 calls have to be handled in at least 3 of 5
consecutive 3-minute periods.
Solution:
𝑋3 ~𝑃𝑜𝑖𝑠𝑠𝑜𝑛(𝜆 = 2(3) = 6)
𝑷(𝑿𝟑 ≤ 𝟕) = 𝟎. 𝟕𝟒𝟒𝟎
Solution:
Let: $p = P(\text{he wins at a single game}) = \frac{12}{38}$
Solution:
Let 𝑋 = #𝑔𝑎𝑚𝑒𝑠 𝑢𝑛𝑡𝑖𝑙 1𝑠𝑡 𝑤𝑖𝑛
Thus: $X \sim \text{geometric}\left(p = \frac{12}{38}\right)$
Solution:
$P(X = 5) = \binom{5-1}{4}\left(\frac{1}{3}\right)^{5-5}\left(\frac{2}{3}\right)^5$
$= \left(\frac{2}{3}\right)^5 = 0.1317$
b) 8 people
Solution:
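$P(X \le 8) = \sum_{x=5}^{8} \binom{x-1}{4}\left(\frac{1}{3}\right)^{x-5}\left(\frac{2}{3}\right)^5$
$= \left(\frac{2}{3}\right)^5\left[1 + 5\left(\frac{1}{3}\right)^1 + 15\left(\frac{1}{3}\right)^2 + 35\left(\frac{1}{3}\right)^3\right]$
$= 0.7414$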
c) Exactly 6 people
Solution:
$P(X = 6) = \binom{6-1}{4}\left(\frac{1}{3}\right)^{6-5}\left(\frac{2}{3}\right)^5$
$= 5\left(\frac{1}{3}\right)^1\left(\frac{2}{3}\right)^5$
$= 0.2195$
d) Exactly 7 people
Solution:
$P(X = 7) = \binom{7-1}{4}\left(\frac{1}{3}\right)^{7-5}\left(\frac{2}{3}\right)^5$
$= 15\left(\frac{1}{3}\right)^2\left(\frac{2}{3}\right)^5$
$= 0.2195$
Question 78: Negative Binomial Distribution
During assembly, a product is equipped with 5 control switches, each of which has probability
0.04 of being defective. What is the probability that 2 defective switches are encountered
before 5 non-defective ones?
Solution:
Let: Y = number of trials needed to get r successes (a success being a non-defective switch). Thus: Y ~ negative binomial(r = 5, p = 0.96)
Let: X = number of failures before the rth success. Thus: X = Y − r
$\therefore p(x) = \binom{x + r - 1}{r - 1} p^r (1 - p)^x, \quad x = 0, 1, 2, \dots$
$\therefore p(2) = \binom{2 + 5 - 1}{5 - 1}(0.96)^5(0.04)^2$
$= \binom{6}{4}(0.96)^5(0.04)^2$
$= 0.01957$
Question 82: Hypergeometric Distribution
Suppose that a class of 50 students has appeared for a test. 41 students have passed this test
while the remaining 9 students have failed. Find the probability that in a group of 10 students
selected at random
(a) none have failed the test
Solution:
Let: 𝑋 = #𝑠𝑡𝑢𝑑𝑒𝑛𝑡𝑠 𝑖𝑛 𝑠𝑎𝑚𝑝𝑙𝑒 𝑤ℎ𝑜 𝑓𝑎𝑖𝑙𝑒𝑑
Thus: 𝑋~ℎ𝑦𝑝𝑒𝑟𝑔𝑒𝑜𝑚𝑒𝑡𝑟𝑖𝑐(𝑁 = 50, 𝑚 = 9, 𝑛 = 10)
$P(X = 0) = \frac{\binom{9}{0}\binom{41}{10}}{\binom{50}{10}} = 0.10914$
(b) at least 3 students have failed the test.
Solution:
𝑷(𝑿 ≥ 𝟑) = 1 − 𝑃(𝑋 ≤ 2)
= 1 − 𝑝(0) − 𝑝(1) − 𝑝(2)
$= 1 - \frac{\binom{9}{0}\binom{41}{10} + \binom{9}{1}\binom{41}{9} + \binom{9}{2}\binom{41}{8}}{\binom{50}{10}}$
$= 0.24905$
Question 85: Hypergeometric or Binomial
An automotive manufacturing company produces brake pads in lots of 100. This company
inspects 15 brake pads from each lot and accepts the whole lot only if all 15 brake pads pass
the inspection test. Each brake pad is, independently of the others, faulty with probability
0.09. What proportion of the lots does the company reject?
Solution 1: Hypergeometric & Binomial
Let: 𝑋 = #𝑑𝑒𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑖𝑡𝑒𝑚𝑠 𝑖𝑛 𝑎 𝑙𝑜𝑡 𝑜𝑓 100
Thus: 𝑋~𝑏𝑖𝑛𝑜𝑚(𝑛 = 100, 𝑝 = 0.09)
Let: 𝑌 = #𝑑𝑒𝑓𝑒𝑐𝑡𝑖𝑣𝑒 𝑖𝑡𝑒𝑚𝑠 𝑖𝑛 𝑎 𝑠𝑎𝑚𝑝𝑙𝑒 𝑜𝑓 15 𝑖𝑛𝑠𝑝𝑒𝑐𝑡𝑒𝑑
Thus: (𝑌|𝑋 = 𝑘)~ℎ𝑦𝑝𝑒𝑟𝑔𝑒𝑜𝑚𝑒𝑡𝑟𝑖𝑐(𝑁 = 100, 𝑛 = 15, 𝑚 = 𝑘)
𝑃(𝑎 𝑙𝑜𝑡 𝑖𝑠 𝑟𝑒𝑗𝑒𝑐𝑡𝑒𝑑) = 1 − 𝑃(𝑎 𝑙𝑜𝑡 𝑖𝑠 𝑎𝑐𝑐𝑒𝑝𝑡𝑒𝑑)
= 1 − 𝑃(𝑎𝑙𝑙 𝑖𝑛𝑠𝑝𝑒𝑐𝑡𝑒𝑑 𝑖𝑡𝑒𝑚𝑠 𝑎𝑟𝑒 𝑛𝑜𝑡 𝑑𝑒𝑓𝑒𝑐𝑡𝑖𝑣𝑒)
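$= 1 - \sum_{k=0}^{85} P(X = k \text{ and } Y = 0)$
$= 1 - \sum_{k=0}^{85} \binom{100}{k}(0.09)^k(0.91)^{100-k} \times \frac{\binom{k}{0}\binom{100-k}{15}}{\binom{100}{15}}$
$= 1 - (0.91)^{100} \sum_{k=0}^{85} \binom{85}{k}\left(\frac{9}{91}\right)^k$
$= 1 - (0.91)^{100}\left(\frac{9}{91} + 1\right)^{85}$ (binomial theorem)
$= 0.75699$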
Solution 2: Binomial
𝑃(𝑎 𝑙𝑜𝑡 𝑖𝑠 𝑟𝑒𝑗𝑒𝑐𝑡𝑒𝑑) = 1 − 𝑃(𝑎 𝑙𝑜𝑡 𝑖𝑠 𝑎𝑐𝑐𝑒𝑝𝑡𝑒𝑑)
$= 1 - \binom{15}{15}(1 - 0.09)^{15}(0.09)^0$
$= 0.75699$
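The agreement between Solution 1 (binomial number of defectives in the lot, then hypergeometric sampling) and Solution 2 (binomial on the 15 inspected pads) can be confirmed numerically. The following is a sketch with illustrative variable names.

```python
from math import comb

p_faulty, lot_size, sample_size = 0.09, 100, 15

# Solution 2: all 15 inspected pads are good, each independently with probability 0.91.
p_accept_binom = (1 - p_faulty) ** sample_size

# Solution 1: condition on k defectives in the lot (binomial), then require that
# none of them lands in the sample of 15 (hypergeometric with i = 0).
p_accept_mixture = sum(
    comb(lot_size, k) * p_faulty ** k * (1 - p_faulty) ** (lot_size - k)
    * comb(lot_size - k, sample_size) / comb(lot_size, sample_size)
    for k in range(0, lot_size - sample_size + 1)
)

print(round(1 - p_accept_binom, 5), round(1 - p_accept_mixture, 5))
```

Because each pad is faulty independently of the others, inspecting 15 pads from the lot is equivalent to 15 independent trials, which is why the shorter binomial solution gives the same answer.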