Notes#5 PDF
(ENM 503)
Michael A. Carchidi
November 10, 2015
Chapter 5 - Discrete Random Variables
The following notes are based on the textbook entitled: A First Course in
Probability by Sheldon Ross (9th edition) and these notes can be viewed at
https://canvas.upenn.edu/
after you log in using your PennKey user name and Password.
1. Range Sets and Probability Mass Functions
In this chapter we shall discuss some probability terminology and concepts for
discrete distributions. The nature of a random variable X (Discrete, Continuous,
or Mixed) is based on the range of possible values for X. A discrete random variable is a variable X which is capable of being assigned a discrete set of numerical
values x1, x2, x3, ..., xn, where n could be infinite. We call the set of possible values for X the range set of X, which is denoted by

$$R_X = \{x_1, x_2, x_3, ..., x_n\}, \qquad (1a)$$

and the probability mass function (pmf) of X, p(x) = P(X = x), must satisfy 0 ≤ p(x) ≤ 1 together with

$$\sum_{x \in R_X} p(x) = 1. \qquad (1c)$$
Suppose, for example, that m balls are selected at random (without replacement) from n balls numbered 1 through n, and let X denote the largest number selected. Since each of the

$$\binom{n}{m}$$

possible selections of m of the n balls is equally likely, the event that X = k occurs only if one of the m balls selected is numbered k (which can occur in only one way) and the remaining m − 1 balls selected are numbered k − 1 or lower, and this can occur in as many as

$$\binom{1}{1}\binom{k-1}{m-1} = \binom{k-1}{m-1}$$

ways. Therefore, the probability that X = k is

$$p(k) = \frac{\binom{k-1}{m-1}}{\binom{n}{m}} \qquad (3a)$$

for k = m, m + 1, m + 2, ..., n.
Continuous Random Variables
A continuous random variable is one whose range set consists of one or more intervals of real numbers, while a mixed random variable has a range set containing both isolated points and intervals; an example of such a mixed range set is the set consisting of the integers 1, 2, and 3, as well as all the real numbers between 4 and 5, inclusive. We shall leave the discussion of continuous random variables, and of mixed random variables as well, to the next chapter.
Note that the cumulative distribution function F(x) = P(X ≤ x) must then satisfy the following three properties: (i) F is a nondecreasing function of x, so that F(a) ≤ F(b) whenever a ≤ b; (ii)

$$\lim_{x\to-\infty} F(x) = 0 \qquad\text{and}\qquad \lim_{x\to+\infty} F(x) = 1; \qquad (4b)$$

and (iii)

$$P(a < X \le b) = F(b) - F(a) \qquad (4c)$$

for all a < b, so that

$$P(X = a) = F(a) - F(a-1) \qquad (4d)$$

when the possible values of X are integers (or are spaced at least one unit apart).
Note that a typical plot of F (x) for a discrete random variable is staircase in
structure with the height of each step at x equaling the pmf function value at x.
Example #7: A Staircase Graph
Suppose that X is a random variable with probability mass function

$$p(x) = \begin{cases} 1/6, & \text{for } x = -1 \\ 3/6, & \text{for } x = +2 \\ 2/6, & \text{for } x = +4, \end{cases}$$

so that its distribution function is

$$F(x) = \begin{cases} 0, & \text{for } x < -1 \\ 1/6, & \text{for } -1 \le x < 2 \\ 4/6, & \text{for } 2 \le x < 4 \\ 1, & \text{for } +4 \le x. \end{cases}$$
(The graph of F(x) is a staircase that rises from 0 to 1 in steps of heights 1/6, 3/6, and 2/6 at x = −1, 2, and 4.)
As another example, consider again the ball-selection random variable with pmf

$$p(k) = \frac{\binom{k-1}{m-1}}{\binom{n}{m}}.$$

Its cumulative distribution function is

$$F(x) = P(X \le x) = \sum_{j=m}^{x}\frac{\binom{j-1}{m-1}}{\binom{n}{m}} = \frac{1}{\binom{n}{m}}\sum_{j=m}^{x}\binom{j-1}{m-1} \qquad (5)$$

and, since $\sum_{j=m}^{x}\binom{j-1}{m-1} = \binom{x}{m}$ (the hockey-stick identity), this reduces to

$$F(x) = \frac{\binom{x}{m}}{\binom{n}{m}} \qquad (3b)$$

for x = m, m + 1, m + 2, ..., n.
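The following short Python sketch checks Equations (3a) and (3b) numerically; the values n = 10 and m = 3 are illustrative choices, not from the text.

```python
from math import comb

def p(k, n, m):
    # Equation (3a): P(largest of the m selected ball numbers equals k)
    return comb(k - 1, m - 1) / comb(n, m)

def F(x, n, m):
    # Equation (3b): P(X <= x) in closed form
    return comb(x, m) / comb(n, m)

n, m = 10, 3  # illustrative values
print(round(sum(p(k, n, m) for k in range(m, n + 1)), 12))   # 1.0, the pmf sums to one
for x in range(m, n + 1):
    running = sum(p(k, n, m) for k in range(m, x + 1))
    assert abs(running - F(x, n, m)) < 1e-12                  # (3b) matches the running sum of (3a)
```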
The expected value (or mean) of X is

$$\mu = E(X) = \sum_{x=-\infty}^{+\infty} x\,p(x) = \sum_{x\in R_X} x\,p(x), \qquad (6b)$$

and, for constants α and β,

$$E(\alpha X + \beta) = \sum_{x}(\alpha x + \beta)p(x) = \alpha\sum_{x}x\,p(x) + \beta\sum_{x}p(x) \qquad (6c)$$

so that, since $\sum_{x}p(x) = 1$,

$$E(\alpha X + \beta) = \alpha E(X) + \beta. \qquad (6d)$$
Example #9: E(X) Need Not Exist for Every Discrete Distribution
Note that E(X) need not exist for every discrete distribution. For example, consider the discrete random variable X having probability mass function

$$p(x) = \frac{6}{(\pi x)^2}$$

for the range set R_X = {1, 2, 3, ...}. This is a valid distribution since 0 ≤ p(x) = 6/(πx)² ≤ 1 and

$$\sum_{x\in R_X} p(x) = \sum_{x=1}^{\infty}\frac{6}{(\pi x)^2} = \frac{6}{\pi^2}\sum_{x=1}^{\infty}\frac{1}{x^2} = \frac{6}{\pi^2}\cdot\frac{\pi^2}{6} = 1.$$

However,

$$E(X) = \sum_{x=1}^{\infty} x\,p(x) = \sum_{x=1}^{\infty} x\,\frac{6}{(\pi x)^2} = \frac{6}{\pi^2}\sum_{x=1}^{\infty}\frac{1}{x},$$

and this harmonic series diverges, so E(X) does not exist for this distribution.
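A small Python sketch (the cutoffs are arbitrary) shows the partial sums of E(X) for this pmf growing without bound:

```python
from math import pi

def p(x):
    # pmf of Example #9: p(x) = 6 / (pi * x)**2 for x = 1, 2, 3, ...
    return 6.0 / (pi * x) ** 2

for cutoff in (10**2, 10**4, 10**6):
    partial_mean = sum(x * p(x) for x in range(1, cutoff + 1))
    print(cutoff, round(partial_mean, 3))
# The partial sums behave like (6/pi^2) * ln(cutoff), so they grow without bound:
# E(X) does not exist.
```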
Example #10: E(X) Need Not Be a Possible Value of X
For the toss of a single fair die we have R_X = {1, 2, 3, 4, 5, 6} with p(x) = 1/6 for each x in R_X, and E(X) = Σ x p(x) leads to

$$E(X) = (1)\frac{1}{6} + (2)\frac{1}{6} + (3)\frac{1}{6} + (4)\frac{1}{6} + (5)\frac{1}{6} + (6)\frac{1}{6} = 3.5$$

and 3.5 ∉ R_X.
Example #11: Center of Mass From Physics
Suppose that n particles having masses m1 , m2 , ..., mn are distributed on the
x axis at positions x1, x2, ..., xn, respectively. If m = m1 + m2 + ⋯ + mn is the total mass of the n-particle system, its center-of-mass x_cm is defined as

$$x_{cm} = \frac{m_1x_1 + m_2x_2 + \cdots + m_nx_n}{m_1 + m_2 + \cdots + m_n} = \frac{m_1}{m}x_1 + \frac{m_2}{m}x_2 + \cdots + \frac{m_n}{m}x_n,$$

where

$$\frac{m_1}{m} + \frac{m_2}{m} + \cdots + \frac{m_n}{m} = 1.$$

Then x_cm can be viewed as the expected value of a random variable having possible values x1, x2, ..., xn with probabilities m1/m, m2/m, ..., mn/m, respectively.
Moments of X and the Variance of X
If n is a nonnegative integer, the quantity

$$E((X-c)^n) = \sum_{x=-\infty}^{+\infty}(x-c)^n p(x) = \sum_{x\in R_X}(x-c)^n p(x) \qquad (7a)$$

is called the nth moment of X about the point c. Note that the zeroth moment is always equal to one; if c = 0, then E(Xⁿ) is called the nth moment of X (about zero implied); and if c = μ = E(X), then E((X − μ)ⁿ) is called the nth moment of X about its mean μ. The second moment of X about its mean is called the variance of X, and is given by

$$V(X) = E((X-\mu)^2) \qquad (7b)$$

and this is always non-negative. Using Equation (6d), this can be reduced to

$$V(X) = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - 2\mu^2 + \mu^2$$

or

$$V(X) = E(X^2) - (E(X))^2. \qquad (7c)$$

The standard deviation of X is σ = σ(X) = √V(X), and

$$V(\alpha X + \beta) = \alpha^2 V(X) \qquad (7d)$$

so that

$$\sigma(\alpha X + \beta) = |\alpha|\,\sigma(X) \qquad (7e)$$

for constants α and β. Note that the mean μ of X is a measure of the central tendency of X and the variance of X (if it exists) is a measure of the spread or variation of possible values of X around this mean. Note that μ and σ are measured in the same physical units as X. A unitless measure of the spread of a distribution is called the coefficient of variation of X and is defined as

$$cv = \frac{\sigma}{\mu} = \frac{\sqrt{V(X)}}{E(X)} \qquad (7f)$$

when E(X) ≠ 0.
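As a quick numerical illustration, the following Python sketch uses the fair-die pmf from Example #10 to compute the mean, variance, standard deviation, and coefficient of variation from Equations (7c) and (7f):

```python
from math import sqrt

# pmf of a single fair die, stored as {value: probability}
pmf = {x: 1/6 for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())              # E(X)
second_moment = sum(x**2 * p for x, p in pmf.items())  # E(X^2)
variance = second_moment - mean**2                     # Equation (7c)
sigma = sqrt(variance)
cv = sigma / mean                                      # Equation (7f)

print(mean, round(variance, 4), round(sigma, 4), round(cv, 4))
# 3.5  2.9167  1.7078  0.488
```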
Modes
A mode of a discrete random variable X is a value (or values) of X that
occurs (occur) most frequently. Note that a mode need not be unique and all
modes are in RX . In the simple example of tossing a fair die, all values of X in
RX = {1, 2, 3, 4, 5, 6} are modes since they all occur with the same (maximum)
probability of 1/6.
4. Expectation Value of a Function of a Random Variable
Suppose that we are given a discrete random variable X along with its pmf
and that we want to compute the expected value of some function of X, say
Y = H(X). One way to compute this is to note that since X is a random variable,
so is Y = H(X) and we could compute its pmf, call it q(y) and then compute
X
E(Y ) =
yq(y).
(8a)
y
RY = {y1 , y2 , y3 , ..., ym }
q(y) =
p(x).
(8b)
x|y=H(x)
For a specific example, suppose that X is the sum of two fair dice with distribution
{(x, p(x))} given by
$$\left\{\left(2,\tfrac{1}{36}\right),\ \left(3,\tfrac{2}{36}\right),\ \left(4,\tfrac{3}{36}\right),\ \left(5,\tfrac{4}{36}\right),\ \left(6,\tfrac{5}{36}\right),\ \left(7,\tfrac{6}{36}\right),\ \left(8,\tfrac{5}{36}\right),\ \left(9,\tfrac{4}{36}\right),\ \left(10,\tfrac{3}{36}\right),\ \left(11,\tfrac{2}{36}\right),\ \left(12,\tfrac{1}{36}\right)\right\}$$
and suppose that Y = (X − 7)². Then

$$R_Y = \{0, 1, 4, 9, 16, 25\}$$

with pmf q(y) such that

$$q(0) = p(7) = \frac{6}{36}, \qquad q(1) = p(6) + p(8) = \frac{5}{36} + \frac{5}{36} = \frac{10}{36},$$

$$q(4) = p(5) + p(9) = \frac{4}{36} + \frac{4}{36} = \frac{8}{36}, \qquad q(9) = p(4) + p(10) = \frac{3}{36} + \frac{3}{36} = \frac{6}{36},$$

$$q(16) = p(3) + p(11) = \frac{2}{36} + \frac{2}{36} = \frac{4}{36}, \qquad q(25) = p(2) + p(12) = \frac{1}{36} + \frac{1}{36} = \frac{2}{36},$$

so that the distribution of Y, written as pairs {(y, q(y))}, is

$$\left\{\left(0,\tfrac{6}{36}\right),\ \left(1,\tfrac{10}{36}\right),\ \left(4,\tfrac{8}{36}\right),\ \left(9,\tfrac{6}{36}\right),\ \left(16,\tfrac{4}{36}\right),\ \left(25,\tfrac{2}{36}\right)\right\}.$$
The expected value of Y is then
$$E(Y) = 0\cdot\frac{6}{36} + 1\cdot\frac{10}{36} + 4\cdot\frac{8}{36} + 9\cdot\frac{6}{36} + 16\cdot\frac{4}{36} + 25\cdot\frac{2}{36}$$

or E(Y) = 35/6 ≈ 5.83.
We may also compute the expected value of Y = H(X) using the fact that

$$E(Y) = \sum_{y}y\,q(y) = \sum_{y}y\sum_{x\,|\,y=H(x)}p(x) = \sum_{y}\sum_{x\,|\,y=H(x)}y\,p(x) = \sum_{x\in R_X}H(x)\,p(x), \qquad (8c)$$

which becomes

$$E(Y) = (2-7)^2\frac{1}{36} + (3-7)^2\frac{2}{36} + (4-7)^2\frac{3}{36} + (5-7)^2\frac{4}{36} + (6-7)^2\frac{5}{36} + (7-7)^2\frac{6}{36}$$

$$\qquad + (8-7)^2\frac{5}{36} + (9-7)^2\frac{4}{36} + (10-7)^2\frac{3}{36} + (11-7)^2\frac{2}{36} + (12-7)^2\frac{1}{36},$$

which again gives E(Y) = 210/36 = 35/6.
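Both routes to E(Y) can be checked with a short Python sketch using exact arithmetic:

```python
from fractions import Fraction
from collections import defaultdict

# pmf of X = sum of two fair dice
p = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

H = lambda x: (x - 7) ** 2          # Y = H(X)

# Method 1: build q(y) as in Equation (8b), then apply Equation (8a)
q = defaultdict(Fraction)
for x, px in p.items():
    q[H(x)] += px
E_Y_via_q = sum(y * qy for y, qy in q.items())

# Method 2: Equation (8c), summing H(x) p(x) directly over R_X
E_Y_direct = sum(H(x) * px for x, px in p.items())

print(E_Y_via_q, E_Y_direct)   # both print 35/6
```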
Consider a store that must decide how many units of a seasonal product to stock. Each unit that is stocked and sold earns a net profit of b, each unit that is stocked but left unsold results in a net loss of ℓ, and the demand for the product is a random variable X with pmf p(x).
To solve this, let X denote the number of units ordered (i.e., demanded by customers) and let s be the number of units that are stocked by the store. Then the profit, call it H(s), can be expressed as

$$H(s) = \begin{cases} bX - \ell(s - X), & \text{when } X \le s \\ bs, & \text{when } X > s \end{cases}$$

or

$$H(s) = \begin{cases} (b + \ell)X - \ell s, & \text{when } X \le s \\ bs, & \text{when } X > s. \end{cases}$$
The expected profit is then

$$E(H(s)) = \sum_{x\ge 0}H(s)\,p(x) = \sum_{x=0}^{s}\bigl((b+\ell)x - \ell s\bigr)p(x) + \sum_{x=s+1}^{\infty}bs\,p(x)$$

$$= (b+\ell)\sum_{x=0}^{s}x\,p(x) - \ell s\sum_{x=0}^{s}p(x) + bs\sum_{x=s+1}^{\infty}p(x)$$

$$= (b+\ell)\sum_{x=0}^{s}x\,p(x) - \ell s\sum_{x=0}^{s}p(x) + bs\left(1 - \sum_{x=0}^{s}p(x)\right)$$

or

$$E(H(s)) = bs + (b+\ell)\sum_{x=0}^{s}x\,p(x) - s(b+\ell)\sum_{x=0}^{s}p(x)$$

or

$$E(H(s)) = bs + (b+\ell)\sum_{x=0}^{s}(x - s)\,p(x).$$

To determine the optimum value of s, let us investigate what happens to the expected profit when we increase the value of s by one unit, and so we look at

$$\Delta E(H(s)) = E(H(s+1)) - E(H(s)).$$

This leads to

$$E(H(s+1)) = b(s+1) + (b+\ell)\sum_{x=0}^{s+1}(x - s - 1)\,p(x)$$

or

$$E(H(s+1)) = b(s+1) + (b+\ell)\sum_{x=0}^{s}(x - s - 1)\,p(x)$$

since the last term (the x = s + 1 term) in this sum is zero, and so, subtracting

$$E(H(s)) = bs + (b+\ell)\sum_{x=0}^{s}(x - s)\,p(x),$$

we get

$$\Delta E(H(s)) = b + (b+\ell)\sum_{x=0}^{s}(x - s - 1 - x + s)\,p(x)$$

or

$$\Delta E(H(s)) = b - (b+\ell)\sum_{x=0}^{s}p(x) = b - (b+\ell)F(s)$$

where F(s) is the cdf of X evaluated at X = s. Now when ΔE(H(s)) > 0, increasing the value of s to s + 1 will increase the expected profit, while when ΔE(H(s)) < 0, increasing the value of s to s + 1 will decrease the expected profit. Therefore, profits will increase as long as

$$\Delta E(H(s)) = b - (b+\ell)F(s) > 0.$$

Since F(s) is an increasing function of s, this says that if s* denotes the largest value of s for which

$$F(s) < \frac{b}{b+\ell},$$

then stocking s* + 1 units will result in a maximum expected profit.
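Here is a minimal Python sketch of this stocking rule, assuming, purely for illustration, Poisson demand with mean 4, a per-unit profit b = 3, and a per-unit loss ℓ = 1; these numbers are not from the text.

```python
from math import exp, factorial

# Illustrative inputs (not from the text): demand X ~ Poisson(4),
# net profit b per unit sold, net loss ell per unit left unsold.
b, ell = 3.0, 1.0
lam = 4.0
p = lambda x: exp(-lam) * lam**x / factorial(x)     # demand pmf
F = lambda s: sum(p(x) for x in range(s + 1))       # demand cdf

# Rule derived above: expected profit keeps rising while F(s) < b / (b + ell),
# so stop at the first s where that fails.
s = 0
while F(s) < b / (b + ell):
    s += 1
print("stock", s, "units")                          # 5 for these inputs

# Brute-force check of E(H(s)) near the answer
def expected_profit(s, upper=80):
    return sum((b * min(x, s) - ell * max(s - x, 0)) * p(x) for x in range(upper))

for k in (s - 1, s, s + 1):
    print(k, round(expected_profit(k), 4))          # the middle value is the largest
```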
5. A Survey of Important Discrete Distributions
The following seven discrete distributions stand out in applications, and so each will be discussed in some detail. These are the: (i) Bernoulli, (ii) Binomial, (iii) Geometric, (iv) Pascal, (v) Hypergeometric, (vi) Uniform, and (vii) Poisson distributions. We begin with the Bernoulli distribution since the binomial, geometric, and Pascal distributions are generated by this one. Even the hypergeometric comes from a Bernoulli-type idea.
5.1 The Bernoulli Distribution With Parameter p
Consider an experiment of a single trial, which can be marked as either a
success or as a failure. Let the random variable X = s if the experiment
resulted in a success while X = f if the experiment resulted in a failure and the
words success and failure are not meant to imply that one is good (success)
and the other is bad (failure). For example, the experiment might be the simple flip of a coin, and we could arbitrarily call heads a success and tails a failure, or the other way around.
An experiment in which there are only two outcomes is called a Bernoulli trial and the resulting distribution is called a Bernoulli distribution. It has a range set consisting of just the two points X = s (which is a numerical value associated with a success) and X = f (which is a numerical value associated with a failure), so that
$$P(X = x) = \begin{cases} 1-p, & \text{for } x = f \\ p, & \text{for } x = s. \end{cases} \qquad (9a)$$

The mean is

$$E(X) = sP(X = s) + fP(X = f) = sp + f(1-p) \qquad (9b)$$

and

$$E(X^2) = s^2P(X = s) + f^2P(X = f) = s^2p + f^2(1-p),$$

and so the variance is given by

$$\sigma^2 = V(X) = E(X^2) - \mu^2 = s^2p + f^2(1-p) - (sp + f(1-p))^2,$$

which reduces to

$$V(X) = p(1-p)(s-f)^2. \qquad (9c)$$
The choice for the value of s and f is completely arbitrary and most people
like to choose f = 0 (for failure) and s = 1 (for success). In this case, we call this
the standard Bernoulli distribution and the above equations reduce to
$$P(X = x) = \begin{cases} 1-p, & \text{for } x = 0 \\ p, & \text{for } x = 1 \end{cases} \qquad (10a)$$

and

$$E(X) = p \qquad\text{and}\qquad V(X) = p(1-p). \qquad (10b)$$

A plot of this variance versus p below shows that the maximum variance in the Bernoulli distribution is given by 1/4 and occurs when p = 1/2.
(The graph of V(X) = p(1 − p) versus p is an inverted parabola on 0 ≤ p ≤ 1 with maximum value 0.25 at p = 0.5.)
The cumulative distribution function of the standard Bernoulli distribution is

$$F(x) = P(X \le x) = \begin{cases} 0, & \text{for } x < 0 \\ 1-p, & \text{for } 0 \le x < 1 \\ 1, & \text{for } 1 \le x, \end{cases} \qquad (11)$$

which looks like two stairs of heights 1 − p and p, respectively, as illustrated below for p = 0.3.
(The graph of this cdf for p = 0.3 is a two-step staircase: it jumps from 0 to 0.7 at x = 0 and from 0.7 to 1 at x = 1.)
5.2 The Binomial Distribution With Parameters n and p

Now suppose that n independent and identical Bernoulli trials are performed, each with success probability p, and let X count the total number of successes. Then X is said to have a binomial distribution with parameters n and p, with pmf

$$P(X = x) = \binom{n}{x}p^x(1-p)^{n-x} \qquad (12a)$$

for x = 0, 1, 2, ..., n, and zero otherwise. Writing X = X1 + X2 + ⋯ + Xn, where the Xk are independent standard Bernoulli random variables, we have

$$E(X) = \sum_{k=1}^{n}E(X_k) = \sum_{k=1}^{n}p$$

or

$$E(X) = np. \qquad (12b)$$

In general, we shall show that

$$V(X) = \sum_{k=1}^{n}V(X_k) = \sum_{k=1}^{n}p(1-p)$$

or

$$V(X) = np(1-p). \qquad (12c)$$
For example, suppose that each chip produced by some process is (independently) nonconforming with probability p = 0.02, that a sample of n = 50 chips is inspected, and that the production process is stopped if the sample contains 3 or more nonconforming chips. Letting X be the number of nonconforming chips in the sample, X is binomial with n = 50 and p = 0.02, and

$$P(X \ge 3) = 1 - \binom{50}{0}(0.02)^0(0.98)^{50} - \binom{50}{1}(0.02)^1(0.98)^{49} - \binom{50}{2}(0.02)^2(0.98)^{48}$$

$$= 1 - 0.364 - 0.372 - 0.186 = 0.078,$$

and so the probability that the production process is stopped by this sampling scheme is approximately 7.8%. Note that the mean number of nonconforming chips is given by

$$E(X) = np = (50)(0.02) = 1$$

and the variance is V(X) = np(1 − p) = (50)(0.02)(0.98) = 0.98.
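A quick Python check of these binomial numbers:

```python
from math import comb

# X ~ Binomial(n=50, p=0.02); probability of 3 or more nonconforming chips
n, p = 50, 0.02
pmf = lambda x: comb(n, x) * p**x * (1 - p)**(n - x)

p_stop = 1 - sum(pmf(x) for x in range(3))
print(round(p_stop, 3))                 # about 0.078
print(n * p, n * p * (1 - p))           # mean 1.0 and variance 0.98
```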
Example #15 - Is This a Fair Game
The game chuck-a-luck is quite popular at many carnivals and gambling casinos. A player bets on one of the numbers 1, 2, 3, 4, 5 and 6. Three fair dice are
then rolled, and if the number bet by the player appears k times (k = 1, 2, 3), then
the player wins k dollars and if the number bet by the player does not appear on
any of the dice, then the player loses 1 dollar (which is the cost for playing the
game). Determine if this game is fair and if not, what should be the charge for
playing the game to make it fair?
To solve this, we assume that the dice are fair and act independently of one
another. Then the number of times that the number bet appears is a binomial
random variable with parameters n = 3 and p = 1/6. Hence, letting X denote
the player's winnings in the game, we find that

$$R_X = \{-1, 1, 2, 3\}$$

with

$$P(X = -1) = \binom{3}{0}\left(\frac{1}{6}\right)^0\left(\frac{5}{6}\right)^3 = \frac{125}{216}, \qquad P(X = 1) = \binom{3}{1}\left(\frac{1}{6}\right)^1\left(\frac{5}{6}\right)^2 = \frac{75}{216},$$

$$P(X = 2) = \binom{3}{2}\left(\frac{1}{6}\right)^2\left(\frac{5}{6}\right)^1 = \frac{15}{216}, \qquad P(X = 3) = \binom{3}{3}\left(\frac{1}{6}\right)^3\left(\frac{5}{6}\right)^0 = \frac{1}{216}.$$
The expected winnings, which is the player's average winnings per play over many plays of the game, is then

$$E(X) = (-1)\frac{125}{216} + 1\cdot\frac{75}{216} + 2\cdot\frac{15}{216} + 3\cdot\frac{1}{216} = -\frac{17}{216}$$

or E(X) ≈ −0.08 < 0, showing that the game is slightly unfair to the player.
A fair game is one in which E(X) = 0. Letting c be the cost for playing this game, it will be fair when

$$E(X) = (-c)\frac{125}{216} + 1\cdot\frac{75}{216} + 2\cdot\frac{15}{216} + 3\cdot\frac{1}{216} = 0,$$

which leads to

$$E(X) = \frac{1}{2} - c\,\frac{125}{216} = 0$$

or

$$c = \frac{108}{125} = 0.864,$$

which says that the game is fair when it costs 86.4¢ to play. Note that the answer here is not just $1.00 − $0.08 = 92¢.
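A short Python sketch confirming the chuck-a-luck calculations in exact arithmetic:

```python
from fractions import Fraction
from math import comb

# Number of matches K ~ Binomial(3, 1/6); winnings are K dollars if K >= 1
# and -1 dollar (the stake) if K = 0.
p_match = Fraction(1, 6)
pk = lambda k: comb(3, k) * p_match**k * (1 - p_match)**(3 - k)

expected = -1 * pk(0) + sum(k * pk(k) for k in (1, 2, 3))
print(expected)                     # -17/216, about -0.08 per play

# Fair entry fee c: solve -c*P(K=0) + sum(k*P(K=k)) = 0
c = sum(k * pk(k) for k in (1, 2, 3)) / pk(0)
print(c, float(c))                  # 108/125 = 0.864
```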
5.3 The Geometric Distribution With Parameter p
The geometric distribution is related to a sequence of independent and identical
Bernoulli trials; the random variable of interest, X, is defined to be the number of trials needed to achieve the first success. This distribution, with parameter 0 ≤ p ≤ 1, is then given by

$$P(X = x) = p(1-p)^{x-1} \qquad (13a)$$

for x = 1, 2, 3, ..., and zero otherwise. This is because

$$P(X = x) = P(FF\cdots FS)$$

with (x − 1) failures (F) and one success (S). The first moment is given by
$$E(X) = \sum_{x=1}^{\infty} x\,p(1-p)^{x-1} = p\sum_{x=1}^{\infty} x(1-p)^{x-1}.$$

To evaluate this, recall the geometric series

$$\sum_{x=1}^{\infty} r^x = \frac{r}{1-r} \qquad (14a)$$

for |r| < 1. Differentiating with respect to r gives

$$\sum_{x=1}^{\infty} x\,r^{x-1} = \frac{d}{dr}\left(\frac{r}{1-r}\right) = \frac{1}{(1-r)^2} \qquad\text{or}\qquad \sum_{x=1}^{\infty} x\,r^{x} = \frac{r}{(1-r)^2}, \qquad (14b)$$

and differentiating once more gives

$$\sum_{x=1}^{\infty} x^2 r^{x-1} = \frac{1+r}{(1-r)^3} \qquad\text{or}\qquad \sum_{x=1}^{\infty} x^2 r^{x} = \frac{(1+r)r}{(1-r)^3}. \qquad (14c)$$

Therefore

$$E(X) = p\sum_{x=1}^{\infty} x(1-p)^{x-1} = p\,\frac{1}{(1-(1-p))^2}$$

or

$$E(X) = \frac{1}{p}. \qquad (13b)$$
The second moment is

$$E(X^2) = \sum_{x=1}^{\infty} x^2\,p(1-p)^{x-1} = p\sum_{x=1}^{\infty} x^2(1-p)^{x-1} = p\,\frac{1+(1-p)}{(1-(1-p))^3} = \frac{2-p}{p^2},$$

and so

$$V(X) = E(X^2) - (E(X))^2 = \frac{2-p}{p^2} - \frac{1}{p^2} = \frac{1-p}{p^2}. \qquad (13c)$$
Note that V(X) = 0 when p = 1, which is reasonable since for p = 1 the first success will always occur on the first trial. However, V(X) → ∞ as p → 0, which is also reasonable since the number of trials needed for the first success then becomes highly variable. A typical plot of a geometric distribution using p = 0.4 shows the probabilities p(x) = 0.4(0.6)^{x−1} falling off geometrically in x.
The cumulative distribution function of the geometric distribution is

$$F(x) = \sum_{k=1}^{x}p(1-p)^{k-1} = p\sum_{k=0}^{x-1}(1-p)^{k} = p\,\frac{1-(1-p)^x}{1-(1-p)},$$

which reduces to

$$F(x) = 1 - (1-p)^x \qquad (13d)$$

for x = 1, 2, 3, ....
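The following Python sketch checks Equations (13b), (13c), and (13d) numerically for p = 0.4, truncating the infinite sums at a point where the tail is negligible:

```python
p = 0.4
pmf = lambda x: p * (1 - p) ** (x - 1)       # Equation (13a)

xs = range(1, 2000)                          # truncation; the tail beyond this is negligible
mean = sum(x * pmf(x) for x in xs)
var = sum(x**2 * pmf(x) for x in xs) - mean**2
print(round(mean, 6), round(var, 6))         # 1/p = 2.5 and (1-p)/p^2 = 3.75

cdf = lambda x: 1 - (1 - p) ** x             # Equation (13d)
print(round(sum(pmf(k) for k in range(1, 6)), 6), round(cdf(5), 6))  # both 0.92224
```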
The geometric distribution has an important memoryless property. For nonnegative integers s and t, note that

$$P(X > s+t\,|\,X > s) = \frac{P(X > s+t \text{ and } X > s)}{P(X > s)} = \frac{P(X > s+t)}{P(X > s)}$$

or

$$P(X > s+t\,|\,X > s) = \frac{1 - P(X \le s+t)}{1 - P(X \le s)} = \frac{1 - F(s+t)}{1 - F(s)}.$$

Since, from Equation (13d),

$$1 - F(x) = (1-p)^x, \qquad (14)$$

we see that

$$P(X > s+t\,|\,X > s) = \frac{(1-p)^{s+t}}{(1-p)^s} = (1-p)^t = 1 - F(t) = P(X > t),$$

so that, given no success in the first s trials, the number of additional trials needed for the first success has the same distribution as the original X.
Conversely, suppose that X is a random variable taking the values 1, 2, 3, ... whose distribution satisfies

$$\frac{P(X > s+t)}{P(X > s)} = P(X > t) \qquad\text{or}\qquad P(X > s+t) = P(X > t)P(X > s).$$

This says that

$$1 - F(s+t) = (1 - F(t))(1 - F(s)) = 1 - F(t) - F(s) + F(t)F(s)$$

or

$$F(t) + F(s) = F(s+t) + F(t)F(s).$$

Replacing t by t − 1 ≥ 0, we have

$$F(t-1) + F(s) = F(s+t-1) + F(t-1)F(s),$$

and subtracting these two equations we get

$$F(t) - F(t-1) = F(s+t) - F(s+t-1) + (F(t) - F(t-1))F(s)$$

or, since P(x) = F(x) − F(x−1),

$$P(t) = P(s+t) + P(t)F(s)$$

or

$$P(s+t) = P(t)(1 - F(s)).$$

Setting s = 1 and writing p ≡ F(1) = P(1), this gives P(t+1) = P(t)(1 − F(1)) = (1−p)P(t), and iterating this relation yields P(x) = (1−p)^{x−1}P(1) for x = 1, 2, 3, .... Then, since Σ P(x) = 1, we have

$$\sum_{x=1}^{\infty}(1-p)^{x-1}P(1) = P(1)\sum_{x=1}^{\infty}(1-p)^{x-1} = \frac{P(1)}{p} = 1,$$

so that P(1) = p and hence P(x) = p(1−p)^{x−1}; the geometric distribution is therefore the only distribution on the positive integers with this memoryless property.
5.4 The Pascal Distribution With Parameters r and p

Now consider a sequence of independent and identical Bernoulli trials, each with success probability p, and let X be the number of trials needed to achieve the rth success. For X to equal x, the first x − 1 trials must contain exactly r − 1 successes, which occurs with probability

$$\binom{x-1}{r-1}p^{r-1}(1-p)^{(x-1)-(r-1)} = \binom{x-1}{r-1}p^{r-1}(1-p)^{x-r},$$
and multiplying this by the probability of the rth success (p), we find that
$$P(X = x) = \binom{x-1}{r-1}p^{r}(1-p)^{x-r} \qquad (15a)$$

gives the probability that it takes x trials to achieve a success for the rth time, with r = 1, 2, 3, ... and x = r, r + 1, r + 2, .... This is called the Pascal distribution with parameters r = 1, 2, 3, ... and 0 ≤ p ≤ 1; it is given by (15a) for x = r, r + 1, r + 2, r + 3, ..., and is zero otherwise. For r = 1, we (of course) get the geometric distribution,

$$P(X = x) = \binom{x-1}{1-1}p^{1}(1-p)^{x-1} = p(1-p)^{x-1}$$

for x = 1, 2, 3, ....
The Negative Binomial Distribution - Optional
The Pascal distribution is also called the negative binomial distribution for
the following reason. Using the binomial coefficient definition

$$\binom{a}{b} = \frac{a(a-1)(a-2)\cdots(a-(b-1))}{b!}, \qquad (16a)$$

we find that

$$\binom{-r}{k} = \frac{(-r)(-r-1)(-r-2)\cdots(-r-(k-1))}{k!} = (-1)^k\,\frac{(r)(r+1)(r+2)\cdots(r+(k-1))}{k!} = (-1)^k\,\frac{(k+r-1)!}{k!\,(r-1)!},$$

which says that

$$\binom{k+r-1}{r-1} = \binom{k+r-1}{k} = (-1)^k\binom{-r}{k}. \qquad (16b)$$

Then
$$P(X = x) = \binom{x-1}{r-1}p^r(1-p)^{x-r} = \binom{(x-r)+r-1}{r-1}p^r(1-p)^{x-r}$$

or simply

$$P(X = x) = \binom{-r}{x-r}(-1)^{x-r}p^r(1-p)^{x-r} = \binom{-r}{x-r}p^r(p-1)^{x-r}, \qquad (15b)$$

which is the reason for the name. Note that

$$\sum_{x=r}^{\infty}P(X = x) = \sum_{x=r}^{\infty}\binom{-r}{x-r}p^r(p-1)^{x-r} = p^r\sum_{x=0}^{\infty}\binom{-r}{x}(p-1)^{x},$$

which, by the (generalized) binomial theorem, reduces to

$$\sum_{x=r}^{\infty}P(X = x) = p^r(1 + p - 1)^{-r} = 1$$
as it should. In addition, the first and second moments can now be computed using

$$E(X - r) = \sum_{x=r}^{\infty}(x-r)P(X = x) = \sum_{x=r}^{\infty}(x-r)\binom{-r}{x-r}p^r(p-1)^{x-r}$$

$$= p^r\sum_{x=0}^{\infty}x\binom{-r}{x}(p-1)^{x} = p^r\sum_{x=1}^{\infty}(-r)\binom{-r-1}{x-1}(p-1)^{x}$$

$$= -r\,p^r(p-1)\sum_{x=1}^{\infty}\binom{-r-1}{x-1}(p-1)^{x-1} = -r\,p^r(p-1)(1+p-1)^{-r-1},$$

resulting in

$$E(X - r) = -r\,\frac{p-1}{p} = r\,\frac{1-p}{p} \qquad\text{so that}\qquad E(X) = r + r\,\frac{1-p}{p} = \frac{r}{p}.$$

In addition, we can look at E((X − r)(X − r − 1)), and the student should show that this leads to

$$E(X^2) = r\,\frac{1-p+r}{p^2}, \qquad (15c)$$

resulting in

$$V(X) = r\,\frac{1-p+r}{p^2} - \frac{r^2}{p^2} = \frac{r(1-p)}{p^2}. \qquad (15d)$$
The Mode of the Pascal Distribution
The mode of the Pascal distribution can be located by looking at the difference between consecutive probabilities,

$$P(x) - P(x-1) = \binom{x-1}{r-1}p^r(1-p)^{x-r} - \binom{x-2}{r-1}p^r(1-p)^{x-1-r}$$

$$= \frac{(x-1)!}{(r-1)!(x-r)!}\,p^r(1-p)^{x-r} - \frac{(x-2)!}{(r-1)!(x-1-r)!}\,p^r(1-p)^{x-1-r}$$

$$= \frac{p^r(1-p)^{x-1-r}(x-2)!}{(r-1)!(x-1-r)!}\left(\frac{(x-1)(1-p)}{(x-r)} - 1\right),$$

which is positive (so that P(x) is still increasing) as long as

$$\frac{(x-1)(1-p)}{(x-r)} - 1 > 0,$$

which says that

$$x < 1 + \frac{r-1}{p},$$

and so the mode of the Pascal distribution is the largest integer satisfying this, namely

$$x_{mode} = \left\lfloor 1 + \frac{r-1}{p}\right\rfloor.$$
Note also that the probability that the rth success occurs within the first r + m − 1 trials is

$$\sum_{x=r}^{r+m-1}\binom{x-1}{r-1}p^r(1-p)^{x-r}.$$

Banach's matchbox problem is a classical application of the Pascal distribution. A pipe smoker carries one matchbox in his left-hand pocket and one in his right-hand pocket, each initially containing N matches, and each time a match is needed he reaches into either pocket with equal probability 1/2. We want the probability that, at the moment one box is first discovered to be empty, exactly k matches remain in the other box. Let E be the event that it is the right-hand box that is first discovered to be empty and that there are k matches in the left-hand pocket at that time. This occurs only if the (N + 1)st choice of the right-hand pocket occurs on trial (N + 1) + (N − k) = 2N − k + 1, and so, using the Pascal distribution with p = 1/2 and r = N + 1,

$$P(E) = \binom{2N-k}{N}\left(\frac{1}{2}\right)^{N+1}\left(\frac{1}{2}\right)^{N-k} = \binom{2N-k}{N}\left(\frac{1}{2}\right)^{2N-k+1}.$$

Since there is an equal probability that it is the left-hand box that is first discovered to be empty and there are k matches in the right-hand pocket at that time, we find

$$P = 2P(E) = \binom{2N-k}{N}\left(\frac{1}{2}\right)^{2N-k}$$

as the desired probability for k = 0, 1, 2, ..., N.
A Relation Between The Pascal and Binomial Distributions
It is interesting to note that if X has a binomial distribution with parameters n
and p so that X is the number of successes in n Bernoulli trials with the probability
of a success being p, and if Y has a Pascal distribution with parameters r and p
so that Y is the number of Bernoulli trials required to obtain r successes with the
probability of a success being p, then the event Y > n is equivalent to the event
X < r. This follows because if the number of Bernoulli trials required to obtain
r successes is larger than n, then the number of successes in n Bernoulli trials
must have been less than r, and conversely. Therefore these events must occur
with the same probability, so that

$$P(Y > n) = P(X < r) \qquad\text{and}\qquad P(Y \le n) = P(X \ge r),$$

since

$$P(Y \le n) = 1 - P(Y > n) \qquad\text{and}\qquad P(X \ge r) = 1 - P(X < r).$$

This allows one to use binomial-distribution probabilities to evaluate probabilities associated with the Pascal distribution.
Example #20
If we want to evaluate the probability that more than 10 repetitions are required to obtain the 3rd success with p = 0.2, then

$$P(Y > 10) = P(X < 3) = \sum_{k=0}^{2}\binom{10}{k}(0.2)^k(0.8)^{10-k} = 0.678.$$
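A short Python check of Example #20, done both through the binomial identity and directly from the Pascal pmf (15a):

```python
from math import comb

# P(Y > 10) for a Pascal(r=3, p=0.2) waiting time, via P(Y > n) = P(X < r)
# with X ~ Binomial(n, p).
n, r, p = 10, 3, 0.2
p_more_than_10 = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r))
print(round(p_more_than_10, 3))   # 0.678

# Direct check from the Pascal pmf (15a), summing P(Y = x) for x = r..n
p_within_10 = sum(comb(x - 1, r - 1) * p**r * (1 - p)**(x - r) for x in range(r, n + 1))
print(round(1 - p_within_10, 3))  # also 0.678
```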
5.5 The Hypergeometric Distribution With Parameters g, b and n

Suppose that a collection of g + b items contains g good items and b bad items, and that n of these items are selected at random (without replacement). Let X denote the number of good items selected. The x good items can be chosen in C(g, x) ways and the remaining n − x bad items in C(b, n − x) ways, so the event X = x can occur in

$$\binom{g}{x}\binom{b}{n-x}$$

possible ways, and since there are

$$\binom{g+b}{n}$$

total ways to choose the n items from the g + b items (all events being equally likely), we must find that

$$P(X = x) = \frac{\binom{g}{x}\binom{b}{n-x}}{\binom{g+b}{n}} \qquad (17a)$$

and

$$\sum_{x=x_{\min}}^{x_{\max}}\binom{g}{x}\binom{b}{n-x} = \binom{g+b}{n},$$

where x_min = max(0, n − b) and x_max = min(n, g), so that the pmf sums to one. In addition,

$$E(X) = n\,\frac{g}{g+b} \qquad (17b)$$

and

$$V(X) = n\,\frac{g}{g+b}\,\frac{b}{g+b}\,\frac{g+b-n}{g+b-1} \qquad (17c)$$

give the mean and variance of the distribution. Note that V is at a maximum at

$$n = \frac{g+b}{2}$$
when g + b is even.
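A small Python sketch (with illustrative values g = 6, b = 14, n = 5, not from the text) checks Equations (17a), (17b), and (17c):

```python
from math import comb

def hyper_pmf(x, g, b, n):
    # Equation (17a): x good items among n drawn from g good and b bad
    return comb(g, x) * comb(b, n - x) / comb(g + b, n)

g, b, n = 6, 14, 5          # illustrative values
xs = range(max(0, n - b), min(n, g) + 1)
mean = sum(x * hyper_pmf(x, g, b, n) for x in xs)
var = sum(x**2 * hyper_pmf(x, g, b, n) for x in xs) - mean**2
print(round(mean, 6), round(var, 6))
print(n * g / (g + b),                                   # (17b)
      n * (g / (g + b)) * (b / (g + b)) * (g + b - n) / (g + b - 1))  # (17c)
```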
The Mode of the Hypergeometric Distribution
Note that the mode of the hypergeometric distribution can be estimated using
the fact that

$$\frac{P(X = x)}{P(X = x-1)} = \frac{\binom{g}{x}\binom{b}{n-x}\big/\binom{g+b}{n}}{\binom{g}{x-1}\binom{b}{n-x+1}\big/\binom{g+b}{n}} = \frac{g!\,b!\,(x-1)!\,(g-x+1)!\,(n-x+1)!\,(b-n+x-1)!}{x!\,(g-x)!\,(n-x)!\,(b-n+x)!\,g!\,b!},$$

which reduces to

$$\frac{P(X = x)}{P(X = x-1)} = \frac{(g-x+1)(n-x+1)}{x(b-n+x)},$$

and this ratio is greater than one (so that the pmf is still increasing) as long as

$$(g-x+1)(n-x+1) > x(b-n+x), \qquad\text{that is,}\qquad x < \frac{(n+1)(g+1)}{g+b+2}.$$

This says that x_mode is the largest integer less than or equal to this value of x, which is the floor of this function, and so

$$x_{mode} = \left\lfloor\frac{(n+1)(g+1)}{g+b+2}\right\rfloor. \qquad (17d)$$
Example #21
Suppose that we wish to estimate the number N of animals living in a certain region. To do this, g of the animals are first caught, marked, and released back into the region; after the marked animals have had time to disperse, a second catch of n animals is made. Letting X denote the number of marked animals in this second capture, we assume that the population of animals in the region remains fixed between the two catches and that each time an animal is caught it is equally likely to be any of the remaining un-caught animals; it follows that X is hypergeometric with g marked animals and b unmarked animals, where N = g + b. Assuming that the new catch contains the value x of X having the highest probability of occurring (maximum likelihood), so that x_mode = x, we then have

$$x_{mode} = \left\lfloor\frac{(n+1)(g+1)}{g+b+2}\right\rfloor = \left\lfloor\frac{(n+1)(g+1)}{N+2}\right\rfloor,$$

which says that

$$x_{mode} \le \frac{(n+1)(g+1)}{N+2} < x_{mode} + 1$$

or

$$\frac{(n+1)(g+1)}{x_{mode}+1} - 2 < N \le \frac{(n+1)(g+1)}{x_{mode}} - 2,$$

and then the average of these, namely

$$N = \frac{(n+1)(g+1)}{2}\left(\frac{1}{x_{mode}} + \frac{1}{x_{mode}+1}\right) - 2,$$

can be used to estimate the value of N. For example, suppose that the initial catch of animals is g = 50, which are marked and then released. If a subsequent catch consisting of n = 40 animals has x = 4 previously marked, and assuming that what we caught the second time around occurs with the highest probability and is therefore most likely (maximum likelihood), then x_mode = 4 and

$$N = \frac{(40+1)(50+1)}{2}\left(\frac{1}{4} + \frac{1}{4+1}\right) - 2 \simeq 468.475,$$

showing that there are most likely around 468 animals in the region.
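A quick Python check of the population estimate:

```python
# Capture-recapture estimate of the population size N:
# g marked animals, a second catch of n animals containing x_mode marked ones.
g, n, x_mode = 50, 40, 4

lower = (n + 1) * (g + 1) / (x_mode + 1) - 2
upper = (n + 1) * (g + 1) / x_mode - 2
estimate = 0.5 * (lower + upper)
print(round(lower, 3), round(upper, 3), round(estimate, 3))   # 416.2  520.75  468.475
```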
Non-Independent Bernoulli Trials
Note that the hypergeometric distribution can be viewed as non-independent,
non-identical Bernoulli trials, resulting in the value of p not being constant. For
example, thinking of good as a success and bad as a failure, the first draw has
$$p_1 = \frac{g}{g+b}$$

as the probability of success. The second draw has

$$p_2 = \frac{g}{g+b-1}$$

as the probability of success if the first draw was a bad item (failure), and it has

$$p_2 = \frac{g-1}{g+b-1}$$

as the probability of success if the first draw was a good item (success). The third draw has

$$p_3 = \frac{g}{g+b-2}$$

as the probability of success if both of the first two draws were bad items (failures); it has

$$p_3 = \frac{g-1}{g+b-2}$$

as the probability of success if one of the first two draws was a good item (success) and the other was a bad item (failure); and it has

$$p_3 = \frac{g-2}{g+b-2}$$

as the probability of success if both of the first two draws were good items (successes). This can be continued until either you run out of good items or you run
out of bad items.
Example #22
Small electric motors are shipped in lots of g + b = 50. Before such a shipment
is accepted, an inspector chooses n = 5 of these motors and inspects them. If
none of these motors tested are found to be defective, the lot is accepted. If one
or more are found to be defective, the entire shipment is inspected. Suppose that
there are, in fact, three defective motors in the lot. Determine the probability
that 100 percent inspection is required.
To answer this we first note that a motor is either defective (g = 3) or nondefective (b = 47). Letting X be the number of defective motors found in the sample of n = 5, 100% inspection is required whenever X ≥ 1, and

$$P(X \ge 1) = 1 - P(X = 0) = 1 - \frac{\binom{3}{0}\binom{47}{5}}{\binom{50}{5}} = 0.28,$$

so there is about a 28% chance that the entire shipment will have to be inspected.
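A one-line Python check of this hypergeometric probability:

```python
from math import comb

# Example #22: lot of 50 motors with 3 defective; sample 5 without replacement.
g, b, n = 3, 47, 5
p_full_inspection = 1 - comb(g, 0) * comb(b, n) / comb(g + b, n)
print(round(p_full_inspection, 3))   # about 0.276, i.e. roughly 28%
```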
1
n
(18a)
(18b)
x
X
1
k=1
x
n
(18c)
or
n+1
2
and the second moment of this distribution is
(18d)
E(X) =
1 X 2 1 n(n + 1)(2n + 1)
(n + 1)(2n + 1)
E(X ) =
k =
k =
=
n n k=1
n
6
6
k=1
2
n
X
21
resulting in a variance of
(n + 1)(2n + 1)
39
n+1
2
which reduces to
V (X) =
n2 1
,
12
(18e)
for n = 1, 2, 3, ....
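A quick Python check of Equations (18d) and (18e) for n = 10:

```python
n = 10
values = range(1, n + 1)
mean = sum(values) / n
second = sum(x * x for x in values) / n
var = second - mean**2
print(mean, (n + 1) / 2)                  # 5.5 and 5.5
print(round(var, 4), (n * n - 1) / 12)    # 8.25 and 8.25
```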
5.7 The Poisson Distribution With Parameter λ
The Poisson probability mass function with parameter λ > 0 is given by

$$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!} \qquad (19a)$$

for x = 0, 1, 2, 3, ..., and zero otherwise. Note that

$$0 \le \frac{e^{-\lambda}\lambda^x}{x!} \le 1$$

and

$$\sum_{x=0}^{\infty}P(X = x) = \sum_{x=0}^{\infty}\frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda}\sum_{x=0}^{\infty}\frac{\lambda^x}{x!} = e^{-\lambda}e^{\lambda} = 1.$$

The mean is

$$E(X) = \sum_{x=0}^{\infty}x\,\frac{e^{-\lambda}\lambda^x}{x!} = \sum_{x=1}^{\infty}\frac{e^{-\lambda}\lambda^x}{(x-1)!} = \lambda e^{-\lambda}\sum_{x=0}^{\infty}\frac{\lambda^{x}}{x!} = \lambda e^{-\lambda}e^{\lambda},$$

so that

$$E(X) = \lambda, \qquad (19b)$$

and the second moment is

$$E(X^2) = \sum_{x=0}^{\infty}x^2\,\frac{e^{-\lambda}\lambda^x}{x!} = \lambda e^{-\lambda}\sum_{x=1}^{\infty}x\,\frac{\lambda^{x-1}}{(x-1)!} = \lambda e^{-\lambda}\sum_{x=0}^{\infty}(x+1)\frac{\lambda^{x}}{x!}$$

$$= \lambda e^{-\lambda}\sum_{x=0}^{\infty}x\,\frac{\lambda^{x}}{x!} + \lambda e^{-\lambda}\sum_{x=0}^{\infty}\frac{\lambda^{x}}{x!} = \lambda e^{-\lambda}\sum_{x=1}^{\infty}\frac{\lambda^{x}}{(x-1)!} + \lambda e^{-\lambda}e^{\lambda} = \lambda e^{-\lambda}\,\lambda e^{\lambda} + \lambda = \lambda^2 + \lambda,$$

so that the variance is

$$V(X) = E(X^2) - (E(X))^2 = \lambda^2 + \lambda - \lambda^2 = \lambda. \qquad (19c)$$
The cumulative distribution function of the Poisson distribution is

$$F(x) = P(X \le x) = \sum_{k=0}^{x}\frac{e^{-\lambda}\lambda^k}{k!}$$

for x = 0, 1, 2, 3, ....
Example #23
If X is Poisson with parameter λ = 2, then the probability that X is at least 2 is

$$P(X \ge 2) = 1 - \frac{e^{-2}2^0}{0!} - \frac{e^{-2}2^1}{1!} = 0.594$$

or P(X ≥ 2) = 59.4%.
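A short Python check of this computation and of Equations (19b) and (19c) with λ = 2:

```python
from math import exp, factorial

lam = 2.0
poisson_pmf = lambda k: exp(-lam) * lam**k / factorial(k)   # Equation (19a)

p_at_least_2 = 1 - poisson_pmf(0) - poisson_pmf(1)
print(round(p_at_least_2, 3))    # 0.594

# mean and variance check against (19b) and (19c)
mean = sum(k * poisson_pmf(k) for k in range(200))
var = sum(k * k * poisson_pmf(k) for k in range(200)) - mean**2
print(round(mean, 6), round(var, 6))   # both approximately 2.0
```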
Example #24: Lead-Time Demand
The lead time in an inventory problem is defined as the time between placing
of an order for restocking the inventory of a certain item and the receipt of that
order. During this lead time, additional orders for this item may occur, resulting
in the item's inventory going to zero (known as a stockout) or the item's inventory going below zero (known as a back order). The lead-time demand in an inventory
system is this accumulation of demand for an item from the point at which an
order is placed until the order is received, i.e.,

$$L = \sum_{i=1}^{T}D_i,$$

where L is the total lead-time demand, Di is the demand during the ith time period of the lead time, and T is the number of time periods that make up the lead time. Note that in general the Di's
An inventory manager desires that the probability of a stockout not exceed
a certain value during the lead time. For example, it may be required that the
probability of a stockout during the lead time not exceed 5%. If the lead-time demand is Poisson distributed, the reorder point can be determined so that the lead-time demand does not exceed a specified value with a specified protection probability.
For example, assume that the lead-time demand is Poisson with mean λ = 10 units demanded and that 95% protection from a stockout is desired. Then we would like to determine the smallest value of x such that the probability that the lead-time demand does not exceed x is greater than or equal to 0.95, i.e.,

$$P(X \le x) = F(x) = \sum_{k=0}^{x}\frac{e^{-10}10^k}{k!} \ge 0.95.$$
Using a table of values for F(x) or just trial-and-error leads to F(15) = 0.951 ≥ 0.95, so that x = 15, which means that reorders should be made when the inventory reaches 15 units.
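The reorder point can also be found with a few lines of Python:

```python
from math import exp, factorial

lam, protection = 10.0, 0.95
pmf = lambda k: exp(-lam) * lam**k / factorial(k)

x, cdf = 0, pmf(0)
while cdf < protection:          # find the smallest x with F(x) >= 0.95
    x += 1
    cdf += pmf(x)
print(x, round(cdf, 4))          # 15  0.9513
```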
The Sum of Independent Discrete Variables - Convolution Sums
Suppose that X and Y are independent discrete random variables with pmfs
p(x) and q(y), respectively, and suppose that Z = X + Y , then to determine the
pmf of z, we use the Theorem of total probability and write
X
X
P (Z = z) =
P (Z = z|X = x)P (X = x) =
P (Y = z x|X = x)P (X = x)
x
42
X
x
P (X = x)P (Y = z x)
(20a)
(20b)
For example, suppose that X and Y are each uniform on {1, 2, ..., n} (the roll of two fair n-sided dice), so that p(x) = q(x) = 1/n for x = 1, 2, ..., n. Then, for z = 2, 3, ..., 2n,

$$r_n(z) = \sum_{x=\max(1,z-n)}^{\min(n,z-1)}p(x)\,q(z-x) = \sum_{x=\max(1,z-n)}^{\min(n,z-1)}\frac{1}{n}\cdot\frac{1}{n} = \frac{\min(n,z-1) - \max(1,z-n) + 1}{n^2}$$

or

$$r_n(z) = \frac{1}{n^2}\begin{cases} z-1, & \text{for } z = 2, 3, 4, ..., n \\ 2n-z+1, & \text{for } z = n+1, n+2, n+3, ..., 2n, \end{cases}$$

and since both cases give the same value n/n² at z = n + 1, this may also be written as

$$r_n(z) = \frac{1}{n^2}\begin{cases} z-1, & \text{for } z = 2, 3, 4, ..., n, n+1 \\ 2n-z+1, & \text{for } z = n+1, n+2, n+3, ..., 2n. \end{cases}$$

In the case when n = 6, which is the recording of the sum of the roll of two 6-sided dice, we have

$$r_6(z) = \frac{1}{36}\begin{cases} z-1, & \text{for } z = 2, 3, 4, 5, 6, 7 \\ 13-z, & \text{for } z = 7, 8, 9, 10, 11, 12. \end{cases}$$
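A short Python sketch that forms the convolution sum (20b) for two fair six-sided dice:

```python
from fractions import Fraction

n = 6
p = {x: Fraction(1, n) for x in range(1, n + 1)}     # one fair n-sided die

# convolution sum (20b): r(z) = sum over x of p(x) q(z - x)
r = {}
for z in range(2, 2 * n + 1):
    r[z] = sum(p[x] * p[z - x] for x in p if (z - x) in p)

print(r[2], r[7], r[12])    # 1/36, 6/36, 1/36
print(sum(r.values()))      # 1
```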
As an important special case, suppose that X1 and X2 are independent Poisson random variables with parameters λ1 and λ2, respectively, and let X = X1 + X2. Then, for x = 0, 1, 2, 3, ...,

$$P(X = x) = \sum_{x_1=0}^{x}P(X = x\,|\,X_1 = x_1)P(X_1 = x_1) = \sum_{x_1=0}^{x}P(X_2 = x - x_1)P(X_1 = x_1)$$

$$= \sum_{x_1=0}^{x}\frac{e^{-\lambda_2}\lambda_2^{x-x_1}}{(x-x_1)!}\,\frac{e^{-\lambda_1}\lambda_1^{x_1}}{x_1!} = \frac{e^{-(\lambda_1+\lambda_2)}}{x!}\sum_{x_1=0}^{x}\frac{x!}{x_1!\,(x-x_1)!}\,\lambda_1^{x_1}\lambda_2^{x-x_1} = \frac{e^{-(\lambda_1+\lambda_2)}}{x!}\sum_{x_1=0}^{x}\binom{x}{x_1}\lambda_1^{x_1}\lambda_2^{x-x_1},$$

and so, by the binomial theorem, we have

$$P(X = x) = \frac{e^{-(\lambda_1+\lambda_2)}(\lambda_1+\lambda_2)^x}{x!},$$
which shows that X = X1 + X2 is also Poisson with parameter λ1 + λ2. In general, if X1, X2, X3, ..., Xn are all Poisson with parameters λ1, λ2, λ3, ..., λn, then

$$X = X_1 + X_2 + X_3 + \cdots + X_n$$

is Poisson with parameter

$$\lambda = \lambda_1 + \lambda_2 + \lambda_3 + \cdots + \lambda_n.$$
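A quick numerical check of this reproductive property, using illustrative rates λ1 = 1.5 and λ2 = 2.5 (not from the text):

```python
from math import exp, factorial

lam1, lam2 = 1.5, 2.5
pois = lambda lam, k: exp(-lam) * lam**k / factorial(k)

# convolve the two Poisson pmfs and compare with Poisson(lam1 + lam2)
for x in range(6):
    conv = sum(pois(lam1, k) * pois(lam2, x - k) for k in range(x + 1))
    print(x, round(conv, 6), round(pois(lam1 + lam2, x), 6))   # the two columns agree
```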
A practical and extremely important application of the Poisson distribution is now discussed.
6. A Stationary Poisson Process Having Constant Rate λ
Consider a sequence of random events such as the arrival of units at a shop, or
customers arriving at a bank, or web-site hits on the internet. These events may
be described by a counting function N(t) (defined for all t ≥ 0), which equals the
45
number of events that occur in the closed time interval [0, t]. We assume that
t = 0 is the point at which the observations begin, whether or not an arrival
occurs at that instant. Note that N(t) is a random variable and the possible
values of N(t) (i.e., its range set) are the non-negative integers: 0, 1, 2, 3, . . ., for
t a non-negative parameter.
A counting process N(t) is called a Poisson process with mean rate λ (per unit
time) if the following assumptions are fulfilled.
A1: Arrivals occur one at a time: This implies that the probability of 2 or more arrivals in a very small (i.e., infinitesimal) time interval Δt is zero compared to the probability of 1 or fewer arrivals occurring in the same time interval Δt.
A2: N(t) has stationary increments: The distribution of the number of arrivals between t and t + Δt depends only on the length of the interval Δt and not on the starting point t. Thus, arrivals are completely random without rush or slack periods. In addition, the probability that a single arrival occurs in a small time interval Δt is proportional to Δt and given by λΔt, where λ is the mean arrival rate (per unit time).
A3: N(t) has independent increments: The numbers of arrivals during nonoverlapping time intervals are independent random variables. Thus, a large or small number of arrivals in one time interval has no effect on the number of arrivals in subsequent time intervals. Future arrivals occur completely at random, independent of the number of arrivals in past time intervals.
Given that arrivals occur according to a Poisson process, (i.e., meeting the three
assumptions A1, A2, and A3), let us derive an expression for the probability that
n arrivals (n = 0, 1, 2, 3, . . . ,) occur in the time interval [0, t]. We shall denote this
probability by Pn (t), so that
$$P_n(t) = P(N(t) = n), \qquad (21a)$$

where, for each t ≥ 0,

$$\sum_{n=0}^{\infty}P_n(t) = 1. \qquad (21b)$$

In particular, for a small time interval Δt,

$$1 = P_0(\Delta t) + P_1(\Delta t) + \sum_{n=2}^{\infty}P_n(\Delta t) \qquad\text{or}\qquad P_0(\Delta t) = 1 - P_1(\Delta t) - \sum_{n=2}^{\infty}P_n(\Delta t).$$

From assumption A2, we may say that P1(Δt) ≈ λΔt for some arrival rate λ, and from assumption A1 we have Pn(Δt) ≈ 0 for n ≥ 2. This leads to P0(Δt) ≈ 1 − λΔt for small Δt. Now, no arrivals by time t + Δt means no arrivals by time t and no arrivals between t and t + Δt. Therefore, we may write that
$$P_0(t + \Delta t) = P(N(t+\Delta t) = 0) = P\bigl((N(t) = 0) \cap (N(t+\Delta t) - N(t) = 0)\bigr)$$

which, because of independence (assumption A3), can be written as

$$P_0(t + \Delta t) = P(N(t) = 0)\,P(N(t+\Delta t) - N(t) = 0)$$

since [0, t] and [t, t + Δt] are non-overlapping time intervals. Therefore we have

$$P_0(t + \Delta t) \simeq P_0(t)P_0(\Delta t) = P_0(t)(1 - \lambda\Delta t),$$

which yields

$$\frac{P_0(t + \Delta t) - P_0(t)}{\Delta t} \simeq -\lambda P_0(t).$$

In the limit as Δt → 0, this becomes

$$\lim_{\Delta t\to 0}\frac{P_0(t + \Delta t) - P_0(t)}{\Delta t} = \frac{dP_0(t)}{dt} = -\lambda P_0(t). \qquad (22)$$

Then

$$\frac{dP_0(t)}{dt} = -\lambda P_0(t) \qquad\text{or}\qquad \frac{dP_0(t)}{P_0(t)} = -\lambda\,dt$$

or

$$\ln(P_0(t)) = -\lambda t + C_1$$

or simply

$$P_0(t) = e^{C_1 - \lambda t} = e^{C_1}e^{-\lambda t} = Ae^{-\lambda t}$$

with A = e^{C_1}. Now Pn(0) = 0 for n ≥ 1, since no arrivals can occur if no time has elapsed, and hence

$$P_0(0) = 1 - \sum_{n=1}^{\infty}P_n(0) = 1 - 0 = 1.$$

Therefore we may say that P0(0) = Ae^0 = A = 1, and so we finally find that

$$P_0(t) = e^{-\lambda t} = P(N(t) = 0). \qquad (23)$$
(23)
x=0
n
X
x=0
n
X
x=0
n2
X
x=0
Using assumption A1, we have Pnx (t) ' 0 for small t and x n2, P1 (t) '
t, and P0 (t) ' 1 t. Therefore we see that
Pn (t + t) =
n2
X
x=0
'
or
n2
X
x=0
for n 1 and t 0.
In the limit as Δt → 0, this becomes the differential equation

$$\frac{dP_n(t)}{dt} + \lambda P_n(t) = \lambda P_{n-1}(t).$$

If we take this equation
and multiply both sides by e^{λt}, we get

$$e^{\lambda t}\frac{dP_n(t)}{dt} + \lambda e^{\lambda t}P_n(t) = \lambda e^{\lambda t}P_{n-1}(t)$$

or simply

$$\frac{d}{dt}\left(e^{\lambda t}P_n(t)\right) = \lambda e^{\lambda t}P_{n-1}(t),$$

which (after integrating) leads to

$$e^{\lambda t}P_n(t) = C_2 + \int_0^t \lambda e^{\lambda z}P_{n-1}(z)\,dz$$

or, since Pn(0) = 0 for n ≥ 1 forces C2 = 0,

$$e^{\lambda t}P_n(t) = \int_0^t \lambda e^{\lambda z}P_{n-1}(z)\,dz. \qquad (24)$$

Using P0(t) = e^{−λt} in Equation (24) with n = 1 gives e^{λt}P1(t) = ∫₀ᵗ λ dz = λt, so that P1(t) = λt e^{−λt}; using this result with n = 2 gives e^{λt}P2(t) = ∫₀ᵗ λ(λz) dz = (λt)²/2, or simply

$$P_2(t) = \frac{e^{-\lambda t}(\lambda t)^2}{2}.$$
50
et (t)n
n!
(25)
for n 0 and t 0.
Therefore we see that if arrivals occur according to a Poisson process, meeting
the three assumptions A1, A2, and A3, the probability that N(t) is equal to n,
i.e., the probability that n arrivals occur in the time interval [0, t], is given by
P (N(t) = n) = Pn (t) =
et (t)n
n!
(26a)
(26b)
Note that for any times t and s with s < t, assumption A2 implies that the
random variable N(t) N(s), representing the number of arrivals in the interval
[s, t], is also Poisson with parameter (t s) so that
P (N(t) N(s) = n) =
(27)
for n = 0, 1, 2, 3, . . ., and t s 0.
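A rough simulation sketch of assumptions A1-A3 (Bernoulli arrivals in small time slices; all parameter values below are arbitrary choices for illustration) can be compared against Equation (26a):

```python
import random
from math import exp, factorial

random.seed(1)
lam, t, dt, runs = 2.0, 3.0, 0.01, 4000   # arbitrary illustrative values
steps = int(t / dt)

# In each small slice of length dt there is an arrival with probability lam*dt (A1, A2),
# independently of all other slices (A3); smaller dt improves the approximation.
counts = [sum(random.random() < lam * dt for _ in range(steps)) for _ in range(runs)]

for n in range(12):
    empirical = counts.count(n) / runs
    theory = exp(-lam * t) * (lam * t) ** n / factorial(n)   # Equation (26a)
    print(n, round(empirical, 3), round(theory, 3))
```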
Some Properties of a Poisson Process
There are two important properties of the Poisson process known as Random
Splitting and Random Pooling which we now discuss.
Random Splitting
Consider a Poisson process N(t) having rate λ. Suppose that each time an event occurs it is classified as either: type 1 (with probability p1), type 2 (with probability p2), ..., type k (with probability pk), where of course

$$p_1 + p_2 + p_3 + \cdots + p_k = 1. \qquad (28a)$$

If Nj(t) counts the type j events that occur by time t, then

$$N(t) = N_1(t) + N_2(t) + \cdots + N_k(t) \qquad (28b)$$

and the Nj(t), for j = 1, 2, 3, ..., k, are all Poisson processes having rates pjλ, respectively. Furthermore, each of these is independent of the others.
If the arrival of customers at a bank is a Poisson process with parameter λ, we may break these customers up into disjoint classes (e.g., male and female, or customers younger than 30, between 31 and 50, and older than 51), and the separate classes will each form a Poisson process with rate pλ, where p is the probability that a customer belongs to that particular class. For example, p = 1/2 in the case of male and female classes.
Random Pooling
Consider k different independent Poisson processes Nj(t), for j = 1, 2, 3, ..., k, having rates λj, respectively. If

$$N(t) = N_1(t) + N_2(t) + \cdots + N_k(t), \qquad (29a)$$

then, using the reproductive property of the Poisson distribution shown earlier in this chapter, we must have N(t) being a Poisson process with rate

$$\lambda = \lambda_1 + \lambda_2 + \cdots + \lambda_k. \qquad (29b)$$
In a nonstationary Poisson process (NSPP), the arrival rate is allowed to vary with time; we write it as λ_NSPP(t), and the expected number of arrivals by time t is Λ_NSPP(t) = ∫₀ᵗ λ_NSPP(s) ds. Note that when λ_NSPP(s) = λ (a constant), this gives Λ_NSPP(t) = λt, which is what we had earlier for a Stationary Poisson Process (SPP). To be useful as an arrival-rate function, λ_NSPP(t) must be nonnegative and integrable. Note also that the average rate over [0, t] is

$$\bar\lambda_{NSPP}(t) = \frac{1}{t}\Lambda_{NSPP}(t) = \frac{1}{t}\int_0^t\lambda_{NSPP}(s)\,ds, \qquad\text{so that}\qquad \Lambda_{NSPP}(t) = \bar\lambda_{NSPP}(t)\,t. \qquad (32)$$
Now let N(t) be the arrival (counting) function for an SPP and let N*(t) be the arrival function for an NSPP. The fundamental assumption for working with NSPPs is that

$$P(N^*(t) = n\,|\,\lambda(t)) = P(N(t) = n\,|\,\bar\lambda) \qquad\text{with}\qquad \bar\lambda = \frac{1}{t}\int_0^t\lambda(s)\,ds,$$

and since

$$P(N(t) = n\,|\,\bar\lambda) = \frac{e^{-\bar\lambda t}(\bar\lambda t)^n}{n!},$$

we see that

$$P(N^*(t) = n\,|\,\lambda(t)) = \frac{e^{-\bar\lambda t}(\bar\lambda t)^n}{n!} \qquad\text{with}\qquad \bar\lambda t = \int_0^t\lambda(s)\,ds = \Lambda(t),$$

or just

$$P(N^*(t) = n\,|\,\lambda(t)) = \frac{e^{-\Lambda(t)}(\Lambda(t))^n}{n!} \qquad\text{with}\qquad \Lambda(t) = \int_0^t\lambda(s)\,ds. \qquad (33)$$

Similarly, since

$$P(N(b) - N(a) = n\,|\,\bar\lambda) = \frac{e^{-\bar\lambda(b-a)}(\bar\lambda(b-a))^n}{n!} \qquad\text{with}\qquad \bar\lambda = \frac{1}{b-a}\int_a^b\lambda(s)\,ds,$$

we have

$$P(N^*(b) - N^*(a) = n) = \frac{e^{-\Lambda}\Lambda^n}{n!} \qquad\text{with}\qquad \Lambda = \int_a^b\lambda(s)\,ds.$$
where time t is in units of hours. The first thing we must do is get the units to mesh properly, and so we convert λ(t) to a per-hour rate (so that, for instance, a rate of 2 arrivals per minute becomes 120 arrivals per hour). Integrating the per-hour rate function over the interval from t = 3 to t = 6 hours then gives

$$\Lambda = \int_3^6\lambda(s)\,ds = \int 120\,ds + \cdots = 180,$$

so that

$$P(N^*(6) - N^*(3) = n) = \frac{e^{-180}(180)^n}{n!}$$

for n = 0, 1, 2, 3, ..., which is a Poisson distribution with Λ = 180.
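Since the example's actual rate function is not reproduced above, the sketch below uses a made-up piecewise rate purely to illustrate how Λ and Equation (33) would be evaluated in Python:

```python
from math import exp, log, lgamma

# Hypothetical piecewise arrival-rate function (per hour), for illustration only;
# it is not the rate function used in the example above.
def lam(t):
    return 120.0 if t < 4.0 else 40.0

def Lambda(a, b, steps=60000):
    # midpoint-rule numerical integration of lam over [a, b]
    h = (b - a) / steps
    return sum(lam(a + (i + 0.5) * h) for i in range(steps)) * h

L = Lambda(3.0, 6.0)                      # expected number of arrivals in [3, 6]

def prob(n):
    # P(N*(6) - N*(3) = n) from Equation (33), computed on the log scale
    return exp(-L + n * log(L) - lgamma(n + 1))

print(round(L, 2), round(prob(200), 4))   # 200.0 and the probability of exactly 200 arrivals
```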