Lecture03 Discrete Random Variables Ver1
Ping Yu
Random Variables
Probability Distributions for Discrete Random Variables
Properties of Discrete Random Variables
Binomial Distribution
Poisson Distribution
Hypergeometric Distribution
Jointly Distributed Discrete Random Variables
Random Variables
A random variable (r.v.) is a variable that takes on numerical values realized by the
outcomes in the sample space generated by a random experiment.
- Mathematically, a random variable is a function from S to R.
- In this and next lectures, we use capital letters, such as X , to denote the random
variable, and the corresponding lowercase letter, x, to denote a possible value.
A discrete random variable is a random variable that can take on no more than a
countable number of values.
- e.g., the number of customers visiting a store in one day, the number of claims on
a medical insurance policy in a particular year, etc.
- "Countable" includes "finite" and "countably infinite".
A continuous random variable is a random variable that can take any value in an
interval (i.e., for any two values, there is some third value that lies between them).
- e.g., the yearly income for a family, the highest temperature in one day, etc.
- The probability can only be assigned to a range of values since the probability of
a single value is always zero.
Recall the distinction between discrete numerical variables and continuous
numerical variables in Lecture 1.
Modeling a r.v. as continuous is usually for convenience as the differences
between adjacent discrete values (e.g., $35,276.21 and $35,276.22) are of no
importance.
On the other hand, we model a r.v. as discrete when probability statements about
the individual possible outcomes have worthwhile meaning.
Notation: I will use p(x) and p (rather than P(x) and P as in the textbook) for the pmf
and a probability of interest, to avoid confusion with the probability symbol P.
The cumulative distribution function (cdf) of X is

F(x₀) = P(X ≤ x₀).

- The definition of the cdf applies to both discrete and continuous r.v.'s, and x₀ ∈ R.
- F(x₀) for a discrete r.v. is a step function with jumps only at support points in S.
[figure here]
- p(·) and F(·) are probabilistic counterparts of the histogram and ogive in Lecture 1.
Relationship between pmf and cdf for discrete r.v.’s:
F(x₀) = ∑_{x ≤ x₀} p(x).
From the definition of the cdf, we have (i) 0 ≤ F(x₀) ≤ 1 for any x₀; (ii) if x₀ < x₁, then
F(x₀) ≤ F(x₁), i.e., F(·) is a (weakly) increasing function.
From the figure on the next slide, we can also see (iii) F(x₀) is right continuous,
i.e., lim_{x↓x₀} F(x) = F(x₀); (iv) lim_{x₀→−∞} F(x₀) = 0 and lim_{x₀→∞} F(x₀) = 1.
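A minimal sketch of the pmf-cdf relationship and these properties (Python with numpy; the support points and probabilities below are made up for illustration):

```python
import numpy as np

# Hypothetical pmf on support {0, 1, 2, 3}; probabilities are illustrative only.
support = np.array([0, 1, 2, 3])
pmf = np.array([0.1, 0.2, 0.4, 0.3])   # must sum to 1

cdf = np.cumsum(pmf)                   # F at each support point: sum of p(x) for x <= x0

def F(x0):
    """Step-function cdf: cumulative probability at the largest support point <= x0."""
    mask = support <= x0
    return cdf[mask][-1] if mask.any() else 0.0

print(F(-1), F(1.5), F(3))             # 0.0, 0.3, 1.0 (up to rounding)
```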
Example Continued
[figure here: cdf of the example discrete r.v., a step function over x from −1 to 4 with levels 0, 0.1, 0.3, 0.7]
Mean
The pmf contains all information about the probability properties of a discrete r.v.,
but it is desirable to have some summary measures of the pmf’s characteristics.
The mean (or expected value, or expectation), E [X ], of a discrete r.v. X is defined
as
E[X] = μ = ∑_{x∈S} x p(x).
- The mean of X is the same as the population mean in Lecture 1, μ = (∑_{i=1}^N x_i)/N, but
we use the probability language here: think of E[X] in terms of relative frequencies,

(∑_{i=1}^N x_i)/N = ∑_{x∈S} x · (N_x/N),

weighting each possible value x by its probability N_x/N, where N_x is the number of population members taking the value x.
- In other words, the mean of X is a weighted average of all possible values of X.
- For example, if we roll a die once, the expected outcome is

E[X] = ∑_{i=1}^6 i · (1/6) = 3.5.
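A one-line check of the die example (numpy assumed):

```python
import numpy as np

faces = np.arange(1, 7)        # possible outcomes 1, ..., 6
probs = np.full(6, 1 / 6)      # fair die: each face has probability 1/6
print(np.sum(faces * probs))   # E[X] = sum of x * p(x) = 3.5
```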
Variance
σ² = ∑_{x∈S} (x − μ)² p(x) = ∑_{x∈S} x² p(x) − 2μ ∑_{x∈S} x p(x) + μ² ∑_{x∈S} p(x)
   = E[X²] − 2μE[X] + μ² = E[X²] − 2μ² + μ²
   = E[X²] − μ²,

i.e., the second moment minus the (first moment)²,¹ where in the third equality, p(x) is the
probability of X² = x²,² and ∑_{x∈S} p(x) = 1.
The standard deviation, σ = √σ², is the same as the population standard
deviation in Lecture 1.
¹ σ² is also called the second central moment.
² What will happen if X can take both 1 and −1? ∑_{x²=1} x² P(X² = x²) = ∑_{x²=1} x² (p(1) + p(−1)) = 1² p(1) + (−1)² p(−1).
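The shortcut σ² = E[X²] − μ² can be checked numerically; a sketch with a hypothetical pmf:

```python
import numpy as np

# Hypothetical pmf; the support and probabilities are illustrative only.
x = np.array([-1, 0, 1, 2])
p = np.array([0.2, 0.3, 0.3, 0.2])

mu = np.sum(x * p)                         # first moment E[X]
var_def = np.sum((x - mu) ** 2 * p)        # definition: E[(X - mu)^2]
var_short = np.sum(x ** 2 * p) - mu ** 2   # shortcut: E[X^2] - mu^2
print(var_def, var_short)                  # identical values
```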
Properties of Discrete Random Variables
The mean of a function g(X) of X is

E[g(X)] = ∑_{x∈S} g(x) p(x).
- e.g., X is the time to complete a contract, and g (X ) is the cost when the
completion time is X ; we want to know the expected cost.
E[g(X)] ≠ g(E[X]) in general; e.g., if g(X) = X², then

E[g(X)] − g(E[X]) = E[X²] − μ² = σ² > 0

whenever X is not a constant.
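A quick numerical illustration with g(x) = x² and a hypothetical pmf:

```python
import numpy as np

x = np.array([0, 1, 2])            # hypothetical support
p = np.array([0.5, 0.3, 0.2])      # hypothetical probabilities

Eg = np.sum(x ** 2 * p)            # E[g(X)] with g(x) = x^2, i.e., E[X^2]
gE = np.sum(x * p) ** 2            # g(E[X]) = mu^2
print(Eg - gE)                     # equals the variance, here 0.61 > 0
```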
For the standardized r.v. (X − μ_X)/σ_X,

E[(X − μ_X)/σ_X] = (μ_X − μ_X)/σ_X = 0

and

Var((X − μ_X)/σ_X) = Var(X)/σ²_X = 1.
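A numerical check that standardization yields mean 0 and variance 1 (again with a hypothetical pmf):

```python
import numpy as np

x = np.array([1, 2, 3, 4])             # hypothetical support
p = np.array([0.1, 0.4, 0.3, 0.2])     # hypothetical probabilities

mu = np.sum(x * p)
sigma = np.sqrt(np.sum((x - mu) ** 2 * p))

z = (x - mu) / sigma                   # values of the standardized r.v.
print(np.sum(z * p))                   # mean of Z: 0 (up to rounding)
print(np.sum(z ** 2 * p))              # E[Z^2] = Var(Z) since E[Z] = 0: 1
```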
Binomial Distribution
Bernoulli Distribution
The Bernoulli r.v. is a r.v. taking only two values, 0 and 1, labeled as "failure" and
"success". [figure here]
If the probability of success is p(1) = p, then the probability of failure is
p(0) = 1 − p(1) = 1 − p. This distribution is known as the Bernoulli distribution,
and we denote a r.v. X with this distribution as X ~ Bernoulli(p).
The mean and variance of a Bernoulli(p) r.v. X are

μ_X = E[X] = 1 · p + 0 · (1 − p) = p,

and

σ²_X = Var(X) = (1 − p)² · p + (0 − p)² · (1 − p) = p(1 − p).
Binomial Distribution
If X ~ Binomial(n, p) is the number of successes in n independent Bernoulli(p)
trials, X = ∑_{i=1}^n X_i, then

μ_X = E[X] (*)= ∑_{i=1}^n E[X_i] = np,

and

σ²_X = Var(X) (**)= ∑_{i=1}^n Var(X_i) = np(1 − p).

- (*) holds even if the X_i's are dependent, while (**) requires the independence of
the X_i's; see the slides on jointly distributed r.v.'s.
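A simulation sketch of the sum-of-Bernoullis view (the values of n, p, and the number of replications are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 20, 0.3, 100_000          # arbitrary choices

# Each row holds n independent Bernoulli(p) trials; the row sum is one
# Binomial(n, p) draw, i.e., X = X_1 + ... + X_n.
trials = rng.random((reps, n)) < p
x = trials.sum(axis=1)

print(x.mean(), n * p)                 # sample mean vs. np
print(x.var(), n * p * (1 - p))        # sample variance vs. np(1 - p)
```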
Binomial Distribution
[figure here: pmf's of Binomial(20, 0.1), Binomial(20, 0.5), Binomial(20, 0.7), Binomial(40, 0.1), Binomial(40, 0.5), and Binomial(40, 0.7)]
Poisson Distribution
The Poisson distribution was proposed by Siméon Poisson in 1837. [figure here]
Assume that an interval is divided into a very large number of equal subintervals
so that the probability of the occurrence of an event in any subinterval is very small
(e.g., 0.05). The Poisson distribution models the number of events occurring on
that interval, assuming
1 The probability of the occurrence of an event is constant for all subintervals.
2 There can be no more than one occurrence in each subinterval.
3 Occurrences are independent.
From these assumptions, we can see the Poisson distribution can be used to
model, e.g., the number of failures in a large computer system during a given day,
the number of ships arriving at a dock during a 6-hour loading period, the number
of defective products in large production runs, etc.
The Poisson distribution is particularly useful in waiting line, or queuing, problems,
e.g., the probability of various numbers of customers waiting for a phone line or
waiting to check out of a large retail store.
- For a store manager, how to balance long lines (too few checkout lines, losing
customers) against idle customer service associates (too many lines, resulting in
waste)?
Poisson Distribution (Continued)
Intuitively, the Poisson r.v. is the binomial r.v. in the limit as p → 0 and n → ∞. If
np → λ, which specifies the average number of occurrences (successes) for a
particular time (and/or space), then the binomial distribution converges to the
Poisson distribution:

p(x|λ) = e^{−λ} λ^x / x!,   x = 0, 1, 2, …,

where e = 2.71828… is the base for natural logarithms, called Euler's number.
[proof not required]
- We denote a r.v. X with the above Poisson distribution as X ~ Poisson(λ).
- When n is large and np is of only moderate size (preferably np ≤ 7), the binomial
distribution can be approximated by Poisson(np). [figure here]

μ_X = E[X] = λ, and σ²_X = Var(X) = λ.
- Intuition: np → λ, and np(1 − p) = np − np · p → λ − λ · 0 = λ.
The sum of independent Poisson r.v.'s is also a Poisson r.v.; e.g., the sum of K
independent Poisson(λ) r.v.'s is a Poisson(Kλ) r.v.
[figure here: pmf's of Binomial(100, 2%) and Poisson(2)]
For an example where the approximation is not this good, see Assignment
II.8(iii).
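The quality of the approximation in the figure above can be checked directly; a sketch using scipy.stats with the same Binomial(100, 2%) vs. Poisson(2) comparison:

```python
import numpy as np
from scipy.stats import binom, poisson

n, p = 100, 0.02                        # np = 2: small p, moderate np
k = np.arange(0, 11)

err = binom.pmf(k, n, p) - poisson.pmf(k, n * p)
print(np.abs(err).max())                # largest pmf discrepancy is tiny
```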
Hypergeometric Distribution
The hypergeometric pmf is

p(x|n, N, S) = C_x^S · C_{n−x}^{N−S} / C_n^N,   x = max(0, n − (N − S)), …, min(n, S),

where n is the size of the random sample drawn without replacement from a
population of N items, S of which are successes, and x is the number of successes
in the sample.
- A r.v. with this distribution is denoted as X ~ Hypergeometric(n, N, S).
The binomial distribution assumes the items are drawn independently, with the
probability of selecting an item being constant.
This assumption can be met in practice if a small sample is drawn (without
replacement) from a large population (e.g., N > 10, 000 and n/N < 1%). [figure
here]
When we draw from a small population, the probability of selecting an item is
changing with each selection because the number of remaining items is changing.
[figure here: pmf's of Binomial(20, 0.2), Hypergeometric(20, 100, 20), and Hypergeometric(20, 1000, 200)]
³ When n/N is small, (N − n)/(N − 1) is close to 1, matching the variance of the binomial r.v.
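A scipy.stats sketch replicating the figure's comparison; note that scipy parameterizes the hypergeometric differently from these slides:

```python
import numpy as np
from scipy.stats import binom, hypergeom

# scipy's argument order: hypergeom.pmf(k, M, n, N) with M = population size,
# n = number of successes in the population, N = sample size.
k = np.arange(0, 21)
b = binom.pmf(k, 20, 0.2)                  # Binomial(20, 0.2)
h_small = hypergeom.pmf(k, 100, 20, 20)    # Hypergeometric(n=20, N=100, S=20)
h_large = hypergeom.pmf(k, 1000, 200, 20)  # Hypergeometric(n=20, N=1000, S=200)

print(np.abs(b - h_small).max())           # visible gap: small population
print(np.abs(b - h_large).max())           # much closer: large population
```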
Jointly Distributed Discrete Random Variables
We can use a bivariate probability distribution to model the relationship between two
univariate r.v.'s.
For two discrete r.v.’s X and Y , their joint probability distribution expresses the
probability that simultaneously X takes the specific value x and Y takes the value
y , as a function of x and y :
p(x, y) = P(X = x ∩ Y = y),   x ∈ S_X and y ∈ S_Y.
The marginal probability distributions are

p(x) = ∑_{y∈S_Y} p(x, y),   p(y) = ∑_{x∈S_X} p(x, y),

and the conditional mean of Y given X = x is

μ_{Y|X=x} = E[Y|X = x] = ∑_{y∈S_Y} y p(y|x),

where p(y|x) = p(x, y)/p(x) is the conditional pmf.
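A sketch computing the marginals and a conditional mean from a hypothetical joint pmf table:

```python
import numpy as np

# Hypothetical joint pmf: rows index x in {0, 1}, columns index y in {0, 1, 2}.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.25, 0.20]])   # entries sum to 1
y_vals = np.array([0, 1, 2])

p_x = joint.sum(axis=1)                  # marginal p(x): sum over y
p_y = joint.sum(axis=0)                  # marginal p(y): sum over x

p_y_given_x1 = joint[1] / p_x[1]         # conditional pmf p(y | x = 1)
print(np.sum(y_vals * p_y_given_x1))     # conditional mean E[Y | X = 1]
```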
E[g(X, Y)] = ∑_{x∈S_X} ∑_{y∈S_Y} g(x, y) p(x, y).
- e.g., W = aX + bY is the total revenue of two products, with (X, Y) being the sales and (a, b)
the prices.
- If a = b = 1, then E[X + Y] = E[X] + E[Y], i.e., the mean of a sum is the sum of the
means.
- If a = 1 and b = −1, then E[X − Y] = E[X] − E[Y], i.e., the mean of a difference is
the difference of the means.
- If a = b = 1 and σ_XY = 0, then Var(X + Y) = Var(X) + Var(Y), i.e., the
variance of a sum is the sum of the variances.
- If a = 1, b = −1 and σ_XY = 0, then Var(X − Y) = Var(X) + Var(Y), i.e., the
variance of a difference is the sum of the variances.
Derivation of μ_W = aμ_X + bμ_Y:

μ_W = ∑_{x∈S_X} ∑_{y∈S_Y} (ax + by) p(x, y)
    = a ∑_{x∈S_X} x ∑_{y∈S_Y} p(x, y) + b ∑_{y∈S_Y} y ∑_{x∈S_X} p(x, y)
    = a ∑_{x∈S_X} x p(x) + b ∑_{y∈S_Y} y p(y)
    = aμ_X + bμ_Y.
More generally, for W = ∑_{i=1}^K a_i X_i,

μ_W = E[W] = ∑_{i=1}^K a_i E[X_i] =: ∑_{i=1}^K a_i μ_i,
(Continued) and

σ²_W = Var(∑_{i=1}^K a_i X_i) =: ∑_{i=1}^K a_i² σ_i² + 2 ∑_{i=1}^{K−1} ∑_{j>i} a_i a_j σ_ij,

where Var(∑_{i=1}^K X_i) reduces to ∑_{i=1}^K σ_i² if σ_ij = 0 for all i ≠ j. A numerical
check for the two-variable case (K = 2) follows.
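A check of Var(aX + bY) = a²σ²_X + b²σ²_Y + 2abσ_XY, using a hypothetical joint pmf and arbitrary a, b:

```python
import numpy as np

# Hypothetical joint pmf for (X, Y) and arbitrary constants a, b.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.25, 0.20]])
x_vals = np.array([0.0, 1.0])
y_vals = np.array([0.0, 1.0, 2.0])
a, b = 2.0, 3.0

X, Y = np.meshgrid(x_vals, y_vals, indexing="ij")  # grids aligned with joint
mu_x, mu_y = np.sum(X * joint), np.sum(Y * joint)
var_x = np.sum((X - mu_x) ** 2 * joint)
var_y = np.sum((Y - mu_y) ** 2 * joint)
cov_xy = np.sum((X - mu_x) * (Y - mu_y) * joint)

w = a * X + b * Y                                   # values of W = aX + bY
var_w_direct = np.sum((w - np.sum(w * joint)) ** 2 * joint)
var_w_formula = a ** 2 * var_x + b ** 2 * var_y + 2 * a * b * cov_xy
print(var_w_direct, var_w_formula)                  # identical values
```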
For the conditional (on a third r.v. Z) versions with W = aX + bY,

E[W|Z = z] = aμ_{X|Z=z} + bμ_{Y|Z=z},
Var(W|Z = z) = a²σ²_{X|Z=z} + b²σ²_{Y|Z=z} + 2abσ_{XY|Z=z}.
These two concepts are the same as those in Lecture 1 but in the probability
language.
The covariance between X and Y